# LATEST ADVANCES IN GLYCOENGINEERING

EDITED BY: Yanmei Li and Zhongping Tan

PUBLISHED IN: Frontiers in Chemistry

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88966-120-6 DOI 10.3389/978-2-88966-120-6

# LATEST ADVANCES IN GLYCOENGINEERING

Topic Editors: Yanmei Li, Tsinghua University, China; Zhongping Tan, Chinese Academy of Medical Sciences and Peking Union Medical College, China

Glycans have long been known to be among the most abundant biological molecules in living organisms. They can function as energy compounds, form structural cell wall/matrix polymers, or exist as oligomers that are attached to proteins, lipids and natural products to influence their properties and function. Because of their important biological roles, glycans have great potential for applications in the development of new drugs, materials, food additives and many other products. However, it is often difficult to obtain glycans with ideal properties for these applications directly from natural sources. Thus, modification of glycan structures to achieve desired properties has emerged as an active area of research, generally called glycoengineering.

Citation: Li, Y., Tan, Z., eds. (2020). Latest Advances in Glycoengineering. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88966-120-6

# Table of Contents


Ganglong Yang, Naseruddin Höti, Shao-Yung Chen, Yangying Zhou, Qiong Wang, Michael Betenbaugh and Hui Zhang


Tao Luo, Ying Zhang, Jiafeng Xi, Yuchao Lu and Hai Dong


Jing-Jing Du, Lian Zhang, Xiao-Fei Gao, Hui Sun and Jun Guo

*Recent Progress in Chemo-Enzymatic Methods for the Synthesis of N-Glycans*

Qiang Chao, Yi Ding, Zheng-Hui Chen, Meng-Hai Xiang, Ning Wang and Xiao-Dong Gao

*Design and Synthesis of Chitosan–Gelatin Hybrid Hydrogels for 3D Printable* in vitro *Models*

Sofia Magli, Giulia Beatrice Rossi, Giulia Risi, Sabrina Bertini, Cesare Cosentino, Luca Crippa, Elisa Ballarini, Guido Cavaletti, Laura Piazza, Elisa Masseroni, Francesco Nicotra and Laura Russo

*Protein Glycoengineering: An Approach for Improving Protein Properties*

Bo Ma, Xiaoyang Guan, Yaohao Li, Shiying Shang, Jing Li and Zhongping Tan

*Cell-Free Synthetic Glycobiology: Designing and Engineering Glycomolecules Outside of Living Cells*

Thapakorn Jaroentomeechai, May N. Taw, Mingji Li, Alicia Aquino, Ninad Agashe, Sean Chung, Michael C. Jewett and Matthew P. DeLisa

# Cell Line-, Protein-, and Sialoglycosite-Specific Control of Flux-Based Sialylation in Human Breast Cells: Implications for Cancer Progression

Christopher T. Saeui<sup>1</sup>, Kyung-cho Cho<sup>2</sup>, Vrinda Dharmarha<sup>1</sup>, Alison V. Nairn<sup>3</sup>, Melina Galizzi<sup>3</sup>, Sagar R. Shah<sup>1</sup>, Prateek Gowda<sup>1</sup>, Marian Park<sup>1</sup>, Melissa Austin<sup>1</sup>, Amelia Clarke<sup>1</sup>, Edward Cai<sup>1</sup>, Matthew J. Buettner<sup>1</sup>, Ryan Ariss<sup>1</sup>, Kelley W. Moremen<sup>3</sup>, Hui Zhang<sup>2</sup> and Kevin J. Yarema<sup>1,4,5</sup>\*

#### Edited by:

*Zhongping Tan, Chinese Academy of Medical Sciences and Peking Union Medical College, China*

#### Reviewed by:

*Mare Cudic, Florida Atlantic University, United States; Francisco Solano, University of Murcia, Spain*

\*Correspondence: *Kevin J. Yarema, kyarema1@jhu.edu*

#### Specialty section:

*This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry*

Received: *19 November 2019* Accepted: *07 January 2020* Published: *05 February 2020*

#### Citation:

*Saeui CT, Cho K, Dharmarha V, Nairn AV, Galizzi M, Shah SR, Gowda P, Park M, Austin M, Clarke A, Cai E, Buettner MJ, Ariss R, Moremen KW, Zhang H and Yarema KJ (2020) Cell Line-, Protein-, and Sialoglycosite-Specific Control of Flux-Based Sialylation in Human Breast Cells: Implications for Cancer Progression. Front. Chem. 8:13. doi: 10.3389/fchem.2020.00013*

*<sup>1</sup> Department of Biomedical Engineering, Translational Tissue Engineering Center, The Johns Hopkins University, Baltimore, MD, United States, <sup>2</sup> Department of Pathology, The Johns Hopkins School of Medicine, Baltimore, MD, United States, <sup>3</sup> Complex Carbohydrate Research Center, University of Georgia, Athens, GA, United States, <sup>4</sup> Department of Chemical and Biomolecular Engineering, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD, United States, <sup>5</sup> Department of Oncology, The Johns Hopkins School of Medicine, Baltimore, MD, United States*

Sialylation, a post-translational modification that impacts the structure, activity, and longevity of glycoproteins, has been thought to be controlled primarily by the expression of sialyltransferases (STs). In this report we explore the complementary impact of metabolic flux on sialylation using a glycoengineering approach. Specifically, we treated three human breast cell lines (MCF10A, T-47D, and MDA-MB-231) with 1,3,4-O-Bu3ManNAc, a "high flux" metabolic precursor for the sialic acid biosynthetic pathway. We then analyzed N-glycan sialylation using solid phase extraction of glycopeptides (SPEG) mass spectrometry-based proteomics under conditions that selectively captured sialic acid-containing glycopeptides, referred to as "sialoglycosites." Gene ontology (GO) analysis showed that flux-based changes to sialylation were broadly distributed across classes of proteins in 1,3,4-O-Bu3ManNAc-treated cells. Only three categories of proteins, however, were "highly responsive" to flux (defined as two or more sialylation changes of 10-fold or greater). Two of these categories were cell signaling and cell adhesion, which reflect well-known roles of sialic acid in oncogenesis. A third category, protein folding chaperones, was unexpected because little precedent exists for the role of glycosylation in the activity of these proteins. The highly flux-responsive proteins were all linked to cancer, but sometimes as tumor suppressors, other times as proto-oncogenes, and sometimes as both depending on sialylation status. A notable aspect of our analysis of metabolically glycoengineered breast cells was decreased sialylation of a subset of glycosites, which was unexpected because of the increased intracellular levels of sialometabolite "building blocks" in the 1,3,4-O-Bu3ManNAc-treated cells. Sites of decreased sialylation were minor in the MCF10A (<25% of all glycosites) and T-47D (<15%) cells but dominated in the MDA-MB-231 line (∼60%), suggesting that excess sialic acid could be detrimental in advanced cancer and that cancer cells can evolve mechanisms to guard against hypersialylation. In summary, flux-driven changes to sialylation offer an intriguing and novel mechanism to switch between context-dependent pro- or anti-cancer activities of the several oncoproteins identified in this study. These findings illustrate how metabolic glycoengineering can uncover novel roles of sialic acid in oncogenesis.

Keywords: metabolic glycoengineering, ManNAc analogs, breast cancer, sialylation, sialic acid, metabolic flux

#### INTRODUCTION

Sialic acid is a unique 9-carbon sugar that caps mammalian glycans and determines many aspects of a cell's interaction with its microenvironment in health and disease (Varki, 1997, 2008; Schauer, 2009). The incorporation of this sugar into glycans generally has been assumed to be controlled primarily by STs, a family of 20 enzymes in humans (Du et al., 2009; Li and Chen, 2012). For example, early mathematical models of N-linked glycosylation (Krambeck and Betenbaugh, 2005), including sialylation (Monica et al., 1997), were based solely on enzyme levels and activity. The idea that enzyme activity can predict glycan patterns was supported by the ability of mathematical models to be trained to reflect experimentally observed sialylation with reasonable accuracy and to predict glycosylation patterns found in different subtypes of cancers (Krambeck et al., 2009; Bennun et al., 2013). The prevailing premise that metabolic flux plays a small, perhaps even negligible, role in sialylation was consistent with the antiport nature of CMP-sialic acid import into the lumen of the Golgi, where transfer of the sialic acid moiety from this nucleotide sugar donor to nascent glycoconjugates occurs. Specifically, the antiport transfer of spent CMP **out** of the Golgi limits flux **into** this organelle (Hadley et al., 2014). As a result, regardless of how much flux enters the sialic acid biosynthetic pathway via ManNAc, the committed precursor to the pathway (Keppler et al., 1999; Luchansky et al., 2004), later bottlenecks (Viswanathan et al., 2003) limit subsequent glycan sialylation. Certain experimental results support this premise, including findings that sialuria mutations of UDP-GlcNAc 2-epimerase/ManNAc kinase (GNE) that greatly increase intracellular sialic acid production (Seppala et al., 1999) do not necessarily translate into correspondingly large increases in cell surface sialylation (Yarema et al., 2001); similarly, loss-of-activity mutations do not always correspondingly diminish sialylation (Hinderlich et al., 2004; Salama et al., 2005). Finally, introduction of exogenously supplied ManNAc (or ManNAc precursors) into cells can result in large (e.g., 10–100-fold) increases in intracellular sialic acid with minimal (e.g., only 0.05–0.25-fold) changes to surface sialylation (Jacobs et al., 2001; Jones et al., 2004).

Gaining clear-cut evidence for flux-based changes to sialic acid has been hampered by technical difficulties in introducing ManNAc, the precursor for sialic acid biosynthesis (Luchansky et al., 2003), into cells. Mammalian cells lack plasma membrane transporters for this sugar, necessitating uptake by pinocytosis. As a consequence, internalization is not saturated even at very high concentrations of exogenous ManNAc (e.g., 75 mM; Yarema et al., 1998), at which point osmotic stress decreases cell viability and adversely affects sialylation. Similar pitfalls, namely decreased cellular viability and even overt cytotoxicity (Jones et al., 2004; Kim et al., 2004a,b), occur with peracetylated sugar analogs (Sarkar et al., 1995, 1997). For context, peracetylation is a strategy that facilitates cellular uptake of ManNAc (Hadfield et al., 1983; Schwartz et al., 1983; Lemieux et al., 1999). Once peracetylated ManNAc is inside a cell, non-specific esterases (Mathew et al., 2012, 2017) remove the ester-linked acetate groups or other short chain fatty acids (SCFAs) such as propionate or butyrate (Kim et al., 2004b; Sampathkumar et al., 2006; Hao et al., 2019). Our team overcame these difficulties by omitting ester-linked SCFAs from the C6-OH position of ManNAc, which largely eliminates cytotoxicity (Aich et al., 2008; Wang et al., 2009) and ameliorates other off-target effects (Campbell et al., 2008; Elmouelhi et al., 2009). By using the resulting "high flux" analogs (exemplified by 1,3,4-O-Bu3ManNAc, **Figure 1**) we can introduce saturating levels of flux into the sialic acid pathway at sub-cytotoxic levels (Almaraz et al., 2012a; Yin et al., 2017, 2018).

Having developed 1,3,4-O-Bu3ManNAc to effectively enhance sialylation, we exploited this glycoengineering tool to gain insight into the role of sialic acid in cancer progression. For example, we previously used 1,3,4-O-Bu3ManNAc to increase sialylation in SW1990 pancreatic cancer cells and determined changes in N-glycan sialylation through glycosite analysis (Almaraz et al., 2012b; Shah et al., 2013; Tian et al., 2015). In parallel we linked these changes to cell behaviors associated with cancer such as integrin-mediated cell motility (Almaraz et al., 2012b) and EGFR-related drug sensitivity (Mathew et al., 2015, 2016). Although limited to a single cell line, these results unambiguously showed that metabolic flux can influence sialylation; cell surface sialic acid increased globally by ∼75% and individual glycosites increased by as much as ∼8-fold. Subsequent studies expanded our analyses to multiple cell lines and to a different type of cancer by comparing intracellular sialic acid production in different subtypes of breast cancer (Saeui et al., 2018). Certain results from the breast cell lines were consistent with the well-known oncogenic role of sialic acid; for example, flux-driven sialometabolites increased dramatically in the T-47D breast cancer line compared to a much smaller increase in the near-normal MCF10A line (**Figure 1**). By comparison, sialic acid production in the advanced triple negative MDA-MB-231 line was lower than in the early stage T-47D line; this finding was unexpected because oncogenesis is driven by sialic acid, leading to the presumption that advanced cancers have higher levels of this sugar.

**Abbreviations:** FAT1, FAT atypical cadherin 1; FKBP10, FKBP prolyl isomerase 10; GNE, UDP-GlcNAc 2-epimerase/ManNAc kinase; GO, gene ontology; HYOU1, hypoxia up-regulated protein 1; IGF2R, insulin-like growth factor 2 receptor; L1CAM, L1 cell adhesion molecule; LRP1, low-density lipoprotein receptor-related protein 1; NANP, Neu5Ac 9-phosphate phosphatase; NANS, Neu5Ac 9-phosphate synthase; NCSTN, nicastrin; NEU1, neuraminidase 1; PTPRJ, protein tyrosine phosphatase receptor-type J; SAMG, sialic acid metabolism and glycosylation; SLC35A1, cytidine 5′-monophosphate (CMP) sialic acid transporter; SLC35A3, UDP-N-acetylglucosamine transporter; SORL1, sortilin-related receptor; SPEG, solid phase extraction of glycopeptides; ST, sialyltransferase; SCFA, short chain fatty acid; TENM3, teneurin-3; TXNDC11, thioredoxin domain containing 11 protein.

In the current study, we sought additional insight into flux-driven sialylation at various stages of breast cancer by conducting glycosite evaluation of sialoglycans of the MCF10A, T-47D, and MDA-MB-231 lines using solid phase extraction of glycopeptides (SPEG) analysis (Zhang et al., 2003; Tian et al., 2007). Altogether, 1,410 sites of N-linked glycosylation were identified in common across these three breast cell types. As described in this report, the newly obtained results offer intriguing insights into the role of metabolic flux-based control of sialic acid in oncogenesis.

#### MATERIALS AND METHODS

#### Materials

We purchased chemical reagents from Sigma-Aldrich (St. Louis, MO) to synthesize, purify, and characterize 1,3,4-O-Bu3ManNAc as previously described (Aich et al., 2008); characterization data is provided in **Supplemental File S1**. We purchased the cell lines MCF10A (ATCC CRL-10317), T-47D (ATCC HTB-133), and MDA-MB-231 (ATCC CRM-HTB-26) from the American Type Culture Collection (ATCC, Manassas, VA). Cell lines were authenticated by the Johns Hopkins Genetic Resources Core Facility using short tandem repeat (STR) profiling according to the National Institutes of Health (NIH) recommendations and by cross-referencing the resulting STR data with both the ATCC and the German Collection of Microorganisms and Cell Cultures (DSMZ) data repositories for cell authentication.

#### Cell Culture

We maintained stock cultures of the T-47D and MDA-MB-231 cell lines in RPMI-1640 medium (Corning 10-040-CV) supplemented with 10% fetal bovine serum (v/v) (Corning 35-011-CV) and the appropriate dilution of 20× antibiotic-antimycotic solution (Thermo Fisher Scientific 15240062). MCF10A cells were maintained in the same media supplemented with 10 µg/mL insulin (Thermo Fisher Scientific 12585014) and 5.0 µg/mL hydrocortisone (Sigma-Aldrich H0888); these media are referred to as "growth media" below. As noted below, sialic acid metabolism and quantification experiments were conducted with 1.0% fetal bovine serum (v/v) in the absence of antibiotic-antimycotic solution to avoid ST inhibition and to reduce the salvage and recycling of sialic acid-containing serum components (Bonay et al., 1996; Badr et al., 2015a,b); this low-serum, antibiotic-free medium is referred to as "assay media" below. We also note that hydrocortisone can increase sialylation, e.g., in HeLa cells (Carubelli and Griffin, 1967) and Chinese hamster ovary cells (Rouiller et al., 2012); it is unknown whether it affects MCF10A cells in this way, but we do not believe this to be a significant confounding factor in the current set of experiments because this reagent was included in both untreated control cells and 1,3,4-O-Bu3ManNAc-treated test cells.

### Cell Proliferation and Sialic Acid Production Assays

Cells cultured in growth media were collected via trypsinization, counted, and plated in 150 mm tissue culture dishes (5.0 × 10<sup>6</sup> cells per dish) in 20 mL of assay media. The cells were allowed to attach to the plates overnight and then treated with 1,3,4-O-Bu3ManNAc (typical concentrations tested include 0, 10, 50, 100, and 250 µM; exact concentrations used in any particular experiment are indicated below). Appropriate dilutions of analog were added to each test condition from a 100 mM stock solution (stock solutions were either maintained in ethanol and stored at −20 °C for up to 3 months or kept as lyophilized analog, which is stable when stored under nitrogen at −80 °C for up to 2 years). The untreated controls were exposed to the equivalent volume of ethanol given to cells subject to the 250 µM dose (the maximum amount of ethanol added [0.25% v/v] has previously been shown to have no observable effect on cell growth, viability, or sialylation). At the specified time points (6, 24, and 48 h), the cells were detached using non-enzymatic buffer (Cellstripper, Corning 25-056-CI) and cell counts were performed as previously described (Almaraz et al., 2012a) using a Beckman-Coulter Z2 Coulter Counter. Each experiment was performed in triplicate and cell counts were normalized to untreated controls. Intracellular free and conjugate-bound sialic acid levels were determined for treated (100 µM) and untreated cells using the periodate resorcinol method as previously described (Jourdian et al., 1971; Yarema et al., 2001; Saeui et al., 2018).
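The dosing arithmetic in this protocol can be checked with a short sketch. It assumes simple volumetric dilution of the 100 mM ethanol stock directly into 20 mL of medium and neglects the small added volume. The function names and the example cell counts are illustrative and do not come from the original methods.

```python
from statistics import mean


def stock_volume_ul(final_um, stock_mm=100.0, medium_ml=20.0):
    """Stock volume (in µL) needed to reach final_um (µM) in medium_ml of medium.

    Assumption: the added volume is negligible relative to the medium volume.
    """
    stock_um = stock_mm * 1000.0    # mM -> µM
    medium_ul = medium_ml * 1000.0  # mL -> µL
    return final_um / stock_um * medium_ul


def vehicle_percent(final_um, stock_mm=100.0):
    """Ethanol vehicle introduced by a dose, as % v/v of the medium."""
    return final_um / (stock_mm * 1000.0) * 100.0


def normalized_count(treated_counts, control_counts):
    """Mean treated cell count expressed relative to the mean untreated control."""
    return mean(treated_counts) / mean(control_counts)


if __name__ == "__main__":
    for dose in (10, 50, 100, 250):
        print(dose, "µM:", round(stock_volume_ul(dose), 2), "µL stock,",
              round(vehicle_percent(dose), 3), "% v/v ethanol")
    # Hypothetical triplicate counts, for illustration only.
    print(round(normalized_count([9.8e6, 1.01e7, 1.0e7], [1.0e7, 1.0e7, 1.0e7]), 3))
```

Under these assumptions the 250 µM dose corresponds to a 1:400 dilution of the stock, which matches the 0.25% v/v ethanol maximum stated in the text.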

### Transcript Analysis of Sialic Acid Metabolism and Glycosylation (SAMG) Genes

Cells were treated with 1,3,4-O-Bu3ManNAc at 0 or 100 µM in assay media using 100 × 20 mm tissue culture plates and 3.0 × 10<sup>6</sup> cells per dish. After incubation with the analog for 24 h, the cells were harvested by scraping, counted, and portioned into aliquots of 1.0 × 10<sup>6</sup> cells that were flash-frozen in liquid nitrogen and stored at −80 °C until analysis. Total RNA isolation and cDNA synthesis on three biological replicates of each treatment condition were carried out as described previously for quantitative RT-PCR analysis of SAMG genes, e.g., sialyltransferases and Golgi transporters (Nairn et al., 2007). The qRT-PCR reactions were performed in triplicate for each gene analyzed using primer pairs listed in **Supplemental File S2**. Amplification conditions and data analysis were as described (Nairn et al., 2010; Saeui et al., 2018); briefly, Ct values for each gene were normalized to the control gene, RPL4, prior to calculation of relative transcript abundance. Each experiment and PCR analysis was performed in triplicate. Statistical analyses were conducted for pairs of samples as well as for multiple sample and treatment comparisons (Tukey's test).
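The normalization of Ct values to a control gene can be illustrated with a minimal sketch. The exact formula used by the authors is given in the cited references, not here; the code below assumes the common 2^(−ΔCt) form with roughly 100% amplification efficiency, and all Ct values shown are hypothetical.

```python
from statistics import mean


def relative_abundance(ct_target, ct_reference):
    """Relative transcript abundance as 2^(-dCt).

    dCt = mean Ct(target gene) - mean Ct(reference gene, e.g. RPL4).
    Assumption: amplification efficiency is close to 100% for both genes.
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2.0 ** (-delta_ct)


def fold_change(treated_target, treated_ref, control_target, control_ref):
    """Ratio of relative abundance in treated cells to that in untreated controls."""
    treated = relative_abundance(treated_target, treated_ref)
    control = relative_abundance(control_target, control_ref)
    return treated / control


if __name__ == "__main__":
    # Hypothetical technical triplicates for one gene and the reference gene.
    treated_gene, treated_ref = [24.1, 24.0, 23.9], [18.0, 18.1, 17.9]
    control_gene, control_ref = [25.0, 25.1, 24.9], [18.0, 18.0, 18.0]
    print(round(fold_change(treated_gene, treated_ref, control_gene, control_ref), 2))
```

Averaging technical replicates before subtracting is one reasonable design choice; propagating replicate-level variance would require a fuller statistical treatment than this sketch provides.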

### SPEG Analysis of Sialoglycopeptides

Cells were incubated with 0 or 100 µM 1,3,4-O-Bu3ManNAc for 24 h in assay media (eight 150 × 25 mm plates were used for each condition to obtain ≥10<sup>9</sup> cells per condition at the end of the incubation period). After treatment, cells from each condition were pooled and fractions were subjected to modified SPEG analyses (Zhang et al., 2003; Tian et al., 2007). Briefly, each pooled sample was subjected to trypsin digestion using 1.0 mg of protein; the resulting peptides were separated by C18 chromatography using 60% acetonitrile with 0.1% TFA; and 200 µL of the C18 eluate was used for protein enrichment. Of note, we modified the originally reported SPEG procedure to selectively oxidize sialic acids by using 1.0 mM precooled sodium periodate for 15 min, as reported in Almaraz et al. (2012b). These mild conditions avoid non-specific glycan oxidation, resulting in selective capture and identification of sialylated glycopeptides, which are termed "sialoglycosites" throughout this report.

## RESULTS

### Cell Viability and Sialic Acid Metabolism

The impact of 1,3,4-O-Bu3ManNAc on intracellular sialic acid metabolism was evaluated using conditions (e.g., 100 µM treatment for 24 h; Almaraz et al., 2012a; Saeui et al., 2018) we previously optimized across several human cell lines to provide a substantial increase in flux into the sialic acid biosynthetic pathway (Almaraz et al., 2012b) while avoiding decreased cell viability (Jones et al., 2004; Kim et al., 2004b). ManNAc analog treatment did not affect cell counts in any of the three breast cell lines compared to untreated controls (**Figure 2A**). In this study we used cell counts as a surrogate measure for cell viability; we have reported more extensive and rigorous evaluation of growth inhibition and cytotoxicity elsewhere (Jones et al., 2004; Kim et al., 2004b; Sampathkumar et al., 2006; Almaraz et al., 2012a). Sialic acid levels increased in all lines, although without statistical significance in the near-normal MCF10A line despite almost doubled levels in these cells (**Figure 2B**). These results were consistent with a previous study where we documented increased flux into the sialic acid biosynthetic pathway upon 1,3,4-O-Bu3ManNAc supplementation (Saeui et al., 2018). After confirming that 1,3,4-O-Bu3ManNAc had the expected impact on intracellular sialic acid metabolism, we conducted detailed glycoproteomics characterization (as described below) to evaluate its downstream impact on sialoglycoconjugate formation.

### Pearson Correlation and Hierarchical Clustering

In the next experiments, we conducted Pearson correlation and hierarchical clustering after SPEG isolation and identification of the sialylated glycopeptides. We found that the quantification of sialylated peptide abundance had an acceptable Pearson correlation coefficient (r) of approximately 0.8 within the same cell line and between replicate runs, whereas the r value between cell lines was under 0.62 (MCF10A vs. T-47D) or under 0.5 (MCF10A vs. MDA-MB-231 or T-47D vs. MDA-MB-231) (**Figure 3A**). Furthermore, the abundance of sialylated peptides in each cell line showed definite hierarchical clustering tendencies, regardless of 1,3,4-O-Bu3ManNAc treatment (**Figure 3B**). An interesting facet of the hierarchical clustering was that the T-47D cancer line showed greater similarity to the near-normal MCF10A line than to the advanced stage MDA-MB-231 cancer cells. For example, peptides over-represented in the MDA-MB-231 cells, which appear near the bottom of the heatmap, were under-represented in the other two lines, and vice versa. This result presaged subsequent results where the sialylation characteristics uncovered by our glycoengineering approach diverged substantially for the advanced MDA-MB-231 line compared to the MCF10A and T-47D lines.
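As an illustration of the correlation step, the following sketch computes pairwise Pearson coefficients between samples from per-glycopeptide abundance vectors. It is a plain-Python rendering of the standard formula, not the authors' pipeline; the sample names and abundance values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(samples):
    """Pairwise r for a dict mapping sample name -> abundance vector."""
    names = list(samples)
    return {(a, b): pearson_r(samples[a], samples[b]) for a in names for b in names}


if __name__ == "__main__":
    # Hypothetical log-abundances for five glycopeptides in three samples.
    data = {
        "line_A_rep1": [5.1, 3.2, 7.8, 2.0, 6.4],
        "line_A_rep2": [5.0, 3.5, 7.5, 2.2, 6.1],
        "line_B_rep1": [2.1, 6.9, 3.3, 5.8, 4.0],
    }
    for pair, r in correlation_matrix(data).items():
        print(pair, round(r, 2))
```

A distance such as 1 − r derived from this matrix is one common input to hierarchical clustering, although the text does not state which distance or linkage the authors used.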

Hierarchical clustering data also showed that 1,3,4-O-Bu3ManNAc altered sialoglycopeptide abundance **within** a cell type (**Figure 3B**), as evident when comparing control "(−)" and analog-treated "(+)" sample heatmaps from each cell line. In general, the MCF10A and T-47D cell lines responded to 1,3,4-O-Bu3ManNAc treatment as expected, with an unambiguous overall increase in sialylation for many of the sialoglycopeptides, as indicated in red. Furthermore, increased sialylation was more pronounced in T-47D cells compared to the MCF10A line. The response to 1,3,4-O-Bu3ManNAc observed in the MCF10A and T-47D lines was consistent with the generally accepted premise that cancer cells (i.e., the T-47D line) have greater sialylation compared to normal cells (represented by the near-normal MCF10A line). Hierarchical clustering analysis of the MDA-MB-231 line, by contrast, did not show any clear trend toward increased sialylation upon 1,3,4-O-Bu3ManNAc treatment, which was puzzling considering the generally accepted role of sialic acid in cancer progression.

### Flux-Based Control of Glycosite Sialylation

#### Flux-Based Regulation of Sialoglycosites: Global Considerations

The SPEG analysis identified 1,410 sialylated glycopeptides that were present in all three breast cell lines (the complete data set is supplied in **Supplemental File S3**). To understand the biological significance of this data set, we conducted GO analysis of these 1,410 glycosites and found they were spread across many functional categories (**Supplemental File S4**) and included both oncogenic proteins and those not related to cancer. Because we did not gain meaningful insights from this global data analysis, we next focused on proteins where at least two changes in sialoglycosite sialylation of >10-fold occurred upon 1,3,4-O-Bu3ManNAc treatment. In some cases these changes occurred at different sialoglycosites in the same cell line, in other cases the changes occurred at the same sialoglycosite in different lines, while in other cases the changes occurred at different sialoglycosites in different lines. In the selection of these proteins, we counted both increases and decreases in sialylation of >10-fold and found 13 proteins that met these criteria (**Figure 4**; of these proteins three had two sialoglycosites with >10-fold changes, three proteins had three such sialoglycosites, six proteins had four, and one had five).
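The selection rule described above can be sketched as a simple filter. The record layout, the function name, and the placeholder protein labels are assumptions for illustration. The sketch also treats the 10-fold cutoff as inclusive, which is an assumption; a strict cutoff would require changing the comparisons.

```python
from collections import defaultdict


def highly_flux_responsive(records, threshold=10.0, min_changes=2):
    """Return {protein: n_large_changes} for proteins meeting the selection rule.

    records: iterable of (protein, cell_line, site, ratio), where ratio is the
    treated/control abundance ratio for one sialoglycosite in one cell line.
    Increases (ratio >= threshold) and decreases (ratio <= 1/threshold) both
    count; the inclusive comparison is an assumption.
    """
    counts = defaultdict(int)
    for protein, _cell_line, _site, ratio in records:
        if ratio <= 0:
            raise ValueError("abundance ratios must be positive")
        if ratio >= threshold or ratio <= 1.0 / threshold:
            counts[protein] += 1
    return {p: n for p, n in counts.items() if n >= min_changes}


if __name__ == "__main__":
    # Placeholder labels and ratios, for illustration only.
    example = [
        ("PROT_A", "MCF10A", 1, 12.0),      # large increase
        ("PROT_A", "MDA-MB-231", 1, 0.05),  # large decrease, same site, other line
        ("PROT_B", "T-47D", 1, 15.0),       # only one large change
        ("PROT_B", "T-47D", 2, 3.0),
    ]
    print(highly_flux_responsive(example))
```

Because each (cell line, site) pair is counted separately, a protein qualifies whether its large changes fall on different sites in one line or on the same site in different lines, as the text describes.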

A case-by-case evaluation of these proteins, which we deem to be "highly flux-responsive" because of these large changes (i.e., at least two changes of 10-fold or more) in sialoglycosite abundance upon treatment with 1,3,4-O-Bu3ManNAc, revealed interesting biological features. First, in contrast to the global GO analysis, the highly flux-responsive proteins disproportionately fell into two expected categories as well as an unexpected category. The first category includes proteins involved in cell signaling (**Figure 4A**), a second category includes proteins involved in cell adhesion (**Figure 4B**), and the third (unexpected) category consists of chaperones linked to protein folding (**Figure 4C**). One notable aspect of these findings was that proteins involved in neural development were strongly represented (i.e., five of the 13 [LRP1, PLXNB2, PLXND1, L1CAM, and TENM3] fit into this category). We discuss each of these categories in more detail next, including the cancer relevance of each of the 13 proteins.

#### Cell Signaling (Figure 4A)

Sialic acid is strongly linked to cell signaling (Allende and Proia, 2002; Schauer, 2009; Parker and Kohler, 2011); accordingly, we were not surprised that proteins identified as highly responsive to metabolic flux fell into this category. One of these, **IGF2R** (insulin-like growth factor 2 receptor), is a dual receptor for insulin-like growth factor 2 and mannose 6-phosphate. Functions of IGF2R include intracellular trafficking of lysosomal enzymes, activation of TGFβ, and the degradation of IGF2 (Bergman et al., 2013). A regulatory circuit links the insulin/IGF system with cancer through the glycosylation status of IGF2R (de-Freitas-Junior et al., 2017); in one example, desialylation of insulin receptors controls the proliferation of L6 myoblasts (Arabkhari et al., 2010). Finally, human tumors (including breast carcinomas) show genetic loss or mutation of IGF2R (Kalla Singh et al., 2010). A second receptor, **LRP1** (low-density lipoprotein receptor-related protein 1), is a highly glycosylated protein that plays a role in endocytosis, modulates cellular events related to β-amyloid precursor protein metabolism, mediates kinase-dependent intracellular signaling, and is involved in neuronal calcium signaling as well as neurotransmission (Mao et al., 2017). LRP1 has been linked to a causative role in breast cancer susceptibility based on ethnic origins (Beneš et al., 2003). A third highly flux-responsive protein involved in signal transduction (albeit not a receptor per se) is **NCSTN** (nicastrin). Nicastrin is an essential subunit of the γ-secretase complex that catalyzes intramembrane cleavage of receptors involved in Notch signaling in a glycosylation- and sialic acid-dependent manner (Yu et al., 2000; Moniruzzaman et al., 2018). Nicastrin modulates the epithelial to mesenchymal transition and tumorigenicity in breast cancer cells (Lombardo et al., 2012, 2014). Finally, we identified two plexins (**PLXNB2**, plexin B2 and **PLXND1**, plexin D1). Plexins are proteins that function as receptors for semaphorin signaling proteins that play important roles in neuronal development (e.g., axonal guidance) (Janssen et al., 2012). In cancer, plexins can be either oncogenic or tumor suppressors; for example, plexins A1-4 are tumor suppressors while plexin B2 is tumor promoting (Ramesh et al., 2018) and plexin D1 has been associated with tumor vasculature (Roodink et al., 2009).

FIGURE 4 | Examples of proteins with "glycosites" that show large changes (up, down, or both) in sialylation. The proteins shown illustrate how flux-driven sialylation can selectively influence glycopeptide sialylation at the cell, protein, or sub-protein (i.e., sialoglycosite) level, and they fall into three functional categories: (A) cell signaling, (B) cell adhesion, and (C) protein folding chaperones. Note that in this figure each sialoglycosite is arbitrarily numbered; the exact sites within each protein are provided in Supplemental File S5. (D) The absolute number of consensus sequons (gray bars), predicted sites of N-glycans (white bars, as predicted by NetNGly), and observed sites of sialylglycopeptides identified in this study for each of the eight highly responsive oncoproteins listed in (A–C). The percentages listed represent the number of experimentally observed sites compared to the total predicted number of N-glycan consensus sequon sites.

#### Cell Adhesion (Figure 4B)

Sialic acid is strongly associated with cell adhesion. Indeed, intracellular levels of sialic acid previously have been linked to neuronal cell adhesion molecule (NCAM) sialylation (Bork et al., 2005), a protein similar to L1 cell adhesion molecule (**L1CAM**) identified in the current study. Consequently, we found it unsurprising that flux-based changes to sialylation affected molecules in this category. In health, L1CAM is an axonal glycoprotein involved in the dynamics of cell adhesion and in the generation of transmembrane signals at tyrosine kinase receptors. In cancer, it is an established biomarker for triple negative, advanced cancers with poor prognosis (Doberstein et al., 2014; Altevogt et al., 2016) specifically due to changes in sialylation linked to metastasis (Hoja-Łukowicz et al., 2013). A second cell adhesion protein we identified was FAT atypical cadherin 1 (**FAT1**), a highly glycosylated cadherin-like protein that plays a role in cell migration, lamellipodia dynamics, cell polarity, and cell-cell adhesion (Katoh, 2012; Zhang et al., 2016). FAT1 repression in cancer occurs due to homozygous deletion or epigenetic silencing, and FAT1 is preferentially downregulated in invasive breast cancer (Katoh, 2012). Third, **TENM3** (Teneurin-3) is a single pass, richly glycosylated type II transmembrane protein that is one of four human Teneurins, a family involved in cell-cell adhesion and organization of neuronal synapses (Mosca, 2015; Jackson et al., 2018). Teneurins 2 and 4 have been linked to tumor differentiation and patient survival in ovarian cancer (Graumann et al., 2017) and Teneurin 3 is expressed at low to moderate levels in a subset of breast cancer patients (e.g., 4 of 12 reported in the Protein Atlas database, https://www.proteinatlas.org/ENSG00000218336-TENM3/pathology). 
Finally, **PTPRJ** (protein tyrosine phosphatase, receptor-type, J [a.k.a., receptor-type tyrosine-protein phosphatase eta]) negatively regulates PDGF, EGF, and VEGF signaling, and as such is a tumor suppressor gene (Smart et al., 2012). This protein's downstream targets play a role in cell-cell adhesion, cell-matrix adhesion, cell migration, cell adhesion, and barrier function of epithelial junctions during reassembly (Smart et al., 2012), thus positioning PTPRJ at the interface between signaling and cell adhesion (for the purposes of this discussion we arbitrarily included it in the cell adhesion category). Interestingly, PTPRJ is one of 46 proteins we previously identified using 1,3,4-O-Bu3ManNAz (an azide-modified analog of 1,3,4-O-Bu3ManNAc) in the SW1990 pancreatic cancer line (Tian et al., 2015), indicating that this oncoprotein is responsive to flux through the sialic acid pathway across cancer and ManNAc analog types.

#### Protein Folding and Trafficking (Figure 4C)

In contrast to signal transduction and cell adhesion, the third category of highly flux-responsive proteins (molecular chaperones that assist in protein folding and other proteins involved in protein trafficking) was unexpected. For context, although these proteins (exemplified by calnexin and calreticulin) require glycopeptides as binding partners (Helenius and Aebi, 2001), relatively little is known about their own glycosylation (the only reports of glycosylation in online proteomic or genomic databases for two of the proteins identified in the current study came from an unrelated publication from our team; Hu et al., 2018). One of these proteins, **HYOU1** (hypoxia up-regulated protein 1), assists protein folding and secretion from the ER and has a pivotal role in cytoprotection during oxygen deprivation (Ikeda et al., 1997). This protein is highly expressed in the liver and pancreas, in macrophages found within aortic atherosclerotic plaques, and in breast cancer (Wang et al., 2015). A second member of this category, **FKBP10** (FKBP prolyl isomerase 10; Ishikawa et al., 2008), accelerates protein folding during synthesis. In cancer, aberrant epigenetic regulation of FKBP10 predicts poor clinical prognosis (Carmona et al., 2014). Third, **TXNDC11** (thioredoxin domain containing 11 protein) is another protein folding chaperone (Wang et al., 2005) linked to breast cancer through 2,522 mutations in the COSMIC database (as of November 2018; https://cancer.sanger.ac.uk/cosmic/gene/analysis?ln=TXNDC11). Finally, **SORL1** (sortilin-related receptor) binds the receptor-associated protein and helps coordinate the cellular uptake, endosomal trafficking, and subsequent proteolytic processing of lipoproteins; in some cases such as the amyloid precursor protein, SORL1 impedes proteolytic processing (Rohe et al., 2008). 
(Note that we include SORL1 in this category because of its role in protein quality control and trafficking although unlike HYOU1, FKBP10, and TXNDC11 it is involved in protein recycling and degradation rather than biosynthesis). A "mutome" analysis identified SORL1 to be down-regulated in breast cancer (Hernández et al., 2007), suggesting that it may be a tumor suppressor.

An interesting aspect of the three molecular chaperones identified as being highly flux responsive (i.e., HYOU1, FKBP10, and TXNDC11; **Figure 4C**) was their smaller number of predicted sites of N-glycosylation compared to proteins related to signaling (**Figure 4A**) and adhesion (**Figure 4B**). Specifically, all three of these proteins had fewer than 10 consensus sequons for N-glycans while the other proteins had from 21 to 49 sequons, depending on the protein (**Figure 4D**). Despite the large number of potential sites of N-glycosylation in the signaling and adhesion proteins, only about half of these sites were predicted to be occupied using NetNGly (Blom et al., 2004) (from 13 to 31), with only a subset of this latter group identified in this study (from 3 to 12). To present these data another way, only 33 or 29% (for cell signaling and cell adhesion proteins, respectively) of possible acceptor sites were occupied with sialylated N-glycans (**Figure 4D**). By comparison, the molecular chaperones had a smaller number of possible consensus sequons (from 7 to 9) but sialoglycan occupancy was higher (at 50%), suggesting that sialylation is important for their activity.
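The occupancy percentages quoted above are ratios of observed sialoglycosites to consensus sequons. The following minimal Python sketch illustrates that arithmetic only; the function name and the example counts are hypothetical placeholders chosen for illustration and are not values read from Figure 4D.

```python
def occupancy_percent(observed_sites: int, consensus_sequons: int) -> float:
    """Percentage of N-glycan consensus sequons observed as sialoglycosites."""
    if consensus_sequons <= 0:
        raise ValueError("consensus_sequons must be positive")
    if not 0 <= observed_sites <= consensus_sequons:
        raise ValueError("observed_sites must lie between 0 and consensus_sequons")
    return 100.0 * observed_sites / consensus_sequons


# Hypothetical counts for illustration only (not taken from Figure 4D):
# (observed sialoglycosites, consensus sequons)
examples = {
    "chaperone-like protein": (4, 8),
    "receptor-like protein": (10, 30),
}

for label, (observed, sequons) in examples.items():
    print(f"{label}: {occupancy_percent(observed, sequons):.0f}% of sequons observed")
```

A protein with few sequons can therefore show a higher fractional occupancy than a heavily glycosylated receptor even when fewer sites are observed in absolute terms.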

#### Mechanism of 1,3,4-O-Bu3ManNAc-Based Control of N-glycan Sialylation

#### Transcript Profiling of SAMG Genes

The down-regulated sialoglycosites identified in **Figure 4** were unexpected because, when flux through a metabolic pathway increases, product formation logically should increase in tandem. Our previous study with SW1990 cells was consistent with this premise, failing to identify any sialoglycosites with decreased abundance upon 1,3,4-O-Bu3ManNAc treatment (Almaraz et al., 2012b). A possible explanation for the apparently anomalous results in the current study was that butyrate released from 1,3,4-O-Bu3ManNAc epigenetically modulated gene expression through changes to histone acetylation (Sampathkumar et al., 2006) and affected transcription of SAMG genes in the breast lines differently than in SW1990 pancreatic cancer cells. This mechanism is plausible because 1,3,4-O-Bu3ManNAc can simultaneously up- and down-regulate transcription (Elmouelhi et al., 2009). Experimentally, profiling of SAMG genes revealed differences in transcript levels between the cell lines (**Figure 5**; data for the MCF10A, T-47D, and MDA-MB-231 lines are shown in Panels **A**, **B**, and **C**, respectively, with statistical analyses in Panel **D**). Such differences were consistent with basal differences in sialylation observed in each cell line (e.g., as shown in the clustering analysis of untreated cells in **Figure 3**). More germane to the flux-based glycoengineering evaluated in this paper, however, was that the transcript levels for SAMG genes changed relatively little (if at all) upon analog supplementation, as evidenced by the side-by-side comparison of each gene with and without 1,3,4-O-Bu3ManNAc treatment, where only four statistically significant changes were observed (**Figure 5D**).

Having ruled out overt effects on transcription, we reasoned that glycosite sialylation could be affected by the availability of CMP-Neu5Ac in the Golgi, which can selectively activate subsets of a cell's repertoire of STs toward individual glycosites (presumably each glycosite:ST interaction has an individualized K<sub>M</sub> value) (Legaigneur et al., 2001; Gupta et al., 2017). Alternatively, the activity of the SAMG gene products themselves could be altered by flux-driven sialylation; increased esterase activity upon enhanced sialylation in 1,3,4-O-Bu3ManNAc-treated cells provides precedent for the flux-based control of a sialylated protein's activity (Mathew et al., 2017), along with evidence that sialylation of STs can impact their activity (Breen, 2002). To test if flux-based sialylation was relevant to STs and other SAMG proteins evaluated in the current study, we analyzed our dataset and found 10 sialoglycosites in six SAMG proteins, five of which were STs (**Figure 5E**). Although many of the changes were modest, 12 (of the 30 possible changes, i.e., 10 sialoglycosites in each of three cell lines) exceeded a 2-fold change and were plausibly biologically significant.
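The count of 30 possible changes follows from 10 sialoglycosites measured in three cell lines. As a minimal sketch of how a 2-fold cutoff can be applied symmetrically to increases and decreases, the Python snippet below compares fold changes on a log2 scale. It assumes that fold change is expressed as a treated/untreated ratio (so that a decrease is a ratio below 1); the function name and the example ratios are hypothetical and are not data from Figure 5E.

```python
import math


def exceeds_fold_threshold(fold_change: float, threshold: float = 2.0) -> bool:
    """True if the change is larger than `threshold`-fold in either direction.

    `fold_change` is assumed to be a treated/untreated ratio, so 0.5 means
    a 2-fold decrease and 2.0 means a 2-fold increase.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    return abs(math.log2(fold_change)) > math.log2(threshold)


n_sites, n_cell_lines = 10, 3
print(n_sites * n_cell_lines)  # number of possible site-by-line changes

# Hypothetical ratios for illustration only (not values from Figure 5E).
hypothetical_ratios = [2.5, 0.4, 1.3, 0.9]
n_large = sum(exceeds_fold_threshold(r) for r in hypothetical_ratios)
print(n_large)
```

Working on a log scale treats a 2.5-fold increase and a 2.5-fold decrease (ratio 0.4) as changes of equal magnitude.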

Three of these proteins were particularly noteworthy. First, by far the largest change (of 619-fold) was for ST6GAL1 in the T-47D line; this glycosite is essentially unsialylated in untreated cells (i.e., sialylation must be <0.16% under basal conditions to observe a 619-fold increase). If increased sialylation at this site is activating, this change is consistent with the oncogenic role of this ST as described in several publications from the Bellis group (Zhuo and Bellis, 2011; Schultz et al., 2012, 2016) and corroborated by others (Meng et al., 2013). Conversely, sialylation of ST6GALNAC2 was strongly down-regulated upon 1,3,4-O-Bu3ManNAc treatment in both the MCF10A and T-47D lines but barely affected in the MDA-MB-231 line. The evaluation of global sialoglycosite abundance (as presented next, below) suggests that sialylation of this enzyme may be inactivating. Specifically, if sialylation is inactivating, the higher sialylation of this enzyme in MDA-MB-231 cells could explain the overall lower sialylation in this line. Finally, the recycling enzyme neuraminidase 1 (NEU1) experienced diametrically opposed sialylation at one glycosite, with strong up-regulation in the near-normal MCF10A line and strong down-regulation in the advanced MDA-MB-231 line. In this case, if sialylation is inactivating, the less sialylated form of NEU1 in MDA-MB-231 cells would have enhanced activity, consistent with the lower levels of sialylated N-glycans in this line.
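The basal-occupancy bound quoted for ST6GAL1 can be checked with one line of arithmetic: if site occupancy cannot exceed 100% after treatment, a 619-fold increase implies a starting occupancy of at most 100%/619. The Python sketch below makes this explicit. It assumes that the reported fold change can be read as a ratio of treated to untreated site occupancy; the function name is hypothetical.

```python
def max_basal_occupancy_percent(fold_increase: float,
                                ceiling_percent: float = 100.0) -> float:
    """Upper bound on basal occupancy compatible with a given fold increase.

    Assumes occupancy after treatment cannot exceed `ceiling_percent`.
    """
    if fold_increase < 1:
        raise ValueError("fold_increase must be at least 1")
    return ceiling_percent / fold_increase


bound = max_basal_occupancy_percent(619)
print(f"Basal sialylation of at most about {bound:.2f}%")
```

The bound is roughly 0.16%, in line with the figure given in the text.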

#### "Global" Analysis of Sialoglycopeptides

As discussed above, many sialoglycosites experienced increased sialylation upon 1,3,4-O-Bu3ManNAc supplementation, consistent with the increased intracellular production of sialic acid in the treated cells (**Figure 2**). By contrast, the decreased abundance of other sialoglycosites was unexpected because of the intuitive expectation that elevated levels of intracellular sialic acid would enhance sialoglycoconjugate formation. The surprising decreases in sialylation could be due to increased sensitivity of mass spectrometry, which now allows identification of low abundance outliers with atypical sialylation, or could result from the different cell types analyzed here (i.e., breast cancer cells) compared to our earlier study with pancreatic SW1990 cells (Almaraz et al., 2012b). Alternatively, the currently observed decreases could be artifacts of the experimental process; we reasoned that if technical issues were responsible, similar trends would be observed across all cell lines. Accordingly, we plotted all sialoglycosite-specific changes observed upon 1,3,4-O-Bu3ManNAc treatment in the three lines (**Figure 6**). We first compared the number of proteins that generated single sialoglycopeptides with those that produced two or more sialoglycosites; the latter proteins were further categorized based on whether the sialoglycosites experienced uni- or bi-directional changes in abundance (**Figure 6A**). In subsequent panels, we provide dot representations showing the fold-change of each glycosite to compare responses in each cell line for each of these three categories: proteins with two or more sialoglycosites with unidirectional changes are shown in **Figure 6B**; proteins that experienced bidirectional changes are shown in **Figure 6C**; and proteins that generated a single glycosite are shown in **Figure 6D**. 
In all cases, the majority of sialoglycosites identified in the MCF10A and T-47D lines (red and blue dots) clustered above 0 on the y-axis indicating increased abundance in 1,3,4-O-Bu3ManNAc-treated cells while glycosites from the MDA-MB-231 line (green dots) were disproportionately skewed below 0, indicating reduced sialylation.
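The three protein categories used in Figure 6 (single sialoglycosite, multiple sialoglycosites with unidirectional changes, and multiple sialoglycosites with bidirectional changes) can be expressed as a simple grouping rule. The Python sketch below is an illustration under stated assumptions and is not the analysis code used in this study: it assumes each sialoglycosite is reported as a log2 fold change (positive for increased abundance, negative for decreased), and the function name, protein labels, and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def classify_proteins(site_changes: Iterable[Tuple[str, float]]) -> Dict[str, str]:
    """Assign each protein to 'single', 'unidirectional', or 'bidirectional'.

    `site_changes` holds one (protein, log2 fold change) pair per sialoglycosite.
    A change of exactly 0 is treated as compatible with either direction.
    """
    by_protein: Dict[str, List[float]] = defaultdict(list)
    for protein, log2_fc in site_changes:
        by_protein[protein].append(log2_fc)

    categories: Dict[str, str] = {}
    for protein, changes in by_protein.items():
        if len(changes) == 1:
            categories[protein] = "single"
        elif all(c >= 0 for c in changes) or all(c <= 0 for c in changes):
            categories[protein] = "unidirectional"
        else:
            categories[protein] = "bidirectional"
    return categories


# Hypothetical sialoglycosite data for illustration only.
example = [
    ("PROT_A", 1.2),                     # one site
    ("PROT_B", 0.8), ("PROT_B", 2.1),    # two sites, both increased
    ("PROT_C", 3.0), ("PROT_C", -1.5),   # two sites, opposite directions
]
result = classify_proteins(example)
print(result)
```

Applying the same rule separately to each cell line would allow the per-line comparison described above, where points above or below 0 on the y-axis indicate increased or reduced sialylation.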

We next normalized sialoglycopeptides based on abundance and then scaled and summed these values to represent the aggregate abundance of up- and down-regulated sialoglycopeptides in each line (**Figure 7**). The results reinforced that both the MCF10A and T-47D cell lines responded as expected, with the preponderance of glycopeptides experiencing increased sialylation upon 1,3,4-O-Bu3ManNAc treatment. In addition, the aggregate increase in sialylation was more pronounced in the T-47D line compared to the MCF10A line, consistent with the increased amount of intracellular sialic acid produced in T-47D cells (**Figure 2**). Similarly, the hierarchical clustering analysis showed that more peptides with decreased sialylation occurred in the near-normal MCF10A line (e.g., in the topmost cluster, **Figure 3B**). In contrast to the T-47D cancer and the near-normal MCF10A lines, the MDA-MB-231 cells did not show a trend toward increased sialylation in the clustering analysis (**Figure 3B**) even though 1,3,4-O-Bu3ManNAc did support a clearly measurable increase in intracellular sialic acid in this advanced cancer line (**Figure 2B**). A possible explanation for this unexpected result was that the glycoproteins experiencing decreased sialylation upon 1,3,4-O-Bu3ManNAc treatment were disproportionately low in abundance and did not reflect global sialylation. The data shown in **Figure 7**, however, discount this possibility because the scaled representation of aggregate sialoglycoconjugate abundance showed a clear decrease in global sialylation in the MDA-MB-231 cells that was reproduced in two independent experiments.

FIGURE 5 | Transcript levels (A–D) and sialylation changes (E) observed for SAMG genes. Transcript levels of the SAMG genes were compared in the three breast lines: (A) MCF10A, (B) T-47D, and (C) MDA-MB-231 with and without incubation with 100µM 1,3,4-O-Bu3ManNAc, with *p-*values given in panel (D) and any statistically significant changes (*p* < 0.05) highlighted in red font. (E) Sialoglycosites in SAMG proteins with > 2-fold changes in sialylation in at least one of the three cell lines are shown (the exact sites of these sialoglycosites are provided in Supplemental File S5). Transcript analyses were performed in triplicate. Student's *t*-tests were conducted to assess the significance of differences in gene expression between control and 1,3,4-O-Bu3ManNAc treated samples.

showed that this methodology showed measurable quantitative differences but the qualitative differences between this line and the other two remained very discernible in either replicate.

## DISCUSSION

The sialoglycosite analyses of 1,3,4-O-Bu3ManNAc-treated human breast cell lines highlight, and further reinforce, a coalescing consensus that metabolic flux can influence sialylation in biologically meaningful and disease-relevant ways. Recently, flux-based modulation of sialylation has been shown to be important in several contexts. For example, media supplementation with ManNAc (Yorke, 2013) or our analog (i.e., 1,3,4-O-Bu3ManNAc) improves glycan product quality for therapeutic glycoproteins (Wang et al., 2019) (a similar approach used sialuria-type mutations to GNE to increase sialylation; Bork et al., 2007). In this report we focus on a second context, which is flux-driven sialylation in cancer. In the past we have used sialoglycosite analysis to characterize 1,3,4-O-Bu3ManNAc-treated SW1990 pancreatic cancer cells (Almaraz et al., 2012b) and, in follow-up studies, showed that these changes altered oncogenic signal transduction and sensitivity to drugs (Mathew et al., 2015, 2017).

In the current study we used a metabolic glycoengineering strategy to uncover numerous flux-based perturbations to sialylation exemplified by the 1410 sialylated glycopeptides identified in common between MCF10A, T-47D, and MDA-MB-231 cells. Overall, these changes affected proteins of almost all classes and activities. The categories of biological activity, however, became tightly focused in the small subset of 13 highly flux-responsive proteins that exhibited two (or more) changes in sialylation of at least 10-fold (**Figure 4**). Two of these categories (cell signaling and cell adhesion) were expected because of literature precedent implicating sialic acid in these processes in cancer. The third category (protein folding chaperones) was not expected because very little information is available on the glycosylation, let alone the sialylation, of these proteins. Accordingly, the discovery that molecular chaperone proteins are highly responsive to flux-based sialylation opens a new frontier for exploring (and ultimately controlling) the function of these proteins.

Another aspect of the highly flux-responsive set of proteins shown in **Figure 4** is that they all are linked to cancer and, in most cases, specifically to breast cancer. Further, several are either tumor suppressors or, depending on context, proto-oncogenes. The controlling factors that determine whether tumor suppressor or pro-oncogenic activity dominates the function of these proteins remain largely obscure; our current study opens the intriguing possibility that sialylation is a switch that turns either behavior on or off. Although speculative, exquisite control of sialylation may provide cells with the ability to tune the activity of "context dependent" tumor suppressor or proto-oncogenic proteins (Katoh, 2012; Li and Reynolds, 2012) and thus provide a cancer-driving stimulus or tumor inhibition depending on glycosylation status. The ability of cells to tune biological activity (such as tumor suppression vs. cancer progression) is supported by the high resolution characterization (as shown in **Figure 4** for selected proteins and **Figure 6** for all sialoglycosites) that shows that metabolic flux has a remarkable ability to selectively increase, decrease, or avoid perturbing sialylation. We emphasize that this selectivity occurs at the individual sialoglycosite level (as can be seen by the comparisons made in **Figure 6**, where no discernible difference is evident for proteins with single sialoglycosites or multi-sialoglycosite proteins with either uni- or bi-directional changes). Global patterns of sialylation are nonetheless heavily influenced by the host cell line, which is evident in **Figure 7**, where 1,3,4-O-Bu3ManNAc treatment induces a modest enhancement of sialylation in the near-normal MCF10A line, a strong increase in the cancerous T-47D line, and the unexpected global decrease in the advanced triple negative MDA-MB-231 line.

Although decreased abundance was observed at dozens of sialoglycosites in the near-normal MCF10A and T-47D lines, the overall response of these lines nevertheless fit the canonical understanding of the role of sialic acid in oncogenesis. Specifically, the minor increase in intracellular metabolites as detailed previously (Saeui et al., 2018) and confirmed in the present study in the near-normal MCF10A line translated into a substantial increase in sialoglycoconjugates at the global level (as shown in **Figures 6**, **7**). Considering the long-established links between sialylation and oncogenesis, these data suggest that flux-driven sialylation could be an early triggering factor in the development of cancer. The cancerous T-47D line has an increased ability to produce intracellular sialometabolites (**Figure 2**), which in turn increases sialoglycoconjugate production substantially beyond levels seen in the near-normal MCF10A line, consistent with the many known roles of sialic acid in cancer progression. Indeed, we speculate that the copious production of this sugar in the T-47D line could help drive these cancer cells to more advanced and malignant forms of this disease. The surprising aspect of this study comes into play with the advanced MDA-MB-231 line, where there was a substantial global decrease in sialylated glycosites (**Figures 6**, **7**).

To ensure that the decrease in sialylation observed in the MDA-MB-231 line was a legitimate metabolic flux-based effect, we repeated glycosite analysis in this line with quantitatively similar results (see **Figure 7**). We also ruled out that 1,3,4-O-Bu3ManNAc treatment reduced transcript levels of SAMG genes (**Figure 5**), suggesting that cells maintained the ability to produce the biosynthetic machinery (i.e., the various enzymes and transporters shown in **Figure 1**) required to synthesize sialylated glycans. Instead, an interesting hypothesis is that reduced sialoglycan levels may be a consequence of the sialylation status of the SAMG proteins themselves (**Figure 5E**). Finally, the MDA-MB-231 line has been analyzed with the Affymetrix Human Genome U133 2.0 Plus chip using the protocols and facilities available through the Johns Hopkins Cancer Center Microarray Core [the resulting data were deposited in NCBI's Gene Expression Omnibus database (Edgar et al., 2002) and are accessible through GEO series accession number GSE11407 [http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc]]. These data showed minimal global perturbation of transcript levels in 1,3,4-O-Bu3ManNAc treated MDA-MB-231 lines (Elmouelhi et al., 2009), further supporting the idea that the differences we observed were due to flux-based changes that affected posttranslational sialylation.

We close by speculating on why advanced cancer cells (exemplified by MDA-MB-231 in this study) may benefit by down-regulating sialylated N-glycans. We hypothesize that sialic acid in cancer is an example of a "Goldilocks" effect where levels have to be "just right" (vitamins, which are critical to maintain health but often become toxic at higher doses, exemplify this paradigm in a different health context; Diab and Krebs, 2018). In particular, although sialylation drives multiple aspects of oncogenesis, too large an increase similarly may be detrimental. This idea is consistent with descriptions of only "slightly increased" levels of sialic acid in some types of cancer (Sillanaukee et al., 1999) and feedback mechanisms that carefully titer metabolic flux into the sialic acid biosynthetic pathway (Kornfeld et al., 1964; Keppler et al., 1999).

Once cancer cells achieve an advanced stage, they may no longer require sialic acid as an oncogenic driving force. Instead, sialylation can become a liability. One example of this phenomenon is provided by the SW1990 pancreatic cancer cell line, where oncogenic EGFR signaling is dampened by 1,3,4-O-Bu3ManNAc-driven sialylation (Mathew et al., 2016). Increased flux-driven sialylation also sensitizes this drug-resistant line to tyrosine kinase inhibitors (e.g., erlotinib and gefitinib; Mathew et al., 2015). Related to the current study, the MDA-MB-231 cancer line is of breast origin but was obtained from a distal metastatic site; this may help explain why this line downregulates sialylation. Specifically, because sialic acid inhibits cell extravasation through endothelium (Cross et al., 2003; Sakarya et al., 2004; French et al., 2017), MDA-MB-231 cells may have evolved mechanisms to reduce sialylation to facilitate exit from the vasculature to form secondary tumors at distal sites during metastasis. Interestingly, sialidases (e.g., NEU1, identified in this study to undergo flux-driven changes to sialylation in MDA-MB-231 cells, **Figure 5E**) play a major role in cell migration across endothelia (Cross et al., 2003, 2012; Sakarya et al., 2004).

In conclusion, the glycoengineering approach taken in this report has uncovered novel, and in some cases completely unexpected, roles for flux-based sialylation in breast cancer. One striking result was the down-regulation of dozens of sialoglycosites upon treatment with the metabolite precursor (1,3,4-O-Bu3ManNAc) that feeds the sialic acid biosynthetic pathway. Another intriguing result was that the subset of 13 proteins that were highly flux-responsive are all linked to cancer, in many cases specifically to breast cancer, which validates our metabolic glycoengineering approach as an appropriate strategy to uncover insights into how sialylation affects this disease. Finally, although the function and activity of a few of the highly flux-responsive proteins have already been linked to glycosylation in general and sialylation more specifically (e.g., IGF2R and L1CAM), virtually nothing is currently known about the role of glycans for most of the other proteins, in particular for the protein folding chaperones (i.e., HYOU1, FKBP10, and TXNDC11).

Based on these considerations, this report establishes a set of sialoglycoproteins as novel targets for investigation into how glycosylation impacts their biological activity in both health and malignant disease. Consequently, this work augments growing efforts to translate metabolic glycoengineering into clinical healthcare (Agatemor et al., 2019). Although entirely speculative at this point, one way our 1,3,4-O-Bu3ManNAc-based approach can be envisioned to be deployed clinically is through emerging in vitro functional assays of a patient's living cancer cells (Kodack et al., 2017). More specifically, cancer cells obtained from a biopsy can be incubated with 1,3,4-O-Bu3ManNAc (or a similar agent) to reveal otherwise hidden biochemical features that, if the trends in sialylation observed for the three cell lines evaluated in this study hold, can be used to assess the stage of cancer progression and to inform personalized patient care.

#### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/**Supplementary Material**.

#### AUTHOR CONTRIBUTIONS

CS and KY provided project design and management. KC provided mass spectrometry data acquisition. AN and MG conducted SAMG transcript profiling experiments. CS, VD, SS, MP, MA, AC, EC, MB, and RA conducted all other experiments. KY, HZ, and KM were responsible for funding acquisition. CS, KC, KY, AN, KM, and HZ participated in writing and editing of the manuscript.

#### REFERENCES


#### FUNDING

Funding was provided by the National Institutes of Health (R01CA112314 [KY], F31CA192767 [CS], and P41GM103390 [KM]).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2020.00013/full#supplementary-material

Supplemental File S1 | Characterization of 1,3,4-O-Bu3ManNAc.

Supplemental File S2 | Primers for SAMG gene transcript quantification. Validated primer sequences for all 20 STs are provided here.

Supplemental File S3 | Sialoglycosite analysis. Raw mass spectrometry data from SPEG experiments for protein abundance and identified peptide sequences containing sialoglycosites are provided.

Supplemental File S4 | GO analysis. Overview of PANTHER (http://www.pantherdb.org/) generated GO results of SPEG analyzed samples.

Supplemental File S5 | Glycosite annotation. This file provides precise identification of glycosites highlighted in Figures 4, 5.

identification of glycan cell signatures. PLoS Computat. Biol. 9:e1002813. doi: 10.1371/journal.pcbi.1002813


adhesion to and migration across the endothelium. Glycobiology 14, 481–494. doi: 10.1093/glycob/cwh065


improves EPO glycan quality and production. Biotechnol. J. 14:e1800186. doi: 10.1002/biot.201800186


transduction and bAPP processing. Nature 407, 48–54. doi: 10.1038/35024009


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Saeui, Cho, Dharmarha, Nairn, Galizzi, Shah, Gowda, Park, Austin, Clarke, Cai, Buettner, Ariss, Moremen, Zhang and Yarema. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Carbohydrate Conjugates in Vaccine Developments

#### Shuyao Lang<sup>1,2</sup> and Xuefei Huang<sup>1,2,3</sup>\*

*<sup>1</sup> Department of Chemistry, Michigan State University, East Lansing, MI, United States, <sup>2</sup> Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, United States, <sup>3</sup> Department of Biomedical Engineering, Michigan State University, East Lansing, MI, United States*

Vaccines are powerful tools that can activate the immune system for protection against various diseases. As carbohydrates can play important roles in immune recognition, they have been widely applied in vaccine development. Carbohydrate antigens have been investigated in vaccines against various pathogenic microbes and cancer. Polysaccharides such as dextran and β-glucan can serve as smart vaccine carriers for efficient antigen delivery to immune cells. Some glycolipids, such as galactosylceramide and monophosphoryl lipid A, are strong immune stimulators, which have been studied as vaccine adjuvants. In this review, we focus on the current advances in applying carbohydrates as vaccine delivery carriers and adjuvants. We will discuss the examples that involve chemical modifications of the carbohydrates for effective antigen delivery, as well as covalent antigen-carbohydrate conjugates for enhanced immune responses.

#### Edited by:

*Zhongping Tan, Chinese Academy of Medical Sciences and Peking Union Medical College, China*

#### Reviewed by:

*Qian Wan, Huazhong University of Science and Technology, China Mare Cudic, Florida Atlantic University, United States*

#### \*Correspondence:

*Xuefei Huang huangxu2@msu.edu*

#### Specialty section:

*This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry*

Received: *07 February 2020* Accepted: *23 March 2020* Published: *15 April 2020*

#### Citation:

*Lang S and Huang X (2020) Carbohydrate Conjugates in Vaccine Developments. Front. Chem. 8:284. doi: 10.3389/fchem.2020.00284*

Keywords: adjuvant, carbohydrates, immune activation, vaccine development, glyco-conjugates

## INTRODUCTION

Carbohydrates are common surface molecules in living systems. With their rich structural diversity, carbohydrate molecules play important roles in cellular recognition and signaling, including immune recognition and activation (Rabinovich et al., 2012; Mahla et al., 2013; Varki, 2016). Most of the cell surface immune receptors, such as toll-like receptors (TLRs), NOD-like receptors (NLRs) and major histocompatibility complex class I and class II (MHC I and MHC II), are glycoproteins. Several essential receptors for immune cell activation, e.g., TLRs, NLRs, C-type lectins, and sialic acid-binding immunoglobulin-type lectins (Siglecs), can recognize glycan-containing ligands, including those expressed on the surface of many pathogenic microbes and cancer cells (Rabinovich et al., 2012).

Carbohydrates have been widely applied in vaccine development (Lesinski and Westerink, 2001). Vaccines containing bacterial polysaccharides have been commercialized as anti-bacterial vaccines (Roy, 2004; Astronomo and Burton, 2010), and many anti-cancer vaccines have been studied to target tumor-associated carbohydrate antigens (TACAs) (Guo and Wang, 2009; Astronomo and Burton, 2010; Yin and Huang, 2012; Feng et al., 2016). Carbohydrates are also attractive immune adjuvant candidates. Various carbohydrates such as β-glucan, mannan, and monophosphoryl lipid A (MPLA) can activate the immune system and induce T helper cell type 1 (Th1) immune responses (Suzuki et al., 2001; Stambas et al., 2002; Petrovsky and Cooper, 2011; Hu et al., 2013). They may complement Alum, an FDA-approved adjuvant for humans, which only induces T helper cell type 2 (Th2) immune responses. Carbohydrates can be readily metabolized or degraded in vivo and are less likely to generate long-term toxicity (Petrovsky and Cooper, 2011; Hu et al., 2015; Li and Wang, 2015). With their biocompatibility, low toxicity and ease of modification, carbohydrates have been studied as carriers for antigen delivery (Liu et al., 2008; Correia-Pinto et al., 2013; Zhang et al., 2013; Cordeiro et al., 2015; Pushpamalar et al., 2016), which can often induce immune cell targeting and provide self-adjuvanting activities for a successful vaccination.

Although natural carbohydrates can be applied as vaccine components directly (Mata-Haro et al., 2007; Arca et al., 2009; Mirza et al., 2017), in many cases chemical modification of carbohydrates is necessary for enhanced efficacy. One of the commonly used strategies in vaccine design is to prepare conjugates of antigens and/or adjuvants with the delivery carrier (Liu and Irvine, 2015). This can be beneficial in multiple ways, such as prolonged circulation and controlled release, size-induced lymph node targeting, better immune recognition through multivalency, and enhanced cell uptake and immune activation. In this review, we focus on recent vaccine designs applying carbohydrates as vaccine delivery carriers and adjuvants. We will discuss examples involving chemical modifications of the carbohydrates, especially the covalent conjugates of antigens with carbohydrate-based delivery carriers or adjuvants. Vaccines that contain carbohydrates and derivatives only as antigen components, or natural carbohydrates encapsulated/admixed with other vaccine components, have been reviewed elsewhere (Marzabadi and Franck, 2017; Colombo et al., 2018; Wei et al., 2018; Weyant et al., 2018; Jin et al., 2019; Micoli et al., 2019), and are not discussed here.

## ZWITTERIONIC POLYSACCHARIDES (ZPSs)

Many types of bacteria produce high molecular weight polysaccharides as their capsules. Polysaccharides have traditionally been considered T cell-independent antigens unless conjugated to proteins or lipids (Stein, 1992; Wei et al., 2018). Polysaccharides usually interact with polysaccharide-specific B cells, generating low-affinity IgM with little detectable IgG and little induction of T cell responses or immune memory (Abbas et al., 2000). However, a special group of polysaccharides, referred to as ZPSs, has been found to specifically induce MHC II-mediated T cell responses (Kalka-Moll et al., 2002; Mazmanian and Kasper, 2006). At least eight different ZPSs have been isolated from Bacteroides fragilis, Staphylococcus aureus, and Streptococcus pneumoniae type 1, of which PS A1 (isolated from Bacteroides fragilis) is the most studied ZPS so far (**Scheme 1A**) (Cobb and Kasper, 2005; Mazmanian and Kasper, 2006; Surana and Kasper, 2012; Nishat and Andreana, 2016).

TACAs are saccharides aberrantly expressed on the surfaces of multiple types of cancer cells (Heimburg-Molinaro et al., 2011). Like most types of carbohydrate antigens, TACAs induce only weak IgM responses when administered alone. For successful TACA vaccines, TACAs are commonly conjugated with strongly immunogenic proteins, such as bovine serum albumin (BSA), tetanus toxoid (TT), keyhole limpet hemocyanin (KLH), and virus-like particles, in order to generate high levels of IgG responses (Kaltgrad et al., 2007; Heimburg-Molinaro et al., 2011; Wu et al., 2018, 2019). However, these carrier proteins can result in carrier-induced suppression of antibody responses to the desired TACA due to high antibody responses to the carrier itself (Leclerc et al., 1990). Furthermore, some of the protein carriers tend to aggregate or suffer from stability issues (Dasgupta et al., 2014). ZPSs, as novel non-protein T cell-activating carriers, have been applied to cancer vaccine design by the Andreana group (De Silva et al., 2009). They first reported an "entirely carbohydrate vaccine" by conjugating a model TACA, Tn, with the most studied type of ZPS, PS A1. PS A1 was isolated from B. fragilis on a large scale, then subjected to selective oxidation to give aldehyde-functionalized PS A1, which reacted with aminooxy-functionalized Tn by oxime formation (**Scheme 1B**).

Immunization of mice with Tn-PS A1 resulted in a 200-fold increase of total antibody titer against Tn compared to the pre-immunized sera, while the antibody titers against the PS A1 backbone were modest. IgM and IgG3 were the major subtypes of antibodies generated (De Silva et al., 2009). Anti-sera of Tn-PS A1 immunized mice were found to react with a range of Tn-expressing cancer cell lines (MCF-7, MDA-231, Jurkat, JurkatTAg, Panc-1) (De Silva et al., 2012), while binding little to human peripheral blood mononuclear cells and human bone marrow cells, which served as the negative controls. The anti-PS A1 and anti-Tn-PS A1 sera showed completely different cytokine profiles. A high level of IL-17A, a pro-inflammatory factor promoting CD4<sup>+</sup> T cell proliferation, was detected in anti-Tn-PS A1 sera but not in anti-PS A1 sera. Besides the Tn antigen, other TACAs such as sialyl-Tn (STn) (Nishat and Andreana, 2016; Shi et al., 2016) and Thomsen-Friedenreich (Tf) (Trabbic et al., 2016) have been conjugated with PS A1 (**Scheme 1B**) and another ZPS, i.e., PS B (**Scheme 1C**) (Trabbic et al., 2016). The conjugates were able to induce moderate levels of both IgM and IgG antibodies against the target TACAs. Co-administration of an exogenous adjuvant such as Sigma adjuvant system (SAS) or TiterMax Gold (TMG) could enhance the levels of IgG antibodies. Post-immune sera bound to multiple types of cancer cells and were able to kill tumor cells via complement-dependent cytotoxicity while sparing normal cells. Furthermore, the STn-PS A1+SAS vaccine generated cellular immunity in addition to the humoral antibody response. The enzyme-linked immunospot (ELISpot) assay of splenocytes from mice immunized with STn-PS A1+SAS and pulsed with STn-PS A1 or BSM showed secretion of IFN-γ, clearly indicating a Th1-dominant cellular immune response.

These studies indicated that ZPSs are promising vaccine carriers/adjuvants for eliciting a selective immune response against TACAs. However, to date, the efficacy of protection in mouse tumor models by these entirely carbohydrate vaccines has not been reported. Further studies are needed to demonstrate the full potential of ZPSs in anti-cancer vaccine development.

## MPLA

MPLA is a derivative of lipopolysaccharide (LPS), a fraction isolated from cell walls of gram-negative bacteria such as Salmonella minnesota (Casella and Mitchell, 2008). Through a hydrolytic process reported by Edgar Ribi, LPS can be converted into an acylated di-glucosamine mixture widely known as monophosphoryl lipid A (Ribi et al., 1979; Qureshi et al., 1982; Casella and Mitchell, 2008). The majority of these species contain six acyl side chains, no polysaccharide chains and one phosphoryl group (**Scheme 2A**) (Evans et al., 2003; Casella and Mitchell, 2008). MPLA is about 0.1% as toxic as the parent LPS in rabbit pyrogenicity assays while maintaining its immune-stimulating activities (Qureshi et al., 1982; Evans et al., 2003). MPLA interacts with the immune system through TLR-4 and usually induces a Th1 or a blended Th1 and Th2 type immune response. With its low toxicity, MPLA has been applied successfully as the adjuvant in several vaccines in clinical trials (Evans et al., 2003; Cluff, 2009; Artiaga et al., 2016). Vaccines containing MPLA such as FENDrix (HBV vaccine), Cervarix (HPV vaccine), Melacine (melanoma vaccine), Pollinex Quattro (allergy vaccine), and Mosquirix (malaria vaccine for young children) have been registered for use in many countries (Artiaga et al., 2016). MPLA can also serve as a vaccine carrier and a built-in adjuvant when covalently conjugated with antigens. Herein we discuss examples of fully synthetic vaccines containing MPLA as the carrier (Wu and Guo, 2006; Wang et al., 2011; Zhou et al., 2014, 2015; Liao et al., 2016).

In 2011, the Guo lab first reported the covalent conjugation of a TACA, i.e., GM3, with MPLA as an anti-cancer vaccine (Wang et al., 2011). The liposomal vaccine was formed by sonication of a mixture of the GM3-MPLA conjugate, 1,2-distearoyl-sn-glycero-3-phosphocholine, and cholesterol. The resulting vaccine was injected subcutaneously into C57BL/6 mice in 4 weekly injections. A strong GM3-specific antibody response, which included high levels of both IgM and IgG3 antibodies, was observed by enzyme-linked immunosorbent assay (ELISA) in antisera on day 38. When a GM3 derivative, GM3NPhAc (Pan et al., 2005), was conjugated with MPLA using a similar strategy, a 3.8 times higher total antibody titer with a significant increase of IgG3 and IgG1 titers was observed in day 38 antisera compared to the GM3-MPLA group. The antisera from GM3NPhAc-MPLA immunized mice showed strong binding toward the cancer cell line SKMEL-28 by fluorescence activated cell sorting (FACS) analysis. The free phosphate and free hydroxyl groups on MPLA are important for immunostimulation, as the conjugates with benzyl-protected phosphate and hydroxyl groups showed no significant immune responses. The linker between MPLA and GM3/GM3NPhAc did not significantly influence the immunological properties of the resulting conjugates. Interestingly, addition of an external adjuvant such as TiterMax Gold to the vaccine formulation led to lower antibody titers relative to the GM3/GM3NPhAc-MPLA conjugates alone. This work indicated that fully synthetic MPLA-TACA conjugates can serve as possible "self-adjuvanting" cancer vaccine candidates.

The generality of the MPLA platform has been demonstrated in later studies. Three more MPLA analogs with different lipid chain lengths and linkages were synthesized, conjugated to another TACA derivative, STnNPhAc (Wu and Guo, 2006; Zhou et al., 2014), and formulated into liposomal vaccines. All STnNPhAc-MPLA conjugates successfully generated immune responses toward STnNPhAc in mice, and the conjugate with an 8-carbon lipid chain length and free -OH groups induced the highest antibody titers. As with the GM3-MPLA conjugate, the antibody titers decreased when the exogenous adjuvant TiterMax Gold was added to the formulation.

The optimized MPLA structure was conjugated with another TACA, globo H, and the immunological properties were compared with those of the globo H conjugate with KLH, a gold standard carrier commonly utilized in vaccine studies (Zhou et al., 2015). Significantly higher total antibody titers as well as IgG titers were observed in anti-sera from MPLA-globo H immunized mice compared to those immunized with KLH-globo H, suggesting the advantage of MPLA as the carrier. Both conjugates induced higher levels of pro-inflammatory cytokines including IL-4, IL-12, IFN-γ, and TNF-α in mice compared to the non-immunized group. Although the KLH-globo H group showed a higher level of cytokine secretion than the MPLA-globo H group, antisera from MPLA-globo H immunized mice showed stronger binding toward both MCF-7 and SKMEL-28 tumor cells by FACS analysis and induced more lysis of the human breast cancer cell line MCF-7. The enhanced cytokine secretion in the KLH conjugate group might come from the immune response against the protein carrier instead of the globo H antigen. This study indicated that MPLA may serve as a good alternative to the KLH protein carrier.

In addition to the aforementioned cancer vaccines, a group C meningitis vaccine has been reported by conjugating MPLA and α-2,9-oligosialic acids, including di-, tri-, tetra- and penta-sialic acid (Liao et al., 2016). The resulting liposomal vaccines with various MPLA-oligosialic acid conjugates induced strong immune responses as revealed by high total antibody titers. The major antibody subtype generated was IgG2b, indicating a T cell-dependent immunity. Both oligosialic acid chain length and MPLA structure influenced the immune responses. The shorter sialic acid chains (di- and tri-sialic acid) were overall better immunogens than the longer ones (tetra- and penta-sialic acid). However, the antibodies induced by the short sialic acids were more restricted to short sialic acid chains. Conjugates containing tri-, tetra-, or penta-sialic acid showed stronger binding toward the group C meningococcal capsular polysaccharide than the conjugate containing di-sialic acid. Consistent with the cancer vaccine studies, addition of external adjuvants such as CFA, alum and TiterMax Gold did not lead to higher antibody responses. All conjugates showed protective effects against group C meningococcal challenges in mice, which suggested the possibility of applying the MPLA platform to anti-microbial vaccine development.

In the aforementioned MPLA-based vaccine designs, the antigens were all conjugated with MPLA through the 1-O-position instead of the 6′-O-position, where the polysaccharide chain is attached to LPS in nature (Wang et al., 2017). Guo and Gu further studied the influence of different antigen linkage positions on immunological properties (**Schemes 2B,C**), by linking a tetrasaccharide antigen from lipoarabinomannan (LAM), a Mycobacterium tuberculosis cell surface lipopolysaccharide, to either the 1- or 6′-position of MPLA. As the ester linkage at the 6′-position was not stable, the 6′-O was first substituted with an amino group linker in order to form a more stable amide bond. The resulting conjugates were evaluated in vivo. Both conjugates showed significantly enhanced antibody titers against LAM compared to the simple mixture of tetrasaccharide and MPLA, which indicated the importance of covalent conjugation between the antigen and MPLA. As revealed by ELISA, the antigen conjugated to MPLA through the 6′-N position induced significantly higher IgG titers than the corresponding conjugate through the 1-O position. The method of vaccine administration also influenced the immune response outcome. Vaccine given through intraperitoneal injection induced a 4–5 times higher antibody titer compared to the subcutaneous route. This study suggested that conjugation through the 6′-position of MPLA could be a superior strategy for MPLA-based vaccine design.

As a low-toxicity TLR4 stimulator, MPLA has been widely applied in many vaccines as an add-in adjuvant (Artiaga et al., 2016). Guo's work demonstrated the potential of MPLA as a good "self-adjuvanting" vaccine carrier. Liposomal vaccines containing MPLA-antigen conjugates can induce strong immune responses comparable to those obtained with the KLH protein carrier. The MPLA platform showed good generality for several carbohydrate antigens including TACAs and bacterial glycans. However, this platform is not compatible with many external adjuvants, and the antigen conjugation site can significantly influence the outcome of vaccination.

## MANNAN

Mannan, a polysaccharide derived from the yeast cell wall, contains a mostly β-1,4-linked mannose backbone with a small number of α-1,6-linked glucose and galactose side chain residues (Moreira and Filho, 2008). In addition, mannan contains around 5% protein (**Scheme 3A**) (Nelson et al., 1991; Tzianabos, 2000). As an important component of the fungal cell wall, mannan has been widely targeted in carbohydrate-based vaccines for candidiasis (Han and Rhew, 2012; Cassone, 2013; Johnson and Bundle, 2013). Observations in patients suffering from candidiasis showed that mannan has immunomodulatory functions (Domer et al., 1986; Wang et al., 1998). Mannan can be recognized through binding with mannose recognition lectins present on macrophages and other immune cells, which activates the host immune system via a non-self-recognition mechanism (Vasta et al., 1999; Gadjeva et al., 2004). The recognition initiates a set of signal transduction events leading to cytokine secretion, complement activation and CD8<sup>+</sup> T cell activation (Garner et al., 1990; Garner and Hudson, 1996; Tzianabos, 2000). In this section, we focus on vaccines based on mannan carrier-antigen complexes/conjugates, including mannan-mucin 1 (MUC1) fusion protein conjugates for tumor therapy, mannan-DNA vaccines and mannan-allergy vaccines.

The investigation of mannan's potential as a vaccine carrier started in the 1990s. The Steward group conjugated mannan and dextran to the hepatitis B virus (HBV) 139–147 peptide and studied the immune response in mice toward these two constructs (Okawa et al., 1992). The mannan carrier successfully induced high IgG titers against the HBV 139–147 peptide without additional adjuvants, while the corresponding dextran conjugate failed to elicit an immune response. Although some previous studies showed that mannan could suppress immunity (Garner et al., 1990; Podzorski et al., 1990), this study opened the door for using mannan as a "self-adjuvanting" vaccine carrier to enhance antibody production.

### Mannan-MUC1 Fusion Protein Conjugation (M-FP)

Mucins are heavily glycosylated proteins expressed on the cell surface. MUC1 is a prototypical mucin, which has been found to be over-expressed on a wide range of tumor cells. Furthermore, tumor-associated MUC1 has drastically shorter O-glycans in the tandem repeat region of MUC1, made of 20-amino acid residues (APDTRPAPGSTAPPAHGVTS) (Gendler et al., 1990), which leads to the exposure of the protein core, rendering it a highly attractive antigen for anti-cancer immunotherapy (Gendler et al., 1988; Hanisch et al., 1989).

MUC1 by itself is only weakly immunogenic in humans partly due to its self-antigen nature. Immunization of mice with MUC1 fusion protein containing 5 of the tandem repeats induced antibodies but with little measurable cytotoxic T cell (CTL) responses and poor tumor protection (Apostolopoulos et al., 1994). To enhance anti-MUC1 immunity, MUC1 has been conjugated with mannan (Apostolopoulos et al., 1996).

Two strategies (oxidative or reductive, **Scheme 3B**) for linking mannan to MUC1 have been investigated, which induced drastically different types of immune responses (Apostolopoulos et al., 1995b). Human MUC1 FP was conjugated to mannan oxidized with sodium periodate to provide the oxidative mannan-MUC1 fusion protein conjugate (ox-M-FP). The reductive mannan-MUC1 fusion protein conjugate (red-M-FP) was obtained by treating ox-M-FP with sodium borohydride. BALB/c mice were immunized with either ox- or red-M-FP, then challenged with MUC1<sup>+</sup> 3T3 tumor cells. The red-M-FP generated Th2 type immune responses and induced antibody secretion. However, it had little tumor-protective effect. In contrast, the ox-M-FP generated Th1 type responses and induced a high tumor-specific CTL precursor frequency, providing protection in a mouse tumor model. The CTL response elicited by ox-M-FP was MHC I restricted (Apostolopoulos et al., 1995a), and the CTL precursor frequency could be further enhanced by combination with a chemotherapeutic drug, i.e., cyclophosphamide (Apostolopoulos et al., 1998). The detailed mechanism of the entry of ox-M-FP into the MHC I pathway has also been studied (Apostolopoulos et al., 2000). While both aldehyde and Schiff base groups are present on ox-M-FP, the aldehyde groups but not the Schiff base groups were found to be important for antigen presentation through the MHC I pathway.

The ox-M-FP has been evaluated in human clinical trials. In phase I studies, no significant toxicities or autoimmunities were noted among >100 patients with advanced melanoma. However, in contrast to preclinical mouse studies, the patients generated mainly antibodies rather than cellular immunity against MUC1 (Karanikas et al., 1997, 2000, 2001). The route of ox-M-FP administration influenced antibody generation in patients. Intraperitoneal injections were significantly more effective than intramuscular injections (Karanikas et al., 2001). A pilot phase III study of ox-M-FP has been performed in early-stage breast cancer (Apostolopoulos et al., 2006). Although vaccine-induced antibody and weak cellular immunity responses showed little benefit in advanced disease, ox-M-FP significantly improved survival time compared to the placebo control group in early-stage cancer patients (Apostolopoulos et al., 2006). In a 12–15 year follow-up study, the recurrence rate of the ox-M-FP group was much lower than that of the placebo group (12.5 vs. 60%) (Vassilaros et al., 2013). The mean time to recurrence in the ox-M-FP group was 52.2 months longer than in the placebo group (118 vs. 65.8 months) (Vassilaros et al., 2013). In another study, autologous dendritic cells were chosen as the vaccine carrier to maximize the cellular immunity in patients (Loveland et al., 2006). The phase I/II clinical trial showed that ox-M-FP loaded monocyte-derived dendritic cells were well tolerated for immunotherapy, and vaccine-specific IFN-γ secreting CD4<sup>+</sup> and CD8<sup>+</sup> T cells were successfully induced in all patients (Loveland et al., 2006).

### Mannan as Carrier for DNA Vaccines

Oxidized and reduced mannan (ox-Man and red-Man, respectively) have been studied as DNA vaccine carriers. The Apostolopoulos and Pietersz groups conjugated ox-Man and red-Man with the polycationic linker poly-L-lysine (PLL) and then complexed them with DNA corresponding to the protein ovalbumin (OVA) (Tang et al., 2007). The conjugation with mannan reduced the cytotoxicity of PLL, and the Man-PLL-OVA DNA complex successfully induced immune responses against OVA. At a lower dose (10 µg), red-Man-PLL-OVA DNA mainly induced CD4<sup>+</sup> T cell responses, while ox-Man-PLL-OVA DNA induced CD8<sup>+</sup> T cell responses. At a higher immunization dose (50 µg), both the red-Man and ox-Man-PLL-OVA DNA complexes generated CD4<sup>+</sup> and CD8<sup>+</sup> T cell responses. Both complexes induced good tumor protection against the OVA-expressing EG.7 tumor at either the low (10 µg) or the high (50 µg) immunization dose.

With the success of the OVA DNA vaccine, Apostolopoulos and coworkers further studied a MUC1 DNA vaccine by preparing the Man-PLL-DNA complex (Tang et al., 2008). The resulting ox-Man-PLL-MUC1 DNA generated immune responses in C57BL/6 mice and protected mice in tumor challenge with a low immunization dose. In addition, the vaccines generated strong immune responses in MUC1 transgenic mice, which are tolerant toward human MUC1 as in humans. Similar to previous reports, the ox-Man-PLL-MUC1 DNA mainly generated a Th1 response while red-Man-PLL-MUC1 DNA generated a Th2 dominant response. A more detailed study showed the differences between DNA alone and the Man-PLL-DNA complex upon immunization (Tang et al., 2009). Man-PLL protected the cargo DNA against DNase digestion. Ox-Man and red-Man induced different cytokine secretion profiles. Compared to DNA alone, ox-Man induced higher levels of IL-2, IL-12, IFN-γ, and TNF-α while red-Man induced only IL-2. The Man-PLL-DNA complex was able to stimulate DC maturation through a TLR2 but not a TLR4 dependent pathway.

### Mannan as the Carrier for Allergy Vaccine

Allergen-specific immunotherapy has attracted researchers' attention as it may provide a long-lasting relief from allergy for the patients. Mannan-allergen conjugates have been studied as potential anti-allergy vaccines (Benito-Villalvilla et al., 2018).

The Weiss lab studied the conjugation between oxidized mannan and model allergens, OVA protein and papain, for vaccination targeting dendritic cells (Weinberger et al., 2013). The mannan backbone here served not only as a targeting molecule toward the C-type lectin receptor (a receptor expressed on DCs), but also as a platform to induce cross-linking for multimerization of allergen proteins for immunogenicity enhancement (Chackerian et al., 2002). Sodium periodate was used to generate aldehyde groups on the mannan backbone for allergen conjugation by oxidative cleavage between C2 and C3. The conjugation efficiency depended on antigen properties as well as the degree of oxidation. The C-type lectin binding property of mannan was not disturbed after conjugation with antigen proteins when the oxidation degree was carefully controlled. The mannan-antigen conjugate significantly increased the number of antigen-presenting DCs in lymph nodes in vivo. Immunization successfully reduced the enzymatic activity or IgE binding capacity of antigen proteins in vaccinated mice. Antibody class-switching from the allergy-promoting IgE subtype to the nonallergic IgG1 subtype was observed, indicating an anti-allergy therapeutic effect.

Palomares et al. used another strategy to conjugate allergen proteins to non-oxidized mannan by a simple treatment with glutaraldehyde (**Scheme 3C**) (Manzano et al., 2016). The conjugate took advantage of the trace amount of mannan protein on the mannan backbone. Allergens were polymerized and linked to the mannan protein through a glutaryl diimine linker, and the resulting conjugate significantly reduced IgE binding activity against the allergens. Later Palomares et al. applied this conjugation method to prepare a P. pratense pollen-non-oxidized mannan conjugate (PM) (Sirvent et al., 2016). The PM was hypoallergenic with low IgE binding in vitro and induced fewer mast cells under the skin in an in vivo skin-prick test. Immunization of rabbits with PM induced blocking antibodies against IgE binding. Compared to the free allergen or the polymerized allergen, the PM was captured more effectively by human DCs. PM induced greater secretion of the anti-inflammatory cytokines IL-6 and IL-10 in human DCs and also promoted Foxp3+ Treg generation through PD-L1 in human subjects, which indicated a down-regulation of immune responses toward the allergen.

A drawback in using oxidized mannan is that the mannose ring in the mannan backbone is partially opened, which may impair the capture of PM by DCs in mice and human subjects (Sirvent et al., 2016). This can be overcome with non-oxidized mannan.

Another important consideration in mannan-based vaccines is the combination with an external adjuvant. In a recent study, the Palomares lab reported that PM-induced anti-allergy Foxp3+ Treg generation can be inhibited when PM is co-administered with Alum (Benito-Villalvilla et al., 2019). This was because Alum suppressed the increased production of lactate and consumption of glucose induced by PM in human DCs by altering the glucose metabolic fate in mitochondria and inhibiting mammalian target of rapamycin (mTOR).

## α-GALACTOSYLCERAMIDE (α-GALCER)

The presentation of antigen fragments on the surface of antigen presenting cells (APCs) is an important step for activating the adaptive immune system. Besides the commonly known MHC I and MHC II, the CD1 family is a third subset of antigen presenting molecules (Zajonc, 2016). There are 4 types of CD1 (CD1a-CD1d) capable of binding and presenting glycolipids to CD1-restricted T cells. A subtype of T cells, invariant natural killer T (iNKT) cells, is defined as a T cell lineage expressing NK cell receptors and an additional invariant CD1d-restricted αβ-T cell receptor (TCR) (Bendelac et al., 2007). After activation through binding of its TCR to glycolipid-presenting CD1d on APCs, iNKT cells can secrete various cytokines, which build a bridge between the innate and the adaptive immune system. iNKT cells can initiate a "T dependent (TD) type II response," which needs no participation of CD4<sup>+</sup> T cells. It has been reported that iNKT cells play a role in protection against pathogens as well as cancer (Kawano et al., 1997; Metelitsa et al., 2001; Merle et al., 2015). The first iNKT activator, α-GalCer (KRN7000, **Scheme 4A**), was a synthetic compound discovered from a class of glycolipids originally isolated from marine sponges (Natori et al., 1993; Morita et al., 1995; Shimosaka, 2002). Since then, hundreds of analogs have been synthesized by varying the amide side chain length and functional groups, substitutions at the galactose 6-position, the galactose-ceramide linker, etc. α-GalCer is by far the most explored structure, and the C-glycoside analog and 7DW8-5, which bears an aryl side chain, are also attractive structures for immune studies. Many excellent reviews about α-GalCer and its analogs have been published (Carreño et al., 2014; Marzabadi and Franck, 2017; Waldowska et al., 2017; Zhang et al., 2019).

α-GalCer has been applied as an adjuvant in many studies (Mattarollo and Smyth, 2013; Faveeuw and Trottein, 2014; Artiaga et al., 2016; Liu and Guo, 2017; Fujii et al., 2018; Sainz et al., 2018; Yamashita et al., 2018; Zhang et al., 2019), including vaccines against cancer, influenza, and malaria. To improve the delivery efficiency of α-GalCer and thereby enhance the activation of iNKT cells, various delivery systems have been designed, such as liposomes, poly(lactic-co-glycolic acid) (PLGA) particles and bacteriophage particles (Macho-Fernandez et al., 2014; Dölen et al., 2016; Ghinnagow et al., 2017a,b; Sartorius et al., 2018). When the antigen and α-GalCer are covalently conjugated, the immune response could be stronger because the antigen and the adjuvant are delivered simultaneously to the same immune cell. Here we focus on examples of covalent conjugate vaccines of α-GalCer.

The first examples of covalent conjugation of the antigen and α-GalCer were reported in 2014 (Anderson et al., 2014; Cavallari et al., 2014). The Painter and Hermans labs developed self-adjuvanting vaccines that suppress allergy by conjugating the antigen peptide to α-GalCer through a cleavable linker (**Scheme 4B**) (Anderson et al., 2014). Starting from α-GalCer, an N-to-O acyl migration occurred under acidic conditions, which produced an α-GalCer prodrug with a free amino group for further functionalization. The amino group was then capped with an esterase-labile acyloxymethyl carbamate group. The resulting ketone group could be functionalized with an aminooxy peptide containing the protease-cleavable FFRK sequence followed by the desired antigen peptide. Under physiological conditions, the FFRK linker would be cleaved to release the desired antigen while the acyloxymethyl carbamate group would be degraded by an esterase to release the α-GalCer prodrug. After a reverse O-to-N acyl migration, the active adjuvant α-GalCer would be formed in situ. In this study, two model antigen peptides, SIINFEKL and KAVYNFATM, were selected. Both peptide-GalCer conjugates stimulated greater CD8<sup>+</sup> T cell proliferation compared to nonconjugated mixtures containing the same amount of peptide and α-GalCer. By intracellular staining, large amounts of IFN-γ and TNF-α were detected, while the allergy-related cytokine IL-4 was not detectable. The conjugates induced antigen-specific cytotoxic responses in immunized animals, while the admixture of peptide and α-GalCer failed to do so. This strong activity was CD4<sup>+</sup> T cell independent, and the covalent conjugation was shown to be critical. The SIINFEKL-α-GalCer conjugate strongly reduced inflammatory responses in an allergy animal model sensitized by the OVA protein. In contrast, the mixture of peptide and α-GalCer did not reduce the allergic response.

SCHEME 4 | (A) Structure of α-GalCer. (B) Examples of antigen-α-GalCer prodrug conjugates (conjugate through α-GalCer lipid chain). (C) Examples of antigen-α-GalCer conjugate through 6-OH.

At about the same time, the De Libero lab developed a semisynthetic vaccine against S. pneumoniae by conjugating S. pneumoniae serotype 4 capsular polysaccharides (CPS 4) to the 6 position of α-GalCer through a cleavable linker (**Scheme 4C**) (Cavallari et al., 2014). In contrast to Painter and Herman's strategy, the immunogenic lipid tail was kept intact. Instead, an amino moiety was connected to the 6-OH of α-GalCer and then conjugated with CPS 4 via cyanogen bromide chemistry. The conjugates were usually a mixture of isoureas, N-substituted imidocarbonates, and N-substituted carbamates, which could release the original CPS 4 under acidic conditions when taken up by APCs. The CPS 4-GalCer conjugate generated polysaccharide-specific IgM, IgG1, IgG2a, IgG2b, and IgG3 antibody responses in mice, whereas the mixture of CPS 4 and α-GalCer, and CPS 4 alone, generated only weak IgM responses with no IgG. The conjugate induced germinal centers, and the resulting antibodies induced S. pneumoniae opsonization. Animals vaccinated with the CPS 4-GalCer conjugate exhibited a significant survival advantage (89%) in a bacterial challenge model compared to animals receiving CPS alone (25%). By FACS analysis of the splenocytes, CPS 4-GalCer, but not the mixture of CPS 4 and α-GalCer or CPS 4 alone, induced antibody isotype switching to IgG and the generation of memory B cells and antibody-secreting plasma cells. Experiments on CD1d<sup>−/−</sup> mice indicated that iNKT cells were required to establish effective protection against S. pneumoniae.

Both conjugation methods, i.e., conjugating the antigen to the lipid tail or to the 6-OH on galactose through a cleavable linker, proved successful. The conjugated vaccines have been demonstrated to provide stronger immune stimulation than a simple mixture of antigen and adjuvant. Several more examples using either conjugation method have been published since then (**Schemes 4B,C**).

Painter and Herman continued to study conjugation linkers and designed several possible linkage methods to covalently conjugate the antigen with α-GalCer (**Scheme 4C**) (Anderson et al., 2015; Compton et al., 2015). They first investigated four different linkers for attaching short peptide antigens to the α-GalCer lipid tail (Anderson et al., 2015). Similar to their previous work (Anderson et al., 2014), an N to O migration of the acyl group on α-GalCer was designed, resulting in an α-GalCer prodrug with a free amino group. The amino group was further capped with an esterase-sensitive acyloxymethyl carbamate linker containing a ketone (linker **1**) or an azido group (linker **3**), or with a protease-sensitive valine-citrulline-p-amino-benzyl (VC-PAB) carbamate linker containing a ketone (linker **2**) or an azido group (linker **4**). Short peptide antigens with a protease-cleavable FFRK sequence were conjugated to the four linkers through oxime formation (for linkers **1** and **2**) or copper-catalyzed azide-alkyne cycloaddition (CuAAC) (for linkers **3** and **4**). All four conjugates showed similar levels of NKT cell activation in a melanoma challenge model, and all showed improved protection compared to unconjugated mixtures. Among the four choices, linker **4** provided better stability at physiological pH and simplified the synthesis of the peptide payload, and was therefore considered the lead compound for further development.

Painter and Turner applied the aforementioned conjugation strategy to the development of an influenza vaccine. They linked a synthetic long peptide (SLP), containing the immunogenic sequence OVA<sup>257</sup> (amino acid sequence: SIINFEKL), a known CD4<sup>+</sup> T cell epitope OVA<sup>323</sup> (amino acid sequence: ISQAVHAAHAEINEAGR), and a protease cleavage sequence FFRK, to the α-GalCer prodrug bearing the VC-PAB linker through CuAAC (linker **4**) or strain-promoted alkyne-azide cycloaddition (SPAAC) (linker **5**) (Anderson et al., 2017). Although the two conjugation methods introduced slightly different linker structures in the final α-GalCer prodrug-SLP conjugates, the two vaccines primed NKT cells similarly in vivo. As the SPAAC strategy provided a higher yield with fewer side-products, this form of the vaccine was subjected to further studies. The α-GalCer prodrug-SLP conjugate vaccine induced CD8<sup>+</sup> memory T cells at a level similar to that of the A/PR8-OVA challenged group, which was known to induce an OVA-specific memory response. The memory T cell response lasted for at least 60 days after immunization. α-GalCer alone, SLP alone, or the α-GalCer + SLP mixture failed to induce such a memory T cell response. An in vivo challenge study using OVA-modified influenza virus showed that mice vaccinated with the α-GalCer prodrug-SLP conjugates had faster viral clearance and body weight recovery than mice receiving α-GalCer alone or the α-GalCer + SLP mixture, suggesting that vaccination generated protective immunity.

Weinkove and Painter reported an α-GalCer prodrug conjugated with pp65<sub>495–503</sub>, an HLA-A*02-restricted peptide from the cytomegalovirus (CMV) pp65 protein, through the VC-PAB linker using CuAAC chemistry (linker **4**) (Speir et al., 2017). The resulting conjugate activated human DCs and CD8<sup>+</sup> T cells, in addition to NKT cells, in vitro. After incubating human peripheral blood mononuclear cells (PBMCs) with α-GalCer or the α-GalCer-pp65<sub>495–503</sub> conjugate, increased NKT proliferation and IFN-γ secretion were observed. Human DCs could be activated by α-GalCer or the α-GalCer-pp65<sub>495–503</sub> conjugate only when cocultured with NKT cells. The activation of NKT cells and DCs could be blocked by anti-CD1d antibodies, which suggested that α-GalCer-pp65<sub>495–503</sub> activates human immune cells through the CD1d-dependent pathway. The activation of human CD8<sup>+</sup> T cells also required NKT cells. The conjugation between the antigen peptide and α-GalCer was crucial for CD8<sup>+</sup> T cell activation, as the admixed components failed to induce the expression of the T cell activation marker CD137. An oncogenic viral antigen, HPV16 E7<sub>49–57</sub>, was conjugated to the α-GalCer prodrug through the same strategy, and the resulting conjugate vaccine showed a significant antitumor response against an HPV16 E7-expressing tumor in a mouse model, which further supported the effectiveness of the α-GalCer prodrug-peptide antigen conjugate strategy.

Painter and Herman's labs also investigated the conjugation of antigen to the 6-OH position of α-GalCer through a disulfide bond or a maleimido linker (Compton et al., 2015). 6″-Deoxy-6″-thiol-α-GalCer was first synthesized and was shown to have bioactivity similar to that of α-GalCer. The thiol group could be trapped with 2,2′-dithiodipyridine and then reacted with a Cys-peptide to form a disulfide bond, or reacted with N-propargyl bromomaleimide followed by CuAAC for conjugation with the peptide. Both conjugates induced a stronger peptide-specific cytotoxic response in vivo relative to a mixture of α-GalCer and the peptide.

Liu and Guo designed a fully synthetic cancer vaccine candidate by linking the tumor-associated STn antigen to α-GalCer through a covalent linker at the 6-OH position (Yin et al., 2017). A previous study showed that PEGylation at the 6-OH position of α-GalCer through an amide linker retained the specificity for the CD1d receptor and the ability to activate iNKT cells (Ebensen et al., 2007). Therefore, the 6 position of α-GalCer was selected as the site of conjugation, via an amide bond, to a non-cleavable linker consisting of a non-branched aliphatic chain that connects to the STn antigen. STn-β-GalCer was also synthesized as a weak iNKT activator. The synthetic STn-α-GalCer and STn-β-GalCer were each mixed with other lipids to form liposomal vaccines. Based on ELISA results, although the two vaccines generated similar serum IgM titers against STn in BALB/c mice, STn-α-GalCer induced 23-fold higher IgG titers than STn-β-GalCer. Subtype analysis indicated that the IgG antibodies were primarily IgG1 and IgG3, which are strong inducers of complement-dependent cytotoxicity (CDC) and antibody-dependent cell-mediated cytotoxicity (ADCC). In this case, α-GalCer served as a liposomal carrier as well as an adjuvant for iNKT cell activation. In a later study from the Seeberger lab, the liposomal form of Tn-α-GalCer conjugates showed effective activation of anti-Tn immunity in vivo (Broecker et al., 2018). Compared to Tn-CRM<sub>197</sub>, a protein carrier-based vaccine, the anti-Tn IgG response generated by the liposomal form of the Tn-α-GalCer conjugate was more consistent and more specific. Furthermore, the liposomal form of Tn-α-GalCer conjugates also generated a long-lasting memory response against Tn, while Tn-CRM<sub>197</sub> only induced a memory response to the carrier protein in some of the mice but not to the glycan antigen.
Liposomes formed by Tn-lipid conjugate without the α-Gal structure could also generate anti-Tn IgG, but with a lower magnitude of response compared to Tn-α-GalCer liposomes. The size of the liposomes was shown to be crucial in this case. While the ∼400 nm sized liposomes promoted Th1-type IgG2a antibodies, the smaller particles (∼120 nm) mainly induced the production of Th2-type IgG1 antibodies. This report indicated the multivalent display of antigens by the antigen-α-GalCer conjugated liposome can be beneficial.

The aforementioned examples have shown the promise of antigen-α-GalCer conjugates as vaccines. The conjugates have been reported to have a stronger protective effect than mixtures of the antigen and α-GalCer. Both short peptide and carbohydrate antigens can be used, and multiple conjugation methods have been developed, which provides flexibility in vaccine design. The liposomal form of antigen-α-GalCer covalent conjugates can further help induce strong and tunable immune responses.

### MODIFIED DEXTRAN

Dextran is a branched natural polysaccharide with a backbone of α-1,6-linked glucose units and α-1,3-linked branches. It is a biocompatible, biodegradable, and FDA-approved material. Dextran is water soluble and is easy to modify with other functional groups to achieve environment-responsive properties. Although crystallized dextran particles can serve as vaccine delivery vehicles (Schröder and Ståhl, 1984; Shen et al., 2013), most studies have focused on modified dextran as a candidate for vaccine design. In this section, we discuss only modified dextran.

#### Acetalated Dextran

Acetalated dextran (Ac-Dex) is a pH-responsive material first reported in 2008 by the Fréchet group (Bachelder et al., 2008). It can be synthesized easily from dextran through a single-step acetal formation with 2-methoxypropene. In contrast to dextran, Ac-Dex is not soluble in water and can form microparticles using an emulsion procedure. Under acidic conditions, the acetals are hydrolyzed to unmask the parent water-soluble dextran structure, thereby breaking up the hydrophobic microparticles. In their study, a model hydrophilic payload, OVA, was encapsulated inside Ac-Dex particles via double emulsion with a loading of 3.6 wt%. At pH = 7.4, the particles were stable, while in pH = 5.5 buffers, the particles degraded within 24 h. A T cell activation assay showed that OVA-loaded Ac-Dex particles significantly increased MHC I presentation of SIINFEKL on RAW macrophages compared to the free OVA group. The Huang group applied Ac-Dex to deliver foreign antigens for anti-tumor therapy (Kavunja et al., 2017). The SIINFEKL-loaded Ac-Dex particles significantly enhanced the efficiency with which SIINFEKL reached tumor tissue and successfully slowed tumor growth. An in vivo CTL assay indicated that the OVA-loaded Ac-Dex induced CTL responses without additional adjuvants (Kavunja et al., 2017). With their ability to enhance CTL activation, these Ac-Dex particles were highly efficacious in protecting mice from tumor-induced death.

A great advantage of Ac-Dex over traditional PLGA is the ease of tuning the rate of degradation, which makes it possible to optimize the payload release rate for a specific application (Broaders et al., 2009; Chen et al., 2016). During the acetal modification, two types of acetal are formed on dextran: cyclic acetals, which hydrolyze more slowly, and acyclic acetals, which degrade faster (**Scheme 5**). As the kinetic product, the acyclic acetal, forms before the more stable cyclic acetals, the ratio of cyclic/acyclic acetal on the dextran backbone can be tuned by reaction time. The ratio of cyclic and acyclic acetal in the final product dictates the degradation behavior of the Ac-Dex particles. By controlling the reaction time from 2 to 1,500 min, a set of Ac-Dex materials with different ratios of cyclic/acyclic acetal was prepared (Broaders et al., 2009). The degradation half-life at pH = 5.5 was tuned from minutes to days. The degradation rates at pH = 7.4 were usually 230–280 times slower than those at pH = 5.5, which made the particles stable enough for delivery applications. The degradation half-life correlated well with cyclic acetal content, which indicated that the hydrolysis of cyclic acetals may be the rate-limiting step in particle degradation. The molecular weight of dextran also influenced the degradation of particles (Chen et al., 2016). With similar cyclic acetal coverage, the Ac-Dex with higher molecular weight degraded faster.
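To give a feel for what a 230–280-fold difference in degradation rate means in practice, the following minimal sketch assumes simple first-order degradation kinetics. This is an assumption made for illustration only; it is not a kinetic model stated in the cited studies, and the function names are hypothetical. Under this assumption, a rate that is k times slower corresponds to a half-life that is k times longer. The 1.7 h half-life at pH 5.5 used in the example is one of the values reported for Ac-Dex particles in this section.

```python
def fraction_remaining(t_hours, half_life_hours):
    """Fraction of material left after t_hours, assuming first-order decay."""
    return 0.5 ** (t_hours / half_life_hours)


def half_life_at_neutral_ph(half_life_ph55_hours, slowdown):
    """Half-life at pH 7.4 if the rate is `slowdown` times slower than at pH 5.5.

    Assumes first-order kinetics, so the half-life scales linearly
    with the slowdown factor.
    """
    return half_life_ph55_hours * slowdown


if __name__ == "__main__":
    t_half_acidic = 1.7  # h, half-life at pH 5.5 (value taken from the text)
    for slowdown in (230, 280):  # range of rate ratios quoted in the text
        t_half_neutral = half_life_at_neutral_ph(t_half_acidic, slowdown)
        left_after_day = fraction_remaining(24, t_half_neutral)
        print(
            f"slowdown {slowdown}x: half-life at pH 7.4 = {t_half_neutral:.0f} h, "
            f"fraction intact after 24 h = {left_after_day:.3f}"
        )
```

Under these assumptions, a particle that is half degraded in under 2 h in an acidic compartment would remain largely intact for a full day at neutral pH, which is consistent with the qualitative statement that the material is stable enough for delivery.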

The degradation rate can be important for both MHC I and MHC II antigen presentation (Broaders et al., 2009). OVA-loaded Ac-Dex particles with degradation half-lives from 0.27 to 16 h were prepared and incubated with bone marrow dendritic cells (BMDCs), followed by T cell activation assays to determine MHC I and MHC II presentation of OVA-derived epitopes. The particles with a 1.7 h degradation half-life led to optimal MHC I or MHC II presentation of OVA-derived epitopes compared to particles with either longer or shorter degradation half-lives. These optimal particles performed an order of magnitude better than traditional PLGA or iron oxide particles. Interestingly, the Ac-Dex particles with a 1.7 h degradation half-life did not require the transporter associated with antigen processing (TAP), a protein involved in the most common MHC I antigen loading mechanism, for antigen presentation, while the particles with a 16 h degradation half-life required TAP for antigen loading (Broaders et al., 2009). The difference might be attributed to differences in the surface chemistry of the two materials arising from their different degradation rates. A recent in vivo study (Chen et al., 2018b) showed that OVA-loaded Ac-Dex particles with 20% cyclic acetal coverage (CAC) generated a stronger antibody response throughout the experiment than particles with 40 and 60% CAC. Notably, when the particles were used for adjuvant delivery, the immune-activating behavior was different. The adjuvant-loaded Ac-Dex particles with 20% CAC induced stronger antibody and cytokine responses at an early time point (day 14), while those with 40 and 60% CAC induced greater antibody titers at later time points (days 28 and 42). This study suggested the importance of delivering antigen and adjuvant separately in individually optimized Ac-Dex particles.

One possible limitation of Ac-Dex is that one of the products released upon degradation is methanol, which is known to be highly toxic. Therefore, 2-ethoxypropene was explored as an alternative to 2-methoxypropene for functionalizing dextran (Kauffman et al., 2012). No significant differences in cell viability were observed when cells were incubated with the acetalated dextran formed with 2-ethoxypropene or with Ac-Dex at concentrations below 1 mg/ml. Further toxicity studies are needed to determine whether the new acetalated dextran improves biocompatibility at higher concentrations. To date, most studies have used Ac-Dex as the carrier material.

Ac-Dex has been used for vaccine adjuvant delivery since 2010 (Bachelder et al., 2010) with several types of TLR agonists. Keane-Myers and co-workers first studied Ac-Dex microparticles in vitro as a delivery platform for imiquimod, a hydrophobic TLR7/8 agonist used as an adjuvant. Imiquimod-loaded Ac-Dex microparticles were prepared with 4 wt% loading and 100% loading efficiency. After incubation with imiquimod-loaded particles, the gene expression and secretion of the inflammatory cytokines IL-1β, IL-6, and TNF-α, the expression of two activation markers, PD-L1 and iNOS, and the production of the downstream product NO were all significantly increased in two macrophage cell lines, MH-S and RAW 264.7. The particles also significantly increased the production of IL-1β, IL-6, IL-12p70, and MIP-1α in BMDCs. Compared to free imiquimod, the encapsulated imiquimod induced higher amounts of cytokines at lower concentrations. Empty Ac-Dex did not induce detectable increases in inflammatory cytokines or activation markers. This in vitro study showed the promise of Ac-Dex as a vaccine adjuvant carrier for achieving good immune stimulation.
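The loading and efficiency figures quoted for particle formulations are related by simple arithmetic. The sketch below assumes the commonly used definitions, which are not spelled out in the text: loading (wt%) is the payload mass as a percentage of total particle mass, and loading (encapsulation) efficiency is the measured loading as a percentage of the theoretical loading implied by the feed. The function names are hypothetical and chosen for illustration.

```python
def implied_feed_wt_percent(loading_wt_percent, efficiency_percent):
    """Theoretical (feed) loading implied by a measured loading and efficiency.

    Assumes efficiency = measured loading / theoretical loading * 100.
    """
    return loading_wt_percent / (efficiency_percent / 100.0)


def payload_ug_per_mg(loading_wt_percent):
    """Micrograms of payload carried by 1 mg of particles.

    1 wt% corresponds to 10 ug of payload per 1,000 ug of particles.
    """
    return loading_wt_percent * 10.0


if __name__ == "__main__":
    # Imiquimod Ac-Dex values quoted in the text: 4 wt% loading, 100% efficiency.
    loading, efficiency = 4.0, 100.0
    print(f"implied feed loading: {implied_feed_wt_percent(loading, efficiency):.2f} wt%")
    print(f"payload per mg of particles: {payload_ug_per_mg(loading):.0f} ug")
```

With 100% efficiency the feed and measured loadings coincide, so, under these definitions, each milligram of the imiquimod particles would carry about 40 µg of adjuvant. The same functions can be applied to any loading and efficiency pair reported for other formulations.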

Another method for Ac-Dex particle preparation, electrospray (ES), provided a much better encapsulation efficiency (83%) for resiquimod, a less hydrophobic TLR7/8 agonist, than the standard emulsion encapsulation method (6%) (Duong et al., 2013). Particles made by electrospray were larger (1–5 µm) than those from the emulsion method (∼300 nm) and had a collapsed morphology. More spherical particles could be obtained by blending with Tween 80 during the electrospray process. The Tween 80-blended Ac-Dex particles stimulated macrophages in vitro to increase NO release and inflammatory cytokine secretion. The in vivo study showed that these particles reduced L. donovani amastigotes in the heart and liver of mice relative to mice receiving empty nanoparticles or PBS.

The Ainslie group applied Ac-Dex to deliver two other TLR agonists, poly I:C and CpG, as vaccine adjuvants (Peine et al., 2013). 71 kDa Ac-Dex with a 5 min acetalation reaction time [Ac-Dex (5 min)] was found to be the best material for the delivery of both agonists. The encapsulation efficiencies of poly I:C and CpG in Ac-Dex (∼55 and ∼36%, respectively) were significantly higher than in traditional PLGA particles (∼33 and ∼3%, respectively). Significantly higher levels of NO release and cytokine secretion, including IL-6, IL-12p70, IL-1β, IL-2, TNF-α, and IFN-γ, were observed in RAW 264.7 macrophages with poly I:C-encapsulated Ac-Dex (5 min) particles than with poly I:C-encapsulated PLGA particles or with another Ac-Dex, Ac-Dex (4 h), which degraded more slowly. Due to the poor encapsulation of CpG in PLGA (∼3%), only Ac-Dex (5 min) was tested for delivering CpG to RAW 264.7 cells. For both NO release and cytokine profile, CpG encapsulated in Ac-Dex was superior to free CpG.

Ting's lab applied Ac-Dex particles to the delivery of the cyclic dinucleotide (CDN) 3′3′-cGAMP, a ligand of stimulator of interferon genes (STING), for immune cell activation (Junkins et al., 2018). cGAMP is a water-soluble adjuvant with poor cell penetration. Liposome and hydrogel delivery carriers of cGAMP were associated with low encapsulation efficiency and poor long-term stability (Hanson et al., 2015; Irvine et al., 2015; Lee et al., 2016; Koshy et al., 2017). With the electrospray method, the Ac-Dex particles (ES Ac-Dex) loaded up to 0.52 wt% of cGAMP with 89.7% encapsulation efficiency, which is significantly higher than for Ac-Dex particles prepared through the emulsion method (EM Ac-Dex), PLGA particles, or liposomes. The cGAMP-loaded ES Ac-Dex remained intact in pH-neutral media at 37°C for at least 28 days without loss of cGAMP bioactivity. Strong immune activation was observed both in vitro and in vivo without significant toxicity. When ES Ac-Dex was co-administered with the model antigen OVA, the level of antibody against OVA generated in vivo was enhanced 10<sup>4</sup>- to 10<sup>6</sup>-fold compared to OVA alone. Analysis of antibody subtypes indicated that the cGAMP-encapsulated ES Ac-Dex particles induced balanced Th1- and Th2-associated immune responses, while the Alum adjuvant produced mainly Th2-polarized responses. Besides humoral responses, the cGAMP-encapsulated ES Ac-Dex also induced cellular responses against the model antigen OVA. In a B16F10 melanoma model, the cGAMP Ac-Dex showed a better anti-tumor effect than three other Ac-Dex particles encapsulating different adjuvants: murabutide, imiquimod, and poly I:C (Watkins-Schulz et al., 2019). Successful anti-cancer immunotherapy by cGAMP Ac-Dex particles was also observed in a triple-negative breast cancer model using the E0771 cell line.
Systemic administration of cGAMP Ac-Dex through intravenous injection slowed tumor growth as efficiently as local administration through intratumoral injection. Interestingly, in the B16F10 model, NK cells, rather than T cells, were the major cell type responsible for tumor lysis. For the E0771 tumor, however, both NK and T cells were important for the anti-tumor response. These results indicated the importance of activating both innate immune cells (NK cells) and adaptive immune cells (T cells) for tumor immunotherapy, as T cells may not always be the major anti-tumor responders.

Co-delivering more than one adjuvant within one Ac-Dex particle can improve the immune activation compared to single adjuvant loaded Ac-Dex particles. For example, cGAMP ES Ac-Dex successfully induced high levels of IFN-β, IL-6, and TNF. With the co-encapsulation of resiquimod (R848) in the same particle, the cGAMP/R848 ES Ac-Dex elicited two more important cytokines for adaptive immune activation, IL-1β, and IL-12p70 (Collier et al., 2018). Co-administration of separate cGAMP ES Ac-Dex and R848 ES Ac-Dex particles was not as efficient as co-encapsulation of the two adjuvants within the same particle based on in vitro cytokine release study. The combination of muramyl dipeptide (MDP), a NOD2 ligand, with R848, also showed superior additive effects (Paßlick et al., 2018).

Besides serving as an adjuvant carrier, Ac-Dex particles can deliver both the antigen and the adjuvant as a full vaccine against various targets, such as anthrax, bacterial infection and influenza.

Anthrax, caused by infection with Bacillus anthracis (B. anthracis), can lead to death within 1 week, while the current vaccine, Anthrax Vaccine Adsorbed, requires up to 6 doses and 18 months to achieve protection (Schully et al., 2013). A vaccine that can generate fast immune protection against anthrax is urgently needed. The Ainslie group designed an Ac-Dex-based vaccine to generate a rapid immune response against anthrax, in which Ac-Dex was used to encapsulate R848 and Protective Antigen (PA), the most important toxic component of anthrax antigen, in separate particles by emulsion (Schully et al., 2013). Mice that received both R848 Ac-Dex and PA Ac-Dex showed much stronger IgG responses on days 14, 28, and 42 after immunization than those receiving PA + Alum or free PA + R848 Ac-Dex particles. All mice immunized with the PA Ac-Dex + R848 Ac-Dex vaccine survived 3 challenges on days 14, 28, and 42 with both low and high doses of B. anthracis. This Ac-Dex-based vaccine required only two injections, on days 0 and 7, and effective protection against anthrax was observed as early as day 14. The fast generation of a protective immune response by the Ac-Dex-based vaccine offers a promising way to fight fast-progressing diseases. In a later study, the electrospray method was used instead of emulsion to fabricate Ac-Dex particles with PA only or with both PA and R848 (Gallovic et al., 2016). Three vaccine formulations were used to immunize the mice: (i) PA adsorbed onto resiquimod microparticles; (ii) PA and resiquimod encapsulated in separate particles; and (iii) PA and resiquimod encapsulated in the same particle. Both (ii) and (iii) induced IgG1 and IgG2a titers on day 42 after immunization that were similar to or higher than those induced by Anthrax Vaccine Adsorbed, the current anthrax vaccine. The in vivo study showed that (ii) was the best vaccine, protecting 50% of mice from death during 28 days of observation, while mice immunized with (iii) had only 10% survival. Mice in the BioThrax (Anthrax Vaccine Adsorbed) group did not survive beyond 13 days.
The in vivo study indicated that delivering PA and adjuvant in separate particles may provide a faster and stronger immune response toward anthrax. This finding supported the idea that adjuvant and antigen should be encapsulated in separate Ac-Dex particles optimized for each component with different CAC percentages (Chen et al., 2018b).

Ac-Dex was used as the carrier for a Burkholderia pseudomallei subunit vaccine and showed the ability to generate immune responses within a short time period (Schully et al., 2015). The antigen, B. pseudomallei lysate, and the adjuvant R848 were encapsulated in separate Ac-Dex particles. The rapid immunization schedule (two injections, on days 0 and 7) delayed death over a 26-day observation period when mice were challenged on day 14 with a lethal dose of B. pseudomallei. 12% of the immunized mice survived the challenge on day 26, while most mice in the control groups died within 2 days of challenge and none survived beyond 20 days. The vaccinated group had higher antibody titers, stronger cytokine secretion (IL-4, IL-5, IL-17A, IL-12, IFN-γ, GM-CSF, and TNF-α), and more cytotoxic T cells than the control group receiving PBS only.

The Ting lab applied the cGAMP encapsulated Ac-Dex with soluble hemagglutinin (HA) protein from H1N1 influenza virus for anti-influenza vaccination (Junkins et al., 2018). A strong Th1-biased antibody response was observed in cGAMP Ac-Dex + HA group, while Alum + HA only induced weak Th2-biased antibody response. The cGAMP Ac-Dex + HA protected 12 out of 13 mice from H1N1 influenza challenge, while >90% of untreated mice and >75% of mice immunized with free HA only were killed during the challenge. The neutralizing antibodies generated by cGAMP Ac-Dex + HA remained detectable in mouse sera for more than 4 months after immunization and protected the mice from a lethal dose of H1N1 influenza virus challenge 7 months after immunization. The Bachelder lab investigated the co-administration of cGAMP Ac-Dex and the ectodomain of matrix protein 2 (M2e) encapsulated Ac-Dex particles as an anti-influenza vaccine (Chen et al., 2018a). The M2e and cGAMP were encapsulated in separate Ac-Dex particles with different percentage of CAC. In contrast to the delivery of OVA antigen where a high antibody titer was observed in Ac-Dex particles with low CAC (20%) (Chen et al., 2018b), it was observed that the M2e Ac-Dex with high CAC (60%) induced higher antibody titers compared to M2e Ac-Dex with lower CAC (40 and 20%). The cGAMP encapsulated Ac-Dex particles with different CAC (20, 40, or 60%) did not significantly change the antibody titers. The M2e and cGAMP encapsulated in separate Ac-Dex particles (60% CAC) induced significantly higher antibody titers compared to the co-encapsulation of M2e and cGAMP in same Ac-Dex (60% CAC). Besides the antibody titer, significantly higher levels of IFN-γ, IL-2, and IL-6 secretion were detected in mice immunized with M2e Ac-Dex (40% or 60% CAC) + cGAMP Ac-Dex (60% CAC), which suggested a successful generation of cellular immunity. 
Both vaccines, M2e Ac-Dex (40%) + cGAMP Ac-Dex (60%) and M2e Ac-Dex (60%) + cGAMP Ac-Dex (60%) showed significant improvement of survival during a lethal dose influenza challenge in mice.
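Challenge studies of this kind use small groups, so a survival proportion such as the 12 out of 13 mice protected by cGAMP Ac-Dex + HA carries considerable statistical uncertainty. As a hedged illustration, the sketch below computes a Wilson score interval for a binomial proportion. The choice of the Wilson interval and the 95% level (z = 1.96) are ours for illustration; this is not an analysis reported in the cited studies, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion.

    successes: number of surviving animals
    n: group size
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # 12 of 13 mice survived the H1N1 challenge (value taken from the text).
    low, high = wilson_interval(12, 13)
    print(f"observed survival: {12 / 13:.1%}")
    print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

The interval is wide for a group of 13 animals, which is one reason survival differences between formulations in small challenge studies are best read alongside the antibody and cytokine data.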

The studies discussed so far relied on passive uptake of the Ac-Dex particles by immune cells. The Fréchet group studied mannosylated Ac-Dex particles for immunomodulation through mannose receptor targeting (Cui et al., 2011). "Click-able" Ac-Dex was obtained by partially modifying the hydroxyl groups on the dextran backbone with an azido-triethylene glycol linker, followed by acetalation. Microparticles were then prepared through the emulsion method, with subsequent surface mannosylation using the CuAAC reaction. These particles (referred to as Man-Ac-Dex), with a high density of mannose on the surface (up to 10<sup>6</sup>/particle), had high binding avidity for mannose receptors on the DC surface. Man-Ac-Dex showed a 1.5–2-fold increase in DC uptake and an approximately 5-fold increase in MHC I presentation on DCs compared to Gal-Ac-Dex, azido-Ac-Dex, or Ac-Dex particles, suggesting more potent immune activation. However, no in vivo study was performed with these particles.

#### Reducible Dextran Nanogel

Besides acetalated dextran, reducible dextran nanogel is another type of modified dextran that has been developed for antigen delivery to DCs (Li et al., 2015, 2016). A cationic dextran nanogel was fabricated by inverse mini-emulsion photopolymerization of methacrylated dextran, a methacrylamide-functionalized disulfide linker, and a positively charged methacrylate monomer. The nanogel was then covalently conjugated with the model antigen OVA through a disulfide linker (**Scheme 6**). Confocal microscopy indicated that the OVA-conjugated nanogel enhanced uptake by D1 cells compared to non-covalently loaded OVA-nanogel, free OVA, or empty nanogel. The OVA-conjugated nanogel combined with poly I:C significantly slowed the growth of B16-OVA tumors, which express the OVA antigen, in a mouse tumor model compared to free OVA or non-covalent OVA-nanogel (Li et al., 2016). A preventive antitumor model was studied by immunizing C57BL/6 mice on days 0 and 14 with different vaccine formulations, followed by tumor challenge on day 28 with B16-OVA cells. All PBS- or empty nanogel-treated mice died within 20 days after tumor cell injection. Only 30% of the mice in the non-covalent OVA-nanogel group were tumor-free on day 52, while 90% of mice immunized with OVA-conjugated nanogel + poly I:C remained tumor-free. OVA-conjugated nanogel + poly I:C induced the highest percentage of OVA-specific CD8<sup>+</sup> T cells and the highest OVA-specific IgG titers. In addition to the preventive model, the efficacy of the vaccine was investigated in a therapeutic model. Mice were injected with B16-OVA cells on day 0, followed by two immunizations on days 6 and 16. While all other groups developed fast-growing tumors and died within 35 days, the OVA-conjugated nanogel + poly I:C significantly slowed tumor growth and prolonged survival.

These two studies showed the reducible nanogel carrier can enhance DC activation in vitro and generate significant preventive and curative effects against tumor in vivo. It was found that the OVA-loaded nanogel exhibited cytotoxicity at high concentrations, which may require more chemical modifications to improve biocompatibility (Li et al., 2015). For example, the percentage of the cationic monomer may be lowered to reduce the level of positive surface potential to decrease cytotoxicity.

#### Oxidation Sensitive Dextran

Reactive oxygen species are heavily produced in the phagosomes of APCs, which are crucial for initiating immune responses (Jones, 2008; Winterbourn, 2008). It has been reported that the most effective APCs, DCs, may have phagosomes with H<sub>2</sub>O<sub>2</sub> concentrations up to 1 mM (Savina et al., 2009). Therefore, oxidation-sensitive dextran was investigated as a vaccine carrier candidate (Broaders et al., 2011). Free hydroxyl groups on dextran were modified with an arylboronic ester, resulting in Oxi-Dex (**Scheme 7**). Particles of 100–200 nm were prepared via the standard emulsion method. The resulting particles were stable in PBS buffer but decomposed in 1 mM H<sub>2</sub>O<sub>2</sub> with a half-life of 36 min. The OVA-encapsulated Oxi-Dex induced a 27-fold increase in OVA presentation in DC2.4 cells compared to OVA-encapsulated PLGA particles, while free OVA was not presented. However, Oxi-Dex was not studied further after this report.

### pH Sensitive Amphiphilic Galactosyl-dextran-retinal Conjugates (GDR)

The galactosyl-dextran-retinal (GDR) conjugate is a pH-sensitive amphiphilic material reported by the Ma group (Wang et al., 2016). All-trans retinal, the precursor of retinoic acid (the active metabolite of vitamin A), was first conjugated to dextran through a pH-responsive hydrazone bond. The product was then modified with ethylenediamine and reacted with NHS-activated lactobionic acid to obtain the GDR conjugate. GDR was amphiphilic and spontaneously self-assembled into a nanogel with a size of around 115 nm and a zeta-potential of around 27 mV. At pH 7.4, GDR was relatively stable, with <10% retinal release within 48 h. However, the hydrazone bond in the GDR conjugate was rapidly cleaved at pH 5.0, resulting in over 50% release within 24 h of retinal, which could serve as an adjuvant. GDR nanogel induced BMDC maturation in vitro, while free retinal failed to do so. OVA-loaded GDR nanogel enhanced both MHC I and MHC II antigen presentation on BMDCs. The release of retinal from GDR nanogel elevated reactive oxygen species (ROS) generation in BMDCs by 2–3 fold relative to free all-trans retinal within 4 h due to lysosomal disruption, and the resulting ROS significantly enhanced proteasome activity in BMDCs. In a B16-OVA tumor model, the OVA-GDR nanogel vaccine suppressed tumor growth and prolonged mouse survival compared to the free OVA, free OVA+retinal, and PBS groups. OVA-GDR nanogel induced robust CD8<sup>+</sup> T cell proliferation as well as high levels of IFN-γ production and lysis of tumor cells.

## β-GLUCANS

β-Glucans are β-1,3-linked glucose polymers with β-1,6 branches. β-Glucans can be isolated from fungal cell walls, bacteria, seaweed, cereals, etc. Depending on the source, the polysaccharides may have varied primary, secondary or tertiary structures, or physical properties. Though heterogeneous, these polysaccharides can induce similar immune responses and are therefore usually referred to by the common name "β-glucans" (Novak and Vetvicka, 2008). The major β-glucan receptors in mammals are dectin-1 and complement receptor 3 (CR3, CD11b/CD18) (Levitz et al., 2015). It has been reported that stimulation via dectin-1 primes Th1, Th17, and cytotoxic T lymphocyte responses (LeibundGut-Landmann et al., 2008; Geijtenbeek and Gringhuis, 2009; Levitz et al., 2015). With their immune-stimulating properties, β-glucans have been studied in vaccine design with an established record of safety in both preclinical and human trials (Williams et al., 1988; Novak and Vetvicka, 2008; Weitberg, 2008). As a major component of the fungal cell wall, β-glucans have been widely used as antigens for generating anti-glucan antibodies against fungal infections (Bromuro et al., 2010; Cassone and Casadevall, 2012; Liao et al., 2015). In this section, we focus on examples applying β-glucans as vaccine carriers and built-in adjuvants.

### β-Glucan Particles

β-Glucan particles (GPs) are the most studied vaccine carriers in the β-glucan family. They were developed in the 1980s but have only been widely used as vaccine carriers in recent years (Hunter et al., 2002; Mirza et al., 2017; Abraham et al., 2019). GPs are highly purified, hollow, porous cell wall shells 2–4 µm in size. GPs can be derived from baker's yeast through a series of hot alkali and organic extractions (Di Luzio et al., 1979; Williams et al., 1989). They contain primarily 1,3-β-glucans along with small amounts of β-1,6-glucans and chitin (Levitz et al., 2015). GPs can be recognized by dectin-1 and upregulate cell surface presentation of MHC molecules and co-stimulatory molecules, as well as induce the production of inflammatory cytokines (Hunter and Redelman, 2004; Berner et al., 2005; Huang et al., 2009). The hollow GPs have been studied as carriers for proteins, DNA, siRNA, and other small molecules (Soto and Ostroff, 2008; Aouadi et al., 2009; Huang et al., 2010, 2012; Tesz et al., 2011; Soto et al., 2012).

Antigens can be non-covalently trapped inside GPs with the addition of polymers such as yeast tRNA, alginate-calcium, or an alginate-calcium-chitosan mixture. The Levitz group used tRNA to trap OVA protein inside GPs (Huang et al., 2010). These GPs were efficiently taken up and proteolyzed by DCs to induce DC maturation. Significant T cell proliferation was observed upon incubation with GP-OVA at concentrations starting from 0.03 µg OVA/ml, while free OVA protein needed a 100 times higher concentration to reach similar stimulation levels. The CD4<sup>+</sup> T cells isolated from GP-OVA immunized mice secreted significantly higher amounts of cytokines such as IL-4, IL-17, and IFN-γ compared to Alum/OVA immunized mice. For antibody responses, the GP-OVA vaccine successfully induced the Th1-skewed antibody subtype IgG2c, while Alum/OVA induced only IgG1 responses. The long-term immune responses were monitored 18–20 months after the last immunization (Huang et al., 2013). The CD4<sup>+</sup> T cells isolated from immunized mice resumed cytokine secretion upon ex vivo OVA stimulation, and the serum antibody titer remained detectable. Notably, the encapsulation of OVA in GPs was found to be important, as the admixture of OVA and GPs was not as effective in inducing CD4<sup>+</sup> T cell cytokine secretion and antibody responses (Huang et al., 2013). The Levitz group also studied polymers such as alginate-calcium (AC) or an alginate-calcium-chitosan (ACC) mixture for trapping antigens in GPs (Huang et al., 2013). The AC- and ACC-trapped GP-OVA showed capacities to induce antigen-specific T cell and antibody responses in mice comparable to those of the tRNA-trapped GP-OVA. Other antigens, such as BSA (De Smet et al., 2013) and FedF (Baert et al., 2016), could also be trapped inside GPs as vaccine candidates.

Antigens can also be loaded into GPs through covalent coupling. The Hunter group covalently conjugated the antigen BSA to GPs through amide bonds (Berner et al., 2008). The BSA-GP conjugates were phagocytized by macrophages, and both intradermal and oral administration of the BSA-GP vaccine induced immune responses against BSA. OVA-GP conjugates were synthesized similarly and induced strong activation of BMDCs and of CD4<sup>+</sup> and CD8<sup>+</sup> T cells in vitro (Berner et al., 2015). The Hong group prepared OVA-loaded GPs in the organic phase, which reduced GP aggregation compared to aqueous-phase conjugation and provided more homogeneous OVA-GPs (Yang et al., 2017). With this novel conjugation method, the GPs were first dispersed in cyclohexane/Igepal CO-520 (85:15) solution, followed by the sequential addition of aqueous solutions containing the OVA antigen and the glutaraldehyde cross-linker. The hydrophilic antigen and cross-linker slowly soaked into the GP cavity due to the hydrophilic environment of the glucans, so the conjugation primarily took place inside the GP cores rather than on the exterior of the GPs, where it may cause cross-linking between particles and lead to aggregation. The resulting OVA-GPs successfully induced BMDC maturation and T cell proliferation in vitro and stimulated B cell activation and germinal center formation in vivo. High anti-OVA IgG2c titers were detected after only one immunization with the OVA-GP vaccine, which indicated a strong Th1-biased immune response. The OVA-GPs successfully induced antigen-specific CD8<sup>+</sup> T cell responses in vivo and provided significant protection against tumor development to EG.7-OVA tumor-bearing mice.

An interesting property of GPs is that they can be administered orally. GPs can be taken up by human intestinal epithelial cells and induce the secretion of chemokines and the expression of pattern recognition receptors and costimulatory molecules (De Smet et al., 2013). The GP-OVA complex can be delivered by M cells to mucosal lymphoid tissues and induce the proliferation of OVA-specific CD4<sup>+</sup> T cells when given orally to mice. Surface functionalization of GPs with the immunoglobulin-binding protein G followed by an anti-aminopeptidase N (APN, an intestinal epithelial receptor) antibody can further enhance the passage of particles through the epithelial barrier (Baert et al., 2015). Compared to isotype antibody-conjugated GPs, the anti-APN GPs were internalized 10 times more by the intestinal epithelial cell line IPEC-J2 at a 16-fold lower concentration. An in vivo study showed that orally administered anti-APN-coated, FedF-loaded GPs induced significantly higher antibody titers compared to non-targeting FedF-loaded GPs.

### β-Glucan-antigen Complex

A β-glucan member, schizophyllan (SPG), contains a β-1,3-glucan main chain with a β-1,6-glycosyl side chain every three glucose residues. It can form stoichiometric complexes with specific homonucleotides such as poly(C) or poly(dA) via a combination of hydrogen bonding and hydrophobic interactions (**Scheme 8**) (Sakurai and Shinkai, 2000; Sakurai et al., 2001; Numata et al., 2003). Unlike β-glucan particles, these SPG complexes are nano-rod shaped with diameters around 10–20 nm (Kobiyama et al., 2014). The complex includes two SPG chains and one polynucleotide chain forming a triple helix through interactions between two main-chain glucoses and one base, and the stability of the complex depends on the length of the polynucleotide (Sakurai and Shinkai, 2000; Sakurai et al., 2001). The complex can be recognized by the dectin-1 receptor, inducing immune responses (Minari et al., 2010; Mochizuki and Sakurai, 2011), and has therefore been studied as a vaccine adjuvant.

A complex of SPG with CpG-dA40, a short single-stranded DNA fragment with a CpG motif and a 40-mer poly(dA) tail, has shown strong immune-activating effects due to the combined delivery of immunocyte-targeting SPG and immunostimulatory CpG (Kobiyama et al., 2014; Miyamoto et al., 2014, 2018). This complex can induce antigen-presenting cell activation as well as Th1 and CD8 T cell responses (Kobiyama et al., 2014, 2016; Ito et al., 2017). Intravenous injection of the CpG-SPG complex suppressed tumor growth more efficiently than SPG, CpG, or a mixture of SPG and CpG in several tumor models (Kitahata et al., 2016). The CpG-SPG complex could be cross-linked to form nanogels with a larger size (∼150 nm) by mixing CpG-SPG and its complementary sequence (Kobiyama et al., 2014; Miyamoto et al., 2017), which may further improve the delivery efficiency toward immune cells due to the size effect (Manolova et al., 2008). Compared to the CpG-SPG complex, the cross-linked CpG-SPG nanogel induced significantly higher IL-6 secretion in mouse splenocytes in vitro (Kobiyama et al., 2014). Fluorescence microscopy imaging indicated a 10 times higher uptake of the cross-linked CpG-SPG nanogel than of the CpG-SPG complex by macrophages (Miyamoto et al., 2017). The CpG-SPG nanogel induced more antigen-specific CD8<sup>+</sup> T cells in vivo compared to the CpG-SPG complex when co-administered with OVA antigen. The nanogel immunization significantly slowed EG7 tumor growth and prolonged survival in mice compared to free CpG or the CpG-SPG complex (Miyamoto et al., 2017).

Besides CpG, peptide antigens can be conjugated with poly(dA) for preparing SPG-antigen complexes. The Sakurai group reported an OVA-SPG complex prepared with an OVA peptide-poly(dA) conjugate and SPG (Mochizuki et al., 2015, 2017). It was observed that the conjugation strategy could influence the immune cell processing of the OVA-SPG complex (Mochizuki et al., 2017). OVA-poly(dA) conjugated through a glutathione-cleavable disulfide linker induced significantly higher levels of OVA antigen presentation on macrophages compared to OVA-poly(dA) conjugated through a triazole. Conjugation of poly(dA) at the N terminus of the OVA peptide, instead of at the C terminus, led to higher OVA presentation by macrophages (Mochizuki et al., 2017). The OVA-SPG complex induced peptide-specific CD8<sup>+</sup> T cell responses both in vitro and in vivo when co-administered with the CpG-SPG complex. Mice immunized with the OVA-SPG/CpG-SPG vaccine showed significantly more effective in vivo lysis of OVA-pulsed target cells compared to the free OVA peptide, free OVA + free CpG, and free OVA + CpG-SPG groups, as indicated by in vivo CTL assays (Mochizuki et al., 2015). Strong CTL activation was observed with a very low dose of OVA peptide (100 ng/mouse) (Mochizuki et al., 2017). The OVA-SPG/CpG-SPG vaccine also successfully suppressed the growth of EG7 tumors and prolonged survival time in mice (Mochizuki et al., 2015).

### β-Glucan Based Nanoparticles for Vaccine Delivery

Besides the large-sized GPs and rod-shaped SPG complexes, β-glucan nanoparticles have been investigated for vaccine delivery. The Dong group developed a synthetic MUC1 vaccine by conjugating MUC1 peptide with a β-glucan chain (Wang et al., 2019). The resulting MUC1-β-glucan material formed homogeneous nanoparticles of 150 nm due to hydrophobic interactions. This MUC1-β-glucan nanoparticle induced significantly higher serum antibody titers and IFN-γ and IL-6 cytokine levels. The Zhang lab prepared β-glucan nanoparticle-based vaccines by mixing positively charged aminated β-glucan with negatively charged CpG adjuvant and OVA protein antigen (Jin et al., 2018). The combination of dectin-1-activating β-glucan and TLR-9-activating CpG in one nanoparticle showed synergistic effects in inducing both strong humoral and cellular immune responses.

The Kono lab reported a set of β-glucan-based pH-sensitive materials for cytoplasmic delivery of antigen (Yuba et al., 2017). Curdlan, a kind of β-glucan, was modified with methyl glutaric acid (MGlu) to generate a pH-responsive polysaccharide, MGlu-Curd. Using a similar strategy, pH-responsive 3-methyl glutaryl mannan (MGlu-Man) and 3-methyl glutaryl dextran (MGlu-Dex) were prepared. 1-Aminodecane was then conjugated to these polysaccharides to anchor the pH-responsive polysaccharide chains onto the membranes of OVA-loaded liposomes. All three types of liposomes with different polysaccharides released their cargo at around pH 5. The polysaccharide backbone played an important role in obtaining liposomes with high affinity for DCs. Compared to MGlu-Man- and MGlu-Dex-coated liposomes, the liposome containing MGlu-Curd with 59 MGlu groups per chain (MGlu59-Curd) induced the highest DC uptake. The percentage of MGlu modification also influenced immune activation. In general, curdlan with a higher MGlu content (MGlu71-Curd and MGlu59-Curd) induced higher levels of pro-inflammatory cytokines such as TNF-α and IL-12 in DC2.4 cells compared to those with lower MGlu content (MGlu41-Curd and MGlu21-Curd). Compared to MGlu-Man and MGlu-Dex, MGlu59-Curd elicited more IFN-γ and higher cell-mediated cytotoxicity in splenocytes isolated from OVA-immunized mice in vitro. The tumor challenge study showed that mice immunized with MGlu59-Curd had the smallest tumor size and longest survival time, highlighting the advantage of the curdlan backbone.

## CONCLUSIONS AND FUTURE OUTLOOK

In summary, we have reviewed recent advances in vaccine development applying carbohydrates as adjuvants and/or vaccine carriers. With their biocompatibility, ease for modification, and the ability to interact with the immune system through multiple mechanisms, carbohydrates provide a great variety of choices to meet the various needs for vaccine studies.

Carbohydrates can be modified through multiple methods, such as amide or ester formation, the CuAAC reaction, and oxidation of sugar rings followed by imine or oxime formation, which makes them flexible for various applications in vaccine design. For example, the controlled release of the antigen and adjuvant from the vaccine carrier is important for immune activation. A desirable carrier should not release its cargo before entering immune tissues, and should not release it too slowly after encountering immune cells, as this may fail to produce enough immune stimulation and result in tolerance (Lofthouse, 2002; Sivakumar et al., 2011).

The optimal deliveries of antigens and adjuvants can be different, and the carriers may need to be optimized separately (Chen et al., 2018a,b). By controlling the reaction time during the acetalation of dextran, a carbohydrate-based vaccine carrier with fine-tuned releasing profile can be achieved, which can serve as a great platform for vaccine optimization. Antigen-MPLA and antigen-α-GalCer conjugates can be easily combined with other well-studied lipid molecules to form liposomal vaccines. Taking advantage of the well-developed strategies for liposome preparation (Abu Lila and Ishida, 2017; Bulbake et al., 2017), carriers with controlled size and surface charges, another two important factors for immune targeting (Xiang et al., 2006; Bachmann and Jennings, 2010), can be obtained.

Notably, although there are many examples showing that successful carbohydrate conjugate-based vaccines can be achieved through multiple chemical reactions and linker structures, small structural alterations of the carbohydrate backbone due to conjugation may significantly influence the final immune outcome. Carbohydrates often contain more than one position available for chemical modification. When designing carbohydrate vaccines, the conjugation site should be carefully chosen in order to obtain optimal immune recognition. As an example, antigen-MPLA conjugates linked through the 6′ position, where the polysaccharide chain is attached in natural LPS, were superior in generating IgG responses compared to antigen-MPLA conjugates that used the 1-O-position as the conjugation site (Wang et al., 2017), while blockade of the phosphate group on MPLA completely suppressed the ability for immune activation (Wang et al., 2011). The linkers between the payload and the carbohydrate backbone also played important roles in immune tuning. For example, the oxidative conjugation of mannan and MUC1 FP through imine linkers induced a Th1-type immune response and successfully protected mice from tumor growth, while reductive conjugation through amines induced a Th2-type immune response without successful tumor protection (Apostolopoulos et al., 1995b). Interestingly, there are examples using the trace amount of mannoproteins (∼5% in mannan) for allergen conjugation as allergy vaccines (Manzano et al., 2016; Sirvent et al., 2016). This strategy, which takes advantage of other components in polysaccharide mixtures for chemical conjugation, can maintain the intact carbohydrate structure and may therefore reduce the chance of disturbing the immune activation function. However, a disadvantage of this strategy might be quality control. The protein components may vary from batch to batch, which may influence the conjugation efficiency and the physical and biological properties of the final materials.

An attractive strategy for future vaccine design is the combination of different adjuvants that activate the immune system through different receptors. Adjuvants play crucial roles in vaccine design, and there have been examples indicating that combining adjuvants with different immune-activating mechanisms can trigger additive effects and enhance vaccine efficacy (Collier et al., 2018; Paßlick et al., 2018). However, caution needs to be taken in combining other adjuvants with the "self-adjuvating" carbohydrates. There are examples indicating that additional adjuvants have negative effects in MPLA- and mannan-based vaccine conjugates (Wu and Guo, 2006; Wang et al., 2011; Zhou et al., 2014, 2015; Liao et al., 2016; Sirvent et al., 2016; Benito-Villalvilla et al., 2019). Therefore, the external adjuvant needs to be carefully selected. Understanding the detailed mechanism of how multiple adjuvants collaborate with each other can guide future vaccine designs.

## AUTHOR CONTRIBUTIONS

XH conceived the topic. SL and XH wrote the manuscript.

## ACKNOWLEDGMENTS

We are grateful to the National Institutes of Health (R01 CA225105, R01AI146210), the Michigan State University Foundation, and the Department of Chemistry, Michigan State University for partial financial support.

## REFERENCES


into stimulatory and suppressive components. Cell. Immunol. 101, 403–414. doi: 10.1016/0008-8749(86)90153-X


lymph node-targeted vaccine adjuvants. J. Clin. Invest. 125, 2532–2546. doi: 10.1172/JCI79915


immunized with mannan-MUC1 fusion protein. J. Clin. Invest. 100, 2783–2792. doi: 10.1172/JCI119825


cell targeted vaccination: a novel tool for specific immunotherapy. J. Control. Release 165, 101–109. doi: 10.1016/j.jconrel.2012.11.002


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Lang and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# One-Step Enrichment of Intact Glycopeptides From Glycoengineered Chinese Hamster Ovary Cells

Ganglong Yang<sup>1</sup>, Naseruddin Höti<sup>1</sup>, Shao-Yung Chen<sup>1,2</sup>, Yangying Zhou<sup>1</sup>, Qiong Wang<sup>2</sup>, Michael Betenbaugh<sup>2</sup> and Hui Zhang<sup>1,2</sup>\*

<sup>1</sup> Department of Pathology, Johns Hopkins University, Baltimore, MD, United States, <sup>2</sup> Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States

#### Edited by:

Zhongping Tan, Chinese Academy of Medical Sciences and Peking Union Medical College, China

#### Reviewed by:

Zhixin Tian, Tongji University, China Matthew Robert Pratt, University of Southern California, United States Lianli Chi, Shandong University, China Wen Yi, Zhejiang University, China

> \*Correspondence: Hui Zhang huizhang@jhu.edu

#### Specialty section:

This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry

Received: 23 December 2019; Accepted: 12 March 2020; Published: 17 April 2020

#### Citation:

Yang G, Höti N, Chen S-Y, Zhou Y, Wang Q, Betenbaugh M and Zhang H (2020) One-Step Enrichment of Intact Glycopeptides From Glycoengineered Chinese Hamster Ovary Cells. Front. Chem. 8:240. doi: 10.3389/fchem.2020.00240

Recently, the glycoproteomic analysis of intact glycopeptides has emerged as an effective approach to decipher the glycan modifications of glycoproteins at the site-specific level. A rapid method to enrich intact glycopeptides is essential for the analysis of glycoproteins, especially for biopharmaceutical proteins. In this study, we established a one-step method for the rapid capture of intact glycopeptides for analysis by mass spectrometry. Compared to the conventional sequential enrichment method, the one-step intact glycopeptide enrichment method reduced the sample preparation time and improved the detection of intact glycopeptides with long sequences or non-polar amino acids. Moreover, an increased number of glycosite-containing peptides was identified by the one-step method compared with the sequential method. When we applied this method to the glycoproteomic analysis of glycoengineered Chinese hamster ovary (CHO)-K1 cells with α1,6-fucosyltransferase (FUT8) knockout, the results showed that the knockout of FUT8 altered the overall glycosylation profile of CHO-K1 cells, eliminating core fucosylation and increasing high-mannose and sialylated N-glycans. Interestingly, the knockout of FUT8 also appeared to regulate the expression of glycoproteins involved in several functions and pathways in CHO-K1 cells, such as the down-regulation of the intracellular lectin LMAN2, which suggests cellular adaptation to the alterations in FUT8 knockout cells. These findings indicate that the site-specific characterization of glycoproteins from glycoengineered CHO-K1 cells can be achieved rapidly using the one-step intact glycopeptide enrichment method, which could provide insights for bio-analysts and biotechnologists to better tailor therapeutic drugs.

Keywords: intact glycopeptides, glycoengineered CHO cells, one-step enrichment, FUT8 knockout, mass spectrometry

## INTRODUCTION

It has been challenging to interpret the heterogeneity of glycoproteins, which includes protein sequence, glycosite information, glycan structures at each glycosite, and overall and individual glycan occupancy at specific sites (Liu et al., 2017). Recently, the mass spectrometric analysis of intact glycopeptides has emerged as a promising strategy to analyze glycoproteins for glycan modifications at site- and structure-specific levels (Sun et al., 2016; Yang et al., 2018; Xiao and Tian, 2019). However, the major limitation of this approach is the low detectability of intact glycopeptides in a complex mixture, where their signals are suppressed by non-glycosylated peptides (Shajahan et al., 2017). Therefore, a capture method for intact glycopeptides before mass spectrometry analysis is highly desirable.

Several approaches have been reported for glycopeptide capture, including the immobilization of glycopeptides using hydrazide chemistry (Zhang et al., 2003; Nilsson et al., 2009), lectin enrichment (Kaji et al., 2003; Yang et al., 2013), hydrophilic interaction chromatography (HILIC) (Wada et al., 2004; Sun et al., 2016), and anion exchange (Yang et al., 2017). Among them, HILIC is commonly used for intact glycopeptide enrichment. Typically, the preparation of intact glycopeptides using HILIC involves several steps. Briefly, the proteins are digested to peptides by protease(s), followed by peptide clean-up by hydrophobic chromatography, drying, and then reconstitution in organic solvent for glycopeptide enrichment by hydrophilic interaction.

In this study, we incorporated the hydrophobic and the hydrophilic chemistries into a one-step column for enrichment of glycopeptides (**Figure 1**). The one-step method was applied to the capture of intact glycopeptides from glycoengineered Chinese hamster ovary (CHO)-K1 cells with α1,6-fucosyltransferase (FUT8) knockout. The captured intact glycopeptides from glycoengineered and wild-type CHO-K1 cells were analyzed by mass spectrometry. The results showed that, in addition to eliminating core fucosylation, the FUT8 knockout altered the overall glycosylation profile of CHO-K1 cells with increases in high-mannose and sialylated N-glycans, as well as the expression of proteins involved in several functions and pathways. These results also showed that the rapid capture and site-specific analysis of intact glycopeptides can be used for the fast characterization of glycoproteins expressed in glycoengineered CHO cells to better tailor therapeutic drugs.

## MATERIALS AND METHODS

### Protein Digestion

Proteins (500 µg) from wild-type CHO-K1 and FUT8 knockout (KO) CHO-K1 cells were denatured in 8 M urea/1 M NH<sub>4</sub>HCO<sub>3</sub> buffer, reduced with 10 mM TCEP at 37°C for 1 h, and alkylated with 15 mM iodoacetamide at room temperature for 30 min in the dark. The solutions were diluted 8-fold with ddH<sub>2</sub>O. Then, sequencing-grade trypsin (protein:enzyme, 40:1, w/w; Promega, Madison, WI) was added to the samples, which were incubated at 37°C for 30 min to overnight. For the traditional approach, the peptides were acidified with acetic acid to pH <3. The samples were centrifuged at 13,000 g for 10 min, and the supernatant was cleaned by C18 solid-phase extraction. The peptides were eluted from the C18 column in 60% acetonitrile (ACN)/0.1% TFA, and the peptide concentrations were measured by bicinchoninic acid protein assay. For the one-step method, the peptides were acidified as in the traditional approach and then applied to the conditioned one-step columns.

**Abbreviations:** IGP, intact glycopeptide; CHO, Chinese hamster ovary; MAX, mixed anion exchange; PSM, peptide spectrum match.
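The reagent amounts implied by the protein digestion step can be checked with a few lines of arithmetic. The sketch below uses only the stated quantities (500 µg protein, a 40:1 protein:enzyme ratio, 8 M urea/1 M ammonium bicarbonate, 8-fold dilution); the function names are illustrative, and the calculation assumes that the volumes of the TCEP and iodoacetamide additions are negligible.

```python
PROTEIN_UG = 500.0        # protein input per sample
PROTEIN_TO_ENZYME = 40.0  # protein:enzyme ratio (w/w)
UREA_M = 8.0              # urea in the denaturing buffer
BICARB_M = 1.0            # NH4HCO3 in the denaturing buffer
DILUTION_FOLD = 8.0       # dilution with water before adding trypsin


def trypsin_mass_ug(protein_ug, ratio=PROTEIN_TO_ENZYME):
    """Mass of trypsin (µg) needed for a given protein mass at the stated w/w ratio."""
    return protein_ug / ratio


def diluted_conc(stock_molar, fold=DILUTION_FOLD):
    """Concentration after dilution, ignoring the volume of other additions (assumption)."""
    return stock_molar / fold


print(f"trypsin per sample: {trypsin_mass_ug(PROTEIN_UG)} µg")
print(f"urea after dilution: {diluted_conc(UREA_M)} M")
print(f"NH4HCO3 after dilution: {diluted_conc(BICARB_M)} M")
```

The 8-fold dilution brings urea from 8 M down to roughly 1 M before trypsin is added.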

### Preparation of Columns for the One-Step Enrichment Method

For the mixed columns, 30 mg mixed anion exchange (MAX) beads and 50 mg C18 beads were suspended in 1 ml ACN and mixed homogeneously in a tube. Then, 100 µl of the mixture was packed into a column with filters at the bottom and at the top, followed by a single wash with 1 ml ACN. For the stacked columns, the MAX beads (30 mg, Waters) were first packed into a column with a filter at the bottom and washed with 1 ml ACN three times. Subsequently, the C18 beads (50 mg) in ACN were packed on top of the MAX beads, followed by a filter on top, and washed once with 1 ml ACN.
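As a quick check on the column preparation, the sketch below estimates the bead mass that ends up in one mixed column when 100 µl is taken from the 1 ml suspension. It assumes a perfectly homogeneous suspension whose total volume stays at 1 ml after the beads are added; the helper name is hypothetical.

```python
def beads_per_aliquot(total_mg, suspension_ul, aliquot_ul):
    """Bead mass (mg) in an aliquot, assuming a homogeneous suspension."""
    if aliquot_ul > suspension_ul:
        raise ValueError("aliquot cannot exceed the suspension volume")
    return total_mg * aliquot_ul / suspension_ul


# Mixed column: 100 µl taken from 30 mg MAX + 50 mg C18 suspended in 1 ml ACN
max_mg = beads_per_aliquot(30, 1000, 100)
c18_mg = beads_per_aliquot(50, 1000, 100)
print(f"mixed column: about {max_mg} mg MAX and {c18_mg} mg C18")
```

Read this way, a mixed column would hold about one tenth of the bead mass of a stacked column (30 mg MAX and 50 mg C18), although the protocol as written does not state this explicitly.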

### Enrichment of Intact Glycopeptides

The columns were sequentially conditioned with ACN three times, 100 mM triethylammonium acetate three times, 95% ACN (v/v) with 1% TFA (v/v) five times, and finally 1% TFA (v/v) five times. The samples were loaded twice in 1% TFA (v/v). The columns were washed with 1% TFA four times. Then, the non-glycopeptides were eluted with 95% ACN (v/v) with 1% TFA (v/v) three times. Finally, the intact glycopeptides were eluted in 50% ACN (v/v) with 0.1% TFA (v/v) and dried in a speed-vac in preparation for mass spectrometry.
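The enrichment procedure above can be written down as an ordered list of steps, which makes the sequence easy to review or to turn into a bench checklist. This is only an illustrative encoding: the `Step` class, the phase labels, and `total_passes` are hypothetical names, and the number of passes for the final glycopeptide elution is left as `None` because the text does not specify it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    phase: str              # illustrative label, not from the protocol
    solvent: str
    repeats: Optional[int]  # None when the text gives no number


PROTOCOL = [
    Step("condition", "ACN", 3),
    Step("condition", "100 mM triethylammonium acetate", 3),
    Step("condition", "95% ACN / 1% TFA", 5),
    Step("condition", "1% TFA", 5),
    Step("load", "sample in 1% TFA", 2),
    Step("wash", "1% TFA", 4),
    Step("elute non-glycopeptides", "95% ACN / 1% TFA", 3),
    Step("elute intact glycopeptides", "50% ACN / 0.1% TFA", None),
]


def total_passes(phase):
    """Sum of stated passes for one phase; steps without a stated count are skipped."""
    return sum(s.repeats for s in PROTOCOL if s.phase == phase and s.repeats is not None)


for step in PROTOCOL:
    times = "unspecified" if step.repeats is None else f"x{step.repeats}"
    print(f"{step.phase:<28} {step.solvent:<34} {times}")
```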

### Nano-LC–MS/MS Analysis and Data Analysis

The mass spectrometric method and data analysis are described in our previous publication with modification (Yang et al., 2018). In brief, the peptide samples in 3% ACN and 0.1% FA were analyzed with a Q-Exactive or Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific). The data were searched using glycopeptide analysis software GPQuest 2.0 (Toghi Eshghi et al., 2015; Hu et al., 2018), which was developed in-house. The results were filtered with false discovery rate (FDR) and glyco-peptide spectrum match (PSM) count number.
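The text states only that results were filtered by FDR and by glyco-PSM count, without giving thresholds or the estimation procedure. The sketch below therefore illustrates a generic target-decoy filter, not the GPQuest implementation: the 1% FDR cutoff, the minimum of two PSMs, the tuple layout, and the function name are all assumptions chosen for illustration.

```python
from collections import Counter


def filter_psms(psms, max_fdr=0.01, min_psm=2):
    """Generic target-decoy filtering sketch (hypothetical thresholds).

    psms: iterable of (glycopeptide_id, score, is_decoy) tuples.
    Returns {glycopeptide_id: psm_count} for target glycopeptides that pass
    both the score cutoff implied by max_fdr and the min_psm requirement.
    """
    ranked = sorted(psms, key=lambda p: p[1], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs kept
    for i, (_, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # FDR estimated as decoys / targets among the PSMs ranked so far
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    accepted = [p for p in ranked[:cutoff] if not p[2]]
    counts = Counter(p[0] for p in accepted)
    return {gp: n for gp, n in counts.items() if n >= min_psm}


example = [
    ("A", 10.0, False),
    ("A", 9.0, False),
    ("B", 8.0, False),
    ("decoy_1", 7.0, True),
    ("C", 6.0, False),
]
print(filter_psms(example))
```

Requiring more than one PSM per glycopeptide is a common way to reduce single-spectrum false positives, at the cost of losing low-abundance identifications.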

## RESULTS AND DISCUSSION

### Rapid Capture of Intact Glycopeptides From Fetuin by the One-Step Enrichment Method

First, we investigated the performance of the one-step enrichment using two strategies for packing C18 and MAX particles into one cartridge. In one strategy, the C18 and the MAX beads were mixed homogeneously in acetonitrile and then packed into one column (mixed column), while in the other strategy, the C18 and the MAX beads were packed sequentially as a stacked C18 and MAX column (stacked column). Using the mixed column or stacked column, the IGPs were enriched from the mixture of peptides by the one-step method, in which the peptides were bound to C18 particles in the column, desalted by washing with 5% ACN, and eluted with 95% ACN. Then, the eluted IGPs directly bound to the MAX particles through hydrophilic interaction in the same column, and the non-glycopeptides were washed away by 95% ACN. Finally, the IGPs were eluted from the one-step column with 50% ACN. Meanwhile, these two one-step enrichment strategies were also compared with the traditional method using two individual enrichment steps (sequential method) that we described in our previous studies (Yang et al., 2017, 2018), in which the peptides were prepared by hydrophobic chemistry using a C18 column and eluted with 60% ACN. The eluted peptides were dried, and the IGPs were enriched by hydrophilic chemistry using a MAX column.

Next, we evaluated the performance of the one-step enrichment in enriching intact glycopeptides (IGPs) from bovine fetuin, which served as a model protein in this study. Briefly, the bovine fetuin was first digested with trypsin, and the tryptic peptides were enriched for IGPs by the one-step methods (mixed or stacked columns) and the traditional sequential glycopeptide enrichment method (sequential method), respectively. The peptides with and without glycopeptide enrichment were run on a Q-Exactive mass spectrometer using our previous IGP analysis parameters (Yang et al., 2018). The glycosite-containing database comprised three N-linked glycosite-containing peptides from bovine fetuin (P12763, up to two missed cleavage sites) (**Table S1**). For the N-glycan and O-glycan database, we performed the glycomics analysis of fetuin using the reported glycan permethylation method (Ciucanu and Costello, 2003) and also used the N-glycans identified in our previous report using the NGAG method (Sun et al., 2016). The glycosite-containing peptides and glycans generated databases containing 16 glycosite-containing peptides, 108 N-glycan structures (**Figure S1** and **Table S2 A**), and 3 O-glycan structures (**Figure S2**). From our GPQuest search result, all three known N-linked glycosites were identified by the global method without glycopeptide enrichment, the one-step method, and the sequential method for IGP enrichment (**Table S2**). From the heat maps of peptide-specific glycosylation distribution and global glycan distribution on fetuin, the global method without glycopeptide enrichment could only identify highly abundant glycosylated peptides on fetuin. The one-step methods were efficient for the enrichment of IGPs from all three glycosites, LCPDCPLLAPLN#DSR, RPTGEVYDIEIDTLETTCHVLDPTPLAN#CSVR, and VVHAVEVALATFNAESN#GSYLQLVEISR, while the sequential method preferentially enriched the IGPs from glycosite LCPDCPLLAPLN#DSR over those from the other two glycosites (**Figure S3**). The glycosite-specific IGPs carrying different glycans suggest that the sequential method is favorable for short glycosite-containing peptides, and that the long glycosite-containing peptides are lost in the sequential method, possibly due to the challenging elution of long glycopeptides from C18 particles with the 60% ACN elution solution that is typically used to elute peptides from C18 particles. In the one-step method, however, the peptides were eluted from C18 particles with 95% ACN solution, which is more beneficial for long peptides with high hydrophobicity. Based on our results, 95% ACN solution could be used for the elution of peptides from C18 columns in the future. Collectively, the one-step method (both mixed and stacked columns) can efficiently enrich IGPs from the mixture of peptides. It is also worth mentioning that the one-step method requires fewer sample preparation steps and less time compared to conventional sample preparation methods.

### Rapid Enrichment of Intact Glycopeptides From FUT8 Knockout CHO-K1 Cell by One-Step Enrichment Method

Furthermore, we compared the one-step and the sequential methods to analyze the glycosylation profile from glycoengineered CHO-K1 cells. The CHO cells are currently the primary production platform in biopharmaceutical manufacturing (Hossler et al., 2009). We evaluated the alterations of the N-linked glycosylation in CHO-K1 cells caused by FUT8 knockout (FUT8 KO). The core fucosylation-deficient cell line was generated by knocking out the FUT8 gene from the CHO-K1 cells (Wang et al., 2018). The FUT8 (α1,6-fucosyltransferase) catalyzes the transfer of a fucose residue to position 6 of the innermost GlcNAc residue of N-linked oligosaccharide (core fucosylation) of a glycoprotein (Miyoshi et al., 1999). The removal of core fucosylation of pharmaceutical glycoproteins, especially for mAbs, has positive effects on their potency and efficacy, such as antibody-dependent cellular cytotoxicity (Chung et al., 2017).

Because the stacked columns performed similarly to the mixed columns for the standard glycoproteins (**Figure S3**) but were easier to prepare, we enriched all the glycopeptides from cells using the stacked columns in the following experiments. The peptides without and with enrichment were analyzed with an Orbitrap Fusion Lumos Tribrid mass spectrometer. A total of 1,422 PSMs of IGPs, representing 1,118 unique IGPs, were detected by the one-step enrichment method, while 1,193 and 298 PSMs of IGPs, representing 469 and 198 unique IGPs, were detected by the sequential method or without enrichment (global method) (**Figure 2A** and **Table S3**). The Venn diagram of the IGPs and the glycans identified by these three strategies showed that most of the IGPs and all the glycans identified by the global analysis were also covered by both the one-step and the sequential methods. A total of 222 unique IGPs and 114 glycosite-containing peptides were shared by the one-step and the sequential enrichment methods. The number of IGPs identified only by the one-step method (234) is 4.9-fold higher than the number identified exclusively by the sequential method (48). For the glycans from the IGPs, the high-mannose N-glycans were identified by all three methods, and the sialylated glycans were most abundant in the one-step method (**Figure 2B**).

The results showed that the number of glycosite-containing peptides identified by the one-step method is 2.4-fold higher than that identified by the sequential method. However, the number and the types of glycans did not differ much (47 vs. 44) between the two methods (**Figure 2B**). This indicated that the difference in enrichment efficiency is caused by differences in the peptides. We next analyzed the properties of the identified peptides, including the length and the grand average of hydropathy (GRAVY; indicating the hydrophobic or hydrophilic properties). There were no significant differences in peptide length (including the distribution of peptide length) or GRAVY, but, as in the fetuin result in **Figure S3**, some long peptides containing more than 40 amino acids were identified only by the one-step methods. Some of the peptide GRAVYs from the sequential methods were higher than those of the one-step methods, which meant that the hydrophobicity of the peptide identified by the one-step methods is higher (**Figure 2C**).
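To make the GRAVY comparison concrete, the following minimal sketch computes peptide length and GRAVY for the three fetuin glycosite-containing peptides named earlier. It assumes the commonly used Kyte-Doolittle hydropathy scale; the source does not state which tool or scale was used, and the function name `gravy` is purely illustrative.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
# Assumption: the source does not name the scale; Kyte-Doolittle is the usual choice for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Return the grand average of hydropathy: summed hydropathy divided by length.

    The '#' glycosite marker used in the text is removed before scoring.
    """
    sequence = peptide.replace("#", "").upper()
    if not sequence:
        raise ValueError("empty peptide sequence")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


if __name__ == "__main__":
    fetuin_peptides = [
        "LCPDCPLLAPLN#DSR",
        "RPTGEVYDIEIDTLETTCHVLDPTPLAN#CSVR",
        "VVHAVEVALATFNAESN#GSYLQLVEISR",
    ]
    for peptide in fetuin_peptides:
        length = len(peptide.replace("#", ""))
        print(f"{peptide}: length={length}, GRAVY={gravy(peptide):.2f}")
```

A positive GRAVY indicates a more hydrophobic peptide and a negative value a more hydrophilic one, so the same function can be applied to any list of identified peptides to compare distributions between enrichment methods.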

### Analysis of Intact Glycopeptides From the Wild-Type and FUT8 KO CHO-K1 Cells by the One-Step Method

Given that the one-step column can efficiently enrich glycopeptides for glycoproteomic analysis of FUT8 KO CHO-K1 cells, we next compared the IGPs identified from wild-type (WT) and FUT8 KO CHO-K1 cells using the one-step method. Peptides were prepared from cells cultured in three biological replicates and pooled for glycopeptide enrichment using the one-step method. The enriched glycopeptides were analyzed twice using LC-MS/MS. We previously constructed a sample-specific glycopeptide database for the identification of N-linked glycosite-containing peptides using PNGase-F-released peptides (Yang et al., 2018). We used the previous CHO-K1 cell glycomic results as the glycan database (North et al., 2010). The databases used for the GPQuest search contain 57,653 potential glycosite-containing peptides and 343 glycan structure entries (**Table S4**). The spectra containing an oxonium ion (m/z 204.09) were chosen for further IGP searching. The results were filtered based on the following criteria: (1) FDR <1%, (2) ≥2 PSMs were required for each peptide, (3) all the PSMs should be annotated by at least one N-linked glycan, and (4) core Fuc fragment ions (peptide + HexNAcFuc) were required for core-fucosylated IGPs. Using these filtering criteria, 2,634 unique IGPs were identified from WT and FUT8 KO CHO-K1 cells, matching up to 459 glycosite-containing peptides, 243 glycoproteins, and 105 glycan compositions. Of the 2,634 unique IGPs, 1,273 IGPs were identified from WT CHO-K1 cells and 1,855 IGPs were identified from FUT8 KO CHO-K1 cells (**Table S5A**). Additionally, we noticed that the overlap between the WT and FUT8 KO CHO-K1 cells was 18.8% for IGPs, 48.1% for glycosite-containing peptides, 54.3% for proteins, and 57.1% for glycans (**Figure 3A**).
For the core fucosylation analysis, the FUT8 expression was significantly reduced, and only three PSMs of core-fucosylated IGPs were identified in FUT8 KO CHO-K1 cells, while 127 PSMs of core-fucosylated IGPs were identified in the WT CHO-K1 cells. In addition, the overall proportion of the other glycosylated forms of IGPs was also remarkably changed: the proportion of fucosylated IGPs was significantly decreased, whereas sialylated and high-mannose IGPs were increased in the FUT8 KO CHO-K1 cells (**Figure 3A**).
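The IGP overlap reported above can be checked from the three counts given in the text. The sketch below assumes that the overlap ratio is defined as the number of shared IGPs divided by the total number of unique IGPs (intersection over union); the source does not state the definition explicitly, but this assumption reproduces the reported 18.8%. The function name `overlap_ratio` is illustrative.

```python
def overlap_ratio(n_first: int, n_second: int, n_union: int) -> tuple[int, float]:
    """Return (shared count, shared / union) for two identification sets.

    Uses inclusion-exclusion: shared = |A| + |B| - |A union B|.
    Assumption: the overlap ratio is intersection over union.
    """
    shared = n_first + n_second - n_union
    return shared, shared / n_union


if __name__ == "__main__":
    # Counts from the text: 1,273 IGPs in WT, 1,855 in FUT8 KO, 2,634 unique IGPs in total.
    shared, ratio = overlap_ratio(1273, 1855, 2634)
    print(f"shared IGPs: {shared}, overlap: {ratio:.1%}")  # shared IGPs: 494, overlap: 18.8%
```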

For all the identified glycoproteins, we found that the glycosylation heterogeneity was also altered in FUT8 KO CHO-K1 cells. In the 243 glycoproteins identified from the CHO-K1 cells using the one-step methods, over 60% of the glycoproteins were identified with only one glycosite, and about 3.3% of the glycoproteins were identified with more than five glycosites (**Figure 3B**). For example, for the hyperglycosylated protein laminin subunit alpha-5 with 10 identified glycosites, we noticed that the relative abundance was not changed after the FUT8 KO, but the glycosylation at one glycosite (N2021CT) was lost and two neoglycosites (N102QT and N3111TT) were generated, which were mostly glycosylated by high-mannose N-glycans. For the core-fucosylated glycosites in WT cells, almost all the fucosylation was lost, and the high-mannose and complex glycans were increased (N79AS, N2211AS, and N2709AS). For the glycosite with the greatest microheterogeneity (N2570SS), the high-mannose and sialylated glycans were increased, which was similar to the tendency of glycosylation on all glycoproteins (**Figure 3C**).

FIGURE 2 | Enrichment of intact glycopeptides from FUT8 KO CHO cells by one-step and sequential methods. (A) Identification and distribution of intact glycopeptides, glycosites, and glycoproteins identified from FUT8 KO CHO cells by different methods. (B) Venn diagram of identified glycosites and their glycans by global, one-step, and sequential methods. (C) The length and grand average of hydropathy of identified glycopeptides enriched by one-step and sequential methods.

Comparing FUT8 KO with WT CHO-K1 cell samples, although the relative abundance of glycosite-containing peptides was similar, the IGP and glycan profiles were altered greatly. The sialylated and most of the high-mannose IGPs and glycans were increased in the FUT8 KO CHO-K1 cells compared to those in the WT CHO-K1 cells. Moreover, a heat map analysis showed the relative abundance of glycosite-containing peptides, IGPs, core-fucosylated IGPs, and glycans identified from the WT and FUT8 KO CHO-K1 cells (**Figure 3D**). These results together demonstrated that the knockout of FUT8 changed the glycosylation patterns of glycoproteins in CHO-K1 cells.

#### Relative Abundance Alteration of Glycoprotein Expression in FUT8 KO CHO-K1 Cells

Interestingly, not only were the N-glycotypes altered but also the relative abundance of the glycoproteins was affected in FUT8 KO CHO-K1 cells. We performed a comparative analysis of glycoproteins using label-free quantification methods based on spectral counting and normalized the data with the sum of spectra in each data set. A total of 40 glycoproteins were up-regulated (fold change >1.5) in the FUT8 KO CHO-K1 cells. Contactin-associated protein 1 and voltage-dependent calcium channel subunit alpha-2/delta-1 were the top two among them (**Figure S4**). In these up-regulated glycoproteins, the diversity of glycans was also increased significantly, especially for the high-mannose and sialylated complex glycans. Regarding relative abundance, the fold change of glycans on some up-regulated glycoproteins is higher than the fold change of the glycoprotein expression, such as the high-mannose glycan on the protein laminin subunit gamma-1 (5.3 vs. 1.85) (**Figure 4**). On the other hand, a total of 31 glycoproteins were down-regulated (fold change <0.67) in the FUT8 knockout CHO-K1 cells. Vesicular integral-membrane protein VIP36 (LMAN2) and CD166 antigen were the top two down-regulated glycoproteins. Intriguingly, we also noticed that the glycan diversity/abundance of these proteins decreased sharply as their relative glycoprotein abundance decreased. The relative abundance of some glycans changed more strongly than the abundance of their corresponding glycoproteins (**Figure 4** and **Table S5B**). For the glycoproteins with a consistent expression level between FUT8 KO and WT CHO-K1 cells, the glycosylation of these proteins in the FUT8 KO cells was also changed greatly, in that the high-mannose and sialylated complex glycans were increased compared to those in the WT CHO-K1 cells. Overall, the disruption of the FUT8 gene not only caused the alteration of glycosylation profile on glycoproteins but also regulated the glycoprotein expression intracellularly.
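The label-free comparison described above (spectral counts normalized to the sum of spectra in each data set, then classified with the fold-change cut-offs >1.5 and <0.67) can be sketched as follows. The protein names and counts in the usage example are hypothetical toy values, not data from this study, and the function names are illustrative.

```python
def normalize(spectral_counts: dict[str, int]) -> dict[str, float]:
    """Divide each protein's spectral count by the sum of spectra in the data set."""
    total = sum(spectral_counts.values())
    return {protein: count / total for protein, count in spectral_counts.items()}


def classify_regulation(wt_counts, ko_counts, up=1.5, down=0.67):
    """Label proteins found in both data sets by their KO/WT fold change.

    Thresholds follow the text: up-regulated if fold change > 1.5,
    down-regulated if fold change < 0.67.
    """
    wt = normalize(wt_counts)
    ko = normalize(ko_counts)
    calls = {}
    for protein in sorted(wt.keys() & ko.keys()):
        if wt[protein] == 0:
            continue  # fold change is undefined without a WT signal
        fold_change = ko[protein] / wt[protein]
        if fold_change > up:
            calls[protein] = ("up", fold_change)
        elif fold_change < down:
            calls[protein] = ("down", fold_change)
        else:
            calls[protein] = ("unchanged", fold_change)
    return calls


if __name__ == "__main__":
    # Hypothetical toy spectral counts for three proteins.
    wt_counts = {"protein_A": 10, "protein_B": 10, "protein_C": 20}
    ko_counts = {"protein_A": 30, "protein_B": 5, "protein_C": 25}
    for protein, (label, fc) in classify_regulation(wt_counts, ko_counts).items():
        print(f"{protein}: {label} (fold change {fc:.2f})")
```

Proteins detected in only one condition have no defined fold change under this scheme and would need to be reported separately, as the text does for the glycoproteins found exclusively in the FUT8 KO cells.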

Interestingly, the top four regulated glycoproteins are signal- and transmitting-related proteins. LMAN2 plays a role as an intracellular lectin in the early secretory pathway (Hara-Kuge et al., 1999). It interacts with GalNAc and high-mannose type glycans and is involved in the transport and sorting of glycoproteins (Yamashita et al., 1999). The down-regulation of LMAN2 could be a cellular adaptation to the increase in high-mannose glycoproteins. Furthermore, a total of 90 glycoproteins were exclusively expressed in the FUT8 KO CHO-K1 cells but were not detected in the WT CHO-K1 cells, such as dipeptidyl peptidase 2, laminin subunit alpha-2, and transforming growth factor beta-1 (**Table S5C**). These three proteins play important roles in cell functions, especially in cell surface receptor binding (www.uniprot.org/uniprot/). About 63 of these 90 glycoproteins were also identified by our previous proteomics analysis of the CHO-K1 cell (Baycin-Hizal et al., 2012). This finding indicated that the suppression of α1,6-fucosyltransferase induced some non-glycosylated proteins to become glycosylated in the FUT8 KO cells. Specifically, the majority (≥82%) of the intact glycopeptides identified from these 90 new glycoproteins contained high-mannose N-glycans, and only four glycoproteins did not contain any high-mannose N-glycan (**Figure S5**). This result indicates that these newly glycosylated proteins may not have been completely glycosylated when they were processed through the Golgi apparatus.

In addition, the pathways containing significantly regulated glycoproteins were further analyzed with the Database for Annotation, Visualization, and Integrated Discovery (DAVID) software using the complete Cricetulus griseus proteome as background (Huang et al., 2007). In the pathway analysis, the 40 up-regulated glycoproteins were involved in 11 pathways, including lysosome, metabolic pathways, glycan and glycosaminoglycan degradation, and PI3K-Akt signaling pathways. The 31 down-regulated glycoproteins were involved in seven pathways, such as lysosome, cell adhesion molecules, extracellular matrix–receptor interaction, and protein processing in the endoplasmic reticulum. The lysosome pathway was the most represented pathway group; the 11 up-regulated glycoproteins and the 7 down-regulated glycoproteins that participated in the lysosome pathway for protein degradation are displayed in **Figure S6A**. The other two most represented pathways are the metabolic pathway and the glycan and glycosaminoglycan degradation pathway, which included 10 and 7 glycoproteins, respectively. Interestingly, for the glycan and glycosaminoglycan degradation pathway, all seven glycoproteins were up-regulated in FUT8 KO CHO-K1 cells, and they all participated in N-glycan biosynthesis (Man2b1, Man2b2, Neu1, Glb1, Gba, and LOC100756951) and glycosphingolipid biosynthesis (Gns, Glb1, and LOC100756951) as glycosidases (**Figures S6B, C**).

### CONCLUSION

In summary, we developed and implemented a one-step method for the rapid and efficient enrichment of intact glycopeptides for mass spectrometry analysis. This method saved sample preparation time and improved the coverage of the glycopeptides for different glycosites. Furthermore, we characterized the differences in glycoprofiles between WT and FUT8 knockout CHO-K1 cells and observed that the knockout of FUT8 eliminates the core fucosylation in CHO-K1 cells and induces alterations of overall glycosylation. Moreover, we identified changes in the type and the amount of N-linked glycopeptides from WT and FUT8 knockout CHO-K1 cells, including the numbers of up-regulated and down-regulated glycoproteins that are involved in lysosome, metabolic, glycosylation, PI3K-Akt signaling, and other pathways. These findings indicated that future N-glycosylation profiling can be both enhanced in quality at the site-specific level and made more rapid by combining multi-column enrichment processes into a single column. More intriguingly, beyond site-specific glycosylation characterization, intact glycopeptide analysis can also reveal the intracellular glycoprotein dynamics, which is unachievable with conventional glycan analysis methods, and can give insights for bio-analysts and biotechnologists to better tailor therapeutic drugs in the coming decades.

### DATA AVAILABILITY STATEMENT

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.

## AUTHOR CONTRIBUTIONS

HZ, GY, and S-YC conceived the study. GY designed and performed the IGP enrichment and MS analysis and conceived the bio-informatic analysis. NH and QW performed cell culture experiments. MB provided the Fut8 mutant cell lines. S-YC and YZ performed and supervised glycoproteomics experiments. GY, HZ, QW, and MB wrote and revised the manuscript.

#### FUNDING

This work was supported by the National Cancer Institute, the Clinical Proteomic Tumor Analysis Consortium (CPTAC, Grant U24CA210985), the Early Detection Research Network (EDRN, U01CA152813), and the National Science Foundation (Grant 1512265).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2020.00240/full#supplementary-material


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Yang, Höti, Chen, Zhou, Wang, Betenbaugh and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Efficient *N*-Glycosylation of the Heavy Chain Tailpiece Promotes the Formation of Plant-Produced Dimeric IgA

Kathrin Göritzer<sup>1</sup>, Iris Goet<sup>1</sup>, Stella Duric<sup>1</sup>, Daniel Maresch<sup>2</sup>, Friedrich Altmann<sup>2</sup>, Christian Obinger<sup>2</sup> and Richard Strasser<sup>1</sup>\*

<sup>1</sup> Department of Applied Genetics and Cell Biology, Institute for Plant Biotechnology and Cell Biology, University of Natural Resources and Life Sciences, Vienna, Austria, <sup>2</sup> Division of Biochemistry, Department of Chemistry, University of Natural Resources and Life Sciences, Vienna, Austria

Production of monomeric IgA in mammalian cells and plant expression systems such as Nicotiana benthamiana is well-established and can be achieved by co-expression of the corresponding light and heavy chains. In contrast, the assembly of dimeric IgA requires the additional expression of the joining chain and remains challenging especially in plant-based systems. Here, we examined factors affecting the assembly and expression of HER2 binding dimeric IgA1 and IgA2m(2) variants transiently produced in N. benthamiana. While co-expression of the joining chain resulted in efficient formation of dimeric IgAs in HEK293F cells, a mixture of monomeric, dimeric and tetrameric variants was detected in plants. Mass-spectrometric analysis of site-specific glycosylation revealed that the N-glycan profile differed between monomeric and dimeric IgAs in the plant expression system. Co-expression of a single-subunit oligosaccharyltransferase from the protozoan Leishmania major in N. benthamiana increased the N-glycosylation occupancy at the C-terminal heavy chain tailpiece and changed the ratio of monomeric to dimeric IgAs. Our data demonstrate that N-glycosylation engineering is a suitable strategy to promote the formation of dimeric IgA variants in plants.

*Edited by:* Yanmei Li, Tsinghua University, China

#### *Reviewed by:*

Harald Kolmar, Darmstadt University of Technology, Germany Francisco Solano, University of Murcia, Spain

*\*Correspondence:* Richard Strasser richard.strasser@boku.ac.at

#### *Specialty section:*

This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry

*Received:* 20 January 2020 *Accepted:* 02 April 2020 *Published:* 22 April 2020

#### *Citation:*

Göritzer K, Goet I, Duric S, Maresch D, Altmann F, Obinger C and Strasser R (2020) Efficient N-Glycosylation of the Heavy Chain Tailpiece Promotes the Formation of Plant-Produced Dimeric IgA. Front. Chem. 8:346. doi: 10.3389/fchem.2020.00346

Keywords: glyco-engineering, glycosylation, immunoglobulin, monoclonal antibody, recombinant glycoprotein

### INTRODUCTION

Human immunoglobulin A (IgA) is the second most prevalent serum immunoglobulin after IgG and is the predominant antibody class in the external secretions of mucosal surfaces, where it serves as a first line of defense against invading pathogens. The human body expends a considerable amount of energy producing IgA variants thereby exceeding the daily production of all other immunoglobulin classes combined (Woof and Mestecky, 2005). This huge IgA demand highlights the importance of IgA in immune defense processes for which it is equipped with unique structural attributes of its heavy chain and its ability to form monomeric, dimeric and polymeric forms (**Figure 1**). While monomeric IgA is mainly found in serum and consists of two heavy chains (HC) and two light chains (LC), mucosal IgA is mainly dimeric whereby two IgA monomers are linked together by the incorporation of one joining chain (JC). In the dimeric IgA, the JC is covalently linked by disulfide bonds to the penultimate cysteine residue present in the C-terminal tailpiece of the IgA HC. The binding of the JC to the 18 amino acid long HC tailpiece is necessary for the formation of dimeric and polymeric IgA variants in the endoplasmic reticulum (ER) (Atkin et al., 1996; Yoo et al., 1999).

While recombinant IgGs are currently most widely used as therapeutic antibodies to combat infections or diseases, alternative antibody formats are gaining attention as potential biopharmaceuticals due to their specific structural properties and binding to different immune receptors (Loos et al., 2014; Brandsma et al., 2019; Montero-Morales et al., 2019). Polymeric antibody formats have a higher valency of antigen-binding sites which has several advantages compared to monomeric antibodies. Recombinant dimeric IgA against EGFR has been shown to be more effective in tumor cell killing than monomeric IgA or IgG1 due to the recruitment of a distinct repertoire of effector functions (Lohse et al., 2011). Moreover, dimeric IgA can bind to the polymeric immunoglobulin receptor (pIgR) and the pIgR-mediated transcytosis of dimeric IgA enables the access to therapeutic targets within the luminal side of mucosal tissues that are inefficiently targeted by current IgG therapeutics (Olsan et al., 2015). On the other hand, the formation of complexes between antiviral dimeric IgAs and viruses prevents the penetration of mucosal barriers (Ruprecht et al., 2019) and contributes to protection against mucosal virus challenge (Watkins et al., 2013). Together, these studies demonstrate the great potential of recombinant dimeric IgA as therapeutic agent. However, the production of dimeric IgA for clinical studies is technologically challenging due to the complex subunit assembly and extensive glycosylation (Vasilev et al., 2016).

Plants are attractive systems for the manufacturing of recombinant biopharmaceuticals including monoclonal antibodies (Stoger et al., 2014). Plant and mammalian cells share a common machinery for the biosynthesis and processing of N-glycans that is conserved up to the initiation of complex-type N-glycan formation in the cis/medial-Golgi (Strasser, 2016). On plant-produced recombinant glycoproteins, complex and truncated N-glycans with β1,2-xylose and core α1,3-fucose are frequently found. These non-human residues are potentially immunogenic and numerous strategies have been employed to prevent their attachment to N-glycans (Montero-Morales and Steinkellner, 2018). These efforts resulted in the formation of human-like N-glycan structures on recombinant glycoproteins including different immunoglobulin classes (Strasser et al., 2008; Loos et al., 2014; Göritzer et al., 2017; Montero-Morales et al., 2019) and showed that plants tolerate extensive engineering of their glycan structures. Different plant expression systems have been used to produce monomeric, dimeric or secretory IgA variants (Ma et al., 1995; Karnoup et al., 2005; Juarez et al., 2013; Virdi et al., 2013; Paul et al., 2014; Westerhof et al., 2014, 2015; Dicker et al., 2016; Göritzer et al., 2019).

In a previous study, we transiently expressed monomeric HER2 binding IgA1 and two IgA2 allotypes in leaves of N. benthamiana to examine the site-specific N-glycosylation (Göritzer et al., 2017). While we found no differences in glycosylation efficiency on most of the N-glycosylation sites compared to human cell-derived IgAs, the N-glycosylation site present in the tailpiece of all plant-produced IgAs was only 40–60% glycosylated. Here, we investigated how efficiently dimeric HER2 binding IgAs are produced in the transient N. benthamiana expression system and whether the N-glycosylation in the tailpiece plays a role in the assembly of multimeric IgA.

### MATERIALS AND METHODS

#### Construct Design and Cloning

All constructs used for the expression of monomeric human HER2 binding IgA1 and IgA2m(2) isotypes in N. benthamiana and HEK293F cells have been described in detail recently (Göritzer et al., 2017, 2019). The codon optimized gene for expression of the joining chain (JC) (AK312014.1) in either N. benthamiana or HEK293F cells was synthesized by GeneArt (Thermo Fisher Scientific, USA) and was flanked with the signal peptide from barley alpha-amylase (AAA98615) and the restriction sites XhoI/AgeI (for expression in N. benthamiana) or the signal peptide "MELGLSWIFLLAILKGVQC" and the restriction sites BamHI/SalI (for expression in HEK293F). The synthesized DNA fragments were cloned into the binary vector pEAQ-HT (Sainsbury et al., 2009) and the mammalian expression vector gWIZ (Genlantis, San Diego, CA) for expression of dimeric IgA in N. benthamiana and HEK293F, respectively, as described previously (Göritzer et al., 2017, 2019).

For the expression of the marginal zone B and B1-cell-specific protein (MZB1) (Q8WU39) in N. benthamiana, the codon-optimized MZB1 coding sequence was synthesized by GeneArt. A construct for the expression of a tagged MZB1 (mRFP-MZB1) was obtained by amplification with the primers "TATATCTAGAGATAGGGCTCCTCTTACTGCTA"/"TATAGGATCCTCAAAGTTCCTCTCTGGTAGC," digestion with the restriction enzymes XbaI/BamHI, and cloning into XbaI/BamHI digested p117 (Shin et al., 2017). For the expression of BiP2, the coding sequence was amplified from A. thaliana cDNA using the primers "TACTAGTATGGCTCGCTCGTTTGGAGCAAACAGCACT"/"TACTAGTCTAGAGCTCATCGTGAGACTCATCT" and subcloned using a Zero Blunt TOPO PCR Cloning Kit (Thermo Fisher Scientific, USA). The cloned fragment was excised by SpeI digestion and ligated into XbaI digested expression vector p42. Vector p42 is derived from pPT2M (Strasser et al., 2005) and carries the A. thaliana ubiquitin 10 promoter instead of the CaMV35S promoter and the sequence for the attachment of a 3x HA-tag plus the HDEL peptide at the C-terminus of the expressed protein. The CRT2 coding sequence was amplified from A. thaliana cDNA using the primers "TATATCTAGAATGGCGAAAATGATTCCTAGCC"/"TATAGGATCCAGCGGTGGCGTCTTTCTCAGAGG." The PCR product was XbaI/BamHI digested and cloned into the expression vector p59 (Schoberer et al., 2019) to express CRT2 fused to mRFP-HDEL. CNX1 was amplified from A. thaliana cDNA using "TATATCTAGAGACGATCAAACGGTTCTGTATG"/"TATAGGATCCCTAATTATCACGTCTCGGTTGCC," XbaI/BamHI digested and cloned into expression vector p110 (same vector as p117 but with a kanamycin resistance gene for selection in plants) to express an mRFP-CNX1 variant. For endoplasmic reticulum resident protein 44 (ERp44) expression, a codon-optimized coding sequence of human ERp44 (CAC87611) including the sequence coding for the signal peptide from barley alpha-amylase was synthesized by GeneArt and cloned into the XbaI/BamHI sites of pPT2M.
The expression construct of the single-subunit oligosaccharyltransferase from Leishmania major (LmSTT3D) has been described previously (Castilho et al., 2018).
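As a quick sanity check on the cloning strategy above, the following minimal sketch scans a primer for the recognition sequences of the restriction enzymes used with the primers (XbaI, BamHI and SpeI). The recognition sequences are standard; the helper name `find_sites` is illustrative, and the two example primers are the MZB1 forward primer and the CRT2 reverse primer quoted in the text.

```python
# Recognition sequences of the enzymes used with the primers in this section.
RECOGNITION_SITES = {
    "XbaI": "TCTAGA",
    "BamHI": "GGATCC",
    "SpeI": "ACTAGT",
}


def find_sites(primer: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition site found in the primer."""
    sequence = primer.upper().replace(" ", "")
    hits: dict[str, list[int]] = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start)
            start = sequence.find(site, start + 1)  # allow overlapping matches
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    primers = {
        "MZB1 forward": "TATATCTAGAGATAGGGCTCCTCTTACTGCTA",
        "CRT2 reverse": "TATAGGATCCAGCGGTGGCGTCTTTCTCAGAGG",
    }
    for name, primer in primers.items():
        print(name, find_sites(primer))
```

Both primers carry a four-nucleotide 5' overhang ("TATA") ahead of the restriction site, which is a common design choice to help the enzyme cut close to the end of a PCR product.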

#### Expression and Purification of Dimeric IgA

For the expression of different recombinant monomeric and dimeric IgA isotypes in 5- to 6-week-old N. benthamiana 1XT/FT plants, syringe-mediated agro-infiltration was used (Strasser et al., 2008; Göritzer et al., 2017). To obtain dimeric IgA variants, the κ-LC and respective α-HC were co-infiltrated with the JC at an OD<sub>600</sub> of 0.1 or 0.2. Chaperones were co-infiltrated at an OD<sub>600</sub> of 0.05. To increase the N-glycosylation occupancy, IgAs were co-infiltrated with LmSTT3D at an OD<sub>600</sub> of 0.1. After 4 days, infiltrated leaf material was harvested and the clarified crude extract was prepared for IgA purification as described previously (Göritzer et al., 2017). For the transient expression of monomeric and dimeric IgA isotypes in HEK293F cells, cultures were transfected with the κ-LC, the different α-HCs and JC constructs in a 1:1:0 and 1:1:0.5 ratio of µg DNA, respectively, as described (Göritzer et al., 2019). Finally, IgA from clarified N. benthamiana 1XT/FT leaf extract and supernatant of HEK293F cells was purified with IgA CaptureSelect affinity resin (Thermo Fisher Scientific, US), followed by a size-exclusion chromatography step (Göritzer et al., 2017).

#### SDS-PAGE

For reducing or non-reducing SDS-PAGE, 2.5 µg of purified protein was loaded on a 4–15% Mini-PROTEAN® TGX™ gel (Bio-Rad Laboratories, USA) and detected by Coomassie Brilliant Blue staining.

### Size-Exclusion Chromatography Coupled to Multi-Angle Light Scattering (SE-HPLC-MALS)

To investigate the oligomeric state, conformational integrity and molecular weight of purified IgAs, high-performance liquid chromatography (HPLC) on a size-exclusion chromatography column (Superdex 200 10/300 GL column, GE Healthcare, USA) combined with multi-angle light scattering was carried out as described previously (Göritzer et al., 2017). The HPLC system (Shimadzu prominence LC20) was equipped with MALS (WYATT Heleos Dawn8+ QELS; software ASTRA6), a refractive index detector (RID-10A, Shimadzu) and a diode array detector (SPD-M20A, Shimadzu). Ratios of monomeric, dimeric and polymeric IgA were determined by peak integration using LabSolutions Data Analysis (Shimadzu) software.
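The peak-integration step can be illustrated with a minimal sketch: integrate the detector signal over a retention-time window for each species and express each area as a fraction of the summed areas. This is only an illustration of the principle, not the LabSolutions procedure used in the study; the window boundaries and the toy chromatogram below are hypothetical, and no baseline correction is applied.

```python
def window_area(times, signal, start, end):
    """Trapezoidal area of the signal between start and end (inclusive bounds).

    Only intervals lying completely inside the window are summed.
    """
    area = 0.0
    for i in range(1, len(times)):
        t0, t1 = times[i - 1], times[i]
        if t0 >= start and t1 <= end:
            area += 0.5 * (signal[i - 1] + signal[i]) * (t1 - t0)
    return area


def species_fractions(times, signal, windows):
    """Return each species' peak area as a fraction of the total integrated area."""
    areas = {
        name: window_area(times, signal, start, end)
        for name, (start, end) in windows.items()
    }
    total = sum(areas.values())
    return {name: area / total for name, area in areas.items()}


if __name__ == "__main__":
    # Hypothetical chromatogram: retention time (min) and detector signal (arbitrary units).
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    signal = [0.0, 2.0, 0.0, 1.0, 0.0]
    # Hypothetical windows; larger species elute earlier in size-exclusion chromatography.
    windows = {"dimer": (0.0, 2.0), "monomer": (2.0, 4.0)}
    for name, fraction in species_fractions(times, signal, windows).items():
        print(f"{name}: {fraction:.1%}")
```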

### ELISA

Purified human HER2 (residues 1–631) was provided by Elisabeth Lobner (University of Natural Resources and Life Sciences, Vienna). For antigen-binding experiments of monomeric, dimeric and polymeric IgA variants ELISA was performed as described recently (Göritzer et al., 2017).

### Surface Plasmon Resonance (SPR) Spectroscopy

Binding experiments of monomeric and dimeric IgA variants to FcαRI were performed with surface plasmon resonance spectroscopy using a Biacore T200 (GE Healthcare Life Sciences, Sweden). Recombinant soluble FcαRI was available from a previous study (Göritzer et al., 2019). All measurements were conducted with a Protein L sensor chip (GE Healthcare Life Sciences, Sweden) as described recently (Göritzer et al., 2019). Binding affinities (K<sub>D</sub>) were calculated with the Biacore T200 Evaluation software using a 1:1 binding model. All experiments were repeated as three independent kinetic runs.
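For readers unfamiliar with the 1:1 binding model, the sketch below shows the two textbook relationships behind it: the equilibrium dissociation constant is the ratio of the dissociation and association rate constants, and the equilibrium response follows a single-site saturation curve. The rate constants and response values in the example are hypothetical and are not taken from this study; the evaluation software fits these parameters to the measured sensorgrams.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_D (M) for a 1:1 interaction: k_off (1/s) divided by k_on (1/(M*s))."""
    return k_off / k_on


def equilibrium_response(concentration: float, k_d: float, r_max: float) -> float:
    """Steady-state response of a 1:1 model: R_eq = R_max * C / (K_D + C)."""
    return r_max * concentration / (k_d + concentration)


if __name__ == "__main__":
    # Hypothetical rate constants, chosen only to illustrate the arithmetic.
    k_on = 1.0e5    # association rate constant, 1/(M*s)
    k_off = 1.0e-3  # dissociation rate constant, 1/s
    k_d = dissociation_constant(k_on, k_off)
    print(f"K_D = {k_d:.2e} M")
    # At an analyte concentration equal to K_D, half of the maximal response is reached.
    print(f"R_eq at C = K_D: {equilibrium_response(k_d, k_d, r_max=100.0):.1f} RU")
```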

#### *N*-Glycan Analysis

A total of 20 µg of purified protein was reduced, S-alkylated and digested with trypsin (Promega, USA). Glycopeptides were then analyzed by capillary reversed-phase chromatography and electrospray ionization mass spectrometry using a Bruker Maxis 4G Q-TOF instrument (Göritzer et al., 2017). Site-specific glycosylation occupancy was calculated using the ratio of deamidated to unmodified peptide determined upon N-glycan release with PNGase A (Europa Bioproducts).
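The occupancy calculation can be sketched as follows. PNGase treatment converts the formerly glycosylated asparagine to aspartate, so the deamidated peptide reports on the glycosylated fraction and the unmodified peptide on the non-glycosylated fraction. The formula below, deamidated signal divided by the sum of both signals, is an assumption about how the ratio is turned into a percentage; it also ignores any spontaneous deamidation. The intensities in the example are hypothetical.

```python
def glycosylation_occupancy(deamidated: float, unmodified: float) -> float:
    """Fraction of a site that was glycosylated, from peptide signal intensities.

    Assumption: occupancy = deamidated / (deamidated + unmodified).
    """
    total = deamidated + unmodified
    if total <= 0:
        raise ValueError("no signal for this glycosite")
    return deamidated / total


if __name__ == "__main__":
    # Hypothetical peak intensities for one glycosite-containing peptide.
    occupancy = glycosylation_occupancy(deamidated=6.0e6, unmodified=4.0e6)
    print(f"occupancy: {occupancy:.0%}")
```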

### RESULTS

### Dimeric IgA Variants Are Less Efficiently Formed in *N. benthamiana*

To obtain a better understanding of dimeric IgA assembly, HER2-binding monomeric and dimeric IgA1 and IgA2m(2) (**Figure 1**) were transiently expressed in HEK293F cells and glyco-engineered N. benthamiana 1XT/FT plants. To this end, the κ-LC and respective α-HC were co-expressed in the presence and absence of the JC, followed by affinity purification and analysis of the assembly using SE-HPLC coupled to multi-angle light scattering (MALS). This allowed the determination of the molecular mass of the proteins in solution and quantification of the relative amounts of the different species using peak integration. Size-exclusion chromatograms showed that relatively pure monomers of IgA1 and IgA2m(2) with a mass of ∼160 kDa are produced in the absence of the JC. In both expression systems, only small amounts of IgA with a molecular weight >160 kDa could be observed (**Figure 2**). Cotransfection of the JC resulted in almost complete formation of dimeric IgAs with a molecular mass of around 360 kDa in HEK293F cells. By contrast, a mixture of monomeric, dimeric and polymeric species was observed in plants. The assembly of dimeric IgA1 appeared to be more efficient than the assembly of dimeric IgA2m(2). The formation of polymeric IgA, however, was dependent on the relative amount of JC cotransfected with the κ-LC and α-HC and on the harvesting time after infiltration. Increasing ratios of JC to κ-LC and α-HC in the infiltration mix resulted in a decreased percentage of polymeric IgA. Furthermore, a later harvesting point yielded higher amounts of polymeric IgA (**Figure 2** and **Figure S1**).

FIGURE 3 | Biophysical and functional characterization of recombinant monomeric, dimeric and tetrameric IgA. (A) Overlay of normalized SE-HPLC-MALS chromatograms of affinity and gel-filtration purified IgA1 and IgA2m(2) monomers, dimers and tetramers produced in HEK293F cells and N. benthamiana 1XT/FT plants. (B) SDS-PAGE under reducing (+DTT) and non-reducing conditions of purified monomeric (m) and dimeric (d) IgA1 and IgA2m(2) produced in N. benthamiana 1XT/FT plants followed by Coomassie Brilliant Blue staining. (C) Binding of the IgA variants to the antigen HER2. The EC<sub>50</sub> values were determined as the mean ± standard deviation from three independent measurements. "m" monomeric, "d" dimeric, "t" tetrameric IgA. (D) Binding affinities of IgA1 and IgA2m(2) monomers and dimers to FcαRI. K<sub>D</sub> values were obtained by SPR spectroscopy in single-cycle kinetic experiments from three independent measurements. Error bars represent standard deviation.

FIGURE 4 | N-glycan analysis of the α-HC and JC from purified monomeric and dimeric IgA1. (A) Representative MS-spectra ([M+3H]<sup>3+</sup>) of the tryptic glycopeptide containing the CH2 resident NLT glycosylation site of the α-HC of HEK293F- and plant-produced IgA1. (B) Representative MS-spectra ([M+2H]<sup>2+</sup> and [M+3H]<sup>3+</sup>) of the tryptic glycopeptide containing the single NIS glycosylation site of the HEK293F- and plant-produced dimeric IgA1 joining chain. N-glycans are abbreviated according to the ProGlycAn system (www.proglycan.com). The most abundant glycoform is highlighted in blue and illustrated with cartoons.

#### Monomeric and Dimeric/Tetrameric IgA Display Similar Antigen and Receptor Binding Affinities

To determine the biophysical and biochemical properties of different IgA forms we up-scaled the production in both expression systems, followed by separate isolation of monomeric, dimeric and tetrameric species after affinity chromatography using preparative size-exclusion chromatography. The analytical profiles of all purified variants from the two different expression systems gave narrow single and monodisperse peaks (**Figure 3A**). The masses of these peaks were confirmed by MALS and represent fully assembled monomeric, dimeric and tetrameric IgA (present in plant-derived variants), with masses corresponding well to their theoretical masses. The SDS-PAGE of plant-produced monomeric and dimeric IgA1 and IgA2m(2) under reducing conditions confirmed the presence of the κ-LC and α-HC without the presence of degradation products (**Figure 3B**). Interestingly, under reducing conditions a small shift in the migration behavior of the α-HC of dimeric compared to monomeric IgA1 and IgA2m(2) was observed, which could arise from the presence of differentially processed N-glycans on monomeric and dimeric IgA. The JC (15 kDa) of dimeric variants could not be detected on the gel likely due to its low abundance. Under non-reducing conditions purified monomeric and dimeric IgA1 and IgA2m(2) variants showed a predominant band in the range of the expected molecular mass of 160 kDa for each monomeric variant and at 320 kDa for each dimeric variant, representing the assembled forms.

Next, we investigated the functionality of all purified IgAs in terms of binding to the antigen HER2. ELISA experiments were performed and the half-maximal effective concentrations (EC<sub>50</sub>) were determined (**Figure 3C**). The antigen-binding behavior of monomeric and dimeric IgA1 and IgA2m(2) from plant and mammalian hosts was essentially the same; only tetrameric IgA2m(2) expressed in N. benthamiana showed slightly decreased binding to the antigen.
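The meaning of EC<sub>50</sub> can be made concrete with a short sketch. The text does not state which curve model was fitted to the ELISA data; the four-parameter logistic used here is a common choice and is an assumption, as are the function name and all parameter values.

```python
def four_parameter_logistic(conc: float, bottom: float, top: float,
                            ec50: float, hill: float = 1.0) -> float:
    """Signal predicted by a four-parameter logistic dose-response curve.

    Assumption: this model is illustrative only; the curve model actually
    used for the ELISA data is not given in the source.
    """
    if conc <= 0 or ec50 <= 0:
        raise ValueError("concentrations must be positive")
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


# Hypothetical parameters: at conc == ec50 the signal lies halfway
# between the lower and upper plateaus.
midpoint = four_parameter_logistic(conc=2.0, bottom=0.1, top=2.1, ec50=2.0, hill=1.5)
```

By construction the EC<sub>50</sub> is the concentration at which the response is halfway between the two plateaus, independent of the slope parameter.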

The monomeric and dimeric variants of IgA1 and IgA2m(2) were further tested for binding to the Fcα-receptor (FcαRI) using surface plasmon resonance (SPR) spectroscopy (**Figure 3D**). To this end, the different IgA variants were immobilized on a Protein L chip in an oriented manner with the Fc-domain pointing toward the solution. Five increasing concentrations of FcαRI were injected in single-cycle kinetic experiments. The obtained response units suggested a 1:1 binding stoichiometry for monomeric and dimeric IgA variants to the receptor, and curves were fitted accordingly. The K<sub>D</sub> values of around 110 nM and 170 nM obtained for the HEK293F- and plant-derived monomeric IgA1 and IgA2m(2) variants, respectively, corresponded to the previously reported values using this setup (Göritzer et al., 2019). Rapid association and dissociation rates were characteristic of the interaction of FcαRI with all IgA variants, whereas a decreased association rate was observed for dimeric IgAs, resulting in a slightly reduced binding affinity compared to the monomeric IgA variants (**Figure 3D** and **Table S1**).

#### The *N*-Glycans of Plant-Derived Dimeric IgAs Are Different From Monomeric Variants

The shift in mobility between monomeric and dimeric forms seen in the reducing SDS-PAGE of IgA1 and IgA2m(2) produced in N. benthamiana may arise from differential glycosylation. There are two and five N-glycosylation sites in IgA1 and IgA2m(2), respectively (**Figure 1**). In addition, IgA1 has up to 6 O-glycosylation sites in the proline-rich hinge region (Royle et al., 2003; Göritzer et al., 2017). To assess the N-glycosylation status of purified monomeric and dimeric IgA1 and IgA2m(2) produced in N. benthamiana and HEK293F cells, the purified proteins were digested with trypsin and analyzed by LC-ESI-MS for site-specific N-glycosylation and the presence of modifications within the IgA1 hinge region. The N-glycans found on plant-produced monomeric IgA1 showed biantennary complex-type structures like GlcNAc<sub>1</sub>Man<sub>3</sub>GlcNAc<sub>2</sub> (MGn/GnM), GlcNAc<sub>2</sub>Man<sub>3</sub>GlcNAc<sub>2</sub> (GnGn) and the paucimannosidic Man<sub>3</sub>GlcNAc<sub>2</sub> (MM) as major glycoforms. In addition, small amounts of oligomannosidic N-glycans were detected on some of the sites (**Figure 4A** and **Figures S2, S3**). HEK293F-produced monomeric IgA variants have a more diverse profile with different amounts of branched or sialylated complex N-glycans. While most of the complex N-glycans are fucosylated (**Figures S2, S3**), the conserved NLT site (**Figure 1**) lacks core fucose on monomeric as well as on the dimeric HEK293F-derived variants (**Figure 4A**). This finding is in accordance with the site-specific differences occurring on the monomeric IgA isotypes that have been described (Göritzer et al., 2017). Generally, the N-glycosylation profile of HEK293F-produced dimeric IgAs was similar to that of monomeric IgAs. For plant-produced dimeric IgAs a clear shift from paucimannosidic (MM structures) toward fully processed complex N-glycans (GnGn structures) was observed for the NLT site of dimeric IgA1 (**Figure 4A**), as well as for the NVT, NSS, and NIT sites of dimeric IgA2m(2) (**Figure S3**). By contrast, O-glycans present in the IgA1 hinge region appeared similar in monomeric and dimeric variants (**Figure S4**).

Furthermore, we were able to detect the single glycopeptide corresponding to the JC of HEK293F- and plant-produced dimeric IgA variants (**Figure 4B** and **Figure S5**). The N-glycan profile of the single site in the JC of HEK293F-produced IgA1 and IgA2m(2) showed a high heterogeneity with high levels of branching and incomplete sialylation and some peaks corresponding to core-fucosylated and hybrid N-glycans. In plant-produced JC derived from purified dimeric IgA1 and IgA2m(2) variants, the N-glycans were more homogenous with the GnGn-type complex N-glycan and oligomannosidic structures as major glycoforms and low levels of unglycosylated JC. The presence of oligomannosidic N-glycans suggests incomplete processing of the JC N-glycans in the Golgi of plants.

#### Co-expression of ER-Resident Proteins Increased the Overall Yield, but Did Not Improve Dimeric IgA Formation in *N. benthamiana*

Despite the JC co-expression, plants were less efficient in assembly of dimeric IgAs compared to HEK293F cells with still large amounts of monomeric IgA present. Therefore, we investigated whether this limitation can be overcome by co-expression of different ER-resident proteins including the chaperone BiP, which is known to play a role for the antibody assembly (Haas and Wabl, 1983), the protein disulfide isomerase ERp44 and the lectins calnexin (CNX) and calreticulin (CRT). Human ERp44 binds to the tailpiece of the IgM HC, which is quite similar to the IgA tailpiece and promotes IgM polymerization in the ER of mammalian cells (Cortini and Sitia, 2010). The lectins CNX and CRT bind to immature N-glycans and promote the folding of glycosylated proteins (Hammond et al., 1994). In addition, we co-expressed the human marginal zone B and B-1 cell-specific protein MZB1, which was recently shown to promote JC binding and dimeric IgA assembly in mammalian cells (Xiong et al., 2019). The κ-LC, α-HC, and JC were co-infiltrated with either Arabidopsis BiP2, CRT2 mRFP, mRFP-CNX1, human ERp44, or mRFP-MZB1. None of these ER-resident proteins increased the relative amount of dimeric to monomeric IgA (**Figure S6**). However, upon using BiP2, CRT2, CNX1, and ERp44 higher yields could be achieved with up to 2-fold increase of purified IgA per gram fresh weight (**Figure 5A**).

#### Increased *N*-Glycosylation of the Tailpiece Promotes Dimeric IgA Formation in *N. benthamiana*

Previous studies in mammals have indicated an important role of the tailpiece N-glycan for JC incorporation (Atkin et al., 1996; Sørensen et al., 2000). In plant-produced IgAs, the tailpiece is incompletely glycosylated, which might contribute to inefficient dimeric IgA formation (Westerhof et al., 2015; Göritzer et al., 2017; Castilho et al., 2018). Underglycosylation was even more pronounced in the IgA2m(2) isotype, which also exhibited higher amounts of non-assembled monomeric IgA when co-infiltrated with the JC. In a previous study we have shown that it is possible to overcome the reduced glycosylation efficiency by co-expression of LmSTT3D, a single subunit oligosaccharyltransferase from the protozoan Leishmania major (Castilho et al., 2018). Using this approach the occupancy of the tailpiece N-glycosylation site of dimeric IgA1 and IgA2m(2) increased from 61.1 ± 0.9% to 90.0 ± 0.4% and from 43.5 ± 2.7% to 87 ± 2.2%, respectively (**Figure 5B**). However, the almost complete N-glycosylation in the tailpiece did not fully compensate for the reduced dimeric IgA formation compared to HEK293F-produced IgA variants. The LmSTT3D co-expression led to a 17 ± 2.7% increase of dimeric to monomeric IgA for IgA1 and an increase of 22.7 ± 6.4% for IgA2m(2) (**Figure 5C**). The assembly was also not further improved when MZB1 and LmSTT3D were co-expressed together with IgA variants (**Figure S7**), suggesting the involvement of additional factors for dimeric IgA formation that are missing in plants.

### DISCUSSION

In this study we examined factors that affect the dimeric IgA formation in N. benthamiana, which is currently one of the most widely used plants for recombinant protein expression and glyco-engineering (Bally et al., 2018; Montero-Morales and Steinkellner, 2018). IgAs are heavily glycosylated and distinct glycoforms contribute to the overall thermal stability of IgAs (Göritzer et al., 2017), the in vivo half-life (Rouwendal et al., 2016) and effector functions (Steffen et al., 2020). Moreover, not only is the N- as well as O-glycan composition different between plant- and mammalian cell-derived IgAs, but also the degree of glycosylation in the single tailpiece N-glycosylation site (Göritzer et al., 2017; Castilho et al., 2018). This underglycosylation at the C-terminal end of the α-HC may be caused by unknown differences in the oligosaccharyltransferase function between mammals and plants (Strasser, 2016; Shrimal and Gilmore, 2019). Mammals contain two distinct OST complexes and the STT3B complex post-translationally glycosylates acceptor sites at extreme C-terminal regions (Ruiz-Canada et al., 2009; Shrimal et al., 2013). Plants also have two distinct STT3 subunits, but their role in co- and post-translational N-glycosylation is unknown (Koiwa et al., 2003). Previously it was suggested that partial N-glycosylation of the α-HC tailpiece and/or the JC impairs secretory IgA assembly in plants (Westerhof et al., 2015). Our data demonstrate that the N-glycosylation in the tailpiece is critical for the formation of dimeric IgAs. Hence, engineering of the plant oligosaccharyltransferase complex toward a more mammalian-like function is a strategy to improve the production of dimeric IgAs. While the site-specific N-glycosylation of plant-produced α-HC and the secretory component were shown previously (Dicker et al., 2016; Göritzer et al., 2017), we could determine for the first time the N-glycan profile of a plant-produced JC incorporated into dimeric IgA. In contrast to the α-HC tailpiece, an almost complete occupancy was found at the single JC site, suggesting that the JC N-glycosylation does not contribute to the reduced amounts of dimeric IgAs in plants.

Previously it was revealed that the JC incorporation is the limiting factor for secretory IgA formation (Westerhof et al., 2015). This is consistent with our findings, where we did not observe an increase of dimeric IgA when the amount of infiltrated JC was varied. Along with N-glycosylation, our data indicate that there are other factors contributing to the IgA dimerization. When we tested the effect of plant and mammalian ER-resident proteins involved in protein folding and assembly, we did not observe improved dimeric IgA formation. However, the overall yield of the produced IgAs was increased. This finding suggests that the plant proteins BiP, CNX, or CRT are not specifically involved in dimeric IgA formation, but in the assembly or stabilization of the IgA α-HC and κ-LC, resulting in higher amounts of monomeric and dimeric IgA. Human ERp44 expression clearly increased the yield, but had no effect on the IgA polymerization, suggesting that it is not required for this process in plants. Despite being expressed and correctly retained in the ER in plants, tagged human MZB1 neither affected the yield nor improved the dimeric IgA formation. The precise mechanism for the MZB1-mediated incorporation of the JC into dimeric IgA is currently unknown (Xiong et al., 2019). There is no homolog of MZB1 present in plants and it is possible that the protein is not functional in N. benthamiana because a potential interaction partner is missing. MZB1 is a member of a large ER-chaperone complex that includes BiP as well as other folding assistants (Shimizu et al., 2009). Expression of MZB1 variants or the combination of MZB1 with ER-resident mammalian chaperones could be tested in the future to increase the assembly and JC incorporation. In this regard, the use of multi-cassette vectors carrying the α-HC, κ-LC, JC as well as potential chaperones on the same vector could overcome unbalanced expression in cells (Westerhof et al., 2015) and further boost the dimeric IgA formation.

In previous studies it was discovered that plant-produced IgAs are poorly secreted and the tailpiece harbors a sequence motif for sorting to the vacuole (Hadlington et al., 2003; Paul et al., 2014; Westerhof et al., 2014). We analyzed the amount of secreted IgAs in the presence or absence of co-expressed JC and could not detect an appreciable difference in secretion (**Figure S8**). The majority of monomeric and dimeric IgA remained inside the cells, while the IgA in the apoplast was degraded. The absence of major degradation products for the IgA variants inside the cells suggests that they are not targeted to the vacuole. Further studies are required to unravel the subcellular compartment where the majority of the monomeric and dimeric IgAs accumulate.

We were interested to examine whether the N-glycan composition is different for monomeric or dimeric IgA variants as reported for monomeric and polymeric serum IgA (Oortwijn et al., 2006). On several N-glycosylation sites we observed distinct changes in the N-glycan composition with a decrease of truncated N-glycans on plant-produced dimeric IgAs. These structures are likely generated in a post-Golgi compartment by β-hexosaminidases (Shin et al., 2017) and differences between monomeric and dimeric IgAs can be explained by changes in the accessibility of the N-glycans due to the dimer formation and incorporation of the JC. Whether these changes in N-glycan composition cause differences in binding to specific receptors needs to be tested in the future. Previously, we have shown that monomeric IgAs with different N-glycans display comparable binding affinities to FcαRI (Göritzer et al., 2019). Here, we performed a kinetic analysis of the dimeric IgA variants and observed affinities and kinetics similar to those of monomeric IgA. The N-glycan composition of dimeric IgA likely does not contribute to FcαRI binding and the JC incorporation does not cause steric hindrance as shown for the secretory component (Vidarsson et al., 2001). Moreover, the deduced 1:1 binding stoichiometry for dimeric IgA variants to the receptor is consistent with models proposing that one dimeric IgA binds only two FcαRI, although one dimeric IgA has four binding sites (Bonner et al., 2008; Breedveld and van Egmond, 2019).

In conclusion, we show that functional dimeric IgA that binds to the antigen as well as to the most relevant IgA receptor can be produced by transient expression in N. benthamiana. The inefficient formation of dimeric IgAs is partly caused by underglycosylation of the N-glycosylation site in the tailpiece as well as by other factors that need to be uncovered to make plants a suitable expression system for this important class of monoclonal antibodies.

### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/**Supplementary Material**.

## AUTHOR CONTRIBUTIONS

KG, IG, SD, and DM conducted the experiments. KG, IG, DM, and RS analyzed the results. KG, FA, CO, and RS conceived and supervised the experiments. KG and RS wrote the paper. All authors have made a substantial and intellectual contribution to the work and approved it for publication.

## FUNDING

This work was supported by the Austrian Science Fund (FWF) [Doctoral Program BioToP−410 Biomolecular Technology of Proteins (W1224)] and by the FWF Project P31920-B32.

## ACKNOWLEDGMENTS

We thank Professor George Lomonosoff (John Innes Centre, Norwich, UK) and Plant Bioscience Limited (PBL) (Norwich, UK) for supplying the pEAQ-HT expression vector. We thank Andreas Loos, Yun-ji Shin, and Christiane Veit (Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences) for assisting in cloning of expression vectors for the ER-chaperones and the BOKU Core Facility Biomolecular & Cellular Analysis for help with SPR analysis.

#### REFERENCES


#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2020.00346/full#supplementary-material



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Göritzer, Goet, Duric, Maresch, Altmann, Obinger and Strasser. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Improved Synthesis of Sulfur-Containing Glycosides by Suppressing Thioacetyl Migration

Tao Luo<sup>1</sup>, Ying Zhang<sup>1</sup>, Jiafeng Xi<sup>2</sup>, Yuchao Lu<sup>2</sup>\* and Hai Dong<sup>1</sup>\*

*<sup>1</sup> Key Laboratory for Large-Format Battery Materials and System, Ministry of Education, School of Chemistry & Chemical Engineering, Huazhong University of Science and Technology, Wuhan, China, <sup>2</sup> Analysis Center of College of Science & Technology, Hebei Agricultural University, Huanghua, China*

Complex mixtures were often observed when we attempted to synthesize 4-thio- and 2,4-dithio-glycoside derivatives by double parallel and double serial inversion, thus leading to no or low yields of target products. The reason was later found to be that many unexpected side products were produced when a nucleophile substituted the leaving group on the substrate containing the thioacetate group. We hypothesized that thioacetyl migration is prone to occur because the thioacetate group is labile even under the weakly basic conditions caused by the nucleophile, leading to this result. Therefore, we managed to inhibit the generation of thiol groups from thioacetate groups by the addition of an appropriate amount of conjugate acid/anhydride, successfully improving the synthesis of 4-thio- and 2,4-dithio-glycoside derivatives. The target products, which were previously difficult to synthesize, were obtained in relatively high yields. Finally, 4-deoxy- and 2,4-dideoxy-glycoside derivatives were efficiently synthesized through the removal of thioacetate groups under UV light, starting from 4-thio- and 2,4-dithio-glycoside derivatives.

#### Edited by:

*Zhongping Tan, Chinese Academy of Medical Sciences and Peking Union Medical College, China*

#### Reviewed by:

*George Kokotos, National and Kapodistrian University of Athens, Greece Martin D. Witte, University of Groningen, Netherlands*

#### \*Correspondence:

*Yuchao Lu Luyuchao@hebau.edu.cn Hai Dong hdong@mail.hust.edu.cn*

#### Specialty section:

*This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry*

Received: *12 December 2019* Accepted: *30 March 2020* Published: *23 April 2020*

#### Citation:

*Luo T, Zhang Y, Xi J, Lu Y and Dong H (2020) Improved Synthesis of Sulfur-Containing Glycosides by Suppressing Thioacetyl Migration. Front. Chem. 8:319. doi: 10.3389/fchem.2020.00319*

Keywords: sulfur-containing glycoside, acetyl group migration, 4-deoxy glycoside, 2,4-dideoxy-glycoside, desulfurization

## INTRODUCTION

The synthesis of deoxysugars has drawn increasing attention due to their biological importance (Weymouth-Wilson, 1997; Langenhan et al., 2005; Li et al., 2010; Zou et al., 2012; Balmond et al., 2014; Issa and Bennett, 2014; Thoden and Holden, 2014; Zhu et al., 2014; Elshahawi et al., 2015; Sau et al., 2017; Zhang et al., 2019). The synthesis of 4-deoxysugars drew attention because they are expected to express a variety of biological activities including angiogenesis inhibitory activities (Furuta et al., 1979; van Wijk et al., 2010, 2013; Valueva et al., 2011). 4-Deoxysugars are capable of acting as chain terminators for oligosaccharide biosynthesis with 1–4 glycosidic linkages. In order to achieve a site-selective deoxy product starting from a naturally abundant sugar, multistep protection/deprotection sequences and harsh reduction conditions are usually required (Arita et al., 1972; Rasmussen, 1980; Haque et al., 1986; Lin et al., 1989; Raju et al., 2009; Zou et al., 2012). A method has been developed toward direct synthesis of 4-deoxy pyranosides in two steps, site-selective toluoylation of the 4-OH of free pyranosides and subsequent reductive deacyloxylation (Yanagi et al., 2019). However, the catalyst for the toluoylation is not readily available, and the yield for deacyloxylation is low (38–61%). Recently, we have developed an efficient method for the synthesis of deoxy glycosides through UV light promoted desulfurization of sulfur-containing glycosides (Ge et al., 2017a, 2019). The efficiency of obtaining site-selective sulfur-containing glycosides is the key to this approach (Ge et al., 2017a,b). We have been developing methods for the synthesis of sulfur-containing carbohydrates (Ren et al., 2015; Wu et al., 2015; Ge et al., 2017a,b; Norberg et al., 2017), since they can be used as tools for model studies or even therapeutic intervention (Crich and Li, 2007; Sakamoto et al., 2009; Caraballo et al., 2010; Baryal et al., 2013; Daly et al., 2013; Jana and Misra, 2013; Zeng et al., 2013). The introduction of sulfur into a carbohydrate molecule usually proceeds through substitution of the leaving group with a thioacetate nucleophile. However, unexpected side reactions were often observed during the substitution process due to the existence of the thioacetyl group (Knapp et al., 1992; Pei et al., 2006, 2007; Chen and Withers, 2010), which puzzled us until we considered that thioacetyl group migration might cause these side reactions (Zhou et al., 2014). In this study, we initially attempted to synthesize 4-thio- and 2,4-dithio-glycoside derivatives by double parallel and double serial inversion (Dong et al., 2007a). However, no or low yields of target products were obtained due to complex side reactions. With methyl 3,6-di-OAc-α-mannoside as a starting material, E2 elimination products were obtained due to the axial 2-OTf leaving group and the steric hindrance of the 1-OMe group. With methyl 3,6-di-OAc-β-mannoside as a starting material, the inversion reactivity at the 2-position was slightly higher than that at the 4-position because the axial 2-OTf leaving group can be attacked directly, leading to the failure of the double serial inversion. With methyl 3,6-di-OAc glucosides and galactosides as starting materials, it was found that many unexpected side products were produced when a nucleophile substituted the leaving group on the substrate containing a thioacetate group. We hypothesized that thioacetyl group migration causes these unexpected side products and arises because the thioacetate group is labile even under the weakly basic conditions caused by the nucleophile. Therefore, we managed to inhibit the generation of thiol groups from thioacetate groups by the addition of an appropriate amount of conjugate acid/anhydride to the reaction system, successfully improving the synthesis of 4-thio- and 2,4-dithio-glycoside derivatives (**Scheme 1**). Finally, 4-deoxy- and 2,4-dideoxy-glycoside derivatives were efficiently synthesized through the removal of thioacetate groups under UV light, starting from 4-thio- and 2,4-dithio-glycoside derivatives.

#### RESULTS AND DISCUSSION

The synthesis efficiencies of 2-thio-, 4-thio- and 2,4-dithio-glycosides are key to obtaining 2-deoxy-, 4-deoxy- and 2,4-dideoxy-glycosides by desulfurization (Ren et al., 2015; Wu et al., 2015; Ge et al., 2017a,b; Norberg et al., 2017). We have developed several efficient methods to synthesize glycosides in which both the 3- and 6-positions are protected (Ren et al., 2014b; Xu et al., 2016; Zhang et al., 2016; Lv et al., 2018a,b, 2019). One of the methods, using acetate (Ren et al., 2014b) or benzoate (Zhang et al., 2016) as a catalyst, is particularly convenient and environmentally friendly. Based on this, we planned to synthesize 4-thio- and 2,4-dithio-glycosides by double parallel or double serial inversion strategies (Dong et al., 2007a) starting from methyl 3,6-OAc glycosides. Methyl 3,6-OAc glycosides can be efficiently obtained by selective acetylation of free methyl glycosides catalyzed by the acetate anion (Ren et al., 2014b), and can be triflated to afford 2,4-OTf intermediates. The intermediates can then be allowed to react with thioacetate anion in the double parallel inversion to give 2,4-dithio-glycoside derivatives, or to react sequentially with thioacetate/acetate anion or acetate/thioacetate anion in the double serial inversion to give 2-thio- or 4-thio-glycoside derivatives. In our previous attempts to obtain 2-thio-, 4-thio- and 2,4-dithio-mannosides (**Scheme 2**) (Wu et al., 2015), the 2,4-dithio- and 2-thio-α/β-D-mannoside derivatives **9**/**10** and **11**/**12** were efficiently synthesized while the attempts to synthesize 4-thio-mannoside derivatives **5**/**6** failed. A complex mixture was obtained when the triflated intermediate **3**/**4** was treated with KSAc to substitute its 4-OTf, followed by the substitution of its 2-OTf with KOAc. In order to investigate the cause, we repeated this reaction. The investigation indicated that the substitution of the 4-OTf of **3**/**4** with KSAc in acetonitrile proceeded very well, affording the 4-thioacetate intermediate **7/8** in a high yield. However, a complex mixture was observed when the intermediate **7/8** was treated with KOAc, whether in acetonitrile or in DMF. Withers encountered a similar dilemma when he attempted to synthesize p-nitrophenyl 4-thio-β-D-mannopyranoside by the double serial inversion (Chen and Withers, 2010). He proposed that the thioacetate group is usually labile even under weak basic conditions, causing a number of side reactions. Based on our previous studies on thioacetyl migration (Zhou et al., 2014), we guessed that a thiol group should be readily produced from the deacetylation of thioacetate under basic conditions and further lead to acetyl migration, oxidation, and inversion products.

It is more difficult to synthesize methyl 2-thio-, 4-thio- and 2,4-dithio-α/β-D-talosides through the double parallel and double serial inversion (**Scheme 3**). Methyl 3,6-di-OAc-α/β-D-glucoside **13**/**14** can be synthesized in a high yield by regioselective acetylation of free methyl α/β-D-glucoside (Ren et al., 2014b), followed by triflation to give triflated intermediate **15**/**16**. The intermediate **15**/**16** was expected to be sequentially substituted with KOAc and KSAc to give 2-thio-α/β-D-taloside **17**/**18**, to be substituted with an excess amount of KSAc to give 2,4-dithio-α/β-D-taloside **19**/**20**, and to be sequentially substituted with KSAc and KOAc to give 4-thio-α/β-D-taloside **21**/**22**. However, no or very low yields were obtained in all these reactions due to the formation of complex mixtures. The reason was presumed to be neighboring group participation (3-OAc attacking the 2- or 4-position) (Dong et al., 2007a, 2008b) and the instability of the thioacetate group under even weak basic conditions. The intermediate **15**/**16** was then substituted with TBASAc in toluene to suppress neighboring group participation, affording 4-SAc intermediate **23**/**24** in a high yield. The isolated **23**/**24** further reacted with KSAc in DMF to give the 2,4-di-SAc taloside derivative **19**/**20** in a yield of 33/36%. The attempts to obtain **21**/**22** by the inversion of **23**/**24** with KOAc in DMF still failed.

The attempt to synthesize methyl 2-thio-, 4-thio- and 2,4-dithio-α-D-galactosides through double parallel and double serial inversion failed (**Scheme 4**). The 3,6-di-OAc-α-D-mannoside **26** was obtained in 80% yield by organotin-mediated regioselective acetylation (Dong et al., 2007b) of free methyl α-D-mannoside **25**. Triflation of **26** afforded triflated intermediate **27**. Treatment of **27** with 5.0 equiv of KSAc in DMF for 12 h provided a major product **28** and a minor product **29**. However, TLC showed that **29** formed first and was then slowly converted to **28**. Treatment of **27** with 2.0 equiv of TBASAc in toluene for 48 h gave an approximately 1:1 mixture of **28** and **29**. Treatment of **27** with 1.2 equiv of KSAc in DMF for 48 h gave **29** as the major product. Similarly, when **27** was treated with 1.2 equiv of KOAc in DMF for 48 h, a major product **30** was isolated. The other isolated product proved to be a mixture containing **31** (**Figure S1** in Supplementary Material). Evidently, substitution of the 4-OTf of **27** by thioacetate successfully produced the intermediate **32**. However, substitution of the axial 2-OTf of **32** by thioacetate or acetate was difficult because of steric hindrance from the 1-OMe group, so E2 elimination occurred under the basic conditions to give the elimination product **29**. Under these basic conditions, **29** was slowly converted to **28** via migration intermediates **B** and **C**.

SCHEME 3 | Attempts to synthesize methyl 2-thio-, 4-thio- and 2,4-dithio-α/β-D-talosides: (a) i: TBAOAc, Ac2O, MeCN, rt, 24 h; ii: Tf2O, pyridine, DCM, −20–10°C, 3 h; (b) i: KOAc, DMF, rt, 1 h; ii: KSAc, DMF, 40°C, overnight (17, 11%, 18, <5%); (c) KSAc, DMF, rt, 48 h, complex mixture; (d) TBASAc, PhMe, rt, 7 d, <10% yield; (e) i: KSAc, DMF, rt, 1 h; ii: KOAc, DMF, 40°C, overnight, complex mixture; (f) TBASAc, PhMe, rt, 1 h, over 3 steps; (g) KSAc, DMF, 40°C, overnight; (h) KOAc, DMF, 40°C, overnight, complex mixture.

Ac2O, MeCN, 0°C to rt, 12 h; (b) Tf2O, pyridine, DCM, −20–10°C, 3 h; (c) KSAc (5.0 eq), DMF, rt, 24 h; (d) TBASAc (2.0 eq), PhMe, rt, 48 h; (e) KSAc (1.2 eq), DMF, rt, 48 h; (f) KOAc (1.2 eq), DMF, rt, 48 h.

The attempt to obtain methyl 2-thio-, 4-thio- and 2,4-dithio-β-D-galactosides starting from methyl 3,6-di-OAc-β-mannopyranoside **33** gave better results (**Scheme 5**). Substitution of **34** (the triflated product of **33**) with 5.0 equiv of TBASAc in MeCN at room temperature led to an 84% yield of methyl 2,4-di-thioacetate-galactoside **35**. Unexpectedly, neither intermediate **36a** (the 4-OTf of **34** substituted by thioacetate) nor intermediate **36b** (the 2-OTf of **34** substituted by thioacetate) could be observed when 1.0 equiv of TBASAc was used instead, and **35** was still obtained. Usually, a 4-OTf group is more reactive than a 2-OTf group in substitutions on a glycoside ring. In this case, however, substitution of the 4-OTf of **34** is disfavored by steric hindrance from the 2-OTf. Once the 2-OTf of **34** had been substituted by thioacetate, substitution of the 4-OTf would occur immediately because the steric hindrance from the 2-OTf had disappeared, leading to the formation of **35**. Similarly, treatment of **34** with 1.0 equiv of TBAOAc in MeCN at room temperature mainly gave product **37**. However, when this reaction was performed at 0°C, intermediate **38** was formed. Subsequent addition of 3 equiv of TBASAc then led to 4-thioacetate galactoside **39** in 48% yield. In contrast, treatment of **34** with 1.0 equiv of TBASAc in MeCN at 0°C did not give intermediate **36b** but gave **35** in 38% yield.

While axial triflates can be attacked directly (the antibonding orbital can be approached), the sugar ring has to adopt a different conformation to allow attack on an equatorial triflate (since the antibonding orbital is shielded by the axial substituents on the ring). Lowering the temperature may therefore slow down the interconversion of the ring. Thus, the 2-OTf of **34** showed high reactivity toward substitution by acetate because of the axial triflate leaving group. When the reaction proceeded at room temperature, the product **37** was formed immediately from the intermediate **38**. However, when the reaction proceeded at 0°C, the interconversion of the sugar ring became very slow, which restrained further substitution of the 4-OTf by acetate. Thioacetate, however, showed much higher nucleophilicity than acetate, since sulfur is a large atom and therefore readily achieves productive overlap with the antibonding orbital. This high nucleophilicity of thioacetate flattened the difference in reactivity between the 2-OTf and the 4-OTf. We also proposed that a supramolecular control effect may play a key role in this process, in which an acetate ion is accommodated at the center of the β-pyranoside face to produce an anion-carbohydrate complex (Dong et al., 2008a; Ren et al., 2014a), resulting in a higher reactivity of the 2-OTf than of the 4-OTf. However, because the supramolecular effect is weak or absent in the polar solvent acetonitrile, it cannot fully account for this result.

SCHEME 5 | Attempt to synthesize methyl 2-thio-, 4-thio- and 2,4-dithio-β-D-galactoside: (a) TBANO2, PhMe, rt, 6 h, 70%; (b) Tf2O, pyridine, DCM, −20–10°C, 3 h; (c) TBASAc (5.0 eq), MeCN, rt, 2 h, 35 (84%); (d) TBASAc (1.0 eq), MeCN, 0°C, 4 h, 35 (38%); (e) TBAOAc (2.0 eq), MeCN, rt, 2 h; (f) TBAOAc (2.0 eq), MeCN, 0°C, 4 h; (g) TBASAc (3.0 eq), MeCN, 0°C to rt, 3 h, 39 (8% from b-e-g, 48% from b-f-g).

TABLE 1 | Optimization of the reaction conditions<sup>a</sup>.

*<sup>a</sup>Substrate 7 (50 mg), KOAc, TBAOAc or KSAc (5 equiv), solvent (1 mL).*

From these experiments, we noticed that substitution on a substrate that already contains a thioacetate group usually led to unexpected side-products. Previous studies (Chen and Withers, 2010; Zhou et al., 2014) have suggested that thioacetate groups are usually labile even under weakly basic conditions and give thiol groups, leading to acetyl migration, oxidation, and inversion products (**Figure 1**). The initial thiol might be generated by intermolecular acetyl migration to a nucleophile (a trace of water, dimethylamine in DMF, acetate, or thioacetate) in the reaction mixture. Once a thiol group has formed, intramolecular acetyl migration from an adjacent acetyl group to the thiol group can occur under even weakly basic conditions. If the complex mixtures were indeed caused by thiol groups and acetyl migration, suppressing thiol formation and migration by adjusting the acidity/basicity of the reaction should improve these reactions.

Therefore, the substitution of 2-triflated intermediate **7** by acetate/thioacetate was used as a model reaction and tested under various acidic/basic conditions (**Table 1**). Substitution of **7** with acetate in acetonitrile or DMF led to a complex mixture (entry 1). To our delight, as increasing amounts of acetic acid were added to the reaction system, the target **5** was isolated in progressively better yields (entries 2–5). Notably, the yield of **5** was 76–78% when Ac2O was used as the solvent (entries 6 and 7), compared with 50% when acetic acid was used as the solvent (entry 5). We attribute this to Ac2O greatly inhibiting the generation of thiol groups by reacting with them to form thioacetates. Substitution of **7** with 5 equiv of thioacetate in acetonitrile gave the target **9** in 56% yield (entry 8). The yield of **9** increased to 75% when 1.5 equiv of thioacetic acid was added to the reaction system (entry 9). However, adding more thioacetic acid decreased the yield of **9** (entries 10 and 11). Substitution of **7** with 5 equiv of thioacetate in DMF in the absence/presence of 1.5 equiv of thioacetic acid gave **9** in 70/90% yield, respectively (entries 12 and 13). These results support our hypothesis: substitution by thioacetate/acetate on a substrate containing a thioacetate group can be improved by suppressing thiol formation and by adjusting the acidity/basicity of the reaction system.

#### TABLE 2 | Synthesis of 4-thio-glycosides under optimized conditions.

*Reaction conditions: <sup>a</sup>Ac2O, KOAc (5.0 equiv), 75°C, 24 h; <sup>b</sup>i: Tf2O, pyridine, DCM, −20–10°C, 3 h; ii: CH3CN, KSAc (5.0 equiv), HSAc (1.5 equiv), 50°C, 12–24 h; <sup>c</sup>CH3CN, KSAc (5.0 equiv), HSAc (1.5 equiv), 50°C, 24–48 h; <sup>d</sup>KSAc (1.5 equiv), DMF, 60°C, 6 h; <sup>e</sup>KSAc (1.5 equiv), TsOH (0.5 equiv), DMF, 60°C, 6 h.*

The synthesis of 2-OAc-4-SAc methyl glycosides **5**, **6**, **21**, and **22** failed in **Schemes 2** and **3**. However, with Ac2O as the reaction solvent, **5**, **6**, **21**, and **22** were successfully synthesized in moderate to high yields (47–78%) from their 2-OTf intermediates **7**, **8**, **23**, and **24** (entries 1–4 in **Table 2**). The main side-products, such as **6a**, appeared to arise from participation of the 1-OMe group (**Figure S2** in Supplementary Material). The yields of 2,4-di-SAc glycosides **9**, **10**, **19**, and **20** were 9–13% higher than in the reactions without the addition of 1.5 equiv of thioacetic acid (entries 5–8). The main side-products, such as **19a**, appeared to arise from oxidation of intramolecular thiol groups. For the substitution of 6-OTs glycosides **40**, **42**, and **44** with potassium thioacetate in DMF, the yields of 6-SAc glycosides **41**, **43**, and **45** were 10–16% higher than in the reactions without the addition of 0.5 equiv of toluenesulfonic acid (entries 9–11).

In our previous studies on the synthesis of deoxyglycosides by desulfurization under UV light (Ge et al., 2017b, 2019), we did not obtain 4-deoxymannosidic derivatives because we were unable to obtain 4-SAc mannosides efficiently. With 4-SAc mannosides **5** and **6** in hand, 4-deoxymannosidic derivatives **46** and **47** were obtained in 81 and 79% yields, respectively, by our one-pot method for removing the thioacetate group (**Figure 2**). With 2,4-di-SAc mannosides **9** and **10** in hand, we tested whether 2,4-dideoxy glycosides could be obtained by removing both thioacetate groups simultaneously in a one-pot method. After optimization of the reaction conditions, substrates **9** and **10** were treated with 2.5 equiv of N2H4·H2O in DMF at room temperature for 4 min, followed by the addition of 3.0 equiv of TCEP·HCl, and desulfurization under UV light gave 2,4-dideoxy glycosides **48** and **49** in 80 and 84% yields, respectively.
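As a worked illustration of the reagent loadings in the one-pot desulfurization, the following sketch converts the stated equivalents (2.5 equiv of N2H4·H2O, 3.0 equiv of TCEP·HCl) into masses. The 0.10 mmol substrate scale is a hypothetical value chosen for the example and is not taken from the study; the molar masses are standard rounded values.

```python
# Molar masses in g/mol (standard values, rounded).
MOLAR_MASS = {
    "N2H4.H2O": 50.06,   # hydrazine monohydrate
    "TCEP.HCl": 286.65,  # tris(2-carboxyethyl)phosphine hydrochloride
}

# Equivalents relative to the substrate, as stated in the text.
EQUIVALENTS = {"N2H4.H2O": 2.5, "TCEP.HCl": 3.0}


def reagent_mass_mg(reagent: str, substrate_mmol: float) -> float:
    """Mass of reagent in mg (mmol x g/mol = mg) for the stated equivalents."""
    return EQUIVALENTS[reagent] * substrate_mmol * MOLAR_MASS[reagent]


# Hypothetical scale: 0.10 mmol of substrate 9 or 10 (an assumption).
substrate_mmol = 0.10
masses = {name: reagent_mass_mg(name, substrate_mmol) for name in EQUIVALENTS}
for name, mg in masses.items():
    print(f"{name}: {mg:.1f} mg")
```

The same function can be reused at any scale by changing `substrate_mmol`.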

### CONCLUSION

In this study, we attempted to synthesize methyl 2-thio-, 4-thio- and 2,4-dithio-glycosides from methyl 3,6-di-OAc glycosides by a double parallel and double serial inversion strategy. Complex mixtures were often observed, leading to no or low yields of the target products. With methyl 3,6-di-OAc-α-mannoside as the starting material, elimination products were obtained because of the steric hindrance of the 1-OMe group. With methyl 3,6-di-OAc-β-mannoside as the starting material, the slightly higher reactivity of the 2-OTf than of the 4-OTf, which arises from the axial 2-OTf leaving group, caused the double serial inversion to fail. With methyl 3,6-di-OAc glucosides and galactosides as starting materials, many unexpected side products were produced when a nucleophile substituted the leaving group on a substrate containing a thioacetate group. We hypothesize that thioacetyl migration is prone to occur because the thioacetate group is labile even under the weakly basic conditions created by the nucleophile. Therefore, when the substrate was substituted with an acetate anion, Ac2O was used as the solvent to inhibit the generation of thiol groups by reacting with them to form thioacetates; when the substrate was substituted with a thioacetate anion, an appropriate amount of thioacetic acid was added to the reaction system to adjust the basicity. Consequently, the synthesis of the target 4-thio- and 2,4-dithio-glycoside products was improved by suppressing thioacetyl migration. Finally, 4-deoxy- and 2,4-dideoxy-glycoside derivatives were efficiently synthesized from 4-thio- and 2,4-dithio-glycoside derivatives by removing the thioacetate groups under UV light.

#### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/**Supplementary Material**.

#### AUTHOR CONTRIBUTIONS

TL performed the experiments and analyzed the data. YZ and JX prepared some related substrates and analyzed the data. YL and HD came up with the original idea, conceptualized and directed the project, and drafted the paper with assistance from all co-authors.

#### FUNDING

This study was supported by the National Natural Science Foundation of China (Nos. 21772049, 21708010), Specialized Research Fund for the Doctoral Program of Hebei Agricultural University (No. 2D201712), and Natural Science Foundation of Hebei Province (No. B2019204243).

#### ACKNOWLEDGMENTS

The authors are also grateful to the staff in the Analytical and Test Center of HUST for support with the NMR and HRMS instruments.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2020.00319/full#supplementary-material

#### REFERENCES

heparan sulfate expression using a sugar analogue reduces angiogenesis. ACS Chem. Biol. 8, 2331–2338. doi: 10.1021/cb4004332


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Luo, Zhang, Xi, Lu and Dong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# O-Acetylated Chemical Reporters of Glycosylation Can Display Metabolism-Dependent Background Labeling of Proteins but Are Generally Reliable Tools for the Identification of Glycoproteins

Narek Darabedian<sup>1</sup>, Bo Yang<sup>2</sup>, Richie Ding<sup>3</sup>, Giuliano Cutolo<sup>1</sup>, Balyn W. Zaro<sup>4</sup>, Christina M. Woo<sup>2</sup> and Matthew R. Pratt<sup>1,3</sup>\*

*<sup>1</sup> Department of Chemistry, University of Southern California, Los Angeles, CA, United States, <sup>2</sup> Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, United States, <sup>3</sup> Biological Sciences, University of Southern California, Los Angeles, CA, United States, <sup>4</sup> Department of Biological Science, University of Southern California, San Francisco, CA, United States*

Monosaccharide analogs bearing bioorthogonal functionalities, or metabolic chemical reporters (MCRs) of glycosylation, have been used for approximately two decades for the visualization and identification of different glycoproteins. More recently, proteomics analyses have shown that per-*O*-acetylated MCRs can directly and chemically react with cysteine residues in lysates and potentially cells, drawing into question the physiological relevance of the labeling. Here, we report robust metabolism-dependent labeling by Ac42AzMan but not the structurally similar Ac44AzGal. However, the levels of background chemical-labeling of cell lysates by both reporters are low and identical. We then characterized Ac42AzMan labeling and found that the vast majority of the labeling occurs on intracellular proteins but that this MCR is not converted to previously characterized reporters of intracellular O-GlcNAc modification. Additionally, we used isotope targeted glycoproteomics (IsoTaG) proteomics to show that essentially all of the Ac42AzMan labeling is on cysteine residues. Given the implications this result has for the identification of intracellular O-GlcNAc modifications using MCRs, we then performed a meta-analysis of the potential O-GlcNAcylated proteins identified by different techniques. We found that many of the proteins identified by MCRs have also been found by other methods. Finally, we randomly selected four proteins that had only been identified as O-GlcNAcylated by MCRs and showed that half of them were indeed modified. Together, these data indicate that the selective metabolism of certain MCRs is responsible for S-glycosylation of proteins in the cytosol and nucleus. However, these results also show that MCRs are still good tools for unbiased identification of glycosylated proteins, as long as complementary methods are employed for confirmation.

Keywords: bioorthogonal reporters, metabolic engineering, O-GlcNAc, proteomics, cysteine labeling

#### Edited by:

*Zhongping Tan, Chinese Academy of Medical Sciences and Peking Union Medical College, China*

#### Reviewed by:

*Martin D. Witte, University of Groningen, Netherlands Shisheng Sun, Northwest University, China*

#### \*Correspondence:

*Matthew R. Pratt, matthew.pratt@usc.edu*

#### Specialty section:

*This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry*

Received: *24 January 2020* Accepted: *30 March 2020* Published: *28 April 2020*

#### Citation:

*Darabedian N, Yang B, Ding R, Cutolo G, Zaro BW, Woo CM and Pratt MR (2020) O-Acetylated Chemical Reporters of Glycosylation Can Display Metabolism-Dependent Background Labeling of Proteins but Are Generally Reliable Tools for the Identification of Glycoproteins. Front. Chem. 8:318. doi: 10.3389/fchem.2020.00318*

## INTRODUCTION

Cellular biosynthetic pathways have been exploited for over two decades to incorporate chemical functionality into proteins and posttranslational modifications (Chuh et al., 2016; Gilormini et al., 2018; Parker and Pratt, 2020). For obvious biochemical reasons, metabolic probes or metabolic chemical reporters (MCRs) were traditionally designed to exploit known enzymatic promiscuities (**Figure 1A**). For example, the Bertozzi lab accomplished the first metabolic incorporation of reactive functionalities into complex carbohydrates by taking advantage of the enzymatic tolerances around the N-acetyl position of N-acetyl-mannosamine (Mahal et al., 1997; Saxon and Bertozzi, 2000) that had been previously discovered by Werner Reutter (Kayser et al., 1992). More specifically, small chemical handles like azides or alkynes are tolerated at this position by the biosynthetic pathways that scavenge monosaccharides and convert them to the corresponding nucleotide sugar-donors. Glycosyltransferases can use these unnatural donors for the modification of proteins. A second bioorthogonal reaction step is then exploited to attach visualization or affinity tags for analysis. More recently, we and others have taken a broader approach to glycoprotein-MCR discovery through the synthesis and characterization of monosaccharide analogs that may not transit well-characterized biosynthetic pathways (Zaro et al., 2011, 2017; Chuh et al., 2014, 2017; Li et al., 2016; Shen et al., 2017; Darabedian et al., 2018). For instance, we demonstrated that 6-azido-6-deoxy-N-acetylglucosamine (6AzGlcNAc) can bypass the traditional GlcNAc-salvage pathway to generate uridine diphosphate sugar (UDP-6AzGlcNAc) (Chuh et al., 2014), resulting in labeling of O-GlcNAcylated proteins and suggesting that cellular metabolism is more accommodating to MCRs than previously appreciated.
While this phenomenon was confirmed and expanded by ourselves and other labs, a recent analysis of per-O-acetylated MCRs by the Wang and Chen labs showed that they can chemically react with cysteines on proteins when incubated with cell lysates at moderate to high concentrations (0.2–2.0 mM) (Qin et al., 2018; Hao et al., 2019), raising concerns about how much reporter-labeling is due to enzymatic glycosylation.

Here, we analyze the "non-traditional" potential O-acetylated MCRs, 2-azido-2-deoxy-mannose (Ac42AzMan) and 4-azido-4-deoxy-galactose (Ac44AzGal) (**Figure 1B**). We find that treatment of mammalian cells with Ac42AzMan results in robust labeling of proteins, while Ac44AzGal does not result in any signal over background. In contrast to this live-cell labeling, we find that Ac42AzMan and Ac44AzGal display identical levels of chemical modification in cell lysates. We then further characterized Ac42AzMan and found that the vast majority of the labeling is intracellular in nature, with essentially no detectable signal on the cell surface. We and the Vocadlo lab independently showed that 2-azido-2-deoxy-glucose (2AzGlc) can be incorporated as an intracellular O-GlcNAc modification by the enzyme O-GlcNAc transferase (Shen et al., 2017; Zaro et al., 2017). These results raised the possibility that 2AzMan is converted to 2AzGlc by the enzyme N-acetylglucosamine 2-epimerase (Uniprot P51606); however, we used in vitro biochemistry to show that this is likely not the case. We do not know the mechanism by which Ac42AzMan treatment labels proteins. However, comparison of this MCR to Ac44AzGal indicates that direct cysteine-labeling by MCRs is not a universal property of per-O-acetylated monosaccharides. Instead, we believe that some background chemical labeling of proteins by MCRs is likely, but that it requires the conversion of the MCR into reactive species in cells. We believe that this background labeling pathway is distinct from glycosyltransferase-mediated modification of proteins. We then performed a meta-analysis of proteomic results from different strategies used to enrich and identify potentially O-GlcNAcylated proteins. We found that while MCRs gave the largest fraction of unique proteins, there was notable overlap with other techniques. Finally, we randomly chose four proteins that were only identified by an MCR as being O-GlcNAcylated and were able to confirm that two of them were indeed modified. Together, our results show that MCRs can result in chemical modification of intracellular proteins, but that this may be due to cellular metabolism of the reporter instead of direct reaction of the per-O-acetylated monosaccharide with cysteines. However, we also show that MCRs are still powerful discovery tools that should be used in conjunction with complementary techniques to confirm the glycosylation status of any identified protein.

#### MATERIALS AND METHODS

#### General Information

Reagents and solvents were obtained from various commercial suppliers and were used without further purification. Thin-layer chromatography (TLC) was performed on EMD Silica Gel 60 F<sub>254</sub> plates and visualized by ceric ammonium molybdate or UV. Flash chromatography was performed on 60 Å silica gel (EMD). <sup>1</sup>H NMR spectra were obtained at 400 MHz on a Varian Mercury 400 spectrometer; chemical shifts are recorded in ppm (δ) relative to solvent, and coupling constants (J) are reported in Hz.

## Synthesis of Ac44AzGal

1,2,3,6-Tetra-O-acetyl-D-glucopyranose (250 mg, 0.72 mmol) was dissolved in 7 mL CH2Cl2, 1 mL of pyridine was added, and the solution was cooled to 0°C. Triflic anhydride (0.49 mL, 2.87 mmol) was then added dropwise and the reaction was allowed to stir at 0°C for 1 h. The mixture was then diluted with CH2Cl2 and washed twice with 1 M hydrogen chloride, once with saturated sodium bicarbonate, and once with brine. The organic layer was dried over sodium sulfate, filtered, and concentrated. The resulting oil was dissolved in 5 mL DMF, NaN3 (234 mg, 3.6 mmol) was added, and the mixture was stirred for 1.5 h. The reaction was then diluted with ethyl acetate and the organic layer was washed twice with water and once with brine. The organic layer was dried over sodium sulfate, filtered, and concentrated. The product was then purified by column chromatography (10–30% acetone in hexanes, 3 steps) to afford 205 mg of the product. <sup>1</sup>H NMR (400 MHz, Chloroform-d) δ 6.31 (t, J = 1.6 Hz, 1H), 5.41–5.39 (m, 2H), 4.24–4.16 (m, 4H), 2.15 (s, 3H), 2.14 (s, 3H), 2.09 (s, 3H), 2.03 (s, 3H). These data are consistent with previously published characterization (Chen and Withers, 2018).
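The stoichiometry of this procedure can be checked with a short calculation. The sketch below derives the reagent equivalents from the millimole amounts given above and estimates an isolated yield over the two steps. The yield figure is not reported in the text; it is an estimate that assumes the 205 mg of product is pure Ac44AzGal with the molecular formula C14H19N3O9.

```python
# Standard atomic masses in g/mol (rounded).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula: dict) -> float:
    """Molar mass (g/mol) from an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


# Amounts stated in the procedure (mmol).
substrate_mmol = 0.72
tf2o_equiv = 2.87 / substrate_mmol   # triflic anhydride, about 4 equiv
azide_equiv = 3.6 / substrate_mmol   # sodium azide, about 5 equiv

# Assumption: the isolated product is pure tetra-O-acetyl azido-deoxy-hexose.
product_formula = {"C": 14, "H": 19, "N": 3, "O": 9}
product_mw = molar_mass(product_formula)
product_mmol = 205 / product_mw      # mg divided by g/mol gives mmol
yield_pct = 100 * product_mmol / substrate_mmol

print(f"Tf2O: {tf2o_equiv:.1f} equiv, NaN3: {azide_equiv:.1f} equiv")
print(f"Product MW: {product_mw:.2f} g/mol, estimated yield: {yield_pct:.0f}%")
```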

#### Cell Culture

H1299 cells were grown in RPMI media (GenClone), while HeLa cells were grown in DMEM high glucose media (GenClone). In both cases, media was supplemented with 10% Fetal Bovine Serum (Atlanta Biologicals). NIH3T3 cells were grown in DMEM high glucose (GenClone) supplemented with 10% Fetal Calf Serum (Atlanta Biologicals). All cell lines were grown at 37°C and 5.0% CO2.

## Methanol/Chloroform/H2O Precipitation

Proteins were recovered by sequential addition of a 3× volume of methanol, a 0.75× volume of chloroform, and a 2× volume of H2O. The resulting mixtures were then mixed by vortexing and centrifuged (5 min, 5,000 × g). The aqueous phase, which separates at the top of the mixture, was removed and discarded without disturbing the interface layer. An additional 2.5× volume of methanol was then added, followed by vortexing and pelleting of the protein by centrifugation (10 min, 5,000 × g).
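The volume ratios above can be turned into a small helper for bench use. This is a minimal sketch; it assumes that every ratio, including the final 2.5× methanol addition, is taken relative to the starting sample volume, which the text does not state explicitly. The 100 µL sample volume is only an example.

```python
from dataclasses import dataclass


@dataclass
class PrecipitationVolumes:
    methanol_first_ul: float
    chloroform_ul: float
    water_ul: float
    methanol_second_ul: float


def precipitation_volumes(sample_ul: float) -> PrecipitationVolumes:
    """Solvent volumes (µL) for methanol/chloroform/H2O precipitation.

    Assumes all ratios refer to the starting sample volume.
    """
    return PrecipitationVolumes(
        methanol_first_ul=3.0 * sample_ul,
        chloroform_ul=0.75 * sample_ul,
        water_ul=2.0 * sample_ul,
        methanol_second_ul=2.5 * sample_ul,
    )


# Example: a 100 µL sample.
vols = precipitation_volumes(100.0)
print(vols)
```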

#### Background Lysate-Labeling

HeLa cells were collected by trypsinization and washed two times with PBS (2 min, 2,000 × g, 4°C).

#### Native Lysis Conditions

The resulting cell-pellets were resuspended in 400 µL of PBS with 5 mg/mL Protease Inhibitor and then tip sonicated on ice for 45 s (15 s on, 10 s off). Protein concentration was determined using a BCA assay (Pierce, ThermoScientific) and normalized to 2 mg mL<sup>−1</sup>. To 100 µL (200 µg) of this protein solution was added either Ac42AzMan or Ac44AzGal from either a 2 or 10 mM stock solution in DMSO to give a final concentration of 200 µM or 2 mM, respectively. After incubation of this mixture at 37°C for 2 h, the lysates were precipitated by the addition of 800 µL of cold MeOH and incubation at −80°C for 1 h. The precipitates were collected by centrifugation (5 min, 8,000 × g, 4°C) and washed twice with cold MeOH. The supernatant was removed, the pellet was allowed to air-dry, and then 188 µL of 1% SDS buffer (1% SDS, 150 mM NaCl, 50 mM triethanolamine pH 7.4) was added to each sample. The mixture was sonicated in a bath sonicator to ensure complete dissolution. The resulting protein mixture was then subjected to the CuAAC conditions described below.
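The normalization step amounts to a simple dilution calculation. The sketch below computes how much buffer to add to reach 2 mg/mL and confirms that 100 µL at that concentration contains 200 µg of protein, as stated above. The measured concentration of 3.5 mg/mL is a hypothetical BCA reading used only for illustration.

```python
def buffer_to_add_ul(measured_mg_per_ml: float, volume_ul: float,
                     target_mg_per_ml: float = 2.0) -> float:
    """Volume of buffer (µL) needed to dilute a lysate to the target concentration."""
    if measured_mg_per_ml < target_mg_per_ml:
        raise ValueError("Lysate is already below the target concentration.")
    return volume_ul * (measured_mg_per_ml / target_mg_per_ml - 1.0)


def protein_ug(concentration_mg_per_ml: float, volume_ul: float) -> float:
    """Protein amount in µg (mg/mL is numerically equal to µg/µL)."""
    return concentration_mg_per_ml * volume_ul


# Hypothetical BCA result: 3.5 mg/mL in the 400 µL lysate.
extra_ul = buffer_to_add_ul(3.5, 400.0)
aliquot_ug = protein_ug(2.0, 100.0)
print(f"Add {extra_ul:.0f} µL buffer; a 100 µL aliquot holds {aliquot_ug:.0f} µg")
```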

#### Denaturing Lysis Conditions

The resulting cell-pellets were resuspended in 100 µL of 1% SDS with 5 mg/mL Protease Inhibitor and then tip sonicated for 15 s. Protein concentration was determined using a BCA assay (Pierce, ThermoScientific) and normalized to 2 mg mL<sup>−1</sup>. To 50 µL (100 µg) of this protein solution was added either Ac42AzMan or Ac44AzGal from either a 2 or 10 mM stock solution in DMSO to give a final concentration of 200 µM or 2 mM, respectively. After incubation of this mixture at 37°C for 2 h, the reaction was diluted with 50 µL of 4% SDS (4% SDS, 50 mM triethanolamine pH 7.4, 150 mM NaCl) and subjected to methanol/chloroform/H2O precipitation. The supernatant was removed, the pellet was allowed to air-dry, and then 94 µL of 1% SDS buffer (1% SDS, 150 mM NaCl, 50 mM triethanolamine pH 7.4) was added to each sample. The mixture was sonicated in a bath sonicator to ensure complete dissolution. The resulting mixture was then subjected to the CuAAC conditions described below.

#### CuAAC

To 100 µg of the protein mixtures above was added fresh click chemistry cocktail (6 µL) [alkyne-TAMRA tag (Click Chemistry Tools, 100 µM, 10 mM stock solution in DMSO), tris(2-carboxyethyl)phosphine hydrochloride (TCEP) (1 mM, 50 mM freshly prepared stock solution in water), tris[(1-benzyl-1-H-1,2,3-triazol-4-yl)methyl]amine (TBTA) (Click Chemistry Tools, 100 µM, 10 mM stock solution in DMSO), CuSO4·5H2O (1 mM, 50 mM freshly prepared stock solution in water)]. After 1 h, the CuAAC reaction was subjected to methanol/chloroform/H2O precipitation. The resulting protein-pellet was air-dried (5–10 min), and then 25 µL of 4% SDS buffer (4% SDS, 150 mM NaCl, 50 mM triethanolamine pH 7.4) was added to each sample. The mixture was sonicated in a bath sonicator until all the protein was dissolved, and 25 µL of 2× SDS-free loading buffer (20% glycerol, 0.2% bromophenol blue, 1.4% β-mercaptoethanol, pH 6.8) was then added. The samples were boiled for 5 min at 97°C, and 40 µg of protein was then separated by SDS-PAGE. The resulting gels were then visualized using a Typhoon 9400 Variable Mode Imager (GE Healthcare) with a 532 nm laser for excitation and a 30 nm bandpass filter centered at 610 nm for detection.
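The cocktail volume follows from the stated stock and final concentrations. The sketch below computes the per-component volumes under the assumption that the final concentrations refer to a nominal reaction volume of 100 µL (94 µL of resuspended protein plus 6 µL of cocktail); the text does not state the reference volume explicitly. Under that assumption the component volumes sum to the 6 µL given above, and to 12 µL for a 200 µL reaction.

```python
# (final concentration, stock concentration), both in mM, as stated in the text.
COMPONENTS = {
    "alkyne-TAMRA": (0.1, 10.0),
    "TCEP": (1.0, 50.0),
    "TBTA": (0.1, 10.0),
    "CuSO4.5H2O": (1.0, 50.0),
}


def cocktail_volumes_ul(reaction_ul: float) -> dict:
    """Per-component stock volumes (µL) for a nominal reaction volume."""
    return {
        name: reaction_ul * final_mm / stock_mm
        for name, (final_mm, stock_mm) in COMPONENTS.items()
    }


# Assumed nominal reaction volumes: 100 µL (100 µg) and 200 µL (200 µg).
vols_100 = cocktail_volumes_ul(100.0)
total_100 = sum(vols_100.values())
total_200 = sum(cocktail_volumes_ul(200.0).values())
print(vols_100, total_100, total_200)
```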

#### Metabolic Labeling

To cells at 80–85% confluency, media was exchanged for fresh media containing Ac42AzMan, Ac44AzGal, Ac36AzGlcNAc, Ac4ManNAz, Ac45SGlcNAc (1,000× stock in DMSO), or DMSO vehicle as indicated. Cells were removed from the plate by trypsinization, collected by centrifugation (2 min, 2,500 × g), and gently washed with PBS (1 mL, 2×). The resulting pellets were lysed in 100 µL of 1% NP-40 lysis buffer [1% NP-40, 150 mM NaCl, 50 mM triethanolamine (TEA) pH 7.4] with complete, Mini, EDTA-free Protease Inhibitor Cocktail Tablets (Sigma Aldrich) for 20 min. Any remaining cell debris was then removed by centrifugation (10 min, 10,000 × g, 4°C). The soluble fraction, or soluble cell lysate, was collected and a BCA assay (Pierce, ThermoScientific) was used to determine protein concentration, which was normalized to 1 µg µL<sup>−1</sup> using more lysis buffer. CuAAC was then performed on 200 µg of protein by adding 12 µL of fresh click chemistry cocktail [alkyne-TAMRA tag (Click Chemistry Tools, 100 µM, 10 mM stock solution in DMSO), tris(2-carboxyethyl)phosphine hydrochloride (TCEP) (1 mM, 50 mM freshly prepared stock solution in water), tris[(1-benzyl-1-H-1,2,3-triazol-4-yl)methyl]amine (TBTA) (Click Chemistry Tools, 100 µM, 10 mM stock solution in DMSO), CuSO4·5H2O (1 mM, 50 mM freshly prepared stock solution in water)]. The reaction was gently vortexed before proceeding for 1 h at room temperature. After this time, ice-cold methanol (1 mL) was added to the reaction, and the samples were incubated at −20°C for 2 h. The samples were then centrifuged (10 min, 10,000 × g, 4°C), resulting in a pellet of the proteins. This protein-pellet was air-dried (5–10 min), and 25 µL of 4% SDS buffer (4% SDS, 150 mM NaCl, 50 mM triethanolamine pH 7.4) was then added. The mixture was completely dissolved using a bath sonicator, followed by addition of 25 µL of 2× SDS-free loading buffer (20% glycerol, 0.2% bromophenol blue, 1.4% β-mercaptoethanol, pH 6.8). The samples were fully denatured by incubation at 97°C for 5 min, and 40 µg of protein was then separated by SDS-PAGE. The resulting gels were then visualized using a Typhoon 9400 Variable Mode Imager (GE Healthcare) with a 532 nm laser for excitation and a 30 nm bandpass filter centered at 610 nm for detection.

#### β-Elimination

β-Elimination was performed as previously described (Darabedian et al., 2018).

#### PNGase F Treatment

PNGase F was obtained from New England Biolabs and treatment was performed according to the manufacturer's protocol with some changes as previously described (Darabedian et al., 2018).

### Flow Cytometry of Cell-Surface Labeling With DBCO-Biotin

NIH3T3 cells were grown in 10 cm plates to 80–85% confluency and treated with 200 µM MCRs or Ac4GlcNAc in triplicate for 16 h. The media was then removed and cells were washed with DPBS before being detached from the plate by incubation in 10 mL of 10 mM EDTA in DPBS at 37°C for 10 min. Cells were collected by centrifugation (5 min, 800 × g, 4°C) and were washed three times with DPBS (5 min, 800 × g, 4°C). Cells were then resuspended in 200 µL PBS containing DBCO-biotin (Click Chemistry Tools, 60 µM, 10 mM stock in DMSO) for 1 h at RT, after which time they were washed three times with DPBS (5 min, 800 × g, 4°C) before being resuspended in ice-cold PBS containing fluorescein isothiocyanate (FITC)-conjugated avidin (Sigma, 5 µg µL<sup>−1</sup>, 1 mg/mL stock) for 30 min at 4°C. Cells were then washed three times in DPBS (5 min, 800 × g, 4°C) and then resuspended in 400 µL PBS containing propidium iodide (2.5 µg mL<sup>−1</sup> in DPBS, 1 mg/mL stock in DPBS) for 30 min. A total of 50,000 cells were analyzed on a BD Accuri C6 Flow Cytometer, and dead cells (propidium iodide positive) were excluded.
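The exclusion of propidium iodide-positive events can be sketched as a simple gate applied before summarizing the FITC signal. This is an illustrative outline only: the threshold, the event values, and the choice of the median as the summary statistic are hypothetical and are not taken from the study or from the cytometer software.

```python
from statistics import median


def live_cell_signal(events, pi_threshold):
    """Median FITC signal of live cells and the number of live events.

    events: iterable of (fitc_intensity, pi_intensity) pairs.
    Events with PI intensity at or above the threshold are treated as dead.
    """
    live_fitc = [fitc for fitc, pi in events if pi < pi_threshold]
    if not live_fitc:
        raise ValueError("No live events remain after gating.")
    return median(live_fitc), len(live_fitc)


# Hypothetical events: (FITC, PI) in arbitrary fluorescence units.
events = [(1200.0, 50.0), (900.0, 40.0), (15000.0, 8000.0), (1100.0, 60.0)]
signal, n_live = live_cell_signal(events, pi_threshold=1000.0)
print(f"Median FITC of {n_live} live events: {signal}")
```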

### GlcNAc-2-Epimerase Cloning and Expression

Codon-optimized GlcNAc-2-Epimerase (11–427) was purchased from Integrated Device Technology and cloned into pET-28b vector between NdeI and HindIII using standard molecular cloning techniques. The plasmid is available upon request. pET28b-6XHis-GlcNAc-2-Epimerase (11–427) was subsequently transformed into BL21 E. coli (Novagen). Terrific broth (1.8 L) containing kanamycin (50 µg mL<sup>−1</sup>) was inoculated with 24 mL of starter culture grown overnight at 37 °C. The culture was grown at 37 °C until an OD (A600) of 0.80 was obtained, at which time expression was induced by addition of 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) for 5 h at 37 °C. Cells were harvested by centrifugation (10 min, 6,000 × g, 4 °C). The pooled pellets were suspended in 20 mL Buffer A (250 mM NaH2PO4, 300 mM NaCl, 20 mM imidazole, pH 7.4) and tip sonicated for 12 min (30 s on, 15 s off). The resulting lysate was then centrifuged (15 min, 30,000 × g, 4 °C), and the supernatant was transferred into a new tube and centrifuged again (30 min, 30,000 × g, 4 °C). The supernatant was transferred into a new tube, 2 mL of pre-washed Cobalt Metal Affinity Resin (Genesee Scientific) was added, and the mixture was placed onto a rotator for 1 h at 4 °C. The solution was then transferred to a gravity-flow column, allowed to drain, and the beads were washed with an additional 100 mL of Buffer A. 6XHis-GlcNAc-2-Epimerase (11–427) was eluted with 5 mL fractions of Buffer B (25 mM NaH2PO4, 300 mM NaCl, 250 mM imidazole, pH 7.4). Fractions containing 6XHis-GlcNAc-2-Epimerase (11–427) were concentrated to 2 mL using a 10 kDa cutoff Amicon Ultra-15 Centrifugal Filter. The concentrate was dialyzed overnight into 2 L of dialysis buffer (20 mM NaH2PO4, 1 mM EDTA, 5% glycerol, 0.5% 2-mercaptoethanol, pH 7.0). The dialyzed solution was then concentrated to 0.5 mL using a 10 kDa cutoff Amicon Ultra-15 Centrifugal Filter and stored at −80 °C in single-use aliquots.

### HPLC Analysis for GlcNAc-2-Epimerase Conversion

To a 500 µL microcentrifuge tube were added 10 µL of phosphate buffer (1 M, pH 7.5), 10 µL of ATP (50 mM, dissolved in water), 10 µL of MgCl2 (100 mM, dissolved in water), 20 µL of ManNAc or 2AzMan (500 mM, dissolved in water), and 5 µL of GlcNAc-2-Epimerase (or water). Water, 45 µL for samples containing GlcNAc-2-Epimerase or 50 µL for null samples, was then added to the reaction mixtures. The samples were then incubated at 37 °C for 12 h and then lyophilized. The resulting solids were suspended in 75 µL of pyridine, 25 µL of acetic anhydride was added, and the mixture was allowed to rotate for 16 h. Then 5 µL of each solution was diluted to 100 µL using water and 20 µL was injected onto an Agilent Eclipse XDB-C18 column (5 µm, 4.6 × 150 mm) running at 1 mL/min with the PDA set to a wavelength of 200 nm. Buffer A was H2O containing 0.1% TFA, and buffer B was 90% ACN, 10% H2O containing 0.1% TFA. HPLC conditions for 2AzMan were 10 min at 25% B and then a ramp to 70% B over 20 min. HPLC conditions for ManNAc were 10 min at 10% B and then a ramp to 50% B over 20 min.
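As a quick consistency check on the epimerase assay, the final concentration of each component follows from its stock concentration and the total reaction volume. The sketch below is a minimal illustration, assuming the enzyme-containing reaction is composed of exactly the listed volumes (including 5 µL of enzyme and 45 µL of water). The resulting final concentrations are derived here and are not stated in the protocol.

```python
def final_mm(stock_mm, volume_ul, total_ul):
    """Final concentration (mM) after dilution into total_ul."""
    return stock_mm * volume_ul / total_ul


# (component, stock concentration in mM, volume in µL), as listed in the text
components = [
    ("phosphate buffer", 1000, 10),
    ("ATP", 50, 10),
    ("MgCl2", 100, 10),
    ("ManNAc or 2AzMan", 500, 20),
]
enzyme_ul = 5   # GlcNAc-2-Epimerase
water_ul = 45   # water added to enzyme-containing samples

total_ul = sum(vol for _, _, vol in components) + enzyme_ul + water_ul
finals = {name: final_mm(stock, vol, total_ul) for name, stock, vol in components}

print(f"total volume: {total_ul} µL")
for name, conc in finals.items():
    print(f"{name}: {conc:g} mM")
```

With these inputs the reaction volume is 100 µL, which gives 100 mM phosphate, 5 mM ATP, 10 mM MgCl2, and 100 mM sugar substrate.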

#### Biotin Enrichment for Proteomics

H1299 cells were treated as indicated, collected by trypsinization, and pelleted by centrifugation (2 min, 2,000 × g), followed by washing 2× with PBS. The resulting cell pellets were resuspended in 4% SDS buffer (4% SDS, 10 mM TEA pH 7.4, 150 mM NaCl) containing cOmplete, Mini, EDTA-free Protease Inhibitor Cocktail Tablets (Roche), tip sonicated for 15 s, and cleared by centrifugation (10 min, 10,000 × g, 15 °C). Soluble protein concentration was normalized by BCA assay (Pierce, ThermoScientific) to 1 mg mL<sup>−1</sup>, and 1 mg of total protein was subjected to the appropriate amount of click chemistry cocktail containing alkyne-biotin (Click Chemistry Tools) for 1 h, then 10 µL of 0.5 M EDTA was added. Proteins were then precipitated by adding 7.5 mL of methanol, 1.9 mL of CHCl3, and 5 mL of H2O followed by vortexing and centrifugation (5 min, 5,000 × g). The aqueous phase was discarded without disturbing the interface layer, after which 3.7 mL of methanol was added and the sample was vortexed and centrifuged (10 min, 10,000 × g, 4 °C). The supernatant was removed, the pellet was allowed to air-dry for 5 min, and 100 µL of 4% SDS was added. The mixture was sonicated in a bath sonicator to ensure complete dissolution, and 1.9 mL of 10 mM TEA, pH 7.4, 150 mM NaCl was added followed by 125 µL of high-capacity NeutrAvidin beads (ThermoScientific, prewashed three times with 0.2% SDS, 150 mM NaCl, 50 mM TEA pH 7.4), and the mixture was incubated on a rotator for 1.5 h. Afterwards, the beads were washed 6× with 1% SDS in PBS, 3× with 4 M urea in PBS, and 8× with 50 mM NH4HCO3. The beads were then resuspended in 1 mL of 50 mM NH4HCO3, 10 mM TCEP (pH 8) and incubated for 30 min with gentle shaking. Afterwards, the resin was washed with 50 mM NH4HCO3 and the beads were resuspended in 1 mL of 10 mM iodoacetamide (pH 8) in 50 mM NH4HCO3 and incubated in the dark for 30 min. The beads were then washed 3× with 50 mM NH4HCO3 and resuspended in 100 µL of 50 mM NH4HCO3. 
Then 2 µL of CaCl2 (200 mM in H2O) and 2 µL of trypsin (Sequencing Grade, Promega, 0.1 µg/µL) were added and the beads were incubated for 18 h at 37 °C. The beads were centrifuged, the supernatants were transferred into clean tubes, and the beads were washed with an additional 100 µL of 1% formic acid (FA), 100 µL of 15% acetonitrile in H2O, and 100 µL of 1% FA in H2O. The combined elution and washes were desalted on C18 Spin Columns (Pierce, ThermoScientific) according to the manufacturer's protocol and lyophilized to dryness.

#### Proteomics

A nanoElute was attached in line to a timsTOF Pro equipped with a CaptiveSpray Source (Bruker). Chromatography was conducted at 40 °C through a 25 cm reversed-phase Aurora Series C18 column (IonOpticks) at a constant flow rate of 0.4 µL/min. Mobile phase A was 98/2/0.1% water/acetonitrile/formic acid (v/v/v) and phase B was acetonitrile with 0.1% formic acid (v/v). During a 120 min method, peptides were separated by a 4-step linear gradient (0% to 15% B over 60 min, 15% to 23% B over 30 min, 23% to 35% B over 10 min, 35% to 80% B over 10 min) followed by a 10 min isocratic flush at 80% B before washing and a return to low organic conditions. Experiments were run as data-dependent acquisitions with ion mobility activated in parallel accumulation serial fragmentation (PASEF) mode. MS and MS/MS spectra were collected with m/z 400–1,500 and ions with z = +1 were excluded. Raw data files were processed with Peaks Studio. Fixed modifications included carbamidomethylation of C (+57.02146). Variable modifications included Acetyl +42.010565 N-term, pyro-Glu −17.026549 N-term Q, and pyro-Glu −18.010565 N-term E. Precursor tolerance was 30.0 ppm. False discovery rate was set to 0.01 with significance calculated using ANOVA.
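The gradient described above can be written as a piecewise-linear schedule, which makes it easy to confirm that the listed segments account for the full 120 min method. The following is a sketch only: the breakpoint table is derived from the segment durations in the text, and the function name `percent_b` is hypothetical and not part of any instrument software.

```python
# (time in min, %B) breakpoints derived from the segment durations in the text
GRADIENT = [
    (0, 0.0),
    (60, 15.0),    # 0% to 15% B over 60 min
    (90, 23.0),    # 15% to 23% B over 30 min
    (100, 35.0),   # 23% to 35% B over 10 min
    (110, 80.0),   # 35% to 80% B over 10 min
    (120, 80.0),   # 10 min isocratic flush at 80% B
]


def percent_b(t_min):
    """Linearly interpolated %B at time t_min within the programmed gradient."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0, 30, 75, 105, 120):
    print(t, percent_b(t))
```

The final breakpoint falls at 120 min, matching the stated method length.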

### Chemical Enrichment of Glycoproteins and Sample Preparation for IsoTaG

H1299 cells were treated as indicated, collected by trypsinization, and pelleted by centrifugation (2 min, 2,000 × g), followed by washing 2× with PBS. The resulting cell pellets were lysed on ice by probe tip sonication in 1 × PBS + 1% SDS (1 mL), containing EDTA-free Pierce Halt™ protease inhibitor cocktail. Debris was removed from the cellular lysate by centrifugation (20,000 × g) for 20 min at 4 °C and the supernatant was transferred to a new Eppendorf tube. A BCA protein assay (Pierce) was performed and protein concentration was adjusted to 3.5 µg/µL with lysis buffer. Protein lysate (1.4 mg, 400 µL) was treated with a pre-mixed solution of the click chemistry reagents [100 µL; final concentration of 200 µM IsoTaG silane probe (3:1 heavy:light mixture), 500 µM CuSO4, 100 µM THPTA, 2.5 mM sodium ascorbate] and the reaction was incubated for 3.5 h at 24 °C. The click reaction was quenched by a methanol-chloroform protein precipitation [aqueous phase/methanol/chloroform = 4:4:1 (v/v/v)]. The protein pellet was allowed to air-dry for 5 min at 24 °C. The dried pellet was resuspended in 1 × PBS + 1% SDS (400 µL) by probe tip sonication and then diluted in PBS (1.6 mL) to a final concentration of 0.2% SDS. Streptavidin-agarose resin [400 µL, washed with PBS (3 × 1 mL)] was added to the protein solution and the resulting mixture was incubated for 12 h at 24 °C with rotation. The beads were washed using spin columns with 8 M urea (5 × 1 mL) and PBS (5 × 1 mL). The washed beads were resuspended in 500 µL PBS containing 10 mM DTT and incubated at 37 °C for 30 min, followed by addition of 20 mM iodoacetamide for 30 min at 37 °C in the dark. The reduced and alkylated beads were collected by centrifugation (1,500 × g) and resuspended in 520 µL PBS. Urea (8 M, 32 µL) and trypsin (1.5 µg) were added to the resuspended beads and digestion was performed for 16 h at 37 °C with rotation. The beads were washed three times with PBS (200 µL) and distilled water (200 µL). 
The IsoTaG silane probe was cleaved with 2% formic acid/water (2 × 200 µL) for 30 min at 24 °C with rotation and the eluent was collected. The beads were washed with 50% acetonitrile-water + 1% formic acid (2 × 500 µL), and the washes were combined with the eluent to form the cleavage fraction. The cleavage fractions were dried in a vacuum centrifuge and stored at −20 °C until analysis.

### Mass Spectrometry Parameters Used for Glycoproteomics

A Waters nanoAcquity system was coupled to a ThermoScientific Orbitrap Fusion Tribrid with a nano-electrospray ion source. Half of the sample was reconstituted in 10 µL of 5% acetonitrile and 0.1% formic acid in water, loaded onto a C18 trap column (Waters Cat # 186008821 nanoEase MZ Symmetry C18 Trap Column, 100 Å, 5 µm × 180 µm × 20 mm), and separated on an analytical column (Waters Cat # 186008795 nanoEase MZ Peptide BEH C18 Column, 130 Å, 1.7 µm × 75 µm × 250 mm). Mobile phases A and B were water with 0.1% formic acid (v/v) and acetonitrile with 0.1% formic acid (v/v), respectively. Peptides were separated with a linear gradient from 5 to 30% B within 95 min, followed by an increase to 50% B within 15 min and further to 98% B within 10 min, and re-equilibration. The instrument parameters were set as previously described (Ramirez et al., 2020) with minor modifications. Briefly, MS1 spectra were recorded from m/z 400–2,000. If oxonium product ions (HexAz0Si, +288.1190 Da; HexAz2Si, +290.1316 Da) were observed in the HCD spectra, ETD with supplemental activation (35%) was performed in a subsequent scan on the same precursor ion selected for HCD. The raw data was processed using Proteome Discoverer 2.4 (Thermo Fisher Scientific). Both HCD and EThcD spectra were searched against a database containing the Swissprot 2018 annotated human proteome (20,355 proteins, downloaded on Feb. 21, 2019) and contaminant proteins using Sequest HT and Byonic algorithms. The searches were performed with the following guidelines: trypsin as enzyme, 2 missed cleavages allowed; 10 ppm mass error tolerance on precursor ions; 0.02 Da mass error tolerance on fragment ions; variable modifications (methionine oxidation, +15.995 Da; carbamidomethyl cysteine, +57.021 Da; and others as described below). Intact glycopeptide searches allowed for the tagged hexose, or for the mono-acetylated, di-acetylated, or tri-acetylated form (HexAz0Si, +287.112 Da; HexAz2Si, +289.124 Da; mono-acetylated HexAz0Si, +329.122 Da; mono-acetylated HexAz2Si, +331.135 Da; di-acetylated HexAz0Si, +371.133 Da; di-acetylated HexAz2Si, +373.145 Da; tri-acetylated HexAz0Si, +413.143 Da; and tri-acetylated HexAz2Si, +415.156 Da) on asparagine, cysteine, serine, and threonine. Glycopeptide spectral assignments passing a false discovery rate of 1% at the spectrum level based on a target decoy database were manually validated for an isotope precursor pattern.

TABLE 1 | 2AzMan-modified peptides.
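The acetylated modification masses in the search list follow from adding one acetyl increment per retained O-acetate to the mass of the tagged hexose. The sketch below recomputes them under two assumptions: the acetyl increment is the +42.010565 Da value listed for N-terminal acetylation in the Proteomics subsection, and the base masses are the three-decimal values given above, so agreement is expected only to about 0.001 Da. The variable names are illustrative.

```python
ACETYL_DA = 42.010565  # acetyl increment (C2H2O), as listed for N-terminal acetylation

# Masses of the cleaved, tagged hexose (Da) as listed in the search parameters
BASE_DA = {"HexAz0Si": 287.112, "HexAz2Si": 289.124}

# Values listed in the text, keyed by (tag, number of acetyl groups)
LISTED_DA = {
    ("HexAz0Si", 1): 329.122, ("HexAz2Si", 1): 331.135,
    ("HexAz0Si", 2): 371.133, ("HexAz2Si", 2): 373.145,
    ("HexAz0Si", 3): 413.143, ("HexAz2Si", 3): 415.156,
}


def acetylated_mass(tag, n_acetyl):
    """Mass of the tagged hexose carrying n_acetyl acetyl groups."""
    return BASE_DA[tag] + n_acetyl * ACETYL_DA


# Largest disagreement between recomputed and listed masses
max_dev = max(abs(acetylated_mass(tag, n) - listed)
              for (tag, n), listed in LISTED_DA.items())

# Spacing between the two isotopic tags, the basis of the precursor pattern check
isotope_spacing = BASE_DA["HexAz2Si"] - BASE_DA["HexAz0Si"]

print(f"max deviation: {max_dev:.4f} Da; isotope spacing: {isotope_spacing:.3f} Da")
```

The constant spacing between the HexAz0Si and HexAz2Si forms is what allows an isotopically encoded precursor pattern to be recognized during manual validation.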

#### UDP-GalNAz Chemoenzymatic Labeling and Cu(I)-Catalyzed [3 + 2] Azide–Alkyne Cycloaddition (CuAAC) for Western Blotting

H1299 cells were grown to 80% confluency, collected by trypsinization, and washed two times with PBS with collection by centrifugation (2 min, 2,000 × g, 4 °C). The cell pellets were then resuspended in 2.5 mL 4% SDS (4% SDS, 50 mM TEA, 150 mM NaCl, pH 7.4) containing 12.5 mg of cOmplete Mini Protease Inhibitor Cocktail (Roche). The suspended cells were subjected to tip sonication (3× 10 s on, 10 s off) followed by centrifugation (10 min, 10,000 × g, 15 °C). The soluble protein was collected and the concentration was normalized by BCA assay (Pierce) to 1 mg mL<sup>−1</sup> using 1% SDS (1% SDS, 50 mM TEA, 150 mM NaCl, pH 7.4). Proteins were subjected to methanol/chloroform/H2O precipitation. The resulting protein pellet was allowed to air-dry for 5–10 min before being resuspended in 10% of the original volume using 1% SDS chemoenzymatic buffer (1% SDS, 20 mM HEPES, pH 7.9). Protein concentration was normalized using the BCA assay and diluted to 2.5 mg mL<sup>−1</sup> in 1% SDS chemoenzymatic buffer. To 2,400 µL of resuspended protein (6 mg) were added 2,940 µL of H2O, 4,800 µL of labeling buffer (2.5×; 5% IGEPAL CA-630, 125 mM NaCl, 50 mM HEPES, pH 7.9), 660 µL of MnCl2 (100 mM in H2O), and 900 µL of UDP-GalNAz (0.5 mM in 10 mM HEPES, pH 7.9), and the mixture was vortexed. Finally, 300 µL of purified GalT Y289L or H2O was added, and the reaction mixture was incubated for 20 h at 4 °C. The unreacted UDP-GalNAz was then removed by methanol/chloroform/H2O precipitation. Air-dried protein pellets were resuspended in 1,500 µL of 4% SDS. To generate inputs, 40 µL was removed and combined with 40 µL of 2× loading buffer (20% glycerol, 0.2% bromophenol blue, pH 6.8, and 14 µL/mL β-mercaptoethanol). 
The remaining labeled lysate was diluted to 6 mL using SDS-free buffer (150 mM NaCl, 50 mM TEA pH 7.4) and freshly made click chemistry cocktail (420 µL) was added to each sample [alkyne-azo-biotin (Click Chemistry Tools, 100 µM, 5 mM stock solution in DMSO); tris(2-carboxyethyl)phosphine hydrochloride (TCEP) (1 mM, 50 mM freshly prepared stock solution in water); tris[(1-benzyl-1-H-1,2,3-triazol-4-yl)methyl]amine (TBTA) (Click Chemistry Tools, 100 µM, 10 mM stock solution in DMSO); CuSO4·5H2O (1 mM, 50 mM freshly prepared stock solution in water)]. After 1 h, 60 µL of 0.5 M EDTA was added, and then the proteins were subjected to methanol/chloroform/H2O precipitation. The resulting proteins were then suspended in 600 µL of 4% SDS buffer.
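The composition of the chemoenzymatic labeling reaction can be verified by summing the listed volumes and computing the resulting final concentrations. This is a minimal sketch using only the volumes and stock concentrations stated above. The final concentrations it prints are derived here rather than quoted from the protocol, and the dictionary keys are illustrative labels.

```python
# Volumes (µL) combined in the labeling reaction, as listed in the text
volumes_ul = {
    "protein in 1% SDS buffer": 2400,
    "H2O": 2940,
    "labeling buffer (2.5x)": 4800,
    "MnCl2 (100 mM)": 660,
    "UDP-GalNAz (0.5 mM)": 900,
    "GalT Y289L or H2O": 300,
}
total_ul = sum(volumes_ul.values())


def diluted(stock, volume_ul):
    """Final concentration, in the units of `stock`, after dilution into total_ul."""
    return stock * volume_ul / total_ul


final = {
    "labeling buffer (x)": diluted(2.5, volumes_ul["labeling buffer (2.5x)"]),
    "MnCl2 (mM)": diluted(100, volumes_ul["MnCl2 (100 mM)"]),
    "UDP-GalNAz (uM)": diluted(500, volumes_ul["UDP-GalNAz (0.5 mM)"]),
    "protein (mg/mL)": diluted(2.5, volumes_ul["protein in 1% SDS buffer"]),
    "SDS (%)": diluted(1.0, volumes_ul["protein in 1% SDS buffer"]),
}

print(f"total: {total_ul} µL")
for name, value in final.items():
    print(f"{name}: {value:g}")
```

The volumes sum to 12 mL, at which point the 2.5× labeling buffer is at 1× and the SDS carried over from the protein sample is diluted five-fold to 0.2%.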

### Metabolic Chemical Labeling and Cu(I)-Catalyzed [3 + 2] Azide–Alkyne Cycloaddition (CuAAC) for Western Blotting

H1299 cells were grown to 80% confluency, collected by trypsinization, and washed two times with PBS with collection by centrifugation (2 min, 2,000 × g, 4 °C). The cell pellets were then resuspended in 1 mL 4% SDS containing 5 mg of cOmplete Mini Protease Inhibitor Cocktail (Roche). The suspended cells were subjected to tip sonication (3× 10 s on, 10 s off) followed by centrifugation (10 min, 10,000 × g, 15 °C). The soluble protein was collected and the concentration was normalized by BCA assay (Pierce) to 4 µg µL<sup>−1</sup> in 4% SDS buffer. To generate inputs, 40 µL was removed and combined with 40 µL of 2× loading buffer. To 1,500 µL of lysate (6 mg of protein) were added 780 µL of 1.25× SDS-buffer (1.25% SDS, 50 mM TEA, 150 mM NaCl, pH 7.4), 3.3 mL of SDS-free buffer, and 420 µL of freshly made click chemistry cocktail [alkyne-azo-biotin (Click Chemistry Tools, 100 µM, 5 mM stock solution in DMSO); tris(2-carboxyethyl)phosphine hydrochloride (TCEP) (1 mM, 50 mM freshly prepared stock solution in water); tris[(1-benzyl-1-H-1,2,3-triazol-4-yl)methyl]amine (TBTA) (Click Chemistry Tools, 100 µM, 10 mM stock solution in DMSO); CuSO4·5H2O (1 mM, 50 mM freshly prepared stock solution in water)]. After 1 h, the proteins were subjected to methanol/chloroform/H2O precipitation. The resulting protein pellets were then suspended in 600 µL of 4% SDS buffer.

#### Biotin Enrichment and Western Blotting

To the labeled proteins (600 µL, 6 mg) were added 11.4 mL of SDS-free buffer and 300 µL of high-capacity NeutrAvidin beads (ThermoScientific, pre-washed three times with 0.2% SDS, 150 mM NaCl, 50 mM TEA pH 7.4), and the mixture was then incubated for 1.5 h. The resulting mixture was transferred into a gravity flow chromatography column and drained. The beads were washed 10× with 3 mL of 1% SDS in PBS and then transferred into a dolphin nose tube. Each sample was then incubated with 300 µL of 25 mM sodium hydrosulfite for 30 min, the beads were collected by centrifugation (2 min, 2,500 × g), and the supernatant was collected. The procedure was repeated two more times. The supernatants were pooled, 4× volume of ice-cold methanol was added, and the mixture was placed at −20 °C for 2 h. Precipitated proteins were then collected by centrifugation (10 min, 10,000 × g, 4 °C). The supernatant was removed, the pellet was allowed to air-dry for 10 min, and 37.5 µL of 4% SDS buffer was added to each sample. The mixture was sonicated in a bath sonicator to ensure complete dissolution and then 37.5 µL of 2× SDS-free loading buffer was added. The samples were boiled for 5 min at 97 °C and 20 µL of input or 25 µL of enriched sample was loaded per lane for SDS-PAGE separation.

### RESULTS

To further explore structurally diverse monosaccharide analogs as potential MCRs, we purchased Ac42AzMan and synthesized Ac44AzGal in a two-step, one-pot synthesis from commercially available 1,2,3,6-O-acetyl-glucose (Supporting Information). We then incubated these compounds (200 µM or 2 mM) at 37 °C for 2 h with HeLa cell lysates, under the same conditions previously reported to result in chemical modification of cysteine residues (Qin et al., 2018), as well as with chemically denatured (1% SDS) cell lysates. In parallel, we treated HeLa cells in culture with the same compounds (200 µM) for 16 h, our standard MCR labeling protocol. The samples were then subjected to copper(I)-catalyzed azide-alkyne cycloaddition (CuAAC) with alkyne-TAMRA and analyzed by in-gel fluorescence (**Figure 2**). Incubation of the reporters with cell lysates resulted in a small amount of labeling over background that was essentially identical for the two compounds. However, we observed noticeably higher live-cell labeling for Ac42AzMan compared to Ac44AzGal. Notably, for both reporters the pattern of modified proteins in lysates largely matches the proteins that react non-selectively with alkyne-TAMRA during CuAAC, presumably abundant proteins in the lysate. This pattern is conserved in the cells labeled with Ac44AzGal, indicating that these bands are indeed the result of background chemical modification. In contrast, treatment of cells with Ac42AzMan resulted in the visualization of several unique protein bands. These results confirm that high concentrations of per-O-acetylated MCRs can result in at least low levels of background protein labeling, but also suggest that cellular metabolism is critical for the robust labeling observed with Ac42AzMan treatment.

We next set out to characterize the labeling of living cells by Ac42AzMan. First, we explored the possibility that the reporter is incorporated into cell-surface glycosylation. Accordingly, H1299 cells were treated with Ac42AzMan (200 µM) or vehicle for 16 h. We chose this concentration because, in our experience, acetylated MCRs are often toxic to cells at higher concentrations. The corresponding total cell lysates were then split and treated with water or PNGase-F for 5 h at 37 °C to enzymatically remove N-linked glycosylation. The samples were then subjected to CuAAC with alkyne-TAMRA and analyzed by in-gel fluorescence and lectin blotting (**Figure 3A**). The fluorescent gel scanning showed no loss of signal upon PNGase-F treatment but dramatic removal of N-linked glycans as visualized by Concanavalin A (ConA), demonstrating that the MCR is not incorporated into this type of glycosylation to any significant extent. Next, we used flow cytometry to more broadly examine the potential incorporation of Ac42AzMan into cell-surface glycoconjugates. More specifically, NIH3T3 cells were treated with Ac42AzMan (200 µM) or vehicle for 16 h. Simultaneously, the same cell line was treated separately with either Ac4ManNAz (200 µM) or Ac36AzGlcNAc (200 µM). Ac4ManNAz treatment serves as a positive control for cell-surface labeling (Saxon and Bertozzi, 2000), while we have previously shown that Ac36AzGlcNAc treatment results in the exclusive modification of intracellular proteins (Chuh et al., 2014). After 16 h, the cells were released, reacted with DBCO-Biotin, and incubated with FITC-Avidin before analysis by flow cytometry (**Figure 3B**). As expected, we observed high levels of labeling after Ac4ManNAz treatment but essentially no signal over background from Ac36AzGlcNAc-treated cells. Consistent with our PNGase-F experiment, we also found no cell-surface labeling with Ac42AzMan. Next, we took advantage of β-elimination chemistry to test whether the observed signal was due to base-labile modifications on residues such as serine, threonine, or cysteine. H1299 cells were first treated with Ac42AzMan (200 µM) or vehicle for 16 h before the corresponding cell lysates were subjected to CuAAC with alkyne-biotin. The samples were then run in duplicate on SDS-PAGE and transferred to a nitrocellulose membrane. The membranes were then incubated at 40 °C for 24 h in either water or 55 mM NaOH and then visualized using streptavidin blotting (**Figure 3C**). We found that the β-elimination conditions removed essentially all of the Ac42AzMan labeling. As a control for the chemistry, we also visualized the loss of intracellular O-GlcNAc modifications by Western blotting under the same conditions (**Figure 3C**).

FIGURE 2 | Neither Ac42AzMan nor Ac44AzGal chemically modifies cell lysates, but Ac42AzMan labels proteins in living cells. Native (A) or denatured (B) HeLa cell lysates or living HeLa cells (B) were treated with the indicated concentrations of the MCRs at 37 °C for 2 or 16 h, respectively. After this length of time, CuAAC with alkyne-TAMRA was performed and any labeled proteins were visualized by in-gel fluorescence scanning.

The fact that Ac42AzMan results in intracellular protein modification raised the possibility that it was entering the O-GlcNAc modification pathway. More specifically, we hypothesized that 2AzMan might be enzymatically converted to 2-azido-2-deoxy-glucose (2AzGlc) by the enzyme N-acetylglucosamine 2-epimerase, as we and the Vocadlo lab have demonstrated that 2AzGlc is an MCR for O-GlcNAcylation (Shen et al., 2017; Zaro et al., 2017). To directly test this possibility, we first incubated an anomeric mixture of α- and β-ManNAc with either buffer or recombinantly expressed N-acylglucosamine 2-epimerase (Uniprot P51606) for 12 h before analysis by HPLC (**Figure 4A**). As expected, N-acetylmannosamine (ManNAc) was enzymatically converted to two new peaks consisting of the α- and β-anomers of GlcNAc. In contrast, we observed no conversion of 2AzMan to 2AzGlc, rejecting our conversion hypothesis (**Figure 4A**). Simultaneously, we also co-treated H1299 cells with Ac42AzMan (200 µM) and either the OGT inhibitor Ac45SGlcNAc (150 µM) or DMSO for 16 h (Gloster et al., 2011). The lysates were then subjected to CuAAC with alkyne-TAMRA and labeling was visualized by in-gel fluorescence (**Figure 4B**). In support of our in vitro experiment, we observed very little loss of Ac42AzMan labeling upon OGT inhibition. In contrast, a similar OGT inhibition experiment with Ac42AzGlc previously resulted in loss of over half of the labeling (Shen et al., 2017). Together, these experiments argue against the conversion of 2AzMan to 2AzGlc and subsequent incorporation into O-GlcNAc modifications.

FIGURE 3 | Ac42AzMan labels intracellular proteins and is likely on serine, threonine, and/or cysteine residues. (A) Removal of N-linked glycans has no effect on Ac42AzMan labeling. H1299 cells were treated with Ac42AzMan (200 µM) or vehicle for 16 h before treatment of the corresponding lysates with PNGase-F to remove N-linked glycans. Any protein labeling was then observed by in-gel fluorescence scanning after CuAAC with alkyne-TAMRA. Lectin blotting with concanavalin A confirmed the removal of N-linked glycans. (B) Ac42AzMan treatment does not result in cell-surface labeling. NIH3T3 cells were incubated with the indicated MCRs (200 µM) for 16 h. The cells were then harvested, reacted with DBCO-biotin, incubated with FITC-streptavidin, and analyzed by flow cytometry. Error bars represent ±s.e.m. from the mean of three biological replicates (*n* = 3). (C) β-Elimination results in loss of Ac42AzMan labeling. H1299 cells were treated with Ac42AzMan (200 µM) or vehicle for 16 h. After this time, the corresponding lysates were subjected to CuAAC with alkyne-biotin and SDS-PAGE. After transfer, the corresponding PVDF membranes were incubated with either NaOH or H2O before streptavidin or Western blotting.

Next, we set out to identify the 2AzMan-modified proteins. Accordingly, we treated H1299 cells in triplicate with either Ac42AzMan (200 µM) or DMSO vehicle for 16 h, followed by CuAAC with alkyne-biotin and enrichment of any modified proteins with NeutrAvidin beads. We then performed on-bead trypsinolysis and identification of the resulting peptides by label-free quantitative (LFQ) proteomics (Cox et al., 2014) (n = 3 biological replicates with a false discovery rate of 0.01). This allowed us to identify over 1,000 2AzMan-labeled proteins based on several criteria (**Figure 5** and **Table S1**): the protein must have been identified in at least 2 out of the 3 biological replicates, the LFQ-based enrichment ratio must have been at least 5-fold (linear scale) greater in the treated samples vs. vehicle, and the statistical significance (p-value) of this difference must have been <0.01 (Student's t-test). This cutoff was chosen arbitrarily and is fairly stringent; however, a full list of proteins enriched at lower ratios can be found in **Table S1**. Consistent with our other biochemical analysis, the enriched proteins represented a wide range of intracellular proteins. We next employed the IsoTaG platform (Woo et al., 2015, 2017) to determine if 2AzMan or 4AzGal could be directly identified on proteins and to locate the specific sites of those modifications. We again treated H1299 cells with either Ac42AzMan (200 µM), Ac44AzGal (200 µM), or DMSO vehicle in duplicate for 16 h. We then subjected the corresponding lysates to CuAAC with a mixture of isotopically labeled, cleavable biotin tags. After enrichment of the labeled proteins on streptavidin beads and on-bead trypsinolysis, we eluted the directly modified, and therefore isotopically encoded, peptides using weak acid. Subsequent LC-MS analysis using the IsoStamp v2.0 software was then used to look for un-, mono-, di-, or tri-acetylated azido-hexose modification of any Asn, Ser, Thr, or Cys residue. 
Using IsoTaG, we were able to localize 2AzMan on 33 peptides, with all of the modifications on Cys (**Table 1** and **Table S2**), while we found no peptides modified by 4AzGal. Overall, our results are consistent with the background MCR labeling of proteins on cysteine residues seen by Chen and Wang (Qin et al., 2018; Hao et al., 2019) but also indicate that cellular metabolism of Ac42AzMan plays an important role that distinguishes it from Ac44AzGal.
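The three enrichment criteria used for the LFQ data (detection in at least 2 of 3 replicates, at least 5-fold enrichment, and p < 0.01) amount to a simple conjunctive filter. The sketch below illustrates that filter on made-up entries. The `ProteinQuant` record, its field names, and the example values are hypothetical and are not taken from the data set or from the authors' analysis pipeline, and the p-values are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinQuant:
    name: str
    replicates_detected: int   # number of biological replicates with an identification
    fold_enrichment: float     # treated / vehicle LFQ ratio, linear scale
    p_value: float             # precomputed Student's t-test p-value


def passes_cutoffs(p, min_replicates=2, min_fold=5.0, max_p=0.01):
    """Apply all three criteria; a protein must satisfy every one of them."""
    return (p.replicates_detected >= min_replicates
            and p.fold_enrichment >= min_fold
            and p.p_value < max_p)


# Hypothetical example entries, one failing each criterion in turn
candidates = [
    ProteinQuant("protein_A", 3, 12.0, 0.002),   # passes all criteria
    ProteinQuant("protein_B", 1, 40.0, 0.001),   # too few replicates
    ProteinQuant("protein_C", 3, 3.5, 0.004),    # enrichment below 5-fold
    ProteinQuant("protein_D", 2, 5.0, 0.03),     # not significant
]

hits = [p.name for p in candidates if passes_cutoffs(p)]
print(hits)
```

Relaxing `min_fold` in such a filter corresponds to the longer list of proteins enriched at lower ratios.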

The chemical modification of intracellular proteins upon treatment of living cells with certain MCRs could detrimentally affect the discovery of legitimately O-GlcNAcylated proteins. To explore this question, we performed a meta-analysis of potential O-GlcNAcylated proteins identified using common MCRs for O-GlcNAc (Ac4GlcNAz, Ac4GalNAz, Ac36AzGlcNAc, etc.) (Worth et al., 2017) and with other methods that detect endogenous O-GlcNAcylation (lectin chromatography, anti-O-GlcNAc antibodies, or chemoenzymatic modification). A complete list of the proteomics studies used in this analysis is available in the **Supporting Information**. More specifically, we used Python scripts (**Scripts S1–S3**). The first script produces one comprehensive file containing all of the identified proteins and their associated proteomic studies, ordered by the number of occurrences for each protein from most to least, which should be of general interest to the field and allow for the easy identification of potentially O-GlcNAcylated proteins across datasets. The second and third scripts first return a list of proteins identified by a particular method (e.g., MCR treatment) and then generate the numbers of exclusive and overlapping proteins for each identification technique, allowing us to generate a large Venn diagram (**Figure 6A**). Consistent with a significant amount of published work using MCRs to discover new O-GlcNAc modified proteins, essentially half of the MCR-identified proteins were also found to be O-GlcNAcylated by at least one of the other techniques. However, the MCRs also yielded the highest number of exclusively identified proteins. We reasoned that these proteins could arise from the chemical modification of proteins, the induction of O-GlcNAcylation by MCR treatment, or reporting on endogenous glycosylation that was missed during the proteomic analysis using other techniques. To estimate how many of the proteins exclusive to MCRs fall into this last category of bona fide O-GlcNAcylated proteins, we selected four such proteins at random: TRADD (Uniprot Q15628), calreticulin (Uniprot P27797), USP10 (Uniprot Q14694), and CYLD (Uniprot Q9NQC7). We then treated H1299 or HeLa cells with Ac4GlcNAz (200 µM) for 16 h, followed by CuAAC with a cleavable biotin-linker (Darabedian and Pratt, 2019). The modified proteins were then enriched on streptavidin beads, extensively washed, and eluted before visualization by Western blotting (**Figure 6B**). As expected from the proteomic data, we found all four of these proteins to be enriched, as well as the known O-GlcNAcylated proteins Nup62 and CREB. Simultaneously, we subjected H1299 and HeLa cell lysates to chemoenzymatic modification followed by the same CuAAC and enrichment procedure. Analysis by Western blotting showed enrichment over background of TRADD and calreticulin in both cell lines, confirming their O-GlcNAcylation status, while CYLD was not enriched in either cell line (**Figure 6C**). 
These data suggest that the commonly used MCRs for O-GlcNAc probably do result in the enrichment and false identification of some proteins that are not endogenously modified. However, they also indicate that this number is not overwhelmingly large, with a crude estimate that ∼50% of the proteins identified only with an MCR are real O-GlcNAcylated proteins that can be confirmed using another technique. Combining this estimate with the documented overlap with other techniques in the Venn diagram suggests that around 75% of the proteins found by MCRs are indeed modified.
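The logic of the meta-analysis and of the final estimate can be sketched in a few lines. The code below is not the authors' Scripts S1–S3; it is an illustrative reimplementation of the described steps under stated assumptions. Protein and study identifiers are placeholders, each method is represented as a set of protein IDs, and the estimate combines a shared fraction of about one half with a validation rate of about 50% for MCR-exclusive proteins.

```python
from collections import defaultdict


def proteins_by_occurrence(studies):
    """Map each protein to the studies reporting it, most frequently reported first."""
    found_in = defaultdict(list)
    for study, proteins in studies.items():
        for protein in set(proteins):
            found_in[protein].append(study)
    return sorted(found_in.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def exclusive_and_shared(by_method):
    """For each method, split its proteins into exclusive vs. shared with any other method."""
    result = {}
    for method, proteins in by_method.items():
        others = set().union(*(p for m, p in by_method.items() if m != method))
        result[method] = {"exclusive": proteins - others, "shared": proteins & others}
    return result


def estimated_true_fraction(shared_fraction, exclusive_validation_rate):
    """Shared proteins count as confirmed; exclusive ones are weighted by the validation rate."""
    return shared_fraction + (1.0 - shared_fraction) * exclusive_validation_rate


# Placeholder data: protein IDs P1..P5 are hypothetical
by_method = {
    "MCR": {"P1", "P2", "P3", "P4"},
    "lectin": {"P1", "P5"},
    "chemoenzymatic": {"P2", "P5"},
}
overlap = exclusive_and_shared(by_method)
print(overlap["MCR"])

# About half of the MCR hits are shared and about 50% of the exclusive ones validate
print(estimated_true_fraction(0.5, 0.5))
```

With both inputs at one half, the estimate evaluates to 0.75, which is the arithmetic behind the figure of around 75%.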

### DISCUSSION

The relatively recent discovery that per-O-acetylated MCRs can label cysteine residues when incubated with protein lysates (Qin et al., 2018; Hao et al., 2019) has raised important questions about some of the biological conclusions that have been drawn using these tools. This is particularly true for MCRs that target intracellular O-GlcNAcylation, because free cysteine sulfhydryl groups are more abundant inside the cell than at the cell surface, where many cysteines are found as oxidized disulfides. Despite good evidence for this background modification, the proposed mechanism of a reaction between cysteine side chains and the anomeric O-acetate (Qin et al., 2018) of the MCR was somewhat chemically unsatisfying.

Here, we demonstrate that at least some background chemical labeling of proteins is due to selective metabolism of certain MCRs by living cells. Specifically, when we incubated cell lysates with two potential MCRs, Ac42AzMan and Ac44AzGal (**Figure 1B**), we observed essentially no labeling (**Figure 2**). However, Ac42AzMan showed robust labeling of proteins in living cells, while Ac44AzGal did not (**Figure 2**). Using a variety of biochemical techniques, we demonstrated that Ac42AzMan labeling is not due to incorporation into cell-surface glycosylation (**Figures 3A,B**) but is instead found on various intracellular proteins, most likely through serine, threonine, or cysteine residues (**Figure 3C**). This raised the possibility that 2AzMan is enzymatically epimerized into the previously characterized O-GlcNAc MCR, 2AzGlc (Shen et al., 2017; Zaro et al., 2017). However, we ruled out this possibility using both in vitro enzymology (**Figure 4A**) and inhibition of O-GlcNAc transferase in cells (**Figure 4B**). We then confirmed our biochemical analysis using proteomics to show widespread labeling of intracellular proteins (**Figure 5** and **Table S1**) and essentially exclusive modification of cysteine residues over other potential side chains (**Table 1** and **Table S2**). Together, our data support the work by Wang and Chen by demonstrating that background modification of proteins by O-acetylated MCRs is certainly a possibility. However, we also show that this labeling is not exclusively because of direct chemical modification of proteins by per-O-acetylated MCRs but can also result from metabolism in living cells. Notably, this metabolism-dependent background labeling is not universal, since Ac44AzGal treatment does not result in protein modification.

We do not yet know the metabolic pathways that are involved in this observation, but we speculate that it could be the deacetylation of different hydroxyl groups on the MCR. In particular, the enzymatic deacetylation of the 1-hydroxyl of any monosaccharide could result in the generation of a reactive aldehyde. Importantly, this is consistent with the observation that MCRs with a free anomeric position more readily react with proteins (Hao et al., 2019) and display increased cellular toxicity (Aich et al., 2008). The fact that we detected partially O-acetylated 2AzMan on cysteines in the proteomics data supports this possibility as at least contributing to the labeling. It is equally possible that inherent differences in the chemical structure, and therefore the reactivity, of the MCRs are the driving force behind the different levels of cellular labeling. For example, after deacetylation the azide at the 2-position of 2AzMan would result in a very stereoelectronically different environment around the reactive 1-aldehyde compared to 4AzGal. Therefore, it is also possible that the MCRs are metabolized similarly by the cells but then modify proteins to different extents because of reactivity differences derived from their chemical structures. It is also important to point out that we do not believe that Ac42AzMan is acting as a reporter for glycosyltransferase-mediated labeling of cysteine residues but rather as a precursor for a reactive metabolite that results in their chemical modification. Finally, our results strongly support the use of glycosite mapping rather than simple protein identification in proteomics. At the glycosite level, MCR-modification of serines and threonines, which are likely enzymatic modifications, can easily be distinguished from cysteine modifications that may be background. In fact, the numerous O- and N-linked glycan modification sites that have been identified using MCRs are almost certainly due to enzymatic addition, further highlighting the utility of metabolic probes.

Together, these results suggest that many of the proteins that have been identified as being O-GlcNAcylated by MCRs may be background-labeled proteins instead. To investigate this possibility, we performed an analysis of proteins that had previously been identified as being potentially O-GlcNAcylated using different techniques, including MCRs, chemoenzymatic modification, or lectin- or antibody-based enrichment (**Figure 6A**). Notably, we found that MCR-based identification did not result in an inordinate number of unique identifications compared with the other techniques, all of which enrich endogenous O-GlcNAc modifications. However, given the potential for "off-target" labeling by MCRs and the largest number of potential O-GlcNAcylated proteins uniquely identified using these tools, we randomly chose 4 proteins from the "MCR-unique" list and first confirmed that a common O-GlcNAc-targeted MCR would indeed enrich these proteins (**Figure 6A**). We then used chemoenzymatic enrichment to determine if we could confirm that these proteins are indeed O-GlcNAcylated and found that at least 2 of them are endogenously modified (**Figure 6B**). In summary, our results further confirm that per-O-acetylated monosaccharide MCRs can label proteins in a way that does not necessarily reflect their glycosylation status. Despite this, we also found that, overall, MCRs are fairly reliable tools for the identification of O-GlcNAcylated proteins. They should not be discarded; instead, complementary methods should be used to confirm any potentially modified proteins.
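The comparison of identification techniques described above is, at its core, set arithmetic over protein lists. The following minimal Python sketch illustrates that idea only; it is not the authors' script (which is provided in the Supplementary Material), and the technique labels and protein identifiers are hypothetical placeholders rather than data from this study.

```python
from typing import Dict, Set


def unique_identifications(hits: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, for each technique, the proteins reported by that technique only."""
    unique = {}
    for technique, proteins in hits.items():
        # Union of everything reported by the *other* techniques.
        others = set().union(*(p for t, p in hits.items() if t != technique))
        unique[technique] = proteins - others
    return unique


# Hypothetical identifiers, for illustration only (not data from this study).
hits = {
    "MCR": {"P1", "P2", "P3", "P4"},
    "chemoenzymatic": {"P2", "P3", "P5"},
    "lectin_or_antibody": {"P3", "P6"},
}

mcr_unique = unique_identifications(hits)["MCR"]
print(sorted(mcr_unique))
```

Proteins that fall into the "MCR-unique" set are the natural candidates for confirmation by an orthogonal method, as was done here with chemoenzymatic enrichment.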

#### DATA AVAILABILITY STATEMENT

The datasets generated for this study can be found in the ProteomeXchange Consortium under accession number PXD016217. Additionally, publicly available datasets were analyzed in this study. These data can be found in UniProt under the accession numbers listed in **Table 1**.

#### AUTHOR CONTRIBUTIONS

ND, BY, RD, GC, BZ, CW, and MP designed experiments and interpreted data. ND carried out the synthesis of MCRs. ND and GC performed analysis of MCR labeling in cell lysates. ND performed the cellular analysis of MCR labeling, flow cytometry, β-elimination, analysis of potential MCR epimerization, and MCR/chemoenzymatic IP analyses. BZ performed protein-level proteomic analysis. BY performed site-identification of MCR labeling by proteomics. RD generated the Python scripts and performed meta-analysis. ND, BY, BZ, CW, and MP prepared the manuscript.

#### FUNDING

This research was supported by the National Institutes of Health (R01GM125939) to MP; the National Institutes of Health (U01CA242098), the Burroughs Wellcome Fund Career Award at the Scientific Interface, and the Sloan Research Fellowship to CW; and the University of California San Francisco.

#### ACKNOWLEDGMENTS

Protein identification by proteomics was performed at the Institute of Stem Cell Biology and Regenerative Medicine at Stanford University. The authors thank Profs. Irving L. Weissman and Peter K. Jackson for access to the instrumentation.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2020.00318/full#supplementary-material

The supplementary material contains synthetic methods for Ac44AzGal preparation, NMR characterization of Ac44AzGal, a list of the published proteomic results used for the meta-analysis in **Figure 6A**, the Python scripts used to generate **Figure 6A**, instructions for the use of these Python scripts, and proteomics data for both protein identification (**Table S1**) and site identification (**Table S2**).


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Darabedian, Yang, Ding, Cutolo, Zaro, Woo and Pratt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Conjugation of Synthetic Trisaccharide of *Staphylococcus aureus* Type 8 Capsular Polysaccharide Elicits Antibodies Recognizing Intact Bacterium

Ming Zhao<sup>1†</sup>, Chunjun Qin<sup>1†</sup>, Lingxin Li<sup>1</sup>, Haotian Xie<sup>2</sup>, Beining Ma<sup>2</sup>, Ziru Zhou<sup>2</sup>, Jian Yin<sup>1</sup>\* and Jing Hu<sup>2</sup>\*

*<sup>1</sup> Key Laboratory of Carbohydrate Chemistry and Biotechnology Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi, China, <sup>2</sup> Wuxi School of Medicine, Jiangnan University, Wuxi, China*

#### *Edited by:*

*Rui Zhao, University of Chinese Academy of Sciences, China*

#### *Reviewed by:*

*Weizhi Wang, University of Chinese Academy of Sciences, China Zufeng Guo, Johns Hopkins University, United States*

#### *\*Correspondence:*

*Jian Yin jianyin@jiangnan.edu.cn Jing Hu hujing@jiangnan.edu.cn*

*†These authors have contributed equally to this work*

#### *Specialty section:*

*This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry*

*Received: 01 February 2020 Accepted: 17 March 2020 Published: 28 April 2020*

#### *Citation:*

*Zhao M, Qin C, Li L, Xie H, Ma B, Zhou Z, Yin J and Hu J (2020) Conjugation of Synthetic Trisaccharide of Staphylococcus aureus Type 8 Capsular Polysaccharide Elicits Antibodies Recognizing Intact Bacterium. Front. Chem. 8:258. doi: 10.3389/fchem.2020.00258*

*Staphylococcus aureus* causes a wide range of life-threatening diseases. One of the powerful approaches for prevention and treatment is to develop an efficient vaccine as antibiotic resistance greatly increases. *S. aureus* type 8 capsular polysaccharide (CP8) has shown great potential in vaccine development. An understanding of the immunogenicity of CP8 trisaccharide repeating unit is valuable for epitope-focused vaccine design and cost-efficient vaccine production. We report the chemical synthesis of conjugation-ready CP8 trisaccharide 1 bearing an amine linker, which effectively served for immunological evaluation. The trisaccharide 1-CRM197 conjugate elicited a robust immunoglobulin G (IgG) immune response in mice. Both serum antibodies and prepared monoclonal antibodies recognized *S. aureus* strain, demonstrating that synthetic trisaccharide 1 can be an efficient antigen for vaccine development.

Keywords: *Staphylococcus aureus*, capsular polysaccharide, synthetic oligosaccharide, immunogenicity, immunoglobulin G (IgG), monoclonal antibody, carbohydrate-based vaccine

#### INTRODUCTION

Staphylococcus aureus, one of the most common opportunistic human pathogens, causes a wide range of life-threatening diseases including endocarditis, abscesses, bacteremia, sepsis, and osteomyelitis (Zhang et al., 2018). The high prevalence and antimicrobial resistance of S. aureus strains have led to a high incidence of hospital infections and have complicated treatment (O'Brien and McLoughlin, 2019). In particular, the capsular serotype 5 and 8 strains cause most cases of severe disease and death worldwide (Gerlach et al., 2018; Ansari et al., 2019; Mohamed et al., 2019). Protection against S. aureus infections is therefore urgently needed. Vaccines based on isolated bacterial polysaccharides against Haemophilus influenzae, Neisseria meningitidis, and Streptococcus pneumoniae save millions of lives each year (Broecker et al., 2014). Numerous efforts have been made to develop vaccines based on S. aureus serotype 5 and 8 capsular polysaccharides (CP5 and CP8), such as StaphVAX (Nabi Biopharmaceuticals, Rockville, MD) and the S. aureus four-antigen vaccine (SA4Ag), which show great potential in clinical trials (Shinefield et al., 2002; Fattom et al., 2015; Begier et al., 2017; Creech et al., 2017; Frenck et al., 2017; Ansari et al., 2019; O'Brien and McLoughlin, 2019).

However, the side effects and hyporesponsiveness arising from impurities and non-protective epitopes have hampered the development of polysaccharide-based vaccines (Anish et al., 2014).

Homogeneous polysaccharide antigens, obtained only after tedious purification steps, are required to increase vaccine quality, efficacy, and safety (Anish et al., 2014). Synthetic oligosaccharides provide an attractive alternative to furnish vaccines free of contaminants, particularly against non-culturable pathogens. Tremendous progress has been achieved in the development of synthetic oligosaccharide vaccines against human pathogenic bacteria (Verez-Bencomo et al., 2004; Aguilar-Betancourt et al., 2008; Shang et al., 2015; Kong et al., 2016; Liao et al., 2016; Schumann et al., 2017). Synthetic oligosaccharides with well-defined structures can facilitate epitope mapping, which allows for rational epitope design (Broecker et al., 2016). Most polysaccharide chains of pathogens contain repetitive sequences that can be an attractive option for epitope discovery and design (Anish et al., 2014; Schumann et al., 2014; Reinhardt et al., 2015; Menova et al., 2018). The immunogenicity of an oligosaccharide antigen can be enhanced and evaluated after conjugation to a carrier protein. Insights into the immunological features of oligosaccharide antigens, such as epitope recognition patterns, binding affinities, and carbohydrate–antibody interactions, can be gained by dissecting oligosaccharide interactions with purified monoclonal antibodies (mAbs) using various biochemical and biophysical techniques (Reinhardt et al., 2015; Broecker et al., 2016; Liao et al., 2016; Emmadi et al., 2017; Lisboa et al., 2017; Kaplonek et al., 2018). Identification of the minimal epitopes of bacterial surface polysaccharides may contribute to more cost-efficient vaccines with limited synthetic effort (Anish et al., 2014; Pereira et al., 2015).

S. aureus CP5 and CP8 have been found to be highly potent antigenic targets. To date, chemical synthesis of the trisaccharide repeating units of CP5 (Danieli et al., 2012; Yasomanee et al., 2016; Hagen et al., 2017; Behera et al., 2020) and CP8 (Visansirikul et al., 2015) has been achieved. However, the immunological mechanism remains unclear. During the course of our investigations on the synthesis of complex oligosaccharides, we have successfully completed several complicated bacterial lipopolysaccharide repeating antigens (Qin et al., 2018; Zou et al., 2018; Tian et al., 2020). Here, we describe the design and chemical synthesis of a CP8 trisaccharide containing an amine linker at the reducing end, with D-glucose and L-fucose as starting materials, which is ready for glycoconjugate preparation and glycan microarray fabrication. The immunogenicity of the synthetic trisaccharide was evaluated with a glycan microarray after conjugation with CRM197 protein. The nontoxic diphtheria toxin mutant CRM197 is often used in licensed vaccines and has proven highly immunogenic (Hecht et al., 2009; Avci and Kasper, 2010; Broecker et al., 2011). mAbs were generated, and their recognition of and binding to an S. aureus strain were detected, indicating the great potential of synthetic trisaccharide **1** as an efficient vaccine antigen.

## MATERIALS AND METHODS

#### Chemicals and Instruments

Commercially available reagents and solvents (analytical grade) were used without further purification unless otherwise stated. The anhydrous solvents were obtained from an MBraun MB-SPS 800 Dry Solvent System. <sup>1</sup>H, <sup>13</sup>C, and two-dimensional NMR spectra were recorded on a Bruker Ultrashield Plus 400 MHz spectrometer at 25 °C. High-resolution mass spectra were acquired on an Agilent 6220 ESI-TOF mass spectrometer. Optical rotation (OR) was measured with a Schmidt & Haensch UniPol L 1000 at 589 nm, with the concentration (c) expressed in g/100 mL. Infrared (IR) spectra were acquired on a Nicolet iS5 spectrometer (Thermo Fisher).

### Synthesis of Trisaccharide 1

The synthetic routes to building blocks **4** and **6** are outlined in **Scheme 1** and **Scheme 2**, respectively (for synthetic procedures, see **Supplementary Material**). The synthetic route to target trisaccharide **1** is outlined in **Scheme 3**.

Procedure for the synthesis of N-benzyl-N-benzyloxycarbonyl-3-aminopropyl 4-O-benzyl-3-O-(4-O-benzyl-3-O-[benzyl 3-O-benzyl-2-acetamido-2-deoxy-β-D-mannopyranosyl uronate]-2-acetamido-2-deoxy-α-L-fucopyranosyl)-2-acetamido-2-deoxy-α-D-fucopyranoside (**22**):

To a solution of compound **21** (for the synthetic procedure, see **Supplementary Material**) (8.3 mg, 6.7 µmol) in dichloromethane (DCM; 0.7 mL), water (2 µL) and trifluoroacetic acid (70 µL, 0.94 mmol) were added at room temperature. The reaction was stirred overnight and monitored by thin layer chromatography (TLC) analysis. After the reaction was quenched with triethylamine (Et3N; 0.1 mL), the solvent was evaporated. The residue was purified by silica gel column chromatography (DCM:methanol 20:1 v/v) to give the 4,6-diol compound (6 mg, 5.2 µmol, 78%).

Water (0.28 mL), 2,2,6,6-tetramethylpiperidine 1-oxyl (TEMPO) (0.9 mg, 5.5 µmol), and (diacetoxyiodo)benzene (BAIB) (9 mg, 27.7 µmol) were added to a solution of the 4,6-diol compound (12.7 mg, 11 µmol) in DCM (0.6 mL) at room temperature. The mixture was stirred for 4 h and monitored by TLC analysis. The mixture was then passed through a pad of silica gel (DCM:methanol 5:1 v/v), concentrated, and dried under high vacuum. The crude acid was used directly in the next step.

The crude acid was dissolved in anhydrous N,N-dimethylformamide (DMF; 1.1 mL) under argon. Sodium hydrogen carbonate (5 mg, 50 µmol) and benzyl bromide (BnBr; 10 µL, 82.5 µmol) were added at room temperature. The reaction mixture was stirred at room temperature for 12 h and monitored by TLC analysis. After being quenched with water (2 mL), the reaction mixture was extracted with ethyl acetate (3 × 5 mL) and the organic layer was washed with brine (5 mL). The organic layer was dried over Na2SO4, filtered, and concentrated. The residue was purified by silica gel column chromatography (DCM:methanol 50:1 v/v) to afford trisaccharide **22** as a colorless syrup (12.0 mg, 9.6 µmol, 87% over two steps). IR νmax (film) 2,942, 2,865, 1,722, 1,366, 1,294, 1,242, 1,190, 1,104, 1,046, 732 cm<sup>−1</sup>; <sup>1</sup>H NMR [400 MHz, deuterochloroform (CDCl3)] δ = 7.51–7.09 (m, 30H, 5Ph), 6.90 (d, J = 9.3 Hz, 1H, N-H), 6.32 (m, 2H, N′′-H, N-H), 5.43–5.24 (m, 2H, Bn-2H), 5.13 (s, 2H, Bn-2H), 4.99–4.82 (m, 3H, Bn-2H, 1′-H), 4.83–4.69 (m, 4H, Bn-2H, 2-H, 2′-H), 4.68–4.56 (m, 3H, 1-H, NBn-1H, 2′′-H), 4.54 (d, J = 2.0 Hz, 1H, 1′′-H), 4.47 (dd, J = 11.3, 3.4 Hz, 2H, Bn-2H), 4.38 (d, J = 15.8 Hz, 1H, NBn-1H), 3.99–3.85 (m, 3H, 5′-H, 3-H, 4′′-H), 3.79 (d, J = 9.3 Hz, 2H, 5′′-H, 5-H), 3.71 (dd, J = 10.7, 2.6 Hz, 2H, 3′-H, linker-1H), 3.68–3.57 (m, 1H, linker-1H), 3.50–3.37 (m, 3H, 4-H, 4′-H, 3′′-H), 3.28 (s, 1H, linker-1H), 3.20 (d, J = 13.9 Hz, 1H, linker-1H), 2.74 (d, J = 2.5 Hz, 1H, 4′′-OH), 2.05 (s, 3H, CH3CO), 2.00 (s, 6H, 2CH3CO), 1.71 (m, 2H, linker-2H), 1.26 (d, J = 7.0 Hz, 3H, 6′-CH3), 1.20 (d, J = 6.5 Hz, 3H, 6-CH3).

Procedure for the synthesis of N-benzyl-N-benzyloxycarbonyl-3-aminopropyl 4-O-benzyl-3-O-(4-O-benzyl-3-O-[benzyl 4-O-acetyl-3-O-benzyl-2-acetamido-2-deoxy-β-D-mannopyranosyl uronate]-2-acetamido-2-deoxy-α-L-fucopyranosyl)-2-acetamido-2-deoxy-α-D-fucopyranoside (**2**):

To a solution of compound **22** (7 mg, 5.6 µmol) in pyridine (0.2 mL) under argon, acetic anhydride (6 µL, 63.5 µmol) was added at 0 °C. After stirring for 3 h at room temperature, the reaction was quenched with methanol (50 µL). After removal of the solvent, the residue was purified by silica gel column chromatography (DCM:methanol 70:1 v/v) to give product **2** as a colorless syrup (5.8 mg, 4.5 µmol, 80%). [α]<sup>20</sup><sub>D</sub> = −13.47° [c = 0.50, chloroform (CHCl3)]; IR νmax (film) 3,334, 2,923, 1,754, 1,671, 1,526, 1,454, 1,368, 1,232, 1,095, 1,051, 736, 698 cm<sup>−1</sup>; <sup>1</sup>H NMR (400 MHz, CDCl3) δ = 7.51–7.10 (m, 30H, 6Ph), 6.84 (d, J = 9.5 Hz, 1H, N-H), 6.57 (d, J = 9.1 Hz, 1H, N′′-H), 6.36 (d, J = 9.6 Hz, 1H, N-H), 5.38 (t, J = 6.5 Hz, 1H, 4′′-H), 5.27–5.00 (m, 5H, Bn-5H), 4.98 (d, J = 3.8 Hz, 1H, 1′-H), 4.91–4.67 (m, 5H, 1-H, 2-H, 2′-H, Bn-2H), 4.67–4.47 (m, 5H, 1′′-H, 2′′-H, Bn-3H), 4.46–4.34 (m, 2H, Bn-2H), 4.29–4.18 (m, 1H, 3′-H), 4.04 (d, J = 6.1 Hz, 1H, 5′′-H), 3.96 (m, 3H, 5-H, 5′-H, 3-H), 3.67 (m, 3H, 4-H, linker-2H), 3.51 (m, 2H, 4′-H, 3′′-H), 3.38–3.15 (m, 2H, linker-2H), 2.04 (s, 3H, CH3CO), 2.00 (s, 3H, CH3CO), 1.92 (s, 3H, CH3CO), 1.82 (s, 3H, CH3CO), 1.76 (m, 2H, linker-2H), 1.25 (d, J = 6.4 Hz, 6H, 6-CH3, 6′-CH3); <sup>13</sup>C NMR (100 MHz, CDCl3) δ = 171.9, 169.7, 169.6, 167.4, 156.4, 139.0, 138.3, 137.9, 136.5, 135.0, 128.8, 128.7, 128.6, 128.5, 128.4, 128.3, 128.2, 128.1, 127.8, 127.5, 127.3, 100.3 (anomeric), 98.0 (anomeric), 94.5 (anomeric), 74.7, 74.4, 72.1, 67.9, 67.7, 67.5, 67.2, 63.5, 49.7, 48.8, 47.7, 47.0, 29.7, 23.6, 22.9, 20.8, 17.3, 17.0; high-resolution electrospray ionization mass spectrometry (HR-ESI-MS) (m/z): calcd for C72H84N4O18Na<sup>+</sup> (M + Na<sup>+</sup>): 1,315.5678, found: 1,315.5697.

Procedure for the synthesis of 3-aminopropyl 3-O-(3-O-[4-O-acetyl-2-acetamido-2-deoxy-β-D-mannopyranosyluronic acid]-2-acetamido-2-deoxy-α-L-fucopyranosyl)-2-acetamido-2-deoxy-α-D-fucopyranoside (**1**):

Trisaccharide **2** (4.7 mg, 3.63 µmol) was dissolved in a mixture of tert-butyl alcohol (tBuOH)/water/DCM (5:2:1 v/v/v, 4 mL). The solution was purged with nitrogen, 10% Pd/C was added, and the solution was purged with H<sub>2</sub> for 5 min, then stirred under an H<sub>2</sub> atmosphere overnight, filtered (celite pad), and concentrated. The residue was purified with a Sep-Pak cartridge C18 (Macherey-Nagel, Düren, Germany) using water and methanol as eluents to give trisaccharide **1** as a white solid (2.5 mg, 3.53 µmol, 97%). [α]<sup>20</sup><sub>D</sub> = −45.25° (c = 0.20, H2O); <sup>1</sup>H NMR (400 MHz, D2O) δ = 5.13–5.05 (m, 2H, 4′′-H, 1′-H), 5.02 (s, 1H, 1′′-H), 4.83 (d, J = 3.8 Hz, 1H, 1-H), 4.59 (d, J = 4.4 Hz, 1H, 2′′-H), 4.33 (dd, J = 11.1, 3.8 Hz, 1H, 2-H), 4.27–4.20 (m, 2H, 2′-H, 4′-H), 4.13 (dd, J = 9.6, 4.5 Hz, 3H, 3′′-H, 5-H, 5′-H), 4.08 (s, 1H, 3′-H), 4.04–3.94 (m, 2H, 5′′-H, 3-H), 3.85 (d, J = 3.2 Hz, 1H, 4-H), 3.81 (dd, J = 10.9, 5.6 Hz, 1H, linker-CH2), 3.57 (dt, J = 11.2, 6.0 Hz, 1H, linker-CH2), 3.16 (t, J = 7.6 Hz, 2H, linker-CH2), 2.18 (s, 3H, CH3CO), 2.11 (s, 6H, 2CH3CO), 2.03 (s, 5H, linker-CH2, CH3CO), 1.28 (d, J = 2.3 Hz, 3H, 6′-CH3), 1.27 (d, J = 2.7 Hz, 3H, 6-CH3); <sup>13</sup>C NMR (100 MHz, D2O) δ = 175.6, 173.9, 173.8, 173.1, 98.9 (1′-C), 97.2 (1-C), 95.1 (1′′-C), 74.1, 73.3, 73.1, 71.1, 70.1, 69.5, 67.6, 66.9, 66.5, 65.0, 53.0, 48.5, 47.6, 37.2, 26.8, 22.3, 21.94, 21.92, 20.3, 15.5, 15.3; HR-ESI-MS (m/z): calcd for C29H48N4O16Na<sup>+</sup> (M + Na<sup>+</sup>): 731.2963, found: 731.2962.

Characterization data: <sup>1</sup>H, <sup>13</sup>C, and two-dimensional NMR spectra for products are shown in the **Supplementary Material** (Pages S14-S33).
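The calculated HR-ESI-MS values quoted above for the sodium adducts of **2** and **1** can be checked from the molecular formulas alone. The sketch below is illustrative and not part of the original work: it sums standard monoisotopic masses, subtracts one electron mass for the cation, and expresses the difference between found and calculated m/z in parts per million. The function names are hypothetical, and whether the electron mass is subtracted shifts the result by only about 0.0005 Da.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "Na": 22.9897693,
}
ELECTRON = 0.00054858  # mass of one electron (Da)


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C29H48N4O16Na'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def cation_mz(formula: str) -> float:
    """m/z of a singly charged cation: atom sum minus one electron."""
    return monoisotopic_mass(formula) - ELECTRON


def ppm_error(found: float, calculated: float) -> float:
    """Relative deviation of the found m/z from the calculated m/z, in ppm."""
    return (found - calculated) / calculated * 1e6


# Formulas and found/calculated values as reported in the text.
for label, formula, calcd, found in [
    ("2", "C72H84N4O18Na", 1315.5678, 1315.5697),
    ("1", "C29H48N4O16Na", 731.2963, 731.2962),
]:
    print(label, round(cation_mz(formula), 4), round(ppm_error(found, calcd), 2))
```

Under these assumptions, the recomputed values agree with the reported calculated masses to within about 1 mDa, and both found values deviate from the calculated values by less than 5 ppm.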

#### Preparation and Analysis of Glycoconjugate

Triethylamine (86 µmol) was added to a solution of bis(p-nitrophenyl) adipate (PNP; 67.8 µmol) in dimethyl sulfoxide (DMSO)/pyridine (1:1), and the mixture was stirred for 5 min at room temperature. Trisaccharide **1** (2.26 µmol) in DMSO/pyridine (1:1) was then added dropwise, and the reaction mixture was stirred at room temperature for 7 h. The reaction mixture was lyophilized, and the solid residue was washed with CHCl3 to give the trisaccharide PNP-ester. CRM197 protein (0.017 µmol) was washed with autoclaved water and phosphate buffer (pH 8.0). CRM197 in phosphate buffer (100 µL) was added to the trisaccharide PNP-ester, and the mixture was stirred at room temperature for 24 h. After the reaction, the mixture was washed with water and phosphate buffer. The glycoconjugate was analyzed by matrix-assisted laser desorption/ionization (MALDI)–time-of-flight (TOF)–MS and by sodium dodecyl sulfate (SDS)–polyacrylamide gel electrophoresis (PAGE).

#### Immunization Experiments

Animal experiments were approved by Jiangnan University of Technology Animal Care and Use Committee (Animal Ethics Committee Number: JN.No. 20180915b0121125[175]). Twelve female Balb/c mice (6 weeks old; Charles River, Beijing, China) were randomly divided into two groups. Each mouse in the immunized group was immunized subcutaneously with glycoconjugate corresponding to 4 µg of oligosaccharide hapten in complete Freund's adjuvant. The mice were then boosted twice with glycoconjugate in incomplete adjuvant at 2-week intervals. Sham-immunized mice were treated with the respective adjuvant without glycoconjugate. Serum samples were drawn from the tail vein and tested every week. Based on the serum microarray results, the mouse with the strongest response was boosted once more and euthanized to collect the spleen for cell fusion.

#### Preparation of Glycan Microarray Slides and Microarray Binding

Diisopropylamine (DIPA; 3.6 mL) was added to a solution of tetraethylene glycol disuccinimidyl disuccinate (TGDD; 1.58 g) in DMF (257 mL). APTES slides (Electron Microscopy Science) were immersed in the TGDD solution at 40 °C with shaking at 60–70 rpm overnight. The slides were then sonicated for 15 min and washed with anhydrous ethanol three times. After spin-drying, the slides were vacuum dried at 37 °C for 3 h. The oligosaccharides were dissolved in the coupling buffer (50 mM sodium phosphate, pH 8.5) for printing using an Arrayjet Sprint (Arrayjet). After printing, the slides were incubated in a humidified chamber at 26 °C with 55% humidity overnight. The slides were then placed in microarray quenching buffer (50 nM Na2HPO4 and 100 nM ethanolamine dissolved in 1 L ddH2O) at 50 °C for 1 h and washed with ddH2O three times. The slides were briefly centrifuged to remove residual water and were then ready for use.

Quenched slides were blocked by incubation in 3% bovine serum albumin (BSA; w/v) in phosphate buffered saline (PBS) at 4 °C overnight. The slides were washed once with PBST (0.1% Tween in PBS) and twice with PBS. The residual liquid was removed by centrifugation. Mouse serum samples were serially diluted in 1% BSA (w/v) in PBS and then added to the wells of an incubation chamber (ProPlate) mounted on the microarray. Each sample had at least two replicates. The microarray was incubated in a dark humid chamber for 1 h at room temperature. The samples were then removed, and each well was washed three times with 50 µL PBST. Secondary antibodies diluted 1:400 in 1% BSA (w/v) in PBS were added to the wells and incubated in the dark humid chamber for 45 min at room temperature. After the secondary antibodies were removed, each well was washed three times with 50 µL PBST. The chamber was then carefully removed, and the slides were washed once with ddH2O and once with ddH2O for 15 min. The residual liquid was removed by centrifugation, and the slides were then ready for scanning on an Axon GenePix 4200AM.

#### Generation and Purification of Monoclonal Antibodies

Isolated splenocytes from the selected mouse were fused to P3X63Ag8.653 myeloma cells (ATCC CRL-1580) with standard hybridoma technique (Broecker et al., 2015). Selection of positive clones specific for trisaccharide **1** was conducted with glycan array screening. Consecutive subcloning steps were performed to identify mAb-producing hybridoma cells. The large-scale production of ascitic fluid rich in mAb was performed according to the published method (Ren et al., 2017), which was approved by Jiangnan University of Technology Animal Care and Use Committee (Animal Ethics Committee Number: JN.No. 20190515b0060831[107]). The mAb was further isolated from ascitic fluid using sequential precipitation with caprylic acid and ammonium sulfate.

#### Immunofluorescence of Inactivated *S. aureus*

S. aureus serotype 8 (ATCC 49525) and Escherichia coli BL21 were cultured at 37 °C in Soybean-Casein Digest Medium and Luria-Bertani (LB) medium, respectively. The bacteria were inactivated in 0.4% paraformaldehyde for 48 h. The bacteria were collected and washed with buffer I (50 mM NaHCO3, 100 mM NaCl, pH 7.5) for labeling with 0.1 mg·mL<sup>−1</sup> fluorescein isothiocyanate (FITC). The bacteria were then washed with 0.25% BSA (w/v) in PBS, suspended in 1% BSA (w/v) in PBS, and incubated with mAbs or mouse serum at 4 °C for 16 h under agitation. After being washed with 1% BSA (w/v) in PBS, the bacteria were incubated with goat anti-mouse IgG–Alexa Fluor 635 solution in the dark at room temperature for 1.5 h. After washing, the fluorescence on the bacteria was monitored using a confocal laser scanning microscope.

## RESULTS AND DISCUSSION

#### Synthesis of Trisaccharide 1

Conjugation-ready S. aureus CP8 trisaccharide **1** bearing an orthogonal amine linker was designed. Retrosynthetic analysis revealed disaccharide trifluoroacetimidate **3** and D-fucosamine **4** as key intermediates (**Figure 1**). Following a β-glucosylation–epimerization strategy, disaccharide **3**, which contains the β-mannosidic linkage, can in turn be derived from D-glucose building block **5** and L-fucosamine building block **6**. The C2 epimerization was planned at the disaccharide stage to improve the overall efficiency of the synthesis. In particular, both C2 amino groups in D-fucosamine and L-fucosamine were masked as nonparticipating azido groups to favor the stereoselective formation of the two 1,2-cis-α-glycosidic linkages.

Preparation of D-fucosamine building block **4** started from inexpensive D-glucose via the known compound **7** (Qin et al., 2018) (**Scheme 1**). Removal of the benzylidene group and subsequent tosylation afforded compound **8**, which was deoxygenated at C6 through iodination and reduction in good overall yield. Triflation of D-quinovosamine derivative **9** followed by C4 epimerization through a Lattrell-Dax inversion gave rise to D-fucosamine derivative **10** in 51% overall yield. After benzylation under neutral conditions, compound **11** was converted to Schmidt donor **12** in good overall yield. Acid-catalyzed glycosylation with the linker N-Bn-N-Cbz-3-aminopropan-1-ol in diethyl ether/DCM gave compound **13** (α:β = 3.5:1) in 80% yield. Treatment of **13** with sodium methoxide (NaOMe) gave rise to building block **4**.

Synthesis of L-fucosamine building block **6** was initiated from known compound **14** (Qin et al., 2018) (**Scheme 2**). Deacetylation and subsequent dibutyltin oxide-mediated regioselective p-methoxybenzylation afforded alcohol **16** in 93% overall yield. Building block **6** was obtained by C4 benzylation and removal of the C3 para-methoxybenzyl (PMB) group in good overall yield.

Assembly of the trisaccharide proceeded from the nonreducing end to the reducing end (**Scheme 3**). The union of thioglycoside donor **5** (David et al., 1989; Wang et al., 2013; Li et al., 2014) and acceptor **6** in the presence of trimethylsilyl trifluoromethanesulfonate (TMSOTf) and N-iodosuccinimide (NIS) at 0 °C afforded disaccharide **18** in good yield and stereoselectivity. After removal of the levulinoyl (Lev) group, the β-glucosyl derivative was transformed into β-mannosyl derivative **19** by azide displacement of the triflate in good overall yield. The anomeric allyl group was removed with palladium(II) chloride (PdCl2), followed by introduction of the trifluoroacetimidate to give Yu donor **3**. TMSOTf-catalyzed glycosylation of **4** with Yu donor **3** in a mixed-solvent system of DCM, diethyl ether, and thiophene afforded trisaccharide **20** in 82% yield and good stereoselectivity. Reduction of the azido groups with propane-1,3-dithiol and subsequent acetylation gave trisaccharide **21**. The 4,6-O-benzylidene group was removed to afford the diol, which was oxidized at C6 and then benzylated in good overall yield. Acetylation of alcohol **22** afforded compound **2**, which was transformed into target trisaccharide **1** through global deprotection in 97% yield. The NMR data (<sup>1</sup>H and <sup>13</sup>C NMR spectra) of trisaccharide **1** are in agreement with those of the isolated polysaccharide (Jones, 2005) (for details, see **Supplementary Material**, **Table S1**). The slight spectral differences are most evident toward the reducing end and probably arise from the aminopropyl linker installed in the synthetic trisaccharide.
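As a consistency check on the final stages of the route, the step yields quoted in Materials and Methods can be recomputed from the reported micromole amounts and combined into a cumulative yield from **21** to **1**. The Python sketch below is illustrative only; the step labels and function name are ours, and the cumulative figure is a derived estimate rather than a value stated by the authors.

```python
from math import prod


def percent_yield(product_umol: float, limiting_umol: float) -> float:
    """Percent yield from the amounts of product and limiting reagent (µmol)."""
    return 100.0 * product_umol / limiting_umol


# (step label, product µmol, starting material µmol), as reported in the Methods.
steps = [
    ("benzylidene removal (21 -> diol)", 5.2, 6.7),
    ("oxidation and benzylation (diol -> 22)", 9.6, 11.0),
    ("acetylation (22 -> 2)", 4.5, 5.6),
    ("hydrogenolysis (2 -> 1)", 3.53, 3.63),
]

step_yields = [percent_yield(product, start) for _, product, start in steps]
# The cumulative yield is the product of the fractional step yields.
overall = 100.0 * prod(y / 100.0 for y in step_yields)

for (label, _, _), y in zip(steps, step_yields):
    print(f"{label}: {y:.1f}%")
print(f"cumulative yield from 21 to 1: {overall:.1f}%")
```

The recomputed step yields round to the reported 78%, 87%, 80%, and 97%, which corresponds to a cumulative yield of roughly one half over these last stages.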

### Preparation and Immunogenicity of Glycoconjugate of Trisaccharide 1

To test the immunogenicity of the synthetic hapten, trisaccharide **1** was covalently linked to the immunogenic carrier protein CRM197 to obtain the CRM197–trisaccharide **1** glycoconjugate. CRM197 is a Food and Drug Administration (FDA)-approved constituent of marketed carbohydrate conjugate vaccines (Broecker et al., 2011). The glycoconjugate was generated by coupling the spacer bis(4-nitrophenyl) adipate with the amine group of trisaccharide **1** and the lysine amino groups of CRM197 (**Scheme 4**). Formation of the glycoconjugate was confirmed by SDS-PAGE, and the glycan loading on CRM197 was analyzed by MALDI-TOF-MS (**Figure S1**). The mass spectrum showed the glycoconjugate peak at about 63.6 kDa, which indicated that around six trisaccharides were loaded onto CRM197.
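The loading estimate follows from dividing the mass gained by the carrier by the mass added per attached glycan. The sketch below illustrates that arithmetic under stated assumptions: the mass of unconjugated CRM197 (about 58.4 kDa) is not given in the text and is assumed here, and the per-glycan increment is taken as trisaccharide **1** (C29H48N4O16, from the HR-ESI-MS formula) plus one adipoyl unit, minus the two hydrogens lost on forming the two amide bonds.

```python
# Standard average atomic masses (Da).
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

TRISACCHARIDE_1 = {"C": 29, "H": 48, "N": 4, "O": 16}  # formula from HR-ESI-MS
ADIPOYL = {"C": 6, "H": 8, "O": 2}  # -CO(CH2)4CO- spacer unit


def average_mass(composition: dict) -> float:
    """Average molecular mass (Da) for an element -> count mapping."""
    return sum(AVERAGE_MASS[el] * n for el, n in composition.items())


def mass_added_per_glycan() -> float:
    """Mass gained by the protein per glycan attached through the adipate spacer."""
    # One H is lost from the glycan amine and one from the lysine amine.
    return (average_mass(TRISACCHARIDE_1) + average_mass(ADIPOYL)
            - 2 * AVERAGE_MASS["H"])


def glycan_loading(conjugate_da: float, carrier_da: float) -> float:
    """Average number of glycans per carrier protein."""
    return (conjugate_da - carrier_da) / mass_added_per_glycan()


CRM197_DA = 58_400.0     # ASSUMPTION: approximate mass of unconjugated CRM197
CONJUGATE_DA = 63_600.0  # MALDI-TOF peak reported in the text

loading = glycan_loading(CONJUGATE_DA, CRM197_DA)
print(f"mass added per glycan: {mass_added_per_glycan():.1f} Da")
print(f"estimated loading: {loading:.1f} glycans per CRM197")
```

With these inputs the estimate comes out slightly above six glycans per protein, in line with the loading stated above; a different reference mass for CRM197 would shift the result accordingly.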

In the immunization experiment, each mouse in the immunized group received one priming dose of CRM197–trisaccharide **1** glycoconjugate containing 4 µg of glycan in complete Freund's adjuvant and one boosting dose of the conjugate in incomplete Freund's adjuvant. Mice in the sham-immunized group received the same volume of PBS in the formulation. The serum antibody titers were monitored by glycan microarray. The glycan array results revealed that the immunized mice had an increased IgG response specifically against the trisaccharide antigen compared with pre-immunization serum levels and with the sham-immunized mice (**Figure S2A**). Mouse 6, which had the highest IgG antibody level, was boosted once more and sacrificed to collect the spleen for mAb development (**Figure S2B**). The endpoint serum was further analyzed and showed a robust IgG response against CRM197 and the glycan hapten (**Figure 2**). No antibody against the spacer in the glycoconjugate was detected. Several glycans, including E. coli O55:B5 lipopolysaccharide (LPS), Plesiomonas shigelloides (P. shigelloides) serotype 51 O-antigen trisaccharide, α-1-6-glucose trisaccharide, and mannose, were selected as controls on the glycan microarrays (**Figure S3**). α-1-6-Glucose trisaccharide is known as an immunodeterminant of the Helicobacter pylori LPS core oligosaccharide. P. shigelloides serotype 51 O-antigen trisaccharide, a zwitterionic oligosaccharide comprising diamino-D-glucuronic acid, L-fucosamine, and D-quinovosamine, served to elucidate the structural specificity of antibodies against trisaccharide **1**. No cross-reactivity was detected against the control glycans, confirming that the immunological specificity of trisaccharide **1** relied on its monosaccharide types, substituents, and glycosidic bonds.

### Immunological Evaluation on Inactivated Bacteria

The mAb was prepared by hybridoma development and subcloning. Several selected hybridoma supernatants containing secreted antibodies were evaluated for binding to S. aureus Type 8 trisaccharide **1** by glycan microarray analysis. Cells of one selected hybridoma clone were injected intraperitoneally into Balb/c mice to collect ascitic fluid for large-scale preparation of mAbs. The ascitic fluid was purified by sequential precipitation with caprylic acid and ammonium sulfate (Perosa et al., 1990) to obtain purified mAbs at a concentration of 0.75 mg·mL<sup>−1</sup> (**Figure S3**).

Encouraged by the glycan microarray results, we analyzed bacterial recognition by the serum antibodies and purified mAbs using immunofluorescence, with imaging by confocal laser scanning microscopy (CLSM). The widely distributed E. coli BL21 strain was used as a control. The bacteria were visualized by localized green fluorescence on the bacterial surface after direct FITC labeling. Binding of serum IgG antibodies and purified mAbs to the bacteria was detected by the co-localization of red fluorescence after incubation with goat anti-mouse IgG–Alexa Fluor 635. As shown in **Figure 2B** and **Figure 3**, both serum antibodies and purified mAbs bound significantly to the surface of S. aureus, indicating good recognition by the antibodies. Serum antibodies showed very weak binding to E. coli BL21, which was probably due to a weak immune response against E. coli, whereas the purified mAbs did not bind to the control bacteria, implying the specificity of the mAbs.

### CONCLUSION

Based on the S. aureus CP8 trisaccharide repeating unit, trisaccharide **1** equipped with an anomeric linker was synthesized. A β-glycosylation–epimerization strategy served well to introduce the β-mannosyl residue into the disaccharide intermediate. Two 1,2-cis-α-glycosidic linkages were formed stereoselectively, relying on a non-participating C2 azide group and the solvent effects of diethyl ether and thiophene. We have further shown the immunogenicity of the synthetic S. aureus CP8 trisaccharide **1**: after conjugation with CRM197, it elicited IgG antibodies in mice. The prepared mAbs, with their high specificity for S. aureus, have great potential for the development of novel therapeutic or preventive approaches. The synthetic trisaccharide **1** can serve as a suitable candidate antigen for the development of a carbohydrate-based vaccine against S. aureus.

#### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/**Supplementary Material**.

### ETHICS STATEMENT

The animal study was reviewed and approved by Jiangnan University of Technology Animal Care and Use Committee.

### AUTHOR CONTRIBUTIONS

MZ performed animal studies, generation of mAbs, glycan microarray screening, and bacterial recognition with the help of LL, HX, ZZ, and BM. CQ synthesized S. aureus CP8 trisaccharide **1**. JY and JH designed and initiated this study. CQ, JY, and JH wrote the manuscript with input from all authors.

### FUNDING

This work was supported by the National Natural Science Foundation of China (21877052, 21907039), Natural Science Foundation of Jiangsu Province (BK20180030, BK20190575), National Key R&D Program of China (2018YFA0901700), National First-class Discipline Program of Light Industry Technology and Engineering (LITE2018-14), the Fundamental Research Funds for the Central Universities (JUSRP51712B), and the Max Planck Society International Partner Group Program.

### ACKNOWLEDGMENTS

We thank Prof. Peter H. Seeberger and Ms. Bruna Mara Silva Seco from the Department of Biomolecular Systems, Max Planck Institute of Colloids and Interfaces, for their beneficial discussion and generous help.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2020.00258/full#supplementary-material


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Zhao, Qin, Li, Xie, Ma, Zhou, Yin and Hu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Peptidyl ω-Asp Selenoesters Enable Efficient Synthesis of N-Linked Glycopeptides

Jing-Jing Du<sup>1†</sup>, Lian Zhang<sup>1†</sup>, Xiao-Fei Gao<sup>2</sup>, Hui Sun<sup>3</sup> and Jun Guo<sup>1</sup>\*

<sup>1</sup> Key Laboratory of Pesticide and Chemical Biology of Ministry of Education, Hubei International Scientific and Technological Cooperation Base of Pesticide and Green Synthesis, International Joint Research Center for Intelligent Biosensing Technology and Health, College of Chemistry, Central China Normal University, Wuhan, China, <sup>2</sup> Jiangxi Key Laboratory for Mass Spectrometry and Instrumentation, East China University of Technology, Nanchang, China, <sup>3</sup> Hubei Key Laboratory of Cell Homeostasis, Hubei Province Key Laboratory of Allergy and Immunology, Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, College of Life Sciences, Ministry of Education, Wuhan University, Wuhan, China

Chemical synthesis is an attractive approach that allows for the assembly of homogeneous complex N-linked glycopeptides and glycoproteins, but the limited coupling efficiency between glycans and peptides has hampered synthesis and research in the field. Herein we developed an alternative glycosylation method to construct N-linked glycopeptides via efficient selenoester-assisted aminolysis, which employs a peptidyl ω-asparagine selenoester and an unprotected glycosylamine to perform rapid amide-bond ligation. This glycosylation strategy is highly compatible with the free carboxylic acids and hydroxyl groups of peptides and carbohydrates, and is readily applicable to the assembly of structure-defined homogeneous N-linked glycopeptides, such as segments derived from the glycoproteins EPO and IL-5.

Keywords: N-linked glycopeptide, glycosylation, selenoester, aminolysis, chemical synthesis

## INTRODUCTION

Many proteins undergo co- or post-translational modifications, including phosphorylation, acetylation, and glycosylation, to fulfill their functions (Walsh and Jefferis, 2006; Carubbi et al., 2019). It is estimated that glycosylation is associated with approximately 50% of human proteins (Clerc et al., 2016; Oliveira-Ferrer et al., 2017) and 30% of approved biopharmaceutical proteins (Zou et al., 2020), and that it is critical for important biological processes in living systems, such as cell adhesion, recognition, targeting, and differentiation (Varki, 2017; Bhat et al., 2019). Despite the importance of glycosylation, rigorous evaluation of the relationship between the precise structure and biological function of glycoproteins is complicated by the structural heterogeneity of oligosaccharides in biological organisms and by the difficulty of obtaining sufficient amounts of structure-defined glycoproteins with a single glycoform from natural sources (Park et al., 2009).

#### Edited by:

Zhongping Tan, Chinese Academy of Medical Sciences and Peking Union Medical College, China

#### Reviewed by:

Albert Moyano, University of Barcelona, Spain; Suwei Dong, Peking University, China

#### \*Correspondence:

Jun Guo jguo@mail.ccnu.edu.cn

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Organic Chemistry, a section of the journal Frontiers in Chemistry

Received: 25 January 2020 Accepted: 15 April 2020 Published: 05 May 2020

#### Citation:

Du J-J, Zhang L, Gao X-F, Sun H and Guo J (2020) Peptidyl ω-Asp Selenoesters Enable Efficient Synthesis of N-Linked Glycopeptides. Front. Chem. 8:396. doi: 10.3389/fchem.2020.00396

Extensive efforts have been made to develop viable and efficient strategies for the chemical construction of homogeneous complex N-linked glycopeptides and glycoproteins (Payne and Wong, 2010; Wilson et al., 2013; Okamoto et al., 2014a; Wang and Amin, 2014; Fairbanks, 2019; Li et al., 2019), such as resin-bound glycosylation (Kunz and Unverzagt, 1988; Vetter et al., 1995; Offer et al., 1996; Mezzato et al., 2005; Kajihara et al., 2006; Yamamoto et al., 2008; Piontek et al., 2009a,b; Chen and Tolbert, 2010; Conroy et al., 2010; Ullmann et al., 2012; Okamoto et al., 2014b; Reif et al., 2014; Lee et al., 2016; Schöwe et al., 2019) and solution glycosylation (Anisfeld and Lansbury, 1990; Cohen-Anisfeld and Lansbury, 1993; Kaneshiro and Michael, 2006; Wang et al., 2011, 2012, 2013; Aussedat et al., 2012; Nagorny et al., 2012; Sakamoto et al., 2012; Joseph et al., 2013; Chai et al., 2016; Schöwe et al., 2019). However, considerable limitations remain in these strategies. On-resin glycopeptide assembly via either the stepwise (**Scheme 1A**) or the convergent (**Scheme 1B**) strategy usually consumes large amounts of precious materials and gives low coupling yields. Based on the aspartylation technology pioneered by Lansbury and coworkers (**Scheme 1C**) (Anisfeld and Lansbury, 1990; Cohen-Anisfeld and Lansbury, 1993), the Danishefsky group and the Unverzagt group developed synthetic methods and optimized the pseudoproline dipeptide building block to construct the peptide fragment at the Asn-Xaa-Ser/Thr site, and this approach significantly suppressed the formation of aspartimide byproducts during glycosylation (Ullmann et al., 2012; Wang et al., 2012). Although useful, the requirement for additional metal catalysts or protected C-terminal carboxylic acid derivatives may limit the application of this strategy in glycopeptide assembly.
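The Asn-Xaa-Ser/Thr motif mentioned above can be located in a peptide sequence with a short pattern search. The following is a minimal sketch, not part of the reported method: the function name and the example peptide are hypothetical, and the exclusion of proline at the Xaa position reflects the commonly used sequon definition rather than anything stated in this article.

```python
import re

# Lookahead keeps overlapping motifs (e.g. "NNTS" contains two candidates).
# Assumption: Xaa may be any residue except proline (conventional sequon rule).
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(peptide):
    """Return (1-based Asn position, motif) for each Asn-Xaa-Ser/Thr site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(peptide.upper())]


# Hypothetical one-letter peptide used only to exercise the function.
example_peptide = "AENITTGCAEHCSLNENITVPD"
for position, motif in find_sequons(example_peptide):
    print(position, motif)
```

Positions are reported relative to the start of the supplied fragment, so an offset must be added when the fragment is numbered within a full-length protein.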

Although substantial advances have been made in the synthesis of N-linked glycopeptides and glycoproteins, it is still a great challenge to efficiently obtain large N-linked glycoproteins bearing complex glycans. Desirable synthetic methods would require fewer protecting groups and modifications on the peptide and glycan fragments and would promote efficient and selective ligation between fragments. Previously, our research group developed a strategy for the convergent synthesis of N-linked glycopeptides via peptidyl ω-Asp p-nitrophenyl thioester-assisted glycosylation (**Scheme 1C**) (Du et al., 2016). This convergent strategy with direct aminolysis provides access to complex N-linked glycopeptides, usually in good yields and with simple operation, and merits further investigation of additional reactions and applications.

Many investigators have shown that coupling of peptide fragments via direct aminolysis is a feasible method for the preparation of peptides and glycopeptides. This method employs a direct coupling reaction between peptide fragments bearing N-terminal free amines and peptide fragments bearing C-terminal active esters, such as oxoesters (Kemp and Vellaccio, 1975; Wan et al., 2008; Li et al., 2010), thioesters (Payne et al., 2008; Agrigento et al., 2014; LingáTung and Clarence, 2015; Gui et al., 2016) or selenoester derivatives (Grieco et al., 1981; Mitchell et al., 2015; Raj et al., 2015; Takei et al., 2017; Temperini et al., 2017; Du et al., 2018; Sayers et al., 2018a,b; Chisholm et al., 2020; Wang et al., 2020), and eliminates the need for N-terminal cysteine residues or thiol ligation auxiliaries, which are generally required for sequential native chemical ligation (Dawson et al., 1994; Kent, 2009). Notably, active selenoesters and their derivatives generally offer enhanced reactivity compared with thio- or oxoesters (Mitchell et al., 2015; Raj et al., 2015; Takei et al., 2017). Our previous studies have shown that the aminolysis of peptidyl selenoesters is an efficient strategy for peptide and glycopeptide assembly (Yin et al., 2016; Du et al., 2018). Herein we pursue a highly reactive peptidyl ω-Asp selenoester-assisted glycosylation methodology for constructing N-linked glycopeptides without coupling reagents (**Scheme 1C**). This methodology is expected to be compatible with the free carboxylic groups and hydroxyl groups of peptides and glycans.

### RESULTS AND DISCUSSION

#### Evaluation of the Reactivity of the Active Esters for Glycosidic Amide Bond Formation

To evaluate methods for synthesizing N-linked glycopeptides via active ester-assisted aminolysis (Du et al., 2016), the reactivity and efficiency of different active esters were compared using model reactions in which Fmoc-Gly ester **2** and glycosylamine **1a** (Likhosherstov et al., 1986; Cohen-Anisfeld and Lansbury, 1993) were condensed in DMSO to form the β-anomer product **3**, with monitoring by HPLC (**Table 1**, **Figure 1**).

Oxoester **2a** had the lowest activity, and almost no product was observed (**Table 1**, entry 1). Among the thioesters (**Table 1**, entries 2–3), phenyl thioester **2b** underwent glycosidic bond formation slightly faster than oxoester **2a**, but it is not efficient enough to be applied in N-linked glycopeptide synthesis; p-nitrophenyl thioester **2c**, with a strong electron-withdrawing group, reacted more efficiently, providing the target product in 75% yield within 10 h, which is consistent with previous studies (Hondal et al., 2001; Du et al., 2016). Accordingly, the peptidyl p-nitrophenyl thioester has been successfully utilized to prepare N-linked glycopeptides in our lab (Du et al., 2016).

To improve the efficiency of the glycosylation reaction, various selenoesters were assessed under the same conditions (**Table 1**, entries 4–6). Seleno-phenyl ester **2d** underwent complete conversion within 2 h and afforded the target product **3** in 92% yield. Seleno-benzaldehyde esters **2e**, with the o-benzaldehyde group, and **2f**, with the p-benzaldehyde group (Raj et al., 2015), underwent complete conversion in <1 h and gave the product in yields of 69 and 67%, respectively. We postulate that the electron-withdrawing o-benzaldehyde (a neighboring participating group) and p-benzaldehyde substituents increase the electrophilic reactivity of the phenyl selenoester but also facilitate hydrolysis, which reduces the yield of the aminolysis product. Therefore, seleno-phenyl ester **2d**, which affords an optimal balance between high reactivity and sufficient stability, is appropriate for selenoester-assisted aminolysis in glycosylation reactions.

As shown in **Table 2**, we compared the reaction kinetics of p-nitrophenyl thioester **2c** and seleno-phenyl ester **2d**. As expected, formation of product **3** from the glycine-derived ester and the glycosylamine follows second-order kinetics, with rate constants of 0.0071 ± 0.0004 M<sup>−1</sup> s<sup>−1</sup> for **2c** and 0.0420 ± 0.0012 M<sup>−1</sup> s<sup>−1</sup> for **2d**. The seleno-phenyl ester thus forms the glycosidic amide bond roughly six times faster than the p-nitrophenyl thioester.
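The rate comparison can be reproduced from the two reported rate constants, and the integrated second-order rate law gives a rough idea of how conversion evolves. The sketch below is illustrative only: it assumes initial concentrations of 5 mM ester and 10 mM glycosylamine (5 and 10 µmol in 1 mL of DMSO, as in the Table 2 conditions), treats aminolysis as the sole reaction, and ignores competing hydrolysis, so its predictions need not match the isolated yields. The function and constant names are introduced here for illustration.

```python
import math

K_THIOESTER_2C = 0.0071    # M^-1 s^-1, reported for 2c
K_SELENOESTER_2D = 0.0420  # M^-1 s^-1, reported for 2d


def second_order_conversion(k, ester0, amine0, t):
    """Fraction of ester consumed at time t (s) for ester + amine -> product.

    Integrated rate law for d[P]/dt = k[ester][amine]; concentrations in M.
    Assumes aminolysis is the only reaction (no hydrolysis).
    """
    if math.isclose(ester0, amine0):
        product = ester0 * ester0 * k * t / (1.0 + ester0 * k * t)
    else:
        growth = math.exp(k * (amine0 - ester0) * t)
        product = ester0 * amine0 * (growth - 1.0) / (amine0 * growth - ester0)
    return product / ester0


rate_ratio = K_SELENOESTER_2D / K_THIOESTER_2C
print(f"2d reacts about {rate_ratio:.1f} times faster than 2c")

ESTER0, AMINE0 = 0.005, 0.010  # assumed initial concentrations in M
for label, k in (("2c", K_THIOESTER_2C), ("2d", K_SELENOESTER_2D)):
    fraction = second_order_conversion(k, ESTER0, AMINE0, 2 * 3600)
    print(f"{label}: predicted conversion after 2 h = {fraction:.0%}")
```

The equal-concentration branch is included only so that the helper remains valid when the two reagents are used in a 1:1 ratio.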

#### Condition Optimization

As depicted in **Table 3**, various glycosylation reaction conditions were evaluated for further optimization. Solvent screening (**Table 3**, entries 1–4) showed that the efficiency of the glycosylation reaction was greatly boosted in DMSO, whereas the aqueous NMP/PB solution tended to decompose seleno-phenyl ester **2d**. Varying the amount of DIPEA from 0.1 to 3.0 equivalents did not significantly influence the yields (**Table 3**, entries 4–7). Additionally, product **3** was obtained in optimal yield when seleno-phenyl ester **2d** was treated with 2.0 equivalents of glycosylamine **1a** (**Table 3**, entries 4, 8–9). To maximize glycosylation and minimize hydrolysis, we selected the following optimum conditions: 2.0 equivalents each of DIPEA and glycosylamine **1a** and 1.0 equivalent of seleno-phenyl ester **2d**, dissolved in DMSO.

#### TABLE 2 | Kinetic studies for glycosidic bond formation<sup>a</sup>.

<sup>a</sup>Reaction conditions: 1a (10 µmol), esters (5 µmol) and DIPEA (10 µmol) in 1 mL of DMSO, rt.

#### Substrate Scope

To explore the general applicability of selenoester-assisted glycosylation, we attached seleno-phenyl esters to a series of peptides to assemble peptidyl ω-Asp selenoester substrates, and examined substrates incorporating free C-terminal carboxylic groups together with unprotected glycosylamines. A series of partially protected peptides bearing selenoesters at the ω-aspartyl terminus (including pseudoproline dipeptides that suppress aspartimide formation) were successfully prepared for evaluation (Ullmann et al., 2012; Wang et al., 2012). These peptide substrates were prepared via stepwise solid-phase peptide synthesis (SPPS); the general synthetic procedures for **4b-12b** are outlined in **Figure 2** (more details are given in the **Supporting Information**). Installation of the phenyl selenoester group at the ω-aspartyl terminus is straightforward on the resin: first, the allyl esters were removed; subsequently, the ω-aspartyl carboxyl groups were converted to selenoesters (**4a-12a**); finally, the peptidyl selenoesters were cleaved from the resin. The ω-aspartyl selenoester peptide substrates (**4b-12b**) were isolated by reverse-phase HPLC purification in 58–83% yields. In addition, the glycosylamines used in this study (**Figure 3**) are monosaccharide **1a**, chitobiose **1b**, and undecasaccharide **1c** (extracted from fresh egg yolks) (Seko et al., 1997; Sun et al., 2014).

With peptidyl selenoesters and glycosylamines in hand, the glycosylation reactions at the site of the natural ω-asparagine linkage were evaluated. On the one hand, the coupling of monosaccharide **1a** with peptides **4b-6b** gave glycosylated peptides **4c-6c** in 69–83% isolated yields (**Table 4**, entries 1–3), proving the feasibility of combining unprotected glycosylamines with peptidyl selenoesters bearing free C-terminal carboxylic groups in glycosylation reactions. To our delight, peptide **7b**, which contains two ω-asparagine selenoesters, still gave product **7c**, a doubly glycosylated fragment derived from the multiply glycosylated protein erythropoietin (EPO; fragment 22–43) (Park et al., 2009; Wang et al., 2013; Wilson et al., 2013), in an isolated yield of 80% (**Table 4**, entry 4). On the other hand, this strategy also afforded good results for glycosylation with disaccharides. As shown in entries 5–7, coupling of chitobiose **1b** with peptidyl selenoesters **5b**, **8b**, and **9b** formed the glycosidic bond at the ω-asparagine residue in excellent yields.

#### TABLE 3 | Reaction optimization and control experiments<sup>a</sup>.

<sup>a</sup>Reaction conditions: 1a (5-15 µmol), ester 2d (5 µmol) and DIPEA (10 µmol) in 1 mL of solvent, rt. <sup>b</sup>Determined by HPLC at 2 h. PB = phosphate buffer (pH 7.4, 0.2 M).

For this methodology, it is noteworthy that the desired N-linked glycopeptides are synthesized rapidly simply by mixing the two substrates, without a condensation reagent, and that the workup procedure is simple. The free carboxylic groups of ω-aspartyl peptide segments were readily converted into peptidyl selenoesters for further condensation with various glycosylamines. Additionally, each amino acid protecting group in the glycopeptide can be easily removed under acidic conditions.

#### Syntheses of N-Linked Glycopeptides With Complex-Type Oligosaccharide

As shown in **Table 5**, the selenoester-mediated glycopeptide synthesis protocol was extended to complex-type oligosaccharide amines. Given the structural complexity of the precious undecasaccharide amine **1c**, an excess of peptidyl selenoester (1.5:1) was used, and the final undecasaccharide-modified peptides (**10e**, **11e**, **12e**) were obtained in good yields of 59–65% (**Table 5**, entries 1–3). Notably, product **12e** corresponds to a truncated segment of the glycoprotein human interleukin-5 (IL-5, an eosinophil chemotactic factor, fragment 26–43) (Coffman et al., 1989; Liu and Dong, 2018).

## CONCLUSION

In this work, we have developed a convergent and facile synthetic methodology for constructing homogeneous N-linked glycopeptides from peptides bearing an ω-Asp phenyl selenoester. The use of peptidyl selenoesters offers simple operation and affords excellent yields of N-linked glycopeptides, such as truncated segments derived from the glycoproteins EPO and IL-5. This selenoester-mediated glycosylation provides several advantages: the reactivity of the peptide ester is improved; the complex sialyloligosaccharide can be used in its native form without protection; and the reaction is not only compatible with free C-terminal carboxylic acid groups but also rapidly forms the glycosidic bond without additional coupling reagents or catalysts. This method will be further applied to the preparation of homogeneous N-linked glycopeptides and glycoproteins with therapeutic potential.

#### TABLE 4 | Scope of the peptidyl selenoester-based glycosylation<sup>a</sup>.

<sup>a</sup>Reaction conditions: 1a (10 µmol), selenoester peptides (5 µmol) and DIPEA (10 µmol) in 1 mL of DMSO, rt, 2 h.

#### TABLE 5 | Selenoester-mediated glycosylation<sup>a</sup>.

<sup>a</sup>Reaction conditions: 1c (3 µmol), 10b-12b (2 µmol), DIPEA (4 µmol) in 0.5 mL of DMSO, 4 Å MS, rt, 6 h.

#### DATA AVAILABILITY STATEMENT

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.

#### AUTHOR CONTRIBUTIONS

JG conceived the project. J-JD, LZ, and X-FG designed and performed the experiments. All authors discussed the results and commented on the manuscript.

#### FUNDING

The project was funded by the National Natural Science Foundation of China (Nos. 21772056 and 81670531), the National Key Research and Development Program of China (No. 2017YFA0505200), Program of Introducing Talents of Discipline to Universities of China (111 program, B17019), and the Research Fund of East China University of Technology (No. DHBK2017114).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2020.00396/full#supplementary-material


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Du, Zhang, Gao, Sun and Guo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Recent Progress in Chemo-Enzymatic Methods for the Synthesis of N-Glycans

Qiang Chao, Yi Ding, Zheng-Hui Chen, Meng-Hai Xiang, Ning Wang\* and Xiao-Dong Gao\*

*Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi, China*

Asparagine (N)-linked glycosylation is one of the most common co- and post-translational modifications of both intra- and extracellularly distributed proteins, and it directly affects their biological functions, such as protein folding, stability and intercellular traffic. Production of structurally well-defined homogeneous N-glycans contributes to comprehensive investigation of their biological roles and molecular basis. Among the various methods, the chemo-enzymatic approach serves as an alternative to chemical synthesis, providing high stereoselectivity and economic efficiency. This review summarizes some recent advances in chemo-enzymatic methods for the production of N-glycans, including the preparation of substrates and sugar donors, and the progress in glycosyltransferase characterization that has led to greater diversity in N-glycan synthesis. We discuss the bottlenecks and new opportunities in exploiting the chemo-enzymatic synthesis of N-glycans based on our research experience. In addition, downstream applications of the constructed N-glycans, such as automation devices and homogeneous glycoprotein synthesis, are also described.

Keywords: N-glycosylation, chemo-enzymatic synthesis, glycosyltransferases, glycosidase, glycosynthase, homogeneous glycoprotein, biomarker

## INTRODUCTION

In living cells, oligosaccharides usually attach to other macromolecules, forming glycoconjugates to fulfill their biological functions (Hanson et al., 2004). As representative glycoconjugates, glycoproteins are believed to constitute 50% of human proteins, a figure that is still considered an underestimate (An et al., 2009). N-Linked glycosylation is one of the most abundant and complicated posttranslational modifications of proteins, and it affects various biological processes such as lectin (calnexin/calreticulin)-mediated protein folding in the endoplasmic reticulum (ER) quality control system and the ER-associated degradation pathways (Helenius and Aebi, 2001; Roth and Zuber, 2017). N-Glycans also play important roles in signal transduction, embryogenesis, neural development, immune regulation, and cell proliferation (Moremen et al., 2012), and they are associated with pathogen recognition, immune responses, autoimmune diseases, cancer cell proliferation, and metastasis in pathological conditions (Ohtsubo and Marth, 2006; Lauc et al., 2016). Therefore, obtaining diverse glycan structures is essential for studying their biological roles and is further applicable to glycoprotein synthesis (Pilobello and Mahal, 2007; Wang and Lomino, 2012; Hofmann and Pagel, 2017; Hyun et al., 2017).

#### Edited by:

*Zhongping Tan, Chinese Academy of Medical Sciences and Peking Union Medical College, China*

#### Reviewed by:

*Marthe T. C. Walvoort, University of Groningen, Netherlands; Wen Yi, Zhejiang University, China*

#### \*Correspondence:

*Ning Wang wangning@jiangnan.edu.cn Xiao-Dong Gao xdgao@jiangnan.edu.cn*

#### Specialty section:

*This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry*

Received: *20 January 2020* Accepted: *18 May 2020* Published: *16 June 2020*

#### Citation:

*Chao Q, Ding Y, Chen Z-H, Xiang M-H, Wang N and Gao X-D (2020) Recent Progress in Chemo-Enzymatic Methods for the Synthesis of N-Glycans. Front. Chem. 8:513. doi: 10.3389/fchem.2020.00513*

The structures of oligosaccharides are far more diverse and complex than those of nucleic acids and proteins, due to the variety of monosaccharide residues and linkages of the glycosidic bonds (Bertozzi and Rabuka, 2009). The assembly and digestion of N-glycans are catalyzed by a series of glycosyltransferases (GTs) and glycosidases (GHs). In mammalian cells, more than 30 enzymes are involved in the trimming of N-glycans in the Golgi apparatus (Stanley et al., 2009), each of which may influence the glycan structures, leading to the heterogeneity of N-glycans. For example, the mouse zona pellucida glycoprotein bears up to 58 N-glycan structures at one N-glycosylation site (Stanley et al., 2009). In other words, naturally occurring glycans are typically mixtures of extremely similar structures that are difficult to separate, making the preparation of structurally well-defined oligosaccharides from natural sources impractical. This limitation has led to a lack of in-depth studies on the molecular basis of how glycans regulate biological and disease processes (Hart and Copeland, 2010; Kiessling and Splain, 2010; Cummings and Pierce, 2014).

In recent decades, various synthetic methods, including one-pot synthesis, solid-phase synthesis, cascade multienzymatic synthesis, and chemo-enzymatic synthesis, have been well investigated for preparing structurally defined N-glycans (Bartolozzi and Seeberger, 2001; Yu et al., 2005a; Muthana et al., 2009; Kajiwara, 2010; Bouhall and Sucheck, 2014; Yu and Chen, 2016; Kinnaert et al., 2017). However, it is difficult to achieve a general synthetic method for N-glycans due to their complicated structures and inherent chemical properties (Boltje et al., 2009). For instance, chemical synthesis, which requires careful design of the synthetic route and protecting groups, serves as the most reliable method to prepare structurally well-defined oligosaccharides. Nevertheless, it is very time-consuming and sometimes risky, especially for highly complicated N-glycan structures. Enzymatic glycosylation is another efficient way to synthesize N-glycans; it does not require protecting groups and proceeds under mild conditions with high specificity. However, this method remains challenging because suitable substrates are not always available. The development of chemo-enzymatic synthesis, which requires synthesized precursors and a series of corresponding GTs, has provided a novel approach to producing diverse N-glycans. Compared with purely chemical or enzymatic approaches, chemo-enzymatic methods show both high stereoselectivity and economic efficiency (Palcic, 2011; Schmaltz et al., 2011). In addition, automated solid-phase synthesis, in which the enzymatic or chemical reactions occur automatically on a solid-phase carrier such as beads, is expected to be broadly applicable thanks to its simplified purification steps.
This review summarizes some recent advances in the chemo-enzymatic synthesis of N-glycans and discusses the applications and new opportunities in exploiting this method, which should be essential to understanding the roles of N-glycans in glycobiology.

#### ENZYMES AND DONORS USED IN THE SYNTHESIS OF N-GLYCANS

N-Glycans widely exist in eukaryotic cells, and their common monosaccharide building blocks include N-acetylglucosamine (GlcNAc), mannose (Man), glucose (Glc), galactose (Gal), fucose (Fuc), and sialic acid (Neu5Ac, Neu5Gc). The structures and universal symbols of these monosaccharides are shown in **Figure 1**. There are also some unique monosaccharides in the N-glycosylation pathways of bacteria, archaea, fungi and plants (Deshpande et al., 2008; Nothaft and Szymanski, 2010; Eichler, 2013; Jarrell et al., 2014; Strasser, 2016). For example, N-acetylgalactosamine (GalNAc), glucuronic acid (GlcA), di-N-acetyl-d-bacillosamine (BacNAc2) and heptose (Hep) can be found in bacteria and archaea (Nothaft and Szymanski, 2010; Eichler, 2013; Jarrell et al., 2014). In plants, some N-glycans can be modified with xylose (Xyl) during their maturation in the Golgi (Strasser, 2016). Highly specialized galactofuranose residues can be found in the N-glycan structures of filamentous fungi (Deshpande et al., 2008). N-Glycans with these unique monosaccharide residues will not be discussed in this paper.

It is well known that eukaryotic N-glycans can be structurally divided into high-mannose, hybrid and complex types (**Figure 2**). In the biosynthesis pathway, each type of N-glycan is assembled and trimmed stepwise by enzyme-catalyzed reactions involving GTs and GHs (Hanson et al., 2004; Schmaltz et al., 2011; Yu and Chen, 2016). The number of available glycan-modifying enzymes has recently been growing rapidly. Nearly 660,000 GTs (classified into 110 families) and 770,000 GHs (classified into 167 families) from all kingdoms of life are listed in the Carbohydrate-Active Enzymes (CAZy) database (http://www.cazy.org), indicating the potential to use some of these enzymes for the chemo-enzymatic synthesis of N-glycans. In contrast, the donors for N-glycosylation, including sugar nucleotides and dolichol phosphate sugars, are sometimes difficult to prepare or commercially expensive, which is another critical concern in obtaining N-glycans.

### N-Glycosylation Enzymes

#### Glycosyltransferases (GTs)

GTs are enzymes that promote glycosidic bond formation by catalyzing the transfer of a saccharide from its activated form (typically a nucleotide sugar) to an acceptor. The acceptor must contain a nucleophile, such as a carbohydrate hydroxyl group or a nucleophilic group in a macromolecule (Breton et al., 2006; Lairson et al., 2008; Wagner and Pesnot, 2010). In recent decades, the number of characterized GTs has been increasing, which allows a wide range of glycosidic linkages to be installed for the preparation of specific N-glycan epitopes (Muthana et al., 2009; Schmaltz et al., 2011; Moremen et al., 2018).

**Abbreviations:** ER, endoplasmic reticulum; GT, glycosyltransferase; GH, glycosidase; CAZy, The Carbohydrate-Active Enzymes database; DLO, dolichol-linked oligosaccharide; LacNAc, N-acetyllactosamine; SGP, sialoglycopeptide; ENGase, endo-β-N-acetylglucosaminidase; ManNAc, N-acetyl-D-mannosamine; LLO, lipid-linked oligosaccharide; TMD, transmembrane domain; CSEE, core synthesis/enzymatic extension; SPPS, solid-phase peptide synthesis; SPE, solid phase extraction; DEAE, diethylaminoethyl; AGA, automated glycan assembly; GBP, glycan binding protein; ACG, aluminum-oxide-coated glass slide; Rtx, Rituximab; ADCC, antibody dependent cell mediated cytotoxicity.

Currently, researchers prepare recombinant GTs in either prokaryotic or eukaryotic expression systems. Taking advantage of the high expression level, many GTs have been prepared in prokaryotic expression systems such as Escherichia coli (E. coli), either as soluble forms, which may be full-length proteins or truncated catalytic domains (Seto et al., 1995; Rao et al., 2009), or as precipitates refolded in vitro (Ramakrishnan and Qasba, 2001). However, prokaryotic expression systems often produce nonfunctional protein aggregates (Paulson and Colley, 1989) when used to prepare eukaryotic proteins, because they lack the protein modification machinery and chaperones needed for proper folding. Therefore, many GTs are expressed in eukaryotic expression systems, including mammalian, insect and yeast cells (Ramirez et al., 2017; Moremen et al., 2018; Gao et al., 2019). It is worth mentioning that eukaryotic expression constructs of all human glycosylation enzymes were recently generated (Moremen et al., 2018). In this strategy, a modular approach was used to create a library of expression vectors, which were then introduced into mammalian or insect host cells for protein expression. By removing the transmembrane domains at the N-terminus or C-terminus, the active forms of recombinant human GTs can be prepared at high expression levels. This work greatly expands the use of GTs for the synthesis of N-glycans (Prudden et al., 2017). As a result of these efforts, the number of commercially available GTs is increasing, which allows researchers to use enzymatic methods as an alternative way to modify glycans.

In N-glycan enzymatic synthesis, six classes of GTs are commonly used, namely, N-acetylglucosaminyltransferases (GlcNAcTs), mannosyltransferases (ManTs), glucosyltransferases (GlcTs), galactosyltransferases (GalTs), fucosyltransferases (FucTs) and sialyltransferases (SiaTs) (**Figure 2**). In cells, assembly of the N-glycans is initiated by the biosynthesis of dolichol-linked oligosaccharide (DLO) in the ER. Before the transfer of oligosaccharides to nascent polypeptides, the glycan chain in the DLO is elongated sequentially by GlcNAcTs, ManTs and GlcTs (namely, Alg proteins), up to Glc3Man9GlcNAc2-PP-Dolichol, which contains 14 monosaccharide residues (**Figure 3A**). In vitro, since Flitsch and coworkers synthesized Man1GlcNAc2 using recombinant β-1,4-ManT Alg1 in 1997 (Watt et al., 1997), the preparation of a series of DLO precursors, such as the core pentasaccharide, using the corresponding ManTs has been reported (Ramirez et al., 2017; Boilevin and Reymond, 2018).
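The residue count quoted above follows directly from the shorthand composition. As a minimal sketch, the Python snippet below parses a composition string such as Glc3Man9GlcNAc2 and sums the residues. The function names `parse_composition` and `residue_count` are hypothetical helpers introduced only for illustration, and the parser handles only the monosaccharide abbreviations listed in this review.

```python
import re

# Monosaccharide abbreviations used in this review. Longer names come first so
# that "GlcNAc" is not read as "Glc" followed by unparseable text.
RESIDUES = ["GlcNAc", "GalNAc", "Neu5Ac", "Neu5Gc", "Man", "Glc", "Gal", "Fuc"]
TOKEN = re.compile(r"(%s)(\d*)" % "|".join(RESIDUES))


def parse_composition(formula: str) -> dict:
    """Return {residue: count} for a shorthand such as 'Glc3Man9GlcNAc2'."""
    counts = {}
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"unrecognized token at position {pos} in {formula!r}")
        name, number = match.group(1), match.group(2)
        # A residue written without a number counts once.
        counts[name] = counts.get(name, 0) + (int(number) if number else 1)
        pos = match.end()
    return counts


def residue_count(formula: str) -> int:
    """Total number of monosaccharide residues in the composition."""
    return sum(parse_composition(formula).values())


if __name__ == "__main__":
    for glycan in ("Man1GlcNAc2", "Man3GlcNAc2", "Glc3Man9GlcNAc2"):
        print(glycan, parse_composition(glycan), residue_count(glycan))
```

A composition string carries no linkage information, so this kind of bookkeeping cannot distinguish isomers; it is useful only for quick sanity checks on residue numbers.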

In the biosynthesis pathway, the diversity of N-glycan structures starts to appear in the Golgi apparatus, where several GTs (i.e., GlcNAcTs, GalTs, SiaTs and FucTs) are involved in N-glycan processing. GlcNAcTs, also called GnTs, increase the complexity of oligosaccharide structures by modifying the core pentasaccharide Man3GlcNAc2 (Kizuka and Taniguchi, 2016), resulting in N-glycans with different numbers and linkages of branching GlcNAc moieties. These GlcNAc moieties can be converted to N-acetyllactosamines (LacNAc) by GalTs, leading to further structural diversity (Cummings, 2009). In the laboratory, commercially available or expressed recombinant GnTs (e.g., GnT I, GnT II, and GnT IV) can be applied to the preparation of hybrid and complex type N-glycan libraries, including asymmetric multi-antennary complex type N-glycans, by structurally remodeling microbial oligosaccharides. Such libraries can be used to generate N-glycan microarrays for high-throughput screening of glycan-binding proteins (Hamilton et al., 2017).

In many glycoconjugates in living cells, sialic acid serves as the terminal residue of glycans. Forming a specific sialyl glycosidic bond is quite difficult in chemical oligosaccharide synthesis because of the unique structure and properties of sialic acid (Schwardt et al., 2006). Therefore, SiaTs that transfer Neu5Ac groups from the donor CMP-Neu5Ac are particularly important for the chemo-enzymatic synthesis of N-glycans and glycoproteins. In 2017, Boons and coworkers prepared mono- and disialylated N-glycan derivatives using ST3Gal-IV, a mammalian α-2,3-sialyltransferase, which recognized the LacNAc antenna structure as the sole substrate (Gagarinov et al., 2017).

In addition, fucose is also often found at the glycan terminus of many naturally existing glycoconjugates, such as the ABO and Lewis blood group epitope glycans. These fucoses are transferred from GDP-Fuc to the core or branch termini of N-glycans by FucTs, which are a series of unique GTs with high structural tolerance to donors and acceptors (Bastida et al., 2001; Khaled et al., 2004, 2008; Nguyen et al., 2007; Li et al., 2008; Woodward et al., 2010). This property enhances the utility and flexibility of FucTs in N-glycan synthesis in vitro. For example, after human FUT8 (an α-1,6 FucT) was overexpressed in a baculovirus system and purified, 77 structurally defined N-glycans were applied to the substrate specificity assay, which accordingly facilitated the development of an efficient chemo-enzymatic strategy to synthesize core-fucosylated asymmetric N-glycans (Calderon et al., 2016).

#### Glycosidases (GHs) and Glycosynthases

During its maturation process in vivo, the N-glycan is modified by first removing certain sugar residues (trimming), followed by re-glycosylation by appropriate GTs to give its mature form. Glycosidases, also called glycoside hydrolases, are enzymes that hydrolyze glycosidic bonds in glycans to saccharide units (Bourne and Henrissat, 2001). GHs are classified into endo-glycosidase and exo-glycosidase according to their cleavage sites in the oligosaccharide chain. Namely, an exo-glycosidase hydrolyzes the glycosidic bond at the non-reducing end of a glycan chain, whereas an endo-glycosidase hydrolyzes the internal glycosidic bond. Typical GHs for mammalian N-glycan trimming and their cleavage sites are summarized in **Figure 2**.
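The exo/endo distinction described above can be made concrete on a small glycan tree. The sketch below is an illustration only: the residue labels and the `classify_bonds` helper are hypothetical, and the example encodes the core pentasaccharide Man3GlcNAc2 as a child-to-parent map pointing toward the reducing end. A bond is labeled "exo" when cleaving it releases a single terminal residue from the non-reducing end, and "endo" when it lies inside the chain.

```python
def classify_bonds(parent_of: dict) -> dict:
    """Label each glycosidic bond by the class of GH that would cleave it.

    parent_of maps each residue label to the residue it is attached to
    (toward the reducing end); the reducing-end residue maps to None.
    """
    # Residues that carry at least one further residue are not terminal.
    has_children = {parent for parent in parent_of.values() if parent is not None}
    labels = {}
    for child, parent in parent_of.items():
        if parent is None:
            continue  # the reducing-end residue has no bond toward a parent
        labels[(child, parent)] = "endo" if child in has_children else "exo"
    return labels


# Core pentasaccharide Man3GlcNAc2; the labels are illustrative names only.
CORE_PENTASACCHARIDE = {
    "GlcNAc-1": None,        # reducing-end GlcNAc
    "GlcNAc-2": "GlcNAc-1",
    "Man-b": "GlcNAc-2",     # beta-linked mannose
    "Man-a3": "Man-b",       # alpha-1,3 arm
    "Man-a6": "Man-b",       # alpha-1,6 arm
}

if __name__ == "__main__":
    for bond, kind in classify_bonds(CORE_PENTASACCHARIDE).items():
        print(bond, kind)
```

In this toy model the GlcNAc-GlcNAc bond is internal, which matches the behavior of an endo-glycosidase that leaves a single proximal GlcNAc, whereas the two arm mannoses sit at non-reducing termini and are exo-glycosidase targets. Real enzymes additionally require a specific residue type and linkage, which this sketch ignores.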

In the laboratory, trimming N-glycans into the desired structures is a valuable method in chemo-enzymatic glycan synthesis. Exo-glycosidases are efficient tools for this purpose because they remove terminal residues selectively. For instance, the Fmoc-labeled high-mannose type N-glycan Man9GlcNAc2-Asn-Fmoc, which can be obtained from the Fmoc-modified fractional precipitate of soybean flour, can be digested by an α-1,2-mannosidase to produce several high-mannose type N-glycan intermediates. Since Man9GlcNAc2-Asn-Fmoc contains four terminal Man-α-1,2-Man linkages, it can generate Man5−8GlcNAc2-Asn-Fmoc upon treatment with an α-1,2-mannosidase from the human gut bacterial symbiont Bacteroides thetaiotaomicron under controlled conditions (Toonstra et al., 2018). On the other hand, the sialylated bi-antennary complex type N-glycan (SCT), which is available on a large scale from sialoglycopeptide (SGP) isolated from chicken egg yolk, can be further trimmed by sequentially adding sialidase, galactosidase, and N-acetylglucosaminidase to give various N-glycan structures. At present, several exo-glycosidases are routinely used to digest the corresponding glycosidic bonds, most of which are commercially available or can be easily prepared in prokaryotic systems such as E. coli (Schmaltz et al., 2011).
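To illustrate why controlled α-1,2-mannosidase digestion of Man9GlcNAc2 yields a family of products, the sketch below enumerates the possible trimming states. It assumes the commonly drawn Man9 structure, in which one arm carries two sequential α-1,2-linked mannoses and two arms carry one each. The arm names D1 to D3 and the function `trimmed_glycoforms` are labels introduced here, not taken from the review, and the counts describe the combinatorial model rather than measured product distributions.

```python
from collections import Counter
from itertools import product

# Number of alpha-1,2-linked mannoses on each arm of Man9GlcNAc2.
# Arm names D1-D3 are an assumption; the review does not label the arms.
ALPHA12_PER_ARM = {"D1": 2, "D2": 1, "D3": 1}
CORE_MAN = 5  # mannoses left once every alpha-1,2 mannose is removed (Man5)


def trimmed_glycoforms():
    """Yield (total_mannose, remaining_alpha12_per_arm) for every trimming state."""
    arms = list(ALPHA12_PER_ARM)
    # On each arm an exo-mannosidase can only remove residues from the tip
    # inward, so a state is fully described by how many residues remain.
    ranges = [range(ALPHA12_PER_ARM[arm] + 1) for arm in arms]
    for remaining in product(*ranges):
        yield CORE_MAN + sum(remaining), dict(zip(arms, remaining))


# Number of positional isomers for each total mannose count in this model.
isomers_by_size = Counter(total for total, _ in trimmed_glycoforms())

if __name__ == "__main__":
    for size in sorted(isomers_by_size, reverse=True):
        print(f"Man{size}GlcNAc2: {isomers_by_size[size]} isomer(s)")
```

The model shows that intermediate sizes correspond to several positional isomers, which is one reason digestion conditions have to be controlled when a single intermediate is wanted.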

Endo-glycosidases also show significant capacity in chemo-enzymatic methods to prepare N-glycans. Golgi endo-α-1,2-mannosidase, which can cleave the glucose-substituted mannose from immature glucosylated high-mannose type N-glycans (**Figure 4A**), is useful in chemo-enzymatic synthesis, such as the establishment of a high-mannose glycan library from a non-natural tetradecasaccharide precursor (Koizumi et al., 2013). Another widely used endo-glycosidase is endo-β-N-acetylglucosaminidase (ENGase), which hydrolyzes the N-glycan structure of the glycoprotein and leaves a single proximal GlcNAc residue (**Figure 4B**). ENGases from different species show substrate specificity toward N-glycan structures (Li and Wang, 2018). Endo D is specific for paucimannose (Man1−3GlcNAc2Asn); Endo A and Endo H specifically recognize the high-mannose type N-glycans; Endo F recognizes N-glycan structures ranging from the high-mannose to the bi-antennary complex type; and Endo M cleaves most N-glycan structures including the high-mannose, complex and hybrid types. Endos D, H and F are now commercially available (Schmaltz et al., 2011). Some ENGases show higher specificity, such as Endo S, which cleaves only the bi-antennary complex type N-glycan in the Fc domain of human IgG (Albert et al., 2008; Allhorn et al., 2010). In contrast, Endo S2 can cleave almost all kinds of N-linked glycans in IgG (Sjogren et al., 2013).
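The ENGase specificities listed above can be collected into a small lookup table for planning purposes. The sketch below is a coarse encoding of the statements in the preceding paragraph only. The four glycan classes, the `igg_only` flag and the helper `candidate_engases` are simplifications introduced here, and the table should not be read as a complete specificity profile of any enzyme.

```python
# Coarse encoding of the specificities stated in the text. Hybrid glycans are
# not listed for Endo F because the text does not mention them explicitly.
ENGASES = {
    "Endo D": {"types": {"paucimannose"}, "igg_only": False},
    "Endo A": {"types": {"high-mannose"}, "igg_only": False},
    "Endo H": {"types": {"high-mannose"}, "igg_only": False},
    "Endo F": {"types": {"high-mannose", "bi-antennary complex"}, "igg_only": False},
    "Endo M": {"types": {"high-mannose", "hybrid", "bi-antennary complex"},
               "igg_only": False},
    # Endo S and Endo S2 are described only in the context of IgG glycans.
    "Endo S": {"types": {"bi-antennary complex"}, "igg_only": True},
    "Endo S2": {"types": {"high-mannose", "hybrid", "bi-antennary complex"},
                "igg_only": True},
}


def candidate_engases(glycan_type: str, substrate_is_igg: bool = False) -> list:
    """Return the enzymes whose stated specificity covers glycan_type."""
    return sorted(
        name
        for name, entry in ENGASES.items()
        if glycan_type in entry["types"]
        and (substrate_is_igg or not entry["igg_only"])
    )


if __name__ == "__main__":
    print(candidate_engases("high-mannose"))
    print(candidate_engases("bi-antennary complex", substrate_is_igg=True))
```

A table like this only narrows the choice of enzyme; actual activity on a given glycoprotein still has to be checked experimentally.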

GHs are also suitable for the assembly of N-glycans, since they can sometimes catalyze the reverse reaction, i.e., glycosidic bond formation. For example, by using a substrate with a methylene linker between the glycan and the peptide, Endo A could synthesize a high-mannose type C-linked glycopeptide in 26% yield (Wang et al., 1997). In another case, Endo M was employed to synthesize glycopeptides containing high-mannose type N-glycans in 8.5% yield (Haneda et al., 1998). However, undesired side reactions, such as self-condensation, regio-condensation and product hydrolysis, usually occur and generate byproducts during N-glycan synthesis (Hamilton, 2004; Hancock et al., 2006; Faijes and Planas, 2007). To overcome this limitation, the GH active site can be engineered by mutagenesis to give a glycosynthase, which is more practical because its hydrolytic activity is blocked while its capacity for glycan transfer is retained (Kittl and Withers, 2010). One successful case is ENGase, whose mutants have become a widely used tool to synthesize homogeneous glycoproteins, including therapeutic monoclonal antibodies (mAbs), in the past decade (Fairbanks, 2017). The typical procedure combines a GH and a glycosynthase; i.e., heterogeneous glycans are first removed by GHs, and structure-defined homogeneous N-glycans are then installed back onto the glycoprotein by the glycosynthase (Li et al., 2017a). In addition, some other GHs have also been engineered to generate useful glycosynthases for the construction of N-glycans and glycopeptides (Perugino et al., 2004; Umekawa et al., 2008; Wang, 2008; Huang et al., 2009). For example, a highly efficient fucosynthase was generated by mutagenesis (N423H) of the 1,2-α-L-fucosidase from Bifidobacterium bifidum, which showed the ability to add fucose moieties to both N- and O-linked glycans on asialofetuin (Sugiyama et al., 2017). A Golgi endo-α-mannosidase was mutated (E407D) to generate a glycosynthase that was able to mediate transglycosylation from Glc-α1,3-Man-α-fluoride to the acceptor Man8GlcNAc2-BODIPY, resulting in the high-mannose type dodecasaccharide Glc1Man9GlcNAc2 (Iwamoto et al., 2017).

#### Sugar Donors in N-Glycosylation

For glycosylation reactions, another key issue besides the GTs is the sugar donor. Most GTs use sugar nucleotides as donors, among which UDP-Glc, UDP-GlcNAc, UDP-Gal, GDP-Man, GDP-Fuc, and CMP-Sia are commonly found in the N-glycan biosynthesis pathway. Some GTs use dolichol phosphate sugars as their glycosylation donors, such as a few Alg ManTs in the ER lumen that use dolichol phosphate mannose (Man-P-Dol) as the activated sugar donor to elongate the glycan chain (**Figure 3A**; Maeda and Kinoshita, 2008). Although nucleotide sugar substrates are often commercially available, their high price has driven many groups to explore large-scale preparation methods in the past few years (Tanaka et al., 2012). Some sugar nucleotides are highly unstable, such as CMP-Sia, which is prone to hydrolysis due to the additional electron-withdrawing effect of the carboxyl group (Gilormini et al., 2016). To solve this problem, multienzyme-catalyzed one-pot reactions and in situ sugar nucleotide regeneration systems have been developed for in vitro experiments (Tsai et al., 2013; Yu and Chen, 2016; Yu et al., 2016, 2017; Liu et al., 2019). For example, CMP-Neu5Ac, the widely used donor of SiaTs, can be synthesized in one pot from cytidine triphosphate (CTP) and N-acetyl-D-mannosamine (ManNAc) or Neu5Ac analogs by using sialic acid aldolase and CMP-sialic acid synthetase (Yu et al., 2005b, 2006, 2009). In addition, the polyprenol sugar phosphates that serve as donors for some GTs (Man-P-dolichol for Alg3, Alg9, and Alg12; Glc-P-dolichol for Alg6, Alg8, and Alg10) (**Figure 3A**) are difficult to prepare because of their insolubility and the difficulty in obtaining dolichol. Therefore, some studies have used lipid sugar phosphates whose lipid tails are dolichol analogs as mimic donors. For example, phytanyl phosphate mannose (Man-P-Phy), which was chemo-enzymatically synthesized as an alternative to Man-P-dolichol (Wilson et al., 1995), was applied to the extension of Man5GlcNAc2 to Man9GlcNAc2 by recombinant Alg3, Alg9, and Alg12 (**Figure 3B**; Li et al., 2019a).
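The one-pot route to CMP-Neu5Ac mentioned above can be pictured as a two-step cascade in which the product of the aldolase feeds the synthetase. The sketch below is an idealized stoichiometric bookkeeping model, not a kinetic simulation. Pyruvate as the aldolase co-substrate and pyrophosphate (PPi) as the synthetase by-product are standard biochemistry that the review does not spell out, and the function `run_one_pot` as well as all amounts are hypothetical. Each reaction is treated as irreversible and complete, which real aldolase reactions are not.

```python
# (enzyme, substrates consumed, products formed) per turnover.
# Pyruvate and PPi are added from general biochemistry, not from the review.
REACTIONS = [
    ("sialic acid aldolase", {"ManNAc": 1, "pyruvate": 1}, {"Neu5Ac": 1}),
    ("CMP-sialic acid synthetase", {"Neu5Ac": 1, "CTP": 1},
     {"CMP-Neu5Ac": 1, "PPi": 1}),
]


def run_one_pot(pool: dict, reactions=REACTIONS) -> dict:
    """Run every reaction to completion until nothing more can react."""
    pool = dict(pool)  # work on a copy so the caller's dict is untouched
    progressed = True
    while progressed:
        progressed = False
        for _enzyme, substrates, products in reactions:
            # Number of turnovers allowed by the limiting substrate.
            turnovers = min(pool.get(s, 0) // n for s, n in substrates.items())
            if turnovers > 0:
                for s, n in substrates.items():
                    pool[s] -= n * turnovers
                for p, n in products.items():
                    pool[p] = pool.get(p, 0) + n * turnovers
                progressed = True
    return pool


if __name__ == "__main__":
    # Arbitrary amounts (e.g., micromoles) chosen only for illustration.
    print(run_one_pot({"ManNAc": 10, "pyruvate": 50, "CTP": 12}))
```

In this model ManNAc is limiting, so all of it ends up as CMP-Neu5Ac while excess pyruvate and CTP remain. Coupling a sialyltransferase step would amount to adding one more entry to `REACTIONS`.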

Furthermore, non-natural sugar donors can also be used in the synthesis of N-glycans. In 2004, a group synthesized some unnatural fucose nucleotides (UDP-Fuc, ADP-Fuc, CDP-Fuc) and evaluated their efficiency toward FucT-III (Khaled et al., 2004). The results showed that the kinetics of the conversion using these donor analogs were in the order UDP-Fuc = ADP-Fuc > CDP-Fuc at a concentration of 20 mM, providing useful information on the enzyme specificities and structure-activity relationships. As an analog of the acetamido (-NHAc) group in N-acetylhexosamine-containing substrates, the trifluoroacetamido (-NHTFA) group was found to be an excellent substitute in enzymatic reactions (Sala et al., 1998). Boons and colleagues reported an off-the-rack biomimetic method for the synthesis of multi-antennary N-glycans in fewer than ten chemical and enzymatic steps. They used the non-natural sugar donor UDP-GlcNTFA, an analog of UDP-GlcNAc, to install GlcNTFA on the N-glycan core with recombinant GnTs (GnT IV and GnT V). The GlcNTFA moieties were chemically converted into GlcN3 or GlcNH2, which cannot be extended by GalTs, so that galactosylation of these arms was blocked. At the appropriate step in the enzymatic elongation, the terminal GlcN3 and GlcNH2 residues were converted into their natural GlcNAc counterparts so that the selected branching arms could be elaborated into the target constructs (**Figure 5**; Liu et al., 2019).
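The masking logic of this strategy can be sketched as a small state model: a GalT step extends only arms ending in GlcNAc, while arms ending in GlcN3 or GlcNH2 stay untouched until they are unmasked. The arm names and the functions `galactosylate` and `unmask` below are hypothetical, and the model ignores the actual linkage positions and enzyme preferences.

```python
MASKED = {"GlcN3", "GlcNH2"}  # terminal residues that GalTs do not extend


def galactosylate(arms: dict) -> dict:
    """GalT step: add Gal only to arms whose terminal residue is GlcNAc."""
    return {
        name: (chain + ["Gal"]) if chain[-1] == "GlcNAc" else list(chain)
        for name, chain in arms.items()
    }


def unmask(arms: dict, name: str) -> dict:
    """Convert the masked terminal residue on one arm back to GlcNAc."""
    chain = list(arms[name])
    if chain[-1] not in MASKED:
        raise ValueError(f"arm {name!r} does not end in a masked residue")
    chain[-1] = "GlcNAc"
    return {**arms, name: chain}


if __name__ == "__main__":
    # Three arms listed from the core outward; the names are illustrative.
    arms = {"arm1": ["GlcNAc"], "arm2": ["GlcN3"], "arm3": ["GlcNH2"]}
    step1 = galactosylate(arms)    # only arm1 gains Gal (LacNAc formed)
    step2 = unmask(step1, "arm2")  # arm2 now ends in GlcNAc
    step3 = galactosylate(step2)   # arm2 gains Gal; arm3 is still masked
    print(step3)
```

Because each masked arm is released independently, the order of unmasking and extension steps determines which asymmetric structure is obtained.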

### DIFFERENT TYPES OF N-GLYCANS SYNTHESIZED BY CHEMO-ENZYMATIC METHODS

Chemo-enzymatic approaches to synthesize N-glycans combine the flexibility of chemical methods with the regio- and stereoselectivity of enzymatic reactions. This highly efficient strategy starts from the chemical synthesis of key N-glycan modular structures or from an N-glycan precursor obtained from a natural source, followed by steps of enzymatic extension to achieve complicated N-glycan structures.

The pentasaccharide Man3GlcNAc2, which is shared by both DLOs and N-linked glycoproteins, is found in almost all eukaryotic cells. This structure is considered the key intermediate of N-glycans in both the in vivo biosynthesis pathway and in vitro synthesis strategies. It can be elongated and elaborated by various GTs and is thus called the core pentasaccharide (boxed structure in **Figure 3A**). Moreover, Man3GlcNAc2 can be prepared by chemical synthesis or obtained by digestion of natural sources (Seeberger et al., 1996; Li et al., 2018b; Toonstra et al., 2018; Pistorio et al., 2019). Recently, the successful expression and purification of the ER ManTs Alg1 and Alg2 has allowed the reconstitution of the lipid-linked oligosaccharide (LLO) biosynthesis pathway up to the core pentasaccharide in vitro (Li et al., 2017b). Nevertheless, the applicability of this chemo-enzymatic process is still limited by the Alg2 reaction, which was only effective for LLOs with isoprenyl lipid chains longer than C20-C25 (Ramirez et al., 2017), making this step the major bottleneck in the synthesis of Man3GlcNAc2.

The core pentasaccharides can be extended to form high-mannose type N-glycans, which have been used to clarify the specificities of ER-related enzymes, such as calreticulin (Arai et al., 2005; Tatami et al., 2007), F-box protein Fbs1 (Hagihara et al., 2005), uridine 5'-diphospho-glucuronosyltransferase (Totani et al., 2005, 2009) and glucosidase-II (Totani et al., 2006). In addition to the various established chemical routes (Matsuo et al., 2003; Geng et al., 2004; Bailey and Bundle, 2014; Ramos-Soriano et al., 2017), several chemo-enzymatic synthesis methods have been developed to obtain high-mannose type N-glycans. Starting from LLO substrates with simplified lipid tails, a series of high-mannose type N-glycan precursors including Man3GlcNAc2 were produced through Alg-catalyzed elongation (Dsouza et al., 1992; Ramirez et al., 2017; Boilevin and Reymond, 2018; Li et al., 2018a; Rexer et al., 2018). More recently, the in vitro bottom-up chemo-enzymatic synthesis of the full-length high-mannose type N-glycan Man9GlcNAc2 was accomplished by using recombinant Alg proteins expressed in E. coli and lipid-linked GlcNAc2 as the substrate (Li et al., 2019a; **Figure 3B**). Most Alg proteins have one or more transmembrane domains (TMDs), which complicate their expression and purification. To solve this problem, TMD-truncated Alg1 and Alg11 were co-expressed with thioredoxin-tagged Alg2, and Alg3 and Alg9 were fused with a Mistic tag to generate Mistic-Alg3 and Mistic-Alg9. Without purification, the membrane fractions of E. coli were extracted and used for the construction of Man9GlcNAc2-PP-Phy. On the other hand, top-down chemo-enzymatic strategies are also practical for synthesizing high-mannose type N-glycans. Ito and coworkers chemically synthesized a well-designed high-mannose type N-glycan precursor whose terminal mannose moieties were selectively protected by a monosaccharide (i.e., Glc, GlcNAc and Gal) or a protecting group (i.e., isopropylidene), which were then trimmed by chemical deprotection or GH hydrolysis to give a library of high-mannose type N-glycans (Koizumi et al., 2013; Fujikawa et al., 2015; **Figure 6**). Two natural sources of N-glycans, i.e., Asn-linked Man9GlcNAc2 from soybean flour and SGP from chicken egg yolks, were also successfully treated by sequential enzymatic digestion to produce a library of N-glycans. These N-glycans could be conjugated to bovine serum albumin (BSA) to give neoglycoprotein microarrays for the comprehensive analysis of critical virus-neutralizing epitopes (Toonstra et al., 2018).
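The bottom-up route from lipid-linked GlcNAc2 to Man9GlcNAc2 can be summarized as an ordered series of mannosyltransferase steps. The sketch below tracks only the mannose count after each step. The number of mannoses assigned to each enzyme, the role of Alg11 in converting Man3 to Man5, and the two separate Alg9 steps come from the generally accepted ER pathway rather than from this review, and should be treated as background assumptions. The name `elongate` is a hypothetical helper.

```python
# (enzyme, number of mannoses added in that step). Per-step counts follow the
# generally accepted ER pathway and are assumptions beyond the review text.
ALG_STEPS = [
    ("Alg1", 1),   # beta-1,4-Man  -> Man1GlcNAc2
    ("Alg2", 2),   # two mannoses  -> Man3GlcNAc2 (core pentasaccharide)
    ("Alg11", 2),  # two mannoses  -> Man5GlcNAc2
    ("Alg3", 1),   # Man-P-lipid donor from here on
    ("Alg9", 1),
    ("Alg12", 1),
    ("Alg9", 1),   # Alg9 acts a second time to complete Man9GlcNAc2
]


def elongate(start_man: int = 0, steps=ALG_STEPS) -> list:
    """Return [(enzyme, product)] for sequential elongation of lipid-linked GlcNAc2."""
    products = []
    n_man = start_man
    for enzyme, added in steps:
        n_man += added
        products.append((enzyme, f"Man{n_man}GlcNAc2"))
    return products


if __name__ == "__main__":
    for enzyme, product in elongate():
        print(f"{enzyme:>5} -> {product}")
```

Laying the pathway out this way makes it easy to see which intermediate accumulates if one enzyme, for example Alg2 with a short lipid tail, fails to turn over its substrate.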

The structural diversity of N-glycans arises during many cellular processes, such as embryogenesis, morphogenesis, cell cycle entry and oncogenesis (Freeze, 2006; Ohtsubo and Marth, 2006). Originating from the core pentasaccharide, the variation of N-glycan structures arises from the number and pattern of LacNAc units on the branching arms and is further elaborated by sialylation and fucosylation (Spik et al., 1985). Complex type N-glycans are usually substituted at each branching point, resulting in either symmetrical or asymmetrical architectures. In recent decades, scientists have mainly focused on the synthesis of symmetrically branched complex type N-glycan structures by chemo-enzymatic strategies. For example, different symmetric human N-glycans (e.g., GlcNAc2Man3GlcNAc2) were assembled from a bacteria-derived core pentasaccharide precursor with subsequent extension by different GnTs (Hamilton et al., 2017). In particular, SGP isolated from egg yolk, which can be trimmed by sialidases and β-1,4-galactosidase to give various complex type N-glycans with symmetrical branches, has been widely used as the substrate (Huang et al., 2009; Li et al., 2016b; Wu et al., 2016).

Furthermore, several chemo-enzymatic strategies to access asymmetric N-glycans have also been established. A series of core-fucosylated asymmetric N-glycans were prepared by fucosylation with a Caenorhabditis elegans FUT8 expressed in Pichia pastoris (Serna et al., 2011). In this work, asymmetric N-glycan structures prepared by modular chemical synthesis were core-fucosylated with high efficiency in a very short time, a modification that can otherwise require extensive synthetic work. In another study, libraries of asymmetrical multi-antennary glycans were generated from precursors by selective extension. The attachment of GlcNAc, Gal, Sia and Fuc moieties was catalyzed by GnT, GalT, SiaT and FucT, respectively. To produce the precursor glycans, the core pentasaccharide Man3GlcNAc2 was modified at the potential branching positions with orthogonal protecting groups, including levulinoyl (Lev), fluorenylmethyloxycarbonate (Fmoc), allyloxycarbonate (Alloc) and 2-naphthylmethyl (Nap) groups, which could be selectively deprotected and further chemically elongated with LacNAc and GlcNAc donors. This work greatly expands the availability of asymmetrical N-glycans, which are quite difficult to prepare by chemical methods or from natural sources (Wang et al., 2013b). Similarly, to prepare core-fucosylated asymmetrical tri-antennary N-glycan isomers, an undecasaccharide was used as the precursor for enzymatic extension by GTs. The undecasaccharide precursor was synthesized from a core-fucosylated hexasaccharide bearing orthogonal protecting groups, which was sequentially chemically glycosylated with LacNAc and GlcNAc donors. The resulting fucosylated asymmetrical tri-antennary complex type N-glycans have been suggested as biomarkers of breast cancer (**Figure 7A**; Li et al., 2016a). In addition, a similar tetra-antennary asymmetric precursor, whose termini were GlcNAc, LacNAc, non-native Gal-α-1,4-GlcNAc and Man-β-1,4-GlcNAc, was described as a substrate to prepare bi-, tri-, and tetra-antennary asymmetric N-glycans through enzymatic transformations (Gagarinov et al., 2017). One application of this method was the assembly of tri-antennary N-glycans of the zona pellucida carrying sialyl Lewis X (SLe<sup>x</sup>) moieties at the C-2 and C-2' arms and a sialyl Lewis X-Lewis X (SLe<sup>x</sup>-Le<sup>x</sup>) residue at the C-6 antenna, together with two analogs. The synthesized compounds were used to analyze the glycan-dependent interactions between human sperm and oocytes, indicating that the SLe<sup>x</sup>-Le<sup>x</sup> residue is essential for the inhibitory effect, while the other SLe<sup>x</sup> moieties showed much less effect (Chinoy et al., 2018).

When the abovementioned methodology was expanded to generate N-glycan libraries, it was named core synthesis/enzymatic extension (CSEE), which typically utilizes bacterial GTs and well-designed oligosaccharide core structures (Li et al., 2015; Wu et al., 2016; Calderon et al., 2017). An efficient CSEE strategy was developed to prepare N-glycans with or without (S)LeX moieties, which were rapidly purified by HPLC using an amide column to a minimum of 98% purity at the milligram scale (Li et al., 2015). Eight N-glycan core structures with varying non-reducing-end GlcNAc residues were first chemically synthesized from five building blocks. Among them, two core structures had a peracetylated GlcNAc residue on either the α-1,6-Man or the α-1,3-Man branch. After enzymatic extension of the unprotected branch, the acetyl protecting groups were removed to allow further elongation. Finally, the eight core structures were elongated at their GlcNAc residues by a set of robust GTs to yield an N-glycan library of 73 structures in total (**Figure 7B**).

### AUTOMATIC SYNTHESIS OF N-GLYCANS

Owing to the discovery of enzymes and the development of chemo-enzymatic synthesis techniques, it is expected that most N-glycan structures can be prepared by GT and/or GH transformations. However, compared with the assembly of oligonucleotides and peptides, these approaches are still labor-intensive and time-consuming. The critical issue is probably that almost all peptides (Merrifield, 1965) and oligonucleotides (Caruthers, 1985) can be prepared by automated sequential synthesis, even by non-specialists, but glycans cannot. In most cases, automated synthesis refers to a synthetic strategy based on a solid support, in which the enzymatic or chemical reactions occur on a solid-phase carrier such as beads, allowing simplified purification and washing steps.

Since Merrifield and co-workers developed the solid-phase peptide synthesis (SPPS) method decades ago (Merrifield, 1965), much effort has been devoted to establishing convenient and efficient systems for automated glycan synthesis (Schuerch, 1971; Plante et al., 2001; Ganesh et al., 2012; Nokami et al., 2013; Pistorio et al., 2016; Hahm et al., 2017; Panza et al., 2018; Wen et al., 2018). So far, the automated synthesis of N-glycans has mainly focused on chemical strategies. For example, Seeberger and co-workers prepared the N-glycan core Man3GlcNAc2 using an automated solid-phase oligosaccharide synthesizer (Kröck et al., 2012). There are few examples of the automated chemo-enzymatic synthesis of N-glycans. This could be because the in vitro enzymatic synthesis of N-glycans commonly starts from a substrate containing the core pentasaccharide structure Man3GlcNAc2 (**Figure 3A**). To obtain the core pentasaccharide enzymatically, the two ManTs Alg1 and Alg2, which are specific for the lipid tail length of the LLO substrate, are required (Ramirez et al., 2017). To date, cost-effective large-scale enzymatic synthesis of Man3GlcNAc2 has not been reported, so it is difficult to obtain a sufficient amount of this N-glycan core and immobilize it on a solid-phase carrier for automated synthesis. An alternative is to synthesize the Man3GlcNAc2 core structure chemically and immobilize it for automated chemo-enzymatic N-glycan synthesis, although no specific report has appeared in this field.

Another possible way is to prepare the desired core structure from natural precursors, such as SGP from egg yolk, by enzymatic digestion and to use it for automated chemo-enzymatic elongation after immobilization. Based on this method, an automated platform for the synthesis of N-glycans was reported in 2019 (Li et al., 2019b; **Figure 8**). In this work, liquid-phase enzymatic reactions were first performed. The products were then purified by selective capture on a resin, followed by release under appropriate conditions. This "catch and release" of glycan products was realized by introducing a sulfonate tag, which could be easily installed and allowed the tagged glycans to be retrieved using a solid phase extraction (SPE) cartridge containing diethylaminoethyl (DEAE) resin. The products were eluted with aqueous ammonium bicarbonate, and the pH was adjusted with acetic acid to give an appropriate buffer for the next enzymatic reaction. In total, 15 reaction cycles were performed using this automated strategy, resulting in highly pure N-glycan structures (Li et al., 2019b).
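The catch-and-release cycle described above is a repeated sequence of four operations, which makes it straightforward to express as a protocol generator. The sketch below only lays out the order of operations stated in the text. The `Step` class and `build_protocol` function are hypothetical, and no volumes, times, or instrument commands from the original platform are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    cycle: int
    action: str


# One cycle, in the order given in the text.
CYCLE_ACTIONS = (
    "enzymatic reaction in solution on the sulfonate-tagged glycan",
    "capture the tagged product on a DEAE SPE cartridge",
    "elute with aqueous ammonium bicarbonate",
    "adjust pH with acetic acid for the next enzymatic reaction",
)


def build_protocol(n_cycles: int) -> list:
    """Return the ordered list of steps for n_cycles catch-and-release cycles."""
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    return [
        Step(cycle, action)
        for cycle in range(1, n_cycles + 1)
        for action in CYCLE_ACTIONS
    ]


if __name__ == "__main__":
    protocol = build_protocol(15)  # 15 cycles were reported in the text
    print(len(protocol), "steps")
    for step in protocol[:4]:
        print(step.cycle, step.action)
```

Because purification uses the same tag-based capture in every cycle, the only element that changes from cycle to cycle is the enzyme and donor added in the first step.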

In addition, various automatic devices for glycan synthesis have been developed, such as the HPLC-assisted automated synthesizer (Ganesh et al., 2012; Pistorio et al., 2016), the syringe pump-based electrochemical synthesizer (Nokami et al., 2013) and the automated glycan assembly (AGA) machine (Fair et al., 2015). These devices provide another option to chemo-enzymatically synthesize N-glycans and accomplish the purification of products.

### APPLICATIONS OF CHEMO-ENZYMATICALLY SYNTHESIZED N-GLYCANS

#### Glycan Microarray

Immobilization of the glycans on the specific locations of a slide surface, the so-called glycan microarray, allows the high-throughput screening of carbohydrate-binding molecules (Oyelaran and Gildersleeve, 2009). As the discoveries on the significance and applicability of glycans in many biological processes have been continuously reported, glycan microarrays have attracted much attention. This technology not only has enabled comprehensive analyses of the interactions between glycan moieties and glycan binding proteins (GBPs) but could also be applicable for screening the binding properties of proteins, viruses, bacteria, yeast and mammalian cells (Geissner and Seeberger, 2016).

Recently, several studies based on chemo-enzymatically synthesized N-glycan microarrays have been reported. For example, such a microarray was used to rapidly screen and identify the optimal N-glycan structures recognized by broadly neutralizing antibodies (bNAbs), whose targets are the N-glycans on the HIV surface envelope glycoprotein GP120. Various N-glycans were prepared by modular chemo-enzymatic synthesis and immobilized on an aluminum-oxide-coated glass slide (ACG). Determining the binding specificities of HIV-1 bNAbs for N-glycans using this microarray could accelerate the development of HIV-1 vaccines (Shivatare et al., 2016). In another study, a microarray of isomeric multi-antennary N-glycans varying in terminal Neu5Ac, Gal, GlcNAc and core Fuc, synthesized by a chemo-enzymatic method, was constructed (Gao et al., 2019; **Figure 9A**). Using this library of 33 N-glycans and 5 N-glycan conjugates, the specific recognition by plant lectins, human galectins, influenza viruses and Siglecs was investigated, providing new insights into the use of lectins in glycan identification (**Figure 9B**).

#### Homogeneous Glycopeptides and Glycoproteins

N-Glycosylation of proteins with specific N-glycan structures is critical for their stability and biological functions (Jefferis, 2009). On the other hand, the high heterogeneity of N-glycans in glycoproteins makes it difficult to understand their structure-function relationships in depth and slows their use in therapy and diagnosis (Lowary, 2013). Thus, tremendous progress has been made in producing structurally defined homogeneous glycoproteins (such as antibodies), by both chemical synthesis methods (Pratt and Bertozzi, 2005; Kajihara et al., 2010; Unverzagt and Kajihara, 2013; Fernandez-Tejada et al., 2015; Seeberger and Overkleeft, 2015) and chemo-enzymatic methods (Bennett and Wong, 2007; Rich and Withers, 2009; Wang and Huang, 2009; Schmaltz et al., 2011; Wang and Amin, 2014; Danby and Withers, 2016; Fairbanks, 2017). In particular, endo-glycosidases such as ENGase have been well developed and have become a useful tool to prepare homogeneous glycoproteins in the past decade (Fairbanks, 2017). In addition, glycan oxazolines, which can be utilized as efficient donors in ENGase-catalyzed glycosylation reactions, have been widely used in the chemo-enzymatic construction of various glycopeptides and glycoproteins (Zeng et al., 2006; Heidecke et al., 2009; Priyanka and Fairbanks, 2016; Yamaguchi et al., 2016). The synthetic strategy for homogeneous glycoproteins using an ENGase-derived glycosynthase and a glycan oxazoline is shown in **Figure 10A**.

Traditionally, N-glycans used for homogeneous glycoprotein preparation are chemically synthesized or isolated from natural sources, e.g., the bi-antennary complex type SGP from egg yolks and the high-mannose type Man9GlcNAc2-Asn from soybean flour. In 2009, two endo-glycosidase-based glycosynthases, EndoM-N175A and EndoA-N171A, were constructed to accomplish the assembly of homogeneous N-glycoproteins carrying natural N-glycans (Huang et al., 2009). Based on this work, the same group chemically synthesized glycan oxazolines bearing mannose-6-phosphate (M6P) moieties in different numbers and at different locations, which were transferred to the GlcNAc residue on the protein by ENGase, providing homogeneous glycoproteins with M6P-containing N-glycans (Yamaguchi et al., 2016).

Recently, chemo-enzymatic methods have become practical for preparing the N-glycan structures used in homogeneous glycoprotein synthesis. For example, in a highly convergent chemo-enzymatic strategy, a large N-glycan oxazoline precursor was chemically synthesized and subsequently ligated to GlcNAc-RNase (bovine) in a reaction catalyzed by EndoA-N171A, resulting in a glycoprotein with a selectively modified glycoform, i.e., GalGlcMan9GlcNAc2-RNase (Amin et al., 2011). After the terminal galactose was hydrolyzed by β-galactosidase in excellent yield, the resulting monoglucosylated GlcMan9GlcNAc2-RNase could serve as a specific ligand of calnexin and calreticulin. In 2018, Endo-S2 mutations were screened to give three mutants (D226Q, D182Q, and T138Q), which could transfer high-mannose, hybrid, and bi- or tri-antennary complex type N-glycans to prepare homogeneous Rituximab variants (Rtx-variants) (Shivatare et al., 2018; **Figure 10B**). These Rtx-variants were used to examine the FcγRIIIA binding affinity and further evaluate the antibody-dependent cell-mediated cytotoxicity (ADCC) activities (**Figure 10C**). This research showed that glycosynthases possess various substrate specificities, significantly expanding the application of the chemo-enzymatic approach to obtain the desired homogeneous antibodies.

The development of chemo-enzymatic methods for synthesizing various complex N-glycans and the investigation of glycosynthases have simplified the preparation of homogeneous glycoproteins. The availability of these homogeneous glycoproteins will be of great significance in investigating the effects and functions of N-glycans in glycoproteins.

#### Potential Biomarkers

Commonly, alterations in protein glycosylation, including N-glycosylation, affect biological function and thus lead to cellular disorders (Varki, 1993). Glycomics research has shown the biological significance of N-glycans in human disease, particularly in the study of tumor cells, and several disease-related N-glycans that directly indicate biological changes have been identified. These N-glycan biomarkers might be used to estimate the risk of developing a disease, serve as diagnostic tools, and monitor the progress of a disease and the effects of medication. Thus, chemo-enzymatic synthesis has been applied to prepare these potential biomarkers owing to its efficiency and versatility in vitro.

Along with the development of glycan analysis methodologies, such as high-throughput technologies for analyzing large quantities of samples, many aberrant N-glycans associated with diseases have been discovered. For instance, fucosylation and sialylation levels were found to be significantly changed in the N-linked glycoproteins of cancer patients (Peracaula et al., 2008), and the high-mannose type N-glycan (Man9GlcNAc2) was found in the serum of aggressive prostate cancer patients (Wang et al., 2013a). These aberrant N-glycans are considered biomarkers of the corresponding diseases, and some of them have been synthesized using chemo-enzymatic approaches.

For example, asymmetrical N-glycans containing sLeX, which are structures detected in serum glycoproteins of breast cancer patients, were prepared with a panel of glycosyltransferases. These synthesized N-glycans showed potential for early disease diagnosis and as specific therapeutic targets (Alley and Novotny, 2010; Li et al., 2016a). Similarly, a tetra-antennary N-glycan detected in the tissue of ductal invasive breast carcinoma patients was chemo-enzymatically synthesized from an asymmetric tetra-antennary intermediate precursor. This biomarker was considered one of the most complex N-glycan structures ever discovered and is thus difficult to prepare by chemical synthesis. Once prepared, this compound could be used as a standard to analyze the quantity and structure of the glycans on glycoproteins from biological samples using mass spectrometry, which would help to understand metabolic or disease processes and be useful for early disease diagnosis (Gagarinov et al., 2017).

### SUMMARY AND OUTLOOK

N-glycans are a family of highly diverse oligosaccharide structures that are assembled and trimmed by GTs and GHs. The chemo-enzymatic approach to producing N-glycans, which combines the advantages of chemical and enzymatic glycosylation methods, offers high specificity, mild reaction conditions and economic efficiency. Meanwhile, an increasing number of identified or commercially available N-glycosylation enzymes and large-scale preparation methods for monosaccharide donors provide strong support for the chemo-enzymatic synthesis of N-glycan structures. To date, all three types of structurally well-defined N-glycans (i.e., high-mannose, hybrid and complex types) have been generated by chemo-enzymatic strategies, starting from either chemically synthesized materials or isolated natural substrates. These N-glycans are applicable in the analysis of the interactions between GBPs and carbohydrates by microarray, the preparation of homogeneous glycoproteins and the assembly of potential N-glycan biomarkers. In addition, novel methodologies such as the automated solid-phase chemo-enzymatic synthesis of N-glycans are under investigation.

Since N-glycans play essential roles in biological pathways, exploratory studies to establish mature and convenient technologies for the chemo-enzymatic synthesis of N-glycan structures will be at the research frontier. In particular, the development of easily accessible automated synthesis devices and systems is likely to have a revolutionary impact on N-glycan preparation, thereby leading to an understanding of N-glycan functions in biological systems and illuminating N-glycan-related therapies.

#### AUTHOR CONTRIBUTIONS

QC and NW wrote the manuscript. YD contributed to the creation of most figures in this manuscript. Z-HC and M-HX contributed to the editing of this manuscript. NW and X-DG revised and edited the final manuscript. All authors contributed to reference collection, selection, and final proof.

#### FUNDING

This work was supported by the National Natural Science Foundation of China (Grant nos. 21778023, 21807048, and 31971216), the Natural Science Foundation of Jiangsu Province (BK20170174), the Open Project Program of Key Laboratory of Carbohydrate Chemistry and Biotechnology (KLCCB-KF201604), and the Topnotch Academic Programs Project of Jiangsu Higher Education Institutions.

#### ACKNOWLEDGMENTS

We are very grateful to Drs. Hideki Nakanishi, Morihisa Fujita, Zijie Li and Ganglong Yang for the discussion about this manuscript.

#### REFERENCES

J.D. Esko, P. Stanley, G.W. Hart, M. Aebi, A.G. Darvill, T. Kinoshita, N.H. Packer, J.H. Prestegard, R.L. Schnaar and P.H. Seeberger (New York, NY: Cold Spring Harbor Laboratory Press), 681–679


and non-natural alpha-2,6-linked sialosides: A P. damsela alpha-2,6 sialyltransferase with extremely flexible donor-substrate specificity. Angew Chem Int Ed Engl. 45, 3938–3944. doi: 10.1002/anie.200600572


oxazolines: probing substrate structural requirement. Chemistry 12, 3355–3364. doi: 10.1002/chem.200501196

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Chao, Ding, Chen, Xiang, Wang and Gao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Chemoenzymatic Production and Engineering of Chitooligosaccharides and N-acetyl Glucosamine for Refining Biological Activities

Manish Kumar<sup>1†</sup>, Meenakshi Rajput<sup>1†</sup>, Twinkle Soni<sup>1</sup>, Vivekanand Vivekanand<sup>2</sup> and Nidhi Pareek<sup>1</sup>\*

*<sup>1</sup> Microbial Catalysis and Process Engineering Laboratory, Department of Microbiology, School of Life Sciences, Central University of Rajasthan, Ajmer, India, <sup>2</sup> Centre for Energy and Environment, Malaviya National Institute of Technology, Jaipur, India*

#### Edited by:

*Zhongping Tan, Chinese Academy of Medical Sciences and Peking Union Medical College, China*

#### Reviewed by:

*Paripok Phitsuwan, King Mongkut's University of Technology Thonburi, Thailand; Jian Yin, Jiangnan University, China*

\*Correspondence: *Nidhi Pareek, nidhipareek@curaj.ac.in*

*†These authors have contributed equally to this work*

#### Specialty section:

*This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry*

Received: *25 January 2020* Accepted: *05 May 2020* Published: *24 June 2020*

#### Citation:

*Kumar M, Rajput M, Soni T, Vivekanand V and Pareek N (2020) Chemoenzymatic Production and Engineering of Chitooligosaccharides and N-acetyl Glucosamine for Refining Biological Activities. Front. Chem. 8:469. doi: 10.3389/fchem.2020.00469*

Chitooligosaccharides (COS) and *N*-acetyl glucosamine (GlcNAc) are currently of enormous relevance to the pharmaceutical, nutraceutical, cosmetics, food, and agriculture industries due to their wide range of biological activities, which include antimicrobial, antitumor, antioxidant, anticoagulant, wound healing, immunoregulatory, and hypocholesterolemic effects. A range of methods have been developed for the synthesis of COS with a specific degree of polymerization along with high production titres. In this respect, chemical, enzymatic, and microbial means, along with modern genetic manipulation techniques, have been extensively explored; however, no method has been able to competently produce defined COS and GlcNAc in a mono-system approach. Hence, chitin research has turned toward increased exploration of chemoenzymatic processes for COS and GlcNAc generation. Recent developments in the area of green chemicals, mainly ionic liquids, have proved vital for the synthesis of specified COS and GlcNAc with better yield and purity. Moreover, engineering of COS and GlcNAc to generate novel derivatives, viz. carboxylated, sulfated, phenolic acid conjugated, and amino derived COS, has further improved their biological activities. Consequently, chemoenzymatic synthesis and engineering of COS and GlcNAc have emerged as a useful approach to advance biomedical research based on biologically active compounds in the forthcoming era.

Keywords: chemoenzymatic, chitinase, ionic liquids, chitooligosaccharides, N-acetyl glucosamine, biological activities

#### INTRODUCTION

Modern biomedical research largely focuses on the biomaterials obtained from natural polymers (Raghavendra et al., 2015). Biopolymer-derived biomaterials show tremendous potential for use in the biomedical sector owing to their biocompatible and biodegradable nature. Among the biopolymers, chitin and chitosan (CHS) have been extensively researched for their biological activities (Kumar, 2000). Chitin is a linear polysaccharide of β(1→4) linked N-acetyl-D-glucosamine (GlcNAc) monomers, whereas CHS is a homo- or hetero-polymer of GlcNAc and D-glucosamine (GlcN) residues (Islam et al., 2017). The high MW and degree of polymerization (DP) limit the water solubility of these polymers. Water insolubility and high crystallinity limit the industrial applicability of chitin and CHS; however, their transformation into soluble and more biologically active forms, i.e., chitooligosaccharides (COS) and GlcNAc, provides a way to overcome this limitation (Qin and Zhao, 2019).

COS are the degradation products of chitin or CHS with DP ranging from 2 to 20. A wide range of biological activities, such as antimicrobial, anti-tumor, anti-oxidative, immunostimulatory, and anti-inflammatory activities, makes COS and GlcNAc molecules of interest for diverse industrial applications (Liaqat and Eltem, 2018). Despite having numerous significant applications, the production routes of COS and GlcNAc are still in their infancy in terms of bio-sustainability. Currently, chemical processes are predominantly used for the production of COS and GlcNAc; however, degradation of product quality, generation of undesirable byproducts, and environmental concerns limit their application (Varun et al., 2017). Moreover, interest has been raised in enzymatic conversion (employing chitinase and/or N-acetyl glucosaminidase) of chitin and CHS to COS and GlcNAc, which has the advantage of producing defined products with high specificity (Kumar et al., 2018a). However, the limited availability of high enzyme yielding microbial resources, along with the high cost of enzyme/product extraction and purification bioprocesses and low yields, has restricted the development of chitin bioconversion processes at a commercial scale. Therefore, the scientific community is currently focusing on the development of highly effective yet environmentally-friendly methods. In this regard, the synergistic approach of employing both chemicals and biocatalysts, i.e., the chemoenzymatic approach, seems to be more promising than other methods. Substrate pretreatment using eco-friendly chemicals, viz. ionic liquids, has shown remarkable potential to replace the traditionally-used harsh and toxic substances. Ionic liquids are being more frequently used in chitin research because of their high dissolving capacity, ease of handling, reusability, and lower toxicity (Li et al., 2019).
The application of a mild chemical treatment increases the susceptibility of chitin and CHS to subsequent enzymatic action and thus is beneficial in enhancing the quality and yield of COS and GlcNAc. Additionally, transglycosylation methods are also in use for the synthesis of COS. The enzymes and chemicals with transglycosylation ability can synthesize COS with a high DP, higher purity, and higher affinity (Alsina et al., 2019; Bhuvanachandra and Podile, 2020). The biological activities of COS depend upon DP, MW, and the pattern of acetylation (PA)/deacetylation (PD). The engineering of COS and GlcNAc to enhance their biological activities is possible owing to the developments made so far in the fields of biotechnology and chemistry. Moreover, the medicinal applications of COS and GlcNAc can be improved by altering the chemical structure, which in turn increases their biological activity (Ngo et al., 2019). Considering the recent advancements in the biomedical applications of COS, GlcNAc, and their derivatives, this review mainly focuses on bioconversion-based production approaches that enable cost-effective production in a green manner. Moreover, the review also highlights the developments made in the engineering and derivatization of COS and GlcNAc to expand their biological functions.

### COS AND GlcNAc

The commercial application of chitin can be augmented by overcoming its high crystallinity and insolubility. This can be achieved through the conversion of chitin into COS and GlcNAc. COS are homo- or heteropolymers of GlcNAc and D-glucosamine (GlcN) units in varying proportions, and are generated via chitin degradation (Aam et al., 2010). The average MW of COS is <39 kDa with <20 DP (Mourya et al., 2011). GlcNAc is an essential component of hyaluronic acid and keratan sulfate present on the cell surface (Chen et al., 2010). GlcNAc is also being explored as a renewable resource for ethanol production. COS and GlcNAc can be synthesized by employing physical, chemical, or biological methods, either alone or in a synergistic manner (Kumar et al., 2018a). The demand for COS and GlcNAc has increased tremendously due to their vast potential applications in the biomedical, food, cosmetic, and agricultural sectors. The biological activities (antimicrobial, anticancer, antioxidant, immune-stimulating activity, etc.) of COS and GlcNAc largely depend on the DP and the sequence of acetylated and deacetylated units (Halder and Mondal, 2018), along with the solubility, which decreases as DP increases (Liaqat and Eltem, 2018). The water solubility and lower viscosity of COS are associated with their chain lengths and the free amino groups in GlcN units. COS are usually insoluble in acetone, butanol, ethanol, ethyl acetate, propanol, and pyridine, whereas they are soluble in water and partially soluble in methanol and dimethyl sulfoxide (Mourya et al., 2011).
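The dependence of MW on DP and on the proportion of acetylated units can be illustrated with a short calculation. The following is a minimal sketch, not a method from the cited studies: the function name `cos_average_mw` and its parameters are hypothetical, and the residue masses are standard average molar masses of the anhydro-GlcNAc and anhydro-GlcN units. It assumes a linear, unmodified oligomer in its free-base form.

```python
# Average molar masses in g/mol (standard values, not taken from the review).
GLCNAC_RESIDUE = 203.19  # GlcNAc unit within a chain (C8H13NO5)
GLCN_RESIDUE = 161.16    # GlcN unit within a chain (C6H11NO4)
WATER = 18.02            # one water per chain accounts for the two chain ends


def cos_average_mw(dp, fraction_acetylated):
    """Estimate the average molar mass of a linear COS.

    dp: degree of polymerization (number of sugar units).
    fraction_acetylated: share of GlcNAc units, between 0.0 and 1.0.
    """
    if dp < 1:
        raise ValueError("dp must be at least 1")
    if not 0.0 <= fraction_acetylated <= 1.0:
        raise ValueError("fraction_acetylated must lie between 0 and 1")
    mean_residue = (fraction_acetylated * GLCNAC_RESIDUE
                    + (1.0 - fraction_acetylated) * GLCN_RESIDUE)
    return dp * mean_residue + WATER


if __name__ == "__main__":
    # Compare fully acetylated and fully deacetylated chains of several lengths.
    for dp in (2, 6, 20):
        print(dp,
              round(cos_average_mw(dp, 1.0), 1),
              round(cos_average_mw(dp, 0.0), 1))
```

Under these assumptions, the estimate grows linearly with DP, and deacetylation lowers the mass of each unit by about 42 g/mol.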

### BIOLOGICAL ACTIVITIES OF COS AND GlcNAc

The water solubility and low MW of COS and GlcNAc enhance their applicability in the food and agricultural sectors. Still, the most significant and extraordinary applications of COS and GlcNAc are in human healthcare. Several biomedical applications of COS have been reported, such as vectors for gene delivery, prevention of tumor growth, asthma treatment, improvement of bone strength, and fabrication of tissue scaffolds (Aam et al., 2010; Nagpure et al., 2014; Chen and Zhao, 2019) (**Table 1**). The wide range of applications of COS and GlcNAc is owed to their remarkable biological activities, viz. antimicrobial, anti-tumor, antioxidant, immunoregulatory, blood pressure control, and hypocholesterolemic effects (Liaqat and Eltem, 2018) (**Figure 1**). The biological activities of COS largely depend upon DP and the degree of acetylation (DA). The mechanism underlying the biological activities of COS is still not well-studied due to a lack of purity, quality, and proper characterization (Zou et al., 2016). The presence of primary amino groups has been reported to contribute to the antimicrobial activity of COS and GlcNAc. COS are known to cause microbial cell death by modifying the permeability of the cell membrane. Generally, COS are positively charged, which enables them to bind and adsorb readily on the negatively charged cell wall, leading to DNA rupture and blocking of RNA transcription (Mei et al., 2015).

#### TABLE 1 | Biological activities of COS and GlcN.

Similarly, the amino group also plays a key role in the antioxidant activity of COS. The amino group stabilizes unstable free radicals, which interrupts the radical chain reaction (Zhao et al., 2013). In a recent study, Bai et al. (2020) developed selenium nanoparticle-loaded CHS/COS microparticles and investigated their ability to alleviate the oxidative stress induced by alcohol in mice. During oxidative stress, the concentration of thiobarbituric acid reactive substances (TBARS) and total carbonyl compounds (TCC) increases while superoxide dismutase (SOD), catalase (CAT), glutathione (GSH), and glutathione peroxidase (GSHpx) levels in the cell decrease. The authors conducted an acute lethal test and a 50% ethanol challenge to assess the efficiency of the developed microparticles as an antioxidative agent, and they observed a rise in SOD, CAT, GSH, and GSHpx with a reduction in TBARS and TCC levels, which demonstrated the antioxidant potential of COS. In the present decade, the antitumor activity of COS and GlcNAc has been under extensive exploration by several researchers (Wang et al., 2019; Wu et al., 2019). Yu et al. (2016) conjugated a short synthetic peptide, ES2 (Endostatin2), with soluble O-(2-hydroxy)propyl-3-trimethylammonium COS chloride (HTCOSC) through covalent binding and reported the considerable anti-angiogenic potential of the conjugate. Wang et al. (2019) further examined the role of this conjugate in inhibiting tumor growth in tumor-bearing C57BL/6 female mice by assessing its effects on vascular endothelial growth factor (VEGF), microvessel density (MVD), and caspase-3 through immunohistochemistry techniques. The results of the study revealed that the conjugate had successfully blocked cell proliferation in endothelial cells by arresting the cell cycle at the G0/G1 phase. Furthermore, the study also suggested that the expression of caspase-3 was upregulated with a decrease in the number of microvessels in tumors.
COS can also reduce azoxymethane- and dextran sulfate sodium-induced colorectal cancer in mice by restoring the composition of the gastrointestinal fungal community, increasing beneficial microbes (probiotics) and reducing pathobionts, as alterations in the gastric and intestinal microbiota were observed during colorectal cancer (Wu et al., 2019). Thus, recent studies suggest that COS can serve as a compelling contender for the treatment of cancer. However, the mechanism behind the anti-tumor activity of COS is largely unknown. Selective electrostatic interactions with the tumor cell, MW, DP, and immunostimulating and antiangiogenic effects are considered to be responsible for the anti-tumor activities of COS and GlcNAc (Liaqat and Eltem, 2018).

The anti-inflammatory activities of COS have been shown to mainly depend upon their physicochemical properties. COS with lower DP show better antioxidant as well as anti-inflammatory activities (Santos-Moriano et al., 2019). The study conducted by Santos-Moriano et al. (2019) analyzed the anti-inflammatory activity of mixtures of COS produced by various combinations of enzymes and substrates on murine macrophages (RAW 264.7) by monitoring the concentration of TNF-α (tumor necrosis factor-α) with a TNF-α ELISA kit. They observed that, after stimulation by lipopolysaccharides, the TNF-α concentration decreased, and the highest TNF-α concentration of 1,575 pg ml<sup>−1</sup> was reached 6 h post-stimulation with the application of a 10 ng mixture of lipopolysaccharides. The reported remarkable biological activities of COS and GlcNAc have intensified the demand for their commercial production. Therefore, the present era of chitin research is mainly focused on the development of an efficient process for the production of high COS and GlcNAc titres with improved biological activities. The following section presents an insight into the current developments being made and the diverse strategies employed for COS and GlcNAc production.

### STRATEGIES FOR COS AND GlcNAc PRODUCTION

Over the last few decades, the main issue in chitin research has been to develop an efficient and sustainable process for COS and GlcNAc production. Several approaches based on chemical and biological means are in practice for COS and GlcNAc production (**Figure 2**). Several chemical-based methods have been used for COS generation, and among them, acid hydrolysis of chitin is the most common (Novikov, 2004; Jung and Park, 2014). However, the acid-based production of COS and GlcNAc has several disadvantages, such as harsh reaction conditions, formation of unstable products, and environmental concerns (Jung and Park, 2014). These issues can be addressed by enzymatic conversion methods, which allow for better reaction control and yield products with improved biological activities (Liang et al., 2018). However, commercial employment of enzymatic conversion has been restricted due to relatively low production levels and a high extraction cost. Therefore, researchers are focusing on the development of synergistic chemical treatment and enzymatic hydrolysis processes for chitin degradation, with special consideration given to the utilization of non-toxic or less toxic chemicals (Nu et al., 2017). The chemoenzymatic approach has shown high potential for COS and GlcNAc production from chitin. Moreover, the transglycosylation approach for the synthesis of long-chain COS has also shown noteworthy prospects for industrial applications (Sinha et al., 2016). The various production methods for COS and GlcNAc are summarized in **Table 2** with their respective advantages/disadvantages and possible industrial implementations. The following sections describe the recent developments in enzymatic, chemical, and chemoenzymatic production strategies for COS and GlcNAc generation.

TABLE 2 | Summary of various production methods of COS and GlcNAc.


### Enzymatic Approach

Over the last few decades, significant advancements have been made in the enzymatic methods of chitin biotransformation into COS and GlcNAc, and these have emerged as a promising substitute for chemical methods. Chemical conversion is associated with various disadvantages, i.e., generation of toxic compounds, alteration of the original chemical structure, environmental pollution, variable product quality, and low product yields. In contrast, enzymatic conversion has an edge due to its eco-friendly nature and ease of control. Chitozymes are considered the major group of enzymes involved in chitin transformations. Within this group, chitinases (EC 3.2.1.14) are the key biocatalysts explored for the enzymatic production of COS and GlcNAc (**Table 3**). These glycosyl hydrolases degrade chitin into low/high MW COS and GlcNAc. Chitinase research gained attention after the discovery of the remarkable role of these enzymes as biocontrol agents against fungal phytopathogens and harmful insects in plant defense systems. Based on their cleavage patterns, chitinases can be classified into three major groups: (a) exo-chitinases, (b) endo-chitinases, and (c) β-N-acetylglucosaminidases (Patil et al., 2000; Hamid et al., 2013). The enzymatic production of COS using chitinases needs high levels of endo-chitinases, whereas GlcNAc production requires exo-chitinases and β-N-acetylglucosaminidase in high quantities (Lombard et al., 2014). Depending on their amino acid sequence similarity (www.cazy.org), chitinases are placed into glycosyl hydrolase (GH) families 18, 19, and 20. The basis of this categorization comprises the N-terminal sequence, enzyme localization, signal peptide, isoelectric pH, and inducers. Bacterial and fungal chitinases belong to GH family 18, whereas GH family 19 primarily encompasses plant chitinases (Kumar et al., 2018c). Depending on the degree of acetylation of chitosan, chitinases also possess the ability to degrade chitosan (Hartl et al., 2012).
To date, various studies have been reported on COS and GlcNAc production utilizing chitinases. Zhang et al. (2018) developed a novel recombinant chitinase from the gene CmChi1 of Chitinolyticbacter meiyuanensis SYBC-H1 by expressing it in Escherichia coli BL21(DE3) cells. (GlcNAc)<sub>2–6</sub> and colloidal chitin (CC) were utilized as substrates, and the recombinant enzyme showed an activity of 15.3 U mg<sup>−1</sup> with CC. The enzyme displayed an exo- cleavage pattern, as (GlcNAc)<sub>2</sub> was the main product from both substrates, whereas the release of GlcNAc and (GlcNAc)<sub>3–4</sub> in small amounts revealed an endo- pattern of cleavage. Moreover, weak β-N-acetylglucosaminidase activity was also observed; thus, recombinant CmChi1 chitinase appeared to be a possible candidate for achieving a 100% yield of GlcNAc from chitin. Similarly, a recombinant enzyme from the chit42 gene of Trichoderma harzianum was produced in the Pichia pastoris expression system (42 kDa, 150 mU) (Kidibule et al., 2018). The recombinant enzyme was then utilized for COS and GlcNAc production using CC and CHS as the substrates and was observed to produce (GlcNAc)<sub>1–2</sub> and GlcNAc as the major products (Kidibule et al., 2018).
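The difference between the cleavage patterns described above can be made concrete with a toy model. The following is a simplified sketch, not a kinetic model from the cited studies: a chitin chain is reduced to its DP, and the hypothetical functions `exo_chitinase`, `endo_chitinase`, and `n_acetylglucosaminidase` return the DP of each product. It assumes that the exo-acting enzyme releases (GlcNAc)<sub>2</sub> units stepwise from one chain end and that the endo-acting enzyme cuts randomly chosen internal bonds.

```python
import random


def exo_chitinase(dp):
    """Release (GlcNAc)2 units stepwise from one end of a chain of length dp."""
    products = []
    while dp > 2:
        products.append(2)
        dp -= 2
    products.append(dp)  # the remaining dimer or monomer
    return products


def endo_chitinase(dp, cuts, rng):
    """Cleave up to `cuts` randomly chosen internal bonds of one chain."""
    n_cuts = min(cuts, dp - 1)  # a chain of length dp has dp - 1 bonds
    bonds = sorted(rng.sample(range(1, dp), n_cuts))
    edges = [0] + bonds + [dp]
    return [end - start for start, end in zip(edges, edges[1:])]


def n_acetylglucosaminidase(fragments):
    """Split every fragment completely into GlcNAc monomers."""
    return [1] * sum(fragments)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the random cuts are repeatable
    print("exo :", exo_chitinase(10))
    print("endo:", endo_chitinase(10, 3, rng))
    print("exo + glucosaminidase:", n_acetylglucosaminidase(exo_chitinase(10)))
```

In this toy picture, endo-type cleavage yields a mixture of oligomers of various DP, whereas exo-type cleavage followed by β-N-acetylglucosaminidase action converts the whole chain into GlcNAc, in line with the enzyme requirements stated for COS and GlcNAc production.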

A thermo-alkali stable, extracellular chitinase (10.5 kDa) was purified from Streptomyces chilikensis RC1830 isolated from brackish lake water sediment (Ray et al., 2019). The study suggested that the purified enzyme possesses a higher binding affinity for CC than for starch, xylan, and carboxymethyl cellulose and has a significant conversion efficiency at 60°C (pH 11). High acid tolerance and thermal stability are considered desirable characteristics of enzymes to be utilized for COS and GlcNAc production at an industrial level. A 46 kDa Chi1 was purified from Streptomyces thermodiastaticus HF 3-3 with a specific activity of 2.4 U mg<sup>−1</sup>. The enzyme was found to be remarkably stable over a wide range of pH values, temperatures, and chemical exposures (Take et al., 2018). Moreover, thin-layer chromatography (TLC) analysis of CC cleavage products revealed that Chi1 exhibited an endo- pattern of cleavage with the generation of (GlcNAc)<sub>2</sub> as the major product and minimal amounts of (GlcNAc)<sub>3</sub> and GlcNAc (Take et al., 2018). Extracellular chitinase production and its utilization for COS and GlcNAc conversion have also been extensively studied, and some findings have shown high potential in terms of catalytic efficiency and/or thermal and pH stability (Halder et al., 2019). A 50 kDa chitinase from thermophilic Humicola grisea displayed significant catalytic efficiency toward CC with the generation of GlcNAc and COS with DP of 2 and 3 (Kumar et al., 2018b). The enzyme showed optimum activity at a pH of 3.0 and a temperature of 70°C. Another study reported a thermostable chitinase, Chi1, homologously produced from Myceliophthora thermophila C1 with high thermostability and activity toward chitin, CHS, modified CHS, and chitin oligosaccharides (Krolicka et al., 2018). The study reported that Chi1 has notable stability at 40°C (90%, 140 h) and 50°C (90%, 168 h).

### Chemical Approach

Chemical methods are the most frequently applied for commercial-scale production of COS and GlcNAc from shrimp and crab shells. The extraction is carried out by acid hydrolysis and oxidative degradation methods. The chemical conversion of chitin usually employs hydrochloric acid, sulphuric acid, acetic acid, lactic acid, trichloroacetic acid, and formic acid (Hamed et al., 2016). Processing temperature, time, and concentration of acid are vital factors that affect the rate of conversion. Polymer conversion research is presently aimed at utilizing relatively less toxic chemicals, viz. ionic liquids, to match these production levels as well as to overcome environmental safety issues (**Figure 3**).

#### Conventional Chemicals

The practice of utilizing harsh chemicals for the degradation of chitin is longstanding. A three-step method for acid hydrolysis of chitin was reported by Falk et al. (1966). The study showed that chitin hydrolysis using concentrated hydrochloric acid proceeds in the following stages: (a) production of smaller polymeric units or oligomers, (b) release of GlcNAc from the latter, and (c) conversion of GlcNAc into glucosamine and acetic acid. GlcNAc is usually obtained by acid hydrolysis of chitin at lower temperatures or by acetylation of GlcN with acetic anhydride, whereas GlcN is the product of chitin acid hydrolysis at high temperatures (Mojarrad et al., 2007). Although production of GlcNAc through chemical hydrolysis appears economically reasonable, the limited specificity of chemical catalysts, low product yield, high acidic waste generation, and operational costs limit the application of chemical methods. Mechanocatalytic depolymerization of chitin has been reported, in which soluble chitin oligomers were produced by grinding chitin with sulfuric acid in a ball mill, and further hydrolysis at high temperature with additional acid resulted in GlcNAc production (Yabushita et al., 2015). Similarly, an advanced approach for hydrolysis of chitin into GlcNAc and GlcN with significantly reduced amounts of acid as catalyst was demonstrated by Zhang and Yan (2017). The researchers employed co-solvents, including several aprotic solvents and etheric solvents, resulting in an 80% GlcN yield in the presence of 100 mM sulfuric acid (175°C, 60 min). Recently, sulphuric acid hydrolysis of straw mushroom was conducted and a yield of 56.8132 mg g<sup>−1</sup> glucosamine was achieved (Zhang and Sutheerawattananonda, 2020). Although the traditional chemicals are quite useful in terms of production levels, in view of product quality and environmental safety, it has become imperative to develop an alternative approach.
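Because each hydrolysis stage changes the mass of the sugar unit, a molar yield and a mass yield are not interchangeable. The following is a minimal sketch of that arithmetic, not a calculation from the cited studies: the names `max_mass_yield` and `product_mass_mg` are hypothetical, the molar masses are standard average values, and the substrate is assumed to be pure, fully acetylated chitin with GlcN isolated as the free base. Treating a reported percentage yield as a molar yield is likewise an assumption made here for illustration.

```python
# Average molar masses in g/mol (standard values, not taken from the review).
GLCNAC_RESIDUE = 203.19  # anhydro-GlcNAc unit inside the chitin chain
GLCNAC = 221.21          # free N-acetylglucosamine
GLCN = 179.17            # free glucosamine (free base)


def max_mass_yield(product_mass, residue_mass=GLCNAC_RESIDUE):
    """Grams of product per gram of chitin at complete conversion."""
    return product_mass / residue_mass


def product_mass_mg(chitin_mg, molar_yield, product_mass):
    """Mass of product (mg) obtained from chitin_mg at a given molar yield."""
    if not 0.0 <= molar_yield <= 1.0:
        raise ValueError("molar_yield must lie between 0 and 1")
    return chitin_mg * molar_yield * max_mass_yield(product_mass)


if __name__ == "__main__":
    print("max g GlcNAc per g chitin:", round(max_mass_yield(GLCNAC), 3))
    print("max g GlcN per g chitin:  ", round(max_mass_yield(GLCN), 3))
    # Illustrative case: 1 g of chitin converted to GlcN at an 80% molar yield.
    print("mg GlcN from 1 g chitin:  ", round(product_mass_mg(1000, 0.80, GLCN), 1))
```

Under these assumptions, complete conversion to GlcNAc gives slightly more than 1 g of product per gram of chitin because water is added on hydrolysis, while conversion to GlcN gives less because the acetyl group is lost as acetic acid.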

#### Ionic Liquids

Solvents composed of volatile organic compounds (VOCs) top the list of dangerous chemicals. Volatile organic solvents act as the primary reaction media for the industrial-scale production of various chemicals and have proved harmful both to the environment and to workers' health. Moreover, the extra cost required for the disposal of these VOCs, recyclability issues, and their separation from the desired reaction product make them highly undesirable (Mallakpour and Dinari, 2012). Therefore, a viable and environmentally friendly alternative to volatile organic solvents is required. Ionic liquids (ILs) serve as an attractive alternative capable of overcoming the limitations of traditional volatile organic solvents. ILs can be defined as salts with a melting point below the boiling point of water (Wilkes, 2004). ILs are mainly composed of ions, i.e., organic cations and inorganic/organic anions (Vekariya, 2017). The highly promising properties of ILs, including thermal stability, non-flammability, secure containment, and easy recycling, have attracted the attention of many researchers in the last two decades (Marcus, 2016). The electrostatic forces between the ions are responsible for the stability, non-volatility, and non-flammability of ILs. ILs are renowned as "green solvents" because of their non-volatile nature and low vapor pressure (MacFarlane et al., 2014). They are also known by several other names, viz. designer solvents, molten salts, neoteric solvents, and ionic fluids. ILs are regarded as "designer solvents" because their polarity and hydrophilicity/hydrophobicity can be tuned through suitable combinations of cations and anions (Passos et al., 2014). Another advantage of ILs is that they do not require harsh reaction conditions and act within a relatively short period (Maier et al., 2017).

ILs have been explored widely for the production of various valuable materials and chemicals from lignocellulosic biomass (Yoo et al., 2017). However, the utilization of ILs for the processing of chitin to yield valuable products (GlcNAc, COS, CHS, and other chemicals) is still in its infancy. Ionic liquids provide a sustainable, green-chemistry route for processing chitin into defined products with novel biological properties. ILs first disrupt the intra- and intermolecular hydrogen bonding in the polysaccharides (chitin, chitosan, and COS); new hydrogen bonds then form between the anions of the ionic liquid and the hydroxyl groups of the polysaccharide. This leads to the formation of amorphous chitin with a hydrated gel-like structure and relatively lower crystallinity. Because the disruption of hydrogen bonds decrystallizes the polymeric chain of chitin, the chain becomes more accessible to reactants (**Figure 4**). Thus, for chitin processing, ILs can be employed either during pretreatment or as a co-solvent/solvent in simultaneous treatment (Shamshina and Berton, 2020).

Several studies have reported the use of ionic liquids ([C2mim][OAc] and [C2mim][MeO(H)PO2]) for the pretreatment of chitin prior to enzymatic hydrolysis for the efficient production of COS and GlcNAc (**Table 4**). Yu et al. (2015) compared the degradative effects of a modified cellulase (Cell-ALD 10k) on chitosan in the IL [Gly]BF4 and in acetic acid (HAc) systems. The FTIR spectra revealed that [Gly]BF4 disrupted the inter- and intramolecular hydrogen bonding in chitosan more effectively than HAc. They obtained a COS yield of 76.36% at optimal reaction conditions (pH 5, 50 °C). Furthermore, the researchers reported that [Gly]BF4 significantly decreased the crystallinity of chitosan and increased the number of free amino groups. Gillard et al. (2016) attempted to generate glucosamine-containing oligomers by glycosylating ionic linker tag (ITag)-grafted acceptors with thioglucosamine donors through β-(1→4) linkages. For this, they used the ionic catch-and-release oligosaccharide synthesis (ICROS) methodology. These studies reflect the considerable potential of ILs in the production of COS and GlcNAc. However, further research is required to improve the recyclability and reusability of ILs and their synergistic action with enzymatic methods.

#### Microbial Engineering Approach

The enzymatic route of COS and GlcNAc production mainly exploits chitinolytic enzymes that are obtained from a range of bacteria, fungi, and plants, e.g., Serratia marcescens, Bacillus circulans, Streptomyces griseus, Saccharomyces cerevisiae, Candida albicans, Neurospora crassa, and Trichoderma harzianum (Hamid et al., 2013). However, the low activity, yield, and specificity of the enzymes produced have hampered the use of these biocatalysts for large-scale production of COS and GlcNAc. Thus, to increase the expression level of chitinolytic enzymes from microbial sources, researchers are focusing on the engineering of microbial cells using different heterologous expression systems (Pan et al., 2019). To date, Escherichia coli (Yang et al., 2016; Abdel-Salam et al., 2018) and Pichia pastoris (Ueda et al., 2017; Menghiu et al., 2019) have been explored widely for the expression of chitinases from distinct sources. However, the heterologous expression of enzymes in these systems poses many challenges, including the formation of inclusion bodies. The eukaryotic Pichia pastoris expression system also has several limitations, including its complex nature, low expression levels, and extended incubation periods. Furthermore, the expression of intracellular enzymes faces additional hurdles because the cells must be physically disrupted to release the enzymes, which may lead to increased purification costs, low activities, and enzyme inactivation. Thus, it is necessary to select an appropriate expression host and to combine it with genetic engineering in order to develop potent strains that produce high chitinase titres with notable catalytic efficiency for converting chitin into COS and GlcNAc. Moreover, the genomic engineering of microbial cells has also proven to be a favorable approach for obtaining COS in better yields to support research and applications in the clinical and food industries (Ruffing et al., 2006).

TABLE 4 | Ionic liquids employed for COS and GlcNAc production.

A penta-N-acetyl-chitopentaose mixture has been produced by cultivating recombinant E. coli cells expressing the nodC gene from Mesorhizobium loti (Zhang et al., 2007). The study developed a two-step fermentation process to improve COS productivity in recombinant E. coli cells and observed a significant enhancement in COS yield and productivity, up to 930 mg L<sup>−1</sup> and 37 mg L<sup>−1</sup> h<sup>−1</sup>, respectively, in a 10-L bioreactor. In Bacillus subtilis WB600, the addition of the signal peptide NprB significantly increased the extracellular expression of chitinase (Chisb) from 2.28 to 35.54 U mL<sup>−1</sup> (Pan et al., 2019). Chisb was purified with HisTrapHP and showed a 3.06-fold purification with a specific activity of 141.43 U mg<sup>−1</sup> when colloidal chitin was used as a substrate. Furthermore, the study reported that ribosome binding sites (RBS) optimized with spacer sequences, together with molecular docking combined with site-directed mutagenesis, increased the expression level and the specific activity of Chisb. Chitinase activity with 20 RBS sites (R1-R20) was analyzed, and R13, R19, and R20 were reported to increase the enzyme activity considerably, by 45.39, 36.83, and 14.77%, respectively. Among colloidal chitin, powdery chitin, chitosan, pNP-GlcNAc, (GlcNAc)<sub>2–4</sub>, and (GlcNAc)<sub>5</sub>, Chisb showed the highest specific activity toward (GlcNAc)<sub>5</sub> (340.76 U mg<sup>−1</sup>). Deng et al. (2019) performed whole-genome sequencing of Corynebacterium glutamicum S9114 to evaluate the function of different genes in the glutamate and GlcNAc synthesis pathways. They observed that blocking the expression of nagA (GlcNAc-6-phosphate deacetylase) and gamA (GlcN-6-phosphate deaminase) in C. glutamicum S9114 increased the production of GlcNAc by 54.8%, from 3.1 to 4.8 g L<sup>−1</sup>. Further, silencing of the ldh (lactate dehydrogenase) gene inhibited lactate synthesis, which increased the GlcNAc titer to 5.4 g L<sup>−1</sup>. Moreover, a recombinant strain, CGGN-GNA1-CgglmS, was developed by expressing glmS, a vital gene for GlcN synthesis, from various sources in the vector pJYW-4-ceN, which raised the GlcNAc titer to 6.9 g L<sup>−1</sup>. Recent advancements in microbial engineering have thus shown high proficiency in enhancing the production of chitinolytic enzymes, which can be further exploited for COS and GlcNAc production. However, the commercialization of engineered strains requires considerably more research.
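The improvements quoted above are reported sometimes as absolute values and sometimes as percentages, so a small helper makes them easier to compare. The following is a minimal sketch, not taken from the cited studies: the function names and dictionary labels are illustrative, and the later GlcNAc titers are compared against the 3.1 g L<sup>−1</sup> starting value, which is an assumption made here for convenience.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative increase of `after` over `before`, in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before * 100.0


def fold_change(before: float, after: float) -> float:
    """Ratio of the improved value to the baseline value."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return after / before


# (baseline, improved) pairs as quoted in the text; labels are illustrative.
# Assumption: GlcNAc titers are all compared against the 3.1 g/L baseline.
REPORTED = {
    "Chisb activity with NprB signal peptide (U/mL)": (2.28, 35.54),
    "GlcNAc titer, nagA/gamA blocked (g/L)": (3.1, 4.8),
    "GlcNAc titer, ldh also silenced (g/L)": (3.1, 5.4),
    "GlcNAc titer, CGGN-GNA1-CgglmS (g/L)": (3.1, 6.9),
}

if __name__ == "__main__":
    for label, (before, after) in REPORTED.items():
        print(f"{label}: {fold_change(before, after):.2f}-fold, "
              f"+{percent_increase(before, after):.1f}%")
```

Applied to the nagA/gamA result, the helper reproduces the 54.8% increase stated in the text, which gives a quick consistency check on the quoted titers.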

#### Transglycosylation Approach

Recently, chitin- and CHS-modifying enzymes have gained considerable attention as efficient tools for the production of well-defined COS and their derivatives. For this purpose, genetically engineered chitinolytic enzymes possessing transglycosylation (TG) activity have emerged as exceptional tools. Transglycosylation and hydrolysis are two distinct and essential routes catalyzed by chitinolytic enzymes. Transglycosylation involves the formation of a glycosidic bond between two saccharides by the transfer of a sugar moiety from a suitable donor to an acceptor (Ling et al., 2018). Several chitinolytic enzymes exhibiting hydrolytic activity also possess transglycosylation ability to some extent. These abilities help to determine the suitability of chitinolytic enzymes for specific functions and applications. Hydrolysis generally leads to the synthesis of monomeric or dimeric glucosamine units with low DP. In contrast, TG is a kinetically controlled process that results in the synthesis of size- and stereo-specific oligomers of chitin/CHS and their derivatives (Purushotham and Podile, 2012). The occurrence of TG activity during the hydrolytic process may result in unexpected enzymatic products, as it interferes with the hydrolytic activity of chitinases. Therefore, chitinases must be engineered in favor of either hydrolysis or TG to obtain the outcome required for a specific application. Genetically engineered chitinases with high TG activity and relatively reduced hydrolytic activity produce COS with high DP that can be utilized in the food and pharmaceutical industries.

COS with higher DP can be developed from COS with lower DP by using chitinolytic enzymes, viz. chitinase, chitosanase, and other glycosidases exhibiting TG activity. For instance, Hattori et al. (2012) reported lysozyme-mediated TG for the production of COS with DP 6 to 15 using β-1,4-(GlcNAc)<sub>3</sub> as substrate. A hyper-transglycosylating chitinase (EcChi1) with an endo-acting cleavage pattern was purified from Enterobacter cloacae subsp. cloacae 13047 (Mallakuntla et al., 2017). (GlcNAc)<sub>3–6</sub> were utilized as substrates for the production of COS via TG, which resulted in products with DP ranging from 4 to 9. The study further concluded that the length of the COS substrate and the concentration of the enzyme significantly affect the TG activity of EcChi1. Similarly, a salt-tolerant chitinase B (FjChiB) was isolated from Flavobacterium johnsoniae UW101; it also exhibited TG activity, utilizing (GlcNAc)<sub>5</sub> and (GlcNAc)<sub>6</sub> as substrates for COS synthesis (Vaikuntapu et al., 2018). An N-acetylglucosaminidase derived from Lecanicillium lecanii grown in submerged culture showed both hydrolytic and TG activities, and the enzyme was able to produce COS with DP from 2 to 6 units (Rojas-Osnaya et al., 2019). Recently, Bhuvanachandra and Podile (2020) expressed a CsChil gene from Chitiniphilus shinanonensis in E. coli and assessed its hydrolytic and TG ability. They found that production of (GlcNAc)<sub>4</sub> was most effective when using (GlcNAc)<sub>2</sub> as substrate. (GlcNAc)<sub>4–6</sub> were also used as substrates, but no production of longer chains was observed. TG of (GlcNAc)<sub>4</sub> yielded products with DP 2 to 6, in which COS with DP 5 and 6 formed only a small fraction because, after 30 min of reaction, the products were further hydrolyzed into shorter COS. Sirimontree et al. (2014) developed several mutants of chitinase A derived from Vibrio harveyi (VhChiA) for the production of longer COS through TG. However, enhancing the TG activity does not always yield longer-chain COS; occasionally, breakdown of the product generates COS with lower DP. HPLC analysis of the TG products formed by mutants W570G and D392N using GlcNAc as substrate showed improved TG efficiency, but the products were immediately hydrolyzed into shorter COS. In contrast, the products formed by mutants D313A and D313N were not hydrolyzed further, which resulted in the accumulation of longer-chain COS. This indicates that mutational strategies that prevent decomposition of the product should be adopted alongside the enhancement of TG activity.

#### Chemoenzymatic Approach

The production of COS and GlcNAc has been extensively explored using chemical and enzymatic processes, but both are associated with certain limitations. In chemical production, product quality and the environment are compromised, whereas enzymatic methods result in a low production yield. To meet the demand for commercial production, interest has therefore grown in developing synergy between the different processes. In this context, chemical pretreatment followed by enzymatic hydrolysis has shown significant enhancement in COS and GlcNAc production (Trung et al., 2020). Pretreatment of chitin before enzymatic hydrolysis not only decreases the crystallinity of the polymer but also makes the substrate readily accessible to the enzyme. In recent years, aqueous alkali (LiOH/NaOH/KOH)/urea has emerged as a highly efficient and comparatively eco-friendly solvent for the dissolution of chitin and cellulose by freeze-thaw methods (Xiong et al., 2013). Gong et al. (2016) studied the dissolution of chitin in aqueous KOH (8.4–25 wt%)/urea and observed through NMR analysis that chitin displayed appreciably good solubility (80%) in the freeze-thawing process. The study supported the dissolution power of alkali-based solvent systems in the order KOH > NaOH > LiOH. Recently, Sivaramakrishna et al. (2020) used KOH and aqueous KOH-urea solutions for the pretreatment of α-chitin before enzymatic hydrolysis with a chitinase derived from Enterobacter cloacae subsp. cloacae (EcChi1). The hydrolytic activity on KOH- and KOH-urea-pretreated α-chitin increased significantly (91%) compared with colloidal chitin and untreated α-chitin. Furthermore, HPLC analysis of the hydrolytic products obtained from treated α-chitin showed the generation of COS with DP3 along with DP1 and DP2, whereas only DP1 and DP2 COS were produced from colloidal chitin and untreated α-chitin. In another study, chemo-thermal pretreatment of chitin with H<sub>2</sub>SO<sub>4</sub> (2% w/v) at 21 °C and 15 psi for 60 min efficiently increased the susceptibility of chitin to myco-chitozymes (Kumar et al., 2020). With the pretreated chitin, the activities of chitinase and chitosanase rose to 320 and 58 U L<sup>−1</sup>, respectively. Furthermore, the synergistic action of both enzymes (2:1 ratio) on pretreated chitin resulted in a GlcN yield of 772 mg L<sup>−1</sup>. Villa-Lerma et al. (2016) demonstrated the pretreatment of chitin using supercritical 1,1,1,2-tetrafluoroethane (101 °C and 40 bar pressure) followed by rapid depressurization and fibrillation. Following pretreatment, chitin was subjected to enzymatic hydrolysis by β-N-acetylhexosaminidase and chitinase derived from Lecanicillium lecanii. The result was the synthesis of highly acetylated COS with F<sub>A</sub> of 0.45 and DP of 2 to 5. Currently, greener chemicals are receiving much more attention for pretreatment because of their high digesting capacity and lower environmental toxicity (Mao et al., 2019).

Among the green chemicals, ILs have been extensively studied for the pretreatment of lignocellulosic biomass, and their use for chitin pretreatment has also begun (**Table 4**). The dissolution of chitin in a solvent is crucial for the development of value-added biomaterials. However, the extent to which chitin dissolves in ILs depends on the quality of the chitin as well as on the dissolution process (Shamshina, 2019). 1-Butyl-3-methylimidazolium acetate showed good solvent properties for native chitin (Wu et al., 2008). ILs have also been utilized for the modification of chitin to improve its physicochemical and biological properties. Jaworska et al. (2017) observed that several ILs bearing an ethyl substituent on the cationic ring promoted chitin modification. Berton et al. (2018) conducted a comparative evaluation of enzymatic hydrolysis using chitinase from Streptomyces griseus and suggested that the product distribution and yield depend entirely on the hydration state and type of substrate chosen for the reaction. As substrates, they used dried shrimp shells (SS), pure commercial-grade PG-chitin, chitin hydrogel extracted from SS using an IL ([C2mim][OAc], via dissolution and coagulation), and chitin extracted from SS in dried form. The enzymatic hydrolysis of SS yielded mainly GlcNAc, whereas (GlcNAc)<sub>2</sub> was obtained as the major product from PG-chitin and dried IL-extracted chitin. In contrast, IL-extracted hydrogel chitin yielded (GlcNAc)<sub>2</sub> at 25 °C and GlcNAc at higher temperature; at the highest reaction temperature, a minor amount of (GlcNAc)<sub>3</sub> was also observed as a hydrolysis product. Similarly, an acidic IL, 1-(carboxymethyl)pyridinium chloride, was synthesized and examined for its unexploited potential to dissolve cellulose, CHS, and chitin (Taheri et al., 2018). The findings suggested that IL-treated samples of cellulose and CHS were less crystalline and had lower degradation temperatures. Chitin, in contrast, was not only dissolved but also hydrolyzed to quaternary ammonium CHS.

Moreover, another recent study applied two chitinases, chip1 and chip2, from Paenibacillus pasadenensis CS0611 to IL-pretreated crab shell chitin and obtained 712.6 mg g<sup>−1</sup> of COS with DP 2 and 177.1 mg g<sup>−1</sup> of GlcNAc (Xu et al., 2018). Li et al. (2019) evaluated the effects of the IL 1-ethyl-3-methylimidazolium acetate ([C2mim][OAc]) on chitin pretreatment and on the yield of subsequent enzymatic hydrolysis with chitinase derived from Streptomyces albolongus ATCC27414. The enzymatic hydrolysis of [C2mim][OAc]-pretreated chitin produced significant amounts of GlcNAc (175.62 mg g<sup>−1</sup> chitin) and N,N′-diacetylchitobiose (341.70 mg g<sup>−1</sup> chitin), with 61.49% conversion efficiency at 48 h. The authors attributed the improved enzyme performance to the marked reduction in chitin crystallinity, the high enzyme adsorption capacity, and structural changes in porosity and polymer grain size. Further, they conducted an NMR spectroscopic evaluation of GlcNAc solvation in [C2mim][OAc] and confirmed the formation of hydrogen bonds between the hydroxyl groups of GlcNAc and both the cations and anions of the IL. Husson et al. (2017) demonstrated the role of two imidazolium-based room-temperature ILs (1-ethyl-3-methylimidazolium methylphosphonate, [C2mim][MeO(H)PO2], and 1-ethyl-3-methylimidazolium acetate, [C2mim][OAc]) in the conversion of chitin through sequential and simultaneous strategies. For the enzymatic hydrolysis, commercial chitinases from Trichoderma viride or Streptomyces griseus were applied. In the sequential strategy, chitin was pretreated with ILs under mild conditions prior to enzymatic hydrolysis. Compared with [C2mim][MeO(H)PO2], [C2mim][OAc] gave better yields of GlcNAc (185.0 ± 4.0 mg g<sup>−1</sup> chitin) and (GlcNAc)<sub>2</sub> (667.60 ± 20.71 mg g<sup>−1</sup> chitin) following enzymatic hydrolysis by T. viride and S. griseus, respectively. In the simultaneous strategy, on the other hand, the ILs were used as co-solvents in a one-pot enzymatic hydrolysis with S. griseus, which yielded 573.72 ± 5.99 mg g<sup>−1</sup> chitin of (GlcNAc)<sub>2</sub> with traces of GlcNAc. Furthermore, enzymatic hydrolysis of [C2mim][OAc]-pretreated chitin by a mixture of the chitinases from both organisms resulted in the production of 760.0 ± 0.1 mg g<sup>−1</sup> chitin. Thus, it can be concluded that the sequential strategy is highly efficient for the conversion of chitin into valuable GlcNAc and (GlcNAc)<sub>2</sub> (Husson et al., 2017). Qin et al. (2010) performed the dissolution and extraction of crustacean shells with an IL ([C2mim][OAc]) to obtain chitin. During dissolution, the disruption of hydrogen bonds disassembled the chitin chains, which rearranged during coagulation to form amorphous (open hydrated gel-type) chitin. The utilization of ILs in the pretreatment of lignocellulosic biomass has been well explored and has shown promising results, whereas their use for chitin pretreatment is relatively less explored. Therefore, given the success of IL-based cellulose pretreatment and several significant studies on IL-based chitin pretreatment, further exploration of different ILs for chitin pretreatment is needed to enhance the susceptibility of chitin to enzymatic action.
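Mass yields such as "mg product per g chitin" are easier to compare across studies once they are expressed as the fraction of chitin repeat units recovered. The sketch below shows one way to do this under stated assumptions: the substrate is treated as pure, fully acetylated chitin (anhydro-GlcNAc repeat unit, about 203.19 g mol<sup>−1</sup>), each hydrolysis product carries one extra water, and the function names are hypothetical. It is not the definition of "conversion efficiency" used by the cited authors, which is not given in this section, so it should not be expected to reproduce their percentages.

```python
from typing import Mapping

ANHYDRO_GLCNAC = 203.19  # g/mol, C8H13NO5 repeat unit of fully acetylated chitin
WATER = 18.015           # g/mol, added once per released oligomer


def oligomer_molar_mass(dp: int) -> float:
    """Average molar mass of (GlcNAc)n with degree of polymerization `dp`."""
    if dp < 1:
        raise ValueError("DP must be at least 1")
    return dp * ANHYDRO_GLCNAC + WATER


def repeat_unit_recovery(yields_mg_per_g: Mapping[int, float]) -> float:
    """Fraction of chitin repeat units recovered as soluble products.

    `yields_mg_per_g` maps product DP to its mass yield (mg per g chitin).
    Assumes pure, fully acetylated chitin as the substrate.
    """
    recovered_mg = 0.0
    for dp, mg in yields_mg_per_g.items():
        mmol_product = mg / oligomer_molar_mass(dp)
        # Each product molecule accounts for `dp` repeat units of the polymer.
        recovered_mg += mmol_product * dp * ANHYDRO_GLCNAC
    return recovered_mg / 1000.0


if __name__ == "__main__":
    # Mass yields quoted in the text for [C2mim][OAc]-pretreated chitin.
    quoted = {1: 175.62, 2: 341.70}
    print(f"repeat units recovered: {repeat_unit_recovery(quoted):.1%}")
```

Because hydrolysis adds water, 1 g of chitin can in principle give slightly more than 1 g of GlcNAc; working on a repeat-unit basis removes that bias when monomer-rich and dimer-rich product mixtures are compared.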

Apart from the pretreatment strategy, synthetic COS can be developed through the enzymatic glycosylation of a chemically derived COS-oxazoline monomer. Genetically engineered chitinolytic enzymes exhibiting TG activity are used to develop regio- and stereospecific COS. Shoda et al. (2006) synthesized a novel N-acetyllactosaminoglycan oligosaccharide with DP5 having β-(1→4)-β-(1→6)-linked repeating units in the main polymeric chain. For this, they used transglycosylating chitinase A1 derived from Bacillus circulans WL-12 and an oxazoline derivative of N-acetyllactosamine (LacNAc-oxa) as substrate. Likewise, Yoshida et al. (2012) demonstrated the regio- and stereospecific synthesis of (GlcNAc)<sub>7</sub> having β-(1→4) linkages by polyaddition of a 1,2-oxazoline derivative of (GlcNAc)<sub>5</sub> with (GlcNAc)<sub>2</sub> through the TG activity of chitinase A1 from B. circulans WL-12. This chemoenzymatic route appears to be a promising approach for the commercial-scale production of COS and GlcNAc.

### PURIFICATION AND CHARACTERIZATION OF COS AND GlcNAc

The conversion of chitin by chemical, enzymatic, or synergistic methods usually results in a mixture of monomeric GlcNAc, oligomers, homologs, and isomers of discrete MW and DP. Separation and characterization of the desired COS and GlcNAc from the reaction mixture is therefore a challenging task. The purification of these products is vital for their characterization and for gaining insight into the association between physicochemical properties and biological activities (Liaqat and Eltem, 2018). Moreover, the utilization of COS and GlcNAc in the biomedical and food industries requires high purity, quality, and quantification (Xia et al., 2011). Numerous strategies have been reported in the literature for the extraction and purification of COS and GlcNAc, such as gel filtration, ultrafiltration, and nanofiltration (Sørbotten et al., 2005; Lopatin et al., 2009; Dong et al., 2014).

COS can be purified and characterized by employing modern chromatography and spectrometry techniques (**Table 3**). While purifying derivatized COS, the selection of purification strategy solely relies on the chemical nature of the derivative. Full structural characterization of COS is relatively tedious but pure and small oligomers can be characterized to some extent. Primarily, the analysis of COS is performed through thin layer chromatography (TLC). TLC helps in the separation of a mixture of oligomers based on DP at a relatively low cost, while HP-TLC relies on the DA. However, TLC does not offer an accurate quantitative analysis of the products.

High-performance liquid chromatography (HPLC) serves as an advancement over TLC, as it can readily perform COS analysis based on both DP and DA when coupled with mass spectrometry (Li et al., 2016). GlcNAc absorbs UV light maximally at 204 nm owing to the presence of the acetamido group, but contaminating compounds that absorb at a lower wavelength sometimes alter the results. HPLC can detect only partially or fully acetylated COS (Seki et al., 2019). Refractive index detectors (RID) are used for COS detection with HPLC. TLC and HPLC were performed to analyze the enzyme-catalyzed hydrolysis products of CHS and N<sup>1</sup>-acetylchitohexaose [(GlcN)<sub>5</sub>-GlcNAc], which has a GlcNAc residue at the reducing end (Seki et al., 2019). TLC analysis of CHS showed that the enzyme adopted an exo-type cleavage pattern, as (GlcN)<sub>2</sub> was obtained as the major product. The observations revealed that (GlcN)<sub>2</sub> and (GlcN)<sub>3</sub>-GlcNAc were produced in substantial quantities and that the latter was further hydrolyzed into GlcN-GlcNAc and (GlcN)<sub>2</sub>. Despite being widely utilized for COS and GlcNAc characterization, HPLC has several disadvantages, viz. low retention time, the requirement for a concentrated sample, limited suitability for gradient elution, and the need for large amounts of organic eluent (Liaqat and Eltem, 2018). High-performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD) is regarded as another promising and efficient tool for the separation of underivatized carbohydrates. By avoiding the cost and time of COS derivatization, HPAEC offers results with high resolution and sensitivity (Li et al., 2016). Cao et al. (2016) studied the separation of an underivatized mixture of deacetylated COS, (GlcN)<sub>1–6</sub>, using HPAEC-PAD. The researchers used an HPAEC-PAD setup comprising an ICS-3000 system, a CarboPac-PA100 guard column (4 × 50 mm), and a CarboPac-PA100 analytical column (4 × 250 mm). The study reported efficient separation of COS with DP 2 to 6, whereas GlcN separation was challenging because its peak fell between the (GlcN)<sub>3</sub> and (GlcN)<sub>4</sub> peaks.

Capillary electrophoresis (CE) also offers improved separation compared with HPLC. CE provides high resolution using a small amount of solute and solvent with a comparatively short analysis time (Liaqat and Eltem, 2018). The separation of COS proceeds efficiently in aqueous acidic solution, as they can adsorb on negatively charged surfaces, such as fused silica capillaries, in acidic solution (Sarbu and Zamfir, 2018). Hattori et al. (2012) demonstrated the separation of COS through simple CE using a positively charged capillary coated with N-trimethoxypropyl-N,N,N-trimethylammonium chloride. However, the derivatization step and the use of highly expensive materials make CE economically less efficient. Recently, nuclear magnetic resonance (NMR) spectroscopy has come to be considered one of the best techniques for the structural analysis of COS and for determining their degree of deacetylation (DD). NMR provides complete spectra of reducing and non-reducing sugars and reveals differences in their nearest neighbors. Jiang et al. (2017) reported two methods for the analysis of DD in COS: the first involved acid-base titration with bromocresol green indicator, while the second was a first-order derivative UV spectrophotometric method. The accuracy of both techniques was verified by comparing their results with those obtained by <sup>1</sup>H NMR spectroscopy. Another study prepared COS by two different processes: the first relied solely on enzymatic hydrolysis with chitosanase, while the second involved chemical hydrolysis followed by enzymatic hydrolysis (Sánchez et al., 2017). The study analyzed the COS produced by <sup>1</sup>H NMR spectroscopy and MALDI-TOF-MS and observed that COS with 63% fully deacetylated sequences were generated in the second process. Cao et al. (2019) confirmed the synthesis of a naringin-chitooligosaccharide (Nari-COS) complex by scanning electron microscopy (SEM) and <sup>1</sup>H NMR. However, COS analysis by NMR spectroscopy requires further development, as it is limited to low-DP COS (< (GlcNAc)<sub>5</sub>) and requires a concentrated sample.

Currently, MALDI-TOF-MS has emerged as the most appropriate technique for analyzing biomolecules, e.g., lipids, saccharides, peptides, and other organic macromolecules, on the basis of ion mass and charge. MALDI-TOF-MS involves a soft-ionization process that causes little or no fragmentation of analytes, which allows the molecular ions of analytes to be identified even within complex mixtures such as COS (Liaqat and Eltem, 2018). Santos-Moriano et al. (2019) performed controlled enzymatic hydrolysis of CHS or chitin and obtained three types of COS, viz. fully acetylated (faCOS), fully deacetylated (fdCOS), and partially acetylated (paCOS). The study employed HPAEC-PAD and MALDI-TOF-MS to determine the chemical composition of the COS obtained (Santos-Moriano et al., 2019). Despite having several advantages over other COS characterization techniques, MALDI-TOF-MS also has a few drawbacks, such as having a lower mass limit (not <500 Da) for the sample to be analyzed (Li et al., 2016). Nevertheless, recent developments in the purification and characterization of COS and GlcNAc have increased the medicinal value of these bioactive molecules. This has led to further improvement of the biological activities of COS and GlcNAc through engineering/derivatization.
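To illustrate how a MALDI-TOF spectrum of a COS mixture is read, the expected m/z of each oligomer can be computed from its composition. The sketch below is an illustrative calculation rather than a procedure from the cited studies: it assumes singly charged sodium adducts, [M+Na]<sup>+</sup>, uses standard monoisotopic masses, and the function names are hypothetical.

```python
# Monoisotopic masses in Da (standard values, rounded to four decimals).
GLCNAC_RESIDUE = 203.0794  # C8H13NO5, acetylated unit within the chain
GLCN_RESIDUE = 161.0688    # C6H11NO4, deacetylated unit within the chain
WATER = 18.0106            # H2O, accounts for the two chain ends
SODIUM_CATION = 22.9892    # Na+ (assumed adduct ion)


def cos_neutral_mass(n_glcn: int, n_glcnac: int) -> float:
    """Monoisotopic mass of a linear COS with the given composition."""
    if n_glcn < 0 or n_glcnac < 0 or n_glcn + n_glcnac == 0:
        raise ValueError("composition must contain at least one unit")
    return n_glcn * GLCN_RESIDUE + n_glcnac * GLCNAC_RESIDUE + WATER


def sodium_adduct_mz(n_glcn: int, n_glcnac: int) -> float:
    """m/z of the singly charged [M+Na]+ ion."""
    return cos_neutral_mass(n_glcn, n_glcnac) + SODIUM_CATION


def fraction_acetylated(n_glcn: int, n_glcnac: int) -> float:
    """F_A: share of units that carry an N-acetyl group."""
    return n_glcnac / (n_glcn + n_glcnac)


if __name__ == "__main__":
    # List every composition from DP 1 to DP 6.
    for dp in range(1, 7):
        for n_glcnac in range(dp + 1):
            n_glcn = dp - n_glcnac
            print(f"DP{dp} D{n_glcn}A{n_glcnac} "
                  f"F_A={fraction_acetylated(n_glcn, n_glcnac):.2f} "
                  f"M={cos_neutral_mass(n_glcn, n_glcnac):.3f} "
                  f"[M+Na]+={sodium_adduct_mz(n_glcn, n_glcnac):.3f}")
```

Comparing the listed masses with the lower mass limit mentioned above shows which monomers and dimers fall below it, which is one reason MALDI-TOF-MS is commonly paired with a chromatographic method for the smallest products.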

### COS AND GlcNAc DERIVATIZATION/ENGINEERING FOR IMPROVING BIOLOGICAL ACTIVITIES

COS and GlcNAc embrace several biological activities and have diverse applications in several fields viz. food, agriculture, pharmaceuticals, biomedical, and cosmetics, owing to their biocompatibility, biodegradability, and renewable production. Currently, engineering of the functionalized COS and GlcNAc derivatives with enhanced biological activities have gained worldwide attention due to remarkable applications in medicine (**Figure 5**). For example, phenolic compounds conjugated COS have significant medical applications (Liaqat and Eltem, 2018). Numerous recent studies reported the improvement of COS's biological activities through derivatization (**Table 5**). Several derivatives of COS and GlcNAc conjugated with aminoethyl, phenolic compounds, gallic acid, carboxyl group, and sulfate have been reported so far. Ngo et al. (2019) formed aminoethyl (AE) conjugated derivative of COS by replacing the hydroxyl group present at C-6 position with AE. The aminoethylation reactivity was found to be highest at the C-6 position. The aminoethyl COS (AE-COS) was readily soluble in water and further examined for its potential to inhibit angiotensin I converting enzyme (ACE). The findings revealed that at 2.5 mg mL−<sup>1</sup> it displayed 89.3% ACE inhibition through AE-COS (2.5 mg mL−<sup>1</sup> ) with IC<sup>50</sup> value of 0.8017 mg mL−<sup>1</sup> . AE-COS was investigated to assess the antiproliferative effect on cell invasion of human fibrosarcoma cells (Hong et al., 2016). Firstly, the impact of AE-COS on cell viability was observed, followed by the utilization of gelatin zymography and western blot to analyze the inhibitory effects of AE-COS on the activity and expression levels of matrix metalloproteinases (MMP) related to the invasion of cancer cells, i.e., MMP-2 and MMP-9. 
AE-COS downregulated the expression of MMP-9 at concentrations >20 µg mL<sup>−1</sup>, whereas the expression of p50 was reduced at concentrations <20 µg mL<sup>−1</sup> (Hong et al., 2016). Similarly, the antiproliferative activity of AE-COS on AGS human gastric adenocarcinoma cells was also evaluated. Three amino-derivatized COS, i.e., aminoethyl-chitooligosaccharide (AE-COS), dimethyl aminoethyl-chitooligosaccharide (DMAE-COS), and diethyl aminoethyl-chitooligosaccharide (DEAE-COS), were prepared and confirmed by IR spectroscopy. The amino-derivatized COS showed antiproliferative potential on AGS human gastric adenocarcinoma cells, and AE-COS and DEAE-COS exhibited higher apoptotic activity than DMAE-COS (Karagozlu et al., 2010). AE-COS has been observed to decrease the viability of human lung A549 cancer cells to 32 ± 1.3% at a concentration of 500 µg mL<sup>−1</sup> (detection: MTT assay) (Ngo et al., 2019). AE-COS (500 µg mL<sup>−1</sup>) appreciably blocked the expression of cyclooxygenase-2 (COX-2) and Bcl-2 while upregulating the expression of the apoptotic proteins caspase-3 and caspase-9 in A549 cancer cells. AE-COS is therefore considered a potent candidate for use as a dose-dependent chemotherapeutic agent in cancer treatment.

Carboxylated COS (C-COS) can be synthesized by introducing a carboxyl group at the amino position at C-2 of the pyranose unit. Rajapakse et al. (2006) synthesized three different C-COS (C-COS-1, -2, and -3) using succinic anhydride and evaluated their inhibitory potential on MMP-9 expression in the human fibrosarcoma cell line HT1080. The three C-COS differed from one another in the mole ratio of COS to succinic anhydride. The MTT assay revealed that none of the three C-COS was cytotoxic even at a concentration of 500 µg mL<sup>−1</sup>. Further, the researchers confirmed by zymography the dose-dependent inhibition of MMP-9-mediated gelatinolytic activity in HT1080 cells (100 µg mL<sup>−1</sup> of C-COS-1/2/3 led to 50% inhibition). MMP-9 inhibition was directly proportional to the degree of substitution of C-COS. Furthermore, C-COS-3 at 100 µg mL<sup>−1</sup> significantly blocked MMP-9 expression by blocking the transcription of activator protein-1, which in turn inhibited the invasiveness of HT1080 cells. However, C-COS did not show any considerable effect on another transcription factor involved in MMP-9 expression, i.e., NF-κB. Rajapakse et al. (2007) also examined the anti-oxidative potential of C-COS in human (HL60) cells and mouse macrophages (RAW264.7 cells). DPPP oxide fluorescence assays and thiobarbituric acid reactive substance (TBARS) assays were performed with both C-COS and COS to assess the inhibition of cell membrane lipid peroxidation. The results showed that at 1,000 µg mL<sup>−1</sup>, C-COS and COS inhibited DPPP oxide fluorescence intensity by 63 and 50%, respectively. In contrast, TBARS was reduced to 75% by C-COS (1,000 µg mL<sup>−1</sup>), whereas COS had no significant inhibitory effect. In HL60 cells, C-COS reduced myeloperoxidase activity by 43% at 1,000 µg mL<sup>−1</sup>. Furthermore, C-COS inhibited the formation of reactive oxygen species (ROS), such as superoxide radicals, H<sub>2</sub>O<sub>2</sub>, and HOCl, more strongly than COS. This was confirmed by direct radical scavenging studies carried out with the DCFH-DA fluorescence probe.

3,4,5-Trihydroxybenzoic acid, commonly known as gallic acid, is a well-known secondary polyphenolic metabolite that exhibits antioxidant, antimicrobial, anti-inflammatory, anticarcinogenic, antiangiogenic, and antimutagenic activities (Choubey et al., 2015). Gallic acid-conjugated COS (G-COS) were synthesized by covalently linking gallic acid to COS via carbodiimide, and their enhanced cellular antioxidant activity was subsequently assessed. TBARS and DPPP assays were performed to check the lipid peroxidation inhibitory potential of G-COS, and dose-dependent reductions of 80% in TBARS and 62% in lipid peroxide level were observed at 100 µg mL<sup>−1</sup>. COS alleviated 50% of DPPP oxide fluorescence intensity when applied at a concentration of 100 µg mL<sup>−1</sup>. The intracellular radical scavenging potential of G-COS was demonstrated through the DCFH-DA method. Further, G-COS (100 µg mL<sup>−1</sup>) inhibited membrane protein oxidation and radical-mediated DNA damage in a dose-dependent manner in RAW264.7 cells by 83 and 90%, respectively. In contrast, at the same concentration, COS (100 µg mL<sup>−1</sup>) prevented only 20% of the radical-mediated DNA damage. Moreover, G-COS elevated the level of intracellular antioxidant enzymes [superoxide dismutase (SOD) and glutathione (GSH)] while blocking the activation and expression of NF-κB in H<sub>2</sub>O<sub>2</sub>-induced RAW264.7 cells (Ngo et al., 2011b). Similar results were obtained with SW1353 cells in a study conducted by Ngo et al. (2011a). The study showed that G-COS is an efficient free radical scavenger and that it also prevents oxidative damage to the DNA, proteins, and lipids of SW1353 cells while increasing the levels of antioxidant enzymes. The findings indicated that G-COS can be utilized as leading antioxidants in the food and pharmaceutical industries. G-COS was also found to exhibit anti-inflammatory activities in human lung epithelial A549 cells. In this study, G-COS (200 µg mL<sup>−1</sup>) prevented H<sub>2</sub>O<sub>2</sub>-induced DNA damage and ROS production in A549 cells. G-COS displayed 70% DPPH radical scavenging activity at a concentration of 200 µg mL<sup>−1</sup>, whereas COS showed only 20%. In addition, cyclooxygenase-2 (COX-2) expression was downregulated and the production of PGE<sub>2</sub> by LPS-stimulated A549 cells was reduced from 64 ± 4 to 30 ± 4 pg mL<sup>−1</sup> at 200 µg mL<sup>−1</sup> of G-COS. The production of cytokines (IL-8 and TNF-α) was also inhibited in a dose-dependent manner by G-COS. The control or LPS-treated cells had 3,124 ± 25 pg mL<sup>−1</sup> of IL-8 and 45 ± 1 pg mL<sup>−1</sup> of TNF-α, but G-COS curtailed these to 1,246 ± 15 pg mL<sup>−1</sup> and 11 ± 3 pg mL<sup>−1</sup>, respectively (Vo et al., 2017). The results highlight the anti-inflammatory and anti-oxidative potential of G-COS.
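The size of the G-COS effects reported by Vo et al. (2017) is easier to compare when expressed as relative reductions. The short sketch below does this arithmetic on the mean values quoted above, reading the PGE2 figures as a drop from 64 to 30 pg per mL. It ignores the reported standard deviations, and the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(control: float, treated: float) -> float:
    """Relative decrease of `treated` with respect to `control`, in percent."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return 100.0 * (control - treated) / control


# Mean values in pg/mL as quoted in the text: (LPS-treated control, + G-COS).
READOUTS = {
    "PGE2": (64.0, 30.0),
    "IL-8": (3124.0, 1246.0),
    "TNF-alpha": (45.0, 11.0),
}

if __name__ == "__main__":
    for name, (control, treated) in READOUTS.items():
        print(f"{name}: {percent_reduction(control, treated):.1f}% lower with G-COS")
```

On these mean values, the reductions are roughly 53% for PGE2, 60% for IL-8, and 76% for TNF-α.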

Sulfated COS (S-COS) were synthesized with different degrees of substitution (DS) to enhance their water solubility as well as their biological activities. S-COS has antioxidant activity, as it protected MIN6 cells from H<sub>2</sub>O<sub>2</sub>-induced dysfunction. S-COS-I (DS 0.8) and S-COS-II (DS 1.9) enhanced cell viability in a dose-dependent manner, reaching 91.7 ± 7% and 95.6 ± 5.3%, respectively, at the highest concentration (500 µg mL<sup>−1</sup>). Observations suggested that S-COS-I and S-COS-II successfully suppressed the expression of Bax mRNA and caspase-3 mRNA in H<sub>2</sub>O<sub>2</sub>-induced MIN6 cells to 1.04 ± 0.08 and 0.79 ± 0.02, respectively. NF-κB/p65 activation and upregulation of Bcl-2 mRNA expression have also been reported (Lu et al., 2013). Acidic fibroblast growth factor (aFGF) plays a vital role in the growth and survival of neurons and is a possible treatment for peripheral nerve injury. The heparin-like properties of S-COS enabled them to improve the biological activities of aFGF. Hypoxia/reoxygenation injury was induced in RSC96 cells with Na<sub>2</sub>S<sub>2</sub>O<sub>4</sub>, and the protective effects of S-COS with or without aFGF against this injury were investigated. Cell viability and cytotoxicity were assessed by MTT assay and lactate dehydrogenase (LDH) release into the culture medium, respectively. The results showed that S-COS improved the protective effects of aFGF on nerve repair and restoration of function in rats with sciatic nerve injury (Liu et al., 2019).

Phenolic acids belong to the broadly distributed plant non-flavonoid phenolic compounds and are anti-oxidative. COS conjugated with eight different phenolic acids (namely, caffeic acid, hydroxybenzoic acid, p-coumaric acid, vanillic acid, ferulic acid, syringic acid, sinapinic acid, and protocatechuic acid) were synthesized by Eom et al. (2012) via an amide coupling reaction. The ability of phenolic acids to donate an H-atom enhanced the antioxidative nature of the conjugated COS. The synthesized conjugates were characterized by UV, FTIR, and <sup>1</sup>H NMR. Caffeic acid-conjugated COS and protocatechuic acid-conjugated COS showed higher reducing power and radical scavenging (NO and DPPH) activity than COS and the other derivatives (Eom et al., 2013). Therefore, caffeic acid-conjugated COS can be utilized as an antioxidant compound. Conjugates synthesized using hydroxycinnamic acids and hydroxybenzoic acids were also evaluated for their inhibitory activities against the β-site amyloid precursor protein (APP)-cleaving enzyme (BACE). BACE plays a critical role in generating the Aβ amyloid peptide in Alzheimer's disease (AD). The results showed that the caffeic acid-conjugated COS derivative significantly inhibited BACE and could thereby reduce the risk of AD (Eom et al., 2013). Therefore, engineering of COS and GlcNAc derivatives can significantly improve their biological activities. Hence, further research and development in derivatization can expand the biomedical applications of COS and GlcNAc.

#### CONCLUSIONS

COS and GlcNAc possess remarkable biological activities, and advances are greatly needed for their use in biomedicine. However, despite being thoroughly studied, the production of COS and GlcNAc through chemical and enzymatic approaches has not attained the level required for large-scale production of COS that are well defined in terms of DP, MW, PD, and PA. Chemical methods generate toxic waste along with modified products, while enzymatic methods produce well-defined COS but in very low yield. A synergy of chemical and enzymatic processes seems to be effective in improving COS and GlcNAc production. Moreover, the use of ILs for substrate pretreatment, followed by enzymatic hydrolysis, showed high potential for the development of an efficient and environmentally friendly method. Thus, investigation is needed to develop controlled chemoenzymatic processes for the generation of high COS and GlcNAc titres. Additionally, the engineering and derivatization of COS and GlcNAc by strategies such as TG and genetic recombination approaches can further enhance their biological activities. In light of the research done so far, it can be concluded that despite numerous studies on the development of efficient COS and GlcNAc production processes, there is still a need to advance chemoenzymatic approaches to raise COS and GlcNAc production levels. Moreover, the engineering and derivatization of COS can further improve their biological functions and therefore also need detailed molecular investigation.

### AUTHOR CONTRIBUTIONS

MK conceived and wrote the review article. MR performed a literature survey and helped in preparing the manuscript. TS performed sketch work. VV proofread and gave valuable advice on the content of the review. NP supervised, conceptualized, wrote, and edited the review article. All authors contributed to the article and approved the submitted version.

### FUNDING

This work was financially supported by the Department of Science and Technology (Grant No. DST/INSPIRE/04/2014/002644), Science and Engineering Research Board (Grant No. ECR/2018/002633), and Department of Biotechnology (Grant No. BT/Bio-CARe/03/9840/2013-2014), Government of India.

#### ACKNOWLEDGMENTS

The authors gratefully thank the Science and Engineering Research Board (ECR/2018/002633) and Department of Science and Technology (No. DST/INSPIRE/04/2014/002644), Ministry of Science and Technology, Government of India for their financial support. The authors also acknowledge DST-FIST grant (SR/PSI/LSI-6/6/2016(C)) for supporting the research work. The authors would like to highly acknowledge the Central University of Rajasthan, Rajasthan, India, for providing the necessary facilities to accomplish this work.



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Kumar, Rajput, Soni, Vivekanand and Pareek. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Essential Functions and Detection of Bisecting GlcNAc in Cell Biology

Qiushi Chen<sup>1†</sup>, Zengqi Tan<sup>2†</sup>, Feng Guan<sup>2</sup>\* and Yan Ren<sup>1,3</sup>\*

*<sup>1</sup> Clinical Laboratory of BGI Health, BGI-Shenzhen, Shenzhen, China, <sup>2</sup> Joint International Research Laboratory of Glycobiology and Medical Chemistry, College of Life Sciences, Northwest University, Xi'an, China, <sup>3</sup> Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China*

The N-glycans of mammalian glycoproteins vary greatly in structure, and the biological importance of these variations is mostly unknown. It is widely acknowledged that the bisecting N-acetylglucosamine (GlcNAc) structure, a β1,4-linked GlcNAc attached to the core β-mannose residue, represents a special type of N-glycosylated modification, and it has been reported to be involved in various biological processes, such as cell adhesion, fertilization and fetal development, neuritogenesis, and tumor development. In particular, the occurrence of N-glycans with a bisecting GlcNAc modification on proteins has been proven, with many implications for immune biology. Due to the essential functions of bisecting GlcNAc structures, analytical approaches to this modification are highly required. The traditional approach that has been used for bisecting GlcNAc determinations is based on the lectin recognition of *Phaseolus vulgaris* erythroagglutinin (PHA-E); however, poor binding specificity hinders the application of this method. With the development of mass spectrometry (MS) with high resolution and improved sensitivity and accuracy, MS-based glycomic analysis has provided precise characterization and quantification for glycosylation modification. In this review, we first provide an overview of the bisecting GlcNAc structure and its biological importance in neurological systems, immune tolerance, immunoglobulin G (IgG), and tumor metastasis and development and then summarize approaches to its determination by MS for performing precise functional studies. This review is valuable for those readers who are interested in the importance of bisecting GlcNAc in cell biology.

#### Keywords: bisecting GlcNAc, mass spectrometry, glycosylation, N-glycan, GlcNAc-T III

#### INTRODUCTION

The monosaccharide-amino acid linkage of N-acetylglucosamine (GlcNAc) β1-asparagine (Asn) was originally discovered in biochemical analyses of abundant glycoproteins present in serum, e.g., immunoglobulins (Imperiali and Hendrickson, 1995; Cobb, 2020). Since then, glycans that are covalently attached to proteins at Asn residues by an N-glycosidic bond have been termed N-glycans. This attachment usually occurs in the conserved sequence Asn-X-Ser/Thr, in which X can be any amino acid except proline (Pro) (Varki, 2009; Taylor and Drickamer, 2011; Chung et al., 2017).
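The Asn-X-Ser/Thr rule can be expressed as a simple pattern search over a one-letter protein sequence. The sketch below is only an illustration of the rule as stated above: the function name `find_sequons` and the example sequence are hypothetical, and a matching sequon marks a potential site, not evidence that the site is actually glycosylated.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets overlapping
# sequons (for example "NNSS") be reported individually.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr (X != Pro)."""
    sequence = protein.upper()
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


if __name__ == "__main__":
    example = "MNGTANPSQNNSS"  # made-up sequence, for illustration only
    for position in find_sequons(example):
        print(f"potential N-glycosylation site at Asn{position}")
```

In the made-up sequence, the motif N-P-S is skipped because X is proline, while the two overlapping motifs near the C-terminus are both reported.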

#### Edited by:

*Zhongping Tan, Chinese Academy of Medical Sciences and Peking Union Medical College, China*

#### Reviewed by:

*Wenjie Peng, Shanghai Jiao Tong University, China; Hongzhi Cao, Shandong University, China*

#### \*Correspondence:

*Feng Guan, guanfeng@nwu.edu.cn; Yan Ren, reny@genomics.cn*

*†These authors have contributed equally to this work*

#### Specialty section:

*This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry*

Received: *12 March 2020* Accepted: *18 May 2020* Published: *03 July 2020*

#### Citation:

*Chen Q, Tan Z, Guan F and Ren Y (2020) The Essential Functions and Detection of Bisecting GlcNAc in Cell Biology. Front. Chem. 8:511. doi: 10.3389/fchem.2020.00511*

A distinctive structural feature of N-glycans is the presence of several GlcNAc antennae (branches) that are sequentially synthesized by a series of Golgi-resident glycosyltransferases, N-acetylglucosaminyltransferases (GlcNAc-Ts) (**Figure 1**) (Schachter, 1991; Kizuka and Taniguchi, 2018). N-glycans can be divided into three categories: high-mannose, hybrid, and complex. Hybrid and complex N-glycans may carry a bisecting GlcNAc group, which forms a new subtype of glycan termed bisecting GlcNAc (Harpaz and Schachter, 1980; Varki, 2009; Nakano et al., 2019). The discovery of this structure lagged behind the detection of other glycan structures due to the limitations of the detection approaches and the peculiarity of its structure. This type of glycan was reported in the 1970s and was detected in ovalbumin by a combination of sequential exoglycosidase digestion, methylation derivatization, acetolysis, and Smith degradation (Yamashita et al., 1978; Nagae et al., 2020). GlcNAc transferred to the 4-position of the β-linked core mannose (Man) residue in complex or hybrid N-glycans by the β1,4-mannosyl-glycoprotein 4-β-N-acetylglucosaminyltransferase (GlcNAc-T III) is considered a bisecting structure and is usually not regarded as an antenna because it cannot be further extended by the proper enzymes (Narasimhan, 1982; Schachter, 1991; Varki, 2009; Miwa et al., 2012; Chen et al., 2016). GlcNAc-T III is encoded by the gene mgat3, which was initially discovered from hen oviducts in 1982 (Narasimhan, 1982; Miwa et al., 2012). It has been reported that its distribution in human tissues is mainly in the brain, liver, placenta, bone marrow, and kidney (Nishikawa et al., 1992; Yoshimura et al., 1995b; Taniguchi et al., 1999; Takamatsu et al., 2004; Schedin-Weiss et al., 2019). So far, there are no reports on any tissue specificity that is related to the functions of this subtype of glycan. The addition of this GlcNAc requires the prior action of GlcNAc-T I (Schachter, 1991; Nakano et al., 2019). 
The existence of a bisecting GlcNAc prevents α-mannosidase II from trimming and has also been shown to inhibit the activities of GlcNAc-T II, GlcNAc-T IV, and GlcNAc-T V in vitro (Schachter, 1991, 2014; Varki, 2009; Nakano et al., 2019). The addition of bisecting GlcNAc confers unique lectin recognition properties to this new subtype of glycan (Miwa et al., 2012; Nagae et al., 2013; Link-Lenczowski et al., 2018). B16 mouse melanoma cells transfected with mgat3, which encodes GlcNAc-T III, show weaker binding to phytohemagglutinin-L (PHA-L) but stronger binding to Phaseolus vulgaris erythroagglutinin (PHA-E). The lectins PHA-L and PHA-E specifically recognize multiantennary glycans and bisecting GlcNAc structures, respectively (Yoshimura et al., 1995c; Varki, 2009; Liu et al., 2016; Wu et al., 2019). This suggests that increased expression of GlcNAc-T III may result in a decrease in multiple branched N-glycan structures. The balance among different types of glycans may play an important role in controlling cell functions.

**FIGURE 1** | … the enzyme GlcNAc-T I. GlcNAc-T II creates a biantennary glycan, and GlcNAc-T III yields a bisecting GlcNAc. More branches can be produced via the action of GlcNAc-T IV, V, and VI. GlcNAc, Man.
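The enzyme-ordering rules described in this part of the Introduction can be captured in a deliberately simplified toy model. It encodes only two statements from the text: GlcNAc-T III needs the prior action of GlcNAc-T I, and a bisecting GlcNAc blocks α-mannosidase II, GlcNAc-T II, GlcNAc-T IV, and GlcNAc-T V. Treating the in vitro inhibition as absolute is an assumption of the sketch, and the names `can_act` and `process` are hypothetical; other real prerequisites of Golgi processing are not modeled.

```python
from typing import Iterable

GNT1 = "GlcNAc-T I"
GNT3 = "GlcNAc-T III"

# Enzymes reported in the text as blocked or inhibited by a bisecting GlcNAc.
BLOCKED_BY_BISECTING = frozenset(
    {"alpha-mannosidase II", "GlcNAc-T II", "GlcNAc-T IV", "GlcNAc-T V"}
)


def can_act(enzyme: str, already_acted: set[str]) -> bool:
    """Apply the two rules from the text; everything else is allowed."""
    if enzyme == GNT3:
        # The bisecting GlcNAc requires the prior action of GlcNAc-T I.
        return GNT1 in already_acted
    if GNT3 in already_acted and enzyme in BLOCKED_BY_BISECTING:
        # Simplification: inhibition is treated as a complete block.
        return False
    return True


def process(order: Iterable[str]) -> list[str]:
    """Offer enzymes in the given order; return those that could act."""
    acted: list[str] = []
    for enzyme in order:
        if can_act(enzyme, set(acted)):
            acted.append(enzyme)
    return acted


if __name__ == "__main__":
    print(process([GNT1, GNT3, "GlcNAc-T V"]))
    print(process([GNT1, "GlcNAc-T V", GNT3]))
```

In this toy model, the order in which GlcNAc-T III and GlcNAc-T V encounter the glycan decides whether the extra branch is added, which mirrors the competition between bisection and branching described above.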

The N-glycans of mammalian glycoproteins vary greatly in structure, but the biological importance of these variations is mostly unknown (Bhattacharyya et al., 2002; Reily et al., 2019). It is widely acknowledged that bisecting GlcNAc represents a special type of N-glycosylated modification that is involved in various biological processes, such as cell adhesion, fertilization and fetal development, neuritogenesis, and tumor metastasis and development (Bhattacharyya et al., 2002; Kariya et al., 2008; Akasaka-Manya et al., 2010; Gu et al., 2012; Allam et al., 2015; Zhang et al., 2015; Kizuka and Taniguchi, 2018). Clearly altered levels of bisecting GlcNAc on integrin β1 have been reported to be responsible for early spontaneous miscarriages in humans (Zhang et al., 2015). Tan et al. found that bisecting GlcNAc is able to inhibit hypoxia-induced epithelial-mesenchymal transition in breast cancer cells (Li et al., 2016; Tan et al., 2018). Glycoproteins bearing complex N-glycans with bisecting GlcNAc, fucose (Fuc), and N,N-diacetyllactosamine (LacdiNAc) structures were detected in extracellular vesicles (EVs) from ovarian carcinoma cells; however, blocking N-glycan processing from high-mannose to complex glycans with kifunensine altered the components of the EVs and triggered a decrease in several glycoproteins (Gomes et al., 2015). It has also been reported that the occurrence of a bisecting GlcNAc on glycoproteins has many implications in immune biology (el Ouagari et al., 1995; Yoshimura et al., 1996; Takegawa et al., 2005; Pang et al., 2007; Clark, 2014; Chen et al., 2016; Shade et al., 2019). For instance, K562 cells are easily killed by natural killer (NK) cells; however, after being transfected with mgat3, K562 cells acquire NK-cell resistance (Yoshimura et al., 1996; Patankar et al., 1997; Miwa et al., 2012). Therefore, it is very important to take a step forward and review this type of N-glycan. 
Although lectin recognition by PHA-E is the method usually used for bisecting GlcNAc determination in many studies, it has drawbacks. The first disadvantage is low specificity and sensitivity (Dang et al., 2019), which is quite common among lectin-glycan recognition methods. For instance, Sambucus nigra (elderberry) agglutinin (SNA) IV prefers to bind α2,6-linked sialic acid but also shows some binding to α2,3-linked sialic acid (Chen, 2015; Shang et al., 2015; Lis-Kuberka et al., 2019). The second drawback is that lectin recognition cannot indicate the relative amount of the bisecting GlcNAc structure. Last but not least, this method cannot reveal the glycosylation site or the glycan structure. Instead, approaches based on mass spectrometry (MS) have been shown in recent studies to be suitable tools for rapidly and precisely investigating this type of glycan.

MS is a technique that measures the mass-to-charge ratios of ions and has been used for small-molecule analysis since World War I (Calvete, 2014). It has played an important role in glycan and glycan-related studies since the 1960s. In 1968, electron ionization was used in the structural elucidation of di- and tri-saccharides (Kochetkov et al., 1968; Chen, 2015). At that time, it was difficult to detect more complex oligosaccharides because only molecules with sufficiently high volatility could be analyzed, and volatility decreases as the number of sugar units in a glycan increases. Additionally, the mass range of MS detection restricted the study of complex oligosaccharides with higher molecular weights. In the late 1970s, MS was used for the first time in the study of blood glycoproteins from Antarctic fish, in which a proline-containing glycopeptide carrying a disaccharide was sequenced; the structure of this disaccharide was identified as galactosyl-N-acetylgalactosamine (Morris et al., 1978; Dell and Morris, 2001; Bielik and Zaia, 2010). Several years later, Dell and Morris performed the first structural analysis of glycans using a fast-atom-bombardment mass spectrometer (FAB MS) (Dell et al., 1983; Dell and Morris, 2001). With its rapid development, MS has now improved significantly in analytical scope, speed, and depth. For glycopeptide or glycan structure analysis, Orbitrap mass spectrometers, with their higher resolution and accuracy, are a good choice in MS<sup>n</sup> (n > 2) mode, with or without coupling to liquid chromatography (LC). Matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF/TOF) MS with higher collision energies is also a valuable choice for MS1/MS2 analysis of glycans (Chen, 2015; Chen et al., 2016). Electrospray ionization (ESI)-TOF MS has also been reported to be efficient for glycosylation analysis of intact IgG molecules (Wei et al., 2019).
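As a worked illustration of what a mass-to-charge measurement provides for a glycan, the sketch below computes the neutral monoisotopic mass of an underivatized glycan from its monosaccharide composition and the m/z of a protonated or sodiated ion. The residue and adduct masses are standard monoisotopic values rather than data from this review, the composition Hex3HexNAc5 is chosen only as an example, and the function names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values; an assumption of this
# sketch, valid for native, underivatized glycans only).
RESIDUE_MASS = {
    "Hex": 162.0528,     # e.g. mannose, galactose
    "HexNAc": 203.0794,  # e.g. GlcNAc
    "dHex": 146.0579,    # e.g. fucose
    "NeuAc": 291.0954,   # N-acetylneuraminic acid
}
WATER = 18.0106
ADDUCT_MASS = {"H": 1.00728, "Na": 22.98922}  # masses of H+ and Na+


def glycan_mass(composition: dict[str, int]) -> float:
    """Neutral monoisotopic mass of a free glycan with the given composition."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items()) + WATER


def mz(neutral_mass: float, charge: int = 1, adduct: str = "H") -> float:
    """m/z of an ion carrying `charge` copies of the chosen cationic adduct."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * ADDUCT_MASS[adduct]) / charge


if __name__ == "__main__":
    # Man3GlcNAc2 core plus three further GlcNAc residues.
    composition = {"Hex": 3, "HexNAc": 5}
    mass = glycan_mass(composition)
    print(f"neutral mass: {mass:.4f} Da")
    print(f"[M+Na]+  m/z: {mz(mass, 1, 'Na'):.4f}")
    print(f"[M+2H]2+ m/z: {mz(mass, 2, 'H'):.4f}")
```

Note that a composition such as Hex3HexNAc5 fits both a bisected biantennary glycan and a non-bisected triantennary glycan, so the m/z value alone cannot distinguish them; this is one reason why fragmentation-based (MS2 or MS<sup>n</sup>) analysis is needed for bisecting GlcNAc.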

### THE FUNCTIONS OF BISECTING GlcNAc MODIFICATION

#### In Neurological Systems

Akasaka-Manya et al. discovered that the mRNA levels of mgat3 were elevated in the temporal cortex of the brain in patients with Alzheimer's disease (AD) (Akasaka-Manya et al., 2010), which accelerated studies on the tissue distribution of GlcNAc-T III expression, with the conclusion that it was most highly expressed in the nervous system (Shimizu et al., 1993; Kizuka et al., 2016b; Kizuka and Taniguchi, 2018).

In 1993, Shimizu et al. reported that the main glycan structures detected in the mouse cerebrum, cerebellum, and brain stem are bisected, as proposed in **Figure 2** (Shimizu et al., 1993; Nagae et al., 2016). Later, Shigeta et al. found that GlcNAc-T III promoted β1 integrin-mediated neuritogenesis triggered by serum deprivation in Neuro2a cells and that the neuritogenesis induced by GlcNAc-T III was functionally blocked by anti-β1 integrin monoclonal antibody (DF5) (Shigeta et al., 2006). In addition, β1 integrin was found to be regulated as a target protein by GlcNAc-T III; this is supported by a study showing that the amount of β1 integrin in erythroagglutinating-phytohemagglutinin (E4-PHA)-associated complexes increased significantly in GlcNAc-T III transfectants compared with that in mock transfectants (Shigeta et al., 2006).

AD is a progressive, neurodegenerative disease in which there are deficits in memory and cognitive functions; more importantly, it is a global health problem (Abbott, 2011; Scheltens et al., 2016; Kizuka and Taniguchi, 2018; Schedin-Weiss et al., 2019). However, current treatments are still only symptomatic (Winblad et al., 2016; Schedin-Weiss et al., 2019). Many solid studies support connections between AD and aberrant protein glycosylation, considering that glycoproteins including tau, Aβ-precursor protein (APP), and β-site APP-cleaving enzyme-1 (BACE-1) are involved in AD pathogenesis and have been found to show altered glycosylation patterns (Halim et al., 2011; Schedin-Weiss et al., 2014, 2019; Kizuka et al., 2015). More importantly, APP and BACE-1 contain glycosylation modifications with bisecting GlcNAc structures (Akasaka-Manya et al., 2008, 2010; Schedin-Weiss et al., 2014; Kizuka et al., 2015, 2016a). Bisecting GlcNAc modifications have shown the capacity to stabilize BACE-1 protein under conditions of oxidative stress (Kizuka et al., 2016a), and the increased content of bisecting GlcNAc in AD brains might function as an adaptive response that protects the brain from the damage caused by additional β-amyloid production (Akasaka-Manya et al., 2010). One study showed that the lack of bisecting GlcNAc on BACE-1 directed the transport of this protein to the lysosome and accelerated its degradation, which resulted in less accumulation of β-amyloid in AD (Kizuka et al., 2015; Kizuka and Taniguchi, 2018). These findings have highlighted the importance of bisecting GlcNAc modification in the nervous system. However, the underlying mechanism is still not clear.

AD is a chronic disease and begins to develop decades before the first symptoms appear, which suggests the importance of investigating early changes (e.g., glycan alteration) for improving early diagnosis. We have recently published our findings on glycosylation changes in AD research. An increase in bisecting N-GlcNAc modifications was observed in cerebrospinal fluid (CSF) from AD patients. The further investigation of CSF from 242 patients with subjective cognitive impairment (SCI), mild cognitive impairment (MCI), or AD revealed more glycoproteins binding to PHA-E in MCI and AD than in SCI (Schedin-Weiss et al., 2019). Therefore, these findings could be essential for developing early AD diagnosis biomarkers and understanding the early stages of AD development, which might be additionally beneficial for designing novel AD treatment strategies. The challenges in the future will be to perform comprehensive and detailed glycoproteomic and glycomic analysis of those glycoproteins with bisecting GlcNAc modification.

#### In Immune Tolerance

In 1996, Clark et al. proposed the human fetoembryonic defense system hypothesis (hu-FEDS) (Clark et al., 1996; Pang et al., 2016). The basic concept of this hypothesis is that glycoproteins expressed in the reproductive system and gametes can either inhibit immune responses or prevent rejection. Indeed, glycoproteins in human seminal plasma and the pregnant uterus have been shown to suppress immune responses in vitro, specifically for those glycoproteins containing bisecting GlcNAc structures (Bolton et al., 1987; Kelly and Critchley, 1997; Clark, 2014; Szczykutowicz et al., 2019).

Bisecting GlcNAc structures have been reported to possess immune suppression functions. For instance, K562 cells are easily killed by natural killer (NK) cells; however, after being transfected with the gene that encodes GlcNAc-T III, K562 cells possessing more bisecting GlcNAc attain NK cell resistance (Yoshimura et al., 1996; Patankar et al., 1997). NK cells are the major type of immune cells found in the human uterus, which indicates that they potentially target sperm (King et al., 1991; Clark, 2014). Human sperm were found to express bisecting GlcNAc structures, which could explain why sperm are not killed by the maternal immune system when entering the female as a foreign substance, and this supports hu-FEDS (Pang et al., 2007; Clark, 2014; Szczykutowicz et al., 2019). Additionally, abundant bisecting GlcNAc glycans were detected in human syncytiotrophoblasts (STB) and cytotrophoblasts (CTB) (Chen et al., 2016). It is most likely that the maternal immune system is suppressed by the presence of bisecting GlcNAc glycans and that the fetus benefits from this suppression; the mother can nourish a fetus (similar to a foreign organ, as the father contributes half of its genome) within her body for several months without rejection. A possible mechanism underlying this suppression is that the glycoconjugates interact with lectins linked to particular signal transduction pathways that modulate immune cell functions. For instance, α-2,3-linked sialic acid on soluble CD52, a glycoprotein of 12 amino acids anchored to glycosylphosphatidylinositol, could mediate T-cell suppression by binding to siglec-10 (Clark, 2014; Shathili et al., 2019). It is possible that bisecting GlcNAc functions in a similar way to suppress NK cells.

#### On Immunoglobulin G (IgG)

IgG is an important molecule in the immune system. IgG regulates its immune functions through complement and cellular IgG-Fc gamma receptors (FcγR) (Dekkers et al., 2016). It contains a highly conserved N-linked glycan at position Asn297 in the Fc region (Arnold et al., 2007; Dekkers et al., 2016; Kiyoshi et al., 2017). This glycan is composed of variable levels of fucose, galactose, sialic acid, and bisecting GlcNAc (Le et al., 2016; Gudelj et al., 2018; Lu and Holland, 2019; Shade et al., 2019). It is widely acknowledged that the Fc glycan influences the biological activities of IgG. For example, a lack of fucose in the Fc glycan significantly improves binding to the human FcγR III, and this finding is applied to improve the efficacy of therapeutic monoclonal antibodies. Attachment of bisecting GlcNAc to the Fc glycan has been described to induce antibody-dependent cell-mediated cytotoxicity (ADCC) (Shields et al., 2002; Hodoniczky et al., 2005; Dekkers et al., 2016).

Studies focused on characterizing the IgG- and IgA-linked glycans have shown that glycans are differentially expressed in the setting of autoimmunity. For instance, patients <50 years old with Lambert-Eaton myasthenic syndrome (an autoimmune disease in which the immune system attacks the body's own tissues) show increased levels of bisecting GlcNAc on IgG1 and IgG2 (Selman et al., 2011; Maverakis et al., 2015). This suggests that particular glycan types may be potential biomarkers for certain diseases.

#### In Tumor Metastasis and Development

It is essential to understand the factors that affect tumor progression so as to determine how to control tumor growth and metastasis. It has been reported that more multiply branched N-glycan modifications occur in tumor cells due to the higher activity of GlcNAc-T V, which promotes tumor cell metastasis (Dennis et al., 1987; Gu et al., 2009; Kizuka and Taniguchi, 2016). A possible explanation for this is that β1,6-GlcNAc-branched N-glycans can be preferentially processed by β1,4 Gal-T and β1,3 GlcNAc-T to form poly-N-acetyllactosamine (poly-LacNAc) for elongation of N-glycans, which could be further modified into the motifs involved in cancer metastasis, such as sialyl Lewis X (Yamadera et al., 2018). As mentioned above, the increased expression of GlcNAc-T III prevented the formation of multiply branched glycans, as GlcNAc-T V could not extend the glycans beyond the bisecting GlcNAc structure, and thus inhibited tumor cell metastasis (Dennis et al., 1987; Gu et al., 2009; Taniguchi and Korekane, 2011). In 2018, it was reported that bisecting GlcNAc structures could inhibit hypoxia-induced epithelial-mesenchymal transition in breast cancer (Tan et al., 2018). However, the underlying mechanism is still not clear. It has been speculated that the addition of bisecting GlcNAc to key glycoproteins involved in signal transduction, e.g., growth factors, integrins, and cytokine receptors, gives them a particular signaling strength under hypoxia. However, observations of nonsolid tumors contradict this explanation: GlcNAc-T III was more active in patients with chronic myelogenous leukemia in blast crisis (CML-BC) and in patients with multiple myeloma (MM) (Yoshimura et al., 1995a).

Alterations in glycosylation are usually considered a hallmark of cancer, and the protein whose glycosylation has been studied most extensively is E-cadherin (de-Freitas-Junior et al., 2013). In 2019, researchers found that E-cadherin was required for metastasis in multiple breast cancer models (Padmanaban et al., 2019). E-cadherin is also known to carry bisecting GlcNAc modifications (Kitada et al., 2001; de-Freitas-Junior et al., 2013). The addition of bisecting GlcNAc to E-cadherin was found to negatively regulate the tyrosine phosphorylation of β-catenin (Kitada et al., 2001; Takahashi et al., 2009), and GlcNAc-T III knockdown cells displayed a membrane delocalization of E-cadherin, resulting in its cytoplasmic accumulation (Pinho et al., 2009). As a result, the deactivated β-catenin failed to enhance cell growth or oncogenesis as it formed a tight complex with E-cadherin and could not be translocated into the nuclei (Gu et al., 2009). These results suggest that bisecting GlcNAc plays important roles in tumor metastasis and development. Therefore, it is reasonable that certain aberrant glycosylation (e.g., bisecting) patterns could be used as biomarkers for the progression of particular diseases, including cancer metastasis and development (Dennis et al., 1999; Tan et al., 2018).

#### Others

Bisecting GlcNAc has been reported to have multiple functions in other cell biology processes. The bisecting GlcNAc structure in the N-glycans of adenylyl cyclase III was shown to enhance enzyme activity (Li et al., 2007). The bisecting GlcNAc structure has also been found to inhibit stroma-dependent hemopoiesis in transgenic mice expressing GlcNAc-T III (Yoshimura et al., 1998).

### THE DETECTION OF BISECTING GlcNAc STRUCTURES

The approaches reviewed here have been published and shown to be efficient tools for studying bisecting GlcNAc modifications. They are grouped by detection target, namely, the glycan level and the glycopeptide level. Before glycan or glycopeptide analysis, cell or tissue samples need to be processed as previously described (North et al., 2010; Chen, 2015; Chen et al., 2016); the procedure for sample preparation will not be addressed here.

### Approaches for Detecting Glycan Levels

#### β1,4-Galactosyltransferase Reaction

β1,4-galactosyltransferase is an enzyme that transfers a galactose (Gal) from UDP-Gal to GlcNAc and forms the disaccharide unit of Galβ1,4GlcNAc in the antenna of complex and hybrid glycans (Schwientek et al., 1996; Chen et al., 2016). However, if a GlcNAc is at the bisected position, it will not be processed by this enzyme, and there are no changes for glycans containing a bisecting GlcNAc (Pang et al., 2007; Qasba et al., 2008; Chen, 2015).

In our previous study, we adopted this method to prove the presence of a bisecting GlcNAc structure in glycans through the β1,4-galactosyltransferase reaction, as shown in **Figure 3** (Chen et al., 2016). Using this strategy, the glycan sample was treated with the enzyme at 37°C for 24 h to ensure a complete reaction (Chen, 2015). The glycans at m/z 2,489, 2,850, and 3,212 were chosen as the targets of observation because these glycans contain a potential substrate (an unmodified GlcNAc) for β1,4-galactosyltransferase. If the structures included a reactive GlcNAc group, a Gal residue would be incorporated into the structure, shifting the glycans at m/z 2,489, 2,850, and 3,212 to m/z 2,693, 3,055, and 3,416, respectively. **Figure 3** displays two typical MALDI-TOF MS spectra acquired with or without β1,4-galactosyltransferase treatment. This figure clearly shows that after the enzyme treatment, there are no obvious changes in the three glycan comparison groups at m/z 2,489 and 2,693, 2,850 and 3,055, and 3,212 and 3,416. This result demonstrates that the GlcNAc in these three glycan structures cannot be extended by β1,4-galactosyltransferase and that bisecting GlcNAc is present in these glycans.
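The expected mass shift can be sketched numerically. The following is a minimal illustration, not the authors' analysis procedure: it assumes the glycans are permethylated and observed as singly charged ions, so that each added Gal contributes the monoisotopic increment of one permethylated hexose residue (204.0998 Da). The function names and the example peak list are hypothetical.

```python
# Monoisotopic mass added by one permethylated hexose residue (Da).
# Assumption: glycans are permethylated and observed as singly charged ions.
PERMETHYLATED_HEX = 204.0998


def expected_mz_after_gal(mz, n_gal=1, charge=1):
    """m/z expected if n_gal galactose residues were added to the ion at mz."""
    return mz + n_gal * PERMETHYLATED_HEX / charge


def shifted_peak_present(peaks, mz, tol=0.5, n_gal=1, charge=1):
    """True if the peak list contains the galactosylated counterpart of mz."""
    target = expected_mz_after_gal(mz, n_gal, charge)
    return any(abs(p - target) <= tol for p in peaks)


if __name__ == "__main__":
    # Hypothetical peak list after enzyme treatment (illustrative values only).
    treated = [2489.25, 2850.4, 3212.6]
    for mz in treated:
        print(mz, "->", round(expected_mz_after_gal(mz), 2),
              "shifted peak present:", shifted_peak_present(treated, mz))
```

If no shifted peak appears for a glycan that carries an otherwise unmodified GlcNAc, the result is consistent with that GlcNAc being in the bisecting position. The tolerance of 0.5 is an arbitrary choice for the sketch and would need to match the instrument's mass accuracy.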

This approach is simple to perform, and the results can be interpreted so directly that no software is needed for further data analysis. The method relies entirely on detecting a change in the glycan signals. However, some exceptions have been observed in vitro. One research article reported successful galactosylation beyond the bisecting GlcNAc in the structure GlcNAcMan3GlcNAc2 in vitro (Zou et al., 2011). In addition, we found that a chemoenzymatically synthesized glycan structure (Galb1-4GlcNAcb1-2Mana1-6(Galb1-4GlcNAcb1-4)(Galb1-4GlcNAcb1-2Mana1-3)Manb1-4GlcNAcb1-4(Fuca1-6)GlcNAc) containing galactosylated bisecting GlcNAc is listed on the Consortium for Functional Glycomics (CFG) array (CFG, 2012).

Therefore, it would be better to combine the approach of galactosyltransferase reaction with other methods listed in this review to confirm the presence of a bisecting GlcNAc structure in glycans. In 2016, we adopted this method together with gas chromatography-mass spectrometry (GC-MS) to study the bisecting GlcNAc modification (Chen et al., 2016), which will be described in the following section.

#### Gas Chromatography-Mass Spectrometry (GC-MS)

GC-MS methods for detecting bisecting GlcNAc have been described previously (Ciucanu, 2006; North et al., 2010). Because GC-MS requires volatile analytes, the glycan samples must be derivatized into partially methylated alditol acetates (PMAA) before MS analysis, as reported previously (North et al., 2010; Chen, 2015): the permethylated glycan sample was treated with a NaBD<sub>4</sub> solution and then dried under a stream of nitrogen, followed by acetylation with acetic anhydride. As shown in **Figure 1**, the bisecting GlcNAc is directly attached to the C4 position of the β-linked Man, which therefore becomes a 3,4,6-linked Man; this residue is a unique signal for the identification of bisecting GlcNAc by GC-MS.

**Figure 4** shows the structure of the PMAA derivative of a 3,4,6-linked D-mannopyranosyl residue. Fragmentation of this molecule yields two characteristic ions at m/z 118 and 333.

We have adopted this permethylation method and combined GC-MS detection to prove the presence of bisecting GlcNAc in human CTB and STB (Chen et al., 2016). As shown in **Table 1**, the characteristic fragment ions of m/z 118 and 333 were simultaneously detected for the group of 3,4,6-linked Man, which supported the existence of bisecting GlcNAc.
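The requirement that both characteristic ions be detected together can be expressed as a simple check. This is a hedged sketch only: the two m/z values are those quoted above, while the function name, the tolerance, and the example fragment list are hypothetical and not taken from the cited studies.

```python
# Diagnostic fragment ions quoted in the text for the PMAA derivative
# of 3,4,6-linked Man (nominal m/z values).
DIAGNOSTIC_IONS = (118.0, 333.0)


def has_bisecting_signature(fragment_mzs, tol=0.5):
    """True only if every diagnostic ion is matched by some observed fragment."""
    return all(
        any(abs(mz - ion) <= tol for mz in fragment_mzs)
        for ion in DIAGNOSTIC_IONS
    )


if __name__ == "__main__":
    # Hypothetical fragment list from one GC peak (illustrative values only).
    observed = [101.1, 118.1, 129.0, 333.2]
    print("3,4,6-linked Man signature:", has_bisecting_signature(observed))
```

Requiring both ions, rather than either one, mirrors the statement that m/z 118 and 333 were detected simultaneously for the 3,4,6-linked Man group.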

In this method, GC was used for the separation of analytes and it thus has higher resolution for complex small molecules; however, the glycan samples must be derivatized into PMAA for GC-MS analysis, and the reaction efficiency affects the quantification of the bisecting GlcNAc structures.

#### Multi-Stage Mass Spectrometry (MSn)

In principle, this method is quite similar to GC-MS detection because bisected glycan structures are detected by identifying the presence of the 3,4,6-linked Man (Allam et al., 2015). In this method, an Orbitrap MS was used for multiple rounds of fragmentation (Allam et al., 2015).

TABLE 1 | Summary of the GC-MS linkage analysis of partially methylated alditol acetates derived from N-glycans of cytotrophoblasts (CTB) and syncytiotrophoblasts (STB), adapted from Chen et al. (2016) with permission from Qiushi Chen.


*The elution time is indicated in minutes, and the relative abundance of 2-linked mannose (major component) is normalized to 1.*

**Figure 5** shows the logical order of the MS8 approach that was used for detecting bisecting GlcNAc in the glycan at m/z 2489.25, which is a biantennary, core-fucosylated glycan. Theoretically, the MS7 spectrum of the glycan at m/z 2489.25 should display the characteristic ion of the bisected glycan at m/z 444.18, which would support the presence of 3,4,6-linked Man. MS8 analysis would then be carried out to confirm that the ion at m/z 444.18 is truly a glycan fragment ion and not noise or a contaminant.

This method is able to target the bisecting GlcNAc structure of interest. More importantly, it does not require additional sample processing. However, it is highly dependent on the MS analyzer, as well as on operator techniques. Usually, only glycans with higher abundances can provide good signals with multiple fragmentation under MSn mode.

### Approach for Detecting Glycopeptide Levels

Due to the rapid development of MS techniques, it is now possible to analyze intact glycopeptides, i.e., peptides together with their attached glycans. In addition to detecting bisecting GlcNAc, MS can also confirm the glycosylation sites as well as the glycan components. The method introduced here for bisecting GlcNAc detection follows a paper published in Analytical Chemistry in 2019 (Dang et al., 2019), which was designed to detect bisecting GlcNAc in glycopeptides by their characteristic ion(s) in MS/MS spectra acquired under low-energy collisions. The characteristic ion(s) are [Pep+HexNAc3Hex], [Pep+FucHexNAc3Hex], or both. In that paper, 25 glycoproteins (possessing bisecting GlcNAc) were identified from rat kidney tissue, four of which (Q01129 decorin, P17046 lysosome-associated membrane glycoprotein 2, P07861 neprilysin, and B5DFC9 nidogen-2) were found to be protein analogs of those identified in our human amnion samples. More importantly, one of these glycoproteins, neprilysin, has the same bisecting GlcNAc location (site N285) as human neprilysin (P08473) (data not shown).
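The m/z values at which the two characteristic ions would be expected can be computed from the peptide mass. The sketch below is illustrative and is not the scoring procedure of Dang et al. (2019): it assumes protonated ions, a neutral monoisotopic peptide mass as input, and standard monoisotopic glycosidic residue masses. The function name and the example peptide mass are hypothetical.

```python
# Standard monoisotopic residue masses (Da); assumed inputs for this sketch.
HEXNAC = 203.07937   # N-acetylhexosamine residue
HEX = 162.05282      # hexose residue
FUC = 146.05791      # fucose (deoxyhexose) residue
PROTON = 1.007276


def characteristic_ions(peptide_mass, charge=1):
    """Expected m/z of [Pep+HexNAc3Hex] and [Pep+FucHexNAc3Hex] ions.

    peptide_mass is the neutral monoisotopic mass of the peptide backbone;
    ions are assumed to be protonated with the given charge.
    """
    base = peptide_mass + 3 * HEXNAC + HEX

    def to_mz(neutral_mass):
        return (neutral_mass + charge * PROTON) / charge

    return {
        "Pep+HexNAc3Hex": to_mz(base),
        "Pep+FucHexNAc3Hex": to_mz(base + FUC),
    }


if __name__ == "__main__":
    # Hypothetical peptide mass, for illustration only.
    for name, mz in characteristic_ions(1000.0, charge=1).items():
        print(f"{name}: {mz:.4f}")
```

A spectrum would then be searched for either or both values within the instrument's mass tolerance; which charge states to consider is left to the analyst.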

This method can simultaneously obtain precise information regarding the heterogeneity of glycosylation, including the modification sites and their linked glycan structures, which is useful for the functional study of target proteins. More importantly, this method does not require additional sample processing. However, as mentioned by Dang et al., the effectiveness of the method may be impacted by multiple parameters, such as glycopeptide structures (Dang et al., 2019). It also places greater requirements on the MS analyzer, and the profiling coverage of the glycosylation is limited because sufficient information regarding the peptides and the glycans in the MS2 spectra must be obtained for identification.

We summarize the advantages and disadvantages of each method mentioned above in **Table 2** to help researchers to make appropriate choices according to the laboratory instrumentation and conditions.

### SYNTHESIS OF BISECTING GLYCANS

With more studies focusing on the special bisecting glycans, the importance of this type of glycan in cell biology has been discovered. Indeed, glycosylation modification plays an important role in protein functions due to participation in the functional domain of protein configuration (Luber et al., 2018). It has been reported that human IgG, an important immune system molecule, possesses glycans containing bisecting GlcNAc (Le et al., 2016; Lu and Holland, 2019; Shade et al., 2019). Thus, only synthesizing the sequence of proteins, but not the glycan chains, is insufficient for protein function. Syntheses of glycans or glycoproteins containing bisecting GlcNAc structures have been reported in many papers (Wang et al., 2009; Castilho et al., 2011; Luber et al., 2018; Manabe et al., 2018; Yang et al., 2018). Synthesis of glycans or glycoproteins containing bisecting GlcNAc structures is helpful for glycomic and glycoproteomic research and will improve the development of protein-based therapeutics and the generation of glycan-engineered therapeutic antibodies (Castilho et al., 2011, 2015).

TABLE 2 | A comparison of different approaches for bisecting GlcNAc characterization based on MS detection.

In 2007, Unverzagt et al. reported the first chemical synthesis of highly branched pentaantennary N-glycans and derivatives with bisecting GlcNAc modifications (Eller et al., 2007). The chemical synthesis of a bisecting GlcNAc could also be achieved through [4+2] and [6+2] glycosylations. This synthetic method reduces the number of reaction steps but faces two difficulties, namely, low yields and poor synthesis selectivity for key glycosylations (Manabe et al., 2018). A modular synthesis of 16 cores of mammalian complex-type N-glycans with optional core fucose and bisecting GlcNAc has been established by Unverzagt et al., and core fucosylated and bisected N-glycans could be synthesized with unprecedented efficiency and purity by integrating a one-pot protocol (Luber et al., 2018).

Biosynthesis of glycoproteins with bisecting GlcNAc glycans has been performed in glycoengineered Nicotiana benthamiana, which lacks plant-specific N-glycosylation (Castilho et al., 2011) but expresses a modified version of human GlcNAc-T III. However, GlcNAc-T III is sometimes not very active when fused to the Golgi α-mannosidase II-cytoplasmic tail, transmembrane domain, and stem (GMII-CTS) region. Therefore, more studies are required to overcome these difficulties.

#### CONCLUSIONS

Researchers are now beginning to realize the importance of bisecting GlcNAc glycans. We reviewed their importance in neurological systems, immune tolerance, IgG, and tumor metastasis and development and then introduced a series of MS approaches for bisecting GlcNAc detection. Compared to the traditional lectin recognition method, MS-based methods can be quantitative, can target the glycan and glycopeptide of interest, and can provide details of the glycosylation sites and glycan components. In addition, MS approaches are more sensitive, which eases the limits on sample amounts in glycosylation studies. However, there are bottlenecks in the use of current MS technology to detect bisecting GlcNAc. The sensitivity of MS detection for glycosylation modifications is still limited, and thus specific enrichment of the glycans or glycopeptides is needed. For MSn analysis in particular, only glycans of higher abundance can be interpreted in detail and with accuracy. In addition, the construction of high-quality MSn spectral databases as well as an understanding of fragmentation mechanisms are vital for developing in silico fragmentation tools. More precise prediction of bisecting GlcNAc may be achieved by developing probabilistic generative models of CID/HCD fragmentation using machine learning techniques. This review should be valuable for researchers who are interested in the importance of bisecting GlcNAc in cell biology and who wish to conduct studies in this field, and we hope it will help advance the understanding of bisecting GlcNAc.

#### AUTHOR CONTRIBUTIONS

QC participated in the discussion and drafted the manuscript. YR participated in the discussion and made corrections to the manuscript. ZT and FG participated in the discussion and made corrections to the manuscript. All authors have checked and approved the final manuscript.

#### FUNDING

This work was supported by the National Key R&D Program of China (No. 2017YFC0908403) and the National Natural Science Foundation of China (No. 31500670).

#### ACKNOWLEDGMENTS

We thank the Complex Carbohydrate Research Center (CCRC); a figure is modified from the CCRC Spectral Database (https://www.ccrc.uga.edu/specdb/ms/pmaa/pframe.html).

## REFERENCES


characteristic ions in tandem mass spectra. Anal. Chem. 91, 5478–5482. doi: 10.1021/acs.analchem.8b05639


down-regulates the tyrosine phosphorylation of beta-catenin. J. Biol. Chem. 276, 475–480. doi: 10.1074/jbc.M006689200


conformation of bisected glycans bound to specific lectins. Sci. Rep. 6:22973. doi: 10.1038/srep22973


myasthenic syndrome and myasthenia gravis. J. Proteome Res. 10, 143–152. doi: 10.1021/pr1004373


for characterization of monoclonal antibody minor variants. Anal. Chem. 91, 15360–15364. doi: 10.1021/acs.analchem.9b04467


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Chen, Tan, Guan and Ren. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Preparation of Complex Glycans From Natural Sources for Functional Study

Qing Zhang, Zhonghua Li and Xuezheng Song\*

*Department of Biochemistry, Emory Comprehensive Glycomics Core, Emory University School of Medicine, Atlanta, GA, United States*

One major barrier in glycoscience is the lack of diverse and biomedically relevant complex glycans in sufficient quantities for functional study. Complex glycans from natural sources serve as an important source of these glycans and an alternative to challenging chemoenzymatic synthesis. This review discusses preparation of complex glycans from several classes of glycoconjugates using both enzymatic and chemical release approaches. Novel technologies have been developed to advance the large-scale preparation of complex glycans from natural sources. We also highlight recent approaches and methods developed in functional and fluorescent tagging and high-performance liquid chromatography (HPLC) isolation of released glycans.

#### Edited by:

*Yanmei Li, Tsinghua University, China*

#### Reviewed by:

*Yingxia Li, Fudan University, China Yuguo Du, Chinese Academy of Sciences (CAS), China*

> \*Correspondence: *Xuezheng Song xsong2@emory.edu*

#### Specialty section:

*This article was submitted to Organic Chemistry, a section of the journal Frontiers in Chemistry*

Received: *06 April 2020* Accepted: *18 May 2020* Published: *03 July 2020*

#### Citation:

*Zhang Q, Li Z and Song X (2020) Preparation of Complex Glycans From Natural Sources for Functional Study. Front. Chem. 8:508. doi: 10.3389/fchem.2020.00508*

Keywords: natural glycans, CORA, oxidative release, HPLC, tagging, large scale preparation

## INTRODUCTION

Glycans, as one of the four major biological macromolecules in mammalian systems, are the most diverse and abundant biopolymers (Ohtsubo and Marth, 2006). Besides serving as structural support (such as cellulose) and energy storage (such as starch and glycogen), many glycans are covalently linked to proteins or lipids and play a wide variety of functional roles in physiological and pathophysiological states (Varki, 2017; Reily et al., 2019). Aberrations of glycan structures are associated with many diseases, including cancer, autoimmune, infectious, chronic inflammatory diseases, etc. (Reily et al., 2019).

Recently, glycoscience and functional glycomics have advanced greatly toward the systematic study of the structure and function of glycans (Paulson et al., 2006; Cummings, 2009; Taniguchi et al., 2009; Smith and Cummings, 2013; Cummings and Pierce, 2014; Song et al., 2015). However, the functional study of glycans and glycoconjugates lags far behind that of proteins/peptides and nucleic acids. This is partially because glycosylation is a post-translational modification, and the biosynthesis of glycans is not directly template-driven. Glycans are often highly branched structures produced by the concerted reactions of glycosyltransferases and/or glycosidases. As a result, both high-throughput structural characterization (sequencing) and automated synthesis/expression are still in their infancy. Nevertheless, the importance of the biological functions of glycans is increasingly recognized, driving significant interest in glycoscience. Over the last decades, many new methods and technologies, such as those based on high-performance liquid chromatography (HPLC), mass spectrometry (MS), and LC-MS, have been developed to facilitate glycoscience study (Royle et al., 2008; Zaia, 2008; Doneanu et al., 2009; Ruhaak et al., 2010). Among those, the glycan microarray has proved to be very successful as a high-throughput screening tool for protein–glycan interactions. A glycan microarray is a presentation of a library of diverse glycan structures on a solid surface, such as microscope glass slides, for interrogation with fluorescently tagged glycan binding proteins (GBPs). As the biological functions of glycans are often realized through their specific interaction with GBPs, the glycan microarray has become extremely useful in elucidating the ligand specificity of GBPs and generating biological hypotheses based on protein–glycan interactions (Fukui et al., 2002; Stowell et al., 2010; Song et al., 2014a, 2015; Smith et al., 2019).
For a glycan microarray to be useful, the expansion of glycan libraries with more diverse and biomedically relevant structures is critical for advancing functional glycomics (Song et al., 2014a). The lack of more of these glycan structures for structural and functional study is a general problem for nearly all aspects of glycoscience. To address this problem, currently there are two main approaches to prepare glycans: chemical/chemo-enzymatic synthesis and isolation/separation of glycans from natural sources. Chemoenzymatic approaches have been developed for the synthesis of structurally defined glycans in the last two decades (Koeller et al., 2000; Blixt and Razi, 2006; Boltje et al., 2009; Lepenies et al., 2010; Palcic, 2011; Schmaltz et al., 2011). A lot of effort and various synthetic methods have been introduced to make more complex glycans available (Wang et al., 2013, 2018; Chen, 2015; Li et al., 2015; Shivatare et al., 2016; Prudden et al., 2017; Zhang et al., 2017; Wen et al., 2018; Liu et al., 2019), and recently, two enzyme-mediated oligosaccharide synthesizers were reported to facilitate the synthetic progress (Zhang et al., 2018; Li et al., 2019). Despite many recent advancements in prototypic automated glycan synthesis, the synthesis of complex, highly branched glycan structures is still extremely challenging and can only be carried out in a number of noncommercialized laboratories. In addition, chemical/chemoenzymatic synthesis is target-driven, and the selection of biomedically relevant structures as synthetic targets relies on preliminary structural and functional analysis of natural glycome (Song et al., 2014a). On the other hand, the preparation of natural glycans has been traditionally carried out at µg scales for structural analysis. 
Because the biomedically relevant glycan structures often exist at low abundance and as heterogeneous glycoconjugates, isolating sufficient quantities in high purity and defining their structures are also highly challenging. Nevertheless, due to their higher potential biomedical relevance and lower technical barrier to access, we consider the preparation of natural glycans for functional study to be an important and indispensable approach for glycoscience and functional glycomics.

In general, natural glycans occur in two categories: glycans covalently attached to other biomolecules as glycoconjugates and free reducing glycans existing in organisms. The preparation of glycans from glycoconjugates first requires their release. The glycans can then be tagged, purified, and separated based on their physical and chemical properties. In this review, we discuss the diverse approaches for preparing different classes of natural glycans, including N-glycans, O-glycans, glycosphingolipids, glycosaminoglycans, glycosylphosphatidylinositol (GPI)-anchor glycans, and human milk oligosaccharides (HMOs). Glycans released from diverse natural glycoconjugates on cells or free glycans can be extracted, tagged, and purified to expand natural glycan libraries. These natural glycans can be printed onto glass slides as microarrays for functional glycomics study (**Figure 1**).

### N-GLYCAN RELEASE FROM NATURAL GLYCOPROTEINS

As the best-studied class of glycans to date, N-glycans can be cleaved off glycoproteins by several enzymes, such as peptide-N-glycosidases (PNGases) and endoglycosidases (Endo) (**Figure 2**). PNGase F is the most widely used enzyme for removing N-glycans from most N-linked glycoproteins and glycopeptides, except core α1–3-fucosylated N-glycans, which are commonly found in plants and insects (Plummer et al., 1984; Tarentino et al., 1985; Tretter et al., 1991). PNGase A has broader substrate specificity and can cleave core α1–3-fucosylated N-glycans (Takahashi, 1977). Recently, PNGase F-II, acid-stable PNGase H+, and PNGase Yl from yeast have all been reported to release core α1–3-fucosylated N-glycans (Du et al., 2015; Lee et al., 2015; Sun et al., 2015). PNGase Ar is able to release the unusual GalαFucα1,3-reducing terminal core from Caenorhabditis elegans (Yan et al., 2018). As another option for enzymatic N-glycan release, endoglycosidases cleave the β1–4-linkage of the di-N-acetylchitobiose core; such enzymes include Endo A, Endo H, Endo M, Endo D, and Endo S (Freeze and Kranz, 2008; Huang et al., 2012; Wang and Amin, 2014; Li et al., 2016). Although they cleave N-glycans at the same position, they have different substrate specificities related to the structures of the N-glycans (Fairbanks, 2017). Although it is not a focus of this review, it is worth noting that many mutants of endoglycosidases have been developed as synthases for N-glycopeptides and glycoproteins (Huang et al., 2012; Wang and Amin, 2014). The high cost of PNGases and endoglycosidases limits their application in large-scale preparation of N-glycans. Another enzymatic approach uses pronase to cleave peptide bonds while leaving glycan-peptide linkages intact (Dodds et al., 2009; Song et al., 2009a; Lu et al., 2019). Pronase is much cheaper than PNGases and endoglycosidases, but complete digestion of glycoproteins to glycoamino acids is challenging and often difficult to reproduce.

Chemical release approaches provide an alternative that avoids the high cost of enzymatic approaches in large-scale preparation of N-glycans. Hydrazinolysis and ammonia/ammonium carbonate have been shown to release N-glycans from glycoproteins (Yosizawa et al., 1966; Huang et al., 2001; Nakakita et al., 2007). However, these methods require toxic reagents and/or harsh conditions, which are not amenable to large-scale preparation and may seriously affect the structural integrity of the released glycans. Under a set of optimized milder alkaline conditions, N-glycans without core α1–3-fucose can also be released by selective hydrolysis of N-glycopeptides (Yuan et al., 2014).

Recently, we reported two different chemical approaches for large-scale release of N-glycans. The first approach is a "chemoenzymatic" method to release N-glycans called threshing and trimming (TaT) (Song et al., 2014b). In the first threshing step, glycoproteins are treated with pronase to create a pool of N-glycoamino acids and glycopeptides with short peptide moieties. In the second trimming step, N-bromosuccinimide (NBS) is added to the mixture of glycoamino acids and glycopeptides to generate free-reducing glycans, nitriles, or aldehydes, depending on the reaction conditions. These products can be easily tagged with fluorescent tags for HPLC purification, MALDI-TOF-MS analysis, and functional study. The TaT approach releases N-glycans without using specialty enzymes, hazardous chemical reagents, or harsh reaction conditions; thus, it can be easily applied to relatively large-scale glycan preparation.

Inspired by the oxidative decarboxylation by NBS treatment, we explored other oxidative reagents and surprisingly discovered that sodium hypochlorite (NaClO; household bleach) efficiently releases glycans from most classes of natural glycoconjugates (N-glycans, O-glycans, and GSLs) directly from cells, tissues, and organs (Song et al., 2016). In this oxidative release of natural glycans (ORNG) method, household bleach is added to homogenized natural materials (animal/plant tissues), and the mixture is stirred for 15–30 min at room temperature. After acid precipitation, the free-reducing N-glycans in the supernatant are purified by chromatography techniques, including size exclusion, anion exchange, and hydrophobic/hydrophilic interaction. The purified glycans are ready for fluorescent tagging by reductive amination and can then be separated into individual components by multidimensional HPLC. The ORNG approach is fast and easy to operate and can be applied to multi-kilogram quantities of natural materials to produce gram-scale quantities of natural complex glycans. In our most recent study, the ORNG approach was demonstrated as a complementary route for the preparation of multi-milligram quantities of purified high-mannose N-glycans (Zhu et al., 2018a).

#### O-GLYCAN RELEASE FROM NATURAL GLYCOPROTEINS

Mucin-type O-GalNAc glycans, which attach to serine or threonine residues of proteins through an α-linkage, are the major O-glycans. Compared with N-glycans, which can be released from glycoproteins by several N-glycanases (Plummer et al., 1984; Tarentino et al., 1985; Plummer and Tarentino, 1991), there is a lack of effective general O-glycanase to release O-glycans. Natural O-glycans are traditionally released by chemical methods. The most commonly used method is reductive β-elimination using sodium hydroxide (NaOH) and sodium borohydride (NaBH4) (Carlson, 1966, 1968). Because the common 3-O-substituion at core GalNAc renders it susceptible toward a β-elimination-related peeling reaction after the release of O-glycan from the protein backbone by NaOH, in situ reduction of the reducing end by high-concentration NaBH<sup>4</sup> is necessary. The reductive β-elimination converts the reducing end of O-glycan to alditols. Although it is useful for MS-based glycan profiling, it prevents further derivatization and functionalization for glycan purification and printing on a microarray. Several nonreductive β-elimination methods have been developed to keep the reducing end for further derivatization; (Patel et al., 1993; Chai et al., 1997; Huang et al., 2001; Merry et al., 2002; Miura et al., 2010; Yamada et al., 2010; Kozak et al., 2012) however, most of them are still based on base-catalyzed βelimination, and "peeling" is nearly inevitable (Yu et al., 2010). Furthermore, even if an intact free-reducing end is generated, the following tagging step often generates open-ring O-glycans, which destroy the structural integrity of the O-glycans and, subsequently, may affect its functional study, such as the glycan recognition on a microarray (Prasanphanich et al., 2015). The regeneration of the natural α-O-linkage is significantly more challenging than that of the N-glycan linkage. 
A PMP-related releasing and tagging approach for O-glycans has also been developed by Wuhrer's and Wang's groups (Wang et al., 2011; Zauner et al., 2012) using a combination of β-elimination followed by Michael addition, both of which are catalyzed by a strong base. However, PMP-tagged or related tagged glycans are suitable only for glycomics analysis, not for further derivatization and functional screening on microarrays.

Interestingly, our novel ORNG method can also effectively release O-glycans from glycoproteins or tissues of organisms (Song et al., 2016). The release of O-glycans by ORNG is mechanistically different from all previously known methods. Instead of base-catalyzed elimination, sodium hypochlorite oxidatively degrades the protein backbone to generate O-glycan acids containing glycolic acid (serine-linked) or lactic acid (threonine-linked) as aglycons, in addition to a smaller fraction of free-reducing O-glycans. As a result, these glycolic/lactic acid–linked O-glycans largely retain the structural integrity of the O-glycans as well as the α-O-linkage to the aglycon, preserving O-glycan recognition involving the linkage. In addition, compared to β-elimination, ORNG release is faster and the reaction conditions are milder; thus, many labile functional groups, such as sulfation and O-acetylation, are uncompromised after NaClO treatment. More importantly, the released O-glycan acids can be easily labeled using a common amidation reaction with a fluorescent tag, such as mono-9-fluorenyl-methoxycarbonyl (mono-Fmoc) ethylenediamine, for HPLC separation to prepare O-glycan libraries, and these mono-Fmoc tagged O-glycans can be deprotected by piperidine to expose the amino group for immobilization onto microarray slides for functional O-glycomics studies.

Unlike all the above release strategies, we have recently developed a novel technology termed cellular O-glycome reporter/amplification (CORA), which uses an O-glycan precursor (peracetylated benzyl-α-N-acetylgalactosamine, Ac3Bn-α-GalNAc) to amplify O-glycans in living cells, which then secrete free Bn-O-glycans into the cell media. The secreted Bn-O-glycans can be easily purified and analyzed by MS (Kudelka et al., 2016). CORA greatly enhances the sensitivity of MS analysis of the O-glycome from living cells. However, the low UV absorption of the Bn group makes the isolation of these glycans using HPLC challenging. To overcome this limitation, we have recently designed and synthesized many Ac3Bn-α-GalNAc derivatives as CORA precursors to replace Ac3Bn-α-GalNAc. These new CORA precursors include many functional groups, such as fluorescent groups and bioorthogonal reactive groups (Zhang et al., 2019), allowing O-glycans produced by CORA to be tagged, separated, and purified by chromatography for functional study. Preparative CORA using these derivatives as precursors is currently under investigation, and we believe this method could become a promising approach for preparation of O-glycans (**Figure 3**).

#### GLYCAN RELEASE FROM GLYCOSPHINGOLIPIDS

Glycosphingolipids (GSLs) are amphipathic glycoconjugates widely distributed on the cell surfaces. Although exoglycosidases and endoglycosidases are only able to cleave the glycan moieties from GSLs (Li and Li, 1999), endoglycoceramidases are found to release entire glycans from GSLs (Ishibashi et al., 2007; Li et al., 2009; Albrecht et al., 2016). However, the enzymes are expensive and specific to certain GSL structures, preventing their wide application in larger scale glycan preparation from GSLs.

Traditional chemical methods utilize ozonolysis or osmium tetroxide to oxidize the C=C double bond in the sphingosine moiety, followed by base-catalyzed β-elimination (Wiegandt and Baschang, 1965; Hakomori, 1966). To prevent the potential adverse effect of base treatment on glycan structural integrity, we have developed several approaches to release glycans from GSLs for functional study through glycan microarray preparation using covalent immobilization. The first approach takes advantage of the aldehyde group generated by ozone treatment of GSLs, which can be directly coupled with functional and fluorescent tags by reductive amination. This approach preserves a significant portion of the lipid moiety and may benefit functional studies requiring the lipid component (Song et al., 2011). The second approach is to heat ozonized GSLs gently at neutral pH, which interestingly releases free-reducing glycans fairly efficiently (Song et al., 2012). Both of these methods still require ozone to oxidize the C=C double bond to initiate the reaction and can only be applied to purified GSLs. In our most recent ORNG approach, we found that in addition to N- and O-glycans, NaClO can also release glycans as cyanomethyl glycosides from GSLs, apparently through the oxidative degradation of the lipid moiety at the polar head group (Song et al., 2016). The ORNG approach can be applied not only to gangliosides purified by organic solvent extraction, but also directly to aqueous homogenized brain tissue (Song et al., 2016). The ability to release GSL-glycans without organic solvent extraction significantly reduces the complexity of GSL-glycan preparation and is essential to larger scale glycan production. Interestingly, although NBS can also release glycan nitriles from gangliosides at 65 °C, this reaction does not work directly on homogenized brain tissue.

### GLYCAN RELEASE FROM GLYCOSAMINOGLYCANS AND GPI-ANCHORS

Glycosaminoglycans (GAGs) are linear polydisperse heteropolysaccharides, consisting of up to 1,000 repetitive disaccharide units (Murata et al., 1985; Jackson et al., 1991). Heparin, a highly sulfated form of heparan sulfate (HS) glycosaminoglycans, has been shown to possess important biological functions that vary according to its fine structure (Liu et al., 2009). Heparin has widespread clinical use as an intravenous anticoagulant with more than 100,000 kg produced annually worldwide (Liu et al., 2009). Commercial heparin is currently produced from animal tissues, such as porcine intestine and beef lung (Bhaskar et al., 2012). The methods used for commercial preparation of heparin involve five basic steps: (1) preparation of tissue, (2) extraction of heparin from tissue, (3) recovery of raw heparin, (4) purification of heparin, and (5) recovery of purified heparin (Linhardt and Gunay, 1999). Although similar, the heparins derived from different animal sources have diverse structures that relate to different functional activities, such as AT- and thrombin-binding affinities (Liu et al., 2009). A worldwide health crisis in 2007, associated with contamination of several heparin batches, reportedly resulted in more than 200 deaths in the United States alone (Liu et al., 2009; Turnbull, 2011). Low-molecular weight heparins (LMWH, MW avg <8 kDa) are subcutaneously administered, have a longer half-life than unfractionated heparin, and can be prepared with different structures by different depolymerization methods, including oxidation, deaminative degradation, and β-elimination (Linhardt and Gunay, 1999).

derivative, and then be extended by glycosyltransferases in the O-glycosylation pathway in Golgi. The Bn-O-glycan derivatives are secreted to cell media. The fluorescently labeled O-glycans can be purified to prepare O-glycan libraries for functional O-glycome study.

With the improvement in chemical and chemo-enzymatic methods, the synthesis of GAGs has reached gram scale, and automated solid-phase synthesis of chondroitin sulfate GAGs is available (Eller et al., 2013; Mende et al., 2016; Xu et al., 2017), which facilitates access to functional and biological studies of GAGs. Although glycan microarray analysis of natural GAG oligomers has been reported for more than 10 years (Noti et al., 2006; Park et al., 2008), large-scale GAG microarrays for general screening of GAG-binding proteins have been reported only with synthetic approaches (Yang et al., 2017; Zhang et al., 2017; Zong et al., 2017). The high heterogeneity of the sulfation patterns of GAG chains makes the isolation of homogeneous GAG oligomers, their structural characterization, and their chemical/enzymatic synthesis challenging tasks. Nevertheless, with the recent progress in HPLC analysis and separation, preparation of a more comprehensive natural GAG glycan library for functional study with GAG-binding proteins will become possible in the near future.

GPI-anchored proteins play critical roles in numerous biological processes, such as cell recognition and interaction (He et al., 1987; Takeda and Kinoshita, 1995; Paulick and Bertozzi, 2008). Since the first total synthesis of an intact GPI anchor in 1991 (Murakata and Ogawa, 1991), convergent chemical and chemo-enzymatic strategies for GPI synthesis have been developed, and more than 30 GPIs have been isolated and characterized (Wu et al., 2008; Yu and Guo, 2009; Swarts and Guo, 2010; Guo, 2013). An effective strategy for labeling cell-surface GPIs and GPI-anchored proteins was developed for biological studies (Lu et al., 2015). However, preparation of GPI anchors from natural sources for functional study has not yet been well studied, presumably due to the lack of well-defined enzymatic and chemical release methods and the low abundance of GPI anchors in cells.

#### PREPARATION OF HUMAN MILK OLIGOSACCHARIDES

Human milk oligosaccharides (HMOs), occurring as free-reducing glycans, are the third major component of human milk after lactose and lipids and are known to play important roles benefiting infant health (Chen, 2015). HMOs are extended from lactose by a collection of glycosyltransferases, adding N-acetylglucosamine, galactose, fucose, and neuraminic acid (Jenness, 1979). More than a hundred different HMO structures have been identified and elucidated (Zopf et al., 1978; Prieto and Smith, 1985; Smith et al., 1985; Jensen et al., 1995). Because HMOs are highly abundant in human milk (5–15 g/L) as free-reducing glycans that do not need to be released from other biomolecules, large-scale isolation and separation of HMOs have been practiced for many years. In early studies, individual HMOs were isolated directly by size-exclusion, anion-exchange, and paper chromatography without being derivatized (Kobata et al., 1969; Donald and Feeney, 1988). More recently, with the wide use of HPLC isolation and MS analysis, tagging of HMOs with functional and/or fluorescent groups for separation and further functional study has become more common. We have applied our bifunctional fluorescent tag AEAB to HMO isolation and fractionation (Song et al., 2009b). Isolated glycans can be directly printed on a microarray for functional screening with various GBPs and viruses (Yu et al., 2012). With more complex HMO structures becoming available for functional study, we expect further elucidation of their functions through interaction with the infant microbiome.

#### FUNCTIONAL AND FLUORESCENT TAGGING OF RELEASED GLYCANS

After being released from natural sources, glycans existing in a heterogeneous mixture need to be separated for analysis or for preparation of pure glycans. Due to the lack of an exploitable chromophore in natural glycans and the mutarotation of the anomeric center at the reducing end, it is challenging to monitor glycans during HPLC separation. The preparation of glycan microarrays also requires that the glycans be derivatized with functional groups, such as an amino group. Therefore, it is important to install functional and fluorescent tags on the released glycans for easier and more efficient separation and subsequent solid-phase immobilization.

Reductive amination of free-reducing glycans with fluorescent amines has long been used for the HPLC profiling of glycans (**Figure 4**). 2-Aminopyridine (2-AP), 2-aminobenzamide (2-AB), and 2-aminobenzoic acid (anthranilic acid, 2-AA) are commonly used fluorescent amines (Hase et al., 1978; Bigge et al., 1995; Anumula, 2014). However, these small fluorescent amines lack a functional group for efficient solid-phase immobilization or covalent derivatization. Glycans conjugated with the homobifunctional tag 2,6-diaminopyridine (DAP), which retains an aromatic amino group, can be immobilized onto activated surfaces for microarray preparation (Xia et al., 2005; Song et al., 2008). To efficiently immobilize precious natural glycans, we developed a novel heterobifunctional tag, 2-amino-N-(2-aminoethyl)benzamide (AEAB), which contains both an arylamine and an alkylamine (Song et al., 2009b). The aromatic amine selectively reacts with the free-reducing end of released glycans by reductive amination, while the alkylamine is used for efficient solid-phase immobilization onto both NHS- and epoxy-activated glass slides.

One inherent problem with commonly used reductive amination is that it breaks the reducing-end ring structure, affecting the structural integrity of the glycan. To address this drawback, new methods and linkers have been reported, such as 2-amino-methyl-N,O-hydroxyethyl (AMNO) (Bohorov et al., 2006) and N-Fmoc-3-(methoxyamino)propylamine (F-MAPA) (Wei et al., 2019). We also developed a procedure to prepare HMO-AEAB conjugates with an intact reducing-end ring structure (Yu et al., 2012). More recently, we have designed a new tag, O-benzylhydroxylamine (BHA), which can be easily and efficiently installed on HMOs and preserves the structural integrity of the glycan (Zhang et al., 2020). By Pd/C-catalyzed hydrogenation, free HMO can be easily regenerated from HMO-BHA.

Compared to nonderivatized glycans, the installation of a fluorescent tag on released glycans often increases the sensitivity of MS analysis. Although permethylation is considered a necessary step for detailed sequencing by MS (Ashline et al., 2014), the conjugated tags often generate structural complexity during permethylation. Therefore, we developed a facile and mild method using NBS to remove the tags of aminated glycans, which regenerates free-reducing glycans for permethylation (Song et al., 2013). This method can be efficiently applied to all types of tags installed through reductive amination, including 2-AP, 2-AB, 2-AA, and AEAB.

Because of its easy installation and removal, the 9-fluorenylmethoxycarbonyl (Fmoc) group is widely used as an amino-protecting group in organic chemistry, especially in peptide synthesis. After being installed on released glycans, the fluorescent Fmoc group can greatly enhance the sensitivity of HPLC detection of tagged glycans (Kamoda et al., 2005; Song et al., 2009a; Yamada et al., 2013; Lu et al., 2019). It can also serve as an affinity tag due to its hydrophobicity. The amino group can be easily regenerated for solid-phase immobilization in microarray printing (Kamoda et al., 2005; Yamada et al., 2013; Wei et al., 2019). We have successfully installed an Fmoc tag on glycans released from natural O-glycoconjugates and glycosphingolipids in our ORNG method (Song et al., 2016).

### HPLC SEPARATION OF GLYCANS FOR FUNCTIONAL STUDY

Over the years, various HPLC methods have been commonly used for glycan purification, including hydrophilic interaction liquid chromatography (HILIC), high-performance anion-exchange chromatography (HPAEC), and reversed-phase chromatography (Ruhaak et al., 2010; Nagy et al., 2017). HILIC-mode HPLC is an efficient technique for separation of unprotected saccharides (Fu et al., 2010; Melmer et al., 2011; Wan et al., 2015), while reversed-phase chromatography is suitable for hydrophobic saccharides (Rajakylä, 1986; El Rassi, 1995; Dallabernardina et al., 2016). HPAEC is usually used for negatively charged unprotected carbohydrates (Rohrer et al., 2013, 2016). Porous graphitized carbon (PGC), a unique stationary phase combining both hydrophobic and anionic interactions, separates glycans very well based on their isomeric structures under reversed-phase elution conditions (Fan et al., 1994; Itoh et al., 2002; Ruhaak et al., 2009; West et al., 2010; Lie and Pedersen, 2018). Because the glycans obtained from biological sources are often complex mixtures, multidimensional HPLC is necessary to separate them into individual glycans with significant purity (Nagy et al., 2017). We have successfully applied multidimensional HPLC to isolate individual glycan libraries for microarray study (Song et al., 2009b; Yu et al., 2012).
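The mode selection described above can be summarized as a small lookup. The following is only an illustrative sketch: the function name `suggest_hplc_modes` and its three boolean inputs are hypothetical simplifications introduced here, and real method development depends on many more factors (tag chemistry, column availability, detection, scale).

```python
from typing import List


def suggest_hplc_modes(hydrophobic: bool,
                       negatively_charged: bool,
                       isomers_expected: bool) -> List[str]:
    """Return candidate HPLC modes for a glycan sample.

    Hypothetical helper that only encodes the rules of thumb stated in
    the text; it is not a validated method-selection procedure.
    """
    modes: List[str] = []
    if hydrophobic:
        # Hydrophobic saccharides (e.g., carrying a hydrophobic tag)
        modes.append("reversed-phase")
    else:
        # Unprotected saccharides
        modes.append("HILIC")
        if negatively_charged:
            # Negatively charged unprotected carbohydrates
            modes.append("HPAEC")
    if isomers_expected:
        # PGC separates glycans well based on isomeric structure
        modes.append("PGC")
    return modes


# Example: an unprotected, negatively charged mixture containing isomers
print(suggest_hplc_modes(hydrophobic=False,
                         negatively_charged=True,
                         isomers_expected=True))
```

Returning a list rather than a single mode reflects the point made in the text that complex biological mixtures usually need more than one chromatographic dimension.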

Most HPLC separation methods are designed for analytical glycomics using small samples and do not generate significant quantities of glycans for detailed functional study. There have been a few examples in which larger amounts of starting material were used to generate a sufficient amount of glycans for NMR study (Green et al., 1988; Da Silva et al., 1995). However, no true preparative-scale HPLC separations have been tackled previously, presumably due to the unavailability of large amounts of released glycans. With gram-scale glycans from natural sources now available through the ORNG technique, development of preparative-scale purification becomes practical and provides an effective route to address the lack of glycans for functional study. We have reported isolation of high-mannose N-glycans from soy proteins and egg yolks by a preparative-scale multidimensional HPLC method (Zhu et al., 2018a,b). However, even after multidimensional HPLC, some fractions are still mixtures of isomers that are very difficult to separate even on analytical columns. To address this problem, recycled HPLC could be a good solution (Alley et al., 2013; Sidana and Joshi, 2013). Most recently, we have reported a simple and affordable closed-loop recycled HPLC method for separation of complex glycans at the preparative scale. It was successfully applied to reversed-phase chromatography, HILIC, and size-exclusion chromatography (SEC) (Zhu et al., 2020).
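Why recycling helps with hard-to-separate isomers can be illustrated with an idealized textbook estimate. Under the assumption that resolution scales with the square root of the plate number, and that each pass through the same column adds the same number of plates, resolution after n cycles grows as the square root of n. The sketch below uses that assumption only; it ignores band broadening from the pump and tubing and the risk of the fastest peak lapping the slowest one, and the function names and the 1.5 baseline-resolution target are illustrative choices, not values from the cited work.

```python
import math


def resolution_after_cycles(rs_single_pass: float, cycles: int) -> float:
    """Idealized resolution after `cycles` passes through the same column.

    Assumes Rs is proportional to sqrt(plate number) and that plate
    number is proportional to the number of passes.
    """
    if rs_single_pass <= 0 or cycles < 1:
        raise ValueError("rs_single_pass must be > 0 and cycles >= 1")
    return rs_single_pass * math.sqrt(cycles)


def cycles_needed(rs_single_pass: float, rs_target: float = 1.5) -> int:
    """Smallest number of passes that reaches `rs_target` in this model."""
    if rs_single_pass <= 0:
        raise ValueError("rs_single_pass must be > 0")
    return max(1, math.ceil((rs_target / rs_single_pass) ** 2))


# Example: two isomers with a single-pass resolution of 0.75
n = cycles_needed(0.75)
print(n, resolution_after_cycles(0.75, n))
```

Because the gain is only a square-root function of the number of cycles, pairs that are barely resolved in a single pass can require many cycles, which is one reason a closed-loop setup is attractive for such fractions.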

#### CONCLUSIONS

With their highly diverse structures, natural glycans are likely more biologically relevant for functional study. Here, we have summarized the preparation of several classes of complex glycans from glycoconjugates. Both enzymatic and chemical approaches have been discussed; each method has its own advantages and should be carefully selected based on the specific goal of the individual study. When large amounts of natural glycans are desired, chemical approaches, especially the new ORNG approach, provide a good alternative to chemoenzymatic synthesis. The ORNG approach is able to quickly release up to grams of glycans from several major classes of glycoconjugates using an affordable chemical reagent (household bleach), mild reaction conditions, and a simple operation. Nevertheless, more preparation methods are still in demand, especially for O-glycans, GAGs, and GPI anchors. The novel CORA method provides a potential new route toward O-glycans if preparative scale can be achieved.

### AUTHOR CONTRIBUTIONS

QZ, ZL, and XS wrote and edited the manuscript. All authors contributed to the article and approved the submitted version.

#### FUNDING

This work was supported by NIH Common Fund Glycoscience (U01GM116254, U01CA207821) and partially by STTR grants R41GM122139, SBIR R43GM131534, and R43GM133252. It was also partially supported by Emory Comprehensive Glycomics Core (ECGC), which is subsidized by the Emory University School of Medicine and is one of the Emory Integrated Core Facilities.




**Conflict of Interest:** XS is a co-founder of NatGlycan LLC, which commercializes the ORNG process.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Zhang, Li and Song. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Design and Synthesis of Chitosan—Gelatin Hybrid Hydrogels for 3D Printable in vitro Models

Sofia Magli<sup>1</sup>, Giulia Beatrice Rossi<sup>1</sup>, Giulia Risi<sup>2</sup>, Sabrina Bertini<sup>2</sup>, Cesare Cosentino<sup>2</sup>, Luca Crippa<sup>3</sup>, Elisa Ballarini<sup>3</sup>, Guido Cavaletti<sup>3</sup>, Laura Piazza<sup>4</sup>, Elisa Masseroni<sup>4</sup>, Francesco Nicotra<sup>1</sup> and Laura Russo<sup>1</sup>\*

<sup>1</sup> Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy, <sup>2</sup> G. Ronzoni Institute for Chemical and Biochemical Research, Milan, Italy, <sup>3</sup> Department of Medical and Surgical Science, University of Milano-Bicocca, Milan, Italy, <sup>4</sup> Department of Environmental Science and Policy (ESP), University of Milan, Milan, Italy

The development of 3D printable hydrogels based on the crosslinking between chitosan and gelatin is proposed. Chitosan and gelatin were both functionalized with methyl furan groups. Chemical modification was performed by reductive amination with methyl furfural involving the lysine residues of gelatin and the amino groups of chitosan to generate hydrogels with tailored properties. The methyl furan residues present in both polymers were exploited for efficient crosslinking via Diels-Alder ligation with PEG-Star-maleimide under cell-compatible conditions. The obtained chitosan-gelatin hybrid was employed to formulate hydrogels and 3D printable biopolymers and its processability and biocompatibility were preliminarily investigated.

#### Edited by:

Jonathan G. Rudick, Stony Brook University, United States

#### Reviewed by:

Amitav Sanyal, Bogaziçi University, Turkey; John F. Trant, University of Windsor, Canada

\*Correspondence:

Laura Russo laura.russo@unimib.it

#### Specialty section:

This article was submitted to Organic Chemistry, a section of the journal Frontiers in Chemistry

Received: 03 March 2020 Accepted: 22 May 2020 Published: 14 July 2020

#### Citation:

Magli S, Rossi GB, Risi G, Bertini S, Cosentino C, Crippa L, Ballarini E, Cavaletti G, Piazza L, Masseroni E, Nicotra F and Russo L (2020) Design and Synthesis of Chitosan—Gelatin Hybrid Hydrogels for 3D Printable in vitro Models. Front. Chem. 8:524. doi: 10.3389/fchem.2020.00524

Keywords: glycopolymers, hybrid hydrogels, functionalization strategies, Diels-Alder click reaction, 3D bioprinting, 3D cultures, click chemistry for 3D cellular models

### INTRODUCTION

3D cultures embedded in hydrogels represent a challenging opportunity to advance tissue engineering and 3D in vitro functional models (Ashammakhi et al., 2019). The advent of new technologies, such as 3D printing and bioprinting, allows the production of artificial 3D cell microenvironments, provided that a wide range of printable hydrogels are available (Moroni et al., 2018; Bagher et al., 2019). Such hydrogels must be biocompatible and able to provide 3D scaffolds with the appropriate structural and chemical features, such as stiffness, viscosity, and capacity to interact with cells, providing them with the required biological signals to address their fate (Ooi et al., 2017; Neves et al., 2019). New fascinating strategies have been developed to better control the encapsulation of one or more cell lines in specific architectures, all based on the use of proper biomaterials with tailored properties and fabrication strategies. An important issue is, therefore, the availability of biomaterials suitable for different therapeutic purposes and fabrication strategies. Most of the commercially available biopolymers employed as matrices for cell cultures are not suitable or adaptable as bioinks for 3D printing protocols, and vice versa. 3D printing protocols are strongly related to the physical properties of the polymers employed. At the same time, cells embedded in the printable polymers need motifs and functional groups able to mimic biochemical signals and structures present in the natural extracellular matrix (Nicolas et al., 2020). The development of new accurately functionalized biopolymers that enable the properties of the final constructs to be tuned is therefore desirable. The majority of the biopolymers employed for biomedical use are natural polymers extracted from animal tissues or obtained by recombinant methods. They are biocompatible and suitable for integration in biological systems, but their applications are often limited by their poor mechanical properties, inadequate architecture, and limited modularity of structural features. To overcome these limitations, one of the most promising strategies is based on the combination of polymers by controlled crosslinking with linkers of different lengths, in order to "tune on demand" the morphological and mechanical properties of the final constructs (Spicer et al., 2018). Chitosan is a cationic polysaccharide composed of N-acetyl-D-glucosamine and D-glucosamine units. Chitosan derivatives have already been shown to recreate a microenvironment conducive to cell growth (Zhang et al., 2015), and they have been extensively employed for tissue engineering applications (Polgar et al., 2017; Fasolino et al., 2019; Ruprai et al., 2019; Sultankulov et al., 2019; Cassimjee et al., 2020; Tao et al., 2020). Moreover, chitosan can be combined with natural polymers such as gelatin, which contains specific amino acid sequences such as Arg-Gly-Asp (RGD) (Davidenko et al., 2016). This amino acid sequence is present ubiquitously as an adhesion sequence in the proteins of the extracellular matrix (Liu et al., 2004) and is involved in numerous physiological functions. Binding between integrins and RGD induces a series of reactions in the cytoplasm involving the cytoskeleton and other proteins that regulate cell adhesion, growth, and migration. For this reason, the combination of these two polymers has been widely investigated for various biomedical applications, from wound healing (Huang et al., 2013; Carvalho, 2017) to drug delivery (Kim et al., 2018). Chitosan-gelatin hybrids have been identified as promising hybrid materials for tissue engineering or drug delivery applications (Afewerki et al., 2019; Rodríguez-Rodríguez et al., 2020).
The use of hybrids obtained by ionic interactions or covalent linkages has been investigated to obtain scaffolds or hydrogels with specific kinetics and degradation properties (Gorgieva and Kokol, 2012). Different fabrication methodologies have been employed depending on the final intended application, such as crosslinking by chemical reaction of complementary groups using glutaraldehyde (Jiankang et al., 2009) or N-(3-dimethylaminopropyl)-N′-ethylcarbodiimide (Alizadeh et al., 2013) as crosslinkers, or crosslinking by high-energy irradiation such as UV (Saraiva et al., 2015; Carvalho, 2017). However, conventional crosslinking methods involve the use of toxic reagents such as glutaraldehyde or photoinitiators, or mutagenic UV irradiation, and lead to the formation of side products that can be unsafe or not fully biocompatible. In the present work, we present an alternative based on Diels-Alder click chemistry that is applicable to different formulation and fabrication strategies at physiological pH without further purification, and that also allows cell encapsulation during the crosslinking without affecting cell viability.

This strategy requires the introduction in the biopolymer chains of functional groups able to react with sufficiently fast kinetics in mild and biocompatible conditions and without the formation of toxic side products—in other words, a "click reaction" (Nimmo and Shoichet, 2011; Azagarsamy and Anseth, 2013; Tam et al., 2017). A linker with complementary functional groups is commonly used for click reaction crosslinking. Several biocompatible click reactions have been employed to obtain smart biomaterials and to impart them with new biological functionalities (Nimmo and Shoichet, 2011; Russo et al., 2011, 2014, 2016; Azagarsamy and Anseth, 2013; Lin et al., 2013; Gandavarapu et al., 2014; Nair et al., 2014; Taraballi et al., 2014; Huynh et al., 2018; Kaur et al., 2018). However, the application of these procedures on heterogeneous systems is of growing interest in materials science also for the improvement of bioprinting hybrid polymer procedures.

In order to fulfill this objective, we have investigated a strategy to functionalize gelatin and chitosan, selected as biopolymers, in order to obtain a final construct with both polysaccharide and protein properties. According to our experience, the most reproducible crosslinking approach is the Diels-Alder reaction (Roy et al., 2015). The Diels-Alder cycloaddition has already been employed to generate polysaccharide-based biomaterials, as with hyaluronic acid and alginate-based hydrogels, employed to encapsulate cancer cell lines, confirming the biocompatibility of the produced biomaterials (Smith et al., 2018).

In the present work, we designed and studied gelatin (GE) and chitosan (CH) functionalization with methyl furfural as a diene and the employment of the functionalized constructs as starting polymers for the design of a customizable hybrid biomaterial crosslinked by Diels-Alder cycloaddition with a commercial Star-PEG functionalized with maleimide groups as dienophile (Star-PEG-MA) (**Scheme 1**).

Chitosan and gelatin were treated with 5-methylfurfural in the presence of NaBH3CN to perform a reductive amination, taking advantage of their amino groups, to generate the methyl furan functionalized biopolymers. This reaction has already been investigated on single-chain polymers for the generation of new diagnostic and therapeutic tools for nanomedicine and tissue engineering applications (Hall et al., 2011; Nimmo et al., 2011; Alge et al., 2013; Gandini, 2013; Koehler et al., 2013a,b; Park et al., 2014; Gregoritza and Brandl, 2015; Stewart et al., 2016; Ma et al., 2017; Tam et al., 2017; Smith et al., 2018; Madl and Heilshorn, 2019).

However, to our knowledge, the use of Diels-Alder crosslinking to produce hybrid systems based on protein and polysaccharide components functionalized with methyl furan moieties has not yet been investigated. As a matter of fact, the crosslinking reaction between two totally different biopolymers containing the same reactive functional group must be accurately modulated to generate a hybrid material with the required properties. The degree of derivatization of GE and CH with methyl furan was therefore determined in order to identify the most efficient crosslinking conditions.

The hydrogel network formation was assessed using different concentrations of commercially available maleimide tetra-functionalized PEG (Star-PEG-MA) to select the most promising formulation. In detail, the methyl furan-functionalized biopolymers were mixed and reacted with Star-PEG-MA to obtain the final crosslinked hydrogel (GE-CH). The different reactivities of gelatin and chitosan with the Star-PEG-MA make it problematic to assess the appropriate degree of functionalization for obtaining network formation in the crosslinking step. On the other hand, the Diels-Alder cycloaddition turned out to be an affordable method to control the crosslinking of the final hydrogel and to easily quantify the degree of functionalization of both polymeric components by NMR.
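When screening Star-PEG-MA concentrations, it can help to express each formulation as a furan-to-maleimide molar ratio. The sketch below is a minimal bookkeeping aid under stated assumptions: the crosslinker is taken at its nominal values (four maleimide arms, 10,000 g/mol, as implied by the name 4arm-PEG10K-Maleimide), and the furan loading per milligram of polymer (`furan_umol_per_mg`) is a hypothetical input that would have to come from the NMR quantification; the numbers in the example are placeholders, not measured values.

```python
def maleimide_umol(peg_mg: float,
                   mw_g_per_mol: float = 10_000.0,
                   arms: int = 4) -> float:
    """Micromoles of maleimide groups in `peg_mg` of star-PEG-maleimide.

    Uses nominal molecular weight and arm number (assumed, not measured).
    """
    # mg / (g/mol) = mmol of PEG; x arms = mmol maleimide; x 1000 = umol
    return peg_mg * arms / mw_g_per_mol * 1000.0


def furan_to_maleimide_ratio(polymer_mg: float,
                             furan_umol_per_mg: float,
                             peg_mg: float) -> float:
    """Molar ratio of furan groups (diene) to maleimide groups (dienophile).

    `furan_umol_per_mg` is a hypothetical loading, to be taken from NMR.
    """
    furan_umol = polymer_mg * furan_umol_per_mg
    return furan_umol / maleimide_umol(peg_mg)


# Placeholder example: 50 mg polymer at 0.2 umol furan/mg, 25 mg crosslinker
print(maleimide_umol(25.0))
print(furan_to_maleimide_ratio(50.0, 0.2, 25.0))
```

For a hybrid formulation, the furan contributions of the gelatin and chitosan derivatives would be summed before dividing by the maleimide amount; the equal reactivity this implies is itself a simplification, since the text notes that the two polymers react differently with the crosslinker.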

The hydrogels obtained were manufactured by employing different formulation strategies (**Figure 1**) and were preliminarily assessed for different biomedical applications. We tested the hybrid hydrogels for spheroid encapsulation studies, in which the applicability of commercial materials is often limited in terms of the feasibility and reproducibility of histological analysis. Furthermore, the biomaterial of choice must avoid uncontrolled migration or low viability of the embedded cells.

We also screened our hybrid materials as biopolymers for 3D-bioprinting applications, generating cell-laden constructs. 3D bioprinting is today an emerging fabrication technology with potential applications in tissue engineering and cell biology studies (Gungor-Ozkerim et al., 2018; Sun et al., 2020). However, also in this case, libraries of bioprintable biomaterials need to be created to enable more effective in vitro testing and to overcome the current limitations arising from the different cell population requirements and the multitude of physiological and pathological conditions to mimic.

### MATERIALS AND METHODS

Gelatin (type A), 5-methylfurfural, 4arm-PEG10K-Maleimide (Star-PEG-MA), Phosphate-Buffered Saline, U87 glioblastoma cell line, Eagle's minimal essential medium, L-glutamine, Sodium Pyruvate, Fetal Bovine Serum, penicillin, and streptomycin were purchased from Sigma-Aldrich, Italy. Water-soluble chitosan was purchased from Carbosynth Ltd, UK. A LIVE/DEAD Cell Viability Assay was purchased from ThermoFisher.

### Functionalization of Methyl-Furan Functionalized Gelatin (GE-MF)

Gelatin type A (2.00 g) was dissolved in 30 ml of PBS at pH 4.5 and heated at 37°C until a homogeneous solution was obtained. To the dissolved gelatin, 6.8 ml of 5-methyl furfural was added, and the mixture was left under gentle stirring. After 30 min, 2.15 g of NaBH3CN was added, followed by stirring for 3 h. The solution was dialyzed against a NaCl solution (0.1 M) for 1 day, followed by mQ H2O for 4 days, using 14 kDa dialysis membranes at 40°C. The functionalized polymer was purified by filtration through 0.5 and 0.22 µm filters. The obtained solution was freeze-dried to give 1.73 g of a white spongy solid.

### Functionalization of Methyl-Furan Functionalized Chitosan (CH-MF)

Chitosan (2.00 g) was dissolved in 35 ml of 2% acetic acid solution and mixed by sonication and vortexing until a homogeneous solution was obtained. To the dissolved chitosan, 206 µl of 5-methyl furfural was added, and the mixture was left under gentle stirring. After 30 min, 65 mg of NaBH3CN was added, and the reaction was stirred for 3 h at room temperature. The solution was dialyzed against 0.01 M NaCl solution for 1 day, followed by mQ H2O for 4 days, using 1 kDa dialysis membranes at 40°C. The functionalized polymer was purified by filtration through 0.5 and 0.22 µm filters. The obtained solution was freeze-dried to give 1.37 g of a white spongy solid.

## Hybrid Hydrogel and Dried Sample Formation

#### Hydrogel Network

GE-MF (33 mg) and CH-MF (17 mg) were dissolved in 0.750 ml of PBS at pH 7.4 by vortexing at 37°C until complete dissolution. PEG-Star-MA (2.5 mg) was dissolved in 0.250 ml of PBS at rt, added to the hybrid solution, and mixed. The hybrid solution (GE-CH) was left for 3 h at 37°C to allow for hydrogel network formation.
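The recipe above can be checked with a short calculation of the mass/volume concentration of each component in the final 1 ml mixture. The sketch below is only illustrative: the helper name `percent_w_v` is hypothetical, and it assumes that the two solution volumes are simply additive. Under that assumption the two biopolymers together amount to 5% (w/v), which matches the 5% final concentration quoted for the selected formulation if that figure is read as referring to GE-MF plus CH-MF.

```python
def percent_w_v(mass_mg: float, volume_ml: float) -> float:
    """Concentration in % (w/v), i.e. grams of solute per 100 ml of solution."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return mass_mg / 1000.0 / volume_ml * 100.0


# Masses taken from the hydrogel network recipe (mg).
masses_mg = {"GE-MF": 33.0, "CH-MF": 17.0, "PEG-Star-MA": 2.5}

# Assumption: the 0.750 ml and 0.250 ml solutions add up to 1.000 ml.
total_volume_ml = 0.750 + 0.250

concentrations = {
    name: percent_w_v(mass, total_volume_ml) for name, mass in masses_mg.items()
}
biopolymer_percent = concentrations["GE-MF"] + concentrations["CH-MF"]

for name, value in concentrations.items():
    print(f"{name}: {value:.2f}% (w/v)")
print(f"GE-MF + CH-MF: {biopolymer_percent:.2f}% (w/v)")
```

The same helper applies unchanged to the doubled quantities used for bioprinting (66 mg, 34 mg, and 5 mg in 2 ml), which give identical concentrations.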

#### Dried Samples for SEM and Swelling Studies

The GE-MF, CH-MF, and PEG-Star-MA solution, prepared as previously described, was transferred into Teflon® molds (15 cm diameter) and left for 3 h at 37°C. Once the Diels-Alder reaction had occurred, the sample was transferred to −20°C for 24 h and then freeze-dried for 48 h to obtain cylindrical samples (Irmak et al., 2019). Scanning Electron Microscopy (SEM) was employed to characterize the cross-section of the fibrous dried samples obtained.

### H-NMR

Spectra of chitosan and its derivative were obtained with a Bruker AVANCE III HD 500 MHz spectrometer (Bruker, Karlsruhe, Germany) equipped with a 5-mm TCI cryogenic probe at 303 K. Spectra were processed with Bruker TopSpin software version 4.0.6. For sample preparation, 60 mg of chitosan sample was solubilized in 10 ml of aqueous acid solution (2% acetic acid) and mixed for 24 h at room temperature. Then, 1 ml of solution was lyophilized and solubilized in 0.6 ml of D2O. The <sup>1</sup>H NMR spectrum was acquired with presaturation of residual HDO, using 64 scans, an 8-s relaxation delay, and 32 k time-domain points. Spectra of gelatin and its derivative were obtained with the same spectrometer and probe at 333 K and processed with Bruker TopSpin software version 4.0.6. Samples of about 6 mg were dissolved in 0.6 ml of D2O. The <sup>1</sup>H NMR spectra were acquired with presaturation of residual HDO, using 128 scans, a 25-s relaxation delay, and 32 k time-domain points.

## FT-IR

All of the FT-IR spectra were recorded in attenuated total reflection (ATR) mode using a PerkinElmer Spectrum 100 FTIR Spectrometer. All of the samples were coated onto a steel surface and analyzed at different points of the material. The absorbances of the samples and backgrounds were measured using 25 scans each. The spectral absorption data were collected in the range between 4,000 and 650 cm<sup>−1</sup> at a spectral resolution of 2 cm<sup>−1</sup>.

#### SEM Analysis

Scanning Electron Microscopy (SEM) was employed to characterize the surface and the cross-section of the obtained fibrous samples. The morphology of the final hybrid biomaterials was characterized using a ZEISS Gemini 500 field emission HR-SEM at a voltage of 5 kV. Prior to examination under SEM, all of the samples were sputter-coated with a 10-nm chromium layer.

### Swelling Analysis

The swelling analysis was performed according to the literature (Varaprasad et al., 2011). In summary, the dried GE-CH crosslinked polymers were employed as dried cylinders (5 mm height and 15 mm diameter) and fully immersed in 10 ml of PBS at pH 7.4 and 37°C. Samples were collected at the indicated time points and weighed using an electronic balance. The following equation was employed to calculate the swelling ratio:

$$\text{Swelling (\%)} = \frac{W_s - W_d}{W_d} \times 100 \tag{1}$$

where W<sub>d</sub> is the weight of the dried polymer and W<sub>s</sub> is the weight of the swollen polymer.
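Equation 1 can be applied directly to the recorded weights. The snippet below is a minimal sketch of that calculation; the function name and the example weights are hypothetical and were chosen only to illustrate the arithmetic, not taken from the measured data.

```python
def swelling_percent(w_swollen: float, w_dry: float) -> float:
    """Swelling (%) from Equation 1: (Ws - Wd) / Wd * 100."""
    if w_dry <= 0:
        raise ValueError("dry weight must be positive")
    return (w_swollen - w_dry) / w_dry * 100.0


# Hypothetical weights in grams, for illustration only.
dry_weight_g = 0.050
swollen_weight_g = 0.900

swelling = swelling_percent(swollen_weight_g, dry_weight_g)
print(f"Swelling: {swelling:.0f}%")
```

With these illustrative numbers, a sample that takes up 17 times its dry weight in buffer corresponds to a swelling of 1,700%.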

#### Rheological Properties

The rheological properties of the hydrogel were studied using a CMT rheometer (DHR-2, TA Instruments, USA) equipped with a 40-mm-diameter plate–plate geometry. For all tests, the temperature and the gap between the plates were kept constant at 37°C and 1.0 mm, respectively, and a solvent trap was used to prevent loss of solvent. The viscoelastic behavior of the material at the mesoscale was investigated by means of dynamic measurements and quantified through the storage modulus G′(ω) [the elastic component of the complex modulus G<sup>∗</sup>(ω)] and the loss modulus G″(ω) [the viscous component of G<sup>∗</sup>(ω)], both expressed in Pa. G′(ω) and G″(ω) characterize the solid-like and fluid-like contributions, respectively, to the measured stress response that follows a sinusoidal deformation of the tested material. The range of linear viscoelastic response under oscillatory shear conditions was identified by means of a strain sweep test: the sample was subjected to a wide range of strains (0.01–100%) at a constant frequency of 1 Hz. The mechanical plots were then drawn by performing a frequency sweep test over the 0.01–100 rad/s range at a constant strain (2%). Finally, a step strain sweep test was carried out to investigate the self-healing properties of the sample in response to applied shear forces. Viscoelastic properties were measured as a function of time in an oscillatory time sweep (3 min, 2% strain, 1 Hz frequency) before and after severe destruction of the gel network (800% strain, 3 min, 1 Hz frequency). The extent of the self-healing behavior was calculated according to Zhao et al. (2014) (Equation 2) as the ratio of the storage moduli of the healed (G′<sub>h</sub>) and pristine (G′<sub>p</sub>) gels.

$$\text{Healing Efficiency (HE)} = \frac{G'_{h}}{G'_{p}} \tag{2}$$

Data were analyzed with TRIOS 3.0.2 software.
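The healing efficiency of Equation 2 is a simple ratio of storage moduli. The sketch below shows one way it could be computed from time-sweep data. It assumes that G′ is averaged over each time-sweep window before the ratio is taken, which is an illustrative choice and not a procedure stated in the text. The function name and the modulus values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def healing_efficiency(
    g_prime_pristine: Sequence[float], g_prime_healed: Sequence[float]
) -> float:
    """HE = G'_h / G'_p (Equation 2), using window-averaged storage moduli.

    Averaging over the time sweep is an assumption made for this sketch.
    """
    g_p = mean(g_prime_pristine)
    g_h = mean(g_prime_healed)
    if g_p <= 0:
        raise ValueError("pristine storage modulus must be positive")
    return g_h / g_p


# Hypothetical storage moduli in Pa, for illustration only.
pristine_pa = [1000.0, 1010.0, 990.0]
healed_pa = [890.0, 895.0, 885.0]

he = healing_efficiency(pristine_pa, healed_pa)
print(f"HE = {he:.2f}")
```

A value close to one indicates that the network recovers almost all of its original elastic response after the destructive strain step.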

#### Cell Culture

Before bioprinting and spheroid formation, human glioma U87-MG cells were maintained under adherent conditions in T75 tissue culture flasks. U87 cells were cultured in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum, 100 units/ml penicillin, and 100 µg/ml streptomycin at 37°C under a humidified atmosphere with 5% CO2.

### 3D Bioprinting Procedure and Biocompatibility

GE-MF (66 mg) and CH-MF (34 mg) were dissolved in 1.5 ml of PBS at 37°C and vortexed until complete dissolution. PEG-Star-MA (5 mg) was dissolved in 0.5 ml of PBS at room temperature, added to the GE-CH hybrid solution, and mixed. The GE-CH solution was left for 30 min under UV light for further sterilization and then for 2 h at 37°C to obtain partial network formation of the hydrogel solution. U87 glioblastoma cells (7 × 10<sup>5</sup>/ml) in complete medium were added to the GE-CH solution (5%, 2 ml) and transferred into a 5-ml bioprinter syringe (Moroni et al., 2018). Each sample was bioprinted as a grid on 35-mm Petri TC dishes using a 22G nozzle with a 0.41 mm diameter at 25–35 kPa. After printing, cells were maintained at 37°C with 5% CO2. The culture medium was refreshed every 2 days. The viability of the cells exposed to the bioprinting conditions was evaluated using a LIVE/DEAD viability/cytotoxicity kit (Invitrogen®). Stock solutions of the assay reagents, ethidium homodimer-1 (0.036 µM) and calcein-AM (1 µM), were prepared in PBS. A volume of 1 ml of calcein stock solution was added to each bioprinted sample. Following 20 min of incubation at 37°C, 1 ml of ethidium homodimer-1 stock solution was added to the sample, which was then incubated for an additional 10 min at 37°C (Ooi et al., 2018). The stained bioprinted models were washed three times with PBS before imaging. Imaging analysis was performed with a CELENA® S Digital Imaging System with a TC PlanAchro 4X Ph objective. Cell viability was calculated as (number of green-stained cells/number of total cells) × 100 using Fiji ImageJ software (Schindelin et al., 2012).
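The viability formula at the end of this procedure can be written as a small helper. The sketch below assumes that the total cell number is the sum of green-stained (calcein-positive, live) and red-stained (ethidium homodimer-1-positive, dead) cells counted in the same image. The function name and the counts are hypothetical and do not correspond to the Fiji workflow or to measured data.

```python
def viability_percent(green_cells: int, red_cells: int) -> float:
    """Viability (%) = green-stained cells / total cells * 100.

    Assumption: total cells = green-stained (live) + red-stained (dead).
    """
    if green_cells < 0 or red_cells < 0:
        raise ValueError("cell counts must be non-negative")
    total = green_cells + red_cells
    if total == 0:
        raise ValueError("no cells counted")
    return green_cells / total * 100.0


# Hypothetical counts from a single field of view, for illustration only.
viability = viability_percent(green_cells=180, red_cells=20)
print(f"Viability: {viability:.1f}%")
```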

### 3D Spheroid Formation and Histological Analysis

To form spheroids, U87 cells were seeded at 5 × 10<sup>3</sup> cells per well in 100 µl of culture medium into 96-well round-bottom ultra-low attachment plates (Corning) and incubated for 5 days. Spheroids were deposited in GE-CH hydrogel using a 24-well plate.

In order to fix the hydrogel-embedded spheroids and to obtain a compact hydrogel structure, 10% buffered formalin was added to the well for 2 h at room temperature. After fixation, the hydrogel-embedded spheroids were washed in PBS and moved into histological cassettes, with pieces of filter paper placed on the top and bottom of the sample to avoid loss of material. Samples were paraffin-embedded with a tissue processor (ETP, Histo-Line Laboratories) using a standard protocol, cross-sectioned at 3-µm thickness with a rotary microtome (Leica RM2265), mounted on glass slides, and stained with Haematoxylin and Eosin (H&E). Sample sections were observed under a light microscope (Olympus BX51). Representative images were captured with a digital camera (Evolution VF digital Camera) using Image-Pro Plus software.

#### Statistical Analysis

Results are presented as mean ± SD and compared using one-way ANOVA. Statistical significance was set at p < 0.05.

### RESULTS AND DISCUSSION

Gelatin (GE) and chitosan (CH) were chosen as commercially available starting materials with biocompatible properties. Both starting polymers were functionalized by reductive amination with 5-methyl furfural in order to obtain the methyl furan derivatives GE-MF and CH-MF (**Scheme 2**). CH-MF and GE-MF were characterized by chemical-physical methods to determine the reproducibility of the reaction and the degree of functionalization. To this end, FT-IR and NMR analyses were performed, taking advantage of the fact that methyl furan is an "unnatural group" normally absent in natural proteins and polysaccharides; it can therefore be easily detected, and the degree of functionalization of the obtained products can be quantified.

The FT-IR spectrum of GE-MF was compared with that of untreated gelatin as a control. As shown in **Figure 2A**, the spectrum of untreated gelatin shows, in the green region, the two characteristic peaks at 1,635 and 1,535 cm<sup>−1</sup>, corresponding to C=O (amide I) and -NH- (amide II), respectively. In the case of GE-MF, in the blue-scale region, the signals of C=C, C-H, and -C-O-C- corresponding to the furan ring are detectable at 843, 786, and 1,080 cm<sup>−1</sup>, respectively. CH and CH-5MF were also analyzed by FT-IR and showed peaks at 1,625, 1,520, and 1,315 cm<sup>−1</sup>, corresponding to C=O stretching (amide I), NH bending (amide II), and C-N stretching (amide III) of the amide groups arising from partial acetylation (**Figure 2B**) (Wang et al., 2016). As in the gelatin spectrum, the signals of C=C, C-H, and -C-O-C- of the furan are detectable in the blue-scale zone between 800 and 1,100 cm<sup>−1</sup>.

To confirm the functionalization and to determine its degree, NMR spectra were recorded, and the untreated and functionalized biopolymers were compared.

We analyzed gelatin and functionalized gelatin by <sup>1</sup>H NMR to verify the derivatization of lysine amino groups. In the spectrum of the methyl furan derivative (**Figure 3B**), the intensity of the lysine signal at 2.9 ppm is lower than that of the same peak in the gelatin spectrum (**Figure 3A**). Moreover, new signals (in gray) attributed to the methyl furan structure appear at 6.4, 6.2, and 2.4 ppm (Nimmo et al., 2011; Koehler et al., 2013c). The degree of functionalization was calculated as previously reported in the literature, giving a degree of substitution of 14% (Hoch et al., 2013).

The properties and features of chitosan, such as solubility and biodegradability, are related to its degree of acetylation (DA). NMR spectroscopy is one of the most accurate methods for determining the DA of chitosan (Fernandez-Megia et al., 2005). **Figures 3C,D** show the <sup>1</sup>H NMR spectra of CH and CH-5MF, respectively. Since the chitosan solution is generally viscous, its NMR spectrum was recorded at 333 K. Various expressions have been proposed to calculate the degree of acetylation; we integrated the peaks of the acetyl group, compared them with those of the anomeric protons, and found a DA value of 18% (**Figure S1**). New signals attributed to the methyl furan structure also appear in the spectrum at 6.4, 6.2, and 2.4 ppm. The degree of functionalization, calculated as reported in the literature, was 18% substitution.

### 3D Network Formation by Diels-Alder Reaction

Star-PEG-MA (MW 10,000) with four arms was selected to allow substantial spatial freedom in the network formation, both to favor cell viability during the 3D bioprinting process and to better control the reactivity of the functional groups during the Diels-Alder cycloaddition (Smith et al., 2018). The obtained GE-MF and CH-MF were employed for biomaterial network formation using the Star-PEG-MA, as shown in **Scheme 1**. Different amounts of Star-PEG-MA were employed in order to determine the best kinetics of network formation for the final crosslinked hydrogel. The hybrid hydrogel with a % m/V ratio of CH:GE:PEG-Star-MA = 1:3:0.05 and a final concentration of 5% in PBS at pH 7.4 was selected because of its optimal properties in terms of stability and viscosity of the final hydrogel network. The selected GE-CH hybrid hydrogel was then characterized and tested to formulate both the hydrogel and the bioprintable hydrogel. The formation of the crosslinked GE-CH hydrogel was compared with the behavior of the unfunctionalized GE and CH polymers in the presence of Star-PEG-MA using the test tube inversion method (**Figure 4D**). The GE-CH hybrid was produced in hydrogel form and preliminarily printed without cells. It was used to assess the processability of the hybrid network and to characterize its swelling behavior and morphological properties (**Figure 4C**). SEM of the final construct shows homogeneous structures with interconnected pores, as shown at higher magnification (**Figures 4A,B**). The swelling ability of a hydrogel is important for subsequent in vitro cell studies and for developing biomaterial-based cellular constructs. Swelling is also connected to the ability to absorb nutrients from the microenvironment and to favor cell adhesion. The swelling studies were performed at pH 7.4 and 37°C to characterize both the stability and the water uptake of the produced biomaterials. The final hybrid biomaterial shows the greatest swelling ratio (1,700%) between 2 and 24 h; however, by 72 h, the swelling decreases to 800% in the absence of polymer degradation and release into the medium. These results could be related to the free functional groups of the hybrid material, which result in a different structural organization of the polymer chains during water uptake (Mao et al., 2006; Saraiva et al., 2015; Li et al., 2017; Guaresti et al., 2019). The hybrid reaches swelling equilibrium at 144 h, showing adequate properties for in vitro cell studies. For potential final applications such as tissue engineering or 3D bioprinted cell models, the biomaterials should have a slow degradation rate. As shown in **Figure 4E**, the crosslinking methodology efficiently maintained the integrity of the hybrid hydrogel at 37°C in PBS at pH 7.4, demonstrating the effect of the covalent linkage in controlling biomaterial degradation for up to 45 days (data not shown).

### Rheological Analysis

The rheological analysis was carried out on homogeneous hydrogel solutions prepared as described in the Materials and Methods section. The strain sweep test (**Figure 5A**) showed a linear viscoelasticity (LVE) zone, where the intrinsic structural properties of the samples are independent of the applied stress and where the storage modulus (G′) is higher than the loss modulus (G″). At the end of the LVE zone, the deformation is so large that liquid-like behavior prevails; that is, the yield point is reached. The crossover points of the dynamic moduli were calculated. The GE-CH hydrogel showed a linear strain region up to about 40% (**Figure 5A**). The 2% strain value was therefore selected for subsequent sweeps. The crossover point occurred at a very high value of strain. Such an extended LVE zone is typical of entanglement networks and strong gels (Ross-Murphy and Shatwell, 1993).

FIGURE 4 | (A) SEM control images of non-crosslinked GE-CH samples. (B) SEM images of crosslinked GE-CH samples. (C) GE-CH hydrogel, dried samples, and printed formulations. (D) Test tube inversion method confirming hybrid hydrogel formation of GE-CH compared with unfunctionalized GE-CH in liquid form. (E) Swelling analysis of GE-CH hydrogel reported at time points from 0 to 168 h.

FIGURE 5 | [...] frequency (ω) for GE-CH hydrogel. (C) Structural recovery behavior of the GE-CH as a function of time, assessed by monitoring G′(t) (γ 2%, ω 1 rad/s) after destruction by applying an 800% oscillatory shear strain. Modulus G′ (blue) and G″ (green).

Mechanical plots were obtained by means of frequency sweep tests performed at a strain value below the critical strain γ<sub>c</sub>, in the LVE zone (2% for all samples). The viscoelastic moduli G′ and G″ were recorded over a range of oscillation frequencies at a constant oscillation amplitude. **Figure 5B** shows how the viscous (G″) and the elastic (G′) moduli vary with frequency. The storage modulus G′ was higher than the loss modulus G″. This reflects the existence of three-dimensional networks similar to those of strong gels. Thus, in the LVE region, the sample shows solid-like properties. The mechanical plots are representative of hydrogel properties and classification. Hydrogels can be classified as "strong" when G′ > G″ and linear viscoelasticity persists up to high strains, whereas "weak" hydrogels exhibit G′ > G″ at all the detected frequencies but linear viscoelasticity only at low strain values (Lapasin and Pricl, 1995). Consequently, the GE-CH hydrogel under study can be considered a strong gel because of the slight dependence of G′ and G″ on the frequency. The data presented here are similar to those in a study by Martínez-Ruvalcaba et al. (2007) on chitosan/xanthan hydrogels. According to the theory of weak gels (Bohlin, 1980), assessing the viscoelastic behavior of a hydrocolloid gel allows quantification of the intensity of the colloidal forces acting within the polymer network and of the interactions among components that cooperate to a certain extent, forming a single strand. Therefore, the relationship between the mesostructure of a hydrogel and its rheological behavior can be established. The Bohlin coordination number z quantifies the number of flow units interacting with one another to give the observed flow response of the material:

$$G'(\omega) \sim \omega^{\frac{1}{z}} \tag{3}$$

By processing the G′(ω) data of the GE-CH hydrogel, a z value of 25.6 was obtained, confirming the robust, structured character of the gel network.
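Because Equation 3 is a power law, z can be estimated as the reciprocal of the slope of log G′ against log ω. The sketch below illustrates this with an ordinary least-squares fit. It is not the authors' processing pipeline: the function name is hypothetical, and the data are synthetic, generated from the power law itself with an arbitrary prefactor of 500 Pa so that the fit can be checked against a known answer.

```python
import math
from typing import Sequence


def bohlin_z(omega: Sequence[float], g_prime: Sequence[float]) -> float:
    """Estimate z in G'(w) ~ w**(1/z) from a log-log least-squares slope."""
    if len(omega) != len(g_prime) or len(omega) < 2:
        raise ValueError("need two or more matching (omega, G') points")
    xs = [math.log(w) for w in omega]
    ys = [math.log(g) for g in g_prime]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx  # slope of log G' versus log omega equals 1/z
    return 1.0 / slope


# Synthetic frequency sweep (rad/s) built from the power law with z = 25.6.
# The 500 Pa prefactor is arbitrary and purely illustrative.
omega_rad_s = [0.01, 0.1, 1.0, 10.0, 100.0]
g_prime_pa = [500.0 * w ** (1.0 / 25.6) for w in omega_rad_s]

z = bohlin_z(omega_rad_s, g_prime_pa)
print(f"z = {z:.1f}")
```

A large z corresponds to a nearly flat G′(ω) curve, which is the weak frequency dependence described above for strong gels.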

Traditional hydrogels are characterized by weak properties when subjected to a mechanical stimulus or stress. Compared to other hydrogels presented in the literature, the enhanced viscoelastic response to deformation of the produced GE-CH hydrogel extends the plausible application of the hybrid polymer to different tissue engineering or biomedical applications. In particular, the self-healing properties of hydrogels are of increased value for both 3D bioprinting procedures and injectable systems (Taylor and in het Panhuis, 2016). Therefore, the self-healing properties of the hydrogel were investigated by applying an 800% oscillatory shear strain. After shear removal, the restoration of the dynamic moduli was followed in real time (γ 2%, ω 1 Hz). The healing efficiency (HE) was calculated according to Equation 2. A value of HE closer to one indicates a more desirable self-healing capability, whereas a value closer to zero indicates less efficient self-healing (the result is shown in **Figure 5C**). A completely destructured network and transformation into a liquid-like material (G″ > G′) are the immediate results of high shear strain (γ = 800% for 3 min, ω 1 Hz). Right after cessation of the destructive strain, the sample exhibited solid gel responses, with G′ instantly restored to around 90% of the original value and a calculated HE index of 0.89. The healed hydrogels were strong enough to sustain repeated deformation; indeed, upon repeating this change of strain amplitude, the structure of the hydrogel did not change significantly from that after the first step, and essentially the same healing efficiency (HE = 0.88) was obtained. Generally, the self-healing ability of a hydrogel and the time required for the healing process increase as its viscoelastic response to deformation grows. The observed self-healing properties of the hydrogel may be related to the physical interactions between the amino groups of chitosan and the carboxylic groups of gelatin.

#### 3D Spheroid Cultures and 3D Bioprinting

To test the biocompatibility of the GE-CH hybrid hydrogel and to investigate the range of potential applications, 3D cell spheroid screening and bioprinting protocols were investigated.

The development of new 3D in vitro models is today an ambitious aim. In the case of biomaterials, embedded and bioprinted models are of great interest because they better support cell distribution and cross-talk with the surrounding matrix (as in the tumor microenvironment) and make it possible to investigate drug distribution and the matrix effect around cell aggregates (Fetah et al., 2019; Sun et al., 2020). To set up the experimental conditions for 3D embedded and bioprinted cells, the U87 human primary glioblastoma cell line was selected. Spheroids have gained increasing attention in cell biology studies because their 3D organization is closer to that of the real cell microenvironment in tissues (Sutherland et al., 1981). Nowadays, the use of spheroids or more complex organoids is an area of interest for the study of cell biology mechanisms in pathologies, for drug screening purposes, and for tissue engineering and regenerative medicine studies. Several commercial materials are currently under investigation to better control the integrity of spheroids, to study the effect of the microenvironment, and to build up complex tissue-like models. The viability of spheroids embedded in synthetic matrices or polymers has, in any case, some limitations. Natural materials can induce undesirable phenomena, like cell migration (Liu et al., 2018), or, in certain cases, the selected materials can be toxic or not appropriate for cell-adhesion purposes. The advantages of employing biomaterials in cell and spheroid encapsulation are the stability of the construct over time and the composition of the employed biomaterials (Mironov et al., 2009). The GE-CH crosslinked biomaterial was employed as a hydrogel to encapsulate and maintain U87 3D spheroids. The spheroids were cultured from U87 cells, and, once uniform spheroidal structures (diameter ranging from about 400 to 500 µm) had been obtained, the GE-CH hydrogel was employed to embed the 3D spheroids and to follow their integrity and viability over time.

The results of the histological analysis of hydrogel-embedded spheroids are reported in **Figure 6** (**Figure S2**). As shown by H&E staining, cells in the external layer exhibit a polygonal morphology and tend to be tightly packed, while, in the center of the spheroid, U87 cells are fusiform and less densely packed. However, at all of the observed time points, cells appear viable and proliferative and do not show signs of suffering or degeneration. An interesting feature of the produced hydrogel is its ability to maintain the spheroid dimensions and structure, as well as viability, over the culture time points, whereas many of the commercially available matrices result in increased cell migration (Thakuri et al., 2018).

3D bioprinting is a promising fabrication technique, not just for traditional tissue engineering applications but also for modeling pathological tissues using multiple bioinks and architecture in the third dimension (Gungor-Ozkerim et al., 2018; Sun et al., 2020). The 3D bioprinting process includes cells and polymers in the same "environment." As a consequence, the polymers and the crosslinking methodologies employed during the bioprinting process must be biocompatible and easy to tailor. Examples to date include the use of "ionic crosslinkers" (e.g., calcium or other divalent salts) or UV and photo-initiator-assisted reactions (e.g., methacrylation, acrylation; Derakhshanfar et al., 2018; Gungor-Ozkerim et al., 2018; Anil Kumar et al., 2019; Ding et al., 2019; Petta et al., 2020; Schipani et al., 2020; Sun et al., 2020). Click reactions can be conducted without catalysts or other components that can have a detrimental effect; this is advantageous for bioprinting protocols but can be challenging when heterogeneous polymers are employed. The Diels-Alder reaction has already been successfully employed to fabricate 3D bioprintable alginate biopolymers (Ooi et al., 2018). Here, we demonstrate that the same procedure can be adapted to hybrid systems of chitosan and gelatin, extending the method to cell populations that require adhesion motifs.

The bioprinting protocol was assessed by adapting the printing conditions of the hydrogel to cell culture. Human U87 glioblastoma cells were employed to preliminarily test the applicability of the produced hydrogel in bioprinting procedures as well. In detail, the hybrid hydrogel was dissolved in PBS, and the hydrogel network reaction time was reduced from 3 to 2.5 h to produce a partially crosslinked material, in order to facilitate extrusion in the presence of cells and to obtain homogeneous cell-hydrogel solutions.

The printability was tested under sterile conditions on a single Petri dish (35 mm diameter) to avoid contamination and to better control the printability and medium changes. The bioprinting process (**Video S1**) resulted in a printed construct that was stable in the medium for up to 6 days of culture, showing the stability of the crosslinked cell-laden construct without the use of additives or salts/crosslinkers to improve the hydrogel shape during the culture media changes. The LIVE/DEAD™ Viability/Cytotoxicity assay (Madl et al., 2016; Zhang et al., 2018) revealed acceptable viability of the bioprinted cells at day 1, which increased at days 3 and 6, indicating proliferation of the cells embedded in the hydrogel, as also confirmed by the imaging results (**Figures 7A,B**).

#### CONCLUSIONS

In this article, we reported the functionalization of gelatin and chitosan with methyl furan. The polymers obtained were employed with a PEG-star-MA to control the polymerization of hybrid polymers. Network formation by Diels-Alder reaction was preliminarily employed to test the performance of the obtained hydrogels for spheroid encapsulation and 3D bioprinting with the U87 cell line. The hybrid biomaterial obtained was characterized in terms of physicochemical and biological properties. It showed interesting rheological properties, including self-healing features and promising preliminary evidence for biocompatibility. Furthermore, the possibility of employing the GE-CH hydrogel in 3D bioprinting applications opens the way to more detailed studies in the field of tissue engineering and 3D culture for advanced biological studies.

#### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/**Supplementary Material**.

### AUTHOR CONTRIBUTIONS

All of the authors contributed to the implementation of the research, to the analysis of the results, and to the writing of the manuscript.

### FUNDING

The authors acknowledge funding from the EC, H2020-MSCA-ITN-2014-GA-642028, Design and development of advanced NAnomedicines to overcome Biological BArriers and to treat severe diseases (NABBA), and H2020-NMBP-15-2017-GA-760986, Integration of Nano- and Biotechnology for beta-cell and islet Transplantation (iNanoBIT). They also acknowledge funding from the Italian Ministry of Health (Grant No. RF-2016- 02362946), POR-FESR 2014-2020 Innovazione e Competitività, and Progetti Strategici di Ricerca, Sviluppo e Innovazione, Azione I.1.b.1.3-IMMUN-HUB**—**Sviluppo di nuove molecole di seconda generazione per immunoterapia oncologica.

### ACKNOWLEDGMENTS

The authors thank Dr. Tiziano Catelani for SEM analysis.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2020.00524/full#supplementary-material

Video S1 | Bioprinting process of cell-laden gelatin-chitosan hybrid hydrogel in a grid pattern.



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Magli, Rossi, Risi, Bertini, Cosentino, Crippa, Ballarini, Cavaletti, Piazza, Masseroni, Nicotra and Russo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Protein Glycoengineering: An Approach for Improving Protein Properties

Bo Ma<sup>1</sup>, Xiaoyang Guan<sup>2</sup>, Yaohao Li<sup>1,2</sup>, Shiying Shang<sup>3</sup>, Jing Li<sup>4</sup>\* and Zhongping Tan<sup>1</sup>\*

*<sup>1</sup> State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China, <sup>2</sup> Department of Chemistry and Biochemistry and BioFrontiers Institute, University of Colorado, Boulder, CO, United States, <sup>3</sup> School of Pharmaceutical Sciences, Tsinghua University, Beijing, China, <sup>4</sup> Beijing Key Laboratory of DNA Damage Response and College of Life Sciences, Capital Normal University, Beijing, China*

#### Edited by:

*Xuechen Li, The University of Hong Kong, Hong Kong*

#### Reviewed by:

*Matt Schinn, University of California, San Diego, United States; Suwei Dong, Peking University, China*

#### \*Correspondence:

*Jing Li, jing\_li@mail.cnu.edu.cn; Zhongping Tan, zhongping.tan@imm.pumc.edu.cn*

#### Specialty section:

*This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry*

Received: *18 March 2020* Accepted: *15 June 2020* Published: *23 July 2020*

#### Citation:

*Ma B, Guan X, Li Y, Shang S, Li J and Tan Z (2020) Protein Glycoengineering: An Approach for Improving Protein Properties. Front. Chem. 8:622. doi: 10.3389/fchem.2020.00622*

Natural proteins are an important source of therapeutic agents and industrial enzymes. While many of them have the potential to be used as highly effective medical treatments for a wide range of diseases or as catalysts for conversion of a range of molecules into important product types required by modern society, problems associated with poor biophysical and biological properties have limited their applications. Engineering proteins with reduced side-effects and/or improved biophysical and biological properties is therefore of great importance. As a common protein modification, glycosylation has the capacity to greatly influence these properties. Over the past three decades, research from many disciplines has established the importance of glycoengineering in overcoming the limitations of proteins. In this review, we will summarize the methods that have been used to glycoengineer proteins and briefly discuss some representative examples of these methods, with the goal of providing a general overview of this research area.

Keywords: glycoengineering, therapeutic protein, enzyme, biological method, chemical method

### INTRODUCTION

With the deepening of our understanding of biology, recombinant proteins have become an important class of biological macromolecules that are widely used in medicine, industry, agriculture, environmental protection, and other fields (Puetz and Wurm, 2019). In the arena of medicine, therapeutic proteins, such as antibodies, cytokines/growth factors, and hormones, are indispensable for the prevention and treatment of cancer, infections, autoimmune diseases, metabolic genetic diseases, and many other diseases, largely due to their advantages of high specificity, low toxicity, and defined biological functions. They are now the fastest-growing segment of the global pharma market (Owczarek et al., 2019). Proteins that are frequently utilized in industrial, agricultural, environmental protection, and other related fields are enzymes, which include amylase, lactase, lipase, phytase, xylanase, and cellulase. Enzymes have the advantages of high catalytic efficiency, high specificity, mild reaction conditions, and less pollution. Their applications in food, detergent, textile, paper, breeding, new energy, and waste management industries have greatly improved the quality of produced products, reduced environmental pollution, and promoted sustainable economic and ecological development (Arbige et al., 2019).

However, due to the nature of biological macromolecules, proteins also have their own problems. Because of their large molecular weight, complex composition and structure, many proteins have limited solubility and thermal and proteolytic stability. They can be denatured during storage or are prone to aggregation and chemical modifications, such as oxidation and deamidation. The existence of these problems can result in decreased efficacy of therapeutic proteins and increased immunogenic side effects. For enzymes, these problems could lead to their slow development and high production costs, which in turn limit their industrial applications. Scientists have been trying for many years to solve these problems (Sinha and Shukla, 2019). They have explored many different methods to engineer proteins, with the hope of improving their stability, solubility, and biological activity, decreasing the immunogenicity or other side effects of therapeutic proteins, and reducing the production costs of industrial enzymes. Among all the methods tested, glycoengineering appeared to be one of the most promising for future research.

Glycoengineering is a method of improving the properties of proteins by changing their glycosylation (Goochee et al., 1991; Sinclair and Elliott, 2005; Beck and Reichert, 2012; Dicker and Strasser, 2015). Glycosylation of proteins refers to the covalent attachment of glycans to proteins (**Figure 1**) (Spiro, 2002). Glycans can also be called carbohydrates, sugars, monosaccharides, oligosaccharides, or polysaccharides. Glycosylation is a major form of posttranslational modification (PTM) of proteins. It can occur on the side chains of many amino acid residues in a number of different ways. The two most common ways are attachment of glycans to the side chain nitrogen (N) atoms of Asn residues and to the side chain oxygen (O) atoms of Ser and Thr residues. Depending on the atoms to which glycans are linked, these two types of glycosylation are called N-linked glycosylation and O-linked glycosylation, respectively. In addition to the different side chain atoms in the glycosidic linkage, there are also many other differences between these two types of glycosylation. For example, in eukaryotic cells, where glycosylation is widely present, the first sugar residue that is directly attached to Asn is usually β-linked N-acetylglucosamine (β-GlcNAc), while the ones on Ser and Thr side chains include many different structures, such as β-GlcNAc, α-linked N-acetylgalactosamine (α-GalNAc), α-linked mannose (α-Man), α-linked fucose, β-linked xylose, and α- or β-linked galactose and glucose (**Figure 1**).

Proteins with glycosylation are called glycoproteins. Many years of research have demonstrated that glycosylation is an important PTM that plays key roles in regulating the properties of proteins (Rudd et al., 1994; Boyd et al., 1995; Van den Steen et al., 1998). By forming hydrogen bonds or other noncovalent interactions with amino acid residues of the proteins to which they are attached, glycans can improve the folding efficiency and conformational stability of proteins, prevent their abnormal aggregation, increase their water solubility, and decrease their rate of thermal denaturation, proteolytic inactivation and chemical degradation (Varki, 2017). In addition, glycans can directly participate in interactions with other macromolecules, viruses, and cells, thereby altering the substrate binding affinity, specificity, and biological activity of glycoproteins. Compared with other types of PTMs and amino acid mutations, the greatest advantage of glycoengineering is that, when glycosylation sites and glycan structures are selected appropriately, this method is capable of simultaneously improving many different properties of proteins. This advantage has aroused great interest among scientists in exploring this new frontier.

Since the 1980s, scientists have started to use glycosyltransferases and glycosidases to add sugars to and remove sugars from oligosaccharide chains of proteins by utilizing in vivo (cellular) genetic technologies and in vitro enzymatic methods (Lee et al., 1989; Lairson et al., 2008; Bennett et al., 2012; Albesa-Jove et al., 2014; Janetzko and Walker, 2014; Moremen and Haltiwanger, 2019). Their efforts have led to many important findings, and the discovery and development of many therapeutic proteins and enzymes with improved properties and functions. But on the whole, the number of successfully commercialized enzymes and approved therapeutic proteins that have been developed through protein glycoengineering is small, with probably the most well-known one being darbepoetin alfa, a novel therapeutic agent for renal anemia (Elliott et al., 2003). A possible explanation for the small number is that sufficient understanding of the structure-function relationship of protein glycosylation has not been achieved and reliable scientific theories have not been fully developed to guide the glycoengineering efforts. In order to improve the success rate of protein glycoengineering, scientists need to conduct more research into the relationship between the structure and performance of glycoproteins. Although it may take a long time to establish reliable guidelines for predicting the outcomes of protein glycoengineering, more and more encouraging results have been obtained in recent years. In this review, we will summarize and compare some of the representative results, with the goal of providing a general picture of this research area.

This review is intended to provide a brief introduction to the protein glycoengineering area. We will only touch upon a limited number of examples for each research direction. Interested readers may refer to more comprehensive reviews for detailed information (Bailey, 1991; Wright and Morrison, 1997; Saxon and Bertozzi, 2001; Bretthauer, 2003; Sinclair and Elliott, 2005; Hamilton and Gerngross, 2007; Beck et al., 2008; Beck and Reichert, 2012; Beckham et al., 2012; Baker et al., 2013; Merritt et al., 2013; Dicker and Strasser, 2015; Geisler et al., 2015; Greene et al., 2015; Buettner et al., 2018; Mimura et al., 2018; Montero-Morales and Steinkellner, 2018; Tejwani et al., 2018; Wang et al., 2018, 2019; Yates et al., 2018; Agatemor et al., 2019; Harding and Feldman, 2019; Mastrangeli et al., 2019). In addition to glycoengineering using naturally occurring glycans and glycosidic linkages to improve the properties of proteins, there are many research efforts geared toward chemical and enzymatic synthesis of glycans, development of glycan-based vaccines and adjuvants, or using unnatural glycans and site-selective conjugation chemistry to achieve protein glycoengineering objectives. Detailed discussions of these efforts are beyond the scope of this review. The necessary information about these research studies can be found in excellent review articles by Saxon and Bertozzi (2001), Sola et al. (2007), Gamblin et al. (2009), Wolfert and Boons (2013), Krasnova and Wong (2016), Wu et al. (2017), Sun et al. (2018), Wen et al. (2018), Guberman and Seeberger (2019), Moremen and Haltiwanger (2019), and Rahfeld and Withers (2020).

Protein glycosylation is defined by glycosylation sites and glycan structures. Accordingly, protein glycoengineering is carried out by varying two parameters, site and structure: more specifically, by changing the number and position of the glycosylation sites and/or by changing the structure of glycans (including linkage type, chain length, and composition) at individual glycosylation sites. Based on how glycoproteins are produced, protein glycoengineering can be roughly divided into two main categories (Wang et al., 2019). In one category, glycoproteins are produced by cell expression. In the other category, they are prepared through chemical synthesis, including biochemical and organic synthesis (Rich and Withers, 2009). Here, we will first review glycoengineering methods based on cell expression, and then discuss chemical synthesis-based glycoengineering methods.
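The two engineering parameters, site and structure, can be made concrete with a small data model. The following Python sketch is illustrative only: the `Glycoform` class, the `compare` helper, the residue positions, and the glycan compositions are all hypothetical and are not taken from any study cited here.

```python
from dataclasses import dataclass, field


@dataclass
class Glycoform:
    """A protein variant described by where it is glycosylated and with what."""
    name: str
    # Maps residue position -> glycan written as a tuple of monosaccharides.
    sites: dict = field(default_factory=dict)


def compare(parent: Glycoform, variant: Glycoform) -> dict:
    """Separate site-level changes from structure-level changes."""
    shared = set(parent.sites) & set(variant.sites)
    return {
        "sites_added": sorted(set(variant.sites) - set(parent.sites)),
        "sites_removed": sorted(set(parent.sites) - set(variant.sites)),
        "structures_changed": sorted(
            pos for pos in shared if parent.sites[pos] != variant.sites[pos]
        ),
    }


# Hypothetical positions and compositions, for illustration only.
core = ("GlcNAc", "GlcNAc", "Man", "Man", "Man")
parent = Glycoform("parent", {10: core + ("Fuc",), 25: core})
variant = Glycoform("variant", {10: core, 25: core, 40: core})

print(compare(parent, variant))
```

In this toy case the variant differs from its parent in both parameters: a new site at position 40 (a site change) and the loss of fucose at position 10 (a structure change).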

### CELL-BASED PROTEIN GLYCOENGINEERING

In the past 30 years, many different methods have been developed to engineer cells of animals, plants, insects, yeasts, bacteria, etc. to express proteins with desired glycosylation patterns. These methods mainly use gene knockout, knockdown, knockin, overexpression, mutation, or small molecule suppression technologies to change the type and concentration of glycosidases and glycosyltransferases that are available inside these cells, thereby changing the glycosylation patterns of the proteins of interest expressed in them. Recent advances in gene editing tools, especially the CRISPR/Cas9 system, have enabled more rapid and cost-effective cell glycoengineering (Chan et al., 2016; Chung et al., 2017; Mabashi-Asazuma and Jarvis, 2017; Jansing et al., 2019; Karottki et al., 2020). Currently, the most widely used cells for protein glycoengineering are mammalian cells.

#### Glycoengineering Based on Mammalian Cells

Since the 1980s, mammalian cells, mainly Chinese hamster ovary (CHO) cells, have been used for the production of glycosylated recombinant therapeutic proteins (Tejwani et al., 2018; Wang et al., 2018). Compared to human cell lines, CHO cells tend to add small amounts of the non-human glycans α-galactose (α-Gal) and N-glycolylneuraminic acid (Neu5Gc) to recombinant proteins (Hokke et al., 1990). If their quality is not well controlled, engineered glycoproteins produced by this expression system may cause an immune response. Despite this minor limitation, CHO cells offer multiple advantages. First, they can be cultured in large-scale bioreactors and their production rate of glycoproteins is much higher than that of human cells. Second, due to the natural differences in species, CHO cells are much less likely to transmit human pathogens. Because the advantages outweigh the disadvantages, CHO cells have become one of the most widely used mammalian cell expression systems for the production of glycoproteins.

One protein glycoengineering strategy based on CHO cells is to modify the structure of glycans on proteins through gene knockout technologies, so as to improve their properties. A representative example is the enhancement of the antibody-dependent cell-mediated cytotoxicity (ADCC) of immunoglobulin G (IgG) antibodies by knocking out α-1,6-fucosyltransferase (FUT8). ADCC is an important mechanism of antibody therapeutics. Antibodies recognize and bind to surface antigens of target cells (e.g., cancer cells) through the antigen binding fragments (Fab), and interact with crystallizable fragment (Fc) receptors (FcR) on effector cells (such as natural killer cells) via the Fc portion. After the interaction of Fc with FcR, immune effector cells are activated and secrete cytotoxic molecules to kill target cells. This process is called ADCC (**Figure 2**). Enhancement of ADCC can be achieved by increasing the binding affinity of antibodies to Fc receptors, which in turn can be accomplished by modifying the glycosylation of the Fc region of IgG.

The highly conserved Asn residues at position 297 (N297) of the IgG Fc regions are N-glycosylated (Wright and Morrison, 1997; Beck et al., 2008; Reusch and Tejada, 2015; Mastrangeli et al., 2019). Previous studies have found that the fucose residue attached via α-1,6-linkage to the innermost GlcNAc of the N-glycans at N297 is the key residue for modulating ADCC. Removal of the core fucose moiety from IgG-Fc glycans can significantly increase the binding affinity of Fc for FcR, thereby enhancing ADCC (Shields et al., 2002). FUT8 is the sole enzyme that catalyzes the α-1,6 transfer of fucose from GDP-fucose to N-linked oligosaccharides. Therefore, knocking out the FUT8 gene in CHO cells would be a promising method for producing therapeutic IgG antibodies with enhanced ADCC (Yamane-Ohnuki et al., 2004). This concept was validated experimentally. In a representative study, Yamane-Ohnuki et al. successfully generated FUT8<sup>−/−</sup> CHO/DG44 cell lines by sequential homologous recombination. Their expression results showed that the anti-CD20 (IgG1) antibody produced by their cell line had significantly increased binding affinity to the human receptor FcγRIIIa, and the ADCC of this antibody was enhanced approximately 100-fold compared with that of the antibody produced in normal CHO/DG44 cells.

Previous studies have shown that other monosaccharides of N-glycans attached to Asn297 of IgG could also regulate ADCC, for example, the bisecting GlcNAc linked β-1,4 to the mannosyl residue in the core pentasaccharide (**Figure 3**) (Davies et al., 2001). During the biosynthesis of N-glycans, the key enzyme that catalyzes the introduction of bisecting GlcNAc into N-glycans is β-1,4-N-acetylglucosaminyltransferase III (GnTIII). Based on this knowledge, Umana et al. (1999) constructed a CHO cell line transfected with GnTIII cDNA. By promoting the overexpression of GnTIII, they were able to obtain IgG antibodies with increased bisecting GlcNAc. Their results showed that the ADCC of the produced IgG antibody was much higher than that of antibodies containing fewer bisecting glycans, suggesting that bisecting GlcNAc has a positive impact on ADCC.

However, although both fucose removal and bisecting GlcNAc addition enhance the ADCC of antibodies, the magnitude of the increase caused by these two types of modifications is quite different. The increase caused by the modification of N-glycans with bisecting GlcNAc is generally less than 10-fold, which is much lower than that observed with the removal of fucose. In addition, the success rates of these two methods are also different. Glycoengineering carried out by removing fucose residues has a higher success rate than that by adding bisecting GlcNAc. Indeed, Yamane-Ohnuki et al. (2004) have argued that bisecting GlcNAc may have no effect on ADCC. Therefore, the former method is currently more widely used.

The high variability and controversial reliability of the results of protein glycoengineering based on bisecting GlcNAc were related to the previous lack of a clear understanding of this type of glycosylation and how it is regulated (Shinkawa et al., 2003). These glycoengineering studies were performed using empirical knowledge. Without a theoretical foundation, little was known about how glycosylation affects protein properties and under what circumstances it could improve them. Glycoengineering designed in this way lacks a rational basis and would therefore inevitably produce controversial results. To reverse this situation, a deeper and clearer understanding of protein glycosylation is required. An excellent example demonstrating this point is the work by Ferrara et al. (2006). They found that the high bisecting GlcNAc level introduced by overexpression of GnTIII inhibited core fucosylation, which led to an increase in the proportion of N-glycans without fucose (Ferrara et al., 2006). This finding suggested that the bisecting GlcNAc may regulate ADCC indirectly and that its effect is therefore not very predictable.

Increasing the content of other monosaccharides on N-glycans in the Fc region of IgG antibodies, such as the penultimate Gal and terminal N-acetylneuraminic acid (Neu5Ac/sialic acid) residues, has also been shown to improve the performance of antibodies, including enhancing their ADCC, complement-dependent cytotoxicity (CDC), and anti-inflammatory activities (Tsuchiya et al., 1989; Raymond et al., 2015). As with the previously uncertain role of bisecting GlcNAc in ADCC, the effects of the presence of Gal and Neu5Ac on IgG antibodies are also not quite clear. Again, this is mainly due to the current lack of a deep understanding of protein glycosylation. The reason why it is difficult to improve the understanding of glycosylation is that there are not many available tools to accurately control or determine the composition of glycoproteins. For example, when increasing or decreasing the expression of one or more enzymes involved in the biosynthesis of glycans, it is hard to find a robust analytical tool that would allow one to assess whether the change in their expression would affect the functions of other glycosyltransferases and/or glycosidases. Even if this is not so, the inherent heterogeneity in the sugar moieties makes it difficult to describe precisely the composition of glycoproteins produced by recombinant host cells (Kodama et al., 1991; Higel et al., 2016). Glycosylation is not template-driven, and heterogeneity of glycoproteins arises from the presence of different glycan structures at one glycosylation site (microheterogeneity) and different degrees of glycosylation site occupancy (macroheterogeneity). Due to this heterogeneity, glycoproteins typically exist as complex mixtures, which can consist of several tens to more than one hundred different glycoforms (Toll et al., 2006; Yang et al., 2016). The extent of heterogeneity can vary depending on the glycoproteins and their production methods. Researchers from many different disciplines have undertaken considerable efforts to develop and optimize methods and tools for the control and analysis of the heterogeneity of recombinant glycoproteins and have achieved encouraging success. For example, by developing computational models of protein glycosylation, researchers are now able to provide guidance on the design of optimal strategies to obtain a target glycosylation profile with desired properties (Umana and Bailey, 1997; Grainger and James, 2013; Spahn et al., 2016, 2017; Krambeck et al., 2017; Sokolov et al., 2018; Liang et al., 2020). By improving chromatographic separation and analytical tools such as capillary electrophoresis, high performance liquid chromatography and mass spectrometry, researchers have made significant advances in the determination of the identity and quantity of differently glycosylated protein forms (glycoforms) (Domann et al., 2007; Zaia, 2008; Artemenko et al., 2012; Campbell et al., 2014; Zhang et al., 2016). Continued progress in these areas is expected to further broaden and deepen the understanding of the role of different monosaccharide units in regulating the properties of antibodies, thus making the cell-based glycoengineering results more predictable in the future.
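To give a feel for how micro- and macroheterogeneity multiply, the short Python sketch below counts the glycoforms that are possible in principle. It rests on a simplifying assumption that is not made in the text: every site is treated as independent, and each site may be either unoccupied or carry any one of a fixed number of glycan structures. The function name and the example numbers are hypothetical.

```python
from math import prod


def count_glycoforms(structures_per_site, allow_unoccupied=True):
    """Upper bound on distinct glycoforms, assuming independent sites.

    structures_per_site: number of glycan structures seen at each site
                         (microheterogeneity).
    allow_unoccupied:    if True, each site may also be empty
                         (macroheterogeneity).
    """
    extra = 1 if allow_unoccupied else 0
    return prod(k + extra for k in structures_per_site)


# Hypothetical protein with three sites and four glycan structures per site.
print(count_glycoforms([4, 4, 4]))                          # 5 * 5 * 5 = 125
print(count_glycoforms([4, 4, 4], allow_unoccupied=False))  # 4 * 4 * 4 = 64
```

Even this modest, made-up example reaches the range of "several tens to more than one hundred" glycoforms, which illustrates why precise compositional analysis is demanding.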

Besides changing glycan structure at specific glycosylation sites, glycoengineering can also be performed by changing the number of glycosylation sites. The most representative example in this regard is the glycoengineering of human erythropoietin (hEPO) (Egrie and Browne, 2001). The main medical use of hEPO is to treat anemia, especially anemia caused by chronic kidney disease, cancer radiotherapy and chemotherapy. The purpose of hEPO glycoengineering, simply put, is to extend its half-life in vivo by increasing the number of its N-linked glycosylation sites. Naturally occurring hEPO contains three N-glycosylation sites and one O-glycosylation site (**Figure 4**). Neu5Ac located at the terminal position of N-linked glycans is important for the circulating half-life of proteins because it can help reduce the chance of a protein being taken up into hepatocytes by endocytosis, filtered by the glomeruli, and degraded by proteases (Morell et al., 1971). Through careful research and analysis, Elliott et al. (2004) found that it is much easier to add new N-glycosylation sites to hEPO than to increase the number of O-linked ones. The main reason for this observation is that N-glycosylation sites are defined by the consensus sequence (or sequon) Asn-Xaa-Thr/Ser, where Asn is the glycosylation site and Xaa is any natural amino acid except Pro. Although it is not guaranteed that Asn residues in all consensus sequences can be glycosylated, the probability of them bearing N-glycans is very high. They found that, unlike N-glycosylation, O-glycosylation does not appear to be controlled by the primary sequence context and has no clear consensus sequence; they suggested that it may be directed by secondary or tertiary structure and occur only at the very few sites that meet its conformational requirements (Elliott et al., 1994). Guided by these empirical findings, Elliott et al. decided to introduce only new N-glycosylation sites into hEPO via site-directed mutagenesis. When the DNA sequence encoding the mutant hEPO was expressed in CHO cells, the resulting protein carried five N-glycans and one O-glycan. The two additional N-glycans greatly increased the content of Neu5Ac on hEPO, and thus helped reduce its rate of clearance from the bloodstream and improved its clinical efficacy.
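The sequon rule described above (Asn-Xaa-Thr/Ser with Xaa ≠ Pro) is simple enough to express as a sequence scan. The Python sketch below is a minimal illustration, not the procedure used by Elliott et al.: the function names and the test sequences are hypothetical, and a match indicates only a potential site, since, as noted above, not every sequon is glycosylated in practice.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTS") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """Return 1-based positions of Asn residues that start a sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def single_mutation_candidates(seq):
    """Return 1-based positions where mutating the residue to Asn alone
    would complete a sequon with the two residues that follow it."""
    seq = seq.upper()
    return [
        i + 1
        for i in range(len(seq) - 2)
        if seq[i] != "N" and seq[i + 1] != "P" and seq[i + 2] in "ST"
    ]


# Made-up peptide sequences, one-letter amino acid code.
print(find_sequons("MNATNPSANGSK"))         # NAT and NGS match; NPS does not
print(single_mutation_candidates("AKTGS"))  # A1 -> N gives NKT; T3 -> N gives NGS
```

A scan of this kind only narrows down where site-directed mutagenesis could place a new sequon; whether the new site is occupied, and whether the mutation preserves folding and activity, still has to be determined experimentally.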

In addition to the glycoengineering method of adjusting the expression and activity of enzymes involved in glycan biosynthesis and the method of increasing the number of glycosylation sites, another commonly used method is metabolic glycoengineering, a technique developed almost thirty years ago in which protein glycosylation is altered by changing the concentrations of monosaccharides or nucleotide sugars in the culture media (Bailey, 1991; Gramer et al., 2011; Buettner et al., 2018; Agatemor et al., 2019). A representative example is the study by Gu and Wang (1998) in which 20 mM of N-acetylmannosamine (ManNAc) was added to the culture media of CHO cells. They found that the supplement was able to decrease the proportion of incompletely sialylated N-glycans at Asn97 of interferon-γ (IFN-γ) from 35 to 20% without any adverse effect on cell growth and protein production. In mammalian systems, ManNAc is a metabolic precursor for the biosynthesis of Neu5Ac. It is converted into Neu5Ac in the cytosol, and following that, Neu5Ac enters the nucleus and is activated to form CMP-Neu5Ac. Finally, CMP-Neu5Ac is transported to the Golgi apparatus where Neu5Ac is transferred to an oligosaccharide chain. In this manner, the increase in the concentration of ManNAc leads to an elevated level of Neu5Ac, which in turn leads to an extended half-life of glycoproteins. In addition to ManNAc, a wide range of metabolite precursors, glycosyltransferase inhibitors, pH modulators, as well as cell culture parameters (e.g., pH, temperature) have also been explored for protein glycoengineering (Sha et al., 2016; Ehret et al., 2019). The glycoengineering methods based on metabolism and on the regulation of enzyme expression and activity are similar in principle: both achieve changes in glycan structures by interfering with the pathway of N-glycan biosynthesis. It is thus conceivable that the metabolic glycoengineering method is also limited by the nature of the CHO cell expression system. Proteins glycoengineered using this method also exist as inseparable heterogeneous mixtures of glycoforms.

Apart from CHO cells, many other mammalian cell lines have been utilized for protein glycoengineering, the most frequently used being the mouse myeloma cell lines NS0 and SP2/0 (Lifely et al., 1995). The advantages of these cells for glycoengineering are very similar to those of CHO cells, i.e., they are also relatively easy to use and can give a high yield of proteins. Their disadvantages are also similar to those of CHO cells: the engineered glycoproteins they produce are heterogeneous mixtures and may contain traces of non-human monosaccharides such as α-Gal and Neu5Gc.

#### Glycoengineering Based on Non-mammalian Cells

Scientists have also chosen many different types of non-mammalian cells for protein glycoengineering, including plant, insect, yeast, and bacterial cells. Compared with mammalian cells, plant cells have several advantages, the most important of which is that the glycoproteins produced in plant cells are more homogeneous than those synthesized in mammalian cells (Montero-Morales and Steinkellner, 2018). The reason for this is that plant cells normally produce only a few N-glycans, with two of them, namely GnGnXF and MMXF, accounting for more than 90% of the total (**Figure 5**) (Chen, 2016). Therefore, plant cells have the potential to generate glycoproteins with better defined N-glycan structures. A high degree of homogeneity would help establish the detailed contribution of glycans to the physicochemical and biological properties of proteins, and such information would be beneficial for protein glycoengineering. Other advantages of plant cells as a production host include fast production of glycoproteins and high tolerance toward manipulation of N-glycan biosynthetic pathways. A disadvantage of plant cells is that glycoproteins produced by such cells usually contain plant-specific core α-1,3-fucose and β-1,2-xylose, which are absent in humans. Glycoproteins decorated with such monosaccharides may elicit immune responses. The advantages and disadvantages of the insect expression system are similar to those of plant cells. It is also a high-yielding expression system and easy to use, but can incorporate non-human glycan structures, including the α-1,3-fucose moiety, into target glycoproteins (Geisler et al., 2015). The major difference between these two non-mammalian expression systems is that they produce different glycan structures (**Figure 5**).

The methylotrophic yeast Pichia pastoris has also been developed for protein glycoengineering (Cereghino and Cregg, 2000; Bretthauer, 2003; Choi et al., 2008). Compared with mammalian cells, yeast can be cultured at a higher cell density, which makes glycoprotein production more efficient and production costs much lower. However, O-linked glycans in Pichia pastoris are typically linear chains of oligomannoses and N-linked glycans are of the high-mannose type (**Figure 5**). Therapeutic glycoproteins carrying such glycan structures can be easily cleared from the body due to the lack of terminal Neu5Ac residues. Recently, bacteria have also attracted great interest for their potential use in protein glycoengineering as a fast, simple, and low-cost expression system (Baker et al., 2013; Merritt et al., 2013; Yates et al., 2018). However, glycans in bacteria are also significantly different from human glycans (**Figure 5**) (Du et al., 2019; Harding and Feldman, 2019). In order to circumvent the risk of immunogenic reactions from non-human glycans, several approaches to humanizing yeast and bacterial N-glycosylation pathways have been attempted over the last twenty years (Hamilton and Gerngross, 2007). For example, Hamilton et al. (2006) engineered the protein glycosylation pathway in Pichia pastoris by knocking out four yeast-specific glycosylation genes and introducing 14 heterologous glycosylation genes. Using this humanized Pichia pastoris expression system, they were able to produce hEPO containing predominantly human N-glycans that had greater than 90% terminal sialylation. However, although many non-mammalian systems are now available for protein glycoengineering, they have not been widely applied in the production of therapeutic glycoproteins, largely due to the complexity of these expression systems.

Non-mammalian cells can also be applied for the glycoengineering of industrial enzymes. Unlike therapeutic proteins, where immune response is a concern, industrial enzymes are not products for direct human use and, therefore, there is no need to humanize the glycosylation pathways in these cells. Currently, the methods for industrial enzyme glycoengineering mainly include changing the structures of glycans on industrial enzymes by switching their expression systems, by optimizing the culture conditions, or by changing the number of glycosylation sites through amino acid mutations. In this research area, one of the more thoroughly explored industrial enzyme families is the cellulase family (Beckham et al., 2012; Greene et al., 2015). Cellulases are glycoside hydrolases (GHs) that can decompose cellulose in wood, agricultural residues and municipal solid wastes into shorter-chain sugars, such as cellodextrin, cellobiose, and glucose, which can then be converted to bioethanol through a fermentation process. In the process of bioethanol production, the enzymatic activity of cellulase plays a crucial role. In order to improve the performance of cellulase, Adney et al. (2009) inactivated the N-glycosylation site at position 384 of the Trichoderma reesei Family 7 cellobiohydrolase (TrCel7A) by mutation and expressed the resulting mutant enzyme in a different host, Aspergillus niger var. niger, which is a fungus and one of the most common species of the genus Aspergillus. By comparing the bacterial cellulose hydrolysis time courses for the wild-type TrCel7A and the mutant, they found that the removal of the glycan at N384 improved the activity of the enzyme by 70% after 120 h. However, although enzyme glycoengineering has received more and more attention in recent years, the glycoengineering outcomes are still not satisfactory. In order to get better results faster, more detailed research is needed to answer some fundamental questions that remain open. These questions are essentially the same as the ones for therapeutic protein glycoengineering research: which specific glycosylation patterns can improve the performance of enzymes, and how and why they do so.

#### CHEMISTRY-BASED PROTEIN GLYCOENGINEERING

Over the past 40 years, substantial progress has been made in all aspects of cell-based protein glycoengineering, including optimization of fermentation conditions, genetic modification of glycoprotein expression hosts, glycoprotein purification, composition analysis, and characterization. However, challenges that limit the wide application of this approach in industry and medicine still exist. As mentioned above, the challenges are mainly related to the inseparable and unpredictable nature of glycoform mixtures produced by different cells. Because it is still difficult to precisely and effectively quantify heterogeneous glycoform mixtures, it is not trivial to obtain definitive and reliable information about the changes in properties caused by protein glycoengineering. In order to meet these challenges, scientists have explored various technologies to simplify the complexity of glycoprotein samples, such as those involving the use of protein glycosylation pathway engineering and those based on the use of biochemistry and organic chemistry. These technologies do not in any simple sense replace or exclude each other, but rather complement and enrich each other.

Compared with cell-based technologies, chemistry has the advantage of being relatively more precise and flexible for the production of homogeneous glycoforms of proteins, but has the disadvantage of being more labor-intensive and less useful in large-scale production. In theory, chemistry allows for the small-scale preparation of homogeneous glycoforms with any glycans or any amino acid sequences, which can meet the requirement of structural diversity and representativeness of research samples for both basic research and protein glycoengineering studies. However, in addition to the above-mentioned disadvantage, chemistry as a tool is currently still immature: many crucial steps for glycoprotein synthesis have not been well optimized and most essential starting materials are not commercially available. It is thus still difficult for non-specialists to use chemistry to perform protein glycoengineering.

#### Glycoengineering Based on Biochemistry

Protein glycoengineering based on biochemistry methods is mainly accomplished through the use of biochemical reactions catalyzed by a variety of glycosidases and glycosyltransferases (Rothman et al., 1989; Nemansky et al., 1995; Hodoniczky et al., 2005). Glycosidases catalyze the cleavage of glycosidic bonds, while glycosyltransferases catalyze the opposite reaction, glycosidic bond formation, mainly using sugar nucleotides as glycosyl donors. Glycosidases are broadly classified as exo- and endo-glycosidases. Exo-glycosidases sequentially remove monosaccharides from the non-reducing end of glycans. Endo-glycosidases are capable of cleaving specific glycosidic bonds inside the glycan chains.

In recent years, the application of glycosidases and glycosyltransferases to protein glycoengineering, including the development of a cell-free glycoprotein synthesis technology, has greatly advanced this field (Jaroentomeechai et al., 2018; Wen et al., 2018; Kightlinger et al., 2019; Moremen and Haltiwanger, 2019; Rahfeld and Withers, 2020). A prominent outcome of this advance is the realization that subtle variations in glycan structures are important to protein performance (Washburn et al., 2015). A representative example is that by changing the glycosidic linkages between the terminal sialic acid residue and the penultimate galactose residue, Anthony et al. were able to greatly improve the therapeutic efficacy of intravenous immunoglobulin (IVIG) (Anthony et al., 2008). As a blood product, IVIG is a treatment for autoimmune diseases including immune thrombocytopenia, rheumatoid arthritis, and systemic lupus erythematosus. Just like many other glycosylated antibodies, its N-linked glycans at amino acid position 297 have many different structures, some without the terminal Neu5Ac and some with α-2,6-linked or α-2,3-linked Neu5Ac (Kaneko et al., 2006). In their work, Anthony et al. (2008) found that when IVIG was treated with α-2,6-neuraminidase, the anti-inflammatory activity of IVIG was completely lost. When digested with α-2,3-neuraminidase, its activity was not affected. This observation suggested that the anti-inflammatory activity of IVIG may be directly correlated with the presence of α-2,6-Neu5Ac. Under the guidance of this hypothesis, they first removed Neu5Ac residues from glycans at the Asn297 site of IVIG-derived Fc fragments with α-2,3/6-neuraminidase, and then used β-1,4-galactosyltransferase and α-2,6-sialyltransferase to increase their homogeneity and α-2,6-sialylation level (**Figure 3**). Biological tests confirmed that the resulting Fc fragments had the same anti-inflammatory activity at significantly reduced doses. 
The success of this glycoengineering effort illustrated the importance of increasing the level of glycoprotein homogeneity to enhance the capability of protein glycoengineering.
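The three-step remodeling described above (desialylation, galactosylation, then α-2,6-sialylation) can be illustrated with a toy model. The sketch below is not taken from the cited work: it represents each glycan antenna only by its terminal residues, assumes every enzymatic step runs to completion, and uses hypothetical function names chosen for illustration.

```python
from collections import Counter

# Each antenna is a tuple of residues read from the non-reducing end inward,
# e.g. ("Neu5Ac-a2,3", "Gal", "GlcNAc"). This is an illustrative abstraction,
# not a full glycan representation.

def neuraminidase(antenna):
    """Remove a terminal Neu5Ac of either linkage (alpha-2,3/6-neuraminidase)."""
    if antenna and antenna[0].startswith("Neu5Ac"):
        return antenna[1:]
    return antenna

def galactosyltransferase(antenna):
    """Add Gal to an antenna that ends in GlcNAc (beta-1,4-galactosyltransferase)."""
    if antenna and antenna[0] == "GlcNAc":
        return ("Gal",) + antenna
    return antenna

def sialyltransferase_a26(antenna):
    """Add alpha-2,6-linked Neu5Ac to an antenna that ends in Gal."""
    if antenna and antenna[0] == "Gal":
        return ("Neu5Ac-a2,6",) + antenna
    return antenna

def remodel(antennae):
    """Apply the three enzymatic steps in order, assuming complete conversion."""
    steps = (neuraminidase, galactosyltransferase, sialyltransferase_a26)
    remodeled = []
    for antenna in antennae:
        for step in steps:
            antenna = step(antenna)
        remodeled.append(antenna)
    return remodeled

# A hypothetical heterogeneous starting mixture of antenna termini.
mixture = [
    ("GlcNAc",),
    ("Gal", "GlcNAc"),
    ("Neu5Ac-a2,3", "Gal", "GlcNAc"),
    ("Neu5Ac-a2,6", "Gal", "GlcNAc"),
]
remodeled = remodel(mixture)
print(Counter(remodeled))
```

In this idealized model every starting antenna converges on the same α-2,6-sialylated terminus, which is the sense in which the enzymatic treatment increases homogeneity. Real reactions are rarely quantitative, so actual products would remain a mixture enriched in this form.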

Although relatively homogeneous glycoproteins can be prepared through the combined use of glycosidases and glycosyltransferases, this is a rather complex process largely due to the current limitations of glycosyltransferases and of the reactions they catalyze. Glycosyltransferases typically add specific monosaccharides one at a time to specific substrates and to specific sites on these substrates. In addition, many glycosyltransferases and sugar nucleotide donors are either expensive or not commercially available. All these facts render it not very straightforward to apply glycosyltransferase-catalyzed multistep reactions to generate a large number of homogeneous glycoforms bearing structurally closely related glycans to meet the research needs of protein glycoengineering. To overcome these limitations, it is necessary to replace the stepwise enzymatic approach with a highly convergent one. The key to achieving a convergent synthesis is to find enzymes that can catalyze the attachment of oligosaccharides relatively nonspecifically to a variety of substrates. To meet this demand, a new class of enzymes has been developed. They are named "glycosynthases" (Mackenzie et al., 1998; Malet and Planas, 1998). Glycosynthases are generally derived from glycosidases through genetic mutations. In the presence of activated oligosaccharide donors, glycosynthases can transfer en bloc the oligosaccharides onto different glycoprotein acceptors in high yields (**Figure 6**).

The glycoengineering method based on glycosynthase-catalyzed transglycosylation is similar to that based on glycosyltransferase-catalyzed reactions. It also requires the removal of a large portion of N-linked glycans from glycoproteins by glycosidases before the transglycosylation reaction. This method was invented more than two decades ago, and the currently most commonly used version employs oligosaccharide oxazolines as donor substrates (**Figure 6**; Wei et al., 2008; Wang et al., 2019). The development of glycosynthase enzymes has helped solve, to a certain extent, the problem of structural diversity of glycoforms required by glycoengineering research. For example, Lin et al. (2015) were able to generate more than a dozen homogeneous antibody glyco-variants, i.e., variants with different glycosylation patterns, using this type of enzyme. By comparing the activities of synthetic glyco-variants, they found that the complex-type biantennary N-glycans with two terminal α-2,6-linked Neu5Ac residues seemed to be optimal structures. Antibodies modified with such glycans showed enhanced activities against cancer, influenza, and inflammation.

In addition to increasing the diversity of glycoforms for research, the glycosynthase-based method can also be applied to achieve glycosylation site selectivity, that is, attaching different glycans to different glycosylation sites. In one such study, Giddens et al. (2018) successfully prepared several antibody variants with different N-glycans at the glycosylation sites in their Fc and Fab regions through the combined use of three endoglycosidases (Endo-S, Endo-S2, and Endo-F3), 1,6-fucosidase from Lactobacillus casei, and endoglycosidase mutants. They found that the antibody containing sialylated N-glycans on the Fab fragments and non-fucosylated ones on the Fc fragments had enhanced binding capacity to the FcγRIIIa receptor and greatly improved ADCC activity.

The advantage of in vitro enzymatic glycoengineering is that, by improving the structural control of protein glycosylation, it allows for relatively easy access to homogeneous glycoforms. With such research samples, quantitative structure-function relationships can be derived to guide the design of new protein glycoengineering research. However, the efficiency of this method is currently still limited by the limited commercial availability of oligosaccharide substrates, the limited range of substrates tolerated by glycosynthase enzymes, and the difficult-to-control glycosylation site selectivity. It is also challenging to use this method on a large scale. Because of these limitations, the diversity and quantity of generated samples may not be high enough to meet research requirements, and thus the research process could be slow and the identified glycoforms may not be the best choices for future use. In addition, because of the differences in the enzymes involved in protein O-glycosylation, it is currently still difficult to enzymatically transfer oligosaccharides en bloc to O-glycosylation sites. But thanks to the development of useful software like ISOGlyP and NetOglyc, it is now possible to predict O-glycosylation sites based on sequence and structure features of proteins (Hansen et al., 1998; Leung et al., 2014).

#### Glycoengineering Based on Organic Chemistry

Engineering O-linked protein glycosylation can be achieved by organic synthesis. This technique can also significantly expand the structural diversity of glycoforms. These advantages mainly come from the more precise and flexible nature of organic synthesis. Unlike many other methods, organic synthesis enables the modification of glycoprotein structures at the atomic level. In theory, it could allow scientists to prepare glycoforms containing any number of glycosylation sites and any type of glycan structures that are required for protein glycoengineering research (Price et al., 2012; Chaffey et al., 2018).

In the past two decades, with the development of synthetic methods for the preparation of glycans and proteins, total chemical synthesis of glycoproteins was also developed (Fernandez-Tejada et al., 2015). The current strategy for glycoprotein synthesis relies on native chemical ligation/metal-free desulfurization (NCL/MFD) to connect synthetic peptides and glycopeptide fragments together. After that, the resulting long glycopeptide chains can be folded in vitro to form biologically active glycoproteins. The peptides for glycoprotein synthesis can be prepared from commercially-available protected amino acids by solid-phase peptide synthesis (SPPS). N-glycopeptides can be synthesized by condensation of glycosyl amines with side-chain-unprotected aspartic acids in partially protected peptides. O-linked glycopeptides can be made by incorporating O-glycosylated amino acid building blocks during SPPS.

Among the studies undertaken, the most representative one is the case of the chemical glycoengineering of hEPO. In their study, Wang et al. (2013) first applied the NCL/MFD technology to assemble the sequence of glycosylated hEPO from three N-glycopeptides, one O-glycopeptide and one peptide, which was then folded in a cysteine-cystine redox system to produce the final three-dimensional structure of hEPO. The resulting glycoform has the expected biological activity (**Figure 4**). This work for the first time provided sufficient experimental evidence for the feasibility of protein glycoengineering based on a chemical synthesis strategy, and laid a solid foundation for further development in this research area.

Using chemical approaches, two research groups were able to develop new guidelines for N- and O-linked glycoengineering of proteins (Chaffey et al., 2018). In their work, Price et al. (2012) provided a theoretical principle to guide the design of protein N-glycoengineering, which stated that "incorporating the enhanced aromatic sequons into appropriate reverse turn types within proteins should enhance the well-known pharmacokinetic benefits of N-glycosylation-based stabilization by lowering the population of protease-susceptible unfolded and aggregation-prone misfolded states". An enhanced aromatic sequon is normally a five- or six-residue sequence that contains an aromatic amino acid located two or three residues away from the N-terminus of the consensus sequence of N-linked glycosylation (Asn-Xaa-Thr/Ser). The five-residue sequence forms a type I β-turn, while the six-residue one forms a type II β-turn. This principle was confirmed in practical applications like the N-glycoengineering of the β-sheet-rich 34-residue WW domain from the human Pin1 protein (Pin WW). By replacing loop 1 of Pin WW with a five-residue enhanced aromatic sequon, Phe16-Ala18-Asn19-Gly20-Thr21, and glycosylating Asn19 with N-GlcNAc, Price et al. (2011) were able to significantly increase its melting temperature.
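The sequence patterns described above lend themselves to a simple scan. The following is a minimal sketch, not a tool from the cited studies: it locates Asn-Xaa-Thr/Ser sequons (excluding Pro at the Xaa position, the usual convention) and reports whether an aromatic residue sits two or three positions upstream of the Asn. The function names, the example sequence, and the choice of Phe, Trp, and Tyr as the aromatic set are assumptions made for illustration.

```python
import re

# Lookahead pattern so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")
# Assumption: only Phe, Trp, and Tyr are treated as aromatic here.
AROMATIC = set("FWY")

def find_sequons(seq):
    """Return 1-based positions of Asn in N-X-S/T sequons (X is not Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]

def aromatic_offsets(seq, asn_pos):
    """Return which upstream offsets (2 and/or 3) from Asn hold an aromatic residue."""
    seq = seq.upper()
    hits = []
    for offset in (2, 3):
        idx = asn_pos - 1 - offset  # convert to a 0-based index
        if idx >= 0 and seq[idx] in AROMATIC:
            hits.append(offset)
    return hits

# Hypothetical sequence containing the motif Phe-Ala-Asn-Gly-Thr.
example = "KGSFANGTWE"
for pos in find_sequons(example):
    print(pos, aromatic_offsets(example, pos))
```

An offset of 2 corresponds to the five-residue arrangement (aromatic-Xaa-Asn-Xaa-Thr/Ser) and an offset of 3 to the six-residue one. The scan only flags candidate sequences; whether a candidate actually adopts the required reverse turn depends on the structural context and cannot be judged from sequence alone.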

By systematically studying the effects of O-linked glycans on the properties of a family 1 carbohydrate binding module, Patrick et al. established a guideline for protein O-glycoengineering, which stated that "O-linked glycoforms with better overall properties can be generated by collaboratively varying glycan structures and adjacent amino acids within unstructured regions that are important for biological function and/or susceptible to proteolytic cleavage and other undesired degradation reactions" (Chaffey et al., 2018). The validity of this guideline was confirmed by the glycoengineering study of a therapeutic protein, human insulin. In this study, they demonstrated that O-mannosylation of insulin B-chain Thr27 reduced its susceptibility to proteases and self-association (Guan et al., 2018).

However, although protein glycoengineering based on organic chemistry has some advantages, it also has a major disadvantage: organic synthesis of glycoproteins, as a new technology, has not been well optimized and can currently be used only by experienced researchers. In order to make this glycoengineering approach widely accepted and used, more effort is needed to improve the synthesis of oligosaccharides and glycopeptides and the efficiency of the ligation of peptide/glycopeptide fragments. Perhaps more importantly, it is necessary to expedite the commercialization of glycan building blocks, oligosaccharides, glycopeptides, and even synthetic glycoforms, because easy access to these substances usually helps scientists gain and maintain their interest in a research area.

## CONCLUSIONS AND OUTLOOK

Protein glycoengineering, as an important way to improve the performance of therapeutic proteins and industrial enzymes, has attracted substantial interest over the past few decades (Neustroev et al., 1993; Elliott et al., 2003). However, due to the lack of reliable guidance, this technology is still in its infancy, and the degree of its acceptance in the scientific community is not high. At present, because biology-based methods are relatively easy to implement, some of them, especially those involving the manipulation of the protein N-glycosylation pathway, are more frequently employed in protein engineering research. Although such methods provide some results more quickly, the results may carry some uncertainty due to the heterogeneity and low purity of research samples (Mimura et al., 2018). Chemistry-based methods, especially organic synthesis, can help overcome some of this uncertainty because they can produce structurally defined homogeneous glycoforms. But organic synthesis has its own weaknesses: it is difficult to use, complex, expensive, and time-consuming.

In order to solve the present predicament, these different methods need to be better combined to increase the practical applicability and the success rate of protein glycoengineering. A possible combination strategy is as follows: organic and/or enzymatic synthesis is used to deeply understand the structure-property relationships of representative model glycoproteins that have relatively small sizes and simple structures. Theoretical predictions derived from the high-level understanding of protein glycosylation can then be used to guide protein glycoengineering efforts (Umana and Bailey, 1997; Grainger and James, 2013; Spahn et al., 2016, 2017; Krambeck et al., 2017; Sokolov et al., 2018; Liang et al., 2020). Finally, cell-based methods can be used to more quickly obtain designed glycoforms on a large scale. Previous studies have suggested the feasibility of this strategy. It is expected that such a strategy, once fully established, should greatly promote the advancement of protein glycoengineering in the future.

#### AUTHOR CONTRIBUTIONS

BM, XG, YL, SS, JL, and ZT wrote the paper. All authors have given approval to the final version of the manuscript.

#### FUNDING

This work was supported by the Training Program of the Major Research Plan of National Natural Science Foundation of China (Grant No. 91853120), the General Program of the National Natural Science Foundation of China (Grant No. 31872720), the National Major Scientific and Technological Special Project of China (Grant Nos. 2018ZX09711001-005 and 2018ZX09711001-013), the National Key R&D Program of China (Grant No. 2018YFE0111400), and the NIH Research Project Grant Program (R01 EB025892).

#### ACKNOWLEDGMENTS

We would like to thank the National Natural Science Foundation of China, the Ministry of Science and Technology of China, the State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, the Chinese Academy of Medical Sciences and Peking Union Medical College, and the National Institutes of Health of the United States for funding.

#### REFERENCES

antibodies and Fc fusion proteins. Eur. J. Pharm. Biopharm. 100, 94–100. doi: 10.1016/j.ejpb.2016.01.005


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Ma, Guan, Li, Shang, Li and Tan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Cell-Free Synthetic Glycobiology: Designing and Engineering Glycomolecules Outside of Living Cells

Thapakorn Jaroentomeechai<sup>1</sup>, May N. Taw<sup>1</sup>, Mingji Li<sup>1</sup>, Alicia Aquino<sup>1</sup>, Ninad Agashe<sup>1</sup>, Sean Chung<sup>2</sup>, Michael C. Jewett<sup>3,4,5</sup> and Matthew P. DeLisa<sup>1,2</sup>\*

*<sup>1</sup> Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, United States, <sup>2</sup> Graduate Field of Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, NY, United States, <sup>3</sup> Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, United States, <sup>4</sup> Center for Synthetic Biology, Northwestern University, Evanston, IL, United States, <sup>5</sup> Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, United States*

#### Edited by:

*Zhongping Tan, Chinese Academy of Medical Sciences and Peking Union Medical College, China*

#### Reviewed by:

*Wen Yi, Zhejiang University, China Kelley Moremen, University of Georgia, United States*

> \*Correspondence: *Matthew P. DeLisa md255@cornell.edu*

#### Specialty section:

*This article was submitted to Chemical Biology, a section of the journal Frontiers in Chemistry*

Received: *06 March 2020* Accepted: *22 June 2020* Published: *29 July 2020*

#### Citation:

*Jaroentomeechai T, Taw MN, Li M, Aquino A, Agashe N, Chung S, Jewett MC and DeLisa MP (2020) Cell-Free Synthetic Glycobiology: Designing and Engineering Glycomolecules Outside of Living Cells. Front. Chem. 8:645. doi: 10.3389/fchem.2020.00645*

Glycans and glycosylated biomolecules are directly involved in almost every biological process as well as the etiology of most major diseases. Hence, glycoscience knowledge is essential to efforts aimed at addressing fundamental challenges in understanding and improving human health, protecting the environment and enhancing energy security, and developing renewable and sustainable resources that can serve as the source of next-generation materials. While much progress has been made, there remains an urgent need for new tools that can overexpress structurally uniform glycans and glycoconjugates in the quantities needed for characterization and that can be used to mechanistically dissect the enzymatic reactions and multi-enzyme assembly lines that promote their construction. To address this technology gap, cell-free synthetic glycobiology has emerged as a simplified and highly modular framework to investigate, prototype, and engineer pathways for glycan biosynthesis and biomolecule glycosylation outside the confines of living cells. From nucleotide sugars to complex glycoproteins, we summarize here recent efforts that harness the power of cell-free approaches to design, build, test, and utilize glyco-enzyme reaction networks that produce desired glycomolecules in a predictable and controllable manner. We also highlight novel cell-free methods for shedding light on poorly understood aspects of diverse glycosylation processes and engineering these processes toward desired outcomes. Taken together, cell-free synthetic glycobiology represents a promising set of tools and techniques for accelerating basic glycoscience research (e.g., deciphering the "glycan code") and its application (e.g., biomanufacturing high-value glycomolecules on demand).

Keywords: Cell-free system, chemoenzymatic synthesis, glycoscience, glycoprotein therapeutics and vaccines, metabolic glycoengineering, post-translational modification, nucleotide sugars, synthetic biology

## INTRODUCTION

Glycans and glycan-modified biomolecules including glycolipids, glycoproteins, and glycosylated small molecules (collectively referred to as glycomolecules hereafter) are ubiquitous across all domains of life. It has now been firmly established that conjugation of carbohydrate structures to biomolecules including lipids, proteins, metabolites, and nucleic acids profoundly affects their physicochemical properties, subcellular localization, immunogenicity, and pharmacokinetics and pharmacodynamics (Walt, 2012; Varki, 2017a,b). Moreover, the redesign of carbohydrate structures on glycomolecules has been demonstrated to improve their therapeutic properties such as extending half-life in vivo (Elliott et al., 2003; Chen et al., 2012), fine-tuning efficacy (Jefferis, 2009a), and enhancing vaccine-specific immunity (Berti and Adamo, 2018; Stevenson et al., 2018). At present, however, challenges associated with preparing structurally-homogeneous glycomolecules in sufficient quantities have limited our fundamental understanding of glycosylation processes and their corresponding biotechnological applications. Naturally occurring glycans are usually complex, exist in small quantities, and are present as heterogeneous mixtures or glycoforms. This heterogeneity arises because glycan biosynthesis is not template driven like nucleic acid and protein synthesis, but instead proceeds through a series of glycosylation reactions catalyzed by specific glycosyltransferase (GT) enzymes that are co-expressed in different subcellular locations (Aebi, 2013). Such processes are highly dynamic, resulting in multiple glycan structures on the glycomolecules (Varki and Kornfeld, 2015). Further complexity is added to the glycan repertoire through branching of the glycan core, the addition of terminal sugars such as sialic acids, as well as the modification of carbohydrates with functional groups such as phosphate, sulfate, and acetate. 
In addition, as glycosylation is essential for viability and highly regulated within eukaryotic cells, small perturbations in the glycosylation network can severely reduce cell fitness, further complicating glycoengineering approaches in certain living organisms (Clausen et al., 2015).

## SYNTHETIC GLYCOBIOLOGY

The term "synthetic glycobiology" was first used to describe the redesign of GT assembly lines for the production of specific glycan structures using protein engineering and chemical approaches (Czlapinski and Bertozzi, 2006). This initial definition referred narrowly to the exploitation of Golgi-resident GTs to engineer protein glycosylation inside and on the surface of eukaryotic cells, as exemplified by a number of notable glycoengineering studies in yeast (Choi et al., 2003; Hamilton et al., 2003) and more recently in mammalian cells (Meuris et al., 2014; Chang et al., 2019). These successes notwithstanding, simpler, cell-viability independent systems that permit bottom-up assembly of prescribed glycosylation pathways and controllable biosynthesis of designer glycomolecules are of great scientific and technological interest, and have the potential to be transformative. In this vein, Aebi and coworkers pioneered the first bacterial glycoprotein expression platform by transferring the N-linked glycosylation machinery from Campylobacter jejuni into laboratory strains of Escherichia coli, giving the latter the ability to transfer glycans site-specifically onto acceptor proteins (Wacker et al., 2002). Following this seminal work, numerous additional heterologous glycosylation systems have been functionally reconstituted in E. coli (Feldman et al., 2005; Ihssen et al., 2010; Hug et al., 2011; Schwarz et al., 2011; Valderrama-Rincon et al., 2012; Shang et al., 2016; Keys et al., 2017; Tytgat et al., 2019), giving this simple organism the ability to produce a diverse array of complex glycomolecules. Hence, a more current definition of synthetic glycobiology is the purposeful alteration or rational construction of any glycosylation system using chemical and molecular biological approaches in conjunction with metabolic pathway engineering tools. 
Such synthetic systems have been instrumental in increasing our understanding of glycosylation networks and producing desired glycans and glycoconjugates.

#### SYNTHETIC GLYCOBIOLOGY GOES CELL-FREE

While the majority of synthetic glycobiology efforts to date have involved living organisms, recent years have seen the emergence of cell-free systems as a new platform for synthetic glycobiologists to investigate and manipulate glycosylation outside of cells, leading to the birth of an entirely new field that we call cell-free synthetic glycobiology. Although still in its infancy, cell-free synthetic glycobiology has already helped to uncover the underlying mechanisms governing a variety of glycosylation reactions and enabled preparation of structurally-defined glycomolecules. The origins of this new field can be traced back almost 60 years, to when cell-free biology was used to decipher the genetic code (Nirenberg and Matthaei, 1961; Matthaei et al., 1962). Since that time, cell-free biology has matured into a well-established field in biological research (Carlson et al., 2011; Dudley et al., 2015; Silverman et al., 2020), undergoing a technological renaissance in the early twenty-first century (Shimizu et al., 2001; Jewett and Swartz, 2004; Jewett et al., 2008) that has catalyzed significant improvements in batch reaction yields (Caschera and Noireaux, 2014; Des Soye et al., 2019), operational volumes (Zawada et al., 2011; Yin et al., 2012), standardization of protocols (Kwon and Jewett, 2015; Kim et al., 2019), availability of various active lysate systems (Perez et al., 2016), incorporation of non-standard amino acids (Shimizu et al., 2005; Goerke and Swartz, 2009; Martin et al., 2018), and post-translational modifications (Oza et al., 2015; Jaroentomeechai et al., 2018) into biomolecule products. 
As a result, cell-free systems are now widely used, sometimes in tandem with cell-based systems, to produce complex biomolecules (Matthies et al., 2011), to prototype and optimize metabolic pathways (Karim and Jewett, 2016; Casini et al., 2018; Lim and Kim, 2019), for molecular sensing (Pardee et al., 2014, 2016), and to build and implement genetic networks (Takahashi et al., 2015; Swank et al., 2019). Owing to their open nature, cell-free reactions provide unprecedented flexibility to directly and precisely control the composition and conditions of a given system. Eliminating the biological membrane boundary also facilitates the integration of cell-free reactions with high-throughput screening tools (Su et al., 2016; Zhang et al., 2019), real-time monitoring, and automation (Georgi et al., 2016), resulting in significant reductions in design-build-test (DBT) timelines. Furthermore, the ability to harness cellular machineries without any impediments due to cell viability provides an opportunity to synthesize products and engineer biochemical pathways that otherwise exceed cellular toxicity tolerance (Kai et al., 2015; Thoring et al., 2017). Collectively, these versatile features are precisely what make cell-free platforms especially attractive for both mechanistic discovery and technological applications in glycoscience.

In the sections that follow, we describe the utility of cell-free synthetic biology to produce structurally-defined glycomolecules including nucleotide-activated monosaccharide building blocks, glycosylated small molecules, free-reducing end glycans, glycolipids, and glycoproteins including conjugate vaccines. Characteristic features of these approaches are the synthesis of glycosylation components and the assembly of these components into functional glycosylation pathways that produce glycosylated molecules of interest in a well-controlled environment without the use of intact, living cells (**Figure 1**). In its simplest form, cell-free synthetic biology involves using purified enzymes to catalyze specific glycosylation reactions in vitro (Yu and Chen, 2016; Natarajan et al., 2018). Alternatively, to circumvent the time- and labor-intensive process of enzyme purification, in situ production of glycosylation enzymes in cell-free lysates has been used to assemble multi-step glycosylation reactions (Yu and Chen, 2016; Kightlinger et al., 2019). These biosynthesis-focused methods will be discussed in detail below, along with the use of cell-free synthetic biology as a tool to characterize and evolve glycosylation enzymes and pathways for designer functions. Collectively, these greatly simplified platforms offer exquisite control over reaction fluxes and compositions, which in turn become powerful tools to understand the biotransformation of glycomolecules. The primary focus here will be on cell-free biosynthesis approaches using glycosylation enzymes; detailed reviews on total chemical synthesis of glycans and glycomolecules can be found elsewhere (Ahmadipour and Miller, 2017; Krasnova and Wong, 2019).

### CELL-FREE ENZYMATIC SYNTHESIS OF NUCLEOTIDE SUGAR BUILDING BLOCKS

Nucleotide sugars are essential to carbohydrate metabolism and glycomolecule biosynthesis in living organisms. These molecules consist of a nucleotide base linked to a monosaccharide via a mono- or pyrophosphate group. The attachment of nucleotidyl phosphate groups to monosaccharides is crucial for the recognition of sugar nucleotide-dependent (Leloir type) GTs, a major group of enzymes responsible for complex carbohydrate biosynthesis with high regio- and stereo-control. Nucleotide sugar biosynthesis has become a major focus area because the product outputs are indispensable for synthesizing complex glycoconjugates as well as for developing biochemical assays that enable characterization and engineering of glycosylation enzymes. Furthermore, these molecules and their derivatives have therapeutic potential as inhibitors of key enzymes in inflammation (Wang et al., 2018), cancer metastasis (Trapannone et al., 2016), and pathogen infection (Turnock and Ferguson, 2007).

Naturally occurring glycan repertoires are structurally complex. Yet despite this complexity, such structures in animal cells are generally diversified using only 9 common nucleotide-activated monosaccharide building blocks: UDP-Glc, UDP-Gal, UDP-Xyl, UDP-GlcNAc, UDP-GalNAc, UDP-GlcA, GDP-Man, GDP-Fuc, and CMP-Neu5Ac (**Figure 2A**). In nature, nucleotide-activated sugars are synthesized primarily from sugar-1-phosphate generated during glycolysis and, to a lesser extent, from salvage reactions whereby monosaccharides such as GalNAc, GlcNAc, Man, and Fuc are directly activated by nucleotide attachment (Cai, 2012). These general biosynthetic pathways are used as blueprints to design and construct nucleotide-activated sugars in cell-free systems.

To activate monosaccharides at the anomeric center by phosphorylation (**Figure 2B**, Scheme i), Nidetzky and coworkers devised a diastereoselective synthesis technique to prepare glucose 1-phosphate using sucrose phosphorylase and glucose 1-phosphatase (Wildberger et al., 2015). This method provides an economical route to prepare glucose 1-phosphate from simpler and cheaper inorganic phosphates, in contrast to traditional synthesis, which relies on nucleoside triphosphates (NTPs). In the past decade, several new sugar kinases have been characterized and utilized to prepare sugar 1-phosphates in vitro (Ahmadipour and Miller, 2017). The most recent example is Leminorella grimontii galactokinase (LgGalK) reported by the Flitsch group (Huang et al., 2018). This enzyme was capable of phosphorylating C1 of galactose with high efficiency. Notably, LgGalK displayed a broad substrate tolerance, accepting analogs such as 3-deoxy-3-fluorogalactose and 4-deoxy-4-fluorogalactose, pointing to its potential use in the preparation of a library of nucleotide sugar analogs and their derivatives.

The completely enzyme-catalyzed synthesis of nucleotide sugars from simple monosaccharides in a single-pot reaction has become an attractive platform due to its simplicity, versatility, and ability to be coupled with GT-catalyzed reactions. One-pot multienzyme (OPME) synthesis technology was developed to realize this goal and has now become a widely used system (Li W. et al., 2019). In its simplest form, OPME involves modification of the monosaccharide with a nucleotide pyrophosphate group using multiple enzymes in a single-pot reaction. These enzymes include a suitable glycokinase/nucleotidyltransferase to generate NDP-sugars, or a pair of Neu5Ac synthases/CMP-Neu5Ac synthetases to produce CMP-sialic acid (**Figure 2B**, Schemes i, ii). This simple one-pot reaction can then be coupled with other biosynthesis modules such as inorganic pyrophosphatase and GT reactions to construct a more elaborate carbohydrate structure on the aglycone molecule (Yu and Chen, 2016). Originally developed by the Chen group to prepare CMP-Neu5Ac (Yu et al., 2004) (**Figure 2B**, Scheme ii), state-of-the-art OPME technology is now capable of furnishing all common animal nucleotide sugars including UDP-Glc, UDP-Gal, UDP-GalNAc (Muthana et al., 2012), UDP-GlcNAc (Chen et al., 2011), UDP-GlcA (Muthana et al., 2015), UDP-Xyl (Errey et al., 2004), and GDP-Man (Li et al., 2013). In addition to natural sugars, the open nature of the cell-free OPME system facilitates the synthesis of many non-natural sugars, for example: NDP-Man (Mizanur and Pohl, 2009); UDP-CH2-Galp (Partha et al., 2010); UDP-Glc-6-deoxy-6-F (Caputi et al., 2013); 5-position base-modified sugar nucleotides (Wagstaff et al., 2015); and UDP-4-F-GlcNAc (Schultz et al., 2017). These nucleotide sugar analogs are not only useful as mechanistic probes and biochemical reporters, but could also find use as novel enzyme inhibitors.
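The cofactor bookkeeping of a minimal OPME cascade can be sketched in a few lines of Python. The sketch below assumes a three-enzyme scheme (glycokinase, nucleotidyltransferase, inorganic pyrophosphatase) with idealized 1:1 stoichiometry and complete conversion at each step; the species labels, starting amounts, and the `run_cascade` helper are illustrative assumptions and are not taken from any cited protocol.

```python
from collections import Counter

# Each step: (enzyme label, consumed species, produced species).
# Idealized stoichiometry; real reactions are equilibrium- and kinetics-limited.
OPME_STEPS = [
    ("glycokinase", {"sugar": 1, "ATP": 1}, {"sugar-1-P": 1, "ADP": 1}),
    ("nucleotidyltransferase", {"sugar-1-P": 1, "NTP": 1}, {"NDP-sugar": 1, "PPi": 1}),
    ("inorganic pyrophosphatase", {"PPi": 1, "H2O": 1}, {"Pi": 2}),
]


def run_cascade(pool, steps=OPME_STEPS):
    """Run each step to completion on a pool of species amounts (e.g., mmol)."""
    pool = Counter(pool)
    for _enzyme, consumed, produced in steps:
        # The extent of reaction is limited by the scarcest substrate.
        extent = min(pool[s] / n for s, n in consumed.items())
        for s, n in consumed.items():
            pool[s] -= n * extent
        for s, n in produced.items():
            pool[s] += n * extent
    # Report only species that remain in a non-negligible amount.
    return {s: amt for s, amt in pool.items() if amt > 1e-12}


# Hypothetical starting pool: slight excess of ATP and NTP over the sugar.
start = {"sugar": 1.0, "ATP": 1.2, "NTP": 1.2, "H2O": 1000.0}
final = run_cascade(start)
print(final)
```

In this idealized picture, the pyrophosphatase step removes PPi entirely, which is one way to see why it is commonly added to pull nucleotidyltransferase reactions toward the NDP-sugar product.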

An emerging approach for the biosynthesis of nucleotide sugars is the reverse glycosidic bond strategy (**Figure 2B**, Scheme iii), which was first described by Thorson and co-workers (Zhang et al., 2006; Gantt et al., 2011). Breaking glycosidic bonds to form nucleotide sugars is thermodynamically unfavorable as it produces highly energetic molecules from a relatively stable covalent bond between the carbohydrate and aglycone. To overcome this thermodynamic barrier, the authors designed a series of aromatic sugar donors and successfully utilized these glycoside donors to shift the equilibria toward glycosidic bond breakage in a reaction catalyzed by a macrolide-inactivated GT mutant, namely the OleD variant TDP-16. The enzyme was further evolved by the same group to a Loki variant that is capable of recognizing a broader set of sugar donors and NDP acceptors (Gantt et al., 2013). Taken together, the OPME platform and the reverse glycosidic bond approach have provided an ever-expanding library of nucleotide sugars that can be used to assemble more elaborate glycomolecules for fundamental studies and applications in glycoscience.
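The thermodynamic argument can be made explicit with standard free-energy relations. The expressions below are textbook relations written for a generic aryl glycoside donor (Ar-O-sugar) and its released phenol (ArOH); the symbols are introduced here for illustration and are not taken from the cited studies.

$$
\text{Ar-O-sugar} + \text{NDP} \rightleftharpoons \text{NDP-sugar} + \text{ArOH},
\qquad
K_{eq} = \frac{[\text{NDP-sugar}]_{eq}\,[\text{ArOH}]_{eq}}{[\text{Ar-O-sugar}]_{eq}\,[\text{NDP}]_{eq}}
$$

$$
\Delta G = \Delta G^{\circ\prime} + RT \ln Q = RT \ln \frac{Q}{K_{eq}},
\qquad
\Delta G^{\circ\prime} = -RT \ln K_{eq}
$$

Here $Q$ is the same concentration ratio evaluated at the current, non-equilibrium composition. Net NDP-sugar formation proceeds while $Q < K_{eq}$, so a donor whose aglycone is a better leaving group (larger $K_{eq}$), or a large excess of donor (smaller $Q$), would be expected to push the reaction toward glycosidic bond breakage.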

### CELL-FREE BIOSYNTHESIS OF GLYCOSYLATED SMALL MOLECULES

The past decade has seen the emergence of cell-free systems, both in the form of purified enzymes and enzyme-enriched crude extracts, as a platform to supply high-value and commodity chemicals. By sidestepping the use of live-cell factories, cell-free systems enable biosynthesis schemes in which all resources can be focused toward preparation of the desired product while at the same time allowing a wider range of precisely-controllable operational conditions (Dudley et al., 2015). More importantly, liberation from cell viability allows cell-free reactions to synthesize target molecules at concentrations that exceed the cellular toxicity limit (Swartz, 2018). Together, these features have made cell-free systems an attractive platform for high-yield biomanufacturing of target compounds as well as for prototyping novel biosynthetic routes to their production. To date, several metabolic pathways and enzyme cascades have been implemented in cell-free formats including, but not limited to, those synthesizing antibiotics (Kim et al., 2000), cannabinoid precursors (Valliere et al., 2019), commodity alcohols (Guterl et al., 2012; Kay and Jewett, 2015, 2020), food-grade antimicrobials (Kawai et al., 2003), glycolysis intermediates (Bogorad et al., 2013), hydrogen gas fuel (Martin Del Campo et al., 2013), isoprene compounds (Korman et al., 2014; Dudley et al., 2016), natural products (Goering et al., 2017), and nucleotides (Schultheisz et al., 2008, 2011).

FIGURE 2 | *(Continued)* CMP-sialic acid synthetase to generate CMP-Neu5Ac. This synthetic scheme has also been successfully used to synthesize other CMP-sialic acids including *N*-glycolylneuraminic acid (Neu5Gc) and 2-keto-3-deoxy-d-glycero-d-galactonononic acid (Kdn) as well as their derivatives. (Scheme iii) Alternatively, the reverse glycosidic bond reaction utilizes aromatic glycosyl donors such as 2-chloro-4-nitrophenol (CINP) monosaccharide as a substrate in the reaction, which is converted to NDP-sugars by OleD/OleD Loki enzymes. Structures of nucleotide bases and phosphate groups are omitted for clarity. ATP, adenosine triphosphate; ADP, adenosine diphosphate; NTP, nucleoside triphosphate; PPi, inorganic pyrophosphate; and CTP, cytidine triphosphate.

Natural products and their derivatives including flavonoids, alkaloids, polyphenols, terpenoids, antibiotics, vitamins, and sweeteners are a major group of high-value chemicals with utility as anti-cancer, anti-inflammatory, antioxidant, and antibacterial agents. However, their clinical evaluation and utility are often limited by poor solubility, low stability, and severe toxicity resulting from their inherent structural properties. Modifying such chemicals with a carbohydrate moiety to form the O-, N-, S-, or C-linked glycosides is a universal way to circumvent these limitations (Desmet et al., 2012). These glycosylation reactions are generally mediated by Leloir-type GTs with different types of glycosyl donors. For example, OleD from Streptomyces antibioticus and YjiC from various Bacillus species are extensively used for cell-free glycosylation of small molecules. Both enzymes accept a diverse set of NDP-sugars as glycosyl donors (Gantt et al., 2013) and show promiscuous substrate specificity (Zhou et al., 2013). The Thorson group conducted a pilot-scale, cell-free reaction in which purified OleD was shown to glycosylate more than 100 small molecules covering various classes of natural products including flavonoids, alkaloids, antibiotics, steroids, and stilbenes (Zhou et al., 2013). A similar study by Sohng and coworkers revealed that purified YjiC from Bacillus licheniformis can glucosylate more than 23 structurally diverse flavonoids with high efficiency (∼80–100% conversion) in a cell-free reaction (Pandey et al., 2014). YjiC from a related species, Bacillus subtilis 168, was also capable of in vitro glucosylation of numerous drug-like molecular scaffolds including 19 diverse structures of flavonoids, phenylketones, curcuminoids, lignins, triterpenes, anthraquinone, stilbene, zingerone, and aromatic aglycones with nucleophilic groups (Dai et al., 2017).
Notably, both OleD (Gantt et al., 2008) and YjiC (Dai et al., 2017) are multi-functional GTs that can catalyze the formation of O-, N-, and S-glycosidic linkages. Along similar lines, Walsh and coworkers reported cell-free C-linked glycosylation using purified C-glycosyltransferase IroB from uropathogenic E. coli strain CFT073 that was capable of decorating enterobactin substrates with several glucose molecules (Fischbach et al., 2005).

While much attention has been given to GT-mediated small-molecule glycosylation using nucleotide-activated sugars, alternative biosynthetic routes using either glycoside phosphorylases (GPs) or glycosyl hydrolases (GHs) should not be overlooked. GP-catalyzed reactions can achieve strict stereo- and regio-selective synthesis from a relatively stable and economically-viable sugar phosphate substrate, making this synthetic route especially attractive for large-scale synthesis (Nakai et al., 2013). By using purified sucrose phosphorylase and cellobiose phosphorylase with sucrose as the glycan donor, Desmet and colleagues were able to prepare a series of structurally diverse glycosylated phenolic compounds, albeit at relatively low yields compared to GT-driven reactions (De Winter et al., 2015). The resulting glycosides exhibited significantly improved solubility and thermal stability, although their antioxidant activities were decreased to different extents. Alternatively, non-activated sugars (e.g., sucrose, starch) can also be used as glycan donors for GH enzymes to glycosylate small molecules. Many GH enzymes exhibit dual functionalities and are thus capable of catalyzing both hydrolysis (glycosidic bond breakage) and transglycosylation (glycosidic bond formation) reactions. Whereas GHs generally catalyze glycoside hydrolysis in vivo, the equilibrium of their reactions can be effectively reversed in vitro under certain conditions, making these enzymes suitable for cell-free transglycosylation reactions (Mladenoska, 2016). One representative study utilized a cyclodextrin glucanotransferase from Thermoanaerobacter sp. to transfer a glucosyl group from starch to pterostilbene in vitro (Gonzalez-Alfonso et al., 2018). Although glycosylation was found to slightly reduce its antioxidant activity, the resulting pterostilbene α-D-glucopyranoside exhibited improved solubility and reduced toxicity.

### ENZYME-MEDIATED IN VITRO TECHNOLOGIES FOR ASSEMBLING GLYCANS AND GLYCOLIPIDS

Complex carbohydrates or glycans in their unconjugated form (free glycans) are valuable reagents, finding use in both fundamental research and biomedical applications. An outstanding example is the use of structurally-defined free glycans to construct glycoarrays that enable high-throughput screening of molecular interactions between glycan epitopes and carbohydrate-binding entities including proteins and even whole organisms (Rillahan and Paulson, 2011). Since their first report (Fukui et al., 2002; Blixt et al., 2004), glycoarrays have proven to be tremendously useful for the discovery of antibodies, lectins, and immune receptors against carbohydrate antigens as well as for determining the substrate-specificity of various GTs (Blixt et al., 2008; Wen et al., 2018). To date, the Consortium for Functional Glycomics (CFG) and the Glycosciences Laboratory at Imperial College have developed two of the largest glycoarray libraries, respectively, consisting of ∼609 mammalian glycans (Mcquillan et al., 2019) and ∼796 neoglycolipid glycan structures (Palma et al., 2014; Li and Feizi, 2018). Another important application of free glycans is in the synthesis and development of conjugate vaccines, which are particularly effective against various bacterial pathogens (Moeller et al., 2018).

One of the major impediments to using free glycans as described above is the accessibility of pure glycans in sufficient quantities. Initially, glycan libraries were obtained from natural sources such as microbes, plants, or animal products. In this process, glycans were separated from their bioconjugates through chemical or enzymatic hydrolysis followed by tedious, multistep purifications (Rillahan and Paulson, 2011). However, the high diversity of glycan structures present in natural samples makes it very difficult to acquire highly pure compounds using this approach. An alternative to harvesting glycans from natural sources is the use of chemical synthesis methods to generate free glycans from simple monosaccharide precursors. Chemical synthesis typically involves performing iterative rounds of glycosylation reactions utilizing a protecting group scheme that enables functionalization of a single hydroxyl group for sugar attachment. However, such de novo synthesis requires lengthy organic chemistry procedures, often necessitating highly specialized individuals and instrumentation. Some of these limitations have been alleviated by the introduction of automated solid-phase oligosaccharide synthesizers for the rapid synthesis of glycans as described by Seeberger and coworkers (Plante et al., 2001). By adopting solid-phase synthesis, excess amounts of glycosyl donor can be used to drive reactions to completion and the removal of unwanted side products or reagents can be done in a single wash step. Since its inception, the technology has matured into a fully commercial system known as Glyconeer 2.1 (Hahm et al., 2017). These developments notwithstanding, the chemical synthesis of glycans remains a significant challenge due to the complexity in achieving stereo- and regio-selective synthesis.
Selecting appropriate protective groups to achieve the desired glycosidic linkage remains one of the main hurdles and becomes more difficult as the complexity of the glycan architecture increases.

To circumvent the need for protecting group manipulation, the development of cell-free glycan synthesis systems that leverage enzymes such as GTs, GHs, and other glycan-processing enzymes is an attractive alternative. Enzymatic glycosylation permits precise stereo- and regio-controlled synthesis with high conversions using unprotected monosaccharides as substrates. Reactions generally proceed under mild, aqueous conditions without the need for toxic and harsh organic reagents. Using bio- and/or chemoenzymatic synthesis tools, several natural and engineered glycan libraries have recently been constructed including asymmetric multi-antennary N-glycans (Wang Z. et al., 2013), glycosphingolipid glycans (Yu et al., 2016), authentic human type N-glycans (Li L. et al., 2015; Hamilton et al., 2017), O-mannosyl glycans (Meng et al., 2018; Wang S. et al., 2018), human milk oligosaccharides (HMOs) (Xiao et al., 2016; Prudden et al., 2017), and tumor-associated antigens (Li P. J. et al., 2019; 'T Hart et al., 2019). Similar strategies have been adopted for cell-free enzymatic synthesis of glycolipid libraries including those from bacterial (Glover K. et al., 2005), animal (Stubs et al., 2010), and human origins (Li S. T. et al., 2019). Many of these glycan and glycolipid libraries have been employed to construct glycan microarrays for profiling glycan-binding molecules such as lectins and antibodies as well as for gaining mechanistic insights into glycosylation reactions.

Cell-free enzymatic glycan synthesis can also be integrated with automated systems for more expeditious glycan assembly. To achieve this goal, several developments that simplify purification processes and increase conversion efficiencies have recently been reported. For example, the Linhardt group demonstrated the use of a fluorous tag to capture heparan sulfate products directly from solution (Cai et al., 2014). Additional advances include a photocleavable linker that enables chemoenzymatic synthesis of tumor-associated glycan epitopes (Bello et al., 2015) and an ion-exchange purification technique that aids in cell-free biosynthesis of HMOs (Zhu et al., 2017). These advances, along with many others, have been instrumental in realizing the goal of a fully automated enzymatic glycan synthesizer, several of which have now been reported or are in late stages of development. For example, Nishimura and coworkers developed an artificial "Golgi apparatus" to prepare sialyl Lewis X derivatives using a dendrimer-based solid support (Matsushita et al., 2010). Their process took 4 days and provided an overall yield of 16%. One of the main challenges of the Golgi apparatus was that it required multiple filtration-purification steps that hindered its efficiency. To address these challenges, Wang and colleagues recently combined a thermosensitive polymer with a commercially available peptide synthesizer to mediate automated glycan assembly (Zhang et al., 2018). Their system was able to prepare several blood group antigens and ganglioside glycans with yields ranging from 27 to 38% within 1–2 days. Around the same time, Boons and coworkers developed a similar automated system using a set of water-soluble sulfate tags for a catch-and-release synthesis strategy (Li T. et al., 2019). The sulfate tags were compatible with a range of glycosylation enzymes and, more importantly, were readily adapted to a custom-designed automated glycosynthesizer.
Using this fully automated platform, quantitative amounts of complex glycans including gangliosides, HMOs, poly-N-acetyllactosamine (poly-LacNAc) derivatives, and N-glycans could be prepared in a less labor- and time-intensive process. Despite the small number of examples, the use of automated enzymatic glycan synthesis platforms shows significant promise, both as standalone systems and in combination with automated chemical synthesis (Fair et al., 2015). With rapid developments in automation, instrumentation, solid-support matrices, reliable tags and linkers, as well as a growing collection of accessible glycosylation enzymes, a fully mature and reliable enzymatic glycan synthesizer capable of synthesizing virtually any complex carbohydrate structure appears to be within reach.
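To put the overall yields quoted above in perspective, it can help to relate an overall yield to the average per-step yield of a linear sequence. The following is a minimal sketch: the number of steps (`N_STEPS`) is a hypothetical value chosen for illustration, since the text does not report step counts, and the helper names are our own.

```python
import math


def overall_yield(step_yields):
    """Overall fractional yield of a linear sequence: product of step yields."""
    return math.prod(step_yields)


def mean_step_yield(overall, n_steps):
    """Geometric-mean per-step yield implied by an overall fractional yield."""
    if not 0 < overall <= 1:
        raise ValueError("overall yield must be in (0, 1]")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    return overall ** (1.0 / n_steps)


N_STEPS = 6  # hypothetical step count, not reported in the text

for label, y in [("16% overall", 0.16), ("27% overall", 0.27), ("38% overall", 0.38)]:
    per_step = mean_step_yield(y, N_STEPS)
    print(f"{label}: about {per_step:.0%} per step over {N_STEPS} steps")
```

Because losses compound multiplicatively, even modest per-step losses during purification translate into low overall yields, which is why the tagging and catch-and-release strategies described above focus on simplifying each purification step.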

### CELL-BASED AND CELL-FREE BIOSYNTHESIS OF STRUCTURALLY-DEFINED GLYCOPROTEINS

Protein glycosylation, the covalent attachment of glycans onto specific amino acid residues within a polypeptide chain, is one of the most common post-translational modifications in nature (Apweiler et al., 1999; Khoury et al., 2011). The attached glycan can significantly affect the intrinsic properties of its recipient protein such as folding, stability, localization, antigenicity, and immunogenicity (Helenius and Aebi, 2001; Shental-Bechor and Levy, 2009; Skropeta, 2009). Aberrant protein glycosylation is widely linked to disease states such as cancer and autoimmune diseases (Ohtsubo and Marth, 2006; Peixoto et al., 2019). Furthermore, the majority of therapeutic proteins including monoclonal antibodies are glycosylated and the manner of glycosylation often determines protein drug stability and biological function (Jefferis, 2009a).

More than 40 different types of carbohydrate-to-protein linkages have been identified to date. Among these, glycan installation at asparagine (N-linked) and serine/threonine (O-linked) residues accounts for the greatest proportion of glycoproteins (Spiro, 2002). Protein glycosylation is highly dynamic and the glycan profile is controlled both spatially and temporally by the amino acid sequence, the local structural conformation of the glycosylation site, and the expression level of glycoenzymes at different stages of cellular development (Colley et al., 2015). Thus, glycoproteins are generally found in nature as a mixture of glycoforms sharing the same protein backbone but a variety of glycan structures. This intrinsic heterogeneity makes it challenging to decipher how specific glycoforms impact the structure and function of a modified protein. It has also proven to be a major impediment to the development of glycoprotein-based therapeutics, as a consistent ratio and identity of glycoforms are essential for reproducible clinical efficacy and safety of the biologic (Wang and Lomino, 2012). To address these challenges, a variety of glycoengineering approaches have been reported that involve the design and construction of molecular, cellular, and whole-organism systems with tunable glycosylation. In the following sections, we describe recent progress in cellular glycoengineering as well as highlight emerging cell-free technologies that leverage diverse glycoenzymes to produce structurally-defined glycoproteins for a range of downstream applications.
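The scale of glycoform heterogeneity can be illustrated with a simple combinatorial count. The sketch below assumes, purely for illustration, that each glycosylation site independently carries one glycan from a small set of options (or remains unoccupied); the site names and glycan labels are hypothetical placeholders, and real glycan profiles are neither independent across sites nor uniformly distributed.

```python
from itertools import product
from math import prod


def count_glycoforms(options_per_site):
    """Number of distinct glycoforms if sites vary independently."""
    return prod(len(options) for options in options_per_site.values())


def enumerate_glycoforms(options_per_site):
    """Yield every site-to-glycan assignment as a dict."""
    sites = sorted(options_per_site)
    for combo in product(*(options_per_site[site] for site in sites)):
        yield dict(zip(sites, combo))


# Hypothetical protein with three N-glycosylation sites.
sites = {
    "site_1": ["none", "M3", "G2"],
    "site_2": ["none", "G2", "G2FS2"],
    "site_3": ["G2", "G2FS2"],
}

print(count_glycoforms(sites))  # 3 * 3 * 2 = 18
```

Even this small hypothetical case yields 18 glycoforms on a single protein backbone, which gives a sense of why obtaining a single, defined glycoform from a natural mixture is difficult.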

#### Cell-Based Glycoengineering

There is a long history of cellular glycoengineering in eukaryotes including in mammalian cells, plants, and yeasts (Bertozzi et al., 2009). Among these, the glycoengineering of Chinese hamster ovary (CHO) cultures has dominated the field as it is still the most commonly used host cell line in the biopharmaceutical industry (Walsh, 2018). Many groups have explored glycosylation control using genetic manipulation to overexpress genes encoding glycoenzymes such as Golgi-resident GTs (Weikert et al., 1999; Son et al., 2011) (**Figure 3A**). Small molecule inhibitors targeting glycoenzymes such as kifunensine (Elbein et al., 1990) and swainsonine (Elbein et al., 1981) have also been successfully used to regulate a protein's glycoform in CHO culture (Ehret et al., 2019). More recently, systems biology and bioinformatics tools have been used to model glycosylation reaction networks in order to explore and quantify how perturbations to glycosylation parameters affect the cell (Neelamegham and Liu, 2011). Coupling this insight with precise genome editing tools will offer unprecedented freedom to glycoengineer organisms with greater control over glycoprotein products. A landmark achievement in this regard was reported by the Clausen group whereby quantitative genomics data and precise genome editing were used to generate a panel of CHO cells with specific GT gene knock-outs (Yang et al., 2015). These glycoengineered CHO cells were used to screen and identify GT genes that play a major role in regulating protein N-glycosylation within the CHO cell glycome. Such knowledge, in turn, provided a blueprint for genetic reconstruction of CHO cells with desirable glycosylation capacities including those producing human-like α2,6-linked sialic acid-capped glycoforms on therapeutic proteins such as human IgG and erythropoietin (EPO) (Yang et al., 2015; Caval et al., 2018; Schulz et al., 2018).
Another notable example from the Weiss group explored the use of CRISPR/Cas9 to implement synthetic gene circuits in CHO cells, allowing tunable N-glycan profiles of CHO culture-derived IgGs in a small molecule concentration-dependent manner (Chang et al., 2019). More recently, precision gene editing was used to create a library of validated CRISPR/Cas9 guide RNA targeting constructs for all human GT genes (Narimatsu et al., 2018). This gRNA library was subsequently applied to create an array of HEK293 cells displaying the human glycome (Narimatsu et al., 2019). This cell-based library of human glycan structures should become a valuable resource for dissecting glycan biosynthesis and glycomolecule interactions within a native physiological context (Narimatsu et al., 2019). It should also be pointed out that advances in cell-based glycoengineering have extended beyond mammalian cells, with significant progress toward producing homogeneous glycoforms in other eukaryotes including yeast (Hamilton et al., 2003; Wildt and Gerngross, 2005), microalgae (Barolo et al., 2020), insect cells (Toth et al., 2014), and plant cell cultures (Montero-Morales and Steinkellner, 2018; Hurtado et al., 2020). Comprehensive reviews of the glycoengineering approaches developed in these eukaryotic systems have been published elsewhere (Hamilton and Zha, 2015; Heffner et al., 2018).

Not to be outdone, glycoengineering in prokaryotes has emerged as an attractive strategy for cell-based production of homogeneous glycoproteins (**Figure 3A**). The discovery of a bona fide N-linked protein glycosylation pathway in the mucosal bacterium C. jejuni (Szymanski et al., 1999; Gross et al., 2008), and its functional reconstitution in E. coli (Wacker et al., 2002), laid the foundation for the development of a bacterial glycoengineering system. Owing to its lack of any native protein glycosylation systems, E. coli offers a blank canvas on which prescribed, orthogonal glycosylation pathways can be assembled without concern over interference from endogenous glycoenzymes. Combined with its fast growth, ease of genetic manipulation, and the ability to express a wide range of recombinant proteins, E. coli cells equipped with glycosylation machinery are capable of biosynthesizing designer glycoproteins bearing various therapeutically-important glycan epitopes such as the eukaryotic core N-glycan Man3GlcNAc2 (Valderrama-Rincon et al., 2012; Glasscock et al., 2018), bacterial O-polysaccharide (O-PS) antigen structures (Feldman et al., 2005), human blood group antigens (Hug et al., 2011; Shang et al., 2016), authentic human O-glycans (Du et al., 2018; Natarajan et al., 2020), and polysialic acid-containing glycans (Keys et al., 2017; Tytgat et al., 2019). Taken together, efforts in cellular glycoengineering have yielded a variety of expression platforms, both prokaryotic and eukaryotic, for producing glycoproteins with chemically-defined carbohydrate structures. Further improvement of the existing methods as well as the invention of entirely new technologies are anticipated to expand the glycoprotein expression toolkit available to scientists and engineers.

FIGURE 3 | (*O*-PS) antigens. (B) Cell-free glycoengineering using glycoenzymes including (i) endoglycosynthases (ENGases), (ii) prokaryotic oligosaccharyltransferases (OSTs), and (iii) glycosyltransferases (GTs). For ENGase-mediated glycosylation, glycoproteins bearing heterogeneous *N*-glycoforms are deglycosylated using specific glycosyl hydrolases (GHs) to generate a monosaccharide handle such as GlcNAc at the native glycosylation site. Pre-synthesized glycan structures containing an oxazoline functional group at the reducing end are then used as glycosyl donors in a reaction catalyzed by ENGase to remodel glycans to homogeneity. For prokaryotic OST-mediated glycosylation, cell-free extract is generated from glycoengineered *E. coli* such that the extract is enriched with all necessary gene transcription, protein translation, and protein glycosylation machineries. Supplementing extracts with DNA encoding the target protein co-activates protein synthesis and site-specific protein glycosylation. For GT-mediated protein glycosylation, sequential glycosylation reactions are carried out, beginning with installation of an initial monosaccharide on the protein using a specific GT such as *Ap*NGT, which installs Glc on asparagine residues, or ppGalNAcT, which modifies serines or threonines with GalNAc. The monosaccharide primer can then be extended, directly on the glycoprotein, by a series of specific GTs such as GalT and SiaT to generate a final glycoform.

#### Cell-Free Glycoengineering

While glycoengineering in living cells is offering novel engineered organisms with desirable glycosylation capacities (Steentoft et al., 2014; Tejwani et al., 2018), the repertoire of accessible glycan structures remains limited. Moreover, genetic manipulation of host cells is often non-trivial as it is constrained by a multitude of factors such as the essential nature of cellular glycosylation and its impact on cell viability, difficulties in precisely tuning the expression of glycosylation components, and intracellular complexity especially with respect to the plethora of native GTs that compete for glycosylation substrates and catalyze the formation of unwanted glycoforms. On the other hand, cell-free approaches are not restricted by these cellular limitations and can provide more stringent control over glycan assembly and installation reactions to obtain highly pure, structurally-defined glycoproteins. Many of the early efforts in cell-free glycoengineering focused on total synthesis of glycoproteins using native chemical ligation and/or chemoselective ligation (Wang and Davis, 2013), and significant progress on this front has been made as documented in reports describing the assembly of large and complex glycoproteins including the α- and β-subunits of human hormone (Aussedat et al., 2012; Nagorny et al., 2012), interferon (Sakamoto et al., 2012), RNase C (Piontek et al., 2009a,b), and human erythropoietin (Wang P. et al., 2013). In parallel, enzyme-mediated cell-free glycoprotein synthesis is emerging as a tool to complement chemical methods for synthesizing homogeneous glycoproteins. As mentioned earlier, enzymatic glycosylation offers precise control over stereo- and regio-chemistry without the need for chemical protecting groups, making it especially attractive for preparative-scale biosynthesis of complex glycoproteins. In the text that follows, we present three major cell-free enzymatic approaches for preparing glycan-defined glycoproteins.

##### Endoglycosynthase-Mediated Preparation of Homogeneous N-Glycoproteins

GHs are a class of glycoenzymes responsible for breaking specific glycosidic bonds in glycomolecules. They exhibit dual functionalities depending on whether a water molecule (hydrolysis reaction) or an activated –OH group of another carbohydrate acceptor (transglycosylation reaction) attacks an enzyme-substrate complex during catalysis (Li and Wang, 2018). The latter activity has pointed to the potential use of GHs in preparing glycoproteins by the en bloc transfer of presynthesized glycans from a glycosyl donor onto an acceptor protein (**Figure 3B**, Scheme i). The most commonly used acceptor is a protein containing a single GlcNAc moiety, which can be generated via chemical synthesis or by enzymatic glycan trimming of glycoproteins derived from mammalian (Goodfellow et al., 2012; Giddens et al., 2016), yeast (Liu et al., 2018), or microbial cultures (Schwarz et al., 2010).

Following initial attempts to use transglycosylation to synthesize glycosides (Kobayashi et al., 1991), two seminal discoveries have significantly propelled progress in the field: (i) the generation of GH mutants called glycosynthases that favor transglycosylation over hydrolysis (Mackenzie et al., 1998); and (ii) the use of sugar oxazolines as glycosyl donors for glycosynthases, which dramatically improves transglycosylation yields (Umekawa et al., 2008). To date, more than a dozen GHs including β-glycosidases (GH 1), α-galactosidases (GH 35, 36), α-fucosidases (GH 29), and endohexosaminidases (GH 18, 20, 25, 56, 84, and 85) have been cataloged and transformed into glycosynthases (Danby and Withers, 2016). Among these, endo-β-N-acetylglucosaminidases (ENGases) such as those isolated from Arthrobacter protophormiae (EndoA), Mucor hiemalis (EndoM), Streptococcus species (EndoD, S, and H), and Elizabethkingia meningoseptica (EndoF3) of the GH 18 and 85 families, have attracted the greatest attention due to their ability to cleave between the chitobiose core of N-glycans (Barreaud et al., 1995). A series of ENGase mutants with improved transglycosylation activity has been isolated and successfully used for convergent synthesis and glycan remodeling of diverse N-glycoproteins including Saposin C glycopeptide with a complex-type N-nonasaccharide (Hojo et al., 2012), RNase B bearing a high mannose-type N-glycan (Takegawa et al., 1995; Amin et al., 2011), glycosylated insulin (Tomabechi et al., 2010), glycosylated HIV peptide antigen (Amin et al., 2013), fibrinogen (Giddens et al., 2016), and, most notably, IgG-Fc with a homogeneous glycoform (Fan et al., 2012).

Monoclonal antibodies (mAbs) continue to be one of the fastest growing classes of biotherapeutics (Walsh, 2018). All therapeutic mAbs contain an N-glycan at the conserved N297 residue within the Fc region. The impact of the Fc glycan on the physicochemical properties and effector functions of mAbs has been well-documented (Jefferis, 2009b). Thus, the ability to obtain mAbs in a pure glycoform not only guarantees a reproducible route for producing safe biologics, but also opens the door for engineering more effective therapeutic mAbs. With this in mind, Lai-Xi Wang and coworkers isolated two EndoS mutants that could effectively remodel the glycan of an intact IgG (Huang et al., 2012). The utility of their approach was demonstrated through the remodeling of glycans on the therapeutic mAb, Rituximab, resulting in well-defined glycoforms including Man3GlcNAc2 (M3), azide-containing M3, Gal2GlcNAc2Man3GlcNAc2 (G2), and NeuNAc2Gal2GlcNAc2Man3GlcNAc2Fuc (G2FS2). The glycan remodeling reactions were efficient, yielding sufficient quantities of each glycoform to be examined for binding affinity with Fcγ receptors. Following this breakthrough, ENGase-mediated chemoenzymatic glycan remodeling has become widely adopted by many research groups to generate homogeneous glycoforms of therapeutic mAbs, including Rituximab (Lin et al., 2015) and Herceptin (Kurogochi et al., 2015; Liu et al., 2018), with a relatively large glycoform library. Recent work from the Davis group further improved the transglycosylation reaction conditions, reducing unfavorable side reactions such as chemical glycation. The best-performing reaction yielded the purest glycoform of Herceptin (∼90%) to date (Parsons et al., 2016). Finally, the utility of chemoenzymatic transglycosylation was recently extended to install phosphorylated glycans (Priyanka et al., 2016) and to remodel N-linked glycans in the Fab region (Giddens et al., 2018).
Importantly, the ability to generate relatively homogenous mAb glycoforms is providing insights into how specific carbohydrate epitopes modulate conformational changes and effector functions of antibodies, including antibody-dependent cellular cytotoxicity (ADCC), complement-dependent cytotoxicity (CDC), and anti-inflammatory activities.
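
As a small illustration of how the glycoform shorthand used above (e.g., Man3GlcNAc2) maps onto a monosaccharide composition, the following Python sketch parses such names and estimates the monoisotopic mass of the corresponding free, reducing-end glycan. It is not taken from any of the cited studies: the function names `composition` and `free_glycan_mass` are hypothetical, the parser only recognizes the residue names listed in `RESIDUE_CLASS`, and the calculation assumes unmodified residues (no azide tags, no protein attachment).

```python
import re
from collections import Counter

# Monoisotopic residue masses in Da (monosaccharide minus one water).
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "dHex": 146.0579, "NeuAc": 291.0954}
WATER = 18.0106  # added once for a free, reducing-end glycan

# Residue names as written in the text, mapped to their mass class.
RESIDUE_CLASS = {
    "Man": "Hex", "Gal": "Hex", "Glc": "Hex",
    "GlcNAc": "HexNAc", "GalNAc": "HexNAc",
    "Fuc": "dHex", "NeuNAc": "NeuAc", "Neu5Ac": "NeuAc",
}

# Longer names come first so that "GlcNAc" is not read as "Glc".
TOKEN = re.compile(r"(GlcNAc|GalNAc|NeuNAc|Neu5Ac|Man|Gal|Glc|Fuc)(\d*)")


def composition(name: str) -> Counter:
    """Count residues per mass class in a shorthand name such as 'Man3GlcNAc2'."""
    counts = Counter()
    pos = 0
    for match in TOKEN.finditer(name):
        if match.start() != pos:  # unrecognized text between tokens
            raise ValueError(f"cannot parse {name!r} at position {pos}")
        residue, number = match.groups()
        counts[RESIDUE_CLASS[residue]] += int(number) if number else 1
        pos = match.end()
    if not name or pos != len(name):
        raise ValueError(f"cannot parse {name!r} at position {pos}")
    return counts


def free_glycan_mass(name: str) -> float:
    """Monoisotopic mass (Da) of the free glycan described by the shorthand name."""
    counts = composition(name)
    return sum(RESIDUE_MASS[cls] * n for cls, n in counts.items()) + WATER


if __name__ == "__main__":
    for glycoform in ("Man3GlcNAc2", "Gal2GlcNAc2Man3GlcNAc2",
                      "NeuNAc2Gal2GlcNAc2Man3GlcNAc2Fuc"):
        print(glycoform, dict(composition(glycoform)),
              round(free_glycan_mass(glycoform), 4))
```

A composition-level view like this cannot distinguish isomers or linkages; it is only meant to show why each remodeled glycoform is expected to give a distinct mass.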

#### Prokaryotic OST-Mediated Cell-Free Glycoproteins Biosynthesis

The ability to reconstitute glycoprotein biosynthesis in a well-defined, cell-free environment has the potential to transform the study of glycoscience. In such a system, not only can a particular step in glycan assembly, glycan modification, and glycan installation on the protein be carefully interrogated, but it can also facilitate the construction of engineered glycosylation pathways for making specific glycoforms of a protein. Such systems are inspired by and borrow components from natural glycosylation mechanisms found in eukaryotes, and more recently in prokaryotes.

In eukaryotes, N-glycoprotein biosynthesis involves the transfer of a preassembled glycan (Glc3Man9GlcNAc2) from a dolichyl-pyrophosphate carrier to an asparagine residue within the Asn-Xaa-Thr/Ser (where Xaa ≠ Pro) consensus sequon of a nascent polypeptide chain by an oligosaccharyltransferase (OST) enzyme (Aebi, 2013). The precursor N-glycan on glycoproteins then undergoes a series of GH-mediated glycan trimming and GT-mediated glycan elaboration steps in the ER and Golgi to yield the final glycoform of the protein (Berger, 1985; Arigoni-Affolter et al., 2019). The OST is a key enzyme of this pathway and consists of a protein complex containing multiple transmembrane subunit proteins, including the catalytic subunit STT3 (Kelleher and Gilmore, 2006). Early work from the Coward and Imperiali groups devised in vitro glycosylation assays to gain mechanistic understanding of the substrate specificity and activity of the yeast OST (Xu and Coward, 1997; Tai and Imperiali, 2001). Many of these studies were done using crude extract-containing detergent-solubilized OSTs from yeast microsomes to catalyze glycan transfer from dolichyl lipid-linked oligosaccharides (LLOs) onto peptide acceptors containing a glycosylation motif (Sharma et al., 1981; Srinivasan and Coward, 2002). Due to the inherent structural complexity, the preparation of membrane-bound OST complexes has proven difficult, often leading to inactive or unstable enzymes. Recent advances in biochemical techniques, however, have now made it possible to obtain highly pure and active OST complexes, including those from humans, for in vitro functional characterization and, most importantly, structural elucidation (Bai et al., 2018; Wild et al., 2018; Ramirez et al., 2019). Nevertheless, the preparation of glycoproteins by eukaryotic OST-mediated in vitro glycosylation has yet to be realized. Key impediments include the inaccessibility of dolichyl LLO libraries (Gibbs and Coward, 1999) and the uncertainty of whether eukaryotic OSTs, which operate co-translationally, can also post-translationally modify target proteins. To date, in vitro glycosylation reactions catalyzed by eukaryotic OSTs have only been performed with short synthetic peptide acceptors and it remains to be seen whether these enzymes can efficiently glycosylate fully folded proteins in vitro (Bai et al., 2018; Wild et al., 2018; Ramirez et al., 2019).

Similar to eukaryotes, N-linked protein glycosylation in certain Proteobacteria such as Campylobacter and Helicobacter species involves en bloc transfer of glycans from undecaprenyl-pyrophosphate (Und-PP) glycolipids onto conserved glycosylation motifs within the protein chain (Szymanski and Wren, 2005; Nothaft and Szymanski, 2010). Bacterial OSTs share a conserved architecture with eukaryotic STT3s with the exception that bacterial OSTs are single-subunit enzymes (Szymanski et al., 1999; Dell et al., 2010). Shortly after the discovery of the first bona fide N-glycosylation system in C. jejuni, Aebi and colleagues demonstrated the functional transfer of the C. jejuni protein glycosylation locus (pgl) into E. coli (Wacker et al., 2002), which not only facilitated mechanistic studies of the pathway but opened the door to bacterial glycoengineering.

By leveraging glycoengineered strains of E. coli, early work demonstrated that the C. jejuni OST (hereafter CjOST) has a more stringent substrate specificity than eukaryotic OSTs, requiring an extended glycosylation sequon, Asp/Glu-X−1-Asn-X+1-Ser/Thr (where X−1, X+1 ≠ Pro) (Kowarik et al., 2006b). The so-called "minus-two rule" of the CjOST, requiring an acidic amino acid residue at the −2 position of the glycosylation site, did not strictly apply to other bacteria, as several CjOST homologs, such as those found in Desulfovibrio, Helicobacter, and deep sea vent bacterial species, were observed to have significantly relaxed substrate specificity (Ollis et al., 2015; Mills et al., 2016). Regardless of their specific sequon preferences, these enzymes are capable of installing glycans onto sequons that have been engineered at the N- and C-termini and in flexible regions of heterologous proteins (Fisher et al., 2011; Lizak et al., 2011a) and can glycosylate such heterologous proteins both in cell-based and cell-free systems (Kowarik et al., 2006a; Ollis et al., 2015).
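
The difference between the eukaryotic sequon (Asn-Xaa-Ser/Thr, Xaa ≠ Pro) and the extended CjOST sequon (Asp/Glu-X−1-Asn-X+1-Ser/Thr, X−1, X+1 ≠ Pro) can be made concrete with a short Python sketch that scans a one-letter protein sequence for candidate acceptor asparagines. This is an illustrative sketch only: the function names and the example sequence are hypothetical, and a sequence match says nothing about whether a site is accessible or actually glycosylated.

```python
import re

# Zero-width lookaheads so that overlapping candidate sites are all reported.
EUKARYOTIC_SEQUON = re.compile(r"(?=N[^P][ST])")        # N-X-S/T, X != P
CJOST_SEQUON = re.compile(r"(?=[DE][^P]N[^P][ST])")     # D/E-X-N-X-S/T, X != P


def eukaryotic_sites(sequence: str) -> list[int]:
    """1-based positions of Asn residues inside an N-X-S/T sequon."""
    return [m.start() + 1 for m in EUKARYOTIC_SEQUON.finditer(sequence.upper())]


def cjost_sites(sequence: str) -> list[int]:
    """1-based positions of Asn residues that also satisfy the minus-two rule."""
    # The match starts at the acidic residue, two positions before the Asn.
    return [m.start() + 3 for m in CJOST_SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    example = "MKDQNATGSNPSANGT"  # hypothetical sequence for illustration
    print("N-X-S/T sites:      ", eukaryotic_sites(example))
    print("D/E-X-N-X-S/T sites:", cjost_sites(example))
```

In the example sequence, every site matching the extended bacterial sequon also matches the shorter eukaryotic one, but not the reverse, which mirrors the more stringent specificity of CjOST described above. Homologs with relaxed specificity would need a different pattern.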

An attractive feature of bacterial N-glycosylation systems is their inherent simplicity, which makes them readily amenable to reconstitution outside the cell. Indeed, following the functional expression of the C. jejuni pgl locus in E. coli cells, the same glycosylation reaction was recapitulated in vitro by Imperiali and coworkers who showed that purified CjOST was capable of transferring a glycan from a synthetic donor, Und-PP-disaccharide, onto a synthetic peptide acceptor (Glover K. J. et al., 2005). Along similar lines, Aebi and coworkers described an in vitro glycosylation assay comprised of purified CjOST, a purified acceptor protein, and LLOs bearing the C. jejuni heptasaccharide glycan (CjLLOs) that were extracted from glycosylation-competent E. coli (Kowarik et al., 2006a). Using this greatly simplified and well-controlled in vitro system, they were able to evaluate the ability of CjOST to glycosylate distinct folding states of a model acceptor protein, RNase AS32D, leading to important insights about the preferred conformation (folded vs. unfolded) of bacterial acceptor proteins and the timing (co- vs. post-translational) of the bacterial glycosylation process (Kowarik et al., 2006a). Further, despite being large, integral membrane proteins with 13 transmembrane segments (Lizak et al., 2011b), bacterial OSTs can be readily overexpressed and purified from a recombinant host like E. coli, and robust protocols for large-scale purification of CjOST and other bacterial OST homologs have been documented (Jaffee and Imperiali, 2013; Jaroentomeechai et al., 2017). Taken together, these developments have established in vitro glycosylation as one of the standard tools in bacterial glycobiology and glycoengineering.

Building on these advances, Guarino and DeLisa explored coupling bacterial-based in vitro glycosylation with E. coli-based cell-free protein synthesis (CFPS) technology (Guarino and Delisa, 2012). Specifically, they demonstrated that by supplementing either standard cell-free S30 extracts derived from E. coli or the PURE (protein synthesis using recombinant elements) system (Shimizu et al., 2001) with purified CjOST and extracted CjLLOs, it was possible to achieve efficient glycosylation of different model glycoprotein targets including the C. jejuni AcrA protein and a single chain fragment variable (scFv) antibody engineered with a C-terminal glycosylation sequon. More recently, the DeLisa and Jewett groups have developed a more integrated, single-pot platform for cell-free glycoprotein synthesis (CFGpS) (**Figure 3B**, Scheme ii) in which S30 extracts were selectively enriched with both CjOST and CjLLOs, effectively bypassing the need for purification and extraction, respectively, of these essential glycosylation components (Jaroentomeechai et al., 2018). When these glyco-enriched extracts were supplemented with plasmid DNA encoding different acceptor proteins including human erythropoietin, protein synthesis and N-glycosylation were co-activated in a manner that resulted in appreciable amounts of site-specifically modified target proteins that retained biological activity. Importantly, the system was demonstrated to be highly modular, allowing several different CjOST homologs and structurally distinct glycans including the eukaryotic trimannosyl core glycan, Man3GlcNAc2, to be rapidly interchanged into the cell-free reaction.
DeLisa, Jewett, and coworkers have recently extended the CFGpS platform for cell-free conjugate vaccine synthesis (Stark et al., 2019), which takes advantage of the fact that CjOST has a relaxed glycan substrate specificity and is capable of catalyzing transfer of O-PS antigens to yield conjugate vaccines (Feldman et al., 2005; Terra et al., 2012). By developing S30 extracts from low-endotoxin E. coli cells expressing CjOST and different O-PS structures, it was possible to decorate a panel of different FDA-approved protein carriers such as CRM197 and Haemophilus influenzae protein D with pathogen-specific polysaccharides including the O-PS antigen from the highly virulent pathogen Francisella tularensis subsp. tularensis (type A) strain Schu S4. Importantly, conjugates supplied by this cell-free technology were observed to elicit O-PS-specific antibodies and provided complete protection against pathogen challenge in immunized mice (Stark et al., 2019).

It should be pointed out that while the CFGpS systems described above rely on heterologous expression of OSTs and LLOs in cells prior to extract preparation, it should be possible to streamline these systems with cell-free biosynthesis of each glycosylation component. To this end, it has been demonstrated that full-length and active membrane-bound bacterial OSTs could be directly synthesized in cell-free extracts that were supplemented with nanodisc scaffolds (Schoborg et al., 2018). It has also been shown that chemically-defined LLOs bearing the C. jejuni glycan can be generated by in vitro assembly of a biosynthetic pathway comprised of purified GTs (Glover K. et al., 2005). By integrating the biogenesis of OSTs and LLOs in vitro with cell-free glycoprotein synthesis platforms, we anticipate the creation of a simplified yet highly modular framework for furthering the study and exploitation of the bacterial glycosylation mechanism.

In addition to N-linked glycosylation, certain bacterial species including those in Neisseria and Pseudomonas genera possess O-linked protein glycosylation pathways that involve a similar en bloc glycan transfer mechanism (Faridmoayer et al., 2007). Utilizing cell-free reconstitution, a central enzyme in this pathway, O-OST, was found to have extremely broad substrate promiscuity, both in terms of the recognizable glycan structures and their lipid carriers (Faridmoayer et al., 2008; Musumeci et al., 2013). These features make this class of OST enzymes especially attractive for biotechnological and biomedical applications. However, widespread use of O-OSTs for preparing useful glycoproteins has been hindered by the lack of a consensus glycosylation motif, which in turn limits our ability to perform O-glycosylation on heterologous targets. This hurdle was partially resolved recently with the rational design of a minimum optimal O-linked recognition (MOOR) motif that was recognized by the O-OST PglL from Neisseria meningitidis (Pan et al., 2016). The MOOR sequence, which is composed of 8 amino acids flanked by two hydrophilic motifs, was used to produce an O-linked conjugate vaccine against Shigella flexneri. Since O-OSTs can transfer a wide range of structurally diverse O-polysaccharides (Faridmoayer et al., 2008), the advent of the MOOR motif is expected to accelerate the use of O-OST-based glycosylation as a platform to produce and engineer conjugate vaccines against diverse bacterial pathogens. In an important first step toward cell-free O-glycoprotein biosynthesis, the DeLisa group has generated S30 extracts enriched with different O-OSTs and LLOs bearing short-chain human O-glycans (e.g., Tn antigen, T antigen, and sialylated versions of both) (Natarajan et al., 2020). The resulting glyco-enriched extracts were capable of synthesizing antigenically authentic glycoforms of human mucin 1 (MUC1), thereby providing a platform for construction of designer O-glycoproteins and further expanding the cell-free glycoprotein expression toolkit.

#### GT-Mediated Protein Glycosylation and Glycan Elaboration

Processive protein glycosylation is prevalent in nature with the archetype represented by vertebrate mucin-type O-glycosylation, a mechanism whereby the glycan is assembled directly on the protein by sequential addition of monosaccharides by GTs (Hang and Bertozzi, 2005). Mucin-type O-glycosylation is initiated by the formation of α-glycosidic bonds between GalNAc monosaccharides and Ser/Thr residues that are catalyzed by a specific enzyme in the polypeptide-N-acetylgalactosaminyl transferase (ppGalNAcT) family. This core structure, named Tn antigen, can then be extended via sequential addition of other monosaccharides including Gal, GlcNAc, and Neu5Ac by one or more of the ∼30 different Golgi-resident GTs (Bennett et al., 2012). Since mucin-associated O-glycan structures are associated with many types of cancer (Pinho and Reis, 2015), there is great interest in obtaining structurally-defined O-glycoproteins for the development of carbohydrate-based cancer vaccine candidates. One promising avenue has been chemoenzymatic synthesis for preparing large glycopeptides carrying cancer-related O-glycans including Tn and sialylated Tn antigens. For example, Clausen and coworkers pre-synthesized MUC1 peptides that were subsequently modified by a series of ppGalNAcT enzymes with differential glycosylation site preferences (Sorensen et al., 2006). The GalNAc moieties on the MUC1 glycopeptides were then elongated to T or sTn antigen using β3-Gal or ST6GalNAcI transferases, respectively (**Figure 3B**, Scheme iii). The resulting O-glycosylated mucin peptides were subsequently used to immunize mice, leading to the elicitation of Tn/sTn antigen-specific antibodies that could recognize specific types of cancer cells. This study highlights the potential of cell-free glycoprotein synthesis approaches in the design and production of carbohydrate-based vaccine candidates. 
It is worth mentioning that while O-GalNAcylation of mucin has been the focus of intense research, many lesser-studied types of O-linked glycosylation have been reported in recent years including the enzymatic transfer of GlcNAc, Man, Fuc, Glc, Gal, and Xyl sugars onto specific proteins such as Notch receptors and epidermal growth factor-like (EGF) repeats (Bennett et al., 2012; Haltiwanger et al., 2015; Varki, 2017a; Holdener and Haltiwanger, 2019). Given our incomplete understanding of these relatively new types of O-glycosylation, it stands to reason that cell-free approaches will help to decipher their mechanisms and roles in biology.

Certain gram-negative γ-proteobacteria have been found to contain unique processive N- and O-linked protein glycosylation pathways (Ohuchi et al., 2000; Zhou and Wu, 2009). Among them, N-glycosyltransferase (NGT) from Actinobacillus pleuropneumoniae (ApNGT) is the best characterized enzyme, which is capable of transferring Glc residues onto the same Asn-X-Ser/Thr motif used in canonical N-glycosylation that proceeds by the en bloc mechanism (Choi et al., 2010; Kawai et al., 2011; Naegeli et al., 2014). The ApNGT has been functionally transferred into E. coli (Naegeli et al., 2014), providing a novel mode of bacterial glycoengineering (Keys et al., 2017). Indeed, several groups have leveraged ApNGT to site-specifically N-glycosylate target proteins with a Glc moiety that serves as a "glycan primer" for further extension to defined glycoforms such as α-Gal, lactose, sialyllactose, LacNAc, and Lewis-X structures by prescribed GTs (Kightlinger et al., 2019; Tytgat et al., 2019). In the work by Aebi, Keys and coworkers, multiple copies of these glycoepitopes could then be installed on the same target protein to create multivalent glycopolymers or equipped onto self-assembling polypeptides to produce megadalton glycoprotein assemblies (Tytgat et al., 2019). Such multivalent glycostructures could find applications in antibody discovery and the development of novel biomedical materials. The Jewett and DeLisa groups devised a multi-pot reaction scheme whereby each pot contained E. coli extract synthesizing a specific GT enzyme (Kightlinger et al., 2019). These reaction pots containing active GTs could then be combined in a sequence-specific manner to prototype designer glycosylation pathways (**Figure 3B**, Scheme iii). This modular technology, called glycosylation pathway assembly by rapid in vitro mixing and expression (GlycoPRIME), enabled the generation of 23 unique glycan epitopes whose pathways were successfully transferred into E.
coli to biomanufacture useful glycoproteins including the H1HA10 protein vaccine containing an α-Gal epitope. This study showcases the power of cell-free synthetic glycobiology as a versatile tool to design, build, test, and employ designer glycosylation pathways for the development and production of putative glycomedicines. It should be pointed out that while unique bacterial enzymes such as ApNGT have been harnessed for processive glycan construction, similar strategies have been developed using CjPglB. That is, even though CjPglB is known for its ability to transfer preassembled glycan structures, it can also be used to install a single GlcNAc residue onto acceptor protein targets as was shown recently by Liu et al. (2014). These authors designed a series of short polyisoprenol variants that were modified with a single GlcNAc monosaccharide and used these unnatural sugar-unnatural lipid conjugates to demonstrate that purified CjOST could catalyze the formation of GlcNAc-ylated peptides and proteins. They further showed that these GlcNAc-ylated species could be extended with ENGases and GTs (e.g., EndoA, β1,4-GalT) thereby demonstrating a novel in vitro route to tailor-made glycoproteins.

### HIGH-THROUGHPUT SCREENING STRATEGIES FOR IMPROVING GLYCOENZYMES

In the past few decades, directed evolution approaches have proven tremendously useful in improving and/or altering the activities of existing enzymes (Porter et al., 2016; Arnold, 2018). In any directed evolution experiment, the development of a reliable high-throughput screening (HTS) assay is critical to successful library-based isolation of enzyme candidates with desirable traits (Qu et al., 2020). Generally, any enzymatic reaction amenable to the use of chromo-, radio-, or fluorogenic substrates can be easily integrated into a HTS format using a standard multi-well plate. A direct coupling between enzymatic activity and signal generation has often been used to screen for GH activity using a chemical reporter substrate (Kwan et al., 2015). However, such direct coupling methods have been proven to be extremely difficult to adapt for screening libraries of glycoenzymes like GTs and OSTs, since glycosidic bond formation does not provide any convenient readouts (Chao and Jongkees, 2019). To circumvent this issue, numerous indirect coupling assays measuring signal from interactions between glycomolecule products or reaction byproducts with a secondary reporter have been developed. For example, the UDP-Glo assay from Promega measures luminescent signals generated from the coupling that occurs between the UDP byproduct of the GT reaction with ATP generation (Zegzouti et al., 2016). Aside from indirect coupling with NDP byproducts, most assays rely on the affinity of biomolecules, such as lectins or antibodies, toward glycans and glycoprotein products. These affinity reagents are often conjugated with a chemical reporter or with an enzyme such as horseradish peroxidase that can generate a spectroscopic signal upon the addition of specific chemicals. 
The concept of affinity-based indirect coupling has been widely applied for the screening of GT and OST enzymes in various formats including by enzyme-linked immunosorbent assay (Ihssen et al., 2012; Pandhal et al., 2013), glycophage display (Celik et al., 2010; Durr et al., 2010), fluorescence-activated cell sorting (FACS) assay (Aharoni et al., 2006; Glasscock et al., 2018), and modified colony blotting methods (Ollis et al., 2014).

Alternatively, mass spectrometry-based high-throughput screening (MS-HTS) is an emerging technology offering a rapid, label-free, quantitative, and highly sensitive method to screen biomolecule libraries (Xu et al., 2007). In addition to its well-established workflow, its ability to multiplex and to be integrated with other in vitro techniques makes MS-HTS an attractive tool to screen glycoenzyme libraries. Recent glycoengineering work by Mrksich and coworkers reported a novel MS-HTS strategy for the characterization of GT enzymes produced directly from CFPS lysate (Kightlinger et al., 2018). This platform, called glycosylation sequence characterization and optimization by rapid expression and screening (GlycoSCORES), integrates E. coli-based CFPS with self-assembled monolayers for matrix-assisted laser desorption/ionization (SAMDI) mass spectrometry. Specifically, by combining rapid in vitro biosynthesis of GTs, such as ApNGT, in cell-free extract with high-throughput analysis of their activity using SAMDI-MS, the authors were able to systematically investigate the enzyme's substrate specificity using 3,480 unique peptides and 13,903 unique reaction conditions, revealing the optimal glycosylation sequon (Kightlinger et al., 2018). More recently, the same team extended the methodology to the analysis of intact glycoproteins (Techner et al., 2020), providing an exciting new avenue for the discovery and improvement of glycosylation enzymes. In addition, they used conditionally orthogonal peptide acceptor specificities of NGTs to site-specifically control installation of multiple distinct glycans (Lin et al., 2020).
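
To illustrate why a label-free mass readout suits glycosyltransferase screening, the Python sketch below computes the expected peak position of a peptide after addition of one or more hexose residues (such as the Glc installed by ApNGT) and estimates conversion from a pair of peak intensities. It is a simplified sketch, not the published GlycoSCORES analysis: the function names are hypothetical, and estimating conversion as the product fraction of the summed substrate and product intensities assumes that both species ionize with similar efficiency.

```python
HEXOSE_RESIDUE = 162.0528  # monoisotopic mass (Da) added per hexose (e.g., Glc)


def product_mz(substrate_mz: float, n_hexose: int = 1, charge: int = 1) -> float:
    """Expected m/z of the glycosylated peptide for a given charge state."""
    if n_hexose < 0 or charge < 1:
        raise ValueError("n_hexose must be >= 0 and charge must be >= 1")
    return substrate_mz + n_hexose * HEXOSE_RESIDUE / charge


def conversion(substrate_intensity: float, product_intensity: float) -> float:
    """Fraction of peptide converted, assuming equal ionization efficiency."""
    if substrate_intensity < 0 or product_intensity < 0:
        raise ValueError("intensities must be non-negative")
    total = substrate_intensity + product_intensity
    if total == 0:
        raise ValueError("no signal detected for substrate or product")
    return product_intensity / total


if __name__ == "__main__":
    peptide_mz = 1000.0  # hypothetical singly charged substrate peak
    print("expected product m/z:", product_mz(peptide_mz))
    print("estimated conversion:", conversion(2500.0, 7500.0))
```

Because each reaction is reduced to two peak positions and two intensities, the same calculation can be applied across thousands of peptide and condition combinations without any reporter substrate.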

### CONCLUDING REMARKS

The impact of glycomolecules in basic biology and applied biotechnology is undeniable. As such, new and expanded toolkits are required to help transform the field of glycoscience and realize its full potential across biology, chemistry, and material science (Walt, 2012). Methods for the synthesis, characterization, evolution, and database processing of glycomolecules and glycoenzymes are still lacking, especially in comparison to those available to scientists and engineers for studying and engineering nucleic acids and proteins. In this review, we have outlined a number of emerging cell-free synthetic glycobiology approaches for the biosynthesis of chemically-defined glycomolecules. The past decade has seen considerable improvements in one-pot multienzyme synthesis (OPME) platforms, yielding several synthesis modules specific for sugar nucleotide building blocks including those with non-canonical and non-natural structures. Many of these modules have been employed to construct remarkably diverse glycans, glycolipids, and glycopolymers such as heparin sulfate. In parallel, glycochemists have made significant progress integrating enzymatic glycan biosynthesis with automated solid-phase synthesis with the goal of offering a fully-commercialized machine capable of synthesizing pre-designed, complex glycans for structural and functional investigation.

Glycoproteins with structurally uniform glycoforms are highly valuable as research reagents and biotherapeutics. Yet, our progress in understanding the structure-activity relationships of glycoproteins has been hindered, due in large part to technical barriers in preparing glycoproteins bearing well-defined glycan structures. In vitro chemoenzymatic approaches using ENGases have emerged as a versatile strategy for assembling homogeneous glycoproteins including, and perhaps most notably, antibodies with specific N-glycoforms. Such antibodies have been instrumental in gaining insight into how specific glycan epitopes influence the effector functions of therapeutic antibodies. Alternatively, CFPS combined with the power of cell-free synthetic biology and bacterial glycoenzymes such as OSTs and NGTs provides a fully integrated platform for rapidly producing uniform glycoproteins by seamlessly integrating transcription/translation with protein glycosylation in a one-pot reaction.

From a cell-free synthetic biology perspective, remaining challenges include improving current cell-free systems to produce correctly folded versions of more complex proteins and protein complexes, expanding the genetic code of CFPS for the incorporation of multiple non-natural amino acids in a protein for site-selective modification, and further optimizing synthesis efficiency, product titers, and cost. In addition, cell-free extracts equipped with the machineries for protein glycosylation and other important post-translational modifications such as phosphorylation and acetylation will be increasingly in demand, especially by those working to fully expand our understanding of glycosylation networks and their regulation (Yu and Chen, 2016). From a glycobiology standpoint, one of the key limitations is the availability of glycoenzymes with high catalytic activity and ease of purification. Currently, only a small fraction of useful glycoenzymes has been cataloged, characterized, and commercialized for broader use (Walt, 2012), although the list of available glycoenzymes continues to expand (Moremen et al., 2018). Another potential challenge is the strict substrate specificity of glycoenzymes, which in turn limits their utility in biotransformation. Novel directed evolution and high-throughput screening methods leveraging the power of cell-free biology are needed to identify more efficient glycoenzyme variants with precisely tailored substrate specificity. Finally, integration of the tools from cell-free synthetic glycobiology with those from other disciplines including metabolic engineering (Wratil and Horstkorte, 2017; Agatemor et al., 2019), mathematical modeling (Umana and Bailey, 1997; Spahn et al., 2016; Wayman et al., 2019), and machine learning and bioinformatics (Li F. et al., 2015; York et al., 2019) will be needed to solve the most complex problems in glycoscience. Only when these unmet needs have been addressed can the full potential of cell-free synthetic glycobiology be unlocked for furthering glycoscience and its application.

#### AUTHOR CONTRIBUTIONS

TJ, MT, ML, AA, NA, SC, MJ, and MD contributed to the writing and editing of the manuscript. All authors read and approved the final manuscript.

#### FUNDING

This work was supported by the Defense Threat Reduction Agency (HDTRA1-15-10052 and HDTRA1-20-10004 to MD and MJ), National Institutes of Health Grant (R01 GM137314-01 and R01 GM127578-01 to MD and 1U19AI142780-01 to MJ), National Science Foundation (Grants # CBET 1159581, CBET 1264701, and CMMI 1728049 to MD; MCB 1716766 to MJ; and CBET 1936823 and MCB 1413563 to MD and MJ), the Bill and Melinda Gates Foundation (Grant OPP1217652 to MD and MJ), the DARPA 1000 Molecules Program (HR0011-15-C-0084), the Human Frontiers Science Program (Grant RGP0015/2017), the David and Lucile Packard Foundation (to MJ), and the Camille Dreyfus Teacher-Scholar Program (to MJ). TJ was supported by a Royal Thai Government Fellowship and the Cornell Fleming Graduate Scholarship. AA was supported by an NIH Chemical Biology Interface (CBI) training fellowship (supporting grant T32GM008500) and a National Science Foundation Graduate Research Fellowship (DGE-1650441).

#### REFERENCES

J. D. Esko, P. Stanley, G. W. Hart, M. Aebi, et al. (New York, NY: Cold Spring Harbor), 41–49.


via a nonbiosynthetic path utilizing cellulase as catalyst. J. Am. Chem. Soc. 113, 3079–3084. doi: 10.1021/ja00008a042


in vitro enzymatic glycosylation. Proc. Natl. Acad. Sci. U.S.A. 115, 720–725. doi: 10.1073/pnas.1718172115


YjiC-mediated glycosylation toward flavonoids. Carbohydr. Res. 393, 26–31. doi: 10.1016/j.carres.2014.03.011


**Conflict of Interest:** MD has a financial interest in Glycobia, Inc., Versatope, Inc., and Swiftscale Biologics, Inc. MD's interests are reviewed and managed by Cornell University in accordance with their conflict of interest policies. MJ has a financial interest in StemLoop, Swiftscale Biologics, Inc., and Design Pharmaceuticals. MJ's interests are reviewed and managed by Northwestern University in accordance with their conflict of interest policies.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Jaroentomeechai, Taw, Li, Aquino, Agashe, Chung, Jewett and DeLisa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.