Unexpected Arabinosylation after Humanization of Plant Protein N-Glycosylation

As biopharmaceuticals, recombinant proteins have become indispensable tools in medicine. An increasing demand, not only in quantity but also in diversity, drives the constant development and improvement of production platforms. The N-glycosylation pattern of biopharmaceuticals plays an important role in their activity, serum half-life and immunogenicity. Therefore, production platforms with tailored protein N-glycosylation are of great interest. Plant-based systems have already demonstrated their potential to produce pharmaceutically relevant recombinant proteins, although their N-glycan patterns differ from those in humans. Plants have shown great plasticity towards the manipulation of their glycosylation machinery, and some have already been glyco-engineered to avoid the attachment of plant-typical, putatively immunogenic sugar residues. This resulted in complex-type N-glycans with a core structure identical to the human one. Unlike humans, plants lack the ability to elongate these N-glycans with β1,4-linked galactoses and terminal sialic acids. However, these modifications, which require the activity of several mammalian enzymes, have already been achieved in Nicotiana benthamiana and the moss Physcomitrella. Here, we present human-type β1,4-linked terminal N-glycan galactosylation, the first step towards sialylation of recombinant glycoproteins in Physcomitrella, achieved by the introduction of a chimeric β1,4-galactosyltransferase (FTGT). This chimeric enzyme consists of the transmembrane domain of the moss α1,4-fucosyltransferase fused to the catalytic domain of the human β1,4-galactosyltransferase. Stable FTGT expression led to the desired β1,4-galactosylation. However, additional pentoses of unknown identity were also observed. Their nature was subsequently determined by Western blot and by enzymatic digestion followed by mass spectrometry, which identified them as α-linked arabinoses.
Since pentosylation of β1,4-galactosylated N-glycans has been reported before, e.g., on recombinant human erythropoietin produced in glyco-engineered Nicotiana tabacum, this phenomenon is of more general importance for plant-based production platforms. Arabinoses, which are absent in humans, may prevent the full humanization of plant-derived products. Therefore, identifying these pentoses as arabinoses is important, as it lays the basis for their elimination to ensure the production of safe biopharmaceuticals in plant-based systems.


INTRODUCTION
Recombinant protein biopharmaceuticals are highly effective and specific, and therefore essential in healthcare. Advances in biotechnology made their production feasible; their market share has grown steadily over the last decades and is predicted to keep growing (Facts and Figures 2021: The Pharmaceutical Industry and Global Health; Walsh, 2018). The production of high-quality therapeutic proteins is still a complex process. It requires the biosynthetic machinery of cells, and the choice of production platform is closely tied to the product's requirements and quality (Tripathi and Shrivastava, 2019). Proteins are frequently post-translationally modified. In particular, N-glycosylation, a very common post-translational modification (PTM) in most eukaryotes, is of great importance, as most protein biopharmaceuticals need correct glycosylation to achieve the desired therapeutic efficacy (Solá and Griebenow, 2010) and to avoid immunogenicity (Zhou and Qiu, 2019). Mammalian cell lines, especially Chinese hamster ovary (CHO) cells, have dominated the recombinant biologics industry since the 1990s, largely because their PTMs resemble human ones (Walsh, 2018; Tripathi and Shrivastava, 2019). However, the high production costs of these systems and the increasing demand for newly designed protein therapeutics, driven by growing knowledge of the molecular mechanisms of disease, reveal the need for alternative platforms for tailored production. The current COVID-19 pandemic particularly highlights the urgent need to expand production capacities for vaccines, diagnostic reagents and therapeutic proteins, such as neutralizing antibodies. Plant-based production of biopharmaceuticals offers an interesting alternative.
Plants combine several properties that are advantageous for this purpose: their ability to produce, fold and post-translationally modify complex proteins, high scalability combined with cost-effective cultivation, and the absence of human pathogens, which makes their products inherently safe (Buyel, 2019). Currently, one plant-produced recombinant therapeutic is on the market (Elelyso®, a β-glucocerebrosidase for the treatment of Morbus Gaucher; Grabowski et al., 2014), and many promising plant-made biopharmaceuticals are in clinical trials. Among them are the HIV-neutralizing human monoclonal antibody 2G12 produced in Nicotiana tabacum (Ma et al., 2015), Nicotiana benthamiana-derived virus-like particles as candidate vaccines against influenza, dengue fever or COVID-19 (Ward et al., 2020, 2021; Ponndorf et al., 2021), and α-galactosidase for enzyme replacement therapy in Morbus Fabry, produced in the moss Physcomitrella (Shen et al., 2016; Hennermann et al., 2019). Physcomitrella offers several beneficial features for biopharmaceutical production (reviewed in Decker and Reski, 2020): its high rate of somatic homologous recombination enables easy genome engineering (e.g., Strepp et al., 1998; Wiedemann et al., 2018), it can produce complex recombinant human proteins and multi-level synthetic complement regulators (Reski et al., 2015; Michelfelder et al., 2017; Top et al., 2019; Ruiz-Molina et al., 2021), and it can be cultivated under Good Manufacturing Practice (GMP) conditions in suspension cultures of up to 500 L in photobioreactors (Reski et al., 2018). The promising biopharmaceutical candidates mentioned above demonstrate the potential of plant-based systems in this field. All plant-derived biopharmaceuticals that are approved or in advanced clinical trials have in common that their efficacy is not impaired by the lack of mammalian-type N-glycosylation, whose patterns differ from those produced in plants.
The early processing of N-glycans is conserved between plants and mammals, whereas their maturation in the Golgi apparatus differs (Gomord et al., 2010). Plant and human N-glycans share the identical di-antennary complex-type core structure, the heptasaccharide GlcNAc₂Man₃GlcNAc₂ (GnGn, Figure 1), while fucose on the Asn-linked N-acetylglucosamine (GlcNAc) is α1,3-linked in plants and α1,6-linked in humans. In humans, but not in plants, the GnGn core is extended with β1,4-linked galactose, which is often terminally capped with α2,6-linked sialic acid. In plants, the GnGn core is substituted with a β1,2-linked xylose, a sugar not produced in humans, and it is terminally extended by β1,3-linked galactose and α1,4-linked fucose, both linked to the outer GlcNAc residues, forming the trisaccharide Lewis A (Leᵃ) epitope. This epitope, as well as the plant-specific β1,2-linked xylose and α1,3-linked fucose, has been associated with antibody formation in humans (Fitchette et al., 1999; Wilson et al., 2001). Antibodies recognizing a therapeutic protein can affect its efficacy by altering its pharmacokinetics and pharmacodynamics, and represent an additional safety risk (Tourdot and Hickling, 2019). Therefore, to avoid potential immunogenicity of plant-made therapeutic proteins, plant-specific N-glycan residues have already been targeted. Plant-specific N-glycan xylosylation and fucosylation were eliminated in several plant-based systems by knockout (KO) or downregulation of the genes encoding the respective xylosyltransferases (XT) and fucosyltransferases (FT) (Koprivova et al., 2004; Strasser et al., 2004, 2008; Cox et al., 2006; Sourrouille et al., 2008; Shin et al., 2011; Hanania et al., 2017; Mercx et al., 2017; Jansing et al., 2018). Additionally, Leᵃ epitope formation was abolished in Physcomitrella by knockout of the gene encoding β1,3-galactosyltransferase 1 (GalT1) (Parsons et al., 2012).
The triple KO of xt, ft and galt1 in Physcomitrella resulted in outstanding N-glycan homogeneity, with a strongly predominant GnGn glycosylation pattern (Parsons et al., 2012). This provides a suitable platform for further glyco-optimization, comprising β1,4-galactosylation and sialylation.
The impact of terminal N-glycan residues on the efficacy and functional role of protein therapeutics has been extensively reviewed (Jefferis, 2009; Li and d'Anjou, 2009; Tan et al., 2018). Terminal N-glycan sialylation increases the negative surface charge of a protein and masks the underlying galactose, GlcNAc and mannose residues. Highly charged proteins are filtered and eliminated more slowly by the kidneys (Solá and Griebenow, 2010). Additionally, asialoglycoprotein receptors in the liver, which recognize terminal galactose, and mannose receptors, mainly on immune cells, which recognize terminal mannose or GlcNAc, are responsible for the rapid clearance of non-sialylated glycoproteins from serum (Datta-Mannan, 2019).
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org February 2022 | Volume 10 | Article 838365
To achieve N-glycan sialylation, which has already been stably established in N. benthamiana and Physcomitrella (Kallolimath et al., 2016; Bohlender et al., 2020), the galactosylated N-glycan acceptor must first be provided. In planta N-glycan β1,4-galactosylation has been achieved via expression of heterologous coding sequences (CDSs) of different versions of β1,4-galactosyltransferases (β1,4-GalT), including the human enzyme, those of various other animal species, and chimeric variants thereof (Palacpac et al., 1999; Bakker et al., 2001, 2006; Misaki et al., 2003; Huether et al., 2005; Fujiyama et al., 2007; Hesselink et al., 2014; Kittur et al., 2020; Kriechbaum et al., 2020). These approaches in different plant species have shown that galactosylation efficiency and quality are influenced by diverse factors. Among them, the localization of the enzyme within the Golgi apparatus plays an important role. When localized too early in the Golgi sub-compartments, β1,4-GalT activity interferes with the activities of α-mannosidase II (GMII) or N-acetylglucosaminyltransferase II (GnTII), impeding further N-glycan maturation and leading to incompletely processed, mono-antennary galactosylated N-glycans (Strasser et al., 2009; Schneider et al., 2015; Kallolimath et al., 2018). The localization of a protein anchored in the endomembrane system depends on its N-terminal cytoplasmic, transmembrane and stem (CTS) domain (Czlapinski and Bertozzi, 2006; Schoberer and Strasser, 2011; Welch and Munro, 2019). Accordingly, the CTS of the human β1,4-GalT, which apparently localizes to the early-to-medial plant Golgi apparatus, has been replaced by CTS sequences with an assumed late trans-Golgi localization.
Chimeric β1,4-GalT variants with different CTS domains, such as the CTS of the human sialyltransferase (Strasser et al., 2009), the CTS of the Arabidopsis β1,3-galactosyltransferase 1 (Kriechbaum et al., 2020) or the CTS of the Physcomitrella α1,4-fucosyltransferase (FTGT) (Bohlender et al., 2020), have been described and led to higher proportions of di-antennary galactosylated N-glycans. Furthermore, the target glycoprotein itself influences its galactosylation efficiency (Kriechbaum et al., 2020), probably owing to conformation-dependent accessibility of its glycans.
In this study, we analyzed the galactosylation efficiency of the chimeric β1,4-galactosyltransferase FTGT, which consists of the CTS domain of the moss α1,4-fucosyltransferase fused to the catalytic domain of the human β1,4-GalT (Bohlender et al., 2020). In contrast to our previous study, we used recombinant human erythropoietin (rhEPO) as the target protein. Human EPO is a highly glycosylated protein hormone that inhibits apoptosis of erythroid progenitor cells and stimulates their differentiation, increasing the number of circulating mature red blood cells (Jelkmann, 2013). Recombinant hEPO is widely used to treat severe chronic anemia, especially anemia associated with chronic kidney disease or chemotherapy (Jelkmann, 2013). Additionally, non-sialylated rhEPO (asialo-rhEPO) is of pharmacological interest because it is tissue-protective without being erythropoietic (Peng et al., 2020).
FTGT expression led to a galactosylation efficiency of about 66% on rhEPO N-glycans, and 65% of the galactosylated fraction consisted of mature di-antennary galactosylated structures. However, up to five additional pentoses were attached to about 92% of all β1,4-galactosylated N-glycans. Pentosylation of β1,4-galactosylated N-glycans was recently reported in N. tabacum and Physcomitrella (Bohlender et al., 2020; Kittur et al., 2020), indicating that this modification might affect different plant-based production systems, but the identity of these pentoses has not yet been elucidated. Here, we identified the unknown pentoses as α-linked arabinofuranoses. Their arabinose identity was verified by immunodetection on rhEPO with an anti-α1,5-arabinan antibody and by specific removal of the pentoses from rhEPO with α-L-arabinofuranosidase, as confirmed by immunoblot and mass spectrometry.
Arabinose does not occur in humans and is therefore potentially immunogenic (Anderson et al., 1984; Steffan et al., 1995; Leonard et al., 2005). Moreover, it might interfere with the efficient establishment of in planta sialylation. In this regard, characterizing the undesired pentosylation as α-L-arabinosylation is an indispensable step towards identifying the responsible glycosyltransferase and thus towards providing plant-based, glyco-engineered biopharmaceuticals with tailored N-glycosylation patterns.

Protein Precipitation from Culture Supernatant
For rhEPO production, the respective Physcomitrella lines were inoculated at an initial density of 0.6 g dry weight (DW)/L and cultivated for 10 days (Parsons et al., 2012). Recombinant hEPO was recovered from the culture supernatant by precipitation with trichloroacetic acid as described before (Büttner-Mainik et al., 2011).
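Reaching a defined starting density such as 0.6 g DW/L typically means diluting a denser pre-culture. A minimal sketch of that arithmetic, assuming the standard dilution relation C₁V₁ = C₂V₂; the function name, the 1.5 g DW/L pre-culture density and the 100 mL culture volume are hypothetical illustration values, not parameters from this study.

```python
def inoculum_volume_ml(preculture_dw_g_per_l: float,
                       final_volume_ml: float,
                       target_dw_g_per_l: float = 0.6) -> tuple[float, float]:
    """Split a culture volume into pre-culture and fresh medium (hypothetical helper).

    Uses C1 * V1 = C2 * V2, where C1 is the pre-culture dry-weight density
    and C2 the target starting density (0.6 g DW/L in the text).
    Returns (pre-culture volume, fresh medium volume), both in mL.
    """
    if preculture_dw_g_per_l <= target_dw_g_per_l:
        raise ValueError("pre-culture must be denser than the target density")
    v_preculture = target_dw_g_per_l * final_volume_ml / preculture_dw_g_per_l
    return v_preculture, final_volume_ml - v_preculture


# Hypothetical example: 100 mL culture started from a pre-culture at 1.5 g DW/L
pre, medium = inoculum_volume_ml(1.5, 100.0)
print(f"{pre:.1f} mL pre-culture + {medium:.1f} mL fresh medium")
```

In practice, the pre-culture density itself would first be determined gravimetrically; the sketch only covers the dilution step.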

Enzymatic Arabinose Digestion
Protein pellets recovered from culture supernatant and containing moss-produced rhEPO were dissolved in 100 mM sodium acetate buffer (pH 4.0) containing 2% SDS. After 10 min of shaking (1,200 rpm, Thermomixer, Eppendorf) at 90°C and a further 10 min of centrifugation at 15,000 rpm, the supernatant was transferred to a fresh 1.5 ml reaction tube. SDS was removed from the samples using Pierce™ detergent removal spin columns (0.5 ml, Thermo Fisher Scientific) according to the manufacturer's instructions. Total protein concentration was determined with a bicinchoninic acid assay (BCA Protein Assay Kit; Thermo Fisher Scientific) following the manufacturer's instructions. For each analyzed line, 10 µg of total protein was mixed with one unit of α-L-arabinofuranosidase, either from Aspergillus niger or a corresponding recombinant version (E-AFASE or E-ABFCJ, Megazyme, Bray, Ireland), and incubated overnight at 40°C. In parallel, enzyme-free (mock) samples from each moss line were treated under the same conditions.
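Loading a fixed 10 µg of total protein per digest requires converting the BCA readout into a sample volume. The sketch below assumes that the absorbance of the protein standards is linear over the range used and fits it by ordinary least squares; the standard concentrations, absorbances and helper names are hypothetical, and in practice the kit's own protocol and curve-fitting recommendations apply.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def volume_for_mass_ul(sample_abs: float, slope: float, intercept: float,
                       target_ug: float = 10.0) -> float:
    """Volume (µL) of sample that contains target_ug of total protein."""
    conc_ug_per_ml = (sample_abs - intercept) / slope  # read off the standard curve
    return target_ug / (conc_ug_per_ml / 1000.0)      # convert µg/mL to µg/µL


# Hypothetical standards (µg/mL) and their absorbances; not data from this study
standards = [0.0, 250.0, 500.0, 1000.0]
absorbances = [0.10, 0.35, 0.60, 1.10]
slope, intercept = fit_line(standards, absorbances)
print(f"{volume_for_mass_ul(0.60, slope, intercept):.1f} µL per digest")
```

The same volume of the same sample would go into the enzyme-containing and the mock reaction, so that both receive identical protein amounts.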

SDS-PAGE and Western Blot
For SDS-PAGE, samples of 5-10 µg protein were reduced with 50 mM dithiothreitol (DTT) for 15 min at 90°C and mixed with 4× sample loading buffer (Bio-Rad, Munich, Germany). Protein separation was carried out via SDS-PAGE in 12% polyacrylamide

Mass Spectrometry
The N-glycosylation pattern on rhEPO was analyzed via mass spectrometry (MS) of glycopeptides obtained by double digestion with trypsin and GluC. For this, the samples were reduced as described above and additionally S-alkylated with 120 mM iodoacetamide (IAA, final concentration) for 20 min at RT in the dark prior to SDS-PAGE. After Coomassie staining as described previously (Bohlender et al., 2020), bands in the molecular weight range of rhEPO, between 20 and 40 kDa, were excised. Double digestions were performed with trypsin (Promega, Walldorf, Germany) and GluC (Thermo Fisher Scientific) in 100 mM ammonium bicarbonate at 37°C overnight. Peptide recovery and sample cleanup were performed as described in Top et al. (2019). The initial MS analysis comparing the three test lines (I10, X13 and X24) was performed on a Q-TOF instrument as described in Michelfelder et al. (2017) and Top et al. (2019). Glycopeptide identification and quantitation were performed as described in Bohlender et al. (2020). In brief, glycopeptides were identified with custom Perl scripts from Mascot mgf files of processed raw data. Precursors were matched at a mass tolerance of 5 ppm, and the resulting spectra were scanned for typical glycosylation reporter ions such as GlcNAc oxonium ions (
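The two filtering steps named above, precursor matching at 5 ppm and a scan for glycan reporter ions in the fragment spectrum, can be sketched as follows. This is not the authors' Perl code: the function names, the 20 ppm fragment tolerance and the reporter-ion list (the commonly used HexNAc and HexHexNAc oxonium ions at m/z 204.0867 and 366.1395) are assumptions for illustration.

```python
PROTON = 1.00727646688  # proton mass (Da)

# Assumed singly charged oxonium reporter ions (m/z); not taken from the study
REPORTER_IONS = {"HexNAc": 204.0867, "HexHexNAc": 366.1395}


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def precursor_matches(observed_mz: float, neutral_mass: float, charge: int,
                      tol_ppm: float = 5.0) -> bool:
    """Compare an observed precursor m/z with a candidate glycopeptide mass."""
    theoretical_mz = (neutral_mass + charge * PROTON) / charge
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def reporter_ions_present(fragment_mzs: list[float],
                          tol_ppm: float = 20.0) -> list[str]:
    """Names of the assumed reporter ions found among MS2 fragment m/z values."""
    return [name for name, mz in REPORTER_IONS.items()
            if any(abs(ppm_error(f, mz)) <= tol_ppm for f in fragment_mzs)]


# Hypothetical candidate: neutral glycopeptide mass 1998.0 Da observed as [M+2H]2+
print(precursor_matches(1000.0073, 1998.0, 2))                # True
print(reporter_ions_present([138.0550, 204.0866, 512.30]))  # ['HexNAc']
```

A relative (ppm) rather than absolute tolerance is used because mass accuracy of such instruments scales with m/z.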

Statistics
Mass spectrometry data were obtained from technical triplicates. Statistical analysis was performed by one-way ANOVA with Šidák's multiple comparisons test, comparing the means of corresponding galactosylated structures between treatments. The significance level (α) was set to 0.05.
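Šidák's procedure keeps the family-wise error rate at α across m comparisons by testing each one at α_Šidák = 1 - (1 - α)^(1/m), or equivalently by adjusting each p-value to 1 - (1 - p)^m. The sketch below shows only this correction step; the one-way ANOVA itself (available, e.g., as scipy.stats.f_oneway) and the pairwise tests are not reproduced, and the number of comparisons and the p-values are hypothetical.

```python
def sidak_alpha(alpha: float, m: int) -> float:
    """Per-comparison significance level for m comparisons (Šidák)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjust(p_values: list[float]) -> list[float]:
    """Šidák-adjusted p-values for one family of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, 1.0 - (1.0 - p) ** m) for p in p_values]


# Hypothetical: three structure classes compared between digested and mock samples
print(round(sidak_alpha(0.05, 3), 5))
print([round(p, 4) for p in sidak_adjust([0.001, 0.02, 0.30])])
```

With three comparisons, each raw p-value must fall below about 0.017 to remain significant at α = 0.05.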

Expression of FTGT Leads to Efficient N-Glycan β1,4-Galactosylation on rhEPO and Attachment of Additional Unknown Pentoses
To achieve mature β1,4-galactosylation on rhEPO in moss, line 174.16, which produces rhEPO devoid of plant-specific xylose and α1,3-linked core fucose (Weise et al., 2007), was transformed with the expression construct encoding the chimeric β1,4-galactosyltransferase FTGT (Bohlender et al., 2020). This construct targets the genomic locus encoding β1,3-galactosyltransferase 1 (galt1). Knockout of galt1 via targeted integration was confirmed by PCR; therefore, galactose on rhEPO glycopeptides can be inferred to be β1,4-linked and not β1,3-linked (Parsons et al., 2012). Three lines (I10, X13 and X24) were chosen for MS-based analysis of rhEPO glycopeptides. A first MS survey revealed galactosylation in all three lines and at all three rhEPO N-glycosylation sites. However, in line I10 almost no di-antennary galactosylated structures were detected, and line X13 showed a larger proportion of immature N-glycans, such as AM structures, and a broader N-glycan heterogeneity across the three glycosylation sites than line X24 (Supplementary Table S2). Therefore, line X24 was chosen for further studies.
In addition to the expected galactosylation, single or multiple mass additions of 132.0423 Da, which had not been observed on the N-glycans before the introduction of FTGT, were detected. These mass additions, which correspond to the monoisotopic mass of one or more pentose residues, were found in all three analyzed FTGT-expressing lines (Supplementary Table S2). Characteristic reporter ions of N-glycan fragments carrying pentoses were detected in the MS² spectra of all rhEPO glycopeptides (Supplementary Figure S1). This indicates that the pentoses are attached to the N-glycans and not directly to the peptide backbone.
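Why a +132.0423 Da shift points to a pentose can be checked from elemental compositions: within a glycan, a pentose residue is C₅H₈O₄ (the free sugar minus water). The sketch below, assuming standard monoisotopic element masses, computes residue masses and infers how many pentoses a di-galactosylated (AA, HexNAc₄Hex₅) glycan carries from its mass increment on the peptide; the helper names and the 0.01 Da tolerance are illustrative assumptions.

```python
from typing import Optional

# Monoisotopic element masses (Da)
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Elemental compositions of monosaccharide residues (free sugar minus H2O)
RESIDUES = {
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},  # e.g., GlcNAc
    "Hex": {"C": 6, "H": 10, "O": 5},             # e.g., mannose, galactose
    "Pent": {"C": 5, "H": 8, "O": 4},             # any pentose, e.g., arabinose
}


def residue_mass(name: str) -> float:
    return sum(MONO[el] * n for el, n in RESIDUES[name].items())


def glycan_increment(counts: dict[str, int]) -> float:
    """Mass an N-glycan adds to its peptide: the sum of its residue masses."""
    return sum(residue_mass(name) * n for name, n in counts.items())


AA = {"HexNAc": 4, "Hex": 5}  # GnGn core plus one galactose on each antenna


def infer_pentoses(observed: float, base: dict[str, int] = AA,
                   max_n: int = 5, tol_da: float = 0.01) -> Optional[int]:
    """Number of pentoses that explains an observed glycan mass increment."""
    for n in range(max_n + 1):
        if abs(observed - glycan_increment({**base, "Pent": n})) <= tol_da:
            return n
    return None


print(f"{residue_mass('Pent'):.4f}")                         # 132.0423
print(infer_pentoses(glycan_increment(AA) + 2 * 132.0423))  # 2
```

Because a hexose residue (162.0528 Da) differs from a pentose by CH₂O, accurate-mass data clearly separate the two classes, even though they cannot tell apart different pentoses.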
While up to three pentoses were detected on mono-antennary galactosylated N-glycans, mass shifts corresponding to up to five pentoses were measured on di-antennary galactosylated N-glycans (shown for the rhEPO glycopeptide HCSLNENITVPDTK in Figure 1A). Additionally, some pentosylated N-glycan structures carried single or multiple mass increments of 14.0157 Da, characteristic of methyl groups. The number of these increments ranged from one up to the number of attached pentoses (Figure 1, Supplementary Figure S2). From this analysis alone, it was not obvious whether the detected structures were methylpentoses or deoxyhexoses (e.g., fucoses), as the monoisotopic mass of a deoxyhexose matches that of a methylpentose.
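The ambiguity follows directly from elemental composition: O-methylation replaces an H with CH₃, a net gain of CH₂ (14.0157 Da), so a methylpentose residue has the same formula, C₆H₁₀O₄, as a deoxyhexose residue such as fucose, and accurate mass alone cannot tell them apart. A minimal check, assuming standard monoisotopic element masses (the constant names are illustrative):

```python
from collections import Counter

# Monoisotopic element masses (Da)
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def mono_mass(formula: Counter) -> float:
    return sum(MONO[el] * n for el, n in formula.items())


PENTOSE = Counter({"C": 5, "H": 8, "O": 4})       # pentose residue
METHYL_GAIN = Counter({"C": 1, "H": 2})           # net change of one O-methylation
DEOXYHEXOSE = Counter({"C": 6, "H": 10, "O": 4})  # e.g., fucose residue

methylpentose = PENTOSE + METHYL_GAIN

print(f"{mono_mass(METHYL_GAIN):.4f}")  # 14.0157
print(methylpentose == DEOXYHEXOSE)     # True: identical formula, identical mass
```

Distinguishing the two therefore requires orthogonal evidence, such as antibody binding or linkage-specific enzymatic digestion.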

Western Blot of rhEPO with an Arabinose-specific Antibody
As a first step to identify the nature of the unknown N-glycan-attached pentoses, proteins recovered from the culture supernatants of the β1,4-galactosylating moss line X24, the parental line 174.16, and line Δgalt1 (devoid of any N-glycan galactosylation) were analyzed via Western blot with the antibody LM6-M, which recognizes short α-L-1,5-arabinan chains (Cornuault et al., 2017). For each line, a strong, defined signal in the high molecular weight range (>180 kDa) was observed, which in Physcomitrella is known to be associated with arabinogalactan-proteins (Lee et al., 2005). In the lower molecular weight range, a signal at around 37 kDa was detected exclusively in the X24 sample (Figure 2A). To check whether this signal is related to rhEPO, the antibodies were stripped from the membrane and an anti-hEPO detection was performed. This immunodetection revealed rhEPO-corresponding signals between 27 and 37 kDa in all analyzed lines (Figure 2B). The signal with the lowest molecular weight was detected in Δgalt1, which displays the most reduced glycosylation pattern of the three investigated lines. The intermediate signal derived from line 174.16, while the signal with the highest molecular weight, ranging from 30 to 37 kDa, was detected in X24, consistent with the increased mass of the rhEPO-attached N-glycans due to additional galactosylation and pentosylation. The upper part of this rhEPO-corresponding band in line X24 overlaps with the position of the signal detected with the LM6-M antibody (Figure 2A). Therefore, we conclude that the LM6-M antibody detects arabinoses on rhEPO produced in moss line X24.

Specific Enzymatic Activity of α-L-Arabinofuranosidase on X24-produced rhEPO Confirmed Arabinosylation
To further investigate the arabinose residues attached to β1,4-galactosylated rhEPO N-glycans, samples of all three rhEPO-producing lines were digested with α-L-arabinofuranosidase. The enzyme-treated samples were analyzed first via immunodetection with the LM6-M antibody and then with anti-hEPO antibodies, and compared to mock-treated samples as a control for possible non-enzymatic hydrolysis.
With the α-L-arabinan-detecting LM6-M antibody, samples incubated without α-L-arabinofuranosidase showed a band profile similar to that of the untreated samples analyzed previously. Only in the sample from moss line X24 was a band detected in the lower molecular weight range, at about 37 kDa. Strong LM6-M signals above 180 kDa, corresponding to arabinogalactan-proteins, were observed for all undigested samples (Figures 2A, 3A). These high-molecular-weight signals disappeared from the α-L-arabinofuranosidase-digested samples, confirming the activity of the enzyme, which can digest the 1,5-linked arabinans known to be attached to arabinogalactan-proteins in Physcomitrella (Lee et al., 2005). Furthermore, the arabinose-specific LM6-M signal also disappeared from the digested X24 sample (Figure 3A). The rhEPO-corresponding signals, however, were detected in all samples in the Western blot with the hEPO-specific antibody (Figure 3B), supporting the hypothesis that the absence of an arabinose-specific signal after α-L-arabinofuranosidase digestion is due to the loss of N-glycan-attached arabinoses from rhEPO in the β1,4-galactosylating line X24.

Mass-spectrometric Validation of Enzymatic Digestion of Arabinoses on rhEPO Produced in the β1,4-Galactosylating Moss Line
The exact effect of the α-L-arabinofuranosidase treatment on rhEPO glycopeptides of the β1,4-galactosylating line X24 was further analyzed in triplicate via mass spectrometry in comparison to undigested samples. The total N-glycan distribution on rhEPO was estimated by summing the quantitative values (peak areas) of the detected glycopeptides. Values were further summed for N-glycan classes across all three rhEPO N-glycosylation sites (Figure 4). For easier comparison, the quantitative values of all pentose-carrying N-glycan structures were also summed, to distinguish the total pentosylated and non-pentosylated proportions of each identified structure (Figure 4A). A detailed breakdown of all identified structures and modifications is given in Supplementary Table S4, and the quantification is depicted in Supplementary Figure S2. The MS data of X24-derived rhEPO glycopeptides from α-L-arabinofuranosidase-digested and undigested samples showed a total share of galactosylated N-glycans (comprising pentosylated and non-pentosylated AM, AGn and AA structures; abbreviations explained in Figure 1C) of about 66% each. Also, the proportions of mono-antennary (AM, AGn) and di-antennary (AA) processed structures within the galactosylated fraction were the same under both conditions, approximately 35 and 65%, respectively.

FIGURE 3 | Western blots of mock-treated and α-L-arabinofuranosidase-digested samples from rhEPO-producing Physcomitrella lines. Ten micrograms of total protein from precipitated culture supernatants of the rhEPO-producing lines 174.16, Δgalt1 and X24 were digested with one unit of α-L-arabinofuranosidase (α-Arafase), while control samples were treated equivalently but without α-L-arabinofuranosidase (mock). After separation by SDS-PAGE and blotting, the PVDF membrane was incubated successively with the anti-1,5-α-L-arabinan antibody (LM6-M, 1:10) (A) and an anti-hEPO monoclonal antibody (1:4,000) (B).

FIGURE 4 | Quantitative MS/MS analysis of the N-glycan distribution on rhEPO from α-L-arabinofuranosidase-treated in comparison to mock-treated samples. Prior to MS analysis, X24-derived rhEPO-containing samples were digested with α-L-arabinofuranosidases, and mock-treated samples without enzyme were prepared in parallel. Quantitative values are derived from detected glycopeptides. For better visualization, the nomenclature for isomeric structures was simplified (e.g., AM, MA or a mixture of both are all displayed as AM). (A) N-glycosylation patterns of rhEPO represented as relative percentages of all identified N-glycan structures within a category (α-L-arabinofuranosidase-treated or mock-treated). For easier comparison, the quantitative values of all pentose-carrying N-glycan structures were summed; thus, for each structure the total non-pentosylated and, if applicable, pentosylated shares are depicted. The presence of pentoses on N-glycan structures is displayed as +P, with the range of detected pentoses on the corresponding structure given as subscript numbers. (B) For a more detailed representation, the pentosylated share of each N-glycan structure is further broken down by the number of pentoses (nP) identified on the respective structure. (C) Quantitative profile of methylation (+Me) on the identified N-glycan structures. Depicted is the mean of three technical replicates with standard deviation. Stars indicate significance levels from a one-way ANOVA (α = 0.05) with subsequent Šidák's post hoc test (*: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001). M: mannose, Gn: N-acetylglucosamine, A: galactose, P: pentose. A detailed breakdown of all identified structures and modifications is given in Supplementary Table S4.
However, in the undigested samples 92% of the galactosylated N-glycans were pentosylated, whereas in the α-L-arabinofuranosidase-treated samples only 29% of the galactosylated structures remained pentosylated (Figure 4A). In the undigested samples, the galactosylated N-glycans carried single (28%), double (31%), triple (24%), quadruple (8.5%) and, in less than 1% of cases, quintuple pentose attachments. The α-L-arabinofuranosidases cleaved pentoses from N-glycans without a clear selectivity for any particular number of attached pentoses (one to five), leading to a high proportion of structures with complete pentose removal and a clear increase in the corresponding N-glycan structures with terminal galactose (Figures 4A,B). This suggests that the pentoses remaining in the digested sample are due not to enzyme specificity but rather to incomplete enzyme efficiency. Finally, we analyzed the single and multiple mass increments of 14.0157 Da in digested and undigested samples. The fraction of pentosylated structures with 14.0157 Da increments remained constant after digestion with α-L-arabinofuranosidases: in both treatments, about 40% of the pentosylated glycans carried this modification, indicating that the α-L-arabinofuranosidase activity decreased the amount of all pentosylated structures, regardless of the presence of the 14.0157 Da mass increments (Figure 4C). The maximal number of 14.0157 Da mass additions identified on pentosylated N-glycan structures matches the number of attached pentoses, and this mass shift was not observed on non-pentosylated N-glycans (Figure 1A). Moreover, these additions did not interfere with the specific arabinose-cleaving activity of the enzymes and disappeared when the pentoses were cleaved from the N-glycans (Figure 4C). Therefore, and consistent with the specificity of the enzymes, we conclude that the masses detected on the galactosylated N-glycans are α-linked arabinofuranoses, which are occasionally methylated.
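The quantification described above amounts to summing peak areas per structure class over all glycosylation sites, expressing each class as a share of the total, and pooling pentosylated variants per structure as in Figure 4A. The sketch below uses hypothetical labels and peak areas, not data from this study; its last lines only re-add the reported pentose-number distribution, whose four main classes account for 91.5% of galactosylated N-glycans, consistent with the reported 92% once the <1% quintuple share is included.

```python
from collections import defaultdict


def relative_shares(peaks: list[tuple[str, float]]) -> dict[str, float]:
    """Sum peak areas per glycan label (over all sites) and convert to %."""
    totals: dict[str, float] = defaultdict(float)
    for label, area in peaks:
        totals[label] += area
    grand_total = sum(totals.values())
    return {label: 100.0 * area / grand_total for label, area in totals.items()}


def pool_pentosylated(shares: dict[str, float]) -> dict[str, float]:
    """Pool labels such as 'AA+1P' ... 'AA+5P' into 'AA+P' (cf. Figure 4A)."""
    pooled: dict[str, float] = defaultdict(float)
    for label, share in shares.items():
        core, _, suffix = label.partition("+")
        pooled[f"{core}+P" if suffix.endswith("P") else core] += share
    return dict(pooled)


# Hypothetical (label, peak area) pairs, already summed over the three sites
peaks = [("GnGn", 30.0), ("AA", 15.0), ("AA+1P", 20.0), ("AA+2P", 15.0),
         ("AM", 10.0), ("AM+1P", 10.0)]
shares = relative_shares(peaks)
print(pool_pentosylated(shares))

# Reported pentose-number distribution (% of galactosylated N-glycans, undigested)
reported = {1: 28.0, 2: 31.0, 3: 24.0, 4: 8.5}  # quintuple (<1 %) not included
print(sum(reported.values()))  # 91.5
```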

DISCUSSION
Glycosylation, a frequent and complex post-translational modification of proteins, is a critical quality attribute of glycoprotein-based therapeutics, as it influences their conformation, solubility, activity, pharmacokinetics and antigenicity (Arnold et al., 2007; Solá and Griebenow, 2010). The composition of the respective N-glycans is dictated both by intrinsic characteristics of the protein itself, such as its conformation, and by the glycan-processing enzymes of the production platform (Clausen et al., 2015; Suga et al., 2018). The N-glycosylation of most biopharmaceutical production hosts, even of the predominantly used mammalian cell systems, differs from that of humans to varying extents (Wang et al., 2015). For instance, N-glycolylneuraminic acid (Neu5Gc), a sialic acid not produced in humans and consequently associated with antibody formation (Tangvoranuntakul et al., 2003; Padler-Karavani et al., 2011), can be found on N-glycans of glycoproteins produced in some non-human mammalian cell lines (Varki, 2001; Ghaderi et al., 2012).
Although plant N-glycosylation differs from the human pattern, its humanization, which comprises the removal of plant-specific sugar residues, the introduction of β1,4-galactosylation capacity and, finally, the establishment of terminal N-glycan sialylation, has been achieved to varying degrees in different plant systems (reviewed in Montero-Morales and Steinkellner, 2018). These studies have demonstrated the great flexibility of plants towards glyco-engineering. The moss Physcomitrella in particular offers the additional advantages of a high rate of homologous recombination in mitotic cells, which enables efficient and precise genome editing, and a haploid gametophytic tissue, which allows glyco-modifications to be implemented immediately (Parsons et al., 2012; Decker et al., 2014; Wiedemann et al., 2018).
The β1,4-linked galactoses on N-glycans provide the anchor for sialic acid, but terminal galactose also plays an important role in non-sialylated glycoproteins. For example, asialo-EPO was proposed to be neuroprotective (Erbayraktar et al., 2003; Peng et al., 2020), and on the Fc domains of monoclonal antibodies, terminal N-glycan galactosylation increases complement-dependent (Hodoniczky et al., 2005) as well as antibody-dependent cytotoxicity (Thomann et al., 2016).
In this study, we established β1,4-galactosylation on rhEPO produced in moss lines devoid of plant-specific sugar residues. To target the β1,4-GalT activity to the late Golgi compartments, we fused the catalytic domain of this enzyme to the CTS domain of the moss-endogenous α1,4-fucosyltransferase, whose activity is the last known step in plant N-glycan maturation (Fitchette et al., 1999; Parsons et al., 2012).
In our earlier study with sialylating moss lines (Bohlender et al., 2020), FTGT-mediated overall galactosylation reached up to 89%, and up to 34% of that fraction comprised di-antennary galactosylated structures. In contrast, in the current study 66% of all glycans carried galactoses, and 65% of these were di-antennary galactosylated, indicating a medial- to trans-Golgi localization of the FTGT enzyme. These values are very promising, considering that previous studies reported lower galactosylation efficiencies, with up to 20% and 12% di-antennary galactosylated rhEPO produced in N. tabacum and N. benthamiana plants, respectively (Kittur et al., 2013; Kriechbaum et al., 2020). However, the galactosylation efficiency on rhEPO produced in N. benthamiana was increased by knocking out the β-galactosidase NbBGAL1, an enzyme that cleaves galactose (Kriechbaum et al., 2020). A similar strategy might be applied to moss.
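Both moss studies report di-antennary galactosylation as a share of the galactosylated fraction, not of all N-glycans, so multiplying the two percentages gives a more direct comparison. The following is a minimal sketch under two assumptions: that the reported "up to" values can be treated as point estimates, and that the two percentages simply multiply. The function name `absolute_fraction` is purely illustrative.

```python
# Minimal sketch: convert "share of galactosylated glycans" figures into
# shares of *all* N-glycans. Assumes the reported "up to" values can be
# treated as point estimates and that the two percentages multiply.

def absolute_fraction(galactosylated: float, di_of_galactosylated: float) -> float:
    """Share of all N-glycans that are di-antennary galactosylated (hypothetical helper)."""
    return galactosylated * di_of_galactosylated

# Percentages quoted in the text, expressed as fractions.
earlier_sialylating_lines = absolute_fraction(0.89, 0.34)  # Bohlender et al., 2020
current_study = absolute_fraction(0.66, 0.65)

print(f"earlier lines: {earlier_sialylating_lines:.1%} of all N-glycans")
print(f"current study: {current_study:.1%} of all N-glycans")
```

Under these assumptions, roughly 30% of all N-glycans in the earlier sialylating lines and roughly 43% in the current lines carry galactose on both antennae. The current lines therefore yield more fully galactosylated structures despite a lower overall galactosylation rate. The N. tabacum and N. benthamiana figures are not included because the text does not state which fraction they refer to.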
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org February 2022 | Volume 10 | Article 838365 8
Accompanying the established human-like galactosylation, we detected pentose residues attached to β1,4-galactosylated N-glycans. Up to three pentoses were attached to mono-antennary and up to five to di-antennary galactosylated N-glycans, which indicates the formation of short pentose chains. These pentoses were not present in the corresponding parental line with an intact β1,3-galactosyltransferase (Parsons et al., 2012), indicating that naturally occurring β1,3-galactosylated N-glycans do not serve as a substrate for this modification.
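The inference that the pentoses form short oligosaccharides rather than single terminal caps follows from a simple counting (pigeonhole) argument on the maximum counts reported: three pentoses with one galactose, five with two. The sketch below assumes that every pentose belongs to an oligosaccharide rooted at one of the terminal β1,4-galactoses, which is consistent with pentoses appearing only after β1,4-galactosylation. The helper name is hypothetical. The counting alone cannot distinguish linear chains from branching.

```python
import math

# Hypothetical helper. Assumption: every pentose sits on an oligosaccharide
# rooted at one of the terminal β1,4-galactoses. By the pigeonhole principle,
# at least one galactose must then carry ceil(pentoses / galactoses) pentoses.
def min_pentoses_on_busiest_antenna(pentoses: int, galactoses: int) -> int:
    if galactoses < 1:
        raise ValueError("pentosylation was only observed on galactosylated glycans")
    return math.ceil(pentoses / galactoses)

# Maximum pentose counts reported in the text: (pentoses, galactoses).
observed = {
    "mono-antennary (1 Gal)": (3, 1),
    "di-antennary (2 Gal)": (5, 2),
}

for glycan, (n_pent, n_gal) in observed.items():
    k = min_pentoses_on_busiest_antenna(n_pent, n_gal)
    print(f"{glycan}: at least {k} pentoses on one galactose")
```

In both cases, at least one antenna must carry three pentoses. This agrees with the interpretation of short pentose chains and with the later detection by LM6-M, which recognizes short arabinan chains.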
In planta N-glycan pentosylation of a recombinant protein following the establishment of β1,4-galactosylation has also been observed in N. tabacum (Kittur et al., 2020), suggesting that this phenomenon is not restricted to Physcomitrella but affects plant-based production more generally. Pentosylation was also observed in sialylating moss lines (Bohlender et al., 2020). However, in these plants, galactosylated N-glycans carried either pentoses or sialic acid, indicating that pentosylation may interfere with the full N-glycan humanization of plant-derived glycoproteins. This makes the identification of these pentose residues all the more important.
Based on immunodetection with LM6-M, a monoclonal antibody recognizing short chains of α1,5-linked arabinan (Cornuault et al., 2017), we identified the pentoses on moss-produced rhEPO as arabinoses. Specific removal of these pentoses by α-L-arabinofuranosidase, an enzyme that cleaves α1,2-, α1,3- and α1,5-linked arabinofuranoses from arabinan molecules, was verified by immunodetection and supported by MS analysis of rhEPO glycopeptides. These findings confirm the identity of the pentoses as (short chains of) α-linked arabinofuranoses. Additionally, we found the arabinoses to be occasionally methylated. Some pentoses remaining after α-L-arabinofuranosidase digestion may be attributed to inefficient hydrolysis of α1,5-linked arabino-oligosaccharides by the enzymes used. These residual pentoses (structures with up to two pentoses) in the α-L-arabinofuranosidase-treated sample were not detected by the LM6-M antibody. This may be due to the very low concentration of these residual sugars and to the fact that LM6-M recognizes chains of two arabinoses with much lower avidity than longer arabinose chains; no information is available on its avidity towards single arabinose residues (Cornuault et al., 2017).
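In MS analysis of glycopeptides, a pentose appears only as a mass increment. Arabinose and xylose share the residue composition C5H8O4, which is why antibody- and enzyme-specific evidence was needed to name the sugar. The sketch below computes the expected monoisotopic mass shifts for up to five pentoses, optionally methylated. It uses standard monoisotopic atomic masses. The helper names are hypothetical, and charge states and the peptide backbone are deliberately not modeled.

```python
# Standard monoisotopic atomic masses (u): 12C, 1H, 16O.
C, H, O = 12.0, 1.00782503, 15.99491462

def residue_mass(c: int, h: int, o: int) -> float:
    """Monoisotopic mass of a composition (for sugars: monomer minus H2O)."""
    return c * C + h * H + o * O

PENTOSE = residue_mass(5, 8, 4)       # Ara and Xyl: same composition, same mass
DEOXYHEXOSE = residue_mass(6, 10, 4)  # e.g. fucose
METHYL = residue_mass(1, 2, 0)        # O-methylation: H replaced by CH3, net +CH2

def pentose_shift(n_pentoses: int, n_methyl: int = 0) -> float:
    """Mass added by n_pentoses pentoses, n_methyl of them methylated (hypothetical helper)."""
    if not 0 <= n_methyl <= n_pentoses:
        raise ValueError("cannot methylate more pentoses than are present")
    return n_pentoses * PENTOSE + n_methyl * METHYL

# Expected shifts for 1-5 pentoses, unmethylated and with one methylation.
for n in range(1, 6):
    print(n, f"{pentose_shift(n):.4f}", f"{pentose_shift(n, 1):.4f}")
```

A pentose adds about 132.042 u and each methylation a further 14.016 u. One caveat at the composition level is that a methylated pentose (C6H10O4) is isobaric with a deoxyhexose such as fucose, so mass alone cannot separate the two. The glyco-engineered lines used here lack the plant-specific fucose residues.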
Recently, the presence of arabinose and methylated arabinose on N-glycans of the microalga Chlorella sorokiniana has been described (Mócsai et al., 2020), and in very rare cases, methylation of N-glycans was detected in Physcomitrella wild-type strains (Stenitzer et al., 2022). However, we had never observed this sugar or any methylation on N-glycans of glyco-engineered Physcomitrella strains before the establishment of human-like β1,4-galactosylation. Evidently, an arabinosyltransferase from a different biosynthetic pathway recognizes the β1,4-galactosylated N-glycan as a substrate. Plants display a wide diversity of cell-wall glycans and O-glycosylated hydroxyproline-rich glycoproteins (Seifert et al., 2021). This diversity originates from the combination of different monosaccharides and various linkages, generated by a huge variety of glycosyltransferases, a considerable number of which have not yet been thoroughly characterized (Showalter and Basu, 2016; Amos and Mohnen, 2019). Some enzymes responsible for attaching arabinoses to β1,4-linked galactoses on O-glycosylated arabinogalactan proteins, as well as in cell-wall-associated structures such as rhamnogalacturonan I, have been described, but many remain unknown (Léonard et al., 2010; Laursen et al., 2018; Ropartz and Ralet, 2020; Petersen et al., 2021). Identifying the enzyme or enzymes responsible for the arabinosylation of galactosylated N-glycans is therefore not a straightforward task.
For the application of plant-based biopharmaceuticals, this newly appearing N-glycan modification bears a risk of immunogenicity in patients, as arabinose is a sugar not produced in humans (Anderson et al., 1984; Steffan et al., 1995; Leonard et al., 2005). To avoid arabinose attachment, the responsible arabinosyltransferases need to be identified and knocked out by gene targeting to create stable lines devoid of N-glycan arabinosylation. To this end, our study provides an important first step by identifying the previously unknown pentose residues, which helps to ensure the production of safe biopharmaceuticals in plant-based systems.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
LLB performed most of the experiments; SNWH performed the MS data analysis; NB performed some Western blot experiments; FRJ created the analyzed lines I10, X13 and X24; LLB, JP, RR and ELD designed the study and wrote the manuscript.

FUNDING
We gratefully acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy EXC-2189 (CIBSS to RR) and GSC-4 (SGBM to FRJ). We acknowledge support by the Open Access Publication Fund of the University of Freiburg.