Activation mechanism and activity of globupain, a thermostable C11 protease from the Arctic Mid-Ocean Ridge hydrothermal system

Deep-sea hydrothermal vents offer unique habitats for heat-tolerant enzymes with potential new enzymatic properties. Here, we present the novel C11 protease globupain, which was prospected from a metagenome-assembled genome of uncultivated Archaeoglobales sampled from the Soria Moria hydrothermal vent system located on the Arctic Mid-Ocean Ridge. Sequence comparisons against the MEROPS-MPRO database showed that globupain has the highest sequence identity to C11-like proteases present in human gut and intestinal bacteria. Successful recombinant expression in Escherichia coli of the wild-type zymogen and 13 mutant substitution variants allowed assessment of residues involved in maturation and activity of the enzyme. For activation, globupain required the addition of DTT and Ca2+. When activated, the 52 kDa proenzyme was processed at K137 and K144 into a heterodimer of a 12 kDa light chain and a 32 kDa heavy chain. A structurally conserved H132/C185 catalytic dyad was responsible for the proteolytic activity, and the enzyme demonstrated the ability to activate in-trans. Globupain exhibited caseinolytic activity and showed a strong preference for arginine in the P1 position, with Boc-QAR-aminomethylcoumarin (AMC) as the best substrate out of a total of 17 fluorogenic AMC substrates tested. Globupain was thermostable (Tm of the activated enzyme = 94.51°C ± 0.09°C) with optimal activity at 75°C and pH 7.1. Characterization of globupain has expanded our knowledge of the catalytic properties and activation mechanisms of temperature-tolerant marine C11 proteases. The unique combination of features such as elevated thermostability, activity at relatively low pH values, and ability to operate under highly reducing conditions makes globupain a potentially intriguing candidate for use in diverse industrial and biotechnology sectors.


Introduction
Proteases catalyze the hydrolysis of peptide bonds in proteins and are important in industrial applications (Gimenes et al., 2021). They are used in food and leather processing, as additives to detergents, as pharmaceuticals, and in biorefineries (Barzkar et al., 2018; García-Moyano et al., 2021). Proteases are among the most widely used enzymes globally, accounting for over 60 percent of all enzyme sales (Ward, 2011). Temperature-tolerant proteases offer the possibility for industrial processing at high temperatures by improving reaction rates, enhancing nongaseous reactant solubility, and reducing contamination by mesophiles (Kumar et al., 2000; Barzkar et al., 2018). Deep-sea hydrothermal vents sustain microorganisms at high temperatures (Kuwabara et al., 2007; Pikuta et al., 2007; Nunoura et al., 2008), making them an interesting starting point for the discovery of new thermostable proteases (Barzkar et al., 2018). Moreover, the increasing sequence diversity of encoded proteases revealed in hydrothermal vent microorganisms (Li et al., 2015; Dombrowski et al., 2018; Cheng et al., 2021) offers considerable potential for discovering novel proteases with optimized catalytic properties that may support future innovations.
Proteases are remarkably diverse in terms of activity and the nucleophilic residues that participate in hydrolysis (Rawlings and Bateman, 2019). Clostripain is a well-characterized endopeptidase originating from the bacterium Clostridium histolyticum (accession MER0000831) and is a member of enzyme family C11. Peptidases in this family are characterized by the presence of a catalytic cysteine-histidine dyad with a preference for hydrolyzing bonds with arginine or lysine in the P1 position (Ogle and Tytell, 1953; Barrett and Rawlings, 1996; Labrou and Rigden, 2004). Clostripain-like proteases are synthesized as inactive zymogens that have various requirements for activation (Kembhavi et al., 1991; McLuskey et al., 2016). Some require divalent cations such as Ca2+ and/or reducing agents such as dithiothreitol (DTT) for activation and catalysis. The number of cleavage sites used for activation also varies, and in some cases, an amino acid linker peptide is removed (Gilles et al., 1979; Dargatz et al., 1993). Nevertheless, the resulting active peptidase comprises a light chain and a heavy chain that together form an active heterodimer. In-trans activation has been demonstrated in some, while others activate in-cis, reflecting the accessibility of cleavage sites to neighboring peptidase activity (Herrou et al., 2016; Roncase et al., 2017; González-Páez et al., 2019; Roncase et al., 2019).
This report presents a C11 protease called globupain, with "globu" representing its unclassified Archaeoglobus species origin and "pain" depicting it as a clostripain homolog. The type species of the genus Archaeoglobus, Archaeoglobus fulgidus (Stetter, 1988), was one of the first archaea to have its genome sequenced 25 years ago (Klenk et al., 1997). It has since served as a model thermophilic archaeon and has provided important information about archaeal DNA replication (Maisnier-Patin et al., 2002), DNA repair (Knaevelsrud et al., 2010), thermostable enzymes (Steen et al., 2001) and enzymes of biotechnological relevance (Isupov et al., 2019; Palombarini et al., 2020). With globupain, we have discovered a novel archaeal clostripain-like protease with a complex activation mechanism. Its unique catalytic properties and high thermal stability make globupain a promising candidate for industrial applications.

Environmental sampling, DNA extraction, and sequencing
The Soria Moria vent field is part of the Jan Mayen vent fields (JMVFs), located at the southern part of the Mohns Ridge (Pedersen et al., 2005, 2010) in the Norwegian-Greenland Sea (71.2°N, 5.5°W). The end-member fluids of white smokers in the Soria Moria vent field have a pH of 4.1 and a concentration of hydrogen sulfide of 4.1 mmol kg−1 (Dahle et al., 2015). In June of 2011, an in-situ titanium incubator (Stokke et al., 2020) consisting of one chamber filled with 2 g of dried krill shells (Nofima, Bergen, Norway), mixed with grained flange rock material (Dahle et al., 2015), was deployed at ~30-35 cm below seafloor (blsf) in sediments at 716 m depth. The temperature was measured to be ~40°C and ~70°C at 20 and 30 cm blsf, respectively, indicating diffuse hydrothermal venting. The sample was recovered in July 2012, and the incubated material was immediately snap-frozen in liquid nitrogen and stored at −80°C. DNA was extracted with FastDNA™ SPIN Kit for Soil (MP Biomedicals, CA, United States) and sequenced at the Norwegian Sequencing Center (NSC) in Oslo.

Gene synthesis
Based on primary metagenome data, the globupain gene (GenBank accession OQ718499) was synthesized by GenScript (GenScript, NJ, United States) and codon-optimized for Escherichia coli expression (Supplementary Work Sheet 1). The gene was cloned (cloning site NdeI and XhoI) into pET-21A by GenScript, omitting the predicted 21 amino acid signal peptide (SignalP v6.0, Teufel et al., 2022). The resulting signal-free protein was extended with Met at the N-terminus, whereas the C-terminus was extended before the C-terminal hexahistidine tag (His tag) with Leu and Glu (LEHHHHHH). For identification of amino acids in the catalytic dyad and in maturation (Figure 1), targeted amino acid residues (Table 1) were substituted with Ala, and the respective coding genes were synthesized and cloned by GenScript as described for the wild-type (WT) globupain.

Protein production and purification
Expression plasmids for globupain and the substitution variants were transformed into BL21-Gold (DE3) chemically competent E. coli cells (Agilent, TX, United States) following the heat-pulse protocol supplied by the manufacturer. Cells were spread onto LB-agar plates supplemented with 100 μg/mL ampicillin and incubated at 37°C overnight. Pre-cultures were inoculated by picking one single colony and incubating at 37°C in LB media containing 100 μg/mL ampicillin with 190 rpm shaking overnight (Innova 44, New Brunswick Scientific, St Albans, United Kingdom). Expression cultures were inoculated with 5% of pre-culture in LB media with 100 μg/mL ampicillin at 37°C and 190 rpm. At an OD600 of 0.6, the temperature was set to 20°C, and the culture was equilibrated for 30 min. Heterologous expression was induced by adding IPTG to a final concentration of 0.1 mM, followed by overnight incubation (20°C). Cells were harvested by centrifugation at 5,500 rpm for 15 min at 4°C (Allegra™ 21R Centrifuge, Beckman Coulter, CA, United States). Pellets were stored at −20°C.
For purification of globupain and substitution variants, cells were resuspended in lysis buffer (50 mM HEPES, pH 7.5, 300 mM NaCl, 0.25 mg/mL lysozyme, 10% glycerol), placed on ice for 30 min, and lysed by ultra-sonication (5 times with 30% amplitude, in intervals of 20 s on ice using the Vibra Cell with probe model CV188, Sonics and Materials INC, LT, United States). The lysate was clarified by centrifugation at 5,500 rpm for 20 min at 4°C (Allegra™ 21R Centrifuge, Beckman Coulter, CA, United States). The sample was then loaded onto a HisTrap HP 5 mL column (Cytiva, Uppsala, Sweden) equilibrated with 20 mM HEPES, 500 mM NaCl, 25 mM imidazole, pH 7.5 at a flow rate of 1 mL/min. After elution with 20 mM HEPES, 500 mM NaCl, 500 mM imidazole, pH 7.5, fractions with the highest amount of enzyme were pooled and concentrated. The buffer was changed (20 mM HEPES, 150 mM NaCl, 0.1% CHAPS, pH 7.5) using an Amicon® Ultra-15 centrifugal filter unit (Merck KGaA, Darmstadt, Germany) with a 30 K molecular weight cut-off. Approximately 1 mL of the concentrated enzyme preparation was purified by gel filtration using a GE 16/600 Superdex 200 pg column (Cytiva, Uppsala, Sweden). Purified globupain and substitution variants were stored in 20 mM HEPES, 150 mM NaCl, 0.1% CHAPS, pH 7.5 at 4°C.

Maturation/activation
For activation of globupain and substitution variants (Table 1), purified enzyme (< 5 mg/mL) was incubated at 75°C for up to 4.5 h in 20 mM tri-sodium citrate dihydrate, 150 mM NaCl, pH 5.5 (at RT) with 2.5 mM DTT and 1 mM CaCl2 (activation buffer). To investigate whether globupain could activate in-trans, 10 μg of activated WT globupain was mixed with 10 μg of the inactive C185A variant. The number and size of cleavage products were assessed by visualization of protein bands on 8%-16% SurePAGE precast gels (GenScript) using MES SDS running buffer (GenScript) in a Bio-Rad Mini-PROTEAN Tetra Cell (BioRad, Hercules, CA, United States). For sample preparation, 4× lithium dodecyl sulfate (LDS) sample buffer (GenScript) with 2-mercaptoethanol was mixed with the protein sample, followed by denaturing at 95°C for 10 min. Gels were stained with InstantBlue™ ultrafast protein stain (Abcam, Cambridge, United Kingdom), and the size of bands was indicated by broad multicolor pre-stained protein standard (GenScript). Edman sequencing was performed on a Shimadzu PPSQ-53A at the Iowa State University Protein Facility, United States.

with 25 mM NH4HCO3 in 50% acetonitrile for 10 min each time. The gels were then dried completely in a Savant Speed Vac Plus AR (Thermo Fisher Scientific, MA, USA). A mixture of 10 mM TCEP and 25 mM iodoacetamide in 25 mM NH4HCO3 was added to cover the gel pieces and this reaction proceeded in the dark for 1 h. Gels were then washed with 25 mM NH4HCO3 and dehydrated with 25 mM NH4HCO3 in 50% acetonitrile. Samples were then dried in a Savant Speed Vac Plus AR (Thermo Fisher Scientific) before addition of 12.5 ng/μL trypsin in 25 mM NH4HCO3 for peptide digestion. Following a 10 min incubation at 4°C, the samples were covered in 25 mM NH4HCO3 and the digestion proceeded at 37°C for 20 h.
The supernatant was then transferred to a clean tube and the remaining peptides were extracted from the gel by addition of 50% acetonitrile, 5% formic acid. The extracted digests were then dried and resuspended in 0.1% formic acid to prepare for C18 (CPI International) ZipTip desalting. C18 columns were washed with methanol and spun for 45 s at 3,500 × g. Columns were then cleaned and equilibrated with 50% acetonitrile, 0.1% formic acid and 0.1% formic acid in water, respectively. Samples were then loaded onto columns and spun for 2 min at 2,000 × g. Samples were washed with 0.1% formic acid and spun at 3,500 × g for 45 s. Peptides were eluted from C18 with 50% acetonitrile, 0.1% formic acid by spinning at 3,500 × g for 45 s. Samples were dried in a Savant Speed Vac Plus AR (Thermo Fisher Scientific) and stored at −80°C until they were prepared for mass spectrometry.

LC-MS/MS
Samples were redissolved in 0.1% formic acid prior to LC-MS/MS injection. Chromatography was performed as previously described on an Easy-nLC 1200 (Thermo Fisher Scientific; Myers et al., 2019). Mass spectrometry was performed on an Orbitrap Eclipse with ETD and PTCR (Thermo Fisher Scientific). The scan range was 350-1,800 m/z at a resolution of 60,000 with a 50 ms maximum injection time. The top 8 scans were selected for MS2. MS/MS spectra were analyzed in PEAKS Studio (v 8.5) software (Bioinformatics Solutions Inc.). MS2 data were searched against the combined E. coli (GCA_000022665.2)

Figure 1. Sequence alignment of the C11 proteases globupain (Archaeoglobus), clostripain (Clostridium histolyticum), distapain (Parabacteroides distasonis), PmC11 (Parabacteroides merdae), and thetapain (Bacteroides thetaiotaomicron) by ESPript 3.0. Symbols depict results from site-directed mutagenesis of the globupain coding sequence: the His/Cys catalytic dyad; sites showing resistance against cleavage when the amino acid was mutated into alanine; and sites able to cleave when mutated into alanine. The detected N-terminal residues following activation are shown in bold.

Size-exclusion chromatography
Size-exclusion chromatography (SEC) analysis was performed using a Superdex 75 10/300 GL prepacked column connected to an ÄKTA pure 25 chromatography system (GE Healthcare). The column was equilibrated with a 50 mM potassium phosphate buffer (pH 7.0), 150 mM NaCl and then loaded with a 500 μL sample of globupain protein (1 mg/mL). The flow rate of the run was adjusted to 0.5 mL/min, and the absorbance was measured at 280 nm (mAU, milliabsorbance units). For the experiment, the column was calibrated with proteins of known molecular weight: alcohol dehydrogenase (tetramer), 146,800; bovine serum albumin, 66,000; ovalbumin, 43,000; trypsin inhibitor, 22,000; and cytochrome C, 12,400 (Sigma-Aldrich, St. Louis, MO, United States). Dextran blue 2000 (Cytiva) was used to determine the column void volume.

Analytical ultracentrifugation
Sedimentation velocity experiments were performed in a Beckman-Coulter ProteomeLab XL-I analytical ultracentrifuge (Indianapolis, IN, United States), equipped with an AN 60Ti 4-hole rotor and 12 mm path length, double-sector charcoal-Epon cells, loaded with 400 μL of sample and 410 μL of buffer (50 mM potassium phosphate buffer pH 7.0, 150 mM NaCl, and 1 mM EDTA). The experiments were conducted at 20°C and 50,000 rpm, using continuous scan mode and radial spacing of 0.003 cm. Scans were collected in absorbance, in 4 min intervals at 280 nm. Data were analyzed using the "Continuous c(s) distribution" model of the SEDFIT program (Schuck, 2000), with a confidence level (F-ratio) specified to 0.6. Biophysical parameters of the buffer, density (1.01395 g/cm3) and viscosity (1.030 mPa s), were measured at 20°C using an Anton Paar DMA 5000 density meter and a Lovis 2000 ME viscometer. Protein partial specific volume (V-bar) was estimated at 0.7309 mL/g using SEDNTERP software (version 1.10, Informer Technologies Inc., Dallas, TX, United States). The results were plotted using the GUSSI graphical program (Brautigam, 2015).

Thermal stability analysis
Thermostability of the inactive globupain and its activated form was assayed by nanoscale differential scanning fluorimetry (nanoDSF). Measurements were performed with a Prometheus NT.48 instrument (NanoTemper Technologies, München, Germany) and PR.ThermControl software using standard grade capillaries. Before measurement, the capillaries were sealed with a sealing paste according to the manufacturer's recommendations. The results were further analyzed with PR.StabilityAnalysis software. Thermostability of globupain zymogen at 0.4 mg/mL concentration and its activated form were assayed in 20 mM tri-sodium citrate (pH 5.5) buffer with 150 mM NaCl and 20 mM HEPES buffer (pH 7.5), 500 mM NaCl with 25 mM imidazole, respectively. Melting temperature (Tm) of proteins was determined by thermal unfolding with a temperature gradient between 20°C and 110°C at a ramp rate of 1°C/min. Thermal unfolding was measured by tryptophan and tyrosine fluorescence change at 330 and 350 nm emission wavelengths. All measurements were performed in triplicate.
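As an illustration of how a melting temperature can be read from such a trace, the following minimal Python sketch takes Tm as the temperature at which the first derivative of the F350/F330 ratio is largest. This is a common convention and is assumed here; it is not a description of the PR.StabilityAnalysis algorithm. The function name `melting_temperature` and the synthetic two-state trace are hypothetical and only serve to exercise the code.

```python
import math

def melting_temperature(temps_c, f330, f350):
    """Return the temperature at which d(F350/F330)/dT is largest.

    Assumes the ratio increases on unfolding; for a decreasing
    transition the minimum of the derivative would be used instead.
    """
    ratio = [b / a for a, b in zip(f330, f350)]
    best_t, best_slope = None, float("-inf")
    # Central differences on interior points of the temperature ramp
    for i in range(1, len(temps_c) - 1):
        slope = (ratio[i + 1] - ratio[i - 1]) / (temps_c[i + 1] - temps_c[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps_c[i], slope
    return best_t

# Hypothetical synthetic trace: 20-110 °C in 0.5 °C steps, midpoint 94.5 °C
temps = [20 + 0.5 * i for i in range(181)]
f330 = [1000.0] * len(temps)
f350 = [1000.0 * (0.7 + 0.3 / (1 + math.exp(-(t - 94.5) / 1.5))) for t in temps]

tm = melting_temperature(temps, f330, f350)
print(f"Estimated Tm: {tm:.1f} °C")
```

On real data the ratio is noisy, so the derivative is usually smoothed before the maximum is located.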

Casein activity assays
The proteolytic activity of activated globupain and its substitution variants was assessed using the casein gelzan™ CM plate assay and the EnzChek™ Protease Assay Kit (Thermo Fisher Scientific, MA, USA). The gelzan™ CM plate assay was prepared by autoclaving 1.5% gelzan™ CM (Sigma-Aldrich) dissolved in 20 mM tri-sodium citrate (pH 5.5) buffer with 150 mM NaCl. The casein powder (Sigma-Aldrich) was dissolved in 20 mM tri-sodium citrate dihydrate (pH 5.5), 150 mM NaCl. NaOH was added until the casein was fully dissolved, and the solution was then autoclaved at 115°C for 10 min. Casein was added to the gelzan™ CM solution at a final concentration of 1.0%. The casein gelzan™ CM solution was poured into sterile glass Petri dishes and left to harden. Wells were made by punching holes in the plates using an inverted sterile 1 mL pipette tip. To test for proteolytic activity, 60 μL of activated globupain at 0.7-1.0 mg/mL was added to wells and incubated overnight at 75°C. Clearance zones indicated caseinolytic activity.
When the proteolytic activity was assessed using the EnzChek™ Protease Assay Kit (Thermo Fisher Scientific), 20 mM tri-sodium citrate (pH 5.5) buffer with 150 mM NaCl was used to dilute the 1.0 mg/mL stock solution of BODIPY FL casein to 10 μg/mL. An aliquot of the activated enzyme (0.15 μg) was then added to the reaction mixture (100 μL of total volume) comprising 12.5 μL of 10 μg/mL BODIPY FL casein working solution and 77.5 μL of 20 mM tri-sodium citrate (pH 5.5) buffer with 150 mM NaCl, 10 mM DTT, and 1 mM CaCl2. The caseinolytic activity was measured by running a time-resolved fluorescence read at 60°C, measuring fluorescence intensity every 20 s for 100 cycles. Fluorescence was measured with excitation wavelength 485 nm and emission wavelength 530 nm using an EnSpire™ 2300 Multilabel Reader (PerkinElmer, Turku, Finland). All measurements were done in triplicate and baseline corrected using GraphPad Prism 9.1.0.

tert-butyloxycarbonyl, benzyloxycarbonyl, and pyroglutamyl, respectively. Substrates containing K-AMC were synthesized by Dennis Wolan, The Scripps Research Institute, La Jolla, California and purified to >95%. All substrates were stored at −20°C as 10 mM stocks in DMSO. Substrates were diluted to 100 μM in 20 mM tri-sodium citrate dihydrate, 150 mM NaCl, 2.5 mM DTT, and 1 mM CaCl2, pH 5.5, and mixed 1:1 with globupain such that the final concentrations in the assay were 2.9 μg/mL enzyme and 50 μM substrate. Assays were performed in triplicate wells of a black 384-well plate (Thermo Fisher Scientific). Fluorescence was measured at 50°C over 1 h at excitation 360 nm and emission 460 nm on a BioTek Synergy HTX Multimode Reader (BioTek, Agilent, TX, United States). The reaction rate was calculated as the maximum velocity over 12 sequential readings, and means with standard errors were calculated. A Welch's ANOVA and a Brown-Forsythe ANOVA were performed to calculate significance in GraphPad Prism 9.1.0.
With the Boc-QAR-AMC substrate, Michaelis-Menten kinetics were assessed at final concentrations ranging from 0 to 400 μM, and a Michaelis-Menten curve was fitted in GraphPad Prism 9.1.0 (Supplementary Figure 1).
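The "maximum velocity over 12 sequential readings" can be sketched as a sliding-window calculation. The Python example below assumes that the velocity in each window is the least-squares slope of fluorescence against time; the text does not state how the reader software computes it, so this is one plausible reading. The function names and the piecewise-linear trace are hypothetical.

```python
def window_slope(times, values):
    """Least-squares slope of values against times for one window."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx

def max_velocity(times, rfu, window=12):
    """Largest slope (RFU per time unit) over any `window` sequential readings."""
    if len(times) != len(rfu) or len(times) < window:
        raise ValueError("need at least `window` paired readings")
    return max(
        window_slope(times[i:i + window], rfu[i:i + window])
        for i in range(len(times) - window + 1)
    )

# Hypothetical trace: slow lag phase, a linear phase at 5 RFU/min, then a plateau
times = list(range(30))  # minutes
rfu = [2.0 * t if t < 10 else 20.0 + 5.0 * (t - 10) if t < 25 else 95.0 for t in times]

print(max_velocity(times, rfu))  # slope of the steepest 12-reading window
```

Taking the maximum over windows skips an initial lag and a late plateau, so the reported rate reflects the steepest sustained part of the progress curve.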
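To make the fitted model explicit, the sketch below estimates Km and Vmax from rate data under the Michaelis-Menten equation v = Vmax[S]/(Km + [S]). It uses the Hanes-Woolf linearization ([S]/v plotted against [S]) solved by ordinary least squares, which is a deliberate substitute for the nonlinear regression performed in GraphPad Prism and weights experimental error differently. The concentrations between 0 and 400 μM and the kinetic constants are hypothetical values chosen for demonstration, not measured data.

```python
def hanes_woolf_fit(substrate_um, velocity):
    """Estimate (Km, Vmax) from the line [S]/v = [S]/Vmax + Km/Vmax."""
    # Points with [S] = 0 or v = 0 are undefined on this plot and are skipped
    pts = [(s, s / v) for s, v in zip(substrate_um, velocity) if s > 0 and v > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-zero data points")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax

# Hypothetical noiseless data generated with Km = 60 μM and Vmax = 1000 RFU/min
true_km, true_vmax = 60.0, 1000.0
conc = [0.0, 12.5, 25.0, 50.0, 100.0, 200.0, 400.0]
rates = [true_vmax * s / (true_km + s) for s in conc]

km, vmax = hanes_woolf_fit(conc, rates)
print(f"Km = {km:.1f} μM, Vmax = {vmax:.1f} RFU/min")
```

For noisy measurements a direct nonlinear fit, as used in the study, is generally preferred over any linearization.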

Determining the pH and temperature optima
The pH optimum was determined using 5 nM globupain assayed against 50 μM Boc-QAR-AMC substrate in citrate phosphate buffer at various pH values. Buffers were made by mixing 0.2 M Na2HPO4 and 0.1 M citric acid following McIlvaine's buffer system (McIlvaine, 1921), and pH was verified using a pH meter. Samples were preincubated at 50°C for 10 min before fluorescence was measured. The optimum temperature for activity was assessed using Boc-QAR-AMC by incubating the enzyme and substrate at the following temperatures: 30°C, 40°C, 50°C, 60°C, 70°C, 80°C, 85°C, 90°C, 95°C, 100°C, 105°C, 110°C, 120°C, and 130°C in triplicate tubes in a final volume of 50 μL. The reaction temperatures were controlled using a digital dry bath (Thermo Fisher Scientific) with a maximum temperature of 130°C. After 10 min, the enzyme was inactivated by mixing 1:5 in 8 M urea. Samples were plated on a black 384-well plate (Thermo Fisher Scientific), and the total fluorescence was measured at excitation 360 nm and emission 460 nm. The data reported are the average RFU for each temperature with standard error. A Gaussian distribution was fitted in GraphPad Prism 9.1.0. For the time-dependent loss of enzyme activity at pH 5.5 and pH 7.1, the enzyme activity (0.1 mg/mL) was measured at 60°C with the EnzChek™ Protease Assay Kit. All readings were done in triplicate reactions with the exception of duplicates for pH 7.1 at 120 min. Measurements were baseline corrected in GraphPad Prism 9.1.0. Statistical analyses were performed using RStudio 2022.07.1 + 554 "Spotted Wakerobin" release (RStudio, 2023).
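The idea behind reading a temperature optimum from a Gaussian fit can be sketched as follows. Because the logarithm of a Gaussian is a parabola, the example fits a quadratic to ln(RFU) by least squares and takes the vertex as the optimum. This linearization stands in for the nonlinear Gaussian fit done in GraphPad Prism and is only valid for positive, baseline-corrected signals. The temperatures are those listed above; the RFU values, amplitude and width are hypothetical.

```python
import math

def fit_gaussian_peak(temps, rfu):
    """Fit RFU = A * exp(-(T - mu)^2 / (2 sigma^2)) via a parabola through ln(RFU).

    Returns (mu, sigma, A). Requires all RFU values to be positive.
    """
    t0 = sum(temps) / len(temps)          # center T for numerical stability
    x = [t - t0 for t in temps]
    y = [math.log(v) for v in rfu]
    # Normal equations for y = c0 + c1*x + c2*x^2 as an augmented 3x4 matrix
    s = [sum(xi ** k for xi in x) for k in range(5)]
    b = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [b[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * c for a, c in zip(m[r], m[col])]
    c0, c1, c2 = (m[i][3] / m[i][i] for i in range(3))
    if c2 >= 0:
        raise ValueError("data do not describe a peak")
    mu = t0 - c1 / (2 * c2)
    sigma = math.sqrt(-1 / (2 * c2))
    amplitude = math.exp(c0 - c1 ** 2 / (4 * c2))
    return mu, sigma, amplitude

# Assay temperatures from the text; RFU values are hypothetical and noiseless
temps = [30, 40, 50, 60, 70, 80, 85, 90, 95, 100, 105, 110, 120, 130]
rfu = [1000.0 * math.exp(-((t - 75.0) ** 2) / (2 * 20.0 ** 2)) for t in temps]

mu, sigma, amplitude = fit_gaussian_peak(temps, rfu)
print(f"Optimum = {mu:.1f} °C, width (sigma) = {sigma:.1f} °C")
```

The log transform gives low-signal points disproportionate weight, which is one reason a direct nonlinear fit is usually preferred for measured data.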

Data availability
The native C11 globupain protease has been submitted to GenBank under the accession number OQ718499. The sequence is archived under BioProject PRJNA296938 and derived from the primary metagenomic assembly, BioSample SAMN04111445. The reconstructed Archaeoglobus genome, INS_M23_B45, has been archived under the BioSample SAMN33944460 derived from the secondary metagenome, BioSample SAMN33925184. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JARQZL000000000. The version described in this paper is version JARQZL010000000.

Metagenomic globupain discovery
By conducting deep-sea hydrothermal in situ enrichments amended with targeted biomass, we have previously shown induced shifts in community structure toward higher fractions of heterotrophic microorganisms (Stokke et al., 2020). Furthermore, in silico screening from derived metagenomes has shown a high potential for discovering novel enzymes (Stepnov et al., 2019; Vuoristo et al., 2019; Arntzen et al., 2021). In the current study, a novel protease named globupain was identified from an in situ enriched metagenome and targeted for expression and characterization. The selected gene encoded a C11 protease and originated from a metagenome-assembled genome (MAG) classified as an uncharacterized genome within the genus Archaeoglobus (INS_M23_B45; SAMN33944460). The putative polypeptide comprised 481 amino acids with a 21 amino acid N-terminal signal peptide (Figure 1). The estimated molecular mass after signal peptide removal was 52.0 kDa, and the pI was 4.2, as determined by the ProtParam tool (Gasteiger et al., 2005). The highest sequence identity scores against the MEROPS-MPRO database (Rawlings et al., 2016) were to the human gut and intestinal C11 members: clostripain, 23.5% (C. histolyticum); distapain, 27.3% (Parabacteroides distasonis); PmC11, 24.2% (Parabacteroides merdae); and thetapain, 26.9% (Bacteroides thetaiotaomicron). Sequence alignments (Figure 1) indicated a conserved catalytic His/Cys dyad in globupain at positions 132 and 185, respectively. Moreover, in the globupain model obtained with the deep learning-based AlphaFold algorithm (Figure 2A; Jumper et al., 2021), structural similarities were observed between the predicted structure and the available PmC11 crystal structure (Figure 2B; PDB ID: 4YEC). The active site residues in PmC11 (e.g., D177), including catalytic H133 and C179, were conserved between both structures (Figure 2C).
The residues that did not overlay well with PmC11 include an area between the known heavy and light chains of PmC11 (Figure 2D) and a long C-terminal region (Figure 2E).

Globupain activation
Globupain and substitution variants were expressed as soluble proteins in E. coli BL21-Gold (DE3) cells, with almost 100% of the total recombinant protein as soluble enzyme (Supplementary Figures 2, 3). The purified WT enzyme (Supplementary Figure 4) was produced as an inactive zymogen. However, incubation at 75°C for 4.5 h (Supplementary Figure 5) in the activation buffer resulted in an active form of the C11 globupain. SDS-PAGE imaging showed that the 52 kDa zymogen was cleaved into a 32 kDa heavy chain and a 12 kDa light chain (Figures 3A,B), forming a heterodimer stabilized by noncovalent bonding. Oligomeric structure analysis performed by size-exclusion chromatography and analytical ultracentrifugation (AUC) revealed that globupain in zymogen form exists in solution as a homodimer (Supplementary Figures 6, 7). While the zymogen is inactive, the enzyme, after activation, can hydrolyze casein (Figures 3E,F). Globupain activation and its enzymatic activity, as in the case of other C11 proteases (Labrou and Rigden, 2004), depend on the presence of a His/Cys catalytic dyad in the primary protein sequence (H132/C185). H132 of the light chain is responsible for deprotonating the neighboring C185 in the heavy chain, which then promotes its nucleophilic attack on the substrate. For globupain, we found that substitution variant C185A generated by site-directed mutagenesis cannot be activated (Figure 3C). Also, no caseinolytic activity was observed for either the H132A or the C185A variant (Figure 3E). Edman sequencing of the heavy chain on a Shimadzu PPSQ-53A at the Iowa State University Protein Facility revealed that the N-terminus consisted of G145VCWD; hence cleavage (*) occurred between K144 and G145 within the sequence LPPIK*GVCWD (Figure 1). To further evaluate globupain autoprocessing at this cleavage site, a K144A variant was constructed.
Notably, processing of this variant's zymogen into heavy and light chains still occurred, with cleaved products of similar size to those of WT globupain, as visualized by SDS-PAGE (Supplementary Figure 8). To further assess whether this result could be explained by cleavage after nearby Lys residues, 7 new variants were synthesized (Table 1; Supplementary Figure 8). Only the double (K137A/K144A) and triple (K137A/K139A/K144A) mutants failed to activate into the processed form (Figure 3C; Supplementary Figure 8) and remained catalytically inactive (Figure 3E), which altogether suggests that globupain can self-activate by cleavage after either K137 or K144 (Figure 1).
When mixing the WT zymogen with inactive C185A and performing the standard activation protocol, both the WT and C185A proteins were processed into the light and heavy chains (Figure 3D). This finding demonstrates that globupain can activate in-trans and indicates that the sites for activation are exposed for cleavage by nearby proteases. Interestingly, cleavage at the activation sites results in the removal of the unique region that poorly overlays with PmC11 (Figure 2D).
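The residue numbering around the activation site can be checked with a short Python sketch. It uses only the ten-residue window LPPIK*GVCWD quoted above and assumes, from K144 and G145, that this window spans residues 140 to 149. The helper names are hypothetical, and the full globupain sequence is not reproduced here.

```python
def lysine_positions(sequence, first_residue_number):
    """Residue numbers of all Lys (K) in a numbered sequence window."""
    return [first_residue_number + i for i, aa in enumerate(sequence) if aa == "K"]

def cleave_after(sequence, first_residue_number, sites):
    """Split a sequence window after each residue number listed in `sites`."""
    fragments, start = [], 0
    for site in sorted(sites):
        cut = site - first_residue_number + 1  # index just past the P1 residue
        if not 0 < cut < len(sequence):
            raise ValueError(f"site {site} lies outside the window")
        fragments.append(sequence[start:cut])
        start = cut
    fragments.append(sequence[start:])
    return fragments

window = "LPPIKGVCWD"   # sequence around the cleavage site, from the text
first = 140             # assumed numbering: L140 ... K144, G145 ... D149

print(lysine_positions(window, first))     # [144]
print(cleave_after(window, first, [144]))  # ['LPPIK', 'GVCWD']
```

The second fragment begins with GVCWD, matching the heavy-chain N-terminus found by Edman sequencing.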
Figure 2. Globupain modeled structure predicted with AlphaFold in comparison to PmC11. (A) AlphaFold predicted structure of globupain is represented by cartoon with transparent surface (pale cyan). (B) AlphaFold model of globupain compared to the crystallized PmC11 (PDB ID: 4YEC) structure (light pink) in a 3D alignment showing their structural similarity. (C) Active site residues (e.g., His and Cys) are conserved among the two aligned structures. (D) Light and heavy chain cleavage region is depicted for the modeled globupain superimposed on PmC11's structure, as well as the (E) likely C-terminal cleavage region. Images were generated with PyMOL (v0.99c).

The combined molecular mass of the heavy and light chains of activated globupain was determined to be 44 kDa, which, when compared to the 52.0 kDa zymogen (Figures 3A,B), indicates that additional autoprocessing occurs during activation. This discrepancy in molecular weight points to a likely cleavage in the C-terminal region, which the model supports (Figure 2E). Activated globupain (Supplementary Figure 9) failed to bind to the Ni2+ affinity column, and the C-terminal His tag of globupain was not detected on either the light or the heavy chain (Supplementary Figure 10), altogether revealing that a C-terminal fragment that contains the His tag was removed during autoprocessing. In-gel digest and subsequent proteomics of the 44 kDa protein band without C-terminal His tag upon activation (Supplementary Figure 10) revealed the most N-terminal and C-terminal tryptic peptides to be I65DGYDDSYGNWTTAK79L and F384ATDTLWDEFLNR396 (data can be found at ProteomeXChange: PXD042411 or ftp://massive.ucsd.edu/MSV000092007/). This corresponds to a 332 amino acid protein fragment with an estimated Mw of 37.67 kDa using the ProtParam tool (Gasteiger et al., 2005). Although this proteomics study cannot clarify the exact cleavage location, it suggests that C-terminal processing occurs at R396 or C-terminal of this site. From the sequence alignments (Figure 1) of globupain and several family C11 members, it is clear that R396 does not represent a conserved Arg cleavage site for the respective enzymes. Further, overlay of the PmC11 crystal structure (Figures 2B,E) and the modeled globupain structure suggests that the processing occurs in the non-conserved structural region of the two enzymes. Moreover, manual inspection of the primary sequence suggested that K383 and R396 might be the putative cleavage sites. However, each of the enzyme variants K383A, R396A, and K383A/R396A was still processed into the active form, and its C-terminal portion was removed (Supplementary Figure 11).
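The bookkeeping behind these figures is simple enough to verify directly. The short Python sketch below recomputes the fragment length from the first and last residues detected by proteomics and the mass difference between the zymogen and the summed chains seen on SDS-PAGE. All numbers are taken from the text; the helper name `fragment_length` is hypothetical.

```python
def fragment_length(first_residue, last_residue):
    """Number of residues in a fragment, counting both end residues."""
    return last_residue - first_residue + 1

# Fragment bounded by the outermost tryptic peptides (I65 ... R396)
length = fragment_length(65, 396)

# Apparent masses from SDS-PAGE, in kDa
heavy_chain, light_chain, zymogen = 32.0, 12.0, 52.0
combined = heavy_chain + light_chain
unaccounted = zymogen - combined

print(length)       # 332
print(combined)     # 44.0
print(unaccounted)  # 8.0
```

Gel-based apparent masses are approximate, so the 44 kDa estimate and the 37.67 kDa ProtParam value for the 332-residue fragment are not expected to agree exactly.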

Substrate specificity (AMC) determination
To quantify globupain activity in a microwell plate assay, the enzyme was incubated with three substrates that were previously developed for another clostripain-like C11 family member, PmC11, which is encoded in the P. merdae genome (Roncase et al., 2017). The substrates consisted of tetrapeptides (VLXK) with an N-terminal acetyl group (Ac) and a C-terminal AMC reporter group. These substrates were chosen because the P1 residue corresponds to the N-terminal auto-activation site of globupain. For PmC11, Ac-VLTK-AMC was most efficiently cleaved, followed by Ac-VLGK-AMC. However, the substitution of the P2 residue for a hydrophobic Val side chain ablated substrate turnover by PmC11. All three substrates were tested against globupain and were found to be cleaved at similar rates (Figure 4A). This finding revealed that globupain has broader substrate specificity than PmC11 at the P2 position. Subsequently, 7 other substrates with Lys at P1 that were available in the laboratory were tested (Figure 4B). Globupain was able to cleave all of these substrates and cleaved Ac-AIK-AMC with the highest efficiency. Interestingly, globupain cleaved each of the 7 new substrates more efficiently than the three initial substrates, which were based on the optimal substrate of PmC11 from P. merdae, indicating a specificity distinct from that of PmC11. The most significant difference in cleavage (p < 0.01) among the 7 new substrates was between Ac-AIK-AMC (the most efficient) and Ac-YK-AMC (the least efficient). Activity with Ac-AIK-AMC was also significantly higher (p < 0.01) than with Ac-AEIK-AMC. We hypothesized that the broad enzymatic activity of globupain for degrading casein may be partially due to cleavage following a structurally similar amino acid, arginine. Therefore, fluorogenic substrates with Arg as the P1 residue were examined (Figure 4C). This screen of 7 additional substrates revealed that globupain cleaved some of them with even higher efficiency than the previous best substrate with Lys at P1. However, some substrates, such as z-RR-AMC and Pyr-RTKR-AMC, showed minimal cleavage, indicating that globupain may favor non-polar amino acids at the P2 position. Out of the 17 AMC substrates that were tested in this study, it was clear that globupain had a strong preference for Arg in the P1 position, with Boc-QAR-AMC as the best substrate.

Figure. Activation of globupain. (A) Schematic representation of the primary structure of globupain. The enzyme was overproduced without the N-terminal 21 amino acid signal peptide (SP). The light chain (yellow) and the heavy chain (green) of the active heterodimer result from zymogen activation. Cleavage sites at K137 and K144 are shown. H132 and C185 of the catalytic dyad are indicated on the light and heavy chain, respectively. The putative C-terminal region is indicated with gray stripes. Single and double mutation variants of K383 and R396 were tested for zymogen activation. (B) SDS-PAGE gel showing the inactive C185A variant and wild type (WT), both at 52 kDa, and activation of WT into a 32 kDa heavy chain and a 12 kDa light chain when incubated at 75°C for 4.5 h in activation buffer. The region within the red rectangle was excised for N-terminal sequencing. (C) SDS-PAGE gel analysis shows that the C185A variant and the K137A/K144A variant cannot be processed into a heterodimer. (D) SDS-PAGE image of WT, C185A and WT + C185A incubated at 75°C for 0 h and 8 h in activation buffer shows that the enzyme is able to activate in-trans. (E) When activated for 4.5 h at 75°C in activation buffer, globupain can cleave casein, whereas the mutation variants H132A, C185A and K137A/K144A showed no increase in fluorescence (RFU) when assayed with the EnzChek™ Protease Assay Kit at 60°C. (F) Casein-gelzan™ plate showing globupain zymogen and clearance zones when activated at 75°C for 0-8 h in activation buffer.
3.4. Thermal stability, optimal temperature, and pH
Using the Boc-QAR-AMC substrate, the temperature optimum of globupain was determined to be 75.4°C ± 0.56°C, and the enzyme remained 90% active at 60°C and 90°C (Figure 5A). Thermal stability was characterized by the melting temperature, the point at which half the protein is unfolded. The melting temperature was 84.59°C ± 0.21°C for inactive globupain and 94.51°C ± 0.09°C for the activated heterodimer (Figure 5B). Finally, the optimum pH of globupain was evaluated using the Boc-QAR-AMC substrate. The optimum pH for catalytic activity was calculated to be pH 7.1 (Figure 6A). An initial analysis of covariance (ANCOVA) indicated a significant effect of pH on fluorescence after controlling for time, F(3,26) = 42.85, p < 0.05, R² = 83.18%. A follow-up post hoc Tukey's honestly significant difference (HSD) test indicated that RFU decreased more strongly over time at pH 7.1 than at pH 5.5, p < 0.05 (Figure 6B). Thus, although the optimum pH is higher than the pH used for activation, the enzyme was more stable against autolysis at pH 5.5 than at pH 7.1 (Figures 6C,D), which supported our use of pH 5.5 buffers for the biochemical characterization of globupain.
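As a quick consistency check on the ANCOVA values reported above, the sketch below computes the F statistic implied by the reported R² and degrees of freedom. It assumes that the reported F(3,26) is the overall model F statistic of an ordinary linear model, which the text does not state explicitly; the function name is illustrative only.

```python
def f_from_r_squared(r_squared, df_model, df_residual):
    """Overall F statistic implied by R^2 for a linear model.

    Uses the standard identity F = (R^2 / df_model) / ((1 - R^2) / df_residual).
    """
    return (r_squared / df_model) / ((1.0 - r_squared) / df_residual)


# Values as reported in the text: R^2 = 83.18 %, F(3, 26) = 42.85.
# Treating F as the overall model F is an assumption of this sketch.
REPORTED_F = 42.85
f_implied = f_from_r_squared(0.8318, df_model=3, df_residual=26)

print(f"implied F = {f_implied:.2f}, reported F = {REPORTED_F:.2f}")
```

Under this assumption, the implied value agrees with the reported F to within the rounding of R², which suggests that the two statistics describe the same fitted model.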

Discussion
Figure 4. Substrate utilization by globupain. WT enzyme was assayed against 50 μM of each fluorescent substrate at 50°C, fluorescence was measured, and the rate of enzyme cleavage, Vmax, for each substrate is reported. (A) Globupain was assayed against the substrates initially designed for PmC11. (B) Globupain was assayed against 7 additional substrates with Lys at P1. (C) Globupain was assayed against 7 additional substrates with Arg at P1.

In this study, we characterized the novel cysteine protease globupain, which belongs to enzyme family C11. Globupain was prospected from metagenomic data assigned to an unclassified Archaeoglobus species from the Arctic Mid-Ocean Ridge vent fields. The enzyme was highly soluble and was expressed at relatively high concentrations in E. coli (Supplementary Figure 4). Two protein bands (52 kDa and 40 kDa) with an intact C-terminal His tag were visualized on SDS-PAGE gels after protein purification (Supplementary Figure 10). The zymogen of globupain is processed in the N-terminal region at K137 and K144 to yield a heavy and a light chain when exposed to activation conditions. Similar to clostripain from C. histolyticum (Kembhavi et al., 1991), globupain requires calcium and a reducing environment for activation (Supplementary Figure 5). This requirement contrasts with the C11 protease PmC11 from P. merdae, which activates independently of calcium (McLuskey et al., 2016). When activated, globupain cleaves off a C-terminal region; our proteomic analysis indicated that this cleavage occurs at R396 or C-terminal to this site. This kind of autoprocessing is not uncommon for C11 proteases; for example, activation of clostripain starts with the removal of a 23 amino acid pro-peptide (Dargatz et al., 1993). Cleavage at the two sites, K137 and K144, removes a 7-amino acid linker sequence and forms a heterodimer consisting of a heavy and a light chain.
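The residue arithmetic behind the linker and the light chain can be sketched as follows. The residue numbers and the 21-residue signal peptide come from the text; the sketch assumes that cleavage occurs C-terminal to each lysine, that the recombinant protein starts directly after the signal peptide, and that an average residue mass of about 110 Da applies. These are illustrative assumptions, not statements from the study.

```python
# Residue numbers from the text; cleavage is ASSUMED to occur C-terminal
# to each lysine (Lys at P1).
SIGNAL_PEPTIDE_LENGTH = 21       # residues removed from the construct
CUT_SITES = (137, 144)           # K137 and K144
AVG_RESIDUE_MASS_KDA = 0.110     # rule-of-thumb average (assumption)


def linker_length(first_cut, second_cut):
    """Residues released when the chain is cut after first_cut and after second_cut."""
    return second_cut - first_cut


def light_chain_length(signal_peptide_length, first_cut):
    """Residues from the first residue after the signal peptide through first_cut."""
    return first_cut - signal_peptide_length


linker = linker_length(*CUT_SITES)
light = light_chain_length(SIGNAL_PEPTIDE_LENGTH, CUT_SITES[0])
light_mass_kda = light * AVG_RESIDUE_MASS_KDA

print(f"linker: {linker} residues")
print(f"light chain: {light} residues, roughly {light_mass_kda:.1f} kDa")
```

Under these assumptions the linker spans residues 138 to 144, matching the 7-amino acid linker described in the text, and the estimated light chain mass is of the same order as the 12 kDa band observed on SDS-PAGE.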
Figure 6. Effect of pH on activity and autolysis of globupain. (A) pH optimum resolved by assaying with the substrate Boc-QAR-AMC in buffers ranging from pH 2 to pH 8. Enzyme activity is shown as Vmax for the different pH values. (B) Time-dependent loss of enzyme activity at pH 5.5 and 7.1. Activity is shown as relative percent with standard deviations based on RFU measurements. (C) SDS-PAGE gel shows intact globupain after incubation at pH 5.5, whereas at pH 7.1 (D) autolysis is observed, explaining the loss of activity in (B).

For clostripain, a linker peptide is removed by cleavage at two Arg sites (Gilles et al., 1979; Dargatz et al., 1993). When activated, globupain was able to activate in-trans, which implies that the cut sites (K137 and K144) are exposed to proteolytic cleavage by neighboring proteases. This kind of activation is known to occur for several C11 enzymes such as thetapain, fragipain (Herrou et al., 2016), and distapain (González-Páez et al., 2019) and contrasts with PmC11, which activates only in-cis (Roncase et al., 2017). Globupain showed maximum activity at pH 7.1. This value is in the same range as the known pH optima of PmC11 (pH 8.0), clostripain (pH 7.4-7.8), and thetapain (pH 7.4) (Ogle and Tytell, 1953; Mitchell and Harrington, 1968; Roncase et al., 2017, 2019). However, globupain showed an optimum temperature of 75°C and matures into a heat tolerant enzyme, which allows it to function in its thermal environment (Dahle et al., 2015). The observed thermal properties are in line with the growth characteristics of cultivated species within the genus Archaeoglobus (Stetter, 1988; Burggraf et al., 1990; Huber et al., 1997; Mori et al., 2008; Steinsbu et al., 2010; Slobodkina et al., 2021) and with enzymes characterized previously. Moreover, in comparison to well-characterized, industrially relevant marine thermostable proteases, the thermal tolerance of globupain is superior to that of proteases sourced from marine Bacillus species and in the same range as that of proteases from (hyper)thermophilic archaea (Barzkar et al., 2018). Active clostripain-like proteases have been identified in marine sediment archaea (Lloyd et al., 2013). However, the highest sequence similarity scores of globupain in the MEROPS-MPRO database (Rawlings et al., 2016) were to C11 proteases that originate from bacteria such as C. histolyticum, P. distasonis, P. merdae, and B. thetaiotaomicron, which have been found in the human intestinal microbiota (Salyers, 1984; Johnson et al., 1986; Franks et al., 1998).
Some of these bacteria have been reported to cause disease and/or affect human health and have therefore been studied to a greater extent (Salyers, 1984; Parracho et al., 2005; McLuskey et al., 2016; Roncase et al., 2017, 2019; Ezeji et al., 2021). This finding highlights the significance of acquiring greater knowledge of marine C11 proteases. Notably, all C11 proteases, including globupain, show a conserved His/Cys catalytic dyad by sequence alignment. Moreover, the catalytic residues were also conserved in the globupain model obtained with AlphaFold (Jumper et al., 2021). Finally, site-directed mutagenesis showed experimentally that H132 and C185 in globupain were critical for activation and activity. When assayed against several AMC substrates, the enzyme showed a clear preference for the substrate Boc-QAR-AMC. A preference for hydrolyzing bonds with Arg in the P1 position is a known trait of C11 members (Ogle and Tytell, 1953; Labrou and Rigden, 2004). Globupain showed much lower activity against the Ac-VLTK-AMC substrate, which both PmC11 and thetapain hydrolyze efficiently (Roncase et al., 2017). This observation indicates that substrate specificity may vary substantially between different C11 proteases despite sequence and structural similarities around the active site. In conclusion, the temperature tolerance and catalytic properties revealed here make globupain a promising protease for diverse industrial and biotechnology sectors. Further studies focused on in-depth knowledge of the substrate specificity (O'Donoghue et al., 2012; Rohweder et al., 2023), the effects of protease inhibitors, and resistance to organic solvents and chemical denaturants may provide a deeper understanding of the applicability of globupain.

Author contributions
VR, AO'D, and IHS conceived the study. VR, BH, and IHS wrote the manuscript. VR, BH, A-KK, SD, A-EF, HA, MSMS, SM, OW, TK, and RS performed the experiments. All authors contributed to the article and approved the submitted version.