A Case Study of the Glycoside Hydrolase Enzyme Mechanism Using an Automated QM-Cluster Model Building Toolkit

Glycoside hydrolase enzymes are important for hydrolyzing the β-1,4 glycosidic bond in polysaccharides for deconstruction of carbohydrates. The two-step retaining reaction mechanism of Glycoside Hydrolase Family 7 (GH7) was explored with different sized QM-cluster models built by the Residue Interaction Network ResidUe Selector (RINRUS) software using both the wild-type protein and its E217Q mutant. The first step is the glycosylation, in which the acidic residue 217 donates a proton to the glycosidic oxygen leading to bond cleavage. In the subsequent deglycosylation step, one water molecule migrates into the active site and attacks the anomeric carbon. Residue interaction-based QM-cluster models lead to reliable structural and energetic results for proposed glycoside hydrolase mechanisms. The free energies of activation for glycosylation in the largest QM-cluster models were predicted to be 19.5 and 31.4 kcal mol−1 for the wild-type protein and its E217Q mutant, which agree with experimental trends that mutation of the acidic residue Glu217 to Gln will slow down the reaction; and are higher in free energy than the deglycosylation transition states (13.8 and 25.5 kcal mol−1 for the wild-type protein and its mutant, respectively). For the mutated protein, glycosylation led to a low-energy product. This thermodynamic sink may correspond to the intermediate state which was isolated in the X-ray crystal structure. Hence, the glycosylation is validated to be the rate-limiting step in both the wild-type and mutated enzyme.


INTRODUCTION
The organic biopolymer cellulose, which acts as a large source of carbohydrates, is found abundantly in the cell wall of green plants (Updegraff, 1969;Klemm et al., 2005). Cellulose consists of hundreds to thousands of β-1,4 linked glucose units, which can be cleaved to form shorter chains by degradation processes. Glycoside hydrolase (GH) enzymes, as one of many enzyme classes that can modify, decompose, and assemble carbohydrates in nature (Cantarel et al., 2009), are responsible for breaking down the glycosidic β-1,4 linkages between C 1 -O 4 via a hydrolysis reaction. Two mechanisms have been proposed for the reaction, "retaining" and "inverting" (Koshland, 1953). Utilized by many GH families (Cantarel et al., 2009), the retaining mechanism consists of two steps (Sulzenbacher et al., 1997;Davies et al., 1998;Mackenzie et al., 1998). In step one, a proton will be transferred from an acidic residue to the glycosidic oxygen (O 4 ), coupling with a nucleophilic attack of the anomeric carbon of the carbohydrate (C 1 ) forming a glycosyl-enzyme intermediate (GEI). In step two, a water molecule attacks the anomeric carbon, breaking the GEI bond and transferring a proton back to the acidic residue. These two steps are also respectively called glycosylation and deglycosylation. In the less-common inverting mechanism, proton transfer causes the cleavage of the glycosidic bond to take place via nucleophilic attack by water (Koshland, 1953). The focus of this study will be exploring the retaining mechanism, which is utilized by many GHs, specifically GH Family 7 (GH7). Note that a protein in the GH7 family can be interchangeably called Cellobiohydrolase 7A (Cel7A), Exocellobiohydrolase I (CBHI), and 1,4-β-cellobiohydrolase, though the recommended name is Exoglucanase 1 to adhere to the Enzyme Commission number EC:3.2.1.91.
The glycosyl substrate distortion in the hydrolysis of glycosides has been studied extensively (Vocadlo and Davies, 2008;Mayes et al., 2014). The IUPAC conventions for labelling ring conformations are used throughout (Conformational Nomenclature for Five and Six-Membered Ring Forms of Monosaccharides andTheir Derivatives. Recommendations 1980, IUPAC-IUB, 1980). Large conformational change of the −1 glycose unit from the half-chair, envelope, or boat-like conformation to the 4 C 1 chair conformation is typically observed in the glycosylation step (Vocadlo and Davies, 2008). This was also observed in a recent study by Beckham, Ståhlberg, Withers, Götz, et al. (Knott et al., 2014b), that the −1 glucose unit adopts a 4 E envelope conformation in the Michaelis complex of the H. jecorina Cel7A (HjeCel7A) E217Q mutated protein and an intact cellononaose chain (PDB: 4C4C) and a 4 C 1 chair conformation in the GEI; which was further confirmed in molecular dynamics (MD) simulations using the theoretical model of the wild-type enzyme (PDB: 8CEL, a Michaelis complex of the protein and 10 glucosyl units). The MD simulations suggested that the glucose unit changed from a half-chair ( 4 H 5 ) in the Michaelis complex to a half-chair ( 4 H 3 ) at the transition state, then to a stable chair ( 4 C 1 ) conformation in the glycosylation product state (Knott et al., 2014b). The transition state geometries were also found to be able to adopt the 4 H 3 and 3 H 4 half-chair and 2,5 B and B 2,5 boat conformations (Cremer and Pople, 1975;Davies et al., 2003;Vocadlo and Davies, 2008;Satoh and Manabe, 2013).
In an early mutation study of TrCel7A, the catalytic activities (k cat ) were measured to be 12.8 ± 0.2, 0.15 ± 0.05, 0.035 ± 0.005, and 0.0063 ± 0.0005 min −1 at pH 5.7 and 37°C for the wild-type, the D214E, E212Q and E217Q mutated proteins, respectively. These values correspond to a free energy of activation of 19.1 kcal mol −1 for the wild-type protein and 23.8 kcal mol −1 for the E217Q mutated protein (Ståhlberg et al., 1996), indicating the E217Q mutation slowed down the reaction. An experimental study using high-speed atomic force microscopy found that the rate of processive cellobiose hydrolysis by HjeCel7A on a crystalline cellulose surface was 7.1 ± 3.9 s −1 at 30°C (16.3-17.1 kcal mol −1 ) (Igarashi et al., 2011). In a more recent study comparing another GH7 from the fungus Penicillium funiculosum with TrCel7A, the k cat for the E217Q mutated protein was reported to be 14.26 ± 1.05 min −1 at pH = 4 and 55°C (20.2-20.3 kcal mol −1 ) (Taylor et al., 2018). This rate constant is also slower than that of the wild-type protein.
In the study by Beckham, Ståhlberg, Withers, Götz, et al. (Knott et al., 2014b), the mixed quantum mechanics/molecular mechanics (QM/MM) and transition path sampling method was applied to study the full retaining catalytic mechanism starting from the Michaelis complex of theoretical model PDB: 8CEL and 10 glucosyl units (+2 to −7, Glc452 to Glc460) at 300 K and 1.0 bar (Knott et al., 2014b). The simulation results indicated that in the glycosylation step, the nucleophile Glu212, which first hydrogen bonds with Asp214, attacked the anomeric carbon C 1 on the −1 position (Glc454) of glucopyranose, breaking the C 1 -O 4 bond. A proton on the acidic residue Glu217 transferred to O 4 on the +1 position (Glc453) of glucopyranose. In the deglycosylation step, one water enters the active site, undergoes nucleophilic attack on the −1 glycosyl unit bonding to the C 1 , breaks the bond between the glucopyranose and Glu212 oxygen, and lastly donates its proton to Glu217. The QM/MM computations predicted that the glycosylation was the rate limiting step with a free energy of activation of 15.5 kcal mol −1 , and the intermediate structure was 2.5 kcal mol −1 lower in energy than the initial reactant. For deglycosylation, the free energy of activation of the elementary step was 11.6 kcal mol −1 , and the final product was 2.1 kcal mol −1 lower in free energy than the initial reactant.
The glycosylation of GH7 (PDB: 8CEL) was later studied by the Wang group using both a pure quantum mechanics (QM) and QM/MM approach (Li et al., 2010). The activation free energy was computed to be 14.1 kcal mol −1 using a density functional theory (DFT)-based QM-cluster model at the B3LYP/6-31G(d, p) level of theory. The QM-cluster model was very small, only including residues E212, E217, +1 and −1Glc. A higher free energy glycosylation TS (23.9 kcal mol −1 ) was found when using QM/MM in the same study (Li et al., 2010). The relatively high QM/MM glycosylation barrier was explained as arising from the catalytic residues frozen in the MM-region, and this indicated that the active site environment has an important impact on the kinetics (Li et al., 2010). In the same study, deglycosylation was examined by adding one water molecule to the QM-cluster model and a much higher free energy of activation, 32.9 kcal mol −1 at B3LYP/6-31G(d,p) level of theory, was computed for the glycosylation (Li et al., 2010). The results of Wang and coauthors suggest that the QM-region must be appropriately defined to obtain good agreement between experiment and theory for the GH7 enzyme.
Most early computational studies ignored deglycosylation, instead focusing on the rate-limiting glycosylation step. One full catalytic cycle was explored using a small model QMcluster model (two residues and the substrate) built from the Escherichia coli β-galactosidase-galactopyranoside complex system with various DFT functionals and basis sets and found that the glycosylation activation energies ranged from 21.1 to 28.7 kcal mol −1 (Brás et al., 2008). Mutational effects were also studied with QM/MD simulation which predicted the glycosylation barriers to be 32.6 kcal mol −1 for wild-type TrCel7A (Yan et al., 2011). The following nucleophilic attack step had barriers of 0.4 kcal mol −1 for the elementary step (Yan et al., 2011). The glycosylation free energy barrier was computed to be 17.5 kcal mol −1 in a more recent DFTB/MM study on CBHI (Barnett et al., 2011), and 13.4 kcal mol −1 in a QM/MM metadynamics study on GH27 (Pan et al., 2013). Clearly, kinetics of various enzymes in the GHs family show variability, and the relationship between computational accuracy and partitioning of enzyme QM-regions needs to be explored further.
The workflow of the RINRUS (Residue Interaction Networkbased ResidUe Selector) toolkit provides a standard QM-cluster model construction procedure, with embedded reproducibility for system comparison and benchmarking. The RINRUS software has been developed by our lab to rationally select and rank importance of biological fragments (amino acid residues, cofactors, solvent) for inclusion in QM-cluster models and write input files to set up QM calculations. Ranking of residue importance can be obtained through distance metrics, but also more importantly via qualitative topological features of the residue interaction network (Shannon et al., 2003;del Sol et al., 2006;Csermely, 2008;Vishveshwara et al., 2009;Doncheva et al., 2011;Di Paola et al., 2013) or from firstprinciples interaction energies computed via Symmetry Adapted Perturbation Theory (Summers et al., 2019). RINRUS has been applied to many other enzyme case studies which provided valuable results (DeYonker and Webster, 2013;DeYonker and Webster, 2015;Summers et al., 2018;Cheng and DeYonker, 2020;Summers et al., 2021;Cheng and DeYonker, 2021). It can currently interface with several quantum chemistry packages (such as Gaussian, Q-Chem, PSI4, and xtb). RINRUS is open source, but not yet publicly available. It is available by request from the authors and will be shared with the community once the web-based interface is refined and a peer-reviewed introduction to the software (currently in preparation) is published. Until public release of RINRUS is announced in the literature, a thorough description of its capabilities has been described in our work on calibrating QMcluster models of catechol-O-methyltransferase (Summers et al., 2021).
In this study, we use RINRUS built QM-cluster models to study the full catalytic cycle of the retaining mechanism of cellulose hydrolysis based on the wild-type protein and its E217Q mutant. The proposed mechanistic detail and energetics will provide more insights for enzyme engineering of effective catalytic modifications.

COMPUTATIONAL METHODS
Both a theoretical model of an exoglucanase 1 in complex with a cellulose nanomer (PDB ID: 8CEL) and the X-ray crystal structure of Hypocrea jecorina exoglucanase 1 E217Q in complex with a cellononaose (PDB ID: 4C4C, GH7) were used to construct the models for QM-cluster computations. Hydrogen atoms were added to backbone and side chain heavy atoms of the enzyme using the reduce program (Word et al., 1999b). Hydrogen atoms were also manually added to the cellononaose. The active site histidine residue (His228) was doubly protonated (X-ray crystal data for 4C4C is recorded at pH = 6). Next, the protonated PDB files were processed by probe (Word et al., 1999a) to generate the Residue Interaction Networks (RINs) based on atom contact information. In the RIN of 8CEL, 23 residues (including the Glc452, Glc453, Glc454, Glc455 as the +2, +1, −1, and −2 glucosyl units, 16 residues, and three waters) were identified by RINRUS that interact with the +1 and −1 glucosyl units (named Glc+1 and Glc−1, used as the RINRUS "seed" of our models), and 17 residues among the 23 have a high interaction score (more than 50 contact dot counts). Next, 22 residues (including the Bgc1, Bgc2, Bgc3 and Bgc4 as the +2, +1, −1, −2 glucosyl units, 14 residues, and four waters) were identified in the RIN of 4C4C, among which 17 fragments have a >50 interaction count with the seed. The catalytic triad residues Glu212, Asp214 and Glu217/ GLN217 in 8CEL and 4C4C which were included in the QM region in the study by Beckham, Ståhlberg, Withers, Götz, et al. (Knott et al., 2014b), were all identified by RINRUS as they have high contact dot counts with the −1 or +1 glucosyl units. Glu212 is the nucleophile and is positioned closely to the −1 glucose with a C 1 (Glc454)-O(Glu212) bond distance of 3.44 Å in 8CEL and 3.40 Å in 4C4C. Glu217 and Asp214 were protonated to be neutral residues similar to the procedure in reference 13, Glu217 is the acidic residue, mutated to a protonated Gln in PDB:4C4C, that donates a proton to the +1 glucosyl unit, while Asp214 hydrogen bonds to Glu212 to support the catalytic reaction. All three residues exhibit multiple conformations in the X-ray crystal structure. In Conformation A of the PDB file, Gln217 is oriented towards the glycosidic bond (similar to Glu217 in 8CEL), therefore all the QM-cluster models were built based on this conformation.
Considering the glycosylation cleavage of the O 4 -C 1 bond connecting the −1 and +1 glucosyl units, only the O 4 atom in the glucosyl −2 unit was kept and protonated as a hydroxyl group in the model. The +2 unit was completely removed, and one hydrogen atom was added to the O 4 atom of the −1 glucosyl unit. Hence, QM-cluster models of 16 residues (227 atoms, 12 C α and 10 C β atoms were kept frozen, named Res16-E) and 21 residues (the "maximal model" where all residues with nonzero contact dot counts with seed are included, 305 atoms, 19 C α and 14 C β atoms were kept frozen, named Res21-E) based on the 8CEL RIN were constructed. Frozen atoms in the QMcluster models are provided in Table 1. A model from the same 16 residue set using PDB:4C4C was built with 228 atoms (same set of frozen atoms, named Res16-Q) and a "maximal model" that included all 20 residues identified by the RIN as having interactions with the seed (275 atoms, 14 C α and nine C β atoms were kept frozen, named Res20-Q). Residues Thr246 and Arg251 were included in the RIN of 8CEL but not 4C4C. In order to have some comparable analysis, two larger models were built by adding these residues to the 4C4C-based models (named Res21-Q which includes Arg251, and Res23-Q which includes Thr246, Arg251, and one additional water). The aligned Res21-E and Res23-Q 3D models are shown in Figure 1. The 2D structure of the Res21-E and Res23-Q model along with frozen atom information are shown in Figures 2A,B. The atomic labeling for the glycosyl units is shown in Figures 2C,D.
All quantum mechanical cluster model computations were performed using the Gaussian16 program (Frisch et al., 2016). Density functional theory (DFT) with the hybrid B3LYP exchange-correlation functional (Lee et al., 1988;Becke, 1993) ″Cα″ indicates frozen alpha-carbon atoms for that residue, ″Cα Cβ″ indicates frozen alpha-and beta-carbons for that residue, ″X″ indicates the specified non-amino acid residue fragment is included in the model. Residues ID#s written in italics were not identified in the RIN as interacting with the seed, but were added as bridging residues connecting two discrete residues. was employed with the 6-31G(d′) basis set for N, O, and S atoms (Hariharan and Pople, 1973;Petersson and Al-Laham, 1991;Foresman and Frisch, 1996), and the 6-31G basis sets for C and H atoms (Hehre et al., 1972). QM-cluster models incorporated the Grimme D3 (Becke-Johnson) dispersion correction (GD3BJ) (Grimme et al., 2010;Grimme et al., 2011) and implicit solvation via the conductor-like polarizable continuum model (CPCM) (Barone and Cossi, 1998;Cossi et al., 2003) using universal force field (UFF) atomic radii, a nondefault electrostatic scaling factor of 1.2, and the default parameters for water with an attenuated dielectric constant of ε = 4. This dielectric constant value has been previously determined as appropriate for simulating the less-polarized environment within an enzyme active site (Siegbahn and Blomberg, 2000;Blomberg et al., 2014). Unscaled harmonic vibrational frequency calculations were used to identify all stationary points as either minima (no imaginary frequencies) or transition states (TSs, only one imaginary frequency). TSs were located first for each elementary step of the proposed mechanism; the reactants and products were then located by following the intrinsic reaction coordinate (IRC) (Fukui, 1970;Fukui, 1981). It is important to note that our group uses the "freeze code" scheme

RESULTS AND DISCUSSION
The labeling scheme ResV-W(X)-YZ is used for the QM-cluster model illustrating the reaction mechanism, where V = # of residues in the model (16, 20, 21 or 23); W = E for residue E217 in the wild-type protein (8CEL) or Q for residue Q217 in the mutated protein (4C4C); X indicates when multiple  Step 1. Glycosylation The representative reactants, transition states (TS), and products of the glycosylation reaction step are shown in Figures 3, 4, which use the maximal model Res21-E (3D and 2D representation in Figure 3) and Res23-Q (3D and 2D representation in Figure 4) generated from the wild-type protein theoretical model (8CEL) and its E212Q mutant X-ray crystal structure (4C4C), respectively. The reactant is a Michaelis complex and the glycosidic bond (O 4 -C 1 ) between Glc+1 and Glc−1 is intact. Glu217/Gln217 is protonated, and the proton is hydrogen bonding with O 4 of Glc+1. In the TS, the proton of Glu217/Gln217 is transferred between the O 4 of Glc+1 and sidechain oxygen atom of Glu217 or nitrogen atom of Glu217. While the glycosidic bond is elongated and the Glc−1 ring distorts from a 4 E envelope conformation in the reactant (R1) to a 4 H 3 half chair conformation (in TS1), the anomeric C 1 shifts close to Glu212 ( Figure 5). Sidechain rotations are observed in Glu212, Asp214 and Glu217/Gln217 as well as sidechain shifts in the aromatic residues Tyr145 and Trp367 close to Glc−1 and Trp376 above Glc+1. In the product, the glycosidic bond is broken as the proton is transferred to the glycosidic O 4 , while C 1 migrates closer to Glu212 and the Glc−1 glycosyl ring distorts to a 4 C 1 chair-like conformation (in P1). In this step, the Glc+1 glycosyl ring migrates away from Glc−1 without large conformational change. The geometric parameters for the optimized reactants, TSs, and products for different models can be found in supporting information Supplementary Table S1.
The free energy diagram for glycosylation is shown in Figure 6. The free energies of activation using Res16-E and Res21-E models are computed to be 15.5 and 19.5 kcal mol −1 . Similarly to the earlier computational study using the QM/  MM+potential of mean force free energy sampling (15.5 kcal mol −1 ), (Knott et al., 2014b), our free energy of activation in glycosylation using the smaller model is close to the experimentally measured kinetics. The result using the larger model is in excellent agreement with the activation free energy (19.11-19.13 kcal mol −1 ) derived from the experimentally measured rate constants [k cat = 12.8(±0.2) min −1 ] at 37°C (Ståhlberg et al., 1996). Our two models of 8CEL lead to glycosylation products that are 10.8 and 7.0 kcal mol −1 higher in free energy than their corresponding reactants. Using the QMcluster models based on the E217Q mutated 4C4C X-ray crystal structure, the activation free energies of glycosylation are computed to be 28.2, 34.6, and 30.4 kcal mol −1 for Res16-Q, Res20-Q, and Res21-Q, respectively; and 31.4 kcal mol −1 using the maximal RINRUS model Res23-Q, which is 7 kcal mol −1 higher than the free energy (22.68-22.85 kcal mol −1 ) converted from the experimentally measured rate constants [k cat = 3.5(±0.5) × 10 -2 min −1 ] (Ståhlberg et al., 1996). Addition of a water molecule plus Thr246 and Arg251 residues causes the Glc+1 ring to shift less from Glc−1 (C 1 -O 4 bond length is 2.31 Å in Res23-Q-TS1 versus 2.91 Å in Res20-Q-TS1). All models of the E217Q mutated protein led to glycosylation products that are higher in free energy compared to their corresponding reactant. Clearly, the E217Q mutation is much less kinetically or thermodynamically favored for glycosylation. This nicely corresponds to the hypothesis that the E217Q should destabilize the glycosylation transition state. The predicted free energy of activation for the glycosylation step of the mutated protein is concerningly higher than the experimentally measured kinetics. However, our QM-cluster model kinetics are still more qualitatively reasonable than some previous Cel7A QM/MM-based models. Disagreement between experiment and theory for the E217Q glycosylation activation free energy may be rationalized by frozen atoms exacerbating the already poor proton donating ability of the Gln217 residue. Alternatively, we may not have exhaustively explored glycosyl ring distortions that may provide further transition state stabilization. Lastly, the somewhat coarse level of electronic structure theory used in our QM-cluster models will always cause some deviation from experimental kinetic and thermodynamic values.
Step 2. Deglycosylation One explicit water molecule is added to the active site models for the deglycosylation path, so that the deglycosylation reactant can be connected to the glycosylation product. The representative reactants, transition states (TS), and products of the deglycosylation reaction step are shown in Figures 7, 8, which use models Res21-E (3D and 2D representation in Figure 7) and Res23-Q (3D and 2D representation in Figure 8). In the reactant (R2), the water positions its oxygen close to the C 1 of Glc−1, but opposite to Glu212. While one of the hydrogen atoms is close to Glu217-O/ Gln217-N, the other hydrogen bonds to the glycosidic O 4 of Glc+1 and pushes the Glc+1 further away from Glc−1 compared to the glycosylation product. In the TS2, Glc−1 shifts slightly towards the water but further away from Glu212, while the water shifts slightly coupled with sidechain rotation in Glu217/ Gln217. A proton transfer from the new water molecule couples with the water nucleophilic attack C 1 of Glc−1 in the TS2. The proton from the water transfers to Glu217-O/Gln217-N, and the OH group bonds to C 1 of Glc−1, leading to the glycosyl unit migrating further away from Glu212. The glycosyl unit Glc−1 again undergoes a large conformational change from 4 C 1 to 4 H 3 to 4 E (Figure 9 from R2 to TS2 to P2), which is opposite from the glycosylation process. Most of the residues of the reactant, TS, and product in deglycosylation are structurally unperturbed, while large geometric changes are seen in Trp376 and Trp367, which are near the glycosyl units, and Tyr145, which is close to the Glc−1 glycosyl unit. The geometric parameters for the optimized reactants, TSs, and products of deglycosylation for different QM-cluster models can be found in Supplementary  Table S2. The free energy diagram of deglycosylation is also shown in Figure 6 and Supplementary Table S3. The deglycosylation reactant (R2) is lower in energy than the glycosylation reactant in Res16-E model, but higher than the glycosylation reactant in other models. The deglycosylation free energies of activation using the Res16-E and Res21-E models are computed to be 23.9 and 13.8 kcal mol −1 (14.1 and 1.9 kcal mol −1 for this elementary step) and the corresponding products are found to be 3.4 kcal mol −1 higher and 0.9 kcal mol −1 lower in free energy than the glycosylation reactant (R1). The free energies of activation for deglycosylation of the mutated models Res16-QA, Res20-QA, Res21-QA and Res23-QA are 1.5, 1.7, 0.7, and 1.3 kcal mol −1 for the elementary step, but much higher in overall free energy of activation (34.9, 30.0, 25.5 and 31.0 kcal mol −1 ). Similar to Res21-EA model, whose deglycosylation reactant is 5.0 kcal mol −1 higher than the glycosylation product plus one water, all E217Q mutated models have deglycosylation reactants 20.9, 4.9, 6.4, and 11.5 kcal mol −1 higher in free energy than glycosylation products plus water. Comparing both maximal models, the larger energy difference between the glycosylation product and deglycosylation reactant seen in the mutated model Res23-QA may explain the observation of a GEI in the X-ray crystal structure (PDB 4C4C).
Overall, in the RINRUS models of the wild-type protein, the glycosylation transition state is found to be the rate limiting step in our proposed mechanism. While the deglycosylation step in the mutated enzyme has a very high free energy of activation and is competitive with glycosylation, it is still slightly lower in free energy than the TS of the glycosylation step. The glycosylation of the mutant is slower than the wild-type enzyme, which agrees well with experimental observation.  For the glycosylation step, using the RINRUS-derived models (Res16-E and Res21-E) of the theoretical protein 8CEL enabled us to explore different conformations. Conformational differences (Supplementary Figure S1) are seen in the sidechains of the aromatic residues Tyr145, His228, Trp367, and Trp376 which surround the glycosyl units, Asp214 which hydrogen bonds to the important residue Glu212, and Asp173 and water molecules which form hydrogen bonds with the glycosyl units. The Glu217 sidechain is also quite flexible and can adopt multiple conformations in our QM-cluster models. The free energy diagram of glycosylation using different conformers is shown in Figure 10. Note that the Res16-EA and Res21-EA models discussed here are shown in Figure 6 and previously discussed. Both Res16-EB-R1 and Res21-EB-R1 are slightly higher in energy than Res16-EA-R1 and Res21-EA-R1 but lead to a higher energy TS1, which indicates that the glycosylation will be unlikely to take place via conformation B. The free energy of activation for models Res16-EC is slightly higher than that of Res16-EA, and reactant Res16-EC-R1 is 6.5 kcal mol −1 higher in free energy than Res16-EA-R1, which indicates that the glycosylation reaction will be more likely to occur via conformation A. While for the maximal model, even though Res21-EC-R1 is 4.3 kcal mol −1 higher in energy than Res21-EA-R1, the Res21-EC model has a lower free glycosylation activation energy than that of Res21-EA. Hence, the deglycosylation was examined and a much higher energy TS was located which indicates that reaction will be unlikely to take place via conformation C.
Three conformations of models Res20-Q and Res21-Q based on the mutated protein are found in the glycosylation step and the free energy diagram is shown in Figure 11. The glycosylation reactants in the three conformations have different Trp367 and Trp376 sidechain orientations (Supplementary Figures S2B,C); and the reactants (R1) in conformation B and C are less than 4.5 kcal mol −1 higher in free energy than the reactants in conformation A. However, the activation free energies of glycosylation of conformation B (28.2 and 26.5 kcal mol −1 for Res20-QB and Res21-QB) are lower than those in conformation A, but corresponding free energies of activation for deglycosylation were computed to be 40.5 and 43.3 kcal mol −1 for Res20-QB and Res21-QB, respectively, which indicates that even if conformation B of the mutated protein is more kinetically favorable for glycosylation, the free energies of activation for deglycosylation are too high to be catalytically viable. These large energy differences in the two steps could be caused by one water shifting to a position close to Glc+1, which blocked Glc+1 from moving away from Glc−1. This water reorientation makes it difficult for the additional water to attack the anomeric carbon. In both Res20-QC and Res21-QC models, the free energies of TS1s are competitive compared to conformer A, but Res20-QC-R1 and Res21-QC-R1 are 1.7 and 0.2 kcal mol −1 higher than Res20-QA-R1 and Res21-QA-R1 and the glycosylation products (P1) of both conformer C are higher in energy than those of conformer A, which may lead to a similar deglycosylation reactant in conformer A.
Multiple conformations of Res16-Q and Res23-Q are located for both glycosylation and deglycosylation. The free energy diagrams of both models are shown in Figure 12. The reactants, TS, and products in glycosylation step are very similar in Res16-QA and Res16-QB models; only small differences are seen in Trp367, Trp376 and waters. Both  conformers have very high energy deglycosylation TSs, indicating that large QM-cluster models are needed to study the reactions of GHs. Three conformations of the maximal model are examined. Like Res20-Q and Res21-Q, the TSs in the three additional conformations have almost the same free energy, even though the reactants of Res23-QB and Res23-QC are 6.4 and 3.5 kcal mol −1 higher than Res23-QA-R1. For conformer B the deglycosylation has a very high energy TS2, indicating the reaction is unlikely to take place via this conformation. The product of conformer C is also higher in free energy than that of conformer A. Overall, conformation A is the most energetically favored.

CONCLUSION
Cellulose hydrolysis by the cellobiohydrolase 7A (also known as CBHI) enzyme via a two-step retaining process is elucidated by two QM-cluster models built on the wild-type theoretical protein structure PDB:8CEL and four models built on the X-ray crystal structure PDB:4C4C in which Glu217 is mutated to Gln. In the glycosylation step, a proton from residue 217 is transferred to the glycosyl unit coupled with the glycosidic bond cleavage. Large conformational change is observed in the Glc−1 unit in both the wild-type and mutated enzyme models. The catalytic trio Glu212, Asp214 and Glu217/Gln217 are important for the reaction. Mutation of Glu to Gln leads to a weaker proton donor that increases the free energy of activation in both glycosylation and deglycosylation. These three residues are also quite flexible as multiple sidechain orientations are seen in the X-ray crystal structure as well as in the multiple pathways proposed in our study. In the deglycosylation step, one explicit water molecule is added to the models and the free energy of activation is lower than those in the glycosylation step in the wild-type protein, indicating glycosylation is the rate-limiting step. The deglycosylation step of the mutated protein model becomes more competitive kinetically. Large QM-cluster models (Res21-EA, Res20-QA, Res21-QA, and Res23-QA) lead to a low energy deglycosylation product, hence the full catalytic cycle is thermodynamically favored. The RINs based on 8CEL and 4C4C are slightly different. However, the fringe residues Arg251 and Thr246, which have fewer interaction counts with the glycosyl units are found to have minor effect on the kinetics of the proposed mechanisms. The free energy of activation of glycosylation is only lowered by 1.5 kcal mol −1 when adding Arg251 to Res20-QA (to form Res21-QA); the glycosylation activation free energy is lowered by 3.0 kcal mol −1 when adding Thr246 and one water to Res21-QA (to form Res23-QA). All RINRUS-built models provide free energies of activation for the glycosylation step which are in good agreement with experimental measurements of k cat values for the wild-type protein and its E217Q mutant. Enzymatic reaction studies based on the residue interaction network and QM-cluster models automatically generated by RINRUS can provide good structural and energetic results for better understanding of enzyme reaction mechanisms. The RINRUS workflow is inherently reproducible, which will be greatly beneficial to the computational enzymology community.
This is the first time that RINRUS has been used to explore thermodynamic and kinetic changes of an enzymatic reaction based on point mutation, and RINRUS provides results which are reasonably in good agreement with experimental observation and would be useful to apply to study other metal-free enzyme systems. Based on this study along with our previous QMcluster studies of different enzymatic reaction, large QMcluster models with typical 200-300 atoms provide better results than smaller sized models. RINRUS-built QM-cluster models should more accurately predict thermodynamic and kinetic results if conformational sampling is included, which may be necessary when glycosyl units are present in the active site.