# CHALLENGES IN COMPUTATIONAL ENZYMOLOGY

EDITED BY: Vicent Moliner and Fahmi Himo. PUBLISHED IN: Frontiers in Chemistry

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-272-5 DOI 10.3389/978-2-88963-272-5

### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# CHALLENGES IN COMPUTATIONAL ENZYMOLOGY

Topic Editors: Vicent Moliner, University of Jaume I, Spain; Fahmi Himo, Stockholm University, Sweden

Citation: Moliner, V., Himo, F., eds. (2019). Challenges in Computational Enzymology. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-272-5

# Table of Contents


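*Computational Studies on the Inhibitor Selectivity of Human JAMM Deubiquitinylases Rpn11 and CSN5*

Vikash Kumar, Michael Naumann and Matthias Stein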


Per E. M. Siegbahn and Margareta R. A. Blomberg


Kersti Caddell Haatveit, Marc Garcia-Borràs and Kendall N. Houk

*Theoretical Studies on Mechanism of Inactivation of Kanamycin A by 4′-O-Nucleotidyltransferase*

Sergio Martí, Agatha Bastida and Katarzyna Świderek


Santiago Alonso-Gil, Joan Coines, Isabelle André and Carme Rovira

*Benchmark of Density Functionals for the Calculation of the Redox Potential of Fe<sup>3+</sup>/Fe<sup>2+</sup> Within Protein Coordination Shells*

Risnita Vicky Listyarini, Diana Sofia Gesto, Pedro Paiva, Maria João Ramos and Pedro Alexandrino Fernandes

# Editorial: Challenges in Computational Enzymology

#### Vicente Moliner<sup>1</sup>\*† and Fahmi Himo<sup>2</sup>\*†

<sup>1</sup> Departamento de Química Física y Analítica, Universidad Jaume I, Castellón de la Plana, Spain, <sup>2</sup> Arrhenius Laboratory, Department of Organic Chemistry, Stockholm University, Stockholm, Sweden

Keywords: computational chemistry, enzyme catalysis, DFT, QM/MM, molecular modeling, molecular dynamics

### **Editorial on the Research Topic**

#### **Challenges in Computational Enzymology**

Living organisms utilize enzymes to accelerate and control chemical reactions, and the attraction of these biological catalysts lies not only in their extraordinary efficiency, but also in their selectivity and their ability to function under mild conditions. Because of their involvement in fundamental biological processes, enzymes are common targets for the development of drug molecules in the pharmaceutical industry. In addition, enzymes are increasingly employed today as synthetic tools for the industrial production of high-value chemicals. It is therefore of great importance to understand in depth how these fascinating molecular machines perform their reactions, both from a fundamental scientific point of view and for technical applications. In recent years, computational approaches have become an indispensable tool in this endeavor. A number of powerful methodologies have been developed that have enabled breakthroughs in the mechanistic understanding of enzymes. The exponential growth of computer power has, of course, been of key importance here. The field has also benefited greatly from fruitful collaborations between theoreticians and experimentalists, in which results derived from simulations are used to interpret experiments and to predict the behavior of natural and modified enzymes.

This Research Topic is concerned with the general theme of computational enzymology. It consists of 13 contributions covering various aspects of the field, spanning from benchmarking of methods to high-end applications and reviews of recent work.

In a review paper, Wei et al. summarized their work on the modeling of metalloenzymes with both the quantum chemical cluster approach and the hybrid quantum mechanics/molecular mechanics (QM/MM) method. The review discussed recent progress in the computational understanding of mechanisms and of various selectivities, such as chemo-, regio-, and stereoselectivity, in this important class of enzymes. In a more focused mini-review, Caddell Haatveit et al. described a computational protocol for modeling cytochrome P450 and exploring its reactivity and selectivity. The approach leads to predictions of enzyme variants that are more efficient and selective. In another mini-review, Rovaletti et al. discussed the modeling of an illustrative example of redox-active enzymes, namely Mo/Cu-dependent CO dehydrogenase. The challenges associated with both the DFT and the QM/MM approaches are highlighted, and promising future directions are outlined. In the paper by Siegbahn and Blomberg, problems associated with studying the mechanisms of redox processes in enzymes with DFT methods were discussed. An alternative systematic approach was proposed, in which the amount of exact exchange in the B3LYP functional is used as a parameter, rather than comparing a large number of different functionals. The limitations of DFT methods for studying catalytic reaction mechanisms involving Fe(III)/Fe(II) oxidation states were discussed in the paper by Listyarini et al. This benchmark study, based on QM and QM/MM computational methods, provides guidelines for estimating the inaccuracies arising from the density functionals and for choosing the most appropriate ones.

#### Edited by:

Thomas S. Hofer, University of Innsbruck, Austria

#### Reviewed by:

Pedro Alexandrino Fernandes, University of Porto, Portugal

#### \*Correspondence:

Vicente Moliner, moliner@uji.es; Fahmi Himo, fahmi.himo@su.se

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry

Received: 28 June 2019 Accepted: 07 October 2019 Published: 22 October 2019

#### Citation:

Moliner V and Himo F (2019) Editorial: Challenges in Computational Enzymology. Front. Chem. 7:690. doi: 10.3389/fchem.2019.00690


On the applications side, the Research Topic includes a number of very interesting papers. Marques et al. presented a molecular dynamics study, employing metadynamics and adaptive sampling techniques, to estimate the rates of unbinding of the product molecule from the enzyme haloalkane dehalogenase DhaA and its mutant DhaA31. The simulations provided insights into the energetic bottlenecks of the rate-limiting unbinding process, which is of great help for the design of improved biocatalysts. Alonso-Gil et al. unraveled the hydrolytic reaction mechanism of Neisseria polysaccharea amylosucrase (NpAS), a member of the GH13 family, using QM(DFT)/MM metadynamics simulations. The results provide an atomistic picture of the active-site reorganization along the catalytic double-displacement reaction, consistent with the general conformational itinerary observed for α-glucosidases. Kumar et al. investigated the binding modes of two inhibitors of two metalloproteases (the human JAMM deubiquitinylases Rpn11 and CSN5) using MD simulations and binding energy analysis, and found that larger heterodimeric protein-protein complexes had to be included in order to avoid unrealistic structural changes.

Alonso-Cotchico et al. combined force field-based techniques with a variety of trajectory convergence analyses to study the effect of cofactor binding on the conformational plasticity of the Lactococcal multidrug resistance regulator (LmrR) protein, a crucial aspect in the design of efficient artificial metalloenzymes. Martí et al. used QM/MM calculations to explore the mechanisms of the inactivation of kanamycin A catalyzed by 4′-O-nucleotidyltransferase. Free energy perturbation techniques were employed, and primary and secondary <sup>18</sup>O kinetic isotope effects were computed and compared with available experimental data to elucidate the detailed reaction mechanism. Romero-Téllez et al. used QM(DFT)/MM calculations to compare the glycosylation, hydrolysis, and transglycosylation steps catalyzed by wild-type Thermus thermophilus β-glycosidase. They showed how a molecular understanding of the similarities and differences between the hydrolysis and transglycosylation steps may help in the design of new biocatalysts for glycan synthesis. Prejanò et al. used QM and QM/MM calculations to describe the inhibition mechanism of hydrolyzed piperlongumine (hPL), an anticancer compound whose activity is related to the inhibition of human glutathione transferase of pi class (GSTP1). Finally, Timmins et al. studied the reaction mechanism of HctB, a non-heme iron halogenase, using a combination of MM, MD, QM/MM, and DFT techniques. The effect of the substrate position in the active site on halogenation vs. hydroxylation selectivity is discussed, and comparisons are made with other non-heme iron enzymes.

We believe that the new insights presented in these contributions will lead to advances in both fundamental understanding and practical applications, for example, in the design of new drug compounds and the development of new biocatalysts for the chemical industry. Finally, we wish to thank all the contributors for their excellent work and all the reviewers for their comments, which helped to improve the contents.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Moliner and Himo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Computational Studies on the Inhibitor Selectivity of Human JAMM Deubiquitinylases Rpn11 and CSN5

Vikash Kumar<sup>1,2</sup>, Michael Naumann<sup>1</sup> and Matthias Stein<sup>2</sup>\*

<sup>1</sup> Institute of Experimental and Internal Medicine, Medical Faculty, Otto von Guericke University, Magdeburg, Germany, <sup>2</sup> Molecular Simulations and Design Group, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany

Deubiquitinylases (DUBs) are highly specialized enzymes that are responsible for the removal of covalently attached ubiquitin(s) from targeted proteins. DUBs play an important role in maintaining protein homeodynamics. Recently, DUBs have emerged as novel therapeutic targets in cancer, inflammation, diabetes, and neurodegeneration. Among the different families of DUBs, the metalloprotease group, or JAB1/MOV34/MPR1 (JAMM) proteases, is unique in terms of its catalytic mechanism. JAMMs exhibit a Zn<sup>2+</sup>-dependent deubiquitinylase activity. Within the JAMM family, the deubiquitinylases Rpn11 and CSN5 are constituents of large macromolecular complexes, namely the 26S proteasome and the COP9 signalosome (CSN), respectively. Rpn11 and CSN5 are potential drug targets in cancer, and selective inhibitors of both proteins have been reported in the literature. However, the selectivity of JAMM inhibitors (capzimin for Rpn11 and CSN5i-3 for CSN5) has not yet been structurally resolved. In the present work, we have explored the binding modes of capzimin and CSN5i-3 and rationalized their selectivity for the Rpn11 and CSN5 targets. We found that capzimin interacts with the active-site Zn<sup>2+</sup> of Rpn11 in a bidentate manner and also interacts with residues in the distal ubiquitin binding site. MD simulation studies and binding energy analysis revealed that the selective binding of the inhibitors can only be explained by considering the larger heterodimeric complexes of Rpn11 (Rpn8-Rpn11) and CSN5 (CSN5-CSN6). Simulation of these protein-protein complexes is necessary to avoid unrealistically large conformational changes. The selective binding of inhibitors is mainly governed by residues in the distal ubiquitin binding site. This study demonstrates that the design of selective inhibitors for the Rpn11 and CSN5 JAMM proteases requires consideration of heterodimeric protein-protein target structures.

Keywords: computational drug design, deubiquitinase (DUB), selectivities, protein-protein interaction, molecular dynamics (MD), ligand binding

### INTRODUCTION

Ubiquitinylation is one of the major post-translational modifications of proteins. Ubiquitin is a 76-amino-acid protein that is covalently attached to lysine residues of the substrate by the consecutive action of three enzymes, i.e., activating (E1), conjugating (E2), and ligating (E3) enzymes (Nandi et al., 2006). Ubiquitinylated proteins participate in various cellular processes. Separate families of proteins, named deubiquitinylases, exist to remove the ubiquitinylation mark. In contrast to ubiquitinylation, deubiquitinylation requires the action of a single enzyme only (Reyes-Turcu et al., 2009). DUBs have been classified into five families, i.e., ubiquitin-specific proteases (USPs), ubiquitin C-terminal hydrolases (UCHs), ovarian tumor proteases (OTUs), Machado-Joseph disease proteases (MJDs), and JAB1/MOV34/MPR1 (JAMM) proteases (Komander et al., 2009; Mevissen and Komander, 2017). Except for the JAMM proteases, which have Zn<sup>2+</sup> in the catalytic site, all other DUBs are cysteine proteases. Members of the JAMM family share a common Zn<sup>2+</sup> binding motif containing three conserved residues (one aspartate and two histidines) (Berndt et al., 2002).

#### Edited by:

Fahmi Himo, Stockholm University, Sweden

#### Reviewed by:

Silvia Osuna, University of Girona, Spain; Etienne Derat, Université Pierre et Marie Curie, France

#### \*Correspondence:

Matthias Stein matthias.stein@mpi-magdeburg.mpg.de

#### Specialty section:

This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry

Received: 23 July 2018 Accepted: 20 September 2018 Published: 09 October 2018

#### Citation:

Kumar V, Naumann M and Stein M (2018) Computational Studies on the Inhibitor Selectivity of Human JAMM Deubiquitinylases Rpn11 and CSN5. Front. Chem. 6:480. doi: 10.3389/fchem.2018.00480

The human genome encodes 14 JAMM proteins, and only seven of them have the full set of conserved residues required for Zn<sup>2+</sup> binding (Shrestha et al., 2014). Six JAMM proteins have deubiquitinylase activity: AMSH, AMSH-LP, BRCC36, CSN5, MYSM1, and Rpn11 (also known as POH1 or PSMD14) (Shrestha et al., 2014). CSN5 is a component of the CSN and also possesses deneddylation activity (Lee et al., 2011; Echalier et al., 2013). Monomeric CSN5 shows no DUB activity and requires the presence of its catalytically inactive binding partner CSN6 to gain full activity (Echalier et al., 2013). Rpn11 is a component of the 19S regulatory particle of the 26S proteasome and displays only deubiquitinylase activity (Verma et al., 2002). Like CSN5, monomeric Rpn11 is not catalytically active (Yao and Cohen, 2002; Pathare et al., 2014).

Recently, Rpn11 has emerged as a potential drug target in human cancers (Li et al., 2017). Rpn11 is responsible for the deubiquitinylation of proteasomal substrates (Verma et al., 2002). Inhibition of Rpn11 has been reported to overcome bortezomib resistance and to induce apoptosis in multiple myeloma cells (Song et al., 2017). Capzimin is a potent and selective inhibitor of Rpn11 (**Figure 1A**), and it has been suggested that capzimin chelates the Zn<sup>2+</sup> ion in the active site of Rpn11 (Li et al., 2017; Perez et al., 2017). Capzimin shows an 80-fold selectivity for Rpn11 over CSN5 (Li et al., 2017). Similarly, CSN5i-3, a potent inhibitor of CSN5 (**Figure 1B**), shows a 10,000-fold selectivity for CSN5 over Rpn11. However, the structural basis of these inhibitor selectivities is not known. From a drug design perspective, it is important to rationalize the binding mode of capzimin and to identify the structural elements responsible for DUB selectivity. In the present study, we used a workflow of molecular docking, refinement and ligand-binding stability studies by molecular dynamics simulations, and binding energy calculations to investigate the structural basis of the selective inhibition of Rpn11 and CSN5.
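To put the reported fold selectivities on an energy scale comparable to computed binding energies, they can be converted into free-energy differences via ΔΔG = RT ln(fold). The sketch below assumes the selectivity ratios can be read as ratios of dissociation constants (IC50 ratios only approximate this) and uses T = 310 K, the simulation temperature; the function name is illustrative, not from the original work.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def selectivity_to_ddg(fold: float, temperature_k: float = 310.0) -> float:
    """Convert a fold-selectivity ratio into a free-energy difference (kcal/mol).

    ASSUMPTION: fold = Kd(off-target) / Kd(target), so ddG = RT * ln(fold).
    """
    if fold <= 0:
        raise ValueError("fold selectivity must be positive")
    return R_KCAL * temperature_k * math.log(fold)


for label, fold in [("capzimin, Rpn11 over CSN5", 80.0),
                    ("CSN5i-3, CSN5 over Rpn11", 10_000.0)]:
    print(f"{label}: {selectivity_to_ddg(fold):.2f} kcal/mol")
```

Under these assumptions, the 80-fold selectivity of capzimin corresponds to roughly 2.7 kcal/mol and the 10,000-fold selectivity of CSN5i-3 to roughly 5.7 kcal/mol, which gives a sense of how large an energetic discrimination the simulations need to reproduce.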

### MATERIALS AND METHODS

### Monomer and Heterodimeric Target Structures

The cryo-EM structure of the human 26S proteasome (PDB code: 5GJR) was obtained from the PDB (Huang et al., 2016). All chains except U (Rpn8 or PSMD7) and V (Rpn11 or PSMD14) were deleted. The Rpn8-Rpn11 heterodimer in its unprocessed form is shown in **Figure S1A**. We noticed that Rpn11 lacks a Zn<sup>2+</sup> ion in the catalytic site. Rpn8 and Rpn11 were processed separately. We deleted the C-terminal regions of Rpn8 (182–295) and Rpn11 (210–316), since they are not relevant for enzymatic catalysis. Using the UCSF Chimera (Pettersen et al., 2004) interface to MODELLER (Sali and Blundell, 1993; Webb and Sali, 2017), we modeled the missing regions in Rpn8 (143–151) and Rpn11 (164–189) as loops (**Figure 2A** and **Figure S1**). Because the Ins-1 loop (76–88) of Rpn11 was initially in a closed conformation, we generated 10 alternative conformations of the Ins-1 loop (**Figure S1B**) and selected one that points away from the catalytic site (**Figure 2A**). The final Rpn8-Rpn11 heterodimer model is shown in **Figure 3A** and **Figure S1C**. A Zn<sup>2+</sup> ion was transferred to the catalytic site of Rpn11 by aligning the Rpn11 structure with the crystal structure of CSN5. After transferring the Zn<sup>2+</sup> ion to Rpn11, we used both the Rpn11 monomer (**Figure 2A**) and the Rpn8-Rpn11 heterodimer (**Figure 3A**) structures for subsequent analysis.
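The Zn<sup>2+</sup> transfer step amounts to a least-squares superposition followed by applying the same rigid-body transform to the ion. The sketch below uses the Kabsch algorithm in NumPy; it assumes a set of matched anchor atoms (for example, equivalent Cα atoms around the Zn-binding motif) is available for both proteins, and it moves CSN5 onto Rpn11 so that Rpn11 keeps its own frame. That direction, the function names, and the toy coordinates are illustrative choices, not details from the original study.

```python
import numpy as np


def kabsch_transform(mobile: np.ndarray, reference: np.ndarray):
    """Least-squares rotation R and translation t mapping mobile onto reference.

    Both inputs are (N, 3) arrays of matched atoms. Row vectors are
    transformed as x @ R.T + t.
    """
    mob_c, ref_c = mobile.mean(axis=0), reference.mean(axis=0)
    h = (mobile - mob_c).T @ (reference - ref_c)  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # avoid improper rotations (reflections)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = ref_c - mob_c @ r.T
    return r, t


def transfer_ion(source_anchor, target_anchor, source_ion):
    """Place an ion from the source structure into the target structure's frame."""
    r, t = kabsch_transform(source_anchor, target_anchor)
    return source_ion @ r.T + t


# Hypothetical check: "CSN5" anchors are a rotated and shifted copy of "Rpn11" anchors
rng = np.random.default_rng(0)
rpn11_ca = rng.normal(scale=5.0, size=(6, 3))
theta = np.pi / 2
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
shift = np.array([10.0, -3.0, 2.0])
csn5_ca = rpn11_ca @ rot.T + shift
zn_site_rpn11 = rpn11_ca[:3].mean(axis=0)  # hypothetical "true" Zn position in Rpn11
zn_csn5 = zn_site_rpn11 @ rot.T + shift     # the same site expressed in the CSN5 frame
zn_placed = transfer_ion(csn5_ca, rpn11_ca, zn_csn5)
print(np.allclose(zn_placed, zn_site_rpn11))
```

In practice the anchor selection matters more than the algorithm: aligning only the conserved Zn-coordinating region places the ion more reliably than a global fit over flexible loops.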

For the CSN5 monomer (**Figure 2B**), the recently solved crystal structure of CSN5 in complex with CSN5i-3 (PDB code: 5JOG) (Schlierf et al., 2016) was used. We modeled the missing Ins-1 loop region (**Figure S2A**) by analogy to Rpn11 (see above) and, out of five generated loop conformations (**Figure S2B**), selected one that showed neither steric clashes nor contacts with the bound inhibitor (**Figure S2C**). The structure of the CSN5-CSN6 heterodimer (**Figure S3A**) was extracted from the crystal structure of the human COP9 signalosome (PDB code: 4D10) (Lingaraju et al., 2014). The C-terminal regions of CSN5 (258–333) and CSN6 (215–316) were also deleted, since they are not involved in catalysis and heterodimer formation can take place without them. Here, the Ins-1 loop (98–113) of CSN5 was found to be in a closed conformation. We generated 10 alternative conformations of the Ins-1 loop (**Figure 2C** and **Figure S3B**) and eventually selected a CSN5 model in which the Ins-1 loop is in an open conformation and does not obstruct ligand access to the catalytic site. The processed CSN5-CSN6 heterodimer is shown in **Figure 3B** and **Figure S3C**.
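The loop-selection criteria described above (no clash or contact with the inhibitor, loop pointing away from the catalytic site) can be expressed as a simple geometric filter. This is a minimal sketch under stated assumptions: the 4.0 Å contact cutoff and the use of the loop-centroid distance to Zn<sup>2+</sup> as a proxy for "pointing away" are our own illustrative choices, and the coordinates are toy values rather than data from the study.

```python
import numpy as np


def min_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Smallest pairwise distance (Angstrom) between two (N, 3) coordinate sets."""
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())


def select_open_loop(loop_models, ligand_xyz, zn_xyz, contact_cutoff=4.0):
    """Index of the loop model that avoids the ligand and points furthest from Zn.

    loop_models: list of (N_i, 3) arrays with the atoms of each candidate loop.
    Models with any atom closer than contact_cutoff (ASSUMED value) to the ligand
    are rejected. Returns None if every model is rejected.
    """
    best_idx, best_dist = None, -np.inf
    for i, loop in enumerate(loop_models):
        if min_distance(loop, ligand_xyz) < contact_cutoff:
            continue  # clash or contact with the bound inhibitor
        dist_to_zn = float(np.linalg.norm(loop.mean(axis=0) - zn_xyz))
        if dist_to_zn > best_dist:
            best_idx, best_dist = i, dist_to_zn
    return best_idx


# Hypothetical toy coordinates
ligand = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
zn = np.array([0.5, 0.0, 0.0])
loops = [
    np.array([[2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]),    # touches the ligand: rejected
    np.array([[8.0, 0.0, 0.0], [9.0, 0.0, 0.0]]),    # clear, centroid 8.0 A from Zn
    np.array([[0.0, 12.0, 0.0], [0.0, 13.0, 0.0]]),  # clear, centroid about 12.5 A from Zn
]
print(select_open_loop(loops, ligand, zn))
```

Visual inspection, as used in the study, remains the final check; a filter like this only narrows down the candidates.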

### Inhibitor Positioning

The 3D structure of capzimin was generated with the MarvinSketch program (https://chemaxon.com/products/marvin). The sulfur was modeled as a negatively charged thiolate. Docking of capzimin was carried out with AUTODOCK4.2 (Morris et al., 2009). Kollman charges were assigned to the atoms of Rpn11 and CSN5, and a charge of +2 was manually assigned to Zn. Partial charges on capzimin were calculated using the SwissParam server (Zoete et al., 2011). For both Rpn11 and CSN5, the grid was centered on the catalytic Zn<sup>2+</sup>. To enclose the binding site in Rpn11, the size of the 3D grid was set to 46, 50, and 52 grid points in the x, y, and z directions, respectively, with the default spacing of 0.375 Å. For CSN5, the binding site was enclosed in a grid of 48, 50, and 64 grid points in the x, y, and z directions, respectively. In each case, the Lamarckian Genetic Algorithm (LGA) was used to generate 100 docked conformations of capzimin. Because the binding mode of CSN5i-3 in the CSN5 monomer is known (Schlierf et al., 2016), we used the co-crystallized conformation of CSN5i-3 to generate the CSN5i-3-bound CSN5-CSN6 heterodimeric, Rpn11 monomeric, and Rpn8-Rpn11 heterodimeric complexes by manual docking (structural superimposition).
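The grid point counts translate into physical box dimensions once multiplied by the spacing. The sketch below assumes the quoted numbers are the per-axis AutoDock `npts` values, so that each box edge spans `npts × spacing` (with one extra point at the grid center); treat this convention as an assumption when reproducing the setup.

```python
SPACING_A = 0.375  # AutoDock default grid spacing, in Angstrom

# Grid point counts quoted in the text, (x, y, z)
GRIDS = {"Rpn11": (46, 50, 52), "CSN5": (48, 50, 64)}


def box_edges(npts, spacing=SPACING_A):
    """Edge lengths (Angstrom) of a grid box, ASSUMING edge = npts * spacing."""
    return tuple(n * spacing for n in npts)


def box_volume(npts, spacing=SPACING_A):
    """Box volume in cubic Angstrom."""
    x, y, z = box_edges(npts, spacing)
    return x * y * z


for target, npts in GRIDS.items():
    print(target, box_edges(npts), f"{box_volume(npts):.0f} A^3")
```

Under this convention, the Rpn11 box measures 17.25 × 18.75 × 19.5 Å and the CSN5 box 18.0 × 18.75 × 24.0 Å, i.e., the CSN5 search space is elongated along z.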

### Molecular Dynamics Simulations

All MD simulations were carried out using GROMACS-5.1.2 (Van Der Spoel et al., 2005). For monomeric Rpn11 and CSN5 and for the heterodimeric Rpn8-Rpn11 and CSN5-CSN6 complexes, the all-atom CHARMM27 force field (which includes CHARMM22 and CMAP for proteins) (Mackerell et al., 1998, 2004) provided in the GROMACS package was used. The CHARMM27 force field provides non-bonded parameters for Zn<sup>2+</sup>. Optimized force field parameters for Zn<sup>2+</sup> were taken from Stote and Karplus (1995), which were shown to give reliable coordination geometries. The topology files for capzimin and CSN5i-3 (see **Supplementary File** for more information) were generated with the SwissParam server (Zoete et al., 2011). All complexes were enclosed in triclinic boxes (see **Table S1**). The TIP3P water model (Jorgensen, 1981; Mark and Nilsson, 2001) was used to solvate all complexes. Na<sup>+</sup> and Cl<sup>−</sup> ions were added to neutralize the systems at a concentration of 0.15 M. After neutralization, the systems were subjected to 5000 steps of steepest descent minimization. The minimized systems were then equilibrated under NVT and NPT conditions for 1 and 2 ns, respectively. During equilibration, position restraints were applied to both protein (including Zn<sup>2+</sup>) and ligand atoms. Temperature (310 K) and pressure (1 atm) were controlled by the velocity-rescaling thermostat (Bussi et al., 2007) and the Parrinello-Rahman barostat (Parrinello and Rahman, 1981), respectively. The equilibrated systems were finally subjected to 100 ns production runs under NPT conditions without any position restraints. Three independent simulations were carried out for each complex.
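The protocol maps onto a sequence of GROMACS `.mdp` stages. The sketch below generates illustrative parameter sets with standard `.mdp` option names; the 2 fs time step, the coupling times, and the compressibility are assumptions not stated in the text, and ligand position restraints would need a topology-specific `#ifdef` block that is not shown.

```python
ATM_TO_BAR = 1.01325  # GROMACS expects ref_p in bar; 1 atm = 1.01325 bar
DT_PS = 0.002         # ASSUMPTION: 2 fs time step (not stated in the text)


def nsteps_for(duration_ns: float, dt_ps: float = DT_PS) -> int:
    """Number of MD steps covering duration_ns (1 ns = 1000 ps) at time step dt_ps."""
    return round(duration_ns * 1000.0 / dt_ps)


def stage_mdp(stage: str) -> dict:
    """Hypothetical .mdp settings for one stage of the described protocol."""
    if stage == "em":  # 5000 steps of steepest descent
        return {"integrator": "steep", "nsteps": 5000}
    params = {
        "integrator": "md",
        "dt": DT_PS,
        "tcoupl": "V-rescale",  # velocity-rescaling thermostat (Bussi et al., 2007)
        "tc-grps": "System",
        "tau_t": 0.1,           # ASSUMPTION: thermostat coupling time, ps
        "ref_t": 310,           # K
    }
    if stage == "nvt":  # 1 ns with position restraints
        # -DPOSRES activates the protein restraints written by pdb2gmx; ligand
        # restraints need their own #ifdef block in the topology (not shown).
        params.update(nsteps=nsteps_for(1), pcoupl="no", define="-DPOSRES")
    elif stage in ("npt", "prod"):
        params.update(
            pcoupl="Parrinello-Rahman",
            tau_p=2.0,               # ASSUMPTION: barostat coupling time, ps
            ref_p=ATM_TO_BAR,        # 1 atm
            compressibility=4.5e-5,  # bar^-1, typical value for water
        )
        if stage == "npt":  # 2 ns with position restraints
            params.update(nsteps=nsteps_for(2), define="-DPOSRES")
        else:               # 100 ns production, no restraints
            params.update(nsteps=nsteps_for(100))
    else:
        raise ValueError(f"unknown stage: {stage}")
    return params


def render(params: dict) -> str:
    """Format a parameter dict as .mdp lines ("key = value")."""
    return "\n".join(f"{key:<16} = {value}" for key, value in params.items())


for stage in ("em", "nvt", "npt", "prod"):
    print(f"; --- {stage} ---")
    print(render(stage_mdp(stage)))
```

With a 2 fs step, the 1 ns NVT, 2 ns NPT, and 100 ns production stages correspond to 500,000, 1,000,000, and 50,000,000 steps, respectively.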

### Binding Energy Calculation

Binding energies of the inhibitors were calculated with the linear interaction energy (LIE) method, which has been reported to predict reliable binding energies (Hansson et al., 1998; Aqvist and Marelius, 2001). In the LIE method, the binding free energy is obtained as the free energy of transferring the ligand from water to the protein environment. In simple terms, the LIE equation can be written as:

$$
\Delta \text{G}_{\text{bind}}(\text{ligand}) = \text{G}^{\text{bound}}_{\text{sol}}(\text{ligand}) - \text{G}^{\text{free}}_{\text{sol}}(\text{ligand}) \tag{1}
$$

where ΔG<sub>bind</sub>(ligand) is the binding free energy of the ligand, G<sup>bound</sup><sub>sol</sub>(ligand) is the free energy of the ligand in the solvated protein-ligand complex, and G<sup>free</sup><sub>sol</sub>(ligand) is the free energy of the free ligand in water. LIE calculations are generally carried out in combination with MD or Monte Carlo (MC) sampling. The LIE estimate has two components, i.e., electrostatic (el) and van der Waals (vdW) interactions. Hence, Equation (1) can be rewritten as

$$
\begin{aligned}
\Delta \text{G}_{\text{bind}} &= \alpha \left( \langle \mathcal{U}_{\text{lig}-\text{surr}}^{\text{vdW}} \rangle_{\text{protein}} - \langle \mathcal{U}_{\text{lig}-\text{surr}}^{\text{vdW}} \rangle_{\text{water}} \right) \\
&\quad + \beta \left( \langle \mathcal{U}_{\text{lig}-\text{surr}}^{\text{el}} \rangle_{\text{protein}} - \langle \mathcal{U}_{\text{lig}-\text{surr}}^{\text{el}} \rangle_{\text{water}} \right) \\
&= \alpha \, \Delta \mathcal{U}^{\text{vdW}} + \beta \, \Delta \mathcal{U}^{\text{el}}
\end{aligned}
\tag{2}
$$

where the angle brackets ⟨ ⟩ denote thermodynamic (ensemble) averages of the interaction energies of the ligand with its surroundings (Aqvist and Marelius, 2001). α is an empirically derived non-polar scaling factor and β is a polar scaling factor (Aqvist and Marelius, 2001). Almlöf et al. suggested β<sup>0</sup> = 0.43 for neutral compounds, together with correction factors (Δβ) for different ligand functional groups (Almlöf et al., 2007). For the calculation of ligand binding energies, we used α = 0.18. The β values for the thiolate form of capzimin (anion) and for CSN5i-3 (alcohol) were obtained by applying functional group-specific correction factors to β<sup>0</sup> (0.43 + 0.02 = 0.45 for capzimin and 0.43 − 0.06 = 0.37 for CSN5i-3) (Almlöf et al., 2007; Gutiérrez-De-Terán and Aqvist, 2012). Values of β depend on the chemical nature of the ligand (Hansson et al., 1998; Rinaldi et al., 2004). To calculate the energy terms (vdW and el) of capzimin and CSN5i-3 in water, we carried out a separate 50 ns MD simulation for each ligand.
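Equation (2) reduces to averaging per-frame ligand-surrounding energies from the bound (protein) and free (water) simulations and scaling the differences. The sketch below uses the α and β values quoted above; the energy series are hypothetical placeholder numbers, not results from the paper, and in practice they would be extracted from the MD trajectories (e.g., as ligand-environment Lennard-Jones and Coulomb energy terms).

```python
import numpy as np

ALPHA = 0.18  # non-polar scaling factor used in the text
BETA0 = 0.43  # baseline for neutral compounds (Almlöf et al., 2007)
DELTA_BETA = {"capzimin": +0.02, "CSN5i-3": -0.06}  # corrections quoted in the text


def lie_binding_energy(vdw_protein, el_protein, vdw_water, el_water, alpha, beta):
    """LIE estimate (Equation 2) from per-frame ligand-surrounding energies.

    Each argument is a sequence of interaction energies sampled along an MD run.
    The result has the same units as the inputs.
    """
    d_vdw = np.mean(vdw_protein) - np.mean(vdw_water)
    d_el = np.mean(el_protein) - np.mean(el_water)
    return alpha * d_vdw + beta * d_el


beta_capzimin = BETA0 + DELTA_BETA["capzimin"]  # 0.45
# Hypothetical per-frame energies (kJ/mol), NOT values from the study
dg = lie_binding_energy(
    vdw_protein=[-180.0, -176.0, -184.0],
    el_protein=[-120.0, -130.0, -110.0],
    vdw_water=[-90.0, -94.0, -86.0],
    el_water=[-100.0, -105.0, -95.0],
    alpha=ALPHA,
    beta=beta_capzimin,
)
print(f"{dg:.2f}")
```

Here the averaged differences are ΔU<sup>vdW</sup> = −90 and ΔU<sup>el</sup> = −20, so the estimate is 0.18 × (−90) + 0.45 × (−20) ≈ −25.2 in the same (hypothetical) units. Because α and β only scale the averages, convergence of the four averages is what governs the reliability of the estimate.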

### RESULTS AND DISCUSSION

### Open Conformations of Rpn11 and CSN5

The Ins-1 regions of CSN5 (97–131) and Rpn11 (74–106) have been reported to be flexible, and this flexibility is important for the binding of the distal ubiquitin (Echalier et al., 2013; Worden et al., 2014). In the cryo-EM structure of Rpn11, the Ins-1 loop (76–88) obstructs the distal ubiquitin binding pocket. Previous studies suggest that the Ins-1 loop is flexible and very important for the regulation of the enzymatic activity of zinc metalloproteases, so it may adopt different conformations during ligand binding. With this in mind, we generated alternative loop conformations that make the distal ubiquitin binding pocket accessible to inhibitors. In the open conformation, the Ins-1 loop points away from the catalytic site (**Figure 2**). We used two different crystal structures to represent CSN5 in the monomeric and heterodimeric states. In the monomeric state, CSN5 is already complexed with the potent inhibitor CSN5i-3, and part of the Ins-1 loop (100–106) is missing; we modeled this loop in an open conformation. In the crystal structure of the CSN5-CSN6 heterodimer, the initial conformation of the Ins-1 loop (98–109) blocks the distal ubiquitin binding site, as also observed in the cryo-EM structure of Rpn11. Hence, the Ins-1 loop was remodeled in the open conformation (**Figures 2B,C**). The Rpn8-Rpn11 and CSN5-CSN6 heterodimeric states are shown in **Figure 3**.

### Initial Docked Conformations of Capzimin

The recently published crystal structure of CSN5 with CSN5i-3 shows the inhibitor occupying the distal ubiquitin binding site, where it can interfere with the catalytic activity of CSN5. Because the residues of the distal ubiquitin binding site are conserved between CSN5 and Rpn11, we assumed that capzimin might also occupy the distal ubiquitin binding site of Rpn11. In the CSN5i-3-bound crystal structure of CSN5, the nitrogen atom of the azole ring forms a coordinate bond with Zn<sup>2+</sup>. In a previous study (Perez et al., 2017), the parent compound of capzimin, 8-thioquinoline (8TQ), was shown to interact with Zn<sup>2+</sup> in a bidentate manner, and it has been proposed that capzimin also inhibits Rpn11 via chelation of the catalytic Zn<sup>2+</sup>. Interestingly, the top binding pose generated by AUTODOCK4.2 showed that the 8TQ moiety of capzimin also makes a monodentate interaction with the Zn<sup>2+</sup> of CSN5 and Rpn11 (**Figure 4**); the other potential coordinating atom of the 8TQ moiety was remote from the Zn<sup>2+</sup> ion. The binding modes of capzimin in Rpn11 and CSN5 appear similar, but we observed a few differences in the distal ubiquitin binding site. In Rpn11, the amide portion of capzimin forms an H-bond with Thr129, and the thiazole moiety makes hydrophobic and van der Waals interactions with the side chains of Met54 and Asp88, respectively (**Figure 4A**). In CSN5, the H-bond with Thr154 was absent; instead, the thiazole moiety of capzimin forms an H-bond with the side chain of Asn158 and shows hydrophobic interactions with the side chains of Met78 and Trp136 (**Figure 4B**).
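Whether a pose is monodentate or bidentate can be decided by counting donor atoms within a bonding distance of the Zn<sup>2+</sup> ion. The sketch below is a minimal illustration: the 2.8 Å cutoff is an assumption (typical Zn–S and Zn–N bonds are shorter), the function names are hypothetical, and which 8TQ atom coordinates in the toy pose is purely illustrative, since the text does not specify it for the docked structures.

```python
import numpy as np


def coordinating_atoms(zn_xyz, donors, cutoff=2.8):
    """Names of donor atoms within `cutoff` Angstrom (ASSUMED value) of the Zn ion.

    donors: mapping of atom name -> (x, y, z) coordinates in Angstrom.
    """
    zn = np.asarray(zn_xyz, dtype=float)
    return sorted(
        name for name, xyz in donors.items()
        if np.linalg.norm(np.asarray(xyz, dtype=float) - zn) <= cutoff
    )


def denticity(zn_xyz, donors, cutoff=2.8):
    """Classify the binding mode by the number of coordinating donor atoms."""
    n = len(coordinating_atoms(zn_xyz, donors, cutoff))
    return {0: "none", 1: "monodentate", 2: "bidentate"}.get(n, f"{n}-dentate")


# Hypothetical pose: one donor close to Zn, the other remote
pose = {"S": (2.3, 0.0, 0.0), "N": (0.0, 4.1, 0.0)}
print(coordinating_atoms((0.0, 0.0, 0.0), pose), denticity((0.0, 0.0, 0.0), pose))
```

Applying the same test to every MD frame turns it into a time series, which is how a change from a monodentate docked pose to stable S–Zn<sup>2+</sup> and N–Zn<sup>2+</sup> distances could be tracked.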

### Binding of Capzimin and CSN5i-3 to Monomeric Rpn11 and Rpn8-Rpn11 Heterodimer

Being an integral part of the proteasome machinery, Rpn11 works in coordination with other subunits. Monomeric Rpn11 lacks deubiquitinylase activity and is active only in the presence of Rpn8. Here, we investigated the binding of Rpn11 inhibitors in the absence and presence of Rpn8 (**Figures 5**–**9**). Cα root mean square deviation (Cα-RMSD) analysis revealed that, compared with the monomeric capzimin-Rpn11 complex, Rpn11 in the heterodimeric capzimin-Rpn8-Rpn11 complex shows a lower RMSD (**Figure 5A**) and is thus more stable. The Cα root mean square fluctuation (Cα-RMSF) plot (**Figure 5B**) showed that, in the absence of Rpn8, residues belonging to the Ins-1 loop and the α2 helix undergo larger fluctuations. Structural analysis also revealed that in the capzimin-Rpn11 complex, the Ins-1 loop and the α2 helix exhibited large movements; in particular, the α2 helix came closer to the α3 helix (**Figure 9A**). The presence of Rpn8 restricts this movement of the α2 helix as well as that of the Ins-1 loop. Both the crystal structure of the yeast Rpn8-Rpn11 heterodimer (Pathare et al., 2014) and the cryo-EM structure of the human 26S proteasome (Huang et al., 2016) show that the α2 helices of both proteins make extensive contacts with each other.

**Figure 5** (partial caption): … of Rpn11, (C) RMSD of capzimin, and (D) distances between the coordinating atoms of capzimin and Zn<sup>2+</sup>. Zn<sup>2+</sup>–S distances in capzimin-bound Rpn11 and the Rpn8-Rpn11 heterodimer are shown in blue and green, respectively.

Molecular dynamics simulations refined the binding pose of capzimin. The average RMSDs of the refined structures relative to the starting structures were 0.32 and 0.27 Å for the Rpn11 monomer and the Rpn8-Rpn11 heterodimer, respectively. In the heterodimeric state, capzimin showed less deviation overall (**Figure 5C**). It seems that at the start of the simulation, capzimin optimizes its interactions with the residues of the binding pocket, which leads to the deviation from the initial conformation. However, in both complexes, the S-Zn<sup>2+</sup> and N-Zn<sup>2+</sup> distances (**Table 1**) were stable (**Figure 5D**) and comparable to the experimentally determined distances reported for the [(Tp<sup>Me,Ph</sup>)Zn(8TQ)] complex (Perez et al., 2017). The H-bond between capzimin and the side chain of Thr129 was maintained for most of the trajectory (**Figures S4A,B**). **Figures 6A,B** show that the 8TQ fragment of capzimin interacts with the catalytic Zn<sup>2+</sup>, while the additional amide moiety interacts with Thr129. Leu56, Pro89, and Phe133 provide hydrophobic interactions with the azole moiety of capzimin. Capzimin showed similar binding affinities for monomeric Rpn11 and the Rpn8-Rpn11 heterodimer (**Table 2**).
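
The monodentate and bidentate coordination modes discussed in this section are read off the donor-Zn<sup>2+</sup> distances along each trajectory. A minimal sketch of such a classification follows. The 2.8 Å cutoff, the function names, and the per-frame distance dictionaries are illustrative assumptions, not the criteria used in this study.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical cutoff (Å) for counting a donor atom as coordinated to Zn2+;
# this value is an assumption, not taken from the study.
ZN_CUTOFF = 2.8


def coordination_mode(distances: Dict[str, float], cutoff: float = ZN_CUTOFF) -> str:
    """Classify one frame from donor-atom -> Zn2+ distances (Å)."""
    n_bound = sum(1 for d in distances.values() if d <= cutoff)
    labels = {0: "uncoordinated", 1: "monodentate", 2: "bidentate"}
    return labels.get(n_bound, "polydentate")


def mode_populations(frames: List[Dict[str, float]], cutoff: float = ZN_CUTOFF) -> Dict[str, float]:
    """Fraction of frames spent in each coordination mode."""
    if not frames:
        raise ValueError("need at least one frame")
    counts = Counter(coordination_mode(f, cutoff) for f in frames)
    return {mode: n / len(frames) for mode, n in counts.items()}


# Toy S/N-Zn2+ distances for three frames (hypothetical values).
frames = [{"S": 2.3, "N": 2.1}, {"S": 2.4, "N": 2.2}, {"S": 2.3, "N": 4.6}]
print(mode_populations(frames))
```

Reporting populations rather than a single frame reflects the point made in the text that capzimin's coordination is assessed over the whole trajectory, not only in the docked pose.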

We also investigated the binding of CSN5i-3, which is a very weak inhibitor of Rpn11. The starting structures of CSN5i-3 bound to monomeric Rpn11 and to the Rpn8-Rpn11 heterodimer were generated by manual docking. The CSN5i-3-Rpn11 complex showed a very high Cα-RMSD (**Figure 7A**), suggesting that Rpn11 undergoes large conformational changes (**Figure 9B**). In the presence of Rpn8, however, the Cα-RMSD of Rpn11 was comparatively low (**Figure 7A**). The Ins-1 loop was more flexible in the absence of Rpn8 (**Figure 7B**). In the CSN5i-3-Rpn11 complex, CSN5i-3 remained stable (**Figure 7C**), and the N-Zn<sup>2+</sup> distance was very close to that reported in the crystal structure of CSN5 with CSN5i-3 (**Figure 7D** and **Table 1**). Such stable binding of CSN5i-3 to monomeric Rpn11 disagrees with experiment and cannot explain the low inhibitory activity of CSN5i-3 toward Rpn11. We therefore investigated whether this unrealistic overbinding disappears, and the selectivity is reproduced, in the protein-protein complex environment.

In the heterodimeric Rpn8-Rpn11 complex, binding of CSN5i-3 in the pocket was not stable (**Figure 7C**): the distance to zinc increased continuously, and the ligand eventually lost coordination with Zn<sup>2+</sup> (**Figure 7D** and **Table 1**), which shows that Rpn8 has a pronounced influence on CSN5i-3-bound Rpn11. The interactions of CSN5i-3 with monomeric Rpn11 are shown in **Figure 8A**. In both complexes, the H-bond with Thr129 was absent (**Figure 8** and **Figures S4C,D**). The binding of CSN5i-3 to Rpn11 is thus significantly influenced by the presence of Rpn8 (**Figures 7C,D**, **8B**), and the heterodimeric Rpn8-Rpn11 state must be considered to explain why capzimin binds and CSN5i-3 does not. LIE calculations (**Table 2**) revealed that CSN5i-3 binds moderately to monomeric Rpn11 and very weakly to the Rpn8-Rpn11 heterodimer. Outside the proteasome, Rpn11 also plays an important role in various cellular activities.<sup>33</sup> Therefore, binding of capzimin and CSN5i-3 to monomeric Rpn11 is physiologically possible. In a recent study, CSN5i-3 was shown to bind recombinant monomeric CSN5, and a co-crystal structure was obtained (Schlierf et al., 2016).

### Binding of Capzimin and CSN5i-3 to Monomeric CSN5 and CSN5-CSN6 Heterodimer

MD simulations of capzimin with monomeric CSN5 and the CSN5-CSN6 heterodimer revealed that its binding to CSN5 is also influenced by CSN6 (**Figure 10**). In the presence of CSN6, the Ins-1 loop showed greater flexibility (**Figure 10B**). Capzimin shows a higher ligand RMSD in monomeric CSN5 (**Figure 10C**). In monomeric CSN5, capzimin shows bidentate interactions with the catalytic zinc (**Figures 10D**, **11A**) and a stable H-bond between its amide NH and Thr154 (**Figure 11A** and **Figure S5A**). Intermittent H-bonds with Glu101 and Tyr143 were also observed. In the presence of CSN6, however, capzimin showed only monodentate coordination with the catalytic zinc (**Figures 10D, 11B**) and low-occupancy H-bonds with Met78, Arg106, and Asn158 (**Figure S5B**). The thiazole and amide moieties of capzimin form H-bonds with the side chains of Asn158 and Met78, respectively, and the side chain of Arg106 in the Ins-1 loop forms an H-bond with the 8TQ fragment of capzimin. Besides these H-bonds, the thiazole moiety also shows hydrophobic interactions with the side chains of Met78, Leu80, and Phe165. LIE calculations show that capzimin binds more strongly to monomeric CSN5 (**Table 2**).

Data represented here are averages of three independent MD simulation runs.

TABLE 2 | Binding energies of ligands calculated using the LIE method.

The final 20 ns of each trajectory were used to calculate LIE. Data represented here are averages of three independent MD simulation runs (α = 0.18; β = 0.37 for CSN5i-3 and 0.45 for capzimin). Bold values refer to protein-protein complexes.

In the crystal structure of CSN5 with CSN5i-3, part of the Ins-1 loop (residues 100–106) is missing, and residues of the α4 helix do not interact with the CSN5i-3 ligand. The orientation of the α4 helix suggests that residues in the Ins-1 loop will not interact with CSN5i-3 either. The MD simulation results (**Figure 12**) show that the Cα-RMSDs of CSN5 in both complexes do not vary much and converge near the end of the simulations (**Figure 12A**). In the presence of CSN6, residues 101–110 in the Ins-1 loop (residues 98–113) of CSN5 showed more flexibility, but the α4 helix (residues 111–131) was comparatively less flexible (**Figure 12B**). A previous MD simulation study on monomeric CSN5 suggested that portions of the Ins-1 region are highly flexible (Echalier et al., 2013). In the CSN5-CSN6 heterodimer, we observed that CSN5i-3 is stable (**Figure 12C**) and that its conformation is very close to that in the crystal structure (**Figure S6**). The N-Zn<sup>2+</sup> distances, shown in **Figure 12D**, are stable over time in both the monomer and the heterodimer and only slightly longer than in the crystal structure. In both complexes, residues in the Ins-1 loop make contact with CSN5i-3 (**Figure 13**). We observed that the interaction with the α2 helix of CSN6 affects the movement of the Ins-1 loop and the α4 helix of CSN5.
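
The Cα-RMSD and Cα-RMSF quantities used throughout this section can be illustrated with a minimal plain-Python sketch. Coordinates are assumed to be (x, y, z) tuples in Å, and every frame is assumed to be already superimposed on the reference; the rotational fitting that dedicated MD analysis packages perform first is deliberately left out. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = Sequence[Coord]


def rmsd_per_frame(traj: Sequence[Frame], ref: Frame) -> List[float]:
    """Cα-RMSD (Å) of each frame against a reference structure.

    Assumes every frame has already been superimposed on the reference;
    no rotational fitting is done here.
    """
    values = []
    for frame in traj:
        sq = sum(math.dist(a, b) ** 2 for a, b in zip(frame, ref))
        values.append(math.sqrt(sq / len(ref)))
    return values


def rmsf_per_atom(traj: Sequence[Frame]) -> List[float]:
    """Cα-RMSF (Å) of each atom about its time-averaged position."""
    n_frames = len(traj)
    n_atoms = len(traj[0])
    values = []
    for i in range(n_atoms):
        mean = tuple(sum(frame[i][k] for frame in traj) / n_frames for k in range(3))
        sq = sum(math.dist(frame[i], mean) ** 2 for frame in traj)
        values.append(math.sqrt(sq / n_frames))
    return values


# Toy two-atom, two-frame trajectory (hypothetical coordinates).
ref = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
traj = [ref, [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
print(rmsd_per_frame(traj, ref), rmsf_per_atom(traj))
```

RMSD is a per-frame measure of global drift, whereas RMSF is a per-residue measure of local mobility; this is why the text uses RMSD to judge convergence and RMSF to single out the flexible Ins-1 residues.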

In monomeric CSN5, CSN5i-3 makes a low-occupancy H-bond with Thr154 and a relatively stable H-bond with Asn158 (**Figure 13A** and **Figure S5C**). In the CSN5-CSN6 heterodimer, however, CSN5i-3 forms stable H-bonds with both Thr154 and Asn158 (**Figure S5D** and **Figure 13B**). The H-bond between the azole ring of CSN5i-3 and Asn158 is not present in the crystal structure, but our MD refinement showed it to be stable. The difluoromethyl group projects toward Leu157 and makes hydrophobic interactions with it (**Figure 13**). In both capzimin- and CSN5i-3-bound monomeric CSN5, we observed that the Ins-1 region moves slightly toward the inhibitors and part of the Ins-1 loop becomes α-helical (**Figure 14**). This may explain the lower flexibility of the Ins-1 loop in monomeric CSN5 compared with CSN6-bound CSN5.
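
The distinction between "stable" and "low-occupancy" H-bonds rests on the fraction of frames in which a donor/acceptor pair meets a geometric criterion. A minimal sketch follows. The 3.5 Å and 120° thresholds and the function name are assumptions; MD analysis tools use a variety of default criteria, and the ones used in this study are not stated here.

```python
from typing import Sequence, Tuple

# Hypothetical geometric H-bond criteria (assumptions, not the study's values).
MAX_DA_DIST = 3.5      # donor-acceptor distance, Å
MIN_DHA_ANGLE = 120.0  # donor-H...acceptor angle, degrees


def hbond_occupancy(frames: Sequence[Tuple[float, float]],
                    max_dist: float = MAX_DA_DIST,
                    min_angle: float = MIN_DHA_ANGLE) -> float:
    """Fraction of frames in which one donor/acceptor pair satisfies the criteria.

    Each frame is given as (donor-acceptor distance in Å, D-H...A angle in degrees).
    """
    if not frames:
        raise ValueError("need at least one frame")
    present = sum(1 for dist, angle in frames if dist <= max_dist and angle >= min_angle)
    return present / len(frames)


# Toy geometries for four frames (hypothetical values).
frames = [(2.9, 160.0), (3.2, 150.0), (3.8, 165.0), (3.0, 100.0)]
print(hbond_occupancy(frames))
```

In this toy input, two of the four frames satisfy both conditions. Requiring both a distance and an angle condition avoids counting close contacts that are not geometrically H-bond-like.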

### Structural Elements Responsible for Ligand-Selective Inhibition of Rpn11 and CSN5

Based on the MD simulations of Rpn11 and CSN5 with capzimin and CSN5i-3, we can explain the selective inhibition of these two proteins. At the sequence level, the MPN domains of CSN5 and Rpn11 are moderately conserved, whereas the Zn<sup>2+</sup>-binding residues are fully conserved. Because the distal ubiquitin binding region is large, we focused mainly on the residues that interact with capzimin and CSN5i-3. We found that Leu98, Val100, Thr105, Arg106, Gln110, Ala112, Ala113, Tyr114, Glu115, Tyr116, Met117, Ala119, Ile150, Leu157, Asn158, and Phe161 (in CSN5) are substituted by Met75, Gln77, Val82, Ser83, Glu85, Val87, Asp88, Pro89, Val90, Phe91, Gln92, Lys94, Val125, Ser132, Phe133, and Leu136, respectively (in Rpn11). Most of the residues in the Ins-1 loop region are not conserved. The Ins-1 region plays an important role in positioning the C-terminus of the distal ubiquitin for cleavage of the isopeptide bond (Worden et al., 2014). Hence, the Ins-1 region appears very promising for the design of selective JAMM inhibitors.

Considering the calculated LIE values of capzimin in monomeric Rpn11 and CSN5, capzimin has a slightly higher affinity for Rpn11 (by 1 kcal/mol relative to CSN5). The selective binding becomes more pronounced, and can be rationalized, when the respective binding partners, Rpn8 and CSN6, are taken into account. Similarly, the selective binding of CSN5i-3 is more pronounced when the heterodimeric states of Rpn11 and CSN5 are considered.
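
To put a free-energy difference such as the 1 kcal/mol LIE gap into perspective, the standard thermodynamic relation ΔΔG = RT ln(K<sub>1</sub>/K<sub>2</sub>) converts between a binding free-energy difference and a fold difference in affinity. The sketch below assumes 298.15 K and treats a fold selectivity as a ratio of dissociation constants. Experimental selectivities reported from inhibition assays map onto this only approximately, and given the overestimation of capzimin's absolute LIE values discussed in this section, such conversions are indicative at best.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def fold_from_ddg(ddg_kcal: float, temperature: float = 298.15) -> float:
    """Fold difference in affinity implied by a binding free-energy difference (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * temperature))


def ddg_from_fold(fold: float, temperature: float = 298.15) -> float:
    """Binding free-energy difference (kcal/mol) implied by a fold selectivity."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return R_KCAL * temperature * math.log(fold)


print(f"{fold_from_ddg(1.0):.1f}")   # fold difference for a 1 kcal/mol gap
print(f"{ddg_from_fold(80.0):.2f}")  # kcal/mol needed for an 80-fold selectivity
```

Under these assumptions, a 1 kcal/mol difference corresponds to only about a 5-fold difference in affinity, whereas the 80-fold selectivity of capzimin toward Rpn11 would require roughly 2.6 kcal/mol. This is consistent with the observation that the monomeric models alone underestimate the selectivity.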

FIGURE 10 | MD simulation data of capzimin bound to CSN5 in the absence (red) and presence (black) of CSN6. (A) Cα-RMSD of CSN5; (B) Cα-RMSF of the Ins-1 region of CSN5; (C) RMSD of capzimin; and (D) distances between coordinating atoms of capzimin and Zn<sup>2+</sup>. Zn<sup>2+</sup>-S distances in capzimin-bound CSN5 and the CSN5-CSN6 heterodimer are shown in blue and green, respectively.

Thus, the heterodimeric states of Rpn11 and CSN5 must be considered to explain the selectivity of capzimin and CSN5i-3. In the presence of their binding partners, the Ins-1 regions of Rpn11 and CSN5 show different flexibilities, which in turn affect the binding of capzimin and CSN5i-3.

FIGURE 12 | MD simulation data of CSN5i-3 bound to CSN5 in the absence (red) and presence (black) of CSN6. (A) Cα-RMSD of CSN5; (B) Cα-RMSF of the Ins-1 region of CSN5; (C) RMSD of CSN5i-3; and (D) distances between coordinating atoms of CSN5i-3 and Zn<sup>2+</sup>.

Capzimin is 80-fold more selective toward Rpn11, and its thiazole moiety appears important for this selectivity. In the Rpn8-Rpn11 heterodimer, capzimin showed bidentate coordination with Zn<sup>2+</sup> and a stable H-bond with the side chain of Thr129 (**Figure S4B**). In the CSN5-CSN6 heterodimer, however, capzimin displayed monodentate coordination with Zn<sup>2+</sup> and only low-occupancy H-bonds with the side chains of Met78, Arg106, and Asn158 (**Figure S5B**). The low affinity of capzimin for CSN5 can be attributed to the lack of the additional N-Zn<sup>2+</sup> coordination and of the H-bond with Thr154. We observed that the binding energies of capzimin were overestimated. This overestimation can be attributed mainly to the net negative charge on the capzimin ligand; a previous study also reported very negative binding energies for negatively charged ligands (Genheden and Ryde, 2015). Note that LIE calculations are very sensitive to the β parameter, whose value depends on the chemical nature of the ligand and can therefore differ between ligands. In the present study, we have used LIE data only to compare relative binding affinities between monomeric and heterodimeric protein structures.
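
The LIE estimate has the standard linear form ΔG<sub>bind</sub> ≈ α(⟨V<sub>vdW</sub>⟩<sub>bound</sub> − ⟨V<sub>vdW</sub>⟩<sub>free</sub>) + β(⟨V<sub>el</sub>⟩<sub>bound</sub> − ⟨V<sub>el</sub>⟩<sub>free</sub>) + γ. Here the averages are ligand-surroundings interaction energies taken from simulations of the complex and of the free ligand in water. The sketch below uses α = 0.18 and the two β values from the Table 2 footnote. The energy series are hypothetical, and the constant γ is set to zero, which is an assumption. The example illustrates the β sensitivity noted above: the same electrostatic difference is scaled by a different factor for each ligand.

```python
from statistics import mean
from typing import Sequence

ALPHA = 0.18                                  # from the Table 2 footnote
BETA = {"CSN5i-3": 0.37, "capzimin": 0.45}    # from the Table 2 footnote


def lie_binding_energy(vdw_bound: Sequence[float], vdw_free: Sequence[float],
                       el_bound: Sequence[float], el_free: Sequence[float],
                       alpha: float = ALPHA, beta: float = 0.37,
                       gamma: float = 0.0) -> float:
    """LIE estimate (kcal/mol) from ligand-surroundings interaction energies.

    Each argument is a series of per-frame energies (kcal/mol) sampled from the
    bound (protein complex) or free (ligand in water) simulation.
    gamma defaults to 0.0 (assumption).
    """
    d_vdw = mean(vdw_bound) - mean(vdw_free)
    d_el = mean(el_bound) - mean(el_free)
    return alpha * d_vdw + beta * d_el + gamma


# Hypothetical energy series (kcal/mol), identical for both beta values.
vdw_b, vdw_f = [-38.0, -42.0], [-20.0, -20.0]
el_b, el_f = [-30.0, -30.0], [-10.0, -10.0]
for name, beta in BETA.items():
    print(name, round(lie_binding_energy(vdw_b, vdw_f, el_b, el_f, beta=beta), 2))
```

With these made-up inputs, switching β from 0.37 to 0.45 shifts the estimate from −11.0 to −12.6 kcal/mol without any change in the simulated energies. This is one reason why only relative LIE values (monomer vs. heterodimer for the same ligand) are compared here.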

As for the selective binding of CSN5i-3 to CSN5, we observed that in the presence of CSN6, CSN5i-3 binds strongly to the protein-protein complex. In the case of the Rpn8-Rpn11 heterodimer, however, CSN5i-3 showed only a very weak interaction. As the MD simulations progress, CSN5i-3 loses coordination with Zn<sup>2+</sup> and the H-bond with Thr129. In Rpn11, the side chain of Phe133, which replaces Asn158 of CSN5, cannot form an H-bond with the azole ring of CSN5i-3. Instead, the azole ring shows hydrophobic interactions with the side chains of Leu56, Met75, and Phe133 and van der Waals interactions with the side chain of Asp88.

It seems that establishing interactions between the azole ring and nearby side chains results in loss of coordination with the active-site Zn<sup>2+</sup> ion. For the Rpn8-Rpn11 heterodimer, LIE calculations showed only a very weak binding energy for CSN5i-3 (**Table 2**), in agreement with the reported ligand-binding selectivity. Overall, our findings suggest that the heterodimeric states of both the CSN5 and Rpn11 targets must be considered to explain the selectivity of the capzimin and CSN5i-3 ligands.

### CONCLUSIONS

In the present study, the binding modes and selectivities of the reported metallo-deubiquitinylase inhibitors capzimin and CSN5i-3 were analyzed computationally in detail. Capzimin is a selective inhibitor of Rpn11, whereas CSN5i-3 is selective for CSN5. We found that capzimin binds to Rpn11 via chelation of the active-site Zn<sup>2+</sup> and that its interactions extend into the distal ubiquitin binding site. Our MD studies suggest that the heterodimeric protein-protein complexes of Rpn11 and CSN5 are conformationally more stable than the monomeric states. Considering the Rpn8-Rpn11 and CSN5-CSN6 heterodimers, we found that residues in the distal ubiquitin binding site are responsible for selectivity and must be taken into consideration in the future design of selective inhibitors of CSN5 and Rpn11. Additionally, we have shown that the flexibility of the Ins-1 region in Rpn11 and CSN5 is significantly affected by the presence of their respective protein binding partners.

### AUTHOR CONTRIBUTIONS

MS and MN designed the study. VK performed the simulations. MS, MN, and VK analyzed and discussed the results and wrote the manuscript.

### ACKNOWLEDGMENTS

We thank the Max Planck Gesellschaft for the Advancement of Science for financial support. The work was also supported by the Ministry of Economy, Science and Digitalisation (Förderung von Wissenschaft und Forschung in Sachsen-Anhalt aus Mitteln der Europäischen Struktur- und Investitionsfonds in der Förderperiode 2014–2020, ZS/2016/04/78155) and the Excellence Center for Dynamic Systems (MDUB).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2018.00480/full#supplementary-material

### REFERENCES

morphology and function is mapped to a distinct C-terminal domain. Biochem. J. 381, 275–285. doi: 10.1042/BJ20040008


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Kumar, Naumann and Stein. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Does Substrate Positioning Affect the Selectivity and Reactivity in the Hectochlorin Biosynthesis Halogenase?

Amy Timmins<sup>1</sup>, Nicholas J. Fowler<sup>2</sup>, Jim Warwicker<sup>2</sup>, Grit D. Straganz<sup>3,4</sup> and Sam P. de Visser<sup>1</sup>\*

*<sup>1</sup> The Manchester Institute of Biotechnology and School of Chemical Engineering and Analytical Science, University of Manchester, Manchester, United Kingdom, <sup>2</sup> The Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester, United Kingdom, <sup>3</sup> Institute of Biochemistry, Graz University of Technology, Graz, Austria, <sup>4</sup> Institute of Molecular Biosciences, Graz University, Graz, Austria*

#### Edited by:

*Fahmi Himo, Stockholm University, Sweden*

#### Reviewed by:

*Rongzhen Liao, Huazhong University of Science and Technology, China*

*Robert S. Paton, Colorado State University, United States*

\*Correspondence: *Sam P. de Visser sam.devisser@manchester.ac.uk*

#### Specialty section:

*This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry*

Received: *01 September 2018* Accepted: *04 October 2018* Published: *30 October 2018*

#### Citation:

*Timmins A, Fowler NJ, Warwicker J, Straganz GD and de Visser SP (2018) Does Substrate Positioning Affect the Selectivity and Reactivity in the Hectochlorin Biosynthesis Halogenase? Front. Chem. 6:513. doi: 10.3389/fchem.2018.00513*

In this work we present the first computational study on the hectochlorin biosynthesis enzyme HctB, a unique three-domain halogenase that activates non-amino acid moieties tethered to an acyl carrier, and which as such may have biotechnological relevance beyond that of other halogenases. We use a combination of small cluster models and full enzyme structures calculated with quantum mechanics/molecular mechanics methods. Our work reveals that the reaction is initiated by rate-determining hydrogen atom abstraction from the substrate by an iron(IV)-oxo species, which creates an iron(III)-hydroxo intermediate. In a subsequent step, the reaction can bifurcate to either halogenation or hydroxylation of the substrate, but substrate binding and positioning drive the reaction toward optimal substrate halogenation. Furthermore, several key residues in the protein have been identified for their involvement in charge-dipole interactions and induced electric field effects. In particular, two charged second-coordination-sphere residues (Glu<sup>223</sup> and Arg<sup>245</sup>) appear to influence the charge density on the Cl ligand and push the mechanism toward halogenation. Our studies, therefore, conclude that nonheme iron halogenases have a chemical structure that induces an electric field on the active site, which affects the halide and iron charge distributions and enables efficient halogenation. As such, HctB is intricately designed for substrate halogenation and operates distinctly differently from other nonheme iron halogenases.

Keywords: nonheme iron, enzyme catalysis, reaction mechanism, QM/MM, density functional theory, halogenation, hydroxylation

## INTRODUCTION

Enzymatic C–Cl bond formation is a rare process in Nature, yet over the past few decades a range of haloperoxidases and halogenases have been discovered (Gribble, 2003; Vaillancourt et al., 2006; van Pée et al., 2006; Butler and Sandy, 2009; Wagner et al., 2009; Weichold et al., 2016; Agarwal et al., 2017; Schnepel and Sewald, 2017; Timmins and de Visser, 2018). Their catalytic mechanism, however, remains controversial, and understanding the fundamental details of these processes may have an impact on biotechnological advances as well as drug development. Three different classes of halogenating enzymes occur in Nature, namely the heme haloperoxidases, the vanadium-dependent nonheme haloperoxidases, and the α-ketoglutarate-dependent nonheme iron halogenases (Gribble, 2003; Vaillancourt et al., 2006; van Pée et al., 2006; Butler and Sandy, 2009; Wagner et al., 2009; Weichold et al., 2016; Agarwal et al., 2017; Schnepel and Sewald, 2017; Timmins and de Visser, 2018).

Heme haloperoxidases (Sundaramoorthy et al., 1995; Wagenknecht and Woggon, 1997; Green et al., 2004; Kim et al., 2006), such as chloroperoxidase, bind H<sub>2</sub>O<sub>2</sub> on an iron(III)-heme center, which is then converted, with the help of a proton shuttle machinery, into an iron(IV)-oxo heme cation radical species called Compound I (Meunier et al., 2004; Denisov et al., 2005; Shaik et al., 2005; Rittle and Green, 2010). Compound I subsequently reacts with chloride to form OCl<sup>−</sup>, which drifts out of the active site and halogenates substrates. A second class of haloperoxidases has a nonheme vanadium cofactor that also utilizes hydrogen peroxide and halide in a catalytic cycle to form OCl<sup>−</sup> (Messerschmidt and Wever, 1996; Chen and van Pée, 2008). The vanadium haloperoxidases have been characterized in marine algae and are believed to function in natural product synthesis associated with defense mechanisms (Martinez et al., 2001).

The final class of halogenases are the α-ketoglutarate (αKG)-dependent nonheme iron halogenases (Blasiak et al., 2006; Buongiorno and Straganz, 2013; Huang and Groves, 2017), which show structural and functional similarities to the corresponding nonheme iron/α-ketoglutarate-dependent dioxygenases (Solomon et al., 2000; Bugg, 2001; Ryle and Hausinger, 2002; Costas et al., 2004; Abu-Omar et al., 2005; Bruijnincx et al., 2008). These nonheme iron halogenases contain an iron(II) center in the resting state that is bound to the side chains of two histidine residues. During the catalytic cycle, halide and αKG bind to the iron(II) ion prior to molecular oxygen binding. Dioxygen is believed to attack αKG to form succinate and a high-valent iron(IV)-oxo species, similarly to the αKG-dependent hydroxylases (Schofield and Zhang, 1999; Bollinger et al., 2005). The iron(IV)-oxo species of several nonheme iron halogenases have been spectroscopically trapped and characterized, and shown to react with substrates via a rate-determining hydrogen atom abstraction barrier (Galonić Fujimori et al., 2007; Neidig et al., 2007; Matthews et al., 2009; Wong et al., 2013; Srnec et al., 2016; Srnec and Solomon, 2017). The subsequent pathway leading to the halogenated product, however, is controversial, because the thermodynamically much more favorable hydroxylation pathway is avoided. How the enzyme manages to follow this thermodynamically unfavorable pathway is under much debate. One proposal links it to substrate binding and orientation (Matthews et al., 2006). For example, in the antibiotic biosynthesis protein SyrB2, the substrate L-Thr is linked to an acyl carrier protein (SyrB1) via a phosphopantetheinyl (PPT) bridge. Replacement of L-Thr by L-norvaline changed the chemoselectivity from substrate halogenation to hydroxylation owing to substrate positioning in the active site (Matthews et al., 2006).
On the other hand, computational modeling by the Borowski group proposed a rotation step in the iron(IV)-oxo(halide) group, whereby after hydrogen atom abstraction the hydroxo and halide ligands switch positions, leading to easier halide rebound (Borowski et al., 2010). Clearly, the mechanism of aliphatic halogenation remains controversial. Because of this, synthetic (biomimetic) model complexes of these nonheme iron halogenases and haloperoxidases have been developed and studied with experimental (Podgoršek et al., 2009; Comba and Wunderlich, 2010; Liu and Groves, 2010; Chatterjee and Paine, 2016; Puri et al., 2016; Wang et al., 2016) and computational approaches (Noack and Siegbahn, 2007; de Visser and Latifi, 2009; Kulik et al., 2009; Pandian et al., 2009; Quesne and de Visser, 2012; Senn, 2014; Huang et al., 2016; Timmins et al., 2018). If the halogenation pathway depends on substrate positioning, however, biomimetic models that lack the protein and do not bind substrates in fixed orientations may not be able to halogenate substrates efficiently.

A recently discovered halogenase (HctB) from *Lyngbya majuscula* is involved in the biosynthesis of hectochlorin, whereby a fatty acyl substrate is dihalogenated on a nonheme iron center (Ramaswamy et al., 2007; Pratter et al., 2014a). In contrast to the nonheme iron halogenase SyrB2, HctB activates a non-amino acid group as a substrate, which would give HctB biotechnological applicability beyond that of halogenases such as SyrB2. As HctB appears to have a certain flexibility in the activation of alkyl chains, we decided to investigate its mechanism using computational methods and to compare its mechanism and active-site features with those described previously for SyrB2. The mechanism of O<sub>2</sub> activation for this mononuclear nonheme iron halogenase was characterized with stopped-flow and spectroscopic (circular dichroism and magnetic circular dichroism) studies (Pratter et al., 2014b).

Functional characterization of the enzyme, together with analysis of its primary structure, reveals that it is a unique three-domain halogenase containing an acyl carrier protein (ACP), which binds the substrate covalently via a phosphopantetheinyl bridge (**Figure 1**). The other terminus of the ACP group is connected to the C-terminus of an acyl-coenzyme A subunit, while the N-terminus is linked to the halogenase domain. As such, the protein has an intricate set-up that enables dihalogenation of the tethered substrate. Interestingly, no evidence of substrate hydroxylation is available, although the activation of hexanoic acid by HctB gave, apart from dihalogenation, products of oxygenation leading to vinyl chloride and ketone products (Pratter et al., 2014a). HctB shows a certain degree of sequence similarity with SyrB2 (Pratter et al., 2014a; see **Supporting Information** for an overlay and sequence alignment), binds iron in a nonheme ligand configuration, and utilizes α-ketoglutarate (αKG). The metal in HctB is linked to the protein via two histidine groups (His<sup>111</sup> and His<sup>227</sup>) in a pentacoordinate environment. However, there are major differences in the overall structures of SyrB2 and HctB, which are not currently understood and warrant a detailed computational study.

After αKG and halide binding, the metal binds molecular oxygen, which is hypothesized to react with αKG to form an iron(IV)-oxo species and succinate upon release of CO<sub>2</sub>. Unfortunately, the iron(IV)-oxo species in HctB has never been trapped and characterized, and details of the halogenation mechanism are unknown. In the next stage of the catalytic cycle, hydrogen atom abstraction is expected to form an iron(III)-hydroxo species, which is elusive as well. The halide should then rebound to the substrate radical to form halogenated products, but hydroxo rebound has been shown to be thermodynamically more favorable (Timmins and de Visser, 2015). How the enzyme avoids the low-energy hydroxylation pathway in favor of the higher-energy halogenation remains a matter of discussion. To understand the effect of substrate binding on the chemoselectivity of substrate halogenation vs. hydroxylation in HctB, we carried out a detailed molecular mechanics (MM) and quantum mechanics/molecular mechanics (QM/MM) study. We located two substrate entrance channels and calculated the substrate halogenation and hydroxylation pathways with the substrate in these positions. The work shows that dramatic differences in halogenation vs. hydroxylation product ratios should be expected depending on substrate positioning. Moreover, our calculations predict that the protein induces an electric field that withdraws electron density from the halide toward the metal during the reaction, making the halogenation process favorable.

### RESULTS

### Model Selection and Reactant Benchmarking

So far, no computational modeling studies have been reported on HctB, and little is known about how different (if at all) it is from the well-studied SyrB2. The two nonheme iron halogenases utilize different substrates, although both are tethered to a carrier protein. Furthermore, SyrB2 catalyzes a monohalogenation of the substrate, whereas HctB performs a dihalogenation. A comparison of the amino acid sequences of HctB and SyrB2 shows major deviations, which are likely to result in differences in secondary structure. Their mechanisms may therefore be quite different, and we decided to carry out a computational study on the mechanism of halogenation vs. hydroxylation by HctB and to compare it with previous experimental and computational studies of analogous nonheme iron halogenases. In particular, we focused our computational study on the first halogenation step of the fatty acyl tethered substrate in an HctB model based on the structure reported by Pratter et al. (2014a). Our set-up of QM/MM models was reviewed thoroughly recently; therefore, we summarize the main issues only briefly (Hernández-Ortega et al., 2015; Quesne et al., 2016a; Hofer and de Visser, 2018). The model was converted from an iron(II)-water α-ketoglutarate-bound structure into an iron(IV)-oxo species with ligated succinate. Prior to the full set-up of the complete QM/MM chemical system, however, we investigated possible substrate binding positions of the tethered hexanoyl-PPT moiety; in particular, we searched for alternative substrate entrance channels into the active site.

**Figure 2** highlights the two substrate entrance channels we identified, which are narrow channels that should fit the linear terminal chain of the substrate-carrier protein. The acyl-carrier group of the substrate protein then latches onto the surface of the protein and inserts the tethered hexanoyl-PPT group into the active site. Substrate entrance channel II corresponds to the model of Pratter et al. (2014a) with substrate position **1**. In model **1**, with the substrate entering through channel II, the substrate is located parallel to the iron(IV)-oxo group in a position very similar to that of the methylated DNA strand in the AlkB repair enzyme (Quesne et al., 2014). The substrate approaches the iron(IV)-oxo group between the Val<sup>113</sup> and Glu<sup>223</sup> side chains, in the corner where the halide is also located. The entrance channel is located well below the iron(IV)-oxo group.

During analysis of the structure, we identified another entrance channel, attempted to latch the acyl carrier protein onto it, and manually inserted the tethered hexanoyl-PPT chain. In the analogous halogenase SyrB2, a similar substrate entrance channel is seen in the same position (Matthews et al., 2006), but because no substrate-bound crystal structure of HctB is available, we considered all possible substrate orientations and entrance channels. Entrance channel I is located above the iron(IV)-oxo group and leads the tethered hexanoyl group into a large open space (possibly filled with water molecules); hence, we created three starting orientations for the substrate, namely models **2**, **3**, and **4**. In model **2**, the terminus of the tethered hexanoyl group points down and hangs between the Val<sup>113</sup> and Arg<sup>245</sup> residues, whereas in model **3** it is found between Lys<sup>126</sup> and Pro<sup>127</sup>. The final substrate position through channel I (model **4**) is located at the side of the iron(IV)-oxo group and also brings the terminus of the tethered hexanoyl group into the active site near the Val<sup>113</sup> and Glu<sup>223</sup> residues, in close proximity to both the oxo and halide groups. These three substrate binding positions are distinctly different and are guided by hydrogen bonding interactions of the thioester moiety of the hexanoyl-PPT group with amino acid residues lining the channel walls. As such, we do not expect easy interconversion between the three substrate binding models.

Subsequently, we investigated the reaction mechanism from each of these substrate starting orientations and set up QM/MM models with the substrate bound in positions **1**, **2**, **3**, and **4**. The set-up of the QM/MM models follows previously reported and benchmarked methods (Quesne et al., 2014, 2016a; Hernández-Ortega et al., 2015; Hofer and de Visser, 2018). First, we added hydrogen atoms using known pK<sub>a</sub> values and visually inspected polar residues for correct protonation states. Thereafter, we applied an iterative solvation procedure, followed by equilibration and heating to a temperature of 298 K. During these set-up steps the protein backbone was fixed, but in the final molecular dynamics simulation all atoms were allowed to move. We initially ran MD simulations for up to 200 ns; however, the runs stabilized in all cases after about 5 ns. HctB thus appears to be a very rigid protein, with the substrate and active site in a tight binding orientation and showing little flexibility in the MD runs. Therefore, for all subsequent systems described here, only a 10 ns MD simulation was run (see **Supporting Information Figure S1**). From the MD simulations, we selected a low-energy snapshot after 5 ns as the starting structure for the QM/MM calculations.
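
Running only 10 ns and taking a low-energy snapshot after 5 ns depends on judging when the trajectory has stabilized. One possible way to automate that judgement is sketched below. The plateau criterion (every later RMSD value within a tolerance of the mean of the remaining values), the tolerance, the toy data, and the function names are all hypothetical and do not describe the procedure used in this work.

```python
from typing import Sequence


def equilibration_time(times: Sequence[float], rmsd: Sequence[float], tol: float = 0.5) -> float:
    """Earliest time after which every RMSD value stays within tol (Å) of the tail mean."""
    for i in range(len(rmsd)):
        tail = rmsd[i:]
        tail_mean = sum(tail) / len(tail)
        if all(abs(x - tail_mean) <= tol for x in tail):
            return times[i]
    raise ValueError("empty trajectory")


def lowest_energy_snapshot(times: Sequence[float], energies: Sequence[float], t_min: float) -> float:
    """Time of the lowest-energy frame at or after t_min."""
    candidates = [(e, t) for t, e in zip(times, energies) if t >= t_min]
    if not candidates:
        raise ValueError("no frames after t_min")
    return min(candidates)[1]


# Toy data: times in ns, RMSD in Å, energies in kcal/mol (hypothetical values).
times = [0, 1, 2, 3, 4, 5, 6]
rmsd = [0.0, 0.8, 1.5, 1.9, 2.0, 2.1, 2.0]
energies = [-10.0, -12.0, -15.0, -14.0, -16.0, -13.0, -15.0]
t_eq = equilibration_time(times, rmsd, tol=0.2)
print(t_eq, lowest_energy_snapshot(times, energies, t_eq))
```

Restricting the snapshot search to frames after the plateau avoids seeding the QM/MM calculation with a structure that has not yet relaxed.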

Finally, we divided the full chemical structure of protein, substrate and water layer into a QM and an MM region. Key residues that form covalent or hydrogen bonding interactions with the substrate and oxidant were included in the QM region. Initial exploratory calculations used our minimal QM region **A** (see **Figure 3**). It contains only the iron(IV)-oxo(chloro) group and the first coordination sphere of ligands to the metal: the imidazole groups of His<sup>111</sup> and His<sup>227</sup>, the acetate terminus of succinate (Succ) and the thiohexanoic acid arm of the substrate. This model contains 48 QM atoms and is overall charge neutral. To test the effect of the second coordination sphere, we also calculated the full mechanism with a larger QM region, QM region **AB**. In addition to QM region **A**, it includes the amino acid side chains of residues within 6 Å of the iron(IV)-oxo(chloro) unit. For model **1**, the large QM region **AB** contained the amino acid side chains of Ile<sup>108</sup>, Val<sup>113</sup>, Asp<sup>200</sup>, Asp<sup>202</sup>, Glu<sup>223</sup>, Val<sup>225</sup>, Met<sup>226</sup>, Arg<sup>245</sup>, and six water molecules, for a total of 160 atoms in the QM region.
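A purely geometric pre-screen for the residues of QM region **AB** could look like the sketch below. It assumes side-chain coordinates are available per residue and keeps any residue with a side-chain atom within 6 Å of the Fe, O or Cl atoms. The coordinates and the helper name `residues_within` are hypothetical. In practice the final QM region also requires chemical judgment, such as the choice of water molecules and link-atom placement at the QM/MM boundary.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]


def residues_within(center_atoms: Sequence[Vec],
                    side_chains: Dict[str, List[Vec]],
                    cutoff: float = 6.0) -> List[str]:
    """Names of residues with any side-chain atom within `cutoff` Å of any center atom."""
    selected = [name for name, atoms in side_chains.items()
                if any(math.dist(a, c) <= cutoff for a in atoms for c in center_atoms)]
    return sorted(selected)


# Hypothetical coordinates (Å): Fe at the origin, oxo along +z, chloride along +x
center = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.63), (2.30, 0.0, 0.0)]
side_chains = {
    "Val113": [(3.5, 2.0, 1.0)],
    "Glu223": [(0.0, 5.5, 2.0)],
    "Lys126": [(9.0, 0.0, 0.0), (10.0, 1.0, 0.0)],
    "Arg245": [(-4.0, -3.0, 2.5)],
}
print(residues_within(center, side_chains))  # ['Arg245', 'Glu223', 'Val113']
```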

Thereafter, QM/MM geometry optimizations of the iron(IV)-oxo(chloro) species (**Re**) were performed in the singlet, triplet and quintet spin states. In the structure labels, the spin multiplicity is given as a superscript before the label. The substrate binding position and the QM region (**A** or **AB**) are given as subscripts after the label. The full set of results is given in the **Supporting Information**, while the main text focuses on the low-energy pathways only.

We started the work with extensive validation and benchmarking of the methods. Unfortunately, no experimental rate constants or spectroscopic data are available for the iron(IV)-oxo species of HctB. Previously, we calibrated the free energies of activation of thioanisole sulfoxidation by a biomimetic nonheme iron(IV)-oxo complex against experimental data and tested 50 different computational methods and techniques (Cantú Reinhard et al., 2016a). The best agreement with experiment was found for the PBE0/BS2 and B3LYP/BS2 methods with a solvent model included. In particular, free energies of activation were reproduced within 4 kcal mol<sup>−1</sup> of experiment (Vardhaman et al., 2011, 2013; de Visser et al., 2014; Sainna et al., 2015; Barman et al., 2016a; Cantú Reinhard et al., 2016b). Furthermore, these methods reproduced experimental product distributions of bifurcation processes well (Ji et al., 2015; Kaczmarek et al., 2018). Finally, six hydrogen atom abstraction barriers from substrate were investigated for the nonheme iron dioxygenase prolyl-4-hydroxylase, and QM/MM predicted the correct regioselectivity. Therefore, the methods are expected to predict regio- and chemoselectivities well (Karamzadeh et al., 2010; Pratter et al., 2013; Timmins and de Visser, 2017; Timmins et al., 2017).

In agreement with experimental studies on the iron(IV)-oxo species of the halogenase SyrB2 (Galonić Fujimori et al., 2007), we find the quintet spin state to be the ground state for all chemical systems. The quintet spin state lies below the triplet spin state by a considerable margin of well over 13 kcal mol<sup>−1</sup>. Hence the triplet and singlet spin states of HctB are high in energy and will not play a major role during the reaction. Therefore, we focus here on the quintet spin state structures and mechanism only. The **Supporting Information** gives all results for the alternative spin states investigated. The reaction is thus expected to proceed through single-state reactivity on the dominant quintet spin state surface. This agrees with previous studies on pentacoordinated iron(IV)-oxo species in nonheme iron enzymes and model complexes (de Visser, 2006a,b, 2010; Hirao et al., 2006; Bernasconi and Baerends, 2008; Latifi et al., 2009; Ye and Neese, 2011; Ansari et al., 2013; Tang et al., 2013; Saouma and Mayer, 2014; Cantú Reinhard and de Visser, 2017a).

The quintet spin state of the iron(IV)-oxo(chloro) species in HctB has four unpaired electrons in the π*<sub>xy</sub>, π*<sub>xz</sub>, π*<sub>yz</sub>, and σ*<sub>x²−y²</sub> molecular orbitals, while the σ*<sub>z²</sub> orbital is virtual. The π*<sub>xz</sub> and π*<sub>yz</sub> orbitals represent the antibonding interactions of the metal 3d<sub>xz</sub>/3d<sub>yz</sub> atomic orbitals with a 2p<sub>x</sub>/2p<sub>y</sub> orbital on the oxo group. In nonheme iron(IV)-oxo species these two orbitals are orthogonal and nearly degenerate. They lie in the xz and yz planes, respectively, with the z-axis defined by the Fe–O bond. The π*<sub>xy</sub> and σ*<sub>x²−y²</sub> orbitals both lie in the plane perpendicular to the Fe–O axis (xy-plane). They represent antibonding interactions with the equatorial ligands: the nitrogen of His<sup>111</sup>, the carboxylate of succinate and the halide. The final metal-type 3d molecular orbital, σ*<sub>z²</sub>, describes the σ-type antibonding interaction along the Fe–O bond and is virtual.

During the equilibration and MD simulation, and during the subsequent QM/MM geometry optimization, the hexanoyl substrate chains moved slightly. This was particularly the case for model **3**, where the terminus moved away from the iron(IV)-oxo(chloro) group. As a result, the distance between substrate and iron(IV)-oxo(chloro) is quite large (>5 Å) in model **3**. **Figure 4** shows the eight optimized geometries of the iron(IV)-oxo(chloro) reactant species, calculated with QM/MM for models **1**, **2**, **3**, and **4** with either QM region **A** or **AB**. Regardless of the substrate binding position, all optimized structures have very similar geometries. Fe–O distances range from 1.621 to 1.641 Å and Fe–Cl distances from 2.278 to 2.326 Å. These values match previous calculations on nonheme iron(IV)-oxo(chloro) and nonheme iron(IV)-oxo complexes excellently (de Visser, 2006a,b, 2010; Hirao et al., 2006, 2011; Matthews et al., 2006; Noack and Siegbahn, 2007; Bernasconi and Baerends, 2008, 2013; de Visser and Latifi, 2009; Kulik et al., 2009; Latifi et al., 2009; Pandian et al., 2009; Dey, 2010; Ye and Neese, 2011; Quesne and de Visser, 2012; Ansari et al., 2013; Liu et al., 2013; Tang et al., 2013; Usharani et al., 2013; Kumar et al., 2014; Saouma and Mayer, 2014; Senn, 2014; Huang et al., 2016; Zhao et al., 2016; Cantú Reinhard and de Visser, 2017a; Timmins et al., 2018).

The QM/MM results in **Figure 4** show that the second coordination sphere and the substrate positioning have little or no effect on the Fe–O and Fe–Cl distances in the iron(IV)-oxo(chloro) species. They may, however, affect the kinetics, as shown below. In addition, the eight optimized structures show no major electronic differences, and all converge to a quintet spin ground state. Specifically, the group spin densities of all structures are similar, at around ρ<sub>Fe</sub> = 3.12 and ρ<sub>O</sub> = 0.63. The orbital analysis and group spin densities indicate that most of the spin density is located along the Fe–O bond. They point to an orbital occupation of π*<sub>xy</sub><sup>1</sup> π*<sub>xz</sub><sup>1</sup> π*<sub>yz</sub><sup>1</sup> σ*<sub>x²−y²</sub><sup>1</sup> for all optimized geometries. Our calculations agree with experimental EPR and Mössbauer spectroscopy studies on the analogous species in SyrB2, which also reported a high-spin ground state (Galonić Fujimori et al., 2007).
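Group spin densities such as ρ<sub>Fe</sub> and ρ<sub>O</sub> are sums of atomic spin populations over the atoms of a group. The sketch below assumes Mulliken-type atomic spin populations. Only the Fe (3.12) and O (0.63) values come from the text. The remaining atoms and numbers are hypothetical, chosen so that the total equals the four unpaired electrons of the quintet state.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_spin_densities(atom_spins: List[Tuple[str, float]],
                         groups: Dict[str, str]) -> Dict[str, float]:
    """Sum atomic spin populations into named groups; unlisted atoms go to 'rest'."""
    totals: Dict[str, float] = defaultdict(float)
    for atom, spin in atom_spins:
        totals[groups.get(atom, "rest")] += spin
    return {name: round(value, 2) for name, value in totals.items()}


# Hypothetical atomic spin populations (only Fe and O match the text)
atom_spins = [("Fe", 3.12), ("O", 0.63), ("Cl", 0.15),
              ("N_His111", 0.03), ("N_His227", 0.02), ("O_Succ", 0.05)]
groups = {"Fe": "Fe", "O": "O", "Cl": "Cl"}

rho = group_spin_densities(atom_spins, groups)
print(rho)
print(round(sum(rho.values()), 2))  # 4.0: four unpaired electrons in the quintet
```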

The only noticeable difference among the four substrate binding positions is the orientation of the active-site Arg<sup>245</sup> residue. In models **1** and **2**, it is locked in hydrogen bonding interactions with the dangling terminal carboxylate group of succinate via a bridging water molecule. By contrast, in model **3** Arg<sup>245</sup> forms a direct salt bridge with the carboxylate of the succinate moiety and is also close to the oxo group. Finally, in structure **4**, Arg<sup>245</sup> forms a salt bridge with Glu<sup>223</sup>, whereby it appears to close entrance channel II. It may very well be, therefore, that the Arg<sup>245</sup> residue is involved in α-ketoglutarate binding and/or succinate release from the active site. Indeed, Arg<sup>245</sup> is conserved in most reported nonheme iron halogenases and is therefore expected to play a key role in catalysis and/or substrate positioning. Glu<sup>223</sup>, on the other hand, is not conserved in the majority of reported nonheme iron halogenases and is found only in HctB. It would be interesting to see how mutation of either Glu<sup>223</sup> or Arg<sup>245</sup> affects the enzyme function, chemical catalysis and product distributions, but that will have to await a future experimental study.

### Chemoselectivity Patterns

Next, the chemoselectivity of substrate halogenation vs. hydroxylation was investigated for the pathways described and defined in **Scheme 1**. The reaction starts from the reactant complex of iron(IV)-oxo(chloro) with substrate (**Re**) and proceeds via a stepwise mechanism. An initial hydrogen atom abstraction transition state (**TS**<sub>HA</sub>) leads to an iron(III)-hydroxo complex and a substrate radical (**I**<sub>HA</sub>). In the next step the pathways diverge, and either OH rebound (via transition state **TS**<sub>OH</sub>) or Cl rebound (via transition state **TS**<sub>Cl</sub>) occurs. These give the hydroxylated (**P**<sub>OH</sub>) and halogenated (**P**<sub>Cl</sub>) products, respectively. Technically, apart from being chemoselective, the reaction is also enantioselective, as only one of the two possible enantiomers is expected. Because the substrate is tightly bound and positioned, the hydrogen atom abstraction will be selective for only one of the two hydrogen atoms at the ω-1 position of the substrate. Hence the hydrogen atom abstraction will determine the enantioselectivity, as seen before for related enzymes (Karamzadeh et al., 2010; Pratter et al., 2013; Timmins and de Visser, 2017; Timmins et al., 2017). In HctB, moreover, the first halogenation of the substrate is followed by a second halogen transfer. Another molecule of O<sub>2</sub> and Cl<sup>−</sup> binds to repeat the cycle and form the dihalogenated product.

We tested the mechanisms for the first halogenation step in HctB on all low-lying spin states (singlet, triplet, quintet), for each of the substrate binding positions (**1**, **2**, **3**, **4**) and with different QM regions (**A** or **AB**). Previously, we found the septet spin state of nonheme iron(IV)-oxo species to be at least 7 kcal mol<sup>−1</sup> higher in energy than the quintet spin state and, therefore, we did not investigate this state further (Latifi et al., 2009). In addition, we tested the robustness of the models and methods by varying the computational set-up (basis set, DFT method and MD snapshot).

First, changing the basis set from SV(P) to TV(P) on all atoms had little effect on the optimized geometries and lowered the hydrogen atom abstraction barrier by only a small amount. In particular, we optimized <sup>5</sup>**Re**<sub>1</sub> and <sup>5</sup>**TS**<sub>HA,1</sub> for model **1** with the large QM region **AB** at UB3LYP/BS1 and UB3LYP/BS2. The resulting hydrogen atom abstraction barriers are 23.5 kcal mol<sup>−1</sup> with UB3LYP/BS2 and 21.5 kcal mol<sup>−1</sup> with UB3LYP/BS2//UB3LYP/BS1. In addition, the geometries optimized with UB3LYP/BS1 and UB3LYP/BS2 are very similar (vide infra). Therefore, we continued with geometry optimizations at the UB3LYP/BS1 level of theory for the remainder of the project.
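For clarity, a barrier such as those above is the difference in total QM/MM energy between transition state and reactant, converted from hartree to kcal mol<sup>−1</sup>. A label of the form X//Y denotes a single-point energy at level X on a geometry optimized at level Y. The total energies in this sketch are invented, chosen only so that the result reproduces the 21.5 kcal mol<sup>−1</sup> UB3LYP/BS2//UB3LYP/BS1 value.

```python
HARTREE_TO_KCAL = 627.5095  # 1 hartree in kcal/mol


def barrier_kcal(e_reactant: float, e_ts: float) -> float:
    """Energy of the transition state relative to the reactant, in kcal/mol."""
    return (e_ts - e_reactant) * HARTREE_TO_KCAL


# Hypothetical QM/MM total energies (hartree) for 5Re_1 and 5TS_HA,1,
# as single points at UB3LYP/BS2 on UB3LYP/BS1 geometries ("BS2//BS1")
e_re = -5234.512345
e_ts = -5234.478083
print(f"{barrier_kcal(e_re, e_ts):.1f} kcal/mol")  # 21.5 kcal/mol
```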

Second, choosing alternative snapshots from the MD simulation also had little effect on the optimized geometries of the reactant complexes. It also did not change the spin state ordering or relative energies dramatically. This is probably because nonheme iron dioxygenases often have very rigid structures in which the substrate is tightly bound (Aluri and de Visser, 2007; Kumar et al., 2011; Tchesnokov et al., 2016; Faponle et al., 2017). Previous QM/MM studies on analogous nonheme iron dioxygenases indeed reproduced experimental chemoselectivities excellently using the same methods and procedures described here (Aluri and de Visser, 2007; Karamzadeh et al., 2010; Kumar et al., 2011; Pratter et al., 2013; Quesne et al., 2014, 2016a; Hernández-Ortega et al., 2015; Tchesnokov et al., 2016; Faponle et al., 2017; Timmins and de Visser, 2017; Timmins et al., 2017; Hofer and de Visser, 2018). Moreover, changing the QM region, basis set or density functional did not change the ordering of the transition states.

Let us first focus on the hydrogen atom abstraction step, which is rate-determining in the overall reaction mechanism. The optimized geometries for all four substrate binding positions are shown in **Figure 5**. Consider hydrogen atom abstraction by a quintet spin iron(IV)-oxo species with configuration π*<sub>xy</sub><sup>1</sup> π*<sub>xz</sub><sup>1</sup> π*<sub>yz</sub><sup>1</sup> σ*<sub>x²−y²</sub><sup>1</sup>. It leads to either a sextet spin or a quartet spin iron(III)-hydroxo species, in each case with a nearby substrate radical. The electron abstracted with the hydrogen atom can pair up with the π*<sub>xy</sub> electron, the lowest in energy, to form a quartet spin iron(III)-hydroxo species. Alternatively, it can enter the virtual σ*<sub>z²</sub> orbital to give a sextet spin iron(III)-hydroxo species. Previous work showed that, in hydrogen atom abstraction by a quintet spin pentacoordinated nonheme iron(IV)-oxo species, the transferred electron preferentially fills the σ*<sub>z²</sub> orbital with one electron (de Visser, 2006b; Bernasconi and Baerends, 2008; Ye and Neese, 2011; Ansari et al., 2013; Tang et al., 2013; Cantú Reinhard and de Visser, 2017a). This is designated the <sup>5</sup>σ-pathway. Alternatively, the electron can move into one of the lower-lying, singly occupied π* orbitals: the so-called <sup>5</sup>π-pathway.
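The spin bookkeeping of the two pathways can be checked with simple electron counting. This toy sketch assumes S = |M<sub>S</sub>|, i.e., all unpaired spins are maximally coupled for the given M<sub>S</sub>. The multiplicity is then |N<sub>α</sub> − N<sub>β</sub>| + 1. The orbital labels and the substrate label `C5` are illustrative names, not output of a quantum chemistry code.

```python
from typing import Dict, Tuple

Occupation = Dict[str, Tuple[int, int]]  # orbital label -> (alpha, beta) electrons


def multiplicity(*fragments: Occupation) -> int:
    """Spin multiplicity 2S + 1, assuming S = |Ms| = |N_alpha - N_beta| / 2."""
    n_alpha = sum(a for frag in fragments for a, _ in frag.values())
    n_beta = sum(b for frag in fragments for _, b in frag.values())
    return abs(n_alpha - n_beta) + 1


# Quintet iron(IV)-oxo: pi*xy^1 pi*xz^1 pi*yz^1 sigma*x2-y2^1, sigma*z2 empty
fe_iv = {"pi*xy": (1, 0), "pi*xz": (1, 0), "pi*yz": (1, 0),
         "sigma*x2-y2": (1, 0), "sigma*z2": (0, 0)}

# 5sigma-pathway: an alpha electron enters sigma*z2, a beta radical stays on the substrate
fe_iii_sigma = {**fe_iv, "sigma*z2": (1, 0)}
substrate_beta = {"C5": (0, 1)}

# 5pi-pathway: a beta electron pairs with the pi*xy electron, an alpha radical stays on the substrate
fe_iii_pi = {**fe_iv, "pi*xy": (1, 1)}
substrate_alpha = {"C5": (1, 0)}

print(multiplicity(fe_iv))                         # 5: quintet reactant
print(multiplicity(fe_iii_sigma))                  # 6: local sextet iron(III)
print(multiplicity(fe_iii_sigma, substrate_beta))  # 5: overall quintet conserved
print(multiplicity(fe_iii_pi))                     # 4: local quartet iron(III)
print(multiplicity(fe_iii_pi, substrate_alpha))    # 5: overall quintet conserved
```

The down-spin (β) substrate radical in the <sup>5</sup>σ case is what shows up as negative spin density on the substrate in the transition states.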

A spin density and molecular orbital analysis of all hydrogen atom abstraction transition states revealed that all are of <sup>5</sup>σ-type. They lead to a radical intermediate with five unpaired electrons in metal-type orbitals (π*<sub>xy</sub><sup>1</sup> π*<sub>xz</sub><sup>1</sup> π*<sub>yz</sub><sup>1</sup> σ*<sub>z²</sub><sup>1</sup> σ*<sub>x²−y²</sub><sup>1</sup>) and a down-spin electron on the substrate. Accordingly, all transition states show negative spin density on the substrate (from −0.27 to −0.47), while the spin density on iron increases to 4.01–4.14. These values agree with hydrogen atom abstraction reactions calculated before for nonheme iron dioxygenases and synthetic model complexes (Karamzadeh et al., 2010; Hirao et al., 2011; Pratter et al., 2013; Kumar et al., 2014; Zhao et al., 2016; Timmins and de Visser, 2017; Timmins et al., 2017). We attempted to locate transition states for the <sup>5</sup>π-pathway for several models. However, they were considerably higher in energy, and we did not manage to optimize those structures with QM/MM. When the σ*<sub>z²</sub> orbital is singly filled, as in the <sup>5</sup>σ-pathway, the hydrogen atom abstraction usually takes place along the molecular z-axis. A linear Fe–O–C angle is then seen in the transition state. In the protein, however, constraints on substrate position and orientation and the location of amino acid residues in that region prevent the substrates from approaching the metal-oxo group at this "ideal" angle. In fact, only in model **3** can the substrate approach the iron(IV)-oxo(chloro) from the top, giving an almost linear Fe–O–C<sup>5</sup> angle; the angle is considerably smaller for the other structures. As a result, the barrier is lowest for model **3**. Nevertheless, even with the transition states in a bent orientation, the <sup>5</sup>σ-pathway is still favored and no <sup>5</sup>π-pathway is found. Thus, even though the transition states are not in the geometrically ideal position, the lowest-energy pathway is the <sup>5</sup>σ-pathway. In a previous study on a biomimetic model complex, we investigated hydrogen atom abstraction from dihydroanthracene by a pentacoordinated iron(IV)-oxo species. There, ligands blocked the z-axis, so the substrate had to approach from the side. For that system, analogously to what is seen here, the <sup>5</sup>σ-pathway was still the lowest in energy (Latifi et al., 2013). Therefore, the geometric constraints of the HctB active site cannot overcome the orbital energy difference between the <sup>5</sup>σ- and <sup>5</sup>π-pathways.
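Whether a transition state is close to the linear, σ-type approach can be quantified from Cartesian coordinates by the Fe–O–C angle. A minimal sketch, assuming coordinates in Å; the positions used here are made-up test points, not the geometries of **Figure 5**.

```python
import math
from typing import Sequence


def angle_deg(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Angle a-b-c in degrees with b as the vertex, e.g. Fe-O-C with O at the vertex."""
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_t))


fe, o = (0.0, 0.0, 0.0), (0.0, 0.0, 1.63)  # hypothetical Fe and oxo positions (Å)
c_top = (0.0, 0.0, 4.20)    # substrate carbon on the Fe-O axis (top approach)
c_side = (2.00, 0.0, 3.00)  # substrate carbon displaced to the side
print(round(angle_deg(fe, o, c_top), 1))   # 180.0
print(round(angle_deg(fe, o, c_side), 1))  # roughly 124
```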

As can be seen from **Figure 5**, the approach of the substrate to the iron(IV)-oxo center has a profound effect on the barrier height for hydrogen atom abstraction. In particular, pathway **3** appears to be catalytically most effective, with a hydrogen atom abstraction barrier of only 4.9 kcal mol<sup>−1</sup> (UB3LYP/BS2//UB3LYP/BS1) for QM region **AB**. This is due to its geometric orientation: the substrate approaches the catalytic center from the top, and the transferring hydrogen atom is aligned with the Fe–O axis. For substrates positioned as in models **1**, **2**, and **4**, however, the Fe–O–H angle is strongly bent and the hydrogen atom abstraction barriers are raised dramatically. These barriers should nevertheless still be accessible at room temperature.
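To put these barriers in perspective, transition state theory converts a barrier into a rate constant via the Eyring equation, k = (k<sub>B</sub>T/h) exp(−ΔG<sup>‡</sup>/RT). The sketch below applies it to the QM/MM energy barriers of model **3** (4.9 kcal mol<sup>−1</sup>) and model **1** (21.5 kcal mol<sup>−1</sup>, i.e., 4.9 + 16.6). It treats them as if they were free energies, with a transmission coefficient of 1. Zero-point, thermal and entropic corrections are not included, so the rates are order-of-magnitude estimates only.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal: float, temperature: float = 298.15) -> float:
    """Rate constant (s^-1) from k = (k_B T / h) exp(-barrier / RT), transmission coefficient 1."""
    return K_B * temperature / H * math.exp(-barrier_kcal / (R_KCAL * temperature))


for label, barrier in (("model 3", 4.9), ("model 1", 21.5)):
    k = eyring_rate(barrier)
    print(f"{label}: k = {k:.1e} s^-1, half-life = {math.log(2) / k:.1e} s")
```

Under these assumptions, model **3** reacts on a nanosecond time scale. A 21.5 kcal mol<sup>−1</sup> barrier corresponds to a half-life of roughly ten minutes at 298 K.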

All transition state geometries display short C–H and long O–H bonds, as well as relatively long O–C distances. The enzyme may very well have evolved this on purpose, to guide the chemoselectivity of the reaction and prevent unwanted (more favorable) side reactions. By contrast, other nonheme iron dioxygenases, such as taurine/αKG dioxygenase, have the substrate located in a pocket near the iron(IV)-oxo species. For these, computational studies including QM/MM gave a substantially lower hydrogen atom abstraction barrier than found for HctB (Borowski et al., 2004; de Visser, 2006c; Nemukhin et al., 2006; Cicero et al., 2007; Sinnecker et al., 2007; Godfrey et al., 2008; Chen et al., 2011; Bushnell et al., 2012; Mai and Kim, 2016; Wójcik et al., 2016; Álvarez-Barcia and Kästner, 2017). All hydrogen atom abstraction transition states display a large imaginary frequency for the C–H–O stretch vibration, but its magnitude differs distinctly between them. In particular, the transition states <sup>5</sup>**TS**<sub>HA,2</sub> and <sup>5</sup>**TS**<sub>HA,3</sub> have imaginary frequencies well over i1,000 cm<sup>−1</sup>, so a considerable amount of tunneling can be expected. Indeed, previous computational studies on hydrogen atom abstraction by iron(IV)-oxo complexes predicted significant tunneling. They also predicted a large change in barrier upon replacing the transferring hydrogen atom with deuterium (Kumar et al., 2003, 2004; de Visser, 2006d; Quesne et al., 2016b; Cantú Reinhard et al., 2017b). In particular, the kinetic isotope effect was shown to increase linearly with the magnitude of the imaginary frequency in the transition state.
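A simple way to gauge how tunneling scales with the imaginary frequency is the Wigner correction, κ = 1 + (1/24)(hcν̃/k<sub>B</sub>T)<sup>2</sup>. This is a swapped-in, lowest-order estimate rather than the method of the cited studies. It is known to underestimate tunneling for frequencies as large as those found here. The deuterium frequency is crudely approximated as ν̃<sub>H</sub>/√2, which is an assumption.

```python
import math

H = 6.62607015e-34     # Planck constant, J s
C_CM = 2.99792458e10   # speed of light, cm/s
K_B = 1.380649e-23     # Boltzmann constant, J/K


def wigner_kappa(imag_freq_cm: float, temperature: float = 298.15) -> float:
    """Wigner tunneling factor 1 + (1/24) (h c |nu| / k_B T)^2 for a wavenumber in cm^-1."""
    u = H * C_CM * abs(imag_freq_cm) / (K_B * temperature)
    return 1.0 + u * u / 24.0


kappa_h = wigner_kappa(1000.0)                   # i1000 cm^-1, lower bound quoted for TS_HA,2/3
kappa_d = wigner_kappa(1000.0 / math.sqrt(2.0))  # crude deuterium estimate (assumption)
print(f"kappa_H = {kappa_h:.2f}, kappa_D = {kappa_d:.2f}, ratio = {kappa_h / kappa_d:.2f}")
```

In this approximation the factor grows quadratically, not linearly, with the imaginary frequency. It should therefore not be read as a model of the linear KIE trend reported in the cited work.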

**Table 1** summarizes all hydrogen atom abstraction barriers for models **1**, **2**, and **4**. In general, for models **2** and **4** the UB3LYP/BS1 and UB3LYP/BS2//UB3LYP/BS1 energies are very close. Changing from the small to the large QM region lowers the transition state energy by about 4 kcal mol<sup>−1</sup>. More dramatic changes are seen for model **1**: with the small QM region, the transition state energy varies considerably with the calculation method and basis set. Moreover, the energies for the small QM region differ substantially from those for the large QM region, which fluctuate little when the basis set is changed. Therefore, the large QM region is the most suitable chemical system for our calculations, and we used it for the rest of the work.

We also performed a full QM/MM geometry optimization at the UB3LYP/BS2:Charmm level for the same reaction mechanism, using both QM regions **A** and **AB** for model **1**. The UB3LYP/BS2 QM/MM optimized geometry of <sup>5</sup>**TS**<sub>HA,1</sub> lies 23.5 kcal mol<sup>−1</sup> above the reactant complex. This is within 2 kcal mol<sup>−1</sup> of the UB3LYP/BS1 and UB3LYP/BS2//UB3LYP/BS1 results. Moreover, the basis set has only a small effect on the optimized structures, which are geometrically very close (**Supporting Information**). Thus, for model **1** the C–H and H–O distances change from 1.22/1.45 Å with UB3LYP/BS1:Charmm to 1.22/1.43 Å with UB3LYP/BS2:Charmm.

The hydrogen atom abstraction barrier varies considerably with the position of the substrate (**Figure 5**). The lowest barrier, 4.9 kcal mol<sup>−1</sup>, is found for model **3** (UB3LYP/BS2//UB3LYP/BS1:Charmm with QM region **AB**). The barriers for substrates positioned as in models **1**, **2**, and **4** are higher by 16.6, 11.5, and 15.2 kcal mol<sup>−1</sup>, respectively. Although the latter three are higher in energy than model **3**, their barrier heights are still within the range accessible for hydrogen atom abstraction at room temperature. Interestingly, models **1**, **2**, and **4** give hydrogen atom abstraction barriers within about 5 kcal mol<sup>−1</sup> of each other. Among these three, therefore, the substrate entrance and positioning appear to have a relatively small effect on the ability of the iron(IV)-oxo(chloro) species to react with substrate. The substrate is connected to an acyl carrier protein latched onto the surface of the protein. The hydrogen atom abstraction barrier therefore depends on how far the substrate terminus can be inserted into the protein. We show here that in all four orientations (models **1**, **2**, **3**, and **4**) the substrate entrance leads to a viable hydrogen atom abstraction channel. However, the ease of hydrogen atom abstraction is strongly influenced by the substrate positioning. A low hydrogen atom abstraction barrier does not necessarily correlate with the correct chemoselectivity of the reaction, which is determined in the subsequent reaction step. Therefore, even though the hydrogen atom abstraction is rate-determining, it does not decide the product distributions.

The hydrogen atom abstraction leads to an iron(III)-hydroxo(chloro) radical intermediate (**I**<sub>HA</sub>). This intermediate proceeds through either OH or Cl rebound to the substrate radical, forming the products **P**<sub>OH</sub> and **P**<sub>Cl</sub>, respectively. For all four substrate positions (**1**, **2**, **3**, and **4**), we then investigated the mechanisms for both Cl rebound and OH rebound. As can be seen from **Figure 6**, the rebound barriers, and with them the chemoselectivity of the reaction, change dramatically with the position of the substrate. The ordering and relative energies do not depend on the choice of basis set or the size of the QM region: all results point in the same direction and lead to the same conclusions (**Supporting Information**). With the substrate approaching from above the iron(IV)-oxo species, as in models **1** and **2**, the chemoselectivity favors halogenation over hydroxylation by more than 10 kcal mol<sup>−1</sup>. In particular, the halide rebound step has a negligible barrier in both cases; therefore, halide transfer will be fast.

A complete reversal of the chemoselectivity is seen for model **3**, where the OH rebound is barrierless. By contrast, almost identical OH and Cl rebound barriers are found for model **4**. Clearly, the substrate position and orientation have a major effect on the bifurcation pathways and the chemoselectivity of the reaction. Moreover, the calculations presented in **Figure 6** imply that HctB has two viable substrate entrance channels (I and II in **Figure 2**). Models with the substrate in either of these entrance channels give low-energy hydrogen atom abstraction barriers and preferential halogenation over hydroxylation. However, substrate entrance through channel I in HctB can place the substrate in various positions. Of these, position **4** will probably give a mixture of products, and position **3** gives substrate hydroxylation. As such, substrate positioning and orientation are vital for efficient halogenation in HctB. We predict that the enzyme latches the acyl carrier protein onto substrate entrance channel II and positions the substrate in the orientation shown in model **1**. With the substrate in this orientation, the hydrogen atom abstraction barrier is thermochemically accessible at room temperature. The chemoselectivity favors substrate halogenation, as expected for a halogenase. Most likely, therefore, the other channel identified in our structure analysis (substrate entrance channel I) is involved in guiding halide, dioxygen or α-ketoglutarate into the active site and in releasing succinate and CO<sub>2</sub>.
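Because both rebound channels start from the same radical intermediate, the product ratio follows from the difference between the two rebound barriers. A minimal sketch, assuming equal pre-exponential factors and treating energies as free energies. The 10 kcal mol<sup>−1</sup> gap is the lower bound quoted above for models **1** and **2**, and the equal-barrier case mimics model **4**. The reversed 10 kcal mol<sup>−1</sup> gap for model **3** is only illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def fraction_chlorinated(e_ts_cl: float, e_ts_oh: float, temperature: float = 298.15) -> float:
    """Fraction of P_Cl from two competing rebound barriers out of the same intermediate:
    k_Cl / k_OH = exp(-(E_Cl - E_OH) / RT), assuming equal pre-exponential factors."""
    return 1.0 / (1.0 + math.exp((e_ts_cl - e_ts_oh) / (R_KCAL * temperature)))


print(fraction_chlorinated(0.0, 10.0))  # Cl rebound favored by 10 kcal/mol (models 1 and 2)
print(fraction_chlorinated(0.0, 0.0))   # 0.5: equal barriers, as for model 4
print(fraction_chlorinated(10.0, 0.0))  # hypothetical 10 kcal/mol gap favoring OH (model 3 trend)
```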

The hydrogen atom abstraction barriers reported in **Figure 5** already hint at the potential chemoselectivity of the reaction. In structures <sup>5</sup>**TS**<sub>HA,1</sub> and <sup>5</sup>**TS**<sub>HA,2</sub>, the Fe–Cl bond has elongated from about 2.28 Å in the reactant complexes to 2.42 and 2.33 Å, respectively. In both cases, the substrate radical is positioned in the quadrant between the hydroxo and chloride ligands. In <sup>5</sup>**TS**<sub>HA,3</sub>, on the other hand, the radical is located above the hydroxo group and far away from the chloride, making chloride rebound unlikely. Model **2** shows a large chemoselectivity preference for halogenation over hydroxylation. To understand it, **Figure 7** shows the optimized geometries of the radical intermediates <sup>5</sup>**I**<sub>HA</sub> as obtained with QM regions **A** and **AB**. The structures are very similar regardless of the size of the QM region chosen. The Fe–O bond has elongated from 1.62 Å in <sup>5</sup>**Re**<sub>2</sub> to 1.90 (1.88) Å in <sup>5</sup>**I**<sub>HA</sub> using QM region **A** (**AB**). This is due to single occupation of the σ*<sub>z²</sub> orbital in the radical intermediate; this orbital is antibonding along the Fe–O bond and hence elongates it. Although the radical has moved slightly away from the metal center, it is nearer to the halide than to the hydroxo group by almost 1 Å (**Figure 7**). Moreover, the hydrogen atom of the hydroxo group points toward the radical. For hydroxo rebound, it needs to move out of the way to make C–O bond formation possible. In the gas phase and in apolar environments this O–H rotation costs little energy, and hydroxo rebound tends to be low in energy. This has been seen in many aliphatic hydroxylation studies reported previously (Ogliaro et al., 2000; Dey, 2010; Hirao et al., 2011; Bernasconi and Baerends, 2013; Liu et al., 2013; Usharani et al., 2013; Kumar et al., 2014; Zhao et al., 2016). By contrast, in HctB models **1** and **2** the OH rebound barrier is high in energy. The largely polar active site contains many water molecules that prevent rotation of the OH group and keep it in a fixed orientation.

Recent QM/MM studies on the nonheme iron halogenase SyrB2 proposed a strong hydrogen bonding interaction between the hydroxo group of the radical intermediate and an active-site Arg residue (Huang et al., 2016). In our optimized geometries of the radical intermediates (<sup>5</sup>**I**<sub>HA</sub>), no such direct interaction was found for any substrate binding position. In several cases, however, a bridging hydrogen-bonded water molecule was located in between, see **Figure 7**. The structure displayed in **Figure 7** shows that the water channel entering the active-site pocket acts to prevent OH rebound and to trigger an alternative reaction pathway, namely halogenation. A recent QM/MM study on a cytochrome P450 peroxygenase likewise found an optimized iron(IV)-hydroxo(heme) geometry in which the hydroxo group formed several hydrogen bonding interactions with crystal water molecules in the pocket (Faponle et al., 2016). Studies on the bifurcation pathways for substrate decarboxylation vs. hydroxylation also gave a preference for the energetically unfavorable decarboxylation mechanism, similar to what is seen here.

The iron(II)-hydroxo complex with halogenated substrate, **P**<sub>Cl</sub>, is of course not the final step in the catalytic cycle. A proton transfer could generate iron(II)-water. However, our structure analysis did not identify a proton-transfer channel, so this pathway may not be feasible. Instead, we predict that after the formation of **P**<sub>Cl</sub>, another chloride ion enters the substrate binding pocket and binds to the iron(II) center. A subsequent hydrogen atom abstraction by the iron(II)-hydroxo(chloro) species and rebound of the halide then give the iron(II)-water resting state and a dihalogenated substrate as the product.

### DISCUSSION

To gain further insight into the factors that determine the bifurcation patterns of hydroxylation vs. halogenation, we analyzed the chemoselectivity-determining transition states. We also investigated the role of the protein environment in their ordering and relative energies. In particular, we compare our QM/MM results with small-model DFT calculations from previous work. This comparison should indicate how the constraints the protein imposes on the active-site structure, and the long-range electrostatic interactions through the protein, affect the chemoselectivity of the reaction. Small gas-phase model complexes of the nonheme iron halogenase reaction mechanism failed to show preferential halogenation over hydroxylation (de Visser and Latifi, 2009; Kulik et al., 2009; Pandian et al., 2009; Huang et al., 2016). Only the inclusion of the full enzymatic structure in a QM/MM approach led to the correct chemoselectivity for SyrB2 (Borowski et al., 2010).

Therefore, we took the DFT optimized geometries from de Visser and Latifi (2009), calculated at UB3LYP/BS1 with a polarizable continuum model (dielectric constant ε = 5.7). We then perturbed these small model complexes with external charges, electric fields and dipole moments to investigate the effects on the product distributions. As will become clear in the discussion, these external perturbations have a profound effect on the chemoselectivity of the reactions. We started with a detailed comparison of the QM/MM optimized geometries with the gas-phase DFT models from de Visser and Latifi (2009), see **Figure 8**. The gas-phase DFT model, of course, ignores the second coordination sphere and is highly flexible because it lacks structural constraints. Indeed, the optimized geometries show that in the DFT model the substrate radical approaches the iron(III)-hydroxo(halide) complex <sup>5</sup>**I**<sub>HA</sub> at an ideal angle, with little or no steric repulsion from active-site residues.

**Figure 8** shows overlays of the <sup>5</sup>**TS**<sub>Cl</sub>/<sup>5</sup>**TS**<sub>OH</sub> structures of models **1** and **3** with those obtained with the DFT model of de Visser and Latifi (2009). Overlays of models **2** and **4** with these structures are given in the **Supporting Information** and show a similar pattern. The optimized <sup>5</sup>**TS**<sub>Cl,1</sub> structure overlaps the gas-phase DFT geometry almost perfectly, with the substrate in the same location and at almost the same angle. For <sup>5</sup>**TS**<sub>OH,1</sub>, however, the substrate location is far from the gas-phase orientation. The substrate binding pocket therefore appears to be designed to accommodate the substrate for efficient halogen transfer rather than OH rebound, as would be expected for a halogenase enzyme. Despite the good geometric agreement between <sup>5</sup>**TS**<sub>Cl,1</sub> and the analogous DFT model, the barrier heights relative to their precursor complexes are quite different. The Cl transfer barrier is almost negligible for model **1** (**Figure 6**), whereas the DFT model gave a much larger barrier of about 10 kcal mol<sup>−1</sup> (de Visser and Latifi, 2009). Clearly, even though the <sup>5</sup>**TS**<sub>Cl,1</sub> structures are almost the same, the QM/MM barrier is much lower in energy. Accordingly, the protein must stabilize the transition state for halogen transfer dramatically through long-range electrostatic interactions.

We then compared the <sup>5</sup>**TS**Cl,**3**/<sup>5</sup>**TS**OH,**3** structures with those obtained with the DFT model (**Figure 8B**). The OH rebound structure shows a reasonable match between the two models, and hence rebound should stay low in energy. By contrast, the overlay of the halogen transfer transition states of model **3** and the DFT model is much poorer than for model **1**, which explains why the chemoselectivity is reversed. The comparison of the gas-phase and QM/MM optimized structures therefore shows that the substrate should bind in a favorable position for halide rebound. In addition, as shown in **Figure 7**, water molecules block a low-energy OH-rotation pathway and lock the OH group in a non-rebound position. Consequently, substrate positioning combined with a strong hydrogen bonding network around the hydroxo group destabilizes OH rebound and drives the reaction toward halide transfer. A similar conclusion was reached by Mitchell et al. (2017), who reported experimental protein engineering studies on the nonheme iron halogenase WelO5.

Subsequently, we explored the contribution of the protein environment to the chemoselectivity patterns. We first investigated perturbations affecting the energetics of the <sup>5</sup>**TS**Cl and <sup>5</sup>**TS**OH barriers for the small model complex, focusing in particular on electric field effects, using previously described procedures (Shaik et al., 2004). We took the gas-phase DFT model and calculated its electronic energy under an applied electric field of magnitude 0.0050, 0.0100, and 0.0150 au along each coordinate axis, in both the positive and negative directions. The x-axis is along the Fe–Cl bond, the y-axis along the Fe–OH bond, and the z-axis along the Fe–N(His) bond. **Figure 9** shows the ordering and relative energies of the two bifurcation transition states <sup>5</sup>**TS**Cl and <sup>5</sup>**TS**OH under perturbations along the x-axis, i.e., along the Fe–Cl bond. The results for fields along the y- and z-axes are given in the **Supporting Information**; in those cases the energies changed by no more than 3 kcal mol<sup>−1</sup>, and in all cases <sup>5</sup>**TS**OH stayed well below <sup>5</sup>**TS**Cl. The results in **Figure 9** show that without an electric field, or with a field along the negative x-axis, the hydroxylation pathway is favored over the halogenation pathway by more than 7 kcal mol<sup>−1</sup>. With a field along the positive x-axis, by contrast, the relative energies of the two barriers change dramatically and halogenation becomes favored over hydroxylation.
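To put the applied field strengths of 0.0050 to 0.0150 au into more familiar units, the short Python sketch below converts atomic units of electric field to V/nm. It assumes the CODATA 2018 value of the atomic unit of electric field (about 5.142 × 10<sup>11</sup> V m<sup>−1</sup>); the function name is illustrative only and not part of any program used in this work.

```python
# CODATA 2018: atomic unit of electric field, E_h / (e a_0), in V/m
AU_FIELD_V_PER_M = 5.14220674763e11


def au_to_v_per_nm(field_au: float) -> float:
    """Convert an electric field strength from atomic units to V/nm."""
    return field_au * AU_FIELD_V_PER_M * 1e-9


if __name__ == "__main__":
    for field in (0.0050, 0.0100, 0.0150):
        print(f"{field:.4f} au  ->  {au_to_v_per_nm(field):.2f} V/nm")
```

With this conversion, the applied fields correspond to roughly 2.6 to 7.7 V/nm.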

To understand the chemoselectivity reversal induced by the electric field, we analyzed the group spin densities and charges of the individual complexes. The group charges of the transition states under an applied electric field are given at the bottom of **Figure 9**, and a clear trend emerges. In both **TS**'s, charge density (Q) is removed from Cl as the electric field along the positive x-axis increases. In particular, the negative charge of the chlorine atom decreases from Q<sub>Cl</sub> = −0.61 with a field of E<sub>0,X</sub> = −0.015 au to Q<sub>Cl</sub> = −0.31 with a field of +0.015 au, while an increase in negative charge of 0.18 units is seen for the same field strengths for <sup>5</sup>**TS**Cl. At the same time, the charge on the OH group stays virtually the same in <sup>5</sup>**TS**OH, whereas it becomes more negative, from −0.32 to −0.46, in <sup>5</sup>**TS**Cl. An applied electric field therefore reduces the charge on chlorine, places more radical character on Cl, and enables Cl• transfer. At the same time, charge is transferred to iron, which is reduced from iron(III) to iron(II).
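A group charge or group spin density, as used in this analysis, is the sum of the atomic values over the atoms assigned to a group (e.g., Cl, OH, Fe, substrate). The sketch below illustrates this bookkeeping; the atom-to-group assignment and the atomic charges in the example are hypothetical numbers, not values from **Figure 9**, and no particular population analysis scheme is implied.

```python
from collections import defaultdict


def group_sums(atomic_values, atom_groups):
    """Sum per-atom values (e.g., charges or spin densities) over named groups.

    atomic_values: sequence of floats, one per atom
    atom_groups:   sequence of group labels, one per atom (same order)
    """
    if len(atomic_values) != len(atom_groups):
        raise ValueError("need exactly one group label per atom")
    totals = defaultdict(float)
    for value, group in zip(atomic_values, atom_groups):
        totals[group] += value
    return dict(totals)


# Hypothetical atomic charges for a toy Fe/Cl/OH fragment (illustration only)
charges = [0.90, -0.45, -0.80, 0.45]
groups = ["Fe", "Cl", "OH", "OH"]
print(group_sums(charges, groups))
```

Comparing such group totals between two field strengths, e.g., Q<sub>Cl</sub> = −0.61 at −0.015 au and −0.31 at +0.015 au, corresponds to a shift of 0.30 units of charge on chlorine.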

As a result, the <sup>5</sup>**TS**Cl barrier drops below the <sup>5</sup>**TS**OH barrier when a positive electric field is applied. The <sup>5</sup>**TS**OH barrier also appears to be more sensitive to an external electric field than the <sup>5</sup>**TS**Cl barrier, so the chemoselectivity switch can arise from destabilization of the hydroxyl rebound barrier. The calculation with an electric field of 0.015 au along the positive x-axis shows that the substrate group loses most of its spin density relative to the unperturbed system: the spin density on the substrate moiety drops from ρ<sub>Sub</sub> = −0.68 without an electric field to ρ<sub>Sub</sub> = −0.12 with a field of E<sub>0,X</sub> = +0.015 au. The Cl and OH groups also lose radical character, but to a much lesser extent than the substrate. These results are in line with our previous work on the chemoselectivity of aliphatic hydroxylation vs. epoxidation by a cytochrome P450 Compound I model, where we also showed a chemoselectivity switch upon application of an electric field along a specific axis (Shaik et al., 2004). Further, DFT calculations on a Compound I model of cytochrome c peroxidase with an applied electric field gave radical character on the porphyrin for one field direction, whereas a tryptophan radical was obtained with the field in the opposite direction (de Visser, 2005).

Our studies therefore highlight the importance of the shape and size of the HctB protein, which create an induced electric field on the active site that favors substrate halogenation while minimizing substrate hydroxylation. An electric field arising from the dipole moment of the protein affects the charge distributions and spin densities of intermediates and transition states in the reaction mechanism, and thereby the product distributions. In the HctB halogenase, the protein perturbs the active site with an induced electric field in such a way that it stabilizes the halogenation pathway and lowers it below the hydroxylation pathway. Based on the changes in the charge distributions between no field and a positive electric field, this effect results from a considerable destabilization of the OH rebound barrier and a smaller stabilization of the Cl rebound barrier.

To find further evidence of environmental perturbations influencing the halogenase vs. hydroxylase activity of the protein, we searched for charged residues in the vicinity of the iron(IV)-oxo(chloride) complex in the PDB structures of <sup>5</sup>**Re1**, <sup>5</sup>**Re2**, <sup>5</sup>**Re3**, and <sup>5</sup>**Re4**. Near the halide atom we find the Glu<sup>223</sup> and Arg<sup>245</sup> side chains, the latter of which may be involved in α-ketoglutarate binding and succinate release. The top part of **Figure 10** gives the relative orientations of the Glu<sup>223</sup> and Arg<sup>245</sup> amino acids with respect to the iron(IV)-oxo(chloride) in the reactant complexes. The three enzyme models show considerable differences in the positions of these amino acids, which may give rise to Coulombic interactions with the active site atoms and hence to chemoselectivity changes. In model **1**, the carboxylate group of Glu<sup>223</sup> is aligned with the Fe–Cl axis and appears to be set up to push electron density away from Cl and onto iron. By contrast, in model **4** the Glu<sup>223</sup> and Arg<sup>245</sup> side chains form a salt bridge, and the positively charged Arg is now on the Fe–Cl axis. Consequently, in model **4** charge density will be withdrawn from iron and moved onto chloride, making it more difficult to transfer Cl to the substrate. In model **3**, Glu<sup>223</sup> has moved to the outside of the protein and is located relatively far from the active center, while Arg<sup>245</sup> is positioned much closer to the halide, but to the side rather than along the axis. Indeed, model **3** gives selective hydroxylation and no halogenation, whereas model **1** shows a preference for halogenation over hydroxylation.

FIGURE 9 | Relative energies (UB3LYP/BS2) of the halide and OH rebound barriers (<sup>5</sup>TSCl vs. <sup>5</sup>TSOH) from gas-phase DFT calculations in Gaussian with an applied electric field of magnitude E<sub>0</sub> (in au). Relative energies (in italics) are in kcal mol<sup>−1</sup>; the table gives group charges (in atomic units) of the transition states under the applied electric field. The electric field perturbations are along the x-axis (along the Fe–Cl bond).

To find out whether Glu<sup>223</sup> is close enough to the iron(IV)-oxo(chloride) to affect the charge distributions of the atoms, we took the DFT models of <sup>5</sup>**TS**OH and <sup>5</sup>**TS**Cl and added a point charge at the position of the carboxylate oxygen atom of Glu<sup>223</sup> in model **1**. Single point calculations with point charges of Q = +1 and −1 were performed, and the results are given at the bottom of **Figure 10**. Surprisingly, only small electronic changes are observed in <sup>5</sup>**TS**Cl upon addition of a point charge. By contrast, the hydroxyl rebound transition state shows major changes in the charge of the halogen atom under the influence of a point charge; specifically, the halogen charge increases from −0.48 to −0.41 with a point charge of magnitude −1. In addition, with Q = −1 the <sup>5</sup>**TS**Cl barrier drops below <sup>5</sup>**TS**OH by 4.6 kcal mol<sup>−1</sup>, whereas the energy difference is −20.0 kcal mol<sup>−1</sup> with a point charge of magnitude +1. A point charge located more than 5 Å from the reaction center can therefore cause an electrostatic perturbation that affects the charge distributions in the catalytic center and consequently the chemoselectivity of the reaction. This is an important caveat: in contrast to the active site and core region of the halogenase domain, the structure of HctB is less well defined by the model, since the positioning of the two other domains and their interactions with the halogenase domain are not well defined.
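As a rough order-of-magnitude check on why a single charge more than 5 Å away can matter, the sketch below evaluates the vacuum Coulomb field of a point charge in atomic units, E = |q|/r<sup>2</sup> with r in bohr. It is an unscreened single-charge estimate: it assumes no dielectric screening and ignores the direction of the field relative to the Fe–Cl axis, so it does not reproduce the single-point calculations themselves.

```python
BOHR_IN_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius in Å


def point_charge_field_au(charge_e: float, distance_angstrom: float) -> float:
    """Vacuum field magnitude (atomic units) of a point charge at a given distance.

    In atomic units the Coulomb constant is 1, so E = |q| / r**2 with r in bohr.
    """
    r_bohr = distance_angstrom / BOHR_IN_ANGSTROM
    return abs(charge_e) / r_bohr**2


if __name__ == "__main__":
    for d in (5.0, 6.0, 8.0):
        print(f"|Q| = 1 at {d:.1f} Å: {point_charge_field_au(1.0, d):.4f} au")
```

A unit charge at 5 Å thus produces a field of about 0.011 au, of the same order as the 0.005 to 0.015 au fields applied in **Figure 9**, although screening by the protein and solvent would reduce it.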

Finally, to test the effect of the salt bridge between Arg<sup>245</sup> and Glu<sup>223</sup>, we performed a series of electrostatic potential calculations on model **4** in which the salt bridge was rearranged so that either Arg<sup>245</sup> or Glu<sup>223</sup> is closer to the chloride. Electrostatic potentials were calculated with the Protein-sol server (Hebditch et al., 2017), and the results are given in **Figure 11**, displayed using PyMOL (Schrödinger, LLC). The salt bridge between Arg<sup>245</sup> and Glu<sup>223</sup> in model **4** (lower panel of **Figure 11**) induces a shift of negative charge from the halide toward the iron. Swapping the positions of the Arg and Glu residues in the salt bridge has a dramatic effect on the charge donation to the active site: electron density is now withdrawn from iron and moved toward the chloride. The salt bridge is therefore essential for creating the right charge distribution between the iron and chloride, and it induces an electric field on the active site that enables efficient substrate halogenation, as predicted by the DFT studies reported above.

FIGURE 10 | Top: Relative positions of Glu<sup>223</sup> and Arg<sup>245</sup> with respect to the Fe–Cl axis in the <sup>5</sup>Re geometries optimized with QM/MM at UB3LYP/BS1:Charmm in Turbomole:Charmm. Structures are displayed along the Fe–Cl axis by looking through the Fe–O bond. Bottom: Group charges (in atomic units) of the halogenation and hydroxylation transition states of the model system upon addition of a point charge (Q) at the position of the oxygen atoms of Glu<sup>223</sup> in model 1.

model <sup>5</sup>Re4 (A) and model <sup>5</sup>Re4,mod (B), which is model 4 with the Arg<sup>245</sup> and Glu<sup>223</sup> residues rotated with respect to the Cl–Fe axis. Color coding of the amino acid stick representations ranges from red (negative potential) to blue (positive potential). The direction of the induced electric field is given by the arrow.

In summary, the QM/MM and DFT calculations presented here highlight that effective substrate halogenation requires radical character on the halogen atom that can pair up with the substrate radical. Perturbations from the protein, for instance from nearby anionic residues such as Glu<sup>223</sup>, can achieve this through a push effect of charge density from the halogen to the metal. At the same time, the active site Arg<sup>245</sup> should not be aligned with the Fe–Cl bond. Our observations are in excellent agreement with recent computational studies on SyrB2 using model complexes (Wong et al., 2013; Srnec et al., 2016; Srnec and Solomon, 2017). In particular, those studies proposed that ideal orbital overlap between the substrate radical and the halogen atom would lower the halogen transfer transition state below the OH rebound barrier and thereby overcome the large driving force for hydroxylation.

### CONCLUSIONS

In this work we report the first detailed MM, MD, QM/MM, and DFT study of the nonheme iron halogenase HctB. Our initial analysis of a structure from the literature revealed several substrate entrance channels. We therefore created four models with the substrate in different positions, ran MD simulations on each, and performed QM/MM calculations on snapshots from these simulations. We confirm a reaction mechanism starting from an iron(IV)-oxo species that reacts via hydrogen atom abstraction followed by halogen rebound. The calculated hydrogen atom abstraction barriers vary dramatically with substrate binding position: a low barrier is found for model **3**, whereas much higher barriers are seen for models **1** and **2**. Interestingly, two models give preferential halogenation, one model preferential hydroxylation, and the fourth is expected to give a mixture of products. The two models that give preferential halogenation in our analyses use distinct substrate access channels. Analysis of the individual structures identifies an active site Glu residue (Glu223) that is unique to HctB, as other halogenases typically have a Ser or Lys in that position. Finally, we compared the QM/MM calculations with small gas-phase model complexes and find that, even though the gas-phase structure would imply preferential hydroxylation over halogenation, a complete chemoselectivity reversal can be achieved through external perturbations. Calculations on model systems with either an applied electric field along the positive x-axis or a point charge of Q = −1 give dominant halogenation, whereas hydroxylation is predicted in all other cases. It follows that the HctB enzyme structure appears to be designed to destabilize the hydroxylation pathway and favor halogenation products. Overall, our calculations show that the substrate binding position is essential for an optimal halogenation reaction. This may come at the cost of higher hydrogen atom abstraction barriers, but the substrate positions associated with these higher barriers may also hamper hydroxo rebound and the formation of alcohol products.

## METHODS

QM/MM calculations were performed on the mechanism of substrate hydroxylation vs. halogenation in the HctB halogenase using methods and procedures utilized previously on analogous enzymatic systems (Aluri and de Visser, 2007; Kumar et al., 2011; Hernández-Ortega et al., 2015; Quesne et al., 2016a; Tchesnokov et al., 2016; Faponle et al., 2017; Hofer and de Visser, 2018).

### Model Building

The work started from the halogenase domain reported by Pratter et al. (2014a,b), which contains an in silico docked hexanoyl phosphopantetheinyl (PPT) moiety (channel II, **Figure 2**). Upon analysis of the structure, however, we discovered another substrate entrance channel similar to that identified in SyrB2 (channel I, **Figure 2**) (Blasiak et al., 2006). We manually docked the substrate into each substrate entrance channel, replaced the iron(II)-water group by iron(IV)-oxo, and shortened the Fe–O distance to 1.63 Å. In addition, α-ketoglutarate was replaced by succinate (Succ) and the acyl carrier protein was removed. The substrate in channel I was positioned in three different orientations relative to the iron(IV)-oxo. The substrates were positioned so as to minimize steric clashes with protein residues along the substrate entrance channels, and after insertion of the substrate the system was solvated. During solvation and molecular dynamics simulations, no additional water molecules entered the active site pocket. The structures were charge-neutralized with 18 chloride and 13 magnesium ions on the surface of the protein. For example, model **1** contained a total of 32,458 atoms, including 9,157 water molecules.
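Replacing the iron-bound water by an oxo group with an Fe–O distance of 1.63 Å amounts to moving the oxygen along the existing Fe–O vector. A minimal NumPy sketch of that geometric step follows; the coordinates are hypothetical, and deleting the water hydrogens (and any subsequent relaxation) is assumed to be handled separately.

```python
import numpy as np


def set_bond_length(fixed_xyz, moving_xyz, target_length):
    """Move `moving_xyz` along the fixed->moving direction so the bond has `target_length`.

    Coordinates are in Å; the position of the fixed atom is unchanged.
    """
    fixed = np.asarray(fixed_xyz, dtype=float)
    moving = np.asarray(moving_xyz, dtype=float)
    bond = moving - fixed
    length = np.linalg.norm(bond)
    if length == 0.0:
        raise ValueError("atoms coincide; bond direction is undefined")
    return fixed + bond * (target_length / length)


# Hypothetical Fe and water-oxygen positions (Å); water H atoms would be deleted separately
fe = np.array([10.0, 12.0, 8.0])
o_water = np.array([11.2, 13.3, 9.0])
o_oxo = set_bond_length(fe, o_water, 1.63)
print(np.round(np.linalg.norm(o_oxo - fe), 3))
```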

### QM/MM Set-Up

Hydrogen atoms were added to models **1**–**4** with the PDB2PQR software package at pH 7 (Dolinsky et al., 2007), with all Glu/Asp side chains deprotonated and all Arg/Lys side chains protonated. The protonation states of the histidine residues were assigned by visual inspection of their local environment: His<sup>67</sup>, His<sup>111</sup>, His<sup>157</sup>, His<sup>185</sup>, His<sup>227</sup>, and His<sup>256</sup> were singly protonated at the N<sup>δ</sup> position, whereas His<sup>64</sup>, His<sup>163</sup>, His<sup>268</sup>, and His<sup>273</sup> were protonated at the N<sup>ε</sup> position. His<sup>8</sup> was taken to be doubly protonated at both the N<sup>δ</sup> and N<sup>ε</sup> positions.
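For reference, the histidine assignments listed above map onto the standard CHARMM residue names HSD (proton on N<sup>δ</sup>), HSE (proton on N<sup>ε</sup>), and HSP (doubly protonated). The dictionary below simply records that mapping; how it would be applied to the structure files (e.g., renaming residues before building the topology) is an assumption about the workflow, not a description of the scripts actually used.

```python
# Histidine protonation sites as listed in the text (residue number -> protonated N)
HIS_PROTONATION = {
    67: "ND1", 111: "ND1", 157: "ND1", 185: "ND1", 227: "ND1", 256: "ND1",
    64: "NE2", 163: "NE2", 268: "NE2", 273: "NE2",
    8: "ND1+NE2",
}

# Standard CHARMM residue names for the three histidine protonation states
CHARMM_HIS_NAMES = {"ND1": "HSD", "NE2": "HSE", "ND1+NE2": "HSP"}


def charmm_his_names(protonation):
    """Return {residue number: CHARMM residue name}, sorted by residue number."""
    return {resid: CHARMM_HIS_NAMES[site] for resid, site in sorted(protonation.items())}


if __name__ == "__main__":
    for resid, name in charmm_his_names(HIS_PROTONATION).items():
        print(f"His{resid} -> {name}")
```

Only His8 (HSP) carries a net positive charge, which is relevant when counting counterions for charge neutralization.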

A constrained geometry optimization with all protein backbone atoms fixed was performed in Charmm (Brooks et al., 1983), after which the system was solvated in a sphere with a radius of 40 Å. An iterative solvation procedure with a fixed protein backbone (**Supporting Information Figure S1**) was followed by equilibration and heating to 298 K. For all chemical systems described here, an unconstrained molecular dynamics simulation with the Charmm force field was then run for at least 10 ns (**Supporting Information Figure S2**). All systems relaxed to stable structures, and a random low-energy snapshot was chosen as the starting geometry for the QM/MM calculations.

### QM Region

Our minimal QM region **A** contains the iron(IV)-oxo(chloro) group, the imidazole groups of His<sup>111</sup> and His<sup>227</sup>, the acetate terminus of succinate (Succ), and the thiohexanoic acid arm of the substrate. In addition, a larger QM region **AB** was tested that includes the full or partial side chains of amino acids within 6 Å of the iron(IV)-oxo(chloro) unit. For model **1**, the large QM region **AB** contained the side chains of Ile<sup>108</sup>, Val<sup>113</sup>, Asp<sup>200</sup>, Asp<sup>202</sup>, Glu<sup>223</sup>, Val<sup>225</sup>, Met<sup>226</sup>, Arg<sup>245</sup>, and six water molecules. QM region **AB** for model **2** contains the same amino acids except Asp<sup>202</sup>, Val<sup>225</sup>, and Met<sup>245</sup>, and additionally includes Ser<sup>141</sup>, Phe<sup>221</sup>, and four water molecules. Model **3** in its most elaborate form covers Ile<sup>108</sup>, Val<sup>113</sup>, Ser<sup>141</sup>, Asp<sup>200</sup>, Phe<sup>221</sup>, Asn<sup>243</sup>, Arg<sup>245</sup>, and four water molecules. Finally, the large QM region for model **4** includes Ile<sup>108</sup>, Val<sup>113</sup>, Phe<sup>221</sup>, Arg<sup>245</sup>, and five water molecules.
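The 6 Å criterion used to build QM region **AB** can be expressed as a simple distance filter. The sketch below returns the residues that have at least one atom within the cutoff of any atom of a reference core (e.g., Fe, O, Cl); the coordinates and residue numbers in the example are hypothetical, and the subsequent choice of full or partial side chains is not modeled here.

```python
import numpy as np


def residues_within(core_xyz, atom_xyz, atom_resids, cutoff=6.0):
    """Residue IDs with at least one atom within `cutoff` Å of any core atom.

    core_xyz:    (m, 3) coordinates of the reference atoms (e.g., Fe, O, Cl)
    atom_xyz:    (n, 3) coordinates of candidate atoms
    atom_resids: (n,) residue number of each candidate atom
    """
    core = np.atleast_2d(np.asarray(core_xyz, dtype=float))
    atoms = np.atleast_2d(np.asarray(atom_xyz, dtype=float))
    # Pairwise distances between candidate atoms and core atoms, shape (n, m)
    dist = np.linalg.norm(atoms[:, None, :] - core[None, :, :], axis=-1)
    is_close = (dist <= cutoff).any(axis=1)
    return sorted(set(np.asarray(atom_resids)[is_close].tolist()))


# Hypothetical example: Fe at the origin, Cl at 2.3 Å along x
core = [[0.0, 0.0, 0.0], [2.3, 0.0, 0.0]]
atoms = [[7.5, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 5.5]]
resids = [223, 141, 245]
print(residues_within(core, atoms, resids))
```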

### QM/MM Calculations

Subsequent QM/MM calculations on the different substrate-bound iron(IV)-oxo(chloro) enzyme intermediates were performed with Turbomole:Charmm, linked via ChemShell (Ahlrichs et al., 1989; Smith and Forester, 1996; Sherwood et al., 2003). Density functional theory at the unrestricted B3LYP level (Lee et al., 1988; Becke, 1993) was applied to the QM region in Turbomole with an SV(P) basis set on all atoms: basis set BS1 (Schafer et al., 1992). Single point calculations on the QM/MM optimized structures were performed with an all-electron Wachters-type basis set on iron and def2-TZV(P) on the remaining atoms: basis set BS2 (Wachters, 1970; Hay, 1977; Krishnan et al., 1980; Bauschlicher et al., 1989). For several systems we tested a full geometry optimization with basis set BS2 but found structures and relative energies similar to those obtained with BS1; hence BS1 was used for all geometry optimizations. The MM region was described with the Charmm force field (Brooks et al., 1983). A detailed benchmark study comparing computed free energies of activation with experimental data from Eyring plots gave good agreement between experiment and theory for the methods described here (Cantú Reinhard et al., 2016a). Furthermore, these methods have been shown to reproduce well the reduction potentials of copper proteins (Fowler et al., 2017) and the regio- and chemoselectivities of reaction mechanisms (Jastrzebski et al., 2014; Barman et al., 2016b; Yang et al., 2016).

Geometries were optimized at UB3LYP/BS1 in Turbomole:Charmm, and a frequency calculation on the QM region only was performed to confirm each structure as a local minimum or a first-order saddle point. Geometry scans with one degree of freedom fixed were performed along a specified reaction coordinate. The maxima of these scans were subjected to full first-order saddle point optimizations, and the starting and end points of the scans established the proposed reaction paths.
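The scan-and-verify protocol can be summarized in two small helpers: picking the highest point of a relaxed one-dimensional scan as the transition-state guess, and classifying a stationary point by its number of imaginary frequencies. This is a generic sketch, assuming imaginary modes are reported as negative wavenumbers (a common convention); the optional threshold for discarding small spurious imaginary modes is a hypothetical parameter, not a setting from this study.

```python
import numpy as np


def scan_maximum(coordinate, energy):
    """Return (coordinate value, energy) at the highest-energy point of a 1D scan."""
    energies = np.asarray(energy, dtype=float)
    i = int(np.argmax(energies))
    return coordinate[i], float(energies[i])


def classify_stationary_point(frequencies_cm1, imag_threshold_cm1=0.0):
    """Classify a stationary point by its number of imaginary modes.

    Imaginary modes are assumed to be given as negative wavenumbers; modes with
    |freq| below `imag_threshold_cm1` are treated as numerical noise.
    """
    n_imag = sum(1 for f in frequencies_cm1 if f < -imag_threshold_cm1)
    if n_imag == 0:
        return "local minimum"
    if n_imag == 1:
        return "first-order saddle point"
    return f"higher-order saddle point ({n_imag} imaginary modes)"


# Hypothetical scan along a C-Cl distance (Å) with relative energies (kcal/mol)
r = [3.0, 2.8, 2.6, 2.4, 2.2, 2.0]
e = [0.0, 1.2, 2.9, 3.4, 1.1, -8.5]
print(scan_maximum(r, e))
print(classify_stationary_point([-412.0, 35.0, 88.0]))
```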

### AUTHOR CONTRIBUTIONS

AT and SdV devised the project. AT ran the QM/MM calculations. NF and JW performed the electrostatic calculations. AT, NF, GS, and SdV wrote the paper.

### FUNDING

The BBSRC is acknowledged for a studentship for AT and NF under grant code BB/J014478/1. The EU-COST Network Explicit Control Over Spin-states in Technology and Biochemistry (ECOSTBio, CM1305) is acknowledged for support.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2018.00513/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Timmins, Fowler, Warwicker, Straganz and de Visser. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# On the Inhibition Mechanism of Glutathione Transferase P1 by Piperlongumine. Insight From Theory

Mario Prejanò, Tiziana Marino\* and Nino Russo

Dipartimento di Chimica e Tecnologie Chimiche, Università della Calabria, Arcavacata di Rende, Italy

Piperlongumine (PL) is an anticancer compound whose activity is related to the inhibition of human glutathione transferase of the pi class (GSTP1), which is overexpressed in cancerous tumors and implicated in the metabolism of electrophilic compounds. In the present work, the inhibition mechanism of hydrolyzed piperlongumine (hPL) has been investigated at the QM and QM/MM levels of theory. The potential energy surfaces (PESs) highlight the contribution of a Tyr residue close to the G site in the catalytic pocket of the enzyme. The proposed mechanism is a one-step process, namely the nucleophilic addition of the glutathione thiol to the electrophilic species, with simultaneous formation of the C–S and H–C bonds. Both methods give barrier heights (19.8 and 21.5 kcal mol<sup>−1</sup> at the QM/MM and QM levels, respectively) close to the experimentally measured value for C–S bond formation (23.8 kcal mol<sup>−1</sup>).
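To gauge what "close to" means kinetically, the transition-state theory (Eyring) expression k = (k<sub>B</sub>T/h) exp(−ΔG<sup>‡</sup>/RT) can be applied to the three barrier heights. The sketch below assumes that the barriers can be treated as free energies of activation, a transmission coefficient of 1, and T = 298.15 K; none of these choices is taken from the study itself.

```python
import math

K_B = 1.380649e-23           # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34    # Planck constant, J s
R_KCAL = 8.314462618 / 4184  # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal_mol: float, temperature_k: float = 298.15) -> float:
    """First-order rate constant (1/s) from a barrier via the Eyring equation.

    Assumes the barrier is a free energy of activation and a transmission
    coefficient of 1.
    """
    prefactor = K_B * temperature_k / H_PLANCK
    return prefactor * math.exp(-barrier_kcal_mol / (R_KCAL * temperature_k))


if __name__ == "__main__":
    for label, barrier in (("QM/MM", 19.8), ("QM", 21.5), ("experiment", 23.8)):
        print(f"{label:<10} {barrier:5.1f} kcal/mol -> k = {eyring_rate(barrier):.2e} 1/s")
```

Under these assumptions, roughly 1.4 kcal mol<sup>−1</sup> of barrier corresponds to a factor of 10 in rate at room temperature, so the QM and QM/MM barriers imply rates roughly 50-fold and 850-fold faster than the experimental value.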

#### Edited by:

Vicent Moliner, Universitat Jaume I, Spain

#### Reviewed by:

Katarzyna Swiderek, Universitat Jaume I, Spain

Laura Masgrau, Universidad Autónoma de Barcelona, Spain

\*Correspondence:

Tiziana Marino tiziana.marino65@unical.it

#### Specialty section:

This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry

Received: 05 September 2018 Accepted: 26 November 2018 Published: 10 December 2018

#### Citation:

Prejanò M, Marino T and Russo N (2018) On the Inhibition Mechanism of Glutathione Transferase P1 by Piperlongumine. Insight From Theory. Front. Chem. 6:606. doi: 10.3389/fchem.2018.00606

Keywords: glutathione S-transferase, piperlongumine, hydrolysis mechanism, inhibition mechanism, MD, DFT, QM, QMMM

## INTRODUCTION

Glutathione S-transferases (GSTs) are a ubiquitous family of multifunctional phase II detoxification enzymes that conjugate reactive substrates with the reduced tripeptide glutathione (GSH) in most cells, especially those of the liver and kidney (Hayes et al., 2005; Oakley, 2005; Stoddard et al., 2017). In particular, they catalyze the nucleophilic attack of the thiol group of the cysteine (Cys) residue of GSH on electrophilic substrates, forming conjugates that are less toxic and more water-soluble than the parent species and thereby facilitating their elimination from cells (Broxterman et al., 1995; Townsend and Tew, 2003; Wang et al., 2017). Their role in protecting cells from oxidative attack, together with their overexpression in many cancer cells, makes them good candidates as cancer biomarkers (McIlwain et al., 2006; Lo and Ali-Osman, 2007). Furthermore, GSTs are associated with multidrug resistance of tumor cells and are involved in drug detoxification and in apoptosis control (Townsend and Tew, 2003; Mejerman et al., 2008). Mammalian cytosolic GST isoenzymes belong to different classes (alpha, mu, pi, theta, kappa, sigma, zeta, and omega) (Wilce and Parker, 1994; Armstrong, 1997; Sheehan et al., 2001) based on their molecular masses, isoelectric points, and other properties. Each isoenzyme subunit contains an active site comprising a binding site for the cofactor GSH (G-site) and one for the electrophilic substrate (H-site) (Dirr et al., 1994; Wilce and Parker, 1994). In particular, glutathione S-transferase P1 (GSTP1) is overexpressed in various human malignancies affecting important organs such as the lung, colon, stomach, kidney, ovary, mouth, and testis (Green et al., 1993; Katagiri et al., 1993; Grignon et al., 1994; Okuyama et al., 1994; Zhang et al., 1994; Inoue et al., 1995; Ruiz-Gomez et al., 2000). This overexpression has been linked to acquired multidrug resistance to chemotherapeutic agents (cisplatin, chlorambucil, and ethacrynic acid) (Ban et al., 1996; Oakley et al., 1997; Mejerman et al., 2008; Karpusas et al., 2013; Pei et al., 2013; Perperopoulou et al., 2018). GSTP1 also plays a role in maintaining the cellular redox state (Tew, 2007) and has a "nonenzymatic" antiapoptotic activity through its interaction with c-Jun NH2-terminal kinase (JNK), a key enzyme in the apoptotic cascade (Adler et al., 1999; Wang et al., 2001). For these reasons, GSTP1 is considered a promising target for inactivation in cancer treatment, and considerable effort has been devoted to the design of potent inhibitors of this enzyme (Bezerra et al., 2007; Federici et al., 2009; Raj et al., 2011; Adams et al., 2012; Boskovic et al., 2013; Liao et al., 2016; Harshbarger et al., 2017; Zou et al., 2018). Among these, piperlongumine (PL), a natural alkaloid isolated from Piper longum L. and characterized by two α,β-unsaturated functionalities (see **Figure 1**), has recently been reported as a promising anticancer molecule that targets the stress response to ROS, inducing apoptosis (Adams et al., 2012; Boskovic et al., 2013; Liao et al., 2016; Harshbarger et al., 2017).

This molecule is also a promising lead compound for the development of potent GSTP1 inhibitors and has stimulated the synthesis of a large number of structural analogs (Bezerra et al., 2007; Adams et al., 2012; Boskovic et al., 2013; Liao et al., 2016; Harshbarger et al., 2017; Stoddard et al., 2017). PL acts as a Michael acceptor because it can undergo hetero-conjugate addition with peptide-like molecules, including the nucleophilic thiols of cysteine residues, in an irreversible or reversible fashion. On the basis of stable isotope labeling (Raj et al., 2011), the anticancer effects of PL were related to the promotion of reactive oxygen species (ROS) and to the reduction of cellular GSH levels (Harshbarger et al., 2017). PL contains a trimethoxyphenyl head and two reactive olefin moieties (C2–C3 and C7–C8) that have been shown to be essential for differentiating the cellular activity (Adams et al., 2012). The C2–C3 bond is critical for toxicity, ROS elevation, and protein S-glutathionylation, whereas C7–C8 is not necessary for these activities but is believed to enhance the toxicity (Adams et al., 2012; Harshbarger et al., 2017). The two olefins can therefore be identified as the minimum pharmacophore of PL, so that their modification can yield analogs with different biological responses (Adams et al., 2012). Furthermore, PL can act as a GSTP1 cosubstrate in both displacement and addition reactions; in this case, GSH bound in the G site of GSTP1 is the target of the inhibitor (Adams et al., 2012; Harshbarger et al., 2017). Recently, the high-resolution X-ray crystal structure of GSTP1 (PDB code 5J41) in complex with PL and GSH suggested that inhibition occurs without covalent modification of GSTP1 by PL; rather unexpectedly, PL was found to be hydrolyzed to trimethoxycinnamic acid (TMCA), which lacks the C2–C3 olefin (Harshbarger et al., 2017). This finding does not fully fit the behavior of PL toward other cysteine-containing peptides, which react with the C2–C3 bond under in vitro conditions (Adams et al., 2012). Harshbarger et al. provided the first structural model of the interactions between PL, GSH, and GSTP1 (Harshbarger et al., 2017). This study showed that PL acts as a prodrug: after entering the cell it undergoes hydrolysis to TMCA, which in turn reacts with GSH located in the G site of GSTP1, affording the hPL:GSH conjugate as the product of the addition reaction and confirming that no covalent bond is formed between PL and GSTP1. Despite the many studies (Bezerra et al., 2007; Federici et al., 2009; Adams et al., 2012; Boskovic et al., 2013; Peng et al., 2015; Liao et al., 2016; Harshbarger et al., 2017; Zou et al., 2018) on the selective inhibition of tumor growth by piperlongumine in different types of cancer, the molecular mechanism of PL-mediated cancer cell death remains poorly understood. To contribute to a better understanding, at the atomistic level, of the inhibition mechanism of GSH by the hydrolyzed product of PL within the GSTP1 enzyme, a theoretical investigation in the framework of density functional theory (DFT) was undertaken. In addition, an MD simulation of the initial enzyme-inhibitor (EI) complex was performed.

## METHODOLOGY

### Active Site

The enzyme is a homodimer of two identical subunits, with a total mass of 48 kDa. The active sites are located at the interfaces between the two domains. Each active site in turn consists of two sub-sites: the G site, where GSH is located, close to the outer protein surface and in direct contact with solvent molecules, and the H site, where the electrophilic inhibitor can be accommodated (Harshbarger et al., 2017). The interactions of GSH and hPL with the residues lining both sites were treated at the quantum mechanical level in both the QM and QM/MM calculations. In particular, the QM region includes Arg13, which is engaged in hydrogen bonds with the N-terminal portion of GSH and the carboxyl group of the inhibitor; Lys44, which is anchored to the C-terminal part of GSH; Tyr7, whose OH group is oriented toward the S atom of the GSH cysteine so as to form a hydrogen bond with it; and Tyr108, which is involved in a π–π interaction with the aromatic ring of the inhibitor. Finally, the QM portion also contains Ile104, owing to its crucial role in correctly orienting hPL (in the H site) during conjugation with GSH (Harshbarger et al., 2017). Because the active site is close to the protein surface, several water molecules present in the catalytic cavity were included in the QM/MM model. Starting from the available crystal structure of Homo sapiens GSTP1 (Harshbarger et al., 2017; PDB code 5J41, 1.19 Å resolution), the models (see **Figure 2**) were prepared following the procedure described below.

### MD Calculations

As a first step, the C8(hPL)-S(GSH) bond had to be cleaved and the enzyme:GSH:inhibitor supramolecular system relaxed at the molecular mechanics (MM) level of theory before starting the MD simulation, because the X-ray structure used corresponds to the final product of the inhibition process. Furthermore, because the inhibitor is a non-protein molecule, it was optimized at the HF/6-31G(d) level of theory in order to derive its parameters with the Antechamber tool, as implemented in the AMBER 16 package (AMBER 16, 2016). Lennard-Jones parameters and atomic charges were obtained using the General Amber Force Field (GAFF) (Wang et al., 2004) and the Restrained Electrostatic Potential (RESP) method (Bayly et al., 1993), respectively. The obtained parameters of hPL are collected in **Table S1**.

The AMBER ff14SB (Maier et al., 2015) force field was applied using the Xleap module, and hydrogen atoms were added to the whole system. The protonation state of each amino acid was assigned using the H++ web server (Gordon et al., 2005; Myers et al., 2006; Anandakrishnan et al., 2012; H++ version 3.2, 2016). A rectangular box (85 × 70 × 80 Å) was filled with TIP3P (Jorgensen et al., 1983) water molecules within 12.0 Å of the enzyme surface. The system was first heated progressively from 0 to 310 K during 100 ps of classical MD in the NVT ensemble, followed by a 20 ns production run in the NPT ensemble (1 bar and 310 K). During the simulations, the cutoff radius for non-bonded interactions was set to 12 Å, long-range electrostatics were treated with the Particle Mesh Ewald (PME) summation method (Ewald, 1921), and the SHAKE algorithm (Ryckaert et al., 1977) was employed to constrain bonds involving hydrogen atoms, allowing a 2 fs integration step. The root-mean-square deviation (RMSD) of the whole protein and of the H- and G-site residues was analyzed to verify the stability of the system during the MD simulation (**Figure S1**). To better examine the conformational behavior of the inhibitor-protein system, an MD simulation was also performed on the enzyme alone. The obtained root-mean-square fluctuation (RMSF) is shown in **Figure S2**. Furthermore, to verify the conformational homogeneity of the inhibitor binding modes in the catalytic pocket, 20 structures were selected along the MD simulation (**Figure S3**). Clustering results confirmed that the last frame, at 20 ns, is a good representative configuration and could be adopted as the starting point for building the QM cluster and QM/MM models.
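The RMSD and RMSF analyses mentioned above can be illustrated with a minimal NumPy sketch. It is not the analysis tool used in this work (which is not specified beyond the AMBER 16 package); it assumes coordinates are given as arrays in Å with identical atom ordering in every frame, and that the trajectory passed to `rmsf` has already been superimposed on a common reference.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body
    superposition of P onto Q (Kabsch algorithm)."""
    P = P - P.mean(axis=0)          # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                     # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T              # optimal rotation matrix
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def rmsf(traj: np.ndarray) -> np.ndarray:
    """Per-atom RMSF from a pre-aligned trajectory of shape (n_frames, N, 3)."""
    mean_structure = traj.mean(axis=0)
    return np.sqrt(((traj - mean_structure) ** 2).sum(axis=2).mean(axis=0))
```

In practice, `kabsch_rmsd` would be evaluated frame by frame against the starting structure, separately for each atom subset of interest (whole protein, H-site residues, G-site residues).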

### QM Cluster and QM/MM Models

The amino acids included in the QM region (Tyr7, Arg13, Tyr108, Ile104, Lys44) were truncated as depicted in **Figure 2**. The missing hydrogen atoms were added manually, and one water molecule (lying 3.601 Å from GSH) was explicitly included because it interacts directly with the nucleophile, whereas the other water molecules lie more than 4 Å away. The C atoms labeled with an asterisk (∗) were kept fixed during geometry optimizations, following the coordinate-locking scheme, to prevent artificial movements (Siegbahn and Himo, 2011; Piazzetta et al., 2015; Himo, 2017). The QM cluster approach has proven adequate for elucidating the catalytic mechanisms of other enzymes (Amata et al., 2011; Lan and Chen, 2016; Prejanò et al., 2017). The resulting model consists of 136 atoms, with a total charge of zero.

The QM/MM model was built using the two-layer ONIOM formalism (Svensson et al., 1996) as implemented in the Gaussian 09 code (Frisch et al., 2013), keeping in the QM layer the same atoms as in the QM cluster model. The entire enzyme and the water molecules within 5 Å of the catalytic site were included (**Figure 2**). During the optimizations, all water molecules and residues outside an 18 Å sphere around the active site were kept frozen, following the standard procedure for single-conformation PES studies (Sousa et al., 2017). The final model contains 7811 atoms.
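The frozen/flexible partition described above can be sketched as follows. This is a hypothetical helper, not part of Gaussian 09 or of the authors' workflow: it assumes that the sphere is centered on a chosen point of the active site (for example, the centroid of the QM atoms) and that a residue or water molecule stays flexible if any of its atoms lies within the 18 Å radius.

```python
import numpy as np


def frozen_atom_mask(coords, residue_ids, center, radius=18.0):
    """Return a boolean array, True for atoms to keep frozen.

    Hypothetical criterion: a residue (or water molecule) is left flexible
    if any of its atoms is within `radius` (Å) of `center`; all other
    residues are frozen as a whole.
    """
    coords = np.asarray(coords, dtype=float)
    residue_ids = np.asarray(residue_ids)
    dist = np.linalg.norm(coords - np.asarray(center, dtype=float), axis=1)
    flexible = set(residue_ids[dist <= radius].tolist())
    return np.array([rid not in flexible for rid in residue_ids.tolist()])
```

The resulting mask would then be translated into the program-specific frozen/active flags of the input file; that syntax is not shown here.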

### Technical Details

The Gaussian 09 software package (Frisch et al., 2013) was used for all calculations, employing the B3LYP hybrid functional (Lee et al., 1988; Becke, 1993) for the QM region of both models. The 6-31+G(d,p) basis set was used for S, N, H, O, and C atoms during the optimizations. Linear transit scans were performed to locate stationary points along the reaction coordinates. To confirm the nature (intermediate or transition state) of each stationary point located on the potential energy surface (PES), frequency calculations were performed at the same level of theory. To obtain more accurate electronic energies, single-point calculations were performed with the larger 6-311+G(2d,2p) basis set. The final energy profiles include the zero-point energy (ZPE), dispersion corrections (evaluated using the DFT-D3 procedure; Grimme et al., 2011), and solvation energy.
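A minimal sketch of how a relative energy profile could be assembled from the terms listed above. It assumes each term has already been extracted, in hartree, from the corresponding output (large-basis single point, ZPE from the frequency calculation, D3 correction and, where applied, the solvation correction); the function names and labels are illustrative, and no values from this work are reproduced.

```python
HARTREE_TO_KCAL = 627.5095  # 1 hartree in kcal/mol


def corrected_energy(e_single_point, zpe, e_disp, e_solv=0.0):
    """Sum the energy terms (all in hartree) for one stationary point.

    e_solv defaults to 0.0 for models where no continuum correction is added.
    """
    return e_single_point + zpe + e_disp + e_solv


def relative_profile(points):
    """Map label -> energy (kcal/mol) relative to the first entry of `points`.

    `points` is an insertion-ordered dict: label -> corrected energy (hartree).
    """
    reference = next(iter(points.values()))
    return {label: (e - reference) * HARTREE_TO_KCAL for label, e in points.items()}
```

Taking the first stationary point (for example, the reactant complex) as the reference mirrors how energy profiles are usually plotted relative to the starting species.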

Electrostatic embedding, as implemented in Gaussian 09, was employed in all QM/MM calculations to evaluate the Coulomb interactions between the MM and QM regions (Vreven et al., 2006). For the QM cluster calculations, single-point calculations with the SCRF-SMD solvation model and a dielectric constant ε = 4, simulating the enzyme environment, were performed (Marenich et al., 2009). The same level of theory, with a dielectric constant of 78.0, was adopted in the optimizations of the species involved in PL hydrolysis, as successfully done in other studies (Ritacco et al., 2015; Marino et al., 2016). NBO (NBO version 3.1, 2001) and non-covalent interaction (NCI) (NCIPLOT, version 3.0, 2011) analyses were performed on all the stationary points of the investigated PESs at both the QM and QM/MM levels.

To establish which carbonyl oxygen should be considered in the hydrolysis mechanism under acidic conditions, proton affinities were estimated as binding energies (BE) according to the following expression:

$$\mathrm{BE} = -\left(\Delta H_{\mathrm{hPL\text{-}H}^{+}} - \Delta H_{\mathrm{hPL}}\right)$$

The BE is thus obtained from the difference between the enthalpies of the protonated and the neutral systems, with the sign changed so that a larger BE corresponds to a higher proton affinity. The enthalpy of the isolated H<sup>+</sup> does not appear because only energy differences are evaluated; the obtained binding energies therefore represent the energy involved in the formation of the protonated systems.
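The BE expression above, and its use to pick the preferred protonation site, can be sketched as follows. The enthalpies are assumed to be in hartree and the site labels are hypothetical; as in the text, the H<sup>+</sup> term is omitted, since it is the same for every site and cancels when sites are compared.

```python
HARTREE_TO_KCAL = 627.5095


def binding_energy(h_protonated, h_neutral):
    """BE = -(H[protonated] - H[neutral]), converted from hartree to kcal/mol."""
    return -(h_protonated - h_neutral) * HARTREE_TO_KCAL


def preferred_site(h_neutral, h_protonated_by_site):
    """Return the site label with the largest BE (highest proton affinity).

    h_protonated_by_site: dict mapping a (hypothetical) site label to the
    enthalpy of the species protonated at that site.
    """
    return max(
        h_protonated_by_site,
        key=lambda site: binding_energy(h_protonated_by_site[site], h_neutral),
    )
```

The difference between the BEs of two sites is what the text reports when comparing the two carbonyl oxygens.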

### RESULTS AND DISCUSSION

### Hydrolysis of Piperlongumine

Following recent experimental evidence that PL undergoes hydrolysis outside the enzyme pocket (Harshbarger et al., 2017), we first studied this process in aqueous media. The considered reaction mechanism is illustrated in **Figure 3**. In line with the experimental evidence (Harshbarger et al., 2017), we considered the hydrolysis of PL to involve the oxygen of the carbonyl (C6) group next to the C7-C8 olefin, under both neutral and acidic conditions, to account for the different intracellular pH conditions, since acidic pH values are observed in cancer cells (Townsend and Tew, 2003; Wang et al., 2017).

Consistently, our computed BEs show that the carbonyl group next to the C2-C3 olefin has a lower proton affinity (by about 4 kcal mol<sup>−1</sup>) than the one next to the C7-C8 olefin, indicating that, under the same conditions, the oxygen of C6 is the favored protonation site.


The optimized geometries of the stationary points are reported in **Figures S4**, **S5**, while the calculated energy profiles are depicted in **Figure 4**. As shown in **Figure 3**, we propose a multistep mechanism at acidic pH, in contrast to the single-step process at neutral pH. In both cases the product is hPL, while the leaving group is 1,2,5,6-tetrahydropyridin-2-ol (t-PyrOH), neutral at neutral pH and protonated at acidic pH. For clarity, in the text the double bond remaining in hPL after hydrolysis retains the same numbering as in PL (C7-C8). Both processes are exothermic, although the exoergonicity is more pronounced at acidic pH (**Figure 4**). Our results indicate that acidic hydrolysis is strongly favored, with activation barriers about 10 kcal mol<sup>−1</sup> lower than at neutral pH (see **Figure 4**). The barrier calculated in the acidic environment agrees well with those characterizing other anticancer molecules acting as prodrugs (Alberto and Russo, 2011; Ritacco et al., 2015; Marino et al., 2016). Once the hydrolyzed product is formed, GSTP1-assisted conjugation with GSH starts through attack on the C7-C8 double bond.

### GSTP1 Inhibition

To elucidate the role of GSTP1 in the inhibition process by hPL, we considered, at both the QM and QM/MM levels, two different reaction mechanisms (**A** and **B**), presented in **Figure 5**. Path (**A**) describes the nucleophilic addition of the -SH group of GSH to the double bond of the inhibitor without the involvement of any amino acid residue, while path (**B**) takes into account the participation of the Tyr7 residue in the formation of the covalent adduct. In both cases, the inhibition reaction occurs in a single step through Michael addition of the GSH thiol to the C7-C8 olefin of hPL. In both mechanisms, the starting species is the ternary enzyme-hydrolyzed inhibitor-GSH complex (**EI**), obtained by geometry optimization of the frame extracted from the MD simulation at 20 ns. **Figure 6**, which superimposes the crystallographic structure on the last MD snapshot, shows that hPL interacts with the binding cleft of the H site and that no water molecules are close to the reaction center, in agreement with the hydrophobic nature of this site (Tyr7, Tyr108, and Ile104 residues).

As expected, the crystallographic pose (which corresponds to the reaction product) deviates in the inhibitor region (see **Figure 6**). In contrast, the GSH region superimposes well, confirming that this molecule is well positioned, with the thiol correctly oriented.

The energy profiles obtained with the QM and QM/MM approaches for the two mechanisms are reported in **Figure 7**. The reported QM/MM energy values do not include the entropic contribution. To quantify it, the Grimme procedure was employed (Grimme, 2012). The results (see **Table S2**) show that the TΔS terms only slightly affect the previously obtained energy values. The QM/MM structures of the initial complex (**EI**), the final S-conjugated product (**P**), and the transition states are reported in **Figure 8**. All the QM-optimized geometries are given in **Figure S6**.
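To illustrate how a Grimme-type entropy correction can be computed, the sketch below implements the quasi-rigid-rotor-harmonic-oscillator (qRRHO) vibrational entropy, assuming that this is the procedure meant by the citation (Grimme, 2012). Low-frequency modes are interpolated toward free-rotor entropies through a damping function; the reference wavenumber of 100 cm<sup>−1</sup>, the exponent of 4 and the average moment of inertia of 1e-44 kg m² are assumed parameter values. Only the vibrational contribution is shown; translational and rotational terms are omitted.

```python
import math

H = 6.62607015e-34                 # Planck constant, J s
KB = 1.380649e-23                  # Boltzmann constant, J/K
C_CM = 2.99792458e10               # speed of light, cm/s
R_KCAL = 8.314462618 / 4184.0      # gas constant, kcal/(mol K)
B_AV = 1.0e-44                     # assumed average moment of inertia, kg m^2


def harmonic_entropy(wavenumber, T=298.15):
    """Entropy (kcal/(mol K)) of one harmonic vibration; wavenumber in cm^-1."""
    x = H * C_CM * wavenumber / (KB * T)
    return R_KCAL * (x / math.expm1(x) - math.log(-math.expm1(-x)))


def free_rotor_entropy(wavenumber, T=298.15):
    """Entropy (kcal/(mol K)) of a free rotor assigned to a mode of this wavenumber."""
    nu = C_CM * wavenumber                   # frequency in s^-1
    mu = H / (8.0 * math.pi ** 2 * nu)       # moment of inertia of the equivalent rotor
    mu_eff = mu * B_AV / (mu + B_AV)         # bounded so very soft modes stay finite
    return R_KCAL * (0.5 + 0.5 * math.log(8.0 * math.pi ** 3 * mu_eff * KB * T / H ** 2))


def damping(wavenumber, w0=100.0, alpha=4):
    """Interpolation weight: close to 1 for stiff modes, close to 0 for very soft modes."""
    return 1.0 / (1.0 + (w0 / wavenumber) ** alpha)


def minus_T_S_vib(wavenumbers, T=298.15):
    """-T*S_vib (kcal/mol) summed over real modes; imaginary modes (negative values) are skipped."""
    s_total = 0.0
    for w in wavenumbers:
        if w <= 0.0:
            continue
        f = damping(w)
        s_total += f * harmonic_entropy(w, T) + (1.0 - f) * free_rotor_entropy(w, T)
    return -T * s_total
```

Skipping negative wavenumbers excludes the imaginary mode of a transition state, since programs such as Gaussian print imaginary frequencies as negative numbers.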

In **EI**, the carboxylate group of hPL is oriented so as to form hydrogen bonds with Tyr7 (1.599 Å) and with the SH group of GSH (2.138 Å). Furthermore, van der Waals and hydrophobic interactions, such as those between the inhibitor and the Ile104 and Arg13 residues (see **Figure 8**), help to accommodate the inhibitor optimally in the H binding site. As a result, the key reacting atoms, C8 of hPL and the SH nucleophile, are suitably placed (4.271 Å apart) for the reaction to occur. In path **A**, the located transition state (**TSA**) is a four-centered structure in which sulfur addition to C8 (1.880 Å) and C7-H bond formation (1.467 Å) occur simultaneously. Frequency analysis confirms a first-order saddle point with an imaginary frequency of 1510i cm<sup>−1</sup>, corresponding to a vibrational mode with a strong C7-H component and a weaker C8-S one. The C8-S bond is already established, while the ongoing C-H bond formation is reflected in the elongation of the C7-C8 bond (1.544 Å). This barrier is 64.4 kcal mol<sup>−1</sup> at the QM/MM level and 70.9 kcal mol<sup>−1</sup> at the QM level, both close to the value computed for the reaction unassisted by the catalyst (76.0 kcal mol<sup>−1</sup>, see **Figure 7**). The resulting product (**P**), shown in **Figure 8**, shows that the C-S bond is formed (1.818 Å) and the C7-C8 bond is elongated (1.532 Å), confirming the sp<sup>3</sup> hybridization of the two carbon atoms involved. The exothermicity is evaluated to be 10.8 kcal mol<sup>−1</sup> (5.8 kcal mol<sup>−1</sup> in the QM cluster). Mechanism **B** (**Figure 5**) involves the participation of the Tyr7 residue. **Figure 8** reports the optimized structure of **TSB**, which connects **EI** and the covalent final complex (**P**). The nucleophilic attack on C8 is carried out by the GSH thiolate (1.914 Å), since the hydrogen of the S-H group of GSH (2.019 Å) has been transferred to the side-chain oxygen (O<sub>Tyr</sub>) of Tyr7 (1.090 Å). In fact, the OH group of Tyr7, which in **EI** is hydrogen-bonded to the carboxylate of the inhibitor (1.599 Å), in the TS points toward the C7 atom to deliver its own hydrogen atom (O-H and H-C7 distances of 1.310 and 1.174 Å, respectively), while the C7-C8 bond is elongated (1.535 Å) (see **Figure 8**). The TS located along mechanism **B** lies 21.5 kcal mol<sup>−1</sup> (QM) and 19.8 kcal mol<sup>−1</sup> (QM/MM) above **EI**. Both values are very close to the available experimental value for C-S bond formation (23.8 kcal mol<sup>−1</sup>; Huskey et al., 1991). The superposition of our optimized glutathione-conjugated product **P** on the corresponding crystallographic structure (Harshbarger et al., 2017; see **Figure 9**) gives good RMSD values in both the GSH and H-site regions.
An exothermicity of 10.8 kcal mol<sup>−1</sup> means that the reverse reaction is accessible but much slower, also because of the high barrier of the reverse process **P** → **EI** (30.6 kcal mol<sup>−1</sup>). **TSB** shows that formation of the S-C8 bond is tightly coupled to deprotonation of the SH group of GSH, with Tyr7 acting as a proton shuttle and consequently lowering the barrier (19.8 kcal mol<sup>−1</sup>).
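To put the computed barriers in kinetic terms, the Eyring equation of transition state theory can be applied. The sketch below is illustrative only: it treats the reported barriers (which, as noted above, do not include entropic terms) as if they were activation free energies, assumes a transmission coefficient of 1, and uses 298.15 K unless another temperature is passed.

```python
import math

KB = 1.380649e-23                 # Boltzmann constant, J/K
H = 6.62607015e-34                # Planck constant, J s
R_KCAL = 8.314462618 / 4184.0     # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal, T=298.15):
    """First-order rate constant (s^-1): k = (kB*T/h) * exp(-barrier / (R*T))."""
    return KB * T / H * math.exp(-barrier_kcal / (R_KCAL * T))


def half_life(barrier_kcal, T=298.15):
    """Half-life (s) of a first-order process with the given barrier."""
    return math.log(2.0) / eyring_rate(barrier_kcal, T)


# Barriers quoted in the text (kcal/mol, QM/MM): TSB forward, P -> EI reverse, TSA
for label, barrier in [("TSB forward", 19.8), ("P -> EI reverse", 30.6), ("TSA", 64.4)]:
    print(f"{label}: k = {eyring_rate(barrier):.3e} s^-1, t1/2 = {half_life(barrier):.3e} s")
```

Because the rate depends exponentially on the barrier, the 10.8 kcal mol<sup>−1</sup> gap between the forward and reverse barriers translates into a ratio of rate constants of exp(10.8/RT), which is why the reverse reaction is expected to be much slower.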

This is in agreement with previous work on other GST enzymes (Zheng and Ornstein, 1997; Angelucci et al., 2005; Dourado et al., 2008), which revealed the importance of the acidic properties of a Tyr residue in glutathione S-transferase catalysis. Furthermore, our findings corroborate the hypothesis advanced in the earlier structural analysis (Harshbarger et al., 2017), in which no covalent bond between hPL and GSTP1 was observed, that PL acts as a prodrug. To evaluate the nature of the interactions inside the catalytic pocket during the process, **Figure 10** reports the isosurfaces obtained from the NCI analysis, showing the contributions of the different residues included in the QM region.

In every characterized stationary point, salt bridges can be seen between the side chains of Lys44 and Arg13 and the carboxylate groups at the C- and N-terminal ends of GSH, respectively (blue isosurfaces, indicating strong attractive interactions). The red isosurfaces correspond to repulsive interactions at the centers of the π systems of Tyr7, Tyr108, and the inhibitor, where, as is usual for aromatic systems, they indicate strong non-bonded overlap (Johnson et al., 2010). Further information comes from the green regions, indicative of the van der Waals forces in the cavity that hosts the inhibitor molecule, which is lined by the hydrophobic residues Tyr108 and Ile104. Notably, the interaction involving Ile104 becomes stronger as the reaction proceeds. In contrast, the NBO analysis (see **Table S3**) reveals no relevant differences, apart from a slightly increased nucleophilicity of the GSH sulfur atom in the enzyme and a decreased negative charge on the C7-C8 bond of hPL relative to the corresponding values for the process unassisted by the enzyme. In **TSB**, a more attractive interaction appears near the region where the chemical events take place (circled in red in **Figure 10**), symptomatic of S-C8 bond formation. Finally, the interatomic distances between the residues of the QM region and GSH and hPL along the mechanism, reported in **Table S4**, show that no significant changes occur in the catalytic pocket.
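The NCI classification used above relies on the reduced density gradient, s, and on the sign of the second eigenvalue (λ2) of the electron-density Hessian multiplied by the density. The sketch below is not NCIPLOT code: it assumes the density, its gradient and its Hessian at a grid point are already available in atomic units, and the ±0.01 a.u. cutoff separating the color regions is a hypothetical choice.

```python
import math
import numpy as np


def reduced_density_gradient(rho, grad_rho):
    """s = |grad rho| / (2 (3 pi^2)^(1/3) rho^(4/3)), all in atomic units."""
    grad_norm = np.linalg.norm(np.asarray(grad_rho, dtype=float), axis=-1)
    rho = np.asarray(rho, dtype=float)
    return grad_norm / (2.0 * (3.0 * math.pi ** 2) ** (1.0 / 3.0) * rho ** (4.0 / 3.0))


def classify_nci(rho, hessian, cutoff=0.01):
    """Label an interaction from sign(lambda2) * rho (hypothetical cutoff in a.u.)."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))  # ascending order
    value = np.sign(eigenvalues[1]) * rho
    if value < -cutoff:
        return "strong attraction (blue)"
    if value > cutoff:
        return "repulsion / steric clash (red)"
    return "van der Waals (green)"
```

In an NCI plot, isosurfaces are drawn where s is small in low-density regions, and each surface point is then colored according to sign(λ2)ρ, as `classify_nci` does for a single point.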


### CONCLUSION

This study focuses on the inhibition mechanism of glutathione S-transferase Pi 1 by the hydrolyzed product of piperlongumine. We propose a mechanism based on the most recent experimental evidence, taking into account in particular the role of Tyr7 in the complex formation between glutathione and the inhibitor inside the catalytic pocket of the enzyme.

The hydrolysis of PL to give hPL was considered under neutral and acidic conditions; the latter provided the energetically more favorable path.

The agreement between the QM cluster approach and the computationally more demanding hybrid QM/MM method is quite good. The computed structural and energetic properties are in line with the available experimental data.

The lowest-energy mechanism for the reaction of hPL with GSH is the one in which the Tyr7 residue deprotonates GSH and, in a concerted fashion, donates a proton to the C7 atom of the substrate. The computed barrier heights are 19.8 and 21.5 kcal mol<sup>−1</sup> with the QM/MM and QM models, respectively. Both approaches clearly identify the same mechanism, via TSB, as the preferred one, with barrier heights differing by only 1.7 kcal mol<sup>−1</sup>, and both predict an exergonic reaction.

We hope that the new insights obtained into the mechanism of human GSTP1 inhibition by the natural product piperlongumine will be useful in the design of new, more selective and more potent inhibitors.

### AUTHOR CONTRIBUTIONS

MP, TM, and NR analyzed the results and contributed equally to editing and reviewing the manuscript. MP, TM, and NR approved it for publication.

### REFERENCES


### FUNDING

Financial support from the Università degli Studi della Calabria-Dipartimento di Chimica e Tecnologie Chimiche (CTC) is acknowledged.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2018.00606/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer KS and handling Editor declared their shared affiliation.

Copyright © 2018 Prejanò, Marino and Russo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Computational Understanding of the Selectivities in Metalloenzymes

Wen-Jie Wei, Hui-Xia Qian, Wen-Juan Wang and Rong-Zhen Liao\*

*Key Laboratory of Material Chemistry for Energy Conversion and Storage, Ministry of Education, Hubei Key Laboratory of Bioinorganic Chemistry and Materia Medica, Hubei Key Laboratory of Materials Chemistry and Service Failure, School of Chemistry and Chemical Engineering, Huazhong University of Science and Technology, Wuhan, China*

Metalloenzymes catalyze many different types of biological reactions with high efficiency and remarkable selectivity. The quantum chemical cluster approach and combined quantum mechanics/molecular mechanics methods have proven very successful in elucidating reaction mechanisms and rationalizing selectivities in enzymes. In this review, recent progress in the computational understanding of various selectivities in metalloenzymes, including chemoselectivity, regioselectivity, and stereoselectivity, is discussed.

Keywords: QM, QM/MM, metalloenzyme, selectivity, reaction mechanism

#### Edited by:

*Vicent Moliner, Universitat Jaume I, Spain*

#### Reviewed by:

*Jordi Poater, University of Barcelona, Spain*

*Lung Wa Chung, Southern University of Science and Technology, China*

> \*Correspondence: *Rong-Zhen Liao rongzhen@hust.edu.cn*

#### Specialty section:

*This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry*

Received: *22 October 2018* Accepted: *07 December 2018* Published: *21 December 2018*

#### Citation:

*Wei W-J, Qian H-X, Wang W-J and Liao R-Z (2018) Computational Understanding of the Selectivities in Metalloenzymes. Front. Chem. 6:638. doi: 10.3389/fchem.2018.00638*

### INTRODUCTION

Enzymes play a vital role in many biological processes, such as cell growth, food metabolism, signaling, regulation, energy transduction, and genetic translation (Bugg, 2001). More than 6,000 biochemical reactions are known to be catalyzed by enzymes (Schomburg et al., 2017), about one-third of which are catalyzed by metalloenzymes (Ragsdale, 2006). A particular advantage of enzymes in biotransformations is their ability to accelerate chemical reactions by 7 to 19 orders of magnitude under mild reaction conditions (Wolfenden and Snider, 2001). In addition, enzyme catalysis is generally associated with exquisite selectivity, including chemoselectivity, regioselectivity, and stereoselectivity. However, some enzymes also accept unnatural substrates and display promiscuity (Khersonsky and Tawfik, 2010), a property that has been proposed to be directly connected to enzyme evolution (Jensen, 1976). An enzyme can also be engineered to accept non-natural substrates with high efficiency and selectivity. For example, Kan et al. applied directed evolution to expand the catalytic function of cytochrome c, so that the engineered enzyme elegantly mediates C-Si bond formation with a broad substrate scope as well as high chemoselectivity and enantioselectivity (Kan et al., 2016).

Understanding reaction mechanisms and selectivities in enzymes is of both fundamental and practical importance. Various experimental techniques have been developed to address many aspects of these two questions, including X-ray structure analysis, NMR, kinetic analysis, site-directed mutagenesis, isotope labeling, and spectroscopic methods. With the continuous increase in computer power, computational chemistry has become a crucial complement to experimental methods in the study of enzyme catalysis (Martí et al., 2004; Bruice, 2006; Gao et al., 2006; Warshel et al., 2006; Dal Peraro et al., 2007; Senn and Thiel, 2007a, 2009; Ramos and Fernandes, 2008; Lonsdale et al., 2010, 2012; Siegbahn and Himo, 2011; Rovira, 2013; Blomberg et al., 2014; Merz, 2014; Swiderek et al., 2014; Brunk and Rothlisberger, 2015; Quesne et al., 2016; Sousa et al., 2017; Ahmadi et al., 2018; Cerqueira et al., 2018). Theoretical calculations allow different reaction pathways to be analyzed and the structures of transition states and intermediates to be obtained, which is very difficult with experimental tools. A wealth of new insights can thus be gained; in particular, selectivities and other important mechanistic aspects, such as kinetic isotope effects (Gao and Truhlar, 2002; Truhlar et al., 2002; Hammes-Schiffer, 2006; Pu et al., 2006), nuclear tunneling (Gao and Truhlar, 2002; Truhlar et al., 2002; Hammes-Schiffer, 2006; Pu et al., 2006), and dynamic effects (Villa and Warshel, 2001; Gao and Truhlar, 2002; Truhlar et al., 2002; Antoniou et al., 2006; Hammes-Schiffer, 2006; Olsson et al., 2006; Pu et al., 2006), can be rationalized.

In general, there are two popular approaches to the modeling of enzymes. The first is the quantum chemical cluster method, developed by Siegbahn, Blomberg, Himo, et al. (Siegbahn and Blomberg, 1999, 2000; Himo and Siegbahn, 2003; Noodleman et al., 2004; Siegbahn and Borowski, 2006; Siegbahn and Himo, 2009, 2011; Blomberg et al., 2014; Himo, 2017). In this approach, which has been developed over more than 20 years, a relatively small model of the active site is able to capture the key features of catalysis. Initially, cluster models of 30–50 atoms were commonly used, whereas cluster models of more than 200 atoms are routinely handled nowadays. The second approach is the combined quantum mechanics/molecular mechanics (QM/MM) method, first proposed by Warshel and Levitt (1976). In this case, the whole solvated enzyme is typically used as the model. Two QM/MM protocols are commonly applied: geometry optimizations with electronic energy calculations, starting from the X-ray structure or from selected snapshots of a classical molecular dynamics trajectory, and free energy calculations from QM/MM molecular dynamics simulations (Martí et al., 2004; Bruice, 2006; Gao et al., 2006; Warshel et al., 2006; Dal Peraro et al., 2007; Senn and Thiel, 2007a, 2009; Ramos and Fernandes, 2008; Lonsdale et al., 2010, 2012; Rovira, 2013; Merz, 2014; Swiderek et al., 2014; Brunk and Rothlisberger, 2015; Quesne et al., 2016; Sousa et al., 2017; Ahmadi et al., 2018; Cerqueira et al., 2018). Liao and Thiel have shown that, with a proper selection and a sufficiently large QM region, the cluster approach and the QM/MM geometry optimization protocol give similar results and lead to the same conclusions (Liao and Thiel, 2012, 2013a).

In this review, the main focus is on the selectivities of metalloenzymes, for which elucidation of the enzymatic mechanism is a prerequisite. Reproducing the observed selectivity can be considered further support for the suggested mechanism. Since not all computational studies in the literature can be discussed, a number of illustrative examples have been chosen for each type of metalloenzyme, covering Mn, Fe, Co, Ni, Zn, Mo, and W. Recently, de Visser presented an excellent summary of the substrate selectivity of non-heme iron dioxygenases (de Visser, 2018).

### METHODS AND MODELS

In the quantum chemical cluster modeling of metalloenzymes, a model of the active site is designed, typically containing the metal ions with their first-shell ligands and some important second-shell ligands. The whole model can be treated at the highest possible level, mainly hybrid density functional methods with very large basis sets.

The protein environment has two major effects, steric and electrostatic, which can be accounted for in a simple but effective manner. The steric effects imposed by the protein matrix are taken into account using the coordinate-locking scheme, in which certain atoms are fixed at their X-ray structure positions. The general practice is to lock atoms at the periphery of the model, typically the α-carbon atoms of the residues where the truncation is introduced. In some cases, one or two hydrogen atoms attached to each α-carbon atom along the backbone are also fixed to avoid unrealistic rotation of the residues. Because of the coordinate-locking procedure, the accuracy of the results from cluster calculations may depend on the quality (or the coordinate error) of the crystal structure used. With acetylene hydratase as an example, Liao and Thiel have demonstrated that a coordinate error of 0.1 Å for the backbone atoms, corresponding to a resolution of about 2.0 Å for the crystal structure, results in tolerable errors (about 1 kcal/mol for relative energies and 0.01 Å for bond distances) for cluster models with about 100 atoms (Liao and Thiel, 2013b). Larger models with more flexibility may be needed to obtain reliable results when starting from a crystal structure with a lower resolution and a larger coordinate error. For residues that participate directly in the catalyzed reaction, it is advisable to lock atoms farther away than the α-carbon.

The electrostatic interactions between the model and its surroundings are treated using continuum solvation models, such as the SMD method (Marenich et al., 2009), typically with a dielectric constant of 4 (Blomberg et al., 2014). For most enzymatic reactions, in which the total charge of the model does not change during the reaction, the choice of the dielectric constant has been shown to be unimportant once the cluster model reaches a size of around 200 atoms. This behavior has been observed for four electrostatically challenging examples, namely 4-oxalocrotonate tautomerase (Sevastik and Himo, 2007), haloalcohol dehalogenase (Hopmann and Himo, 2008), histone lysine methyltransferase (Georgieva and Himo, 2010), and aspartate decarboxylase (Liao et al., 2011a). However, when the total charge of the model changes, as in calculations of redox potentials and pKa values, the solvation effects become very large and also sensitive to the choice of the dielectric constant (Liao et al., 2015, 2016). Errors of 5–10 kcal/mol may be expected, which are too large to be ignored. To address this, Siegbahn developed an empirical method in the study of the oxygen-evolving complex in photosystem II, in which a single parameter, derived by fitting the available experimental data, is used to determine redox potentials and pKa values (Siegbahn, 2013). This approach has also been successfully applied to cytochrome c oxidase (Blomberg and Siegbahn, 2010), nitric oxide reductase (Blomberg and Siegbahn, 2013), and nitrogenase (Siegbahn, 2017).
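Why the dielectric constant matters much more when the charge changes can be illustrated with the Born model of a charged sphere. This is a deliberately crude stand-in, not the SMD model named above; the sphere radius is a hypothetical parameter, and 332.0637 kcal Å/(mol e²) is the Coulomb constant in these units.

```python
COULOMB_KCAL = 332.0637  # kcal Å / (mol e^2)


def born_solvation(charge, radius, epsilon):
    """Born solvation free energy (kcal/mol) of a sphere with charge (e) and radius (Å)."""
    return -0.5 * COULOMB_KCAL * charge ** 2 / radius * (1.0 - 1.0 / epsilon)


def epsilon_sensitivity(charge, radius, eps_low=4.0, eps_high=80.0):
    """Change in Born solvation energy when epsilon goes from eps_low to eps_high."""
    return born_solvation(charge, radius, eps_high) - born_solvation(charge, radius, eps_low)


# A unit charge on a hypothetical 3 Å sphere versus a neutral one
print(epsilon_sensitivity(1.0, 3.0))   # large in magnitude for a charged model
print(epsilon_sensitivity(0.0, 3.0))   # zero for a neutral model
```

When two states carry the same net charge, their Born terms largely cancel in an energy difference, which mirrors the insensitivity described above for charge-conserving reactions; when the charge changes, no such cancellation occurs.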

The most popular functional applied to metalloenzymes is B3LYP (Becke, 1993) (with 20% HF exchange), for which the major improvement has been the addition of the empirical D3 dispersion correction proposed by Grimme (termed B3LYP-D3) (Grimme et al., 2010). Other functionals, such as B3LYP<sup>∗</sup>-D3 (with 15% HF exchange; Reiher et al., 2001), TPSSh-D3 (Staroverov et al., 2003), and the M06 series (Zhao and Truhlar, 2008), can be applied to examine the sensitivity of the results to the choice of functional. Geometry optimizations are usually carried out using a double-zeta quality basis set with effective core potentials on the metals, although all-electron basis sets may also be used. The final energies are then estimated by single-point calculations with larger basis sets. Analytic harmonic frequencies are computed at the same level of theory as the geometry optimizations. Because of the coordinate-locking scheme, a number of small imaginary frequencies are obtained, which makes entropy effects difficult to estimate. Therefore, only the zero-point energy corrections are included in the final energies.
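The practical consequence of the coordinate-locking scheme for the frequency analysis can be sketched as follows: small spurious imaginary modes are tolerated when classifying a stationary point, and the ZPE is summed over real modes only. The 50 cm<sup>−1</sup> tolerance is a hypothetical choice, and imaginary frequencies are assumed to be given as negative numbers, as in Gaussian output.

```python
CM1_TO_KCAL = 2.859144e-3  # 1 cm^-1 expressed in kcal/mol (h*c*N_A)


def classify_stationary_point(wavenumbers, tolerance=50.0):
    """Classify a stationary point from its harmonic wavenumbers (cm^-1).

    Imaginary modes are given as negative numbers; those weaker than
    `tolerance` are treated as artifacts of the locked coordinates.
    Returns (label, number_of_small_imaginary_modes).
    """
    large = [w for w in wavenumbers if w < -tolerance]
    small = [w for w in wavenumbers if -tolerance <= w < 0.0]
    if not large:
        label = "minimum"
    elif len(large) == 1:
        label = "first-order saddle point"
    else:
        label = f"higher-order saddle point ({len(large)} imaginary modes)"
    return label, len(small)


def zero_point_energy(wavenumbers):
    """ZPE (kcal/mol) = 1/2 * sum(h*c*w) over real modes only."""
    return 0.5 * CM1_TO_KCAL * sum(w for w in wavenumbers if w > 0.0)
```

Reporting the number of small imaginary modes alongside the label keeps track of how strongly the locking affects each structure, which is also why entropy estimates from these frequencies are unreliable.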

In the alternative QM/MM methodology, the protein surroundings, together with the water solvent and counter-ions, are treated at the classical molecular mechanics level. Depending on how the coupling between the QM and MM parts is performed, QM/MM methods can be divided into two classes: additive methods and subtractive methods. In the additive QM/MM method, the total energy of the system is calculated as the sum of the individual energies of the QM and MM parts (two independent calculations), plus a QM/MM coupling term (Equation 1):

$$E_{\mathrm{QM/MM,\,full\ system}} = E_{\mathrm{QM,\,inner\ layer}} + E_{\mathrm{MM,\,outer\ layer}} + E_{\mathrm{QM/MM,\,coupling}} \tag{1}$$

In the subtractive QM/MM method, different parts are subjected to independent calculations at different levels of theory, and the total energy of the system is then obtained by additions and subtractions. Taking a typical two-layered system as an example, the energy of the whole system is calculated at the MM level, followed by energy calculations of the inner layer at both the QM level and the MM level. The final total energy is then expressed as Equation 2:

$$\text{E}\_{\text{QM/MM,fullsystem}} = \text{E}\_{\text{MM,fullsystem}} - \text{E}\_{\text{MM,innerlayer}} + \text{E}\_{\text{QM,innerlayer}} \tag{2}$$

The "our own n-layered integrated MO and MM" method (ONIOM), developed by Morokuma and co-workers, is nowadays a very popular subtractive QM/MM method (Chung et al., 2012).

The boundary problem, which arises when covalent bonds cross the boundary between the QM and MM parts, is commonly handled with one of two strategies, namely the frozen orbital approach (Thery et al., 1994) and the link atom (typically hydrogen) approach (Singh and Kollman, 1986).
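In the link-atom approach mentioned above, the dangling valence of the QM boundary atom Q is capped with a hydrogen placed on the line from Q to the MM boundary atom M. One common placement rule is sketched below; the scaling factor of 1.09/1.53 (roughly a C-H over a C-C bond length) is an illustrative assumption, not a parameter taken from any particular program.

```python
def place_link_atom(q_pos, m_pos, ratio=1.09 / 1.53):
    """Return the position of a hydrogen link atom on the Q->M bond vector.

    R_link = R_Q + ratio * (R_M - R_Q); `ratio` is an assumed, illustrative
    value (roughly d(C-H) / d(C-C)), not a parameter from any specific program.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    return tuple(q + ratio * (m - q) for q, m in zip(q_pos, m_pos))


if __name__ == "__main__":
    q_atom = (0.0, 0.0, 0.0)   # QM boundary carbon (e.g. C-beta of a side chain)
    m_atom = (1.53, 0.0, 0.0)  # MM boundary carbon (e.g. C-alpha)
    print(place_link_atom(q_atom, m_atom))  # about 1.09 Å from q_atom
```

Tying the link atom to the Q and M positions in this way means it carries no independent degrees of freedom during geometry optimization.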

The interactions between the QM and MM regions can be broadly divided into two parts, namely van der Waals interactions and electrostatic interactions. The van der Waals interactions are usually described by an empirical Lennard-Jones potential. The electrostatic interactions can be treated at three levels of sophistication, namely mechanical embedding, electrostatic embedding, and polarized embedding. In the mechanical-embedding scheme, the QM–MM electrostatic interactions are treated in the same way as the MM–MM electrostatics, at the classical–classical level; the electrostatic effect of the MM environment on the QM region is thus neglected, and the QM density is not polarized. In the electrostatic-embedding scheme, the MM point charges are incorporated as one-electron terms in the QM Hamiltonian. The QM–MM electrostatic interactions are therefore treated at the quantum–classical level, and the QM density is polarized by the MM environment. In the polarized-embedding scheme, a flexible MM charge model is introduced, which is polarized by the QM charge distribution; the mutual polarization of the QM and MM regions can then be treated self-consistently. Electrostatic embedding is still the most popular choice, owing to its good balance between accuracy and efficiency.

Because of the artificial partitioning into QM and MM regions, none of these schemes can describe charge transfer between the two regions. A simple and straightforward way to minimize the errors arising from the force-field treatment of the MM region is to enlarge the QM region by including appropriately chosen residues around the active site. The simplest procedure is to select residues using a distance criterion.
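The distance criterion can be sketched as follows, assuming residue coordinates have already been parsed from the structure file (parsing is omitted). The cutoff, residue labels, and coordinates in the example are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def select_qm_residues(
    residues: Dict[str, Sequence[Coord]],
    reference_atoms: Sequence[Coord],
    cutoff: float,
) -> List[str]:
    """Return IDs of residues with any atom within `cutoff` Å of any reference atom.

    `reference_atoms` would typically be the substrate and metal atoms.
    """
    selected = []
    for res_id, atoms in residues.items():
        if any(math.dist(a, r) <= cutoff for a in atoms for r in reference_atoms):
            selected.append(res_id)
    return selected


if __name__ == "__main__":
    core = [(0.0, 0.0, 0.0)]  # e.g. a metal ion
    residues = {
        "HIS_A": [(2.1, 0.0, 0.0), (3.0, 1.0, 0.0)],
        "ASP_B": [(4.5, 2.0, 0.0)],
        "LEU_C": [(9.0, 9.0, 9.0)],
    }
    print(select_qm_residues(residues, core, cutoff=5.0))
```

Increasing `cutoff` stepwise yields a simple series of nested QM regions for the convergence tests discussed in the next paragraph.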
Alternatively, charge deletion analysis (Bash et al., 1991) has been suggested to identify the residues that have strong electrostatic effects on the relative energies along the reaction pathway (Liao and Thiel, 2012, 2013a). The two stationary points with the largest charge redistribution between them are selected for single-point calculations in which the point charges of each residue close to the active site are, in turn, set to zero. Special attention should be given to residues or groups carrying more than one positive or negative charge, for example a diphosphate group (Liao and Thiel, 2012). The change in the relative energy upon removal of the MM point charges of a given residue quantifies that residue's electrostatic contribution. The general idea is that if a residue causes a large change in the relative energy, it should be included in the QM region to obtain reliable results (Liao and Thiel, 2012, 2013a). Using this protocol, one can systematically increase the size of the QM region and check the convergence of the QM/MM energies. Various properties of enzymes and proteins have been investigated with respect to the enlargement of the QM region, for example NMR shielding (Flaig et al., 2012; Hartman et al., 2015), excitation energies (Isborn et al., 2012; Provorse et al., 2016; Milanese et al., 2017), barrier heights (Solt et al., 2009; Liao and Thiel, 2013a; Sadeghian et al., 2014; Jindal and Warshel, 2016; Kulik et al., 2016; Karelina and Kulik, 2017; Das et al., 2018), reaction energies (Fox et al., 2011; Hu et al., 2011, 2013; Roßbach and Ochsenfeld, 2017), and others (Karelina and Kulik, 2017; Morgenstern et al., 2017).
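The bookkeeping of charge deletion analysis can be sketched as below. The single-point energies themselves must come from an external QM/MM program; the residue labels, energies, and the 2 kcal/mol selection threshold in the example are hypothetical placeholders, not values from the cited studies.

```python
from typing import Dict, List, Tuple

# (E at stationary point A, E at stationary point B), in kcal/mol
Energies = Tuple[float, float]


def charge_deletion_analysis(
    full: Energies,
    deleted: Dict[str, Energies],
    threshold: float = 2.0,
) -> List[Tuple[str, float]]:
    """Rank residues by how much zeroing their MM charges shifts E_B - E_A.

    Returns (residue, shift) pairs with |shift| >= threshold, largest first.
    A positive shift means the relative energy rises when the residue's
    charges are removed, i.e. the residue stabilizes B relative to A.
    """
    reference = full[1] - full[0]
    shifts = {res: (e_b - e_a) - reference for res, (e_a, e_b) in deleted.items()}
    ranked = sorted(shifts.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(res, shift) for res, shift in ranked if abs(shift) >= threshold]


if __name__ == "__main__":
    full_system = (0.0, 15.0)  # hypothetical reactant and TS energies
    per_residue = {"ResA": (0.0, 20.0), "ResB": (0.0, 14.5), "ResC": (1.0, 12.0)}
    for res, shift in charge_deletion_analysis(full_system, per_residue):
        print(f"{res}: {shift:+.1f} kcal/mol -> candidate for the QM region")
```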

Two different strategies have been developed for QM/MM calculations. The first is to perform QM/MM geometry optimizations starting either directly from the X-ray structure (single conformation) or from selected snapshots of a classical molecular dynamics trajectory (multiple potential energy surfaces). In this case, one must ensure that all stationary points along a given reaction path belong to the same local minimum. To reduce the conformational complexity and to speed up the QM/MM calculations, an active region of about 1,000–1,500 atoms around the QM region can be optimized while the outer part remains fixed (Shaik et al., 2010). When large conformational changes occur during the reaction, especially in substrate binding and product release, QM/MM molecular dynamics or Monte Carlo simulations can be used together with standard free-energy methods, such as free-energy perturbation, umbrella sampling, and thermodynamic integration (Senn and Thiel, 2007b). QM/MM free-energy calculations, which often require long simulations to achieve convergence, are much more time-demanding than QM/MM geometry optimizations. As a compromise, a relatively small QM region and cheaper methods, such as semi-empirical methods or DFT with double-zeta quality basis sets, are commonly used. Importantly, Senn et al. have shown that, for local chemical events, the differences between QM/MM electronic energy profiles and free-energy profiles are quite small (Senn et al., 2009).
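Of the free-energy methods listed, free-energy perturbation is the simplest to write down. A minimal sketch of the Zwanzig (exponential averaging) estimator, ΔA = −k<sub>B</sub>T ln⟨exp(−ΔU/k<sub>B</sub>T)⟩<sub>A</sub>, assuming the energy differences ΔU = U<sub>B</sub> − U<sub>A</sub> (kcal/mol) have already been evaluated on configurations sampled from state A:

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def fep_zwanzig(delta_u, temperature=300.0):
    """Free-energy difference A_B - A_A (kcal/mol) by exponential averaging.

    delta_u: samples of U_B - U_A evaluated on configurations drawn from state A.
    A log-sum-exp shift keeps the exponentials from overflowing or underflowing.
    """
    if not delta_u:
        raise ValueError("need at least one sample")
    beta = 1.0 / (KB_KCAL * temperature)
    exponents = [-beta * du for du in delta_u]
    shift = max(exponents)
    log_mean = (
        shift
        + math.log(sum(math.exp(x - shift) for x in exponents))
        - math.log(len(exponents))
    )
    return -log_mean / beta


if __name__ == "__main__":
    # Hypothetical energy differences from a short trajectory of state A.
    samples = [1.2, 0.8, 1.5, 0.9, 2.1, 1.1]
    print(f"dA = {fep_zwanzig(samples):.2f} kcal/mol")
```

Exponential averaging converges poorly when the two states overlap weakly, which is one reason a transformation is usually split into many small windows.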

When performing QM/MM calculations on enzymes, considerable work has to be invested in setting up the system before the actual QM/MM calculations. First, MM parameters are needed for the whole system. These parameters may not be available for substrates, cofactors, and metals; such groups can be restrained to their X-ray positions to avoid the need to develop bonded parameters. However, non-bonded parameters, in particular atomic charges, should be developed at a reasonable level. The point charges are commonly derived from QM calculations on the selected molecules. The starting X-ray structure needs to be checked and revised prior to classical molecular mechanics simulations: missing hydrogen atoms should be added; the protonation states of all titratable residues (e.g., His, Asp, and Glu) should be assigned according to their pKa values and local hydrogen-bonding networks; and the orientations of the amide groups of Asn and Gln, as well as the imidazole ring of His, should be checked and flipped if needed. For pKa estimation, the empirical PROPKA algorithm (Olsson et al., 2011) is a suitable choice, although more rigorous results can be obtained with Poisson–Boltzmann or QM/MM methods. The enzyme is then solvated in a water box under periodic boundary conditions, or in a spherical water droplet of sufficiently large radius under a spherical boundary potential. Next, the whole system is neutralized either by adding counter-ions or by (de)protonating charged residues on the enzyme surface. After these preparations, constrained energy minimizations and classical molecular dynamics simulations can be performed, from which a number of snapshots can be selected as starting geometries for the subsequent QM/MM calculations.
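Once pKa estimates are available (for example from PROPKA), a first-pass protonation assignment at a given pH follows from the Henderson–Hasselbalch relation. The sketch below uses hypothetical residue labels and pKa values, and it ignores the hydrogen-bonding and His tautomer considerations mentioned above, which still need to be checked by hand.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of a site in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def assign_protonation(pkas: dict, ph: float = 7.0) -> dict:
    """Label a site 'protonated' if it is protonated more than half the time at this pH."""
    return {
        site: "protonated" if fraction_protonated(pka, ph) > 0.5 else "deprotonated"
        for site, pka in pkas.items()
    }


if __name__ == "__main__":
    # Hypothetical pKa estimates, e.g. as read from a PROPKA output file.
    estimates = {"ASP_41": 3.8, "GLU_88": 7.6, "HIS_120": 6.2}
    print(assign_protonation(estimates, ph=7.0))
```

Sites whose pKa lies within about one unit of the working pH are the ones that deserve closer inspection, since their assigned state is least certain.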
In QM/MM calculations, the choice of QM method is essentially the same as in the cluster approach, and standard biomolecular force fields, such as CHARMM (MacKerell et al., 1998), AMBER (Duan et al., 2003), and GROMOS (Schmid et al., 2011), are often used for the MM part. With the development of the highly efficient domain-based local pair natural orbital coupled cluster method with single, double, and perturbative triple excitations (DLPNO-CCSD(T)) by the Neese group (Neese et al., 2009), it is advisable to perform QM/MM single-point calculations at the DLPNO-CCSD(T) level with large basis sets to obtain more accurate energies (Bistoni et al., 2018).

Various sources of error can be envisioned in QM/MM calculations: the setup and starting conformation of the model system, the choice of the QM region, the choice of the QM and MM methods, the treatment of the QM/MM boundary, the entropy effects, and so on. Because of the many choices involved, different research groups are likely to obtain different numerical results, but in general they reach the same conclusions.

For the investigation of selectivity in enzymes, an important goal is to uncover its origin. For metalloenzymes, both the metals and the active-site residues may be involved in controlling the selectivity. A number of computational analysis tools can be used to identify the factors that dictate the selectivity, such as frontier molecular orbitals, Fukui functions, distortion/interaction analysis (Ess and Houk, 2008; Fernández and Bickelhaupt, 2014; Bickelhaupt and Houk, 2017), and residue contribution analysis (Bash et al., 1991; Karelina and Kulik, 2017).

### MODELING SELECTIVITIES IN METALLOENZYMES

### Mn-Dependent Enzymes

Two groups have investigated two different Mn-dependent enzymes that display selectivity. One is the non-redox enzyme FosA, which shows both regioselectivity and chemoselectivity (Rife et al., 2002), and the other is the redox-active enzyme QueD, which shows regioselectivity (Gopal et al., 2005).

FosA (Rife et al., 2002) is a manganese-dependent Fosfomycin resistance protein that inactivates the Fosfomycin antibiotic by catalyzing the nucleophilic addition of glutathione (GSH) to the epoxide ring (**Figure 1**) with high regioselectivity: the attack takes place exclusively at the C1 position. Interestingly, the uncatalyzed reaction in aqueous solution was shown to be essentially unselective (40% attack at C1 and 60% at C2) (Bernat et al., 1999). Another Fosfomycin resistance protein, FosX, mediates a nucleophilic attack of water on the epoxide ring (Fillgrove et al., 2003). This raises the question of why FosA does not catalyze the addition of a water molecule to Fosfomycin, even though the enzyme has an open active-site pocket for water binding. FosA is thus both regioselective and chemoselective (Bernat et al., 1998).

Liao and Thiel have performed QM/MM calculations with quite a large QM region of 170 atoms to elucidate the reaction mechanism and selectivities of FosA (Liao and Thiel, 2013c). The uncatalyzed reaction was first considered using a model consisting of Fosfomycin, methanethiol as a model for GSH, and two water molecules. Importantly, the calculations showed that the attacks on C1 and C2 both take place in a concerted step with very similar barriers of around 30 kcal/mol, which explains the experimental observation for the uncatalyzed reaction (Bernat et al., 1999). In FosA, the reaction starts with a proton transfer from the GSH thiol group to the anionic second-shell residue Tyr39, followed by the nucleophilic attack of the GSH thiolate on the epoxide, which opens the epoxide ring. The second step was suggested to be rate-limiting, with a barrier of 9.1 kcal/mol for the attack on C1 and 18.0 kcal/mol for the attack on C2. A distortion/interaction analysis (Ess and Houk, 2008; Fernández and Bickelhaupt, 2014; Bickelhaupt and Houk, 2017) was applied to understand the origin of the regioselectivity. It showed that the distortion energy for **TS2**<sub>C2</sub> (27.1 kcal/mol) is much larger than that for **TS2**<sub>C1</sub> (20.0 kcal/mol), while the interaction energies are quite similar (−16.7 kcal/mol for **TS2**<sub>C2</sub> vs. −14.4 kcal/mol for **TS2**<sub>C1</sub>). These results suggest that the regioselectivity of the GSH attack is mainly distortion-controlled.
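The arithmetic behind this conclusion can be made explicit. In the distortion/interaction (activation strain) model, each barrier relative to the separated, undistorted fragments is the sum of a distortion term and an interaction term, so the C2-versus-C1 difference can be split term by term. The sketch below uses only the energies quoted above; because those sums refer to the separated fragments, they are not necessarily the same quantities as the 9.1 and 18.0 kcal/mol barriers.

```python
from typing import NamedTuple, Tuple


class StrainTerms(NamedTuple):
    distortion: float   # kcal/mol, energy to deform the fragments to their TS geometries
    interaction: float  # kcal/mol, interaction between the deformed fragments


def compare_transition_states(ts_ref: StrainTerms, ts_alt: StrainTerms) -> Tuple[float, float, float]:
    """Return (d_distortion, d_interaction, d_total) for ts_alt minus ts_ref."""
    d_dist = ts_alt.distortion - ts_ref.distortion
    d_int = ts_alt.interaction - ts_ref.interaction
    return d_dist, d_int, d_dist + d_int


if __name__ == "__main__":
    ts2_c1 = StrainTerms(distortion=20.0, interaction=-14.4)
    ts2_c2 = StrainTerms(distortion=27.1, interaction=-16.7)
    d_dist, d_int, d_total = compare_transition_states(ts2_c1, ts2_c2)
    print(f"distortion {d_dist:+.1f}, interaction {d_int:+.1f}, total {d_total:+.1f} kcal/mol")
```

With these values, distortion disfavors attack at C2 by 7.1 kcal/mol, while interaction favors it by 2.3 kcal/mol, leaving a net 4.8 kcal/mol in favor of C1; the distortion term dominates.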

To understand the chemoselectivity, the water attack pathway was also considered. Unlike the GSH attack, the deprotonation of the water substrate, the nucleophilic attack, and the ring opening of the epoxide proceed in a single concerted step. The energy barrier was found to be 8.3 kcal/mol higher than that of the GSH attack, and the attack on C2 is now preferred. However, the calculated barrier of 17.2 kcal/mol seems to be underestimated.

Quercetin 2,3-dioxygenase (QDO) from Bacillus subtilis is a metalloenzyme that can use different transition metal ions (Mn<sup>2+</sup>, Co<sup>2+</sup>, Fe<sup>2+</sup>, and Cu<sup>2+</sup>) to catalyze the dioxygenation of quercetin to 2-protocatechuoylphloroglucinol carboxylic acid and carbon monoxide (Gopal et al., 2005). Interestingly, Mn-QDO shows nitroxygenase activity in the presence of HNO with high regioselectivity, the sole product being 2-((3,4-dihydroxyphenyl)(imino)methoxy)-4,6-dihydroxybenzoate (**Figure 2**). However, Fe-QDO and Co-QDO cannot catalyze the nitroxygenation reaction, implying a unique reactivity of Mn-QDO (Kumar et al., 2011).

Wojdyła and Borowski have performed quantum chemical cluster calculations to elucidate the reactivity and regiospecificity of Mn-QDO (Wojdyła and Borowski, 2016). The uncatalyzed reaction was first considered with a model containing a quercetin anion and an HNO molecule. The reaction starts with the formation of a van der Waals complex. Then, either the nitrogen atom (pathway I) or the oxygen atom (pathway II) of HNO can perform an electrophilic attack on the C2 atom of the quercetin anion. This is followed by the formation of a five-membered ring intermediate, from which ring cleavage takes place, coupled with the release of a CO molecule. The calculations showed that in pathway I the final step is rate-limiting, with a barrier of 14.7 kcal/mol, whereas in pathway II the rate-limiting step is the formation of the five-membered ring, with a barrier of 22.6 kcal/mol. In addition, a Fukui function analysis of the regioselectivity also favored pathway I.
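How the Fukui functions were evaluated in that study is not detailed here. A common finite-difference, condensed-to-atom variant uses atomic charges (for example Hirshfeld charges) computed for the N-, (N+1)-, and (N−1)-electron systems at a fixed geometry. In the sketch below, the atom labels and charges are hypothetical; a larger f⁺ on one atom of HNO would mark it as the more electrophilic site, i.e. the one more likely to attack the quercetin C2 atom.

```python
from typing import Dict, Tuple


def condensed_fukui(
    q_n: Dict[str, float],
    q_n_plus_1: Dict[str, float],
    q_n_minus_1: Dict[str, float],
) -> Dict[str, Tuple[float, float, float]]:
    """Condensed Fukui indices (f+, f-, f0) per atom from atomic charges.

    f+ = q(N) - q(N+1)   susceptibility to nucleophilic attack (electrophilic site)
    f- = q(N-1) - q(N)   susceptibility to electrophilic attack (nucleophilic site)
    f0 = (f+ + f-) / 2   susceptibility to radical attack
    """
    indices = {}
    for atom, q in q_n.items():
        f_plus = q - q_n_plus_1[atom]
        f_minus = q_n_minus_1[atom] - q
        indices[atom] = (f_plus, f_minus, 0.5 * (f_plus + f_minus))
    return indices


if __name__ == "__main__":
    # Hypothetical charges for the N and O atoms of HNO in three charge states.
    q_neutral = {"N": -0.10, "O": -0.30}
    q_anion = {"N": -0.40, "O": -0.45}   # N+1 electrons
    q_cation = {"N": 0.20, "O": -0.10}   # N-1 electrons
    for atom, (fp, fm, f0) in condensed_fukui(q_neutral, q_anion, q_cation).items():
        print(f"{atom}: f+={fp:.2f} f-={fm:.2f} f0={f0:.2f}")
```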

In the Mn-QDO enzyme, HNO can bind to the Mn<sup>2+</sup> ion either via its oxygen atom (2a, **Figure 2**, pathway A) or via its nitrogen atom (2b, **Figure 2**, pathway B). In pathway A, the nitrogen atom attacks the C2 atom of quercetin (2a→3a), which is followed by a conformational change (3a→4a) and five-membered ring formation (4a→5a). The last step, cleavage of the five-membered ring (5a→6a), is rate-limiting for the whole reaction, with a barrier of 17.9 kcal/mol. In pathway B, the barrier for the initial C2-O bond formation is more than 10 kcal/mol higher than that for the initial C2-N bond formation in pathway A. In addition, the rate-limiting step of pathway B was found to be a hydrogen-bond shift (3b→4b), with a barrier of 20.4 kcal/mol. The overall barrier for pathway B is thus 2.5 kcal/mol higher than that for pathway A, which explains the regioselectivity of the Mn-QDO-catalyzed nitroxygenation reaction.

To understand the metal selectivity, the substitution of Mn<sup>2+</sup> by Co<sup>2+</sup> and Fe<sup>2+</sup> in the enzyme active site was also considered. The calculations indicated that in both Co-QDO and Fe-QDO, HNO prefers to coordinate to the metal via its nitrogen atom: in both cases, the oxygen-coordinated complexes were more than 6 kcal/mol higher in energy than the nitrogen-coordinated complexes. Since the favored pathway A starts from the oxygen-coordinated complex, this strong preference for N-coordination over O-coordination explains why Co-QDO and Fe-QDO cannot catalyze the nitroxygenation reaction.

### Heme-Dependent Enzymes

Cytochrome P450 is a large superfamily of enzymes that catalyze many different types of oxidation reactions. The key part of the P450 active site is a heme group, and the catalytically active species is a high-valent Fe<sup>IV</sup>-oxo complex with a porphyrin radical cation, termed compound I (**Cpd I**, **Figure 3**). Owing to the great importance of P450, extraordinary efforts have been dedicated to mechanistic studies of these enzymes using computational tools (de Visser and Shaik, 2003; Shaik et al., 2010; Li et al., 2011, 2013; Oláh et al., 2011; Krámos et al., 2012; Blomberg et al., 2014; Dubey et al., 2016; Faponle et al., 2016). Some representative examples of selectivity in P450 enzymes investigated theoretically are presented here.

P450 2A6 plays an important role in the metabolism of nicotine in humans (Yamazaki et al., 1999). Two principal pathways have been observed for the oxidation of nicotine by P450 2A6, namely hydroxylation at the C5′ position and at the C<sup>α</sup> position (**Figure 3**; Jone et al., 1993). Experimentally, 95% of the product is cotinine (from C5′ hydroxylation; Murphy et al., 2005), indicating regioselectivity in this enzyme. In addition, C5′ hydroxylation can lead to either a trans or a cis product, and the major product is trans-5′-hydroxynicotine (Peterson et al., 1987), which implies stereoselectivity in P450 2A6.

Zhan et al. have performed MD simulations and QM/MM calculations to uncover the origin of the regioselectivity and stereoselectivity of nicotine oxidation by P450 2A6 (Li et al., 2011, 2013). The MD simulations identified six representative substrate binding modes, only three of which are suitable for 5′-hydroxylation; these are labeled CYP2A6-SR<sup>t</sup>, CYP2A6-SR<sup>c</sup>, and CYP2A6-SR<sup>H</sup>. From the MD simulations, the binding free energies of these three complexes were estimated to be −6.86, −5.42, and −2.04 kcal/mol, respectively. When the conformational distribution of the free substrate in solution was also taken into account, CYP2A6-SR<sup>t</sup> was found to be the major complex (95.4%), with CYP2A6-SR<sup>c</sup> accounting for 4.4%.

These two complexes were chosen to rationalize the regioselectivity and stereoselectivity with QM/MM calculations. The oxidation follows a typical Cpd I-mediated hydroxylation mechanism, namely C-H activation via hydrogen transfer from the substrate to the oxyl group, followed by rebound of the hydroxyl group to the substrate. For C5′ oxidation, the initial hydrogen abstraction is rate-limiting, and the barriers for the C5′-trans and C5′-cis hydroxylations are 14.1 and 14.4 kcal/mol, respectively. Such close barriers indicate that the two pathways compete, so the free-energy barriers alone cannot explain the stereoselectivity of P450 2A6. Instead, the relative populations of CYP2A6-SR<sup>t</sup> and CYP2A6-SR<sup>c</sup> play an important role in controlling the stereoselectivity. When this factor was considered, 97% of the product was predicted to be the trans-5′-hydroxylation product, in close agreement with the experimental value of 89–94% (Peterson et al., 1987). These results suggest that the stereoselectivity of P450 2A6 is primarily controlled by the substrate binding mode in the active site.
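One simple way to combine binding-mode populations with barriers, assuming the binding modes do not interconvert on the time scale of the reaction and each reacts with a conventional transition-state-theory rate, is to weight each channel by population × exp(−ΔG‡/RT). With the populations (95.4% and 4.4%) and barriers (14.1 and 14.4 kcal/mol) quoted above, at an assumed 298.15 K, this sketch gives about 97% trans product; whether the original study used exactly this expression is an assumption.

```python
import math
from typing import Dict, Tuple

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def product_fractions(
    channels: Dict[str, Tuple[float, float]], temperature: float = 298.15
) -> Dict[str, float]:
    """Product fractions from non-interconverting reactive binding modes.

    channels maps a label to (population, barrier in kcal/mol). Each channel's
    product flux is taken as population * exp(-barrier / RT); the common
    Eyring prefactor cancels on normalization.
    """
    rt = R_KCAL * temperature
    flux = {label: pop * math.exp(-barrier / rt) for label, (pop, barrier) in channels.items()}
    total = sum(flux.values())
    return {label: f / total for label, f in flux.items()}


if __name__ == "__main__":
    # Populations and barriers as quoted in the text for 5'-hydroxylation.
    fractions = product_fractions({"trans": (0.954, 14.1), "cis": (0.044, 14.4)})
    print({label: round(value, 3) for label, value in fractions.items()})
```

The 0.3 kcal/mol barrier difference alone would give only about 62% trans; the population term supplies most of the selectivity, which is the point made in the text.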

The N-methylhydroxylation follows a very similar mechanism. The initial hydrogen transfer was found to be rate-limiting, with a barrier of 15.5 kcal/mol for the CYP2A6-SR<sup>t</sup> complex and 18.0 kcal/mol for the CYP2A6-SR<sup>c</sup> complex. On the basis of conventional transition-state theory, the final phenomenological free-energy barriers for 5′-hydroxylation and N-methylhydroxylation were estimated to be 14.1 and 15.6 kcal/mol, respectively. 5′-Hydroxylation is therefore kinetically more favorable, corresponding to a regioselectivity of 93%, very close to the experimental value of 95% (Murphy et al., 2005).
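The conversion from barriers to a product ratio can be checked with conventional transition-state theory (the Eyring equation, with the transmission coefficient set to 1). For two competing first-order channels, the pre-exponential factors cancel, and the 1.5 kcal/mol difference between 14.1 and 15.6 kcal/mol gives about 93% at an assumed temperature of 298.15 K. A minimal sketch:

```python
import math

KB = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal: float, temperature: float = 298.15) -> float:
    """Conventional TST rate constant in s^-1 (transmission coefficient = 1)."""
    return KB * temperature / H * math.exp(-barrier_kcal / (R_KCAL * temperature))


def fraction_via_channel_a(barrier_a: float, barrier_b: float, temperature: float = 298.15) -> float:
    """Fraction of product formed through channel A when A and B compete irreversibly."""
    k_a = eyring_rate(barrier_a, temperature)
    k_b = eyring_rate(barrier_b, temperature)
    return k_a / (k_a + k_b)


if __name__ == "__main__":
    share = fraction_via_channel_a(14.1, 15.6)  # 5'-hydroxylation vs N-methylhydroxylation
    print(f"{100 * share:.0f}% 5'-hydroxylation")
```

Because the selectivity depends exponentially on the barrier difference, an error of only about 0.5 kcal/mol in either barrier would shift the predicted ratio noticeably.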

P450 2D6 is a human cytochrome P450 enzyme that metabolizes the cough suppressant dextromethorphan. Two possible pathways can be envisioned for the oxidation of dextromethorphan, namely O-demethylation and aromatic carbon hydroxylation (**Figure 4**). Interestingly, only the former is observed in the P450 2D6-catalyzed oxidation of dextromethorphan, even though aromatic carbon hydroxylation is a major route in the metabolism of other anisole derivatives (Ohi et al., 1992). P450 2D6 therefore exhibits chemoselectivity.

Oláh et al. have carried out both QM and QM/MM calculations to understand why the aromatic carbon oxidation route does not occur in the P450 2D6-catalyzed oxidation of dextromethorphan (Oláh et al., 2011). Both reaction pathways were considered in their study (**Figure 4**). For the QM calculations, a small model was used, in which dextromethorphan was represented by anisole and Cpd I by an unsubstituted heme ring with a ferryl-oxo group and a hydrogen sulfide ligand. In the O-demethylation pathway, the reaction starts with a hydrogen atom transfer from the methyl group of the substrate to the oxygen atom of Cpd I, followed by rebound of the hydroxyl group to the substrate. After release of the hemiacetal species, hydrolysis yields the phenolic product. The initial H-abstraction was found to be rate-limiting, with a barrier of only 9.8 kcal/mol. In the alternative aromatic carbon oxidation pathway, the oxyl group attacks the aromatic ring, followed by rearrangement of the tetrahedral intermediate to generate the phenol product. Here the rate-limiting step is the initial C-O bond formation, with a barrier of 14.2 kcal/mol. The QM calculations thus suggested that both pathways are viable, although the O-demethylation route is preferred.

The QM calculations with a small model do not properly account for the protein environment, which appears to be crucial for rationalizing the chemoselectivity. QM/MM calculations were therefore performed on the full enzyme model, using three snapshots taken from MD simulations. For the O-demethylation route, the energy barriers of the initial H-abstraction step for the three snapshots were 18.2, 19.0, and 19.7 kcal/mol, respectively, whereas for the aromatic carbon oxidation route, the barriers of the initial C-O bond formation step were 32.9, 33.5, and 36.7 kcal/mol. The high barrier for aromatic carbon oxidation safely rules out this pathway, in line with the experimental observation (Schmider et al., 1997). Compared with the QM calculations, the QM/MM calculations suggested that interactions between dextromethorphan and key active-site residues (Ser304, Ala305, Val308, and Thr309) constrain the movement of the aromatic ring, which increases the energy barrier for C-O bond formation. The enzyme environment therefore plays an important role in dictating the chemoselectivity of P450 2D6.

P450 OleTJE is a peroxygenase that converts long-chain fatty acids to terminal olefins, which can be used as biofuels (Dennig et al., 2015; Grant et al., 2015). Three pathways are possible in the oxidation of fatty acids, namely α-hydroxylation, β-hydroxylation, and decarboxylation (**Figure 5**). Faponle et al. have performed QM/MM calculations to understand the regioselectivity and chemoselectivity of this enzyme (Faponle et al., 2016).

The initial hydrogen abstraction takes place at either the Cα or the Cβ position of the substrate, with calculated barriers of 6.4 kcal/mol for TS<sub>HA,α</sub> and 6.3 kcal/mol for TS<sub>HA,β</sub>. This step generates a substrate radical and an iron(IV)-hydroxo species, labeled I<sub>H,α</sub> for α-hydroxylation and I<sub>H,β</sub> for β-hydroxylation. From I<sub>H,α</sub>, the rebound to form a C-OH bond is very facile, with a barrier of only 7.3 kcal/mol, whereas decarboxylation from I<sub>H,α</sub> has a barrier of over 40 kcal/mol, which is too high to be viable. From I<sub>H,β</sub>, however, both rebound and decarboxylation can take place, with calculated barriers of 9.3 and 6.7 kcal/mol, respectively. A valence bond model was used to analyze these two pathways. It showed that electron-withdrawing substituents at the Cα or Cβ position raise the rebound barrier, which in turn makes decarboxylation relatively easier.

P450 BM3 catalyzes the C-H hydroxylation of the fatty acid derivative N-palmitoylglycine (NPG) at the ω-1, ω-2, and ω-3 positions of the long chain, but not at the terminal ω position, even though the terminal ω-carbon is closer to the heme iron in all crystal structures of NPG-bound P450 BM3 (**Figure 6**; Haines et al., 2001). Shaik et al. have used MD and QM/MM approaches to investigate the regio- and stereoselectivity of P450 BM3 (Dubey et al., 2016). They first performed MD simulations on three different structures, namely the open form (the substrate-free enzyme), the closed form (the substrate-bound enzyme), and the enzyme with the substrate docked into the open form, to investigate the substrate binding process and to understand how the enzyme is prepared for catalysis. The MD simulations showed that substrate binding is accompanied by large conformational changes of the active site, and that some key residues play an important role in orienting the substrate so that it is exposed to the active oxidant.

To explain the regioselectivity, they performed MD simulations for two different states, namely the resting state and the Cpd I state. For the resting state, the simulations showed that initially the ω carbon is the closest to the heme iron, at a distance of 4–5 Å, while the ω-1, ω-2, and ω-3 carbons are 6–8 Å away. After 275 ns, however, the terminal ω carbon has moved to about 8–12 Å from the heme iron. A similar trend was observed for the Cpd I state. The MD simulations therefore indicated that the favorable positions for C-H hydroxylation are the ω-1, ω-2, and ω-3 sites, in good agreement with experimental observations (Cryle and De Voss, 2006). They also performed another MD simulation in which Phe87 was replaced by the smaller alanine. After 200 ns, the terminal ω position becomes the closest to Cpd I, while the ω-1, ω-2, and ω-3 positions move away from the oxidant. These results are in line with experiment, in which more than 90% ω-hydroxylation product is formed upon the Phe87Ala mutation (Dubey et al., 2016). Phe87 therefore imposes steric hindrance on the terminal ω-carbon and controls the regioselectivity of P450 BM3.
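Exactly how the trajectories were analyzed beyond the distances quoted is not stated here. A minimal sketch of one common summary, assuming per-frame iron-carbon distances have already been extracted from the trajectory, reports for each carbon the mean distance and the fraction of frames within a chosen cutoff. The 5.0 Å cutoff and the distance series in the example are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def proximity_summary(
    distances: Dict[str, Sequence[float]], cutoff: float = 5.0
) -> Dict[str, Tuple[float, float]]:
    """Return {site: (mean distance, fraction of frames with distance <= cutoff)}."""
    summary = {}
    for site, series in distances.items():
        if not series:
            raise ValueError(f"no frames for site {site}")
        within = sum(1 for d in series if d <= cutoff)
        summary[site] = (mean(series), within / len(series))
    return summary


if __name__ == "__main__":
    # Hypothetical Fe...C distances (Å) sampled every few nanoseconds.
    fe_c = {
        "omega": [4.5, 9.0, 10.0, 11.0],
        "omega-1": [5.5, 4.8, 4.9, 4.7],
    }
    for site, (avg, frac) in proximity_summary(fe_c).items():
        print(f"{site}: mean {avg:.2f} Å, {100 * frac:.0f}% of frames within cutoff")
```

Such a proximity fraction is only a geometric proxy; as the ω-3 case in the next paragraph shows, the barriers can still overturn a preference suggested by distances alone.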

For the ω-1, ω-2, and ω-3 hydroxylations, the initial H-abstraction is rate-limiting. At all three positions, either the R or the S product can be formed; experimentally, however, the R configuration is the major product (Dubey et al., 2016). For both the ω-1 and ω-2 positions, the MD simulations indicated that the pro-R C-H bond is closer to the ferryl-oxo moiety than the pro-S C-H bond, implying a preference for the R product. In addition, the QM/MM calculations showed that for the ω-1 position the barrier for pro-S H-abstraction is 6.3 kcal/mol higher than that for pro-R H-abstraction. For the ω-3 position, even though the pro-S C-H bond is closer to the ferryl-oxo moiety than the pro-R C-H bond, calculations in two different conformational basins (the major and the minor basin) showed that the barrier for pro-R H-abstraction is lower than that for pro-S H-abstraction (19.5 kcal/mol for TS<sub>major−R</sub> vs. 26.2 kcal/mol for TS<sub>major−S</sub>; 19.2 kcal/mol for TS<sub>minor−R</sub> vs. 21.7 kcal/mol for TS<sub>minor−S</sub>). These results suggest that, despite the initial proximity of the pro-S ω-3 C-H bond, the R product is still dominant.

### Non-heme-Dependent Enzymes

Non-heme-dependent enzymes are another superfamily of enzymes that participate in a large number of biological processes. Unlike cytochrome P450, these enzymes do not contain a porphyrin ligand in their active sites. Much computational work has also been done on the selectivity of these enzymes (Karamzadeh et al., 2010; Suardíaz et al., 2013, 2014; Saura et al., 2014; Liao and Siegbahn, 2015; Christian et al., 2016; Roy and Kästner, 2017; Timmins et al., 2017; Wojdyla and Borowski, 2018), and some representative cases are presented here.

Prolyl-4-hydroxylase (P4H) is a non-heme iron hydroxylase that mediates the hydroxylation of a proline residue in a peptide chain to R-4-hydroxyproline with both regioselectivity and stereoselectivity (**Figure 7**; Winter and Page, 2000). To rationalize the selectivities of this enzyme, the de Visser group has performed both QM cluster and QM/MM calculations (Karamzadeh et al., 2010; Timmins et al., 2017).

In the QM cluster calculations (Karamzadeh et al., 2010), quite a small model was first used, consisting of the Fe ion with its first-shell ligands and the substrate. The calculations showed that hydroxylation at the C5 position is thermochemically the most favorable, which is inconsistent with the experimental observation (Winter and Page, 2000). The cluster model was then enlarged by including some important second-shell residues. It was found that steric interactions imposed by Tyr140 and Trp243 increase the energy barriers for hydroxylation at the C3 and C5 positions, leading to the preference for hydroxylation at the C4 position.

More recently, they performed molecular dynamics simulations and QM/MM calculations to identify the origin of the regioselectivity and stereoselectivity of this enzyme (Timmins et al., 2017). Since their previous cluster calculations suggested that Tyr140 and Trp243 play an important role in the regioselectivity, both wild-type and mutant structures were considered. The rate-limiting step was found to be hydrogen atom abstraction by the oxygen atom of the iron(IV)-oxo species. Hydroxylation at the C4 position (TS<sub>HA,C4b</sub>) has the lowest energy barrier, 20.7 kcal/mol, compared with 21.7 and 50.0 kcal/mol for TS<sub>HA,C5b</sub> and TS<sub>HA,C3b</sub>, respectively. In the Tyr140Phe mutant, the hydrogen bond between Tyr140 and the ferryl-oxo moiety is lost, and the barrier for hydroxylation at the C4 position rises above 50 kcal/mol, suggesting that the mutant is inactive. When Tyr140 is replaced by Gly, the most favorable site becomes H4f, and hydroxylation at the C5 position also becomes feasible. These results demonstrated that the hydrogen bond between Tyr140 and the ferryl-oxo moiety dictates the regio- and stereoselectivity of the enzyme. For the Trp243Phe mutant, the QM/MM calculations showed that hydroxylation is most favorable at the H5b position, whereas for the Trp243Gly mutant, the energy barriers for hydroxylation at both the C4 and C5 positions exceed 30 kcal/mol. It was suggested that Trp243, along with Glu127 and Arg161, plays an important role in orienting the substrate in a proper configuration for hydroxylation.

Homoprotocatechuate 2,3-dioxygenase (HCPD) is a non-heme iron extradiol dioxygenase that catalyzes C-C bond cleavage and ring opening of catecholates with high regioselectivity (**Figure 8**; Kovaleva and Lipscomb, 2007).

Christian and Ye have performed QM cluster calculations to elucidate the regiospecificity of HCPD (Christian et al., 2016). The calculations showed that the reaction starts with attack of the superoxide on the substrate (Step 1), which can occur at either the C1 or the C2 position, with calculated barriers of 31.3 and 28.3 kJ/mol, respectively. The resulting intermediate Br<sub>C1</sub> is 28.4 kJ/mol higher in energy than Br<sub>C2</sub>. These results indicate that the selectivity of Step 1 is thermodynamically controlled. To further understand the factors that control the selectivity, different substrates were first tested. When the native substrate homoprotocatechuate (HPCA) was replaced by 2,3-dihydroxybenzoate (2,3-DHB), the energy difference between attack at C1 and at C2 is only 3.6 kJ/mol for the uncatalyzed reaction. Next, the influence of first-shell coordination was investigated. With a model containing only the first-shell ligands, Br<sub>C1</sub> is only 1.4 kJ/mol higher in energy than Br<sub>C2</sub>. Thus, neither the substrate itself nor the first-shell coordination controls the selectivity. Instead, the selectivity of Step 1 is mainly controlled by the second-shell residue Tyr257, which lowers the energies of the electron-accepting orbitals (the C2=O2 π<sup>∗</sup> orbitals) and thereby facilitates C2-O2 bond formation. In addition, His200 was found to facilitate C2-O2 bond formation through a geometric effect.

For Step 2, the influences of both the metal center and the coordination sphere were considered. Since formation of the C1-O bond is unfavorable, formation of the distal extradiol product can be ruled out. The calculations showed that three key second-shell residues (Tyr257, His200, and His248) dictate the selectivity for proximal extradiol over intradiol cleavage. In particular, Tyr257 has the largest electronic effect, while His200 and His248 exert both steric and electronic effects; together, they make the extradiol pathway more favorable.

Benzoyl-CoA epoxidase (BoxB) is a dinuclear iron enzyme that catalyzes the epoxidation of the aromatic ring of benzoyl-CoA (Rather et al., 2011). In principle, two types of aromatic oxidation are possible, namely hydroxylation and epoxidation. For BoxB, only epoxidation was observed experimentally, even though it is thermodynamically less favorable than hydroxylation. In addition, epoxidation could take place at three positions, namely the 1,2-, 2,3-, or 3,4-position (**Figure 9**), but only the 2,3-epoxide was obtained. Furthermore, the epoxide product has the (2S,3R) configuration, indicating that the reaction is also stereoselective (Rather et al., 2010).

To clarify the mechanism and the various selectivities of BoxB, Liao and Siegbahn performed density functional calculations on this enzyme with a large cluster model of 208 atoms (Liao and Siegbahn, 2015). The calculations suggested that upon binding of the dioxygen molecule, each ferrous ion delivers one electron to the dioxygen moiety, generating a peroxide that bridges the two ferric ions in a side-on symmetric fashion. The two high-spin ferric ions are antiferromagnetically coupled, giving a broken-symmetry singlet species. From this reactant, cleavage of the O-O bond was found to be concerted with the attack of one of the oxygen atoms on an aromatic carbon of the substrate. Interestingly, only transition states for epoxidation at the 2,3-position and the 2′,3′-position could be located. The barriers for TS1<sub>2S,3R</sub> and TS1<sub>2R,3S</sub> are 17.6 and 20.4 kcal/mol, respectively. Based on classical transition state theory, the energy difference of 2.8 kcal/mol corresponds to a selectivity of more than 99:1, which agrees with the experimental observation (Rather et al., 2010). In addition, a distortion/interaction analysis (Ess and Houk, 2008; Fernández and Bickelhaupt, 2014; Bickelhaupt and Houk, 2017) was used to understand the origin of the selectivity. The distortion energies are quite similar (24.3 kcal/mol for TS1<sub>2S,3R</sub> and 24.8 kcal/mol for TS1<sub>2R,3S</sub>), while the interaction energy for TS1<sub>2S,3R</sub> (18.3 kcal/mol) is somewhat smaller than that for TS1<sub>2R,3S</sub> (21.2 kcal/mol). Thus, the selectivity of BoxB was suggested to be mainly interaction-controlled.
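The link between a barrier difference and a product ratio can be checked directly. The Python sketch below is illustrative only: it assumes Curtin-Hammett conditions (all pathways drain a common reactant pool) and a temperature of 298.15 K, neither of which is stated explicitly in the text, and the function name `product_fractions` is hypothetical. At 298 K, every 1.36 kcal/mol of barrier difference corresponds to roughly a factor of 10 in rate.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def product_fractions(barriers_kcal, temperature=298.15):
    """Fraction of product formed through each competing transition state.

    Assumes Curtin-Hammett conditions: all pathways start from a common
    (or rapidly interconverting) reactant pool, so the relative rates depend
    only on the TS free energies, k_i proportional to exp(-G_i / RT).
    """
    rt = R_KCAL * temperature
    g_min = min(barriers_kcal)
    # Shift by the lowest barrier before exponentiating to avoid underflow.
    weights = [math.exp(-(g - g_min) / rt) for g in barriers_kcal]
    total = sum(weights)
    return [w / total for w in weights]


# BoxB barriers from the text: TS1(2S,3R) = 17.6, TS1(2R,3S) = 20.4 kcal/mol
fractions = product_fractions([17.6, 20.4])
print(f"(2S,3R): {fractions[0]:.1%}  (2R,3S): {fractions[1]:.1%}")
print(f"ratio ~ {fractions[0] / fractions[1]:.0f}:1")
```

With the BoxB barriers this gives a ratio of about 113:1, i.e., roughly 99.1% of the (2S,3R) epoxide, consistent with the >99:1 estimate. The function accepts any number of competing transition states.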

To understand the chemoselectivity, the isomerization of the epoxide to a phenol was also considered. C-O bond cleavage and deprotonation were found to proceed concertedly, with a barrier of 19.2 kcal/mol. This process was suggested to be slower than release of the epoxide product from the enzyme active site.

Later, Rokob used the ONIOM (B3LYP/BP86/Amber) method to re-investigate this enzyme (Rokob, 2016). Four different pathways were considered for the aromatic ring oxidation, namely electrophilic attack by a bis(µ-oxo)-diiron(IV) species, electrophilic attack via the σ<sup>∗</sup> orbital of a µ-η<sup>2</sup>:η<sup>2</sup>-peroxo-diiron(III) intermediate, radical attack via the π<sup>∗</sup> orbital of a superoxo-diiron(II,III) species, and radical attack by a partially quenched bis(µ-oxo)-diiron(IV) intermediate. Importantly, the most favorable pathway (barrier of 20.9 kcal/mol) was found to be very similar to the one obtained by Liao and Siegbahn (Liao and Siegbahn, 2015).
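As background, a three-layer ONIOM calculation such as B3LYP/BP86/Amber combines the energies of nested regions subtractively. In the standard ONIOM3 extrapolation (a generic expression, not a detail taken from the original study), with model ⊂ intermediate ⊂ real system:

$$
E_{\mathrm{ONIOM3}} = E_{\mathrm{B3LYP}}(\mathrm{model}) + E_{\mathrm{BP86}}(\mathrm{intermediate}) - E_{\mathrm{BP86}}(\mathrm{model}) + E_{\mathrm{Amber}}(\mathrm{real}) - E_{\mathrm{Amber}}(\mathrm{intermediate})
$$

Each subtraction removes the double counting of a region already treated at a higher level; how the atoms were partitioned among the three layers is not specified here.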

### Cobalt-Dependent Enzymes

PceA is a cobalamin-dependent reductive dehalogenase that catalyzes the dechlorination of perchloroethylene (PCE) to trichloroethylene (TCE), then to cis-dichloroethylene (cis-DCE), and further to monochloroethylene (MCE) (Bommer et al., 2014). Three products can be envisioned for the dechlorination of TCE, namely 1,1-DCE, cis-DCE, and trans-DCE. However, cis-DCE was found to be the sole product, indicating that PceA is regioselective.

To elucidate the reaction mechanism and regioselectivity of PceA, Liao et al. performed density functional calculations on this enzyme with a large model of 215 atoms (Liao et al., 2016). The dechlorination of PCE was considered first, and two different pathways were analyzed (**Figure 10**). In pathway I, the one-electron reduction of Co<sup>II</sup> to Co<sup>I</sup> is coupled with protonation of the anionic second-shell residue Tyr246. This proton-coupled electron transfer step has a potential of −0.91 V. Then, heterolytic C-Cl bond cleavage takes place, concomitant with proton transfer from Tyr246 to C1. During this step, Co<sup>I</sup> delivers two electrons to the Cl<sup>+</sup>, forming a Co<sup>III</sup>-chloride complex. This step is rate-limiting, with a total barrier of 12.5 kcal/mol. Finally, a second one-electron transfer to the metal center generates the Co<sup>II</sup> product complex, and the overall reaction is exergonic by 36.0 kcal/mol. In pathway II, Tyr246 remains anionic during the first one-electron reduction. As a consequence, the subsequent C-Cl bond cleavage is homolytic and produces a Co<sup>II</sup>-chloride substrate radical complex. This step has a barrier of 15.1 kcal/mol, slightly higher than that for pathway I. Thus, pathway I is kinetically more favorable, in line with the mechanism suggested for another reductive dehalogenase, NpRdhA (Liao et al., 2015).

To rationalize the regioselectivity in the dechlorination of TCE, all six possible substrate binding modes inside the active site were considered, with two substrate orientations leading to each of the products cis-DCE, trans-DCE, and 1,1-DCE. The lowest barriers for the formation of cis-DCE, trans-DCE, and 1,1-DCE were calculated to be 13.8, 17.6, and 18.4 kcal/mol, respectively. It was suggested that the amide group of cobalamin and other important second-shell residues form a pocket that favors formation of the cis-DCE product but imposes larger steric repulsion in the formation of the other products.

The dechlorination of cis-DCE and of MCE was also considered, with calculated barriers of 20.8 and 27.6 kcal/mol, respectively. The HOMO energies of PCE, TCE, cis-DCE, and MCE were calculated to be −7.12, −7.11, −7.07, and −7.15 eV, respectively, and are thus nearly constant, whereas the LUMO energies rise steadily (−1.11, −0.82, −0.42, and −0.07 eV, respectively). During dechlorination, an electron is transferred from Co<sup>I</sup> to the LUMO of the substrate. Therefore, a substrate with a lower-lying LUMO was suggested to be more reactive, i.e., to have a lower barrier.
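The LUMO argument can be made semi-quantitative with a rank correlation. The Python sketch below uses the LUMO, HOMO, and barrier values quoted above (for TCE, the lowest barrier, leading to cis-DCE, is used); the helper names are hypothetical, and the choice of Spearman's coefficient is an illustrative assumption, not the analysis performed by Liao et al.

```python
def ranks(values):
    """1-based ascending ranks; assumes no ties (true for the data below)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Values quoted in the text, in the order PCE, TCE, cis-DCE, MCE.
substrates = ["PCE", "TCE", "cis-DCE", "MCE"]
lumo_ev = [-1.11, -0.82, -0.42, -0.07]
homo_ev = [-7.12, -7.11, -7.07, -7.15]
barrier_kcal = [12.5, 13.8, 20.8, 27.6]  # for TCE: lowest barrier (to cis-DCE)

print("LUMO vs barrier:", spearman_rho(lumo_ev, barrier_kcal))
print("HOMO vs barrier:", spearman_rho(homo_ev, barrier_kcal))
```

The LUMO energies and the barriers have identical orderings (ρ = 1), whereas the nearly constant HOMO energies give ρ ≈ −0.2. With only four substrates this is a consistency check rather than a statistical test.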

Recently, Ji et al. investigated the cobalamin-mediated reductive dehalogenation reaction by DFT calculations (Ji et al., 2017). Both inner-sphere and outer-sphere pathways were considered, and the comparison between calculated and experimental kinetic isotope effects was used as a probe to diagnose the reaction mechanism. The reaction in aqueous solution was suggested to proceed via an organometallic intermediate with a Co-C bond (Ji et al., 2017). This differs from the enzymatic reaction, in which a second-shell residue was suggested to protonate the substrate during reductive dehalogenation (Liao et al., 2016).

Johannissen et al. performed molecular docking and DFT calculations to compare the interactions between the substrate and the cobalamin in NpRdhA and PceA (Johannissen et al., 2017). A [Co•••X•••R] adduct was suggested to form in the Co<sup>I</sup> state, weakening the carbon-halide bond of the substrate. This is evidenced by the elongation of the carbon-halide bond upon reduction of Co<sup>II</sup> to Co<sup>I</sup>, and similar results were found by Liao et al. (2016).

### Nickel-Dependent Enzymes

Wang et al. (2018) investigated the mechanism and chemoselectivity of a nickel-dependent quercetin 2,4-dioxygenase (Ni-QueD) (Jeoung et al., 2016) by performing QM/MM calculations at the B3LYP-D3/def2-TZVPP:Charmm level (**Figure 11**). Two pathways can be envisioned for the dioxygenation of quercetin: 2,4-dioxygenolytic cleavage, which yields 2-protocatechuoylphloroglucinol carboxylic acid and carbon monoxide, and 2,3-dioxygenolytic cleavage, which produces an α-keto acid. Experimentally, only the former pathway is observed, indicating that the enzyme is chemoselective.

In their QM/MM calculations, the critical first-shell ligand Glu74 was considered in both its neutral and its ionized form. It was found that Glu74 must be deprotonated for the 2,4-dioxygenolytic pathway to be favored, which rationalizes the chemoselectivity. Binding of a dioxygen molecule to the Ni<sup>II</sup> ion results in an open-shell broken-symmetry singlet species, in which a triplet Ni<sup>II</sup> is antiferromagnetically coupled with the triplet dioxygen moiety, with partial electron transfer from the anionic quercetin substrate to the dioxygen moiety. However, the subsequent reaction proceeds preferentially in the triplet state, so a spin crossing has to occur during the first C-O bond formation, which generates a Ni<sup>II</sup>-peroxide intermediate. A conformational change is then required before the second C-O bond can form. Subsequently, the peroxide can attack either C3 or C4; the barrier for attack on C4 is 3.6 kcal/mol lower than that on C3. Attack on C4 forms a five-membered cyclic intermediate, from which ring opening takes place with release of carbon monoxide. This step was calculated to be rate-limiting, with a barrier of 17.4 kcal/mol. In contrast, opening of the four-membered ring formed after attack on C3 has a barrier of 30.6 kcal/mol. The calculations thus showed that the 2,4-dioxygenolytic pathway is much more favorable than the 2,3-dioxygenolytic pathway. QM gas-phase calculations and QM/MM calculations using a mechanical embedding scheme also favor the 2,4-dioxygenolytic pathway significantly. The calculated barrier of 17.4 kcal/mol agrees quite well with the experimental rate constant of 40.1 s<sup>−1</sup> (Merkens et al., 2008), which corresponds to a barrier of about 15 kcal/mol.
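The conversion between the experimental rate constant and a barrier follows from the Eyring equation of transition state theory. A minimal Python sketch, assuming T = 298.15 K and a transmission coefficient of 1 (both assumptions; the experimental temperature is not given here), with hypothetical function names:

```python
import math

KB = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def eyring_barrier(k_per_s, temperature=298.15, kappa=1.0):
    """Free-energy barrier (kcal/mol) from a first-order rate constant (1/s),
    using k = kappa * (kB*T/h) * exp(-dG / RT)."""
    prefactor = kappa * KB * temperature / H
    return R_KCAL * temperature * math.log(prefactor / k_per_s)


def eyring_rate(barrier_kcal, temperature=298.15, kappa=1.0):
    """Inverse of eyring_barrier: rate constant (1/s) from a barrier in kcal/mol."""
    prefactor = kappa * KB * temperature / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


print(f"k = 40.1 1/s  -> {eyring_barrier(40.1):.1f} kcal/mol")
print(f"17.4 kcal/mol -> {eyring_rate(17.4):.2f} 1/s")
```

This gives about 15.3 kcal/mol for k = 40.1 s<sup>−1</sup>, while the computed 17.4 kcal/mol corresponds to a rate of roughly 1 s<sup>−1</sup>. At room temperature, each 1.36 kcal/mol of error changes the predicted rate by about a factor of 10.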

Liu and co-workers have also investigated the reaction mechanism of this enzyme using the QM/MM method. However, only the 2,4-dioxygenolytic pathway was considered, and a similar mechanism was suggested (Li et al., 2018).

### Zinc-Dependent Enzymes

Moa and Himo used the quantum chemical cluster approach to investigate the stereoselectivity of the zinc-dependent secondary alcohol dehydrogenase (Moa and Himo, 2017). To rationalize the opposite enantioselectivity in the dehydrogenation of 2-butanol (R-selective) and 3-hexanol (S-selective), an active site model of more than 300 atoms was designed from the crystal structure.

The calculations support the general mechanism proposed for other alcohol dehydrogenases (Cui et al., 2002), in which a proton is first transferred from the substrate alcohol to a neutral histidine on the enzyme surface, followed by hydride transfer from the substrate alkoxide to the NADP<sup>+</sup> cofactor. The zinc ion functions as a Lewis acid to stabilize the alkoxide intermediate. For 2-butanol, two different substrate orientations were found for each of the (R)- and (S)-enantiomers. Interestingly, the lower-energy orientation is always non-productive. In addition, the small cavity in the active site prefers to bind the ethyl group for both 2-butanol and 3-hexanol. This is unexpected for 2-butanol, because it means that the smaller methyl group binds in the large cavity. An energy decomposition analysis (Kitaura and Morokuma, 1976) showed that the attractive dispersion interaction is larger when the ethyl group is in the small cavity. This also explains why the (R)-TS is preferred over the (S)-TS (energy difference of 1.3 kcal/mol) for 2-butanol. A similar analysis was performed for 3-hexanol, for which the (S)-TS is preferred by 4.2 kcal/mol over the (R)-TS. Compared with the experimental data, the barrier differences between the (R)- and (S)-enantiomers were slightly overestimated, which was suggested to originate from the constraints imposed on the model during geometry optimization. Larger models with more flexible binding pockets will be needed to rationalize the enantioselectivity for substrates with even larger substituents.
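To see how sensitive a predicted enantioselectivity is to such energy differences, the Python sketch below converts a TS free-energy difference into an enantiomeric excess. It assumes both enantiomers react from a common, rapidly equilibrating pool at 298.15 K (an assumption), and `ee_from_ddg` is a hypothetical helper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def ee_from_ddg(ddg_kcal, temperature=298.15):
    """Enantiomeric excess implied by a TS free-energy difference.

    With r = k_major / k_minor = exp(ddG / RT), ee = (r - 1) / (r + 1),
    which equals tanh(ddG / (2 RT)).
    """
    return math.tanh(ddg_kcal / (2.0 * R_KCAL * temperature))


for label, ddg in [("2-butanol, (R) favored", 1.3), ("3-hexanol, (S) favored", 4.2)]:
    print(f"{label}: ddG = {ddg} kcal/mol -> predicted ee = {ee_from_ddg(ddg):.1%}")
```

A 1.3 kcal/mol preference already implies about 80% ee, and 4.2 kcal/mol implies more than 99% ee, so an error of a few tenths of a kcal/mol changes the predicted ee noticeably when the energy difference is small.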

It should be pointed out that reproducing the stereoselectivity of enzymes has been considered very challenging, as it requires accurate calculation of very small energy differences between competing transition states. However, the Himo group has shown that the quantum chemical cluster methodology is capable of addressing this kind of question, as evidenced by four additional examples, namely limonene epoxide hydrolase (Lind and Himo, 2013), arylmalonate decarboxylase (Lind and Himo, 2014), soluble epoxide hydrolase (Lind and Himo, 2016), and phenolic acid decarboxylase (Sheng and Himo, 2017).

### Molybdenum-Dependent Enzymes

Szaleniec et al. reported a QM/MM study of the enantioselectivity of the molybdenum-dependent ethylbenzene dehydrogenase (Szaleniec et al., 2014; **Figure 12**). A two-layer ONIOM method (B3LYP/Lacv3p∗∗:Amber) was used. A small QM region of 52 atoms (QM1) was used for the initial calculations, followed by calculations with a larger QM region of 168 atoms (QM2). The oxidation of the ethylbenzene substrate was suggested to proceed in two phases, namely C-H activation and OH rebound. Steric effects imposed by the active site enforce an almost planar conformation of ethylbenzene, with the pro(S) H atom pointing toward the Mo(VI)=O moiety. The first, C-H activation step turns out to be a proton-coupled two-electron transfer process: the substrate has radical character at the transition state (TS1) and becomes a carbocation in the resulting intermediate. During this step, the second-shell residue His192 delivers a proton to the Mo(VI)=O moiety. In the second step, the Mo(IV)-bound water molecule performs a nucleophilic attack on the carbocation intermediate (TS2), coupled with proton transfer from this water to His192. The first step was calculated to be rate-limiting, with a barrier of 84.4 kJ/mol using QM2, and the second, OH-rebound step was found to be barrierless.

On the basis of the proposed mechanism, the enantioselectivity was then analyzed by comparing the relative energies of all stationary points for the formation of (S)- and (R)-phenylethanol. The energy difference between TS1<sub>pro(S)</sub> and TS1<sub>pro(R)</sub> was calculated to be 17.2 kJ/mol in favor of the (S) pathway, while the difference decreases to 8.6 kJ/mol for the second transition state. The interaction energies between the substrate and the nearby active site residues were used to analyze the source of the enantioselectivity; they were calculated to be −63.7 and −48.6 kJ/mol for TS1<sub>pro(S)</sub> and TS1<sub>pro(R)</sub>, respectively, i.e., the pro(S) transition state is stabilized by 15.1 kJ/mol more.

### Tungsten-Dependent Enzymes

The mechanism and selectivity of three different tungstoenzymes have been subjected to QM and QM/MM calculations, namely, acetylene hydratase (Liao et al., 2010; Liao and Himo, 2011), formaldehyde oxidoreductase (Liao et al., 2011b; Liao, 2013) and benzoyl CoA reductase (Culka et al., 2017; Qian and Liao, 2018).

On the basis of QM calculations with a model of 116 atoms, Liao et al. (2010) suggested a first-shell mechanism with a reasonable barrier (**Figure 13**). The reaction starts with replacement of a W(IV)-bound water molecule by the acetylene substrate. In the next step, the liberated water molecule, deprotonated by the anionic second-shell residue Asp13, performs a nucleophilic attack on the W(IV)-bound acetylene. This leads to a W(IV)-vinyl anion intermediate, which is protonated by Asp13 to produce a W(IV)-vinyl alcohol complex. This step was calculated to be rate-limiting, with a barrier of 23.0 kcal/mol at the B3LYP/LANL2TZ(f)-6-311+G(2d,2p) level. The subsequent isomerization of the vinyl alcohol to acetaldehyde, catalyzed by the enzyme via two sequential proton transfer steps, was found to be quite facile.

This new mechanism was then used to rationalize the chemoselectivity of the enzyme (Liao and Himo, 2011), which does not catalyze the hydration of either propyne or ethylene. The calculations showed that replacement of the bound water by propyne is more exothermic than replacement by acetylene, whereas the subsequent hydration of propyne has a much higher barrier, over 30 kcal/mol. This explains why propyne acts as a competitive inhibitor, as observed experimentally. The different reactivity of acetylene and propyne was explained by analyzing the orbital interactions between the W(IV) ion and the substrate. W(IV) has a 5d<sup>2</sup> electronic configuration: the two degenerate π orbitals of the substrate (acetylene or propyne) interact with two empty 5d orbitals of W(IV), and the occupied 5d orbital of W(IV) interacts with one of the two unoccupied π<sup>∗</sup> orbitals via back-donation. The other unoccupied π<sup>∗</sup> orbital of the substrate interacts with one of the remaining unoccupied 5d orbitals of W(IV) to generate a δ-like orbital, which facilitates electron transfer to the empty π<sup>∗</sup> orbital during the subsequent nucleophilic attack of the water molecule on the substrate. Upon methyl substitution, the HOMO energy rises by 0.6 eV from acetylene to propyne, which results in better π donation and a larger binding energy for propyne. In addition, the LUMO energy rises by 0.3 eV from acetylene to propyne, which makes electron transfer to the δ-like orbital more difficult during the nucleophilic attack. Steric repulsion was suggested as another plausible reason for the different reactivity of acetylene and propyne.

For the hydration of ethylene, replacement of water by ethylene becomes slightly endothermic, and the subsequent water attack has a barrier of more than 30 kcal/mol. The orbital interaction analysis showed that, as a two-electron donor, ethylene is a much weaker π donor than acetylene. In addition, back-donation from the occupied 5d orbital of W(IV) into the only unoccupied π<sup>∗</sup> orbital of ethylene means that an even higher π<sup>∗</sup>-like orbital must accept the electron during the subsequent nucleophilic attack, which is much less favorable.

Liao et al. also investigated the mechanism of formaldehyde oxidoreductase (FOR) and the W vs. Mo selectivity of this enzyme (Liao et al., 2011b). QM calculations showed that the enzyme uses W(VI)=O as the key oxidant and that formaldehyde coordinates directly to W(VI) via its oxygen atom. The W(VI)=O moiety then performs a nucleophilic attack on the formaldehyde carbon, generating a tetrahedral intermediate (**Figure 14**). Subsequently, the anionic second-shell residue Glu308 abstracts a proton from the tetrahedral intermediate, coupled with two-electron transfer from the intermediate to W(VI), which is reduced to W(IV). Other possible pathways were also considered but were found to have much higher barriers. The suggested mechanism was then used to explain why the molybdenum-substituted enzyme is not active, even though some other enzymes can use both molybdenum and tungsten for catalysis (Liao, 2013). The whole catalytic cycle, including formation of the active oxidant M(VI)=O (M = Mo and W) and oxidation of formaldehyde, was considered. The resting state of the enzyme is M(IV)-OH<sub>2</sub>, and its oxidation to M(VI)=O involves two sequential proton-coupled electron transfer steps. For W-FOR, the energetics of the two oxidation steps can be estimated from the experimental redox potentials of the external electron acceptor [Fe<sub>4</sub>S<sub>4</sub>]<sup>2+/+</sup> (−350 mV) and of the W(IV)/W(V) and W(V)/W(VI) couples; formation of the active reactant complex is then exothermic by 1.7 kcal/mol. For Mo-FOR, the relative redox potentials, which can be obtained with very good accuracy because the metal has the same environment before and after oxidation, were calculated and used to set up the energetics; formation of the active reactant complex now becomes endothermic by 11.7 kcal/mol.
This large difference originates from the different redox potentials of W and Mo, a difference that has also been observed experimentally for pairs of analogous molybdenum and tungsten complexes. The oxidation of formaldehyde by Mo-FOR proceeds via a pathway similar to that of W-FOR. However, owing to the energetic penalty for forming the active Mo(VI)=O species, the total barrier increases to 28.2 kcal/mol (compared with 17.6 kcal/mol for W-FOR).
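The way redox potentials enter these energetics can be sketched with the relation ΔG = −nFΔE. The Python example below treats only the electron-transfer part of each step; the coupled proton transfer, which would add a pKa-dependent term, is deliberately left out. The donor potential in the usage line is hypothetical, chosen only to illustrate the scale (1 V corresponds to about 23.06 kcal/mol per electron), and the function name is likewise an assumption.

```python
F_KCAL_PER_V = 23.0605  # Faraday constant expressed in kcal/(mol*V)


def electron_transfer_free_energy(e_donor_v, e_acceptor_v, n_electrons=1):
    """Free energy (kcal/mol) for moving n electrons from the donor couple to
    the acceptor couple: dG = -n * F * (E_acceptor - E_donor).

    Negative values mean the transfer is downhill. Proton release or uptake in
    a proton-coupled step would add a separate pKa-dependent term, not modelled here.
    """
    return -n_electrons * F_KCAL_PER_V * (e_acceptor_v - e_donor_v)


E_FE4S4 = -0.350  # [Fe4S4]2+/+ acceptor potential quoted in the text, V

# Hypothetical donor couple 100 mV more negative than the acceptor:
print(electron_transfer_free_energy(-0.450, E_FE4S4))
```

A 100 mV difference thus corresponds to about 2.3 kcal/mol per electron. Attributing the entire 13.4 kcal/mol difference between W-FOR (−1.7 kcal/mol) and Mo-FOR (+11.7 kcal/mol) to the two one-electron oxidations would imply an average potential difference of roughly 0.29 V per step; this is a simplification, since the coupled proton transfers also contribute.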

Very recently, Qian and Liao performed QM/MM calculations (B3LYP-D3/def2-TZVPP:Charmm) to rationalize the regioselectivity of benzoyl CoA reductase (Qian and Liao, 2018). They found a mechanism (**Figure 15**) similar to that suggested by Culka et al. (2017) on the basis of QM/MM calculations. In the reactant complex, a water molecule is coordinated to the W(IV) ion. The reduction of the benzene ring proceeds via two sequential steps. First, the W(IV)-bound water molecule delivers a proton to the para carbon (C4) of the benzene ring, concomitant with electron transfer from the W(IV) center to the substrate, generating a W(V)-OH cyclohexadienyl radical intermediate. The second step involves proton transfer from the second-shell residue His260 to the meta carbon (C3) of the cyclohexadienyl radical and electron transfer from the pyranopterin cofactor to the substrate. The first step was calculated to be rate-limiting, with a barrier of 23.2 kcal/mol in the broken-symmetry singlet state.

The reduction of the aromatic ring at other positions was then considered. Geometric constraints dictate that, apart from C4, only the two meta carbons (C3 and C5) can accept the proton from the water molecule during the first reduction step. The barriers for these two pathways were found to be more than 8 kcal/mol higher than that for proton transfer to C4. The preference for para attack was partially explained by additional spin delocalization onto the adjacent carbonyl group in the cyclohexadienyl radical intermediate, which is absent for meta attack. In addition, the substrate orientation also favors attack on C4. For reduction at both the 2,3- and the 5,6-positions, the second step turns out to be rate-limiting, with barriers of more than 30 kcal/mol. The calculations thus reproduce the regioselectivity.

They also investigated the substitution of tungsten by molybdenum to predict whether Mo-BCR would be active. The reaction mechanism is essentially the same; however, the barrier increases to 31.1 kcal/mol in the open-shell singlet state for Mo-BCR, and the reduction becomes endothermic by more than 15 kcal/mol. This was suggested to originate mainly from the different redox potentials of the M(VI)/M(V) and M(V)/M(IV) couples (M = Mo and W): tungsten has more negative potentials than molybdenum, as found previously in both experimental and theoretical studies.

### SUMMARY AND OUTLOOK

In this review, we have presented the progress in computational modeling of selectivities in metalloenzymes. Both the quantum chemical cluster and the QM/MM approaches have been successfully applied to rationalize selectivities and to identify the factors that control them.

One of the most important questions for these enzymes is the origin of the various selectivities, namely chemoselectivity, regioselectivity, stereoselectivity, metal oxidation state preference (Fe<sup>2+</sup> vs. Fe<sup>3+</sup>), and metal selectivity (W vs. Mo). Mn, Fe, and Ni are typically used in the activation of dioxygen for oxidative transformations. The selectivity is typically controlled by the protein environment, and a proper description of the substrate surroundings is crucial for rationalizing it. In Fe-dependent aldoxime dehydratase, the redox nature of the mechanism dictates the use of the lower oxidation state Fe<sup>2+</sup> for electron donation, rather than the more Lewis acidic Fe<sup>3+</sup>. In Mn-dependent FosA and Zn-dependent secondary alcohol dehydrogenase, the metal ions mainly function as Lewis acids that stabilize the oxyanion formed during the reaction. Co is used for reductive dehalogenation, and the regioselectivity is controlled by the active site pocket. The stereoselectivity of Mo-dependent ethylbenzene dehydrogenase is also controlled by the active site. Among the tungsten-dependent enzymes, the orbital interactions between the tungsten ion and the substrate in acetylene hydratase explain its chemoselectivity, while for both formaldehyde oxidoreductase and benzoyl CoA reductase, the different redox potentials of the W<sup>6+</sup>/W<sup>4+</sup> and Mo<sup>6+</sup>/Mo<sup>4+</sup> couples explain the metal preference.

A number of challenging issues need to be considered when modeling selectivities. First, a correct mechanism that is consistent with all available experimental observations has to be established. This is a known challenge for reactions in which electrons and protons are taken up or released, as the methods currently available are not accurate enough to determine absolute pK<sub>a</sub> values and redox potentials. Second, reproducing selectivity, especially stereoselectivity, requires an accuracy of better than 1 kcal/mol. In many cases, error cancellation may be present, which makes rationalization possible, and the standard DFT/MM method is a suitable choice for such applications. In a few other cases, one may need to push the accuracy limit by minimizing all possible errors and considering various issues in the QM/MM calculations, such as the effects of entropy and conformation, the use of high-level ab initio methods, a proper and larger QM region, and a better force field. In practice, this may be very difficult to achieve because of the trade-off between accuracy and computational cost. The most challenging issue is to make predictions that can subsequently be confirmed by experiment. This is extremely important in the field of directed evolution, which can manipulate selectivity through active site mutations, and it is likely where extraordinary efforts will be dedicated in the future.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

This work was supported by the National Natural Science Foundation of China (21503083, 21873031), the Fundamental Research Funds for the Central Universities (2017KFKJ XX014).

#### Wei et al. Computational Enzymology

### REFERENCES


of Catechol O-Methyltransferase. J. Phys. Chem. B 120, 11381–11394. doi: 10.1021/acs.jpcb.6b07814


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Wei, Qian, Wang and Liao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Systematic DFT Approach for Studying Mechanisms of Redox Active Enzymes

### Per E. M. Siegbahn\* and Margareta R. A. Blomberg

Arrhenius Laboratory, Department of Organic Chemistry, Stockholm University, Stockholm, Sweden

When DFT has been applied to study mechanisms of redox processes, a common procedure has been to compare the results for many different functionals. For redox reactions involving first-row transition metals, this approach has given very different results for different functionals, and the conclusion has been that DFT cannot be used for these reactions. In the meantime, results with strong predictive power have been generated, most notably for photosystem II, where all DFT predictions have been verified by experiments performed later. To obtain these predictive results using DFT, an alternative, systematic approach has been used, in which the key differences between the results for different functionals are rationalized by a single parameter, rather than by the very large number of differences between the functionals.

#### Edited by:

Vicent Moliner, Universitat Jaume I, Spain

#### Reviewed by:

Claudio Greco, Università Degli Studi di Milano Bicocca, Italy

Kendall N. Houk, UCLA Department of Chemistry & Biochemistry, United States

> \*Correspondence: Per E. M. Siegbahn per.siegbahn@su.se

#### Specialty section:

This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry

Received: 02 November 2018; Accepted: 11 December 2018; Published: 21 December 2018

#### Citation:

Siegbahn PEM and Blomberg MRA (2018) A Systematic DFT Approach for Studying Mechanisms of Redox Active Enzymes. Front. Chem. 6:644. doi: 10.3389/fchem.2018.00644

Keywords: density functional theory, redox reactions, nitrogenase, cytochrome c oxidase, exact exchange

### I. INTRODUCTION

Results obtained with density functional theory (DFT) for molecules have been continuously criticized over the past decades. In particular, results obtained with b3lyp (Becke, 1993) have been shown to have severe errors in several cases. For main group elements, most of these cases were found to be corrected by adding dispersion corrections (Schwabe and Grimme, 2007). The cases where dispersion was needed could be identified relatively easily, and the majority of calculations made using b3lyp were not affected by dispersion in a significant way. However, molecules containing transition metals have always been considered particularly difficult to treat, and the use of DFT has continued to be criticized in these cases. Different functionals were shown to give very different results for many redox reactions, in particular for molecules containing first-row transition metals. Selecting different functionals for different reactions has become popular, but this approach obviously suffers from user bias and a lack of predictability.

When different functionals are compared, it is important to do this in a systematic way. During the past decade, it has been possible to identify the most sensitive parameter for redox enzymes, namely the amount of exact exchange. By identifying this single parameter, it is possible to move away from the non-systematic approach of testing a large number of functionals that differ in many irregular ways. The best procedure found is to start with the standard b3lyp functional, which has 20% exact exchange, and to decrease this percentage in steps.
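For reference, the exact-exchange fraction appears in the standard three-parameter B3LYP expression as the coefficient $a_0$ (the form below is the commonly used definition, not an equation reproduced from this article):

$$
E_{\mathrm{xc}}^{\mathrm{B3LYP}} = (1 - a_0)\,E_{\mathrm{x}}^{\mathrm{LSDA}} + a_0\,E_{\mathrm{x}}^{\mathrm{HF}} + a_x\,\Delta E_{\mathrm{x}}^{\mathrm{B88}} + (1 - a_c)\,E_{\mathrm{c}}^{\mathrm{VWN}} + a_c\,E_{\mathrm{c}}^{\mathrm{LYP}}
$$

with $a_0 = 0.20$, $a_x = 0.72$, and $a_c = 0.81$. Decreasing the exact exchange in steps amounts to lowering $a_0$ (for example to 0.15, the B3LYP* variant of Reiher et al., 2001) while keeping $a_x$ and $a_c$ fixed.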

A breakthrough for this approach came in the studies of photosystem II (PSII) (Siegbahn, 2009, 2013). The catalyst is a Mn<sub>4</sub>Ca complex, which shows most of the typical problems of using DFT. For example, the Mn(III) to Mn(IV) redox energy is very sensitive to the fraction of exact exchange used in hybrid methods like b3lyp. For every one-percentage-point change in exact exchange, the redox energy changes by about 1 kcal/mol (Siegbahn and Blomberg, 2014). This means that for b3lyp with 10% exact exchange, the redox energy may differ by 10 kcal/mol from that obtained with the usual 20%. Since there are three oxidations of Mn(III) in the catalytic cycle, the difference accumulates to 30 kcal/mol from the start to the end. Still, it was possible to make new and firm predictions for water oxidation in PSII that were later confirmed in detail by experiments.
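The arithmetic behind these numbers can be sketched with a simple linear model. This is an illustrative assumption, not a model used by the authors: it takes the quoted sensitivity of about 1 kcal/mol per percentage point of exact exchange at face value and assumes that it is the same for every Mn(III) oxidation. The function name `cumulative_redox_shift` is hypothetical.

```python
# Approximate sensitivity quoted in the text for one Mn(III) -> Mn(IV) oxidation
SLOPE_KCAL_PER_PERCENT = 1.0


def cumulative_redox_shift(percent_a: float, percent_b: float,
                           n_oxidations: int = 1,
                           slope: float = SLOPE_KCAL_PER_PERCENT) -> float:
    """Estimated shift (kcal/mol) of the summed redox energies when the
    exact-exchange fraction changes from percent_a to percent_b.

    Assumes a linear dependence on the exact-exchange percentage and the
    same slope for every oxidation step (a simplification).
    """
    return abs(percent_a - percent_b) * slope * n_oxidations


# One Mn(III) oxidation, 20% versus 10% exact exchange
print(cumulative_redox_shift(20, 10))                  # 10.0
# Three Mn(III) oxidations over the catalytic cycle
print(cumulative_redox_shift(20, 10, n_oxidations=3))  # 30.0
```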

### II. COMPUTATIONAL DETAILS

The general computational approach used here has been applied to and evaluated for a large number of metalloenzymes (Blomberg et al., 2014). Density functional calculations are performed on large cluster models of the active sites (170–300 atoms). The active sites in the enzymes discussed here have two or more transition metal ions, and the models include these ions plus the first and sometimes also the second shell ligands. As mentioned in the introduction, the b3lyp functional (Becke, 1993) with a varying fraction of exact exchange is used in the present study. A strong argument for using b3lyp as a reference functional is that it is the hybrid DFT functional with the smallest number of parameters. In fact, the sensitivity of the b3lyp results depends essentially on only one parameter, the amount of exact exchange. Geometries are optimized using a double zeta basis set with polarization functions on all second row atoms, with a few atoms close to the truncations of the model fixed at their X-ray coordinates. The decision not to use a larger basis set for the geometry optimizations is based on extensive experience gathered during the past three decades; see for example Siegbahn (2001). In fact, an even smaller basis set would probably be accurate enough. More accurate energies for the optimized structures are obtained from single point calculations with a larger basis set: the lacv3p+ basis for the metal ions (Jaguar, 2009) and the large cc-pvtz(-f) basis set for the rest of the atoms. In previous studies, the fraction of exact exchange in b3lyp has generally been set to 15%, as recommended by Reiher et al. (2001). Empirical dispersion corrections according to Grimme (Schwabe and Grimme, 2007; Grimme et al., 2010) and solvent effects from the surrounding protein, obtained with the self-consistent reaction field (SCRF) approach, are included in the energetic results described below.
When H<sub>2</sub> is released and N<sub>2</sub> becomes bound in nitrogenase, the gain (loss) of translational entropy of about 10 kcal/mol is also included. Including entropy is very important for these steps but not for the others. Zero-point corrections are taken from Hessians calculated at the same level as the geometry optimizations; for nitrogenase, these effects were taken from smaller models. The Jaguar 7.9 program (Jaguar, 2009) is used for the nitrogenase geometry optimizations and for all calculations with the larger basis set, and the Gaussian 09 package (Gaussian, 2010) is used for the cytochrome c oxidase geometry optimizations and for the Hessian calculations.
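The energy terms listed above can be combined into a relative free energy for each reaction step. The sketch below assumes that the terms are simply additive and uses the approximate 10 kcal/mol translational-entropy value quoted in the text; the class and function names, and all numbers in the example, are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass

# Approximate translational-entropy contribution per gas molecule (kcal/mol)
TRANS_ENTROPY_KCAL = 10.0


def entropy_term(n_gas_released: int = 0, n_gas_bound: int = 0) -> float:
    """Free-energy contribution from small gas molecules such as H2 or N2.

    Releasing a molecule gains translational entropy (lowers G);
    binding one loses it (raises G).
    """
    return TRANS_ENTROPY_KCAL * (n_gas_bound - n_gas_released)


@dataclass
class StepEnergy:
    """Relative energy terms (kcal/mol) for one reaction step."""
    electronic_large_basis: float  # single point with the larger basis set
    zero_point: float              # from Hessians at the optimization level
    dispersion: float              # empirical dispersion correction
    solvation: float               # SCRF estimate of the protein surroundings
    entropy: float = 0.0           # only for steps that release or bind gas

    def total(self) -> float:
        return (self.electronic_large_basis + self.zero_point
                + self.dispersion + self.solvation + self.entropy)


# Hypothetical numbers for a step in which one H2 molecule is released
step = StepEnergy(5.0, -1.0, -2.0, 0.5, entropy=entropy_term(n_gas_released=1))
print(step.total())  # -7.5
```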

The computational procedure described above has been kept as similar as possible to the one used over the past decades. The reason is that extensive experience has been gained with this approach, and this knowledge is very useful when judging the accuracy of the predictions obtained.

### III. NITROGENASE

Nitrogenase is the main enzyme in nature that catalyzes the reduction of nitrogen from the air. The core of its catalytic cofactor is shown in **Figure 1** (Kim and Rees, 1992; Spatzal et al., 2011). It has seven irons and one molybdenum connected by sulfide bridges. Variants containing vanadium instead of molybdenum, as well as all-iron variants, also exist. A redox potential of –1.6 V, the lowest in nature, is used for the reduction. Quite surprisingly, the oxidation state of the cofactor is not particularly low, as would normally have been expected for a strongly reducing complex. Instead, there are four Fe(III) and three Fe(II), and molybdenum is in the Mo(III) state.

Almost all experimental information concerns the ground state before reduction. However, there is one notable exception, which is key to finding the mechanism by DFT model calculations. It has been shown by EPR that the reduced state that activates N<sub>2</sub>, termed E<sub>4</sub>, has two bridging hydrides (Hoffman et al., 2013). These two hydrides were found to be removed as H<sub>2</sub> in a reductive elimination process, which is directly followed by the binding of N<sub>2</sub>. The process was found to be easily reversible by changing the pressures of hydrogen and nitrogen, which means that the states involved must be nearly isoenergetic.

In the experimentally suggested structure of E<sub>4</sub>, the central carbon remains unprotonated and the cofactor has two bridging hydrides (Lukoyanov et al., 2016). Since E<sub>4</sub> was suggested to appear after four reductions of the experimentally characterized ground state in the catalytic cycle (see **Figure 1**), each accompanied by a protonation, two protonations should remain besides the two hydrides. They were suggested to be on the sulfides. Because forming each hydride formally consumes two electrons (H<sup>+</sup> + 2e<sup>−</sup> → H<sup>−</sup>), the two hydrides take up all four added electrons. The redox state of the metals in E<sub>4</sub> would therefore be the same as in the ground state, with four Fe(III), a very surprising situation.
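The electron-counting argument behind the four-Fe(III) conclusion, and behind the six-Fe(III) count for three hydrides discussed below, can be written as a short formal bookkeeping sketch. It assumes that each hydride consumes two electrons, that protonating a sulfide or the carbide adds only a proton, and that every remaining electron is added to or removed from iron while Mo stays Mo(III); the function names are hypothetical.

```python
def metal_electron_gain(n_reductions: int, n_hydrides: int) -> int:
    """Net electrons added to the metal framework (formal counting).

    Each hydride (H+ + 2e- -> H-) consumes two electrons; sulfide or
    carbide protonation is assumed to add only a proton.
    """
    return n_reductions - 2 * n_hydrides


def fe3_count(n_reductions: int, n_hydrides: int,
              ground_fe3: int = 4, n_fe: int = 7) -> int:
    """Number of Fe(III) ions, assuming Mo stays Mo(III) and each added
    (or removed) metal electron converts one Fe(III) <-> Fe(II)."""
    fe3 = ground_fe3 - metal_electron_gain(n_reductions, n_hydrides)
    if not 0 <= fe3 <= n_fe:
        raise ValueError("electron count incompatible with seven Fe sites")
    return fe3


print(fe3_count(4, 2))  # 4: experimentally suggested E4, as in the ground state
print(fe3_count(4, 3))  # 6: three hydrides would oxidize the metals
```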

The results obtained for nitrogenase after four reductions of the ground state in **Figure 1** are shown in **Table 1**, and the corresponding structures are shown in **Figures 2**, **3**. For each functional, the results for six different structures are listed. These structures were taken from the best ones obtained previously, when several hundred structures were compared. Those structures had been obtained with 20% exact exchange at the lacvp\* level and were then compared using the large cc-pvtz(-f) basis set at the 15% level. Here, the geometries were optimized separately for each functional, which turned out to be important.

The first structure, termed C,H<sup>−</sup>, has an unprotonated carbon, one hydride and three protonated sulfides. In the second structure, termed C,2H<sup>−</sup>, one of the protons on the sulfides has moved to a hydride position, so there are two hydrides and two protonated sulfides. Of the six structures, this is the one that corresponds to the experimentally suggested structure. In the third structure, termed CH, the hydride of the first structure has moved to carbon, and three protonated sulfides remain. The fourth structure, termed CH<sub>2</sub>,H<sup>−</sup>, has a doubly protonated carbon, one hydride and one protonated sulfide. In the fifth structure, the hydride has moved to carbon to form a CH<sub>3</sub> group, and there is one protonated sulfide. Finally, in the sixth structure, the two hydrides of the second structure have been removed to form a free H<sub>2</sub> molecule.

Siegbahn and Blomberg A Systematic DFT Approach

TABLE 1 | Relative energies (kcal/mol) for states obtained after four reductions from the ground state of nitrogenase, using density functionals with different fractions of exact exchange.

The main conclusion that can be drawn from the results in **Table 1** is that the C,2H<sup>−</sup> structure does not fit the experimental observations. This is true for all functionals in the table. Already at this point, it can be concluded that the E<sub>4</sub> state does not have the experimentally suggested structure. However, by far the strongest argument against the experimental suggestion comes from the results for the sixth structure. Experimentally, it is known that the two hydrides should be removed as H<sub>2</sub> essentially thermoneutrally. As can be seen in the table, the removal of H<sub>2</sub> is much too exergonic for all functionals to fit that observation. The removal is exergonic by 54.6 (22.6 + 32.0) kcal/mol with 20%, by 48.0 (14.7 + 33.3) kcal/mol with 15%, by 43.3 kcal/mol with 10%, and by 34.0 kcal/mol with 0% exact exchange.
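The exergonicities quoted above follow from two relative energies per functional: that of the C,2H<sup>−</sup> structure and that of the structure with free H<sub>2</sub>. The sketch reproduces the 20% and 15% numbers from the values given in the text. The ±5 kcal/mol tolerance used to call a step "nearly thermoneutral" is our assumption, not a criterion from the study, and the names are hypothetical.

```python
# Relative energies (kcal/mol) quoted in the text, keyed by % exact exchange
REL_ENERGIES = {
    20: {"C,2H-": 22.6, "-H2": -32.0},
    15: {"C,2H-": 14.7, "-H2": -33.3},
}

# Illustrative tolerance for "nearly thermoneutral" (assumption)
THERMONEUTRAL_TOL = 5.0


def h2_release_energy(e_dihydride: float, e_free_h2: float) -> float:
    """Reaction energy for C,2H- -> free H2 state (negative = exergonic)."""
    return e_free_h2 - e_dihydride


for pct, e in REL_ENERGIES.items():
    dg = h2_release_energy(e["C,2H-"], e["-H2"])
    verdict = "consistent" if abs(dg) <= THERMONEUTRAL_TOL else "inconsistent"
    print(f"{pct}% exact exchange: {dg:.1f} kcal/mol ({verdict} with experiment)")
```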

It is also found that all functionals prefer a protonated carbon, as suggested in previous DFT studies (Siegbahn, 2016). For 20, 15, and 10%, this preference is very large. For 0%, the preference is less pronounced, but it still amounts to 11.4 (23.4 – 12.0) kcal/mol.

It should be added that structures with three hydrides were also calculated but are not included in **Table 1**, because they can be excluded as possible E<sub>4</sub> states from the beginning. It is chemically unreasonable that the first four reductions of the cofactor should lead to an oxidation of the metals rather than a reduction, in particular since the lowest redox potential in nature, –1.6 V, is used. Starting from the already high oxidation state of the ground state, with four Fe(III), three hydrides would require six Fe(III). The alternative, five Fe(III) and a Mo(IV), is equally unreasonable chemically. Furthermore, for 20, 15 and 10% exchange, the structure with three hydrides is very high in energy compared to the other structures. Even for the functional with 0%, the best solution obtained is 6.1 kcal/mol higher in energy than the CH<sub>2</sub>,H<sup>−</sup> structure. However, that value is somewhat uncertain, since the spin coupling with three hydrides is very different from the other cases and includes several low-spin-coupled Fe(III), which leads to a very large number of possible spin states. Five spin couplings were tried, all higher in energy than the CH<sub>2</sub>,H<sup>−</sup> structure. They were selected such that the summed spins on Fe2–Fe4 are opposite to those on Fe5–Fe7. Two of them were the couplings found to be optimal for the ground state and for the reduced states in previous work (Siegbahn, 2016). The last argument against structures with three terminal hydrides is that they are incompatible with the EPR experiments (Hoffman et al., 2013), which clearly show two bridging hydrides.
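The selection criterion for the spin couplings, with the summed spins on Fe2–Fe4 opposite to those on Fe5–Fe7, can be illustrated by enumerating up/down orientations of local spin moments and keeping only those that satisfy it. This is a hypothetical sketch of the candidate pool, not the procedure used in the study: the local spin magnitudes below (numbers of unpaired electrons, including one low-spin-coupled site) are invented for illustration, and the final choice among candidates would rest on chemical considerations.

```python
from itertools import product


def opposite_block_patterns(block_a, block_b):
    """Enumerate up/down orientations of local spin moments such that the
    summed spin of block A has the opposite sign to that of block B.

    block_a, block_b: local spin magnitudes (e.g., unpaired electrons)
    for the sites in each block. Returns (signs_a, signs_b, sum_a, sum_b).
    """
    n_a = len(block_a)
    patterns = []
    for signs in product((1, -1), repeat=n_a + len(block_b)):
        signs_a, signs_b = signs[:n_a], signs[n_a:]
        sum_a = sum(s * m for s, m in zip(signs_a, block_a))
        sum_b = sum(s * m for s, m in zip(signs_b, block_b))
        if sum_a * sum_b < 0:
            patterns.append((signs_a, signs_b, sum_a, sum_b))
    return patterns


# Hypothetical magnitudes for Fe2-Fe4 and Fe5-Fe7
fe2_fe4 = [5, 4, 5]
fe5_fe7 = [4, 5, 1]  # e.g., one low-spin-coupled site
candidates = opposite_block_patterns(fe2_fe4, fe5_fe7)
print(len(candidates), "candidate spin couplings")
```

Each pattern and its global spin flip are both kept, so candidates come in symmetry-equivalent pairs.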

There are other points of interest in **Table 1**. For 20 and 15% exchange, the structure with two hydrides is quite high in energy compared to the first structure, by +22.6 and +14.7 kcal/mol, respectively. Furthermore, its hydrides are terminal, not bridging, in contradiction to experiments. The energy of this structure is somewhat lower for 10%, but still 7.2 kcal/mol higher than that of the starting structure. For 0%, the situation is different. Here the structure with two hydrides is 12.0 kcal/mol lower than the first structure, but the two hydrides can still be removed to form H<sub>2</sub> with a gain of 34.0 kcal/mol, as mentioned above. The lowest state for 20 and 15% is the CH<sub>3</sub> structure, but for 10 and 0% it is the CH<sub>2</sub>,H<sup>−</sup> structure, by a margin of 1.0 kcal/mol for 10% and 5.3 kcal/mol for 0%.

In summary, a few clear conclusions can be drawn. First, none of the functionals prefers an unprotonated carbon. Second, all functionals bind the hydrides very poorly: removing the hydrides as H<sub>2</sub> is strongly exergonic in all cases, by more than 30 kcal/mol, whereas experiments show that this step should be almost thermoneutral. This means that the experimentally suggested structure for E<sub>4</sub> does not agree with the results for any functional and can be ruled out by DFT by a large margin.

In a very recent paper, a new mechanism for H<sub>2</sub> release and N<sub>2</sub> binding in nitrogenase was suggested on the basis of calculations with a non-hybrid functional (Raugei et al., 2018). In this mechanism, the two hydrides in E<sub>4</sub> endergonically form a locally bound H<sub>2</sub> molecule. To avoid the problem of the very large computed exergonicity for H<sub>2</sub> release, the key to their mechanism is that this bound H<sub>2</sub> molecule can only be released over a very high barrier. If this barrier were lower than 18 kcal/mol, H<sub>2</sub> would be released and there would be no protonation of N<sub>2</sub>; a high barrier is therefore needed to prevent the release of H<sub>2</sub>. In their mechanism, the endergonic binding of N<sub>2</sub> was found to reduce the barrier for releasing H<sub>2</sub> significantly, so that H<sub>2</sub> can then be released. There are many bound H<sub>2</sub> complexes in the literature, but none of them behaves as suggested in Raugei et al. (2018). In all the published cases there is at most a weakly bound H<sub>2</sub>, which can be released without any additional barrier apart from the endergonicity. A search for a bound H<sub>2</sub> was carried out in the present study using 0% exchange (non-hybrid), and the picture obtained is very similar to those previously published in the literature. A weak local minimum for a bound H<sub>2</sub> is obtained, and releasing H<sub>2</sub> from that minimum involves at most a very small barrier, less than 5 kcal/mol. The release is quite exergonic when the entropy gain of about 10 kcal/mol is included.
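To put these barrier heights in perspective, transition state theory (the Eyring equation, k = (k<sub>B</sub>T/h) exp(−ΔG<sup>‡</sup>/RT)) converts a free-energy barrier into a rate constant. The sketch assumes a transmission coefficient of 1 and a temperature of 298.15 K. It is our illustration of why an 18 kcal/mol barrier corresponds to a slow step and a barrier below 5 kcal/mol to essentially free release; it is not part of the analysis in Raugei et al. (2018) or in the present study.

```python
import math

KB = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal: float, temperature: float = 298.15) -> float:
    """Transition-state-theory rate constant (s^-1), transmission coefficient 1."""
    return (KB * temperature / H) * math.exp(-barrier_kcal / (R_KCAL * temperature))


for barrier in (5.0, 18.0):
    print(f"{barrier:4.1f} kcal/mol -> k = {eyring_rate(barrier):.2e} s^-1")
```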

Since DFT rules out an unprotonated carbon structure, another structure has to be found for E<sub>4</sub>. If no other structure could be found, the conclusion would have to be that no version of DFT can handle nitrogenase, a very unlikely scenario. It has been suggested in previous DFT studies that the lowest energy structure in **Table 1** is merely a starting structure for catalysis (Siegbahn, 2016). The first four reductions of the ground state structure in **Figure 1** would then be just an initial activation process, done only once before catalysis starts, leading to a new E<sub>0</sub> state. Following that suggestion, four additional reductions would lead to the actual E<sub>4</sub> structures. The results for the E<sub>4</sub> structures determined in this way are shown in **Table 2** for the same four functionals as discussed above, and are also displayed in **Figure 4**. Six entries are shown, chosen to follow the conclusions of the EPR experiments. The first entry, termed 2H<sup>−</sup>, has two hydrides, a CH<sub>3</sub> ligand and three protonated sulfides, altogether eight protonations of the cofactor. The second entry, termed "H-H re TS," gives the barrier for the re (reductive elimination) mechanism, in which two hydrides form H<sub>2</sub>. The third entry, termed "H-H hp TS," gives the barrier for the hp (hydride, proton) mechanism, in which one hydride and one proton form H<sub>2</sub>. In the fourth entry, termed "−H<sub>2</sub><sup>a</sup>," the two hydrides have been removed to form a free hydrogen molecule. The fifth entry, termed "−H<sub>2</sub><sup>b</sup>," differs from the fourth one by a rotation of the homocitrate ligand. This rotation was found in earlier studies (Siegbahn, 2018) to be required for binding N<sub>2</sub>. In the sixth and final entry, termed "+N<sub>2</sub>–H<sub>2</sub><sup>b</sup>," N<sub>2</sub> is bound to the fifth structure.

The results for the functional with 20% agree very well with the EPR experiments. H<sub>2</sub> can be removed with a very small energy gain of 2.1 kcal/mol. The barrier for H<sub>2</sub> elimination via the re mechanism is 13.1 kcal/mol, well within the range required by experiments. Even more importantly, the barrier for the hp mechanism, 17.3 kcal/mol, is 4.2 kcal/mol higher than that for the re mechanism, leading to the required N<sub>2</sub> activation rather than H<sub>2</sub> production. The subsequent homocitrate rotation is uphill by only 1.4 kcal/mol. In the next step, N<sub>2</sub> binds to the rotated structure, exergonically by only 0.1 kcal/mol. This means that the release of H<sub>2</sub> and the binding of N<sub>2</sub> should be easily reversible, as observed experimentally. There is a minor discrepancy, since there should be some driving force (exergonicity) for this process, but the error is small for such a complicated process. With 15%, the discrepancy is somewhat larger: the release of H<sub>2</sub> is exergonic by 6.6 kcal/mol and N<sub>2</sub> binding is endergonic by 4.3 kcal/mol. However, the discrepancy with experiments is not alarming. Again, H<sub>2</sub> elimination by the re mechanism is preferred over the hp mechanism, now by 6.2 kcal/mol.

For the functional with 10%, the discrepancy with experiments increases significantly. Most notably, the binding of N<sub>2</sub> is now endergonic by 11.2 (9.3 + 1.9) kcal/mol. However, the barrier for the re mechanism is still lower than that for the hp mechanism. Finally, for the functional with 0%, the discrepancy with experiments increases further. For example, the binding of N<sub>2</sub> is now endergonic by 14.2 kcal/mol. Furthermore, the barrier for the hp mechanism is now lower than that for the re mechanism, leading to production of H<sub>2</sub> rather than protonation of N<sub>2</sub>.

TABLE 2 | Relative energies (kcal/mol) for states obtained after eight reductions from the ground state of nitrogenase, using density functionals with different fractions of exact exchange.


<sup>a</sup> H<sub>2</sub> is removed with homocitrate non-rotated. <sup>b</sup> H<sub>2</sub> is removed with homocitrate rotated.
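Because the re and hp pathways compete from the same starting state, the difference between their barriers can be translated into a rate ratio. The sketch assumes transition state theory with equal prefactors for both pathways, so that k<sub>re</sub>/k<sub>hp</sub> = exp(ΔΔG<sup>‡</sup>/RT) at 298.15 K. It uses the 20% barriers quoted in the text; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def branching_ratio(barrier_a: float, barrier_b: float,
                    temperature: float = 298.15) -> float:
    """Rate ratio k_a / k_b for two competing steps from the same state,
    assuming transition state theory with equal prefactors."""
    return math.exp((barrier_b - barrier_a) / (R_KCAL * temperature))


# 20% exact exchange: re barrier 13.1, hp barrier 17.3 kcal/mol
ratio = branching_ratio(13.1, 17.3)
print(f"re/hp rate ratio ~ {ratio:.0f}")
```

A ratio above 1 favors the re pathway (N<sub>2</sub> activation); for 0% exact exchange, where the hp barrier is the lower one, the ratio drops below 1.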

### IV. CYTOCHROME c OXIDASE

The membrane-bound enzyme cytochrome c oxidase (CcO) catalyzes the reduction of molecular oxygen to water as the last step in the respiratory chain of aerobic organisms. The chemistry occurs in an active site, referred to as the binuclear center (BNC), consisting of a high-spin heme group, a copper complex labeled Cu<sub>B</sub>, and a redox-active tyrosine. The electrons are delivered to the BNC from reduced cytochrome c, located on the positive side of the membrane, and the protons are transferred to the BNC from bulk water on the negative side of the membrane. Molecular oxygen binds to the reduced BNC, with heme-Fe(II), Cu<sub>B</sub>(I) and TyrOH. The O-O bond is cleaved in a single reaction step, yielding a four-electron oxidized BNC with heme-Fe(IV)=O, Cu<sub>B</sub>(II)OH and a neutral TyrO radical (Proshlyakov et al., 1998; Fabian et al., 1999). The rest of the catalytic cycle consists of four reduction steps, each taking up one electron and one proton to the BNC, leading back to the reduced state and forming two new water molecules. The overall energetics of the reduction process is obtained from the difference in reduction potential between the electron donor and the acceptor, molecular oxygen. With cytochrome c as the electron donor, the reduction of one oxygen molecule is exergonic by 50.7 kcal/mol (2.2 eV) (Brzezinski, 2004). A significant part of this free energy is conserved as an electrochemical gradient across the membrane, which in turn is used by another enzyme, ATP synthase, to produce ATP, the energy currency of the cell. Two processes contribute to the buildup of the gradient: the electrogenic chemistry (taking the electrons and the protons from opposite sides of the membrane), and so-called proton pumping, in which the chemistry is coupled to proton transfer across the entire membrane. The largest group of CcOs is known to pump one proton per electron, i.e., four protons per oxygen molecule (Brzezinski and Gennis, 2008; Kaila et al., 2010).
The mechanism of proton pumping, i.e., how the transfer of one electron to the active site is coupled to the uptake of two protons from the negative side of the membrane, is still under debate (Rich, 2017).

The process of oxygen reduction in CcO has been studied in detail using density functional theory. A mechanism for the O-O bond cleavage step was suggested at an early stage based on computational results (Blomberg and Siegbahn, 2006, 2010), and it was later confirmed in a combined experimental and computational study (Poiana et al., 2017). Another result from the computational studies concerns the mechanism of proton pumping, for which it has been suggested that the redox-active tyrosine in the active site plays an essential role (Blomberg, 2016). Experimental support for the suggested pumping mechanism is that an active site tyrosine is conserved in all families of CcOs (Hemp et al., 2006). To understand proton pumping in CcO, it is essential to know the energetics of the individual reduction steps in the catalytic cycle, which depend on the reduction potential of the active site cofactor reduced in each particular step. Experimental investigations have indicated that the four active site reduction potentials involved are quite different, and that only two of them seem to be large enough (about 0.8 V) to allow proton pumping (Wikström and Morgan, 1992; Kaila et al., 2010). The BNC potentials have also been studied computationally, with results that partly differ from the experimental measurements (Blomberg and Siegbahn, 2015a); see further below. In all these computational studies, the b3lyp type of functional was used, as described above in Computational Details.

To further test the reliability of the results obtained for CcO, the calculated proton-coupled reduction potentials of the active site cofactors were investigated systematically by varying the fraction of exact exchange in the b3lyp functional, using the model shown in **Figure 5**. Each proton-coupled reduction potential corresponds to the formation of a new O-H bond in the active site, so the individual reduction potentials can be estimated from the strengths of the different O-H bonds. The strength of an O-H bond is not sensitive to the distant surroundings, since the charge does not change when a (H<sup>+</sup>, e<sup>−</sup>) pair is added, which means that reasonably sized models (150–200 atoms) can be used in the calculations. New structures are optimized for each functional (each percentage of exact exchange). Since the most recent procedure for CcO has been to optimize structures including dispersion, the D3 dispersion correction with parameters from the original b3lyp-D3 functional (Grimme et al., 2010) was used in both the geometry optimizations and the energy calculations. Results for the 15% functional were also obtained using structures optimized with 20% (not reported); the final energetics differ only slightly (<2 kcal/mol) from those reported below.

To estimate the energetics of the reduction steps of the CcO catalytic cycle, the sum of the energy of an electron transferred from cytochrome c and of a proton transferred from bulk water is needed. This energy is parameterized to reproduce the experimental value for the overall reaction (50.7 kcal/mol, see above). By subtracting the parameterized energy from each O-H bond strength, the exergonicity relative to the cytochrome c donor is obtained, and by comparison with the midpoint potential of cytochrome c (0.25 V), the midpoint potential can be estimated for each cofactor.

TABLE 3 | Calculated energetics as a function of the amount of exact exchange for the reduction of Cu<sub>B</sub> (taking up one electron and one proton) in the catalytic cycle of cytochrome c oxidase.


<sup>a</sup> From Kaila et al. (2010). Although the experimental measurements have given low values, below 0.4 V, it has been suggested based on experiment that during turnover the Cu<sub>B</sub> potential actually is higher than the measured values (Kaila et al., 2010).

<sup>b</sup> Exergonicity relative to cytochrome c, with a reduction potential of 0.25 V.

TABLE 4 | Calculated energetics as a function of the amount of exact exchange for two of the four reduction steps (taking up one electron and one proton) in the catalytic cycle of cytochrome c oxidase.


<sup>a</sup> From Kaila et al. (2010).

<sup>b</sup> Exergonicity relative to cytochrome c, with a reduction potential of 0.25 V.

The first result to be discussed concerns the proton-coupled reduction potential of Cu<sub>B</sub>, for which experimental measurements have given low values, 0.2–0.4 V (Jancura et al., 2006; Brand et al., 2007; Vilhjálmsdóttir et al., 2018). In contrast, previous calculations indicate a much higher value (0.9–1.0 V) during catalytic turnover (Blomberg and Siegbahn, 2015a), and an explanation for the low experimental values has been suggested based on the computational results (Blomberg and Siegbahn, 2015b). As can be seen in **Table 3**, all functionals support a large reduction potential for Cu<sub>B</sub> (0.97–1.19 V), with results that vary only slightly and somewhat irregularly with the fraction of exact exchange. Interestingly, for a small model complex it was also shown that the b3lyp<sup>∗</sup> (15%) result for the O-H bond strength in Cu<sub>B</sub>(I)OH<sub>2</sub> agrees to within 1 kcal/mol with CCSD(T) results (Blomberg and Siegbahn, 2015a).
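The conversion from an exergonicity relative to cytochrome c into a cofactor midpoint potential is a one-line formula, E<sub>m</sub> = 0.25 V + ΔG/F, with F ≈ 23.06 kcal/(mol·V). The sketch below assumes that the parameterized (H<sup>+</sup>, e<sup>−</sup>) pair energy is already known. The 0.25 V reference and the Cu<sub>B</sub> range come from the text, whereas the O-H bond strength and pair energy in the last example are hypothetical, as are the function names.

```python
F_KCAL_PER_V = 23.06   # Faraday constant, kcal/(mol V)
E_CYT_C = 0.25         # V, cytochrome c midpoint potential used in the text


def exergonicity_vs_cyt_c(oh_bond_strength: float, pair_energy: float) -> float:
    """Exergonicity (kcal/mol) of an (H+, e-) uptake step relative to
    cytochrome c, given the parameterized energy of one (H+, e-) pair."""
    return oh_bond_strength - pair_energy


def midpoint_potential(exergonicity: float) -> float:
    """Cofactor midpoint potential (V) implied by that exergonicity."""
    return E_CYT_C + exergonicity / F_KCAL_PER_V


def exergonicity_for_potential(potential: float) -> float:
    """Inverse: exergonicity (kcal/mol) needed for a given midpoint potential."""
    return (potential - E_CYT_C) * F_KCAL_PER_V


# The Cu_B range reported in Table 3 corresponds to roughly 17-22 kcal/mol
for e_m in (0.97, 1.19):
    print(f"{e_m:.2f} V <-> {exergonicity_for_potential(e_m):.1f} kcal/mol")

# Hypothetical O-H bond strength and pair energy (kcal/mol)
ex = exergonicity_vs_cyt_c(oh_bond_strength=102.0, pair_energy=85.0)
print(f"{midpoint_potential(ex):.2f} V")
```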

As mentioned above, experiments have shown that two of the active site reduction potentials are about 0.8 V (Kaila et al., 2010), which is large enough for proton pumping in the corresponding reduction steps. These two steps concern the reduction of the tyrosyl radical and of heme-Fe(IV)=O, respectively (Kaila et al., 2010). The calculated results for these two cofactors using the different functionals are reported in **Table 4**. For the tyrosyl radical, the experimental value is 0.82 V, and all functionals with a nonzero fraction of exact exchange (10–20%) are in reasonable agreement with experiment, 0.73–0.86 V. Again, the reduction potential does not vary much with the exact exchange, and even the functional without exact exchange (0%) is not too far off, with a value of 0.60 V. For the heme-Fe(IV)=O reduction, the situation is quite different. The experimental value is 0.76 V (Kaila et al., 2010), and here the functional with 15% gives the best agreement, with a value of 0.68 V. The result with 20%, 1.01 V, is also in reasonable agreement with experiment. The result without exact exchange (0%) disagrees qualitatively with experiment, with a calculated value of –0.22 V, and with only 10% exchange the calculated value, 0.45 V, is also quite far from experiment. An important factor behind the strong variation of this reduction potential is the change of spin coupling in the heme-Fe(IV)=O → heme-Fe(III)OH transition: iron is low-spin coupled in heme-Fe(IV)=O and high-spin coupled in heme-Fe(III)OH.
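The two ways of using such results that are mentioned in the Conclusions, picking the exact-exchange fraction that best matches a known experimental value or taking the 15/20% difference as an error bar, can be illustrated with the heme-Fe(IV)=O potentials quoted above. The least-squares slope is an additional illustration introduced here (the dependence is not strictly linear), expressed in the same per-percentage-point units as the Mn sensitivity mentioned in the introduction; the function names are hypothetical.

```python
# Heme-Fe(IV)=O midpoint potentials (V) from the text, keyed by % exact exchange
HEME_POTENTIALS = {0: -0.22, 10: 0.45, 15: 0.68, 20: 1.01}
HEME_EXPERIMENT = 0.76  # V (Kaila et al., 2010)
F_KCAL_PER_V = 23.06    # kcal/(mol V)


def best_fraction(calculated: dict, experiment: float) -> int:
    """Exact-exchange percentage whose result lies closest to experiment."""
    return min(calculated, key=lambda pct: abs(calculated[pct] - experiment))


def spread_estimate(calculated: dict, low: int = 15, high: int = 20) -> float:
    """Difference between two functionals, used as a rough error bar (V)."""
    return abs(calculated[high] - calculated[low])


def sensitivity_slope(calculated: dict) -> float:
    """Least-squares slope (V per percentage point of exact exchange)."""
    xs, ys = list(calculated), list(calculated.values())
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


print(best_fraction(HEME_POTENTIALS, HEME_EXPERIMENT))   # 15
print(f"{spread_estimate(HEME_POTENTIALS):.2f} V")       # 0.33 V
print(f"{sensitivity_slope(HEME_POTENTIALS) * F_KCAL_PER_V:.1f} kcal/mol per %")  # 1.4
```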

### V. CONCLUSIONS

Using the present systematic DFT approach, a few major conclusions can be drawn for the nitrogenase mechanism. **Table 1** gives the results for different E<sub>4</sub> structures obtained after four reductions of the ground state, as has been suggested experimentally. The most important result is that no functional gives results consistent with the conclusions of the experimental EPR study (Hoffman et al., 2013). In particular, all functionals give a very large exergonicity for releasing H<sub>2</sub>. This means that the experimentally suggested structure cannot be supported by any DFT functional. The discrepancy with experiments is very large, independent of the fraction of exact exchange used. Even the non-hybrid method (0%) gives very poorly bound hydrides, which can be removed with a gain of 34 kcal/mol, whereas experiments indicate that the release should be nearly thermoneutral. In contrast, the theoretically suggested structure for E<sub>4</sub>, obtained after eight reductions of the ground state, agrees much better with the experimental EPR information, in particular with 20 and 15% exchange. Since DFT did find a structure that agrees well with what is known about the E<sub>4</sub> state, the conclusion that DFT fails completely for nitrogenase is very far-fetched. Were that the case, nitrogenase would be the first example among the many redox reactions studied to show such behavior (Blomberg et al., 2014). Had no structure been found that agreed with the experimental information for E<sub>4</sub>, the conclusion could have been different. A very recent study with many different functionals did not find any functional that gave a preference for the experimental structure (Cao et al., 2018), in agreement with the prediction made here.

For CcO, the calculations show that the active site cofactor Cu<sub>B</sub> has a large midpoint potential, about 1 V, regardless of the fraction of exact exchange in the functional. This is in contrast to the much lower potentials obtained in experimental measurements, but in agreement with experimental observations on proton pumping. For another active site cofactor, heme-Fe(IV)=O, the functional with 15% exact exchange gives the best agreement with the experimental midpoint potential, and the functional without exact exchange gives qualitatively wrong results.

In summary, the results for the different functionals show that the best agreement with experiments, for both nitrogenase and cytochrome c oxidase, is obtained with 15–20% exact exchange in the functional. With 10% exact exchange the results are slightly worse, and a non-hybrid functional (0% exact exchange) gives qualitatively wrong potential surfaces in both cases. These results are in line with experience gathered during the past two decades (Blomberg et al., 2014). This experience can be used in two ways. Either it can be used to calibrate the exact exchange fraction against some well-known experimental fact for the system in question, or the difference between the results for 15 and 20% can be used as an estimate of the error in the calculations. The latter approach has mainly been used in our previous studies.

### REFERENCES

### AUTHOR CONTRIBUTIONS

PS wrote the paper and performed the nitrogenase calculations. MB wrote the paper and performed the cytochrome c oxidase calculations.

### FUNDING

This work was supported by the Swedish Research Council (grant numbers 2015-04104 and 2016-03721). Computer time was provided by the Swedish National Infrastructure for Computing.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Siegbahn and Blomberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Computational Study of Protein-Ligand Unbinding for Enzyme Engineering

#### Sérgio M. Marques 1,2, David Bednar 1,2 and Jiri Damborsky 1,2 \*

*<sup>1</sup> Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, Brno, Czechia, <sup>2</sup> International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czechia*

The computational prediction of unbinding rate constants is currently an emerging topic in drug design. However, the importance of predicting kinetic rates is not restricted to pharmaceutical applications. Many biotechnologically relevant enzymes have their efficiency limited by the binding of substrates or the release of products. While we were aiming to improve the ability of our model enzyme, haloalkane dehalogenase DhaA, to degrade the persistent anthropogenic pollutant 1,2,3-trichloropropane (TCP), the DhaA31 mutant was discovered. This variant had a 32-fold improvement of the catalytic rate toward TCP, but the catalysis became rate-limited by the release of the 2,3-dichloropropan-1-ol (DCP) product from its buried active site. Here we present a computational study to estimate the unbinding rates of the products from DhaA and DhaA31. Metadynamics and adaptive sampling were used to predict the relative order of kinetic rates in the different systems, while the absolute values depended significantly on the conditions used (method, force field, and water model). Free energy calculations provided the energetic landscape of the unbinding process. A detailed analysis of the structural and energetic bottlenecks allowed the identification of the residues playing a key role during the release of DCP from DhaA31 via the main access tunnel. Some of these hot-spots could also be identified by the fast CaverDock tool for predicting the transport of ligands through tunnels. Targeting those hot-spots by mutagenesis should improve the unbinding rates of the DCP product and the overall catalytic efficiency with TCP.

### Edited by:

*Fahmi Himo, Stockholm University, Sweden*

### Reviewed by:

*Marco De Vivo, Fondazione Istituto Italiano di Tecnologia, Italy*

*Mauricio Esguerra, Uppsala University, Sweden*

> \*Correspondence: *Jiri Damborsky jiri@chemi.muni.cz*

#### Specialty section:

*This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry*

Received: *30 October 2018* Accepted: *13 December 2018* Published: *08 January 2019*

#### Citation:

*Marques SM, Bednar D and Damborsky J (2019) Computational Study of Protein-Ligand Unbinding for Enzyme Engineering. Front. Chem. 6:650. doi: 10.3389/fchem.2018.00650*

Keywords: unbinding kinetics, protein engineering, molecular dynamics, metadynamics, adaptive sampling, CaverDock

## INTRODUCTION

Until recently, modern methods of structure-based drug design relied primarily on high binding affinity toward the target to predict biological performance. However, this paradigm has changed since it was realized that the half-life of a drug is equally important in defining its in vivo efficacy, and hence both thermodynamic and kinetic profiles must be taken into account (Lu and Tonge, 2010). For this reason, we have recently witnessed a boom of methods for the computational prediction of receptor-ligand (un)binding kinetics (Chiu and Xie, 2016; Ferruz and De Fabritiis, 2016; Dickson et al., 2017; Rydzewski and Nowak, 2017; Bruce et al., 2018; Kokh et al., 2018). The importance of determining association and dissociation rates (kon and koff, respectively), however, is not restricted to drug design. In structural biology and biocatalysis, the study of the thermodynamics and kinetics of binding and unbinding can be essential for a deep understanding of the biological processes of interest. There are well-known cases where substrate binding or product release is the rate-limiting step of the catalytic cycle (Wang et al., 2001; Bosma et al., 2003; Yao et al., 2005). Interestingly, it has been shown that substrate unbinding may, under certain conditions, also have a positive impact on enzymatic turnover (Reuveni et al., 2014). Therefore, the computational study of (un)binding processes might reveal their kinetic and/or thermodynamic bottlenecks and, in some cases, lead to improved biocatalysts for biotechnological applications.

The haloalkane dehalogenases (HLDs, EC 3.8.1.5) are one such case. These bacterial enzymes catalyze the hydrolytic conversion of halogenated aliphatic compounds into the corresponding alcohols (**Scheme 1A**). They have several practical applications, namely in the synthesis of enantiopure chemical compounds, recycling of by-products, bioremediation, and biosensing (Koudelakova et al., 2013). Like several other haloalkanes, 1,2,3-trichloropropane (TCP) is an anthropogenic compound that can end up contaminating groundwater as a recalcitrant toxic pollutant. Biodegradation is therefore a possible solution for the remediation of contaminated sites (Samin and Janssen, 2012). The HLD from *R. rhodochrous*, DhaA, hydrolyzes TCP into 2,3-dichloropropan-1-ol (DCP) only moderately. However, the five-point mutant DhaA31 (**Figure 1**) has been reported to display a 32-fold enhanced turnover number (kcat) of 1.26 s<sup>−1</sup> (Pavlova et al., 2009). DhaA31 is currently one of the best-known HLDs for hydrolyzing TCP, and it has been included in a biodegradation pathway that converts the toxic TCP stepwise into glycerol (Dvorak et al., 2014; Kurumbang et al., 2014).

The HLDs have a buried active site connected to the surface by molecular tunnels (**Figure 1**). Their catalytic cycle (**Scheme 1B**) consists of substrate binding to the enzyme (1), rearrangement of the substrate in the catalytic site into a reactive configuration (2), a multi-stage chemical step (3), and release of the alcohol and halide products to regenerate the free enzyme (4). The chemical step involves an SN2 attack of the nucleophile D106 on the electrophilic carbon atom of the substrate (DhaA numbering according to UniProt ID P0A3G2). This forms the halide ion and an alkyl-enzyme intermediate; the latter is then attacked by a water molecule activated by the catalytic base H272, which ultimately yields the final products (Verschueren et al., 1993; Kutý et al., 1998; Marques et al., 2017). The hydrolysis of TCP by wild-type DhaA (DhaAwt) is known to be rate-limited by the SN2 reaction, whereas in DhaA31 the slowest step is the release of DCP. This was established by comparing steady-state kinetic rates with pre-steady-state rates (Pavlova et al., 2009; Marques et al., 2017). Moreover, the mutations C176F and V245F in DhaA31 contributed the most to the improvement of the SN2 step toward TCP, whereas most of its bulky mutations, including C176F, narrow the molecular tunnels and thus hinder the release of the alcohol product (Marques et al., 2017).

By accelerating DCP unbinding from DhaA31 through mutagenesis, without hampering other steps of the catalytic cycle, we might improve the efficiency of DhaA31 in degrading TCP even further, which is desirable for biotechnological applications. We recently targeted the geometric bottleneck of DhaA31's main tunnel by introducing mutations at position 176, which had a strong impact on catalysis with different substrates (Kaushik et al., 2018). However, only minor improvements in activity toward TCP were attained. The present work tackles this challenge from a new perspective.

Here we report a thorough computational study of the unbinding of DCP from DhaA31 and DhaAwt. Initially, we calculated the kinetic rates using metadynamics (MTD) and adaptive sampling under different simulation conditions. This helped us to assess the best procedures for predicting absolute and relative unbinding rates. Next, we performed free energy calculations using funnel metadynamics (funnel-MTD) and calculated the energetic profiles of product unbinding. This allowed us to compare the energy barriers, identify the thermodynamic bottlenecks, and predict several hot-spots for mutagenesis that could potentially improve the release of the DCP product and thereby enhance the conversion of TCP by DhaA31.

**Abbreviations:** TCP, 1,2,3-trichloropropane; DCP, 2,3-dichloropropan-1-ol; HLD, haloalkane dehalogenase; MTD, metadynamics; funnel-MTD, funnel metadynamics; CV, collective variable; RMSD, root-mean-square deviation; MD, molecular dynamics; HTMD, high-throughput molecular dynamics; FES, free energy surface; τ, average transition time; MSM, Markov state model; SD, standard deviation.

## MATERIALS AND METHODS

### Metadynamics Unbinding Kinetics

#### System Preparation

The complexes of DhaA31 and DhaAwt with the DCP and chloride products bound in their active sites were prepared using the positions of DCP (only the R-enantiomer was studied here) docked into the corresponding crystal structures (PDB entries 3RK4 and 4E46, respectively), which were protonated and treated as previously described (Marques et al., 2017). The positions of the Cl<sup>−</sup> ions were taken from the respective crystal structures. The PREPI parameters for DCP were prepared with the Antechamber module of AmberTools 14 (Case et al., 2014) from the MOL2 structure containing the previously calculated partial atomic charges (Marques et al., 2017), using the atom types of the GAFF force field. The topology and coordinates of the complexes, hydrated only with the crystallographic water molecules, were generated with the tLEaP module of AmberTools 14, with the protein and ions described by the AMBER ff12SB force field (Maier et al., 2015), and converted to the GROMACS format using the ACPYPE script (Sousa da Silva and Vranken, 2012). Each system was solvated in a cubic box of TIP3P water molecules (Jorgensen et al., 1983) with edges at least 8 Å from the protein atoms and then neutralized with Na<sup>+</sup> ions using the editconf module of the GROMACS 5.0 package (Abraham et al., 2015).

#### System Equilibration

Energy minimization was performed with GROMACS 5.0.7 (Abraham et al., 2015) without restraints to relax the whole system, using the steepest descent method until the maximum force fell below 1 kJ/mol·nm, with a maximum of 500 steps. The Particle Mesh Ewald method was used for the long-range electrostatic interactions beyond the 10 Å cut-off (Darden et al., 1993), and periodic boundary conditions were applied. Equilibration was run in two steps: a first equilibration of 500 ps in the isothermal-isobaric ensemble (NPT) at 1 atm with the isotropic Berendsen barostat (Berendsen et al., 1984) and a coupling constant of 0.2 ps, followed by a second one of 1 ns in the isothermal-isochoric ensemble (NVT). Both steps were conducted at 300 K with the velocity-rescaled Berendsen thermostat, which ensures a proper canonical ensemble (Bussi et al., 2007), with a coupling constant of 0.1 ps. All simulations were performed with periodic boundary conditions in all directions, the Verlet pair-list scheme (Verlet, 1967) with cut-offs of 10 Å for both short-range Coulomb and van der Waals potentials, and the LINear Constraint Solver (LINCS) algorithm (Hess et al., 1997) to constrain the bonds and eliminate drift. The integration time step was 2 fs, and the energy and coordinates of the system were recorded every 1 ps.

#### Setup of the Collective Variable

A path-based collective variable (path CV) was defined to describe the release of DCP along the p1 tunnel, following the formalism described previously (Branduardi et al., 2007; Bonomi et al., 2008). It involves a progress variable s along a reference path leading from state A (the fully bound state, i.e., the docked conformation in the active site) to state B (the fully unbound state, with DCP in the bulk solvent). The path was constructed from several snapshots selected from previous accelerated molecular dynamics (aMD) simulations of DhaA31 and DhaAwt (Marques et al., 2017), chosen to have DCP at different distances and orientations between states A and B. In total, 9 frames were chosen for each system, and only the ligand and the p1 tunnel residues in contact with DCP during the release were selected as the path reference (**Figure S1** and **Table S1**). The path CV (hereafter termed p3) was then defined in root-mean-square deviation (RMSD) space. Further analysis of a set of unbinding metadynamics simulations showed that the progress variable along the path s (named p3.sss) was degenerate and hence not suitable to be used alone in this study. The degeneracy was lifted using a second CV, the distance (d1) between the center of mass of DCP and the active site cavity, the latter defined as the center of mass of the atoms Y176-Cβ, F205-Cα, L209-Cα, and H272-Cα for DhaA31, and C176-Cβ, F205-Cα, L209-Cα, and H272-Cα for DhaAwt. The λ parameter was set to 92 for DhaA31/DCP and 100 for DhaAwt/DCP. These λ values were obtained from an analysis of the RMSD matrix computed from the reference frames.
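To make the path CV concrete, the Python sketch below computes the progress variable s = Σ i·exp(−λR_i)/Σ exp(−λR_i) and the distance from the path z = −ln(Σ exp(−λR_i))/λ for one configuration, given its mean-square deviations R_i from the N reference frames. It assumes the PLUMED PATHMSD convention (frames numbered 1 to N, R_i being a mean-square deviation); the function names are hypothetical, and the λ heuristic (2.3 divided by the average MSD between consecutive frames) is a commonly quoted rule of thumb, not necessarily the exact procedure that produced the values of 92 and 100 used here.

```python
import math


def path_cv(msd_to_frames, lam):
    """Path progress s (between 1 and N) and distance z for one configuration.

    msd_to_frames[i] is the mean-square deviation between the configuration and
    reference frame i + 1; lam is the lambda parameter (inverse MSD units).
    """
    r_min = min(msd_to_frames)
    # shift by the smallest MSD so that exp() cannot underflow to zero for every frame
    weights = [math.exp(-lam * (r - r_min)) for r in msd_to_frames]
    total = sum(weights)
    s = sum((i + 1) * w for i, w in enumerate(weights)) / total
    z = r_min - math.log(total) / lam
    return s, z


def lambda_from_adjacent_msd(adjacent_msd, factor=2.3):
    """Rule-of-thumb lambda: factor divided by the mean MSD between consecutive frames."""
    return factor / (sum(adjacent_msd) / len(adjacent_msd))


if __name__ == "__main__":
    lam = lambda_from_adjacent_msd([0.020, 0.022, 0.024])  # hypothetical MSDs (nm^2)
    # hypothetical MSDs of one configuration to the 9 reference frames
    s, z = path_cv([0.30, 0.12, 0.02, 0.11, 0.35, 0.60, 0.80, 1.00, 1.20], lam)
    print(f"lambda = {lam:.1f}, s = {s:.2f}, z = {z:.4f}")
```

A configuration close to frame k gives s ≈ k, while a large z signals that it has left the reference path, which is why s alone can be degenerate and a second CV such as d1 is needed.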

#### Infrequent Metadynamics Simulations

All metadynamics (MTD) simulations were performed using the PLUMED plugin (Tribello et al., 2014), version 2.2.3, with GROMACS 5.0.7 (Abraham et al., 2015). The NVT ensemble at 300 K was used as in the equilibration, with additional position restraints on the Leu36-Cα, Ile104-Cα, and Leu237-Cα atoms, with a harmonic constant of 2.39 kcal/mol·Å<sup>2</sup> (1,000 kJ/mol·nm<sup>2</sup>) in each dimension, to prevent the protein from drifting across the periodic cell. The bias potentials were added to the path CV s and the distance d1 variables, deposited every 50 ps, with an initial height of 0.60 kcal/mol (2.50 kJ/mol) for both variables. The Gaussian widths (σ) for s and d1 were set to 0.05 and 0.014 Å, respectively, for DhaA31/DCP, and to 0.07 and 0.013 Å for DhaAwt/DCP, with the Gaussian height decaying according to a bias factor of 10. For each system, 25 independent infrequent MTD simulations were run until the ligand was released and reached distances d1 > 22 Å from the active site without immediate rebinding. These times corresponded to the biased release times, tbiased. The trajectories were visualized using VMD 1.9.1 (Humphrey et al., 1996) and PyMOL 1.7.4 (The PyMOL Molecular Graphics System, 2014).
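The restraint constants and Gaussian heights in this section are quoted in both kJ-based and kcal-based units. A small Python helper reproduces these conversions, assuming the thermochemical calorie (1 kcal = 4.184 kJ) and 1 Å<sup>2</sup> = 0.01 nm<sup>2</sup>. For example, 1,000 kJ/mol·nm<sup>2</sup> corresponds to about 2.39 kcal/mol·Å<sup>2</sup>, and 2.50 kJ/mol to about 0.60 kcal/mol. The function names are illustrative only.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie
A2_TO_NM2 = 0.01     # 1 Å^2 = 0.01 nm^2


def kj_to_kcal(energy_kj_per_mol):
    """Energy: kJ/mol -> kcal/mol."""
    return energy_kj_per_mol / KJ_PER_KCAL


def kj_nm2_to_kcal_a2(k_kj_per_mol_nm2):
    """Harmonic force constant: kJ/(mol nm^2) -> kcal/(mol Å^2)."""
    return k_kj_per_mol_nm2 / KJ_PER_KCAL * A2_TO_NM2


if __name__ == "__main__":
    # restraint constants quoted in the Methods, in kJ/(mol nm^2)
    for k in (1_000, 5_000, 25_000, 35_000):
        print(f"{k:>6} kJ/(mol nm^2) = {kj_nm2_to_kcal_a2(k):6.2f} kcal/(mol A^2)")
    print(f"Gaussian height 2.50 kJ/mol = {kj_to_kcal(2.50):.2f} kcal/mol")
```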

To obtain the unbiased release time tunbiased, the acceleration factor α was used as described by Equations 1 and 2 (Tiwary and Parrinello, 2013; Tiwary et al., 2015):

$$\alpha = \langle e^{\frac{V(r,t)}{k\_B T}} \rangle \tag{1}$$

$$t\_{unbiased} = t\_{biased} \times \alpha \tag{2}$$

where ⟨·⟩ denotes the running average accumulated over the course of the simulation up to the biased time t (i.e., tbiased), V(r,t) is the time-dependent metadynamics bias, r is the set of CV descriptors, and kBT is the thermal energy, which is 2.50 kJ·mol<sup>−1</sup> at 300 K.
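Equations 1 and 2 amount to a running exponential average of the bias. The minimal Python sketch below assumes that the bias V felt by the system at its instantaneous CV values has been recorded, in kJ/mol, at uniform time intervals along one biased trajectory. The function names are hypothetical, and the snippet is not part of PLUMED.

```python
import math

K_B = 0.0083144626  # Boltzmann constant, kJ/(mol K)


def acceleration_factor(bias_samples, temperature=300.0):
    """Equation 1: alpha = <exp(V/kBT)>, averaged over bias values V (kJ/mol).

    bias_samples must be the bias evaluated at the instantaneous CV values and
    recorded at uniform time intervals along one biased trajectory.
    """
    if not bias_samples:
        raise ValueError("at least one bias sample is required")
    kbt = K_B * temperature  # about 2.49 kJ/mol at 300 K
    return sum(math.exp(v / kbt) for v in bias_samples) / len(bias_samples)


def unbiased_time(t_biased, bias_samples, temperature=300.0):
    """Equation 2: rescale the biased release time by the acceleration factor."""
    return t_biased * acceleration_factor(bias_samples, temperature)


if __name__ == "__main__":
    # hypothetical bias trace (kJ/mol), one value per recorded frame of a 5 ns biased run
    bias = [0.0, 2.5, 6.0, 10.0, 14.0, 17.5]
    print(f"alpha = {acceleration_factor(bias):.1f}, "
          f"t_unbiased = {unbiased_time(5.0, bias):.1f} ns")
```

With uniform sampling, the time average in Equation 1 reduces to an arithmetic mean. If frames were written at irregular intervals, each exponential term would need to be weighted by its time step.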

#### Calculation of koff From Metadynamics

Estimating the unbinding rate koff requires the characteristic transition time τ of a Poisson process. τ is obtained by a least-squares fit of the empirical cumulative distribution function (ECDF) of the unbiased metadynamics times, tunbiased, to the theoretical cumulative distribution function (TCDF), which for a homogeneous Poisson process is given by Equation (3) (Tiwary et al., 2015):

$$\text{TCDF}(t) = 1 - e^{-t/\tau} \tag{3}$$

The theoretical (TCDF) and empirical (ECDF) distributions are compared with a Kolmogorov-Smirnov test, which yields a p-value measuring how compatible the distribution of times extracted from metadynamics is with the theoretical exponential distribution, and thus describes the quality of the data (Salvalaglio et al., 2014; Tiwary et al., 2015). Acceptable distributions should have p-values > 0.05; otherwise the set of results is discarded. The fitting of these distributions, the Kolmogorov-Smirnov test, and the calculation of the dissociation transition time τoff were performed with the **STPtest.m** Matlab script as provided (Salvalaglio et al., 2014). The dissociation rate koff was then calculated from τoff by Equation 4:

$$k\_{\rm off} = \frac{1}{\tau\_{\rm off}} \tag{4}$$

The error associated with the calculated koff value was estimated by a bootstrap analysis of the data set of unbiased release times obtained for each system. This was performed by re-analyzing 500 sub-samples extracted randomly from the original ensemble of release times.
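The fitting, validation, and error steps can be illustrated with a self-contained Python sketch. It is not the STPtest.m script. It fits τ by minimizing the squared difference between the ECDF and Equation 3 (a coarse scan of log τ followed by golden-section refinement). It then computes the Kolmogorov-Smirnov statistic with an asymptotic p-value that uses a commonly applied small-sample correction. Finally, it estimates the spread of koff (Equation 4) by resampling with replacement. The resampling scheme, the size of each resample, and all function names are assumptions. Because τ is fitted to the same data that are tested, the asymptotic p-value is only approximate.

```python
import math
import random


def tcdf(t, tau):
    """Equation 3: CDF of the waiting time of a homogeneous Poisson process."""
    return 1.0 - math.exp(-t / tau)


def _sse(sorted_times, tau):
    """Sum of squared differences between the ECDF, (i+1)/n, and the TCDF."""
    n = len(sorted_times)
    return sum(((i + 1) / n - tcdf(t, tau)) ** 2 for i, t in enumerate(sorted_times))


def fit_tau(times, n_grid=200, n_refine=60):
    """Least-squares tau: coarse scan of log(tau), then golden-section refinement."""
    xs = sorted(times)
    lo, hi = math.log(xs[0]) - 3.0, math.log(xs[-1]) + 3.0
    grid = [lo + (hi - lo) * k / n_grid for k in range(n_grid + 1)]
    kb = min(range(len(grid)), key=lambda k: _sse(xs, math.exp(grid[k])))
    a, b = grid[max(kb - 1, 0)], grid[min(kb + 1, n_grid)]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(n_refine):
        c, d = b - g * (b - a), a + g * (b - a)
        if _sse(xs, math.exp(c)) < _sse(xs, math.exp(d)):
            b = d
        else:
            a = c
    return math.exp(0.5 * (a + b))


def ks_statistic(times, tau):
    """Kolmogorov-Smirnov D: largest gap between the ECDF and the fitted TCDF."""
    xs = sorted(times)
    n = len(xs)
    return max(max((i + 1) / n - tcdf(t, tau), tcdf(t, tau) - i / n)
               for i, t in enumerate(xs))


def ks_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov p-value with a small-sample correction of the statistic."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the series is ~1 here and converges slowly
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, terms + 1))
    return min(1.0, max(0.0, q))


def koff_with_bootstrap(times, n_boot=500, seed=0):
    """k_off = 1/tau (Equation 4) and its SD over resamples drawn with replacement."""
    rng = random.Random(seed)
    koff = 1.0 / fit_tau(times)
    boot = [1.0 / fit_tau(rng.choices(times, k=len(times))) for _ in range(n_boot)]
    mean = sum(boot) / n_boot
    sd = math.sqrt(sum((k - mean) ** 2 for k in boot) / (n_boot - 1))
    return koff, sd


if __name__ == "__main__":
    # hypothetical unbiased release times in microseconds (not data from this study)
    times = [0.4, 1.1, 2.5, 0.9, 3.8, 0.2, 1.7, 5.2, 0.6, 2.1]
    tau = fit_tau(times)
    p = ks_pvalue(ks_statistic(times, tau), len(times))
    koff, sd = koff_with_bootstrap(times)
    print(f"tau = {tau:.2f} us, p = {p:.2f}, koff = {koff:.2f} +/- {sd:.2f} us^-1")
```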

### Adaptive Sampling Kinetics

#### System Preparation

The complexes of DhaA31 and DhaAwt with the DCP and chloride products bound in their active sites, hydrated with the crystallographic waters, were prepared as described for the metadynamics. Na<sup>+</sup> and Cl<sup>−</sup> ions were added to achieve an ionic strength of 0.1 M, and a cubic box of TIP3P (Jorgensen et al., 1983) water molecules with edges 10 Å from the original system was added. The topology and coordinates of the hydrated complexes were generated with the tLEaP module of AmberTools 14 (Case et al., 2014), with the protein and ions described by the ff12SB AMBER force field (Hornak et al., 2006; Joung and Cheatham, 2008, 2009; Nguyen et al., 2014). To compare different simulation conditions, the systems were also prepared with the ff14SB force field (Maier et al., 2015) and the OPC3 water model (Izadi and Onufriev, 2016).

#### System Equilibration

The systems were equilibrated using the Equilibration\_v2 module of high-throughput molecular dynamics (HTMD) (Doerr et al., 2016). Each system was first minimized using the conjugate-gradient method for 500 steps. The system was then heated and equilibrated as follows: (I) 500 steps (2 ps) of NVT equilibration with the Berendsen barostat to 298 K, with constraints on all heavy atoms of the protein, (II) 625,000 steps (2.5 ns) of NPT equilibration with the Langevin thermostat and 1 kcal·mol<sup>−1</sup>·Å<sup>−2</sup> constraints on all heavy atoms of the protein, and (III) 625,000 steps (2.5 ns) of NPT equilibration with the Langevin thermostat without constraints. During the equilibration simulations, holonomic constraints were applied to all hydrogen-heavy atom bonds, and the mass of the hydrogen atoms was scaled by a factor of 4, enabling the simulations to run with 4 fs time steps (Feenstra et al., 1999; Harvey and De Fabritiis, 2009; Harvey et al., 2009; Hopkins et al., 2015). The simulations employed periodic boundary conditions, the particle mesh Ewald method for the treatment of interactions beyond the 9 Å cut-off, suppression of electrostatic interactions for atoms more than 4 bond terms away from each other, and a smoothing and switching cut-off for van der Waals and electrostatic interactions at 7.5 Å (Harvey and De Fabritiis, 2009).

#### Adaptive Sampling

HTMD was used to perform adaptive sampling of the RMSD of the Cα atoms. The 20 ns production molecular dynamics (MD) runs were started from the system resulting from the equilibration cycle and employed the same settings as the last equilibration step. Trajectories were saved every 0.1 ns. Adaptive sampling was performed using the distance between the central C-2 atom of DCP and the Cγ atom of the catalytic nucleophile D106 as the reaction coordinate, with a one-dimensional time-lagged independent component analysis (TICA) (Naritomi and Fuchigami, 2011). Unless stated otherwise, 40 epochs of 10 MD runs each were performed for DhaA31 and 30 epochs for DhaAwt, corresponding to cumulative simulation times of 8 and 6 µs, respectively.

#### Markov State Model Construction

The trajectories were collected into a simulation list with HTMD, water was filtered out, and unsuccessful simulations shorter than 20 ns were omitted. This resulted in 8 µs of simulation time (400 × 20 ns) for DhaA31 and 6 µs (300 × 20 ns) for DhaAwt (Doerr et al., 2016). DCP dynamics was described by the distance between the C-2 atom of DCP and the Cγ atom of the catalytic nucleophile D106. The data were clustered into 200 clusters using the MiniBatchKMeans algorithm. A lag time of 15 ns was used to construct models with 3 Markov states, and the Chapman-Kolmogorov test was performed to assess the quality of the constructed states. Bootstrapping with 80% of the data, repeated 500 times, was used to estimate the errors of the kinetic parameters.
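The kinetic quantities of an MSM follow from its transition matrix. The Python sketch below is a simplified stand-in for the HTMD/MSM machinery, not its implementation. It counts transitions between discrete states at a chosen lag; with 0.1 ns frames, a 15 ns lag would be 150 frames. It then row-normalizes the counts (a non-reversible estimate) and obtains the mean first passage time (MFPT) from the bound to the unbound state by solving (I − T<sub>oo</sub>)m = τ<sub>lag</sub>·1 over the non-target states. koff can then be taken as the inverse of this MFPT. Working directly on 3 macrostate labels instead of 200 microstate clusters is a simplification, and the function names are assumptions for illustration.

```python
def transition_matrix(dtrajs, n_states, lag):
    """Row-normalized transition counts at a lag (in frames) from discrete trajectories.

    Simple non-reversible estimate; states without outgoing counts get an all-zero row.
    """
    if lag < 1:
        raise ValueError("lag must be at least one frame")
    counts = [[0.0] * n_states for _ in range(n_states)]
    for traj in dtrajs:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row] if total > 0 else [0.0] * n_states)
    return matrix


def mean_first_passage_time(T, lag_time, source, targets):
    """MFPT from `source` into the set `targets`: solve (I - T_oo) m = lag_time."""
    others = [i for i in range(len(T)) if i not in targets]
    n = len(others)
    # augmented matrix [I - T_oo | lag_time], restricted to the non-target states
    A = [[(1.0 if r == c else 0.0) - T[others[r]][others[c]] for c in range(n)] + [lag_time]
         for r in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    m = {others[r]: A[r][n] / A[r][r] for r in range(n)}
    return m[source]


if __name__ == "__main__":
    # hypothetical macrostate labels per frame: 0 = active site, 1 = p1 tunnel, 2 = unbound
    dtrajs = [[0, 0, 0, 1, 1, 2, 2, 2], [0, 0, 1, 0, 0, 1, 2, 2]]
    T = transition_matrix(dtrajs, n_states=3, lag=1)
    tau_off = mean_first_passage_time(T, lag_time=0.1, source=0, targets={2})  # ns
    print(f"MFPT(bound -> unbound) = {tau_off:.2f} ns, koff = {1.0 / tau_off:.2f} ns^-1")
```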

### Funnel Metadynamics

The MTD simulations were performed using GROMACS 5.0.7 (Abraham et al., 2015) patched with the PLUMED plugin (Tribello et al., 2014), version 2.2.3, modified to include the funnel metadynamics (funnel-MTD) algorithm and used as provided by the authors of the method (Limongelli et al., 2013). The NVT ensemble at 300 K was used as previously, with additional position restraints on the Leu36-Cα, Ile104-Cα, and Leu237-Cα atoms with a harmonic force constant of 59.8 kcal/mol·Å<sup>2</sup> (25,000 kJ/mol·nm<sup>2</sup>) in each dimension to prevent the protein from drifting across the periodic cell. These atoms were chosen because they are buried and have some of the lowest B-factors in the respective crystal structures. The bias potentials were added to the path CV s variable, deposited every 1 ps, with an initial height of 0.60 kcal/mol (2.50 kJ/mol). The Gaussian width (σ) was 0.05 Å for DhaA31/DCP and 0.07 Å for DhaAwt/DCP, as previously, with a decay corresponding to a bias factor of 10. A funnel-shaped restraint with a force constant of 83.6 kcal/mol·Å<sup>2</sup> (35,000 kJ/mol·nm<sup>2</sup>) was defined by a Z axis passing through point A (the coordinates of the D106-Cα atom) and point B (the geometric center of the F144-Cα, F152-Cα, A167-Cα, and K175-Cα atoms), with a cone angle α of 0.55 rad, Zcc of 20.0 Å, and Rcyl of 5.0 Å. To prevent the ligand from crossing the periodic cell, an upper distance restraint with a force constant of 12.0 kcal/mol·Å<sup>2</sup> (5,000 kJ/mol·nm<sup>2</sup>) was imposed at 23 Å from point A. The free energy surface (FES) was computed using the SUM\_HILLS module of PLUMED from the histogram distribution reweighted by the biases added by the metadynamics (Bonomi et al., 2009; Tiwary and Parrinello, 2015). The FES was reanalyzed for the variable d1, defined above, using the DRIVER module of PLUMED. The histogram reweighting took into account all the biases from the metadynamics and the restraints.
The relevant states were selected from the FES, and the simulation snapshots with d1 values within ±0.01 Å of each selected state were extracted in PDB format using GROMACS 5.0.7 (Abraham et al., 2015), after the trajectory had been aligned on the Cα atoms.
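The geometry of the funnel restraint can be sketched as follows. For a ligand position p, z is its projection on the axis from A to B and r is its distance from that axis. The allowed radius is R(z) = Rcyl + tan(α)(Zcc − z) in the conical region (z < Zcc) and Rcyl in the cylindrical region. Several elements are assumptions for illustration rather than a transcription of the modified PLUMED code: this shape (a cone enclosing the binding site and narrowing into the cylinder), the flat-bottom harmonic form ½κ(r − R(z))<sup>2</sup> applied outside the surface, and all function names. Neither a lower cap nor the upper distance wall is modeled.

```python
import math


def funnel_radius(z, alpha=0.55, z_cc=20.0, r_cyl=5.0):
    """Allowed radius (Å) at height z (Å) along the funnel axis.

    Assumed shape: a cone for z < z_cc that narrows to r_cyl at z_cc, then a cylinder.
    """
    return r_cyl + math.tan(alpha) * (z_cc - z) if z < z_cc else r_cyl


def funnel_coordinates(p, a, b):
    """Projection z of point p on the axis a->b, and its distance r from that axis."""
    axis = [bi - ai for ai, bi in zip(a, b)]
    norm = math.sqrt(sum(c * c for c in axis))
    unit = [c / norm for c in axis]
    v = [pi - ai for ai, pi in zip(a, p)]
    z = sum(vi * ui for vi, ui in zip(v, unit))
    r = math.sqrt(max(sum(vi * vi for vi in v) - z * z, 0.0))
    return z, r


def funnel_restraint(p, a, b, kappa=350.0, **shape):
    """Flat-bottom harmonic wall 0.5*kappa*(r - R(z))^2 outside the funnel surface.

    kappa is in kJ/(mol Å^2); 350 corresponds to 35,000 kJ/(mol nm^2).
    """
    z, r = funnel_coordinates(p, a, b)
    excess = r - funnel_radius(z, **shape)
    return 0.5 * kappa * excess ** 2 if excess > 0 else 0.0


if __name__ == "__main__":
    a, b = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)  # hypothetical axis along +z
    for point in [(3.0, 0.0, 25.0), (7.0, 0.0, 25.0), (15.0, 0.0, 2.0)]:
        print(point, f"{funnel_restraint(point, a, b):.1f} kJ/mol")
```

Inside the funnel the ligand moves freely, while outside it a harmonic penalty grows with the radial excess. This limits the solvent volume that the bias has to fill in the unbound state.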

### Binding Free Energy

The free energy of binding was calculated by the molecular mechanics/generalized Born solvent accessible surface area (MM/GBSA) method to determine the interaction energy of DCP with the protein residues in each of the ensembles obtained from the selected states of the FES. For this purpose, the GROMACS-format topologies of the systems were converted to the AMBER format using the ParmEd program (Swails, 2010). The ante-MMPBSA.py module (Miller et al., 2012) of AmberTools 14 (Case et al., 2014) was used to remove the solvent and ions from the resulting topology files, set the Born radii to mbondi2, and generate the corresponding topologies of the complex, receptor, and ligand for the MM/GBSA calculations. The state ensembles, in PDB format, were manually stripped of all ions and solvent. The MMPBSA.py.MPI module (Miller et al., 2012) of AmberTools 14 was used to calculate, in parallel, the mean free energy of binding over every snapshot of the PDB ensemble. The generalized Born method was used (&gb namelist) with the implicit generalized Born solvent model igb=8 and 0.1 M ionic strength (saltcon=0.1). Pairwise decomposition of the interactions was performed (&decomp namelist), discriminating all types of energy contributions (idecomp=4) for the whole residue (dec\_verbose=0).

### CaverDock Simulations

#### Tunnel Calculations

CAVER 3.02 (Chovancova et al., 2012) was used to calculate the tunnels in the protein structures of DhaA31 and DhaAwt, prepared and treated as for the metadynamics. The tunnels were calculated using a probe radius of 0.7 Å, a shell radius of 3 Å, and a shell depth of 4 Å. The starting point for the tunnel calculation was the same active-site point used to calculate the distance d1 (the center of mass of the atoms Y176-Cβ, F205-Cα, L209-Cα, and H272-Cα for DhaA31, and C176-Cβ, F205-Cα, L209-Cα, and H272-Cα for DhaAwt).

#### CaverDock Calculations

The CaverDock package ("CaverDock," 2018; Filipovič et al., 2018) was used to calculate the trajectories of DCP through the p1 tunnel of DhaA31 and DhaAwt, as identified by CAVER. The previously prepared receptor and ligand input files, in PDB and MOL2 format, respectively, were converted to the AutoDock Vina-compatible PDBQT format using MGLTools v1-5-7rc1 (Morris et al., 2009), preserving the previously calculated partial charges of the ligand. The tunnels were extended by 6 Å and discretized in 0.3 Å increments. The ligand started in the active site and was moved toward the protein surface. Side-chain flexibility was introduced iteratively using the default settings, with two automatically chosen tunnel residues made flexible per iteration.

## RESULTS

### Metadynamics Unbinding Kinetics

The unbinding of the alcohol product has become the rate-limiting step in the catalytic conversion of TCP by DhaA31, and this step is expected to be slower than in the wild-type DhaAwt (Pavlova et al., 2009; Marques et al., 2017). To verify this computationally, we used a state-of-the-art method for calculating the kinetic rates of ligand unbinding that is based on metadynamics (Tiwary et al., 2015).

Metadynamics (MTD) relies on a set of collective variables (CVs) to describe the system and the process under study. Here we used the knowledge acquired in previous computational work (Marques et al., 2017) to define the unbinding path of DCP. In that study, the release of DCP from the active site was observed with both DhaA31 and DhaAwt, always through the main tunnel (p1 tunnel; see **Figure 1**), and these simulations provided several frames for defining the path CV for the two systems. A discussion of the optimization of the CV is presented in **Discussion S1** in the **Supplementary Material**.

Twenty-five MTD simulations were performed for each protein with DCP in its active site, and each was run until DCP was fully released to the bulk solvent (distance d1 > 22 Å from the active site) without immediate rebinding to the tunnel. The release times obtained from the MTD simulations were converted to unbiased times, which showed a large dispersion for each system. As expected, however, DhaAwt released DCP significantly faster than DhaA31 on average (**Figure 2** and **Figure S2**). The unbiased release times were fitted to determine the distribution and probabilities of the transitions (the ligand unbinding; see **Figure S3**). These analyses yielded p-values above the minimum confidence threshold of 0.05 (**Table S2**), and the average dissociation times (τoff) and kinetic constants (koff) were obtained (**Table 1**). Analysis of how τoff evolved with the number of simulation runs showed that the estimate of the dissociation time was well converged even with fewer simulations (**Figure S4**). With transition times on the microsecond timescale, the predicted koff was 36-fold higher for DhaAwt than for DhaA31 (**Table 1**). This trend agrees with previous findings that DCP is significantly more prone to be released from DhaAwt than from DhaA31 (Marques et al., 2017).

### Adaptive Sampling Kinetics

At this point, we wanted to validate the kinetic rates calculated with the MTD approach using another advanced and independent method, which would allow us to compare and assess the reproducibility of the kinetic predictions across methods. We therefore performed high-throughput molecular dynamics (HTMD) using the adaptive sampling technique in combination with Markov state models (MSMs). This method yields the transition matrix between the states and thus predicts the kinetic rates of unbinding (Doerr et al., 2016).

Initially, the adaptive simulations were performed using the same force field and water model as the MTD simulations (ff12SB and TIP3P). We found that the distance between the ligand and the catalytic nucleophile D106 (defined as described in the Methods) was a good metric for building Markov state models that describe the events of interest (the release of DCP from DhaA31 and DhaAwt). We defined 3 Markov states, which were satisfactory upon visual inspection: one state corresponded to DCP in the active site, an intermediate state to DCP in the main access tunnel (p1), and the unbound state to DCP outside the protein (**Figure 3**). The Chapman-Kolmogorov test, performed to assess the quality of the Markov state models, was satisfactory for the parameters used (**Figures S5**–**S7**). The kinetic rates between the fully bound and fully unbound states were calculated (**Table 1**). The default simulation time was 8 µs for DhaA31 and 6 µs for DhaAwt, which generally proved sufficient according to the errors obtained by bootstrapping. The DCP release rates (koff) obtained with adaptive sampling (**Table 1**, column "ff12SB+TIP3P") were 1-2 orders of magnitude higher than those calculated with the MTD method. As for the ratio of koff values between DhaAwt and DhaA31, the order was maintained, but the difference between the two enzymes was smaller. DCP was released 2.5 times faster from DhaAwt than from DhaA31, which agrees only moderately with the MTD results (36 times faster for DhaAwt) and with previous computational evidence (Marques et al., 2017). Regarding the remaining kinetic parameters, DhaA31 seemed to be more prone to rebinding DCP than DhaAwt, showing a higher kon and a lower Kd. This disagrees with the experimental data, which showed a higher Kd for DhaA31 than for DhaAwt (**Table 1**).
The free energy of the bound state, compared to the unbound state, was −1.76 ± 0.19 kcal/mol for DhaA31, and −0.54 ± 0.34 kcal/mol for DhaAwt. This means that DCP's bound state in DhaA31 is thermodynamically more stable than that of DhaAwt by 1.22 kcal/mol. Furthermore, the proteins remained stable throughout the simulations, as can be inferred from the RMSD plots (**Figure S8**).
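To put the 1.22 kcal/mol difference in perspective, a free energy difference ΔΔG corresponds to a factor of exp(−ΔΔG/RT) in the bound/unbound equilibrium ratio. A short Python check, assuming 298 K (the temperature of the adaptive sampling runs), gives a bound/unbound ratio roughly 8 times larger for DhaA31 than for DhaAwt. The helper name is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant, kcal/(mol K)


def equilibrium_factor(delta_g_kcal, temperature=298.0):
    """Factor exp(-dG/RT) corresponding to a free energy difference dG (kcal/mol)."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature))


if __name__ == "__main__":
    # bound relative to unbound, ff12SB+TIP3P MSMs (values reported in the text)
    dg_dhaa31, dg_dhaawt = -1.76, -0.54  # kcal/mol
    ddg = dg_dhaa31 - dg_dhaawt
    print(f"ddG = {ddg:.2f} kcal/mol -> factor {equilibrium_factor(ddg):.1f}")
```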

Because the values and ratios of the kinetic rates obtained from MTD and adaptive sampling differed, we performed a similar study under different simulation conditions. For that, we varied the force field (ff14SB instead of ff12SB) and the water model (OPC3 instead of TIP3P). The koff values obtained with ff14SB were lower than with ff12SB for both systems (**Table 1**), suggesting slower ligand dynamics with ff14SB. When the OPC3 water model was used, slower unbinding rates were observed than with TIP3P. Other kinetic parameters were also significantly affected by the force field and water model, in some cases differing by orders of magnitude (namely kon, Kd, and ΔG<sup>0</sup><sub>eq</sub>). The Kd values predicted with ff14SB+OPC3 were closer to the experimental ones than those obtained under the other conditions (8.3 ± 2.5 mM predicted vs. 0.95 mM experimental for DhaAwt). In some cases, the dynamic behavior of the systems changed so significantly that the initial simulation times (8 µs for DhaA31 and 6 µs for DhaAwt) did not provide enough sampling for precise estimates. This occurred for DhaA31 with ff14SB+OPC3 and for DhaAwt with ff14SB+TIP3P. In these cases, the simulations were run twice as long (16 µs for DhaA31 and 12 µs for DhaAwt), although even longer simulations might be required for DhaA31 given the timescales of the observed events. Overall, the ff14SB+OPC3 combination seemed to produce more accurate results, with predicted Kd values closer to the experimental ones.

obtained from the infrequent metadynamics simulations (25 runs were performed for each system).

TABLE 1 | Experimental and theoretical kinetic parameters obtained from the metadynamics and adaptive sampling simulations of DCP with DhaA31 and DhaAwt for all the tested force fields and water models.

*<sup>a</sup>τoff, mean dissociation transition time; koff, dissociation rate; τon, mean association transition time; kon, association rate; Kd, equilibrium dissociation constant; ΔG<sup>0</sup><sub>eq</sub>, free energy difference between bound and unbound states; rel. koff, DhaAwt/DhaA31 ratio of koff rates; <sup>b</sup>force field and water model used; <sup>c</sup>solubility concentration; <sup>d</sup>detection limit of the instrument. The variability of the parameters is the SD obtained from a bootstrap analysis.*

### Pre-Steady-State Kinetics

To validate the findings of the present computational study, the calculated kinetic properties were compared with the previously reported results from transient kinetic measurements.

The basis for the current work is that the rate-limiting step in the catalytic conversion of TCP by the DhaA31 mutant is product release, i.e., the unbinding of DCP from the protein to the bulk solvent. This conclusion was based on a comparison of steady-state kinetic rates with the results of transient kinetic measurements (Pavlova et al., 2009). In that study, we observed that rapid mixing of DhaA31 with an excess of TCP produced a burst of both DCP and chloride, followed by a linear steady-state phase with a rate constant of 1.36 ± 0.18 s<sup>−1</sup> for DCP. This rate is in very good agreement with the steady-state kcat of 1.26 ± 0.07 s<sup>−1</sup>. Further studies showed that the release of the halide was a fast process, which allowed us to conclude that the unbinding of DCP is rate-limiting for DhaA31, whereas DhaAwt is limited by the catalytic step (Marques et al., 2017).

SCHEME 2 | Kinetic scheme showing the binding of DCP (L) to the enzyme (E). The transition of E to E' represents a conformational change in the enzyme.

$$k\_{obs} = k\_1 + \frac{k\_{-1} \cdot K\_d}{[L] + K\_d} \tag{5}$$

Binding experiments of DCP were also carried out with DhaA31 and DhaAwt using stopped-flow fluorescence. Unfortunately, these experiments were unsuccessful with DhaA31 because of the very low affinity of this enzyme for DCP: no binding was observed even at concentrations near the solubility limit, which implies a high dissociation constant (K<sub>d</sub> > 20 mM). For DhaAwt, the fluorescence traces revealed slow kinetics upon mixing with DCP that could be described by a single exponential. This indicated that a slow conformational change of the enzyme precedes the fast binding of DCP (**Scheme 2** and Equation 5). This, together with the time-resolution limits of the instruments, prevented the determination of the binding and unbinding rate constants of DCP (kon and koff), but the equilibrium dissociation constant was obtained as K<sub>d</sub> = 0.95 ± 0.34 mM. Moreover, the equilibrium between the two enzyme conformations, E and E', favors the non-binding form (E) by about 2:1, with k<sub>1</sub> = 3.31 ± 0.27 s<sup>−1</sup> and k<sub>−1</sub> = 6.16 ± 0.42 s<sup>−1</sup> (Marques et al., 2017).
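Equation 5 predicts that the observed rate decreases hyperbolically from k<sub>1</sub> + k<sub>−1</sub> at zero ligand to k<sub>1</sub> at saturating [L], and that the ligand-free ratio [E]/[E'] equals k<sub>−1</sub>/k<sub>1</sub>. The sketch below evaluates the model with the DhaAwt constants quoted above as defaults; it assumes that k<sub>1</sub> is the E → E' rate and k<sub>−1</sub> the reverse rate (consistent with E being the favored form), and uses mM throughout so that [L] and K<sub>d</sub> share units. It is an illustration of the model, not a re-fit of the stopped-flow data.

```python
def k_obs(ligand_mM: float, k1: float = 3.31, k_minus1: float = 6.16,
          kd_mM: float = 0.95) -> float:
    """Observed relaxation rate (s^-1) from Equation 5.

    k1: rate of E -> E' (s^-1); k_minus1: rate of E' -> E (s^-1);
    kd_mM: dissociation constant of L from E' in the fast binding step (mM).
    """
    return k1 + k_minus1 * kd_mM / (ligand_mM + kd_mM)

def conformer_ratio(k1: float = 3.31, k_minus1: float = 6.16) -> float:
    """Equilibrium [E]/[E'] ratio in the absence of ligand."""
    return k_minus1 / k1

for conc in (0.0, 0.95, 10.0, 100.0):
    print(f"[DCP] = {conc:6.2f} mM -> k_obs = {k_obs(conc):.2f} s^-1")
print(f"[E]/[E'] = {conformer_ratio():.2f}")
```

The ratio k<sub>−1</sub>/k<sub>1</sub> comes out at about 1.9, which is the roughly 2:1 preference for the non-binding form stated above.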

### Free Energy Calculations

Free energy profiles provide insight into the thermodynamic and kinetic determinants of the individual steps of the catalytic cycle, since they reveal the energies of the different states and the barriers separating them. Their study can therefore help to assess not only the differences between the unbinding of DCP from DhaA31 and DhaAwt, but also how the current bottlenecks might be overcome by protein engineering.

Funnel-metadynamics (funnel-MTD) is a method for efficiently calculating the free energy surface (FES) of ligand (un)binding (Limongelli et al., 2013). In this method, a funnel-shaped restraint prevents the ligand from drifting away into the solvent and thus allows the sampling of multiple forward and reverse (unbinding/binding) events, which is needed for a correct estimate of the free energy profile of the process. The funnel restraints used in this study (**Figure S9**) were defined iteratively so as to allow the free motion of the ligand within the active site, the main tunnel (p1), and the respective tunnel mouth. The funnel-MTD simulations used the same path CV as the MTD unbinding kinetics simulations and were run until the free energy converged. In these simulations, DCP was released (reaching distances d1 > 15 Å) and rebound to the active site (reaching d1 < 5 Å) several times in each system, as required for free-energy calculations (**Figure S10**). In both cases, the proteins remained stable throughout most of the simulations; the only exception was a steep, transient increase in the backbone RMSD of DhaAwt between 100 and 110 ns, after which it returned to values below 1.2 Å (**Figure S11**).
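To make the funnel restraint concrete, the sketch below builds a cone-plus-cylinder geometry (the cone widens toward the binding site, the narrow cylinder reaches into the solvent) and a harmonic wall that is zero while the ligand stays inside it, following the general shape described by Limongelli et al. (2013). All parameter names and values (`z_cc`, `alpha`, `r_cyl`, `z_max`, `k`) are hypothetical; the funnels actually used here are the ones shown in Figure S9, and production runs would rely on the restraint implementation of the MD/enhanced-sampling software rather than on this code.

```python
import math

def funnel_radius(z, z_cc, alpha, r_cyl):
    """Funnel radius (Angstrom) at axial coordinate z.

    For z < z_cc the funnel is a cone that widens toward the binding site;
    for z >= z_cc it is a cylinder of radius r_cyl reaching into the solvent.
    """
    if z < z_cc:
        return r_cyl + math.tan(alpha) * (z_cc - z)
    return r_cyl

def funnel_restraint(position, origin, axis, z_cc, alpha, r_cyl, z_max, k=20.0):
    """Harmonic-wall energy (kcal/mol), zero while the ligand is inside the funnel.

    position, origin: (x, y, z) in Angstrom; axis points from the binding site
    toward the solvent. k is a hypothetical force constant in kcal/(mol*A^2).
    """
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    d = [p - o for p, o in zip(position, origin)]
    z = sum(di * ui for di, ui in zip(d, unit))                # projection on the axis
    r = math.sqrt(max(sum(di * di for di in d) - z * z, 0.0))  # radial distance
    energy = 0.0
    excess = r - funnel_radius(z, z_cc, alpha, r_cyl)
    if excess > 0.0:
        energy += k * excess ** 2          # pushes the ligand back toward the axis
    if z > z_max:
        energy += k * (z - z_max) ** 2     # caps the cylinder at the solvent end
    return energy

# Hypothetical funnel along +z: cone up to 10 A, cylinder of radius 2 A, capped at 25 A
params = dict(origin=(0.0, 0.0, 0.0), axis=(0.0, 0.0, 1.0), z_cc=10.0,
              alpha=math.radians(30.0), r_cyl=2.0, z_max=25.0)
print(funnel_restraint((0.0, 0.0, 15.0), **params))  # on the axis, inside the funnel
print(funnel_restraint((5.0, 0.0, 15.0), **params))  # 3 A outside the cylinder
```

Because the restraint vanishes inside the funnel, it does not bias the sampling within the binding region; it only limits how far the ligand can wander once it leaves the tunnel, which is what makes repeated rebinding events affordable.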

The free energy was primarily calculated as a function of the original CV used in the MTD simulations by reweighting the histogram distribution with the biasing Gaussians added to the system, which yielded the respective one-dimensional FES (Bonomi et al., 2009; Tiwary and Parrinello, 2015). To analyze convergence, the FES was calculated cumulatively, taking into account an increasing number of snapshots. Two energy basins were integrated, and the free energy difference between them was plotted as a function of simulation time (**Figure S12**). Similarly, the energy barrier between the same two basins was measured and plotted as a function of simulation time (**Figure S13**). Altogether, this analysis allowed us to conclude that the simulations were well converged after the respective running times (500 ns for DhaA31 and 400 ns for DhaAwt).
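The convergence check described above relies on integrating each basin of the one-dimensional FES, F<sub>basin</sub> = −kT ln Σ exp(−F(s)/kT) Δs. The sketch below assumes a uniformly spaced CV grid and kT ≈ 0.5925 kcal/mol (about 298 K); the toy profile is hypothetical and is not the DhaA31 or DhaAwt FES. Repeating the calculation on FES estimates built from increasing numbers of snapshots would give a convergence curve of the kind described for Figure S12.

```python
import math

KT = 0.5925  # kcal/mol at ~298 K (assumed temperature)

def basin_free_energy(s_grid, fes, lower, upper, kT=KT):
    """Free energy (kcal/mol) of the basin lower <= s < upper on a uniform grid.

    F_basin = -kT * ln( sum_i exp(-F(s_i)/kT) * ds )
    """
    ds = s_grid[1] - s_grid[0]
    z = sum(math.exp(-f / kT) * ds
            for x, f in zip(s_grid, fes) if lower <= x < upper)
    return -kT * math.log(z)

def basin_difference(s_grid, fes, basin_a, basin_b, kT=KT):
    """Delta F = F(B) - F(A) for basins given as (lower, upper) tuples."""
    return (basin_free_energy(s_grid, fes, *basin_b, kT=kT)
            - basin_free_energy(s_grid, fes, *basin_a, kT=kT))

# Hypothetical toy FES: a deep basin near s = 5 A and a shallower one near s = 12 A
s = [0.5 * i for i in range(41)]  # 0 to 20 A
fes = [min((x - 5.0) ** 2, 2.0 + 0.5 * (x - 12.0) ** 2) for x in s]
print(f"F(B) - F(A) = {basin_difference(s, fes, (3.0, 8.0), (10.0, 14.0)):.2f} kcal/mol")
```

Integrating over the whole basin, rather than reading off the minimum, accounts for basin width: in the toy profile the second basin is 2 kcal/mol higher at its bottom but broader, so the integrated difference is somewhat smaller than 2 kcal/mol.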

The FES was projected onto the distance of DCP from the active site (**Figure 4A**). For simplicity, the global minimum of each FES was set to 0 kcal/mol. In both systems, the global minimum corresponded to a region in the middle of the p1 tunnel (d1 ≈ 5–6 Å) rather than to the active site. From a previous study (Marques et al., 2017), we know that the length of the p1 tunnel varies between ≈ 9 and 13 Å (associated with d1), which roughly defines the limits of the tunnel mouth (**Figure 1**). This agrees well with the local minima at ≈ 12 Å for DhaA31, and at ≈ 10 and 13 Å for DhaAwt (**Figure 4**). Two-dimensional FES profiles can be projected onto any pair of parameters, which can be useful for assessing potential degeneracy of the CV used (e.g., path CV and d1, **Figure S14**). DhaA31 presented one very steep energy barrier of 4.81 kcal/mol between the global minimum (at 5.94 Å) and a second minimum at the tunnel mouth (12.14 Å), whereas DhaAwt appears to have a smoother, stepwise transport process with two lower energy barriers (2.26 and 1.46 kcal/mol) to reach the tunnel mouth. This can have a strong impact on the product release kinetics (koff), since transition rates increase exponentially as the energy barriers decrease. This result is consistent with the kinetic studies described above, which showed slower DCP unbinding from DhaA31 than from DhaAwt.
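The exponential dependence mentioned above can be made concrete with an Arrhenius/transition-state estimate, k<sub>fast</sub>/k<sub>slow</sub> = exp(ΔΔG<sup>‡</sup>/RT). The sketch assumes equal pre-exponential factors, T = 298.15 K, and treats each profile as governed by its single highest barrier, which is a strong simplification of the stepwise DhaAwt profile; it is an order-of-magnitude illustration, not a koff prediction.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def rate_ratio(barrier_slow, barrier_fast, temperature=298.15):
    """k_fast / k_slow for two barriers (kcal/mol), assuming equal prefactors."""
    return math.exp((barrier_slow - barrier_fast) / (R_KCAL * temperature))

# Main DhaA31 barrier vs. the higher of the two DhaAwt barriers (values from the text)
print(f"speed-up ~ {rate_ratio(4.81, 2.26):.0f}-fold")
```

A barrier difference of about 2.6 kcal/mol translates into a factor of roughly 74, the same order of magnitude as the DhaAwt/DhaA31 koff ratio obtained with MTD.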

### Structural Analysis of the Free Energy Landscapes

Here we aimed to understand the structural meaning of the different states of the calculated FES and to identify potential hotspots for improving the unbinding rate of DCP from DhaA31. The FES calculated for the release of DCP through the main tunnel of DhaA31 and DhaAwt share similar locations of the global minima and similar shapes at longer distances. However, they differ significantly in the number of local minima and in the heights of the energy barriers along the pathway. Several relevant stages of the FES were identified, and the corresponding simulation frames were extracted (**Table S3**). These ensembles can be considered representative structures of the main states of the systems along the unbinding of DCP through the main tunnels of DhaA31 and DhaAwt (**Figure 4**). A first observation is that DCP was more confined within DhaA31 than within DhaAwt, especially at the first transition state (TS1) (**Figure 4**). Moreover, the states with d1 ≥ 13 Å contain DCP outside the tunnel, where it interacts with residues at the tunnel mouth before being fully released to the bulk solvent (last state, d1 ≥ 20 Å). A closer look at the enzyme structures during the simulations revealed that some of the tunnel-lining residues were highly flexible and adopted diverse conformations that allow ligand transport through the tunnel. One such residue is F149, which clearly adopted two states in both DhaA31 and DhaAwt: its aromatic ring pointed either (i) toward the middle of the tunnel or (ii) toward the side of the structure, under the α4 helix (**Figure S15**). Because of these two conformations, F149 may act as a gatekeeper for the transport of ligands through the p1 tunnel.

Measuring the interactions formed by DCP with each residue can provide a quantitative assessment of these observations and confirm the pivotal role of some residues during the unbinding process. Therefore, the free energy of binding (ΔG<sub>bind</sub>) of DCP was calculated for the structural ensembles. The average interaction energies were calculated for the global energy minimum (**Figure S16**) and for the TS1 clusters (**Figure S17**). As expected, at the energy minimum the residues at the tunnel bottleneck dominated the interactions with DCP. The high standard deviations (SD) found for several residues reflect the structural diversity within each cluster. We also assessed which residues contributed most to hindering the transition from the global minimum to TS1 through the strength of their interactions with DCP. For that purpose, we calculated the difference in binding energy between those two states. For DhaA31, the residues with the strongest influence (most negative ΔΔG<sub>bind</sub>) were F152 > F168 > F149 > F245 (**Figures 5**, **S18**). These are potential hot-spots for decreasing the energy barrier in DhaA31 and thus improving the unbinding rate of DCP. The residues F144, T148, and K175 form strong interactions in TS1 and might also be interesting hot-spots for mutagenesis (**Figure 5**). The hypothesis here is that the energy barrier may be lowered by strengthening the interactions at TS1.
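The hot-spot ranking described above amounts to a per-residue difference between two states followed by a sort. The sketch below assumes the convention ΔΔG<sub>bind</sub> = ΔG<sub>bind</sub>(global minimum) − ΔG<sub>bind</sub>(TS1), so that a strongly negative value marks a residue that holds DCP in the minimum much more tightly than at TS1; the text implies this sign but does not state the formula. The residue labels and energies are placeholders, not results from this study.

```python
def rank_hotspots(dg_min, dg_ts1):
    """Rank residues by ddG = dG_bind(global minimum) - dG_bind(TS1), most negative first.

    Residues missing from one state are assumed to contribute 0 kcal/mol there.
    """
    residues = set(dg_min) | set(dg_ts1)
    ddg = {res: dg_min.get(res, 0.0) - dg_ts1.get(res, 0.0) for res in residues}
    return sorted(ddg.items(), key=lambda item: item[1])

# Hypothetical per-residue energies (kcal/mol); placeholders, not data from this study
dg_min = {"RES_A": -1.8, "RES_B": -0.9, "RES_C": -0.4}
dg_ts1 = {"RES_A": -0.3, "RES_B": -0.8, "RES_C": -1.1}
for res, value in rank_hotspots(dg_min, dg_ts1):
    print(f"{res}: {value:+.2f} kcal/mol")
```

In this toy case RES_A would be the candidate for lowering the barrier, while RES_C, which binds more strongly at TS1, corresponds to the second strategy of stabilizing the transition state.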

### CaverDock Calculations

Here we tested whether a computationally inexpensive method could predict the release of DCP from the buried active sites of DhaA31 and DhaAwt, and compared its results with those obtained from the more rigorous methods used above. CaverDock (CaverDock, 2018; Filipovič et al., 2018) was selected for this task. This program was developed for the rapid prediction of the trajectory and energy profile of a ligand transported through a molecular tunnel. Being based on molecular docking, it can be used for the fast assessment or high-throughput screening of potential substrates, drugs, or metabolites that are expected to bind to, or be transported through, the tunnels of biomolecules (Pinto et al., 2018; Vávra et al., 2018).

CaverDock was used here to predict the trajectories and energy profiles of DCP through the p1 tunnel of DhaA31 and DhaAwt, which were compared with those obtained from the MTD method. The results showed that both enzymes have energy minima with similar binding energies (ΔE<sub>bind</sub> = −4.1 kcal/mol), located at the active site rather than in the middle of the tunnels (**Figure 6** and **Table S4**), in contrast with the FES obtained from funnel-MTD. When the calculations were performed with static receptors, DCP encountered a very high, repulsive energy barrier (ΔE<sub>bind</sub> = +9.9 kcal/mol) in DhaA31, in sharp contrast with DhaAwt, which had a lower barrier (3.1 kcal/mol) and favorable energies (ΔE<sub>bind</sub> < 0) along the whole pathway. This was due to clashes between DCP and protein residues as it passed through the narrower tunnel of DhaA31. The energy barrier was thus much higher for DhaA31 than for DhaAwt, in qualitative agreement with the FES profiles (**Table S4**). When the CaverDock calculations were performed with receptor flexibility (a feature still under development), the energy profiles became smoother and the energy barriers for the unbinding of DCP dropped significantly, to 4.3 kcal/mol for DhaA31 and 2.7 kcal/mol for DhaAwt. The residues made flexible are selected according to the extent of their clashes during rigid docking; here, they were F149 and Y176 for DhaA31, and F149 and C176 for DhaAwt. These residues are located at the tunnel bottleneck, and they were shown to form strong interactions with DCP at the global energy minima identified in the FES, confirming their importance (**Figure S16**). This result is remarkable, especially considering the dramatic difference in computational cost between CaverDock (hours) and the free energy methods (weeks).
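CaverDock returns a binding-energy profile along the tunnel, and the barriers quoted above can be read from such a profile as the rise from the global minimum to the highest point beyond it. The helper below is a minimal sketch of that reading; it is our assumption about how a barrier is extracted, not CaverDock's own post-processing, and the profile is hypothetical rather than CaverDock output for either enzyme.

```python
def barrier_after_minimum(profile):
    """Barrier (kcal/mol) from the global minimum to the highest later point.

    profile: binding energies ordered from the active site toward the tunnel mouth.
    Returns (barrier, index_of_minimum, index_of_maximum).
    """
    i_min = min(range(len(profile)), key=profile.__getitem__)
    i_max = max(range(i_min, len(profile)), key=profile.__getitem__)
    return profile[i_max] - profile[i_min], i_min, i_max

# Hypothetical profile, not CaverDock output for DhaA31 or DhaAwt
profile = [-3.0, -2.2, 0.4, 1.8, -0.5, -1.0, 0.0]
barrier, i_min, i_max = barrier_after_minimum(profile)
print(f"barrier = {barrier:.1f} kcal/mol between points {i_min} and {i_max}")
```

Searching for the maximum only after the minimum matches the unbinding direction; the same function applied to a reversed profile would give the barrier for entry into the active site.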

### DISCUSSION

The calculation of the kinetics and thermodynamics of ligand (un)binding has recently been shown to be pivotal in drug design, but it can also be important in structural biology and biocatalysis. This is the case for the mutant enzyme DhaA31, which is currently the best known HLD for hydrolyzing the genotoxic compound TCP, but whose catalytic turnover number is limited by the release of the DCP product. For this system, assessing the kinetic and thermodynamic bottlenecks in the unbinding of DCP may pave the way for the design of improved biocatalysts.

In this study, we calculated the unbinding rates (koff) of DCP from the active sites of two enzymes, DhaA31 and DhaAwt, using two different methods: metadynamics (MTD) and adaptive sampling. Both methods predicted faster unbinding from DhaAwt than from DhaA31 (**Table 1**), in good agreement with previous evidence (Marques et al., 2017). However, the results from the two methods differed considerably. For each system, the koff values differed by 1–2 orders of magnitude, being slower with MTD than with adaptive sampling. As for the relative koff values, we obtained a DhaAwt/DhaA31 ratio of 36 with MTD but only 2.5 with adaptive sampling, meaning that the latter predicted faster rates for both systems and also closer values for the two enzymes. On the other hand, the koff values obtained with adaptive sampling were more precise (lower relative errors) than those from MTD. Previous studies have attributed differences between predicted and experimental koff values to errors in the force fields, the lack of polarizability, or the existence of tautomers, among other factors (Tiwary et al., 2015; Ferruz and De Fabritiis, 2016; Bruce et al., 2018). Because both sets of simulations were performed with the same force field and water model (ff12SB and TIP3P, respectively), these differences are probably due to intrinsic differences between the two methods. The two methods introduce different types of bias (MTD deposits a repulsive bias potential along a user-defined CV, whereas adaptive sampling uses Markov state models, built on the fly from a user-defined metric, to start new epochs of MD simulations), and no experimental koff values are available, so it is difficult to assess which results are more accurate. Moreover, MD simulations performed with the same force field but different software packages are known to produce different conformational ensembles and, consequently, different results (Childers and Daggett, 2018). The effects of the force field and solvent on the predicted kinetic rates were tested with additional adaptive sampling simulations using the ff14SB force field and the OPC3 water model (**Table 1**). Compared with the available experimental data, the K<sub>d</sub> value predicted for DhaAwt with the ff14SB+OPC3 combination (8.3 ± 2.5 mM) was the closest to the experimental value (K<sub>d</sub> = 0.95 ± 0.34 mM), suggesting that these conditions better represent the physical properties of this system. A deeper discussion of this topic is presented in **Discussion S2** in the **Supplementary Material**. We have shown that the choice of method, force field, and water model can strongly affect the predicted kinetic properties. However, important conclusions could consistently be drawn from the comparative study of the two systems, namely the higher propensity of DhaAwt to release DCP compared with DhaA31. This strongly supports the value of comparative studies of similar systems, particularly for the design of new enzyme variants in protein engineering.
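The MTD kinetics discussed above rest on a general idea: if bias is deposited slowly enough that the transition state stays essentially unbiased, each simulated step can be stretched to physical time by the instantaneous acceleration exp(V/kT), and the rescaled escape times should then follow an exponential (Poisson-process) distribution. The sketch below illustrates that generic recipe; the function names, the kT value, the escape times, and the use of a Kolmogorov-Smirnov distance as a sanity check are our assumptions, not the exact protocol behind Table 1.

```python
import math

def rescaled_time(bias_kcal, dt, kT=0.5925):
    """Physical time from an infrequent-metadynamics trajectory.

    bias_kcal: bias V(s(t), t) acting on the system at each saved step (kcal/mol);
    dt: time between saved steps. Each step is stretched by exp(V / kT).
    """
    return sum(dt * math.exp(v / kT) for v in bias_kcal)

def koff_and_ks(escape_times):
    """Mean residence time, k_off, and KS distance to an exponential distribution."""
    times = sorted(escape_times)
    n = len(times)
    tau = sum(times) / n
    d_max = 0.0
    for i, t in enumerate(times):
        cdf = 1.0 - math.exp(-t / tau)  # Poisson-process CDF with mean tau
        # the empirical CDF jumps from i/n to (i+1)/n at t
        d_max = max(d_max, abs(i / n - cdf), abs((i + 1) / n - cdf))
    return tau, 1.0 / tau, d_max

# Hypothetical rescaled escape times (microseconds) from independent runs
tau, koff, d = koff_and_ks([3.1, 0.8, 5.6, 1.9, 2.4, 7.3, 0.5, 4.2])
print(f"tau = {tau:.2f} us, koff = {koff * 1e6:.3g} s^-1, KS distance = {d:.2f}")
```

A large KS distance would suggest that bias was deposited too fast or that the CV misses a slow degree of freedom, one of the method-specific error sources that could contribute to the MTD and adaptive-sampling rates disagreeing.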

The funnel-MTD simulations provided the free energy profiles for the unbinding of DCP from DhaA31 and DhaAwt, from which we derived conclusions about the respective energy barriers and bottlenecks. The global energy minima in both enzymes were found in the middle of the tunnel (d1 ≈ 5–6 Å). Beyond the global minimum, DhaA31 presented one steep energy barrier of 4.81 kcal/mol before DCP could reach the tunnel mouth, while DhaAwt had two steps with considerably lower barriers (2.26 and 1.46 kcal/mol, respectively). This implies faster unbinding for DhaAwt, in good agreement with the kinetic calculations. The first transition state (TS1) in DhaA31 also corresponds to the geometric bottleneck and presents much higher steric constraints than that in DhaAwt (**Figure 4**). The structural clusters corresponding to the significant states along the FES allowed us to assess the binding energy of DCP with the individual protein residues. From this analysis, it was possible to identify the residues that interact with DCP at the energy minima and transition states, and thus contribute to the stabilization of these states. We hypothesize that the difference in binding energy between the global energy minimum and the transition states may help identify the residues that contribute most to retaining DCP in that minimum and preventing the enzyme-product complex from proceeding toward full unbinding. Therefore, the residues with the most negative ΔΔG<sub>bind</sub> are the most likely hot-spots for improving the unbinding rates. In DhaA31, these residues are F152, F168, F149, and F245 (**Figure 5**). However, residues F149 and F245 are known to be important for stabilizing and orienting TCP for the S<sub>N</sub>2 step (Marques et al., 2017), and therefore they should not be mutated, to avoid undesirable disruption of the chemical steps.
Previous studies of this system identified several of these bottleneck residues as highly interacting with DCP, e.g., F149, F152, and F168 (Marques et al., 2017). However, their role was not clear, and the current analysis confirms their pivotal importance in preventing the transition from the energy minimum toward release. The residues with the strongest interactions at the transition state, F144, T148, and K175, also represent interesting hot-spots for lowering the energy of TS1 and promoting the progression of DCP along the unbinding pathway. However, the outcome of this approach is less predictable, since entropy can also be strongly affected. CaverDock provided valuable insights into the transport of DCP through the tunnels, especially considering that it is fast and has very low hardware requirements. With these simple calculations, we could conclude that DhaA31 has a higher energy barrier to the unbinding of DCP than DhaAwt, and we identified some of the residues that may hinder transport the most.

Overall, we have shown that computing the kinetics and thermodynamics of protein-ligand unbinding can be a powerful tool for protein engineering when the goal is to improve the unbinding rate of a ligand from a biomolecule. Similar methods can also be applied when the aim is to improve ligand (substrate) binding. We have illustrated that even highly sophisticated methods cannot precisely estimate kinetic values, owing to computational limitations, and that the results may depend strongly on the selected parameters. However, they can be very useful for comparative purposes, which are typically what protein engineering projects need. Free energy computation with funnel-MTD, or with other enhanced-sampling free energy methods, can provide deep insight into the binding/unbinding process, identify the energetically critical stages, and reveal the key residues for unbinding. On the other hand, CaverDock is very fast and user-friendly, yet it can still provide significant information about ligand transport and enable the identification of key residues for improving it. Different strategies can be followed for engineering new enzymes with improved ligand unbinding rates. The potential hot-spots for mutagenesis can be selected based on: (i) the residues showing the largest interaction differences between the energy minimum and the transition state, aiming to decrease the energy barrier; (ii) residues interacting at the transition state, aiming to lower the transition-state energy; (iii) tunnel-lining residues, aiming to change the shape and geometric bottleneck of the tunnel; and (iv) residues in contact with the tunnel-lining residues, aiming to change the flexibility and dynamic properties of the tunnel residues. The selected hot-spot residues can be targeted by site-directed mutagenesis, smart libraries, or saturation mutagenesis.
The effects of particular mutations on the unbinding rates can be anticipated with in silico calculations, either with the thorough but costly approaches (MTD or adaptive sampling), or using the cheaper CaverDock for a faster screening.

## CONCLUSIONS

Here we reported the application of metadynamics and adaptive sampling for computationally estimating the unbinding rates of the DCP product from two enzymes, DhaA31 and DhaAwt, and for aiding the design of improved biocatalysts. The unbinding of DCP is the rate-limiting step in the catalytic conversion of toxic TCP by DhaA31, and improving this rate is of biotechnological importance. Free energy calculations confirmed the different energy profiles for the release of DCP from the two enzymes and provided insights into the energetic bottlenecks of the unbinding process. By analyzing the interactions of DCP with DhaA31 at the critical stages, we identified several hot-spot residues that can be targeted by mutagenesis. Strikingly, some of these hot-spots were also identified by the far less demanding CaverDock tool, which is based on molecular docking. Site-directed mutagenesis or directed evolution applied to those hot-spots may yield new enzyme variants that release the DCP product faster and thus present enhanced catalytic properties.

## AUTHOR CONTRIBUTIONS

SM carried out the computational work and wrote the manuscript. All authors contributed to the design of the study, interpretation of the data, and have given approval of the final version of the manuscript.

## FUNDING

This work was financially supported by ELIXIR-CZ (LM2015047), C4Sys (LM2015055), CERIT (LM2015042), and the Czech Ministry of Education (LQ1605, CZ.02.1.01/0.0/0.0/16\_026/0008451, CZ.02.1.01/0.0/0.0/16\_019/0000868). MetaCentrum VO (LM2015085) and the Swiss National Supercomputing Centre (CSCS) provided the computational resources.

## ACKNOWLEDGMENTS

The authors would like to express their gratitude to Prof. Michele Parrinello (ETH Zurich) for allowing the research visit in his group to learn MTD and perform some of the MTD calculations, to Dr. Ferruccio Palazzesi (ETH Zurich) for his invaluable guidance with the MTD and unbinding kinetics method, to Dr. Stefano Raniolo (ETH Zurich) for his help with setting up the funnel-MTD method, to Dr. Piia Kokkonen (Masaryk University) for her help with the adaptive sampling simulations, and to Dr. Zbynek Prokop (Masaryk University) for the valuable discussions on the transient kinetic studies.

### REFERENCES


### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2018.00650/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Marques, Bednar and Damborsky. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Challenging in silico Description of Carbon Monoxide Oxidation as Catalyzed by Molybdenum-Copper CO Dehydrogenase

#### Anna Rovaletti<sup>1</sup>, Maurizio Bruschi<sup>1</sup>, Giorgio Moro<sup>2</sup>, Ugo Cosentino<sup>1</sup> and Claudio Greco<sup>1</sup>\*

<sup>1</sup> Dipartimento di Scienze dell'Ambiente e della Terra, Università Degli Studi di Milano-Bicocca, Milan, Italy, <sup>2</sup> Dipartimento di Biotecnologie e Bioscienze, Università Degli Studi di Milano-Bicocca, Milan, Italy

Carbon monoxide (CO) is a highly toxic gas to many living organisms. However, some microorganisms are able to use this molecule as their sole source of carbon and energy. Soil bacteria such as the aerobic Oligotropha carboxidovorans are responsible for the annual removal of about 2 × 10<sup>8</sup> tons of CO from the atmosphere. Detoxification through oxidation of CO to CO<sub>2</sub> is enabled by the MoCu-dependent CO dehydrogenase enzyme (MoCu-CODH), which, unlike other enzyme classes with similar function, retains its catalytic activity in the presence of atmospheric O<sub>2</sub>. In the last few years, targeted advances have been reported in the fields of bioengineering and biomimetics, which are instrumental for the future technological exploitation of the catalytic properties of MoCu-CODH and for the reproduction of its reactivity in synthetic complexes. Notably, a growing interest in the quantum chemical investigation of this enzyme has also recently emerged. This mini-review compiles the current knowledge of the MoCu-CODH catalytic cycle, with a specific focus on the outcomes of theoretical studies of this enzyme class. Controversial aspects emerging from different theoretical studies will be highlighted, thus illustrating the challenges that this system poses for the application of density functional theory and hybrid quantum-classical methods.

### Edited by:

Fahmi Himo, Stockholm University, Sweden

### Reviewed by:

Lubomir Rulisek, Institute of Organic Chemistry and Biochemistry (ASCR), Czechia

Per E. M. Siegbahn, Stockholm University, Sweden

> \*Correspondence: Claudio Greco claudio.greco@unimib.it

#### Specialty section:

This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry

Received: 29 October 2018 Accepted: 03 December 2018 Published: 09 January 2019

#### Citation:

Rovaletti A, Bruschi M, Moro G, Cosentino U and Greco C (2019) The Challenging in silico Description of Carbon Monoxide Oxidation as Catalyzed by Molybdenum-Copper CO Dehydrogenase. Front. Chem. 6:630. doi: 10.3389/fchem.2018.00630

Keywords: molybdenum, copper, dehydrogenase, DFT, carbon monoxide

Carbon monoxide (CO) is a lethal gas for many living organisms as well as an indirect greenhouse gas in the atmosphere (Liu et al., 2018). Global CO emissions derive from both anthropogenic and natural sources (Choi et al., 2017). One of the main sinks of atmospheric CO is the soil, where it is consumed in large amounts by microbial oxidation (Liu et al., 2018). One example of these important soil microorganisms is the aerobic bacterium Oligotropha carboxidovorans, which is able to grow using CO as its sole source of carbon and energy (Hille et al., 2015). This metabolism is ascribed to the air-stable Mo/Cu-dependent CO dehydrogenase (MoCu-CODH), which catalyzes the oxidation of CO to CO<sub>2</sub> (Zhang et al., 2010).

This enzyme contains a unique active site composed of a molybdenum/copper bimetallic center (see **Figure 1A**). The molybdenum ion is found in a square-pyramidal geometry with an apical oxo ligand, a dithiolene ligand from the molybdopterin cytosine dinucleotide (MCD) cofactor, an equatorial oxo ligand, and a sulfido ligand. The latter bridges to the copper center, which links the active site to the protein matrix by coordinating the sulfur atom of Cys388. Moreover, Cu is also coordinated to a weakly bound water molecule (Gnida et al., 2003; Rokhsana et al., 2016), and can bind not only CO (i.e., the physiological substrate) but also H<sub>2</sub>. In fact, the MoCu-CODH enzyme can catalyze dihydrogen oxidation, even though such hydrogenase activity is rather low (Santiago and Meyer, 1996; Wilcoxen and Hille, 2013).

The protonation state of the active site has been a matter of debate. In fact, X-ray diffraction (XRD) results were interpreted as indicative of a Mo(=O)OH state for both the oxidized and the reduced forms of the enzyme (Dobbek et al., 2002). In contrast, extended X-ray absorption fine structure (EXAFS) spectroscopy suggested the presence of a Mo<sup>VI</sup>(=O)O unit and a Mo<sup>IV</sup>(=O)OH<sub>2</sub> unit for the oxidized and reduced states, respectively (Gnida et al., 2003). Recent support for the EXAFS-based hypothesis came from electron paramagnetic resonance (EPR) analysis of the Mo<sup>V</sup> analog and from theoretical calculations (Zhang et al., 2010; Rokhsana et al., 2016). The Cu ion maintains the +1 oxidation state throughout the catalytic cycle (Dobbek et al., 2002; Gnida et al., 2003). In fact, CO oxidation occurs directly at the Cu center (Shanmugam et al., 2013), and the two-electron transfer to Mo in each catalytic cycle is enabled by the highly delocalized nature of the Mo(µ-S)Cu unit (Gourlay et al., 2006). Key second-sphere amino acid residues are conserved in MoCu-CODH enzymes and in homologues with different activity, i.e., xanthine oxidases (Hille, 2013). In particular, a conserved glutamate residue (Glu763) is found close to the equatorial oxo ligand of Mo and is thought to act as a base facilitating deprotonation events (Wilcoxen and Hille, 2013; Hille et al., 2015). Moreover, the aromatic ring of a phenylalanine residue (Phe390), located in front of the Cu<sup>I</sup> ion, is thought to be relevant for the correct positioning of the substrate within the active site (Rokhsana et al., 2016).

Carbon monoxide (|C≡O) is expected to show reactivity similar to that of the isoelectronic isocyanide (|C≡N-R) species. Both have a non-bonding electron pair in the sp orbital of the terminal carbon atom and a triple bond between that carbon and a more electronegative atom. Indeed, Dobbek et al. reported the inhibitory activity of n-butylisocyanide, |C≡N-(CH<sub>2</sub>)<sub>3</sub>CH<sub>3</sub>, toward oxidized MoCu-CODH (Dobbek et al., 2002). In the same study, the crystal structure of the inhibited enzyme was determined (PDB ID: 1N62). The resulting inactive complex is characterized by a thiocarbamate geometry in which the isocyanide group forms covalent bonds with the µ-sulfido ligand, the equatorial oxygen of Mo, and the Cu atom, while the alkyl chain of n-butylisocyanide extends into the hydrophobic interior of the substrate channel.

The features of the crystal structure of the n-butylisocyanide-bound state prompted Dobbek and coworkers to advance the first hypothesis for the MoCu-CODH catalytic mechanism (see **Figure 2A**) (Dobbek et al., 2002). This mechanism involves the formation of a thiocarbonate-like intermediate, analogous to the thiocarbamate derivative formed during the aforementioned inhibition, after the CO substrate accesses the oxidized active site. Such a thiocarbonate species would be characterized by the insertion of CO between copper, the µ-sulfido ligand, and the equatorial oxo ligand of the Mo atom. Inspired by this mechanistic hypothesis, several groups subsequently studied the enzymatic mechanism with computational methods, giving rise to a debate that is still ongoing (vide infra).

Recently, it has been reported that rather bulky thiol molecules—e.g., L-cysteine, coenzyme A or glutathione—can reach the bimetallic active site (Kreß et al., 2014). They cause a reversible inactivation of the enzymatic activity, competing with the substrates for the same position at the Cu<sup>I</sup> center.

In the following two sections of this manuscript we aim at reviewing the theoretical studies that have been published on MoCu-CODH using quantum mechanical (QM) and hybrid quantum mechanical/molecular mechanical (QM/MM) approaches. In doing so, we will pay special attention to the controversies and the challenges that have emerged. Moreover, promising future developments in the theoretical description of this system will be proposed in the concluding section.

## 1. SMALL- AND MEDIUM-SIZED QM-CLUSTER MODELS

The first theoretical investigations of the catalytic activity of MoCu-CODH were carried out independently by two groups in 2005 (Hofmann et al., 2005; Siegbahn and Shestakov, 2005). To model the enzymatic active site, Hofmann et al. used a small cluster of 24 atoms representing the two transition metals and their first coordination spheres (see **Figure 1B**) (Hofmann et al., 2005). Siegbahn and Shestakov, in contrast, performed calculations with two different models: a small one of about 20 atoms and a bigger one of about 70 atoms (see **Figures 1C,D**). In the latter, some residues belonging to the second coordination spheres of the metals were explicitly included (Siegbahn and Shestakov, 2005). Both groups used the hybrid density functional B3LYP (Lee et al., 1988; Becke, 1993) to optimize the geometries and compute relative energies of intermediates along the putative catalytic cycles. For geometry optimizations, Hofmann and coworkers adopted the Lanl2DZ (Dunning and Hay, 1976; Hay and Wadt, 1985a,b; Wadt and Hay, 1985) effective core potential basis set with additional d-type functions on S atoms. This was followed by single-point energy calculations using the SDD (Dunning, 1970; Dunning and Hay, 1977; Dolg et al., 1987) basis set augmented by d-type polarization functions on all non-hydrogen, non-metal atoms. Siegbahn and Shestakov used the lacvp and lacv3p\* basis sets, with ECPs for Mo, Cu, and S atoms, for geometry optimizations and energy calculations, respectively. In both studies, the protein matrix surrounding the active site was modeled by a continuum dielectric with ε = 4 (Eckert and Klamt, 2002; Cossi et al., 2003).

A comparison of these two early investigations shows that, even within models of the same size, the results vary markedly with the adopted level of theory. In particular, the calculated energy differences depended significantly on the basis sets used. For intermediates involved in the formation of the new C–O bond, a key step in the catalytic process, the deviations reached 21 kJ/mol. Interestingly, previous studies of oxygen-atom transfer (OAT) reactions involving Mo complexes (Li et al., 2013) and other transition-metal-containing systems (Hu and Chen, 2015; Li et al., 2015) also found a strong basis-set effect on computed energy differences along the reactive paths.

Notwithstanding the shortcomings deriving from the choice of basis sets, both Siegbahn and Hofmann and their respective coworkers found a surprisingly high stability for the thiocarbonate intermediate. The two groups interpreted this deep minimum on the energy landscape of the reaction differently. Siegbahn and Shestakov proposed that the thiocarbonate derivative is an intermediate of the CO-oxidation mechanism. However, in their proposed mechanism the barrier for release of the CO<sub>2</sub> product was estimated to be rather high. This is because release would require the insertion of a water molecule, which was reported not to be a facile step (see **Figure 2B**). In contrast, Hofmann and coworkers raised the possibility that the thiocarbonate adduct lies outside the catalytic cycle, in a deep potential energy well that would effectively slow down enzymatic activity (see **Figure 2C**). These authors further proposed that constraints imposed by the protein matrix could prevent the formation of such a stable off-path adduct. A later theoretical study focused on this topic, however, discarded this hypothesis (Siegbahn, 2011).

In a more recent theoretical study, Stein and Kirk proposed a different mechanism for the oxidation of CO by MoCu-CODH (see **Figure 2D**) (Stein and Kirk, 2014). They used a cluster model analogous to the one previously employed by Hofmann and coworkers (see **Figure 1B**) at the PBE/TZP (Perdew et al., 1996; Ernzerhof and Scuseria, 1999) level of theory. They also included a continuum dielectric with the permittivity of hexane (Klamt and Schüürmann, 1993). Within this model, they proposed that formation of the stable thiocarbonate intermediate could be bypassed by evolving bicarbonate as the final product rather than CO2. Bicarbonate formation would proceed via nucleophilic attack of a copper-activated water molecule on the C atom of the metal-bound CO2. However, this picture is at odds with recent experimental studies, which appear to exclude the formation of a bicarbonate complex during catalysis (Dingwall et al., 2016).
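The two continuum settings mentioned so far screen charges quite differently: ε = 4 for the protein and the permittivity of hexane. As a rough illustration, the sketch below evaluates the COSMO dielectric scaling factor f(ε) = (ε − 1)/(ε + x) (Klamt and Schüürmann, 1993). This factor multiplies the conductor-limit screening energy. The sketch assumes x = 0.5, as in the original formulation, and ε ≈ 1.88 for hexane. Both values should be checked against the software actually used. The function is only the scaling factor, not a solvation-energy calculation.

```python
def cosmo_scaling(eps: float, x: float = 0.5) -> float:
    """COSMO dielectric scaling factor f(eps) = (eps - 1) / (eps + x).

    eps: relative permittivity of the continuum (must be >= 1).
    x:   empirical offset; 0.5 is assumed here as in the original COSMO paper,
         but individual programs may use a different value (e.g. 0).
    """
    if eps < 1.0:
        raise ValueError("relative permittivity must be >= 1")
    return (eps - 1.0) / (eps + x)


# Illustrative permittivities; the hexane value (~1.88) is an assumed literature value.
for label, eps in [("vacuum", 1.0), ("hexane", 1.88), ("protein-like", 4.0), ("water", 78.4)]:
    print(f"{label:>13}: eps = {eps:5.2f}, f(eps) = {cosmo_scaling(eps):.3f}")
```

For instance, f(4) = 2/3 while f(1.88) ≈ 0.37. In this simple picture, a hexane-like continuum therefore recovers only a little over half of the screening of the ε = 4 model.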

Breglia and coworkers published the most recent theoretical study of MoCu-CODH, in which only the first coordination spheres were included in a QM model (see **Figure 1C**) (Breglia et al., 2017). This study mainly concerns the hydrogenase activity of the enzyme. It includes a comparative analysis of how the physiological substrate, CO, and dihydrogen bind to the Cu ion. Geometry optimizations and energy calculations were carried out in vacuo at the BP86/def2-TZVP level (Perdew, 1986; Becke, 1988; Weigend and Ahlrichs, 2005). This follows previous studies on H2- and CO-binding enzyme models, in which a pure functional was used in conjunction with triple-zeta basis sets (Greco et al., 2015; Rovaletti and Greco, 2018). For CO binding, the computed ΔE was as negative as −64 kJ/mol. Siegbahn, Hofmann and their respective coworkers had previously computed the same reaction at different levels of theory. Their results differ significantly from this value (ΔΔE = 50 and 31 kJ/mol, respectively). Such large differences come as little surprise, given the well-known shortcomings of quantum chemical models in calculating binding energies between coordination complexes and their ligands (Husch et al., 2018).
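For clarity, the quantities compared above can be written as follows. Here E denotes the total electronic energy of each optimized species, and i, j label two computational protocols. The notation is ours. We also assume that the reported ΔΔE values are absolute differences between studies, since their sign is not stated.

```latex
\Delta E_{\mathrm{bind}} = E(\text{Cu--CO adduct}) - E(\text{Cu site}) - E(\mathrm{CO}),
\qquad
\Delta\Delta E_{ij} = \bigl|\Delta E_{\mathrm{bind}}^{(i)} - \Delta E_{\mathrm{bind}}^{(j)}\bigr|
```

A more negative ΔE_bind means stronger CO binding. Because the three terms are computed separately, any error that does not cancel between the adduct and the isolated fragments enters ΔE_bind directly. Examples include basis-set incompleteness, the choice of functional and missing dispersion.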

## 2. LARGE QM-CLUSTER MODELS AND HYBRID MODELS

Rokhsana and coworkers recently demonstrated the importance of enlarging the bimetallic active-site model and systematically accounting for the effects of second-sphere residues on the energetics (Rokhsana et al., 2016). A large cluster model of around 180 atoms (see **Figure 1E**) turned out to be required for a fully satisfactory reproduction of experimentally determined structural parameters. The same held true for evaluating the most plausible protonation states of the Mo/Cu core. This evaluation took into account key geometric features of the enzyme crystal structure and redox potential measurements available in the literature. In particular, the authors were able to assess the protonation state of the equatorial oxo ligand of Mo at the different redox states attained during catalysis. As for the level of theory, Rokhsana et al. employed the def2-TZVP basis set for all elements except H and C, for which the def2-SVP basis set was used (Weigend and Ahlrichs, 2005). The protein environment was modeled by a continuum dielectric with ε = 4 (Klamt and Schüürmann, 1993). The BP86 and B3LYP density functionals were used for geometry optimizations and energy evaluations, respectively.

In a subsequent study, Xu and Hirao explicitly considered the whole protein environment by means of a hybrid QM/MM approach (Xu and Hirao, 2018). The active site was described by a QM region of 89 atoms (see **Figure 1F**) using the B3LYP functional. During geometry optimization, the SDD effective core potential basis set was used for the transition metal ions, whereas the 6-31G\* basis set was adopted for all other atoms (Dolg et al., 1987; Andrae et al., 1990). Single-point energy calculations were carried out with the larger def2-TZVP basis set. For the molecular mechanics part, the AMBER03 force field was employed (Duan et al., 2003). Grimme's D3 dispersion correction with Becke–Johnson damping [D3(BJ)] was also included in the calculations (Grimme et al., 2011). At this level of theory, the authors re-investigated the previously proposed catalytic mechanisms involving the thiocarbonate species. According to Xu and Hirao, the S–C-bound adduct would be formed along the reaction pathway as previously suggested (Dobbek et al., 2002; Siegbahn and Shestakov, 2005). However, the new QM/MM results indicate that the thiocarbonate species would not be as stable as previously proposed. Notably, the thiocarbonate intermediate was not found to be directly linked to the transition state for CO<sub>2</sub> release. Instead, the authors proposed that after thiocarbonate formation the reaction must follow a reverse process to proceed productively toward CO<sub>2</sub> evolution. The overall barrier for the proposed catalytic mechanism was found to be low (on the order of 50 kJ/mol).
In the same study, Xu and Hirao also carried out purely QM calculations on a cluster of the same size as the QM region of their hybrid model and compared the results with those from QM/MM modeling. This comparison showed that the protein environment does not modulate the kinetic barrier of the investigated catalytic mechanism. It does, however, play an important role in stabilizing the state after CO2 release. Finally, including dispersion corrections was reported to lower the activation barrier of the product-releasing step by 15 kJ/mol, in line with expectations for a bimolecular reaction step.
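The form of the pairwise term helps explain why adding D3(BJ) tends to stabilize a compact, bimolecular transition state relative to separated fragments. The sketch below evaluates the two-body D3(BJ) energy for a single atom pair:

E = −s6·C6/(r⁶ + f⁶) − s8·C8/(r⁸ + f⁸), with f = a1·R0 + a2 and R0 = √(C8/C6).

Both terms are negative, so the correction grows in magnitude as atoms approach. The B3LYP damping parameters are quoted from memory and should be verified against the official DFT-D3 tables. The C6 and C8 inputs are hypothetical, since real D3 derives them from coordination-number-dependent reference data. Three-body terms are omitted.

```python
import math

# B3LYP-D3(BJ) damping parameters as recalled from Grimme et al. (2011);
# verify against the official DFT-D3 parameter tables before real use.
S6, S8, A1, A2_BOHR = 1.0, 1.9889, 0.3981, 4.4211
HARTREE_TO_KJ_MOL = 2625.4996


def d3bj_pair_energy(r_bohr: float, c6: float, c8: float,
                     s6: float = S6, s8: float = S8,
                     a1: float = A1, a2: float = A2_BOHR) -> float:
    """Two-body D3(BJ) dispersion energy (Hartree) for one atom pair.

    r_bohr: interatomic distance in bohr.
    c6, c8: dispersion coefficients in atomic units (hypothetical inputs here).
    """
    r0 = math.sqrt(c8 / c6)           # pair-specific cutoff radius
    f = a1 * r0 + a2                  # Becke-Johnson damping length (bohr)
    e6 = -s6 * c6 / (r_bohr**6 + f**6)
    e8 = -s8 * c8 / (r_bohr**8 + f**8)
    return e6 + e8                    # finite even at r -> 0 thanks to the damping


# Hypothetical C6/C8 values for a single heavy-atom pair.
for r in (4.0, 6.0, 8.0, 12.0):
    e = d3bj_pair_energy(r, c6=40.0, c8=1200.0) * HARTREE_TO_KJ_MOL
    print(f"r = {r:5.1f} bohr -> E_disp = {e:8.3f} kJ/mol")
```

Summed over all pairs, these attractive contributions are larger for the more compact geometry. This is consistent with the lowered barrier reported for the bimolecular product-releasing step.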

## CONCLUDING REMARKS

Over the last fifteen years, theoretical investigations of the CO oxidation mechanism of MoCu-CODH have given rise to a debate centered on the possible occurrence and role of a thiocarbonate catalytic intermediate. Having reported key details of the computational studies published to date, we are now in a position to give a more general outlook on the state of the art regarding MoCu-CODH.

The early studies by Siegbahn, Hofmann, and their respective coworkers found that the thiocarbonate intermediate would occupy a deep well in the energy profiles of the investigated reaction mechanisms. However, the kinetic barriers they computed for CO<sub>2</sub> evolution were at least 30 kJ/mol higher than the more recently determined experimental value (Zhang et al., 2010). In part, this discrepancy reflects the neglect of dispersion corrections. Their inclusion became standard practice only after the publication of the previously mentioned study (Siegbahn, 2011).
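The Eyring equation of transition-state theory puts the 30 kJ/mol discrepancy in perspective. The estimate below assumes that both the computed and experimental barriers are activation free energies at 298.15 K and that the prefactor k_BT/h cancels in the ratio. It is an order-of-magnitude illustration, not a reanalysis of the cited data.

```latex
k = \frac{k_{\mathrm{B}} T}{h}\,\exp\!\left(-\frac{\Delta G^{\ddagger}}{RT}\right)
\;\Longrightarrow\;
\frac{k(\Delta G^{\ddagger}_{\mathrm{exp}})}{k(\Delta G^{\ddagger}_{\mathrm{calc}})}
= \exp\!\left(\frac{\Delta G^{\ddagger}_{\mathrm{calc}} - \Delta G^{\ddagger}_{\mathrm{exp}}}{RT}\right)
\geq \exp\!\left(\frac{30\,000\ \mathrm{J\,mol^{-1}}}{(8.314\ \mathrm{J\,mol^{-1}\,K^{-1}})(298.15\ \mathrm{K})}\right)
\approx e^{12.1} \approx 1.8 \times 10^{5}
```

A barrier overestimated by at least 30 kJ/mol therefore corresponds to predicted turnover roughly five orders of magnitude slower than observed.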

Xu and Hirao obtained results more compatible with the experimental kinetic barrier of around 50 kJ/mol (Xu and Hirao, 2018). They used a larger QM-cluster model that explicitly includes most of the second-sphere coordination environment, together with large basis sets and dispersion corrections. Notably, according to their results, the thiocarbonate species still appears to behave as a thermodynamic sink. Although not a very deep one, such a sink would effectively hamper progress along the proposed path toward products, a rather unusual role for a species formed during an enzymatic process.

All the catalytic mechanisms proposed in the theoretical studies reviewed here assume that the Mo-bound equatorial oxo ligand performs the nucleophilic attack on the activated CO substrate bound to the Cu ion. This is in line with what has been suggested for the catalytic mechanism of the homologous xanthine oxidase family enzymes. However, a variant of this picture has recently been proposed, in which an activated water molecule would act as the nucleophile (Hille et al., 2015). To the best of our knowledge, this alternative mechanism has not yet been investigated by QM or hybrid QM/MM studies.

Future theoretical studies of this (and other) putative catalytic mechanisms will likely face a further challenge: computed energy differences fluctuate rather strongly with the adopted level of theory. For density functional theory, extensive benchmarking could in principle improve the theoretical predictions. The available experimental data on enzyme inhibition, by thiols in particular, could represent a useful benchmark dataset. Reproducing binding energies for bulky thiols, however, might require extensive sampling of the protein matrix phase space in QM/MM studies. High-level ab initio methods are a viable, though still challenging, alternative for providing reliable results. Thanks to recent methodological developments, multiconfigurational post-Hartree–Fock treatments of relatively large bimetallic systems have been shown to be computationally affordable (Phung et al., 2016, 2018; Dong et al., 2017).
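A benchmarking exercise of the kind suggested above has two steps. First, experimental inhibition constants are converted into binding free energies. Second, error statistics are computed for each candidate method. The sketch below assumes that an inhibition constant Ki can be treated as a dissociation constant for a simple competitive equilibrium, so that ΔG_bind = RT ln(Ki/1 M). The function names and the choice to key data by inhibitor name are illustrative only, not taken from any cited study.

```python
import math
from statistics import mean

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def binding_free_energy_from_ki(ki_molar: float, temperature: float = 298.15) -> float:
    """Binding free energy in kJ/mol from an inhibition constant (1 M standard state).

    Assumes Ki ~ Kd for a simple competitive binding equilibrium.
    """
    return R_KJ * temperature * math.log(ki_molar)


def error_statistics(computed: dict, reference: dict) -> dict:
    """Signed and absolute errors of computed vs. reference energies.

    Both dicts map a system label (e.g. a hypothetical inhibitor name) to an
    energy in the same units; only labels present in both are compared.
    """
    shared = sorted(set(computed) & set(reference))
    if not shared:
        raise ValueError("no common systems to compare")
    errors = [computed[k] - reference[k] for k in shared]
    return {
        "n": len(errors),
        "mse": mean(errors),                     # mean signed error: systematic bias
        "mae": mean(abs(e) for e in errors),     # mean absolute error
        "max_abs": max(abs(e) for e in errors),  # worst single case
    }
```

With one dictionary of computed binding energies per functional, ranking the functionals by `error_statistics(...)["mae"]` gives a first screen. The mean signed error additionally reveals systematic over- or underbinding.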

MoCu-CODH is of growing interest as an inspiring system for future biomimetic and bioengineering applications. This is due not only to the relevance of the reactions it catalyzes, but also to its resistance to atmospheric O<sub>2</sub> exposure. This is a rare feature among enzymes with carbon monoxide dehydrogenase and hydrogenase activities (Choi et al., 2017; Gourlay et al., 2018; Groysman et al., 2018). A functional heterologous expression system for MoCu-CODH has recently been established (Kaufmann et al., 2018). Together with developments in computational chemistry, this will hopefully strengthen the positive feedback among biochemical, biomimetic and quantum chemical studies, opening new perspectives for a deeper understanding of this interesting metalloenzyme.

### REFERENCES


### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Rovaletti, Bruschi, Moro, Cosentino and Greco. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Computational Protocol to Understand P450 Mechanisms and Design of Efficient and Selective Biocatalysts

Kersti Caddell Haatveit, Marc Garcia-Borràs and Kendall N. Houk\*

Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA, United States

Cytochrome P450 enzymes have gained significant interest as selective oxidants in late-stage chemical synthesis. Their broad substrate scope makes them good candidates for non-natural reactivity. Directed evolution produces new enzyme biocatalysts that promote alternative reactivity for chemical synthesis. While directed evolution has proven useful in developing biocatalysts for specific purposes, the process is very time- and labor-intensive and therefore not easily repurposed. Computational analysis of P450 enzymes provides great insight into their broad substrate scope, the variety of reactions they catalyze, their binding specificity and novel biosynthetic reaction mechanisms. By discovering new P450s and studying their reactivities, we gain new insights into how this reactivity can be harnessed. We discuss a standard protocol that uses both DFT calculations and MD simulations to study a variety of cytochrome P450 enzymes. The approach uses theozyme models to study mechanisms and transition states via DFT calculations. It then uses MD simulations to understand the conformational poses and binding mechanisms within the enzyme. We discuss a few examples from collaborations with the Tang and Sherman/Montgomery groups. These examples elucidate enzyme mechanisms and rationally design new enzyme mutants as tools for selective C–H functionalization methods.

Keywords: density functional theory (DFT), theozymes, MD simulations, biocatalysis, cytochrome P450s, oxidations

### INTRODUCTION

Cytochrome P450s are a highly conserved class of enzymes that contain an Fe-heme cofactor with an axial cysteine ligand and can perform different oxidation reactions on a large variety of native substrates (Ortiz de Montellano, 2005). P450 mechanisms have been extensively studied (Shaik et al., 2010). They require the oxidative generation of the reactive cofactor species, an iron(IV)-oxo (Fe<sup>IV</sup>=O) porphyrin radical cation. With this reactive species, hydroxylation proceeds via a hydrogen abstraction step followed by a radical rebound step between the cofactor and the substrate (Meunier et al., 2004). P450s have a broad substrate scope (Durairaj et al., 2016), and interest in selective methods for late-stage C–H oxidations is growing. For these reasons, P450s have been used extensively as biocatalysts for many applications (Kumar, 2010), particularly reactions that are difficult to perform with chemocatalysts. Cytochrome P450s are an excellent starting point for developing biocatalysts. Many great successes (Loskot et al., 2017) in modifying and engineering P450s have come from experimental directed evolution (non-rational) approaches (Romero and Arnold, 2009), pioneered by Nobel Laureate Frances Arnold. However, these approaches require many rounds of screening to arrive at a biocatalyst tailored for a specific purpose. This process can be sped up by developing a better understanding of how P450s function, particularly by using computational methods. Rational approaches that use computational methods to study these enzymes can give quicker access to an understanding of their reactivity and to predictions of better variants.

#### Edited by:

Fahmi Himo, Stockholm University, Sweden

#### Reviewed by:

Margareta Blomberg, Stockholm University, Sweden; Vicent Moliner, Universitat Jaume I, Spain; Iñaki Tuñón, University of Valencia, Spain

> \*Correspondence: Kendall N. Houk houk@chem.ucla.edu

#### Specialty section:

This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry

Received: 16 November 2018 Accepted: 20 December 2018 Published: 11 January 2019

#### Citation:

Caddell Haatveit K, Garcia-Borràs M and Houk KN (2019) Computational Protocol to Understand P450 Mechanisms and Design of Efficient and Selective Biocatalysts. Front. Chem. 6:663. doi: 10.3389/fchem.2018.00663

Enzymes are complex systems whose study can require elaborate computational methods. Nevertheless, the Houk group has developed a simpler standard approach to understanding the mechanisms of complex (metallo)enzymes, particularly P450s, with great success. We use quantum mechanics (QM) calculations to study the transition states and mechanisms of theozyme ("theoretical enzyme"; Tantillo et al., 1998) models (**Figure 1A**). A theozyme is a truncated portion of the enzyme that includes catalytically relevant active-site residues and cofactors along with the substrate. We investigate the intrinsic mechanistic preferences of such transformations to understand which type of catalysis by the enzyme is most likely. Others have used more rigorous approaches to study enzyme catalysis. These include the cluster model (CM) approach used by Siegbahn, Thiel, and Himo (Siegbahn and Crabtree, 1997; Siegbahn and Himo, 2009, 2011; Blomberg et al., 2014) and QM/MM methods (Warshel and Levitt, 1976; Senn and Thiel, 2009; Karasulu and Thiel, 2015; Aranda et al., 2016). Initially, CMs were quite comparable to theozymes. With the advance of computational power, however, CMs have become more complex and now include systems of 200 atoms, with constraints on amino acid residues to reflect the shape of the active site more accurately. QM/MM has been used to study the effects of mutations on substrate binding. While these techniques are often more accurate and include the entire enzyme system, our approach is a more rapid way to study various enzyme systems (particularly P450s) and to understand their intrinsic preferences.

We combine DFT calculations with molecular dynamics (MD) simulations to explore the selectivities of enzymes and mutants. MD simulations allow us to study substrate binding poses and positioning relative to the Fe=O active species. We compare these MD geometries with the ideal transition-state geometries determined by DFT to establish how the enzyme controls selectivity. This method has allowed us to propose mutations that improve the catalytic activity of enzymes. This review focuses on a few recent examples from the Houk group. The first uses QM theozymes to assess the feasibility of various mechanistic pathways within an enzyme. The others combine QM and MD simulations to assess how the enzyme controls reactivity and selectivity in a variety of P450 enzymes. Ultimately, we describe predictions of more efficient and selective enzyme mutants.

### USE OF QUANTUM MECHANICS (QM) THEOZYME MODELS TO UNDERSTAND NOVEL ENZYMATIC MECHANISMS

The Houk group has used quantum mechanical DFT calculations to elucidate the mechanisms of novel P450 enzymes discovered in biosynthetic pathways. These enzymes show promiscuous yet selective reactivity, or new oxidation reactivity, with great potential for biocatalysis. Modeling the reaction with a theozyme that includes the substrate and a truncated Fe-oxo heme active species reveals the intrinsically preferred mechanism. Here, we discuss one example: the mechanism of a novel C–O bond formation performed by a P450 enzyme, GsfF, to generate the natural product griseofulvin.

The Tang group discovered GsfF (Chooi et al., 2010; Cacho et al., 2013) in the gsf gene cluster responsible for the biosynthesis of the natural product griseofulvin. GsfF catalyzes an oxidative cyclization that generates the spirocyclic core of the final product. Initially, two mechanisms were proposed: a double O–H abstraction (Pathway A) and an epoxidation (Pathway C) (**Figure 1B**). Because it was difficult to establish a plausible mechanism experimentally, the candidate mechanisms were studied computationally (Grandner et al., 2016). That work also proposed an alternative mechanism, direct radical attack to form the C–O bond (Pathway B), which avoids diradical formation (**Figure 1B**). These mechanisms were studied with DFT [(U)B3LYP-D3(BJ)/LanL2DZ(Fe)/6-311+G(d,p)//(U)B3LYP/LanL2DZ(Fe)/6-31G(d)]. The theozyme consisted of a truncated Fe porphyrin model and the substrate (**Figure 1A**).

Pathway C was quickly discarded because of its high relative barrier (18.7 kcal/mol). By comparison, the double O–H abstraction (Pathway A) and direct radical attack (Pathway B) mechanisms have relative barriers of 0.5 and 0.0 kcal/mol, respectively. Since the difference between Pathways A and B is small (0.5 kcal/mol), further analysis was needed to identify the most plausible mechanism. Pathway A requires a diradical intermediate, which is highly reactive. Because no crystal structure of GsfF was available, homology modeling was used to evaluate possible binding positions of the substrate in the active site. Pathway A requires the substrate to reorient its binding pose within the enzyme between mechanistic steps. This is unlikely to be feasible without deleterious side reactions of the reactive diradical intermediate. Consequently, Pathway B, which does not require substrate reorientation, was proposed to be the most plausible pathway.
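The relative barriers can be turned into approximate branching fractions. This shows why the 0.5 kcal/mol gap between Pathways A and B could not settle the question on energetic grounds alone. The sketch assumes that all three pathways start from the same reactant state and share the same transition-state-theory prefactor. It also assumes that the reported relative barriers can be treated as free energies at 298.15 K. These are simplifications for illustration, not part of the cited analysis.

```python
import math

KCAL_R = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def pathway_fractions(rel_barriers_kcal: dict[str, float],
                      temperature: float = 298.15) -> dict[str, float]:
    """Fraction of reactive flux through competing pathways.

    Assumes a common reactant state and a common prefactor, so each rate is
    proportional to exp(-ddG / RT), with ddG the barrier relative to the lowest one.
    """
    rt = KCAL_R * temperature
    weights = {name: math.exp(-ddg / rt) for name, ddg in rel_barriers_kcal.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


fractions = pathway_fractions({"A (double O-H abstraction)": 0.5,
                               "B (direct radical attack)": 0.0,
                               "C (epoxidation)": 18.7})
for name, frac in fractions.items():
    print(f"{name:28s} {frac:.3e}")
```

Under these assumptions, Pathway B would carry roughly 70% of the flux and Pathway A about 30%. Pathway C contributes a fraction on the order of 10⁻¹⁴. A 70:30 split is likely within the typical uncertainty of DFT barriers, which is why structural arguments about substrate reorientation were needed to distinguish A from B.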

### COMBINATION OF THEOZYMES AND MOLECULAR DYNAMICS SIMULATIONS AS A TOOL TO UNDERSTAND P450S WITHIN BIOSYNTHETIC PATHWAYS

While QM theozyme models provide significant information on the mechanisms of novel enzymatic catalysis by P450s, we usually combine QM and MD to learn more about enzyme reactivity. DFT calculations on model theozymes reveal the intrinsic preferences for oxidation sites in P450-catalyzed reactions. MD simulations, in turn, show the various conformations and binding poses that the substrate can explore within the active site. We describe several collaborations with the Sherman and Montgomery groups that address three questions: the intrinsic reactivity of various P450 oxidations, how the enzyme controls binding orientations to overcome that innate selectivity, and how new mutations can enhance selectivity and activity.

### Prediction of Mutations to Improve WT Catalytic Activity in a P450 Monooxygenase

The Sherman group discovered the cytochrome P450 monooxygenase PikC, a promiscuous enzyme that hydroxylates the 12- and 14-membered macrolides YC-17 and narbomycin (Xue et al., 1998). Co-crystal structures with both YC-17 and narbomycin revealed the origin of this selective yet promiscuous scope: a salt-bridge interaction between the desosamine sugar anchor on the macrolides and residue E94. This interaction enables different substrates to react similarly as long as they retain the desosamine sugar (Sherman et al., 2006). This discovery led us to use PikC as the starting point for engineering a P450 biocatalyst that performs selective hydroxylations on a broad set of unnatural cyclic substrates. The co-crystal structure provided great insight into one critical aspect of the binding mechanism. A dynamic view of PikC obtained with both MD and QM then enabled the rapid design of substrates and improved mutants (Narayan et al., 2015).

Menthol was chosen as a model non-native substrate because of its variety of C–H bonds and the wealth of information on previous C–H functionalization attempts. MD simulations were performed on several menthol-based substrates bearing synthetic anchors of various lengths (**Figure 1C**). These simulations showed that when the linker is too short (**3a**), the salt-bridge interactions with E94/E85 are lost and a new interaction with E246 develops instead. This interaction orients the substrate unproductively, so that unreactive C–H bonds are aimed at the iron-oxo reactive site (**Figure 2A**). Simulations also showed that the carboxylate group of D176 can interact unproductively with the substrate, pulling it away from the Fe=O active species. Loss of the E94/E85 interactions was also found to destabilize the tertiary structure, forcing PikC to adopt its open conformation, which is the preferred conformation in the apo state. When the linkers are longer (**3e**) and contain more rigid groups such as phenyls (**3f**), the picture changes. The salt-bridge interactions with E94/E85 are retained, a new stabilizing interaction with E48 forms, and the harmful E246 interaction is avoided. Experimentally, a PikC D50N mutant showed improved reactivity, and the MD simulations supported this. Furthermore, PikC was predicted to be more selective for (−)-menthol than for (+)-menthol. The latter substrate either left the active site after 100 ns of MD simulation or oscillated in and out of the pocket, depending on the synthetic anchor group. Experimental results corroborated these predictions.

From these findings, we predicted two key mutation sites, E246 and D176, that would improve the efficiency of the enzyme. In mutagenesis experiments, the triple mutant PikC D50N/D176Q/E246A showed an increased total turnover number (TTN) for all the unnatural substrates tested. MD simulations of the triple mutant showed a higher frequency of closed conformations and more persistent favorable salt-bridge interactions (**Figure 2B**) than those of wild-type PikC or the single mutant PikC D50N.
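Statements such as "persistent salt-bridge interactions" are usually quantified as an occupancy: the fraction of MD frames in which a donor–acceptor distance stays below a cutoff. The sketch below assumes that per-frame coordinates (in Å) have already been extracted from the trajectory for the glutamate carboxylate oxygens and the desosamine dimethylamino nitrogen. It uses a 4.0 Å cutoff, a common but arbitrary criterion. It is a hypothetical helper, not the analysis used in the cited work.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def salt_bridge_occupancy(frames_o: Sequence[Sequence[Point]],
                          frames_n: Sequence[Sequence[Point]],
                          cutoff: float = 4.0) -> float:
    """Fraction of frames in which any acid O lies within `cutoff` Å of any base N.

    frames_o[i] / frames_n[i]: carboxylate-oxygen / amine-nitrogen coordinates
    in frame i (hypothetical pre-extracted data).
    """
    if len(frames_o) != len(frames_n) or not frames_o:
        raise ValueError("need the same, non-zero number of frames for both groups")
    formed = 0
    for oxygens, nitrogens in zip(frames_o, frames_n):
        if any(math.dist(o, n) <= cutoff for o in oxygens for n in nitrogens):
            formed += 1
    return formed / len(frames_o)
```

Computing the occupancy of the E94 salt bridge for wild-type and triple-mutant trajectories would give one number per system. These numbers could be set next to the open/closed conformational statistics.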

While MD allowed us to predict essential mutations to improve catalysis in PikC, QM theozyme calculations were used to build a model for predicting the site selectivity of hydroxylation of these unnatural substrates. C–H abstraction transition states were computed for all possible C–H bonds of menthol, and the C4 position was identified as the most reactive. The stereoselectivity at C4 was also analyzed. Only a minimal intrinsic preference was found: the axial hydrogen abstraction barrier is lower by 0.7 kcal/mol. The chiral environment of the enzyme biases the reaction toward one stereoisomer, overriding this intrinsic reactivity. Key geometric parameters (the H–O distance and the O–H–C angle) from snapshots of 0.5 µs MD simulations were compared with the ideal geometry from DFT calculations (**Figures 2C,D**). This analysis indicated that the enzyme presents the equatorial hydrogen to the iron-oxo species in the appropriate geometry more often than the axial hydrogen. Experimentally, hydroxylations of these substrates gave results consistent with the computational predictions.
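The comparison between MD snapshots and the DFT transition state reduces to two internal coordinates per hydrogen: the O<sub>heme</sub>···H distance and the O<sub>heme</sub>···H–C angle. The sketch below computes both from Cartesian coordinates. It then counts the snapshots that fall within a tolerance window around the DFT values. The tolerances (0.5 Å and 20°) and the binary in/out criterion are hypothetical choices for illustration. The cited studies report the full scatter of deviations (**Figures 2C,D**) rather than a single cutoff.

```python
import math

Point = tuple[float, float, float]


def oh_distance_and_ohc_angle(o_fe: Point, h: Point, c: Point) -> tuple[float, float]:
    """Return the O...H distance (Å) and the O...H-C angle (degrees) for one snapshot."""
    d_oh = math.dist(o_fe, h)
    v_ho = [a - b for a, b in zip(o_fe, h)]  # vector H -> O
    v_hc = [a - b for a, b in zip(c, h)]     # vector H -> C
    cos_t = sum(x * y for x, y in zip(v_ho, v_hc)) / (math.hypot(*v_ho) * math.hypot(*v_hc))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding noise
    return d_oh, angle


def near_ts_fraction(snapshots, ts_distance: float, ts_angle: float,
                     d_tol: float = 0.5, a_tol: float = 20.0) -> float:
    """Fraction of (O, H, C) snapshots within hypothetical tolerances of the DFT TS geometry."""
    if not snapshots:
        return 0.0
    hits = 0
    for o_fe, h, c in snapshots:
        d, a = oh_distance_and_ohc_angle(o_fe, h, c)
        if abs(d - ts_distance) <= d_tol and abs(a - ts_angle) <= a_tol:
            hits += 1
    return hits / len(snapshots)
```

`near_ts_fraction` could be run separately on the equatorial and axial hydrogens of C4, giving one number per hydrogen. A larger fraction for H4eq would mirror the qualitative conclusion drawn above.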

### Model for Origin of Selectivity for Salt Bridge Anchor Bound Substrate in a P450 Mutant

With the PikC triple mutant, we explored the scope of the biocatalyst for oxidizing more complex substrates. Anchor groups were simplified to gain greater control over selectivity, and we probed the origins of the selectivity imposed by the various anchors.

Our computational approach combined QM theozymes and MD simulations to predict the selectivities for substrate **5** modified with two linkers, **a** and **b** (**Figure 2E**). DFT calculations on the truncated system were used to analyze the intrinsic preference for oxidation of the various C–H bonds within the macrocycle (Gilbert et al., 2017). These calculations used a truncated model of macrolactone **5** in which the linker was simplified to a methyl group. In the two lowest-energy conformers of this model, positions C3 and C10 had the lowest C–H abstraction barriers (**Figure 2F**). These results are consistent with the two regioisomers observed experimentally. The stereochemistry at each of these positions was analyzed by comparing the C–H abstraction barriers for the axial and equatorial hydrogens at both C3 and C10. For both positions, equatorial hydrogen abstraction had the lower barrier. For C3, this is due to better conjugation with the exocyclic olefin; for C10, less distortion is required to achieve conjugation (**Figure 2G**).

Substrate **5** bearing either linker **a** or linker **b** was then docked into the crystal structure of the PikC mutant and studied with MD simulations. MD snapshots with linker **a** showed that both the C3 and C2 positions remained close to the reactive iron-oxo cofactor. Despite the favorable proximity of both positions, C3–H reacts more rapidly than C2–H because its C–H abstraction barrier is 5 kcal/mol lower, as shown by DFT calculations. The stereoselectivity at C3 was analyzed by comparing the abstraction geometries of the axial and equatorial hydrogens during MD trajectories with that of the ideal DFT transition state (**Figure 2H**). These results show that the PikC mutant active site bends the substrate so that the equatorial hydrogen has the appropriate orientation for C–H abstraction. The same procedure was applied to the substrate with linker **b**, which places C10 close to the iron-oxo species. For C10, the axial hydrogen adopts a geometry better suited to C–H abstraction (**Figure 2H**). However, the barrier for the equatorial C–H is much lower than that for the axial C–H. This process was repeated for many substrates, and the experimental results were consistent with the predicted selectivities.

FIGURE 2 | (A) The detrimental binding interactions of the menthol-based substrate with E246 in the WT PikC enzyme. (B) The beneficial binding interactions of the menthol-based substrate with E48, E85, and E94 in the triple mutant PikC D50N/D176Q/E246A. (C) Orientation of H4eq and H4ax relative to the reactive iron-oxo species from 0.5 µs MD trajectories. Each point represents a simulation snapshot. The x-axis shows deviations of the Oheme−H distances and the y-axis deviations of the Oheme−H–C angles from the DFT optimized transition state (in blue). (D) Representative snapshot from MD simulations showing the closer proximity of the equatorial hydrogen to the active Fe-oxo species. (E) The regio- and stereoselectivity of macrolactone substrate 5 with different linkers a and b in the PikC triple mutant. (F) The lowest-energy conformer of truncated 5 with C–H abstraction barriers for both axial and equatorial hydrogens at C3 and C10. (G) The geometry of the C–H abstraction transition state for the lowest-energy hydrogen (equatorial) for C3 (left) and C10 (right). (H) Orientation of axial (red) and equatorial (blue) hydrogens on C3 in 5a (left) and C10 in 5b (right) relative to the reactive iron-oxo species from 500 ns MD trajectories. Each point represents a simulation snapshot. The x-axis shows the Oheme−H distances and the y-axis the Oheme−H–C angles, compared with those of the DFT optimized transition state (in black). Adapted with permission from Narayan et al. (2015) and Gilbert et al. (2017).

## CONCLUSION

We have described several cases in which we combined QM theozyme models and MD simulations to understand enzymatic reactivities and selectivities. This computational protocol provides insight into the enzymatic mechanisms and substrate conformational space of various cytochrome P450s. We used this knowledge to predict mutations that yield better biocatalyst variants and to predict selectivities on natural and non-natural substrates. Although we showed some successful applications of this protocol, one of its limitations is that it cannot provide accurate energy barriers that can be directly compared with experimental values. For such purposes, more computationally expensive methods and protocols are required. For instance, multiple QM/MM calculations can be performed on different conformations sampled by the enzymatic system during MD simulations to obtain more accurate free energies. This better accounts for the catalytically relevant conformations explored by the enzyme in solution. Although the accuracy of these methods is well established, the large amount of computational time they require dramatically limits the speed and generality of such protocols.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

KNH acknowledges financial support from the National Institutes of Health, National Institute of General Medical Sciences GM-124480. KCH is supported by the NIH predoctoral training grant GM067555. MG-B thanks the Ramón Areces Foundation for a Postdoctoral Fellowship.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Caddell Haatveit, Garcia-Borràs and Houk. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

### Theoretical Studies on Mechanism of Inactivation of Kanamycin A by 4′-O-Nucleotidyltransferase

#### Sergio Martí <sup>1</sup>, Agatha Bastida <sup>2</sup> and Katarzyna Świderek <sup>1</sup>\*

<sup>1</sup> Departament de Química Física i Analítica, Universitat Jaume I, Castelló de La Plana, Spain, <sup>2</sup> Departamento de Química Bio-orgánica, Instituto de Química Orgánica General (CSIC), Madrid, Spain

This work focuses on mechanistic studies of the transfer of an adenylyl group (adenosine 5′-monophosphate) from adenosine 5′-triphosphate (ATP) to the 4′-OH hydroxyl group of an antibiotic. Using hybrid quantum mechanics/molecular mechanics (QM/MM) techniques, we study the substrate- and base-assisted mechanisms of the inactivation of kanamycin A (KAN) catalyzed by 4′-O-nucleotidyltransferase [ANT(4′)], an enzyme active against almost all aminoglycoside antibiotics. Free energy surfaces, obtained with free energy perturbation methods at the M06-2X/MM level of theory, show that the most favorable reaction path has a barrier of 12.2 kcal·mol<sup>−1</sup> and corresponds to the concerted activation of O4′ of KAN by Glu145. In addition, primary and secondary <sup>18</sup>O kinetic isotope effects (KIEs) have been computed for the bridge oxygen O3α and the non-bridge oxygens O1α, O2α, and O5′ of ATP. The normal 1°-KIE of 1.2% and 2°-KIE of 0.07% computed for the Glu145-assisted mechanism are in very good agreement with experimentally measured data. Finally, based on the results obtained, the role of electrostatic and compression effects in enzymatic catalysis is discussed.

Keywords: kanamycin, antibiotic, QM/MM, aminoglycosides, kinetic isotope effects, O-Nucleotidyltransferase, electrostatic effects, compression effects

## INTRODUCTION

Aminoglycoside antibiotics (AGAs) are agents used to treat serious infections caused by bacteria that either multiply very quickly or are difficult to treat. They act by stopping bacteria from producing proteins needed for their survival (Shaw et al., 1993). Unfortunately, due to the widespread use of these antibiotics in clinical treatment, bacterial strains have emerged that render these compounds ineffective. In fact, antibiotic-resistant bacterial infection has become a serious global threat to human health, according to World Health Organization (WHO) reports (Nature, 2013; Berendonk et al., 2015).

Several mechanisms of resistance to AGAs have been proposed, including (a) the presence of AGA-modifying enzymes (Ramirez and Tolmasky, 2010), (b) decreased bacterial membrane permeability toward AGA uptake and extrusion of AGAs from the cell by efflux pumps (Kumar and Schweizer, 2005), and (c) modification of the drug target as a result of mutations in the ribosome (Pfister et al., 2003) or methylation by 16S rRNA methyltransferases that affects the binding of AGAs (Doi and Arakawa, 2007; Leban et al., 2017). Nevertheless, the major mechanism of bacterial resistance in clinical isolates of gram-negative and gram-positive bacteria is enzymatic modification of the amino or hydroxyl groups of AGAs (Vakulenko and Mobashery, 2003).

#### Edited by:

Fahmi Himo, Stockholm University, Sweden

#### Reviewed by:

Tiziana Marino, Università della Calabria, Italy

Rongzhen Liao, Huazhong University of Science and Technology, China

### \*Correspondence:

Katarzyna Świderek, swiderek@uji.es

#### Specialty section:

This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry

Received: 23 November 2018; Accepted: 18 December 2018; Published: 29 January 2019

#### Citation:

Martí S, Bastida A and Świderek K (2019) Theoretical Studies on Mechanism of Inactivation of Kanamycin A by 4′-O-Nucleotidyltransferase. Front. Chem. 6:660. doi: 10.3389/fchem.2018.00660


There are three families of AGA-modifying enzymes: the acetyl-CoA-dependent aminoglycoside acetyltransferases (AACs), the ATP-dependent aminoglycoside phosphotransferases (APHs), and the ATP-dependent aminoglycoside nucleotidyltransferases (ANTs) (Becker and Matthew, 2013). The sites of modification in kanamycin A (KAN) by the different AGA-modifying enzymes are indicated in **Figure 1**. Each of these families catalyzes a different type of reaction: APHs catalyze the regiospecific transfer of the γ-phosphoryl group of ATP to one of the hydroxyl substituents of AGAs, AACs catalyze N- or O-acetylation of AGAs, and ANTs promote the reaction between MgATP and AGAs to form an O-adenylylated aminoglycoside and the magnesium chelate of inorganic pyrophosphate (MgPPi), as presented in **Figure 2**.

Herein, we focus on mechanistic studies of the reaction catalyzed by 4′-O-nucleotidyltransferase from S. aureus [ANT(4′)], which modifies the KAN antibiotic at the 4′-OH position. KAN is a second-line injectable drug used in the treatment of tuberculosis (TB) (Ventola, 2015). TB is classified among the 18 most serious drug-resistant threats<sup>1</sup>, since in many cases, as reported by WHO, TB can be resistant to first-line drugs (600,000 cases in 2016) and/or to any other treatment (490,000 cases of multidrug-resistant TB). KAN acts by interacting with the 30S subunit of prokaryotic ribosomes (specifically, by binding to the 16S rRNA at the tRNA acceptor A site) to substantially increase the level of mistranslation (incorrect alignment with the mRNA), and it indirectly inhibits translocation, provoking insertion of the wrong amino acid into the peptide during protein synthesis (Pestka, 1974; Misumi and Tanaka, 1980; Carter et al., 2000). KAN belongs to an AGA family containing a 4,6-disubstituted 2-deoxystreptamine core (ring B) glycosidically linked at position 4 to a glucosaminopyranose (ring A) and at position 6 to an N-acetylglucopyranose (ring C), as shown in **Figure 1**.

The crystal structures of the D80Y variant of ANT(4′) with bound KAN and AMPCPP (adenosine 5′-[(α,β)-methylene]triphosphate) indicate that the active form of the enzyme is a homodimer (Pedersen et al., 1995). Each monomer binds both substrates, the nucleotide and the aminoglycoside (KAN). Interestingly, however, the reaction takes place between substrates bound to different subunits. As shown in **Figure 3**, the two nucleotides are far apart, while the two KAN molecules are 3.5 Å from each other. Amino acid residues from the N-terminal domain interact mostly with the triphosphate moiety of the nucleotide, while those from the C-terminal domain interact with KAN (Revuelta et al., 2010).

Based on analysis of the organization of the active site, it is believed that both substrates must diffuse through the same cavity, characterized by a strongly negative electrostatic potential, to reach their binding pockets (Matesanz et al., 2012). It is therefore assumed that KAN binds before the nucleotide, thereby reducing possible repulsive interactions between an anionic species (ATP) and highly charged regions of the protein (Chen-Goodspeed et al., 1999). This contrasts with previous studies of ANT(2′), where the inverse order of binding was suggested (Lombardini and Cheng-Chu, 1980; Gates and Northrop, 1988). In any case, the presence of both substrates is required for catalysis to take place.

Previous kinetic studies (Pelt et al., 1986) of S. aureus ANT(4′) with gentamicin as a substrate indicate that antibiotic inactivation involves direct nucleotidyl transfer from ATP to the aminoglycoside. Specifically, inversion of stereochemistry at the α-phosphorus atom was observed experimentally, suggesting that the chemical step involves an odd number of phosphoryl transfers. Additionally, a concerted and slightly associative transition state (TS) structure for the enzymatic 4′-adenylation of KAN was suggested from measurements of <sup>18</sup>O kinetic isotope effects (KIEs) for key oxygen atoms (Gerratana et al., 2001).

The group transferred in this reaction is a nucleoside monophosphate (see **Figure 2**), a monosubstituted derivative of the phosphoryl group, and its chemical behavior along the reaction path can be assumed to be similar to that of the unsubstituted parent. Phosphoryl transfer reactions, in which low-lying d-orbitals on the phosphorus atom permit the existence of a pentavalent phosphorus intermediate (Cleland and Hengge, 2006; Marcos et al., 2008; Kamerlin, 2011; Wymore et al., 2014), have been proposed to proceed by two alternative mechanisms (Kamerlin et al., 2008): associative, in which nucleophilic attack takes place before departure of the leaving group, and dissociative, in which departure of the leaving group precedes nucleophilic attack. The reaction can also proceed in a single step with asynchronous bond formation and cleavage, so associative-like or dissociative-like character can be found for concerted paths (Bordes et al., 2017).

The degree to which nucleotidyl transfer proceeds by an associative mechanism is difficult to determine, since the TS geometry cannot be characterized experimentally. The nature of the TS for the non-enzymatic (Zhang et al., 2014) and enzymatic (Sucato et al., 2007, 2008; Oertell et al., 2012, 2014) nucleotidyl transfer reaction has recently been studied in human DNA polymerase β using modified dNTP substrates with different chemical structures.

<sup>1</sup>Centers for Disease Control and Prevention. Available online at: https://www.cdc.gov/drugresistance/biggest\_threats.html

In this paper, the nucleotidyl transfer mechanism of KAN inactivation by ANT(4′) is investigated at the molecular level using a QM/MM approach (Ridder and Mulholland, 2003). Molecular details of reactions involving the ATP cofactor are essential for understanding the mechanism of metal-promoted phosphoryl transfer reactions, which are responsible for a wide range of processes in the living cell (Cherepanov et al., 2008). To obtain values that can be directly compared with experimental data, free energy surfaces were computed and compared with experimentally measured rate constants using transition state theory (TST). Moreover, the <sup>18</sup>O KIEs for the bridge and non-bridge oxygen atoms of ATP were computed and compared with experimental values. Finally, the electrostatic potential generated by ANT(4′) in the active site was computed and compared with that of the uncatalyzed reaction in order to understand the role of electrostatic effects in this particular reaction.

## COMPUTATIONAL METHODS

### System Setup

A molecular model of aminoglycoside 4′-nucleotidyltransferase [ANT(4′)] was built from the biological assembly of the homodimeric structure with PDB ID 1KNY (Pedersen et al., 1995). The X-ray structure contains two KAN molecules bound in the two active sites of the enzyme, together with the non-hydrolyzable ATP analog AMPCPP chelated by Mg<sup>2+</sup>. To convert AMPCPP into ATP, the carbon atom at the α3 position was replaced by oxygen. The final enzymatic model thus consisted of two subunits (253 residues each), two Mg<sup>2+</sup>-ATP chelates, and two KAN molecules.

Missing hydrogen atoms were added to the enzyme structure with the tLEAP program (Schafmeister et al., 1995), based on pKa values computed for titratable residues at pH 7 with PropKa 3.1 (Olsson et al., 2011; Sondergaard et al., 2011). The results of the pKa calculations are presented in **Table S1**. Interestingly, all glutamic acid residues with predicted pKa values above 7 are involved in strong H-bond interactions with KAN (Glu67, ring A; Glu76, ring A; Glu142, ring C), and they should not be protonated. This misleading result arises because the substrate lacks hydrogen atoms when the pKa values are computed. For the rest of the titratable residues, standard pKa values were obtained. Based on a geometrical analysis of specific interactions involving histidine residues, the following protonation states were assigned: for His17, 66, 181, and 241 the δ-nitrogen of the imidazole ring was protonated, while for His100, 180, 181, and 207 the hydrogen was added to the ε-nitrogen. No S-S bridge between Cys residues was detected. The kanamycin molecule was treated as a neutral species (total charge 0): all hydroxyl groups were protonated and all amine groups were assigned to be neutral.
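The reasoning that a predicted pKa above 7 implies protonation at pH 7 follows from the Henderson–Hasselbalch relation for an isolated titratable site. A minimal Python sketch, assuming non-interacting sites (PropKa itself is not reproduced); the residue labels and pKa values in the example are hypothetical, and the `overrides` dictionary stands in for the manual decision, described above, to keep Glu residues H-bonded to KAN deprotonated.

```python
from typing import Optional


def protonated_fraction(pka: float, ph: float = 7.0) -> float:
    """Henderson–Hasselbalch fraction of the acid (protonated) form of a
    single titratable site; coupling between sites is ignored."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def assign_states(pkas: dict, ph: float = 7.0,
                  overrides: Optional[dict] = None) -> dict:
    """Naive rule (protonate when the acid form dominates, i.e. pKa > pH),
    followed by manual overrides, e.g. for Glu residues H-bonded to a ligand."""
    states = {res: ("protonated" if protonated_fraction(pka, ph) > 0.5
                    else "deprotonated")
              for res, pka in pkas.items()}
    states.update(overrides or {})
    return states


# Hypothetical predicted pKa values at pH 7.
predicted = {"GluX": 7.8, "GluY": 4.3}
print(assign_states(predicted, overrides={"GluX": "deprotonated"}))
```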

The system was then neutralized by adding 40 Na<sup>+</sup> counterions at the electrostatically most favorable positions, and the full enzyme model was solvated in a 10 × 10 × 10 nm cubic box of TIP3P water (Jorgensen et al., 1983). After optimization, heating (from 0 to 300 K in 0.001 K increments), and equilibration (100 ps), 10 ns of NVT molecular dynamics (MD) simulations were carried out at 300 K with a time step of 1 fs, using the AMBER force field (Duan et al., 2003) as implemented in the NAMD software (Phillips et al., 2005). The temperature during the MD simulation was controlled with a Langevin thermostat (Grest and Kremer, 1986). To reduce the computational cost, non-bonded interactions were truncated using a smooth switching function between 14.5 and 16 Å. Periodic boundary conditions (PBC) were also applied. The ATP cofactor was described at the MM level with parameters developed previously by Carlson and co-workers (Meagher et al., 2003). Missing parameters for KAN were generated with the antechamber tool and GAFF (Wang et al., 2004), with charges computed at the Austin Model 1 (AM1) level (Dewar et al., 1985). The resulting parameters are presented in **Table S2**. Root-mean-square deviation (RMSD) and further analyses of the MD simulations are provided in the **Supplementary Material**. According to the time evolution of the RMSD of the backbone C, Cα, and N atoms, the system can be considered equilibrated (see **Figure S1**).
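The smooth switching between 14.5 and 16 Å can be illustrated with the CHARMM-style switching polynomial, assumed here to be the form NAMD applies to van der Waals terms when its `switching` option is on (with `switchdist` and `cutoff` as the two radii). The text does not state the functional form or which terms it was applied to, so this Python sketch is an assumption; the Lennard-Jones parameters in the example are hypothetical.

```python
def switch(r: float, r_on: float = 14.5, r_off: float = 16.0) -> float:
    """CHARMM-style switching function S(r): 1 for r <= r_on, 0 for r >= r_off,
    and a polynomial in r**2 in between whose value and first derivative are
    continuous at both ends."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    on2, off2, r2 = r_on * r_on, r_off * r_off, r * r
    return (off2 - r2) ** 2 * (off2 + 2.0 * r2 - 3.0 * on2) / (off2 - on2) ** 3


def switched_lj(r: float, epsilon: float, sigma: float,
                r_on: float = 14.5, r_off: float = 16.0) -> float:
    """12-6 Lennard-Jones energy scaled by S(r), so it goes smoothly to zero."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6) * switch(r, r_on, r_off)


for r in (14.0, 14.5, 15.0, 15.5, 16.0):
    print(f"r = {r:4.1f} Å  S = {switch(r):.4f}")
```

Switching the energy rather than truncating it avoids the force discontinuity a hard cutoff would introduce at 16 Å.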

### Potential Energy Surfaces

The last structure of the MD simulation was then used for QM/MM calculations, with the QM subset of 76 atoms (see **Figure S2**) treated with the M06-2X hybrid functional (Zhao and Truhlar, 2008a,b) and the standard 6-31+G(d,p) basis set. The rest of the system was described with the AMBER and TIP3P force fields, as implemented in the fDYNAMO library (Field et al., 2000; Krzemińska et al., 2015). All atoms more than 20 Å from KAN were frozen. To explore the proposed mechanisms, potential energy surfaces (PESs) were first computed at the semiempirical AM1 level combined with MM force fields. Starting from selected structures, a micro-macro iteration optimization algorithm (Turner et al., 1999; Martí et al., 2005) was used at the M06-2X/MM level to locate, optimize, and characterize the TS structures, using a Hessian matrix containing all the coordinates of the QM subsystem, while the gradient norm of the remaining movable atoms was kept below 0.01 kcal·mol<sup>−1</sup>·Å<sup>−1</sup>. Intrinsic reaction coordinate (IRC) paths were traced down from the located TSs to the reactant and product valleys in mass-weighted Cartesian coordinates. The last structures of the IRCs were then used to locate, optimize, and characterize the ground states, i.e., the reactant and product complexes.
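The rule that atoms more than 20 Å from KAN were frozen leaves the distance definition implicit. The Python sketch below assumes it is the shortest distance to any KAN atom (a center-of-mass criterion would be an alternative); the coordinates are hypothetical and the function is not part of fDYNAMO.

```python
from __future__ import annotations

import math
from typing import Sequence

Point = Sequence[float]  # (x, y, z) in Å


def frozen_mask(atoms: Sequence[Point], ligand: Sequence[Point],
                cutoff: float = 20.0) -> list[bool]:
    """True for atoms whose shortest distance to any ligand atom exceeds
    `cutoff`; these would be held fixed during the QM/MM optimizations."""
    return [min(math.dist(a, lig) for lig in ligand) > cutoff for a in atoms]


kan = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]                        # hypothetical KAN atoms
protein = [(5.0, 0.0, 0.0), (21.0, 0.0, 0.0), (30.0, 0.0, 0.0)]  # hypothetical atoms
mask = frozen_mask(protein, kan)
print(f"{sum(mask)} of {len(mask)} atoms frozen")
```

Note that the atom at 21 Å from the first KAN atom stays movable because it lies within 20 Å of the second one.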

### Free Energy Calculations

To describe the reaction mechanism in condensed media, a free energy perturbation (FEP) method (Świderek et al., 2013) was used, with the M06-2X DFT functional describing the QM subset of atoms. For obtaining a potential of mean force (PMF), this method is usually the second choice after umbrella sampling (US) (Torrie and Valleau, 1977), and the selection is often dictated by the number of coordinates involved in the approximate reaction coordinate (Świderek et al., 2015a). In this study, however, FEP was employed because of the limitations of semiempirical methods (SMs): as shown recently by Otyepka and co-workers (Mlýnský et al., 2014), SMs can easily fail and lead to wrong conclusions about mechanistic paths in models where a phosphorus atom is directly involved in the chemical reaction.

Since FEP requires sampling of the environment along the IRC traced from the TS located at the QM/MM level, the free energy pathway is obtained along a realistic reaction coordinate. The limitation of this technique, however, is the lack of sampling of the chemical subsystem, since a single characterized TS structure is used. On the other hand, the FEP method makes it possible to explore the reaction path directly at a high level of theory, with the QM wave function obtained at the DFT level polarized by the charges of the MM subset of atoms.
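The FEP estimate for each pair of neighboring windows rests on Zwanzig's exponential-averaging formula, ΔG = −kT ln⟨exp(−ΔE/kT)⟩, and the free energy profile is the running sum over windows. The Python sketch below shows only this generic estimator; in the scheme described above the QM geometries are taken from the IRC while the MM environment is sampled, and the exact partitioning of ΔE used by Świderek et al. (2013) is not reproduced. The sample energies in the example are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol·K)


def zwanzig(delta_e: list, temperature: float = 300.0) -> float:
    """ΔG (kcal/mol) for window i -> i+1 from samples of ΔE = E(i+1) - E(i)
    collected while sampling window i: ΔG = -kT ln < exp(-ΔE/kT) >_i."""
    kt = R_KCAL * temperature
    x = [-de / kt for de in delta_e]
    m = max(x)  # log-sum-exp shift for numerical stability
    log_mean = m + math.log(sum(math.exp(v - m) for v in x) / len(x))
    return -kt * log_mean


def fep_profile(windows: list, temperature: float = 300.0) -> list:
    """Cumulative free energy along the path; windows[i] holds the ΔE samples
    for the step i -> i+1, and the first point is the reference (0.0)."""
    profile = [0.0]
    for samples in windows:
        profile.append(profile[-1] + zwanzig(samples, temperature))
    return profile


# Hypothetical ΔE samples (kcal/mol) for three consecutive steps.
print(fep_profile([[0.8, 1.1, 0.9], [1.5, 1.2], [-0.4, -0.6]]))
```

Because the average is exponential, low-ΔE samples dominate each window's estimate, which is why adjacent windows must overlap well.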

Free energy differences were estimated by means of the FEP methodology for the structures obtained along the IRC, characterized by a single coordinate s:

$$s_i = s_{i-1} + \sqrt{\sum_{j \in \mathrm{QM}} m_j \left[ \left( x_{j,i} - x_{j,i-1} \right)^2 + \left( y_{j,i} - y_{j,i-1} \right)^2 + \left( z_{j,i} - z_{j,i-1} \right)^2 \right]} \tag{1}$$

where x<sub>j,i</sub>, y<sub>j,i</sub>, and z<sub>j,i</sub> are the coordinates of the jth QM atom in the ith structure along the IRC traced from the transition state structure (coordinates x<sub>j,0</sub>, y<sub>j,0</sub>, and z<sub>j,0</sub>), and m<sub>j</sub> is the mass of that atom. The free energy relative to the reactant is then expressed as a function of the s coordinate, as explained elsewhere (Świderek et al., 2013; Viciano et al., 2015). The MD simulations for the FEP calculations were performed at 300 K in the NVT ensemble, with 20 ps of production per window and a time step of 1 fs. The total number of windows required to generate the complete free energy path was 66 for the ATP-assisted mechanism, 37 for the Glu145-assisted mechanism in the enzyme, and 101 for the reaction in water.
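Eq. (1) is a cumulative, mass-weighted arc length over the QM atoms. A minimal Python sketch, assuming the IRC frames of one branch are ordered starting from the TS (so s = 0 at the TS), with coordinates in Å and masses in amu; assigning negative s to the reactant branch is a common convention assumed here, not stated in the text. The geometry in the example is hypothetical.

```python
import math
from typing import Sequence

Frame = Sequence[Sequence[float]]  # one (x, y, z) per QM atom, in Å


def irc_s_coordinate(frames: Sequence[Frame], masses: Sequence[float]) -> list:
    """Cumulative s of Eq. (1): s_i = s_(i-1) + sqrt(sum_j m_j |r_j,i - r_j,(i-1)|^2),
    starting from s_0 = 0 at the first frame (units: amu^1/2·Å)."""
    s = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        step2 = sum(m * sum((c - p) ** 2 for c, p in zip(rc, rp))
                    for m, rc, rp in zip(masses, cur, prev))
        s.append(s[-1] + math.sqrt(step2))
    return s


# Hypothetical one-atom "path" (mass 4 amu) with steps of 1 Å and 2 Å.
path = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(1.0, 2.0, 0.0)]]
print(irc_s_coordinate(path, masses=[4.0]))  # → [0.0, 2.0, 6.0]
```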

### Kinetic Isotope Effects

Kinetic isotope effects (KIEs) have been computed for isotopic substitutions of key atoms, from the TSs and the reactant complexes localized at the M06-2X/MM level of theory. Detailed information about the method used herein for computing KIEs can be found elsewhere (Świderek et al., 2014, 2017a).

## RESULTS AND DISCUSSION

### Reaction Mechanism

MD simulations of the reactant complex at the MM level shed some light on the possible reaction mechanisms, which were later explored at the QM/MM level. First, as can be seen in **Figure 4A**, the "in-line" arrangement of the 4′-hydroxyl group of KAN and the pyrophosphate of ATP required for direct attack on the α-phosphorus atom occurs in only 2% of the 10,000 structures saved along the 10 ns of MD simulations [note: an O4′-Pα (ATP) distance no longer than 3.5 Å and an O3α-O4′-Pα (ATP) angle no smaller than 125° were used as boundary conditions, similar to the definition used by York and co-workers (Heldenbrand et al., 2014)]. This result supports the view that in-line conformations are often rare. Moreover, the free energy required to bring the nucleophile in line has been predicted, at least in some cases, to be only modest and likely not a dominant factor in the overall catalytic rate (Min et al., 2007). In the X-ray structure, the corresponding distance between the nucleophile and Pα of the ATP analog AMPCPP is 5.0 Å, which was originally interpreted as corresponding to an inactive conformation. Analysis of this distance shows that replacing AMPCPP with ATP reduces the population of non-reactive conformations to 15% of all structures generated along the MD. Nevertheless, a rather large spread in the distances and angles can still be observed, suggesting that neither substrate is completely immobilized in the binding pocket of ANT(4′).
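The in-line population quoted above is a simple count over snapshots. A minimal Python sketch, assuming the angle criterion refers to the O4′···Pα–O3α attack angle with its vertex at Pα (close to 180° for an ideal in-line geometry); this is our reading of the definition rather than something the text states explicitly, and the snapshot coordinates are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]  # (x, y, z) in Å


def angle_deg(a: Point, vertex: Point, c: Point) -> float:
    """Angle a–vertex–c in degrees."""
    u = [a[k] - vertex[k] for k in range(3)]
    v = [c[k] - vertex[k] for k in range(3)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_inline(o_nuc: Point, p_alpha: Point, o_lg: Point,
              d_max: float = 3.5, ang_min: float = 125.0) -> bool:
    """O4'···Pα distance <= d_max and O4'···Pα–O3α angle (vertex Pα) >= ang_min."""
    return (math.dist(o_nuc, p_alpha) <= d_max
            and angle_deg(o_nuc, p_alpha, o_lg) >= ang_min)


def inline_fraction(snapshots: Sequence[tuple]) -> float:
    """Fraction of (O4', Pα, O3α) snapshots that satisfy the in-line criterion."""
    return sum(is_inline(*snap) for snap in snapshots) / len(snapshots)


p, lg = (0.0, 0.0, 0.0), (1.6, 0.0, 0.0)
snaps = [((-3.0, 0.0, 0.0), p, lg),   # in line: 3.0 Å, 180°
         ((0.0, 3.0, 0.0), p, lg),    # close enough, but at 90°
         ((-5.0, 0.0, 0.0), p, lg)]   # well aligned, but too far
print(f"in-line fraction: {inline_fraction(snaps):.2f}")
```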

Second, the distance distributions indicate that the O4′ atom of KAN, which acts as the nucleophile in the studied reaction, can be activated by either O2α of the ATP cofactor or OE2 of Glu145, but not by Glu76, in the wild-type ANT(4′). According to the population analysis, the strongest interaction was found between H4′ and O2α, with a distance below 4 Å in ca. 32% of the snapshots, as shown in **Figure 4B**.

Glu145 approaches the 4′-OH group to within 5 Å in only 5% of cases, whereas Glu76 is located much farther away, never closer than 7 Å. These observations are especially interesting in light of experimental kinetic studies of the single Glu145Gln and Glu76Gln variants and the double Glu76Gln/Glu145Gln variant of ANT(4′) (Matesanz et al., 2012). In these studies, a loss of catalytic activity was observed only for the double mutant, suggesting that ANT(4′) is equipped with a duplicated base catalyst. However, the MD results obtained in this work indicate that, in the wild-type enzyme, the role of the catalytic base is played mainly by Glu145, and that Glu76 may take over this role once Glu145 is absent from the active site.

As indicated by the key interactions found along the MD trajectory, two possible mechanisms for adenylyl group transfer should be considered: the ATP-assisted and the Glu145-assisted mechanism (**Figure 5**). In both mechanisms, nucleophilic attack of O4′ on Pα is expected to occur together with Pα-O3α bond cleavage; the only difference is the acceptor of the H4′ proton transferred from the O4′ hydroxyl group. In the ATP-assisted mechanism, H4′ is transferred to O2α, the phosphate oxygen not involved in the interaction with the Mg<sup>2+</sup> cation, while in the Glu145-assisted mechanism the same proton is transferred to the OE2 atom of the deprotonated carboxylate group of Glu145. To explore the possible reaction pathways, two potential energy surfaces were computed at the AM1/MM level (see the Computational Methods section for details). They describe the activation of O4′ of KAN, controlled by the antisymmetric combination of the distances between the hydroxyl oxygen of KAN (O4′) and its proton (H4′) and between H4′ and its possible acceptor (O2α or OE2 of Glu145 for the ATP-assisted or Glu145-assisted mechanism, respectively), together with the nucleophilic attack of O4′ on Pα of ATP. The PESs are presented in **Figure 6**, with the position of the TS and the reaction pathway traced by IRC calculations at the M06-2X/MM level projected on the surfaces. The located M06-2X/MM TS structures lie quite close to the quadratic regions of the AM1/MM PESs. Based on the shape of the surfaces and on analysis of the geometries of the located TS structures (coordinates given in **Table S5**, structures shown in **Figure 7**), both reactions proceed via concerted mechanisms with a single TS along the chemical path.
Both optimized TS structures have associative character [see the More O'Ferrall–Jencks plot (More O'Ferrall, 1970; Jencks, 1985) in **Figure S3** and **Table S3**], with nucleophilic attack slightly preceding departure of the leaving group.
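Two descriptors used in this analysis can be made concrete: the antisymmetric proton-transfer coordinate that drives the PES scans, and bond orders that place a TS on a More O'Ferrall–Jencks diagram. The Python sketch below is illustrative only: it uses Pauling's relation r(n) = r(1) − 0.60 log<sub>10</sub> n with an assumed reference P–O single-bond length of 1.63 Å, and classifies the TS by whether the sum of forming and breaking bond orders exceeds 1, a common convention that may differ from the one behind Figure S3 and Table S3. All distances in the example are hypothetical.

```python
def antisymmetric_coordinate(d_o4_h: float, d_h_acc: float) -> float:
    """d(O4'–H4') - d(H4'–acceptor) in Å: negative while H4' sits on O4',
    positive once it has moved to O2α or OE2(Glu145)."""
    return d_o4_h - d_h_acc


def pauling_bond_order(r: float, r_single: float) -> float:
    """Invert Pauling's r(n) = r(1) - 0.60·log10(n) to get the bond order n."""
    return 10.0 ** ((r_single - r) / 0.60)


def ts_character(r_forming: float, r_breaking: float, r_single: float = 1.63) -> str:
    """Classify a phosphoryl-transfer TS from Nu···P (forming) and P···LG
    (breaking) distances: bond-order sum > 1 -> associative (tight),
    < 1 -> dissociative (loose). r_single = 1.63 Å is an assumed P–O value."""
    total = (pauling_bond_order(r_forming, r_single)
             + pauling_bond_order(r_breaking, r_single))
    return "associative" if total > 1.0 else "dissociative"


print(antisymmetric_coordinate(0.98, 1.75))  # hypothetical reactant-like distances
print(ts_character(1.85, 1.70))              # hypothetical tight TS
print(ts_character(2.30, 2.40))              # hypothetical loose TS
```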

Additionally, the free energy barriers for both mechanisms were computed at the M06-2X/MM level using the FEP method. The resulting activation free energies are 53.4 and 12.2 kcal·mol<sup>−1</sup> for the ATP-assisted and Glu145-assisted mechanisms, respectively (the computed profiles are presented in **Figure S4**). These results clearly indicate that the second mechanism is the more favorable one. Moreover, a stable product complex (−1.6 kcal·mol<sup>−1</sup>) is formed only in this mechanism. In contrast, in the ATP-assisted mechanism the product complex is much less stable, lying 40.3 kcal·mol<sup>−1</sup> above the reactant complex. It can thus be concluded that inactivation of KAN in the active site of ANT(4′) takes place with direct participation of the Glu145 residue in the chemical process.

It is worth mentioning that the computed free energy barrier of 12.2 kcal·mol<sup>−1</sup> for the Glu145-assisted mechanism is much smaller than those deduced from experimentally measured rate constants: 19.7 kcal·mol<sup>−1</sup> (for k<sub>cat</sub> = 0.06 ± 0.01 s<sup>−1</sup> measured at 35 °C) (Revuelta et al., 2008) and 17.2 kcal·mol<sup>−1</sup> (for k<sub>cat</sub> = 1.3 ± 0.1 s<sup>−1</sup> measured at 25 °C) (Gerratana et al., 2001). This significant difference between the experimental and theoretical values can be explained by the fact that the rate-limiting step in this process is the release of the product rather than the chemical step, so the experimental barriers can be treated, at most, as upper limits.
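The conversion between rate constants and barriers used in this comparison is the Eyring equation of TST, k = (k<sub>B</sub>T/h) exp(−ΔG‡/RT). A minimal Python sketch, assuming a transmission coefficient of 1 and first-order rate constants in s<sup>−1</sup>: with the k<sub>cat</sub> values quoted above it gives about 17.3 and 19.8 kcal·mol<sup>−1</sup>, within about 0.1 kcal·mol<sup>−1</sup> of the quoted barriers (the residual presumably reflects rounding or the constants used), and the computed 12.2 kcal·mol<sup>−1</sup> barrier corresponds to roughly 7 × 10<sup>3</sup> s<sup>−1</sup> at 25 °C.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J·s
R = 1.987204e-3      # gas constant, kcal/(mol·K)


def eyring_barrier(k: float, temperature: float) -> float:
    """ΔG‡ (kcal/mol) from a first-order rate constant k (s^-1), TST with κ = 1."""
    return R * temperature * math.log(K_B * temperature / (H * k))


def eyring_rate(dg: float, temperature: float) -> float:
    """First-order rate constant (s^-1) for a barrier dg (kcal/mol), κ = 1."""
    return K_B * temperature / H * math.exp(-dg / (R * temperature))


print(f"{eyring_barrier(1.3, 298.15):.1f} kcal/mol")    # kcat at 25 °C
print(f"{eyring_barrier(0.06, 308.15):.1f} kcal/mol")   # kcat at 35 °C
print(f"{eyring_rate(12.2, 298.15):.1e} s^-1")          # computed barrier
```

The four orders of magnitude between this rate and the measured k<sub>cat</sub> illustrate how large the gap is that product release would have to account for.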

### Kinetic Isotope Effects (KIEs)

To test our predictions, the reaction mechanisms described above can be examined with a sensitive tool such as heavy-atom kinetic isotope effects (KIEs). KIEs are defined as the ratio of rate constants for reactions involving the light and the heavy isotopically substituted reactants, and they reflect changes in bond order between the ground state and the rate-limiting transition state. In this section, the theoretically obtained KIEs for the studied reaction are interpreted and compared with experimentally measured intrinsic KIEs for bridge and non-bridge oxygen atoms, obtained with the labeled slow substrate analog m-nitrobenzyl triphosphate (mNBTP) instead of ATP (Gerratana et al., 2001). This substitution does not appear to change the reaction mechanism, which retains the same regiospecificity, although it slows down KAN inactivation by 2 orders of magnitude (Gerratana et al., 2001).

FIGURE 6 | Potential energy surfaces obtained at the AM1/MM level, with the located transition states and the projection of the reaction pathway from IRC calculations at the M06-2X/MM level superimposed, for (A) ATP-assisted and (B) Glu145-assisted mechanisms. Isoenergetic lines are given in kcal·mol<sup>−1</sup> and distances in Å.

As shown in **Table 1**, normal primary KIEs (1°-KIEs) of 0.9% and 1.2% were computed for isotopic substitution of the oxygen at position 3α (the oxygen of the leaving group) in the ATP-assisted and Glu145-assisted mechanisms, respectively. Both 1°-KIE values are in very good agreement with the experimentally measured value of 1.4% (Gerratana et al., 2001). In all cases, the normal 1°-KIE reflects the loss of bond order between the phosphorus atom Pα and the bridge oxygen O3α in the TS with respect to the reactant complex. This result indicates that the P-O3α bond is cleaved in the rate-limiting step. However, the 1°-KIE values can distinguish neither between the ATP- and Glu145-assisted mechanisms nor between a concerted and a stepwise mechanism. Determination of the 2°-KIE is therefore crucial to resolve these questions. KIEs were calculated for the <sup>18</sup>O-substituted non-bridge oxygens O5′, O1α, and O2α. As can be deduced from the results in **Table 1**, the total 2°-KIE for the ATP-assisted mechanism is slightly inverse, ca. −1%, in contrast to the very small normal KIE of 0.07% computed for the Glu145-assisted mechanism. The latter clearly indicates that the loss of bond order for the non-bridge oxygens occurs simultaneously with P-O3α bond breaking and reveals a concerted mechanism. A small normal 2°-KIE is also observed experimentally (Gerratana et al., 2001), supporting our previous conclusion, based on energetic analysis, that the Glu145-assisted mechanism is the most favorable candidate and should correspond to the most realistic reaction pathway. Interestingly, the inverse 2°-KIE obtained for the ATP-assisted mechanism can be explained by protonation of the non-bridge oxygen O2α. However, this result disagrees with the experimental results, which once again allows us to discard the ATP-assisted mechanism.

TABLE 1 | Primary and secondary kinetic isotope effects computed for isotopically substituted bridge and non-bridge oxygen atoms of ATP, respectively, in the ATP-assisted and Glu145-assisted mechanisms of KAN inactivation by ANT(4′).

\*Experimental values correspond to KIEs measured at pH 7.7 with the labeled slow substrate analog m-nitrobenzyl triphosphate (mNBTP) and are taken from Gerratana et al. (2001).
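The percentages used in this discussion express a KIE, k<sub>light</sub>/k<sub>heavy</sub>, as its deviation from unity, so a 1.2% normal effect is a KIE of 1.012 and a −1% effect an inverse KIE of 0.99. The Python sketch below makes that bookkeeping explicit; combining the individual secondary effects multiplicatively into a total is an assumption of the sketch (independent isotopic substitutions), and the per-position values in the example are hypothetical, not those of Table 1.

```python
import math


def kie_to_percent(kie: float) -> float:
    """Express KIE = k_light/k_heavy as a percentage deviation from 1."""
    return (kie - 1.0) * 100.0


def percent_to_kie(percent: float) -> float:
    """Inverse of kie_to_percent, e.g. 1.2 % -> 1.012."""
    return 1.0 + percent / 100.0


def combined_kie(kies) -> float:
    """Total effect of several substitutions, assuming they are independent
    and therefore combine multiplicatively."""
    return math.prod(kies)


def kind(kie: float) -> str:
    """'normal' (> 1), 'inverse' (< 1), or 'none' (exactly 1)."""
    return "normal" if kie > 1.0 else "inverse" if kie < 1.0 else "none"


# Hypothetical per-position secondary KIEs for three non-bridge oxygens.
total = combined_kie([1.0010, 0.9990, 1.0007])
print(f"total 2°-KIE = {total:.4f} ({kie_to_percent(total):+.2f} %, {kind(total)})")
```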

### Electrostatic Effects

The most widely accepted hypothesis explaining the origin of enzymatic catalysis assumes that the TS is stabilized by more favorable electrostatic interactions with the protein than in the equivalent reaction in solution (Adamczyk et al., 2011; Moliner, 2011). The validity of this hypothesis was examined, and strongly supported, in our previous studies of systems such as HIV-1 protease (Krzemińska et al., 2016), glycine N-methyltransferase (GNMT) (Świderek et al., 2018) and the de novo designed Kemp eliminase (Świderek et al., 2015b, 2017b). We therefore decided to test this hypothesis also in the case of ANT(4′). For this purpose, an additional model was built in which KAN together with MgATP was solvated in a box of explicit water molecules. To obtain the same reaction mechanism as in the Glu145-assisted pathway, a base was required; this was provided by adding a propionate molecule to the model as a mimic of the Glu side chain. The TS structure was then located at the M06-2X/MM level (its geometrical coordinates are given in **Table S6**), and the same procedure described in the computational methods section was applied to explore the free energy surface. The free energy surface obtained for the reaction in aqueous solution is shown in **Figure S5**. As is well known, non-catalyzed phosphoryl transfer reactions are extremely slow, and enzymes can provide rate enhancements of >10<sup>20</sup>-fold (Lad et al., 2003).

Our aim here was therefore to understand the origin of the catalytic power of the enzyme. In other words, the role of the enzyme in the KAN deactivation process is explored through the electrostatic potential generated by the protein at a key position, namely the Pα atom, the center of the transferred group, V<sub>Pα</sub>(r). Since the rate-limiting step of the reaction catalyzed by ANT(4′) is the release of the KAN-AMP complex from the active site, it can be assumed that the enzyme has already achieved the highest possible catalytic effect on the chemical step.

The values of V<sub>Pα</sub>(r) generated by ANT(4′) and by the water solvent, together with the formal charge accumulated on the transferred group, are collected in **Table 2** and **Table S4**. Comparing these magnitudes for the TS structures of the base-assisted mechanism in the enzyme and in aqueous solution reveals a much larger positive V<sub>Pα</sub>(r) on the negatively charged transferred group in the active site of the enzyme (801.5 ± 17.7 kJ·mol<sup>−1</sup>) than in water (628.2 ± 49.9 kJ·mol<sup>−1</sup>). A larger positive electrostatic potential should better stabilize a transition state in which a strong negative charge accumulates on this group (−0.698 and −0.619 a.u. in water and in the enzyme, respectively). As a result of this stabilization, the free energy barrier is expected to be lower in the enzyme. This expectation is borne out by the computed barriers, which are much higher in water (47.9 kcal·mol<sup>−1</sup>) than in the enzyme (12.2 kcal·mol<sup>−1</sup>). According to the hypothesis presented, the resulting enhancement of the rate constant by ca. 10<sup>26</sup>-fold can thus be attributed to the stronger electrostatic potential created by ANT(4′). The contribution of each amino acid residue to the catalytic power was then explored to identify their specific roles in catalysis. The largest contribution to the overall positive electrostatic potential comes from the positively charged residues surrounding the active site, such as Lys-149 (∼33.7%) from chain A, and Arg-42 (∼13.5%) and Lys-74 (∼10.7%) from chain B (see **Figure S7**). These residues can be considered to have the highest impact on enhancing the rate constant of the chemical reaction.
On the other hand, the nearby negatively charged residues Glu-141 and Glu-142 from chain A and Glu-76 from chain B produce a negative electrostatic potential. However, this unfavorable effect is too small to perturb the overall magnitude of the potential.
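The per-residue percentages quoted above come from decomposing the potential at Pα into residue contributions. A minimal sketch of such a decomposition, assuming the MM environment is a set of fixed point charges and the potential is a bare Coulomb sum (no dielectric screening, no cutoff). This is not necessarily how the authors evaluated the potential, and the residue labels and coordinates in the usage lines are hypothetical. The factor 1389.35 converts e/Å into kJ·mol<sup>−1</sup> per unit charge.

```python
import math
from collections import defaultdict

# e^2 * N_A / (4 * pi * eps0) in kJ·mol^-1·Å: converts q/r (e/Å)
# into kJ·mol^-1 per unit probe charge.
COULOMB_KJ_MOL_ANG = 1389.35


def potential_by_residue(point, charges):
    """Electrostatic potential at `point`, split by residue.

    point   -- (x, y, z) in Å, e.g. the Pα position in a TS snapshot
    charges -- iterable of (residue_label, charge_in_e, (x, y, z) in Å)
    Returns {residue_label: potential in kJ·mol^-1 per e}.
    """
    contrib = defaultdict(float)
    for residue, q, xyz in charges:
        r = math.dist(point, xyz)
        contrib[residue] += COULOMB_KJ_MOL_ANG * q / r
    return dict(contrib)


def percent_of_total(contrib):
    """Signed share of each residue in the total potential, in %."""
    total = sum(contrib.values())
    return {res: 100.0 * v / total for res, v in contrib.items()}


# Hypothetical two-residue environment, for illustration only.
p_alpha = (0.0, 0.0, 0.0)
env = [
    ("LysX", +1.0, (5.0, 0.0, 0.0)),
    ("GluY", -1.0, (0.0, 10.0, 0.0)),
]
contrib = potential_by_residue(p_alpha, env)
for res, share in percent_of_total(contrib).items():
    print(f"{res}: {contrib[res]:8.2f} kJ/mol/e  ({share:6.1f} %)")
```

Because residues of opposite charge enter with opposite signs, a single residue's share can exceed 100%. In this toy case the lysine accounts for 200% of a total that the glutamate halves.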

Finally, the analysis of the ATP-assisted and Glu145-assisted mechanisms occurring in the same active site shows, as expected, that the values of V<sub>Pα</sub>(r) generated on the transferred group in the TS and RC are very similar. However, a meaningful difference is found in the charge distribution on the substrates (see **Table 2**). For the most favorable mechanism (the one with the lower free energy barrier), a slight increase of negative charge on the phosphate group is observed in the TS with respect to the RC (−0.619 and −0.582 a.u., respectively). The largest change in charge distribution from RC to TS, however, was found in the ATP-assisted mechanism, where a larger negative charge of −0.846 a.u. is observed in the RC. Since the H4′ proton is transferred from the hydroxyl group of KAN to the adenylyl group in this mechanism, its presence sharply decreases the negative charge accumulated on the adenylyl group, down to −0.130 a.u. in the TS. Consequently, in the ATP-assisted mechanism the V<sub>Pα</sub>(r) generated by ANT(4′) would over-stabilize the ground state relative to the TS, which could explain the high free energy barrier obtained for this mechanism.

<sup>a</sup> Presented values of charges correspond to the sum of formal charges computed for the Pα, O1α, O2α, and O5′ atoms. <sup>b</sup> For the TS of the ATP-assisted mechanism, the charge of H4′ transferred to O2α was added.

### Compression Effects

The complementary hypothesis for enzymes catalyzing SN2 reactions is the so-called "compression hypothesis," originally proposed by Schowen (Hegazi et al., 1979) for enzymatic methyl transfer. Extended to hydrogen transfer reactions (Roston et al., 2013), this hypothesis explains the origin of enzymatic catalysis by suggesting that specific protein fluctuations might reduce the donor-acceptor distance (DAD) and, as a consequence, decrease the reaction barrier by increasing the number of reactive trajectories (Kohen, 2015). Since the mechanistic results presented in this work have revealed the SN2 character of the adenylyl group transfer, we decided to use this reaction as an example that could shed new light on the ongoing debate about the validity of this hypothesis. For this purpose, the evolution of the DAD vs. the distances describing the –PO3R group transfer was analyzed for the aqueous and enzymatic reactions, as presented in **Figure S6**. The DAD decreases from RC to TS by 0.68 and 0.38 Å for the ATP- and Glu145-assisted mechanisms, respectively (DAD values are provided in **Table 2**). Interestingly, the minimum value of the DAD (DAD<sub>min</sub>) is not reached at the TS: in both cases it appears well beyond the highest-energy point along the reaction pathway, after which the DAD lengthens again to allow product formation. A decrease of the DAD is also observed for the reaction in water, where ΔDAD(RC−TS) was found to be 0.49 Å. However, in contrast to the enzymatic reactions, in this case the DAD decreases steadily until a pentavalent intermediate is formed.

Analysis of the influence of the DAD at the TS on the free energy barriers, as plotted in **Figure 8**, reveals that a short DAD<sub>TS</sub> does not guarantee a reduction of the reaction barrier. Nevertheless, an interesting relation was found: the larger the change in DAD from RC to TS, the slower the reaction. This suggests that large changes in DAD require more energy to be invested in the chemical process. Finally, it can be concluded that a reduction of the reaction barrier is achieved only when a short DAD is already reached in the Michaelis complex, i.e., 4.26 Å for the Glu145-assisted mechanism vs. 4.45 Å for the ATP-assisted mechanism or 4.30 Å for the reaction in water.
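The two trends stated above (a larger ΔDAD from RC to TS, and a longer DAD in the Michaelis complex, both going with a higher barrier) can be checked as rank orderings over the three systems quoted in this section. A minimal standard-library sketch; the barriers are those given in this article (12.2, 53.4 and 47.9 kcal·mol<sup>−1</sup>). With only three points this is a consistency check of the ordering, not a statistical test, and it does not reproduce **Figure 8**, which plots DAD<sub>TS</sub>.

```python
import math

# Values quoted in this section: DAD in the reactant complex (Å),
# decrease of DAD from RC to TS (Å), and free energy barrier (kcal/mol).
SYSTEMS = {
    "Glu145-assisted": (4.26, 0.38, 12.2),
    "ATP-assisted": (4.45, 0.68, 53.4),
    "water": (4.30, 0.49, 47.9),
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks (1 = smallest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


dad_rc = [v[0] for v in SYSTEMS.values()]
delta_dad = [v[1] for v in SYSTEMS.values()]
barrier = [v[2] for v in SYSTEMS.values()]

print(f"Spearman(dDAD RC->TS, barrier) = {spearman(delta_dad, barrier):.2f}")
print(f"Spearman(DAD in RC, barrier)   = {spearman(dad_rc, barrier):.2f}")
print(f"Pearson(dDAD RC->TS, barrier)  = {pearson(delta_dad, barrier):.2f}")
```

Both rank correlations equal 1, meaning the orderings agree perfectly. The linear (Pearson) correlation between ΔDAD and the barrier is positive but below 1, because the water and ATP-assisted barriers are close while their ΔDAD values differ.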

### CONCLUSIONS

In this work, the molecular mechanism of the transfer of the adenylyl group from ATP to KAN, and the formation of the KAN-AMP-Mg ternary complex as a result of the reaction catalyzed by the ANT(4′) enzyme, was investigated. Two proposed mechanisms were studied, the ATP-assisted and Glu145-assisted mechanisms. Both were shown to proceed in a concerted manner: the O4′ atom of KAN is activated by proton abstraction by either ATP or the Glu-145 residue, respectively, while the O4′-Pα bond forms and the Pα-O3α bond breaks. Based on the computed free energy barriers, it was deduced that KAN deactivation by ANT(4′) proceeds most favorably via the Glu145-assisted mechanism, with a barrier of 12.2 kcal·mol<sup>−1</sup>, much lower than that obtained for the ATP/Mg-assisted process (53.4 kcal·mol<sup>−1</sup>). This value is lower than the barriers derived from experimentally measured rate constants, i.e., between 17.2 and 19.7 kcal·mol<sup>−1</sup>. The lower theoretical barrier can be explained by the fact that the rate-limiting step of this reaction is not the chemical conversion but the release of the KAN-AMP complex. The theoretical prediction of the most favorable mechanism was then confirmed by computing KIEs. This sensitive tool allowed us to discriminate between the two mechanisms based on the 2°-KIEs computed for isotopic substitution of the non-bridge oxygen atoms. For the ATP-assisted mechanism, an inverse (<1) 2°-KIE was obtained, in direct contradiction with the experimentally measured data. In contrast, very good agreement between the experimental and theoretical primary and secondary KIEs was found for the Glu145-assisted mechanism, supporting its operation.
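Barriers derived from measured rate constants, like the 17.2–19.7 kcal·mol<sup>−1</sup> range above, follow from transition-state theory. A minimal sketch of the Eyring relation, assuming T = 298.15 K and a transmission coefficient of 1. The experimental temperature and rate constants are not given in this section, so the code converts the quoted barriers to rate constants and back rather than reproducing the original conversion.

```python
import math

K_B = 1.380649e-23            # Boltzmann constant, J/K
H = 6.62607015e-34            # Planck constant, J*s
R = 8.314462618e-3 / 4.184    # gas constant, kcal/(mol*K)


def rate_from_barrier(dg_kcal, T=298.15, kappa=1.0):
    """Eyring equation: k = kappa * (k_B T / h) * exp(-dG‡ / RT), in s^-1."""
    return kappa * (K_B * T / H) * math.exp(-dg_kcal / (R * T))


def barrier_from_rate(k, T=298.15, kappa=1.0):
    """Inverse Eyring relation: dG‡ = RT ln(kappa k_B T / (h k)), in kcal/mol."""
    return R * T * math.log(kappa * K_B * T / (H * k))


for dg in (12.2, 17.2, 19.7):
    k = rate_from_barrier(dg)
    print(f"dG = {dg:5.1f} kcal/mol -> k = {k:.3g} s^-1 -> back: {barrier_from_rate(k):.2f}")
```

Because k depends exponentially on the barrier, a 5–7 kcal·mol<sup>−1</sup> gap between the computed chemical barrier and the experimental range corresponds to several orders of magnitude in rate. This is consistent with a non-chemical step, such as product release, limiting the observed rate.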

Additionally, the calculation of the electrostatic potential generated by the enzyme on the atoms involved in the chemical reaction reveals that electrostatic effects can be correlated with the height of the computed free energy barriers. Thus, the high barrier of 47.9 kcal·mol<sup>−1</sup> obtained for the reaction in water can be explained by the low positive potential (ca. 628.2 ± 49.9 kJ·mol<sup>−1</sup>) generated on the negatively charged adenylyl group. In contrast, the higher electrostatic potential of ca. 800 kJ·mol<sup>−1</sup> generated in the active site of the enzyme stabilizes the TS much better and, as a result, reduces the reaction barrier ca. 4-fold.
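The ca. 4-fold lower barrier stated above and the ca. 10<sup>26</sup>-fold rate-constant enhancement given in the Electrostatic Effects subsection are two views of the same pair of barriers. A minimal sketch, assuming T = 298.15 K and the same pre-exponential factor in water and in the enzyme. The barrier ratio is linear in ΔG‡, whereas the rate ratio depends exponentially on the barrier difference.

```python
import math

R = 8.314462618e-3 / 4.184  # gas constant, kcal/(mol*K)


def rate_ratio(dg_slow, dg_fast, T=298.15):
    """k_fast / k_slow = exp((dG‡_slow - dG‡_fast) / RT), assuming identical prefactors."""
    return math.exp((dg_slow - dg_fast) / (R * T))


dg_water, dg_enzyme = 47.9, 12.2  # kcal/mol, base-assisted mechanism
enhancement = rate_ratio(dg_water, dg_enzyme)
print(f"log10(k_enzyme / k_water) = {math.log10(enhancement):.1f}")
print(f"barrier ratio (water / enzyme) = {dg_water / dg_enzyme:.1f}")
```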

Moreover, for reactions with more than one possible mechanism, such as the one studied here, the results point to an important role of the enzyme in directing the chemical transformation through the mechanism whose TS charge distribution on key atoms is most compatible with the electrostatic potential the enzyme generates.

Finally, the "compression hypothesis" was examined for a reaction of SN2 character, indicating that a short donor-acceptor distance in the TS does not guarantee a reduction of the reaction barrier, and that such a reduction is achieved only when a short DAD is found in the Michaelis complex.

### AUTHOR CONTRIBUTIONS

SM implemented the FEP method in the fDynamo library and computed the free energy surfaces. AB provided expertise on the ANT(4′) enzyme and corrected the manuscript. KS designed the studies, ran calculations, analyzed the results and wrote the manuscript.

### ACKNOWLEDGMENTS

This work was supported by the Spanish Ministerio de Economía y Competitividad funds (project CTQ2015-66223-C2). KS would especially like to thank MINECO for a Juan de la Cierva-Incorporación contract (ref. IJCI-2016-27503). AB thanks the MEC CTQ2016-79255-P project. The authors also acknowledge the Servei d'Informática, Universitat Jaume I, for a generous allotment of computer time.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2018.00660/full#supplementary-material

Supplementary Table 1 | Empirically predicted pKa values for titratable residues. Atom types, charges and parameters obtained for Kanamycin A used in MM simulations. Results of MD simulations: RMSD, RMSF, evolution of energy and thermostat control. Schematic representation of the active site. O'Ferrall-Jencks plot for the ATP-assisted and Glu145-assisted mechanisms. Free energy profiles for the ATP-assisted and Glu145-assisted mechanisms catalyzed by ANT(4′) computed at the M06-2X/6-31+G(d,p)//AMBER/TIP3P level. Free energy profile for the base-assisted mechanism in aqueous solution computed at the M06-2X/6-31+G(d,p)//AMBER/TIP3P level. Key distances and angles for the reactant complex, transition state and product complex localized along the ATP-assisted and Glu145-assisted mechanisms at the M06-2X/6-31+G(d,p)//AMBER/TIP3P level. Atomic charges computed for structures localized at the M06-2X/AMBER/TIP3P level. Geometrical coordinates of QM atoms for transition state structures localized at the M06-2X/6-31+G(d,p)//AMBER/TIP3P level for the reaction catalyzed by ANT(4′). Evolution of the donor-acceptor distance (DAD) along the reaction path in the catalyzed and uncatalyzed reactions. Contribution of key amino acid residues to the overall value of the electrostatic potential. Geometrical coordinates of QM atoms for the transition state structure localized at the M06-2X/6-31+G(d,p)//AMBER/TIP3P level for the reaction in aqueous solution.

### REFERENCES


Berendonk, T. U., Manaia, C. M., Merlin, C., Fatta-Kassinos, D., Cytryn, E., Walsh, F., et al. (2015). Tackling antibiotic resistance: the environmental framework. Nat. Rev. Microbiol. 13, 310–317. doi: 10.1038/nrmicro3439


the 30S ribosomal subunit and its interactions with antibiotics. Nature 407, 340–348. doi: 10.1038/35030019


relationship between ribosomal susceptibility and X-ray crystal structures. Chembiochem 4, 1078–1088. doi: 10.1002/cbic.200300657


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Martí, Bastida and Świderek. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Comparing Hydrolysis and Transglycosylation Reactions Catalyzed by Thermus thermophilus β-Glycosidase. A Combined MD and QM/MM Study

Sonia Romero-Téllez<sup>1,2</sup>, José M. Lluch<sup>1,2</sup>, Àngels González-Lafont<sup>1,2</sup>\* and Laura Masgrau<sup>1,2</sup>\*

<sup>1</sup> Departament de Química, Universitat Autònoma de Barcelona, Barcelona, Spain, <sup>2</sup> Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain

#### Edited by:

Vicent Moliner, University of Jaume I, Spain

#### Reviewed by:

Alessandro Silva Nascimento, University of São Paulo, Brazil

Xabier Lopez, University of the Basque Country, Spain

#### \*Correspondence:

Àngels González-Lafont, angels.gonzalez@uab.cat

Laura Masgrau, laura.masgrau@uab.cat

#### Specialty section:

This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry

Received: 20 December 2018 Accepted: 15 March 2019 Published: 10 April 2019

#### Citation:

Romero-Téllez S, Lluch JM, González-Lafont À and Masgrau L (2019) Comparing Hydrolysis and Transglycosylation Reactions Catalyzed by Thermus thermophilus β-Glycosidase. A Combined MD and QM/MM Study. Front. Chem. 7:200. doi: 10.3389/fchem.2019.00200

The synthesis of oligosaccharides and other carbohydrate derivatives is relevant to the advancement of the glycosciences at both the fundamental and the applied level. For many years, glycosyl hydrolases (GHs) have been explored as catalysts for the synthesis of glycosidic bonds. In particular, retaining GHs can catalyze a transglycosylation (T) reaction that competes with hydrolysis (H). This has been exploited either by employing controlled conditions with wild-type GHs or by engineering new mutants. The goal, increasing the T/H ratio, has been achieved with moderate success in several cases, even though the molecular basis of T/H modulation is unclear. Here we have used QM(DFT)/MM calculations to compare the glycosylation, hydrolysis and transglycosylation steps catalyzed by wild-type Thermus thermophilus β-glycosidase (family GH1), a retaining glycosyl hydrolase for which a transglycosylation yield of 36% has been determined experimentally. The three transition states have a strong oxocarbenium character and ring conformations between <sup>4</sup>H<sub>3</sub> and <sup>4</sup>E. The atomic charges at the transition states for hydrolysis and transglycosylation are very similar, except for the more negative charge of the water oxygen atom compared with that of the acceptor Glc. The glycosylation transition state has a stronger SN2 character than the deglycosylation ones, and its proton transfer is less advanced. At the QM(PBE0/TZVP)/MM level, the TS for transglycosylation has a shorter O4<sub>GLC</sub>-C1<sub>FUC</sub> (forming bond) distance and a longer OE2<sub>GLU338</sub>-C1<sub>FUC</sub> (breaking bond) distance than the hydrolysis one, although the H<sub>ACC</sub> proton is closer to the Glu164 base in the hydrolysis TS.
The QM(SCC-DFTB)/MM free energy maxima show the opposite situation, although the hydrolysis TS presents significant structural fluctuations. The 3-OH<sub>GLC</sub> group of the acceptor Glc (transglycosylation) and WAT432 (a neighboring water in hydrolysis) are identified as stabilizing the oxocarbenium transition states through interaction with O5<sub>FUC</sub> and O4<sub>FUC</sub>. The interaction analysis suggests that perturbing the Glu392-Fuc interaction could increase the T/H ratio, either by direct mutation of this residue or indirectly, as reported experimentally for the Asn390I and Phe401S cases. A molecular understanding of the similarities and differences between the hydrolysis and transglycosylation steps may help in the design of new biocatalysts for glycan synthesis.

Keywords: glycosylhydrolase, transglycosylation, QM/MM, glycans, hydrolysis, glycosyl-enzyme complex, GH1

### INTRODUCTION

Carbohydrates and their derivatives (glycoconjugates and glycosides) play important biological roles, including energy storage, structural functions and the encoding of a molecular and cellular recognition language that drives the specificity of such processes. Accordingly, glycans are vital for normal life development and are also key factors in the progression of many diseases, from pathogen infection to cancer (Seeberger and Cummings, 2017). To develop fundamental and applied research in the glycosciences, access to pure glycans in sufficient amounts is needed. Glycan biosynthesis involves the action of a repertoire of enzymes, among them glycosyltransferases (GTs, which catalyze the synthesis of a new glycosidic bond) and glycoside hydrolases (GHs). These enzymes catalyze the corresponding reaction with two possible stereochemical outcomes, that is, retention or inversion of the configuration at the anomeric carbon of the transferred/hydrolyzed sugar.

Besides being targets of drug design investigations, carbohydrate-active enzymes are of interest in glycan processing and synthesis, an important area of fundamental research and also for the preparation of commercially valuable products. Glycoside hydrolytic enzymes (e.g., amylases, cellulases) are used in the food, textile, detergent, cosmetics, pulp, and paper industries, among others, representing around one third of the global industrial enzyme market (Plou et al., 2007; Sajith et al., 2016). The enzymatic synthesis of glycosidic bonds by carbohydrate-active enzymes has also been studied for over 60 years, and different strategies have been developed that circumvent or complement chemical approaches, which usually require multiple protection and deprotection steps to obtain the desired oligosaccharide (Kiessling and Splain, 2010; Bissaro et al., 2015b; Danby and Withers, 2016). In Nature, GTs are the main catalysts for the synthesis of glycosidic bonds. However, their broad application is hindered by difficulties in their expression and purification and by the cost of the nucleotide-phosphate sugars they use as donor substrates, although progress has been made in that direction (Field, 2011). As an alternative, GHs have been explored to catalyze glycosidic bond formation (Planas and Faijes, 2002; Cobucci-Ponzano et al., 2011; Bissaro et al., 2015b). GHs are more abundant than GTs, are much easier to obtain, cover a wide range of substrate specificities, and use cheap substrates.

In particular, retaining GHs can operate in a synthetic mode if the equilibrium is displaced toward glycosidic bond formation (thermodynamically controlled processes) or by using activated glycosyl donors (kinetically controlled transglycosylation) (Planas and Faijes, 2002). Retaining GHs follow a double displacement mechanism in two consecutive steps, with formation of a covalent glycosyl-enzyme intermediate. Two catalytic carboxyl groups (separated by ∼5 Å) act as the general acid/base and the nucleophile in the reaction (as exemplified in **Figure 1** for the process studied in this work). In the first step (referred to as glycosylation, G), the nucleophile attacks the anomeric carbon while the carboxylic acid protonates the glycosidic oxygen of the leaving group, and the glycosyl-enzyme intermediate is formed. In the second step (deglycosylation, D), the residue that previously acted as an acid now deprotonates the incoming acceptor substrate, which attacks the anomeric carbon to form the final product with net retention of configuration. If the acceptor is a water molecule, hydrolysis (H) occurs. However, in the presence of a different suitable sugar acceptor, many retaining GHs are capable of catalyzing transglycosylation (T).

Different approaches and experimental conditions have been used to enhance the transglycosylation vs. hydrolysis (T/H) ratio, e.g., high acceptor concentrations, highly reactive glycosyl donors (activated donors) such as aryl glycosides or glycosyl fluorides, removal of the transglycosylation product, or enzyme immobilization. Even so, yields and regiospecificities remain low (Planas and Faijes, 2002). Engineering of GHs has also been investigated. A major breakthrough in the field was the development of glycosynthases, in which hydrolysis is abolished by replacing the catalytic nucleophile with a non-nucleophilic residue and using activated substrates (Mackenzie et al., 1998; Malet and Planas, 1998; Moracci et al., 1998; Faijes and Planas, 2007; Cobucci-Ponzano and Moracci, 2012; Danby and Withers, 2016). Other studies have tried to understand the molecular determinants of transglycosylation vs. hydrolytic enzyme activity through mutational studies (Bissaro et al., 2015b). Notably, Dion and coworkers (Feng et al., 2005) successfully applied directed evolution to the β-glycosidase of Thermus thermophilus (Ttβ-gly), which belongs to CAZy family GH1 (Henrissat, 1991; Cantarel et al., 2009; Hart and Copeland, 2010), to increase its ability to synthesize oligosaccharides by transglycosylation.

Their findings were later "semi-rationalized" by realizing that the mutations leading to an increased T/H ratio in Ttβ-gly were located at highly conserved positions in the −1 subsite (the one accommodating the hydrolyzed/transferred monosaccharide) (Teze et al., 2014). In this way, they created Ttβ-gly mutants (e.g., Tyr284Phe, Asn282Thr, Phe401Ser, Asn163Ala, Arg75Ala) that catalyzed transglycosylation between 4-nitrophenyl β-D-fucopyranoside (pNP-Fuc, donor substrate) and N-methyl-O-benzyl-N-(β-D-glucopyranosyl)-hydroxylamine (BnON(Me)-Glc, acceptor), with yields of up to 82% against 36% for the wild-type enzyme, although in all cases the catalytic efficiency is significantly reduced compared with WT Ttβ-gly. The approach has since been applied to other GH families (glycosyl hydrolases are classified into families according to sequence similarity) (Henrissat, 1991; Cantarel et al., 2009; Teze et al., 2015; Saumonneau et al., 2016). The authors postulated that mutation of these well-conserved residues around the −1 subsite may reduce the stabilization of the deglycosylation transition states (TS), with a larger effect on hydrolysis than on transglycosylation, resulting in the higher T/H ratio measured experimentally (Teze et al., 2014). This suggests that the H and T transition states present different characteristics and stabilizing interactions. For another family GH1 β-glucosidase (NkBgl), mutation of the catalytic acid/base glutamate to aspartate was also found to generate new transglycosylation products not produced by the wild-type enzyme (Jeng et al., 2012). The authors also proposed that this catalytic residue is important for substrate entry and product release, and noted the importance of aromatic rings in the aglycone site for facilitating the binding of acceptors other than water. Accumulating evidence supports the idea that the properties of the deglycosylation transition state, substrate-specific interactions involving the acceptor substrate, and specific features that channel and retain water molecules close to the catalytic center are key determinants of the T/H partition (Kempton and Withers, 1992; Bissaro et al., 2015a,b; David et al., 2017). Still, the molecular basis of T/H modulation remains unclear.

Computational studies on retaining GHs have mainly focused on the glycosylation step, fewer on hydrolysis, and very few on transglycosylation. For enzymes belonging to CAZy family GH1, the one investigated in this work, quantum mechanics/molecular mechanics (QM/MM) studies of the glycosylation and hydrolysis steps catalyzed by Oryza sativa (rice) β-glucosidase (Osβ-gly) acting on glucose disaccharides have been reported (Wang et al., 2011, 2013; Badieyan et al., 2012). They corroborated the predicted oxocarbenium-like nature of both TSs, in which the corresponding breaking bond is already broken and the forming one is far from established. The interaction of the C2 hydroxyl group of the sugar with the catalytic residues was also confirmed to play an important role in catalysis, both by facilitating the conformational change of the sugar ring toward the TS and by contributing to TS stabilization. This agrees with experimental data that have long identified this interaction as important for TS stabilization, with a contribution of up to 10 kcal/mol (Zechel and Withers, 2000). Recently, the effect of this interaction on lowering the free energy barriers for the glycosylation and transglycosylation reactions catalyzed by a transglycosidase of CAZy family 72 was calculated to be 11 and 16 kcal/mol, respectively (Raich et al., 2016). The effect of using different QM/MM partitions on the barrier heights of glycosylation and hydrolysis in Osβ-gly has also been investigated (Badieyan et al., 2012). It was concluded that a Tyr residue equivalent to Ttβ-gly Tyr284 (which interacts with the catalytic nucleophile and the sugar O5 atom) contributes significantly to the energy profile, with a larger effect on deglycosylation than on glycosylation. Importantly, the study showed that this residue had to be included in the QM region to obtain reliable potential energy barriers and ring conformations.
The effect of other residues from the −1 subsite was also analyzed.

Very few QM/MM studies have compared hydrolysis and transglycosylation reactions for retaining glycosidases and, to the best of our knowledge, none has focused on family GH1. BB1K:AMBER QM/MM calculations on a β-galactosidase belonging to family GH2 were used to study the catalytic mechanism of the glycosylation, hydrolysis and transglycosylation reactions, the latter leading to different regioproducts (Brás et al., 2010). All TSs were characterized as dissociative, with the proton still attached to the acidic residue, although, unfortunately, more detailed geometric information on the transglycosylation TS was not given. The H-bond between the 2-OH group and the nucleophile was also related to pyranosyl ring distortion and TS stabilization. The energy barriers for transglycosylation were found to be higher than for hydrolysis by ∼2–4 kcal/mol, and the origin of the observed regioselectivity was found to be thermodynamic rather than kinetic. Recently, the hydrolysis and transglycosylation reactions catalyzed by a family GH3 β-glucosidase have been studied by umbrella sampling calculations at the SCC-DFTB/CHARMM level (Geronimo et al., 2018). Both reactions were found to have similar free energy barriers (∼18 kcal/mol) but quite different TSs; the predicted TS for hydrolysis also differed from those previously described. For hydrolysis, the TS had the glycosyl-water bond practically formed (1.50 ± 0.04 Å), resulting in a reduced ionic character of the sugar, which presented a <sup>4</sup>C<sub>1</sub> conformation according to the Cremer-Pople polar coordinates (Cremer and Pople, 1975) that describe ring conformations. For transglycosylation, an earlier TS was predicted (with the new bond only partially formed, 1.9 ± 0.1 Å), with the transferred sugar more positively charged and in a <sup>4</sup>H<sub>3</sub> conformation. In both cases, proton transfer to the catalytic base occurred late, especially in the hydrolysis step, where the proton transferred only after approaching the base catalyst to 1.6 Å.

Here we present a QM(DFT)/MM study of the glycosylation, hydrolysis and transglycosylation steps catalyzed by wild-type Ttβ-gly using pNP-Fuc as donor and BnON(Me)-Glc as acceptor. Ttβ-gly catalyzes the hydrolysis of β-D-galactoside, β-D-glucoside, and β-D-fucoside derivatives, showing the highest activity with the latter (Dion et al., 1999). As mentioned, in the presence of a suitable acceptor it can also catalyze transglycosylation; BnON(Me)-Glc has been reported to give a high transglycosylation yield with the native enzyme and a single regioisomer product (N-methyl-O-benzyl-N-(β-D-fucopyranosyl(1→4)β-D-glucopyranosyl)-hydroxylamine) (Teze et al., 2013). Our main goal is to understand and compare the molecular mechanisms underlying hydrolysis and transglycosylation. It has been proposed that the main driving force for increasing the T/H ratio must be the relative destabilization of the hydrolysis TS compared with the transglycosylation one, which implies that different interactions are involved in each case.
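The proposal above can be made semi-quantitative with transition-state theory. A minimal sketch under strong, hypothetical assumptions: the glycosyl-enzyme intermediate partitions irreversibly between hydrolysis and transglycosylation, the product is not re-hydrolyzed, concentrations are fixed, and only the hydrolysis TS is perturbed, so that T/H scales as exp(ΔΔG‡/RT). The 36% and 82% yields from the text are used only as example inputs. Measured yields also reflect secondary hydrolysis and the reduced catalytic efficiency of the mutants, so the output is an illustration, not a result of this study. The function names are illustrative.

```python
import math

R = 8.314462618e-3 / 4.184  # gas constant, kcal/(mol*K)


def yield_to_ratio(fraction_t):
    """T/H partition ratio from the fraction of intermediate going to transglycosylation.

    Hypothetical assumption: simple irreversible partition, no product re-hydrolysis.
    """
    return fraction_t / (1.0 - fraction_t)


def ratio_change_from_destabilization(ddg_h, T=298.15):
    """Factor by which T/H grows if only the hydrolysis TS is raised by ddg_h (kcal/mol)."""
    return math.exp(ddg_h / (R * T))


def destabilization_needed(fraction_from, fraction_to, T=298.15):
    """Hydrolysis-TS destabilization (kcal/mol) that would move the partition
    from fraction_from to fraction_to under the assumptions above."""
    return R * T * math.log(yield_to_ratio(fraction_to) / yield_to_ratio(fraction_from))


ddg = destabilization_needed(0.36, 0.82)
print(f"T/H at 36%: {yield_to_ratio(0.36):.3f}, at 82%: {yield_to_ratio(0.82):.3f}")
print(f"hydrolysis-TS destabilization needed: {ddg:.2f} kcal/mol")
```

Under these assumptions, a hydrolysis-TS destabilization of only about 1 kcal·mol<sup>−1</sup> would account for such a shift. This illustrates why subtle differences between the H and T transition states can matter.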

### METHODOLOGY

### System Setup

Initial coordinates for the wild-type Ttβ-gly were taken from the corresponding X-ray structure with PDB code 1UG6 (resolution 0.99 Å) (Lokanath et al., in press). To incorporate the substrate into the enzyme active site, the 1UG6 coordinates were overlaid with those of the Osβ-gly structure crystallized with a non-hydrolyzed cellotetraose (CTT) substrate (PDB code: 3F5J, resolution 1.95 Å) (Chuenchor et al., 2011). In this way, we took advantage of the position of CTT and modified it with the PyMOL program<sup>1</sup>, building the fucose moiety from the CTT glucose ring closer to the catalytic residues (Glu338 and Glu164 of the 1UG6 wild-type structure). The second ring of the CTT substrate was used to build the p-nitrophenol (pNP) leaving group of the pNP-Fuc substrate. Two crystallographic water molecules from the wild-type structure that clashed with the added substrate were removed; the remaining crystallographic waters were kept in the model. Missing hydrogen atoms were added to the model, and the ionizable residues were protonated using Propka3.0 (Olsson et al., 2011) at pH = 7.0, except for Glu338 and Glu392, which were modeled as deprotonated, Glu164, which was modeled as protonated, and His119, which was protonated at Nδ. All molecular dynamics (MD) simulations were done using the ff14SB (Maier et al., 2015) Amber force field for the protein; GLYCAM06j (Kirschner et al., 2008) and GAFF (Wang et al., 2004) atom types and parameters were employed for the sugar moieties and the pNP group, respectively. In addition, several missing parameters of the pNP-Fuc substrate (one bond term and several intramolecular angles and dihedrals) were completed by searching for the corresponding atom-type translations in the GAFF and GLYCAM06j force field databases (these parameters are given in **Supplementary Scheme S1** and **Supplementary Table S1**). Three sodium ions were added to neutralize the system, which was finally solvated in a cubic box of pre-equilibrated TIP3P water molecules, with no less than 15 Å from the edge of the water box to the nearest protein atom. The resulting system contained 70,227 atoms.

<sup>1</sup>The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.
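The solvation setup can be illustrated with a short calculation. The sketch below estimates the edge length of a cubic box that leaves a 15 Å buffer on every side of the solute, and the number of monovalent counterions needed for neutrality. It assumes an axis-aligned box with no reorientation of the protein (the text does not say how the box was built), and the helper names are hypothetical. The three Na+ ions mentioned above correspond to a net solute charge of −3.

```python
from typing import Iterable, Tuple

Coord = Tuple[float, float, float]

def cubic_box_edge(coords: Iterable[Coord], buffer: float = 15.0) -> float:
    """Edge (Å) of an axis-aligned cubic box that keeps every solute atom at
    least `buffer` Å from the nearest box face (hypothetical helper)."""
    xs, ys, zs = zip(*coords)
    # The cube must fit the solute's largest extent along any Cartesian axis
    extent = max(max(axis) - min(axis) for axis in (xs, ys, zs))
    return extent + 2.0 * buffer

def counterions_needed(net_charge: int) -> Tuple[int, int]:
    """(n_Na+, n_Cl-) monovalent ions that bring `net_charge` to zero."""
    if net_charge < 0:
        return -net_charge, 0
    return 0, net_charge

# Three Na+ ions neutralize a solute with a net charge of -3
print(counterions_needed(-3))  # (3, 0)
```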

### Molecular Dynamics (MD) Simulation Details

The solvated Ttβ-gly/pNP-Fuc model system was relaxed by energy minimization at the MM level with the steepest descent and conjugate gradient methods. The minimization protocol consisted of three steps: first, the whole system except the ligand (pNP-Fuc) was restrained; second, only the protein heavy atoms were restrained; and third, no restraints were applied. Once the system was relaxed, MD simulations were performed under periodic boundary conditions (PBC). The MD protocol started with heating from 0 to 300 K over 200 ps under NVT conditions using Langevin dynamics. This was followed by a 400 ps equilibration period. The first 200 ps were run under NPT conditions to reach a system density of around 1 g cm<sup>−3</sup>, using an isotropic weak-coupling algorithm and the Berendsen barostat (Berendsen et al., 1984) at 1 atm. Then, the system volume was fixed and the next 200 ps were run under NVT conditions. A further equilibration of 10 ns and a final production run of 100 ns were performed at 300 K in the NVT ensemble without any restraints. Throughout the MD, covalent bonds involving hydrogen were constrained with the SHAKE algorithm (Ryckaert et al., 1977), and the particle-mesh Ewald method (Essmann et al., 1995) was used to treat long-range electrostatic interactions. A 1 fs time step was used in all MD trajectories. The MD simulations were analyzed using the standard tools of the AMBER 16 package (Case et al., 2005) [cpptraj program (Roe and Cheatham, 2013)] and VMD (Humphrey et al., 1996). Based on this analysis, we established the criteria for selecting a few frames of the MD trajectory that served as starting points for the QM/MM calculations of the glycosylation (G) reaction.
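The staged MD protocol can be summarized as data. The sketch below, which is illustrative and not the authors' input, converts each stage length into a step count at the stated 1 fs time step and renders a minimal AMBER `&cntrl` namelist using standard pmemd keywords (`ntc=2`/`ntf=2` for SHAKE on bonds to hydrogen, `ntb` for constant volume or pressure, `ntp=1` with `barostat=1` for isotropic Berendsen coupling, `ntt=3` for Langevin dynamics). Thermostat settings after heating are not given in the text and are left out, and a linear 0 to 300 K ramp would normally also require AMBER's `nmropt`/`&wt` temperature-schedule input, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DT_PS = 0.001  # 1 fs time step, as stated in the text

@dataclass
class Stage:
    name: str
    length_ps: float
    ensemble: str                      # "NVT" or "NPT"
    extra: Dict[str, object] = field(default_factory=dict)

    def nstlim(self) -> int:
        """Number of MD steps for this stage at the 1 fs time step."""
        return round(self.length_ps / DT_PS)

    def cntrl(self) -> str:
        """Render a minimal AMBER &cntrl namelist (illustrative only)."""
        keys: Dict[str, object] = {
            "imin": 0, "nstlim": self.nstlim(), "dt": DT_PS,
            "ntc": 2, "ntf": 2,                           # SHAKE on bonds to hydrogen
            "ntb": 2 if self.ensemble == "NPT" else 1,    # constant pressure vs volume
        }
        keys.update(self.extra)
        return "&cntrl\n  " + ", ".join(f"{k}={v}" for k, v in keys.items()) + ",\n/"

# Thermostat settings after heating are not given in the text, so they are left out.
PROTOCOL: List[Stage] = [
    Stage("heat", 200, "NVT", {"ntt": 3, "tempi": 0.0, "temp0": 300.0}),  # Langevin
    Stage("equil_npt", 200, "NPT", {"ntp": 1, "barostat": 1, "pres0": 1.0}),
    Stage("equil_nvt", 200, "NVT"),
    Stage("equil_long", 10_000, "NVT"),
    Stage("production", 100_000, "NVT"),
]

for stage in PROTOCOL:
    print(stage.name, stage.ensemble, stage.nstlim())
```

The 100 ns production run alone corresponds to 10<sup>8</sup> steps at 1 fs.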

To obtain the starting structure for the QM/MM calculations of the hydrolysis reaction (H), the product geometry of the glycosylation process was modified with the PyMOL program by deleting all the pNP coordinates except those of the newly formed hydroxyl group, which was converted into a water molecule. The system was solvated with a cubic box of TIP3P water molecules, and the water coordinates were relaxed at the MM level while constraining all protein atoms and the newly built water molecule (acceptor). This was followed by a short MD simulation (210 ps) to allow the rearrangement of water molecules in the active site. From the last frame of that MD trajectory, a QM/MM minimization was carried out to locate a reactant minimum from which to initiate the hydrolysis reaction.

For the transglycosylation reaction (T), the coordinates of the above-mentioned CTT substrate were used as a template to build those of the BnON(Me)-Glc acceptor substrate, with the 4-OH group of its glucose moiety placed to overlay the hydroxyl coordinates of the pNP leaving group from the preceding glycosylation step. Three water molecules were deleted because they clashed with the acceptor molecule. The system was then relaxed as described for the hydrolysis reactant.
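Removing waters that clash with a newly placed ligand can be automated with a simple distance test. The sketch below flags water oxygens closer than an assumed 2.5 Å to any ligand heavy atom. The cutoff and the function name are illustrative choices, since the text does not state the criterion used to identify the clashing waters.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

def clashing_waters(waters: Dict[str, Coord], ligand: Sequence[Coord],
                    cutoff: float = 2.5) -> List[str]:
    """IDs of water oxygens closer than `cutoff` Å to any ligand heavy atom."""
    hits = []
    for wid, ow in waters.items():
        if any(math.dist(ow, atom) < cutoff for atom in ligand):
            hits.append(wid)
    return sorted(hits)
```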

All simulations were carried out with the GPU (CUDA) version of PMEMD (Götz et al., 2012; Salomon-Ferrer et al., 2013) in the AMBER16 software package.

### QM/MM Calculations

QM/MM calculations were performed with the modular program package ChemShell (Sherwood et al., 2003; Metz et al., 2014), using TURBOMOLE-V6.3 (Ahlrichs et al., 1989) to obtain the QM energies and gradients at the DFT level. The PBE0 functional (Adamo and Barone, 1999) was used, as it gave errors smaller than 0.5 kcal/mol in a recent benchmark of glycosidic bond hydrolysis by glycosidases (Pereira et al., 2017), and inclusion of the D3 dispersion correction did not significantly affect the calculated energies. MM energies and gradients were evaluated with DL\_POLY (Smith and Forester, 1996), accessed through ChemShell, using the AMBER force field. The electrostatic embedding scheme (Bakowies and Thiel, 1996) was used within the QM/MM approach so that the MM point charges polarize the electronic density of the QM region. No cutoffs were introduced for the nonbonded MM and QM/MM interactions. In all the QM/MM calculations, the cubic box of water molecules was reduced to a 30 Å sphere surrounding the full protein (**Figure 2**). All residues and water molecules within 15 Å of the anomeric center were included in the optimization as the active region (around 2,100 atoms), while the remaining atoms were kept fixed.
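The active-region definition can be sketched as a residue-based distance selection: any residue or water with at least one atom within 15 Å of the anomeric carbon is made fully active, and everything else is frozen. The per-residue inclusion rule and the simple tuple layout for atoms are assumptions made for illustration.

```python
import math
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]
Atom = Tuple[int, str, Coord]   # (residue index, atom name, xyz); hypothetical layout

def active_region(atoms: List[Atom], center: Coord, radius: float = 15.0) -> Set[int]:
    """Indices of atoms belonging to residues with any atom within `radius` Å of `center`."""
    near_res = {res for res, _, xyz in atoms if math.dist(xyz, center) <= radius}
    return {i for i, (res, _, _) in enumerate(atoms) if res in near_res}
```

Atoms outside the returned set would be held fixed during the QM/MM optimizations.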

Two different QM regions were studied (**Figure 3**): a small QM region containing 84 atoms in the glycosylation step (the pNP-Fuc substrate, four nearby waters, and the side chains of Glu164, Glu338, and Tyr284), and a large QM region with 138 atoms (the small QM region plus the side chains of Asn282, Asn163, Arg75, Gln18, and Glu392). For the hydrolysis and transglycosylation QM/MM descriptions, the O-pNP group in the QM regions was replaced by a water molecule and by the BnON(Me)-Glc molecule, respectively. Three water molecules (besides the nucleophilic water, WAT431) were added to the QM zones for the hydrolysis calculations, whereas only one QM water molecule was kept in the transglycosylation study. The numbers of QM atoms in the small/large regions are 67/121 for the hydrolysis step and 100/154 for the transglycosylation step. The QM/MM boundary was treated with link atoms (three in the QM(small)/MM models and eight in the QM(large)/MM partitions) combined with the charge-shift approach (Claeyssens et al., 2006). The total charge of both QM regions is −1 in all three catalytic reaction steps.
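Link-atom boundaries cap each cut QM-MM covalent bond with a hydrogen. A common convention, assumed here (ChemShell's own placement rules may differ), places the hydrogen on the QM→MM bond vector at a fixed distance from the QM atom, taken as 1.09 Å for a C-H cap. The charge-shift redistribution of the MM boundary charges is not shown.

```python
import math
from typing import Tuple

Coord = Tuple[float, float, float]

def place_link_atom(qm: Coord, mm: Coord, d_link: float = 1.09) -> Coord:
    """Hydrogen link atom on the QM->MM bond vector, `d_link` Å from the QM atom."""
    bond = [m - q for q, m in zip(qm, mm)]
    length = math.sqrt(sum(c * c for c in bond))
    # Move from the QM atom along the unit bond vector by d_link
    return tuple(q + d_link * c / length for q, c in zip(qm, bond))
```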

the arrangement of the QM atoms for the glycosylation reaction at the active site.

FIGURE 3 | Representation of the QM regions used in the QM/MM study of the glycosylation reaction. The small QM region includes the substrate atoms and water molecules (in black), the catalytic residues (in blue), and the Tyr284 residue (in red). For the hydrolysis and transglycosylation steps, the O-pNP group was replaced by a water molecule and by the BnON(Me)-Glc acceptor substrate, respectively. Three additional waters were included for the hydrolysis and just one for the transglycosylation step. For each of the three reaction steps, the large QM region comprises the corresponding small QM region plus the protein residues depicted in green.

QM/MM optimizations were carried out employing the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm (Liu and Nocedal, 1989) combined with the Hybrid Delocalized Internal Coordinate Scheme (Billeter et al., 2000) as implemented in ChemShell (Metz et al., 2014). Reaction paths were scanned by performing harmonically restrained optimizations along a suitably defined reaction coordinate for each chemical process, in steps of 0.2 Å. Reaction coordinates were defined as linear combinations of the three main distances involved in each step, a definition that we have successfully used when modeling other carbohydrate-active enzymes (Gómez et al., 2015). Thus, RC = [d(C1FUC-O4DpNP)-d(C1FUC-OE2GLU338)-d(HGLU164-O4DpNP)] for glycosylation, RC = [d(C1FUC-OE2GLU338)-d(C1FUC-OWAT431)-d(H1WAT431-OE2GLU164)] for hydrolysis, and RC = [d(C1FUC-OE2GLU338)-d(C1FUC-O4GLC)-d(H4OGLC-OE2GLU164)] for transglycosylation. For each reaction step, the structure corresponding to the maximum of the potential energy profile was taken as representative of the corresponding transition state. The PBE0 hybrid functional with two basis sets (SVP and TZVP) was used to describe the small QM region, whereas the large QM region was always treated at the PBE0/TZVP level. In addition, QM(PBE0/TZVP)/MM single-point energy calculations were performed on the QM(PBE0/SVP)/MM geometries along the hydrolysis and transglycosylation potential energy profiles.
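The reaction coordinates above are signed sums of three interatomic distances, scanned in 0.2 Å steps. The sketch below evaluates such a coordinate from Cartesian coordinates and generates the sequence of restraint centers. Atom labels and helper names are hypothetical, and the force constant of the harmonic scan restraints is not given in the text, so it is not modeled. As a consistency check, the G<sup>I</sup> reactant distances reported in the Results (1.41, 3.33, and 1.99 Å) give RC = 1.41 − 3.33 − 1.99 = −3.91 Å, matching the quoted reactant value of −3.90 Å to within rounding.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Term = Tuple[int, str, str]  # (sign, atom_a, atom_b)

# Glycosylation RC from the text; atom labels are hypothetical identifiers
GLYCOSYLATION_RC: List[Term] = [
    (+1, "C1_FUC", "O4D_pNP"),     # glycosidic bond being broken
    (-1, "C1_FUC", "OE2_GLU338"),  # bond to the nucleophile being formed
    (-1, "H_GLU164", "O4D_pNP"),   # proton transfer from the acid
]

def reaction_coordinate(xyz: Dict[str, Coord],
                        terms: Sequence[Term] = GLYCOSYLATION_RC) -> float:
    """Signed linear combination of interatomic distances (Å)."""
    return sum(s * math.dist(xyz[a], xyz[b]) for s, a, b in terms)

def scan_targets(rc_start: float, rc_end: float, step: float = 0.2) -> List[float]:
    """Restraint centers from rc_start to rc_end in increments of `step` Å."""
    step = math.copysign(step, rc_end - rc_start)
    n = int(round((rc_end - rc_start) / step))
    return [round(rc_start + i * step, 6) for i in range(n + 1)]
```

The hydrolysis and transglycosylation coordinates only need different term lists with the same sign pattern.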

Natural population analysis (NPA) charges (Reed et al., 1985) were determined from QM/MM calculations with the QM region described at the PBE0/TZVP level. The contribution of the different residues to the QM/MM potential energy barriers of the hydrolysis and transglycosylation steps was examined by setting their point charges to zero in additional single point energy calculations at the QM(PBE0/TZVP)/MM level along the QM(PBE0/TZVP)/MM reaction paths.
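The charge-deletion analysis can be expressed as a barrier difference: for each residue, the barrier recomputed with that residue's MM point charges set to zero is compared with the full-embedding barrier. In the sketch below a negative value means the residue's charges lower the barrier. The sign convention, the residue names, and any numbers passed to the function are illustrative, not values from this study.

```python
from typing import Dict, Tuple

def barrier(e_reactant: float, e_ts: float) -> float:
    return e_ts - e_reactant

def residue_contributions(full: Tuple[float, float],
                          zeroed: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Barrier change (kcal/mol) attributable to each residue's MM point charges.

    full:   (E_reactant, E_TS) with all MM charges switched on
    zeroed: residue -> (E_reactant, E_TS) with that residue's charges set to zero
    A negative value means the residue's charges lower (stabilize) the barrier.
    """
    ref = barrier(*full)
    return {res: ref - barrier(*e) for res, e in zeroed.items()}
```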

Umbrella sampling at the QM(SCC-DFTB)/AMBER level was performed to compute the free energy profiles for the hydrolysis and transglycosylation steps, using the dynamics module within ChemShell. The reaction coordinates, defined as before, were scanned at 0.1 Å intervals using a harmonic force constant of 300 kcal mol<sup>−1</sup> Å<sup>−2</sup>. The same force constant was used to restrain the distance between OE1GLU338 and O2FUC to 2.4 Å in the hydrolysis step, to prevent undesired proton transfers during the simulations. Even so, some of the hydrolysis data had to be discarded in the analyses presented below. Forty-five windows were used for hydrolysis and fifty for transglycosylation. Each simulation consisted of 20 ps of equilibration and 60 ps of production (data collection) in the NVT ensemble using the Nosé-Hoover thermostat (Nosé, 1984; Hoover, 1985). All atoms of residues beyond 20 Å of C1FUC were frozen. The histograms show sufficient overlap between windows for the chosen step size and force constant. The umbrella integration method (Kästner and Thiel, 2005, 2006) was used to compute the free energy profiles.
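Umbrella integration combines the windows through their mean forces rather than through overlapping histograms. The sketch below follows the normal-distribution form of the method: each window contributes the unbiased gradient kT(ξ − ⟨ξ⟩ᵢ)/σᵢ² − K(ξ − ξᵢ), weighted by its sample count times its Gaussian density at ξ, and the averaged gradient is integrated with the trapezoidal rule. It assumes a bias of the form ½K(ξ − ξᵢ)²; the text does not state whether the 300 kcal mol<sup>−1</sup> Å<sup>−2</sup> constant multiplies ½(ξ − ξᵢ)² or (ξ − ξᵢ)², so K should be adapted accordingly. The per-window means and standard deviations would come from the production segments.

```python
import math
from typing import List, Sequence, Tuple

KB = 0.0019872041  # Boltzmann constant, kcal/(mol K)

def umbrella_integration(windows: Sequence[Tuple[float, float, float, int]],
                         k: float, temperature: float,
                         grid: Sequence[float]) -> List[float]:
    """Free-energy profile A(xi) in kcal/mol, set to zero at grid[0].

    windows: (xi_ref, mean, std, n_samples) for each umbrella window
    k: force constant of the assumed bias w = 0.5*k*(xi - xi_ref)**2
    """
    kT = KB * temperature
    dA = []
    for xi in grid:
        num = den = 0.0
        for xi_ref, mean, std, n in windows:
            # Weight: sample count times normal density of the biased distribution
            p = n * math.exp(-0.5 * ((xi - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))
            # Unbiased mean force estimated from this window
            grad = kT * (xi - mean) / std ** 2 - k * (xi - xi_ref)
            num += p * grad
            den += p
        dA.append(num / den)
    # Trapezoidal integration of dA/dxi along the grid
    A = [0.0]
    for i in range(1, len(grid)):
        A.append(A[-1] + 0.5 * (dA[i] + dA[i - 1]) * (grid[i] - grid[i - 1]))
    return A
```

The grid should stay within the range covered by the windows; far outside it all weights underflow and the ratio becomes undefined.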

### RESULTS AND DISCUSSION

## Glycosylation Step

### Molecular Dynamics Simulations

To obtain a first structural insight into the dynamics of the Ttβ-gly/pNP-Fuc Michaelis complex, the root-mean-square deviations (RMSDs) of the protein backbone and of the heavy atoms of the pNP-Fuc substrate were calculated along the 100 ns production trajectory (**Supplementary Figure S1**). The protein structure is well equilibrated. The substrate RMSD (average value of 0.78 ± 0.37 Å) shows an oscillatory pattern because the pNP phenyl group flips between two conformations throughout the trajectory. As detailed below, the Fuc ring is more rigidly bound at the −1 subsite than the pNP group is at the larger +1 subsite, which leaves the latter moiety more room to move during the simulation.
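An RMSD summary of the kind quoted above can be computed from aligned trajectory frames with a few lines of code. This sketch assumes the frames have already been superimposed on the reference protein backbone (for example with cpptraj), so the substrate RMSD is computed without further fitting. It reports the mean and the sample standard deviation, which is one possible reading of the ± value in the text.

```python
import math
import statistics
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(frame: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD (Å) between two pre-aligned coordinate sets of equal length (no fitting)."""
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(frame, ref))
    return math.sqrt(sq / len(ref))

def rmsd_summary(frames: Sequence[Sequence[Coord]],
                 ref: Sequence[Coord]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the per-frame RMSD."""
    values = [rmsd(f, ref) for f in frames]
    return statistics.mean(values), statistics.stdev(values)
```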

To analyze in more detail the molecular interactions most likely to play a role in the catalytic mechanism, the most populated hydrogen bonds between the substrate, protein residues, and water molecules at the active site were identified along the MD simulation. A summary of these hydrogen bonds is presented in **Table 1** (a more complete analysis is given in **Supplementary Tables S2–S5**), listing the fraction of frames in which each hydrogen bond is present (the H-bond occupancy) together with the corresponding average donor···acceptor distance. An occupancy higher than 90% is obtained for the interaction between the carboxylate oxygen OE1 of the catalytic acid/base residue Glu164 and HD22 of Asn282. This interaction helps to maintain (with an occupancy of 90.6%) the hydrogen bond between the proton of Glu164 and the glycosidic oxygen (O4DpNP). This is a very important interaction for the glycosylation process because it keeps Glu164 well positioned (average heavy-atom distance of 2.76 Å) for proton donation to the leaving group. The catalytic nucleophile Glu338 interacts mainly through its carboxylate oxygen OE2 with Tyr284 (88.9% occupancy), which stabilizes the negative charge on the nucleophile. In addition, the same carboxylate group of Glu338 (through its OE1 oxygen) forms, nearly half of the time, another hydrogen bond with the 2-OH group of the glycosyl moiety. Some oscillation of this 2-OH group between OE1GLU338 (46.6% occupancy) and OE2GLU338 (16.3% occupancy) is actually observed. As commented in the Introduction, this interaction is key for the stabilization of the glycosylation transition state, especially in retaining β-glycosidases, because the interaction with the nucleophile affects oxocarbenium cation formation. Moreover, OE1GLU338 is also H-bonded to the Arg75 side chain and to a water molecule.
On the other hand, the 2-OH group of the Fuc ring also interacts with the Asn163 side chain (occupancy of 46.8%). The 3-OH group of Fuc forms H-bonds with HE2HIS119 (34.4% occupancy) and with OE1GLN18 (79.3%). The 4-OH group and the ring oxygen O5 of Fuc are hydrogen bonded as acceptors to the same water molecule (with an occupancy of around 20% for each Fuc···water interaction), and the axial 4-OH of Fuc is H-bonded to OE1GLU392 (62.5% occupancy). This latter interaction differs structurally from that reported by Badieyan et al. (2012) for cellobiose hydrolysis by Osβ-gly: in Glc this hydroxyl group is equatorial, which makes it interact with the other oxygen of the Glu392 carboxylate.

TABLE 1 | Analysis over the MD simulation of the most populated hydrogen bonds between pNP-Fuc, protein residues, and water molecules at the active site.

H-bond occupancies are defined as the fraction of frames in which the bond is present. Only H-bond occupancies above 10% are included. The average donor-to-acceptor heavy-atom distances of the bonds, when present, are also given.
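H-bond occupancies such as those in Table 1 are fractions of frames that satisfy geometric criteria. The sketch below uses a donor-acceptor distance of at most 3.0 Å and a donor-H···acceptor angle of at least 135°; these cutoffs match cpptraj's default hbond criteria but are an assumption here, since the text does not state them. The mean distance is averaged only over frames in which the bond is present.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def angle_deg(a: Coord, vertex: Coord, b: Coord) -> float:
    """Angle a-vertex-b in degrees."""
    u = [x - y for x, y in zip(a, vertex)]
    v = [x - y for x, y in zip(b, vertex)]
    cos_t = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def hbond_occupancy(frames: Sequence[Tuple[Coord, Coord, Coord]],
                    max_dist: float = 3.0, min_angle: float = 135.0):
    """Occupancy (%) and mean D...A distance (Å, over frames where the bond is present).

    frames: per-frame (donor, hydrogen, acceptor) coordinates
    """
    present = []
    for d, h, a in frames:
        r = math.dist(d, a)
        if r <= max_dist and angle_deg(d, h, a) >= min_angle:
            present.append(r)
    occ = 100.0 * len(present) / len(frames)
    mean_r = sum(present) / len(present) if present else float("nan")
    return occ, mean_r
```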

To summarize, the glycosyl moiety of the substrate bound at the −1 subsite establishes a complex H-bond network with the protein and some water molecules that contributes to its stabilization. It is worth highlighting that all the first- and second-shell protein residues around the −1 subsite identified in this H-bond analysis are highly conserved within the GH1 family (Brás et al., 2010; Camargo et al., 2019). In contrast, the H-bond analysis did not identify any conserved residue around the +1 subsite, which confirms the lack of specificity for the aglycone moiety within the GH1 family previously observed (Brás et al., 2010). Throughout the MD trajectory, the pNP molecule forms H-bonds only with two water molecules, through its NO2 group. In addition to these H-bond interactions, the Fuc ring remains surrounded along the whole trajectory by hydrophobic interactions with conserved residues such as Trp385, Trp393, Trp120, and Phe401, whereas the pNP leaving group interacts mainly with Trp312.

To initiate the QM/MM study of the glycosylation mechanism, two MD snapshots of the Ttβ-gly/pNP-Fuc Michaelis complex were selected. These structures were filtered by requiring the presence of the direct H-bonds (O4DpNP-HGLU164 and OE2GLU338-H2OFUC) between the two catalytic residues, Glu164 and Glu338, and the pNP-Fuc substrate. In addition, the presence of the H-bond between Tyr284 and Glu338 was verified in the selected frames. The relevance of this interaction (and of describing it at the QM level) for the energetics of the glycosylation and hydrolysis steps was recently highlighted by Badieyan et al. (2012) in their QM/MM study of Osβ-gly mentioned above, which is why we included Tyr284 in the QM(small)/MM partition. Finally, we verified that the other predominant H-bond interactions highlighted in **Table 1** were also present in the two selected frames.
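The frame-selection step amounts to a logical AND over per-frame H-bond flags. The sketch below assumes a dictionary of boolean time series, one per H-bond (for instance extracted from an H-bond analysis of the trajectory); the bond labels are hypothetical.

```python
from typing import Dict, List, Sequence

def select_frames(hbond_present: Dict[str, Sequence[bool]],
                  required: Sequence[str]) -> List[int]:
    """Indices of frames in which every required H-bond is present."""
    n_frames = len(next(iter(hbond_present.values())))
    return [i for i in range(n_frames)
            if all(hbond_present[name][i] for name in required)]

flags = {
    "O4D_pNP...H_GLU164": [True, True, False, True],
    "OE2_GLU338...H2O_FUC": [True, False, True, True],
    "OH_TYR284...OE2_GLU338": [True, True, True, False],
}
print(select_frames(flags, list(flags)))  # [0]
```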

### QM/MM Calculations

In this section, the results of the QM/MM calculations on the glycosylation reaction are presented. This catalytic step consists of the cleavage of the glycosidic bond of the pNP-Fuc molecule to form a covalent glycosyl-enzyme intermediate with Glu338. Although pNP is a good leaving group, acid catalysis is required for this step, a role carried out by the protonated Glu164 (**Figure 1**).

The two frames selected from the MD trajectory (denoted G<sup>I</sup> and G<sup>II</sup> in this paper) were optimized using the two QM/MM partitions described above, and the corresponding minima were located and characterized. From these minimum-energy structures, the potential energy profiles along the glycosylation step were calculated as a function of the reaction coordinate RC = [d(C1FUC-O4DpNP)-d(C1FUC-OE2GLU338)-d(HGLU164-O4DpNP)]. **Table 2** lists, for both frames, the QM/MM potential energy barriers (obtained from the corresponding energy profiles) and the reaction energies (obtained from the optimized glycosyl-enzyme product) for the glycosylation step at the QM = PBE0/SVP and PBE0/TZVP levels, using the QM(small)/MM and QM(large)/MM partitions. The corresponding energies obtained from PBE0/TZVP single-point calculations on the PBE0/SVP(small)/MM geometries are also given. With the QM(small)/MM partition and the smaller basis set, the potential energy barrier lies between 28 and 30 kcal/mol. The single-point correction with the larger basis set tends to increase these barriers (by 0.4–2 kcal/mol), especially for the G<sup>II</sup> frame. However, when the geometry optimization is carried out with the TZVP basis set, no significant differences are obtained with respect to the SVP values. These energy barriers are in the range of the barrier calculated with a large QM region for the glycosylation reaction of cellobiose in Osβ-gly (Badieyan et al., 2012), and also of those calculated for other retaining GHs (Petersen et al., 2009, 2010; Biarnés et al., 2011). In contrast, a large effect is observed in the present system when the large QM region is used with the TZVP basis set: the potential energy barriers are lower by 5.9 and 9.8 kcal/mol for frames G<sup>I</sup> and G<sup>II</sup>, respectively.
Notice that, as pNP is a better leaving group than the Glc of cellobiose modeled in the glycosylation reaction catalyzed by Osβ-gly (Badieyan et al., 2012), a lower energy barrier should be expected. In particular, the potential energy barriers calculated at the QM(large)/MM level with the TZVP basis set are 22.0 kcal/mol (G<sup>I</sup>) and 20.4 kcal/mol (G<sup>II</sup>). These values are in reasonable qualitative agreement with the phenomenological free energy of activation of 17.1 kcal/mol derived from the experimental k<sub>cat</sub> at 40 °C (Teze, 2012), the G<sup>II</sup> value being the closer of the two. The significant differences between the calculated potential energy barriers show that they are very sensitive to the QM/MM partitioning scheme and that non-catalytic residues (other than Tyr284) might play a role in the stabilization of the glycosylation transition state, as previously stated. As for the reaction energies, the glycosylation process is more endoergic for frame G<sup>I</sup> than for frame G<sup>II</sup> when the QM(small)/MM partition is used. However, both glycosylation reactions become clearly endoergic when the QM region is enlarged. Interestingly, an endothermic glycosylation step has also been observed in previous studies, whose authors suggested that an endothermic glycosylation might be a prerequisite for efficient transglycosylation (Raich et al., 2016).

TABLE 2 | QM/MM potential energy barriers (ΔV<sup>≠</sup>) and reaction energies (ΔV<sub>R</sub>) calculated for the glycosylation step for the two complexes studied (G<sup>I</sup> and G<sup>II</sup>).

<sup>a</sup>QM(small)/MM.

<sup>b</sup>QM(large)/MM.

Energies are given in kcal/mol.
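Two small calculations connect the numbers in this discussion. Barrier heights and reaction energies follow from a scanned profile as the maximum and the final point relative to the reactant (the maximum being taken as the TS, as described in the Methodology), and a phenomenological activation free energy follows from k<sub>cat</sub> through the Eyring equation, ΔG<sup>≠</sup> = RT ln(k<sub>B</sub>T/(h k<sub>cat</sub>)), assuming a transmission coefficient of 1. Inverting this relation for 17.1 kcal/mol at 40 °C gives roughly 7.6 s<sup>−1</sup>; that figure is a back-calculated illustration, not the measured value reported by Teze (2012). Any profile passed to the first function is a user-supplied example.

```python
import math
from typing import Sequence, Tuple

R_KCAL = 0.0019872041      # gas constant, kcal/(mol K)
KB_SI = 1.380649e-23       # Boltzmann constant, J/K
H_SI = 6.62607015e-34      # Planck constant, J s

def barrier_and_reaction_energy(energies: Sequence[float]) -> Tuple[float, float, int]:
    """From a scanned profile (reactant first, product last): (barrier, reaction energy, TS index)."""
    i_ts = max(range(len(energies)), key=energies.__getitem__)
    return energies[i_ts] - energies[0], energies[-1] - energies[0], i_ts

def eyring_dg(kcat: float, temperature: float) -> float:
    """Activation free energy (kcal/mol) from kcat (s^-1) via the Eyring equation."""
    return R_KCAL * temperature * math.log(KB_SI * temperature / (H_SI * kcat))

def eyring_k(dg: float, temperature: float) -> float:
    """Rate constant (s^-1) for a given activation free energy (kcal/mol)."""
    return KB_SI * temperature / H_SI * math.exp(-dg / (R_KCAL * temperature))
```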

In **Figure 4** the four main interatomic distances involved in the glycosylation reaction (C1FUC-O4DpNP, C1FUC-OE2GLU338, HGLU164-O4DpNP, and HGLU164-OE2GLU164) are plotted along the reaction coordinate for the G<sup>I</sup> frame, calculated with the QM(large)/MM partition at the PBE0/TZVP level for the QM region. **Figure 5** depicts the molecular representation of the corresponding reactant, transition state, and product. The C1FUC-OE2GLU338 distance between the nucleophilic residue and the anomeric carbon gradually decreases from 3.33 Å at the reactants (RC = −3.90 Å) to 2.61 Å at the transition state (RC = −2.10 Å), where the bond is not yet formed, and finally reaches 1.50 Å at the covalent glycosyl-enzyme intermediate (RC = 0.70 Å), in which the bond is completely formed. The length of the glycosidic bond C1FUC-O4DpNP is 1.41 Å at the reactants, increases to 2.33 Å at the TS (where the glycosidic bond is already broken), and finally reaches 3.17 Å at the product structure, in which the leaving group has been released. For the second frame (G<sup>II</sup>) the distances are very similar (**Supplementary Table S6**). These structural results are consistent with the dissociative nature of the glycosylation transition state. The oxocarbenium character of this TS is also apparent from the increased double-bond character of the C1FUC-O5FUC bond (the C1FUC-O5FUC distance decreases by 0.14 Å from the reactant to the TS structure for both frames). In addition, the acidic residue Glu164 gradually approaches the glycosidic oxygen along the two reaction paths (the O4DpNP-HGLU164 hydrogen-bond distance decreases from 1.99 Å (G<sup>I</sup>) and 2.02 Å (G<sup>II</sup>) at the reactants to 1.82 and 1.88 Å at the G<sup>I</sup> and G<sup>II</sup> TSs, respectively), although the proton is not yet transferred. Proton transfer takes place when the OE2GLU338-C1FUC distance is ∼2 Å and is followed by complete OE2GLU338-C1FUC bond formation. This is consistent with a change in the pK<sub>a</sub> of Glu164 that makes proton transfer easier as the negative charge on the leaving group gradually increases and that on the nucleophile gradually decreases with the progress of the glycosylation reaction. When a smaller QM region is used within the QM/MM approach, and independently of the basis set employed for the QM description, the transition state can also be described as dissociative (**Supplementary Table S7** and **Supplementary Figure S2**). Curiously, the O4DpNP-HGLU164 H-bond distance at the reactant differs by 0.42 Å between the two frames considered (G<sup>I</sup> and G<sup>II</sup>), and a significant difference in this distance is also observed at the corresponding TSs. These differences between frames disappear with the QM(large)/MM partition. With the smaller QM region, the nucleophilic attack by Glu338 is completed at lower RC values than with the QM(large)/MM partition, well before the proton is fully transferred. Thus, describing at the QM level the H-bonds that Glu338 and Glu164 establish with their neighboring residues seems to significantly affect their nucleophilic and acid/base characteristics.

The inclusion in the quantum-mechanical zone of protein residues other than the catalytic ones (Glu338 and Glu164) and Tyr284 (as in our QM(large)/MM model) also affects the stabilization by the enzyme environment of the oxocarbenium ion and the negatively charged glycosyl oxygen anion. Moreover, as mentioned in the Introduction, one of the most stabilizing protein-substrate interactions at the glycosylation TS is the H-bond between the 2-OH group of the sugar moiety and the nucleophilic residue Glu338. This stabilizing contribution correlates with a shortening of the OE1GLU338-H2OFUC H-bond distance, from 2.09 Å and 2.37 Å at the reactants to 1.82 and 1.85 Å at the TSs of G<sup>I</sup> and G<sup>II</sup>, respectively (**Supplementary Table S6**). The Fuc at the −1 subsite is also stabilized along the glycosylation pathway by the HD21ASN163-O2FUC interaction, as the corresponding H-bond distance decreases by around 0.10 Å from the reactants to the transition state structures. Tyr284 plays a relevant role along the glycosylation process. As the OE2GLU338 atom loses negative charge upon formation of the C1FUC-OE2GLU338 bond, the Tyr284 hydroxyl group moves away from Glu338 and clearly approaches the O5FUC atom (the corresponding distance decreases by around 1.3 Å at the TSs and by around 1.9 Å at the product structures). The expected oxocarbenium ion-like character of the TS is favored by the coplanarity of the C2, C1, O5, and C5 atoms of the fucose ring, which adopts a slightly distorted chair conformation close to a half-chair (<sup>4</sup>H<sub>3</sub>) with some <sup>4</sup>E character.

At this point, it is worth recalling that GHs have been shown to usually bind the substrate in a subtly but critically distorted conformation (Davies et al., 2012). Such a conformational change brings the glycosidic bond closer to the equatorial orientation and the ring closer to planarity (and hence closer to the oxocarbenium TS geometry), which facilitates in-line nucleophilic attack. A possible exception has recently been reported for family GH3 enzymes, for which the glucose bound at the −1 subsite seems to adopt a <sup>4</sup>C<sub>1</sub> conformation (perhaps slightly distorted toward <sup>4</sup>H<sub>5</sub>) in the Michaelis complex according to QM/MM calculations and crystallographic data (Hrmova et al., 2001; Geronimo et al., 2018). Different conformational itineraries (from reactants to TS and to products) have been associated with different GHs (Ardèvol and Rovira, 2015). Thus, for retaining β-glucosidases acting on glucose substrates, the <sup>1</sup>S<sub>3</sub> → <sup>4</sup>H<sub>3</sub> → <sup>4</sup>C<sub>1</sub> itinerary has been described for the glycosylation step. The QM/MM study by Badieyan et al. (2012) of the glycosylation step catalyzed by rice Osβ-gly with cellobiose as substrate agrees with this itinerary. However, in the present study, with pNP-Fuc instead of Glc(β1-4)β-Glc as substrate, we were unable to characterize a Michaelis complex in a <sup>1</sup>S<sub>3</sub> conformation. The QM(large)/MM optimized reactant has a slightly distorted <sup>4</sup>C<sub>1</sub> conformation (with θ and φ values of 27° and 216°, respectively). Attempts to obtain a different conformation by keeping the ring restrained before the full QM(large)/MM minimization only produced an alternative minimum close to a <sup>4</sup>E (slightly <sup>4</sup>H<sub>3</sub>) conformation. This structure presents a slightly longer C1FUC-O4DpNP distance (1.49 Å) and slightly shorter C1FUC-OE2GLU338 (3.29 Å), C1FUC-O5FUC (1.36 Å), and HGLU164-O4DpNP (1.77 Å) distances than the previous one. The potential energy profile calculated from this minimum gives a barrier height of only 10 kcal/mol (**Supplementary Figure S3**), which is much smaller than the experimental value. This suggests that, although such a conformation may be possible, it is probably not the most favorable conformation bound by the enzyme and thus likely has no catalytic significance. Badieyan et al. (2012) reported the importance of including Glu440 of Osβ-gly (which interacts with the 4-OH and 6-OH groups of Glc) in the QM region in order to characterize the <sup>1</sup>S<sub>3</sub> conformation. In the present study, the QM(large)/MM partition includes the equivalent Glu392 residue, among many others. These differences may originate from the fact that we are modeling Fuc instead of Glc (they differ in the position of the 4-OH group, and Fuc has a methyl group instead of the 6-OH group), or from our substrate being not an oligosaccharide but a synthetic phenyl derivative that may impose fewer steric requirements than a monosaccharide at the +1 subsite. Nevertheless, as glycosidic bond breakage occurs early in the reaction, the ring is distorted early along the calculated reaction coordinate, with subsequent attack by the nucleophilic Glu338 occurring in the expected orientation.
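The θ and φ angles used to classify ring conformations are Cremer-Pople puckering coordinates. The sketch below computes them for a six-membered ring from Cartesian coordinates, following the standard Cremer-Pople construction: a mean-plane normal from Fourier-weighted position sums, then the m = 2 and m = 3 puckering components. By construction θ lies between 0° and 180°. With ring atoms ordered O5, C1, C2, C3, C4, C5 the usual pyranose convention places <sup>4</sup>C<sub>1</sub> near θ = 0° and <sup>1</sup>C<sub>4</sub> near θ = 180°, but sign conventions depend on atom ordering, so results should be checked against an established tool before comparison with published values.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def cremer_pople(ring: Sequence[Coord]) -> Tuple[float, float, float]:
    """Cremer-Pople puckering (Q in Å, theta and phi in degrees) of a six-membered ring.

    ring: coordinates in ring order, e.g. O5, C1, C2, C3, C4, C5 for a pyranose.
    """
    n = len(ring)
    if n != 6:
        raise ValueError("this sketch handles six-membered rings only")
    center = [sum(c[k] for c in ring) / n for k in range(3)]
    r = [[c[k] - center[k] for k in range(3)] for c in ring]
    # Mean-plane normal from the two Fourier-weighted position sums
    r1 = [sum(r[j][k] * math.sin(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    r2 = [sum(r[j][k] * math.cos(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    normal = [r1[1] * r2[2] - r1[2] * r2[1],
              r1[2] * r2[0] - r1[0] * r2[2],
              r1[0] * r2[1] - r1[1] * r2[0]]
    norm = math.sqrt(sum(x * x for x in normal))
    # Out-of-plane displacement of each ring atom
    z = [sum(rj[k] * normal[k] for k in range(3)) / norm for rj in r]
    # m = 2 puckering component (amplitude q2 and phase phi)
    qc = math.sqrt(2 / n) * sum(z[j] * math.cos(4 * math.pi * j / n) for j in range(n))
    qs = -math.sqrt(2 / n) * sum(z[j] * math.sin(4 * math.pi * j / n) for j in range(n))
    q2 = math.hypot(qc, qs)
    # m = 3 (N/2) component: alternating sum of displacements
    q3 = math.sqrt(1 / n) * sum(z[j] * (-1) ** j for j in range(n))
    Q = math.hypot(q2, q3)
    theta = math.degrees(math.atan2(q2, q3))
    phi = math.degrees(math.atan2(qs, qc)) % 360.0
    return Q, theta, phi
```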

### Hydrolysis and Transglycosylation Steps (Deglycosylation)

Once the pNP molecule leaves the enzyme active site, a water molecule can attack the anomeric carbon of the fucose as a nucleophile in a hydrolysis reaction. This competes with attack by the acceptor ligand (BnON(Me)-Glc) in a transglycosylation reaction leading to the N-methyl-O-benzyl-N-(β-D-fucopyranosyl(1-4)β-D-glucopyranosyl)-hydroxylamine product. Both the attacking water molecule and the BnON(Me)-Glc ligand are activated by proton transfer to Glu164, which acts as a base in these two nucleophilic processes.

Starting from the minimized MD structure (see Methodology section), in which a water molecule (WAT431) was located at the active site close enough to the anomeric carbon of the fucose ring, we calculated the QM/MM energy profile for the hydrolysis process using the reaction coordinate RC = [d(C1FUC-OE2GLU338)-d(C1FUC-OWAT431)-d(H1WAT431-OE2GLU164)]. As mentioned above, an H-bond network of three water molecules, in addition to the nucleophilic WAT431, was included in the QM region for the hydrolysis simulation. For the transglycosylation step, on the other hand, initial energy profiles starting from minimized MD structures gave highly overestimated energy barriers (>34 kcal/mol), suggesting some structural deficiencies in the model. To improve this model, we followed a strategy successfully used by our group in previous studies of enzyme-catalyzed reactions involving sugar transfer between donor and acceptor substrates (Albesa-Jové et al., 2015; Mendoza et al., 2016, 2017). The refinement protocol consists of running a QM(SCC-DFTB)/MM MD simulation of, in this case, the transglycosylation product, followed by scan calculations on frames from this MD to generate new models of the transglycosylation reactant, which are then minimized at the QM(PBE0)/MM level. In this way, interactions between donor and acceptor (or involving water molecules) that are difficult to capture in the MM MD seem to be better represented. In the present system, the structural changes observed upon refinement involve the reorientation of the 4-OH group of the acceptor Glc toward the Fuc ring. This displaces a water molecule that was located between the two rings and that most likely hindered the approach of the acceptor during the reaction (**Supplementary Figure S4**). In the new reactant structure, the Glc hydroxyl is better able to interact with Fuc as transglycosylation takes place.
Substrate-substrate interactions have been shown to contribute very significantly to the sugar transfer catalyzed by retaining glycosyltransferases (Gómez et al., 2012, 2013; Mendoza et al., 2017) and could therefore also be relevant for transglycosylation. Two frames (called here T<sup>I</sup> and T<sup>II</sup>) were generated by this approach. From the two optimized QM/MM reactant structures, the transglycosylation potential energy profiles for the corresponding β(1-4) glycosidic linkages were calculated using the reaction coordinate RC = [d(C1FUC-OE2GLU338)-d(C1FUC-O4GLC)-d(H4OGLC-OE2GLU164)]. In this model only one water molecule remains in the QM region. We did not analyze other glycosidic linkages between Fuc and BnON(Me)-Glc because it has been shown experimentally that the native enzyme does not produce other product regioisomers with these two substrates (Teze et al., 2013).

In **Table 3** the potential energy barriers and the reaction energies corresponding to the hydrolysis and the transglycosylation steps are given for the two QM/MM partitions used in this study and at the different QM levels of electronic structure theory analyzed. The first general observation is that the potential energy barrier for the hydrolysis step is very similar to that of the transglycosylation one, as expected because the experimental result for this reaction catalyzed by WT Ttβ-gly gives a transglycosylation yield of 36%, which implies differences in the energy barriers no larger than 1 kcal/mol. With the small QM region at the PBE0/SVP level, the hydrolysis potential energy barrier is 2.1 and 0.9 kcal/mol higher than for the transglycosylation frames T<sup>I</sup> and TII, respectively. According to this result, the transglycosylation reaction would be faster than the hydrolysis process. This trend is inverted when single-point energy calculations are carried out at a higher level (QM(PBE0/TZVP)), being the hydrolysis energy barrier now slightly lower (by 0.7 (TI) and 0.9 (TII) kcal/mol). The correct trend between the hydrolysis and transglycosylation barriers is confirmed when a bigger basis set is used in the QM/MM optimizations and also when the QM region is enlarged. As for the glycosylation step, the values of the energy barriers calculated with the smaller QM region are around 6 kcal/mol higher than those obtained with the larger QM region. Those energy differences reflect again that the two reaction processes are very sensitive to different QM/MM partitioning schemes. Despite this, the difference between transglycosylation and hydrolysis energy barriers is similar (1.2 kcal/mol for T<sup>I</sup> and 2.6 kcal/mol for TII). Thus, the energy trends might be reproduced at a lower computational cost (QM(PBE0/TZVP)/MM but QM(small)/MM). As for the reaction energies, the hydrolysis process is more exoergic than the transglycosylation reaction. 
This result agrees with previous findings by Brás et al. (2010) in their study of the catalytic mechanism of a β-galactosidase and with the known thermodynamic preference for hydrolysis over synthesis (Planas and Faijes, 2002).
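The statement that a 36% transglycosylation yield implies a barrier difference below 1 kcal/mol can be checked with a short calculation. The sketch below is a simplification that rests on three assumptions. First, the yield reflects only the kinetic partition of the glycosyl-enzyme intermediate between the two deglycosylation channels, Y = k_T/(k_T + k_H). Second, both rate constants follow transition state theory with the same prefactor, so that ΔG‡_T − ΔG‡_H = RT ln[(1 − Y)/Y]. Third, the temperature is 298.15 K, which is an assumed value and not necessarily the assay temperature. The model ignores secondary hydrolysis of the product and any acceptor-concentration effects. The function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def barrier_gap_from_yield(trans_yield: float, temperature: float = 298.15) -> float:
    """Return dG‡(transglycosylation) - dG‡(hydrolysis) in kcal/mol.

    Assumes (hypothetical simplification) that the yield is set only by the
    kinetic partition Y = k_T / (k_T + k_H) and that both channels share the
    same transition-state-theory prefactor, so k_T / k_H = exp(-ddG / RT).
    """
    if not 0.0 < trans_yield < 1.0:
        raise ValueError("yield must lie strictly between 0 and 1")
    return R_KCAL * temperature * math.log((1.0 - trans_yield) / trans_yield)


if __name__ == "__main__":
    ddg = barrier_gap_from_yield(0.36)
    # positive value: the transglycosylation barrier is the higher one
    print(f"ddG = {ddg:.2f} kcal/mol")
```

With Y = 0.36 this gives about 0.34 kcal/mol at 298.15 K. Because the gap scales linearly with T, it stays well below 1 kcal/mol at any plausible assay temperature.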

In **Table 4** the most relevant interatomic distances (calculated with the QM(large)/MM partition at the QM(PBE0/TZVP)/MM level) along the hydrolysis and transglycosylation pathways are presented, and the corresponding molecular structures of the different reactants, transition states and products are depicted in **Figure 6** (for QM(small)/MM results see **Supplementary Table S8**). At the reactants, with QM(large)/MM, the water molecule in the hydrolysis process is initially located slightly farther from the anomeric carbon (3.45 Å) than the acceptor in the two transglycosylation reactants (3.27 and 3.35 Å). The values reported for Glc hydrolysis by rice β-glucosidase (Osβ-gly) are slightly larger (3.62–3.75 Å) (Badieyan et al., 2012; Wang et al., 2013), as is also the case at the same level of theory when the QM(small)/MM partition is used (**Supplementary Table S8**). At the TS structures, the C1FUC-OAcc bond, which is still forming, is also slightly longer for hydrolysis (2.27, 2.11, and 2.12 Å at the H, T<sup>I</sup>, and T<sup>II</sup> transition states, respectively). The nucleophile Glu338 side chain is completely displaced from the sugar in both reactions but remains slightly closer to C1FUC in the case of hydrolysis (C1FUC-OE2GLU338 distances of 3.12, 3.42, and 3.34 Å at the H, T<sup>I</sup>, and T<sup>II</sup> transition states, respectively). For Glc hydrolysis by Osβ-gly, slightly shorter distances have been reported, especially for C1GLC-ONu (1.89–2.14 and 2.45–2.51 Å for C1GLC-OAcc and C1GLC-ONu, respectively) (Badieyan et al., 2012; Wang et al., 2013). Along the nucleophilic attacks, the proton from the water molecule or from the BnON(Me)-Glc acceptor comes closer to the general base residue Glu164, but it has not yet been transferred at the transition states (HAcc-OE2GLU164 distances between 1.42 and 1.48 Å). 
This agrees with the description of hydrolysis given in previous works (Badieyan et al., 2012; Wang et al., 2013; Geronimo et al., 2018). Interestingly, this proton transfer is more advanced in the hydrolysis and transglycosylation TSs than the Glu164-to-pNP proton transfer was in the glycosylation TS (d(HGLU164-O4DpNP) ∼1.8 Å). Note that this is the first time that the transglycosylation step has been studied for a wild-type family GH1 enzyme, so no direct comparison with previous works is possible for this step. For family GH2, potential energy scans followed by TS characterization identified a dissociative transglycosylation TS with the proton not yet transferred; however, detailed structural information (i.e., distances) was not provided (Brás et al., 2010). In addition, in the GH2 enzyme a magnesium ion interacting with the acid/base residue participates in catalysis, which is likely to introduce some structural differences with respect to the present system. For example, its hydrolysis TS presented the anomeric C1 atom equidistant from the nucleophile and the attacking water (2.25 Å), which differs from all the results discussed above.

<sup>a</sup>QM(small)/MM.

<sup>b</sup>QM(large)/MM.

All energies are given in kcal/mol.

TABLE 4 | Distances (in Å) between selected atoms involved in each reaction step for the reactant, transition state (TS) and product of the hydrolysis (H) and transglycosylation (T<sup>I</sup> and T<sup>II</sup>, corresponding to the two different frames studied) steps.

The results correspond to QM(large)/MM at the PBE0/TZVP level. The Acc subscript refers to the acceptor water and glucose moieties in the hydrolysis and transglycosylation processes, respectively.

These TS geometries are sensitive to the DFT functional, the basis set, the QM/MM partition, the characterization method, the substrates and the specific enzyme. Despite this, it is clear that both the hydrolysis and the transglycosylation transition states are structurally very loose and have a dissociative nature. The distances reported in **Table 4** should therefore be taken as indicative, but some trends can be extracted at the QM(PBE0/TZVP)/MM level with the QM(large)/MM partition. For example, the sums of the C1FUC-OAcc and C1FUC-OE2GLU338 distances for the deglycosylation TSs (5.39, 5.53, and 5.46 Å for the H, T<sup>I</sup>, and T<sup>II</sup> transition states, respectively) are all significantly larger than the sum of the analogous C1FUC-O4DpNP and C1FUC-OE2GLU338 distances at the glycosylation TS (4.97 Å on average). On the other hand, the differences between the C1FUC-OE2GLU338 and C1FUC-OAcc distances in the deglycosylation TSs (0.85, 1.31, and 1.22 Å for the H, T<sup>I</sup>, and T<sup>II</sup> TS structures, respectively) are much larger than the corresponding glycosylation value (0.27 Å). Note also that the hydrolysis value is significantly smaller than the transglycosylation ones, and of the same order as the values obtained for Glc hydrolysis by Osβ-gly (0.31–0.62 Å) (Badieyan et al., 2012; Wang et al., 2013). These results agree with experimental observations indicating a more SN2 character for the glycosylation step than for the deglycosylation one (Kempton and Withers, 1992). Moreover, transglycosylation is predicted to have a transition state more advanced along the reaction coordinate than the hydrolysis one (RC values of −0.63, −0.17, and −0.20 Å for the H, T<sup>I</sup>, and T<sup>II</sup> TS structures, respectively).
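The geometric descriptors used above are simple combinations of interatomic distances. They are the sum and the difference of the forming (C1FUC-OAcc) and breaking (C1FUC-OE2GLU338) distances, together with the reaction coordinate RC = d(C1FUC-OE2GLU338) − d(C1FUC-OAcc) − d(HAcc-OE2GLU164). For hydrolysis, the water oxygen and hydrogen are assumed to take the roles of O4GLC and H4OGLC. The sketch below first recomputes the sums and differences from the TS distances quoted in the text. As a consistency check, it then back-calculates the HAcc-OE2GLU164 distance implied by each reported RC value. These implied distances (1.48, 1.48, and 1.42 Å) are derived here and are not read from Table 4; they fall within the 1.42–1.48 Å range stated in the text. All names are illustrative.

```python
# TS distances (in Å) quoted in the text, QM(large)/MM at the PBE0/TZVP level:
# (d(C1FUC-OAcc), d(C1FUC-OE2GLU338), reported RC)
TS_DATA = {
    "H": (2.27, 3.12, -0.63),
    "T_I": (2.11, 3.42, -0.17),
    "T_II": (2.12, 3.34, -0.20),
}


def reaction_coordinate(d_c1_nuc: float, d_c1_acc: float, d_h_base: float) -> float:
    """RC = d(C1-OE2 Glu338) - d(C1-OAcc) - d(HAcc-OE2 Glu164), in Å."""
    return d_c1_nuc - d_c1_acc - d_h_base


def describe_ts(d_c1_acc: float, d_c1_nuc: float, rc: float):
    """Return (sum, difference, implied HAcc-OE2 Glu164 distance) in Å."""
    total = d_c1_acc + d_c1_nuc    # larger sum = looser, more dissociative TS
    diff = d_c1_nuc - d_c1_acc     # larger difference = less SN2-like symmetry
    implied_h_base = diff - rc     # RC definition solved for d(HAcc-OE2)
    return total, diff, implied_h_base


if __name__ == "__main__":
    for name, (d_acc, d_nuc, rc) in TS_DATA.items():
        total, diff, d_h = describe_ts(d_acc, d_nuc, rc)
        # plugging the implied distance back in must reproduce the reported RC
        assert abs(reaction_coordinate(d_nuc, d_acc, d_h) - rc) < 1e-9
        print(f"{name}: sum={total:.2f} diff={diff:.2f} implied d(H-OE2)={d_h:.2f}")
```

Comparing the printed sums with the glycosylation value (4.97 Å) and the differences with 0.27 Å reproduces the looser, less SN2-like character of the deglycosylation TSs.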

To determine how robust the geometrical differences between the hydrolysis and transglycosylation TSs are, the corresponding free energy profiles at the QM(SCC-DFTB)/MM level were obtained by umbrella sampling molecular dynamics simulations (**Supplementary Figures S5, S6**). Both reactions show very similar free energy barriers (21.10 and 21.93 kcal/mol for H and T, respectively), with the transglycosylation one slightly higher. At the corresponding TSs, the average C1FUC-OAcc, C1FUC-OE2GLU338, and HAcc-OE2GLU164 distances are 1.95 ± 0.15, 2.60 ± 0.18, and 1.52 ± 0.26 Å for hydrolysis, and 2.11 ± 0.05, 2.31 ± 0.08, and 1.32 ± 0.03 Å for transglycosylation. Thus, the dissociative character of the TSs is maintained, as is the late proton transfer. At this level of theory, however, the hydrolysis TS would be more advanced along the reaction coordinate (RC<sup>H</sup> = −0.87 Å) than the transglycosylation one (RC<sup>T</sup> = −1.04 Å). It is important to note the significant fluctuations of the distances measured for the hydrolysis TS. These may be due to the flatness of the free energy profile in this region and indicate a wide ensemble of accessible TS configurations with different reaction coordinate values. For the transglycosylation reaction, these fluctuations are much smaller. Interestingly, the transglycosylation QM(SCC-DFTB)/MM potential energy maximum is more advanced than the hydrolysis one (as with QM = PBE0/TZVP). This suggests that the apparent switch in TS location along the reaction coordinate is a consequence of introducing dynamics. On the other hand, the TS geometries are more compact than the QM = PBE0/TZVP ones (smaller sums of the C1FUC-OAcc and C1FUC-OE2GLU338 distances), and this compactness is already present at the potential energy maxima. It is therefore likely a consequence of the choice of QM level and not of the umbrella sampling simulations.

QM(SCC-DFTB)/MM umbrella sampling simulations have been reported for the hydrolysis and transglycosylation of Glc substrates catalyzed by a family GH3 β-glucosidase. There, the transglycosylation TS presented C1GLC-OAcc, C1GLC-ONuc, and HAcc-OBase distances of 3.0 ± 0.2, 1.9 ± 0.1, and 1.6 ± 0.1 Å, which are relatively similar to those obtained here at the QM = PBE0/TZVP level. However, the hydrolysis TS had the C1GLC-OAcc bond practically formed (1.5 ± 0.04 Å) and the proton at a distance of 2.06 ± 0.08 Å from the catalytic base, although right before the TS this distance had shortened to 1.4–1.5 Å (Geronimo et al., 2018). Therefore, water was predicted not to be activated by the basic residue, which is not what we observe in our calculations. The authors also related these geometric parameters to the lack of oxocarbenium ion-like character of this TS and to a preference of the GH3 −1 subsite for a TS in a <sup>4</sup>C<sup>1</sup> conformation (Geronimo et al., 2018).

In the present system, the C1FUC-O5FUC bond becomes shorter from the reactants to the corresponding transition states, in line with its partial double-bond character during the nucleophilic attack. Consistently, the positive charge on C1FUC increases by 0.24–0.27 au from reactants to the transition state structures (**Supplementary Table S9** for QM(PBE0/TZVP)/MM and QM(large)/MM), while O5FUC in the fucose ring becomes less negative (by around 0.06 au). With the QM(small)/MM partition, the change on C1FUC is slightly bigger (0.29–0.30 au; **Supplementary Table S10**). These charge variations confirm the oxocarbenium ion-like character observed for deglycosylation reactions (hydrolysis as well as transglycosylation) in the GH1 β-glucosidase family. 
The ring puckering is very similar in both reactions, with Cremer-Pople θ values of 41° (H), 44 and 47° (T<sup>I</sup> and T<sup>II</sup>) and φ angles of 224° (H), 224 and 221° (T<sup>I</sup> and T<sup>II</sup>), which correspond to a conformation between <sup>4</sup>H<sup>3</sup> and <sup>4</sup>E. For the transglycosylation TS in the family GH3 β-glucosidase mentioned above, a <sup>4</sup>H<sup>3</sup> conformation was predicted (Geronimo et al., 2018).
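The Cremer-Pople coordinates (Q, θ, φ) quoted above are obtained from the Cartesian coordinates of the six ring atoms. The sketch below follows the standard Cremer-Pople construction in four steps. It first defines the mean plane from the weighted sums R′ and R″, then projects the ring atoms onto its normal to get the out-of-plane displacements z<sub>j</sub>. From these it computes the m = 2 and m = 3 puckering components, and finally combines them into Q, θ, and φ. The atom ordering O5, C1, C2, C3, C4, C5 is an assumed convention. Reported θ and φ values depend on the starting atom and direction, so this sketch reproduces the paper's numbers only if it uses the same ordering as the original analysis, which the text does not state. The function name is illustrative.

```python
import numpy as np


def cremer_pople(coords):
    """Cremer-Pople puckering parameters of a six-membered ring.

    coords: (6, 3) ring-atom positions in ring order (assumed here to be
    O5, C1, C2, C3, C4, C5 for a pyranose). Returns (Q, theta, phi) with Q
    in the input length unit and theta, phi in degrees.
    """
    r = np.asarray(coords, dtype=float)
    if r.shape != (6, 3):
        raise ValueError("expected a (6, 3) array of ring coordinates")
    n = r.shape[0]
    r = r - r.mean(axis=0)                       # origin at the ring centroid
    j = np.arange(n)
    ang = 2.0 * np.pi * j / n
    r1 = (r * np.sin(ang)[:, None]).sum(axis=0)  # R'
    r2 = (r * np.cos(ang)[:, None]).sum(axis=0)  # R''
    normal = np.cross(r1, r2)
    normal /= np.linalg.norm(normal)             # unit normal of the mean plane
    z = r @ normal                               # out-of-plane displacements z_j
    # m = 2 component: q2*cos(phi2) and q2*sin(phi2)
    c2 = np.sqrt(2.0 / n) * np.sum(z * np.cos(2.0 * ang))
    s2 = -np.sqrt(2.0 / n) * np.sum(z * np.sin(2.0 * ang))
    q2 = np.hypot(c2, s2)
    phi = np.degrees(np.arctan2(s2, c2)) % 360.0
    # m = 3 component (the single "chair" term for N = 6)
    q3 = np.sum(z * (-1.0) ** j) / np.sqrt(n)
    big_q = np.hypot(q2, q3)                     # total puckering amplitude
    theta = np.degrees(np.arctan2(q2, q3))       # 0 or 180 deg for an ideal chair
    return big_q, theta, phi
```

A quick sanity check is an idealized chair: a regular hexagon with alternating out-of-plane displacements ±h. It gives Q = √6·h and θ = 0° or 180° (the sign depends on ordering direction), while a planar ring gives Q ≈ 0. For a real structure, φ is only meaningful when θ is away from the poles, as it is for the 41–47° values above.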

The QM(PBE0/TZVP)/MM data in **Supplementary Table S9** show only very small differences in the atomic charges of the atoms involved in the hydrolysis and transglycosylation reactions, in line with the small differences obtained in the energy barriers for the two kinds of deglycosylation processes. The lower energy barrier for hydrolysis might then be due to the more negative charge on the oxygen atom of the nucleophilic water compared with the charge on the O4 atom of the glucose acceptor molecule.

As can be seen in **Figure 6** and **Table 4**, some interactions between the substrates, or with the surrounding network of water molecules, also vary along the two reactions studied. For transglycosylation, the hydrogen of the 3-OH group of the Glc acceptor substrate (adjacent to the attacking 4-OHGLC) approaches O5FUC (by 0.53 and 0.38 Å for T<sup>I</sup> and T<sup>II</sup>, respectively) and O4FUC (by 0.12 and 0.51 Å for T<sup>I</sup> and T<sup>II</sup>, respectively) in going from the reactant to the TS. In the hydrolysis reaction, a water molecule (WAT432) establishes a bridge between the oxygen atom of the attacking water (WAT431) and that of the 4-OHFUC group. The OWAT431-H2WAT432 and H1WAT432-O4FUC distances show little variation from reactant to TS. At the product, however, the OWAT431-H2WAT432 distance increases by 0.91 Å due to a rotation of WAT432 (as OWAT431 has formed the new bond with C1FUC), which now points its H2WAT432 toward O5FUC (d(H2WAT432-O5FUC) is 3.58 Å at the reactants, 3.33 Å at the TS and 1.95 Å at the products). Thus, WAT432 and 3-OHGLC seem to play a similar role in stabilizing the hydrolysis and transglycosylation TSs, respectively, via interaction with the oxocarbenium ion. Interactions of this kind have also been identified as assisting the reaction in retaining glycosyltransferases; there, owing to the different arrangement of the substrates, they assist leaving-group departure instead (Gómez et al., 2015). It is interesting to note that the hydrogen atom of this 4-OHFUC group in turn interacts with Glu392. The H4OFUC-OE1GLU392 distance becomes shorter along the hydrolysis and T<sup>I</sup> paths; this interaction is absent along the T<sup>II</sup> energy profile because a water molecule acts there as an H-bonding bridge between O4FUC and OE1GLU392. Moreover, the presence of Fuc instead of Glc as the substrate (the two differ in the configuration of the 4-hydroxyl group) has caused a rotation of the Glu392 side chain. 
The analysis of the electrostatic contributions of residues at the −1 subsite to the potential energy barrier (**Figure 7**) shows that Glu392 stabilizes both the hydrolysis and transglycosylation TSs, with a larger effect for hydrolysis. In fact, this residue shows the largest difference between the two competing reactions, suggesting that altering this interaction could be a way to increase the T/H ratio. This observation could be related to the reported increase in transglycosylation for the Asn390Ile and Phe401Ser mutants, both of which involve residues in contact with Glu392. Asn390 is H-bonded to Glu392, so its mutation is likely to alter the Glu392-Fuc interaction. Three other residues interact with Fuc hydroxyl groups and, according to the electrostatic analysis, slightly destabilize the deglycosylation TSs. Asn163 interacts with the 2-OHFUC group and has similar effects for hydrolysis and transglycosylation. His119 interacts with O3FUC and O2FUC and provides a slightly larger destabilization for transglycosylation. Gln18 interacts with O3FUC and has a similar effect on both TSs.

In the hydrolysis and transglycosylation steps studied here, the nucleophile Glu338 presents strong interactions with the 2-OHFUC group, Tyr284, Arg75, and Asn282. The H2OFUC-OE1GLU338 distance increases as Glu338 moves away with the breaking of the C1FUC-OE2GLU338 bond, although the H-bond interaction is maintained along the three reaction pathways. The negative charge of Glu338 also increases (by around 0.3 au from reactants to the transition states, **Supplementary Table S9**). Concomitantly with these molecular changes, the Tyr284 side chain moves away from O5FUC while it approaches the OE2GLU338 atom (which, as mentioned, is increasing its negative charge). It has been suggested that the short hydrogen bond with the hydroxyl group of the equivalent tyrosine residue of a β-galactosidase could facilitate the elimination of the nucleophilic protein residue (Brás et al., 2010). The analyses depicted in **Figure 7** show this increasing stabilization by Tyr284 as the reaction proceeds (by 4.76 and 6.12 kcal/mol at the hydrolysis and transglycosylation TSs, and by 6.31 and 8.26 kcal/mol at the respective products). Arg75 tends to approach OE1GLU338 slightly as the reaction proceeds, and it provides the largest electrostatic stabilization of the TSs relative to the reactants (8.24 and 8.06 kcal/mol for hydrolysis and transglycosylation, respectively). These results agree qualitatively with the much lower activity measured when these two residues are mutated. Finally, Asn282 is predicted to have a moderately destabilizing effect, especially for the products. This residue interacts with both catalytic residues, Glu338 and Glu164. The results therefore indicate that its interaction with Glu164, which goes from a total charge of −1 au to neutral, is the predominant one.
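The text does not spell out how the per-residue electrostatic contributions in Figure 7 were computed. One common approach, shown here only as an assumption and not necessarily the authors' protocol, has two steps. It evaluates the Coulomb interaction between the fixed MM point charges of a residue and the atomic charges of the QM region, at the reactant and at the TS, and then reports the difference. Under this sign convention, a negative value means the residue stabilizes the TS relative to the reactant. The sketch uses vacuum Coulomb's law with no dielectric screening, one reason such numbers are best read qualitatively. Function names are hypothetical.

```python
import numpy as np

COULOMB_KCAL = 332.0637  # e^2/(4*pi*eps0) in kcal·Å/(mol·e^2)


def coulomb_energy(q_a, xyz_a, q_b, xyz_b) -> float:
    """Vacuum Coulomb interaction (kcal/mol) between two sets of point charges.

    Charges in elementary charges, coordinates in Å; the two sets must not
    share positions (a zero distance would give an infinite term).
    """
    q_a, q_b = np.asarray(q_a, float), np.asarray(q_b, float)
    xyz_a, xyz_b = np.asarray(xyz_a, float), np.asarray(xyz_b, float)
    # pairwise distances, shape (len(q_a), len(q_b))
    r = np.linalg.norm(xyz_a[:, None, :] - xyz_b[None, :, :], axis=-1)
    return float(COULOMB_KCAL * np.sum(np.outer(q_a, q_b) / r))


def residue_barrier_contribution(res_q, res_xyz_r, res_xyz_ts,
                                 qm_q_r, qm_xyz_r, qm_q_ts, qm_xyz_ts) -> float:
    """E_int(TS) - E_int(reactant) for one residue; negative = TS stabilized."""
    e_r = coulomb_energy(res_q, res_xyz_r, qm_q_r, qm_xyz_r)
    e_ts = coulomb_energy(res_q, res_xyz_ts, qm_q_ts, qm_xyz_ts)
    return e_ts - e_r


if __name__ == "__main__":
    # toy example: a +1 residue charge 3 Å from a QM atom whose charge
    # goes from -0.5 (reactant) to -1.0 (TS)
    d_e = residue_barrier_contribution([1.0], [[0, 0, 0]], [[0, 0, 0]],
                                       [-0.5], [[3, 0, 0]], [-1.0], [[3, 0, 0]])
    print(f"{d_e:.1f} kcal/mol")
```

The toy case gives roughly −55 kcal/mol. Such large unscreened magnitudes illustrate why this kind of decomposition is used to rank residues and identify trends, not to predict barrier shifts quantitatively.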

in this work. Calculations were performed on the structures obtained with QM(small)/MM and at the QM = (PBE0/TZVP) level. Values, with respect to reactant, are given at different points along the reaction coordinate (RC), the transition states (TS) and products. Red: HIS 119, dark green: GLN 18, purple: ASN 163, dark blue: ASN 282, orange: TYR 284, light blue: GLU 392, pink: ARG 75, light green: CYS 164.

The residues that directly interact with the nucleophile (Tyr284, Asn282, Arg75) or with the acid/base catalyst (Asn163, which also interacts with O2FUC and Arg75) are among those identified to increase the T/H ratio when properly mutated (Teze et al., 2014). Bissaro et al. recently proposed that mutations impairing the optimized electron displacement system that the enzyme uses to catalyze hydrolysis (which includes the nucleophile and the oxocarbenium TS) destabilize the deglycosylation TSs and lead to accumulation of the glycosyl-enzyme intermediate; such accumulation has been linked to an increased T/H ratio (Bissaro et al., 2015a). Hydrophobic interactions (e.g., C-H–π interactions) that increase the affinity of the enzyme for glycoside-like acceptors are another factor that can favor transglycosylation (Tran et al., 2010; Bissaro et al., 2015a,b). In the present system, the acceptor substrate can interact with Trp312. On the other hand, mutation of the catalytic acid/base Glu to Asp in NkBgl also led to unforeseen transglycosylation activity (Jeng et al., 2012). The crystal structure of the mutant revealed a water molecule close to the shorter Asp side chain and the anomeric carbon, so it was proposed that Asp plays its role through this nearby water molecule. A structurally conserved water molecule observed in a wild-type enzyme belonging to family GH101 was also proposed to enable a Grotthuss proton shuttle between the protein carboxylate and the glycosidic oxygen (Gregg et al., 2015). Still, the particular reasons why all these mutations in the −1 subsite favor transglycosylation over hydrolysis have not been fully unveiled.

Two factors make it a great challenge to reproduce the effect of mutations computationally or to clearly identify their origin. These are the small energy-barrier differences implied by these mutagenesis experiments and the high dimensionality of enzymatic systems. For example, in the case of Tyr284, a transglycosylation yield of 76% (but a lower rate) was measured for the Tyr284Phe mutant, compared with 36% for the WT enzyme (Teze et al., 2014). This was interpreted as a destabilizing effect of the mutation that affected hydrolysis more than transglycosylation. Our qualitative analysis of electrostatic interactions, however, estimates that in WT Ttβ-gly Tyr284 provides a slightly higher stabilization in the transglycosylation case. Attempts to reproduce the experimental observations by directly introducing the Tyr284Phe mutation into frames H, T<sup>I</sup>, and T<sup>II</sup> and recalculating the potential energy profiles also failed. Thus, further computational studies of mutants that take into account structural rearrangements upon mutation (even if small) would be needed to try to reproduce these trends. More importantly, however, these analyses cannot capture the subtle changes in energy barriers involved in these cases, which are below the current "chemical accuracy."
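The size of the effect in the Tyr284Phe example can be estimated by converting the two yields into an apparent shift of the transglycosylation vs. hydrolysis barrier gap. The sketch rests on simplifying assumptions: a two-channel kinetic partition Y = k_T/(k_T + k_H), equal transition-state-theory prefactors, no secondary hydrolysis of the product, and an assumed temperature of 298.15 K (the assay temperature is not given here). The yield carries no information about the absolute rate, which the text notes is lower for the mutant. The function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def partition_shift(yield_wt: float, yield_mut: float, temperature: float = 298.15) -> float:
    """Apparent change (mutant - WT) of dG‡_T - dG‡_H, in kcal/mol.

    Negative values mean the mutation shifts the partition toward
    transglycosylation. Both yields must lie strictly between 0 and 1.
    """
    def gap(y: float) -> float:
        # dG‡_T - dG‡_H = RT ln[(1 - Y) / Y] for a two-channel kinetic partition
        if not 0.0 < y < 1.0:
            raise ValueError("yields must lie strictly between 0 and 1")
        return R_KCAL * temperature * math.log((1.0 - y) / y)

    return gap(yield_mut) - gap(yield_wt)


if __name__ == "__main__":
    print(f"{partition_shift(0.36, 0.76):.2f} kcal/mol")
```

Under these assumptions, going from 36% to 76% yield corresponds to a shift of only about −1.0 kcal/mol in the barrier difference. That is of the same order as the ~1 kcal/mol "chemical accuracy" limit, which is why such mutational effects are hard to reproduce computationally.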

### CONCLUSIONS

Here we have presented a QM(DFT)/MM study of the glycosylation, hydrolysis and transglycosylation steps catalyzed by wild-type Thermus thermophilus β-glycosidase (family GH1), a retaining glycosyl hydrolase for which a transglycosylation yield of 36% has been determined experimentally with pNP-Fuc and BnON(Me)-Glc substrates.

In the glycosylation step (the first step of the double-displacement mechanism of retaining GHs), the pNP-Fuc substrate establishes a strong network of hydrogen bonds at the −1 subsite. It is found to be bound in an only slightly distorted <sup>4</sup>C<sup>1</sup> conformation, contrary to what has been reported when the substrate is a Glc disaccharide. The transition state for this step has a strong oxocarbenium character (close to a <sup>4</sup>H<sup>3</sup> conformation but with some <sup>4</sup>E features). At this TS, proton transfer from Glu164 to the leaving group has not yet started; it takes place at an OE2GLU338-C1FUC distance of ∼2 Å with the QM(large)/MM partition and the PBE0/TZVP QM level. The potential energy barrier at this level of theory (20.4–22.0 kcal/mol) is in qualitative agreement with the experimental barrier (17.1 kcal/mol). Using a smaller QM region increases this value by more than 5 kcal/mol. The size of the QM region also changes the location of the TS along the reaction coordinate and thus seems to affect the nucleophilic and acid/base characteristics of Glu338 and Glu164, respectively.

For the hydrolysis and transglycosylation reactions (competing reactions in the deglycosylation step), a significant reduction of the barrier heights is also observed with the QM(large)/MM partition. Nevertheless, the T vs. H trends are well reproduced with the QM(small)/MM partition, provided that the PBE0/TZVP level is used to obtain the energies (or also for geometry optimization). The transglycosylation potential energy barrier is predicted to be 1.2–2.6 kcal/mol higher than the hydrolysis one. Hydrolysis and transglycosylation transition states are both very loose, with strong oxocarbenium-like character and the proton not yet transferred. Their ring conformation falls between <sup>4</sup>H<sup>3</sup> and <sup>4</sup>E, and they present very similar atomic charges, except for the more negative charge of the oxygen atom of the attacking water compared with that of the attacking 4-OH group of Glc. Structural differences appear between the glycosylation and deglycosylation steps, with the former having a more pronounced SN2 character. The proton transfer is also more advanced in the deglycosylation TSs than it was in the glycosylation TS. Differences between the hydrolysis and transglycosylation TSs are also observed, but they depend strongly on the level of theory used. At the QM(PBE0/TZVP)/MM level, the potential energy maximum for transglycosylation is more advanced along the reaction coordinate than the hydrolysis one. It has a shorter O4GLC-C1FUC (forming bond) distance and a longer OE2GLU338-C1FUC (breaking bond) distance, although the HAcc proton is closer to the Glu164 base in hydrolysis. QM(SCC-DFTB)/MM free energy maxima show the inverted situation, although large geometric fluctuations are seen for the hydrolysis TS, indicating a large ensemble of TS configurations covering a range of reaction coordinate values. 
Interestingly, some interactions between the substrates (3-OHGLC group) or with neighboring water molecules (WAT432) have been identified that stabilize the oxocarbenium ion-like TSs through interaction with O5FUC and O4FUC. An analysis of the electrostatic interactions of the substrates with the residues at the −1 subsite has also been performed. However, correlating these results with the mutations experimentally found to alter the T/H ratio is very challenging. The energy-barrier differences involved are small, and mutation may cause structural rearrangements (even if small). The analysis suggests, though, that perturbing the Glu392-Fuc interaction could increase the T/H ratio, either by direct mutation of this residue or indirectly, as reported experimentally for the N390I and F401S mutants.

### AUTHOR CONTRIBUTIONS

SR-T performed the calculations. SR-T, ÀG-L, and LM wrote the manuscript. SR-T, JL, ÀG-L, and LM contributed to the analysis and discussion of the results.

### ACKNOWLEDGMENTS

The authors acknowledge the financial support from the Spanish Ministerio de Ciencia e Innovación through contract number CTQ2017-83745-P. LM thanks the Universitat Autònoma de Barcelona for the Talent program.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2019.00200/full#supplementary-material

### REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Romero-Téllez, Lluch, González-Lafont and Masgrau. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Effect of Cofactor Binding on the Conformational Plasticity of the Biological Receptors in Artificial Metalloenzymes: The Case Study of LmrR

Lur Alonso-Cotchico <sup>1,2</sup>, Jaime Rodríguez-Guerra Pedregal <sup>1</sup>, Agustí Lledós <sup>1</sup> and Jean-Didier Maréchal <sup>1</sup>\*

<sup>1</sup> Departament de Química, Universitat Autònoma de Barcelona, Barcelona, Spain, <sup>2</sup> Stratingh Institute for Chemistry, University of Groningen, Groningen, Netherlands

### Edited by:

Vicent Moliner, University of Jaume I, Spain

#### Reviewed by:

Marco De Vivo, Istituto Italiano di Tecnologia, Italy

Pedro Alexandrino Fernandes, Universidade do Porto, Portugal

\*Correspondence:

Jean-Didier Maréchal jeandidier.marechal@uab.cat

#### Specialty section:

This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry

Received: 15 January 2019 Accepted: 18 March 2019 Published: 10 April 2019

#### Citation:

Alonso-Cotchico L, Rodríguez-Guerra Pedregal J, Lledós A and Maréchal J-D (2019) The Effect of Cofactor Binding on the Conformational Plasticity of the Biological Receptors in Artificial Metalloenzymes: The Case Study of LmrR. Front. Chem. 7:211. doi: 10.3389/fchem.2019.00211

The design of Artificial Metalloenzymes (ArMs), which result from the incorporation of organometallic cofactors into biological structures, has grown steadily in the last two decades, and important new-to-Nature reactions have been achieved. Such design efforts could greatly benefit from an understanding of the structural impact that the inclusion of organometallic moieties may have on the biological host. To date, though, our understanding of this phenomenon is very partial. This lack of knowledge is one of the reasons why first-generation ArMs generally display relatively poor catalytic profiles. In this work, we approach this matter by assessing the dynamics and stability of a series of ArMs. These result from the inclusion, via different anchoring strategies, of a variety of organometallic cofactors into the Lactococcal multidrug resistance regulator (LmrR) protein. To this aim, we coupled standard force field-based techniques such as Protein-Ligand Docking and Molecular Dynamics simulations with a variety of trajectory convergence analyses. These analyses can assess both the stability and the flexibility of the different systems under study upon cofactor binding. Together with the experimental evidence obtained in other studies, we provide an overview of how these changes can affect the catalytic outcomes of the different ArMs. Fundamentally, our results show that the convergence analysis used in this work can assess how the inclusion of synthetic metallic cofactors in proteins can give rise to different structural modulations of their host. 
These conformational modifications are key to achieving the desired catalytic activity, and identifying them properly can help improve the quality and the success rate of ArMs.

Keywords: molecular modeling, artificial metalloenzymes, molecular dynamics, interactive analysis, cofactor binding, molecular plasticity

## INTRODUCTION

Incorporating homogeneous catalysts into biological scaffolds (e.g., proteins, DNA, or peptides) has become a common strategy to expand the scope of the biological space and produce biocompatible man-made biocatalysts (Schwizer et al., 2017; Diéguez et al., 2018). These de novo enzymes, also referred to as Artificial Metalloenzymes (ArMs), can be generated by numerous strategies. These include post-translational approaches, i.e., supramolecular (Ohashi et al., 2003; Mahammed and Gross, 2005; Reetz and Jiao, 2006; Bos et al., 2015), covalent (Reetz et al., 2002; Bos et al., 2013), or dative (Kokubo et al., 1983; Van De Velde et al., 2000) interactions. They also include the incorporation of unnatural amino acids (UAA) (Drienovská et al., 2015, 2017), via sequencing approaches or by direct expression through cellular vectors. The ArM design process can be divided into two stages: (1) discovery, when a first biohybrid shows some catalytic activity for a given reaction, and (2) optimization, when the initial candidates are chemically and/or genetically altered to improve activity in terms of yield, substrate selectivity, or regiospecificity. Whatever the stage of ArM design, the most important and complex molecular variable that designers need to control is the stability of the interactions between the host and the artificial cofactor. This is a sine qua non condition for reaching pre-reactive resting states and catalytically competent geometries after substrate binding.

Foreseeing the quality of host-cofactor complementarity requires extensive molecular knowledge and remains a challenging step on the path to experimentally efficient candidates. In fact, experimentalists base most of their designs on trial-and-error strategies until they reach a first hit. In a way, designers are engaged in an unfair battle against evolution, since they try to find sufficiently good affinities between two moieties that never occur together in Nature. One of the predominant variables defining the quality of the interaction between the two entities is the conformational adaptation of the receptor upon cofactor binding. Indeed, for any de novo design, one of the major weaknesses is the poor consideration of protein dynamics during the design process (Hammes-Schiffer and Benkovic, 2006; Henzler-Wildman and Kern, 2007; Nagel and Klinman, 2009; Hammes et al., 2011; Callender and Dyer, 2014; Maria-Solano et al., 2018). This is probably why the catalytic efficiency of new candidates is frequently many orders of magnitude lower than that of naturally occurring enzymes (Jiang et al., 2008; Röthlisberger et al., 2008; Siegel et al., 2010). Despite the increasing number of ArMs over the last decades, little has been done to estimate the sensitivity of the biological host to the insertion of non-natural cofactors. Likewise, little is known about how this sensitivity, in turn, conditions the nature of the resting state of the ArM prior to any catalytic step. In this respect, in silico methods can be very helpful.

Molecular Modeling has been widely used to decode the nature of the dynamical events involved in the structure-function relationships of naturally occurring biological macromolecules. From short- to large-scale motions, theoretical studies (e.g., Molecular Dynamics (MD) simulations and Normal Mode Analysis) constantly show how the dynamics of the protein host is influenced by the presence or absence of substrates and/or inhibitors. They also reveal a tight relationship between these changes and the accessibility of catalytically efficient configurations (Dutta and Mishra, 2017; Sen et al., 2017; Sharma et al., 2017; Wilson and Wetmore, 2017; Wilson et al., 2017; Kamariah et al., 2018; Luirink et al., 2018; Rout et al., 2018; Schlee et al., 2018). It is therefore legitimate to use computation to assess to what extent bridging chemical and biological entities disturbs the natural conformational space of biomolecules, and how damaging or beneficial this can be for catalysis.

Our group entered the field of ArMs about a decade ago and has focused both on understanding the mechanisms of non-natural enzymes and on providing protocols for the design of new systems. Our strategies are mostly based on integrated protocols, in which the physical models can range from Quantum Mechanics (QM) to Molecular Mechanics (MM) approaches (Muñoz Robles et al., 2015). One of the questions we wanted to address is the magnitude of the conformational rearrangement experienced by the biological host upon cofactor inclusion. Simulating this phenomenon requires substantial improvements of MM methodologies for metal-mediated recognition processes. In previous work, we decoded the electronic origin of the control of the binding-site motions and helix rearrangements in the ArM constructed by inserting salophen into the heme oxygenase apo-enzyme (Muñoz Robles et al., 2011). Another interesting case came from describing how the inclusion of organometallic complexes into a protein scaffold can alter its structure, leading to significant variations in the catalytic outcomes (Drienovská et al., 2017; Villarino et al., 2018). These studies provide some clues about the structural sensitivity of the receptor upon inclusion of the organometallic moiety. However, no clear tendencies could be drawn because they consisted of case-specific analyses. To shed light on this matter, we perform a structural assessment benchmark focusing on a single receptor loaded with different homogeneous catalysts.

Over the past few years, Roelfes et al. have focused on the Lactococcal multidrug resistance Regulator (LmrR) protein, a transcriptional repressor from Lactococcus lactis, as a biological host for a variety of organometallic cofactors, leading to a set of enantioselective artificial metalloenzymes including hydratases (Bos et al., 2013; Drienovská et al., 2015, 2017), cyclopropanases (Villarino et al., 2018), and Diels-Alderases (Bos et al., 2012). LmrR is a homodimeric protein with a particularly flat and hydrophobic dimer interface, capable of packing foreign aromatic molecules at the patch formed by tryptophans W96/W96' of helices α4/α4', respectively, which are located at the center of the cavity (**Figure 1**).

Most of the new LmrR-based designs result from (a) the post-translational inclusion of a phenanthroline moiety (**Figure 2A**) or (b) the expression of the unnatural amino acid (2,2'-bipyridin-5-yl)alanine (BpyA) (**Figure 2B**) at positions 89/89' of the dimeric LmrR protein. Both are nitrogen-based ligands of copper(II) ions and have shown interesting activity for the hydration of ketones as well as for the Friedel-Crafts alkylation reaction, in either LmrR- or DNA-based catalysis (Arnold, 2009; Boersma et al., 2010; Bos et al., 2013; Drienovská et al., 2015, 2017). Separately, the same protein was used as a scaffold for the supramolecular recognition of hemin, resulting in an ArM with acquired cyclopropanase activity (**Figure 2C**) (Villarino et al., 2018). Recently, it has been employed to covalently attach a Rh(I) complex, forming a biohybrid capable of hydrogenating CO<sub>2</sub> (Laureanti et al., 2019). These works therefore present a unique opportunity to test how sensitive a given host is to the insertion of different cofactors into its binding site.

From a computational point of view, bioorganometallic systems represent one of the most challenging modeling tasks, owing to the need to describe biometallic interactions with standard force fields (an area still under strong development; Riccardi et al., 2018) while also accounting for the possibly wide structural variations that result from embedding net charges (the metal center) in a naturally hydrophobic environment. As a result, identifying a set of in silico modeling techniques sensitive enough to predict the structural variations that arise from the incorporation of inorganic moieties into the protein host is fundamental to increasing the success rate of ArM designs.

Here, we focus on the dynamical implications of inserting non-natural organometallic cofactors into the core of the LmrR protein. The computational approach combines Protein-Ligand Docking and MD simulations, followed by diverse trajectory analyses including all-to-all root-mean-square deviation (RMSD), Principal Component Analysis (PCA), cluster counting, and root-mean-square fluctuation (RMSF) (see **Figure 3**). Together with visual inspection of the trajectories, these analyses provide valuable information in two main areas: (1) the MD simulation timescale required to ensure a proper conformational exploration of the hybrid systems, and (2) how the inclusion of different external organometallic moieties can promote conformational variations in the same biological host.

### MATERIALS AND METHODS

### Construction of the Dataset

The dataset comprises two natural forms of the LmrR protein found in the Protein Data Bank (PDB), namely the apo form of LmrR (1) and LmrR bound to its inhibitor daunomycin (2), together with three ArMs: the one resulting from the supramolecular interaction between the heme group and LmrR (3), and those obtained by linking the biaqua form of two Phen-Cu(II) (4) or BpyA-Cu(II) (5) cofactors to positions M89C/M89C' of the LmrR protein, as illustrated in **Table 1**.

Initially, quantum calculations were performed to obtain the optimized structures of the different organometallic cofactors. All complexes were optimized with Gaussian 09 (Frisch et al., 2009) at the DFT level using the B3LYP-D3 functional (Becke, 1993; Stephens et al., 1994; Grimme et al., 2010). The 6-31G(d,p) basis set was used for all non-metallic atoms. For the iron and copper atoms, the SDD effective core potential and its associated basis set (Dolg et al., 1987), supplemented with f polarization functions (Ehlers et al., 1993), were employed. For systems 1, 2, and 3, the X-ray structures of apo LmrR (PDB code: 3F8B), LmrR bound to daunomycin (PDB code: 3F8F), and LmrR bound to the heme group (PDB code: 6FUU), respectively, were used as starting points for the MD simulations. For system 3, the crystallographic data show several orientations of the heme group, corresponding to rotations of the porphyrin about the axis perpendicular to the macrocycle that passes through the iron ion. In this case, the starting point was therefore selected by consensus between the X-ray and docked structures, resulting in the orientation corresponding to the best-scored pose. Daunomycin (for system 1) and crystallographic water molecules were manually removed. For systems 4 and 5, for which no X-ray data were available, the aqua-bound forms of the Phen-Cu(II) and BpyA-Cu(II) cofactors, respectively, were linked to position 89/89' of LmrR via a covalent Protein-Ligand Docking approach, considering flexible side chains for all residues pointing toward the active site and using the X-ray structure of LmrR bound to daunomycin (PDB code: 3F8F) as a scaffold. All docking runs were performed with GOLD 5.2 (Verdonk et al., 2003).
The best-scored structures, together with the X-ray structures of systems 1, 2, and 3, were embedded in boxes of around 37,000 water molecules and used as starting points for 300 ns MD simulations. The simulations were run with the OpenMM 7.0 engine (Eastman and Pande, 2010), as wrapped in the OMMProtocol program (Pedregal et al., 2018).

The resulting trajectories were processed with an in-house automated procedure that computes a series of complementary analyses to assess convergence: (1) the changes in the RMSD of the coordinates with respect to the starting structure, (2) the RMSD evolution with respect to all the frames in the simulation, (3) a PCA of the structural variability of the backbone (Balsera et al., 1996; David and Jacobs, 2014), and (4) a cluster counting method (Daura et al., 1999; Smith et al., 2002), as described in the following section. Root-mean-square fluctuation (RMSF) plots were also calculated for all systems to identify the regions with the widest range of movement in each trajectory. For depiction purposes, the dihedral angles of each trajectory were analyzed with time-structure based Independent Component Analysis (tICA) (Naritomi and Fuchigami, 2013) and clustered by k-means with MSMBuilder (Harrigan et al., 2017). The resulting structures were superposed with UCSF Chimera's matchmaker command.
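Step (1) and the RMSF calculation can be illustrated with a minimal NumPy sketch. It assumes the trajectory has already been stripped of water and reimaged, and is held in a hypothetical array `traj` of shape (n_frames, n_atoms, 3) containing only the selected α-carbons. The function names are illustrative, and the hand-written Kabsch fit stands in for the MDTraj alignment the authors actually used.

```python
import numpy as np


def superpose(mobile, reference):
    """Return `mobile` (N, 3) optimally rotated and translated onto `reference` (Kabsch)."""
    mob_c = mobile - mobile.mean(axis=0)
    ref_c = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ ref_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against improper rotations (reflections)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rot.T + reference.mean(axis=0)


def rmsd(a, b):
    """Plain RMSD between two (N, 3) coordinate sets, without fitting."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def rmsd_to_reference(traj, reference):
    """RMSD of every frame of traj (n_frames, N, 3) after fitting it to `reference`."""
    return np.array([rmsd(superpose(frame, reference), reference) for frame in traj])


def rmsf(traj, reference):
    """Per-atom RMSF after fitting every frame to `reference` (same units as traj)."""
    fitted = np.array([superpose(frame, reference) for frame in traj])
    mean_structure = fitted.mean(axis=0)
    return np.sqrt(np.mean(np.sum((fitted - mean_structure) ** 2, axis=2), axis=0))
```

Passing the first frame as `reference` reproduces "RMSD with respect to the starting structure"; residues with large RMSF values are the flexible regions that were later excluded from the distance calculations.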

### A Few Words About the Analysis of MD Trajectories

The most common strategy to analyze trajectories for assessing the structural stability of biological macromolecules is to align the structures along the simulation and compute the root-mean-square deviation (RMSD) against a reference structure, which helps to determine whether the simulation has stabilized around a given average conformation. While easy to understand and calculate, RMSD analyses do not reveal the nature of the conformational states that are sampled and provide only partial information about the structural steadiness of the system. As a result, several authors have proposed additional procedures to obtain deeper information about the convergence status of MD simulations (Daura et al., 1999; Smith et al., 2002; Grossfield and Zuckerman, 2009; Knapp et al., 2011). One of them is the all-to-all RMSD, which, instead of calculating the RMSD against a single structure, considers all the conformations along the simulation, resulting in a two-dimensional matrix of RMSD measurements. The resulting plot helps to depict the different regions visited along the MD trajectory, which might be difficult to detect with a standard RMSD analysis. Another noteworthy analytical strategy is the cluster counting tool which, in combination with the above method, is useful to identify the rate at which new sub-states appear along the MD trajectory. Last, Principal Component Analysis (PCA) is a standard statistical procedure used to study the correlation within a dataset (Abdi and Williams, 2010). Within the scope of this work, it allows visual inspection of the conformational space explored along the simulation timescale, showing whether the system continues to visit new regions or not; this enables the user to see at a glance how stable the system is and therefore whether the trajectory is approaching convergence. As recommended by Grossfield and Zuckerman (2009), integrating these four complementary analytical tools can provide an accurate assessment of the conformational space explored as well as of the degree of convergence of an MD trajectory.

TABLE 1 | Definition of the different systems considered in this work.
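The all-to-all RMSD matrix described above can be sketched as follows. This is an illustrative NumPy version, not the authors' MDTraj-based script: each pair of frames is superposed using the closed-form Kabsch/SVD minimum RMSD before the value is stored, and `kabsch_rmsd` and `all_to_all_rmsd` are hypothetical names.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """Minimum RMSD between two (N, 3) conformations after optimal superposition."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # If the optimal orthogonal matrix is a reflection, flip the smallest singular value.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    msd = (np.sum(a ** 2) + np.sum(b ** 2) - 2.0 * np.sum(s)) / len(a)
    return float(np.sqrt(max(msd, 0.0)))


def all_to_all_rmsd(traj):
    """Symmetric (n_frames, n_frames) matrix of pairwise fitted RMSDs."""
    n = len(traj)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = kabsch_rmsd(traj[i], traj[j])
    return matrix
```

The double loop scales quadratically with the number of frames; for long trajectories, calling a library routine such as MDTraj's `md.rmsd(traj, traj, frame=i)` once per reference frame `i` is a faster way to fill each row. Square dark blocks along the diagonal of the plotted matrix correspond to long-lived sub-states.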

To compute the four analyses of the trajectories, we set up several scripts using Python 3.6 through the Jupyter Notebook interface (Kluyver et al., 2016). CPPTraj (Roe and Cheatham, 2013) was first used to remove the waters and reimage the system within the same periodic box. MDTraj (McGibbon et al., 2015) was then used to load and align the trajectories; this library also provided the RMSD calculations. PCA was calculated with the routines available in scikit-learn (Pedregosa et al., 2011), set to generate two components. Cluster counting followed the algorithm proposed by Smith et al. (2002), with an RMSD cut-off of 2.0 Å. Figures were plotted with matplotlib 2.0 (Hunter, 2007). In all cases, only the α-carbons belonging to the α-helix segments were considered for distance calculations, thus discarding the highly flexible regions of the protein scaffold evidenced in the per-residue RMSF plots (**Figure S6**).
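The two-component PCA mentioned above can be sketched with a plain SVD of the mean-centered, already superposed Cα coordinates. The authors used scikit-learn's `PCA`; the NumPy version below is an assumed equivalent whose projections should agree with it up to the sign of each component, and `pca_project` is a hypothetical helper.

```python
import numpy as np


def pca_project(aligned_traj, n_components=2):
    """Project superposed frames (n_frames, n_atoms, 3) onto their top principal components.

    Returns the projections (n_frames, n_components) and the fraction of the total
    variance explained by each retained component.
    """
    x = aligned_traj.reshape(len(aligned_traj), -1)  # one flattened row per frame
    x = x - x.mean(axis=0)                           # center each coordinate
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    projections = x @ vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)
    return projections, explained[:n_components]
```

Scattering the two projection columns, colored by simulation time, gives the kind of PCA plot discussed in the Results: overlapping spots indicate that the sub-states are structurally similar, while separated spots indicate divergent sub-states.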

The methods described above can depict the magnitude of the changes along an MD simulation but not their nature, i.e., which regions of the protein are involved in those structural variations. For this reason, it is very important to use them in combination with both visual inspection of the trajectories and tools able to identify local and global motions. In this regard, the structures obtained from tICA and k-means clustering can be very helpful. Because tICA tends to group the structural changes explained by slower motions in the first components (Naritomi and Fuchigami, 2013), it can be used to distinguish between fast and slow motions as the system evolves over time, which are normally related to local and global structural changes of the system, respectively. This allows the identification of the structural features that lead to the variability/stability detected by the above methods and, thus, a clearer assessment of the structural impact of incorporating the different external moieties into the protein scaffold.
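Why tICA places slow motions in its first components can be illustrated with a minimal linear implementation: it solves the generalized eigenproblem between a symmetrized time-lagged covariance matrix and the instantaneous covariance matrix, so the leading components carry the largest autocorrelation at the chosen lag. This is a simplified sketch, not MSMBuilder's implementation; the feature matrix `x` (for instance, sines and cosines of backbone dihedrals) and the `lag` value are assumptions.

```python
import numpy as np


def tica(x, lag=1, n_components=2):
    """Minimal tICA on features x (n_frames, n_features).

    Returns projections on the slowest components and their lag-time autocorrelations.
    """
    x = x - x.mean(axis=0)
    x0, xt = x[:-lag], x[lag:]
    n = len(x0)
    c0 = (x0.T @ x0 + xt.T @ xt) / (2 * n)  # instantaneous covariance
    ct = (x0.T @ xt + xt.T @ x0) / (2 * n)  # symmetrized time-lagged covariance
    # Whiten with C0^(-1/2), then diagonalize the lagged covariance in whitened space.
    evals0, evecs0 = np.linalg.eigh(c0)
    whiten = evecs0 / np.sqrt(evals0)
    autocorr, vecs = np.linalg.eigh(whiten.T @ ct @ whiten)
    order = np.argsort(autocorr)[::-1][:n_components]  # slowest (most autocorrelated) first
    components = whiten @ vecs[:, order]
    return x @ components, autocorr[order]
```

Clustering the leading projections (for example with k-means) and extracting representative frames from each cluster is one way to obtain structures that highlight the slow, global rearrangements.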

For further details about the computational procedure the reader is referred to the **Supplementary Material**.

### Non-metallic Cofactor Bound LmrR

To provide a reference for assessing the impact of incorporating artificial metallic cofactors into the LmrR protein scaffold, we first studied two experimental structures of LmrR available in the PDB without any bound metallic moieties: an apo form of LmrR (PDB code: 3F8B) (**Table 1**, system 1) and a form of LmrR bound to the daunomycin inhibitor (PDB code: 3F8F) (**Table 1**, system 2). This inhibitor is a fairly large hydrophobic molecule and is consequently quite reminiscent of the cofactors studied in this work.

The 300 ns Molecular Dynamics simulations of the apo (system 1) and daunomycin-bound (system 2) forms of LmrR revealed local conformational changes that mainly involve the ends of helices α4 and α4' and the β-hairpin loops of both monomers (**Figure S1**). For system 1, collective motions not found in system 2, related to the opening and closing of the LmrR interdimeric binding site, were also identified, as illustrated in **Figures S1**, **S2**. It appears that, with the daunomycin inhibitor sandwiched between tryptophans W96/W96' of helices α4/α4' at the center of the cavity, the protein scaffold does not display the significant global motions observed for the apo form of LmrR (**Figures S1**, **S3**, systems 1 and 2). The absence of the hydrophobic inhibitor at the dimer interface seems to promote closing of the pore by bringing helices α4/α4' closer: around 8 Å between the α-carbons of tryptophans W96/W96', which are located at the center of helices α4/α4' (from now on, this parameter will be used as a reference to assess the opening/closing of the dimer interface; see **Figure S2**, system 1). In contrast, the drug-bound form maintains an open arrangement of helices α4/α4' (around 12.5 Å; see **Figure S2**, system 2).
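The W96/W96' Cα distance used here as the opening/closing descriptor reduces to a per-frame Euclidean norm. In the sketch below, the 10 Å threshold is a hypothetical midpoint between the ~8 Å (closed) and ~12.5 Å (open) values quoted in the text, not a criterion used by the authors, and the function names are illustrative.

```python
import numpy as np

# Hypothetical threshold (Å): midpoint between the ~8 Å "closed" and ~12.5 Å "open"
# W96-W96' Cα distances reported in the text; not a value taken from the paper.
OPEN_THRESHOLD = 10.0


def interface_opening(ca_w96, ca_w96_prime):
    """Per-frame distance between the W96 and W96' α-carbons, each of shape (n_frames, 3)."""
    return np.linalg.norm(ca_w96 - ca_w96_prime, axis=1)


def fraction_open(distances, threshold=OPEN_THRESHOLD):
    """Fraction of frames whose W96-W96' Cα distance exceeds `threshold`."""
    return float(np.mean(distances > threshold))
```

With MDTraj, the same distances could be obtained from `md.compute_distances(traj, [[i, j]])`, where `i` and `j` are the atom indices of the two α-carbons; MDTraj reports nanometers, so the values must be multiplied by 10 before comparing them with Å thresholds.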

The combination of cluster counting, all-to-all RMSD, and PCA analyses (**Figures S3**–**S5**, systems 1 and 2) provides further information about the conformational sampling of systems 1 and 2. The appearance of new structural clusters reaches a plateau at 150 ns of the MD trajectory for both systems, which suggests converged structural sampling on the simulated timescale. This is consistent with the all-to-all RMSD analysis, which indicates that both systems fluctuate between well-defined sub-states, a behavior also associated with converged trajectories (**Figure S4**, systems 1 and 2). These sub-states correspond to structures that present certain divergences with respect to the X-ray structure, consistent with the local/global motions described above, associated mostly with the flexibility of the β-hairpin loops and the ends of helices α4/α4' and, only for system 1, with the closing of the dimer interface (**Figure S5**, systems 1 and 2).
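The cluster-count curve and its plateau can be illustrated with a simplified leader-style clustering run in time order over a precomputed RMSD matrix: a frame founds a new cluster only if it lies farther than the cut-off (2.0 Å, as used in this work) from every existing cluster founder. This is an assumed simplification of the Smith et al. (2002) procedure, not a reproduction of it, and the helper names are hypothetical.

```python
import numpy as np


def cluster_count_curve(rmsd_matrix, cutoff=2.0):
    """Cumulative number of clusters after each frame (leader-style, in time order)."""
    leaders = []  # indices of the frames that founded a cluster
    counts = np.empty(len(rmsd_matrix), dtype=int)
    for i in range(len(rmsd_matrix)):
        # Frame i founds a new cluster only if no existing founder is within the cut-off.
        if all(rmsd_matrix[i, j] > cutoff for j in leaders):
            leaders.append(i)
        counts[i] = len(leaders)
    return counts


def first_plateau_frame(counts):
    """Index of the first frame at which the final number of clusters is reached."""
    return int(np.argmax(counts == counts[-1]))
```

Multiplying the returned frame index by the interval between saved frames gives the time at which the last new cluster appeared; a curve that keeps rising until the end of the run suggests that sampling has not yet converged.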

Altogether, these analyses show that the natural motions of LmrR mostly involve the interface between monomers A and B, as well as their relative rotation, which is dampened in the inhibitor-bound form of the protein. Additionally, this state is associated with a wide active cavity resulting from direct hydrophobic interactions of W96/W96', A92/A92', and V15/V15' with the daunomycin inhibitor. From these results we can conclude, on the one hand, that a timescale of 150 ns is enough to achieve proper conformational sampling in the MD simulations of the apo and daunomycin-bound forms of LmrR and, on the other hand, that, overall, accommodating hydrophobic moieties at the hydrophobic interface would reduce the global conformational plasticity of LmrR.

### The Heme-Based Artificial Metalloenzyme

Our study then focused on the Artificial Metalloenzyme systems. We started with the ArM resulting from the supramolecular interaction between the heme group and the LmrR protein. Recent studies showed that the LmrR-heme system achieves efficient cyclopropanation (Villarino et al., 2018). The X-ray structure of LmrR bound to heme (PDB: 6FUU) displays the prosthetic group sandwiched between the two α4/α4' helices of the homodimer, with major hydrophobic interactions between the macrocycle and the side chains of the two tryptophans W96/W96', as well as polar interactions between N19/N19' and N14/N14' and the carboxylate groups of the heme moiety. Given the similar size and configuration of the heme group and the inhibitor daunomycin in the LmrR active site, we expected a similar impact on the protein dynamics for systems 2 (daunomycin-bound) and 3 (heme-bound). Indeed, the dynamics of both systems followed the same trend: in both cases, the presence of the planar hydrophobic molecule at the dimer interface increases the flexibility of helix α4', an effect that is more pronounced for the LmrR⊂heme ArM and mainly involves the distortion of this helix (**Figure 4**). This effect promotes a partial closing of the hydrophobic interface with respect to system 2 (**Figure S2**, systems 2 and 3), which is broader than for system 1 but narrower than for system 2, and is reflected in the larger number of sub-states found for this system: cluster counting shows that, in this case, the MD simulation reaches convergence with a maximum of 12 clusters of structures, in contrast to systems 1 and 2, which reach convergence with a maximum of eight clusters (**Figure S3**).
In addition, the all-to-all RMSD and PCA analyses show not only a larger number of sub-states, reflecting a broader structural exploration of the system (dark zones covering the diagonal of the all-to-all RMSD plot), but also greater flexibility (light areas in the background of the all-to-all RMSD plot) and divergence in the nature of these sub-states (non-overlapping spots in the PCA plot).

These results show that the presence of an organometallic complex at the hydrophobic interface does not change the dynamical tendency observed for the natural form of LmrR in system 2 but increases the plasticity of the system; this increase could only be deciphered by using cluster counting and PCA and was particularly related to the motion of helix α4'. Most likely, this effect can be explained by the presence at the LmrR active site of the porphyrin metal center, which is absent in the daunomycin-bound system 2. Interestingly, the observed motion of helix α4' was associated with the catalytic activity of the LmrR⊂heme ArM: the displacement of tryptophan W96', located in this helix, was key to vacating one of the axial faces of the porphyrin and thus making the metal center accessible to the substrates to reach pre-catalytic states for the cyclopropanase activity (Villarino et al., 2018).

### The Copper-Based Artificial Metalloenzymes

Next, we wanted to assess the effect of incorporating organometallic cofactors that, in this case, are covalently linked to the protein scaffold at positions 89/89'. These were the copper-bound nitrogenated compounds Ala-bipyridine (BpyA) and phenanthroline (Phen). Interestingly, despite sharing similar chemical properties, a similar size, and overall planarity, these ArMs have shown greatly different catalytic outcomes in previous reports on the enantioselective addition of water to conjugated ketones (Bos et al., 2013; Drienovská et al., 2017). These differences have been associated with the different lengths of the linkers that bind the aromatic rings of the nitrogenated moieties to the backbone (**Table 1**, systems 4 and 5), which seem to affect the overall behavior of the artificial systems (Drienovská et al., 2017). We therefore included these ArMs in our study to identify the key elements that promote such a differential effect.

For that purpose, we studied the conformational variability of the LmrR scaffold loaded with Phen-Cu(II) and BpyA-Cu(II) cofactors, modeled in their most likely biaqua resting state (**Table 1**, systems 4 and 5). The cofactors were included into both monomers of the LmrR scaffold via a covalent Protein-Ligand Docking procedure, which showed good affinity for the protein binding site in both cases (**Table S1**), slightly better for Phen-Cu(II)-(H<sub>2</sub>O)<sub>2</sub> (38.89 ChemScore units) than for BpyA-Cu(II)-(H<sub>2</sub>O)<sub>2</sub> (34.85 ChemScore units). Consistently with the previous systems, the best-scored poses were selected and used as starting points for 300 ns of Molecular Dynamics simulation.

Given their structural similarities, the inclusion of the Phen and BpyA cofactors into LmrR might be expected to have an impact on the scaffold similar to that of the heme organometallic cofactor (system 3). However, systems 4 and 5 present strongly different dynamic tendencies, which are particularly noteworthy for the Phen-containing ArM. Results showed a strong decrease in the global motions and plasticity of the system with respect to systems 1, 2, and 3 (**Figures S4**–**S6**, systems 1–4), the main motions being reduced to the ends of helices α4/α4'. The appearance of new clusters of sub-states converges at the very beginning of the simulation, with no changes identified along the MD timescale, consistent with the very low plasticity of the system shown in **Figure S3** for system 4 (only 3 clusters of structures were identified after 300 ns of MD simulation). In this configuration, the biaqua form of the Phen-Cu(II) cofactors appeared stabilized mainly by π-stacking with phenylalanine residues F93/F93' and hydrophobic contacts with I103/I103' and the side chains of residues R90/R90'. In addition, hydrogen bonds between the tail of the cofactor and residues N19/N19', as well as between the waters bound to the metal center and aspartates D100/D100', seemed to further stabilize the location of the cofactor at the entrance of the dimer interface (**Figure 5A**). This binding mode is accompanied by a broader dimer interface (around 14 Å) resulting, as for the daunomycin-bound system 2, from hydrophobic interactions between the active-site residues and the aromatic cofactor. Additionally, all-to-all RMSD and PCA analyses evidenced a significantly reduced plasticity of the LmrR⊂Phen system (**Figures S3**, **S4**, system 4).
The former analysis showed few sub-states, which were especially well defined and strongly present after 50 ns of MD simulation (the large dark area whose center lies on the diagonal of the plot). Furthermore, the latter analysis showed very low divergence among the few sub-states identified (the PCA data appear superimposed). These results evidence that the LmrR system loaded with the Phen-Cu(II) cofactors is more stable than the natural form of LmrR (system 1). Importantly, in contrast to the flexible, catalytically active LmrR⊂heme system (system 3), this ArM, which presents the highest stability among all those considered in this study, showed very good catalytic activity, in this case for the enantioselective hydration of ketones (Bos et al., 2013). This observation suggests that one of the factors driving the promiscuity of LmrR-based artificial enzymes is related to the flexibility of the protein backbone, which needs to be controlled in order to perform the different catalytic transformations.

Regarding system 5, the results showed a totally different scenario. The MD simulation evidenced the poor ability of the BpyA-Cu(II)-(H<sub>2</sub>O)<sub>2</sub> cofactors to reach the center of the cavity. Instead, they pointed toward the solvent during most of the trajectory, and at no point along the 300 ns of MD simulation did the complexes appear embedded at the dimer interface. This results from a lack of hydrophobic and polar interactions between the active-site residues and the bipyridine cofactors (**Figure 5B**). The bipyridine cofactor has a shorter tail linking the bipyridine rings and the protein backbone (the beta carbon) than phenanthroline (five atoms, see **Figure 2**), which makes the complexes lie outside the hydrophobic cavity (**Figure 5B**). Consequently, the absence of aromatic ligands at the dimer interface promotes a narrower arrangement between helices α4/α4' (around 8 Å between the α-carbons of residues W96/W96'; see **Figure S2**, system 5), the same distance observed for the apo form of LmrR (system 1). Additionally, the unstable positioning of the bipyridine cofactors extends to the rest of the protein backbone, increasing its plasticity and distorting the global protein structure (**Figure 6**). This behavior is well captured by the cluster counting, all-to-all RMSD, and PCA analyses (**Figures S3**–**S5**). As expected, the number of identified sub-states increased markedly for this system, being grouped into a total of nine clusters of structures along 300 ns of MD simulation, in contrast to the three clusters identified for system 4 (LmrR⊂Phen). In addition, these analyses revealed a more flexible system (light areas in the background of the all-to-all RMSD plot) as well as a strong divergence among the identified sub-states (the spots in the PCA plot do not superimpose along 300 ns of MD simulation).
RMSF analysis also revealed this difference: system 5 showed higher average fluctuation values for all chains (see **Figure S6**). Interestingly, this ArM was not able to efficiently perform the enantioselective hydration of ketones, presenting much lower conversion and enantioselectivity than the LmrR⊂Phen ArM (Bos et al., 2013; Drienovská et al., 2017). When the mechanism of system 5 was deciphered (Drienovská et al., 2017), an aspartate located in helix α4' was found to be responsible for boosting the nucleophilic attack, the first step of the hydration reaction. For this to occur enantioselectively, the cofactor-substrate complex needs to be not only located inside the active site but also stabilized in a single orientation. For this reason, the stability of the interactions at the active site seems key for the desired reaction to proceed in the LmrR-based artificial hydratases (systems 4 and 5), in contrast to the LmrR⊂heme ArM, which requires the flexibility of helix α4' to reach pre-catalytic structures.

In summary, the results showed that the stabilizing interactions between the active-site residues and the copper cofactors seem crucial in providing stability to the global motions of the LmrR scaffold. In this regard, the hydrophobic and polar interactions and the length of the cofactor linkers play a critical role in reducing or increasing the flexibility of the system. Additionally, it is reasonable to think that the presence of non-stabilized metal moieties close to the protein backbone is one of the factors that may promote the strong distortion of the protein backbone observed for LmrR⊂BpyA (system 5). As a result, only the LmrR⊂Phen ArM is able to maintain an asymmetric environment around the copper cofactor during the entire simulation timescale, which is reflected in the catalytic efficiency of this artificial hydratase (Bos et al., 2013).

### CONCLUSIONS

This study aims to shed light on questions concerning the structure-function relationship in proteins, especially focusing on organometallic-containing systems such as ArMs. Dealing with organometallic moieties adds an extra layer of difficulty to modeling and simulation exercises due to the challenging task of describing non-natural metal-containing moieties as part of the standard force fields, as well as their structural effect when interacting with the biological scaffold. Thus, the identification of a computational modeling workflow, together with an analytical protocol sensitive enough to decipher the changes that result from the incorporation of external moieties into proteins, appears to be of great relevance, especially in the enzyme design field.

To guarantee the quality of the systems used in this study, we made use of a set of well-characterized ArMs (LmrR loaded with heme, copper-bound phenanthroline, or copper-bound bipyridine) designed by Roelfes et al., whose models have been experimentally validated in previous works (Drienovská et al., 2017; Villarino et al., 2018), to perform a comparative analysis of simulations resulting from the combination of Quantum Mechanics, Protein-Ligand Docking, and Molecular Dynamics techniques. Our results provide evidence that the convergence analysis used in this work can help explain the structural trends of the different systems under study. They show how the insertion of different non-natural metallic cofactors into the same biological scaffold may induce different structural modulations that, in addition, are key to the success of the desired catalytic activity. In the context of ArM design and in silico exercises, it is therefore crucial to first assess (or, at least, to treat as a variable to be controlled sooner or later in the design pipeline) the degree of rigidity/flexibility of the receptor-cofactor partner throughout MD simulations, to understand how this can affect the reaction mechanism of interest.

Aiming to include dynamical notions in the computer-aided design of ArMs, here we show the strength of combining an integrative strategy (docking + MD simulations) with convergence-based analyses, including all-to-all RMSD, PCA, RMSF, and cluster counting, to characterize the structural behavior of these complex organometallic systems. Our results also contribute to the debate on the benefit of accounting for stable vs. flexible protein scaffolds to drive the design of the first generations of ArMs. This work makes clear that, owing to the large number of degrees of freedom controlling the different catalytic mechanisms occurring in ArMs, each of them must be considered as a separate system with its own particular patterns and features (like flexibility/rigidity): ideally, their specific structural requirements would need to be evaluated on a case-by-case basis. Unfortunately, this means that we are still far from establishing a universal metric to guide the design of any ArM. Thus, at present, we find it essential, at least, to establish a proper protocol defining which modeling and analytical tools, such as the ones selected for this work, will ensure that enough structural knowledge is gained before investing further effort.

### AUTHOR CONTRIBUTIONS

JR-G and LA-C performed the calculations and analysis. J-DM designed the project. JR-G, LA-C, and J-DM discussed the analysis and wrote the manuscript. AL participated in the discussion and the writing of the manuscript.

### ACKNOWLEDGMENTS

All the authors are thankful for the support given by the Spanish grant CTQ2017-87889-P as well as the Generalitat de Catalunya grant 2017 SGR 1323. Support of COST Action CM1306 is gratefully acknowledged. JR-G and LA-C thank the Generalitat de Catalunya for their Ph.D. FI grants.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2019.00211/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Alonso-Cotchico, Rodríguez-Guerra Pedregal, Lledós and Maréchal. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Conformational Itinerary of Sucrose During Hydrolysis by Retaining Amylosucrase

Santiago Alonso-Gil <sup>1</sup> , Joan Coines <sup>1</sup> , Isabelle André<sup>2</sup> \* and Carme Rovira<sup>1,3</sup> \*

<sup>1</sup> Departament de Química Inorgànica i Orgànica (Secció de Química Orgànica) and Institut de Química Teòrica i Computacional, Universitat de Barcelona, Martí i Franquès 1, Barcelona, Spain, <sup>2</sup> Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés, LISBP, Université de Toulouse, CNRS, INRA, INSA, Toulouse, France, <sup>3</sup> Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

#### Edited by:

Vicent Moliner, University of Jaume I, Spain

#### Reviewed by:

Jon M. Matxain, University of the Basque Country, Spain; Reynier Suardiaz, University of Bristol, United Kingdom; Ramon Crehuet, Instituto de Química Avanzada de Cataluña (IQAC), Spain

\*Correspondence:

Isabelle André, isabelle.andre@insa-toulouse.fr; Carme Rovira, c.rovira@ub.edu

#### Specialty section:

This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry

Received: 06 February 2019 Accepted: 02 April 2019 Published: 30 April 2019

#### Citation:

Alonso-Gil S, Coines J, André I and Rovira C (2019) Conformational Itinerary of Sucrose During Hydrolysis by Retaining Amylosucrase. Front. Chem. 7:269. doi: 10.3389/fchem.2019.00269

By means of QM(DFT)/MM metadynamics we have unraveled the hydrolytic reaction mechanism of Neisseria polysaccharea amylosucrase (NpAS), a member of family GH13. Our results provide an atomistic picture of the active site reorganization along the catalytic double-displacement reaction, clarifying whether the glycosyl-enzyme reaction intermediate features an α-glucosyl unit in an undistorted <sup>4</sup>C<sub>1</sub> conformation, as inferred from structural studies, or a distorted <sup>1</sup>S<sub>3</sub>-like conformation, as expected from mechanistic analysis of glycoside hydrolases (GHs). We show that, even though the first step of the reaction (glycosylation) results in a <sup>4</sup>C<sub>1</sub> conformation, the α-glucosyl unit readily changes toward a distorted conformation as the active site preorganizes for the next reaction step (deglycosylation), in which an acceptor molecule (a water molecule, for the hydrolytic reaction) performs a nucleophilic attack on the anomeric carbon. The two conformations (<sup>4</sup>C<sub>1</sub> and E<sub>3</sub>) can be viewed as two different states of the glycosyl-enzyme intermediate (GEI), but only the E<sub>3</sub> state is preactivated for catalysis. These results are consistent with the general conformational itinerary observed for α-glucosidases.

Keywords: glycoside hydrolases, quantum mechanics/molecular mechanics, double-displacement reaction, ab initio molecular dynamics, metadynamics, conformational analysis

### INTRODUCTION

Glycoside hydrolases (GHs) are enzymes responsible for the hydrolysis of glycosidic bonds in carbohydrates. GHs are of fundamental interest in glycobiology and glycomics (Gamblin et al., 2009) since they are responsible for the modification of polysaccharides and glycoconjugates involved in numerous biological processes, such as cell-cell recognition, and for polysaccharide degradation in biofuel processing (Pauly and Keegstra, 2008). These enzymes also provide paradigms for enzymatic catalysis that extend beyond the bounds of carbohydrate chemistry (Koshland, 1953; Davies et al., 2012). GHs are systematically classified into 156 families (as of March 2019) according to their sequence similarity (http://www.cazy.org) (Cantarel et al., 2009). Enzymes from the same family usually share the same catalytic mechanism, either with retention or inversion of the anomeric configuration (**Figure S1**), in which an oxocarbenium ion-like transition state is formed (Rye and Withers, 2000; Zechel and Withers, 2000). Enzymes from the same family also share the same conformational itinerary of the saccharide unit at the -1 subsite (hereafter named -1 sugar), along which this unit changes conformation following either a straight line or an equatorial path in Stoddart's diagram (**Figure S2**), in the diagram representing either the Northern or the Southern projection of the puckering sphere (Iglesias-Fernandez et al., 2015). Because only a few conformations satisfy the stereoelectronic requirements for a stable oxocarbenium ion (<sup>4</sup>H<sub>3</sub>, <sup>3</sup>H<sub>4</sub>, B<sub>2,5</sub>, <sup>2,5</sup>B, <sup>3</sup>E, E<sub>3</sub>, <sup>4</sup>E, and E<sub>4</sub>), the number of possible itineraries is limited (Speciale et al., 2014; Ardèvol and Rovira, 2015). However, for several GH families the itinerary has not yet been established or remains controversial.

Members of the GH13 family are retaining GHs that catalyze the cleavage of α-glucosidic linkages, such as α-amylase, responsible for starch and amylose degradation. This family also includes unique enzymes called amylosucrases, which catalyze, from sucrose alone, the synthesis of α-D-glucopyranosyl homopolymers and oligomers, accompanied by limited sucrose hydrolysis (Albenne et al., 2004). These enzymes are thus considered to be both glycoside hydrolases and transglucosylases. Since both activities share the first step of the reaction, the substrate conformational itinerary from the Michaelis complex (MC, **Figure S1**) to the covalent glycosyl-enzyme intermediate (GEI) is expected to be identical for hydrolysis and transglucosylation.

Several X-ray structures of GH13 enzymes have shown that the -1 sugar ring in the MC adopts a <sup>4</sup>C<sub>1</sub> conformation (Fujimoto et al., 1998; Mirza et al., 2001; Skov et al., 2002). Unlike in many GHs, the ring is not distorted, because the α-stereochemistry at the anomeric carbon already places the leaving group (+1 sugar) in a reactive (axial) orientation for the S<sub>N</sub>2 reaction to take place, whereas in β-glucosidases, for example, this orientation requires ring distortion (Biarnés et al., 2006, 2011). In fact, α-glucosidases have been suggested to follow a <sup>4</sup>C<sub>1</sub> → [<sup>4</sup>H<sub>3</sub>]<sup>‡</sup> → <sup>1</sup>S<sub>3</sub> itinerary, the reverse of the <sup>1</sup>S<sub>3</sub> → [<sup>4</sup>H<sub>3</sub>]<sup>‡</sup> → <sup>4</sup>C<sub>1</sub> itinerary of β-glucosidases (Davies et al., 2012). These differences are being exploited in the field of inhibitor design (Beenakker et al., 2017).

Concerning the glycosyl-enzyme covalent intermediate (GEI) of the reaction, structural analyses of GH13 amylosucrase show that the -1 sugar adopts a <sup>4</sup>C<sub>1</sub> conformation (Jensen et al., 2004). In the case of α-amylase, the GEI trapped using sugar analogs exhibits a similar <sup>4</sup>C<sub>1</sub> conformation (Zhang et al., 2009; Caner et al., 2016). QM/MM calculations of human pancreatic α-amylase captured the role of the catalytic residues during the hydrolysis mechanism (Pinto et al., 2015), but the conformational itinerary of the -1 sugar was not investigated. On the other hand, the GEI of Bifidobacterium adolescentis sucrose phosphorylase, also a GH13 enzyme, shows the -1 sugar distorted in a <sup>1</sup>S<sub>3</sub> conformation, suggesting a completely different conformational pathway (<sup>4</sup>C<sub>1</sub> → TS → <sup>1</sup>S<sub>3</sub>) (Mirza et al., 2006). Since both enzymes belong to the same family and act on the same substrate, this discrepancy is puzzling. The <sup>4</sup>C<sub>1</sub> conformation of the GEI also contrasts with the irreversible cyclosulphate inhibitors of α-glucosidases from other GH families, which show an unambiguous <sup>1</sup>S<sub>3</sub> conformation of the GEI analog (Artola et al., 2017). To solve this conundrum, we here uncover the conformational itinerary followed during catalysis by the α-glucosyl unit in Neisseria polysaccharea amylosucrase (NpAS), a member of family GH13 for which a structure of the Michaelis complex with the natural substrate (sucrose) is available (**Figure 1**). Our simulations, performed by means of QM/MM metadynamics methods, show that the -1 sugar of the GEI can exhibit either a relaxed <sup>4</sup>C<sub>1</sub> conformation or a distorted E<sub>3</sub>-like conformation, depending on whether the catalytic water is properly placed and oriented for catalysis or is still on its way in.

### RESULTS

### Michaelis Complex Structure

To model the Michaelis complex of NpAS, we reverted the Glu328Gln mutation of the X-ray structure of the NpAS-sucrose complex and performed classical MD simulations. The conformation of the α-glucosyl ring at the -1 subsite changed from <sup>4</sup>C<sub>1</sub> to B<sub>3,O</sub> during the first 6 ns of the MD simulation, owing to an increase in the distance between the nucleophile and the anomeric carbon, but returned to <sup>4</sup>C<sub>1</sub> during the remaining 8 ns, indicating that <sup>4</sup>C<sub>1</sub> is probably the most stable state. The <sup>4</sup>C<sub>1</sub> conformation was maintained during the subsequent QM/MM MD equilibration.

To further ascertain which conformation is most favored, we computed the conformational free energy landscape (FEL) of the α-glucosyl ring at the -1 subsite by QM/MM metadynamics, using the Cremer-Pople puckering coordinates θ/ϕ as collective variables. This is a well-tested approach that we have used successfully to analyze the conformation of carbohydrates in isolation or in the active sites of GHs (Biarnés et al., 2007; Alonso-Gil et al., 2017). The computed FEL, shown in **Figure 2**, confirms that the most stable conformation is <sup>4</sup>C<sub>1</sub>, with a secondary minimum (4.7 kcal/mol higher in free energy) corresponding to a distorted B<sub>3,O</sub> conformation. The most relevant hydrogen bond interactions involving the -1 sugar for the most stable <sup>4</sup>C<sub>1</sub> conformer are listed in **Table 1** (see atom labeling in **Figure 1B**). This conformer shows a longer glycosidic bond (C1-O1 = 1.44 Å) than B<sub>3,O</sub> (1.42 Å), and a C1-O5 bond shorter by 0.03 Å. Notably, the distance between the nucleophile oxygen (OAsp286) and the anomeric carbon (C1) is shorter when the sugar is not distorted (3.10 Å for <sup>4</sup>C<sub>1</sub> vs. 3.30 Å for B<sub>3,O</sub>), which would facilitate the nucleophilic attack by Asp286. Therefore, only the <sup>4</sup>C<sub>1</sub> conformation of the substrate is preactivated for catalysis.

FIGURE 2 | Conformational free energy landscape (… representation) of the -1 sugar (α-glucosyl) of NpAS in complex with sucrose. Contour lines at 1 kcal/mol.
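The puckering coordinates used as collective variables can be computed directly from the six ring-atom positions. The sketch below follows the Cremer and Pople (1975) definitions in plain Python; it is not the PLUMED implementation used in the study, and the atom ordering (O5, C1, C2, C3, C4, C5) and the idealized test ring are assumptions for illustration. The sign of θ depends on the ordering convention; with a consistent ordering, the two chairs sit at opposite poles of the puckering sphere (θ ≈ 0° and θ ≈ 180°).

```python
import math


def cremer_pople(ring):
    """Cremer-Pople (Q, theta, phi) for a six-membered ring.

    ring: six (x, y, z) tuples in ring order (for pyranoses usually
    O5, C1, C2, C3, C4, C5). Q is in the input length unit; theta and
    phi are in degrees.
    """
    n = len(ring)
    if n != 6:
        raise ValueError("expected exactly six ring atoms")
    # Shift the origin to the geometric centre of the ring.
    centre = [sum(p[k] for p in ring) / n for k in range(3)]
    r = [[p[k] - centre[k] for k in range(3)] for p in ring]
    # R' and R'' span the mean plane; their cross product is its normal.
    r1 = [sum(r[j][k] * math.sin(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    r2 = [sum(r[j][k] * math.cos(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    normal = [r1[1] * r2[2] - r1[2] * r2[1],
              r1[2] * r2[0] - r1[0] * r2[2],
              r1[0] * r2[1] - r1[1] * r2[0]]
    length = math.sqrt(sum(c * c for c in normal))
    normal = [c / length for c in normal]
    # Out-of-plane displacement of each ring atom.
    z = [sum(r[j][k] * normal[k] for k in range(3)) for j in range(n)]
    # m = 2 (boat/twist) component and m = 3 (chair) component.
    q2_cos = math.sqrt(2 / n) * sum(z[j] * math.cos(4 * math.pi * j / n) for j in range(n))
    q2_sin = -math.sqrt(2 / n) * sum(z[j] * math.sin(4 * math.pi * j / n) for j in range(n))
    q3 = math.sqrt(1 / n) * sum(z[j] * (-1) ** j for j in range(n))
    q2 = math.hypot(q2_cos, q2_sin)
    big_q = math.hypot(q2, q3)
    theta = math.degrees(math.atan2(q2, q3))
    phi = math.degrees(math.atan2(q2_sin, q2_cos)) % 360.0
    return big_q, theta, phi


def ideal_ring(heights, radius=1.0):
    """Hypothetical test ring: a regular hexagon with given out-of-plane heights."""
    return [(radius * math.cos(2 * math.pi * j / 6),
             radius * math.sin(2 * math.pi * j / 6), h)
            for j, h in enumerate(heights)]


chair = ideal_ring([-0.25 * (-1) ** j for j in range(6)])
flipped = ideal_ring([0.25 * (-1) ** j for j in range(6)])
print(cremer_pople(chair)[1], cremer_pople(flipped)[1])
```

For a real trajectory, the same function would be applied frame by frame to the -1 sugar ring atoms; intermediate θ values (around 50° to 60°) correspond to the half-chair and envelope region discussed for the transition state.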

### Glycosylation Reaction

The simulation of the first step of the double-displacement reaction (glycosylation) was initiated from a snapshot of the global minimum (<sup>4</sup>C<sub>1</sub> conformation) of the conformational FEL of the Michaelis complex. Three collective variables were used to drive the glycosylation reaction (**Figure S3** and Computational Details section), representing the proton transfer (CV1), the nucleophilic attack (CV2) and the glycosidic bond cleavage (CV3).
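The exact definitions of the three CVs are given in Figure S3 and are not reproduced here. Reaction CVs of this kind are commonly built from interatomic distances; as a hedged illustration (an assumption, not necessarily the authors' definition), a proton-transfer coordinate such as CV1 can be written as a difference of two distances, with hypothetical coordinates:

```python
import math


def transfer_cv(donor, moving, acceptor):
    """Distance-difference CV for moving one atom between two partners.

    Returns d(donor, moving) - d(moving, acceptor) in the input length
    unit: negative while the atom sits on the donor, positive once it
    has been transferred to the acceptor.
    """
    return math.dist(donor, moving) - math.dist(moving, acceptor)


# Hypothetical collinear coordinates (Å): acid/base oxygen, proton, glycosidic O1.
o_glu = (0.0, 0.0, 0.0)
h = (1.0, 0.0, 0.0)
o1 = (2.5, 0.0, 0.0)
print(transfer_cv(o_glu, h, o1))  # -0.5: proton still on the acid/base residue
```

A single distance (e.g. the C1 to nucleophile oxygen distance) is the simplest alternative for bond-forming or bond-breaking steps; the choice used in the paper should be taken from its Supplementary Material.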

The free energy landscape reconstructed from the metadynamics simulation (reaction FEL) is shown in **Figure 3A** (for further detail, **Figure 3B** shows a two-dimensional projection). The FEL exhibits two clear minima in opposite regions: reactants (Michaelis complex, MC) and products (glycosyl-enzyme intermediate, GEI), separated by a transition state (TS1). The rate-limiting step of NpAS is not known but we can assume it is the glycosylation step, as in most GHs acting on substrates with a sugar aglycone (Li et al., 2001). Under this assumption, the reaction free energy barrier (17.2 kcal/mol) is in very good agreement with the experimental value of 17.9 kcal/mol estimated from the room temperature rate constant (Potocki de Montalk et al., 2000).
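The experimental barrier quoted above is derived from a measured rate constant via transition state theory. The sketch below uses the Eyring equation, ΔG<sup>‡</sup> = −RT ln(kh/k<sub>B</sub>T), assuming a transmission coefficient of 1 and T = 298.15 K for "room temperature" (both assumptions; the experimental rate constant itself is not reproduced here), to compare the computed and experimental barriers on a rate scale:

```python
import math

K_B = 1.380649e-23              # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34       # Planck constant, J s
R_KCAL = 8.314462618 / 4184.0   # gas constant, kcal/(mol K)


def eyring_rate(dg_kcal, temperature=298.15):
    """First-order rate constant (s^-1) for a free energy barrier (kcal/mol).

    Assumes a transmission coefficient of 1.
    """
    prefactor = K_B * temperature / H_PLANCK
    return prefactor * math.exp(-dg_kcal / (R_KCAL * temperature))


def eyring_barrier(k, temperature=298.15):
    """Inverse of eyring_rate: barrier (kcal/mol) from a rate constant (s^-1)."""
    prefactor = K_B * temperature / H_PLANCK
    return -R_KCAL * temperature * math.log(k / prefactor)


k_calc = eyring_rate(17.2)  # computed glycosylation barrier
k_expt = eyring_rate(17.9)  # experimental barrier quoted in the text
print(f"k(17.2) = {k_calc:.2f} s^-1, k(17.9) = {k_expt:.2f} s^-1, "
      f"ratio = {k_calc / k_expt:.1f}")
```

At 298.15 K the two barriers correspond to rate constants of roughly 1.5 and 0.5 s⁻¹, so the 0.7 kcal/mol deviation amounts to about a factor of 3 in rate.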

**Figure 4** and **Table 2** show the evolution of the main catalytic distances along the minimum energy pathway, and the structures of the active site at the stationary states are represented in **Figure 5** (top panels). The glycosylation reaction begins with the approach of Glu328 to the glycosidic oxygen. The distance between the Glu328 acid/base proton and the glycosidic oxygen (O1-HGlu328) quickly decreases from 2.5 to 1.2 Å. Afterwards (**1** in **Figure 4**), the acid/base residue transfers the proton to the glycosidic oxygen and the glycosidic bond (C1-O1) starts to lengthen. Protonation of the glycosidic oxygen takes place in the anti configuration with respect to the C1-O5 bond, as is common in α-glycosidases (Alonso-Gil et al., 2017). At the reaction transition state (TS1), the glycosidic bond is partially broken (1.98 Å) but the nucleophile has not yet started to attack the anomeric carbon (2.98 Å). Therefore, the reaction follows a dissociative mechanism and can be described as D<sub>N</sub>A<sub>N</sub> (Guthrie and Jencks, 1989; Schramm and Shi, 2001). Afterwards, the distance between the nucleophilic oxygen of Asp286 and the anomeric carbon decreases and the glycosyl-enzyme covalent bond forms. As found for endo-β-glucanase (Biarnés et al., 2011), the maximum oxocarbenium ion character does not occur at the TS but later along the reaction pathway (**2** in **Figure 4**). At this point, both the nucleophile and the leaving group are well separated from the C1 atom, and the C1 charge is higher than at the TS (by 0.05 electrons, **Figure S4**). The calculations thus reveal that the glycosylation reaction features an early TS with respect to charge development. The glycosyl-enzyme intermediate is almost formed at **3** (C1-OAsp286 = 2.14 Å). Finally, the hydrogen bond between the free oxygen of Asp286 (O') and the 6-OH breaks, and Asp286 rotates around its covalent bond with the -1 sugar, which collapses to an undistorted <sup>4</sup>C<sub>1</sub> conformation (GEI).

TABLE 1 | Calculated values of the most important interactions between the -1 α-glucosyl residue and the enzyme in the Michaelis complex (energies in kcal/mol, distances in Å).

FIGURE 3 | … Two-dimensional projection of the FEL into the collective variables CV1 and CV3.

It is interesting to analyze in detail the conformation of the -1 sugar during the reaction in relation to the hydrogen bond between the nucleophile residue (Asp286) and the 6-OH. The evolution of the sugar conformation during the glycosylation reaction, defined by the θ puckering coordinate, is shown in **Figure 6**. From the MC to the TS, the conformation evolves from <sup>4</sup>C<sub>1</sub> to <sup>4</sup>H<sub>3</sub>/<sup>4</sup>C<sub>1</sub>. Afterwards, the sugar adopts an E<sub>3</sub> envelope conformation (**2**), followed by a <sup>4</sup>H<sub>3</sub> half-chair (**3**). At this point, a sudden conformational change toward <sup>4</sup>C<sub>1</sub> takes place (θ changes from 58° to 20°), concomitant with the rotation of Asp286 around the glycosyl-enzyme bond and the disruption of the 6-OH hydrogen bond. The covalent intermediate is already formed before the rotation, and the two configurations of the GEI (with and without the hydrogen bond to the 6-OH) are close in energy. However, only the one without the hydrogen bond, and thus with a <sup>4</sup>C<sub>1</sub> conformation, corresponds to a minimum on the FEL of **Figure 3**. Thus, the conformation of the GEI after the first step of the double-displacement reaction is <sup>4</sup>C<sub>1</sub>. In other words, the conformational itinerary is in principle cyclic, i.e., it starts and ends in the same conformation (<sup>4</sup>C<sub>1</sub> → [<sup>4</sup>H<sub>3</sub>/E<sub>3</sub>]<sup>‡</sup> → <sup>4</sup>C<sub>1</sub>). This agrees with the conformations of the experimental structures of the MC and GEI of NpAS (both <sup>4</sup>C<sub>1</sub>) (Mirza et al., 2001; Skov et al., 2002; Jensen et al., 2004). However, the computed itinerary differs from the one expected for α-glucosidases, for which a distorted GEI is expected (<sup>4</sup>C<sub>1</sub> → [<sup>4</sup>H<sub>3</sub>]<sup>‡</sup> → <sup>1</sup>S<sub>3</sub>) (Davies et al., 2012; Speciale et al., 2014). As we will see in the next section, both views can be reconciled if we consider the reorganization of the active site required to start the deglycosylation reaction.

TABLE 2 | Calculated values of the most relevant catalytic distances (in Å) and puckering coordinates (in degrees) and their standard deviations along the glycosylation minimum energy pathway.

### Deglycosylation Reaction

As shown above, the glycosylation reaction leads to an active site configuration in which the -1 sugar is in a <sup>4</sup>C<sub>1</sub> conformation. This configuration is not suitable for the second step of the double-displacement reaction for two main reasons. First, the leaving group (C1-OAsp286) is not axially oriented, which makes the S<sub>N</sub>2 reaction unfavorable. Second, there is no water molecule in the vicinity of the anomeric carbon. The nearest water molecule remains at ∼6 Å from the anomeric carbon in the simulation (**Figure 7**, left panel), forming hydrogen bonds with the acid/base residue (Glu328) and an aspartate residue (Asp393). This water molecule is the best candidate to act as the nucleophile in the deglycosylation step, but it is still too far away and not well oriented to attack the anomeric carbon. Most likely, there is an energy barrier to bring the water molecule to a reactive configuration.

To obtain the reactive configuration for the deglycosylation reaction, we selected the above water molecule and brought it closer to the anomeric carbon with the metadynamics algorithm, using the O<sub>water</sub>-C1 distance as collective variable. To allow the C1-Asp286 bond to adopt an axial orientation (necessary for the nucleophilic attack by water), a second collective variable accounting for the hydrogen bond between Asp286 and the 6-OH was used. This simulation identified a configuration of the glycosyl-enzyme intermediate in which the water molecule is at ≈3.5 Å from C1 (**Figure 7**, right panel). The position of the water molecule is stabilized by hydrogen bonds with the 2-OH of the -1 sugar, Asp393, and the catalytic residue Glu328. The new configuration, hereafter named GEI<sup>∗</sup>, is separated from the initial GEI by a small free energy barrier (4.7 kcal/mol; **Figure S5**). Remarkably, the -1 sugar at GEI<sup>∗</sup> is preactivated for catalysis, as it exhibits a distorted conformation (E<sub>3</sub>) with a pseudo-axial orientation of the leaving group. Moreover, the water molecule is well oriented for nucleophilic attack, with its oxygen lone pairs pointing toward the anomeric carbon. We therefore chose this configuration as the starting point for modeling the deglycosylation reaction. Interestingly, a distorted conformation of the GEI of lysozyme, a family 22 β-GH, was recently described by the Mulholland group based on QM(SCC-DFTB)/MM calculations (Limb et al., 2019). However, the GEI<sup>∗</sup> state found here differs from that of Limb et al. (2019) in that the nucleophile carboxylate group exhibits the usual syn configuration with respect to the C1-OAsp286 bond (τ<sub>C1−O−C−O</sub> ≈ 0°). Differences in the substrate, active site and reaction stereochemistry might be the reason for the discrepancy.

The deglycosylation reaction was modeled using three collective variables that account for the cleavage of the glycosyl-enzyme covalent bond (C1-OAsp286), the attack of the water oxygen on the anomeric carbon (C1-Owat), and the deprotonation of the water molecule by Glu328. The FEL reconstructed from the metadynamics simulation, shown in **Figure 8**, shows two minima, corresponding to the activated glycosyl-enzyme intermediate (GEI<sup>∗</sup>) and the hydrolysis products (P), the latter being 7.6 kcal/mol more stable. The transition state (TS2) is 13.3 kcal/mol higher in free energy than the GEI<sup>∗</sup> state; this is lower than the 17.2 kcal/mol glycosylation barrier, consistent with deglycosylation not being rate limiting.

The structure of the active site and the -1 sugar conformation along the deglycosylation reaction pathway are shown in **Figure 5** (bottom panels) and **Figure 9**, whereas **Table 3** lists the evolution of the most important distances. The deglycosylation reaction begins with cleavage of the glycosyl-Asp286 bond, followed by attack of the water molecule on the anomeric carbon, while the water forms a tight hydrogen bond with the acid/base residue (Glu328). The system overcomes the transition state (TS2), with the -1 sugar in an E<sub>3</sub> envelope conformation. Afterwards, Glu328 abstracts a proton from the catalytic water, the covalent bond between the water oxygen and the anomeric carbon forms, and the -1 sugar adopts a <sup>4</sup>C<sub>1</sub> conformation. Therefore, the conformational itinerary of the deglycosylation reaction of NpAS can be described as E<sub>3</sub> → [E<sub>3</sub>]<sup>‡</sup> → <sup>4</sup>C<sub>1</sub>.

FIGURE 8 | (A) Free energy landscape (FEL) of the deglycosylation reaction, obtained from QM/MM metadynamics simulations with three CVs. (B) Two-dimensional projection of the FEL into the collective variables CV1 and CV2.

### Summary and Conclusions

By means of QM(DFT)/MM metadynamics we have unraveled the reaction mechanism of α-amylosucrase during hydrolysis, in particular the conformational dynamics of the glycosyl-enzyme intermediate (GEI), for which structural studies have assigned a <sup>4</sup>C<sub>1</sub> conformation. Our results show that the glycosylation reaction, assisted by the catalytic residues Asp286 (nucleophile) and Glu328 (acid/base), features a dissociative transition state in which the α-glucosyl residue at the -1 subsite adopts a <sup>4</sup>H<sub>3</sub> conformation. The GEI adopts a <sup>4</sup>C<sub>1</sub> conformation and is thus not preactivated for catalysis. However, the active site is very dynamic and can easily evolve toward another configuration, in which the -1 sugar adopts a reactive E<sub>3</sub> conformation with a pseudo-axial orientation of the leaving group, as the catalytic water enters the active site. The two conformations can be viewed as two different states of the GEI, dry or wet. We conclude that the catalytic itinerary of amylosucrase for the Michaelis complex → TS → intermediate half-reaction can be described equivalently as either <sup>4</sup>C<sub>1</sub> → [<sup>4</sup>H<sub>3</sub>]<sup>‡</sup> → <sup>4</sup>C<sub>1</sub> (cyclic itinerary) or <sup>4</sup>C<sub>1</sub> → [<sup>4</sup>H<sub>3</sub>]<sup>‡</sup> → E<sub>3</sub> (linear itinerary). This reconciles the X-ray results, showing an undistorted GEI (<sup>4</sup>C<sub>1</sub>), with the conformation expected for α-glucosidases (a distorted GEI with a conformation near <sup>1</sup>S<sub>3</sub>).

TABLE 3 | Calculated values of the most relevant catalytic distances (in Å) and their standard deviations along the deglycosylation minimum energy pathway.

### COMPUTATIONAL DETAILS

### Model Building

The initial structure of the NpAS-sucrose complex was taken from the Michaelis complex structure of the Glu328Gln mutant (PDB 1JGI) (Mirza et al., 2001). The protonation states and hydrogen atom positions of all histidine residues were selected based on their hydrogen bond network: histidines 39, 173, 233, 332, 370, 392, 414, 463, 512, and 540 were considered neutral with the proton at Nε; histidines 192, 306, 565, and 591 were neutral with the proton at Nδ; and histidines 377, 382, and 601 were protonated. All Asp and Glu residues were taken as deprotonated (i.e., negatively charged) except Glu328 (the acid/base residue). The system was solvated with 20,725 water molecules, and 16 sodium ions were added to neutralize the system, forming a rectangular box with dimensions of 89.1 Å × 106.4 Å × 81.1 Å.
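The histidine assignments above map directly onto residue names in a typical AMBER/tleap setup. As a sketch (the HIE/HID/HIP names follow the standard AMBER convention; the helper functions and the column-based PDB rewrite are hypothetical illustrations, not the authors' scripts), the lists can be checked for double assignments and applied to PDB ATOM records:

```python
# Histidine states as listed in the Model Building text, keyed by the
# standard AMBER residue names (HIE: H on Nε, HID: H on Nδ, HIP: both, +1).
HIS_STATES = {
    "HIE": [39, 173, 233, 332, 370, 392, 414, 463, 512, 540],
    "HID": [192, 306, 565, 591],
    "HIP": [377, 382, 601],
}


def build_renames(states):
    """Map residue number -> residue name, refusing double assignments."""
    renames = {}
    for name, resids in states.items():
        for resid in resids:
            if resid in renames:
                raise ValueError(f"His{resid} assigned twice")
            renames[resid] = name
    return renames


def rename_his(pdb_line, renames):
    """Rewrite the residue name (columns 18-20) of a HIS ATOM/HETATM record."""
    if pdb_line.startswith(("ATOM", "HETATM")) and pdb_line[17:20] == "HIS":
        resid = int(pdb_line[22:26])  # residue sequence number, columns 23-26
        if resid in renames:
            return pdb_line[:17] + renames[resid] + pdb_line[20:]
    return pdb_line


renames = build_renames(HIS_STATES)
his_charge = len(HIS_STATES["HIP"])  # only doubly protonated histidines carry +1
print(len(renames), his_charge)  # 17 3
```

The three HIP residues contribute +3 to the protein charge, which, together with the deprotonated Asp and Glu residues, determines the net charge that the 16 sodium ions neutralize.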

### Classical MD Simulations

Classical MD simulations of the NpAS-sucrose complex at room temperature were performed using the AMBER11 software (Case et al., 2010). The ff99SB (Hornak et al., 2006), GLYCAM06 (Kirschner et al., 2008) and TIP3P (Jorgensen et al., 1983) force fields were used for the protein, the sucrose substrate and the solvent water, respectively. The system was equilibrated in several steps. First, all water molecules were relaxed with a gradient minimizer, holding the protein and substrate fixed. Next, the whole system was allowed to relax. To gradually reach the desired temperature of 300 K, spatial constraints were initially added to the interactions between the protein and the substrate, while water molecules and sodium ions were allowed to move freely. The constraints were then removed and the whole system was allowed to reach the desired temperature. Throughout this process, the acid/base residue (Glu328) was very mobile, with the proton position switching between cis and trans conformations, so its position relative to the sugar was controlled with a smooth restraint (force constant of 5 kcal/mol/Å<sup>2</sup>). The simulation was run for 100 ps at constant pressure, allowing the cell volume to evolve, until the density stabilized (∼1.06 g/cm<sup>3</sup>). The MD simulation was then extended to 15 ns at constant volume without restraints until the system had reached equilibrium. A time step of 1 fs was used, increased to 2 fs (using SHAKE) during the last 14 ns. The −1 sugar conformation evolved from <sup>4</sup>C<sub>1</sub> to B<sub>3,O</sub> and returned to the original <sup>4</sup>C<sub>1</sub> conformation during the last 7 ns of the simulation. Trajectories were analyzed using standard tools of AMBER and VMD (Humphrey et al., 1996). The root-mean-square deviation (RMSD) of the protein backbone atoms with respect to the crystal structure stabilized around 1.4 Å in the equilibrated structure.
One snapshot from the last 0.5 ns of simulation was taken as starting structure for the subsequent QM/MM MD simulations.

### QM/MM MD Simulations

QM/MM MD simulations were performed with the CPMD software (Car and Parrinello, 1985), using the QM/MM interface developed by Laio et al. (2002). The QM region was defined as follows: (i) the whole sucrose molecule for the simulations of the conformational free energy landscape (FEL) of the -1 sugar (α-glucosyl); (ii) the sucrose, the acid/base residue (Glu328, capped at the Cβ) and the nucleophile residue (Asp286, capped at the Cα) for the simulation of the glycosylation reaction; (iii) same as (ii) plus the catalytic water, for the simulation of the deglycosylation reaction. In all cases, the frontier atoms between the QM and MM regions were described using pseudopotential carbon link atoms. The fictitious electronic mass of the Car-Parrinello Lagrangian was taken as 600 au and the time step was set at 0.12 fs in all CPMD simulations. All systems were enclosed in an isolated cubic box of 12.0 Å × 12.0 Å × 12.0 Å, using a fictitious electron mass of 700 au and a time step of 0.12 fs. The Kohn-Sham orbitals were expanded in a plane wave (PW) basis set with a kinetic energy cutoff of 70 Ry. Ab initio pseudopotentials generated within the Troullier-Martins scheme were employed (Troullier and Martins, 1991). The Perdew, Burke and Ernzerhof generalized gradient-corrected approximation (PBE) (Perdew et al., 1996) was selected in view of its good performance in previous work on isolated sugars (Biarnés et al., 2007; Marianski et al., 2016), glycosidases (Jin et al., 2016) and glycosyltransferases (Bilyard et al., 2018).
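CPMD input files usually express the time step in atomic units of time and the plane-wave cutoff in Rydberg (our reading of the CPMD conventions; the manual for the version used should be checked). The illustrative helpers below, which are not part of any CPMD tooling, convert the values quoted above into those units:

```python
# Conversion factors (CODATA 2018 values).
AU_TIME_FS = 0.02418884326585747   # one atomic unit of time, in fs
RYDBERG_EV = 13.605693122994       # one Rydberg, in eV


def fs_to_au(t_fs):
    """Convert a time step from femtoseconds to atomic units of time."""
    return t_fs / AU_TIME_FS


def ry_to_hartree(e_ry):
    """1 Hartree = 2 Rydberg."""
    return e_ry / 2.0


def ry_to_ev(e_ry):
    """Convert an energy from Rydberg to electronvolts."""
    return e_ry * RYDBERG_EV


print(f"time step 0.12 fs = {fs_to_au(0.12):.2f} a.u.")
print(f"cutoff 70 Ry = {ry_to_hartree(70):.0f} Ha = {ry_to_ev(70):.0f} eV")
```

A 0.12 fs step is thus close to 5 atomic units of time, and the 70 Ry cutoff corresponds to 35 Hartree.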

### QM/MM Metadynamics Simulations

QM/MM metadynamics (Laio and Parrinello, 2002; Barducci et al., 2011) simulations were performed to characterize the conformational FEL of the α-glucosyl residue of sucrose in the active site of NpAS and to simulate the different steps of the enzymatic reaction. The following collective variables were used: (i) conformational FEL: the Cremer-Pople puckering coordinates (Cremer and Pople, 1975) phi and theta (ϕ, θ) of the α-glucosyl unit, following the methodology previously used in our group to rationalize and predict catalytic itineraries of GHs (Biarnés et al., 2007; Ardèvol and Rovira, 2015; Iglesias-Fernandez et al., 2015); (ii) glycosylation reaction: three collective variables representing the proton transfer (CV1), the nucleophilic attack (CV2) and the glycosidic bond cleavage (CV3) (**Figure S3**); (iii) deglycosylation reaction: three collective variables representing the covalent enzyme-substrate interaction (CV1), the water attack (CV2) and the proton transfer (CV3) (**Figure S3**). The metadynamics algorithm (Laio and Parrinello, 2002; Barducci et al., 2011), provided by the Plumed 2 plugin (Tribello et al., 2014), was used to explore the free energy landscapes of the systems. The height/width of the Gaussian terms was chosen according to the oscillations of the CVs in free dynamics. In the case of the conformational FEL simulation, the height/width of the Gaussian terms was set at 0.75 kcal/mol/0.10 Å and a new Gaussian-like potential was added every 400 MD steps. The simulation of the conformational FEL was stopped once the free energy difference between the two wells (<sup>4</sup>C<sub>1</sub> and B<sub>3,O</sub>) remained stable (1971 Gaussian terms were added), which was further tested with a time-independent free energy estimator (Tiwary and Parrinello, 2015). The error in the region of the two relevant minima and the pathway interconnecting them was ≤1.2 kcal/mol (**Figure S6**).
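The core of metadynamics is a history-dependent bias built from Gaussians deposited along the CV trajectory; in standard (fixed-height) metadynamics the free energy is recovered, up to a constant, as the negative of the accumulated bias. The sketch below illustrates that idea with a hypothetical two-CV trajectory and the height, width and stride quoted above; it is not PLUMED code, and the well-tempered variant (which rescales hill heights) is not shown. The last helper estimates the simulated time implied by a number of hills, assuming a constant stride and the 0.12 fs time step of the QM/MM simulations.

```python
import math


def metad_bias(s, hills):
    """History-dependent bias V(s) at CV point s.

    hills: list of (centre, height, sigmas), with one centre/sigma entry per CV.
    """
    bias = 0.0
    for centre, height, sigmas in hills:
        arg = sum((si - ci) ** 2 / (2.0 * wi ** 2)
                  for si, ci, wi in zip(s, centre, sigmas))
        bias += height * math.exp(-arg)
    return bias


def deposit(cv_trajectory, stride, height, sigmas):
    """Place one Gaussian on every `stride`-th frame, starting at frame 0."""
    return [(cv_trajectory[i], height, sigmas)
            for i in range(0, len(cv_trajectory), stride)]


def free_energy(s, hills):
    """Standard metadynamics estimate: F(s) = -V(s) + const."""
    return -metad_bias(s, hills)


def run_length_ps(n_hills, stride, dt_fs):
    """Simulated time implied by n_hills depositions every `stride` steps."""
    return n_hills * stride * dt_fs / 1000.0


# Hypothetical two-CV trajectory, one frame per MD step.
traj = [(0.0, 0.0)] * 800 + [(0.5, 0.0)] * 400
hills = deposit(traj, stride=400, height=0.75, sigmas=(0.10, 0.10))
print(len(hills), round(metad_bias((0.0, 0.0), hills), 3))
print(f"conformational FEL run: ~{run_length_ps(1971, 400, 0.12):.1f} ps")
```

Under these assumptions the 1971 hills of the conformational FEL correspond to roughly 95 ps of QM/MM dynamics; this is a derived estimate, not a value reported in the text.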

In the case of the glycosylation reaction, the height/width of the Gaussian terms was set at 1 kcal/mol/0.20 Å (CV1) and 1 kcal/mol/0.10 Å (CV2 and CV3), and a new Gaussian-like potential was added every 300 MD steps. Walls for each CV at appropriate distances were used to restrict the explored FEL space to the chemical event. For the deglycosylation reaction, values of 1 kcal/mol/0.10 Å (height/width) and 250 MD steps (deposition time) were used. Metadynamics simulations were stopped after one crossing over the transition state (**Figure S7**), as recommended for chemical reactions (Ensing et al., 2005). Previous work on carbohydrate-active enzymes shows that the error associated with metadynamics is <1 kcal/mol using this criterion (Raich et al., 2016). The total number of Gaussian terms added was 2284 (glycosylation reaction) and 4947 (deglycosylation reaction). The reaction coordinate was taken from the minimum free energy pathway, computed according to the intrinsic reaction coordinate method (Fukui, 1981). Structures at a given point along the reaction coordinate were taken from averages over a small region defined by CV1 ± 0.2, CV2 ± 0.2, CV3 ± 0.2 Å and were used for analysis.

### AUTHOR CONTRIBUTIONS

CR and IA designed calculations. SA-G performed and analyzed the simulations. JC helped with the computational work and analysis of the results. CR wrote the manuscript with the help of all authors.

### ACKNOWLEDGMENTS

This work was supported by grants from the Spanish Ministry of Science, Innovation and Universities (MICINN) (CTQ2017-85496-P to CR), the Agency for Management of University and Research Grants of Generalitat de Catalunya (AGAUR) (2017SGR-1189 to CR), the French National Research Agency (ANR) (CarbUniVax ANR-15-CE07-0019-01 to IA) and the Spanish Structures of Excellence María de Maeztu (MDM-2017-0767 to CR). The authors gratefully acknowledge the computer resources at and the technical support provided by the Barcelona Supercomputing Center (BSC-CNS, Barcelona, Spain), the TGCC-Curie supercomputer (Paris, France) and the mesocenter of Région Midi-Pyrénées (CALMIP, Toulouse, France). SA-G and JC acknowledge MICINN for predoctoral fellowships (FPI-BES-2012-051782 and FPI-BES-2015-072055, respectively).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2019.00269/full#supplementary-material

### REFERENCES

reaction coordinate of GH125 alpha-mannosidases. J. Am. Chem. Soc. 139, 1085–1088. doi: 10.1021/jacs.6b11247


glycosidase inhibitor. ACS Cent. Sci. 3, 784–793. doi: 10.1021/acscentsci.7b00214


of amylosucrase from Neisseria polysaccharea. Biochemistry 43, 3104–3110. doi: 10.1021/bi0357762


implications for the polymerase activity. J. Biol. Chem. 277, 47741–47747. doi: 10.1074/jbc.M207860200


Zhang, R., Li, C., Williams, L. K., Rempel, B. P., Brayer, G. D., and Withers, S. G. (2009). Directed "in situ" inhibitor elongation as a strategy to structurally characterize the covalent glycosyl-enzyme intermediate of human pancreatic alpha-amylase. Biochemistry 48, 10752–10764. doi: 10.1021/bi901400p

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Alonso-Gil, Coines, André and Rovira. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Benchmark of Density Functionals for the Calculation of the Redox Potential of Fe<sup>3+</sup>/Fe<sup>2+</sup> Within Protein Coordination Shells

Risnita Vicky Listyarini†, Diana Sofia Gesto†, Pedro Paiva, Maria João Ramos and Pedro Alexandrino Fernandes\*

#### UCIBIO-REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Porto, Portugal

Edited by: Vicent Moliner, University of Jaume I, Spain

#### Reviewed by:

Nino Russo, University of Calabria, Italy Jon Mujika, Donostia International Physics Center, Spain

\*Correspondence:

Pedro Alexandrino Fernandes pafernan@fc.up.pt

### †Present Address:

Risnita Vicky Listyarini, Chemistry Education Study Program, Sanata Dharma University, Yogyakarta, Indonesia

Diana Sofia Gesto, UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal

#### Specialty section:

This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry

Received: 22 March 2019 Accepted: 15 May 2019 Published: 05 June 2019

#### Citation:

Listyarini RV, Gesto DS, Paiva P, Ramos MJ and Fernandes PA (2019) Benchmark of Density Functionals for the Calculation of the Redox Potential of Fe3+/Fe2+ Within Protein Coordination Shells. Front. Chem. 7:391. doi: 10.3389/fchem.2019.00391

Iron is a very important transition metal often found in proteins. In enzymes, it is frequently at the core of reaction mechanisms, participating in the catalytic cycle, most often in oxidation/reduction reactions in which it cycles between its most common oxidation states, Fe(III) and Fe(II). QM and QM/MM computational methods used to study these catalytic mechanisms mostly rely on density functional theory (DFT) to describe the chemical transformations. Unfortunately, DFT is known to suffer from system-specific and property-specific inaccuracies that cast a shadow of uncertainty over the results. Here we modeled 12 iron coordination complexes, using ligands that represent amino acid side chains, and calculated the accuracy with which the most common density functionals reproduce the redox properties of these complexes (specifically the electronic component of the redox potential at 0 K, ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub>), using the same property calculated at the CCSD(T)/CBS level as the reference. A number of hybrid and hybrid-meta density functionals, generally with a large percentage of HF exchange (such as BB1K, mPWB1K, and mPW1B95), provided systematically accurate values of ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub>, with MUEs of ∼2 kcal/mol. The very popular B3LYP density functional was also found to be quite accurate, with an MUE of 2.51 kcal/mol. Overall, the study provides guidelines for estimating the inaccuracies arising from the density functional in studies of enzyme reaction mechanisms that involve an iron cofactor, and for choosing appropriate density functionals to study such reactions.

Keywords: redox potential, DFT, benchmarking, iron, quantum-chemical calculations

## INTRODUCTION

Iron plays an important role in enzyme catalysis. It is present as a cofactor in a wide range of proteins and intervenes in many biochemical processes, particularly oxygen transport and electron transfer. Iron is often incorporated in heme prosthetic groups, but it can also occur individually, as an ion cofactor in some metalloproteins, or, less frequently, together with sulfide, forming iron-sulfur clusters (Harding et al., 2010; Abbaspour et al., 2014).

One of the most important characteristics of iron is that it can be found in two stable oxidation states, Fe<sup>2+</sup> and Fe<sup>3+</sup>. This characteristic allows iron to participate in a wide range of redox reactions (Broderick, 2001). In proteins, it is usually found in the active site, coordinated with the side chains of amino acid residues and/or water molecules. Interestingly, the redox potential of the iron ion changes depending on the ligands it is complexed with. In enzymes, evolution has selected the ligands coordinated to the iron so that its redox potential is the most appropriate for the reaction being catalyzed (Tamames and Ramos, 2010; Liu et al., 2014).

Iron is of extreme importance to almost all organisms. In adult humans, iron makes up about 0.005% of the total body weight (circa 4 g), most of which is complexed with the heme group in hemoglobin. Iron is also present in cytochromes, heme-containing proteins involved in many electron transfer reactions in our body, from oxidative phosphorylation to the synthesis of hormones or the degradation of drugs (Abbaspour et al., 2014).

The reduction potential (or redox potential) of an atom/molecule is, by definition, a measurement of its ability to gain an electron, and therefore be reduced (Equation 1).

$$M^n + \ e^- \to \ M^{n-1} \tag{1}$$

The reduction potential is not a fixed value for a given chemical entity; it can change depending on factors such as temperature, pH, structure, conformation, and electrostatic environment (Su et al., 2011; Ho et al., 2016). For this reason, the redox potentials of metal cofactors need to be calculated for each enzyme individually. In proteins, iron acts as both an electron donor and an electron acceptor, depending on its oxidation state. In the Fe(II) form, iron will mostly give up one electron, being oxidized to Fe(III). Conversely, in the Fe(III) form, it can act as an oxidizing agent, gaining one electron and changing its oxidation state to Fe(II). In some rarer cases, Fe<sup>4+</sup> is also found as a cofactor in enzymes, but these cases are not addressed in this paper.

Since iron is such an important metal in biological processes, and its redox potential changes greatly depending on the ligands coordinated to it, the accuracy with which these processes can be studied depends on how well we can describe the redox chemistry of the iron ion. Measuring reduction potentials in biological systems experimentally is a very complex and difficult task. The alternative is the use of high-level computational methods, which employ elaborate algorithms, equations, and approximations to simulate the system in question and achieve the most accurate result possible (Riley and Merz, 2007). However, such methods are extremely time consuming and CPU demanding, which makes them impractical for biological systems such as proteins, where the number of atoms is exceedingly high. Less accurate methods can be used to calculate redox potentials in such systems, but, because of their limited accuracy, we may not know which one is best for each chemical scenario. To solve this problem, we can benchmark the different methods by comparing their results with a reference value and ranking each method by its accuracy.

The main focus of this work is, therefore, to perform a benchmark study of different computational methods and evaluate their ability to describe the iron ion in different coordination arrangements often found in proteins. We selected only density functional theory (DFT) methods (Thomas, 1927; Hohenberg and Kohn, 1964) because of their very good accuracy and their applicability to large systems (they scale more favorably with system size than post-Hartree-Fock methods). DFT methods are a good choice when it comes to simulating proteins (Uudsemaa and Tamm, 2003; Riley et al., 2007; Li et al., 2009). The problem lies in selecting which density functional to use in a specific case, since their performance can differ depending on the reaction, model, chemical system, etc. For this reason, benchmarking studies are receiving increasing attention, as they allow a more informed decision on which density functionals are likely to be most accurate for the system under study.

For this benchmark study, in order to mimic the many complexes that an iron ion can form in proteins, we constructed 12 models spanning different coordination numbers and ligands. The models mirror the types of amino acid side chains (and water molecules) most frequently found coordinating iron in proteins. Only one side chain was used in each model, to evaluate the individual effect of each side chain on the redox potential, even though in proteins the iron coordination shell is made of many (up to six) side chains simultaneously. The coordination shells were completed with water molecules, which were also treated as coordinating ligands.

For each of these models, we calculated the electronic component of the redox potential at 0 K (ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub>) using the different density functionals. We chose to compute only this component, instead of the full redox potential, for several reasons: the electronic component contributes the most to the overall potential; it is the only component that can be calculated with CCSD(T), the method used as the reference; and it is the component in which the different density functionals vary the most. Furthermore, calculating only ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> is computationally much easier and faster, and including the other components would make this study much more time consuming with little improvement in the final results (Su et al., 2011; Ho et al., 2016).

The reference values of ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> used to benchmark the density functionals were calculated for each system with two post-Hartree-Fock methods: second-order Møller-Plesset perturbation theory (MP2) and coupled cluster theory with single and double excitations and a perturbative triples correction [CCSD(T)]. The energies were extrapolated to the complete basis set (CBS) limit using two different, widely used extrapolation schemes.

### COMPUTATIONAL METHODS

### Model Systems and Structure Optimization

The iron complexes found in proteins are quite diverse. Iron can form complexes with different numbers of ligands and, depending on these, the geometry of the complex also changes. Furthermore, the side chains of the amino acid residues that coordinate iron have different functional groups.

To take all of these conditions into account in the benchmark, we built 12 model systems, each mimicking a different coordination sphere. Together, they cover most of the coordination states, geometries, and ligands that iron complexes can adopt. Our models comprise structures with one, two, four, or six coordinated ligands, with linear (coordination numbers one and two), tetrahedral (coordination number four), and octahedral (coordination number six) geometries. As for the ligands, we first built a set of structures in which the iron ion was coordinated exclusively to water molecules, one for each coordination number, for a total of four models. For the remaining eight, we replaced one of the coordinated waters in the four- and six-coordinate complexes with one of the following ligands: methoxide (CH3O<sup>−</sup>), formate (HCOO<sup>−</sup>), methanethiolate (CH3S<sup>−</sup>), and methylamine (NH2CH3). This choice of ligands reflects our objective of benchmarking DFT methods in the context of proteins: each ligand represents the side chain of an amino acid that frequently coordinates iron in proteins. CH3O<sup>−</sup> mimics the side chain of serine, HCOO<sup>−</sup> that of glutamate and aspartate, CH3S<sup>−</sup> that of cysteine, and NH2CH3 that of lysine. The structures of all the models studied are shown in **Figure 1**.

After building each model, we optimized its geometry at the MP2/6-311+G(d,p) level of theory. One might argue that, for the sake of consistency, it would have been better to calculate ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> for each model using the minimum-energy geometries obtained with each functional under study. In practice, such a study would demand much more computational time for a minimal difference in the final results, as shown in previous studies (Ribeiro et al., 2010; Brás et al., 2011).

Furthermore, MP2 geometries are known to give good results without introducing biases toward any functional. All calculations were performed using the Gaussian 09 package (Frisch et al., 2009).

We optimized the geometry of each complex in both the Fe(II) and Fe(III) oxidation states. For each oxidation state, the iron ion can adopt two different spin states: for Fe(II), the high-spin multiplicity is five and the low-spin multiplicity is one; for Fe(III), the high-spin multiplicity is six and the low-spin one is two. We determined which spin state gave the lowest energy for each oxidation state and used that multiplicity for the remaining calculations. In both cases, the high-spin state [quintet for Fe(II) and sextet for Fe(III)] gave the lowest energies (Riley and Merz, 2007) (data not shown), so these spin states were used throughout the following calculations. The ⟨S<sup>2</sup>⟩ values in each MP2 calculation were checked against the expected values of 6.0 for Fe(II) complexes and 8.75 for Fe(III) complexes. Gaussian 09 includes an annihilation step to reduce spin contamination. For the Fe(II) complexes, we observed an average ⟨S<sup>2</sup>⟩ of 6.0079 before and 6.0000 after annihilation. For the Fe(III) complexes, the average ⟨S<sup>2</sup>⟩ before and after annihilation was 8.7625 and 8.7501, respectively. Thus, spin contamination is virtually absent.
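The expected values quoted above follow from ⟨S<sup>2</sup>⟩ = S(S + 1) for a pure spin state of multiplicity 2S + 1. The minimal Python sketch below, with hypothetical function names, reproduces these targets and the deviations of the averaged pre-annihilation values reported in the text; it only illustrates the arithmetic and is not part of any Gaussian 09 workflow.

```python
def expected_s2(multiplicity: int) -> float:
    """<S^2> = S(S+1) for a pure spin state with multiplicity 2S+1."""
    s = (multiplicity - 1) / 2
    return s * (s + 1)


def spin_deviation(observed_s2: float, multiplicity: int) -> float:
    """Observed minus pure-spin <S^2>; positive values indicate spin contamination."""
    return observed_s2 - expected_s2(multiplicity)


# High-spin states used in the text, with the averaged pre-annihilation <S^2> values.
for label, multiplicity, observed in [("Fe(II)", 5, 6.0079), ("Fe(III)", 6, 8.7625)]:
    print(f"{label}: expected {expected_s2(multiplicity):.2f}, "
          f"deviation {spin_deviation(observed, multiplicity):+.4f}")
```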

All calculations were performed in the gas phase. Since the purpose is to benchmark only, and specifically, the density functionals (DFs), we avoided introducing solvation, so that small differences in solvation would not make an unwanted contribution to the differences in accuracy among the DFs being compared.

#### Calculation of the Reference ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> Values

In order to correctly benchmark and compare the diverse density functionals, we needed to obtain accurate values of ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> for each of our models.

For this study, we calculated ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> using the very accurate CCSD(T) method (Bartlett and Purvis, 1978; Purvis and Bartlett, 1982; Pople et al., 1987), extrapolated to the CBS limit, and used these values as the reference. To this end, we carried out single-point calculations at the MP2/aug-cc-pVXZ (X = 2, 3, 4) and CCSD(T)/aug-cc-pVDZ levels, in vacuum, for each optimized structure, and then employed two different, well-known extrapolation methods to obtain the final values: one developed by Truhlar (1998) (scheme I) and another by Helgaker et al. (1997) and Halkier et al. (1998) (scheme II). Both schemes begin with the extrapolation of the correlation energy to the CBS limit at the MP2 level. The difference between the correlation energies calculated with MP2 and CCSD(T) at the largest affordable basis set (aug-cc-pVDZ in this study) is then added to the MP2/CBS value. In both schemes, the Hartree-Fock (HF) energy is extrapolated separately from the correlation energy.

To determine the MP2/CBS energy with scheme I, we used Equation 2 with the energies obtained from the MP2/aug-cc-pVDZ and MP2/aug-cc-pVTZ calculations. α and β are constants for the HF and correlation components, respectively, taken from the literature (α = 4.93 and β = 2.13).

$$E^{\mathrm{MP2}}_{\mathrm{CBS}} = \frac{3^{\alpha}}{3^{\alpha} - 2^{\alpha}} E^{\mathrm{HF}}_{\mathrm{TZ}} - \frac{2^{\alpha}}{3^{\alpha} - 2^{\alpha}} E^{\mathrm{HF}}_{\mathrm{DZ}} + \frac{3^{\beta}}{3^{\beta} - 2^{\beta}} E^{\mathrm{corr}}_{\mathrm{TZ}} - \frac{2^{\beta}}{3^{\beta} - 2^{\beta}} E^{\mathrm{corr}}_{\mathrm{DZ}} \tag{2}$$
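Equation 2 is the closed-form limit of a two-point fit E(X) = E<sub>CBS</sub> + A X<sup>−p</sup> through X = 2 (aug-cc-pVDZ) and X = 3 (aug-cc-pVTZ), applied with p = α to the HF energy and p = β to the correlation energy. The minimal Python sketch below transcribes it; the function names are hypothetical, energies are assumed to be in hartree, and the self-check uses synthetic numbers rather than data from this study.

```python
def two_point_power_law(e_dz: float, e_tz: float, exponent: float) -> float:
    """Limit of E(X) = E_CBS + A * X**(-exponent) fitted exactly through X = 2 and X = 3."""
    w_tz = 3.0 ** exponent
    w_dz = 2.0 ** exponent
    return (w_tz * e_tz - w_dz * e_dz) / (w_tz - w_dz)


def scheme_i_mp2_cbs(hf_dz: float, hf_tz: float, corr_dz: float, corr_tz: float,
                     alpha: float = 4.93, beta: float = 2.13) -> float:
    """Equation 2: extrapolate HF (exponent alpha) and correlation (exponent beta) separately, then sum."""
    return two_point_power_law(hf_dz, hf_tz, alpha) + two_point_power_law(corr_dz, corr_tz, beta)


# Self-check with synthetic energies built from a known limit (not data from this study).
e_limit, amplitude = -1.0, 0.5
e_dz = e_limit + amplitude * 2.0 ** -4.93
e_tz = e_limit + amplitude * 3.0 ** -4.93
print(round(two_point_power_law(e_dz, e_tz, 4.93), 10))
```

Because the fit passes exactly through both points, any energies generated from that functional form are extrapolated back to their limit; real data only approximately follow it.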

For scheme II, E<sup>HF</sup><sub>CBS</sub> is obtained by an exponential fit to the values obtained with the aug-cc-pVDZ, aug-cc-pVTZ, and aug-cc-pVQZ basis sets, and E<sup>corr</sup><sub>CBS</sub> is calculated with Equation 3.

$$E^{\mathrm{corr}}_{\mathrm{CBS}} = \frac{4^{3} E^{\mathrm{corr}}_{\mathrm{QZ}} - 3^{3} E^{\mathrm{corr}}_{\mathrm{TZ}}}{4^{3} - 3^{3}} \tag{3}$$

The HF energy is taken from Equation 2, regardless of the scheme used to extrapolate the correlation energy.
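For scheme II, the text specifies an exponential fit for the HF energy but not its exact form. The sketch below assumes the common three-parameter form E(X) = E<sub>CBS</sub> + B e<sup>−CX</sup>, fitted exactly through X = 2, 3, 4, and transcribes Equation 3 for the correlation energy. Function names are hypothetical, and the self-check uses synthetic energies, not data from this study.

```python
import math


def exponential_three_point(e_dz: float, e_tz: float, e_qz: float) -> float:
    """Limit of E(X) = E_CBS + B * exp(-C * X) fitted exactly through X = 2, 3, 4.

    For equally spaced X, (E2 - E_CBS)(E4 - E_CBS) = (E3 - E_CBS)**2, which gives
    E_CBS = (E2 * E4 - E3**2) / (E2 + E4 - 2 * E3).
    """
    denominator = e_dz + e_qz - 2.0 * e_tz
    if denominator == 0.0:
        raise ValueError("energies are linear in X; the exponential fit is undefined")
    return (e_dz * e_qz - e_tz ** 2) / denominator


def cubic_two_point(corr_tz: float, corr_qz: float) -> float:
    """Equation 3: limit of E_corr(X) = E_CBS + A * X**-3 through X = 3 and X = 4."""
    return (4 ** 3 * corr_qz - 3 ** 3 * corr_tz) / (4 ** 3 - 3 ** 3)


# Self-check with synthetic energies built from known limits (not data from this study).
hf = [-1.0 + 0.3 * math.exp(-1.2 * x) for x in (2, 3, 4)]
corr = [-0.8 + 0.4 * x ** -3 for x in (3, 4)]
print(round(exponential_three_point(*hf), 8), round(cubic_two_point(*corr), 8))
```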

### Benchmarking of the Density Functionals

We selected a set of 44 density functionals and tested their performance against the ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> values obtained at the CCSD(T)/CBS level. This selection includes functionals from different classes: 2 local density approximation (LDA), 7 generalized gradient approximation (GGA), 5 meta-GGA (m-GGA), 12 hybrid-GGA (h-GGA), 10 hybrid-meta-GGA (hm-GGA), 4 double hybrid-GGA (hh-GGA), 1 nonseparable gradient approximation (NGA), 1 hybrid-meta-NGA (hm-NGA), 1 meta-NGA (m-NGA), and 1 generalized gradient exchange (GGE) functional (**Table 1**). All benchmarking calculations were carried out with the 6-311+G(2df,2p) basis set, selected because, for large biological systems, it is the basis set closest to the CBS limit that can generally be afforded. Larger basis sets would make the calculations extremely computationally demanding and time consuming. This basis set is also frequently close to the DFT CBS limit, and differences in relative energies (activation energies, reaction energies, redox potentials, etc.) with respect to larger basis sets are usually a few tenths of a kcal/mol. Finally, the basis set is large enough to minimize basis set truncation error within DFT, so the performance of each functional can be evaluated without interference from the basis set.

To calculate ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> with each DFT method, we used the structures optimized at the MP2/6-311+G(d,p) level for both Fe(II) and Fe(III) and determined their energies through single-point calculations. ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> was then calculated as the energy difference between the Fe(II) and Fe(III) forms of each complex. All DFT calculations were carried out with the Gaussian 09 suite (Frisch et al., 2009).
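A minimal sketch of this step, assuming total electronic energies in hartree for the Fe(II) and Fe(III) forms of one complex. The sign convention (reduced minus oxidized, with the free electron at zero energy) and the conversion factor are assumptions for illustration, since the text does not state them; for the benchmark, any convention applied consistently to both the DFT and reference values yields the same unsigned deviations.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # CODATA-derived conversion factor


def delta_e_elec(e_fe2_hartree: float, e_fe3_hartree: float) -> float:
    """Electronic energy change (kcal/mol) at 0 K for Fe(III) + e- -> Fe(II).

    Assumed convention: E[Fe(II) complex] - E[Fe(III) complex], electron at zero energy.
    """
    return (e_fe2_hartree - e_fe3_hartree) * HARTREE_TO_KCAL_PER_MOL


def dft_error(dft_value: float, reference_value: float) -> float:
    """Signed deviation of one functional from the CCSD(T)/CBS reference (kcal/mol)."""
    return dft_value - reference_value
```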

### RESULTS AND DISCUSSION

### Calculation of the Electronic Component of the Reduction Potential at the MP2/CBS Level

To obtain an accurate measure of ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> for each iron complex, after the initial geometry optimization we used high-level ab initio post-HF methods with CBS extrapolation to eliminate basis set truncation error. These values were first calculated with MP2 and the aug-cc-pVXZ (X = 2, 3, and 4) basis sets and subsequently extrapolated to the MP2/CBS level, using the two schemes described in the Computational Methods section. They were then combined with the CCSD(T) results (at the corresponding basis set level) to extrapolate the energies to the CCSD(T)/CBS level. In this manner, we obtained reliable ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> values for each of our 12 models, which can be used as references to assess the accuracy of the various density functionals in calculating the electronic energy difference between Fe(II) and Fe(III) in each system. **Table 2** summarizes the ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> results obtained at the MP2 level with the aug-cc-pVDZ, aug-cc-pVTZ, and aug-cc-pVQZ basis sets.

ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> changed by 2–3 kcal/mol from aug-cc-pVDZ to aug-cc-pVTZ, and by a further ∼2–3 kcal/mol from aug-cc-pVTZ to aug-cc-pVQZ, emphasizing the need for extrapolation to the CBS limit. As stated previously, we used two different schemes to determine the MP2/CBS energies. The first was scheme I, developed by D. Truhlar. **Table 3** shows the values obtained at the CBS limit with this scheme, as well as the differences between these energies and the corresponding ones with the aug-cc-pVXZ (X = 2, 3, and 4) basis sets.

Our results show that the MP2/CBS energies obtained with scheme I are closer to those calculated with the quadruple-zeta basis set than to those from any other basis set, as expected, and that the difference between the CBS and quadruple-zeta values is small, as confirmed by the mean signed error (MSE) and mean unsigned error (MUE) for each basis set. We can also observe that the convergence of the reduction potential with the basis set is slow, since at the quadruple-zeta level the values are still, on average, 1.40 kcal/mol away from the CBS limit.

#### TABLE 1 | List of all the density functionals tested in this work.

Using scheme II, developed by T. Helgaker and co-workers, to extrapolate ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> to the MP2/CBS level, we obtained the results shown in **Table 4**. This scheme uses the energies calculated with the aug-cc-pVTZ and aug-cc-pVQZ basis sets for the extrapolation. Since these basis sets are more complete than those used in scheme I, the reduction potentials extrapolated with scheme II might be expected to be more accurate.

Besides the ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> values extrapolated to the MP2/CBS level with scheme II, **Table 4** also shows the differences between these values and both those extrapolated with scheme I and those calculated at the MP2/aug-cc-pVQZ level. The scheme II MP2/CBS and MP2/aug-cc-pVQZ energies differ slightly, by 0.88 kcal/mol on average. This indicates that, for these systems, values calculated with quadruple-zeta basis sets are not accurate enough, and extrapolation to the CBS limit is indeed necessary. It is also interesting to note that the difference between the two extrapolation schemes is small (MUE = 0.52 kcal/mol) but not insignificant, especially considering that both methods are expected to yield very accurate results. In some cases this difference is small enough to be disregarded [0.024 kcal/mol for Fe(H2O) and 0.0830 kcal/mol for Fe(H2O)3(CH3S<sup>−</sup>)], whereas in others, such as Fe(H2O)3(CH3O<sup>−</sup>) and Fe(H2O)5(CH3O<sup>−</sup>), it is somewhat relevant (∼1.3 kcal/mol). Since scheme II uses larger basis sets to perform the extrapolation, we chose its energies for the remainder of the work, particularly for the extrapolation to the CCSD(T)/CBS level. It is, however, worth mentioning that, although using large basis sets to obtain more accurate results is good practice for systems with few atoms, such methods are impractical for large biological systems, where the system would have to be reduced significantly, introducing much larger errors.

Fortunately, DFT methods converge much faster than post-HF methods with respect to basis set completeness, so they are far less affected by basis set truncation errors as large as those seen here.

TABLE 2 | Results for ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> calculated at the MP2 level with the aug-cc-pVDZ, aug-cc-pVTZ, and aug-cc-pVQZ basis sets, for each model.

TABLE 3 | ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> obtained at the MP2/CBS level, extrapolated using scheme I, for each model, and the differences between these new energies and the ones obtained in Table 2.

TABLE 4 | ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> obtained at the MP2/CBS level, extrapolated using scheme II, for each model, and the differences between these energies and those obtained using scheme I and at the MP2/aug-cc-pVQZ level.


### Calculation of the Reduction Potential at the CCSD(T)/CBS Level

The last step in calculating the reference ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> values is the extrapolation to the CCSD(T)/CBS level. Once ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> is known at the MP2/CBS level, this step is straightforward: we add the difference between the reduction potentials calculated with CCSD(T) and MP2 (ΔE<sub>CCSD(T)−MP2</sub>) with the same basis set, aug-cc-pVDZ in our case. This widely used approximation relies on the fact that the difference in correlation energy between CCSD(T) and MP2 does not depend strongly on the basis set, particularly for medium/large basis sets. Because of the high computational cost of CCSD(T), we computed ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> with this method only with the aug-cc-pVDZ basis set. Previous studies have shown that using a larger basis set for this correction does not significantly alter the extrapolated value, and that the largest difference between using double- or triple-zeta basis sets to obtain ΔE<sub>CCSD(T)−MP2</sub> was <0.1 kcal/mol (Jurečka and Hobza, 2002). Jurečka et al. (2006) have also stated that double-zeta basis sets are complete enough to obtain a good extrapolation to the CBS limit, and that larger basis sets require more computational time without much benefit.
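The additive correction described above reduces to a single line. The sketch below, with hypothetical names, applies it to values in kcal/mol; because the correction is a simple difference, the same function applies to total energies or to derived ΔE values. The inputs in the usage line are placeholders, not results from this study.

```python
def ccsdt_cbs_estimate(mp2_cbs: float, mp2_adz: float, ccsdt_adz: float) -> float:
    """Estimate a CCSD(T)/CBS quantity as MP2/CBS + [CCSD(T) - MP2] at aug-cc-pVDZ.

    Assumes the CCSD(T)-MP2 difference is roughly basis-set independent, as argued
    in the text; works for total energies or for derived Delta E values alike.
    """
    higher_order_correction = ccsdt_adz - mp2_adz
    return mp2_cbs + higher_order_correction


# Placeholder inputs (kcal/mol), purely illustrative:
print(ccsdt_cbs_estimate(mp2_cbs=-100.0, mp2_adz=-90.0, ccsdt_adz=-92.0))  # -102.0
```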

**Table 5** presents ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> at the CCSD(T)/CBS level (as stated previously, the CCSD(T)/CBS values were obtained from the scheme II extrapolations). The difference between CCSD(T) and MP2 at the CBS limit is significant, reflecting the important role of correlation energy in this chemical transformation, which is expected since an electron is removed from the system.

TABLE 5 | ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> obtained at the CCSD(T)/CBS level and at the MP2/CBS level (extrapolated using scheme II), as well as the difference between these two values for each model.


The energies extrapolated to the MP2/CBS level are much closer to the true value than those calculated with a truncated basis set, but they are nonetheless approximations, with errors introduced at different points throughout our calculations. Furthermore, the conversion from the MP2 level to the CCSD(T) level at the CBS limit also introduces a small error. Overall, these results were obtained through several approximations, each of which adds a small error to the final values. In total, we expect our reference values (ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> calculated at the CCSD(T)/CBS level) to have an associated uncertainty of 1 kcal/mol or less in most cases. Consequently, density functionals whose errors differ by less than 1 kcal/mol should not be distinguished qualitatively.

### Benchmarking of the DFT Functionals

After obtaining the reference ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> values of iron in each complex, we can now use them to evaluate the performance of the 44 selected density functionals. Note that this benchmark does not measure the overall quality of each density functional; it merely indicates which of them best estimate the property of interest (the iron reduction potential) for these specific systems, which we expect to be representative of amino acid side-chain coordination shells.

Using the optimized geometries of all systems, we performed single-point calculations with each of the 44 density functionals and the 6-311+G(2df,2p) basis set, and calculated ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub>. We selected a large basis set to minimize truncation errors, so that the differences in the results arise essentially from the density functional. The ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> values obtained are compared with the reference values calculated at the CCSD(T)/CBS level.

To make the results easier to discuss, and because our study involves several different systems, we divided our models into three groups (**Figure 1**). Group A comprises the iron complexes with only water ligands, with coordination numbers 1, 2, 4, and 6, whereas groups B and C include complexes containing one ligand that represents an amino acid side chain. These two groups differ in the coordination number of the iron ion: 4 for group B and 6 for group C, with the iron complexed by 3 and 5 water molecules, respectively, plus the ligand simulating the amino acid side chain.

For the next part of this paper, we will only analyze the ten functionals that gave the best performance for each case. We decided to present our results in this fashion in order to make a more focused and easy-to-read discussion. A full table displaying the results for all the DFs tested is available in the **Supporting Information** section of this paper.

We begin our discussion with the systems belonging to group A.

**Table 6** presents the results obtained for the ten density functionals that showed the best performance for this group. Performance is measured as the mean unsigned error (MUE) of ΔE<sup>Fe3+/Fe2+</sup><sub>elec</sub> calculated with each density functional relative to the reference value. We also show the maximum error (MaxE), i.e., the largest absolute difference between the reference value and the value calculated with the density functional, together with the model in which it occurs.
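The error metrics used in this section can be computed as follows. This minimal Python sketch assumes ΔE values in kcal/mol stored in dictionaries keyed by hypothetical model names; MSE keeps the sign of the deviations, MUE averages their absolute values, and MaxE is the largest absolute deviation within a group. The ranking mirrors the top-ten-by-MUE selection described in the text, and the toy numbers are illustrative only.

```python
from statistics import mean


def error_stats(predicted: dict[str, float], reference: dict[str, float]) -> dict:
    """MSE, MUE, and MaxE (plus the model where it occurs) of one functional vs. the reference."""
    deviations = {model: predicted[model] - reference[model] for model in reference}
    worst_model = max(deviations, key=lambda m: abs(deviations[m]))
    return {
        "MSE": mean(deviations.values()),
        "MUE": mean(abs(d) for d in deviations.values()),
        "MaxE": abs(deviations[worst_model]),
        "MaxE_model": worst_model,
    }


def rank_by_mue(results: dict[str, dict[str, float]], reference: dict[str, float], top: int = 10):
    """Sort functionals by increasing MUE and keep the `top` best."""
    stats = {name: error_stats(values, reference) for name, values in results.items()}
    return sorted(stats.items(), key=lambda item: item[1]["MUE"])[:top]


# Toy numbers (kcal/mol) with hypothetical model and functional names, not data from this study.
reference = {"model_1": 10.0, "model_2": 20.0}
results = {
    "functional_A": {"model_1": 11.0, "model_2": 19.0},
    "functional_B": {"model_1": 10.5, "model_2": 20.5},
}
for name, s in rank_by_mue(results, reference):
    print(name, s["MUE"], s["MaxE"], s["MaxE_model"])
```

The toy case shows why MSE alone is not used for ranking: functional_A has an MSE of zero because its errors cancel, yet its MUE is twice that of functional_B.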

Based on these results, we can further divide the DFs into three accuracy groups. Group I includes all density functionals with an MUE between 0 and 2.31 kcal/mol (an error below 0.1 V per electron transferred). Group II comprises functionals with an MUE between 2.31 and 4.62 kcal/mol (between 0.1 and 0.2 V per electron transferred), and group III those with an MUE higher than 4.62 kcal/mol. The density functionals in group I are BB1K, mPWB1K, and mPW1N. Interestingly, these DFs all belong to either the hybrid-meta-GGA or the hybrid-GGA class, and only two functionals in the whole table belong to other classes. The first four functionals all have a rather high percentage of HF exchange, above 40%. The very popular B3LYP functional also appears in the table, in 9th place, and its MUE places it in group II. It is also worth noting that Fe(H2O)6 is the system in which the largest error is most frequently found.
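The thresholds above come from converting a one-electron potential error into an energy with ΔG = −nFE, where F ≈ 23.06 kcal mol<sup>−1</sup> V<sup>−1</sup>, so that 0.1 V corresponds to about 2.31 kcal/mol. The sketch below, with hypothetical names, applies this conversion and the three-way grouping; because it classifies in volts, a value falling exactly on the rounded kcal/mol boundaries (2.31 and 4.62) may be assigned slightly differently than in the text.

```python
# Faraday constant expressed in kcal mol^-1 V^-1: 96485.33 C/mol divided by 4184 J/kcal.
FARADAY_KCAL_PER_MOL_PER_VOLT = 23.0605


def kcal_to_volts(error_kcal_per_mol: float, n_electrons: int = 1) -> float:
    """Convert an energy error magnitude into a potential error magnitude: |E| = |dG| / (n F)."""
    return error_kcal_per_mol / (n_electrons * FARADAY_KCAL_PER_MOL_PER_VOLT)


def accuracy_group(mue_kcal_per_mol: float) -> str:
    """Group I: < 0.1 V, group II: 0.1-0.2 V, group III: > 0.2 V per electron transferred."""
    error_volts = kcal_to_volts(mue_kcal_per_mol)
    if error_volts < 0.1:
        return "I"
    if error_volts <= 0.2:
        return "II"
    return "III"


print(round(kcal_to_volts(2.31), 3), accuracy_group(2.0), accuracy_group(2.51), accuracy_group(5.0))
```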

We could also rank the functionals by MaxE. However, the results in **Table 6** show that MUE and MaxE follow similar trends: with a few exceptions, functionals with a smaller MUE also tend to have a smaller MaxE. For this reason, we continued to rank the DFs by MUE.

For group B, we performed the same analysis as for group A. As stated previously, this group consists of systems in which three water molecules and one ligand are coordinated to the iron ion in a tetrahedral geometry. Each ligand represents a different amino acid side chain (CH3O<sup>−</sup> for serine, HCOO<sup>−</sup> for glutamate and aspartate, CH3S<sup>−</sup> for cysteine, and NH2CH3 for lysine), corresponding to the most common residues that participate in iron complexes of proteins.

The results for group B (**Table 7**) show that the most accurate DFs for this group largely coincide with those that performed best for group A. In this case, the density functionals in group I are BMK and B3LYP, which are, once again, hybrid-meta-GGAs or hybrid-GGAs. The percentage of HF exchange for most of the top ten functionals is also higher than 40%, with MN12-L as a remarkable exception. The complex at which the MaxE occurs most often in this group is Fe(CH3S<sup>−</sup>)(H2O)3.

Finally, group C is composed of iron complexes with coordination number 6, in which one of the ligands is a chemical group representing an amino acid side chain and the other five are water molecules. The benchmarking results for group C are presented in **Table 8**. Six functionals have MUE values < 2.31 kcal/mol, placing them in group I: mPW1B95, PBE1PBE, B3PW91, ωB97X-D, BB1K, and mPWB1K. The percentage of HF exchange is not as high for this group, but some of the top ten functionals still have more than 40% HF exchange. Again, all functionals in group I, and indeed in the whole table, are either hybrid-meta-GGAs or hybrid-GGAs. Consistent with the results for group B, the CH3S<sup>−</sup> complex is the one at which the MaxE occurs most often. B3LYP also performs well for this group, ranking 7th.

We also assessed the contribution of dispersion in the iron complexes of group C, which have a higher number of ligands and in which dispersion interactions should therefore be more noticeable. To this end, we added Grimme's dispersion correction (Grimme et al., 2010; Goerigk and Grimme, 2011) to the top ten functionals (**Table 8**). Its inclusion marginally improves the results, reducing the average MUE by ∼0.15 kcal/mol. However, the precision of the reference CCSD(T)/CBS method is about 0.5–1 kcal/mol, so we consider that the dispersion correction does not significantly influence the results or, consequently, the conclusions drawn from them.
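The dispersion test reduces to two steps. First, average the change in MUE over the top ten functionals when the correction is added. Second, compare that change with the precision of the reference values. The sketch below assumes the MUEs with and without the correction are already available; computing the dispersion term itself requires a dedicated program and is not shown. It conservatively uses the lower end (0.5 kcal/mol) of the quoted precision. The function names and values are hypothetical.

```python
from statistics import mean
from typing import Dict

REFERENCE_PRECISION = 0.5  # kcal/mol, lower end of the 0.5-1 kcal/mol quoted for CCSD(T)/CBS

def mean_mue_change(mue_plain: Dict[str, float], mue_disp: Dict[str, float]) -> float:
    """Average change in MUE (kcal/mol) on adding dispersion; negative means improvement."""
    shared = sorted(mue_plain.keys() & mue_disp.keys())  # functionals computed both ways
    return mean(mue_disp[f] - mue_plain[f] for f in shared)

def exceeds_reference_precision(change: float, precision: float = REFERENCE_PRECISION) -> bool:
    """True only if the average change is larger than the reference method's precision."""
    return abs(change) > precision

# Hypothetical MUEs (kcal/mol) for two placeholder functionals
without_d = {"functional_1": 2.0, "functional_2": 2.5}
with_d = {"functional_1": 1.9, "functional_2": 2.3}
change = mean_mue_change(without_d, with_d)          # about -0.15
significant = exceeds_reference_precision(change)    # False
```

An average improvement of this size lies well inside the reference uncertainty, which is the reasoning the text uses to set the correction aside.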

Having analyzed the performance of the 44 density functionals in estimating the reduction potentials of iron complexes in the three separate groups, we now assess their overall performance across all the systems studied.

**Table 9** presents the ten functionals that performed best in calculating $\Delta E_{\mathrm{elec}}^{\mathrm{Fe^{3+}/Fe^{2+}}}$ for all the iron complexes studied. Three functionals have MUE values that place them in group I: BB1K, mPWB1K, and mPW1B95. All three belong to the hybrid-meta-GGA class. The very popular B3LYP functional ranks 5th in this overall analysis. All functionals in the top ten are either hybrid-meta-GGAs or hybrid-GGAs, and most of them have more than 40% HF exchange. We therefore conclude that these two classes are the best at estimating $\Delta E_{\mathrm{elec}}^{\mathrm{Fe^{3+}/Fe^{2+}}}$ for the iron complexes studied, and that they should be used when calculating the reduction potentials of iron complexes.

TABLE 7 | Top ten best performing functionals for group B complexes. For each functional, the model with the largest difference between reference and calculated value is highlighted (red).

TABLE 9 | Top ten best performing functionals for all the complexes studied. The complex at which the MaxE occurs is also shown.

### CONCLUSION

In this study, we analyzed the performance of 44 density functionals in estimating the electronic component of the reduction potential at 0 K of 16 different iron complexes, with particular interest in iron complexes whose ligands resemble amino acid side chains. We determined which of these DFs gave results closest to reference values calculated with very high-level computational methods. These reference values were subsequently extrapolated to the very accurate CCSD(T)/CBS level of theory, for which two different extrapolation schemes were tested. The geometry of each of the 16 systems was first optimized at the MP2/6-311+G(d,p) level of theory. Single-point calculations were then performed with each of the 44 selected functionals and the 6-311+G(2df,2p) basis set. From the energy difference between the Fe<sup>3+</sup> and Fe<sup>2+</sup> forms of each complex, we calculated $\Delta E_{\mathrm{elec}}^{\mathrm{Fe^{3+}/Fe^{2+}}}$, compared the results with the reference values, and ranked the DFs according to this difference.
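The benchmarked quantity is an energy difference between the two oxidation states of each complex, converted from hartree to kcal/mol. The minimal sketch below assumes total electronic energies in hartree from the single-point calculations. It assumes the sign convention $\Delta E = E(\mathrm{Fe^{3+}}) - E(\mathrm{Fe^{2+}})$. This choice does not affect the unsigned errors, provided the same convention is applied to the DFT and reference values. The function name and energies are placeholders, not results from the paper.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5094740631  # CODATA 2018

def delta_e_elec(e_fe3_hartree: float, e_fe2_hartree: float) -> float:
    """Electronic energy difference (kcal/mol) between the Fe(III) and Fe(II) forms.

    Sign convention assumed here: Delta E = E(Fe3+) - E(Fe2+).
    """
    return (e_fe3_hartree - e_fe2_hartree) * HARTREE_TO_KCAL_PER_MOL

# Placeholder total energies (hartree) for one complex at one level of theory
e_fe2 = -1500.000
e_fe3 = -1499.500
delta_e = delta_e_elec(e_fe3, e_fe2)  # 0.5 hartree -> about 313.75 kcal/mol
```

Repeating this for every complex at each DFT level and at the reference level gives the paired values whose unsigned differences define the MUE.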

Our results show that the MP2/CBS values obtained with the two extrapolation schemes differ very little (average difference of 0.52 kcal/mol). For the final extrapolation of the reduction potentials to the CCSD(T)/CBS level, we selected the values calculated with scheme II, since it uses results obtained with a larger basis set.
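This section does not spell out extrapolation schemes I and II. For orientation only, the sketch below shows one widely used two-point form. It assumes the correlation energy converges as $E_X = E_{\mathrm{CBS}} + A X^{-3}$ with basis-set cardinal number $X$, so that $E_{\mathrm{CBS}} = (Y^3 E_Y - X^3 E_X)/(Y^3 - X^3)$. The sketch also shows a common additive estimate of the CCSD(T)/CBS energy: the MP2/CBS value plus a CCSD(T)−MP2 correction computed in a smaller basis. Both choices are assumptions and may differ from the schemes actually used, and the function names are hypothetical.

```python
def two_point_cbs(e_x: float, x: int, e_y: float, y: int) -> float:
    """Extrapolate a correlation energy to the CBS limit from cardinal numbers x < y.

    Solves E_X = E_CBS + A * X**-3 for E_CBS using the two data points.
    """
    return (y**3 * e_y - x**3 * e_x) / (y**3 - x**3)

def ccsdt_cbs_additive(mp2_cbs: float, ccsdt_small: float, mp2_small: float) -> float:
    """Estimate E[CCSD(T)/CBS] as E[MP2/CBS] + (E[CCSD(T)] - E[MP2]) in a small basis."""
    return mp2_cbs + (ccsdt_small - mp2_small)

# Self-check with synthetic data that follow the assumed X**-3 form exactly
E_CBS_TRUE, A = -1.0, 0.5
e_tz = E_CBS_TRUE + A / 3**3   # X = 3 (triple zeta)
e_qz = E_CBS_TRUE + A / 4**3   # X = 4 (quadruple zeta)
recovered = two_point_cbs(e_tz, 3, e_qz, 4)   # close to -1.0
```

The inverse-cubic form is typically applied to the correlation energy only. The Hartree-Fock component is usually extrapolated separately or taken from the largest basis set.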

The benchmarking study showed that the best functionals for calculating the reduction potential of iron complexes, especially those associated with biological systems, belong to the hybrid-GGA or hybrid-meta-GGA classes and have a high percentage of HF exchange. Overall, BB1K, mPWB1K, and mPW1B95 performed best for all the iron complexes studied (MUE of 1.72, 1.93, and 2.28 kcal/mol, respectively). The very popular B3LYP functional performed reasonably well, ranking 5th in our list of best functionals.

As a final remark, it is important to note that our reference values carry an estimated error of about 1 kcal/mol. This makes it very difficult to discriminate between functionals whose performance differs by <1 kcal/mol. Nevertheless, our results indicate which DFs are most likely to perform well for iron complexes relevant to biological systems, even though they cannot identify the single best density functional among them.
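The final remark can be made concrete with a simple heuristic. Any functional whose MUE lies less than the reference error (about 1 kcal/mol) above the lowest MUE is treated as indistinguishable from the best. This is a rough band, not a formal statistical test. The function name is hypothetical, and the MUEs are those reported above for the three best functionals.

```python
from typing import Dict, List

def within_reference_error(mue: Dict[str, float], ref_error: float = 1.0) -> List[str]:
    """Functionals whose MUE is less than ref_error (kcal/mol) above the best MUE, best first."""
    best = min(mue.values())
    close = [name for name, value in mue.items() if value - best < ref_error]
    return sorted(close, key=mue.get)

overall_mue = {"BB1K": 1.72, "mPWB1K": 1.93, "mPW1B95": 2.28}  # kcal/mol, from the Conclusion
print(within_reference_error(overall_mue))  # ['BB1K', 'mPWB1K', 'mPW1B95']
```

The three top functionals lie within about 0.56 kcal/mol of one another. They therefore fall inside the 1 kcal/mol band, which is why no single best functional can be named.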

### DATA AVAILABILITY

All datasets generated for this study are included in the manuscript and/or the **Supplementary Files**.

### AUTHOR CONTRIBUTIONS

RL performed most of the calculations, analyzed the results, and contributed to the writing of the manuscript. DG and PP performed a part of the calculations, analyzed the results, and contributed to the writing of the manuscript. MR and PF planned the research, analyzed the results, and contributed to the writing of the manuscript.

### FUNDING

This work received financial support from the European Union (FEDER funds POCI/01/0145/FEDER/007728) and National Funds (FCT/MEC, Fundação para a Ciência e Tecnologia and Ministério da Educação e Ciência) under the Partnership Agreement PT2020 UID/MULTI/04378/2013 and UID/MULTI/04378/2019, as well as through the project PTDC/QUI-QFI/28714/2017.

### ACKNOWLEDGMENTS

We acknowledge Fundação para a Ciência e Tecnologia through grants IF/00052/2014 and PD/BD/135268/2017.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2019.00391/full#supplementary-material

### REFERENCES

correlation energies. J. Chem. Phys. 87, 5968–5975. doi: 10.1063/1.453520

comparative assessments for hydrogen bonding and van der Waals interactions. J. Phys. Chem. A 108, 6908–6918. doi: 10.1021/jp048147q


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Listyarini, Gesto, Paiva, Ramos and Fernandes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.