# Free Energy, Enthalpy and Entropy from Implicit Solvent End-Point Simulations

^{1}Dipartimento di Scienze Matematiche, Informatiche e Fisiche, Universita' di Udine, Udine, Italy^{2}Istituto Nazionale Biostrutture e Biosistemi, Rome, Italy^{3}Dipartimento di Area Medica, Universita' di Udine, Udine, Italy^{4}Science and Math Division, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates

Free energy is the key quantity to describe the thermodynamics of biological systems. In this perspective we consider the calculation of free energy, enthalpy and entropy from end-point molecular dynamics simulations. Since the enthalpy may be calculated as the ensemble average over equilibrated simulation snapshots the difficulties related to free energy calculation are ultimately related to the calculation of the entropy of the system and in particular of the solvent entropy. In the last two decades implicit solvent models have been used to circumvent the problem and to take into account solvent entropy implicitly in the solvation terms. More recently outstanding advancement in both implicit solvent models and in entropy calculations are making the goal of free energy estimation from end-point simulations more feasible than ever before. We review briefly the basic theory and discuss the advancements in light of practical applications.

## 1. Introduction

Free energy is the key quantity in the description of the thermodynamics of biological systems, and therefore an important objective of biomolecular simulations. In spite of its relevance, free energy calculations are *not* performed for every simulation performed although most often concepts like (local) stability are used to describe the results of the simulation suggesting a thermodynamic description.

The difficulties arise obviously from the calculation of the entropy of the system which cannot be immediately computed as an ensemble average over simulation snapshots, contrary to the enthalpy. The free energy *G* can be formally related to an ensemble average:

where *R* is the gas constant, *T* the temperature, β is equal to $\frac{1}{RT}$, *U* is the potential energy and ⟨⟩ indicates thermodynamic ensemble average. However, as pointed out many years ago (Beveridge and DiCapua, 1989), this formula is of no practical use unless all configurational space enters (implicitly or explicitly) the average, because lowest probability configurations display the highest value contribution to the average and there cannot be convergence for upper unbound potentials. In molecular simulations typically only near-equilibrium configurations are sampled.

For this reason free energy differences calculation is performed using methods, like umbrella sampling (Torrie and Valleau, 1977), thermodynamic integration (Straatsma and McCammon, 1991) or metadynamics (Laio and Parrinello, 2002), that compute by various methods the free energy along a pathway connecting end-point states, even if it is often just the difference in free energy between the latter points that is actually needed.

If the two end-points are close enough a simulation of one or both end-points may be performed and free energy differences obtained by free energy perturbation (Zwanzig, 1954), but this is not possible in general.

The problem with the computation of the entropy from a single MD simulation is that only part of the configurational space is accessed by simulation.

In this short perspective we remark that the long sought calculation of free energies from end point simulations may be afforded with reasonable accuracy from implicit solvent end-point simulations. In particular, some recent developments in the way entropy is calculated in fact allow to compute free energy from a single simulation, so it is worth to review briefly the theory here and to provide simple formulae to compute free energy, entropy and enthalpy from implicit solvent simulations.

## 2. Theory

We consider here simulations sampling different states of a system *A* in solution. This can be done by different simulations if the states are kinetically well separated, or by post-processing a single simulation to divide microstates belonging to different states.

If the simulation is extensive enough most probable microstates are sampled and representative thermodynamic ensembles are generated for the different states which include samples of all microstates whose probability density is higher than a threshold.

Enthalpy can be computed as the energy ensemble average, whereas entropy is problematic, in particular for what concerns solvent degrees of freedom. Although solvent entropy has been taken into account in some studies (e.g., De Simone et al., 2005) the extensive correlation among solvent molecules makes solvent entropy estimation a difficult task. In order to get rid of this problem we treat the system using an implicit solvent model.

### 2.1. Implicit Solvent Models

Following the excellent reviews available (Gilson et al., 1997; Roux and Simonson, 1999; Wereszczynski and McCammon, 2012) we write the standard chemical potential, or molar free energy, in the following form:

where ${\overrightarrow{r}}_{A}$ and ${\overrightarrow{r}}_{S}$ are the solute and solvent coordinates. *C* includes the integrals over the reference state (1M, random orientation) and the momentum integrals (independent of conformation) that cancel when comparing different states of the same system and the term ${P}^{0}{\stackrel{\u0304}{V}}_{A}$, with *P*^{0} the standard pressure and ${\stackrel{\u0304}{V}}_{A}$ the partial volume of the solute, whose dependence on conformation provides negligible effects at standard pressure. For a derivation see Gilson et al. (1997).

For this reason in the following *C* will not be considered further.

Due to the difficulties in estimating the entropy of highly correlated solvent molecules the solvation potential of mean force ($\Delta W({\overrightarrow{r}}_{A},T)$) is defined by integrating out the solvent degrees of freedom.

where *U*_{AS} denotes the energy terms that couple the solute and the solvent and *U*_{S} is the solvent energy.

With this definition and the assumptions made above we have:

where the dependence on the temperature of the solvation energy Δ*W* has been made explicit.

Implicit solvent models provide a functional form and parameters for $\Delta W({\overrightarrow{r}}_{A},T)$.

Although the implicit solvent model based on the Poisson-Boltzmann equation (Fogolari et al., 2002) and the solvent accessible surface area (PBSA) could be used in molecular dynamics simulations (Fogolari et al., 2003), the method based on the Generalized Born model and solvent accessible surface area (GBSA) is the method of choice for its computational efficiency (Still et al., 1990; Onufriev et al., 2004). Limitations of both methods include treatment of small crevices and cavities, where water could not display bulk solvation properties, and neglection of curvature dependence of surface tension coefficient (Nicholls et al., 1991). For GBSA an additional limitation could be due to the dependence of empirical parameters on molecular shape (see e.g., Fogolari et al., 2015a). Compared to approaches simulating the solutes in explicit solvent and post-processing the trajectory in implicit solvent (Kollman et al., 2000), methods that generate the configurational ensemble using the same implicit solvent used for energy calculations are more consistent because they don't suffer from possible mismatches between the implicit and explicit solvent models.

In recent years some of the available implicit solvent models (including forcefield and set of parameters used (Swanson et al., 2005, 2007)) have been shown to be extremely accurate in the treatment of protein thermodynamics, as demonstrated by the study by Simmerling and coworkers, where 16 out of 17 proteins could be correctly folded using the GB-Neck2 implicit solvent model and for 14 of them the native fold was preferred over the misfolded one (Mongan et al., 2007; Nguyen et al., 2014).

### 2.2. Enthalpy in Implicit Solvent Models

The molar enthalpy is obtained as:

When the derivation is performed, taking into account that also $\Delta W({\overrightarrow{r}}_{A},T)$ depends on the temperature, we obtain:

where the symbol <> indicates the ensemble average of the quantity within brackets, i.e., the average over the simulation of the same quantity.

The above equation shows that the enthalpy, which is obtained as the energy ensemble average in MD simulations, is directly related to the ensemble average of the implicit solvent model potential energy. The difference is expressed by the $-T<\frac{\partial \Delta W}{\partial T}>$ term which may however be obtained by explicit derivation with respect to the temperature of the solvation energy terms in the implicit solvation model.

In the GBSA model the solvation energy is the sum of an electrostatic (Δ*W*^{el}) and a surface tension (Δ*W*^{SA}) term:

where ϵ(*T*) and ϵ_{in} are the solvent and molecular (typically 1) dielectric constants, respectively, *q*_{i} is the charge of the *i*^{th} atom, *f*_{ij} a pairwise function depending on all atoms' coordinates, γ(*T*) is a surface tension and *A* is the solvent-accessible surface area. The dependence on temperature of the solvent dielectric constant and of the surface tension has been made explicit in the above equation. The contributions to the derivative of Δ*W* with respect to temperature are easily obtained from the electrostatic and surface tension solvation energy terms, when the temperature dependence of the parameters of the implicit solvent model is known:

### 2.3. Entropy in Implicit Solvent Models

The entropy is linked to the configurational solute probability distribution.

First consider that the probability density distribution in the implicit solvent model is:

When the derivation of the free energy is performed the entropy can be written as:

Rewriting the entropy in terms of probability density allows to rewrite the free energy in turn as the sum of an ensemble average and a configurational entropy term, which is itself an ensemble average:

As said above the difficulty in estimating free energies resides in the entropy estimation which requires consideration of both sampled and non-sampled configurational space. Similarly, to estimate $\rho ({\overrightarrow{r}}_{A},T)$, consideration of non-sampled configurational space is needed. Implicit solvent models circumvent the problem treating solvent entropy implicitly, through the parameters and their temperature dependence.

In the above equation the configurational entropy is written formally as an ensemble average of the probability density. In this respect a convenient description of systems containing proteins is the bond, angle, torsion (BAT) representation (Go and Scheraga, 1976), because bonds, and to some extent also angles, contribute very little to changes in entropy for different states of proteins (e.g., notably, bound and non-bound) (Karplus et al., 1987). For this reason entropy is estimated often considering to a first approximation only torsional degrees of freedom and possibly external rotation and translation degrees of freedom.

Other descriptions, e.g., in cartesian coordinates, have been used and the entropy has been calculated assuming the system is moving harmonically or anharmonically about the energy minimum (see recent reviews Polyansky et al., 2012; Wereszczynski and McCammon, 2012). This approximation may however be poor for loops or other unrestrained parts of the molecules.

A particularly attractive method to compute entropy is the approach proposed by Singh et al. (2003) which was further developed by the same authors and others (Hnizdo et al., 2003, 2007, 2008; Darian et al., 2005; Numata et al., 2007; Wang et al., 2009; Mukherjee, 2011; Fenley et al., 2014; Huggins, 2014; Fogolari et al., 2015b, 2016). In practice the probability density ρ(*x*_{1}, *x*_{2}, …, *x*_{s}) may be estimated considering a ball of radius *r*_{i} around each configurational sample ${\overrightarrow{x}}_{i}$ up to the k-th nearest neighbor (Figure 1), then the local probability density ($\widehat{\rho}({\overrightarrow{x}}_{i})$) is obtained by dividing the number *k* of the *n* samples which are found inside the ball over the volume of the ball *V*_{i} and *n*, i.e.,:

The idea is very simple and can be made rigorous (Singh et al., 2003). When equation 14, or its exact form, is substituted in equation 13 the free energy may be estimated easily from configurational samples.

The discrepancy between naive and exact treatment is limited. Note that the volume around each configurational sample is tuned to match the local density and there is no need consider regions where no configurational sample is found. Obviously the dimensionality of a protein system is large and it is not possible in practice to consider a probability density without actually assuming that most degrees of freedom are decoupled from each other.

The groups of Gilson (Killian et al., 2007) and Tidor (King and Tidor, 2009; King et al., 2012) have proposed treatments of mutual information that provide a practical estimate of global entropy (actually, an upper bound) considering only single degrees of freedom and pairwise mutual information. Depending on the number of samples the approach can be easily extended to higher orders of correlation.

## 3. Conclusion

We have recapitulated above the fundamentals of implicit solvent free energy calculations from end-point simulations, recalling the relationship between explicit and implicit solvent models, and showing how entropy and enthalpy can be obtained from implicit solvent simulations.

From a practical point of view we can estimate free energy from implicit solvent simulations using equation 13:

The ensemble average of *U* + Δ*W* is provided by the potential energy in the implicit solvent model used, whereas the entropy $-R<log(\rho ({\overrightarrow{r}}_{A},T))>$ must be computed from the conformational ensemble.

The available implicit solvent models and parameters, together with forcefield parameters, have been shown to be accurate enough to reproduce complex phenomena like protein folding giving confidence in the accuracy of the implicit solvent potential of mean force. The average of the potential energy over an implicit solvent molecular dynamics trajectory thus provides the first term in the above equation.

The entropic term may be estimated using the nearest neighbor method which is emerging as an accurate entropy estimator, with many advantages over more traditional methods, including lack of hypothesis on non-sampled conformational space, no need to consider explicitly non-sampled conformational space and sound theoretical basis.

To make the application of the method straightforward we have implemented the nearest-neighbor method in two programs, PDB2ENTROPY and PDB2TRENT, available through the git-hub repository (URL: https://github.com/federico-fogolari) which allow to compute conformational and rotational-translational entropy directly from the conformational ensemble in PDB format.

In summary, the recent advancements in solvation models and entropy calculation, based on the nearest-neighbor method, are making computation of free-energy from end-point simulations significantly more accurate than before, with many possible applications in the next future.

## Author Contributions

All authors listed, have made substantial, direct and intellectual contribution to the work, and approved it for publication.

## Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Acknowledgments

Dr. I. M. Lait and A. Makek are gratefully acknowledged for support.

## References

Beveridge, D. L., and DiCapua, F. M. (1989). Free energy via molecular simulation: application to chemical and biomolecular systems. *Annu. Rev. Biophys. Biophys. Chem.* 18, 431–492. doi: 10.1146/annurev.bb.18.060189.002243

Darian, E., Hnizdo, V., Fedorowicz, A., Singh, H., and Demchuk, E. (2005). Estimation of the absolute internal-rotation entropy of molecules with two torsional degrees of freedom from stochastic simulations. *J. Comput. Chem.* 26, 651–660. doi: 10.1002/jcc.20198

De Simone, A., Dodson, G. G., Verma, C. S., Zagari, A., and Fraternali, F. (2005). Prion and water: tight and dynamical hydration sites have a key role in structural stability. *Proc. Natl. Acad. Sci. U.S.A.* 102, 7535–7540. doi: 10.1073/pnas.0501748102

Fenley, A. T., Killian, B. J., Hnizdo, V., Fedorowicz, A., Sharp, D. S., and Gilson, M. K. (2014). Correlation as a determinant of configurational entropy in supramolecular and protein systems. *J. Phys. Chem. B* 118, 6447–6455. doi: 10.1021/jp411588b

Fogolari, F., Brigo, A., and Molinari, H. (2002). The Poisson-Boltzmann equation for biomolecular electrostatics: a tool for structural biology. *J. Mol. Recogn.* 15, 377–392. doi: 10.1002/jmr.577

Fogolari, F., Brigo, A., and Molinari, H. (2003). Protocol for MM/PBSA molecular dynamics simulations of proteins. *Biophys. J.* 85, 159–166. doi: 10.1016/S0006-3495(03)74462-2

Fogolari, F., Corazza, A., and Esposito, G. (2015a). Accuracy assessment of the linear Poisson-Boltzmann equation and reparametrization of the OBC generalized Born model for nucleic acids and nucleic acid-protein complexes. *J. Comput. Chem.* 36, 585–596. doi: 10.1002/jcc.23832

Fogolari, F., Corazza, A., Fortuna, S., Soler, M. A., VanSchouwen, B., Brancolini, G., et al. (2015b). Distance-based configurational entropy of proteins from molecular dynamics simulations. *PLoS ONE* 10:e0132356. doi: 10.1371/journal.pone.0132356

Fogolari, F., Dongmo Foumthuim, C. J., Fortuna, S., Soler, M. A., Corazza, A., and Esposito, G. (2016). Accurate estimation of the entropy of rotationtranslation probability distributions. *J. Chem. Theory Comput.* 12, 1–8. doi: 10.1021/acs.jctc.5b00731

Gilson, M. K., Given, J. A., Bush, B. L., and McCammon, J. A. (1997). The statistical-thermodynamic basis for computation of binding affinities: a critical review. *Biophys J.* 72, 1047–1069. doi: 10.1016/S0006-3495(97)78756-3

Go, N., and Scheraga, H. A. (1976). On the use of classical statistical mechanics in the treatment of polymer chain conformation. *Macromolecules* 9, 535–542. doi: 10.1021/ma60052a001

Hnizdo, V., Darian, E., Fedorowicz, A., Demchuk, E., Li, S., and Singh, H. (2007). Nearest-neighbor nonparametric method for estimating the configurational entropy of complex molecules. *J. Comput. Chem.* 28, 655–668. doi: 10.1002/jcc.20589

Hnizdo, V., Fedorowicz, A., Singh, H., and Demchuk, E. (2003). Statistical thermodynamics of internal rotation in a hindering potential of mean force obtained from computer simulations. *J. Comput. Chem.* 24, 1172–1183. doi: 10.1002/jcc.10289

Hnizdo, V., Tan, J., Killian, B. J., and Gilson, M. K. (2008). Efficient calculation of configurational entropy from molecular simulations by combining the mutual-information expansion and nearest-neighbor methods. *J. Comput. Chem.* 29, 1605–1614. doi: 10.1002/jcc.20919

Huggins, D. J. (2014). Quantifying the entropy of binding for water molecules in protein cavities by computing correlations. *Biophys. J.* 108, 928–936. doi: 10.1016/j.bpj.2014.12.035

Karplus, M., Ichiye, T., and Pettitt, B. M. (1987). Configurational entropy of native proteins. *Biophys. J.* 52, 1083–1085. doi: 10.1016/S0006-3495(87)83303-9

Killian, B. J., Yundenfreund Kravitz, J., and Gilson, M. K. (2007). Extraction of configurational entropy from molecular simulations via an expansion approximation. *J. Chem. Phys.* 127:024107. doi: 10.1063/1.2746329

King, B. M., Silver, N. W., and Tidor, B. (2012). Efficient calculation of molecular configurational entropies using an information theoretic approximation. *J. Phys. Chem. B* 116, 2891–2904. doi: 10.1021/jp2068123

King, B. M., and Tidor, B. (2009). MIST: maximum information spanning trees for dimension reduction of biological data sets. *Bioinformatics* 25, 1165–1172. doi: 10.1093/bioinformatics/btp109

Kollman, P., Massova, I., Reyes, C., Kuhn, B., Huo, S., Chong, L., et al. (2000). Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. *Acc. Chem. Res.* 33, 889–897. doi: 10.1021/ar000033j

Laio, A., and Parrinello, M. (2002). Escaping free energy minima. *Proc. Natl. Acad. Sci. U.S.A.* 99, 12562–12566. doi: 10.1073/pnas.202427399

Mongan, J., Simmerling, C., McCammon, J. A., Case, D. A., and Onufriev, A. (2007). Generalized Born model with a simple robust molecular volume correction. *J. Chem. Theory Comp.* 3, 156–169. doi: 10.1021/ct600085e

Mukherjee, A. (2011). Entropy balance in the intercalation process of an anti-cancer drug daunomycin. *J. Phys. Chem. Lett.* 2, 3021–3026. doi: 10.1021/jz2013566

Nguyen, H., Maier, J., Huang, H., Perrone, V., and Simmerling, C. (2014). Folding simulations for proteins with diverse topologies are accessible in days with a single physics-based force field and implicit solvent. *J. Am. Chem. Soc.* 136, 13959–13962. doi: 10.1021/ja5032776

Nicholls, A., Sharp, K. A., and Honig, B. (1991). Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. *Proteins Struct. Funct. Genet.* 11, 281–296. doi: 10.1002/prot.340110407

Numata, J., Wan, M., and Knapp, E. W. (2007). Conformational entropy of biomolecules: beyond the quasi-harmonic approximation. *Genome Inform.* 18, 192–205. doi: 10.1142/9781860949920_0019

Onufriev, A., Bashford, D., and Case, D. A. (2004). Exploring protein native states and large-scale conformational change s with a modified generalized Born model. *Proteins Struct. Func. Gen.* 55, 383–394. doi: 10.1002/prot.20033

Polyansky, A. A., Zubac, R., and Zagrovic, B. (2012). Estimation of conformational entropy in protein-ligand interactions: a computational perspective. *Methods Mol. Biol.* 819, 327–353. doi: 10.1007/978-1-61779-465-0_21

Roux, B., and Simonson, T. (1999). Implicit solvent models. *Biophys. Chem.* 78, 1–20. doi: 10.1016/S0301-4622(98)00226-9

Singh, H., Misra, N., Hnizdo, V., Fedorowicz, A., and Demchuk, E. (2003). Nearest neighbours estimates of entropy. *J. Math. Manag. Sci.* 23, 301–321. doi: 10.1080/01966324.2003.10737616

Still, W. C., Tempczyk, A., Hawley, R. C., and Hendrickson, T. (1990). Semianalytical treatment of solvation for molecular mechanics and dynamics. *J. Am. Chem. Soc.* 112, 6127–6129. doi: 10.1021/ja00172a038

Straatsma, T. P., and McCammon, J. A. (1991). Multiconfiguration thermodynamic integration. *J. Chem. Phys.* 95, 1175–1188. doi: 10.1063/1.461148

Swanson, J. M. J., Adcock, S. A., and McCammon, J. A. (2005). Optimized radii for Poisson–Boltzmann calculations with the AMBER force field. *J. Chem. Theory Comp.* 1, 484–493. doi: 10.1021/ct049834o

Swanson, J. M. J., Wagoner, J. A., Baker, N. A., and McCammon, J. A. (2007). Optimizing the Poisson dielectric boundary with explicit solvent forces and energies: lessons learned with atom-centered dielectric functions. *J. Chem. Theory Comp.* 3, 170–183. doi: 10.1021/ct600216k

Torrie, G. M., and Valleau, J. P. (1977). Nonphyisical sampling distributions in Monte Carlo free-energy estimation: umbrella sampling. *J. Comp. Phys.* 23, 187–199. doi: 10.1016/0021-9991(77)90121-8

Wang, L., Abel, R., Friesner, R. A., and Berne, B. J. (2009). Thermodynamic properties of liquid water: an application of a nonparametric approach to computing the entropy of a neat fluid. *J. Chem. Theory Comput.* 5, 1462–1473. doi: 10.1021/ct900078k

Wereszczynski, J., and McCammon, J. A. (2012). Statistical mechanics and molecular dynamics in evaluating thermodynamic properties of biomolecular recognition. *Q. Rev. Biophys.* 45, 1–25. doi: 10.1017/S0033583511000096

Keywords: free energy, enthalpy, entropy, molecular dynamics simulations, implicit solvent, continuum solvent, MM/GBSA

Citation: Fogolari F, Corazza A and Esposito G (2018) Free Energy, Enthalpy and Entropy from Implicit Solvent End-Point Simulations. *Front. Mol. Biosci.* 5:11. doi: 10.3389/fmolb.2018.00011

Received: 26 October 2017; Accepted: 25 January 2018;

Published: 08 February 2018.

Edited by:

Paolo De Los Rios, École Polytechnique Fédérale de Lausanne, SwitzerlandReviewed by:

Alfonso De Simone, Imperial College London, United KingdomPatrick Senet, Université de Bourgogne, France

Copyright © 2018 Fogolari, Corazza and Esposito. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Federico Fogolari, federico.fogolari@uniud.it