Free Energy, Enthalpy and Entropy from Implicit Solvent End-Point Simulations

Free energy is the key quantity to describe the thermodynamics of biological systems. In this perspective we consider the calculation of free energy, enthalpy and entropy from end-point molecular dynamics simulations. Since the enthalpy may be calculated as the ensemble average over equilibrated simulation snapshots the difficulties related to free energy calculation are ultimately related to the calculation of the entropy of the system and in particular of the solvent entropy. In the last two decades implicit solvent models have been used to circumvent the problem and to take into account solvent entropy implicitly in the solvation terms. More recently outstanding advancement in both implicit solvent models and in entropy calculations are making the goal of free energy estimation from end-point simulations more feasible than ever before. We review briefly the basic theory and discuss the advancements in light of practical applications.


INTRODUCTION
Free energy is the key quantity in the description of the thermodynamics of biological systems, and therefore an important objective of biomolecular simulations. In spite of its relevance, free energy calculations are not performed for every simulation performed although most often concepts like (local) stability are used to describe the results of the simulation suggesting a thermodynamic description.
The difficulties arise obviously from the calculation of the entropy of the system which cannot be immediately computed as an ensemble average over simulation snapshots, contrary to the enthalpy. The free energy G can be formally related to an ensemble average: where R is the gas constant, T the temperature, β is equal to 1 RT , U is the potential energy and indicates thermodynamic ensemble average. However, as pointed out many years ago (Beveridge and DiCapua, 1989), this formula is of no practical use unless all configurational space enters (implicitly or explicitly) the average, because lowest probability configurations display the highest value contribution to the average and there cannot be convergence for upper unbound potentials. In molecular simulations typically only near-equilibrium configurations are sampled.
For this reason free energy differences calculation is performed using methods, like umbrella sampling (Torrie and Valleau, 1977), thermodynamic integration (Straatsma and McCammon, 1991) or metadynamics (Laio and Parrinello, 2002), that compute by various methods the free energy along a pathway connecting end-point states, even if it is often just the difference in free energy between the latter points that is actually needed.
If the two end-points are close enough a simulation of one or both end-points may be performed and free energy differences obtained by free energy perturbation (Zwanzig, 1954), but this is not possible in general.
The problem with the computation of the entropy from a single MD simulation is that only part of the configurational space is accessed by simulation.
In this short perspective we remark that the long sought calculation of free energies from end point simulations may be afforded with reasonable accuracy from implicit solvent endpoint simulations. In particular, some recent developments in the way entropy is calculated in fact allow to compute free energy from a single simulation, so it is worth to review briefly the theory here and to provide simple formulae to compute free energy, entropy and enthalpy from implicit solvent simulations.

THEORY
We consider here simulations sampling different states of a system A in solution. This can be done by different simulations if the states are kinetically well separated, or by post-processing a single simulation to divide microstates belonging to different states.
If the simulation is extensive enough most probable microstates are sampled and representative thermodynamic ensembles are generated for the different states which include samples of all microstates whose probability density is higher than a threshold.
Enthalpy can be computed as the energy ensemble average, whereas entropy is problematic, in particular for what concerns solvent degrees of freedom. Although solvent entropy has been taken into account in some studies (e.g., De Simone et al., 2005) the extensive correlation among solvent molecules makes solvent entropy estimation a difficult task. In order to get rid of this problem we treat the system using an implicit solvent model.

Implicit Solvent Models
Following the excellent reviews available (Gilson et al., 1997;Roux and Simonson, 1999;Wereszczynski and McCammon, 2012) we write the standard chemical potential, or molar free energy, in the following form: where r A and r S are the solute and solvent coordinates. C includes the integrals over the reference state (1M, random orientation) and the momentum integrals (independent of conformation) that cancel when comparing different states of the same system and the term P 0V A , with P 0 the standard pressure andV A the partial volume of the solute, whose dependence on conformation provides negligible effects at standard pressure. For a derivation see Gilson et al. (1997). For this reason in the following C will not be considered further.
Due to the difficulties in estimating the entropy of highly correlated solvent molecules the solvation potential of mean force ( W( r A , T)) is defined by integrating out the solvent degrees of freedom.
(3) where U AS denotes the energy terms that couple the solute and the solvent and U S is the solvent energy.
With this definition and the assumptions made above we have: where the dependence on the temperature of the solvation energy W has been made explicit. Implicit solvent models provide a functional form and parameters for W( r A , T).
Although the implicit solvent model based on the Poisson-Boltzmann equation (Fogolari et al., 2002) and the solvent accessible surface area (PBSA) could be used in molecular dynamics simulations (Fogolari et al., 2003), the method based on the Generalized Born model and solvent accessible surface area (GBSA) is the method of choice for its computational efficiency (Still et al., 1990;Onufriev et al., 2004). Limitations of both methods include treatment of small crevices and cavities, where water could not display bulk solvation properties, and neglection of curvature dependence of surface tension coefficient (Nicholls et al., 1991). For GBSA an additional limitation could be due to the dependence of empirical parameters on molecular shape (see e.g., Fogolari et al., 2015a). Compared to approaches simulating the solutes in explicit solvent and post-processing the trajectory in implicit solvent (Kollman et al., 2000), methods that generate the configurational ensemble using the same implicit solvent used for energy calculations are more consistent because they don't suffer from possible mismatches between the implicit and explicit solvent models.
In recent years some of the available implicit solvent models (including forcefield and set of parameters used (Swanson et al., 2005(Swanson et al., , 2007) have been shown to be extremely accurate in the treatment of protein thermodynamics, as demonstrated by the study by Simmerling and coworkers, where 16 out of 17 proteins could be correctly folded using the GB-Neck2 implicit solvent model and for 14 of them the native fold was preferred over the misfolded one (Mongan et al., 2007;Nguyen et al., 2014).

Enthalpy in Implicit Solvent Models
The molar enthalpy is obtained as: When the derivation is performed, taking into account that also W( r A , T) depends on the temperature, we obtain: where the symbol <> indicates the ensemble average of the quantity within brackets, i.e., the average over the simulation of the same quantity.
The above equation shows that the enthalpy, which is obtained as the energy ensemble average in MD simulations, is directly related to the ensemble average of the implicit solvent model potential energy. The difference is expressed by the −T < ∂ W ∂T > term which may however be obtained by explicit derivation with respect to the temperature of the solvation energy terms in the implicit solvation model. In the GBSA model the solvation energy is the sum of an electrostatic ( W el ) and a surface tension ( W SA ) term: where ǫ(T) and ǫ in are the solvent and molecular (typically 1) dielectric constants, respectively, q i is the charge of the i th atom, f ij a pairwise function depending on all atoms' coordinates, γ (T) is a surface tension and A is the solvent-accessible surface area. The dependence on temperature of the solvent dielectric constant and of the surface tension has been made explicit in the above equation. The contributions to the derivative of W with respect to temperature are easily obtained from the electrostatic and surface tension solvation energy terms, when the temperature dependence of the parameters of the implicit solvent model is known:

Entropy in Implicit Solvent Models
The entropy is linked to the configurational solute probability distribution. First consider that the probability density distribution in the implicit solvent model is: When the derivation of the free energy is performed the entropy can be written as: Rewriting the entropy in terms of probability density allows to rewrite the free energy in turn as the sum of an ensemble average and a configurational entropy term, which is itself an ensemble average: As said above the difficulty in estimating free energies resides in the entropy estimation which requires consideration of both sampled and non-sampled configurational space. Similarly, to estimate ρ( r A , T), consideration of non-sampled configurational space is needed. Implicit solvent models circumvent the problem treating solvent entropy implicitly, through the parameters and their temperature dependence.
In the above equation the configurational entropy is written formally as an ensemble average of the probability density. In this respect a convenient description of systems containing proteins is the bond, angle, torsion (BAT) representation (Go and Scheraga, 1976), because bonds, and to some extent also angles, contribute very little to changes in entropy for different states of proteins (e.g., notably, bound and non-bound) (Karplus et al., 1987). For this reason entropy is estimated often considering to a first approximation only torsional degrees of freedom and possibly external rotation and translation degrees of freedom.
Other descriptions, e.g., in cartesian coordinates, have been used and the entropy has been calculated assuming the system is moving harmonically or anharmonically about the energy minimum (see recent reviews Polyansky et al., 2012;Wereszczynski and McCammon, 2012). This approximation may however be poor for loops or other unrestrained parts of the molecules.
A particularly attractive method to compute entropy is the approach proposed by Singh et al. (2003) which was further developed by the same authors and others (Hnizdo et al., , 2007(Hnizdo et al., , 2008Darian et al., 2005;Numata et al., 2007;Wang et al., 2009;Mukherjee, 2011;Fenley et al., 2014;Huggins, 2014;Fogolari et al., 2015bFogolari et al., , 2016. In practice the probability density ρ(x 1 , x 2 , ..., x s ) may be estimated considering a ball of radius r i around each configurational sample x i up to the k-th nearest neighbor (Figure 1), then the local probability density (ρ( x i )) is obtained by dividing the number k of the n samples which are found inside the ball over the volume of the ball V i and n, i.e.,: The idea is very simple and can be made rigorous . When equation 14, or its exact form, is substituted in equation 13 the free energy may be estimated easily from configurational samples. The discrepancy between naive and exact treatment is limited. Note that the volume around each configurational sample is tuned to match the local density and there is no need consider regions where no configurational sample is found. Obviously the dimensionality of a protein system is large and it is not possible in practice to consider a probability density without actually assuming that most degrees of freedom are decoupled from each other. The groups of Gilson (Killian et al., 2007) and Tidor (King and Tidor, 2009;King et al., 2012) have proposed treatments of mutual information that provide a practical estimate of global entropy (actually, an upper bound) considering only single degrees of freedom and pairwise mutual information. Depending on the number of samples the approach can be easily extended to higher orders of correlation.

CONCLUSION
We have recapitulated above the fundamentals of implicit solvent free energy calculations from end-point simulations, recalling the relationship between explicit and implicit solvent models, and showing how entropy and enthalpy can be obtained from implicit solvent simulations.
From a practical point of view we can estimate free energy from implicit solvent simulations using equation 13: The ensemble average of U + W is provided by the potential energy in the implicit solvent model used, whereas the entropy −R < log(ρ( r A , T)) > must be computed from the conformational ensemble. The available implicit solvent models and parameters, together with forcefield parameters, have been shown to be accurate enough to reproduce complex phenomena like protein folding giving confidence in the accuracy of the implicit solvent potential of mean force. The average of the potential energy over an implicit solvent molecular dynamics trajectory thus provides the first term in the above equation.
The entropic term may be estimated using the nearest neighbor method which is emerging as an accurate entropy estimator, with many advantages over more traditional methods, including lack of hypothesis on non-sampled conformational space, no need to consider explicitly non-sampled conformational space and sound theoretical basis.
To make the application of the method straightforward we have implemented the nearest-neighbor method in two programs, PDB2ENTROPY and PDB2TRENT, available through the git-hub repository (URL: https://github.com/federicofogolari) which allow to compute conformational and rotationaltranslational entropy directly from the conformational ensemble in PDB format.
In summary, the recent advancements in solvation models and entropy calculation, based on the nearest-neighbor method, are making computation of free-energy from end-point simulations significantly more accurate than before, with many possible applications in the next future.

AUTHOR CONTRIBUTIONS
All authors listed, have made substantial, direct and intellectual contribution to the work, and approved it for publication.