dMM-PBSA: A New HADDOCK Scoring Function for Protein-Peptide Docking

Molecular-docking programs coupled with suitable scoring functions are now established and very useful tools enabling computational chemists to rapidly screen large chemical databases and thereby to identify promising candidate compounds for further experimental processing. In a broader scenario, predicting binding affinity is one of the most critical and challenging components of computer-aided structure-based drug design. The development of a molecular docking scoring function which in principle could combine both features, namely ranking putative poses and predicting complex affinity, would be of paramount importance. Here, we systematically investigated the performance of the MM-PBSA approach, using two different Poisson–Boltzmann solvers (APBS and DelPhi), in the currently rising field of protein-peptide interactions (PPIs), identifying the correct binding conformations of 19 different protein-peptide complexes and predicting their binding free energies. First, we scored the decoy structures from HADDOCK calculation via the MM-PBSA approach in order to assess the capability of retrieving near-native poses in the best-scoring clusters and of evaluating the corresponding free energies of binding. MM-PBSA behaves well in finding the poses corresponding to the lowest binding free energy, however the built-in HADDOCK score shows a better performance. In order to improve the MM-PBSA-based scoring function, we dampened the MM-PBSA solvation and coulombic terms by 0.2, as proposed in the HADDOCK score and LIE approaches. The new dampened MM-PBSA (dMM-PBSA) outperforms the original MM-PBSA and ranks the decoys structures as the HADDOCK score does. Second, we found a good correlation between the dMM-PBSA and HADDOCK scores for the near-native clusters of each system and the experimental binding energies, respectively. Therefore, we propose a new scoring function, dMM-PBSA, to be used together with the built-in HADDOCK score in the context of protein-peptide docking simulations.


INTRODUCTION
Molecular docking is a computational method that investigates the intermolecular complexes formed between two or more constituent molecules. It comprises the process of generating a model of a complex based on the known three-dimensional structures of its components, i.e., the target (protein, or nucleic acids) and the ligand (a peptide, a protein, a small organic molecule), free or bound to other species (Rognan, 2013). The docking procedure consists in the search for near-native ligand conformations and orientations (usually referred to as docking poses) with respect to a target protein, where the structure of the latter is known or modeled. Fast approximate mathematical expressions (so called scoring functions) are used to rank the docking poses based on estimates of the goodness of the conformations obtained for each putative binder and of the binding affinity estimate of the two interacting partners. Pioneered during the early 1980s (Kuntz et al., 1982), molecular docking is still a field of intensive research, as it represents a fundamental component in many drug discovery programs (Meng et al., 2011) and a primary tool for the virtual screening of large chemical libraries (Kitchen et al., 2004). The typical system considered in docking calculations includes the ligand, the receptor, and the solvent molecules. Because of the enormous number of degrees of freedom associated with the solvent, it is usually neglected in the calculations, or implicitly accounted for in the scoring functions. Despite some valuable improvements in the accuracy and efficiency of the molecular docking algorithms, there are still considerable drawbacks and limitations to face. Among these, the reliability of the scoring functions is probably one of the aspects deserving more attention, since discriminating native pose and obtaining a fair correlation between docking scores and experimental activity data remain difficult tasks. These limitations are responsible for the occurrence of false-positive and false-negative hits in the ranked lists resulting from the screenings performed with standard docking methods. Over the years, since the pioneering work of Kuntz et al. (1982), several scoring functions have been developed (Gilson and Zhou, 2007;Huang et al., 2010;Sarti et al., 2013), based on several terms. Despite empirical scoring functions are still widely used in drug discovery since they are faster and relatively accurate, first-principle methods for ranking decoy structures and for predicting affinity should be considered the first desirable choice in docking scoring stage.
Hence, it is a general opinion that molecular docking results may benefit from post-processing with more accurate tools, able to provide higher accuracy in energy scoring of the putative docked poses. Among several docking approaches, HADDOCK is one of the few computational docking programs that follow a data-driven strategy, using experimental data (generated either via NMR experiments, mutagenesis, or mass spectrometry) as pivotal information to generate docking poses (Dominguez et al., 2003). Moreover, the program allows the receptor to undergo small conformational changes upon association with the ligand, a feature that has been deemed as crucial in simulating the binding process (Spiliotopoulos and Caflisch, 2014). HADDOCK has been applied successfully to a plethora of biomolecular systems (Dominguez et al., 2003). Its reliability is highlighted by the excellent evaluations in the CAPRI experiments (van Dijk et al., 2005) and by the fact that more than 60 structures solved via HADDOCK docking have been deposited in the Protein Data Bank (Berman et al., 2000). Moreover, continuous efforts are devoted to integrate HADDOCK with experimental methodologies (Hennig et al., 2012) and other computational techniques (Kastritis et al., 2014) in order to improve its builtin scoring function. Among several other scoring approaches, Molecular Mechanics Poisson-Boltzmann Surface Area, MM-PBSA, is routinely used to evaluate the strength of the complex formation between protein and ligands. MM-PBSA represents a good trade-off between calculation efficiency and accuracy in binding energy calculations and it has been profitably exploited in virtual design since it allows a ranking in different docking runs and between different ligands (Graves et al., 2008;Venken et al., 2011;Yang et al., 2011;Barakat et al., 2012;Genheden and Ryde, 2015). Although MM-PBSA is one of the most used approximate methods for the estimate of binding free energies, it also presents weaknesses that should not be overlooked. In particular, a source of error can be represented by the entropy contribution, which is often neglected when relative binding free energies of similar molecules are computed. Furthermore, the quality of results depends on different computational factors, including the conformational sampled space, the force field, internal dielectric constant, and the set of atomic radii (Weis et al., 2006). Additional MM-PBSA limitations in the estimation of binding free energies are for highly polar molecules such as DNA and RNA (Kongsted et al., 2009), buried ligands (Singh and Warshel, 2010), and in presence of explicit water molecules that might contribute to the binding free energy (Homeyer and Gohlke, 2012). Therefore, several attempts have been made to improve accuracy and predictivity of the MM-PBSA method acting on the solvation term, including polar and non-polar terms. Expedients such as using different PB solvers (Feig et al., 2004), tuning the grid mesh (Harris et al., 2013), and/or the internal dielectric constant (Singh and Warshel, 2010;Hou et al., 2011;Genheden and Ryde, 2015), including crystallographic and/or specific water molecules (Treesuwan and Hannongbua, 2009;Liu et al., 2013;Maffucci and Contini, 2013), have allowed to successfully use MM-PBSA in the binding free energy calculations (Wang et al., 2001). However, there is still room for improvement to make MM-PBSA more efficient and reliable in the binding free energy calculations in different respects.
Here we focus our investigation on the field of proteinpeptide interactions (PPIs), which are gaining large interest in the biological and pharmaceutical research (Scott et al., 2016). In fact, the inhibition of PPIs is of paramount importance in drug discovery and development. The main problem in the PPIs simulation is that the protein-peptide interface is large, shallow, and involving several contacts characterized by being weak, transient and non-specific. Therefore, not all the PPI interface contributes equally to the strength of the binding between the partners. PPIs are rather mediated by hot spots, small regions that give the largest contribution to the binding.
In the present study, we investigated the effectiveness of MM-PBSA on evaluating protein-peptide docked complexes using HADDOCK software in order to consider the possibility of exploiting this relatively fast approach as an additional scoring function. We show in the results section the performance of MM-PBSA as a scoring function for the 19 systems (Results-Section MM-PBSA As Scoring Function for Protein-Peptide Docking) and as binding affinity predictor for the systems where reliable experimental binding affinity data were available (Results-Section Correlation between Experimental Binding Free Energies and Scores).

RESULTS AND DISCUSSION
MM-PBSA is an end-point method devised to estimate binding free energy ( G comp ) as the difference of the free energy of the complex and those of the unbound receptor and peptide (Massova and Kollman, 1999). Normally, it is performed from a set of snapshots obtained from Molecular Dynamics simulation (Hou et al., 2011). This method is significantly less computationally demanding than alternatives such as free energy perturbation (FEP) calculations and therefore it represents a possible alternative to FEP for virtual screening of large chemical libraries. It relies on the use of implicit solvent (for the PB part) and it requires energy calculations only on the endpoint (bound/unbound) states whereas other approaches require energy calculation along a reaction coordinate. MM-PBSA has already been used as a scoring function in the past with various outcomes (Kuhn et al., 2005;Thompson et al., 2008;Zhou et al., 2009;Genheden and Ryde, 2015) but to the best of our knowledge this is the first time it has been used for a set of PPIs obtained from docking calculations. Previous studies have shown that MM-PBSA is efficient to identify the correct binding poses and rank small molecules for a specific target (Thompson et al., 2008;Hou et al., 2011;Zhu et al., 2013). However, there is no systematic evaluation of the performance of MM-PBSA in identifying the correct docking poses in the protein-peptide context. As mentioned, the free energy of binding was calculated as the difference in free energy between the product state and the reactants state, that is, between the energy of the protein-peptide complex and the sum of the energies of the protein and the ligand in their unbound forms.
We investigated 19 protein-peptide complexes for which structural and thermodynamic data (binding free energy values G bind ) were available ( Table 1). The final MM-PBSA values are calculated as the sum of two molecular mechanics terms (namely Coulomb and Lennard-Jones), which are calculated by HADDOCK, and two solvation terms, including polar and non-polar solvation contributions, which here were calculated using two different Poisson-Boltzmann equation solvers, APBS (Baker et al., 2001) and DelPhi (Rocchia et al., 2001(Rocchia et al., , 2002 in combination with the NanoShaper program (Decherchi and Rocchia, 2013). We decide to use two different solvers to minimize the MM-PBSA aforementioned weakness and interestingly the binding free energy values from the two solvers were in good agreement in all case studies, indicating that the consistency of the approach. ( Figure S1).
In our calculations, we observed that the computed binding free energies were larger than those obtained by experiments ( Table 2), an overestimation that has been already observed in other systems. This behavior has been often ascribed to the omission of the entropic contribution, which is an approximation typical of these calculations (Gilson and Zhou, 2007;Spiliotopoulos et al., 2012).
The constituents of each complex (i.e., protein and peptide) were separated and re-docked using HADDOCK. The HADDOCK scores and MM-PBSA binding free energies corresponding to 200 poses were calculated in each system with the HADDOCK's clustering-based approach. Similarly to HADDOCK score calculation, where coulombic interactions are scaled to a fifth, we also dampened MM-PBSA, i.e., the MM-PBSA energies were calculated multiplying both coulombic and polar solvation terms by 0.2. We will therefore compare three different scoring functions, including HADDOCK built-in, MM-PBSA, and dampened MM-PBSA, which we call dMM-PBSA. The two following sections show the performances of each scoring function in discriminating between the correctly and incorrectly docked peptide poses (Section MM-PBSA As Scoring Function for Protein-Peptide Docking) and in correlating with the experimental G bind through the whole dataset (Section Correlation between Experimental Binding Free Energies and Scores).

MM-PBSA As Scoring Function for Protein-Peptide Docking
We sought to determine the correlation between the results of the identified scoring methods and the i-RMSD (interface RMSD, see Section Materials and Methods for details) values. The redocked structures were clustered using the HADDOCK protocol based on i-RMSD values (de Vries et al., 2010). In the Poisson-Boltzmann equation (polar term in MM-PBSA), calculations were performed using ε equal to 2 and 80 for the solute and solvent, respectively. We then calculated the probability to find at least one near-native structure (i.e., displaying an i-RMSD lower than 2 Å) among the N top-ranking of the best 4 poses in each cluster (clusters BEST4 ) according to HADDOCK, MM-PBSA, or dampened MM-PBSA (dMM-PBSA), respectively. The percentage of systems with near-native pose vs. the number of clusters for each scoring function is shown in Figure 1. Overall, we observe that HADDOCK score is a valid scoring function by which the near-native pose is ranked within the first cluster in 8 out of 19 systems (about 40%) and within the top 3 clusters in 12 out of 19 (about 63%). MM-PBSA has a somehow worse performance, ranking the near-native pose within the first and the second cluster in 7 out of 19 (about 37%) and 8 out of 19 (about 43%) in the top 3 clusters. The modulation of the polar terms resulting from the MM-PBSA calculation proved able to improve the MM-PBSA performance. In fact, dMM-PBSA reaches similar performance to HADDOCK score in ranking the near-native pose in the first cluster and in the top 3 clusters (11 out of 19, about 58%).

Correlation between Experimental Binding Free Energies and Scores
A further question of interest is whether scoring functions can reliably predict binding affinities when carried out on multiple structures. To address this question, we correlated the HADDOCK scores, MM-PBSA, and dMM-PBSA values of the   We wondered then whether the scores could be exploited to correctly predict the G comp of a new set of protein-peptides. Therefore, we performed a docking with HADDOCK for two additional protein-peptide systems: 1. AIRE-PHD1 complexed with the H3 histone peptide, both NMR structure (PDB 2KE1) and G bind value available (Chignola et al., 2009). 2. NPHP1-SH3 domain in complex with a polyproline peptide, only G bind available (Wodarczyk et al., 2010).
The latter system represents a real blind case study of PPIs since no experimental structural information was available for this system. Binding free energy obtained according to the MM-PBSA approach on the putative pose belonging to the top-ranking cluster BEST4 according to the HADDOCK score has been carried out and the calculated value correlated with experimental values (Figure 2, black star). The G comp of the AIRE and NPHP1-SH3 complexes lies close to the previously calculated regression line, suggesting that the scores can be reliably used to predict the correct pose of PPIs systems. Breakdown of the binding free energy into its components, including van der Waals, electrostatic, polar solvation, and nonpolar solvation interaction energy terms, identified the factors dominating binding affinity for the whole dataset. We analyzed the correlation between either the Lennard-Jones terms (vdW) or the buried surface area (BSA) of the same cluster BEST4 and the experimental binding free energies. Data for vdW and BSA vs. experimental binding free energies along with i-RMSD values are plotted in Figure 3. The van der Waals and BSA terms are fairly correlated with the experimental binding free energies (R = 0.53, p = 0.02, Figure 3, upper panel, and R = −0.58, p = 0.009, Figure 3, lower panel, respectively). The anticorrelation between experimental G bind values and BSA relies on the fact that larger BSA allows broader interactions between protein and peptide partners. In contrast, no correlation has been found between experimental data and the polar terms.
One of the main advantages to use the MM-PBSA approach in the docking scoring is the ability to quickly perform the computational alanine scanning (CAS) on the best pose in order to evaluate the energetic contribution to the binding affinity of individual residues. Therefore, we carried out a small study of CAS on the AIRE-PHD1 system (five mutants) using the data published in Spiliotopoulos et al. (2012) and we decided to be in the best scenario possible, e.g., we used the NMR complex (Chignola et al., 2009) as starting structure for the CAS calculation. We also evaluated the HADDOCK score of each mutated AIRE-PHD1 complex by only performing the water refinement [i.e., by which the rigid body stage (it0) and the flexible refinement (it1) are turned off]. In agreement with the published data (Spiliotopoulos et al., 2012) we found a good correlation between both MM-PBSA (R = 0.88, p < 0.02) and dMM-PBSA (R = 0.86, p < 0.03) and the experimental data ( Figure S3 middle and bottom), whereas HADDOCK score showed lower correlation with the experimental data (R = 0.60, p < 0.21) (Figure S3 top).

CONCLUSIONS
In this work, we made use of the MM-PBSA technique in docking scoring and in affinity prediction of protein-peptide complexes. We also compared the results with the HADDOCK built-in scoring function. Overall, HADDOCK and dMM-PBSA, a dampened MM-PBSA version, behaved similarly in ranking the near-native poses in the top 3 clusters, improving over the standard MM-PBSA version. The introduction of weights for the different MM-PBSA terms is not unprecedented in the literature (Zhou et al., 2009), but this approach has never been applied to PPIs. Notably, despite the fact that different experimental conditions (where the main difference regarded the type of buffer and the ionic strength, whereas as both pH and temperature were comparable) and techniques ( Table 1) were used to determine the dissociation constants, we observed a good correlation between experimental and computational G of binding using HADDOCK and dMM-PBSA scoring functions, 0.63 and 0.66 respectively. Interestingly, lack of modulation of the Our findings were then validated on two additional systems, one with known structure and binding affinity and one for which only the G bind has been reported. Both HADDOCK and dMM-PBSA methods perform remarkably well in ranking the two additional protein-peptide complexes, and lead to good correlation with experimentally measured G bind .
In order to assess the presence of possible systematic errors in the binding free energy calculations, we used two different PB solvers, namely APBS (Baker et al., 2001) and DelPhi (Rocchia et al., 2001(Rocchia et al., , 2002. No differences in term of correlation with experimental data were found using the two different solvers, except that in absolute G comp values ( Figure S1) and in the calculation speed (see Section Materials and Methods). APBS provided a G comp larger than DelPhi. This could arise from different aspects, including the different approaches used to describe the dielectric interface, the approach used to estimate the reaction field energy, and/or how the different solvers treat cavities that are internal to the solute. The accuracy of the PB equation solution has been reported to be sensitive to the grid size, in favor of smaller grid spacing (Sørensen et al., 2015) However, decreasing the grid spacing increases the computational resources needed to perform the calculation, both in terms of physical memory, and the computational time required. We choose a grid size of 0.5 Å for both solvers since it represents a good trade-off between speed and accuracy. With lower grid resolution, APBS would have encountered problems in convergence in G comp calculation (see Figure 3.2 in Sørensen et al., 2015). Finally, the choice of the interior dielectric (ε int ) value in the PB calculation is not trivial since in the literature its value can be found spanning between 1 and 20. Higher ε int (4-20) aims to effectively mimic polarization and local rearrangement effects, as well as transient penetration of water molecules into the solute interior. They should in principle be preferred when the PB calculation is performed on individual structures. On the other hand, lower ε int (2-3) is preferred in order to mainly account for electronic polarization and it is commonly used on ensembles of structures, which explicitly account for conformational flexibility. In light of this, there is still no consensus on the most appropriate ε int value. We decided to use ε int = 2 in all cases, while being aware that scaling the polar contribution in dMM-PBSA is similar to considering a dielectric screening of the solute medium, accounting for polarization and rearrangement due to the reaction to the existing fields ( Figure S2). This is also in agreement with previous studies where the rank-ordering performance for MM-PBSA improves with increasing dielectric constant (Wang et al., 2013). Simultaneously, this supports the relative importance of the non-polar components and results in better performance. In fact, the van der Waals and BSA terms, which are directly related to the MM-PBSA non-polar terms, correlated well with the experimental data and they provided the main contribution to the final score. These factors might explain the better performance of the dampened MM-PBSA. Finally, we analyzed the calculated data from HADDOCK and MM-PBSA in order to evaluate the reproducibility of the results in terms of scoring and final correlation with the experimental data. First of all, for a given complex, with the corresponding MM energy terms provided by HADDOCK, the only variations in the final results could arise from the PB calculations since it depends on many parameters, including the grid spacing, the atomic radii, and the dielectric interior. For this reason, we used two different PB solvers, APBS and DelPhi, using identical parameters. In Figures S1, S2 it is shown the good agreement in the MM-PBSA and dMM-PBSA calculations with the two PB solvers, indicating that the calculations are quite robust. Second, we calculated the correlations between the experimental and calculated binding free energy using different MM-PBSA values from the cluster BEST4 and not the average value. The final correlations R were in a range of 0.39-0.49 for MM-PBSA and 0.60-0.67 for dMM-PBSA, indicating that in principle the MM-PBSA could be performed on a single pose with less computational effort. Finally, the calculated standard error for each cluster BEST4 reported in Table 2 represent the MM-PBSA limit when we try to compare the binding affinities of different complexes, mainly when dealing with docking since the conformational sampling is poor with respect to performing MM-PBSA from molecular dynamics simulations (Spiliotopoulos et al., 2012). Nevertheless, in case of AIRE-PHD1 mutants, the MM-PBSA and dMM-PBSA uncertainty over the NMR structure bundle (20 structures) reduces, leading to an average standard error of 34 and 7 kJ/mol, respectively, indicating that dMM-PBSA is fairly reliable in predicting proteinpeptide interface alanine mutations. Moreover, the CAS dMM-PBSA of AIRE-PHD1 error range values are fairly in agreement with the CAS MM-PBSA error range (ca. 4 kJ/mol) shown in Spiliotopoulos et al., in which the sampling was carried out by molecular dynamics simulations (Spiliotopoulos et al., 2012). This result indicates that dMM-PBSA carried out on a small number of structures (e.g., 20) behaves similarly to MM-PBSA from molecular dynamics, in which the sampling is more extended. In fact, the modulation of the polar terms in MM-PBSA is probably taking into account the possible local rearrangement similarly to what is done via a higher internal dielectric, indicating that dMM-PBSA represents a simple and promising approach in evaluating alanine mutations from single structure, either from docking or from X-Ray or from NMR.
We believe that in parallel with the recently developed optimization of the HADDOCK score for PPIs inhibitors (Kastritis et al., 2014), the combination of HADDOCK score and modified MM-PBSA binding free energy might lay the groundwork for novel approaches to study in silico PPIs inhibitors in a quick and automatic fashion. The advantage of using MM-PBSA as scoring function is threefold. First, MM-PBSA is versatile. It provides estimates of the equilibrium averages over the solvent degrees of freedom, permitting the post-processing of solute representative snapshots from docking poses. Since MM-PBSA estimates binding free energy, it represents a valuable alternative since it is in principle transferable between different docking runs and can be used to score both intra-ligand and inter-ligand poses, saving individual validation for each system under study. This makes MM-PBSA more suitable for novel problems with limited experimental data as we demonstrated in the case study of NPHP1-SH3. Second, MM-PBSA can be used to quantify the thermodynamical strength of the putative poses. In particular, our dMM-PBSA is a reliable scoring function in the protein-peptide field showing good a correlation with experimental data. Therefore, the advantage to use dMM-PBSA with respect to MM-PBSA relies on the possibility to modulate the polar terms without rerunning the Poisson-Boltzmann calculations at different internal dielectric values. Finally, MM-PBSA could better allow to disclose atomistic details of protein-peptide binding, supporting the rational design of bioactive compounds. In fact, post-processing task such as CAS approach has been very recently applied to MD simulations for successfully evaluating the importance of key residues in the protein-peptide binding complex (Spiliotopoulos et al., 2012). Therefore, MM-PBSA can be used to calculate the G, defined as G wt − G mutALA , on the protein-peptide best docked poses in order to identify residues for which mutation to alanine strongly attenuates binding. The latter behavior occurred in our short CAS study of AIRE-PHD1 mutants (Figure S3), by which an acceptable correlation of R = 0.86/0.88 between the experimental and the calculated binding energies of the mutants demonstrated the possibility to use MM-PBSA as a promising tool at low computational cost to evaluate the hot-spots in the PPIs field with respect to the HADDOCK score.
Finally, structural prediction of protein-peptide complexes remains challenging due to two major obstacles: peptides are highly flexible and they often interact weakly with their substrate, underlining their importance in signal transduction or regulation which often relies on transient processes. This leaves flexible docking as one of the few amenable computational techniques to model these complexes (Verkhivker et al., 2000;Hetényi and van der Spoel, 2002;Niv and Weinstein, 2005;Raveh et al., 2010Raveh et al., , 2011Trellet et al., 2013). In our study, we are considering the two interacting partners, protein and peptide, already in the bound conformation, which represents a strong assumption in term of binding mechanism and also the best scenario for a docking calculation, especially in the PPIs field. This undoubtedly increases the success rate in the native pose determination and it reduces the error in the binding affinity calculations since in our MM-PBSA approach we are neglecting most of the solute entropic contribution (i.e., under this assumption the estimate of S = S Complex − (S protein − S peptide ) can be poor). Recent work have highlighted the efforts to improve HADDOCK protocol in the field of protein-peptide (Trellet et al., 2013) but still predicting large conformational changes remains a challenge as indicated by several failure to accurately predict cases where the protein undergoes large conformational changes upon binding (Trellet et al., 2013). In this case, even the more accurate and reliable scoring function and binding free energy methods will struggle in discriminating the correct binding mode due to both hard and soft docking failure (Verkhivker et al., 2000).
In conclusion, in contrast to other scoring functions and approximate binding free energy calculation methods such as the linear interaction energy (LIE) method, MM-PBSA contains less empirical parameters and, thus, it is more likely to be useful in determining the relative free energies of binding of quite different compounds and systems for which there is more limited experimental data, although a protein structure of the target is, of course, required.

Data Set
Particular attention should be paid to the choice of the data set exploited as a benchmark in the binding free energy computational estimation. First of all, binding affinity greatly depends on temperature, pH, and salt concentration (Acampora and Hermans, 1967) and these parameters are difficult to incorporate in the docking calculation. Second, experimental binding data present in the literature are determined with different experimental techniques and they are from different research laboratories. This could significantly impact on the reliability and comparability of the results. Therefore, it should be desirable to rely as much as possible on homogenous data, in terms of research laboratory, experimental technique, and experimental conditions. Along this line, we used a subset of the London's benchmark (http://www.weizmann.ac.il/Organic_Chemistry/London/) for which there were available free forms of the proteins and binding affinity data. The data set consists of 19 complexes the structures of which have been determined by X-ray crystallography, as shown in Table 1. In addition to this data set, two proteinpeptide systems, including AIRE-PHD1 (NMR structure, PDB ID 2KE1) (Chignola et al., 2009) and NPHP1-SH3 (Wodarczyk et al., 2010), were used in order to establish the predictive ability of the three different scoring functions. In case of NPHP1-SH3 only experimental binding free energy data were available.

Generation of Binding Poses
We used the experimental data (chemical shift mapping and mutagenesis) to generate the decoy structures of the 19 cases study and for both AIRE-PHD and NPHP1-SH3 using the HADDOCK strategy. The HADDOCK protocol proceeds through three steps (rigid docking, semi-flexible docking, and water refinement) (de Vries et al., 2007). Non-bonded interactions were calculated with the OPLS force field using a cutoff of 8.5 Å. The electrostatic energy (E elec ) was calculated using a shift function while a switching function (between 6.5 and 8.5 Å) was used for the van der Waals energy (E vdw ). This procedure generated 200 models for each complex, starting from different random velocities. As per default of the HADDOCK protocol, the average score of the top 4 models was considered. The HADDOCK score is defined as a weighted sum of the following four terms: where E elec is the electrostatic energy, E vdw is the van der Waals energy, E desolvation is the desolvation energy and E AIR restraints (i.e., distance) violation energies. The different docking parameter settings and cluster analysis were selected according to the protocol reported in de Vries et al. (2010). BSA is defined as SASA Complex − (SASA Protein + SASA Peptide ) and it is calculated directly by HADDOCK. All calculations were performed with HADDOCK, version 2.1/CNS, version 1.2, through the refinement interface of the HADDOCK web server (de Vries et al., 2010).

Binding Free Energy Calculation MM-PBSA
A modified version of the recently published GMXPBSA tool (Paissoni et al., 2014), named HADDOCKPBSA was used to perform the MM-PBSA calculations for the systems. Similarly to the previous version of the scripts, the calculations are organized in an automatic fashion that can be run in parallel in a PBS queue system and the scripts are extensively commented to facilitate their customization. Improving the previous version, HADDOCKPBSA facilitates the interface between HADDOCK and Poisson-Boltzmann Surface Area (PBSA) calculations.
The method for determining the binding free energy following the MM-PBSA approach has been described previously (Massova and Kollman, 1999). The binding free energy of MM-PBSA was estimated as following: This average over each term, i.e., using a set of snapshots, is required since the Poisson-Boltzmann method to calculate G sol averages only over the degrees of freedom of the solvent and not of the solute, i.e., protein, peptide, and ligands. The energetic term E MM is defined as: where E int indicates bond, angle, and torsional angle energies, and E coul and E LJ denote the intramolecular electrostatic and van der Waals energies, respectively. Equation (3) is normally approximated to E coul + E LJ since E int will zero out upon binding )) if the same conformations are considered for the free and bound forms. The solvation term G solv in Equation (4) is split into polar G polar and non-polar contributions, G nonpolar : Equation (2) can therefore be rewritten as: where: where α is a parameter allowing to reduce the polar contribution to the <G> values. Here, the polar contribution G polar was calculated with two different PB solvers: APBS (Adaptive Poisson-Boltzmann Solver) (Baker et al., 2001) and DelPhi (Rocchia et al., 2001(Rocchia et al., , 2002 programs. The polar contribution G polar refers to the energy required to transfer the solute from a continuum medium with a low dielectric constant (ε = 2) to a continuum medium with the dielectric constant of water (ε = 80). G polar was calculated using the nonlinear Poisson Boltzmann equation. The grid spacing was automatically set to 0.5 Å. The temperature was set to 296 K, and the salt concentration was 0.15 M. The non-polar contribution G nonpolar was calculated with two different approaches: APBS internal routine and using the NanoShaper program (Decherchi and Rocchia, 2013). This term was considered proportional to the solvent accessible surface area (SASA): where γ = 0.0227 kJ mol −1 Å 2 and β = 0 kJ mol −1 (Spiliotopoulos et al., 2012). The dielectric boundary was defined using a probe radius of 1.4 Å. The binding free energy of a protein molecule to a peptide molecule in a solution was then defined as: where <G i > is calculated as the average of the best 4 poses of each cluster. The computational determination of the free energy of binding requires the calculation of the entropic contributions to complex formation, including conformational changes in rotational, translational and vibrational degrees of freedom of the solute. Solute entropic contributions are usually estimated by either the quasi-harmonic approach (e.g., Schlitter equation) or by normal mode analysis (Gohlke and Case, 2004). Entropy calculations would require a full sampling of the free energy landscape, an extremely computationally demanding step, which can result in unreliable results (Brown and Muchmore, 2009) with standard errors usually one order of magnitude larger than those associated with the other energetic components (Kar et al., 2011). In addition, the normal mode analysis estimation is often extremely qualitative (Cheatham et al., 1998) and the configuration entropy estimate on a short dynamic time range can be non-significant (Majumdar et al., 2011). Based on these considerations, we decided to neglect the entropic term in our calculations, leading to a one-parameter model: where with α = 1 is the canonical MM-PBSA method, α = 0.2 is the new dMM-PBSA method discussed in this work. The standard error (SE) is calculated as follows: where σ is the standard deviation and N is the number of averaged structures (N = 4). Briefly, the protocol obtains the Molecular Mechanics (MM) terms, including intermolecular van der Waals and coulombic terms, for each complex from the HADDOCK output file energies.disp. Then, the get_average.inp protocol file is modified in order to re-generate the complexes file inserting the partial charges and radii according to the PQR format.
The same PQR files were used for the Poisson-Boltzmann calculations (G solv ), in order to ensure consistency across the two solvers APBS and DelPhi. Preserving partial charges and intrinsic PB radii is important, as they might significantly affect the outcomes. Notably, each program returns results in different energy units; APBS reports in kJ/mol, and DelPhi reports in kT. All results in this paper are converted to kJ/mol to ease comparison. In APBS six calculations are performed, one for each component in either solvent or "dry" (uniform dielectric ε ext = ε int = 2) environment. The energy (G solv ) is reported in the "elecEnergy" term and the G solv is then calculated again by subtracting the receptor and ligand from the complex in each environment and then subtracting the values from the dry environment from those of the solvated environment (ε ext = 80). For DelPhi, six calculations are performed, one for each component both in the presence or absence of salt/ions. The energy G solv term used is the difference in "corrected reaction field energy" from the calculations without salt, as well as the difference in the "total grid" energies calculated with and without salt. Surface Area is calculated using the apbs built-in function in APBS and NanoShaper functionalities for DelPhi.
Subsequently, the structures of the complex, the protein and the peptide are used to perform the PBSA calculations.
HADDOCKPBSA then generates a grid suitable for the calculations for all the structures: the coordinate extremes of the complexes in each dimension are extracted, and 20 and 10 Å are added to each value to set the limits of the coarse and fine grids, respectively. The tool then automatically calculates the number of grid points that is feasible for APBS and DelPhi calculations and builds a mesh finer than 0.5 Å. When all calculations are completed, the final MM-PBSA value is calculated as the sum of the van der Waals and coulombic terms (calculated by HADDOCK) and the polar and non-polar solvation terms (calculated by APBS and DelPhi). HADDOCKPBSA can also extract the HADDOCK scores values and conveniently generate output files that ease the comparison with the MM-PBSA values. HADDOCKPBSA tool is a set of bash script interfacing HADDOCK output with both APBS and DelPhi Poisson-Boltzmann solvers, which need to be installed on their own. HADDOCPBSA is available on the HADDOCK GitHub repository (https://github.com/haddocking).

Computation Time
The PBSA terms were calculated with the two different programs, APBS and DelPhi/NanoShaper. Calculations were carried out on a personal computer with CPU i7 dual-quad core and 16 GB of memory. The time-averaged calculation of the MM part relies on the HADDOCK calculations, which are performed on the clusters. The post-processing time-averaged calculation of the PBSA terms is different depending on the program used. APBS program allows to calculate all-in-once, including PB and SA terms, using the apbs tool and it requires ∼120 s per complex (about 1000-1500 atoms on average), whereas DelPhi and NanoShaper requires ∼12.5 s per complex. Each system comprises 200 putative poses, for a total of 4200 structures analyzed. The performance of the DelPhi solver benefits from the specific approach it uses to estimate the reaction field energy. As described in Rocchia et al. (2001Rocchia et al. ( , 2002 the procedure is kept analytical as far as possible. The polarization charge in each grid cube at the boundary between high and low dielectric constant is calculated via Gauss law, then its position is relocated by projecting it over the analytical expression of the Connolly molecular surface. This permits, on one side, to avoid double PBE solution using different dielectric constant values, and, on the other, to get results which are particularly robust regarding position and orientation of the system with respect to the grid. Robustness and efficiency are further enhanced by coupling DelPhi solver with NanoShaper, as shown in Decherchi and Rocchia (2013).

Statistical Treatment of the Derived Data
Linear correlation between calculated and experimental binding affinities was evaluated via the Pearson product-moment correlation coefficient (R). p-values from two-tailed Gaussian probability were determined for each data set using the R, and the sample size information, assuming that correlations are statistically significant if p < 0.05. Due to the relatively small size of the samples, we preliminarily performed the Shapiro-Wilk normality test. This test indicated that our data can be modeled according to the normal distribution, (W parameter of 0.96, >0.90, which represents the threshold for 5% significance level). Moreover, the associated p-value is 0.63, much greater than 0.05, which is the common accepted threshold to consider a distribution normal. The standard error is calculated as σ/ √ N, where σ is the standard deviation and N the number of structures (i.e., N = 4 for each cluster BEST4 ).

AUTHOR CONTRIBUTIONS
DS, designed research, ran experiments, analyzed results, wrote the paper; PK designed research, ran experiments, analyzed results; AM designed research, ran experiments, analyzed results; AB designed research, analyzed results; GM designed research, analyzed results, wrote the paper; WR analyzed results, wrote the paper; AS, designed research, ran experiments, analyzed results, wrote the paper.

FUNDING
The research leading to these results has received funding under the Horizon 2020 Program, FET