Original Research ARTICLE
An Efficient Implementation of the Nwat-MMGBSA Method to Rescore Docking Results in Medium-Throughput Virtual Screenings
- Dipartimento di Scienze Farmaceutiche, Sezione di Chimica Generale e Organica “Alessandro Marchesini,” Università degli Studi di Milano, Milan, Italy
Nwat-MMGBSA is a variant of MM-PB/GBSA based on the inclusion of a number of explicit water molecules that are the closest to the ligand in each frame of a molecular dynamics trajectory. This method demonstrated improved correlations between calculated and experimental binding energies in both protein-protein interactions and ligand-receptor complexes, in comparison to the standard MM-GBSA. A protocol optimization, aimed to maximize efficacy and efficiency, is discussed here considering penicillopepsin, HIV1-protease, and BCL-XL as test cases. Calculations were performed in triplicates on both classic HPC environments and on standard workstations equipped by a GPU card, evidencing no statistical differences in the results. No relevant differences in correlation to experiments were also observed when performing Nwat-MMGBSA calculations on 4 or 1 ns long trajectories. A fully automatic workflow for structure-based virtual screening, performing from library set-up to docking and Nwat-MMGBSA rescoring, has then been developed. The protocol has been tested against no rescoring or standard MM-GBSA rescoring within a retrospective virtual screening of inhibitors of AmpC β-lactamase and of the Rac1-Tiam1 protein-protein interaction. In both cases, Nwat-MMGBSA rescoring provided a statistically significant increase in the ROC AUCs of between 20 and 30%, compared to docking scoring or to standard MM-GBSA rescoring.
Structure based virtual screening (SBVS) methods are widely applied in drug discovery (Enyedy and Egan, 2008; Sousa et al., 2013). In most of the cases, SBVSs are done in the hit-to-lead development phase of the drug discovery process, with multiple successful outcomes (Enyedy et al., 2001a,b; Vangrevelinghe et al., 2003). In SBVS-related studies, scoring functions are mostly applied for potential hit selection. In general, the scoring functions are based on either empirical, knowledge-based, or molecular mechanics force field derived potentials (Wang et al., 2003; Raha et al., 2007). Additionally, to make the virtual screening process computational inexpensive, the scoring functions are most likely simplified. Thus, some important contributions known to influence the binding affinity are neglected (Sousa et al., 2006; Moitessier et al., 2008). Inevitably, applications of such simplified methods tend to fail in the hit optimization phase, where more meticulous selections are required about structurally similar compounds for better prediction of biological activity (Leach et al., 2006; Tirado-Rives and Jorgensen, 2006; Warren et al., 2006; Enyedy and Egan, 2008).
A better scoring can be achieved by considering energy evaluation averaged over an ensemble of conformations from a complex dynamic trajectory, as is the underlying concept of the molecular mechanics Poisson-Boltzmann / Generalized Born surface area (MM-PB/GBSA) analysis. Of course, the applications of MM-PB/GBSA methods are at the cost of increased computational expenses (Massova and Kollman, 2000). Nonetheless, the MM-PB/GBSA methods have been successfully applied to estimate binding energies (Kollman et al., 2000), or incorporated as a scoring method in SBVS applications (Lyne et al., 2006; Zhou et al., 2006; Ferrari et al., 2007; Xiong et al., 2007; Xu et al., 2010; Xu, 2012; Knight et al., 2014). The treatment of the solvent in MM-PB/GBSA calculations is implicit, providing an acceptable estimations of the energy contribution while bulk water is the only solvent-related concern (Wong and Lightstone, 2011; Yang et al., 2013). However, explicit water molecules might also be important in forming biomolecular complexes (Chong and Ham, 2017), particularly waters involved in bridging the ligand and the receptor (Wong et al., 2009; Abel et al., 2011; Ahmad et al., 2011; Wallnoefer et al., 2011; Maffucci and Contini, 2013; Mikulskis et al., 2014). Indeed, by analyzing several thousand of crystallographic complexes, it was recently observed that at least a water molecule mediates contacts between the partners in about two thirds of all the considered systems (Hendlich et al., 2003). Thus, several computational methods were proposed to aid the identification of important water molecules in crystal structures (Raymer et al., 1997; García-Sosa et al., 2003; Amadasi et al., 2006). Moreover, although replacing a water molecule in the binding site is a generally accepted strategy to increase drug potency, it has been shown that better pharmacodynamic properties might be obtained by keeping a tightly bound water as a bridge between the ligand and the receptor (García-Sosa, 2013). The effects of targeting or displacing binding site waters in drug design can be rigorously assessed by free energy calculations (García-Sosa and Mancera, 2010), that however are still too demanding when libraries of hundreds of molecules need to be evaluated. Therefore, some approaches have been attempted to consider the contribution of water-mediated interactions into the ligand docking score (Young et al., 2007; Ricchiuto et al., 2008; Forli and Olson, 2012; Ross et al., 2012; Kumar and Zhang, 2013; Murphy et al., 2016) or into the MM-PB/GBSA estimated binding energy (Checa et al., 1997; Wong et al., 2009; Genheden et al., 2011; Wallnoefer et al., 2011; Greenidge et al., 2013; Maffucci and Contini, 2013).
In this framework, we developed a MM-PB/GBSA variant, that we refer as Nwat-MMGBSA, which provided good-to-excellent results in ranking the binding energies of different protein-ligand or protein-protein complexes (Maffucci and Contini, 2013, 2016). Nwat-MMGBSA is based on the inclusion of a number of explicit water molecules, that are selected to be the closest to the ligand in each frame of the MD trajectory and are included as part of the receptor during the analysis. In addition to our work (Maffucci and Contini, 2013, 2016), Aldeghi and coworkers recently validated, by a thorough statistical analysis, the use of this approach on bromodomains (Aldeghi et al., 2017). Compared to other methods that include explicit water in MM-PB/GBSA calculations, Nwat-MMGBSA might have some advantages. For instance, relevant explicit water might be selected from the crystal structure (Wong et al., 2009; Wallnoefer et al., 2011). However, this imply that high resolution crystal structures are available, while Nwat-MMGBSA calculations can be performed on receptor models obtained by other techniques, such as homology modeling or NMR. Moreover, crystallographic water sites might derive from the average electron density of several molecules competing for the same position (Schiffer and Hermans, 2003). Indeed, we previously observed that a water-bridge between the ligand and the receptor found in the crystal structure of topoisomerase I in complex with topotecan (Staker et al., 2002) was described by the competition of three different waters in a 4 ns MD trajectory (Maffucci and Contini, 2013). It was also reported that explicit water for MM-GB/PBSA calculations might be selected from MD simulations accordingly to their distance from the ligand (Zhu et al., 2014). In this case, the distance from the ligand atoms is fixed, while the number of waters is different in each snapshot selected for MM-PB/GBSA analysis. However, by comparing this method to Nwat-MMGBSA, where the number of selected water is constant among all snapshots, we observed that Nwat-MMGBSA provided a better correlation with experiments and a better reproducibility among multiple repetitions of the same calculation (Maffucci and Contini, 2016). In this work, aiming to make Nwat-MMGBSA suitable for rescoring ligands in low- to medium-throughput SBVS experiments, we optimized the protocol to improve its efficiency, without losing in accuracy. We selected penicillopepsin (James et al., 1992; Ding et al., 1998; Hou et al., 2011a), HIV1-protease mutants (Shen et al., 2010; Olajuyigbe et al., 2011) and BCL-XL (Lessene et al., 2013) as test systems with known experimental data, either binding free energy (ΔG), inhibition constant (ki), or IC50. Our studies have shown improvements in the coefficient of determination to experimental data (r2) ranging from 10 to 60%, depending on the number of explicit water molecules considered in the energy evaluation. Moreover, we assessed the Nwat-MMGBSA approach for SBVS rescoring performance in a ligand-protein interaction and a protein-protein interaction (PPI) scenario (AmpC β-lactamase and Rac1-Tiam1, respectively). In both cases, improved outcomes were observed compared to either docking scoring or to standard MM-GBSA rescoring. Furthermore, the complete SBVS workflow applied in this work, including Nwat-MMGBSA rescoring, is provided in the Supplementary Information as a set of bash and tcsh scripts that, together with working tutorials, should make it readily applicable to other biomolecular systems of interest.
Preparation of Complexes
Crystal structures of the penicillopepsin [PDB codes: 1APU, 1APV, 1APT, 1APW (James et al., 1992), 2WEA, 2WEB, and 2WEC (Ding et al., 1998)] and HIV1-protease [PDB codes: 3NU3, 3NU4, 3NU5, 3NU6, 3NUJ, 3NU9, 3NUO (Shen et al., 2010), 3NDW, and 3NDX (Olajuyigbe et al., 2011)] complexes (Table S1) were obtained from RCSB Protein Databank (Figures S1, S2). However, for the BCL-XL system, (Figure S3) only 3ZK6, 3ZLN, 3ZLO, and 3ZLR complexes were available as crystal structures (Lessene et al., 2013). Therefore, the starting structures of the unavailable complexes were reconstructed using MOE software (Molecular Operating Environment, v2016.08, 2016) starting from the available ones. Ligand partial charges were derived with the AM1-BCC method using the antechamber (Wang et al., 2006) software of AmberTools15 package (Case et al., 2014). All waters, ions and stabilizing agents present in the crystal structures were removed. The protonation state of every titratable residue within the complexes were assigned at physiological conditions using the Protonate-3D module of MOE.
MD simulations were performed with the pmemd.MPI or pmemd.cuda (Götz et al., 2012; Salomon-Ferrer et al., 2013) modules, depending on the hardware (classical HPC environment or GPU equipped workstations, respectively), included in the Amber14 package (Case et al., 2014). The ff14SB (Maier et al., 2015) and the gaff (Wang et al., 2004) force fields were adopted for the protein and the ligand in all simulations respectively. In each complex, the total charge was neutralized by adding Na+ or Cl- ions, and the systems were solvated by an octahedral box of TIP3P water (Jorgensen et al., 1983), with a box size of 10 Å from the solute.
The equilibration and production protocols were updated to optimize performance, in respect to previous studies (Maffucci and Contini, 2013, 2016). The systems were initially relaxed by optimizing the position of hydrogens (1,000 cycles of steepest descent (SD) and 5,000 cycles of conjugated gradient (CG), up to a gradient of 0.01 kcal mol−1 · Å; restraints of 100 kcal·mol−1 · Å2 were applied on heavy atoms) and of ions and waters (2,000 cycles of SD and 5000 cycles of CG up to a gradient of 0.1 kcal·mol−1·Å; restraints of 50 kcal·mol−1·Å2 were applied on atoms other than ions and water). The solvent box was then equilibrated at 300 K by 100 ps of NVT and 100 ps of NPT simulation using a Langevin thermostat with a collision frequency of 2.0 ps−1 (restraints of 50 and 25 kcal·mol−1·Å2 were applied on the solute for NVT and NPT simulations, respectively). Successively, two cycles of restrained minimization (2500 cycles of steepest descent and 5,000 cycles of conjugated gradient, up to a gradient of 0.1 kcal mol−1 Å, with restraints of 25 and 10 kcal mol−1 Å2 on backbone atoms, respectively) were performed. The systems were then heated up to 300 K in 6 steps (ΔT = 50 K) of 5 ps each, where backbone restraints were gradually reduced from 10.0 to 5.0 kcal mol−1 Å2. An equilibration of 1.6 ns was then performed by initially using the NVT ensemble (100 ps, ligand and backbone restraints = 5.0 kcal mol−1 Å2) followed by NPT (1 step of 200 ps with ligand and backbone restraints = 5 kcal mol−1 Å2, then 3 steps of 100 ps each reducing the ligand and backbone restraints from 5.0 to 1.0 kcal mol−1 Å2, and finally 1 step of 500 ns with ligand and backbone restraints of 1.0 kcal mol−1 Å2). The last equilibration step consisted in 500 ps of unrestrained NVT simulation. Finally, production runs were conducted under the NVT condition at 300 K for 1 or 4 ns. An electrostatic cutoff of 8.0 Å, PME (Darden et al., 1993) for long electrostatic interactions, and the SHAKE (Ryckaert et al., 1977) algorithm were applied to all the calculations. Three independent simulations were performed for each hardware set-up (GPU workstation or CPU HPC cluster). For the simulations performed on GPUs, the default single precision/fixed precision (SPFP) version of pmemd.cuda (Le Grand et al., 2013) was applied in all steps, except for geometry minimizations where the double precision/fixed precision (DPFP) version was adopted.
All MD production trajectories were processed by cpptraj for backbone RMSD analyses (Figures S4–S11), solute-solvent hydrogen bond (donor-acceptor distance cutoff at 4.0 Å, angle cutoff at 150°) and water density (grid analysis over a cubic box 50 Å × 50 Å × 50 Å, mesh = 0.5 Å, centered on ligands) analyses. Images of water density plots were obtained by using UCSF Chimera (Pettersen et al., 2004).
MM-GBSA and Nwat-MMGBSA analyses were performed with the MMPBSA.py script (Miller et al., 2012) of the AmberTools15 package. The analyses were conducted on either the 1st or the 4th ns of the production runs by selecting 100 frames evenly spaced out. The GB-Neck2 implicit solvent model (Nguyen et al., 2013) was chosen for the GB calculations and the salt molar concentration in solution was set at 0.15 M. Entropy was neglected in all calculations, since the benefits of including its contribution still remain controversial (Weis et al., 2006; Hou et al., 2011a; Wallnoefer et al., 2011; Yang et al., 2011) and normal mode calculations are also extremely time consuming. It should be noted that neglecting entropy, although acceptable when comparing ligands of similar size and structure (Kollman et al., 2000; Wang et al., 2001; Wong et al., 2009), might lead to errors when the analysis involves ligands that are structurally rather different (Oehme et al., 2012).
The Nwat-MMGBSA script (Figure 1) uses the cpptraj module of AmberTools15 to process the solvated MD trajectory. When Nwat > 0, the water molecules closest to the ligand were preserved while the remaining were stripped from the selected frames by using the cpptraj command closest. The total number of water molecules to be kept in the trajectory is given by the Nwat flag in the script input section (in this work, we evaluated Nwat = 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100). The number of frames that are going to be selected from the original MD trajectory is defined by the r flag in the script input section, that corresponds to the interval keyword in the MMPBSA.py script (Miller et al., 2012). In this work, r was set at 10, meaning that one every 10 frames (i.e., 100 frames per nanosecond) was sampled. The preserved closest water molecules are considered as part of the receptor during the MM-GBSA analysis. In analogy with studies on the MM-PB/GBSA performance previously reported by us and by others (Hou et al., 2011b; Maffucci and Contini, 2013, 2016; Xu et al., 2013), the coefficient of determination (r2) between experimental data and calculated binding energies was used as the evaluation metric.
Figure 1. Pseudocode of the complete screening and rescoring workflow. Each texture represents an independent script: VScreen (diagonal), autoMD (dots), and Nwat-MMGBSA (vertical).
Restrospective Virtual Screening
Preparation of the Receptor
The AmpC β-lactamase receptor was derived from the 2HDS PDB file (Babaoglu and Shoichet, 2006) according to what described on the DUD-E website (Mysinger et al., 2012). Starting from the crystal structure, only chain B was preserved, crystallographic water molecules were removed and the “Structure preparation” module of the MOE software was used to check the protein structure and correct eventual errors. The receptor was then capped by acetyl (ACE) and methylamino (NME) groups at the N- and C-termini, respectively. Missing hydrogen atoms were added using the “Protonate 3D” function of the MOE package, considering a physiological pH. Partial charges were added accordingly to the AMBER10:EHT force field and solvation was treated with the Born model. The system geometry was then optimized up to a gradient of 0.1 kcal mol−1 Å, with protein backbone atoms restrained to the original position.
AmpC β-Lactamase Testing Library
The DUD-E database provides 48 experimentally determined active ligands and 2850 decoy molecules for AmpC β-lactamase. However, considering the computational cost of rescoring by MD simulations followed by Nwat-MMGBSA analyses, we considered using a smaller library that decently represent the original database. Hence, fingerprint clustering methods included in MOE package were applied to reduce the size of the test set. Multiple fingerprint/similarity metric method combinations have been trial-and-error-ed. We found that the application of Typed Graph Triangle (TGT) fingerprint and Tanimoto Superset/Subset (tanimoto-ss) similarity metric method provides the closest reproduction of virtual screening results as the data provided by DUD-E. However, the combination applied here might not be directly transferred to other biomolecular systems. The clustering process reduced the original database to 20 active ligands and 378 decoys (Table S2) with a docking AUC at 74.82% and top 1% enrichment factor of 9.5, in comparison to the original 78.92% and 8.3 provided on DUD-E database. The smiles of the final database are reported in Table S2.
Rac1 Testing Library
By analyzing the literature and by using in-house data, we collected a set of 116 compounds, 10 of which were active and 106 inactive on the Rac1 protein (Table S3). The active compounds were selected among those able to inhibit at least the 50% of Rac1 activity, as assessed by G-LISA biochemical assays (Ferri et al., 2009, 2013a). Conversely, the decoys were chosen among molecules that were designed (or identified by computational screening) as Rac1 inhibitors, but turned out to be completely inactive on experimental evaluation (Ferri et al., 2009, 2013a; Hernández et al., 2010; Surviladze et al., 2010; Shang et al., 2012; Rahimi et al., 2015; Lu et al., 2017). The selected compounds were designed with MOE, minimized and subjected to a conformational search (MMFF94x force field, Born solvation, with the other parameters as default). The lowest energy conformation of each compound was selected to form the final test set.
The workflow included in the VScreen script (see Supplementary Information) allows the following combinations for library processing:
1) Use the library as it is
2) Generate tautomers
3) Generate stereoisomers and tautomers
4) Generate ring conformations and tautomers
5) Generate stereoisomers, ring conformations, and tautomers
6) Generate stereoisomers, ring conformations, tautomers, and protonation states.
The sixth library processing tandem was applied in this work. The UNICON software is used to generate tautomer and protonation states (Sommer et al., 2016). We chose the topscoring keyword to generate only the most favored tautomers and protomers, as in preliminary evaluation we observed that the generation of all tautomers and protomers (using the ensemble keyword) did not provide improved results and was significantly more time consuming. The SPORES software (ten Brink and Exner, 2009, 2010) is instead used to obtain stereoisomer and ring conformation, as well as for the final assignment of atom types, as requested by the PLANTS software (Korb et al., 2006, 2007, 2009, 2010) used for all dockings. Specific docking parameters, including search speed and scoring functions, can be set directly in the VScreen script. In the examples reported here, PLANTS was used in a low speed / high accuracy mode (search speed = speed1) and with the CHEMPLP scoring function (Korb et al., 2009). Additional PLANTS commands, such as H-bond or NMR constraints (Korb et al., 2010), can also be inserted in the input section of the VScreen script. Concerning the Rac1 example, we requested a H-bond constrain of 3 kcal/mol between any H-bond donor of the docking ligand and the carbonyl oxygen of Leu 70, since the literature evidence the importance of such ligand-receptor interaction for a proper activity (Montalvo-Ortiz et al., 2012; Ferri et al., 2013a; Ruffoni et al., 2014). Binding site radii were optimized to 16 Å for Rac1 and 7 Å for AmpC β-lactamase test sets, respectively. After the virtual screening process, the outcomes were ranked according to total PLANTSCHEMPLP score, using the top ranked pose of each ligand. Receiver operating characteristic (ROC) curves and corresponding area under curve (AUC) were then generated at the end of each docking run by using an R script integrated in the VScreen program.
Following the docking, automatized parametrization of ligands for later MD simulations can be enabled by setting the doMD keyword to 1. The user is allowed to choose a “top percentage” of the ranked ligands to be subjected to parametrization, by setting the fract keyword. We have chosen 100% (fract = 100), i.e., the full test set, as we were interested in a full assessment of the Nwat-MMGBSA methods in terms of virtual screening rescoring. The antechamber software (Wang et al., 2006) of the AmberTools15 package is used for deriving AM1-BCC partial charges for each ligand and to assign atom types accordingly to the gaff force field (Jakalian et al., 2002; Wang et al., 2004). The quantum mechanical calculations necessary to perform the charge parameterization can be accomplished by using the default sqm software included in AmberTools15, or with MOPAC2016 (Stewart, 2016) by setting the qm keyword to 2 or 0, respectively. The topology and starting coordinate files of each complex are then generated by calling the tleap software, included in the AmberTools15 package. Each complex is neutralized and solvated by adding Na+ or Cl− ions and a TIP3P water box of 10 Å from the solute. MD simulations and Nwat-MMGBSA analyses can then be performed as detailed in previous sections.
Nwat-MMGBSA analyses were performed on the obtained trajectories with number of closest water molecules set to 0, 30, 60, or 100. ROC curves and corresponding AUCs were evaluated by using the rankings derived from each Nwat-MMGBSA analysis.
All scripts applied in this work are available in the Supplement Materials. Eventual updates might also be requested to the authors.
Optimization of the Nwat-MMGBSA Protocol
To optimize the Nwat-MMGBSA protocol for low- or medium-throughput virtual screening procedures, such as those applied in the hit-to-lead optimization phase of a drug discovery process, we worked on a significant reduction of the overall simulation time in comparison to our previous implementations (Maffucci and Contini, 2013, 2016). Then, we integrated Nwat-MMGBSA in a continuous workflow that includes the library setup, docking and the preparation of complexes that is propaedeutic to MD, as shown in Figure 1. The following steps of the protocol were redesigned for an optimal ratio between accuracy and speed:
• The application of AM1-BBC charges to reduce the computational cost for ligand parameterizations. Indeed, it has been reported that AM1-BCC charges behaved fairly well in MM-PB/GBSA calculations, compared to more sophisticated methods (Xu et al., 2013; Sun et al., 2014).
• The use of the NVT ensemble instead of NPT for last equilibration step and production run. This allowed a 30% reduction on the overall MD simulation time, without a significant variation in the results.
Generalized Born (GB) implicit solvent model is used by default in Nwat-MMGBSA calculations. Indeed, several articles report that GB can provide outcomes comparable to the PB method, at a fraction of the computational cost, especially when relatively short MD trajectories are used for MM-PB/GBSA calculations (Hou et al., 2011a,b; Maffucci and Contini, 2013, 2015, 2016). However, the PB method can still be requested by the user by setting the solv keyword in the input section of the Nwat-MMGBSA script (see scripts and examples provided as Supplementary Information).
Moreover, the reproducibility between independent MD simulation repeats of the same system, especially when using GPU, was also improved. This required some protocol adjustments, including a longer equilibration of the solvent box, the use of geometric restraints instead of constraints, the use of the Langevin thermostat instead of the weak coupling algorithm, and a slightly extended final equilibration phase.
The protocol modifications allowed approximately 1.5 h per ligand on a standard workstation equipped with a single GeForce GTX TITAN Black card, including parameterization, minimization, equilibration, 1 ns of production run and Nwat-MMGBSA analysis. This is roughly the same time required for the simulation on a HPC architecture using 12 nodes equipped with two 2.40 GHz octa-core processors under similar simulation settings.
Considering our interest in using Nwat-MMGBSA calculations to rescore docking results in a reasonable time, the following tests were also designed to evidence any statistical difference in the correlation to experiment when the analysis is performed on 1 ns or 4 ns long MD trajectories. All the energies computed for the discussed examples are reported in Tables S4–S39. Correlations to experiments and statistical analyses are shown in Tables S40–S42 and Table S43, respectively.
Test on Penicillopepsin
This system was already evaluated, although with a different protocol, in a previous work where the bases of the Nwat-MMGBSA approach were described (Maffucci and Contini, 2013). The results of the Nwat-MMGBSA analysis obtained with the new protocol agreed with those reported in the previous work in terms of correlation between predicted and experimental binding energies (Figure 2A and Table S40). This confirms the robustness of the Nwat-MMGBSA method toward the modifications in the MD simulation conditions. However, the new protocol showed the beneficiary application of Nwat-MMGBSA method even when only 10 closest water molecules were considered (Nwat = 10), while in the previous evaluation no significant improvement was observed at this condition. Water density plot around the binding site (Figure 2B) confirm the role of water in mediating the ligand-receptor binding. Indeed, for this system, the use of the Nwat-MMGBSA methods allowed to increase the r2 from about 0.3, obtained with the standard MM-GBSA approach (Nwat = 0), to about 0.8 (Figure 2A).
Figure 2. Results of the penicillopepsin system regarding the application of Nwat-MMGBSA method. (A) Bar chart of r2 in dependency at different Nwat and computational conditions. Nwat = 0 corresponds to a standard MM-GBSA calculation. (B) Water density plot obtained by grid analysis of penicillopepsin-1APT complex (visualization with Chimera, step = 1, level = 15).
In addition, relatively low standard deviations, obtained when averaging the r2 obtained by independent repetitions of the whole run, were observed when higher numbers of closest water molecules were considered (Table S40 and Figure 2A), thus suggesting that the inclusion of explicit waters is likely to improve the reproducibility of results from individual runs.
Moreover, the outcome obtained by running simulations on GPU and CPU hardware were statistically equivalent (Table S43), and the same was true for the analyses performed on either the 1st or the 4th ns of MD simulations (Figure 2A). This suggests that Nwat-MMGBSA analysis is suitable for the analysis of short MD simulations run on GPU cards, with a great improvement in speed and no impairments in accuracy.
Test on HIV1-Protease
Similarly to other aspartic proteases (Brik and Wong, 2003). HIV1-protease exhibits a close relationship with water-mediated bridging effects in the crystal structure (Shen et al., 2010). Consequently, the effects of explicit waters were also reflected by the Nwat-MMGBSA workflow (Figure 3A). The high water-density around a wide area at the binding site also confirms the likelihood of the involvement of explicit water during binding process (Figure 3B).
Figure 3. (A) Trend of r2 in dependency of Nwat for HIV1-protease. Nwat = 0 corresponds to a standard MM-GBSA calculation. (B) Water density plot obtained by grid analysis of HIV1-protease-3NUO complex (visualization with Chimera, step = 1, level = 15).
When considering the correlation between the experimental ki and the predicted binding energies, a significant improvement in r2 was obtained with the inclusion of a hydration shell of 30–70 water molecules (Figure 3A and Table S41). Although, results with a lower Nwat value showed no significant difference from standard MM-GBSA analyses, suggesting that smaller hydration shells around the ligand might have excluded certain solute-solvent interactions important for binding free energy estimations. However, water-mediated H-bond analyses showed that only one or two stable (occupancy > 20%) water-mediated interactions did involve the ligand, while majority of the bridging water molecules were found between protein residues (Tables S44, S45). Furthermore, crystallographic data provides that only 10–15 water molecules are generally present within 4 Å from the ligand molecules (Olajuyigbe et al., 2011). These imply potential conflicts between the lower numbers of the observed “stable” bridging water molecules to the evidently better binding free energy estimations when higher amount of closest explicit solvent (up to 70) is included. Such conflicts can only be explained when transient water bridges are considered. The averaged free energy contribution of these transient interactions is more likely been captured by Nwat-MMGBSA calculations, whereas not necessarily detectable through population distribution or electron density analyses. Indeed, for example, the inclusion of crystallographic water molecules up to 3.5 Å from the ligand did not provide a clear benefit over standard MM-PB/GBSA approach (Greenidge et al., 2013).
Similar to the penicillopepsin system, the outcomes did not show statistically significant differences between the 1st and 4th ns of MD simulations and were independent from the hardware. Apparently, 4 ns MD simulations performed using CPU averagely provided higher correlation to experiments for Nwat ≤ 20, although the high standard deviations make this result not statistically significant (Figure 3A and Table S41).
Test on BCL-XL
The Nwat-MMGBSA trails up to 50 closest water molecules have no statistical difference from standard MM-GBSA calculations, despite of different hardware environment (Figure 4A). A lower water density was indeed observed around the ligand for BCL-XL system (Figure 4B). This implies that explicit water molecules are playing a less important role in ligand binding, as reflected by the relatively deluding performance of Nwat-MMGBSA compared to MM-GBSA. A relatively high r2 (~0.7) was indeed consistent throughout the multiple trials for all of the conditions evaluated. These apparently non-effective results, however, positively suggest that the Nwat-MMGBSA method does not impair the statistical outcomes of the estimations for systems where explicit water molecules are deemed less important. Thus, it can be concluded that, even if system-specific tuning is necessary for optimal performance, Nwat-MMGBSA can be safe for binding free energy estimation even without an a priori knowledge of the bridging water in the system of interest.
Figure 4. (A) Trend of r2 in dependency of Nwat for BCL-XL. (B) Water density plot obtained by grid analysis of BCL-XL-3ZC4 complex (visualization with Chimera, step = 1, level = 15).
Retrospective Virtual Screening Test
To assess the performance of the Nwat-MMGBSA method in rescoring virtual screening results, we have chosen two case studies for protein-ligand (PLI) and protein-protein (PPI) interactions, respectively. The first system, AmpC β-lactamase (Usher et al., 1998), was selected from the Dud-E database to provide an example were water plays an active role in the target catalytic cycle. Indeed, the inclusion of an explicit water molecule was found beneficial in previously reported virtual screenings (Powers et al., 2002). Conversely, the second system, the Rac1 protein targeted at the Tiam1 binding site (Worthylake et al., 2000), was chosen because of the availability of reliable in-house activity data, including those of several inactive compounds that were however selected as potential hits by virtual screening studies previously conducted (Ferri et al., 2009, 2013a).
The AUC of the ROC curves was chosen as the main metric of comparison, since enrichment values are not indicated for databases of limited sizes (Enyedy and Egan, 2008). For the AmpC system, the full virtual screening workflow, followed by MD simulation and Nwat-MMGBSA (Nwat = 0, 30, 60 and, for Rac1 only, 100) rescoring, was repeated twice, while for Rac1 a third repetition was added due to a higher variance among the obtained correlations. Docking scores and Nwat-MMGBSA binding energies for AmpC and Rac1 screenings are reported in Tables S46, S47 and Tables S48–S50, respectively.
The receptor in Dud-E include an explicit water molecule. We did preliminary docking evaluations by including the water, using the “water_molecule” function implemented in PLANTS, but comparable results were obtained (see Figure S12). For this reason, to simplify and standardize the procedure, we decided not to include any explicit water in the docking part of the virtual screening workflow.
We initially noticed that virtual screening has already provided a decent discrimination of active from decoys with a ROC AUC that averaged at 72.0% (Table 1). The application of the standard MM-GBSA method (Nwat = 0) only provided a barely significant increase of ROC AUC value, respect to docking (Table 1), while improvements appeared once explicit water molecules were included, as shown by the Nwat = 30 and 60 scenarios (Figure 5, Figure S15). Considering that the ROC AUCs for Nwat = 30 and 60 were fully converged, no additional analyses at higher Nwat values were done. Additionally, ligand-to-ligand correlations in calculated free energies were evaluated between the two repeated runs. The standard MM-GBSA run provides a r2 of 0.66 when correlating the energies obtained by the two repetitions, while Nwat-MMGBSA with Nwat = 30 and 60 resulted in r2 of 0.84 and 0.91, respectively (Figure 6). This implies that Nwat-MMGBSA rescoring is likely to provide better reproducibility between separate runs. Moreover, the good ligand-to-ligand inter-method correlation between Nwat = 60 and 30 (r2 = 0.95 and 0.94 for runs 1 and 2, respectively; Figure S13) further confirmed the improvements in reproducibility. Interestingly, a positive binding energy was computed for two ligands by Nwat-MMGBSA calculations, but not by docking or standard MM-GBSA rescoring. The two ligands belong to the decoy set (ligands 088 and 179, Tables S46, S47) and are thus supposed to be poorly ranked. Indeed, by analyzing the binding modes of the decoy 088 (Figure S17), it can be observed that an isopropyl group overlaps with the position occupied by a water molecule present in the crystal structure (Babaoglu and Shoichet, 2006), but not explicitly considered during docking (see Methods). Conversely, decoy 179 does not overlap with the crystallographic water site (Figure S18). However, it can be observed that a solvent-exposed chloropropyl group overlaps to a position occupied by a hydrophilic amino acidic moiety of the crystallographic ligand. In both cases, it appears that Nwat-MMGBSA rescoring can correctly penalize compounds that do not offer an optimal orientation of hydrophobic groups.
Figure 5. Average ROC AUC values for AmpC β-lactamase. Statistical significance was calculated by t-test and is graphically reported only when a significant variation was observed (*P < 0.05; **P < 0.01; ***P < 0.001).
Figure 6. Correlations between the binding energies computed in two independent repetitions (Run 1 and Run 2) for AmpC β-lactamase by standard MM-GBSA (Nwat = 0) or by Nwat-MMGBSA with Nwat = 30 or Nwat = 60.
Tiam1-Rac1 PPI Interface as the PPI Test Set
Virtual screening targeting PPIs has been suggested as a challenging task, especially when only traditional docking and scoring procedures are used (Bienstock, 2012; Scott et al., 2016). In the past, we have applied standard computational methods to identify and design inhibitors of the Rac1-Tiam1 PPI, thus collecting data on compounds identified as potential hits, but that turned out to be inactive upon experiments (Ferri et al., 2009, 2013a,b; Ruffoni et al., 2014). In addition, we searched the literature to identify compounds that were tested against Rac1 inhibition, but turned out to be inactive (Hernández et al., 2010; Surviladze et al., 2010; Shang et al., 2012; Rahimi et al., 2015; Lu et al., 2017). By this way, the resulting ligand test set shared similar physico-chemical and structural features between the actives and the inactives, thus making this virtual screening a difficult discrimination process to tackle.
The docking protocol was optimized to maximize AUC by evaluating the effect of the different scoring functions available in PLANTS, by variating the binding site radius, the search speed and by using hydrogen bond constraints with residues known to be essential for activity (i.e., Leu70 or Ser71) (Gao et al., 2004). The docking poses were visually inspected to check their consistency with the poses obtained in previous studies (Ferri et al., 2013a). With the optimized protocol and for each library processing condition, all the active compounds showed a similar binding pose, except for ligand109 (Figure S14).
The ROC computed on the scores obtained by docking showed a moderate ability of this procedure in discriminating active from inactive compounds, with AUCs of about 0.6 (Table 2, Figure 7). Considering the strained characteristic of both the target and database, this result is acceptable, if compared to the ROC AUCs obtained in other benchmarks reported by literature (Brozell et al., 2012; Liebeschuetz et al., 2012; McGann, 2012; Neves et al., 2012; Novikov et al., 2012; Repasky et al., 2012; Schneider et al., 2012; Lavecchia and Di Giovanni, 2013; Yuriev et al., 2015).
Figure 7. Average ROC AUC values for Rac1-Tiam1 virtual screenings. Statistical significance was calculated by t-test and is graphically reported only when a significant variation was observed (*P < 0.05; **P < 0.01).
This time, the application of the standard MM-GBSA (Nwat = 0) rescoring did not provide any significant improvement in the AUC compared to docking (Table 2). Unexpectedly, Nwat-MMGBSA performed with 30 water molecules (Nwat = 30) behaved similarly. Conversely, the ROC AUCs improved of about 20 and 30% after rescoring with Nwat = 60 or 100, respectively (Table 2, Figure 7, and Figure S16). Since the difference in AUC between the two last scenarios was not statistically significant, no additional simulations were conducted at higher Nwat. An improvement in the ROC AUC of about 20–30%, although reproducible and significant (Zhang et al., 2014), might be questionable against the increased computational effort of rescoring with either MM-GBSA or Nwat-MMGBSA. However, in the framework of a lead optimization study, the payback of a simulation that can be easily run on relatively inexpensive hardware can be an increased chance of synthesizing a good molecule. Considering the costs associated with the synthesis of new molecules, having even only a 20% higher probability of preparing an active compound can be considered a rather good result.
When developing new drugs, computational calculation can help in identifying new hits in either the hit-to-lead or lead optimization phases. While the first task is generally performed by using very fast computational methods to screen large databases, the lead optimization phase is generally done by applying more accurate, although more computationally demanding, methods. Indeed, starting from a lead, a virtual library of hundreds-to-thousands congeneric molecules can be generated and evaluated computationally. However, the prioritization of the synthesis of a few derivatives by computational methods might still be quite challenging. In this framework, we optimized a variant on the well-known MM-GBSA method, referred as Nwat-MMGBSA (Maffucci and Contini, 2013, 2016). This approach consists in the inclusion, during the MM-GBSA analysis, of a fixed number of water molecules, which in each frame of the MD simulation are the closest to the ligand, or to a binding interface, and are therefore potentially mediating interactions between the receptor and the ligand. We demonstrated that this approach might improve the correlation between predicted and experimental binding energies up to 50%, compared to the standard MM/GBSA method (corresponding to Nwat = 0), with only a modest increase in computation time (Maffucci and Contini, 2016). Of course, the potential improvement in correlation depends on the role played by water in facilitating the ligand-receptor binding. However, we also found that when water does not play a specific role in mediating this interaction, the application of Nwat-MMGBSA is not detrimental on the quality of correlation, compared to the default approach. In the light of this, we automatized the process and optimized the MD protocol for running simulations on standard workstations equipped with a GPU, on which a full calculation can be completed in about 1–2 h per complex, depending on system size. Indeed, the results obtained by using a single GPU card are comparable, in both quality and duration, with those obtained by running MDs on a relatively large HPC environment (12 nodes with 2 octa core processors per node). Moreover, we also observed that Nwat-MMGBSA analyses provided comparable results when applied on 1 or on 4 ns MD trajectories, thus making this simulation attractive for medium-throughput virtual screenings.
In the second part of this article, we described the integration of Nwat-MMGBSA as a method to rescore docking results in SBVS studies. By applying Nwat-MMGBSA rescoring (Nwat = 60 or 100) we obtained, in both the examples, an increase in the ROC AUCs of between 20 and 30%, compared to the docking scorings or default MM/GBSA (Nwat = 0), depending on the system. In the adopted conditions, we were able to process more than 20 compounds per day using a standard octa core workstation equipped by a single GPU. Although this might appear a quite long time, compared to the thousands of compounds that can be screened per day by docking, the investment becomes reasonable when considering the time and resources required for the synthesis of new molecules. Moreover, we can expect that the fast development of GPU hardware will make MD-based rescoring even faster in short time. Indeed, in 2010 we could run a MD simulation on a Rac1 complex at a speed of 8.7 ns/day on a Tesla C1060 card, while a few years later, the same simulation was run at a speed of 59.3 ns/day on a GeForce GTX TITAN Black card.
Unfortunately, we were not able to find an ideal number of water that need to be included during Nwat-MMGBSA rescoring. Indeed, while Nwat = 30 appeared to be reasonable in most of the examples, including those reported previously (Maffucci and Contini, 2013, 2016), it failed in the Rac1 VS example. Indeed, in this case, at least 60 waters were necessary to observe a significant improvement over docking and standard MM-GBSA, possibly due to the large and solvent-exposed nature of the Rac1 binding site. Conversely, it was recently reported that MM-PBSA calculations on a set of Mnk1 and Mnk2 inhibitors provided improved correlations to experiments only when including up to 10 water molecules (Kannan et al., 2017). This quite low number, compared to other examples, was justified by the rather small interface between Mnk1/Mnk2 kinases and the respective ligands.
AC coordinated the team, designed the scripts and performed the calculations on Rac1. IM performed the calculations on penicillopepsin, HIV1 and BCL-XL. XH provided important updates to the VScreen script. VF performed the calculations on AmpC β-lactamase. IM, XH, and AC wrote the article.
This work was partially supported by the Italian Ministry of Education, University and Research (MIUR) through the FIRB—Programma ‘Futuro in Ricerca’ (grant No. RBFR087YAY), by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie ITN-European Joint Doctorate grant agreement No. 675527 MOGLYNET-programme in Drug Discovery and Development, and by NVIDIA through the GPU Grant Program.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We acknowledge the CINECA for high performance computer service and the MIUR for the Ph.D. scholarship that supported IM. We also acknowledge João Cavalheiro for having run calculations on the BCL-XL example during his Erasmus Programme. Daniela Albo and Cecilia Carvoli are also acknowledged for their work during their BS theses.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2018.00043/full#supplementary-material
SBVS, structure based virtual screening; VS, virtual screening; ROC, receiving operator characteristic; AUC, area under curve; MD, molecular dynamics; MM-GBSA, molecular mechanics Generalized Born surface area; PPI, protein-protein interaction; SD, steepest descendent; CG, conjugated gradient; NVT, constant number of particles, volume and temperature; NPT, constant number of particles, pressure and temperature.
Abel, R., Salam, N. K., Shelley, J., Farid, R., Friesner, R. A., and Sherman, W. (2011). Contribution of explicit solvent effects to the binding affinity of small-molecule inhibitors in blood coagulation factor serine proteases. ChemMedChem. 6, 1049–1066. doi: 10.1002/cmdc.201000533
Aldeghi, M., Bodkin, M. J., Knapp, S., and Biggin, P. C. (2017). A statistical analysis on the performance of MMPBSA versus absolute binding free energy calculations: bromodomains as a case study. J. Chem. Inf. Model. 57, 2203–2221. doi: 10.1021/acs.jcim.7b00347
Amadasi, A., Spyrakis, F., Cozzini, P., Abraham, D. J., Kellogg, G. E., and Mozzarelli, A. (2006). Mapping the Energetics of Water–Protein and Water–Ligand Interactions with the “Natural” HINT Forcefield: Predictive Tools for Characterizing the Roles of Water in Biomolecules. J. Mol. Biol. 358, 289–309. doi: 10.1016/j.jmb.2006.01.053
Brozell, S. R., Mukherjee, S., Balius, T. E., Roe, D. R., Case, D. A., and Rizzo, R. C. (2012). Evaluation of DOCK 6 as a pose generation and database enrichment tool. J. Comput. Aided Mol. Des. 26, 749–773. doi: 10.1007/s10822-012-9565-y
Checa, A., Ortiz, A. R., de Pascual-Teresa, B., and Gago, F. (1997). Assessment of solvation effects on calculated binding affinity differences: trypsin inhibition by flavonoids as a model system for congeneric series. J. Med. Chem. 40, 4136–4145. doi: 10.1021/jm970245v
Ding, J., Fraser, M. E., Meyer, J. H., Bartlett, P. A., and James, M. N. G. (1998). Macrocyclic inhibitors of penicillopepsin. II. X-ray crystallographic analyses of penicillopepsin complexed with a P3-P1 macrocyclic peptidyl inhibitor and with its two acyclic analogues. J. Am. Chem. Soc. 120, 4610–4621. doi: 10.1021/ja973714r
Enyedy, I. J., Lee, S.-L., Kuo, A. H., Dickson, R. B., Lin, C.-Y., and Wang, S. (2001a). Structure-based approach for the discovery of bis-benzamidines as novel inhibitors of matriptase. J. Med. Chem. 44, 1349–1355. doi: 10.1021/jm000395x
Enyedy, I. J., Ling, Y., Nacro, K., Tomita, Y., Wu, X., Cao, Y., et al. (2001b). Discovery of small-molecule inhibitors of Bcl-2 through structure-based computer screening. J. Med. Chem. 44, 4313–4324. doi: 10.1021/jm010016f
Ferrari, A. M., Degliesposti, G., Sgobba, M., and Rastelli, G. (2007). Validation of an automated procedure for the prediction of relative free energies of binding on a set of aldose reductase inhibitors. Bioorg. Med. Chem. 15, 7865–7877. doi: 10.1016/j.bmc.2007.08.019
Ferri, N., Bernini, S. K., Corsini, A., Clerici, F., Erba, E., Stragliotto, S., et al. (2013a). 3-Aryl-N-aminoylsulfonylphenyl-1H-pyrazole-5-carboxamides: a new class of selective Rac inhibitors. MedChemComm. 4, 537–541. doi: 10.1039/C2MD20328F
Ferri, N., Contini, A., Bernini, S. K., and Corsini, A. (2013b). Role of small GTPase protein Rac1 in Cardiovascular diseases: development of new selective pharmacological inhibitors. J. Cardiovasc. Pharmacol. 62, 425–435. doi: 10.1097/FJC.0b013e3182a18bcc
Gao, Y., Dickerson, J. B., Guo, F., Zheng, J., and Zheng, Y. (2004). Rational design and characterization of a Rac GTPase-specific small molecule inhibitor. Proc. Natl. Acad. Sci. U.S.A. 101, 7618–7623. doi: 10.1073/pnas.0307512101
García-Sosa, A. T. (2013). Hydration properties of ligands and drugs in protein binding sites: tightly-bound, bridging water molecules and their effects and consequences on molecular design strategies. J. Chem. Inf. Model. 53, 1388–1405. doi: 10.1021/ci3005786
García-Sosa, A. T., and Mancera, R. L. (2010). Free energy calculations of mutations involving a tightly bound water molecule and ligand substitutions in a ligand-protein complex. Mol. Inform. 29, 589–600. doi: 10.1002/minf.201000007
García-Sosa, A. T., Mancera, R. L., and Dean, P. M. (2003). WaterScore: a novel method for distinguishing between bound and displaceable water molecules in the crystal structure of the binding site of protein-ligand complexes. J. Mol. Model. 9, 172–182. doi: 10.1007/s00894-003-0129-x
Genheden, S., Mikulskis, P., Hu, L., Kongsted, J., Söderhjelm, P., and Ryde, U. (2011). Accurate predictions of nonpolar solvation free energies require explicit consideration of binding-site hydration. J. Am. Chem. Soc. 133, 13081–13092. doi: 10.1021/ja202972m
Götz, A. W., Williamson, M. J., Xu, D., Poole, D., Le Grand, S., and Walker, R. C. (2012). Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. generalized born. J. Chem. Theory Comput. 8, 1542–1555. doi: 10.1021/ct200909j
Greenidge, P. A., Kramer, C., Mozziconacci, J.-C., and Wolf, R. M. (2013). MM/GBSA binding energy prediction on the PDBbind data set: successes, failures, and directions for further improvement. J. Chem. Inf. Model. 53, 201–209. doi: 10.1021/ci300425v
Hendlich, M., Bergner, A., Günther, J., and Klebe, G. (2003). Relibase: Design and development of a database for comprehensive analysis of protein-ligand interactions. J. Mol. Biol. 326, 607–620. doi: 10.1016/S0022-2836(02)01408-0
Hernández, E., Dharmawardhane, S., De La Mota-Peynado, A., and Vlaar, C. (2010). Novel inhibitors of Rac1 in metastatic breast cancer. P. R. Health Sci. J. 29, 348–356. Available online at: http://prhsj.rcm.upr.edu/index.php/prhsj/article/view/451
Hou, T., Wang, J., Li, Y., and Wang, W. (2011a). Assessing the performance of the MM/PBSA and MM/GBSA methods. 1. The accuracy of binding free energy calculations based on molecular dynamics simulations. J. Chem. Inf. Model. 51, 69–82. doi: 10.1021/ci100275a
Hou, T., Wang, J., Li, Y., and Wang, W. (2011b). Assessing the performance of the molecular mechanics/Poisson Boltzmann surface area and molecular mechanics/generalized Born surface area methods. II. The accuracy of ranking poses generated from docking. J. Comput. Chem. 32, 866–877. doi: 10.1002/jcc.21666
Jakalian, A., Jack, D. B., and Bayly, C. I. (2002). Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. J. Comput. Chem. 23, 1623–1641. doi: 10.1002/jcc.10128
James, M. N. G., Sielecki, A. R., and Moult, J. (1992). Crystallographic analysis of transition state mimics bound to penicillopepsin: difluorostatine-and difluorostatone-containing peptides. Biochemistry 31, 3872–3886. doi: 10.1021/bi00130a019
Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W., and Klein, M. L. (1983). Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935. doi: 10.1063/1.445869
Kannan, S., Pradhan, M. R., Cherian, J., Joseph, T. L., Poh, Z. Y., Hai Yan, Y., et al. (2017). Small Molecules Targeting the Inactive Form of the Mnk1/2 Kinases. ACS Omega 2, 7881–7891. doi: 10.1021/acsomega.7b01403
Knight, J. L., Krilov, G., Borrelli, K. W., Williams, J., Gunn, J. R., Clowes, A., et al. (2014). Leveraging data fusion strategies in multireceptor lead optimization MM/GBSA End-point methods. J. Chem. Theory Comput. 10, 3207–3220. doi: 10.1021/ct500189s
Kollman, P. A., Massova, I., Reyes, C., Kuhn, B., Huo, S., Chong, L., et al. (2000). Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc. Chem. Res. 33, 889–897. doi: 10.1021/ar000033j
Korb, O., Stützle, T., and Exner, T. E. (2006). “PLANTS: application of ant colony optimization to structure-based drug design,” in Ant Colony Optimization and Swarm Intelligence. ANTS 2006. Lecture Notes in Computer Science, Vol. 4150, eds M. Dorigo, L. M. Gambardella, M. Birattari, A. Martinoli, R. Poli, and T. Stützle (Berlin; Heidelberg: Springer), 247–258.
Le Grand, S., Götz, A. W., and Walker, R. C. (2013). SPFP: speed without compromise—A mixed precision model for GPU accelerated molecular dynamics simulations. Comput. Phys. Commun. 184, 374–380. doi: 10.1016/j.cpc.2012.09.022
Lessene, G., Czabotar, P. E., Sleebs, B. E., Zobel, K., Lowes, K. N., Adams, J. M., et al. (2013). Structure-guided design of a selective BCL-XL inhibitor. Nat. Chem. Biol. 9, 390–397. doi: 10.1038/nchembio.1246
Liebeschuetz, J. W., Cole, J. C., and Korb, O. (2012). Pose prediction and virtual screening performance of GOLD scoring functions in a standardized test. J. Comput. Aided Mol. Des. 26, 737–748. doi: 10.1007/s10822-012-9551-4
Lyne, P. D., Lamb, M. L., and Saeh, J. C. (2006). Accurate prediction of the relative potencies of members of a series of kinase inhibitors using molecular docking and MM-GBSA scoring. J. Med. Chem. 49, 4805–4808. doi: 10.1021/jm060522a
Maffucci, I., and Contini, A. (2013). Explicit ligand hydration shells improve the correlation between MM-PB/GBSA binding energies and experimental activities. J. Chem. Theory Comput. 9, 2706–2717. doi: 10.1021/ct400045d
Maffucci, I., and Contini, A. (2015). “Tuning the solvation term in the MM-PBSA/GBSA binding affinity predictions,” in Frontiers in Computational Chemistry, eds J. D. Madura and Z. Ul-Haq (Elsevier Inc.), 82–120.
Maier, J. A., Martinez, C., Kasavajhala, K., Wickstrom, L., Hauser, K. E., and Simmerling, C. (2015). ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput. 11, 3696–3713. doi: 10.1021/acs.jctc.5b00255
Massova, I., and Kollman, P. A. (2000). Combined molecular mechanical and continuum solvent approach (MM-PBSA/GBSA) to predict ligand binding. Perspect. Drug Discov. Des. 18, 113–135. doi: 10.1023/A:1008763014207
Mikulskis, P., Genheden, S., and Ryde, U. (2014). Effect of explicit water molecules on ligand-binding affinities calculated with the MM/GBSA approach. J. Mol. Model. 20:2273. doi: 10.1007/s00894-014-2273-x
Miller, B. R. I., Mcgee, T. D., Swails, J. M., Homeyer, N., Gohlke, H., and Roitberg, A. E. (2012). MMPBSA. py: an efficient program for end-state free energy calculations. J. Chem. Theory Comput. 8, 3314–3321. doi: 10.1021/ct300418h
Moitessier, N., Englebienne, P., Lee, D., Lawandi, J., and Corbeil, C. R. (2008). Towards the development of universal, fast and highly accurate docking/scoring methods: a long way to go. Br. J. Pharmacol. 153(Suppl.), S7–S26. doi: 10.1038/sj.bjp.0707515
Montalvo-Ortiz, B. L., Castillo-Pichardo, L., Hernández, E., Humphries-Bickley, T., De la Mota-Peynado, A., Cubano, L. A., et al. (2012). Characterization of EHop-016, novel small molecule inhibitor of Rac GTPase. J. Biol. Chem. 287, 13228–13238. doi: 10.1074/jbc.M111.334524
Murphy, R. B., Repasky, M. P., Greenwood, J. R., Tubert-Brohman, I., Jerome, S., Annabhimoju, R., et al. (2016). WScore: a flexible and accurate treatment of explicit water molecules in ligand–receptor docking. J. Med. Chem. 59, 4364–4384. doi: 10.1021/acs.jmedchem.6b00131
Mysinger, M. M., Carchia, M., Irwin, J. J., and Shoichet, B. K. (2012). Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem. 55, 6582–6594. doi: 10.1021/jm300687e
Neves, M. A. C., Totrov, M., and Abagyan, R. (2012). Docking and scoring with ICM: the benchmarking results and strategies for improvement. J. Comput. Aided Mol. Des. 26, 675–686. doi: 10.1007/s10822-012-9547-0
Novikov, F. N., Stroylov, V. S., Zeifman, A. A., Stroganov, O. V., Kulkov, V., and Chilov, G. G. (2012). Lead Finder docking and virtual screening evaluation with Astex and DUD test sets. J. Comput. Aided Mol. Des. 26, 725–735. doi: 10.1007/s10822-012-9549-y
Oehme, D. P., Brownlee, R. T. C., and Wilson, D. J. D. (2012). Effect of atomic charge, solvation, entropy, and ligand protonation state on MM-PB(GB)SA binding energies of HIV protease. J. Comput. Chem. 33, 2566–2580. doi: 10.1002/jcc.23095
Olajuyigbe, F. M., Demitri, N., and Geremia, S. (2011). Investigation of 2-fold disorder of inhibitors and relative potency by crystallizations of HIV-1 protease in ritonavir and saquinavir mixtures. Cryst. Growth Des. 11, 4378–4385. doi: 10.1021/cg200514z
Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., et al. (2004). UCSF Chimera—A visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612. doi: 10.1002/jcc.20084
Raha, K., Peters, M. B., Wang, B., Yu, N., Wollacott, A. M., Westerhoff, L. M., et al. (2007). The role of quantum mechanics in structure-based drug design. Drug Discov. Today 12, 725–731. doi: 10.1016/j.drudis.2007.07.006
Rahimi, R., Mahdavi, M., Pejman, S., Zare, P., and Balalaei, S. (2015). Inhibition of cell proliferation and induction of apoptosis in K562 human leukemia cells by the derivative (3-NpC) from dihydro-pyranochromenes family. Acta Biochim. Pol. 62, 83–88. doi: 10.18388/abp.2014_825
Raymer, M. L., Sanschagrin, P. C., Punch, W. F., Venkataraman, S., Goodman, E. D., and Kuhn, L. A. (1997). Predicting conserved water-mediated and polar ligand interactions in proteins using a K-nearest-neighbors genetic algorithm. J. Mol. Biol. 265, 445–464. doi: 10.1006/jmbi.1996.0746
Repasky, M. P., Murphy, R. B., Banks, J. L., Greenwood, J. R., Tubert-Brohman, I., Bhat, S., et al. (2012). Docking performance of the glide program as evaluated on the Astex and DUD datasets: a complete set of glide SP results and selected results for a new scoring function integrating WaterMap and glide. J. Comput. Aided Mol. Des. 26, 787–799. doi: 10.1007/s10822-012-9575-9
Ricchiuto, P., Rocco, A. G., Gianazza, E., Corrada, D., Beringhelli, T., and Eberini, I. (2008). Structural and dynamic roles of permanent water molecules in ligand molecular recognition by chicken liver bile acid binding protein. J. Mol. Recognit. 21, 348–354. doi: 10.1002/jmr.908
Ruffoni, A., Ferri, N., Bernini, S. K., Ricci, C., Corsini, A., Maffucci, I., et al. (2014). 2-Amino-3-(phenylsulfanyl)norbornane-2-carboxylate: an appealing scaffold for the Design of Rac1–Tiam1 protein–protein interaction inhibitors. J. Med. Chem. 57, 2953–2962. doi: 10.1021/jm401924s
Ryckaert, J.-P., Ciccotti, G., and Berendsen, H. J. C. (1977). Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 23, 327–341. doi: 10.1016/0021-9991(77)90098-5
Salomon-Ferrer, R., Götz, A. W., Poole, D., Le Grand, S., and Walker, R. C. (2013). Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh ewald. J. Chem. Theory Comput. 9, 3878–3888. doi: 10.1021/ct400314y
Schiffer, C., and Hermans, J. (2003). “Promise of advances in simulation methods for protein crystallography: implicit solvent models, time- averaging refinement, and quantum mechanical modeling,” in Methods in Enzymology, eds C. W. Carter Jr, and R. M. Sweet (New York, NY: Elsevier Inc), 412–461.
Schneider, N., Hindle, S., Lange, G., Klein, R., Albrecht, J., Briem, H., et al. (2012). Substantial improvements in large-scale redocking and screening using the novel HYDE scoring function. J. Comput. Aided Mol. Des. 26, 701–723. doi: 10.1007/s10822-011-9531-0
Scott, D. E., Bayly, A. R., Abell, C., and Skidmore, J. (2016). Small molecules, big targets: drug discovery faces the protein–protein interaction challenge. Nat. Rev. Drug Discov. 15, 533–550. doi: 10.1038/nrd.2016.29
Shang, X., Marchioni, F., Sipes, N., Evelyn, C. R., Jerabek-Willemsen, M., Duhr, S., et al. (2012). Rational design of small molecule inhibitors targeting RhoA subfamily Rho GTPases. Chem. Biol. 19, 699–710. doi: 10.1016/j.chembiol.2012.05.009
Shen, C. H., Wang, Y. F., Kovalevsky, A. Y., Harrison, R. W., and Weber, I. T. (2010). Amprenavir complexes with HIV-1 protease and its drug-resistant mutants altering hydrophobic clusters. FEBS J. 277, 3699–3714. doi: 10.1111/j.1742-4658.2010.07771.x
Sommer, K., Friedrich, N.-O., Bietz, S., Hilbig, M., Inhester, T., and Rarey, M. (2016). UNICON: a powerful and easy-to-use compound library converter. J. Chem. Inf. Model. 56, 1105–1111. doi: 10.1021/acs.jcim.6b00069
Sousa, S. F., Ribeiro, A. J. M., Coimbra, J. T. S., Neves, R. P., Martins, S. A., Moorthy, N. S. H. N., et al. (2013). Protein-ligand docking in the new millennium–a retrospective of 10 years in the field. Curr. Med. Chem. 20, 2296–2314. doi: 10.2174/0929867311320180002
Staker, B. L., Hjerrild, K., Feese, M. D., Behnke, C. A., Burgin, A. B., and Stewart, L. (2002). The mechanism of topoisomerase I poisoning by a camptothecin analog. Proc. Natl. Acad. Sci. U.S.A. 99, 15387–15392. doi: 10.1073/pnas.242259599
Stewart, J. J. P. (2016). Stewart Computational Chemistry, MOPAC2016. Colorado Springs. Available online at: http://OpenMOPAC.net
Sun, H., Li, Y., Tian, S., Xu, L., and Hou, T. (2014). Assessing the performance of MM/PBSA and MM/GBSA methods. 4. Accuracies of MM/PBSA and MM/GBSA methodologies evaluated by various simulation protocols using PDBbind data set. Phys. Chem. Chem. Phys. 16, 16719–16729. doi: 10.1039/C4CP01388C
Surviladze, Z., Waller, A., Strouse, J. J., Bologa, C., Ursu, O., Salas, V., et al. (2010). “A potent and selective inhibitor of Cdc42 GTPase,” in Probe Reports from the NIH Molecular Libraries Program [Internet] (Bethesda, MD: National Center for Biotechnology Information US), Available online at: https://www.ncbi.nlm.nih.gov/books/NBK51965/
Tirado-Rives, J., and Jorgensen, W. L. (2006). Contribution of conformer focusing to the uncertainty in predicting free energies for protein-ligand binding. J. Med. Chem. 49, 5880–5884. doi: 10.1021/jm060763i
Usher, K. C., Blaszczak, L. C., Weston, G. S., Shoichet, B. K., and Remington, S. J. (1998). Three-dimensional structure of AmpC β-lactamase from Escherichia coli bound to a transition-state analogue: possible implications for the oxyanion hypothesis and for inhibitor design. Biochemistry 37, 16082–16092. doi: 10.1021/bi981210f
Vangrevelinghe, E., Zimmermann, K., Schoepfer, J., Portmann, R., Fabbro, D., and Furet, P. (2003). Discovery of a potent and selective protein kinase CK2 inhibitor by high-throughput docking. J. Med. Chem. 46, 2656–2662. doi: 10.1021/jm030827e
Wang, J., Morin, P., Wang, W., and Kollman, P. A. (2001). Use of MM-PBSA in reproducing the binding free energies to HIV-1 RT of TIBO derivatives and predicting the binding mode to HIV-1 RT of efavirenz by docking and MM-PBSA. J. Am. Chem. Soc. 123, 5221–5230. doi: 10.1021/ja003834q
Wang, J., Wang, W., Kollman, P. A., and Case, D. A. (2006). Automatic atom type and bond type perception in molecular mechanical calculations. J. Mol. Graph. Model. 25, 247–260. doi: 10.1016/j.jmgm.2005.12.005
Warren, G. L., Andrews, C. V., Capelli, A.-M., Clarke, B., LaLonde, J., Lambert, M. H., et al. (2006). A critical assessment of docking programs and scoring functions. J. Med. Chem. 49, 5912–5931. doi: 10.1021/jm050362n
Weis, A., Katebzadeh, K., Söderhjelm, P., Nilsson, I., and Ryde, U. (2006). Ligand affinities predicted with the MM/PBSA method: dependence on the simulation method and the force field. J. Med. Chem. 49, 6596–6606. doi: 10.1021/jm0608210
Wong, S., Amaro, R. E., and McCammon, J. A. (2009). MM-PBSA captures key role of intercalating water molecules at a protein-protein interface. J. Chem. Theory Comput. 5, 422–429. doi: 10.1021/ct8003707
Xiong, Y., Li, Y., He, H., and Zhan, C.-G. (2007). Theoretical calculation of the binding free energies for pyruvate dehydrogenase E1 binding with ligands. Bioorg. Med. Chem. Lett. 17, 5186–5190. doi: 10.1016/j.bmcl.2007.06.095
Xu, D. (2012). Autodock2MMGBSA, A Multi-Level Virtual Screening Rescoring and Refinement Scheme that Combines Consensus Scoring, Simulated Annealing and MM-GBSA Binding Free Energy Methods. American Chemical Society (Washington, DC), NORM-153.
Xu, D., Sawaya, N., McCammon, J. A., and Li, W. W. (2010). Autodock2MMGBSA, A Multi-Level Virtual Screening Rescoring and Refinement Scheme that Combines Consensus Scoring and MM-GBSA Binding Free Energy Methods. American Chemical Society (Washington, DC), COMP-78.
Xu, L., Sun, H., Li, Y., Wang, J., and Hou, T. (2013). Assessing the performance of MM/PBSA and MM/GBSA methods. 3. The impact of force fields and ligand charge models. J. Phys. Chem. B 117, 8408–8421. doi: 10.1021/jp404160y
Yang, Y., Lightstone, F. C., and Wong, S. E. (2013). Approaches to efficiently estimate solvation and explicit water energetics in ligand binding: the use of WaterMap. Expert Opin. Drug Discov. 8, 277–287. doi: 10.1517/17460441.2013.749853
Young, T., Abel, R., Kim, B., Berne, B. J., and Friesner, R. A. (2007). Motifs for molecular recognition exploiting hydrophobic enclosure in protein-ligand binding. Proc. Natl. Acad. Sci. U.S.A. 104, 808–813. doi: 10.1073/pnas.0610202104
Zhang, X., Wong, S. E., and Lightstone, F. C. (2014). Toward fully automated high performance computing drug discovery: a massively parallel virtual screening pipeline for docking and molecular mechanics/generalized born surface area rescoring to improve enrichment. J. Chem. Inf. Model. 54, 324–337. doi: 10.1021/ci4005145
Zhou, Z., Bates, M., and Madura, J. D. (2006). Structure modeling, ligand binding, and binding affinity calculation (LR-MM-PBSA) of human heparanase for inhibition and drug design. Proteins 65, 580–592. doi: 10.1002/prot.21065
Keywords: MM-GBSA, explicit water, molecular dynamics, GPU, structure based virtual screening, protease, protein-protein interactions
Citation: Maffucci I, Hu X, Fumagalli V and Contini A (2018) An Efficient Implementation of the Nwat-MMGBSA Method to Rescore Docking Results in Medium-Throughput Virtual Screenings. Front. Chem. 6:43. doi: 10.3389/fchem.2018.00043
Received: 25 October 2017; Accepted: 19 February 2018;
Published: 05 March 2018.
Edited by:Daniela Schuster, Paracelsus Private Medical University of Salzburg, Austria
Reviewed by:Alfonso T. Garcia-Sosa, University of Tartu, Estonia
Pramod C. Nair, Flinders University, Australia
Copyright © 2018 Maffucci, Hu, Fumagalli and Contini. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Alessandro Contini, email@example.com