Discovery of a cryptic pocket in the AI-predicted structure of PPM1D phosphatase explains the binding site and potency of its allosteric inhibitors

Virtual screening is a widely used tool for drug discovery, but its predictive power can vary dramatically depending on how much structural data is available. In the best case, crystal structures of a ligand-bound protein can help find more potent ligands. However, virtual screens tend to be less predictive when only ligand-free crystal structures are available, and even less predictive if a homology model or other predicted structure must be used. Here, we explore the possibility that this situation can be improved by better accounting for protein dynamics, as simulations started from a single structure have a reasonable chance of sampling nearby structures that are more compatible with ligand binding. As a specific example, we consider the cancer drug target PPM1D/Wip1 phosphatase, a protein that lacks crystal structures. High-throughput screens have led to the discovery of several allosteric inhibitors of PPM1D, but their binding mode remains unknown. To enable further drug discovery efforts, we assessed the predictive power of an AlphaFold-predicted structure of PPM1D and a Markov state model (MSM) built from molecular dynamics simulations initiated from that structure. Our simulations reveal a cryptic pocket at the interface between two important structural elements, the flap and hinge regions. Using deep learning to predict the pose quality of each docked compound for the active site and cryptic pocket suggests that the inhibitors strongly prefer binding to the cryptic pocket, consistent with their allosteric effect. The predicted affinities for the dynamically uncovered cryptic pocket also recapitulate the relative potencies of the compounds (τb = 0.70) better than the predicted affinities for the static AlphaFold-predicted structure (τb = 0.42). 
Taken together, these results suggest that targeting the cryptic pocket is a good strategy for drugging PPM1D and, more generally, that conformations selected from simulation can improve virtual screening when limited structural data is available.
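
The τb values quoted above are Kendall rank correlations with a correction for ties (tau-b). The following is a minimal pure-Python sketch of that statistic. The function name and the example numbers are made up to show the calculation and are not data from this study. In practice, `scipy.stats.kendalltau` (whose default variant is tau-b) computes the same quantity.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (tie-corrected)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    untied_x = untied_y = 0  # number of pairs NOT tied in x / NOT tied in y
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx != 0:
            untied_x += 1
        if dy != 0:
            untied_y += 1
        if dx * dy > 0:
            concordant += 1  # both orderings agree
        elif dx * dy < 0:
            discordant += 1  # orderings disagree
        # pairs tied in x or y count as neither concordant nor discordant
    denom = sqrt(untied_x * untied_y)
    if denom == 0:
        raise ValueError("tau-b is undefined when either input is constant")
    return (concordant - discordant) / denom


# Hypothetical measured vs. predicted pKi for four compounds:
# 5 concordant pairs, 1 discordant pair -> (5 - 1) / 6
print(kendall_tau_b([8.1, 7.4, 6.9, 5.0], [7.2, 7.0, 6.1, 6.3]))
```

A value of 1 means the predictions rank every compound in the same order as experiment, and 0 means no rank agreement.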


Fig S2:
P2Rank predictions for the AlphaFold Wip1 structure reveal no high-scoring pockets at the flap or hinge. Spheres represent pocket grid points and are colored by their probability of being a ligand-binding site (red is higher, blue is lower). Only pockets 1 and 2 are predicted to bind ligands. Pocket 1 is found at the active site, while pocket 2 has not been implicated as a potential binding site in previous work. The pocket involving the flap domain (pocket 4) is low-scoring, and there are no high-scoring pockets at the hinge.

Fig S3:
LIGSITE pocket volumes calculated across the MSM at proposed binding sites show that cryptic pockets form at both the photolabeling site and the hinge. At both sites, the pocket volume in the AF structure (red arrow) is much smaller than the maximum sampled in the simulations. We calculated LIGSITE pocket volumes for all structures (see Methods) and restricted them to pocket elements within 5 Angstroms of select residues. The photolabeling site was defined as residues 219 and 236, while the hinge was defined as residues 155 to 166.
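
The residue-based restriction described above can be sketched as follows. This assumes that pocket grid points and protein atom coordinates are already available as Cartesian tuples in Angstroms, and that each grid point represents a cube of side `spacing`. The helper names and the 1 Angstrom default spacing are hypothetical and are not the LIGSITE implementation or the settings used in Methods.

```python
from math import dist

# Residue selections from the caption (range(155, 167) covers 155-166 inclusive).
PHOTOLABELING_SITE = {219, 236}
HINGE = set(range(155, 167))


def select_atoms(atoms, residues):
    """Coordinates of atoms whose residue number is in `residues`.

    `atoms` is assumed to be a list of (residue_number, (x, y, z)) tuples.
    """
    return [xyz for resid, xyz in atoms if resid in residues]


def restricted_pocket_volume(pocket_points, site_atoms, cutoff=5.0, spacing=1.0):
    """Volume of pocket grid points within `cutoff` Angstroms of any site atom.

    Each kept grid point is treated as a cube of side `spacing` (hypothetical
    default), so the volume is (number of kept points) * spacing**3.
    """
    kept = [p for p in pocket_points
            if any(dist(p, a) <= cutoff for a in site_atoms)]
    return len(kept) * spacing ** 3


# Toy example: one atom of residue 219 near the origin, one distant atom.
atoms = [(219, (0, 0, 1)), (100, (50, 0, 0))]
points = [(0, 0, 0), (3, 0, 0), (10, 0, 0)]  # the last point is > 5 A away
print(restricted_pocket_volume(points, select_atoms(atoms, PHOTOLABELING_SITE)))
```

Repeating this for every MSM state gives one restricted volume per structure. These volumes can then be compared with the value for the AF starting structure.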

Fig S4:
The flap domain approaches residues 328-362 in simulation despite being separated from them by over 10 Angstroms in the AlphaFold structure. A) The equilibrium probability-weighted distribution of the minimum heavy-atom distance between the flap domain and residues 328-362 shows that the flap domain can extend to this region. B) Wip1 structure with the smallest heavy-atom distance between the flap and residues 328-362. Val241 from the flap is in proximity to Trp358 in this structure.
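
The equilibrium-weighted distribution in panel A can be sketched as two steps. First, compute a minimum heavy-atom distance for each MSM state. Second, histogram these distances using each state's equilibrium probability as its weight. The coordinate format and function names below are assumptions made for illustration only.

```python
from math import dist


def min_heavy_atom_distance(group_a, group_b):
    """Smallest distance between any atom in group_a and any atom in group_b."""
    return min(dist(a, b) for a in group_a for b in group_b)


def weighted_histogram(values, weights, bin_edges):
    """Histogram in which each value contributes its weight (e.g. an MSM
    equilibrium probability) instead of a count. Values outside the bin
    edges are dropped."""
    counts = [0.0] * (len(bin_edges) - 1)
    for v, w in zip(values, weights):
        for i in range(len(counts)):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                counts[i] += w
                break
    return counts


# Toy example: three states with distances 2, 6 and 12 Angstroms.
print(weighted_histogram([2.0, 6.0, 12.0], [0.5, 0.3, 0.2], [0, 5, 10, 15]))
```

Because the weights sum to 1, the resulting bars are equilibrium probabilities rather than raw frame counts.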

Fig S5:
The β-strands in the flap domain remain stable throughout the simulations, as predicted by AlphaFold. We calculated the equilibrium distribution of secondary structure elements for each residue using dictionary of protein secondary structure (DSSP) assignments. Except for the end of the β-strand spanning residues 263-269, each β-strand has an equilibrium probability near 1.0 of being in a non-coil assignment (black line). The purple line shows the assignments for the AF-predicted starting structure.
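
The per-residue equilibrium probability of a non-coil assignment (black line) can be computed as a population-weighted sum. This sketch assumes simplified three-letter DSSP codes (H, E, C) with "C" marking coil and one assignment string per MSM state. Both the encoding and the function name are illustrative assumptions.

```python
def noncoil_probability(assignments, populations, coil_code="C"):
    """Per-residue equilibrium probability of a non-coil DSSP assignment.

    assignments: one string per MSM state, one DSSP character per residue
                 (assumed simplified H/E/C codes).
    populations: equilibrium probability of each state (assumed to sum to 1).
    """
    n_residues = len(assignments[0])
    prob = [0.0] * n_residues
    for codes, pi in zip(assignments, populations):
        for i, code in enumerate(codes):
            if code != coil_code:
                prob[i] += pi  # this state contributes its population
    return prob


# Two hypothetical states over a 3-residue stretch.
print(noncoil_probability(["EEC", "ECC"], [0.75, 0.25]))
```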

Fig S6:
The Wip1 AlphaFold structure, colored by the maximum residue-level increase in P2Rank ligand-binding probability across the simulations (red is higher, blue is lower), shows that the flap domain is a hotspot for increases in ligand-binding probability.
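
One straightforward way to obtain the per-residue quantities in Figs S6 to S8 is to take, for each residue, the maximum over MSM states. The coloring is then that maximum minus the AF value, and residues can be flagged when the maximum crosses the 0.6 threshold used in the captions. This sketch assumes the residue-level P2Rank probabilities are already tabulated, one row per state. The function names are hypothetical.

```python
def per_residue_max(state_probs):
    """Maximum ligand-binding probability reached by each residue over all states.

    state_probs: list of rows, one per MSM state, one probability per residue.
    """
    return [max(column) for column in zip(*state_probs)]


def max_increase(state_probs, af_probs):
    """Largest increase over the AF structure per residue (negative if a
    residue never exceeds its AF value in any state)."""
    return [m - af for m, af in zip(per_residue_max(state_probs), af_probs)]


def residues_exceeding(state_probs, threshold=0.6):
    """Indices of residues whose probability exceeds `threshold` in some state."""
    return [i for i, m in enumerate(per_residue_max(state_probs)) if m > threshold]


states = [[0.1, 0.7], [0.3, 0.2]]  # 2 hypothetical states x 2 residues
print(max_increase(states, [0.2, 0.1]))  # roughly [0.1, 0.6]
print(residues_exceeding(states))        # [1]
```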

Fig S7:
The flap domain loop spanning residues 276-290 is a hotspot for increases in the P2Rank ligand-binding probability. Box-and-whisker plots show the distribution of residue-level ligand-binding probabilities across the Wip1 MSM cluster center structures. The red dot indicates the predicted ligand-binding probability for the AF structure. Nearly all residues in this region exceed 0.6 at some point in the simulation.

Fig S8:
The hinge domain does not achieve high P2Rank ligand-binding probabilities at any point in the simulation. Box-and-whisker plots show the distribution of residue-level ligand-binding probabilities across the Wip1 MSM cluster center structures. The red dot indicates the predicted ligand-binding probability for the AF structure. Nearly all residues in this region never exceed 0.6 at any point in the simulation.

Fig S9:
The photolabeling sites (P219 and M236) do not achieve high P2Rank ligand-binding probabilities at any point in the simulation. Box-and-whisker plots show the distribution of residue-level ligand-binding probabilities across the Wip1 MSM cluster center structures. The red dot indicates the predicted ligand-binding probability for the AF structure.

Fig S10:
An overlay of docked poses that receive poseNet predictions greater than 0.5 shows that high-quality poses cluster between the flap and hinge. The AlphaFold Wip1 structure is depicted; poses may clash with this structure because the compounds were docked to other structures in the Markov state model.

Fig S11:
Equilibrium-weighted contact probabilities implicate the flap as the main site of GSK compound binding. Given multiple poses for each structure in the Wip1 MSM, we selected the pose with the highest poseNet prediction as the most likely binding mode for that state. We then designated any residue within 4 Angstroms of a ligand heavy atom as making contact with the docked ligand. B) Inset showing a zoomed-in side view of the flap and hinge. Residues D277, W279, and V290 were most likely to contact the best pose in each state.
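
The procedure above can be written as a short sketch. For each state, it keeps the highest-scoring pose, finds protein residues with any heavy atom within 4 Angstroms of a ligand heavy atom, and adds that state's equilibrium probability to each contacted residue. The per-state data layout (a dict with hypothetical `"protein"` and `"poses"` keys) is an assumption for illustration only.

```python
from collections import defaultdict
from math import dist


def contact_probabilities(states, populations, cutoff=4.0):
    """Equilibrium-weighted probability that each residue contacts the best pose.

    states: one dict per MSM state (hypothetical layout) with
        "protein": list of (residue_number, (x, y, z)) heavy atoms
        "poses":   list of (pose_score, [ligand heavy-atom (x, y, z), ...])
    populations: equilibrium probability of each state.
    """
    prob = defaultdict(float)
    for state, pi in zip(states, populations):
        # Most likely binding mode for this state = highest-scoring pose.
        _, ligand = max(state["poses"], key=lambda pose: pose[0])
        contacts = {resid for resid, xyz in state["protein"]
                    if any(dist(xyz, atom) <= cutoff for atom in ligand)}
        for resid in contacts:  # each residue counted once per state
            prob[resid] += pi
    return dict(prob)


states = [
    {"protein": [(277, (0, 0, 0)), (290, (10, 0, 0))],
     "poses": [(0.2, [(9, 0, 0)]), (0.9, [(1, 0, 0)])]},  # best pose is near 277
    {"protein": [(277, (0, 0, 0)), (279, (0, 3, 0))],
     "poses": [(0.6, [(0, 2, 0)])]},
]
print(contact_probabilities(states, [0.75, 0.25]))  # {277: 1.0, 279: 0.25}
```

Using only the best pose per state keeps low-quality poses from adding spurious contacts, such as residue 290 in the toy first state.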

Fig S12:
The distributions of AtomNet® PoseRanker scores per binding site show that poses at the flap-hinge interface receive higher scores. Scores below 0.1 are not shown to better visualize higher scores.

Fig S16:
A single carbonyl difference in chemical structure qualitatively improves pose quality and increases polar contacts between protein and ligand. The top row shows high-scoring poses for a highly potent compound (cyan sticks), while the bottom row shows high-scoring poses for a compound lacking activity (magenta sticks). Dotted yellow lines indicate predicted polar interactions. In MSM state A, the carbonyl is expected to form a hydrogen bond with the protein backbone. For this comparison, we selected MSM states with high PoseRanker pose quality scores for both compounds but higher pKi predictions for the highly potent compound. Residues that regularly contact the carbonyl (Y281-F284) are highlighted in red.

Fig S17:
The more potent compound buries more of the solvent-exposed protein surface. Two MSM cluster center structures (one per column) are shown with docked poses for a highly potent compound (cyan sticks) and an inactive compound differing by a single carbonyl (magenta sticks). These states were selected because of their large differences in predicted pKi (0.5 and 1.2).

Fig S13:
MSM-weighted PoseRanker predictions per binding site and compound show that the flap-hinge interface is the highest-scoring binding site for all but two compounds. The lowest-scoring compound is compound 3, which has no activity.
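
One plausible reading of "MSM-weighted" is a population-weighted average of per-state pose-quality scores, computed separately for each binding site. Under that reading, the site with the largest average is taken as the preferred site for a compound. The sketch below assumes that one score per state and site (for example, the best pose in that state) is already available. The function names, site labels, and numbers are hypothetical.

```python
def msm_weighted_score(state_scores, populations):
    """Population-weighted average of per-state pose-quality scores."""
    return sum(score * pi for score, pi in zip(state_scores, populations))


def best_site(site_scores, populations):
    """Return (highest-scoring site, {site: weighted score}) for one compound.

    site_scores: {site_name: [score in each MSM state]} (hypothetical layout).
    """
    weighted = {site: msm_weighted_score(scores, populations)
                for site, scores in site_scores.items()}
    return max(weighted, key=weighted.get), weighted


site, weighted = best_site(
    {"active site": [0.2, 0.4, 0.4], "flap-hinge": [0.8, 0.2, 0.6]},
    [0.5, 0.25, 0.25],
)
print(site, weighted)  # flap-hinge wins (about 0.6 vs 0.3)
```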

Fig S14:
A histogram of τb scores for each MSM cluster center shows that the ability to rank compounds is sensitive to the structure used for docking.

Fig S15:
The implied timescales plot (logarithmic scale) shows convergence at a lag time of around 2 ns. Each line represents one of the 5 slowest relaxation timescales derived from the Markov state model (from slowest to fastest: black, purple, red, blue, and gray). These implied timescales are derived from the largest eigenvalues (excluding the stationary eigenvalue of 1) of the transition matrix determined at a given lag time.
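
For reference, the standard relation between an eigenvalue $\lambda_i(\tau)$ of the transition matrix estimated at lag time $\tau$ and its implied timescale $t_i$ is shown below. Index $i = 1$ is the stationary eigenvalue $\lambda_1 = 1$, so the 5 slowest processes correspond to $i = 2, \dots, 6$. The worked numbers are illustrative and are not values from this MSM.

```latex
t_i(\tau) = -\frac{\tau}{\ln \lvert \lambda_i(\tau) \rvert}, \qquad i = 2, \dots, 6

% Worked example: \lambda_2 = 0.8 at \tau = 1\,\mathrm{ns}
t_2 = -\frac{1\,\mathrm{ns}}{\ln 0.8} \approx \frac{1\,\mathrm{ns}}{0.223} \approx 4.48\,\mathrm{ns}
```

Convergence means that $t_i(\tau)$ stops changing appreciably as $\tau$ increases. This plateau is the usual indication that the model is approximately Markovian at that lag time.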