Consensus docking aid to model the activity of an inhibitor of DNA methyltransferase 1 inspired by de novo design

Prado-Romero, Diana L.; Gómez-García, Alejandro; Cedillo-González, Raziel; Villegas-Quintero, Hassan; Avellaneda-Tamayo, Juan F.; López-López, Edgar; Saldívar-González, Fernanda I.; Chávez-Hernández, Ana L.; Medina-Franco, José L.

doi:10.3389/fddsv.2023.1261094

ORIGINAL RESEARCH article

Front. Drug Discov., 11 December 2023
Sec. In silico Methods and Artificial Intelligence for Drug Discovery
Volume 3 - 2023 | https://doi.org/10.3389/fddsv.2023.1261094

Diana L. Prado-Romero¹

Alejandro Gómez-García¹

Raziel Cedillo-González¹

Hassan Villegas-Quintero¹

Juan F. Avellaneda-Tamayo¹

Edgar López-López¹,²

Fernanda I. Saldívar-González¹

Ana L. Chávez-Hernández¹

José L. Medina-Franco¹*

¹DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, Mexico
²Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City, Mexico

The structure-activity relationship data available in public databases for inhibitors of DNA methyltransferases (DNMTs), a family of epigenetic targets, together with the structural information on DNMT1, enable the development of a robust structure-based drug design strategy to study, at the molecular level, the activity of DNMT inhibitors. In this study, we discuss a consensus molecular docking strategy to help explain the activity of small molecules tested as inhibitors of DNMT1. The consensus docking approach, which was based on three validated docking algorithms of different designs, showed overall good agreement with the experimental enzymatic inhibition assays reported in the literature. The docking protocol was used to explain, at the molecular level, the activity profile of a novel DNMT1 inhibitor with a distinct chemical scaffold whose identification was inspired by de novo design and complemented with similarity searching.

1 Introduction

Epigenetic drug discovery is a promising strategy for treating cancer and other complex diseases. Over the past 20 years, several small molecules with novel chemical scaffolds and with high affinity and selectivity against specific epigenetic targets have been investigated (Dueñas-González et al., 2016). In several cases, the epi-drugs administered alone are not very potent but are co-administered with other epigenetic drugs in combined therapies (Dueñas-González et al., 2016). Amongst the major clinically validated epigenetic targets are the DNA methyltransferases (DNMTs), including the two de novo methyltransferases, DNMT3A and DNMT3B, and the maintenance methyltransferase DNMT1. The latter, which is the most abundant of the three, duplicates the pattern of DNA methylation during replication and is essential for proper mammalian development. Since DNA methylation represents a crucial epigenetic mechanism for gene regulation, the development of inhibitors of DNMTs (DNMTis) offers promising prospects for new therapies. Of the three, DNMT1 has been proposed as the most interesting target for experimental cancer treatments (Dueñas-González et al., 2016; Yu et al., 2019; Zhang et al., 2022).

Azacitidine and decitabine (5-aza-2′-deoxycytidine) (Figure 1) are two FDA-approved DNMT1 inhibitors for the treatment of myelodysplastic syndrome. However, both drugs are non-specific and have several pharmacokinetic issues (Stresemann and Lyko, 2008). Many other small molecules have been investigated by our group and others (Medina-Franco et al., 2015; Giri and Aittokallio, 2019; Hu et al., 2021; Ala et al., 2023), and their structure-activity data are freely accessible in large public databases such as ChEMBL (Davies et al., 2015; Mendez et al., 2019). In the current release of ChEMBL (33), the most active DNMT1 inhibitor has a reported IC₅₀ value of 0.3 nM, although the value is inconclusive. Computational approaches including molecular docking, molecular dynamics, and a broad range of chemoinformatics methods, collectively called “epi-informatics” (Medina-Franco, 2016), have contributed to identifying or developing novel modulators of DNMTs and other epigenetic targets (Sessions et al., 2020). Of note, de novo design is being employed extensively to identify novel epigenetic drug candidates (Prado-Romero and Medina-Franco, 2021), although it has not been pursued (or at least published) to guide the design of DNMT inhibitors.

FIGURE 1. Chemical structures of representative inhibitors of DNMT1.

In addition to the large amount of enzymatic inhibition data for small molecules, several three-dimensional (3D) structures of DNMTs (Syeda et al., 2011; Cheng et al., 2015; Li et al., 2018; Horton et al., 2022; Kikuchi et al., 2022) have become available at the Protein Data Bank (Berman et al., 2000) since the first crystallographic structure of the catalytic domain of DNMT1 was published (Song et al., 2011). This information has boosted the application of structure-based design to understand the activity of small molecules at the structural level and to select small molecules for testing from large chemical libraries.

Structure-based virtual screening (SBVS) is a useful technique for drug discovery (Lionta et al., 2014). SBVS aims to predict the best interaction mode between two molecules to form a stable complex, and it uses scoring functions to estimate the strength of non-covalent interactions between ligands and a molecular target. As a result, the ligands are ranked according to their predicted affinity for the target. The next goal is to develop hit compounds into leads that can then enter preclinical studies as drug candidates (Lionta et al., 2014). SBVS relies on the availability of a 3D structure of the target protein. Notably, pose prediction and scoring functions are major factors in the success or failure of SBVS, and different software can produce different results from the same input. To reduce the number of false positives (Maia et al., 2020), consensus virtual screening (CVS) has been used (Houston and Walkinshaw, 2013).

SBVS has guided the identification of hit compounds for epigenetic targets. For example, Chen et al. uncovered the first selective inhibitor against the disruptor of telomeric silencing 1-like (DOT1L) (Chen et al., 2016), the most studied non-SET-containing methyltransferase, which is responsible for the mono-, di- and trimethylation of lysine 79 of histone H3 (H3K79) (Feoli et al., 2022). Zheng et al. reported the combination of high-throughput screening, SBVS, and molecular dynamics to identify computational hits against a histone methyltransferase (Zheng et al., 2021). Kong et al. used SBVS to uncover astemizole as an inhibitor of EZH2/EED (Kong et al., 2014). Yu et al. reported the SBVS of a commercial screening library, followed by in vitro assays, to identify a low micromolar DNMT3A inhibitor with a distinct chemical scaffold (Yu et al., 2022). The experimentally validated hit compound was later used in a ligand-based virtual screening (LBVS) based on structural similarity to uncover a submicromolar DNMT3A inhibitor with selectivity against DNMT1, DNMT3B, and G9a. The hit compounds also showed activity in a cancer cell proliferation assay (Yu et al., 2022). In a recent study, Ala et al. reported an SBVS of three databases, based on molecular docking and dynamics, to identify four compounds with potential inhibitory activity against DNMT1 (Ala et al., 2023).

The goal of this study was to develop a consensus docking protocol to analyze DNMT1 inhibitors. The protocol was based on a combination of well-validated search algorithms, molecular docking scores, and data fusion. We also report a novel DNMT1 inhibitor with a distinct chemical scaffold whose identification was based on de novo design and similarity searching. In this study, we did not directly test the compounds designed de novo because of the additional time and economic resources that chemical synthesis requires. Instead, as a first approach to reduce costs and save time (as explained in the Methods Section), we combined the results of de novo design with similarity searching of a commercial chemical library. The docking protocol helped to suggest a binding mode with DNMT1. Unexpectedly, new activators of the enzymatic activity of DNMT1 were also found.

2 Methods

The general approach to developing the consensus docking protocol is outlined in Figure 2A, followed by a docking-based analysis of a novel DNMT1 inhibitor whose identification was inspired by de novo design (Figure 2B). In general, the docking protocol comprised six key steps: 1) Target selection; 2) Target preparation; 3) Dataset preparation; 4) Molecular docking; 5) Ranking and re-scoring; and 6) Data fusion (consensus scoring). Details of the protocol are explained in the following sections.

FIGURE 2. General workflow of the strategies implemented in this work. (A) Consensus molecular docking based on AutoDock Vina (Vina), LeDock, and Molecular Operating Environment (MOE). (B) Structure-based analysis of a de novo inspired compound. The de novo compound was obtained from fragment libraries retrieved from active compounds. As a first approach, similar compounds from a commercial and readily available library (ChemDiv) were selected for purchase and testing. Continuous black arrows represent the steps followed in this study. Dashed arrows denote perspectives of this work and alternative strategies to identify active molecules: (1) chemical synthesis and testing of compounds designed de novo; (2) virtual screening of the commercial library (including ChemDiv) using the consensus docking protocol; (3) structure-based design and selection of additional candidate compounds based on the docking results of the newly identified compound.

The consensus molecular docking (Figure 2A) was used to generate a binding model of a DNMT1 inhibitor with a novel chemical scaffold identified from an independent de novo design approach combined with similarity searching (method schematically presented in Figure 2B). In Figure 2B, we mark with dashed arrows alternative strategies that will be pursued in forthcoming studies to identify DNMT1 inhibitors based on the outcomes of this study, namely, chemical synthesis and testing of compounds designed de novo; virtual screening of a commercial library using the consensus docking protocol; and structure-based design and selection of additional candidate compounds based on the docking results of the newly identified compound.

Below, we first describe the specific methods used to develop the docking protocol (Sections 2.1–2.8), followed by the selection of the newly tested compounds and the enzymatic inhibition assay (Sections 2.9–2.10).

In all steps, MarvinSketch 22.18 was used for drawing and displaying chemical structures (“MarvinSketch 22.18, Chemaxon, 2023”). Datasets and code for the analysis are available on GitHub at https://github.com/DIFACQUIM/DNMT1-Protocol.

2.1 Targets selection and preparation

The crystallographic structure of human DNMT1 (PDB ID: 4WXX) was retrieved from the RCSB Protein Data Bank (PDB), available online at https://www.rcsb.org/ (accessed on 30 June 2023) (Berman et al., 2000). Among the different crystallographic structures of DNMT1 available in the PDB, we selected PDB ID: 4WXX because it contains a co-crystallized molecule of S-adenosyl-L-homocysteine (SAH) and was solved at a resolution of 2.62 Å. SAH is reported to be a potent inhibitor of both DNA and histone transmethylation (Halsted and Medici, 2016); therefore, this 3D structure could be of interest as a model for inhibitory interactions (Alkaff et al., 2021). The protein was prepared with the default settings of the QuickPrep module of Molecular Operating Environment (MOE) v. 2022.02 (“Molecular Operating Environment (MOE). Chemical Computing Group Inc.: Montreal, QC, Canada, 2023”): addition of all missing hydrogen atoms; protonation state at pH 7; elimination of water molecules farther than 4.5 Å from the protein and inside the SAH cavity; addition of missing amino acid residues (breaks of up to ten residues and terminal out gaps of up to five residues) and, for larger gaps, neutralization of the endpoints adjoining empty residues; and energy minimization. The parameters employed for the energy minimization stage were from the AMBER14:EHT forcefield [ff14SB (Maier et al., 2015) for the protein; MAB forcefield (Gerber and Müller, 1995), and AM1-BCC charges for SAH (Jakalian et al., 2002)]. The energy minimization of the protein in MOE is carried out with three successive nonlinear methods: steepest descent, conjugate gradient, and truncated Newton.

2.2 Dataset selection and preparation

The 153 ligands with reported enzymatic activity against DNMT1 in a biochemical assay were obtained from the ChEMBL API v. 32 (Davies et al., 2015; Mendez et al., 2019). Only molecules with a binding assay type and an unequivocally assigned IC₅₀ were selected. Compounds with nucleoside scaffolds (Supplementary Figure S1) were removed using an RDKit (Landrum et al., 2023) substructure search with SMARTS. Before docking (vide infra), the 153 ligands were built and their geometry was energy minimized using the MMFF94x forcefield implemented in MOE. For every ligand, the dominant protonation state at physiological pH (7.4) was chosen (“Molecular Operating Environment (MOE). Chemical Computing Group Inc.: Montreal, QC, Canada, 2023”).

2.3 Docking with Vina

The file with the prepared ligands was split with the LeFrag module (Lephar Research, 2023), and Open Babel v. 3.1.1 (O’Boyle et al., 2011) was used to convert the ligands to .pdb format. Protein and ligands were converted to .pdbqt with MGLTools v. 1.5.6. The molecular docking was carried out with Vina v. 1.2.3 (Trott and Olson, 2010; Eberhardt et al., 2021) with an exhaustiveness of 8 and 5 binding modes to output. The best score for each ligand was selected for further analysis, with the code freely available at https://github.com/DIFACQUIM/Docking. The grid box was centered at the coordinates -47.673, 61.885, 6.256 (x, y, z) with a search space of 17 × 25 × 14 Å.
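The search settings above map onto the options of a Vina configuration file. The following Python sketch writes such a configuration as text. The receptor and ligand file names are hypothetical placeholders, and the sketch assumes that the 17 × 25 × 14 Å search space is given in x, y, z order; it is an illustration and not the script used in this work.

```python
def vina_config(receptor, ligand, center, size, exhaustiveness=8, num_modes=5):
    """Return the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
    ]
    return "\n".join(lines) + "\n"


# File names below are hypothetical; box values are those reported in the text.
config_text = vina_config(
    receptor="4wxx_prepared.pdbqt",
    ligand="ligand_001.pdbqt",
    center=(-47.673, 61.885, 6.256),
    size=(17, 25, 14),
)
print(config_text)
```

The resulting text can be saved to a file and passed to Vina through its `--config` option.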

2.4 Docking with LeDock

Docking with LeDock (Kirkpatrick et al., 1983) was carried out in the SAH cavity with the default settings of the software: the grid centered 4 Å around the co-crystallized SAH, twenty docking runs for every ligand, and 1 Å for the root mean square deviation (RMSD) clustering. For further data analysis, the best score for every ligand was selected with the code available at https://github.com/DIFACQUIM/Docking.

2.5 Docking with MOE

Docking with MOE v. 2022.02 was centered on the SAH cavity and molecular docking was carried out with the default settings: placement (method: triangle matcher, score function: London dG) and refinement (method: rigid receptor, score function: GBVI/WSA dG) (Vilar et al., 2008). Using the “Triangle Matcher” method, the compounds were subjected to 30 search steps and the default values for the other parameters. The clusters with an RMSD <2 Å were visually explored. During the docking, the receptor was considered rigid and the ligands flexible. The conformations with the lowest binding energy were selected for additional analysis.

2.6 Validation of docking protocol

For this analysis, ligands with IC₅₀ equal to or lower than 10 μM (pIC₅₀ ≥ 5) were labeled as ‘active’; otherwise they were considered ‘inactive’. Notably, a 10 μM value has been used as a general threshold to define active/inactive molecules in other large-scale studies (Sun et al., 2017; López-López et al., 2022). To develop the current consensus docking protocol, we made the approximation that the enzymatic inhibition assays and the activity values reported in ChEMBL are comparable. The RMSD between the docked and co-crystallized binding conformation of SAH was calculated with Open Babel v. 3.1.1 (O’Boyle et al., 2011). From molecular docking scores and the positive class probabilities, receiver operating characteristic (ROC) curves were generated using KNIME software version 4.6.0 (Gómez-García and Medina-Franco, 2022; Berthold et al., 2009). The results were recorded in a comma-separated values (CSV) file, which included the scores of each docking program and their ligand efficiency (LE) (vide infra).
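The activity labeling and the area under the ROC curve can be illustrated with a short Python sketch. It is not the KNIME workflow used in this work: the AUC is computed here as the probability that a randomly chosen active is ranked above a randomly chosen inactive, and the IC₅₀ values and docking scores are made-up toy numbers.

```python
import math


def pic50_from_ic50_um(ic50_um):
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_um * 1e-6)


def is_active(ic50_um, threshold_um=10.0):
    """Label a compound as active when IC50 <= 10 uM (pIC50 >= 5)."""
    return ic50_um <= threshold_um


def roc_auc(scores, labels):
    """AUC for scores where HIGHER means more likely active; ties count half."""
    actives = [s for s, flag in zip(scores, labels) if flag]
    inactives = [s for s, flag in zip(scores, labels) if not flag]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Toy data (hypothetical): IC50 in uM and docking scores in kcal/mol.
ic50_um = [0.5, 8.0, 25.0, 120.0]
docking_scores = [-9.1, -7.4, -7.9, -6.0]

labels = [is_active(v) for v in ic50_um]
# Docking scores are better when more negative, so they are negated first.
auc = roc_auc([-s for s in docking_scores], labels)
print(labels, auc)
```

An AUC below 0.5 under this convention means that the score ranks inactives ahead of actives more often than not.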

2.7 Re-scoring

Docked ligands were ranked according to their predicted scores in ascending order: compounds with a higher rank have more negative values and thus a better predicted affinity for DNMT1. pIC₅₀ values were ranked in descending order, since a higher value represents a more potent compound (pIC₅₀ ranking). Additionally, LE was calculated individually for each molecular docking score (obtained by Vina, LeDock, or MOE software) with the equation:

$$\mathrm{Ligand\ Efficiency\ (LE)} = \frac{\mathrm{Docking\ Score\ (DS)}}{\mathrm{Heavy\ Atom\ Count}} \qquad (1)$$

In Equation 1, the Heavy Atom Count for each ligand was calculated using RDKit library (Landrum et al., 2023). Correlations and graphs were obtained with SciPy (Virtanen et al., 2020), Matplotlib (Hunter, 2007), and seaborn (Waskom, 2021) libraries using Python programming language version 3.10.12.
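As a worked example of Equation 1, the sketch below computes LE for two hypothetical ligands that share the same docking score but differ in size. In this work the heavy atom counts came from RDKit; here they are supplied directly as made-up numbers so that the example has no external dependencies.

```python
def ligand_efficiency(docking_score, heavy_atom_count):
    """Equation 1: docking score divided by the number of non-hydrogen atoms."""
    if heavy_atom_count <= 0:
        raise ValueError("heavy_atom_count must be positive")
    return docking_score / heavy_atom_count


# Hypothetical ligands with the same docking score but different sizes.
small = ligand_efficiency(-9.0, 20)  # 20 heavy atoms
large = ligand_efficiency(-9.0, 45)  # 45 heavy atoms
print(small, large)
```

The smaller ligand obtains the more negative LE, which is how the normalization counteracts the tendency of docking scores to favor larger molecules.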

2.8 Consensus scoring

Since there is no single “best” scoring function or docking program, it has been established that combining results from different docking programs increases the likelihood of identifying correct docking poses and improves the performance of docking-based virtual screening (Charifson et al., 1999; Houston and Walkinshaw, 2013; Perez-Castillo et al., 2019; Blanes-Mira et al., 2022). In this study, docking scores and LE values from Vina, LeDock, and MOE were used to calculate seven data fusion metrics: maximum, minimum, arithmetic mean, geometric mean, harmonic mean, median, and Euclidean norm (Bajusz et al., 2019). The data fusion metrics were calculated employing SciPy (Virtanen et al., 2020).
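The seven fusion rules can be sketched with the Python standard library as a stand-in for the SciPy-based calculation. Because the geometric and harmonic means require positive inputs, this sketch assumes that the rules are applied to score magnitudes (absolute values); that choice, the function name, and the example numbers are assumptions made for illustration only.

```python
import math
import statistics


def fuse(values):
    """Apply seven data fusion rules to one ligand's per-program values.

    `values` holds positive numbers, for example the magnitudes of the
    docking scores from Vina, LeDock, and MOE for the same ligand.
    """
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return {
        "maximum": max(values),
        "minimum": min(values),
        "arithmetic_mean": statistics.fmean(values),
        "geometric_mean": statistics.geometric_mean(values),
        "harmonic_mean": statistics.harmonic_mean(values),
        "median": statistics.median(values),
        "euclidean_norm": math.sqrt(sum(v * v for v in values)),
    }


# Hypothetical score magnitudes for one ligand from three programs.
fused = fuse([8.0, 6.0, 7.0])
for rule, value in fused.items():
    print(f"{rule}: {value:.3f}")
```

Scores from different programs are on different scales, so in practice the fused quantity may need to be a rank or a normalized score rather than the raw value.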

2.9 De novo inspired selection of compounds

Automated de novo design was carried out with alvaBuilder v.1.0.6 (Mauri and Bertola, 2023). Briefly, alvaBuilder combines structural fragments which are obtained from the training sets chosen by the user. The new sets of molecules constructed from the fragments are scored with a scoring function, also chosen by the user (vide infra). Two different training sets were selected as the source of fragments used as construction blocks. The first dataset was retrieved from ChEMBL 31 (Davies et al., 2015; Mendez et al., 2019) selecting compounds with IC₅₀ against DNMT1 equal to, or lower than 10 μM. The second is the diversity subset (PS6) of 5,000 compounds from Life Chemicals (“Diversity Screening Libraries, 2021”) (accessed in August 2021). Both datasets were curated with the same protocol. Briefly, compounds were standardized, the largest component was retained, and compounds were neutralized and reionized to generate canonical SMILES and remove duplicates, as previously published by our group (Sánchez-Cruz et al., 2019; DIFACQUIM, 2020). A random subset of 285 compounds from Life Chemicals was used to match the number of ‘active’ molecules from ChEMBL after curation. We set the scoring function with ranges of descriptors calculated from the molecules with reported biological activity, using alvaDesc 2.0.10 (Mauri, 2020) (the values used for the scoring function are in the Supplementary Table S1): molecular weight (MW), hydrogen bond donors and acceptors, consensus partition coefficient (logP), aqueous solubility (ESOL), synthetic accessibility (SAscore), topological polar surface area (TPSA). The aggregation method was an arithmetic mean with a population size of 70 and 100 iterations. For each training set, 700 molecules were computed. Finally, 1,398 compounds remained after curation.

The de novo compounds were used as queries for similarity searching in the Epigenetics Focused Set commercial library of ChemDiv (“ChemDiv, 2023”), which contains 25,883 compounds. Morgan fingerprints of radius 2 (Morgan2) and 3 (Morgan3) (Rogers and Hahn, 2010), along with the MACCS keys (166-bit) fingerprint (Durant et al., 2002), were calculated for all compounds with RDKit (Landrum et al., 2023), and similarity was computed with the Tanimoto coefficient. Molecules from ChemDiv that exhibited one of the following similarity values to at least one de novo designed compound were selected for additional analysis: equal to or higher than 0.30 for Morgan fingerprints of radius 2 or 3; or equal to or higher than 0.80 for MACCS keys. The selection of these thresholds was based on typical values of intuitive high structure similarity for each fingerprint (Medina-Franco, 2012). Similarity values, along with commercial availability criteria, were used to purchase compounds for further evaluation (vide infra).
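The selection rule can be sketched in plain Python. Fingerprints are represented here as sets of on-bit indices, which is an assumption made to keep the example free of dependencies (in this work they were computed with RDKit), and the bit sets and similarity values are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Thresholds stated in the text for each fingerprint.
THRESHOLDS = {"morgan2": 0.30, "morgan3": 0.30, "maccs": 0.80}


def passes_selection(max_similarity):
    """True if any fingerprint reaches its threshold.

    `max_similarity` maps a fingerprint name to the highest Tanimoto value
    of a library compound against all de novo compounds.
    """
    return any(
        max_similarity.get(name, 0.0) >= cutoff
        for name, cutoff in THRESHOLDS.items()
    )


# Hypothetical on-bit sets: 2 shared bits out of 6 distinct bits.
example_similarity = tanimoto({1, 2, 3, 4}, {3, 4, 5, 6})
selected = passes_selection({"morgan2": example_similarity, "maccs": 0.55})
print(example_similarity, selected)
```

Taking the maximum similarity over all de novo compounds implements the "at least one compound" condition.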

2.10 Enzymatic DNMT1 inhibition assay

Compounds obtained from ChemDiv were experimentally tested at the company Reaction Biology in an enzymatic methyltransferase inhibition assay (“Reaction Biology Corporation, 2023”) using the HotSpot℠ platform. Our research group has reported the methodology and results of this biochemical assay, including the identification of 7-amino alkoxy-quinazolines (Figure 1) (Medina-Franco et al., 2022). Briefly, HotSpot℠ is a low-volume radioisotope-based assay that employs tritium-labeled AdoMet (³H-SAM) as a methyl donor. The test compounds, diluted in dimethyl sulfoxide, were added using acoustic technology (Echo550, Labcyte, San Jose, CA, United States) into an enzyme/substrate mixture in the nanoliter range. The reactions were started by adding ³H-SAM and incubated at 30 °C. Total final methylation of the substrate (Poly dI-dC) was measured by a filter binding method implemented at Reaction Biology. Data analysis was conducted with GraphPad Prism software (La Jolla, CA, United States), available at Reaction Biology, for curve fits. The enzymatic inhibition assays were carried out at 1 μM of SAM. The standard positive control was SAH. The compounds were tested in 10-concentration IC₅₀ mode (IC₅₀: concentration required to inhibit enzymatic activity by 50%) with a threefold serial dilution starting at 100 μM (200 μM only for F447-0397). The dose-response curve and the activity percentage values are reported as provided by the testing laboratory in Figure 8 and Supplementary Table S4, respectively.
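The dilution scheme and the shape of a dose-response curve can be illustrated as follows. The four-parameter logistic (Hill) model below is a common choice for IC₅₀ curves and is used here as an assumption; the exact model and settings applied by the testing laboratory in GraphPad Prism are not specified in the text, and the IC₅₀ in the example is hypothetical.

```python
def serial_dilution(top_um, n_points=10, factor=3.0):
    """Concentrations (uM) of an n-point serial dilution, highest first."""
    return [top_um / factor**i for i in range(n_points)]


def percent_activity(conc_um, ic50_um, hill_slope=1.0, top=100.0, bottom=0.0):
    """Remaining enzyme activity (%) under a four-parameter logistic model."""
    return bottom + (top - bottom) / (1.0 + (conc_um / ic50_um) ** hill_slope)


concentrations = serial_dilution(100.0)  # 100 uM down to about 0.005 uM
curve = [percent_activity(c, ic50_um=10.0) for c in concentrations]  # hypothetical IC50
for c, a in zip(concentrations, curve):
    print(f"{c:10.4f} uM -> {a:6.2f} % activity")
```

By construction the model returns 50% activity at the IC₅₀, and a Hill slope different from 1.0 makes the transition steeper or shallower around that point.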

3 Results and discussion

First, we present the results of the docking protocol with DNMT1 (validation and consensus approach), followed by the results of the newly identified DNMT1 inhibitor with a distinct chemical scaffold. Since the chemical synthesis of de novo compounds requires a larger time investment, a similarity search was performed as a first approach to identify novel de novo-inspired scaffolds. To gain insight into the possible mechanism of action of the new inhibitor, the docking protocol was used to identify possible key interactions.

Three different algorithms to generate conformers were employed: MOE (Triangle Matcher), LeDock (simulated annealing) (Kirkpatrick et al., 1983), and Vina (Iterated Local Search global optimizer) (Baxter, 1981; Blum et al., 2008). In the Triangle Matcher method, the conformers generated for every ligand are placed inside a space of approximately 5 Å around SAH. This space is permeated with alpha spheres, and the poses are generated by aligning ligand triplets of atoms on triplets of receptor site points in a systematic way. The receptor site points are alpha sphere centers representing tight packing locations (“Molecular Operating Environment (MOE). Chemical Computing Group Inc.: Montreal, QC, Canada, 2023”). The docking run begins with a random conformation, each move consists of random perturbations of rotatable bonds, and the search of the conformational space is carried out using molecular mechanics force fields, with a final rejection test for each molecular move to find an optimal solution (Vilar et al., 2008).

The simulated annealing method of LeDock initially generates a random conformer, from which a neighborhood of conformers is generated in search of the one with the most favorable binding energy. Nonetheless, during the first iterations, the generation of conformers will not always move toward the most favorable binding energy but can also move toward conformers with less favorable binding energy. This is intended to expand the region of search in conformational space (Kirkpatrick et al., 1983). The Iterated Local Search global optimizer of Vina consists of a succession of steps, each comprising a mutation and a local optimization, with each step being accepted according to the Metropolis criterion (Trott and Olson, 2010). It uses the Broyden-Fletcher-Goldfarb-Shanno method (Nocedal and Wright, 2006) for local optimization, which is an efficient quasi-Newton method (Trott and Olson, 2010).
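The acceptance rule shared by simulated annealing and the Metropolis criterion can be written in a few lines. This is a generic textbook sketch, not the implementation inside LeDock or Vina; the energy differences and temperatures are in arbitrary illustrative units, and the `draw` parameter is a hypothetical hook added so that the example can be checked deterministically.

```python
import math
import random


def metropolis_accept(delta_e, temperature, draw=None):
    """Decide whether to accept a move that changes the energy by delta_e.

    Moves that lower the energy are always accepted. Moves that raise it are
    accepted with probability exp(-delta_e / temperature), so worse poses are
    accepted more often at high temperature (early in an annealing schedule).
    """
    if delta_e <= 0:
        return True
    if draw is None:
        draw = random.random()  # uniform number in [0, 1)
    return draw < math.exp(-delta_e / temperature)


# The same uphill move is accepted when hot and rejected when cold.
hot = metropolis_accept(delta_e=1.0, temperature=10.0, draw=0.5)
cold = metropolis_accept(delta_e=1.0, temperature=1.0, draw=0.5)
print(hot, cold)
```

Lowering the temperature over the course of the run gradually turns the exploratory search into a downhill refinement.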

In MOE, two different scoring functions were employed: London dG for the initial conformer generation and GBVI/WSA dG for the conformer refinement. London dG takes into account the average gain/loss of rotational and translational entropy, the energy due to the loss of flexibility of the ligand (calculated from ligand topology only), the hydrogen bond energy, and the desolvation energy. GBVI/WSA dG also considers the gain/loss of rotational and translational entropy, the Coulombic electrostatic energy, van der Waals interactions, and the solvation electrostatic energy. The exposed surface area of the ligand is penalized (“Molecular Operating Environment (MOE). Chemical Computing Group Inc.: Montreal, QC, Canada, 2023”). The scoring function of LeDock takes into account the Coulombic electrostatic energy, van der Waals interactions, the hydrogen bond energy, the intra-molecular clashes, and torsion strain (Kirkpatrick et al., 1983). The scoring function of Vina (Trott and Olson, 2010) is inspired by the scoring function X-CSCORE, which takes into account the van der Waals interactions, hydrogen bonding, deformation penalty, and the hydrophobic effect (Wang et al., 2002).

3.1 Validation of docking protocol and re-scoring

The validation of the molecular docking protocol was done with two approaches: RMSD values between the docked and co-crystallized binding conformation of SAH, and ROC curves (as detailed in the Methods Section).

The RMSD values for SAH were lower than 2 Å for all docking programs (Vina: 1.586 Å (second pose); LeDock: 1.291 Å; and MOE: 1.214 Å). Supplementary Figure S2 shows the 3D predicted pose of SAH with each software. The calculated values suggest that the docking protocols are able to identify the experimental 3D conformation of SAH found in the crystallographic structure.
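For reference, the RMSD between a docked and a crystallographic pose can be computed as sketched below. The sketch assumes that both coordinate lists contain the same heavy atoms in the same order and applies no superposition or symmetry correction; the values in this work were obtained with Open Babel, and the coordinates shown are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as the input coordinates)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical three-atom example: the docked pose is shifted 1 A along z.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
pose_rmsd = rmsd(crystal_pose, docked_pose)
print(pose_rmsd, pose_rmsd < 2.0)
```

No alignment is applied because both poses are expressed in the receptor frame, so the deviation reflects placement as well as conformation.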

Figure 3 shows the ROC curves for all three docking programs using the docking scores and the LE. The ROC curves indicated that MOE’s binding scores led to better identification of true positives than those of LeDock and Vina. However, the calculation of LE (Equation 1) is detrimental to the area under the curve (AUC) (Vina: 0.295; LeDock: 0.250; and MOE: 0.084), which suggests that LE does not contribute to discarding inactive molecules (Figure 3B). These results highlight the relevance of considering the ligand size, herein through the heavy atom count, to evaluate the performance of the docking programs.

FIGURE 3. Receiver operating characteristic (ROC) curves of the docking with DNMT1 with three different docking programs. Curves are generated with scoring (A), and with ligand efficiency (B).

To gain insight into the data distribution, correlation plots are shown in Figure 4: Vina (4A), LeDock (4B), and MOE (4C). In each plot, the horizontal axis represents the pIC₅₀ ranking. Docking scores, scores’ ranking, and LE are shown on the vertical axis for each docking software. Spearman correlation (ρ) was computed for each plot. ‘Active’ and ‘inactive’ compounds against DNMT1 are represented in different colors. Supplementary Figure S3 shows the correlation plots for all three docking programs with the pIC₅₀ values on the horizontal axis.

FIGURE 4. Docking scores and ligand efficiency (LE) correlations with ranked pIC₅₀ of compounds with activity against DNMT1. Compounds labeled as active are in red, orange, or firebrick, and inactive compounds are in blue, cyan, or olive green. Spearman’s correlation is shown above each graph. From left to right: binding scores, scores’ ranking, and LE calculated with (A) Vina, (B) LeDock, and (C) MOE. Despite the low correlation (maximum 0.55), taking into account the ligand’s size improves the performance of the docking program.

Although the ρ values for docking scores and scores’ ranking are equal for each program, as expected because Spearman’s correlation is itself computed on ranks, the distribution of the data is more scattered when plotting the scores’ ranking. Although the highest correlation observed is low (0.55), the correlation plots in Figure 4 indicate that, overall, considering the ligand size improves the performance of the docking program. This is particularly noticeable in the results obtained with Vina and LeDock, where the ρ values improved when considering the LE (Figure 4). The observations from the correlation plots agreed with the conclusions obtained from the ROC curves (Figure 3).
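The equality of ρ for raw scores and for their ranks follows from the definition of Spearman’s correlation, as the following standard-library sketch shows. It stands in for the SciPy routine used in this work, and the docking scores and pIC₅₀ ranks are hypothetical.

```python
import math


def average_ranks(values):
    """Ascending ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical docking scores and pIC50 ranks (1 = most potent).
scores = [-9.1, -7.4, -7.9, -6.0, -8.2]
pic50_rank = [1, 3, 2, 5, 4]

rho_from_scores = spearman(scores, pic50_rank)
rho_from_ranks = spearman(average_ranks(scores), pic50_rank)
print(rho_from_scores, rho_from_ranks)
```

Ranking the scores before computing ρ changes nothing because the rank transformation is applied inside the coefficient anyway; only the visual spread of the plotted points differs.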

3.2 Consensus docking

As discussed in the Introduction, consensus SBVS could be more accurate at identifying active compounds as compared to individual methods (Wang and Wang, 2001; Houston and Walkinshaw, 2013). Data fusion also helps to rationalize the relationships between the chemical, physicochemical, and biological features that explain in more detail the possible binding mechanism of different kinds of inhibitors (López-López and Medina-Franco, 2023). Data generated with consensus docking has been useful in developing new drug candidates (Maia et al., 2020; Morris et al., 2022). The advantage of using different docking programs is that the variety of generated conformers is enriched because every program has its own conformer generation algorithm and scoring functions.

Analysis of consensus docking results with data fusion metrics has been shown to improve the results of individual docking (Bajusz et al., 2019; Triches et al., 2022; López-López and Medina-Franco, 2023). Table 1 summarizes the resulting correlations (ρ) of the pIC₅₀ ranking and different data fusion metrics implemented in this work (see Methods Section for details). The resulting correlations (ρ) with pIC₅₀ can be found in the Supplementary Table S2. The best performances were achieved with the median and the minimum rules for the docking scores and LE, respectively. There is a higher correlation with LE, in concordance with the results before the consensus.

TABLE 1. Results of data fusion metrics and their correlations with pIC₅₀ ranking.

Figure 5 shows the correlation between different consensus docking approaches obtained with different data fusion rules (described in the Methods Section) and the bioactivity of DNMT1 inhibitors reported in the literature. The two best correlations are shown as calculated with Spearman’s correlation: the pIC₅₀ values with the median docking score (ρ = 0.372) and with the minimum LE (ρ = −0.564). In agreement with the results discussed in Section 3.1, LE had the best correlations, which further emphasizes the convenience of accounting for the size of the ligand while doing docking analysis with DNMTis.

FIGURE 5. Correlation plots between ranked pIC₅₀ values reported in ChEMBL for DNMT1 inhibitors and (A) docking scores and (B) minimum ligand efficiency (LE). The Spearman’s correlation coefficient is indicated in the plots. Compounds with IC₅₀ values lower/greater than 10 μM are represented with a different color.

As observed in Figure 5, the consensus correlation values improved on those of the individual docking scores for Vina and LeDock. The minimum LE also shows a better correlation than LeDock and MOE alone. Nevertheless, the correlation calculated for the Vina LE has a close value (ρ = 0.559).

The corresponding ROC curves are shown in Figure 6, which highlights the improved performance of the consensus median and consensus minimum.

FIGURE 6. Receiver operating characteristic (ROC) curves of the docking with DNMT1 with three different docking programs. Curves generated with scoring (A), and with ligand efficiency (B), including the best consensus metric.
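For readers unfamiliar with ROC analysis of docking results, the sketch below shows one way such a curve and its area (AUC) could be computed. It is a minimal illustration, not the procedure used in this work: the scores and IC₅₀ values are invented, the function names are hypothetical, and labeling compounds as active when IC₅₀ < 10 μM is an assumption borrowed from the color coding of Figure 5.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, ranking the most negative score first.

    Assumes no tied scores; ties would need to be handled as a single step.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels))  # ascending: best score first
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, is_active in ranked:
        if is_active:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Toy data (hypothetical): consensus scores and experimental IC50 values in uM
scores = [-9.1, -8.4, -7.9, -7.2, -6.5, -6.0]
ic50_uM = [2.0, 15.0, 8.0, 50.0, 5.0, 120.0]
labels = [1 if v < 10.0 else 0 for v in ic50_uM]  # assumed activity threshold

auc_value = auc(roc_points(scores, labels))
print(f"AUC = {auc_value:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of active and inactive compounds.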

3.3 Distinct DNMT1 inhibitor inspired by de novo design

As a result of the similarity searching using the 1,398 molecules proposed with de novo design, six compounds from ChemDiv were purchased. The results of the similarity calculations are provided as supplementary .csv files. Figure 7 shows the chemical structures of the compounds newly tested against DNMT1, as well as the most similar de novo compound according to the Morgan fingerprint of radius 2 (Morgan2) (Rogers and Hahn, 2010). The most similar de novo compounds according to the Morgan fingerprint of radius 3 and MACCS keys are in Supplementary Table S5. The database of the de novo-designed molecules is available at https://github.com/DIFACQUIM/DNMT1-Protocol/tree/main/De-Novo_inspired. Compound F447-0397 inhibited DNMT1 in the enzymatic assays, with an IC₅₀ of 41.3 ± 2.1 μM. The dose-response curves used to calculate the IC₅₀ are in Figure 8. The data used by the testing laboratory (Reaction Biology) to obtain the IC₅₀ are in Supplementary Table S4. It should be noted that, although one point was excluded from the curve fit, F447-0397 inhibited more than 99% of the enzymatic activity of DNMT1 at the highest concentration tested. Nevertheless, the Hill slope of the IC₅₀ curve is not close to 1.0, unlike that of the positive and internal control, SAH, for which the assay conditions to measure DNMT1 were developed by the testing laboratory. Based on these results, it is important to conduct additional biochemical and orthogonal assays (e.g., in a cellular context) to further confirm the activity of compound F447-0397. To our knowledge, this compound exhibits a novel scaffold not previously published among DNMT1 inhibitors. This was shown by a substructure search with the Murcko scaffold of F447-0397, as implemented in RDKit (Landrum et al., 2023), which found no matching molecule in the curated dataset of 743 molecules with biological activities against DNMT1 found in ChEMBL 33 (Davies et al., 2015; Mendez et al., 2019). Supplementary Table S3 summarizes the results of the enzymatic inhibition assays of the six compounds.

FIGURE 7. Chemical structures of newly tested compounds with DNMT1 according to commercial availability. The Murcko scaffold is marked in green. The most similar de novo compound (Morgan2 representation) to the commercially available molecule (from ChemDiv) is shown on the left, alongside the similarity values calculated with the Tanimoto coefficient. The IC₅₀ value of the active compound is indicated.
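The Tanimoto coefficient used to match commercially available compounds to de novo designs can be sketched as follows. This is a simplified illustration: each fingerprint is represented as a set of "on" bit indices with invented values, the compound names are hypothetical, and in practice the fingerprints (e.g., Morgan2 or MACCS keys) would be generated with a cheminformatics toolkit rather than written by hand.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def most_similar(query_bits, library):
    """Return (name, similarity) of the library entry closest to the query."""
    return max(
        ((name, tanimoto(query_bits, bits)) for name, bits in library.items()),
        key=lambda pair: pair[1],
    )

# Toy fingerprints (hypothetical on-bit indices, not real Morgan2 output)
commercial_compound = {1, 5, 9, 12, 20, 33}
de_novo_library = {
    "de_novo_001": {1, 5, 9, 40},
    "de_novo_002": {1, 5, 9, 12, 20, 41},
    "de_novo_003": {2, 3},
}

best_name, best_similarity = most_similar(commercial_compound, de_novo_library)
print(f"{best_name}: Tanimoto = {best_similarity:.3f}")
```

The coefficient ranges from 0 (no shared bits) to 1 (identical fingerprints), which is the scale of the similarity values shown alongside the structures in Figure 7.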

FIGURE 8. Dose-response curves used to calculate the IC₅₀ values. Data and curves provided by the testing laboratory, Reaction Biology. Data for the positive control SAH (left) and ChemDiv compound F447-0397 (right). F447-0397 was tested in a 10-dose IC₅₀ mode with 3-fold serial dilution, starting at 200 µM.
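To make the dose series and the role of the Hill slope concrete, the following sketch generates a 10-dose, 3-fold dilution series starting at 200 µM and evaluates a four-parameter logistic (Hill) model. It is only an illustration under stated assumptions: the model form, the fixed top and bottom plateaus of 100% and 0%, and the Hill slope values of 1.0 and 3.0 are assumptions chosen for comparison, not the fitted parameters reported by the testing laboratory. Only the starting concentration, dilution factor, number of doses, and the IC₅₀ of 41.3 μM come from the text.

```python
def serial_dilution(top_conc, fold, n_doses):
    """Concentrations from the highest dose down, dividing by `fold` at each step."""
    return [top_conc / fold ** i for i in range(n_doses)]

def percent_activity(conc, ic50, hill_slope=1.0, top=100.0, bottom=0.0):
    """Four-parameter logistic (Hill) model of the remaining enzymatic activity (%)."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill_slope)

doses_uM = serial_dilution(200.0, 3.0, 10)  # 200 uM down to 200 / 3**9 uM

# Compare an ideal slope of 1.0 with a steeper, purely illustrative slope
for slope in (1.0, 3.0):
    curve = [percent_activity(c, 41.3, slope) for c in doses_uM]
    print(f"Hill slope {slope}: remaining activity at 200 uM = {curve[0]:.1f}%")
```

Under this simple model, the slope controls how sharply activity falls around the IC₅₀, which is one reason a slope far from 1.0 motivates the confirmatory assays mentioned above.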

The compounds F447-0397, F447-0509, and F447-0644 are quite similar, in particular F447-0397 and F447-0509 (which differ by an additional methyl group and the substitution pattern in the sulfonamide phenyl ring). The docking scores and LE of the three molecules are also similar (Supplementary Table S3), as could be anticipated from their structural similarity. However, the percentage of enzymatic activity at 100 μM is quite different, with F447-0397 being the only inhibitor (12.76%). These are good examples of activity cliffs: compounds with similar chemical structures but large, unexpected activity differences (Maggiora, 2006). Although the IC₅₀ of F447-0397 (41.3 μM) indicates that it is not a very potent compound and could be considered “inactive,” the scaffold is novel and could be an interesting starting point for optimization. The novel DNMT1 inhibitor has a “long scaffold” (four ring systems connected by linkers of one to three bonds). This is in line with other DNMT1 inhibitors with “long or extended scaffolds,” such as the 4-aminoquinoline SGI-1027 and its analogs (Datta et al., 2009; Gros et al., 2015) and glyburide (Juárez-Mercado et al., 2020) (Figure 1). However, unlike SGI-1027 and glyburide, F447-0397 was identified by a combination of de novo design and similarity searching.

Figure 9 shows the predicted binding mode of F447-0397 with DNMT1 generated with Vina, the docking program that had, overall, the best performance of the three docking programs (as shown in Figure 3). The predicted pose shows a hydrogen bond between Glu1168 and the piperazine ring of F447-0397. This could be a key interaction since the co-crystallized SAH also forms a hydrogen bond with Glu1168. Re-docking of SAH with the three programs predicted the same interaction (Supplementary Figure S4). The binding mode predicted with Vina also exhibits a hydrogen bond between Arg1310 and the oxygens of the sulfonamide of F447-0397. The pose predicted by MOE also showed the hydrogen bond with the oxygens of the inhibitor’s carboxylic acid (Supplementary Figures S5, S6). Interactions between Arg1310 and computational hits were previously observed in separate docking studies with DNMT1 (Bashir et al., 2023), and also between Arg1310 and EGCG (Figure 1) (Assumpção et al., 2020). Of note, LeDock and MOE predicted interactions between Asn1578 and F447-0397. The interaction with this particular residue could provide selectivity towards DNMT1 versus DNMT3A (Yu et al., 2019). This suggests that F447-0397 could be the starting point of an optimization project toward selective DNMT1 inhibitors.

FIGURE 9. The predicted binding mode (Vina) of F447-0397 with DNMT1 (PDB ID: 4WXX), showing (A) 3D and (B) 2D binding models.

Compound E760-5661 was an activator (142% enzymatic activity under the assay conditions), followed by the structurally related molecule L162-0591 (132% activity, Supplementary Table S3). Although this is an unexpected result (as we were looking for inhibitors), at least there is agreement that structurally similar compounds have similar (activation) profiles. The activation of DNMT1 also has clinical implications, as DNA hypomethylation has been related to various human diseases, such as cancer and cardiovascular diseases (Wilson et al., 2007; Pogribny and Beland, 2009). Other activators, also identified serendipitously, have been recently published (Rodríguez-Mejía et al., 2022). It remains to confirm the activity of compounds E760-5661 and L162-0591 in a cellular context. To this end, a global human DNA methylation assay could be performed, as recently reported in the identification of two DNMT1 activators (Rodríguez-Mejía et al., 2022). It also remains to explore, at the structural level, the activity cliffs identified in this work. Preliminary structural comparisons of the three compounds (F447-0397, F447-0509, and F447-0644) suggest a quite precise protein-ligand interaction of the active compound, F447-0397, with DNMT1. The structural analogs could bind to a different region that activates the enzymatic activity of DNMT1 at a certain level, possibly by relaxing allosteric autoinhibition of human DNMT1, as recently proposed for two activators of DNMT1. The mechanism of activation of DNMT1 is outside the scope of this study.

4 Conclusion and perspectives

This study contributes to the further development of inhibitors of DNMT1 through a comprehensive analysis of docking protocols and a comparison of the individual vs. consensus results. Herein, we also used different structure-based analyses to suggest the binding mode of a new DNMT1 inhibitor whose design was inspired by de novo ligand-based design. Notably, there are no previous reports of inhibitors of DNMTs proposed with de novo design. We conclude that, overall, out of the three docking programs, Vina had the best performance concerning the docking poses, as measured by the LE. Calculation or consideration of LE significantly enhanced the performance of Vina, LeDock, and MOE to prioritize compounds in SBVS of the 153 DNMT1is in ChEMBL. Regarding the consensus protocol, the best data fusion rules were the median and, more significantly, the minimum fusion, particularly considering the LE. The results emphasize the significance of considering the size of the ligand as part of the docking analysis.

We also report a small molecule (F447-0397) with a chemical scaffold that had not been previously published as a DNMT1 inhibitor. Docking simulations suggested a binding mode of the new inhibitor making interactions with Glu1168 (like co-crystallized SAH) and Arg1310 (like previous hits). As part of the study, we uncovered two activity cliffs: compounds with a chemical structure similar to F447-0397 but a very different activity profile.

One of the main perspectives of this work is performing additional biochemical assays at different testing concentrations of F447-0397 and conducting orthogonal assays to confirm its DNMT1 inhibitory activity. To this end, whole-genome methylation profiling could be assessed with techniques such as high-performance liquid chromatography with ultraviolet detection (HPLC-UV), liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS), ELISA-based methods, LINE-1 + pyrosequencing, PCR-based amplification fragment length polymorphism (AFLP), restriction fragment length polymorphism (RFLP) or a combination of both, or the luminometric methylation assay (LUMA) (Kurdyukov and Bullock, 2016; Pechalrieu et al., 2017). The activators of DNMT1 encourage investigating these compounds as potential biochemical probes to explore the role of DNMT1. Another perspective is to perform virtual screenings of chemical libraries with the newly developed consensus docking protocol (Figure 2B), including the screening of ChemDiv. The chemical synthesis and testing of compounds designed de novo and the structure-based optimization (including chemical synthesis and testing) of the active compound identified in this work, F447-0397, can also be pursued (the latter two perspectives are also outlined in Figure 2B). Of note, since several successful SBVS to identify DNMT1 inhibitors have been reported, a key point in future screenings is filtering chemical libraries that have not previously been screened, including newly developed focused libraries.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Author contributions

DP-R: Data curation, Formal Analysis, Methodology, Validation, Visualization, Conceptualization, Investigation, Writing–original draft, Writing–review and editing. AG-G: Investigation, Writing–original draft, Writing–review and editing, Data curation, Formal Analysis, Methodology, Validation, Visualization. RC-G: Investigation, Writing–original draft, Writing–review and editing, Data curation, Formal Analysis, Methodology, Validation, Visualization. HV-Q: Investigation, Writing–review and editing, Data curation, Formal Analysis, Methodology, Validation, Visualization. JA-T: Data curation, Formal Analysis, Investigation, Methodology, Validation, Visualization, Writing–review and editing. EL-L: Data curation, Formal Analysis, Investigation, Methodology, Validation, Visualization, Writing–review and editing, Writing–original draft. FS-G: Investigation, Methodology, Writing–review and editing. AC-H: Investigation, Methodology, Writing–review and editing, Data curation, Visualization. JM-F: Investigation, Writing–review and editing, Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing–original draft.

Funding

The authors declare that financial support was received for the research, authorship, and/or publication of this article. We thank DGAPA, UNAM, Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica (PAPIIT), grants No. IN201321 (to test the compounds) and IV200121 (to purchase the MOE academic license). We also thank the innovation space UNAM-HUAWEI for the computational resources to use their supercomputer under project-7 “Desarrollo y aplicación de algoritmos de inteligencia artificial para el diseño de fármacos aplicables al tratamiento de diabetes mellitus y cáncer.”

Acknowledgments

DP-R, AG-G, RC-G, JA-T, EL-L, FS-G, and AC-H thank Consejo Nacional de Humanidades, Ciencias y Tecnologías (CONAHCyT), Mexico, for the postgraduate scholarships 888207, 912137, 1099206, 1270553, 894234, 848061, and 847870. HV-Q is grateful to UNAM-HUAWEI for the scholarship under project no. 7, “Desarrollo y aplicación de algoritmos de inteligencia artificial para el diseño de fármacos aplicables al tratamiento de diabetes mellitus y cáncer.” We acknowledge K. Eurídice Juárez-Mercado for providing the code for the similarity searching. We also thank Chemaxon for the Marvin Research License. MarvinSketch 22.18 (Chemaxon, https://www.chemaxon.com) was used for drawing and displaying chemical structures.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fddsv.2023.1261094/full#supplementary-material

Abbreviations

3D, three-dimensional; AUC, area under the curve; CVS, consensus virtual screening; DNMT, DNA methyltransferase; DNMTis, inhibitors of DNA methyltransferases; DS, docking score; LBVS, ligand-based virtual screening; LE, ligand efficiency; MOE, Molecular Operating Environment; MW, molecular weight; PDB, Protein Data Bank; RMSD, root mean square deviation; ROC, Receiver Operating Characteristic; SAH, S-adenosyl-L-homocysteine; SAM, S-adenosyl-L-methionine; SBVS, structure-based virtual screening; TPSA, topological polar surface area; Vina, AutoDock Vina; VS, virtual screening.

References

Ala, C., Joshi, R. P., Gupta, P., Ramalingam, S., and Sankaranarayanan, M. (2023). Discovery of potent DNMT1 inhibitors against sickle cell disease using structural-based virtual screening, MM-GBSA and molecular dynamics simulation-based approaches. J. Biomol. Struct. Dyn., 1–13. doi:10.1080/07391102.2023.2199081
