Computational Tool for Fast in silico Evaluation of hERG K+ Channel Affinity

The development of a novel comprehensive approach for the prediction of hERG activity is herein presented. Software Phase has been used to derive a 3D-QSAR model, employing as alignment rule a common pharmacophore built on a subset of 22 highly active compounds (threshold Ki: 50 nM) against hERG K+ channel. Five features comprised the pharmacophore: two aromatic rings (R1 and R2), one hydrogen-bond acceptor (A), one hydrophobic site (H), and one positive ionizable function (P). The sequential 3D-QSAR model developed with a set of 421 compounds (randomly divided in training and test set) yielded a test set (Q2) = 0.802 and proved to be predictive with respect to an external test set of 309 compounds that were not used to generate the model (rext_ts2 = 0.860). Furthermore, the model was submitted to an in silico validation for assessing the reliability of the approach, by applying a decoys set, evaluating the Güner and Henry score (GH) and the Enrichment Factor (EF), and by using the ROC curve analysis. The outcome demonstrated the high predictive power of the inclusive 3D-QSAR model developed for the hERG K+ channel blockers, confirming the fundamental validity of the chosen approach for obtaining a fast proprietary cardiotoxicity predictive tool to be employed for rationally designing compounds with reduced hERG K+ channel activity at the early steps of the drug discovery trajectory.


INTRODUCTION
The hERG K + channel is a member of the Ether-à-go-go family encoded by KCNH2 gene that consists of four identical subunits (Trudeau et al., 1995). This channel is well known for its important role in the electrical activity of the heart that coordinates heart's beating. Mutations in KCHN2 gene lead to long-QT syndrome, which can lead to ventricular arrhythmia or other adverse effects on cardiovascular system, causing sudden death (Sanguinetti et al., 1995;De Bruin et al., 2005;Sanguinetti and Tristani-Firouzi, 2006). hERG K + channels are particularly sensitive to blockage by a large number of structurally diverse drugs and are critical antitargets in drug discovery process (Dumaine and Kirsch, 1998;Barbey et al., 2002;Redfern et al., 2003;Thomas et al., 2003a,b;Rajamani et al., 2006). The interaction of small molecules with hERG K + channel is one of the major issues encountered by the pharmaceutical companies related to the drug development process. Blockade of hERG K + channel has then become a severe limitation for the introduction of new drugs in the market. Moreover, in the recent years several blockbuster drugs including astemizole, droperidol, terfenadine, lidolazine, sertindole, cisapride, and chlorpromazine have been discontinued due to their relevant activity on hERG K + channel (Honig et al., 1993;Mohammad et al., 1997;Gottlieb, 1999;Barbey et al., 2002;Ray et al., 2004;Roden, 2004). Currently, pharmaceutical companies may account on high throughput methods for predicting the capability of compounds to interfere with hERG K + channels, thus identifying safer compounds at the early stage of the drug discovery process. However, the inherent costs associated to this screening procedure are dramatically elevated in terms of expenditure, amount of molecules consumed, and time of analysis. Accordingly, the development of a fast and reliable computational procedure for an early and trustworthy prediction of hERG K + channel interference would represent a useful and significant advance in this frame. Recently, for this purpose diverse ligand-based computational models have been developed (see ref Keserü, 2003;Aronov, 2005;Wang et al., 2013Wang et al., , 2016Braga et al., 2015 for further details). Moreover, three dimensional (3D) channel models using homology modeling technique were generated and adopted in molecular docking/molecular dynamics simulations for an in silico assessment of potential affinity of compounds for hERG K + channel (Farid et al., 2006;Boukharta et al., 2011;Durdagi et al., 2011Durdagi et al., , 2012Dempsey et al., 2014). This latter approach was successfully applied by us in the past for investigating hERG affinity of a class of antimalarials (Gemma et al., 2012) and antipsychotics agents (Butini et al., 2009(Butini et al., , 2010Brindisi et al., 2014). However, despite its accuracy, this approach is extremely time consuming. So, adopting a protocol based on homology model, molecular docking, and molecular dynamics is a challenging task for studying a large number of compounds. To improve accuracy of investigation of large set of molecules, we have developed and validated an inclusive 3D-QSAR model aimed at obtaining a fast predictive in silico tool able to unveil potential hERG K + channel activity of structurally diverse blockers at the early stages of our drug discovery and development process. This tool could be of pivotal importance for assisting us and others in designing active compounds for a given target endowed with limited hERG K + channel affinity.

RESULTS AND DISCUSSION
We have recently reported a series of predictive 3D-QSAR studies for different targets, in which a Phase common features pharmacophore has been used as the alignment rule for deriving quantitative structure-activity relationship models. These latter were used in virtual screening protocols and for rationally designing a library of compounds active on a given target (Brogi et al., 2011(Brogi et al., , 2013(Brogi et al., , 2015(Brogi et al., , 2016Castelli et al., 2012;Pasquini et al., 2012;Zaccagnini et al., 2017). The results of the abovementioned drug discovery investigations inspired us to generate a comprehensive computational model for predicting potential hERG-related cardiotoxicity of compounds during the early steps of the drug discovery process employing the same software. For this purpose, a data set of 421 compounds (see the Supplementary Material for further details and references) with hERG K + channel binding affinity spanning five orders of magnitude (from 3.3 nM of compound 1 to 54 µM of compound 421, Table  S1) and belonging to different structural classes, were used for deriving a predictive hERG 3D-QSAR model. Among them, 22 compounds selected from the literature (1-22, Figure 1, Table  S1) as potent hERG K + channel blockers such as astemizole (1) (Finlayson et al., 2001;Fletcher et al., 2002;Blackburn et al., 2006;Murphy et al., 2006;Zhu et al., 2006;Deacon et al., 2007;Coon et al., 2009;Owen et al., 2009;Patel et al., 2009;Liu et al., 2010;Aspiotis et al., 2011;Levoin et al., 2011) were chosen for developing a common features pharmacophore (Figure 2), subsequently used as alignment rule for generating the 3D-QSAR model. At this step the compounds were randomly divided, 60% in the training and 40% in the test set (Table S1). This choice was made to guarantee the inclusion of the positive information arising from 60% of the compounds included in the training set (corresponding to 253 compounds), for the development of the computational tool. At the same time the high number of compounds kept in the test set (40%, 168 compounds) assures an appropriate evaluation of the predictive power of the generated model by means of an exhaustive internal cross-validation. Remarkably, for reducing the weakness of the ligand-based approach, an extensive conformational analysis for the selected ligands was accomplished by means of MacroModel software (see experimental section for further details). Conformational analysis is relevant for enhancing the quality of the alignment for the compounds used to build the 3D-QSAR model as well as the consistency of the in silico tool (Brogi et al., 2011(Brogi et al., , 2013(Brogi et al., , 2015Durdagi et al., 2011;Zaccagnini et al., 2017).
The top-ranked hypothesis (AHPRR.42), obtained by means of Phase, was formed by five features: two aromatic rings (R 1 and R 2 ), one hydrogen-bond acceptor (A), one hydrophobic site (H), and one positive ionizable function (P). Pharmacophore AHPRR (shown in Figure 2A superimposed to astemizole (1), its inter-feature distances are reported in Figure 2B) highlighted the structural requirements for interacting with the channel, as confirmed by the comparison between the docked pose of astemizole (1) into hERG K + channel homology model (Masetti et al., 2008) with the alignment of 1 onto the pharmacophore model. Moreover, the analysis with molecular determinants reported by Farid and co-workers for a large number of hERG blockers is consistent with our developed computational model (Farid et al., 2006). Accordingly, it could be inferred that pharmacophore AHPRR actually accounts for relevant interactions between selected molecules and hERG channel. Indeed, it is well known that π-π stacking or hydrophobic contacts with key residues of hERG K + channel such as Y652 and/or F656 as well as ionic interactions are commonly found for a large number of hERG blockers. In fact, positively charged functional groups are present in many drugs that strongly bind hERG K + channels. The presence of atoms able to form polar contacts with T623 and S624 represents another important requirement for hERG inhibition. Therefore, the above-mentioned AHPRR hypothesis accounts for the key features, which play a pivotal role for hERG K + channel affinity, when at appropriate reciprocal distances. Consequently, it is not arbitrary to state that matching the pharmacophore may be predictive for binding to the channel. The AHPRR hypothesis was then used to align the molecules for the development of an atom-based 3D-QSAR model. As suggested by the Phase's user manual, the atom-based version of Phase's 3D-QSAR workflow was preferred to the pharmacophore-based one. Such a choice implied considering the whole contributions to hERG channel activity as deriving from all the structural features also including the steric clashes, rather than from only those from the pharmacophore relevant features. Models containing one to seven partial least squares (PLS) factors were generated, whose statistical parameters are detailed in Table 1. The model featuring seven PLS factors was chosen, since it better performed on the whole than those with fewer PLS factors. We have considered models containing up to seven PLS factors to avoid over-fitting phenomena. The high correlation and cross-validated correlation coefficients (r 2 = 0.911 and Q 2 = 0.802, respectively) of the selected model together with the high Pearson R-value (R-Pearson = 0.901), suggested a close correspondence between estimated and experimental Kivalues. These parameters are indicative of a computational model characterized by a strong predictive power and significance.
A scatter plot of experimental against predicted activities was built to assess the results (Figure 3), which showed that Ki-values were effectively estimated for compounds belonging to training and test sets (Table S1). This latter, coupled with the limited number of PLS factors, the small RMSE-value, standard deviation (SD), variance ratio (P), and the large F-value supported the reliability of the approach.
3D plots of the crucial volume elements occupied by ligands were employed to visualize the outcome of the 3D-QSAR model. In Figure 4 is depicted the 3D plot representation of the model superimposed to high (1, 2, 3, and 13), moderate (35, 56, 398, and 409), and less active derivatives (54, 57, 75, and 421; Figures 4A-N, respectively). In this illustration, the cubes represent positive (blue cubes) and negative (red cubes) coefficients. For a ligand possessing atoms or functional groups occupying these volumes an increase or a decrease of activity  could be predicted. Notably, compound 1 as well as other highly active molecules mainly occupy the blue regions (Figures 4A-D), while the less active compounds such as 421 largely resides on the red regions (Figures 4I-N).
After the generation of the 3D-QSAR model and, in order to perform its theoretical validation, an external test set was selected from the literature. This set was composed of 309 unrelated compounds not used for generating the model, with different inhibitory potency against hERG K + channel (ranging from 0.28 nM to 51 µM; Table S2 in the Supplementary Material). We have included very active compounds in order to stress the model for assessing its actual predictive power. Satisfyingly, the activities of the compounds included in the external test set were satisfactorily estimated by our model (Table S2). In the scatter plot depicted in Figure S1, the experimental and predicted pKi-values of these compounds are also displayed (correlation coefficient r 2 ext_ts = 0.860). This outcome provided further indication that the correlation of the model was not accidental.
Furthermore, the mentioned model was submitted to a further in silico validation by using two different approaches. In order to perform this step, we applied a commonly used validation method based on the generation of decoys set. Starting from highly active compounds used for the model generation (1-22), other compounds with relevant potency against hERG K + channel (cut-off Ki ≤ 150 nM; Table S1) and active molecules from the external test set (cut-off Ki ≤ 150 nM; Table S2) for a total of 111 actives (Table S3), we generated 7,250 decoys by means of Database of Useful Decoys: Enhanced (DUD-E) server Mysinger et al., 2012; see Materials and Methods section for further details about the selection of active compounds). This procedure is largely used to assess the ability of in silico tools such as 3D-QSAR models, to discriminate between inactive or active derivatives Thangapandian et al., 2011;Braga and Andrade, 2013;Krishna et al., 2014;Brogi et al., 2015Brogi et al., , 2016. Based on the obtained results, the assessment clearly demonstrated the validity of the proposed model. The analysis of the results (Figure 5A) of the decoys set revealed a trend where the inactive compounds fail to completely fulfill all the pharmacophore features, thus making their predicted activity very poor or absent. On the contrary, active compounds were reasonably well estimated by the 3D-QSAR model. Notably, we have found 39 actives in the top fifty ranked compounds. Moreover, this qualitative analysis was also supported by Enrichment Factor (EF) and Goodness of hit (GH) values. The statistical parameters obtained from the model validation were reported in Figure 5A and calculated as reported in the Materials and Methods section.
In particular, a database of 7,361 compounds (D) including 111 known active molecules (A) was used to early validate our hERG 3D-QSAR model herein discussed. The database screening results ( Figure 5A) displayed that 50 molecules were found as hits (Ht) (cutoff value corresponding to potential purchasable compounds after a virtual screening protocol). Among Ht, 39 compounds (Ha) are known hERG blockers (A). From screening we calculated an EF = 51.72, meaning that it could be about 52 times more probable to find active molecules from chemicaldatabases with respect to chance. The estimated GH score value of 0.67, larger than 0.5, demonstrates a great consistency of the model, suggesting that the presented computational model could be efficiently used for designing compounds with reduced hERG K + channel affinity.
The 3D-QSAR model was evaluated by means of the receiver operating characteristic (ROC) curve, for assessing the balance between model sensitivity and specificity (capability to discover true positives and capability to avoid false positives, respectively; Triballeau et al., 2005;Zhao et al., 2009;Zaccagnini et al., 2017). Compounds included in the validation database (7,361 compounds) were virtually screened by using the generated computational tool and ordered according to their estimated pKi. The output of ROC curve provided a score for assessing the performance of the model. In particular, the closer the ROC score is to 1.0, the better is the model at discriminating active from inactive compounds. The AUC score of our presented in silico model was found to be 0.96 (Figure 5B), showing high confidence of the 3D-QSAR model, indicating that the computational tool possessed a rationale for virtual screening, and it could be effectively used to rationally design compounds with reduced hERG activity. Notably the performance of our model appears to be comparable with the previous published models (Aronov, 2005;Wang et al., 2013Wang et al., , 2016Braga et al., 2015).

CONCLUSION
In summary, a pharmacophore hypothesis was built by applying ligand-based pharmacophore generation workflow, implemented in Phase, followed by the development and validation of a 3D-QSAR model, by using PLS analysis for estimating the hERG K + channel activity, using a large set of compounds (training set, test set, and an external test set for a total of 730 molecules). The aim of this approach was to develop a fast and reliable inhouse computational tool for the prediction of hERG K + channel activity during the drug discovery process. The computational model developed and validated by us is endowed with strong predictivity, as herein described, and might be useful for the rational design and optimization of new inhibitors against a given target with reduced potential hERG-related cardiotoxicity.

3D-QSAR Model Generation
3D structure building, pharmacophore mapping and 3D-QSAR studies were carried out by means of Maestro 10.1 (Schrödinger, LLC, New York, NY, 2015). Phase 4.2 (Schrödinger, LLC, New York, NY, 2015), implemented in Maestro suite, was used to generate pharmacophore hypotheses and 3D-QSAR model for hERG K + channel. Conformers of each derivative were generated by means of MacroModel 10.7 (Schrödinger, LLC, New York, NY, 2015), using the OPLS_2005 force field (Jorgensen et al., 1996). The solvent effects are simulated employing the analytical Generalized-Born/Surface-Area (GB/SA) model (Still et al., 1990), and no cutoff for nonbonded interactions was selected. Polak-Ribiere conjugate gradient (PRCG) method with 2,000 maximum iterations and 0.001 gradient convergence threshold was employed. The conformational searches were carried out by applying MCMM (Monte Carlo Multiple Minimum) torsional sampling method. Automatic setup with 21 kJ/mol (5.02 kcal/mol) in the energy window for saving structure and a 0.5 Å cutoff distance for redundant conformers was used.
Phase was employed to develop common features hypotheses. Pharmacophore feature sites for the compounds were specified by a set of features well-defined in Phase as hydrogen-bond donor (D), hydrogen-bond acceptor (A), aromatic ring (R), positively charged group (P), negatively charged group (N), and hydrophobic group (H). Twenty-two active compounds (Figure 1 and in Table S1 in the Supplementary Material) were selected for generating the pharmacophore hypotheses for hERG K + channel. Common pharmacophore hypotheses were identified, scored and ranked by means of conformational analysis and tree-based partitioning techniques. The best ranked pharmacophore model obtained by Phase [AHPRR, shown in Figure 2 superimposed to astemizole (1)], consisted of five features: two aromatic rings (R 1 and R 2 ), one hydrogen-bond acceptor (A), one hydrophobic site (H), and one positive ionizable function (P). This pharmacophore was chosen for further 3D-QSAR analysis. All molecules used for QSAR studies (Table S1) were aligned to the selected pharmacophore hypothesis. In this study, we set a pKi threshold for the selection of active and inactive ligands. Compounds were chosen based on the displacement assay against hERG using the labeled [ 3 H]dofetilide. In particular, compounds that showed a Ki comprised between 5 and 54 µM were considered as inactive compounds. Moderate inhibitors were considered compounds with Ki between 50 nM and 5 µM, while compounds possessing a Ki ≤ 50 nM were considered potent inhibitors of hERG K + channel and consequently as active in hERG blockage during 3D-QSAR model generation, a very similar selection was already reported in literature (Durdagi et al., 2011). Notably only molecules with experimentally definite inhibitory potency have been selected to develop the in silico model, for avoiding possible faults arising from the inclusion in the set of molecules with uncertain activity. Atom-based QSAR models were generated for hERG hypothesis using the 253 compounds in the training set (421 compounds were randomly divided 60% in the training and 40% in the test set) and a grid spacing of 0.5 Å. QSAR models containing one to seven PLS factors were produced, and an internal cross-validation was achieved employing leave-nout (LnO) technique as specified in Phase user manual (Phase, version 4.2, User Manual, Schrödinger press, LLC, New York, NY, 2015).

3D-QSAR Model Validation
Extensive model validation was performed using an external test set of compounds (309 molecules) not used for generating and cross validating the model. Compounds were prepared by using Maestro, LigPrep and MacroModel adopting the same procedure for preparing the molecules used to derive the model.
DUD-E web server (http://dude.docking.org access date November 2016) was employed to generate a set of decoys starting from the active compounds selected to develop the pharmacophore model, other compounds with relevant activity against hERG K + channel and active compounds in the external test set, for a total of 111 active compounds (Table S3). For this set of active ligands, DUD-E server made available 7,250 inactive ligands from a subset of the ZINC database (http://zinc.docking.org access date November 2016) filtered by means of Lipinski rules for drug-likeness, for a total of 7,361 compounds between inactives and actives. Each of these inactive decoys was selected to bear a resemblance to the reference molecule in terms of physico-chemical properties but a divergence from it in 2D structure (e. g. large difference of Tanimoto coefficient between decoys and active compounds). The generated decoys were divided in 145 smiles files, downloaded from the website, imported into Maestro and prepared by means of LigPrep 3.3 (Schrödinger, LLC, New York, NY, 2015) to properly convert smiles into three-dimensional structures. Subsequently, in order to perform a minimization and a conformational search of the obtained structures MacroModel program was adopted (the same parameters reported for ligand preparation were applied). A single file with conformers of active molecules and decoys was produced and submitted to the software for evaluating the activity against hERG K + channel using the 3D-QSAR model employing "search for matches" option. After decoys generation and the prediction of pKis, the Güner and Henry score, i.e., goodness of hit-list (GH) and the enrichment factor (EF) value were estimated using the Equations 1, 2, respectively.

EF =
Ha/Ht Ht represents the total number of compounds in the hit list found by virtual screening, Ha is the total actives found by virtual screening considering the top 50-ranked position (positions comprise within the cutoff value). The total number of compounds (Ht) might represent the amount of molecules to purchase after a virtual screening protocol and almost the 1% of the considered database (D). A represents the total of the active derivatives enclosed in the database, and D stands for the total number of molecules existing in the set. The GH score ranges from 0 to 1. The GH score 0 indicates a null model; while the GH score 1 denotes an ideal model. Additionally, by using the equations 3 and 4, the % yield of actives (%YA) and % ratio of actives (%RA) were calculated, respectively. %YA = Ha Ht * 100 Enrichment Calculator (enrichment.py) script (https://www. schrodinger.com/scriptcenter) was employed to assess the predictive power of the 3D-QSAR model by a Receiving Operator Curve (ROC). The mentioned script calculates the enrichment metrics, including area under the receiver-operating characteristic curve (AUC), from virtual screening by means of the output structure file and a list of known active molecules. The output of the screening protocol, employing decoys and active molecules, consisted of a list of molecules ranked by the predicted activity from the top-predicted molecules as estimated by the 3D-QSAR model. These ranking data and a list file of active compounds were used as input for enrichment.py application.

AUTHOR CONTRIBUTIONS
GC carried out the computational experiments and performed the acquisition, analysis, and interpretation of data. SG evaluated the integrity of every section of the manuscript and collaborated in writing the introduction. GiCa drafted the work and revised it critically for medicinal chemistry content; and approved the submitted version. SB conceived, designed, and performed the computational experiments, supervised the overall work, wrote and revised the manuscript. StB collected the literature focusing on the selection of compounds used in this study and revised the manuscript. MB analyzed the data, contributing in writing and revising the manuscript.