Experimentally Validated Pharmacoinformatics Approach to Predict hERG Inhibition Potential of New Chemical Entities

The hERG (human ether-a-go-go-related gene) encoded potassium ion (K+) channel plays a major role in cardiac repolarization. Drug-induced blockade of hERG has been a major cause of potentially lethal ventricular tachycardia termed Torsades de Pointes (TdPs). Therefore, we presented a pharmacoinformatics strategy using combined ligand and structure based models for the prediction of hERG inhibition potential (IC50) of new chemical entities (NCEs) during early stages of drug design and development. Integrated GRid-INdependent Descriptor (GRIND) models, and lipophilic efficiency (LipE), ligand efficiency (LE) guided template selection for the structure based pharmacophore models have been used for virtual screening and subsequent hERG activity (pIC50) prediction of identified hits. Finally selected two hits were experimentally evaluated for hERG inhibition potential (pIC50) using whole cell patch clamp assay. Overall, our results demonstrate a difference of less than ±1.6 log unit between experimentally determined and predicted hERG inhibition potential (IC50) of the selected hits. This revealed predictive ability and robustness of our models and could help in correctly rank the potency order (lower μM to higher nM range) against hERG.

1 Research Center for Modeling and Simulation, National University of Science and Technology, Islamabad, Pakistan, 2 Victor Chang Cardiac Research Institute, Sydney, NSW, Australia, 3 School of Chemistry, The University of Sydney, Sydney, NSW, Australia The hERG (human ether-a-go-go-related gene) encoded potassium ion (K + ) channel plays a major role in cardiac repolarization. Drug-induced blockade of hERG has been a major cause of potentially lethal ventricular tachycardia termed Torsades de Pointes (TdPs). Therefore, we presented a pharmacoinformatics strategy using combined ligand and structure based models for the prediction of hERG inhibition potential (IC 50 ) of new chemical entities (NCEs) during early stages of drug design and development. Integrated GRid-INdependent Descriptor (GRIND) models, and lipophilic efficiency (LipE), ligand efficiency (LE) guided template selection for the structure based pharmacophore models have been used for virtual screening and subsequent hERG activity (pIC 50 ) prediction of identified hits. Finally selected two hits were experimentally evaluated for hERG inhibition potential (pIC 50 ) using whole cell patch clamp assay. Overall, our results demonstrate a difference of less than ±1.6 log unit between experimentally determined and predicted hERG inhibition potential (IC 50 ) of the selected hits. This revealed predictive ability and robustness of our models and could help in correctly rank the potency order (lower µM to higher nM range) against hERG.

INTRODUCTION
The human ether-a-go-go-related gene (hERG) encoded potassium ion (K + ) channel is an important component of the pore-forming α-subunits that conduct the rapidly activated delayed rectifier potassium current (I Kr ) which is a major repolarization current of the cardiac action potential (Sanguinetti et al., 1995;Vandenberg et al., 2012).
A prolongation of the cardiac action potential and the QT interval on the surface electrocardiogram (ECG) has been associated with loss of function mutations in hERG (Yang et al., 2009;Sun et al., 2013;Zhang et al., 2013) or drug-trapping inside the central cavity of the hERG potassium channel and thus, may predispose to life-threatening ventricular tachyarrhythmia "Torsade-de-points" (TdP) (Roden, 2004;De Bruin et al., 2005). Other factors such as, electrolyte imbalance, ischemia are also reported (Sauer and Newton-Cheh, 2012). Several drugs The International Conference on Harmonization (ICH), S7A (Food and Drug Administration, 2001) and S7B (Food and Drug Administration, 2005a) guidelines regarding technical requirements for registration of pharmaceuticals for human use requires preclinical assessment of QT prolongation risk prior to first administration in human. Although, hERG inhibition and the resulting risk of QT prolongation does not preclude clinical development, there are significant costs associated with this since any drugs that show hERG block must also be assessed clinically using a thorough QT/QTc study (Food and Drug Administration, 2005b) Furthermore, a compound without hERG liability is at a commercial advantage as compared to a competitor compound known to block the hERG channel.
According to the current guidelines on pre-clinical testing, the experimental hERG blockade ability of a compound is typically quantified in terms of concentration of the half-maximal inhibition (IC 50 ) that is the most simple and amendable way (Redfern et al., 2003). However, kinetics of the channel, state dependent binding properties (Lee et al., 2016) and temperature dependent properties are also important factors in determining pro-arrhythmic risk assessment (Windley et al., 2018). The gold-standard method for the assessment of hERG liability is the patch-clamp electrophysiological assay (Hamill et al., 1981) on hERG transfected cells. However, this assay is costly, and not well-suited for extensive screening. Various other strategies including radiolabeled binding assays (Finlayson et al., 2001;Chiu et al., 2004), functional assays (Tang et al., 2001), and rubidium efflux assays (Chaudhary et al., 2006) have been developed but, these assays often result in significant under or overestimation of hERG potency when compared with electrophysiological methods (Hancox et al., 2008).
To overcome these limitations, several in silico models including, classification or machine learning approaches (Dubus et al., 2006;Sun, 2006;Li et al., 2008Li et al., , 2017Thai and Ecker, 2008a;Kireeva et al., 2013;Liu et al., 2014;Chavan et al., 2016;Wang et al., 2016;Sun et al., 2017;Alves et al., 2018;Lu et al., 2018;Siramshetty et al., 2018;Wacker and Noskov, 2018), pharmacophore models Aronov, 2005Aronov, , 2006Tan et al., 2012;Kratz et al., 2014;Wang et al., 2016;Chemi et al., 2017), 2D QSAR (Keserü, 2003;Bains et al., 2004;Seierstad and Agrafiotis, 2006;Song and Clark, 2006) and 3D QSAR (Ekins et al., 2002;Pearlstein et al., 2003) such as, comparative molecular similarity indices analysis (CoMSIA; Pearlstein et al., 2003), comparative molecular field analysis (CoMFA; Cavalli et al., 2002) and hologram QSAR (hQSAR; Keserü, 2003) have been developed to assess the pro-arrhythmia properties of drugs and their derivatives (Raschi et al., 2009). However, a generic 3D descriptors model on diverse data set of hERG inhibitors possessing good statistical parameters has not been reported yet. Here, we aim to develop a protocol using both ligand and structure based pharmacoinformatics approaches for the identification of general 3D structural features involved in hERG inhibition. Toward this goal, various GRid INdependent molecular Descriptor (GRIND) models have been developed using a diverse dataset of 207 hERG blockers extracted from the literature. Compounds with best activity/molecular weight (Hopkins et al., 2004) and activity/lipophilicity ratio (Leeson and Springthorpe, 2007; see Figure 1) were selected for building a pharmacophore hypothesis of hERG liability. Finally, proposed pharmacophore was validated by testing our identified hits using whole-cell patch clamp analysis on transfected hERG-CHO cells. Our model predictions show a difference of 1.6 and <0.1 log units from the experimentally determined pIC 50 values of hits in the nanomolar (nM) to micromolar range (µM), respectively. Overall, our model can aid in the correct prediction of the hERG liability trends of NCEs from nM to µM range.

Dataset Collection and Refinement
A dataset of 1,428 compounds with known hERG inhibitory potency (IC 50 ) values were collected from literature databases including, 938 compounds from the ChEMBL database (Bento et al., 2013), 140 compounds from the Fenichel database http:// www.fenichel.net/pages/Professional/subpages/QT/Tables/ pbydrug.htm) and 350 compounds from publications by Kramer et al. and Polak et al. (Kramer et al., 2008;Polak et al., 2009) as shown in Figure S1.
Inconsistencies in the dataset were removed by the application of various data refinement steps see Figure S2. Initially, all duplicates and fragments (molecular weight <200) in the data set were removed followed by manual correction of stereochemistry of stereoisomers. In order to remove any bias in the biological experiments, only those compounds for which hERG inhibitory potency (IC 50 ) was determined using whole cell patch clamp technique were shortlisted. However, IC 50 values of the selected compounds against hERG were calculated using two different mammalian expression system namely, Human Embryonic Kidney (HEK293) cell lines and the Chinese Hamster Ovary (CHO). Overall, a strong positive correlation R 2 = 0.91 has been observed between reported pIC 50 values of compounds tested using both HEK293 and CHO cell lines (Kramer et al., 2008;Polak et al., 2009; Figure S3). Therefore, compounds tested by whole cell patch clamp experiment using both HEK293 and CHO cell lines were kept in the final data set of 207 hERG inhibitors (see Supporting Information SMILES.csv file).
The 3D structures of all compounds in the dataset were generated using software MOE (I, 2013). Protonation and correction of partial charges were performed followed by energy minimization using MMFF94x force field (Halgren, 1996). Diverse subset selection procedure as reported by K.M. Thai et al. (Thai and Ecker, 2008b) was used to divide the data set into 80% training and 20% test set for further GRid-Independent Molecular Descriptor Analysis (GRIND).

Grid-Independent Molecular Descriptor (GRIND)
GRIND are alignment-free molecular descriptors, linked with 3D structural conformations of the dataset (Caron and Ermondi, 2007). Therefore, we used standard 3D conformations (Gasteiger et al., 1990), 3D energy minimized conformations (Gill et al., 1981), induced fit docking conformations (Sherman et al., 2006) in open state cryo-structure of hERG (Wang and MacKinnon, 2017), and molecular conformations obtained from stochastic search algorithm (Ferguson and Raber, 1989; see Supporting Information for conformational analysis details) to develop four different GRIND models. Briefly, molecular interaction fields (MIF) using four different probes including, "DRY" (hydrophobic), "N1" as (neutral flat amide: hydrogen bond donor hotspots), O (sp2 carbonyl oxygen: hydrogen bond acceptor) and TIP (molecular shape) were computed by software package Pentacle v 1.07 (Durán Alcaide, 2010). Total energy at each point was computed as the sum of Lennard-Jones (Elj), electrostatic (Eel), and hydrogen bond (Ehb) as described in Equation (1) by iteratively placing each probe at different GRID steps.
AMANDA algorithm (Durán et al., 2008) was applied for the extraction of the most relevant MIF using default energy cut-offs −0.5, −2.6, −4.2, −0.75 values for DRY, O, N1, and TIP probes, respectively. Consistently Large Auto And Cross Correlation (CLACC) algorithm was used to encode the prefiltered nodes into four auto (DRY-DRY, O-O, N1-N1, TIP-TIP)  and six cross (DRY-O, DRY-N1, DRY-TIP, O-N1, O-TIP, N1-TIP) correlograms (Durán Alcaide, 2010). These correlograms depict favorable/unfavorable interactions at Virtual Receptor Site (VRS). In order to reduce any redundancies in the large GRIND variable matrix, original variables are converted into default five principal components (PC) and latent variables (LV) for principal component analysis (PCA) and partial least square (PLS), respectively (Mannhold et al., 2006). Each independent GRIND model was built using Leave One Out (LOO) cross validation procedure of partial least square (PLS) analysis. Overall, the statistical significance of the model was determined by q 2 and r 2 and standard deviation error prediction (SDEP). The q 2 is the predictive ability of a model obtained by cross-validation procedure. Whereas, r 2 is an index of model fitting on the training set defined as correlation coefficient of determinent. However, the standard deviation error of the predictions (SDEP) is an index of the model predictive ability obtained by cross-validation (Durán Alcaide, 2010).
In order to further probe the robustness of our final training model, two independent test sets were used for external validation. Test set I (41 compounds) was obtained after diverse subset selection procedure of the original dataset of know hERG inhibitors (Supporting Information SMILES.csv file) whereas, test set II (8 compounds) consists of an antimalarial Triazolopyrazine (TP) analogs possessing hERG liability as shown in Figure 4 taken from publically available data of Open Source Malaria (OSM) database (Williamson et al., 2016).

Template Selection for Pharmacophore
In order to correlate VRS with 3D structural features of the hERG inhibitors, various pharmacophore models have been developed using selected templates with best activity/lipophilicity (LipE) and activity/molecular weight (LE) ratio (see details-in Supporting Information Methodology section and Figure S4).
Ligand-protein interaction guided pharmacophore models were developed using docked conformations of the selected templates (Figure 1) in open state cryo-structure (Wang and MacKinnon, 2017) and closed state homology model of hERG proposed by Stansfeld et al. (2007). The complete docking protocol and pose evaluation procedure have been discussed in Supporting Information (Figure S5).

Pharmacophore Modeling
The pharmacophoric sites and Gaussian radius size were used for optimizing the pharmacophoric hypothesis. Each pharmacophore model was tested against 207 compounds in our data set by setting an active threshold value of IC 50 ≤ 40 µM. Quality of each pharmacophore was assessed by calculating Mathews correlation coefficient (MCC) as described in Equation (2).

MCC
Overall, a final model with best statistical parameters was further used for virtual screening.

Virtual Screening
Finely selected pharmacophore model was used for virtual screening of the online ChemBridge database (Groom et al., 2016; http://www.chembridge.com/internal/) and Open Source Malaria (OSM) database (Williamson et al., 2016; https://ses.library.usyd.edu.au/handle/2123/15389). After pharmacophore screening 4,095 and 300 hits were obtained by Chembridge and OSM database, respectively. Various hit selection filters like Lipinski rule of five were applied to identify drug-like compounds (Lipinski et al., 2012), prediction of pIC 50 values through GRIND model in order to consider potent compounds. The hERG inhibitory potency (pIC 50 ) values of obtained hits were predicted by our final GRIND model using Pentacle software (Durán Alcaide, 2010). Finally, two compounds including one from ChemBridge (ID: 5931690) and one from OSM database (ID: OSM-S-31) with predicted IC 50 values in nanomolar (nM) and micromolar (µM) range were selected for experimental evaluation by whole cell patch clamp technique. The NMR spectrum and associated data of both selected hits have been provided in Figures S7, S8.

Molecular Biology/Cell Culture
Chinese hamster ovary (CHO) cells stably transfected Kv11.1 were purchased from American Type Culture Collection (ATTC reference PTA-6812). Whereas, the CHO cells were cultured in Hams F12 nutrient mix (ThermoFisher Scientific, Waltham, USA) containing 5% fetal bovine serum (Sigma-Aldrich, Sydney, Australia) and maintained at 37 • C in 5% CO 2 . All chemicals were purchased from Sigma-Aldrich (Sydney, Australia) unless otherwise stated.

Electrophysiology
CHO cells were prepared 24 h before the experiment. The whole cell patch clamp electrophysiology studies were performed at 22 • C. Glass capillary patch electrodes with resistance 2-4 M were pulled with borosilicate glass using vertical two-stage puller (Harvard Apparatus, Holliston, USA). The pipettes were filled with internal solution containing (in µM):120 potassium gluconate, 20 KCl, 10 HEPES, 5 EGTA, 1.5 Mg-ATP and pH adjusted to 7.2 with KOH. The cells were superfused with external bath solution that contained (in µM): 130 NaCl, 12.5 glucose, 10 HEPES, 5 KCl, 1 MgCl 2 , 1 CaCl 2, and pH was adjusted to 7.4 with NaOH. The calculated liquid junction potential for these solutions was −15 mV (Barry, 1994) and corrected by adjusting voltage pulse protocol for all experiments.
The whole-cell patch clamp mode was applied to cells in the voltage clamp configuration using Axopatch 200B amplifier (Molecular Devices, Sunny Vale, USA). Current signals were digitized at 5 kHz, filtered at 1 kHz and stored on IBMcompatible PC interfaced with a Digidata 1440A analog to digital converter (Molecular Devices, Sunny Vale, USA). Initial series resistance values were 2-5 M which was compensated by at least 80% in all experiments. Leak subtraction was performed manually offline using acquisition software pClamp 10.2 (Molecular Devices, Sunny Vale, USA). A reusable microfluidic device (Dynaflow Resolve, Collectricon, Mölandal, Sweden) with <30 ms solution exchange time (Hill et al., 2014) was used for drug delivery. With the dynaflow system, different drug concentrations under laminar flow can be delivered to the cell.

Compounds Solution
Both test compounds were prepared as a stock solution by dissolving in DMSO with a final concentration of 0.01% (v/v) in recording solution. The final concentration of DMSO if below 0.1% (v/v) has no effect on the activity of hERG channel (Walker et al., 1999).

Voltage Protocol
To measure drug block "Step-Ramp" protocol was used (Crumb et al., 2016). Cells were depolarized from a holding potential to −80 to +20 mV for 500 ms to fully activate the channels. Cells were again repolarized to −80 mV over 250 ms.

Data Analysis
For the analysis of drug block Clamp fit 10.2 software (Molecular Devices, Sunny Vale, USA) was used. Hill equation (Equation 3) was used for analyzing steady state dose response Where [x] represents the concentration of compound and n H is the Hill coefficient (the slope parameter). The IC 50 represents the concentration where there is a 50% blockage of channel current. Prism software was used for carrying out the statistical analysis.
FIGURE 2 | A plot between first two principal components (PC) illustrating descriptor space of 166 training set (hollow circles) and 41 test set (filled circles).
Frontiers in Pharmacology | www.frontiersin.org  Figure 6) The bold number represents finally selected model.

RESULT
The structural variance of the training data was determined by principal component analysis (PCA; Wold et al., 1987) using complete sets of GRIND variables. Principal component scores of the training data vary from −5.8 to 6.2 as shown in Figure 2. Also projection of test set I and II reflect a principal component space. Briefly, principal component scores of first two principal components of GRIND variables explained only 37% of 3D structural variance of the dataset and divided it into three main clusters (I, II and III) as shown in Figure 2.
The chemical scaffolds of compounds in cluster I exhibit one hydrogen bond donor group that is complementary to O (carbonyl oxygen) probe at VRS that defines the most relevant regions for ligand-protein interactions. Compounds in cluster II exhibit one hydrogen bond acceptor and one hydrogen bond donor group in respective chemical scaffolds that depicts a complementary N1 (amide nitrogen) and O (carbonyl oxygen) probe, respectively at VRS. The remaining compounds in cluster III contain a maximum of three hydrogen bond acceptors and four hydrogen bond donor groups but otherwise cover a diverse range of chemical scaffolds. One representative compound from each cluster and the GRIND descriptor features of first two principal components are shown in Figure S10.
The importance of hydrogen bonding properties in hERG inhibition is already evident from various investigations (Aronov and Goldman, 2004;Choe et al., 2006;Farid et al., 2006;Chemi et al., 2017). However, the impact of the number of hydrogen bond acceptor and donor groups and their mutual distances in a given chemical scaffold on the hERG inhibition (IC 50 ) has not been reported so far. Therefore, in the present investigation, the impact of number and mutual distances of hydrogen bond acceptor/donor groups on hERG inhibition potency of diverse chemical scaffolds has been illustrated with the help of the partial least square analysis (PLS) of GRid-INdependent molecular Descriptors (GRIND).

Grid-Independent Molecular Descriptor (GRIND) Analysis
Partial least square (PLS) analysis using Leave One Out (LOO) cross-validation procedure (Elisseeff and Pontil, 2003) on individual set of molecular conformations of the data was performed to develop four independent GRIND models using the software package Pentacle v 1.07 (Durán Alcaide, 2010). However, only the GRIND model that was developed using standard 3D molecular conformational set of data showed statistically acceptable results with q 2 of 0.54, r 2 of 0.62 and Standard Deviation of Error Prediction (SDEP) of 0.94 (see Table 1). In order to further improve the statistical parameters and to remove the respective inconsistent GRIND variable, the fractional factorial design (FFD) algorithm (Baroni et al., 1993) was applied to each model as described by Pastor et al. (Pastor et al., 2000). Briefly, mutual comparison of statistical parameters of respective models after 1st and 2nd FFD cycles revealed a statistically significant final GRIND model with standard 3D molecular conformations of the data set after 2nd cycle of FFD as shown in Table 1. Figure 3 shows a graph between q 2 and r 2 values of the final GRIND model up to the fifth latent variables (LV5). It illustrates a gradual increase in r 2 values up to LV5. However, q 2 values showed a decreasing trend after LV2. Therefore, a model with optimum statistics (q 2 = 0.63, r 2 = 0.69 and standard error of prediction (SDEP) = 0.84) was achieved at second latent variable (LV2), which was further used to correlate the structural variance of the data in general, and impact of number and mutual distances of hydrogen bond acceptors and donors in particular, with the hERG inhibition potential (pIC 50 ).
The plot between experimental vs. predicted hERG inhibitory potency values (pIC 50 ) of the training set and test set I (Information SMILES.csv file) and test set II (Figure 4) obtained after Leave One Out (LOO) cross-validation (Elisseeff and Pontil, 2003) and external validation procedure (Kiralj and Ferreira, 2009), respectively is shown in Figure 5. All compounds in training (R 2 : 0.67) as well as test set I (R 2 : 0.60) were predicted within a range of ±1.6 log units between experimental and predicted hERG inhibitory potency (pIC 50 ) values. However, one compound of the test set II OSM-S-189 showed an outlier behavior with a difference of 1.8 log unit differences between experimental (pIC 50 : 4.4) predicted (pIC 50 : 6.2) hERG inhibition potential. Subsequently, the experimental protocol using whole cell patch clamp technique also delineates that our model may overestimate the affinity of drugs with experimentally determined pIC 50 values in the high µM range. The final GRIND model was used to probe 3D structural features of the data that might contribute to the drug trapping phenomena. Briefly, PLS coefficient profiles of auto and crosscorrelograms in Figure 6 illustrates that DRY-DRY, TIP-TIP, DRY-TIP, and DRY-N1 pairs of variables map the 3D structural features of the data that play a major role in hERG inhibition potential (IC 50 ). Whereas, O-N1 and N1-N1 variables depict the 3D structural features of the data that are more prominent in least potent hERG inhibitors (IC 50 : 214-3,000 µM).
The highest peak in DRY-DRY auto correlogram in Figure 6 delineates the presence of two hydrophobic regions at a distance of 14.0-14.4 A • in virtual receptor site of hERG inhibitors exhibiting IC 50 values from 0.001 to 86 µM. In the present dataset of hERG inhibitors, this complements the distance between two or more aryl or aromatic moieties. However, these features are present at a shorter distance (5.6-6.0 A • ) in virtual receptor site around least active hERG inhibitors. Similarly, the highest peak in TIP-TIP auto correlogram in Figure 6 represents the pair of steric hotspots that define the 3D molecular shape of the hERG blockers. It elucidates the presence of two steric hotspot regions at a mutual distance of 20.0-20.4 A • around hERG inhibitors with IC 50 values ranges from 0.01 to 300 µM. Overall, both DRY-DR and TIP-TIP variables revealed the presence of polyaromatic rings on either side of the molecules as shown in Figure 7. It may point that hydrophobic molecular boundaries, perhaps owe to the unique shape of the hERG binding site as approximated by David et al. (Fernandez et al., 2004) by correlating the variation in hERG inhibition potential of various drugs with the change in van der Waals hydrophobic surface area of side chain residue Tyr_652 and Phe_656.
Similarly, DRY-N1 correlogram (Figure 6) maps the distance of a hydrophobic hotspot from a hydrogen bond donor hotspot region present at the VRS as shown in Figure 7. It has been observed that these two contours are present at a distance of 10.8-11.8 A • in VRS of highly potent hERG inhibitors with IC 50 range from 0.023 to 0.74 µM whereas, in least active hERG blockers (IC 50 > 100 µM) these two contours are present at 5.2-5.6 A • apart. Overall, our results show that one of the hydrophobic region (DRY1: yellow hotspots, enclosed by a circle Figure 7) might represent the most crucial contour as the distance of other pharmacophoric features including second hydrophobic region (DRY2), the steric molecular hotspot (TIP1) and a hydrogen bond donor (N1) contour has been calculated from this region. Thus, it is tempting to speculate that this hydrophobic region (DRY1) may provide an anchoring point for hydrophobic/aromatic interaction and the ligand may change its conformations in such a way to find complementary interaction points within binding site of hERG.
Interestingly, the highest negative peak in N1-N1 correlogram in Figure 6 represents the variables that depict two hydrogen bond donor contours at a distance of 5.6-6.0 A • surrounding the least potent hERG blockers (IC 50 values 252-2,200 µM). Similarly, the most negative correlogram for O-N1 in Figure 6 illustrates a hydrogen bond acceptor and a hydrogen bond donor hotspot region, respectively at a mutual distance of 8.4-8.8 A • surrounding the data with hERG inhibition potential (IC 50 ) value between 16 and 240 µM. Both the N1-N1 and O-N1 pair of probes have been identified surrounding 60% of compounds present in cluster III of PCA plot (Figure 2). These compounds exhibit 1-3 hydrogen bond acceptors and 1-4 hydrogen bond donor groups within diverse chemical scaffolds that complement respective N1-N1 and O-N1 hotspot regions as shown in Figure 8. However, none of the compounds in cluster I or cluster II of PCA plot (Figure 2) show a N1-N1 or O-N1 feature mainly due to the absence of second hydrogen bond donor and a hydrogen bond acceptor group within respective chemical scaffolds. Thus, it reflects that presence of two hydrogen bond acceptor groups in NCEs that complements N1-N1 hotspots of hydrogen bond donor probes at a distance of 5.6-6.0 A • may reduce the hERG inhibition potential. Also, one hydrogen bond donor group and one hydrogen bond acceptor group that complement to O-N1 contours at a distance of 8.4-8.8 A • might assist in reducing hERG liability. However, our results also emphasize that presence of one hydrogen bond acceptor group at a certain distance from a hydrophobic group within a chemical scaffold may increases the hERG liability (IC 50 ). This is evident from hydrophobic (DRY: yellow) and hydrogen bond donor (N1: blue) contours (Figure 8) at a distance of 10.8-11.8 A • in virtual   (TIP1 and TIP2), DRY-TIP representing a distance of 18.4-18.8 A • between hydrophobic molecular interaction field (DRY1: yellow) and steric hotspot (TIP2: green). DRY-N1 representing hydrophobic molecular interaction field (DRY1: yellow) at a distance of 10.8-11.8 A • from amide nitrogen representing hydrogen bond donor feature (N1: blue contours) that contribute positively to hERG blockage potential (pIC 50 ). Interestingly, the molecular features mapped by DRY-DRY correlogram complement the molecular features translated by the highest peak of DRY-TIP cross-correlogram peaks shown in Figure 6. Both DRY-DRY and DRY-TIP auto and cross-correlograms corresponds to the hydrophobic moieties attached at both sides of the linker region.
receptor space of all compounds in the present dataset of hERG inhibitors.

Template Selection for Pharmacophore Models
LipE profiles revealed that MK499, E4031, Dofetilide, and Trimethoprim showed LipE values of ≥5, logP between 0.4 and 3.0 and hERG inhibition potency (IC 50 ) of 0.001-1 µM ( Figure S4). This is in line with the already established thresholds of LipE and logP for the effectively transported drugs against therapeutic targets by Freeman-Cook et al. (Freeman-Cook et al., 2013). Therefore, MK499, E4031, Dofetilide, and Trimethoprim have been selected as templates for building pharmacophore based binding hypothesis of hERG.
While LipE normalizes for the lipophilic bias in potency description, ligand efficiency (LE) simply corrects for the size of a molecule by dividing the binding free energy of a compound by its heavy atom count. This approach is normally used in fragment-based drug design to select those fragments, which show optimal fit within the binding cavity of a protein/channel. Reynolds et al. (Reynolds et al.,  2007) have shown that LE is generally biased toward smaller molecules. Therefore, the normalized size independent fit quality (FQ) score of the data set was assessed (as explained in Supporting Information Equation 5) MK499, E4031, Dofetilide, and Trimethoprim showed maximum FQ score (>1) which indicate optimal fit of these compounds within the bonding cavity of hERG channel ( Figure S4). In order to remove any bias in the pharmacophore template selection criteria, here we also included those compounds exhibiting LipE > 4.0 and the LE > −0. 39 Kcal/mol/HA and FQ > 1 for pharmacophore query generation. These include 9-Hydroxy-risperidone, Benperidol, Droperidol, Norastemizole, Vesnarinone, BMCL-1835-4, Risperidone, Haloperidol, and Glycerol-nonivamide as shown in Figure 1. In order to probe most probable binding conformations of selected templates for building a pharmacophore, all 13 selected templates were docked into the recent cryo-EM structure of the hERG in its open state Wang and MacKinnon (Wang and MacKinnon, 2017). Additionally, the templates were also docked into closed conformational state of hERG homology model of Stansfled et al. (Stansfeld et al., 2007). The complete docking and pose evaluation protocol is discussed in Supporting Information Methods section.
Briefly, Table 2 provides an overview of the crucial amino acid residues interacting with selected templates in open and closed state of hERG and the GRIND contours that complement these interactions. However, a detail of the interaction pattern of selected templates has been provided in Supporting Information (Molecular Docking section, Figures S11-S15). Overall, final docking solutions of selected templates for the open state of hERG mainly occupy the basal cavity and show hydrophobic, π-cation and π-π interactions with amino acid residues Tyr_652, Phe_656, Ser_649 and Thr_623 in one or more subunits of hERG. Interestingly, these interactions complement the DRY-DRY, DRY-TIP and DRY-N1 probes representing the hydrophobic virtual receptor space (Figure 7) depicted by PLS coefficient correlogram in Figure 6. Whereas, in the closed state of the channel, the binding positions of the templates are shifted near the bottom of the selectivity filter and carbonyl oxygen of (MK499, Dofetilide, Trimethoprim etc.) form hydrogen bonding with Ser_624, as shown in Figures S11-S15. Additionally, π-π interactions between Phe_656/Tyr_652 and aromatic/hydrophobic moieties of templates have been identified (Figures S11-S15). Overall, these interactions correspond to GRIND mapped distance between a hydrophobic group and a hydrogen bond acceptor group within the highly potent hERG inhibitors as depicted by DRY-N1 correlogram in Figure 6. The interaction of hydrophobic substitutions with one of the four concentric rings of Tyr_652 and Phe_656 in the basal cavity and with Ser_649 and Thr_623 at the bottom of the pore helix has already been reported for drug trapping within hERG (Lees-Miller et al., 2000;Mitcheson et al., 2000;Fernandez et al., 2004). Specifically, the identified binding profiles of E4031, MK499, dofetilide and 9-Hydroxy Risperidone (Figures S11-S13) are also supported by previously reported interactions of these drugs in open and closed conformational state of hERG (Mitcheson et al., 2000;Dempsey et al., 2014) Thus, these further qualify the probable binding solutions of templates in open and closed conformational state of the hERG for pharmacophore modeling and virtual screening.

Pharmacophore-Based Virtual Screening (VS)
Finally selected binding solutions of 13 different templates (Figure 1) in open as well as closed conformations of hERG were used to build respective interaction profiles guided hERG inhibition pharmacophores. Prior to validation these 13 compounds were excluded from the whole dataset. Two statistically significant (maximum true positive and true negative) models per template in open and closed states of hERG have been selected for further feature analysis. 24 out of 26 pharmacophore models exhibit two aromatic, one hydrophobic and one hydrogen bond acceptor features at variable mutual distances. However, one hydrophobic feature was absent in two of the models built using Haloperidol template as shown in Table 3. Overall, only a slight difference in mutual distances of pharmacophore features in all 26 models has been observed which mainly reflect the dynamic nature of the complex physiological system. Additionally, all 26 pharmacophore models showed similar model statistics Table 3. However, a final pharmacophore with optimal true positive (TP: 71%), true negative (TN: 75%), sensitivity (0.72) and specificity rate (0.75) and Mathews correlation coefficient (MCC: 0.72) as shown in Table 3 and Figure 9 was selected for further comparison with GRIND features and virtual screening of validation set.

Experimental Validation
To experimentally test our identified pharmacophore features, a data set of 458,921 compounds from ChemBridge data base (Groom et al., 2016) and 405 compounds from OSM database (Williamson et al., 2016) have been screened against the final hERG inhibition model. Overall, 4,905 compounds from ChemBridge database (Groom et al., 2016) and 300 compounds from OSM (Williamson et al., 2016) screening set have been identified as hits. Various hit selection filters as described in Methods section were applied to further prioritize the hits that result in 340 and 170 hits from ChemBridge (Groom et al., 2016) and OSM database (Williamson et al., 2016), respectively. In order to further reduce the number of hits, hERG inhibition potential of final hit structures from both ChemBridge and OSM databases has been predicted using our final GRIND model (data not shown). The applicability domain of 100 top predicted hits from each of the data sets was further evaluated by principle components (PC) scores. The PC scores of 1st two principal components ranges from −5.0 to 5.0 fitted within the applicability domain of the training data (−5.8 to 6.2). Finally, one compound from each database with highest predicted hERG inhibition potential (IC 50 ) was applied to hERG-CHO cells and pIC 50 determined using the whole cell patch clamp technique. These include ChemBridge database compound ID: 5931690 and OSM database compound ID: OSM-S-31 with predicted hERG inhibition potency of 1.86 nM and 4.7 µM, respectively (Figures 10, 11, respectively).
To access the IC 50 for hERG inhibition we used a step ramp protocol (Crumb et al., 2016) as described in the Methods. Cisapride, 20 nM, was used as a positive control and all experiments were performed at ambient temperature. Drug solutions were prepared according to their predicted IC 50 values from virtual screening protocol. For example, the predicted IC 50 value of ChemBridge database (Db ID: 5931690) compound was 1.86 nM (Figure 10) therefore 1, 3, 100, 300, and 1,000 nM concentrations were used. The data in Figure 10C shows the selected sweeps at equilibrium after the application of different drug concentrations of compound ID: 5931690 and the corresponding dose-response curve is shown in Figure 10D. Similarly, for the OSM database compound (Williamson et al., 2016; Db ID: OSM-S-31), predicted IC 50 value was 4.79 µM. Therefore, 1, 5, 10, and 30 µM different drug concentrations were used. Figure 11C shows the selected sweeps at equilibrium in response to each dose concentration of OSM-S-31.
The IC 50 values of both compounds were calculated by fitting the Hill Equation (see Methods) to the dose-response

DISCUSSION
In present investigation, combined ligands and structure based pharmacoinformatics protocol supported by patch-clamp electrophysiological experiments on selected hits, has been proposed to advocate hERG liability of new chemical entities (NCEs) during early stages of drug design and development.
Recently, Siramsthetty et al. elucidated the impact of data quality, structural diversity and activity threshold of the training set and on the performance of various machine learning algorithms for the prediction of hERG liability (Siramshetty et al., 2018). This and many other previous reports reflect that inconsistent biological data under different experimental condition might result in noisy or unreliable predictive models (Zvinavashe et al., 2008;Su et al., 2010). Here, combined ligand-(GRIND) and structure based-(ligand protein interaction guided pharmacophore) pharmacoinformatics models developed using training and tests evaluated with uniform experimental protocol such as, experimental methods and cell lines may assist in overall robustness of the present hERG liability prediction approach. Finally selected GRIND model illustrates a virtual receptor site of two hydrophobic, one hydrogen bond donor and two steric hotspot regions for the binding of present data sets of hERG blockers. One of the hydrophobic regions (DRY1: yellow hotspots, enclosed by a circle in Figure 7) identified by the GRIND model may represent the most crucial virtual receptor site as the distance of other contours including second hydrophobic region (DRY2), the steric molecular hotspot (TIP1) FIGURE 9 | Showing statistically significant hERG inhibition pharmacophore model obtained using Droperidol docked in the closed conformational state of hERG as template. The pharmacophore consists of two aromatic, one hydrophobic and one hydrogen bond acceptor feature. and a hydrogen bond donor (N1) contour has been calculated from this region. Thus, it is tempting to speculate that this hydrophobic region (DRY1) may provide an anchoring point for the snug fit of the ligand followed by conformational change of the other part of the ligand to find complementary interaction points within binding site of hERG. Additionally, our model also elucidates the importance of molecular shape of the hERG blockers by spotting two steric hotspot regions (TIP1 and TIP2) at the virtual receptor site of structurally diverse data set of hERG inhibitors. The hydrophobic contours of virtual receptor site in our GRIND model also complement the four concentric rings of Tyr_652 and Phe_656 amino acid residues in the central cavity of hERG. This is also supported by various reports about the role of Tyr_652 and Phe_656 residues in drug trapping (Fernandez et al., 2004;Kalyaanamoorthy and Barakat, 2017;Vandenberg et al., 2017).
Various studies also linked the tendency of a compound to get trapped inside the hERG channel with simple increment in the molecular weight or logP (Gleeson, 2008). However, in lead optimization programmes, there is a trend of increase in molecular weight and lipophilicity with an increase in biological activity against a therapeutic target which may offer greater hERG liability in vitro. Additionally, compounds that are trapped inside the hERG are transported through membrane bilayer via cytoplasm inside the hERG (Jabeen et al., 2012). A compound that may efficiently cross the biomembrane barrier may show more hERG inhibition as compared to those with suboptimal transport properties. Therefore, lipophilic efficiency (LipE) calculation in the present study may provide a straightforward way to normalize this effect and aids in identifying the hERG inhibitors (templates) with increased activity as a result of direct interaction with hERG rather than higher biomembrane distribution. Thus, in present study, ligand protein interaction guided hERG inhibition hypothesis by using templates with best IC 50 /lipophilicity and molecular IC 50 /molecular weight ratio may offer efficient models for the virtual screening of the new chemical entities.
Overall, our hERG inhibition pharmacophore hypothesis includes two aromatic, one hydrophobic and one hydrogen bond acceptor features which select highly potent hERG inhibitors (IC 50 < 100 µM). Interestingly, a pairwise comparison of the 3D structural features of the final pharmacophore model with the GRIND contours depicting virtual receptor site revealed a great degree of complementarity as summarized in Table 4. For instance, DRY-DRY contours at a mutual distance of ∼14 A • in virtual receptor space may complement two aromatic (Aro1 and Aro2) pharmacophore features at a distance of 6.1 A • within the template structure. This is further strengthen by a recent three point 3D-SDAR model (Stoyanova-Slavova et al., 2017) where two aromatic rings at a distance of a 4.5-11.5 A • have been identified important hERG toxicophore features. Similarly, DRY-N1 pair of hotspots depict a hydrophobic and a hydrogen bond donor region at a mutual distance of ∼10 A • at the virtual receptor site may complement the hydrophobic and hydrogen bond acceptor group (HBA) within the template structure which further strengthens our proposed binding hypothesis of hERG inhibitors.
Although more recently, various machine learning models for the classification of large data sets based on 2D descriptors and fingerprints may offer good predictive ability and an automated potential to integrate with other conventional and modern modeling approaches (Siramshetty et al., 2018;Wacker and Noskov, 2018). Yet, a ligand and structure based integrated approach based on 3D descriptors information may also represent a promising route for toxicological profiling of new chemical entities. Therefore, we integrated novel alignment independent GRIND descriptors and conventional pharmacophore feature descriptors and propose a pharmacoinformatics strategy for the stepwise nominal (yes/no) as well as numeric (absolute) prediction of hERG inhibition potential (pIC 50 ) of NCEs. Moreover, in the present investigation, an exhaustive data curation protocol has been adopted to select experimentally homogeneous training as well as test data. Moreover, compounds of the training set belong to 11 diverse therapeutic classes (see Supporting Information smiles.csv) hence cover a large and diverse chemical and structural space. However, GRiD Independent Descriptors represent a novel class of descriptors which are insensitive to structural alignment of the data and thus, have ability to predict chemically diverse data set of compounds as compared to conventional 3D QSAR descriptors (Pastor et al., 2000). Therefore, the applicability domain of the training data identified by principal component scores of GRIND variables cover a wide range from −5.8 to 6.2. Projection of test set I and test set II also occupy the same applicability domain (Figure 2). Also the principal component scores of GRIND variables of the 100 top predicted hits ranges from −5.0 to 5.0 as shown in Figure S7. Thus, it reflects the predictive ability of structurally diverse data set of hERG blockers. Methodologically, in the present study highly significant GRIND variables have been selected with the help of AMANDA algorithm (Durán et al., 2008) as described by Pastor et al. (Pastor et al., 2000) in comparison with ALMOND algorithm utilized by most of the previous investigations that select a fixed number of variable irrespective of strength of the MIFs. Furthermore, in present investigation more robust pharmacophore models have been developed by normalizing suboptimal transport factor of the data and selecting templates with best pIC 50 /clogP and pIC 50 /molecular weight ratio. Overall, our final models may offer hERG inhibitory potency predictions from higher nanomolar to lower micromolar range that is validated with the help of whole-cell patch clamp experiment.
Overall, partial least square analysis of final GRIND variables reflects a cross validated q 2 of 0.63 and r 2 of 0.68. Additionally, our final pharmacophore model showed 0.72 sensitivity and 0.75 specificity with 0.72 MCC. These reflect statistically significant model. Subsequently, experimental validation of these models using whole cell patch clamp technique delineates difference of 1.6 and <0.1 log units between experimentally determined and predicted hERG inhibitory (pIC 50 ) values of selected hits Db ID: 5931690, and Db ID: OSM-S-31 demonstrate excellent agreement in the prediction of µM range but underestimate the hERG inhibition potential in nM range. Thus, it seems that the numerical values of the hERG inhibition potential of the nanomolar active hit is not predicted accurately, but the model may still allow correct identification of activity trends (from nanomolar to micromolar). It is still of great value to know which molecules are the "best" (low activity against a hERG) and which are likely to have high hERG inhibition potential. This allows molecules to be prioritized for synthesis against respective therapeutic targets.
Although, the recent cryo-EM structure of the hERG in its open state of the channel may assist in probing structure guided hERG inhibition profiles of new and safer chemical entities (Wang and MacKinnon, 2017). However, the volume of the hydrophobic pockets in the recent hERG structure and their impact on the binding of drugs that are trapped in the closed state of hERG remain obscure. Though majority of highly potent hERG blockers including dofetilide, astemizole, terfenadine, and cisapride, show from 2 to 70-fold higher affinity for the inactivated state than to the open conformational state of the channel (Perrin et al., 2008). Thus, it stresses the need to determine electron microscopic/crystal structures of hERG in closed as well as in the open state of the channel bound with one of the prototype ligands such as, Dofetilide, MK499, or E4031 to understand the molecular basis of drug binding within hERG. However, it is challenging mainly because membrane spanning hERG channel exhibit 4-fold symmetry that might result in negligible electron density due to asymmetric binding of a drug with its one out of four subunits. Therefore, in the absences of explicit experimental structural data, present investigation offers experimentally validated template-based pharmacophore models supported by statistically significant GRIND model to predict the hERG inhibition potential of diverse chemical scaffolds.

CONCLUSION
Herein, we present a novel pharmacoinformatics strategy for the nominal as well as numerical prediction of hERG inhibition potential (pIC 50 ) of NCEs during initial stages of drug development. GRid-Independent molecular Descriptor (GRIND) models have been developed to probe virtual receptor site representing most favorable interacting hotspots within the binding cavity of hERG. Furthermore, pharmacophore templates with best hERG pIC 50 /clogP and pIC 50 /molecular weight have been selected to normalize pIC 50 values against hERG due to membrane distribution effect rather than direct interaction with hERG. Ligand-protein interaction profiles guided pharmacophore features of 13 selected templates in open and closed conformational state of hERG have been evaluated to probe 3D structural features of diverse data set hERG inhibitors. Overall, our final pharmacophore model revealed the presence of two aromatic one hydrophobic and one hydrogen bond acceptor group at particular mutual distance in most potent hERG inhibitors which complement the respective hydrophobic and hydrogen bond donor contours at the virtual receptor site produced by the final GRIND model. The maximum difference of ±1.6 log unit between actual and predicted hERG inhibition potential values of the selected hits Db ID: 5931690 and Db ID: OSM-S-31 further reflect the robustness of our virtual screening protocol and thus, could aid to prioritize NCEs according to hERG inhibition potential from nanomolar to micromolar range.

AUTHOR CONTRIBUTIONS
IJ designed the research project for the Ph.D. degree of SM. SM and IJ carried out the build in silico models and analyzed results. JV and IJ designed the wet lab experiments. SM performed the experiments MW, JV, and AH analyzed the experimental data. SM and IJ wrote the manuscript. MT provided compounds for the validation. ET helped in resolving solubility issue of the OSM compounds. JV and IJ reviewed the paper.

ACKNOWLEDGMENTS
SM is very thankful to the Higher Education Commission (HEC) of Pakistan for providing a full scholarship for Ph.D. studies (Indigenous Ph.D. Fellowship for 5,000 Scholars PIN No: 112-34421-2BM1-428) and an opportunity to visit Victor Chang Cardiac Research Institute (VCCRI) Sydney, Australia for experimental work under International Research Support Initiative Program (IRSIP). JV also provided a Top-up scholarship to SM during her stay at VCCRI.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar. 2018.01035/full#supplementary-material Data curration protocol, Pie chat, to show dataset distribution. Correlation plot, between IC50 values of compounds obtained from HEK and CHO cell line. PCA cluster I, II and III representative compounds. Conformational analysis for GRIND. Plots of, Lipophilic and Ligand efficiency profiling. Complete docking protocol and ligand-protein interaction diagrams of all selected compounds. Hit selection protocol. NMR spectra of compound OSM-S-31, and 5931690. Molecular Formula Strings (CSV) along with IC50 values are available as SMILES.csv.