Reverse Screening Methods to Search for the Protein Targets of Chemopreventive Compounds

Huang, Hongbin; Zhang, Guigui; Zhou, Yuquan; Lin, Chenru; Chen, Suling; Lin, Yutong; Mai, Shangkang; Huang, Zunnan

doi:10.3389/fchem.2018.00138

REVIEW article

Front. Chem., 09 May 2018

Sec. Medicinal and Pharmaceutical Chemistry

Volume 6 - 2018 | https://doi.org/10.3389/fchem.2018.00138

Hongbin Huang ^1,2^†
Guigui Zhang ^1,3^†
Yuquan Zhou ^1,2^
Chenru Lin ^1,3^
Suling Chen ^1,2^
Yutong Lin ^1,3^
Shangkang Mai ^1,2^
Zunnan Huang ^1,3^*

1. Key Laboratory for Medical Molecular Diagnostics of Guangdong Province, Dongguan Scientific Research Center, Guangdong Medical University Dongguan, China
2. The Second School of Clinical Medicine, Guangdong Medical University Dongguan, China
3. School of Pharmacy, Guangdong Medical University Dongguan, China

Abstract

This article is a systematic review of reverse screening methods used to search for the protein targets of chemopreventive compounds or drugs. Typical chemopreventive compounds include components of traditional Chinese medicine, natural compounds and Food and Drug Administration (FDA)-approved drugs. Such compounds are somewhat selective but are predisposed to bind multiple protein targets distributed throughout diverse signaling pathways in human cells. In contrast to conventional virtual screening, which identifies the ligands of a targeted protein from a compound database, reverse screening is used to identify the potential targets or unintended targets of a given compound from a large number of receptors by examining their known ligands or crystal structures. This method, also known as in silico or computational target fishing, is highly valuable for discovering the target receptors of query molecules from terrestrial or marine natural products, exploring the molecular mechanisms of chemopreventive compounds, finding alternative indications of existing drugs by drug repositioning, and detecting adverse drug reactions and drug toxicity. Reverse screening can be divided into three major groups: shape screening, pharmacophore screening and reverse docking. Several large software packages, such as Schrödinger and Discovery Studio; typical software/network services such as ChemMapper, PharmMapper, idTarget, and INVDOCK; and practical databases of known target ligands and receptor crystal structures, such as ChEMBL, BindingDB, and the Protein Data Bank (PDB), are available for use in these computational methods. Different programs, online services and databases have different applications and constraints. 
Here, we conducted a systematic analysis and multilevel classification of the computational programs, online services and compound libraries available for shape screening, pharmacophore screening and reverse docking to enable non-specialist users to quickly learn and grasp the types of calculations used in protein target fishing. In addition, we review the main features of these methods, programs and databases and provide a variety of examples illustrating the application of one or a combination of reverse screening methods for accurate target prediction.

Introduction

New drugs can be designed via traditional receptor structure-based virtual screening, which enables the discovery of bioactive compounds that bind the target protein, but they can also originate from reverse virtual screening, which finds the unknown protein targets of active compounds or additional targets of existing drugs (drug repositioning; Hurle et al., 2013). Among the 84 drug products introduced to the market in 2013, new indications of existing drugs accounted for 20%, implying that drug repositioning plays a key role in drug discovery (Graul et al., 2014; Li J. et al., 2016). The majority of drugs or bioactive compounds exert their functions by interacting with protein targets. With an increasing number of drugs showing the ability to target multiple proteins, target identification plays an important role in the fields of drug discovery and biomedical research (Wang J. et al., 2016). Many reverse screening methods can be used to search for the protein targets of molecules (Ziegler et al., 2013), although the earliest approaches involved expensive and time-consuming biological assays (Drews, 1997). However, with the continuous development of Big Data and computational techniques, computer-aided reverse screening methods are playing an increasingly important role in the prediction of the off-target effects and side effects of drugs as well as in drug repositioning (Rognan, 2010; Liu et al., 2014; Schomburg and Rarey, 2014).

These computational methods can be divided into three classes according to their underlying principles: shape screening, pharmacophore screening, and reverse docking. In the absence of receptor crystal structures, shape or pharmacophore screening facilitates the discovery of the potential targets of a query molecule by comparing its overall shape or key pharmacophore features with those of the compounds from a ligand database annotated with target information (Schuffenhauer et al., 2003; Hawkins et al., 2007; Chen et al., 2009). The annotated targets of the matched ligands can then be considered potential targets of the query molecule. Reverse docking, in contrast to the traditional molecular docking used to find the ligands of a target protein, refers to the successive docking of a query molecule into the active pocket of each protein from a protein 3D structure database based on spatial and energy principles to identify protein targets with strong binding affinity as potential targets of the query molecule (Li et al., 2013). Reverse screening methods are important computational techniques for identifying new macromolecular targets of existing drugs or active molecules and for analyzing their functional mechanisms or side effects (Patel et al., 2015). Based on the principles of the methods and the availability of existing large-scale small-molecule [e.g., ChEMBL, maintained by the European Molecular Biology Laboratory (Gaulton et al., 2017)] or macromolecule (e.g., the PDB; Rose et al., 2015) databases, researchers worldwide have developed a variety of software and online services for predicting the protein targets of small molecules. Representative examples include SEA (Keiser et al., 2007), PharmMapper (Liu et al., 2010) and INVDOCK (Chen and Zhi, 2001), which are among the earliest tools for shape screening, pharmacophore screening and reverse docking, respectively. 
In recent years, these three methods have been widely used in the prediction of protein targets to clarify the molecular mechanisms of active small molecules against various diseases (Kharkar et al., 2014; Cereto-Massagué et al., 2015). Many of these molecules are derived from Chinese herbal medicine, and while their pharmacological or biological activities are known, their cellular and molecular mechanisms remain unclear. For example, Lim et al. (2014) used shape screening to determine that curcumin (compound 1, Figure 1), extracted from Zingiberaceae, suppresses the proliferation of human colon cancer cells by targeting cyclin dependent kinase 2 (CDK2). Marine compounds are another class of bioactive small molecules. For example, wentilactone B (WB, compound 2) is a tetranorditerpenoid derivative extracted from the marine algae-derived endophytic fungus Aspergillus wentii EN-48. Zhang et al. (2013) used reverse docking to discover that this small molecule induces G2/M phase arrest and apoptosis of human hepatocellular carcinoma cells by co-targeting the Ras/Raf/MAPK proteins in their signaling pathways.

Figure 1

Here, we begin by introducing the basic principles of these three types of reverse screening methods, i.e., shape screening, pharmacophore screening and reverse docking, for the prediction of the protein targets of small molecules. Then, representative and classical software and online services for each method as well as the relevant databases are hierarchically categorized and systematically presented. Finally, we review nearly all articles on the applications of these methods since 2000 and select some typical examples to illustrate their use. By statistically analyzing these articles, we reveal the trends in the application of these three methods for computer-aided protein target prediction. In addition, we discuss their shortcomings and possible solutions as well as previous reviews of these reverse screening approaches for predicting the protein targets of small molecules.

Methods

Reverse screening to search for unknown targets, unintended targets, or secondary targets of small-molecule drugs can be achieved by shape similarity screening, pharmacophore model screening, or reverse protein-ligand docking (Figure 2). These three calculation approaches are complementary and can be used in conjunction with each other. By comparison, shape and pharmacophore screening are simpler and faster, while reverse docking is more complex and slower. We introduce these three methods in detail in the following sections.

Figure 2

Shape screening

The basic principle of shape screening, from a two-dimensional (2D) perspective, is that structurally similar molecules may have similar bioactivity by targeting the same proteins. From a three-dimensional (3D) perspective, the basic principle is that molecules with similar volumes may have the potential to bind effectively to spaces of the same or similar size (considering the ligand-induced fit effect; Koshland, 1958) in the active pockets of proteins (Shang et al., 2017). To use shape screening to predict the targets of small molecules, a small-molecule ligand database annotated with protein targets is necessary. Then, the overall shape similarity of a query molecule to each ligand in the database can be measured individually. Finally, the protein targets for matched molecules with high similarity scores can be considered potential targets of the query molecule (Schuffenhauer et al., 2003). Shape screening involves two levels of mapping: the first mapping between the query molecule and the ligands in the database and the second mapping between the matched ligands in the database and their annotated protein targets (Figure 2).

Shape similarity comparison is based on the 2D or 3D topological structures of small molecules. Notably, 2D methods were originally designed to maximize the substructure shared by paired molecules, whereas 3D methods can be used to enhance scaffold diversity (Nettles et al., 2006). A universal descriptor for molecular similarity comparison in 2D methods is FingerPrint2D (FP2), which employs a simple bit vector to represent a variety of chemical characteristics and is encoded in a variety of software and databases (Bender et al., 2004). The most frequently used type of FP2 is extended-connectivity fingerprints (ECFPs), which are circular fingerprints. ECFPs encode circular atomic neighborhoods based on the Morgan algorithm and are designed especially for structural activity modeling (Rogers and Hahn, 2010). They are defined for different neighborhood diameters: for example, ECFP4 refers to a diameter of 4 bonds and ECFP6 to a diameter of 6 bonds (Glem et al., 2006), both of which are encoded in TargetHunter (Wang L. et al., 2013). Molecular ACCess System (MACCS; Durant et al., 2002) is another commonly used FP2. It is a structure key-based fingerprint and is encoded in the 2D approach of the ChemMapper server (Gong et al., 2013). In addition to FP2, other descriptors are based on 2D topologies or paths, including the Daylight fingerprint (http://www.daylight.com) encoded in ChemProt 3.0 (Kringelum et al., 2016) and the MDL structural key, another 2D descriptor (Durant et al., 2002). Structural matching based on 3D topology mainly compares the 3D geometries of the molecules, sometimes with the addition of pharmacophores (Lo et al., 2016), ElectroShapes (Armstrong et al., 2010), Spectrophores (Smusz et al., 2015), or other additional information. For example, WEGA (Yan et al., 2013) and gWEGA (Yan et al., 2014) compare only the volumes of two molecules, but SHAFTS (Lu et al., 2011), encoded in ChemMapper, incorporates pharmacophore matching when calculating the volume similarity.

The similarity of the descriptors in both 2D and 3D methods can be measured by the Tanimoto coefficient, which is the ratio of the intersection to the union of the features or shapes of two molecules (Salim et al., 2003). For example, TargetHunter uses the Tanimoto coefficient to calculate the similarity among molecular fingerprints (Wang L. et al., 2013). The City-Block distance (CBD, also called the Manhattan distance; for binary fingerprints it coincides with the Hamming distance), which represents the difference between the sum of two molecular shapes and twice their overlap, can also be used to calculate the molecular similarity (Awale and Reymond, 2014). For example, SwissTargetPrediction uses this distance to compare ElectroShape vectors in 3D comparisons (Gfeller et al., 2014).
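As a minimal sketch of these two measures, the snippet below treats each fingerprint as a set of on-bit indices, computes the Tanimoto coefficient as shared bits over all distinct bits, and computes the city-block distance as the sum of the two bit counts minus twice their overlap. The function names and the example bit sets are illustrative assumptions and are not taken from any of the tools cited above.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def city_block(a, b):
    """City-block distance between two binary fingerprints: |A| + |B| - 2|A & B|."""
    return len(a) + len(b) - 2 * len(a & b)


# Illustrative bit sets, not real fingerprints
query = {1, 4, 7, 9}
ligand = {1, 4, 8, 9, 12}

print(tanimoto(query, ligand))    # 3 shared bits / 6 distinct bits = 0.5
print(city_block(query, ligand))  # 4 + 5 - 2 * 3 = 3
```

A higher Tanimoto value and a lower city-block distance both indicate more similar fingerprints, so the two measures rank hits in opposite directions.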

Shape screening can be divided into two subclasses: indirect target prediction and direct target prediction. Indirect target prediction indicates that the potential targets of the query molecule are manually selected from the annotated protein targets of the matched database ligands. ROCS (Rush et al., 2005) and TargetHunter (Wang L. et al., 2013) are representative examples. These programs merely calculate the similarity scores between the query molecule and the matched ligands in the database but cannot reveal the complex relationships among the annotated protein targets of multiple matched ligands. In general, the annotated targets of any database ligand are not unique, and a protein target may also be annotated with multiple compounds (Rognan, 2010). Therefore, these programs can have high rates of false positives in target prediction and low accuracy in target searching.

Direct target prediction not only calculates the similarity score between the query molecules and the ligands in the database but also estimates the probability that the annotated targets of the matched ligands are targets of the query molecule. This extra process can reduce the false positive rate of target prediction and improve the accuracy of the target search. The probability that the annotated targets of the matched ligands are targets of the query molecule can be evaluated by multiple computational models or algorithms (the dotted line in Figure 2). For example, ChemMapper (Gong et al., 2013), which is based on a compound-protein network constructed from the top similar structures and their annotated targets, employs a random walk algorithm (Köhler et al., 2008) to calculate the probabilities of interaction between the query structure and the annotated targets of the hit compounds. In addition, SwissTargetPrediction (Gfeller et al., 2014) and CSNAP3D (Lo et al., 2016) use a cross-validation method and a network algorithm, respectively, to assess the probabilities that the annotated targets of the matched ligands are targets of the query molecule.
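The two levels of mapping can be sketched as follows. This is only an illustration under simplifying assumptions: the ligand database is a hypothetical in-memory dictionary, and each target is scored by the best similarity among its matched ligands. That aggregation rule is a placeholder and is not the random walk, cross-validation, or network model used by ChemMapper, SwissTargetPrediction, or CSNAP3D.

```python
from collections import defaultdict

# Hypothetical annotated database: ligand -> (fingerprint bits, annotated targets)
LIGAND_DB = {
    "lig_A": ({1, 4, 8, 9}, {"CDK2"}),
    "lig_B": ({1, 4, 7, 12, 13}, {"CDK2", "GSK3B"}),
    "lig_C": ({20, 21, 22}, {"EGFR"}),
}


def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_targets(query_bits, db, threshold=0.3):
    """Rank annotated targets by the best similarity of any matched ligand."""
    best = defaultdict(float)
    for bits, targets in db.values():
        sim = tanimoto(query_bits, bits)      # first mapping: query -> ligand
        if sim < threshold:
            continue
        for target in targets:                # second mapping: ligand -> targets
            best[target] = max(best[target], sim)
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_targets({1, 4, 7, 9}, LIGAND_DB)
print(ranking)
```

Because a ligand can carry several annotated targets and a target can be annotated on several ligands, the second mapping is many-to-many, which is why some per-target scoring step is needed at all.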

Because shape screening is based on the comparison of overall molecular shape, it may not be suitable for predicting the potential targets of molecules that are excessively large or small. Judging the potential targets of an oversized molecule is difficult because its best matched ligands usually show a low similarity score, and selecting the potential targets of an undersized molecule is difficult because its matched ligands are numerous with high similarity scores. Shape screening is suitable for predicting potential targets whose available inhibitors have sizes similar to that of the query molecule. It is less suited to finding novel targets whose current inhibitors differ greatly in size from the query molecule but whose active pocket space is easily adjusted to bind diverse ligands due to a strong induced-fit effect.

Pharmacophore screening

The basic principle of pharmacophore screening is that the binding of certain drugs with their protein targets is primarily determined by key functional pharmacophores (Rognan, 2010). Thus, the matching of these important pharmacophores can be used to search for new targets of small-molecule drugs (Fang and Wang, 2002). A pharmacophore is the spatial arrangement of functional characteristics that allows molecules to interact with target proteins in a particular binding mode, such as a hydrophobic center (H), hydrogen bond acceptor vector (HBA), hydrogen bond donor vector (HBD), positively charged center (P), or negatively charged center (N) (Kurogi and Güner, 2001). A pharmacophore model is the combination of pharmacophores in a pattern of ligand-protein interaction that give the final pharmacological effect (Leach et al., 2010). Similar to a ligand database for shape screening, a pharmacophore database also requires annotation with target protein information. In pharmacophore screening, the pharmacophore features of the query molecule are successively matched with the features of the pharmacophore models in the database. A higher matching degree indicates that the annotated protein target of the matched pharmacophore model has greater potential to be a target of the query molecule (Steindl et al., 2006). Pharmacophore screening also undergoes two levels of mapping: the first mapping is between the pharmacophore models of the query molecule and of the ligands in the database, and the second mapping is between the matched pharmacophore models of the ligands in the database and their annotated protein targets (Figure 2).

The pharmacophore database is built by pharmacophore modeling. The three construction methods are the use of ligands only, receptor structures only, or co-crystallized complex structures, which can be defined as ligand-based, structure-based and complex-based pharmacophore modeling, respectively. Ligand-based pharmacophore modeling was initially designed and is often used for traditional ligand-based virtual screening; an example is the quantitative structure–activity relationship (QSAR; Pulla et al., 2016). The most substantial common features shared by a group of active molecules can be easily extracted by using this method to form a good pharmacophore model to guide the further optimization of active compounds (Leach et al., 2010; Gaurav and Gautam, 2014). However, this approach is seldom used in reverse pharmacophore modeling due to the arbitrariness of pharmacophore models based on a single protein-annotated ligand.

The other two main methods, the use of only receptor structures and the use of protein-ligand complex structures, are forms of structure-based pharmacophore modeling (Gaurav and Gautam, 2014). In receptor-based methods, the pharmacophore features are first extracted from potential binding sites detected by specific protocols, and the pharmacophore models are then derived from the clustering of interaction point information and further refined or validated by using the input of the known ligands and their available or even calculated binding data (Chen and Lai, 2006). For instance, Pocket v.2 (Chen and Lai, 2006) and Catalyst SBP in Discovery Studio (DS) (BIOVIA, 2017) can both produce this type of pharmacophore database. In complex-based methods, pharmacophore models are simply generated via knowledge-based topological rules by using all features, such as hydrogen bonding information, charge, and hydrophobic contacts, based on the interactions between the co-crystallized ligands and receptor atoms (Sutter et al., 2011; Meslamani et al., 2012). Complex-based pharmacophore modeling is commonly used to construct pharmacophore databases, such as PharmaDB in Discovery Studio (Meslamani et al., 2012) and PharmTargetDB in PharmMapper (Liu et al., 2010), due to the stronger association between the built pharmacophore models and the experimentally verified ligand-protein interactions, which can improve the accuracy of target prediction.

The matching process between a pharmacophore model of the query molecule and the pharmacophore models in the pharmacophore database considers the alignment of two core components: pharmacophore feature types and the positions of the feature types (Wolber and Langer, 2005). The alignment of feature types is the matching between the pharmacophore features shared by the query molecule and the database ligands, such as matching between a hydrophobic feature in the molecular structure and those in database ligand pharmacophore models. The alignment of the feature positions is the pairwise matching of the distances between the fitted feature types in the pharmacophore models (Kabsch, 1976). For example, PharmMapper groups pharmacophores into triplets (e.g., H-H-H, H-HBA-HBD) and uses the vertexes of a triangle to represent the pharmacophore feature types and the side length of the triangle to measure the relative positions of these feature types (Liu et al., 2010).
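The triplet idea can be illustrated with a short sketch. It assumes that each pharmacophore feature is a (type, coordinates) pair and that two triangles match when their vertex types agree and their three side lengths agree within a tolerance. The tolerance, the feature coordinates, and the function names are hypothetical, and this is not PharmMapper's actual implementation.

```python
from itertools import combinations, permutations
from math import dist

PAIRS = ((0, 1), (0, 2), (1, 2))  # the three sides of a triangle


def triangles_match(tri_a, tri_b, tol=1.0):
    """True if some vertex ordering of tri_b has the same feature types as
    tri_a and all three side lengths agree within tol (angstroms)."""
    for perm in permutations(tri_b):
        if any(a[0] != b[0] for a, b in zip(tri_a, perm)):
            continue  # feature types do not line up in this ordering
        if all(abs(dist(tri_a[i][1], tri_a[j][1]) - dist(perm[i][1], perm[j][1])) <= tol
               for i, j in PAIRS):
            return True
    return False


def count_matched_triplets(query, model, tol=1.0):
    """Number of query triplets that match at least one model triplet."""
    model_tris = list(combinations(model, 3))
    return sum(any(triangles_match(q, m, tol) for m in model_tris)
               for q in combinations(query, 3))


# Illustrative features: (type, (x, y, z))
query = [("H", (0.0, 0.0, 0.0)), ("HBA", (3.0, 0.0, 0.0)), ("HBD", (0.0, 4.0, 0.0))]
model = [("HBD", (1.0, 5.0, 1.0)), ("H", (1.0, 1.0, 1.0)),
         ("HBA", (4.0, 1.0, 1.0)), ("N", (9.0, 9.0, 9.0))]
stretched = [("H", (0.0, 0.0, 0.0)), ("HBA", (8.0, 0.0, 0.0)), ("HBD", (0.0, 4.0, 0.0))]

print(count_matched_triplets(query, model))      # same triangle, shifted: 1
print(count_matched_triplets(stretched, model))  # H-HBA side too long: 0
```

Comparing side lengths rather than raw coordinates makes the match independent of how the two models are positioned in space.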

In pharmacophore screening, the pairwise fitness score between pharmacophore models can be used directly as a basis for target evaluation. The fitness score includes the scores obtained from both the alignments between feature types and the alignments between the positions of each pair of pharmacophore models from the query molecule and database ligands. Higher fitness scores indicate higher probabilities (Wang X. et al., 2016). In addition, other matching information, such as the number of matched features and overall shape similarity, can also be used as additional references for target evaluation (Khedkar et al., 2007). If the pharmacophore scoring process does not consider the overall shape of the query molecules, it will be more likely to find pseudo protein targets with high fitness scores for a smaller query molecule because its limited pharmacophore features can be easily matched in the database (Wang X. et al., 2016). Thus, the target score must be recalculated to improve the prediction accuracy (the dotted line in Figure 2). For example, PharmMapper utilizes a normalized fitness score to re-rank the potential targets by standardizing a normal distribution of the fitness score to achieve a higher accuracy (Wang X. et al., 2017).
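One way to picture such a re-ranking is a z-score: each raw fitness score is compared with a background distribution of scores obtained for the same pharmacophore model, so that a model that scores highly for almost any molecule is penalized. The sketch below assumes this z-score form and uses made-up background values; the normalization actually used by PharmMapper may differ.

```python
from statistics import mean, pstdev


def normalized_fit(raw_score, background_scores):
    """Z-score of a raw fitness score against a per-model background."""
    mu = mean(background_scores)
    sigma = pstdev(background_scores)
    if sigma == 0:
        return 0.0  # no spread in the background, so no basis for ranking
    return (raw_score - mu) / sigma


# Illustrative background scores for two hypothetical pharmacophore models
background_x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, std 2
background_y = [6.0, 6.5, 7.0, 7.5, 8.0]                  # mean 7

# The same raw score of 7.0 is notable for model X but only average for model Y
z_x = normalized_fit(7.0, background_x)
z_y = normalized_fit(7.0, background_y)
print(z_x, z_y)
```

Ranking targets by the normalized value instead of the raw score would place model X's target above model Y's target in this toy case.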

Since the construction of the pharmacophore database by structure-based pharmacophore modeling is not easy, the development of corresponding tools based on this principle has been somewhat limited. However, compared with shape screening, pharmacophore screening can improve the accuracy of prediction because it focuses on matching the key pharmacophore functional groups. In addition, it can ignore the total size of the molecule. As a result, pharmacophore screening can be used to search for potential targets of a query molecule with a large or small volume and can also be employed to find novel protein targets capable of binding a large diversity of ligands. Although PharmTargetDB, the PharmMapper in-house repository, does incorporate protein structural information, a pharmacophore database can be built to use ligands only. That is, constructing a pharmacophore database based on ligands with currently unavailable target structures is also useful for pharmacophore screening.

Reverse docking

The basic principle of reverse docking is that the binding strength of a small-molecule ligand and a potential protein target is determined by their interaction energy (docking energy). To use reverse docking to predict the targets of a query molecule, a structure grid database of a large number of protein targets is normally required. Then, the query molecule is individually docked with each protein structure in the database. Each docking score is calculated. Finally, the protein targets are sorted according to their docking energy. Generally, a higher rank indicates a greater probability that the protein is a target of the query molecule. In contrast to shape screening and pharmacophore screening, reverse docking involves one level of mapping, which reflects the direct relationship between the query molecule and the target proteins (Figure 2). However, it is a complex process that includes recognition of a binding site, construction of the docking grid, a molecular docking algorithm, docking score calculation and target evaluation, among other steps (Lee et al., 2016).
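The outer loop of this procedure can be sketched in a few lines. The docking step itself is represented by a hypothetical `dock` callable that returns an energy for a query and a target; here it is replaced by a lookup in a table of illustrative values, since running a real docking program is outside the scope of a short example.

```python
def reverse_dock(query, targets, dock):
    """Dock one query against every target and sort by docking energy,
    most negative (most favorable) first."""
    scores = [(target, dock(query, target)) for target in targets]
    return sorted(scores, key=lambda pair: pair[1])


# Illustrative energies in kcal/mol; a real run would call a docking program
TOY_ENERGIES = {"target_A": -9.1, "target_B": -6.3, "target_C": -7.8}


def toy_dock(query, target):
    return TOY_ENERGIES[target]


ranking = reverse_dock("query_molecule", list(TOY_ENERGIES), toy_dock)
for rank, (target, energy) in enumerate(ranking, start=1):
    print(rank, target, energy)
```

Unlike the ligand-based methods, the result is a single mapping from the query directly to ranked targets, with no intermediate ligand set.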

In most cases, the active site of a protein is already known and can be determined from its co-crystallized small-molecule ligand. However, for some apo-form structures without co-crystallized ligands, the docking program must first recognize the active binding site of these proteins. If the apo-form structure is from a protein for which other co-crystallized structures are available, its active site can also be identified from those protein structures with co-crystallized ligands. Otherwise, de novo detection of the active site of the apo-form structure is required. The literature describes multiple ways to achieve this task. For example, Wang et al. (2012) use the “divide-and-conquer” method in idTarget to search the surface structure of the entire protein and possible allosteric structures to find potential binding sites. Kuntz et al. (1982) describe a method that was later used in INVDOCK (Chen and Zhi, 2001) to define a binding site by a group of overlapping probe spheres of certain radii, which fill up a cavity and whose inward-facing surfaces cover the van der Waals surfaces of the protein atoms at the interface. Active site recognition is very useful in attempts to dock a query molecule into cavities other than the binding pockets of known ligands, which can increase the diversity of the binding between the query molecule and protein targets and improve the accuracy of reverse docking.

The database of protein targets used in reverse docking can be a library of protein crystal structure grids with recognized binding sites determined by co-crystallized ligands or available cavities. We can build these databases by continuously downloading a series of protein crystal structures from the Protein Data Bank (PDB); the time-consuming human-computer interaction processes (such as the deletion of water molecules, the addition of hydrogen atoms, and energy optimization) can be accomplished by using a molecular docking program, and the protein structure grids are finally generated. Traditional molecular docking programs, such as DOCK (Allen et al., 2015), AutoDock (Di Muzio et al., 2017), Schrödinger (Schrödinger, 2018) and Discovery Studio (BIOVIA, 2017), can be used to construct a custom target database for reverse docking to search for potential targets of a small molecule. Alternatively, the protein target database can also be a simply processed, automatically constructed protein structure database, and the grids can be generated after the programmed identification of active sites in the process of reverse docking; an example is the idTarget in-house database (Wang et al., 2012). Notably, the lack of a universal protein structure grid database and the need to build a new one for each docking program are the main reasons that reverse docking cannot be used as often as traditional structure-based virtual screening.

At present, reverse screening uses two main types of molecular docking techniques, originally developed in DOCK and AutoDock. DOCK (Ewing et al., 2001) adopts a “geometry matching method” to perform molecular docking by complementing the geometric shape of the docking ligands with that of the protein active site, usually including hydrogen bonding sites and locally accessible sites (Shoichet et al., 2010). The matching process is performed by an “anchor and grow algorithm,” in which the anchor is a rigid portion of the ligand that is used to initialize a pruned conformation search, and grow refers to the generation of multiple conformations of the remaining segments to simulate the flexible docking of the ligand (Ewing et al., 2001). AutoDock uses a “docking simulation method” that employs the “genetic algorithm” to sample the conformations of a docking molecule inside a grid of the receptor binding pocket (Willett, 1995). In this algorithm, the molecule starts randomly at the receptor surface and undergoes orientation, translation and rotation to cause conformational changes until the ideal binding pose with the best binding energy is found (Morris et al., 2015). Among the three reverse docking programs, INVDOCK (Chen and Zhi, 2001) and TarFisDock (Li et al., 2006) use the DOCK geometry matching method for molecular docking, while idTarget uses the AutoDock genetic algorithm for reverse docking (Wang et al., 2012).

Currently, almost all molecular docking programs can perform flexible-ligand docking due to the small size of the ligands; however, these programs still have difficulty in performing molecular docking with a fully flexible protein. Therefore, depending on the flexibility of the receptor proteins, reverse docking can also be classified into two types: rigid protein docking and semi-flexible protein docking. Although reverse docking with a rigid receptor is fast, it ignores ligand/receptor-induced fit effects. An example of a rigid protein docking program for reverse screening is TarFisDock (Li et al., 2006). Reverse docking with semi-flexible receptors can be achieved by various methods such as side-chain rotations (Liu H. et al., 2015), stretching of active pocket residues (Halgren et al., 2004), and ensemble docking (Lorber and Shoichet, 1998). For example, INVDOCK allows the amino acid residues of the receptor binding sites to rotate with the entry of the ligand, thereby simulating the ligand induced-fit conformational changes of receptors (Chen and Zhi, 2001). idTarget uses the docking of a query molecule into an ensemble of different receptor crystal structures after clustering (Wang et al., 2012) and thus simulates semi-flexible receptor docking by possible binding of the molecule with the distinct locations of the active pocket residues of the receptor in its different structures.
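Ensemble docking can be pictured as docking the query into several structures of the same receptor and keeping the most favorable result for each receptor. The sketch below assumes that the per-structure energies are already available and that the minimum energy is taken as the receptor's score. Both the selection rule and the values are illustrative assumptions, not a description of idTarget's internals.

```python
# Hypothetical docking energies (kcal/mol) of one query molecule against
# several crystal structures of each receptor; names and numbers are illustrative
ENSEMBLE = {
    "receptor_1": {"structure_a": -6.2, "structure_b": -8.4, "structure_c": -7.0},
    "receptor_2": {"structure_a": -7.5, "structure_b": -7.1},
}


def best_pose_per_receptor(ensemble):
    """Keep the lowest-energy structure for each receptor, then rank receptors."""
    ranked = []
    for receptor, energies in ensemble.items():
        structure = min(energies, key=energies.get)
        ranked.append((receptor, structure, energies[structure]))
    return sorted(ranked, key=lambda row: row[2])


ensemble_ranking = best_pose_per_receptor(ENSEMBLE)
print(ensemble_ranking)
```

With a single rigid structure per receptor, `receptor_1` would rank below `receptor_2` if only `structure_a` had been docked, which is the kind of miss that ensemble docking is meant to reduce.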

The docking score between a query molecule and receptors is an evaluation criterion for ranking its potential targets in reverse screening. Docking energy is a major method of scoring docking poses and normally refers to the interaction energy between the ligand and protein but may also include the energy of the ligand or the energies of both the ligand and the protein (or a part of the protein such as the binding pocket). For example, INVDOCK evaluates the docking structure by calculating the interaction energy between the ligand and receptor (Chen and Zhi, 2001), whereas idTarget scores the docking pose by calculating the energy of the ligand, the protein binding pocket and the interaction between them (Wang et al., 2012). According to the principle that the most stable structure has the lowest energy, a more negative docking energy indicates stronger binding between the ligand and protein. The docking energy is calculated based on energy functions, which are mainly divided into three types: molecular mechanics energy functions, empirical energy functions, and semi-empirical energy functions. The molecular mechanics energy functions are more comprehensive and are rigorously defined by the sum of terms with clear physical meaning, including bond stretching, angle bending, torsion angles, van der Waals forces, electrostatic interactions, desolvation or hydrophobic interactions, conformational entropy, and potentially others (Huang and Zou, 2010; Wang et al., 2011). In reality, the molecular mechanics energy functions used in the docking programs may include only some of these terms. For example, TarFisDock uses energy functions including only van der Waals and electrostatic interaction terms (Li et al., 2006). Empirical energy functions comprise weighted energy terms whose coefficients are obtained by reproducing the binding affinities of a benchmark data set of protein-ligand complexes (Gilson et al., 1997; Gilson and Zhou, 2007). 
For example, INVDOCK uses an empirical energy function based on simple contact terms, including hydrogen bond and non-bond terms, to calculate the ligand-protein interactive energy as the binding affinity (Chen and Zhi, 2001). Semi-empirical energy functions combine some molecular mechanics energy terms with empirical weights and/or empirical functional forms and have been widely used in computational docking methods (Raha and Merz, 2005). For example, idTarget follows the AutoDock 4 robust scoring functions (Huey et al., 2007) and employs a semi-empirical free energy function that includes hydrogen bonding, electrostatics, desolvation, and torsional entropy, whose weighting coefficients are derived from regression analysis of the experimental binding affinity information (Wang et al., 2011). In addition, reverse docking allows visual assessment of the docking poses by analyzing the number of hydrogen bonds, the presence or absence of critical hydrogen bonds and pi-pi conjugation, etc., as in traditional virtual screening, to further assist target evaluation for a more accurate prediction.
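The general form shared by empirical and semi-empirical energy functions, a weighted sum of energy terms followed by ranking from the most negative score upward, can be sketched as follows. This is an illustration only: the term names and the weights in `EXAMPLE_WEIGHTS` are placeholders chosen for this sketch, not the fitted coefficients of AutoDock 4, idTarget or INVDOCK, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Placeholder weights for illustration. Real programs obtain such
# coefficients by regression against experimental binding affinities.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "hbond": 0.5,
    "elec": 0.25,
    "desolv": 0.25,
    "torsion": 0.5,
}


def weighted_score(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of the energy terms (more negative = better)."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing energy terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


def rank_targets(
    per_target_terms: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Score every target and sort from the most negative energy upward."""
    scores = {
        target: weighted_score(terms, weights)
        for target, terms in per_target_terms.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

A molecular mechanics function would correspond to the same sum with all weights fixed by the force field rather than fitted, which is the main practical difference among the three types described above.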

Reverse docking considers key elements of both shape screening and pharmacophore modeling. By docking, it determines whether a query molecule can fit inside the binding pocket of a protein target, and it scores the interactions between the key pharmacophore groups of the molecule and the target to evaluate that target. Thus, reverse docking could in principle be the most comprehensive of the three methods. However, like traditional molecular docking, it has the following shortcomings: incompleteness of the search space, inaccuracy of the scoring function, and high computational cost (Lee et al., 2016). Relative to traditional docking, reverse docking has the additional problem that the sizes of the active pockets of proteins defined by co-crystallized ligands are inconsistent. Even if the docking pockets are defined to be of equal size, the residue density of different protein binding pockets may vary, resulting in differences in the calculation ranges for the binding interaction energies. Therefore, reverse docking suffers from a rationality problem: binding energies cannot be normalized across targets, so potential targets cannot be reliably sorted. Nevertheless, reverse docking can serve as an effective method to complement shape and pharmacophore screening when the protein structures are available.
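The ranking problem can be made concrete with a toy calculation. The sketch below compares a ranking by raw docking energy with a ranking by a per-target z-score computed against a background set of energies for other ligands on the same target. The z-score correction is an assumption introduced here purely for illustration; it is not a method proposed in this review, and all numbers, target names and function names are made up.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def background_z_score(query_energy: float, background: List[float]) -> float:
    """Express the query energy in standard deviations from the mean energy
    of a background ligand set docked to the same target."""
    if len(background) < 2:
        raise ValueError("need at least two background energies")
    sigma = stdev(background)
    if sigma == 0:
        raise ValueError("background energies have zero spread")
    return (query_energy - mean(background)) / sigma


def rank_by(values: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort targets from the most negative value upward."""
    return sorted(values.items(), key=lambda item: item[1])


# Made-up example: a residue-dense pocket gives every ligand a low energy,
# so the query looks strong there on raw energy alone.
raw = {"dense_pocket": -10.0, "sparse_pocket": -7.0}
backgrounds = {
    "dense_pocket": [-9.0, -10.0, -11.0],
    "sparse_pocket": [-4.0, -5.0, -6.0],
}
normalized = {t: background_z_score(raw[t], backgrounds[t]) for t in raw}
raw_ranking = rank_by(raw)
normalized_ranking = rank_by(normalized)
```

In this toy case the raw energies place the dense pocket first, whereas the background-corrected values place the sparse pocket first, because the query is only average for the dense pocket but clearly better than the background for the sparse one. The two orderings disagree, which is the sorting problem described above.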

Software and online services

Many software programs, some of which are available as online services, can be used for reverse screening to predict the protein targets of small molecules, but the number of online tools available differs considerably among the three methods. Shape screening tools are the most numerous, with more than a dozen available, such as ChemProt (Kringelum et al., 2016), ROCS (Rush et al., 2005), ChemMapper (Gong et al., 2013), and the SEA search server (Keiser et al., 2007). They are listed in the outer ring of Figure 3. By contrast, the only tool available for pharmacophore screening is PharmMapper (Liu et al., 2010), as shown in the inner ring of Figure 3. The main tools available for target searching by reverse docking are TarFisDock (Li et al., 2006), idTarget (Wang et al., 2012) and INVDOCK (Chen and Zhi, 2001), which are illustrated in the middle ring of Figure 3. A few large software packages, such as Schrödinger and Discovery Studio, also contain modules that can perform reverse screening, but they predict the potential targets of small molecules only indirectly, because users must build their own databases or perform other relevant processing steps. We have summarized the basic information on these tools, organized according to their characteristics, in Table 1. In addition, for each type of software and online service, we have provided more detailed descriptions of a few classic representatives.

Figure 3

Table 1

Name	Required format	Search method	Coverage	Reference website	First version	Last version	State
SuperPred	SMILES, Pubchem-Name	2D similarity	341,000 compounds, 1800 targets, 665,000 compound-target interactions	http://prediction.charite.de/	2014	2014	Accessible
HitPick	SMILES	1NN similarity searching and Laplacian-modified naïve Bayesian target models	145,549 chemical-protein interactions collected from STITCH 3.1	http://mips.helmholtz-muenchen.de/hitpick/cgi-bin/index.cgi?content=targetPrediction.html	2013	2013	Accessible
ChemMapper	SMILES, MOL2, SDF, SMI	SHAFTS, USR, FP2, MACCS, and random walk algorithm	Nearly 300,000 chemical structures and >3 million compounds	http://lilab.ecust.edu.cn/chemmapper/	2013	2016	Accessible
SEA server	SMILES	FP2 and BLAST-like model	Hundreds of target-ligand sets	http://sea.bkslab.org/	2007	2007	Accessible
ReverseScreen3D	SMILES	Hybrid 2D&3D	Automatically updated from RCSB PDB	http://www.modeling.leeds.ac.uk/ReverseScreen3D	2011	2011	Inaccessible
TarPred	SMILES	KNN-based data fusion with molecular similarity	533 individual targets with 179,807 active ligands	http://202.127.19.75:5555/	2015	2015	Inaccessible
SwissTargetPrediction	SMILES	Five species, FP2, 3D similarity	280,000 compounds and >2000 targets	http://www.swisstargetprediction.ch/index.php	2014	2014	Accessible
SwissSimilarity	SMILES	FP2, five 3D methods	>30 chemical databases covering drugs, bioactive compounds, etc.	http://www.swisssimilarity.ch/	2016	2016	Accessible
ChemProt	SMILES, name	MACCS, FP2, daylight-like fingerprints	>1.7 million unique chemicals and >20,000 proteins	http://potentia.cbs.dtu.dk/ChemProt/	2011	2016	Accessible
TargetHunter	SMILES	ECFP6, ECFP4, FP2	ChEMBL Version 22	http://www.cbligand.org/TargetHunter/	2013	2016	Accessible
CSNAP3D	SMILES, SDF	3D similarity, network algorithms	Based on ChEMBL database	http://services.mbi.ucla.edu/CSNAP/index.html	2016	2016	Accessible
ROCS	SDF, MOL2, PDB	3D similarity	User prepared	https://www.eyesopen.com/rocs	2006	2017	Accessible
PharmMapper	MOL2, SDF	Pharmacophores	23,236 proteins covering >53,000 pharmacophore models	http://lilab.ecust.edu.cn/pharmmapper/	2010	2017	Accessible
TarFisDock	MOL2	DOCK4.0	Based on the PDTD, which contains 1207 entries covering 841 known and potential drug targets	http://www.dddc.ac.cn/tarfisdock/	2006	2008	Inaccessible
idTarget	PDB, MOL2, pdbqt, cif	MeDock, divide-and-conquer	All protein structures in the PDB	http://idtarget.rcas.sinica.edu.tw	2012	2012	Accessible
INVDOCK	NA	DOCK	An in-house database (9000 protein and nucleic acid entries)	http://bidd.nus.edu.sg/group/softwares/invdock.htm	2001	NA	Accessible
Discovery Studio	SDF, MOL2, PDB	Pharmacophores	140,000 receptor-ligand pharmacophore models	http://accelrys.com/products/collaborative-science/biovia-discovery-studio/pharmacophore-and-ligand-based-design.html	2012	2017	Accessible

Characteristics of reverse screening tools.