# CHEMOINFORMATICS APPROACHES TO STRUCTURE- AND LIGAND-BASED DRUG DESIGN

EDITED BY : Adriano D. Andricopulo and Leonardo L. G. Ferreira PUBLISHED IN : Frontiers in Pharmacology

#### Frontiers Copyright Statement

© Copyright 2007-2019 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use. ISSN 1664-8714 ISBN 978-2-88945-744-1 DOI 10.3389/978-2-88945-744-1



Topic Editors:

Adriano D. Andricopulo, University of Sao Paulo, Brazil

Leonardo L. G. Ferreira, University of Sao Paulo, Brazil

Cover image: Robert Kneschke/Shutterstock.com

Chemoinformatics is paramount to current drug discovery. Structure- and ligand-based drug design strategies have been used to uncover hidden patterns in large amounts of data, and to disclose the molecular aspects underlying ligand-receptor interactions. This Research Topic aims to share with a broad audience the most recent trends in the use of chemoinformatics in drug design. To that end, experts in all areas of drug discovery have made their knowledge available through a series of articles that report state-of-the-art approaches. Readers are provided with outstanding contributions focusing on a wide variety of topics which will be of great value to those interested in the many different and exciting facets of drug design.

Citation: Andricopulo, A. D., Ferreira, L. L. G., eds (2019). Chemoinformatics Approaches to Structure- and Ligand-Based Drug Design. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-744-1

# Table of Contents

*Editorial: Chemoinformatics Approaches to Structure- and Ligand-Based Drug Design*

Leonardo L. G. Ferreira and Adriano D. Andricopulo

#### SECTION A

#### CHEMOINFORMATICS APPROACHES FOR INFECTIOUS DISEASE DRUG DISCOVERY


Marilia N. N. Lima, Cleber C. Melo-Filho, Gustavo C. Cassiano, Bruno J. Neves, Vinicius M. Alves, Rodolpho C. Braga, Pedro V. L. Cravo, Eugene N. Muratov, Juliana Calit, Daniel Y. Bargieri, Fabio T. M. Costa and Carolina H. Andrade

*Molecular Simulations of Carbohydrates With a Fucose-Binding* Burkholderia ambifaria *Lectin Suggest Modulation by Surface Residues Outside the Fucose-Binding Pocket*

Tamir Dingjan, Anne Imberty, Serge Pérez, Elizabeth Yuriev and Paul A. Ramsland

#### SECTION B

#### STRUCTURE-BASED DRUG DESIGN APPROACHES FOR CANCER

*Identification of a New Potent Inhibitor Targeting KRAS in Non-small Cell Lung Cancer Cells*

Chun Xie, Ying Li, Lan-Lan Li, Xing-Xing Fan, Yu-Wei Wang, Chun-Li Wei, Liang Liu, Elaine Lai-Han Leung and Xiao-Jun Yao

*Gossypol Inhibits Non-small Cell Lung Cancer Cells Proliferation by Targeting EGFR<sup>L858R/T790M</sup>*

Yuwei Wang, Huanling Lai, Xingxing Fan, Lianxiang Luo, Fugang Duan, Zebo Jiang, Qianqian Wang, Elaine Lai Han Leung, Liang Liu and Xiaojun Yao


Kiminori Hori, Kasumi Ajioka, Natsuko Goda, Asako Shindo, Maki Takagishi, Takeshi Tenno and Hidekazu Hiroaki

#### SECTION C

#### ASSESSING RELEVANT END-POINTS WITH CHEMOMETRICS AND PHARMACOGENOMICS

*Comparison of Quantitative and Qualitative (Q)SAR Models Created for the Prediction of K<sub>i</sub> and IC<sub>50</sub> Values of Antitarget Inhibitors*

Alexey A. Lagunin, Maria A. Romanova, Anton D. Zadorozhny, Natalia S. Kurilenko, Boris V. Shilov, Pavel V. Pogodin, Sergey M. Ivanov, Dmitry A. Filimonov and Vladimir V. Poroikov

*QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery*

Bruno J. Neves, Rodolpho C. Braga, Cleber C. Melo-Filho, José Teófilo Moreira-Filho, Eugene N. Muratov and Carolina Horta Andrade

*Transfer and Multi-task Learning in QSAR Modeling: Advances and Challenges*

Rodolfo S. Simões, Vinicius G. Maltarollo, Patricia R. Oliveira and Kathia M. Honorio

*Development of an Infrastructure for the Prediction of Biological Endpoints in Industrial Environments. Lessons Learned at the eTOX Project*

Manuel Pastor, Jordi Quintana and Ferran Sanz


Samuel Lampa, Jonathan Alvarsson, Staffan Arvidsson Mc Shane, Arvid Berg, Ernst Ahlberg and Ola Spjuth

*Extending* in Silico *Protein Target Prediction Models to Include Functional Effects*

Lewis H. Mervin, Avid M. Afzal, Lars Brive, Ola Engkvist and Andreas Bender

*Genotypic and Phenotypic Factors Influencing Drug Response in Mexican Patients With Type 2 Diabetes Mellitus*

Hector E. Sanchez-Ibarra, Luisa M. Reyes-Cortes, Xian-Li Jiang, Claudia M. Luna-Aguirre, Dionicio Aguirre-Trevino, Ivan A. Morales-Alvarado, Rafael B. Leon-Cachon, Fernando Lavalle-Gonzalez, Faruck Morcos and Hugo A. Barrera-Saldaña

#### SECTION D

#### DRUG TARGETS AND CHEMICAL DIVERSITY

*Development of Matrix Metalloproteinase-2 Inhibitors for Cardioprotection*

Péter Bencsik, Krisztina Kupai, Anikó Görbe, Éva Kenyeres, Zoltán V. Varga, János Pálóczi, Renáta Gáspár, László Kovács, Lutz Weber, Ferenc Takács, István Hajdú, Gabriella Fabó, Sándor Cseh, László Barna, Tamás Csont, Csaba Csonka, György Dormán and Péter Ferdinandy

#### SECTION E

#### PROFILING INTERMOLECULAR INTERACTIONS IN EARLY DRUG DISCOVERY

*Aromatic Rings Commonly Used in Medicinal Chemistry: Force Fields Comparison and Interactions With Water Toward the Design of New Chemical Entities*

Marcelo D. Polêto, Victor H. Rusu, Bruno I. Grisci, Marcio Dorn, Roberto D. Lins and Hugo Verli


Shan Wang, Yu Tian, Min Wang, Min Wang, Gui-bo Sun and Xiao-bo Sun


Fang-Yu Lin, Emilio Xavier Esposito and Yufeng J. Tseng

*Structural Changes Due to Antagonist Binding in Ligand Binding Pocket of Androgen Receptor Elucidated Through Molecular Dynamics Simulations*

Sugunadevi Sakkiah, Rebecca Kusko, Bohu Pan, Wenjing Guo, Weigong Ge, Weida Tong and Huixiao Hong

# Editorial: Chemoinformatics Approaches to Structure- and Ligand-Based Drug Design

Leonardo L. G. Ferreira\* and Adriano D. Andricopulo\*

Laboratory of Medicinal and Computational Chemistry, Center for Research and Innovation in Biodiversity and Drug Discovery, Physics Institute of Sao Carlos, University of Sao Paulo, Sao Carlos, Brazil

Keywords: drug design, molecular modeling, computational chemistry, QSAR, molecular docking, QSPR, virtual screening

**Editorial on the Research Topic**

#### **Chemoinformatics Approaches to Structure- and Ligand-Based Drug Design**

Pharmaceutical research and development (R&D) has faced outstanding challenges as scientific breakthroughs achieved in the past two decades have revolutionized the field. Important approaches such as high-throughput screening (HTS) have increasingly been used in combination with emerging strategies relying on genomics, chemical biology and molecular modeling (Jones and Bunnage, 2017). These forefront approaches have promoted substantial progress in our understanding of key biological processes, in addition to fostering critical advances in the armamentarium available for drug R&D (Liu et al., 2017). Along with synthetic strategies such as combinatorial chemistry, which has supported a consistent expansion of the chemical space explored in drug discovery, these state-of-the-art technologies are shaping the future of the pharmaceutical industry. The integration of these methodologies into the drug discovery enterprise has led to an exponential growth of chemical and biological data, in addition to a sharp increase in the complexity of the R&D process. As a result, current players in drug discovery have invested unprecedentedly in the development of computational methods to extract meaning from these data and simulate critical phenomena related to drug efficacy, pharmacokinetics (PK) and toxicity (Macalino et al., 2015). The value of using in silico strategies has been demonstrated by the increasing number of publications reporting campaigns that have resulted in the discovery of promising lead compounds, many of which have entered clinical development or reached the market.

Usually, these computer-assisted efforts integrate ligand- and structure-based drug design strategies (LBDD and SBDD, respectively) with a combination of experimental techniques (Ferreira et al., 2015). Broadly used SBDD approaches, such as molecular docking, homology modeling, molecular dynamics and structure-based virtual screening, have provided relevant insights into ligand-receptor interactions (Wang et al., 2016). Equally important, LBDD methods such as pharmacophore modeling, quantitative structure-activity relationships (QSAR) and ligand-based virtual screening have been actively used to explore small-molecule databases and produce correlations between chemical features and pharmacological activity (Lavecchia, 2015). Also a hot topic in LBDD, quantitative structure-property relationship (QSPR) models are central for predicting PK and toxicity-related characteristics (Tao et al., 2015).

Including original and review articles, this research topic (RT) connects recent applications of LBDD and SBDD with the study of drug activity, as well as drug absorption, distribution, metabolism, excretion (ADME) and toxicity. We were able to collect 31 articles connecting more than 200 scientists from all around the world. Molecular modeling strategies for different conditions such as cancer, leishmaniasis, malaria, coronary heart disease, diabetes, and Alzheimer's disease are addressed. A broad variety of topics including the development of scoring functions, nuclear magnetic resonance (NMR)-assisted molecular docking, and the interplay between molecular docking and molecular dynamics are covered. In addition, this RT highlights the use of natural products as inhibitors of molecular targets such as the epidermal growth factor receptor (EGFR) and tumor necrosis factors (TNFs). New ligands targeting protein arginine methyltransferases, Kirsten rat sarcoma viral oncogene homolog (KRAS), G protein-coupled receptors (GPCRs), matrix metalloproteinases, heat shock proteins (HSPs), and mammalian Disheveled are also considered. The design and implementation of online platforms for the prediction of in vivo toxicity, off-target interactions, and PK properties are described. The use of chemical proteomic approaches to profile molecular targets, force fields for assessing compound-solvent interactions, and algorithms that consider synthetic accessibility for lead optimization are also reported.

It is our aim that the high-quality material enclosed in this RT contributes to the dissemination of outstanding science across the worldwide research community dedicated to the fascinating universe of drug discovery.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### ACKNOWLEDGMENTS

We would like to thank all the authors, editors and reviewers who have participated in the success of this research topic.

### REFERENCES

as in-silico ADME prediction tools. Adv. Drug Deliv. Rev. 86, 83–100. doi: 10.1016/j.addr.2015.03.014

Wang, T., Wu, M. B., Zhang, R. H., Chen, Z. J., Hua, C., Lin, J. P., et al. (2016). Advances in computational structure-based drug design and application in drug discovery. Curr. Top. Med. Chem. 16, 901–916. doi: 10.2174/1568026615666150825142002

Edited and reviewed by:

Salvatore Salomone, Università degli Studi di Catania, Italy

\*Correspondence:

Adriano D. Andricopulo aandrico@ifsc.usp.br Leonardo L. G. Ferreira leonardo@ifsc.usp.br

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 03 November 2018 Accepted: 16 November 2018 Published: 04 December 2018

#### Citation:

Ferreira LLG and Andricopulo AD (2018) Editorial: Chemoinformatics Approaches to Structure- and Ligand-Based Drug Design. Front. Pharmacol. 9:1416. doi: 10.3389/fphar.2018.01416

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Ferreira and Andricopulo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Chemoinformatics Strategies for Leishmaniasis Drug Discovery

Leonardo L. G. Ferreira\* † and Adriano D. Andricopulo\* †

Laboratory of Medicinal and Computational Chemistry, Center for Research and Innovation in Biodiversity and Drug Discovery, São Carlos Institute of Physics, University of São Paulo, São Carlos, Brazil

Leishmaniasis is a fatal neglected tropical disease (NTD) that is caused by more than 20 species of Leishmania parasites. The disease kills approximately 20,000 people each year and more than 1 billion are susceptible to infection. The therapeutic arsenal comprises only a few compounds and faces drawbacks such as drug resistance, toxicity issues, high treatment costs, and accessibility problems, which highlight the need for novel treatment options. Worldwide efforts have been made toward that aim and, as in other therapeutic areas, chemoinformatics has contributed significantly to leishmaniasis drug discovery. Breakthrough advances in the comprehension of the parasites' molecular biology have enabled the design of high-affinity ligands for a number of macromolecular targets. In addition, the use of chemoinformatics has allowed highly accurate predictions of the biological activity and the physicochemical and pharmacokinetic properties of novel antileishmanial compounds. This review puts into perspective the current context of leishmaniasis drug discovery and focuses on the use of chemoinformatics to develop better therapies for this life-threatening condition.

#### Edited by:

Salvatore Salomone, Università degli Studi di Catania, Italy

#### Reviewed by:

Simone Brogi, Università degli Studi di Siena, Italy Ana Carolina Rennó Sodero, Universidade Federal do Rio de Janeiro, Brazil

#### \*Correspondence:

Leonardo L. G. Ferreira leonardo@ifsc.usp.br Adriano D. Andricopulo aandrico@ifsc.usp.br

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 08 August 2018 Accepted: 18 October 2018 Published: 01 November 2018

#### Citation:

Ferreira LLG and Andricopulo AD (2018) Chemoinformatics Strategies for Leishmaniasis Drug Discovery. Front. Pharmacol. 9:1278. doi: 10.3389/fphar.2018.01278

Keywords: medicinal chemistry, ligand-based drug design, structure-based drug design, neglected tropical diseases, molecular modeling, leishmania

#### CURRENT PANORAMA OF LEISHMANIASIS

Leishmaniasis is a neglected tropical disease (NTD) that causes approximately 20,000 deaths each year. Nearly 300,000 new cases of the disease are registered annually, and over 1 billion people are exposed to the risk of infection<sup>1</sup>. The disease is caused by more than 20 species of Leishmania protozoan parasites that are transmitted to humans through the bites of female Phlebotomus and Lutzomyia sandflies. Leishmaniasis occurs in 98 tropical and subtropical countries encompassing the Mediterranean Basin, South-East Asia, Afro-Eurasia, East Africa, and the Americas. People who are exposed to adverse socioeconomic circumstances, malnutrition, poor housing, and unsanitary conditions are the main target of leishmaniasis (Hailu et al., 2016).

<sup>1</sup>http://www.who.int/leishmaniasis/en/

Although leishmaniasis is a curable condition, treatment depends on a variety of factors, including geographic region, clinical form of the disease and parasite species. The available chemotherapy consists of drugs that cause serious side effects, such as renal, pancreatic and hepatic toxicity, teratogenicity, and cardiac and gastrointestinal problems (Copeland and Aronson, 2015). The need for hospitalization, long-term and costly treatment, and drug resistance are additional drawbacks. To this list, one may add the difficulties in implementing the widespread use of the 2014-approved drug miltefosine due to problems of affordability and limited availability and accessibility (Sunyoto et al., 2018). Another current concern in endemic regions is the contingent of patients with leishmaniasis who are coinfected with HIV. Lower cure rates are achieved in these patients because both pathogens attack the immune system. Furthermore, this group is more vulnerable to the drug-associated adverse effects, which contribute to higher death rates (Abongomera et al., 2018). These drawbacks have driven the creation of robust worldwide efforts to pursue novel therapeutic options. This article provides a perspective on these efforts, focusing on recent advances that involve the use of chemoinformatics.

### FROM TRIAL-AND-ERROR TO KNOWLEDGE-BASED DRUG DESIGN

Similar to most early NTD-focused research programs, drug discovery for leishmaniasis relied on trial-and-error strategies that were based solely on phenotypic screenings. This paradigm reflected the lack of a reasonable understanding of the molecular aspects of Leishmania biology and the cellular processes involved in parasite–host interaction (Gilbert, 2013). This setting began to change when the outstanding findings from genome projects in the mid-2000s started to open an array of new opportunities in leishmaniasis drug discovery (Reguera et al., 2014). Simultaneously, novel collaborative networks were established, incorporating pharmaceutical companies and not-for-profit organizations, which, along with research and academic institutions, have brought previously unavailable technological and scientific developments to the field (Preston and Gasser, 2018). Since then, genomics, proteomics, and structural biology data have been made available via open-access NTD-focused databases, which have been essential to the use of chemoinformatics in leishmaniasis research. The Sanger Institute's GeneDB, for example, organizes the data of several Leishmania species and is a useful tool for searching particular gene sequences and investigating gene similarity and function (Logan-Klumpler et al., 2012). Another important virtual platform, the WHO's TDR Targets Database, is a chemogenomics resource that is focused on NTDs and connects information from diverse protein and small-molecule libraries (Magariños et al., 2012). In doing so, the TDR Targets Database algorithm generates privileged combinations of molecular targets and compounds to be considered for experimental studies. To this list, one may add LmSmdB, which is a database that simulates metabolic networks (Patel et al., 2016), and LeishMicrosatDB, which is a search engine for microsatellite sequences in Leishmania genomes (Dikhit et al., 2014). As a result of these advances, more than 340 protein structures from Leishmania spp. are currently registered in the Protein Data Bank (PDB) (Berman et al., 2000). These data have been key to understanding the parasite's molecular machinery and interspecies variability, which are fundamental aspects of developing broad-spectrum drugs.

Taking advantage of this progress, researchers have increasingly engaged in research and development (R&D) organizational models that are characterized by well-structured worldwide collaboration networks, which are referred to as public-private partnerships (PPPs) (Preston and Gasser, 2018). These initiatives have been pivotal to enhancing the research infrastructure of NTDs by providing state-of-the-art facilities and technologies, high-quality compound libraries for screening and highly qualified human resources. One noteworthy example is the Drugs for Neglected Diseases Initiative's (DNDi) Lead Optimization Latin America (LOLA) consortium, which focuses on preclinical in vitro and in vivo efficacy, safety and pharmacokinetics assessment<sup>2</sup>. Experimental evaluation is routinely followed by chemoinformatics studies to identify structure-activity and structure-property relationships that guide the design of optimized compounds. The value of this type of initiative has been demonstrated by the successful development of several candidates that are currently undergoing advanced preclinical trials for leishmaniasis<sup>3</sup>.

### STRUCTURE- AND LIGAND-BASED STRATEGIES IN LEISHMANIASIS DRUG DISCOVERY

Technologies such as combinatorial chemistry and high-throughput screening (HTS) have enabled tests on large compound libraries that encompass a significant chemical diversity in short time scales (Folmer, 2016; Liu et al., 2017). Although these highly impactful approaches have enhanced the potential of the pharmaceutical industry to deliver better drugs in all therapeutic areas, they have also increased the complexity of drug R&D. In this context, in which the outstanding demands for innovation are constantly challenged by significant attrition rates, the industry has put intensive effort into the integration of computational tools into the research pipeline (Rognan, 2017). Being cost-effective mainly in the early stages of discovery, this R&D setting is especially suited to clinical conditions, such as leishmaniasis, which have limited resources compared with mainstream therapeutic areas. Hence, given the ability of chemoinformatics to rapidly estimate ligand-receptor interactions and a number of physicochemical and pharmacokinetic properties, this approach has steadily grown as a key component of drug R&D (Ponder et al., 2014; Macalino et al., 2015).

<sup>2</sup>https://www.dndi.org/2013/media-centre/news-views-stories/news/first-earlystage-research-latin-america/

<sup>3</sup>https://www.dndi.org/diseases-projects/portfolio/

Notwithstanding their broad diversity, chemoinformatics tools are generally classified into structure- and ligand-based drug design (SBDD and LBDD, respectively) approaches. SBDD methods consist of the use of the 3D coordinates of molecular targets to investigate and optimize ligand-receptor interactions (van Montfort and Workman, 2017). SBDD programs have revealed the 3D architecture of a variety of drug targets, mainly by the use of techniques such as X-ray crystallography. By uncovering binding site attributes, such as shape and electronic distribution, SBDD efforts have been able to deliver ligands with accurately designed properties to achieve high-affinity interactions with their targets (Ferreira et al., 2015). This process is generally assisted by methods such as molecular docking and structure-based virtual screening (SBVS), whereby potential ligands can be evaluated as to their binding mode and energetics (**Figure 1A**). By examining these data along with experimental results, structure-activity relationships (SAR) can be derived and then used to optimize ligand-receptor affinity and other properties (dos Santos et al., 2018).
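To make the link between binding energetics and affinity concrete, the following minimal sketch applies the standard thermodynamic relation ΔG = RT ln K<sub>d</sub> (1 M standard state). The function names and the example energies are illustrative assumptions and do not come from any study cited here; docking scores are only approximations of a true binding free energy, so the converted values should be read as rough orders of magnitude.

```python
import math

# Gas constant in kcal/(mol*K); the energy unit is an assumption of this sketch.
R_KCAL = 1.987204e-3


def kd_from_delta_g(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (mol/L) implied by a binding free energy (kcal/mol)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


def delta_g_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy (kcal/mol) implied by a dissociation constant (mol/L)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    # Hypothetical binding energies, chosen only to show the trend.
    for delta_g in (-6.0, -9.0, -12.0):
        print(f"dG = {delta_g:6.1f} kcal/mol -> Kd = {kd_from_delta_g(delta_g):.2e} M")
```

More negative energies map to smaller dissociation constants, which is why predicted energies are commonly used to rank candidate ligands before experimental testing.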

Some promising macromolecular targets have been investigated in leishmaniasis drug discovery. The most relevant are topoisomerases and proteases (mainly cysteine-proteases) (Ansari et al., 2017). Other important targets are tubulin, proteins of the folate metabolic route, kinases, phosphodiesterases, and enzymes that are involved in the trypanothione and purine salvage pathways (Ansari et al., 2017). Ligands belonging to a broad variety of chemical classes have been identified for these targets, providing high-quality data for drug design.

Ligand-based drug design studies can be performed without the receptor 3D structure. Instead, they require information on the structure, activity, and molecular properties of small molecules (Chen, 2013). These data are used to construct chemometric models that correlate molecular properties (molecular descriptors) with pharmacodynamics and pharmacokinetics parameters (target properties). In doing so, quantitative structure-activity and structure-property relationships (QSAR and QSPR, respectively) can be derived to identify molecular descriptors that are directly associated with the target property (Yousefinejad and Hemmateenejad, 2015). By providing this type of information, these models are useful for evaluating the target property and guiding the design of new compounds that have improved profiles (**Figure 1B**). Today, many free-access and commercial software programs that include well-validated QSAR and QSPR models are available for predicting a number of properties. They vary from online platforms that are very straightforward to use to packages that require local license installation.
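The idea of correlating a molecular descriptor with a target property can be illustrated with the simplest possible QSAR model, a one-descriptor least-squares line. The sketch below is an assumption-laden toy: the descriptor (a lipophilicity value), the activity values and the function names are all hypothetical, and real QSAR work uses many descriptors, external validation and more robust algorithms.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on the training data."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical training set: one descriptor per compound and its measured activity.
LOGP = [1.0, 2.0, 3.0, 4.0, 5.0]
PIC50 = [5.1, 5.9, 7.2, 7.8, 9.0]

if __name__ == "__main__":
    slope, intercept = fit_line(LOGP, PIC50)
    print(f"pIC50 = {slope:.2f} * logP + {intercept:.2f}")
    print(f"R^2 (training) = {r_squared(LOGP, PIC50, slope, intercept):.3f}")
    # Predict the activity of a new, untested compound from its descriptor.
    print(f"predicted pIC50 at logP 3.5: {slope * 3.5 + intercept:.2f}")
```

The fitted coefficients play the role described above: they indicate how strongly the descriptor is associated with the target property and allow the property to be estimated for compounds that have not yet been synthesized.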

The use of SBDD and LBDD methods in leishmaniasis drug discovery is an encouraging strategy that has advanced alongside the progress made in the NTD field (Njogu et al., 2016). Chemoinformatics studies have incorporated different SBDD workflows that focus on established and newly discovered molecular targets. On the other hand, the use of QSAR and QSPR models for predicting key pharmacodynamic and pharmacokinetic properties has also been noteworthy. The manipulation of this information, including genomics, metabolomics, structural, and small-molecule data, has been particularly useful for running metabolic network predictions for prospecting novel molecular targets and promising compounds and for proposing likely mechanisms of action. The next sections bring a perspective on a few recent cases using chemoinformatics, focusing on their contribution to the progress of leishmaniasis drug R&D.

#### Structure-Based Studies

Structure-based drug design efforts have prominently contributed to uncovering novel ligands for both well-established and newly discovered drug targets in Leishmania spp. One example is pteridine reductase 1 (PTR1), which is an enzyme involved in the pteridine salvage pathway and folate metabolism and a validated target in leishmaniasis drug discovery (Ong et al., 2011). This enzyme was explored in a study that reported on an SBDD strategy for designing novel inhibitors that combine the features of dihydropyrimidine and chalcone derivatives (Rashid et al., 2016). By using the crystallographic structure of L. major PTR1, the authors proposed a series of analogs to achieve high-affinity interactions with the catalytic site of the enzyme. Molecular docking-guided structural modifications on the dihydropyrimidine and chalcone moieties and a reduction in the number of rotatable bonds led to the most active compounds against L. major. For example, compound **1** proved to be highly active against both L. major and L. donovani promastigotes, exhibiting half-maximal inhibitory concentration (IC<sub>50</sub>) values of 948 nM and 3 µM, respectively (**Figure 2A**). The predicted ligand-receptor binding energies were consistent with the in vitro antileishmanial activity values. These results demonstrate the suitability of these substituted dihydropyrimidines to be further investigated as potential agents against both visceral and cutaneous leishmaniasis.
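Potencies reported in different units, such as the 948 nM and 3 µM values above, are easier to compare on the logarithmic pIC<sub>50</sub> scale, defined as the negative base-10 logarithm of the IC<sub>50</sub> in mol/L. The short sketch below shows the conversion; the function name and the unit table are illustrative choices, not part of the cited study.

```python
import math

# Conversion factors from common concentration units to mol/L.
_UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pic50(value: float, unit: str) -> float:
    """Return pIC50 = -log10(IC50 expressed in mol/L)."""
    if value <= 0:
        raise ValueError("IC50 must be positive")
    if unit not in _UNIT_TO_MOLAR:
        raise ValueError(f"unknown unit: {unit}")
    return -math.log10(value * _UNIT_TO_MOLAR[unit])


if __name__ == "__main__":
    # IC50 values quoted in the text for compound 1.
    print(f"L. major:    pIC50 = {pic50(948, 'nM'):.2f}")
    print(f"L. donovani: pIC50 = {pic50(3, 'uM'):.2f}")
```

On this scale the two values come out at roughly 6.0 and 5.5, so the difference between the two species is about half a log unit.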

Among Leishmania cysteine proteases, type B enzymes (CPB) have been recognized as key virulence factors whose activity is essential for parasite survival and the invasion of host cells (Casgrain et al., 2016). Within this group, the cathepsin-L-like endopeptidase CPB2.8 has emerged as a promising drug target in leishmaniasis. An article by De Luca et al. (2018) reported the discovery of a series of substituted benzimidazole derivatives that feature nanomolar affinity for L. mexicana CPB2.8 (K<sub>i</sub> values ranging from 150 to 690 nM). A few analogs displayed interesting activity on L. infantum intracellular amastigotes, with the most potent one (**2**) yielding an IC<sub>50</sub> of 6.8 µM (**Figure 2B**). Molecular docking studies were run to examine the binding mode of the compounds within the catalytic site of CPB2.8 and to rationalize the enzyme kinetics data. The absorption, distribution, metabolism, excretion and toxicity (ADMET) properties were predicted to evaluate the drug-likeness of the series and hence, its suitability for further development. Compound **2** demonstrated a good bioavailability profile, which, along with the biochemical and biological results, rendered it a good candidate for future drug design efforts.
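The text does not state which drug-likeness criteria were applied to this series. As a generic illustration of how such a filter can work, the sketch below implements Lipinski's rule of five, a widely used heuristic for oral bioavailability; it is a stand-in, not the method of the cited study, and the descriptor values are hypothetical rather than those of compound **2**.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical precomputed descriptors for one compound."""
    mol_weight: float      # g/mol
    logp: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = (
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    )
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """A compound is commonly accepted when it has at most one violation."""
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    # Illustrative values only; not taken from any compound in the text.
    example = Descriptors(mol_weight=342.4, logp=3.1, h_bond_donors=2, h_bond_acceptors=5)
    print(lipinski_violations(example), passes_rule_of_five(example))
```

In practice the descriptors would be computed by a chemoinformatics toolkit, and a rule-based filter like this would be only one of several ADMET predictions considered together.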

Type 2 NADH dehydrogenase (NDH2), a mitochondrial enzyme that catalyzes the electron transfer from NADH to ubiquinone, is an emerging drug target in leishmaniasis drug discovery (Marreiros et al., 2017). By constructing a homology model of the enzyme, Stevanović et al. (2018) conducted a pharmacophore-based virtual screening to find novel L. infantum NDH2 inhibitors. A group of 23 virtual hits were selected and screened against the recombinant enzyme and subsequently tested for their activity on L. infantum whole cells. Out of this set, a 6-methoxy-quinalidine derivative (**3**, **Figure 3A**) proved to be the best NDH2 inhibitor (K<sub>i</sub> = 8.9 µM). In addition, this compound exhibited nanomolar activity against both L. infantum axenic amastigotes (IC<sub>50</sub> = 200 nM) and promastigotes (IC<sub>50</sub> = 30 nM). These remarkable results make this novel quinalidine derivative a promising starting point for molecular optimization and in vivo studies for visceral leishmaniasis.

Ochoa et al. (2016) reported the use of the IBM World Community Grid to run an SBVS campaign on 53 different Leishmania proteins. First, molecular dynamics simulations were performed for this entire set, and then, distinct conformational states of each structure were selected for the SBVS effort. Approximately 2,000 conformations were selected and used to screen a database of 600,000 drug-like compounds, resulting in 1 billion protein-ligand complexes. A group of four proteins were observed engaging in high-affinity interactions with the database compounds, and the most favorable binding energy occurred in L. major dihydroorotate dehydrogenase (LmDHODH). This enzyme catalyzes the oxidation of dihydroorotate, a key reaction in the pyrimidine synthesis pathway (Cordeiro et al., 2012). Ten top-scoring LmDHODH inhibitors were selected and evaluated for their in vitro antileishmanial activity. Four molecules were active against L. panamensis intracellular amastigotes, with the most active one (**4**, **Figure 3B**) yielding a half-maximal effective concentration (EC<sub>50</sub>) of 1.42 µM, a value comparable to that of the reference drug amphotericin B. Furthermore, this compound showed no toxicity in human macrophages. This compound is a promising candidate for further development, and future investigations are expected to assess its efficacy in reducing in vivo parasite burden.

dehydrogenase (NDH2) resulting in the identification of compound 3, a remarkably potent antileishmanial agent. (B) An SBDD strategy targeting diverse Leishmania proteins that led to the discovery of 4, a novel compound having promising antileishmanial activity.

The enzyme topoisomerase 1 from L. donovani (LdTop1) was selected as the molecular target in an SBDD study by Mamidala and coworkers (Mamidala et al., 2016). The enzyme catalyzes single-strand breaks in DNA, which enables the topological changes that are required during fundamental cellular processes such as gene replication and transcription (Pommier et al., 2016). The authors reported the discovery of a series of LdTop1 inhibitors by using scaffold hopping and bioisosteric manipulations. The structures of known Top1 inhibitors such as camptothecin and edotecarin were used as the starting points for the molecular design. The outline of the compounds was guided by molecular docking runs using the X-ray structures of LdTop1 and the human ortholog. Six compounds showed selective activity against LdTop1 over the human enzyme, yielding EC<sub>50</sub> values from 1 to 30 µM (**5–10**, **Figure 4**). The best inhibitor (**5**, EC<sub>50</sub> = 3.51 µM) exhibited interesting biological activity against L. donovani promastigotes (IC<sub>50</sub> = 4.21 µM) and no toxicity against mammalian cells. The structure of the ternary complex **5**-LdTop1-DNA, which was predicted by molecular docking, revealed key structural features for the design of novel analogs. Considering the suitable antileishmanial activity and the lack of cytotoxicity, further studies on compound **5** would be useful for assessing other aspects, such as its pharmacokinetic profile.

FIGURE 6 | Ligand-based approach to classify compounds according to their mechanism of action. The effects of the dataset compounds on Leishmania metabolism were analyzed by capillary electrophoresis–mass spectrometry, and the data were used in a principal component analysis (PCA). The PCA was able to cluster compounds according to the perturbation they caused in the parasite's metabolic network.

Brindisi et al. (2015) reported for the first time the discovery of non-covalent tryparedoxin peroxidase inhibitors. Tryparedoxin peroxidase has been considered as a molecular target in SBDD studies since it reduces hydroperoxides produced by infected macrophages. This mechanism of detoxification is particularly attractive for drug design since it is unique to the parasite and essential for its survival (Fiorillo et al., 2012). By using the X-ray structure of Leishmania major tryparedoxin peroxidase I (LmTXNPx), the authors ran a molecular docking effort and selected a set of hits for experimental profiling. The docking conformations were used for the design of a series of N,N-disubstituted 3-aminomethyl quinolones, some of which displayed activity against LmTXNPx. Forming a number of hydrogen bonds and hydrophobic contacts with the enzyme, the most potent compound (**11**, **Figure 5**), which has a bulky aliphatic adamantyl system, showed activity in the micromolar range (K<sub>d</sub> = 39 µM). Calculation of physicochemical parameters demonstrated the drug-likeness of the designed series. In view of the activity and the drug-like properties of quinolone derivative **11**, this compound represents a suitable starting point for further studies aimed at the development of novel drug candidates against leishmaniasis.

#### Ligand-Based Studies

A variety of LBDD approaches have been recently reported in leishmaniasis drug discovery. These studies are frequently conducted in combination with experimental protocols and SBDD methods. The main goals include the use of QSAR and QSPR models to predict activity and ADMET parameters and the search for novel compounds via ligand-based virtual screening (LBVS). One of these studies reports an approach to pursuing novel compounds based on their effects on cell metabolism (Armitage et al., 2018). A collection of structurally diverse compounds, including those enclosed in the Leishmania box (a set of 592 compounds identified in HTS campaigns at GSK) (Peña et al., 2015), was evaluated in axenic L. donovani amastigotes, and the resulting metabolic changes were examined by capillary electrophoresis–mass spectrometry (**Figure 6**). Next, a principal component analysis (PCA) was applied to generate a model that sorts these compounds according to their putative mode of action. The authors demonstrated structural patterns involved in the modulation of different metabolic pathways and, additionally, the role of physicochemical properties in the stimulation of individual biochemical routes. The study is valuable because it enables the classification of compound databases according to the most likely mechanism of action and biological outcomes. It also provides a way to run mechanistic studies of compounds that are known to be active against Leishmania species, thus offering a guide for downstream experimental profiling.

With the aid of QSAR modeling, Bhagat and coauthors described the synthesis and in vitro evaluation of 26 aminophosphonate derivatives (Bhagat et al., 2014). Six compounds (**12–17**, **Figure 7A**) displayed activity on L. donovani promastigotes in the low micromolar range (IC<sub>50</sub> from 7.10 to 8.95 µM) and cytotoxicity on J774 macrophages comparable to that of amphotericin B. The authors took the gathered data for the whole compound series to build Comparative Molecular Field Analysis (CoMFA) models that have high predictive ability (r<sup>2</sup><sub>pred</sub> = 0.87) (Cramer et al., 1988). The models provided useful insights for future efforts on the optimization of this series. The CoMFA contour maps indicated that adding an electronegative group at the para position and a bulky electropositive substituent at the meta position in ring A would improve biological activity. Additionally, replacing ring B with substituted heterocyclic systems was stressed to be a worthwhile strategy for achieving more potent α-aminophosphonates as novel antileishmanial agents.


In a recent study, Temraz et al. (2018) reported the design of 1,2,3-triazole and thiosemicarbazone hybrids as novel antileishmanial compounds and the calculation of their ADMET profile. Most of the 17 evaluated molecules exhibited biological activity comparable or superior to that of the reference drug miltefosine. The most promising analogs, **18** and **19**, exhibited IC<sub>50</sub> values of 227.4 and 140.3 nM, respectively, on L. major promastigotes (**Figure 7B**). On amastigotes, IC<sub>50</sub> values of 1.4 and 1 µM were obtained for compounds **18** and **19**, respectively. The folate pathway was proposed as the target metabolic route, since folic acid reversed the antiparasitic activity. Toxicity data on VERO cells showed a selectivity profile that was superior to that of miltefosine (SI > 3000). Additionally, compounds **18** and **19** demonstrated no acute toxicity in mice at doses up to 125 mg/kg (oral) and 75 mg/kg (parenteral). Calculation of ADMET parameters demonstrated the drug-likeness of these compounds and their agreement with Lipinski's rule of five. Considering the activity, selectivity, physicochemical and ADMET data, these triazole and thiosemicarbazone hybrids are promising lead compounds to be further investigated.

Tetrahydro-β-carboline derivatives have recently been reported to have antileishmanial activity. In an investigation by Ashok et al. (2016), 16 analogs were designed, and most of them showed promising activity against L. infantum promastigotes (IC<sub>50</sub> from 1.99 to 20.69 µM) and amastigotes (IC<sub>50</sub> from 0.67 to 4.16 µM). Compound **20**, the most potent one (IC<sub>50</sub> = 0.67 µM for amastigotes), showed activity comparable to that of amphotericin B (IC<sub>50</sub> = 0.32 µM) and a selectivity index (SI) greater than 298 for the parasite over mammalian cells (**Figure 8A**). All compounds underwent QSPR studies for physicochemical profiling. Most analogs, including **20**, showed no violation of Lipinski's rule of five, suggesting that they are likely to have good bioavailability. Given the gathered activity, selectivity and physicochemical data, this series consists of appropriate starting points for further investigation. Additional studies would be highly desirable for evaluating the in vivo reduction in parasite burden and hence the potential of this series as novel drug candidates for leishmaniasis.

Steroid derivatives were described as novel antileishmanial agents in a recent report by da Trindade Granato et al. (2018). Out of the 16 synthesized analogs, cholesterol derivative **21** and some deoxycholic acid (DOA) derivatives proved active against Leishmania promastigotes (**Figure 8B**). Most DOAs were active against L. amazonensis intracellular amastigotes and displayed low toxicity toward macrophages. DOA **22** showed the best antiparasitic activity (IC<sub>50</sub> = 15.34 µM) against amastigotes, which led to the investigation of its mechanism of action. Treatment of L. amazonensis with **22** led to the depolarization of the mitochondrial membrane potential and an increased reactive oxygen species (ROS) concentration, resulting in the arrest of the cell cycle. Estimation of ADMET properties revealed the suitability of **22** for oral administration. Additionally, the predictions indicated that this compound would have good blood-brain barrier permeation and would be susceptible to metabolic clearance by CYP3A4 enzymes. Further efforts to improve the in vitro activity of **22** and evaluate its in vivo efficacy would be worthwhile.

### CONCLUSION

A number of drug candidates are undergoing lead optimization studies and advanced in vivo preclinical profiling for leishmaniasis. Some of them could reach the clinical development phase, which has recently been filled by evaluations of different treatment regimens and combinations of previously approved drugs. Despite these advances and outcomes, it is prudent to adopt a conservative mindset given the long path that these compounds will have to take until potential approval and the high attrition rates that characterize pharmaceutical research. In this context, long-lasting efforts will be required to support state-of-the-art research programs that focus on the discovery of novel lead compounds for leishmaniasis. Such programs do exist today and have taken major advantage of the plentiful availability of data on Leishmania, as they move from trial-and-error to rational drug design. Current SBDD and LBDD campaigns have steadily contributed to rationalizing experimental data, thus providing effective insights into the design of optimized compounds. An important advance would be the validation of a higher number of molecular targets. Opportunely, some research centers have put intense efforts into this issue by developing large-scale chemical genomics and target deconvolution expertise. Regardless of the challenges ahead, chemoinformatics has been an important tool to prospect and profile promising compounds. This is corroborated by the findings discussed herein, which illustrate the rewarding integration of computational and experimental strategies in leishmaniasis drug R&D.


#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

We gratefully acknowledge financial support from the São Paulo Research Foundation (FAPESP), grants 2013/07600-3 and 2013/25658-9, and the National Council for Scientific and Technological Development (CNPq), Brazil.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Ferreira and Andricopulo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Scaffold Diversity of Fungal Metabolites

Mariana González-Medina<sup>1</sup>, John R. Owen<sup>2</sup>, Tamam El-Elimat<sup>3</sup>, Cedric J. Pearce<sup>4</sup>, Nicholas H. Oberlies<sup>5</sup>, Mario Figueroa<sup>1</sup> and José L. Medina-Franco<sup>1</sup>\*

<sup>1</sup> Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Mexico, Mexico, <sup>2</sup> High-Performance Computing Research Group, ECIT Institute, Northern Ireland Science Park, Belfast, UK, <sup>3</sup> Department of Medicinal Chemistry and Pharmacognosy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, Jordan, <sup>4</sup> Mycosynthetix, Inc., Hillsborough, NC, USA, <sup>5</sup> Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, NC, USA

Many drug discovery projects rely on commercial compounds to discover active leads. However, current commercial libraries, with mostly synthetic compounds, access a small fraction of the possible chemical diversity. Natural products, in contrast, possess a vast structural diversity and have proven to be an outstanding source of new drugs. Several chemoinformatic analyses of natural products have demonstrated their diversity and structural complexity. However, to our knowledge, the scaffold content and structural diversity of fungal secondary metabolites have never been studied. Herein, the scaffold diversity of 223 fungal metabolites was measured and compared to the diversity of approved drugs and commercial libraries for HTS containing natural, synthetic, and semi-synthetic compounds. In addition, the global diversity of the fungal isolates was assessed and compared to other reference data sets using Consensus Diversity Plots, a chemoinformatic tool recently developed. It was concluded that fungal secondary metabolites are cyclic systems with few ramifications and more diverse than the commercial libraries with natural products and semi-synthetic compounds. The fungal metabolites data set was one of the most structurally diverse, containing a large proportion of different and unique scaffolds not found in the other compound data sets including ChEMBL. Therefore, fungal metabolites offer a rich source of molecules suited for identifying diverse candidates for drug discovery.

Keywords: chemical space, cheminformatics, consensus diversity plots, generative topographic mapping, molecular diversity, natural products, fungal metabolites

#### Edited by:

Adriano D. Andricopulo, University of São Paulo, Brazil

#### Reviewed by:

Kathia Honorio, University of São Paulo, Brazil; Antonio Macchiarulo, University of Perugia, Italy

#### \*Correspondence:

José L. Medina-Franco medinajl@unam.mx; jose.medina.franco@gmail.com

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 14 February 2017; Accepted: 17 March 2017; Published: 03 April 2017

#### Citation:

González-Medina M, Owen JR, El-Elimat T, Pearce CJ, Oberlies NH, Figueroa M and Medina-Franco JL (2017) Scaffold Diversity of Fungal Metabolites. Front. Pharmacol. 8:180. doi: 10.3389/fphar.2017.00180

### INTRODUCTION

With a dramatic increase in commercially available compounds and the accessibility of high throughput screening (HTS), many current drug discovery projects rely on commercial libraries to uncover novel active compounds against different molecular targets (Roy et al., 2010). However, numerous analyses have revealed that libraries with poor diversity undermine HTS productivity, thus reducing the probability of finding active compounds. Many research groups are investing in enhancing their collections by adding compounds with different chemotypes rather than simply increasing the size of their compound libraries (Macarron et al., 2011). Although a highly diverse compound library would be considered the most profitable starting point to find new leads, the term diversity generates constant debate since the optimum composition of a library depends on the research objectives. Nonetheless, it has been shown that a diverse compound library is directly linked to a higher hit discovery rate than a similarly sized combinatorial library with limited structural variation (Harper et al., 2004).

Natural products have a vast diversity and are rich sources of bioactive compounds (Hong, 2011). Several studies have shown that natural products and drugs approved by the United States Food and Drug Administration (FDA) share regions of chemical space and have similar molecular properties (Gu et al., 2013). Moreover, natural products have novel and complex chemotypes (Yongye et al., 2012) and new chemical structures from natural origin are constantly being discovered (Rosen et al., 2009). Therefore, natural products offer an excellent opportunity to enrich chemical libraries (Gu et al., 2013).

Specifically, natural products derived from fungi have been the source of many important approved drugs with diverse mechanisms of action (Pearce, 1997; Pearce et al., 2009). Fungi are widely found in nature and are able to generate novel structures with chemical diversity from simple starting materials including organic acids, sugars, amino acids, terpenes, and bases such as purines and pyrimidines. Gene sequencing has demonstrated there are multiple "silent" biosynthetic pathways, meaning there is genetic information that encodes for the synthesis of new products that have not been studied. Taken together with the vast number of unstudied fungal species in the world (Hawksworth and Rossman, 1997), fungi are a highly promising source for new medicines.

The number of in silico analyses of fungal metabolites is still limited, but the interest in this area is increasing. El-Elimat et al. (2012) studied the chemical space of 105 compounds isolated from filamentous fungi using nine molecular descriptors, and compared them to other natural products and FDA-approved anticancer drugs. In that work it was concluded that fungal metabolites had a high overlap with the chemical space of anticancer drugs, which was an encouraging finding for the ongoing efforts to discover active anticancer compounds of fungal origin (Kinghorn et al., 2016). Gonzalez-Medina et al. (2016) analyzed a larger data set with 207 fungal isolates, adding more information on the structural complexity and diversity of the fungal metabolites. In that work fungal metabolites were demonstrated to be more complex than approved drugs and commercial libraries, and as complex as compounds used in the food industry that are Generally Recognized as Safe (GRAS). Those results suggested that fungal metabolites could be selective and have an appropriate toxicity profile. Furthermore, fungal metabolites had drug-like properties and covered chemical space similar to that of approved drugs as well as unexplored areas. However, the scaffold composition and diversity of fungal metabolites have not been studied in a systematic and quantitative manner.

The goal of this work was to measure the scaffold content and diversity of an in-house library with 223 fungal metabolites. Five data sets were used as reference: non-anticancer drugs approved by the FDA, anticancer drugs approved by the FDA, compounds based on the Flavor and Extract Manufacturers Association of the United States (FEMA), and two commercial libraries containing natural products and semi-synthetic compounds. Additional criteria, including molecular properties and fingerprints, were used to obtain a complete scaffold analysis and to compare data sets of different size containing cyclic and acyclic compounds. The Consensus Diversity Plot (CDP) (González-Medina et al., 2016), a novel chemoinformatic tool developed to analyze the global diversity of compound data sets, was employed to compare the total diversity of fungal metabolites with other reference collections.

## METHODS

#### Data Sets

The chemotype diversity was analyzed for a unique in-house library of 223 fungal metabolites (El-Elimat et al., 2012; Gonzalez-Medina et al., 2016). For reference, five data sets containing between 76 and 2,500 compounds were included in the analysis: compounds based on the FEMA GRAS list (hereafter referred to as GRAS; Burdock et al., 2006; Medina-Franco et al., 2012); FDA-approved drugs obtained from DrugBank, version 4.0 (Wishart et al., 2006; Law et al., 2014), subdivided into anticancer and non-anticancer drugs; and two data sets from a commercial vendor (http://www.acdiscovery.com) containing mostly natural products derived from plants (MEGx) and semi-synthetic compounds (NATx). **Table 1** summarizes all data sets used, including source and number of unique compounds after data curation. Duplicates in each data set were removed using Molecular Operating Environment (MOE), version 2014.0 (MOE, 2016). The complete data set of fungal metabolites is available upon request; the other data sets and the compound information can be downloaded from the supporting information (Data Sheet 2).

#### TABLE 1 | Compound data sets analyzed in this work.

**Abbreviations:** AUC, area under the curve; CDP, Consensus Diversity Plot; CSR curves, cyclic system retrieval curves; FDA, Food and Drug Administration; FEMA, Flavor and Extract Manufacturers Association; GRAS, Generally Recognized as Safe; GTM, Generative Topographic Mapping; HBA, hydrogen bond acceptors; HBD, hydrogen bond donors; HTS, high throughput screening; Log P, the octanol/water partition coefficient; MACCS, Molecular ACCess System; MEQI, Molecular Equivalent Indices; MOE, Molecular Operating Environment; MW, molecular weight; N, number of chemotypes; N<sub>sing</sub>, number of singletons; PCA, Principal Component Analysis; RBF, Radial Basis Function; RTB, number of rotatable bonds; SE, Shannon entropy; SSE, scaled Shannon entropy; TPSA, topological polar surface area.

#### Scaffold Definition and Acyclic Molecules

The term scaffold is now used extensively to describe the core structure of a molecule. Different ways to obtain the scaffold of a molecule have been reviewed elsewhere (Brown and Jacoby, 2006; Yan et al., 2009). In this work the scaffolds were derived with the methodology previously described by Johnson and Xu (Xu and Johnson, 2002). Chemotypes were calculated with the program Molecular Equivalent Indices (MEQI; Xu and Johnson, 2001, 2002) resulting in a code of five characters assigned to each chemotype using a unique naming algorithm (**Figure 1**). For this work, both acyclic and cyclic systems (hereafter referred to as chemotypes) were used to compare the structural diversity.

#### Chemotype Diversity

For each data set, the number of unique chemotypes was recorded, as well as the number of chemotypes containing only one compound (singletons). The fractions of chemotypes and singletons relative to the number of molecules in the data set were analyzed.

Cyclic system retrieval (CSR) curves were computed for each data set to analyze the distribution of chemotypes (Lipkus et al., 2008). To generate the CSR curves, the fraction of chemotypes was plotted on the X-axis and the fraction of compounds that contain those chemotypes was plotted on the Y-axis. Information such as the fraction of chemotypes required to retrieve a certain percentage of the molecules in the database and the area under the curve (AUC) can be obtained from these curves. For this work CSR curves were characterized calculating the AUC and the fraction of chemotypes required to retrieve 50% of the molecules (F50). The F50 metric has been used as a measure of scaffold diversity (Krier et al., 2006; Lipkus et al., 2008; Medina-Franco et al., 2009; Yongye et al., 2012).
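The CSR construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`csr_curve`, `auc`, `f50`) are hypothetical, chemotypes are assumed to be ranked from most to least populated, the AUC is approximated with the trapezoidal rule, and F50 is read at the first curve point that reaches 50% without interpolation.

```python
from collections import Counter


def csr_curve(chemotypes):
    """Return the (x, y) points of a CSR curve.

    chemotypes: one chemotype code per compound (e.g., MEQI codes).
    x is the fraction of chemotypes, y the fraction of compounds retrieved,
    with chemotypes ranked from most to least populated (an assumption).
    """
    counts = sorted(Counter(chemotypes).values(), reverse=True)
    n_chemotypes = len(counts)
    n_compounds = sum(counts)
    xs, ys = [0.0], [0.0]
    retrieved = 0
    for rank, count in enumerate(counts, start=1):
        retrieved += count
        xs.append(rank / n_chemotypes)
        ys.append(retrieved / n_compounds)
    return xs, ys


def auc(xs, ys):
    """Area under the CSR curve by the trapezoidal rule."""
    return sum(
        (xs[k + 1] - xs[k]) * (ys[k + 1] + ys[k]) / 2.0
        for k in range(len(xs) - 1)
    )


def f50(xs, ys):
    """Smallest plotted fraction of chemotypes retrieving >= 50% of compounds."""
    for x, y in zip(xs, ys):
        if y >= 0.5:
            return x
    return 1.0
```

Under these assumptions, a data set in which every compound has its own chemotype gives the diagonal (AUC = 0.5, F50 = 0.5), whereas a data set dominated by a few chemotypes gives a steeper curve with a larger AUC and a smaller F50.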

As previously reported, the concept of Shannon entropy (SE) (Godden and Bajorath, 2007) was used to determine the distribution of compounds in the n most populated chemotypes based on histogram representations (Medina-Franco et al., 2009). The SE of a population of P compounds in n systems is defined as:

$$\text{SE} = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{1}$$

$$p_i = \frac{c_i}{P} \tag{2}$$

where p<sub>i</sub> is the estimated probability, or frequency, of the occurrence of a specific chemotype i in a population of P compounds containing a total of n chemotypes, and c<sub>i</sub> is the number of molecules containing that chemotype. The value of SE ranges from 0, when all the compounds have the same chemotype, to its maximum value, log<sub>2</sub> n, when all the compounds are evenly distributed among the n chemotypes, which represents a highly diverse data set.

To normalize SE for different numbers n of most populated chemotypes, the scaled Shannon entropy (SSE) is defined as:

$$\text{SSE} = \frac{\text{SE}}{\log_2 n} \tag{3}$$

SSE values range from 0, when all the molecules in the data set contain only one chemotype, to 1 indicating high diversity within the n chemotypes. Here, different numbers of n (from 5 to 70) were analyzed.
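Equations (1)–(3) can be illustrated with a short Python sketch. The function names are hypothetical, and two choices are assumptions rather than statements of the authors' procedure: P is taken as the number of compounds in the n most populated chemotypes, and when fewer than n chemotypes exist the actual number found is used in the normalization.

```python
import math
from collections import Counter


def shannon_entropy(chemotypes, n):
    """SE (Equation 1) over the n most populated chemotypes.

    Assumption: P is the number of compounds in those n chemotypes.
    """
    counts = [count for _, count in Counter(chemotypes).most_common(n)]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)


def scaled_shannon_entropy(chemotypes, n):
    """SSE (Equation 3): SE divided by log2 of the number of chemotypes used."""
    used = len(Counter(chemotypes).most_common(n))
    if used < 2:
        # A single chemotype has no diversity; log2(1) = 0 would divide by zero.
        return 0.0
    return shannon_entropy(chemotypes, n) / math.log2(used)
```

For example, eight compounds spread evenly over four chemotypes give SE = 2 and SSE = 1, while a set in which every compound shares one chemotype gives SE = 0 and SSE = 0.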

#### Fingerprints and Molecular Properties

The inter- and intra-molecular properties diversity for each data set was analyzed based on structural fingerprints and molecular properties. Molecular ACCess System (MACCS) keys (166 bits) fingerprints were computed with MayaChem Tools (Sud, 2016) and R Studio scripts (Team, 2015). To compare the data sets, six properties of pharmaceutical relevance were calculated with MOE software: hydrogen bond donors (HBD), hydrogen bond acceptors (HBA), the octanol/water partition coefficient (LogP), molecular weight (MW), topological polar surface area (TPSA), and number of rotatable bonds (RTB). These molecular descriptors have been used previously to measure molecular properties diversity (Gonzalez-Medina et al., 2016).

#### Similarity Coefficients

There are many ways in which the similarities between pairs of molecules can be calculated. Here, we used two well-known measures to compare discrete and continuous variables. The Soergel distance function is the complement of the Tanimoto similarity coefficient (Owen et al., 2011), which is widely used for binary fingerprints.

$$\text{Tanimoto}\left(\mathbf{x}, \mathbf{y}\right) = \frac{\mathbf{x} \cdot \mathbf{y}^{T}}{\mathbf{x} \cdot \mathbf{x}^{T} + \mathbf{y} \cdot \mathbf{y}^{T} - \mathbf{x} \cdot \mathbf{y}^{T}} \tag{4}$$

$$\text{Soergel}\left(\mathbf{x}, \mathbf{y}\right) = 1 - \text{Tanimoto}\left(\mathbf{x}, \mathbf{y}\right) \tag{5}$$
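As a small illustration of Equations (4) and (5), the following Python sketch computes both measures for binary fingerprints stored as lists of 0/1 values. The function names are hypothetical, and returning a similarity of 1 for two all-zero fingerprints is an assumption made only to avoid a division by zero.

```python
def tanimoto(x, y):
    """Tanimoto similarity (Equation 4) between two equal-length bit vectors."""
    xy = sum(a * b for a, b in zip(x, y))  # bits set in both fingerprints
    xx = sum(a * a for a in x)             # bits set in x
    yy = sum(b * b for b in y)             # bits set in y
    denominator = xx + yy - xy
    if denominator == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 1.0
    return xy / denominator


def soergel(x, y):
    """Soergel distance (Equation 5), the complement of Tanimoto similarity."""
    return 1.0 - tanimoto(x, y)
```

For the toy fingerprints `[1, 1, 0, 0]` and `[1, 0, 1, 0]`, one bit is shared out of three set in total, so the Tanimoto similarity is 1/3 and the Soergel distance is 2/3.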

The distance between data sets (d<sub>uv</sub>) was calculated with a Soergel-based inter-data set distance function, previously described by Owen et al. (2011).

$$d_{uv} = \frac{1}{N_u N_v} \sum_{i=1}^{N_u} \sum_{j=1}^{N_v} \text{Soergel}\left(\mathbf{x}_i^{u}, \mathbf{x}_j^{v}\right) \tag{6}$$

where N<sub>u</sub> and N<sub>v</sub> are the numbers of molecules in data sets D<sub>u</sub> and D<sub>v</sub>, and x<sub>i</sub><sup>u</sup> and x<sub>j</sub><sup>v</sup> are the fingerprint vectors of compounds i and j in the fingerprint arrays of data sets D<sub>u</sub> and D<sub>v</sub>, respectively. The diversity of the molecules within a single data set (d<sub>u</sub>) was calculated by rearranging Equation (6):

$$d_{u} = \frac{2}{N_u^{2}} \sum_{i=1}^{N_u - 1} \sum_{j=i+1}^{N_u} \text{Soergel}\left(\mathbf{x}_i^{u}, \mathbf{x}_j^{u}\right) \tag{7}$$
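The inter- and intra-data set distances of Equations (6) and (7) can be sketched as follows. This is an illustrative sketch with hypothetical function names; fingerprints are assumed to be equal-length lists of 0/1 values, and the intra-data set normalization follows Equation (7) as written (a factor of 2/N<sub>u</sub><sup>2</sup>).

```python
from itertools import combinations


def soergel(x, y):
    """Soergel distance between two binary fingerprints (1 - Tanimoto)."""
    xy = sum(a * b for a, b in zip(x, y))
    denominator = sum(x) + sum(y) - xy
    if denominator == 0:
        return 0.0  # assumption: two empty fingerprints are identical
    return 1.0 - xy / denominator


def inter_dataset_distance(set_u, set_v):
    """Equation (6): mean Soergel distance over all cross-set pairs."""
    total = sum(soergel(x, y) for x in set_u for y in set_v)
    return total / (len(set_u) * len(set_v))


def intra_dataset_distance(set_u):
    """Equation (7): sum over unique pairs, scaled by 2 / N_u**2."""
    n = len(set_u)
    total = sum(soergel(x, y) for x, y in combinations(set_u, 2))
    return 2.0 * total / (n * n)
```

Both functions evaluate every pair of molecules, so the cost grows with the product of the data set sizes; for the data set sizes reported here (up to 2,500 compounds) this remains tractable.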

The distance (or dissimilarity) between any two data sets, D<sub>u</sub> and D<sub>v</sub>, based on molecular properties was computed using the Euclidean distance (Perez, 2005; Karthikeyan and Vyas, 2014), Equation (8), as follows. Let X<sub>i</sub> be the N-dimensional vector of molecular properties for molecule i in data set D<sub>u</sub>; similarly, let Y<sub>j</sub> be the N-dimensional vector of molecular properties for molecule j in data set D<sub>v</sub>. (For the analyses in this article, six molecular properties were used, so N = 6.) Let the number of molecules in data sets D<sub>u</sub> and D<sub>v</sub> be U and V, respectively. Then the inter-data set distance between data sets D<sub>u</sub> and D<sub>v</sub> was computed as in Equation (9):

$$\text{Euclidean}\left(X_i, Y_j\right) = \sqrt{\sum_{k=1}^{N} \left( X_{ik} - Y_{jk} \right)^2} \tag{8}$$

$$I_{uv} = \frac{1}{UV} \sum_{i=1}^{U} \sum_{j=1}^{V} \text{Euclidean}\left(X_i, Y_j\right) \tag{9}$$
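Equations (8) and (9) translate directly into a short Python sketch. The function names are hypothetical and the property vectors in the example are toy values. The text does not state whether the six properties were scaled before computing distances; in this sketch the vectors are used as given, so any scaling is assumed to have been applied beforehand.

```python
import math


def euclidean(x, y):
    """Equation (8): Euclidean distance between two property vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def inter_dataset_euclidean(set_u, set_v):
    """Equation (9): mean Euclidean distance over all cross-set pairs."""
    total = sum(euclidean(x, y) for x in set_u for y in set_v)
    return total / (len(set_u) * len(set_v))
```

With the toy two-dimensional vectors `[[0, 0], [3, 4]]` and `[[0, 0]]`, the pairwise distances are 0 and 5, so the inter-data set distance is 2.5.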

#### Global Diversity Analysis with Consensus Diversity Plots (CDPs)

CDPs were designed to compare the diversity of compound data sets by representing four diversity criteria in a two-dimensional plot (González-Medina et al., 2016). Herein, we employed two metrics to quantify structural diversity: the MACCS keys/Soergel-based distance, plotted on the X axis, and the AUC, plotted on the Y axis. The third criterion was the intra-data set distance of the molecular properties, calculated with the Euclidean distance. It is represented in the plot by the color of each data point: data sets in red have the largest Euclidean distances, i.e., are the most diverse; data sets in orange/brown have intermediate diversity values; and data sets in green are the least diverse. The fourth criterion was the size of the data sets, represented by the relative size of each data point; bigger data points correspond to data sets with more compounds. Four regions, in different colors, can be distinguished on the plot. The red region contains the most diverse data sets, i.e., data sets that are diverse whether judged by their scaffold content or by features of the entire molecule compared using fingerprints. The white region contains the least diverse data sets, i.e., those that are the least diverse by both scaffold content and fingerprint-based similarity. The blue region contains data sets with acyclic compounds that are diverse when the entire molecule is compared (i.e., using fingerprints), or data sets with cyclic systems for which side chains contribute significantly to the diversity. The yellow region contains data sets that are diverse by the number of different scaffolds but whose molecules have few ramifications.
To set the four regions on the plots, we chose a threshold for each axis. A value of 0.75 was chosen as the threshold for the y axis, considering that the lowest AUC value a data set can have is 0.5 (if it is highly diverse by scaffolds) and the highest is 1. The threshold for the x axis was the median of the Soergel intra-data set distances obtained from MACCS keys fingerprints for each set; this threshold is therefore specific to the data sets analyzed in this work. As previously discussed, other thresholds can be set up to define the quadrants of the CDPs (González-Medina et al., 2016).
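The quadrant assignment described above can be sketched in a few lines of Python. The function name, the data set labels, and the (Soergel distance, AUC) values below are hypothetical; only the AUC threshold of 0.75 and the use of the median Soergel distance come from the text. How ties at a threshold are handled is an assumption of this sketch.

```python
from statistics import median

AUC_THRESHOLD = 0.75  # y-axis threshold stated in the text


def cdp_quadrant(soergel, auc, soergel_threshold, auc_threshold=AUC_THRESHOLD):
    """Return the CDP region (by color) for one data set."""
    fingerprint_diverse = soergel > soergel_threshold
    scaffold_diverse = auc < auc_threshold  # lower AUC means more scaffold-diverse
    if fingerprint_diverse and scaffold_diverse:
        return "red"      # diverse by fingerprints and by scaffolds
    if fingerprint_diverse:
        return "blue"     # diverse by fingerprints only
    if scaffold_diverse:
        return "yellow"   # diverse by scaffolds only
    return "white"        # least diverse by both criteria


# Hypothetical (Soergel distance, AUC) pairs, for illustration only.
data_sets = {
    "set_a": (0.65, 0.60),
    "set_b": (0.64, 0.90),
    "set_c": (0.55, 0.62),
    "set_d": (0.50, 0.85),
}
x_threshold = median(s for s, _ in data_sets.values())
quadrants = {name: cdp_quadrant(s, a, x_threshold)
             for name, (s, a) in data_sets.items()}
```

Because the x-axis threshold is a median of the plotted sets, adding or removing a data set can move other sets between quadrants.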

#### Visual Representation of the Chemical Space

Two approaches were used to cluster and visualize the molecules in the data sets based on their molecular properties and structural features: Principal Component Analysis (PCA) (Jolliffe, 2002) and Generative Topographic Mapping (GTM) (Osolodkin et al., 2015). PCA is a technique often used to emphasize variation and find patterns in a data set. Its main disadvantage is that it is a linear mapping technique and cannot capture non-linear relationships in the data. GTM is a non-linear method that trains a Radial Basis Function (RBF) neural network to produce a mapping from an n-dimensional data space to a two-dimensional latent space (Owen et al., 2011; Gaspar et al., 2013). For further explanation of each model, the reader is referred to the cited papers (Gaspar et al., 2013, 2015). To represent the chemical space using molecular fingerprints, a fingerprint array was assembled from the MACCS keys fingerprints, each consisting of 166 bits in which every element is either 0 or 1 to indicate the absence or presence, respectively, of a structural element in the corresponding molecular structure. The six molecular properties of pharmaceutical relevance (HBD, HBA, LogP, MW, TPSA, and RTB) were arranged in a similar way and were used as the data set for the models. GTM and PCA were used as dimensionality reduction techniques to encode all the molecular properties and fingerprints into two-dimensional spaces that could be visualized easily. All the models and visualizations were implemented using the Matlab toolbox Netlab (Nabney, 2002).

### RESULTS AND DISCUSSION

The scaffold diversity of the fungal metabolites was compared to that of biologically relevant data sets, such as approved drugs and commercial libraries available for HTS. In this work, the chemotypes were calculated with the program MEQI (Xu and Johnson, 2001, 2002), as described in the Materials and Methods section. Table S1 shows the most frequent chemotypes found in the fungal metabolites data set, along with their chemotype identifiers. Interestingly, this library has several unique scaffolds not found in the reference data sets. To further explore the uniqueness of the scaffolds of the fungal metabolites, we compared the scaffolds from this data set with the scaffolds of all the compounds found in ChEMBL, version 22 (Bento et al., 2014; Davies et al., 2015). Notably, out of the 130 different scaffolds in the fungal metabolites set, 26 were not found in ChEMBL or in any other data set studied in this work. **Figure 2** shows representative scaffolds in the fungal metabolites data set not found in other data sets. Most of these compounds have been shown to be cytotoxic against a variety of human tumor cell lines. For example, the chemotype TBEMM corresponds to the cytotoxic compounds Acremonidin C and Acremonidin A, reported by Ayers et al. (2012). The scaffolds with the chemotypes V7D6X and YVGCT correspond to Palmarumycin CP3 and Palmarumycin CP4, whose cytotoxic activity has not been reported. However, their structural similarity to Palmarumycin CP1 suggests that the compounds in the fungal metabolites data set with these scaffolds could have antibacterial, antifungal, and antitumoral activities (Kornienko et al., 2015). The scaffolds with the codes 8MY2X and ROFC5 belong to new secondary metabolites isolated from Eupenicillium brefeldianum and Aspergillus fumigatus, respectively, and their biological activity has not been reported.
**Figure 2** exemplifies the considerable structural variation among substances that have been isolated and characterized from filamentous fungi.

#### Counts

**Table 2** summarizes the number of chemotypes (N) in each database, the fraction of chemotypes relative to the number of molecules in each data set (N/M), and the number and fraction of singletons (Nsing). Based on N/M values, the set of fungal metabolites, containing 223 compounds, has an intermediate chemotype diversity (N/M = 0.587), similar to the proportion of chemotypes in the non-anticancer drugs library, containing 1,399 compounds (N/M = 0.572). The set of anticancer drugs has fewer compounds but the largest proportion of chemotypes relative to the number of molecules (N/M = 0.921) and the largest proportion of singletons relative to the number of molecules (Nsing/M = 0.855). In contrast, the GRAS, NATx, and MEGx data sets, which have more compounds (**Table 1**), have the lowest scaffold diversity, with a smaller proportion of chemotypes and singletons.
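For readers who wish to reproduce such counts, the quantities N, N/M, Nsing, and Nsing/M can be derived from a list with one chemotype identifier per molecule. The sketch below is illustrative only: the function name and the identifiers in the example are hypothetical, and the chemotype assignment itself (done with MEQI in this work) is assumed to be available already.

```python
from collections import Counter


def chemotype_counts(chemotype_ids):
    """Summarize scaffold counts from one chemotype identifier per molecule."""
    counts = Counter(chemotype_ids)
    m = len(chemotype_ids)                               # M: number of molecules
    n = len(counts)                                      # N: number of chemotypes
    n_sing = sum(1 for c in counts.values() if c == 1)   # singletons
    return {"N": n, "M": m, "N/M": n / m,
            "Nsing": n_sing, "Nsing/M": n_sing / m}


# Hypothetical identifiers, for illustration only.
summary = chemotype_counts(["AAAAA", "AAAAA", "BBBBB", "CCCCC", "DDDDD"])
```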

#### CSR Curves

CSR curves represent the fraction of compounds in the data set (y-axis) contained in a fraction of chemotypes (x-axis). A data set with maximum diversity would contain a different chemotype for each molecule in the library and the CSR curve would be a diagonal with an AUC of 0.5. **Figure 3** shows the CSR curves calculated using the chemotypes of all the data sets analyzed in this study.
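A CSR curve and its two summary metrics can be sketched as follows. The function names are hypothetical, the AUC is obtained with the trapezoidal rule, and F50 is taken here as the smallest fraction of chemotypes (ranked from most to least populated) whose cumulative content reaches 50% of the compounds; an interpolated F50 would be an equally reasonable choice.

```python
from collections import Counter


def csr_curve(chemotype_ids):
    """Return (x, y) points: fraction of chemotypes vs. fraction of compounds."""
    counts = sorted(Counter(chemotype_ids).values(), reverse=True)
    n, m = len(counts), len(chemotype_ids)
    xs, ys, cumulative = [0.0], [0.0], 0
    for rank, count in enumerate(counts, start=1):
        cumulative += count
        xs.append(rank / n)
        ys.append(cumulative / m)
    return xs, ys


def csr_auc(xs, ys):
    """Area under the CSR curve (trapezoidal rule)."""
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
               for i in range(1, len(xs)))


def f50(xs, ys):
    """Smallest fraction of chemotypes containing at least 50% of compounds."""
    return next(x for x, y in zip(xs, ys) if y >= 0.5)


# Maximum diversity: one chemotype per molecule gives the diagonal.
diag_x, diag_y = csr_curve(["A", "B", "C", "D"])
# Skewed set: one chemotype holds 8 of 10 molecules (hypothetical).
skew_x, skew_y = csr_curve(["A"] * 8 + ["B", "C"])
```

In the first example the curve is the diagonal and the AUC is 0.5; in the skewed example the curve rises steeply at the start, so the AUC is larger and F50 is smaller.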

The CSR curve for the fungal metabolites indicates that this data set contains a greater diversity of scaffolds than MEGx, NATx, GRAS, and the non-anticancer drugs. All these data sets contain at least six times more compounds than the set of fungal metabolites (**Table 1**). The CSR curve for the anticancer drugs is closer to a diagonal, indicating large diversity, while the curve for GRAS undergoes a sudden increase in slope, indicating that this data set has the lowest chemotype diversity. The AUC and the fraction of chemotypes that contains 50% of the molecules in the data set (F<sub>50</sub>) were used to compare the curves for each set quantitatively (**Table 2**). An AUC value closer to one indicates low chemotype diversity, and higher F<sub>50</sub> values indicate higher diversity. Based on these metrics, the fungal metabolites (AUC = 0.644, F<sub>50</sub> = 0.244) are more diverse than MEGx and NATx, commercial data sets with 2,500 natural products and semi-synthetic compounds, and than the approved non-anticancer drugs. As expected, the anticancer drugs showed the lowest AUC and the largest F<sub>50</sub> values (0.537 and 0.457, respectively). In agreement with other metrics of scaffold diversity (i.e., N/M), the GRAS and MEGx libraries had the highest AUC and lowest F<sub>50</sub> values, respectively, indicating low diversity.



N, number of chemotypes; M, number of molecules; Nsing, number of singletons; AUC, area under the curve; F50, fraction of chemotypes that contains 50% of the data set.

### Scaled Shannon Entropy (SSE)

SSE was computed to assess how compounds are distributed among the most populated chemotypes. For this approach, an SSE value closer to 1 indicates that compounds are evenly distributed among the different chemotypes, and a low SSE value (i.e., closer to 0) means that all the compounds share the same chemotype. SSE will have its maximum value only when all chemotypes contain the same number of compounds, or when each chemotype contains only one compound. **Table 3** summarizes the SSE for the first 70 most populated chemotypes in each library. The chemotype diversity of the fungal metabolites is higher (SSE values ranging from 0.942 to 0.967) compared to the non-anticancer drugs and the commercial libraries NATx and MEGx, which represent larger data sets containing natural products. Compounds in the library of anticancer drugs are more evenly distributed among the chemotypes studied (SSE values higher than 0.960). The least diverse set is GRAS (SSE values ranging from 0.502 to 0.617). Of note, the most diverse data sets, the fungal metabolites and the anticancer drugs, are also the smallest data sets, containing only 223 and 76 compounds, respectively (**Table 1**). Overall, the SSE values vary for the rest of the libraries, indicating that scaffold diversity decreases in this order: anticancer drugs, fungal metabolites, NATx, MEGx, non-anticancer drugs, and GRAS. Interestingly, if only the most populated chemotypes in NATx and MEGx are analyzed, these sets are more diverse than the non-anticancer drugs.

FIGURE 3 | Cyclic system retrieval (CSR) curves for the data sets studied in this work. The curve for the anticancer drugs indicates large chemotype diversity. In contrast, the curves for GRAS, MEGx, and NATx suggest less diversity. The curves can be characterized quantitatively by the area under the curve (AUC) and the fraction of chemotypes required to retrieve 50% of the compounds in the data sets, F<sub>50</sub> (see Table 2).

**Figure 4** shows the distribution and SSE values of compounds in the top 70 most populated chemotypes of representative data sets. Data sets with higher SSE are colored dark red and data sets with lower SSE are light red. The compounds of the fungal metabolites (**Figure 4B**) are more evenly distributed after the top 10 most populated chemotypes, and this is the second most diverse data set. **Figure 4A** shows that the anticancer drugs reach their maximum SSE value when all the chemotypes are considered, indicating that there is almost one different chemotype for each molecule in this data set. MEGx (**Figure 4C**) has SSE values between 0.856 and 0.883; for this library, the most populated chemotype contains 195 compounds and the scaffolds are more evenly distributed after the first 20 most populated chemotypes. This is also the case with GRAS (**Figure 4D**), the least diverse set as measured with SSE, for which the most populated chemotype contains 1,055 compounds, nearly half of the data set. The distribution of the compounds in each chemotype and the SSE70 value for the other data sets are shown in Figure S1.
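The SSE values discussed above can be illustrated with a short Python sketch. It assumes the usual definition, in which the Shannon entropy of the compound distribution over the n most populated chemotypes is divided by its maximum possible value, log2(n). The function name, the `top_n` parameter, and the example identifiers are hypothetical, and returning 0 when only one chemotype is present is a convention adopted here for illustration.

```python
import math
from collections import Counter


def scaled_shannon_entropy(chemotype_ids, top_n=None):
    """SSE over the top_n most populated chemotypes (all chemotypes if None)."""
    counts = sorted(Counter(chemotype_ids).values(), reverse=True)
    if top_n is not None:
        counts = counts[:top_n]
    n = len(counts)
    if n < 2:
        return 0.0  # single chemotype: no spread (convention assumed here)
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts)
    return entropy / math.log2(n)  # scale by the maximum entropy, log2(n)


# Evenly populated chemotypes give the maximum value (hypothetical data).
sse_even = scaled_shannon_entropy(["A", "B", "C", "D"])
# One dominant chemotype gives a low value (hypothetical data).
sse_skewed = scaled_shannon_entropy(["A"] * 9 + ["B"])
```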

### Inter- and Intra-Library Similarities Using MACCS Keys and Molecular Properties

As stated in the Methods, the inter- and intra- library similarity was computed using MACCS keys/Soergel-based distance and molecular properties/Euclidean distance. **Figures 5A,B** show the corresponding distance matrices computed with MACCS keys and molecular properties, respectively. Values along the diagonal in red represent the intra-library diversity, i.e., the diversity within the compounds contained in a data set: the least diverse libraries are in light red while the most diverse libraries are in dark red. The values in blue represent the inter-library diversity, i.e., the diversity between the compounds in all the data sets: the least diverse libraries are in light blue while the most diverse libraries are in dark blue.
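For binary fingerprints such as MACCS keys, the Soergel distance is the complement of the Tanimoto similarity. The sketch below represents each fingerprint as the set of its "on" bit positions and averages the distance over all distinct pairs in a library. The function names and the toy bit sets are hypothetical; generating real MACCS keys would require a cheminformatics toolkit, which is outside the scope of this example.

```python
from itertools import combinations


def soergel_distance(bits_a, bits_b):
    """Soergel distance between two binary fingerprints given as on-bit sets."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treated as identical (assumption)
    return 1.0 - len(a & b) / len(union)


def mean_intra_library_distance(fingerprints):
    """Mean Soergel distance over all distinct pairs in one library."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("at least two fingerprints are required")
    return sum(soergel_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical on-bit sets, for illustration only.
library = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3}]
intra = mean_intra_library_distance(library)
```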

#### MACCS Keys—Structural Features

In **Figure 5A** the inter-library similarity matrix, in blue, shows that the fungal metabolites are structurally different to approved drugs, with a distance of 0.62 to the anticancer drugs and a distance of 0.63 to the non-anticancer drugs. Of note, the fungal metabolites and MEGx have similar structural features, but both libraries are structurally different to the semi-synthetic compounds in NATx. NATx is the data set most similar to approved drugs. This suggests that semi-synthetic compounds have been modified to be structurally similar to approved drugs, decreasing their structural similarity to natural products.

In **Figure 5A** the intra-library similarity in the red diagonal shows that GRAS and the non-anticancer drugs are the most diverse data sets using MACCS keys (with intra-set distances of 0.61 and 0.63, respectively). In contrast, GRAS is the set with the lowest scaffold diversity. The reason for this is that 65% of the GRAS molecules are classified into two chemotypes, namely, non-cyclic structure (49%; 00000) and benzene ring (16.3%; RYLFV). Nonetheless, having the same chemotype does not imply that molecules present the same chemical features, especially with very common or simple chemotypes as in this case. This is a good example of why diversity analysis should be conducted using multiple metrics (Singh et al., 2009; González-Medina et al., 2016).

#### Molecular Properties

According to the molecular property distances, the intra-library properties of the fungal metabolites (red diagonal in **Figure 5B**) are more diverse than those of the non-anticancer drugs, GRAS, and NATx, with a Euclidean distance of 2.73. Compared with the lowest intra- and inter-library distances obtained for other data sets, e.g., the GRAS intra-library distance of 1.00 or the inter-library distance of 1.90 between NATx and the non-anticancer drugs, the fungal metabolites have diverse molecular properties. Of note, the inter-library results, in a blue scale, show that the fungal metabolites have the largest dissimilarity with GRAS, which has been previously demonstrated to contain smaller compounds with lower HBD, HBA, MW, and TPSA values than the fungal metabolites (González-Medina et al., 2016). Table S2 contains the statistics of each property for all the data sets. **Figure 5B** also shows that NATx has the lowest inter-data set distance (i.e., is the most similar) to the rest of the data sets studied and GRAS is the least similar (i.e., the most distant) to the other libraries. Interestingly, approved anticancer drugs and GRAS present the largest distances to the other data sets, with added distances of 28.72 and 27.37, respectively. As previously discussed (González-Medina et al., 2016), compounds in the data set containing approved anticancer drugs show the largest distance (dissimilarity) to the non-anticancer drugs.

FIGURE 5 | Intra and inter-library similarity. The diagonal in red depicts intra-library comparisons, i.e., the similarity between the compounds in a data set. Dark red scores indicate large distance or low similarity, while light red colors indicate small distance or high similarity. The matrix in blue depicts inter-library similarity comparisons, i.e., the similarity between the compounds in the data sets. Dark blue scores indicate large distance or low similarity, while white or light blue colors indicate small distance or high similarity. (A) Soergel distance using MACCS keys (166-bit) fingerprints. (B) Euclidean distance function using molecular properties.

### Global Diversity Analysis with Consensus Diversity Plots (CDPs)

**Figure 6** shows a CDP, which compares the global structural diversity of all data sets by plotting the MACCS keys/Soergel-based distance on the x axis and the AUC on the y axis. The size of the data points represents the relative size of each data set (**Table 1**) and the color of each data point represents the molecular properties diversity (**Figure 5B**). Remarkably, the fungal metabolites, a data set with 223 compounds, were more scaffold-diverse than data sets with 2,500 compounds, such as MEGx and NATx; the fungal metabolite data set is, on average, more structurally diverse than MEGx and more diverse than NATx when considering molecular properties. The fungal metabolites and the anticancer drugs are located in the yellow quadrant, indicating high scaffold diversity but low structural (fingerprint-based) diversity. Furthermore, the data point in red, representing the fungal metabolites, indicates that this data set has diverse molecular properties. Overall, the non-anticancer drugs, in the red quadrant, are the most structurally diverse (with a Soergel-based distance of 0.63 and an AUC of 0.699). However, the non-anticancer drugs, in orange/brown, are less diverse by molecular properties than the fungal isolates. GRAS, in the blue quadrant, is the most diverse library when structural features are taken into account, but the compounds in this data set have low molecular properties diversity. Compared to the other data sets, MEGx, in the white quadrant, is the least structurally diverse. The molecular properties diversity is independent of the structural diversity and of the size of the libraries; that is, small data sets can be both structurally diverse and diverse by their molecular properties, or structurally diverse but with low molecular properties diversity.

### Visual Representation of the Chemical Space

**Figure 7** depicts the visual representation of the six data sets generated with GTM using the structural features encoded by MACCS keys. The fungal metabolites occupy areas of the structural space similar to those of MEGx, which is in agreement with the results observed in **Figure 5A**. The clusters of compounds in the structural space of the fungal metabolites lie in different areas from most of the approved drugs and, particularly, from the approved anticancer drugs. This is also in line with the results in **Figure 5A** and suggests that structural features found in the fungal metabolites are absent from the approved drugs. Interestingly, the semi-synthetic compounds (NATx) occupy different areas of the structural space from the natural products in the fungal metabolites and MEGx. Approved non-anticancer drugs and MEGx are the most dispersed, whereas GRAS seems to be more clustered in a high-density region that contains some compounds from MEGx.

considered and/or side chains contribute significantly to the diversity; yellow, the scaffolds of the molecules are the main factor contributing to the diversity and/or this set contains mostly rings with few side chains. Data points are colored by the diversity of the physicochemical properties of the data set as measured by the Euclidean distance of six properties of pharmaceutical relevance. The distance is represented with a continuous color scale from red (more diverse), to orange/brown (intermediate diversity) to green (less diverse). The relative size of the data set is represented with the size of the data point: smaller data points indicate compound data sets with fewer molecules. A value of 0.75 for AUC and the median value of the MACCS keys fingerprints/Soergel-based distance were used to set the quadrants.

Figure S2 depicts the visual representation of the chemical space generated with GTM using physicochemical properties. The fungal metabolites form small clusters and occupy areas of the physicochemical space similar to those of MEGx, NATx, and the non-anticancer drugs, with a few exceptions found on the bottom left of the fungal metabolites plot, but occupy different areas from the anticancer drugs. NATx and GRAS are less widely distributed in the chemical space. This result is in agreement with **Figure 5B**.

The visualization generated with PCA using MACCS keys fingerprints (Figure S3) produced clusters of molecules that were easier to interpret. The results obtained with this representation were in line with the results obtained with GTM. Based on the structural features encoded by MACCS keys, some fungal metabolites are in the same region as the approved anticancer and non-anticancer drugs. However, most of the molecules in the data sets containing natural products, MEGx and the fungal metabolites, are clustered together in a region separated from the other data sets. Figure S4 depicts the visualization of the six molecular properties (described in the Materials and Methods section) using PCA: the fungal metabolites are in similar regions as the non-anticancer drugs, with a few compounds dispersed similarly to MEGx. The anticancer drugs are the most spread out (more diverse), while GRAS is confined to specific areas of the chemical space. These results are also in agreement with results derived from **Figure 5B**.

### CONCLUSIONS

Using computational approaches, this work reports the structural diversity and scaffold content of a set of 223 fungal metabolites isolated and characterized in discovery projects funded by the US National Cancer Institute and the Mexican National Research Council of Science and Technology. Generally speaking, most of these compounds were isolated while pursuing new anticancer drug leads. The structural diversity of the fungal metabolites was quantified using three complementary approaches: Cyclic Systems Retrieval curves, Shannon entropy, and molecular fingerprints. The data set of fungal metabolites was compared to data sets that represent synthetic, semi-synthetic, and natural products commercially available for HTS, and to approved drugs. It was concluded that most of the chemical structures of the fungal metabolites are cyclic compounds with few side chains. The diversity analysis showed that the set of fungal secondary metabolites studied herein is more diverse than commercial libraries of natural products and semi-synthetic compounds, despite the fact that the reference collections are expected to be diverse and contain more compounds. Moreover, the fungal data set was developed mostly by pursuing leads that were cytotoxic to cancer cell lines; if the diversity of the targets were expanded, the resultant chemical diversity might expand as well. Furthermore, the fungal metabolites have a large proportion of different and unique scaffolds not found in the other reference sets, including ChEMBL. Additionally, visualizations of the chemical space, based both on molecular fingerprints and molecular properties, revealed that the fungal metabolites cover different areas of chemical space when compared to approved drugs, offering the possibility to expand the medicinally relevant chemical space. For example, this diverse data set could be used for HTS to find new hits with new scaffolds and diverse properties. The high and unique scaffold diversity of fungal metabolites revealed in this work, in addition to the high structural complexity and balanced molecular properties revealed in previous studies (Greve et al., 2010; El-Elimat et al., 2012; Cragg and Newman, 2013; González-Medina et al., 2016), further supports fungal metabolites as a promising source of novel compounds for drug discovery.

### AUTHOR CONTRIBUTIONS

MG and JO performed the calculations. MG and JM designed the study. TE, CP, NO, and MF participated in interpreting the calculations. All authors participated in analyzing the data and writing the manuscript.

#### ACKNOWLEDGMENTS

We thank the Universidad Nacional Autónoma de México (UNAM) for grant PAPIME PE200116 and the Consejo Nacional de Ciencia y Tecnología (CONACyT) for grant 236564. The isolation of fungal metabolites from the Mycosynthetix library via researchers at the University of North Carolina at Greensboro was funded in part by grant P01 CA125066 from the National Institutes of Health/National Cancer Institute. We thank Fernando Prieto-Martinez and Oscar Méndez-Lucio for helpful discussions.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fphar.2017.00180/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 González-Medina, Owen, El-Elimat, Pearce, Oberlies, Figueroa and Medina-Franco. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Using Machine Learning to Predict Synergistic Antimalarial Compound Combinations With Novel Structures

Daniel J. Mason<sup>1,2</sup>, Richard T. Eastman<sup>3</sup>, Richard P. I. Lewis<sup>1</sup>, Ian P. Stott<sup>4</sup>, Rajarshi Guha<sup>3</sup>\*† and Andreas Bender<sup>1</sup>\*

<sup>1</sup> Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom, <sup>2</sup> Healx Ltd., Cambridge, United Kingdom, <sup>3</sup> Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States, <sup>4</sup> Unilever Research and Development, Wirral, United Kingdom

### Edited by:

Leonardo L. G. Ferreira, Universidade de São Paulo, Brazil

#### Reviewed by:

Alfonso T. García-Sosa, University of Tartu, Estonia; Andreas Dominik, Technische Hochschule Mittelhessen, Germany; Deepak Singla, National Institute of Malaria Research (ICMR), India

\*Correspondence:

Rajarshi Guha, rajarshi.guha@gmail.com; Andreas Bender, ab454@cam.ac.uk

#### †Present Address:

Rajarshi Guha, Vertex Pharmaceuticals, Boston, MA, United States

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 02 May 2018; Accepted: 07 September 2018; Published: 02 October 2018

#### Citation:

Mason DJ, Eastman RT, Lewis RPI, Stott IP, Guha R and Bender A (2018) Using Machine Learning to Predict Synergistic Antimalarial Compound Combinations With Novel Structures. Front. Pharmacol. 9:1096. doi: 10.3389/fphar.2018.01096

The parasite Plasmodium falciparum is the most lethal Plasmodium species causing serious malaria infection in humans. With resistance developing rapidly, novel treatment modalities are being sought, one of which is the combination of existing compounds. The discovery of combinations of antimalarial drugs that act synergistically with one another is hence of great importance; however, an exhaustive pairwise experimental screen of a large drug space is not an option. In this study we apply our machine learning approach, Combination Synergy Estimation (CoSynE), which can predict novel synergistic drug interactions using only prior experimental combination screening data and knowledge of compound molecular structures, to a dataset of 1,540 antimalarial drug combinations of which 22.2% were synergistic. Cross validation of our model showed that synergistic CoSynE predictions are enriched 2.74× compared to random selection when both compounds in a predicted combination are known from other combinations among the training data, 2.36× when only one compound is known from the training data, and 1.5× for entirely novel combinations. We prospectively validated our model by making predictions for 185 combinations of 23 entirely novel compounds. CoSynE predicted 20 combinations to be synergistic, which was experimentally validated for nine of them (45%), corresponding to an enrichment of 1.70× compared to random selection from this prospective data set. Such enrichment corresponds to a 41% reduction in experimental effort. Interestingly, we found that pairwise screening of the compounds CoSynE individually predicted to be synergistic would result in an enrichment of 1.36× compared to random selection, indicating that synergy among compound combinations is not a random event. The nine novel and correctly predicted synergistic compound combinations mainly (where sufficient bioactivity information is available) consist of efflux or transporter inhibitors (such as hydroxyzine), combined with compounds exhibiting antimalarial activity alone (such as sorafenib, apicidin, or dihydroergotamine). However, not all compound synergies could be rationalized easily in this way. Overall, this study highlights the potential for predictive modeling to expedite the discovery of novel drug combinations in the fight against antimalarial resistance, while the underlying approach is also generally applicable.

Keywords: synergy, combinations, malaria, plasmodium falciparum, artificial intelligence, modeling


### INTRODUCTION

Malaria is a deadly and worldwide disease, with an estimated 445,000 deaths globally in 2016, of which 91% are estimated to have occurred in Africa (World Health Organisation, 2017). Despite global mortality rates declining by 62% between 2000 and 2015, this disease remains a major killer for children under 5 years, with a young life being taken every 2 min (World Health Organisation, 2017).

When exposed to antimalarial compounds, the malaria-causing parasite Plasmodium falciparum can, over time, develop resistance to different therapies via a number of distinct mechanisms (Mita and Tanabe, 2012). This tendency has rendered many antimalarial therapies ineffective in the past, and continues to threaten the current standards of care. In order to combat resistance, options include the design or discovery of new antimalarial compound classes or analogs that offer increased efficacy over those with prior use. However, at present, and in the absence of these novel discoveries, the current World Health Organization (WHO) guidelines state that combinations of at least two effective antimalarial medicines with different modes of action need to be administered in order to help protect against resistance (World Health Organisation, 2015). The standard of care currently listed by WHO includes artemisinin-based combination therapies (ACT), such as artemether with lumefantrine, artesunate with amodiaquine, and dihydroartemisinin with piperaquine (**Figure 1**). Resistance to artemisinins has arisen more recently in South East Asia (World Health Organisation, 2017), raising concern about the future effectiveness of ACTs, since resistance to the ACT partner drug significantly decreases the clinical efficacy of the combination therapy (Bacon et al., 2007). Alarmingly, this concern has recently been confirmed in Cambodia, in the form of resistance to the first-line treatment dihydroartemisinin-piperaquine by P. falciparum strain PfPailin (Imwong et al., 2017). The evolution and spread of multidrug-resistant organisms mean that selecting novel drug combinations is viable only as a medium-term option, and there is a continued effort by the World Wide Antimalarial Resistance Network to map ACT partner drugs (World Wide Antimalarial Resistance Network, 2014).

The combined properties resulting from a mixture of drugs are not always equivalent to the sum of their parts. Drug combinations are well known to result in an increase or decrease in measured therapeutic efficacy (synergy or antagonism, respectively), to result in no difference in effectiveness (additivity), or to present an increase or decrease in the number of side effects experienced (drug-drug interactions, which would then also possibly represent synergy, albeit of undesired effects; Lehár et al., 2009; Tatonetti et al., 2012). In the case of malaria (and probably many other diseases one wants to treat), the desired effect is usually synergy, i.e., a drug combination for which the antimalarial effect is greater than that observed for each compound alone, and greater than what would be expected by assuming solely additivity of compound effects (Sucher, 2014). In this case, lower doses of each individual compound would be required, thereby potentially achieving the desired efficacy with, in many cases, reduced side effects (Csermely et al., 2005).

Antimalarial drug combinations can be either novel, or represent the repurposing of drugs used previously for other purposes, such as the use of tricyclic antidepressants in chloroquine-resistant strains of P. falciparum (Bitonti et al., 1988). High-throughput screening for antimalarial compound combinations is one way to accelerate the discovery of novel combinations (Mott et al., 2015). However, the discovery of synergistic combinations is experimentally challenging: as the number of compounds increases, the number of potential combinations grows very quickly, in particular when considering multiple replicates, the requirement of screening concentration matrices, and possibly different strains of the pathogen. For example, 100 compounds screened pairwise result in 4,950 compound combinations, and testing for synergy in a 6 × 6 dose-response matrix altogether requires 178,200 data points (with numbers increasing further when taking into account replicates, different strains, etc.; Cokol et al., 2014). Increasing the search space by the addition of just 25 more compounds would require over 100,000 further data points, due to combinatorial explosion.
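The arithmetic behind these figures can be checked with a short calculation: the number of unordered pairs is n(n − 1)/2, and each pair is measured at 6 × 6 = 36 concentration points. The function name below is hypothetical, and replicates and multiple strains are deliberately ignored.

```python
from math import comb


def screening_points(n_compounds, matrix_side=6):
    """Return (number of pairwise combinations, total dose-response points)."""
    pairs = comb(n_compounds, 2)              # n * (n - 1) / 2 unordered pairs
    return pairs, pairs * matrix_side ** 2    # one full matrix per pair


pairs_100, points_100 = screening_points(100)   # 4,950 pairs, 178,200 points
pairs_125, points_125 = screening_points(125)   # 7,750 pairs, 279,000 points
extra_points = points_125 - points_100          # 100,800 further points
```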

Computational approaches have previously been investigated as a means to predict the synergistic interaction of compounds, with methods that utilize networks of pathways and simulation (Lehár et al., 2007; Nelander et al., 2008; Miller et al., 2013; Huang et al., 2014; Patel et al., 2014; Zhang et al., 2014), relationships between physicochemical properties (Yilancioglu et al., 2014), chemogenomics approaches (Bansal et al., 2014; Wildenhain et al., 2015; KalantarMotamedi et al., 2018), and single agent efficacies (Gayvert et al., 2017) and/or combinations (Menden et al., 2018) measured across multiple cell lines (for recent reviews of compound combination modeling and perspectives, see Bulusu et al., 2016; Weinstein et al., 2017; Tsigelny, 2018). A disadvantage of many of these approaches is that they often require experimental knowledge of underlying biological interactions between drugs and disease, or chemogenomic or phenotypic readouts (Jansen et al., 2009; Bansal et al., 2014; Wildenhain et al., 2015; Menden et al., 2018). Such data may be difficult to obtain, non-existent, or too expensive to collect in sufficient quantity to build a predictive model. In addition, the prediction of novel combinations will rely on the same experimental descriptors being available for each new compound.

In order to address these problems, we have developed CoSynE (Combination Synergy Estimation; Mason et al., 2017). CoSynE constructs predictive models from existing combination screening data, and utilizes only the known structures of compounds that have been part of these screens. As such, CoSynE requires only two pieces of information, namely a list of compounds together with their structural representations, and a list of compound combinations together with a label whether the action of each combination was found to be synergistic, antagonistic, or additive (depending on the criteria for those categories one finds appropriate in a particular case). The compounds are transformed into two classes of representation by CoSynE: Firstly, a compound structure fingerprint (SFP; a 2048-bit Morgan Fingerprint), and secondly a predicted target fingerprint representing bioactivity spectra [TFP; 1,080 predicted protein target binding probabilities above a training cut-off, using PIDGIN (Mervin et al., 2015)]. This hence yields three classes of models: SFP, TFP, and STFP (a concatenation of the SFP and TFP fingerprints). These fingerprints are used as input to machine learning models that make inferences between a particular representation and the experimentally observed synergy. A number of models are optimized for the prediction of synergistic combinations, and the best-performing final model is selected following a rigorous cross-validation procedure, where either both compounds are known to the model, one compound is unknown, or both are unknown, such that the ability of CoSynE to extrapolate to novel chemical space may be inferred (**Figure 2**).
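The three validation scenarios described above (both compounds known, one unknown, or both unknown) can be illustrated by partitioning labelled combinations according to how many of their compounds are withheld from training. This is a minimal sketch and not the CoSynE implementation; the function name `split_by_novelty`, the compound identifiers, and the labels are hypothetical.

```python
def split_by_novelty(combinations, held_out):
    """Partition (compound_a, compound_b, label) records by the number of held-out compounds."""
    held_out = set(held_out)
    train, one_unknown, both_unknown = [], [], []
    for a, b, label in combinations:
        n_unknown = (a in held_out) + (b in held_out)
        if n_unknown == 0:
            train.append((a, b, label))         # both compounds available for training
        elif n_unknown == 1:
            one_unknown.append((a, b, label))   # one compound never seen by the model
        else:
            both_unknown.append((a, b, label))  # entirely novel pair
    return train, one_unknown, both_unknown


# Hypothetical toy screen of four compounds, with C and D withheld.
toy_screen = [
    ("A", "B", "synergistic"),
    ("A", "C", "additive"),
    ("B", "D", "antagonistic"),
    ("C", "D", "synergistic"),
]
train, one_unknown, both_unknown = split_by_novelty(toy_screen, {"C", "D"})
print(len(train), len(one_unknown), len(both_unknown))
```

Evaluating on the `both_unknown` partition is the closest analog of predicting in novel chemical space, since no test compound contributes any training pair.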

We have previously applied CoSynE to the prediction of novel antibiotic combinations effective against E. coli (Mason et al., 2017). In this initial study, CoSynE was trained upon 156 pairs of 18 compounds using the SFP representation of combinations (since in preliminary studies other types of descriptors were found to lead to inferior performance), which was then used to pre-screen a set of 123 combinations, comprising compounds that were known and/or unknown to the model. After prospective validation, 10 novel synergistic combinations were confirmed from a list of 12 that were highlighted by CoSynE. The results from our previous study correspond to a 2.8-fold enrichment in the discovery of synergistic combinations vs. that expected by random selection from the same set of compounds.

In the present study, we started with a much larger training dataset consisting of 1,540 combinations of 56 compounds tested against P. falciparum (Mott et al., 2015). Next, CoSynE was used to pre-screen a library of 23 compounds unknown to the model (see Methods section for compound selection process) by predicting which combinations of those compounds are likely to exhibit novel antimalarial synergy.

These predictions were prospectively validated by carrying out a full pairwise experimental screen of all 23 compounds the model could have chosen from (in order to also provide a negative control, i.e., testing of compound combinations not predicted to be synergistic by the model). This validation represents making predictions in entirely novel compound space, where neither compound has been seen by the model before, which is a very tough challenge compared to our previous study (and many other studies), which mostly included compounds that were previously known to the model. Nevertheless, prospective validation in the present study showed CoSynE predictions to be enriched with 1.70 times more synergistic combinations than expected by random selection (over an already rather high baseline synergy level, see details below), and hence predictions in novel chemical space are also enriched over random.

### RESULTS AND DISCUSSION

### Similarity of Training and Validation Sets

Clustered hierarchical similarities are shown for whole and scaffold structures in **Supplementary Figure 1**. In general, there is little structural similarity between compounds in the training data and those in the prospectively tested data. Compounds which formed the top five most synergistic combinations in both the training and validation datasets are shown in dimensionally reduced chemical space in **Figure 3**. The lack of clear clustering among the top synergistic compound structures in either dataset demonstrates the difficulty of selecting compounds to screen via structural similarity alone. In addition to the observation that synergy is more commonly observed for drugs targeting the same processes (Brochado et al., 2018), the relationship between compound structure-related properties and synergistic interaction has been shown previously [such as lipophilicity and synergy in the case of anti-fungals (Yilancioglu et al., 2014)]. Overall, the inference of complex relationships such as these, on a scale that may quickly explode to intractable proportions, is a task highly applicable to machine learning.

### Dataset Composition and Model Performance During Cross-Validation on Training Set

The number of high quality (HQ) training combinations per dataset (see Methods section for definition) and synergy type is shown in **Table 1**. The Dd2 dataset contains the greatest number of HQ combinations (1,245), followed by 3D7 (1,194), and then HB3 (1,159). This was reflected in the results of the 5-fold leave-one-compound-out (LOCO) and leave-one-pair-out (LOPO) cross-validation routines (**Supplementary Table 1**), which showed the Dd2 model to outperform 3D7 and HB3. The mean Matthews Correlation Coefficient (MCC) score for each strain (i.e., across all fingerprint types and all CV routines) was 0.19 (Dd2), 0.18 (3D7), and 0.11 (HB3). Although these MCC scores are not particularly high in absolute terms (particularly since the more difficult CV routines bring the scores down, while considering that a score of 0 is equivalent to random selection), the Dd2 dataset was chosen for use in the remainder of the study due to the expectation of relatively greater performance in a prospective validation, in addition to the greater number of high quality data points upon which the model is trained.

The Dd2 dataset model was further examined in terms of the performance for each of the descriptor types, the results of which are displayed in **Table 2**. During 5-fold CV (where a random subset of 20% of the training data is held out to test upon), each descriptor type for Dd2 showed similar performance, with a cross-descriptor average MCC of 0.46 and a cross-descriptor average 2.78-fold enrichment (compared to random selection) of synergistic combinations correctly predicted by the model. However, for the more challenging leave-one-compound-out (LOCO) CV, the SFP model significantly outperformed the others, with MCC scores of 0.27 (SFP), 0.03 (TFP), and 0.03 (STFP). Moving on to the most difficult leave-one-pair-out (LOPO) CV routine, the performance was still greatest for the SFP model with a precision of 0.33 and recall of 0.01 (corresponding to an MCC of 0.02). Although recall (the fraction of synergistic combinations in the test data that were identified correctly) is very low, the precision (the fraction of combinations predicted to be synergistic that were indeed synergistic) is greater at 0.33. This is still useful in practice since it suggests that, although we are only likely to find a minority of all synergistic combinations in a dataset, 33% of those combinations predicted to be synergistic will indeed turn out to be synergistic. Compared to our previous study where CoSynE was applied to antibiotic combinations (Mason et al., 2017), the LOPO CV performance was qualitatively similar, with a high precision and low recall (1.0 and 0.2, respectively) for an SFP fingerprint on the training data. Since the coverage of chemical space in this dataset overall is quite low, it is likely that the model has not been exposed to enough diversity to make confident predictions about many of the compounds, and so the recall score is low as a result.
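For reference, precision, recall, and MCC can all be derived from the four cells of a binary confusion matrix (synergistic vs. not synergistic). The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are hypothetical, chosen only to mimic a high-precision, low-recall regime, and are not taken from **Table 2**.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Return (precision, recall, MCC) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical counts: few positive predictions, a third of them correct.
precision, recall, mcc = binary_metrics(tp=1, fp=2, fn=99, tn=300)
print(round(precision, 2), round(recall, 2), round(mcc, 2))
```

Because MCC accounts for all four cells, a model that makes very few positive calls can retain a useful precision while its MCC stays close to zero.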

A possible reason behind the low performance of the TFP descriptor models is that the protein targets from PIDGIN are of human origin, and are unlikely to provide a useful representation of target interactions in P. falciparum. However, it is the case that orthologous proteins exist between Homo sapiens and P. falciparum, and it has previously been shown that the number of conflicting bioactivities between human and ortholog targets in public databases is comparatively low (Mervin et al., 2018), which supports the use of human targets as bioactivity spectra in this indirect manner. It has also been shown that bioactivity spectra can be used more generally as a descriptor that captures biologically relevant information, and can outperform chemical descriptors in the identification of compounds with similar bioactivities (see Petrone et al., 2012; Bender et al., 2006; Kauvar et al., 1995; Riniker et al., 2014; Paricharak et al., 2016). These, together with the lack of predictive modeling tools available to predict potential P. falciparum targets from a given compound structure, provided the reasoning behind our choice of entire bioactivity spectra against proteins as a descriptor type.

distinct or well-defined chemical space. Out of these predictions in green, none were predicted by CoSynE, but paroxetine + guanethidine would be discovered following the indirect route described in the Results section, and is the second-most synergistic combination in the validation dataset. Structures for validation and training compounds are included in Supplementary Tables 5, 7, respectively.

Since we are carrying out the toughest validation possible for our model by exploring novel areas of chemical space (i.e., the compounds to be prospectively validated in this study are not present in the training data), the most-challenging LOPO scenario represents the predictions we wish to make. The CV performance results suggest that by using the SFP descriptor model, we may expect an approximate 1.5-fold enrichment of synergistic combinations in those predicted from our novel compounds compared to random selection (although this enrichment appears low, note that there is already a high baseline of synergy within the dataset which this suggests could be increased further and that the prediction of synergy for entirely unseen data is the most difficult test of a predictive model possible). The SFP descriptor model was therefore selected as the most suitable candidate for this study, which is the same class of descriptor used in our previous study which successfully identified antibacterial combinations (Mason et al., 2017).

### Prospective Validation of CoSynE Predictions

The library of 23 compounds that were selected for prospective validation resulted from predictions generated by a developmental version of CoSynE that had previously virtually screened 21 million DrugBank combinations using the same training data, alongside a different approach that was developed in parallel to CoSynE (KalantarMotamedi et al., 2018; see Experimental section for details). From this library of 23 compounds (and a possible 253 combinations), a total of 20 combinations comprising 12 distinct individual compounds were predicted to be synergistic, and these were submitted for prospective experimental validation. The prospective validation found that 9 of these 20 combinations (i.e., 45%) exhibited antimalarial synergy (defined in this study as γ ≤ 0.96). These predicted synergistic combinations are shown in **Table 3** where the range of γ is 0.917–0.958 (compared to the full prospective screen shown in **Supplementary Table 2**, where the range of γ is 0.88–0.959). The nine synergistic combinations that were correctly predicted comprise only seven compounds of the 23 that were provided to CoSynE. These seven compounds were further investigated using the literature, in order to identify a biological rationale for their selection, and are depicted in **Table 4**. It should be noted that five out of these seven compounds were found to also have self-self γ values that would be classed as synergistic by the threshold that was trained upon, instead of additive (as one would expect). Inclusion of this observation in a predictive model would additionally require the experimental data for self-self crosses for all compounds, which may not be feasible. Instead, this highlights a current limitation of synergy quantification based upon experimental dose-response matrices, whereby the underlying metric should include these crosses as an additional parameter (see Experimental for details). In the present study, however, the model has successfully predicted combinations of drugs that produced γ values below a cutoff at a rate of 45%, demonstrating the ability to reduce search space significantly.

#### TABLE 1 | Dataset statistics.

Counts for the number of synergistic, additive, and antagonistic compounds in each of the datasets available for the current study, after filtering for high quality (HQ) data. The Dd2 training dataset had the highest number of HQ datapoints, which was reflected during cross validation (CV). The Dd2 dataset also contained the highest number of HQ datapoints in the prospectively validated dataset.

#### TABLE 2 | Dd2 training performance.

The results from three increasingly difficult rounds of cross validation (CV); shuffled and stratified 5-fold CV, leave one compound out (LOCO), and leave one pair out (LOPO), for each model type (SFP, structural fingerprint; TFP, target fingerprint; and STFP, combined structure-target fingerprint). Since the current study concerns the prediction of novel compound combinations, our chosen model followed the expected performance of the SFP model during LOPO CV, since this is the most challenging test of the model. AUC, area under receiver operating curve; Pr, precision; Re, recall; Ac, accuracy; Ef, enrichment factor. The "cross descriptor average" is the average score for each metric across each cross validation routine.

The following seven compounds were part of the nine combinations that were prospectively validated as being synergistic: dihydroergotamine (in four of the combinations), apicidin (three combinations), hydroxyzine (three combinations), trifluoperazine (three combinations), sorafenib (two combinations), virginiamycin factor S1 (two combinations), and guanethidine (one combination). The Tanimoto similarity of each compound vs. the training compounds is shown in **Supplementary Figure 2**, which shows apicidin has the greatest similarity among validation compounds to the training compounds at 39.1% (to gramicidin). Virginiamycin factor S1 is the next-closest compound to the training data, with a 30.7% similarity to gramicidin, followed by hydroxyzine (26.2% to piperaquine), trifluoperazine (24.6% to piperaquine), dihydroergotamine (23.5% to gramicidin), sorafenib (19.8% to nilotinib), and guanethidine (15.9% to pyronaridine). Overall, these greatest similarities to the training compounds are on the more-similar end of the distribution curve, but the overall similarity is still quite low. The validation and training compounds are listed in **Supplementary Tables 5, 7**, respectively.
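The nearest-neighbor similarities reported above rest on the Tanimoto coefficient, i.e., the number of fingerprint bits shared by two compounds divided by the number of bits set in either. The following is a minimal sketch operating on sets of "on" bit indices; the function names, compound names, and bit sets are hypothetical and do not correspond to real Morgan fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def nearest_training_compound(query_bits, training):
    """Return (name, similarity) of the training compound most similar to the query."""
    scored = ((name, tanimoto(query_bits, bits)) for name, bits in training.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical on-bit sets standing in for 2048-bit fingerprints.
training_set = {
    "train_X": {1, 4, 9, 30, 31, 32},
    "train_Y": {2, 3, 5},
}
query = {1, 4, 9, 16, 25}
best_name, best_similarity = nearest_training_compound(query, training_set)
print(best_name, best_similarity)
```

Reporting only the maximum similarity per validation compound, as in the text, gives an optimistic view of how close the validation set is to the training set.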

#### TABLE 3 | Dd2 SFP predictions.


The 9 combinations out of 20 predicted by CoSynE, which were prospectively validated to be synergistic, which cover a total of 7 unique compounds. The probability of being synergistic that was assigned by CoSynE is shown, which does not correlate with the experimentally quantified degree of synergy.

Out of the nine true positive synergistic predictions, four combinations involved one compound (namely, either hydroxyzine or guanethidine) known as a drug efflux pump inhibitor in other species (further details given below), which may also facilitate accumulation of a respective antimalarial partner drug in P. falciparum. Drug efflux pump inhibition has previously been suggested as attractive in combating resistance, whereby the intracellular concentration of an active compound is otherwise strongly restricted by the microorganism (Alibert-Franco et al., 2009). Firstly, hydroxyzine is a compound with antihistamine and central nervous system (CNS) properties that has been shown to act as an efflux pump inhibitor in bacteria, and also affects Quorum Sensing (QS) (Aybey et al., 2014). QS is a system of stimulus and coordination among microorganisms, which P. falciparum may use to detect conditions of the external environment (Wu et al., 2016), such as overcrowding, in order to keep the parasite population under control in the host (Mutai and Waitumbi, 2010). Hydroxyzine was correctly predicted to be synergistic in combination with sorafenib, apicidin, or dihydroergotamine. Sorafenib is a tyrosine kinase inhibitor used in the treatment of cancer that inhibits parasite egress from the host cell (Gaji et al., 2014), and is annotated with activity against both 3D7 and Dd2 strains of P. falciparum in PubChem (Pathak et al., 2015; Kim et al., 2016). Apicidin is a potent inhibitor of histone deacetylase [HDA; of which the P. falciparum ortholog PfHDA2 exists (Coleman et al., 2014)] and this mechanism of inhibition is responsible for the antiprotozoal properties of the drug (Darkin-Rattray et al., 1996; Engel et al., 2015). Dihydroergotamine is a known inhibitor of P. falciparum (Weisman et al., 2006), which may target a serotonin 5-HT1a-like receptor in the parasite thought to be a nutrient channel critical for parasite development (Hanoun et al., 2003; Locher et al., 2003). Ergotamine, the structural analog of dihydroergotamine, was one compound involved in a docking study looking for competitive inhibitors of the enzyme P. falciparum lactate dehydrogenase (PfLDH), upon which the parasite is dependent for energy production, where it achieved a reasonably good docking score (Penna-Coutinho et al., 2011). The combination of these active compounds with the hydroxyzine efflux pump inhibition and QS action may be responsible for the observed synergy in these cases. Secondly, guanethidine is annotated as active against human multidrug resistance protein 1 (MDR-1) in a screen for compounds that compete for this transporter as a means to increase accumulation of active compounds in cells (AID:377). A Plasmodium ortholog of MDR-1, PfMDR1, exists (Hyde, 2007), and if guanethidine competes for PfMDR1, this may explain a potential mechanism for synergy, since PfMDR1 is important for transporting substrates from the cytoplasm into the lysosomal-like parasite digestive vacuole (Reiling and Rohrbach, 2015). Guanethidine alone does not show activity against P. falciparum (Chong et al., 2006), but was correctly predicted to show synergy in combination with trifluoperazine. Trifluoperazine is an antipsychotic drug and a potent inhibitor of P. falciparum calcium-dependent protein kinase 4 (PfCDPK4) (Cavagnino et al., 2011), and so would represent the anti-malarial compound in this combination. To the authors' knowledge, these may be novel modes of action for the use of hydroxyzine and guanethidine in the context of P. falciparum. Since the training dataset did not include compounds explicitly annotated as targeting P. falciparum efflux pumps [with the exception of primaquine, which exhibits synergy with chloroquine through inhibiting the P. falciparum Chloroquine Resistance Transporter; PfCRT (Bray et al., 2005)], further experimental validation would be required to confirm this mechanistic hypothesis of the synergies observed experimentally.

#### TABLE 4 | Synergistic drugs correctly predicted by CoSynE.

2006), and is annotated in PubChem as being active in several assays. May target a serotonin 5-HT1a-like receptor in the parasite thought to be a nutrient channel (Hanoun et al., 2003; Locher et al., 2003). Structural analog ergotamine achieved reasonably good docking score in a study searching for competitive inhibitors for PfLDH (Penna-Coutinho et al.,

inconclusive potency against P. falciparum of 5.72 uM (AID:504834). Also annotated as active against MDR-1 (AID:377); the P. falciparum analog of which (pfmdr1) is involved in resistance and guanethidine may therefore play a role in preventing drug efflux (Hyde,

Depiction and description of the seven compounds that were part of combinations predicted to be synergistic by CoSynE.

Three of the remaining five combinations that were correctly predicted involve pairs of the previously detailed compounds that were the "active" partner drugs to the putative efflux pump inhibitors (apicidin-dihydroergotamine, trifluoperazine-sorafenib, and trifluoperazine-dihydroergotamine). These combinations may exert their synergistic effect through the differing mechanisms of their constituent compounds.

The final two correctly predicted combinations involve virginiamycin factor S1, a macrolide antibiotic annotated as active against P. falciparum proliferation (AID:504749), with either apicidin or dihydroergotamine. Antibiotics may exhibit antimalarial properties, albeit slow-acting, by targeting the apicoplast during development (Dahl et al., 2006; Barthel et al., 2008; Chakraborty, 2016). Macrolides are known for their effectiveness in treatment of uncomplicated malaria in combination with quinine, where the main mechanism of action involves binding to ribosomal proteins, but suffer due to poor pharmacological properties (Gaillard et al., 2016). The combination of virginiamycin S1 targeting the apicoplast, and apicidin targeting Plasmodium orthologs of histone deacetylase, such as PfHDA2 (Darkin-Rattray et al., 1996; Coleman et al., 2014; Engel et al., 2015), suggests that this combination puts pressure on the developmental and growth stages of the parasite. The combination of potential nutrient channel and energy inhibition properties of dihydroergotamine (Hanoun et al., 2003; Locher et al., 2003; Penna-Coutinho et al., 2011) with the apicoplast-targeting mechanism of virginiamycin S1 also suggests pressure being put on the developmental and growth stages. However, since this work used asynchronous parasite cultures to assess compound efficacy, and given that apicoplast-targeting molecules do not typically affect the first replication cycle upon drug pressure [where they are instead exhibiting a "delayed death" phenotype (Dahl and Rosenthal, 2007)], this apicoplast-targeting mechanism is unlikely to have been observed. Unfortunately, the combination of macrolides and dihydroergotamine has been reported to produce clinically significant adverse drug reactions (Horowitz et al., 1996), which means this particular combination would not be suitable as a potential treatment.

### Full Pairwise Synergy Screen of 23 Compounds

A subsequent full pairwise experimental screen of all 23 compounds was also carried out (**Supplementary Table 3**), in order to assess the performance of CoSynE for the prediction of completely novel combinations of compounds acting synergistically. Comparison of the overall proportion of synergistic combinations that were found (49 out of 185, or 26.5%, see **Table 1**) with the proportion among those predicted by CoSynE (9 out of 20, or 45%) showed that we achieved a 1.70-fold enrichment (0.45/0.265), approximately that which was expected from our LOPO CV performance. This level of enrichment is significant in practical terms in the search for antimalarial compound combinations, where a 41% reduction [1 – (1/1.70)] in the total number of measurements required is a very attractive prospect in terms of both time and cost. Although this performance is attractive, the model is still far from ideal and requires further refinement to increase both the precision (0.45) and recall (0.18) seen in **Table 5**. On the other hand, it should be noted that obtaining synergy in 26.5% of cases is a rather high baseline, which the model was able to increase further to nearly half of all synergistic predictions being true positives (more precisely, to 45% of all predicted combinations).
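The enrichment, recall, and measurement-reduction figures quoted above follow directly from the counts in the text (9 of 20 predicted combinations synergistic; 49 of 185 HQ combinations synergistic overall). The short sketch below reproduces them; the function name `enrichment_factor` is introduced here for illustration only.

```python
def enrichment_factor(hits_selected, n_selected, hits_total, n_total):
    """Hit rate among selected combinations divided by the overall hit rate."""
    return (hits_selected / n_selected) / (hits_total / n_total)


precision = 9 / 20          # synergistic fraction among CoSynE predictions
baseline = 49 / 185         # synergistic fraction in the full HQ Dd2 screen
enrichment = enrichment_factor(9, 20, 49, 185)
recall = 9 / 49             # synergistic combinations recovered by the predictions
reduction = 1 - 1 / enrichment  # fewer measurements needed per synergistic hit

print(round(precision, 2), round(baseline, 3), round(enrichment, 2),
      round(recall, 2), round(reduction, 2))
```

The reduction figure assumes that the same hit rate would hold if more predictions were screened, which the low recall suggests should be treated with caution.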

### Potential for Indirect Discovery of Synergistic Combinations

We next investigated the hypothetical scenario where all compounds that are part of combinations predicted to be synergistic by CoSynE were screened in a fully pairwise manner, to see whether CoSynE could indirectly expand the discovery of novel combinations. Interpreted differently, we investigated whether synergy between compounds is "clustered", and whether the knowledge that a compound has shown synergy before increases the chances that it will also show synergy in combination with other compounds (with the limitation of our validation being the limited sampling of chemical space, which may or may not generalize to "all" chemical space). Each combination in the prospective validation dataset for Dd2 involving any of the 12 compounds that were part of a combination predicted to be synergistic was extracted, yielding a total of 61 combinations, out of which 36% were found to be synergistic (22 combinations in **Supplementary Table 4**). This proportion of synergistic combinations is hence higher (by 9.5% in absolute terms, and 36% in relative terms) than the 26.5% found in all of the 185 HQ validation combinations, which corresponds to an enrichment of 1.36× compared to random selection. However, to some extent this enrichment may be slightly inflated due to CoSynE having identified drug efflux pump inhibitors in the model. Among the synergistic combinations in this subset indirectly found through CoSynE is that of guanethidine (antiplasmodial and active against MDR1) and paroxetine (annotated in DrugBank as targeting MDR1, with antibacterial activity via efflux pump and QS inhibition [Aybey et al., 2014] and antiplasmodial activity [Chong et al., 2006], including AID:524790–524796), with a γ score of 0.889. This combination is more synergistic than all those directly predicted by CoSynE, and is the second-most synergistic combination among all HQ combinations in the validation dataset. This suggests that screening not only the compound combinations predicted to be synergistic by CoSynE, but all combinations of the compounds that are part of any such predicted combination, will further increase the likelihood of identifying synergistic combinations. This is also in line with previous studies, which have found that while synergy to an extent depends on the properties of both compounds in a combination, there is still a significant bias in chemical space, with some parts of it being significantly more frequently part of synergistic compound combinations than others (Weinstein et al., 2017).
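The indirect route described above amounts to collecting every compound that appears in at least one predicted combination and then enumerating all pairs among those compounds. The sketch below illustrates this under the assumption that both members of an expanded pair must belong to the collected set; the function names and compound identifiers are hypothetical, while the closing arithmetic uses the counts given in the text.

```python
from itertools import combinations


def compounds_in(pairs):
    """Set of all compounds that appear in at least one pair."""
    return {compound for pair in pairs for compound in pair}


def expand_pairwise(predicted_pairs):
    """All unordered pairs among the compounds seen in the predicted pairs."""
    return list(combinations(sorted(compounds_in(predicted_pairs)), 2))


# Hypothetical predictions covering four compounds.
predicted = [("c1", "c2"), ("c2", "c3"), ("c4", "c1")]
expanded = expand_pairwise(predicted)
print(len(predicted), len(expanded))

# Hit rates reported in the text: 22 of 61 in the expanded subset vs. 49 of 185 overall.
subset_rate = 22 / 61
baseline_rate = 49 / 185
relative_enrichment = subset_rate / baseline_rate
print(round(subset_rate, 2), round(relative_enrichment, 2))
```

In an iterative setting the expanded set is considerably larger than the directly predicted set, so the lower enrichment has to be weighed against the additional screening effort.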

Along these lines, we believe that an iterative screening procedure could be followed in an industrial setting, whereby predictions are made, screened, and then fed back into CoSynE for training before further predictions are made. Such iterative approaches have been investigated in the literature (Paricharak et al., 2016), and could enable gradual expansion of chemical and/or biological space, in particular with current improvements in cherry picking compounds in such iterative screening settings.

### CONCLUSION

In this work, we describe the application of our compound combination prediction method, CoSynE, to a recently published compound combination screening dataset for P. falciparum, and the results of a prospective validation of our predictions. When we used our final CoSynE model to predict synergistic combinations (γ ≤ 0.96) from a library of compounds previously unknown to the model for P. falciparum Dd2, 45% of the predicted combinations (9 out of 20) were experimentally confirmed as being synergistic, corresponding to a 1.70-fold enrichment of synergistic combinations over that expected by random selection from the validation dataset. This is of practical significance when combinatorial explosion and experimental cost for combination screening are taken into account. Furthermore, a 2.36-fold enrichment was observed during cross validation when one compound is unknown, and 2.74-fold when both compounds are known to the model (but only in different combinations). In addition, it was found that screening only compounds that are part of combinations CoSynE predicted to be synergistic would yield 9.5% more synergistic combinations in absolute terms (and 36% in relative terms) than expected by random selection alone.

#### TABLE 5 | Dd2 SFP Performance.

Overall performance of the Dd2 SFP model, after the full pairwise screen of prospective compounds was carried out. Overall, the precision and recall for the prediction of novel synergistic combinations are low; however, this still provides greater enrichment of synergistic combinations than expected by random selection (1.70-fold) from the prospectively validated dataset. AUC, area under receiver operating curve; Pr, precision; Re, recall; Ac, accuracy; Ef, enrichment factor.

The combinations that were prospectively validated from our predictions mainly involve one compound with antimalarial activity coupled to another targeting potential drug efflux or substrate transport mechanisms in P. falciparum. These results in particular suggest that the approach we describe can capture meaningful information that enables the prediction of synergy, which is corroborated by our previous study involving antibiotic combinations.

CoSynE offers an advantage over similar methods that require data, such as differential gene expression analysis, or single agent efficacies across multiple cell lines related to the target, in that the only information required to make new predictions is the provision of chemical structure information. The use of CoSynE to make predictions for other therapeutic areas requires only a dataset of combination screening results together with compound structural information, and may also predict for higher orders of combinations (e.g., combinations of 3, 4, and above), should training data with a meaningful measure of synergy be made available. Our approach may be employed to prioritize screening of new combinations, thus reducing the potential burden and cost of combinatorial explosion in the search for future antimalarial compound combinations that exhibit synergy.

#### EXPERIMENTAL

#### Experimental Screening of Compound Combinations

Training data was obtained from a publicly available dataset of antimalarial compound combinations from a high-throughput screen against 3D7, Dd2, and HB3 strains of P. falciparum (assay IDs 1463, 1464 and 1465, which can be found at https://tripod.nih.gov/matrix-client/?p=183; Mott et al., 2015). Compounds were acoustically dispensed and read at 72 h as previously described (Mott et al., 2015). Matrix combination response was calculated based upon relative SYBR Green intensity values, compared to controls (Mott et al., 2015). The prospective validation data was screened using the same method as the training data. This validation dataset includes both single-agent and combination responses, and can be found at https://tripod.nih.gov/matrix-client/?p=1261. The 23 compounds that comprised the validation dataset are listed in **Supplementary Table 5**, the experimental data used to validate the Dd2 model is listed in **Supplementary Table 3**, and reproducibility of assay results is detailed in **Supplementary Table 6**.

#### Compound Combination Datasets and Synergy

The training data used in this study consisted of 1,540 combinations of 56 antimalarial compounds that exhibit different modes of action, which were screened against the 3D7, Dd2, and HB3 strains of P. falciparum. The 56 compounds that formed this screen are listed in **Supplementary Table 7**. Synergy metrics and data quality (QC) were pre-determined from a 6 × 6 dose-response matrix of each combination, where inhibition of the parasite in infected red blood cells was measured. The QC score for a combination was precomputed from a set of heuristics described in Mott et al. (2015) that take into account the quality of the single-agent dose response, DMSO activity, and the smoothness of the dose combination response matrix. This yields a value between 0 and 18, where lower values indicate higher quality. Only high quality (HQ) experimental readouts with a QC score ≤3 were kept, which provided 1,194 HQ combinations for 3D7, 1,245 for Dd2, and 1,159 for HB3 (**Table 1**; training dataset). For the validation dataset, the same filtering rules applied to 209 combinations of 23 compounds provided 119 for 3D7, 185 for Dd2, and 81 for HB3 (**Table 1**; validation dataset).
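As a quick check on the dataset size quoted above, 56 compounds give 56 × 55 / 2 = 1,540 unordered pairs. The sketch below enumerates such pairs and applies the QC ≤ 3 filter. The compound identifiers, the toy QC values, and the helper names `enumerate_pairs` and `keep_high_quality` are hypothetical and are not part of the published pipeline.

```python
from itertools import combinations

QC_CUTOFF = 3  # readouts with a QC score <= 3 are kept, as stated in the text


def enumerate_pairs(compounds):
    """Return every unordered pair of distinct compounds."""
    return list(combinations(compounds, 2))


def keep_high_quality(qc_by_pair, cutoff=QC_CUTOFF):
    """Keep pairs whose QC score (0-18, lower is better) is at or below the cutoff."""
    return {pair: qc for pair, qc in qc_by_pair.items() if qc <= cutoff}


# 56 placeholder identifiers (hypothetical names, not the screened compounds).
compounds = [f"cpd{i:02d}" for i in range(56)]
pairs = enumerate_pairs(compounds)
print(len(pairs))  # 56 * 55 / 2 = 1540

# Hypothetical QC scores for three pairs, only to exercise the filter.
toy_qc = {pairs[0]: 0, pairs[1]: 3, pairs[2]: 7}
high_quality = keep_high_quality(toy_qc)
print(len(high_quality))
```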

The metric used to interpret synergy in our modeling approach was gamma (γ), which is a combination of the Highest Single Agent (HSA; also known as Gaddum's noninteraction model) and Bliss independence. Based upon a 6 × 6 dose-response matrix of compound A and compound B at concentrations x and y vs. inhibition of P. falciparum, the variable γ is computed to minimize the following function (Cokol et al., 2014):

$$\sum \left[ f\left(A_{[x]} + B_{[y]}\right) - \gamma \times \max\left\{ f\left(A_{[x]}\right), f\left(B_{[y]}\right) \right\} \right]^2 \tag{1}$$

This yields a positive value, where synergy is characterized as <1, additivity as =1, and antagonism as >1. In order to classify each of the combination readouts, we set a maximum γ cutoff for synergy of 0.96 and a minimum cutoff for antagonism of 1.04, with the remainder assigned as additive. These cutoff values were empirically chosen to provide a degree of separation between antagonism and synergy in the training data, while aiming to keep the balance of each class similar across strains. Although not explicitly investigated during the study, we expect that making the γ cutoff larger may lead to an increased enrichment of synergistic combinations being predicted, while making it smaller may affect the model robustness by decreasing the number of synergistic training datapoints further.
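To make the fitting and labeling steps concrete, the following is a minimal sketch that reads Equation (1) literally as printed. Minimizing the sum of squared differences over γ has the closed form γ = Σ(c·m) / Σ(m²), where c is the measured combination response and m is the larger of the two single-agent responses. The function names, the toy response values, and the choice to treat both cutoffs as inclusive are assumptions made for illustration; the precomputed γ values of Mott et al. (2015) may have been derived differently.

```python
def fit_gamma(combo, single_a, single_b):
    """Least-squares gamma for Equation (1), read literally.

    combo[i][j] is the response f(A[x_i] + B[y_j]); single_a[i] and
    single_b[j] are the single-agent responses f(A[x_i]) and f(B[y_j]).
    """
    num = 0.0
    den = 0.0
    for i, fa in enumerate(single_a):
        for j, fb in enumerate(single_b):
            m = max(fa, fb)
            num += combo[i][j] * m
            den += m * m
    if den == 0.0:
        raise ValueError("all single-agent responses are zero; gamma is undefined")
    return num / den


def classify(gamma, synergy_max=0.96, antagonism_min=1.04):
    """Label a gamma value with the cutoffs quoted in the text (inclusive by assumption)."""
    if gamma <= synergy_max:
        return "synergistic"
    if gamma >= antagonism_min:
        return "antagonistic"
    return "additive"


# Toy responses (hypothetical): the combination equals the larger single-agent
# response in every cell, so the fitted gamma is 1 and the label is additive.
single_a = [0.1, 0.2, 0.4]
single_b = [0.1, 0.3, 0.5]
combo = [[max(a, b) for b in single_b] for a in single_a]
gamma = fit_gamma(combo, single_a, single_b)
print(gamma, classify(gamma))
print(classify(0.895))  # a self-cross gamma value reported in this section
```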

One limitation with regard to the pre-processing of experimental combination responses during our study is that measurement of self-crosses using the Bliss model component of γ may in fact produce values which are classed as synergistic. For example, apicidin in combination with itself in the validation dataset shows a γ value of 0.895, whereas our cut-off for the training data was 0.96. In other words, this self-cross would be labeled as "synergistic" according to our criteria, whereas self-interaction should be additive; this is a well-known phenomenon among synergy measures, where a generalizable and robust model is yet to be identified (Bulusu et al., 2016). We chose to apply the cut-off of 0.96 that was used for the training data to enable our assessment of validation predictions "in the eyes of the model" with respect to training criteria, yielding 49 synergistic combinations in the Dd2 validation dataset. Compounds with self-cross γ values lower than our training data cut-off include trifluoperazine, raloxifene, guanethidine, hydroxyzine, megestrol acetate, FK-506, fulvestrant, sorafenib, apicidin, and ingenol mebutate. Since these cover five out of the seven compounds in **Table 3** (i.e., eight out of our nine predictions in **Table 3**), any future investigation into combinations involving these compounds based solely upon γ values should bear this in mind. Although it is not clear precisely how to overcome this limitation, future models that additionally train upon the validation dataset might take these self-crosses into account more explicitly by lowering synergistic cut-offs on a per-combination basis, or seek to find a way of incorporating this into the synergy metric itself. All self-crosses for the validation data may be found at https://tripod.nih.gov/matrix-client/?p=1261, and minimum significance ratios for the validation compounds that were screened are detailed in **Supplementary Table 6**.

#### Prior Selection of Validation Compounds

The selection of compound combinations for screening and validation of our models was based upon a version of CoSynE much earlier in development. Several CoSynE models were trained upon the same dataset as described in this report, except that the range of additivity for γ was narrower at 0.975–1.025 (as opposed to 0.96–1.04). The resulting models were used to predict enumerated combinations of approved, investigational, and experimental compounds in DrugBank (Wishart et al., 2006), which amounted to around 21 million combinations for prediction. Of these, approximately 1.2 million combinations were predicted to be synergistic, and 10 combinations needed to be selected for the prospective validation. This selection was achieved by manually reviewing the top-ranked combinations (sorted by the probability of being synergistic that was assigned to each combination by CoSynE), and taking into consideration the prevalence of each compound throughout the list of combination predictions, followed by examining the literature co-occurrence of each predicted combination's compounds together with mention of P. falciparum in PubMed. These 10 chosen combinations comprised 18 compounds, and were submitted for testing together with an additional 10 selected from a different approach developed in parallel by KalantarMotamedi et al. (2018).

Out of the total number of compounds among the 20 combinations primarily suggested for testing, only the 23 compounds shown in **Supplementary Table 5** were available for purchase at the time, which meant few original predictions could be prospectively validated. The decision was made to instead use a more recent version of CoSynE to predict which combinations of these 23 compounds were synergistic, finally yielding the dataset in this study. Interestingly, **Table 1** shows that the number of antagonistic combinations observed in the validation dataset is significantly lower compared to the training dataset, while at the same time the number classed as additive or synergistic has increased. This reduction in the number of antagonistic combinations as a result of virtually screening a library of intractable size suggests that the approach taken by CoSynE, together with the process of manually reviewing the top predictions, aids the discovery of synergistic combinations.

### Comparison to a Similar Study Conducted in Parallel

The approach by KalantarMotamedi et al. (2018) differs from that described in this work primarily by the usage of gene expression data. Firstly, differential gene expression profiles of mild vs. severe malaria patient peripheral blood samples were used to predict potentially active single antimalarial agents by comparison of drug gene perturbations through a modified Gene Set Enrichment Analysis (GSEA) approach (Subramanian et al., 2005) applied to the Library of INtegrated Cellular Signatures (LINCS) Phase I database (Subramanian et al., 2017). Secondly, a Random Forest model was trained on the same dataset of 1,540 combinations from NCATS as in the present study, and human target predictions and pathway annotations were used to infer which drug combinations may interact synergistically. Finally, the single agents identified by the GSEA approach to human blood samples were enumerated as pairs and predicted by the Random Forest model as synergistic/non-synergistic. These predicted combinations were ranked based upon the predicted probability of being synergistic, and the top 17 compound combinations were selected for prospective experimental testing (covering a total of 14 single agents). This approach reported an overall average precision of 0.488 and recall of 0.755 (F1 = 0.593) for experiments across the three strains of P. falciparum where drug combinations were predicted to be synergistic at a cutoff for synergy of γ ≤ 0.975. Among the 14 single agents in 17 combinations Kalantar-Motamedi et al. selected for prospective validation were seven that overlapped with the 12 drugs in 20 combinations CoSynE predicted for prospective validation: ciprofloxacin, wortmannin, paroxetine, raloxifene, apicidin, trifluoperazine, and hydroxyzine. The only combination of these overlapping compounds that was correctly predicted to be synergistic by both CoSynE and the method described by Kalantar-Motamedi et al. was apicidin-hydroxyzine.
Since CoSynE is not constrained to compounds that are present in the Connectivity-Map (Lamb et al., 2006) or LINCS databases (instead needing only knowledge of compound structure), it is difficult to draw a direct and fair comparison of overall performance. However, for the same experimental γ cutoff denoting a synergistic combination, applied to the total pool of 185 prospective combinations in the current study, CoSynE achieved precision of 0.45 and recall of 0.18 (F1 = 0.26). While the precision of CoSynE for the prospectively validated combinations is close to that reported by Kalantar-Motamedi et al., recall in this instance is much lower. However, it should be noted that this overall performance still represents greater enrichment of synergistic combinations being discovered than by random selection (see **Table 5**), and CoSynE is not limited by the requirement for gene expression data to be made available for the compounds that are to be predicted.
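The F1 values quoted in this comparison follow directly from the reported precision and recall, since F1 is their harmonic mean. The short check below is only an arithmetic illustration; the function name is ours.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_reference = f1_score(0.488, 0.755)  # values reported for the parallel study
f1_cosyne = f1_score(0.45, 0.18)       # values reported for CoSynE
print(round(f1_reference, 3))  # 0.593
print(round(f1_cosyne, 2))     # 0.26
```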

#### Combination Descriptors

We represented each compound combination as an array of features in three ways: a Structural Fingerprint (SFP) descriptor based upon the molecular structure of each compound in a combination, a Target Fingerprint (TFP) descriptor based upon a probabilistic combination of the predicted target affinity probabilities per compound, and a concatenation of these two descriptors (Structure-Target; STFP). This provided three descriptor sets for which models were trained.

Structural fingerprints were generated by first obtaining a SMILES representation from PubChem (Kim et al., 2016) for each compound that was screened in the training data, before standardizing this representation with ChemAxon JChem Standardizer (ChemAxon, 2014) according to the protocol defined by PIDGIN (Mervin et al., 2015). Standardized SMILES were then loaded into RDKit v2015<sup>1</sup> and 2,048-bit Morgan fingerprints with radius 2 were generated, yielding arrays of 2,048 integer features. A given combination of two compounds was represented as the bitwise average of these features, yielding possible values of 0, 0.5, and 1 per feature, which formed the SFP descriptor. A Morgan fingerprint was chosen for this study because it generally outperformed the MACCS fingerprint on this dataset [although the MACCS fingerprint was found to outperform Morgan when CoSynE was used to predict antibiotic combinations (Mason et al., 2017)]. The SMILES representation was also used as input for PIDGIN (Mervin et al., 2015), where the probability of binding below the training cut-off of 10 µM for each compound vs. 1,080 human protein targets was predicted, yielding arrays of 1,080 floating point features between 0 and 1. For a given combination, the per-target binding probabilities of the two compounds were combined using the following function, such that the maximum affinity a combination of compounds may have is 100% [i.e., a value of 1.0; Equation (2)]; this formed the TFP descriptor. The rationale behind the use of this function for TFP was that the probability of a protein being inhibited cannot be more than 100%, but the more compounds in a single combination that are predicted to target the protein, the more this is likely to be the case.

$$p_{\text{Combination},\,\text{Target}N} = 1 - \left(1 - p_{\text{Compound1},\,\text{Target}N}\right) \times \left(1 - p_{\text{Compound2},\,\text{Target}N}\right) \tag{2}$$
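The three combination descriptors can be illustrated on toy vectors. The sketch below assumes the per-compound fingerprint bits and target probabilities have already been computed (by RDKit and PIDGIN in the study); the function names and the four-bit and two-target inputs are hypothetical and chosen only to keep the example readable.

```python
def sfp(bits_a, bits_b):
    """Element-wise average of two bit vectors; each value is 0, 0.5, or 1."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    return [(a + b) / 2 for a, b in zip(bits_a, bits_b)]


def tfp(probs_a, probs_b):
    """Combine per-target probabilities as in Equation (2): 1 - (1 - p1)(1 - p2)."""
    if len(probs_a) != len(probs_b):
        raise ValueError("target probability vectors must have the same length")
    return [1 - (1 - a) * (1 - b) for a, b in zip(probs_a, probs_b)]


def stfp(bits_a, bits_b, probs_a, probs_b):
    """Concatenate the structural and target descriptors."""
    return sfp(bits_a, bits_b) + tfp(probs_a, probs_b)


# Toy inputs: 4 fingerprint bits and 2 targets instead of 2,048 and 1,080.
structure = sfp([1, 0, 1, 0], [1, 1, 0, 0])
targets = tfp([0.5, 0.0], [0.5, 1.0])
combined = stfp([1, 0, 1, 0], [1, 1, 0, 0], [0.5, 0.0], [0.5, 1.0])
print(structure)  # [1.0, 0.5, 0.5, 0.0]
print(targets)    # [0.75, 1.0]
print(len(combined))
```

Both operations are symmetric in the two compounds, so the descriptor of a pair does not depend on the order in which its compounds are listed.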

### Model Construction and Performance Testing

Model settings were optimized prior to construction of the final models, and all machine learning was carried out using SciKit-Learn v0.17 (Pedregosa et al., 2011).

The 1,245 Dd2 compound combinations that formed our training data each have 1,080, 2,048, or 3,128 features (depending on the descriptor used), meaning that the feature space can be larger than the number of combinations. It is therefore necessary to remove any features that are not useful for training prior to constructing the final models. Training data was scaled to unit variance with a zero-centered mean, and starting from N = 1, the top N percentile of features within the training data [as determined by ANOVA F-classifier score in SciKit-Learn v0.17 (Pedregosa et al., 2011)] was selected to train upon using a Support Vector Machine Classifier (SVC, optimization parameters detailed in **Supplementary Methods**), together with the synergy type labels per combination, to construct a classifier. This classifier then predicted the synergy label for test data that had the same features selected, and the outcome of this test was scored using the Matthews Correlation Coefficient [MCC, Equation (3)] with respect to the ability to correctly predict a synergistic combination. Due to the consideration of all possible outcomes of a classification problem (true positive; TP, false positive; FP, true negative; TN, false negative; FN), the MCC score offers benefit over performance metrics, such as the Area Under Receiver Operating Curve (AU-ROC) and Accuracy, which ignore TN and TN, and FP and FN predictions, respectively.

$$\text{MCC} = \frac{\text{TP} \times \text{TN} - \text{FP} \times \text{FN}}{\sqrt{(\text{TP} + \text{FP})(\text{TP} + \text{FN})(\text{TN} + \text{FP})(\text{TN} + \text{FN})}} \quad \text{(3)}$$
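Equation (3) translates directly into code. The sketch below is a plain-Python version for binary counts; returning 0 when the denominator is zero is a common convention that we adopt here as an assumption, and the counts in the usage lines are invented.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts, Equation (3)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


print(mcc(5, 0, 5, 0))  # perfect prediction
print(mcc(5, 5, 5, 5))  # no better than chance
print(mcc(0, 5, 0, 5))  # every prediction wrong
```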

This process was repeated 10 times per N, by stratified and shuffled 5-fold cross validation, to finally yield 99 averaged MCC scores. These top N selected features that resulted in the highest MCC score overall were subsequently used by CoSynE in the final model training round, in order to test model performance in different scenarios. The top N selected features per model are detailed in **Supplementary Methods**. While CoSynE will label predicted combinations as synergistic, additive, or antagonistic, during model optimization only the prediction of synergistic combinations is carried out.
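The ranking step inside this sweep can be sketched without any external library. The code below computes a one-way ANOVA F statistic per feature and keeps the top N percent of features. It is a simplified stand-in for the SciKit-Learn routine used in the study, not a reproduction of it: the function names, the rounding rule for the number of features kept, and the toy data are our assumptions, and the surrounding cross-validation and SVC training are omitted.

```python
import math


def anova_f(values, labels):
    """One-way ANOVA F statistic for a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    within = sum((v - means[label]) ** 2 for label, g in groups.items() for v in g)
    if within == 0.0:
        return math.inf if between > 0.0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def select_top_percentile(rows, labels, percentile):
    """Return sorted column indices of the top `percentile` percent of features."""
    n_features = len(rows[0])
    scores = [anova_f([row[j] for row in rows], labels) for j in range(n_features)]
    n_keep = max(1, math.ceil(n_features * percentile / 100))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


# Toy data: 6 combinations x 4 features; label 1 = synergistic, 0 = not.
labels = [1, 1, 1, 0, 0, 0]
rows = [
    [1.0, 0.5, 0.0, 1.0],
    [1.25, 0.25, 1.0, 1.0],
    [0.75, 0.75, 0.0, 1.0],
    [0.0, 0.5, 1.0, 1.0],
    [0.25, 0.25, 0.0, 1.0],
    [-0.25, 0.75, 1.0, 1.0],
]
print(select_top_percentile(rows, labels, 25))
print(select_top_percentile(rows, labels, 50))
```

Because the F statistic is unchanged by shifting or rescaling a feature, the ranking is the same before and after the unit-variance scaling described in the text.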

The second round, which results in selection of the final model, involved construction of a number of different classifiers [Bernoulli Naïve Bayes, Support Vector Machine, Random Forest, Extra Trees, and Decision Tree, SciKit-Learn v0.17 (Pedregosa et al., 2011)], which were subject to grid search parameter optimization (optimization parameters detailed in **Supplementary Methods**). The selection of the best model parameters was based upon 10 repeats of stratified and shuffled 5-fold cross validation, which represents a scenario where the training data has prior knowledge of both compounds per combination (**Figure 3**). Each model with a new set of parameters was then subjected to two further rounds of validation of increasing difficulty: Leave One Compound Out (LOCO; in which one compound in a combination is made unknown to the model), and Leave One Pair Out (LOPO; in which both compounds are made unknown to the model). This provided a view on model performance when looking to extend the compounds used in combination with those already known (LOCO) or, in the toughest case, searching for novel combinations of unknown compounds (LOPO). The choice of final model settings was based upon performance in terms of the MCC score for the prediction of synergistic combinations in each of these scenarios.

<sup>1</sup>Landrum, G. RDKit: Open-Source Cheminformatics. Available online at: http://www.rdkit.org
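One way to read the LOCO and LOPO scenarios is as rules for which combinations may remain in the training set. In the sketch below, LOCO holds out every combination containing a chosen compound, and LOPO tests one combination while removing from training every combination that shares a compound with it. This is our interpretation of the description above; the function names and the four-compound toy set are hypothetical.

```python
from itertools import combinations


def loco_split(pairs, held_out_compound):
    """Leave One Compound Out: test pairs contain the held-out compound."""
    test = [p for p in pairs if held_out_compound in p]
    train = [p for p in pairs if held_out_compound not in p]
    return train, test


def lopo_split(pairs, held_out_pair):
    """Leave One Pair Out: neither compound of the test pair appears in training."""
    a, b = held_out_pair
    test = [p for p in pairs if set(p) == {a, b}]
    train = [p for p in pairs if a not in p and b not in p]
    return train, test


pairs = list(combinations(["A", "B", "C", "D"], 2))  # 6 toy combinations

loco_train, loco_test = loco_split(pairs, "A")
print(loco_train, loco_test)

lopo_train, lopo_test = lopo_split(pairs, ("A", "B"))
print(lopo_train, lopo_test)
```

In the toy set, LOPO leaves a single training combination out of six, which shows why it is the harder scenario: each test pair removes far more training data than in LOCO.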

In each test and train split of the data, feature selection and scaling were based solely upon the training data to ensure that no information from the test set was used in the model generation step. Final model settings are detailed in **Supplementary Table 7**.

#### AUTHOR CONTRIBUTIONS

DM created the tool and wrote the majority of the manuscript. RL provided advice with respect to the training dataset. RG and RE carried out the experimental work. IS and AB obtained funding, supervised and provided advice.

#### FUNDING

This work was supported by a grant from Unilever Research and Development to DM (MA-2013-00588), and an ERC Starting Grant (MIXTURE) to AB.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.01096/full#supplementary-material

#### REFERENCES

Zhang, Y., Smolen, P., Baxter, D. A., and Byrne, J. H. (2014). Computational analyses of synergism in small molecular network motifs. PLoS Comput. Biol. 10:e1003524. doi: 10.1371/journal.pcbi.1003524

**Conflict of Interest Statement:** DM was employed by company Healx Ltd. at the time of submission. IS was employed by company Unilever at the time of submission. RG was employed by company Vertex Pharmaceuticals at the time of submission.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Mason, Eastman, Lewis, Stott, Guha and Bender. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# QSAR-Driven Design and Discovery of Novel Compounds With Antiplasmodial and Transmission Blocking Activities

Marilia N. N. Lima<sup>1†</sup>, Cleber C. Melo-Filho<sup>1†</sup>, Gustavo C. Cassiano<sup>2</sup>, Bruno J. Neves<sup>1,3</sup>, Vinicius M. Alves<sup>1</sup>, Rodolpho C. Braga<sup>1</sup>, Pedro V. L. Cravo<sup>4</sup>, Eugene N. Muratov<sup>5,6</sup>, Juliana Calit<sup>7</sup>, Daniel Y. Bargieri<sup>7</sup>, Fabio T. M. Costa<sup>2</sup> and Carolina H. Andrade<sup>1,2</sup>\*

#### Edited by:

Adriano D. Andricopulo, University of São Paulo, Brazil

#### Reviewed by:

Sandra Gemma, University of Siena, Italy Marco Tutone, Università degli Studi di Palermo, Italy Gildardo Rivera, Instituto Politécnico Nacional, Mexico

\*Correspondence:

Carolina H. Andrade carolina@ufg.br

†These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 22 November 2017 Accepted: 12 February 2018 Published: 06 March 2018

#### Citation:

Lima MNN, Melo-Filho CC, Cassiano GC, Neves BJ, Alves VM, Braga RC, Cravo PVL, Muratov EN, Calit J, Bargieri DY, Costa FTM and Andrade CH (2018) QSAR-Driven Design and Discovery of Novel Compounds With Antiplasmodial and Transmission Blocking Activities. Front. Pharmacol. 9:146. doi: 10.3389/fphar.2018.00146

<sup>1</sup> LabMol – Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil, <sup>2</sup> Laboratory of Tropical Diseases – Prof. Dr. Luiz Jacintho da Silva, Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, UNICAMP, Campinas, Brazil, <sup>3</sup> Laboratory of Cheminformatics, PPG-SOMA, University Center of Anápolis/UniEVANGELICA, Anápolis, Brazil, <sup>4</sup> Global Health and Tropical Medicine Centre, Unidade de Parasitologia Médica, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal, <sup>5</sup> Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States, <sup>6</sup> Department of Chemical Technology, Odessa National Polytechnic University, Odessa, Ukraine, <sup>7</sup> Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil

Malaria is a life-threatening infectious disease caused by parasites of the genus Plasmodium, affecting more than 200 million people worldwide every year and leading to about half a million deaths. Malaria parasites of humans have evolved resistance to all current antimalarial drugs, creating an urgent need for the discovery of new effective compounds. Given that the inhibition of deoxyuridine triphosphatase of Plasmodium falciparum (PfdUTPase) induces wrong insertions in plasmodial DNA and consequently leads to parasite death, this enzyme is considered an attractive antimalarial drug target. Using a combi-QSAR (quantitative structure-activity relationship) approach followed by virtual screening and in vitro experimental evaluation, we report herein the discovery of novel chemical scaffolds with in vitro potency against asexual blood stages of both P. falciparum multidrug-resistant and sensitive strains and against sporogonic development of P. berghei. We developed 2D- and 3D-QSAR models using a series of nucleosides reported in the literature as PfdUTPase inhibitors. The best models were combined in a consensus approach and used for virtual screening of the ChemBridge database, leading to the identification of five new virtual PfdUTPase inhibitors. Further in vitro testing on P. falciparum multidrug-resistant (W2) and sensitive (3D7) parasites showed that compounds LabMol-144 and LabMol-146 demonstrated fair activity against both strains and presented good selectivity versus mammalian cells. In addition, LabMol-144 showed good in vitro inhibition of P. berghei ookinete formation, demonstrating that hit-to-lead optimization based on this compound may also lead to new antimalarials with transmission blocking activity.

Keywords: malaria, virtual screening, QSAR, Plasmodium falciparum, dUTPase, transmission blocker

### INTRODUCTION

fphar-09-00146 March 5, 2018 Time: 17:0 # 2

Malaria is an infectious disease caused by protozoans of the genus Plasmodium and transmitted through the bite of insect vectors of the genus Anopheles. Plasmodium falciparum is the most prevalent and lethal species infecting humans in the African continent, being responsible for 99% of all malaria-attributed deaths (World Health Organization [WHO], 2016). Although integrated control interventions have achieved significant progress in reducing malaria cases and related mortality in recent years, malaria still causes 429,000 deaths every year, being endemic in 91 countries and territories of sub-Saharan Africa, South-East Asia, Latin America, and the Middle East (World Health Organization [WHO], 2016).

When compared to viruses and bacteria, these eukaryotic protozoans have a larger genome, multiple stages in their life cycle, and a complex biology, all of which hinder the development of vaccines (Hoffman et al., 2015). Consequently, malaria control strategies largely rely on drug-dependent case management. Currently, artemisinin-based combination therapy (ACT) is the officially recommended treatment for malaria. However, resistance to artemisinins has been detected in five countries in the Greater Mekong subregion of South-East Asia, endangering the future of P. falciparum elimination (Vogel, 2014; World Health Organization [WHO], 2016; Thu et al., 2017). Therefore, there is an urgent need for the discovery and development of new antimalarial therapies.

The enzyme 2′-deoxyuridine 5′-triphosphate nucleotide hydrolase (dUTPase) has emerged as a promising biological target in P. falciparum, and it is responsible for the hydrolytic cleavage of dUTP (deoxyuridine triphosphate) into dUMP (deoxyuridine monophosphate) and pyrophosphate (Nyman, 2001). The inhibition of dUTPase may cause dUTP accumulation and erroneous incorporation of uracil into DNA, leading to parasite death. Although another enzyme, DNA glycosylase, could repair the erroneous insertions, the excessive number of repairs would result in fatal DNA strand breaks (Whittingham et al., 2005). Given that DNA replication in Plasmodium takes place in all distinct stages of the parasite life cycle and given the importance of the enzyme dUTPase in this process, this enzyme is expressed in both asexual and sexual stages of the parasite (ring, trophozoite, schizont, gametocyte, and ookinete), as demonstrated in previous studies on P. falciparum 3D7 and P. berghei (López-Barragán et al., 2011; Otto et al., 2014). Thus, dUTPase inhibitors might not only act against blood-stage parasites, but could also block parasite transmission/development in mosquitoes. Experimental findings categorize dUTPase as essential for various organisms, such as Escherichia coli, Saccharomyces cerevisiae, and Mycobacterium smegmatis (El-hajj et al., 1988; Gadsden et al., 1993; Pecsi et al., 2012). The dUTPase of P. falciparum (PfdUTPase) is an attractive target for the development of selective inhibitors since it presents relatively low sequence similarity with its human ortholog HsdUTPase (28.4% identity) (Whittingham et al., 2005).

Due to the importance of dUTPase in the parasite's DNA repair, we decided to use computer-aided drug design (CADD) approaches for discovering new dUTPase inhibitors. In the last several decades, CADD approaches have been widely applied in early stages of drug discovery, making the process faster and more financially viable (Leelananda and Lindert, 2016). Among these approaches, quantitative structure-activity relationships (QSARs) have been extensively used for lead optimization and virtual screening (Verma et al., 2010). Different QSAR approaches have been used by our group for identification of new promising hits for infectious diseases (Melo-Filho et al., 2016; Neves et al., 2016; Gomes et al., 2017).

In this work, we applied a combi-QSAR approach, combining 2D- and 3D-QSAR models, in a virtual screening campaign of the ChemBridge database for selection of new antimalarial virtual hits. Finally, we performed in vitro experimental evaluation of the potential PfdUTPase inhibitors against chloroquine-sensitive and multidrug-resistant strains of P. falciparum, and in gametocyte to ookinete conversion of P. berghei, aiming to identify new potential and selective antimalarial hits.

#### MATERIALS AND METHODS

The steps of the modeling study are briefly presented in **Figure 1**. The workflow encompasses the following steps: (i) data compilation and integration; (ii) data curation; (iii) model generation; (iv) virtual screening and (v) experimental validation. Our workflow was built following the best practices of QSAR modeling and CADD (Tropsha, 2010; Cherkasov et al., 2014).

#### Dataset Preparation

2D and 3D QSAR models were built using a series of PfdUTPase inhibitors reported in the literature (Supplementary Table S1) (Nguyen et al., 2005, 2006; Whittingham et al., 2005; McCarthy et al., 2009; Baragaña et al., 2011; Hampton et al., 2011; Ruda et al., 2011). The data set was prepared and curated according to the protocol described by Fourches et al. (2010, 2015, 2016). Counterions were removed, and specific chemotypes, such as aromatic and nitro groups, were standardized using Standardizer (v. 6.1, ChemAxon, Budapest, Hungary<sup>1</sup>). Duplicates were identified using the ISIDA Duplicates program (Varnek et al., 2008) and HiTQSAR (Kuz'min et al., 2008). If the property values of duplicate compounds were equal, one record was kept in the data set; if they were significantly different, all records were removed. After curation, 127 compounds (Supplementary Table S1) with activity against PfdUTPase were kept for molecular modeling. Activity against both the Plasmodium and human enzymes was available for only 45 compounds, and these were used for calculation of selectivity (S) (Eq. 1). The activity was represented as K<sub>i</sub> (inhibition constant) and converted to the corresponding pK<sub>i</sub> (−log K<sub>i</sub>). In a similar approach, selectivity was converted to the logarithmic scale:

$$S = \log \frac{\text{HsdUTPase}\ K_i}{\text{PfdUTPase}\ K_i} \tag{1}$$

Values of S greater than zero indicate selective compounds while values below zero indicate compounds with poor selectivity.
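As a worked illustration of the two conversions, the sketch below computes pK<sub>i</sub> and the selectivity S of Eq. 1 for an invented compound. The K<sub>i</sub> values and function names are hypothetical; the only requirement is that both constants are expressed in the same unit (molar here, so that pK<sub>i</sub> has its usual meaning).

```python
import math


def pki(ki_molar):
    """Convert an inhibition constant in mol/L to pKi = -log10(Ki)."""
    return -math.log10(ki_molar)


def selectivity(ki_human, ki_plasmodium):
    """Selectivity S of Eq. 1: log10 of the human Ki over the Plasmodium Ki."""
    return math.log10(ki_human / ki_plasmodium)


# Hypothetical compound: Ki = 1 uM against PfdUTPase, 100 uM against HsdUTPase.
ki_pf = 1e-6
ki_hs = 1e-4
print(round(pki(ki_pf), 2))                 # 6.0
print(round(selectivity(ki_hs, ki_pf), 2))  # 2.0, i.e. 100-fold selective for the parasite enzyme
```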

<sup>1</sup>http://www.chemaxon.com

The data sets were divided into training and test sets using the Hierarchical Cluster Analysis (HCA) method available in SYBYL-X v.1.2 (SYBYL-X 1.2, Tripos International, St. Louis, MO, United States). Molecules representing each cluster were manually selected for the test set to maximize the coverage across the entire range of inhibition activity and selectivity. The final proportion between training and test set compounds was 3:1.

### HQSAR

Hologram QSAR (HQSAR), available in SYBYL-X v.1.2 (SYBYL-X 1.2, Tripos International, St. Louis, MO, United States; TRIPOS, 2010a), was used to build 2D QSAR models. Holograms were generated using six distinct fragment sizes (2–5, 3–6, 4–7, 5–8, 6–9, 7–10 atoms) over a series of hologram lengths (53–997). Different combinations of fragment distinction were also considered, such as atoms (A), bonds (B), connectivity (C), hydrogen atoms (H), chirality (Ch), and hydrogen bond donor/acceptor (DA).

## Conformer Generation and Atomic Charges Assignment

The structures were converted into 3D format, and initial conformations were generated using OMEGA v.2.5.1.4 (Hawkins et al., 2010; OMEGA 2.5.1.4: OpenEye Scientific Software, Santa Fe, NM, United States<sup>2</sup>). Two different methods were used for the calculation of the partial atomic charges: the empirical Gasteiger-Hückel method available in SYBYL-X v.1.2 (SYBYL-X 1.2, Tripos International, St. Louis, MO, United States) and the semi-empirical AM1-BCC method (Jakalian et al., 1999, 2002) implemented in QUACPAC v.1.6.3.1 (QUACPAC 1.6.3.1: OpenEye Scientific Software, Santa Fe, NM, United States<sup>2</sup>). The protonation states of the molecules were assigned at pH 7.4 using QUACPAC v.1.6.3.1 (QUACPAC 1.6.3.1: OpenEye Scientific Software, Santa Fe, NM, United States<sup>2</sup>).

<sup>2</sup>http://www.eyesopen.com

### Molecular Alignment


Compounds were submitted to three different molecular alignments: (i) alignment based on the morphological similarity function implemented in Surflex-Sim, accessible in SYBYL-X 1.2 (SYBYL-X 1.2, Tripos International, St. Louis, MO, United States); (ii) shape-based alignment from ROCS 3.2.1.4 software (Hawkins et al., 2007; ROCS 3.2.1.4: OpenEye Scientific Software, Santa Fe, NM, United States<sup>2</sup>); and (iii) alignment by molecular docking of molecules on PfdUTPase, using OEDocking 3.0.1 software (OEDocking 3.2.0.2: OpenEye Scientific Software, Santa Fe, NM, United States<sup>2</sup>). For the last alignment, the X-ray crystal structure of PfdUTPase complexed with the inhibitor 2′,5′-dideoxy-5′-[(diphenylmethyl)amino]uridine (PDB ID: 3T64) (Hampton et al., 2011) was imported to Maestro v. 9.3 (Epik version 3.0, Schrödinger, LLC, New York, NY, United States, 2014.) and prepared using Protein Preparation Wizard, where hydrogen atoms were added according to Epik v. 2.7 (Epik version 3.0, Schrödinger, LLC, New York, NY, United States, 2014.; Shelley et al., 2007) (pH 7.4 ± 0.5), and minimized using the OPLS-2005 force field (Banks et al., 2005). Using the Make Receptor tool, available in OEDocking 3.0.1 (OEDocking 3.2.0.2: OpenEye Scientific Software, Santa Fe, NM, United States<sup>2</sup>), the receptor grid was generated with dimensions 22.34 Å × 19.65 Å × 25.24 Å and volume of 11,078 Å<sup>3</sup>. All compounds of the data set were docked and the best pose for each molecule was selected for alignment.

### 3D-QSAR

Comparative Molecular Field Analysis (CoMFA) and Comparative Molecular Similarity Indices Analysis (CoMSIA), available in SYBYL-X v.1.2 (SYBYL-X 1.2, Tripos International, St. Louis, MO, United States; TRIPOS, 2010b), were used to build 3D QSAR models for Pf dUTPase inhibitors.

#### CoMFA

The aligned training set molecules were placed in a 3D lattice box with a grid spacing of 2 Å. Then, CoMFA steric and electrostatic fields were calculated at each grid point with the Tripos force field using a carbon atom probe with sp<sup>3</sup> hybridization (Csp<sup>3</sup>) and a charge of +1.0. The energy cutoff was set to 30 kcal/mol. The standard deviation coefficient (SDC) method was used for region focusing, with values varying from 0.3 to 1.5.

#### CoMSIA

The models were generated using the same molecular alignments used for CoMFA. The aligned compounds were placed in the 3D lattice box with a grid spacing of 2 Å. The steric, electrostatic, hydrophobic, hydrogen bond donor, and hydrogen bond acceptor descriptors were calculated at each grid point. A probe carbon atom with a radius of 1.0 Å and a charge of +1.0 was used to obtain the similarity indices. A Gaussian function was used to describe the energy terms according to the distance between the probe atom and the atoms of the aligned molecules. The attenuation factor (α) was kept at its default value of 0.3.
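The Gaussian distance dependence can be illustrated with a minimal sketch. It assumes the commonly cited CoMSIA form, in which the similarity index at a grid point q is −Σ<sub>i</sub> w<sub>probe</sub> · w<sub>i</sub> · exp(−α r<sub>iq</sub><sup>2</sup>); the function name, coordinates, and atomic property values below are hypothetical, and this is not the SYBYL-X implementation.

```python
import math

def comsia_index(grid_point, atoms, w_probe=1.0, alpha=0.3):
    """Gaussian-weighted similarity index at one grid point.

    atoms: list of ((x, y, z), w_i) pairs, where w_i is the atom's value
    for the property of interest (illustrative values only).
    """
    total = 0.0
    for coords, w_atom in atoms:
        # Squared distance between the grid point and the atom
        r2 = sum((g - c) ** 2 for g, c in zip(grid_point, coords))
        total += w_probe * w_atom * math.exp(-alpha * r2)
    return -total

# Hypothetical two-atom "molecule"
atoms = [((0.0, 0.0, 0.0), 0.5), ((1.5, 0.0, 0.0), -0.2)]
near = comsia_index((0.0, 0.0, 0.0), atoms)   # grid point on the first atom
far = comsia_index((10.0, 0.0, 0.0), atoms)   # grid point far from both atoms
```

Because the Gaussian decays smoothly and stays finite at zero distance, no energy cutoff is needed near atomic positions, in contrast to the CoMFA fields described above.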

### Generation and Validation of QSAR Models

Partial least squares (PLS) regression was used for development of the statistical models (Lindberg et al., 1983). Internal validation of the QSAR models was performed using the leave-one-out (LOO) cross-validated r<sup>2</sup> (q<sup>2</sup>). The predictive ability of the models was assessed by Q<sup>2</sup><sub>ext</sub> (Tropsha et al., 2003), estimated on external set compounds that were not used for model building or selection. The consensus models were obtained by combining three QSAR models (HQSAR + CoMFA + CoMSIA). The models were built and used separately for predictions. The activity predicted for each compound by the consensus model was the arithmetic mean of the individual model predictions. The external validation of these models was done using the same metrics as for the individual models.
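As an illustration of the consensus and external validation steps, the sketch below averages the predictions of three models and scores them on an external set. The Q<sup>2</sup><sub>ext</sub> expression used here (1 minus the ratio of the prediction error sum of squares to the sum of squared deviations from the training set mean) is one common definition and is an assumption, since the text does not state the formula; all activity values are hypothetical.

```python
import math

def consensus(predictions):
    """Arithmetic mean of per-model predictions for each compound."""
    n_models = len(predictions)
    return [sum(values) / n_models for values in zip(*predictions)]

def q2_ext(y_obs, y_pred, y_train_mean):
    """External Q2: 1 - PRESS / SS relative to the training set mean (assumed form)."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1.0 - press / ss

def rmsep(y_obs, y_pred):
    """Root mean-square error of prediction on the external set."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))

# Hypothetical predicted pKi values for three external compounds
hqsar = [6.0, 5.0, 7.0]
comfa = [6.3, 5.2, 6.8]
comsia = [6.0, 4.8, 7.2]
y_cons = consensus([hqsar, comfa, comsia])

y_obs = [6.0, 5.5, 7.0]     # hypothetical experimental values
q2 = q2_ext(y_obs, y_cons, y_train_mean=6.0)
err = rmsep(y_obs, y_cons)
```

Averaging tends to damp errors that are specific to a single descriptor type, which is the usual motivation for combining 2D and 3D models.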

### Virtual Screening

The virtual screening for new potential Pf dUTPase inhibitors was performed on the Hit2Lead library of the ChemBridge database (ChemBridge Online Chemical Store, 2017). All compounds were prepared using the same protocol and software used in the preparation of the modeling dataset. The alignment and partial charge calculation methods were the same as those used in the best individual CoMFA and CoMSIA models. The activity and selectivity of the compounds were then predicted by the consensus QSAR models. Two criteria were used for selection of virtual hits: (i) compounds should have the highest predicted potency against Pf dUTPase (predicted pKi); (ii) the predicted selectivity (S) should be greater than zero. Furthermore, some ADMET properties were predicted for the best virtual hits, such as physicochemical properties (logP and logS)<sup>3</sup>, acute oral toxicity by GUSAR<sup>4</sup> (Filimonov et al., 2004; Lagunin et al., 2009, 2011), carcinogenicity using admetSAR<sup>5</sup> (Cheng et al., 2012), and hERG K<sup>+</sup> channel blockage using Pred-hERG<sup>6</sup> (Alves et al., 2014; Braga et al., 2014, 2015).

### Molecular Docking

The selected virtual hits were submitted to molecular docking in Glide (Friesner et al., 2004), available in Maestro v. 9.3.5, to predict their binding modes in Pf dUTPase and human dUTPase (HsdUTPase). Ligands were prepared with the LigPrep module of Maestro; protonation states were assigned with Epik v. 2.7 (pH 7.4 ± 2.0), and the structures were energy-minimized using the OPLS-2005 force field. The previously prepared structure of Pf dUTPase, used for the docking-based alignment, was used here. The search space was defined as a 10 × 10 × 10 Å box centered on the geometric center of the co-crystallized ligand (x = −7.7431 Å, y = 27.0662 Å, z = −3.9483 Å). The structure of HsdUTPase (PDB ID: 3ARA, resolution of 1.7 Å) (Miyakoshi et al., 2012) was prepared using the same protocol described for the plasmodial enzyme. The grid was defined with dimensions of 10 × 10 × 10 Å and was centered on the co-crystallized ligand (x = 6.3901 Å, y = 11.1138 Å, z = −17.3607 Å). After docking, the poses of each virtual hit were rescored using the Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) approach, available in Prime v.3.1 (Prime version 3.1, Schrödinger, LLC, New York, NY, United States, 2014), using default settings.

<sup>3</sup>http://www.hit2lead.com/

<sup>4</sup>http://cactus.nci.nih.gov/chemical/apps/cap

<sup>5</sup>https://omictools.com/admetsar-tool

<sup>6</sup>http://labmol.com.br/predherg/

#### Experimental Evaluation

#### Plasmodium Culture


Chloroquine-sensitive (3D7) and multidrug-resistant (W2) strains were cultured in RPMI 1640 medium supplemented with 0.05 mg/mL gentamicin, 38.4 mM HEPES, 0.2% sodium bicarbonate, and 10% O<sup>+</sup> human serum, as previously described in a standardized protocol (Trager and Jensen, 1976). Erythrocytes were then added to the culture to obtain a 5% hematocrit, and the culture was incubated at 37 °C under a 5% CO<sub>2</sub> atmosphere, with daily exchange of medium. Parasitemia was monitored daily in Giemsa-stained smears. Synchronous cultures in the ring stage were obtained by two consecutive treatments at 48 h intervals with a 5% solution of D-sorbitol (Lambros and Vanderberg, 1979).

#### Determination of Growth Inhibition by SYBR Green I

Parasites synchronized at the ring stage, with 0.5% parasitemia and 2% hematocrit, were distributed into separate wells. The compounds were tested in triplicate at 12 concentrations prepared by two-fold serial dilution (40 µM to ∼0.019 µM) over 72 h. Chloroquine and pyrimethamine were used as controls. Subsequently, the in vitro susceptibility of the parasites to the tested drugs was measured by SYBR Green according to Hartwig et al. (2013). Briefly, 100 µL of lysis buffer (20 mM Tris, 5 mM EDTA, 0.008% wt/vol saponin, 0.08% vol/vol Triton X-100, and 0.4 µL/mL of SYBR Green) was added to each well of a new black 96-well plate, followed by 100 µL of parasite culture incubated with the drugs. After homogenization, the plates were incubated for 1 h in the dark. Fluorescence was measured at 490 nm excitation and 540 nm emission (CLARIOstar, BMG Labtech). The IC<sub>50</sub> was calculated by plotting log dose vs. inhibition (expressed as a percentage relative to the control) in Prism 6 (GraphPad Software Inc.).
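The stated concentration range follows directly from the dilution scheme. As a quick check, a short sketch (the function name is hypothetical) that generates a 12-point two-fold series starting at 40 µM:

```python
def twofold_series(top_conc, n_points):
    """Concentrations of a two-fold serial dilution, highest first."""
    return [top_conc / 2 ** i for i in range(n_points)]

series = twofold_series(40.0, 12)   # concentrations in µM
lowest = series[-1]                 # 40 / 2**11 µM
```

The lowest point is 40/2<sup>11</sup> µM, which is approximately 0.0195 µM and agrees with the reported lower bound of about 0.019 µM.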

#### Cytotoxicity Assay

Cytotoxicity assays used COS7 cells (a fibroblast-like cell line derived from monkey kidney tissue), grown in DMEM medium supplemented with 10% fetal bovine serum and 0.05 mg/mL gentamicin in an atmosphere containing 5% CO<sub>2</sub> at 37 °C. Drug cytotoxicity in COS7 cells was determined in duplicate at 12 concentrations prepared by two-fold serial dilution (200 µM to ∼0.097 µM). After the incubation period (72 h), cell viability was assessed by the MTT (3-[4,5-dimethyl-thiazol-2-yl]-2,5-diphenyltetrazolium bromide) reduction method (Mosmann, 1983). The optical density was determined at 570 nm (CLARIOstar, BMG Labtech), and the 50% cytotoxic concentration (CC<sub>50</sub>) was determined from the percent viability relative to the control. The selectivity index (SI) of the compounds was determined by the following expression:

$$\mathrm{SI} = \frac{\mathrm{CC}_{50}\,(\mathrm{COS7})}{\mathrm{IC}_{50}\,(Pf)} \tag{2}$$

where COS7 CC<sub>50</sub> corresponds to the 50% cytotoxic concentration on COS7 cells and Pf IC<sub>50</sub> is the 50% inhibitory concentration on P. falciparum (3D7).
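A worked instance of the selectivity index, as a minimal sketch with hypothetical concentrations (both values must be in the same unit):

```python
def selectivity_index(cc50_cos7, ic50_pf):
    """Ratio of host-cell cytotoxicity to antiparasitic potency."""
    if ic50_pf <= 0:
        raise ValueError("IC50 must be positive")
    return cc50_cos7 / ic50_pf

# Hypothetical compound: CC50 = 50 µM on COS7, IC50 = 5 µM on 3D7
si = selectivity_index(50.0, 5.0)
```

A larger ratio means the compound affects the parasite at concentrations well below those that are toxic to the mammalian cells.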

#### Ookinete Assay

All animal procedures were carried out in accordance with the Brazilian College of Animal Experimentation (COBEA). This research protocol was approved by the Ethics Committee of the Institute of Biomedical Sciences, University of Sao Paulo, protocol number 132/2014-CEUA. C57BL/6 mice received an intraperitoneal injection of P. berghei ANKA infected erythrocytes, and four days after infection, a mouse with parasitemia between 4 and 6% and gametocytemia > 0.4% was selected as blood donor for cardiac puncture. Four microliters of the infected blood was dispensed in 80 µL of ookinete medium (Blagborough et al., 2012) at 21 °C with DMSO control or with 10 µM of the tested compounds. The assay was incubated at 21 °C for 24 h, and 2 µL of the blood at the bottom of the tubes was spread onto a glass slide, stained with Giemsa, and analyzed under a direct light microscope. The total number of ookinetes formed was counted on each slide (triplicate for each condition), and inhibition was calculated in relation to the total number of ookinetes formed in the control condition.

#### RESULTS AND DISCUSSION

#### QSAR Modeling

Various combinations of hologram length, fragment size, and fragment distinction were tested with the aim of building robust and predictive HQSAR models. The original data set was divided into training and test sets in a ratio of approximately 3:1 using the HCA method. The three best HQSAR models for Pf dUTPase inhibition are shown in Supplementary Table S2. The models displayed very similar statistical features, but the model with fragment distinction A/C (Supplementary Table S2) performed slightly better than the others in terms of robustness (q<sup>2</sup><sub>LOO</sub> = 0.70) and external predictivity (Q<sup>2</sup><sub>ext</sub> = 0.71). In addition, the best model presented the Durbin-Watson statistic (d) (Savin and White, 1977) closest to the ideal value (d = 1.99), indicating that this model is less biased. The Durbin-Watson test is useful to evaluate the presence or absence of autocorrelation of residuals from regression analysis. The values range from 0 to 4. Values of d near or equal to 2 indicate no autocorrelation of residuals. Values of d < 2 or d > 2 indicate that residuals are positively or negatively autocorrelated, respectively, and that predictions are more biased (Savin and White, 1977). The best HQSAR models for selectivity (using human dUTPase data) are also presented in Supplementary Table S2. The best model, with fragment distinction B/C (Supplementary Table S2), showed good external predictivity (Q<sup>2</sup><sub>ext</sub> = 0.83), with a d-value close to the reference value (d = 2.02). The plots comparing the experimental and predicted biological activity for the best HQSAR models are shown in Supplementary Figures S1A,D. These plots demonstrate good agreement between the experimental data and the predictions from the models.
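For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below uses hypothetical residual sequences to show the two extremes; how the residuals were ordered in this study is not stated, so the ordering here is only illustrative.

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); ranges from 0 to 4."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that flip sign at every step (negative autocorrelation)
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])

# Hypothetical residuals that drift steadily (positive autocorrelation)
d_trending = durbin_watson([1.0, 2.0, 3.0, 4.0])
```

The alternating sequence gives d well above 2 and the drifting sequence gives d close to 0, matching the interpretation given in the text.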

The HQSAR contribution maps are useful to highlight the relationships between specific structural fragments and the biological property/activity. Colors close to the red end (red, red orange, and orange) indicate fragments with negative contribution, while colors in the green region (yellow, green blue, and green) indicate fragments with positive contribution to biological activity. The common substructure is represented in cyan (**Figure 2**).

The contribution maps of the most potent (4) and least potent (127) inhibitors and of the most selective (20) and least selective (87) inhibitors are presented in **Figure 2**. As one can see, the trityl ring has a positive contribution to both inhibition and selectivity (compounds 4 and 20, **Figure 2**). Additionally, the absence of the trityl group results in a drastic decrease in activity against Pf dUTPase, as observed in compounds 4 and 127 (**Figures 2A,B**, respectively), and a clear decrease in selectivity, when we compare compounds 20 and 87 (**Figures 2C,D**, respectively). These observations corroborate previous studies (Whittingham et al., 2005; McCarthy et al., 2009; Baragaña et al., 2011; Hampton et al., 2011; Recio et al., 2011; Ruda et al., 2011; Ojha and Roy, 2013), indicating that two of the three phenyl rings from the trityl group have significant interactions with the hydrophobic pocket formed by residues Phe46 and Ile117 from Pf dUTPase (Hampton et al., 2011). In contrast, in the human enzyme, such residues are replaced by hydrophilic residues Val42 and Gly87. Therefore, there is no corresponding hydrophobic pocket in HsdUTPase (Whittingham et al., 2005; Hampton et al., 2011). In a previous study by Ojha and Roy (2013), some nucleoside inhibitors were used for QSAR studies and pharmacophore mapping of Pf dUTPase inhibitors. The results revealed that two phenyl rings from the trityl group are responsible for establishing important hydrophobic interactions and one phenyl ring may form a π–π stacking interaction with the amino acid residue Phe46 from Pf dUTPase (Ojha and Roy, 2013).

Two steps are critical for the development of CoMFA and CoMSIA models: the partial atomic charge assignment and the structural alignment (Doweyko, 2004; Melo-Filho et al., 2014). In this study, two different charge schemes (Gasteiger-Hückel and AM1-BCC) and three different molecular alignment approaches (morphological similarity function in Surflex-Sim, shape-based superposition in ROCS, and alignment obtained by molecular docking) were evaluated. The Surflex-Sim alignment was performed using the most potent inhibitors of the data set (compounds 1 and 2) as templates for the flexible alignment of the remaining compounds of the data set. The shape-based alignment was executed with previously generated conformers. These conformers were superimposed on compound 3, which is the co-crystallized inhibitor of Pf dUTPase, available in the Protein Data Bank (PDB code: 3T64) (Hampton et al., 2011). The superposition was evaluated by the TanimotoCombo score (Hawkins et al., 2010). Based on this score, the best conformation of each compound was selected. In the docking-based alignment, the previously generated conformers were docked and ranked using the Chemgauss4 scoring function (McGann, 2011). The best conformer for each compound was selected based on the Chemgauss4 score. Additionally, conformers were visually inspected to select those with better superposition on the co-crystallized inhibitor.

The results of the best CoMFA and CoMSIA models are available in Supplementary Tables S3 and S4, respectively. The plots comparing the experimental and predicted biological activity for the best CoMFA and CoMSIA models are shown in Supplementary Figures S1B,C,E,F. The best CoMFA models for inhibition and selectivity presented good robustness (q<sup>2</sup><sub>LOO</sub> = 0.63 and 0.86, respectively) and good external predictivity (Q<sup>2</sup><sub>ext</sub> = 0.75 and 0.61). Furthermore, they presented good d values, indicating a low probability of biased predictions (d = 1.86 and 1.99, respectively). In general, for CoMFA models, the shape-based and Surflex-Sim alignments performed better than the docking-based alignment (Supplementary Table S3). The best CoMSIA models were obtained using shape-based alignment and AM1-BCC charges (Supplementary Table S4). The best CoMSIA model for Pf dUTPase inhibition presented good robustness and external predictivity (q<sup>2</sup><sub>LOO</sub> = 0.68; Q<sup>2</sup><sub>ext</sub> = 0.78, Supplementary Table S4). The best CoMSIA model for selectivity, despite its lower internal consistency (q<sup>2</sup><sub>LOO</sub> = 0.59), presented acceptable external predictivity (Q<sup>2</sup><sub>ext</sub> = 0.63), as shown in Supplementary Table S4.

The best CoMFA and CoMSIA models were used to generate contour maps by using the STDEV*COEFF field type and the function "contour by actual." These maps could be useful for designing new potent and selective Pf dUTPase inhibitors, as they indicate regions in the molecules where certain types of interactions are favorable or unfavorable for biological activity. The contour maps from the best CoMFA and CoMSIA models, for both inhibition and selectivity, are presented in **Figures 3**, **4**, respectively.

The obtained contour maps show that bulky and hydrophobic groups in the trityl group region are favorable for both Pf dUTPase inhibition and selectivity (**Figures 3A,C**, **4A,C,D**). These results corroborate the HQSAR contribution maps and other studies highlighting the importance of the hydrophobic trityl group for inhibition and selectivity. The trityl group interacts with the hydrophobic pocket formed by residues Phe46 and Ile117, which are missing in the human dUTPase (Hampton et al., 2011). Thus, structural modifications in the trityl group should be further explored in order to improve the interactions with the hydrophobic pocket and, consequently, to help the design of novel potent and selective Pf dUTPase inhibitors. The CoMFA and CoMSIA electrostatic contour maps also show that electropositive groups in the sugar moiety and uracil group are favorable for inhibition and selectivity (**Figures 3D**, **4B,E**). Additionally, these maps show that electronegative groups near the region of the oxygen atom of the pentose sugar are favorable for Pf dUTPase selectivity (**Figure 4B**), while electronegative groups near the linker between the trityl group and the sugar moiety (**Figures 3B,D**, **4E**) are unfavorable for both inhibition and selectivity.

The best individual HQSAR, CoMFA, and CoMSIA models were combined in a consensus approach (Supplementary Table S5). Thus, one consensus model for inhibition of Pf dUTPase and another for selectivity were built. The external validation of the consensus models was performed using the same external evaluation set and metrics used for the individual QSAR models. The statistical characteristics of the consensus models are presented in **Table 1**. Both models showed good external predictivity (Q<sup>2</sup><sub>ext</sub> = 0.85 and 0.75; RMSEP = 0.40).

(Figure caption, partially recovered) Electrostatic fields: red contours indicate regions where electronegative groups are favorable for biological activity, while blue contours indicate regions where electronegative groups are unfavorable; hydrophobic fields: cyan contours indicate regions where hydrophobic groups are favorable to biological activity.

TABLE 1 | Statistical characteristics of consensus QSAR models for PfdUTPase inhibition and selectivity.


<sup>∗</sup>Consensus of the best individual HQSAR, CoMFA and CoMSIA models; Q<sup>2</sup> ext: determination coefficient for external set; RMSEP, root mean-square error of prediction.

### Virtual Screening

The virtual screening of new potential Pf dUTPase inhibitors was performed on Hit2Lead library of ChemBridge database by prediction of activity and selectivity of the compounds through the developed and validated consensus QSAR models. Each consensus prediction was obtained by the arithmetic mean of the predictions from the best individual HQSAR, CoMFA, and CoMSIA models (Supplementary Table S6). All duplicates or compounds used to generate the models were excluded. Finally, the following criteria were used for selection of the virtual hits: (i) compounds should have the highest predicted potency against Pf dUTPase (predicted pKi) and (ii) the predicted selectivity (S) should be greater than zero. At the end of this process, five virtual hits were chosen for experimental evaluation.
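The two selection criteria can be read as a filter followed by a ranking. The sketch below is one possible interpretation, assuming that compounds with predicted S ≤ 0 are discarded first and the remainder are ranked by predicted pKi; the compound identifiers and values are hypothetical.

```python
def select_hits(compounds, n_hits):
    """Keep compounds with predicted selectivity S > 0, then rank by predicted pKi."""
    selective = [c for c in compounds if c["S"] > 0]
    ranked = sorted(selective, key=lambda c: c["pKi"], reverse=True)
    return ranked[:n_hits]

# Hypothetical consensus predictions for four library compounds
library = [
    {"id": "A", "pKi": 6.2, "S": 0.5},
    {"id": "B", "pKi": 7.0, "S": -0.1},   # most potent, but predicted non-selective
    {"id": "C", "pKi": 5.8, "S": 0.2},
    {"id": "D", "pKi": 6.5, "S": 0.3},
]
hit_ids = [c["id"] for c in select_hits(library, n_hits=2)]
```

In this toy example, compound B is rejected despite having the highest predicted potency, because its predicted selectivity is not greater than zero.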

Inadequate ADMET properties contribute to high failure rates in late stages of drug development. The early prediction and optimization of such properties can help reduce late-stage failures and expenses (van de Waterbeemd and Gifford, 2003; Sanders et al., 2017). In this study, the five virtual hits were evaluated by predicting/analyzing a panel of properties including logP and logS, acute oral toxicity in rodents (Filimonov et al., 2004; Lagunin et al., 2009, 2011), carcinogenicity (Cheng et al., 2012), and binding affinity to hERG (Braga et al., 2015) (**Table 2**). All molecules were predicted as non-carcinogenic and as non-blockers of the hERG channel. Only LabMol-143 and LabMol-146 were predicted as positive for acute oral toxicity. LabMol-142 presented a high calculated logP (7.3), while the remaining hits presented logP below or slightly above 5.

### Experimental Evaluation of Selected Compounds on P. falciparum Multi-Drug-Resistant and Sensitive Strains, and on P. berghei Sexual Stages

The five virtual hits selected were evaluated in vitro against asexual blood-stages of P. falciparum multi-drug-resistant (W2) and sensitive (3D7) strains. The half maximal inhibitory concentrations (IC50) for each compound (**Table 3**) indicate that three compounds (LabMol-144, LabMol-145, and LabMol-146) were more potent at inhibiting parasite growth, showing activity in submicromolar range against both 3D7 and W2 strains. Furthermore, the cytotoxicity was measured in mammalian COS7 cells. LabMol-144 and LabMol-146 showed promising results in terms of selectivity (SI = 11.7 and 6.7, respectively; **Table 3**).

The five compounds were also tested against P. berghei sexual stages using in vitro gametocyte to ookinete conversion assays (**Table 3**). LabMol-144, a promising selected compound in terms of IC<sub>50</sub> and SI against asexual stages and mammalian cells, showed inhibition of 44.6% of ookinete formation relative to control. Although the IC<sub>50</sub> range of LabMol-144 and LabMol-146 is still far from that of chloroquine and pyrimethamine (**Table 3**), these compounds represent good starting points for further optimization studies and development of new antimalarial drugs. In addition, drug development based on LabMol-144 may also lead to new antimalarials with transmission blocking activity and a new mechanism of action.

TABLE 2 | Chemical structures, predicted potency against PfdUTPase, predicted selectivity, and some calculated ADMET properties of the virtual hits.

<sup>a</sup>Prediction based on consensus QSAR model for dUTPase inhibition; <sup>b</sup>Prediction based on consensus QSAR model for selectivity; logP and logS were extracted from Hit2Lead library; <sup>c</sup>Acute oral toxicity predicted using GUSAR; <sup>d</sup>Carcinogenicity predicted in admetSAR software (Cheng et al., 2012); <sup>e</sup>Prediction of hERG channel blockage in Pred-hERG web app (Alves et al., 2014; Braga et al., 2014, 2015).

The two most promising compounds, LabMol-144 and LabMol-146, are similar to the most potent compound from the training set (cpd. 1) used for developing the QSAR models (T<sub>c</sub> of 0.72 and 0.84, respectively, Supplementary Table S6). However, LabMol-144 presents some differences in relation to compound 1. As shown in **Figures 2–4**, and based on previous reports in the literature, the presence of hydrophobic groups in the trityl region is favorable for both activity and selectivity against Pf dUTPase (Whittingham et al., 2005; Hampton et al., 2011; Ojha and Roy, 2013). Thus, LabMol-144 can be a potent and selective inhibitor of Pf dUTPase due to the addition of two methoxy substituents on the trityl group, which can contribute to improved affinity for the hydrophobic binding pocket of the enzyme. Other modifications in LabMol-144 in comparison to compound 1 are the presence of the oxazolidine ring between the sugar moiety and the uracil ring, and the substitution of nitrogen by oxygen in the linker between the sugar moiety and the trityl group.

LabMol-144 has higher similarity to the most potent inhibitors of Pf dUTPase from the training set (compounds 1 to 6, T<sub>c</sub> = 0.58–0.72, **Figure 5**) and very low similarity to the currently used antimalarial drugs (T<sub>c</sub> = 0.23–0.54, **Figure 5**). Together with the fact that LabMol-144 showed similar activity against sensitive and multidrug-resistant strains of P. falciparum, this further suggests that the mode of action of nucleosides and their derivatives is different from that of current antimalarials. This is particularly important considering parasite resistance in natural settings. Therefore, inhibitors of Pf dUTPase, a target different from those of the other tested antimalarials, could overcome cross-resistance phenomena and are very promising scaffolds to be explored as new antimalarial drugs. Certainly, the activity of the compounds could be caused not only by Pf dUTPase inhibition but also by different mechanisms of action. However, to explore this, further in vitro enzymatic studies should be performed. Exploring other mechanisms of action is out of the scope of this paper and should be considered in the next steps of the project.
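Assuming T<sub>c</sub> denotes the Tanimoto coefficient, the similarity values above are the ratio of shared fingerprint features to the total number of distinct features in the two molecules. A minimal sketch follows; the fingerprint type used in the study is not stated here, and the bit indices below are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0          # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

# Hypothetical fingerprints: two shared bits out of five distinct bits
fp_query = {1, 2, 3, 4}
fp_reference = {3, 4, 5}
tc = tanimoto(fp_query, fp_reference)
```

Values range from 0 (no shared features) to 1 (identical fingerprints), so the range reported against current antimalarials corresponds to limited feature overlap.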

#### Molecular Docking

The most promising compound (LabMol-144, IC<sub>50</sub> = 4.23 µM against W2 strain, and highest predicted pIC<sub>50</sub> = 5.81 against the parasite enzyme) was docked in Pf dUTPase and HsdUTPase in order to compare the binding modes and to analyze how differences between the human and parasite enzymes can be exploited for the design of selective inhibitors. The docking studies suggested a higher affinity of LabMol-144 for Pf dUTPase. The Glide Score was −7.38 kcal/mol on Pf dUTPase (**Figure 6A**) and −6.26 kcal/mol on HsdUTPase (**Figure 6C**). After docking, we performed MM-GBSA calculations to estimate the free energy of binding, in order to compare the affinities of the compounds. The results are available in Supplementary Table S7. These results also suggested that LabMol-144 has a higher affinity for Pf dUTPase, with an estimated binding free energy for the parasitic enzyme about twice as favorable as that for the human ortholog (estimated ΔG of binding of −107.8 and −52.8, respectively).

As shown in **Figures 6A,B**, the parasitic enzyme has the amino acid residues Phe46 and Ile117 in the hydrophobic region of the active site, while the human counterpart has Val65 and Gly110, respectively (**Figures 6C,D**). The presence of Phe46 in Pf dUTPase is responsible for an additional π–π stacking interaction with one ring from the trityl group, while Ile117 can form two hydrogen bonds with the uracil and oxazolidine rings. These two hydrogen bonds contribute to the exposure of a hydroxyl group to Tyr112, allowing the molecule to establish an additional hydrogen bond with this residue (**Figure 6A**). The absence of Phe46 and Ile117 in the human enzyme (**Figure 6C**) results in a weaker affinity for LabMol-144. In HsdUTPase, there are no interactions with Val65 and Gly110, and consequently, no hydrogen bond with Tyr105. The main interactions with HsdUTPase are the hydrogen bonds with Gly99 and two structural water molecules (**Figures 6C,D**).

IC<sub>50</sub> 3D7: half maximal inhibitory concentration on 3D7 strain; IC<sub>50</sub> W2: half maximal inhibitory concentration on W2 strain; CC<sub>50</sub> COS7: half maximal cytotoxic concentration on COS7 cells; SI, selectivity index calculated between CC<sub>50</sub> on COS7 and IC<sub>50</sub> in 3D7 strain. The data are expressed as mean ± SD of three independent assays.

These results corroborate our QSAR contribution and contour maps as well as previous studies (Whittingham et al., 2005; Hampton et al., 2011; Ojha and Roy, 2013), highlighting the differences between the human and parasite enzymes and the importance of hydrophobic interactions with the trityl group for increased potency and selectivity. In future studies, we aim to perform enzymatic assays against the human and plasmodial enzymes to confirm the findings observed here. Furthermore, the in vitro results against multidrug-resistant and sensitive P. falciparum strains and the inhibition of P. berghei ookinete formation indicate that LabMol-144 is an attractive scaffold for further hit-to-lead optimization studies for the development of new antimalarials with transmission blocking activity.

#### CONCLUSION

In this work, we developed robust and externally predictive consensus QSAR models, merging 2D- (HQSAR) and 3D-QSAR (CoMFA and CoMSIA) models for prediction of inhibition and selectivity against Pf dUTPase. The QSAR models were applied for virtual screening of the ChemBridge database and allowed the selection of five new potential selective inhibitors of Pf dUTPase. The virtual hits were tested in vitro against sensitive (3D7) and multidrug-resistant (W2) strains of P. falciparum. Two compounds, LabMol-144 and LabMol-146, showed promising activity against both strains of P. falciparum and present chemical scaffolds very dissimilar from current antimalarial drugs. Thus, inhibitors of Pf dUTPase could be a good alternative for antimalarial drug combination. In addition, compound LabMol-144 showed potent in vitro inhibition of P. berghei ookinete formation, demonstrating that this compound is active against multiple parasite stages and, therefore, optimization based on this compound may also lead to new antimalarials with transmission blocking activity. In future studies, we aim to perform enzymatic assays against parasite and human enzymes. Furthermore, we aim to perform hit-to-lead optimization through structural modifications on the discovered scaffolds, based on the information gathered from the QSAR contribution and contour maps, aiming at designing new antimalarial drugs with transmission-blocking activity.

#### AUTHOR CONTRIBUTIONS

Each author has contributed significantly to this work. ML and CM-F contributed equally in the design, performing the computational experiments, and writing the paper. ML, CM-F, BN, VA, RB, PC, and CA conceived and designed the experiments. ML, CM-F, BN, VA, and RB performed the computational experiments. GC, FC, PC, JC, and DB performed the experimental assays. ML, CM-F, GC, FC, PC, EM, JC, and DB analyzed the data. ML, CM-F, GC, BN, EM, DB, and CA wrote the paper. All authors read, edited, and approved the final manuscript.

#### FUNDING

The authors would like to thank Brazilian funding agencies, CNPq, FAPEG, FAPESP, and CAPES for financial support and fellowships. FC, GC, and CA were supported by FAPESP (Grants #2012/16525-2, #2015/20774-6, and #2017/02353-9, respectively). DB was supported by FAPESP (Grant #2013/13119-6) and CNPq (Grant #405996/2016-0). JC was supported by FAPESP fellowship (#2016/16649-4). EM appreciates support from NIH (Grant 1U01CA207160 and GM5105946) and CNPq (Grant #400760/2014-2). CA, PC, and FC are CNPq research fellows. PC was partially supported by the Fundação Nacional de Desenvolvimento do Ensino Superior Particular – Funadesp, via UniEvangélica – Centro Universitário de Anápolis.

#### ACKNOWLEDGMENTS

We are thankful to Dr. Stephen Capuzzi for his kind help with editing the manuscript. We are also grateful to OpenEye Scientific Software Inc. and ChemAxon for providing us with academic licenses for their software.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.00146/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a shared affiliation, though no other collaboration, with one of the authors DB.

Copyright © 2018 Lima, Melo-Filho, Cassiano, Neves, Alves, Braga, Cravo, Muratov, Calit, Bargieri, Costa and Andrade. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Molecular Simulations of Carbohydrates with a Fucose-Binding Burkholderia ambifaria Lectin Suggest Modulation by Surface Residues Outside the Fucose-Binding Pocket

#### Edited by:

Leonardo G. Ferreira, University of São Paulo, Brazil

#### Reviewed by:

Hugo Verli, Universidade Federal do Rio Grande do Sul, Brazil David Antony Morton-Blake, Trinity College, Dublin, Ireland

#### \*Correspondence:

Elizabeth Yuriev elizabeth.yuriev@monash.edu Paul A Ramsland paul.ramsland@rmit.edu.au

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 12 March 2017 Accepted: 06 June 2017 Published: 21 June 2017

#### Citation:

Dingjan T, Imberty A, Pérez S, Yuriev E and Ramsland PA (2017) Molecular Simulations of Carbohydrates with a Fucose-Binding Burkholderia ambifaria Lectin Suggest Modulation by Surface Residues Outside the Fucose-Binding Pocket. Front. Pharmacol. 8:393. doi: 10.3389/fphar.2017.00393

Tamir Dingjan<sup>1</sup>, Anne Imberty<sup>2</sup>, Serge Pérez<sup>3</sup>, Elizabeth Yuriev<sup>1</sup>\* and Paul A. Ramsland<sup>4,5,6,7</sup>\*

<sup>1</sup> Medicinal Chemistry, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, VIC, Australia, <sup>2</sup> Centre de Recherches sur les Macromolécules Végétales, Centre National de la Recherche Scientifique UPR5301, Université Grenoble Alpes, Grenoble, France, <sup>3</sup> Département de Pharmacochimie Moléculaire, Centre National de la Recherche Scientifique, UMR5063, Université Grenoble Alpes, Grenoble, France, <sup>4</sup> School of Science, RMIT University, Melbourne, VIC, Australia, <sup>5</sup> Department of Surgery Austin Health, University of Melbourne, Melbourne, VIC, Australia, <sup>6</sup> Department of Immunology, Central Clinical School, Monash University, Melbourne, VIC, Australia, <sup>7</sup> Burnet Institute, Melbourne, VIC, Australia

Burkholderia ambifaria is an opportunistic respiratory pathogen belonging to the Burkholderia cepacia complex, a collection of species responsible for the rapidly fatal cepacia syndrome in cystic fibrosis patients. A fucose-binding lectin identified in the B. ambifaria genome, BambL, is able to adhere to lung tissue, and may play a role in respiratory infection. X-ray crystallography has revealed the bound complex structures for four fucosylated human blood group epitopes (blood group B, H type 1, H type 2, and Le<sup>x</sup> determinants). The present study employed computational approaches, including docking and molecular dynamics (MD), to extend the structural analysis of BambL-oligosaccharide complexes to include four additional blood group saccharides (A, Le<sup>a</sup> , Le<sup>b</sup> , and Le<sup>y</sup> ) and a library of blood-group-related carbohydrates. Carbohydrate recognition is dominated by interactions with fucose via a hydrogen-bonding network involving Arg15, Glu26, Ala38, and Trp79 and a stacking interaction with Trp74. Additional hydrogen bonds to non-fucose residues are formed with Asp30, Tyr35, Thr36, and Trp74. BambL recognition is dominated by interactions with fucose, but also features interactions with other parts of the ligands that may modulate specificity or affinity. The detailed computational characterization of the BambL carbohydrate-binding site provides guidelines for the future design of lectin inhibitors.

Keywords: blood group determinants, Burkholderia ambifaria, docking, fucose, molecular dynamics

## INTRODUCTION

Cystic fibrosis morbidity is mostly due to respiratory infection by opportunistic pathogens (Lyczak et al., 2002; O'Sullivan and Freedman, 2009; Ciofu et al., 2013; Caverly et al., 2015). Burkholderia cepacia is one of the most dangerous pathogens isolated from cystic fibrosis patients; 20% of infected individuals succumb to a rapidly fatal pneumonia termed "cepacia syndrome" (Zahariadis et al., 2003; Blackburn et al., 2004; Lynch, 2009). Isolated B. cepacia strains have been classified into a steadily increasing number of species, referred to collectively as the B. cepacia complex (currently consisting of 20 species; Vandamme et al., 1997; De Smet et al., 2015; Martinucci et al., 2016). Most members of the complex are resistant to multiple clinically used antibiotics, making the search for new therapeutics more urgent (Zhou et al., 2007; Loutet and Valvano, 2011; Podnecky et al., 2015). Burkholderia ambifaria, a member of the B. cepacia complex, has been isolated from both clinical and environmental samples (Coenye et al., 2001). In addition to infecting human respiratory tissue, B. ambifaria can colonize plant rhizospheres, where it promotes growth and protects against invading fungi (Li et al., 2002; Lee et al., 2006; Parra-Cota et al., 2014).

Previously, a carbohydrate-binding protein (named "BambL") was identified in the B. ambifaria genome; binding studies using human tissues suggest it may play a role in infection (Audfray et al., 2012). Opportunistic bacteria often adhere to tissues by binding to host carbohydrates using carbohydrate-recognizing proteins (lectins) displayed at the bacterial surface (Bavington and Page, 2005; Imberty and Varrot, 2008; Pieters, 2011; Audfray et al., 2013). Among the many carbohydrates present on human cells, fucose-bearing blood group determinants are often recognized by bacterial lectins (Lindén et al., 2008; Anstee, 2010; Holmner et al., 2010). In the cystic fibrosis respiratory epithelium, cell-surface carbohydrates, present on glycolipids, N-glycoproteins, and mucins, are more fucosylated than in healthy tissue (Rhim et al., 2001; Venkatakrishnan et al., 2015). This increased fucosylation may promote adhesion by fucose-recognizing pathogens (Stoykova and Scanlin, 2008; Audfray et al., 2013). The known cystic fibrosis pathogens Pseudomonas aeruginosa, Burkholderia cenocepacia, and Aspergillus fumigatus all have lectins that bind to fucosylated human blood group carbohydrates (Mitchell et al., 2002; Imberty et al., 2004; Sulak et al., 2010, 2011; Houser et al., 2013, 2015). Significantly, the P. aeruginosa lectins are strongly associated with respiratory tissue damage and bacterial load in a mouse model of lung injury, and treatment with monosaccharides able to specifically inhibit lectin binding reduces infection (Chemani et al., 2009). Similar effects have been reported in a human P. aeruginosa infection case study (von Bismarck et al., 2001), suggesting that interfering with lectin-carbohydrate interactions may offer a new frontier in anti-infective treatment (Sharon, 2006; Pera and Peters, 2014). Lectin inhibitor design begins with a thorough understanding of the role of each functional group in the natively recognized carbohydrate (Ernst and Magnani, 2009).

The crystallographic structure of BambL has been solved, revealing a six-bladed β-propeller fold formed by three separate protomers (Audfray et al., 2012). Each subunit contains a single carbohydrate-binding site; upon oligomerization, three additional binding sites are formed at the interfaces between protomers, for a total of six binding sites in the β-propeller fold. The intra- and inter-protomeric sites have similar architectures and (for most blood group carbohydrates) similar binding properties. For this reason, the present work addresses interactions within the intra-protomeric site only. Crystal structures of BambL have also been obtained bound to multiple fucosylated human blood group tetrasaccharides: H type 1, H type 2, B type 2, and Le<sup>x</sup> (PDB IDs: 3ZW2, 3ZZV, 3ZWE, and 3ZW1; Audfray et al., 2012; Topin et al., 2013; **Figure 1**). In each case, the carbohydrate is bound via a buried fucose residue, which participates in a network of hydrogen bonds within a tight fucose-binding pocket. Blood group carbohydrate binding specificity has also been determined by glycan array and affinity quantified by titration microcalorimetry: strongest affinity is for H type 2 tetrasaccharide (K<sub>D</sub> 7.5 µM) and Le<sup>y</sup> pentasaccharide (K<sub>D</sub> 11.1 µM; Audfray et al., 2012). This binding preference indicates that BambL is more selective for blood and tissue carbohydrate determinants containing the type 2 epitope Fucα1-2Galβ1-4GlcNAc. Several of the blood group and tissue antigens recognized by BambL have not been structurally characterized in complex with the lectin (e.g., Le<sup>y</sup>, Le<sup>b</sup>, and A). Additionally, while existing crystal structures describe static recognition, the dynamic behavior of BambL complexes has not been described. The relative contributions of individual binding interactions to saccharide recognition are also unknown. Extending the structural analysis of BambL-blood group complexes to probe these aspects of recognition will enhance understanding of carbohydrate recognition and facilitate inhibitor design.

The goal of this computational study was to characterize BambL-saccharide binding modes and to inform future in silico or structure-based design of inhibitors for this bacterial lectin. We were interested in identifying lectin residues that are critical for ligand recognition and thus could be used as constraints in prospective virtual screening. In particular, we investigated whether the BambL binding site is restricted to recognizing fucose or is capable of engaging non-fucose saccharides using additional interactions. We first used docking and site mapping to study binding modes in complexes featuring A, B, O (H), and Lewis fucosylated carbohydrates and a library of blood-group-related saccharides. The dynamic behavior of these systems was then explored by molecular dynamics (MD) simulations. The recognition of fucose-containing saccharides by BambL is accomplished by a hydrogen-bonding network between fucose and Arg15, Glu26, Trp79, and to a lesser extent Ala38. A hydrophobic contact is made between the fucose non-polar face and the Trp74 indole. Additional hydrogen bonds outside the fucose-binding pocket to Asp30, Thr36, Trp74, and Tyr35 are formed in complex with multiple blood group and blood-group-related saccharides. Residues involved in these interactions are consistently engaged by blood-group-related saccharides, suggesting they may be valuable interaction targets for BambL inhibitors.

**Abbreviations:** BambL, Burkholderia ambifaria lectin; H1, H type 1; H2, H type 2; Le<sup>a</sup>, Lewis a; Le<sup>b</sup>, Lewis b; Le<sup>x</sup>, Lewis x; Le<sup>y</sup>, Lewis y; MD, Molecular Dynamics; PDB, Protein Data Bank; vdW, van der Waals; RMSD, root mean square deviation

ID: 3ZZV, with the intra-protomeric binding site and ligand shown.

## MATERIALS AND METHODS

A single BambL subunit containing an intra-protomeric binding site (Audfray et al., 2012) was used in the computational studies described below.

### Blood-Group and Blood-Group-Related Carbohydrate Structure Generation

Low-energy blood-group and blood-group-related carbohydrate structures were generated and simulation parameters produced using the GLYCAM web portal (Woods, 2005; Kirschner et al., 2008). The A and B determinants were modeled as trisaccharides for comparison to previous binding data for the soluble type A determinant (Audfray et al., 2012). The H type 1, H type 2, Le<sup>a</sup> and Le<sup>x</sup> determinants were modeled as tetrasaccharides for consistency with previously determined binding data (Audfray et al., 2012) and the Le<sup>b</sup> and Le<sup>y</sup> determinants were modeled as tetrasaccharides to encompass the entire epitope. The library of blood-group-related structures is shown in Supplementary Figure 1.

### Docking

Docking experiments were performed using the docking program Glide 6.8 (Friesner et al., 2004, 2006; Halgren et al., 2004; Schrödinger, 2014a) available within the molecular modeling package Maestro (Schrödinger, 2014a,b). The BambL crystallographic complexes were downloaded from the Protein Data Bank, PDB (Berman et al., 2000), and the protein structures prepared using the Protein Preparation Wizard tool (Madhavi Sastry et al., 2013; Schrödinger, 2014b). During this step, structural details required for the docking calculation were specified. Double bond orders were applied for backbone carbonyl and aromatic side chain moieties, hydrogen atoms were added to the structure, water molecules removed, and disulfide bonds created between cysteine side chain sulfur atoms in close proximity. Missing atoms and side chains were added based on the protein's primary sequence using the Prime tool (Schrödinger, 2014c). To remove steric clashes between added hydrogen atoms, a minimization step was then conducted on hydrogen atoms only, using the OPLS2005 force field (Banks et al., 2005). A receptor grid was generated using default settings, with the binding site box centered on the crystallographic ligand. Ligands were docked into the receptor grid using Standard Precision mode with default settings. All carbohydrate atoms were treated flexibly during docking, including all glycosidic linkages and exocyclic groups. The lowest-energy docked poses were retained for MD simulation. Docked poses were filtered by glycosidic dihedral angle to exclude unfavorable high-energy carbohydrate conformations. Cutoff values for dihedral filtering were chosen for each glycosidic linkage based on isoenergy contours previously calculated with the MM3 force field by Imberty et al. (1995). Conformations with dihedrals in the following ranges were removed from the analysis: Fucα1-2Gal ϕ < −130° and 180° < ψ < 360°; GalNAcα1-3Gal ϕ > 240°; Galβ1-3GlcNAc ϕ > 0° and 180° < ψ < −60°. Thus, we used energy maps to post-filter docked poses as a means of retaining reasonable conformations. Such energy maps have commonly been used to evaluate carbohydrate conformations obtained from simulations and experimental work (for example, Tempel et al., 2002; Jackson et al., 2014). Hydrogen bonds and contacts were tallied using MDAnalysis (Michaud-Agrawal et al., 2011; distance = 3.0 Å, angle = 120°).
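The torsion post-filter described above can be illustrated with a short Python sketch. This is not the authors' script. The function names, the toy pose records, and the reading of the Fucα1-2Gal exclusion range as two arcs on a circle (ϕ from −180° to −130° and ψ from 180° to 360°, with both conditions required together) are assumptions made for illustration; the other linkages would be added to the table in the same way once their angle convention is fixed.

```python
def in_arc(angle, start, end):
    """True if angle lies strictly inside the arc from start to end (degrees).

    Angles are compared modulo 360, so -90 and 270 are treated as the same.
    """
    span = (end - start) % 360 or 360
    offset = (angle - start) % 360
    return 0 < offset < span


# Excluded (phi, psi) regions per linkage. Assumption: one reading of the
# Fuc(a1-2)Gal range quoted in the text, with both conditions required.
EXCLUDED = {
    "Fuca1-2Gal": {"phi": (-180.0, -130.0), "psi": (180.0, 360.0)},
}


def passes_filter(linkage, phi, psi):
    """Keep a pose unless its linkage falls inside an excluded region."""
    rule = EXCLUDED.get(linkage)
    if rule is None:
        return True
    return not (in_arc(phi, *rule["phi"]) and in_arc(psi, *rule["psi"]))


# Hypothetical docked poses, invented for illustration only.
poses = [
    {"id": "pose_1", "linkage": "Fuca1-2Gal", "phi": -150.0, "psi": 270.0},
    {"id": "pose_2", "linkage": "Fuca1-2Gal", "phi": -80.0, "psi": 140.0},
]
kept = [p["id"] for p in poses if passes_filter(p["linkage"], p["phi"], p["psi"])]
print(kept)
```

Working modulo 360° lets the same table hold ranges quoted in either the −180° to 180° or the 0° to 360° convention.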

### Site Mapping

All BambL-blood group carbohydrate complexes were examined using LigPlot (Wallace et al., 1995; Laskowski and Swindells, 2011). Only poses that passed the glycosidic torsion filter requirements (see above) were used for site mapping, following a previously developed method (Yuriev et al., 2001; Agostino et al., 2009b, 2011, 2013; Dingjan et al., 2015a). In brief, each individual hydrogen bond made by a particular BambL residue was counted toward the hydrogen-bond tally. Non-polar vdW interactions between a specific BambL residue and a carbohydrate residue were counted as a single interaction toward the tally. The tallies were normalized to percentages of the total number of hydrogen bond or vdW interactions. Site maps were generated using residue inclusion cutoff values for lectin-carbohydrate complexes of 90% for hydrogen bonds, 0% for vdW interactions (Agostino et al., 2013). Site map images were rendered using PyMOL (Schrödinger, 2014d).
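The tallying and normalization steps of site mapping can be sketched in a few lines of Python. The function names and the toy interaction records below are hypothetical and are not taken from the published workflow; the sketch only illustrates the counting rules stated above (every hydrogen bond counts, a vdW contact counts once per protein residue and saccharide residue pair in a pose, and tallies are expressed as percentages of the total).

```python
from collections import Counter


def site_map_percentages(interactions):
    """Normalize per-residue tallies to percentages of all interactions.

    interactions: iterable of (pose_id, protein_residue) pairs, one entry
    per counted interaction.
    """
    tally = Counter(residue for _, residue in interactions)
    total = sum(tally.values())
    if total == 0:
        return {}
    return {residue: 100.0 * n / total for residue, n in tally.items()}


def residues_at_or_above(percentages, cutoff):
    """Residues whose share of interactions meets the inclusion cutoff (%)."""
    return sorted(res for res, pct in percentages.items() if pct >= cutoff)


# Toy data, invented for illustration only.
hbonds = [
    ("pose_1", "Arg15"), ("pose_1", "Arg15"), ("pose_1", "Glu26"),
    ("pose_2", "Arg15"), ("pose_2", "Asp30"),
]
hbond_pct = site_map_percentages(hbonds)

# vdW contacts: collapse repeated atom-level contacts so that each
# (pose, protein residue, saccharide residue) combination counts once.
vdw_raw = [
    ("pose_1", "Trp74", "Fuc"), ("pose_1", "Trp74", "Fuc"),
    ("pose_1", "Tyr35", "Gal"),
]
vdw_pct = site_map_percentages((pose, res) for pose, res, _ in set(vdw_raw))

print(hbond_pct, vdw_pct)
```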

### Molecular Dynamics

MD simulations were performed using Gromacs 5.0.4 (Berendsen et al., 1995; Van Der Spoel et al., 2005; Hess et al., 2008; Pronk et al., 2013). Proteins were parameterized using the AMBER99SB-ILDN (Lindorff-Larsen et al., 2010) forcefield. Carbohydrate topologies were generated using the GLYCAM06 (Kirschner et al., 2008) force field via the glycam.org web portal. The resulting AMBER-formatted topology was converted to GROMACS format using the "acpype" tool (Sousa da Silva and Vranken, 2012). The correctly formatted carbohydrate topology was then combined with the protein topology to describe the entire protein-carbohydrate system. Protein-carbohydrate docked complexes were placed in a rhombic dodecahedral box with a 10 Å minimum distance between solute and box wall, and subsequently solvated using the TIP3P water model. To maintain electrostatic neutrality, Na<sup>+</sup> and Cl<sup>−</sup> counterions were added by the genion module. To remove steric clashes between nearby atoms, the system contents were minimized using the steepest descent algorithm (maximum steps: 50,000). The positions and velocities of the solvent molecules and ions were then equilibrated at constant volume and temperature (NVT ensemble) using three restraint settings: with all protein heavy atoms restrained for 100 ps, then with only backbone atoms restrained for 100 ps (both at 10 K), followed by a 100 ps equilibration without restraints at 300 K. Finally, the pressure of the system was equilibrated for 300 ps without restraints at constant atmospheric pressure (NPT ensemble) at 310 K. During all equilibration steps, positional restraints were applied to protein residues using LINCS (Hess, 2007). The coordinates from the final equilibration step were used to begin production simulation, which was conducted for 400 ns.

For all MD simulations in the NPT ensemble, temperature was kept constant using the velocity rescaling thermostat coupled with a time constant of 0.1 ps. Pressure was held constant at 1 bar using the Parrinello-Rahman barostat, coupled with a time constant of 2 ps. Equations of motion were integrated using a leap-frog integrator with a 2 fs timestep. Long-range electrostatics were evaluated using the Particle Mesh Ewald method. Cutoff values for Coulomb and vdW interactions were set to 1.0 nm. Complexes with blood group carbohydrate ligands were simulated in triplicate; complexes with blood-group-related carbohydrate ligands were simulated once. Each replicate was started from independently randomized initial velocities, so that the replicates are independent simulations.
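For readers who wish to set up a comparable run, the production settings listed above map onto standard GROMACS `.mdp` options. The Python sketch below assembles such a parameter set and derives the step count from the run length and timestep. It is an illustration, not the authors' input file: the helper names are hypothetical, the single coupling group and the water compressibility value are assumptions, and the reference temperature is left to the caller (the value used in the example call is a placeholder).

```python
def production_parameters(length_ns, timestep_fs, ref_t_kelvin):
    """Collect GROMACS .mdp options for an NPT production run."""
    nsteps = round(length_ns * 1_000_000 / timestep_fs)  # 1 ns = 1,000,000 fs
    return {
        "integrator": "md",            # leap-frog integrator
        "dt": timestep_fs / 1000.0,    # timestep in ps
        "nsteps": nsteps,
        "coulombtype": "PME",          # Particle Mesh Ewald electrostatics
        "rcoulomb": 1.0,               # nm
        "rvdw": 1.0,                   # nm
        "tcoupl": "V-rescale",         # velocity rescaling thermostat
        "tc-grps": "System",           # assumption: one coupling group
        "tau_t": 0.1,                  # ps
        "ref_t": ref_t_kelvin,         # K, supplied by the caller
        "pcoupl": "Parrinello-Rahman",
        "tau_p": 2.0,                  # ps
        "ref_p": 1.0,                  # bar
        "compressibility": 4.5e-5,     # bar^-1, assumption (typical for water)
    }


def render_mdp(params):
    """Format the options as 'key = value' lines."""
    return "\n".join(f"{key:<16}= {value}" for key, value in params.items())


# Placeholder temperature; set it to the value used in your own protocol.
params = production_parameters(length_ns=400, timestep_fs=2, ref_t_kelvin=300.0)
print(render_mdp(params))
```

With a 2 fs timestep, a 400 ns trajectory corresponds to 200,000,000 integration steps.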

### Analysis of MD Simulations

Hydrogen bonds in MD simulations were analyzed using the Baker-Hubbard method implemented in the MDTraj (McGibbon et al., 2015) software library. An occupancy value was assigned to each hydrogen bond by calculating the percentage of simulation frames in which the bond was present. Glycosidic dihedral angles were measured using MDTraj and compared to calculated isoenergy contours (see above). Carbohydrate ring conformations were analyzed using the Best Four-Member Plane method from GLYCAM (Makeneni et al., 2014). CH-π interactions were represented by measuring the shortest distance from any of the fucose atoms C3, C4, C5, or C6 to the atoms of the indole ring of Trp74. Atom labeling corresponds to the conventions of the PDB exchange dictionary (Berman et al., 2003).
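The two per-frame quantities described above, hydrogen-bond occupancy and the shortest fucose-to-indole distance, reduce to simple bookkeeping once the per-frame data are available. The following Python sketch assumes that hydrogen bonds have already been detected frame by frame (for instance with a detector such as MDTraj's `baker_hubbard`, wiring not shown) and that coordinates are plain tuples; the function names and toy values are hypothetical.

```python
import math
from collections import Counter


def hbond_occupancy(frames):
    """Percentage of frames in which each hydrogen bond is present.

    frames: list with one entry per frame; each entry is an iterable of
    hashable bond labels, e.g. ("Glu26:OE1", "Fuc:O3").
    """
    n_frames = len(frames)
    if n_frames == 0:
        return {}
    counts = Counter()
    for bonds in frames:
        counts.update(set(bonds))  # count each bond at most once per frame
    return {bond: 100.0 * n / n_frames for bond, n in counts.items()}


def shortest_distance(coords_a, coords_b):
    """Shortest distance between any atom in coords_a and any in coords_b."""
    return min(math.dist(a, b) for a in coords_a for b in coords_b)


# Toy trajectory of four frames, invented for illustration only.
glu = ("Glu26:OE1", "Fuc:O3")
ala = ("Ala38:N", "Fuc:O2")
frames = [[glu, ala], [glu], [glu], []]
occupancy = hbond_occupancy(frames)

# Toy coordinates (arbitrary units) for two fucose atoms and two ring atoms.
fucose_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
indole_atoms = [(1.0, 0.0, 4.0), (5.0, 5.0, 5.0)]
closest = shortest_distance(fucose_atoms, indole_atoms)

print(occupancy, closest)
```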

## RESULTS

### Generation of BambL-Blood Group Complexes by Docking

To decide which of the crystallographic BambL receptor structures to use in this study, we compared complex structures predicted by re-docking with the respective crystallographic complexes. The results of these cognate and cross-docking experiments are shown in **Table 1** and **Figure 2**. The Le<sup>x</sup> tetrasaccharide was poorly docked (RMSD > 2 Å) into all BambL structures. However, all four lectin structures afforded approximately equal performance when used as a receptor for the other three carbohydrate ligands: RMSD values of 1.09–2.62 Å were observed for the overall ligand and 0.14–0.56 Å for the buried fucose (Fucα1-2Gal). The crystallographic BambL structure from the PDB ID: 3ZZV complex was used as the receptor structure for site mapping and MD with all carbohydrates shown in **Figure 1**.
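The comparison of docked and crystallographic ligands relies on two RMSD values per pose: one over the atoms common to both ligands and one over the buried fucose only. A minimal Python sketch of that calculation follows. It assumes that the two coordinate lists are already in the same reference frame and in matching atom order, and that the RMSD is taken in place without superposition; the function names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def subset_rmsd(coords_a, coords_b, indices):
    """RMSD restricted to a subset of atoms, e.g. the buried fucose."""
    return rmsd([coords_a[i] for i in indices], [coords_b[i] for i in indices])


# Toy ligand of two atoms, invented for illustration only (units: Å).
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
fucose_atoms = [0]  # hypothetical index list for the buried fucose

overall = rmsd(crystal, docked)
fucose_only = subset_rmsd(crystal, docked, fucose_atoms)
print(overall, fucose_only)
```

In the toy case the fucose atom is reproduced exactly while the second atom is displaced, which mirrors the pattern reported above: a well-placed buried fucose can coexist with a larger overall ligand RMSD.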

In a second step, all blood group saccharides were docked in BambL (PDB ID: 3ZZV) and the top docked poses were analyzed for structural features relevant to recognition (**Table 2**). In all cases except Le<sup>x</sup>, the majority of binding interactions were made via a single buried fucose residue (**Figure 3**). The difucosylated Le<sup>b</sup> and Le<sup>y</sup> possess two fucose residues (Fucα1-2Gal and Fucα1-4GlcNAc in Le<sup>b</sup> or Fucα1-3GlcNAc in Le<sup>y</sup>) and therefore may occupy the fucose-binding pocket in two ways. Of the docked Le<sup>b</sup> poses produced here, only the Fucα1-2Gal residue was predicted in the binding pocket. As for the docked Le<sup>y</sup> poses, all of the top 20 ranked poses positioned the Fucα1-2Gal residue in the pocket, with the exception of the poses at ranks 5 and 6, which placed the Fucα1-3GlcNAc residue in the fucose-binding pocket.

TABLE 1 | Top scoring docked pose characterization for BambL-blood group saccharide complexes.

<sup>a</sup>The experimental resolution of each crystallographic BambL complex is shown in brackets beneath the PDB ID. RMSD values compare the ligand portion common between the docked and crystallographic ligand; RMSD values in brackets compare the fucose portion of the docked ligand to the fucose portion of the crystallographic ligand.

<sup>b</sup>Cross-docking performed using the ligands used in site mapping and molecular dynamics (Figure 1). Cognate docking performed using the ligand length present in the crystallographic complex.

<sup>c</sup>Values shown in bold indicate cognate docking experiments.

As expected, recognition of the buried fucose (Fucα1-2Gal) was governed by a conserved hydrogen-bonding network and a single hydrophobic stacking interaction (Supplementary Table 1). Rather than interacting via a buried fucose, the Le<sup>x</sup> top docked pose was placed "back-to-front" with the reducing end galactose in the fucose-binding pocket, and the fucose directed away from the protein.



<sup>a</sup>Calculated for buried fucose residue heavy atoms between crystallographic saccharide (PDB ID: 3ZZV) and docked ligand.

<sup>b</sup>Dihedral angles defined as: ϕ, O5-C1-O1-C<sub>x</sub>; ψ, C1-O1-C<sub>x</sub>-C<sub>x+1</sub>.

<sup>c</sup>Excluding hydrogen bonds involving the buried fucose residue.

Apart from interactions with the buried fucose residue, additional hydrogen bonds are made between non-fucose residues and amino acids in the four β-turn loops surrounding the fucose-binding pocket (**Table 2**). The most frequently participating residue, Asp30, interacts with non-fucose portions of multiple saccharides (B, H1, Le<sup>a</sup>, Le<sup>b</sup>, and Le<sup>y</sup>). The indole side chain of Trp74 (which stacks against the buried fucose) also donates a hydrogen bond to non-fucose residues in several cases. In each case, the hydrogen bond is accepted by atoms in a similar location: two residues away from the buried fucose, at the GlcNAc 6-position (Le<sup>b</sup>, Le<sup>y</sup>), Gal/GalNAc 3-position (A, B), or GlcNAc 2-position (H1). The presence of hydrogen bonds between non-fucose portions and loop residues suggests that BambL recognition may not rely solely on interactions with a single buried fucose.

Glycosidic dihedral angles in top docked poses lie close to global or secondary minima in previously calculated (Imberty et al., 1995) energy maps (see Supplementary Figures 2, 3). An exception is the Fucα1-2Gal linkage, which is positioned in between minima in the H type 1, H type 2, Le<sup>b</sup> and Le<sup>y</sup> top poses. In the A and B trisaccharide complexes, the Fucα1-2Gal linkage adopted the lowest energy conformation. These results agree with earlier BambL-blood group docking by Topin et al. (2013) in which top docked pose glycosidic linkages also occupied a range of energetic minima.

### Site Mapping of BambL-Blood Group Complexes

Site mapping reveals binding site residues that are frequently involved in interactions throughout an ensemble of docked poses. Site maps for BambL-blood group complexes are shown in **Figure 4**. These maps are based on docking results for all carbohydrates shown in **Figure 1**. The BambL site maps agree with crystallographic complexes, identifying multiple residues in the fucose-binding pocket known to interact with fucose in crystallographic structures (PDB IDs: 3ZW2, 3ZZV, 3ZWE, and 3ZW1; Audfray et al., 2012; Topin et al., 2013). Across the docked pose ensemble, hydrogen bonds were frequently formed to Arg15 (27.9%), Ala38 (11.6%), and Glu26 (13.7%), all located within the fucose-binding pocket. Surprisingly, Trp79 (4.9%), also in the crystallographic fucose pocket, was not often involved throughout the docked pose ensemble. van der Waals (vdW) interactions were frequently made with Trp74 (14.6%) in the fucose pocket, in close agreement with crystallographic bound complexes. Site maps also revealed new interactions not seen in crystal structures, identifying hydrogen bonding to Asp30 (7.1%) and vdW interaction with Tyr35 (11.1%) as regularly occurring across all docked poses.

saccharide docked poses. Hydrogen bonds shown as yellow dashes, hydrophobic interactions shown as teal dashes. Non-polar hydrogens omitted for clarity.

interactions are colored white; residues involved in 20% or greater interactions are colored red (for hydrogen bonding) or blue (for van der Waals). Residues with intermediate involvement are shaded according to the color scale.

### Molecular Dynamics Simulations of BambL-Blood Group Complexes

To investigate the dynamic behavior of BambL-blood group complexes, the lowest-energy poses generated by docking were simulated in explicit solvent. For difucosylated Le<sup>b</sup> and Le<sup>y</sup>, the lowest-energy poses with the Fucα1-2Gal residue in the fucose-binding pocket were used. The poorly docked Le<sup>x</sup> complex was also simulated, but quickly dissociated from the protein or was unstable in the binding site (see Supplementary Figure 4). To probe the dynamic behavior of the Le<sup>x</sup> binding interactions, the crystallographic complex was used instead (PDB ID: 3ZW1).

During MD simulations, all fucose-anchored blood group saccharides (A, B, H type 1, H type 2, Le<sup>a</sup>, Le<sup>b</sup>, Le<sup>y</sup>) remained bound to BambL without dissociation for the entire duration (400 ns). Structural fluctuations in ligand RMSD were below 2 Å in all bound complexes, reflecting relatively small changes in ligand positions and geometries during the MD simulations (see Supplementary Figure 5). Carbohydrate ring conformations were found to generally adopt one of the two chair conformations (<sup>1</sup>C<sub>4</sub> or <sup>4</sup>C<sub>1</sub>), while the GlcNAc rings in the H type 2, Le<sup>a</sup>, and Le<sup>x</sup> complexes exhibited some variation (see Supplementary Figure 6). A similar hydrogen-bonding pattern was observed across all blood group simulations (**Figures 5**, **6**), featuring interactions between the buried fucose residue and the fucose-binding pocket: Glu26 acidic group to O3 and O4 hydroxyl protons, Arg15 guanidinium to O4 and O5 oxygen atoms, and Trp79 indole to O3 oxygen atom. These hydrogen bonds were highly occupied (between 60 and 90% of simulation frames), with the exception of the Glu26 hydrogen bonds in the Le<sup>b</sup> complex (50–60%). The high occupancy of these hydrogen bonds indicates the dominant role played by fucose in BambL-carbohydrate binding.

In addition to the above interactions, a low-occupancy (up to 30% of simulation frames) hydrogen bond was observed between the Ala38 backbone amide proton and the buried fucose 2-position hydroxyl oxygen atom. In contrast to the highly occupied hydrogen bonds, this interaction engages a backbone proton rather than a side chain; combined with the low occupancy, this suggests a less significant contribution by this hydrogen bond to carbohydrate binding. Alongside hydrogen-bonding interactions, stacking of the fucose C3-C4-C5-C6 hydrophobic face against the Trp74 indole ring was consistently maintained during simulation (see Supplementary Figure 7).

Hydrogen bonds to non-fucose portions of the carbohydrate ligands were formed at low to moderate occupancies (20–50%) with fucose-binding residue Trp74 (Le<sup>y</sup> : 44%, Le<sup>a</sup> : 23%, Le<sup>x</sup> : 22%) and surface residue Asp30 (B: 37%, H type 1: 44%, Le<sup>b</sup> : 24%, Le<sup>y</sup> : 31%).

Glycosidic linkage conformations explored during MD simulations occupy global, and occasionally secondary, minima (**Figure 7**). As observed in docking, the Fucα1-2Gal linkage is again an exception, adopting a position intermediate between the two minima for the entire duration of simulation in the H type 1, H type 2, Le<sup>b</sup> and Le<sup>y</sup> complexes. In the H type 1 and Le<sup>b</sup> complexes, this linkage explores a narrower range of higher-energy conformations compared to H type 2 and Le<sup>y</sup>. It is possible that this difference between the calculated energetic minima and the conformations observed in simulation is due to the presence of the protein. Force field-based energy contours describe the energetic behavior of each linkage as an unbound disaccharide in vacuum (Imberty et al., 1995), while simulation of the bound complex introduces protein, water, and other saccharide units within the tri- or tetrasaccharide, all of which influence conformational behavior. A recent example of the influence of protein binding on carbohydrate conformation is the Le<sup>x</sup> saccharide, which occupies well-characterized "closed" conformations in solution and "open" conformations when bound to the RSL lectin (Topin et al., 2016; defined by the relative positions of the fucose and galactose rings). In the present study, the Le<sup>x</sup> saccharide maintained an open conformation during MD simulation, corresponding to shapes "Open V" and "Open II" in the scheme defined by Topin et al. (2016), consistent with its continuous occupation of the binding site during simulation (see Supplementary Figure 8).

calculated by dividing the number of frames in which the hydrogen bond exists by the total number of simulation frames.

In the A and B trisaccharide simulations, the N-acetylgalactosamine and non-reducing end galactose move more freely than the saccharide occupying the same position in the other ligands. The Fucα1-2Gal glycosidic linkage in these two saccharides occupies two conformations, defined by variation in the ψ-angle between −60° and +100°. The A trisaccharide explores both, while the B trisaccharide only occupies the former conformation (**Figure 7**).

### Docking and MD Simulations of Complexes with Blood-Group-Related Carbohydrates

Interactions between BambL and blood group/tissue carbohydrates were mediated mainly via the single buried fucose, with occasional hydrogen bonds formed between non-fucose atoms and residues on loops surrounding the binding pocket. Identifying these non-fucose binding interactions may provide opportunities to improve inhibitor affinity for BambL beyond the current fucose-based inhibitors.

The potential for non-fucose binding interactions to form in BambL-saccharide complexes was explored by simulating complexes of 36 blood-group-related carbohydrates with the protein (i.e., a focused carbohydrate library). The related carbohydrates ranged in size from di- to heptasaccharides and were composed of fragments of blood group and tissue determinant carbohydrates and elongated versions of blood group carbohydrates bearing additional saccharides (for structures of all library members, see Supplementary Figure 1). Most of these structures contain fucose moieties and were expected to interact with BambL via the fucose-dominated mode observed in crystallographic structures. To explore how non-fucose residues (such as galactose and N-acetylgalactosamine) might occupy the fucose-binding site, a selection of di- and trisaccharides lacking fucose were also evaluated. Complexes with BambL were assembled by docking and simulated in explicit solvent for 400 ns.

Of the 36 complexes simulated, 28 remained stably engaged without dissociation of the ligand into bulk solvent. Multiple binding modes were observed among the stable complexes, exhibiting different hydrogen-bonding patterns (**Figure 8**). In some complexes (**2, 6, 34, 30**), very few hydrogen bonds were formed and were observed for only up to 30% of MD runs. These binding modes, while stable, did not feature significant hydrogen-bonding interactions with BambL.

In four cases (**5, 19, 18, 20**), the ligand was found to interact with the fucose-binding pocket via a non-fucose saccharide (galactose or N-acetylgalactosamine). While these non-fucose binding modes do include hydrogen bonds to the three fucose pocket residues (Arg15, Glu26, and Trp79), these interactions are not as highly occupied as those made by fucose-containing saccharides (**10, 9, 1, 17**). In non-fucose binding modes, hydrogen-bond occupancies over 70% were observed for only one or two interactions per ligand; for fucose-mediated binding, all three pocket residues are engaged more than 70% of the time.
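The occupancy criterion that separates the two kinds of binding mode (all three pocket residues above 70% versus only one or two) is easy to state in code. The Python sketch below checks that criterion alone; it does not determine which saccharide sits in the pocket. The function names and the occupancy values are hypothetical and were invented for illustration.

```python
POCKET_RESIDUES = ("Arg15", "Glu26", "Trp79")


def engaged_pocket_residues(occupancy, threshold=70.0):
    """Pocket residues whose hydrogen-bond occupancy (%) exceeds the threshold."""
    return [res for res in POCKET_RESIDUES if occupancy.get(res, 0.0) > threshold]


def is_fully_anchored(occupancy, threshold=70.0):
    """True when every fucose-pocket residue exceeds the occupancy threshold."""
    return len(engaged_pocket_residues(occupancy, threshold)) == len(POCKET_RESIDUES)


# Toy per-residue occupancies (percent of frames), not data from the study.
ligand_a = {"Arg15": 88.0, "Glu26": 81.0, "Trp79": 76.0, "Asp30": 35.0}
ligand_b = {"Arg15": 74.0, "Glu26": 41.0, "Trp79": 22.0}

print(is_fully_anchored(ligand_a), engaged_pocket_residues(ligand_b))
```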

The remaining 20 carbohydrates bound in a fucose-dominated manner, forming hydrogen bonds at over 70% occupancy between a fucose and all three residues of the fucose-binding pocket. In most cases, additional hydrogen bonds were formed with loop residues outside the fucose-binding pocket, with occupancies ranging from 10 to 90%. The highly stable (>70% occupancy) non-fucose hydrogen bonds involved residues Asp30 and Thr36, located on loop 4. The acidic sidechain of Asp30 projects toward the fucose-binding pocket, accepting hydrogen bonds from saccharides not directly bonded to the buried fucose. Thr36 is located further away from the fucose-binding pocket, and accepts hydrogen bonds via the backbone carbonyl oxygen atom. A less-occupied hydrogen bond (up to 67%) is formed to the indole nitrogen of Trp74, concurrent with hydrophobic stacking against a buried fucose. Finally, Tyr35 donates a hydrogen bond via the phenolic hydroxyl to compounds **28** and **33** (and additionally to the non-fucose compound **5**). The fucose-dominated binding modes featuring the highest occupancy of non-fucose hydrogen bonds involved carbohydrates **21** and **33**, illustrated in Supplementary Figure 9.

FIGURE 8 | Hydrogen bonding occupancy of blood-group-related saccharides during MD simulations. Saccharide names indicate the ligand moieties interacting with BambL during simulation.

Combining all the BambL residues involved in hydrogen bonds to fucose and non-fucose saccharides presents a perspective of the target site that incorporates a wider view of BambL-saccharide recognition, considering multiple interaction points across the protein surface (**Figure 9**). This view of the BambL binding site presents opportunities for future inhibitor design to consider regions outside the fucose-binding pocket.

### DISCUSSION

We have investigated the molecular aspects of carbohydrate recognition by the B. ambifaria lectin using computational methods: docking, site mapping, and MD. Molecular docking has been shown to be extremely useful for structural predictions, if not affinity calculations (Yuriev et al., 2015). However, docking carbohydrate ligands presents a number of challenges stemming from their extreme flexibility; their large number of hydroxyl groups, which leads to the formation of (often) extensive hydrogen-bonding networks; and the formation of crucial CH/π stacking interactions between the C-H bonds of the carbohydrates (on their hydrophobic faces) and aromatic side chains of the protein (Agostino et al., 2009a, 2012a). Also, carbohydrate ligands are modular, and different residues (e.g., galactose vs. glucose) are able to establish highly similar interactions with the binding site. We have previously validated Glide and tested a range of other docking programs for structural prediction of carbohydrate complexes with antibodies (Agostino et al., 2009a, 2012b) and lectins (Agostino et al., 2011). We have demonstrated that, as a result of all the above-mentioned challenges, docking programs and scoring functions are not always able to predict the native binding pose faithfully as the top docked pose. To overcome this shortcoming and to harness the recognition information embedded in the docking output, we have developed a site mapping methodology that takes into account an ensemble of docked poses and identifies binding site residues critically involved in recognition of a ligand or ligand family (Yuriev et al., 2001, 2002; Agostino et al., 2013; Dingjan et al., 2015a).
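The general idea of using an ensemble of docked poses rather than the single top pose can be sketched as follows. This is a simplified illustration and not the published site mapping implementation: it assumes that each pose has already been reduced to the set of residues it hydrogen bonds with, and the 50% cut-off, the function names, and the example poses are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def residue_frequencies(poses: Iterable[Set[str]]) -> Dict[str, float]:
    """Fraction of docked poses in which each residue contacts the ligand."""
    pose_list = list(poses)
    if not pose_list:
        raise ValueError("no poses supplied")
    counts = Counter(res for pose in pose_list for res in pose)
    return {res: n / len(pose_list) for res, n in counts.items()}


def mapped_site(poses: Iterable[Set[str]], min_fraction: float = 0.5) -> List[str]:
    """Residues contacted in at least min_fraction of the poses (assumed cut-off)."""
    freqs = residue_frequencies(poses)
    return sorted(res for res, f in freqs.items() if f >= min_fraction)


# Hypothetical contact sets for four docked poses (illustrative only).
example_poses = [
    {"Arg15", "Glu26", "Trp79"},
    {"Arg15", "Glu26", "Asp30"},
    {"Arg15", "Trp79", "Thr36"},
    {"Arg15", "Glu26", "Trp79"},
]

print(mapped_site(example_poses))
```

Residues that recur across many poses are retained even when no individual pose reproduces the native binding mode exactly, which is the motivation for considering the ensemble.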

FIGURE 9 | BambL binding site showing residues implicated in saccharide binding. Purple: Residues which form hydrogen bonds with the buried fucose saccharide. Orange: Residues which form hydrogen bonds with non-fucose saccharides. Green: Residues which participate in both hydrophobic and hydrogen bonding interactions.

In this study, docking with Glide produced reasonable top poses for a range of BambL complexes with blood group carbohydrates (**Table 2**). Using the BambL structure from PDB ID 3ZZV gave accurate complex predictions for the B, H type 1 and H type 2 saccharides and accurate fucose placement for the A, Le<sup>a</sup>, Le<sup>b</sup>, and Le<sup>y</sup> determinants. All these complexes featured a buried fucose residue (Fucα1-2Gal), providing the majority of hydrogen-bonding interactions, and conformational ranges reflective of predicted energetic minima (Imberty et al., 1995) and relevant experimental structures (Yuriev et al., 2005; Dingjan et al., 2015b). Notably, the distances between fucose carbon atoms and the geometric centers of the phenyl and pyrrole component ring systems of the Trp74 indole (Supplementary Material, Table S1) are similar to reported geometries for fucose CH/π dispersion interactions of a closely related lectin, RSL (Wimmerova et al., 2012). As in the RSL-fucose complex, the C6 atom interacts with the pyrrole part of the indole ring (distance of 3.76 ± 0.3 Å), while C3 is further than 4 Å away. Unlike the RSL complex, C5 also interacts with the pyrrole ring (distance of 3.83 ± 0.1 Å), rather than the phenyl ring, which is further than 4 Å from the entire non-polar plane.
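The geometric measure quoted here, the distance from a carbon atom to the geometric center of a ring, can be sketched in a few lines. The coordinates below are hypothetical (an idealized planar five-membered ring and a carbon placed above it), and the 4 Å contact cut-off is taken from the wording of the text rather than from a documented protocol.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
CONTACT_CUTOFF = 4.0  # Å, assumed from the "further than 4 Å" wording in the text


def centroid(points: Sequence[Point]) -> Point:
    """Geometric center (unweighted mean position) of a set of atoms."""
    n = len(points)
    if n == 0:
        raise ValueError("no atoms supplied")
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def atom_to_ring_distance(atom: Point, ring_atoms: Sequence[Point]) -> float:
    """Distance from an atom to the geometric center of a ring."""
    return math.dist(atom, centroid(ring_atoms))


# Hypothetical planar five-membered ring of radius 1.2 Å in the xy-plane.
ring: List[Point] = [
    (1.2 * math.cos(2 * math.pi * k / 5), 1.2 * math.sin(2 * math.pi * k / 5), 0.0)
    for k in range(5)
]
carbon: Point = (0.0, 0.0, 3.76)  # hypothetical carbon placed above the ring center

distance = atom_to_ring_distance(carbon, ring)
print(f"{distance:.2f} Å, within cut-off: {distance <= CONTACT_CUTOFF}")
```

For a trajectory, the same function would be applied frame by frame, and the mean and standard deviation reported as in the distances above.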

Detailed elaboration of structural aspects of molecular recognition requires expanding the single snapshot view afforded by crystal structures or top docked poses. To that effect, we have undertaken site mapping and MD investigations in order to identify BambL residues critical for recognition of blood group carbohydrates. The advantage of site mapping lies in its ability to consider alternative binding modes while MD also explicitly accounts for the role of water, mediating interactions of BambL to carbohydrates.

We have identified the atomic scale binding interactions that facilitate recognition of fucosylated human blood group saccharides by BambL. A network of hydrogen bonds combined with a single hydrophobic stacking interaction between the buried fucose and amino acids in the fucose-binding pocket accounts for the majority of binding interactions (**Figure 3**). These structural features of the fucose-driven recognition closely agree with experimental characterization of the BambL-carbohydrate binding profile by glycan array, which has demonstrated a preference for short, fucose-bearing saccharides, with the fucose monosaccharide among the most highly ranked binders (Audfray et al., 2012). However, this fucose-driven recognition motif does not explain the specificity profile of BambL compared to other related fucose-binding lectins. Namely, the interactions between BambL and fucosylated saccharides are highly similar to those found in complexes featuring other six-bladed β-propeller fucose-binding lectins found in fungi [Aleuria aurantia lectin, AAL (Fujihashi et al., 2003; Wimmerova et al., 2003); Aspergillus fumigatus lectin, AFL (Houser et al., 2013); Aspergillus oryzae lectin, AOL (Makyio et al., 2016)] and bacteria [Ralstonia solanacearum lectin, RSL (Kostlánová et al., 2005)]. Members of this lectin family bind fucose via the same interactions: hydrogen bonds between O2 and a backbone amide proton, O3 and indole nitrogen, O3 and O4 to a shared carboxylate moiety, and O4 and O5 to a shared guanidinium moiety. In a previous docking study of RSL-fucose recognition by Mishra et al. (2012), the same suite of interactions was reported.

Despite the common binding mode, these lectins prefer different blood group determinants: AAL exhibits broad specificity, while AFL prefers Le<sup>y</sup>, and RSL prefers saccharides featuring Fucα1-2 and Fucα1-6 moieties (blood group A, B, and H and core of N-glycans). Varied blood group specificity has been proposed to arise from steric hindrance around the fucose-binding pocket, preventing strong binding to most branched carbohydrate structures (Fujihashi et al., 2003). Glycan array screening shows generally decreased binding to branched carbohydrates compared to mono- and disaccharides for these lectins, emphasizing the importance of steric effects (Houser et al., 2013). Additionally, the nonselective AAL lacks steric hindrance around the fucose-binding pocket: in a bound complex featuring the disaccharide Fucα1-6GlcNAcβ1-OMe, transferred NOE experiments confirmed conformational flexibility around the glycosidic linkage (Weimar and Peters, 1994). However, steric hindrance alone does not fully explain blood group selectivity in this lectin family. AFL binds the difucosylated Le<sup>y</sup> more strongly than the corresponding monofucosylated saccharide, H-type 2, despite similar steric complementarity to the binding site (Houser et al., 2013). We suggest that stabilizing interactions outside the fucose-binding pocket (as observed in simulations of BambL complexed with blood-group-related saccharides) play a role in saccharide binding in the six-bladed β-propeller lectin family more generally.

Interactions with non-fucose residues are not as highly occupied as interactions with the fucose. However, they contribute to a wider view of BambL-carbohydrate recognition, considering multiple interaction points across the protein surface. They include hydrogen bonding to Asp30, Tyr35, Thr36, and Trp74 and hydrophobic contacts with Tyr35 (**Figure 9**). These contacts outside the fucose-binding pocket could be employed in future inhibitor design for BambL to address issues of opportunistic infections.

### CONCLUSION

In summary, the present work details the recognition of fucosylated human blood group determinants by BambL, quantifies the occupancy of hydrogen bonding interactions, and identifies opportunities for targeting residues outside the fucose-binding pocket. Recognition mainly involves the fucose monosaccharide through a network of highly occupied hydrogen-bonding interactions to Arg15, Glu26, and Trp79, and a lower occupancy interaction with Ala38. An additional stacking interaction between the fucose hydrophobic face and Trp74 is also highly occupied in MD simulations. Hydrogen bonds to non-fucose saccharides were formed in complexes with Le<sup>y</sup>, Le<sup>b</sup>, Le<sup>a</sup>, H1, H2, and B trisaccharide and in multiple complexes involving blood-group-related saccharides. The most occupied interactions involved Asp30, Thr36, Trp74, and to a lesser degree Tyr35. Carbohydrate recognition by BambL is therefore proposed to be driven by interactions in the fucose-binding site and further stabilized by satellite interactions between non-fucose saccharides and surface residues outside the fucose-binding pocket. The analysis of carbohydrate recognition by BambL presented in this study lays the foundation for the development of fucomimetic molecules able to bind to BambL. Such molecules have potential as anti-adhesives for the treatment of B. ambifaria infection in cystic fibrosis patients.

### AUTHOR CONTRIBUTIONS

Each author has contributed significantly to the submitted work. TD and EY conceived and designed the experiments. TD performed the experiments. TD, EY, and PR analyzed the data. TD, AI, SP, EY, and PR wrote the paper. All authors read and approved the final manuscript.

### FUNDING

This work was supported by a Victorian Life Sciences Computation Initiative (VLSCI) grant number VR0250 on its Peak Computing Facility at the University of Melbourne, an initiative of the Victorian Government, Australia, and with the assistance of resources from the National Computational Infrastructure (NCI), which is supported by the Australian Government, and the Multi-modal Australian ScienceS Imaging and Visualization Environment (MASSIVE) via grant Y96. TD is supported by an Australian Postgraduate Award (APA) scholarship. AI and SP are supported by CNRS, Université Grenoble Alpes through Glyco@Alps (ANR-15-IDEX-02) and Labex ARCANE (ANR-11-LABX-0003-01). PR is supported by an RMIT University Vice Chancellor's Senior Research Fellowship.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fphar.2017.00393/full#supplementary-material

### REFERENCES

Agostino, M., Mancera, R. L., Ramsland, P. A., and Yuriev, E. (2013). AutoMap: A tool for analyzing protein-ligand recognition using multiple ligand binding modes. J. Mol. Graphics Modell. 40, 80–90. doi: 10.1016/j.jmgm.2013.01.001

Agostino, M., Ramsland, P. A., and Yuriev, E. (2012a). "Docking of carbohydrates into protein binding sites," in Structural Glycobiology, eds E. Yuriev and P. A. Ramsland (Boca Raton, FL: CRC Press), 111–138.

Agostino, M., Jene, C., Boyle, T., Ramsland, P. A., and Yuriev, E. (2009a). Molecular docking of carbohydrate ligands to antibodies: structural validation against crystal structures. J. Chem. Inf. Model. 49, 2749–2760. doi: 10.1021/ci900388a


1. method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749. doi: 10.1021/jm0306430


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Dingjan, Imberty, Pérez, Yuriev and Ramsland. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Identification of a New Potent Inhibitor Targeting KRAS in Non-small Cell Lung Cancer Cells

Chun Xie<sup>1</sup>†, Ying Li<sup>1</sup>†, Lan-Lan Li<sup>2</sup>, Xing-Xing Fan<sup>1</sup>, Yu-Wei Wang<sup>1</sup>, Chun-Li Wei<sup>1</sup>, Liang Liu<sup>1</sup>\*, Elaine Lai-Han Leung<sup>1</sup>\* and Xiao-Jun Yao<sup>1,2</sup>\*

<sup>1</sup> State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, China, <sup>2</sup> State Key Laboratory of Applied Organic Chemistry and Department of Chemistry, Lanzhou University, Lanzhou, China

#### Edited by:

Leonardo G. Ferreira, University of São Paulo, Brazil

#### Reviewed by:

Andrea Ilari, Istituto di Biologia e Patologia Molecolari (CNR), Italy Chakrabhavi Dhananjaya Mohan, University of Mysore, India

#### \*Correspondence:

Liang Liu lliu@must.edu.mo Elaine Lai-Han Leung lhleung@must.edu.mo Xiao-Jun Yao xjyao@must.edu.mo

†These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 06 July 2017 Accepted: 30 October 2017 Published: 14 November 2017

#### Citation:

Xie C, Li Y, Li L-L, Fan X-X, Wang Y-W, Wei C-L, Liu L, Leung EL-H and Yao X-J (2017) Identification of a New Potent Inhibitor Targeting KRAS in Non-small Cell Lung Cancer Cells. Front. Pharmacol. 8:823. doi: 10.3389/fphar.2017.00823

KRAS (v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog) is an oncogenic driver that is mutated in 30% of non-small cell lung cancers (NSCLC). However, there is still no effective clinical drug, even though KRAS has been recognized as an oncogene for 30 years. In this study, we identified a small molecule inhibitor targeting KRAS, compound 0375-0604, using a molecular docking-based virtual screening approach. Compound 0375-0604 had a good binding affinity for KRAS in vitro and exhibited cytotoxicity in NSCLC cell lines expressing oncogenic KRAS. Further mechanistic study showed that compound 0375-0604 can block the formation of the complex of guanosine triphosphate (GTP) and KRAS in vitro. In addition, compound 0375-0604 inhibited the KRAS downstream signaling pathways RAF/MEK/ERK and RAF/PI3K/AKT. Finally, we also found that this compound can inhibit cell growth through G2/M cell cycle arrest and induce apoptosis in NSCLC cell lines harboring KRAS mutations. Therefore, compound 0375-0604 may be considered a potential KRAS inhibitor for the treatment of NSCLC carrying the KRAS oncogene.

Keywords: KRAS, NSCLC, small molecule inhibitor, molecular docking

### INTRODUCTION

NSCLC is the most common type of lung cancer, accounting for 85% of cases (Ettinger et al., 2017). The overall survival of patients with advanced or metastatic NSCLC remains dismal (Shima et al., 2015; Lazo and Sharlow, 2016). With the development of modern sequencing technology, NSCLC has been further classified into subtypes according to gene mutations, such as those in EGFR, ALK, MET, ROS-1, and KRAS (Chen et al., 2014). Mutated KRAS genes are frequently found in human cancers, including approximately 30% of lung cancers. Ninety-seven percent of KRAS mutations occur in exons 2 and 3 and affect amino acids G12, G13, and Q61 (Karachaliou et al., 2013). Because of the high morbidity and mortality, a great deal of attention has been paid to NSCLC with KRAS mutations. However, there is still no direct and effective drug for clinical use (Jancik et al., 2010; Gysin et al., 2011; Vasan et al., 2014; Papke and Der, 2017).

KRAS plays an important role in normal cellular processes such as proliferation and differentiation (Pylayeva-Gupta et al., 2011; Santarpia et al., 2012). As a small GTPase, KRAS normally cycles between an inactive GDP-bound state and an active GTP-bound state, which are tightly regulated by GTPase-activating proteins (GAPs) and guanine nucleotide exchange factors (GEFs), respectively (Maurer et al., 2012; Burns et al., 2014; Leshchiner et al., 2015). However, mutation of KRAS impairs GAP-stimulated GTPase activity, which locks KRAS in the active state (Smith et al., 2013; Clausen et al., 2015). Mutant KRAS thereby promotes interaction with a variety of effector proteins and activates downstream signaling events, finally resulting in tumor formation (Bos et al., 2007; Zimmermann et al., 2013; Lito et al., 2016). Therefore, effective inhibitors that target oncogenic KRAS in cancers are urgently needed.

To date, there are three main strategies for the discovery of potent KRAS inhibitors: (1) inhibiting KRAS membrane targeting (Laheru et al., 2012; Prakash and Gorfe, 2013; Chavan et al., 2015; Cox et al., 2015); (2) directly targeting KRAS (Wang et al., 2012; Cromm et al., 2015; Leshchiner et al., 2015; Brock et al., 2016; Trinh et al., 2016); and (3) inhibiting the interaction between KRAS and its downstream effectors (Athuluri-Divakar et al., 2016; Upadhyaya et al., 2016; Keeton et al., 2017). However, inhibition of KRAS membrane targeting can be bypassed through alternative post-translational modification pathways (Rowinsky et al., 1999; Van Cutsem et al., 2004). The inhibitors lonafarnib and tipifarnib effectively inhibited mutant KRAS by blocking farnesylation of RAS, but failed in clinical trials because geranylgeranylation can substitute for farnesylation when farnesyltransferase is inhibited (Berndt et al., 2011). Inhibiting the interaction between KRAS and its downstream effectors may also not be a good strategy for developing KRAS inhibitors. Firstly, KRAS has many downstream effector proteins involved in multiple signaling pathways, such as RAF (MAP kinase pathway), PI3K (AKT/mTOR pathway), and RalGDS (Ral pathway). Secondly, these effectors are not only highly complex but also regulate multiple pathways (Downward, 2003). Arguably, designing a small molecule inhibitor that directly targets KRAS may be one of the most effective approaches. However, the biggest challenges in developing direct KRAS inhibitors are the high (picomolar) binding affinity between KRAS and GDP/GTP and the relatively flat surface of the KRAS protein, which lacks deep hydrophobic pockets (Ledford, 2015; Vo et al., 2016). Notably, several recent studies have revealed novel transient pockets on the KRAS protein surface, which have renewed hope for the development of KRAS inhibitors (Prakash and Gorfe, 2013; Wang et al., 2014).

In this study, we aimed to identify effective KRAS inhibitors that directly target KRAS and prevent the growth of NSCLC cells harboring KRAS mutations. We performed molecular docking-based virtual screening of a small molecule database to search for KRAS inhibitors. A potential inhibitor, compound 0375-0604, was found to bind to KRAS and to exhibit effective cytotoxicity against KRAS mutant NSCLC cell lines.

### RESULTS

#### The Binding Mode between Compound 0375-0604 and KRAS

To discover potential small molecules targeting KRAS, virtual screening based on molecular docking was performed on the ChemDiv library of about 1.36 million compounds. The most promising compound, 0375-0604 (**Figure 1A**), was selected for further study. The benzothiazole ring of 0375-0604 inserted into the binding pocket of KRAS, with the linker sulfur atom exposed to the solvent (**Figures 1B,C**). The amino group of compound 0375-0604 formed hydrogen bonds with the backbone of Met67 and the side chain of Glu37, located in the switch II and switch I regions of KRAS<sup>G12D</sup>, respectively. At the same time, 0375-0604 formed polar and hydrophobic contacts with the surrounding residues. 0375-0604 bound to KRAS<sup>G12C</sup> (**Figure 1D**) in a manner similar to KRAS<sup>G12D</sup>, except for the orientation of the chlorobenzene rings. The orientations of the benzothiazole ring and chlorobenzene rings of 0375-0604 were switched in KRAS<sup>Q61H</sup> (**Figure 1E**). The docking scores of 0375-0604 in the various systems are shown in **Figure 1F**.

#### The Binding Affinity of Compound 0375-0604 with KRAS

To determine the binding affinity of this small molecule for KRAS, we used biolayer interferometry (BLI) (Rich and Myszka, 2007), a label-free technology for measuring biomolecular interactions. The association of different concentrations of compound 0375-0604 with biotin-labeled KRAS protein, immobilized on streptavidin (SA) biosensors, was measured in real time. All the association/dissociation binding curves are shown in **Figure 2A**. We further performed steady-state analysis (**Figure 2B**) with FortéBio data analysis software to obtain the binding affinity, with a K<sub>D</sub> value of 92 µM (**Figure 2C**), which demonstrated a direct and reversible interaction with KRAS.
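As an illustration of what a steady-state analysis estimates, the sketch below fits the standard 1:1 binding isotherm, R_eq = R_max · C / (K_D + C), using the Hanes–Woolf linearization C/R_eq = C/R_max + K_D/R_max. This is a simplified stand-in, not the algorithm of the instrument software. The response values are synthetic, generated from an assumed R_max of 0.5 nm and the reported K_D of 92 µM, and the concentrations follow those listed in the Materials and Methods.

```python
from typing import List, Sequence, Tuple


def fit_steady_state(conc: Sequence[float], resp: Sequence[float]) -> Tuple[float, float]:
    """Estimate (K_D, R_max) by linear regression of C/R_eq against C."""
    if len(conc) != len(resp) or len(conc) < 2:
        raise ValueError("need at least two matched concentration/response pairs")
    x = list(conc)
    y = [c / r for c, r in zip(conc, resp)]  # Hanes-Woolf transform
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # equals 1 / R_max
    intercept = mean_y - slope * mean_x    # equals K_D / R_max
    r_max = 1.0 / slope
    k_d = intercept * r_max
    return k_d, r_max


# Synthetic, noise-free example (illustrative only, not measured data).
CONCENTRATIONS_UM: List[float] = [20, 40, 60, 80, 100, 120, 140, 160]
ASSUMED_KD_UM = 92.0   # reported K_D, used here only to generate the example
ASSUMED_RMAX = 0.5     # hypothetical plateau response in nm
responses = [ASSUMED_RMAX * c / (ASSUMED_KD_UM + c) for c in CONCENTRATIONS_UM]

kd_fit, rmax_fit = fit_steady_state(CONCENTRATIONS_UM, responses)
print(f"K_D = {kd_fit:.1f} µM, R_max = {rmax_fit:.2f} nm")
```

With real, noisy data a non-linear least-squares fit of the isotherm is usually preferred, because the linearization distorts the error structure.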

#### Compound 0375-0604 Decreased Cell Viability of NSCLC Cells with KRAS Mutations

Since compound 0375-0604 bound to KRAS in vitro, we further determined its cytotoxicity in NSCLC cell lines harboring mutant KRAS, namely H2122 (KRAS<sup>G12C</sup>), H358 (KRAS<sup>G12C</sup>), and H460 (KRAS<sup>Q61H</sup>), using the MTT assay. Cells were incubated with a range of compound 0375-0604 concentrations (0, 25, 50, 100 µM) for 24, 48, and 72 h. As shown in **Figure 3**, compound 0375-0604 inhibited the three NSCLC cell lines in a dose- and time-dependent manner, but not the normal lung fibroblast cell line CCD-19Lu. Importantly, the IC<sub>50</sub> values of compound 0375-0604 in H2122, H358, and H460 cells were up to 6-fold lower than that in CCD-19Lu cells, which suggested that compound 0375-0604 inhibits NSCLC cells with strong selectivity.

#### Compound 0375-0604 Blocked GTP-KRAS Formation in NSCLC Cells

Mutant KRAS disturbs the balance between GEF and GAP activity, locking KRAS in the active GTP-bound state and causing aberrant stimulation of its downstream signaling. Hence, KRAS inhibitors should reduce the formation of GTP-KRAS to disrupt mutant KRAS function.

To determine whether compound 0375-0604 could inhibit activation of KRAS, we performed a RAS activation assay to examine the formation of GTP-bound KRAS after treatment with a range of concentrations of compound 0375-0604 in H2122, H358, and H460 cells for 24 h. As shown in **Figure 4**, compound 0375-0604 treatment reduced the formation of GTP-KRAS relative to the total amount of KRAS in KRAS mutant NSCLC cells, suggesting that this small molecule could partially correct the imbalance caused by mutant KRAS.

(C–E) Binding mode of 0375-0604 with KRAS<sup>G12D</sup>, KRAS<sup>G12C</sup> and KRAS<sup>Q61H</sup>, respectively. (F) Docking score of 0375-0604 in various systems.

FIGURE 2 | The binding affinity of compound 0375-0604 with KRAS was determined by interferometry studies. (A) Binding curves of varying concentrations of compound 0375-0604 to the immobilized KRAS protein. (B) Steady-state analysis of the binding curves. (C) The binding affinity (KD) of KRAS for compound 0375-0604 was determined. Experimental data for association and dissociation are represented as shown.

lines were determined by MTT assay.

#### Compound 0375-0604 Inhibited the Activation of KRAS Downstream Signaling Pathway

Active KRAS stimulates downstream signaling pathways, especially the RAF/MEK/ERK and RAF/PI3K/AKT pathways, and then induces cell proliferation. Therefore, to investigate the effect of compound 0375-0604 on KRAS signaling, we examined the phosphorylation levels of CRAF, AKT, and ERK in NSCLC cell lines after treatment with this compound for 48 h. As expected, compound 0375-0604 reduced the phosphorylation of CRAF and AKT in a dose-dependent manner in all three NSCLC cell lines (**Figure 5**), which indicated that compound 0375-0604 may block oncogenic KRAS function by inhibiting its downstream signaling pathways.

cells were used as a control. A representative of at least three independent experiments for each cell line is shown.

#### Compound 0375-0604 Induced Cell Cycle Arrest and Apoptosis in NSCLC

Since compound 0375-0604 significantly inhibited the viability of NSCLC cells with KRAS mutations, we examined whether it exerted its cytotoxicity through cell cycle arrest or apoptosis in H2122, H358, and H460 cells. Cells were treated with the indicated concentrations of compound 0375-0604 for 24, 48, and 72 h. Flow cytometric analysis showed that after 24 h of treatment, the percentage of cells in G0/G1 phase decreased significantly, while the percentage in G2/M phase increased markedly (**Figure 6A**). In addition, treatment with compound 0375-0604 for 48 h significantly increased apoptosis in the NSCLC cell lines (**Figure 6B**). These results suggested that compound 0375-0604 may block cell proliferation and cause cell death via induction of G2/M cell cycle arrest and/or apoptosis in H2122, H358, and H460 cells.

### DISCUSSION

In this study, we identified and characterized a small-molecule compound, 0375-0604, as a new KRAS inhibitor. Using a molecular docking approach, we found that compound 0375-0604 bound to the switch regions (switch I and II) of KRAS. The conformation of KRAS changes and its downstream signaling is activated when its switch regions (switch I or switch II) interact with GTP. A remarkable feature of compound 0375-0604 is that it formed two hydrogen bonds, with the backbone of Met67 and the side chain of Glu37, which are located in switch II and switch I, respectively. These two key hydrogen bonds could partially stabilize the interaction between KRAS and 0375-0604. The docking calculation indicated that compound 0375-0604 exhibited potent binding affinity for KRAS. In addition, the chemical structure of this inhibitor can potentially be modified to achieve more potent and effective inhibition of oncogenic KRAS in NSCLC. Following the docking result, the binding affinity of compound 0375-0604 for the KRAS protein was further determined by BLI, giving a K<sub>D</sub> value of 92 µM, which suggested that 0375-0604 could bind to KRAS with good affinity.
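For readers who wish to compare a dissociation constant with energies expressed in kcal/mol, the standard thermodynamic relation ΔG° = RT ln K<sub>D</sub> can be applied. The short sketch below does so under stated assumptions (a temperature of 298.15 K and a 1 M standard state, neither of which is specified for the measurement in this article); the function name is illustrative.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal/(mol*K)


def delta_g_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


# K_D of 92 µM from the BLI measurement; the temperature is an assumption.
dg = delta_g_from_kd(92e-6)
print(f"{dg:.2f} kcal/mol")
```

Under these assumptions, a K<sub>D</sub> of 92 µM corresponds to roughly −5.5 kcal/mol. Such a conversion is only a rough guide, since docking scores are empirical estimates rather than measured free energies.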

The two most extensively studied RAS-mediated pathways are the PI3K/AKT/mTOR and RAF/MEK/ERK pathways (Papke and Der, 2017). The PI3K/AKT/mTOR pathway is an important intracellular signaling pathway involved in cell cycle transitions. It is directly related to cell proliferation, cancer progression, and longevity (Bryant et al., 2014). The RAF/MEK/ERK pathway is a chain of proteins in the cell that communicates a signal from a receptor on the cell surface to the DNA in the nucleus. The RAF serine/threonine kinases (ARAF, BRAF, and CRAF) are arguably the most important effectors of mutant RAS-dependent cancer growth, as they have a key driver role in RAS-mediated oncogenesis. In our study, we found that compound 0375-0604 could reduce the activation levels of AKT, CRAF, and ERK and block the activation of KRAS downstream signaling pathways in NSCLC.

KRAS binds to GTP in its active state and then influences the expression of downstream genes involved in crucial pathways regulating cell growth, differentiation, and apoptosis (Lu et al., 2016). Compound 0375-0604 showed strong anti-cancer activity by inhibiting the activation of KRAS proteins; it caused G2/M cell cycle arrest at the early stage and induced apoptosis at the later stage in H2122, H358, and H460 cell lines harboring the KRAS oncogene.

In summary, we identified a new small molecule, compound 0375-0604, that bound to KRAS<sup>G12D</sup>, KRAS<sup>G12C</sup>, and KRAS<sup>Q61H</sup> proteins with moderate binding affinities of −5.38, −5.41, and −3.97 kcal/mol, respectively. In addition, 0375-0604 selectively inhibited the proliferation of NSCLC cells with KRAS mutations but not that of normal lung cells. Compound 0375-0604 also blocked the formation of GTP-KRAS and the downstream activation of KRAS signaling in NSCLC cells. Furthermore, compound 0375-0604 inhibited the growth of cancer cells by causing G2/M cell cycle arrest and inducing apoptosis. Overall, our study provides further evidence for targeting the KRAS protein, which may contribute to future studies of lung cancer therapy.

### MATERIALS AND METHODS

#### Molecular Docking

The KRAS<sup>G12D</sup> structure (PDB code: 4DSU), complexed with GDP and the compound benzimidazole (BZIM), was used to model possible binding modes between KRAS and 0375-0604. The crystal structure with GDP was prepared in the Prep Wiz module of Maestro (Version 9.1, Schrödinger), and water molecules within 5 Å of het groups were kept. Subsequently, residues D12 and Q61 were mutated to C12 and H61, respectively, using the BioLuminate module of Maestro. A grid file was generated based on the position of compound BZIM using the Grid Generation wizard. Then, 0375-0604 was prepared to assign atomic charges and generate alternative ring conformations. Finally, docking was performed in the Glide Docking module, based on the previously obtained grid file, using the extra precision (XP) protocol followed by post-docking minimization using the MM-GBSA method.

#### Biotinylation

KRAS (Abcam, ab96817; 200 µg/ml) was biotinylated using EZ-Link NHS-LC-LC-biotin (Thermo) in H<sub>2</sub>O at a 3:1 molar ratio of biotin reagent to protein for 30 min at room temperature, following the protocol suggested by FortéBio. Biotinylated KRAS was separated from the biotinylation reagents using Zeba desalting spin columns (Thermo). Streptavidin (SA) biosensor tips (FortéBio, Inc., Menlo Park, CA, United States) were prewetted with PBS to establish a baseline before immobilization.

#### Biolayer Interferometry Analysis

A FortéBio Octet Red instrument was used in this assay. All assays were performed in 96-well plates (Greiner Bio-One, PN: 655209), and the final volume of all solutions was 200 µl/well. Biotinylated KRAS was immobilized onto SA tips. The experiments comprised three steps: (1) baseline, (2) association, and (3) dissociation. The association and dissociation plots and kinetic constants were obtained with FortéBio data analysis software. To measure the interaction between compound 0375-0604 and KRAS, compound 0375-0604 was used at concentrations of 20, 40, 60, 80, 100, 120, 140, and 160 µM in the association step.

#### 3-(4,5-Dimethylthiazol-2-yl)-2,5-Diphenyltetrazolium Bromide (MTT) Assay

H2122, H460, and H358 cells were purchased from ATCC and cultivated in RPMI 1640 medium supplemented with 10% fetal bovine serum (FBS), 100 units/mL penicillin, and 100 µg/mL streptomycin. All cells were cultivated at 37 °C in an incubator with 5% CO<sub>2</sub>. Cells were seeded in 96-well microplates at 3000, 4000, or 5000 cells/well and cultured overnight to allow adhesion. After a range of compound 0375-0604 concentrations was added, the microplates were returned to the incubator and incubated for 24, 48, or 72 h. Each dose was tested in triplicate. Then, 10 µL of MTT solution (5 mg/mL) was added to each well. After 4 h of incubation, 100 µL of DMSO was added to each well. After shaking for 15 min, the absorbance of the plate was measured at 570 nm (absorbance) and 650 nm (reference) with a microplate reader (Tecan).
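A common way to turn such dual-wavelength readings into percent viability is sketched below. It assumes that the 650 nm reference is subtracted from the 570 nm reading in each well and that treated wells are normalized to untreated control wells; the exact calculation used in this study is not stated, and the absorbance values are made up for illustration.

```python
from statistics import mean
from typing import Sequence, Tuple

Well = Tuple[float, float]  # (absorbance at 570 nm, reference absorbance at 650 nm)


def corrected_absorbance(well: Well) -> float:
    """Reference-corrected absorbance: A570 minus A650."""
    a570, a650 = well
    return a570 - a650


def percent_viability(treated: Sequence[Well], control: Sequence[Well]) -> float:
    """Mean corrected signal of treated wells as a percentage of control wells."""
    treated_mean = mean(corrected_absorbance(w) for w in treated)
    control_mean = mean(corrected_absorbance(w) for w in control)
    if control_mean <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * treated_mean / control_mean


# Hypothetical triplicate wells (illustrative values only).
control_wells = [(1.05, 0.05), (1.00, 0.05), (1.10, 0.05)]
treated_wells = [(0.55, 0.05), (0.50, 0.05), (0.60, 0.05)]

viability = percent_viability(treated_wells, control_wells)
print(f"{viability:.1f}% viability")
```

Repeating this calculation across the dose range gives the dose-response curve from which an IC<sub>50</sub> value can be estimated.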

#### Pull Down Assay

RAS activity was determined using a RAS activation assay kit (EMD Millipore, 17–218). Briefly, lysates were incubated with a glutathione S-transferase fusion of the Ras-binding domain (RBD) of Raf1 together with glutathione agarose for 1 h. Agarose beads were collected by centrifugation and washed three times with Mg<sup>2+</sup> lysis buffer. Each sample was resuspended and boiled for 5 min. Finally, samples were subjected to western blotting as previously described, and blots were probed with an anti-KRAS antibody (Santa Cruz, sc-30).

#### Western Blot Analysis

After 48 h of treatment with compound 0375-0604, RIPA lysis buffer (150 mM NaCl, 50 mM Tris pH 7.4, 1 mM EDTA, 0.25% sodium deoxycholate, 1% NP-40) containing protease (Roche) and phosphatase (Roche) inhibitors was added to extract total cellular protein. A Bio-Rad DC™ protein assay kit was used to quantify the protein concentration of the extracts. Thirty micrograms of protein lysate was loaded, separated by 10% SDS-polyacrylamide gel electrophoresis, and transferred to a nitrocellulose membrane (Millipore). The membrane was incubated with the primary antibody (1:2000) and then with a fluorescence-conjugated secondary antibody (1:10000). GAPDH was used as the loading control and for normalization. The membrane signal was scanned with a LI-COR Odyssey Scanner (Belfast).

#### Cell Cycle and Apoptosis Assay Using Flow Cytometry

H2122, H358 and H460 cells were plated in 6-well plates at 1.5 × 10<sup>5</sup> cells/well and cultured overnight for attachment. After treatment with compound 0375-0604 at 0, 25, 50, or 100 µM for 24, 48, and 72 h, all cells were harvested. For cell cycle analysis, cell pellets were re-suspended in 70% ethanol and fixed at 4 °C for 30 min. Each cell pellet was stained with 300 µL of propidium iodide (PI) staining solution (50 µg/ml) at 37 °C for 30 min in the dark. The cells were then washed twice with PBS, re-suspended in 300 µL PBS, and analyzed on a flow cytometer (FACSCalibur, BD Biosciences). For apoptosis analysis, cells were re-suspended in 100 µL annexin-binding buffer, stained with annexin V and PI staining solution, and incubated for 15 min at room temperature protected from light. Finally, cells were diluted in 300 µL annexin-binding buffer and quantified on the flow cytometer (FACSCalibur, BD Biosciences).

#### Statistical Analysis

Descriptive analytical data are presented as means ± SD. Statistical analysis was conducted using GraphPad Prism 5.0. Significant differences between datasets were assessed by one-way analysis of variance (ANOVA).

#### AUTHOR CONTRIBUTIONS

X-JY, EL-HL, and LL conceived the project. X-JY, EL-HL, and CX designed the experiments. CX, YL, L-LL, X-XF, Y-WW, and C-LW carried out the research and analysis of data. X-JY, EL-HL, LL, and CX wrote the paper.

### ACKNOWLEDGMENT

This work was supported by the Macao Science and Technology Development Fund (Project Nos. 082/2013/A3, 086/2015/A3, 005/2014/AMJ, and 046/2016/A2).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Xie, Li, Li, Fan, Wang, Wei, Liu, Leung and Yao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Gossypol Inhibits Non-small Cell Lung Cancer Cells Proliferation by Targeting EGFR<sup>L858R/T790M</sup>

Yuwei Wang<sup>1</sup>, Huanling Lai<sup>1</sup>, Xingxing Fan<sup>1</sup>, Lianxiang Luo<sup>1</sup>, Fugang Duan<sup>1</sup>, Zebo Jiang<sup>1</sup>, Qianqian Wang<sup>1</sup>, Elaine Lai Han Leung<sup>1,2,3</sup>\*, Liang Liu<sup>1</sup>\* and Xiaojun Yao<sup>1</sup>\*

<sup>1</sup> State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Macau, China, <sup>2</sup> Department of Thoracic Surgery, Guangzhou Institute of Respiratory Health and State Key Laboratory of Respiratory Disease, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China, <sup>3</sup> Respiratory Medicine Department, Taihe Hospital, Hubei University of Medicine, Hubei, China

Background: Overexpression of the epidermal growth factor receptor (EGFR) has been implicated in the pathogenesis of non-small cell lung cancer (NSCLC). Several EGFR inhibitors are used in the clinical treatment of NSCLC, but the emergence of the EGFR<sup>L858R/T790M</sup> resistance mutation has reduced the efficacy of clinically used EGFR inhibitors. There is an urgent need to develop novel EGFR<sup>L858R/T790M</sup> inhibitors for better NSCLC treatment.

#### Edited by:

Leonardo G. Ferreira, Universidade de São Paulo, Brazil

#### Reviewed by:

Ahmed Farouk Salem, University of California, Riverside, United States; Jianfeng Pei, Peking University, China

#### \*Correspondence:

Elaine Lai Han Leung, lhleung@must.edu.mo; Liang Liu, lliu@must.edu.mo; Xiaojun Yao, xjyao@must.edu.mo

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 22 December 2017 Accepted: 18 June 2018 Published: 09 July 2018

#### Citation:

Wang Y, Lai H, Fan X, Luo L, Duan F, Jiang Z, Wang Q, Leung ELH, Liu L and Yao X (2018) Gossypol Inhibits Non-small Cell Lung Cancer Cells Proliferation by Targeting EGFR<sup>L858R/T790M</sup>. Front. Pharmacol. 9:728. doi: 10.3389/fphar.2018.00728

Methods: By screening a natural product library, we identified gossypol as a novel potent inhibitor targeting EGFR<sup>L858R/T790M</sup>. The activity of gossypol on NSCLC cells was evaluated by cell proliferation, apoptosis and migration assays. A kinase activity inhibition assay and molecular docking were used to study the mechanism by which gossypol inhibits EGFR<sup>L858R/T790M</sup>. Western blotting was performed to study how gossypol inhibits the downstream pathways of EGFR.

Results: Gossypol inhibited the proliferation and migration of NSCLC cells and induced caspase-dependent apoptosis by upregulating the expression of the pro-apoptotic protein BAD. Molecular docking indicated that gossypol could bind to the kinase domain of EGFR<sup>L858R/T790M</sup> with good binding affinity through hydrogen bonds and hydrophobic interactions. Gossypol inhibited the kinase activity of EGFR<sup>L858R/T790M</sup> with an EC<sub>50</sub> of 150.1 nM. Western blotting analysis demonstrated that gossypol inhibited the phosphorylation of EGFR and its downstream signaling pathways in a dose-dependent manner.

Conclusion: Gossypol inhibited cell proliferation and induced apoptosis of NSCLC cells by targeting EGFR<sup>L858R/T790M</sup>. Our findings provide a basis for developing novel EGFR<sup>L858R/T790M</sup> inhibitors for the treatment of NSCLC.

Keywords: gossypol, molecular docking, NSCLC, EGFR, TKI

## INTRODUCTION

Non-small cell lung cancer (NSCLC) accounts for approximately 85–90% of lung cancers and has proven difficult to treat because its pathogenesis is poorly understood (Oyewumi et al., 2014; Siegel et al., 2017). Conventional treatment strategies for NSCLC include surgery, radiotherapy and chemotherapy (Scott et al., 2007; Onishi et al., 2011; Uzel and Abacıoğlu, 2015). In addition, molecular-targeted therapy with tyrosine kinase inhibitors (TKIs) is used to treat NSCLC patients with EGFR mutations. Overexpression of EGFR has been reported in more than 60% of NSCLC cases and is implicated in the pathogenesis of the disease (Ohsaki et al., 2000). EGFR is therefore increasingly used in the clinic as a molecular target for NSCLC patients with EGFR mutations.

The role of aberrant EGFR activation in NSCLC is well documented (Sordella et al., 2004; Tracy et al., 2004; Gazdar and Minna, 2005; Sharma and Settleman, 2007; Sharma et al., 2007). The most common activating mutations, the L858R point mutation in exon 21 and deletions within exon 19 (del746-750) (Riely et al., 2006; Sharma et al., 2007), promote EGFR-driven cell proliferation and survival. Both first- and second-generation EGFR-targeted TKIs (including the first-generation agents gefitinib and erlotinib) that target these activating mutants have shown remarkable clinical responses in the treatment of EGFR-mutated NSCLC (Lynch et al., 2004; Paez et al., 2004; Jackman et al., 2009; Rosell et al., 2009; Sequist et al., 2010). Although the early clinical results of first-generation EGFR inhibitors are impressive, most NSCLC patients with activating mutations eventually develop acquired resistance to EGFR inhibitors within several months. The most common mechanism of acquired resistance is the secondary T790M point mutation in exon 20 (substitution of the gatekeeper residue Thr790 by methionine within the EGFR kinase domain), which occurs together with an activating EGFR mutation (e.g., L858R) and accounts for approximately 60% of acquired resistance cases (Balak et al., 2006; Kosaka et al., 2006; Yu et al., 2013). To overcome acquired resistance to first-generation TKIs, several second- and third-generation EGFR TKIs [such as EKB-569 (Kwak et al., 2005), BIBW2992 (Li et al., 2008) and PF00299804 (Engelman et al., 2007)] have been developed. However, these agents still show limited clinical benefit for NSCLC patients with the T790M mutation owing to dose-limiting toxicities (Oxnard et al., 2011; Miller et al., 2012). Recently, the third-generation covalent EGFR inhibitor osimertinib (Ward et al., 2013; Cross et al., 2014) was developed as a mutant-selective EGFR inhibitor that specifically targets the EGFR<sup>L858R/T790M</sup> mutation.

However, the effectiveness of osimertinib in patients harboring the EGFR T790M resistance mutation is limited by the emergence of new resistance to tyrosine kinase inhibitor therapy (Thress et al., 2015; Büttner et al., 2017). The C797S mutation was reported to be a major mechanism of resistance to third-generation EGFR TKIs (Yu et al., 2015). In addition to C797S, other rare tertiary EGFR mutations have been reported, including novel solvent-front mutations (G796S/R), hinge pocket mutations of the leucine residue at position 792 (L792F/H), binding interference at position 798 (L798I), and steric hindrance at position 718 (L718Q) (Bersanelli et al., 2016; Chabon et al., 2016; Chen et al., 2017; Ou Q. et al., 2017; Ou S.-H.I. et al., 2017). Given these emerging resistance mechanisms, there is an urgent need to discover a novel class of EGFR inhibitors that effectively inhibit the drug-resistant EGFR<sup>L858R/T790M</sup> mutant.

Natural products are widely regarded as a pivotal source of lead compounds for drug development. Recently, several natural products have been identified that target EGFR<sup>L858R/T790M</sup> to overcome resistance (Jung et al., 2015; Xiao et al., 2016). In our previous studies, we identified several small molecules from a natural products library that inhibit the growth of gefitinib-resistant NSCLC via different mechanisms (Fan et al., 2015; Li et al., 2017). These compounds demonstrated significant anti-proliferative effects on a variety of NSCLC cell lines, including those with T790M and L858R/T790M mutations. In this study, we identified gossypol, a small molecule from cottonseed, as a potent inhibitor targeting EGFR<sup>L858R/T790M</sup>. Gossypol and its derivatives exert antitumor effects on different cancer types in vitro and in vivo, including breast cancer (Xiong et al., 2017), colon cancer (Lan et al., 2015), chronic myeloid leukemia (Goff et al., 2013) and prostate cancer (Volate et al., 2010), by targeting MDM2, VEGFR, Bcl-2 and p53. Our results show that gossypol inhibits the proliferation of NSCLC cells by targeting EGFR<sup>L858R/T790M</sup>. Gossypol also inhibited the phosphorylation of EGFR and suppressed the phosphorylation of extracellular signal-regulated protein kinase (ERK) and AKT. These results indicate that gossypol could be developed as a new potent EGFR<sup>L858R/T790M</sup> inhibitor that inhibits the proliferation of NSCLC cells.

### RESULTS AND DISCUSSION

### Gossypol Inhibits Cell Proliferation in NSCLC Cells

To identify potent small-molecule inhibitors of EGFR<sup>L858R/T790M</sup>, we screened a natural products library of 235 compounds. We evaluated the anti-proliferative effect of each compound on the H1975 cell line, which harbors EGFR<sup>L858R/T790M</sup>. Gossypol was chosen for further mechanistic investigation because of its significant anti-proliferative activity. H1975 cells were treated with increasing concentrations of gossypol for 72 h, and cell viability was then determined with a standard MTT assay protocol. As shown in **Figure 1**, gossypol inhibited the growth of H1975 cells in a dose-dependent manner, with a half-maximal inhibitory concentration (IC<sub>50</sub>) of 10.89 ± 0.84 µM. In addition, we tested the cytotoxic effect of gossypol on the normal human lung fibroblast cell line CCD19 (IC<sub>50</sub> = 14.89 ± 1.12 µM) and the human NSCLC cell line H358 with EGFR<sup>WT</sup> (IC<sub>50</sub> = 35.26 ± 1.09 µM) (see Supplementary Figure S1). Afatinib was used as a positive control (IC<sub>50</sub> = 170.4 ± 1.1 nM). The structure and corresponding cytotoxicity of gossypol are shown in **Figure 1**. We also examined the effect of gossypol on colony formation (**Figure 2A**). In accordance with the cytotoxicity data, gossypol significantly inhibited the colony formation capacity of the H1975 cell line in a dose-dependent manner. Collectively, these results suggest that gossypol inhibits the proliferation of the H1975 cell line.

### Gossypol Induces Cell Apoptosis in NSCLC Cells

To investigate whether induction of apoptosis also contributed to gossypol-mediated growth inhibition of H1975 cells, an Annexin V-FITC/PI staining assay was used to quantify apoptotic cells by flow cytometry after treatment with gossypol. As shown in **Figures 3A,B**, gossypol induced apoptosis in the H1975 cell line in a concentration-dependent manner.

Bcl-2 family members play key roles in the regulation of apoptosis. To understand how gossypol induced apoptosis, we next examined whether gossypol could alter the expression of apoptotic proteins in H1975 cells. As shown in **Figure 3C** and Supplementary Figure S4, treatment with gossypol for 24 h markedly upregulated the expression of the pro-apoptotic protein Bad in a concentration-dependent manner. Moreover, gossypol induced PARP cleavage, a hallmark of caspase-dependent apoptosis, in accordance with the expression level of cleaved caspase-3. These results suggest that gossypol induced caspase-dependent apoptotic cell death by upregulating the expression of the pro-apoptotic protein Bad in NSCLC cells.

### Gossypol Inhibits the Cell Migration of H1975 Cell Line

The effect of gossypol on H1975 cell migration was assessed with a wound-healing assay. In this assay (see **Figure 2B**), the rate of wound healing in gossypol-treated cells decreased with increasing treatment concentration and was significantly lower than that of untreated cells following incubation. These results demonstrate that gossypol inhibited the migration of H1975 cells in a dose-dependent manner.

### Gossypol Inhibits the Activity of Tyrosine Kinase

To assess the kinase inhibitory activity of gossypol, we performed a kinase inhibition assay against recombinant human EGFR<sup>L858R/T790M</sup>. Gossypol effectively inhibited the enzymatic activity of EGFR<sup>L858R/T790M</sup>, with an EC<sub>50</sub> value of 150 ± 30.7 nM (see Supplementary Figure S2). Gossypol also inhibited the enzymatic activity of EGFR<sup>WT</sup>, with an EC<sub>50</sub> value of 252.9 ± 26.9 nM, roughly 1.7-fold higher than that for EGFR<sup>L858R/T790M</sup> (see Supplementary Figure S2). Afatinib was used as a positive control (EC<sub>50</sub> = 9.6 ± 2.9 nM). The effect of gossypol on cells is very complicated, and it is still difficult to distinguish which part is caused by EGFR targeting. To ensure the consistency of the experimental results, we ran the entire ELISA enzyme inhibition assay at the same time, so that EGFR<sup>WT</sup> could serve as a control for comparison with EGFR<sup>L858R/T790M</sup>.

### Molecular Docking Predicts the Potential Binding of Gossypol to EGFR

Molecular docking calculations were performed to gain insight into the binding mode between gossypol and EGFR<sup>L858R/T790M</sup>. The docking results (see **Figure 4** and Supplementary Figure S3) indicated that gossypol can be docked into the kinase domain, in a site composed mainly of hydrophobic residues of the C-helix and A-loop, with a docking score of −6.42 ± 0.24 kcal/mol. Five hydrogen bonds were formed between gossypol and the carbonyl group of Q791, the amino group of M793, the hydroxyl group of T854 and the amino group of K875. In addition, hydrophobic contacts between gossypol and the surrounding residues, including L718, M790, F723, F858, L792, L844, and M793, also contributed to the interaction between gossypol and EGFR<sup>L858R/T790M</sup>. These results suggest that gossypol can bind to EGFR<sup>L858R/T790M</sup>.

### Gossypol Effectively Suppresses Phosphorylation of EGFR as Well as Its Downstream Signaling Pathway

To determine whether gossypol inhibits EGFR in cells, we investigated its effect on the phosphorylation of EGFR in NSCLC cells. H1975 cells were treated with gossypol (0–20 µM) for 24 h. Western blot analysis showed that gossypol inhibited the phosphorylation of EGFR (Tyr1068) in a concentration-dependent manner (see **Figure 5**). To explore the anti-cancer mechanism of gossypol in more detail, we further evaluated the downstream pathways of EGFR, including the ERK and AKT signaling pathways. Treatment with gossypol also inhibited the phosphorylation of AKT and ERK in a concentration-dependent manner, consistent with the trend in EGFR phosphorylation. Thus, our results indicate that gossypol suppresses the phosphorylation of EGFR and its downstream AKT and ERK signaling pathways, resulting in induction of apoptosis and inhibition of proliferation in H1975 cells.

#### CONCLUSION

In this study, by screening a natural products library, we identified gossypol as a potential anticancer agent targeting EGFR<sup>L858R/T790M</sup>. Our results show that gossypol inhibited the proliferation and induced apoptosis of a human NSCLC cell line harboring EGFR<sup>L858R/T790M</sup>. Moreover, gossypol decreased the phosphorylation of EGFR and of its downstream signaling pathways AKT and ERK. Overall, our findings indicate that gossypol is a novel potent EGFR<sup>L858R/T790M</sup> inhibitor that may serve as a useful therapeutic agent against NSCLC harboring the EGFR<sup>L858R/T790M</sup> mutation.

### MATERIALS AND METHODS

#### Reagents

Gossypol was purchased from Selleck Ltd. and dissolved in dimethyl sulfoxide (DMSO) to form a 20 mM stock solution. Fetal bovine serum (FBS), antibiotics and RPMI medium were purchased from Gibco (Carlsbad, CA, United States). RIPA lysis buffer and antibodies against Bad, Bcl-XL, PARP and cleaved caspase-3, as well as anti-p-EGFR (1068), anti-p-extracellular signal-regulated kinase 1/2 (Erk1/2) (Thr202/Tyr204), anti-p-Akt (Ser473), anti-Erk1/2, anti-Akt, anti-PERK, and anti-EGFR, were purchased from Cell Signaling Technology (Beverly, MA, United States). Anti-GAPDH was purchased from Santa Cruz (Dallas, TX, United States).
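Working concentrations are prepared from the 20 mM stock by dilution, which follows C1·V1 = C2·V2. The sketch below shows that calculation together with the resulting DMSO fraction; the final concentration and final volume in the example are hypothetical, and the function names are our own.

```python
def stock_volume_ul(stock_mm, final_um, final_volume_ul):
    """Volume of stock (µl) needed to reach final_um (µM) in final_volume_ul (µl)."""
    stock_um = stock_mm * 1000.0          # convert mM to µM
    return final_um * final_volume_ul / stock_um

def dmso_percent(stock_volume, final_volume):
    """Vehicle content (% v/v) when the stock is added directly to the medium."""
    return 100.0 * stock_volume / final_volume

# Hypothetical example: 20 µM gossypol in 1000 µl of medium from the 20 mM stock.
volume = stock_volume_ul(stock_mm=20.0, final_um=20.0, final_volume_ul=1000.0)
vehicle = dmso_percent(volume, 1000.0)
print(f"add {volume:.2f} ul stock; DMSO = {vehicle:.2f}% v/v")
```

In practice, intermediate dilutions are often used so that every well, including the vehicle control, receives the same DMSO fraction.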

#### Cell Culture

The human NSCLC cell line H1975 was purchased from the American Type Culture Collection (ATCC) (Manassas, VA, United States). Cells were cultured in RPMI 1640 medium supplemented with 10% FBS, 100 U/ml penicillin and 100 µg/ml streptomycin. All cells were cultured at 37 °C in a humidified atmosphere containing 5% CO<sub>2</sub>.

#### Cell Proliferation Assay

Cell viability was evaluated with the standard 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay. Briefly, 3 × 10<sup>3</sup> cells per well were plated in 96-well plates and cultured overnight to allow adhesion. The cells were treated with DMSO or various concentrations of gossypol for 72 h. Subsequently, 10 µL of MTT was added to each well and incubated for 4 h, and the dark blue crystals were then dissolved in 100 µl of dissolving solution (99% DMSO). Finally, the absorbance at 570 nm was measured with a microplate reader (Tecan, Morrisville, NC, United States). Cell viability was calculated relative to controls, with results based on at least three independent experiments. Cells treated with the vehicle (DMSO) alone served as the control.
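An IC<sub>50</sub> can be estimated from viability values expressed relative to the vehicle control. The sketch below uses log-linear interpolation between the two doses that bracket 50% viability, which is a simplification we chose for illustration; the text does not state the fitting method used, and nonlinear regression of a sigmoidal dose-response curve is the more common choice. The dose-response values are invented.

```python
import math

def ic50_by_interpolation(concs_um, viability_pct):
    """Estimate IC50 by interpolating on log10(concentration) around 50% viability.

    Concentrations must be positive (exclude the vehicle-only control).
    Returns None when the data never cross 50%.
    """
    points = sorted(zip(concs_um, viability_pct))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 50.0 >= v_hi and v_lo != v_hi:
            fraction = (v_lo - 50.0) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None

# Invented dose-response data for illustration only.
doses = [2.5, 5.0, 10.0, 20.0, 40.0]            # µM
viability = [95.0, 80.0, 55.0, 30.0, 10.0]      # % of vehicle control
estimate = ic50_by_interpolation(doses, viability)
print(f"interpolated IC50 = {estimate:.2f} uM")
```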

#### Colony Formation Assay

Briefly, H1975 cells were seeded in 6-well plates (1000 cells/well). After overnight attachment, cells were exposed to various concentrations of gossypol, with medium changes every 3 days, until visible colonies formed. The colonies were washed with cold PBS, fixed in 4% paraformaldehyde (PFA) for 15 min, and stained with 0.5% crystal violet solution (1% PFA, 0.5% crystal violet, and 20% methanol in ddH2O) for 20 min. The colonies were then photographed.

#### Apoptosis Analysis Assay

NSCLC cells were plated in 6-well plates at a density of 2 × 10<sup>5</sup> cells per well and cultured overnight for adhesion. The cells were then treated with different concentrations of gossypol for 24 h. After treatment, the cells were harvested by trypsin digestion, washed twice with ice-cold PBS, and resuspended in 100 µl of 1 × binding buffer. Next, 4 µl of propidium iodide (PI, 1 mg/ml) and 1 µl of Annexin-V fluorescein dye were added, and the suspension was mixed well and kept at room temperature in the dark for 15 min. The cells were then resuspended in 300 µl of 1 × binding buffer from BD Biosciences (San Jose, CA, United States). The percentage of apoptotic cells was quantified using a BD FACSAria III flow cytometer from BD Biosciences (San Jose, CA, United States).
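The percentage of apoptotic cells in an Annexin V/PI experiment is typically derived from a quadrant gate. The sketch below assumes the conventional interpretation (Annexin V positive and PI negative as early apoptotic, double positive as late apoptotic); the gating actually applied is not described in the text, and the cutoffs and event intensities here are invented.

```python
def classify_event(annexin, pi, annexin_cutoff, pi_cutoff):
    """Assign one event to a quadrant from its Annexin V and PI intensities."""
    annexin_positive = annexin >= annexin_cutoff
    pi_positive = pi >= pi_cutoff
    if annexin_positive and pi_positive:
        return "late apoptotic"
    if annexin_positive:
        return "early apoptotic"
    if pi_positive:
        return "necrotic"
    return "viable"

def apoptotic_percentage(events, annexin_cutoff, pi_cutoff):
    """Percentage of events that are early or late apoptotic."""
    labels = [classify_event(a, p, annexin_cutoff, pi_cutoff) for a, p in events]
    apoptotic = sum(label in ("early apoptotic", "late apoptotic") for label in labels)
    return 100.0 * apoptotic / len(labels)

# Invented (Annexin V, PI) intensities and cutoffs for illustration only.
events = [(50, 40), (900, 60), (1200, 1500), (80, 2000)]
percent = apoptotic_percentage(events, annexin_cutoff=300, pi_cutoff=500)
print(f"apoptotic cells: {percent:.1f}%")
```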

#### Enzyme-Linked Immunosorbent Assay (ELISA)

Kinase activity was evaluated with an ELISA based on the kinase domain of recombinant human dual-mutant EGFR (EGFR<sup>L858R/T790M</sup>) (Peng et al., 2014). Briefly, 96-well plates were precoated with 20 µg/mL Poly (Glu, Tyr) 4:1 (Sigma, St. Louis, MO, United States) as substrate. Active kinases were added and incubated with the indicated concentrations of gossypol in 1 × reaction buffer containing 5 µmol/L ATP at 37 °C for 1 h. The wells were then washed with PBS and incubated with an anti-phosphotyrosine (PY99) antibody (Santa Cruz Biotechnology, Santa Cruz, CA, United States), followed by a horseradish peroxidase (HRP)-conjugated secondary antibody. The wells were read with a multiwell spectrophotometer (VERSAmax™, Molecular Devices, Sunnyvale, CA, United States) at 492 nm. The inhibitory rate (%) was calculated with the following formula: [1 − (A<sub>492</sub> treated / A<sub>492</sub> control)] × 100%, and the corresponding EC<sub>50</sub> values were calculated from the fitted inhibition curves.
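The inhibitory-rate formula above translates directly into code. The sketch below applies it to invented absorbance readings; the function name and the example values are our own.

```python
def inhibitory_rate(a492_treated, a492_control):
    """Inhibitory rate (%) = [1 - (A492 treated / A492 control)] x 100."""
    return (1.0 - a492_treated / a492_control) * 100.0

# Invented absorbance readings at 492 nm for illustration only.
control_reading = 1.2
treated_readings = [1.2, 0.9, 0.6, 0.3]
rates = [inhibitory_rate(a, control_reading) for a in treated_readings]
for absorbance, rate in zip(treated_readings, rates):
    print(f"A492 = {absorbance:.2f} -> inhibition = {rate:.1f}%")
```

The EC<sub>50</sub> is then read from a curve fitted to these rates across the concentration series.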

#### Molecular Docking

The X-ray structure of EGFR<sup>L858R/T790M</sup> in complex with a diaminopyrimidine derivative, solved at a resolution of 2.5 Å, was retrieved from the Protein Data Bank [PDB ID code 4RJ8 (Hanan et al., 2014)] for docking with gossypol. Molecular structures were prepared using the standard procedure of the Protein Preparation Wizard module in Schrödinger 2015. The docking grid box was defined with the Receptor Grid Generation tool in Glide, centered on the native ligand in the EGFR<sup>L858R/T790M</sup> structure. The structure of gossypol was obtained from the PubChem database<sup>1</sup> and imported into the LigPrep module (Version 2.3, Schrödinger, LLC, New York, NY, United States) with the OPLS-2005 force field (Kaminski et al., 2001). The ionization state was assigned with Epik (Version 2.0, Schrödinger, LLC, New York, NY, United States) at a pH value of 7.0 ± 2.0. Gossypol was docked into the kinase domain of EGFR<sup>L858R/T790M</sup> using Glide (Version 5.5, Schrödinger, LLC, New York, NY, United States) in extra precision (XP) scoring mode. During docking, 5000 poses were generated in the initial phase of the calculation. The best binding pose of gossypol was retained for further analysis.

#### Western Blot Analysis

Whole-cell protein lysates for western blot analysis were prepared as follows. After treatment, cells were lysed in RIPA lysis buffer (150 mmol/L NaCl, 50 mmol/L Tris–HCl, pH 8.0, 1% Triton X-100, 0.1% SDS, and 1% deoxycholate) containing protease inhibitor cocktail from Roche (Basel, Lewes, United Kingdom) for 15 min on ice and then boiled for 10 min. The concentration of total protein was determined with a Bio-Rad DC™ Protein Assay Kit (Bio-Rad, Hercules, CA, United States). Equal amounts of total protein lysate (30 µg) were loaded, separated by 10% SDS–polyacrylamide gel electrophoresis, and transferred to a nitrocellulose (NC) membrane from Millipore (Billerica, MA, United States). The membranes were blocked with 5% non-fat milk in 1 × TBST for 2 h at room temperature and then incubated overnight at 4 °C with primary antibodies, including phospho-AKT, phospho-ERK, t-AKT, t-ERK, phospho-EGFR (Tyr1068) and t-EGFR at 1:1000 dilution and anti-GAPDH antibody at 1:800 dilution. After the membranes were washed three times in TBST (5 min each), fluorescent secondary antibodies, either anti-rabbit or anti-mouse depending on the source of the primary antibody, were added at 1:10,000 dilution at room temperature for 2 h. GAPDH was used as the loading control and for normalization. The signal intensity of the membranes was detected using an LI-COR Odyssey scanner (Belfast, ME, United States).

<sup>1</sup>http://pubchem.ncbi.nlm.nih.gov
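Normalization to the GAPDH loading control, as described for the western blots, amounts to dividing each band intensity by the GAPDH intensity of the same lane and then expressing the result relative to the untreated lane. The sketch below shows this with invented band intensities; the function names are our own, and the first lane is assumed to be the vehicle control.

```python
def normalize_to_loading_control(target_intensities, gapdh_intensities):
    """Divide each target band by the GAPDH band from the same lane."""
    return [t / g for t, g in zip(target_intensities, gapdh_intensities)]

def relative_to_first_lane(normalized):
    """Express normalized values relative to the first lane (assumed vehicle control)."""
    return [value / normalized[0] for value in normalized]

# Invented band intensities for four lanes, for illustration only.
p_egfr = [1000.0, 800.0, 450.0, 200.0]
gapdh = [500.0, 500.0, 450.0, 400.0]
normalized = normalize_to_loading_control(p_egfr, gapdh)
relative = relative_to_first_lane(normalized)
print([round(value, 2) for value in relative])
```

For phosphorylated proteins, the phospho signal is often also compared with the total protein band; that step is omitted here because the text names only GAPDH as the normalizer.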

#### Statistical Analysis

The results are expressed as mean ± standard error (SE). Statistical analysis was performed using one-way ANOVA followed by Bonferroni's post-tests. Significance was accepted at P < 0.05.
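The one-way ANOVA F statistic and the Bonferroni correction can be written out by hand to show what the test compares. The sketch below is a plain-Python illustration, not the routine of any particular statistics package; the data are invented, and the Bonferroni threshold assumes that all pairwise comparisons between groups are made, which the text does not specify.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def bonferroni_alpha(alpha, n_groups):
    """Per-comparison threshold when all pairwise comparisons are tested."""
    n_comparisons = n_groups * (n_groups - 1) // 2
    return alpha / n_comparisons

# Invented triplicate measurements for three treatment groups.
data = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(data)
threshold = bonferroni_alpha(0.05, len(data))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}; per-comparison alpha = {threshold:.4f}")
```

The P value is obtained by comparing F with the F distribution for the stated degrees of freedom, which statistics software handles directly.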

#### AUTHOR CONTRIBUTIONS

EL, LL, and XY conceived this research, led the project, and revised the manuscript. YW, HL, XF, FD, ZJ, QW, and LL carried out the experiments and analyzed the data. YW and XY wrote the manuscript. All authors reviewed the manuscript.

### FUNDING

This work was supported by Macau Science and Technology Development Fund (Project Nos. 082/2013/A3, 086/2015/A3, 082/2015/A3, 005/2014/AMJ, and 046/2016/A2).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.00728/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Wang, Lai, Fan, Luo, Duan, Jiang, Wang, Leung, Liu and Yao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Identification of a Novel Protein Arginine Methyltransferase 5 Inhibitor in Non-small Cell Lung Cancer by Structure-Based Virtual Screening

Qianqian Wang<sup>1</sup>†, Jiahui Xu<sup>1</sup>†, Ying Li<sup>1</sup>, Jumin Huang<sup>1</sup>, Zebo Jiang<sup>1</sup>, Yuwei Wang<sup>1</sup>, Liang Liu<sup>1</sup>\*, Elaine Lai Han Leung<sup>1,2,3</sup>\* and Xiaojun Yao<sup>1,4</sup>\*

<sup>1</sup> State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Taipa, Macau, <sup>2</sup> State Key Laboratory of Respiratory Diseases, Guangzhou Institute of Respiratory Disease, The First Affiliated Hospital of Guangzhou Medical College, Guangzhou, China, <sup>3</sup> Department of Respiratory Medicine, Taihe Hospital, Hubei University of Medicine, Hubei, China, <sup>4</sup> State Key Laboratory of Applied Organic Chemistry, Department of Chemistry, Lanzhou University, Lanzhou, China

#### Edited by:

Adriano D. Andricopulo, University of São Paulo, Brazil

#### Reviewed by:

Chiara Bianca Maria Platania, Università degli Studi di Catania, Italy; Matthew Brook, University of Edinburgh, United Kingdom

#### \*Correspondence:

Liang Liu, lliu@must.edu.mo; Elaine Lai Han Leung, lhleung@must.edu.mo; Xiaojun Yao, xjyao@must.edu.mo

†These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 31 October 2017 Accepted: 15 February 2018 Published: 01 March 2018

#### Citation:

Wang Q, Xu J, Li Y, Huang J, Jiang Z, Wang Y, Liu L, Leung ELH and Yao X (2018) Identification of a Novel Protein Arginine Methyltransferase 5 Inhibitor in Non-small Cell Lung Cancer by Structure-Based Virtual Screening. Front. Pharmacol. 9:173. doi: 10.3389/fphar.2018.00173

Protein arginine methyltransferase 5 (PRMT5) regulates gene transcription by catalyzing the symmetrical dimethylation of arginine residues of histones, which plays a key role in tumorigenesis. Many efforts have been made to discover small-molecule inhibitors of PRMT5, but very few have been reported, and most of them are SAM-competitive. EPZ015666 is a recently reported PRMT5 inhibitor that occupies a new binding site, distinct from the S-adenosylmethionine (SAM)-binding pocket. This new binding site provides a new clue for the design and discovery of potent and specific PRMT5 inhibitors. In this study, structure-based virtual screening targeting this site was first performed to identify potential PRMT5 inhibitors, and the bioactivity of the candidate compound was then studied. MTT results showed that compound T1551 decreased the viability of the A549 and H460 non-small cell lung cancer cell lines. By inhibiting the methyltransferase activity of PRMT5, T1551 reduced the global level of H4R3 symmetric dimethylation (H4R3me2s). T1551 also downregulated the expression of the oncogenes FGFR3 and eIF4E and disturbed the activation of the related PI3K/AKT/mTOR and ERK signaling pathways in A549 cells. Finally, we investigated the conformational space and identified collective motions important for describing the T1551/PRMT5 complex using molecular dynamics simulation and normal mode analysis. This study provides a novel non-SAM-competitive hit compound for developing small molecules targeting PRMT5 in non-small cell lung cancer.

Keywords: protein arginine methyltransferase 5, non-small cell lung cancer, T1551, virtual screening, molecular dynamics simulation

### INTRODUCTION

Protein arginine methyltransferases (PRMTs) are a class of enzymes that transfer a methyl group from the cofactor S-adenosylmethionine (SAM) to the omega nitrogen of arginine residues in substrate proteins. Based on product specificity, PRMTs can be divided into three subclasses, type I, II, and III, which asymmetrically dimethylate, symmetrically dimethylate, and monomethylate their substrates, respectively (Bedford and Clarke, 2009). Protein arginine methyltransferase 5 (PRMT5), a type II PRMT, catalyzes the symmetrical dimethylation of arginine residues of substrate proteins and has been implicated in diverse cellular and biological processes, including transcriptional regulation, RNA metabolism and ribosome biogenesis (Liu et al., 2011; Shilo et al., 2013; Wei et al., 2013; Yang and Bedford, 2013; Deuker and McMahon, 2014; Stopa et al., 2015). An increasing number of studies have shown that PRMT5 is upregulated in lymphomas, breast cancer, lung cancer, colorectal cancer, and glioblastoma (Ibrahim et al., 2014; Yan et al., 2014; Li et al., 2015; Sheng and Wang, 2016). For instance, Ibrahim et al. (2014) demonstrated that high cytoplasmic expression of PRMT5 was closely related to high-grade subtypes of primary lung adenocarcinomas and a poor prognosis. Sheng and Wang (2016) pointed out that PRMT5 could regulate multiple signaling pathways to promote lung cancer cell proliferation. Together, these findings suggest that PRMT5 is a promising therapeutic target in lung cancer. However, although many efforts have been made to discover PRMT5 inhibitors, very few have been reported (Alinari et al., 2015; Smil et al., 2015; Mao et al., 2017), and they either occupy the SAM-binding site or mimic SAM. Recently, EPZ015666 was shown to exhibit remarkable antitumor activity by inhibiting PRMT5, and pre-clinical studies showed that both cell lines and xenograft models of mantle cell lymphoma were sensitive to EPZ015666 (Chan-Penebre et al., 2015). Importantly, the resolved PRMT5-SAM-EPZ015666 crystal complex shows that EPZ015666 does not compete with SAM but sits in a new pocket of PRMT5, distinct from the SAM-binding site. This binding site offers a new route to the discovery and development of more potent and specific PRMT5 inhibitors.

Structure-based virtual screening using molecular docking has become a powerful tool in drug discovery for rapidly enriching hits from large compound databases. It has been successfully applied to discover novel inhibitors of epigenetic targets, such as SET7, KDM4B, and SIRT2 (Chu et al., 2014; Meng et al., 2015; Huang et al., 2017). The successful use of structure-based virtual screening against these epigenetic targets inspired us to search for novel inhibitors directed at the non-SAM-binding site of PRMT5, and to study the effects of the identified inhibitors on the biological functions of cancer cells, histone substrate methylation, target gene expression and related signaling pathways. Here, 158 candidate compounds were first obtained by structure-based virtual screening. MTT assay results showed that, among them, T1551 had the strongest cytotoxicity on the A549 non-small cell lung cancer cell line. In addition to inhibiting PRMT5 methyltransferase activity, a series of functional assays showed that T1551 reduced the symmetric dimethylation level of H4R3, downregulated the protein expression of two target genes of PRMT5, FGFR3 and eIF4E, and inhibited the activation of PI3K/AKT/mTOR and ERK signaling. Finally, molecular dynamics simulations and normal mode analysis were performed to study the detailed binding mode and conformational space of the T1551/PRMT5 complex. The identification of this novel PRMT5 inhibitor T1551 and the study of its inhibitory mechanism will be helpful for the development of PRMT5-targeting cancer treatment.

### MATERIALS AND METHODS

### Molecular Docking-Based Virtual Screening

Molecular docking-based virtual screening was carried out with the Schrödinger software package (Schrödinger, LLC, New York, NY, United States; Schrödinger, 2015). The crystal structure of PRMT5 complexed with the cofactor SAM and the inhibitor EPZ015666 was obtained from the Protein Data Bank (PDB ID: 4X61). The protein was first prepared in the Protein Preparation Wizard module, which included adding hydrogens, refining loop regions and minimization. The grid box was generated based on the size and center of EPZ015666. Previously, SAM was shown to form crucial cation–π interactions with EPZ015666 and to contribute to the binding affinity of PRMT5 inhibitors (Chan-Penebre et al., 2015). Here, to test the role of SAM in docking, enrichment factors (EFs) of virtual screening for PRMT5-EPZ015666 with and without SAM were calculated and compared. First, 16 active derivatives of EPZ015666 were collected from the published paper (Duncan et al., 2015). Eight hundred decoys were then generated with DUD-E at an active-to-decoy ratio of 1:50 (Mysinger et al., 2012). All actives and decoys were docked into the EPZ015666 binding site of PRMT5 with and without SAM. Finally, the 1 and 10% EFs for the PRMT5-EPZ015666 and PRMT5-EPZ015666-SAM models were calculated. For the ligands, prior to virtual screening, a total of 1,671,908 compounds from the Chemdiv, Specs and TargetMol databases were filtered by pan-assay interference structures (PAINS) (Baell and Holloway, 2010) and "Lipinski's rule of five" to remove compounds prone to false positives, compounds with problematic functional groups, and compounds with poor absorption or permeability. The remaining compounds were then prepared with the Ligand Preparation module. Three-level (HTVS, SP, and XP) molecular docking-based virtual screening was performed successively using the Glide module. The top 10% (1,706) compounds ranked by Glide score were clustered into 200 groups. By visually inspecting the binding poses of the PRMT5-inhibitor complexes, 158 compounds were selected for experimental validation. All compounds were purchased from Topscience (Shanghai, China).
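The enrichment factor comparison described above can be illustrated with a short calculation. The sketch below is not the authors' script: the function name, the rounding of the top fraction, and the synthetic scores are assumptions made for illustration. It uses the common definition in which the hit rate among the top-ranked fraction is divided by the hit rate in the whole library, and it assumes that more negative docking scores are better.

```python
def enrichment_factor(scores, labels, fraction):
    """Enrichment factor at a given top fraction of a ranked library.

    scores: docking scores, assumed more negative = better.
    labels: 1 for an active, 0 for a decoy, in the same order as scores.
    fraction: top fraction of the library to inspect, e.g. 0.01 for 1%.
    """
    n_total = len(scores)
    n_actives = sum(labels)
    # Size of the selected subset; the rounding convention is an assumption.
    n_top = max(1, round(n_total * fraction))
    # Sort ascending so that the best-scoring compounds come first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# Synthetic library with the benchmark's composition: 16 actives, 800 decoys.
# The scores are invented so that every active outranks every decoy.
labels = [1] * 16 + [0] * 800
scores = [-10.0 - 0.01 * i for i in range(16)] + [-5.0 + 0.001 * i for i in range(800)]

ef_1 = enrichment_factor(scores, labels, 0.01)
ef_10 = enrichment_factor(scores, labels, 0.10)
```

With 816 compounds in total and this rounding, a perfect ranking gives the upper bounds for EF at 1% and 10%, which is a useful yardstick when reading reported EF values for a library of this size.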

### Cell Culture and Cytotoxicity Assay

A549 and H460 cells (two non-small cell lung cancer cell lines) were purchased from ATCC, cultivated in RPMI 1640 medium supplemented with 10% FBS (Gibco Products, Big Cabin, OK, United States) and 1% penicillin-streptomycin solution, and maintained at 37 °C in an incubator with 5% CO<sub>2</sub>. The 158 compounds from virtual screening were dissolved in DMSO and stored at −40 °C. To rapidly identify compounds with strong inhibitory activity, each compound was first used at 20 µM to treat the A549 cell line for 72 h. For the MTT assay, cells were seeded on a 96-well microplate at 3,000 cells/well, cultured overnight for cell adhesion, and treated with DMSO (10.0 µM) or various concentrations (2.5, 5.0, and 10.0 µM) of the studied compound for 24, 48, and 72 h. Then, 10 µL MTT (5 mg/mL) was added to each well and incubated for 4 h at 37 °C, followed by the addition of 100 µL acidic isopropanol (10% SDS and 0.01 mol/L HCl). Finally, the absorbance at 570 nm was measured with a Microplate Reader (Tecan US, Inc., Morrisville, NC, United States). Cell viability was calculated relative to untreated controls, and the results were based on at least three independent experiments.
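As a minimal sketch of how viability relative to untreated controls can be turned into an IC<sub>50</sub> estimate, the snippet below normalizes absorbance readings and interpolates on a log-concentration scale between the two doses that bracket 50% viability. The text does not state how the IC<sub>50</sub> values were fitted, so the interpolation, the function names, and the example numbers are all assumptions for illustration only.

```python
import math


def percent_viability(a_treated, a_control, a_blank=0.0):
    """Viability (%) of a treated well relative to the untreated control.

    a_blank is an optional background absorbance (hypothetical parameter).
    """
    return 100.0 * (a_treated - a_blank) / (a_control - a_blank)


def ic50_log_interpolation(concentrations, viabilities):
    """Estimate IC50 by log-linear interpolation around 50% viability.

    concentrations: ascending, strictly positive doses (e.g. in µM).
    viabilities: percent viability measured at each dose.
    Returns None when no pair of adjacent doses brackets 50%.
    """
    points = list(zip(concentrations, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 50.0 >= v_hi and v_lo != v_hi:
            # Fraction of the way from the lower to the higher dose.
            frac = (v_lo - 50.0) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_ic50
    return None


# Invented readings at the three non-zero doses used in the assay.
doses_um = [2.5, 5.0, 10.0]
viability = [80.0, 60.0, 40.0]
ic50_estimate = ic50_log_interpolation(doses_um, viability)
```

A full dose-response fit (for example a four-parameter logistic model) would normally be preferred when more concentrations are available; the interpolation here only shows the arithmetic.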

#### In Vitro Enzymatic Assays

The PRMT5 enzymatic assay was carried out by Shanghai ChemPartner Company (998 Halei Road, Pudong New Area, Shanghai, 201203, China), as described previously by Ji et al. (2017). To obtain the IC<sub>50</sub> value, T1551 was diluted to 10 concentrations. PRMT5 protein was purchased from BPS Bioscience (Cat. No. 51045), and SAM/SAH were purchased from Sigma (Cat. Nos. A7007-100MG and A9384-25MG). T1551 was prepared as a 10 mM stock in DMSO and diluted to the final concentration in DMSO. PRMT5 and substrates were incubated with the indicated concentrations of T1551 in a 384-well plate for 60 min at room temperature. Then, acceptor and donor solutions were added to label the residual substrates of PRMT5. Labeling lasted for 60 min at room temperature, followed by endpoint reading on an EnSpire instrument in Alpha mode. In the in vitro enzymatic assays, 1% DMSO was used as the vehicle control for normalization.

### Western Blot Analysis

Cells were washed twice with cold PBS and lysed in RIPA lysis buffer containing protease and phosphatase inhibitors to extract total protein. Cell lysates were centrifuged for 5 min (12,000 g, 4 °C), and the supernatant was collected. Protein concentrations were determined with the Bio-Rad Protein Assay kit (Bio-Rad, Philadelphia, PA, United States). Equal amounts of protein (50 µg) were separated on a 10% SDS–PAGE gel and transferred to a nitrocellulose (NC) membrane at 300 mA and 4 °C for 1 h. The membrane was incubated with primary antibody (1:1000), and then with a fluorescence-conjugated secondary antibody (1:10000). The primary antibody against PRMT5 was purchased from Merck Millipore Ltd. (Germany); antibodies against H4R3me2s and H4 were purchased from Abcam (Cambridge, MA, United States); antibodies against FGFR3 and eIF4E were purchased from Santa Cruz Biotechnology (Dallas, TX, United States); antibodies against total/phospho-AKT, total/phospho-ERK and total/phospho-mTOR were purchased from Cell Signaling Technology (Danvers, MA, United States). GAPDH was used as the loading control and for normalization. The signal intensity of the membranes was detected with a LI-COR Odyssey Scanner (Belfast, ME, United States).

#### Molecular Dynamics Simulation

To reveal the interaction features of T1551 and PRMT5, molecular dynamics (MD) simulations were used to sample the conformational space of the PRMT5-T1551 complex. Normal mode analysis was used to identify important collective motions of the complex. All MD simulations were performed with Amber 16 (Case et al., 2017). The Amber ff14SB force field (Maier et al., 2015) was used for PRMT5, and the general Amber force field (Wang et al., 2004) was used to parameterize the inhibitors, with their charges assigned as restrained electrostatic potential (RESP) partial charges. TIP3P water was used to solvate the complex systems, with the solute at least 12 Å away from the water box boundary. Chloride ions were added to neutralize the system. Then, 150 mM NaCl was added to mimic physiological conditions. After minimization, heating and equilibration, a 100 ns production run was carried out without any restraints in the NPT ensemble. System temperature and pressure were regulated with a Langevin thermostat and a Berendsen barostat, respectively. All bonds involving hydrogen were constrained with the SHAKE algorithm, allowing an integration time step of 2 fs. The particle mesh Ewald method (Linse and Linse, 2014) was used to calculate long-range electrostatic interactions. The binding free energy of the inhibitors with PRMT5 was calculated by the molecular mechanics generalized-Born surface area (MM-GBSA) method (Hou et al., 2010; Platania et al., 2015; Wang et al., 2017). A single-trajectory protocol with three time frames was adopted: 500 snapshots were extracted from each of the last 10, 20, and 40 ns of the trajectory. Normal mode analysis was performed to identify the collective motions of the PRMT5-inhibitor complex during the MD simulation, using cpptraj in Amber 16 and the Normal Mode Wizard plugin in VMD 1.9.
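As a reading aid, the MM-GBSA estimate used here can be written out explicitly. This is a sketch of the usual single-trajectory form rather than a transcription of the authors' setup: the angle brackets denote an average over the extracted snapshots, the term names follow the footnote to Table 1, the entropic term is omitted as stated in the Results, and whether SAM is counted as part of the receptor is an assumption not spelled out in the text.

```latex
\Delta G_{\mathrm{bind}} \approx \Delta E_{\mathrm{ele}} + \Delta E_{\mathrm{vdw}}
  + \Delta G_{\mathrm{sol\_polar}} + \Delta G_{\mathrm{sol\_np}},
\qquad
\Delta X = \big\langle X_{\mathrm{complex}} - X_{\mathrm{receptor}} - X_{\mathrm{inhibitor}} \big\rangle_{\mathrm{snapshots}}
```

Because the same trajectory supplies the complex, receptor and inhibitor coordinates, intramolecular terms cancel and only interaction and solvation differences remain in each component.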

### Statistical Analysis

Descriptive analytical data are presented as mean ± SEM. Multiple comparisons were evaluated by one-way analysis of variance (ANOVA) using GraphPad Prism 5.0. P < 0.05 was considered statistically significant.
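For readers who want to reproduce the summary statistics outside a statistics package, the following sketch computes mean ± SEM and the one-way ANOVA F statistic with the Python standard library. The function names and example values are illustrative assumptions; converting F into a P value requires the F distribution, which the standard library does not provide, so that step is left out.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group of replicates."""
    return mean(values), stdev(values) / sqrt(len(values))


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA across several groups of replicates."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented replicate values for three treatment groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = one_way_anova_f(groups)
group_mean, group_sem = mean_sem(groups[0])
```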

## RESULTS

#### The Selection of Candidate Compounds by Virtual Screening

In this study, we aimed to find non-SAM mimics, so the EPZ015666-binding site, not the SAM-binding site, was targeted in our virtual screening. Enrichment factor calculations showed that the 1 and 10% EFs for the PRMT5-EPZ015666-SAM model were 44.6 and 8.7, higher than those for the PRMT5-EPZ015666 model (38.3 and 6.8). The area under the receiver operating characteristic curve (AUC) for the former (0.96) was also higher than that for the latter (0.92). Both metrics suggested that SAM helps enrich active compounds from the compound library. Therefore, SAM was retained as part of the receptor in the screening.
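The AUC reported above can be understood as the probability that a randomly chosen active is ranked ahead of a randomly chosen decoy. The sketch below computes it in that pairwise form; it is an illustration with invented scores, not the authors' procedure, and it assumes that a lower (more negative) docking score means a better rank.

```python
def roc_auc(scores, labels):
    """ROC AUC from docking scores via pairwise active/decoy comparisons.

    scores: docking scores, assumed lower = better.
    labels: 1 for an active, 0 for a decoy.
    Ties between an active and a decoy count as half a win.
    """
    actives = [s for s, label in zip(scores, labels) if label == 1]
    decoys = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Invented four-compound example: two actives, two decoys.
auc = roc_auc([-9.0, -8.0, -7.0, -6.0], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from decoys.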

FIGURE 1 | Cytotoxic effects of (A) T1551, (B) 3039-0164, (C) T2002, and (D) T1090 on A549 cells, as analyzed by MTT assay. A549 cells were treated with each inhibitor for 72 h. Results are presented as mean ± SEM (n = 4). Glide score represents the docking score of the inhibitor with PRMT5, and ΔG<sub>MM-GBSA</sub> represents the post-docking rescore of the inhibitor with PRMT5.

FIGURE 2 | Cytotoxic effects of T1551 on (A) A549 and (B) H460 cells by MTT assay. (C) IC<sub>50</sub> values of T1551 on A549 and H460 cell lines. Cells were treated with the inhibitor for 24, 48, and 72 h. Data are presented as mean ± SEM (n = 4).

FIGURE 3 | (A) Inhibition of PRMT5 methyltransferase activity by T1551. (B) Protein expression levels of H4R3me2s in A549 cells treated with T1551 at different concentrations (0, 2.5, 5.0, and 10.0 µM). (C) Densitometric analysis of band intensities of H4R3me2s. Western blot analysis was performed after 24 h of treatment, with at least three independent experiments. Data are presented as mean ± SEM (n = 3), with ∗∗p < 0.01 for comparison between the control group (DMSO-treated group) and the T1551-treated group.

After the three-level (HTVS, SP, and XP) screening, the top 1,706 compounds ranked by Glide score were retained and then clustered into 200 groups using the k-means clustering protocol integrated in Canvas 2.4. When selecting the candidate compounds, the following criteria were considered: (1) choosing at most one compound per group to retain structural diversity; (2) occupying the binding pocket with a molecular size neither too big nor too small; (3) choosing the compound with the smaller molecular weight and/or lower MM/GBSA score if compounds are similar; (4) forming the reported interactions with the key residues of PRMT5 (Chan-Penebre et al., 2015). For instance, Phe327 forms π–π interactions with the THIQ ring of EPZ015666; THIQ forms cation–π interactions with the methyl group of SAM; EPZ015666 interacts with the backbone -NH of Phe580 and the side chain of Glu444. Based on these criteria, 158 candidates were finally selected and purchased.
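Selection criterion (1), keeping at most one compound per cluster, can be sketched in a few lines. In the study the final choice relied on visual inspection and several criteria at once; the snippet below only illustrates the diversity step, and using the best docking score as the tiebreak, the record layout, and the compound identifiers are all hypothetical.

```python
def best_per_cluster(compounds):
    """Keep the best-scoring compound from each cluster.

    compounds: iterable of (compound_id, cluster_id, score) tuples,
    where a lower score is assumed to be better.
    Returns a dict mapping cluster_id to the retained compound_id.
    """
    best = {}
    for compound_id, cluster_id, score in compounds:
        # Replace the current pick only if this compound scores strictly better.
        if cluster_id not in best or score < best[cluster_id][1]:
            best[cluster_id] = (compound_id, score)
    return {cluster_id: pick[0] for cluster_id, pick in best.items()}


# Hypothetical records: (identifier, cluster index, docking score).
library = [
    ("cmpd_a", 0, -7.2),
    ("cmpd_b", 0, -8.1),
    ("cmpd_c", 1, -6.5),
]
picked = best_per_cluster(library)
```

Because some clusters may contain no compound that satisfies the remaining criteria, the number of selected candidates can be smaller than the number of clusters, as in the 158 candidates drawn from 200 groups.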

### T1551 Decreases Cell Viability of A549 Cell

The 158 candidate compounds were then tested by MTT assay to determine their inhibitory activity. Many recent studies have shown that PRMT5 is upregulated in the A549 non-small cell lung cancer cell line (Gu et al., 2012; Wei et al., 2012; Lim et al., 2014), so this cell line was used here. To rapidly identify compounds with strong inhibitory activity, each compound was first used at 20 µM to treat A549 cells for 72 h. Four of the 158 compounds showed >50% inhibition of A549 cells at 20 µM. T1551 had the strongest inhibitory activity (72 h, half-maximal inhibitory concentration IC<sub>50</sub> = 5.8 ± 1.0 µM; **Figure 1**) and was chosen as the hit. A range of T1551 concentrations (0, 2.5, 5.0, and 10.0 µM) was then applied to A549 cells for 24, 48, and 72 h to calculate its IC<sub>50</sub> values. As shown in **Figure 2**, T1551 exhibited significant, concentration-dependent anti-proliferative activity on A549 cells at 24 h, with an IC<sub>50</sub> value of 11.2 ± 2.5 µM. The cytotoxic effects of T1551 were also verified using H460 cells, another NSCLC cell line with PRMT5 overexpression (**Figures 2B,C**).

### T1551 Inhibits PRMT5 Methyltransferase Activity and Decreases Symmetric Dimethylation Level of Histone 4

An AlphaLISA assay was carried out to investigate the influence of T1551 on the enzymatic activity of PRMT5. As shown in **Figure 3A**, T1551 inhibited PRMT5 enzyme activity in a dose-dependent manner. The corresponding IC<sub>50</sub> value was 34.1 ± 2.8 µM, suggesting that T1551 directly inhibited the methyltransferase function of PRMT5. PRMT5-driven methylation of arginine residues can lead to symmetric dimethylation of arginine residue 3 of histone 4 (H4R3me2s), which in turn alters chromatin structure to promote transcriptional repression (Branscombe et al., 2001; Zhao et al., 2009; Chen et al., 2017). To investigate the effect of T1551 on this PRMT5 catalytic product, we measured the level of H4R3me2s with and without T1551 in A549 cells. Total H4 was used as the loading control. As shown in **Figures 3B,C**, the global level of H4R3me2s was notably decreased after treatment with T1551 for 24 h. Therefore, from the perspective of the histone substrate, T1551 indeed inhibited the catalytic ability of PRMT5 methyltransferase.

### T1551 Downregulates the Expression of PRMT5 Target Genes

PRMT5 exerts its function by regulating the expression of target genes, such as the oncogenes FGFR3 and eIF4E (Zhang et al., 2015). FGFR3 and eIF4E were previously reported to be frequently overexpressed in lung cancer, myeloma, and ovarian cancers (van Rhijn et al., 2001; De Benedetti and Graff, 2004; Culjkovic-Kraljacic et al., 2012), and thus play an important role in tumor occurrence and development. In particular, according to several recently published studies (Desai and Adjei, 2016; Babina and Turner, 2017), FGFR signaling is considered a promising target for lung cancer therapy. As can be seen from **Figure 4**, FGFR3 and eIF4E expression was significantly decreased in A549 cells treated with 10.0 µM T1551. This suggests that T1551 may reduce FGFR3 and eIF4E expression by inhibiting PRMT5.

### T1551 Suppresses the Activation of AKT, ERK, and mTOR

As mentioned above, FGFR3 signaling is an important target for lung cancer treatment. In the FGFR3 pathway, PRMT5 participates in regulating FGFR3 downstream targets such as AKT, ERK, and mTOR (Wei et al., 2012). A previous study showed that silencing PRMT5 could reduce FGFR3 expression, leading to the repression of AKT and ERK and the subsequent inhibition of mTOR through the AKT/mTOR or ERK pathway (Zhang et al., 2015).

To gain further insight into the molecular mechanism underlying PRMT5-dependent regulation of FGFR3, we examined whether T1551 could regulate the activation of AKT, ERK, and mTOR through inhibiting PRMT5. From **Figures 5A,B**, we observed that the protein levels of phosphorylated AKT and ERK were significantly reduced, especially at the 10 µM T1551 concentration, implying that T1551 suppressed the activation of PI3K/AKT/mTOR and ERK signaling mediated by PRMT5.

#### Inhibition Mechanism of T1551 Inhibitor for PRMT5 Protein

To investigate the detailed binding modes of the PRMT5-inhibitor complexes and compare the interaction features of T1551 and EPZ015666 with PRMT5, a single 100 ns MD simulation was performed for each of the PRMT5-SAM-T1551 and PRMT5-SAM-EPZ015666 systems. Based on the obtained trajectories, the root-mean-square deviations (RMSDs) of the protein CA atoms in the PRMT5-SAM-T1551 and PRMT5-SAM-EPZ015666 systems, relative to the initial structure, were monitored to assess the overall stability of the simulations. As shown in **Figure 6a**, the RMSDs of both systems remained almost stable from 60 ns onward, indicating convergence of the simulated trajectories. Calculating the binding free energies of PRMT5 with T1551 and EPZ015666 allows the energetic origin of inhibitor binding to PRMT5 to be identified. Here, considering the large size of the PRMT5-inhibitor complex (more than 600 residues, **Figure 6b**), the entropic contribution was neglected. The predicted ΔG<sub>GB</sub> for PRMT5-T1551 was higher (less favorable) than that of PRMT5-EPZ015666 (e.g., −32.11 ± 0.14 vs. −40.09 ± 0.18 kcal/mol in the last 10 ns) across the three time frames, giving a ranking consistent with the experimental results (**Table 1**; Chan-Penebre et al., 2015). Among the individual energy terms, the van der Waals interaction (ΔE<sub>vdw</sub>) dominated the total energy in both systems, while the non-polar solvation term (ΔG<sub>sol\_np\_GB</sub>) contributed marginally to inhibitor binding. Therefore, the binding of T1551 and EPZ015666 to PRMT5 is driven mainly by ΔE<sub>vdw</sub>.
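The stability check described above rests on the RMSD of CA atoms relative to the initial structure. The following sketch shows the underlying arithmetic on plain coordinate lists. It assumes that each frame has already been superimposed on the reference (the fitting step is omitted), and the function names and toy coordinates are illustrative rather than taken from the study, which used the Amber tools.

```python
from math import sqrt


def rmsd(coords, reference):
    """Root-mean-square deviation between two pre-aligned coordinate sets.

    coords, reference: lists of (x, y, z) tuples for the same atoms,
    in the same order and in the same units (e.g. Å).
    """
    squared = sum(
        (x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
        for (x, y, z), (xr, yr, zr) in zip(coords, reference)
    )
    return sqrt(squared / len(reference))


def rmsd_series(frames, reference):
    """RMSD of every trajectory frame against the reference structure."""
    return [rmsd(frame, reference) for frame in frames]


# Toy two-atom system: the second frame moves one atom by 2 Å along z.
initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frames = [initial, [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]]
series = rmsd_series(frames, initial)
```

A series that levels off and fluctuates around a constant value in the later part of the trajectory is what the text describes as a stable RMSD.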

TABLE 1 | The calculated binding free energy and its components (kcal/mol) of PRMT5 with T1551 and EPZ015666 complexes based on the last 10, 20, and 40 ns MD trajectory.

ΔG was estimated from the gas-phase energy and the solvation free energy. The former contains an electrostatic term (ΔE<sub>ele</sub>) and a van der Waals term (ΔE<sub>vdw</sub>). The latter is decomposed into polar (ΔG<sub>sol\_polar</sub>) and non-polar (ΔG<sub>sol\_np</sub>) solvation energy.

Clustering analysis was used to extract representative structures from the simulations. Comparing the binding modes of T1551 and EPZ015666 with PRMT5 (**Figures 6c,d**), we could see that both inhibitors were located in a hydrophobic pocket composed of Tyr304, Phe327, Ser578, and Phe580 when interacting with PRMT5. For EPZ015666, **Figure 6d** shows that its THIQ group formed strong cation–π interactions with the partially positively charged methyl group of SAM. This feature has been reported as a key factor for the efficiency of EPZ015666 in a previous study (Chan-Penebre et al., 2015). Although T1551 lacks the THIQ group, the phenyl ring of its indole scaffold also formed cation–π interactions with SAM, which explains the inhibitory activity of T1551 against PRMT5 to some extent. Meanwhile, the pyrrole ring of the indole group in T1551 formed π–π interactions with Phe327. T1551 also formed a hydrogen bond with the main-chain oxygen atom of Ser578. Together, these interactions strengthen the binding of T1551 to PRMT5.

Finally, to examine the effect of the inhibitors on the conformational space of PRMT5, normal mode analysis was carried out. For clear visualization, only the normal modes of the T1551 binding domain (10 Å around T1551) are shown here. From **Figure 7**, it can be observed that the partial collective motion of EPZ015666 was opposite to that of T1551 during the simulation. As for PRMT5, the most obvious differences between the two complexes were found in helix residues 310–319 and loop residues 290–299. In the PRMT5-EPZ015666 system (**Figure 7b**), the helix and loop moved toward each other (face–face direction), which appeared to tighten the binding pocket and thus stabilize EPZ015666 within it. **Figure 7b** also shows that the markedly higher-amplitude motion of the loop domain made the major contribution to this tightening. In contrast, in the PRMT5-T1551 system (**Figure 7a**), the helix and loop moved away from each other (back–back direction), which left the pocket less compact than in the PRMT5-EPZ015666 system. This difference may be closely associated with the better biological activity of EPZ015666 against PRMT5 compared with T1551.

#### DISCUSSION

PRMT5, currently the only known type II PRMT, is also one of the PRMT family members with the fewest reported inhibitors. As the relationship between PRMT5 and lung cancer is increasingly revealed, effective inhibitors targeting PRMT5 are urgently needed for lung cancer therapy. SAM, the natural cofactor of PRMT5, provides the methyl group in the methyl transfer process. To date, most reported PRMT5 inhibitors target the SAM-binding site and are designed to disturb the interaction of SAM and PRMT5 (Alinari et al., 2015; Smil et al., 2015; Mao et al., 2017). However, because SAM binds in its native state, it is difficult to find small molecules that bind this site more strongly than SAM. Fortunately, the discovery of EPZ015666 and its new binding site provides a new clue for developing non-SAM-competitive inhibitors.

In this study, we identified T1551 as a non-SAM-competitive PRMT5 inhibitor by virtual screening. Subsequently, the anticancer activity of T1551 against NSCLC was studied from three aspects, namely PRMT5 methyltransferase activity, expression of target genes, and the signaling pathways mediated by those target genes. For the first aspect, the direct, "on-target" inhibitory effect of T1551 was reflected in the reduced PRMT5 enzymatic activity, and the indirect effect in the reduced level of the PRMT5 histone mark (H4R3me2s), which together suggested that T1551 inhibited PRMT5 methyltransferase activity.

For the latter two aspects, we focused on FGFR3 and eIF4E, two target genes of PRMT5. The PI3K/AKT/mTOR pathway is a prototypic survival pathway in cancers, whose activation is closely related to cellular proliferation, growth, and mobility. FGFR3 promotes the survival of cancer cells by stimulating the downstream PI3K/AKT/mTOR pathway (Kang et al., 2007; Hafner et al., 2010). Using RNA interference technology, Zhang et al. (2015) revealed that silencing PRMT5 could significantly downregulate FGFR3 and eIF4E expression. In our study, T1551 was likewise shown to reduce the protein expression of the oncogenes FGFR3 and eIF4E via inhibition of PRMT5. Although the change in phosphorylated mTOR was not significant, possibly due to the amplification effect of the signaling cascade, the concurrent reduction of phosphorylated AKT and ERK indicated that T1551 blocked the activation of the PI3K/AKT/mTOR and ERK pathways in the NSCLC cell line.

Previous studies emphasized that the cation–π interaction between the tetrahydroisoquinoline group of EPZ015666 and the partially positively charged methyl group of SAM was essential for the higher competitive ability of EPZ015666 for PRMT5 relative to the histone substrate (Chan-Penebre et al., 2015; Duncan et al., 2015). When SAM was replaced with SAH, the binding affinity of EPZ015666 for PRMT5 decreased more than 100-fold. Because of the importance of this feature, the cation–π interaction has always been retained as a crucial interaction in the subsequent structural optimization of EPZ015666 (Duncan et al., 2015). By comparing the binding modes of T1551 and EPZ015666 with PRMT5-SAM, we observed that the conformation of T1551 in the new PRMT5 pocket was similar to that of EPZ015666. Importantly, the benzene ring of the T1551 indole scaffold also formed strong cation–π interactions with the methyl group of SAM. This explains the inhibitory activity of T1551 against PRMT5 to some extent.

In summary, a novel PRMT5 inhibitor T1551 with the indole scaffold was identified in this study, whose functional influence on PRMT5 was verified by a series of biological assays and whose theoretical inhibitory basis on PRMT5 was revealed by molecular dynamics simulation. These results provide a lead compound for the further design of PRMT5 inhibitors, and contribute to the development of PRMT5-targeting cancer treatment.

### AUTHOR CONTRIBUTIONS

XY, EL, and LL conceived the project. XY, EL, and QW designed the experiments. QW, JX, YL, JH, ZJ, and YW carried out the research and data analysis. XY, EL, LL, and QW wrote the paper.

### FUNDING

This work was supported by Macao Science and Technology Development Fund (Project Nos: 046/2016/A2, 086/2015/A3, and 005/2014/AMJ).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Wang, Xu, Li, Huang, Jiang, Wang, Liu, Leung and Yao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Discovery of Potent Disheveled/Dvl Inhibitors Using Virtual Screening Optimized With NMR-Based Docking Performance Index

Kiminori Hori<sup>1</sup>, Kasumi Ajioka<sup>2</sup>, Natsuko Goda<sup>1</sup>, Asako Shindo<sup>3</sup>, Maki Takagishi<sup>4</sup>, Takeshi Tenno<sup>1,5</sup> and Hidekazu Hiroaki<sup>1,2,5</sup>\*

<sup>1</sup> Laboratory of Structural Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Nagoya University, Nagoya, Japan, <sup>2</sup> Department of Biological Science, School of Science, Nagoya University, Nagoya, Japan, <sup>3</sup> Division of Biological Science, Graduate School of Science, Nagoya University, Nagoya, Japan, <sup>4</sup> Department of Pathology, Graduate School of Medicine, Nagoya University, Nagoya, Japan, <sup>5</sup> BeCellBar LLC, Business Incubation Center, Nagoya University, Nagoya, Japan

#### Edited by:

Leonardo G. Ferreira, Universidade de São Paulo, Brazil

#### Reviewed by:

Takuma Sugi, Shiga University of Medical Science, Japan Matthieu Sainlos, CNRS UMR 5297/University of Bordeaux, France

#### \*Correspondence:

Hidekazu Hiroaki hiroaki.hidekazu@f.mbox.nagoya-u.ac.jp

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 29 April 2018 Accepted: 10 August 2018 Published: 05 September 2018

#### Citation:

Hori K, Ajioka K, Goda N, Shindo A, Takagishi M, Tenno T and Hiroaki H (2018) Discovery of Potent Disheveled/Dvl Inhibitors Using Virtual Screening Optimized With NMR-Based Docking Performance Index. Front. Pharmacol. 9:983. doi: 10.3389/fphar.2018.00983

Most solid tumors have their own cancer stem cells (CSCs), which are resistant to standard chemotherapies. Recent reports have described that the Wnt pathway plays a key role in self-renewal and tumorigenesis of CSCs. Within the Wnt/β-catenin pathway, Dvl (mammalian Disheveled) is an attractive target for drug discovery. After analyzing the PDZ domain of human Dvl1 (Dvl1-PDZ) using NMR, we subjected it to preliminary NMR titration studies with 17 potential PDZ-binding molecules including CalBioChem-322338, a commercially available Dvl PDZ domain inhibitor. Next, we performed virtual screening (VS) using the program GOLD with nine parameter sets. Results were evaluated using the NMR-derived docking performance index (NMR-DPI). The GOLD parameter set showing the best NMR-DPI was selected and used for the second VS against 5,135 compounds. The second docking trial identified more than 1,700 compounds that exhibited higher scores than CalBioChem-322338. In subsequent NMR titration experiments with Dvl1-PDZ, five new candidate molecules (NPL-4001, 4004, 4011, 4012, and 4013) produced larger chemical shift changes than CalBioChem-322338. Finally, these compounds showed partial proliferation inhibition activity against BT-20, a triple negative breast cancer (TNBC) cell line. These compounds are promising Wnt pathway inhibitors that are potentially useful for anti-TNBC therapy.

Keywords: Wnt signaling, protein–protein interaction inhibitor, NMR-derived docking performance index, virtual screening, triple negative breast cancer

### INTRODUCTION

Poor therapeutic outcomes of chemotherapy against several solid tumors pose a challenge to antitumor drug discovery and development. Cancer stem cells (CSCs) are believed to have a pivotal role in malignancy, survival against chemotherapy, and self-renewal of those tumors (Tannishtha et al., 2001). Consequently, CSCs are attractive targets for cancer chemotherapy development (Visvader and Lindeman, 2008). The Wnt/β-catenin pathway, along with the Notch and Hedgehog pathways, is important in several CSCs (Reya and Clevers, 2005). In fact, the Wnt pathway has been commonly regarded as the key signaling pathway of self-renewal and anti-differentiation of normal tissue stem cells. Accordingly, proliferation and self-renewal of several CSCs have been demonstrated to depend on the Wnt pathway. For that reason, the Wnt pathway is an attractive target for anti-CSC chemotherapy (Takebe et al., 2011; Holland et al., 2013).

The Wnt/β-catenin pathway is activated by Wnt ligands. Frizzled (Fzd)/LRP co-receptors coordinately bind Wnt, and transduce the signal to cytosolic downstream components including Axin, APC, GSK3β, and CK1. Accordingly, a transcription factor β-catenin is accumulated to induce target gene expression. This signaling system is carried out by a constitutive process of proteasomal degradation of β-catenin at the "Wnt-off " state. Specifically, β-catenin degradation is initiated by phosphorylation by GSK3β. At "Wnt-on" state, then the interaction between Dvl, and Axin inhibits GSK3β, thereby accumulating β-catenin in the cytoplasm and the nucleus. Dvl, a 75 kD multi-domain adaptor protein with Disheveled-aXin (DIX), Post synaptic density-95, Disc large, and Zonular occludens-1 (PDZ), and Disheveled-Egl10-Pleckstrin (DEP) domains (**Figure 2A**), plays a central role in both canonical (β-catenin-dependent) and non-canonical (β-cateninindependent) pathways of Wnt signaling (Gao and Chen, 2010). There are three mammalian Disheveled orthologs, Dvl-1, 2, and 3, in human genome, with functional redundancy. The PDZ domain of Dvl (Dvl-PDZ) specifically interacts to the C-terminus of Fzd (Wong et al., 2003) upon Wnt binding to the extracellular domain of Fzd. Accordingly, Dvl-PDZ is an attractive target for exploring small molecule inhibitors (**Figure 2B**), and has been characterized extensively. For instance, the binding mode of the tripeptides VVV and VWV against Dvl-PDZ has been reported (Lee et al., 2009a). The complex structure of peptidederived inhibitors and Dvl2-PDZ has also been reported (Zhang et al., 2009). In addition, several reports have described Dvl-PDZ inhibitors, including a peptide-mimic compounds NSC668036 (Shan et al., 2005), 1H-indole-5-carboxylic acid derivative FJ9 (Fujii et al., 2007), sulindac (Lee et al., 2009b), N-benzoyl-2-amino-benzoic acid derivative CalBioChem-322338 (Grandy et al., 2009), and phenoxyacetic acid analogs (Choi et al., 2016). 
The present study specifically examines N-benzoyl-2-aminobenzoic acid analogs including CalBioChem-322338 because the 2-amino-benzoic acid moiety has been independently proposed as a key moiety of group-specific inhibitors against several PDZ domains and therefore represents a potential pharmacophore (Tenno et al., 2013). During our research exploring new inhibitors of the Zonula Occludens-1 PDZ1 domain (Umetsu et al., 2011), we obtained several N-substituted-2-amino-benzoic acid analogs that are chemically similar to CalBioChem-322338 (**Figure 3** and **Supplementary Figure S1**). The present study evaluates the affinities of those compounds for the human Dvl1 PDZ domain (hDvl1-PDZ) using solution NMR experiments (**Figure 2D**).

Virtual screening (VS) of drug candidates, also known as high-throughput protein–ligand docking, is a powerful approach. Commercial applications are widely used, such as Glide (Friesner et al., 2004), FRED (McGann, 2011), MOE/ASEDock (Goto et al., 2008), and GOLD (Verdonk et al., 2003), as well as academic applications such as AutoDock (Goodsell et al., 1996), AutoDock Vina (Trott and Olson, 2010), and Sievgene (Fukunishi et al., 2005). With their increasing convenience and availability, another practical issue has arisen: VS experiments with different algorithms, different parameter settings, and different target 3D structures might produce disparate results. Consequently, the benchmarking of docking algorithms has become an important issue (McGaughey et al., 2007; Lindh et al., 2015). For the present study, we decided to use GOLD because it is recognized as having acceptably high performance in comprehensive benchmarking across several VS programs (Wang et al., 2016). Moreover, results have demonstrated that the experimental tuning of parameter sets and/or the selection of target model structures might greatly improve performance and provide higher accuracy of prediction (Huang and Wong, 2016). Encouraged by the idea proposed by Huang and Wong, we introduced it into our project as a simplified index for evaluating nine docking scoring methods of GOLD. For this study, the index is designated as the NMR-based docking performance index (NMR-DPI).

First, 17 potential PDZ-binding molecules as well as CalBioChem-322338, all of which are N-substituted-2-aminobenzoic acid analogs, were analyzed using NMR chemical shift perturbation (CSP) experiments. We believe that the NMR-CSP experiment is among the easiest and most robust assay methods to compare the affinities of a series of compounds for a <sup>15</sup>N-labeled small protein (Williamson, 2013). Second, these 17 potential PDZ-binding molecules were docked against hDvl2-PDZ using GOLD with nine different scoring methods. Third, out of the nine scoring methods, we identified the one that is most consistent with the CSP experiments of the 17 compounds. This optimized scoring method was used for a new VS with our in-house focused library, which is a subset of the library LIGANDBOX (Kawabata et al., 2013) containing 5,135 commercially available N-substituted-2-amino-benzoic acid analogs. From the top hit compounds after the new VS experiment, 13 new molecules were purchased: NPL-4001 – NPL-4007, and NPL-4011 – NPL-4016 (**Figure 1**). Our seven original compounds induced markedly larger chemical shift changes in hDvl1-PDZ than those induced by CalBioChem-322338. The compounds were evaluated further by cell-based assays as potential Wnt pathway inhibitors. The validity and possible limitations of NMR-DPI were also assessed.

### MATERIALS AND METHODS

### Preparation of Protein Samples

The expression vector for the recombinant GST-tagged form of the prototype hDvl1-PDZ<sup>∗</sup> domain (residues 244–342) was constructed using the PRESAT-vector methodology (Goda et al., 2004). The vector for the GST-tagged hDvl1-PDZ domain (residues 246–340, a construct four amino acids shorter) was then produced using the standard PCR cloning technique with the pGEX-6P3 plasmid (GE Healthcare, Little Chalfont, United Kingdom). The GST-tagged hDvl2-PDZ domain (residues 262–356) was constructed similarly. In both PDZ domains, the Cys-Trp residues near the C-termini (residues 338–339 and 354–355 for hDvl1 and hDvl2, respectively) were substituted with Ala-Thr to increase protein stability. Because these residues lie on the side opposite the ligand binding site, we assumed that the mutations affect neither the affinity for the inhibitors nor their binding mode. Isotopically labeled proteins for NMR experiments were produced in E. coli BL21 (DE3) grown in 1 L of M9 minimal medium at 37 °C in the presence of [<sup>15</sup>N]-NH<sub>4</sub>Cl and [<sup>13</sup>C]-glucose (if needed) as the sole nitrogen and carbon sources. Protein expression was induced by adding isopropyl β-D-1-thiogalactopyranoside (IPTG) to a final concentration of 1 mM, with immediate lowering of the temperature to 20 °C. The cells were harvested 20 h after IPTG induction. The harvested cells were then resuspended in lysis buffer (50 mM Tris–HCl, pH 7.2, and 150 mM NaCl), disrupted by sonication, and clarified by centrifugation. The supernatant was applied to a DEAE–Sepharose™ Fast Flow (GE Healthcare) column. The protein was then affinity-purified using GST-Accept™ resin (Nacalai Tesque Inc., Kyoto, Japan). The GST tag was removed on the beads with PreScission protease. The protein solution was loaded on a Superdex 75 HR 26/60 column (GE Healthcare) equilibrated with 50 mM Tris–HCl (pH 7.2) and 150 mM NaCl. The purified proteins were concentrated to 0.1 mM (for NMR titration experiments) and were dialyzed against 100 mM potassium phosphate buffer (pH 7.4) containing 0.5 mM EDTA supplemented with 10% D<sub>2</sub>O and 5% d<sub>6</sub>-dimethyl sulfoxide. After comparing the <sup>1</sup>H–<sup>15</sup>N HSQC spectra of the hDvl1 and hDvl2 PDZ domains, we decided to continue the study with hDvl1-PDZ because of its sharp and well-dispersed HSQC signals. For triple resonance experiments, 0.65 mM <sup>15</sup>N/<sup>13</sup>C-labeled hDvl1-PDZ was dissolved in 90 mM potassium phosphate buffer (pH 7.4) containing 0.45 mM EDTA supplemented with 10% D<sub>2</sub>O.
<sup>15</sup>N-labeled mouse ZO-1 first PDZ domain (residues 18–110, mZO1-PDZ1) was prepared according to an earlier report (Umetsu et al., 2011).

#### NMR Experiments

For this study, NMR experiments were conducted using an NMR spectrometer (600 MHz, Bruker Avance III; Bruker Analytik GmbH, Karlsruhe, Germany) equipped with a cryogenic triple-resonance probe. For assignment of backbone <sup>1</sup>H, <sup>13</sup>C, and <sup>15</sup>N resonances, HNCA, HNCACB, CBCA(CO)NH, HNCO, HN(CA)CO, and 3D <sup>15</sup>N-edited NOESY-HSQC spectra were recorded. For NMR titration experiments, 0.1 mM PDZ domain sample was dissolved in 250 µL of 85 mM potassium phosphate buffer (pH 7.4) containing 0.42 mM EDTA supplemented with 10% D<sub>2</sub>O and 5% d<sub>6</sub>-dimethyl sulfoxide (DMSO). Then the <sup>1</sup>H–<sup>15</sup>N HSQC spectra were obtained with and without ligands. In each titration experiment, the compound was added to the protein at a final concentration of 0.2 mM. All NMR spectra were recorded at 298 K. All spectra were processed using NMRPipe (Delaglio et al., 1995) and were analyzed using the program Sparky 3.114 (Goddard and Kneller, 2004). All chemical shift changes in the <sup>1</sup>H–<sup>15</sup>N HSQC spectra were calculated as Δδ<sub>normalized</sub> = {Δδ(<sup>1</sup>H)<sup>2</sup> + [Δδ(<sup>15</sup>N)/6]<sup>2</sup>}<sup>1/2</sup>. The chemical shift changes were then mapped onto the corresponding residues of the structure of hDvl2-PDZ using PyMOL graphics software (Schrödinger, 2015). Δδ<sub>ave</sub> is the sum of Δδ<sub>normalized</sub> divided by the total number of residues with residue-specific assignments, excluding residues whose signals were broadened out. After signals showing marked chemical shift changes were selected, their normalized chemical shift changes were calculated. Nonlinear least-squares fitting was applied to estimate the dissociation constant K<sub>D</sub> as

$$\Delta\delta_{\text{normalized}} = \Delta\delta_{\text{saturated}} \times \frac{\left([R]_{\text{total}} + [L]_{\text{total}} + K_D\right) - \sqrt{\left([R]_{\text{total}} + [L]_{\text{total}} + K_D\right)^2 - 4[R]_{\text{total}}[L]_{\text{total}}}}{2[R]_{\text{total}}} \tag{1}$$

where Δδ<sub>saturated</sub> represents the normalized chemical shift change at saturation, and [R]<sub>total</sub> and [L]<sub>total</sub> denote the total concentrations of the PDZ domain and the compound, respectively. The K<sub>D</sub> and Δδ<sub>saturated</sub> values for the selected residues were optimized simultaneously using the Solver function in Microsoft Excel (Microsoft Corp.).
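The normalization and fitting steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Excel Solver workflow used in the study: the function names, the log-spaced scan over K<sub>D</sub>, and the synthetic titration points are hypothetical, only a single residue is fitted, and the binding model is the standard 1:1 isotherm in which the bound fraction is normalized by the total protein concentration.

```python
import math


def normalized_csp(d_h, d_n, n_scale=6.0):
    """Combine 1H and 15N shift changes (ppm) into one normalized value."""
    return math.sqrt(d_h ** 2 + (d_n / n_scale) ** 2)


def average_csp(shifts):
    """Mean normalized CSP over assigned residues.

    `shifts` holds (d_h, d_n) tuples; None marks a broadened-out signal,
    which is excluded from both the sum and the residue count.
    """
    values = [normalized_csp(*pair) for pair in shifts if pair is not None]
    return sum(values) / len(values)


def bound_fraction(r_total, l_total, kd):
    """Fraction of protein bound in a 1:1 model (same unit for all inputs)."""
    s = r_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * r_total * l_total)) / (2.0 * r_total)


def fit_kd(r_total, l_totals, shifts, kd_min=1e-4, kd_max=10.0, steps=2000):
    """Scan K_D on a log grid (assumption: a simple stand-in for Solver).

    For each trial K_D the best saturated shift follows from linear
    least squares, so only K_D needs to be searched.
    """
    best = None
    for k in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (k / steps)
        f = [bound_fraction(r_total, l, kd) for l in l_totals]
        denom = sum(x * x for x in f)
        if denom == 0.0:
            continue
        d_sat = sum(y * x for y, x in zip(shifts, f)) / denom
        sse = sum((y - d_sat * x) ** 2 for y, x in zip(shifts, f))
        if best is None or sse < best[0]:
            best = (sse, kd, d_sat)
    return best[1], best[2]


# Synthetic, noise-free titration (hypothetical values, concentrations in mM).
R_TOTAL = 0.1
LIGAND = [0.0, 0.025, 0.05, 0.1, 0.2, 0.4, 0.8]
TRUE_KD, TRUE_SAT = 0.05, 0.20
observed = [TRUE_SAT * bound_fraction(R_TOTAL, l, TRUE_KD) for l in LIGAND]
kd_fit, sat_fit = fit_kd(R_TOTAL, LIGAND, observed)
```

With real data, the same K<sub>D</sub> would be shared across the selected residues while each residue keeps its own saturated shift; extending the scan to sum the squared residuals over residues is one way to do that.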

### Docking and Virtual Screening Experiments

Prior to the VS experiments, a focused library was constructed by selecting compounds with carboxylic acid moieties, which play a crucially important role in canonical peptide recognition by many PDZ domains. The library was built as a subset of the compound database LIGANDBOX (ver. 1306) (Kawabata et al., 2013), based on our earlier observation that diclofenac and flufenamic acid bound several PDZ domains in a group-specific manner (Tenno et al., 2013). We selected and pooled 5,135 compounds of N-substituted 2-amino-benzoic acid and N-substituted 2-amino-benzeneacetic acid. Subsequently, the GOLD suite software (ver. 5.32) (Verdonk et al., 2003) was used for molecular docking of the compounds into the structure of hDvl2-PDZ [PDB entry 3CBY (Zhang et al., 2009)]. GOLD uses a genetic algorithm to generate ligand configurations and offers two scoring modes, "simple scoring" and "consensus scoring." Simple scoring uses a single function out of the four available fitness functions. Consensus scoring combines two of the four scoring functions, one for the initial docking and the other for re-scoring. The present study examined three scoring functions, ChemScore (CS), GoldScore (GS), and ChemPLP, in both the simple scoring mode and the consensus scoring mode, thereby examining nine scoring methods.
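The count of nine scoring methods can be reproduced with a short enumeration. The sketch below assumes that a consensus method is an ordered pair of two different functions (one for docking, one for re-scoring), which gives three simple plus six consensus methods; the labels are hypothetical shorthand and no GOLD API is involved.

```python
from itertools import permutations

# The three fitness functions examined in this study.
FUNCTIONS = ["GS", "CS", "ChemPLP"]

# Simple scoring: one function used on its own.
simple_methods = [(f,) for f in FUNCTIONS]

# Consensus scoring (assumption): docking function followed by a
# different re-scoring function, so the order matters.
consensus_methods = list(permutations(FUNCTIONS, 2))

scoring_methods = simple_methods + consensus_methods
labels = ["-".join(method) for method in scoring_methods]
```

Under this reading, the label `GS-CS` denotes docking with GoldScore followed by re-scoring with ChemScore.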

#### Cell-Based Viability Assay

The newly found Dvl-PDZ inhibitors were tested to assess their effects on the proliferation and viability of a TNBC cell line (BT-20). For that purpose, a luciferase-expressing stable cell line was chosen, although we did not perform luciferase-based biochemical experiments in this report. The TNBC cell line BT-20 (BT-20/CMV-Luc, JCRB-1438) was obtained from the JCRB Cell Bank, National Institute of Biomedical Innovation, Health, and Nutrition (Osaka, Japan). The cells were grown in Minimum Essential Medium Eagle (with Earle's salts, L-glutamine, and sodium bicarbonate; Sigma-Aldrich Corp.), supplemented with 10% fetal bovine serum (FBS) (Biosera, Boussens, France) and 1% Penicillin/Streptomycin antibiotics (Gibco, Grand Island, NY, United States). Cells were cultured in a 37 °C incubator with a humidified atmosphere of 5% CO<sub>2</sub>. Cells were seeded at 15,000 cells/well into 96-well plates. After overnight incubation, cells were treated with d<sub>6</sub>-DMSO or 100 µM of each Dvl-PDZ inhibitor (CalBioChem-322338, NPL-4001, 4002, 4007, and 4011–4013) for 96 h. During culture, the medium with or without the corresponding inhibitor was refreshed every 48 h. After 4 days of culture with the compounds, the cell growth rate was ascertained using the WST-8 [2-(2-methoxy-4-nitrophenyl)-3-(4-nitrophenyl)-5-(2,4-disulfophenyl)-2H-tetrazolium] colorimetric assay with a kit (Cell Counting Kit-8®; Dojindo Molecular Technologies Inc., Kumamoto, Japan) according to the manufacturer's instructions. Cell viability was also ascertained after 4 days (Cytotoxicity LDH Assay Kit-WST; Dojindo Molecular Technologies Inc., Japan). The sample absorbance was measured using a microplate reader (EnSpire; PerkinElmer Inc., Waltham, MA, United States). All experiments were performed in triplicate. Each measurement was repeated twice. Statistical tests were performed using Microsoft Office® Excel.

#### RESULTS

#### NMR Analysis of hDvl1-PDZ With Prototype N-Substituted 2-Amino-Benzoic Acid Compounds

Before analyzing the interaction between hDvl1-PDZ and the compounds, we completed the assignment of the backbone amide signals of hDvl1-PDZ because few signal assignments for hDvl1-PDZ have been published or deposited in the public NMR database (BioMagResBank). The backbone signal assignment was done according to the standard method (Ikura et al., 1990) using the software MARS (Jung and Zweckstetter, 2004). The assignment was further confirmed using several inversely <sup>14</sup>N-labeled samples (Hiroaki et al., 2011). Out of the 98 residues, 79 residues (81%) were assigned, although seven NH signals in the loop between the β1 and β2 strands were missing, probably because of intermediate dynamic motion in solution. The assignments are labeled on the HSQC spectra (**Figure 2D**).

Subsequently, we performed NMR titration experiments using 17 prototypical N-substituted 2-amino-benzoic acid compounds (NPL-1010, 1011, and 3001–3015) (**Figure 3**). In an earlier study, eF-seek-based bioinformatics prediction across all PDZ domains in the human genome (Kinoshita et al., 2007; Motono et al., 2011) led us to find that flufenamic acid and diclofenac bound several PDZ domains (Tenno et al., 2013). Moreover, we determined the structures of the mouse Zonula occludens-1 (ZO1)-PDZ1 domain (Umetsu et al., 2011) (PDB: 2RRM) and the mouse ligand of numb X1 (LNX1)-PDZ2 domain (PDB: 3VQG, 3VGF, manuscript in preparation). These structures were subjected to VS using GOLD and LIGANDBOX to discover novel PDZ domain inhibitors. During that study, we identified the first two prototypical mLNX1-PDZ2 binders (NPL-1010 and 1011), for which direct binding to mLNX1-PDZ2 was confirmed using NMR experiments (manuscript in preparation). Surprisingly, the chemical structure of NPL-1010 closely resembled that of CalBioChem-322338 (**Figure 3**). Accordingly, we proceeded to collect 15 related compounds (NPL-3001–3015) to analyze their affinities for both mLNX1-PDZ2 and mZO1-PDZ1 by the combined use of VS and solution NMR. Subsequently, our collected N-substituted 2-amino-benzoic acid compounds (NPL-1010, 1011, and 3001–3015) were examined to determine whether they bind directly to hDvl1-PDZ. Finally, we found that 12 of the 17 compounds tested in this study induced chemical shift changes of the amide protons of hDvl1-PDZ larger than those induced by CalBioChem-322338. All chemical shift changes were normalized and averaged per residue. They are presented in **Table 1** in descending order of CSP. Examples of the chemical shift perturbations are presented in **Figure 4**.

TABLE 1 | Normalized total CSPs of hDvl1-PDZ induced by 2.0 equivalents of the prototypical Dvl1-PDZ binding compounds.

### Introduction and Calculation of NMR-Derived Docking Performance Index

Inspired by the idea of fine-tuning VS parameters with experimental data to improve VS performance (Huang and Wong, 2016), we adapted that idea to our experimental data from the NMR titration (CSP) study. For this purpose, we designed a strategy to tune VS parameters with our original NMR-derived docking performance index (NMR-DPI, **Figure 1**). First, NMR titration experiments of hDvl1-PDZ were performed with all 18 compounds as described above. Second, we docked all 17 N-substituted 2-amino-benzoic acid compounds to the hDvl2-PDZ structure (PDB: 3CBY) using the GOLD software. Note that the core regions of the PDZ domains of human Dvl1 and Dvl2 are 92% identical in amino acid sequence (**Figure 2C**). At that time, the nine docking scoring methods were tested with different combinations of scoring functions, as presented in the table in **Figure 5A**. In our experience, these GOLD scoring functions differ from one another to a great degree. For that reason, it is difficult to select one of them robustly for any new VS project. Third, the final fitness score of each scoring method was normalized to a value between 0 and 1 as the docking score D(i, j), where i is the index of the scoring method and j is the name of the compound. Similarly, the averaged normalized NMR chemical shift change, N(j), was calculated. Finally, NMR-DPI was defined as

$$\text{NMR-DPI}(i) = \sqrt{\sum_{j} \left(D(i, j) - N(j)\right)^2} \tag{2}$$

The heat map representation of all docking scores of the 17 compounds with the nine scoring methods in GOLD and the normalized averaged NMR chemical shift changes for 18 compounds is shown in **Figure 5B**. A bar graph of NMR-DPI is shown in **Figure 5C**. The lowest NMR-DPI, which represents the best agreement between the docking scores and the NMR CSP experiments, was achieved when the consensus scoring of GS followed by CS was selected.
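Equation 2 and the selection of the lowest index can be sketched as follows. This is an illustration only: the compound names and numbers are made up, and min–max scaling to the 0–1 range is an assumption, since the text states that scores were normalized between 0 and 1 without specifying the procedure.

```python
import math


def min_max(values):
    """Scale a list of numbers to the 0-1 range (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def nmr_dpi(docking_scores, csp):
    """Return {method: NMR-DPI} following Equation 2.

    docking_scores: {method: {compound: fitness score}}
    csp:            {compound: averaged normalized CSP}
    """
    compounds = sorted(csp)
    n = dict(zip(compounds, min_max([csp[c] for c in compounds])))
    result = {}
    for method, scores in docking_scores.items():
        d = dict(zip(compounds, min_max([scores[c] for c in compounds])))
        result[method] = math.sqrt(sum((d[c] - n[c]) ** 2 for c in compounds))
    return result


# Hypothetical toy data: three compounds, two scoring methods.
toy_csp = {"cpd-A": 0.010, "cpd-B": 0.030, "cpd-C": 0.050}
toy_scores = {
    "GS-CS": {"cpd-A": 40.0, "cpd-B": 50.0, "cpd-C": 60.0},    # same ranking as CSP
    "ChemPLP": {"cpd-A": 60.0, "cpd-B": 50.0, "cpd-C": 40.0},  # reversed ranking
}
dpi = nmr_dpi(toy_scores, toy_csp)
best_method = min(dpi, key=dpi.get)
```

In this toy case the method whose normalized scores track the normalized CSPs has an index near zero, whereas the method with the reversed ranking has an index of about 1.41, so the former would be selected.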

### Advanced Virtual Screening of hDvl1-PDZ Domain Inhibitors

Consensus scoring with GS followed by CS was chosen to perform the advanced VS experiment with GOLD and the focused library of 5,135 N-substituted-2-aminobenzoic acid compounds. We obtained a list containing 1,770 compounds with scores higher than that of CalBioChem-322338 (score = 59.9). After the selected compounds were purchased (**Figure 6**), they were assessed using NMR-CSP experiments to ascertain whether they were able to bind hDvl1-PDZ. Among them, nine compounds (NPL-4001, 4002, 4004, 4007, and 4011–4016) induced substantial chemical shift changes when added to hDvl1-PDZ: 9 out of 13 (69%) compounds had reasonable affinity for hDvl1-PDZ (**Supplementary Table S1**). Some HSQC spectra are presented in **Supplementary Figure S2** with their chemical structures. The hit rate (69%) is remarkably high, emphasizing the benefit of combining NMR-DPI with VS.

### Assessing Physicochemical Properties of the Most Potent hDvl1-PDZ Inhibitor: NPL-4011

Among the 13 newly examined compounds, four (NPL-4007, 4011, 4012, and 4013) possessed a common molecular architecture, with two 2-amino-benzoic acid moieties connected at the 5-position directly or through a single methylene linker (**Figure 6**). NPL-4011 showed both a high GOLD docking score and a large CSP. Therefore, we determined its K<sub>D</sub> for hDvl1-PDZ using NMR titration experiments (**Figure 7A** and **Supplementary Figure S3A**). First, we selected the residues surrounding the ligand binding pocket: D315, V318, L321, R322, and V325. The normalized chemical shift changes of these residues were subjected to non-linear curve fitting to find K<sub>D</sub> (**Figure 7B**), which was 34.5 ± 6.6 µM. We then measured the commercially available control compound CalBioChem-322338 under the same conditions and obtained a value of 954 ± 403 µM (**Supplementary Figures S3C,D**). This K<sub>D</sub> value of CalBioChem-322338 is larger than its reported value for mouse Dvl1-PDZ (10.6 ± 1.7 µM) (Grandy et al., 2009) for reasons that remain unknown. These results show that NPL-4011 is a stronger inhibitor than CalBioChem-322338 when compared under identical conditions using hDvl1-PDZ.

Next, we carefully assessed the docking model of NPL-4011 and Dvl-PDZ generated by GOLD (**Figure 7C**). In the model, the crescent-shaped molecule NPL-4011 is well suited to the long shallow cleft of the ligand binding site of Dvl-PDZ. The residues of hDvl2-PDZ that contact NPL-4011 are consistent with the residues that showed substantial CSPs in the NMR titration experiments (**Figure 7D**). We examined this binding model further. The lower half of the symmetrical NPL-4011 molecule fits the lower half of the ligand binding cleft of Dvl-PDZ, which corresponds to the "canonical" C-terminal binding pocket common to all other PDZ domains. The upper half of NPL-4011 also fits the cleft between two loops: the β1–β2 loop and the α2–β6 loop. This upper cleft is unique to the Dvl-PDZ domain (**Figures 7D,E**) and might accommodate binding to "non-canonical" ligands such as the cytosolic regions of Fzd, the physiological partner of Dvl. **Figure 7F** is a close-up view of a representative "canonical" class-III PDZ domain, the first PDZ domain of mouse ZO-1 (mZO1-PDZ1, PDB: 2RRM). This domain does not possess the cleft above the canonical ligand binding pocket because the β1–β2 loop bends over and contacts the end of the α2-helix.

This structural difference between Dvl-PDZ and mZO1-PDZ1 invites our speculation that, because of a steric clash between one half of the ligand and the bent β1–β2 loop, NPL-4011 (and probably its related molecules, NPL-4007, 4012, and 4013) might not bind mZO1-PDZ1. In contrast, the smaller prototype Dvl-PDZ inhibitor CalBioChem-322338 can bind mZO1-PDZ1 because it might occupy only the canonical ligand binding pocket of mZO1-PDZ1 without steric stress. In other words, NPL-4011 is among the more Dvl-specific PDZ domain inhibitors. To confirm this speculation, we performed additional NMR-CSP experiments of mZO1-PDZ1 titrated with NPL-4011 and CalBioChem-322338 (**Supplementary Figure S4**). Assignments of the backbone signals were taken from our previous study (Umetsu et al., 2011). In the presence of two equivalents of NPL-4011, mZO1-PDZ1 did not show any chemical shift changes. In contrast, the signals from the residues surrounding the canonical binding site of mZO1-PDZ1 showed substantial CSPs upon addition of CalBioChem-322338. Thus, the unique molecular shape of NPL-4011 confines its binding to Dvl-PDZ in a more specific manner.

### Assessment of Biological Activities of NPL-40XX Compounds

We assessed the inhibitory activity of the selected NPL-40XX compounds toward Wnt signaling pathways in a cultured-cell-based assay. For this purpose, we used BT-20 cells, a triple negative breast cancer (TNBC) cell line. Activation of the Wnt signaling pathway is often observed in many cancers. Therefore, Wnt pathway inhibition is a potential therapeutic strategy (Polakis, 2012). Activation of the Wnt/β-catenin pathway has been reported in TNBC (Geyer et al., 2011; King et al., 2012a,b). For BT-20 cells, overexpression of Frizzled 7 (Fzd7) has been reported; shRNA against Fzd7 efficiently suppresses the proliferation of BT-20 (Yang et al., 2011).

We applied a cell-based proliferation inhibition assay at a concentration of 100 µM of the compounds, including NPL-4001, 4002, 4004, 4007, 4011–4013, and the control compound CalBioChem-322338 (**Figure 8** and **Supplementary Figure S5**). After 4 days of culture with 100 µM of the compounds, NPL-4001 and NPL-4004 showed approximately 80% inhibition of BT-20 cell proliferation, whereas NPL-4002, 4007, 4012, and 4013 showed no remarkable inhibitory activity. NPL-4011, the strongest binder, showed only 60% inhibition, which is less potent than NPL-4001 and 4004. Under these conditions, the positive control CalBioChem-322338 showed better proliferation inhibitory activity, with 90% inhibition. The results demonstrated that our compound NPL-4011 requires further improvement in terms of cell-based anticancer activity, although its affinity for the target domain was highly optimized.

## DISCUSSION

### Experimental Aspect of the NMR-Derived Docking Performance Index (NMR-DPI) for Dvl-PDZ Domain Inhibitor Screening

A common tradeoff in VS is that between accurate prediction of the binding free energy ΔG and the speed of calculation. Researchers must always confront the dilemma of "rapidity–inaccuracy" versus "sluggishness–accuracy." To process as many compounds as possible during a given period, simplified scoring functions should be chosen rather than first-principles-based force fields for simulating the interaction between the target protein and ligands. Although such simplified scoring functions might all be equally inaccurate, some scoring function can be expected to behave better than another for a specified library of compounds. This study demonstrated an experimental strategy to select a better scoring function from the options presented by the GOLD program suite.

For this study, we used the averaged normalized chemical shift changes, Δδ<sub>ave</sub>, instead of K<sub>D</sub> for each of 17 training set molecules: 89% (15/17) of them bound to hDvl1-PDZ. According to theory, the maximum value of CSP should be recorded at the saturation point of titration by compounds. The maximum CSP, however, might vary depending on the chemical structure of the ligands. For example, aromatic rings in the ligand might induce larger CSPs upon binding because of the ring current effect. Another important shortcoming of CSP is that it is sensitive to allosteric conformational changes of the target protein upon ligand binding. Consequently, it is generally not recommended to use Δδ<sub>ave</sub> (or other CSP-derived parameters) as an indication of K<sub>D</sub>. Irrespective of those shortcomings of CSP, however, we used Δδ<sub>ave</sub> for this study based on the following two criteria. (1) Only compounds with similar chemical structures were analyzed and compared using Δδ<sub>ave</sub>. (2) Under the experimental conditions we used, the affinity of most ligands was weak, and they did not saturate hDvl1-PDZ at a 1:2 protein-to-ligand molar ratio. We carefully assessed our experimental system using these two criteria. Finally, we inferred that if the criteria are satisfied, then the use of Δδ<sub>ave</sub> as an indication of K<sub>D</sub> is convenient. Note that it was not feasible to use a thermal shift assay to infer the affinity of the compounds in our case because many PDZ domains including hDvl1-PDZ showed no sharp T<sub>m</sub> transition curve. In addition, although the CSP experiment requires stable-isotope labeling, it is less troublesome than a surface plasmon resonance experiment because it is unnecessary to immobilize the protein on a chip. Furthermore, the residue-specific information in amide NMR signals enables us to discern specific binding from non-specific binding.

### Comparison of Biological Activities of NPL-40XX Compounds

Results show that NPL-4011 has stronger affinity for hDvl1-PDZ in vitro, but it was a less potent growth inhibitor of BT-20 cells than CalBioChem-322338. To interpret this observation in terms of bioavailability, we compared Lipinski's drug-likeness parameters (Lipinski, 2000). The molecular weight of NPL-4011 (580.593) is greater than that of CalBioChem-322338 (373.388). The numbers of H-bond donors are equal (2), although the number of H-bond acceptors of NPL-4011 (8) is double that of CalBioChem-322338 (4). Both parameters are less favorable for NPL-4011, and its molecular weight exceeds the limit of 500 set by Lipinski's rule of five. Although the calculated logP value (1.53 for NPL-4011 and 2.59 for CalBioChem-322338) is the only merit of NPL-4011, it did not compensate for the other shortcomings. Therefore, we infer that the poor biological activity of NPL-4011 is attributable to its bioavailability. This assumption is partially supported by another observation. As described above, NPL-4001 and 4004 showed growth inhibition activity comparable to that of CalBioChem-322338 and better than that of NPL-4011. Their Lipinski parameters are, respectively, 402.224 and 403.414 (MW), 1.75 and 2.05 (logP), 2 and 2 (H-bond donors), and 4 and 5 (H-bond acceptors). The numbers of H-bond donors and acceptors are known to be crucially important for inferring biological activity in cell-based assays.
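The comparison of drug-likeness parameters can be checked with a few lines of Python. The property values are those quoted in the text; the thresholds are the commonly cited rule-of-five limits (molecular weight ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10), and the dictionary layout and function name are illustrative assumptions.

```python
# Commonly cited rule-of-five upper limits.
LIMITS = {"mw": 500.0, "logp": 5.0, "hbd": 5, "hba": 10}

# Property values as quoted in the text.
COMPOUNDS = {
    "NPL-4011": {"mw": 580.593, "logp": 1.53, "hbd": 2, "hba": 8},
    "CalBioChem-322338": {"mw": 373.388, "logp": 2.59, "hbd": 2, "hba": 4},
    "NPL-4001": {"mw": 402.224, "logp": 1.75, "hbd": 2, "hba": 4},
    "NPL-4004": {"mw": 403.414, "logp": 2.05, "hbd": 2, "hba": 5},
}


def violations(props):
    """Return the names of the properties that exceed their limit."""
    return [name for name, limit in LIMITS.items() if props[name] > limit]


report = {name: violations(props) for name, props in COMPOUNDS.items()}
```

With these inputs, only the molecular weight of NPL-4011 exceeds its limit; the other three compounds pass all four criteria.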

By contrast, NPL-4011 is expected to be more selective for Dvl-PDZ than for the other PDZ domains in human cells because the crescent-shaped molecule fits the unique cleft of Dvl-PDZ domains. The PDZ domain is the most abundant modular domain in the human cell cytosol. Therefore, the design of molecules that are highly specific to one PDZ domain might become crucially important. To satisfy both specificity and biological activity in terms of bioavailability, a good starting point is our new pharmacophore, the bis-benzoic acid moiety. Screening smaller analogs, with NPL-4007 as the seed, is a more promising route to improving the biological activity of this group of compounds. By contrast, a prodrug strategy starting from NPL-4011 is not recommended because it has already exceeded the drug-likeness parameters.

## CONCLUSION

In conclusion, we demonstrated a new class of compounds with higher affinity for hDvl1-PDZ. We proposed NMR-DPI as a useful experimental index to optimize VS in the early stages of drug discovery.

## AUTHOR CONTRIBUTIONS

KH and TT performed all the NMR titration experiments and discovered the inhibitors. KH also performed all the cell-based assays assisted by NG and AS. KA initiated the NMR signal assignment of hDvl1 and hDvl2 PDZ domains, whereas KH completed it. NG and TT prepared the plasmid constructs and protein samples of the optimized PDZ domains. The cell and developmental biologists MT and AS designed all the biological assays, set them up, and organized the biological part of the manuscript. HH constructed the focused library, developed the idea of NMR-based DPI, and performed VS. HH wrote the manuscript and organized the project.

## FUNDING

This work was supported in part by the Target Protein Research Program from Japan Science and Technology Agency (JST), A-step feasibility study program (AS262Z01275Q and AS242Z00566Q) from JST, Japan Society for the Promotion of Science KAKENHI (15H04337), and the AMED-PDIS from Japan Agency for Medical Research and Development (AMED). This work was also supported by the Princess Takamatsu Cancer Research Fund (15-24726).

## ACKNOWLEDGMENTS

The authors would like to thank Fastek Ltd. (Sendai, Miyagi, Japan; http://www.fastekjapan.com/) for the English language review.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.00983/full#supplementary-material

### REFERENCES




**Conflict of Interest Statement:** HH and TT are the founders of BeCellBar LLC., and TT is employed by BeCellBar LLC.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Hori, Ajioka, Goda, Shindo, Takagishi, Tenno and Hiroaki. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Comparison of Quantitative and Qualitative (Q)SAR Models Created for the Prediction of K<sub>i</sub> and IC<sub>50</sub> Values of Antitarget Inhibitors

Alexey A. Lagunin<sup>1,2</sup>\*, Maria A. Romanova<sup>2</sup>, Anton D. Zadorozhny<sup>2</sup>, Natalia S. Kurilenko<sup>2</sup>, Boris V. Shilov<sup>2</sup>, Pavel V. Pogodin<sup>1</sup>, Sergey M. Ivanov<sup>1,2</sup>, Dmitry A. Filimonov<sup>1</sup> and Vladimir V. Poroikov<sup>1</sup>\*

#### Edited by:

Adriano D. Andricopulo, Universidade de São Paulo, Brazil

#### Reviewed by:

Miguel Reyes-Parada, Universidad de Chile, Chile Antreas Afantitis, NovaMechanics Ltd., Cyprus

#### \*Correspondence:

Alexey A. Lagunin alexey.lagunin@ibmc.msk.ru Vladimir V. Poroikov vladimir.poroikov@ibmc.msk.ru

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 02 May 2018 Accepted: 18 September 2018 Published: 10 October 2018

#### Citation:

Lagunin AA, Romanova MA, Zadorozhny AD, Kurilenko NS, Shilov BV, Pogodin PV, Ivanov SM, Filimonov DA and Poroikov VV (2018) Comparison of Quantitative and Qualitative (Q)SAR Models Created for the Prediction of K<sub>i</sub> and IC<sub>50</sub> Values of Antitarget Inhibitors. Front. Pharmacol. 9:1136. doi: 10.3389/fphar.2018.01136

<sup>1</sup> Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia, <sup>2</sup> Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow, Russia

Estimating the interaction of drug-like compounds with antitargets is important for assessing possible toxic effects during drug development. Publicly available online databases provide experimental data on chemical interactions with antitargets, which can be used to create (Q)SAR models. The structures and experimental K<sub>i</sub> and IC<sub>50</sub> values for compounds tested for inhibition of 30 antitargets were taken from the ChEMBL 20 database. Data sets with K<sub>i</sub> and IC<sub>50</sub> values including more than 100 compounds were created for each antitarget. The (Q)SAR models were created by GUSAR software using quantitative neighborhoods of atoms (QNA) descriptors, multilevel neighborhoods of atoms (MNA) descriptors, and self-consistent regression. The accuracy of the (Q)SAR models was validated by a fivefold cross-validation procedure. The balanced accuracy was higher for qualitative SAR models (0.80 and 0.81 for K<sub>i</sub> and IC<sub>50</sub> values, respectively) than for quantitative QSAR models (0.73 and 0.76 for K<sub>i</sub> and IC<sub>50</sub> values, respectively). In most cases, sensitivity was higher for SAR models than for QSAR models, but specificity was higher for QSAR models. The mean R<sup>2</sup> and RMSE were 0.64 and 0.77 for K<sub>i</sub> values and 0.59 and 0.73 for IC<sub>50</sub> values, respectively. The number of test-set compounds falling within the applicability domain was higher for SAR models than for QSAR models.

Keywords: QSAR, antitarget, inhibition, adverse drug reactions, K<sub>i</sub>, IC<sub>50</sub>, GUSAR, ChEMBL

### INTRODUCTION

Adverse drug reactions (ADRs) are one of the main problems in drug discovery and clinical practice (Böhm and Cascorbi, 2016). According to some estimates, ADRs are one of the leading causes of hospitalization and death in developed countries (Starfield, 2000; Kochanek et al., 2016), the second most common cause of drug attrition in later stages of clinical trials, and the major reason for drug withdrawal from the market (Hornberg et al., 2014). This situation is largely due to the limitations of traditional animal toxicological experiments and clinical trials, which cannot detect all serious ADRs because of inter-species differences and the idiosyncratic nature of many ADRs. Therefore, additional methods, including in vitro and in silico approaches, are currently being developed. In silico approaches are usually based on machine learning techniques and network analyses that link chemical and biological features of approved and withdrawn drugs to ADRs; these features include molecular descriptors, known or predicted drug targets, drug-induced gene expression profiles, and cell phenotypic features (Ivanov et al., 2016). These approaches allow dangerous ADRs to be predicted in the early stages of drug development and provide insights into potential toxic mechanisms of drug candidates. It is currently accepted that most ADRs are the consequence of unintended interactions of drugs with human protein targets and are not related to the therapeutic mechanism of action. For example, blocking HERG potassium channels in the heart causes life-threatening arrhythmias (Siramshetty et al., 2016). There are dozens of human proteins that have known relationships to ADRs, and the corresponding information has accumulated in public databases (Ji et al., 2003; Zhang et al., 2007) and has been described in several publications (Whitebread et al., 2005; Bowes et al., 2012). These proteins are called "antitargets" because, to avoid dangerous ADRs, they should not interact with drugs. Many pharmaceutical companies use in vitro assays to measure interactions of lead compounds with "antitargets" and select the least promiscuous ones for further development. To avoid performing hundreds of experiments, such interactions can also be predicted using ligand-based structure-activity relationship analysis or docking (Ivanov et al., 2016; Simões et al., 2018).

Owing to the accumulation of data on chemical-protein interactions and three-dimensional protein structures in public databases such as ChEMBL (Gaulton et al., 2017), PubChem (Wang et al., 2017), and PDB (Berman et al., 2000), it has become possible to predict interactions with many hundreds of human proteins, including "antitargets." There are many published (Q)SAR models (Poroikov et al., 2007; Filz et al., 2008; García-Sosa and Maran, 2014; Ivanov et al., 2016) and freely available web services (Zakharov et al., 2012; Braga et al., 2015) that can perform such predictions; however, we found no study comparing the accuracy of classification (SAR) and quantitative (QSAR) models created from the same data, descriptors, and mathematical algorithm. The aim of this work is the creation, validation, and accuracy estimation of SAR and QSAR models for the prediction of the inhibition of 30 antitargets using GUSAR software and data on structures and K<sub>i</sub> and IC<sub>50</sub> values of tested compounds from the ChEMBL 20 database. Earlier, we published a study on the creation of reasonable QSAR models by GUSAR software and the corresponding web service<sup>1</sup> for the prediction of interaction between drug-like compounds and 18 antitargets (Zakharov et al., 2012). In this paper, we have significantly expanded the list of covered "antitargets" and significantly increased the size and diversity of the training sets, which allowed us to expand the applicability range of the models and to obtain valuable results.

#### MATERIALS AND METHODS

#### Data Sets

Structures and experimental K<sub>i</sub> and IC<sub>50</sub> values of compounds tested for inhibition of 30 antitargets were extracted from the ChEMBL 20 database. Data sets with K<sub>i</sub> and IC<sub>50</sub> values including more than 100 compounds were created for each antitarget (**Table 1**). Only records with K<sub>i</sub> or IC<sub>50</sub> values in nM and the symbol "=" in the field "Relation" were extracted from the ChEMBL database. When creating data sets of compounds interacting with receptors, we included both records for compounds studied as true antagonists and records for compounds studied for binding affinity, because we could not separate them. Although K<sub>i</sub> and IC<sub>50</sub> values indicate the affinity of a compound for a given receptor and do not necessarily provide functional information on its agonism or antagonism toward that target, we decided to include such data because antagonism of receptors may be related to K<sub>i</sub> and IC<sub>50</sub> values, whereas agonism is usually represented by EC<sub>50</sub> values. K<sub>i</sub> and IC<sub>50</sub> values were transformed into pIC<sub>50</sub> = −log<sub>10</sub>(IC<sub>50</sub>(M)) and pK<sub>i</sub> = −log<sub>10</sub>(K<sub>i</sub>(M)) values. **Table 1** also shows the known relations between the inhibition of antitargets and ADRs. The number of compounds with K<sub>i</sub> values was approximately 1.5 times higher than that for IC<sub>50</sub> values (46830 and 29678, respectively). The sets included structures of single, electroneutral, small (molecular weight in the range from 50 to 1250 Da) organic molecules. In general, such a representation of structure corresponds to the best QSAR practice (Fourches et al., 2016) implemented in the GUSAR software, which was used in our study (see below). If a compound had several experimental values for a parameter, the median value was used.

<sup>1</sup>http://www.way2drug.com/gusar/antitargets.html

#### TABLE 1 | Data related to antitargets and the number of compounds with K<sub>i</sub> and IC<sub>50</sub> values in the data sets.

BP, blood pressure; ECG, electrocardiogram; GI, gastrointestinal; HR, heart rate; SCID, severe-combined immunodeficiency.

Median values were calculated because reference compounds usually had several experimental values, having been tested in many experiments. Deleting such compounds removes an important part of chemical space and significantly restricts the applicability domain of global QSAR models. In several publications on the creation of global QSAR models based on heterogeneous data, the authors used average values (Politi et al., 2014; Cortes-Ciriano and Bender, 2015). We used the median because it better characterizes a set of values with a strongly skewed distribution. A zip archive of SD files, one per target and endpoint (the gene name of the target is used in the file name), is provided in the **Supplementary Materials**. Each SD file includes structures, ChEMBL\_ID, and experimental values. For classification models and for comparison of prediction results between the SAR and QSAR models, 1 µM was used as the threshold between active and inactive compounds. The sets were sorted in ascending order of the corresponding values. Then a number from 1 to 5 was assigned successively to each structure in a set. After that, the sets were divided into five unique parts according to the number assigned to each structure. These parts were used for the fivefold cross-validation (fivefold CV) procedure, in which each unique part was used as an external test set and the remaining parts were used as a training set. As a result, five different training sets and five external test sets for K<sub>i</sub> data and five training sets and five external test sets for IC<sub>50</sub> data, including both quantitative and qualitative descriptions, were created for each antitarget.
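The data preparation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names, the record layout `(compound_id, value_nM)`, and the choice to treat a value of exactly 1 µM as active are all assumptions made for the example.

```python
import math
from collections import defaultdict
from statistics import median

ACTIVE_THRESHOLD_NM = 1000.0  # 1 uM expressed in nM (threshold from the text)


def p_value(value_nm):
    """Convert a Ki or IC50 given in nM to pKi / pIC50 = -log10(value in M)."""
    return -math.log10(value_nm * 1e-9)


def aggregate_by_median(records):
    """Collapse repeated measurements per compound to their median (hypothetical layout)."""
    grouped = defaultdict(list)
    for compound_id, value_nm in records:
        grouped[compound_id].append(value_nm)
    return {cid: median(values) for cid, values in grouped.items()}


def is_active(value_nm):
    """Assumption: a compound at exactly 1 uM is counted as active."""
    return value_nm <= ACTIVE_THRESHOLD_NM


def assign_folds(values_by_compound, n_folds=5):
    """Sort by value, then deal compounds into folds 1..n_folds in turn."""
    ordered = sorted(values_by_compound, key=lambda cid: values_by_compound[cid])
    folds = {k: [] for k in range(1, n_folds + 1)}
    for position, cid in enumerate(ordered):
        folds[position % n_folds + 1].append(cid)
    return folds


if __name__ == "__main__":
    demo = [("CHEMBL_A", 10.0), ("CHEMBL_A", 1000.0), ("CHEMBL_A", 100.0),
            ("CHEMBL_B", 5000.0), ("CHEMBL_C", 1.0)]
    medians = aggregate_by_median(demo)
    for cid, value in medians.items():
        print(cid, round(p_value(value), 2), is_active(value))
    print(assign_folds(medians))
```

Dealing sorted compounds into folds in turn gives each fold a similar distribution of activity values, which is presumably why the sets were sorted before numbering.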

#### GUSAR Software


The (Q)SAR models were created by GUSAR software<sup>2</sup>, which uses quantitative neighborhoods of atoms (QNA), multilevel neighborhoods of atoms (MNA), and whole-molecule descriptors with self-consistent regression (Lagunin et al., 2007; Filimonov et al., 2009; Lagunin et al., 2011). QNA descriptors are calculated by two functions, P and Q. The values of P and Q for each atom i are calculated as:

$$\begin{aligned} P_i &= B_i \sum_k \left(\exp\left(-\tfrac{1}{2}C\right)\right)_{ik} B_k, \\ Q_i &= B_i \sum_k \left(\exp\left(-\tfrac{1}{2}C\right)\right)_{ik} B_k A_k, \end{aligned}$$

where k runs over all other atoms in the molecule and

$$A_k = \frac{1}{2}(IP_k + EA_k), \quad B_k = (IP_k - EA_k)^{-\frac{1}{2}}$$

Here, IP is the ionization potential, EA is the electron affinity of each atom, and C is the connectivity matrix of the molecule. QNA descriptors describe each particular atom of a molecule; at the same time, each P or Q value depends on the composition and structure of the whole molecule. Two-dimensional Chebyshev polynomials are used to approximate the functions P and Q over all atoms of the molecule. A detailed description of QNA descriptors is given by Filimonov et al. (2009).
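To make the P and Q formulas concrete, the following minimal Python sketch evaluates them for a small connectivity matrix. It is not GUSAR code. Several choices are assumptions: the connectivity matrix is taken as a plain adjacency matrix with a zero diagonal, the sum is taken over every atom k including i itself (one possible reading of the formula), the matrix exponential is approximated by a truncated Taylor series, and the IP and EA numbers in the demo are illustrative values in eV.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=40):
    """Matrix exponential by truncated Taylor series (adequate for small matrices)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms):
        term = mat_mul(term, m)
        term = [[x / order for x in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def qna_pq(connectivity, ip, ea):
    """Return per-atom lists (P, Q) following the formulas in the text."""
    n = len(connectivity)
    a = [0.5 * (ip[k] + ea[k]) for k in range(n)]
    b = [1.0 / math.sqrt(ip[k] - ea[k]) for k in range(n)]
    e = mat_exp([[-0.5 * c for c in row] for row in connectivity])
    p = [b[i] * sum(e[i][k] * b[k] for k in range(n)) for i in range(n)]
    q = [b[i] * sum(e[i][k] * b[k] * a[k] for k in range(n)) for i in range(n)]
    return p, q


if __name__ == "__main__":
    # Heavy-atom skeleton C-C-O; IP/EA values are illustrative only.
    connectivity = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    p, q = qna_pq(connectivity, ip=[11.26, 11.26, 13.62], ea=[1.26, 1.26, 1.46])
    print(p)
    print(q)
```

Because the matrix exponential couples every pair of connected atoms, each P<sub>i</sub> and Q<sub>i</sub> in this sketch changes when any atom in the molecule changes, which matches the statement that the values depend on the whole molecule.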

MNA descriptors (Filimonov et al., 1999) are based on the molecular structure representation, which includes hydrogens according to the valences and partial charges of other atoms and does not specify the types of bonds. MNA descriptors are generated as a recursively defined sequence:


where D<sub>i</sub> is the previous-level MNA descriptor for the i-th immediate neighbor of the atom A.

The mark of the atom may include not only the atomic type but also any additional information about the atom. In particular, if the atom is not included in a ring, it is marked by "−". The neighbor descriptors D<sub>1</sub>D<sub>2</sub>...D<sub>i</sub>... are arranged in a unique manner, for example, in lexicographic order. The iterative process of MNA descriptor generation can be continued to cover the first, second, and further neighborhoods of each atom.
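The recursive construction can be illustrated with a short Python sketch. This is one reading of the description above, not the PASS or GUSAR implementation: it assumes that the zero-level descriptor is the atom mark itself and that each next level is written as the mark followed by the lexicographically sorted previous-level descriptors of the neighbors in parentheses. The function name and the plain element-symbol marks are hypothetical.

```python
def mna_descriptor(atom, marks, neighbors, level):
    """Build an MNA-style string for one atom.

    marks:     list of atom marks (here plain element symbols)
    neighbors: list of neighbor index lists, hydrogens included
    level:     0 returns the mark; each further level adds one neighborhood shell
    """
    if level == 0:
        return marks[atom]
    parts = sorted(mna_descriptor(n, marks, neighbors, level - 1)
                   for n in neighbors[atom])
    return marks[atom] + "(" + "".join(parts) + ")"


if __name__ == "__main__":
    # Methanol with explicit hydrogens: C0, O1, H2-H4 on carbon, H5 on oxygen.
    marks = ["C", "O", "H", "H", "H", "H"]
    neighbors = [[1, 2, 3, 4], [0, 5], [0], [0], [0], [1]]
    for level in (0, 1, 2):
        print(level, mna_descriptor(0, marks, neighbors, level))
```

Sorting the neighbor strings makes the result independent of atom numbering, which is the purpose of arranging the neighbor descriptors "in a unique manner".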

For regression analysis, this molecule structure representation was transformed using the original PASS (Prediction of Activity Spectra for Substances) algorithm (Lagunin et al., 2011). This algorithm estimates the biological activity profiles for chemical compounds using MNA descriptors as input parameters. Therefore, we used the results of PASS prediction as independent variables for regression analysis. The results of PASS prediction are given as a list of biological activities, for which the difference between probabilities of being active (Pa) and inactive (Pi) was calculated. The activities from the list of predicted biological activities were randomly selected as input independent variables for regression analysis. This allows obtaining different QSAR models. GUSAR incorporates a PASS version that predicts 4130 types of biological activity. This version of PASS has a mean prediction accuracy of approximately 95% calculated by leave-one-out cross-validation procedure (Filimonov et al., 2014). The list of predictable biological activities currently includes 501 pharmacotherapeutic effects (e.g., antihypertensive, hepatoprotectant, and nootropic), 3295 mechanisms of action (e.g., 5-hydroxytryptamine antagonist, acetylcholine M1 receptor agonist, and cyclooxygenase inhibitor), 57 adverse and toxic effects (e.g., carcinogenic, mutagenic, and hematotoxic), 199 metabolic terms (e.g., CYP1A inducer, CYP1A1 inhibitor, and CYP3A4 substrate), 49 transporter proteins (e.g., P-glycoprotein 3 inhibitor, nucleoside transporters inhibitors, and proline transporter inhibitor), and 29 activities related to gene expression (e.g., TH expression enhancer, TNF expression inhibitor, and VEGF expression inhibitor). Therefore, the maximum number of independent variables for the creation of MNA models is 4130. The detailed description of realization of PASS in GUSAR is represented in the publication of Lagunin et al. (2011).

QNA and MNA descriptors do not provide information on the shape and volume of a molecule, although this information may be important for determining structure-activity relationships. Therefore, these parameters, which are called whole-molecule descriptors, are also used in GUSAR. The whole-molecule descriptors used in GUSAR are: topological length, topological volume, lipophilicity, number of positive charges, number of negative charges, number of hydrogen bond acceptors, number of aromatic atoms, molecular weight, and number of halogen atoms. GUSAR estimates the applicability domain based on different types of structural similarity calculated from QNA and MNA descriptors (Zakharov et al., 2016).

GUSAR can provide an equation for any single (Q)SAR model (Lagunin et al., 2011). However, because we used consensus (Q)SAR models built from dozens or even hundreds of single (Q)SAR models, it is not possible to provide a general equation describing all selected variables. For this reason, the created consensus (Q)SAR models cannot provide information about positively and negatively influencing descriptors. Instead, GUSAR shows the positive or negative impact of each atom of the structure on the predicted value (Khayrullina et al., 2015). Analysis of the influence of atoms on the predicted value and the search for general relationships between the structures of active compounds interacting with antitargets is a separate task (because each structure in the set would have to be analyzed), and it is beyond the scope of this publication.

<sup>2</sup>http://www.way2drug.com/gusar/index.html

FIGURE 1 | Plots of predicted and experimental values for the best and worst QSAR models by RMSE values calculated during the fivefold cross-validation procedure. (A) QSAR model for prediction of pIC<sub>50</sub> values of compounds interacting with glucocorticoid receptor (the best QSAR model for IC<sub>50</sub> values). (B) QSAR model for prediction of pIC<sub>50</sub> values of compounds interacting with D(2) dopamine receptor (the worst QSAR model for IC<sub>50</sub> values). (C) QSAR model for prediction of pK<sub>i</sub> values of compounds interacting with HERG channel (Potassium voltage-gated channel subfamily H member 2) (the best QSAR model for K<sub>i</sub> values). (D) QSAR model for prediction of pK<sub>i</sub> values of compounds interacting with Beta-2 adrenergic receptor (the worst QSAR model for K<sub>i</sub> values).

#### Evaluation of Prediction Accuracy

The following statistical parameters were calculated for estimating the accuracy of prediction:

(1) Sensitivity:

$$Sensitivity = \frac{TP}{TP + FN}$$

(2) Specificity:

$$Specificity = \frac{TN}{TN + FP}$$

(3) Accuracy:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

(4) Balanced accuracy (BA): balance between sensitivity and specificity:

$$BA = \frac{Sensitivity + Specificity}{2}$$

(5) Root mean square error (RMSE):

$$RMSE = \sqrt{\frac{\sum (y_{\text{exp}} - y_{\text{pred}})^2}{n}}$$

(6) R-squared, coefficient of determination:

$$R^2 = 1 - \frac{\sum (y_{\text{exp}} - y_{\text{pred}})^2}{\sum (y_{\text{exp}} - y_{\text{mean}})^2},$$

where TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative predictions; n is the number of compounds; y<sub>exp</sub> is the experimental value; y<sub>pred</sub> is the predicted value; and y<sub>mean</sub> is the average of the experimental values in a training set.
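The statistical parameters above translate directly into code. The following Python sketch is illustrative and uses hypothetical function names; note that `r_squared` takes the training-set mean as an explicit argument, following the definition of y<sub>mean</sub> in the text.

```python
import math


def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN from two equal-length lists of booleans (True = active)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, and balanced accuracy."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2.0,
    }


def rmse(y_exp, y_pred):
    """Root mean square error between experimental and predicted values."""
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(y_exp, y_pred)) / len(y_exp))


def r_squared(y_exp, y_pred, y_mean_train):
    """Coefficient of determination, with the mean taken from the training set."""
    residual = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    total = sum((e - y_mean_train) ** 2 for e in y_exp)
    return 1.0 - residual / total


if __name__ == "__main__":
    actual = [True, True, False, False, False]
    predicted = [True, False, False, False, True]
    print(classification_metrics(*confusion_counts(actual, predicted)))
```

On an unbalanced set, accuracy can stay high even when one class is predicted poorly, whereas balanced accuracy weights both classes equally.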

#### Y-Randomization Procedure

The Y-randomization procedure is included in GUSAR software and helps ensure that the developed continuous QSAR models are robust and not overfitted (Wold and Eriksson, 1995). In this procedure, the dependent-variable vector Y (K<sub>i</sub> or IC<sub>50</sub> values in our case) is randomly shuffled, and a new QSAR model is developed using the original independent-variable matrix.

The resulting models are expected to have low Q<sup>2</sup> values. This procedure was repeated five times for each model, and then the average Q<sup>2</sup> value was calculated.
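A toy version of Y-randomization is sketched below. It deliberately swaps in a one-descriptor least-squares line for GUSAR's self-consistent regression and uses a leave-one-out Q<sup>2</sup>; both substitutions, as well as the function names and the fixed random seed, are assumptions made only to keep the example self-contained.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated Q2 for the one-descriptor model."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / total


def y_randomization(xs, ys, repeats=5, seed=0):
    """Average Q2 over models rebuilt on shuffled activity values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # break the link between structure and activity
        scores.append(loo_q2(xs, shuffled))
    return sum(scores) / repeats


if __name__ == "__main__":
    xs = [float(i) for i in range(20)]
    ys = [2.0 * x + 1.0 + 0.1 * ((i * 7) % 5 - 2) for i, x in enumerate(xs)]
    print("Q2 (original):", loo_q2(xs, ys))
    print("Q2 (Y-randomized, mean of 5):", y_randomization(xs, ys))
```

If the shuffled models scored nearly as well as the original, the original fit would more likely reflect chance correlation than a real structure-activity relationship.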

### RESULTS AND DISCUSSION

Three hundred twenty SAR and 320 QSAR models with modified calculation of descriptors and regression coefficients were created by GUSAR software for each of the five training sets (five training sets with qualitative and quantitative data for K<sub>i</sub> or IC<sub>50</sub> values for each target) with internal validation (five times, a randomly selected 20% of the training set was used as an internal test set; this procedure is built into GUSAR). As a result, one consensus SAR model and one consensus QSAR model were created for each training set from the single (Q)SAR models whose R<sup>2</sup><sub>train</sub>, Q<sup>2</sup><sub>train</sub>, and average R<sup>2</sup> calculated for the internal validation sets exceeded 0.5. If R<sup>2</sup> of internal validation for a (Q)SAR model was less than 0.5, then the model was excluded from the final consensus model [with the exception of the QSAR models for D(1A) and D(2) dopamine receptors, histamine H1 and 5-hydroxytryptamine 2B receptors created on the basis of IC<sub>50</sub> data]. The final predicted values for tested compounds were calculated as a weighted average of the predictions from the obtained (Q)SAR models. Each model is based on a different set of descriptors, and its predictions for each compound were weighted according to the similarity value that was calculated during the applicability domain assessment.

FIGURE 3 | (D) Correlation of Balanced Accuracy (BA) between SAR and QSAR models for IC<sub>50</sub> data.
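The selection and weighting steps can be sketched as follows. This is a simplified illustration, not GUSAR's procedure: the dictionary keys (`r2_train`, `q2_train`, `r2_internal`) and function names are hypothetical, and the similarity values are assumed to be non-negative numbers supplied per model for the compound being predicted.

```python
def select_models(models, cutoff=0.5):
    """Keep single models whose three quality statistics all exceed the cutoff."""
    return [m for m in models
            if m["r2_train"] > cutoff
            and m["q2_train"] > cutoff
            and m["r2_internal"] > cutoff]


def consensus_prediction(predictions, similarities):
    """Similarity-weighted average of single-model predictions for one compound."""
    total_weight = sum(similarities)
    if total_weight == 0:
        raise ValueError("all similarity weights are zero")
    return sum(p * w for p, w in zip(predictions, similarities)) / total_weight


if __name__ == "__main__":
    models = [
        {"name": "m1", "r2_train": 0.80, "q2_train": 0.70, "r2_internal": 0.65},
        {"name": "m2", "r2_train": 0.75, "q2_train": 0.60, "r2_internal": 0.40},
        {"name": "m3", "r2_train": 0.70, "q2_train": 0.62, "r2_internal": 0.55},
    ]
    kept = select_models(models)
    print([m["name"] for m in kept])
    # Predicted pKi from each kept model and its similarity weight for one compound.
    print(consensus_prediction([6.0, 7.0], [1.0, 3.0]))
```

Weighting by similarity lets models whose training compounds resemble the query compound contribute more to the consensus value.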

After the SAR and QSAR consensus models were created based on a training set, they were used to predict inhibition of the antitarget by compounds from the corresponding external test set. This was repeated for five training sets with K<sub>i</sub> values and five training sets with IC<sub>50</sub> values for each antitarget (fivefold CV procedure). The average characteristics of the created (Q)SAR models, including the average results of the Y-randomization procedure (Q<sup>2</sup><sub>Y-rand</sub>), are given in **Supplementary Tables S1, S2**. All Q<sup>2</sup><sub>Y-rand</sub> values for all QSAR models were less than 0.15. The average Q<sup>2</sup><sub>Y-rand</sub> values ranged from 0.026 to 0.06 and from 0.026 to 0.078 for QSAR models created based on K<sub>i</sub> and IC<sub>50</sub> data, respectively. These values are significantly lower than the Q<sup>2</sup> values calculated from the original data of the training sets, which demonstrates the robustness of the models.

Plots of predicted versus experimental values for the best and worst QSAR models by RMSE values calculated by fivefold cross-validation are displayed in **Figure 1**. The relations between predicted and experimental values for the other QSAR models lie between these extreme cases.

The statistical parameters describing the accuracy of prediction, listed in the section "Materials and Methods," were calculated from the prediction results obtained during the fivefold CV procedure for both SAR and QSAR models. To compare the accuracy of prediction of QSAR and SAR models, the quantitative prediction results were transformed into qualitative ones according to the threshold given in the section "Materials and Methods." Statistical parameters of prediction accuracy for SAR and QSAR models created based on K<sub>i</sub> and IC<sub>50</sub> data for all antitargets are given in **Supplementary Tables S3, S4**, respectively. The statistical parameters of accuracy and their comparison are shown graphically in **Figures 2–4**.

**Figures 2A,B** show a comparison of the accuracy between SAR and QSAR models created based on K<sub>i</sub> values. **Figures 2C,D** show the results obtained with IC<sub>50</sub> values. The accuracy of the QSAR models was higher in most cases than the accuracy of SAR models for both K<sub>i</sub> and IC<sub>50</sub> values (**Figures 2A**, **2C**). The mean accuracy of prediction for K<sub>i</sub> values was 0.84 and 0.87 for SAR and QSAR models, respectively. This difference is statistically significant (p < 0.05). The mean accuracy of prediction for IC<sub>50</sub> values was 0.82 and 0.83 for SAR and QSAR models, respectively. This difference is not statistically significant (p = 0.285). The reverse result was observed for balanced accuracy (SAR models: 0.80 for K<sub>i</sub> data and 0.81 for IC<sub>50</sub> data; QSAR models: 0.73 for K<sub>i</sub> data and 0.76 for IC<sub>50</sub> data). The difference in balanced accuracy between SAR and QSAR models is statistically significant in both cases, for K<sub>i</sub> and for IC<sub>50</sub> values (p < 0.05). Specificity and sensitivity were similar for SAR and QSAR models (**Figures 2B**, **2D**). The mean value of specificity was higher for QSAR models for both K<sub>i</sub> and IC<sub>50</sub> data (SAR models: 0.76 for K<sub>i</sub> data and 0.79 for IC<sub>50</sub> data; QSAR models: 0.95 for K<sub>i</sub> data and 0.90 for IC<sub>50</sub> data). The mean value of sensitivity was higher for SAR models for both K<sub>i</sub> and IC<sub>50</sub> data (SAR models: 0.84 for K<sub>i</sub> data and 0.82 for IC<sub>50</sub> data; QSAR models: 0.50 for K<sub>i</sub> data and 0.61 for IC<sub>50</sub> data).
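As a quick consistency check on these means (a sketch that assumes each mean is taken over the same set of antitargets, so that the mean balanced accuracy equals the average of the mean sensitivity and mean specificity):

$$\begin{aligned} BA_{\text{SAR},\,K_i} &= \tfrac{1}{2}(0.84 + 0.76) = 0.80, & BA_{\text{QSAR},\,K_i} &= \tfrac{1}{2}(0.50 + 0.95) = 0.725, \\ BA_{\text{SAR},\,IC_{50}} &= \tfrac{1}{2}(0.82 + 0.79) = 0.805, & BA_{\text{QSAR},\,IC_{50}} &= \tfrac{1}{2}(0.61 + 0.90) = 0.755. \end{aligned}$$

These agree with the reported balanced accuracies of 0.80, 0.73, 0.81, and 0.76 to within rounding of the inputs.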

Analysis of the accuracy and balanced accuracy values of SAR and QSAR models (**Supplementary Tables S1, S2**) shows that they are correlated. **Figures 3A,B** show the correlation of accuracy and of balanced accuracy between SAR and QSAR models created based on K<sub>i</sub> data. **Figures 3C,D** show the same correlations for SAR and QSAR models created based on IC<sub>50</sub> data. In both cases, the correlation between SAR and QSAR models was higher for accuracy than for balanced accuracy (**Figure 3**). If the values correlate, neither SAR nor QSAR models are preferable for the corresponding accuracy criterion. However, in most cases, similar accuracy is achieved in different ways (high sensitivity or high specificity, see **Figures 2B,D**). One can decide what is more important in a given study: finding as many active compounds as possible (the models with the highest sensitivity should be selected) or reducing the number of false positive predictions (the models with the highest specificity should be selected). The absence of correlation between the studied parameters shows that one of the methods is preferable. Values above the line show that QSAR models are better than SAR models. Values below the line show that SAR models are better than QSAR models. All cases except the one displayed in **Figure 3C** (Correlation of Accuracy between SAR and QSAR models for IC<sub>50</sub> data) showed a statistically significant difference between the values of SAR and QSAR models (p < 0.05). Balanced accuracy is the most important criterion for estimating prediction accuracy because many of the data sets used were unbalanced (the numbers of active and inactive compounds differ significantly). Therefore, these results show that SAR models are preferable for the prediction of adverse drug reactions.

The other parameters of SAR and QSAR models are shown in **Figure 4**. **Figure 4A** shows the percentage of compounds in the applicability domain (AD) of SAR and QSAR models. The percentage of compounds in the AD was approximately 100% for all SAR models, whereas for nearly all QSAR models it was less than 100%. The mean percentage of compounds in the AD for SAR and QSAR models was 99.9% and 98.6%, respectively. The higher percentage of compounds in the applicability domain indicates an advantage and better predictive power of SAR models in comparison with QSAR models. **Figure 4B** shows the comparison of RMSE and R<sup>2</sup> values for QSAR models created on K<sub>i</sub> and IC<sub>50</sub> data. No clear pattern in the distribution of these characteristics can be seen, but in general, the mean value of R<sup>2</sup> for QSAR models based on K<sub>i</sub> data was higher than that for IC<sub>50</sub> data (0.64 and 0.57, respectively). The mean RMSE value for QSAR models based on IC<sub>50</sub> data was lower than that for K<sub>i</sub> data (0.73 and 0.77, respectively).

However, if the RMSE value for the QSAR model created based on K<sub>i</sub> data for the beta-2 adrenergic receptor is excluded, the mean RMSE value for the remaining QSAR models based on K<sub>i</sub> data also becomes 0.73. This means that both K<sub>i</sub> and IC<sub>50</sub> values can be reliably used to predict interactions with antitargets. (Q)SAR models based on K<sub>i</sub> and IC<sub>50</sub> values can be compared only in general terms because they were created from different numbers of compounds and different structures. Nevertheless, some features of the created models can be identified. Plots comparing the Specificity and Sensitivity of (Q)SAR models created based on K<sub>i</sub> and IC<sub>50</sub> data are shown in **Supplementary Figure S1**. These plots show that SAR models based on IC<sub>50</sub> values have better Specificity than SAR models based on K<sub>i</sub> data for approximately half of the antitargets. The largest difference is observed for the Mu-type opioid receptor (0.34 for K<sub>i</sub> data and 0.97 for IC<sub>50</sub> data). SAR models based on K<sub>i</sub> data have better Specificity for the other antitargets. The same picture is seen for the Sensitivity of SAR models. Analysis of QSAR models revealed that the majority of QSAR models based on K<sub>i</sub> data had better Specificity, whereas the majority of QSAR models based on IC<sub>50</sub> data had better Sensitivity. A high value of Sensitivity is more important for revealing possible adverse drug reactions than a high value of Specificity. Analysis of the Accuracy and Balanced Accuracy of (Q)SAR models based on IC<sub>50</sub> and K<sub>i</sub> data (**Supplementary Figure S2**) shows that most (Q)SAR models based on K<sub>i</sub> values have better Accuracy values, whereas Balanced Accuracy is higher for most QSAR models based on IC<sub>50</sub> values.

### CONCLUSION

The creation of SAR and QSAR models based on the same data for compounds tested as inhibitors of 30 antitargets revealed some features related to the use of qualitative and quantitative data. These features hold for (Q)SAR models based on both K<sub>i</sub> and IC<sub>50</sub> values. SAR models tended to give more balanced predictions, with specificity and sensitivity closer to each other than in QSAR models (**Figure 2**). The high specificity and low sensitivity of QSAR models may be explained by the fact that, at the given R<sup>2</sup> values (0.64 and 0.59), predicted values tended to lie closer to the average K<sub>i</sub> or IC<sub>50</sub> values of the training set. If the threshold of 1 µM divided the training set into unequal proportions of active and inactive compounds, a difference between specificity and sensitivity could arise. At the same time, despite the difference in specificity and sensitivity between SAR and QSAR models, the values of accuracy and balanced accuracy for SAR models correlated with those of QSAR models (**Figure 3**). This indicates that the prediction results of SAR and QSAR models would complement each other and that the use of both approaches would improve the quality of assessment of interactions between ligands and antitargets.

Another conclusion is that SAR models had advantages in the applicability domain. This may be related to the fact that the use of qualitative data makes SAR models less sensitive to experimental errors in K<sub>i</sub> and IC<sub>50</sub> values.

In this study, we also showed that modern experimental data and methods of (Q)SAR modeling allow for the creation of rather reasonable (Q)SAR models for the prediction of interactions between compounds and dozens of antitargets. The approaches used may be applied to the creation of in silico panels for estimating "ligand-antitarget" interactions during the drug design process.

#### AUTHOR CONTRIBUTIONS

AL designed the study, performed the data analysis, and wrote the manuscript with input from all authors. MR, AZ, NK, and BS created and validated (Q)SAR models. PP and SI created the data sets and performed data analysis. DF and VP designed the study, analyzed the results, and wrote the manuscript.

### FUNDING

This work was supported by Russian Science Foundation Grant No. 14-15-00449.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.01136/full#supplementary-material

#### REFERENCES

discovery. Part I: why and how. Drug Discov. Today 19, 1131–1136. doi: 10.1016/j.drudis.2013.12.008



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Lagunin, Romanova, Zadorozhny, Kurilenko, Shilov, Pogodin, Ivanov, Filimonov and Poroikov. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery

Bruno J. Neves1,2, Rodolpho C. Braga<sup>1</sup> , Cleber C. Melo-Filho<sup>1</sup> , José Teófilo Moreira-Filho<sup>1</sup> , Eugene N. Muratov3,4 and Carolina Horta Andrade<sup>1</sup> \*

<sup>1</sup> LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil, <sup>2</sup> Laboratory of Cheminformatics, Centro Universitário de Anápolis (UniEVANGÉLICA), Anápolis, Brazil, <sup>3</sup> Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States, <sup>4</sup> Department of Chemical Technology, Odessa National Polytechnic University, Odessa, Ukraine

#### Edited by:

Adriano D. Andricopulo, Universidade de São Paulo, Brazil

#### Reviewed by:

Marcus Scotti, Federal University of Paraíba, Brazil Nelilma Correia Romeiro, Universidade Federal do Rio de Janeiro, Brazil Ana Carolina Rennó Sodero, Universidade Federal do Rio de Janeiro, Brazil

\*Correspondence:

Carolina Horta Andrade carolina@ufg.br; carolhandrade@gmail.com

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 11 August 2018 Accepted: 18 October 2018 Published: 13 November 2018

#### Citation:

Neves BJ, Braga RC, Melo-Filho CC, Moreira-Filho JT, Muratov EN and Andrade CH (2018) QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery. Front. Pharmacol. 9:1275. doi: 10.3389/fphar.2018.01275

Virtual screening (VS) has emerged in drug discovery as a powerful computational approach to screen large libraries of small molecules for new hits with desired properties that can then be tested experimentally. As with other computational approaches, the intention of VS is not to replace in vitro or in vivo assays, but to speed up the discovery process, reduce the number of candidates to be tested experimentally, and rationalize their choice. Moreover, VS has become very popular in pharmaceutical companies and academic organizations because it saves time, cost, resources, and labor. Among the VS approaches, quantitative structure–activity relationship (QSAR) analysis is the most powerful method due to its high and fast throughput and good hit rate. As the first step of QSAR model development, relevant chemogenomics data are collected from databases and the literature. Then, chemical descriptors are calculated at different levels of representation of molecular structure, ranging from 1D to nD, and correlated with the biological property using machine learning techniques. Once developed and validated, QSAR models are applied to predict the biological property of novel compounds. Although the experimental testing of computational hits is not an inherent part of QSAR methodology, it is highly desirable and should be performed as the ultimate validation of developed models. In this minireview, we summarize and critically analyze the recent trends in QSAR-based VS in drug discovery and demonstrate successful applications in identifying promising compounds with desired properties. Moreover, we provide some recommendations about best practices for QSAR-based VS along with the future perspectives of this approach.

Keywords: cheminformatics, machine learning, molecular descriptors, computer-assisted drug design, virtual screening

## INTRODUCTION

Quantitative structure–activity relationship (QSAR) analysis is a ligand-based drug design method developed more than 50 years ago by Hansch and Fujita (1964). Since then, QSAR has remained an efficient method for building mathematical models that attempt to find a statistically significant correlation between the chemical structure and a continuous (pIC50, pEC50, Ki, etc.) or categorical/binary (active, inactive, toxic, nontoxic, etc.) biological/toxicological property using regression and classification techniques, respectively (Cherkasov et al., 2014). In recent decades, QSAR has undergone several transformations, ranging from the dimensionality of the molecular descriptors (from 1D to nD) to the methods used for finding a correlation between the chemical structures and the biological property. Initially, QSAR modeling was limited to small series of congeneric compounds and simple regression methods. Nowadays, QSAR modeling has grown, diversified, and evolved to the modeling and virtual screening (VS) of very large data sets comprising thousands of diverse chemical structures and using a wide variety of machine learning techniques (Cherkasov et al., 2014; Mitchell, 2014; Ekins et al., 2015; Goh et al., 2017).
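The distinction between continuous and categorical endpoints can be made concrete with a short sketch. It converts an IC50 measured in nanomolar into pIC50 (the negative base-10 logarithm of the molar IC50) and then derives a binary label from it. The compound names, the IC50 values, and the cutoff of 6.0 are hypothetical and chosen only for illustration; real projects set the activity threshold per target and assay.

```python
import math

# Hypothetical cutoff: pIC50 >= 6 (IC50 <= 1 µM) is labeled "active".
ACTIVITY_CUTOFF = 6.0


def pic50_from_nm(ic50_nm: float) -> float:
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in nanomolar."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def to_class_label(pic50: float, cutoff: float = ACTIVITY_CUTOFF) -> str:
    """Turn a continuous endpoint into a binary one for classification QSAR."""
    return "active" if pic50 >= cutoff else "inactive"


# Toy measurements (nM), invented for this example.
measurements_nm = {"cpd_A": 50.0, "cpd_B": 800.0, "cpd_C": 25000.0}

for name, ic50 in measurements_nm.items():
    p = pic50_from_nm(ic50)
    print(f"{name}: pIC50 = {p:.2f}, class = {to_class_label(p)}")
```

A regression model would be trained on the pIC50 values directly, whereas a classification model would be trained on the derived labels.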

This review is devoted to (i) critical analysis of advantages and disadvantages of QSAR-based VS in drug discovery; (ii) demonstration of several successful QSAR-based discoveries of compounds with desired properties; (iii) description of best practices for the QSAR-based VS; and (iv) discussion of future perspectives of this approach.

### BEST PRACTICES IN QSAR MODELING AND VALIDATION

High-throughput screening (HTS) technologies have produced an explosion in the amount of data suitable for QSAR modeling. As a result, data quality has become one of the fundamental questions in cheminformatics. Obvious as it may seem, errors in both chemical structures and experimental results are considered a major obstacle to building predictive models (Young et al., 2008; Southan et al., 2009; Williams and Ekins, 2011).

Considering these limitations, Fourches et al. (2010, 2015, 2016) developed guidelines for chemical and biological data curation as a first and mandatory step of predictive QSAR modeling. Organized into a solid functional process, these guidelines allow the identification, correction, or, if needed, removal of structural and biological errors in large data sets. Data curation procedures include the removal of organometallics, counterions, mixtures, and inorganics, as well as the normalization of specific chemotypes, structural cleaning (e.g., detection of valence violations), standardization of tautomeric forms, and ring aromatization. Additional curation elements include averaging, aggregating, or removing duplicates to produce a single bioactivity result per compound. A detailed discussion of the aforementioned data curation procedures can be found elsewhere (Fourches et al., 2010, 2015, 2016).
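The duplicate-handling step can be sketched as follows. This is a minimal illustration, not the published curation workflow: it assumes that structures have already been standardized so that identical compounds share the same key (for example, a canonical SMILES string), and it uses a hypothetical tolerance of 0.5 log units to decide whether replicate measurements agree well enough to be averaged.

```python
from collections import defaultdict
from statistics import mean


def merge_duplicates(records, max_spread=0.5):
    """Collapse repeated measurements of the same structure into one value.

    records: iterable of (structure_key, pIC50) pairs. The key is assumed to
    be an already standardized identifier such as a canonical SMILES.
    Replicates that agree within max_spread are averaged; structures with
    discordant replicates are set aside for manual inspection.
    """
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)

    curated, rejected = {}, []
    for key, values in grouped.items():
        if max(values) - min(values) <= max_spread:
            curated[key] = mean(values)
        else:
            rejected.append(key)
    return curated, rejected


# Toy records, invented for this example.
raw = [
    ("CCO", 5.1),
    ("CCO", 5.3),          # concordant replicate: averaged
    ("c1ccccc1", 4.0),     # single measurement: kept as is
    ("CC(=O)O", 6.2),
    ("CC(=O)O", 4.1),      # discordant replicate: rejected
]

curated, rejected = merge_duplicates(raw)
print(curated)
print(rejected)
```

Setting discordant records aside rather than averaging them avoids training a model on a value that no experiment actually produced.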

The Organization for Economic Cooperation and Development (OECD) developed a set of guidelines that researchers should follow to achieve regulatory acceptance of QSAR models. According to these principles, QSAR models should be associated with (i) a defined endpoint, (ii) an unambiguous algorithm, (iii) a defined domain of applicability, (iv) appropriate measures of goodness-of-fit, robustness, and predictivity, and (v) if possible, a mechanistic interpretation (OECD, 2004). In our opinion, an additional rule requiring thorough data curation as a mandatory preliminary step to model development should be added.
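Of these principles, the domain of applicability lends itself to a compact illustration. The sketch below uses one common, similarity-based definition: a query compound is considered inside the domain if its Tanimoto similarity to at least one training compound reaches a cutoff. This is only one of several ways to define an applicability domain and is not prescribed by the OECD text; the fingerprints (sets of "on" bit indices) and the cutoff of 0.5 are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def in_domain(query: frozenset, training_set, cutoff: float = 0.5) -> bool:
    """True if the query is similar enough to at least one training compound.

    training_set must be non-empty.
    """
    return max(tanimoto(query, fp) for fp in training_set) >= cutoff


# Toy fingerprints, invented for this example.
training_fps = [frozenset({1, 4, 7, 9}), frozenset({2, 4, 8, 9, 12})]
similar_query = frozenset({1, 4, 7, 12})
distant_query = frozenset({20, 21, 22})

print(in_domain(similar_query, training_fps))
print(in_domain(distant_query, training_fps))
```

Predictions for compounds outside the domain are usually flagged as unreliable rather than reported as activity calls.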

### CONTINUING IMPORTANCE OF QSAR AS VIRTUAL SCREENING TOOL

The current pipeline to discover hit compounds in early stages of drug discovery is a data-driven process, which relies on bioactivity data obtained from HTS campaigns (Nantasenamat and Prachayasittikul, 2015). Since the cost of obtaining new hit compounds in HTS platforms is rather high, QSAR modeling has been playing a pivotal role in prioritizing compounds for synthesis and/or biological evaluation. QSAR models can be used for both hit identification and hit-to-lead optimization. In the latter, a favorable balance between potency, selectivity, and pharmacokinetic and toxicological parameters, which is required to develop a new, safe, and effective drug, can be achieved through several optimization cycles. As no compound needs to be synthesized or tested before computational evaluation, QSAR represents a labor-, time-, and cost-effective method to obtain compounds with desired biological properties. Consequently, QSAR is widely practiced in industry, universities, and research centers around the world (Cherkasov et al., 2014).

The general scheme of the QSAR-based VS approach is shown in **Figure 1**. Initially, the data sets collected from external sources are curated and integrated to remove or correct inconsistent data. Using these data, QSAR models are developed and validated following OECD guidelines and best practices of modeling. Then, QSAR models are used to identify chemical compounds predicted to be active against selected endpoints from large chemical libraries (Cherkasov et al., 2014). VS is often compared to a funnel, where a large chemical library (i.e., 10<sup>5</sup> to 10<sup>7</sup> chemical structures) is reduced by QSAR models to a smaller number of compounds, which will then be tested experimentally (i.e., 10<sup>1</sup> to 10<sup>3</sup> chemical structures) (Kar and Roy, 2013; Tanrikulu et al., 2013). However, it is important to mention that modern VS workflows incorporate additional filtering steps, including: (i) sets of empirical rules [e.g., Lipinski's (Lipinski et al., 1997) rules], (ii) chemical similarity cutoffs, (iii) other QSAR-based filters (e.g., toxicological and pharmacokinetic endpoints), and (iv) chemical feasibility and/or purchasability (Cherkasov et al., 2014). Although the experimental validation of computational hits is not part of the QSAR methodology itself, it should be performed as the final important step. After experimental validation, a multi-parameter optimization (MPO) with QSAR predictions of potency, selectivity, and pharmacokinetic parameters can be conducted. This information is crucial during hit-to-lead and lead optimization, where the balance of properties (potency, selectivity, and PK) associated with different decoration patterns guides the design of a new series of target compounds for in vivo evaluation.
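The funnel can be sketched as a chain of filters over precomputed compound properties. The example below combines a QSAR-predicted probability of activity with Lipinski's rule of five (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors), tolerating one violation. The `Candidate` fields, the probability threshold of 0.7, and the four toy compounds are assumptions made for illustration; in practice the descriptors would come from a cheminformatics toolkit and the probabilities from a validated model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mol_weight: float   # Da
    logp: float
    h_donors: int
    h_acceptors: int
    p_active: float     # hypothetical QSAR-predicted probability of activity


def lipinski_violations(c: Candidate) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def screen(library, min_p_active=0.7, max_violations=1):
    """Keep predicted actives that pass the empirical filter, best first."""
    hits = [
        c for c in library
        if c.p_active >= min_p_active and lipinski_violations(c) <= max_violations
    ]
    return sorted(hits, key=lambda c: c.p_active, reverse=True)


# Toy library, invented for this example.
library = [
    Candidate("toy_1", 342.4, 2.1, 2, 5, 0.91),
    Candidate("toy_2", 612.7, 6.3, 1, 9, 0.88),   # two violations: removed
    Candidate("toy_3", 289.3, 1.4, 3, 4, 0.35),   # predicted inactive: removed
    Candidate("toy_4", 521.6, 4.2, 2, 8, 0.76),   # one violation: kept
]

hits = screen(library)
print([c.name for c in hits])
```

Further stages such as similarity cutoffs, toxicity models, or purchasability checks would be added as additional conditions in the same way.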

### QSAR-BASED VIRTUAL SCREENING vs. HIGH-THROUGHPUT SCREENING

High-throughput screening can rapidly identify large subsets of molecules with desired activity from large screening collections of compounds (10<sup>5</sup>–10<sup>6</sup> compounds) using automated plate-based experimental assays (Mueller et al., 2012). However, the hit rate of HTS ranges between 0.01% and 0.1%, which highlights the frequently encountered limitation that most of the screened compounds are routinely reported as inactive toward the desired bioactivity (Thorne et al., 2010). Consequently, the drug discovery cost increases according to the number of tested compounds (Butkiewicz et al., 2013). On the other hand, hit rates from a validated VS method, including QSAR-based ones, typically range between 1% and 40%. Thus, VS campaigns are found to yield a higher proportion of biologically active compounds at a lower cost than HTS.

In this context, we show that QSAR-based VS can be used to enrich the hit rates of HTS campaigns. For example, Mueller et al. (2010) employed both HTS and QSAR models to search for novel positive allosteric modulators of mGlu5, a G-protein coupled receptor involved in disorders like schizophrenia and Parkinson's disease. First, the HTS of approximately 144,000 compounds resulted in a total of 1,356 hits, with a hit rate of 0.94%. Then, this dataset was used to build continuous QSAR models (combining physicochemical descriptors and neural networks), which were subsequently applied to screen a database of approximately 450,000 compounds. Finally, 824 compounds were acquired for biological testing and 232 were confirmed as active (hit rate of 28.2%) (Mueller et al., 2010). In another study, Rodriguez et al. (2010) screened approximately 160,000 compounds to identify 624 antagonists of mGlu5. These data were then used to develop QSAR models, which were applied to screen nearly 700,000 compounds from the ChemDiv database. Among the acquired compounds, 88 were active, corresponding to a hit rate of 3.6%, while the HTS had a hit rate of 0.2% (Mueller et al., 2012).
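The hit rates quoted for the mGlu5 positive allosteric modulator campaign follow from simple arithmetic, and their ratio gives the enrichment obtained by the QSAR-based step. The short sketch below uses only the counts stated above; the function name is our own.

```python
def hit_rate(n_hits: int, n_tested: int) -> float:
    """Hit rate as a percentage of the compounds tested."""
    return 100.0 * n_hits / n_tested


# Counts reported for the mGlu5 positive allosteric modulator study.
hts_rate = hit_rate(1356, 144_000)   # experimental HTS campaign
vs_rate = hit_rate(232, 824)         # compounds prioritized by QSAR models
enrichment = vs_rate / hts_rate

print(f"HTS: {hts_rate:.2f}%  QSAR-based VS: {vs_rate:.1f}%  "
      f"enrichment: {enrichment:.0f}-fold")
# prints: HTS: 0.94%  QSAR-based VS: 28.2%  enrichment: 30-fold
```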

### PRACTICAL APPLICATIONS OF QSAR-BASED VIRTUAL SCREENING

Despite its obvious advantages, QSAR modeling remains underestimated as a VS tool. Unfortunately, QSAR is still seen as a complementary analysis to studies of synthesis and biological evaluation, often introduced in the study without any justification or additional perspective. Although the number of VS applications available in the literature is small, most of them led to the discovery of promising hits and lead candidates. Below, we discuss some successful applications of QSAR-based VS for the discovery of new hits and hit-to-lead optimization.

#### Malaria

Malaria is an infectious disease caused by five different species of Plasmodium parasites and transmitted to humans through the bite of infected female mosquitoes of the genus Anopheles. The most lethal species is P. falciparum, which can lead to severe illness and death (Phillips et al., 2017). Malaria is a widespread disease; 91 countries and areas have ongoing transmission. According to the World Health Organization (WHO), about 216 million cases and 445,000 deaths from malaria were reported in 2016 (WHO, 2018c). Furthermore, resistance to antimalarial drugs is a common and growing issue and constitutes a substantial threat for populations in endemic regions (Gorobets et al., 2017; Menard and Dondorp, 2017). In a study reported by Zhang et al. (2013), a data set of 3,133 compounds reported as active or inactive against the chloroquine-susceptible P. falciparum strain 3D7 was used to develop QSAR models. The models were built using Dragon descriptors (0D, 1D, and 2D), ISIDA-2D fragment descriptors, and the support vector machine (SVM) method. During QSAR modeling and validation, the data set was randomly divided into modeling and external evaluation sets. Additionally, the modeling set was divided multiple times into training and test sets using the Sphere Exclusion algorithm. Then, using a consensus approach, the QSAR models were applied for VS of the ChemBridge database. After VS, 176 potential antimalarial compounds were identified and submitted to experimental validation along with 42 putative inactive compounds, used as negative controls. Twenty-five compounds presented antimalarial activity in P. falciparum growth inhibition assays and low cytotoxicity in mammalian cells. All 42 compounds predicted as inactive by the models were confirmed experimentally (Zhang et al., 2013). The confirmed experimental hits presented new chemical scaffolds against P. falciparum and could be promising starting points for the development of new optimized antimalarial agents.
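Consensus prediction, as used in this and several of the following studies, combines the calls of individual models into a single decision. The exact consensus rule applied by Zhang et al. is not described here, so the sketch below shows one generic possibility: a compound is called active or inactive only when a chosen fraction of the models agree, and is otherwise left inconclusive. The function name, the agreement thresholds, and the vote vectors are hypothetical.

```python
def consensus_call(votes, min_agreement=1.0):
    """Combine binary predictions (1 = active, 0 = inactive) from several models.

    min_agreement is the fraction of models that must agree; 1.0 demands a
    unanimous vote. Values at or below 0.5 are rejected because both classes
    could then satisfy the rule at once.
    """
    if not votes:
        raise ValueError("at least one model prediction is required")
    if not 0.5 < min_agreement <= 1.0:
        raise ValueError("min_agreement must be in (0.5, 1.0]")

    frac_active = sum(votes) / len(votes)
    if frac_active >= min_agreement:
        return "active"
    if 1.0 - frac_active >= min_agreement:
        return "inactive"
    return "inconclusive"


# Toy votes from four hypothetical models.
print(consensus_call([1, 1, 1, 1]))                      # unanimous
print(consensus_call([1, 1, 1, 0]))                      # not unanimous
print(consensus_call([1, 1, 1, 0], min_agreement=0.75))  # relaxed threshold
print(consensus_call([0, 0, 0, 0]))
```

Stricter agreement thresholds trade coverage for confidence: fewer compounds receive a call, but those that do are supported by more models.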

#### Schistosomiasis

Schistosomiasis is a disease caused by flatworms of the genus Schistosoma that affects 206 million people worldwide (WHO, 2018d). The current reliance on only one drug, praziquantel, for treatment and control of this disease calls for the urgent discovery of novel antischistosomal drugs (Colley et al., 2014). Aiming to discover new drugs, our group developed binary QSAR models for Schistosoma mansoni thioredoxin glutathione reductase (SmTGR), a validated target for schistosomiasis (Kuntz et al., 2007), to find new structurally dissimilar compounds with antischistosomal activity (Neves et al., 2016). To achieve this goal, we designed a study with the following steps: (i) curation of the largest possible data set of SmTGR inhibitors, (ii) development of rigorously validated and mechanistically interpretable models, and (iii) application of the generated models for VS of the ChemBridge library. Using the QSAR models, we prioritized 29 compounds for further experimental evaluation. As a result, we found that the QSAR models were efficient for the discovery of six novel hit compounds active against schistosomula and three hits active against adult worms (hit rate of 20.6%). Among them, 2-[2-(3-methyl-4-nitro-5-isoxazolyl)vinyl]pyridine and 2-(benzylsulfonyl)-1,3-benzothiazole, two compounds representing new chemical scaffolds, are active against schistosomula and adult worms at low micromolar concentrations and therefore represent promising antischistosomal hits for further hit-to-lead optimization (Neves et al., 2016).

In another study, we developed continuous QSAR models for a data set of oxadiazole inhibitors of SmTGR (Melo-Filho et al., 2016). Using a combi-QSAR approach, we built a consensus model combining the predictions of individual 2D- and 3D-QSAR models. Then, the model was used for VS of the ChemBridge database, and the 10 top-ranked compounds were further evaluated in vitro against schistosomula and adult worms. Additionally, we applied five highly predictive in-house QSAR models for prediction of important pharmacokinetic and toxicity properties of the new hits. The experimental results showed that 4-nitro-3,5-bis(1-nitro-1H-pyrazol-4-yl)-1H-pyrazole (LabMol-17) and 3-nitro-4-{[(4-nitro-1,2,5-oxadiazol-3-yl)oxy]methyl}-1,2,5-oxadiazole (LabMol-19), two compounds containing new chemical scaffolds (hit rate of 20.6%), were highly active in both life stages of the parasite at low micromolar concentrations (Melo-Filho et al., 2016).

#### Tuberculosis

Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), kills about 1.6 million people every year (WHO, 2018e). The current treatment of this disease takes approximately 9 months, which normally leads to noncompliance and, hence, the emergence of multidrug-resistant bacteria (AlMatar et al., 2017). Aiming to design new anti-TB agents, our group used QSAR models to design a new series of chalcone (1,3-diaryl-2-propen-1-one) derivatives. Initially, we retrieved from the literature all chalcone compounds with in vitro inhibition data against the M. tuberculosis H37Rv strain. After rigorous data curation, these chalcones were subjected to structure–activity relationship (SAR) analysis. Based on SAR rules, bioisosteric replacements were employed to design new chalcone derivatives with optimized anti-TB activity. In parallel, binary QSAR models were generated using several machine learning methods and molecular fingerprints. The fivefold external cross-validation procedure confirmed the high predictive power of the developed models. Using these models, we prioritized a series of chalcone derivatives for synthesis and biological evaluation (Gomes et al., 2017). As a result, five 5-nitro-substituted heteroaryl chalcones were found to exhibit MICs at nanomolar concentrations against replicating mycobacteria, as well as low micromolar activity against nonreplicating bacteria. In addition, four of these compounds were more potent than the standard drug isoniazid. The series also showed low cytotoxicity against commensal bacteria and mammalian cells. These results suggest that the designed heteroaryl chalcones, identified with the help of QSAR models, are promising anti-TB lead candidates (Gomes et al., 2017).
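In a fivefold external cross-validation, the data set is split into five parts; each part is held out once as an external set while models are built on the remaining four. The sketch below shows only the index bookkeeping for such a split, written with the standard library. The function name, the random seed, and the data set size of 23 are arbitrary choices for illustration, and no claim is made about the exact partitioning used in the cited study.

```python
import random


def k_fold_splits(n_items: int, k: int = 5, seed: int = 0):
    """Yield (training_indices, held_out_indices) for each of k folds.

    Every item is held out exactly once; the folds differ in size by at
    most one item when n_items is not a multiple of k.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")

    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)          # reproducible shuffling
    folds = [indices[i::k] for i in range(k)]     # deal indices into k folds

    for i, held_out in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out


splits = list(k_fold_splits(23, k=5))
for fold_no, (train, external) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train)} training, {len(external)} external")
```

Descriptor selection and parameter tuning should use only the training indices of each fold, so that the held-out compounds remain truly external.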

#### Viral Infections


Every year, influenza epidemics can seriously affect all populations in the world. These annual epidemics are estimated to result in about 5 million cases and 650,000 deaths (WHO, 2018b). The influenza virus mutates constantly, resulting in novel resistant strains, and hence, the development of new anti-influenza drugs active against these new strains is important to prevent pandemics (Laborda et al., 2016). Aiming to discover new anti-influenza A drugs, Lian et al. (2015) built binary QSAR models, using SVM and Naïve Bayesian methods, to predict inhibition of neuraminidase, a validated protein target for influenza. Then, four different combinations of machine learning methods and molecular descriptors were applied to screen 15,600 compounds from an in-house database, among which 60 compounds were selected for experimental evaluation of neuraminidase inhibition. Nine inhibitors were identified, five of which were oseltamivir derivatives exhibiting potent neuraminidase inhibition at nanomolar concentrations. The other four active compounds belonged to novel scaffolds, with potent inhibition at low micromolar concentrations (Lian et al., 2015).

According to the WHO, approximately 35 million people are infected with HIV (WHO, 2018a). The treatment of HIV infections requires lifelong antiretroviral therapy, targeting different stages of the HIV replication cycle. Consequently, because of the emergence of resistance and the lack of tolerability, the development of novel anti-HIV drugs is in high demand (Cihlar and Fordyce, 2016; Garbelli et al., 2017). With the purpose of discovering new anti-HIV-1 drugs, Kurczyk et al. (2015) developed a two-step VS approach to prioritize compounds against HIV integrase, an important target in the viral replication cycle. The first step was based on binary QSAR models, and the second on privileged fragments. Then, 1.5 million commercially available compounds were screened, and 13 compounds were selected to be tested in vitro for inhibition of HIV-1 replication. Among them, two novel chemotypes with moderate anti-HIV-1 potencies were identified, which therefore represent new starting points for prospective structural optimization studies.

#### Mood and Anxiety Disorders

The 5-hydroxytryptamine 1A (5-HT1A) serotonin receptor has been an attractive target for treating mood and anxiety disorders such as schizophrenia (Nichols and Nichols, 2008; Lacivita et al., 2012). However, the currently marketed drugs targeting the 5-HT1A receptor have severe side effects. To address this, Luo et al. (2014) developed a QSAR-based VS workflow to find new hit compounds targeting the 5-HT1A receptor. First, binary QSAR models were generated using Dragon descriptors and several machine learning methods. Then, the developed QSAR models were rigorously validated and applied in consensus for VS of four commercial chemical databases. Fifteen compounds were selected for experimental testing, and nine of them proved to be active at low nanomolar concentrations. One of the confirmed hits, [(8α)-6-methyl-9,10-didehydroergolin-8-yl]methanol, showed a very high binding affinity (Ki) of 2.3 nM against the 5-HT1A receptor.

### FUTURE DIRECTIONS AND CONCLUSION

To summarize, we would like to emphasize that QSAR modeling represents a time-, labor-, and cost-effective tool to discover hit compounds and lead candidates in the early stages of the drug discovery process. Analyzing the examples of QSAR-based VS available in the literature, one can see that many of them led to the identification of promising lead candidates. However, along with success stories, many QSAR projects fail at the model building stage. This is caused by a lack of understanding that QSAR is a highly interdisciplinary and applied field, as well as by general ignorance of the best practices in the field (Tropsha, 2010; Ban et al., 2017). Earlier, we explained this by the undesirably high population of "button pushers," that is, researchers who conduct modeling without understanding and analyzing the data and the modeling process itself (Muratov et al., 2012). This was also explained by the deceptive ease of obtaining a computational model and performing even advanced calculations without understanding the meaning and limitations of the approach (Bajorath, 2012). In addition, many researchers, even experienced ones, direct their efforts to a "vicious statistical cycle," whose main goal is to validate models using as many metrics as possible. In this case, QSAR modeling is restricted to a single simple question: "What is the best metric or the best statistical method?" Although we recognize that the right choice of statistical approach and especially rigorous external validation are necessary and represent an essential step in any computer-aided drug discovery study, we want to reinforce that QSAR modeling is useful only if it is applied to the solution of a formulated problem and results in the development of new compounds with desired properties.

As future directions, we would like to point out that the era of big data has just started, and it is still in the chemical/biological data accumulation stage. Therefore, to avoid a situation in which the number of assayed compounds available in the literature exceeds the modeling capability, the development and implementation of new machine learning algorithms and data curation methods capable of handling millions of compounds are urgently needed. Finally, the overall success of any QSAR-based VS project depends on the ability of a scientist to think critically and prioritize the most promising hits according to their experience. Moreover, the success rate of collaborative drug discovery projects, where the final selection of computational hits is done by both a modeler and an expert in a given field, is much higher than the success rate of projects driven solely by computational or experimental scientists.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING


This work was partially funded by Grant No. 1U01CA207160 from NIH and Grant No. 400760/2014-2 from CNPq. CHA is a Research Fellow in Productivity of CNPq.



#### ACKNOWLEDGMENTS

The authors would like to thank Brazilian funding agencies, CNPq, CAPES, and FAPEG, for financial support and fellowships.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Neves, Braga, Melo-Filho, Moreira-Filho, Muratov and Andrade. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Transfer and Multi-task Learning in QSAR Modeling: Advances and Challenges

Rodolfo S. Simões<sup>1</sup>, Vinicius G. Maltarollo<sup>2</sup>, Patricia R. Oliveira<sup>1</sup> and Kathia M. Honorio<sup>1,3</sup>\*

<sup>1</sup> School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, Brazil, <sup>2</sup> Department of Pharmaceutical Products, Faculty of Pharmacy, Federal University of Minas Gerais, Belo Horizonte, Brazil, <sup>3</sup> Center for Natural and Human Sciences, Federal University of ABC, Santo André, Brazil

Medicinal chemistry projects involve several steps aimed at developing a new drug, such as the analysis of biological targets related to a given disease, the discovery and development of drug candidates for these targets, and parallel biological tests to validate drug effectiveness and side effects. Approaches such as quantitative structure-activity relationship (QSAR) studies involve the construction of predictive models that relate a set of descriptors of a chemical compound series to its biological activities with respect to one or more targets in the human body. Datasets used to perform QSAR analyses are generally characterized by a small number of samples, which makes it harder to build accurate predictive models. In this context, transfer and multi-task learning techniques are very suitable since they take information from other QSAR models for the same biological target, reducing the effort and cost of generating new chemical compounds. Therefore, this review presents the main features of transfer and multi-task learning studies, as well as some applications and their potential in drug design projects.

#### Edited by:

Leonardo G. Ferreira, University of São Paulo, Brazil

#### Reviewed by:

Simone Brogi, University of Siena, Italy Andrew James Greenshaw, University of Alberta, Canada

\*Correspondence:

Kathia M. Honorio kmhonorio@usp.br

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 22 September 2017 Accepted: 22 January 2018 Published: 06 February 2018

#### Citation:

Simões RS, Maltarollo VG, Oliveira PR and Honorio KM (2018) Transfer and Multi-task Learning in QSAR Modeling: Advances and Challenges. Front. Pharmacol. 9:74. doi: 10.3389/fphar.2018.00074

Keywords: drug design, medicinal chemistry, QSAR, machine learning, transfer learning, multi-task learning

#### INTRODUCTION

The drug design process, from the discovery/identification of bioactive compounds to the approval of their clinical use by a regulatory agency, is very complex and demands time and financial support (Tufts Center for the Study of Drug Development [CSDD], 2014). There are several well-known bottlenecks in this process, such as finding a suitable and validated molecular target, designing and/or discovering a lead compound, and optimizing pharmacokinetics and toxicity, besides commercial reasons, efficacy, and clinical safety (Khanna, 2012; Medina-Franco et al., 2013). In this scenario, the use of computational techniques in drug discovery is rapidly increasing.

Computer-aided drug design (CADD) techniques are broadly employed in order to reduce the cost and time involved in drug design. Among the important CADD techniques, molecular docking, similarity search, and QSAR studies can be highlighted. Molecular docking and virtual screening are considered structure-based drug design (SBDD) strategies since they require the 3D structure of a molecular target and consist of predicting the binding mode of molecules and their binding energy (Walters et al., 1998; Shoichet, 2004; Andricopulo et al., 2008). As docking simulations consider both structures (ligands and targets), their calculations are more computationally expensive. Considering these aspects, similarity searches and pharmacophore modeling are faster alternatives (Brogi et al., 2009; Tresadern et al., 2009) and are defined as ligand-based drug design (LBDD) strategies since they do not require the biological target structure (Turki et al., 2017). **Figure 1** illustrates the main steps in a drug design process, including the use of computational tools.

Another LBDD strategy is known as quantitative structure-activity relationships (QSAR), and it has been widely employed in drug design, mainly aiming to predict the biological activity of a compound set against a specific target in order to optimize the binding affinity (Du et al., 2008; Gertrudes et al., 2012). QSAR models provide accurate predictions of measured endpoints rather than an independent ranking of biological activity. These quantitative approaches have also been used in other tasks, such as optimization of pharmacokinetic and toxicity profiles (Maltarollo et al., 2015; Egeghy et al., 2016; Chemi et al., 2017) and virtual screening (Brogi et al., 2013; Melo-Filho et al., 2016; Neves et al., 2016; Zaccagnini et al., 2017).

Several important QSAR studies can be found in the literature, which include the description of successful computational methods and algorithms (Sliwoski et al., 2014; Raies and Bajic, 2016), validation techniques (Gramatica and Sangion, 2016), and applications (Cherkasov et al., 2014; Fang and Xiao, 2016), as well as challenges and how they have been addressed (Cronin and Schultz, 2003; Arthur, 2008; Dearden et al., 2009; Scior et al., 2009; Wang et al., 2015; Ponzoni et al., 2017).

In many recent studies, machine learning (ML) methods have been widely applied to QSAR analyses. This growth has been mainly motivated by the increasing availability of data in public repositories, the use of numerous and diverse chemical descriptors, and the proposal of accurate predictive algorithms, such as support vector machines (SVMs) and artificial neural networks (Gertrudes et al., 2012; Maltarollo et al., 2013; Mitchell, 2014; Lima et al., 2016). A common application of ML techniques in CADD is the prediction of class labels for new compounds (e.g., "active" versus "inactive") using models previously derived from available training sets (Lavecchia, 2015). In this specific situation, ML techniques are said to perform a classification learning task. In addition, other sorts of learning tasks can also be considered in CADD, such as clustering and ranking (Agarwal et al., 2010).

Despite the widespread use of ML methods in QSAR modeling, the success of such approaches critically depends on the availability of a large amount of data, which remains challenging in drug discovery. This problem is strongly related to issues involving the quality of public data sources, including imprecise representation of chemical structures and inaccurate activity information (Zhao et al., 2017). Furthermore, different experimental protocols can lead to data belonging to different probability distributions, which makes the use of traditional ML techniques impracticable.

The data sets available in public repositories are usually obtained from single structure-activity relationship (SAR) campaigns. This explains the many particular and linear sets of compounds that are commonly used to generate only specialized QSAR models. In most cases, the biological activities of two datasets are measured under different experimental conditions, making the link between chemical spaces difficult to analyze (Richter and Ecker, 2015). Furthermore, a large chemical space naturally contains activity cliffs: regions in a structure/activity surface where the SAR is discontinuous (Cruz-Monteagudo et al., 2014).

In 2014, a review on QSAR (Cherkasov et al., 2014) stated that the transferability of QSAR models is one of the challenges in QSAR modeling, since the traditional approaches have been typically designed for each target property individually. Aiming to take advantage of diverse but related available experimental data, transfer and multi-task learning techniques have been recently developed. The novelty behind these approaches is related to their ability to exploit knowledge from other related tasks to improve the learning performance, especially when a small data set is available for training.

### TRANSFER AND MULTI-TASK LEARNING

For QSAR purposes, the data space under analysis is characterized by biological and chemical properties. In such a scenario, changes in the distribution of data force the model to be rebuilt, which implies collecting new training data. However, in many real-world applications, it is expensive or impossible to recollect the data required to reconstruct these models. In such situations, transfer learning (or knowledge transfer) among related domains would be desirable (Pan and Yang, 2010).

Transfer learning can be defined as the ability of a system to recognize and apply the knowledge learned in previous (source) tasks to the solution of new (target) problems. The development of such an approach was motivated by the fact that previously acquired knowledge can be applied to solve new problems more quickly and with better solutions. The goal is to extract the knowledge obtained by a model from one or more source tasks and to apply it to a target task. However, one of the premises for using transfer learning is that the source and the target domains must be related. In this sense, Tan et al. (2015) suggest that such a relationship can be expressed by instances (Bickel et al., 2009) or characteristics (Satpal and Sarawagi, 2007). If no direct relationship is found, the forced transfer will not work, resulting in no improvement or even in degraded performance in the target domain (Fitzgerald and Thomaz, 2015). Multi-task learning is closely related to knowledge transfer, but there is also a clear distinction between them. In multi-task approaches, a number of tasks are learned simultaneously, without designated source and target tasks. **Figure 2** illustrates the overall schemes for transfer and multi-task learning.

The methods used for transfer learning can be summarized into four categories, depending on which aspect of knowledge will be transferred, i.e., "what to transfer" (Pan and Yang, 2010). The first category refers to instance-based transfer learning, which assumes that some data from the source set can be selected for training in the target set by re-weighting. Importance sampling and instance reweighting are the two most commonly used techniques (Dai et al., 2007). The second category refers to transfer learning methods by feature representation, which focus on encoding the structural information carried by molecules into a numerical representation that can be effectively exploited by learning processes in other related problems. In this case, the intuitive idea consists in learning a suitable representation of characteristics for use in the target set, i.e., the transferred knowledge is encoded in the representation of the new characteristics (Raina et al., 2007). The third category refers to transfer learning techniques by parameters (Lawrence and Platt, 2004), in which it is assumed that the source and the target tasks share some parameters or prior distributions of the hyper-parameters of the respective QSAR models. In this case, knowledge can be transferred between the tasks by discovering these shared parameters or priors. The last category consists of methods that deal with the problem of relational knowledge transfer, which refers to transfer learning in related domains (Mihalkova et al., 2007). In this condition, the knowledge can be transferred by mapping the data from the source set to the target one. Statistical relational learning methods are the most applied in this case (Mihalkova et al., 2007; Davis and Domingos, 2009). A scheme illustrating how the transfer learning approaches can be applied to obtain predictive models is presented in **Figure 3**.
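As a rough illustration of the parameter-based category, the following minimal sketch treats transfer as regularization toward a source parameter: a one-parameter linear model is fitted on a small target set while being shrunk toward a slope assumed to have been learned on a larger source set. The function name, the `strength` parameter, and all numbers are hypothetical and are not taken from the cited works.

```python
from typing import Sequence


def fit_with_source_prior(
    x: Sequence[float],
    y: Sequence[float],
    w_source: float,
    strength: float,
) -> float:
    """Fit y ~ w * x on target data while shrinking w toward w_source.

    Minimizes sum((y_i - w * x_i)^2) + strength * (w - w_source)^2
    and returns the closed-form minimizer.
    """
    if strength < 0:
        raise ValueError("strength must be non-negative")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    denom = sxx + strength
    if denom == 0:
        raise ValueError("no informative data and no prior strength")
    return (sxy + strength * w_source) / denom


# Hypothetical values: a slope from a large source set and two target points.
w_src = 2.0
x_target = [1.0, 2.0]
y_target = [2.2, 3.8]

w_alone = fit_with_source_prior(x_target, y_target, w_src, strength=0.0)
w_transfer = fit_with_source_prior(x_target, y_target, w_src, strength=1.0)
```

With `strength=0.0` the fit uses the target data only; larger values pull the estimate toward the source parameter, which is the intended effect when the target set is small and the tasks are assumed to be related.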

To apply transfer learning techniques, it is assumed that two sets of related data are available and that the knowledge will be transferred from the larger dataset to the set with the least amount of available data. However, this assumption is not always sufficient for chemical datasets, and the opinion of an expert is required to define the source datasets. To overcome this limitation, Girschick et al. (2012) proposed a data-driven approach to select a source dataset from a repository containing target-related sets. The main idea is to calculate a measure of the activity overlap between the target set and each related set available in the PubChem database. As a result, a ranking of all related sets according to their similarity to the target set is obtained. To obtain the similarity values, the Tanimoto coefficient is calculated using the categorization of the chemical compounds (active/inactive) in each dataset. The objective is therefore to select the set whose distribution of instances (compounds) is closest to the distribution (number of instances categorized as active and inactive) of the target set.
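The ranking idea can be sketched as follows. This is a simplified illustration, not the published procedure: it assumes that each dataset is a mapping from a compound identifier to an active/inactive label, that only compounds tested in both sets are compared, and that the overlap is the Tanimoto coefficient between the two sets of actives. All names and data are hypothetical.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, bool]  # compound identifier -> True if active


def activity_overlap(target: Labels, candidate: Labels) -> float:
    """Tanimoto coefficient between the active sets of two datasets.

    Only compounds present in both datasets are compared (an assumption).
    """
    shared = target.keys() & candidate.keys()
    active_t = {c for c in shared if target[c]}
    active_c = {c for c in shared if candidate[c]}
    union = active_t | active_c
    if not union:
        return 0.0
    return len(active_t & active_c) / len(union)


def rank_sources(
    target: Labels, candidates: Dict[str, Labels]
) -> List[Tuple[str, float]]:
    """Rank candidate source datasets by decreasing overlap with the target."""
    scored = [
        (name, activity_overlap(target, labels))
        for name, labels in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data.
target = {"c1": True, "c2": True, "c3": False, "c4": False}
candidates = {
    "assay_A": {"c1": True, "c2": True, "c3": False, "c4": True},
    "assay_B": {"c1": False, "c2": True, "c3": True, "c5": True},
}
ranking = rank_sources(target, candidates)
```

The top-ranked entry would be the candidate source task; in practice a minimum number of shared compounds would probably be required before trusting the score.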

Transfer learning can add benefits in many situations. For example, molecules could be classified as active or inactive according to biological data for a defined endpoint (e.g., IC<sub>50</sub> values). For this classification task, it is initially necessary to collect several experimentally tested samples and then to train a classifier on the collected data with their respective labels. Since the probability distribution of the data for other endpoints can be very different, a new classifier has to be trained for each dataset in order to maintain satisfactory performance. To reduce this effort, it would be desirable to use the knowledge from a classification model that is already trained on some related endpoints to improve the classification performance of other tasks with small samples or datasets (Turki et al., 2017). **Table 1** illustrates examples of transfer learning in drug design.

In general, transfer learning approaches have been shown to be promising for combining the knowledge previously obtained in related tasks into a single predictive model, whether for classification, regression, or clustering (Pan and Yang, 2010). In particular, medicinal chemistry research focused on drug discovery has benefited from the use of transfer learning, as can be seen in previous studies (Girschick et al., 2012; Rosenbaum et al., 2013; Saha et al., 2016). Next, applications of transfer and multi-task learning in medicinal chemistry studies will be presented.

### SOME APPLICATIONS OF TRANSFER AND MULTI-TASK LEARNING

Many machine learning methods are based on the assumption that similar drugs may share the same side effects, but measuring the similarity of these drugs is still a challenge. However, the use of data from various sources (similar drugs) provides important information for the analysis of side effects and should be integrated to obtain a highly accurate prediction. Zhang et al. (2016) discussed the problem of predicting side effects caused by drugs through linear neighbor approaches and the integration of data from various sources. The authors argued that auxiliary data can bring additional and diverse information (such as drug substructures, drug targets, drug transporters, drug enzymes, and drug pathways) that should be integrated into side-effect prediction to improve its performance. Analyses on multi-label classification showed that the proposed transfer learning approaches achieved better performance than state-of-the-art methods (Pauwels et al., 2011; Liu et al., 2012; Cheng et al., 2013) applied to benchmark datasets.

The task of relating chemical structure to biological activity in QSAR studies is usually based on the notion of chemical similarity to predict the molecular behavior of close compounds. Techniques that provide similarity measures among chemical compounds are therefore increasingly important (Floris et al., 2014). Lately, relevant solutions have been proposed, which comprise distance learning (Biehl et al., 2014) and inductive transfer (Garcke and Vanck, 2014) methods. Distance learning aims at learning an appropriate distance measure to reflect the underlying relationship between instances in the training set, while inductive transfer refers to the process of transferring knowledge learned from one task into another related task. Girschick et al. (2012) presented an adapted transfer approach, which combines distance learning and inductive transfer by learning the distances on a related task and then transferring them to the target learning task. Additionally, the authors developed a method for selecting a related task that can be used as the source task for transfer learning. This technique consists in applying an activity overlap similarity measure to two datasets to find a suitable source task. The approach was evaluated on five distinct datasets from the PubChem BioAssay repository (Wang et al., 2009). The results showed that both proposals worked well for large and small amounts of training data.

The multi-task learning approach (Caruana, 1998) is considered to be closely related to transfer learning, since it attempts to learn multiple tasks simultaneously even when they are different. Rosenbaum et al. (2013) introduced two multi-task methods and evaluated their performance by inferring multi-target QSAR models on a subset of the human kinome. The authors assumed that the taxonomical relationship of the kinase targets should correspond to the relatedness of the QSAR problems on these targets. The multi-task techniques were compared to SVM models independently trained for each target and to an SVM model that assumed all targets to be identical. The results demonstrated that multi-target learning can outperform baseline (pure SVM) methods if knowledge can be transferred from a target with a lot of data to a similar target with little domain knowledge.

TABLE 1 | Examples of potential applications of transfer learning methods in drug design.

Varnek et al. (2009) applied different inductive transfer and multi-task learning approaches to model tissue-air partition coefficients. The authors found that these techniques improved the prediction accuracy of the obtained models when compared to single-task learning. This study also indicated that inductive transfer learning is very suitable when single-task modeling is unable to generate reliable QSAR models from diverse data sets with a small number of samples.

Brown et al. (2014) presented some challenges involved with chemogenomic data, since high-throughput assays provide a large amount of information from multi-ligand and multi-target data (Pereira and Williams, 2007). The authors assert that computational techniques, in particular inductive transfer and explicit learning, can help to construct more robust models when compared to target-specific (classical) QSAR ones.

The study of Zhang et al. (2013) discussed the use of single- and multi-task learning to construct QSAR models for predicting the binding affinity of a compound database for estrogen receptors (ERs). These receptors are involved in endocrine disruption by chemicals, and the construction of predictive models can contribute to the design of safer substances. The authors concluded that multi-task learning provided better results for a small dataset (ERβ ligands) than single-task learning, indicating that this approach can be considered a good tool to understand the mechanism of action of endocrine disruption and to predict the ER activity of unknown compounds as endocrine disrupting chemicals.

Another interesting application of multi-task techniques was performed by Liu et al. (2011), who used multi-task learning to construct multi-target QSAR models employing three human immunodeficiency virus (HIV) inhibitor datasets together with other six subsets containing two hepatitis C virus (HCV) inhibitors. The main conclusions of this study included the fact that the integration of all databases (HIV and HCV) improved the rate of discovery of lead HIV-HCV inhibitors, helping the design of new co-inhibitors for these important infections. Another achievement is the successful use (considering efficiency in convergence speed and learning accuracy) of a multi-task learning technique to construct multi-target QSAR models.

### DISCUSSION

The main issue of transfer and multi-task learning approaches is to employ the knowledge generated (e.g., features, subset of variables, weights of equations) from available ML models and other datasets in the construction of models for related endpoints. In this sense, it is possible to use different datasets with the same biological activities but measured under different experimental conditions. Another important consequence of applying transfer and multi-task learning is the decrease in computational costs, related to the faster convergence obtained by using the knowledge derived from a model previously built for a related endpoint.

From a literature review taking into account the transfer learning applications in medicinal chemistry, one can note that there is still great potential to be explored. Other emerging approaches, such as deep learning methods (Zhang et al., 2017), which basically use complex neural network architectures, also have promising applications in the era of big data.

Among the main challenges in applying transfer and multi-task learning methods is that they require an artificial intelligence expert to code them, since there are no chemical and/or pharmaceutical packages with a graphical user interface. Depending on the source data and on the learning method, transfer and multi-task learning models could also be considered "black boxes," making the interpretability of QSAR models difficult. Finally, the transfer of knowledge could be inappropriately employed if the assumption of "equivalent" endpoints is not valid.

### CONCLUSION

Nowadays one can observe an increasing number of applications of transfer and multi-task learning in medicinal chemistry studies. There are also current challenges in the QSAR field that comprise the integration of different datasets (even from different experiments) aimed at the same or similar endpoints (Maltarollo et al., 2017) and the development of universal QSAR models using very large datasets (Alves et al., 2017). Therefore, good examples of datasets that could benefit from transfer and multi-task learning are: (i) compounds with the same endpoint measured under different experimental conditions; (ii) antimicrobial activities against genetically similar microorganisms; (iii) compounds with the same mechanism of action in homologous targets with a high degree of similarity in the binding pocket; (iv) non-specific endpoints such as toxicity against a cell line or permeability rates determined by different models. In this complex scenario, transfer and multi-task learning techniques can be considered powerful tools for drug design.

### AUTHOR CONTRIBUTIONS

RS, KH, VM, and PO designed this article. All authors wrote and revised the manuscript. Also, all authors read and approved the final version of the manuscript.

#### ACKNOWLEDGMENTS

The authors would like to thank FAPESP, CNPq, CAPES, and IBM for funding.

### REFERENCES




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a shared affiliation, though no other collaboration, with the authors.

Copyright © 2018 Simões, Maltarollo, Oliveira and Honorio. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Development of an Infrastructure for the Prediction of Biological Endpoints in Industrial Environments. Lessons Learned at the eTOX Project

#### Manuel Pastor\*, Jordi Quintana and Ferran Sanz\*

Research Programme on Biomedical Informatics (GRIB), Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain

#### Edited by:

Leonardo L. G. Ferreira, Universidade de São Paulo, Brazil

#### Reviewed by:

Antreas Afantitis, NovaMechanics Ltd., Cyprus; Ursula Gundert-Remy, Charité – Universitätsmedizin Berlin, Germany; Grace Patlewicz, Environmental Protection Agency (EPA), United States

\*Correspondence: Manuel Pastor, manuel.pastor@upf.edu; Ferran Sanz, ferran.sanz@upf.edu

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 13 July 2018 Accepted: 21 September 2018 Published: 11 October 2018

#### Citation:

Pastor M, Quintana J and Sanz F (2018) Development of an Infrastructure for the Prediction of Biological Endpoints in Industrial Environments. Lessons Learned at the eTOX Project. Front. Pharmacol. 9:1147. doi: 10.3389/fphar.2018.01147

In silico methods are increasingly being used for assessing the chemical safety of substances, as a part of integrated approaches involving in vitro and in vivo experiments. A paradigmatic example of these strategies is the eTOX project (http://www.etoxproject.eu), funded by the European Innovative Medicines Initiative (IMI), which aimed at producing high quality predictions of in vivo toxicity of drug candidates and resulted in generating about 200 models for diverse endpoints of toxicological interest. In an industry-oriented project like eTOX, apart from the predictive quality, the models need to meet other quality parameters related to the procedures for their generation and their intended use. For example, when the models are used for predicting the properties of drug candidates, the prediction system must guarantee the complete confidentiality of the compound structures. The interface of the system must be designed to provide non-expert users with all the information required to choose the models and appropriately interpret the results. Moreover, procedures like installation, maintenance, documentation, validation and versioning, which are common in software development, must also be implemented for the models and for the prediction platform that hosts them. In this article we describe our experience in the eTOX project and the lessons learned after 7 years of close collaboration between industrial and academic partners. We believe that some of the solutions found and the tools developed could be useful for supporting similar initiatives in the future.

Keywords: in silico toxicology, computational toxicology, predictive models, chemical safety, drug safety, industrial environments, public-private partnership, machine learning

### INTRODUCTION

In silico methods are increasingly being used in the safety assessment of chemicals as a part of integrated approaches, in which computational tools are used to synergistically complement the experimental methods, with the aim of generating better and more efficient predictions of the potential toxicological liabilities of the compounds under study (Luechtefeld et al., 2018). Recent advances in machine learning and deep learning methodologies are demonstrating their effectiveness in this respect (Lenselink et al., 2017; Liu et al., 2017). Moreover, ambitious collaborative initiatives in this field have been set up with the aim of increasing the availability of relevant data frameworks and developing the aforementioned integrative approaches on top of those data. Among these initiatives, EU-ToxRisk (Daneshian et al., 2016), HESS (Sakuratani et al., 2013), TransQST (Maldonado et al., 2017), iPiE (Bravo et al., 2016) and eTOX (Cases et al., 2014; Sanz et al., 2017; Steger-Hartmann and Pognan, 2018) deserve to be highlighted.

In particular, the eTOX project, funded by the Innovative Medicines Initiative, constituted a pioneering exercise in extracting and integrating in vivo data from legacy resources at the pharmaceutical industry and exploiting such data for read-across and the development of predictive models. One of the aims of the eTOX project was to set up an integrated system for the prediction of toxicological endpoints, with a focus on organ and in vivo endpoints. The project faced many challenges, some of which have been described in previous publications (Cases et al., 2014; Sanz et al., 2015, 2017; Steger-Hartmann and Pognan, 2018). Here we wish to share our experiences in an aspect that is often overlooked in this kind of project: how to translate predictive models generated by academia or by Small-Medium Enterprises (SMEs) into a production environment where they can be routinely applied. Irrespective of its scientific quality, a model must meet several requirements to be usable by the pharmaceutical industry. This requires building a common understanding between academic and industrial partners, identifying the end-user needs, and making significant efforts to incorporate into the models and the predictive system features that, in spite of their low scientific interest, make the difference between usable and unusable models. In the present article we describe the most significant lessons learned in eTOX, including some of the problems we identified and the solutions applied to solve or mitigate them. Most of these solutions are the result of long hours of discussion, in which we learned to understand each other's points of view.

#### RESULTS

Developing a computational model for predicting a biological endpoint is a complex task. In the case of QSAR-like models, their development involves (at least) the curation of the training series, the selection of appropriate molecular descriptors and machine learning methods, and the building, validation, and interpretation of the model. However, when the aim is to produce a model that can be used by people outside of the modeler's laboratory, the work does not finish with the generation of the model. There is an additional difficulty if we intend to use the models in industrial environments, particularly if the structures of the compounds should be treated as confidential.

In the following sections we will discuss issues related to model development and implementation, the need for a standard modeling framework supporting model development and maintenance, as well as model documentation and validation. In the last section we will also discuss the problems related to the confidentiality of the structures used for model training and application.

### Platform for Model Development and Production

Most of the eTOX models were developed by academic partners and SMEs located in different European countries. Therefore, the architecture of the system had to support independent and concurrent model development, while making model prototypes accessible to the end users (pharmaceutical companies) for early testing. This software platform, designed to increase model development efficiency, also had to be compatible with the local deployment of the final system. The final version had to be installed physically at the computational resources of the pharmaceutical companies, since the end users considered that only an installation behind the company firewalls guaranteed that the models could be used on highly confidential compounds corresponding to drug candidates under development.

These requirements made it necessary to adopt technical solutions that facilitate remote access to the models and the portability of their software implementation, which consisted of two layers of containerization. The outer layer consisted of a self-contained virtual machine (VM), configured to expose a REpresentational State Transfer (REST) web service (Fielding, 2000) to predict the properties of query compounds. VMs were installed at the partners' facilities, so that models developed at remote sites were immediately accessible through a centralized web server, which shows all available models through a single graphical interface (see **Figure 1**). The physical location of the server running the computations was completely transparent to the end user.
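To make the idea of a VM exposing a prediction web service more concrete, the following is a minimal sketch using only the Python standard library. It is not the eTOX implementation: the `/predict` path, the JSON fields, the `MODELS` registry, and the placeholder model are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict

# Hypothetical registry: model name -> function mapping a SMILES string to a value.
MODELS: Dict[str, Callable[[str], float]] = {
    "toy_model": lambda smiles: float(len(smiles)),  # placeholder, not a real model
}


def handle_predict(payload: dict) -> dict:
    """Validate a request body and return the prediction as a plain dict."""
    name = payload.get("model")
    model = MODELS.get(name) if isinstance(name, str) else None
    if model is None:
        return {"status": 404, "error": "unknown model"}
    smiles = payload.get("smiles")
    if not isinstance(smiles, str) or not smiles:
        return {"status": 400, "error": "missing 'smiles'"}
    return {"status": 200, "model": name, "prediction": model(smiles)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # malformed JSON or undecodable bytes
            payload = {}
        result = handle_predict(payload if isinstance(payload, dict) else {})
        body = json.dumps(result).encode("utf-8")
        self.send_response(result["status"])
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Start the service; blocks until interrupted."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Keeping the request logic in `handle_predict`, separate from the HTTP handler, lets the same function be checked without starting a server. A central web server could then forward user queries to whichever VM hosts the requested model.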

**Figure 2** shows a schematic representation of the setup that was adopted for the development and production of the eTOX models.

The eTOX development setup has the inconvenience that it cannot guarantee an appropriate level of confidentiality on the query structures. These are sent over the Internet, and the computations are carried out in academic servers, some of which do not comply with the strict security requirements necessary to protect confidential structures. For this reason, the testing of the models was carried out using only non-confidential structures and the user interface shows a disclaimer informing of the security risks.

The final version of the system, as mentioned before, was installed locally at the computational facilities of the EFPIA partners (**Figure 2**). The deployment of the system was facilitated by the use of VMs, which can run in heterogeneous computational environments (i.e., diverse operating systems and hardware configurations). The VMs were relatively compact (between 4 and 5 GB each) and did not have high computational needs (recommended settings were 1 CPU and 2 GB RAM per VM). The whole system can be accommodated in low-end computational clusters or even in an isolated server with multiple CPUs.

In the same way that the VMs provided a layer of standardization for the external access to the models, we needed to develop an ad hoc modeling framework, called eTOXlab (Carrió et al., 2015), which supports modelers in their task of implementing and maintaining the predictive models within the VMs. Essentially, each VM contains an instance of eTOXlab, which can manage multiple models and exposes them as web services using a standard Application Programming Interface (API), as shown in **Figure 3**. All the model inputs and outputs are redirected through the web services. Therefore, as long as the models are correctly implemented within eTOXlab, they are perfectly integrated into the project predictive platform and visible in the common interface shown in **Figure 1**.

#### Model Development and Maintenance

Apart from connecting the individual models to the eTOXsys prediction system, eTOXlab provides additional support for model development, maintenance, and documentation. When models are developed by diverse teams of modelers, it is important to make use of common tools providing consistent solutions for tasks that need to be carried out by the different models. An example of this is structure normalization, since the end user expects that the input structure is internally normalized and processed in the same way by all the models to which it is submitted. The use of a common modeling framework allows employing a common workflow for building all the models and for carrying out predictions with them, where the same software tools are used at each step, thus guaranteeing that the results are consistent. Classically, 2D structures of the molecules are entered by the end user using SMILES or SDFile formats. Before these structures can be processed, they need to be submitted to a normalization protocol that takes care of removing counterions, saturating and ionizing the molecule to a certain pH and, in some cases, generating 3D structures. Ideally, query molecules must be submitted to the same protocol that was applied to the structures of the training series used for developing the models. When the same query molecule is submitted to multiple models at the same time, the protocols must also be consistent. This requirement is easily met by using the eTOXlab modeling framework. Models implemented in eTOXlab make use of a consistent workflow (**Figure 4**), which processes input molecules in sequential order, submitting them to a normalization tool, an ionization tool and a 3D conversion tool. The tools applied and the precise parameters used can be customized for each model and are adequately documented, thus guaranteeing a fully consistent treatment in model training and prediction.
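The sequential workflow described above can be sketched as a small pipeline that applies the same ordered steps, with the same documented parameters, to training and query structures. This is an illustrative sketch, not eTOXlab code: the step functions are placeholders, keeping the longest dot-separated SMILES fragment is only a crude stand-in for counterion removal, and the pH value is a hypothetical parameter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Step = Callable[[str, Dict[str, object]], str]


def strip_counterions(smiles: str, params: Dict[str, object]) -> str:
    """Keep the longest dot-separated fragment (crude stand-in for salt stripping)."""
    return max(smiles.split("."), key=len)


def ionize(smiles: str, params: Dict[str, object]) -> str:
    """Placeholder: a real tool would set protonation states for params['ph']."""
    return smiles


def to_3d(smiles: str, params: Dict[str, object]) -> str:
    """Placeholder: a real tool would generate 3D coordinates here."""
    return smiles


@dataclass
class Workflow:
    steps: List[Tuple[str, Step]]
    params: Dict[str, object] = field(default_factory=dict)

    def run(self, smiles: str) -> str:
        """Apply every step in order to one input structure."""
        for _name, step in self.steps:
            smiles = step(smiles, self.params)
        return smiles

    def describe(self) -> Dict[str, object]:
        """Return the step names and parameters, e.g. for model documentation."""
        return {"steps": [name for name, _ in self.steps], "params": dict(self.params)}


workflow = Workflow(
    steps=[("normalize", strip_counterions), ("ionize", ionize), ("convert_3d", to_3d)],
    params={"ph": 7.4},  # hypothetical setting
)
training = [workflow.run(s) for s in ["CC(=O)[O-].[Na+]", "c1ccccc1"]]
query = workflow.run("CC(=O)[O-].[Na+]")
```

Because training and query structures pass through the same `Workflow` object, a structure seen during training is processed identically at prediction time, and `describe()` gives a record of what was applied.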

The use of eTOXlab also allowed developing specific components for common tasks. An example of this is ADAN (Carrió et al., 2014), a method specifically developed for assessing the applicability domain of the predictive models developed in eTOX, which is able to generate robust reliability scores for the predictions. In summary, the ADAN method is based on assessing how far a query compound is from the model applicability domain and, based on this, providing reliability indexes for the predictions. The reliability is translated into a pseudo 95% Confidence Interval (CI), thus facilitating the appraisal of the prediction obtained. The ADAN methodology can also be applied to non-QSAR models (Capoferri et al., 2015).
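The general notion of measuring how far a query compound lies from the training set can be illustrated with a generic distance-based check. The sketch below is not the ADAN method: it simply flags a query as outside the domain when its distance to the nearest training compound exceeds a quantile of the nearest-neighbour distances observed within the training set. The descriptor vectors, the Euclidean metric, and the 0.95 quantile are assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_distance(query: Vector, others: List[Vector]) -> float:
    """Euclidean distance from the query to its closest neighbour."""
    return min(math.dist(query, other) for other in others)


def domain_threshold(training: List[Vector], quantile: float = 0.95) -> float:
    """Quantile of each training compound's distance to its nearest neighbour."""
    dists = sorted(
        nearest_distance(x, training[:i] + training[i + 1:])
        for i, x in enumerate(training)
    )
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return dists[index]


def in_domain(query: Vector, training: List[Vector], threshold: float) -> bool:
    """True when the query is at least as close as typical training neighbours."""
    return nearest_distance(query, training) <= threshold


# Hypothetical two-descriptor training set.
training_set = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
threshold = domain_threshold(training_set)
inside = in_domain((0.5, 0.5), training_set, threshold)
outside = in_domain((5.0, 5.0), training_set, threshold)
```

A real implementation would combine several such criteria and map them to a reliability score or interval; this sketch only shows the distance idea.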

Another task that can be facilitated by the use of a modeling framework like eTOXlab is the maintenance of the models. Models are not static entities: once developed, they should evolve over time by incorporating new compounds into the training series, updating the software used at the different steps, or refining the modeling workflow. In any case, every improvement produces a new version of the original model (see **Figure 5**).

In production environments, where important decisions can be based on model results, it is important to maintain a well-ordered inventory of all models and versions developed and to use unique identifiers for each of them. As a minimum, the system must allow reproducing the predictions made by any model version.

In eTOX, every model was documented in a central repository, called eTOXvault, where it was assigned a unique public identifier and version number. For models developed within eTOXlab, two circuits of versioning were used. While a model was in development, all the files were stored in a specific development environment (the so-called "sandbox"). Only models that met certain quality criteria were copied into a permanent storage space and assigned an internal, sequential version number. Initially these identifiers and version numbers were internal, as they were not exposed to anybody except the model developer. Once the models were properly documented and verified (as described below), they were assigned an official identifier and version number and were published as a web service visible to all consortium partners.
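The two versioning circuits can be sketched with a small data structure: a mutable sandbox, internal sequential versions created only when quality checks pass, and a separate public identifier and version assigned on publication. This is a hypothetical illustration; the class names, fields, and identifier format do not come from eTOXlab or eTOXvault.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    internal_version: int
    files: Dict[str, str]
    public_id: Optional[str] = None
    public_version: Optional[int] = None


@dataclass
class ModelRecord:
    name: str
    sandbox: Dict[str, str] = field(default_factory=dict)
    versions: List[ModelVersion] = field(default_factory=list)
    published_count: int = 0

    def commit(self, passes_quality_checks: bool) -> Optional[ModelVersion]:
        """Freeze a copy of the sandbox as the next internal version."""
        if not passes_quality_checks:
            return None
        version = ModelVersion(len(self.versions) + 1, dict(self.sandbox))
        self.versions.append(version)
        return version

    def publish(self, internal_version: int, public_id: str) -> ModelVersion:
        """Assign a public identifier and version to a stored internal version."""
        version = self.versions[internal_version - 1]
        if version.public_version is None:
            self.published_count += 1
            version.public_id = public_id
            version.public_version = self.published_count
        return version


record = ModelRecord("hypothetical_endpoint")
record.sandbox["model.bin"] = "draft-1"
rejected = record.commit(passes_quality_checks=False)
v1 = record.commit(passes_quality_checks=True)
record.sandbox["model.bin"] = "draft-2"
v2 = record.commit(passes_quality_checks=True)
released = record.publish(2, "PUBLIC-0001")
```

Copying the sandbox at commit time keeps earlier versions unchanged when development continues, which is what allows predictions from any stored version to be reproduced later.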

#### Model Documentation and Validation

It is widely accepted that models must be documented. However, we learned that different actors have different expectations and very diverse needs regarding model documentation. Most end users require simple documentation that describes, in concise and non-technical language, the precise meaning of the model predictions and how reliable they are. Modelers, on the other hand, need to document the models at a more detailed level so that the models can be reproduced and maintained. Potential future use of the model results for regulatory purposes makes it advisable to follow widely recognized standards, such as the Organization for Economic Co-operation and Development (OECD) guidance document about QSAR modeling (Organisation for Economic Co-operation and Development, 2007), the guidance on the development, evaluation, and application of environmental models published by the US Environmental Protection Agency (EPA) (Environmental Protection Agency, 2009), the requirements of the European REACH regulation (Benfenati et al., 2011), or the recent efforts from the pharmaceutical industry (Myatt et al., 2018). In eTOX, models were documented following the OECD guidelines, but the sections of the document were reorganized in a way that allows summaries to be extracted, as we described in a previous paper (Sanz et al., 2015).

To validate a model means to determine whether the model is "fit-for-purpose." This task is highly dependent on the context of use and cannot be carried out in a general manner for all models. In eTOX, model validation was replaced by a systematic model verification methodology, which guarantees that the model produces the results described in the documentation (Hewitt et al., 2015).

#### Structure Confidentiality

The eTOX project was a collaborative effort involving several major pharmaceutical companies, which contributed data generated and stored in-house for the training of predictive models. Sharing this information posed a major problem, in particular when it involved the structure of confidential compounds. Predictive models should ideally be built using all available structures and biological annotations, irrespective of the partner who contributed them. Unfortunately, the data protection policies of the different industrial partners imposed obvious limitations that were difficult to overcome.

At the beginning of the project we hoped to be able to develop and implement new structure-masking algorithms able to hash the structures into representations usable for building models, but resilient to any effort to reverse-engineer the algorithm and guess the original chemical structure. Our hope was not unfounded, since several similar methods had been published in the past (Tetko et al., 2005; Masek et al., 2008). For this particular purpose, we obtained excellent results using a simple random permutation of the molecular descriptors generated by methods like GRIND or GRIND2 (Pastor et al., 2000; Durán et al., 2008). The permuted vector of descriptors does not allow guessing the original structure, since the permutation destroys any link between the values of the variables and their physicochemical interpretation. Moreover, this approach is resistant to brute-force methods (Faulon et al., 2005; Filimonov and Poroikov, 2005), since these methods require the application of the same algorithm to a comprehensive database of structures, and a key element of the hashing algorithm (the random seed of the permutation) is never shared or revealed. The robustness of the algorithm was carefully tested and further demonstrated by code-breaking challenges at the project consortium level, where the hashed representation resisted every effort to identify the original structure. In these exercises, we also demonstrated that the hashed representation preserved all the information existing in the original molecular descriptors, and that the models derived from it had equivalent quality.
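The permutation idea described above can be illustrated with a minimal sketch. It is not the eTOX implementation: the function names, the descriptor values, and the seed are hypothetical, and Python's general-purpose `random` module stands in for whatever generator was actually used (it is not a cryptographic generator).

```python
import random


def make_permutation(n_descriptors, secret_seed):
    """Return a fixed shuffling of column indices derived from a secret seed."""
    order = list(range(n_descriptors))
    random.Random(secret_seed).shuffle(order)
    return order


def mask_descriptors(vector, permutation):
    """Reorder one descriptor vector according to the shared permutation."""
    return [vector[j] for j in permutation]


# Hypothetical descriptor vectors for two compounds (illustrative values only).
compound_a = [0.12, 3.40, 0.00, 1.75, 2.20]
compound_b = [0.90, 0.05, 4.10, 0.33, 1.00]

# The seed is an assumption for this sketch; in practice it would stay private.
perm = make_permutation(len(compound_a), secret_seed=20180101)
masked_a = mask_descriptors(compound_a, perm)
masked_b = mask_descriptors(compound_b, perm)
```

Because every compound is reordered with the same permutation, column-wise relationships between compounds (and therefore distances and regression models) are unchanged, while the position of a value no longer indicates which descriptor it represents.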

Unfortunately, we found that, regardless of the robustness of our masking algorithm, it was impossible to convince the pharmaceutical companies to implement it in the eTOX project: given the high corporate sensitivity on the issue, such an implementation would have required costly external audits that we could not afford. For this reason, we adopted an alternative approach: if the confidential data cannot be taken out of the companies' internal repositories, the whole model building system can be moved to the companies, so that the models are built there. We took advantage of the fact that the eTOXlab-VM containers are already portable model building engines. Without any modification, they can be used to develop fully functional models behind the companies' firewalls. This approach is even more useful if the models obtained can be shared without compromising the confidentiality of the training series. To make this possible, eTOXlab implements a "safe mode" that builds models in a special way, retaining no information at all about the structures or identities of the training series. When configured in this way, the eTOXlab model consists of a small text file containing the values of the coefficients that must be applied to the molecular descriptors computed for future query compounds to estimate their biological activity. This small file can be exported to other partners without risk, even by insecure means (e.g., e-mail, portable USB device), because it is easy to audit and thereby guarantee that no sensitive information is exported.
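As a rough illustration of why a coefficient-only model file is easy to audit, the sketch below serializes a linear model as plain text and applies it to a query compound. The JSON layout, the function names, and the assumption of a purely linear model with an intercept are ours; they do not describe the actual eTOXlab file format.

```python
import json


def export_model(coefficients, intercept):
    """Serialize only numeric model parameters; no structures or compound IDs."""
    return json.dumps({"intercept": intercept, "coefficients": coefficients})


def predict(model_text, descriptors):
    """Apply the exported coefficients to the descriptors of a query compound."""
    model = json.loads(model_text)
    coefs = model["coefficients"]
    if len(coefs) != len(descriptors):
        raise ValueError("descriptor vector length does not match the model")
    return model["intercept"] + sum(c * x for c, x in zip(coefs, descriptors))


# Hypothetical coefficients and query descriptors (illustrative values only).
model_text = export_model([0.5, -1.0, 2.0], intercept=1.0)
estimate = predict(model_text, [2.0, 1.0, 0.5])
```

The exported text contains a handful of numbers and nothing else, so a reviewer can confirm by inspection that no training structure leaves the company.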

### DISCUSSION

Some of the solutions applied in eTOX for generating a predictive system usable in production environments involve the use of specific software, wrapping the scientific work developed by SMEs and academic partners into a "package" that is easier to deploy and integrate in corporate settings. The use of this kind of software, which is described in this article as a "modeling framework," adds further advantages, like facilitating the consortium-wide adoption of standard modeling components and simplifying key steps of the model life cycle, such as model retraining and maintenance. In eTOX, a new modeling framework was developed ad hoc for the project (eTOXlab). This software has been released as open source under GNU GPL v3.0 (GNU GPL v3, 2007). The source code of eTOXlab is accessible at https://github.com/phi-grib/eTOXlab. A fully configured VM including eTOXlab is also accessible at http://phi.upf.edu/envoy/. Hence, future projects aiming to develop similar predictive systems now have the option of reusing these resources, either as they are or customized to meet specific project needs.

We consider that these resources have value on their own, but they have additional value as a proof-of-concept, since they demonstrate that such frameworks help make software tools developed by academic partners and SMEs usable by pharmaceutical companies. **Table 1** lists some of the key features that, in our opinion, this kind of framework must incorporate.

Another key element required is the definition and consortium-wide adoption of protocols for labeling, documenting and verifying the models. These are important aspects, which must be negotiated with the end-users for providing fit-for-purpose solutions. In this dialog, the expected use of the predictive system must be identified as soon as possible, since a modeling system aiming to prioritize lead compounds has completely different requirements, in terms of documentation and verification, than a system supporting decisions that could be communicated to regulatory agencies. This consideration should not be interpreted as a justification for considering optional the complete model documentation or the quantification of the prediction uncertainty; however, the standards used in either case are different. For this reason, the requirements derived from all intended model uses must be identified with the help of the end users, clearly defined and translated into system specifications.

TABLE 1 | Features required for the building of a predictive system usable in production environments.

One of the most complex aspects in the development of the aforementioned prediction systems is the internal adoption of the models by the end-users. The procedures vary from company to company, although they typically involve the validation of the system by comparing the prediction results with other in silico or experimental methods. As the structures being used in this comparison are often confidential, in the vast majority of cases the results of such validations are not made public. This is understandable but unfortunate, because it results in a lack of feedback about the final usefulness of the predictive system. A published example of this kind of internal validation is the one carried out by Sanofi on the eTOX QT prolongation model using 434 drug candidates (Amberg et al., 2016).

Another aspect, briefly discussed in this article, is the potential use of portable modeling environments for building and sharing predictive models in which confidential structures are used. In eTOX, this was considered the only acceptable option, while solutions attempting to obfuscate, mask or encrypt the structures (or the molecular descriptors) were considered by the partners too risky to be used in practice. eTOXlab was configured for producing shareable models, which can be safely shared and exported because they contain no trace of the original structures. Similar features can also be easily implemented in other modeling frameworks. Here we want to emphasize the conceptual value of the aforementioned strategy consisting in building the models within the companies and exporting only the model coefficients. The implementation of this strategy only requires the use, across the collaborating partners, of a common modeling framework facilitating the import and export of the model coefficients.

Many of the eTOX partners have continued their collaboration and now participate in a new IMI project (eTRANSAFE)<sup>1</sup>, which shares with eTOX the aim of developing predictive systems. The ideas and principles described in this article are being applied, extended, and adapted to meet the objectives of this new project. One part of this effort is the development of a new modeling framework (called Flame), inspired by the same principles as eTOXlab but technologically more advanced. The source code of this software, still in development, is distributed under GNU GPL v3.0 (GNU GPL v3, 2007) and can be accessed at https://github.com/phi-grib/flame.

Finally, a limited version of eTOXsys, including the modeling system described here and a few selected models, has been made open to the scientific community and can be accessed at http://etoxsys.eu/.

1 http://etransafe.eu


### CONCLUSION

Beyond the concrete database, predictive models and integrated computational system that have been developed, the eTOX project has demonstrated that the successful completion of ambitious industry-oriented collaborative projects requires not only the development and implementation of state-of-the-art scientific approaches, but also the careful implementation of adequate technical and organizational solutions. Among them, the adoption of adequate standards and protocols is a key component. The efforts made in eTOX in this respect are being extended to the new IMI eTRANSAFE project<sup>1</sup>, which will jointly exploit preclinical data and clinical safety information for a better prediction of potential human safety liabilities (Sanz et al., 2017).

We hope this paper will help readers save time and effort in similar public-private projects, and will improve the efficiency of the collaboration between the pharmaceutical industry and external parties in the development and application of computational tools supporting the drug discovery and development pipeline.

#### AUTHOR CONTRIBUTIONS

FS was the academic coordinator of the eTOX project. MP is a major contributor to the design of the eTOX predictive system described here, even if the credit belongs to the whole eTOX consortium. MP designed the figures and wrote and assembled this manuscript, which was enriched, refined, and formatted by FS and JQ.

#### FUNDING

The eTOX project (Grant Agreement No. 115002) was developed under the Innovative Medicines Initiative Joint Undertaking (IMI), the resources of which are composed of a financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contributions. The authors of this article are also involved in other related IMI projects, such as iPiE (no. 115735), TransQST (no. 116030) and eTRANSAFE (no. 777365), as well as the H2020 EU-ToxRisk project (no. 681002).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Pastor, Quintana and Sanz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# vNN Web Server for ADMET Predictions

#### Patric Schyman\*, Ruifeng Liu, Valmik Desai and Anders Wallqvist\*

DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, MD, United States

In drug development, early assessments of pharmacokinetic and toxic properties are important stepping stones to avoid costly and unnecessary failures. Considerable progress has recently been made in the development of computer-based (in silico) models to estimate such properties. Nonetheless, such models can be further improved in terms of their ability to make predictions more rapidly, easily, and with greater reliability. To address this issue, we have used our vNN method to develop 15 absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction models. These models quickly assess some of the most important properties of potential drug candidates, including their cytotoxicity, mutagenicity, cardiotoxicity, drug-drug interactions, microsomal stability, and likelihood of causing drug-induced liver injury. Here we summarize the ability of each of these models to predict such properties and discuss their overall performance. All of these ADMET models are publicly available on our website (https://vnnadmet.bhsai.org/), which also offers the capability of using the vNN method to customize and build new models.

Keywords: ADME, toxicology, QSAR, machine learning, applicability domain, online web platform, open access

#### INTRODUCTION

Drug discovery is a risky, lengthy, and resource-intensive process with high attrition rates. In recent years, the development of assays and computer-based (in silico) models to assess absorption, distribution, metabolism, and excretion (ADME) properties has greatly reduced the attrition rate (Waring et al., 2015). The ability to predict these properties quickly and reliably facilitates the exclusion of compounds with potential ADME issues, and thereby helps investigators prioritize which compounds to synthesize and evaluate. However, toxicity remains a hurdle, with an attrition rate of 40% among new compounds identified in the drug discovery phase (Waring et al., 2015). This necessitates careful selection of compounds during drug development to avoid late-stage attrition. As such, there is an urgent need for in silico methods that make fast, easy, and reliable predictions of ADME and toxicity (ADMET) properties, which has resulted in several online tools and web-platforms for ADMET predictions (Walker et al., 2010; Sushko et al., 2011; Cheng et al., 2012; Maunz et al., 2013; Manganaro et al., 2016; Daina et al., 2017).

Here we provide an overview of our versatile variable nearest neighbor (vNN) method (Liu et al., 2012) and the 15 models we constructed using this method to predict the ADMET properties of potential target compounds. The vNN method has several advantages over existing in silico methods. First, it calculates the similarity distance between molecules in terms of their structure, and uses a distance threshold to define a domain of applicability (i.e., all nearest neighbors that meet a minimum similarity threshold constraint).

#### Edited by:

Adriano D. Andricopulo, São Carlos Institute of Physics, University of São Paulo, Brazil

#### Reviewed by:

Fabio Broccatelli, Genentech, United States Tero Aittokallio, Institute for Molecular Medicine Finland, Finland Emilio Benfenati, Istituto Di Ricerche Farmacologiche Mario Negri, Italy

#### \*Correspondence:

Patric Schyman pschyman@bhsai.org Anders Wallqvist sven.a.wallqvist.civ@mail.mil

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 22 September 2017 Accepted: 20 November 2017 Published: 04 December 2017

#### Citation:

Schyman P, Liu R, Desai V and Wallqvist A (2017) vNN Web Server for ADMET Predictions. Front. Pharmacol. 8:889. doi: 10.3389/fphar.2017.00889

This applicability domain, while limiting vNN-based models to making predictions only for molecules that are similar to the reference molecules, ensures that the predictions they generate are reliable. Second, vNN-based models can be built within minutes and require no re-training when new assay information becomes available, an important feature when keeping quantitative structure-activity relationship (QSAR) models up-to-date to maintain their performance levels. Finally, as we show throughout this work, the performance characteristics of our vNN-based models are comparable, and often superior, to those of other more elaborate model constructs.

We have developed a publicly available vNN website (https://vnnadmet.bhsai.org/). This website provides users with the ADMET prediction models that we have developed, as well as a platform for using their own experimental data to update these models or build new ones from scratch. Although we use the vNN method here for predicting ADMET properties, the vNN website can be used to build a variety of classification or regression models.

#### MATERIALS AND METHODS

#### The vNN Method

The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). This method rests on the premise that compounds with similar structures have similar activities. The simplest form of the k-NN method takes the average property values of the k nearest neighbors as the predicted value. However, because structurally similar compounds tend to show similar biological activity, it is reasonable to weight the contributions of neighbors so that closer neighbors contribute more to the predicted value. One notable feature of the k-NN method is that it always gives a prediction for a compound, based on a constant number, k, of nearest neighbors no matter how structurally dissimilar they are from the compound. An alternative approach is to use a predetermined similarity criterion. We developed the aforementioned vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction.

One of the most widely used measures of the similarity distance between two small molecules is the Tanimoto distance, d, which is defined as:

$$d = 1 - \frac{n(P \cap Q)}{n\left(P\right) + n\left(Q\right) - n(P \cap Q)},\tag{1}$$

where n(P ∩ Q) is the number of features common to molecules p and q, and n(P) and n(Q) are the total numbers of features for molecules p and q, respectively. The features used to calculate molecular similarity are often based on atom type (connectivity and chemical properties), such as element, charge, donor, acceptor, and aromatic, but they can also be based on holistic molecular properties, such as molecular weight and partition coefficient (LogP). The predicted biological activity y is then given by a weighted average across structurally similar neighbors:

$$y = \frac{\sum_{i=1}^{v} y_i e^{-\left(\frac{d_i}{h}\right)^2}}{\sum_{i=1}^{v} e^{-\left(\frac{d_i}{h}\right)^2}}, \quad d_i \le d_0 \tag{2}$$

where d<sub>i</sub> denotes the Tanimoto distance between a query molecule for which a prediction is made and a molecule i of the training set; y<sub>i</sub> is the experimentally measured activity of molecule i; h is a smoothing factor, which dampens the distance penalty; d<sub>0</sub> is a Tanimoto-distance threshold, beyond which two molecules are no longer considered to be sufficiently similar to be included in the average; and v denotes the total number of molecules in the training set that satisfy the condition d<sub>i</sub> ≤ d<sub>0</sub>. The values of h and d<sub>0</sub> are determined from cross-validation studies.
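Equations 1 and 2 can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: molecules are represented as plain Python sets of feature identifiers (an assumption; real ECFP4 fingerprints would come from a cheminformatics toolkit), and the feature labels, activities, and parameter values below are hypothetical.

```python
import math


def tanimoto_distance(p, q):
    """Equation 1: one minus the Tanimoto similarity of two feature sets."""
    common = len(p & q)
    union = len(p) + len(q) - common
    if union == 0:
        return 0.0  # assumption: two empty feature sets are treated as identical
    return 1.0 - common / union


def vnn_predict(query, training, d0, h):
    """Equation 2: weighted average over training molecules with d_i <= d0.

    Returns None when no training molecule qualifies (or when every weight
    underflows to zero), i.e. the query is outside the applicability domain.
    """
    numerator = 0.0
    denominator = 0.0
    for features, activity in training:
        d = tanimoto_distance(query, features)
        if d <= d0:
            weight = math.exp(-((d / h) ** 2))
            numerator += activity * weight
            denominator += weight
    return numerator / denominator if denominator > 0.0 else None


# Hypothetical training set: (feature set, measured activity).
training = [
    ({"a", "b", "c"}, 1.0),
    ({"a", "b", "d"}, 0.0),
    ({"x", "y", "z"}, 1.0),
]
query = {"a", "b", "c", "e"}

strict = vnn_predict(query, training, d0=0.5, h=0.3)   # one neighbor qualifies
relaxed = vnn_predict(query, training, d0=0.7, h=0.3)  # two neighbors qualify
outside = vnn_predict({"q"}, training, d0=0.5, h=0.3)  # no neighbor qualifies
```

With the strict threshold only the first training molecule (distance 0.25) contributes, so the prediction equals its activity. Relaxing the threshold admits the second molecule (distance 0.6), but its weight is small, so the prediction stays close to 1. The third call returns `None`, mirroring the method's refusal to predict outside the applicability domain.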

To identify structurally similar compounds, we used Accelrys extended-connectivity fingerprints with a diameter of four chemical bonds (ECFP4) (Rogers and Hahn, 2010). We chose ECFP4 fingerprints for the vNN website because they have previously been reported to show satisfactory overall performance in retrieving the active compounds of diverse datasets (Hert et al., 2004; Duan et al., 2010; Schyman et al., 2016). We emphasize that the optimal values of h and d<sub>0</sub> are specific to each combination of fingerprints and training set, and must be optimized for each one.

#### Model Validation

We used a 10-fold cross-validation (CV) procedure to validate the model and determine the values of h and d<sub>0</sub>. We randomly divided the data into 10 sets, 9 of which we used to develop the model and the 10th to validate it. We repeated this process 10 times, leaving each set of molecules out once. In the next section, we report averages of the 10-fold CV as the performance measures.
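The fold construction described above can be sketched as follows. The helper names, the fixed seed, and the `evaluate` callback are assumptions made for illustration; the source does not describe how the random split was implemented.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Randomly partition item indices into k folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_items, evaluate, k=10, seed=0):
    """Average a per-fold score; `evaluate(train_idx, test_idx)` returns a number."""
    folds = k_fold_indices(n_items, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used to build the model.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)


# Toy evaluation (hypothetical): the fraction of the data held out in each fold.
folds = k_fold_indices(353)
mean_held_out = cross_validate(
    353, lambda train, test: len(test) / (len(train) + len(test))
)
```

Each molecule is held out exactly once, and the reported performance is the mean over the 10 held-out folds.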

#### Performance Measures

We used the following metrics to assess the quality of the classification models:

$$\text{sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{3}$$

$$\text{specificity} = \frac{\text{TN}}{\text{FP} + \text{TN}} \tag{4}$$

$$\text{accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \tag{5}$$

$$\text{kappa} = \frac{\text{accuracy} - \text{Pr} \text{(e)}}{1 - \text{Pr} \text{(e)}} \tag{6}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The metric kappa assesses the quality of binary classifiers (Dunn and Everitt, 1995). Pr(e) is an estimate of the probability of a correct prediction by chance. It is calculated as:

$$\text{Pr(e)} = \frac{(\text{TP} + \text{FN})(\text{TP} + \text{FP}) + (\text{FP} + \text{TN})(\text{TN} + \text{FN})}{(\text{TP} + \text{FN} + \text{FP} + \text{TN})^2} \tag{7}$$

**Abbreviations:** Pgp, permeability glycoprotein; MDR, multidrug resistance.

The sensitivity measures a model's ability to correctly detect true positives, whereas the specificity measures its ability to detect true negatives. Kappa compares the probability of correct predictions to the probability of correct predictions by chance. Its value ranges from +1 (perfect agreement between model prediction and experiment) to −1 (complete disagreement), with 0 indicating no agreement beyond that expected by chance.
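As a worked illustration of Equations 3 to 7, the sketch below computes all five quantities from a confusion matrix. The function name and the counts are hypothetical, and the sketch does not guard against empty classes (which would cause a division by zero).

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute sensitivity, specificity, accuracy, Pr(e), and kappa (Eqs. 3-7)."""
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    accuracy = (tp + tn) / total
    # Probability of agreeing by chance, given the marginal totals.
    pr_e = ((tp + fn) * (tp + fp) + (fp + tn) * (tn + fn)) / total ** 2
    kappa = (accuracy - pr_e) / (1 - pr_e)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "pr_e": pr_e,
        "kappa": kappa,
    }


# Hypothetical confusion matrix for 100 compounds.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

For these counts the accuracy is 0.85 and Pr(e) is 0.50, giving a kappa of 0.70: the model recovers 70% of the agreement that was not already expected by chance.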

The performance measure for regression models is given by the Pearson's correlation coefficient (Adler and Parmryd, 2010):

$$R = \frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right) \left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2} \sqrt{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}}\tag{8}$$

where n is the sample size, x<sub>i</sub> and y<sub>i</sub> are the individual samples, and x̄ and ȳ are the sample means. The correlation coefficient provides a measure of the interrelatedness of numeric properties. Its value ranges from −1 (highly anticorrelated) to +1 (highly correlated), and is 0 when uncorrelated.
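Equation 8 translates directly into code. The sketch below is illustrative only; the function name and the sample values are hypothetical, and a constant series (zero variance) would cause a division by zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


# Hypothetical measured versus predicted values.
measured = [1.0, 2.0, 3.0, 4.0]
r_perfect = pearson_r(measured, [2.0, 4.0, 6.0, 8.0])
r_inverse = pearson_r(measured, [8.0, 6.0, 4.0, 2.0])
```

The first pair is perfectly correlated and the second perfectly anticorrelated, which exercises both ends of the range described above.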

We also calculated the coverage, which we define as the proportion of test molecules with at least one nearest neighbor that meets the similarity criterion. We make no prediction for molecules that have no such neighbor. Coverage is therefore a measure of the size of the applicability domain of a prediction model.
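Coverage as defined above can be computed from the distance between each test molecule and its closest training molecule. This is a minimal sketch under that assumption; the function name and the distances are hypothetical.

```python
def coverage(nearest_distances, d0):
    """Fraction of test molecules whose nearest training neighbor lies within d0."""
    if not nearest_distances:
        raise ValueError("at least one test molecule is required")
    covered = sum(1 for d in nearest_distances if d <= d0)
    return covered / len(nearest_distances)


# Hypothetical nearest-neighbor Tanimoto distances for five test molecules.
fraction = coverage([0.20, 0.55, 0.90, 0.40, 0.75], d0=0.6)
```

Three of the five molecules have a neighbor within the threshold, so the coverage is 0.6. Raising d<sub>0</sub> increases coverage at the cost of admitting less similar neighbors.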

#### RESULTS

#### The vNN Platform

The main purpose of the vNN-based platform is to provide users with a tool to make ADMET predictions and a user-friendly environment to build new models. Hence, the platform offers users two main capabilities that are accessible from the main webpage (https://vnnadmet.bhsai.org/) (**Figure 1**): (1) to run prebuilt ADMET models and (2) to build and run customized models.

To use prebuilt ADMET models, users need only provide one or more query molecules as the input (**Figure 2**). They can do this by drawing the molecule, entering the molecular SMILES string (Weininger, 1988) directly on the website, or uploading a text file (csv or txt format) with query molecules in SMILES format. The text file should contain column headers labeled NAME and SMILES. Once users upload the query molecules, they can submit the job. The application will then automatically run all ADMET prediction models. The output will be displayed once all predictions are completed, and a temporary link to the result page will be sent to the user's e-mail address. The results can be downloaded as a table to the user's computer (**Figure 3**). By default, the user will see the ADMET results for our models, which use a restricted applicability domain. However, there is an option to include the results for the remaining compounds, using our unrestricted applicability domain models. The time required to run 100 query compounds is ∼5 min on the server. However, this may vary depending on the size of the molecules and whether or not the job has been queued.
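A query file with the two required column headers could be prepared as in the sketch below. The comma delimiter and the helper name are assumptions (the text only states that the file is in csv or txt format with NAME and SMILES headers), and the three molecules are arbitrary examples.

```python
import csv
import io


def build_query_file(molecules):
    """Return CSV text with NAME and SMILES columns for (name, smiles) pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["NAME", "SMILES"])
    for name, smiles in molecules:
        writer.writerow([name, smiles])
    return buffer.getvalue()


# Example query molecules (illustrative only).
query_text = build_query_file([
    ("ethanol", "CCO"),
    ("benzene", "c1ccccc1"),
    ("aspirin", "CC(=O)Oc1ccccc1C(=O)O"),
])
```

The resulting text can be saved to a file and uploaded through the web form. Whether the server also accepts other delimiters is not stated in the text and is not assumed here.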

Users can build their own models by either selecting Build Classification Model or Build Regression Model on the main webpage (**Figure 1**). On the Build Classification Model page (**Figure 4**), users are asked to upload a list of molecules in SMILES format and the property of interest, with column headers labeled as NAME, SMILES, and PROPERTY. The value of the property should be set to 1 or 0 for classification models and real numbers for regression models. The vNN platform will then automatically run 10-fold CV by varying the Tanimoto distance (d) from 0.1 to 1.0 in increments of 0.1, and the smoothing factor (h) from 0.1 to 1.0 at each value of d. Once the calculations are completed, a temporary link to the result page will be sent to the user's e-mail address. The results will be displayed on an interactive webpage where users can select the values for d and h (Equation 2), depending on the optimal performance measures and coverage (**Figure 4**). The time required to build a model with a dataset of 1,000 compounds is ∼10 min.
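The parameter scan described above amounts to a grid search over the distance threshold and the smoothing factor. The sketch below is an assumption-laden illustration: the step of 0.1 for h is our guess (the text states the increment only for d), `score` is a hypothetical callback standing in for a full 10-fold CV run, and the toy scoring function exists only to make the example runnable.

```python
def parameter_grid(start=0.1, stop=1.0, step=0.1):
    """Return evenly spaced values from start to stop, rounded to avoid drift."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 10) for i in range(n)]


def grid_search(score):
    """Evaluate `score(d0, h)` on every grid point; return rows sorted best first."""
    results = [
        (score(d0, h), d0, h)
        for d0 in parameter_grid()
        for h in parameter_grid()
    ]
    return sorted(results, reverse=True)


# Toy scoring function (hypothetical) with a single optimum at d0=0.6, h=0.3.
def toy_score(d0, h):
    return -((d0 - 0.6) ** 2 + (h - 0.3) ** 2)


ranked = grid_search(toy_score)
best_score, best_d0, best_h = ranked[0]
```

On the website the final choice is left to the user, who weighs the performance measures against coverage rather than maximizing a single score as this sketch does.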

Users can then select the Run Custom Model option to predict the activity of new test molecules (**Figure 5**). They enter the previously selected values for the Tanimoto Distance and Smoothing Factor, and upload the same molecules as those used to train the model in the Upload Compounds with Property data field. They then need to add the new query molecule(s) in SMILES format in the Upload Query Compounds field. The result will be displayed on a new webpage, and a temporary link to that page will also be sent to the user's e-mail address (**Figure 5**).

Users can download the results from the website into a single file.

#### Available ADMET Predictions

The available ADMET prediction models, including their performance measures for the restricted applicability domain model, are summarized in **Table 1**. The performance measures for the models using an unrestricted applicability domain are presented in Table S1 in the Supplementary Material and on our website (https://vnnadmet.bhsai.org/). The 15 models cover a diverse set of ADMET endpoints. We will briefly describe these models and their performance measures, as well as the sources from which we retrieved the data. All datasets are available in SMILES format on the vNN web server or in Structure Data Format (SDF) in the Supplementary Material (Datasheet 1). Some of the models have already been published (Liu et al., 2012, 2015; Liu and Wallqvist, 2014; Schyman et al., 2016). We also present several new models here for the first time.

#### Blood-Brain Barrier

The blood-brain barrier (BBB) is a highly selective barrier that separates the circulating blood from the central nervous system (CNS) (Abbott et al., 2006). It allows the passage of water molecules and water-soluble lipid molecules, as well as the selective transport of glucose and amino acids. The benefit of predicting BBB-permeable compounds is two-fold: (1) to identify toxicants that could harm the brain, and (2) to design drug molecules that can pass the BBB and reach their target in the CNS.

<sup>a</sup>Number of compounds in the dataset; <sup>b</sup>Tanimoto-distance threshold value; <sup>c</sup>Smoothing factor; <sup>d</sup>Pearson's correlation coefficient; <sup>e</sup>Regression model.

We developed a vNN-based BBB model, using 353 compounds whose BBB permeability values (log BB) were obtained from the literature (Muehlbacher et al., 2011; Naef, 2015). We classified compounds with log BB values of <−0.3 and >+0.3 as BBB non-permeable and permeable, respectively. To calculate performance measures, we classified BBB permeable and BBB non-permeable compounds as positives and negatives, respectively.
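The labeling rule can be written as a small function. The treatment of compounds with log BB between −0.3 and +0.3 is an assumption of this sketch (they are left unlabeled here); the text does not state how such compounds were handled.

```python
def classify_log_bb(log_bb):
    """Label a compound from its log BB value: 1 permeable, 0 non-permeable."""
    if log_bb > 0.3:
        return 1
    if log_bb < -0.3:
        return 0
    return None  # assumption: intermediate values are left unlabeled


# Hypothetical log BB values.
labels = [classify_log_bb(v) for v in (0.5, -1.0, 0.0)]
```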

The model predicted whether or not a given compound would pass the BBB, but only for compounds within the applicability domain defined by the training set. The performance measures in **Table 1** were calculated from 10-fold CV. The model showed a high overall accuracy of 90% and a kappa value of 0.80, with a coverage of 61%. The size of the dataset limited the applicability domain of the model. However, if new data become available, they can easily be added to the model to increase the applicability domain.

The model performed on par with the best of the BBB models published thus far. Most of the latter models, which used small datasets, are global models applied to any molecule. However, all models have a finite applicability domain (Cherkasov et al., 2014). Indeed, modeling BBB permeability is complicated because there are different possible routes across the barrier, via passive diffusion or protein transport, and no single model accounts for all factors associated with this property. Our vNN model only makes predictions for compounds that are structurally similar enough to the training set molecules to ensure that they have the same type of transport mechanism. Thus, our vNN method accounts for multiple transport routes.

#### MMP Disruption (Mitochondrial Toxicity)

Given the fundamental role of mitochondria in cellular energetics and oxidative stress, mitochondrial dysfunction has been implicated in cancer, diabetes, neurodegenerative disorders, and cardiovascular diseases (Pieczenik and Neustadt, 2007). Many pharmaceuticals and environmental toxicants cause mitochondrial dysfunction (Meyer et al., 2013). Therefore, the ability to predict the impact of chemicals on mitochondrial function would be useful. However, predicting mitochondrial toxicants is complicated because mitochondrial dysfunction can result from impairing any of the following: (1) the electron transport chain (ETC), (2) the mitochondrial transport pathway, (3) fatty acid oxidation, (4) the citric acid cycle, (5) mtDNA replication, and (6) mitochondrial protein synthesis.

There are several common experimental techniques to measure mitochondrial function. We used the largest dataset of chemical-induced changes in mitochondrial membrane potential (MMP), based on the assumption that a compound that causes mitochondrial dysfunction is also likely to reduce the MMP. We developed a vNN-based MMP prediction model, using 6,261 compounds collected from a previous study that screened a library of 10,000 compounds (∼8,300 unique chemicals) at 15 concentrations, each in triplicate, to measure changes in the MMP in HepG2 cells (Attene-Ramos et al., 2015). The study found that 913 compounds decreased the MMP, whereas 5,395 compounds had no effect. We classified compounds that decreased the MMP as positives and those that did not affect the MMP as negatives.

Our MMP model predicted whether a given compound had the potential to affect the MMP and thereby cause mitochondrial dysfunction. It made predictions for compounds that were well represented in the applicability domain, but not for any other compound. The model showed a high overall accuracy of 89% and a kappa value of 0.61, with a coverage of 69% (**Table 1**).

#### Cytotoxicity (HepG2)

Cytotoxicity is the degree to which a chemical causes damage to cells. Cytotoxicity assays are widely used to screen compounds for unwanted cell damage, and to identify compounds that could be used, for example, to kill cancer cells. As such, the ability to identify cytotoxic compounds is highly desirable.

We developed a cytotoxicity prediction model, using a training dataset of in vitro toxicity against HepG2 cells for 6,097 structurally diverse compounds, which we collected from the ChEMBL database (Bento et al., 2014). In developing our model, we considered compounds with an IC<sub>50</sub> of 10 µM or less in the in vitro assay as cytotoxic. We classified cytotoxic compounds as positives and non-toxic compounds as negatives.

<sup>a</sup>Values in parentheses are the deep learning results from Xu et al. (2015).

<sup>b</sup>Values averaged over 60 runs of 10-fold CV.

The cytotoxicity model performed well, with an overall accuracy of 84% and a kappa value of 0.64 (**Table 1**). Because compounds in the dataset achieved only sparse coverage of the chemical space, the model only predicted compounds that were well represented in the dataset. It did not give predictions for other compounds, and thereby avoided misleading results. When using 10-fold CV, the model reliably predicted 89% of the compounds in our dataset.

#### Drug-Induced Liver Injury

Over the last 50 years, drug-induced liver injury (DILI) has been the most commonly cited reason for drug withdrawals from the market (Assis and Navarro, 2009). As a result, current drug development efforts are devoted to identifying and eliminating potential DILI compounds. Therefore, a model that predicts at an early stage whether a compound causes liver injury would be highly desirable. However, the mechanisms of DILI are complicated and diverse, making toxicology studies difficult. For example, compounds that cause DILI in humans do not necessarily induce clear liver injury in animal studies.

We collected DILI data from four sources used by Xu et al. (2015): (1) the U.S. FDA's National Center for Toxicological Research (NCTR dataset) (Chen M. et al., 2011), as well as the datasets of (2) Greene (Greene et al., 2010), (3) Xu (Xu et al., 2008), and (4) Liew (Liew et al., 2011). In the first three datasets, which included pharmaceuticals, we classified a compound as causing DILI if it was associated with a high risk of DILI and not if there was no such risk. We excluded low-risk DILI compounds. In the Liew dataset, which contained both pharmaceuticals and non-pharmaceuticals, we classified a compound as causing DILI if it was associated with any adverse liver effect. DILI-associated compounds were classified as positives and non-DILI compounds as negatives.

The performance measures of the vNN model, using 10-fold CV of the entire dataset excluding duplicated compounds, showed an overall accuracy of 71% and a coverage of 66% (**Table 1**). We also used the same datasets and compared our models with some previously published deep learning models (Xu et al., 2015; **Table 2**). Considering the complexity and computational time investment involved in training these deep learning models, our vNN models performed relatively well; they performed on par with the deep learning models, albeit with a coverage ranging from 40 to 65%.

#### Cytochrome P450 Inhibition (Drug-Drug Interaction)

Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., 2008). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose, an effect known as a drug-drug interaction (Murray, 2006). In drug development, in vitro assays are routinely used to assess interactions between drug candidates and CYPs. However, there is a need for in silico models that assess potential interactions with CYPs in the early stages of drug development.

We collected data for five main drug-metabolizing CYPs: 1A2, 2D6, 2C9, 2C19, and 3A4. We retrieved CYP inhibitors from ChEMBL (Bento et al., 2014) and classified them as inhibitors if the IC<sub>50</sub> was below 10 µM. We removed from the dataset any duplicates or compounds tested multiple times with contradicting results, in which the reported IC<sub>50</sub> values were below and above the 10 µM threshold value. For all CYPs, we classified inhibitors and non-inhibitors as positives and negatives, respectively.
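The labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `label_inhibitors`, the record layout, and the example compounds are hypothetical and are not taken from the original curation workflow.

```python
from collections import defaultdict

THRESHOLD_UM = 10.0  # IC50 cutoff stated in the text, in micromolar


def label_inhibitors(records):
    """Map each compound to 1 (inhibitor) or 0 (non-inhibitor).

    `records` is an iterable of (compound_id, ic50_um) pairs (hypothetical
    layout). Compounds whose repeated measurements fall on both sides of
    the cutoff are treated as contradictory and dropped.
    """
    calls = defaultdict(set)
    for compound_id, ic50_um in records:
        calls[compound_id].add(ic50_um < THRESHOLD_UM)

    labels = {}
    for compound_id, seen in calls.items():
        if len(seen) == 1:  # all measurements agree on the class
            labels[compound_id] = int(next(iter(seen)))
    return labels


# Hypothetical example records (compound id, IC50 in micromolar).
records = [
    ("cpd-A", 0.8), ("cpd-A", 2.5),   # consistently below the cutoff
    ("cpd-B", 45.0),                  # above the cutoff
    ("cpd-C", 4.0), ("cpd-C", 30.0),  # contradictory, removed
]
labels = label_inhibitors(records)
```

Collapsing repeated measurements into a set of class calls handles both duplicates and contradictions in a single pass.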

The performance measures for the five CYP models are presented in **Table 1**. All models achieved high accuracy (87–91%) and kappa values (0.54–0.68) while maintaining high coverage (75–78%).

#### hERG Blockers

The human ether-à-go-go-related gene (hERG) codes for a potassium ion channel involved in normal cardiac repolarization (Sanguinetti and Tristani-Firouzi, 2006). Drug-induced blockade of hERG function can cause long QT syndrome, which may result in arrhythmia and death (De Ponti et al., 2001). For this reason, hERG liability is one of the toxicology screens that drug candidates must pass during early pre-clinical studies. Therefore, in silico models that identify hERG blockers in the early stages of drug design are of considerable interest.

We retrieved 282 known hERG blockers from the literature and classified compounds with an IC<sub>50</sub> cutoff value of 10 µM or less as blockers (Wang et al., 2012). We also collected a set of 404 compounds with IC<sub>50</sub> values >10 µM from ChEMBL (Bento et al., 2014) and classified them as non-blockers (Czodrowski, 2013). We classified hERG blockers and non-blockers as positives and negatives, respectively.

The hERG model performed with an overall accuracy of 84%, well-balanced sensitivity and specificity values (84 and 83%, respectively), and a kappa value of 0.68 (**Table 1**). The model reliably predicted 80% of the compounds in our dataset when using 10-fold CV. However, the coverage of chemical space by the non-hERG blockers in the dataset was sparse, and only compounds well represented in the dataset were predicted with confidence. Because the model did not give predictions for other compounds, it avoided misleading results. Therefore, users should use this model to flag potential hERG blockers rather than to identify non-hERG blockers.
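For readers who want to reproduce such performance measures on their own data, the sketch below computes accuracy, sensitivity, specificity, Cohen's kappa, and coverage from confusion-matrix counts. It assumes that coverage is the fraction of compounds for which the model returns a prediction. The function name and the counts in the example are hypothetical and do not correspond to any model in **Table 1**.

```python
def classification_metrics(tp, fn, tn, fp, n_total):
    """Summarize a binary classifier that may abstain on some compounds.

    tp, fn, tn, fp: confusion-matrix counts over the predicted compounds.
    n_total: all compounds submitted, including those without a prediction.
    """
    n_pred = tp + fn + tn + fp
    accuracy = (tp + tn) / n_pred
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Cohen's kappa: agreement beyond what is expected by chance.
    p_pos = ((tp + fn) / n_pred) * ((tp + fp) / n_pred)
    p_neg = ((tn + fp) / n_pred) * ((tn + fn) / n_pred)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1.0 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
        "coverage": n_pred / n_total,  # assumed definition, see lead-in
    }


# Hypothetical counts: 100 of 125 compounds fall inside the domain.
metrics = classification_metrics(tp=40, fn=10, tn=45, fp=5, n_total=125)
```

Reporting coverage next to accuracy matters here, because a model that abstains outside its applicability domain is evaluated only on the compounds it chooses to predict.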

#### Pgp Substrates and Inhibitors

P-glycoprotein (Pgp) is an essential cell membrane protein that extracts many foreign substances from the cell (Ambudkar et al., 2003). As such, it is a critical determinant of the pharmacokinetic properties of drugs. Cancer cells often overexpress Pgp, which increases the efflux of chemotherapeutic agents from the cell and prevents treatment by reducing the effective intracellular concentrations of such agents—a phenomenon known as multidrug resistance (Borst and Elferink, 2002). For this reason, identifying compounds that can either be transported out of the cell by Pgp (substrates) or impair Pgp function (inhibitors) is of great interest. Therefore, using the vNN method, we developed models to predict both Pgp substrates and Pgp inhibitors.

The Pgp substrate dataset was collected by Hou and coworkers (Li et al., 2014). This dataset included measurements for 422 substrates and 400 non-substrates. To generate a large Pgp inhibitor dataset, we combined two datasets (Broccatelli et al., 2011; Chen L. et al., 2011), and removed duplicates to form a combined dataset consisting of a training set of 1,319 inhibitors and 937 non-inhibitors. We classified the Pgp inhibitors (substrates) and non-inhibitors (non-substrates) as positives and negatives, respectively.

The vNN models for identifying Pgp substrates and inhibitors gave accurate and reliable results, showing overall accuracies of 79 and 85%, respectively, when using 10-fold CV, with corresponding kappa values of 0.58 and 0.66. These models reliably predicted 65 and 76% of the compounds in their datasets to be Pgp substrates and inhibitors, respectively. The performance characteristics of these models were comparable, or at times superior, to those of other model constructs (Schyman et al., 2016).

#### Chemical Mutagenicity (AMES Test)

Mutagens are chemicals that cause abnormal genetic mutations leading to cancer. A common way to assess a chemical's mutagenicity is the Ames test (Ames et al., 1973). This test has become the standard for assessing the safety of chemicals and drugs, and has been used to test thousands of molecules. We examined whether the vNN method could effectively use existing data to predict mutagenicity.

We retrieved an Ames mutagenicity dataset consisting of 6,512 compounds, of which 3,503 were Ames-positive (Hansen et al., 2009), and developed a vNN Ames mutagenicity prediction model. The model performed well, with an overall accuracy of 82%; sensitivity and specificity values of 86 and 75%, respectively; and a high kappa value of 0.62 (**Table 1**). The model also reliably predicted 79% of the compounds in the Ames dataset when using 10-fold CV. Further details of the model and its prediction performance can be found elsewhere (Liu and Wallqvist, 2014).

#### Maximum Recommended Therapeutic Dose

A basic principle of toxicology is that "the dose makes the poison." For most drugs, the therapeutic dose is limited by toxicity, and the maximum recommended therapeutic dose (MRTD) is an estimated upper daily dose that is safe (Contrera et al., 2004). Investigators carry out toxicological experiments on animals to determine the toxic effects of a drug and the initial dose for human clinical trials. Unfortunately, there is a lack of correlation between animal and human toxicity data. Therefore, we investigated whether the vNN method could predict the MRTD values of new compounds based on known human MRTD data. If so, the values could be used to estimate the starting dose in phase I clinical trials, while significantly reducing the number of animals used in preliminary toxicology studies.

We obtained a dataset of MRTD values publicly disclosed by the FDA, mostly of single-day oral doses for an average adult with a body weight of 60 kg, for 1,220 compounds (most of which are small organic drugs). For modeling purposes we converted the MRTD unit from mg/kg-body weight/day to mol/kg-body weight/day via the molecular weight of the compound. However, the predicted values on the website are reported in mg/day based upon an average adult weighing 60 kg. We excluded organometallics, high-molecular weight polymers (>5,000 Da), nonorganic chemicals, mixtures of chemicals, and very small molecules (<100 Da). We used an external test set of 160 compounds, which was collected by the FDA for validation. The total dataset for our model contained 1,184 compounds (Liu et al., 2012).
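The unit conversions mentioned above amount to simple arithmetic, sketched here in Python. The function names and the example compound (molecular weight 300 g/mol, dose 5 mg/kg/day) are hypothetical; only the 60 kg reference body weight comes from the text.

```python
BODY_WEIGHT_KG = 60.0  # average adult body weight used in the text


def mg_per_kg_to_mol_per_kg(dose_mg_per_kg_day, mol_weight_g_per_mol):
    """Convert mg/kg-body weight/day to mol/kg-body weight/day."""
    grams = dose_mg_per_kg_day * 1e-3  # mg -> g
    return grams / mol_weight_g_per_mol


def mol_per_kg_to_mg_per_day(dose_mol_per_kg_day, mol_weight_g_per_mol,
                             body_weight_kg=BODY_WEIGHT_KG):
    """Convert mol/kg-body weight/day to a total daily dose in mg/day."""
    mg_per_kg = dose_mol_per_kg_day * mol_weight_g_per_mol * 1e3  # g -> mg
    return mg_per_kg * body_weight_kg


# Hypothetical compound: 300 g/mol, dosed at 5 mg/kg/day.
molar_dose = mg_per_kg_to_mol_per_kg(5.0, 300.0)
daily_dose_mg = mol_per_kg_to_mg_per_day(molar_dose, 300.0)
```

For this hypothetical compound, the round trip gives 5 mg/kg/day × 60 kg = 300 mg/day.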

The MRTD model reliably predicted 69% of the FDA MRTD dataset, with a Pearson's correlation coefficient (R) of 0.79 between the predicted and measured log(MRTD) values, and a mean deviation (mDev) of 0.56 log units, using 40-fold CV (Liu et al., 2012). For comparison, we used two popular QSAR regression methods—the partial least square (PLS) and support vector machine (SVM) methods—to develop two global models to fit the training dataset. We evaluated the model performance, using 40-fold CV of the training set. The best PLS model achieved an R-value of 0.50 and an mDev of 0.79. The results for the SVM model were at best comparable to those of the best PLS model, with an R-value of 0.53 and an mDev of 0.63. For further details of the model, we refer the reader to our previous paper (Liu et al., 2012).
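The two regression measures quoted above can be computed as in the following sketch. It assumes that mDev denotes the mean absolute difference between predicted and measured log(MRTD) values; the precise definition is given in Liu et al. (2012). The function names and data values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_deviation(predicted, measured):
    """Mean absolute deviation in log units (assumed definition of mDev)."""
    diffs = [abs(p - m) for p, m in zip(predicted, measured)]
    return sum(diffs) / len(diffs)


# Hypothetical log(MRTD) values, not taken from the FDA dataset.
measured = [-5.0, -4.0, -3.0, -2.0]
predicted = [-4.5, -4.0, -3.5, -2.0]

r_value = pearson_r(predicted, measured)
m_dev = mean_deviation(predicted, measured)
```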

#### Human Liver Microsomal Stability

The human liver is the most important organ for drug metabolism. For a drug to achieve effective therapeutic concentrations in the body, it cannot be metabolized too rapidly by the liver. Otherwise, it would need to be administered at high doses, which are associated with high toxicity. To identify and exclude rapidly metabolized compounds (Di et al., 2003), pharmaceutical companies commonly use the human liver microsomal (HLM) stability assay. This has led to the accumulation of a substantial body of HLM stability data in publicly accessible databases.

However, our knowledge of how enzymes in the HLM assay metabolize drugs remains fragmentary. Therefore, we examined whether the vNN method could effectively predict drugs that are rapidly metabolized by the liver. We retrieved HLM data from the ChEMBL database (Bento et al., 2014), manually curated the data, and classified compounds as stable or unstable based on the reported half-life [T<sub>1/2</sub> > 30 min was considered stable, and T<sub>1/2</sub> < 30 min unstable (Liu et al., 2015)]. The final dataset contained 3,219 compounds. Of these, we classified 2,047 as stable and 1,166 as unstable.

TABLE 3 | Tox21 assays with PubChem assay identification number.

The HLM model performed with an overall accuracy of 81%; sensitivity and specificity values of 71 and 87%, respectively; and a high kappa value of 0.60 (**Table 1**). The HLM model reliably predicted 91% of the compounds in the HLM dataset when using 10-fold CV. We refer the reader to our original paper for further details of the model and its prediction performance (Liu et al., 2015).

#### Implementation Aspects

The vNN-ADMET web-application is hosted on an Apache Tomcat Web server that is accessible via a secure service over Hypertext Transfer Protocol Secure (https). We developed the application on the basis of a three-tiered architecture, composed of a backend database, controller, and presentation tiers. The first tier consists of a PostgreSQL 9.5.7 database that stores user account information, uploaded files, constructed models, and model predictions. The second (controller) tier provides access to the prediction engine and implements the functionality required to create and manage multiple predictions. We implemented this tier, using Pipeline Pilot protocols hosted on a local Pipeline Pilot server. The third (presentation) tier provides for visualization of the results, with plotting capabilities for multiple predictions. The controller and presentation tiers were developed using Java Platform, Enterprise Edition 7, Spring Framework 4.2.2, JavaServer Faces 2.2, PrimeFaces 6.0, and BootsFaces 1.0.2. The graphical user interface in the presentation tier uses Web standards supported by modern Web browsers, including Microsoft Edge 38, Chrome version 58, and Firefox version 53, without any need for plugins.

To use the system, the user must register for an account at https://vnnadmet.bhsai.org/. Once logged in, the user can build custom models, and run pre-built ADMET and custom models. The data corresponding to a user (login credentials, compounds, models, results, etc.) are not shared with any other user within or outside the system. The uploaded compounds, constructed models, and model predictions are purged from the system every 2 weeks.

#### DISCUSSION

We have presented a web-based vNN prediction platform, with which a user can build and test models as well as predict the ADMET properties of a compound by using our existing tools.

All vNN models performed well with accuracies of >71% (see **Table 1** for further details). On average, the models predicted 75% of the compounds in their datasets, using 10-fold CV.

Achieving fair comparisons between a new model and a competing model is always difficult because such comparisons require the same training data, validation data, and performance measures. An important advantage of our platform is that it offers an opportunity for developers to compare their methods with our vNN method, using their training and validation data.

For demonstrative purposes, we quantitatively compared our vNN method with the winning method of the Tox21 challenge (Huang et al., 2016). This challenge was issued in 2014 by the U.S. Toxicology in the 21st Century (Tox21) program, which aims to improve toxicity prediction methods. The Tox21 consortium solicited models that could best predict the toxicity of 10,000 compounds it had tested in 12 different assays (**Table 3**). It used a concealed final evaluation dataset to determine the winners.

**Table 4** shows the area under the curve for the receiver operating characteristic (AUC-ROC) of the 18 leading research teams with their best-performing model for each of the 12 assays. To compare our models with those in **Table 4**, we set d to 1.0 so that we could predict all compounds. The vNN method performed reasonably well in predicting most of the Tox21 assays. We note that the grand challenge winner used data from PubChem (Wang et al., 2009) and ChEMBL (Bento et al., 2014), in addition to the Tox21 data, which makes it impossible for us to directly compare our results with their results.

The MMP data we used for our mitochondrial dysfunction model were the same as those used in the Tox21 challenge (Attene-Ramos et al., 2015; Huang et al., 2016). Our MMP model was the seventh best performing model, with an AUC-ROC value of 0.882 (with h = 0.3 and d = 1.0). This was comparable to the values of more elaborate and computationally time-consuming methods, such as deep learning (**Table 4**).
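As a reminder of what the AUC-ROC values measure, the sketch below computes the AUC as the probability that a randomly chosen positive compound receives a higher score than a randomly chosen negative one, with ties counted as one half. This is a generic rank-based formulation, not the evaluation code used in the Tox21 challenge, and the labels and scores are hypothetical.

```python
def auc_roc(labels, scores):
    """Rank-based AUC-ROC for binary labels (1 = active, 0 = inactive)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for six compounds.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc = auc_roc(labels, scores)
```

The pairwise loop is quadratic in the number of compounds. That is adequate for illustration, whereas a sort-based implementation would be preferable for a dataset of 10,000 compounds.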

Some QSAR methods do not use an applicability domain to determine whether their predictions are reliable. This could lead to the misperception that a model can predict the activity of any molecule. The applicability domain is vital to the vNN method. The user of our platform can adjust it by varying the Tanimoto distance threshold value. Although this could be set to 1 so that the model predicts the activity of any molecule, no model is likely to have an unlimited applicability domain (Liu et al., 2015).
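To make the role of the distance threshold concrete, the following sketch implements a distance-weighted nearest-neighbor prediction with an applicability-domain cutoff. It assumes a Gaussian weighting of the Tanimoto distance with smoothing factor `h` and threshold `d0`; the exact formulation is given in Liu et al. (2012). Fingerprints are represented as plain Python sets of hypothetical feature identifiers, the default `d0 = 0.6` is an arbitrary illustrative value, and none of the names below belong to the vNN-ADMET platform.

```python
import math


def tanimoto_distance(fp_a, fp_b):
    """Tanimoto distance between two fingerprints given as feature sets."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union


def vnn_predict(query_fp, reference, d0=0.6, h=0.3):
    """Weighted average over all reference compounds within distance d0.

    `reference` is a list of (fingerprint, activity) pairs. Returns None
    when no neighbor qualifies, i.e., the query lies outside the
    applicability domain and no prediction is made.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    n_neighbors = 0
    for fp, activity in reference:
        d = tanimoto_distance(query_fp, fp)
        if d <= d0:
            w = math.exp(-((d / h) ** 2))  # assumed Gaussian weighting
            weighted_sum += w * activity
            weight_total += w
            n_neighbors += 1
    if n_neighbors == 0:
        return None
    return weighted_sum / weight_total


# Hypothetical reference set: (feature set, activity label).
reference = [({1, 2, 3, 4}, 1.0), ({1, 2, 3, 5}, 0.0), ({7, 8, 9}, 1.0)]

inside = vnn_predict({1, 2, 3, 4}, reference)           # near neighbors exist
outside = vnn_predict({20, 21}, reference)              # no neighbor within d0
forced = vnn_predict({20, 21}, reference, d0=1.0)       # threshold disabled
```

With `d0 = 1.0` every reference compound qualifies, so the sketch always returns a value, but for a query with no similar neighbors that value is just an unweighted average of unrelated compounds. Extending the model requires only appending new pairs to `reference`.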

TABLE 4 | AUC-ROCs of vNN models and the best 18 models on the final evaluation test of the Tox21 Challenge.

The vNN parameters were set to h = 0.3 and d<sub>0</sub> = 1.0. Gray cells indicate models showing performance inferior to the vNN models.

A more reasonable approach to improve a vNN-based model is to increase the applicability domain by adding more reference compounds. A good test of the power of a model to generate prospective predictions is time-split validation, which divides the data into "old" and "new" data and uses the former to train the model and the latter "new" data for validation (Sheridan, 2013; Liu et al., 2015). We have previously shown in a time-split validation that, whereas the accuracy of a vNN model is roughly maintained, the number of "new" compounds that it can predict is significantly reduced. However, by simply adding a few "new" compounds, the coverage increases significantly (Liu et al., 2015).
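The time-split idea and its effect on coverage can be illustrated with a small sketch. The record layout (a `year` field and a set-based fingerprint `fp`), the cutoff year, and the distance threshold are all hypothetical choices made for this example only.

```python
def tanimoto_distance(fp_a, fp_b):
    """Tanimoto distance between two fingerprints given as feature sets."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union


def time_split(records, cutoff_year):
    """Split records into 'old' (before the cutoff) and 'new' (from it on)."""
    old = [r for r in records if r["year"] < cutoff_year]
    new = [r for r in records if r["year"] >= cutoff_year]
    return old, new


def coverage(queries, reference, d0):
    """Fraction of queries with at least one reference neighbor within d0."""
    covered = sum(
        1 for q in queries
        if any(tanimoto_distance(q["fp"], r["fp"]) <= d0 for r in reference)
    )
    return covered / len(queries)


# Hypothetical compounds with a (fictional) year of first report.
records = [
    {"id": "a", "year": 2008, "fp": {1, 2, 3}},
    {"id": "b", "year": 2009, "fp": {1, 2, 4}},
    {"id": "c", "year": 2012, "fp": {1, 2, 3, 4}},
    {"id": "d", "year": 2013, "fp": {10, 11, 12}},
    {"id": "e", "year": 2013, "fp": {10, 11, 13}},
]

old, new = time_split(records, cutoff_year=2011)
coverage_before = coverage(new, old, d0=0.5)

# Move one "new" compound into the reference set and re-evaluate the rest.
extended = old + [records[3]]
remaining = [r for r in new if r["id"] != "d"]
coverage_after = coverage(remaining, extended, d0=0.5)
```

In this toy example, only one of the three "new" compounds has an "old" neighbor within the threshold. After compound `d` is added to the reference set, its close analog `e` also becomes predictable.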

The lack of training data poses an important limitation to the vNN approach. When a dataset is too small, there is a high probability that a target molecule will have no qualified near neighbors in the dataset, and hence a high-quality prediction cannot be made. However, the lack of training data is a limitation for all machine learning methods. The difference is that most such methods build a model no matter how small the training dataset, and will always make a prediction for any input molecule without considering the reliability of the predicted result. In our view, it is better not to give a prediction at all if it is unreliable. This also alerts users to use alternative methods, including experimental measurements, to derive a reliable answer. As more experimental data become available over time, the performance of the vNN method will improve without retraining. This is in contrast to most other machine learning methods, which cannot take advantage of new data without retraining a model.

This finding is especially significant for drug discovery labs because the chemical space is restricted by the target candidates they are investigating. For example, when exploring a new drug target, it is crucial to continuously update the model with new data to ensure that the applicability domain is relevant for the new target. In a vNN-based model, this can be done easily by adding the SMILES strings of the new compounds to the reference dataset. For this reason, we believe that our web-based vNN platform has the potential to greatly accelerate the development of drugs.

### AUTHOR CONTRIBUTIONS

PS, RL, and AW developed the method, analyzed the data, and wrote the manuscript. VD designed and implemented the web server.

### FUNDING

The authors were supported by the U.S. Army Medical Research and Materiel Command (Fort Detrick, MD), and the Defense Threat Reduction Agency grant CBCall14-CBS-05-2-0007.

### ACKNOWLEDGMENTS

The opinions and assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the U.S. Army or of the U.S. Department of Defense. This paper has been approved for public release with unlimited distribution.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2017.00889/full#supplementary-material

## REFERENCES


using eight fingerprint methods. J. Mol. Graph. Model. 29, 157–170. doi: 10.1016/j.jmgm.2010.05.008


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Schyman, Liu, Desai and Wallqvist. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Predicting Off-Target Binding Profiles With Confidence Using Conformal Prediction

Samuel Lampa<sup>1</sup> , Jonathan Alvarsson<sup>1</sup> , Staffan Arvidsson Mc Shane<sup>1</sup> , Arvid Berg<sup>1</sup> , Ernst Ahlberg<sup>2</sup> and Ola Spjuth<sup>1</sup> \*

<sup>1</sup> Pharmaceutical Bioinformatics Group, Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden, <sup>2</sup> Predictive Compound ADME and Safety, Drug Safety and Metabolism, AstraZeneca IMED Biotech Unit, Mölndal, Sweden

Ligand-based models can be used in drug discovery to obtain an early indication of potential off-target interactions that could be linked to adverse effects. Another application is to combine such models into a panel, making it possible to compare and search for compounds with similar profiles. Most contemporary methods and implementations, however, lack valid measures of confidence in their predictions, and only provide point predictions. We here describe a methodology that uses Conformal Prediction for predicting off-target interactions, with models trained on data from 31 targets in the ExCAPE-DB dataset selected for their utility in broad early hazard assessment. Chemicals were represented by the signature molecular descriptor and support vector machines were used as the underlying machine learning method. By using conformal prediction, the results from predictions come in the form of confidence p-values for each class. The full pre-processing and model training process is openly available as scientific workflows on GitHub, rendering it fully reproducible. We illustrate the usefulness of the developed methodology on a set of compounds extracted from DrugBank. The resulting models are published online and are available via a graphical web interface and an OpenAPI interface for programmatic access.

Edited by:

Leonardo L. G. Ferreira, Universidade de São Paulo, Brazil

#### Reviewed by:

Philip Day, University of Manchester, United Kingdom Alan Talevi, National University of La Plata, Argentina

#### \*Correspondence:

Ola Spjuth ola.spjuth@farmbio.uu.se

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 02 July 2018 Accepted: 15 October 2018 Published: 06 November 2018

#### Citation:

Lampa S, Alvarsson J, Arvidsson Mc Shane S, Berg A, Ahlberg E and Spjuth O (2018) Predicting Off-Target Binding Profiles With Confidence Using Conformal Prediction. Front. Pharmacol. 9:1256. doi: 10.3389/fphar.2018.01256

Keywords: target profiles, predictive modeling, conformal prediction, machine learning, off-target, adverse effects, workflow

## 1. INTRODUCTION

Drug-target interactions are central to the drug discovery process (Yildirim et al., 2007), and are the subject of study in the field of chemogenomics (Bredel and Jacoby, 2004), which has emerged and grown over the last few decades. Drugs commonly interact with multiple targets (Hopkins, 2008), and off-target pharmacology as well as polypharmacology have important implications for drug efficacy and safety (Peters, 2013; Ravikumar and Aittokallio, 2018). Organizations involved in drug discovery, such as pharmaceutical companies and academic institutions, use many types of experimental techniques and assays to determine target interactions, including in vitro pharmacological profiling (Bowes et al., 2012). However, an attractive complementary method is to use computational (in silico) profiling of binding profiles for ligands (Cereto-Massagué et al., 2015), which also opens the possibility of making predictions for hypothetical compounds. A common approach to the target prediction problem is to use a panel of quantitative structure-activity relationship (QSAR) models, with one model per target (Hansch, 1969), where chemicals in a knowledge base with known interaction values (numerical or categorical) are described numerically by descriptors, and a statistical learning model is trained to predict numerical values (regression) or categorical values (classification) for new compounds. The recent increase in the number of available SAR data points in interaction databases such as ChEMBL (Gaulton et al., 2017) and PubChem (Wang et al., 2017) makes it feasible to use ligand-based models to predict not only targets but also panels of targets.

Several methods and tools are available for target prediction and for constructing and using target profiles. Bender et al. use a Bayesian approach to train models for 70 selected targets and use these for target profiling to classify adverse drug reactions (Bender et al., 2007). Chembench is a web-based portal that, founded in 2008, is one of the first publicly available integrated cheminformatics web portals. It integrates a number of commercial as well as open source tools for dataset creation, validation, modeling and validation. It also supports building ensembles of models for multiple targets (Walker et al., 2010; Capuzzi et al., 2017). The Online Chemical Modeling Environment (OCHEM) is a web-based platform intended to serve as a multi-tool platform where users can select among the many available alternatives in terms of tools and methods for all of the steps of creating a predictive model, such as data search, selection of descriptors and machine learning model, as well as assessment of the resulting model. OCHEM also encourages tool authors to contribute their own tools to be integrated in the platform (Sushko et al., 2011). Yu et al. use Random Forest (RF) and Support Vector Machines (SVM) to predict drug-target interactions from heterogeneous biological data (Yu et al., 2012). TargetHunter (Wang et al., 2013) is another online tool that uses chemical similarity to predict targets for ligands, and its authors show how training models on ChEMBL data can enable useful predictions on examples taken from PubChem bioassays. Yao et al. describe TargetNet (Yao et al., 2016), an online service for multi-target QSAR models that uses Naïve Bayes. The polypharmacology browser (Awale and Reymond, 2017) is a web-based target prediction tool that queries ChEMBL bioactivity data using multiple fingerprints.

We observe three important shortcomings among previous works. First, available methods for ligand-based target profiling often do not offer valid measures of confidence in predictions, leaving the user uncertain about the usefulness of predictions. Second, the majority of the web tools lack an open and standardized API, meaning that it is not straightforward (and in most cases not possible at all) to consume the services programmatically, e.g., from a script or a scientific workflow tool such as KNIME (Mazanetz et al., 2012). Third, previous works do not publish the pre-processing and modeling workflows in reproducible formats, which makes it hard to update the models as data change and limits the portability of methods. In fact, most implementations are only accessible from a website without the underlying implementations being openly available for inspection, which limits both the reproducibility (Stodden et al., 2016) and verifiability (Hinsen, 2018) of their implementation.

We here present an approach for ligand-based target profiling using a confidence framework, delivering target profiles with confidence scores for the predictions of whether a query compound interacts with each target. The confidence scores were calculated using the Conformal Prediction methodology (CP) (Vovk et al., 2005), which has been successfully demonstrated in several recent studies (Norinder et al., 2014, 2016; Cortés-Ciriano et al., 2015; Forreryd et al., 2018). For readers new to the CP methodology, we recommend Gammerman and Vovk (2007) for a good and gentle general overview, and Norinder et al. (2014) for a good introduction to CP for cheminformatics. The goal of this study was to create an automated and reproducible approach for generating a predicted target profile based on QSAR binding models, with the models making up the profile published online as microservices and the profile accessible from a web page. Although the models give a confidence measure, we also set out to evaluate them on a test set to see how well they performed on representative data. We exemplified the process by creating a profile for the targets for broad early hazard assessment as suggested by Bowes et al. (2012).
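For readers unfamiliar with how conformal prediction arrives at a p-value per class, the following sketch shows the core calculation of a class-conditional (Mondrian) inductive conformal predictor. It is a generic textbook illustration rather than the implementation used in this study: the nonconformity scores are hypothetical numbers, and how they are derived from an underlying model is deliberately left out.

```python
def conformal_p_value(calibration_scores, query_score):
    """p-value of a query under one class hypothesis.

    calibration_scores: nonconformity scores of calibration examples that
    belong to the class in question. query_score: nonconformity of the
    query compound when it is assumed to belong to that class.
    """
    at_least_as_strange = sum(1 for a in calibration_scores if a >= query_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def prediction_set(p_values, significance):
    """Labels that cannot be rejected at the chosen significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Hypothetical calibration scores for the active (A) and non-active (N) class.
calibration = {
    "A": [0.1, 0.2, 0.4, 0.7],
    "N": [0.05, 0.3, 0.5, 0.9],
}
# Hypothetical nonconformity of one query compound under each hypothesis.
query = {"A": 0.15, "N": 0.95}

p_values = {
    label: conformal_p_value(calibration[label], query[label])
    for label in calibration
}
confident = prediction_set(p_values, significance=0.25)
cautious = prediction_set(p_values, significance=0.1)
```

Lowering the significance level (that is, demanding higher confidence) enlarges the prediction set: in this example the query receives the single label A at a significance level of 0.25, but both labels at 0.1.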

### 2. METHODS

### 2.1. Training Data

We based this study upon data from the ExCAPE-DB dataset (Sun et al., 2017b). The reason for this is that ExCAPE-DB combines data about ligand-target binding from ChEMBL with similar data from PubChem, where importantly, PubChem contains many true non-actives, which has been shown earlier to result in better models than by using random compounds as non-actives (Mervin et al., 2015). The data in ExCAPE-DB has also gone through extensive filtering and pre-processing, specifically to make it more useful as a starting point for QSAR studies. For more details on the data filtering and processing done in the ExCAPE-DB dataset, we refer to Sun et al. (2017b).

**Abbreviations:** A, Active; ACP, Aggregated Conformal Predictor; CAOF, Class-Averaged Observed Fuzziness; CP, Conformal Prediction; JAR, Java Archive (A file format); MC, M Criterion (Fraction of multi-label predictions); N, Non-active; OF, Observed Fuzziness; QSAR, Quantitative Structure-Activity Relationship; RF, Random Forest; SMILES, Simplified molecular-input line-entry system (A text-based representation of chemical structures); SVM, Support Vector Machines.

A scientific workflow was constructed to automate the full data pre-processing pipeline. The first step comprises extracting data on binding association between ligands and targets from the ExCAPE-DB dataset (Sun et al., 2017b), more specifically the columns Gene symbol, Original entry ID (PubChem CID or CHEMBL ID), SMILES and Activity flag. This was performed early in the workflow to make subsequent data transformation steps less time-consuming, given the relatively large size of the uncompressed ExCAPE-DB data file (18 GB). From the extracted dataset, all rows for which there existed a row with a conflicting activity value for the same target (gene symbol) and SMILES string were completely removed. Also, all duplicates in terms of the extracted information (Original entry ID, SMILES, and Activity flag) were replaced by a single entry, and thus deduplicated. Note that deduplication on the InChI level was already done for the ExCAPE-DB dataset in Sun et al. (2017b). However, the signatures descriptor is based on SMILES, which is a less specific chemical format than InChI (certain compounds that are unique in InChI might not be unique in SMILES), and this turned out to leave some duplicate and conflicting rows in terms of SMILES in the dataset. Since this is a potential problem, in particular if the exact same SMILES ends up in both the training and the calibration or test set, we performed this additional deduplication on the SMILES level<sup>1</sup>. For full information about the pre-processing done by the ExCAPE-DB authors, see Sun et al. (2017b). As a help to the reader, we note that the activity flag in the ExCAPE-DB dataset is set to active (or "A") if the dose-response value in the binding assays was lower than 10 µM, and to non-active (or "N") otherwise.
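The conflict removal and SMILES-level deduplication described above can be sketched in a few lines of Python. This is an illustration only and not the authors' Go/SciPipe implementation (footnote 1 points to that); the function name `clean_rows` and the tuple layout `(gene, entry_id, smiles, flag)` are assumptions made for the example.

```python
from collections import defaultdict


def clean_rows(rows):
    """Drop conflicting rows, then keep one row per (gene, SMILES) pair.

    rows: iterable of (gene, entry_id, smiles, flag) tuples, flag is "A" or "N".
    """
    rows = list(rows)  # the data is traversed twice

    # Collect every activity flag seen for each (gene, SMILES) pair.
    flags_by_key = defaultdict(set)
    for gene, _entry_id, smiles, flag in rows:
        flags_by_key[(gene, smiles)].add(flag)

    cleaned, seen = [], set()
    for gene, entry_id, smiles, flag in rows:
        key = (gene, smiles)
        if len(flags_by_key[key]) > 1:
            continue  # both "A" and "N" reported: remove all such rows
        if key in seen:
            continue  # duplicate on the SMILES level: keep the first only
        seen.add(key)
        cleaned.append((gene, entry_id, smiles, flag))
    return cleaned
```

Keying on the gene symbol together with the SMILES string means that the same compound can still appear once for each target, which is what per-target model training requires.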

A subset of the panel of 44 binding targets suggested in Bowes et al. (2012) was selected for inclusion in the study. The selection was based on the criterion that targets should have at least 100 active and at least 100 non-active compounds. In addition, some targets for which data was not found in ExCAPE-DB were excluded. This is described in detail below. Some of the gene symbols used in Bowes et al. (2012) were not found in their exact form in the ExCAPE-DB dataset. To resolve this, PubMed was consulted to find synonymous gene symbols, with the following replacements being made: KCNE1 was replaced with MINK1, which is present in ExCAPE-DB. CHRNA1 (coding for the α1 sub-unit of the Acetylcholine receptor) was excluded, as it is not present in the dataset (CHRNA4, coding for the α4 sub-unit of the Acetylcholine receptor, is present in the dataset). We note, though, that both MINK1 and CHRNA4 were removed in the filtering step mentioned above, since the dataset did not contain at least 100 active and 100 non-active compounds for either MINK1 or CHRNA4. However, since one aim of the study is to present and publish an automated and reproducible data processing workflow, these targets could potentially be included in subsequent runs on later versions of the database with additional data available.

The resulting dataset (named Dataset1) consists of 31 targets (marked as "included" in **Table 1**). For 21 of these targets, the dataset contained fewer than 10,000 non-active compounds, which makes them stand out from the other target datasets, and some of them contain a problematically low number of non-actives. These 21 targets are referred to as Dataset2, and their respective target datasets were expanded with randomly selected examples from the ExCAPE-DB dataset which were not reported to be active for the target, thus being "assumed non-active." These target datasets are marked with an X in the "Assumed non-actives added" column of **Table 1**. The number of new examples was chosen such that, for each target, the total number of non-actives and assumed non-actives added up to twice the number of actives. The compounds for the remaining 10 targets, which were not extended with assumed non-actives, were named Dataset3.
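As a rough illustration of the 1:2 rule, the number of assumed non-actives to add for a target is twice the number of actives minus the number of known non-actives. The sketch below is not taken from the published workflow; the function name `assumed_nonactives`, its arguments, and the fixed seed are assumptions chosen for the example.

```python
import random


def assumed_nonactives(actives, nonactives, pool, seed=0):
    """Sample compounds from `pool` so that known plus assumed
    non-actives add up to twice the number of actives."""
    n_needed = max(0, 2 * len(actives) - len(nonactives))

    # Only compounds with no reported activity label for this target qualify.
    excluded = set(actives) | set(nonactives)
    candidates = sorted(c for c in set(pool) if c not in excluded)

    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    return rng.sample(candidates, min(n_needed, len(candidates)))
```

With 5 actives and 3 known non-actives, for example, 7 compounds would be drawn from the pool, giving 10 non-actives in total.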

In order to validate the predictive ability of the trained models, a new dataset was created (Dataset4) by withholding 1,000 compounds from the ExCAPE-DB dataset, to form an external validation dataset. The compounds chosen to be withheld were the following: (i) all small molecules in DrugBank (version 5.0.11) with status "withdrawn," for which we could find either a PubChem ID or a CHEMBL ID, (ii) a randomly selected subset of the remaining compounds in DrugBank 5.0.11, with status "approved," for which we could also find PubChem or CHEMBL IDs, until a total number of 1,000 compounds was reached. No regard was paid to other drug statuses in DrugBank such as "investigational."

The relations between Dataset1-4 are shown in a graphical overview in **Figure 1**, and in **Table 2**, which summarizes in words how each dataset was created.

The Conformal Prediction methodology, in particular with the Mondrian approach, can handle differing sizes of the datasets well (Norinder and Boyer, 2017), and so we see no reason to use exactly as many non-actives as actives. Instead we use an active:non-active ratio of 1:2 between the classes. The justification for this is that the assumed non-actives likely have chemistry coming from a larger chemical space compared to the known compounds. By adding more of the assumed non-actives, we can thus hopefully increase the number of examples in the regions of chemical space that are of interest for separating the two classes.

All the targets, with details about their respective number of active and non-active compounds, and whether they are included or not, are summarized in **Table 1**.

### 2.2. Conformal Prediction

Conformal Prediction (CP) (Vovk et al., 2005) provides a layer on top of existing machine learning methods and produces valid prediction regions for test objects. This contrasts with standard machine learning, which delivers point estimates. In CP a prediction region contains the true value with probability equal to 1 − ε, where ε is the selected significance level. Such a prediction region can be obtained under the assumption that the observed data is exchangeable. An important consequence is that the size of this region directly relates to the strangeness of the test example, and is an alternative to the concept of a model's applicability domain (Norinder et al., 2014). For the classification case, a prediction is given as a set of conformal p-values<sup>2</sup>, one for each class, which represent a ranking for the test object. The p-values together with the user-selected ε produce the final prediction set. The conformal predictors used here are Mondrian, meaning that they handle the classes independently, which has previously been shown to work very well for imbalanced datasets and to remove the need for under/oversampling, boosting or similar techniques (Norinder and Boyer, 2017; Sun et al., 2017a).
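To make the step from p-values to a prediction set concrete, the following minimal sketch includes every class whose p-value exceeds the significance level. The function names and the "Null"/"Both" naming of the empty and two-label sets are chosen here for illustration; this is not CPSign code.

```python
def prediction_set(p_values, significance):
    """Return the set of labels whose conformal p-value exceeds the
    significance level (significance = 1 - confidence)."""
    return {label for label, p in p_values.items() if p > significance}


def describe(labels):
    """Name a binary prediction set: "Null", a single label, or "Both"."""
    if not labels:
        return "Null"
    if len(labels) > 1:
        return "Both"
    return next(iter(labels))
```

For p-values of 0.15 (active) and 0.12 (non-active), the set is empty at significance 0.2 (confidence 0.8) but contains both labels at significance 0.1 (confidence 0.9): raising the confidence level can only enlarge the set.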

<sup>1</sup>https://github.com/pharmbio/ptp-project/blob/c529cf/exp/20180426-wodrugbank/wo\_drugbank\_wf.go#L239-L246

<sup>2</sup>The term "p-values" in Conformal Prediction does not have the same definition as in statistical hypothesis testing.

#### TABLE 1 | The panel of targets used in this study, identified by gene symbol.


Actives and non-actives refer to the number of ligand interactions marked as active and non-active in ExCAPE-DB. The labels "included" and "not included" to the left, for the two row ranges, indicate whether targets passed the filtering criteria of at least 100 actives and 100 non-actives, to be included.


See also Figure 1 for a graphical overview of how each dataset was created.

Conformal Prediction, as originally invented, was described for the online transductive setting, meaning that the underlying learning model had to be retrained for every new test object. Later it was adapted for the off-line inductive setting too, where the underlying model is trained only once for a batch of training examples. The Inductive Conformal Predictor (ICP), which is used in this study, requires far fewer computational resources, but has the disadvantage that a part of the training set must be set aside as a calibration set. The remaining data, called the proper training set, is used to train the learning model. As the partitioning of data into a calibration set and a proper training set can have a large influence on the performance of the predictor, it is common to redo this split multiple times and train an ICP for each such split. This results in a so-called Aggregated Conformal Predictor (ACP) that aggregates the predictions of the individual ICPs.
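The inductive and aggregated procedures can be sketched as follows, assuming nonconformity scores where a larger value means a stranger example. In the Mondrian setting, the p-value for a class is computed against calibration examples of that class only. The text does not state how the ACP combines the p-values of its ICPs, so the median used below is an assumption, and all function names are illustrative rather than part of CPSign.

```python
from statistics import median


def mondrian_p_value(test_score, calibration_scores):
    """p-value of a test object for one class.

    calibration_scores: nonconformity scores of the calibration examples
    that belong to that class (larger score = less conforming).
    """
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def icp_p_values(test_scores, calibration):
    """One ICP: test_scores maps each label to the test object's score under
    that label; calibration maps each label to its calibration scores."""
    return {
        label: mondrian_p_value(test_scores[label], calibration[label])
        for label in calibration
    }


def acp_p_values(per_icp_p_values):
    """Aggregate p-values from several ICPs (median: an assumption)."""
    labels = per_icp_p_values[0].keys()
    return {label: median(p[label] for p in per_icp_p_values) for label in labels}
```

Because each class is calibrated only against its own examples, a small class is not swamped by a large one, which is why the Mondrian variant suits imbalanced data.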

In this study we used the Mondrian ACP implementation in the software CPSign (Arvidsson, 2016), leveraging the LIBLINEAR SVM implementation (Fan et al., 2008) together with the signatures molecular descriptor (Faulon et al., 2003). This descriptor is based on the neighboring of atoms in a molecule and has been shown to work well for QSAR studies (Alvarsson et al., 2016; Lapins et al., 2018) and for ligand-based target prediction (Alvarsson et al., 2014). Signatures were generated with height 1-3, which means that molecular subgraphs including all atoms at distance 1, 2, or 3 from the initial atoms are generated. Support vector machines are a machine learning method commonly used in QSAR studies (Norinder, 2003; Zhou et al., 2011) together with molecular signatures and similar molecular descriptors, e.g., the extended connectivity fingerprints (Rogers and Hahn, 2010). As nonconformity measure we used the distance between the classifier's decision surface and the test object, as previously described by Eklund et al. (2015). In order to not use the assumed non-active compounds in Dataset2 in the calibration set of the ICPs, these additional compounds were treated separately, by providing them to the CPSign software with the --proper-train parameter (see the CPSign documentation; Arvidsson, 2016). With this parameter, the additional compounds are only added to the proper training set, thus being used for training the underlying SVM model, but not for the calibration of the predictions. This ensures that potentially non-typical chemistry in the additional assumed non-active compounds does not affect the calibration of the predictions in a negative way.

#### 2.3. Hyper-Parameter Tuning

For each of the 31 targets in Dataset1, a parameter sweep was run to find the optimal value of the cost parameter of LIBLINEAR, optimizing modeling efficiency using 10-fold cross validation. The training approach used an Aggregated Conformal Predictor (ACP) with 10 aggregated models. The parameter sweep evaluated three values of the cost parameter for each target: 1, 10, and 100. The efficiency measure used for the evaluation was the observed fuzziness (OF) score described in Vovk et al. (2016) as:

$$OF = \frac{1}{m} \sum_{i=1}^{m} \sum_{y \neq y_i} p_i^{y},\tag{1}$$

where p<sub>i</sub><sup>y</sup> is the p-value of the ith test case for class y, y<sub>i</sub> is the true class of the ith test case, and m is the number of test examples, or in our case with only two classes:

$$OF = \frac{\sum_{i,\,y_i=A} p_i^N + \sum_{i,\,y_i=N} p_i^A}{m_A + m_N} \tag{2}$$

where p<sub>i</sub><sup>N</sup> is the ith p-value for class N, p<sub>i</sub><sup>A</sup> is the ith p-value for class A, and m<sub>A</sub> and m<sub>N</sub> are the numbers of test examples in class A and N, respectively. OF is basically an average of the p-values for the wrong class, i.e., lower fuzziness means better prediction.

OF is influenced more by the larger class when datasets are imbalanced. To study the effect of imbalanced datasets on efficiency, we therefore also implemented a modified version of OF, referred to as class-averaged observed fuzziness (CAOF):

$$\text{CAOF} = \frac{\sum_{i,\,y_i=A} p_i^N}{m_A} + \frac{\sum_{i,\,y_i=N} p_i^A}{m_N} \tag{3}$$

with the same variable conventions as above. Whereas OF is a single average over all p-values in the test set, CAOF averages the contribution from each class separately. For very imbalanced cases, OF is therefore mostly affected by the larger class, while for CAOF both classes contribute equally, regardless of their respective number of p-values. CAOF was not used for cost selection, but is provided for information in the results from the workflow.
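The difference between the two measures is easiest to see on a small imbalanced example. The sketch below follows Equations (1) and (3) directly; the function names and the representation of p-values as one dictionary per test example are assumptions made for the illustration.

```python
def observed_fuzziness(true_labels, p_values):
    """OF: sum of p-values for the wrong classes, averaged over all
    test examples (Equation 1)."""
    total = sum(
        p
        for y_true, ps in zip(true_labels, p_values)
        for label, p in ps.items()
        if label != y_true
    )
    return total / len(true_labels)


def class_averaged_observed_fuzziness(true_labels, p_values):
    """CAOF: wrong-class p-values averaged within each true class,
    then summed over the classes (Equation 3)."""
    sums, counts = {}, {}
    for y_true, ps in zip(true_labels, p_values):
        wrong = sum(p for label, p in ps.items() if label != y_true)
        sums[y_true] = sums.get(y_true, 0.0) + wrong
        counts[y_true] = counts.get(y_true, 0) + 1
    return sum(sums[y] / counts[y] for y in sums)
```

With one active whose wrong-class p-value is 0.5 and four non-actives whose wrong-class p-values are 0.1 each, OF is (0.5 + 0.4) / 5 = 0.18, while CAOF is 0.5 / 1 + 0.4 / 4 = 0.6, so the poorly predicted minority class weighs far more in CAOF. Note that CAOF as defined in Equation (3) is a sum of two class averages, so it ranges from 0 to 2 in the two-class case.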

A commonly used efficiency measure in CP is the size of the prediction region or set given by the predictor. In the classification setting, this is expressed as the fraction of multi-label predictions. This measure is denoted the M criterion (MC) and is described in Vovk et al. (2016):

$$\text{M criterion} = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}_{\{|\Gamma_i| > 1\}} \tag{4}$$

where **1**<sub>E</sub> denotes the indicator function of event E, returning the value 1 if E occurs and 0 otherwise, and Γ<sub>i</sub> denotes the prediction set for test example i. A smaller value is preferable.
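Equation (4) amounts to counting the prediction sets that contain more than one label. A minimal sketch, with the function name `m_criterion` chosen for illustration:

```python
def m_criterion(prediction_sets):
    """Fraction of prediction sets that contain more than one label."""
    multi = sum(1 for labels in prediction_sets if len(labels) > 1)
    return multi / len(prediction_sets)
```

For the four prediction sets {A}, {A, N}, the empty set and {N}, only one is a multi-label prediction, so the M criterion is 0.25.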

#### 2.4. Modeling Workflow

Before the training, the CPSign precompute command was run, in order to generate a sparse representation of each target's dataset. ACPs consisting of 10 models were then trained for each target using the CPSign train command. The cost value used was the one obtained from the hyper-parameter tuning. The observations added as "assumed non-actives" were not included in the calibration set to avoid biasing the evaluation. The computational workflows for orchestrating the extraction of data, model building, and the collection of results for summarizing and plotting were implemented in the Go programming language using the SciPipe workflow library that is available as open source software at scipipe.org (Lampa et al., 2018b). The cost values for each target are stored in the workflow code, available on GitHub (PTP, 2018). A graphical overview of the modeling workflow is shown in **Figure 2**. More detailed workflow graphs are available in **Supplementary Data Sheet 1**, **Figures S4**, **S5**.

#### 2.5. Model Validation

The models built were validated by predicting the binding activity against each of the 31 targets for all compounds for which there existed known binding data for a particular target in ExCAPE-DB. The validation was done with CPSign's validate command, predicting values at confidence levels 0.8 and 0.9.

#### 3. RESULTS

#### 3.1. Published Models

Models for all targets in Dataset1 were produced in the form of portable Java Archive (JAR) files, which were also built into similarly portable Docker containers, for easy publication as microservices. The model JAR files, together with audit log files produced by SciPipe, containing execution traces of the workflow (all the shell commands and parameters) used to produce them, are available for download at Lampa et al. (2018a). Running the models requires a copy of the CPSign software and a license from Genetta Soft AB.

#### 3.2. Validity of Models

To check that the Conformal Prediction models are valid (i.e., that they predict with an error rate in accordance with the selected significance level), calibration plots were generated in the cross validation step of the workflow. Three example plots, for three representative targets (the smallest, the median-sized and the largest, in terms of compounds in ExCAPE-DB) can be seen in **Figure 3**, while calibration plots for all targets can be found in the **Supplementary Data Sheet 1** (**Figure S1**). From these calibration plots we conclude that all models produce valid results over all significance levels.
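The points of a calibration plot can be computed as the observed error rate at each significance level, where an error means that the true label is missing from the prediction set. The sketch below is an illustration under that definition and is not the CPSign cross-validation code; the function names and data layout are assumptions.

```python
def error_rate(true_labels, p_values, significance):
    """Fraction of test examples whose true label is excluded from the
    prediction set, i.e., whose true-label p-value is <= significance."""
    errors = sum(
        1 for y_true, ps in zip(true_labels, p_values) if ps[y_true] <= significance
    )
    return errors / len(true_labels)


def calibration_curve(true_labels, p_values, levels):
    """(significance, observed error rate) pairs for a calibration plot."""
    return [(eps, error_rate(true_labels, p_values, eps)) for eps in levels]
```

A valid predictor gives points on or below the diagonal, up to statistical fluctuation. For a Mondrian predictor, the same computation can be applied to each class separately.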

#### 3.3. Efficiency of Models

The efficiency metrics OF, CAOF and MC for Dataset2 (without adding assumed non-actives) are shown in **Figure 4A**. In **Figure 4B**, the same metrics are shown after all target datasets in Dataset2 have been extended with assumed non-actives, to compensate for these datasets' relatively low number of non-actives. We observe that adding assumed non-actives to datasets with few non-actives improves the efficiency of models trained on these datasets. Thus, this strategy of extending the "small" target datasets in Dataset2 was chosen for the subsequent analysis workflows.

#### 3.4. External Validation

In **Figure 5**, predicted vs. observed labels for Dataset4 are shown for confidence levels 0.8 and 0.9, respectively. See the methods section, and in particular **Figures 1**, **2**, for information about how Dataset4 was created. "A" denotes active compounds and "N" denotes non-active ones. The number of "Both" predictions increases when the confidence level increases from 0.8 to 0.9. This is as expected, as fewer compounds can be assigned a single label at the higher confidence level. The number of "Null" predictions decreases at the higher confidence level, which is also as expected. This behavior might seem backwards, but at a higher confidence the predictor has to include less probable predictions (in the Conformal Prediction ranking sense) in the prediction region in order to reach the specified confidence level, which leads to larger prediction sets. For predicted vs. observed labels for each target individually, see **Supplementary Data Sheet 1**, **Figures S2**, **S3**. Because CP produces sets of predicted labels, including Null and Both in this case, the common sensitivity and specificity measures do not have clear definitions in this context. We have therefore not included calculated values for them, but have instead included compound counts for the predicted label sets, summarized for all targets in **Figure 5** and for each target specifically as CSV files in **Supplementary Data Sheet 2** (for 0.8 confidence) and **3** (for 0.9 confidence).

### 3.5. Target Profile-as-a-Service

All models based on Dataset2 were published as microservices with REST APIs, made publicly available using the OpenAPI specification (Ope, 2018a) on an OpenShift (Ope, 2018b) cluster. A web page aggregating all the models was also created. The OpenAPI specification is a standard for how REST APIs are described, meaning that there is a common way of looking up how to use the REST API of a web service, which greatly simplifies the process of tying multiple different web services together. It simplifies calling the services from scripts as well as from other web pages, such as the web page (**Figure 6**) that generates a profile image out of the multiple QSAR models. At the top of the web page (see **Figure 6**) is an instance of the JSME editor (Bienfait and Ertl, 2013) in which the user can draw a molecule. As the user draws the molecule, the web page extracts the SMILES from the editor and sends it to the individual model services to get predictions based on all available models. The user can set a threshold for the confidence and get visual feedback on whether the models predict the drawn molecule as active or non-active for each of the targets, at the chosen confidence level. On the right side in **Figure 6** is a graphical profile in the form of a bar plot, where the confidence of the active label is drawn in the upward direction and the confidence for non-active is drawn in the downward direction. Hovering over a bar in the plot gives information about which model the bar corresponds to. The web page can be accessed at http://ptp.service.pharmb.io/.

FIGURE 5 | Predicted vs. observed labels, for all targets, for the prediction data, at confidence level 0.8 (A) and 0.9 (B). "A" denotes active compounds, and "N" denotes non-active compounds. The x-axis shows observed labels (as found in ExCAPE-DB), while the y-axis shows the set of predicted labels. The areas of the circles are proportional to the number of SAR data points for each observed label/predicted label combination. For predicted vs. observed labels for each target individually, see Supplementary Data Sheet 1, Figures S2, S3.

the prediction for ADBR2. Red color indicates the centers of molecular fragments (of height 1–3) that contributed most to the larger class, while blue color indicates the centers of fragments contributing most to the smaller class. In this case the larger class is "Active," which can be seen in the size of the p-values in the bottom left of the figure (p[A] = 0.481 > p[N] = 0.001).

#### 3.6. Example Predictions

Using the models built without the external validation dataset (Dataset4), target profiles were predicted for three molecules from the test set (**Figure 7**), i.e., the profiles were made for drugs that the models have not seen before. **Figure 7A** shows the target profile for Tacrine, a centrally acting anticholinesterase, with a distinct peak for the ACHE gene, as expected. Further, we note that most other targets are predicted as non-active with high p-values (green color) or predicted as active with relatively low p-values (purple color). **Figure 7B** shows the target profile for Pilocarpine, a muscarinic acetylcholine receptor M<sub>1</sub> agonist, with a target profile consisting of mostly non-active predictions, and only two mildly active targets (CHRM1 and LCK). We note that LCK has a similar p-value for active and non-active. For a conformal prediction in the binary classification setting, the confidence of a prediction is defined as 1 − p<sub>2</sub>, where p<sub>2</sub> is the lower p-value of the two (Saunders et al., 1999). This means that even if a prediction has one high p-value, its confidence, and hence its usefulness in a decision setting, might still be low. **Figure 7C** shows the target profile for Pergolide, an agonist for DRD1, DRD2, HTR1A, and HTR2A, which show up as the four highest active predictions in the profile.
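The confidence definition above translates directly into code: sort the p-values and subtract the second largest from one. The function name is chosen for illustration, and the second pair of p-values in the usage note is made up to mimic a case with similar p-values for both labels.

```python
def confidence(p_values):
    """Confidence of a conformal prediction: one minus the second-largest
    p-value (the lower of the two in the binary case)."""
    ranked = sorted(p_values.values(), reverse=True)
    return 1.0 - ranked[1]
```

With p-values of 0.481 for active and 0.001 for non-active, the confidence is 0.999. With hypothetical p-values of 0.30 and 0.28, it drops to 0.72, even though the larger p-value is not much lower than in the first case.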

## 4. DISCUSSION

We have presented a reproducible workflow for building profiles of predictive models for target binding. We have exemplified our approach on data from ExCAPE-DB for 31 targets associated with adverse effects, and made these models available via a graphical web interface, via an OpenAPI interface for programmatic access, and for download. The Conformal Prediction methodology guarantees validity of the models under the exchangeability assumption. We have further shown that our models are indeed valid, with the calibration plots in **Figure 3**.

Based on the efficiency metrics shown in **Figures 4B,C** we see that the efficiency, after adding assumed non-actives to the datasets with very few (under 10,000) non-actives, is clearly improved. Based on the external test set, Dataset4, though, especially based on the plots in **Figure 5**, we see that there is a somewhat higher fraction of observed non-actives ("N") correctly predicted as non-actives, than the fraction of observed actives ("A") correctly predicted as active.

The use of workflows to automate pre-processing and model training and make it completely reproducible has several implications. Primarily, the entire process can be repeated as data change, e.g., when new data is made available or data is curated. In our case, the pre-processing can be re-run when a new version of ExCAPE-DB is released, and new models trained on up-to-date data can be deployed and published without delay. The components of the pre-processing workflow are, however, general, and can be re-used in other settings as well. Further, a user can select the specific targets that will be pre-processed, and focus the analysis on smaller subsets without having to pre-process and train models on all targets, which could be resource-demanding. With a modular workflow it is also easy to replace specific components, for example to evaluate different strategies and modeling methods.

The packaging of models as JAR files and Docker containers makes them portable and easy to transfer and deploy on different systems, including servers or laptops on public and private networks, without cumbersome dependency management. We chose to deploy our services inside the RedHat OpenShift container orchestration system, which has the benefit of providing a resilient and scalable service, but any readily available infrastructure provider is sufficient. The use of OpenAPI for deploying an interoperable service API means that the service is simple to integrate and consume in many different ways, including being called from a web page (such as our reference page at http://ptp.service.pharmb.io/), but also from third-party applications and workflow systems. With the flexibility to consume models on an individual level comes the power to put together custom profiles (panels) of targets. In this work we have selected targets based on usefulness in a drug safety setting, but it is easy to envision other types of panels for other purposes. While there has been some previous research on the use of predicted target profiles (Yao et al., 2016; Awale and Reymond, 2017), further research is needed to maximize their usefulness and to integrate them with other types of in vitro and in silico measures. Our methodology and implementation facilitate such large-scale and integrative studies, and pave the way for target predictions that can be integrated in different stages of the drug discovery process.

#### 5. CONCLUSION

We developed a methodology and implementation of target prediction profiles, with fully automated and reproducible data pre-processing and model training workflows to build them. Models are packaged as portable Java Archive (JAR) files, and as Docker containers that can be deployed on any system. We trained models on data for 31 targets related to drug safety from the ExCAPE-DB dataset, and published these as a predictive profile, using Conformal Prediction to deliver prediction sets for each target. The example profile is deployed as an online service with an interoperable API.

#### REFERENCES

Alvarsson, J., Eklund, M., Engkvist, O., Spjuth, O., Carlsson, L., Wikberg, J. E., et al. (2014). Ligand-based target prediction with signature fingerprints. J. Chem. Inform. Model. 54, 2647–2653. doi: 10.1021/ci500361u

### DATA AVAILABILITY


### AUTHOR CONTRIBUTIONS

OS conceived the study. OS, JA, SA, and SL designed the study, interpreted results, and wrote the manuscript. SL implemented the workflow and carried out the analysis. SA extended CPSign with new features. JA, SA, and AB contributed with model deployment and APIs. EA contributed with expertise in target profiles and modeling. All authors read and approved the manuscript.

#### FUNDING

This study was supported by OpenRiskNet (Grant Agreement 731075), a project funded by the European Commission under the Horizon 2020 Programme.

#### ACKNOWLEDGMENTS

The computations were performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under Project SNIC 2017/7-89.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.01256/full#supplementary-material


**Conflict of Interest Statement:** OS, JA, AB, and SA are involved in Genetta Soft AB, a Swedish based company developing the CPSign software.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Lampa, Alvarsson, Arvidsson Mc Shane, Berg, Ahlberg and Spjuth. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Extending in Silico Protein Target Prediction Models to Include Functional Effects

#### Lewis H. Mervin<sup>1</sup> , Avid M. Afzal<sup>1</sup> , Lars Brive<sup>2</sup> , Ola Engkvist<sup>3</sup> and Andreas Bender<sup>1</sup> \*

<sup>1</sup> Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom, <sup>2</sup> Cygnal Bioscience, Pixbo, Sweden, <sup>3</sup> Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden

#### Edited by:

Leonardo G. Ferreira, Universidade de São Paulo, Brazil

#### Reviewed by:

Kira Vyatkina, Saint Petersburg Academic University (RAS), Russia Joanna Ida Sulkowska, University of Warsaw, Poland

> \*Correspondence: Andreas Bender ab454@cam.ac.uk

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 06 April 2018 Accepted: 22 May 2018 Published: 11 June 2018

#### Citation:

Mervin LH, Afzal AM, Brive L, Engkvist O and Bender A (2018) Extending in Silico Protein Target Prediction Models to Include Functional Effects. Front. Pharmacol. 9:613. doi: 10.3389/fphar.2018.00613

In silico protein target deconvolution is frequently used for mechanism-of-action investigations; however, existing protocols usually do not predict compound functional effects, such as activation or inhibition, upon binding to their protein counterparts. This study is hence concerned with including functional effects in target prediction. To this end, we assimilated a bioactivity training set for 332 targets, comprising 817,239 active data points with unknown functional effect (binding data) and 20,761,260 inactive compounds, along with 226,045 activating and 1,032,439 inhibiting data points from functional screens. Chemical space analysis of the data first showed some separation between compound sets (binding and inhibiting compounds were more similar to each other than both binding and activating or activating and inhibiting compounds), providing a rationale for implementing functional prediction models. We employed three different architectures to predict functional response, ranging from simplistic random forest models ('Arch1') to cascaded models which use separate binding and functional effect classification steps ('Arch2' and 'Arch3'), differing in the way training sets were generated. Fivefold stratified cross-validation indicated that cascading predictions provide superior precision and recall based on an internal test set. We next prospectively validated the architectures using a temporal set of 153,467 in-house data points (after a 4-month interim from initial data extraction). Results showed that Arch3 performed with the highest target class averaged precision and recall scores of 71% and 53%, which we attribute to the use of inactive background sets. Distance-based applicability domain (AD) analysis indicated that Arch3 provides superior extrapolation into novel areas of chemical space, and thus, based on the results presented here, we propose it as the most suitable architecture for the functional effect prediction of small molecules. We finally conclude that including functional effects could provide vital insight in future studies, to annotate cases of unanticipated functional changeover, as outlined by our CHRM1 case study.

Keywords: target prediction, activation, inhibition, cheminformatics, functional effects, mechanism-of-action, chemical space, AD-AUC

### INTRODUCTION

fphar-09-00613 June 7, 2018 Time: 17:38 # 2

Target deconvolution is an important step in the subsequent analysis of data gleaned from phenotypic screenings, to identify the modulated targets of active compounds and enable the continued dissection of the biological processes involved in a system of interest (Terstappen et al., 2007; Raida, 2011; Kotz, 2012; Lee and Bogyo, 2013). One important additional parameter of consideration is the functional modulation of targets, since the activation or inhibition of a target (in the simplest case of allowing only for two types of functional effects) may positively or negatively modulate a pathway, which in turn may relate in different ways to an observed phenotype (Parker et al., 1993; Bauer-Mehren et al., 2009; Dosa and Amin, 2016).

One example of this is Bone morphogenetic protein 1 (BMP1), which was identified as a key target linked to cytostaticity from a screening cascade discerning the cytotoxic and cytostatic tendencies of compounds (Mervin et al., 2016). In the absence of functional information for the respective target, and since activation of BMP signaling in prostate carcinoma cells is known to be cytostatic (hence its inactivation would not explain the observed phenotype) (Wahdan-Alaswad et al., 2012), the authors were forced to hypothesize that cytostatic agents may activate BMP1. Another study rationalized the polypharmacology of sleep-inducing compounds in rat; without functional annotation, the authors were forced to stipulate that bioactive compounds with multi-target activity may elicit their synergistic sleep parameter activity through inhibition of Histamine Receptor H1 (HRH1) and activation of Cholinergic Receptor Muscarinic 4 (CHRM4) (since the biological evidence at hand for both targets advocates this rationalization) (Drakakis et al., 2017). Sertindole, a withdrawn approved drug, was also experimentally determined within the study to change over in functional activity. Despite profiles linked to prolonged sleep bouts, the compound was linked to hyperactivity, not inhibition, at key targets implicated with increased bouts of sleep, which further demonstrates how the functional behavior of compounds needs to be considered to understand phenotypic response in biological systems.

One approach to target deconvolution is in silico target deconvolution, which is a well-established computational technique capable of inferring compound MOA by utilizing known bioactivity information (Koutsoukas et al., 2011; Wang et al., 2013; Lavecchia and Cerchia, 2016). This technique is well established for the deconvolution of phenotypic screens (Poroikov et al., 2001; Geronikaki et al., 2004; Liggi et al., 2014) and the identification of compound side effects via bioactivity profiling of off-targets (Lounkine et al., 2012; Barton and Riley, 2016). The characterization of the functional effects of compounds is often a principal shortcoming of current in silico methods, since many protocols only provide a probability for compound affinity at a target (Drakakis et al., 2013; Koutsoukas et al., 2013; Mervin et al., 2015).

Existing protocols, such as the Similarity Ensemble Approach (SEA) (Keiser et al., 2007) and Prediction of Activity Spectra for Substances (PASS) (Lagunin et al., 2000), provide functional annotation by training on a compound set extracted from the MDL Drug Data Report [MDDR] (2006). These implementations, however, only utilize active bioactivity data (experimentally validated negative bioactivity data are disregarded), which has been shown to hinder performance. Additional problems with MDDR are inconsistent annotation, since many activity classes are not on the target level (for example the activity class 'anti-helminthic activity'), and the relatively small numbers of compound-target pairs available for modeling compared to other current databases (Lagunin et al., 2000). Other cheminformatics approaches discriminate agonist from antagonist classifications of ligands at nuclear receptors across targets simultaneously (within a single-model architecture) (Lagarde et al., 2017). This architecture could negatively affect performance due to the imbalance between the functional data and the requirement to assign probability scores across all target proteins.

We have in this work explored various cascaded approaches to predict the functional effects of orphan compounds and contrasted these with a single-model architecture (similar to previous approaches). To this end, we have assimilated a dataset of 22,836,983 compound-target annotations available in the Chemistry Connect (Muresan et al., 2011) repository across a range of G-protein-coupled receptors (GPCRs), Nuclear Hormone Receptors (NHRs), ion channels and transporters. The dataset comprises 817,239 binders (unknown if activating or inhibiting) and 20,761,260 non-binding compounds from binding assays, as well as 226,045 activating and 1,032,439 inhibiting compounds from functional assays, spanning a total of 332 protein targets.

This work explores three different in silico architectures for functional target prediction, which are summarized in **Figure 1**. **Figure 1A** outlines Architecture 1 (Arch1), a Random Forest (RF) algorithm trained with all functional labels across all targets within a single model [hence, an approach using only active (functional) data], which serves as a baseline against which to compare the cascaded, and hence more complex, architectures. Architecture 2 (Arch2), outlined in **Figure 1B**, is the first of two cascaded approaches, combining stage 1 target prediction with subsequent stage 2 functional prediction, which we rationalize could improve performance due to the cascaded nature of models. Stage 2 of Arch2 comprises a single RF model trained on both activating and inhibiting compounds. In comparison, **Figure 1C** depicts Architecture 3 (Arch3), which is based on an ensemble of two independent RFs trained on either activating or inhibiting compounds separately versus an inactive background set.

To establish the optimal model architecture, we conducted fivefold stratified cross validation for the three different modeling approaches. Models were also prospectively validated using an external testing set of 153,467 compounds, spanning 306 targets extracted from all functional in-house AstraZeneca data after a 4-month interim from initial training set extraction. The cross validation and time-split performance of the approaches has provided guidance into the choice of architecture to be deployed in-house for future triage processes.

inhibition (line c), or as 'inhibiting' if the probability of inhibition is greater than activation (line d). Although Arch2 and Arch3 enforce functional prediction in cases of both low activating and inhibiting probabilities, this is preferred for this study rather than the addition of an extra label (e.g., predicted binding only).

## MATERIALS AND METHODS


### Sources of Compound Training Data

AstraZeneca bioactivity data from Chemistry Connect (Muresan et al., 2011) was mined for functional data with bioactivities (IC50 and EC50) better than or equal to 10 µM and annotated with functional terms based on BioAssay Ontology (BAO) assay classifications (Vempati et al., 2012; Abeyruwan et al., 2014). The resulting dataset was filtered for the GPCR, NHR, ion channel and transporter targets, since they are considered to have the highest functional annotation accuracy (in-house) and encompass large numbers of activators, which is not the case for enzymes.

The full complement of functional annotations includes various mechanisms, such as 'activation,' 'antagonism,' 'inverse agonism,' 'opening,' 'closing' and 'modulation' (full list shown in **Table 1**), which were chosen by BAO as the appropriate units to describe what each assay measures from assay endpoints. As a simple example, the unit EC50 was linked to 'activation,' whilst IC50 was annotated with 'inhibition.' More complex endpoints were assigned such that the measured activity of NHR, GPCR and ligand-gated ion channel mechanisms-of-action (MOAs) were annotated as 'agonist,' 'antagonist,' or 'partial antagonist,' whilst voltage-gated ion channel MOAs were assigned 'opening' or 'closing' annotations.

In this study, we classified all compounds into the simplified binary labels of 'activating' or 'inhibiting' endpoints using an internal mapping scheme (**Table 1**). Although imposing only two (activation and inhibition) functional labels may be an over-simplification, this is preferred to the complex situation resulting from the original BAO labeling, since it reduces training data into a binary problem per protein target, ensures larger numbers of compounds are retained within each MOA, and ensures that generated predictions are easily compared across the complete spectra of functional predictions between targets. It is also less algorithmically difficult to build classification models compared to regression, thereby usually improving performance.

Compounds with conflicting activating and inhibiting annotations were removed from the training data. The resulting functional data set provided 226,045 activating and 1,032,439 inhibiting compounds spanning 320 different targets, the distribution of which is shown in **Figure 2A**, with a median of 186 ± 1,526 activating-target compound pairs and 1,190 ± 5,123 inhibiting-target compound pairs per target. The distribution of ratios between the functional labels (overall median ratio of 5.0 ± 27.4 inhibiting:activating compounds) and distribution of functional set sizes (overall median of 163 ± 1,462 and 948 ± 4,955 for the activating and inhibiting classes, respectively) are shown in **Figures 2B,C**.

TABLE 1 | Functional mapping schema employed in this study.

The functional labels of biological screens were reduced into the binary classifications of 'activating' or 'inhibiting' to reduce the complexity of the modeling in this study.

Bioactivity data was also extracted from the same database, for compounds with binding activity (Ki or Kd) better than or equal to 10 µM, as a supplementary source of training data for cascaded Stage 1 target prediction (Arch1 does not cascade predictions and so does not utilize binding information). The resulting data provided 817,239 binding compound-target pairs spanning 300 different targets, comprising a median and standard deviation of 752 ± 4,954 active compounds per target.

Non-binding (inactive) compounds were extracted from PubChem in the same manner as described in Mervin et al. (2015), which involved mapping NCBI Gene IDs (GIDs) and Protein IDs (PIDs) to the Bioactivity Assay IDs (AIDs) held in the PubChem BioAssay repository for compounds annotated as 'inactive' in deposited bioactivity screens, using the PubChem REST (Kim et al., 2016) and PubChem PUG resources (NCBI, 2007). AstraZeneca high-throughput screens deposited in the HTS DataMart (an internal database of HTS information) were also mined for non-binding compounds from bioactivity screens with bioactivities (Ki, Kd, IC50, and EC50) greater than 10 µM. Compounds with conflicting non-binding annotations were removed from the training data.

To compile additional non-binding compounds for proteins not covered in the internal database or PubChem (hence, for cases where insufficient numbers of confirmed negatives were available), additional putative inactive compounds were sampled from PubChem using a sphere exclusion algorithm. In this protocol, compounds with a Tanimoto similarity coefficient (Tc) value of less than or equal to 0.4 are sampled as a background of putative inactive chemical space. Although sphere exclusion selection leads to artificially inflated performance, this is a necessary step to ensure the existence of a putative negative bioactivity class with sufficient coverage of inactive space to conduct target prediction. The resulting dataset includes 20,761,260 non-binders with a median of 32,320 ± 84,491 non-binding compound-target pairs per protein target.
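The similarity cut-off used for sampling can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of on-bit indices and that the Tc is measured against the nearest known active of a target; the text does not state the reference set, so this is an assumption, and the function names are hypothetical. It shows only the Tc ≤ 0.4 criterion, not a full sphere exclusion implementation.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def sample_putative_inactives(candidates, actives, max_tc=0.4):
    """Keep candidates whose highest Tc to any active is <= max_tc.

    Assumption: similarity is measured against the known actives of a target.
    """
    kept = []
    for fp in candidates:
        nearest = max((tanimoto(fp, act) for act in actives), default=0.0)
        if nearest <= max_tc:
            kept.append(fp)
    return kept


# Toy fingerprints (made-up bit indices, for illustration only).
actives = [{1, 2, 3, 4}, {2, 3, 5, 8}]
candidates = [{1, 2, 3, 9}, {10, 11, 12}, {4, 20, 21, 22, 23}]
background = sample_putative_inactives(candidates, actives)
```

In this toy case the first candidate shares three of five distinct bits with an active (Tc of 0.6) and is therefore excluded, whilst the other two candidates are retained as putative inactives.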

Training compounds were subjected to pre-processing and filtered to retain targets with a minimum of 10 activating and 10 inhibiting compounds, to ensure only targets encompassing sufficient functional chemical space are retained for training. Although not essential for Stage 2 model training, binding data was also filtered for five or more compounds, to ensure the minimum number of binding data points is equal to the number of folds used for cross validation. Supplementary Figure 1 shows a Venn diagram of the bioactivity data available for training, comprising 332 models. Overall, the training set includes 20,761,260 non-binding compounds, 817,239 binders, 226,045 activating and 1,032,439 inhibiting data points.

### Compound Pre-processing and Fingerprint Generation

RDKit (Landrum, 2006) (Version 2016.09.1) was employed to remove structures not containing carbon from the dataset, to remove compounds containing atoms with atomic numbers between 21–32, 36–52, and greater than 53, and to retain only compounds with a molecular weight between 100 and 1000 Da, in order to retain a more 'drug-like' (in the widest sense) chemical space. Compounds were standardized using an in-house OEChem script (OEChem Toolkits, 2017), and RDKit was used to generate 2,048-bit (circular) Morgan fingerprints (Morgan, 1965), with the radius set to 2.
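The filtering criteria can be expressed as a small predicate. This is a sketch rather than the script used in the study: it reads the listed atomic-number ranges as elements to be excluded (an assumption of this sketch), and it assumes the atomic numbers and molecular weight of each compound have already been computed elsewhere (for instance with RDKit). The function names are hypothetical.

```python
def is_excluded_element(z):
    """True for atomic numbers in the ranges 21-32, 36-52, or above 53."""
    return 21 <= z <= 32 or 36 <= z <= 52 or z > 53


def passes_filter(atomic_numbers, mol_weight):
    """Apply the carbon, element and molecular-weight criteria to one compound.

    atomic_numbers: atomic numbers of the atoms in the compound (pre-computed).
    mol_weight: molecular weight in Da (pre-computed).
    """
    has_carbon = 6 in atomic_numbers
    no_excluded = not any(is_excluded_element(z) for z in atomic_numbers)
    return has_carbon and no_excluded and 100 <= mol_weight <= 1000


# Illustrative inputs: heavy-atom atomic numbers and approximate weights.
aspirin_ok = passes_filter([6] * 9 + [8] * 4, 180.16)   # C9H8O4
ethanol_ok = passes_filter([6, 6, 8], 46.07)            # below 100 Da
platinum_ok = passes_filter([6] * 6 + [7, 7, 8, 8, 8, 8, 78], 371.25)  # contains Pt
```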

### In Silico Modeling

#### Single Model Functional Prediction (Arch1)

The first model architecture, Arch1 (shown in **Figure 1A**), utilizes a single RF trained using the activating and inhibiting data across all available targets, which is intended to serve as a baseline comparison against similar online web-based approaches such as SEA and PASS, which do not necessarily consider (non-)binding information or consider multiple functional labels within one model.

A RF classifier of 100 trees, with the number of features set to 'auto' and max depth set to '20', was implemented in Scikit-learn (Pedregosa et al., 2011), and trained using the binary matrix of activating and inhibiting compound fingerprints across all targets. The single model provides a RF (class) probability (computed as the mean predicted class probabilities of the trees in the forest) of activating or inhibiting a target on an individual compound basis, when considering all other functional predictions for available targets. Generated probabilities are subsequently converted into binary predictions based on a probability cut-off [for example above 0.2 (line a) and 0.09 (line b) in **Figure 1A**], which is described in depth in the next paragraph. The functional label with the highest probability is selected in situations where both the activating and inhibiting labels of a target exceed the cut-off. For example, Target 2 would be considered activated when using a cut-off of 0.092 (as indicated by line 'b' in **Figure 1A**).

In order to compare Arch1 to the cascaded methods, a probability cut-off was applied to generate a final set of functional predictions from the probabilities generated. This threshold was defined as the probability providing the optimal F1-score performance (i.e., target or class performance averaged across the inactive, activating and inhibiting labels) from one percentile increments across the distribution of all scores obtained during cross validation and prospective validation, in a similar procedure to Perezgonzalez (2015). This is an important step since a robust method to fairly compare the different approaches is required, a topic which will be discussed in more detail in the section entitled "Precision and recall versus BEDROC and PR-AUC."
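A percentile scan of this kind can be sketched as follows. The sketch is simplified to a single binary label (the study averages F1 across the inactive, activating and inhibiting labels), uses a nearest-rank percentile, and treats scores at or above the cut-off as positive; these are assumptions, and the function names are hypothetical.

```python
def f1_at_cutoff(scores, labels, cutoff):
    """F1-score when scores >= cutoff are treated as positive predictions."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_cutoff(scores, labels):
    """Scan the score distribution in one-percentile steps and keep the best F1."""
    ranked = sorted(scores)
    candidates = []
    for pct in range(1, 101):
        # Nearest-rank percentile of the observed score distribution.
        idx = max(0, -(-pct * len(ranked) // 100) - 1)
        candidates.append(ranked[idx])
    return max(candidates, key=lambda c: f1_at_cutoff(scores, labels, c))


# Made-up scores: low absolute values, as expected for a multi-label model.
scores = [0.01, 0.02, 0.05, 0.09, 0.20, 0.35]
labels = [0, 0, 0, 1, 1, 1]
cutoff = best_cutoff(scores, labels)
```

With these toy values the scan selects 0.09, the lowest cut-off that separates the three positives from the three negatives.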

#### Stage 1 Target Prediction (Arch2 and Arch3)

Both Arch2 and Arch3 use Stage 1 target prediction. Here, input compounds are subjected to Stage 1 prediction and predicted as binding (or otherwise non-binding) based on the condition that the output probability of binding is greater than non-binding. Compounds predicted non-binding at this point are removed from the further cascaded profiling, whilst compounds predicted to bind are retained for Stage 2 functional prediction.

Stage 1 target prediction employs a similar target prediction protocol to the one described previously by Mervin et al. (2016), utilizing large scale inactive chemical space and active compounds from binding and functional assays. A RF classifier of 100 trees, with the number of features and max depth set to 'auto' and the 'class\_weight' set to 'balanced', was implemented in Scikit-learn. The RF was trained using the binary matrix of inactive and active compound fingerprints on a per-target basis, whilst supplying the 'sample\_weight' parameter within the 'fit' method with the ratio of active and inactive training compounds. The implementation of stage 1 target prediction does not differ between Arch2 and Arch3.

#### **Stage 2 prediction (Arch2)**

Stage 2 prediction is employed in two different ways in the model architectures of Arch2 and Arch3. Both techniques aim to assign an activating or inhibiting functional prediction to input compounds predicted as active for a particular target during stage 1 prediction.

As visualized in **Figure 1B**, Arch2 employed two cascaded RF models overall (one RF for Stage 1 and one RF for Stage 2). The RF for Stage 2 used the same hyper-parameters as Stage 1, and was trained using the activating and inhibiting compound fingerprints on a per-target basis. This RF was calibrated with Platt scaling via the Scikit-learn 'CalibratedClassifierCV' class, with the number of calibration and validation folds set to '3'. Thus, the predictions generated by the Stage 2 RF can be directly interpreted as a likelihood that an input compound is an activator or inhibitor.

A functional prediction is made for the functional label with the largest probability output from the second cascaded model, i.e., if the probability of activation is higher than that for inhibition, then the compound is classified as an activator (and vice-versa). Thus, this procedure does not account for instances when no confident prediction can be made for the second cascaded prediction. This behavior is preferred for the purpose of this study, since enforcing a prediction for the highest label regardless of confidence ensures the output between Arch1, Arch2, and Arch3 can be compared within this study.
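The Arch2 decision logic can be summarized in a few lines. This is a sketch of the rule only, taking already-computed probabilities as input; the function name is hypothetical, and assigning ties to 'inhibiting' is an assumption, since the text does not state how ties are resolved.

```python
def arch2_predict(p_binding, p_activating):
    """Cascade: Stage 1 binding gate, then Stage 2 choice between two labels.

    p_binding: Stage 1 probability of binding (non-binding is 1 - p_binding).
    p_activating: Stage 2 probability of activation (inhibition is 1 - p_activating).
    """
    if p_binding <= 1.0 - p_binding:
        return "non-binding"  # removed from further cascaded profiling
    p_inhibiting = 1.0 - p_activating
    # Assumption: ties fall to 'inhibiting'; the text does not specify this case.
    return "activating" if p_activating > p_inhibiting else "inhibiting"


calls = [
    arch2_predict(0.30, 0.90),  # fails the Stage 1 gate
    arch2_predict(0.80, 0.51),  # forced call despite low Stage 2 confidence
    arch2_predict(0.80, 0.20),
]
```

The second call shows the behavior discussed above: a label is enforced even when the two Stage 2 probabilities are nearly equal.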

#### **Stage 2 prediction (Arch3)**


**Figure 1C** illustrates Arch3, which employed three RF models overall (one for Stage 1 and two independent RF models for Stage 2). Both Stage 2 RFs utilize the same parameters as in Stage 1, and are trained separately for activating and inhibiting compounds, respectively, versus a set of inactive compounds. Probabilities generated by both algorithms were calibrated with Platt scaling via the Scikit-learn 'CalibratedClassifierCV' class, with the number of calibration and validation folds set to '3'. Scaling the independent probabilities in this manner enables the comparison between the activating and inhibiting probabilities from both algorithms, even though the two are distinct models. Functional predictions are made for input compounds by selecting the activating or inhibiting label with the largest probability.
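The comparison of two independently calibrated outputs can be sketched with Platt's sigmoid, which maps a raw score s to 1 / (1 + exp(A·s + B)). The parameter values below are invented for illustration (in practice A and B are fitted per model during calibration), and the function names are hypothetical.

```python
import math


def platt(score, a, b):
    """Platt's sigmoid: maps a raw classifier score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def arch3_label(raw_act, raw_inh, act_params, inh_params):
    """Calibrate each Stage 2 model separately, then pick the larger probability."""
    p_act = platt(raw_act, *act_params)
    p_inh = platt(raw_inh, *inh_params)
    label = "activating" if p_act > p_inh else "inhibiting"
    return label, p_act, p_inh


# Hypothetical (A, B) pairs for the activating and inhibiting models.
label, p_act, p_inh = arch3_label(0.60, 0.55, (-4.0, 2.0), (-6.0, 3.0))
```

Because the two models are distinct, the calibrated probabilities are compared directly but are not constrained to sum to 1.0.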

### Performance Measures: Precision and Recall versus BEDROC and PR-AUC

Although the Boltzmann-Enhanced Discrimination of the Receiver Operating Characteristic (BEDROC) (Truchon and Bayly, 2007) and Precision-Recall Area Under the Curve (PR-AUC) scores are frequently used to describe virtual screening performance, these are not appropriate metrics to compare the outputs between all the models benchmarked in this study. Such metrics are based on the distribution of probabilities for the classes for each method; however, these are not comparable between the three architectures explored, since they are on different scales, represent different likelihoods, and are processed to generate an overall functional prediction in different ways.

For example, Arch1 is a single model with multiple labels; hence the generated scores are low, since they are distributed over all 664 target-function effects, which overall must sum to '1.0'. In comparison, Arch2 uses a binary classifier on a per-target basis for Stage 2; hence only two probabilities are produced (activating and inhibiting), which sum to '1.0'. Thus, these values are comparatively higher, since they are shared between only two output labels. Furthermore, Arch3 uses two different binary classifiers to deduce a final prediction in Stage 2, using the activating and inhibiting labels normalized with a background of inactive compounds. Thus, the probabilities of these activating and inhibiting labels do not sum to '1.0', since they are distinct models. Therefore, we considered that precision, recall and F1-score (i.e., the actual output expected from the deployed models) are the most suitable and robust metrics to compare the performance of methods in the current situation.

#### Cross Validation Methodology

Fivefold stratified cross validation was employed in Scikit-learn using the 'StratifiedKFold' method, ensuring training data is randomly shuffled and seeded. In this procedure, the non-binding and binding (only available for Arch2 and Arch3), and activating and inhibiting training data is split into five folds, whilst maintaining the ratio between compounds with different labels in each split. Each fold is used in turn as the test set, with the remaining folds used as the training set, for cascaded Stage 1 and Stage 2 training and prediction. Binding data is only utilized within training sets for Stage 1 in the cascaded approaches, since it is only used to supplement Stage 1 training data and not employed during Stage 2.
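The idea of a seeded, stratified split can be illustrated without any external library. The sketch below is not the Scikit-learn implementation; it simply shuffles the indices of each label with a fixed seed and deals them out across the folds so that each fold keeps the overall label ratio. The function name is hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds while preserving the label ratio per fold."""
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    rng = random.Random(seed)  # seeded so the split is reproducible
    folds = [[] for _ in range(n_folds)]
    for lab in sorted(by_label):
        members = by_label[lab]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % n_folds].append(idx)
    return folds


# Toy target with a 5:1 inhibiting:activating ratio.
labels = ["inhibiting"] * 50 + ["activating"] * 10
folds = stratified_folds(labels)
activating_per_fold = [sum(1 for i in f if labels[i] == "activating") for f in folds]
```

Each fold then serves once as the test set whilst the remaining four folds form the training set.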

The ranked list of functional compound predictions is used to calculate the optimal threshold for Arch1 (as discussed above) and used to generate precision, recall, and F1-score for Arch1, whilst the predicted outcome for the activating and inhibiting compounds from each test set is used to calculate the corresponding performance of the cascaded models. **Figure 2** gives details into the size of targets in terms of the data points available for modeling and ratio of inhibiting to activating compounds, which is known to influence the predictivity of target prediction models (Koutsoukas et al., 2013).

#### Prospective Validation Data Set

AstraZeneca bioactivity data was mined in the same manner as described above after a 4-month interim (exactly 4 months after extracting the training data) to obtain an external dataset of compounds to prospectively validate the models. Compounds with affinities better than or equal to 10 µM were extracted and employed for cascaded Stage 1 and Stage 2 prediction. The dataset includes a total of 63,640 activating and 89,827 inhibiting compounds for 306 targets (with the number of compounds per target classification shown in Supplementary Table 1), spanning both similar and dissimilar chemical space compared to the training set (prospective validation chemical space analysis shown in Supplementary Figure 2), with overall median Tc values of 0.51 ± 0.21 and 0.62 ± 0.19, respectively. Class-averaged precision, recall and F1-score were calculated for each architecture during temporal validation, since some targets comprise only very few test set compounds, which would hence produce unreliable performance metrics.

### RESULTS

### Functional Data Available in AstraZeneca

We first analyzed the nearest-neighbor similarity distribution per-target for each classification, to explore the chemical space of the functional dataset and to establish to what extent the different sets of compounds can be distinguished in chemical similarity space, and thus whether there is a rationale for implementing and evaluating functional target prediction models.

**Figure 3** shows the results of the nearest-neighbor similarity distribution per-target for each classification. The overall distributions highlight that binding (active) and inhibiting compounds ("B-I") are more similar to each other (median of 0.958) than both binding and activating ("B-A") and activating and inhibiting ("A-I") compounds (median similarities of 0.841 and 0.835, respectively). Overall, this analysis indicated there is some separation between the activating and inhibiting classes of compounds in chemical space, giving us a rationale for implementing and evaluating functional target prediction models (statistical analysis of chemical similarity between the target classes shown in Supplementary Table 2).
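A nearest-neighbor (NN) similarity between two compound sets can be sketched as below. The sketch assumes fingerprints held as sets of on-bit indices and measures, for each compound in the first set, the Tc to its most similar compound in the second set; the direction of the comparison and the function names are assumptions, not details taken from the study.

```python
from statistics import median


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def median_nn_similarity(set_a, set_b):
    """Median, over set_a, of the highest Tc to any compound in set_b."""
    return median(max(tanimoto(fa, fb) for fb in set_b) for fa in set_a)


# Toy fingerprints (made-up bit indices) for one target.
activating = [{1, 2, 3, 4}, {5, 6, 7}]
inhibiting = [{1, 2, 3, 5}, {5, 6, 8}, {9, 10}]
a_i_similarity = median_nn_similarity(activating, inhibiting)
```

A high median indicates that the two functional sets occupy overlapping chemical space, which is the situation expected to make the labels harder to separate.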

The GPCR class comprises the highest median NN similarity between the activating and inhibiting compounds of 0.905 (and an overall median of 0.923 between the three sets), a finding that is corroborated in literature since small structural modifications to GPCR-targeted ligands are known to convey major changes in their functional activity, converting agonists into antagonists and vice-versa (Dosa and Amin, 2016). Changes in certain moieties are shown to affect binding outcome more than others; for example, one study highlighted that steric modifications near a basic nitrogen, methylation of indoles, and aniline nitrogen substitutions appeared to play important roles in determining functional activity while keeping overall structure (as captured in the fingerprints employed in the current work) rather similar (Dosa and Amin, 2016). The close proximity between functional labels may be reflected in the performance of the models, since the overlap of features present in both sets confounds the separation between labels (Koutsoukas et al., 2013).

Nuclear hormone receptors are ranked as the second most similar target class based on the NN similarity between activating and inhibiting compounds, with a median Tc of 0.883. A range of ligand modifications can inter-convert functional activity due to changes in the directions in which these ligand R-groups are positioned within the ligand-binding domains (LBDs) of NHR cores (Huang et al., 2010). For example, one study explicitly outlined which ring system extensions alter the functional effects of activating compounds at the NHR estrogen receptor (ER), due to the protrusion of additional groups displacing the agonist conformation of α-helices in the LBD (Parker et al., 1993). In comparison, ion channels and transporters comprise comparatively dissimilar chemistry between compound sets, with median Tanimoto similarities of 0.774 and 0.779, respectively, between the activating and inhibiting compounds, giving rise to the expectation of better classification performance for those datasets.

#### Cross Validation Results

We next performed stratified fivefold cross-validation (as described in the section "Materials and Methods") and calculated precision, recall and F1-score metrics for 332 targets averaged over the five folds. Overall performance for each of the architectures was next calculated using the class-average precision and recall for the three functional labels (namely non-binding, inhibiting and activating) obtained over the 332 targets, the results of which are shown in **Table 2**.
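Averaging over the three labels can be made concrete with a short sketch. It computes precision, recall and F1-score for each label from paired true and predicted labels and then takes the unweighted mean across labels; assigning a score of 0 when a label is never predicted is an assumption of this sketch, and the function names are hypothetical.

```python
LABELS = ("non-binding", "activating", "inhibiting")


def label_metrics(true, pred, label):
    """Precision, recall and F1-score for one label (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(true, pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(true, pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(true, pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def label_averaged(true, pred):
    """Unweighted mean of precision, recall and F1-score across the three labels."""
    per_label = [label_metrics(true, pred, lab) for lab in LABELS]
    return tuple(sum(m[i] for m in per_label) / len(LABELS) for i in range(3))


# Toy predictions for one target: one activator is mistaken for an inhibitor.
true = ["non-binding", "non-binding", "activating", "activating", "inhibiting", "inhibiting"]
pred = ["non-binding", "non-binding", "activating", "inhibiting", "inhibiting", "inhibiting"]
precision, recall, f1 = label_averaged(true, pred)
```

In this example the single misclassified activator lowers activating recall to 0.5 and inhibiting precision to two thirds, giving label-averaged precision and recall of roughly 0.89 and 0.83.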

It can be seen that the Arch1 architecture optimized for F1-score performed with overall class-averaged precision and recall performance of 84.5 ± 12.1 and 68.7 ± 17.5, respectively, which provides a baseline performance for what we expected to be superior (or certainly more complex) model architectures. This was indeed found to be the case, since Arch2 and Arch3 performed with target averaged precision and recall scores of 89.4 ± 9.8 and 79.2 ± 11.4, and 92.0 ± 9.1 and 82.9 ± 11.6, respectively (using a cut-off for the label with the largest probabilities as described in the section "Materials and Methods").

In order to understand the performance distribution across different protein class labels, we next averaged precision, recall and F1-scores across the three functional labels for each of the GPCR, NHR, ion channel and transporter target classifications, as illustrated in the second row of **Table 2**. Overall, results from this analysis outlined that the baseline model performed with the lowest class averaged precision and recall scores of 76.1 ± 0.2 and 68.6 ± 0.9, whilst Arch2 performed with target class averaged precision and recall of 89.3 ± 1.9 and 79.5 ± 2.7, and Arch3 performed with the best scores of 91.9 ± 1.7 and 82.9 ± 3.4, respectively.

Performance metrics are calculated by averaging the scores obtained over all targets or classes, for each of the three labels (inactive, activating, inhibiting), which are then averaged.

A detailed breakdown of the protein target class averaged performance for the activating and inhibiting labels is shown in **Figure 4**. Overall, the inhibiting (more often majority) label performed with an overall class-averaged precision and recall of 75.5 and 67.3 for Arch1, 89.5 and 72.0 for Arch2 and 91.0 and 74.5 for Arch3. In comparison, the activating (more often minority) label performed with precision and recall scores of 84.2 and 65.8 for Arch1, 79.6 and 66.7 for Arch2, and 86.1 and 74.4 for Arch3, respectively. Hence, we conclude that Arch3 provides the optimal performance across the architectures assessed here.

Our results indicate models frequently perform with higher precision than recall (i.e., they are more certain about the positive predictions they do make than they are able to identify compounds with the respective label across all of chemical space). Although Arch2 and Arch3 provide overall superior performance profiles, Arch1 exhibits superior activating precision (84.2) over Arch2 (79.6). We attribute this to the fact that Arch1 relies solely on activating or inhibiting compounds, and hence a more simplistic input space compared to Arch2 and Arch3, which results in fewer positive predictions that have a greater propensity to be correct, at the cost of a larger number of activating compounds being missed.

Arch2 and Arch3 also exhibit lower recall compared to precision, which is a consequence of the two-stage functional prediction, when false-negative binding predictions from Stage 1 are not used as input for Stage 2 prediction. Our findings also indicate that Arch3 can best handle the imbalance between inhibiting and activating labels compared to Arch2, to obtain higher activating recall and precision performance, a trend which will be discussed in more detail in the following.

In order to test if the activating and inhibiting performance of Arch3 models lies above that of the Arch2 approach (and hence whether there is statistical value in normalizing the models using a background of inactive compounds when cascading predictions), we next conducted a two-sample Kolmogorov–Smirnov (KS) test for the precision, recall and F1-score values obtained for Arch2 and Arch3 (overall results are summarized in the following; more detailed results are shown in Supplementary Table 3). The KS test produces p-values less than 0.05 (5% significance threshold) for the activating precision, recall and F1-score (3.96E−04, 7.90E−05, and 1.95E−05, respectively) and inhibiting F1-score (4.93E−03), indicating that Arch3 performance is statistically improved for these performance parameters, compared to the Arch2 model architecture.
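For readers unfamiliar with the test, the two-sample KS statistic is the largest vertical gap between the empirical cumulative distribution functions of the two samples. The sketch below computes only this statistic (not the p-value) in plain Python; the per-target scores are invented for illustration and are not results from this study, and the function name is hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = sum(1 for x in a if x <= v) / len(a)
        cdf_b = sum(1 for x in b if x <= v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up per-target F1-scores for two architectures (illustration only).
arch2_scores = [0.60, 0.70, 0.75, 0.80]
arch3_scores = [0.78, 0.85, 0.90, 0.95]
d_stat = ks_statistic(arch2_scores, arch3_scores)
```

A statistic close to 1 means the two score distributions barely overlap; converting it to a p-value additionally requires the sample sizes.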

Overall, ∼50% (166) of the Arch2 and ∼64% (214) of the Arch3 models performed with precision and recall values greater than or equal to 0.8, as shown in **Figure 5**. Thus, functional effects of compounds can be predicted with respectable performance for over half the targets modeled. Conversely, only ∼40% (133) of the Arch1 models performed with equivalent precision and recall values above 0.8, as shown by the lower distribution of scores.

In total, eight targets failed to predict activating or inhibiting molecules using Arch2 and hence received precision and recall values of '0' (shown as outliers in **Figure 5**). Seven of the eight targets were assigned such scores since no predictions were generated for the activating label, with five of these targets comprising fewer than 25 activating training instances and an average of 92.7 inhibiting compounds for every activating compound (92.7:1 ratio). In comparison, there was an equivalent ratio of 15.6:1 for the models that worked, with F1-score above 0.8. Hence, we conclude here that the poor performance in these situations was due to the domination of the inhibition class and lack of sufficient data points for the minority (activating) class, and conclude that datasets comprising 25 compounds constitute the minimum to generate bioactivity models with the architectures employed here.

comprising the highest overall F1-score.

left cells visualize a scatter plot of the relationship between recall and precision along with F1-score boundaries (f). Diagonal plots in upper left and lower right show stacked histograms of the precision or recall scores achieved by the Arch1 (red), Arch2 (yellow), and Arch3 (blue) architecture. Our results show that Arch3 provides the highest performance, for models with both high precision and recall, with a higher distribution of scores above the F1-score 0.9 boundary line.

We next analyzed how the Arch3 architecture handles class imbalance, given its superior class-averaged precision, recall and F1-score performance, which is shown in Supplementary Figure 3. This architecture outperforms both Arch2 and Arch1, with all models producing one or more inhibiting predictions, and only one model with relatively few activators (18) failing to predict any activating molecules. This observation is likely a result of the independent comparison of activating or inhibiting compounds with an inactive background set and the subsequent comparison of Platt-scaled probabilities: our most likely explanation is that this setup enables the minority (more often activating) class to assign sufficiently high confidence to its predictions to surpass those of the majority (more often inhibiting) functional label.
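The comparison of Platt-scaled probabilities can be pictured with the small sketch below. It assumes the standard sigmoid form of Platt scaling, p = 1 / (1 + exp(A·f + B)), and uses made-up parameters and raw scores; the function names and the simple "higher calibrated probability wins" rule are illustrative assumptions, not the published implementation.

```python
import math


def platt(score, a, b):
    """Map a raw classifier score to a probability with a fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def arch3_label(act_score, inh_score, act_params, inh_params):
    """Pick the functional label whose calibrated probability is higher.

    Each Stage 2 model (activating vs. inactive, inhibiting vs. inactive)
    has its own hypothetical Platt parameters (A, B).
    """
    p_act = platt(act_score, *act_params)
    p_inh = platt(inh_score, *inh_params)
    label = "activating" if p_act > p_inh else "inhibiting"
    return label, p_act, p_inh


# Hypothetical raw scores and Platt parameters for one compound-target pair.
print(arch3_label(2.0, 0.5, act_params=(-1.5, 0.0), inh_params=(-1.0, 0.0)))
```

Because each model is calibrated against its own inactive background, a minority-class model can in principle return a higher probability than the majority-class model, which is the behavior the paragraph above describes.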

We next sought to identify the performance of the activating and inhibiting labels for the Arch2 and Arch3 architectures separated by the individual target classifications, as shown in Supplementary Figure 4. Our results demonstrate that the distribution of performance differs between classes, where the high performance of the GPCRs and NHRs (averaged median F1-scores of 86.8 and 84.3, respectively) can be contrasted with transporters, and comparatively poorly performing ion channels (with averaged median F1-scores of 77.5). Although the poor performance for ion channels and transporters may be unexpected due to the overall rather high separation in chemical space between activating and inhibiting training compounds (**Figure 3**), the large imbalance between the labels (as previously outlined by the median activating versus inhibiting ratios of 6.95 and 6.58, highlighted in **Figure 2B**) is likely one reason for the poor performance of these classes, particularly when considering activating label performance.

In order to identify further factors influencing the predictivity of the models, we next explored the impact of the training set size of data points with functional annotations, the similarity of the five nearest intra-target neighbors and the overall cross-validation F1-score performance, as depicted in Supplementary Figure 5. The figure demonstrates that both increasing nearest-neighbor similarity within activating and inhibiting compounds and overall model size improve model performance, with a large proportion of data points clustered toward the top right-hand corner of the 3D plot. The intra-target similarity of the models increases with training set size, with an increased likelihood of covering similar compounds in the train and test set (which hence leads to increased performance). In comparison, small models (with fewer than 100 compounds) show more variable performance (standard deviation of 18), due to the decreased chance of retaining similar compounds throughout the cross validation.

The small models also exhibit higher variance in nearest-neighbor similarity due to the reduced coverage of chemical space (as previously shown in Supplementary Figures). Smaller target models below 100 compounds with similar nearest neighbors (Tanimoto similarity above 0.6) are shown to perform better, supporting the view that targets with few activating or inhibiting compounds can be reliably utilized in functional target prediction models, provided that chemistry similar to the compounds for which predictions are made is represented within the training set. These findings are at least partly due to the nature of cross-validation, the fact that the data come from a single source, and the greater chance in larger classes of having analogs (which are then easier to predict).
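To make the nearest-neighbor similarity measure concrete, the sketch below computes the Tanimoto coefficient between fingerprints represented as sets of "on" bit indices and averages the k most similar training compounds. The set-based fingerprint representation, the function names and the toy fingerprints are assumptions made for illustration only.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    fp_a, fp_b = set(fp_a), set(fp_b)
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def mean_knn_similarity(query_fp, training_fps, k=5):
    """Average Tanimoto similarity of the k nearest training compounds."""
    sims = sorted((tanimoto(query_fp, fp) for fp in training_fps), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


# Toy fingerprints: each set lists the indices of bits that are switched on.
training = [{1, 2, 3}, {1, 2}, {9}]
print(tanimoto({1, 2, 3}, {2, 3, 4}))
print(mean_knn_similarity({1, 2, 3}, training, k=2))
```

With real data, the fingerprints would come from a cheminformatics toolkit, and k = 5 corresponds to the five nearest intra-target neighbors discussed in the text.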

This analysis (Supplementary Figure 5) also highlights the influence of the modeling approach on the cross-validated performance of the models, with blue and red markers denoting the Arch2 and Arch3 approaches, respectively. A total of 97 (∼30%) of the cascaded models have an F1-score greater than 0.95, with 63 (∼65%) of these originating from the Arch3 approach, illustrating the superior performance of this method compared to the Arch2 method. The figure also illustrates that both the Arch2 and Arch3 approaches perform erratically in situations with low intra-target similarity and small training set size.

#### Prospective Validation

The performance of the functional prediction protocols was next analyzed using an external data set extracted from functional screens available at AstraZeneca after a 4-month intermission from the initial date of training data mining. The overall class-averaged precision and recall results for the non-binding, binding and inhibiting labels achieved during prospective validation are shown in **Table 2**. Arch1 performed with a class-averaged precision and recall of 59.5 ± 3.2 and 48.1 ± 1.3. In agreement with cross-validation results, the cascaded models performed with superior precision and recall, where Arch2 achieved a precision and recall of 70.9 ± 4.0 and 52.9 ± 3.6, whilst Arch3 performed with values of 70.8 ± 3.5 and 53.1 ± 3.6, respectively. Therefore, a cascaded model architecture produces more predictive models both during cross validation and when applied to a prospective data set comprising novel areas of chemical space (Supplementary Figure 2).

The class-averaged precision, recall and F1-score performance split between functional labels for prospective validation is shown in **Figure 6**. Our findings show that although the Arch1 architecture outperforms Arch2 and Arch3 based on activating precision (by a margin of ∼0.90 and ∼0.12, respectively), the cascaded models far outperform the inhibiting precision score obtained by Arch1, by a margin of ∼0.35 for both architectures. The inhibiting and activating recall are also higher for the Arch2 and Arch3 models, which hence produce higher F1-scores for both cascaded architectures compared to Arch1, with scores of ∼0.19 and ∼0.26 for the activating label and ∼0.47 and ∼0.46 for the inhibiting compounds, respectively. These findings are likely due to the single-model architecture of Arch1, which creates many false inhibiting predictions due to the many large classes with inhibiting data, which hence dominate the model with higher probabilities.

In comparison to cross validation, the difference in Arch2 and Arch3 precision, recall and F1-score performance is narrowed for prospective validation. For example, cross validation results showed a margin of ∼0.40 and ∼0.51 between activating and inhibiting target class averaged precision and recall values, which are reduced to ∼0.19 and ∼0.20 during external validation testing.

The cascaded models have fewer compounds available at Stage 1 and hence cover less chemical space, which leads to more false negatives. This is shown via the striking distribution of poor Arch2 and Arch3 recall, particularly for the activating compounds, where 87 targets (∼59% of which belong to the GPCR class) failed to predict true-positive active compounds (i.e., 'predicted to bind') during Stage 1 target prediction. The testing instances removed in this way are consequently assigned recall scores of '0'. This problem is further exacerbated by the imbalance of the external testing set between functional compounds, as indicated by the ratio between prospective validation compounds, which is applied to already imbalanced models.
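The effect of Stage 1 false negatives on recall can be illustrated with a toy cascade. In the sketch below, Stage 2 is only consulted when Stage 1 predicts binding; the function names, label strings and example predictions are hypothetical and only meant to show why a compound missed at Stage 1 can never be recovered at Stage 2.

```python
def cascade_predict(stage1_binds, stage2_label):
    """Return the Stage 2 label only if Stage 1 predicted the compound to bind."""
    return stage2_label if stage1_binds else "non-binding"


def label_recall(truth, predicted, label):
    """Fraction of compounds with the given true label that were predicted as such."""
    relevant = [p for t, p in zip(truth, predicted) if t == label]
    return sum(p == label for p in relevant) / len(relevant) if relevant else 0.0


# Toy example: both true activators are missed at Stage 1.
truth = ["activating", "activating", "inhibiting"]
stage1 = [False, False, True]
stage2 = ["activating", "activating", "inhibiting"]
predicted = [cascade_predict(s1, s2) for s1, s2 in zip(stage1, stage2)]

print(predicted)
print(label_recall(truth, predicted, "activating"))
print(label_recall(truth, predicted, "inhibiting"))
```

Even though the hypothetical Stage 2 model would have labeled both activators correctly, the activating recall is zero because neither compound passes the Stage 1 gate.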

Given this observation, we next assessed only the fraction of active compounds predicted to be positives at Stage 1 for Arch2 and Arch3 (according to the protocol outlined in **Figure 1**), to give a better indication of the benchmarked performance of the two different cascaded methods of Stage 2 prediction (i.e., only compounds predicted active at line b in **Figures 1B,C** were considered for this part of the analysis). As shown in **Table 2**, this analysis produces class-averaged recall scores for Arch2 and Arch3 of 72.4 ± 3.3 and 71.0 ± 2.0 versus 72.3 ± 2.8 and 71.3 ± 2.5, respectively, indicating that the recall and F1-score performance is higher for Arch3 than for Arch2 when benchmarking cascaded Stage 2 performance by considering only true positives from Stage 1 predictions.

To explore in more detail the performance of different target classifications between Arch2 and Arch3, we analyzed the distribution of F1-score prospective validation performance when only active compounds predicted to be positives at Stage 1 are considered. Supplementary Figure 7 shows, in a similar trend to cross validation, that the ion channel and transporter classes have a distribution of activating scores lower than the GPCR and NHR classes, due to the imbalance between the activating and inhibiting compounds also represented in the external testing set, whilst there is higher performance for the inhibiting classification of compounds due to the domination of this label.

We finally assessed the applicability domain (AD) of all model architectures using 'distance to the training set' as a method (Gadaleta et al., 2016; Hanser et al., 2016), the results of which are shown in **Figure 7**. The averaged five nearest neighbors (k = 5) in the training set and the true positive rate (TPR) (defined by the frequency of correct predictions within activating and inhibiting testing compounds) are shown for Arch1, Arch2 and Arch3. We see that the TPR decreases in accordance with increasing dissimilarity from the nearest compound in the respective label of training data across all architectures, as expected, with Arch3 performing with the highest area under the applicability domain curve (AD-AUC) of 0.30. This analysis enables us to assign confidence to novel predictions: for example, an input compound with a near-neighbor similarity between 0.8 and 0.85 would have an anticipated true-positive rate of ∼35% for Arch1, ∼78% for Arch2 and ∼81% for Arch3. We can also see that although Arch1 performs with a comparatively low AD-AUC of 0.22, all architectures obtain comparatively similar TPRs at dissimilarity scores from 0.6 onward, and hence the models are unable to extrapolate into these dissimilar areas of chemical space.

FIGURE 7 | Prospective validation distance-based applicability domain (AD) analysis. AD curves are shown for Arch1 (red), along with Arch2 (yellow) and Arch3 (red). The curves have AUC scores of 0.22, 0.29 and 0.30, respectively, indicating that Arch3 performs with overall superior AUC when considering the true positive rate achieved and increasing distance between training and prospective validation compounds. The Arch1 architecture produces similar true positive rates to the cascaded architectures for distances beyond 0.6, indicating that all three model architectures have difficulty in extrapolating into novel areas of the chemical space. True positive rate is defined as the recall of the activating and inhibiting data points for each distance bin.
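A distance-based AD curve of this kind can be sketched as follows: test compounds are binned by their distance to the training set, the true positive rate is computed per bin, and the area under the resulting curve is obtained with the trapezoidal rule. The binning scheme, the function names and the toy data are assumptions of this example; the text does not specify how the AD-AUC values were actually computed.

```python
def ad_curve(distances, correct, n_bins=10):
    """Return (bin_center, true_positive_rate) pairs for non-empty distance bins.

    `distances` are assumed to lie in [0, 1]; `correct` flags whether each
    prediction for an activating or inhibiting test compound was right.
    """
    hits = [0] * n_bins
    totals = [0] * n_bins
    for dist, ok in zip(distances, correct):
        idx = min(int(dist * n_bins), n_bins - 1)
        totals[idx] += 1
        hits[idx] += bool(ok)
    return [((i + 0.5) / n_bins, hits[i] / totals[i])
            for i in range(n_bins) if totals[i]]


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as sorted (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Toy data: predictions get less reliable as distance to training data grows.
curve = ad_curve([0.1, 0.2, 0.7, 0.8], [True, True, True, False], n_bins=2)
print(curve)
print(trapezoid_auc(curve))
```

Reading the curve at the bin containing a new compound's distance gives the kind of anticipated true-positive rate quoted in the text for a given similarity range.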

In a final case study, we analyzed the aforementioned study of Drakakis et al. (2017), to illustrate a scenario where functional prediction would have added value to a computational study. In this work, target prediction profiles were related to prolonged sleep bouts, where changes in functional effects on receptors were related to changes in the effect of compounds on sleep. Contrary to the reasoning gained from the in silico mechanism-of-action analysis, Sertindole, which was expected to increase sleep bouts, actually increased wakefulness by 44.9 min. In the absence of functional prediction, the authors hypothesized that the compound switched functional activity at one of the key receptors (CHRM1), compared to the other sleep-inducing compounds (Alcaftadine, Ecopipam, Cyproheptadine, and Clopenthixol), leading to hyperactivity and promoted wakefulness. We hence suggest that our method could improve similar analyses by providing vital insight into cases of unanticipated functional changeover.

To illustrate this, we profiled the functional activity of Sertindole at the CHRM1 receptor using Arch1, Arch2, and Arch3. Arch2 and Arch3 predictions both indicate target-specific activation of CHRM1, in contrast to the four sleep-inducing compounds above. Arch1, however, did not predict CHRM1 activation or inhibition, and thus would not have predicted any functional activity against the CHRM1 receptor.

We conclude that this case study highlights how cascaded functional models provide vital insight into this previous work, and that the unanticipated functional activity could have helped to direct resources toward the experimental functional testing of CHRM1, which was not conducted in the original study.

### DISCUSSION

In this study, we present an in-depth analysis of functional bioactivity data available in-house. We first analyzed the chemical space of functional data, to rationalize whether the functional sets of compounds can be distinguished using chemical similarity. Binding and inhibiting compounds were more similar to each other [median Tanimoto Similarity (Tc) of 0.958] than both binding and activating or activating and inhibiting compounds (median Tc of 0.841 and 0.835, respectively). There was separation between functional sets, giving us a rationale for implementing and evaluating functional prediction models. We first generated Architecture 1 (Arch1), which uses a simplistic RF similar to existing approaches, and contrasted this with two forms of cascaded models: Arch2, comprising a Stage 2 model trained directly on the activating and inhibiting compounds, and Arch3, comprising two independent Stage 2 models trained on either activating or inhibiting compounds, and a set of inactive compounds, respectively. Fivefold cross validation and temporal validation were performed, the latter using data available at AstraZeneca after a 4-month interim. Cross validation highlighted that Arch3 achieved the highest precision, recall and F1-scores, which we attributed to the independent comparison of activating or inhibiting compounds with the inactive background sets, and the subsequent comparison of Platt-scaled probabilities. In comparison, Arch1 had the lowest precision and recall performance, which we attributed to the single-model architecture. Prospective validation indicated that Arch2 and Arch3 outperform Arch1 overall, and hence that there is benefit in cascading predictions using a more complex model architecture. Distance-based applicability domain (AD) analysis showed that Arch3 achieved superior AD-AUC (area under the AD curve) and hence superior extrapolation into novel areas of chemical space. Models will be deployed in-house to aid with future phenotypic screening analyses. We conclude that predicting functional effects could provide vital insight for future studies, to annotate cases of unanticipated functional changeover, as outlined by our CHRM1 case study.

#### AUTHOR CONTRIBUTIONS

LM assimilated the functional data sets, implemented and evaluated the algorithms presented in this work, and wrote this manuscript. AMA helped to design and implement the model architectures and benchmarking. LB curated the BAO functional dataset. AB and OE conceived the main theme on which the work was performed and made sure that the scientific aspects of the study were rationally valid. All authors contributed to revising the final draft of the manuscript.

#### FUNDING

LM thanks the Biotechnology and Biological Sciences Research Council (BBSRC) (BB/K011804/1); and AstraZeneca, grant number RG75821.

#### ACKNOWLEDGMENTS

The authors thank Luise Scheidt for proof reading the manuscript.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.00613/full#supplementary-material




**Conflict of Interest Statement:** LB was employed by company AstraZeneca and currently works at Cygnal Bioscience. OE is employed by company AstraZeneca.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Mervin, Afzal, Brive, Engkvist and Bender. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genotypic and Phenotypic Factors Influencing Drug Response in Mexican Patients With Type 2 Diabetes Mellitus

Hector E. Sanchez-Ibarra<sup>1</sup>, Luisa M. Reyes-Cortes<sup>1</sup>, Xian-Li Jiang<sup>2</sup>, Claudia M. Luna-Aguirre<sup>1</sup>, Dionicio Aguirre-Trevino<sup>1</sup>, Ivan A. Morales-Alvarado<sup>1</sup>, Rafael B. Leon-Cachon<sup>3</sup>, Fernando Lavalle-Gonzalez<sup>4</sup>, Faruck Morcos<sup>2,5</sup>\* and Hugo A. Barrera-Saldaña<sup>1,6</sup>\*

#### Edited by:

Adriano D. Andricopulo, University of São Paulo, Brazil

#### Reviewed by:

Julio Benitez, Universidad de Extremadura, Spain

Reginald F. Frye, University of Florida, United States

#### \*Correspondence:

Faruck Morcos, faruckm@utdallas.edu

Hugo A. Barrera-Saldaña, habarrera@gmail.com

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 19 January 2018; Accepted: 20 March 2018; Published: 06 April 2018

#### Citation:

Sanchez-Ibarra HE, Reyes-Cortes LM, Jiang X-L, Luna-Aguirre CM, Aguirre-Trevino D, Morales-Alvarado IA, Leon-Cachon RB, Lavalle-Gonzalez F, Morcos F and Barrera-Saldaña HA (2018) Genotypic and Phenotypic Factors Influencing Drug Response in Mexican Patients With Type 2 Diabetes Mellitus. Front. Pharmacol. 9:320. doi: 10.3389/fphar.2018.00320

<sup>1</sup> Molecular Genetics Laboratory, Vitagénesis, S.A. de C.V., Monterrey, Mexico, <sup>2</sup> Evolutionary Information Laboratory, Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, United States, <sup>3</sup> Departamento de Ciencias Básicas, Centro de Diagnóstico Molecular y Medicina Personalizada, Vicerrectoría de Ciencias de la Salud, Universidad de Monterrey, Monterrey, Mexico, <sup>4</sup> Servicio de Endocrinología, Hospital Universitario Dr. José E. González, Universidad Autónoma de Nuevo León, Monterrey, Mexico, <sup>5</sup> Center for Systems Biology, University of Texas at Dallas, Richardson, TX, United States, <sup>6</sup> Tecnológico de Monterrey, Monterrey, Mexico

The treatment of Type 2 Diabetes Mellitus (T2DM) consists primarily of oral antidiabetic drugs (OADs) that stimulate insulin secretion, such as sulfonylureas (SUs), or reduce hepatic glucose production (e.g., biguanides), among others. The marked interindividual differences in T2DM patients' response to these drugs have become an obstacle to efficient prescribing and dosing. In this study, fourteen polymorphisms selected from genome-wide association studies (GWAS) were screened in 495 Mexican T2DM patients previously treated with OADs to find the relationship between the presence of these polymorphisms and response to the OADs. Then, a novel association screening method, based on global probabilities, was used to globally characterize important relationships between the drug response to OADs and genetic and clinical parameters, including polymorphisms, patient information, and type of treatment. Two polymorphisms, ABCC8-Ala1369Ser and KCNJ11-Glu23Lys, showed a significant impact on response to SUs. Heterozygous ABCC8-Ala1369Ser variant (A/C) carriers exhibited a higher response to SUs compared to homozygous ABCC8-Ala1369Ser variant (A/A) carriers (p-value = 0.029) and to homozygous wild-type genotypes (C/C) (p-value = 0.012). The homozygous KCNJ11-Glu23Lys variant (C/C) and wild-type (T/T) genotypes had a lower response to SUs compared to heterozygous (C/T) carriers (p-value = 0.039). The screening of genetic and clinical factors related to OAD response could help improve the prescribing and dosing of OADs for T2DM patients and thus contribute to the design of personalized treatments.

Keywords: pharmacogenetics, pharmacogenomics, diabetes, sulfonylureas, biguanides, Mexican, direct coupling analysis, direct information

## INTRODUCTION


Type 2 Diabetes Mellitus (T2DM) is the most common form of diabetes in adults. T2DM is associated with multiple complications, such as blindness, lower limb amputation, and premature death (Marchetti et al., 2009; Barquera et al., 2013). According to the International Diabetes Federation (IDF), China, India, the United States, Brazil, Russia, and Mexico are the countries with the highest incidence. It is estimated that life expectancy is reduced in diabetic individuals by 5–10 years, mainly due to lack of early treatment. In Mexico, the average age at death from diabetes or its complications was 66.7 years in 2010, compared with a lifespan of 76 years for non-diabetic individuals (Agudelo-Botero and Davila-Cervantes, 2015). The average annual economic cost of T2DM patients in Mexico from 2006 to 2010 was \$941,345,886 USD in direct costs, \$177,220,390 USD in indirect costs, and \$27,969,427 USD from its complications. This immense cost, coupled with the issues of inequity and access to healthcare in Mexico, where 51% of the cost comes from household income, represents a huge social burden (Arredondo and De Icaza, 2011; Barquera et al., 2013).

Several classes of oral antidiabetic drugs (OADs) are currently available and primarily include agents that stimulate insulin secretion (sulfonylureas), reduce hepatic glucose production (biguanides), delay the digestion and absorption of intestinal carbohydrate (alpha-glucosidase inhibitors), or improve insulin function (thiazolidinediones) (Krentz and Bailey, 2005; Nathan et al., 2009). Additionally, OADs include other classes of drugs such as meglitinides, glucagon-like peptide-1 (GLP-1) agonists, dipeptidylpeptidase-4 (DPP-4) inhibitors, dopamine-2 agonists, and amylin analogs (Inzucchi et al., 2012). There is a wide variability in adverse events and glucose-lowering response to OADs among different patients, which may be attributed to factors like age, sex, and body weight, but also to genetic variation related to pharmacokinetic and pharmacodynamic properties of the OADs (Becker et al., 2013; Emami-Riedmaier et al., 2015).

Biguanides, especially metformin, which is the only one available in some countries, are recommended as the first-choice therapy for T2DM (Inzucchi et al., 2012). Metformin inhibits the activity of mitochondrial respiratory-chain complex I, resulting in decreased ATP synthesis and an accumulation of AMP, leading to the activation of AMP-activated protein kinase (AMPK) and the subsequent suppression of hepatic gluconeogenesis (Foretz et al., 2010). Pharmacokinetic studies suggest that metformin is actively absorbed from the gut and is excreted unchanged in the urine (Zhou et al., 2009). The organic cation transporter 1 (OCT1), encoded by the SLC22A1 gene, is expressed in the basolateral membrane of hepatocytes and mediates metformin uptake, while OCT2 (encoded by SLC22A2), expressed in the basolateral membrane of kidney tubular cells, facilitates almost 80% of metformin excretion (Pearson, 2009; Pernicova and Korbonits, 2014). Associations of intronic variants in SLC22A1 and SLC22A2 with glucose-lowering response to metformin in T2DM patients have been previously reported (Tkac et al., 2013). The SLC22A1 gene is highly polymorphic, with common function-reducing polymorphisms such as Arg61Cys (rs12208357), Gly401Ser (rs34130495), and Gly465Arg (rs34059508), which have been associated with decreased transport and therefore a reduced therapeutic effect of metformin (Distefano and Watanabe, 2010). In vitro studies have shown that all three polymorphisms might be associated with reduced metformin uptake (van Dam et al., 2005). However, in vivo studies show controversial results (Tzvetkov et al., 2009).

Sulfonylureas (SUs) target an ATP-dependent potassium (K-ATP) channel present in pancreatic β-cells. K-ATP channels are hetero-octamers composed of the Kir6.2 pore subunit, encoded by the gene KCNJ11, and the SUR1 receptor subunit, encoded by the gene ABCC8. SUs lower glycemia by enhancing insulin secretion from pancreatic β-cells through the induction of K-ATP channel closure (Tkac, 2015). SUs, such as tolbutamide, glimepiride, and glipizide, are mainly metabolized by the cytochrome P450 enzyme encoded by the CYP2C9 gene. Several SNPs have been related to their insulin secretion-enhancing effect (Holstein et al., 2005). Reduced drug-metabolizing activity has been reported in individuals carrying two allelic variants, namely CYP2C9<sup>∗</sup>2 (rs1799853), leading to the missense amino acid polymorphism Arg144Cys, and CYP2C9<sup>∗</sup>3 (rs1057910), leading to the missense amino acid polymorphism Ile359Leu (Huang and Florez, 2011). The Ile359Leu polymorphism has a more profound effect (Ragia et al., 2014). These alleles encode proteins with diminished enzymatic activity and are correlated with elevated serum levels of SUs (Ragia et al., 2009). However, the CYP2C9-Arg144Cys polymorphism is not associated with diabetes susceptibility (Semiz et al., 2010).

Regarding the target of SUs (K-ATP channels), most studies have investigated two linked, common non-synonymous variants in the ABCC8 and KCNJ11 genes. KCNJ11 variants are implicated in glycemic progression to either prediabetes or T2DM. One of the most common KCNJ11 polymorphisms is Glu23Lys (rs5219). Studies of the functional effects of the Glu23Lys variant on insulin secretion and sensitivity have yielded controversial results, even though recent larger studies demonstrate a significantly reduced insulin secretion, lower insulin levels, and improved insulin sensitivity, consistent with enhanced K-ATP channel activity in pancreatic β-cells (Villareal et al., 2009). More recently, the associations of the Glu23Lys variant and a different KCNJ11 variant, Ile337Val (rs5215), with T2DM have been confirmed in several genome-wide association studies (GWAS), rekindling the interest in its potential role as a genetic marker for T2DM development (Cheung et al., 2011). On the other hand, the ABCC8-Ala1369Ser (rs757110) polymorphism has been associated with a reduction of glycated hemoglobin (HbA1c) in the Chinese population under SU treatment (Feng et al., 2008; Sokolova et al., 2015).

In addition to pharmacogenetic factors, the response to OADs is conditional on different phenotypic or clinical aspects. With cohorts of such T2DM patient information becoming accessible, various statistical approaches can be used to determine the contributing factors affecting response to OADs. Traditional statistical tools measure the co-occurrence of one factor variable and treatment response at a time (Turner et al., 2009; Stransky et al., 2015). However, human trait factors may be interrelated or act together to affect the drug response. Although these tests provide real statistical connections among variables in patient data, these relationships tend to be composed of both strong and weak correlations, making it difficult to disentangle direct effects that explain the influence of some variables over a factor of interest. Therefore, important efforts have been dedicated to the development of statistical models to better describe relationship networks related to human disease. In the field of pharmacogenomics, a variety of statistical models have been built, such as Bayesian networks and elastic net regression (Barretina et al., 2012), which have exhibited great performance in finding genes highly connected to drug response. Recently, a global statistical model, direct coupling analysis (DCA), has also been demonstrated to be applicable to pharmacogenetic data (Jiang et al., 2017). DCA efficiently computes estimates of a joint probability distribution of multivariate patient profiles constructed with clinical data. The parameters of this distribution estimated by DCA are used to quantify, with high success, the degree of connectivity of variables in the model. The ability to disentangle direct couplings from indirect couplings has been successful in the field of structural biology, where directly coupled residue pairs have been used to predict co-evolution of amino acids (dos Santos et al., 2015), to predict the structure of proteins (Sulkowska et al., 2012) with an accuracy not seen before, and to predict molecular plasticity and complexes (Morcos et al., 2013; dos Santos et al., 2015).

Recently, we have used this framework to study protein–protein interactions based on protein expression levels and, in a pharmacogenomics approach, to infer gene–drug interactions in cancer tissues and cell lines where information on drug sensitivity is available (Jiang et al., 2017). This was the first time that direct information (DI) was used as a metric of correlation in high-throughput profiling data. It not only captures the connections between well-known drug response predictors, including some drug targets for certain anti-cancer agents, but also predicts some potential biomarkers and generates gene–drug networks. DCA is used in this study to find highly coupled factors for response to OADs and to construct a network for the patient cohort data. The DI metric is computed to evaluate the association intensity of two variables, including the connections between two potential factors and between factors and drug response.

In addition to genetic variation captured by pharmacogenetic data, the phenotypic traits of patients, such as age, sex, and health status, have been suggested to influence the outcome of OAD treatment for T2DM. Thus, a T2DM patient database including both genetic data and patient phenotypic data is advantageous. This study collected data on 495 T2DM patients, with information about age, origin, sex, body mass index, health status, history of OAD treatment, polymorphisms, and results of glycated hemoglobin (HbA1c) tests. HbA1c is a recognized target for diabetes control used in international guidelines and is the most suitable parameter to be studied in pharmacogenetic studies (Lo et al., 2012). Here, we propose a new structure-learning approach for Bayesian network construction using direct information and Chow-Liu trees. The Chow-Liu algorithm is commonly used to learn Bayesian network structure (Almudevar, 2010), and mutual information is used by this algorithm to estimate the dependence of two variables (Chen et al., 2008). Due to the better performance of DI in measuring direct associations when compared to mutual information, we integrated DI and the Chow-Liu algorithm to recover global connections between clinical factors for T2DM patients.
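The Chow-Liu idea can be sketched in a few lines: score every pair of variables, then keep the maximum-weight spanning tree over those scores. The sketch below uses mutual information as the pairwise score, which is the classical choice; the `weight` parameter marks where a DI-based score would be substituted, but DI itself is not implemented here. All function names, variable names and data are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete variables."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())


def chow_liu_edges(columns, weight=mutual_information):
    """Edges of the maximum-weight spanning tree over pairwise scores.

    `columns` maps a variable name to its list of discrete observations.
    Kruskal's algorithm with a small union-find keeps the graph acyclic.
    """
    names = list(columns)
    scored = sorted(((weight(columns[a], columns[b]), a, b)
                     for a, b in combinations(names, 2)), reverse=True)
    parent = {name: name for name in names}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    edges = []
    for score, a, b in scored:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            edges.append((a, b, score))
    return edges


# Toy data: var_b copies var_a exactly, var_c is independent of both.
toy = {
    "var_a": [0, 0, 1, 1, 0, 0, 1, 1],
    "var_b": [0, 0, 1, 1, 0, 0, 1, 1],
    "var_c": [0, 1, 0, 1, 0, 1, 0, 1],
}
tree = chow_liu_edges(toy)
print(tree)
```

Passing a different callable as `weight` changes only the pairwise score, which is the sense in which a DI-based measure can be combined with the same tree-building step.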

Genetic variations or patient phenotypic data affecting the drug responses to T2DM treatments often lead to the necessity of treatment changes and adjustments, resulting in higher expenses for the patients. The aim of this study was to establish an association between patient clinical data, such as habits, treatment history, polymorphisms, and variability in the response to OAD treatments in a Mexican population. Therefore, biomarkers could help prescribe the right drug and its dosage, for better control of the disease and its consequences, including treatment savings and reduced impact in productivity.

### MATERIALS AND METHODS

#### Design

A cross-sectional and retrospective study with convenience sampling was carried out in T2DM patients treated with OADs, in monotherapy or in combination for at least 6 months, to determine possible association between patient data, gene variants, and drug response assessed by HbA1c values. This study was conducted according to Good Clinical Practice standards and guidelines of the Declarations of Helsinki and Tokyo. Furthermore, the protocol was approved by the Ethics and Research Committee from the Medical School of the Universidad Autonoma de Nuevo León (IRB00005579).

#### Patients

We recruited male and female patients with T2DM from northeastern Mexico who attended the Clinic of Diabetes of the Endocrinology Service at the Dr. José Eleuterio González Hospital in Monterrey, Mexico. The recruitment period lasted 12 months. The inclusion criteria were: patients over 18 years old with T2DM, treated with oral antihyperglycemic agents or OADs, in monotherapy or in combination, for at least 6 months. The exclusion criteria were: diabetes type 1, gestational diabetes, other non-T2DM types of diabetes, active cancer, heart failure, co-treatment with corticosteroids or estrogens, conditions that can cause hyperglycemia, addiction to alcohol or illegal drugs, and dementia or severe psychiatric disorders. The disease status was confirmed using the American Diabetes Association criteria and a physical examination. Blood pressure, body height, and body weight were measured. The body mass index (BMI) was calculated from the anthropometric measurements.

All patients were apprised about the aims of the study, and a written informed consent was obtained. In addition, information on the history of diabetes and the presence of arterial hypertension, hyperlipidemia, and chronic-degenerative diseases, smoking status, and other medications was obtained from the medical records and from the interview for inclusion in the study.

#### Definition of Response

A fasting blood sample was drawn for the determination of HbA1c. HbA1c was measured at least 3 months after drug prescription and determined using Tina-quant® HbA1c Gen. 3 (Cobas-Mira Roche). The approach taken for the treatment of the patients was "treat to target," with treatment failure defined as failure to reach HbA1c levels ≤ 7%. The initial HbA1c of each patient was at least 7%.

#### DNA Isolation

Peripheral blood from patients was collected in a tube with EDTA, and genomic DNA was isolated with the Wizard Genomic DNA Purification Kit (Promega, Madison, WI, United States) according to the manufacturer's instructions. Genomic DNA was quantified by UV absorbance using a Nanodrop (Thermo Scientific, Wilmington, DE, United States). The quality of the DNA was measured with the A260/280 ratio; a value of 1.8–2 was considered of good quality. Samples were kept at −20 °C in small working aliquots until analysis, to avoid repeated cycles of freezing and thawing and thereby minimize degradation.

#### Pharmacogenetic Tests (Genotyping)

A total of 14 single nucleotide polymorphisms (SNPs) distributed in 5 different genes associated with response to anti-diabetic treatments were genotyped on a Real-Time PCR system using validated Genotyping Assays (Applied Biosystems, Foster City, CA, United States) according to the manufacturer's instructions. Two additional polymorphisms in the SLC22A1 gene (Met61Val and Met420Del) were included in the study and analyzed in 50 responder and 50 non-responder patients. These additional polymorphisms were determined by nucleotide sequencing in a Genetic Analyzer 3100 (Applied Biosystems). As a quality control measure, each polymorphism was required to pass three tests for inclusion in subsequent association studies: the genotype call rate (> 0.90 completeness to obtain 99.8% accuracy), the Hardy-Weinberg equilibrium (HWE) test (p-value > 0.05), and the minor allele frequency (MAF) criterion (> 0.01).
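The three quality-control filters can be illustrated with a minimal sketch. It assumes a biallelic SNP summarized by its three genotype counts, and it tests HWE with a one-degree-of-freedom χ<sup>2</sup> goodness-of-fit test, which is a common stand-in and not necessarily the procedure used in this study. The function name `snp_quality_control` and its parameters are hypothetical.

```python
from math import erfc, sqrt

def snp_quality_control(n_aa, n_ab, n_bb, n_samples,
                        min_call_rate=0.90, min_hwe_p=0.05, min_maf=0.01):
    """Apply call-rate, HWE, and MAF filters to one biallelic SNP.

    n_aa, n_ab, n_bb: genotype counts among successfully called samples.
    n_samples: number of samples submitted for genotyping.
    Thresholds default to the values quoted in the text.
    """
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / n_samples

    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n_called)
    q = 1.0 - p
    maf = min(p, q)

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (p * p * n_called, 2 * p * q * n_called, q * q * n_called)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square variable with 1 degree of freedom.
    hwe_p = erfc(sqrt(chi2 / 2.0))

    passed = call_rate > min_call_rate and hwe_p > min_hwe_p and maf > min_maf
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p, "passed": passed}
```

With illustrative counts, `snp_quality_control(25, 50, 25, 100)` passes all three filters, whereas a SNP with no heterozygotes, such as `snp_quality_control(50, 0, 50, 100)`, fails the HWE test.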

#### Analysis of Statistical Significance

Standard descriptive and comparative analyses were performed. Patients were classified by response phenotype using HbA1c, with a cut-off of ≤ 7% for responders and > 7% for non-responders [including first-line therapy (FLT), second-line therapy (SLT), third-line therapy (TLT), monotherapy, and combination therapy]. The HWE was determined by comparing the genotype frequencies with the expected values using the maximum likelihood method. To detect significant differences between two groups, Student's t-test or the Mann–Whitney U-test was used for parametric or non-parametric distributions, respectively. Differences between more than two groups were assessed by one-way ANOVA or the Kruskal–Wallis H-test for parametric or non-parametric distributions, respectively. Post hoc tests (LSD and Tamhane's T2) were used for pairwise comparisons. Possible associations between genotypes and phenotypes were assessed using contingency-table χ<sup>2</sup> statistics and Fisher's exact tests. The association was evaluated under four different models (dominant, over-dominant, recessive, and additive). Odds ratios were estimated with 95% confidence intervals. The aforementioned analyses were performed with SPSS for Windows, V.20 (IBM Corp., Armonk, NY, United States). All p-values were two-tailed. The corrected P (Pc)-values were adjusted by using Bonferroni's correction. A p-value ≤ 0.05 was considered statistically significant.
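As an illustration of the association analysis, the sketch below collapses genotype counts under an over-dominant model (heterozygotes vs. both homozygous groups), estimates an odds ratio with a Wald-type 95% confidence interval, and applies a Bonferroni adjustment. The study itself used SPSS; the function names and the counts in the usage note are hypothetical and serve only to show the arithmetic.

```python
from math import exp, log, sqrt

def overdominant_table(responders, non_responders):
    """Collapse (hom_ref, het, hom_alt) counts into a 2x2 table.

    Returns (a, b, c, d): heterozygous and homozygous responders,
    then heterozygous and homozygous non-responders.
    """
    a = responders[1]
    b = responders[0] + responders[2]
    c = non_responders[1]
    d = non_responders[0] + non_responders[2]
    return a, b, c, d

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald 95% confidence interval on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)
```

For example, with made-up counts `overdominant_table((10, 30, 10), (40, 20, 40))` gives the table `(30, 20, 20, 80)`, for which `odds_ratio_ci` returns an odds ratio of 6.0 with an interval that excludes 1.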

### Computational Modeling: Direct Coupling Analysis

To study the association between diabetes-related SNPs, patient data and antidiabetic drug response, we have developed a metric called DI, which is derived from the inference framework DCA (Morcos et al., 2011). DCA is a statistical method that infers efficiently the parameters of probability distributions with a large set of variables. DCA can be computed efficiently and is able to capture and evaluate direct pairwise correlations among potentially thousands of variable connections. The probability distribution of large sets of data is modeled with the following Boltzmann-like distribution:

$$P(\mathrm{data}) = \frac{1}{Z} \exp\left\{ \sum_{i<j} e_{ij}(x_i, x_j) + \sum_{i} h_i(x_i) \right\}$$

where data represents a profile with L variables indexed by i and j, and Z is a normalization constant. The parameters of this distribution are all possible e<sub>ij</sub> and h<sub>i</sub> for i, j ≤ L; the e<sub>ij</sub> contain information about the pairwise direct connectivity of the variables in the dataset. These parameters are typically hard to calculate exactly, but they can be estimated using DCA. Once the parameters have been estimated, we can use them to compute pairwise probabilities. The following expression shows the form of DI based on the probabilities computed using the parameters e<sub>ij</sub> and h<sub>i</sub>.

$$DI_{ij} = \sum_{x_i, x_j} P_{ij}(x_i, x_j) \log \frac{P_{ij}(x_i, x_j)}{f_i(x_i)\, f_j(x_j)}$$

Here x<sub>i</sub> is the quantized value of clinical variable i in the profile, and f<sub>i</sub> and f<sub>j</sub> are the corresponding single-variable frequencies. The value of DI<sub>ij</sub> tells us how strongly two variables are connected in the distribution.
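The DI expression has the same form as a mutual information evaluated on the pairwise probabilities of the model. A minimal sketch follows, assuming the pairwise table P<sub>ij</sub> and the single-variable frequencies have already been obtained (estimating them with DCA is not shown); the function name `direct_information` is hypothetical.

```python
from math import log

def direct_information(p_ij, f_i, f_j):
    """Evaluate DI_ij = sum P_ij(a, b) * log(P_ij(a, b) / (f_i(a) * f_j(b))).

    p_ij: nested list, p_ij[a][b] is the pairwise probability of states (a, b).
    f_i, f_j: lists of single-variable frequencies for variables i and j.
    Terms with zero probability contribute nothing to the sum.
    """
    di = 0.0
    for a, row in enumerate(p_ij):
        for b, p in enumerate(row):
            if p > 0.0:
                di += p * log(p / (f_i[a] * f_j[b]))
    return di
```

Two independent variables give a DI of zero, whereas two perfectly coupled binary variables with uniform frequencies give log 2, the largest value possible for two states.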

#### Analysis on T2DM Patient Data

The DCA was applied to the complete cohort of data as described in **Figure 1**. Responder phenotypes were classified at an HbA1c cut-off of 7%, as defined before. Patient body indexes, such as weight, height, BMI, age, duration of diabetes, systolic pressure, and diastolic pressure, were classified based on decade spans. To find the influential factors for response to OADs, a matrix containing all patient phenotypic information, the 14 polymorphisms, and the HbA1c test result was generated as the input for the DCA algorithm (Morcos et al., 2011). The T2DM database consists of profiles from 495 patients, including basic information, first-, second-, and third-line therapy information, 14 polymorphisms, health conditions, and the HbA1c test result estimating the glucose-lowering effect of OADs. The patient profile columns also include the 21 OADs separately, representing the usage and doses of a specific OAD for each patient. All profile data are classified and organized in an input matrix for DCA. DI is computed from DCA as a metric of connectivity strength for each pair of variables: the higher the DI value, the stronger the correlation between the two variables. DI has been successfully applied to model molecular interactions in protein folds (Morcos et al., 2011, 2014; dos Santos et al., 2015; Boyd et al., 2016) as well as to identify drug-gene connections in cancer datasets (Jiang et al., 2017). The DI values computed for each variable pair are then used to find a complete network with a minimum spanning tree approach, from which a Bayesian network with undirected edges is built.
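The construction of the discrete input matrix can be sketched as follows. The sketch assumes that "decade spans" means bins of width 10 for every continuous variable and that categorical variables are coded as integers in order of first appearance; both choices, as well as the function names and dictionary keys, are illustrative assumptions rather than the authors' actual preprocessing.

```python
def decade_bin(value, width=10):
    """Map a continuous value to the index of its span, e.g. 56.3 -> 5."""
    return int(value // width)

def build_input_matrix(profiles, continuous_keys, categorical_keys):
    """Turn patient profiles (dicts) into rows of integer class labels.

    Continuous variables are binned with decade_bin; each categorical
    variable gets its own codebook mapping observed states to integers.
    Returns the matrix and the codebooks needed to decode it.
    """
    codebooks = {key: {} for key in categorical_keys}
    matrix = []
    for profile in profiles:
        row = [decade_bin(profile[key]) for key in continuous_keys]
        for key in categorical_keys:
            codes = codebooks[key]
            # Reuse the code of a known state or assign the next free one.
            row.append(codes.setdefault(profile[key], len(codes)))
        matrix.append(row)
    return matrix, codebooks
```

Each row of the resulting matrix is one patient profile, and each column is one variable whose states are small integers, which is the form a DCA implementation for categorical data would typically expect.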

### Predictive Model for OAD Treatment Response

The direct connectivity (e<sub>ij</sub>) estimates the strength of the coupling between two variables at given states. Summing e<sub>ij</sub> over all patient profile factors paired with drug response provides a score to evaluate each patient's glucose-lowering response to OADs given their specific genetic and clinical profile.

When all the e<sub>ij</sub> are summed with j defined as the HbA1c level ≤ 7%, the score represents how likely the patient is to respond to the current OAD treatment based on body indexes, treatment strategy, polymorphisms, and health condition.

$$Score_{Res} = \sum_{i} e_{ij}(x_i, Res)$$

where i denotes a genetic or clinical factor of the patient, and x<sub>i</sub> represents the class to which that factor belongs. Additionally, the score for a patient's non-response to the OAD is calculated based on the e<sub>ij</sub> with j representing the HbA1c level > 7%.

$$Score_{NonRes} = \sum_{i} e_{ij}(x_i, NonRes)$$

The two scores for each patient are compared and the treatment response is predicted based on which score is larger. The leave one out cross-validation is conducted to evaluate the performance of this predictive model.
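The scoring rule and its evaluation can be sketched as below. The sketch assumes the couplings are stored in a dictionary keyed by (factor index, factor class, response state), that couplings not present in the dictionary count as zero, and that ties are assigned to the non-responder class; these conventions, and the user-supplied `fit` callable that would re-estimate the couplings on each training fold, are assumptions made for illustration.

```python
def response_scores(profile, couplings):
    """Return (Score_Res, Score_NonRes) for one patient profile.

    profile: list of class indices x_i, one per factor i.
    couplings: dict mapping (i, x_i, state) to e_ij, state in {"Res", "NonRes"}.
    """
    score_res = sum(couplings.get((i, x, "Res"), 0.0) for i, x in enumerate(profile))
    score_non = sum(couplings.get((i, x, "NonRes"), 0.0) for i, x in enumerate(profile))
    return score_res, score_non

def predict(profile, couplings):
    """Predict the response class with the larger score (ties -> "NonRes")."""
    score_res, score_non = response_scores(profile, couplings)
    return "Res" if score_res > score_non else "NonRes"

def leave_one_out(profiles, labels, fit):
    """Leave-one-out cross-validation; returns the fraction predicted correctly.

    fit(train_profiles, train_labels) must return a couplings dict
    estimated without the held-out patient.
    """
    hits = 0
    for k in range(len(profiles)):
        train_x = profiles[:k] + profiles[k + 1:]
        train_y = labels[:k] + labels[k + 1:]
        couplings = fit(train_x, train_y)
        hits += predict(profiles[k], couplings) == labels[k]
    return hits / len(profiles)
```

Re-estimating the couplings inside the loop matters: if the held-out patient contributed to the parameters used to score that same patient, the reported prediction rate would be optimistic.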

### RESULTS

#### Descriptive Statistics and Phenotype Classification

A total of 495 patients treated with hypoglycemic drugs were included in this study. The subjects were Mexican, mainly from northeastern Mexico. The average age of patients was 56.30 ± 12.16 years for males and 56.41 ± 11.45 years for females. No significant differences were found in age at diagnosis, diabetes duration, or HbA1c values between males and females. However, BMI was significantly higher in females (**Table 1**). The most frequent co-morbidity was hypertension (24.4%), followed by hypertension-dyslipidemia (13.1%), dyslipidemia only (7.5%), hypothyroidism (6.3%), and hypertension-hypothyroidism (4.6%).

The phenotype classification based on HbA1c values (**Table 1**) was significantly different between responders and non-responders (p = 6.29 × 10<sup>−68</sup>). More than half of the patients (353; 71.3%) did not respond to any type of therapy (HbA1c > 7%), whereas the treatment was effective (HbA1c ≤ 7%) in 142 individuals. The average age at diagnosis was significantly lower in non-responders than in responders (p = 4.25 × 10<sup>−4</sup>), whereas diabetes duration was significantly higher (p = 2.5 × 10<sup>−7</sup>). A total of 93 patients (18.8%) responded to FLT; compared to responders to TLT, they showed a higher age at diagnosis (p = 0.025) and a lower diabetes duration (p ≤ 0.049). No other therapy group showed a significant difference.

Data presented as mean ± SD. BMI: body mass index; HbA1c: hemoglobin A1c; MT: monotherapy; CT: combined therapy; FLT: first-line therapy; SLT: second-line therapy; TLT: third-line therapy. p = 0.025 (male vs. female), <sup>1</sup>P = 4.25 × 10<sup>−4</sup> (non-responders vs. responders), ¶p = 2.5 × 10<sup>−7</sup> (non-responders vs. responders), £p = 6.29 × 10<sup>−68</sup> (non-responders vs. responders), <sup>U</sup>p = 1.41 × 10<sup>−4</sup> (non-responders vs. FLT), <sup>Þ</sup>P = 0.025 (FLT vs. TLT), p ≤ 0.049 (FLT vs. non-responders, SLT, and TLT), and §p ≤ 6.84 × 10<sup>−8</sup> (non-responders vs. FLT, SLT, TLT).

The drug most commonly used for the FLT was metformin in monotherapy (46.7%). The second most used drug in FLT was a SU in combination with metformin (34.6%). For SLT and TLT, metformin was also very commonly used (16.7 and 8.0%, respectively). For FLT, SLT, and TLT, the third most common option was SU in monotherapy (9.3, 13.3, and 5.3%, respectively). Insulin was the most common treatment choice in SLT and TLT (55.2 and 69.3%, respectively), although it was the fifth option in FLT (2.2%) (**Table 2**).

#### Pharmacogenetic Findings by Standard Statistical Methods

The polymorphisms M165I and R400C in the SLC22A2 gene were not in HWE. The SNPs G401S and R465G in the SLC22A1 gene, and K432Q in the SLC22A2 gene, had a minor allele frequency (MAF) < 0.01. These five polymorphisms were excluded from subsequent analyses. As a result, a total of 9 SNPs remained for statistical analysis. Two polymorphisms, Ala1369Ser in the ABCC8 gene and Glu23Lys in the KCNJ11 gene, showed a significant impact on response to SUs.

The effect of the ABCC8-Ala1369Ser polymorphism on HbA1c under SU treatment was statistically significant. Heterozygous variant (A/C) carriers had lower HbA1c values compared to homozygous wild-type (A/A) carriers (p = 0.029) and compared to homozygous wild-type and variant (A/A+C/C) carriers (p = 0.012). The genotypes resulting from the KCNJ11-Glu23Lys polymorphism also had a significant impact on HbA1c under SU treatment: the homozygous wild-type and variant (C/C+T/T) carriers had higher HbA1c values (p = 0.039) than heterozygous (C/T) carriers. None of the other 7 polymorphisms tested had a significant impact on clinical parameters (**Table 3**).

The association was evaluated under the genetic models only for the nine polymorphisms that had passed quality control. We found that two of the nine polymorphisms were associated with the responder phenotype. The A/C genotype of ABCC8-Ala1369Ser and the C/T genotype of KCNJ11-Glu23Lys were significantly associated with the responder phenotype under an over-dominant model. This association remained statistically significant after Bonferroni's correction (p < 0.05) (**Table 4**).

### Pharmacogenetic and Clinical Parametric Findings From T2DM Patient Profiles by Direct Coupling Analysis

The DCA finds factor-drug response connections from a global statistical model computed from an estimate of the joint probability distribution of all clinical variables in the study. **Figure 1** shows the classification process that the patient clinical and genetic data undergoes to form the input discrete matrix for DCA algorithm. The outcome is a set of pairs with DI values. To uncover the minimal set of relevant connections between those factors, a Bayesian network is constructed by using the Chow-Liu algorithm as shown in **Figure 1**. However, this study refines Chow-Liu algorithm by replacing the typical use of mutual information with DI from DCA to calculate the Kullback–Leibler distance. This is a novel approach to generate the Bayesian network. Some factors cluster together and are connected showing previously known relationships, such as the connections between weight, height, BMI, and gender. These known associations of factors can be seen as validation of the links found by the algorithm. The time lengths of treatment (first line and second line), age, age of diagnosis, and diabetes diagnosis span are clustered; however, the treatment history for the third line therapy is more likely to be associated with weight.
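The tree-building step can be sketched with Kruskal's algorithm. In the classical Chow-Liu procedure the edge weights are mutual information values and the tree that maximizes their sum is kept; the sketch below simply substitutes DI values as weights, which is equivalent to a minimum spanning tree on negated DI. It is a generic illustration under these assumptions, not the authors' code, and the function name `chow_liu_tree` is hypothetical.

```python
def chow_liu_tree(n_vars, di):
    """Build a maximum-weight spanning tree from pairwise DI values.

    n_vars: number of variables, indexed 0 .. n_vars - 1.
    di: dict mapping pairs (i, j) to their DI value.
    Returns the kept edges as (i, j, weight), strongest first.
    """
    parent = list(range(n_vars))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = []
    # Visit candidate edges from the strongest coupling to the weakest.
    for (i, j), weight in sorted(di.items(), key=lambda kv: kv[1], reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # adding the edge creates no cycle
            parent[root_i] = root_j
            edges.append((i, j, weight))
    return edges
```

For a connected set of n variables the result has n − 1 undirected edges, so weaker links that would only close a cycle between already connected factors are discarded.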

In agreement with the pharmacogenomics finding that KCNJ11 Glu23Lys affects the response to SUs, KCNJ11 Glu23Lys is connected to the response to OADs in general in this network. However, the ABCC8 Ala1369Ser variant is not connected to any drug in this network and is linked to the KCNJ11 Ile337Val variant.

#### TABLE 2 | Scheme for the treatment of T2DM.



TABLE 3 | Association values between gene polymorphisms and clinical parameters.


Data presented as mean ± SD. BMI: body mass index; HbA1c: hemoglobin A1c. <sup>1</sup>P = 0.029 (A/A vs. A/C), ¶p = 0.012 (A/A+C/C vs. A/C), and £p = 0.039 (C/C+T/T vs. C/T).

Polymorphisms in the SLC22A2 gene have been identified and shown to cause inter-patient variability in the pharmacokinetic and pharmacodynamic profile of metformin. Three gene variants, M165I (rs8177507), Ala270Ser (rs316019), and R400C (rs8177516), of the SLC22A2 gene were reported with reduced uptake of OCT2 substrate, whereas a fourth one, K432Q (rs8177517), showed an increased uptake activity compared to the wild-type allele. However, attempts to translate those findings into altered response to metformin of diabetic patients in several populations have not been successful (Meyer zu Schwabedissen et al., 2010). As shown in **Figure 2**, 3 out of 4 polymorphisms in SLC22A2 have connections to metformin in combination with other drugs. The genetic variants of SLC22A2 identified in a Korean population appear to have a significant impact on the disposition of metformin. As expected from the primary distribution of OCT2 in the kidney, the tubular excretion was influenced mainly by the M165I, Ala270Ser, and R400C variants of SLC22A2, leading to an increase in plasma metformin concentrations in subjects with these variants (Song et al., 2008). MET is connected to FLT cluster and SLT cluster, being consistent with the fact that MET is the most commonly used drug in FLT and the second common drug in SLT. Two SU drugs, GLIB and GLIM, are connected together.

OR: odds ratio; CI: confidence interval; Pc: P-values adjusted by using Bonferroni's correction for multiple comparisons; <sup>∗</sup>p ≤ 0.05.

To systematically investigate the connection between the blood glucose-lowering outcome and other factors, we studied the couplings between those factors and the drug response measured by the HbA1c test. In the input matrix, the values in the column for each drug identify its presence or absence in the treatment. The overall ranking of each drug response connection is shown in the heatmap of **Figure 3A**. Treatment time and doses are highly associated with HbA1c results. Age and place of origin appear to be strongly influential. The administration of GLIB or MET in monotherapy is also highly connected to HbA1c results, which partially corresponds with the fact that metformin is the most commonly used treatment for T2DM. Among the body index parameters, weight and BMI have high rankings, which suggests that these two factors are worthy of consideration when predicting treatment outcome. Among the polymorphisms, the highest-ranked is KCNJ11 Glu23Lys, which is observed to be correlated with drug response to SUs in both the statistical significance study and the DI-based Bayesian network.

In order to predict the glucose-lowering efficacy of each OAD and determine a better therapy strategy for a given patient profile, we developed a predictive model based on DI (**Figure 3B**). DI is a metric of direct coupling among variables, but it does not reveal the directionality of the connection. It is possible, however, to use the parameters of the global joint distribution to quantify how a large number of factors account for a possible outcome, i.e., responsive or non-responsive treatment. This additive model uses the e<sub>ij</sub>(x<sub>i</sub>, x<sub>j</sub>) estimates connecting factors to response with the aim of distinguishing between the responder and non-responder groups. We conducted a leave-one-out cross-validation on the dataset of 495 T2DM patients and reached an average prediction rate of 0.70, with a maximum response vs. non-response prediction rate of 0.76.

### DISCUSSION

#### Association Between Gene Polymorphisms and Clinical Parameters

Of the nine pharmacogenetic polymorphisms analyzed to explain the relationship between the genotypes of diabetic patients and their response to different OADs, only two, ABCC8-Ala1369Ser and KCNJ11-Glu23Lys, showed a significant impact on the reduction of HbA1c with SU treatment. None of the other seven polymorphisms tested had a significant impact on clinical parameters. These results confirm the association between the ABCC8-Ala1369Ser polymorphism and the reduction of HbA1c levels reported in the Chinese population under SU treatment (Feng et al., 2008). Nevertheless, studies in Caucasian populations showed no association of KCNJ11-Glu23Lys with HbA1c reduction in response to SUs (Ragia et al., 2012).

The CYP2C9 polymorphisms included in this study, Arg144Cys and Ile359Leu, showed no significant differences in response to SUs, in contrast with studies carried out in Caucasian populations, which described a higher sensitivity to SUs for Ile359Leu and Arg144Cys variant carriers (Becker et al., 2008; Ragia et al., 2014). The KCNJ11-I337 polymorphism showed no evidence of being related to the response to SUs as a study carried out in a Chinese population suggests (Cheung et al., 2011). The SLC22A1 polymorphisms, Arg61Cys and Met61Val, showed no significant evidence of being related to the response to metformin, in contrast with a study carried out in a Caucasian population, which found a significant reduction of HbA1c after 6 months of metformin treatment (Tkac et al., 2013). The SLC22A2 polymorphisms showed no evidence of being related to the response to metformin, contrary to what has been suggested (Avery et al., 2009).

#### Association Between Genotypes and Phenotypes

Only the A/C nucleotide change from polymorphism Ala1369Ser (gene ABCC8) and the C/T nucleotide change from polymorphism Glu23Lys (gene KCNJ11) were significantly associated with the responder phenotype under an over-dominant model. KCNJ11 and ABCC8 encode the subunits KIR6.2 and SUR1, respectively, of the hetero-octameric KATP channel (Emami-Riedmaier et al., 2015). KATP channels regulate membrane K<sup>+</sup> flux in various cell types, including pancreatic β-cells, where increased glucose metabolism results in the closure of the KATP channels, leading to calcium influx and subsequent insulin secretion (Nathan et al., 2009). Notably, the KCNJ11 and ABCC8 genes lie close to each other on chromosome 11, with strong linkage disequilibrium. In a Caucasian population study, Ala1369Ser was correlated with Glu23Lys: for every K allele of the KCNJ11 gene there was an A allele of ABCC8, thus constituting a possible haplotype (Florez et al., 2004). In contrast, several studies and meta-analyses showed the association of KCNJ11, but not of ABCC8 polymorphisms, with susceptibility to type 2 diabetes (van Dam et al., 2005; Gong et al., 2012).

We showed that it is possible to use the patient data in this comprehensive study to generate a model of the global distribution of patient profiles. This model includes phenotypic factors, health conditions, treatment information, and polymorphisms, together with the clinical treatment outcome variable. Although we found agreement between the standard statistical tests and the global pairwise DCA model about how KCNJ11-Glu23Lys affects the efficacy of SU drugs, we also found novel relationships when modeling the dataset with global techniques. We uncovered a network connecting OADs, gene polymorphisms, and patient information. Connections with the HbA1c test and metrics for the association between each pair of variables can better inform how a large set of factors interact during disease progression.

A predictive model for OAD response based on the direct coupling parameters e<sub>ij</sub> is proposed in this study, and its predictive performance has been validated by cross-validation. The overall prediction rate for classifying patients as responders or non-responders can be as high as 0.76. This model has the potential to be used as a guide to modify factors to predict higher response scores. This is a topic of further research that can have applications in personalized therapies. With increasing well-phenotyped cohorts and new methods, such as Next Generation Sequencing and global statistical analyses, the next few years promise a renewed interest in the use of pharmacogenetics to unravel drug and disease mechanisms, as well as the possibility to individualize T2DM therapy by genotype.

#### AUTHOR CONTRIBUTIONS

HB-S and LR-C conceived of the idea of analyzing genetic factors altering the oral antidiabetic drugs response in Mexican type II diabetes patients. LR-C, CL-A, HS-I, and IM-A carried out the real-time PCR experiments and Sanger sequencing experiments. FL-G provided the biospecimens and their clinical data. FM, X-LJ, and RL-C analyzed the data and designed the models and the computational framework. X-LJ and FM developed the predictive model. HB-S was in charge of overall direction and planning. HS-I, FM, X-LJ, LR-C, and DA-T wrote the manuscript. HB-S and FM should be considered as co-corresponding authors.

#### FUNDING

This work was funded by the National Council on Science and Technology (CONACYT) (Grant Nos. 185427, ECO-2015-C01-260826, 294875, and 280114) and funds from the University of Texas at Dallas.

#### ACKNOWLEDGMENTS

We thank Esteban Lopez for the comments that greatly improved the manuscript. We would also like to show our gratitude to Roxana Rivera, Ph.D., and Ricardo Cerda for their comments on earlier versions of the manuscript.

#### REFERENCES

dose and effect of sulfonylurea in type II diabetes mellitus. Clin. Pharmacol. Ther. 83, 288–292. doi: 10.1038/sj.clpt.6100273

antidiabetic efficacy of gliclazide in Chinese type 2 diabetic patients. Diabetes Care 31, 1939–1944. doi: 10.2337/dc07-2248

diabetic patients treated with sulfonylureas. Pharmacogenomics 10, 1781–1787. doi: 10.2217/Pgs.09.96


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Sanchez-Ibarra, Reyes-Cortes, Jiang, Luna-Aguirre, Aguirre-Trevino, Morales-Alvarado, Leon-Cachon, Lavalle-Gonzalez, Morcos and Barrera-Saldaña. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Decoys Selection in Benchmarking Datasets: Overview and Perspectives

Manon Réau†, Florent Langenfeld†, Jean-François Zagury, Nathalie Lagarde and Matthieu Montes\*

Laboratoire GBA, EA4627, Conservatoire National des Arts et Métiers, Paris, France

Virtual Screening (VS) is designed to prospectively help identify potential hits, i.e., compounds capable of interacting with a given target and potentially modulating its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is best adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compound subsets is critical to limit the biases in the evaluation of VS methods. In this review, we focus on the selection of decoy compounds, which has changed considerably over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoy selection in benchmarking databases, as well as current benchmarking databases that tend to minimize the introduction of biases, and then propose recommendations for the selection and the design of benchmarking datasets.

#### Edited by:

Adriano D. Andricopulo, University of São Paulo, Brazil

#### Reviewed by:

Katarina Nikolic, University of Belgrade, Serbia Francesco Ortuso, Magna Græcia University, Italy

#### \*Correspondence:

Matthieu Montes matthieu.montes@cnam.fr

† These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 10 November 2017 Accepted: 05 January 2018 Published: 24 January 2018

#### Citation:

Réau M, Langenfeld F, Zagury J-F, Lagarde N and Montes M (2018) Decoys Selection in Benchmarking Datasets: Overview and Perspectives. Front. Pharmacol. 9:11. doi: 10.3389/fphar.2018.00011

Keywords: virtual screening, benchmarking databases, benchmarking, decoy, structure-based drug design, ligand-based drug design

#### INTRODUCTION

Computer-aided drug design (CADD) is now a commonly integrated tool in drug discovery processes (Sliwoski et al., 2014). It represents a way to predict ligand bioactivity in silico and helps focus drug discovery efforts on a limited number of promising compounds, saving both time and money in this very competitive field. Among these computational methods, Virtual Screening (VS) is designed to prospectively help identify potential hits, i.e., compounds able to interact with the target and to modulate its activity, out of large compound collections (Tanrikulu et al., 2013). VS approaches can be Ligand-Based (LBVS) when they rely only on the structure/properties of known active compounds to retrieve promising molecules from compound collections (using similarity search, QSAR, 2D/3D pharmacophores, etc.), or Structure-Based (SBVS) if the structural information of the target is used (as in molecular docking studies).

The evaluation of VS methods is crucial prior to large library prospective screening to select the appropriate methodology, and subsequently generate reliable outcome on real-life project. Thus, software and workflows must be thoroughly evaluated retrospectively using benchmarking datasets. Such datasets are composed of known active data together with inactive compounds referred to as "decoys" (Irwin, 2008). Ideally, both active and inactive compounds should be selected on the basis of experimental data. However, the documentation on inactive data is scarce, and putative inactive compounds are generally used instead. Among the common metrics used to estimate the performance of VS methods we find receiver operating characteristics (ROC) curves, the area under the ROC curve (ROC AUC) (Triballeau et al., 2005), Enrichment Curves (EC), Enrichment Factors (EF) and predictiveness curves (Empereur-mot et al., 2015). While conceptually different, they all share the same objective: assess the ability of a given method to identify active compounds as such, and discriminate them from the decoy compounds.
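Two of these metrics can be computed directly from a ranked screening output. The sketch below assumes that a higher score means a compound is predicted more likely to be active, computes the ROC AUC as the fraction of active/decoy pairs ranked correctly (ties counted as one half), and defines the enrichment factor at a given fraction as the active rate in the top-ranked subset divided by the active rate in the whole dataset. Function names and example data are hypothetical.

```python
def roc_auc(scores, is_active):
    """ROC AUC as the probability that an active outranks a decoy."""
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    # Count correctly ordered active/decoy pairs; a tie counts as half.
    wins = sum((x > y) + 0.5 * (x == y) for x in actives for y in decoys)
    return wins / (len(actives) * len(decoys))

def enrichment_factor(scores, is_active, fraction):
    """Enrichment factor in the top `fraction` of the ranked dataset."""
    ranked = sorted(zip(scores, is_active), key=lambda t: t[0], reverse=True)
    n_top = max(1, int(round(fraction * len(ranked))))
    hits_top = sum(active for _, active in ranked[:n_top])
    overall_rate = sum(is_active) / len(is_active)
    return (hits_top / n_top) / overall_rate
```

For a toy dataset of 2 actives and 8 decoys in which the actives are ranked first and third, `roc_auc` gives 15/16 = 0.9375 and `enrichment_factor(..., 0.2)` gives 2.5. Both numbers depend on how the decoys were chosen, which is why decoy selection affects any comparison of VS methods.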

However, since the publication of the first benchmarking database in the early 2000s, the composition of both the active and the decoy compound sets has been recognized to crucially impact the evaluation of VS methods; several biases have been shown to skew VS assessment outcomes positively or negatively. The difference between the two chemical spaces defined by the active compounds on the one hand and the decoy compounds on the other hand may lead to an artificial overestimation of the enrichment (Bissantz et al., 2000). Conversely, the possible presence of active compounds in the decoy set may introduce an artificial underestimation of the enrichment (Verdonk et al., 2004; Good and Oprea, 2008), since decoys are usually assumed to be inactive rather than proven to be true inactive compounds (i.e., confirmed inactive through experimental bioassays). New databases were designed to minimize those biases (Rohrer and Baumann, 2009; Vogel et al., 2011; Mysinger et al., 2012; Ibrahim et al., 2015a). Finally, many studies pointed out that VS performance depends on the target and its structural properties (structural flexibility, binding site physicochemical properties, etc.; Cummings et al., 2005). Taking this into consideration, and despite the growing number of protein families represented in databases, decoy dataset generation tools were made publicly available to allow any scientist to fine-tune target-dependent and reliable benchmarking datasets (Mysinger et al., 2012; Ibrahim et al., 2015a).

In this review, we first present how the notion of decoy compounds evolved from randomly selected putative inactive compounds to rationally selected putative inactive compounds and finally to true negative compounds. We describe the successive benchmarking datasets that were published in the literature and their decoy selection workflows, from basic to highly refined, together with the positive or negative biases resulting from their design. We then detail 5 benchmarking databases or decoy set generation tools, along with their decoy selection protocols, that represent the state of the art as of 2017: their respective compositions tend to minimize such biases. Finally, we propose recommendations to select minimally biased benchmarking datasets containing putative inactive compounds as decoy compounds and introduce guidelines to design databases containing true inactive compounds.

### THE HISTORY OF DECOYS SELECTION

#### Randomly Selected Decoys

The first use of a benchmarking database to evaluate virtual screening tools dates back to 2000, with the pioneering work of Bissantz et al. (2000). The objective of their study was to evaluate ligand enrichment, i.e., the ability of docking programs to associate active compounds with the best scores within a compound collection. Three docking programs [Dock (Kuntz et al., 1982), FlexX (Rarey et al., 1996), Gold (Jones et al., 1997)] combined with 7 scoring functions [ChemScore (Eldridge et al., 1997), Dock, FlexX, Fresno (Rognan et al., 1999), Gold, Pmf (Muegge and Martin, 1999), Score (Wang et al., 1998)] were evaluated on two different target proteins: Thymidine Kinase (TK) and the ligand binding domain of the Estrogen Receptor α subtype (ERα).

For each target, a dataset containing 10 known ligands and 990 molecules assumed to be inactive (decoy compounds) was created. The decoy compounds were selected following a two-step scheme: (1) the Available Chemicals Directory (ACD v.2000-1, Molecular Design Limited, San Leandro) was filtered to eliminate undesired compounds (chemical reagents, inorganic compounds and molecules with unsuitable molecular weights), (2) 990 molecules were randomly selected out of the filtered dataset. The datasets were used to evaluate and compare several docking and scoring schemes. The authors eventually recommended a calibration of docking/consensus scoring schemes on reduced data sets prior to large dataset screens. Later on, Bissantz et al. (2003) applied the same protocol to three human GPCRs to investigate whether their homology models were suitable for virtual screening experiments.
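The two-step scheme above (filter, then draw at random) can be sketched as follows. This is an illustration only: the record fields (`is_reagent`, `is_inorganic`, `mw`) and the molecular weight bounds are hypothetical placeholders, not the filters actually applied by the authors.

```python
import random


def select_random_decoys(library, n_decoys, mw_min=150.0, mw_max=600.0, seed=0):
    """Filter a compound library, then draw decoys uniformly at random.

    library: list of dicts with hypothetical keys "is_reagent",
    "is_inorganic" and "mw"; the MW bounds are assumed values.
    """
    # Step 1: remove undesired compounds.
    eligible = [
        mol for mol in library
        if not mol["is_reagent"]
        and not mol["is_inorganic"]
        and mw_min <= mol["mw"] <= mw_max
    ]
    if len(eligible) < n_decoys:
        raise ValueError("not enough eligible compounds after filtering")
    # Step 2: random selection without replacement (seeded for reproducibility).
    return random.Random(seed).sample(eligible, n_decoys)
```

Nothing in this procedure looks at the known ligands, which is why such decoys can differ markedly from the actives in their physicochemical properties.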

A growing interest in virtual screening benchmarking databases soon emerged from the community (Kellenberger et al., 2004; Brozell et al., 2012; Neves et al., 2012; Repasky et al., 2012; Spitzer and Jain, 2012). New databases were designed with increasingly complex decoy selection methodologies (see section Benchmarking Databases). Nowadays, benchmarking databases are widely used to evaluate various VS tools (Kellenberger et al., 2004; Warren et al., 2006; McGaughey et al., 2007; von Korff et al., 2009; Braga and Andrade, 2013; Ibrahim et al., 2015a; Pei et al., 2015) and to support the identification of hit/lead compounds using LBVS and SBVS (Allen et al., 2015; Ruggeri et al., 2015).

#### Integration of Physicochemical Filters to the Decoy Compounds Selection

In the early 2000s, Diller's group incorporated filters in the decoy selection to ensure that the discrimination they observed was not solely based on the size of the decoy compounds (Diller and Li, 2003). In addition to the 1,000 kinase inhibitors they retrieved from the literature for 6 kinases (EGFr, VEGFr1, PDGFrβ, FGFr1, SRC, and p38), 32,000 compounds were randomly selected from a filtered version of the MDL Drug Data Report (MDDR). The filters were designed to select decoy compounds displaying similar polarity and molecular weight. Similarly, in 2003, a benchmarking database derived from the MDDR was constructed by McGovern and Shoichet (2003). Compounds with unwanted functional groups were removed, leading to 95,000 compounds. The targets of the MDDR for which at least 20 known ligands were available constituted a target dataset (CA II, MMP-3, NEP, PDF, and XO). The remaining compounds were used as decoy compounds. The addition of rational filters was a considerable step forward in the improvement of decoy selection, but the use of the MDDR was limited by its commercial licensing (http://www.akosgmbh.de/accelrys/databases/mddr.htm<sup>1</sup>).

The first benchmarking databases were composed as follows: (1) true active compounds consisted of known ligands extracted from the literature, while (2) decoy compounds consisted of putative inactive compounds randomly selected from large databases, possibly filtered to comply with specific criteria (drug likeness, molecular weight, topological polar surface area, etc.). Since the decoy compounds were pseudo-randomly selected, they were assumed to be inactive on the defined targets.

Despite the use of the MDDR and the filtering of the decoy compounds, these benchmarking databases displayed a major drawback: the significant differences between the physicochemical properties of the active compounds and those of the decoy compounds led to an obvious discrimination and thus to artificially good enrichments (Verdonk et al., 2004; Huang et al., 2006).

In 2006, Irwin et al. proposed that the decoy compounds should be similar to the known ligands in terms of physicochemical properties, to reduce the introduction of bias, while being structurally dissimilar to the known ligands, to reduce their probability of being active on the defined target. Following these recommendations, they created the DUD database (Huang et al., 2006), which was immediately considered the gold standard for the evaluation of VS methods.

The DUD database is composed of 2,950 ligands and 95,326 decoys for a total of 40 proteins from 6 classes (nuclear hormone receptors, kinases, serine proteases, metalloenzymes, folate enzymes and others). The decoy compounds were extracted from the drug-like subset of the ZINC database (Irwin and Shoichet, 2005). The 2D similarity between known ligands and candidate decoys was computed as a Tanimoto coefficient based on the CACTVS type 2 substructure keys, and 5 physicochemical properties were used for property matching. For each active compound, the 36 molecules sharing the most similar properties while being topologically dissimilar (Tanimoto coefficient < 0.9) were retained. The evaluation of the performance of DOCK (Meng et al., 1992; Wei et al., 2002; Lorber and Shoichet, 2005; Huang et al., 2006) confirmed that uncorrected databases such as the MDDR led to over-optimistic enrichments compared to corrected databases such as the DUD.
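The selection logic described for the DUD (topological dissimilarity filter, then ranking by property similarity) can be sketched as below. This is a simplified illustration, not the published implementation: fingerprints are represented as plain sets of on-bit indices, the scaled absolute difference used as a property distance is an assumption, and the dissimilarity filter is applied against every known ligand.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def property_distance(props_a, props_b, scales):
    """Sum of scaled absolute differences (an assumed, illustrative metric)."""
    return sum(abs(a - b) / s for a, b, s in zip(props_a, props_b, scales))


def select_decoys(ligand, candidates, all_ligands, scales, n_decoys=36, max_tc=0.9):
    """Keep topologically dissimilar candidates, then take the closest in properties."""
    dissimilar = [
        c for c in candidates
        if all(tanimoto(c["fp"], lig["fp"]) < max_tc for lig in all_ligands)
    ]
    dissimilar.sort(key=lambda c: property_distance(c["props"], ligand["props"], scales))
    return dissimilar[:n_decoys]
```

In practice the fingerprints and properties would come from a cheminformatics toolkit; the `scales` argument simply puts properties with different units (e.g., molecular weight and logP) on a comparable footing.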

#### Benchmarking Database Biases

Despite the precautions taken to build the DUD database, several remaining biases have been reported in the literature.

The "analogous bias" (Good and Oprea, 2008) lies in the limited chemical space of active compounds, which is restricted to the chemical series that have been explored and referenced in databases. Discriminating the active compounds from the decoy compounds is then easier, since the decoy sets display a larger structural variability, which can lead to an overestimation of the performance of VS methods. The lack of diversity in the structures of known active compounds also limits the training and evaluation of LBVS methods for scaffold hopping, i.e., the identification of active hit compounds that structurally differ from reference molecules while retaining similar activity.

The "complexity bias" (Stumpfe and Bajorath, 2011), or "artificial enrichment bias", arises because active compounds and decoy compounds often differ in their structural complexity: active compounds are often optimized compounds extracted from large series in the scientific and patent literature, which is not necessarily the case for pseudo-randomly selected decoy compounds.

The "false negative bias" (Vogel et al., 2011; Bauer et al., 2013) lies in the presence of active compounds in the decoy set. Unlike the analogous and complexity biases, it induces an underestimation of the performance of the VS methods that could be particularly dramatic for the evaluation of LBVS methods (Irwin, 2008).

The need for less biased benchmarking databases to objectively evaluate VS methods favored the emergence of new strategies to eradicate or at least minimize those biases. Two decoy selection strategies arose from these attempts to improve benchmarking databases: (1) the use of highly refined decoy selection strategies and (2) the integration of true negative compounds in the decoy set.

#### Highly Refined Putative Inactive Compounds Selection

The reported biases showed that the composition of both the active and the decoy compound sets has a major impact on the evaluation of the performance of VS methods (Verdonk et al., 2004; Good and Oprea, 2008). Therefore, particular effort was devoted to the selection strategies for both active and decoy compounds.

To address analogous bias, one strategy consists in modifying the receiver operating characteristics (ROC) curves (i.e., the fraction of actives among the top fraction x of the data set) (Triballeau et al., 2005) by weighting the rank of each active compound by the size of its corresponding lead series (Clark and Webster-Clark, 2008). This allows an equal contribution of each active chemotype (rather than each active compound) to the ROC curve. Another widely used method is to fine-tune the active compounds dataset prior to screening to ensure an intrinsic structural diversity. To this aim, the MUV datasets (Rohrer and Baumann, 2009) were designed using the Kennard-Stone algorithm to obtain an optimal spread of the active compounds in the decoy compounds chemical space while ensuring a balance between the self-similarity of the active compounds and their separation from the decoy compounds. Despite these developments, the most used strategy in the literature still consists in clustering ligands based on 2D descriptors and retaining only cluster representatives in the final dataset (Good and Oprea, 2008; Mysinger et al., 2012; Bauer et al., 2013).
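The idea of letting each chemotype, rather than each compound, contribute equally can be illustrated with a weighted ROC AUC in which every active is weighted by the inverse of the size of its cluster. This is a sketch in the spirit of the weighting described above; the exact published formulation may differ, and the cluster labels are assumed to be given.

```python
from collections import Counter


def chemotype_weighted_auc(active_scores, active_clusters, decoy_scores):
    """ROC AUC in which each active cluster (chemotype) has the same total weight.

    Assumption: higher score = better rank; each active gets weight
    1 / (size of its cluster), so a large series counts as one chemotype.
    """
    sizes = Counter(active_clusters)
    total = 0.0
    for score, cluster in zip(active_scores, active_clusters):
        # Fraction of decoys ranked below this active (ties count one half).
        beaten = sum((score > d) + 0.5 * (score == d) for d in decoy_scores)
        total += (beaten / len(decoy_scores)) / sizes[cluster]
    return total / len(sizes)
```

For example, if three analogs of the same series rank above all decoys while the single representative of a second series ranks below them, the unweighted AUC is 0.75 but the weighted AUC is 0.5, since only one of the two chemotypes is actually retrieved.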

To reduce artificial enrichment, efforts were made to match the physicochemical properties of the decoys as closely as possible to those of the active compounds. To this aim, the Maximum Unbiased Validation database (MUV) (Rohrer and Baumann, 2009) was designed to ensure the embedding of active compounds in the decoy compounds chemical space, based on an embedding confidence distance cutoff calibrated on the chemical space of multiple drug-like compound banks. Active compounds that were poorly embedded in the decoy set were discarded. A way to ensure the availability of potential decoy compounds for any ligand is to generate decoys that ignore synthetic feasibility (Wallach and Lilien, 2011). Other databases select decoys that match active compounds in a multiple physicochemical properties space. The DEKOIS 2.0 (Ibrahim et al., 2015a) proposed a workflow that used 8 physicochemical properties, while the DUD-E added net charge to the 5 physicochemical properties already considered in the original DUD.

<sup>1</sup>MDDR licensed by Molecular Design, Ltd., San Leandro, CA.

To address the risk of including false negatives in the decoy set, a common strategy is to select decoy compounds that are topologically different from any active compound. For this purpose, Vogel et al. introduced the LADS score to guide decoy selection (Vogel et al., 2011). In the DUD-E, potential false decoys are avoided by applying a stringent FCFP\_6 fingerprints Tanimoto-based filter. It is important to note that the evaluation of LBVS methods requires decoy compounds that cannot be discriminated using basic 2D-based similarity tools; the use of 2D-based dissimilarity filters to avoid false negatives in the decoy set therefore makes the databases concerned inappropriate for the evaluation of the performance of LBVS methods. For this reason, Xia et al. developed a method to select adequate decoys for both SBVS and LBVS (Xia et al., 2014) by favoring physicochemical similarity as well as topological similarity between active compounds and decoy compounds that passed a primary topological dissimilarity filter.

With these improvements, the notion of decoys remained the same (putative inactive compounds), but their selection critically evolved. Since then, the main progress achieved in the literature lies in the diversification of the protein targets represented in benchmarking databases. The growing need for datasets dedicated to a given target led to (1) an increasing diversity of targets in benchmarking databases [the DUD-E (Mysinger et al., 2012) contains datasets for 102 targets while the previous DUD (Huang et al., 2006) contained datasets for only 40 targets] and (2) highly specialized benchmarking databases focused on a particular class of targets. Such specialized datasets exist for GPCRs [GPCR ligand library (GLL)/Decoy Database (GDD) (Gatica and Cavasotto, 2012)], histone deacetylases [maximal unbiased benchmarking data sets for HDACs, MUBD-HDACs (Xia et al., 2015)], or nuclear receptors [NRLiSt BDB (Lagarde et al., 2014a)]. Of note, DUD-E and DecoyFinder (Cereto-Massagué et al., 2012) offer automated decoy set generation tools based on the properties of active compounds, enabling the community to easily design and tune their own dataset for a particular target.

#### Toward True Negative Compounds

A common issue with decoys is the lack of data regarding their potential bioactivity against the target. Most methods assume that the absence of data means an absence of activity, which may lead to the inclusion of unknown active ligands in a decoy set. To eliminate such false negatives from decoy sets, one solution is to use referenced true negative compounds, which can be either true inactive compounds or compounds displaying an undesirable activity.

True inactive compounds, i.e., compounds that displayed no experimental binding affinity against the target of interest, can be used to identify binders. Inactive data are made available in several public activity and/or affinity annotated compound repositories and high throughput screening (HTS) initiatives such as: ChEMBL (Bento et al., 2014); Drugbank (Wishart et al., 2008), which provides annotations for approved drugs; PDBBind (Wang et al., 2004, 2005), Binding MOAD (Benson et al., 2008) and AffinDB (Block et al., 2006), which contain binding affinity data for protein–ligand complexes available in the Protein Data Bank (PDB) (Berman et al., 2000); the PDSP Ki database (Roth et al., 2000), which stores screening data from the National Institute of Mental Health's Psychoactive Drug Screening Program; BRENDA (Placzek et al., 2017), which provides binding constants for enzymes; IUPHAR (Southan et al., 2016), which contains binding information for receptors and ion channels; GLIDA (Okuno et al., 2006) and GPCRDB (Munk et al., 2016), which contain binding data for G-protein-coupled receptors; the D3R datasets (Drug Design Data Resource<sup>2</sup>), which have been provided by pharmaceutical companies and academia and contain affinity data for 7 proteins together with inactive compounds; and ToxCast™/Tox21 (Kavlock et al., 2012) and PCBioAssay (Wang et al., 2017), which provide HTS data for various targets.

As an example, the DUD-Enhanced (DUD-E) (Mysinger et al., 2012) integrates some experimentally validated inactive compounds extracted from ChEMBL in the decoy set in addition to putative inactive compounds: an arbitrary 1 µM cutoff is used to classify ligands in the active set, while molecules with no measurable activity at a concentration of 30 µM or higher are classified into the decoy set. Similarly, the Maximum Unbiased Validation (MUV) (Rohrer and Baumann, 2009) datasets are composed of both active and inactive compounds collected from the PubChem BioAssay annotated database.
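The two cutoffs quoted above translate into a simple classification rule. The sketch below is only an illustration of that rule: the argument names and the "unassigned" category for compounds falling between the two thresholds are assumptions, not part of the DUD-E protocol as described here.

```python
def classify_record(potency_um=None, highest_inactive_conc_um=None,
                    active_cutoff_um=1.0, inactive_cutoff_um=30.0):
    """Assign a bioactivity record to the active set or the experimental decoy set.

    potency_um: measured potency (e.g., Ki or IC50) in micromolar, or None
        if no activity could be measured.
    highest_inactive_conc_um: highest concentration (micromolar) at which
        no activity was observed, or None if unknown.
    """
    if potency_um is not None and potency_um <= active_cutoff_um:
        return "active"
    if (potency_um is None
            and highest_inactive_conc_um is not None
            and highest_inactive_conc_um >= inactive_cutoff_um):
        return "experimental_decoy"
    # Weak binders and poorly characterized compounds fit neither set.
    return "unassigned"
```

Leaving a gap between the two cutoffs keeps weak or ambiguous binders out of both sets, which limits the risk of mislabeled compounds on either side.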

Unwanted compounds, i.e., compounds that display unwanted activity or binding, can also be used as negatives. For instance, a recent study used ligands of the NRLiSt BDB (Lagarde et al., 2014a) either as active compounds or decoy compounds, depending on their activity for each nuclear receptor; antagonist (or agonist) ligands of a given nuclear receptor were used as decoys to evaluate agonistic (or antagonistic) pharmacophores (Lagarde et al., 2016, 2017). This strategy has shown successful results in the past: Guasch et al. (2012) focused on PPARγ partial agonists to avoid the side effects accompanying full receptor activation and built an anti-pharmacophore model with known full agonist compounds to remove all potential full agonist compounds from their initial set of 89,165 natural products and natural product derivatives. The authors screened the remaining compounds on a partial agonist pharmacophore model and identified 135 compounds as potential PPARγ partial agonists with good ADME properties, including 8 compounds with new chemical scaffolds for PPARγ partial agonistic activity. After biological tests, 5 compounds were confirmed to be PPARγ partial agonists.

<sup>2</sup>Available at: drugdesigndata.org

### SELECTED DATABASES

### Maximum Unbiased Validation (MUV)

The MUV was designed to provide datasets that are unbiased with regard to both artificial enrichment and analogous bias, using a new approach borrowed from spatial statistics (Rohrer and Baumann, 2009). The authors ensured homogeneity in actives-actives similarity and actives-decoys dispersion in order to reach a random-like distribution of active compounds and decoy compounds in a chemical space defined by physicochemical descriptors. This implies that the molecular properties contain no information about the bioactivities of active and decoy compounds. Datasets were designed for 18 targets with a total of 30 actives and 15,000 decoys for each target.

#### Initial Compounds Database

Potential active and decoy compounds were extracted from HTS experiments available in PCBioAssay (June 2008) (PubChem BioAssay<sup>3</sup>). In these assays, a primary screen was performed on a large number of compounds (>50,000) and was followed by a low throughput confirmatory screen. Compounds with an experimental EC50 in the confirmatory screen were selected as potential active compounds while inactive compounds from the primary screen were selected as potential decoys.

#### Actives Selection

A two-step process was applied to rationally select the final active compounds for the MUV data sets. (1) Potential active compounds were filtered to eliminate artifacts caused by the aggregation of organic chemicals in aqueous buffers ("Hill slope filter"), as well as off-target effects, cytotoxic effects or interference with optical detection methods ["frequency of hits filter" and "autofluorescence (Simeonov et al., 2008) and luciferase inhibition (Auld et al., 2008) filters"]. (2) A "chemical space embedding filter" was applied to ensure that actives located in regions of the chemical space devoid of decoys were eliminated from the dataset (**Figure 1**). Subsets of 30 actives with the maximum spread per target were generated using a Kennard-Stone algorithm. Selected active compounds were exchanged with remaining potential active compounds until all datasets were adjusted to a common level of spread.
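A maximum-spread subset selection of the kind mentioned above can be sketched with a greedy max-min procedure (the Kennard-Stone scheme): start from the two most distant points, then repeatedly add the point farthest from those already selected. The code below is a generic illustration on descriptor vectors using Euclidean distance; it is not the MUV implementation and omits the subsequent exchange step used to equalize the spread across datasets.

```python
import math


def max_spread_subset(points, k):
    """Greedy max-min selection of k points (requires 2 <= k <= len(points)).

    points: list of equal-length numeric tuples (descriptor vectors).
    Returns the indices of the selected points, in selection order.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        # Add the point whose nearest selected neighbor is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
    return selected
```

On the one-dimensional points 0, 1, 2, 9, 10 with k = 3, the procedure picks the two extremes first and then the point at 2, which lies farthest from both.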

#### Decoys Selection

To carefully match the physicochemical properties of actives and decoys, Rohrer and Baumann proposed that the level of self-similarity within the active compounds set [measured using the "nearest neighbor function" G(t)] should be equal to the degree of separation between the active compounds set and the decoy compounds set [evaluated with the "empty space function" F(t)] (**Figure 1**). Following this guideline, the data clumping should be null, ensuring a random-like distribution of decoy and active compounds in the overall chemical space. The distances were computed based on 1D molecular properties (counts of all atoms, heavy atoms, boron, bromine, carbon, chlorine, fluorine, iodine, nitrogen, oxygen, phosphorus, and sulfur atoms in each molecule as well as the number of H-bond acceptors, H-bond donors, the logP, the number of chiral centers, and the number of ring systems). The level of separation between the decoy compounds and the active compounds was adjusted to the same level of spread so that the data clumping is null. In total, 500 decoys were selected per selected active, resulting in 15,000 decoys per dataset.
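The two spatial statistics functions can be written down compactly. The sketch below follows their usual definitions: G(t) is the fraction of actives whose nearest other active lies within distance t, and F(t) is the fraction of decoys whose nearest active lies within distance t. The use of Euclidean distance on raw descriptor tuples and the sign convention of the summed difference (F minus G, so that negative values indicate clumped actives) are assumptions made for illustration.

```python
import math


def nearest_neighbor_function(actives, t):
    """G(t): fraction of actives with another active within distance t.

    Requires at least two actives.
    """
    count = 0
    for i, a in enumerate(actives):
        nearest = min(math.dist(a, b) for j, b in enumerate(actives) if j != i)
        count += nearest <= t
    return count / len(actives)


def empty_space_function(actives, decoys, t):
    """F(t): fraction of decoys with at least one active within distance t."""
    count = sum(min(math.dist(d, a) for a in actives) <= t for d in decoys)
    return count / len(decoys)


def clumping(actives, decoys, thresholds):
    """Sum of F(t) - G(t) over thresholds (assumed sign convention).

    Values near zero suggest a random-like arrangement; negative values
    suggest actives that are closer to each other than to the decoys.
    """
    return sum(
        empty_space_function(actives, decoys, t) - nearest_neighbor_function(actives, t)
        for t in thresholds
    )
```

Two nearly identical actives placed far from all decoys give G(t) = 1 and F(t) = 0 at small t, hence a strongly negative value, which is the situation the MUV design tries to avoid.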

The minimization of analog bias and artificial enrichment makes the MUV datasets suitable for LBVS. The availability of structures in the PDB (2008) for seven of the MUV targets makes them suitable for SBVS as well (Löwer et al., 2011). Thus, the MUV constituted the first dataset that enabled comparative evaluations of SBVS and LBVS methods and protocols.

### Demanding Evaluation Kits for Objective in Silico Screening (DEKOIS)

In 2011, Vogel et al. proposed a new generator of decoy compounds sets called Demanding Evaluation Kits for Objective In Silico Screening (DEKOIS) (Vogel et al., 2011). The authors designed their tool to avoid the introduction of well-known and described biases into the decoy sets, i.e., analog bias and artificial enrichment. The first step in their workflow is to closely match the physicochemical properties of ligands and decoys to limit the analog bias. Then, to deal with the risk of including false negative compounds in the decoy compounds set, a new concept is applied to the decoy selection process: the latent actives in the decoy set (LADS). Finally, the structural diversity of the active and decoy compounds in the sets is evaluated and maximized, and the embedding of the actives in the decoys chemical space is assessed. The whole workflow was further improved in 2013 to produce the current version of this tool, DEKOIS 2.0 (Bauer et al., 2013), and 81 ready-to-use (active and decoys) benchmarking datasets for 11 target classes are currently available through the DEKOIS website (www.dekois.com/, accessed 10/23/2017).

#### Initial Compounds Database

Decoy compounds from the DEKOIS 2.0 benchmarking datasets are selected from a subset of the ZINC database of 15 million molecules. Eight physicochemical properties are evaluated: molecular weight, octanol–water partition coefficient, number of hydrogen bond acceptors, number of hydrogen bond donors, number of rotatable bonds, positive charge, negative charge, and number of aromatic rings. For each physicochemical property, bins are defined, and all possible combinations of bins are used to split the database compounds into cells. The initial bins are defined so that each bin is equally populated, and each final cell is characterized by a set of 8 physicochemical properties. Each user-provided active compound is associated with the closest cell (in terms of physicochemical properties), and 1,500 decoys are randomly preselected from this parent cell, or from the direct neighbor cells if the parent cell is not populated enough to provide 1,500 decoy compounds (**Figure 1**).
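The binning step can be pictured with a small sketch: equally populated bins are obtained from the quantiles of each property, and a compound's cell is the tuple of its bin indices. The functions below are a simplified illustration with hypothetical names; they assign an active to the cell it falls into and leave out the fallback to neighbor cells and the random preselection of 1,500 decoys.

```python
from bisect import bisect_right
from collections import defaultdict


def quantile_edges(values, n_bins):
    """Inner bin edges chosen so that bins are roughly equally populated."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def cell_of(props, edges_per_property):
    """Cell index: one bin index per physicochemical property."""
    return tuple(bisect_right(edges, p) for p, edges in zip(props, edges_per_property))


def build_cells(library_props, n_bins):
    """Split a library (list of property tuples) into cells.

    Returns the bin edges and a mapping from cell to library indices.
    """
    columns = list(zip(*library_props))
    edges = [quantile_edges(column, n_bins) for column in columns]
    cells = defaultdict(list)
    for index, props in enumerate(library_props):
        cells[cell_of(props, edges)].append(index)
    return edges, cells
```

With 8 properties, even a handful of bins per property yields a very large number of cells, which is why sparsely populated parent cells and the fallback to neighboring cells have to be handled in a full workflow.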

#### Decoys Selection

The two criteria for the refinement steps are the structural diversity and a low rate of latent actives in the decoy set (LADS). A physicochemical similarity score (PSS) and a LADS score are computed, normalized and combined to select the final 30 decoys associated with each active ligand. The LADS score is defined as:

$$\text{LADS score} = \frac{\sum_{i=1}^{n} \left( N_{i} \cdot f_{i} \right)}{N},$$

with n the number of FCFP\_6 fingerprint bit strings shared by the decoy and the active set, f<sub>i</sub> the frequency of fragment i in the active set, N<sub>i</sub> the number of heavy atoms in fragment i, and N the total number of FCFP\_6 fragments in the decoy.

<sup>3</sup>Available online at: http://pubchem.ncbi.nlm.nih.gov/sources#assay

The weighting of the LADS score by the frequency of the bit string and the size of the corresponding fragment was added in the second version of DEKOIS (Bauer et al., 2013) to ensure that large bioactive substructures and substructures frequently found exert a greater influence on LADS score compared to smaller and rare functional groups.

(3) The LADS and PSS scores are normalized and combined into a consensus score to sort decoy compounds. The 100 best-scoring decoys are then selected. Finally, the fingerprints are used to select the 30 most dissimilar decoys for each active.
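As an illustration of the LADS score defined above, the following sketch computes it from fragment sets. It assumes that each molecule is described by a set of fragment identifiers (standing in for FCFP\_6 bit strings), that the frequency of a fragment is the fraction of actives containing it, and that heavy atom counts per fragment are available; all names are hypothetical.

```python
def lads_score(decoy_fragments, active_fragment_sets, heavy_atoms):
    """LADS score of one decoy against a set of actives.

    decoy_fragments: set of fragment identifiers present in the decoy.
    active_fragment_sets: list of fragment sets, one per active compound.
    heavy_atoms: dict mapping fragment identifier -> number of heavy atoms.
    Assumption: f_i is the fraction of actives that contain fragment i.
    """
    n_actives = len(active_fragment_sets)
    total = 0.0
    for fragment in decoy_fragments:
        frequency = sum(fragment in s for s in active_fragment_sets) / n_actives
        # Fragments absent from every active have zero frequency and add nothing.
        total += heavy_atoms[fragment] * frequency
    return total / len(decoy_fragments)
```

A decoy sharing large, frequently occurring fragments with the actives obtains a high score and is therefore more likely to be a latent active; decoys with low scores are preferred.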

Using this enhanced protocol, Bauer et al. showed an improvement of the "deviation from optimal embedding score" (DOE score) (Vogel et al., 2011; Bauer et al., 2013) for DEKOIS 2.0 compared to DEKOIS, and found a good (<0.2) DOE score for 89% of the 81 targets considered.

### DUD-Enhanced (DUD-E)

Despite the extensive use of the DUD, several studies pointed out that some scaffolds were over-represented in the active sets, that the charge was not considered in property-matching for ligand selection, and that true ligands could be found in the decoy sets (Good and Oprea, 2008; Hawkins et al., 2008; Irwin, 2008; Mysinger and Shoichet, 2010). Shoichet et al. proposed the DUD-E (DUD-Enhanced) to address these weaknesses in both the active and the decoy sets design in the DUD, and extended the number of represented protein families in the database. The DUD-E contains 102 proteins that span diverse target classes. To address analogous bias, ligands were clustered by their Bemis-Murcko atomic frameworks (Bemis and Murcko, 1996) (**Figure 2**), and a topological dissimilarity filter was applied to avoid active compounds in the decoy sets.

#### Initial Compounds Database

Active compounds assigned to each target of the DUD-E were collected from the ChEMBL09 database if their activity/affinity (Ki, Kd, IC50, EC50, or the associated logarithmic values) was ≤1 µM (Gaulton et al., 2012). Additionally, 9,219 experimental decoys displaying no measurable affinity up to 30 µM were included in the decoy sets.

#### Active Set Preparation

Active compounds were clustered based on their Bemis-Murcko atomic frameworks. When more than 100 frameworks were represented, the highest-affinity ligand from each cluster was retained, while when fewer than 100 frameworks were represented, the number of ligands retained per cluster was raised to obtain more than 100 molecules. Even though this selection protocol could have been optimized for sets with low framework diversity, it ensures sufficient diversity and quantity of compounds for the other sets.

#### Decoys Selection

The decoy compounds were extracted from the ZINC database (Irwin and Shoichet, 2005) and selected by narrowing or widening windows around 6 physicochemical properties: molecular weight, octanol-water partition coefficient, number of rotatable bonds, number of hydrogen bond acceptors, number of hydrogen bond donors, and net charge. To avoid active compounds in the decoy sets, a topological dissimilarity filter was applied. Molecules were sorted according to their Tanimoto distance to any ligand using CACTVS fingerprints, and the 25% most dissimilar decoy molecules were retained. Finally, up to 50 decoys were randomly selected for each ligand and pooled with the 9,219 experimental decoys.
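The three steps of this protocol (adaptive property windows, dissimilarity filter, random draw) can be sketched as follows. This is a simplified illustration under stated assumptions: only window widening is shown, the growth factor and number of rounds are arbitrary, fingerprints are plain sets of on-bits, and none of the function names come from the DUD-E tool.

```python
import random


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def match_with_adaptive_window(ligand_props, candidates, base_windows,
                               target=50, max_rounds=5, growth=1.5):
    """Widen the property windows until enough candidates match.

    Assumes max_rounds >= 1. Returns the matches and the windows used.
    """
    windows = list(base_windows)
    for round_no in range(max_rounds):
        matched = [
            c for c in candidates
            if all(abs(p - q) <= w for p, q, w in zip(c["props"], ligand_props, windows))
        ]
        if len(matched) >= target or round_no == max_rounds - 1:
            return matched, windows
        windows = [w * growth for w in windows]


def most_dissimilar_fraction(candidates, ligand_fps, fraction=0.25):
    """Keep the fraction of candidates least similar to their closest ligand."""
    def max_similarity(candidate):
        return max(tanimoto(candidate["fp"], fp) for fp in ligand_fps)
    ranked = sorted(candidates, key=max_similarity)
    return ranked[:max(1, int(len(ranked) * fraction))]


def draw_decoys(pool, max_decoys=50, seed=0):
    """Randomly draw up to max_decoys decoys from the filtered pool."""
    return random.Random(seed).sample(pool, min(max_decoys, len(pool)))
```

Chaining the three functions for each ligand gives a property-matched, topologically dissimilar decoy set; experimentally confirmed inactives would then be pooled in separately.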

An automated tool was made available online to generate decoys from user-supplied ligands using the same protocol (http://decoys.docking.org). The possibility to generate decoy sets for any target has proved successful, and the tool is now widely used by the scientific community (Lacroix et al., 2016; Nunes et al., 2016; Allen et al., 2017; Meirson et al., 2017).

Despite the success of the DUD-E, some weaknesses remain in this benchmarking database. The 102 targets are defined by a UniProt gene prefix (such as DRD3) and not by a full gene\_species identifier (such as DRD3\_HUMAN or P35462), which can bias the selection of actives when the binding site composition differs between species. Additionally, only one structure was considered for each protein, while many docking studies pointed out that the structure selection is crucial for screening and docking, particularly for proteins that accommodate ligands with different binding modes (May and Zacharias, 2005; Ben Nasr et al., 2013; Lionta et al., 2014). A recent study has shown that the ligand pharmacological profile should be considered for both the active set design and the structure selection (Lagarde et al., 2017). For instance, nuclear receptors (NR) can be inhibited by antagonists or activated by agonists that differ in their structure and properties: agonists should be considered in the active set if the screening is performed on an agonist-bound structure, while antagonists should be used in the active set if the screening is performed on an antagonist-bound structure.

### Nuclear Receptors Ligands and Structures Benchmarking Database (NRLiSt BDB)

The NRLiSt BDB (Nuclear Receptors Ligands and Structures Benchmarking DataBase) was created to address the lack of annotation information and pharmacological profile consideration in existing NR databases.

#### Ligands Preparation

The NRLiSt BDB is composed of 9,905 active molecules targeting 27 nuclear receptors (NRs). Active compounds are divided into 2 datasets per target according to their agonist or antagonist profile. All active compounds were extracted from the ChEMBL database and included in the NRLiSt BDB after a manual inspection of the corresponding ligand bioactivity data in the original papers. All inverse agonists, modulators, agonists/antagonists, weak to partial agonists, weak to partial antagonists and ligands with unknown pharmacological profile were discarded.

In addition, 339 human holo structures extracted from the PDB are provided, among which 266 are agonist-bound, 17 are antagonist-bound and 56 are bound to other ligands. Valid active compounds extracted from the literature were clustered using chemical fingerprints and a Tanimoto cut-off of 0.5.

#### Decoys Selection

In total, 458,981 decoys generated with the DUD-E online tool were provided, with a mean active-to-decoy ratio of 1/51 for each dataset.

In further studies, Lagarde et al. integrated ligands with the opposite pharmacological profile in the decoy set to orient the screening toward the desired pharmacological profile (Lagarde et al., 2014b). For instance, antagonists were used as the decoy set when screening for agonists, while agonists were used as the decoy set when screening for antagonists. Accordingly, the corresponding agonist- and antagonist-bound structures were used for SBVS, when available. Results showed that the enrichment is better when the pharmacological profile is considered prior to screening; the pharmacological profile should therefore be systematically taken into account to avoid artificially poor ligand enrichment.

### Maximal Unbiased Benchmarking Data Sets for HDACs (MUBD-HDACs)

So far, most decoy datasets [such as DUD-E (Mysinger et al., 2012) and DEKOIS (Vogel et al., 2011; Bauer et al., 2013)] and decoy generators [such as DecoyFinder (Cereto-Massagué et al., 2012) or the DUD-E generator server] have been designed for SBVS purposes. Few databases [e.g., MUV (Rohrer and Baumann, 2009), NRLiSt BDB] are intended to provide benchmarking datasets for LBVS. Xia et al. thus proposed a workflow to fulfill this need, and built decoy datasets for LBVS targeting the histone deacetylase (HDAC) protein family.

#### Ligands Preparation

Active compounds were retrieved from the ChEMBL18 database (Gaulton et al., 2012), among molecules annotated with quantitative data (i.e., IC50), manually checked, and filtered (exclusion of salts, molecules with more than 20 rotatable bonds or with a MW of 600 or more). Finally, ligands displaying a Tanimoto coefficient greater than 0.75 based on MACCS fingerprints were removed to exclude analog molecules, and 6 physicochemical properties (MW, logP, HBAs, HBDs, RBs and net Formal Charge–nFC) were computed for all HDACs inhibitors (HDACIs).
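The filtering steps described above can be sketched as follows. This is a minimal illustration that assumes each ligand is a dictionary with precomputed properties and a fingerprint stored as a set of on-bits; the field names, the toy values, and the choice of which member of an analog pair to keep (here, the first one encountered) are assumptions, since the source does not detail them.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def passes_property_filter(mol):
    """Keep molecules with at most 20 rotatable bonds and a MW below 600."""
    return mol["rotatable_bonds"] <= 20 and mol["mw"] < 600


def remove_analogs(mols, cutoff=0.75):
    """Drop any molecule whose similarity to an already kept one exceeds cutoff.

    Keeping the first molecule of each analog group is an assumption.
    """
    kept = []
    for mol in mols:
        if all(tanimoto(mol["fp"], k["fp"]) <= cutoff for k in kept):
            kept.append(mol)
    return kept


# Hypothetical ligands; the bit sets stand in for MACCS fingerprints.
ligands = [
    {"name": "lig_1", "mw": 350.4, "rotatable_bonds": 6, "fp": {1, 2, 3, 4}},
    {"name": "lig_2", "mw": 364.4, "rotatable_bonds": 7, "fp": {1, 2, 3, 4, 5}},
    {"name": "lig_3", "mw": 612.7, "rotatable_bonds": 9, "fp": {7, 8}},
    {"name": "lig_4", "mw": 298.3, "rotatable_bonds": 4, "fp": {1, 9, 10}},
]

filtered = [m for m in ligands if passes_property_filter(m)]
selected = [m["name"] for m in remove_analogs(filtered)]
print(selected)
```

Here lig_3 fails the MW criterion and lig_2 is discarded as an analog of lig_1 (Tanimoto 0.8), leaving lig_1 and lig_4.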

#### Decoys Selection

The "All-Purchasable Molecules" subset of the ZINC database was used as the initial set of molecules before a two-step filtering:


Last, for each ligand, the PDB (Berman et al., 2000) structures of the targeted HDAC isoform were prepared and provided for SBVS data sets. Unlike DUD-E (Mysinger et al., 2012), only Homo sapiens 3D-data were considered.

The MUBD-HDAC datasets for HDAC2 and HDAC8 isoforms were compared to DUD-E (Mysinger et al., 2012) and DEKOIS 2.0 (Ibrahim et al., 2015a) datasets, in terms of structural diversity [Bemis-Murcko atomic frameworks (Bemis and Murcko, 1996)], property matching and ligand enrichment in SB- and LB-VS approaches. The MUBD-HDAC displayed similar to better results in terms of structural diversity and property matching and was more challenging as measured by ligand enrichment using GOLD (Jones et al., 1997) or fingerprints similarity search, in agreement with a higher structural similarity. Finally, the MUBD-HDACs sets displayed small to great improvement in terms of nearer ligands bias (i.e., ligands that are more similar structurally to a ligand than to any decoy), compared to DUD-E and DEKOIS 2.0, respectively. This bias is known to produce artificially positive LBVS evaluation outcomes (Cleves and Jain, 2008) and thus, should be minimized.
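To make the notion of nearer ligands bias concrete, the sketch below counts the fraction of ligands whose most similar neighbor is another ligand rather than a decoy. It is only an illustration of the definition given above: the function name is hypothetical, fingerprints are assumed to be sets of on-bits, and the exact metric used by Xia et al. may differ.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def nearer_ligand_fraction(ligand_fps, decoy_fps):
    """Fraction of ligands that are closer to another ligand than to any decoy."""
    nearer = 0
    for i, fp in enumerate(ligand_fps):
        best_ligand = max(
            (tanimoto(fp, other) for j, other in enumerate(ligand_fps) if j != i),
            default=0.0,
        )
        best_decoy = max((tanimoto(fp, d) for d in decoy_fps), default=0.0)
        if best_ligand > best_decoy:
            nearer += 1
    return nearer / len(ligand_fps)


# Toy fingerprints (hypothetical).
ligand_fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}]
decoy_fps = [{7, 8, 10}, {1, 5, 6}]
fraction = nearer_ligand_fraction(ligand_fps, decoy_fps)
print(round(fraction, 3))
```

A value close to 1 would indicate that a simple similarity search can retrieve most ligands from each other, which is the situation the MUBD design tries to avoid.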

Of note, a similar work was done (Xia et al., 2014) on GPCRs using the GLL/GDD database (Gatica and Cavasotto, 2012) as ligands set, and also resulted in reduced artificial enrichment and analog bias compared to the original GLL/GDD sets.

### DISCUSSION AND RECOMMENDATIONS

### Ideal Benchmarking Database

The composition of an ideal VS benchmarking dataset should mimic real-life cases, where a small number of diverse active ligands is embedded in a much larger fraction of inactive compounds. Moreover, both sets of molecules are usually indistinguishable using simple descriptors like their physicochemical properties and share common fragments or functional chemical groups; such features should therefore be transposed to benchmarking dataset design, so that the putative inactive compounds constitute good "decoy" compounds in line with the active compounds and ensure a robust evaluation of the VS methods (Good and Oprea, 2008; Lagarde et al., 2015; Xia et al., 2015).

### Comparison of Decoys Selection Methods for SBVS

Among the recent tools to help create benchmarking sets (MUV, DEKOIS, DUD-E, and MUBD), the main difference resides in the strategy used to achieve their respective objectives: the DUD-E and DEKOIS data sets are designed for evaluating SBVS methods while MUV and MUBD are conceived for benchmarking LBVS approaches. Following this basic distinction, the respective algorithms to generate decoy datasets differ significantly. In the former case, the topological dissimilarity between ligand compounds and decoy compounds is maximized to avoid inclusion of active compounds into decoy datasets. In the latter case, the proper embedding of decoy compounds into the ligands chemical space is of primary importance.

For the DUD-E, the final decoys were randomly selected from the 25% most topologically dissimilar molecules compared to the ligands to ensure unbiased selection of decoy compounds. However, several studies pointed out that biases are still present in DUD-E data sets. For instance, Chaput et al. recently showed that the performance of four VS programs (Glide, Gold, FlexX and Surflex) is biased (over-estimated) using the DUD-E. Good performance (as measured by BEDROC curves) could be achieved for all programs when original DUD-E datasets were used, while only Glide was considered successful when chemical library biases (i.e., datasets whose decoys and active compounds differ for nine physicochemical properties) were removed. While the DUD-E was successfully used in numerous studies, this observation clearly showed that there is still room for improvement.
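The final DUD-E selection step (random picking among the 25% most topologically dissimilar candidates) can be sketched as below. This is a simplified illustration, not the DUD-E implementation: it assumes that dissimilarity is judged by each candidate's highest Tanimoto similarity to any ligand, that fingerprints are sets of on-bits, and that property matching has already been done; all names and data are hypothetical.

```python
import random


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def pick_dissimilar_decoys(candidates, ligand_fps, n_decoys, fraction=0.25, seed=0):
    """Randomly draw decoys from the most ligand-dissimilar fraction of candidates."""
    # Score each candidate by its highest similarity to any ligand (assumption).
    max_sim = {
        name: max(tanimoto(fp, lig) for lig in ligand_fps)
        for name, fp in candidates.items()
    }
    ranked = sorted(max_sim, key=max_sim.get)  # most dissimilar first
    pool = ranked[: max(1, int(len(ranked) * fraction))]
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.sample(pool, min(n_decoys, len(pool)))


# Toy data (hypothetical fingerprints).
ligand_fps = [{1, 2, 3, 4}]
candidates = {
    "c0": {1, 2, 3, 4},
    "c1": {1, 2, 3, 5},
    "c2": {1, 2, 5, 6},
    "c3": {1, 5, 6, 7},
    "c4": {5, 6, 7, 8},
    "c5": {9, 10},
    "c6": {1, 2, 3},
    "c7": {1, 2},
}
decoys = pick_dissimilar_decoys(candidates, ligand_fps, n_decoys=2)
print(sorted(decoys))
```

With eight candidates, the 25% pool contains only the two candidates sharing no bits with the ligand, so both are returned.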

Boeckler's group proposed a similar workflow in DEKOIS and DEKOIS 2.0. A physicochemical similarity over eight properties (represented by the physicochemical similarity score, PSS) is used and the topological dissimilarity between the active compounds and the future decoy compounds is computed as in the DUD-E. However, two main differences have to be noted: (1) the topological dissimilarity was computed using the more elaborate weighted LADS score rather than a 2D fingerprint-based Tanimoto coefficient filter and (2) the LADS score was combined with the PSS prior to final selection of the decoys. Therefore, the final decoy selection was balanced by both parameters (physicochemical similarity and topological dissimilarity) rather than using successive arbitrary (even if widely used) thresholds, and was successfully used by Hamza et al. (2014) for drug repurposing. This balance may come at a cost, as evidenced by Xia et al.: DEKOIS datasets for HDAC2 and HDAC8 were shown to be less efficient in terms of property matching between the active compounds and the decoy compounds (Xia et al., 2015). However, the DUD-E and DEKOIS sets perform similarly in enrichment using Gold, and DEKOIS performs significantly worse than DUD-E using 2D-based similarity search approaches.

### Comparison of Decoys Selection Methods for LBVS

Both DUD-E and DEKOIS databases share the same overall decoy selection procedure by combining topological dissimilarity and physicochemical property similarity. While adapted to SBVS, this approach may hinder the objective evaluation of LBVS, which is very sensitive to topological differences between active and decoy compounds. The MUV datasets (Rohrer and Baumann, 2009) were designed to overcome this specific weakness of the benchmarking datasets. The authors introduced the notion that decoy compounds and active compounds should be homogeneously spread in the chemical space, rather than requiring decoy compounds to be topologically dissimilar to the active compounds (as in the DUD-E for instance). The authors tested 18 datasets and claimed that MUV benchmarking datasets displayed neither analog bias nor artificial enrichment. Furthermore, they noticed that their data sets were SBVS compliant and compared advantageously to the biased DUD sets, leading to a potential broader use of their sets. MUV sets were applied to the evaluation of VS tools (Tiikkainen et al., 2009; Abdo et al., 2010), the training of new QSAR models (Marchese Robinson et al., 2017) or molecular graph convolutions (Kearnes et al., 2016).

As highlighted by Xia et al. "MUV is restricted by the sufficient experimental decoys (chemical space of decoys)" (Xia et al., 2015). Indeed, MUV relies on the availability of experimental data and is restricted to well-studied targets. The authors subsequently proposed the Maximal Unbiased Benchmarking Data sets (MUBD, see section Benchmarking Databases), which were applied to GPCRs (Xia et al., 2014), HDACs (Xia et al., 2015; Hu et al., 2017) and Toll-like receptor 8 (Pei et al., 2015). The MUBD-DecoyMaker algorithm relies on both a minimal and required topological dissimilarity (sims) between decoy and active compounds, but makes use of an additional criterion that minimizes the simsdiff parameter, i.e., ensures that decoy and active compounds are as similar as possible.

One should note that this additional step (the decoy-actives similarity check) yields datasets that are also suitable for SBVS; they seemed even more challenging in SBVS (for HDAC2 and HDAC8) as they provided datasets with higher structural similarity (Xia et al., 2015). Thus, these approaches are particularly appealing as they provide benchmarking datasets that (1) are adapted to LB- and SB-VS approaches, (2) subsequently allow comparative evaluations of the performance of LB- and SB-VS approaches, and (3) may be more challenging for SBVS.

#### Fine-Tuned Benchmarking Datasets

The quality of an evaluation lies in the consistency between the retrospectively screened benchmarking datasets and the prospectively screened compound collections as well as the target binding site properties (Ben Nasr et al., 2013). The recent trend to publish protein family-specific datasets or user-provided active compounds dependent decoys generation tools paves the way for a valuable and systematic use of benchmarking datasets prior to prospective VS of large compound collections.

In SBVS, tuned datasets should be used to identify the protocol, conformational sampling, and/or scoring methods that induce the best enrichment in active compounds (Allen et al., 2015, 2017; Lacroix et al., 2016; Li et al., 2016; Nunes et al., 2016; Meirson et al., 2017). For instance, Allen et al. (2015, 2017) evaluated different scoring schemes using DUD-E generated decoys and successfully identified dual EGFR/BRD4 inhibitors. In LBVS, the choice of the dataset is crucial to build a reliable model that can be used to distinguish active compounds from decoy compounds. For example, Ruggeri et al. (2015) used DUD-E generated decoys to define and optimize pharmacophore models that led to the identification of 2 dual competitive inhibitors of P. falciparum M1 (PfA-M1) and M17 (PfA-M17) aminopeptidases.

Of note, when using automatic decoy datasets generation tools, the provided active compounds should be carefully selected to avoid the previously detailed biases.

#### Integration of True Inactive Compounds

Despite the open-data initiatives that should ease the access to data in the near future, the low documentation about negative data (inactive and/or non-binding) is still an open issue. The inclusion of experimental data in a dataset requires great attention since (1) publicly available databases may present annotation errors that should be manually corrected (Lagarde et al., 2014a), and (2) diversity in the type of value and experimental conditions make some data barely comparable. The selection and the use of negative compounds (inactive and/or non-binding) in the evaluation/development of methods is a delicate step that strongly influences the quality of the resulting model. In agreement with Lagarde et al. (2014a) and Kaserer et al. (2015), we recommend that:


effects, cytotoxic effects or interference with optical detection methods (auto-fluorescence and luciferase inhibition).


One should note that the integration of inactive/non-binding compounds comes with new rules for dataset design. This case is particularly challenging since the inactive/non-binding compounds are usually extracted from the same chemical series as the active compounds. In this case, small fragment modifications can induce important bioactivity loss or gain; thus, clustering active compounds to guarantee diversity and minimize analog bias would be meaningless. Since the final objective of using such data is to stringently evaluate the ability of VS methods to discriminate active from inactive compounds based on small signals, the proximity between active and inactive compounds within a chemotype should be conserved, as well as the similarity within the active compounds of a chemotype. However, the over-representation of a given chemotype could hinder the evaluation of a VS method by masking the enrichment of sparsely populated chemotypes. We suggest that efforts should be made to represent chemotypes equally and/or to weight the resulting ROC curve (Ibrahim et al., 2015b).
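One possible way to weight a ROC analysis by chemotype is sketched below: each active compound receives a weight inversely proportional to the size of its chemotype, so that every chemotype contributes equally to the area under the curve. This weighting scheme is an assumption chosen for illustration and is not necessarily the one proposed by Ibrahim et al. (2015b); the function name and the toy scores are hypothetical.

```python
from collections import Counter


def chemotype_weighted_auc(actives, decoy_scores):
    """ROC AUC in which every chemotype carries the same total weight.

    actives: list of (score, chemotype) pairs; higher scores rank earlier.
    decoy_scores: list of scores for decoy or inactive compounds.
    """
    sizes = Counter(chemotype for _, chemotype in actives)
    weighted_wins = 0.0
    total_weight = 0.0
    for score, chemotype in actives:
        weight = 1.0 / sizes[chemotype]
        # Fraction of decoys ranked below this active (ties count as half).
        wins = sum(
            1.0 if score > d else 0.5 if score == d else 0.0 for d in decoy_scores
        )
        weighted_wins += weight * wins / len(decoy_scores)
        total_weight += weight
    return weighted_wins / total_weight


# Toy example: chemotype "A" is over-represented and well ranked,
# chemotype "B" has a single, poorly ranked active.
actives = [(0.9, "A"), (0.8, "A"), (0.7, "A"), (0.2, "B")]
decoy_scores = [0.5, 0.4, 0.3, 0.1]

weighted = chemotype_weighted_auc(actives, decoy_scores)
# Giving each active its own chemotype label recovers the usual unweighted AUC.
unweighted = chemotype_weighted_auc(
    [(s, i) for i, (s, _) in enumerate(actives)], decoy_scores
)
print(round(unweighted, 4), round(weighted, 4))
```

In this toy case the unweighted AUC (0.8125) is dominated by chemotype A, whereas the weighted value (0.625) exposes the poor retrieval of chemotype B.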

### CONCLUSION

Benchmarking databases are widely used to evaluate virtual screening methods. They are particularly important to compare the performance of virtual screening methods, and therefore to select an appropriate protocol prior to screening large compound collections and to estimate the reliability of the screening results. The characterization of the weaknesses of the first published databases helped in designing improved benchmarking datasets with minimized bias. The rational selection of decoy compounds is particularly important to avoid artificial enrichment in the evaluation of the different methods. The diversification of public datasets gathering both active and decoy compounds for a given protein family, and the publication of online decoy generation tools, contributed to the democratization of benchmarking studies to help identify protocols adapted to the query/target system under study. Nowadays, experimental data are being integrated in the decoy compound set to look for a specific activity or to identify methods suited to discriminating highly similar binders and non-binders. The selection of experimentally validated decoys requires careful attention to minimize experimental biases that may arise.

#### AUTHOR CONTRIBUTIONS

All authors listed have made substantial, direct and intellectual contribution to the work, and approved it for publication.

### REFERENCES


ligands and structures benchmarking database. J. Med. Chem. 57, 3117–3125. doi: 10.1021/jm500132p


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Réau, Langenfeld, Zagury, Lagarde and Montes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Molecular Connectivity Predefines Polypharmacology: Aliphatic Rings, Chirality, and sp<sup>3</sup> Centers Enhance Target Selectivity

Stefania Monteleone, Julian E. Fuchs\* † and Klaus R. Liedl\*

Institute of General, Inorganic and Theoretical Chemistry, Center of Molecular Biosciences, University of Innsbruck, Innsbruck, Austria

Dark chemical matter compounds are small molecules that have been recently identified as highly potent and selective hits. For this reason, they constitute a promising class of possible candidates in the process of drug discovery and raise the interest of the scientific community. To this purpose, Wassermann et al. (2015) have described the application of 2D descriptors to characterize dark chemical matter. However, their definition was based on the number of reported positive assays rather than the number of known targets. As there might be multiple assays for one single target, the number of assays does not fully describe target selectivity. Here, we propose an alternative classification of active molecules that is based on the number of known targets. We cluster molecules in four classes: black, gray, and white compounds are active on one, two to four, and more than four targets respectively, whilst inactive compounds are found to be inactive in the considered assays. In this study, black and inactive compounds are found to have not only higher solubility, but also a higher number of chiral centers, sp<sup>3</sup> carbon atoms and aliphatic rings. On the contrary, white compounds contain a higher number of double bonds and fused aromatic rings. Therefore, the design of a screening compound library should consider these molecular properties in order to achieve target selectivity or polypharmacology. Furthermore, analysis of four main target classes (GPCRs, kinases, proteases, and ion channels) shows that GPCR ligands are more selective than the other classes, as the number of black compounds is higher in this target superfamily. On the other side, ligands that hit kinases, proteases, and ion channels bind to GPCRs more likely than to other target classes. Consequently, depending on the target protein family, appropriate screening libraries can be designed in order to minimize the likelihood of unwanted side effects early in the drug discovery process. 
Additionally, synergistic effects may be obtained by library design toward polypharmacology.

Keywords: dark chemical matter, drug discovery, molecular descriptors, stereochemistry, chemical properties, screening library design, off-targets, drug repurposing

#### Edited by:

Adriano D. Andricopulo, São Carlos Institute of Physics, University of São Paulo, Brazil

#### Reviewed by:

Luca Evangelisti, Università di Bologna, Italy; Denis Fourches, North Carolina State University, United States

#### \*Correspondence:

Klaus R. Liedl klaus.liedl@uibk.ac.at; Julian E. Fuchs julian.fuchs@uibk.ac.at

#### †Present address:

Julian E. Fuchs, Boehringer Ingelheim RCV GmbH & Co KG, Vienna, Austria

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 12 May 2017; Accepted: 07 August 2017; Published: 28 August 2017

#### Citation:

Monteleone S, Fuchs JE and Liedl KR (2017) Molecular Connectivity Predefines Polypharmacology: Aliphatic Rings, Chirality, and sp<sup>3</sup> Centers Enhance Target Selectivity. Front. Pharmacol. 8:552. doi: 10.3389/fphar.2017.00552

#### INTRODUCTION

Drug discovery for a specific target is a long process that starts from hit finding: in the past high throughput screening (HTS) of huge compound libraries was the most common process in pharmaceutical companies. However, the chemical space that the HTS can reach is restricted to the molecules that were previously synthesized and included in the screened library. This certainly precludes the discovery of new compounds, as the chemical space is much wider and the use of limited knowledge makes the hit discovery challenging (Dobson, 2004; Reymond, 2015).

To overcome these disadvantages, computational techniques can be applied in order to speed up the process of drug design and to perform de novo drug design. One of the most popular methods is virtual screening, that is the identification of possible candidates for assays by considering their molecular properties (ligand-based) and/or their interactions with the macromolecular binding partner (typically a protein) when its structure is available (structure-based) (Kirchmair et al., 2009; von Grafenstein et al., 2014; Kaserer et al., 2015; Vuorinen and Schuster, 2015). Different virtual compound libraries can be designed, depending on the target properties and on the desired pharmacokinetics (Lionta et al., 2014). Therefore, fragment-based and relatively small focused libraries have found great success: a wider chemical space is covered by virtually assembling many different building blocks as in combinatorial synthesis (Chevillard and Kolb, 2015; Reymond, 2015) or by building compounds directly starting from the structure complex with the first fragment (Srinivas Reddy et al., 2013).

Furthermore, virtual libraries can be properly designed in order to identify active compounds that also exhibit suitable ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties (Gleeson, 2008). Lipinski's rule of five (Lipinski, 2004) helps in identifying orally active compounds, but does not fully describe all facets of druggability. For instance, the in silico assessment of molecular toxicity is still challenging today (Roncaglioni et al., 2013; Raies and Bajic, 2016), but it is necessary in order to establish early whether a molecule could cause toxic side effects, rather than in the later preclinical phase by experimental assays, which are expensive and time consuming (Peters et al., 2012). On the one hand, there is no doubt that side effects take place when a molecule is active on multiple targets and, hence, by definition promiscuous (Wang and Greene, 2012). On the other hand, promiscuity can also represent an advantage when the goal of drug development is to obtain a polypharmacological effect, especially in the treatment of diseases that involve multiple targets (Anighoro et al., 2014; Rastelli and Pinzi, 2015).

To this purpose, the computation of molecular properties has been established not only to discriminate between inactive and active, weak and potent compounds, but also between promiscuous and selective ligands. For instance, Lovering et al. (2009) showed that target selectivity increases with the number of chiral centers and with higher molecular complexity, described as fraction of carbon sp<sup>3</sup> atoms. Moreover, the presence of amines and high clogP values negatively affect target selectivity (Lovering, 2013). Indeed, many promiscuous compounds are positively charged at physiological pH, as emerged also from the analysis of a Roche dataset (Peters et al., 2009).

With the recent identification of "dark chemical matter" (DCM) as a promising starting point for drug discovery (Macarron, 2015; Wassermann et al., 2015), the chemical properties of this potentially highly selective compound species have come into focus. Wassermann et al. (2015) use descriptors based on the two-dimensional (2D) compound structures and describe subtle shifts in their distributions toward higher solubility (logS), lower hydrophobicity (logP), smaller molecular weight (MW) and fewer rings for DCM versus compounds that are frequently active in HTS assays. They define DCM as molecules that are inactive in at least 100 assays, presuming that these compounds would hit only a few possible targets. However, some compounds that are listed as DCM are active on many different targets. For example, CID1048281 (Supplementary Figure 1) is considered DCM because it is inactive in more than 650 assays, but it is also active in six other assays in PubChem, which test the activity on unrelated targets (RAR-related orphan receptor gamma, aldehyde dehydrogenase, tyrosyl-DNA phosphodiesterase, ATPase, bromodomain adjacent to zinc finger domain and shiga toxin).

On the other side, many assays may be available for the same target and the number of negative test outcomes does not necessarily correctly depict target selectivity. For example, there are 245 small-molecule bioassays reported on PubChem for the adrenoreceptor beta 1 and more than 350 for the beta 2 subtype. Moreover, most of these bioassays are not specific for a receptor subtype or are simply confirmatory. In order to overcome this pitfall, Wassermann et al. (2015) filtered the set of bioassays by removing redundant readouts for the same target.

As shown, it is extremely hard to determine the target selectivity of a molecule solely on the basis of its positive or negative assay outcomes. For this reason, we propose an alternative classification of active molecules, on the basis of the number of targets they hit, in order to investigate target selectivity and/or polypharmacology in the early phase of the drug discovery process. In detail, we distinguish between molecules that are selective toward one single protein and other compounds that are active on multiple targets. In this way, it is possible to identify which molecular properties enhance target selectivity and which protein families are likely to constitute off-targets.

### MATERIALS AND METHODS

### Ligand Dataset Retrieval

We extracted the set of 139,352 DCM compounds from Novartis and PubChem (Kim et al., 2015) as InChI (IUPAC International Chemical Identifier) strings from the Supporting Information of Wassermann et al. (2015) and downloaded the 3D coordinates of 139,328 molecules from the PubChem Compound database (Kim et al., 2016).

The set of active compounds was extracted from PubChem BioAssay (Wang et al., 2017) using the list of 459 bioassays provided by Wassermann et al. (2015). Active compounds (256,448) were extracted via their compound identifiers (CIDs), downloaded as 3D coordinates (237,510) and pooled to a single set of 376,838 compounds.

Furthermore, we performed a filtering step to remove duplicates within the dataset. To this purpose we used the RDKit (RDKit, 2015) chemoinformatics toolkit. Moreover, we removed the compounds that were active but without any specified targets (14,464). Our final dataset included 341,599 molecules.

#### Computation of Molecular Descriptors

The PubChem coordinate files contained already precomputed 2D descriptors, including MW, number of heavy atoms, defined and undefined stereocenters, H-bond donors and acceptors, which were considered for our analysis as provided.

Additionally, we calculated logS (Hou et al., 2004) and logP(o/w) using the MOE (Molecular Operating Environment, version 2015.1001) (MOE, 2016) molecular descriptor tools and the atomic geometries with MOE's Scientific Vector Language (SVL) function "aGeometry" together with the SMARTS matching function "sm\_MatchAll." In detail, aGeometry returns the hybridization of an atom and sm\_MatchAll searches for specific SMARTS patterns, which we used to count non-ring and non-terminal carbon atoms. For instance, sp<sup>3</sup> carbon atoms are counted by matching "CH2" SMARTS codes. In order to restrict the count to non-ring and non-terminal atoms, we specified "!r" and "!H3" respectively.

Furthermore, we used RDKit (RDKit, 2015) to count the number of single and fused aromatic and aliphatic rings as well as the number of carbon–carbon and carbon–nitrogen double bonds based on SMILES codes.

Statistical analysis, including the two-sided Wilcoxon ranksum test and Kolmogorov–Smirnov test, was performed using R (R Development Core Team, 2010) (Supplementary Tables 2–4).

#### Target Retrieval and Analysis

Assay and target information for all compounds have been retrieved from the PubChem database by querying the compounds identifiers (CID) against the assay summary webpage. Active targets with specified gene id were considered for Uniprot (Bateman et al., 2015; The UniProt Consortium, 2017) retrieval, in order to convert the gene id to the associated protein's Uniprot accession number.

We assigned the protein superfamily for every target, by searching Uniprot accession numbers into lists of GPCRs, kinases, proteases, and ion channels. We obtained the lists of 3,092 GPCRs, 1,365 kinases and 11,606 proteases from Uniprot, and the list of 899 ion channels from ChEMBL (Bento et al., 2014) and IUPHAR/BPS Guide to Pharmacology (Southan et al., 2016).

We counted the number of targets on which a molecule is found to be active and clustered active ligands in three classes: black compounds are active only on one single target, gray compounds are active on two to four targets and white compounds are active on more than four targets. We defined these cut-off values in order to obtain a comparable number of molecules in every subset: 73,383 black, 103,025 gray, 87,303 white, 77,888 inactive compounds (compound set provided via SI).
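The classification rule can be written as a short function. The sketch below assumes that each compound is associated with its number of active targets and that compounds with no active target are labeled inactive; the compound identifiers and counts in the toy dictionary are hypothetical, while the subset sizes in the final check are those reported above.

```python
from collections import Counter


def selectivity_class(n_targets):
    """Map the number of targets a compound is active on to its class."""
    if n_targets == 0:
        return "inactive"
    if n_targets == 1:
        return "black"
    if n_targets <= 4:
        return "gray"
    return "white"


# Hypothetical compounds with their number of active targets.
target_counts = {"cpd_1": 0, "cpd_2": 1, "cpd_3": 3, "cpd_4": 7, "cpd_5": 4}
class_sizes = Counter(selectivity_class(n) for n in target_counts.values())
print(dict(class_sizes))

# Consistency check on the subset sizes reported in the text.
reported = {"black": 73383, "gray": 103025, "white": 87303, "inactive": 77888}
print(sum(reported.values()))
```

The reported subset sizes add up to the 341,599 molecules of the final dataset.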

Figures are generated by using MATLAB (MATLAB, 2012), R (R Development Core Team, 2010) and ChemDraw (PerkinElmer Informatics, 1998–2015).

### RESULTS

### Molecular Descriptors

We analyzed the distributions of 2D molecular descriptors within the compound sets (inactive, black, gray, and white). We find that chirality enhances target selectivity. For instance, molecules become more selective if they present at least one chiral center: inactive and black compounds contain a higher number of defined R/S stereocenters with respect to white molecules (**Figure 1A**). On the contrary, the absence of a chiral center enhances promiscuity, as described by the percentage of white molecules (∼79% versus ∼62% in black ones) (Supplementary Table 1).

Conversely, if at least one carbon–carbon or carbon–nitrogen double bond is present, molecules tend to be white and, hence, more promiscuous (**Figure 1B**). Otherwise, if they do not have any double bonds, they tend to be inactive or black (∼85% versus ∼69% in white ones) (Supplementary Table 1).

These findings are also confirmed by the analysis of atomic geometries: non-ring and non-terminal sp<sup>3</sup> carbon atoms enhance selectivity (**Figure 1C**); about 42% of white compounds do not include any sp<sup>3</sup> carbon atoms, with respect to ∼27% of inactive and black ones (Supplementary Table 1).

We also computed the molecular descriptors that were reported by Wassermann et al. (2015). However, our results show that the MW is not able to properly describe target selectivity: indeed, black compounds do not follow the expected trend, as they show MWs which are comparable to those of white molecules (**Figure 1D**). This finding disagrees with Wassermann et al. (2015), because our dataset does not include all molecules that were considered in the Novartis analysis, but only those that were reported in the publication. As this descriptor appears dataset dependent, we discarded it.

Additionally, the number of rings differs between these classes: black compounds exhibit higher numbers of aliphatic rings (∼36% of black molecules have one aliphatic ring, with respect to 30% of white ones) (**Figure 1E**). By contrast, white compounds show higher numbers of fused aromatic rings (∼35% with respect to 26% of inactive molecules) (**Figure 1F**). Indeed, more than half of the selective molecules have at least one aliphatic ring (∼53% of inactive and ∼51% of black compounds) and no fused aromatic rings (∼71% of inactive and 62% of black compounds).

Furthermore, inactive and black compounds exhibit higher values of logS compared to gray and white compounds, especially for logS in the range between −2 and −4 (**Figure 2A**). By contrast, the opposite trend is observed for lower solubility: half of the white molecules show a logS value lower than −5, whereas only 20% of inactive and ∼30% of black compounds have similar solubility (Supplementary Table 1).

Consequently, lipophilicity increases with the number of targets: gray and white molecules show higher SlogP values than inactive and black ones (**Figure 2B**). For instance, ∼36% of white compounds show SlogP values that are higher than 4, whereas selective molecules (∼33% of inactive and ∼29% of black compounds) exhibit SlogP values which are in the range between 2 and 3.

FIGURE 1 | Statistical analysis of molecular descriptors per ligand class (inactive, black, gray, and white). Data are represented as 3D bar plots, colored according to the percentage values for each subset (see color bar). (A) The number of R/S stereocenters per molecule shows that most of white compounds have no chiral centers, whereas inactive molecules show the highest percentage of compounds with one stereocenter. (B) The number of carbon–carbon or carbon–nitrogen double bonds is higher for white ligands compared to the other classes, which normally have none. (C) Inactive and black sets exhibit higher content of non-ring and non-terminal sp<sup>3</sup> carbon atoms with respect to white compounds, which tend to be sp<sup>2</sup> hybridized. (D) The molecular weight (MW) is similar for all subsets in the range 300–500 Da, but shows different results for smaller and higher values. Indeed, inactive and white compounds exhibit higher percentages for values lower than 300 Da, with respect to black and gray sets. On the contrary, black compounds can be rather complex structures as their MW can be higher than 500 Da. The MW axis is divided into different ranges and its labels represent the highest boundary. For instance, "350" indicates compounds with MW values between 300 and 350. (E) Most of white molecules have no aliphatic rings, which characterize instead inactive and black datasets. (F) In contrast, a higher number of fused aromatic rings is a chemical feature of white molecules.

Calculating these molecular descriptors, it is possible to predict which building blocks characterize black compounds and, therefore, can be used for synthesis of new selective drug candidates.

#### Target Analysis

Our dataset includes ligands that bind to a variety of targets, 2,715 in total. For instance, 10.98% of the targets are represented by G-protein coupled receptors (GPCRs), 13.41% by kinases, 10.68% by ion channels and 5.78% by proteases (**Figure 3**). About 60% of the targets comprise other enzymes, receptors or transcription factors that do not fall into these four major target classes.

G-protein coupled receptor ligands are more selective than other classes, as the number of black compounds is higher (14.30%) with respect to other targets (5.80% ion channels, 6.25% ion channels, 9.21% kinases) (**Figure 4**). For example, CID 2983576 is a ligand that binds to the human cholinergic muscarinic receptor 4 and is inactive toward other muscarinic

FIGURE 2 | Statistical analysis of molecular solubility (logS) and hydrophobicity (SlogP) per ligand class (inactive, black, gray, and white). Data are represented as 3D bar plots, colored according to the percentage values for each subset (see color bar). The logS and SlogP axes are divided into ranges and the labels represent the upper boundary of each range. (A) Molecular solubility, reported as logS, is higher for inactive and black compounds (values higher than –4), whereas white compounds have logS values lower than –4. (B) White compounds show SlogP values higher than 4. In contrast, inactive and black molecules have values lower than 4.

receptor subtypes (**Figure 5**). Like many other black compounds, it contains a chiral center, an aliphatic ring, and several non-ring and non-terminal sp<sup>3</sup> carbon atoms (5), and it has a low logP value (2.2).

Ligands that bind to ion channels and proteases tend to be more promiscuous (**Figure 4**). This is particularly pronounced for proteases, where 62% of ligands can bind to more than four non-protease targets (**Figure 6**). For example, CID 646260 is active on caspase 3 and other non-protease targets, such as GPCRs and other enzymes.

In contrast, only 37% of GPCR ligands bind to other proteins beyond GPCRs. For instance, only 13% of GPCR ligands bind to kinases, 16% to proteases and 24% to ion channels.

Instead, kinase ligands are able to bind to many non-kinase targets. For example, compound CID 1005278 binds not only to kinases (such as RIPK), but also to potassium channels (such as KCNQ1), dopamine receptors (D1 and D3), proteases and other non-kinase targets. However, analysis of intra-class activity shows that kinase ligands in general bind only to one kinase (for example, CID 2283311 is a black molecule that is active only on MAP3K3). This evidence is surprising, as kinases are known to be promiscuous, especially toward other kinases (Davis et al., 2011). However, the number of kinase ligands in our dataset is relatively small (27,935) and we might miss information from unselective ligands that were not included in the analysis.

Furthermore, ion channel, protease and kinase ligands exhibit higher chances to bind to GPCRs: almost half of ion channel (49%), 36.6% of protease and 35% of kinase ligands bind to GPCRs as well. However, this trend cannot be observed for proteases, kinases or ion channels, as they exhibit lower probabilities to bind to these target classes (Supplementary Figure 2).

### DISCUSSION

The escape from flatland has already been described as a valuable approach to improve clinical success (Lovering et al., 2009) and the unique activity profiles of highly potent and selective molecules might be the underlying principle. It is chemically intuitive that more complex molecular shapes restrict the diversity of binding partners and provide selectivity gains (Mendez-Lucio and Medina-Franco, 2017). A criterion favoring complex 3D shapes, with chiral centers and high sp<sup>3</sup> carbon contents, low number of double bonds and fused aromatic rings, in candidate molecules might complement widely accepted criteria for drug-likeness solely based on 2D molecular properties, like solubility and MW (Lipinski, 2004; Leeson and Springthorpe, 2007).

We also believe that these molecular properties strongly affect target selectivity. Indeed, Lovering et al. (2009) already stated that the degree of saturation is able to distinguish marketed drugs from drug-like molecules. In detail, compounds that succeed in clinical trials are characterized by increased saturation and the presence of chiral centers. Our findings confirm that sp<sup>3</sup> character is a key feature for obtaining target selectivity and, in turn, for improving clinical success in the process of drug development.

These molecular descriptors, together with solubility and lipophilicity, may be readily applied as an additional selection criterion for promising starting points in early-stage drug discovery. Wassermann et al. (2015) have shown that DCM is more soluble than active molecules. Our results are in agreement with their findings, as selective compounds are more soluble than promiscuous ones.

In contrast, MW does not properly distinguish between inactive and white molecules as shown in other datasets. For

according to the compound identifier (CID) from PubChem. CID 2983576 is a selective GPCR ligand: its absolute stereochemistry is undefined in PubChem and, hence, not shown here. CID 646260 is a protease ligand, which binds also to other non-protease targets. CID 1005278 is a kinase ligand that binds also to other non-kinase targets. CID 2283311 is a selective kinase ligand that is active only on one target.

instance, promiscuity is enhanced by lower values of MW in a dataset from Pfizer (Hopkins et al., 2006), but higher values in datasets from Novartis (Azzaoui et al., 2007), Roche (Peters et al., 2009) and Boehringer Ingelheim (Muegge and Mukherjee, 2016).

We also considered further molecular descriptors, such as the number of hydrogen bond donors and acceptors, but they do not distinguish between selective and promiscuous compounds (Supplementary Table 1), as also shown by Novartis (Azzaoui et al., 2007) and Roche (Peters et al., 2009).

In our dataset many ligands are promiscuous and, hence, can effectively hit off-targets, which are represented by all other targets that a molecule can bind besides the intended target (Rudmann, 2013).

However, in our dataset GPCR ligands are highly selective. This evidence appears to be in contrast to previous knowledge, as GPCRs are known to be promiscuous targets, especially if their ligands are not peptidic or small molecules (Paolini et al., 2006). For instance, our results may change by considering specialized datasets, such as PDSP Ki database (Roth et al., 2000).

Additionally, our analysis shows that ligands from other target protein families can easily bind GPCRs. Indeed, there are great overlaps between all four target classes that we considered (Supplementary Figure 2) and we do not know if these molecules were developed firstly as GPCR ligands or not.

selectivity is highlighted by colored boxes around the pie charts.

The identification of a GPCR as off-target is extremely important, as the activity on specific GPCRs is also related to severe side effects, e.g., cardiovascular diseases. Indeed, 5-HT2B has been identified as cause of valvulopathy and led to the withdrawal of drugs from the market (Huang et al., 2009).

Our results show that protease ligands can bind to many off-targets: indeed, it can be difficult to achieve target selectivity within related proteases (Drag and Salvesen, 2010), but strategies to rationally improve the selectivity profiles of protease inhibitors based on substrate peptide data and experimental 3D structures have been described (Fuchs et al., 2013).

In our dataset, kinase ligands seem to be selective toward only one kinase member rather than binding to several targets in the same protein family. However, this unexpected outcome can be explained by the relatively low number of kinase ligands present in the dataset. Kinase ligands are indeed generally known to be promiscuous, but some of them exhibit higher selectivity, especially if they bind to the pocket close to the ATP site and prefer a specific conformation of the activation loop (Davis et al., 2011). Moreover, in our dataset we identify even more pronounced polypharmacology within and between other target classes. For instance, ion channel ligands overlap with GPCR ligands, as they frequently exhibit a common ligand scaffold, which includes an amine linked to an aromatic ring by an alkyl chain that is present in benzodiazepines or dihydropyridines. In addition, ion channels constitute a common off-target, causing cardiac adverse effects. Indeed, hERG potassium channels are responsible for arrhythmias, in particular torsades de pointes, and many antipsychotics and other drugs bind to these channels as off-targets, increasing the risk of cardiovascular diseases (Silvestre and Prous, 2007). As an example, the antihistaminic terfenadine was withdrawn from the market for its toxic adverse effect, which was caused by this off-target activity (Monahan et al., 1990).

This analysis brings us to ask whether we can identify likely off-targets in the early discovery process. Normally, in the early steps, target selectivity is considered only among related targets, which are proteins that belong to the same protein

family, since high structure and ligand similarity is expected. In this case, target selectivity can be rationalized, e.g., via X-ray structures of targets and off-targets. However, several adverse side effects are caused by distant or nearly unrelated targets. For this reason, the prediction of ligand binding is still challenging and the use of cheminformatics tools can guide medicinal chemists in identifying the chemical features that typically cause promiscuity (Besnard et al., 2012). Nevertheless, the training of virtual screening models is limited by the use of biased ligand sets. Indeed, our analysis shows that results highly depend on the selected dataset, which affects the distribution of the physicochemical properties and target classes. Therefore, we expect that, based on the desired target, specialized datasets can be used to further improve the performance of in silico models.

In particular, screening libraries can be properly designed by taking into account molecular properties, such as stereochemistry, atomic geometries and rings, in addition to solubility and lipophilicity. Many predesigned compound libraries are already freely available online and could be easily filtered or prioritized by using these 2D descriptors, without the need for a time-consuming and computationally demanding generation of 3D conformers.
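The filtering idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2D descriptors are taken to be precomputed by some cheminformatics toolkit, and the record type `Descriptors2D`, the function names, and the score cut-off are hypothetical. Only the SlogP and logS thresholds (4 and −4) come from Figure 2; the remaining checks simply encode the qualitative trends reported in Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors2D:
    """Precomputed 2D descriptors for one library compound (hypothetical record)."""
    name: str
    stereocenters: int
    aliphatic_rings: int
    fused_aromatic_rings: int
    double_bonds: int        # C=C or C=N double bonds
    sp3_chain_carbons: int   # non-ring, non-terminal sp3 carbon atoms
    slogp: float
    logs: float


def selectivity_score(d: Descriptors2D) -> int:
    """Count how many 'selective-like' features a compound shows.

    The individual cut-offs are illustrative assumptions, not values
    fitted to the dataset analyzed in the article.
    """
    checks = [
        d.stereocenters >= 1,         # chiral centers favored
        d.aliphatic_rings >= 1,       # aliphatic over aromatic rings
        d.fused_aromatic_rings == 0,  # fused aromatics linked to promiscuity
        d.double_bonds == 0,          # few C=C / C=N double bonds
        d.sp3_chain_carbons >= 1,     # sp3-rich chains
        d.slogp < 4.0,                # lower lipophilicity (Figure 2B)
        d.logs > -4.0,                # higher solubility (Figure 2A)
    ]
    return sum(checks)


def prioritize(library, min_score=5):
    """Keep compounds with at least `min_score` features, best first."""
    kept = [d for d in library if selectivity_score(d) >= min_score]
    return sorted(kept, key=selectivity_score, reverse=True)
```

Because the score only counts features, it ranks compounds rather than predicting selectivity; in practice the cut-offs would need to be calibrated on the dataset of interest.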

### CONCLUSION

A good starting point for the design of a selective drug should favor aliphatic over aromatic rings, alkylic chains containing sp<sup>3</sup> carbon atoms over double bonds, and stereocenters over achiral atoms. Even though the introduction of chiral centers can make the synthesis more challenging, the gain in target selectivity may be considerable.

On the other hand, polypharmacology could be achieved by introducing flat chemical moieties, such as fused aromatic rings and double bonds. However, this could bring not only additional desired, but also undesired side effects.

### AUTHOR CONTRIBUTIONS

SM and JF performed the research. SM, JF, and KL designed the study and contributed to the preparation of the manuscript.

### FUNDING

The research of the manuscript was supported by funding of the Austrian Science Fund FWF with respect to the project "Targeting Influenza Neuraminidase" (P23051).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fphar.2017.00552/full#supplementary-material

### REFERENCES

MATLAB (2012). Matlab R2012a. Natick, MA: The MathWorks Inc.


**Conflict of Interest Statement:** JF is a permanent employee of Boehringer Ingelheim RCV GmbH & Co KG.

The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Monteleone, Fuchs and Liedl. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Development of Matrix Metalloproteinase-2 Inhibitors for Cardioprotection

Péter Bencsik<sup>1,2†</sup>, Krisztina Kupai<sup>3†</sup>, Anikó Görbe<sup>1,2</sup>, Éva Kenyeres<sup>1,2</sup>, Zoltán V. Varga<sup>4</sup>, János Pálóczi<sup>4</sup>, Renáta Gáspár<sup>3</sup>, László Kovács<sup>5</sup>, Lutz Weber<sup>6</sup>, Ferenc Takács<sup>5</sup>, István Hajdú<sup>7,8</sup>, Gabriella Fabó<sup>7</sup>, Sándor Cseh<sup>7</sup>, László Barna<sup>8,9</sup>, Tamás Csont<sup>3</sup>, Csaba Csonka<sup>3</sup>, György Dormán<sup>7\*‡</sup> and Péter Ferdinandy<sup>2,4\*‡</sup>

#### Edited by:

Leonardo G. Ferreira, University of São Paulo, Brazil

#### Reviewed by:

Mohammad Hassan Baig, Yeungnam University, South Korea Vittoria Colotta, Università degli Studi di Firenze, Italy

#### \*Correspondence:

György Dormán dorman@targetex.com Péter Ferdinandy peter.ferdinandy@pharmahungary.com

> †These authors have contributed equally to this work. ‡Joint last authors.

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 16 January 2018 Accepted: 14 March 2018 Published: 05 April 2018

#### Citation:

Bencsik P, Kupai K, Görbe A, Kenyeres É, Varga ZV, Pálóczi J, Gáspár R, Kovács L, Weber L, Takács F, Hajdú I, Fabó G, Cseh S, Barna L, Csont T, Csonka C, Dormán G and Ferdinandy P (2018) Development of Matrix Metalloproteinase-2 Inhibitors for Cardioprotection. Front. Pharmacol. 9:296. doi: 10.3389/fphar.2018.00296

<sup>1</sup> Cardiovascular Research Group, Department of Biochemistry, Faculty of Medicine, University of Szeged, Szeged, Hungary, <sup>2</sup> Pharmahungary Group, Szeged, Hungary, <sup>3</sup> Department of Biochemistry, Faculty of Medicine, University of Szeged, Szeged, Hungary, <sup>4</sup> Department of Pharmacology and Pharmacotherapy, Faculty of Medicine, Semmelweis University, Budapest, Hungary, <sup>5</sup> Infarmatik, Budapest, Hungary, <sup>6</sup> OntoChem GmbH, Halle (Saale), Germany, <sup>7</sup> Targetex Biosciences, Dunakeszi, Hungary, <sup>8</sup> Research Centre for Natural Sciences, Institute of Enzymology, Hungarian Academy of Sciences, Budapest, Hungary, <sup>9</sup> Microscopy Center at IEM HAS, Institute of Experimental Medicine, Hungarian Academy of Sciences, Budapest, Hungary

The objective of our present study is to develop novel MMP-2 inhibitors for acute cardioprotection. In a series of pilot studies, novel substituted carboxylic acid derivatives were synthesized based on imidazole and thiazole scaffolds and then tested in a screening cascade for MMP inhibition. We found that the MMP-inhibiting effects of imidazole and thiazole carboxylic acid-based compounds are superior in efficacy to those of the conventional hydroxamic acid derivatives of the same molecules. Based on these results, a 568-membered focused library of imidazole and thiazole compounds was generated in silico, and the library members were then docked to the 3D model of MMP-2, followed by an in vitro medium throughput screening (MTS) based on a fluorescent assay employing the MMP-2 catalytic domain. Altogether, 45 compounds showed a docking score of >70, of which 30 compounds were successfully synthesized. Based on the MMP-2 inhibitory tests using gelatin zymography, 7 compounds were then selected and tested in neonatal rat cardiac myocytes subjected to simulated I/R injury. Six compounds showed significant cardio-cytoprotection, and the most effective compound (MMPI-1154) significantly decreased infarct size when applied at 1 µM in an ex vivo model of acute myocardial infarction. This is the first demonstration that imidazole and thiazole carboxylic acid-based compounds are more efficacious MMP-2 inhibitors than their hydroxamic acid derivatives. MMPI-1154 is a promising novel cardio-cytoprotective imidazole-carboxylic acid MMP-2 inhibitor lead candidate for the treatment of acute myocardial infarction.

Keywords: matrix metalloproteinase, MMP-2 inhibitor, heart, ischemia/reperfusion injury, cardioprotection, lead candidate

#### Bencsik et al. Cardioprotection by MMP Inhibition

### INTRODUCTION

Coronary heart disease (CHD) is the number one cause of death globally (Alwan et al., 2010). Recent data show that almost 18 million people died from cardiovascular diseases (CVDs) in 2015, of which an estimated 7.4 million deaths were due to CHD (Roth et al., 2017; WHO, 2017). The discovery of endogenous cardioprotective mechanisms (ischemic pre-, post-, and remote pre- and perconditioning) has allowed for the exploration of several molecular processes of cell injury and survival mechanisms during ischemia/reperfusion (I/R) (Ferdinandy et al., 2014). However, in spite of numerous promising preclinical attempts aimed at pharmacologically triggering these cardioprotective mechanisms, the translation of the results into clinical practice has remained unsolved due to the presence of several additional factors, including cardiovascular co-morbidities (e.g., hyperlipidemia or diabetes mellitus) (Ferdinandy et al., 2014). Thus, to improve clinical outcomes, novel therapeutic strategies against myocardial I/R injury are needed that preserve their protection even in the presence of cardiovascular co-morbidities (Hausenloy et al., 2017).

Matrix metalloproteinases (MMPs) are zinc-containing peptidases classified into several subtypes. The gelatinase-type MMP-2 occurs in the heart in physiological conditions and is synthesized by cardiomyocytes, fibroblasts, and endothelial cells (DeCoux et al., 2014). During I/R, MMP-2 is activated and released from the injured myocardium (Cheung et al., 2000), which may contribute to the degradation of contractile proteins (Wang et al., 2002; Sawicki et al., 2005; Sung et al., 2007; Ali et al., 2010), thereby leading to myocardial dysfunction and, in the long run, to heart failure. Furthermore, in patients with ST-elevation myocardial infarction (STEMI), a significant positive correlation has been shown between the circulating levels of MMP-2 measured before and 12 h after recanalization therapy, and infarct size as determined by cardiac MR (D'Annunzio et al., 2009). We have demonstrated that MMP-2 can be a promising biomarker for patients with coronary artery disease (Bencsik et al., 2015). We have also previously reported that pharmacological inhibition of MMP-2 in rats evoked cardioprotection that is equivalent to ischemic preconditioning (Giricz et al., 2006; Bencsik et al., 2010). Our work has also shown that although hyperlipidemia abolished the beneficial effect of ischemic preconditioning, cardioprotection in the presence of hyperlipidemia was preserved during pharmacological inhibition of MMP-2 (Giricz et al., 2006). We can thus conclude that MMP-2 is a promising drug target, since its inhibition works in the presence of a significant cardiovascular co-morbidity, namely hyperlipidemia (for a review see Andreadou et al., 2017).

To date, several MMP inhibitors have been identified, including hydroxamates, thiols, carbamoylphosphonates, hydroxyureas, hydrazines, β-lactams, squaric acids, and nitrogenous ligands (Durrant et al., 2011). Most of these contain a metal-coordinating function, called a zinc-binding group (ZBG), which binds to the catalytic zinc ion of the MMPs. Despite the promising features of these potent MMP inhibitor compounds, only one compound has been approved for clinical use by the U.S. Food and Drug Administration: Periostat® (doxycycline hyclate), for the treatment of periodontitis (Dormán et al., 2010). In spite of much preclinical evidence about the involvement of MMP-2 in acute myocardial infarction (AMI), surprisingly, only one clinical trial has been conducted, and it failed: a non-selective, hydroxamate-type MMP inhibitor, PG-116800, was administered at a relatively high dose (400 mg/day) for 90 days to AMI patients (Hudson et al., 2006).

Subsequent research has focused on the design of selective compounds that can distinguish between different members of the MMP family, thereby exploiting zinc-binding groups other than the hydroxamate group (Fisher and Mobashery, 2006). In addition, we have recently shown that complete inhibition of MMP-2 is not needed to achieve cardioprotection, since a moderate (∼20–25%) inhibition of MMP-2 activity was sufficient to reduce infarct size in normo- and hyperlipidemic isolated rat hearts (Giricz et al., 2006) and also in an in vivo rat model of AMI (Bencsik et al., 2014).

Consequently, our aim was to develop novel MMP-2 inhibitors with potent anti-ischemic efficacy and moderate MMP-2 selectivity among the MMP subtypes. Preclinical studies with MMP inhibitors (MMPIs) frequently revealed a severe adverse effect referred to as musculoskeletal syndrome. This is primarily due to MMP-1 inhibition (MMP-1 is considered an anti-target within the MMPs). Selectivity against MMP-1 may therefore be important to avoid such side effects of MMP inhibitors (Papp et al., 2007).

The significant differences in the structural features of the subpockets of the binding/active sites allow for differentiation and selectivity of the MMP inhibitors. The S1' and S2' pockets are responsible for the selectivity of the inhibitors, and this can be taken into consideration in the design of selective inhibitors by tailoring the occupation of the particular sub-pockets (**Figure 1**). In the case of MMP-2, the S1' pocket is mainly hydrophobic and relatively large, while in MMP-1 it is short and shallow. Increasing bulkiness at the S1' pocket could change the activity profile and allow for some selectivity over MMP-1. This trend was clearly observed in the case of substituted thiazepine MMP inhibitors (Almstead et al., 1999; Papp et al., 2007).

Therefore, we have designed a screening cascade to select potent MMP-2 inhibitors with cardioprotective effects.

#### MATERIALS AND METHODS

#### Experimental Design-Screening Cascade

Our group applied a complex screening cascade to identify candidates that may reduce acute cardiac I/R injury via inhibition of MMP-2. During our complex screening protocol, virtual screening was combined with docking calculations followed by medium-throughput screening using MMP-2 catalytic domain. In the next stage, the inhibitory effect was confirmed on full length MMP-2 enzyme isolated from cardiac tissue. Finally, the cardioprotective effects of selected molecules were tested in neonatal cardiac myocytes that were subjected to simulated ischemia and reoxygenation as well as on an isolated rat heart model of AMI (**Figure 2**).

FIGURE 2 | The screening cascade. Complex screening cascade to identify candidates that may reduce acute cardiac I/R injury via inhibition of MMP-2. (A) The AMRI Chemical Library, containing ∼200,000 drug-like small molecules (<500 Da), served as the compound set. We intended to select molecules holding a zinc-binding motif, similar to hydroxamic acids. (B) 2D substructure and similarity search. (C) Selection of free acids from the AMRI compound collection. (D) Further focus on compounds holding various motifs around a central core, reflecting the typical MMP inhibitor architecture. (E) Selected acids were screened in a fluorescent assay using a recombinant human MMP-2 catalytic fragment and a synthetic peptide substrate. (F) Synthesis of the thiazole and the isosteric imidazole carboxylic acids. (G) The hydroxamic acid pairs of the previously measured acids were tested. (H) The novel thiazole carboxylic acid chemotype was the starting point for further structure-based optimization. A 568-membered focused library was generated in silico around the AMRI library hits, including their bioisosteres and some simplified analogs. (I) Docking studies: Genetic Optimization for Ligand Docking (GOLD) was used to build a 3D model based on the X-ray structure of human MMP-2 and MMP-9. (J) Thirty compounds were successfully synthesized for screening, combining the in silico hits and the additional designed compounds. (K) In vitro MMP-2 activity was measured using a fluorometric assay. (L) Low throughput screening by the gelatin zymography technique. (M) Cell viability experiments in isolated neonatal cardiac myocytes subjected to simulated ischemia/reperfusion injury. (N) Myocardial infarct size was measured after ex vivo global ischemia experiments on isolated rat hearts.

### Chemistry: MMP-2 Inhibitor Design

#### Design of Selective MMP-2 Inhibitors

We applied contemporary library design approaches based on the structural features of the known MMP-2 inhibitors (**Figure 2**). Our approach started from a diverse 200k compound library and the multi-step selection procedure consisted of a substructure search for binding motifs of MMP-2 inhibitors and diversity selection.

#### In Silico Chemistry Approach

#### **Chemical library**

The Albany Molecular Research Inc. (AMRI; Albany, NY) Library contains ∼200,000 drug-like small molecules (Mwt < 500 Da) synthesized by solution phase parallel synthesis. The compound set contained ∼300 medicinal chemistry relevant chemotypes with diverse substitution patterns. The library has been successfully used in many exclusive drug discovery projects.

#### **2D chemoinformatics methods**

According to the Similar Property Principle (Johnson and Maggiora, 1990), molecules that are structurally similar are likely to have similar properties. Applying simple 2D fingerprints is often the method of choice, particularly when numerous reference compounds and multimillion compound databases are available not only "because of its computational efficiency but also because of its demonstrated effectiveness in many comparative studies" (Willett, 2006; Baig et al., 2016). Most frequently the Tanimoto coefficient (Willett and Winterman, 1986) is used for measuring similarity, in spite of its marked size-dependency.

In practice, determining the similarity between known reference structures and each molecule in a database, followed by ranking the database molecules according to the similarities would lead to a potentially active compound set for in vitro screening. Similarly, reoccurring (privileged) structural motifs could also be identified and the compounds holding the motifs could represent another screening library.

For 2D substructure and similarity search, we applied standard chemical fingerprints as implemented in the InstantJChem software (ChemAxon Ltd. Budapest), in which binary strings encode the presence or absence of substructures.
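The similarity ranking described above can be illustrated with a minimal Python sketch. It assumes each fingerprint is represented as the set of bit positions that are switched on; the function names and the toy fingerprints are hypothetical, and the sketch does not reproduce the InstantJChem fingerprints used in the study.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two sets of 'on' bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Two empty fingerprints share no information; report zero similarity.
        return 0.0
    return len(fp_a & fp_b) / union


def rank_by_similarity(reference: set, database: dict) -> list:
    """Rank database molecules by decreasing similarity to a reference.

    `database` maps a molecule name to its fingerprint (set of 'on' bits).
    """
    scored = [(name, tanimoto(reference, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, a reference with bits `{1, 2, 3}` and a database molecule with bits `{2, 3, 4}` share two of four distinct bits, giving a coefficient of 0.5. Because the denominator grows with the number of set bits, larger molecules tend to score differently from small ones, which is the size dependency mentioned in the text.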

The physico-chemical parameters [Mwt, clogP, and H-bond donors/acceptors, i.e., Lipinski's Rule of 5 (Lipinski et al., 2001), as well as rotatable bonds and topological polar surface area] were calculated with the calculation suite of InstantJChem (ChemAxon Ltd. Budapest).
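As an illustration of how such parameters are typically used, the following Python sketch counts Rule of 5 violations from precomputed values. The function names are hypothetical, and the tolerance of one violation is a common convention assumed here, not a setting reported in the article.

```python
def lipinski_violations(mwt: float, clogp: float,
                        h_donors: int, h_acceptors: int) -> int:
    """Count violations of the four Rule of 5 criteria."""
    return sum([
        mwt > 500.0,       # molecular weight above 500 Da
        clogp > 5.0,       # calculated logP above 5
        h_donors > 5,      # more than 5 hydrogen-bond donors
        h_acceptors > 10,  # more than 10 hydrogen-bond acceptors
    ])


def passes_rule_of_five(mwt: float, clogp: float,
                        h_donors: int, h_acceptors: int,
                        max_violations: int = 1) -> bool:
    """Accept a compound if it has at most `max_violations` violations.

    The default of one tolerated violation is an assumption of this sketch.
    """
    return lipinski_violations(mwt, clogp, h_donors, h_acceptors) <= max_violations
```

Rotatable bonds and topological polar surface area are not part of the Rule of 5 itself and would be handled by separate filters.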

#### **3D alignment methods**

Novel 3D approaches consider not only the molecular topology, but also deal with 3D coordinates of both the active and the potential lead molecules for the similarity comparison and estimate 3D shape similarity (Kalászi et al., 2014).

A rough estimate of the binding behavior of the compounds can be obtained by assessing their conformational flexibility; the overall statistical representation of such conformational properties is presented as a 3D structure (ChemAxon Screen3D software) (ChemAxon, 2013).

In flexible alignment, the conformations are created "on-the-fly" during the alignment procedure. Flexible alignment methods, such as the one used in the present study, have the advantage of not requiring a pre-defined set of initial conformers to sample the conformational space of the molecules. During the alignment procedure we took specific atom-type information, such as pharmacophore sites, into account. This information makes it possible to generate alignments in which patterns (with similar binding character) are oriented in a similar fashion as occurs during real binding to the active site. Therefore, it provides a more realistic picture of the potential bioactive similarity of the molecules.

#### **3D modeling approaches**

For docking studies, Genetic Optimization for Ligand Docking (GOLD; version 4.0.1; Jones et al., 1997) was used to build a 3D molecule model based on the X-ray structure of human MMP-2. 1CK7 was the only full length 3D structure found in protein databases but it contained a mutation (E404A). On the other hand, the availability of the 3D structure of the collagenase-like 1-2 catalytic domain is sufficient for virtual screening targeting MMP-2 inhibition, thus 1HOV (NMR), and 1QIB (X-Ray) structures provided feasible alternatives.

Another option was 1EAK (X-Ray), which contains the collagenase-like 1-2 domain together with the connecting collagen binding region (propeptides). Comparing the models, 1EAK was found to be particularly reliable for virtual screening even though it also contains the E404A mutation (Supplementary Figure 1). The propeptide regions could be removed without affecting the docking reliability.

The 3D structures of the small molecules to be screened were optimized and protonated before docking. The pH was set to 7.2. For docking, the standard GOLD parameters were used as described in the current User Guide (Centre, 2017).

The MMP-2 active site was defined as all atoms within a sphere of 19 Å radius. The Zn-ion coordination was set to octahedral. For each small molecule, 10 independent docking runs were conducted.
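The two bookkeeping steps implied here, selecting the atoms of a spherical binding site and summarizing repeated docking runs, can be sketched in plain Python. This is not GOLD code. The sphere center (placed on the catalytic zinc in the usage example), the choice to keep the best score of the independent runs, and all function and ligand names are assumptions of the sketch; the score threshold of 70 is the cut-off quoted in the abstract.

```python
import math


def atoms_in_site(atoms, center, radius=19.0):
    """Return the atoms lying within `radius` Å of `center`.

    `atoms` is a list of (label, (x, y, z)) tuples with coordinates in Å.
    """
    return [atom for atom in atoms if math.dist(atom[1], center) <= radius]


def best_scores(runs):
    """Map each ligand to the highest fitness score of its docking runs."""
    return {ligand: max(scores) for ligand, scores in runs.items() if scores}


def select_hits(runs, threshold=70.0):
    """Keep ligands whose best score exceeds `threshold`, best first."""
    hits = [(ligand, score) for ligand, score in best_scores(runs).items()
            if score > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

With a hypothetical input such as `{"lig_1": [65.2, 71.8, 69.0], "lig_2": [55.0, 60.1]}`, only `lig_1` would be retained, because one of its runs scores above 70. Averaging the runs instead of taking the maximum would be an equally defensible aggregation.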

The 1EAK model was validated with three known MMP-2 inhibitors: SC-74020 (Supplementary Figure 2), PD 166793 (**Figure 1**), and ABT-518 (**Figure 1**).

### Synthetic Methods

The hydroxamic acids (e.g., AMRI-101H, AMRI-102H, and AMRI-103H) were prepared from the corresponding acids using bromo-tris-pyrrolidino-phosphonium hexafluorophosphate (PyBrOP) and polymer-supported hydroxybenzotriazole as activating agents before adding hydroxylamine hydrochloride and a base (see Supplementary Figure 3). The isolated yields were between 10 and 76%, while the purity was higher than 85%.

The synthesis of the thiazole and the isosteric imidazole carboxylic acids was carried out according to standard procedures and as described elsewhere (Ferdinandy et al., 2010).

In order to increase the solubility of the compounds, the benzene ring was replaced with pyridine in various analogs (MMPI-1248, MMPI-1260). Unfortunately, combination of the pyridine ring with the imidazolyl core was synthetically unsuccessful.

### In Vitro Pharmacological Testing by MTS Screening

In vitro MMP-2 activity was then measured using a fluorometric assay in a 384-well format. Human MMP-2 catalytic domain (residues 110-221, 397-455) (Feng et al., 2000) was expressed in E. coli in the form of inclusion bodies. The protein was refolded and then purified by means of Ni-NTA affinity and anion exchange chromatography. Inhibition assays were carried out in 50 mM Tris, 5 mM CaCl2, 300 mM NaCl, 20 µM ZnSO4, pH = 7.5 buffer. For inhibition studies, the catalytic domain of the enzyme was pre-incubated with varying amounts of inhibitor for 30 min. Then MMP substrate (Mca-Pro-Leu-Gly-Leu-Dpa-Ala-Arg-NH2) (Papp et al., 2007) was added at 3 µM final concentration. After 1 h incubation at 37°C, the fluorescence was detected using a Wallac 1420 Victor2 microplate reader at 320 nm/405 nm Ex/Em wavelength. As an alternative substrate we also used 5-FAM-Pro-Leu-Gly-Leu-Dap(QXL™ 520)-Ala-Arg-NH2, where the fluorescence was detected at 485 nm/520 nm. For each inhibitor candidate, the percentage of inhibition was determined in duplicate experiments at six inhibitor concentrations, chosen to cover a 5–95% range of inhibition. For validation of the fluorometric assay, Ilomastat [N-[(2R)-2-(hydroxamidocarbonylmethyl)-4-methylpentanoyl]-L-tryptophan methylamide (GM6001)], a non-selective MMP inhibitor, was used as a positive control inhibitor. The measured IC50 values varied between 0.3 and 1.0 nM, which is in line with previous literature data (Galardy et al., 1994; Yamamoto et al., 1998).
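To make the data reduction concrete, the Python sketch below converts fluorescence readings into percentage inhibition and estimates an IC50 from a small concentration series. The source does not state how IC50 values were derived from the six-point curves; log-linear interpolation between the two points bracketing 50% is a simple stand-in chosen for this sketch, not the authors' method, and all function and parameter names are hypothetical.

```python
import math


def percent_inhibition(signal, uninhibited, blank=0.0):
    """Percentage inhibition relative to the uninhibited enzyme signal.

    `blank` is the background fluorescence without enzyme (assumed 0 if unknown).
    """
    return 100.0 * (1.0 - (signal - blank) / (uninhibited - blank))


def ic50_by_interpolation(concentrations, inhibitions):
    """Estimate IC50 by interpolating on log10(concentration).

    Returns None when no pair of neighboring points brackets 50% inhibition.
    Concentrations must be positive and share one unit (e.g., nM).
    """
    points = sorted(zip(concentrations, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi > i_lo:
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_ic50
    return None
```

Duplicate measurements would be averaged before calling `ic50_by_interpolation`. A four-parameter logistic fit is the more common choice when a curve-fitting library is available, and it also uses the points away from the 50% region.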

### Gelatin Zymography Assay to Screen the Efficacy of MMP Inhibitors

Gelatin zymography was performed as described previously (Kupai et al., 2010; Bencsik et al., 2017). MMP-2 was isolated from rat heart homogenates as follows: 50 µg protein/lane was loaded and separated by electrophoresis under non-reducing conditions on 8% SDS-polyacrylamide gels copolymerized with 2 mg/ml gelatin from porcine skin (Sigma-Aldrich; St. Louis, MO). After electrophoresis, gels were washed in 2.5% Triton-X 100 with gentle agitation and then incubated for 20 h at 37 °C in zymography development buffer (50 mM Tris-HCl, pH 7.5, containing 5 mM CaCl<sub>2</sub>, 200 mM NaCl) in the presence or absence of the MMP inhibitor compounds. Zymographic gels were stained in a 0.05% Coomassie Brilliant Blue R-250 solution followed by destaining, and then the zymograms were scanned. MMP activity was detected as a colorless transparent zone on a blue background, and the clear bands in the gel were quantified by densitometry using the Quantity One software (Bio-Rad, Hercules, CA). Percentage of inhibition values were then calculated from the measured density values.
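The conversion from band density to percentage of inhibition is not spelled out in the text. A minimal sketch, assuming inhibition is expressed relative to the band developed without inhibitor, is given below. The function name and the density values are hypothetical.

```python
def percent_inhibition(control_density, treated_density):
    """Percent inhibition relative to an inhibitor-free control band.

    Assumption: band density scales with gelatinolytic activity, so a
    weaker band in the presence of inhibitor means higher inhibition.
    """
    if control_density <= 0:
        raise ValueError("control density must be positive")
    return 100.0 * (1.0 - treated_density / control_density)

# Hypothetical densitometry readings (arbitrary units).
print(percent_inhibition(control_density=2000.0, treated_density=500.0))
```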

### Cytoprotective Effect of MMP Inhibitor Compounds in Neonatal Rat Cardiac Myocytes Subjected to Simulated Ischemia/Reperfusion (SI/R)

#### Simulated Ischemia/Reperfusion Injury Under Hypoxic Conditions

For our cell viability experiments, 3-day-old cardiomyocytes plated onto 24-well plates were tested under normoxic conditions or subjected to simulated ischemia (SI). The normoxic cardiomyocytes were kept under normoxic conditions, i.e., the growth medium was changed to a normoxic solution (in mM: NaCl 125, KCl 5.4, NaH<sub>2</sub>PO<sub>4</sub> 1.2, MgCl<sub>2</sub> 0.5, HEPES 20, glucose 15, taurine 5, CaCl<sub>2</sub> 1, creatine 2.5, BSA 0.1%, pH 7.4, 310 mOsm/l) (Li et al., 2004) and the cells were incubated under 95% air and 5% CO<sub>2</sub> at 37 °C for 2.5 h. In the second series of experiments, cardiac myocytes were subjected to SI by incubating the cells in hypoxic solution (in mM: NaCl 119, KCl 5.4, MgSO<sub>4</sub> 1.3, NaH<sub>2</sub>PO<sub>4</sub> 1.2, HEPES 5, MgCl<sub>2</sub> 0.5, CaCl<sub>2</sub> 0.9, Na-lactate 20, BSA 0.1%, 310 mOsm/l, pH 6.4) (Li et al., 2004) and placing the plates in a humidified 37 °C hypoxic chamber exposed to a constant flow of a mixture of 95% N<sub>2</sub> and 5% CO<sub>2</sub> for 4 h. The cells were subjected to the following treatments during the SI or normoxic protocol: vehicle control or MMP inhibitors at different doses calculated according to IC doses in vitro. Normoxic and SI treatments were followed by 2 h reoxygenation with growth medium, with administration of the same dose of compounds as during normoxia or SI, and superfusion with 95% air and 5% CO<sub>2</sub> at 37 °C (**Figure 3**).

#### Cell Viability Assay

Cell viability was assessed by a calcein and propidium iodide (PI) assay performed in each group after 2 h reoxygenation. Briefly, the growth medium was removed, and the cells were washed twice with PBS and then incubated with calcein (1 µM) for 30 min. The calcein solution was then replaced with fresh PBS and the fluorescence intensity of each well was detected by a fluorescent plate reader (FluoStar Optima, BMG Labtech, Ortenberg, Germany). Fluorescence intensity was measured in well scanning mode (scan matrix: 10 × 10; scan diameter: 10 mm; bottom optic; no. of flashes/scan point: 3; temp: 37 °C; excitation wavelength: 490 nm; emission wavelength: 520 nm). Then the PBS was removed and the cells were incubated with PI (50 µM) and digitonin (10<sup>−4</sup> M) (Sigma-Aldrich; St. Louis, MO) for 7 min. Following that, the PI solution was replaced with fresh PBS and fluorescence intensity was detected using the same settings (excitation wavelength: 544 nm; emission wavelength: 610 nm). Background fluorescence intensity (cells without staining) was subtracted from the calcein fluorescence intensity (reflecting the live cell population), the result was divided by the PI fluorescence intensity (reflecting total cell count), and the average intensity of each group was plotted. The cytoprotective effect of different compounds was compared to simulated ischemic control groups.

FIGURE 3 | Experimental protocol for cell culture studies and for the ex vivo rat heart model of AMI. (A) Isolated neonatal rat cardiac myocytes were subjected to 4 h of simulated ischemia followed by 2 h of simulated reperfusion. At the end of the reperfusion, cell viability was determined by using calcein fluorescence. (B) Isolated adult rat hearts were perfused according to Langendorff and a 30-min global, no-flow ischemia was applied after a 20 min equilibration period. Subsequently, 2 h reperfusion was applied and then infarct size was determined. The hearts were perfused with Krebs-Henseleit solution containing lead candidates or vehicle from 20 min prior to the global ischemia until the 60th min of reperfusion.
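The calcein/PI normalization described in the cell viability assay can be sketched as follows. This is an illustrative reading of the procedure (background-corrected calcein signal divided by PI signal, then averaged per group); the function names and well readings are hypothetical.

```python
from statistics import mean

def viability_index(calcein, pi, background):
    """Background-corrected calcein signal normalized to the PI signal."""
    if pi <= 0:
        raise ValueError("PI intensity must be positive")
    return (calcein - background) / pi

def group_average(wells, background):
    """Average viability index over the wells of one treatment group."""
    return mean(viability_index(c, p, background) for c, p in wells)

# Hypothetical (calcein, PI) readings for three wells of one group.
wells = [(1200.0, 400.0), (1000.0, 400.0), (1400.0, 400.0)]
print(group_average(wells, background=200.0))
```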

#### Myocardial Infarction in Isolated Rat Heart Ex Vivo Global Ischemia/Reperfusion Injury

Our experiment conforms to the National Institutes of Health Guide for the Care and Use of Laboratory Animals (NIH Pub. No. 85-23, Revised 1996) and also to the EU directive guideline for the care and use of laboratory animals published by the European Union (2010/63/EU) and was approved by the local ethics committee of the University of Szeged. Eight- to ten-week-old male Wistar rats weighing 300–350 g (Toxicoop Ltd., Budapest, Hungary) were anesthetized intraperitoneally with 60 mg/kg pentobarbital sodium (Euthasol, Produlab Pharma, Raamsdonksveer, The Netherlands). After administration of 500 U/kg heparin through the femoral vein, the heart was isolated and perfused according to Langendorff with oxygenated Krebs-Henseleit buffer at 37 °C as previously described (Turan et al., 2006). Briefly, hearts were subjected to 10 min aerobic perfusion for equilibration and stabilization of heart function and then to 30 min of global ischemia followed by 120 min of reperfusion. Global ischemia was induced by setting a stopcock (B/Braun, Melsungen, Germany) to the closed position, and reperfusion was achieved by returning the stopcock to the original (perfusion) position. Heart rate and coronary flow were monitored throughout the perfusion protocol. All the test compounds, their vehicle (DMSO, <0.1% in Krebs-Henseleit solution), and the positive control PD166793 (Tocris Bioscience, Cat. No. 2520; Bristol, UK) were applied 20 min before the onset of global ischemia and maintained until the 30th min of reperfusion (**Figure 3**).

#### Determination of Myocardial Infarct Size

At the end of the 2-h reperfusion, the right ventricle was removed, hearts were frozen, cut into six 1-mm-thick slices, and incubated in 1% triphenyl-tetrazolium chloride (Sigma-Aldrich; St. Louis, MO) at 37 °C to delineate infarcted tissue. Slices were then fixed and quantified by planimetry using Infarctsize™ 2.5 software (Pharmahungary, Szeged, Hungary) (Fekete et al., 2013). Infarct size was expressed as a percentage of the left ventricle.

#### Statistical Analysis

Data were expressed as mean ± SEM. Cell viability was expressed as % of the vehicle-treated groups. Data were compared to vehicle using ANOVA followed by post-hoc tests, e.g., Tukey or Fisher LSD test.
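For readers who want to reproduce the summary statistic, a minimal sketch of the mean ± SEM calculation is shown below, using the sample standard deviation divided by the square root of the group size. The helper name and the example values are hypothetical.

```python
import math
import statistics

def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed for the SEM")
    # statistics.stdev uses the n - 1 (sample) denominator.
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)

# Hypothetical viability values (% of vehicle) for one group.
group = [10.0, 12.0, 14.0, 16.0]
m, sem = mean_sem(group)
print(f"{m:.1f} ± {sem:.2f}")
```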

#### RESULTS

### Focused Library Design and MTS Screening

Since hydroxamic acids are reported as the primary zinc-binding motif, we intended to select such a library from the AMRI 200,000-member non-exclusive compound repository as a starting point of our drug discovery efforts. Since only a few compounds were available in the repository as hydroxamic acids and the conversion of acids to hydroxamic acids was not applicable to HT parallel synthesis, we decided first to select free acids from AMRI's compound collection. This selection supported our initial hypothesis, since acids are considered weaker Zn<sup>2+</sup> chelators than hydroxamic acids, which might be beneficial for achieving selectivity, and in addition could be considered a good indicator of MMP-2 inhibitory activity. The substructure search resulted in 3600 acids, which were further focused by chemoinformatics methods to a small diverse subset of 259 compounds, in which the compounds hold various motifs around a central core, reflecting the typical MMP inhibitor architecture described above (see **Figure 2**). The selected acids were screened in a fluorescent assay using a recombinant human MMP-2 catalytic fragment and a synthetic peptide substrate. Ilomastat (a non-selective MMP inhibitor) was used for the validation of the assay and in each subsequent experiment as a control compound. The selected compounds (259) were first tested using single point measurements at 10 µM concentration; 6 compounds showed > 70% inhibition, 7 compounds between 60–70%, and 12 compounds between 50–60%. The accumulated hit rate was 10%. We attempted to convert the primary acid hits (12) to hydroxamic acids. Since two reactions failed, 10 hydroxamic acids were prepared successfully for comparative MMP-2 screening. The hydroxamic acid pairs of the previously measured acids were then tested. Comparing the inhibitory activity of the acids and hydroxamic acids, we made an unexpected discovery: five acids showed higher inhibition than the corresponding hydroxamic acids during catalytic fragment measurement, and three of them belonged to the same chemotype, thiazolyl-carboxylic acid (**Table 1**).
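The accumulated hit rate quoted above follows directly from the three inhibition bins of the single-point screen. The short sketch below reproduces the arithmetic; the variable names are illustrative only.

```python
# Compounds per inhibition bin at 10 µM, as reported in the text.
hits_per_bin = {">70%": 6, "60-70%": 7, "50-60%": 12}
screened = 259

total_hits = sum(hits_per_bin.values())
hit_rate_pct = 100.0 * total_hits / screened
print(f"{total_hits} hits of {screened} screened = {hit_rate_pct:.1f}%")
```

The 25 compounds with at least 50% inhibition correspond to roughly 9.7% of the screened set, which the text rounds to 10%.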

Furthermore, we found that changing the thiazole ring (MMPI-1157) to the isosteric imidazole (MMPI-1154) increased the selectivity to 1.5-fold over MMP-1 (**Table 2**), while the overall inhibitory profile was similar. The 3D similarity score was also high (3D-T = 0.85). The thiazole-imidazole replacement also made the compounds less lipophilic (cLogP was reduced from 3.3 to 2.9). Interestingly, 4- (or para)-fluoro-phenyl substitution in the shorter side chain (MMPI-1157, 1154, 1260, 1248) is favored over the 3- (or meta)-fluoro-phenyl substitution: it showed higher selectivity and MMP-2 inhibitory effect, even though the 3D similarity scores were high. The 4-benzyl-phenyl ether or 4-pyridyl-phenyl ether side chain was also favored over the other groups in the longer side chain. On the other hand, replacing the benzene ring with pyridine in the shorter side chain reduced the MMP-9 inhibition significantly, and thus MMP-2/9 selectivity was increased (MMP-9 inhibition: MMPI-1252, 1253 ≥ 500 µM). One compound (MMPI-1140) that lacks the heterocyclic ring but contains the corresponding side chains showed a similar activity profile to the parent thiazole carboxylic acid, MMPI-1133, even though the 3D similarity alignment was relatively low (0.56). In summary, the entire screening cascade (**Figure 2**), including library design, selection, virtual screening, and in vitro biological screening, resulted in novel thiazole/imidazole carboxylic acid chemotypes, which could be suitable starting points for further structure-based optimization.

As a next step we started to explore the chemical space around this chemotype using 2D/3D structure-based in silico methods. First, a 568-membered focused library was generated in silico around the AMRI library hits, including their bioisosteres and some simplified analogs, and then the library members were docked to the 3D model of MMP-2.

TABLE 1 | Comparing the inhibitory activity of the acids and hydroxamic acids.

Virtual 3D docking of potential MMP inhibitors was executed using GOLD. The protein structure coordinates were obtained from the Protein Data Bank using the highest available resolution (preferably co-crystallized with ligand). For MMP-2 we used the structure 1QIB (Dhanaraj et al., 1999). The region of interest used for GOLD docking was defined as all the protein residues within a 19 Å radius sphere centered on the zinc ion in the catalytic center. GOLD default parameters were used, which were set to 200,000. The complexes were submitted to 20 genetic algorithm runs using the GOLDScore fitness function.
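The geometric criterion behind the docking region (all residues within 19 Å of the catalytic zinc) can be illustrated with a short, library-free sketch. GOLD applies its own site definition internally, so this is only an illustration of the distance test; the residue labels and coordinates below are hypothetical and are not taken from the 1QIB structure.

```python
import math

def residues_within(atoms, center, radius):
    """Return sorted residue labels having at least one atom within `radius` of `center`.

    atoms: iterable of (residue_label, (x, y, z)) tuples, coordinates in Å.
    """
    selected = set()
    for residue_label, xyz in atoms:
        if math.dist(xyz, center) <= radius:
            selected.add(residue_label)
    return sorted(selected)

# Hypothetical atoms and zinc position (Å), for illustration only.
atoms = [
    ("HIS201", (1.0, 2.0, 3.0)),
    ("GLU202", (10.0, 0.0, 0.0)),
    ("LEU300", (25.0, 0.0, 0.0)),
]
zinc_xyz = (0.0, 0.0, 0.0)
site = residues_within(atoms, zinc_xyz, radius=19.0)
print(site)
```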

As a result, 45 compounds were considered as virtual hits (docking score > 70) and proposed for chemical synthesis. The synthesizable compound set was completed with several close analogs by rational design. For instance, in order to increase the solubility of the compounds, the benzene ring was successively replaced with pyridine (see MMPI-1252, 1253, 1248, and 1260). Altogether 30 compounds were successfully synthesized for screening combining the in silico hits and the additional designed compounds.

The compounds were tested against MMP-1, -2, -9, and -13 to determine their inhibitory profile. The Efficiency Index amplifies the two major required effects: selectivity against MMP-1 and inhibitory activity.

**Table 2** shows the IC<sub>50</sub> values of the hit compounds (hit criterion: 100% MMP-2 inhibition at 100 µM). The GOLD docking scores are shown for those hits that came from virtual screening.

In addition, 3D flexible alignment studies were performed between the novel hit compounds and the best hit of the initial AMRI library (AMRI-101A/MMPI-1157). The quality of the alignment was characterized by 3D similarity scores (3D Tanimoto coefficient, ChemAxon Screen3D software). It was postulated that a high 3D similarity score could reveal a similar conformation and binding mode, which could result in similar bioactivities. Finally, cLogP was calculated for each compound. Lower values indicate lower lipophilicity, which is expected to accelerate passage through the cell membrane, leading to higher bioavailability.
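For orientation, a Tanimoto coefficient compares the shared part of two objects with their combined extent. The sketch below uses the generic overlap form, T = O<sub>AB</sub> / (O<sub>AA</sub> + O<sub>BB</sub> − O<sub>AB</sub>); it is an assumption that this form is a fair illustration of the score, and the exact scoring implemented in Screen3D may differ. The function name and overlap values are hypothetical.

```python
def tanimoto(overlap_ab, self_a, self_b):
    """Generic Tanimoto coefficient from pairwise and self-overlap terms."""
    denominator = self_a + self_b - overlap_ab
    if denominator <= 0:
        raise ValueError("overlap terms are inconsistent")
    return overlap_ab / denominator

# Hypothetical overlap volumes (arbitrary units) for two aligned molecules.
print(tanimoto(overlap_ab=85.0, self_a=100.0, self_b=85.0))
```

A value of 1 means the two aligned objects coincide completely, and values near 0 mean they share almost nothing.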

MMPI-1154 was investigated more deeply in 3D docking studies. **Figure 4** shows the interaction of the compound with the active site of MMP-2. In MMPI-1154 (containing an imidazole-carboxylic acid moiety), the acid residue had a chelating interaction with the Zn<sup>2+</sup> ion, with the contribution of one of the N heteroatoms of the heterocyclic ring. This relatively weak Zn<sup>2+</sup> chelation dynamically and statistically gives an allosteric binding feature to this inhibitor.

#### The Effect of MMP Inhibitors on Cardiac MMP-2 Activity Measured by Zymography

To confirm the MTS screen results, we tested the potential MMP inhibitor molecules on MMP-2 enzyme isolated from rat heart in vitro. Therefore, we applied the MMPIs at 1 and 100 µM final concentration in the enzyme's development buffer (**Table 3**).

TABLE 2 | Results of thiazole carboxylic acid (TCA) and imidazole carboxylic acid (ICA) compounds and related analogs.



#### Cardio-Cytoprotection by MMPIs in Cell Culture Model of I/R Injury

Some doses of MMPIs affected cell viability significantly in normoxic conditions (Supplementary Figure 4). Since the vehicle for MMPIs was DMSO, aerobic cardiac myocytes were treated with 0.1% (v/v%) DMSO and their viability was also assessed. Vehicle treatment did not affect cell viability in comparison to non-treated cardiomyocytes (Supplementary Figure 5).

Hypoxia is one of the numerous influences on cardiac matrix remodeling, via ECM turnover and induction of MMPs. In addition, I/R injury is also a critical modulator of MMP expression through alternative mechanisms (Jun et al., 2011).

The 4-h hypoxic exposure and 2-h reoxygenation caused marked cell death (Supplementary Figure 5), which was attenuated by MMPI treatment. To investigate whether MMPI treatment influences cardiac myocyte survival after simulated I/R, we selected 6 MMPIs that were available at that time and that showed a significant MMP inhibitory effect during prescreening. We tested those compounds in cultured neonatal cardiac myocytes subjected to simulated I/R. Ilomastat served as positive control (Supplementary Figure 6). The tested compounds showed significant cytoprotection, between 17 and 47% (**Figure 5**). The supplementary figures show all inhibitor testing data (Supplementary Figure 4).

#### Cardioprotection by MMPI-1154 in Isolated Rat Heart Model of I/R Injury

Finally, based on the results of cell culture experiments, we selected the most potent cardioprotective compound, MMPI-1154, for testing in an isolated rat heart model of AMI. MMPI-1154 reduced myocardial infarct size significantly at 1 µM as compared to the vehicle-treated group (**Figure 6**).

#### DISCUSSION

In our study, we have successfully demonstrated the development of a novel, selective MMP-2 inhibitor for cardioprotection, from in silico compound library selection through to the testing of the most promising compound against acute myocardial infarction in an isolated rat heart model. We found that the MMP-inhibiting effects of imidazole and thiazole carboxylic acid-based compounds are superior to those of the conventional hydroxamic acid type derivatives of the same molecules. We have thus shown for the first time in the literature that the acute application of MMPI-1154 (an imidazole carboxylic acid-based compound) has a protective effect for the heart against acute myocardial infarction. We achieved ex vivo cardioprotection via a moderate MMP-2 inhibition, since MMPI-1154 was applied at around the concentration of its IC<sub>20</sub> value.

### MMP Inhibitor Development Strategy

Currently, ∼500 papers from the last 2 decades investigating the role of MMP inhibition in myocardial ischemia are available in the PubMed database. Several papers describe the non-zinc binding, allosteric (e.g., π-π stacking) interactions of MMP-2 with selected inhibitors (Di Pizio et al., 2013; Agamennone et al., 2016; Ammazzalorso et al., 2016; Adhikari et al., 2018). Most of these papers employ MMP-2 as a potential biomarker for ischemic heart diseases or as a therapeutic target to evoke cardioprotection. However, early clinical trials targeting MMP-2 for improving cardiovascular outcomes after acute myocardial infarction have failed (e.g., PREMIER study, Hudson et al., 2006). The likely reasons for failure were the low selectivity of the applied MMP inhibitors as well as the chronic and relatively high-dose administration regimen. Therefore, in our present study, we aimed to develop novel MMP-2 inhibitor lead candidates that possess high selectivity and lead only to a moderate MMP-2 inhibition, in accordance with our previous findings (Giricz et al., 2006; Bencsik et al., 2014).

### Novel Structural Findings Regarding MMP-2 Inhibitor Development

Several hydroxamic acid compounds are known as nonselective MMP inhibitors. Therefore, we started our inhibitor development by selecting hydroxamic acid compounds from the AMRI library. We also selected their carboxylic acid derivatives. We identified thiazole- and imidazole-substituted carboxylic acid molecules whose MMP-2 inhibitory effect was superior to that of the corresponding hydroxamic acid derivatives. Furthermore, we found that changing the thiazole ring (MMPI-1157) to the isosteric imidazole (MMPI-1154) increased the selectivity over MMP-1, although the overall inhibitory profile and the structure were similar. This feature was an advantageous factor during the molecular design process, since MMP-1 inhibition was responsible for the development of musculoskeletal syndrome, the most severe adverse effect of early MMP inhibitors.

The relatively weak Zn<sup>2+</sup> chelation derived from the imidazole-carboxylic acid moiety interacting with the Zn<sup>2+</sup> ion dynamically and statistically gave an allosteric binding feature to MMPI-1154. It is also assumed that the additional electron-donating heteroatom in close proximity to the acid moiety (thiazole/imidazole ring) would also contribute to the chelation of the Zn<sup>2+</sup> ion. The bulky side chain sits deep inside the S1' pocket as expected, although some rotational movements would be permitted around the central tertiary N atom. This option would allow different binding modes and activity profiles as well.


Most importantly, replacing the phenyl ring with a pyridine moiety at the end of the longer side chain, which occupies the S1' pocket, increased the selectivity of the inhibition for MMP-2 over MMP-1 (MMPI-1260, 1248). This is most likely due to the increased polarity of the tail group (such as pyridine), which is exposed to the aqueous environment at the end of the S1' pocket. Similar compounds are described in Duan et al. (2007), where non-zinc chelating MMP-2 inhibitors with a similar bulky side chain were reported. This finding supported our hypothesis that weak or negligible Zn<sup>2+</sup> chelation combined with bulky and partially polar side chains leads to selective and active MMP-2 inhibitors. The phenyl-pyridine exchange is also beneficial for cell penetration, since the calculated octanol-water partition coefficient (cLogP) decreased by one order of magnitude. This change did not cause significant conformational changes: the 3D similarities remained high between these compounds and the initial hit (MMPI-1157).

In conclusion, the biological data and the docking studies, together with the 3D alignment modeling, confirmed that these chemotypes represent a novel promising class of MMP-2 inhibitors. The bulky groups together with a weaker Zn<sup>2+</sup>-chelating carboxylic acid residue allowed us to achieve low micromolar MMP-2 inhibition, often together with an apparent selectivity against MMP-1. Finally, all the hit compounds meet the drug-likeness criteria (Lipinski's Rule of Five), which predicts good developability.
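The Rule of Five mentioned above is a simple threshold check on four descriptors: molecular weight no greater than 500 Da, calculated logP no greater than 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. A minimal sketch is given below; it assumes the descriptors have already been computed by a cheminformatics tool, and the example values are hypothetical rather than those of any compound in this study.

```python
def lipinski_violations(mol_weight, clogp, h_donors, h_acceptors):
    """Return the list of Rule-of-Five criteria that a compound violates."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if clogp > 5:
        violations.append("cLogP > 5")
    if h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations

# Hypothetical descriptor values, for illustration only.
print(lipinski_violations(mol_weight=420.5, clogp=2.9, h_donors=1, h_acceptors=6))
```

An empty list means that all four criteria are met; in common practice a compound is still regarded as drug-like with at most one violation.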

#### Screening Cascade

After the chemical optimization of the novel MMP inhibitor lead candidates, we determined their IC<sub>50</sub> values by using gelatin zymography. During zymographic analysis, we used full-length, active MMP-2 enzymes isolated from healthy young adult rat hearts. Subsequently, the cardio-cytoprotective effects of the selected candidates having the lowest IC<sub>50</sub> values for MMP-2 were tested in cultured neonatal cardiac myocytes subjected to simulated I/R injury. The cardiac myocyte cell culture assay allowed relatively high-throughput biological efficacy testing (Gorbe et al., 2010) of the selected lead candidates in several dose ranges at different levels of inhibition of MMP-2 activity. Our cell culture test system revealed several biologically efficacious doses beyond the IC<sub>50</sub> values of the selected lead candidates (see Supplementary Figure 1 for details).

series of experiments are presented in the case of all compounds (for more detailed results, see Supplementary Figure 4).

#### Cardio-Cytoprotection by MMPI-1154

Based on the results of the abovementioned cell culture experiments, we selected MMPI-1154 (the lead candidate), which showed the highest increase in cell viability during simulated I/R experiments. We then tested it for cardioprotection in an ex vivo rat heart model of acute myocardial infarction. To approximate the moderate 20% inhibition of MMP-2 activity by MMPI-1154 (based on our previous findings, Giricz et al., 2006; Bencsik et al., 2014) in the ex vivo model of AMI, we used the 1 µM concentration (IC<sub>20</sub> value) instead of the most effective 2.5 µM concentration (IC<sub>50</sub> value) seen during cell culture experiments. Although MMPI-1154 is not highly selective for MMP-2, it seems to be one of the most efficient MMP-2 inhibitors, as shown in **Table 2** (efficiency index). In the present study, the in silico and subsequent in vitro chemical efficiency has been confirmed in the isolated heart experiments, since MMPI-1154 at 1 µM showed a significant cardioprotective effect by decreasing myocardial infarct size during acute global ischemia/reperfusion injury. Further research in in vivo models of AMI can shed light on its cardioprotective properties as well as on its safety derived from the optimal selectivity toward different MMP isoforms.

### CONCLUSIONS

This is the first demonstration that imidazole and thiazole carboxylic acid-based compounds are more efficacious than their hydroxamic acid derivatives in MMP-2 inhibition. MMPI-1154 is a promising novel cardio-cytoprotective imidazole-carboxylic acid MMP-2 inhibitor lead candidate for the treatment of acute myocardial infarction.

### AUTHOR CONTRIBUTIONS

PB: Study management and paper writing; KK: Gelatin zymography and ex vivo experiments; AG and ZV: Cell culture experiments; ÉK: Paper writing, statistics, data, and figure management; JP and RG: Cell culture and ex vivo experiments; LK: In silico drug design, synthesis of analogs; LW: MMP inhibitor design, medical chemistry advising; FT: Synthesis of analogs; IH, GF, SC, and LB: Molecular docking and high throughput screening; TC and CC: Supervising zymography and ex vivo experiments; GD: Supervising in silico and HT screening, granting, paper writing; PF: Supervising whole project, paper writing, and granting.

#### ACKNOWLEDGMENTS

This work was supported by the National Research, Development and Innovation Office (TÉT\_15\_IN-1-2016-0068) and by the former National Development Agency (NKFP\_06\_A1-MMP\_2006). PB was supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences. We are grateful to ChemAxon Ltd. (Budapest) for providing the Screen3D software for the present study and to Miklós Szabó and Adrián Kalászi for their advice.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.00296/full#supplementary-material

#### REFERENCES

and 13. ChemMedChem 11, 1892–1898. doi: 10.1002/cmdc.201600266


chain in isolated rat hearts subjected to ischemia-reperfusion injury: a new intracellular target for matrix metalloproteinase-2. Circulation 112, 544–552. doi: 10.1161/CIRCULATIONAHA.104.531616


Yamamoto, M., Tsujishita, H., Hori, N., Ohishi, Y., Inoue, S., Ikeda, S., et al. (1998). Inhibition of membrane-type 1 matrix metalloproteinase by hydroxamate inhibitors: an examination of the subsite pocket. J. Med. Chem. 41, 1209–1217. doi: 10.1021/jm970404a

**Conflict of Interest Statement:** PB and AG are employed by and PF is the CEO of Pharmahungary 2000 Ltd; TC and CC were employed by Pharmahungary 2000 Ltd; FT was employed by and LK was the CEO of Infarmatik Ltd; LW is the CEO of OntoChem GmbH; IH, GF, and GD are employed by and SC is the CEO of Targetex Ltd; PF, TC, CC, KK, LK, FT, SC, IH, GD, and AG are the inventors of the patent WO\_2012/080762\_A1 and related national patents now assigned by Pharmahungary and TargetEx.

The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Bencsik, Kupai, Görbe, Kenyeres, Varga, Pálóczi, Gáspár, Kovács, Weber, Takács, Hajdú, Fabó, Cseh, Barna, Csont, Csonka, Dormán and Ferdinandy. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# In Silico Discovery of Plant-Origin Natural Product Inhibitors of Tumor Necrosis Factor (TNF) and Receptor Activator of NF-κB Ligand (RANKL)

Georgia Melagraki<sup>1</sup>, Evangelos Ntougkos<sup>2</sup>, Dimitra Papadopoulou<sup>2,3</sup>, Vagelis Rinotas<sup>2,4</sup>, Georgios Leonis<sup>5</sup>, Eleni Douni<sup>2,4</sup>, Antreas Afantitis<sup>2,5</sup>\* and George Kollias<sup>2,3</sup>\*

<sup>1</sup> Hellenic Military Academy, Vari, Greece, <sup>2</sup> Division of Immunology, Biomedical Sciences Research Center "Alexander Fleming," Vari, Greece, <sup>3</sup> Department of Experimental Physiology, Medical School, National and Kapodistrian University of Athens, Athens, Greece, <sup>4</sup> Department of Biotechnology, Agricultural University of Athens, Athens, Greece, <sup>5</sup> NovaMechanics Ltd., Nicosia, Cyprus

#### Edited by:

Adriano D. Andricopulo, Universidade de São Paulo, Brazil

#### Reviewed by:

Ana Carolina Rennó Sodero, Universidade Federal do Rio de Janeiro, Brazil

Mohammad Hassan Baig, Yeungnam University, South Korea

#### \*Correspondence:

Antreas Afantitis, afantitis@novamechanics.com

George Kollias, kollias@fleming.gr

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 17 April 2018; Accepted: 03 July 2018; Published: 25 July 2018

#### Citation:

Melagraki G, Ntougkos E, Papadopoulou D, Rinotas V, Leonis G, Douni E, Afantitis A and Kollias G (2018) In Silico Discovery of Plant-Origin Natural Product Inhibitors of Tumor Necrosis Factor (TNF) and Receptor Activator of NF-κB Ligand (RANKL). Front. Pharmacol. 9:800. doi: 10.3389/fphar.2018.00800

An in silico drug discovery pipeline for the virtual screening of plant-origin natural products (NPs) was developed to explore new direct inhibitors of TNF and its close relative receptor activator of nuclear factor kappa-B ligand (RANKL), both representing attractive therapeutic targets for many chronic inflammatory conditions. Direct TNF inhibition through identification of potent small molecules is a highly desired goal; however, it is often hampered by severe limitations. Our approach yielded a priority list of 15 NPs as potential direct TNF inhibitors that were subsequently tested in vitro against TNF and RANKL. We thus identified two potent direct inhibitors of TNF function with low micromolar IC<sub>50</sub> values and minimal toxicity even at high concentrations. Most importantly, one of them (A11) proved to be a dual inhibitor of both TNF and RANKL. Extended molecular dynamics simulations with the fully automated EnalosMD suite rationalized the mode of action of the compounds at the molecular level. To our knowledge, these compounds constitute the first NP TNF inhibitors, one of which is the first NP small-molecule dual inhibitor of TNF and RANKL, and could serve as lead compounds for the development of novel treatments for inflammatory and autoimmune diseases.

Keywords: direct TNF inhibitors, RANKL inhibitors, natural products, autoimmune diseases, virtual screening, molecular dynamics

### INTRODUCTION

Tumor necrosis factor (TNF) is an important human cytokine (Beutler et al., 1985) that is involved in a number of critical biological processes and diseases, including rheumatoid arthritis, Crohn's disease, multiple sclerosis, inflammatory bowel disease, psoriatic arthritis, AIDS, and cancer (Kollias et al., 1999; Apostolaki et al., 2010). Disruption of TNF binding to its principal receptor, TNFR1, has been a long-desired goal in the development of novel autoimmune therapeutics (Douni and Kollias, 1998; Kollias and Kontoyiannis, 2002). Previous in vivo studies from our group demonstrated that deregulated TNF production induces chronic polyarthritis in a transgenic animal model and the disease could be treated by proper anti-TNF therapy (Keffer et al., 1991). These research efforts were vital in directing the attention of the pharmaceutical industry to initial anti-TNF approaches, which eventually resulted in clinical trials that were successfully performed for a variety of chronic inflammatory diseases, including rheumatoid arthritis (Elliott et al., 1993), psoriasis, psoriatic arthritis, Crohn's disease, juvenile idiopathic arthritis, spondyloarthritis, and Behçet's disease (Sfikakis, 2010).

To date, three synthetic antibodies that block the activity of TNF have been reported, namely infliximab, adalimumab, and etanercept (Olsen and Stein, 2004). However, these expensive agents are frequently used as secondary options for patients with a poor response to regular anti-rheumatic drugs (Chaudhari et al., 2016). Moreover, biologics are associated with several other drawbacks, including high cost, inadequate clinical response, need of intravenous administration, as well as increased risk of tuberculosis and hepatitis B due to the lowered immune response. Therefore, there is a clear need for orally available, well-tolerated, inexpensive drugs that block the production of TNF associated with pathological inflammation in rheumatoid arthritis and related conditions. It has been shown that the use of small molecules in direct TNF inhibition represents an attractive alternative that offers significant benefits, such as oral administration, shorter half-lives with reduced immunosuppression, and easier manufacturing at a lower cost (Sfikakis, 2010; Lo et al., 2017; Melagraki et al., 2018).

According to a recent report (Chaudhari et al., 2016), there are no late-stage rheumatoid arthritis products targeting TNF under development. In particular, direct small-molecule inhibition of protein–protein interactions (PPIs), such as the one between TNF and its receptor, is a nontrivial approach in drug development (Sackett and Sept, 2009; Wilson, 2009; David, 2012; Arkin et al., 2014). For this purpose, successful drug design requires the identification of compounds with low molecular weight, which is extremely challenging, especially when attempting to block interactions between large molecules such as proteins (Lo et al., 2017). The identification of small-molecule inhibitors is also hampered by the difficulty of identifying potential "hot spots" as unique binding targets that are crucial for the disruption of biomolecular interactions.

Protein–protein interaction interfaces are mostly flat, extended (approximately 1,500–2,000 Å<sup>2</sup>), solvent-exposed, and characterized by hydrophobic and electrostatic interactions (Jones and Thornton, 1996; Hwang et al., 2010; Sheng et al., 2015). The main difference between PPI interfaces and deep protein cavities, which usually bind small molecules, is their size, with the latter occupying a relatively small area of less than 500 Å<sup>2</sup> (Fuller et al., 2009). Studies on the binding energy distributions over protein interfaces by mutational analyses demonstrated that only specific residues (hot spots) at the PPI interface contribute most of the binding energy, while the majority of PPI-interface residues are not important (Arkin and Wells, 2004). Hot spots were shown to cluster at the middle of the interface, where they form a hydrophobic region similar in size to a small molecule and possess conformational flexibility. The location of hot spots usually coincides with the putative binding sites of the protein, and these sites consist of a number of surface residues, which favorably contribute to small-molecule binding and are also critical in stabilizing PPIs. It has been shown that among all protein residues, these hot-spot regions contribute the major part of the binding energy in a protein–inhibitor complex. Therefore, successful identification of hot spots may offer significant advancements in the rational design of inhibitors (Kozakov et al., 2015a,b).

However, little progress has been made regarding fast and reliable identification of hot spots despite recent advances in high-throughput methodologies (Kouadio et al., 2005; Bakail and Ochsenbein, 2016). Various computational approaches for the recognition of hot spot areas have been developed by several research groups and include methodologies that employ dedicated energy functions (e.g., Rosetta, FoldX, and PCRPi) (Guerois et al., 2002; Kortemme et al., 2004; Guharoy et al., 2011), molecular simulations (Rajamani et al., 2004), computational alanine scanning (Kollman et al., 2000), and machine learning approaches [for instance, HSpred (Lise et al., 2011) and HotPoint (Tuncbag et al., 2010)].

Although PPIs vary in size and shape, the majority of inhibitors usually bind to hot spot regions that are restricted to small binding sites (<1000 Å<sup>2</sup>) (Smith and Gestwicki, 2012; Basse et al., 2013) and partner proteins are defined by short residue sequences at the interface (Perkins et al., 2010; London et al., 2013). An effective PPI inhibitor must possess a large surface area and participate in many hydrophobic interactions with the receptor. However, such a ligand is usually accompanied by high molecular weight and low solubility; therefore, various pharmacokinetic problems may arise (Sheng et al., 2015). Moreover, identifying an adequate starting structure for successful design of small-molecule PPI inhibitors is often hampered by the lack of information about natural PPI inhibitors. To date, most of the published small molecules target TNF indirectly by downregulating its expression, and only a limited number of compounds are reported to directly disrupt the TNF–TNFR1 interaction. These include the polysulfonated naphthylurea suramin and its analogs (Alzani et al., 1993; Mancini et al., 1999) and the indole-linked chromone SPD304 (He et al., 2005), the use of which is hampered by low potency and poor selectivity with a concomitant tendency to cause adverse effects (suramin) (McGeary et al., 2008), and cell toxicity (SPD304) (Sun and Yost, 2008). Moreover, two natural product (NP)-like molecules (Chan et al., 2010), two FDA-approved drugs, namely darifenacin and ezetimibe (Leung et al., 2011), and a metal-based iridium(III) biquinoline complex (Leung et al., 2012) have been identified as direct inhibitors of TNF. Recently, our group with the aid of cheminformatics techniques identified two additional small molecules (T23 and T8) that were shown to directly inhibit TNF function (Melagraki et al., 2017). Importantly, the above compounds were also potent against receptor activator of nuclear factor kappa-B ligand (RANKL) and presented low toxicity. In 2017, another TNF small-molecule inhibitor, JNJ525, was discovered by Blevitt et al. (2017). The mechanism of PPI disruption was attributed to a change in the quaternary structure of the protein induced by an aggregate of JNJ525, such that TNFR1 binding to TNF is blocked.

Drug discovery based on NP-like scaffolds has rapidly advanced through novel computational approaches (Baig et al., 2016; Rodrigues et al., 2016). Recent developments have demonstrated the power of computationally treating complex NP structures to recognize their protein targets and to find specific applications in rational drug design (Reutlinger et al., 2014; Rodrigues et al., 2016; Basith et al., 2018; Lima et al., 2018; Zheng et al., 2018). The abundance of NPs or compounds inspired by NPs as drugs and drug candidates (Lesney, 2004) motivated us to search for novel TNF inhibitors among them. Given the high priority of plant-origin NPs in previous and current drug development efforts (including the terpenoids, e.g., Taxol and steroids, the glycosides, e.g., digitalis and the various flavonoids, and the alkaloids, e.g., camptothecins and the opiates), we focused on identifying novel TNF small molecule inhibitors from plant sources.

#### MATERIALS AND METHODS

In search of plant-origin NPs as direct TNF inhibitors, we combined chemoinformatics techniques, high-throughput virtual screening, and molecular dynamics (MD) simulations with experimental evaluation, ultimately aiming at discovering potent NP inhibitors of TNF function. A total of 3,573 pure NPs of plant origin from the MEGxp database, which is one of the largest chemical libraries of NPs available (AnalytiCon Discovery), were virtually screened; the highest scoring compounds were then tested in vitro to assess their inhibitory activity against TNF.

Our strategy for identifying these novel plant-origin small molecule TNF inhibitors is presented in **Scheme 1**.

#### Molecular Modeling

The initial model of TNF was built from the X-ray co-crystal structure of TNF dimer with SPD304 (PDB code: 2AZ5). All structures were prepared using Molegro's Molecules and Protein Preparation Wizard (Thomsen and Christensen, 2006). Proper bond assignments, bond orders, hybridization, and charges were calculated by Molegro Virtual Docker (MVD) software (version-5.0) (Thomsen and Christensen, 2006). Explicit hydrogen atoms were added and their hydrogen bonding (HB) patterns were also determined by MVD. Since the 3D conformation of SPD304 is known from crystallographic data, a docking template was defined. SPD304 was replaced by each ligand in TNF, and template alignment considered ligands as fully flexible: the docking algorithm recognized the optimal conformation of the ligand when fitting to the template. The MolDock score (GRID) was used as a grid-based scoring function which pre-calculates potential energy values on an evenly spaced cubic grid in order to speed up calculations. A grid resolution of 0.30 Å was set to initiate the docking process and the binding site of the protein was defined to occupy the region surrounding SPD304 in the crystal structure (including residues Ser60, Gln61, Gly121, Tyr151, and Ala156). For the pose generation, the default setting was applied (MolDock SE), namely a maximum of 1500 iterations combined with a population size of 50. If the generated pose has an energy below the predefined energy threshold (100.0 in our study), it is included into the initial population for the "simplex evolution" algorithm (Thomsen and Christensen, 2006). This algorithm performs a combined local/global search on the poses generated by the pose generator. 
The maximum number of iterations of the simplex evolution algorithm (Nelder–Mead simplex minimization) was set to 300, while the neighbor distance factor, which determines how close the points of the initial simplex are to the other randomly selected individuals in the population, was set to 1.0 (which causes the initial simplex to span the neighbor points evenly).
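To illustrate the idea behind a grid-based scoring function, the following minimal Python sketch pre-computes a potential on an evenly spaced cubic grid and then scores poses by trilinear interpolation instead of re-evaluating the potential for every atom. It is not the MolDock implementation: the linear `toy_energy` potential, the grid extent, and all function names are assumptions chosen for illustration; only the 0.30 Å spacing is taken from the text.

```python
import math

def build_grid(energy_fn, origin, n_points, spacing=0.30):
    """Pre-compute energy_fn on an evenly spaced cubic grid (spacing in Å)."""
    ox, oy, oz = origin
    return [[[energy_fn(ox + i * spacing, oy + j * spacing, oz + k * spacing)
              for k in range(n_points)]
             for j in range(n_points)]
            for i in range(n_points)]

def interpolate(grid, origin, spacing, point):
    """Trilinear interpolation of the pre-computed grid at an arbitrary point.

    Points outside the grid are linearly extrapolated from the nearest cell.
    """
    n = len(grid)
    idx, frac = [], []
    for c, o in zip(point, origin):
        t = (c - o) / spacing
        i = min(max(int(math.floor(t)), 0), n - 2)  # clamp to a valid cell
        idx.append(i)
        frac.append(t - i)
    i, j, k = idx
    fx, fy, fz = frac
    value = 0.0
    for di, wx in ((0, 1.0 - fx), (1, fx)):
        for dj, wy in ((0, 1.0 - fy), (1, fy)):
            for dk, wz in ((0, 1.0 - fz), (1, fz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value

def toy_energy(x, y, z):
    # Hypothetical potential used only for this sketch (not a real force field).
    return 2.0 * x - 1.0 * y + 0.5 * z

ORIGIN = (0.0, 0.0, 0.0)
SPACING = 0.30  # grid resolution in Å, as stated in the text
GRID = build_grid(toy_energy, ORIGIN, n_points=11, spacing=SPACING)

def score_pose(atoms):
    """Sum the interpolated grid energy over all atom positions of a pose."""
    return sum(interpolate(GRID, ORIGIN, SPACING, atom) for atom in atoms)

pose_score = score_pose([(0.3, 0.3, 0.3), (0.6, 0.0, 0.9)])
```

The grid is built once per receptor, so each candidate pose costs only a handful of look-ups per atom, which is the speed-up the grid-based variant of the scoring function is meant to provide.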

### In Vitro Testing of TNF Inhibitors

Experiments included a TNF-induced death assay in L929 cells, a measurement of cytotoxicity in L929 cells, and a TNF/TNFR1 ELISA assay. Compounds were tested with respect to TNF using a battery of previously reported assays (Melagraki et al., 2017).

### Osteoclast Differentiation and TRAP Staining

Bone marrow (BM) cells were collected by flushing femurs and tibiae, subjected to gradient purification using Ficoll-Paque (GE Healthcare), plated in 96-well plates at a density of 6 × 10<sup>4</sup> cells per well, and cultured in AMEM medium (GIBCO) containing 10% fetal bovine serum supplemented with 40 ng/ml RANKL (Peprotech) and 25 ng/ml M-CSF (R&D Systems) for 5 days (Douni et al., 2012). Compounds A11 and A25 were pre-incubated with RANKL at various concentrations from 1 to 10 µM in AMEM medium for 1 h at room temperature and then added to cell cultures that were replenished with fresh medium every 2 days. Osteoclasts were stained for tartrate-resistant acid phosphatase (TRAP) activity using a leukocyte acid phosphatase (TRAP) kit (Sigma–Aldrich).

### TRAP Activity Assay

In the TRAP activity assay, BM cells were plated in 96-well plates at a density of 6 × 10<sup>4</sup> cells per well and cultured in AMEM medium (GIBCO) containing 10% fetal bovine serum supplemented with 40 ng/ml RANKL (Peprotech) and 25 ng/ml M-CSF (R&D Systems) for 4 days. Then, cells were lysed in ice-cold phosphate buffer containing 0.1% Triton X-100. Lysates were added to 96-well plates containing phosphatase substrate (p-nitrophenyl phosphate) and 40 mM tartrate acid buffer and incubated at 37 °C for 30 min. The reaction was then stopped with the addition of 0.5 N NaOH. Absorbance was measured at 405 nm on a micro-plate reader (Optimax, Molecular Devices). TRAP activity was normalized to total protein, which was determined using the Bradford assay (Bio-Rad).

#### MTT Viability Assay

Cytotoxicity was evaluated for BM cells using the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay, which measures the ability of viable cells to reduce a soluble tetrazolium salt to an insoluble purple formazan precipitate. BM cells used for MTT assay were seeded at a density of 10<sup>5</sup> cells/well in 96-well plates and incubated with A11 and A25 compounds for 48 h in AMEM containing 10% fetal bovine serum supplemented with 25 ng/ml M-CSF (R&D Systems). After removal of the medium, each well was incubated with 0.5 mg/ml MTT (Sigma–Aldrich) in AMEM serum-free medium at 37 °C for 2 h. At the end of the incubation period, the medium was removed and the intracellular formazan was solubilised with 200 µl DMSO and quantified by reading the absorbance at 550 nm on a micro-plate reader (Optimax, Molecular Devices). Percentage of cell viability was calculated based on the absorbance measured relative to the absorbance of the untreated control.
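The viability calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example absorbance readings are hypothetical, and the optional blank subtraction is an assumption that the text does not mention.

```python
from statistics import mean

def percent_viability(treated_abs, control_abs, blank_abs=0.0):
    """Express treated-well absorbances as a percentage of the untreated control.

    treated_abs: absorbance readings (550 nm) of compound-treated wells
    control_abs: absorbance readings of untreated control wells
    blank_abs:   optional background reading (assumption; defaults to none)
    """
    control_mean = mean(control_abs) - blank_abs
    if control_mean <= 0:
        raise ValueError("control absorbance must exceed the blank")
    return [100.0 * (a - blank_abs) / control_mean for a in treated_abs]

# Hypothetical readings, not data from the study.
viability = percent_viability([0.40, 0.20], [0.80, 0.80])
```

With these made-up readings the two treated wells correspond to about 50% and 25% viability relative to the untreated control.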

### Molecular Dynamics with EnalosMD

Molecular dynamics simulations were performed with our in-house developed EnalosMD suite of programs (EnalosMD, NovaMechanics Ltd., 2018). A fully automated pipeline included the following steps of systems' preparation, MD runs, and analyses:


### RESULTS AND DISCUSSION

The formation of the biologically active TNF homotrimer is prevented by direct TNF inhibitors, such as SPD304, through disruption of the TNF dimer binding to the third subunit (He et al., 2005; Davis and Colangelo, 2012). TNF–inhibitor interactions are hydrophobic and shape-driven, as the inhibitor structure needs to be large enough to interact with both subunits and to prevent binding of the third subunit to the TNF dimer. We explored in silico the 3,573 NPs contained in the MEGxp database using a structure-based docking approach. The crystal structure of TNF dimer with SPD304 (PDB code: 2AZ5) was used as the molecular model for our investigation and the compounds were docked into the protein–protein interface. Computational molecular docking studies were performed using MVD (Thomsen and Christensen, 2006). Based on the docking score and following meticulous visual inspection of the conformations, we generated a shortlist of the top 15 commercially available NPs for in vitro validation.

Our in vitro screening strategy included one of the most commonly used assays of TNF activity. This assay exploits the ability of TNF to induce death in the murine fibrosarcoma cell line L929 following sensitization by the transcription inhibitor actinomycin D. Functional inhibition of TNF by small molecules would result in reduction of the TNF-induced cytotoxicity.

Out of the 15 prioritized NPs mentioned above, two emerged as the most promising ones based on in vitro testing. The action of these two NPs (designated A11 and A25; structures shown in **Figure 1**) was then further characterized. In dose–response experiments, the small molecules were shown to inhibit human TNF-driven death in L929 cells with an IC<sub>50</sub> of 35 ± 3 µM (A11) and 33 ± 2 µM (A25). Both compounds were found to be minimally toxic in these cells (LC<sub>50</sub> > 80 µM), in contrast to the published high toxicity of SPD304 (7.5 µM) (Melagraki et al., 2017). An already approved anti-TNF biologic, adalimumab (HUMIRA, Abbott Laboratories, IL, United States), was used as a positive control of the assay. Adalimumab is a human anti-TNF monoclonal antibody approved by the U.S. Food and Drug Administration (FDA, 2002) and by the European Medicines Agency (EMEA, 2003) for RA treatment. Adalimumab inhibits TNF-driven death in L929 cells with a low IC<sub>50</sub> of 0.5 ± 0.1 nM, without showing any cytotoxicity (**Figure 2**).

FIGURE 5 | Effects of A11 and A25 on the viability of BMMs. BMMs were treated with 10–100 µM of compounds A11 and A25, respectively, in the presence of M-CSF (25 ng/ml) for 48 h. Cytotoxicity was assessed using a MTT colorimetric assay. Cell viability (%) was expressed as a percentage of the untreated control. LC<sub>50</sub> values are given as mean ± SEM from three independent experiments performed in duplicate.

Having established that the selected products can obstruct the function of TNF, and given that TNF exerts its functions primarily through interacting with its receptor, TNFR1, an ELISA-based assay was used to quantify effects on this interaction. Both compounds significantly reduced binding of TNF to TNFR1, with an estimated IC<sub>50</sub> of 3.3 ± 0.9 µM for A11 and 4.1 ± 1.7 µM for A25. Adalimumab was again used as a positive control, eliminating the TNF–TNFR1 binding with a low IC<sub>50</sub> of 0.2 nM (**Figure 3**).
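IC<sub>50</sub> values such as those above are typically read off a fitted dose–response curve. The Python sketch below shows one generic way to do this with a Hill (logistic) model and a simple least-squares grid search. It is an assumption-laden illustration, not the fitting procedure used in this study: the fixed 0–100% plateaus, the parameter ranges, the function names, and the synthetic data are all hypothetical.

```python
import math

def hill_response(conc, ic50, slope, top=100.0, bottom=0.0):
    """Percent of the uninhibited signal remaining at inhibitor concentration conc."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** slope)

def fit_ic50(concs, responses, ic50_bounds=(0.1, 1000.0),
             slope_bounds=(0.5, 3.0), steps=200, slope_steps=50):
    """Least-squares grid search over IC50 (log-spaced) and Hill slope."""
    log_lo, log_hi = (math.log10(b) for b in ic50_bounds)
    best = (float("inf"), None, None)
    for i in range(steps + 1):
        ic50 = 10.0 ** (log_lo + (log_hi - log_lo) * i / steps)
        for j in range(slope_steps + 1):
            slope = slope_bounds[0] + (slope_bounds[1] - slope_bounds[0]) * j / slope_steps
            sse = sum((hill_response(c, ic50, slope) - r) ** 2
                      for c, r in zip(concs, responses))
            if sse < best[0]:
                best = (sse, ic50, slope)
    return best[1], best[2]

# Synthetic, noise-free example data (hypothetical, not measurements from the study).
concs = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0]  # µM
responses = [hill_response(c, ic50=35.0, slope=1.2) for c in concs]
ic50_est, slope_est = fit_ic50(concs, responses)
```

A grid search is used here only to keep the sketch dependency-free; in practice a nonlinear least-squares routine with free plateaus and replicate-based error estimates would be the more usual choice.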

The oligostilbenoid A11 (NP-003410, Ampelopsin H, (1R,2R,6R,6aR,7R,8R,12R,12aR)-1,7-Bis(3,5-dihydroxyphenyl)-2,6,8,12-tetrakis(4-hydroxyphenyl)-1,2,6,6a,7,8,12,12a-octahydrofuro[2″,3″:6′,7′]indeno[1′,2′:2,3]indeno[5,4-b]furan-5,11-diol) is an NP that has been isolated from Parthenocissus tricuspidata, and the glycosyloxyflavone analog A25 (NP-008297, [(2R,3S,4S,5R,6S)-6-[(2S,3R,4R,5R,6S)-2-[5,7-dihydroxy-2-(4-hydroxyphenyl)-4-oxochromen-3-yl]oxy-4,5-dihydroxy-6-methyloxan-3-yl]oxy-3,4,5-trihydroxyoxan-2-yl]methyl (E)-3-(4-hydroxyphenyl)prop-2-enoate) is an NP that has been isolated from Ginkgo biloba (**Figure 1**). Besides being isolated from natural sources, A11 can also be synthesized through a selective functionalization procedure as described by Rodrigues et al. (2016). Compounds A11 and A25 are promising PPI inhibitors as they both have large surface areas and are able to create many hydrophobic contacts at protein interfaces. Moreover, it has been observed that hydrophobic PPI hot-spot pockets tend to be excellent binders of small organic molecules, which combine a largely hydrophobic functionality with a secondary polar component (Guo et al., 2014). Indeed, the polar hydroxyl groups surrounding the hydrophobic core of A11 and A25 (**Figure 1**) constitute structures that are ideal binders to the concave hot-spot area of the protein (Mattos and Ringe, 1996; Shuker et al., 1996). It has been suggested that the ability of a hot-spot pocket to recognize drug molecules (i.e., its druggability) depends on the balance between total surface area and polar/nonpolar contact areas (Hajduk et al., 2005; Cheng et al., 2007; Schmidtke and Barril, 2010).

In comparison to SPD304, NPs A11 and A25 are predicted by the molecular docking study to occupy a similar region in the binding pocket, and to be relatively hydrophobic and large enough to interact with residues from both subunits of the TNF dimer. Nonpolar residues are predominant in the binding site, which mainly includes glycine, leucine, and tyrosine. Only one HB interaction is observed between compound A25 and Tyr151. Both compounds appear to be situated more closely to subunit A than subunit B and are in close contact with the Leu120-Gly121-Gly122 β-strand of subunit A. The lack of salt bridges or extended HB interactions indicates the hydrophobic character of A11 and A25 binding as also observed with SPD304.

The docked SPD304 conformation reproduced its crystal form, with an RMSD of 0.67 Å between the two structures. The docking score of SPD304 binding to TNF was calculated to be −171.08 (arbitrary units), and compounds A11 and A25 showed a binding score of −195.76 and −180.19, respectively, thus suggesting a strong interaction between the compounds and the TNF dimer. The high inhibitory potency of A11 and A25 against TNF was also indicated by our recently developed TNF model, released through the Enalos Cloud platform (Melagraki and Afantitis, 2014). After selecting the corresponding workflow within Enalos Cloud platform (Melagraki et al., 2017), both compounds were submitted and prediction results verified their activity. However, predictions fell out of the model's domain of applicability as expected for these complex structures.
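The redocking check reported above compares the docked pose with the crystallographic pose atom by atom. A minimal Python sketch of such an RMSD calculation is given below; it assumes that both poses are expressed in the same receptor frame (so no superposition is applied) and that atoms are listed in the same order. The coordinates are hypothetical and serve only to exercise the function.

```python
import math

def pose_rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between two poses with matched atom order.

    No alignment is performed: docked and crystal ligands are assumed to share
    the coordinate frame of the receptor.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical heavy-atom coordinates (Å); every atom is shifted by 0.67 Å along x.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
docked_pose = [(x + 0.67, y, z) for x, y, z in crystal_pose]
redocking_rmsd = pose_rmsd(crystal_pose, docked_pose)
```

In this toy case the result is 0.67 Å by construction; for real ligands, symmetry-equivalent atoms would also need to be handled, which this sketch does not attempt.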

Receptor activator of nuclear factor kappa-B ligand (RANKL), another TNF superfamily member, is the main regulator of osteoclast formation and bone resorption (Fuller et al., 1998). We evaluated the effect of various concentrations of A11 and A25 on RANKL-dependent osteoclast differentiation in a culture system of BM-derived monocyte/macrophages (BMMs) stimulated with RANKL (50 ng/ml) and M-CSF (25 ng/ml) for 5 days by evaluating the activity of TRAP, an osteoclast-specific enzyme. A11 fully suppressed RANKL-induced TRAP-positive osteoclast differentiation at 10 µM, whereas A25 was ineffective even at 20 µM (**Figure 4**). Moreover, using a quantitative assay that measures TRAP activity, A11 inhibited RANKL-induced osteoclastogenesis in a dose-dependent manner, displaying an IC<sub>50</sub> of 3.42 ± 0.45 µM (**Figure 4B**). Furthermore, to exclude the possibility that the inhibition of TRAP activity by A11 was due to cytotoxicity, the viability of BMMs was tested with the MTT assay. A11 displayed an LC<sub>50</sub> of 44.76 ± 4.61 µM (**Figure 5**), suggesting that it affects osteoclastogenesis without interfering with cell viability. On the other hand, A25 had no effect on either osteoclastogenesis or BMM viability (LC<sub>50</sub> > 100 µM) (**Figure 5**).

We subsequently investigated the binding of A11 to RANKL using the proposed molecular scaffolds in a structure-based approach. For this purpose, we employed the jFATCAT pairwise structure alignment algorithm (Ye and Godzik, 2003) to align the RANKL structure (PDB code: 1S55) to the crystal structure of TNF dimer with SPD304 (PDB code: 2AZ5). For our computational approach, we employed the murine RANKL model, which shares a 100% identity with human RANKL in the binding site, including residues Trp192, Tyr214, Asn275, Gly277, and Phe279. Also, RANKL shares a high degree of structural similarity with TNF as shown in Supplementary Figure S1. The binding conformations of both NPs and SPD304 are also depicted in the Supporting Information (Supplementary Figure S2). The docking methodology for RANKL systems was identical to the procedure followed for TNF complexes as described in the section "Materials and Methods." The docking score of SPD304 binding to RANKL was calculated to be −159.712 and compounds A11 and A25 showed a binding score of −211.79 and −146.83, respectively. For A11, the computational analysis suggests a strong binding interaction with RANKL, which is in line with the experimental results.

Additionally, we employed our recently developed EnalosMD suite to perform extended MD simulations for A11 and A25 in complexes with TNF and RANKL. EnalosMD automates the preparation of any ligand–protein system and performs MD calculations so that minimal effort is required from the user. This makes it fast and easy to construct the initial model structure and to run robust MD calculations. We therefore carried out four 1000 ns-long MD runs to identify structural and energetic properties of the complexes that may further elucidate the mode of action of the two compounds. EnalosMD combines several computational programs and functionalities (**Figure 6**).

The MD results showed that the protein structures stabilized early in the simulations in all complexes, with RMSD values that do not exceed 3 and 4 Å in the TNF and RANKL complexes, respectively (Supplementary Figure S3). A11 and A25 appear relatively stable within either protein's cavity, with A25 showing only minor structural changes when bound to TNF after 200 ns (**Figure 7**). However, during the first 200–250 ns of the A25–RANKL complex simulation, a noticeable conformational change of A25 stabilized the molecule in a new orientation with respect to the binding site of RANKL (**Figure 7**). This conformational change may have induced considerable flexibility in the B chain terminal residues Tyr187–Asp189, as indicated by further fluctuation calculations (**Figure 8**). Therefore, the experimentally observed lower affinity of A25 for RANKL compared to A11 may be rationalized through the A25-induced destabilization of the terminal region of monomer B. Average conformations of A11 and A25 in their protein targets, along with the protein residues that are involved in dominant HB interactions with the compounds, are shown in **Figures 9**, **10**. The sole interaction between A25 and Tyr151, which was identified by the docking calculations for the TNF complex, is also observed in the MD runs; however, it is complemented by three significant interactions from chain A (**Figure 9**).
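Fluctuation analyses of the kind mentioned above are commonly expressed as a per-atom (or per-residue) root-mean-square fluctuation (RMSF) around the time-averaged position. The Python sketch below shows the calculation on a toy trajectory. It is not the EnalosMD analysis code: the data layout, the function name, and the coordinates are assumptions, and the frames are assumed to be already aligned to a common reference.

```python
import math

def rmsf(trajectory):
    """Per-atom RMSF (Å) for trajectory[frame][atom] = (x, y, z).

    Frames are assumed to be pre-aligned, so only internal motion remains.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for atom in range(n_atoms):
        # Time-averaged position of this atom.
        mean_pos = [sum(frame[atom][d] for frame in trajectory) / n_frames
                    for d in range(3)]
        # Mean squared displacement from the averaged position.
        msd = sum(sum((frame[atom][d] - mean_pos[d]) ** 2 for d in range(3))
                  for frame in trajectory) / n_frames
        values.append(math.sqrt(msd))
    return values

# Toy trajectory (hypothetical): atom 0 is rigid, atom 1 oscillates by ±1 Å along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
fluctuations = rmsf(toy_trajectory)
```

A flexible terminal segment such as Tyr187–Asp189 would show up in this kind of profile as a run of atoms with markedly higher values than the protein core.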

### CONCLUSION

In summary, we have identified and validated experimentally the first plant-origin NPs that act as direct inhibitors of TNF by preventing the PPI between the dimer and the third subunit. Both NPs (A11 and A25) were shown to have IC<sub>50</sub> values comparable to those of SPD304, but presented significantly reduced toxicity. Most importantly, A11 has been validated as the first NP dual inhibitor of TNF and RANKL. Both small molecules possess characteristics that are typical in potent PPI inhibitors, namely, large surface area and extended hydrophobic regions. Therefore, they can be explored as scaffolds representing NPs of plant origin in hit-to-lead optimization studies for the identification of direct TNF and/or RANKL inhibitors with improved pharmacological profiles and in the development of novel treatments for chronic inflammatory and autoimmune diseases.

### ETHICS STATEMENT

All animal procedures were approved and carried out in strict accordance with the guidelines of the Institutional Animal Care and Use Committee and the Region of Attica Veterinarian Office.

### AUTHOR CONTRIBUTIONS

AA and GK conceptualization, funding acquisition, methodology, project administration, and supervision. GM, EN, VR, DP, GL, ED, AA, and GK data curation, formal analysis, investigation, resources, validation, visualization, writing – original draft, and writing – review and editing. GM, GL, and AA software.

## FUNDING

This work was funded by Greek "Cooperation" Action project TheRAlead (09SYN-21-784) co-financed by the European Regional Development Fund and NSRF 2007–2013 (http://www.gsrt.gr), the Innovative Medicines Initiative (IMI) funded project (http://www.imi.europa.eu/) BTCure (No. 115142) and Advanced European Research Council (ERC) grant (https://erc.europa.eu/funding/advanced-grants) MCs-inTEST (No. 340217) to GK. AA would like to acknowledge funding from Cyprus Research Promotion Foundation, DESMI 2008, ΕΠΙΧΕΙΡΗΣΕΙΣ/ΕΦΑΡΜ/0308/20 http://www.research.org.cy. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.00800/full#supplementary-material

### REFERENCES



antibodies to tumor necrosis factor alpha. Arthritis Rheum. 36, 1681–1690. doi: 10.1002/art.1780361206


conserved and others are not. Proc. Natl. Acad. Sci. U.S.A. 112, E2585–E2594. doi: 10.1073/pnas.1501567112


**Conflict of Interest Statement:** GL and AA are affiliated with NovaMechanics Ltd., a drug design company.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Melagraki, Ntougkos, Papadopoulou, Rinotas, Leonis, Douni, Afantitis and Kollias. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Multi-Target Screening and Experimental Validation of Natural Products from Selaginella Plants against Alzheimer's Disease

Yin-Hua Deng<sup>1</sup>, Ning-Ning Wang<sup>1</sup>, Zhen-Xing Zou<sup>1,2</sup>, Lin Zhang<sup>3</sup>, Kang-Ping Xu<sup>1</sup>, Alex F. Chen<sup>1,4</sup>, Dong-Sheng Cao<sup>1,4</sup>\* and Gui-Shan Tan<sup>1,2</sup>\*

#### Edited by:

Leonardo G. Ferreira, University of São Paulo, Brazil

#### Reviewed by:

Xiaojun Yao, Lanzhou University, China Claes Wahlestedt, Leonard M. Miller School of Medicine, United States

\*Correspondence:

Dong-Sheng Cao oriental-cds@163.com Gui-Shan Tan tgs395@csu.edu.cn

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 13 May 2017 Accepted: 03 August 2017 Published: 25 August 2017

#### Citation:

Deng Y-H, Wang N-N, Zou Z-X, Zhang L, Xu K-P, Chen AF, Cao D-S and Tan G-S (2017) Multi-Target Screening and Experimental Validation of Natural Products from Selaginella Plants against Alzheimer's Disease. Front. Pharmacol. 8:539. doi: 10.3389/fphar.2017.00539

<sup>1</sup> Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China, <sup>2</sup> Pharmacy Department, Xiangya Hospital, Central South University, Changsha, China, <sup>3</sup> College of Food Science and Technology, Central South University of Forestry and Technology, Changsha, China, <sup>4</sup> Center for Vascular Disease and Translational Medicine, Third Xiangya Hospital, Central South University, Changsha, China

Alzheimer's disease (AD) is a progressive and irreversible neurodegenerative disorder that is considered the most common cause of dementia. It not only impairs learning and memory but also imposes a heavy social and economic burden. Current AD treatments are mainly single-target drugs, but the complexity and multiple etiologies of AD make it difficult for them to achieve the desired therapeutic effects. Multi-target drugs are therefore a potentially effective strategy in AD treatment. To find multi-target active ingredients for AD treatment from Selaginella plants, we first examined the behavioral effects of total extracts (TE) from Selaginella doederleinii on AD mice using the Morris water maze test and found that TE markedly improved learning and memory function in AD mice. Next, multi-target SAR models for AD-related proteins were built using Random Forest (RF) and different descriptors to preliminarily screen potential active ingredients from Selaginella. Considering the prediction outputs and the quantities of the compounds available in our laboratory, 13 compounds were chosen for in vitro enzyme inhibition experiments, and 4 compounds with dual BACE1/MAO-B inhibitory activity were identified. Finally, molecular docking was applied to verify the prediction results and the enzyme inhibition experiments. Based on this study and validation process, we explored a new strategy to improve the efficiency of active-ingredient screening when only trace amounts of natural products are available and many targets are considered, and we found several multi-target compounds with biological activity for the development of novel drugs for AD treatment.

Keywords: Alzheimer, Selaginella plants, multi-target screening, multi-target SAR, BACE1, MAO-B

## INTRODUCTION

Alzheimer's disease (AD) is a progressive and irreversible neurodegenerative disorder that is considered the most common cause of dementia. With the accelerating aging of human society, AD prevalence is expected to reach epidemic levels (Mount and Downton, 2006). The majority of AD patients have behavioral and psychological symptoms of dementia (BPSD). The behavioral characteristics include progressive loss of memory, decline of cognitive function, decrease of physical function and, ultimately, problems with communication and disorientation in time and space. The psychological symptoms include psychosis, depression, agitation, and anxiety (Gauthier et al., 2010; Okura et al., 2011; Borisovskaya et al., 2014). Furthermore, the presence of BPSD usually exacerbates the morbidity and mortality associated with dementia. In more advanced stages, BPSD has a greater social and economic impact than the learning and memory disturbances, and it has become the major impetus forcing patients to choose primary home care and specialized psychogeriatric units. Unfortunately, the existing therapeutic approaches for BPSD usually have limited efficacy and are associated with serious adverse effects, such as an increased risk of death (Cummings, 2000; U.S. Food and Drug Administration, 2005, 2008).

Although the molecular mechanism of AD pathogenesis is not clearly understood, several hypotheses have been proposed, and their interconnections make this disease a complex disorder (Šimić et al., 2017). The amyloid hypothesis (Goedert and Spillantini, 2006) is hallmarked by the neuropathological accumulation of amyloid beta (Aβ) plaques in the extracellular compartment and the intracellular accumulation of hyper-phosphorylated tau protein in the form of neurofibrillary tangles. The cholinergic hypothesis proposes a decreased level of acetylcholine in certain areas of the brain (Craig et al., 2011). The oxidative stress hypothesis proposes the deregulation of endogenous detoxification redox systems and over-production of radical species leading to lipid peroxidation and nucleic acid mutations (Pratico, 2008). In addition, some other hypotheses, such as the glutamatergic hypothesis (Bezprozvanny and Mattson, 2008), the metal hypothesis (Bonda et al., 2011), and the inflammatory hypothesis (Trepanier and Milgram, 2010), have also been proposed. Based on these pathogenic mechanisms, more than 200 enzymes or proteins have been related to AD, such as AChE, BACE1, GSK3β, MAO-B, GABA-A receptor, glutamate receptor, and so on (Saura et al., 1994; Sathya et al., 2012; Fang et al., 2015; Yan et al., 2016). At present, drugs approved for AD treatment are based on single-target pharmacology. There are two main categories of drugs for AD treatment. The first comprises AChE inhibitors, including donepezil, rivastigmine, and galantamine, which raise the ACh level in the brain by decreasing the hydrolysis of ACh and are mainly used to treat mild to moderate AD. The second comprises N-methyl-D-aspartate (NMDA) receptor antagonists. The representative drug, memantine, is mainly used for the treatment of moderate to severe AD, but it is only licensed in several countries because of serious adverse drug reactions (Cummings, 2004; Standridge, 2004).
Until now, the limitation of therapeutic treatments and their poor effectiveness make AD treatment become the current biggest medical problem in neurology. In fact, as described before, the complexity and multiple etiologies of AD make the single-target strategy difficult to obtain desirable therapeutic effects. Therefore, the choice of multi-target drugs will be a potential effective strategy in the treatment of AD and consequently the new chemical skeletons or active precursors with multi-target activities for AD therapy are inspired to be found.

Natural products are a highly valuable resource in the search for chemical precursors with potential bioactivity and few adverse effects because of their structural diversity. For example, biflavonoid glycosides from Impatiens balsamina show potential neuroprotective activity (Kim et al., 2017), and apigenin and quercetin show potent activity against Aβ aggregation, which is one of the major culprits in AD (Espargaró et al., 2017). Huperzine A (Hup A) is a highly selective, reversible and potent AChE inhibitor extracted from the Chinese medicinal herb Huperzia serrata. Compared with tacrine and donepezil, it has a higher bioavailability and potency but is less active toward BChE (Silva et al., 2014; Pisani et al., 2016). Nevertheless, the purification of new chemical skeletons from natural products and their activity screening remain largely blind and serendipitous. Although more and more trace components have been purified with the development of separation technology, large-scale activity screening is still scarcely possible because separation is unpredictable and yields only trace amounts. In recent years, with the rapid development of computer science and the accumulation of chemogenomics data, multi-target SAR models for active-ingredient screening have been proposed as a useful method for finding active compounds and identifying targets (Cao et al., 2012, 2014; He et al., 2013; Yao et al., 2016). In the multi-target SAR approach, a predictive SAR model is built for each target protein from the relationship between the chemical structures of active and inactive compounds. This in silico method can provide a preliminary screening and target identification for a large number of natural compounds, with a prediction probability, before the in vitro activity test is carried out.

Previous research shows that flavonoids have extensive pharmacological activities, including anti-AD efficacy. In 2015, Duan et al. identified silibinin, a flavonoid, as a dual inhibitor of AChE and Aβ peptide aggregation for AD treatment (Duan et al., 2015). Song et al. then showed that silibinin can attenuate inflammatory responses, increase glutathione (GSH) levels, decrease malondialdehyde (MDA) levels and upregulate autophagy levels in Aβ25−35-injected rats (Song et al., 2017). In addition, baicalein, Scutellaria barbata flavonoids, Capparis spinosa flavonoids, and 4-dimethylamine flavonoid derivatives all show some degree of anti-AD activity in animal experiments or in vitro tests (Gu et al., 2016; Luo et al., 2016; Mohebali et al., 2016; Wu et al., 2016). Therefore, it is both valuable and feasible to screen flavonoid extracts for multi-target ingredients for the treatment of AD.

In this study, we aimed to find multi-target active ingredients for AD treatment in the flavonoid extracts of Selaginella plants. First, we explored the behavioral effects of total extracts (TE) from Selaginella doederleinii on AD mice using the Morris water maze test. We then screened our in-house database of compounds extracted from Selaginella plants for ingredients with anti-AD activity using multi-target SAR models in silico. Finally, in vitro enzyme inhibition tests and molecular docking were applied to verify the prediction results and to find potential active ingredients for multi-target AD treatment.

### MATERIALS AND METHODS

#### Total Extracts of Selaginella Plants

Two hundred and fifty-seven compounds were purified from Selaginella plants, including Selaginella tamariscina, Selaginella pulvinata Maxim, Selaginella braunii Baker, Selaginella delicatula (Desv.) Alston, Selaginella moellendorffii Hieron, Selaginella uncinata, Selaginella involvens Spring, and Selaginella doederleinii Hieron. Total extracts (TE) were obtained using 75% ethanol and then freeze-dried. Suspensions of the freeze-dried extract in saline, prepared by ultrasonic vibration, were orally administered to AD mice.

#### Morris Water Maze Test

The learning and memory ability of AD mice was evaluated by the Morris water maze test. Specific-pathogen-free (SPF) grade male ICR mice (body weight 18–22 g) were purchased from Hunan Provincial Experimental Animal Centers [Changsha, Hunan, China, Certificate No. SYXK (Xiang) 2012-0004] (Sun et al., 2009).

Mice were randomly divided into five groups (10 mice per group): normal control group (NCG), model control group (MCG), low dose group (LDG, 50 mg/kg), middle dose group (MDG, 100 mg/kg) and high dose group (HDG, 200 mg/kg). To build the AD mouse model, mice in MCG, LDG, MDG, and HDG were treated with D-gal (120 mg/kg, intraperitoneally) for 56 days (8 weeks), and the mice in NCG were treated with the same volume of saline for 56 days (8 weeks; dorsonuchal subcutaneous injection). After that, the TE suspensions (freeze-dried extract in saline, prepared by ultrasonic vibration) were orally administered to the mice in LDG, MDG, and HDG for 42 days (6 weeks), and the mice in NCG and MCG were orally treated with the same volume of saline for 42 days (6 weeks). Finally, the spatial learning and memory ability of all the mice was tested by the Morris water maze.

The Morris water maze equipment was purchased from Anhui Zheng-hua biological equipment corporation, and the test followed the relevant laboratory manual. Two indexes, place navigation and spatial probe, were chosen as the main measures of the spatial learning and memory ability of the mice. The experiment was divided into two parts: the acquisition phase and the probe trial. In the acquisition phase, each mouse was placed in the pool with its head facing the wall at a randomly chosen starting position, which was then kept fixed, and the time taken to find the underwater platform was recorded. On the day after the acquisition phase, the platform was removed and the probe trial began. The time taken to reach the former platform position, the swimming distance and the number of crossings through the area where the platform had been located were recorded as the spatial memory test indexes.

This study was carried out in accordance with the recommendations of "Laboratory Animals-Guideline of welfare and ethics, Ethics Committee of Hunan Provincial Experimental Animal Centers." The protocol was approved by the "Ethics Committee of Hunan Provincial Experimental Animal Centers."

#### Multi-Target SAR Model and Prediction of 257 Compounds

According to previous studies published in recent years, we identified 19 significant proteins related to AD (Cavalli et al., 2008; Fang et al., 2015). For these AD-related proteins, we collected their small, drug-like ligands from the Binding Database<sup>1</sup>. For each protein, activity data were filtered to keep only activity end-points reported as half-maximum inhibitory concentration (IC<sub>50</sub>), half-maximum effective concentration (EC<sub>50</sub>) or K<sub>i</sub> values. A compound was considered a positive sample when its activity value was below 10 µM; otherwise, it was considered a negative sample. After this step, some AD-related proteins had very few negative samples. To balance the positive and negative samples, we randomly selected a certain number of compounds from other AD-related proteins as additional negative samples for these proteins, so that the number of selected negative samples together with the inactive samples was roughly equal to the number of active samples. These positive and negative sets were used for the subsequent model building. Detailed information on the AD-related proteins and the datasets used for model building can be found in the Supporting Information (Supplementary Material).
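The labelling and balancing steps described above can be sketched in a few lines of Python. This is only an illustration: the compound identifiers and activity values are made up, the function names are hypothetical, and excluding compounds already labelled for the target from the decoy pool is an assumption of the sketch rather than a rule stated in the text.

```python
import random

ACTIVITY_CUTOFF_UM = 10.0  # from the text: activity below 10 µM counts as positive


def label_compounds(activities_um):
    """Split {compound_id: activity in µM} into positive and negative ID lists."""
    positives = [c for c, v in activities_um.items() if v < ACTIVITY_CUTOFF_UM]
    negatives = [c for c, v in activities_um.items() if v >= ACTIVITY_CUTOFF_UM]
    return positives, negatives


def balance_negatives(positives, negatives, other_target_ligands, seed=0):
    """Pad the negative set with ligands of other targets until sizes match.

    Compounds already labelled for this target are excluded from the pool
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    known = set(positives) | set(negatives)
    pool = sorted(set(other_target_ligands) - known)
    shortfall = max(0, len(positives) - len(negatives))
    decoys = rng.sample(pool, min(shortfall, len(pool)))
    return negatives + decoys


# Hypothetical data for illustration only.
activities = {"cpd1": 0.5, "cpd2": 3.2, "cpd3": 8.0, "cpd4": 25.0}
others = ["x1", "x2", "x3", "cpd1"]
pos, neg = label_compounds(activities)
neg_balanced = balance_negatives(pos, neg, others)
print(len(pos), len(neg_balanced))
```

Fixing the random seed keeps the sampled negative set reproducible between runs, which makes later model comparisons easier to interpret.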

<sup>1</sup>http://www.bindingdb.org/bind/index.jsp

For each protein, a series of high-confidence SAR models were built with Random Forest (RF) and different fingerprint representations (FP2, MACCS, Daylight, ECFP2, ECFP4, and ECFP6). RF was first introduced by Breiman and Cutler in 2001 for regression and classification modeling (Breiman, 2001). The method is based on an ensemble of decision trees, and the prediction for a classification task is the majority vote of the predictions of all trees. Recent studies have suggested that RF offers several striking features that make it very attractive for QSAR/QSPR studies. These include relatively high prediction accuracy, built-in descriptor selection, and a method for assessing the importance of each descriptor to the model (Cao et al., 2011a,b; Yun et al., 2016). RFs were trained using the RF library in the statistical computing environment R. All the fingerprints were calculated with tools developed by our group: ChemDes, the BioTriangle webserver and the ChemoPy package (Cao et al., 2013; Dong et al., 2015, 2016). To improve the predictive ability of the SAR models, we combined all fingerprint models into consensus models by averaging their outputs. All the consensus models were validated by 5-fold cross validation and test set validation to demonstrate their predictive performance. Several widely used statistical parameters were applied to evaluate the performance of these classification models: true positive (TP); false negative (FN); true negative (TN); false positive (FP); sensitivity (SE); specificity (SP); accuracy (ACC); area under the receiver operating characteristic curve (AUC). These classification evaluation parameters are defined as follows:

$$\begin{aligned} \text{SE} &= \frac{\text{TP}}{\text{TP} + \text{FN}} \\ \text{SP} &= \frac{\text{TN}}{\text{TN} + \text{FP}} \\ \text{ACC} &= \frac{\text{TP} + \text{TN}}{\text{TP} + \text{FP} + \text{TN} + \text{FN}} \end{aligned}$$
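As a worked illustration of these definitions, the following minimal Python sketch counts TP, FN, TN and FP from binary labels (1 = active, 0 = inactive) and derives SE, SP and ACC. The function name and the example labels are hypothetical and are not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Return SE, SP and ACC for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    se = tp / (tp + fn) if (tp + fn) else float("nan")
    sp = tn / (tn + fp) if (tn + fp) else float("nan")
    acc = (tp + tn) / len(pairs) if pairs else float("nan")
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp, "SE": se, "SP": sp, "ACC": acc}


# Hypothetical labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```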

After this series of modeling and validation steps, we aimed to obtain reliable SAR models for the above-mentioned AD-related proteins. The 257 compounds purified from Selaginella plants were then predicted with these models, and their inhibitory activities were preliminarily identified for further study.
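The consensus step (averaging the outputs of the per-fingerprint models) can be sketched as follows. The sketch assumes that each RF model reports the fraction of trees voting "active" as its probability, which is a common convention but is not stated in the text. The function names and the numbers are hypothetical.

```python
def vote_fraction(tree_votes):
    """Fraction of trees voting 'active' (1), used here as the class probability.

    Treating the vote fraction as the model output is an assumption.
    """
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    return sum(tree_votes) / len(tree_votes)


def consensus_probability(per_fingerprint_probs):
    """Average the active-class probabilities of the per-fingerprint models."""
    values = list(per_fingerprint_probs.values())
    if not values:
        raise ValueError("at least one model output is required")
    return sum(values) / len(values)


# Hypothetical outputs of six fingerprint models for one compound.
probs = {"FP2": 0.90, "MACCS": 0.80, "Daylight": 0.70,
         "ECFP2": 0.85, "ECFP4": 0.95, "ECFP6": 0.90}
consensus = consensus_probability(probs)
print(round(consensus, 3))
```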

#### Target Enzyme Inhibitory Activity In vitro

For the compounds regarded as active ingredients by the multi-target SAR models, the in vitro target enzyme inhibitory activity test was applied to verify their actual activity for AD treatment. The inhibitory activities were determined by a fluorimetric method on an Infinite M200 Multi scan Spectrum (Tecan, Switzerland). Each concentration was analyzed in triplicate, and IC<sub>50</sub> values were determined by nonlinear regression of inhibition vs. log concentration plots, using GraphPad Prism 7 for Windows, Version 7.00 (GraphPad Software Inc.). BACE1 fluorescence resonance energy transfer assay kits were purchased from the Pan Vera Co, and Monoamine Oxidase B (MAO-B) inhibitor screening kits were purchased from Bio Vision Inc.
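To illustrate what a nonlinear regression of inhibition against log concentration involves, here is a minimal pure-Python sketch. It is not the GraphPad Prism procedure: it assumes a Hill curve with the bottom fixed at 0% and the top at 100%, and it replaces a proper optimizer with a coarse least-squares grid search. The function names are hypothetical and the responses are synthetic, generated from a known curve.

```python
def hill_inhibition(conc_um, ic50_um, hill):
    """Percent inhibition for a Hill curve with bottom 0% and top 100%."""
    return 100.0 / (1.0 + (ic50_um / conc_um) ** hill)


def fit_ic50(concs_um, inhibitions_pct):
    """Least-squares grid search over log10(IC50) and the Hill slope."""
    best = None
    for i in range(601):          # log10(IC50) from -3 to 3 in steps of 0.01
        ic50 = 10.0 ** (-3.0 + 0.01 * i)
        for j in range(51):       # Hill slope from 0.5 to 3.0 in steps of 0.05
            hill = 0.5 + 0.05 * j
            sse = sum((hill_inhibition(c, ic50, hill) - y) ** 2
                      for c, y in zip(concs_um, inhibitions_pct))
            if best is None or sse < best[0]:
                best = (sse, ic50, hill)
    return best[1], best[2]


# Synthetic, noise-free responses generated from a known curve (IC50 = 1 µM).
concs = [0.017, 0.050, 0.167, 0.500, 1.667, 5.000, 16.667, 50.000]
responses = [hill_inhibition(c, 1.0, 1.0) for c in concs]
ic50_fit, hill_fit = fit_ic50(concs, responses)
print(ic50_fit, hill_fit)
```

Fitting on the log scale of IC<sub>50</sub> mirrors the log-concentration axis of the inhibition plots and gives each decade of concentration equal resolution.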

In the BACE1 inhibition test, the assay was performed in 384-well plates. The assay solution consisted of 10 µL test compound (concentrations: 0.017, 0.050, 0.167, 0.500, 1.667, 5.000, 16.667, and 50.000 µM), 10 µL BACE1 substrate and 10 µL BACE1 enzyme. LY2811376 was selected as the reference compound with IC<sub>50</sub> = 0.242 µM, and the blank buffer was set as the negative control. The mixture was incubated for 60 min at room temperature. At the end, 10 µL BACE1 stop solution was added to stop the reaction, and the fluorescence was detected at Ex/Em = 545/585 nm (12 nm bandwidth) on the Multi scan Spectrum.

In the MAO-B inhibition test, the assay was performed in 96-well plates. The assay solution consisted of 10 µL test compound (concentrations: 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8, and 25.6 µM), 37 µL MAO-B assay buffer, 1 µL MAO-B substrate, 1 µL developer and 1 µL OxiRed Probe. Selegiline was used as the reference control with IC<sub>50</sub> = 0.028 µM, and the blank buffer was set as the negative control. The mixture was incubated for 10 min at 37 °C. The fluorescence was measured kinetically at Ex/Em = 535/587 nm at 37 °C for 10–40 min. Two points (T1 and T2) in the linear range of the plot were chosen, and the corresponding fluorescence values (RFU1 and RFU2) were used to calculate the slope for all samples. The % relative inhibition was calculated following the manual of the MAO-B inhibitor screening kit.
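The slope-based calculation can be sketched as below. The text defers to the kit manual for the exact formula, so the expression used here, inhibition relative to the slope of the uninhibited enzyme control, is an assumption based on the usual convention. The function names and RFU readings are hypothetical.

```python
def kinetic_slope(rfu1, rfu2, t1_min, t2_min):
    """Slope of fluorescence vs. time between two points in the linear range."""
    if t2_min == t1_min:
        raise ValueError("T1 and T2 must differ")
    return (rfu2 - rfu1) / (t2_min - t1_min)


def relative_inhibition_pct(slope_control, slope_sample):
    """Percent inhibition relative to the uninhibited enzyme control.

    Assumed formula: (control slope - sample slope) / control slope * 100.
    """
    if slope_control == 0:
        raise ValueError("control slope must be non-zero")
    return (slope_control - slope_sample) / slope_control * 100.0


# Hypothetical RFU readings at T1 = 10 min and T2 = 40 min.
control = kinetic_slope(1000.0, 4000.0, 10.0, 40.0)
sample = kinetic_slope(1000.0, 1750.0, 10.0, 40.0)
inhibition = relative_inhibition_pct(control, sample)
print(inhibition)
```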

#### Molecular Docking Simulation

To further verify the results of the multi-target SAR prediction and the enzyme inhibition experiments, molecular docking was applied to simulate the binding position and binding affinity between the active compounds and the target proteins. Docking is a computer simulation technique used to predict the interaction between a ligand and a receptor active site, and it is an important tool in structure-based drug design. The technique positions the ligand in different orientations and conformations within the binding site to calculate optimal binding geometries and energies. In this study, the Molecular Operating Environment (MOE, version 2014.) was used to carry out the molecular docking. MOE's Dock application searches for favorable binding modes between small- to medium-sized ligands and a not-too-flexible macromolecular target. For each ligand, a number of placements called poses are generated and scored. The score can be calculated as a free energy of binding (including, among others, solvation and entropy terms), as an enthalpy based on polar interaction terms (including metal ligation), or as a qualitative shape-based numerical value. According to the score values, the different conformations of a ligand can be ranked and the optimal conformation identified (Wang J. et al., 2015). To make the interactions with the binding site easy to see, a ligand interaction analysis was carried out. It automatically produces a 2D diagram of the ligand and a schematic representation of the binding site residues, with the important interactions between ligand and binding site shown. In this study, we selected two proteins as the docking receptors: BACE1 (PDB ID: 1TQF) (Cobum et al., 2004) and MAO-B (PDB ID: 2V5Z) (Binda et al., 2007). As a control, the original ligand included in each crystal structure was also docked.
The following parameters were set. Dock: rescoring 1 = ASE; retain = 100; rescoring 2 = ASE; retain = 100. Configure force field: final gradient = 0.0001; maximum iterations = 1,000; force constant = 10; radius offset = 0.4. Default settings were used for the remaining parameters.
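Ranking poses by score and keeping the best one per ligand can be sketched in Python as follows. The sketch does not call MOE. It assumes that docking scores have already been exported and that a lower score is better. The `Pose` record, the ligand names and the scores are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one docked pose.
Pose = namedtuple("Pose", ["ligand", "pose_id", "score"])


def best_pose_per_ligand(poses):
    """Keep the lowest-scoring pose for each ligand (lower score = better)."""
    best = {}
    for pose in poses:
        current = best.get(pose.ligand)
        if current is None or pose.score < current.score:
            best[pose.ligand] = pose
    return best


def rank_ligands(best):
    """Order ligands from best (lowest) to worst (highest) top-pose score."""
    return sorted(best, key=lambda name: best[name].score)


# Hypothetical scores for illustration only.
poses = [
    Pose("ligA", 1, -20.1), Pose("ligA", 2, -25.4), Pose("ligA", 3, -18.0),
    Pose("ligB", 1, -30.2), Pose("ligB", 2, -28.9),
]
best = best_pose_per_ligand(poses)
ranking = rank_ligands(best)
print(ranking)
```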

### RESULTS AND DISCUSSION

#### Behavioral Evaluation of AD Mice Treated with Total Extracts

The learning and memory ability of AD mice was evaluated by the Morris water maze test, in which navigation and space exploration were used as indexes. There were five groups of mice under study, and the behavioral results can be seen in **Table 1** and **Figure 1**. **Table 1** lists the residence time and residence distance in each quadrant, the total distance and the number of platform crossings for each group of mice. The residence time of MCG was significantly decreased (P < 0.05) in the 1st and 4th quadrants compared with NCG. The stay intervals in the 1st quadrant for MCG were significantly lower than for NCG (P < 0.05). However, the residence distance in the 3rd quadrant increased prominently (P < 0.05). After 42 days of dosing, HDG showed a longer distance in the 1st quadrant (P < 0.05) and the opposite trend in the 3rd quadrant (P < 0.05) compared with MCG. The total distance and the number of platform crossings were significantly reduced for MCG (P < 0.05) compared with NCG, and they were significantly increased for HDG (P < 0.05) compared with MCG after 42 days of dosing. In addition, the number of platform crossings for MDG was also significantly increased (P < 0.05).

TABLE 1 (row for the 200 mg/kg group) | 39.9 ± 1.5+; 854.6 ± 110.5+; 35.8 ± 4.9; 692.2 ± 262.2; 16.8 ± 1.9+; 453.5 ± 95.4+; 29.3 ± 1.4+; 468.8 ± 104.0; 2729.4 ± 188.3+; 20.2 ± 6.4; 3.0 ± 1.4+.

Compared with NCG, \*P < 0.05; compared with MCG, +P < 0.05. Crossing through number, the number of crossings through the area where the platform is located; time, s; distance, cm; speed, cm/s.

**Figure 1** shows the navigation and space exploration of the different groups of mice. The spatial learning and memory ability of NCG was significantly higher, whereas MCG showed the opposite trend, and the mice in MCG mainly swam along the pool wall. In addition, the low, medium and high doses of TE all significantly increased the number of platform explorations. Taking the results in **Table 1** and **Figure 1** together, the TE of Selaginella remarkably improves learning and memory function in AD mice. This result prompted us to further explore the effect of the chemical ingredients of Selaginella on AD treatment.

#### Performance Evaluation and the Inhibitory Activity Prediction

Based on the results of the Morris water maze test, the TE of Selaginella plants shows a potential benefit for AD treatment. To quickly screen the active ingredients among a large number of compounds, multi-target SAR models for the AD-related proteins were constructed as described above, yielding a series of ensemble predictive models. The statistical results of the 5-fold cross validation and the test set validation are listed in **Table 2**. For each predictive model, the accuracy is good not only in cross validation (0.808–0.955) but also in test set validation (0.846–0.970). Similar results were obtained for the other statistical parameters, which is strong evidence for the good predictive ability of these models. We therefore have reason to believe that these ensemble models are robust and practical and can be used to predict the inhibitory activity of a new compound in the early stage of drug discovery.

To evaluate the probability of inhibitory activity against the 19 AD-related targets, we considered the 257 compounds purified from the TE, including 143 flavonoids, 9 selaginellins and some other compounds. Before the inhibitory activity prediction by the SAR models, a preliminary druggability evaluation was carried out to exclude compounds without beneficial properties for further drug development. We mainly evaluated the molecular weight and two important ADME (absorption, distribution, metabolism, elimination) properties, using the corresponding QSAR models developed by our group: logD7.4 (the distribution coefficient at pH = 7.4) (Wang J. B. et al., 2015; Wang et al., 2017), and logPapp (the Caco-2 membrane permeability) (Wang et al., 2016). Based on previous studies, a good drug candidate should have a logD7.4 value smaller than 5, a logPapp value larger than −5.15 and a molecular weight smaller than 500. After excluding compounds that performed very poorly in at least two of the three aforementioned properties, 238 compounds were left for further activity screening.

TABLE 1 | The effects of TE on AD mice's behavior (x̄ ± s, n = 10).
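The druggability filter described above can be sketched as a simple rule count. The thresholds are those quoted in the text. Treating any threshold violation as "performing very poorly" is a simplifying assumption, and the function names and property values are hypothetical.

```python
def failed_criteria(mol_weight, log_d74, log_papp):
    """Count how many of the three druggability criteria a compound fails."""
    fails = 0
    if mol_weight >= 500:      # desired: molecular weight smaller than 500
        fails += 1
    if log_d74 >= 5:           # desired: logD7.4 smaller than 5
        fails += 1
    if log_papp <= -5.15:      # desired: logPapp larger than -5.15
        fails += 1
    return fails


def passes_druggability(mol_weight, log_d74, log_papp):
    """Exclude a compound only when it fails at least two of the three criteria."""
    return failed_criteria(mol_weight, log_d74, log_papp) < 2


# Hypothetical property values (molecular weight, logD7.4, logPapp).
compounds = {
    "hyp-1": (450.0, 3.0, -4.80),   # fails none
    "hyp-2": (620.0, 3.0, -4.80),   # fails molecular weight only
    "hyp-3": (620.0, 5.5, -4.80),   # fails two criteria
}
kept = [name for name, props in compounds.items() if passes_druggability(*props)]
print(kept)
```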

As described above, the inhibitory activity of these 238 compounds was predicted by the multi-target SAR models. The prediction for a new compound was output as a probability value: a compound with a probability value > 0.5 was considered active; otherwise, it was considered inactive. The predictions indicate that 54 flavonoids and 4 selaginellins may inhibit MAO-B and that 21 flavonoids may show BACE1 inhibitory activity. However, to improve the reliability of the prediction, we applied a prediction probability of 0.8 as the cut-off value for selecting active compounds for the related targets. As a result, 18 compounds with a probability value larger than 0.8 were extracted. These compounds were prepared for further validation in the inhibitory activity test, and their detailed information can be seen in the Supporting Information (Supplementary Material).

FIGURE 2 | The chemical structures of the 13 compounds with inhibitory activity according to the multi-target SAR model prediction. Among them, eight are biflavones and the remaining five are selaginellins.

<sup>a</sup>The IC<sub>50</sub> value cannot be calculated in the predetermined concentration range.

TABLE 4 | Four active compounds and their docking results.

FIGURE 4 | The docking results of S-8 bound to BACE1 (left, PDB ID: 1TQF) and MAO-B (right, PDB ID: 2V5Z). The structure of S-8 is rendered in green, and the docking pocket surface was adjusted to a suitable transparency.
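The two-tier use of the prediction probability (0.5 to call a compound active, 0.8 to select high-confidence candidates) can be sketched as below. The function names, compound names and probabilities are hypothetical.

```python
ACTIVE_CUTOFF = 0.5      # from the text: probability > 0.5 is considered active
CONFIDENT_CUTOFF = 0.8   # from the text: stricter cut-off used for selection


def classify(probability):
    """Map a predicted probability to one of three screening classes."""
    if probability > CONFIDENT_CUTOFF:
        return "high-confidence active"
    if probability > ACTIVE_CUTOFF:
        return "active"
    return "inactive"


def select_candidates(predictions):
    """Return compounds whose probability exceeds the stricter cut-off."""
    return sorted(name for name, p in predictions.items() if p > CONFIDENT_CUTOFF)


# Hypothetical consensus probabilities for one target.
predictions = {"hyp-a": 0.93, "hyp-b": 0.64, "hyp-c": 0.31, "hyp-d": 0.80}
candidates = select_candidates(predictions)
print({name: classify(p) for name, p in predictions.items()}, candidates)
```

A value exactly at a cut-off falls into the lower class here, following the strict inequalities used in the text.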

#### In vitro Validation of Inhibitory Activity for Target Enzyme

Based on the prediction outputs, we focused on screening the BACE1/MAO-B dual inhibitory activity of flavonoids and selaginellins. The enzyme BACE1 is considered a prime target for AD therapeutics mainly because catalysis by BACE1 is the rate-limiting step in APP proteolysis and because BACE1 knock-out mice, which lack Aβ production, survive with normal physiology (Roberds et al., 2001). As the major β-secretase enzyme that initiates the generation of Aβ, BACE1 is undoubtedly a prime target for anti-Aβ therapy in AD (Ohno, 2016). Increased MAO-B activity is associated with gliosis, which can result in higher levels of H<sub>2</sub>O<sub>2</sub> and oxidative free radicals (Nebbioso et al., 2012). Thus, MAO-B inhibitors are potential candidates for anti-AD drugs because of their capacity to regulate neurotransmitters and inhibit oxidative damage in the central nervous system.

Considering the quantities of the compounds available in our laboratory, 13 compounds were chosen for the inhibitory activity validation experiments. Their chemical structures are displayed in **Figure 2** and their IC<sub>50</sub> values can be seen in **Table 3**. To evaluate the inhibitory activity of these compounds, a threshold of IC<sub>50</sub> = 10 µM was applied: a compound with an IC<sub>50</sub> value smaller than 10 µM was considered active; otherwise, it was considered inactive. Nine of the compounds show good inhibition of BACE1, with IC<sub>50</sub> values ranging from 0.7454 to 7.578 µM, and five show good inhibition of MAO-B, with IC<sub>50</sub> values ranging from 2.913 to 8.813 µM. Among them, S-8, S-5, S-13, and S-12 all have significant dual BACE1/MAO-B inhibitory activities with IC<sub>50</sub> values in the micromolar range, and S-8 proved to be the most potent against BACE1 and MAO-B, with IC<sub>50</sub> values of 0.7454 and 3.619 µM, respectively. S-5 and S-8 are biflavones, and S-12 and S-13 are selaginellins. The inhibitory curves for these four compounds are summarized in **Figure 3**.
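The activity call and the search for dual inhibitors can be sketched as follows. Only the S-8 values come from the text. The other entries are hypothetical placeholders, and `None` stands for an IC<sub>50</sub> that could not be calculated in the tested concentration range.

```python
IC50_CUTOFF_UM = 10.0  # from the text: IC50 below 10 µM is considered active


def is_active(ic50_um):
    """Active only when an IC50 was determined and is below the cut-off."""
    return ic50_um is not None and ic50_um < IC50_CUTOFF_UM


def dual_inhibitors(ic50_table):
    """Names of compounds active against both BACE1 and MAO-B."""
    return [name for name, (bace1, maob) in ic50_table.items()
            if is_active(bace1) and is_active(maob)]


# (BACE1 IC50, MAO-B IC50) in µM; only the S-8 row is taken from the text.
ic50_table = {
    "S-8": (0.7454, 3.619),
    "hyp-x": (5.0, None),     # hypothetical: MAO-B IC50 not determinable
    "hyp-y": (None, 12.0),    # hypothetical: inactive on both targets
}
duals = dual_inhibitors(ic50_table)
print(duals)
```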

#### Molecular Docking

As described above, for each conformation of S-5, S-8, S-12, and S-13, a score was obtained to evaluate the binding affinity between the compound and each target protein (BACE1 and MAO-B). Generally, a lower score is better, so the optimal conformation of each compound can be chosen from the generated conformations according to their scores. The four active compounds and their docking results, combined with the results of the ligand interaction analysis, are listed in **Table 4**. All four compounds show some degree of interaction with BACE1 and MAO-B when compared with the original ligands. For BACE1, the most active molecule is S-8, with a score of −32.7; its main binding forces are hydrogen bond and pi-bond interactions with ASP (232A) and THR (231A). For the remaining three molecules, the main binding forces are also hydrogen bond and pi-bond interactions, with different residues. For MAO-B, the most active molecule is S-5, with a score of −44.0; its main binding forces are hydrogen bond and pi-bond interactions with CYS (397A) and GLY (13A). In summary, the molecular docking results are consistent with the inhibition experiments: S-8 has the strongest inhibitory activity against BACE1, and S-5/S-8 perform better than S-12/S-13 in the inhibition of MAO-B. Therefore, in line with the conclusion of the in vitro inhibition test, S-5, S-8, S-12, and S-13 all have significant dual BACE1/MAO-B inhibitory activities, and S-8 promises to be the most potent against BACE1 and MAO-B. The docking results and the corresponding 2D ligand interaction diagrams of S-8 bound to BACE1 and MAO-B can be seen in **Figures 4**, **5**. Detailed information on all conformations and docking results for the other three compounds can be seen in the Supporting Information (Supplementary Material).

### CONCLUSION

In this study, we showed by the Morris water maze test that the TE extracted from Selaginella plants remarkably improves learning and memory function in AD mice. We then preliminarily screened our in-house database of flavonoid compounds with multi-target SAR models in silico. After that, the in vitro enzyme inhibition test was applied to evaluate 13 compounds considered active by the multi-target SAR models, and 4 compounds (S-8, S-5, S-13, and S-12) were found to have significant inhibitory activities on both BACE1 and MAO-B. Among them, S-8 proved to be the most potent ingredient against BACE1 and MAO-B, with IC<sub>50</sub> values of 0.745 and 3.619 µM, respectively. In addition, molecular docking was applied to verify the prediction results and to find the binding position and binding strength between the active ingredients and the AD-related proteins. Overall, we explored a new strategy for improving the efficiency of screening active ingredients when only trace amounts of natural products are available and many targets must be considered, and we obtained several potential multi-target compounds for the development of novel drugs for AD treatment.

### AUTHOR CONTRIBUTIONS

YD, NW, DC, and GT designed this study. YD and NW wrote and revised the manuscript. ZZ helped in preparing figures and tables. LZ, AC, and KX helped in giving suggestions to improve the manuscript. All authors read and approved the final manuscript.

#### FUNDING

This work is financially supported by the National Natural Science Foundation of China (Grants No. 81402853, 31370370, 81501619), the National Key Basic Research Program (2015CB910700), the Central South University Innovation Foundation for Postgraduate (2016zzts498), the Project of Innovation-driven Plan in Central South University, and the Postdoctoral Science Foundation of Central South University, the Chinese Postdoctoral Science Foundation (2014T70794, 2014M562142). The studies meet with the approval of the university's review board.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fphar.2017.00539/full#supplementary-material

Supporting Information SI1 lists the 19 AD-related proteins and their detailed information, including their gene names, UniProt IDs, protein names, and the number of samples used in the modeling process. Supporting Information SI2 lists the structural information of the 18 compounds selected by the multi-target SAR models. Supporting Information SI3 shows the detailed information of all conformations and docking results for S-5, S-12, and S-13.

#### REFERENCES

representations for chemicals, proteins, DNAs/RNAs and their interactions. J. Cheminform. 8:34. doi: 10.1186/s13321-016-0146-2


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Deng, Wang, Zou, Zhang, Xu, Chen, Cao and Tan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Targets Fishing and Identification of Calenduloside E as Hsp90AB1: Design, Synthesis, and Evaluation of Clickable Activity-Based Probe

Shan Wang<sup>1†</sup>, Yu Tian<sup>1†</sup>, Jing-Yi Zhang<sup>1</sup>, Hui-Bo Xu<sup>2</sup>, Ping Zhou<sup>1</sup>, Min Wang<sup>1</sup>, Sen-Bao Lu<sup>3</sup>, Yun Luo<sup>1</sup>, Min Wang<sup>4</sup>, Gui-Bo Sun<sup>1\*</sup>, Xu-Dong Xu<sup>1\*</sup> and Xiao-Bo Sun<sup>1\*</sup>

#### Edited by:

Adriano D. Andricopulo, University of São Paulo, Brazil

#### Reviewed by:

Brian Hudson, University of Glasgow, United Kingdom; Carlos Henrique Ramos, Universidade Estadual de Campinas, Brazil; Valentina Vellecco, University of Naples Federico II, Italy

#### \*Correspondence:

Gui-Bo Sun, gbsun@implad.ac.cn; Xu-Dong Xu, xdxu@implad.ac.cn; Xiao-Bo Sun, sun\_xiaobo163@163.com

†These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 26 February 2018; Accepted: 02 May 2018; Published: 23 May 2018

#### Citation:

Wang S, Tian Y, Zhang J-Y, Xu H-B, Zhou P, Wang M, Lu S-B, Luo Y, Wang M, Sun G-B, Xu X-D and Sun X-B (2018) Targets Fishing and Identification of Calenduloside E as Hsp90AB1: Design, Synthesis, and Evaluation of Clickable Activity-Based Probe. Front. Pharmacol. 9:532. doi: 10.3389/fphar.2018.00532

<sup>1</sup> Beijing Key Laboratory of Innovative Drug Discovery of Traditional Chinese Medicine (Natural Medicine) and Translational Medicine, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China, <sup>2</sup> Academy of Chinese Medical Sciences of Jilin Province, Changchun, China, <sup>3</sup> Department of Bioengineering, Santa Clara University, Santa Clara, CA, United States, <sup>4</sup> Life and Environmental Science Research Center, Harbin University of Commerce, Harbin, China

Calenduloside E (CE), a natural triterpenoid compound isolated from Aralia elata, protects against ox-LDL-induced human umbilical vein endothelial cell (HUVEC) injury, as shown in our previous reports. However, the exact targets and mechanisms of CE remain elusive. To resolve this question, we designed and synthesized a clickable activity-based probe (CE-P), which could be used to fish the functional targets in HUVECs using a gel-based strategy. Based on previous studies of the structure-activity relationship (SAR), we introduced an alkyne moiety at the C-28 carboxylic group of CE, which retained the protective and anti-apoptosis activity. Via a proteomic approach, one of the potential proteins bound to CE-P was identified as Hsp90AB1, and further verification was performed with pure recombinant Hsp90AB1 and a competitive assay. These results demonstrated that CE could bind to Hsp90AB1. We also found that CE could reverse the Hsp90AB1 decrease after ox-LDL treatment. To make our results more convincing, we performed SPR analysis, and the affinity kinetic assay showed that CE/CE-P could bind to Hsp90AB1 in a dose-dependent manner. Taken together, our research showed that CE could probably bind to Hsp90AB1 to protect against cell injury, which might provide the basis for further exploration of its cardiovascular protective mechanisms.

Keywords: Calenduloside E, clickable activity based protein profiling, computational chemistry, HUVECs, Hsp90AB1

#### INTRODUCTION

Natural products represent an enormous source of pharmacologically useful compounds and are often used as the starting point in modern drug discovery. However, many biologically interesting natural products are not being pursued as potential drug candidates, partly due to the lack of well-defined mechanisms of action. The identification of drug targets is a key step in drug discovery because it allows researchers to clarify the mechanisms of drug action (Krysiak and Breinbauer, 2012; Yue et al., 2012). Activity-based protein profiling (ABPP) is a chemical proteomic method that uses active site-directed chemical probes to selectively target subsets of proteins in the proteome based on shared mechanistic and/or structural features (Barglow and Cravatt, 2007; Cravatt et al., 2008; Pichler et al., 2016). This technique has been widely applied to enzyme proteomes and, with the development of quantitative proteomics, has also been used to identify the unknown targets of compounds (Chen et al., 2017). The basic chemical structure of the molecular probe consists of three parts: a reactive group, a binding group, and a reporter tag. The ABPP probe targets a large number of proteins via the reactive group, providing researchers with a global view of the proteome profile. Then, target proteins are identified by quantitative proteomics analysis (Hunerdosse and Nomura, 2014; Wright and Sieber, 2016). However, most tags are relatively bulky compared with the small molecule probe, which influences cell permeability and may prevent the reactive group from entering the active site. With the development of click chemistry, click chemistry-ABPP (CC-ABPP) strategies using a bioorthogonal reaction with a label-free probe have been increasingly applied to circumvent this issue (Speers and Cravatt, 2009; Li et al., 2012). 
The reporter group is substituted with a small, latent chemical handle (alkyne or azide), which does not impede cell permeability, and can be simultaneously diversified with a variety of reporter groups without the need to develop new synthetic routes (Martell and Weerapana, 2014). The CC-ABP probe has advanced the ABPP field by expanding the enzyme classes targeted by ABPs, enabling cellular and in vivo studies and providing technological platforms to quantitatively monitor protein activities in complex biological systems. Currently, the CC-ABPP technology has become an effective method for the discovery of functional targets of small molecules (Lapinsky and Johnson, 2015).

Aralia elata (Miq) Seem (AS), which is used extensively in traditional Chinese medicine (TCM), has been used as a tonic herb due to its anti-arrhythmic, anti-arthritic, antihypertensive and anti-diabetic effects (Baranov, 1982). Moreover, as a main component of A. elata Xinmaitong capsules (Clinical Trial Approval Number 2003L01111 by the China Food and Drug Administration), AS was developed for the treatment of coronary heart disease and has successfully completed Phase III clinical trials in China. According to our previous studies, AS exhibited anti-myocardial ischemia and anti-hypoxia activities (Wang et al., 2014, 2015, 2017). The total saponins from AS are considered its main pharmacologically active ingredients. Various oleanane-type triterpene saponins have been extracted from AS and identified. Calenduloside E (CE, **Figure 1**) is one of the natural triterpene saponins extracted from AS. CE was previously shown to protect endothelial cells from injury, reduce endothelial cell apoptosis, and protect against H2O2-induced apoptosis of H9c2 cardiomyocytes (Tian et al., 2017a,b). Using the ABPP probe, we identified 128 proteins as the most likely targets of CE. In our previous paper, the ABPP probe was a basic probe, and the biotin tag on the probe may have interfered with the binding of CE to its targets. In the present research, for the first time, we designed and synthesized a clickable probe, CE-P, to which the biotin tag can be attached via click chemistry, avoiding interference from the bulky tag. Utilizing this CC-ABPP strategy, we identified and confirmed potential targets of CE.

### MATERIALS AND METHODS

#### Materials

ox-LDL was obtained from Union-Biotechnology. The Annexin V/propidium iodide (FITC/PI) staining kit (V13241) was from Molecular Probes™. MTT [3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide, 0973] was a product of Amresco. JC-1 (C2005) was purchased from Beyotime Biotechnology. The caspase-3 fluorometric assay kit (K105-200) was acquired from BioVision. VascuLife® VEGF Endothelial Cell Culture Medium (LL-0003) was a product of Lifeline Cell Technology. TBTA (tris[(1-benzyl-1H-1,2,3-triazol-4-yl)methyl]amine, T2993) and TCEP (tris(2-carboxyethyl)phosphine, T1656) were purchased from Tokyo Chemical Industry. Biotin-azide was provided by the Institute of Medicinal Plant Development (Beijing, China) (Tian et al., 2017b). HOBt (N-hydroxybenzotriazole), EDCI (1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride), and TEMPO (2,2,6,6-tetramethylpiperidine-1-oxyl) were purchased from Energy Chemical Industry. Pierce™ Streptavidin Agarose (20347) and Pierce™ Silver Stain for Mass Spectrometry (24600) were from Thermo Fisher Scientific. The primary antibodies against Hsp90AB1, Bcl2, and Cytochrome C were obtained from Santa Cruz Biotechnology (Santa Cruz, CA, USA). The Lox1 primary antibody and recombinant human Hsp90AB1 protein were from Abcam (Cambridge, UK).

#### Chemistry

Glycosyl donor **compound i** was prepared from galactose, and the reaction conditions were reported previously by Schmidt (Sun et al., 2014).

#### Synthesis of Compound I

To a solution of ursolic acid (10.0 g, 21.8 mmol) in dry DCM (300 mL), TBAB (0.8 g, 2.5 mmol) and K2CO3 (7.4 g, 53.6 mmol) in water (50 mL) were added, and benzyl bromide (3.2 mL, 26.8 mmol) was added dropwise at 0 °C. The reaction mixture was then stirred at room temperature for 18 h, and the reaction was monitored by TLC. The layers were separated and the aqueous layer was extracted with DCM (3 × 100 mL). The combined organic layers were washed sequentially with 0.1 mol/L aqueous HCl, saturated aqueous NaHCO3, and saturated aqueous NaCl, dried over Na2SO4, and purified by column chromatography (eluent: PE-EtOAc, 8:1) to afford pure **compound I** as a white solid (11.1 g, 93% yield). <sup>1</sup>H-NMR (600 MHz, pyridine-d5) δ: 7.36–7.29 (m, 5H, OPh-H), 5.23 (t, J = 3.3 Hz, 1H, H-12), 5.10 (d, J = 12.5 Hz, 1H, CH2OPh), 4.98 (d, J = 12.5 Hz, 1H, CH2OPh), 3.23–3.19 (m, 1H, H-3), 2.26 (d, J = 11.1 Hz, 1H, H-18), 1.07 (s, 3H, CH3), 0.98 (s, 3H, CH3), 0.93 (d, J = 6.3 Hz, 3H, CH3), 0.89 (s, 3H, CH3), 0.85 (d, J = 6.5 Hz, 3H, CH3), 0.77 (s, 3H, CH3), 0.64 (s, 3H, CH3); <sup>13</sup>C-NMR (150 MHz, pyridine-d5) δ: 177.5, 138.2, 136.5, 128.5, 128.3, 128.1, 125.8, 79.2, 77.4, 77.2, 76.9, 66.1, 55.3, 53.0, 48.2, 47.7, 42.2, 39.6, 39.2, 39.0, 38.9, 38.7, 37.1, 36.8, 33.1, 30.8, 28.3, 28.1, 27.3, 24.4, 23.7, 23.4, 21.3, 18.4, 17.1, 15.8, 15.6.

**Abbreviations:** CE, Calenduloside E; CE-P, Calenduloside E Probe; HUVEC, human umbilical vein endothelial cell; ABPP, Activity-based protein profiling; CC-ABPP, click chemistry-Activity-based protein profiling; SAR, structure-activity relationship; BnBr, benzyl bromide; K2CO3, potassium carbonate solution; TBAB, tetrabutylammonium bromide; DCM, dichloromethane; TMSOTf, trimethylsilyl trifluoromethanesulfonate; ox-LDL, Oxidized Low density lipoprotein; PI, propidium iodide; 1DGE, one-dimensional gel electrophoresis; LC-MS/MS, liquid chromatography/tandem mass spectrometry; Hsp90, Heat shock protein.

#### Synthesis of Compound II

To a solution of **compound I** (3.3 g, 6.0 mmol) in dry DCM (50 mL), glycosyl donor **compound i** (5.8 g, 7.9 mmol) and 4 Å molecular sieves (0.5 g) were added, and the mixture was stirred at room temperature for 1 h under an N2 atmosphere. The Lewis acid TMSOTf (60 µL, 0.3 mmol) was then added dropwise and the mixture was allowed to react for 2–4 h. Upon completion, triethylamine (1.0 mL) was added to quench the reaction. The suspension was filtered, the filtrate was evaporated, and the crude product was subjected to column chromatography (eluent: PE-EtOAc, 10:1) to give pure **compound II** (4.7 g, 70% yield) as a white solid. <sup>1</sup>H-NMR (600 MHz, pyridine-d5) δ: 8.25–8.22 (m, 4H, OBz-H), 8.16–8.15 (m, 2H, OBz-H), 8.01–8.00 (m, 2H, OBz-H), 7.56–7.52 (m, 3H, OBz-H), 7.49–7.41 (m, 6H, OBz-H), 7.38–7.34 (m, 3H, OBz-H), 7.29–7.27 (m, 3H, OBz-H), 7.10–7.08 (m, 2H, OBz-H), 6.55–6.54 (m, 1H, Gal-H), 6.48–6.45 (m, 1H, Gal-H), 6.40–6.38 (m, 1H, Gal-H), 5.43 (d, J = 7.9 Hz, 1H, Glc-H-1′), 5.41 (t, J = 3.3 Hz, 1H, H-12), 5.34 (d, J = 12.5 Hz, 1H, OBn-H), 5.22 (d, J = 12.4 Hz, 1H, OBn-H), 5.16–5.13 (m, 1H, Gal-H), 4.96–4.94 (m, 1H, Gal-H), 4.80–4.77 (m, 1H, Gal-H), 3.39 (dd, J = 11.9 Hz, 4.4 Hz, 1H, H-3), 2.47 (d, J = 11.3 Hz, 1H, H-18), 1.15 (s, 3H, CH3), 0.97 (d, J = 6.5 Hz, 3H, CH3), 0.93 (s, 6H, 2×CH3), 0.79 (s, 3H, CH3), 0.77 (s, 3H, CH3), 0.73 (s, 3H, CH3); <sup>13</sup>C-NMR (150 MHz, pyridine-d5) δ: 176.8, 166.1, 166.0, 165.8, 165.7, 138.5, 137.1, 133.8, 133.6, 133.5, 133.4, 130.3, 130.1, 130.0, 129.9, 129.8, 129.5, 129.0, 128.8, 128.7, 128.5, 128.3, 125.9, 103.8, 90.2, 72.6, 71.7, 71.1, 69.3, 66.1, 62.7, 55.5, 53.3, 48.2, 47.7, 42.2, 39.7, 39.2, 39.0, 38.9, 38.5, 36.9, 36.6, 33.2, 30.7, 28.2, 27.9, 26.4, 24.5, 23.7, 23.4, 21.2, 18.2, 17.2, 17.1, 16.6, 15.3.

#### Synthesis of Compound III

A mixture of **compound II** (3.0 g, 2.6 mmol) and 10% Pd/C (1.5 mg) was hydrogenated at 1 atm for 4–6 h in refluxing EtOAc (30 mL). The mixture was filtered and concentrated, the residue was purified by silica gel column chromatography (eluent: PE-EtOAc, 3:1) to get pure **compound III** (2.4 g, 91% yield) as white solid. <sup>1</sup>H-NMR (600 MHz, pyridine-d5) δ: 8.24–8.21 (m, 4H, OBz-H), 8.14–8.13 (m, 2H, OBz-H), 8.00–7.98 (m, 2H, OBz-H), 7.56–7.53 (m, 1H, OBz-H), 7.48–7.45 (m, 2H, OBz-H), 7.43–7.41 (m, 2H, OBz-H), 7.38–7.35 (m, 2H, OBz-H), 7.28–7.26 (m, 3H, OBz-H), 7.10–7.07 (m, 2H, OBz-H), 6.54–6.53 (m, 1H, Gal-H), 6.46–6.43 (m, 1H, Gal-H), 6.39–6.36 (m, 1H, Gal-H), 5.50 (t, J = 3.3 Hz, 1H, H-12), 5.42 (d, J = 7.9 Hz, 1H, Glc-H-1′ ), 5.15– 5.12 (m, 1H, Gal-H), 4.96–4.93 (m, 1H, Gal-H), 4.79–4.76 (m, 1H, Gal-H), 3.38 (dd, J = 11.7 Hz, 4.3 Hz, 1H, H-3), 2.65 (d, J = 11.3 Hz, 1H, H-18), 1.22 (s, 3H, CH3), 1.05 (d, J = 6.4 Hz, 3H, CH3), 0.98–0.97 (m, 6H, 2×CH3), 0.92 (s, 3H, CH3), 0.76 (s, 3H, CH3), 0.72 (s, 3H, CH3); <sup>13</sup>C-NMR (150 MHz, pyridine-d5) δ: 179.8, 166.1, 166.0, 165.8, 165.7, 139.1, 133.8, 133.6, 133.6, 133.4, 130.3, 130.0, 130.0, 130.0, 129.8, 129.4, 129.4, 129.0, 128.8, 128.8, 128.6, 125.4, 103.7, 90.2, 72.5, 71.7, 71.0, 69.3, 62.7, 55.7, 53.4, 47.9, 47.8, 43.3, 42.3, 39.7, 39.4, 39.3, 38.9, 38.5, 37.3, 36.6, 33.3, 30.9, 28.5, 27.8, 26.4, 24.8, 23.8, 23.4, 21.3, 18.2, 17.4, 17.2, 16.6, 15.3.

#### Synthesis of Compound IV

To a solution of compound **III** (1.0 g, 0.98 mmol) in dry DCM (15 mL), HOBt (0.2 g, 1.46 mmol) and EDCI (0.28 g, 1.46 mmol) were added, and the mixture was stirred at room temperature for 1 h. Propargylamine (0.22 g, 3.92 mmol) was then added at 0 °C and the reaction mixture was stirred for 8 h until completion. The organic phase was washed sequentially with 0.1 mol/L aqueous HCl, saturated aqueous NaHCO3, and saturated aqueous NaCl, and then dried over Na2SO4. The suspension was filtered, and the filtrate was concentrated and purified by column chromatography (eluent: DCM-CH3OH, 100:1) to afford pure compound **IV** as a white solid (79% yield). <sup>1</sup>H-NMR (600 MHz, pyridine-d5) δ: 8.24–8.22 (m, 4H, OBz-H), 8.14–8.13 (m, 2H, OBz-H), 7.99–7.98 (m, 2H, OBz-H), 7.94 (t, 1H, CONH), 7.53–7.52 (m, 1H, OBz-H), 7.47–7.46 (m, 2H, OBz-H), 7.43–7.40 (m, 2H, OBz-H), 7.37–7.35 (m, 2H, OBz-H), 7.28–7.26 (m, 3H, OBz-H), 7.09–7.07 (m, 2H, OBz-H), 6.54 (m, 1H, Gal-H), 6.46–6.37 (m, 2H, Gal-H), 5.47–5.42 (m, 2H, H-12, Glc-H-1′), 5.15–5.12 (m, 1H, Gal-H), 4.95 (m, 1H, Gal-H), 4.79–4.77 (m, 1H, Gal-H), 4.33 (m, 2H, CONHCH2), 3.39 (m, 1H, H-3), 3.13 (m, 1H, CCH), 2.47 (d, J = 10.3 Hz, 1H, H-18), 1.19 (s, 3H, CH3), 1.00 (d, J = 5.2 Hz, 3H, CH3), 0.97–0.92 (m, 9H, 3×CH3), 0.78–0.77 (m, 6H, 2×CH3); <sup>13</sup>C-NMR (150 MHz, pyridine-d5) δ: 177.1, 166.1, 166.0, 165.8, 165.7, 139.2, 133.8, 133.6, 133.6, 133.4, 130.3, 130.0, 130.0, 130.0, 129.8, 129.4, 129.4, 129.0, 128.8, 128.8, 128.6, 125.8, 103.7, 90.2, 81.9, 72.5, 71.9, 71.7, 71.0, 69.3, 62.7, 55.5, 53.1, 47.8, 47.7, 43.3, 42.3, 39.8, 39.7, 39.2, 38.9, 38.5, 37.7, 36.6, 33.1, 31.0, 29.9, 29.1, 28.1, 27.8, 26.4, 24.7, 23.7, 23.5, 21.3, 18.2, 17.5, 17.4, 16.6, 15.3.

#### Synthesis of Compound V

To a solution of compound **IV** in MeOH/DCM (8 mL, 3:1) was added a 1 mol/L NaOMe/MeOH solution (1.6 mL). The reaction mixture was stirred for 2 h until completion, after which Amberlite IR-120 was added to adjust the pH to 7. The suspension was filtered, and the filtrate was evaporated and purified by column chromatography (eluent: DCM-CH3OH, 10:1) to afford pure compound **V** as a white solid (93% yield). <sup>1</sup>H-NMR (600 MHz, pyridine-d5) δ: 7.84 (t, J = 5.4 Hz, 1H, N-H), 5.45 (t, J = 3.3 Hz, 1H, H-12), 4.89 (d, J = 7.7 Hz, 1H, H-1′), 4.60–4.59 (m, 1H, Gal-H), 4.52–4.46 (m, 3H, Gal-H), 4.40–4.28 (m, 2H, H-31), 4.19 (dd, J = 3.4 Hz, 9.4 Hz, 1H, Gal-H), 4.13 (t, J = 6.2 Hz, 1H, Gal-H), 3.43 (dd, J = 11.8 Hz, 4.5 Hz, 1H, H-3), 3.10 (t, J = 2.4 Hz, 1H, H-33), 2.44 (d, J = 10.8 Hz, 1H, H-18), 1.34 (s, 3H, CH3), 1.24 (s, 3H, CH3), 1.02 (s, 3H, CH3), 1.00 (s, 3H, CH3), 0.97 (d, J = 6.5 Hz, 3H, CH3), 0.94 (s, 3H, CH3), 0.90 (s, 3H, CH3); <sup>13</sup>C-NMR (150 MHz, pyridine-d5) δ: 177.2, 139.4, 126.0, 107.5, 88.8, 81.9, 76.8, 75.4, 73.1, 71.9, 70.3, 62.5, 55.9, 53.3, 47.9, 47.8, 42.5, 40.0, 39.8, 39.5, 39.3, 38.9, 37.8, 36.8, 33.3, 31.1, 29.2, 28.3, 26.7, 24.8, 23.8, 23.6, 21.3, 18.4, 17.6, 17.4, 17.0, 15.6; HRMS (ESI): Calcd for [M + H]<sup>+</sup> C39H62NO7: 656.4526, found 656.4516.
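The calculated HRMS value for compound V can be checked from its molecular formula. The following is a minimal sketch, assuming standard monoisotopic atomic masses and a simple formula parser; the function names are illustrative and do not come from the original work. With the electron-mass correction included, the result is roughly 0.0005 lower than the calculated mass reported above, which is well within the agreement expected for such a check.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes (standard values).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "Na": 22.98976928,
}
ELECTRON_MASS = 0.00054858  # u


def formula_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C39H61NO7'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[symbol] * (int(count) if count else 1)
    return total


def adduct_mz(neutral_formula, adduct):
    """m/z of a singly charged [M+adduct]+ ion: add the adduct atom, remove one electron."""
    return formula_mass(neutral_formula) + MONOISOTOPIC[adduct] - ELECTRON_MASS


# Neutral compound V is C39H61NO7; the text reports the protonated ion C39H62NO7.
print(f"[M+H]+ of compound V: {adduct_mz('C39H61NO7', 'H'):.4f}")
```

The same helper can be applied to sodium adducts by passing `"Na"` as the adduct.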

#### Synthesis of Compound VI (CE-P)

To a solution of compound **V** (200.0 mg, 305.14 µmol) in DCM (1 mL), KBr (7.26 mg, 61.03 µmol), TEMPO (0.95 mg, 6.1 µmol) and TBAB (19.67 mg, 61.03 µmol) were added at room temperature. An Na2CO3/NaHCO3 solution (3 mL, pH 9.5) was then added. To this mixture, Ca(ClO)2 (87.26 mg, 610.28 µmol) was added at 0 °C and the reaction mixture was stirred vigorously until completion. Na2SO3 (20 mg) was added to quench the reaction, and 6 N HCl was then added dropwise to acidify the mixture to pH 3. The crude mixture was extracted with DCM (3 × 15 mL), and the combined organic layers were dried over Na2SO4 and purified by column chromatography (eluent: CH2Cl2-CH3OH-H2O, 50:10:1) to afford pure compound **VI** as a white solid (63.4 mg, 31% yield). <sup>1</sup>H-NMR (600 MHz, pyridine-d5) δ: 7.91 (t, J = 5.4 Hz, 1H, N-H), 5.48 (t, J = 3.4 Hz, 1H, H-12), 4.87 (d, J = 7.7 Hz, 1H, H-1′), 4.65–4.58 (m, 1H, Gal-H), 4.56–4.46 (m, 1H, Gal-H), 4.46–4.28 (m, 3H, H-31, Gal-H), 4.24–4.12 (m, 1H, Gal-H), 3.43 (m, 1H, H-3), 3.11 (s, 1H, H-33), 2.48 (m, 1H, H-18), 1.36 (s, 3H, CH3), 1.25 (s, 3H, CH3), 1.02 (s, 3H, CH3), 0.99 (s, 3H, CH3), 0.97 (s, 6H, 2×CH3), 0.89 (s, 3H, CH3); <sup>13</sup>C-NMR (150 MHz, pyridine-d5) δ: 177.1, 175.9, 139.3, 125.8, 107.0, 88.3, 81.9, 76.1, 75.9, 72.7, 72.5, 71.9, 55.8, 53.2, 47.8, 47.7, 42.4, 39.9, 39.9, 39.7, 39.4, 39.2, 37.7, 36.8, 33.4, 31.1, 29.9, 29.2, 28.9, 28.2, 24.7, 23.8, 23.6, 21.3, 18.4, 17.5, 17.5, 16.9, 15.6; HRMS calcd mass for C39H59NNaO8 [M+Na]<sup>+</sup> 692.4138, found 692.4145. The spectra of compounds I–VI are shown in the Electronic Supplementary Material (ESI).
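The reagent masses in the oxidation step correspond to micromole-scale amounts, which can be verified with a short calculation. This is a sketch only, assuming rounded average molar masses (computed here from the molecular formulas, not stated in the text); the dictionary and function names are illustrative.

```python
# Average molar masses in g/mol (assumed, rounded standard values).
MOLAR_MASS = {
    "compound V": 655.92,   # C39H61NO7
    "KBr": 119.00,
    "TEMPO": 156.25,
    "TBAB": 322.37,
    "Ca(ClO)2": 142.98,
    "compound VI": 669.90,  # C39H59NO8
}

# Masses in mg as given in the procedure above.
MASS_MG = {
    "compound V": 200.0,
    "KBr": 7.26,
    "TEMPO": 0.95,
    "TBAB": 19.67,
    "Ca(ClO)2": 87.26,
}


def micromoles(mass_mg, molar_mass):
    """mg divided by g/mol gives mmol; multiply by 1000 for µmol."""
    return mass_mg / molar_mass * 1000.0


amounts = {name: micromoles(m, MOLAR_MASS[name]) for name, m in MASS_MG.items()}
equivalents = {name: n / amounts["compound V"] for name, n in amounts.items()}

# Isolated yield of compound VI (63.4 mg) relative to compound V.
isolated_umol = micromoles(63.4, MOLAR_MASS["compound VI"])
yield_percent = 100.0 * isolated_umol / amounts["compound V"]

for name in MASS_MG:
    print(f"{name}: {amounts[name]:.1f} µmol ({equivalents[name]:.2f} equiv)")
print(f"yield of compound VI: {yield_percent:.0f}%")
```

Under these assumptions the oxidant is used at about 2 equivalents and TEMPO at about 2 mol%, consistent with the reported 31% yield.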

#### Biological Studies

#### Cell Preparation and Culture

HUVECs were isolated from fresh human umbilical veins using 0.1% collagenase I, as previously described (Qin et al., 2015). After dissociation, the cells were collected and cultured in VascuLife <sup>R</sup> VEGF Endothelial Cell Culture Medium (Lifeline Cell Technology, MD, USA) supplemented with 100 U/mL penicillin and 100µg/mL streptomycin. All cell cultures were maintained in a humidified 37◦C incubator with 5% CO2, and the media were refreshed every 3 days. Cells at passages 3–7 were used in subsequent experiments. Neonatal umbilical cords were donated by the Maternal and Child Care Service Center in Beijing, China.

#### Cell Viability Assay

Cell viability was determined using the MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide, Amresco, 0973) assay as previously described (Tian et al., 2017b). Briefly, HUVECs were plated on 96-well plates at a density of 8 × 10<sup>4</sup> cells/well and then grown at 37 °C for 24 h. Cells in the treatment groups were pretreated with CE-P or CE for 8 h and then treated with ox-LDL (80 µg/mL, 24 h); the control group was pretreated with vehicle for 8 h and then incubated without ox-LDL. Twenty microliters of MTT (5 mg/mL) was added to each well and incubated for 4 h. The medium was removed and the formazan crystals were dissolved with dimethyl sulfoxide (DMSO). The absorbance was measured at 570 nm on a microplate reader (TECAN Infinite M1000, Austria).
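Absorbance readings from an MTT assay are commonly expressed as a percentage of the untreated control. The sketch below illustrates that normalization under stated assumptions: the absorbance values are invented for illustration, the optional blank subtraction is not described in the text, and the function name is hypothetical.

```python
from statistics import mean


def percent_viability(sample_od, control_od, blank_od=0.0):
    """Express each sample absorbance as a percentage of the mean control absorbance."""
    control_mean = mean(control_od) - blank_od
    return [100.0 * (od - blank_od) / control_mean for od in sample_od]


# Hypothetical 570 nm readings (three wells per group), not data from the study.
control_wells = [1.00, 1.04, 0.96]
oxldl_wells = [0.52, 0.48, 0.50]

viability = percent_viability(oxldl_wells, control_wells)
print([round(v, 1) for v in viability])
```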

#### Assessments of Cell Apoptosis

HUVECs were pretreated with BCEA for 8 h and then incubated with ox-LDL (70 µg/mL, 24 h) prior to the apoptosis assay. Double fluorescence staining was performed using an Annexin V-FITC/PI apoptosis staining kit (Molecular Probes™, V13241) according to the manufacturer's instructions to detect early apoptotic and necrotic cells. Cellular fluorescence was measured using flow cytometry with a FACSCalibur flow cytometer (BD Biosciences, USA).

#### Determination of ΔΨm

We used JC-1 (5,5′,6,6′-tetrachloro-1,1′,3,3′-tetraethylbenzimidazolylcarbocyanine iodide, Beyotime Biotechnology, C2005) to analyze ΔΨm. HUVECs were cultured on coverslips, the ox-LDL was removed, and the cells were washed twice with warm PBS and incubated with JC-1 (2 µM final concentration) for 30 min in the dark. The cells were finally washed twice with PBS, and images were captured using an EVOS® FL fluorescence microscope (Thermo Fisher Scientific, USA).

#### Analysis of Caspase-3 Activation

Caspase-3 activity was measured using a Fluorometric Assay Kit (BioVision, USA) according to the manufacturer's instructions. The samples were measured in a Fluoroskan Ascent FL fluorometer (Thermo Fisher Scientific, USA) using a 400 nm excitation wavelength and a 505 nm emission wavelength. The results are expressed as fold changes compared to the control.

#### Biotin–Neutravidin Pull-Down Assay

HUVECs were cultured in a T75 culture flask. HUVECs at 100% confluence were lysed in PBS buffer, and the protein concentration was adjusted to 2 mg/mL. For each experimental and control sample, 2 × 0.5 mL aliquots of the 2 mg/mL cell homogenate were transferred into microcentrifuge tubes. The experimental and control samples were incubated with 5 µL of 10 mg/mL CE-P or 5 µL of DMSO, respectively, at room temperature for 1 h. Then, the proteomes were labeled with biotin-azide (100 µM), TCEP (1 mM), TBTA (100 µM), and CuSO4·5H2O (1 mM) for 1 h. Seven hundred fifty microliters of cold MeOH was added, and the samples were sonicated for 3–4 s using a probe sonicator (∼30% power level) at 4 °C to re-suspend the protein. The samples were then centrifuged for 4 min at 6,500 × g at 4 °C and the supernatant was removed. The pellets were dissolved in PBS containing 1.2% SDS via sonication and then diluted with PBS containing 0.2% SDS. The samples were incubated with streptavidin beads for 2 h at room temperature and washed with PBS several times. Samples were denatured by heating in 2 × SDS-loading buffer and analyzed by SDS-PAGE. The resulting bands were visualized with Coomassie blue staining (Lee et al., 2014). Next, trypsin digestion was performed on selected visible protein bands.
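The probe concentration during lysate labeling follows from the stock concentration and volumes given above. The following sketch works through that arithmetic, assuming that the 5 µL of stock is added to each 0.5 mL aliquot and using an average molar mass of 669.9 g/mol for CE-P (derived from the formula C39H59NO8, not stated in the text). The function name is illustrative.

```python
def final_concentration_um(stock_mg_per_ml, stock_ul, sample_ul, molar_mass):
    """Final molar concentration (µM) after adding a small stock volume to a sample."""
    mass_ug = stock_mg_per_ml * stock_ul           # mg/mL × µL = µg
    total_ml = (stock_ul + sample_ul) / 1000.0     # µL to mL
    ug_per_ml = mass_ug / total_ml                 # equivalent to mg/L
    return ug_per_ml / molar_mass * 1000.0         # mg/L ÷ g/mol = mM; × 1000 = µM


# 5 µL of 10 mg/mL CE-P added to a 0.5 mL lysate aliquot (assumed per-aliquot addition).
CE_P_MOLAR_MASS = 669.9  # g/mol, assumed from C39H59NO8
probe_um = final_concentration_um(10.0, 5.0, 500.0, CE_P_MOLAR_MASS)
print(f"CE-P in lysate: about {probe_um:.0f} µM")
```

Under these assumptions the probe is present at roughly 99 µg/mL, or about 150 µM, in the labeling mixture.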

#### Western Blot

Cells were lysed in RIPA lysis buffer (Beyotime, Shanghai, China) containing a 1% protease inhibitor cocktail (Roche, Basel, Switzerland) (Sun et al., 2012). The protein content was measured with a BCA Protein Assay Kit (CWBiotech, Beijing, China). Approximately 30–50 µg of protein was resolved using 10 or 12% SDS-PAGE and then transferred to polyvinylidene difluoride membranes. The membranes were incubated with 1:500-diluted primary antibodies overnight at 4 °C, followed by horseradish peroxidase-conjugated secondary antibodies at room temperature. Then, the proteins were developed with an enhanced chemiluminescence detection system and imaged using a Bio-Rad imaging system (Bio-Rad, Hercules, CA, USA).

#### CE-P Binds to Recombinant Hsp90AB1

CE-P was incubated with the recombinant Hsp90AB1 protein at room temperature for 1 h. The protein was pulled down using the method described above (**Biotin–neutravidin pull-down assay**) and then detected by silver staining (Thermo Fisher Scientific, USA).

#### Targets Predicted by Discovery Studio 2016

The molecular targets of CE-P were predicted using Discovery Studio 2016 (BIOVIA Software Inc., San Diego, CA, USA), a software suite for performing computational analysis of data relevant to Life Sciences research. To determine the probable target of CE-P, we employed the Ligand Profiler protocol which maps a set of pharmacophores, including Pharma DB by default. The ligand CE-P was prepared by the Specifying Ligands parameter protocol. After inputting all parameters, the job was run and the results were monitored from the Jobs Explorer.

#### Molecular Docking

To explore the potential binding mode of CE/CE-P with the Hsp90AB1 protein (PDB code: 3NMQ), a molecular modeling study was performed using the Induced-Fit docking protocol, a refinement method in the MOE software. To eliminate any bond length and bond angle biases, the ligand (CE/CE-P) was subjected to energy minimization prior to docking. The binding affinities (S-values) in MOE were used to evaluate the interactions between Hsp90AB1 and CE/CE-P. The scores (binding affinities) were obtained based on the virtual calculation of various interactions of the ligands with the targeted receptor.

#### Surface-Plasmon Resonance (SPR)

The molecule/protein interaction detection and kinetic constant measurement were performed using the Biacore system. A CM5 sensor chip was activated using sulpho-NHS/EDC chemistry in a buffer consisting of 2.7 mM KCl, 137 mM NaCl, and 0.05% (v/v) surfactant P20, pH 7.4. Recombinant human Hsp90AB1 protein was then immobilized on the chip at a concentration of 37 µg/mL in sodium acetate, pH 4.5, and the chip was blocked with 1 M ethanolamine, pH 8.0. Compounds were dissolved to 10 mM in 100% DMSO, diluted 50-fold into running buffer without DMSO, and then serially diluted twofold in running buffer to 12.5, 6.25, 3.125, 1.56, 0.78, and 0 µM before injection. The optical interference pattern was recorded as a change in optical path difference in units of nm. Data were analyzed with Biacore T200 Evaluation Software.
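The dilution scheme and the expected shape of a dose-dependent response can be sketched as follows. This is an illustration only: the 1:1 steady-state binding model is a common textbook choice rather than the model confirmed by the text, and the `KD_UM` and `RMAX` values are hypothetical placeholders, not measured constants.

```python
def twofold_series(top_um, n_points):
    """Concentrations (µM) obtained by repeated twofold dilution from the top concentration."""
    return [top_um / (2 ** i) for i in range(n_points)]


def steady_state_response(conc_um, kd_um, rmax):
    """Equilibrium response for a 1:1 binding model: R = Rmax * C / (KD + C)."""
    return rmax * conc_um / (kd_um + conc_um)


stock_um = 10_000.0               # 10 mM stock in DMSO
intermediate_um = stock_um / 50   # 50-fold dilution into running buffer
injected_um = twofold_series(12.5, 5) + [0.0]  # the zero-analyte blank is included

KD_UM = 5.0    # hypothetical dissociation constant
RMAX = 100.0   # hypothetical maximal response

print(f"intermediate dilution: {intermediate_um:.0f} µM")
for conc in injected_um:
    print(f"{conc:8.4f} µM -> response {steady_state_response(conc, KD_UM, RMAX):6.2f}")
```

The exact series values 1.5625 and 0.78125 µM correspond to the rounded 1.56 and 0.78 µM quoted in the method.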

#### Statistical Analysis

Data are presented as the means ± standard deviation (SD) of three independent experiments. The groups were compared using one-way ANOVA followed by Tukey's multiple comparison tests using the statistics module of GraphPad Prism 5.0. A value of P < 0.05 was considered statistically significant.
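For readers unfamiliar with the test, the F statistic behind a one-way ANOVA can be computed directly from group means and variances. The sketch below shows only that step, using made-up triplicate values; the P value and Tukey's post hoc comparisons require the F and studentized range distributions and are left to a statistics package, as in the analysis described above. The function name is illustrative.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical triplicate measurements for three groups (not data from the study).
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```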

#### RESULTS

### Design and Synthesis of the CC Activity-Based Protein Profiling Probe CE-P Based on CE

According to previous studies, the biotinylated probe BCEA, which maintains the active moiety of the parental compound CE, exhibits similar protective effects against ox-LDL-induced human umbilical vein endothelial cell (HUVEC) damage and identified 128 proteins related to cell survival signaling pathways as the targets (Tian et al., 2017b). Based on studies of the structure-activity relationship (SAR), amide derivatives of CE containing ursane and galactoside scaffolds maintained similar activity to the parental compound CE (Tian et al., 2017a,b). In the current study, we describe the design and construction of the CC-Activity-Based Protein Profiling Probe CE-P (CC-ABPP CE-P, **Figure 1**) and its subsequent use in identifying the targets of CE. An alkynyl group was introduced at the C-28 carboxylic moiety of the saponin scaffold, which enabled the hydrophilic PEG chain to link to biotin through a Cu(I)-catalyzed Huisgen 1,3-dipolar cycloaddition reaction.

As illustrated in **Scheme 1**, naturally abundant ursolic acid was treated with benzyl bromide (BnBr), a potassium carbonate solution (K2CO3), and tetrabutylammonium bromide (TBAB) in dry dichloromethane (DCM) to obtain a good yield of compound **I**. The glycosyl donor **i** was prepared from galactose using the conditions reported by Schmidt (Schmidt and Michel, 1980). Compound **I** was reacted with glycosyl donor **i** under Lewis acidic conditions in the presence of trimethylsilyl trifluoromethanesulfonate (TMSOTf) to produce compound **II**, which was subjected to hydrogenation to obtain compound **III** in the presence of a catalytic amount of 10% Pd-C at atmospheric pressure. The above reaction conditions were reported in our previous paper. Compound **IV** was attained via amidation of the C-28 carboxyl group of saponin scaffold with propargylamine, followed by deprotection of the glycosyl groups in the presence of a NaOMe/MeOH solution to obtain compound **V**. In the final step, an oxidation reaction was performed using compound **V** and TEMPO/Ca(ClO)<sup>2</sup> in the presence of KBr and a TBAB catalyst in an Na2CO3/NaHCO<sup>3</sup> solution, yielding the CC-ABPP **CE-P** (compound **VI**).

### CE-P Protects Against ox-LDL-Induced Endothelial Cell Injury

As shown in our previous study, CE protected against ox-LDL-induced endothelial cell injury (Tian et al., 2017b). In this context, we introduced a very small alkyne group into CE to create a click chemistry activity-based probe. We first measured cell viability using the MTT assay to investigate the activity of CE-P. The cytotoxicity of CE-P was measured, and the results shown in **Figure 2A** did not reveal obvious changes in cell viability. Then, we determined whether CE-P protects cells from ox-LDL-induced injury. As shown in **Figure 2B**, the control group was pretreated with vehicle for 8 h and then incubated without ox-LDL. The groups exposed to ox-LDL exhibited dramatically decreased cell viability, whereas pretreatment with CE or CE-P (0.625 or 1.25 µg/mL) for 8 h significantly ameliorated cell injury. There were no significant differences between the two compounds at the same doses in sustaining cell viability. Thus, CE-P retained the ability to inhibit ox-LDL-induced HUVEC damage, and the presence of the small alkyne moiety did not affect the biological activity of CE.

### CE-P Attenuates ox-LDL-Induced HUVEC Apoptosis

CE has been shown to protect against cell apoptosis (Tian et al., 2017b). To explore whether the protective effect of CE-P against ox-LDL-induced injury involves the inhibition of apoptosis, we first detected phosphatidylserine (PS) exposure using Annexin V/propidium iodide (PI) double staining and flow cytometry. During the early stage of apoptosis, PS is exposed on the extracellular side of the cell membrane, and Annexin V specifically binds PS (Qin et al., 2015). As shown in **Figure 3A**, an 8 h CE-P pretreatment decreased the percentage of Annexin V(+)/PI(–) cells. Mitochondrial damage is closely related to cell apoptosis, and a change in the mitochondrial membrane potential (ΔΨm) is one of the main functional markers of mitochondrial injury (Yu et al., 2016). JC-1 is an indicator of the mitochondrial transmembrane potential. As indicated by the JC-1 staining shown in **Figure 3B**, red fluorescence represents normal mitochondria, and green fluorescence indicates HUVECs in which the mitochondrial membrane potential was depolarized. The ox-LDL-treated group exhibited a decrease in the intensity of red fluorescence and an increase in the green signal. In contrast, CE-P pretreatment reversed this change by decreasing the green signal and increasing the red fluorescence intensity, indicating that CE-P mitigated the loss of ΔΨm. Caspase-3 is one of the critical enzymes involved in apoptosis, and its active form, cleaved caspase-3, is induced at the late stage of apoptosis. DEVD-AFC is used to detect cleaved caspase-3 activity (Sun et al., 2014). As shown in **Figure 3C**, the CE-P pretreatment remarkably reduced cleaved caspase-3 activation. We evaluated the expression of apoptosis-related proteins using western blot analyses to further confirm the anti-apoptotic effects of CE-P on HUVECs. 
As shown in **Figure 3D**, CE-P increased the levels of Bcl2 and pro-caspase-3 and decreased the levels of Cytochrome C, consistent with our previous results showing the anti-apoptosis activity of CE. Lox-1 is the main ox-LDL receptor in HUVECs, and ox-LDL has been shown to induce Lox-1 expression, which triggers cell apoptosis (Li et al., 2003; Li and Mehta, 2009). In our study, the CE-P pretreatment remarkably attenuated Lox-1 expression during ox-LDL-induced injury. Based on these results, CE-P protected HUVECs from ox-LDL-induced cell apoptosis.

FIGURE 2 | CE-P protects against ox-LDL-induced endothelial cell injury. (A) To evaluate the cytotoxicity of CE-P, HUVECs were treated with CE-P alone (1.25, 2.5, 5 µg/mL) for 24 h and then the cell viability was measured by MTT assay. (B) HUVECs were pretreated with CE or CE-P (0.625, 1.25 µg/mL) for 8 h, then incubated with or without ox-LDL for another 24 h, and finally cell viability was assayed by MTT. The data are expressed as means ± SD from three independent experiments. ##P < 0.01 vs. control group, \*P < 0.05, \*\*P < 0.01 vs. ox-LDL treatment group. NS, not significant.

SCHEME 1 | Reagents and conditions (caption fragment): Na2CO3/NaHCO3, Ca(ClO)2, 0 °C, 8 h.

### Profiling of CE-P Target Proteins in HUVEC Cell Lysates Using Click Chemistry

With the effective chemical probe in hand, we performed pull-down experiments followed by proteomics analysis to identify the cellular targets of CE (**Figure 4A**). CE-P was first incubated with a cell lysate. Lysates incubated with DMSO or CE-P were then reacted with a biotin-azide linker via a click reaction, after which the labeled proteins were enriched by affinity pull-down using streptavidin beads. The enriched proteomes were eluted and separated by one-dimensional gel electrophoresis (1DGE). As shown in **Figure 4B**, we observed a single labeled protein band in the CE-P lane of the cell lysate (band A, indicated by an arrow). We also examined the washes from the CE-P reaction to exclude non-specific binding of CE-P. After extensive washing with the binding buffer, the unbound proteins were eluted. In **Figure 4C**, lane 1 is the cell lysate, lane 2 is the first elution solution, and lane 3 is the final washing solution. Thus, band A represents proteins that specifically bound to CE-P (Yi et al., 2012). Next, we cut band A from the DMSO lane and the CE-P lane for liquid chromatography/tandem mass spectrometry (LC-MS/MS) analysis. The Mascot search algorithm was used to identify proteins from the peptides detected by LC-MS/MS. A large number of proteins were identified from each LC-MS/MS run. Proteins with scores > 100 were considered reliable hits (Table S1) (Weerapana et al., 2010; Shi et al., 2011). Some of these proteins were inevitably non-specific, many of them "sticky" and/or highly abundant proteins, and these were automatically removed. "False" hits that appeared in control pull-down/LC-MS/MS experiments were also eliminated to generate the final complete list of proteins (Table S1). Consequently, we identified 37 proteins as specific targets of CE-P.
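The hit-filtering logic described above (keep proteins scoring above 100, then drop anything also found in the control pull-down) can be sketched in a few lines. The protein names and scores below are invented for illustration, apart from the Hsp90AB1 score of 217 that the text cites from Table S1; the function name is hypothetical and this is not the authors' actual pipeline.

```python
def specific_hits(probe_hits, control_hits, min_score=100.0):
    """Keep probe-lane proteins above the score cutoff that are absent from the control lane."""
    control = set(control_hits)
    return {
        name: score
        for name, score in probe_hits.items()
        if score > min_score and name not in control
    }


# Hypothetical Mascot scores for the CE-P lane (only HSP90AB1 = 217 is taken from the text).
cep_lane = {"HSP90AB1": 217.0, "ACTB": 350.0, "PROTEIN_X": 85.0}
# Hypothetical proteins detected in the DMSO control lane.
dmso_lane = ["ACTB", "TUBB"]

print(specific_hits(cep_lane, dmso_lane))
```

In this toy example the abundant control-lane protein and the low-scoring protein are both removed, leaving a single specific candidate.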

### Hsp90AB1 as a Potential Target of CE-P

The molecular targets of CE-P were predicted using Discovery Studio 2016 software. Nineteen potential targets were found and shown to have probable relationships with the pharmacological effects of CE-P. Among these candidates, we selected targets with scores > 0.5 for subsequent investigation and finally identified 9 proteins, as shown in **Figure 5A**. Moreover, Hsp90, which was predicted with a high score of 0.848264, was also identified by gel-based proteomics with a high score of 217 (Table S1 and **Figure 5B**). Comparing these results, we reasoned that Hsp90AB1 might be one of the potential targets and might be critical for cell apoptosis (Cohen-Saidon et al., 2006; Lanneau et al., 2007; Didelot et al., 2008; Chen et al., 2009). To further validate Hsp90AB1 as a direct binding target of CE-P, we confirmed the identity of the pulled-down proteins by immunoblotting with the respective antibodies. As shown in **Figure 5C**, the CE-P pull-down precipitated Hsp90AB1, whereas almost no signal was observed in the control group. To verify the interaction of CE-P with Hsp90AB1, we incubated recombinant Hsp90AB1 protein with CE-P. As shown in **Figure 5D**, Hsp90AB1 was clearly pulled down by CE-P, as detected by silver staining. We also found that CE-P pulled down Hsp90AB1 in a dose-dependent manner (**Figure 5E**). We then incubated HUVEC cell lysates with CE-P in the absence or presence of an excess of CE for competitive binding. As shown in **Figure 5F**, Hsp90AB1 was clearly pulled down by CE-P, and an excess of CE effectively blocked the binding of Hsp90AB1 to CE-P, as detected by Western blot. Taken together, these results confirmed a direct interaction between CE-P and Hsp90AB1. To further investigate the biological role of CE with respect to Hsp90AB1, we then examined the effects of CE on Hsp90AB1 expression levels in ox-LDL-induced HUVEC damage. **Figures 5G,H** show that CE pretreatment significantly inhibited the ox-LDL-induced down-regulation of Hsp90AB1 expression.

### Molecular Docking Between CE/CE-P and Hsp90AB1

Based on the predicted molecular targets, we analyzed the possible interaction between CE/CE-P and the binding sites of the 3D Hsp90AB1 structure (PDB ID: 3NMQ) using the Molecular Operating Environment (MOE) software package. The S-values (CE: −8.70 and CE-P: −8.78) were obtained from the virtual calculation of the interaction of CE/CE-P with the targeted Hsp90AB1 protein. Molecular modeling showed that both compounds could bind to the N-terminal domain of Hsp90AB1 and form important hydrogen bonds with the key amino acid residues Asp 93 and Asn 51 (**Figures 6A,B**). As shown in **Figure 6C**, the glycosyl moieties of CE (gray) and CE-P (green) are important for binding to the key amino acid residues of Hsp90AB1, and the propargyl group (red frame), exposed at the edge of the pocket, was designed for convenient "clicking" with the biotin tag.

### SPR Analysis of CE/CE-P Binding to Hsp90AB1

Surface plasmon resonance (SPR) biosensors are commonly applied for the real-time, label-free analysis and measurement of interactions between biomolecules and compounds. In our research, we applied this system to confirm the interaction of CE/CE-P with Hsp90AB1 and to determine the binding affinity. As shown in **Figures 7A,B**, SPR data analysis revealed that both CE and CE-P bound to Hsp90AB1 in a dose-dependent manner. The K<sub>D</sub> value of CE-P binding to Hsp90AB1 was 23.4 µM (**Figure 7D**), and that of CE was 2.34 µM (**Figure 7C**).

FIGURE 3 | CE-P attenuates the ox-LDL induced HUVECs apoptosis. The protective effect of CE-P on ox-LDL-induced apoptosis was determined via AnnexinV/PI double staining, JC-1 staining, cleaved-caspase3 activity, and western blot assay. HUVECs were pretreated with CE-P (1.25µg/mL) for 8 h and then incubated with or without ox-LDL for additional 24 h for associated measures. (A) After cell treatment, cell early apoptosis was measured via AnnexinV/PI double staining by flow cytometry. (B) The mitochondria damage during apoptosis was detected by JC-1 staining through fluorescence microscope. (C) At the final stage of apoptosis, the cleaved caspase3 activity was measured by fluorometric assay. (D) Apoptosis associated proteins Bcl2, Caspase3, Cytochrome C were evaluated by western blot. The data are expressed as means ± SD from three independent experiments. ##P < 0.01 vs. control group, \*\*P < 0.01 vs. ox-LDL treatment group.

#### DISCUSSION

The design and synthesis of potential probes represents a major challenge for target identification. In our previous study, the introduction of a substituent at the C-28 position of CE maintained its protective effects. Based on the results of preliminary SAR studies, amide derivatives of CE containing ursane and galactoside scaffolds maintained similar activity to the parent compound CE. In the current study, the N-propargylamide derivative CE-P was chosen as the clickable activity-based probe, into which the biotin tag was introduced using a Cu(I)-catalyzed Huisgen 1,3-dipolar cycloaddition reaction. According to the results of the MTT assay, the CE-P probe exhibited promising protective effects against ox-LDL-induced HUVEC damage. We also confirmed that CE-P protects against apoptosis using Annexin V/PI staining, JC-1 staining, caspase-3 activity assays and western blotting. Based on these results, CE-P maintains its anti-apoptosis activity and is suitable for use in further research.

In this context, we introduced a very small alkyne group into CE to create a clickable activity-based probe. Unlike the bulky biotin tag, the small alkyne group does not affect the interaction of this compound with the potential targets in vitro or its ability to penetrate the plasma membrane. In our previous reports, we utilized an ABPP probe and identified ∼750 potential targets; with the present probe, however, we identified 37 proteins as the most promising targets using the gel-based strategy. The clickable probe excluded a significant number of non-specific proteins and increased the possibility of identifying potential targets to prevent further injury. The probe will also be used to explore potential targets in vivo in future studies.

The ability to predict and interpret the mechanisms of action and biological targets of drugs has become feasible with the development of computational chemistry. Using DS 2016 software, we screened 9 proteins as potential targets that modulate a number of biological functions. Among these candidates, we focused on Hsp90AB1 because it had high scores in both the DS virtual prediction and the proteomic identification of the CE-P pull-down targets. To rule out interference from other proteins, we repeated the binding experiments with purified Hsp90AB1 protein. The SPR results also characterized the affinity between them. By affinity analysis, we found that CE-P (23.4 µM) had a weaker affinity than CE (2.43 µM) but still bound Hsp90AB1 in a dose-dependent manner. To explore their mode of action, we performed a virtual assay and found that both ligands could bind Hsp90AB1, which may be how CE influences the function of the target. However, this binding site is speculative and based only on molecular modeling. Confirming the exact binding domain of CE on Hsp90AB1 will require more powerful approaches, such as ATP/ADP site mutation and co-crystallization.

The Hsp90s are a family of molecular chaperones that function in the cellular stabilization, regulation, and activation of a range of "client" proteins. The human isoforms of Hsp90 include Hsp90α and Hsp90β (also named Hsp90AA1 and Hsp90AB1), which are 85% identical (Li and Buchner, 2013; Synoradzki and Bieganowski, 2015). Their distinct functions have been identified (Lamoth et al., 2016). Hsp90α correlates with tumor invasiveness, angiogenesis and metastasis (Tsutsumi et al., 2008; Song et al., 2010). In contrast, Hsp90β appears to have a specific role in the anti-apoptotic functions of Bcl2 and cIAP1 (Cohen-Saidon et al., 2006; Lanneau et al., 2007; Didelot et al., 2008; Chen et al., 2009). Hsp90α and Hsp90β were also recently found to have differing effects on the activity of endothelial nitric oxide synthase (Cortes-González et al., 2010; Fismen et al., 2012). In our research, the specific domains of Hsp90AB1 were identified by LC/MS of the pull-down proteins. CE protects against ox-LDL-induced apoptosis, which coincides with the function of Hsp90AB1, so we mainly focused on Hsp90AB1. Indeed, in addition to the mostly isoform-specific sequences, we also identified one sequence (HFSVEGQLEFR) that is shared by Hsp90AA1 and Hsp90AB1. This may suggest that other Hsp90s, such as Hsp90AA1, are also potential targets of CE, but considerably more experimental evidence is needed to prove it. Hsp90AB1, as a molecular chaperone, interacts with many clients, forming complexes that regulate its activity. In our research, we confirmed by SPR assay that CE binds directly to Hsp90AB1; whether CE also binds Hsp90AB1 clients requires further exploration (Hartson and Matts, 2012).

Post-translational modification (PTM) is central to biology because it expands and modulates the function of a large number of proteins. PTMs take many forms, such as the attachment of small moieties or cofactors, phosphorylation, acetylation, methylation, and ubiquitylation (Hartley et al., 2015). Hsp90 undergoes extensive post-translational modifications, such as phosphorylation, acetylation, S-nitrosylation, and ubiquitination (Mollapour and Neckers, 2012). Each of these modifications can significantly affect protein structure and function, thus influencing and even enabling inherent protein activity. In our research, we confirmed that CE binds Hsp90AB1 and interferes with its function. Whether other post-translational modifications are involved in this interaction requires further exploration.

Taken together, these results led us to focus on Hsp90AB1 as one potential target of CE in HUVECs for further studies. The other candidates in this report should still be considered as potential targets, but their roles must be confirmed in additional experiments. In our future studies, we will perform in vivo experiments to further examine all the candidates.

## CONCLUSION

In summary, our present research employed chemical proteomics and click chemistry approaches for the first time to explore the targets of CE in HUVECs and identified Hsp90AB1 as a possible molecular target. We designed and synthesized the clickable CE-P probe and showed that it exhibited similar activity to CE by inhibiting ox-LDL-induced cell injury. To fish for its targets, we pulled down the proteins in HUVEC cell lysate with CE-P and identified 37 potential targets using the gel-based proteomic strategy. Combining these data with the target fishing by DS 2016, we finally focused on the Hsp90AB1 protein on account of its high scores in both the pull-down assay and the virtual assay. To confirm the target, we first detected its presence in whole cell lysate by western blotting. The probe CE-P showed the same mode of interaction and the same binding site on Hsp90AB1 as CE, as indicated by the competitive inhibition experiment and molecular docking, respectively. To further confirm the interaction of CE-P with Hsp90AB1, we used the recombinant Hsp90AB1 protein to exclude interference from other proteins in the cell lysate. Moreover, the SPR analysis revealed that both CE and CE-P could bind to Hsp90AB1 with similar affinity, which showed that both CE and CE-P can bind directly to Hsp90AB1. Based on these data, we believe that Hsp90AB1 is a potential target of CE and will be a promising target for future exploration.

FIGURE 7 | SPR analyses of CE or CE-P binding to Hsp90AB1. Hsp90AB1 immobilized on a CM5 Sensor Chip was exposed to CE/CE-P at concentrations varying from 0.75 to 12.5 µM. (A,B) Representative binding curves of CE (A) and CE-P (B) binding to Hsp90AB1. (C,D) Kinetic binding constants of CE (C)/CE-P (D) with Hsp90AB1.

#### AUTHOR CONTRIBUTIONS

G-BS, X-DX, and X-BS conducted the study. SW and YT designed the detailed experiments, performed the study, and collected and analyzed data. J-YZ, H-BX, PZ, MW (sixth author), S-BL, YL and MW (ninth author) took part in the experiments in this study. All authors commented on the study and approved the final manuscript.

#### ACKNOWLEDGMENTS

This work was supported by the National Natural Sciences Foundation of China (Grant Nos. 81473380, 81302656, and 81502929), the Natural Sciences Foundation of Beijing (Grant Nos. 7144225 and 7152102), the National Science and Technology Major Project (Grant No. 2015ZX09501004-001-003), the Peking Union Medical College Graduate Student Innovation Fund (Grant No. 2016-1007-06), and the CAMS Innovation Fund for Medical Science (CIFMS) (Grant No. 2016-I2M-1-012).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.00532/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Wang, Tian, Zhang, Xu, Zhou, Wang, Lu, Luo, Wang, Sun, Xu and Sun. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Aromatic Rings Commonly Used in Medicinal Chemistry: Force Fields Comparison and Interactions With Water Toward the Design of New Chemical Entities

Marcelo D. Polêto<sup>1</sup>, Victor H. Rusu<sup>2</sup>, Bruno I. Grisci<sup>3</sup>, Marcio Dorn<sup>3</sup>, Roberto D. Lins<sup>4</sup> and Hugo Verli<sup>1</sup>\*

<sup>1</sup> Grupo de Bioinformática Estrutural, Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, <sup>2</sup> Swiss National Supercomputing Centre, Lugano, Switzerland, <sup>3</sup> Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, <sup>4</sup> Instituto Aggeu Magalhães, Fundação Oswaldo Cruz, Recife, Brazil

#### Edited by:

Adriano D. Andricopulo, University of São Paulo, Brazil

#### Reviewed by:

Antonio Monari, Université de Lorraine, France Gustavo Trossini, Universidade de São Paulo, Brazil

> \*Correspondence: Hugo Verli hverli@cbiot.ufrgs.br

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 09 November 2017 Accepted: 05 April 2018 Published: 24 April 2018

#### Citation:

Polêto MD, Rusu VH, Grisci BI, Dorn M, Lins RD and Verli H (2018) Aromatic Rings Commonly Used in Medicinal Chemistry: Force Fields Comparison and Interactions With Water Toward the Design of New Chemical Entities. Front. Pharmacol. 9:395. doi: 10.3389/fphar.2018.00395

The identification of lead compounds usually includes a step of chemical diversity generation. Its rationale may be supported by both qualitative (SAR) and quantitative (QSAR) approaches, offering models of the putative ligand-receptor interactions. In both scenarios, our understanding of which interactions functional groups can perform is mostly based on their chemical nature (such as electronegativity, volume, melting point, lipophilicity etc.) instead of their dynamics in aqueous, biological solutions (solvent accessibility, lifetime of hydrogen bonds, solvent structure etc.). As a consequence, it is challenging to predict from 2D structures which functional groups will be able to interact with the target receptor, and with which intensity and relative abundance in the biological environment, all of which contribute to ligand potency and intrinsic activity. With this in mind, the aim of this work is to assess the properties of aromatic rings commonly used in drug design in aqueous solution through molecular dynamics simulations, in order to characterize their chemical features and infer their impact on complexation dynamics. For this, common aromatic and heteroaromatic rings were selected and received a new atomic charge set based on the direction and magnitude of the dipole moment from MP2/6-31G\* calculations, while other topological terms were taken from the GROMOS53A6 force field. Afterwards, liquid physicochemical properties were simulated for a calibration set composed of nearly 40 molecules and compared to their respective experimental data in order to validate each topology. Based on the reliability of the employed strategy, we expanded the dataset to more than 100 aromatic rings. Properties in aqueous solution such as solvent accessible surface area, H-bond availability, H-bond residence time, and water structure around heteroatoms were calculated for each ring, creating a database of potential interactions and shedding light on features of drugs in biological solutions, on the structural basis for bioisosterism and on the enthalpic/entropic costs of ligand-receptor complexation dynamics.

Keywords: drug design, GROMOS, aromatic rings, functional groups, interactions

## 1. INTRODUCTION

The development of a drug is a multi-step process, usually starting with the identification of hit compounds. The challenging task of optimizing these compounds into leads and finally into drugs is commonly facilitated by computer-aided drug design (CADD) techniques (Anderson, 2003; Sliwoski et al., 2013; Bajorath, 2015). With the growing information on protein structure in recent years, structure-based drug design (SBDD) has become a significant tool for hit discovery (Anderson, 2003; Lounnas et al., 2013; Lionta et al., 2014). When structural information on the receptor is absent, molecular fingerprints of approved drugs are used to search for new ligands in a process known as ligand-based drug design (LBDD) (Lee et al., 2011). Nevertheless, there are still considerable challenges associated with predicting ligand potency and affinity via computational methods (Paul et al., 2010; Csermely et al., 2012).

In general, optimization of lead compounds is based on qualitative or quantitative structure-activity relationships (SAR or QSAR, respectively) (Shahlaei, 2013). These relationships usually rely on molecular descriptors to predict ligand pharmacodynamics and pharmacokinetics, such as logP to assess lipophilicity, logS to assess solubility or pKa to assess the ionic state of a compound, along with other topological, geometrical and physicochemical descriptors (Danishuddin and Khan, 2016). While some correlations have reasonable predictive power, many descriptors have no biological meaning and can mislead the optimization process. As highlighted by Hopkins et al. (2014), high-throughput screening methods have been linked to the rise of hits with inflated physicochemical properties during the optimization process (Keserü and Makara, 2009). Also, reviews have shown an increase in molar mass in recent medicinal chemistry efforts (Leeson and Springthorpe, 2007), and many authors correlate this strategy with the likelihood of poor results for such compounds (Gleeson, 2008; Waring, 2009, 2010; Gleeson et al., 2011).

Many chemical moieties are regularly used in medicinal chemistry to produce chemical diversity (Bemis and Murcko, 1996; Welsch et al., 2010; Taylor et al., 2014), a practice well known as fragment-based drug design (FBDD), and their use for pharmacophore modeling and for preventing high toxicity is not recent (Gao et al., 2010). In particular, aromatic rings are extensively used in drugs due to their well-known synthetic and modification paths (Aldeghi et al., 2014). For example, at least one aromatic ring can be found in 99% of a database containing more than 3,500 compounds evaluated by the medicinal chemistry departments of Pfizer, AstraZeneca (AZ) and GlaxoSmithKline (GSK) (Roughley and Jordan, 2011). Still, little is known about their chemical features in biological solution, such as H-bond availability, lifetime of H-bonds, solvent accessibility, and conformational ensemble. In this sense, molecular dynamics (MD) simulations can provide useful information with atomistic resolution and assess the aforementioned features of chemical groups in water, providing fundamental data to drive medicinal chemistry approaches.

Still, dynamical properties of chemical moieties in biological solution are usually neglected in drug design and very difficult to assess (Ferenczy and Keseru, 2010; Reynolds and Holloway, 2011; Hopkins et al., 2014). Even though MD simulations have been used in medicinal chemistry to generate different receptor conformers and to validate binding poses predicted by docking (Zhao and Caflisch, 2015; Ganesan et al., 2017), simulations of the free ligand in solution are rarely used to assess the conformational ensemble and the energies associated with solvation, due to the challenge of resolving conformational flexibility and internal energies (Butler et al., 2009; Blundell et al., 2016). When the ligand is solvated, the enthalpic and entropic costs of disrupting an H-bond or dismantling its entire solvation shell can be the determinant step in providing the proper energy of binding (Biela et al., 2012; Blundell et al., 2013; Mondal et al., 2014). Yet, the free energy of binding is often predicted via geometrical or alchemical transformations (Zwanzig, 1954; Aqvist et al., 1994; Woo and Roux, 2005; Gumbart et al., 2013), alongside recent developments in funnel metadynamics (Limongelli et al., 2013). More recently, thermodynamic features of ligands have been experimentally investigated in order to enhance binding and efficiency (Freire, 2009; Ferenczy and Keseru, 2010; Reynolds and Holloway, 2011). Ligand features such as H-bond lifetime, effects of vicinity on H-bond availability and strength, accessible surface area and water structure around binding sites can provide substantial information for designing new molecular entities (Blundell et al., 2016).

Different force fields have been used for drug design purposes, such as MMFF94 (Halgren, 1996), OPLS-AA (Jorgensen et al., 1996), and GAFF (Wang et al., 2004). While these force fields parameterized their electrostatic terms using ab initio calculations, the GROMOS force fields (derived from the Groningen Molecular Simulation package) used the free energy of solvation as a target (Daura et al., 1998; Oostenbrink et al., 2004) to empirically assign atomic partial charges. Thus, in this work, we have chosen the GROMOS force field to simulate the dynamical behavior of 103 aromatic rings (including a calibration subset of 42 molecules) most commonly used in drug design and their interactions with the solvent, in order to assess thermodynamic properties in solution. These interactions, in turn, offer a reference for future rational drug design studies, as they describe in detail how several functional groups interact with their surroundings.

## 2. METHODS

## 2.1. Selection of Rings

A series of 103 aromatic rings commonly used in drug design were selected for this study (Broughton and Watson, 2004; Jordan and Roughley, 2009; Welsch et al., 2010; Taylor et al., 2014, 2017). Among them, a calibration set of 42 molecules (**Table 1**), for which physicochemical properties are known, was selected from the benchmark developed by Caleman et al. (2012). Briefly, both works of Taylor et al. (2014, 2017) employed a detailed search of substructure frequencies in the FDA Orange Book, cross-referenced with ChEMBL, DrugBank, Nature, Drug Reviews, the FDA Web site, and the Annual Reports in Medicinal Chemistry; the work of Broughton and Watson (2004) employed a search of substructure frequencies in the MDL Drug Data Report database using a "Phase II" keyword; and the work of Welsch et al. (2010) pinpointed privileged scaffolds from natural-product works throughout the literature.

TABLE 1 | Charge groups (colored) and aromatic rings used as calibration set in this work.

### 2.2. Topology Construction

Structures for these aromatic rings were built using Avogadro (Hanwell et al., 2012). Molecular mechanical (MM) topological parameters such as bonds, angles, and Lennard-Jones parameters were taken from GROMOS53A6 (Oostenbrink et al., 2004). Due to the well-known good performance of MP2 methods for small aromatic rings (Li et al., 2015; Matczak and Wojtulewski, 2015), atomic partial charges were based on quantum mechanical (QM) calculations using MP2 theory (Møller and Plesset, 1934), the 6-31G<sup>∗</sup> basis set (Petersson et al., 1988) and the implicit-solvent Polarizable Continuum Model (PCM) (Mennucci and Tomasi, 1997), followed by a RESP fitting (Bayly et al., 1993). The partial charges so obtained were adjusted in the MM model to reproduce the QM dipole moment of the ring. The angle θ formed between the QM and MM dipole moment vectors was monitored with an in-house script to make sure it had the lowest value possible, guaranteeing the conservation of the QM dipole moment direction. For our calibration set, the magnitudes of the MM partial charges were adjusted to better reproduce the physicochemical properties of the organic liquids. Following the philosophy of charge group assignment, groups were limited, at maximum, to the atoms at the ortho position on each ring. In more complex substitution patterns, a superimposition of two charge groups was required to correctly describe the chemical group. In such cases, the Coulombic terms of the overlapping atoms were adjusted to correctly describe the direction of the total dipole moment of the ring. For molecules containing linear constraints (benzonitrile), virtual sites were added in order to preserve the total moment of inertia and mass, thus preserving the linearity of these groups (Feenstra et al., 1999).
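The monitoring of the angle θ between the QM and MM dipole moment vectors can be sketched as follows. This is a minimal illustration and not the authors' in-house script: the function names, the three-site geometry, the charges and the QM dipole are all hypothetical. It assumes the MM dipole is computed as the sum of point charges times positions, which is origin-independent for a neutral molecule.

```python
import math


def dipole(charges, coords):
    """Dipole vector sum_i q_i * r_i of a set of point charges (e*nm)."""
    return tuple(sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3))


def angle_deg(u, v):
    """Angle between two 3D vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical neutral three-site fragment (coordinates in nm, charges in e).
coords = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
mm_charges = [-0.4, 0.2, 0.2]

mu_mm = dipole(mm_charges, coords)
mu_qm = (0.03, 0.03, 0.0)  # hypothetical QM dipole with the same direction
theta = angle_deg(mu_qm, mu_mm)
```

In this toy case the two vectors are parallel, so θ is essentially zero even though their magnitudes differ; in practice, a charge adjustment would be accepted only if θ stayed small while the magnitude was tuned.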

### 2.3. New Torsional Potentials

The quantum mechanical torsional profile of every dihedral angle was calculated using Gaussian (Frisch et al., 2016) (RRID:SCR\_014897). Molecular structures were built using Avogadro (Hanwell et al., 2012) and their geometries were optimized using the Hartree-Fock method (Fock, 1930; Hartree and Hartree, 1935) and the 3-21G<sup>∗</sup> basis set (Dobbs and Hehre, 1986). Afterwards, the Scan routine was used to calculate the total energy of the molecular conformation for each dihedral orientation, adopting tight convergence criteria, with geometry optimization at the MP2/6-31G<sup>∗</sup> level and steps of 30°. In order to calculate the torsional profile for the molecular mechanics model, dihedral orientations were kept fixed during minimization using restraint forces at the same angles evaluated by the quantum calculations. Both profiles were submitted to the Rotational Profiler server (Rusu et al., 2014) to obtain appropriate sets of classical mechanics parameters that provided a better fit to the QM-obtained torsional profile.
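To illustrate how a classical torsional term can be compared with a scanned profile, the sketch below evaluates a periodic dihedral potential on a 30° grid and measures its deviation from a reference profile. The functional form k(1 + cos(nφ − φ0)) is an assumption chosen because it is a common form for proper dihedrals; the parameters and function names are hypothetical, and the actual fitting performed by the Rotational Profiler server is not reproduced here.

```python
import math


def torsion_energy(phi_deg, terms):
    """Sum of periodic terms k * (1 + cos(n*phi - phi0)), in kJ/mol.

    terms is a list of (k, n, phi0_deg) tuples.
    """
    phi = math.radians(phi_deg)
    return sum(
        k * (1.0 + math.cos(n * phi - math.radians(phi0)))
        for k, n, phi0 in terms
    )


def rmsd(profile_a, profile_b):
    """Root-mean-square deviation between two energy profiles."""
    n = len(profile_a)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)) / n)


angles = list(range(0, 360, 30))      # 12 scan points in steps of 30 degrees
terms = [(5.0, 2, 180.0)]             # hypothetical k, multiplicity, phase
mm_profile = [torsion_energy(a, terms) for a in angles]
```

With these illustrative parameters the profile has minima at 0° and 180° and maxima of 2k at 90° and 270°; an `rmsd` between `mm_profile` and a QM profile on the same grid would be the quantity to minimize when choosing the parameters.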

#### 2.4. General Simulation Settings

All simulations were carried out using the GROMACS 5.0.7 package (Abraham et al., 2015) (RRID:SCR\_014565). In order to create parameters compatible with the GROMOS family, we followed the settings of previous literature (Daura et al., 1998; Schuler et al., 2001; Oostenbrink et al., 2004): a twin-range scheme was used with short- and long-range cutoff distances of 0.8 and 1.4 nm, respectively. Also, the reaction-field method was applied to correct the effects of electrostatic interactions beyond the long-range cutoff distance (Barker and Watts, 1973; Tironi et al., 1995), using the dielectric constant of each liquid as ε<sub>RF</sub> for organic liquid simulations and ε<sub>RF</sub> = 62 for simulations in water (Heinz et al., 2001; Oostenbrink et al., 2004). The LINCS algorithm (Hess et al., 1997; Hess, 2008) was used to constrain all covalent bonds, using a cubic interpolation, a Fourier grid of 0.12 nm and a timestep of 2 fs. Configurations were saved every 2 ps for analysis.

#### 2.4.1. Organic Liquids Simulations

In order to build the organic liquid systems, cubic boxes of 2×2×2 nm were created, each containing a single organic molecule. A total of 125 of these boxes were stacked, forming a single box of 10×10×10 nm with conventional periodic boundary conditions, which was simulated under high pressure (100 bar) to induce the liquid phase. The systems were then simulated and equilibrated at 1 bar. Afterwards, the boxes were staggered to obtain systems with 1000 molecules in the liquid phase and simulated at 1 bar until the total energy drift converged to values below 0.5 J/(mol×ns×degree of freedom). Such a criterion is necessary to make sure that the fluctuating properties can be accurately calculated (Caleman et al., 2012). All simulations were carried out with the Berendsen pressure and temperature coupling algorithms due to their efficiency in molecular relaxation (Berendsen et al., 1984), using τ<sub>T</sub> = 0.2 ps and τ<sub>P</sub> = 0.5 ps. When available, experimental values of isothermal compressibility and dielectric constant were used as additional parameters for the liquid simulations. Otherwise, the compressibility of the most chemically similar molecule was used. The experimental dielectric constants of each liquid were also used as parameters in the simulations (Oostenbrink et al., 2004).

In order to calculate the densities of the liquids (ρ), simulations at constant pressure were carried out for 10 ns and ρ was calculated using block averages over 5 blocks. The enthalpy of vaporization (ΔH<sub>vap</sub>) was calculated by block averaging the same 10 ns of liquid simulation to obtain E<sub>pot</sub>(l), and another 100 ns of gas-phase simulation using a stochastic dynamics (SD) integrator (Van Gunsteren and Berendsen, 1988) with a single molecule in vacuum to obtain E<sub>pot</sub>(g), according to the equation:

$$
\Delta H_{vap} = \left(E_{pot}(g) + k_B T\right) - E_{pot}(l) \tag{1}
$$
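As a worked illustration of Equation (1), the sketch below evaluates the enthalpy of vaporization from average potential energies. It assumes that the liquid potential energy is normalized per molecule before the subtraction and that energies are expressed in kJ/mol, so that k<sub>B</sub>T becomes RT; the function name and the numerical inputs are hypothetical placeholders and not results from this work.

```python
R_KJ = 0.0083144626  # molar gas constant in kJ/(mol K), i.e. k_B per mole


def enthalpy_of_vaporization(epot_gas, epot_liquid_total, n_molecules, temperature):
    """Equation (1): dH_vap = (E_pot(g) + k_B*T) - E_pot(l).

    epot_gas           average potential energy of one molecule in vacuum (kJ/mol)
    epot_liquid_total  average potential energy of the whole liquid box (kJ/mol)
    n_molecules        number of molecules in the liquid box
    temperature        temperature in K
    """
    epot_liquid = epot_liquid_total / n_molecules  # per-molecule liquid energy
    return (epot_gas + R_KJ * temperature) - epot_liquid


# Illustrative numbers only: a 1000-molecule box at 298.15 K.
dh_vap = enthalpy_of_vaporization(10.0, -25000.0, 1000, 298.15)
```

With these placeholder inputs the result is about 37.5 kJ/mol, of which roughly 2.5 kJ/mol comes from the k<sub>B</sub>T term.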

To calculate the dielectric constant (ε), the simulations of the liquid boxes from which ρ was obtained were extended up to 60 ns. Convergence of ε was checked using running averages, and ε was evaluated only after convergence. In order to calculate thermal expansion coefficients (α<sub>P</sub>) and classical isobaric heat capacities (C<sub>P</sub><sup>cla</sup>), three constant-pressure simulations were carried out for 5 ns each, at temperatures T, T+10 K, and T−10 K, for each liquid. The calculations of α<sub>P</sub> and C<sub>P</sub><sup>cla</sup> were done using the finite difference method (Kunz and van Gunsteren, 2009):

$$\alpha_P \approx \frac{1}{V} \left(\frac{\partial V}{\partial T}\right)_P \approx -\frac{\ln \langle \rho \rangle_{T_2} - \ln \langle \rho \rangle_{T_1}}{T_2 - T_1} \tag{2}$$

and:

$$C_P \approx \left(\frac{\partial U}{\partial T}\right)_P \approx \frac{\langle U \rangle_{T_2} - \langle U \rangle_{T_1}}{T_2 - T_1} \tag{3}$$

In order to calculate isothermal compressibilities (κ<sub>T</sub>), three constant-volume simulations were carried out for 5 ns each, with pressures of 1, 0.9, and 1.1 bar. The calculation of κ<sub>T</sub> was also done using the finite difference method:

$$\kappa_T \approx -\frac{1}{V} \left(\frac{\partial V}{\partial P}\right)_T \approx \frac{\ln \rho_2 - \ln \rho_1}{\langle P \rangle_{\rho_2} - \langle P \rangle_{\rho_1}}\tag{4}$$
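The finite-difference estimates of the thermal expansion coefficient, heat capacity and isothermal compressibility can be sketched in a few lines. The function names and all numerical inputs are hypothetical. The compressibility is written with the standard thermodynamic convention κ<sub>T</sub> = −(1/V)(∂V/∂P)<sub>T</sub> = (∂ ln ρ/∂P)<sub>T</sub>, so that it is positive whenever density increases with pressure.

```python
import math


def thermal_expansion(rho1, rho2, t1, t2):
    """alpha_P ~ -(ln rho(T2) - ln rho(T1)) / (T2 - T1), in 1/K."""
    return -(math.log(rho2) - math.log(rho1)) / (t2 - t1)


def heat_capacity(u1, u2, t1, t2):
    """C_P ~ (<U>(T2) - <U>(T1)) / (T2 - T1), in energy units per K."""
    return (u2 - u1) / (t2 - t1)


def isothermal_compressibility(rho1, rho2, p1, p2):
    """kappa_T ~ (ln rho2 - ln rho1) / (<P>(rho2) - <P>(rho1)), in 1/pressure."""
    return (math.log(rho2) - math.log(rho1)) / (p2 - p1)


# Illustrative inputs only (densities in kg/m^3, T in K, U in kJ/mol, P in bar).
alpha_p = thermal_expansion(880.0, 860.0, 288.15, 308.15)
c_p = heat_capacity(-30.0, -28.0, 288.15, 308.15)
kappa_t = isothermal_compressibility(870.0, 870.087, 0.0, 1.0)
```

Using logarithms of the density rather than raw volumes avoids having to choose a reference volume for the 1/V prefactor; the two temperatures (or pressures) correspond to the simulations run on either side of the target state point.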

#### 2.4.2. Solvation Free Energy Simulations

Simulations in water were carried out to evaluate the solvation free energies (ΔG<sub>hyd</sub>) of 30 molecules at 1 bar and 298 K. Each aromatic ring (solute) was centered in a cubic box with appropriate dimensions to reproduce the density of the SPC water model (0.997 g/cm<sup>3</sup>). In free-energy calculations using the thermodynamic integration (TI) method, a coupling parameter λ is used to perturb the solute-solvent interactions:

$$
\Delta G_{\mathrm{sim}} = \int_0^1 \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_\lambda d\lambda \tag{5}
$$

in which H is the Hamiltonian, λ = 0 refers to the state in which the solute fully interacts with the solvent, and λ = 1 refers to the state in which solute-solvent interactions are absent. In our setup, Coulombic interactions were decoupled first and Lennard-Jones interactions afterwards, using a soft-core potential to avoid issues related to strong Lennard-Jones interactions (Beutler et al., 1994). The soft-core power was set to 1 and α<sub>LJ</sub> to 0.5, following the recommendations of Shirts and Pande (2005). Both interactions were decoupled using the λ values 0, 0.02, 0.04, 0.07, 0.1, 0.15, 0.2, ..., 0.8, 0.85, 0.9, 0.93, 0.96, 0.98, 1, for a total of 50 λ simulations.
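
The λ schedule and the integral in Equation 5 can be illustrated as follows. This is a sketch under two stated assumptions: the elided middle of the λ list is taken to advance in steps of 0.05 (which gives 25 values per decoupling stage and therefore the 50 simulations quoted in the text), and the integral is evaluated with the trapezoid rule, since the quadrature actually used is not specified. Function names are hypothetical.

```python
def lambda_schedule():
    """Rebuild the lambda values listed in the text for one decoupling stage."""
    head = [0.0, 0.02, 0.04, 0.07, 0.1]
    # Assumed spacing of 0.05 for the elided range 0.15 ... 0.90
    middle = [round(0.15 + 0.05 * i, 2) for i in range(16)]
    tail = [0.93, 0.96, 0.98, 1.0]
    return head + middle + tail

def ti_free_energy(lambdas, dhdl_means):
    """Trapezoidal estimate of Eq. 5 from <dH/dlambda> at each lambda."""
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * width * (dhdl_means[i] + dhdl_means[i + 1])
    return total
```

The denser spacing near λ = 0 and λ = 1 places more windows where ⟨∂H/∂λ⟩ typically changes fastest, which reduces the quadrature error of a simple rule such as this one.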

Our simulation protocol consisted of an initial steepest-descent minimization, followed by an L-BFGS minimization until a maximum force of 10 kJ mol<sup>−1</sup> nm<sup>−1</sup> was reached.

TABLE 2 | Dataset of aromatic rings evaluated in this work. Heteroatoms are highlighted in colors.

Afterwards, initial velocities were assigned and the systems were equilibrated for 100 ps in an NVT ensemble at each λ. The systems were then subjected to another 100 ps of equilibration in an NPT ensemble, using the Parrinello-Rahman pressure coupling algorithm (Parrinello and Rahman, 1981), a coupling time constant of τ<sub>t</sub> = 5 ps, and a compressibility of 4.5 × 10<sup>−5</sup> bar<sup>−1</sup>. Finally, production simulations were run using the Langevin integrator (Van Gunsteren and Berendsen, 1988) to sample ⟨∂H/∂λ⟩<sub>λ</sub> until convergence; simulation times therefore varied between 1 and 5 ns. In addition, the last frame of the production phase at each λ was used as input for the next λ.

#### 2.4.3. Simulation of Rings in Water

After an extensive comparison of simulated and experimental physicochemical properties of our calibration set and its consequent validation, the same topology-building strategy was applied to another 61 rings commonly used in drug design (**Table 2**) for which experimental properties are not available, for a total of 103 aromatic rings in this study. Hence, to evaluate the chemical features of aromatic rings and their interactions with their surroundings, the full set of 103 aromatic rings was simulated in water, including all 42 molecules of the calibration set (**Table 1**). Each solute was placed in a cubic box with a distance of 1.0 nm to its edges. The boxes were then filled with SPC water and minimized to eliminate any clashes, until convergence at a maximum force of 0.1 kJ mol<sup>−1</sup> nm<sup>−1</sup>. Afterwards, the system was equilibrated in an NVT ensemble at 298.15 K using the Nosé-Hoover algorithm (Nosé, 1984) for temperature coupling. Production runs of 250 ns were carried out with temperature and pressure coupling handled by the V-rescale (Bussi et al., 2007) and Parrinello-Rahman (Parrinello and Rahman, 1981) algorithms, using τ<sub>T</sub> = 0.1 ps and τ<sub>P</sub> = 2.0 ps. The GROMACS tools hbond, rdf, and sorient were used to calculate H-bond-related properties and the solvation structure around the heteroatom, using a block-averaging approach over 5 blocks of 50 ns.


## 3. RESULTS

### 3.1. New Torsional Profiles

To accurately describe the torsional angles of the selected aromatic rings, a total of 15 new dihedral potentials were derived by fitting the MM profiles to the corresponding QM-calculated ones (Table S1). Fittings were conducted using the Rotational Profiler server (Rusu et al., 2014). In all cases, the new parameters yielded minimum and barrier amplitudes almost identical to those calculated by QM (**Figure 1**). The dihedral distribution throughout the simulations was also evaluated.

### 3.2. Physical-Chemical Properties

To validate our topology-building strategy, boxes of organic liquids were simulated to obtain physical-chemical properties for each compound. Reference experimental values (Table S2) were used to calculate the absolute error of each property and to guide adjustments of the Coulombic terms in order to mitigate deviations. We calculated the angle θ between the QM and MM dipole moments, and the final version of our calibration set (**Table 1**) yielded an average θ of 2.5° ± 6.1°, suggesting that our MM models conserve the direction of the QM dipole moment, preserving the electrostatic potential of each molecule.
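
The angle θ between two dipole moment vectors follows from their normalized dot product. The short function below is an illustrative sketch (the function name and the tuple representation of the vectors are assumptions, not taken from the authors' in-house scripts).

```python
import math

def dipole_angle(mu_qm, mu_mm):
    """Angle in degrees between two dipole vectors given as (x, y, z)."""
    dot = sum(a * b for a, b in zip(mu_qm, mu_mm))
    norm = math.sqrt(sum(a * a for a in mu_qm)) * math.sqrt(sum(b * b for b in mu_mm))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_theta))
```

Because only the direction enters the angle, a small θ says nothing about whether the magnitudes of the two dipoles agree; these would need to be compared separately.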

Following the GROMOS philosophy (Oostenbrink et al., 2004; Horta et al., 2016), density (ρ), enthalpy of vaporization (ΔH<sub>vap</sub>), and free energy of solvation (ΔG<sub>hyd</sub>) were used as targets for the parametrization, while the thermal expansion coefficient (α<sub>P</sub>), isothermal compressibility (κ<sub>T</sub>), dielectric constant (ε), and classic isobaric heat capacity (C<sub>P</sub><sup>cla</sup>) were calculated as benchmarks for GROMOS performance and compared with the results obtained by Caleman et al. (2012) and Horta et al. (2016) (**Table 3**). Linear regressions between experimental and simulated values were calculated to assess the predictive power of the employed strategy (**Figure 2**). The equations reported below were calculated excluding outliers (values higher than 2 standard deviations).

Regarding the targeted properties, our calibration set yielded the equations y = 0.9118x + 0.1001 for density, y = 1.0699x − 1.6491 for enthalpy of vaporization, and y = 0.8676x + 0.8929 for free energy of solvation, with correlation coefficients of R = 0.92, R = 0.96, and R = 0.89, respectively. In terms of average deviation (AVED), our calibration set overestimates ρ by 0.008 g/cm<sup>3</sup> and ΔH<sub>vap</sub> by 1.51 kJ/mol, and underestimates ΔG<sub>hyd</sub> by 3.35 kJ/mol. Without the outliers, the AVED for ΔG<sub>hyd</sub> improves to 2.83 kJ/mol.

TABLE 3 | Average deviation between experimental and simulated physicochemical properties of the aromatic rings evaluated in our calibration set. Simulated GAFF and OPLS-AA values were obtained from Caleman et al. (2012) and 2016H66 values from Horta et al. (2016). Density (ρ) in g/cm<sup>3</sup>, enthalpy of vaporization (ΔH<sub>vap</sub>) in kJ/mol, thermal expansion coefficient (α<sub>P</sub>) in 10<sup>−3</sup>/K, isothermal compressibility (κ<sub>T</sub>) in 1/GPa, dielectric constant (ε), classic isobaric heat capacity (C<sub>P</sub><sup>cla</sup>) in J/(mol·K), and free energy of solvation (ΔG<sub>hyd</sub>) in kJ/mol.

Non-targeted properties were calculated to evaluate how they behaved in our simulations. Linear regressions yielded the equations y = 0.93825x + 0.1406 for α<sub>P</sub> (R = 0.82), y = 0.90079x − 0.0140 for κ<sub>T</sub> (R = 0.70), y = 0.2581x + 1.8961 for ε (R = 0.65), and y = 0.8989x + 100.5 for C<sub>P</sub><sup>cla</sup> (R = 0.77). In terms of AVED, α<sub>P</sub> is overestimated by 0.14 × 10<sup>−3</sup>/K and κ<sub>T</sub> by 0.0465 GPa<sup>−1</sup>. As expected (Caleman et al., 2012; Horta et al., 2016), ε is poorly described owing to the lack of polarization effects, resulting in an AVED of −4.52 for the dielectric constant. On the other hand, C<sub>P</sub><sup>cla</sup> was overestimated by 88.2 J/(mol·K), a behavior in line with recent works in the literature (Caleman et al., 2012; Horta et al., 2016). Individual AVED and absolute errors can be found in Tables S4, S5 in the Supplementary Material, along with experimental properties in Table S3.
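
The regression and AVED statistics reported in this section can be reproduced with a few lines of code. The sketch below is illustrative only and rests on two assumptions that the text does not spell out: AVED is taken as the mean signed deviation (simulated minus experimental), and an outlier is a pair whose deviation lies more than 2 standard deviations from the mean deviation. All function names are hypothetical.

```python
from statistics import mean, stdev

def average_deviation(experimental, simulated):
    """AVED, assumed here to be the mean signed deviation (simulated - experimental)."""
    return mean(s - e for e, s in zip(experimental, simulated))

def drop_outliers(experimental, simulated, n_sd=2.0):
    """Keep pairs whose deviation is within n_sd standard deviations of the mean deviation."""
    devs = [s - e for e, s in zip(experimental, simulated)]
    centre, spread = mean(devs), stdev(devs)
    return [(e, s) for e, s, d in zip(experimental, simulated, devs)
            if abs(d - centre) <= n_sd * spread]

def linear_fit(pairs):
    """Least-squares slope, intercept, and Pearson R for simulated (y) vs. experimental (x)."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / (sxx * syy) ** 0.5
```

A slope close to 1 together with a non-zero AVED indicates a systematic offset rather than a scaling error, which is how the density and heat capacity results above can be read.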

### 3.3. Interactions in Water

To quantitatively evaluate the behavior of heteroaromatic rings in water and their interactions with the aqueous surroundings, several properties were calculated over the 250 ns of simulation. From these calculations, we were able to assess the average number of H-bonds (AverHB) of each heteroatom, along with its residence time (τ<sub>HB</sub>), lifetime (lifetimeHB), the free energy of breakage of an H-bond (ΔG<sub>HB</sub>), and the percentage of simulation time during which a given heteroatom was involved in at least one H-bond (Percent). We were also able to obtain the optimal binding distance between a heteroatom and water (OBDHB), along with the coordination number (CNHB) at the OBDHB and the average orientation of the water molecules surrounding the heteroatom. These data are compiled in **Tables 4**, **5**.
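
A coordination number at a given distance is conventionally obtained by integrating the radial distribution function g(r), as CN = 4πρ∫g(r)r²dr up to the chosen cutoff. The sketch below shows this generic calculation with the trapezoid rule; it is not the GROMACS implementation, the function name is hypothetical, and the cutoff `r_cut` (for instance the OBDHB) and the solvent number density must be supplied by the user.

```python
import math

def coordination_number(r, g, number_density, r_cut):
    """Integrate 4*pi*rho*g(r)*r^2 from the first grid point up to r_cut (trapezoid rule)."""
    total = 0.0
    for i in range(len(r) - 1):
        if r[i + 1] > r_cut:
            break  # stop at the last grid point that does not exceed the cutoff
        f_lo = g[i] * r[i] ** 2
        f_hi = g[i + 1] * r[i + 1] ** 2
        total += 0.5 * (r[i + 1] - r[i]) * (f_lo + f_hi)
    return 4.0 * math.pi * number_density * total
```

With r in nm, the number density should be given in nm<sup>−3</sup>, so that the result is dimensionless.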

## 4. DISCUSSION

### 4.1. Topology Building Strategy

The accurate description of the chemical diversity of organic compounds, mainly in the context of drugs and medicinal chemistry, is a challenging task in molecular mechanics, since this diversity must be covered as broadly as possible by the force field fragments. However, the most common sets of MM parameters employed in simulations of biomolecules are usually centered on the monomeric constituents of biopolymers and lipids, while parameters for synthetic compounds, as well as for other common non-polymeric biological molecules (e.g., natural products), must be included from specific calculations or external sets of parameters.

In this sense, a proper description of the torsional terms directly impacts the dynamical behavior of these small molecules, even considering that, when evaluating ligand-receptor complexes, the influence of these terms might be mitigated by the restricted movement of the ligand inside the binding pocket. Still, the accommodation of poses derived from flexible docking, the fine tuning of induced fit, and the characterization of ligand conformational induction vs. selection (with potential inferences about the entropic costs of binding) require dihedral potentials specifically adjusted to organic compounds. Hence, new parameters were generated in this work exclusively for 15 dihedrals in the aromatic rings of our calibration set (**Figure 1**). In general, our MM parameters yielded a good description of the QM torsional profile, with the exceptions of [16]thiophenol, [42]phenoxybenzene, [24]phenylmethanol, and [18]trifluoromethylbenzene. For these molecules, the distribution profile was almost evenly spread, most likely due to the low energy barrier (below 2.5 kJ/mol), indicating that transient states are commonly reached during our simulations in the SPC water model. Simulations of these particular molecules in vacuum revealed little influence of water solvation on the dihedral profile (data not shown).

In another sense, the choice of an atomic charge set for ligands can drastically impact thermodynamic binding properties such as the complexation free energy and desolvation. Therefore, we employed in this work a dipole-moment-based strategy to describe the Coulombic contribution, using physicochemical properties of organic liquids as targets. The predictive power of our strategy was compared with recent benchmarks of aromatic compounds in the liquid phase (Caleman et al., 2012; Horta et al., 2016), as summarized in **Table 3**. In general, our calibration set yielded average deviations similar to or lower than those of benchmarks made with the OPLS-AA, GAFF, and 2016H66 sets for all physicochemical properties evaluated in this work. The main difference was in C<sub>P</sub><sup>cla</sup>, which GAFF and OPLS-AA overestimate by nearly 40 J/(mol·K) more than our parameters. Still, all four parameter sets overestimate C<sub>P</sub><sup>cla</sup>. In addition, the GROMOS53A5 force field was designed to reproduce physicochemical properties, and was later adjusted to reproduce free energies of solvation and hydration (GROMOS53A6) (Oostenbrink et al., 2004). The average deviations in density, enthalpy of vaporization, and free energy of solvation for GROMOS53A5 were 0.0389 g/cm<sup>3</sup>, −0.4 kJ/mol, and 3.8 kJ/mol, respectively. These values are very similar to our results, as shown in **Table 3**, reiterating the quality of our parameters.

It is important to mention that the employed benchmark set was built using the same Lennard-Jones parameters used for the benzene ring of phenylalanine in GROMOS53A6. While GROMOS53A6 produces ΔG<sub>hyd</sub> = 0.0 kJ/mol for benzene (phenylalanine side chain), our benzene parameters yield ΔG<sub>hyd</sub> = −3.4 kJ/mol, much closer to the experimental value (ΔG<sub>hyd</sub> = −3.6 kJ/mol). Nevertheless, the AVED value reveals an underestimation of the free energy of hydration in our parameter set. A possible reason is that chemical functions such as nitro, fluorine, chlorine, and aldehydic carbonyls are not commonly found in biomolecules and, therefore, the LJ parameters used in GROMOS53A6 may not extrapolate properly to synthetic compounds. Moreover, we tested the ether oxygen LJ parameters reported by Horta et al. (2011) in our pure-liquid simulations of [2]furan and [23]methoxybenzene, which led to approximately the same behavior in their respective physical-chemical properties (data not shown).

TABLE 4 | Properties of heteroaromatic rings in water. Average H-bonds (AverHB), H-bond residence time (τ<sub>HB</sub>) in ps, H-bond lifetime (lifetimeHB) in 1/ps, free energy of H-bond breakage (ΔG<sub>HB</sub>) in kJ/mol, percentage of simulation with at least one formed H-bond (Percent.), coordination number of water (CN), optimal binding distance with water (OBDHB) in nm, and overall water orientation around the heteroatom (Orientation).

Colors represent different functional groups: red for oxygen, blue for nitrogen, orange for sulfur, and green for halogen-containing groups.

### 4.2. Properties in Solution: Influence of Nearby Substitutions in H-Bonds

To obtain quantitative information on how aromatic rings interact with their surroundings, we performed molecular dynamics simulations of the 103 aromatic rings most commonly used in drug design, including the 42 molecules of our calibration set. This information is condensed in **Tables 4**, **5**. Simulations were carried out for 250 ns to properly sample multiple H-bond breakage events and solvation shell rearrangements.

TABLE 5 | Properties of heteroaromatic rings in water. Average H-bonds (AverHB), H-bond residence time (τ<sub>HB</sub>) in ps, H-bond lifetime (lifetimeHB) in 1/ps, free energy of H-bond breakage (ΔG<sub>HB</sub>) in kJ/mol, percentage of simulation with at least one formed H-bond (Percent.), coordination number of water (CN), optimal binding distance with water (OBDHB) in nm, and overall water orientation around the heteroatom (Orientation).

Colors represent different functional groups: red for oxygen, blue for nitrogen, orange for sulfur, and green for halogen-containing groups.

Our results reveal non-obvious information about H-bond availability and strength, as in the case of the [5]pyridine/[6]pyrimidine/[56]pyrazine/[70]pyridazine/[71]triazine series (**Figure 3**). While exchanging a pyridine for a pyrimidine ring might appear to add an H-bond acceptor, the nitrogens of pyrimidine present a ΔG<sub>HB</sub> nearly 1 kJ/mol lower than that of pyridine. Moreover, the percentage of time (Percent) with at least one H-bond formed between water and the pyridine nitrogen is higher than for the pyrimidine nitrogens. When comparing pyridine with pyrazine (the addition of another N in para), the H-bonds are very similar, as are the second and third solvation layers. Also, the acceptance capacity of the pyrimidine ring is very similar to that of triazine, where all three nitrogens are located in meta. Intriguingly, the values for pyridine are very similar to those calculated for pyridazine, with a slight increase in OBDHB and a more compact second solvation layer, as shown in **Figure 3A**. These results suggest that another nitrogen acceptor in meta decreases the nitrogen acceptance capacity, while another nitrogen acceptor in ortho has little effect on H-bond capacity but a considerable effect on the structure of the solvation layers. These features can in turn impact binding inside receptors. Pyridazine, for example, has a larger OBDHB than pyridine, suggesting that these molecules can occupy the binding pocket in different manners, impacting the entropic cost of binding.

Other cases were equally surprising, such as the [39]quinoline/[40]isoquinoline pair. The main difference between them is the location of the acceptor nitrogen (closer to C<sub>8</sub> in the quinoline fused ring). Counterintuitively, the AverHB of isoquinoline is slightly lower than that of quinoline, as is τ<sub>HB</sub>, and its ΔG<sub>HB</sub> is almost 1.25 kJ/mol lower. The same properties for the pyridine ring lie between the values for quinoline and isoquinoline. In addition, the ΔG<sub>HB</sub> values for the [51]quinazoline and [72]quinoxaline rings are almost 3 kJ/mol lower than those for quinoline and isoquinoline. In this sense, quinazoline and quinoxaline would be better candidates in fragment-based drug design, owing to the lower energetic cost of desolvation while maintaining the H-bond capacity inside the receptor. Another case involving an aromatic nitrogen as hydrogen bond acceptor is [37]2,4,6-trimethylpyridine (**Figure 3B**). The presence of methyl groups in both ortho positions drastically reduces the availability of H-bonds, as shown in **Figure 3**, and diminishes the residence time of the accepted H-bond. In contrast, the presence of only one methyl group in ortho appears to have a modest effect, slightly favoring the presence of an H-bond on the nitrogen of [19]2-methylpyridine. Moreover, the second and third solvation layers of 2-methylpyridine and 2,4,6-trimethylpyridine are dismantled, while the same behavior is not observed for [20]3- and [21]4-methylpyridine.

Other non-obvious events can be observed regarding H-bond donation by hydroxyl groups. In the case of [12]phenol, the energy required to break a donated H-bond (∼10 kJ/mol) is almost double that required to break an accepted one (∼5.70 kJ/mol), in agreement with the QM data reported by Parthasarath et al. (2005) at the HF, MP2, and DFT levels. And while phenol and [24]phenylmethanol might appear interchangeable during the lead optimization process, the ΔG<sub>HB</sub> of accepted and donated H-bonds in the hydroxyl group is almost 1 kJ/mol higher for phenylmethanol. When targeting the thermodynamics of binding during drug design, these energy costs of desolvation can play a crucial role. As expected, benzenethiol proved to be a poor acceptor of hydrogen bonds in our simulations, but a reasonable H-bond donor. In terms of vicinity effects, methylation in ortho seems to have little effect on hydroxyl groups, since the properties evaluated for the series [12]phenol/[25]2-methylphenol/[26]3-methylphenol/[27]4-methylphenol are very similar.

It is well known that halogens are widely used in drug design, and the roles of halogen bonds (X-bonds) and H-bonds have been investigated thoroughly (Rendine et al., 2011; Ford and Ho, 2016; Lin and Mackerell, 2017). In general, the H-bonding strength decreases with the halogen radius (F > Cl > Br > I), while the halogen bond strength increases (Rendine et al., 2011). In this work, we investigated how fluorine and chlorine behave as H-bond acceptors in water. In the case of [7]fluorobenzene, ΔG<sub>HB</sub> = 1.54 ± 0.24 kJ/mol is consistent with a weak H-bond (Domagała et al., 2017). The other fluorinated rings in the series (1,2-, 1,3-, 1,2,3,4-, and 1,2,3,5-tetrafluorobenzene [8–11]) have similar values, varying from 1.5 to 2.2 kJ/mol. For the chlorinated ring series (chlorobenzene, 1,2-, 1,3-, 1,2,3,4-, and 1,2,3,5-tetrachlorobenzene [94–98]), ΔG<sub>HB</sub> ranged from 1.80 to 3.24 kJ/mol, contradicting the expected behavior. X-bonds are often poorly described in MM, since atoms are treated as spheres with an isoelectric surface, which does not capture the positive potential required for such an interaction. In fact, we verified visually that the waters surrounding fluorine and chlorine have their hydrogens oriented toward the halogens, confirming that we are measuring H-bonds and not X-bonds.

Regarding oxygen atoms within the aromatic ring, AverHB values are generally lower than expected. It is well known that oxygens in heterocycles act as H-bond acceptors (Kaur and Khanna, 2011), but our model does not reproduce this tendency. It is important to note that GROMOS53A6 does not have specific parameters for oxygens within aromatic rings, so LJ parameters from ethers were employed. Not surprisingly, the calculated properties for the oxygen atom in furan and benzofuran are very similar to those in methoxybenzene and phenoxybenzene. This result suggests that the description of the properties of oxygen-containing aromatic rings in aqueous solution might be improved by specific LJ parameters. Moreover, we tested the ether LJ parameters reported by Horta et al. (2011) in our simulations of furan and methoxybenzene in water, which yielded lower AverHB and ΔG<sub>HB</sub> (data not shown). The new force field parameters developed in this work can be obtained upon request.

### 4.3. Impacts in Drug Design

Recently, several authors have questioned the LE approach as an optimization tool and its actual power to lead to high-affinity compounds (Abad-Zapatero, 2007; Morgan et al., 2011; Cavalluzzi et al., 2017). Another recent review (DeGoey et al., 2017) pointed out the emergence of approved drugs that violate Lipinski's rule of 5 and correlated them with properties such as the number of aromatic rings and rotatable bonds. Freire (2009) proposed an experimental thermodynamic approach to guide the drug design process, and these results suggest that tweaking the enthalpy and entropy of ligand binding is not only experimentally possible, but also predictable. Therefore, the GROMOS series of force fields presents an extra advantage here, owing to its calibration to reproduce free energies of solvation and other thermodynamic properties.

In this sense, we have parameterized and validated a calibration set of 42 aromatic rings commonly used in drug design, using thermodynamic properties in the condensed phase. We then studied a larger dataset of 103 heteroaromatic rings in order to understand how these molecules interact with water and to prospect and map potential interactions with target receptors. The water molecules probe the occurrence of hydrogen bonds, while the absence of these interactions, as well as the distance to the first solvation sphere, may indicate sites for hydrophobic interactions. With this information at hand, medicinal chemists and pharmacologists may make quantitative estimates of how each functional group may or may not interact with its target protein, as well as identify the potential influence of nearby chemical modifications. These properties (and a handful of others) are compiled in **Tables 4**, **5**, and can be used as a reference during the lead optimization process.

The strategy employed here could be used to broaden the spectrum of drug fragments with an accurate description of chemical events simulated by molecular dynamics. In addition, it can improve the description of the drug-receptor complexation dynamics of other molecules of interest, the molecular recognition of drugs, and signal transduction mediated by conformational changes of ligands. In fact, by assessing the strength and availability of interactions between aromatic rings and the water solvent, the results presented here not only offer detailed quantitative information about the potential interactions that each individual aromatic ring can make with its surroundings, but also shed light on the energetics of biological events such as the dismantling of solvation shells, an important step in the ligand binding process.

## 5. CONCLUSIONS

In this work, we have successfully produced topologies for a calibration set of 42 aromatic rings, using the physicochemical properties of the respective organic liquids as targets. Our strategy showed very competitive predictive power when compared with other force fields, while offering a simple approach to describing aromatic rings in molecular dynamics simulations that can be easily extrapolated to other rings. In addition, H-bond availability and solvent accessibility are difficult and non-obvious to predict from two-dimensional data, yet essential for medicinal chemistry purposes. Here, we have simulated more than 100 aromatic rings commonly used in drug design in aqueous solvent in order to assess dynamical chemical properties, such as average H-bonds and their lifetime, residence time, and free energy of breakage. Thus, we have described a low-cost approach based on molecular dynamics simulations to access valuable information that could be useful both for predicting the enthalpic cost of desolvation and for the interpretation of pharmacological data by a medicinal chemist or pharmacologist. Our results provide a large database of quantitative information for a total of 103 aromatic rings most commonly used in drug design, which can guide medicinal chemists in future drug design efforts.

#### AUTHOR CONTRIBUTIONS

MP carried out quantum calculations, molecular dynamics simulations, and data analyses, and drafted the manuscript. VR contributed to the simulation protocols and manuscript draft. BG wrote in-house scripts for dipole-based charge assignment and data analyses. MD contributed to the manuscript draft. RL contributed to the simulation protocols and manuscript draft. HV contributed to data analyses and the manuscript draft.

#### FUNDING

The authors thank the funding agencies Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and Fundação de Amparo à Pesquisa do Rio Grande do Sul (FAPERGS). This work was partially supported by grants from FAPERGS/PRONUPEQ (16/2551-0000520-6).

#### ACKNOWLEDGMENTS

Research developed with support of the Centro Nacional de Supercomputação (CESUP), from Universidade Federal do Rio Grande do Sul (UFRGS). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.00395/full#supplementary-material

#### REFERENCES

aldehydes, ketones, carboxylic acids, and esters. J. Chem. Theor. Comput. 7, 1016–1031. doi: 10.1021/ct1006407


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer GT and handling Editor declared their shared affiliation.

Copyright © 2018 Polêto, Rusu, Grisci, Dorn, Lins and Verli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Bridging Molecular Docking to Molecular Dynamics in Exploring Ligand-Protein Recognition Process: An Overview

Veronica Salmaso and Stefano Moro\*

Molecular Modeling Section, Department of Pharmaceutical and Pharmacological Sciences, University of Padova, Padova, Italy

Computational techniques have been applied in the drug discovery pipeline since the 1980s. Given the low computational resources of the time, the first molecular modeling strategies relied on a rigid view of the ligand-target binding process. Over the years, the evolution of hardware technologies has gradually allowed the dynamic nature of the binding event to be simulated. In this work, we present an overview of the evolution of structure-based drug discovery techniques in the study of the ligand-target recognition phenomenon, going from static molecular docking toward enhanced molecular dynamics strategies.

#### Edited by:

Adriano D. Andricopulo, Universidade de São Paulo, Brazil

#### Reviewed by:

José Pedro Cerón-Carrasco, Universidad Católica San Antonio de Murcia, Spain Andrea Mozzarelli, Università degli Studi di Parma, Italy

> \*Correspondence: Stefano Moro stefano.moro@unipd.it

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 04 May 2018 Accepted: 26 July 2018 Published: 22 August 2018

#### Citation:

Salmaso V and Moro S (2018) Bridging Molecular Docking to Molecular Dynamics in Exploring Ligand-Protein Recognition Process: An Overview. Front. Pharmacol. 9:923. doi: 10.3389/fphar.2018.00923

Keywords: ligand-protein binding, molecular docking, molecular dynamics, enhanced sampling, protein flexibility, molecular recognition

## INTRODUCTION

No protein is an island but exerts its function through the recognition of other molecular partners (Salmaso, 2018). Ligand-protein interactions are involved in many biological processes with consequent pharmaceutical implications. Thus, the scientific community has put great effort into the investigation of the binding phenomenon over the years, leading to the proposal of several theories characterized by an increasing emphasis on the degree of flexibility of the ligand and protein counterparts.

The first explanation of binding was provided by Emil Fischer in 1894 (Fischer, 1894) with the "lock-key" model to interpret enzyme specificity: the ligand rigidly recognizes and occupies the protein binding site like a key in its lock, because of their native shape complementarity. Since this model could explain neither noncompetitive enzyme inhibition nor allosteric modulation, different modifications have been proposed. Koshland (1958) introduced the "induced-fit" theory: according to his observations on enzyme-substrate interactions, the ligand is able to induce conformational changes in the protein, optimizing ligand-target interactions. Later works suggested that proteins naturally exist as an ensemble of conformations (Monod et al., 1965), described by an energy landscape (Frauenfelder et al., 1991), and ligands preferentially bind to one of them (Austin et al., 1975; Foote and Milstein, 1994). According to this interpretation of binding, known as "conformational selection," the ligand stabilizes one of the protein conformations with a consequent shift of the protein population equilibrium (Kumar et al., 2000). These two apparently contrasting theories simply have different ranges of applicability, and the descriptions they provide of molecular binding differ in the chronological sequence of events into which the binding process is decomposed (Kobilka and Deupi, 2007; Okazaki and Takada, 2008; Zhou, 2010). New theories are emerging that strike a compromise between the aforementioned ones: according to the extended conformational selection model, for example, the conformational selection is followed by a conformational adjustment (induced fit) (Csermely et al., 2010).

The evolution of binding models has practical relevance besides its epistemological significance: the knowledge of ligand-target binding is at the basis of rational drug design, and understanding this complex process at a mechanistic level may open new scenarios. In addition to suggesting ligand modifications meant to optimize the final bound state, the medicinal chemist may look at kinetically relevant intermediate states and try to affect them.

### COMPUTATIONAL METHODS TO STUDY LIGAND-PROTEIN BINDING

Since the 1980s, computer technologies have been applied to the drug discovery process (Van Drie, 2007), giving rise to Computer-Aided Drug Design (CADD). This technique soon attracted great interest and earned a cover article in the October 5, 1981 issue of Fortune magazine, entitled "Next Industrial Revolution: Designing Drugs by Computer at Merck" (Van Drie, 2007). CADD techniques are used principally for three purposes: virtual screening, hit/lead optimization, and design of novel compounds. In virtual screening, a huge database of compounds is examined in search of binding capacity for a target, and a subset of compounds is picked out and suggested for in vitro testing; the purpose is to increase the hit rate of novel drugs by reducing the number of compounds to test experimentally. The second application of CADD is the optimization of a hit/lead compound, driven by the rationalization of a structure-activity relationship. After the identification of key elements for binding, the design of new compounds can be attempted (Salmaso, 2018).

CADD methods may be classified as ligand-based (LB) and structure-based (SB), depending on the availability and employment of the target structure (Sliwoski et al., 2014). In the framework of CADD, structure-based drug design (SBDD) methods take advantage of the abundance of experimentally solved structures in the Protein Data Bank (Berman et al., 2000), which can possibly be used also as templates for homology models if the structure of interest is lacking. SBDD is based on the premise that the knowledge of the target structure can help to rationalize and optimize binding since ligand-target interactions are mediated by their complementarity. With the evolution of the binding models, it is clear that speaking of "target structure" is an approximation, given that proteins fluctuate among an ensemble of structures (Miller and Dill, 1997).

The ability to predict ligand binding modes and to interpret binding processes is valuable to identify, optimize and suggest novel ligands; for this reason, the scientific community has been putting great effort into developing new computational techniques.

In the following paragraphs, we present an overview of the main structure-based computational techniques employed in drug discovery. A growing need to simulate protein flexibility throughout binding has emerged over the years, arising from the evolution of the binding models from static to dynamic. The inclusion of flexibility in conformational sampling entails an increase in the number of degrees of freedom of the system, and consequently in the computational effort. For this reason, the development of computational tools has proceeded in parallel with, and thanks to, the continuous improvement of hardware technologies.

#### Molecular Docking

Molecular docking techniques aim to predict the best matching binding mode of a ligand to a macromolecular partner (here only proteins are considered). Docking consists of the generation of a number of possible conformations/orientations, i.e., poses, of the ligand within the protein binding site. For this reason, the availability of the three-dimensional structure of the molecular target is a necessary condition; it can be an experimentally solved structure (such as by X-ray crystallography or NMR) or a structure obtained by computational techniques (such as homology modeling) (Salmaso, 2018).

Molecular docking consists mainly of two components: an engine for sampling conformations/orientations and a scoring function, which assigns a score to each predicted pose (Abagyan and Totrov, 2001; Kitchen et al., 2004; Huang and Zou, 2010). The sampling process should effectively search the conformational space described by the free energy landscape, where energy, in docking, is approximated by the scoring function. The scoring function should be able to associate the native bound conformation with the global minimum of the energy hypersurface.

#### Scoring Functions

Scoring functions act as pose selectors: they are used to discriminate putative correct binding modes from incorrect ones, and binders from non-binders, in the pool of poses generated by the sampling engine.

There are essentially three types of scoring functions:

1. Force-field based scoring functions:

Force-field is a concept typical of molecular mechanics (see **Box 1**) which approximates the potential energy of a system with a combination of bonded (intramolecular) and nonbonded (intermolecular) components. In molecular docking, the nonbonded components are generally taken into account, with possibly the addition of the ligand-bonded terms, especially the torsional components. Intermolecular components include the van der Waals term, described by the Lennard-Jones potential, and the electrostatic potential, described by the Coulomb function, where a distance-dependent dielectric may be introduced to mimic the solvent effect. However, additional terms have been added to the force-field scoring functions, such as solvation terms (Brooijmans and Kuntz, 2003).

Examples of force field based scoring functions are GoldScore (Verdonk et al., 2003), AutoDock (Morris et al., 1998) (improved as a semiempirical version in AutoDock4, Huey et al., 2007), GBVI/WSA (Corbeil et al., 2012).
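The nonbonded terms described above can be sketched in a few lines. This is a minimal illustration, not the scoring function of any of the cited programs: the distance-dependent dielectric ε(r) = 4r, the single pair of Lennard-Jones parameters applied to all atom pairs, and the function names are assumptions made for the example.

```python
import math

# Conversion factor for charges in units of e and distances in Angstrom,
# giving energies in kcal/mol.
COULOMB_K = 332.06


def lj_energy(r, eps, r_min):
    """Lennard-Jones 12-6 term in the R_min form: minimum of -eps at r = r_min."""
    ratio = r_min / r
    return eps * (ratio ** 12 - 2.0 * ratio ** 6)


def coulomb_energy(r, q_i, q_j, dielectric_scale=4.0):
    """Coulomb term with a distance-dependent dielectric eps(r) = dielectric_scale * r
    (an assumed, commonly used form that crudely mimics solvent screening)."""
    return COULOMB_K * q_i * q_j / (dielectric_scale * r * r)


def nonbonded_score(ligand_atoms, protein_atoms, eps=0.1, r_min=3.8):
    """Sum LJ and Coulomb terms over all ligand-protein atom pairs.

    Atoms are (x, y, z, charge) tuples. eps and r_min are illustrative
    constants used for every pair; a real force field assigns them per atom type.
    """
    total = 0.0
    for (x1, y1, z1, q1) in ligand_atoms:
        for (x2, y2, z2, q2) in protein_atoms:
            r = math.dist((x1, y1, z1), (x2, y2, z2))
            total += lj_energy(r, eps, r_min) + coulomb_energy(r, q1, q2)
    return total
```

A lower (more negative) value indicates a more favorable pose under these assumptions; solvation or ligand torsional terms would be added as further summands.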

#### 2. Empirical scoring functions:

These functions are the sum of various empirical energy terms such as van der Waals, electrostatic, hydrogen bond, desolvation, entropy, hydrophobicity, etc., which are weighted by coefficients optimized to reproduce binding affinity data of a training set by least squares fitting (Huang and Zou, 2010).

The LUDI (Böhm, 1994) scoring function was the first example of an empirical one. Other empirical scoring functions are GlideScore (Halgren et al., 2004; Friesner et al., 2006), ChemScore (Eldridge et al., 1997), and PLANTS<sub>CHEMPLP</sub> (Korb et al., 2009).
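The least-squares calibration of an empirical scoring function can be illustrated as follows. This is a sketch under simplifying assumptions: each training complex is described by a short list of precomputed term values (for example, a hydrogen bond count and a lipophilic contact area), the model is linear without an intercept, and the weights are obtained from the normal equations. The function names and the toy data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_weights(terms, affinities):
    """Weights w minimizing sum_k (affinity_k - sum_j w_j * terms[k][j])^2,
    obtained by solving the normal equations (X^T X) w = X^T y."""
    n = len(terms[0])
    xtx = [[sum(row[i] * row[j] for row in terms) for j in range(n)] for i in range(n)]
    xty = [sum(row[i] * y for row, y in zip(terms, affinities)) for i in range(n)]
    return solve_linear(xtx, xty)


def empirical_score(weights, pose_terms):
    """Score of a pose as the weighted sum of its empirical terms."""
    return sum(w * t for w, t in zip(weights, pose_terms))


# Toy training set: two terms per complex and made-up affinities (kcal/mol).
training_terms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
training_affinities = [-0.5, -1.2, -1.7, -2.2]
weights = fit_weights(training_terms, training_affinities)
```

In practice the training set contains many more complexes than terms, and the fitted weights are validated on complexes that were not used for the fit.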

#### 3. Knowledge-based scoring functions:

Box 1 | Molecular mechanics.

Molecular mechanics is a method which approximates the treatment of molecules with the laws of classical mechanics, in order to avoid the computational cost of quantum mechanical calculations (Vanommeslaeghe et al., 2014). Atoms are considered as charged spheres connected by springs, neglecting the explicit presence of electrons, in accordance with the Born-Oppenheimer approximation (Born and Oppenheimer, 1927). The potential energy is approximated by a simple function called a force field; it is the sum of bonded (intramolecular) and nonbonded energy terms. In its basic form, the bonded portion comprises bond stretching and angle bending, described by harmonic potentials, and a torsional potential, described by a trigonometric function. Nonbonded terms consist of van der Waals and Coulomb electrostatic interactions between pairs of atoms.

As an example, these basic components of the CHARMM [78] force field are reported in the following equations

$$V = V_{\text{bonded}} + V_{\text{nonbonded}}$$

$$\begin{split} V_{\text{bonded}} &= \sum_{\text{bonds}} K_b \left(b - b_0\right)^2 + \sum_{\text{angles}} K_\theta \left(\theta - \theta_0\right)^2 \\ &+ \sum_{\text{dihedrals}} K_\chi \left(1 + \cos\left(n\chi - \delta\right)\right) \\ V_{\text{nonbonded}} &= \sum_{\substack{\text{nonbonded} \\ \text{pairs } ij}} \frac{q_i q_j}{\varepsilon r_{ij}} \\ &+ \sum_{\substack{\text{nonbonded} \\ \text{pairs } ij}} \varepsilon_{ij} \left[ \left( \frac{R_{\min, ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{R_{\min, ij}}{r_{ij}} \right)^{6} \right] \end{split}$$

where K<sub>b</sub>, K<sub>θ</sub>, and K<sub>χ</sub> are the bond, angle and torsional force constants; b, θ and χ are bond length, bond angle and dihedral angle (those with the 0 subscript are the equilibrium values); n is the multiplicity and δ the phase of the torsional periodic function; r<sub>ij</sub> is the distance between atoms i and j; q<sub>i</sub> and q<sub>j</sub> are the partial charges of atoms i and j; ε is the effective dielectric constant; ε<sub>ij</sub> is the Lennard-Jones well depth and R<sub>min,ij</sub> is the distance between atoms at the Lennard-Jones minimum.

These terms may appear slightly different in different force-fields, and anharmonicity and cross-terms are generally added.

The parameters of the force field are obtained by fitting quantum mechanical or experimental values.

These methods assume that ligand-protein contacts that are observed more frequently are correlated with favorable interactions. Starting from a database of structures, the frequencies of ligand-protein atom pair contacts are computed and converted into an energy component. When evaluating a pose, the aforementioned tabulated energy components are summed up for all ligand-protein atom pairs, giving the score of the pose.

DrugScore (Gohlke et al., 2000; Velec et al., 2005) and GOLD/ASP (Mooij and Verdonk, 2005) are examples of knowledge-based scoring functions.
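The conversion of contact frequencies into energy components can be sketched with an inverse Boltzmann relation, E = -kT ln(f_obs / f_ref). This is an assumption made for illustration: the text does not specify which conversion the cited programs use, and the contact keys, the reference counts and the value kT = 0.593 kcal/mol (room temperature) are hypothetical choices.

```python
import math


def pair_potential(observed_counts, reference_counts, kT=0.593):
    """Turn contact counts into tabulated energies via E = -kT * ln(f_obs / f_ref).

    Keys identify a contact type, e.g. (ligand atom type, protein atom type,
    distance bin). All counts are assumed to be nonzero.
    """
    obs_total = sum(observed_counts.values())
    ref_total = sum(reference_counts.values())
    potential = {}
    for key, n_obs in observed_counts.items():
        f_obs = n_obs / obs_total
        f_ref = reference_counts[key] / ref_total
        potential[key] = -kT * math.log(f_obs / f_ref)
    return potential


def score_pose(contacts, potential):
    """Sum the tabulated energies over all ligand-protein contacts of a pose."""
    return sum(potential[c] for c in contacts)


# Toy statistics: N...O contacts are over-represented, C...O contacts under-represented.
observed = {("N", "O", 3): 30, ("C", "O", 3): 10}
reference = {("N", "O", 3): 20, ("C", "O", 3): 20}
table = pair_potential(observed, reference)
```

Contacts seen more often than in the reference state receive a negative (favorable) energy, and contacts seen less often receive a positive one.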

Another strategy consists in the combination of multiple scoring functions leading to the so-called consensus scoring (Charifson et al., 1999).

In addition, new scoring functions have been developed, based for example on machine learning, on interaction fingerprints, or on quantum mechanical calculations (Yuriev et al., 2015).

#### Sampling

The first molecular docking algorithm was developed in the 1980s by Kuntz et al. (1982); the receptor was approximated by a series of spheres filling its surface clefts, and the ligand by another set of spheres defining its volume. A search was made to find the best steric overlap between ligand and receptor spheres, neglecting any kind of conformational movement.

This method belongs to the group of fully-rigid docking techniques, according to the classification that divides docking methods by the degrees of flexibility of the molecules involved in the calculation (Halperin et al., 2002) (**Figure 1**):

#### 1. Rigid docking:

Both ligand and protein are considered rigid entities, and only the three translational and three rotational degrees of freedom are considered during sampling. This approximation is analogous to the "lock-key" binding model and is mainly used for protein-protein docking, where the number of conformational degrees of freedom is too high to be sampled. Generally, in these methods, the binding site and the ligand are approximated by "hot" points and the superposition of matching points is evaluated (Taylor et al., 2002).

#### 2. Semi-flexible docking:

Only one of the molecules, the ligand, is flexible, while the protein is rigid. Thus, the conformational degrees of freedom of the ligand are sampled, in addition to the six translational and rotational ones. These methods assume that a fixed conformation of the protein corresponds to the one able to recognize the ligands to be docked. This assumption, as already reported, does not always hold.

#### 3. Flexible docking:

It is based on the concept that a protein is not a passive rigid entity during binding and considers both ligand and protein as flexible counterparts. Different methods have been introduced over the years, some based on the induced-fit binding model and others on conformational selection.

The large number of degrees of freedom introduced by flexible docking makes the potential energy surface a function of numerous coordinates. Consequently, the computational effort required to perform a docking calculation increases, and both sampling and scoring should be optimized to give a good balance between accuracy and speed. In fact, a virtual screening campaign of millions of compounds depends on the speed of docking calculations. For this reason, continuous improvements have been made in the development of new algorithms able to search the phase space thoroughly, but not at the expense of speed.

#### Semi-flexible Docking

Numerous docking algorithms have been developed since the 1980s. Often it is difficult to classify clearly each docking software, because different algorithms may be integrated into a multiphase approach. However, docking algorithms can be classified as follows (Kitchen et al., 2004; Huang and Zou, 2010):

1. Systematic search techniques:

In a systematic search, a set of discretized values is associated with each degree of freedom, and all the values of each coordinate are explored in a combinatorial way (Brooijmans and Kuntz, 2003). These methods are subdivided into:

a. Exhaustive search - it is a systematic search in the strict sense, since all the rotatable bonds of the ligand are examined in a systematic way. A number of constraints and termination criteria are generally established to limit the search space and to avoid a combinatorial explosion. The docking pipeline of the software Glide (Friesner et al., 2004; Halgren et al., 2004) involves an exhaustive search stage.
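The combinatorial nature of a systematic search can be made concrete with a short sketch. It enumerates every combination of discretized torsion values for a hypothetical ligand and keeps the best-scoring one; `score_fn` stands in for a real scoring function, and no constraints or termination criteria are applied.

```python
import itertools


def exhaustive_torsion_search(n_rotatable_bonds, step_degrees, score_fn):
    """Enumerate all combinations of discretized torsion angles.

    Returns ((best_score, best_conformation), number_of_conformations_evaluated).
    The number of evaluations grows as (360 / step_degrees) ** n_rotatable_bonds.
    """
    angles = range(0, 360, step_degrees)
    best = None
    n_evaluated = 0
    for conformation in itertools.product(angles, repeat=n_rotatable_bonds):
        n_evaluated += 1
        score = score_fn(conformation)
        if best is None or score < best[0]:
            best = (score, conformation)
    return best, n_evaluated


def toy_score(conformation):
    """Illustrative score with its minimum when every torsion equals 120 degrees."""
    return sum((angle - 120) ** 2 for angle in conformation)


best, n_evaluated = exhaustive_torsion_search(3, 60, toy_score)
```

With three rotatable bonds and a 60 degree step, 6^3 = 216 conformations are scored; ten rotatable bonds at the same resolution would already require about 6 x 10^7 evaluations, which is why the search space must be pruned.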


#### 2. Stochastic methods:

Stochastic algorithms change the values of the degrees of freedom of the system randomly, instead of systematically. The advantage of these techniques is speed: they can potentially find the optimal solution very quickly. As a drawback, they do not guarantee a full search of the conformational space, so the true solution may be missed. The lack of convergence is partially addressed by increasing the number of iterations of the algorithm. The most famous stochastic algorithms are (Huang and Zou, 2010):

a. Monte Carlo (MC) methods - Monte Carlo methods are based on the Metropolis Monte Carlo algorithm, which introduces an acceptance criterion in the evolution of the docking search. In particular, at every iteration of the algorithm, a random modification of the ligand degrees of freedom is performed. Then, if the energy score of the pose is improved, the change is accepted, otherwise, it is accepted according to the probability expressed in the following equation:

$$P \sim \exp\left[\frac{-(E\_1 - E\_0)}{k\_B T}\right]$$

where E<sub>0</sub> and E<sub>1</sub> are the energy scores before and after the modification, respectively, k<sub>B</sub> is the Boltzmann constant, and T is the temperature of the system.

This is the original form of the Metropolis algorithm, but it is implemented in different variants within docking software. Some examples are provided by the earlier versions of AutoDock (Goodsell and Olson, 1990; Morris et al., 1996), ICM (Abagyan et al., 1994), QXP (McMartin and Bohacek, 1997), MCDOCK (Liu and Wang, 1999), AutoDock Vina (Trott and Olson, 2010), and ROSETTALIGAND (Meiler and Baker, 2006).
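The Metropolis acceptance rule can be sketched as follows. This is a generic illustration rather than the implementation of any of the programs listed: the pose is reduced to a list of numbers, `score_fn` is a placeholder scoring function, and the move size, kT (in score units) and seed are arbitrary assumptions.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng=random):
    """Accept improvements always; accept worse poses with probability
    exp(-(e_new - e_old) / kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def monte_carlo_search(score_fn, start, n_steps, max_move=0.5, kT=0.6, seed=0):
    """Random perturbation of all degrees of freedom followed by a Metropolis test."""
    rng = random.Random(seed)
    current = list(start)
    e_current = score_fn(current)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        trial = [x + rng.uniform(-max_move, max_move) for x in current]
        e_trial = score_fn(trial)
        if metropolis_accept(e_current, e_trial, kT, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


def toy_score(pose):
    """Illustrative score with a single minimum at the origin."""
    return sum(x * x for x in pose)


best_pose, best_score = monte_carlo_search(toy_score, [3.0, -2.0], n_steps=5000)
```

Because worse poses are occasionally accepted, the search can leave a local minimum; a higher kT makes such uphill moves more likely.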

b. Tabu search methods - the aim of these algorithms is to prevent the exploration of already sampled zones of the conformational/positional space. Random modifications are performed on the degrees of freedom of the ligand at each iteration. The already sampled conformations are registered, and when a new pose is obtained, it is accepted only if not similar to any previously explored pose. PRO\_LEADS (Baxter et al., 1998) and PSI-DOCK (Pei et al., 2006) are two examples of this category.
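One possible variant of a tabu search is sketched below. The similarity test (Euclidean distance below `tabu_radius`), the fixed-length tabu list, and the choice of the best non-tabu candidate at each iteration are assumptions of this example and are not taken from PRO\_LEADS or PSI-DOCK.

```python
import math
import random


def tabu_search(score_fn, start, n_steps, n_trials=20, max_move=0.5,
                tabu_radius=0.2, tabu_size=25, seed=0):
    """Move to the best random neighbor that is not similar to a recently visited pose."""
    rng = random.Random(seed)
    current = list(start)
    best, e_best = list(current), score_fn(current)
    tabu = [list(current)]
    for _ in range(n_steps):
        candidates = []
        for _ in range(n_trials):
            trial = [x + rng.uniform(-max_move, max_move) for x in current]
            # Reject poses too close to any pose stored in the tabu list.
            if all(math.dist(trial, visited) > tabu_radius for visited in tabu):
                candidates.append((score_fn(trial), trial))
        if not candidates:
            continue
        e_trial, current = min(candidates, key=lambda c: c[0])
        tabu.append(list(current))
        tabu = tabu[-tabu_size:]  # keep only the most recent poses
        if e_trial < e_best:
            best, e_best = list(current), e_trial
    return best, e_best


def toy_score(pose):
    """Illustrative score with a single minimum at the origin."""
    return sum(x * x for x in pose)


best_pose, best_score = tabu_search(toy_score, [3.0, -2.0], n_steps=200)
```

Since the current pose is always replaced by a non-tabu neighbor, even a worse one, the search is pushed away from regions it has already explored.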


Other examples of swarm optimization (SO) methods are SODOCK (Chen et al., 2007), pso@autodock (Namasivayam and Günther, 2007), and PSOVina (Ng et al., 2015).

#### 3. Simulation methods:

The most famous example of this category is Molecular Dynamics, a method that describes the time evolution of a system. A wider explanation will be given in section Molecular Dynamics.

Energy minimization methods can be inserted in this category, but generally, they are not used as stand-alone search engines (Kitchen et al., 2004). Energy minimization is a local optimization technique, used to bring the system to the closest minimum on the potential energy surface.
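The local character of energy minimization can be shown with a steepest descent sketch on a one-dimensional double-well energy. The step size, the convergence threshold, the finite-difference gradient and the toy energy function are assumptions chosen for the example.

```python
def numerical_gradient(energy_fn, coords, h=1e-5):
    """Central finite-difference gradient of energy_fn at coords."""
    grad = []
    for i in range(len(coords)):
        plus = coords[:i] + [coords[i] + h] + coords[i + 1:]
        minus = coords[:i] + [coords[i] - h] + coords[i + 1:]
        grad.append((energy_fn(plus) - energy_fn(minus)) / (2.0 * h))
    return grad


def steepest_descent(energy_fn, start, step=0.01, tol=1e-6, max_iter=10000):
    """Follow the negative gradient until its largest component falls below tol."""
    coords = list(start)
    for _ in range(max_iter):
        grad = numerical_gradient(energy_fn, coords)
        if max(abs(g) for g in grad) < tol:
            break
        coords = [x - step * g for x, g in zip(coords, grad)]
    return coords


def double_well(coords):
    """Toy energy with two minima, at x = -1 and x = +1."""
    x = coords[0]
    return (x * x - 1.0) ** 2


right_minimum = steepest_descent(double_well, [0.5])
left_minimum = steepest_descent(double_well, [-0.5])
```

Starting on either side of the barrier leads to a different minimum, which illustrates why minimization alone cannot serve as a global search engine.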

#### Flexible Docking

Some attempts have been made to introduce protein flexibility into docking calculations. These methods take advantage of different degrees of approximation and can be divided into approaches that consider single protein or multiple protein conformations (Alonso et al., 2006).

1. Single Protein Conformation:

#### a. Soft docking:

This method, first described by Jiang and Kim (1991), consists of an implicit and rough treatment of protein flexibility. The van der Waals repulsion term employed in force field scoring functions is reduced, allowing small clashes that permit a closer ligand-protein packing. In this way, a sort of induced fit is simulated. As a drawback, this approach approximates only small protein movements and can lead to unrealistic poses (Apostolakis et al., 1998; Vieth et al., 1999).

#### b. Sidechain flexibility:

This strategy introduces alternative conformations for some protein side chains (Leach, 1994). This is generally done by exploiting rotamer libraries. Some docking methods, such as GOLD, sample some protein degrees of freedom within their own search engine. Since only side chain flexibility is considered, large conformational variations of the protein are neglected by these methods.

#### 2. Multiple Protein Conformations:

Multiple experimental structures may be available for the same target. Moreover, an ensemble of protein conformations can be obtained via computational techniques, such as Monte Carlo or Molecular Dynamics simulations. The idea of multiple protein conformations docking is to take into account all the diverse structures, following different possible strategies:

a. Average grid:

The structures of the ensemble are used to construct a single average-grid, which can be either a simple or weighted average combination of them (Knegtel et al., 1997).

b. United description of the protein:

In this case, the structures do not collapse into an average grid but are used to construct the best performing "chimera" protein. For example, FlexE (Rarey et al., 1996) extracts the structurally conserved portions from the structures of the ensemble and uses them to construct an average rigid structure. This portion is fused to the flexible parts of the ensemble in a combinatorial fashion, giving a pool of "chimeras" that are used for docking.

c. Individual conformations:

The structures of the ensemble are considered as conformations that can possibly be bound by the ligand, so various docking runs are performed, evaluating the ligands of interest on all the target conformations (Huang and Zou, 2007). Moreover, a preliminary benchmark assessing the performance of different target structures in a cross-docking experiment may be employed to filter the ensemble of structures (Salmaso et al., 2016, 2018).
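Two of the strategies listed above, the average grid and the docking against individual conformations, can be sketched with simple data structures. The grids are represented as flat lists of interaction energies on identical grid points, and the docking scores are assumed to have been computed already by some docking program; all names and values are hypothetical.

```python
def average_grid(grids, weights=None):
    """Combine per-structure grids into one grid by a simple or weighted average."""
    if weights is None:
        weights = [1.0] * len(grids)
    total = sum(weights)
    return [sum(w * g[k] for w, g in zip(weights, grids)) / total
            for k in range(len(grids[0]))]


def best_over_ensemble(scores_by_conformation):
    """For one ligand, return the receptor conformation giving the lowest
    (most favorable) docking score, together with that score."""
    best_id = min(scores_by_conformation, key=scores_by_conformation.get)
    return best_id, scores_by_conformation[best_id]


# Toy example: one ligand docked against three receptor conformations.
ligand_scores = {"xtal_1": -7.2, "xtal_2": -8.9, "md_snapshot": -6.1}
selected = best_over_ensemble(ligand_scores)
```

Taking the best score over the ensemble is only one possible way to combine the individual docking runs; averaging the scores or ranking the ligands per conformation are alternatives.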

Among the drugs approved by the Food and Drug Administration, few examples of successful applications of CADD are available (Talele et al., 2010). Among them, the renin inhibitor Aliskiren was developed by means of a combination of molecular modeling and crystallographic structure analysis (Wood et al., 2003). However, the binding of non-peptidomimetic ligands to renin has revealed large structural rearrangements of the protein (Teague, 2003), highlighting the need to consider protein flexibility in drug design campaigns. Recently, a comparative study evaluating the performance of ensemble docking and individual crystal structure docking has been reported for renin (Strecker and Meyer, 2018). An ensemble of 4 crystal structures outperformed the mean results of individual crystal structures in terms of binding mode prediction and screening utility. The ensemble gave worse results than the best performing crystal structure, which, however, is not known a priori. A Molecular Dynamics ensemble gave poorer results than the crystallographic structures, as confirmed in other cases reported in the literature (Osguthorpe et al., 2012; Ganser et al., 2018). However, Molecular Dynamics has proven to be effective as a tool to explore molecular conformations and as a docking method itself, as reported in the following paragraphs.

#### Molecular Dynamics

Molecular dynamics (MD) is a computational technique which simulates the dynamic behavior of molecular systems as a function of time, treating all the entities in the simulation box (ligand, protein, and water molecules if explicit) as flexible (Salmaso, 2018).

It was developed to simulate simple systems, with the first application, in 1957, to the study of collisions among hard spheres (Alder and Wainwright, 1957). The first MD simulation of a biomolecule was accomplished in 1977 by McCammon et al. (McCammon et al., 1977); it was a 9.2 ps simulation of the 58-residue Bovine Pancreatic Trypsin Inhibitor (BPTI), performed in vacuum with a crude molecular mechanics potential.

Molecular dynamics computes the movements of atoms over time by integrating Newton's equations of motion (classical mechanics), reported in the following equation (Leach, 2001; Adcock and McCammon, 2006).

$$\frac{d^2r\_i(t)}{dt^2} = \frac{F\_i(t)}{m\_i}$$

where F<sub>i</sub>(t) is the force exerted on atom i at time t, r<sub>i</sub>(t) is the position vector of atom i at time t, and m<sub>i</sub> is the mass of the atom (**Figure 2**).

In particular, time is partitioned into time steps (δt), which are used to propagate the system forward in time. Several integration algorithms are available, which solve Newton's equations by a discrete-time numerical approximation. The velocity-Verlet integrator is reported in the following equations as an example; it computes the position and velocity of an atom i at time t+δt, starting from time t.

$$r_i(t+\delta t) = r_i(t) + v_i(t)\,\delta t + \frac{1}{2}a_i(t)\,\delta t^{2}$$

$$v_i(t+\delta t) = v_i(t) + \frac{1}{2}\left[a_i(t) + a_i(t+\delta t)\right]\delta t$$

where r<sub>i</sub>(t), v<sub>i</sub>(t) and a<sub>i</sub>(t) are respectively the position, velocity and acceleration of atom i at time t, and r<sub>i</sub>(t+δt), v<sub>i</sub>(t+δt) and a<sub>i</sub>(t+δt) are respectively the position, velocity and acceleration of atom i at time t+δt.

Acceleration is calculated from the forces acting on atom i according to Newton's second law, and forces are computed from the force field, according to the following equation:

$$a\_i(t) = \frac{d^2r\_i(t)}{dt^2} = \frac{F\_i(t)}{m\_i} = -\frac{dV(r\,(t))}{m\_i dr\_i(t)}$$

where V(r(t)) is the potential energy function retrieved by the force field (see **Box 1**).
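The integration scheme can be illustrated with a minimal velocity-Verlet sketch for a single particle in one dimension. The harmonic force, unit mass and time step are assumptions chosen so that the result can be compared with the analytical solution x(t) = cos(t); an MD engine applies the same update to every atom, with forces derived from the force field.

```python
import math


def velocity_verlet(pos, vel, mass, force_fn, dt, n_steps):
    """Propagate one particle in one dimension with the velocity-Verlet scheme.

    Returns the list of (position, velocity) pairs, including the initial state.
    """
    acc = force_fn(pos) / mass
    trajectory = [(pos, vel)]
    for _ in range(n_steps):
        pos = pos + vel * dt + 0.5 * acc * dt * dt
        new_acc = force_fn(pos) / mass          # forces at the new position
        vel = vel + 0.5 * (acc + new_acc) * dt  # average of old and new accelerations
        acc = new_acc
        trajectory.append((pos, vel))
    return trajectory


def harmonic_force(x, k=1.0):
    """F = -dV/dx for V = 0.5 * k * x^2."""
    return -k * x


trajectory = velocity_verlet(pos=1.0, vel=0.0, mass=1.0,
                             force_fn=harmonic_force, dt=0.01, n_steps=1000)
final_pos, final_vel = trajectory[-1]
total_energy = 0.5 * final_vel ** 2 + 0.5 * final_pos ** 2
```

The total energy stays close to its initial value of 0.5 over the whole trajectory, which is the behavior expected from this integrator at a sufficiently small time step.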

The most used force fields in molecular dynamics are CHARMM (MacKerell et al., 1998), AMBER (Cornell et al., 1995), OPLS (Jorgensen and Tirado-Rives, 1988) and GROMOS (Oostenbrink et al., 2004).

#### Molecular Dynamics and Exploration of the Phase Space

MD trajectories can be used as sampling engines; in fact, they produce protein conformations usable for Multiple Protein Conformations docking applications. In particular, McCammon et al. developed the so-called Relaxed-Complex Scheme (RCS), which consists of docking mini-libraries of compounds with AutoDock (Morris et al., 1998) against a large ensemble of snapshots derived from unliganded protein MD trajectories (Lin et al., 2002, 2003; Amaro et al., 2008). This approach is based on the conformational selection binding model, disregarding any influence of the ligand on the receptor. The application of the RCS to the UDP-galactose 4′-epimerase (TbGalE), for example, led to the identification of 14 low-micromolar inhibitors (Durrant et al., 2010). Another computational pipeline integrating MD simulations and virtual screening has proved to be effective: the coupling of MD, clustering, and choice of the target structure through fingerprints for ligands and proteins (MD-FLAP) improved VS performance (Spyrakis et al., 2015).

MD has further applications as a docking-coupled technique (Alonso et al., 2006) more anchored to the induced-fit model, as it can be used to assess stability (Sabbadin et al., 2014; Yu et al., 2018), to refine and to rescore docking poses (Rastelli et al., 2009).

The conformational diversity generated by MD simulations can be exploited to gain insights into cryptic pockets or allosteric binding sites (Durrant and McCammon, 2011), as reported by Schames et al., who identified an alternative binding site, named "trench," close to the active site of the HIV-1 integrase (Schames et al., 2004). Moreover, simulations in explicit solvent may give information on water molecules, which can be classified as "cold" or stable and "hot" or unstable (for a recent and comprehensive overview of the role of water in SBDD, see Spyrakis et al., 2017). In particular, MD may make it possible to identify relevant water molecules, according to their order (Li and Lazaridis, 2003) and stationarity (Cuzzolin et al., 2018), and to estimate their contribution in modulating ligand binding (Bortolato et al., 2013; Betz et al., 2016).

All the aforementioned applications of MD are used as a complement to classic molecular docking techniques. However, the simulation of the complete binding process of a ligand, from the unbound state in bulk solvent to the bound state, can be considered a fully-flexible docking in explicit solvent. The possibility to investigate the whole binding process could give insights into metastable states reached by the ligand during the simulation, alternative binding sites, the role of water during binding, and conformational rearrangements preceding, concurrent with, or following binding.

However, the observation of a binding event during a classical MD simulation is very rare, raising the timescale problem. The timestep in molecular dynamics has to be compatible with the fastest motion in the system; in particular, a timestep of 1–2 fs, corresponding to bond vibrations, has to be used. Thus, a very large number of MD steps is required to simulate slow processes, such as large domain motions and binding (µs-ms) (Henzler-Wildman and Kern, 2007), making the calculation computationally very demanding. In particular, slow timescales are linked to processes that require the overcoming of a high energy barrier (Henzler-Wildman and Kern, 2007), corresponding to low populated states in the conformational energy landscape; in this case the simulated system gets trapped in a local minimum, making classical MD inadequate for a broad exploration of the conformational space.
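The size of the timescale problem follows from simple arithmetic, sketched below. The 2 fs timestep comes from the text, while the throughput of 100 ns per day used in the second function is a hypothetical figure chosen only to illustrate the conversion into wall-clock time.

```python
def md_steps_required(simulated_time_s, timestep_fs=2.0):
    """Number of integration steps needed to cover simulated_time_s seconds."""
    return round(simulated_time_s / (timestep_fs * 1e-15))


def wall_clock_days(simulated_time_s, ns_per_day):
    """Days of computation at a given throughput (nanoseconds simulated per day)."""
    return simulated_time_s * 1e9 / ns_per_day


steps_for_microsecond = md_steps_required(1e-6)  # 5 x 10^8 steps
steps_for_millisecond = md_steps_required(1e-3)  # 5 x 10^11 steps
days_for_millisecond = wall_clock_days(1e-3, ns_per_day=100.0)
```

At the assumed throughput, a single millisecond trajectory would take about 10,000 days, which is why specialized hardware, many short trajectories, or enhanced sampling are needed to reach such timescales.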

#### Advances in Classical MD Simulations

In 1998 Duan and Kollman performed the first 1 µs simulation of a protein in explicit solvent, observing the folding of a 36-residue villin headpiece subdomain from a fully unfolded state. This simulation was two orders of magnitude longer than a state-of-the-art simulation of that period, and it was made possible by advances in massively parallel supercomputers and efficient parallelized codes, but still required 2 months of CPU (Central Processing Unit) time (Duan and Kollman, 1998).

Specialized informatic infrastructures have also been designed specifically for MD calculations; for example, a supercomputer named Anton was conceived as a "computational microscope" and was developed with the idea to reach previously inaccessible simulation timescales within a reasonable computation time (Shaw et al., 2008). This machine allowed Shaw et al. to characterize the folding of FiP35 WW domain from a fully extended state in a 100 µs simulation and, in addition, to reach the millisecond timescale in a single simulation of BPTI in the folded-state (Shaw et al., 2010), followed recently by ubiquitin (Lindorff-Larsen et al., 2016). Moreover, with unbiased simulations in the order of ten microseconds, Shaw's group could simulate the complete binding process of beta blockers and agonists to the β2-adrenergic receptor (Dror et al., 2011) and kinase inhibitors to Src kinase (Shan et al., 2011).

As a drawback, the use of a supercomputer is an expense that not many research groups can afford. Fortunately, recent years have been characterized by the development of code able to exploit the speed of GPUs (Graphics Processing Units), which has given access to tera-scale performance on a common workstation, at a relatively low cost (Van Meel et al., 2008; Friedrichs et al., 2009; Harvey et al., 2009; Nobile et al., 2017). The architecture of a GPU is meant to parallelize a computation over thousands of cores, with all cores executing the same instructions on different data ("Single Instruction, Multiple Data," SIMD) (Nobile et al., 2017). For this reason, apart from a few preliminary applications in the field of molecular docking (Korb et al., 2011; Khar et al., 2013), GPUs have been mainly exploited for MD simulations, which can be parallelized at the level of atoms. In fact, simulations of hundreds of nanoseconds are now easily performed, and reaching the microsecond timescale is feasible on a GPU-equipped workstation (Harvey and De Fabritiis, 2012). In addition, cloud computing has been emerging, not just through webservers intended to make molecular modeling accessible to a community of non-developer users, but also through the provision of scalable, on-demand computing power (Ebejer et al., 2013). As an example, AceCloud is an on-demand service for MD simulations, which is accessed through an extension of the ACEMD MD code (Harvey and De Fabritiis, 2015).

Moreover, a paradigm shift seems to have been spreading, that is the possibility to simulate long processes using numerous trajectories shorter than the process itself instead of a single long trajectory. This idea has been exploited by the folding@home project, a worldwide distributed computing environment benefitting from the computers of private citizens, when not in use (Shirts and Pande, 2000). Since during a classical MD simulation, the system is stuck in a minimum, waiting for the fortunate event that triggers the overcoming of an energy barrier, the simulation of many trajectories in parallel would increase the probability to meet the lucky event. Thus, numerous simulations are started from the same initial condition and run in parallel on different computers, and when one escapes from the energy minimum, all the simulations are stopped and started from the new productive configuration (Pande et al., 2003).
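The statistical argument behind running many trajectories in parallel can be written down explicitly. Assuming, for illustration only, that each trajectory escapes from the minimum independently with the same probability p within the simulated time, the probability that at least one of N trajectories escapes is 1 - (1 - p)^N.

```python
def escape_probability(p_single, n_trajectories):
    """Probability that at least one of n independent trajectories crosses the
    barrier, given a per-trajectory probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_trajectories


one_trajectory = escape_probability(0.01, 1)
hundred_trajectories = escape_probability(0.01, 100)
```

With a per-trajectory probability of 1%, a hundred parallel trajectories raise the chance of observing the event to about 63%.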

The new paradigm has found its best application in the use of Markov State Models (MSMs) and adaptive sampling. In fact, MSMs are based on an ensemble view of the dynamics, from which statistical properties, such as the probability to occupy a state and the probability to jump from one state to another, are computed. The construction of a Markov model consists of the discretization and projection of a trajectory into microstates, and of the computation of a transition probability matrix T(τ) at a given time, the lag-time τ, chosen in such a way that the transition is memory-less (Markovian). Each element T<sub>ij</sub>(τ) of the transition matrix represents the conditional probability to find the system in state j at time t+τ given that it is in state i at time t. The transition matrix approximates the dynamics of the system and makes it possible to extrapolate the free energy from the equilibrium probability distribution of the system, as well as the timescales of the slowest processes, even if they are not directly explored. In a qualitative fashion, the MSM may identify diverse metastable states and construct multi-state models of the processes (Prinz et al., 2011). As an example, an MSM was constructed on an aggregate of nearly 500 trajectories of 100 ns each describing benzamidine-trypsin binding (with 37% productive trajectories); this made it possible to characterize the binding process, identifying three transition states, and to estimate the binding free energy with a 1 kcal/mol difference from the experimental one (while a higher deviation from experiment was associated with the extrapolated k<sub>on</sub> and k<sub>off</sub>) (Buch et al., 2011). Moreover, the computation of an MSM on the collected data can give feedback about undersampled zones of the phase space, suggesting where to focus further simulation, adapting the sampling (adaptive sampling methods) and increasing the efficiency of simulations (Bowman et al., 2010; Doerr and De Fabritiis, 2014).
Currently, the major difficulties of this technique are related to the partition of the trajectory into discrete states, the choice of the lag-time, and the need for sufficient sampling to guarantee statistical significance (Pande et al., 2010).
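The construction of a transition matrix from a discretized trajectory can be sketched in a few lines. This is a deliberately naive estimator, assuming a single trajectory already assigned to microstates, simple row normalization of the transition counts (no reversibility constraint), and power iteration for the equilibrium distribution; the value kT = 0.593 kcal/mol and the toy trajectory are illustrative.

```python
import math


def transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at lag time `lag` (in trajectory frames).
    Assumes every state is visited at least once before the last `lag` frames."""
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1
    return [[c / sum(row) for c in row] for row in counts]


def stationary_distribution(T, n_iter=1000):
    """Equilibrium probabilities pi satisfying pi T = pi, by power iteration."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi


def free_energies(pi, kT=0.593):
    """Relative free energies G_i = -kT ln(pi_i), shifted so the minimum is zero."""
    g = [-kT * math.log(p) for p in pi]
    g_min = min(g)
    return [x - g_min for x in g]


# Toy discretized trajectory over two microstates (0 and 1).
dtraj = [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]
T = transition_matrix(dtraj, n_states=2, lag=1)
pi = stationary_distribution(T)
G = free_energies(pi)
```

Dedicated MSM packages additionally validate the Markovian assumption, for instance by checking that the implied timescales are constant with respect to the lag-time.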

Several alternative techniques have been developed over the years to overcome the time limitation imposed by classical MD simulations. A first example is Coarse-Grained MD simulation, in which groups of atoms are condensed into spheres, reducing the degrees of freedom of the system (Kmiecik et al., 2016). This simplifies the conformational landscape of the system but, as a drawback, the all-atom details, which are precious for drug discovery purposes, are lost.

Additional strategies consist of enhanced sampling techniques that apply a bias to molecular dynamics simulations to increase the accessible timescale, enabling the simulation of slow processes like binding, unbinding and folding processes in a reduced amount of time.

#### Enhanced Sampling Techniques

These methods add a bias force/potential to the system to increase the rate of escape from local minima, entailing an acceleration of conformational sampling. They have been conceived primarily to study either folding or binding or unbinding processes, sharing the underlying idea of enhancement of sampling and overcoming high energy barriers.

Enhanced sampling techniques can be divided into methods that make use of collective variables to introduce the bias and methods that do not (De Vivo et al., 2016) (**Figure 3**).

The use of a collective variable (CV) is based on the idea that a complex system can be decomposed into one or a combination of reaction coordinates describing the process of interest. These coordinates are called collective variables since they are assumed to summarize the behavior of the entire system. After a careful choice of the CVs, the bias is added on these coordinates during the simulation, enhancing sampling along the CVs. The phase space is reduced to the space of the collective variables, since the conformational space is projected onto the selected CVs, with a consequent dimensional reduction of the free energy surface.

In the following paragraphs, a few representative enhanced sampling techniques are described as examples, focusing on their application to binding and unbinding and moving toward a fully dynamic docking (De Vivo and Cavalli, 2017).

#### **Collective variables-free methods**

**Replica Exchange Molecular Dynamics (REMD)**

This method uses an increase in temperature to accelerate conformational sampling. The first formulation of Replica Exchange MD (Sugita and Okamoto, 1999), also known as Parallel Tempering (PT), consists of the parallel simulation of a number of independent and simultaneous replicas of the same system, starting from the same configuration but at different temperatures. At regular time intervals, two replicas at neighboring temperatures are swapped or, in other terms, their temperatures are exchanged, with a probability determined by the energy (E) and temperature (T) of the system. In particular, the transition probability between simulations at temperatures T<sub>1</sub> and T<sub>2</sub> is determined by the Metropolis criterion:

$$P\left(T_1 \rightarrow T_2\right) = \begin{cases} 1 & \text{for } \left[\beta_2 - \beta_1\right] \left(E_1 - E_2\right) \le 0\\ e^{-\left[\beta_2 - \beta_1\right] \left(E_1 - E_2\right)} & \text{for } \left[\beta_2 - \beta_1\right] \left(E_1 - E_2\right) > 0 \end{cases}$$

where β = 1/(k<sub>B</sub>T), with k<sub>B</sub> the Boltzmann constant.

Temperatures are updated by rescaling the velocities of the parent simulations (v<sub>1</sub> and v<sub>2</sub> to v<sub>1</sub>′ and v<sub>2</sub>′) according to the following equation:

$$\begin{cases} v_1' = \sqrt{\dfrac{T_2}{T_1}}\, v_1 \\ v_2' = \sqrt{\dfrac{T_1}{T_2}}\, v_2 \end{cases}$$

The choice of the panel of temperatures is critical, and various strategies have been proposed to guide the selection (Patriksson and van der Spoel, 2008).
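The exchange step can be sketched in a few lines, assuming potential energies in kcal/mol and temperatures in kelvin. The function names and the unit choice are illustrative assumptions, not part of any specific MD engine; real implementations perform the swap inside the simulation code and across many replica pairs.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def exchange_probability(E1, E2, T1, T2, k_B=K_B):
    """Metropolis acceptance probability for swapping replicas 1 and 2."""
    beta1 = 1.0 / (k_B * T1)
    beta2 = 1.0 / (k_B * T2)
    delta = (beta2 - beta1) * (E1 - E2)
    return 1.0 if delta <= 0 else math.exp(-delta)


def attempt_exchange(E1, E2, T1, T2, rng=random):
    """Return True if the swap is accepted for this random draw."""
    return rng.random() < exchange_probability(E1, E2, T1, T2)


def rescale_velocities(v1, v2, T1, T2):
    """Rescale velocity components so each replica matches its new temperature."""
    scale = math.sqrt(T2 / T1)
    return [scale * v for v in v1], [v / scale for v in v2]
```

Because the acceptance probability decays exponentially with the product of the inverse-temperature gap and the energy gap, closely spaced temperatures give frequent exchanges, which is one reason the choice of the temperature panel matters.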

Further developments of REMD have been introduced, such as Hamiltonian Replica Exchange (H-REMD), where Hamiltonians are exchanged among replicas instead of temperatures (Fukunishi et al., 2002), and Replica Exchange with Solute Tempering, where the central group and the solvent buffer are treated differently (Liu et al., 2005). H-REMD has recently been combined with conventional MD simulations using multi-ensemble Markov models (MEMMs) (Wu et al., 2016) to investigate the multistate kinetics of Mdm2 and its inhibitor peptide PMI (Paul et al., 2017). An aggregate of 500 µs of unbiased MD simulations started from different initial states, in particular dissociated ones, was combined with H-REMD simulations (6 simulations of 1 µs with 14 replicas) to enhance the sampling of rare dissociation events; the results were analyzed through the TRAMMBAR estimator, leading to the prediction of a residence time beyond the timescale of seconds, despite a sub-millisecond simulation time. Moreover, the trajectories were further analyzed to investigate the binding mechanism and the binding-induced folding of PMI (Paul et al., 2018). It appeared that a multitude of parallel pathways is possible and that binding and folding are coupled, rather than temporally ordered and separated.

**Accelerated Molecular Dynamics (aMD)**

Accelerated MD (aMD) facilitates the escape from a low-energy basin by adding a bias potential function ΔV(r) when the system is trapped in an energy minimum. In particular, when the potential energy V(r) is lower than a certain cut-off E, the bias is added, giving a modified potential V\*(r) = V(r) + ΔV(r); otherwise, the simulation continues on the true, unbiased potential, V\*(r) = V(r).

The bias function is reported in the following equation:

$$
\Delta V\left(r\right) = \frac{\left(E - V\left(r\right)\right)^2}{\alpha + \left(E - V\left(r\right)\right)}
$$

where E is the potential energy cut-off and α is a tuning parameter determining the depth of the modified potential energy basin.

E has to be greater than V<sub>min</sub> (the minimum potential energy, close to the starting configuration), while setting α = E − V<sub>min</sub> maintains the underlying shape of the landscape (Hamelberg et al., 2004).
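As a small numerical illustration, the boost can be written as a function of the instantaneous potential energy. This sketch assumes the boost form of Hamelberg et al. (2004), ΔV = (E − V)² / (α + E − V), applied only when V is below the cut-off E; the function names are hypothetical and the energy units are arbitrary.

```python
def amd_boost(V, E, alpha):
    """Boost energy dV added to the potential V for a cut-off E and tuning alpha."""
    if V >= E:
        return 0.0  # above the cut-off the true potential is used
    gap = E - V
    return gap * gap / (alpha + gap)


def amd_modified_potential(V, E, alpha):
    """Modified potential V* = V + dV experienced by the system."""
    return V + amd_boost(V, E, alpha)
```

With α = 0 the modified potential becomes flat and equal to E everywhere below the cut-off, while larger values of α preserve more of the original shape of the basin at the price of a smaller acceleration.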

As an example, aMD gave results qualitatively similar to classical MD with less computational effort in the simulation of tiotropium binding to the M<sub>3</sub> Muscarinic Acetylcholine Receptor: tiotropium was observed to recognize the extracellular vestibule of the receptor, as in a previously reported long (16 µs) classical MD simulation (Kruse et al., 2012), with the process accelerated by about one order of magnitude (three aMD replicas of 200 ns, 500 ns, and 1 µs) (Kappel et al., 2015).

#### **Collective Variables-dependent methods**

**Steered Molecular Dynamics (SMD)**

Taking inspiration from atomic force microscopy experiments, in Steered MD (SMD) an external force is applied to a ligand to drive it out of the target binding site (Isralewitz et al., 1997, 2001; Izrailev et al., 1997). Other possibilities involve the application of forces on different CVs, such as nonlinear coordinates that can help to explore the conformational rearrangement of protein domains (Izrailev et al., 1999).

SMD gives insights into the ligand-target unbinding mechanism, which can be investigated through the dynamical evolution of the ligand-target pattern of interactions, as reported for a series of Cyclin-Dependent Kinase 5 (CDK5) inhibitors (Patel et al., 2014). In the same work, the second application of SMD in drug discovery is highlighted: since the bias force added during an SMD simulation is assumed to be related to the binding strength, the binding force profile can be used to discriminate binders from nonbinders.

SMD relies on an a priori definition of the direction of the applied force, which can be fixed (for example a simple straight line) or can change during the simulation. The choice of the direction is not trivial, because a ligand may bump into obstructions on its way out of the protein, but a method that evaluates the minimal steric hindrance has been reported (Vuong et al., 2015). Moreover, integration with targeted molecular dynamics (TMD) has been reported: in TMD a bias force is applied to drive the system from an initial to a desired final configuration (Schlitter et al., 1993), leading to the identification of a path that can be used as a set of directions for an SMD simulation (Isralewitz et al., 2001).

**Random Acceleration Molecular Dynamics (RAMD)**

Random Acceleration MD (RAMD), also called Random Expulsion MD, is an extension of SMD and, like SMD, was developed to study the egress of a ligand from its target binding site. It consists of the application of an artificial, randomly directed force on a ligand to accelerate its unbinding. In this way, in comparison with SMD, RAMD avoids the preliminary choice of the force direction; consequently, if obstructions are found along the exit pathway, the escape direction is switched.

In particular, the direction of the force is chosen stochastically and maintained for a number of MD steps. If during this time interval the average velocity of the ligand is lower than a specified cut-off (or, in other terms, if the distance covered by the ligand is lower than a cut-off distance, r<sub>min</sub>), meaning that a rigid obstruction has probably been met, a new force direction is assigned to allow the ligand to search for alternative exit pathways (Lüdemann et al., 2000).
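The direction-update rule can be sketched as follows, assuming that the ligand center of mass is available at the start and at the end of each interval. The function names and the use of a NumPy random generator are illustrative assumptions; the force magnitude and the MD propagation itself are outside the scope of this sketch.

```python
import numpy as np


def random_unit_vector(rng):
    """Direction drawn uniformly on the unit sphere."""
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)


def update_force_direction(direction, com_start, com_end, r_min, rng):
    """Keep the current direction if the ligand moved at least r_min,
    otherwise draw a new random one. Returns (direction, changed)."""
    travelled = np.linalg.norm(np.asarray(com_end) - np.asarray(com_start))
    if travelled < r_min:
        return random_unit_vector(rng), True
    return direction, False
```

A driver loop would call `update_force_direction` after every block of MD steps and stop once the ligand has left the binding site, for example when its distance from the pocket exceeds a user-defined threshold.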

Like SMD, RAMD is predominantly used to simulate ligand unbinding from a molecular target. For example, the egress of carazolol from the β<sub>2</sub> Adrenergic Receptor was described through an ensemble of RAMD simulations (100 simulations, each with a maximum length of 1 ns): the opening of the extracellular surface of the receptor was identified as the predominant exit route, entailing the rupture of a salt bridge linking extracellular loop 2 to transmembrane helix 7 (Wang and Duan, 2009).

**Umbrella Sampling (US)**

Umbrella Sampling (US) (Torrie and Valleau, 1977) consists of restraining the system along one or a combination of CVs. Commonly, the range of interest of the CV is divided into windows, each characterized by a reference value of the CV (ξ<sub>ref</sub>). The bias potential enhances sampling in each window by forcing the system to stay close to the respective CV reference value. The bias is a function of the reaction coordinate and can have different shapes, but generally consists of a simple harmonic term, as in the following equation:

$$V(\xi) = \frac{k}{2} \left(\xi - \xi_{\text{ref}}\right)^2$$

where k is the strength of the potential and ξ is the value of the CV.

The strength of the bias has to be high enough to allow energy barriers to be crossed, but sufficiently low to let the distributions sampled in neighboring windows overlap, as required for post-processing analysis.
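The trade-off between bias strength and window overlap can be made concrete with a short sketch. It assumes equally spaced windows and uses a rough rule of thumb that is not taken from the text: on a locally flat free energy profile, a harmonic restraint of strength k yields a Gaussian distribution of the CV with standard deviation sqrt(k<sub>B</sub>T/k), so the window spacing should be comparable to that width. All function names are hypothetical.

```python
import numpy as np


def harmonic_bias(xi, xi_ref, k):
    """Umbrella potential V = k/2 * (xi - xi_ref)^2."""
    return 0.5 * k * (xi - xi_ref) ** 2


def window_centres(xi_min, xi_max, n_windows):
    """Equally spaced reference values of the CV."""
    return np.linspace(xi_min, xi_max, n_windows)


def restrained_width(k, kT):
    """Standard deviation of the CV under the restraint alone (flat profile)."""
    return np.sqrt(kT / k)


def spacing_to_width_ratio(centres, k, kT):
    """Window spacing in units of the restrained width; values much larger
    than about 2 suggest poor overlap between neighbouring windows."""
    return float(np.diff(centres).max() / restrained_width(k, kT))
```

This estimate ignores the underlying free energy profile, which narrows or widens the actual distributions, so the overlap of the sampled histograms should still be inspected before applying any post-processing method.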

The aim of US is to force sampling in each window to collect sufficient statistics along the whole reaction coordinate. Then the distribution of the system, and consequently the free energy, is calculated along the CV (Kästner, 2011). Different post-processing methods can be used to combine and analyze the data coming from the different US windows; the best known are umbrella integration (Kästner and Thiel, 2005), the weighted histogram analysis method (WHAM) (Kumar et al., 1992), and the more recent Dynamic Weighted Histogram Analysis (DHAM) (Rosta and Hummer, 2015), which can also be used to derive kinetic parameters.

Integrations of US with other enhanced sampling techniques are reported in the literature, such as the replica-exchange umbrella sampling method (REUS), where an umbrella potential is exchanged among replicas (Sugita et al., 2000; Kokubo et al., 2011). This technique was applied to the prediction of ligand-protein binding structures, starting from unbound initial states and employing as CV ξ the distance between the centers of mass of the ligand and of the backbone of two selected residues. It proved effective in predicting the binding mode of two ligands to the p38 and JNK3 kinases (RMSD lower than 1.7 Å) and outperformed a cross-docking experiment, highlighting the importance of considering protein flexibility to accurately predict the coordinates of a complex (Kokubo et al., 2013).

**Metadynamics**

Metadynamics (Laio and Parrinello, 2002) adds a bias potential to the Hamiltonian of the system in the form of a Gaussian-shaped function of one or more CVs. In this case, the bias does not restrain or constrain the system, nor does it force the system along a preferred direction in the CV space. The bias is used to keep memory of the already explored regions of the phase space and to discourage the system from visiting them again (Laio and Gervasio, 2008).

At time t, the bias potential V<sub>G</sub>(S,t) is given by the following equation:

$$V_{\rm G} \left( \mathbf{S}, t \right) = \int_0^t dt'\, \omega \exp \left( -\sum_{i=1}^d \frac{\left( S_i \left( \mathbf{R} \right) - S_i \left( \mathbf{R} \left( t' \right) \right) \right)^2}{2\sigma_i^2} \right),$$

where S(R) = (S<sub>1</sub>(R), ..., S<sub>d</sub>(R)) is a set of d CVs (which are functions of the coordinates R of the system), S<sub>i</sub>(R(t)) is the value of the ith CV at time t, σ<sub>i</sub> is the Gaussian width for the ith CV, and ω is the energy rate, given by:

$$
\omega = \frac{W}{\tau_G}
$$

with W the Gaussian height and τ<sub>G</sub> the deposition stride, i.e., the time interval between two successive Gaussian depositions.

Thus, the bias is "history-dependent," because it is the sum of the Gaussians that have already been deposited in the CV space over time.

The free energy landscape is explored, starting from the bottom of a well, by a random walk; bias-Gaussians are deposited in the CV space with a given frequency, and at each iteration, the bias is given by the sum of the already deposited Gaussians. As time goes by, the system, instead of being trapped in the bottom of a well, is pushed out by the hill of deposited Gaussians and enters a new minimum. The process continues until all the minima are compensated by the bias potential (Barducci et al., 2011).
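The history-dependent bias can be sketched for a single CV as a running sum of Gaussians, which is the discrete counterpart of the time integral when one Gaussian of height W is deposited at every stride. The class and method names below are hypothetical, and the MD propagation that supplies the CV values is deliberately left out.

```python
import numpy as np


class MetadynamicsBias:
    """History-dependent bias on one CV, built from deposited Gaussians."""

    def __init__(self, height, sigma):
        self.height = height  # Gaussian height W
        self.sigma = sigma    # Gaussian width along the CV
        self.centres = []     # CV values at which Gaussians were deposited

    def deposit(self, s):
        """Add one Gaussian centred at the current CV value s."""
        self.centres.append(float(s))

    def _gaussians(self, s):
        c = np.asarray(self.centres)
        return c, self.height * np.exp(-((s - c) ** 2) / (2.0 * self.sigma ** 2))

    def value(self, s):
        """Bias energy V_G(s) = sum of all deposited Gaussians evaluated at s."""
        if not self.centres:
            return 0.0
        return float(self._gaussians(s)[1].sum())

    def force(self, s):
        """Bias force -dV_G/ds, pointing away from already visited CV values."""
        if not self.centres:
            return 0.0
        c, g = self._gaussians(s)
        return float((g * (s - c) / self.sigma ** 2).sum())
```

In a simulation loop, `deposit` would be called every τ<sub>G</sub> with the current CV value and `force` would be added to the physical force along the CV; once the wells are filled, the negative of the accumulated bias approximates the free energy profile along the CV.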

In this way, metadynamics makes it possible to enhance sampling and to reconstruct the free energy surface; this can be used to explore binding/unbinding processes (Gervasio et al., 2005) and, with the application of funnel metadynamics (Limongelli et al., 2013), to estimate the binding free energy.

Unfortunately, the free energy surface may be overfilled, but this has been partially solved by well-tempered metadynamics, in which the height of the added Gaussian is rescaled according to the already deposited bias (Barducci et al., 2008). Another issue with metadynamics is the choice of the CVs, which should describe the slowest motions of the system and the initial, final, and relevant intermediate states. Moreover, only a small number of CVs can be used, and a good strategy is the combination with other techniques able to enhance sampling along a large number of transverse coordinates (Barducci et al., 2011), such as parallel tempering (Bussi et al., 2006). Using well-tempered, multiple-walker, funnel-restrained metadynamics, the binding pathways of several ligands to five G-protein-coupled receptors (including X-ray crystal structures and homology models) have recently been explored, resulting in the prediction of binding free energies with a root-mean-square error lower than 1 kcal mol<sup>−1</sup> (Saleh et al., 2017).

FIGURE 4 | (A) Sketch of a pepSuMD step: the distance between the centers of mass of the ligand (peptide) and the target is computed at regular time intervals during the SuMD step. The distance values are fitted by a line, whose slope (m) determines whether the current SuMD step (m > 0) or a new one (m < 0) has to be simulated. (B) Representation of the binding pathway bringing the BAD peptide to the Bcl-X<sub>L</sub> binding site, occurring in 46.2 ns. The superposition of the final pepSuMD state with the experimental structure (PDB ID: 1G5J, Petros et al., 2000) is reported on the right.

#### Supervised Molecular Dynamics

In recent years, a new method, called Supervised Molecular Dynamics (SuMD), has been introduced to accelerate the binding process (Sabbadin and Moro, 2014; Cuzzolin et al., 2016). SuMD differs from enhanced sampling simulations in that it does not affect the energy profile of the system.

A SuMD simulation consists of a series of short MD windows (hundreds of picoseconds), called SuMD steps, where step n+1 is run after step n has been evaluated in terms of ligand-target approach. During each SuMD step, the distance between the centers of mass of the ligand and of the target binding site (a few selected residues) is computed; distance values are collected at regular intervals during the simulation and are fitted by a line (**Figure 4A**). If the slope of the line is negative, the ligand is approaching the binding site: the SuMD step (step n) is considered productive, and a new step (step n+1) is started from the last coordinates and velocities of the current step. Otherwise, if the slope is positive, the SuMD step is unproductive: the current step is discarded and restarted from its initial coordinates (the starting configuration of step n). The simulation ends once the distance between the centers of mass of the ligand and the target falls below a certain cut-off. Finally, the consecutive SuMD steps are merged to give the SuMD trajectory.
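The supervision rule can be sketched as a small decision function, assuming that the ligand-site distances sampled during one SuMD step are available as arrays. The names, the returned labels, and the handling of edge cases (a slope of exactly zero is treated as unproductive, and the cut-off is checked only after a productive step) are illustrative assumptions rather than details of the published implementation.

```python
import numpy as np


def distance_slope(times, distances):
    """Slope m of the least-squares line fitted to the distance values."""
    slope, _intercept = np.polyfit(times, distances, 1)
    return float(slope)


def supervise_step(times, distances, cutoff):
    """Decide what to do after one SuMD step.

    Returns "retry"    if the step was unproductive (m >= 0),
            "finished" if it was productive and ended below the cut-off,
            "continue" if it was productive but the ligand is still far away.
    """
    if distance_slope(times, distances) >= 0.0:
        return "retry"
    if distances[-1] < cutoff:
        return "finished"
    return "continue"
```

A driver would run a short MD window, call `supervise_step`, and then either restart the window from its initial coordinates, extend the trajectory from the last frame, or stop and merge the productive windows.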

In this way, SuMD makes it possible to observe a binding event on a reduced timescale, on the order of tens to hundreds of nanoseconds, without introducing any energetic bias. Indeed, SuMD simply focuses sampling through a tabu-like algorithm, which favors the progress of a simulation toward productive events and avoids wasting simulation time in uninteresting portions of the search space.

Certainly, a single SuMD trajectory is not sufficient to explain the complex binding process, and the retrieval of thermodynamic quantities from a single simulation must be avoided. Nevertheless, a SuMD trajectory depicts one of the possible binding pathways leading a ligand to reach the target, so it can be useful to propose a mechanistic hypothesis.

The technique was first applied to Adenosine Receptors, where it facilitated the characterization of the binding pathways of several ligands toward the receptor, with the exploration of meta-binding sites (Sabbadin and Moro, 2014; Sabbadin et al., 2015). In this context, SuMD can be useful in the interpretation of allosteric interactions (Deganutti et al., 2015) and has proved helpful in the identification of fragment-like positive allosteric modulators (Deganutti and Moro, 2017). In fact, SuMD turned out to be effective in simulating fragment compounds, as shown by the accurate prediction of the binding mode of a catechol fragment to human peroxiredoxin 5 (PRDX5), reaching a minimum RMSD of 0.7 Å from the crystallographic pose.

The applicability spectrum of SuMD has been further extended with the development of pepSuMD, a revised version of the technique able to simulate the binding pathway of a peptide ligand toward its protein binding site (Salmaso et al., 2017). The recognition process of the BAD peptide by the Bcl-X<sub>L</sub> protein (**Figure 4B**) and of the p53 peptide by MDM2 has recently been reported, reaching an RMSD of less than 5 Å from the experimental conformation in tens of nanoseconds in both cases (46.2 and 23.40 ns, respectively). During the BAD/Bcl-X<sub>L</sub> simulation, the C-terminal helix explored different conformations, meaning that peptide and protein conformational rearrangements can be observed during a SuMD simulation when they occur on the same timescale as the SuMD-accelerated binding.

#### CONCLUSIONS AND PERSPECTIVES

This review has surveyed some relevant computational techniques in drug discovery, highlighting how protein flexibility has been introduced into simulations over the years. Starting from simple rigid docking strategies justified by the lock-and-key model, it soon became necessary to consider the conformational degrees of freedom of ligands during docking. Experimental data proving the existence of different conformations of protein structures have forced molecular modelers to face the problem of interpreting and simulating the conformational transitions of macromolecules.

From rough attempts to include protein flexibility in classical molecular docking, the development of hardware technologies and of novel MD computational techniques has increasingly made it possible to simulate large conformational movements. The possibility of simulating simultaneous folding and binding phenomena can be exploited to address the long-standing debate about the "induced-fit" and "conformational selection" binding models, by giving a mechanistic interpretation of binding pathways.

Moreover, some of the enhanced sampling techniques are no longer an exclusively methodological exercise, but have come within reach of many research groups, with consequent real applicability in drug discovery.

#### AUTHOR CONTRIBUTIONS

VS and SM devised the organization, the main conceptual ideas, and the proof outline, and wrote the review. The content of the present work has been largely taken from the PhD thesis entitled *Exploring protein flexibility during docking to investigate ligand-target recognition*, written by VS under the supervision of SM.

#### ACKNOWLEDGMENTS

MMS lab is very grateful to Chemical Computing Group, OpenEye, and Acellera for the scientific and technical partnership. MMS lab gratefully acknowledges the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

#### REFERENCES

Abagyan, R., and Totrov, M. (2001). High-throughput docking for lead generation. Curr. Opin. Chem. Biol. 5, 375–382. doi: 10.1016/S1367-5931(00)00217-9

Abagyan, R., Totrov, M., and Kuznetsov, D. (1994). ICM - A new method for protein modeling and design: applications to docking and structure prediction from the distorted native conformation. J. Comput. Chem. 15, 488–506. doi: 10.1002/jcc.540150503

Adcock, S. A., and McCammon, J. A. (2006). Molecular dynamics: survey of methods for simulating the activity of proteins. Chem. Rev. 106, 1589–1615. doi: 10.1021/cr040426m


binding events. Trends Biochem. Sci. 35, 539–546. doi: 10.1016/j.tibs.2010.04.009


of the endogenous agonist adenosine using supervised molecular dynamics simulations. Medchemcomm 6, 1081–1085. doi: 10.1039/C5MD00016E


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Salmaso and Moro. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Advanced Activity-Based Protein Profiling Application Strategies for Drug Development

#### Shan Wang<sup>1</sup>, Yu Tian<sup>1</sup>, Min Wang<sup>1</sup>, Min Wang<sup>2</sup>, Gui-bo Sun<sup>1</sup>\* and Xiao-bo Sun<sup>1</sup>\*

<sup>1</sup> Beijing Key Laboratory of Innovative Drug Discovery of Traditional Chinese Medicine (Natural Medicine) and Translational Medicine, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China, <sup>2</sup> Life and Environmental Science Research Center, Harbin University of Commerce, Harbin, China

Drug targets and modes of action remain two of the biggest challenges in drug development. To address these problems, chemical proteomic approaches have been introduced to profile targets in complex proteomes. Activity-based protein profiling (ABPP) is one of a growing number of chemical proteomic approaches that use small-molecule chemical probes to understand the interaction mechanisms between compounds and targets. ABPP can be used to identify the protein targets of small molecules and even the active sites of target proteins. This review focuses on the overall workflow of the ABPP technology and on additional advanced strategies for target identification and/or drug discovery. Herein, we mainly describe the design strategies for small-molecule probes and discuss the ways in which these probes can be used to identify targets and even validate the interactions of small molecules with targets. In addition, we discuss some basic strategies that have been developed to date, such as click chemistry-ABPP and competitive strategies, and more recent advanced strategies, including isoTOP-ABPP, fluoPol-ABPP, and qNIRF-ABPP. The isoTOP-ABPP strategy has been coupled with quantitative proteomics to identify the active sites of proteins and explore whole proteomes with specific amino acid profiling. FluoPol-ABPP combined with HTS can be used to discover new compounds for some substrate-free enzymes. The qNIRF-ABPP strategy has a number of applications for in vivo imaging. In this review, we will further discuss the applications of these advanced strategies.

#### Keywords: ABPP, isoTOP-ABPP, fluoPol-ABPP, qNIRF-ABPP, drug targets

#### Edited by:

Adriano D. Andricopulo, University of São Paulo, Brazil

#### Reviewed by:

Kevin Coombs, University of Manitoba, Canada

Andreas Dominik, Technische Hochschule Mittelhessen – University of Applied Sciences, Germany

#### \*Correspondence:

Gui-bo Sun gbsun@implad.ac.cn

Xiao-bo Sun sun\_xiaobo163@163.com

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 26 December 2017 Accepted: 27 March 2018 Published: 09 April 2018

#### Citation:

Wang S, Tian Y, Wang M, Wang M, Sun G-b and Sun X-b (2018) Advanced Activity-Based Protein Profiling Application Strategies for Drug Development. Front. Pharmacol. 9:353. doi: 10.3389/fphar.2018.00353

### INTRODUCTION

Two major challenges in the field of drug discovery are drug development and target identification (Schenone et al., 2013). The identification of drug targets, which is important for elucidating the mode of action, is of great significance in the process of drug discovery. Two drug discovery strategies are currently used: phenotype-based drug discovery and target-based drug discovery (Samsdodd, 2005). Phenotype-based drug discovery refers to the screening of small molecules or polypeptides in cells, tissues, or organs based on existing pharmacology. Target-based drug discovery involves first determining the targets and then identifying active molecules. With the rapid development of molecular biology, the target-based drug discovery paradigm replaced the traditional phenotype-based approach, because it allowed an increased screening capacity and the definition of rational drug discovery programs. However, analysis of the process of target-based drug discovery showed that this screening platform did not effectively improve the productivity of the pharmaceutical industry, while the time and cost increased significantly (Samsdodd, 2005). Due to the complexity of biological systems, phenotype-based strategies can provide a more comprehensive evaluation of potential drugs and play an important role in drug development. In recent years, phenotype-based strategies have received increasing attention and have become the main method for drug discovery. These screening strategies are more efficient, effective and economical than other screening platforms.

Numerous technologies for identifying targets have recently been developed. Experimental approaches such as genomic and proteomic techniques are the primary tools for target identification. To complement experimental methods, a series of computational (in silico) tools have also been developed for target identification over the past two decades (Krysiak and Breinbauer, 2012; Yue et al., 2012). With the advancement of molecular biology and the advent of the post-genomic era, these technologies provide a solid technical basis for improving the efficiency of drug discovery; however, many barriers to the identification of drug targets remain to be overcome.

Activity-based protein profiling is a technology used to identify the binding of small-molecule probes to proteins and to confirm direct interaction. It combines activity-based probes and proteomics technologies to help us understand the mechanisms of compounds and their modes of action (Kozarich, 2003; Cravatt et al., 2008). ABPP-like experiments were first reported in the early 1970s to explore the mechanisms of penicillin (Blumberg and Strominger, 1972; Suginaka et al., 1972).

The term proteome, however, was first proposed only at a scientific conference in Italy in 1994 (Wilkins et al., 1996; Huber, 2003). The development of proteomics allows the use of ABPP in many areas, from studying enzyme classes, including proteases, kinases, phosphatases, glycosidases, and oxidoreductases, to studying uncharacterized enzymes. ABPP has contributed to our understanding of enzyme activity in specific physiological and pathological processes on a proteome-wide scale (Heal et al., 2011; Li et al., 2012). This review will discuss the aspects of the ABPP workflow in greater detail. Choosing an appropriate strategy is also very important before beginning ABPP-associated experiments. With the development of this field, an increasing number of advanced strategies have been applied in more areas, and we will discuss these strategies in a later section of this review.

### ABPP WORKFLOW

The activity-based protein profiling workflow (**Figure 1**) is discussed in this section, together with some important practical issues. Small-molecule probes are designed and synthesized before the ABPP process begins. The basic chemical structure of a small-molecule probe consists of three parts: (1) a reactive group, (2) a linker, and (3) a reporter group (Niphakis and Cravatt, 2014). In principle, the reactive group of the small molecule interacts directly with the target protein, while the reporter group facilitates target fishing. Commonly used reporter groups are fluorescent groups and biotin, as well as alkynes or azides, which can be modified by click chemistry methods to visualize protein targets. Depending on the selected reporter group, different subsequent experiments can be carried out. For example, fluorescent groups can be used for rapid gel screening and for localizing small molecules in cells or animals, and biotin can be used for protein enrichment followed by mass spectrometry detection to identify target proteins.

After the probe is obtained, its working concentration and reaction time are first rapidly determined by SDS-PAGE (Wright and Sieber, 2016). A typical workflow is as follows: (i) incubation of the probe with proteins, live cells, tissues, or animals to react with the target; (ii) for click chemistry (CC) probes, performing a copper-catalyzed azide-alkyne cycloaddition (CuAAC) to label the protein with a fluorescent group or another detectable label, followed by protein enrichment and pull-down assays; (iii) performing gel electrophoresis and fluorescence scanning, Western blotting (for detection of biotin), or quantitative proteomics to identify the target; and (iv) verifying the targets.

During the course of an ABPP project, many conditions must be carefully considered. First, the probe can be incubated in cell lysates or in tissue homogenates in vitro. In this case, the conditions of the lysate are very important because the protein function and folding state must be retained to allow the protein to bind specifically to the probe molecule; Tris buffer or PBS are usually suitable (Speers and Cravatt, 2009). In situ labeling of cells in culture or in vivo labeling of mice via i.p. injection of an ABPP probe can be used to avoid this problem, because under these conditions the probe interacts with the protein in its natural state. The caveat of the in situ method is that the probe-labeled protein may be metabolized. Some cytotoxic probes may also reduce the amount of protein recovered by killing the cells. However, these problems can be avoided by shortening the probing time. Second, the selection of reporters should be considered. Biotin labeling can be used for protein enrichment, target identification, and Western blot verification. However, it has been reported that endogenous biotinylated proteins can increase the background signal and cause interference. Fluorescence detection is faster and cleaner than blot-based biotin detection and is free of signals from endogenous biotinylated proteins (Charron et al., 2009). Alternative approaches are emerging, such as IAF (immunoaffinity fluorescent) labeling (Yu et al., 2010) or the direct click-on-resin approach, which avoid the use of biotin (Cassiano et al., 2014). Finally, it is very important to comprehensively validate the potential target, including direct identification by pull-down Western blots and recombinant-protein interaction assays with small molecules. The next step is to confirm the mode of action between the proteins and compounds and to uncover the mechanisms by using SPR, ITC, and FP (fluorescence polarization immunoassay). Several assays of biological function are needed to test the associated pharmacological effects of the compounds.

#### The Design of the Probe

A typical ABPP probe contains three groups: a reactive group, a linking group or binding group, and a reporter tag. For probe design, the first factor to consider is the reactivity of the compound. Most probes are based on bioactive small molecules. So far, many ABPP probes have used electrophilic reactive groups, including epoxides, Michael acceptors, disulfides, lactones, β-lactams, and quinones. These groups can react with serine, tyrosine, or glutamine to modulate enzyme activity (Bottcher and Sieber, 2008). However, many compounds interact with their targets non-covalently. For such unreactive NPs, a more intuitive and unbiased strategy for identifying binding partners is photoaffinity labeling (PAL). PAL makes use of photoreactive moieties that are inert under standard synthetic-chemical and biological conditions but can be activated by UV light, generating highly reactive, transient species. Benzophenones and aliphatic and aromatic diazirines are the most commonly used PAL groups.

In probe design, the choice of linking group can also be critical. The linking group connects the reactive group to the label group and reduces the impact of the label on the reactive group. The choice of linker is also significant for reducing non-specific binding. In its basic form, a linker is an extended alkyl or polyethylene glycol (PEG) spacer. More recently, the design of cleavable linkers for protein enrichment has received much attention, especially for the isoTOP-ABPP strategy; more details can be found in other reviews (Leriche et al., 2012; Rudolf et al., 2013).

Another critical consideration in probe design is the reporter group. The most widely used reporters are the biotin-streptavidin system for pull-down assays and fluorescent reporters for imaging-based detection. Because intrinsically biotinylated proteins exist, some non-specific background can interfere with target identification; fluorescent reporters can be used to avoid this problem. An increasing number of studies combine the two reporters to identify targets (Liao et al., 2017; Nasheri et al., 2013).

### Fishing the Targets

Fishing for targets with probes is a critical step, and different platforms have been developed for it. This section discusses two commonly used approaches: gel-based and gel-free platforms.

#### Gel-Based Platform for ABPP

To investigate the targets of ABPs, the typical method is to separate proteins by one-dimensional (1D) or two-dimensional (2D) polyacrylamide gel electrophoresis (PAGE) and to detect them by Coomassie brilliant blue or silver staining to obtain specific bands. The bands are then excised, and LC/MS is used for protein identification. This is the original method for target identification; however, it can introduce contaminating proteins, especially keratin, which makes data analysis more challenging. Non-specific labeling of proteins other than the actual targets, especially abundant and sticky proteins, has been a major problem in ABPP. To address this limitation, Seung Park's group developed a method called fluorescence difference in two-dimensional gel electrophoresis (FITGE) and used it to identify the target of the anti-neuroinflammatory agent inflachromene (ICM) (Park et al., 2012; Lee et al., 2014). The platform labels two or more samples, such as control and treatment groups, with different fluorescent labels and then runs them together on the same two-dimensional gel. If a spot carries both fluorescent labels, the labeling is considered non-specific; only spots with signal exclusively in the treatment group are identified by LC/MS. High-resolution gel electrophoresis can exclude some non-specific targets; however, 2D-PAGE requires a large amount of protein, which can be difficult to obtain for precious samples, especially human disease samples.
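The FITGE decision rule described above can be sketched in a few lines of Python. This is only an illustration of the two-channel logic, not the published analysis pipeline: the `Spot` type, the spot identifiers, the intensity values, and the detection threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    spot_id: str
    control: float    # fluorescence intensity in the control channel
    treatment: float  # fluorescence intensity in the probe-treated channel


def classify_spot(spot: Spot, threshold: float = 1000.0) -> str:
    """Classify a 2D-gel spot from its two-channel intensities.

    `threshold` is a hypothetical detection cutoff, not a published value.
    """
    in_control = spot.control >= threshold
    in_treatment = spot.treatment >= threshold
    if in_treatment and not in_control:
        return "candidate"      # treatment-only signal: excise for LC/MS
    if in_treatment and in_control:
        return "non-specific"   # labeled in both channels
    return "background"         # no treatment signal above the cutoff


# Hypothetical spots from one gel
spots = [
    Spot("s1", control=150.0, treatment=5200.0),
    Spot("s2", control=4800.0, treatment=5100.0),
    Spot("s3", control=90.0, treatment=120.0),
]
calls = {s.spot_id: classify_spot(s) for s in spots}
```

In practice the cutoff would be set from the gel background rather than fixed in advance.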

#### Gel-Free Approaches

Given the promiscuity of many small molecules and the complexity of the cellular proteome, a high-throughput and high-accuracy method is necessary. With advances in mass spectrometry, ABPs coupled with quantitative chemical proteomics have been used to identify drug targets, providing a high-throughput platform while improving the accuracy of target-protein identification. Several quantitative chemical proteomic approaches have been developed, including metabolic labeling (SILAC), chemical labeling (iTRAQ), and the label-free approach (Chen et al., 2017).

SILAC (stable isotope labeling by amino acids in cell culture) is a stable-isotope-based method in which the label is incorporated metabolically. iTRAQ (isobaric tags for relative and absolute quantitation) uses chemical tagging to label different sample populations. Both approaches rely on tags for quantification and identification. These tags result in mass differences that can be detected via MS and enable quantitation and comparison between multiple samples. Several groups have applied ABPP-SILAC and ABPP-iTRAQ to specific examples. In 2014, Cravatt's group used ABPP-SILAC to study the protein targets of the kinase inhibitor class of drugs, which includes the Bruton's tyrosine kinase (BTK) inhibitor ibrutinib. A total of 29 probe targets were identified, including epidermal growth factor receptor and BTK (Lanning et al., 2014). Lin's group used ABPP-iTRAQ to identify the targets and mechanism of action of curcumin, a natural product with anti-inflammatory and anti-cancer properties. In total, 197 proteins were confidently identified from the HCT116 colon cancer cell line as binding targets of curcumin. Ingenuity pathway analysis (IPA) suggested that curcumin may exert its anticancer effects on multiple critical biological pathways, including the EIF2, eIF4/p70S6K, and mTOR signaling and mitochondrial dysfunction pathways (Wang et al., 2016). In iTRAQ-based mass spectrometry, proteins are digested into peptides and labeled at the final step of the workflow; therefore, an operational error at this stage cannot be undone. The ability of the ABPP-SILAC approach to identify a wide range of targets in an unbiased manner has been demonstrated, especially for non-kinase off-target proteins. However, SILAC is limited by labeling efficiency: it requires metabolic labeling of cells, which often need to grow for at least 3 generations to reach high labeling efficiency, and it is therefore not suitable for some primary cells and tissues.
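As a minimal sketch of how tag-based quantitation turns paired MS signals into a relative abundance, the snippet below summarizes one protein by the median log2 heavy/light ratio of its peptides. The function name, the use of the median as the summary statistic, the assignment of "heavy" to the probe-treated sample, and the intensity values are assumptions made for illustration; real pipelines apply additional filtering and normalization.

```python
import math
from statistics import median


def protein_log2_ratio(peptide_pairs):
    """Median log2(heavy/light) over a protein's peptide intensity pairs.

    Pairs with a missing (zero) intensity in either channel are skipped.
    """
    ratios = [
        math.log2(heavy / light)
        for heavy, light in peptide_pairs
        if heavy > 0 and light > 0
    ]
    if not ratios:
        raise ValueError("no quantifiable peptide pairs")
    return median(ratios)


# Hypothetical (heavy, light) intensities for three peptides of one protein;
# here "heavy" is assumed to be the probe-treated sample.
pairs = [(8.0e6, 1.0e6), (4.0e6, 1.0e6), (1.6e7, 2.0e6)]
log2_ratio = protein_log2_ratio(pairs)  # peptide ratios are 8, 4 and 8
```

A positive value indicates enrichment in the heavy-labeled sample under this convention.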

The label-free approach is another quantitative proteomic approach, which is generally cost-efficient and more widely applicable than SILAC and iTRAQ. However, the label-free strategy requires very high reproducibility to allow run-to-run comparisons. Artemisinin is the most potent of the antimalarial drugs; however, its mechanism of action is not completely understood. Lin's group used an unbiased chemical proteomic analysis to directly explore this mechanism in Plasmodium falciparum. This group designed and synthesized an alkyne-tagged artemisinin probe and combined click chemistry with the label-free method to identify 124 covalently binding protein targets of artemisinin, many of which are involved in essential biological processes of the parasite (Wang et al., 2015).
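One common way to check the run-to-run reproducibility that label-free quantitation depends on is the coefficient of variation (CV) of each protein across replicate runs. The sketch below assumes hypothetical protein names, intensities, and a 20% cutoff chosen only for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Percent CV (sample standard deviation / mean) across replicate runs."""
    if len(intensities) < 2:
        raise ValueError("need at least two replicate runs")
    return 100.0 * stdev(intensities) / mean(intensities)


# Hypothetical intensities of two proteins across three replicate LC-MS runs
replicates = {
    "protein_A": [100.0, 110.0, 90.0],
    "protein_B": [100.0, 300.0, 20.0],
}
MAX_CV = 20.0  # illustrative cutoff, not a recommended value
cv_by_protein = {name: coefficient_of_variation(v) for name, v in replicates.items()}
reproducible = {name: cv <= MAX_CV for name, cv in cv_by_protein.items()}
```

Proteins that fail such a check would normally be excluded from quantitative comparison or re-measured.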

After the ABPP workflow is finished, the next important issue is target validation. Once potential targets have been identified by ABPP, validating them and verifying their modes of action remains challenging. Many approaches can be taken to assay the interactions between small molecules and targets; some commonly used approaches are as follows: (1) if an antibody is available or can be produced, the protein of interest may be enriched and then verified by Western blotting; (2) recombinant proteins can be used to perform the ABPP workflow and verify the interaction; (3) biophysical methods, such as ITC (isothermal titration calorimetry), FPIA (fluorescence polarization immunoassay), SPR (surface plasmon resonance), and CETSA (cellular thermal shift assay), should be used (Molina et al., 2013); (4) structural biology can also provide supportive evidence; (5) binding sites can be identified by LC-MS to further validate the direct site of interaction between proteins and small molecules, and if a modified amino acid, such as Cys or Ser, can be identified, site-directed mutagenesis can be applied to confirm it; and (6) because elucidating the mode of action of small molecules can be very challenging, it is necessary to apply many different biological and chemical tools, such as genetic methods and imaging technologies.

### ABPP STRATEGIES

In recent years, ABPP technology has developed rapidly. To enhance the specificity and accuracy of this technology, basic strategies such as CC-ABPP (click chemistry-ABPP) and competitive ABPP have been used in most studies. To expand the application of ABPP, more advanced strategies have been developed, such as isoTOP-ABPP, fluoPol-ABPP, and qNIRF-ABPP. These advanced strategies have different characteristics and are used in many areas, from active-site identification to the discovery of new compounds and live imaging. The isoTOP-ABPP strategy can be used to directly identify active sites of target proteins; fluoPol-ABPP is used to discover new small molecules that act on specific enzymes; and qNIRF-ABPP makes it possible to image the distribution of compounds and promotes the development of preclinical diagnosis. Each strategy is discussed in greater detail below.

## Basic Strategies

#### CC-ABPP (Click Chemistry-ABPP)

With the development of click chemistry, this method has been introduced into ABPP. It overcomes the limitations of bulky reporter groups and enhances the cell permeability of the probes. By adding a small alkyne or azide group to the probe, a single probe can be diversified with a variety of reporter groups without the need to develop new synthetic routes. The most widely used click chemistry reaction is the copper(I)-catalyzed azide-alkyne cycloaddition (CuAAC) between an azide and a terminal alkyne to generate a 1,4-disubstituted 1,2,3-triazole (Presolski et al., 2011; Martell and Weerapana, 2014). Concerns about the use of a cytotoxic copper species to catalyze the reaction prompted the development of a copper-free variant, which uses a strained alkyne to accelerate the reaction (Chang et al., 2010).

To date, the use of CuAAC in living systems has been hindered by the toxicity of copper(I). Considerable cell death occurs when optimized CuAAC conditions that require 1 mM copper(I) are employed. Thus, as presently formulated, CuAAC is of limited use for labeling biomolecules in living systems. Cyclooctyne, the smallest stable cycloalkyne, reacted "like an explosion" when combined with phenyl azide and enabled the detection of azides in living systems through strain-promoted [3+2] cycloaddition (Agard et al., 2004). Moreover, with the aim of improving the kinetics of the process, a series of compounds bearing electron-withdrawing fluorine atoms at the propargylic positions were investigated.

#### Competitive-ABPP


Non-specific binding is one of the main limitations of ABPP strategies. Photoreactive or electrophilic probes will in all probability label proteins non-specifically to some extent (i.e., label proteins that are not targets of the parent compound), particularly at higher probe concentrations (Wright and Sieber, 2016). To overcome this problem, the competitive strategy is receiving increasing attention. In competitive ABPP (Leung et al., 2003), a proteome is pre-incubated with the parent compound and subsequently with the activity-based probe; the parent compound competes for the common binding site and thus decreases binding of the probe to the target proteins. The parent compound is the prototype compound from which the probe was derived. For example, Liao and colleagues used SA to compete with the SA probe and decrease its binding to IMPDH2, which demonstrated that the two interact with the same target (Liao et al., 2017). In this way, non-specific binding can be excluded, and only proteins whose labeling is competed by the parent compound are analyzed. Several reviews have discussed the applications, advantages, and disadvantages of this strategy (Willems et al., 2014; Wright and Sieber, 2016). The competitive format has also been incorporated into advanced strategies such as isoTOP-ABPP, fluoPol-ABPP, and qNIRF-ABPP, so its application is discussed together with these strategies in the next section.
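The competition readout can be expressed as a simple ratio between the probe signal without and with parent-compound pre-incubation. The sketch below is illustrative only: the protein names, signal values, and the ratio cutoff of 2 are hypothetical, and the signal could equally be a gel-band intensity or an MS-derived abundance.

```python
def competition_ratio(probe_only: float, with_competitor: float) -> float:
    """Probe signal without competitor divided by signal after pre-incubation."""
    if probe_only < 0 or with_competitor < 0:
        raise ValueError("signals must be non-negative")
    if with_competitor == 0:
        return float("inf")  # labeling fully blocked by the parent compound
    return probe_only / with_competitor


def competed_proteins(signals, min_ratio: float = 2.0):
    """Names of proteins whose probe labeling is reduced by the parent compound.

    `min_ratio` is an illustrative cutoff, not a published threshold.
    """
    return sorted(
        name
        for name, (probe_only, with_competitor) in signals.items()
        if competition_ratio(probe_only, with_competitor) >= min_ratio
    )


# Hypothetical signals: (probe only, parent compound then probe)
signals = {
    "protein_X": (9000.0, 1500.0),  # strongly competed: likely specific
    "protein_Y": (8000.0, 7600.0),  # unchanged: likely non-specific labeling
    "protein_Z": (500.0, 0.0),      # labeling abolished by the competitor
}
specific = competed_proteins(signals)
```

Proteins whose signal is unchanged by the competitor are the ones treated as non-specific background.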

## Advanced Strategies

#### isoTOP-ABPP

To identify the specific reactive amino acid sites of target proteins by using small molecules, Cravatt and co-workers developed a strategy called isoTOP-ABPP (isotopic tandem orthogonal proteolysis–ABPP) (Weerapana et al., 2010). This method uses isotope-labeled probes to achieve more reliable results than other quantitative protein profiling methods. The platform can simultaneously identify probe-labeled proteins and the exact sites of probe modification. Cysteine is the most intrinsically nucleophilic amino acid in proteins, and protein activity is regulated by the modification of cysteine by endogenous and exogenous electrophiles. Iodoacetamide is a reagent classically used to react with cysteine and is common in proteomics; the Cravatt group therefore used iodoacetamide to design a probe (Backus et al., 2016). The IA probe has an alkyne handle for "click chemistry" conjugation of probe-labeled proteins to isotopically labeled cleavable tags for quantitative mass spectrometry. Using this probe, researchers can quantitatively profile the intrinsic reactivity of cysteine residues in native biological systems. Recently, Weerapana and colleagues improved this IA probe by developing a pair of isotopically labeled iodoacetamide-alkyne probes, namely, IA-light and IA-heavy. These probes can be used for quantitative analysis of proteome samples and are easy to synthesize, especially compared to the isotopically tagged cleavable linkers (Abo et al., 2017). The iodoacetamide (IA)-based chemical probe has been used to concurrently quantify reactivity changes in hundreds of cysteines within cell lysates. However, the cytotoxicity of the IA group precludes efficient live-cell labeling, which is important for preserving transient cysteine modifications. To overcome this limitation, Weerapana and colleagues developed a caged bromomethyl ketone (BK) electrophile, which shows minimal cytotoxicity and provides spatial and temporal control of electrophile activation through irradiation. Using this probe, these researchers were the first to describe reactivity changes associated with diverse cysteine modifications in living cells (Abo and Weerapana, 2015).

A competitive isoTOP-ABPP platform expands the application of this strategy to functional cysteines in proteomes. This platform has been used to identify the protein targets of HNE, 15d-PGJ2, and 2-HD and to elucidate the cellular functions and mechanisms of action of these compounds (Wang et al., 2014). Fragment-based covalent ligand discovery coupled with competitive isoTOP-ABPP can rapidly lead to the discovery of lead small molecules and the identification of druggable sites. Using this platform, the Nomura group discovered several anti-cancer fragments and revealed their mechanisms of action (Anderson et al., 2017; Bateman et al., 2017; Roberts et al., 2017). For example, from a fragment-based cysteine-reactive ligand library, this group identified DKM 2-93, a compound that impairs pancreatic cancer cell survival and in vivo tumor growth, and identified UBA5 as its target: the compound covalently modifies the catalytic cysteine of UBA5, thereby inhibiting its activity as an activator of the ubiquitin-like protein UFM1 to UFMylate proteins (Roberts et al., 2017).

Recent studies have shown that reactive scaffolds targeting other amino acids, such as serine (Bachovchin and Cravatt, 2012) and lysine (Anderson et al., 2017; Hacker et al., 2017), can also be explored with these platforms to discover unique and novel druggable sites in proteins. Anderson and coworkers developed a screening platform for lysine-reactive fragments, which are dichlorotriazine-based covalent ligands, and screened this library to reveal small molecules that impair 231MFP cancer cell survival. Using this platform, they identified KEA1-97 and its specific targets in 231MFP proteomes and showed that this compound targets lysine 72 of thioredoxin, which disrupts the interaction of thioredoxin with caspase 3, activates caspases, and induces apoptosis.

#### FluoPol-ABPP

Target-based high-throughput screening (HTS) is essential for the discovery of small-molecule modulators of proteins. Typical screening methods rely on extensively tailored substrate assays for enzyme inhibitors or on screens that profile cellular phenotypes. However, for enzymes whose biochemical activity is not well characterized, such assays are not available. Competitive ABPP studies use SDS-PAGE as the readout, which limits their applicability in HTS. Therefore, Cravatt and colleagues developed a high-throughput competitive screening platform, the fluopol-ABPP HTS assay, which can be used to select specific enzyme inhibitors, especially for enzymes with poorly characterized substrates or biological functions. The platform also combines high-throughput screening with identification of modes of action (Bachovchin et al., 2009). This strategy combines fluorophore-tagged probes with competitive inhibition. When the fluorescent probe reacts with the target protein, the fluorescence polarization signal is strong and consistent; in the presence of a competitor, the probe remains free and the signal decreases. These results can be measured easily and rapidly; therefore, the assay is suitable for HTS. Fluopol-ABPP is a substrate-free approach that is ideally suited for studying enzymes for which no substrates are known.
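The fluorescence polarization readout behind this assay can be illustrated with a short calculation. The sketch below uses the standard definition of polarization from the parallel and perpendicular emission intensities and normalizes a test well between a fully labeled control and a free-probe control. The function names, the instrument G-factor default of 1, and all intensity values are assumptions for illustration and are not taken from the cited study.

```python
def millipolarization(i_parallel: float, i_perpendicular: float,
                      g_factor: float = 1.0) -> float:
    """Fluorescence polarization in mP units.

    P = (I_par - G * I_perp) / (I_par + G * I_perp), reported as 1000 * P.
    `g_factor` is an instrument correction, assumed to be 1 here.
    """
    corrected = g_factor * i_perpendicular
    return 1000.0 * (i_parallel - corrected) / (i_parallel + corrected)


def percent_inhibition(sample_mp: float, bound_mp: float, free_mp: float) -> float:
    """Loss of polarization relative to the window between the two controls."""
    return 100.0 * (bound_mp - sample_mp) / (bound_mp - free_mp)


# Hypothetical intensities (arbitrary units)
bound_mp = millipolarization(1500.0, 1000.0)   # probe + enzyme, no inhibitor
free_mp = millipolarization(1050.0, 950.0)     # probe alone (no enzyme)
sample_mp = millipolarization(1125.0, 875.0)   # probe + enzyme + test compound
inhibition = percent_inhibition(sample_mp, bound_mp, free_mp)
```

A compound that blocks probe labeling moves the well toward the free-probe value, so higher percent inhibition corresponds to lower polarization.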

Using this platform, specific inhibitors of RBBP9, an enzyme with no known substrates, and of the mechanistically distinct enzyme GSTO1 were identified from a library of small molecules (Bachovchin et al., 2009). Bachovchin et al. (2009) used the serine hydrolase-directed activity-based probe fluorophosphonate (FP)-rhodamine as the readout probe to select specific inhibitors of purified RBBP9 from a library of 18,974 small molecules. From this screen, they identified 35 primary hits, and 20 compounds were confirmed via secondary gel-based screens. Finally, they identified emetine as a reversible RBBP9 inhibitor. The FP-rhodamine probe has also been used to explore other serine hydrolases, such as prolyl endopeptidase-like (PREPL) (Lone et al., 2011), phosphatase methylesterase-1 (PME-1) (Bachovchin et al., 2011a,b), and retinoblastoma-binding protein 9 (RBBP9) (Bachovchin et al., 2010).
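The screening funnel reported above can be put in proportion with a short calculation; the three counts are taken from the text, and only the variable names are introduced here.

```python
# Counts reported for the RBBP9 screen (Bachovchin et al., 2009)
library_size = 18_974
primary_hits = 35
confirmed_hits = 20

# Percent of the library scored as primary hits
primary_hit_rate = 100.0 * primary_hits / library_size
# Percent of primary hits confirmed in the secondary gel-based screen
confirmation_rate = 100.0 * confirmed_hits / primary_hits

print(f"primary hit rate: {primary_hit_rate:.2f}%")
print(f"confirmation rate: {confirmation_rate:.1f}%")
```

That is, roughly 0.18% of the library gave a primary hit, and a little over half of those hits were confirmed.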

Other enzyme-specific probes have also been used with the HTS-fluoPol-ABPP strategy. Bryan and his colleagues used a PAD-specific probe, rhodamine-conjugated F-amidine (RFA), to develop an HTS assay. Using these assay conditions, they screened 2,000 compounds (5 µM final concentration) from an NIH validation set at The Scripps Research Institute in La Jolla, CA, United States (Pubchem AID 463073). They identified streptonigrin as an irreversible PAD4 inactivator (Knuckley et al., 2010). Tsuboi and his colleagues also combined their specific probe, a rhodamine-conjugated phenyl sulfonate ester (SE-Rh), with GSTO1 to identify GSTO1 inhibitors from a 300K+ compound library, and they confirmed an agent, KT53, that inactivates GSTO1 with excellent in vitro (IC50 = 21 nM) and in situ (IC50 = 35 nM) potency (Tsuboi et al., 2011).

#### qNIRF-ABPP

qNIRF-ABPP stands for quenched near-infrared fluorescent ABPP. Imaging agents that enable direct visualization and quantification in vivo have great potential value for monitoring chemotherapeutic responses and for early diagnosis and disease monitoring (Edgington et al., 2009; Garland et al., 2016). Fluorescent tags are heavily used in ABPP; however, their main limitation is that they fluoresce both when bound to enzyme targets and when free in solution. To overcome this limitation, Matthew Bogyo's group engineered probes with a highly efficient quenching group that suppresses the fluorophore and makes the probe intrinsically "dark"; such a probe emits a fluorescent signal only after covalently modifying a specific protease target, which results in loss of the quenching group (Blum et al., 2005). They synthesized the quenched probe GB117, in which the large but potentially cell-permeable quenching group QSY7 is attached through a linker to improve the stability and potency of the probe. In fluorescence imaging studies, GB117 accumulated mainly in lysosomes. GB117 probes are considered tools for cell-based imaging of cysteine cathepsin activity. However, the application of these probes for imaging in animals is limited. Therefore, these researchers combined their method with non-invasive imaging technology and generated a series of near-infrared fluorescent activity-based probes (NIRF-ABPs), which are better suited for in vivo imaging and target identification (Blum et al., 2007). These NIRF-ABPs contain Cy5 (646/664 nm excitation/emission), which is better suited for in vivo imaging owing to lower background fluorescence, and are insensitive to serum. The researchers synthesized the quenched probe GB137 and the unquenched probe GB123, based on GB117 and GB111, for in vivo imaging studies. An in vivo analysis of the quenched and unquenched probes was conducted to quantify the overall signal-to-background ratio for each probe in multiple animals; the results indicated that GB123 and GB137 generated similar overall signal-to-background ratios. However, differences remain; for example, the quenched probe reached its maximum signal much more rapidly than the unquenched probe. Cathepsin protease activity is highly elevated in macrophages of vulnerable plaques and contributes to plaque instability. The researchers also explored the distribution of cathepsin in an atherosclerosis mouse model by using GB137 and GB123 (Abd-Elrahman et al., 2016). They compared the two probes by in vivo imaging and found that both showed distinct signals in the macrophage-rich ligated carotids; however, GB123 was also detected in the lymph nodes, aortic arch, and heart and exhibited slower signal accumulation than GB137. These cathepsin ABPs represent a rapid diagnostic tool for macrophage detection in atherosclerotic plaque. An improved quenched fluorescent probe containing a phenoxymethyl ketone (PMK) electrophile, with greater reactivity and broader selectivity than previously reported AOMK-based probes, has been synthesized by Matthew Bogyo's group (Verdoes et al., 2013).
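The signal-to-background comparison across multiple animals mentioned above amounts to a per-animal ratio followed by an average. The sketch below is a generic illustration under assumed inputs: the region-of-interest values are hypothetical and do not reproduce data from the cited imaging studies.

```python
from statistics import mean


def signal_to_background(target_signal: float, background_signal: float) -> float:
    """Ratio of fluorescence in the target region to an adjacent background region."""
    if background_signal <= 0:
        raise ValueError("background signal must be positive")
    return target_signal / background_signal


# Hypothetical (target ROI, background ROI) fluorescence for three animals
animals = [(900.0, 300.0), (1200.0, 300.0), (800.0, 200.0)]
ratios = [signal_to_background(t, b) for t, b in animals]
mean_ratio = mean(ratios)
```

Computing the ratio per animal before averaging keeps one animal with high absolute fluorescence from dominating the comparison between probes.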

### DISCUSSION

Drugs that form covalent attachments with their targets have traditionally been considered conceptually distinct from conventional non-covalent drugs because their potential off-target reactivity could lead to undesirable side effects. However, covalent drugs have attracted considerable attention in the field of drug development (Singh et al., 2011; Bauer, 2015; Pichler et al., 2016). ABPP, a very powerful technique for target identification, has generated interest in covalent drugs and allows a more thorough investigation of the modes of action of individual drugs. ABPP relies on small molecules bearing a reactive group that binds to and covalently modifies the active site of a certain enzyme class. So far, many ABPP probes have used electrophiles, including fluorophosphonates, sulfonates, and epoxides, which exhibit preferences for nucleophilic groups in the active sites of several distinct enzyme classes (Bottcher and Sieber, 2008).

ABPP is now regarded as a powerful approach for exploring drug targets, and with the advanced strategies its application has expanded from drug target identification to drug discovery. However, some limitations remain. The main issue in this field is that probes label non-specific proteins. The competitive ABPP strategy is commonly used to address this problem by comparison with a control. With quantitative proteomics, the quantitative data can be used to filter out these background signals, and in general proteins are identified as hits by their enrichment in the probe-treated sample over control groups. The other issue, which is more difficult to deal with, concerns the probe itself, namely probe-specific hits. Comparing enrichment in the presence and absence of a competitor (typically the parent NP) is one widely used approach to test whether a protein is a probe-specific hit. Further work in this area may provide resources to help researchers assess whether putative targets are genuine or related to the probe moiety itself. For this reason, follow-up validation of putative targets is very important.

isoTOP-ABPP enables quantitative analysis of native amino acid reactivity and records changes in enzyme activity directly in native biological systems. It provides information about the post-translational modification of proteins and overcomes a deficiency of conventional proteomic or genomic methods, which mainly focus on expression level. In particular, fragment-based ligand screening with the competitive isoTOP-ABPP platform couples the identification of covalent ligands with the discovery of druggable hotspots. First, a reactivity-based chemical probe is needed to map reactive, functional, and ligandable hotspots in complex proteomes, such as the iodoacetamide (IA) probe for cysteine residues (Weerapana et al., 2010), the fluorophosphonate (FP) probe for serine (Liu et al., 1999), or the sulfotetrafluorophenyl (STP) probe for lysine (Hacker et al., 2017). An isotopically labeled valine for quantitative mass spectrometry (MS) measurements of labeled peptides across multiple proteomes is also important. Probe labeling efficiency also needs to be considered; for example, FP probes can react with >80% of mammalian metabolic serine hydrolases (Bachovchin and Cravatt, 2012).

FluoPol-ABPP is a broadly applicable HTS platform for inhibitor discovery in which the ability of compounds to block fluorescent activity-based probe labeling of proteins is monitored by fluorescence polarization; it can be readily adapted for use with different classes of enzymes and ABPP probes. However, some important issues must be considered. A cognate activity-based probe must have been developed before the platform can be applied. In addition, fluoPol-ABPP requires a substantial amount of purified protein, which may prove challenging for certain enzymes (e.g., transmembrane enzymes). Regardless, in cases where protein quantity is not limiting, fluoPol-ABPP is quite cheap, since the quantity of probe used per assay is negligible. The availability of a small-molecule library is another issue. This platform makes ABPP technology useful not only for mechanism identification but also for compound discovery and will help us understand more about some poorly characterized enzymes and their inhibitors or activators.

It is important to visualize diseased cells to enable diagnosis, facilitate surgical resection, and monitor therapeutic response. Therefore, there is great opportunity to develop non-invasive imaging technologies for interventional surgical imaging and for diagnostic and therapeutic applications. The qNIRF-ABPP strategy provides a method for in vivo imaging. qNIRF-ABPs are potentially valuable novel imaging agents for disease diagnosis and are powerful tools for preclinical and clinical testing of small-molecule therapeutic agents in vivo, for the identification of specific therapeutic targets and biomarkers, and for monitoring the efficacy of small-molecule inhibitors (Joyce et al., 2004; Rosenthal et al., 2015; Garland et al., 2016).

### CONCLUSION

Activity-based protein profiling can provide an unbiased, global, and quantitative analysis of protein binding partners. It has been used with different samples, including cell lysates, live cells, animal lysates, and even live animals. All these applications help us understand the interactions between compounds and organisms. With the advanced strategies, ABPP has expanded from drug target identification to drug discovery, and its applications now range from target-based high-throughput screening to in vivo imaging. The isoTOP-ABPP strategy provides a global analysis of cysteine, serine, and lysine reactivity, even in living cells, which is important for preserving transient amino acid modifications. The fluopol-ABPP HTS assay overcomes the disadvantages of traditional screening methods that rely on substrate assays and cellular phenotypes, and it can be used to explore inhibitors or activators of poorly characterized enzymes. qNIRF-ABPP provides a method for in vivo imaging and is helpful for diagnosis, surgical resection, and monitoring of therapeutic response. The wide applicability of these methods will increase the chances of success in novel drug development and will stimulate further technical innovation in the ABPP field. Finally, with advances in technology and continuous improvement, chemical proteomic technology will remain at the forefront of drug discovery and target recognition.

### AUTHOR CONTRIBUTIONS

SW wrote and edited this paper. YT and MW (third author) gave some convincing advice. MW (fourth author) checked and edited this paper. G-bS and X-bS designed this review.

## FUNDING

This work was supported by the National Science and Technology Major Project (Grant No. 2015ZX09501004-001-003), the Special Research Project for TCM (Grant No. 201507004), the CAMS Innovation Fund for Medical Science (CIFMS) (Grant No. 2016-I2M-1-012), and Peking Union Medical College Graduate Student Innovation Fund (Grant No. 2016-1007-06).

### REFERENCES



hepatitis c virus replication. Chem. Biol. 20, 570–582. doi: 10.1016/j.chembiol.2013.03.014


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Wang, Tian, Wang, Wang, Sun and Sun. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Empirical Scoring Functions for Structure-Based Virtual Screening: Applications, Critical Aspects, and Challenges

#### Isabella A. Guedes, Felipe S. S. Pereira and Laurent E. Dardenne\*

Grupo de Modelagem Molecular em Sistemas Biológicos, Laboratório Nacional de Computação Científica, Petrópolis, Brazil

Structure-based virtual screening (VS) is a widely used approach that employs the knowledge of the three-dimensional structure of the target of interest in the design of new lead compounds from large-scale molecular docking experiments. Through the prediction of the binding mode and affinity of a small molecule within the binding site of the target of interest, it is possible to understand important properties related to the binding process. Empirical scoring functions are widely used for pose and affinity prediction. Although pose prediction is performed with satisfactory accuracy, the correct prediction of binding affinity is still a challenging task and crucial for the success of structure-based VS experiments. There are several efforts in distinct fronts to develop even more sophisticated and accurate models for filtering and ranking large libraries of compounds. This paper will cover some recent successful applications and methodological advances, including strategies to explore the ligand entropy and solvent effects, training with sophisticated machine-learning techniques, and the use of quantum mechanics. Particular emphasis will be given to the discussion of critical aspects and further directions for the development of more accurate empirical scoring functions.

Keywords: structure-based drug design, molecular docking, virtual screening, scoring function, binding affinity prediction, machine learning

#### Edited by:

Adriano D. Andricopulo, Universidade de São Paulo, Brazil

#### Reviewed by:

Antti Tapani Poso, University of Eastern Finland, Finland

Giovanni Grazioso, Università degli Studi di Milano, Italy

> \*Correspondence: Laurent E. Dardenne dardenne@lncc.br

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 01 July 2018 Accepted: 07 September 2018 Published: 24 September 2018

#### Citation:

Guedes IA, Pereira FSS and Dardenne LE (2018) Empirical Scoring Functions for Structure-Based Virtual Screening: Applications, Critical Aspects, and Challenges. Front. Pharmacol. 9:1089. doi: 10.3389/fphar.2018.01089

### INTRODUCTION

The drug discovery process required to bring a new compound to the market as an innovative therapeutic entity is expensive and time-consuming (Mullard, 2014; DiMasi et al., 2016; Mignani et al., 2016). In this context, research groups and the pharmaceutical industry have extensively included computer-aided drug design (CADD) approaches in their drug discovery pipelines to increase the potential of finding newer and safer drug candidates (Ban et al., 2017; Barril, 2017; Usha et al., 2017). Structure-based drug design (SBDD) methods, which require the three-dimensional structure of the macromolecular target, have been widely employed in successful campaigns (Bortolato et al., 2012; Danishuddin and Khan, 2015; Rognan, 2017). Although important challenges and some limitations have been identified, many efforts have been made to improve existing methods and to develop innovative approaches. Molecular docking is one of the most used SBDD approaches, with several reviews published recently (Guedes et al., 2014; Ferreira et al., 2015; Yuriev et al., 2015; Pagadala et al., 2017; Dos Santos et al., 2018), and has been continuously explored by the scientific community to develop more sophisticated and accurate strategies. Docking aims to predict the binding modes and affinity of a small molecule within the binding site of the receptor target of interest, supporting the researcher in understanding the main physicochemical features related to the binding process. Docking-based virtual screening (VS) consists of large-scale docking, with a growing number of success cases reported (Villoutreix et al., 2009; Matter and Sotriffer, 2011; Rognan, 2017). Examples of docking programs are AutoDock Vina (Trott and Olson, 2010), UCSF DOCK (Allen et al., 2015), GOLD (Jones et al., 1997), and Glide (Friesner et al., 2004, 2006a). Beyond standalone software, web servers such as the DockThor Portal<sup>1</sup> (de Magalhães et al., 2014), MTiOpenScreen<sup>2</sup> (Labbé et al., 2015), HADDOCK<sup>3</sup> (van Zundert et al., 2016), and DOCK Blaster<sup>4</sup> (Irwin et al., 2009) offer the scientific community user-friendly interfaces and satisfactory response times for docking results.

The fast evaluation of docking poses generated by the search method and the accurate prediction of the binding affinity of top-ranked poses are essential in VS protocols. In this context, scoring functions emerge as a straightforward and fast strategy despite their limited accuracy, remaining the main alternative for VS experiments (Huang et al., 2010). Moreover, the development of more accurate scoring functions is strategic in the field of SBDD and remains a challenging task, especially in hit-to-lead optimization (Enyedy and Egan, 2008) and de novo design (Liu et al., 2017). Although there is no universal scoring function with significant reliability for all molecular systems, some important strategies have been explored. Examples of free online resources for predicting protein–ligand binding affinities without depending on a docking program are the BAPPL server<sup>5</sup> (Jain and Jayaram, 2005), CSM-lig<sup>6</sup> (Pires and Ascher, 2016), and KDEEP<sup>7</sup> (Jiménez Luna et al., 2018).

The development of an empirical scoring function requires three components (Pason and Sotriffer, 2016): (i) descriptors that describe the binding event, (ii) a dataset composed of three-dimensional structures of diverse protein–ligand complexes associated with the corresponding experimental affinity data, and (iii) a regression or classification algorithm to calibrate the model, establishing a relationship between the descriptors and the experimental affinity. Empirical models differ in the number and type of descriptors; the algorithm adopted for training the model; and the number, diversity, and quality of the protein–ligand complexes used during the parameterization process.

According to the algorithm used for training, the scoring function can be linear (i.e., a sum of weighted terms) or nonlinear (i.e., a nonlinear relationship between the descriptors). It is important to highlight that even the multiple linear regression (MLR) algorithm, frequently used to calibrate linear scoring functions, is a machine-learning technique. However, the term "machine-learning-based" scoring function is usually used in the literature to refer to complex, nonlinear models developed using sophisticated machine-learning techniques to approximate nonlinear problems, such as random forests (RF), support-vector machines (SVM), and deep learning (DL) methods. Linear scoring functions are also referred to as "classical" scoring functions. However, we will not adopt the "classical" nomenclature, to avoid confusion with scoring functions based on classical force fields. In this work, we will adopt the nomenclature "linear" for the MLR scoring functions and "nonlinear" for models trained with more complex machine-learning techniques.
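To make the "sum of weighted terms" idea concrete, the following is a minimal sketch of how a linear scoring function could be calibrated by MLR. The descriptor columns, the synthetic affinities, and the function names are illustrative assumptions; they do not correspond to any published scoring function.

```python
import numpy as np

# Hypothetical descriptor matrix: one row per protein-ligand complex, one
# column per descriptor (e.g., vdW term, H-bond term, rotor penalty).
# All numbers are synthetic and serve only to illustrate the fitting step.
X = np.array([
    [-5.0, 2.0, 3.0],
    [-7.5, 4.0, 1.0],
    [-3.0, 1.0, 6.0],
    [-9.0, 5.0, 2.0],
    [-6.0, 3.0, 4.0],
])

# Synthetic "experimental" affinities generated from known weights (no noise),
# so that the regression can be checked against a known answer.
true_weights = np.array([-0.4, 0.6, -0.2])
true_intercept = 1.5
y = X @ true_weights + true_intercept


def fit_linear_scoring_function(X, y):
    """Least-squares fit of descriptor weights plus a regression constant."""
    design = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]


def score(X, weights, intercept):
    """Predicted affinity as a weighted sum of descriptors."""
    return X @ weights + intercept


weights, intercept = fit_linear_scoring_function(X, y)
predicted = score(X, weights, intercept)
```

A nonlinear model would replace only the fitting step (for example, with a random forest regressor) while keeping the same descriptor matrix and affinity vector.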

### GOALS OF SCORING FUNCTIONS

During the docking process, the search algorithm investigates a vast number of conformations for each molecule of the compound library. In this step, the scoring functions evaluate the quality of these docking poses, guiding the search methods toward relevant ligand conformations. The first requirement for a useful scoring function is to distinguish the experimentally observed binding modes, associating them with the lowest binding energies of the energy landscape, from all the other poses found by the search algorithm (pose prediction). The second goal is to classify active and inactive compounds (VS), and the third is to predict the absolute binding affinity, ranking compounds correctly according to their potency (binding affinity prediction) (Jain and Nicholls, 2008; Cheng et al., 2009; Li et al., 2014c). The last is the most challenging task, mainly in de novo design and lead optimization, since small differences in the compound can lead to drastic changes in binding affinity (Schneider and Fechner, 2005). An ideal scoring function would be able to perform all three tasks. However, current scoring functions exhibit different accuracies on distinct tasks because of the modeling assumptions and simplifications made during their development, which are intrinsically tied to the main purpose of each scoring function (Li et al., 2014b). In this context, docking protocols can adopt different scoring functions for each step; e.g., one can use a fast scoring function to predict binding modes and then predict affinities with a more sophisticated scoring function specific for affinity prediction.

Current docking methods and the associated scoring functions exhibit good pose prediction power if the system is adequately prepared and if target flexibility does not play a significant role (Corbeil et al., 2012; Chaput and Mouawad, 2017). However, the detection of active compounds among a set of decoy compounds and the accurate prediction of binding affinity remain challenging tasks, even when induced fit and entropy effects are not important for binding (Gohlke and Klebe, 2002; Damm-Ganamet et al., 2013; Yuriev and Ramsland, 2013; Grinter and Zou, 2014; Smith et al., 2016). In VS experiments, it is mandatory to use a scoring function capable of, at least, discriminating active from inactive molecules.

<sup>1</sup>http://www.dockthor.lncc.br

<sup>2</sup>http://bioserv.rpbs.univ-paris-diderot.fr/services/MTiOpenScreen/

<sup>3</sup>http://haddock.science.uu.nl/services/HADDOCK2.2

<sup>4</sup>http://blaster.docking.org/

<sup>5</sup>www.scfbio-iitd.res.in/software/drugdesign/bappl.jsp

<sup>6</sup> structure.bioc.cam.ac.uk/csm\_lig

<sup>7</sup>playmolecule.org/Kdeep

Scoring functions are typically divided into three main classes (Wang et al., 2003): force field-based, knowledge-based, and empirical. Liu and Wang (2015) recently proposed a new classification scheme, suggesting that current scoring functions be classified as physics-based, regression-based, potential of mean force, and descriptor-based. Herein we will follow the traditional classification proposed by Wang et al. (2002), since we believe it is more general and can adequately classify scoring functions according to the main development strategy adopted.

Force field-based functions consist of a sum of energy terms from a classical force field, usually considering the interaction energies of the protein–ligand complex (non-bonded terms) and the internal ligand energy (bonded and non-bonded terms), whereas the solvation energy can be computed by continuum solvation models such as the Poisson–Boltzmann (PB) or the related Generalized Born (GB) (Gilson et al., 1997; Zou and Kuntz, 1999). Examples of force field-based scoring functions include DOCK (Meng et al., 1992) and DockThor (de Magalhães et al., 2014).

Knowledge-based scoring functions are based on the statistical analysis of interacting atom pairs from protein–ligand complexes with available three-dimensional structures. These pairwise-atom data are converted into a pseudopotential, also known as a mean force potential, that describes the preferred geometries of the protein–ligand pairwise atoms. Examples include DrugScore (Velec et al., 2005) and PMF (Muegge, 2006).

Empirical scoring functions are developed to reproduce experimental affinity data (Pason and Sotriffer, 2016) based on the idea that it is possible to correlate the free energy of binding with a set of non-related variables. The coefficients associated with the functional terms are obtained through regression analysis using known binding affinity data of experimentally determined structures. LUDI was the first empirical scoring function, developed in the pioneering work of Böhm (1992) for predicting the absolute binding free energy from atomic (3D) structures of protein–ligand complexes. Other examples of empirical scoring functions include ChemScore (Eldridge et al., 1997), ID-Score (Li et al., 2013), and GlideScore (Friesner et al., 2004, 2006a). Some empirical scoring functions (also referred to as hybrid scoring functions) were developed using a mixture of force field-based, contact-based, and knowledge-based descriptors, such as DockTScore from the DockThor program (empirical and force field-based) (de Magalhães et al., 2014; Guedes et al., 2016), SMoG2016 (empirical and knowledge-based) (Debroise et al., 2017), and GalaxyDock BP2 Score (empirical, knowledge-based, and force field-based) (Baek et al., 2017).

The main focus of this review is the state of the art concerning empirical scoring functions, a choice motivated by two main reasons. First, the methodology behind this type of scoring function can be fast enough to be used in large-scale structure-based VS and de novo design studies. Second, the use of modern, sophisticated machine-learning techniques and the increasing availability of protein–ligand structures and measured binding affinity data could considerably increase the accuracy of empirical scoring functions, making them more useful in computer-aided SBDD experiments. In the following sections, we will discuss crucial aspects concerning their development, successful applications, limitations, and future perspectives.

### DESCRIPTORS OF EMPIRICAL SCORING FUNCTIONS

#### Intermolecular Interactions

Empirical scoring functions have implemented specific terms accounting for intermolecular interactions, such as van der Waals and electrostatic potentials. For example, the Lennard-Jones potential describes the attractive forces (e.g., dispersion forces) and the intrinsic repulsive force between two separated atoms as a function of the interatomic distances (Jones, 1924a,b). Examples of empirical scoring functions using Lennard-Jones potentials are ID-Score (Li et al., 2013) and LISA (Zheng and Merz, 2011). X-Score (Wang et al., 2002) is an example of a scoring function that adopts a softened version of the Lennard-Jones potential instead of the conventional 12-6 potential.
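The difference between a conventional 12-6 potential and a softened variant can be illustrated with a short sketch. The general form below (well depth `epsilon`, minimum at `r_min`, repulsive and attractive exponents `m` and `n`) and the 8-4 exponents used as the "soft" example are assumptions chosen for illustration; the exact functional form adopted by any specific scoring function should be taken from its original publication.

```python
def lennard_jones(r, epsilon, r_min, m=12, n=6):
    """Generalized Lennard-Jones-type potential with its minimum (-epsilon) at r_min.

    E(r) = epsilon / (m - n) * [n * (r_min / r)**m - m * (r_min / r)**n]
    With m=12, n=6 this reduces to epsilon * [(r_min/r)**12 - 2 * (r_min/r)**6].
    """
    x = r_min / r
    return epsilon / (m - n) * (n * x**m - m * x**n)


# Illustrative parameters (hypothetical, in arbitrary units).
epsilon, r_min = 0.2, 4.0

e_min_hard = lennard_jones(r_min, epsilon, r_min)            # conventional 12-6
e_min_soft = lennard_jones(r_min, epsilon, r_min, m=8, n=4)  # softened example

# At a distance shorter than r_min, the softened form penalizes the clash less.
e_clash_hard = lennard_jones(3.0, epsilon, r_min)
e_clash_soft = lennard_jones(3.0, epsilon, r_min, m=8, n=4)
```

Softer repulsion makes the score more tolerant of small steric clashes, which can arise from coordinate errors or from treating the receptor as rigid.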

Although all interatomic forces are of electrostatic or electromagnetic origin, the name "electrostatic" is conventionally used to describe forces between polar atoms and is usually represented by the Coulomb potential in both force field-based and empirical scoring functions. Glide (Friesner et al., 2006a) and DockThor (de Magalhães et al., 2014) are examples of scoring functions that implement the Coulomb potential for computing electrostatic interactions.

Some scoring functions include a specific term for hydrogen-bond interactions, commonly through one of two approaches: (i) using specific force field-based parameters associated with the van der Waals and electrostatic energy potentials; or (ii) using a directional term, where the hydrogen bond contribution is a function of the deviation of the geometric parameters from those of an ideal hydrogen bond.

GlideScore employs approach (i) to calculate hydrogen bonds between polar atom pairs, while the Glide XP Score applies strategy (ii) to account for distinct categories of hydrogen bonds such as neutral–neutral, charged–charged, and neutral–charged interactions (Friesner et al., 2004, 2006b). The DockThor scoring function, which is based on the MMFF94S force field, also implements strategy (i), reducing the size of the polar hydrogen atom when it is involved in hydrogen-bonding interactions (i.e., interacting with a hydrogen bond acceptor) (Halgren, 1996). X-Score adopts approach (ii) and does not explicitly consider the hydrogen atoms, adopting the concept of a "root" atom. In the LUDI implementation of approach (ii), there are specific parameters for neutral hydrogen bonds and salt bridges (Böhm, 1994). However, some empirical functions do not differentiate hydrogen bonds between charged and neutral atom pairs, e.g., X-Score (Wang et al., 2002) and FlexX (Rarey et al., 1996). ID-Score is an example of a scoring function that uses both approaches: (i) to account for electrostatic interactions between charged groups and (ii) for hydrogen-bonding interactions (Li et al., 2013). The AutoDock4 scoring function employs a directional term based on a 10/12 potential (similar to the Lennard-Jones potential) that depends on the angular deviation from an ideal H-bond interaction with the protein.

Besides improving affinity predictions, the inclusion of a polar desolvation term might be crucial to avoid overestimating hydrogen bonds, since H-bond formation is directly related to the desolvation of polar atoms.

Although it is important to consider metal ions, their treatment can also be a source of inaccuracy when using non-specific scoring functions, since the real contribution of interacting metal ions can be underestimated (in the case of simple counting of metal–atom interacting pairs) or overestimated (when using the Coulomb potential with formal charges). For example, LUDI (Böhm, 1994), ChemScore (Eldridge et al., 1997), and SFCscore (Sotriffer et al., 2008) implement a contact-based term that assigns a score of 1 to each metal–ligand atom pair within a distance criterion, lower scores when the distance exceeds this criterion up to an upper distance limit, and a score of 0 for larger distances. AutoDock4Zn has implemented a specific force field-based potential for the zinc ion to consider both geometric and energetic components of the metal–ligand interaction, achieving better performance for pose prediction in redocking experiments (Santos-Martins et al., 2014).
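A contact-based metal term of the kind described above can be sketched as a simple piecewise function. The linear decay between the two thresholds and the threshold values themselves (`r_ideal`, `r_max`) are assumptions made for illustration; the cited scoring functions define their own distance criteria.

```python
def metal_contact_score(distance, r_ideal=2.2, r_max=2.6):
    """Piecewise contact score for one metal-ligand atom pair.

    Returns 1 within the ideal distance, 0 beyond the upper limit, and a
    linearly decreasing value in between (the linear ramp is an assumption).
    Distances are in angstroms; default thresholds are hypothetical.
    """
    if distance <= r_ideal:
        return 1.0
    if distance >= r_max:
        return 0.0
    return (r_max - distance) / (r_max - r_ideal)


def metal_term(distances):
    """Sum of contact scores over all metal-ligand atom pairs."""
    return sum(metal_contact_score(d) for d in distances)


# Three hypothetical metal-ligand distances: close, intermediate, and far.
total = metal_term([2.0, 2.4, 3.0])
```

Because each close contact contributes at most 1 regardless of the chemical nature of the coordinating atom, a term of this kind can underestimate strong metal coordination, as noted in the text.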

Many studies have highlighted the influence of halogen bonds (X-bonds) on enhancing binding affinity against several targets and have discussed the computational methods developed so far (Desiraju et al., 2013; Ford and Ho, 2016). Given the importance of this specific interaction in hit and lead identification, some scoring functions have incorporated special treatment for X-bonds, such as XBScore (Zimmermann et al., 2015), ScorpionScore (Kuhn et al., 2011), and AutoDockVinaXB (Koebel et al., 2016).

#### Desolvation

The desolvation contribution to the binding affinity arising from the formation of the protein–ligand complex with the release of water molecules to the bulk solvent can be separated into two distinct effects: the nonpolar and the polar desolvation. The nonpolar desolvation, favorable to binding, is related to the hydrophobic effect when transferring nonpolar molecular surface from the bulk water to a medium that is nonpolar, as is the case of many protein binding cavities (Tanford, 1980; Williams and Bardsley, 1999; Freire, 2008). At the same time, the desolvation of polar or charged groups of the protein or ligand is unfavorable to binding when the formed solute–solvent interactions are not effectively satisfied upon the protein–ligand binding (Blaber et al., 1993; Kar et al., 2013). In this context, many scoring functions have implemented desolvation terms to introduce the hydrophobic effect and/or penalize buried and not interacting polar/charged atoms after protein–ligand binding to improve binding affinity predictions.

The X-Score is a consensus scoring (CS) function based on three distinct strategies to represent the favorable contribution of the desolvation event related to the hydrophobic effect: hydrophobic surface (X-ScoreHS), hydrophobic matching (X-ScoreHM), and hydrophobic contact algorithms (X-ScoreHC) (Wang et al., 2002). The first one is the hydrophobic surface algorithm (X-ScoreHS), where the hydrophobic effect is proportional to the ligand hydrophobic surface in contact with the solvent accessible surface of the protein. The second is the hydrophobic matching algorithm (X-ScoreHM), the same algorithm adopted in the SCORE function (Wang et al., 1998) that calculates the hydrophobic contribution as a function of the logP of each ligand atom and the respective lipophilicity of surrounding protein atoms. The third and simplest method is the hydrophobic contact algorithm (X-ScoreHC), which approximates the hydrophobic effect through the contact between protein–ligand pairs of lipophilic atoms.

LUDI adopts an approach similar to X-ScoreHS (Böhm, 1994), while ChemScore (Eldridge et al., 1997) implements an algorithm similar to X-ScoreHC. The Fresno scoring function (Rognan et al., 1999) implements a more sophisticated method, solving the linear form of the PB equation with finite difference methods. Cyscore (Cao and Li, 2014) considers the protein shape through a curvature-dependent surface-area term for the hydrophobic free energy calculation, leading to a significant improvement in affinity prediction performance on PDBbind benchmarking sets.

The unfavorable desolvation effect from burying polar groups after ligand binding also plays an important role in the binding event, but it is commonly neglected by most scoring functions (Kar et al., 2013; Li et al., 2014c; Cramer et al., 2017). Some efforts have been made to implement specific penalization terms developed with distinct approaches to account for the polar desolvation, such as in the scoring functions ICM (Abagyan et al., 1994; Totrov and Abagyan, 1999; Fernández-Recio et al., 2004), XP GlideScore (Friesner et al., 2006a), LigScore (Krammer et al., 2005), and DockTScore (de Magalhães et al., 2014; Guedes et al., 2016).

More sophisticated methods based on molecular dynamics (MD), such as MM-PBSA and MM-GBSA, have been used in conjunction with empirical scoring functions to predict binding affinities. MM-PBSA and the related MM-GBSA, considered "end-point" approaches since all calculations are based on the initial and final states of the simulation, rely on MD simulations to compute the polar and nonpolar contributions of the protein–ligand binding event. A classical force field is used to compute the potential energy, and the solvation energy is calculated with an implicit solvation model. PB and GB are continuum electrostatic models used to calculate the electrostatic part of the solvation energy; they treat the protein and the ligand as low-dielectric regions while considering the aqueous solvent as a high-dielectric medium (Honig et al., 1993). When associated with a surface-area-dependent term (SA), they lead to the implicit solvation models Poisson–Boltzmann (PBSA) (Sitkoff et al., 1994) and Generalized Born (GBSA) (Still et al., 1990; Qiu et al., 1997). Sun et al. (2014) evaluated the performance of MM-PBSA and MM-GBSA methods using several protocols with 1864 protein–ligand complexes from the PDBbind v2011 dataset. They concluded that although similar results were observed, MM-GBSA is less sensitive to the investigated systems and is more suitable for general cases (e.g., reverse docking, which is widely used to predict the receptor target(s) of a compound). Inspired by the promising results obtained with GBSA, Zou and Kuntz (1999) implemented a GBSA scheme into the DOCK program as an alternative scoring function and obtained improved binding affinity predictions due to a better description of electrostatic and desolvation effects. More recently, Zhang X. et al. (2017) also obtained a significant improvement in binding affinity prediction of antithrombin ligands when rescoring the top-scored docking poses from the VinaLC docking engine with MM-GBSA. Spiliotopoulos et al. (2016) successfully integrated a damped version of MM-PBSA with the HADDOCK scoring function to predict binding poses and affinity of protein–peptide complexes.
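As a compact reminder of what an end-point estimate computes, the commonly used general form of the MM-PB(GB)SA binding free energy can be written as follows. The symbols are introduced here for illustration and are not taken from the works cited above; individual protocols differ in how (or whether) the entropy term is evaluated and in the parameterization of the surface-area term.

$$
\Delta G_{\mathrm{bind}} = \langle G_{\mathrm{complex}} \rangle - \langle G_{\mathrm{receptor}} \rangle - \langle G_{\mathrm{ligand}} \rangle,
\qquad
G = E_{\mathrm{MM}} + G_{\mathrm{PB/GB}} + G_{\mathrm{SA}} - TS
$$

Here $E_{\mathrm{MM}}$ is the force-field potential energy, $G_{\mathrm{PB/GB}}$ is the polar solvation energy from the continuum electrostatic model, $G_{\mathrm{SA}}$ is the surface-area-dependent nonpolar solvation term, $-TS$ is the solute entropy contribution, and the angle brackets denote averages over MD snapshots.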

#### Ligand Entropy

Configurational entropy is related to the loss of flexibility of the ligand upon binding. It can be represented as a sum of the conformational (S<sub>conf</sub>) and the vibrational (S<sup>o</sup><sub>vib</sub>) entropies (Schäfer et al., 2002; Chang et al., 2007). In the energy landscape framework of the protein–ligand binding event, the former reflects the number of occupied energy wells and the latter expresses the average width of the occupied wells. S<sub>conf</sub> is related to the reduction in the number of accessible ligand conformations upon binding, while S<sup>o</sup><sub>vib</sub> is mainly caused by the restriction of rotational amplitude inside the binding site when compared to the unbound state (Chang et al., 2007; Gilson and Zhou, 2007).

Given the difficulty in modeling entropic effects for ΔG<sub>bind</sub>, scoring functions generally neglect their contributions or adopt simplified algorithms to approximate entropies in a straightforward manner (Jain, 2006). Scoring functions such as LUDI (Böhm, 1994) and X-Score (Wang et al., 2002) consider the entropic loss due to the restriction of rotational and translational degrees of freedom implicitly in the regression constant ΔG<sub>0</sub>. Surflex approximates such entropic loss as the logarithm of the ligand molecular weight multiplied by a scale factor related to the rough mass dependence of the translational and rotational entropies (Jain, 1996).

The restriction of the rotatable bonds of the ligand after the formation of the protein–ligand complex also promotes an entropic loss (S<sub>conf</sub>) that is unfavorable to the binding affinity. Some scoring functions have implemented specific terms that roughly approximate the entropic contributions of the ligand. The most used strategies are: (i) a penalty proportional to the number of rotatable bonds, and (ii) a penalty that considers the environment of each rotatable bond, i.e., only rotatable bonds in contact with the protein are penalized. LUDI (Böhm, 1994) and Fresno (Rognan et al., 1999) implement approach (i), while ChemScore (Eldridge et al., 1997) and ID-Score (Li et al., 2013) use variations of strategy (ii).
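The two strategies can be contrasted in a few lines. This is only a schematic sketch: the unit weight, the boolean contact flags, and the function name are hypothetical, and the published scoring functions use their own (more elaborate) definitions of rotatable bonds and contacts.

```python
def rotor_penalty(rotor_in_contact, weight=1.0, contact_only=False):
    """Schematic entropic penalty for frozen rotatable bonds.

    rotor_in_contact: one boolean per ligand rotatable bond, True when the
        bond is considered to be in contact with the protein (how this is
        decided is outside the scope of this sketch).
    contact_only=False -> strategy (i): penalize every rotatable bond.
    contact_only=True  -> strategy (ii): penalize only bonds in contact.
    """
    if contact_only:
        n_penalized = sum(1 for in_contact in rotor_in_contact if in_contact)
    else:
        n_penalized = len(rotor_in_contact)
    return weight * n_penalized


# Hypothetical ligand with five rotatable bonds, two of them buried.
flags = [True, False, True, False, False]
penalty_all = rotor_penalty(flags)                         # strategy (i)
penalty_contact = rotor_penalty(flags, contact_only=True)  # strategy (ii)
```

In a regression-based scoring function, the weight would be one of the coefficients calibrated against experimental affinities rather than a fixed value.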

Inspired by the successful application of the energy landscape theory in protein folding and biomolecular binding (Jackson and Fersht, 1991; Miller and Dill, 1997; Baker, 2000), researchers have made use of the multiple binding modes predicted by docking programs to describe the binding energy landscape. For example, Wei et al. (2010) developed two new parameters extracted from the multiple binding modes generated by the AutoDock 3.05 program and combined them for classification purposes using logistic regression to distinguish true binders among high-scored decoys. The proposed scheme considered the energy gap (i.e., the difference between the binding energy of the native binding mode and the average binding energy of the other binding modes, reflecting the thermodynamic stability of the native state) and the number of local binding wells (kinetic accessibility). This strategy was successfully applied to the neuraminidase and cyclooxygenase-2 systems from the DUD database, with further improved accuracy when combined with the docking scores. Grigoryan et al. (2012) also successfully applied the energy gap to distinguish true binders from decoys in several protein targets from DUD in single- and multiple-receptor VS experiments, outperforming the ICM scoring function.
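The energy-gap descriptor can be sketched as follows. Two simplifying assumptions are made here and are not taken from the cited works: the lowest-energy docked pose is used as a stand-in for the native binding mode, and the gap is reported as a positive number when that pose is well separated from the rest.

```python
def energy_gap(pose_energies):
    """Gap between the best pose and the mean of the remaining poses.

    pose_energies: docking energies (more negative = better) for the multiple
        binding modes predicted for one ligand. The best-scored pose is
        assumed to approximate the native binding mode.
    """
    if len(pose_energies) < 2:
        raise ValueError("at least two binding modes are required")
    ranked = sorted(pose_energies)
    best, others = ranked[0], ranked[1:]
    return sum(others) / len(others) - best


# Hypothetical energies (kcal/mol) for two ligands with the same best score.
funnel_like = energy_gap([-9.0, -6.0, -5.0, -4.0])  # well-separated best pose
flat = energy_gap([-9.0, -8.9, -8.8, -8.7])         # many near-degenerate poses
```

Both ligands would be ranked identically by the best docking score alone, whereas the gap separates the funnel-like landscape from the flat one.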

#### Descriptors Based on the Counting of Atom Pairs

With the advance of sophisticated machine-learning algorithms, an increasing number of scoring functions based on a pool of simplistic descriptors have emerged, such as the counting of protein–ligand atom pairs and ligand-based properties. In the literature, such scoring functions are also known as "descriptor-based" or "machine-learning based." It is important to note that this kind of scoring function is also an empirical model, since (i) the algorithms commonly used to derive the models, such as the classical MLR or the robust RF, are machine-learning methods<sup>8</sup>, and (ii) the attributes used to describe the binding event are, in fact, descriptors, independently of their functional form, physical meaning, and degree of complexity.

The success of descriptors based on the simple counting of atom pairs is associated with two important aspects: (i) their number and definition are not limited by complex implementations or physical meaning assumptions, and (ii) they practically eliminate the need for a detailed preparation of the structures and for the correct assignment of atom types and physical quantities (e.g., atomic partial charges). Many papers in the recent literature describe outstanding results for binding affinity prediction and active/inactive classification using this more pragmatic approach (Ballester and Mitchell, 2010; Pereira et al., 2016; Wójcikowski et al., 2017). However, the conjunction of nonlinear models and more straightforward atom-counting descriptors has been subject to significant criticism (Gabel et al., 2014). Among the main criticisms we can highlight: (i) insensitivity to the protonation state of the ligands and receptor residues; (ii) insensitivity to the ligand pose; and (iii) susceptibility to methodological artifacts due to overtraining, even when using large training sets.
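A minimal sketch of an atom-pair counting descriptor is given below. The element lists, the 12 Å cutoff, and the function name are assumptions chosen for illustration; they are not claimed to reproduce the descriptors of any of the cited works.

```python
from itertools import product

import numpy as np


def count_atom_pairs(prot_elems, prot_xyz, lig_elems, lig_xyz, cutoff=12.0,
                     prot_types=("C", "N", "O", "S"),
                     lig_types=("C", "N", "O")):
    """Count protein-ligand element pairs closer than `cutoff` (angstroms).

    Returns a dict mapping (protein_element, ligand_element) to the number of
    atom pairs of that type within the cutoff. Only element symbols and
    coordinates are needed: no charges, protonation states, or atom typing.
    """
    prot_xyz = np.asarray(prot_xyz, dtype=float)
    lig_xyz = np.asarray(lig_xyz, dtype=float)
    # Pairwise distance matrix of shape (n_protein_atoms, n_ligand_atoms).
    dists = np.linalg.norm(prot_xyz[:, None, :] - lig_xyz[None, :, :], axis=-1)

    counts = {pair: 0 for pair in product(prot_types, lig_types)}
    for i, p_elem in enumerate(prot_elems):
        for j, l_elem in enumerate(lig_elems):
            if (p_elem, l_elem) in counts and dists[i, j] < cutoff:
                counts[(p_elem, l_elem)] += 1
    return counts


# Toy system: two protein atoms and two ligand atoms on a line.
counts = count_atom_pairs(
    ["C", "O"], [[0.0, 0.0, 0.0], [20.0, 0.0, 0.0]],
    ["C", "N"], [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
)
```

With a generous cutoff, small displacements of the ligand leave most counts unchanged, which illustrates why such descriptors are criticized as being largely insensitive to the ligand pose.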

### TRAINING AND TEST SETS

#### Datasets

The availability of protein–ligand structures with measured binding data has increased due to data collection efforts such as the PDBbind-CN (Liu et al., 2015, 2017), DUD-E (Mysinger et al., 2012), and DEKOIS (Bauer et al., 2013) projects.

PDBbind-CN is a source of biomolecular complexes with experimentally determined protein–ligand structures and associated binding data manually collected from the original references (Liu et al., 2015). The current release (version 2017) contains 17,900 structures (14,761 protein–ligand complexes) and is updated annually to keep up with the growth of the Protein Data Bank (Berman et al., 2000). The "refined set" is a subset composed of high-quality complexes selected according to several criteria concerning the quality of the structures, the affinity data, and the nature of the complex, and is considered one of the largest datasets of structures available for the development and validation of docking methodologies and scoring functions. Collected affinities span a large interval of values, ranging from 1.2 pM (1.2 × 10<sup>−12</sup> M) to 10 mM (1.0 × 10<sup>−2</sup> M). PDBbind-CN also provides a benchmark named the "core set," widely used for the comparative assessment of scoring functions in predicting affinities (Li Y. et al., 2018). The core set is a subset of the refined set constructed using the following protocol: (i) protein structures with sequence identity higher than 90% were grouped, leading to 65 clusters associated with different protein families; (ii) only the clusters composed of at least five members were considered to construct the core set; and (iii) for each of these clusters, only the complexes with the lowest, the medium, and the highest affinities were selected for the final composition of the core set. A significant drawback of the PDBbind-CN datasets is the insufficient information regarding negative data (i.e., experimentally confirmed inactive compounds).

<sup>8</sup> Indeed, according to the IUPAC Recommendations 2015, the term "machine learning" refers to a computer algorithm that generates empirical models, (...), that is derived from the analysis of a training set for which all the necessary data are available (Martin et al., 2016).
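Steps (ii) and (iii) of the core set protocol can be sketched as a simple selection over precomputed clusters. The cluster contents, the identifiers, and the choice of the middle-ranked member as the "medium" affinity are assumptions made for illustration; the sequence clustering of step (i) is taken as given.

```python
def select_core_set(clusters, min_size=5):
    """Pick the lowest-, middle-, and highest-affinity complex per cluster.

    clusters: dict mapping a cluster id to a list of (complex_id, affinity)
        tuples, where affinity is on a logarithmic scale (e.g., pKd), so
        larger means tighter binding. Clusters with fewer than `min_size`
        members are skipped.
    """
    core = []
    for members in clusters.values():
        if len(members) < min_size:
            continue
        ranked = sorted(members, key=lambda member: member[1])
        core.extend([ranked[0], ranked[len(ranked) // 2], ranked[-1]])
    return core


# Hypothetical clusters with made-up identifiers and affinities.
clusters = {
    "family_A": [("a1", 4.0), ("a2", 9.0), ("a3", 6.5), ("a4", 5.0), ("a5", 8.0)],
    "family_B": [("b1", 5.0), ("b2", 7.0)],  # too small, excluded
}
core_set = select_core_set(clusters)
```

Selecting three complexes per family keeps the benchmark balanced across protein families and spreads the affinities over a wide range within each family.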

The DUD-E dataset is an enhanced version of the original DUD set and has been widely used to train and validate scoring functions (Huang et al., 2006; Mysinger et al., 2012). It is composed of 102 targets with corresponding active, inactive, marginal, and decoy compounds. Although the number of ligands (i.e., active compounds) varies significantly for each target, a proportion of 50 decoys per ligand is kept for all 102 macromolecules. Decoys are presumed, but not experimentally verified, to be inactive compounds, since they are chosen to be topologically distinct from the ligands while exhibiting similar physicochemical properties. The use of decoys instead of validated inactive compounds remains a major drawback for most datasets, since no experimental activity is reported for them and confirmed inactive molecules are scarce (Lagarde et al., 2015; Chaput et al., 2016b; Réau et al., 2018).

DEKOIS 2.0 is composed of 81 benchmarking sets for 80 protein targets of therapeutic relevance, including nonconventional targets such as protein–protein interaction complexes (Bauer et al., 2013). Active compounds and their associated binding affinities were retrieved from BindingDB, applying several filters to remove pan-assay interference compounds (PAINS), weak binders, compounds with reactive groups, and compounds with undefined stereocenters. To derive a structurally diverse dataset, the active compounds of each protein target were clustered into 40 groups according to Tanimoto structural similarity, and only the most potent compound of each cluster was selected. For each active molecule, 30 structurally diverse decoy molecules from the ZINC database were selected according to a protocol improved over that used in the first version of the DEKOIS dataset (Vogel et al., 2011), including the detection and removal of latent actives in the decoy set (LADS). Although DUD-E and DEKOIS 2.0 share a common structure of active and decoy compounds, they are complementary since the overlap between them is small: only four protein targets present in DEKOIS 2.0 are also found in the DUD-E dataset.

Scoring functions can be developed based on either experimental structures (i.e., experimentally determined protein–ligand structures) or conformations predicted with docking programs. The structure source (i.e., experimental or docked) is an important point to consider. The use of benchmarking sets such as DUD-E and DEKOIS 2.0 depends directly on the docking program adopted, since the experimental structures of the protein–ligand complexes are not available as they are in the PDBbind datasets. In fact, scoring function training or validation in VS experiments using these datasets is performed with no guarantee that the ligand poses were correctly predicted.

#### Training, Validation, and Test Sets

The dataset is commonly separated into three subsets without overlapping structures: (i) the training set, (ii) the validation set, and (iii) the test set (also known as "external validation set").

The training set is used to calibrate the parameters of the scoring function and to learn the rules that establish a quantitative relationship between the descriptors and the experimental affinity. The validation set is used to assess the generalization error<sup>9</sup>, guiding model tuning and selection. Once the best model is chosen, it is applied to the test set to evaluate the real predictive capacity of the model.
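As a minimal sketch of this three-way partition, the function below shuffles a list of complex identifiers and cuts it into disjoint training, validation, and test subsets. The 70/15/15 proportions, the fixed seed, and the name `split_dataset` are illustrative assumptions and are not prescribed by the text.

```python
import random

def split_dataset(ids, frac_train=0.70, frac_valid=0.15, seed=0):
    """Return three disjoint lists: (training, validation, test)."""
    ids = list(ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * frac_train)
    n_valid = round(len(ids) * frac_valid)
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]  # whatever remains is held out
    return train, valid, test
```

Because the three lists are slices of one shuffled list, no complex can appear in more than one subset.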

There is a tradeoff between the size of the training set and the size of the validation/test sets. Whereas an extensive validation/test set provides a better estimate of the generalization error, it usually implies a smaller dataset for the training phase (Abu-Mostafa et al., 2012). Studies evaluating the influence of the training set size on the performance of linear and nonlinear scoring functions for affinity prediction demonstrated that MLR becomes insensitive to the growth of the training set, whereas larger training sets can lead to an overall better accuracy of nonlinear scoring functions (Ding et al., 2013; Ain et al., 2015; Li et al., 2015a,b; Li H. et al., 2018).

In this context, cross-validation emerges as an alternative strategy to estimate the generalization error without strictly changing the training set size. K-fold cross-validation consists of splitting the original training set of size N into two parts K times: a smaller set of size V for validation (V = N/K) and a larger set of the remaining T instances (T = N−V) for training (e.g., leave-one-out cross-validation considers V = 1). Different cross-validation schemes have been adopted and explored to train linear and nonlinear models (Shao, 1993; Golbraikh and Tropsha, 2002; Kramer and Gedeck, 2010; Ballester and Mitchell, 2011; Wójcikowski et al., 2017). For example, Wójcikowski et al. (2017) performed fivefold cross-validation using the DUD-E dataset. Three distinct splitting strategies were considered: horizontal, vertical, and per-target. In the horizontal split, all folds necessarily contain protein–ligand complexes from all protein

<sup>9</sup>Generalization error is the expected error when the scoring function is evaluated on a dataset composed of new protein–ligand complexes (i.e., structures not used in the training step).

targets (i.e., each protein target is present in both training and test sets). In the vertical split, the protein targets present in the test set do not have representative structures in the training set. This evaluation simulates those cases where the protein target of interest was not present during the training phase. Finally, in the per-target split, the training and test are performed for each protein target (i.e., 102 unique machine-learning models relative to the 102 DUD-E targets), simulating the construction and validation of target-specific scoring functions.
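The difference between the horizontal and vertical splits can be made concrete with a short sketch. The functions below are an illustrative assumption about how such folds could be built from (target, compound) pairs; they are not the implementation used by Wójcikowski et al. (2017), and the names `vertical_folds` and `horizontal_folds` are hypothetical.

```python
from collections import defaultdict

def vertical_folds(samples, k):
    """samples: list of (target, compound_id) pairs.

    Whole targets are assigned to a single fold, so a held-out fold never
    shares a protein target with the folds used for training.
    """
    targets = sorted({target for target, _ in samples})
    fold_of_target = {target: i % k for i, target in enumerate(targets)}
    folds = [[] for _ in range(k)]
    for sample in samples:
        folds[fold_of_target[sample[0]]].append(sample)
    return folds

def horizontal_folds(samples, k):
    """The compounds of each target are dealt round-robin over all folds, so
    every fold contains every target (given at least k compounds per target).
    """
    by_target = defaultdict(list)
    for sample in samples:
        by_target[sample[0]].append(sample)
    folds = [[] for _ in range(k)]
    for target in sorted(by_target):
        for i, sample in enumerate(by_target[target]):
            folds[i % k].append(sample)
    return folds
```

In either scheme, round `i` of the cross-validation holds out `folds[i]` and trains on the concatenation of the remaining folds. The per-target split corresponds to running an ordinary K-fold split separately inside each target.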

It is important to keep in mind that the training, validation, and test sets must never share protein–ligand complexes. Furthermore, the test set must be composed of instances not used at any point of the training process. Thus, the test set must be used only for evaluating the predictive performance of different scoring functions, and no modeling decision should be based on the performance on this set, to avoid meaningless comparisons due to artificially high correlations.

#### Benchmarking and Evaluation Metrics

Standard benchmarks are of great importance for an objective assessment of scoring functions providing a reproducible and reliable way to compare different methods. PDBbind (Liu et al., 2015), DUD-E (Mysinger et al., 2012), and DEKOIS 2.0 (Bauer et al., 2013) are examples of widely used benchmarks for evaluating scoring functions.

Many evaluation metrics are used to quantify the performance of scoring functions in pose prediction, active/inactive classification, and affinity prediction. A special issue on Evaluation of Computational Methods collects several high-quality papers covering the main aspects of evaluating and comparing distinct methodologies, highlighting the strengths and weaknesses of widely used metrics (Stouch, 2008). Recently, Huang and Wong (2016) developed an inexpensive metric, the screening performance index (SPI), to evaluate VS methods; it correlates with BEDROC at a lower computational cost, since it removes the need to dock decoy compounds (i.e., only the docking of active molecules is considered).

Scoring functions are generally evaluated regarding four aspects related to the three goals of scoring functions aforementioned (Liu et al., 2017):

Docking power: the ability of a scoring function to detect the native binding mode among decoy poses as the top-ranked solution. The root-mean-square deviation (RMSD) between the predicted and native poses is the most commonly used metric to assess docking power.

Screening power: the ability of a scoring function to correctly distinguish active compounds from inactive molecules. The screening power test does not require that the scoring function correctly predict the absolute binding affinity. The screening power is usually quantified by BEDROC and the enrichment factor (EF).

Ranking power: the ability of a scoring function to correctly rank compounds according to their binding affinities against the same target protein. The Spearman correlation coefficient (R<sub>S</sub>) and Kendall's tau are widely used metrics for assessing the ranking power of scoring functions.

Scoring power: the ability of a scoring function to correctly rank compounds according to their binding affinities against distinct target proteins. It is important to note that the scoring power test considers the absolute value of the affinity prediction, requiring that the predicted and experimentally observed binding affinities have a linear correlation. This performance is widely assessed by the Pearson correlation coefficient (R<sub>P</sub>) and the root-mean-square error (RMSE).
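Several of the metrics named above have simple closed forms. The sketch below gives plain-Python versions of R<sub>P</sub>, R<sub>S</sub>, RMSE, and EF for illustration only; BEDROC and Kendall's tau are omitted for brevity. The function names are hypothetical, and the EF function assumes that a higher score means a compound is predicted to be more active.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (R_P)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation (R_S): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed affinities."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

def enrichment_factor(scores, is_active, fraction=0.01):
    """EF = (actives in top fraction / size of top fraction) / (actives / total)."""
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_top]
    hits = sum(is_active[i] for i in top)
    return (hits / n_top) / (sum(is_active) / n)
```

A monotonic but nonlinear relationship between predicted and observed values gives R<sub>S</sub> = 1 while R<sub>P</sub> stays below 1, which is why ranking power and scoring power are assessed separately.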

The predictive performance of scoring functions may vary between different benchmarking experiments due to factors such as: (i) composition of the dataset, (ii) structural quality of the complexes, (iii) level of experience of the researchers performing the experiments, and (iv) protocol used to prepare the complexes (Yuriev and Ramsland, 2013). Although ranking scoring functions according to their affinity-prediction performance on benchmark sets highlights the more competitive models, small differences in the calculated performances are generally insufficient to state which of the top-ranked scoring functions performs better than another. Since most benchmarking studies evaluate scoring functions on a few hundred complexes, small differences in the Spearman correlation coefficient, between 0.05 and 0.15 for example, lack statistical significance (Carlson, 2013, 2016). Thus, larger benchmarking sets composed of high-quality protein–ligand complex structures are required for a reliable comparison of docking methodologies and scoring functions.
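The point about statistical significance can be illustrated with an approximate confidence interval for a correlation coefficient. The sketch below uses the Fisher z-transform, which is a standard approximation for Pearson-type correlations and is applied here only as an assumption for illustration; it is not the analysis performed by Carlson (2013, 2016), and the example values (r = 0.60, n = 200) are hypothetical.

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation coefficient r
    measured on n complexes, using the Fisher z-transform."""
    z = math.atanh(r)
    standard_error = 1.0 / math.sqrt(n - 3)
    lower = math.tanh(z - z_crit * standard_error)
    upper = math.tanh(z + z_crit * standard_error)
    return lower, upper

# Hypothetical benchmark: correlation of 0.60 on 200 complexes.
low, high = correlation_ci(0.60, 200)
```

Under these assumptions the interval is roughly 0.50 to 0.68, so a competing scoring function with a correlation of 0.65 on the same number of complexes could not be declared better on this evidence alone.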

In addition to the well-known benchmarking sets, prospective evaluations are of substantial importance, since blinded predictions simulate real VS campaigns. The Drug Design Data Resource (D3R<sup>10</sup>) periodically provides pharmaceutically relevant benchmark datasets and organizes the Grand Challenge, a blinded community challenge based on unpublished data (Gathiaka et al., 2016). According to the results obtained in Grand Challenge 2, the pose prediction task is performed well by many methodologies, but scoring is still very challenging, even when the crystal structures are provided (Gaieb et al., 2018). Even with the crystal structures of 36 complexes at Stage 2, the maximum Kendall's tau achieved was 0.46, reinforcing the great difficulty of correctly ranking a set of compounds. Performances and detailed descriptions of the protocols adopted are provided on the D3R Grand Challenge 2 website<sup>11</sup> and in the scientific reports published in a special issue of the Journal of Computer-Aided Molecular Design (Gaieb et al., 2018).

In the most recent edition, D3R Grand Challenge 3 (GC3), the participants also had to deal with even more challenging tasks, such as identifying selectivity for kinases, assessing the ability of the scoring functions to identify large changes in affinity due to small structural changes in the ligand (kinase activity cliffs), and assessing the influence of kinase mutations on protein–ligand affinity (kinase mutants).

The broad profile of the D3R Grand Challenges, regarding chemical space diversity and affinity data carefully collected, makes their datasets one of the more reliable sources to evaluate docking and scoring methods, providing useful guidelines and

<sup>10</sup>http://www.drugdesigndata.org

<sup>11</sup>https://drugdesigndata.org//about/grand-challenge-2-evaluation-results

best practices for further VS campaigns and methodological improvements.

### The Accuracy of Input Structural and Binding Data

Important issues regarding the quality of structural and affinity data must be considered for the development, validation, and application of scoring functions in VS experiments. Reliable protein–ligand structures usually comply with the following criteria: good resolution (2.5 Å or better), fully resolved electron density for the entire ligand and the surrounding binding-site residues, and no significant influence of crystal packing on the observed binding mode (Cole et al., 2011).

The correct assignment of both protein and ligand protonation/tautomeric states with respect to the experimental pH, Asn/Gln/His flips, and defined stereocenters of the compounds are crucial, requiring a careful inspection of the structures (Kalliokoski et al., 2009; Martin, 2009; Petukh et al., 2013; Sastry et al., 2013). Indeed, the preparation of protein–ligand complexes has a direct influence on training and evaluation of scoring functions, mainly for scoring functions based on force-field descriptors. For example, the initial automatic preparation of the structures performed by PDBbind did not provide an optimized hydrogen bond network and appropriate assignment of protonation/tautomeric states of the α-amylase and MeG2-GHIL complex [**Figure 1**, PDB code 1U33; Numao et al., 2004]. The careful inspection and correction of such complexes comprise a time-consuming and challenging task, but they are particularly important when hydrogen atoms are considered explicitly. In such cases, the wrong orientation of hydrogen atoms can lead to high van der Waals energies, underestimation of hydrogen bond interactions, and incorrect electrostatic repulsions between charged/polar groups. Despite many efforts made for collecting even more extensive and better quality datasets, little attention has been paid to the careful preparation of the protein–ligand structures, usually relying on automatic procedures (Bauer et al., 2013). In this context, scoring functions mainly composed of simple contact-based descriptors (element–element pair counting) emerge to circumvent the complicated preparation required in large datasets for VS.

Especially for affinity prediction purposes, the use of datasets with curated affinity data is essential for reliable predictions and benchmarking. For example, the PDBbind refined set follows several criteria concerning the bioactivity data manually collected from the original references (Liu et al., 2015): (i) only complexes with known dissociation constants (K<sub>d</sub>) or inhibition constants (K<sub>i</sub>) are allowed, (ii) no complexes with extremely low (K<sub>d</sub> or K<sub>i</sub> > 10 mM) or extremely high (K<sub>d</sub> or K<sub>i</sub> < 1 pM) affinities are accepted, and (iii) estimated values are rejected, e.g., K<sub>d</sub> ∼ 1 nM or K<sub>i</sub> > 10 µM. Despite the efforts in collecting high-quality affinity data, many factors such as the inherent experimental error can be a source of inaccuracies, limiting the average prediction error achievable on large datasets (Shoichet, 2006; Ferreira et al., 2009; Sotriffer and Matter, 2011; Kramer et al., 2012). Furthermore, the use of decoys instead of confirmed inactive compounds has an important impact on training and on measuring the performance of scoring functions (Chaput et al., 2016b; Réau et al., 2018).
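The affinity criteria listed above translate directly into a small filter, and the accepted range is easier to appreciate on a logarithmic or free-energy scale. The sketch below is illustrative: the function names and the record layout (type, relation, value in mol/L) are hypothetical, and the free energy uses the standard relation ΔG = RT ln K<sub>d</sub> at an assumed temperature of 298.15 K with a 1 M standard state.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def pk(k_molar):
    """-log10 of a binding constant given in mol/L (pKd or pKi)."""
    return -math.log10(k_molar)

def delta_g(k_molar, temperature=298.15):
    """Binding free energy in kcal/mol: dG = RT ln(Kd)."""
    return R_KCAL * temperature * math.log(k_molar)

def passes_affinity_filter(kind, relation, k_molar):
    """Apply criteria (i)-(iii): Kd or Ki only, exact values only, and a
    value between 1 pM and 10 mM (both bounds accepted)."""
    if kind not in ("Kd", "Ki"):
        return False  # (i) other measurement types are not allowed
    if relation != "=":
        return False  # (iii) estimates such as '~', '>' or '<' are rejected
    return 1e-12 <= k_molar <= 1e-2  # (ii) extreme affinities are rejected
```

On this scale the accepted window from 1 pM to 10 mM corresponds to pK values from 12 down to 2, and a 1 nM binder has a binding free energy of about −12.3 kcal/mol.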

### MACHINE LEARNING

#### Regression and Classification

Scoring functions can be developed using regression methods to reproduce continuous affinity data (e.g., binding constants) or classification methods to reproduce binary affinity data (e.g., active/inactive). Scoring functions trained with regression methods can also be used to classify active and inactive molecules, given a predetermined affinity range that defines active and inactive compounds (Ain et al., 2015). It is also possible to combine classification and regression approaches to deal with the same problem of binding affinity prediction. For example, Pason and Sotriffer (2016) first classified the complexes using algorithms such as KNN and then generated linear regression models for each cluster, achieving predictive performances comparable to that obtained by the nonlinear scoring function trained with RF. Many sophisticated machine-learning techniques automatically generate local models for similar training points (e.g., locally weighted regression), being able to classify new instances automatically and to use different regression models according to specific properties, without explicitly defining classes based on such descriptors.

### Linear Versus Nonlinear Scoring Functions

Scoring functions can also be classified as "linear" and "nonlinear" models (Artemenko, 2008).

Linear regression is one of the simplest learning algorithms and is widely used as a starting point in the development of nonlinear regression models (Bishop, 2006). A linear empirical scoring function can be written as a sum of independent terms such as:

$$
\Delta G_{\text{binding}} = c_0 + c_1 \Delta G_{\text{vdW}} + c_2 \Delta G_{\text{hbond}} + c_3 \Delta G_{\text{entropy}}
$$

where c<sub>i</sub> are the weighting coefficients of the respective ΔG<sub>i</sub> terms, adjusted to reproduce the affinity data of the training set. In this example, ΔG<sub>vdW</sub> is a van der Waals potential, ΔG<sub>hbond</sub> is a specific term accounting for hydrogen bonds, and ΔG<sub>entropy</sub> is related to the ligand entropic loss upon binding.
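Adjusting the weighting coefficients of such a linear function to the training set is an ordinary multiple linear regression (MLR) problem. The sketch below solves it through the normal equations in plain Python; it is a minimal illustration, not the fitting procedure of any particular published scoring function, and the names `fit_linear_scoring_function` and `score` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear_scoring_function(terms, affinities):
    """terms: one row [dG_vdW, dG_hbond, dG_entropy] per training complex.
    Returns the least-squares coefficients [c0, c1, c2, c3]."""
    design = [[1.0] + list(row) for row in terms]  # leading 1.0 carries c0
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, affinities)) for i in range(p)]
    return solve(xtx, xty)

def score(coefficients, row):
    """Evaluate the fitted linear scoring function for one complex."""
    return coefficients[0] + sum(c * t for c, t in zip(coefficients[1:], row))
```

The functional form is fixed in advance: the model can only reweight the three terms, which is precisely the limitation that nonlinear methods relax.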

The most crucial difference between linear and nonlinear scoring functions is that the former requires a predefined functional form (e.g., the sum of terms in the case of linear scoring functions), whereas the latter implicitly derives the mathematical relationship between the descriptors, allowing the combination of variables and higher order exponents for the terms. This advantage of nonlinear scoring functions partially circumvents the problematic modeling assumptions of linear models (Dill, 1997; Baum et al., 2010; Sotriffer, 2012).

Linear scoring functions developed to date have shown moderate correlations (R<sub>P</sub> ∼ 0.6), whereas nonlinear models achieved significantly better correlations (R<sub>P</sub> > 0.7) on benchmarking studies (Ashtawy and Mahapatra, 2012; Khamis and Gomaa, 2015; Wang and Zhang, 2017; Wójcikowski et al., 2017). RF, SVM, and more recently, DL, are nonlinear algorithms widely used to develop scoring functions.

The superiority of nonlinear models has also been confirmed by rebuilding linear scoring functions with nonlinear algorithms, i.e., by training scoring functions with the same descriptors as the corresponding linear model but with a different regression method. As an example, Zilian and Sotriffer (2013) trained an RF scoring function using the same SFCscore descriptors (named SFCscoreRF) and found a much improved model, with R = 0.779, significantly higher than the correlations obtained for the SFCscore linear models (Pason and Sotriffer, 2016). Li et al. (2014a) investigated the replacement of MLR by RF for regression using the same Cyscore descriptors and found that the nonlinear model improved the affinity prediction. Furthermore, they observed that larger training sets and a description of the complexes with more descriptors have a positive impact on the predictive performance of the nonlinear models. Pason and Sotriffer (2016) demonstrated that it is possible to achieve performances similar to those of nonlinear models by developing a set of linear scoring functions trained on clustered (smaller and more homogeneous) datasets of protein–ligand complexes. In fact, many machine-learning techniques are based on this approach. For example, locally weighted linear regression automatically generates distinct "local" linear models by weighting the training points according to their similarity to the instance to be predicted.
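Locally weighted linear regression, mentioned at the end of the paragraph above, can be sketched for a single descriptor as follows. The Gaussian kernel, the bandwidth value, and the name `lwlr_predict` are illustrative assumptions; real scoring functions use many descriptors and a similarity measure defined over all of them.

```python
import math

def lwlr_predict(x_query, xs, ys, bandwidth=0.5):
    """Predict y at x_query with a weighted linear fit in which each training
    point is weighted by its Gaussian similarity to the query."""
    weights = [math.exp(-((x - x_query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return mean_y + slope * (x_query - mean_x)

# Toy data with a nonlinear trend that a single global line cannot follow.
xs = [0.5 * i for i in range(21)]  # 0.0, 0.5, ..., 10.0
ys = [x ** 2 for x in xs]
local_prediction = lwlr_predict(5.0, xs, ys, bandwidth=0.5)
global_prediction = lwlr_predict(5.0, xs, ys, bandwidth=1e6)  # nearly uniform weights
```

With a very large bandwidth all points receive nearly the same weight and the method reduces to one global linear fit, which is far from the true value of 25 at x = 5. A small bandwidth fits a separate line around each query and stays close to it.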

DL is considered a promising approach for diverse drug discovery projects, motivated by the successes obtained in image and speech recognition problems (Zhang L. et al., 2017). Such methods take advantage of the recent increase in computational power and the ever-expanding availability of structural and binding data. DL methods are neural networks with many hidden layers, capable of automatically learning the complicated relationships between the descriptors related to protein–ligand binding. Recently, DL has been applied to pose/affinity prediction and active/inactive detection, exhibiting an outstanding performance when compared with several well-performing scoring functions developed with both linear and nonlinear approaches (Wallach et al., 2015; Khamis et al., 2016; Pereira et al., 2016; Ragoza et al., 2017; Jiménez Luna et al., 2018; Nguyen et al., 2018).

Although nonlinear scoring functions have the main advantage of not requiring a predefined functional form, their main drawback is that they work as "black boxes", since the relationship between the descriptors is often unclear, requiring careful use to avoid meaningless interpretations (Gabel et al., 2014). Combined with the use of a large number of descriptors lacking physical meaning, nonlinear models carry the risk of producing excellent performance indexes due to overfitting and/or bias toward the way the training set was constructed (e.g., capturing the rules adopted during the selection of active and decoy compounds) (Hawkins, 2004; Abu-Mostafa et al., 2012).

### CHALLENGING TOPICS AND PROMISING STRATEGIES

### Protein Flexibility

Protein flexibility is still a great challenge for docking programs and scoring functions (Cavasotto and Singh, 2008; Tuffery and Derreumaux, 2012; Buonfiglio et al., 2015; Spyrakis and Cavasotto, 2015; Kurkcuoglu et al., 2018). Most docking methodologies adopt a single, rigid conformation of the receptor, because the computational cost and the methodological limitations grow with the degree of flexibility considered. However, over the last decades, many strategies have been implemented in docking programs to consider some degree of flexibility in the target, such as soft potentials and ensemble docking. In this context, the development of scoring functions adapted for flexible receptor docking is crucial to achieve real improvements in pose and affinity prediction (Totrov and Abagyan, 1997; Wei et al., 2002; Fischer et al., 2014; Ravindranath et al., 2015; Lam et al., 2017; Kong et al., 2018). Ferrari et al. (2004) implemented the fast and methodologically simple soft-docking strategy in the DOCK program, softening the repulsive term of the Lennard-Jones potential to allow small overlaps between protein and ligand atoms. They also validated the methodology in VS studies of potential ligands of T4 lysozyme and aldose reductase and obtained better results than with regular docking strategies. Ensemble docking implicitly considers receptor flexibility by docking the ligand on a set of protein conformations instead of a single conformation, and is capable of simulating large-scale receptor flexibility (Korb et al., 2012). Recently, Fischer et al. (2014) successfully identified new ligands targeting specific receptor conformations of cytochrome c peroxidase using a flexible docking method that samples and weights protein conformations guided by experimentally derived conformations, integrating Boltzmann-weighted energy penalties related to the protein flexibility into the DOCK3.7 scoring function. Despite the many efforts made to include protein flexibility in VS experiments, the complex and multifactorial framework of flexible protein–ligand binding is still a great challenge (Bottegoni et al., 2011; Nunes-Alves and Arantes, 2014; Antunes et al., 2015; Buonfiglio et al., 2015; Kong et al., 2018). Whereas the high computational cost of sampling protein conformations and docking large compound libraries can be overcome with high-performance computing platforms, weighting such conformations and integrating them with the scoring functions remains a hindrance to the accurate estimation of binding affinities on flexible systems.
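The idea of weighting receptor conformations in ensemble docking can be illustrated with a simple Boltzmann-type penalty. The sketch below is an assumption made for illustration and does not reproduce the DOCK3.7 implementation of Fischer et al. (2014): each conformation is given a penalty of −kT ln(occupancy), which is added to the docking score obtained against that conformation. The occupancies, scores, and function names are hypothetical.

```python
import math

KT = 0.593  # kcal/mol, approximately kT at 298 K

def conformation_penalty(occupancy, kt=KT):
    """Energy penalty for selecting a conformation with the given relative
    occupancy (0 < occupancy <= 1); rarely populated states cost more."""
    return -kt * math.log(occupancy)

def ensemble_score(docking_scores, occupancies):
    """Combine per-conformation docking scores (lower is better) with the
    conformational penalties; return (best total score, conformation index)."""
    totals = [s + conformation_penalty(p) for s, p in zip(docking_scores, occupancies)]
    best = min(range(len(totals)), key=totals.__getitem__)
    return totals[best], best
```

A ligand that scores only slightly better against a rarely populated conformation is therefore still assigned to the dominant conformation, whereas a sufficiently large gain in docking score can pay for the penalty.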

#### Solvation

Water molecules play an essential role in the ligand–protein binding process. Besides the hydrophobic and desolvation effects, individual water molecules can stabilize the ligand binding mode through the formation of water bridges or a water-mediated hydrogen-bond network (Poornima and Dean, 1995; Levy and Onuchic, 2006). The correct prediction of the free energy of binding associated with the displacement of water molecules by the ligand is a key challenge for the currently available docking scoring functions (Riniker et al., 2012; Spyrakis and Cavasotto, 2015; Bodnarchuk, 2016). An interesting approach is the use of a water-mapping protocol based on the post-trajectory analysis of explicit-solvent MD. This analysis is based on inhomogeneous solvation theory and tries to predict the free energy cost of moving a water molecule from a protein hydration site into the bulk solvent (Yang et al., 2013). For instance, in the WScore docking methodology, the location and thermodynamics of explicit waters are predicted using WaterMap and integrated into the scoring function together with a desolvation term that penalizes the associated desolvation of polar or uncharged groups of the protein or ligand (Murphy et al., 2016). Many solvent mapping methods were evaluated on real drug design studies in a recent paper (Bucher et al., 2018), which showed that such methods can help ligand optimization and the correct ranking of compounds to assist synthetic prioritization. However, these approaches only calculate the solvent contribution to the free energy and must be combined with other methods to be used for lead optimization or VS.

Recently, Bodnarchuk (2016) published an extensive review of water-placement methods that help locate conserved water molecules within the protein binding site to be considered explicitly during the docking simulation. Once the water molecules are identified, some docking engines offer strategies to treat them explicitly with adapted scoring functions. The GOLD program uses an all-atom, flexible water model able to rotate around its three principal axes, and rewards water displacement in the GoldScore or ChemScore scoring functions according to a balance between the loss of rigid-body entropy and the change in the interaction energies on binding to the protein cavity (Verdonk et al., 2005). In AutoDock4, explicit water molecules of the first hydration shell are represented as uncharged spheres directly attached to the ligand, whereas a hydration force field accounting for the entropic and enthalpic contributions automatically predicts their potential in mediating protein–ligand interactions (Forli and Olson, 2012).

#### Covalent Docking

All the discussion so far in this review assumes that we are dealing with non-covalent inhibitors. In such cases, the development of computer-aided strategies to identify or improve lead compounds is based on the identification of non-covalent interactions (e.g., electrostatic, van der Waals, hydrophobic interactions) to improve potency or increase selectivity. However, there is a whole class of inhibitors that form a covalent bond with their enzyme/receptor target (De Cesco et al., 2017). Covalent inhibitors can be further divided into two categories according to whether inhibition is reversible or irreversible (Tuley and Fast, 2018). The development of covalent-docking methodologies capable of dealing with this type of inhibition is very important due to the potential advantages associated with covalent inhibitors (De Cesco et al., 2017), including (i) sustained duration of action leading to less frequent dosing, (ii) increased ligand efficiency, (iii) ability to inhibit targets with shallow binding sites previously categorized as "undruggable," and (iv) increased ability to overcome resistance mutations, among others. The development of non-covalent inhibitors in a drug-design study is usually guided by the optimization of affinity or dissociation constants (i.e., K<sub>i</sub>, K<sub>d</sub>, IC<sub>50</sub>). However, dealing with covalent inhibition is even more complex: to address the full potential of a covalent inhibitor, we need to measure not only its affinity but also kinetic binding parameters (e.g., the residence time t<sub>r</sub>, the average time that a ligand remains bound in the binding site) (De Cesco et al., 2017; Trani et al., 2018). The development of docking methodologies to predict poses and binding affinities of ligands that bind covalently to the receptor is a challenging task.

Due to the increasing interest in covalent drugs, many non-covalent docking programs have developed covalent versions, and some new docking programs focused on covalent ligands have been developed (Kumalo et al., 2015; Awoonor-Williams et al., 2017; De Cesco et al., 2017). GOLD (Jones et al., 1997), AutoDock4 (Bianco et al., 2016), CovalentDock (Ouyang et al., 2013), CovDock (Zhu et al., 2014), DOCKovalent (London et al., 2014), and DOCK-TITE (Scholz et al., 2015) are some examples of docking programs with specific methodologies to deal with covalent docking. These methodologies were discussed in recent reviews addressing covalent inhibitors and covalent docking (Kumalo et al., 2015; Awoonor-Williams et al., 2017; De Cesco et al., 2017). Some of these methods try to capture the complexity of covalent inhibition by introducing modifications into their non-covalent scoring functions, for example, a Morse potential to describe the energy associated with bond formation (CovalentDock). Two critical aspects in the future development of covalent scoring functions are the capacity to predict the kinetics of ligand binding (e.g., residence times) and the intrinsic reactivity of electrophilic and nucleophilic pairs of atoms (De Cesco et al., 2017).

#### Quantum Mechanics

The use of quantum mechanical methods can improve the description of protein–ligand interactions and, in principle, could provide a more accurate binding affinity (Raha and Merz, 2005; Chaskar et al., 2017; Crespo et al., 2017; Cavasotto et al., 2018). This is particularly true for systems where the molecular recognition involves bond formation, π-stacking, cation-π, halogen bonding (i.e., σ-hole bonding), and polarization and charge transfer effects (Christensen et al., 2016). These non-classical interactions/effects are beyond the limits of classical methods and represent a significant challenge to the development of scoring functions for computational drug design experiments. In particular, metal ion interactions are essential when dealing with metalloproteins and, due to the large changes in the electronic structure upon ligand binding, are also a great challenge. In the last 10 years, important advances were made in computing hardware (e.g., Graphics Processing Units, GPUs), in the development of quantum algorithms to compute molecular wave functions (Dixon and Merz, 1997; Birgin et al., 2013), in the development of more reliable semi-empirical quantum methods (Christensen et al., 2016; Yilmazer and Korth, 2016), and in the development of new hybrid QM/MM methods (Chaskar et al., 2017; Melo et al., 2018). These advances were essential to overcome the bottleneck of the high computational cost and are allowing the increasing use of QM methods in the prediction of protein–ligand binding affinities (Crespo et al., 2017). Recent high-quality reviews cover applications of explicit QM calculations in lead identification and optimization (Adeniyi and Soliman, 2017; Crespo et al., 2017; Cavasotto et al., 2018), the development of QM methods for ligand binding affinity calculations (Ryde and Söderhjelm, 2016), and the development of semi-empirical QM methods for non-covalent interactions (Christensen et al., 2016; Yilmazer and Korth, 2016).

The results obtained using QM or hybrid QM/MM-based methods are very encouraging when compared to the standard scoring functions, especially when dealing with metalloproteins (Chaskar et al., 2017; Pecina et al., 2018). Wang et al. (2011) rebuilt the AutoDock4 scoring function using ligand partial charges calculated with QM methods and protein charges from Amber99SB instead of the Gasteiger method, improving both pose and affinity predictions. Moreover, the results from the 2016 D3R Grand Challenge indicate that the use of QM/MM scoring could be a powerful strategy (Gao et al., 2018). Yang et al. (2015) developed the quantum mechanics-based term XBScoreQM, a combination of van der Waals and electrostatic potentials describing X-bond interactions, and introduced it into the AutoDock4 scoring function. The new scoring function achieved good performances on both pose and affinity prediction when compared against 12 diverse scoring functions, and showed an increased predictive capacity for protein–ligand complexes with X-bond interactions. Nevertheless, it is important to note that QM-based approaches are not guaranteed to always outperform standard scoring functions (Crespo et al., 2017), and they still face the same problems associated with the correct estimation of the contribution of solvent and other entropic effects to the protein–ligand binding free energy.

### Consensus Scoring

Combining different scoring functions in a consensus scoring (CS) scheme is considered a promising data fusion strategy to improve VS enrichment, pose prediction, and affinity prediction (Charifson et al., 1999; Bissantz et al., 2000; Yang et al., 2005; Kaserer et al., 2015; Chaput et al., 2016a; Chaput and Mouawad, 2017; Ericksen et al., 2017). The CS strategy can overcome, to some extent, the limitations of the single-scoring approach, for example, inconsistent performance across different protein targets and chemical classes (Moitessier et al., 2009). Moreover, CS is frequently used together with ensemble docking, in which scores are predicted for different conformations of the protein target under investigation (Park et al., 2009, 2010; Paulsen and Anderson, 2009; Kelemen et al., 2016; Baumgartner and Evans, 2018; Li D.-D. et al., 2018).

Since the pioneering work of Charifson et al. (1999), many consensus strategies have been developed and assessed on several target proteins, such as cyclooxygenases (Kaserer et al., 2015) and β-secretases (Liu et al., 2012). For instance, Kaserer et al. (2015) applied CS in prospective VS studies against cyclooxygenases 1 and 2 and found that the chance of a compound being truly active increases as more tools predict it to be active. Wang and Wang (2001) provided a theoretical basis for the effectiveness of CS in affinity prediction. They demonstrated that CS works for a simple statistical reason related to the law of large numbers: the mean of repeated independent predictions tends toward the true (expected) value.
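The statistical argument can be illustrated with a small simulation. The sketch below is not taken from Wang and Wang (2001); it assumes, purely for illustration, that each hypothetical scoring function returns the true affinity plus independent, unbiased Gaussian noise. The function name, the affinity value, and the noise level are all invented for the example.

```python
import random
import statistics


def consensus_error(true_affinity, n_functions, noise_sd, n_trials, seed=0):
    """Mean absolute error of the averaged prediction over many simulated trials."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        # Assumption: each scoring function = truth + independent Gaussian noise.
        predictions = [rng.gauss(true_affinity, noise_sd) for _ in range(n_functions)]
        errors.append(abs(statistics.fmean(predictions) - true_affinity))
    return statistics.fmean(errors)


# Hypothetical values: true affinity of -8.0 kcal/mol, noise SD of 2.0 kcal/mol.
single = consensus_error(-8.0, n_functions=1, noise_sd=2.0, n_trials=2000)
consensus = consensus_error(-8.0, n_functions=10, noise_sd=2.0, n_trials=2000)
print(f"single function: {single:.2f}  consensus of 10: {consensus:.2f}")
```

Under these idealized assumptions the error of the averaged prediction shrinks roughly with the square root of the number of functions. Real scoring functions share descriptors and training data, so their errors are correlated and the practical gain is expected to be smaller.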

Traditional CS approaches combine the predictions of the scoring functions using statistical methods (e.g., the arithmetic mean) or voting schemes (i.e., a vote replaces the absolute score predicted by each scoring function) (Terp et al., 2001; Wang and Wang, 2001; Wang et al., 2002; Bar-Haim et al., 2009; Ericksen et al., 2017). Nonlinear CS models were also developed to improve pose prediction and compound ranking in VS experiments (Betzi et al., 2006; Teramoto and Fukunishi, 2007; Ashtawy and Mahapatra, 2015; Ericksen et al., 2017). For example, Ericksen et al. (2017) developed machine-learning CS using discrete mixture models and gradient boosting to combine the scores from eight docking programs and obtained better performance than individual scoring functions on 21 targets from the DUD-E dataset. In addition, they compared their machine-learning-based CS with individual scoring functions and traditional CS schemes, confirming that CS outperforms individual scoring functions in docking-based VS and is less sensitive to protein target variation.
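A voting scheme of the kind described above can be sketched in a few lines. This is a minimal illustration rather than any published protocol: the scoring-function names, compound identifiers, scores, and the `top_fraction` cutoff are hypothetical, and lower scores are assumed to be better.

```python
def rank_by_vote(scores_by_function, top_fraction=0.5):
    """Count, for each compound, how many scoring functions place it in their top fraction.

    scores_by_function: {function_name: {compound: score}}, lower score = better.
    """
    votes = {}
    for scores in scores_by_function.values():
        ranked = sorted(scores, key=scores.get)  # best (lowest) score first
        n_top = max(1, int(len(ranked) * top_fraction))
        for compound in scores:
            votes.setdefault(compound, 0)
        for compound in ranked[:n_top]:
            votes[compound] += 1  # a vote replaces the absolute score
    # Most votes first; ties broken by compound name for reproducibility.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from three functions that use different scales.
scores = {
    "sf_a": {"cpd1": -9.1, "cpd2": -7.0, "cpd3": -8.5, "cpd4": -5.2},
    "sf_b": {"cpd1": -40.0, "cpd2": -55.0, "cpd3": -48.0, "cpd4": -30.0},
    "sf_c": {"cpd1": -6.8, "cpd2": -6.1, "cpd3": -7.4, "cpd4": -6.9},
}
ranking = rank_by_vote(scores)
print(ranking)
```

Because only the rank within each function matters, voting avoids having to normalize scores that are reported on different scales, which an arithmetic mean would require.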

### Tailored Scoring Functions for Protein Targets and Classes

Significant improvements in docking and VS accuracy have been reported for target-specific scoring functions, which are trained on protein–ligand complexes of a specific molecular target or target class rather than on a general dataset. Such functions are expected to account more efficiently for the specific interactions and particular binding characteristics of the target class of interest (Seifert, 2009).

For instance, Logean et al. (2001) adapted the Fresno empirical scoring function to the class I MHC HLA-B\*2705 protein, with a significant improvement in affinity prediction over six different traditional scoring functions. The GOLD program also implements a modified version of the ChemScore function, with an additional term that accounts for weak hydrogen bonds that are claimed to be relevant for the binding of some kinase inhibitors (Pierce et al., 2002; Verdonk et al., 2004). HADDOCKPPI is a linear scoring function specifically developed to predict the binding affinities of inhibitors of protein–protein interactions (iPPIs). Compared to enzyme inhibitors, iPPIs are characterized by higher hydrophobicity, aromaticity, and molecular weight, as they usually bind within flatter, larger, and more hydrophobic sites than enzyme catalytic sites (Morelli et al., 2011; Kuenemann et al., 2014). In a more recent work, a scoring function specific to Heat Shock Protein 90 (HSP90) was successfully designed and applied in VS (Santos-Martins, 2016). In general, nonlinear scoring functions specific for protein classes/targets also achieved better performance than the generic models (Wang et al., 2015; Ashtawy and Mahapatra, 2018). Still, in the recent work of Wójcikowski et al. (2017), target-specific scoring functions trained with RF performed only slightly better than generic models, with two-thirds of them increasing the EF1% by less than 10%. Intriguingly, tailored scoring functions were more beneficial for protein targets with fewer known active compounds; for targets with more actives, they performed similarly to the generic model.
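For readers unfamiliar with the EF1% metric mentioned above, the following sketch computes an enrichment factor from a ranked list of actives and decoys, using the common definition (fraction of actives in the top-ranked subset divided by the fraction of actives in the whole library). The two ranked lists are invented for illustration and do not reproduce any data from Wójcikowski et al. (2017).

```python
def enrichment_factor(ranked_labels, fraction=0.01):
    """Enrichment factor at a given fraction of a ranked screening library.

    ranked_labels: list of 1 (active) or 0 (decoy), ordered best score first.
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and at least one active")
    n_top = max(1, round(n_total * fraction))
    hits_top = sum(ranked_labels[:n_top])
    # (active rate in the top subset) / (active rate in the whole library)
    return (hits_top / n_top) / (n_actives / n_total)


# Hypothetical library: 1000 compounds, 20 actives.
generic = [1] * 4 + [0] * 6 + [1] * 16 + [0] * 974   # 4 actives in the top 1%
tailored = [1] * 5 + [0] * 5 + [1] * 15 + [0] * 975  # 5 actives in the top 1%

ef_generic = enrichment_factor(generic)
ef_tailored = enrichment_factor(tailored)
print(ef_generic, ef_tailored, (ef_tailored - ef_generic) / ef_generic)
```

With few actives in a library, a single additional active in the top 1% changes EF1% substantially, which is one reason such comparisons are sensitive to dataset composition.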

Despite the encouraging results obtained with target-specific scoring functions, the requirement of a large training set to derive a robust scoring function can become a significant hindrance and a source of inaccuracy. To compensate for the shortage of experimental structures, the protein–ligand conformations used to train target-specific scoring functions are commonly obtained from docking experiments.

### CONCLUSION

The development of accurate empirical scoring functions to predict protein–ligand binding affinities is a key aspect of SBDD. In recent years, the increasing availability of protein–ligand structures with measured binding affinities and of data sets containing active, decoy, and true inactive compounds has boosted the use of sophisticated machine-learning techniques to obtain better-performing scoring functions. In the coming years, the combination of larger training datasets, non-physical/simplified descriptors, and DL techniques is expected to be a very promising research line for improving scoring functions for structure-based VS. Methodological advances will depend on the size and quality of the datasets available for training and benchmarking, and great care will be necessary to avoid artificially high performance caused by the increased capacity of these nonlinear methods to capture bias present in the training data. In this sense, blinded community challenges with unpublished data (e.g., the D3R challenge) are essential to assess the real performance of scoring functions and docking protocols. Looking to the other side of the methodological spectrum, it is exciting to note that advances in computing power, the development of new algorithms to introduce protein flexibility and solvation/desolvation effects, and more reliable semi-empirical quantum methods are enabling new approaches to challenging tasks, such as QM/MM-based methods and entropy estimation.

The full potential of scoring functions will be achieved when models accurate enough to be useful in hit-to-lead optimization and de novo design studies are developed. To reach this goal, a scoring function must be sensitive to the docking pose, that is, right for the right reasons (Kolb and Irwin, 2009). Reliable prediction of ligand binding affinity remains a big challenge, but we expect that in the coming years important advances associated with distinct methodological approaches will be achieved and, probably, combined into more effective computer-based drug design protocols.

### AUTHOR CONTRIBUTIONS

IG and LD designed, wrote, and edited this review. FP contributed to designing and writing the review.

### FUNDING

This work was supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (Grant No. 308202/2016-3), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), and Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ) (Grant No. E-26/010.001229/2015).

### REFERENCES



comprehensive study by crystallography and isothermal titration calorimetry. J. Mol. Biol. 397, 1042–1054. doi: 10.1016/j.jmb.2010.02.007




kinase inhibitor. J. Mol. Graph. Model. 79, 81–87. doi: 10.1016/j.jmgm.2017.11.003



integrative modeling of biomolecular complexes. J. Mol. Biol. 428, 720–725. doi: 10.1016/j.jmb.2015.09.014


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Guedes, Pereira and Dardenne. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Exploring G Protein-Coupled Receptors (GPCRs) Ligand Space via Cheminformatics Approaches: Impact on Rational Drug Design

Shaherin Basith†, Minghua Cui†, Stephani J. Y. Macalino, Jongmi Park, Nina A. B. Clavio, Soosung Kang\* and Sun Choi\*

College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul, South Korea

#### Edited by:

Leonardo G. Ferreira, University of São Paulo, Brazil

#### Reviewed by:

Doriano Lamba, Consiglio Nazionale Delle Ricerche (CNR), Italy; Dharmendra Kumar Yadav, Gachon University of Medicine and Science, South Korea; Ana Carolina Rennó Sodero, Universidade Federal do Rio de Janeiro, Brazil

#### \*Correspondence:

Soosung Kang, sskang@ewha.ac.kr; Sun Choi, sunchoi@ewha.ac.kr

† These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 08 December 2017; Accepted: 06 February 2018; Published: 09 March 2018

#### Citation:

Basith S, Cui M, Macalino SJY, Park J, Clavio NAB, Kang S and Choi S (2018) Exploring G Protein-Coupled Receptors (GPCRs) Ligand Space via Cheminformatics Approaches: Impact on Rational Drug Design. Front. Pharmacol. 9:128. doi: 10.3389/fphar.2018.00128

The primary goal of rational drug discovery is the identification, through the exploration of total chemical space, of selective ligands that act on single or multiple drug targets to achieve the desired clinical outcome. To identify such compounds, computational approaches are necessary for predicting their drug-like properties. G Protein-Coupled Receptors (GPCRs) represent one of the largest and most important integral membrane protein families. These receptors serve as increasingly attractive drug targets due to their relevance in the treatment of various diseases, such as inflammatory disorders, metabolic imbalances, cardiac disorders, cancer, and monogenic disorders. In the last decade, a multitude of three-dimensional (3D) structures were solved for diverse GPCRs, and this period has therefore been called the "golden age for GPCR structural biology." Moreover, the accumulation of data about the chemical properties of GPCR ligands has garnered much interest in the exploration of GPCR chemical space. Due to the steady increase in structural, ligand, and functional data on GPCRs, several cheminformatics approaches have been implemented in the GPCR drug discovery pipeline. In this review, we mainly focus on the cheminformatics-based paradigms in GPCR drug discovery. We provide a comprehensive view of the ligand- and structure-based cheminformatics approaches, which are best illustrated via GPCR case studies. Furthermore, we discuss the appropriate combination of ligand-based and structure-based knowledge, i.e., the integrated approach, which is emerging as a promising strategy for cheminformatics-based GPCR drug design.

Keywords: GPCR, cheminformatics, drug discovery, ligand-based drug design, structure-based drug design

### INTRODUCTION

Rational drug design is the inventive process of identifying pharmaceutically relevant drug candidates based on the information garnered from a biological target (Jazayeri et al., 2015). Discovery of ligands that modulate a target's activity has contributed largely to the understanding of both physiological and pathological processes (Wacker et al., 2017a). Navigating vast chemical space to identify such ligands seems a daunting task (Oprea and Gottfries, 2001; Lipinski and Hopkins, 2004). Techniques including medicinal chemistry, combinatorial chemistry, and high-throughput screening (HTS) are helpful in the identification of ligands that can serve as effective modulators of pharmaceutically attractive targets. However, considering the astronomical number of possible drug-like candidates (∼10<sup>23</sup>–10<sup>60</sup>), the chemical space assessed by experimental techniques is still limited (Rodríguez et al., 2016; Mullard, 2017). In such a scenario, cheminformatics, which is part of the in silico realm, dominates the exploration of a larger fraction of chemical space.

Cheminformatics was defined by Brown (1998) as the combination of all available information that can be used in the optimization of a ligand to a potential drug candidate (Bajorath, 2004). This discipline aids in storing, searching, managing, and analyzing huge amounts of chemical data, thereby expediting the development of novel ligand phenotypes (Bajorath, 2004; Valerio and Choudhuri, 2012). Additionally, the extraction of information and knowledge from chemical data can help in modeling relationships between chemical structures and biological activities, and in predicting the bioactivity of other compounds from their structures (Schuffenhauer et al., 2006; Humbeck and Koch, 2017). Cheminformatics fuses chemical data from drug candidates with biological data from drug targets to identify new chemical entities (NCEs) and to improve the reliability of data outcomes.

In the drug discovery pipeline, several cheminformatics approaches play an important role in the identification of drug targets and lead compounds, as well as in the prediction of ADMET properties (**Figure 1**). Chemogenomics-based databases, as well as computational polypharmacological analyses, have increased in popularity over the last several years as supplementary methods for the identification and validation of potential drug targets (Xie et al., 2014). Once a drug target is identified, lead candidates with desirable properties are screened out of huge chemical compound libraries, underscoring the importance of cheminformatics tools in virtual screening (VS) (Varnek and Tropsha, 2008). Another powerful cheminformatics approach, machine learning, is employed to identify novel drug candidates from lead compounds via the generation of computational models (Lee et al., 2010, 2017; Varnek and Baskin, 2012; Mitchell, 2014). Other cheminformatics approaches, including similarity and substructure searching, can be used to identify novel scaffolds from large compound repositories (Vass et al., 2016). The candidate compounds retrieved can then be docked onto the target protein to estimate their possible binding affinities toward the target (Lenselink et al., 2016b). Once drug-like candidates are identified, they can be further evaluated for ADMET properties using computational models, which helps eliminate undesired compounds at an early stage of drug discovery and minimizes the time and costs involved.
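As a concrete illustration of similarity searching, the sketch below ranks library compounds by their Tanimoto coefficient to a query. It assumes fingerprints are already available as sets of "on" bit indices; the bit sets, compound names, and threshold are invented for the example. In practice, fingerprints would be generated with a cheminformatics toolkit such as RDKit rather than written by hand.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query_fp, library, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: -pair[1])


# Hypothetical fingerprints (sets of on-bit indices).
query = {1, 2, 3, 4}
library = {
    "analog_a": {1, 2, 3, 4, 5},
    "analog_b": {1, 2, 9},
    "unrelated_c": {7, 8},
}
hits = similarity_search(query, library)
print(hits)
```

The threshold is a tunable choice: lower values retrieve more distant scaffolds at the cost of more false positives.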

G protein-coupled receptors (GPCRs) belong to a large family of signaling proteins that mediate cellular responses to most hormones, metabolites, cytokines, and neurotransmitters, and therefore serve as "fruitful targets" for drug discovery (Shoichet and Kobilka, 2012). More than 800 genes comprise this receptor family, which modulates several signaling processes involved in behavior, blood pressure regulation, cognition, immune response, mood, smell, and taste (Thomsen et al., 2005). GPCRs are categorized into six classes based on sequence and function, namely Class A (rhodopsin-like receptors), Class B (secretin family), Class C (metabotropic glutamate receptors), Class D (fungal mating pheromone receptors), Class E (cAMP receptors), and Class F (frizzled [FZD] and smoothened [SMO] receptors) (Lee et al., 2018). All GPCR members share a common seven-transmembrane (7TM) architecture linked by three extracellular (ECL) and three intracellular (ICL) loops (Ciancetta et al., 2015). However, they have low sequence identity and possess different extracellular N-terminal domains and diverse ligand-binding pockets (**Figure 2**). In the case of class A GPCRs, the endogenous ligand is recognized by a ligand-binding site in the 7TM region. For class B GPCRs, the ligand is recognized by both the extracellular and 7TM domains. For class C GPCRs, the ligand-binding pocket is found in the extracellular domain (ECD), which contains a Venus flytrap (VFT) module. In the case of class F GPCRs, both SMO and FZD receptors possess an ECD that comprises an extracellular cysteine-rich domain (CRD) and an ECD linker domain. The endogenous lipoglycoprotein ligand Wnt binds to the CRD of the FZD receptors (Wang et al., 2013; Wu et al., 2014). Upon ligand binding, GPCRs activate at least one of two signaling partners, namely heterotrimeric GTP-binding proteins (G-proteins) or β-arrestins, and mediate signal flow via modulation of various downstream effectors.

GPCR drug discovery has been successful, and many of the world's top-selling drugs target this receptor family (Sriram and Insel, 2018). Class A GPCRs are the most intensively investigated GPCR drug targets owing to their centrality in disease, structural availability, and relative accessibility. The high druggability of GPCRs and their central role in diseases (including Alzheimer's disease, cancer, diabetes, obesity, and psychiatric disorders) provide strong impetus for continued drug discovery and development efforts (Tautermann, 2016). A recent study of all GPCR drugs and agents currently in clinical trials revealed that 475 drugs (i.e., ∼34% of all drugs approved by the Food and Drug Administration [FDA]) mediate their effects through 108 unique GPCRs (Hauser et al., 2017). Additionally, the success rates for GPCR-targeted agents in the last 5 years were 78% (phase I), 39% (phase II), and 29% (phase III) (Hauser et al., 2017). The most recently FDA approved GPCR-targeted drug is Zilretta (triamcinolone acetonide extended-release injectable suspension), a glucocorticoid receptor agonist, which is used for the pain management of knee osteoarthritis (https://www.drugs.com/history/zilretta.html).
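To put the quoted figures in perspective, the short calculation below derives two rough quantities from them. It assumes, as a simplification not made by Hauser et al. (2017), that the three phase success rates are independent and can be multiplied to estimate the chance that an agent entering phase I clears all three phases. The variable names are illustrative.

```python
# Phase success rates quoted above for GPCR-targeted agents.
phase_success = {"phase I": 0.78, "phase II": 0.39, "phase III": 0.29}

# Assumption: phases are sequential and independent, so probabilities multiply.
overall = 1.0
for rate in phase_success.values():
    overall *= rate

# 475 GPCR drugs are stated to be about 34% of all FDA-approved drugs.
implied_total_approved = 475 / 0.34

print(f"chance of clearing all three phases: {overall:.1%}")
print(f"implied number of FDA-approved drugs: {implied_total_approved:.0f}")
```

Both numbers are back-of-the-envelope estimates; the first ignores regulatory review after phase III, and the second inherits the rounding of the ∼34% figure.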

To utilize cheminformatics approaches in GPCR drug design, it is essential to understand the nature of the ligands, the structural intricacies of the receptor, ligand-receptor interactions, and the interaction of the receptors with downstream signaling complexes or other signaling partners. Additionally, unveiling the relationships among ligand, receptor, and effector is necessary to investigate positive and negative allosterism, inverse agonism, biased signaling, and multimeric receptor pharmacology (Lane et al., 2017). The recent upsurge in GPCR crystal structures provides a robust 3D structural framework for the identification of pharmaceutically relevant ligands using ligand- and structure-based computational approaches, including molecular modeling of receptor dynamics, ligand docking, and virtual ligand screening (VLS) (Coudrat et al., 2017a). Following its successful application to targets such as kinases, proteases, and other enzyme families, VLS is also becoming a popular ligand screening tool for GPCRs (Heifetz, 2018). The success of structure-based VLS is reflected in the encouragingly high hit rates, ranging from 20 to 70%, in the identification of novel ligands for several class A GPCRs (**Table 1**).

In this review, we deliver a comprehensive assessment of the state-of-the-art cheminformatics approaches for GPCR drug discovery, with successful examples from the literature. First, insights into GPCR ligand space and its recent structural advances are summarized. Subsequently, the key principles and boundaries of ligand-based, structure-based, and integrated cheminformatics approaches in GPCR drug discovery are discussed in the main text. We also shed some light on the contemporary cheminformatics tools used in GPCR drug discovery. Additionally, the limitations associated with cheminformatics approaches are discussed, which could help readers select the most suitable in silico tool for their research. Lastly, we conclude with a summary of the review contents and the prospects of cheminformatics approaches in GPCR drug discovery.

subdivided into aminergic-like (β2AR, PDB ID: 3P0G), nucleotide-like (A2AAR, PDB ID: 3QAK), peptide-like (µ-OR, PDB ID: 5C1M), and lipid-like receptors (CB1R, PDB ID: 5XRA) along with their bound ligands are shown. Similarly, representative structures for class B (CRF1 [PDB ID: 4K5Y], GCGR [PDB ID: 5EE7], full-length GLP-1R [PDB ID: 5NX2], and CTR [PDB ID: 5UZ7]), class C (mGlu1R [PDB ID: 4OR2]), and class F (SMO [PDB ID: 4QIN] bound to a negative allosteric modulator) are shown. Receptors are shown in cartoon representation and the ligands are shown as stick models with transparent surfaces. Agonists are represented as red sticks, antagonists are shown as purple sticks, and the negative allosteric modulator is shown as a blue stick model.

### BOOMING AGE OF GPCR STRUCTURAL BIOLOGY

The pioneering study of the two-dimensional (2D) structure of bovine rhodopsin (bRho) in 1983 marked the beginning of GPCR structural biology (Hargrave et al., 1983). A decade later, a 2D projection map was calculated from 2D crystals of bRho using electron cryomicroscopy, which served as the basis for the construction of a molecular model of the receptor (Baldwin, 1993; Schertler et al., 1993). However, the first three-dimensional (3D) structure of bRho in its inactive state was released only in 2000 (Palczewski et al., 2000). Despite relentless efforts, elucidation of GPCR structures remained challenging due to several factors, including the need to maintain the structural integrity of the receptors by embedding them in a membrane-like environment, the presence of flexible ECLs and ICLs, the low expression level of the receptors, and their basal signaling activity even in the absence of a ligand. However, all the aforementioned problems have been circumvented with advances in GPCR crystallography, protein engineering, and innovations in biotechnology. Introduction of small, stable fusion proteins (T4 lysozyme and b562RIL) decreased the flexibility of the receptor regions (ICL3, ICL2, and N-terminal regions) and improved the crystal contacts. Likewise, antibody fragments or nanobodies improved the conformational stability of the receptors. Insertion of mutations (the stabilized receptor [StaR] approach) enhanced receptor thermostability in a particular conformational state and increased protein expression levels.

The first structural breakthrough for a human GPCR, i.e., the β2-adrenergic receptor (β2AR) with a diffusible ligand, came in 2007 using different crystallization techniques (Cherezov et al., 2007). Moreover, the first crystal structures for GPCR classes B, C, and F have been solved (Hollenstein et al., 2013; Wang et al., 2013; Wu et al., 2014). So far, experimental structures of 44 distinct GPCRs and ∼205 ligand-receptor complexes covering four classes (A–C and F) are available, most of which belong to the Class A subfamily (Hauser et al., 2017). It is to be noted that most of the existing GPCR structures are inactive ones, bound to an inhibitor. In the last year (2017) alone, more than 40 GPCR crystal structures were determined, which are listed in **Table 2**. GPCR structural studies have revealed the arrangement of the TM domains, the location of the orthosteric, allosteric, bitopic, and biased ligand binding sites, homo- or hetero-oligomerization of receptors, and the structural rearrangements involved in conformational changes upon GPCR activation or inactivation (Manglik and Kruse, 2017; Schrage and Kostenis, 2017). Besides these 3D structural insights, the molecular basis of GPCR signal transduction coupled to G-proteins or β-arrestins was elucidated through X-ray crystallography and electron cryomicroscopy. Oligomeric complex structures of bRho coupled to a G-protein peptide (Rho/GαCT) (Scheerer et al., 2008), human Rho coupled to visual arrestins (Kang et al., 2015; Zhou et al., 2017), β2AR coupled to Gs-protein (Rasmussen et al., 2011) and β-arrestin 1 (Shukla et al., 2014), A2A adenosine receptor (A2AAR) in complex with a mini-Gs protein (Carpenter et al., 2016), glucagon-like peptide 1 receptor (GLP-1R) in complex with a Gs-protein (Zhang et al., 2017), and calcitonin receptor (CTR) coupled to Gs-protein (Liang et al., 2017) have been elucidated. These complex structures provide full mechanistic insights into GPCR and biased signaling, thus underpinning their functional significance and pharmacological targeting.

#### TABLE 1 | Key details of GPCR virtual screening campaigns reported in the last 5 years (2013–2017).

| GPCR class and classification type | Receptor type | VS library and size | Hits/hit rate | Notable hit (activity) | References |
|---|---|---|---|---|---|
| A, nonrhodopsin (peptide-like) | PAR2 | FDA-approved drugs: 1,216 compounds | 4 hits ≥ 50% inhibition at 30 µM | IC<sub>50</sub> = 10 µM | Xu et al., 2015 |
| A, nonrhodopsin (aminergic) | 5-HT6R | ChEMBL: 12,608 compounds | 6 hits (16.7%) | IC<sub>50</sub> = 0.1 µM | Kelemen et al., 2016 |
| A, nonrhodopsin (aminergic) | H1R | ChEMBL: 108,790 compounds | 19 hits (73.1%) | pK<sub>i</sub> = 4.72 | Kooistra et al., 2016 |
| A, nonrhodopsin (aminergic) | β2AR | ChEMBL: 108,790 compounds | 18 hits (52.9%) | pEC<sub>50</sub> = 4.52 | Kooistra et al., 2016 |
| C, metabotropic glutamate | mGlu1R | Asinex: 695,855 compounds | 5 hits (14.3%) | IC<sub>50</sub> = 10.22 µM | Jang et al., 2016 |
| | FGSG\_02655 (Class I, pheromone receptor) | Life Chemicals GPCR Targeted Libraries: 11,571 compounds | 10 VS hits | | Bresso et al., 2016 |
| A, nonrhodopsin (peptide-like) | PAR2 | (a) Asinex: 433,973 compounds; (b) ChemDiv: 1,213,470 compounds | 3 hits ≥ 30% inhibition at 10 µM (6.4%) | IC<sub>50</sub> = 8.22 µM | Cho et al., 2016 |

GPCRs are known to exist or function as monomers, dimers, and/or higher-order oligomers, including homo- or hetero-dimers/oligomers (Guo et al., 2017). In addition to the experimental data accumulated through biochemical and biophysical techniques, structural information on GPCR dimers and higher-order oligomers has been provided by X-ray crystallography. The first higher-order crystal structure of Rho and opsin in native membranes was reported in 2003 (Liang et al., 2003). Subsequently, several structures including rhodopsin and nonrhodopsin class A GPCRs were elucidated (Lee et al., 2018). The oligomeric structures of GPCRs are essential for modulation of receptor function, mediation of cross-talk between GPCRs or other signaling pathways, and cellular trafficking; hence, they have been associated with specific functional effects. Moreover, targeting these oligomeric structures could provide a new arena for drug development and specificity. The wealth of information supporting the existence of homo- and hetero-oligomers of GPCRs can be retrieved from the RCSB PDB (https://www.rcsb.org/pdb/home/home.do) or the GPCR Oligomerization Knowledge Base (Khelashvili et al., 2010). In addition to these structural intricacies, GPCR signaling is also modulated by ligands other than orthosteric ones, which will be discussed in the following sections. Furthermore, adding details such as GPCR dynamics to the structural information would provide a bigger picture to biomedical researchers in this field. The dynamic events triggered upon receptor activation or inhibition can be captured by powerful methodologies including bottom-up Hydrogen Deuterium eXchange Mass Spectrometry (HDX-MS) and resonance energy transfer (RET) (Li et al., 2015; Zhang, 2017). These important structural tools aid GPCR drug design by adding valuable information to our understanding of GPCR function, dynamics, protein-protein interactions, and receptor-ligand interactions (Vilardaga, 2011; Kauk and Hoffmann, 2017). Collectively, the structural studies provide unprecedented insights into the structural and functional diversity of this receptor family. The wealth of structural information on GPCRs is invaluable for ligand-based drug design (LBDD), structure-based drug design (SBDD), and integrated paradigms that complement traditional drug discovery efforts.

<sup>a</sup>Arrestin-bound state of the receptor.

<sup>b</sup>Ligand-free basal state of the receptor.

<sup>c</sup>Fully-active receptor complexed with a G protein.

#### INSIGHTS INTO GPCR LIGAND SPACE

Various signaling pathways involve GPCRs whose activities are mediated by ligand binding. Based on activation intensity, GPCR modulators can be divided into agonists, partial agonists, antagonists, and inverse agonists. Full agonists stimulate maximal GPCR activity, leading to recruitment of downstream proteins for signal transduction. Partial agonists, on the other hand, cannot induce 100% activation of receptors and act as a type of antagonist in the presence of full agonists. However, they can act as full agonists when there are excess receptors and actual full agonists are absent. Antagonists act as agonist blockers and can be divided into neutral antagonists and inverse agonists. Neutral antagonists bind to GPCRs but do not affect the receptor's constitutive activity, whereas inverse agonists block agonist effects and also reduce constitutive activity. These modulators can directly interact with the orthosteric binding site of GPCRs (Wacker et al., 2017a).
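The classification above can be expressed as a simple rule on relative efficacy. The sketch below is illustrative only: it assumes efficacy is normalized so that basal (constitutive) activity is 0.0 and the response to a reference full agonist is 1.0, and the `tolerance` band is an arbitrary, hypothetical choice rather than a pharmacological standard.

```python
def classify_modulator(efficacy, tolerance=0.05):
    """Classify an orthosteric ligand from its normalized efficacy.

    efficacy: response relative to basal activity (0.0) and to a reference
    full agonist (1.0); negative values mean signaling drops below basal.
    """
    if efficacy >= 1.0 - tolerance:
        return "full agonist"
    if efficacy > tolerance:
        return "partial agonist"
    if efficacy >= -tolerance:
        return "neutral antagonist"  # binds without changing basal activity
    return "inverse agonist"         # suppresses constitutive activity


# Hypothetical normalized efficacies.
for value in (1.0, 0.4, 0.0, -0.3):
    print(value, classify_modulator(value))
```

A single number cannot capture system-dependent behavior, such as a partial agonist appearing as a full agonist when receptors are in excess, so real assignments depend on the assay context.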

While the structural architecture of the TM region is largely conserved, the remarkable diversity in GPCR sequences is most notable in the ECL and ICL regions. This lends credence to the capacity of the GPCR family to interact with a wide range of ligands that vary in size, shape, and physicochemical properties, most of which bind to the orthosteric site to modulate receptor activity. In the ECL region, ECL2 plays a critical role in ligand recognition, access, and selectivity (Dror et al., 2011; Kruse et al., 2012; Zhang H. T. et al., 2015). For Class A GPCRs, lipophilic ligands often come from the lipid membrane and access the orthosteric site through the "lid" formed by the N-terminus and ECL2. In the case of hydrophilic ligands, ECL2 of different receptors only partially covers the ECL region through a variety of structures that shape the entrance to the binding pocket (Venkatakrishnan et al., 2013). On the other hand, modulators of class B GPCRs are frequently peptide ligands, which are large and highly flexible and therefore require a more solvent-accessible orthosteric binding pocket (Liang et al., 2017).

The increase in static GPCR structures and advances in molecular dynamics (MD) capabilities have assisted in the elucidation of GPCR-ligand binding interactions. A thorough investigation of the ligand binding pockets of several GPCRs indicated the presence of multiple topologically equivalent residues that form a consensus ligand binding network in almost all Class A receptors, providing an explanation for cross-reactivity and polypharmacology. Moreover, deviations from these consensus binding residues can account for ligand specificity in different GPCR members, and can thus be exploited in the design of specific and potent ligands (Venkatakrishnan et al., 2013). Regardless of the upsurge in information in the last few decades, it is still difficult to understand the differences in ligand binding requirements for agonists, antagonists, and inverse agonists of a given receptor, despite their almost identical structures (**Figure 3**). This calls for more studies focused on identifying key residues for agonism and antagonism, and not only for ligand binding specificity. Along with this, it is important to scrutinize activity cliffs of ligands, as significant shifts in modulation type can result from small changes in ligand structure.

Besides the orthosteric site, GPCR ligands can also bind to allosteric pockets and indirectly modulate receptor activity. Allosteric modulators can be divided into two types: (a) positive allosteric modulators (PAMs), which increase agonist affinity, and (b) negative allosteric modulators (NAMs), which act as allosteric antagonists or inverse agonists to decrease agonist affinity (Christopher et al., 2013, 2015; Kenakin, 2016). Additionally, there are some molecules that can interact with both orthosteric and allosteric sites, known as bitopic modulators (Dror et al., 2013; Fronik et al., 2017). Allosteric modulators can be either endogenous molecules, like sodium and cholesterol (Katritch et al., 2014), or exogenous molecules, like natural products and synthetic compounds. Since allosteric modulators bind to sites other than the orthosteric site, they can co-bind with the putative ligand on the receptor to alter conformation and activity, thus affecting downstream signaling.

In the case of CC chemokine receptor type 9 (CCR9), vercirnon (an antagonist) was co-crystallized and unexpectedly found to interact with the intracellular binding site, blocking G protein coupling (Oswald et al., 2016). Another example of an allosteric modulator is 1-(2-(2-(tert-butyl)phenoxy)pyridin-3-yl)-3-(4-(trifluoromethoxy)phenyl)urea (BPTU), which binds outside the purinergic P2Y<sub>1</sub> receptor, flanking the TM bundle inside the lipid bilayer. While BPTU shows lower potency than the known orthosteric antagonist MRS2500, its allosteric interactions allow higher selectivity for the P2Y<sub>1</sub> receptor (Zhang D. et al., 2015). Apart from small molecule compounds, ions can also function as allosteric modulators, as illustrated by the discovery of the conserved allosteric binding pocket for Na<sup>+</sup> in Class A GPCRs (Katritch et al., 2014).

The current rising star in GPCR research is biased signaling. Previously, GPCRs were presumed to follow a simple two-state receptor model ["on" (activation) and "off" (inactivation)]. However, extensive analyses of different signaling pathways paved the way to the exciting discovery that GPCRs have multiple conformations, each tailored to a specific response and downstream effect. Different ligands induce different receptor conformations, and each conformational state could initiate a specific downstream signal. While this finding increases the difficulty of drug discovery and design, there is also an opportunity to selectively block pathways implicated in various pathologies, while leaving normal homeostatic processes intact (Bologna et al., 2017). Typically, G protein signaling occurs upon agonist binding, whereas arrestin-mediated signaling occurs through arrestin binding. In this instance, a GPCR drug design strategy could be dependent on identifying agonists biased for either G protein or arrestin signaling, leading to higher drug efficacy and diminished adverse effects (DeWire and Violin, 2011). Some excellent examples of biased ligands include lysergic acid diethylamide (LSD) (Wacker et al., 2017b), a well-known hallucinogen which appears to display bias toward β-arrestin signaling, and the synthetic opioids TRV-130 (DeWire et al., 2013) and PZM-21 (Manglik et al., 2016), which are biased toward G protein signaling. Altogether, these accumulated data may provide extremely beneficial hints for the discovery and design of GPCR ligands based on the intended activity and targeted pathology. **Figure 4** depicts some of the common GPCR modulators that are distinguished by activity types.

### CHEMINFORMATICS-BASED PARADIGMS IN GPCR DRUG DISCOVERY

#### Cheminformatics Approaches Based on the Knowledge Derived From GPCR Ligands

Cheminformatics tools are frequently utilized in GPCR research due to the enormous amount of GPCR ligand data. Difficulties in crystallizing membrane proteins and receptor flexibility hindered structural elucidation and drug discovery research for this receptor family. Due to these shortcomings, ligand-based approaches started to thrive in order to provide a better understanding of GPCR function and pharmacology. Some of the major ligand-based cheminformatics approaches are detailed below.

#### Cheminformatics and Virtual Screening

In silico screening methods became widely used after the integration of high throughput screening (HTS) and information technology (Coudrat et al., 2017b). Several computational and VS methods are frequently utilized in different stages of drug discovery and development, but some of the earliest and most commonly used ones are similarity- and QSAR-based strategies due to their efficiency and capability in analyzing simple 2D structures. These strategies are dependent on the principle that similar structures are predicted to display similar activities. Similarity-based methods need at least one established hit whose chemical structure is used to calculate pertinent molecular fingerprints, which are then employed to screen chemical libraries for compounds containing similar structures or fragments. On the other hand, QSAR-based strategies rely on mathematical models whose development requires an adequate number of biologically active compounds with activities covering a wide span of concentrations. In this case, screening is dependent on the quality of the dataset used and the accuracy of the developed model (Luo et al., 2016).
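The similarity principle described above can be illustrated with a minimal Python sketch. It assumes that fingerprints are stored as sets of "on" bit indices and uses the Tanimoto coefficient as the similarity measure; the compound names and bit patterns are hypothetical placeholders, not data from any of the cited studies.

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query_fp, library, top_n=3):
    """Return the top_n library entries, most similar to the query first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical fingerprints: the query stands in for one established hit.
query_fp = frozenset({1, 4, 7, 9, 15})
library = {
    "cmpd_A": frozenset({1, 4, 7, 9, 15}),  # identical bit pattern
    "cmpd_B": frozenset({1, 4, 7, 20}),     # shares 3 of 6 distinct bits
    "cmpd_C": frozenset({2, 3, 5}),         # no bits in common
}
ranking = rank_by_similarity(query_fp, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice, the fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and the similarity cut-off used to select compounds is a project-specific choice.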

Similarity-based VS was applied in a recent study for the discovery of a novel series of cannabinoid receptor 2 (CB2R) agonists (Gianella-Borradori et al., 2015). CB2R is a class A, lipid-like GPCR that regulates the effects of endogenously produced cannabinoid receptor ligands and has been implicated in several inflammatory diseases. In this study, an in-house database containing around 25,000 compounds was screened based on 40 low-energy conformations of known active and selective compound HU-308. Compounds were ranked based on their similarity with any of the 40 conformers of HU-308, and the top 94 were selected for biochemical screening based on the combined color score, which refers to chemistry alignment akin to pharmacophore features, and shape Tanimoto score, which accounts for 3D conformer overlay. From the initial hits, the top 16 active compounds displayed 6 new core scaffolds. Upon combined inspection of bioactivity, molecular weight, and lipophilicity, DIAS1 was chosen and used for further mining of the in-house library with the help of the newly identified scaffold. The second VS led to the discovery of DIAS2, which exhibited better activity and reduced lipophilicity as compared to DIAS1. Further structure-activity relationship (SAR) studies were performed for the optimization of the lead compound to improve potency, selectivity, and pharmacokinetic properties, resulting in candidate compounds that show nanomolar activity and selectivity for CB2R.

Another study used the US EPA's ToxCast database to develop QSAR models for 18 aminergic GPCRs (Mansouri and Judson, 2016). While the ToxCast program can screen hundreds of compounds in vitro to determine toxicity, the chemical space covered by its database is not large enough to include all compounds of interest. However, the database can be employed in the development of predictive QSAR models. Two QSAR models were developed during the study: a qualitative model (active vs. nonactive) and a quantitative model (potency value prediction). Various descriptors were calculated from the 2D structures of the compounds in the database and were subjected to genetic algorithms (GAs) to identify the best and most predictive descriptors. Several model-fitting methods, including PLS-DA (partial least squares discriminant analysis), SVMs (support vector machines), kNNs (k-nearest neighbors), and PLSs (partial least squares), were used to generate the QSAR models, which were later evaluated for accuracy and predictability. As a result, the authors were able to produce suitable models for aminergic GPCR assays and demonstrate the reliability of QSAR-based methods for analysis.
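To make the idea of a qualitative (active vs. nonactive) model concrete, the following is a minimal k-nearest neighbors sketch in Python. It is not the procedure of the cited study: the two-dimensional descriptor vectors, the labels, and the choice of Euclidean distance with k = 3 are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training compounds.

    train: list of (descriptor_vector, label) pairs.
    """
    by_distance = sorted((math.dist(vec, query), label) for vec, label in train)
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, pre-scaled 2D descriptors (e.g., two selected descriptors).
train = [
    ((0.10, 0.20), "nonactive"),
    ((0.20, 0.10), "nonactive"),
    ((0.15, 0.30), "nonactive"),
    ((0.80, 0.90), "active"),
    ((0.90, 0.70), "active"),
    ((0.85, 0.80), "active"),
]

print(knn_predict(train, (0.80, 0.80)))
print(knn_predict(train, (0.20, 0.20)))
```

A real workflow would add descriptor selection and external validation, which this sketch leaves out.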

#### Cheminformatics and de Novo Ligand Design

Typically, ligand-based de novo drug design utilizes approved drugs or known inhibitors as reference structures or as a source of pharmacophores that are relevant for bioactivity to build new chemical structures. While novelty and potency are always favored in drug discovery research, de novo structures should also have desirable pharmacokinetic properties (Kawashita et al., 2015). The combination of de novo drug design and computer-aided VS, along with the application of ADME/Tox models for the prediction of pharmacokinetic properties, has the capability of more effectively identifying NCEs with desirable pharmacological activity profiles. In this sense, the de novo drug design approach has become the forerunner of the long-envisioned personalized medicine, where patients can be given custom-tailored drugs with increased efficacy and reduced adverse effects.

Rodrigues et al. worked on 5-hydroxytryptamine receptor subtype 2B (5-HT2B) drug discovery and were able to identify selective ligands through multidimensional de novo design (Rodrigues et al., 2015). In the Molecular Ant Algorithm (MAntA) software (Reutlinger et al., 2014), chemically advanced template search version 2 (CATS2), pharmacophores, and Morgan substructure fingerprints were employed to generate 5-HT2B selective ligands via reductive amination, resulting in over 5,000 new compound structures from which 4 were selected based on calculated 5-HT2B selectivity. To further improve selectivity and increase scaffold diversity, the de novo design software DOGS (Hartenfeller et al., 2012) and FDA-approved drug molecule structures were utilized to produce NCEs. The resulting compounds were screened with PAINS (Baell and Holloway, 2010) and ADMET filters (Lagorce et al., 2008) to remove undesirable molecules before performing experimental validation assays. Finally, four more compounds were obtained and, among them, one compound showed promising selectivity for the 5-HT2B receptor. Even though the newly designed compound was not comparable in potency with the most potent existing antagonists, this study still provides an excellent application of de novo drug design in the GPCR drug discovery field.

#### Cheminformatics and Chemical Genomics

While the number of currently available GPCR structures is increasing, it only covers a small portion of this protein superfamily, and the structures of several other pharmaceutically relevant members are not yet elucidated. Chemical genomics can be applied to overcome the difficulty of target and drug identification by screening small molecule libraries and measuring their effects on entire biological systems or a specific group of targets, such as GPCRs. This combines the strengths of traditional pharmaceutical techniques and genomics to facilitate discovery and validation of therapeutic targets, as well as identification of potential drug candidates for optimization (Hauser et al., 2018). Moreover, application of this strategy provides information concerning activated signaling pathways and biological effects through measurable gene expression, leading to relevant data about target specificity and noninteraction pairs. In this sense, chemical genomics works on mining huge chemical data with the help of structural bioinformatics to rapidly identify target structure-function relationships (Valerio and Choudhuri, 2012). One of the most popular chemical genomics-based databases found online is GLIDA (GPCR-Ligand Database), a publicly available chemical genomics database that can be used for GPCR drug discovery (Okuno et al., 2008). It contains GPCR biological and ligand information, as well as GPCR-ligand binding data. Therefore, it can be utilized for LBDD with the help of techniques such as machine learning-based classification and similarity-based search.

Shiraishi et al. reported an interesting study in which a chemical genomics approach was employed to predict GPCR-ligand interactions for class A GPCRs (Shiraishi et al., 2013). GPCR-ligand interaction data were collected from the GVK Biosciences database, and kernel methods were applied to evaluate compound-protein interaction (CPI) pair similarities based on Extended Connectivity Fingerprint (ECFP) and Dragon software descriptors generated for the ligands, along with target specific regions, such as the full structure, loop region, and TM region. The results showed that, compared to kernels accounting for the full structure and loop regions, kernels for the TM region showed significantly improved performance, which agrees with experimental findings that the TM region of class A GPCRs plays a critical role in ligand binding. Reliability of the machine learning model was improved with the addition of negative noninteraction pairs. Careful investigation of GPCR-ligand pairs revealed that high co-occurrence of residue-fragment pairs may be indicative of importance in ligand binding and specificity, as well as conservation of binding modes among Class A GPCRs. Key interactions identified in their study can be used for future VS and lead optimization studies and are beneficial when employed in combination with structure-based studies.

#### Cheminformatics, Polypharmacology, Drug Repositioning, and Repurposing

Recently, pharmaceutical research has focused not only on the discovery of novel compounds for a known target but also on the discovery of new indications for currently approved drugs. Polypharmacology has quickly emerged as a critical part of drug discovery research with the growing knowledge of how interconnected pathways in biological systems are. Though this field is most often used to investigate adverse effects and toxicity, information garnered from possible off-target effects can also offer information about new drug indications or cross-reactivity leading to higher drug efficacy (Jacobson et al., 2014). With the upsurge of polypharmacological information, it is no surprise that it is now frequently combined with cheminformatics strategies to predict off-target effects ahead of extensive biochemical analyses in order to save time and resources.

Xie et al. reported an interesting chemical genomics-based polypharmacology study focusing on GPCR-related drug abuse problems (Xie et al., 2014). Initially, a drug-abuse domain specific chemogenomics knowledgebase (DA-KB) was built to consolidate chemogenomics data regarding drug abuse and CNS diseases. This database was later used to investigate molecular interaction networks that encompass both drug abuse and GPCR modulation. Upon identifying 85 drug abuse-related GPCRs, distribution information of these receptors was collected and studied from the MetaCore database (Ekins et al., 2006). Using the HTDocking (https://omictools.com/htdocking-tool) and GPCRDocking programs, polypharmacology and polydrug addiction analyses were performed to investigate the interactions between drug abuse-related receptors and ligands, along with cross-reactivities. As a result, the DA-KB became a powerful tool that has the capability of transforming data into usable polypharmacology knowledge. Moreover, the TargetHunter server was also developed and can be used for target or off-target discovery.

### Cheminformatics Approaches Based on the GPCR Structural Data

SBDD is one of the most potent tools in lead discovery and optimization (Andrews et al., 2014). SBDD has proven to be more efficient than traditional methods due to its working principle, which includes understanding the molecular basis of the disease and utilizing the 3D structural data of the target protein in the drug discovery pipeline (Cavasotto and Palomba, 2015). It has played a valuable role in several drug discovery projects involving enzyme targets (Wlodawer and Vondrasek, 1998; Varghese, 1999). Due to the difficulties in the expression and crystallography of GPCRs, there was only limited information available for SBDD of such targets. However, methodological advances in GPCR crystallography have paved the way for the elucidation of several GPCR structures in the recent past. The availability of GPCR structures led to increased application of structure-based approaches in GPCR drug design, an area which has long been dominated by ligand-based ones. Breakthroughs in GPCR structural biology provide invaluable insights into GPCR structure, function, and polypharmacology. The abundance of ligand-bound GPCR structures unveils the intricacies of ligand-receptor interactions, thus triggering a shift from conventional HTS techniques to less costly and highly efficient SBDD approaches for the design and discovery of potent ligands with improved pharmacological profiles. The main drawback of SBDD approaches lies in the scoring functions used by docking algorithms, wherein numerous approximations and restraints to protein flexibility are applied to expedite the process (Kim and Cho, 2016). In the following section, we briefly discuss the structure-based cheminformatics approaches for identifying novel GPCR ligands targeting ligand- and/or allosteric binding sites, with a few successful examples from the literature.

#### Identification of GPCR Novel Chemotypes via Structure-Based Virtual Screening

Utilizing crystal structures or homology models of target proteins in rational drug design is considered the most powerful and popular method of choice in the design and/or screening of new lead compounds. In the early phase of the drug discovery pipeline, structure-based virtual screening (SBVS) or docking-based VS has been utilized for the prediction of novel bioactive compounds from large and chemically diverse libraries (Cheng et al., 2012). In general, SBVS requires knowledge of the 3D structure of the target (protein or receptor), determined through experimental (X-ray or NMR) or in silico methods (homology modeling). The procedure involves docking large chemical libraries of small compounds into the crystal structure or homology model of the receptor. The selection criteria of small compounds for further experimental testing are based on the docking score, which assesses the binding affinity of protein-ligand complexes, predicted binding poses, chemical diversity, interactions with key residues, etc. (Ngo et al., 2016). The small compounds that cause a biological response are known as hits, which act as new chemical scaffolds for hit-to-lead development. The general VS workflow applied in several GPCR VS studies is shown in **Figure 5**.

SBVS studies for the first crystal structures of GPCRs, including β2AR, A2AAR, dopamine D3R, and histamine H1R, have shown high hit rates. The pioneering study of SBVS for a druggable GPCR using the β2AR crystal structure was reported (Cherezov et al., 2007). In another SBVS, the authors utilized the inactive structure of β2AR/carazolol (PDB code: 2RH1) (Sabio et al., 2008) and screened proprietary and public databases for the identification of β2AR ligands. The hit rates obtained were 36 and 12%, respectively. Similarly, Kolb et al. (2009) docked ∼1 × 10<sup>6</sup> commercially available compounds onto the same crystal structure, and the top 25 virtual hits were selected based on their commercial availability, chemical diversity, and complementarity to the binding sites, and subjected to biological testing. Among them, six compounds had detectable binding affinities, with the best one showing a Ki of 9 nM. All six hit molecules had novel chemotypes, and five of them were confirmed as inverse agonists. Apart from the reported VS studies using crystal structures, there were also a few reports using receptor homology models. Langmead et al. identified highly potent and novel chemotype 1,3,5-triazine derivatives using A2AAR homology models (Langmead et al., 2012). A virtual library of 5.45 × 10<sup>5</sup> compounds was screened and the initial hits were selected based on the shape geometry and electrostatic properties of the orthosteric site. A hit rate of 9% was obtained and the structures were modified and optimized using X-ray crystallography and structure-based optimization techniques. This series of optimizations led to the successful identification of AZD4635 (HTL-1071), which is in phase 1 clinical trials for immuno-oncology (Jazayeri et al., 2017).

Interestingly, a large-scale VS study was carried out by Lane et al. (2013) for the identification of both orthosteric and allosteric ligands of D3R. Based on the crystal structure of D3R, two optimized D3R models were prepared. To account for protein flexibility, conformers of the D3R models were generated and subsequently evaluated by VS performance, i.e., conformers that could separate D3R actives from decoys were selected for the following analyses. The Molsoft Screen Pub database, which contains 4.1 × 10<sup>6</sup> compounds, was virtually screened using docking calculations. The top 300 hits for each model were selected and clustered by chemical similarity (0.3 Tanimoto distance). The top 25 compounds selected did not have a positively charged amine forming a conserved salt bridge to D110<sup>3.32</sup>, in contrast to the D3R apo model, but had interactions with TM1, 2, 3, and 7 as well as ECL1 and ECL2. These hits also reach dopamine and D110<sup>3.32</sup> at the end of the orthosteric pocket. Finally, the predicted novel allosteric ligands were experimentally validated, showing distinct functional profiles on dopamine-signaling efficiency. Another SBVS approach identified nanomolar lead compounds for the melanin-concentrating hormone-1 receptor (MCH-1R) (Lionta et al., 2014). This approach combines GPCR molecular modeling, antagonist binding site prediction, design, synthesis, and focused library screening. A primary hit compound from a pyranose-based VAST library was initially used for the construction of a high quality MCH-1R model. Furthermore, model validation was performed using a virtual enrichment experiment, along with model-driven structure-based expansion of the initial hit, for identification of potent interactions in the binding site. An SBVS of a library with ≤0.7 Tanimoto similarity to existing MCH-1R ligands provided a 14% hit rate and 10 unique chemotypes of potent MCH-1R inhibitors, including two nanomolar leads (Lionta et al., 2014).
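Clustering ranked hits at a Tanimoto distance cut-off, as mentioned above, can be sketched as follows. The source does not state which clustering algorithm was used, so the simple leader algorithm below is an assumption chosen for brevity, and the hit names and fingerprints (sets of on-bits) are hypothetical.

```python
def tanimoto_distance(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto distance = 1 - Tanimoto similarity, for sets of on-bits."""
    union = len(fp_a | fp_b)
    similarity = len(fp_a & fp_b) / union if union else 0.0
    return 1.0 - similarity


def leader_cluster(ranked_hits, cutoff=0.3):
    """Assign each hit to the first cluster whose leader is within `cutoff`.

    ranked_hits: list of (name, fingerprint), best-scoring hit first, so that
    each cluster leader is the best-ranked member of its cluster.
    """
    clusters = []  # list of (leader_fingerprint, [member names])
    for name, fp in ranked_hits:
        for leader_fp, members in clusters:
            if tanimoto_distance(fp, leader_fp) <= cutoff:
                members.append(name)
                break
        else:  # no existing leader is close enough: start a new cluster
            clusters.append((fp, [name]))
    return [members for _, members in clusters]


# Hypothetical docking hits, already sorted by docking score.
ranked_hits = [
    ("hit_1", frozenset(range(1, 11))),                # bits 1..10
    ("hit_2", frozenset(range(1, 10)) | {11}),         # distance to hit_1 ~0.18
    ("hit_3", frozenset({20, 21, 22, 23})),
    ("hit_4", frozenset({20, 21, 22, 24})),            # distance to hit_3 = 0.4
]
clusters = leader_cluster(ranked_hits, cutoff=0.3)
print(clusters)
```

Picking one representative per cluster (for example, the leader) is one way to keep the selected compounds chemically diverse.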

The in silico screening territory for classes B, C, and F largely remains uncharted due to the limited number of crystal structures available. Using an SBVS approach, noncompetitive ligands (allosteric modulators) of related class B GPCRs, namely glucagon receptor (GLR) and glucagon-like peptide 1 receptor (GLP-1R), were identified (de Graaf et al., 2011b). Based on the crystal structure of corticotropin-releasing factor 1 receptor (CRF1R), a homology model for GLR was constructed. A database containing 1.9 × 10<sup>6</sup> compounds was assessed for chemical similarity to the current GLR noncompetitive inhibitors and docked onto the TM cavity of GLR. Based on the protein-ligand interaction fingerprints (IFPs), 23 compounds were selected and subjected to in vitro evaluation. Only two compounds were found to dose-dependently inhibit the effect of glucagon. One hit that was predicted as inactive for GLR bound to GLP-1R and potentiated a response similar to the endogenous GLP-1 ligand. For class C GPCRs, successful in silico VS studies were carried out against the VFT crystal structures (orthosteric N-terminal domain) of metabotropic glutamate receptor subtypes, mGlu3R and mGlu4R (Selvam et al., 2010). Besides the above-mentioned VS campaigns, there are several computational works reported in the literature to discover novel orthosteric ligands for various GPCRs (which are well summarized in several review articles; Andrews et al., 2014; Cavasotto and Palomba, 2015; Shonberg et al., 2015; Ngo et al., 2016; Lee et al., 2018). Since SBVS on GPCRs is too broad to cover in this section, we have summarized representative case studies reported in the last 5 years (2013–2017) in **Table 1**.

#### Relevance of Fragment-Based Drug Discovery (FBDD) on GPCR Targets

The sequential piecing together of fragments to develop a novel lead compound is known as fragment-based drug discovery (FBDD) or fragment-based lead discovery (FBLD). FBDD is a potent scaffold-hopping and lead structure optimization tool for drug discovery projects and serves as an alternative to HTS (Matricon et al., 2017). The success of this approach in drug discovery campaigns is reflected in the increase in the number of compounds (originating from virtual fragment screens) entering clinical trials. A remarkable example of a drug identified via the FBDD approach is vemurafenib, which was approved for the treatment of metastatic melanoma in 2011 (Baker, 2013). FBDD uses small molecules comprising ≤20 heavy atoms as starting fragments for effective hit optimization. The main concept of this approach is to discover ligands that are smaller than a regular drug compound. The enlarged coverage of uncharted chemical space in fragment databases provides an exciting opportunity to find ligands after screening only a few thousand compounds (Chen et al., 2013). A fragment library can be designed and screened using molecular docking studies (Lee et al., 2018). The retrieved fragments could be further optimized using other computational approaches for growing, linking, or both.
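The ≤20 heavy atom criterion can be applied as a simple library filter. The Python sketch below counts non-hydrogen atoms from a plain molecular formula (no parentheses, charges, or isotopes); the parsing approach and the last entry of the example library are assumptions for illustration, and a real fragment library would normally be filtered with a cheminformatics toolkit working on full structures.

```python
import re

# Matches an element symbol (e.g., C, Cl, N) followed by an optional count.
ELEMENT_PATTERN = re.compile(r"([A-Z][a-z]?)(\d*)")


def heavy_atom_count(formula: str) -> int:
    """Count non-hydrogen atoms in a simple molecular formula such as 'C6H6O'."""
    total = 0
    for element, count in ELEMENT_PATTERN.findall(formula):
        if element != "H":
            total += int(count) if count else 1
    return total


def is_fragment_like(formula: str, max_heavy_atoms: int = 20) -> bool:
    """Apply the heavy atom cut-off quoted in the text (<= 20 heavy atoms)."""
    return heavy_atom_count(formula) <= max_heavy_atoms


library = {
    "benzene": "C6H6",
    "indole": "C8H7N",
    "caffeine": "C8H10N4O2",
    "drug_sized_example": "C25H30N4O3",  # hypothetical formula, not a real hit
}
fragments = [name for name, formula in library.items() if is_fragment_like(formula)]
print(fragments)
```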

Strategies utilized in the development of fragments into a lead compound include fragment growing, fragment linking, sequential docking, and group-based QSAR techniques. The fragment growing strategy begins with a fragment in the receptor's active site and allows extension of the fragment to maximize its interaction with the residues in the binding pocket. Fragment linking refers to the covalent linking of two or more fragments to form a single molecule, which provides a new chemical scaffold in the active site. The application of FBDD to SBVS increases the structural space of hit-to-lead compounds. Even though ligands retrieved from fragment libraries lack selectivity and exhibit low affinity, they can be used as starting points for novel lead discovery. Despite its numerous advantages, there are still limitations associated with this approach, such as low accuracy in the prediction of fragment binding modes and rapid accumulation of errors. However, this approach proves to be useful when complemented with experimental techniques. Fragment screening of GPCR ligands via experimental methods (NMR, SPR, and X-ray crystallography) is challenging due to the difficulties in obtaining substantial amounts of functional protein, the inherent conformational flexibility of the receptors outside the membrane, and the low expression of the receptors (Lee et al., 2018). Therefore, in silico FBDD approaches could be utilized for GPCRs and other therapeutic targets. In the following paragraphs, we discuss successful applications of FBDD in GPCR drug discovery from the literature.

In silico screening against GPCR protein structures or homology models to investigate novel fragment-like ligand chemical space is applicable to several GPCR targets. One of the first successful virtual fragment screenings was performed by de Graaf et al. against the doxepin-bound human H1R crystal structure (de Graaf et al., 2011a; Shimamura et al., 2011). In this approach, molecular docking and receptor-ligand IFP protocols were combined to discover a chemically diverse set of new fragment-like H1R ligands. Out of 26 fragment-like compounds, 19 showed high binding affinity at the receptor level (hit rate 73%). Similarly, another structure-based virtual fragment screening (SBVFS) was performed against two GPCR targets, namely the dopamine D3R crystal structure and an H4R homology model, using an in-house fragment library of 12,905 fragments (Vass et al., 2014b). Additionally, molecular dynamics (MD) simulations were performed to represent different conformational states of the receptor orthosteric site (Vass et al., 2014b). Single structure- and ensemble docking screens were carried out for both receptors. The resulting 50 virtual hits were subjected to in vitro studies. Both the single and ensemble structures were found to be suitable for docking-based VS of fragments against GPCR targets. Chen et al. complemented in silico SBVFS with experimental biophysical screening to test the efficiency of their developed method (Chen et al., 2013). Initially, a set of 500 fragments was docked onto the orthosteric pocket of the antagonist-bound A2AAR crystal structure (Jaakola et al., 2008) and ranked by affinity prior to target immobilized NMR screening (TINS) of the same library. TINS resulted in 94 hits, of which five fragments were identified to exceed the threshold affinity for the GPCR target. In the in silico screening, four out of these five compounds were found in the top 50 fragments.
Apart from these four fragments, the remaining 46 fragments also showed high binding affinities. Thus, a second computational screening approach using commercially available fragments (3.28 × 10<sup>5</sup>) was performed and the 22 top-ranked compounds were tested experimentally. Among them, 14 fragments were identified as A2AAR ligands. Furthermore, QSAR studies were performed for three potent A2AAR ligands, followed by optimization of the fragments by MD simulations and free-energy calculations. Similarly, another successful application of fragment-based screening and lead optimization using both biophysical and in silico techniques was demonstrated for the β1AR target, leading to the discovery of novel high affinity leads (Christopher et al., 2013).
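The figures quoted in this paragraph can be checked with a short calculation. The Python sketch below computes hit rates from the reported counts and, as an additional illustration, an enrichment factor for the A2AAR docking ranking. The enrichment factor is not reported in the text: it is derived here under the assumption that the five fragments exceeding the affinity threshold are treated as the only actives among the 500 docked fragments.

```python
def hit_rate(confirmed: int, tested: int) -> float:
    """Percentage of experimentally tested compounds that were confirmed."""
    return 100.0 * confirmed / tested


def enrichment_factor(actives_in_top: int, top_size: int,
                      total_actives: int, library_size: int) -> float:
    """Ratio of the active fraction in the top-ranked subset to that of the library."""
    return (actives_in_top / top_size) / (total_actives / library_size)


# H1R virtual fragment screen: 19 of 26 tested fragments bound the receptor.
h1r_rate = hit_rate(19, 26)
# Second A2AAR screen: 14 of the 22 top-ranked fragments were ligands.
a2a_rate = hit_rate(14, 22)
# First A2AAR screen: 4 of the 5 above-threshold fragments ranked in the top 50 of 500.
a2a_ef = enrichment_factor(4, 50, 5, 500)

print(f"H1R hit rate: {h1r_rate:.0f}%")
print(f"A2AAR hit rate: {a2a_rate:.0f}%")
print(f"A2AAR enrichment factor (top 10%): {a2a_ef:.1f}")
```

The first value reproduces the 73% hit rate stated in the text; the other two numbers follow from the reported counts under the assumption named above.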

Verheij et al. studied target selectivity against histamine subtype H4R and 5-HT3A (ion channel) homology models using an SBVFS approach (Verheij et al., 2011). The results of fragment-based screening showed that both receptors yielded a common pool of hit fragments, thus underlining remarkable similarities in ligand recognition. This knowledge could assist in efficiently navigating chemical space during hit optimization. Besides the orthosteric binding site (primary), allosteric sites (secondary) have also been targeted for identification of novel compounds by the SBVFS approach. Vass et al. applied a sequential docking protocol to predict starting points for fragment linking using the D3R crystal structure and a D2R homology model to identify subtype selectivity (Vass et al., 2014a). Two in-house focused fragment libraries (196 fragments that function as primary binding site ligands for D<sub>2</sub> and D<sub>3</sub> receptors and 266 fragments that function as secondary binding site ligands for D3R) were docked in the orthosteric and allosteric binding sites and the best fragment combinations were listed. Similar top-scoring fragments were identified for the orthosteric site, whereas allosteric site fragments showed subtype selectivity. Three fragment-linked compounds that showed 9-, 39-, and 55-fold selectivity for D3R were synthesized, and docking results were validated by the experimental data.

In tandem with SBDD, FBDD has also been successfully applied to other GPCR classes. Novel mGlu5R NAMs were identified through a combination of fragment-based screening and medicinal chemistry approaches (Christopher et al., 2015). In addition, the binding modes of NAMs with the receptor were crystallographically solved. Recently, an in silico fragment-based approach was applied to the crystal structures of mGlu5R (Doré et al., 2014; Christopher et al., 2015) for the design of novel allosteric modulators (Bian et al., 2017). Initially, a fragment library for reported GPCR allosteric modulators was constructed using data from the Allosteric Database (ASD). Subsequently, novel compounds were generated and analyzed using the retrosynthetic combinatorial analysis procedure (RECAP). Molecular docking was applied to screen the hits for the target by docking the in silico generated compounds into the binding pocket. Additionally, other computational methodologies, such as benchmark dataset verification, docking, QSAR model simulations, etc., were performed to validate the hits. Twenty structurally diverse hits were predicted as potential mGlu<sub>5</sub> allosteric modulators based on the binding energies and docking scores. This study highlights the importance of a purely computational FBDD approach for facilitating the design of novel compounds for other targets as well. In addition to the above-mentioned GPCR case studies on SBVFS campaigns, there are several other in silico reports available regarding the discovery of novel ligands, which are summarized elsewhere (Hubbard and Murray, 2011; Murray et al., 2012; Shoichet and Kobilka, 2012; Visegrády and Keseru, 2013; Andrews et al., 2014; Lee et al., 2018).

### Integration of Ligand- and Structure-Based Cheminformatics Approaches

The use of cheminformatics in drug discovery provides an excellent foundation for the integration of structure- and ligand-based strategies because it is applied at different stages of drug discovery. With the rising number of available structures, biological databases, and in silico techniques for cheminformatics and modern drug discovery, it is not surprising that ligand- and structure-based approaches are used in combination. Such combinations take advantage of the abundant GPCR ligand information while employing recently elucidated protein structural information to increase the success of GPCR drug discovery research. Furthermore, the integration of LBDD and SBDD offsets the weaknesses of each method with the strengths of the other, leading to better insights into critical ligand functionalities and receptor-ligand interactions. Researchers are now able to use 3D protein structures to predict binding modes and study the pharmacology of known drugs and their analogs through docking, providing rationalization of ligand activity and useful SAR information for the design and optimization of new agonists and antagonists (Munk et al., 2016). In addition, rapid innovation in hardware and computing power allows the use of MD simulations for more in-depth study of GPCR ligand binding and activity modulation (McRobb et al., 2016; Clark, 2017).

An excellent case of ligand- and structure-based integration in GPCR drug discovery is shown in studies involving A2AAR, an attractive drug target for the treatment of Parkinson's disease. Because A2AAR was among the first GPCRs to be crystallized, it has become one of the most extensively studied drug targets. The later release of a high-resolution A2AAR structure, which revealed the presence of water in the binding site, further increased the efforts for drug design and optimization. Over the years, most A2AAR antagonists, such as istradefylline (Jenner, 2005) and preladenant (Neustadt et al., 2007), have been designed based on the purine scaffold and other related heterocycles. Although the abundance of ligand information for A2AAR helps in the elucidation of important chemical fingerprints and ligand binding interactions, it has become difficult to discover novel entities for drug development. Lenselink et al. (2016a) performed VS using an ensemble of A2A receptor structures arranged into a structure-based decision tree (Lenselink et al., 2014). Ligands were docked to each protein structure and proceeded to the next receptor docking only if they passed a GlideScore cut-off in the previous step. The resulting ligands were filtered using Rapid Elimination of Swill (REOS) (Walters and Namchuk, 2003) and re-scored using MM-GBSA. Subsequently, a similarity-based analysis (against compounds tested for A2AAR activity recorded in ChEMBL) was performed to determine the structural novelty of the remaining hits and to select the most unique compounds for experimental testing. Out of 71 novel ligands, only 2 compounds displayed suitable A2AAR binding affinity. The authors also performed a retrospective analysis of the current A2AAR ligands to determine structural novelty and its relation to observed A2AAR activity. Decades of research efforts for this target have left little room for the discovery of new ligand scaffolds, as seen in previous VS studies showing ligand Tanimoto similarity in the range of 0.19–0.68 (Carlsson et al., 2010; Katritch et al., 2010; Langmead et al., 2012; Rodriguez et al., 2015), with the lowest similarity showing the least activity.
While most of the virtual hits were found to be similar in structure to experimentally validated compounds from ChEMBL, it should be noted that several of the tested compounds or scaffold structures were also discovered using computational methods, highlighting the value of in silico approaches in drug discovery and design.
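The similarity-based novelty check described above can be illustrated with a minimal Python sketch. It assumes fingerprints are represented as sets of on-bit indices; the function names and the toy fingerprints are hypothetical and are not taken from the cited studies, which used their own fingerprint types and software.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def max_similarity_to_known(hit_fp, known_fps):
    """Highest Tanimoto similarity of one hit against a collection of known ligands."""
    return max((tanimoto(hit_fp, fp) for fp in known_fps), default=0.0)


def rank_by_novelty(hits, known_fps):
    """Sort hits (name -> fingerprint) from most novel (lowest similarity) to least."""
    scored = [(max_similarity_to_known(fp, known_fps), name) for name, fp in hits.items()]
    return [(name, sim) for sim, name in sorted(scored)]


# Toy data (hypothetical): two "known" ligands and three virtual hits.
known = [{1, 2, 3, 4}, {2, 3, 5}]
hits = {"hit_a": {1, 2, 3, 4}, "hit_b": {7, 8, 9}, "hit_c": {2, 3, 9}}
ranking = rank_by_novelty(hits, known)
```

In this toy ranking, a hit sharing no bits with any known ligand is listed first; in practice the lowest-similarity hits may also be the least likely to be active, as noted above.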

Aside from combining known structure- and ligand-based methods, hybrid tools that assimilate features from both approaches have been developed to give computational chemists additional strategies that can compensate for the current individual limitations of SBDD and LBDD. One of the hybrid methods that has gained popularity in recent years is proteochemometric (PCM) modeling. PCM modeling is similar to traditional QSAR studies, since both methods require descriptors, bioactivity data, and machine learning functions for model development (Qiu et al., 2017). However, a cross-term descriptor is also required in PCM modeling to account for the amino acids and ligand functional groups that are crucial for the binding interaction of the complex (Lapinsh et al., 2001; van Westen et al., 2011; Qiu et al., 2017). This method has been found to be useful in polypharmacological studies, as it can provide information on target selectivity (Cortes-Ciriano et al., 2015), especially in large protein families like GPCRs. In a study by Gao et al. (2013), 24 PCM models were developed for amine GPCRs and their corresponding ligands using two machine learning methods, support vector regression (SVR) and Gaussian processes (GP). Two typical descriptors were generated per receptor (z-scale and transmembrane identity descriptors), and two typical descriptors were generated for each ligand (general descriptors such as atomic contributions and logP, and drug-like index descriptors). These descriptors were first used to build the 24 PCM models, which were validated using a test set. Although most of the models showed strong goodness-of-fit (R<sup>2</sup>) and predictivity (Q<sup>2</sup>), the addition of cross-terms led to a lower predictive capability of the PCM models. This may be because it is still difficult to fully translate receptor-ligand interfaces into a descriptor value. Despite this, the PCM models showed great potential in predicting cross interactions between GPCRs and ligands.
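To make the role of the cross-term concrete, the following Python sketch assembles a PCM feature vector from a ligand descriptor block, a receptor descriptor block, and their pairwise products. Building cross-terms as products of ligand and receptor descriptors is one common construction and is assumed here; the function names and numbers are hypothetical, and the actual descriptors used by Gao et al. (2013) are not reproduced.

```python
def cross_terms(ligand_desc, protein_desc):
    """Pairwise products of every ligand descriptor with every protein descriptor."""
    return [l * p for l in ligand_desc for p in protein_desc]


def pcm_features(ligand_desc, protein_desc, use_cross_terms=True):
    """Concatenate ligand and protein descriptors, optionally followed by cross-terms."""
    features = list(ligand_desc) + list(protein_desc)
    if use_cross_terms:
        features += cross_terms(ligand_desc, protein_desc)
    return features


# Hypothetical toy descriptors: 2 ligand values and 3 receptor values.
ligand = [1.0, 2.0]
receptor = [3.0, 4.0, 5.0]
with_cross = pcm_features(ligand, receptor)
without_cross = pcm_features(ligand, receptor, use_cross_terms=False)
```

The cross-term block grows as the product of the two descriptor counts, which is one reason why adding it does not always improve predictivity on limited training data.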

### SUMMARY OF CHEMINFORMATICS SOFTWARE/TOOLS UTILIZED IN GPCR DRUG DISCOVERY

HTS has undergone technological advances and innovations that have rendered it the principal method of drug discovery for years. However, it has not necessarily led to a great leap forward in the discovery of NCEs, because the hit rate for this method is frequently low and the costs and efforts involved are enormous. In turn, computer-aided drug design (CADD) has gained recognition and continues to attract increasing interest and usage, such that most GPCR drug discovery research efforts make use of one or more computational tools, especially in the initial stages of drug design. Given the complexities of experimental GPCR research, it is no surprise that CADD has emerged as a method of choice to expedite GPCR drug discovery and design. Furthermore, increasing knowledge of GPCR systems has led to the rising popularity of cheminformatics and chemogenomics, as evidenced by the growing number of publicly available databases that provide structural or interaction information on receptors and their associated ligands.

Several cheminformatics software packages and web servers are available to identify lead compounds targeting GPCRs (Khan et al., 2011; Yadav et al., 2016). As mentioned previously, in silico approaches are classified into two categories: SBDD and LBDD. If NMR or X-ray crystal structures, or reliable homology models, are available, computational methods based on target protein structures can be exploited (Lyne, 2002). These tools are related to several computational approaches, including molecular docking, VS, pharmacophore generation, and binding pocket detection. As shown in **Table 3**, several in silico cheminformatics methods have been applied to GPCR-targeted drug discovery. In cases where no protein structures are available, ligand-based virtual screening (LBVS) can be utilized. LBVS can be further sub-classified into three types: pharmacophore-, similarity-, and machine learning-based VS (Basith et al., 2016). As shown in **Table 4**, several in silico cheminformatics methods can be exploited for generating pharmacophores, searching for 3D similarity, and identifying targets (polypharmacology). Moreover, commercially available chemical libraries for VS are shown in **Table 5**.

#### TABLE 3 | Cheminformatics tools for structure-based drug discovery.


TABLE 4 | Cheminformatics tools for ligand-based drug discovery.


#### LIMITATIONS OF CHEMINFORMATICS APPROACHES IN GPCR DRUG DISCOVERY

In the last several years, the increasing number of high-resolution GPCR structures has unlocked new avenues for structure-based GPCR drug discovery and design. However, several obstacles remain, including the rapid identification of novel fragment-like compounds and the structure-based elucidation of GPCR ligand function.

With the recent innovations in high-throughput, computer, and software technologies, as well as the upsurge of publicly available data, cheminformatics methodologies have no doubt become an essential part of most drug discovery efforts to date. However, a major flaw arises during cheminformatics model development, wherein the experimental data used are assumed to be correct. Contrary to this assumption, databases can contain errors in ligand structures, bioactivity values, activity types, and other information, which often results in ambiguous models leading to erroneous findings. Several articles (Fourches et al., 2010, 2016; Williams and Ekins, 2011; Williams et al., 2012) have discussed this topic at length, including its negative effect on model development and performance. A study by Olah et al. (2005) reported that there were on average two molecules with incorrect structures per medicinal chemistry publication, indicating a total error percentage of 8% in the WOMBAT database. A more recent study by Tiikkainen et al. (2013) estimated the ligand error rates in the ChEMBL, Liceptor, and WOMBAT databases to be 5, 7, and 6%, respectively. Error rates for activity values in the three databases ranged from 1 to 2%. It is therefore important to carefully and manually curate chemical and biological databases, since even minor errors can cause a substantial decrease in the predictive capability of the generated models. Moreover, while the increasing sophistication of computer programs has allowed researchers an atomistic view of several GPCR systems, approximations of crucial energy terms that cannot be computationally explored at present have greatly limited the accuracy with which these systems are described. Because of this, researchers should constantly gauge findings against their own scientific knowledge to judge whether the results are significant. It should always be remembered that computational tools are created and continuously developed to make the drug discovery process more efficient, but nothing can replace a researcher's own knowledge and experience.
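One simple automated check that can support, but not replace, the manual curation recommended above is to flag compound-target pairs whose replicate activity records disagree strongly. The Python sketch below is only an illustration: the record layout, the use of negative-log activity values, and the threshold of one log unit are assumptions, not procedures described in the cited studies.

```python
from collections import defaultdict


def flag_conflicting_records(records, max_spread=1.0):
    """Flag (structure, target) pairs whose reported activities disagree.

    records: iterable of (structure_key, target, p_activity) tuples, where
    p_activity is a negative-log activity value such as pKi (assumed layout).
    Returns {(structure_key, target): spread} for pairs exceeding max_spread.
    """
    grouped = defaultdict(list)
    for key, target, p_act in records:
        grouped[(key, target)].append(p_act)

    flagged = {}
    for pair, values in grouped.items():
        spread = max(values) - min(values)
        if len(values) > 1 and spread > max_spread:
            flagged[pair] = spread
    return flagged


# Hypothetical records: cmpd1 has two strongly conflicting values for one target.
records = [
    ("cmpd1", "A2AAR", 7.0),
    ("cmpd1", "A2AAR", 9.5),
    ("cmpd2", "A2AAR", 6.0),
    ("cmpd2", "A2AAR", 6.3),
    ("cmpd3", "A2AAR", 8.0),
]
suspects = flag_conflicting_records(records)
```

Flagged entries still need to be checked against the primary literature, since a large spread can reflect a unit error, a mis-drawn structure, or a genuine difference between assays.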

Moreover, insights into GPCR structure, function, and binding partners have increased significantly compared to a few decades ago. Despite this, a great deal of information is still out of reach, such as the protein structures of hundreds of unique GPCRs and ligand information for orphan GPCRs. It is imperative not to lose fervor in gathering new knowledge to further enhance our understanding of GPCR structures and functions.

#### CONCLUSIONS

In the nineteenth century, chemical space exploration was initiated as a counting game to estimate its size (Reymond, 2015). The advent of the cheminformatics field and powerful in silico technologies has since assisted in the exploration of uncharted ligand space from large chemical libraries. The availability of large public and commercial chemical databases, as well as ligand chemical space exploration tools, gives researchers easy access to handle and explore huge amounts of chemical data. Cheminformatics is a complex field of study that translates large data into useful knowledge for drug design and optimization protocols. The expansion of GPCR structures and ligands over the past decade is mainly due to progress in structural biology and theoretical advancements. These structural and in silico breakthroughs have led to the implementation of cheminformatics approaches in the GPCR drug discovery pipeline. In the GPCR drug discovery protocol, ligand- and structure-based approaches are the most commonly applied ones. LBDD is known as a fast and simple technique for the identification of vital chemical functionalities required for biological activity. However, the absence of binding pocket information limits its ability to incorporate several important factors, such as receptor flexibility and ligand bioactive conformation, thereby restricting the discovery of candidate leads to only the ligand classes used in model development (Saxena et al., 2017). Because of the prolonged absence of GPCR structures, researchers relied heavily on ligand-based methods for drug discovery and lead optimization, leading to copious ligand structural information for these targets. Following the crystallization of bRho in 2000 (Palczewski et al., 2000) and β2AR in 2007 (Rasmussen et al., 2007), a striking increase in GPCR structural information has been observed in the last several years. While the currently available structures are unable to cover the structural diversity of GPCR protein family members, there are enough to be used as templates for homology modeling to perform SBDD. In contrast to ligand-based techniques, SBDD can be used to predict the ligand bioactive conformation, thus providing a better understanding of receptor-ligand interactions and allowing the discovery of NCEs. Furthermore, recent research underpins the significance of emerging integrated approaches in GPCR drug design and discovery. Assimilating LBDD and SBDD methods, as well as the use of integrated approaches, has been shown to increase the success rate of finding promising leads, especially for well-studied targets such as GPCRs. All the cheminformatics approaches discussed in this review are focused on the identification of novel ligands for GPCR targets based on structural and ligand data, and several case studies signify the importance of VS. The evolution of cheminformatics techniques and their synergy in the GPCR drug discovery pipeline is the driving force that will facilitate cost-effective and prolific outcomes in the exploration of uncharted GPCR ligand space. Yet, an expert human touch is required to authenticate and tame the computer-generated outcome.

#### REFERENCES


#### AUTHOR CONTRIBUTIONS

SB and MC summarized the literature, wrote the manuscript, and prepared the figures. SM wrote part of the manuscript, prepared the figures, and revised the manuscript. JP and NC prepared the tables. SK and SC supervised all the work, provided critical comments, and wrote the manuscript.

#### ACKNOWLEDGMENTS

This work was supported by the Mid-career Researcher Program (NRF-2017R1A2B4010084) funded by the Ministry of Science and ICT (MSIT) through the National Research Foundation of Korea (NRF).


class B G-protein-coupled receptors. ChemMedChem 6, 2159–2169. doi: 10.1002/cmdc.201100317


non-inhibitors using machine-learning method. SAR QSAR Environ. Res. 28, 863–874. doi: 10.1080/1062936X.2017.1399925


high-throughput docking: structural insights into agonist-modulated GPCR features. Chem. Biol. Drug Des. 81, 442–454. doi: 10.1111/cbdd.12095


ion channel serotonin 5-HT(3)A. Bioorg. Med. Chem. Lett. 21, 5460–5464. doi: 10.1016/j.bmcl.2011.06.123


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Basith, Cui, Macalino, Park, Clavio, Kang and Choi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Improving Docking Performance Using Negative Image-Based Rescoring

Sami T. Kurkinen<sup>1</sup>, Sanna Niinivehmas<sup>1</sup>, Mira Ahinko<sup>1</sup>, Sakari Lätti<sup>1</sup>, Olli T. Pentikäinen<sup>1,2</sup> and Pekka A. Postila<sup>1</sup>\*

<sup>1</sup> Department of Biological and Environmental Science and Nanoscience Center, University of Jyvaskyla, Jyväskylä, Finland, <sup>2</sup> Institute of Biomedicine, Integrative Physiology and Pharmacy, University of Turku, Turku, Finland

Despite the large computational costs of molecular docking, the default scoring functions are often unable to distinguish the active hits from the inactive molecules in large-scale virtual screening experiments. Thus, even though a correct binding pose might be sampled during the docking, the active compound or its biologically relevant pose is not necessarily given a high enough score to attract attention. Various rescoring and post-processing approaches have emerged for improving the docking performance. Here, it is shown that the very early enrichment (number of actives scored higher than 1% of the highest ranked decoys) can be improved 2.5-fold on average, and by as much as 8.7-fold, by comparing the docking-based ligand conformers directly against the target protein's cavity shape and electrostatics. The similarity comparison of the conformers is performed without geometry optimization against the negative image of the target protein's ligand-binding cavity using the negative image-based (NIB) screening protocol. The viability of the NIB rescoring (R-NiB), introduced in this study, was tested with 11 target proteins using benchmark libraries. By focusing on the shape/electrostatics complementarity of the ligand-receptor association, the R-NiB is able to improve the early enrichment of docking essentially without adding to the computing cost. By implementing consensus scoring, in which the R-NiB and the original docking scoring are weighted for an optimal outcome, the early enrichment is improved to a level that facilitates effective drug discovery. Moreover, giving equal weight to the original docking scoring and the R-NiB scoring improves the yield in most cases.

Keywords: molecular docking, docking rescoring, negative image-based rescoring (R-NiB), benchmarking, consensus scoring

#### Edited by:

Leonardo G. Ferreira, University of São Paulo, Brazil

#### Reviewed by:

Alfonso T. Garcia-Sosa, University of Tartu, Estonia; Craig Doupnik, Morsani College of Medicine, University of South Florida, United States

> \*Correspondence: Pekka A. Postila pekka.a.postila@jyu.fi

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 10 November 2017 Accepted: 08 March 2018 Published: 26 March 2018

#### Citation:

Kurkinen ST, Niinivehmas S, Ahinko M, Lätti S, Pentikäinen OT and Postila PA (2018) Improving Docking Performance Using Negative Image-Based Rescoring. Front. Pharmacol. 9:260. doi: 10.3389/fphar.2018.00260

#### INTRODUCTION

Molecular docking is an in silico technique that samples potential binding poses of ligands flexibly against the ligand-binding cavities of receptor protein structures. This ability to mimic ligand-receptor recognition at the atom level can yield valuable insight into complex and experimentally difficult-to-approach phenomena, such as enzyme reaction mechanics or ligand-receptor association, especially when it is coupled to atomistic simulations.

The main interest in docking comes from its use in computer-aided drug discovery and virtual screening experiments that aim to discover novel drug compounds from vast compound libraries, a process that ideally lowers the amount of costly experimental testing. On the one hand, the docking algorithms reproduce experimentally verified ligand binding geometries with remarkable accuracy (Kitchen et al., 2004; Warren et al., 2006; Kolb and Irwin, 2009; Meng et al., 2011). On the other hand, anybody who has used docking on a routine basis can confirm that these successes are case-specific and that the methodology often fails to produce sufficient enrichment (Ferrara et al., 2004; Mohan et al., 2005; Sousa et al., 2006; McGaughey et al., 2007; Plewczynski et al., 2011). In part, this hit-or-miss nature of docking is caused by the lack of relevant 3D structure data on the target proteins (Schapira et al., 2003) or by inadequacies of the ligand conformer sampling (Sastry et al., 2013), but the other fundamental problem is the failure in scoring the sampled docking solutions (Wang et al., 2003; Warren et al., 2006; Plewczynski et al., 2011; Pagadala et al., 2017).

In other words, although the conformational space of the ligand binding might be sampled exhaustively, the best binding poses or the most potent compounds are not necessarily put at the top of the ranking lists by the default scoring functions (Wang et al., 2003; Ferrara et al., 2004; Cross et al., 2009; Plewczynski et al., 2011). An experienced researcher might be able to select the best pose out of 10 different conformers, but the task quickly becomes unmanageable when dealing with hundreds or thousands of compounds. The docking scoring functions put a certain weight on specific ligand-receptor interactions such as hydrogen bonding, halogen bonding and π–π stacking, and the internal energies of the ligand conformers are also considered. Despite their undeniable merits, these binding favorability or energy assessments do not always work (Chen et al., 2006; Cross et al., 2009), which means that the best pose or, more relevantly, the active compound is frequently ignored in the docking screening.

The docking solutions can be rescored after the fact to increase the yield. This is done by reassessing the favorability of the solutions utilizing a set of empirical binding descriptors that put weight on different binding characteristics. In the consensus scoring, a set of different scoring functions are employed and together they produce better enrichment than any of the functions accomplish alone (Charifson et al., 1999; Clark et al., 2002; Oda et al., 2006). Tasking more than one scoring methodology should in theory cover all the bases and, furthermore, a mix of dissimilar functions should facilitate the discovery of active hits from vast compound pools. The inherent problem with the consensus rescoring, however, is that the optimal settings are specific for each target. Accordingly, their successful use with novel targets lacking benchmark test sets is difficult to ascertain beforehand (Cheng et al., 2009).

In addition, performance enhancement might be produced by docking the ligands with different software to improve the sampling (Houston and Walkinshaw, 2013) or by optimizing and estimating the binding poses using molecular mechanics Poisson–Boltzmann or generalized Born surface area continuum solvation (MM/PBSA or MM/GBSA), free energy perturbation (FEP) or solvated interaction energy (SIE) calculations (Bash et al., 1987; Kollman et al., 2000; Onufriev et al., 2004; Naïm et al., 2007; Guimarães and Cardozo, 2008; Sulea et al., 2011, 2012; Genheden and Ryde, 2015; Virtanen et al., 2015; Juvonen et al., 2016). Because these post-processing steps require a lot of extra computing, their applicability is limited in real-world screening studies involving potentially hundreds of thousands of compounds. In addition, the success rates of the post-processing methods vary on a case-by-case basis (Virtanen et al., 2015) and there is no way to tell beforehand whether the extra investment will pay off. In short, there is a genuine need for reliable rescoring methodologies that do not require a lot of extra computing resources or experiment-based tinkering.

The aim of the study was to demonstrate that by focusing solely on the shape/electrostatics complementarity between the docked ligand poses and the receptor protein's ligand-binding site, the yield of the small-molecule docking could be improved.

In the negative image-based (NIB) screening (Virtanen and Pentikäinen, 2010; Niinivehmas et al., 2011, 2015), a negative image or a NIB model is generated by inverting the shape and electrostatics of a ligand-binding cavity using a specifically tailored software PANTHER (Niinivehmas et al., 2015). The resulting NIB model is used by similarity comparison algorithms such as ShaEP (Vainio et al., 2009) the same way as ligand 3D structures extracted from the X-ray crystal structures are used in the ligand-based screening. The ligand 3D conformers, used in the similarity comparison, are generated from scratch using software such as BALLOON (Vainio and Johnson, 2007); but, notably, the conformers could also originate from molecular docking sampling.

To explore this idea further and to improve docking enrichment, the NIB screening methodology was repurposed for rescoring multiple explicit docking solutions output by the docking software PLANTS (Korb et al., 2009). The main difference between the established NIB methodology and the NIB rescoring introduced here, the R-NiB (**Figure 1**), is that the rescoring is performed as is: the coordinates of the cavity-based negative image and the docked ligand conformers are not superimposed or optimized for a better match. The rescoring was performed with 11 target proteins, ranging from nuclear receptors such as progesterone receptor (PR) to neuraminidase (NEU), using established virtual screening benchmark libraries containing both known active and inactive decoy ligands (Huang et al., 2006; Mysinger et al., 2012). Altogether 22 different benchmark sets were used to validate the new methodology (**Table 1**).

As a whole, the results show that the R-NiB produces moderate or excellent early enrichment improvements using the basic settings in the NIB model generation and similarity screening. In most cases, the early enrichment of the docking can be improved also by consensus scoring, in which the original PLANTS docking scoring and the PANTHER/ShaEP-based R-NiB scoring are given an optimal weight ratio. What is more, the rescoring indicates that the hit rate is typically enhanced even when both of these scoring functions are bluntly given equal (50/50%) weight in the consensus scoring.

In summary, the success of the R-NiB approach in sorting out the active ligands from the inactive molecules is directly related to the fact that the shape/electrostatics complementarity between the ligand and the receptor is an essential part of the complex formation.

a standard docking software and multiple docking solutions or conformers are outputted for rescoring. Fourthly, a cavity-based NIB model, composed of explicit cavity points (white neutral; blue positive; red negative) is generated with PANTHER (Niinivehmas et al., 2015) for the same cavity. Fifthly, the NIB model shape/electrostatics (transparent surface with charge potential) are compared directly against the docking solutions using a similarity comparison algorithm ShaEP (Vainio et al., 2009) without geometry optimization. Those solutions matching the cavity information are given higher scores than the ones that differ.

#### MATERIALS AND METHODS

#### Ligand Set Preparation

The ligand sets, including the active and inactive decoy compounds, were acquired from the DUD (A Directory of Useful Decoys) (Huang et al., 2006) and DUD-E (A Database of Useful (Docking) Decoys-Enhanced) (Mysinger et al., 2012) databases for the target proteins (**Table 1**). The initial 3D coordinates for the DUD ligands were converted to the SMILES (Simplified Molecular-Input Line-Entry System) format using STRUCTCONVERT in MAESTRO 2017-1 (Schrödinger, LLC, New York, NY, USA, 2017). LIGPREP in MAESTRO was used to generate OPLS3 charges and tautomeric states for both the DUD and DUD-E ligand sets at pH 7.4. Next, both of the ligand sets were converted to the SYBYL MOL2 format using MOL2CONVERT in MAESTRO. The back-and-forth conversion between MOL2 and SMILES formats was done with the DUD ligands to avoid potential bias of the original 3D conformations for the molecular docking (Zoete et al., 2016).

#### Protein Preparation

The 3D structures of the target proteins, which were used in the molecular docking and the NIB model generation, were acquired from the Protein Data Bank (PDB) (Berman et al., 2000; Burley et al., 2017). All of the used PDB entries are listed in **Table 1**. The benchmarking was done mainly using the PDB entries listed for the DUD and DUD-E datasets and, thus, both the docking and rescoring could work better or worse using different structures. The necessary PDB entry editing (**Figure 1**) such as the removal of bound ligands from the active sites was done in the BODIL Molecular Modeling Environment (Lehtonen et al., 2004). The protein residues were protonated with the default settings in REDUCE3.24 (Word et al., 1999). The X-ray crystal structure waters were left in the deprotonated state for NIB model building.

TABLE 1 | Target protein 3D structures used in the virtual screening.

<sup>a</sup>AR, androgen receptor; COX2, cyclo-oxygenase 2; CYP3A4, cytochrome P450 3A4; ER, estrogen receptor alpha; GR, glucocorticoid receptor; MR, mineralocorticoid receptor; NEU, neuraminidase; PPARγ, peroxisome proliferator activated receptor gamma; PR, progesterone receptor; RXRα, retinoid X receptor alpha; PDE5, phosphodiesterase type 5. ER-agonist, ER-antagonist and ER-mixed refer to ligand sets containing ER-specific agonists, antagonists or both, respectively.

<sup>b</sup>Number of active ligands (Ligs) and decoy (Decs) molecules after preprocessing with LIGPREP.

<sup>c</sup>In the DUD database, ER agonists and antagonists are separated into two separate datasets, but in the case of the DUD-E the ligands are mixed. For comparison, the ER datasets in the DUD were also mixed.

<sup>d</sup>Used in the NIB model generation.

#### Molecular Docking

The molecular docking of the DUD and DUD-E compound sets (**Figure 1**) into the ligand-binding sites of the target proteins was performed using PLANTS1.2 (Korb et al., 2009). The default settings were used in the docking screenings. Accordingly, the initial docking scoring was performed with the ChemPLP that combines the PLP (Piecewise Linear Potential) with GOLD's Chemscore (Korb et al., 2009). The centroid coordinates of ligands bound in the target protein structures were used as the binding site centers in the docking. A relatively large binding site radius of 10 Å was generally used in the docking. The radius was slightly reduced for glucocorticoid receptor (GR; 9 Å) based on the size of the ligand-binding site. Altogether 10 docking solutions were output for each compound for the purpose of NIB rescoring. The idea is to provide enough different docking solutions for the rescoring.
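The binding site definition described for the docking can be illustrated with a short Python sketch. It computes the centroid of a bound ligand's atom coordinates and checks whether a point falls inside a spherical site of a given radius. The function names and the toy coordinates are hypothetical; this is not code from PLANTS, which takes the center and radius as settings.

```python
def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) atom coordinates."""
    n = len(coords)
    if n == 0:
        raise ValueError("no atoms given")
    return tuple(sum(axis) / n for axis in zip(*coords))


def within_site(point, center, radius=10.0):
    """True if `point` lies inside a sphere of `radius` (in Å) around `center`."""
    dist_sq = sum((p - c) ** 2 for p, c in zip(point, center))
    return dist_sq <= radius ** 2


# Hypothetical coordinates of a bound ligand (four atoms in a square).
ligand_atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
site_center = centroid(ligand_atoms)
```

With the reduced 9 Å radius used for GR, the same check would simply be called as `within_site(point, site_center, radius=9.0)`.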

#### Negative Image-Based Model Generation

The negative images or the NIB models of the target proteins' ligand-binding cavities (**Figure 1**) were prepared using the default settings in PANTHER0.18.15 (Niinivehmas et al., 2015). The centroids used in the NIB model generation were based on the centroid coordinates of the ligand compounds bound in the original protein 3D structures, in the same way as with the docking. The NIB models were prepared in three different ways: (1) the NIB model size and dimensions were adjusted using the box radius option (6–10 Å); (2) the cavity size was limited to a certain radius (1.5–3.0 Å) from the bound ligand in the original structure using the ligand distance limit option; (3) when available and producing better results, a model (referred to as the PANTHER model) was also taken from a prior NIB screening study (Niinivehmas et al., 2015). The NIB model coordinates for all new NIB models are included in the Supplementary Material.
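The general idea of a cavity-filling negative image, and the effect of the box radius and ligand distance limit settings, can be illustrated with a deliberately simplified Python sketch. This is not PANTHER's algorithm: the cubic lattice, the `spacing` and `min_gap` parameters, and the function name are hypothetical, and no charges are assigned to the points.

```python
import math


def toy_negative_image(center, protein_atoms, box_radius=8.0, spacing=2.0,
                       min_gap=3.0, ligand_atoms=None, ligand_limit=2.0):
    """Return lattice points around `center` that fit into the empty cavity space.

    A point is kept if it is at least `min_gap` Å from every protein atom and,
    when `ligand_atoms` is given, within `ligand_limit` Å of some ligand atom.
    """
    steps = int(box_radius // spacing)
    points = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                point = (center[0] + i * spacing,
                         center[1] + j * spacing,
                         center[2] + k * spacing)
                # Discard points that would clash with the protein.
                if any(math.dist(point, atom) < min_gap for atom in protein_atoms):
                    continue
                # Optionally restrict the model to the vicinity of the bound ligand.
                if ligand_atoms and min(math.dist(point, a) for a in ligand_atoms) > ligand_limit:
                    continue
                points.append(point)
    return points


# Hypothetical toy cavity: one protein atom at the origin of a small box.
empty_box = toy_negative_image((0.0, 0.0, 0.0), [], box_radius=2.0, spacing=2.0)
one_atom = toy_negative_image((0.0, 0.0, 0.0), [(0.0, 0.0, 0.0)], box_radius=2.0, spacing=2.0)
```

A larger box radius produces a larger model, whereas a tighter ligand distance limit trims the model toward the volume occupied by the bound ligand.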

#### Negative Image-Based Rescoring

The NIB rescoring (or the R-NiB; **Figure 1**) of the original docking solutions was performed using ShaEP1.0.7.915 (Vainio et al., 2009). The shape and electrostatics of each docking solution were compared directly against the template NIB models without superimposing or optimizing their coordinates (--noOptimization option). The shape and electrostatics were given an equal weight (ESP = 0.5) in the ShaEP similarity scoring (default option). Because altogether 10 conformers were outputted for each docked compound, even those solutions given lower scores by PLANTS (Korb et al., 2009) could later be considered in the PANTHER/ShaEP-based (Virtanen and Pentikäinen, 2010; Niinivehmas et al., 2011, 2015) NIB rescoring.
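The role of the multiple poses per compound can be illustrated with a minimal Python sketch. The text does not state how a single score per compound is selected from the 10 rescored poses; the sketch assumes that the pose with the highest similarity is kept and that compounds are then ranked by that value. The function names, compound identifiers and scores are hypothetical.

```python
def best_pose_per_compound(pose_scores):
    """Return {compound_id: (pose_index, similarity)} for the top pose of each compound."""
    best = {}
    for compound_id, pose_index, similarity in pose_scores:
        current = best.get(compound_id)
        # Assumption: the highest similarity among a compound's poses represents it.
        if current is None or similarity > current[1]:
            best[compound_id] = (pose_index, similarity)
    return best


def rank_compounds(best):
    """Order compound ids from the highest to the lowest retained similarity."""
    return sorted(best, key=lambda cid: best[cid][1], reverse=True)


# Hypothetical similarity scores (0 to 1) for the docked poses of two compounds.
poses = [
    ("lig_A", 1, 0.41), ("lig_A", 2, 0.63), ("lig_A", 3, 0.58),
    ("lig_B", 1, 0.52), ("lig_B", 2, 0.47),
]
best = best_pose_per_compound(poses)
ranking = rank_compounds(best)
print(ranking)
```

In this toy case the second pose of lig_A, which would not have been the top docking pose, is the one that determines the compound's rank after rescoring.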

#### Rescoring With Alternative Methodologies

The docking poses initially scored by PLANTS using the ChemPLP scoring function were also rescored using the alternative scoring function PLP in PLANTS. Otherwise, default options were used in the PLANTS-based rescoring. In addition, the docking solutions were re-ranked using the default settings of XSCORE1.2.1 (Wang et al., 2002) for comparison. XSCORE has three empirical scoring functions, HPSCORE, HMSCORE and HSSCORE, that can be fine-tuned on a case-by-case basis to improve the docking yield. None of the scoring functions alone produced markedly better early enrichment for the docking results, at least without special adjustments; thus, the software's default option of using X-CSCORE consensus scoring with all three functions was utilized.

#### Consensus Scoring

The R-NiB relies heavily on the initial success of the docking software used to generate the multiple docking poses for the rescoring phase, because no coordinate optimization or extra sampling is performed (**Figure 1**). Essentially, this means that the PLANTS scoring intrinsically influences the R-NiB yield in this study. The consensus scoring takes this aspect further by directly incorporating the initial ChemPLP docking scoring with the R-NiB scoring. All combinations in which the PLANTS- and ShaEP-based scores were given different weights were considered at 5% intervals, and those consensus scoring settings producing the highest early enrichment are discussed. The scores for each docked conformer outputted by PLANTS and ShaEP were normalized to fit into the scale from 1 to 0 and then combined for a consensus score.
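The weighting scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text says the scores were normalized to the 1 to 0 scale but does not give the formula, so min-max normalization is assumed here, and PLANTS scores are assumed to be better when more negative. All function names and example numbers are hypothetical.

```python
def min_max_normalize(scores, higher_is_better=True):
    """Scale scores to 0..1 so that 1 always marks the best value (assumed min-max scheme)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]  # no spread, so no ranking information
    scaled = [(s - lo) / (hi - lo) for s in scores]
    return scaled if higher_is_better else [1.0 - x for x in scaled]


def consensus_scores(plants_scores, shaep_scores, shaep_weight):
    """Weighted sum of normalized docking and similarity scores for matching poses."""
    # Assumption: a more negative PLANTS score indicates a better pose.
    plants_norm = min_max_normalize(plants_scores, higher_is_better=False)
    shaep_norm = min_max_normalize(shaep_scores, higher_is_better=True)
    return [(1.0 - shaep_weight) * p + shaep_weight * s
            for p, s in zip(plants_norm, shaep_norm)]


def weight_grid(step_percent=5):
    """ShaEP weights from 0 to 1 in steps of step_percent (0.00, 0.05, ..., 1.00)."""
    return [w / 100 for w in range(0, 101, step_percent)]


# Hypothetical scores for three docked poses.
plants = [-100.0, -80.0, -60.0]
shaep = [0.2, 0.8, 0.5]
equal_weight = consensus_scores(plants, shaep, shaep_weight=0.5)
all_settings = {w: consensus_scores(plants, shaep, w) for w in weight_grid()}
```

With a ShaEP weight of 0 the ranking is that of the docking alone, and with a weight of 1 it is that of the rescoring alone; scanning the grid and recomputing the early enrichment for each weight would then identify the best-performing setting for a dataset.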

#### Table and Figure Preparation

**Figures 1**, **4**, **5** were prepared using BODIL (Lehtonen et al., 2004), MOLSCRIPT2.1.2 (Kraulis, 1991), RASTER3D3.0.2 (Merritt and Murphy, 1994), and VMD1.9.2 (Humphrey et al., 1996). The area under curve (AUC) values (**Tables 2**, **3**) and the early enrichment values (**Tables 4**, **5**) were calculated with ROCKER0.1.4 (Lätti et al., 2016). The enrichment factors were calculated as the true positive rate when 1 or 5% of the decoy molecules have been found (EF<sub>n%DEC</sub>; see equation below) in order to make future comparison against other methodologies reliable (Lätti et al., 2016).

$$\text{EF}_{n\%\text{DEC}} = \frac{\text{Ligs}_{n\%\text{DEC}}}{\text{Ligs}_{\text{all}}} \times 100 \tag{1}$$

In Equation (1), Ligs<sub>n%DEC</sub> is the number of ligands ranked higher than n% of the decoys, whereas Ligs<sub>all</sub> is the total number of ligands in the dataset. The receiver operating characteristics (ROC) curves were plotted using ROCKER with the semi-log10 scale (only the x axis logarithmic) in **Figures 2**, **3** to highlight the very early enrichment of the actives. The standard deviation for the AUC is acquired in ROCKER utilizing the derived error for the Wilcoxon statistic (Hanley and McNeil, 1982). The Wilcoxon statistic estimates the probability of ranking a random ligand higher than a random decoy, which is equivalent to the value of AUC; thus, the errors are also equal.
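The two metrics can be illustrated with a short Python sketch. It is not the ROCKER implementation: the tie handling and the rounding of the decoy cutoff are assumptions, the standard error uses the commonly quoted closed-form approximation attributed to Hanley and McNeil (1982), and the scores are made up, with higher values taken to mean better-ranked compounds.

```python
import math


def ef_at_decoy_percent(ligand_scores, decoy_scores, percent):
    """Equation (1): percentage of ligands found when n % of the decoys have been found."""
    ranked_decoys = sorted(decoy_scores, reverse=True)
    # Assumed convention: the cutoff is the k-th best decoy (k >= 1), and a ligand
    # counts only if it scores strictly above that decoy.
    k = max(1, int(len(ranked_decoys) * percent / 100))
    threshold = ranked_decoys[k - 1]
    found = sum(1 for s in ligand_scores if s > threshold)
    return 100.0 * found / len(ligand_scores)


def auc_wilcoxon(ligand_scores, decoy_scores):
    """Probability that a random ligand outranks a random decoy (ties count as half)."""
    wins = 0.0
    for lig in ligand_scores:
        for dec in decoy_scores:
            if lig > dec:
                wins += 1.0
            elif lig == dec:
                wins += 0.5
    return wins / (len(ligand_scores) * len(decoy_scores))


def hanley_mcneil_se(auc, n_ligands, n_decoys):
    """Approximate standard error of the AUC (Hanley and McNeil closed form)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_ligands - 1) * (q1 - auc ** 2)
                + (n_decoys - 1) * (q2 - auc ** 2)) / (n_ligands * n_decoys)
    return math.sqrt(variance)


# Hypothetical dataset: 5 ligands and 100 decoys.
ligands = [99.5, 97.5, 96.5, 50.5, 10.5]
decoys = [float(i) for i in range(100)]

ef1 = ef_at_decoy_percent(ligands, decoys, 1)
ef5 = ef_at_decoy_percent(ligands, decoys, 5)
auc = auc_wilcoxon(ligands, decoys)
se = hanley_mcneil_se(auc, len(ligands), len(decoys))
print(f"EF1%DEC = {ef1:.1f}%, EF5%DEC = {ef5:.1f}%, AUC = {auc:.3f} +/- {se:.3f}")
```

The small ligand count in the toy data also shows why a few ligands shifting rank can change the values noticeably in small sets.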

### RESULTS

### Negative Image-Based Rescoring of Docking Solutions

The aim of the negative image-based rescoring or R-NiB (**Figure 1**) is to rescore existing molecular docking solutions and, by doing so, enrich active hits from a vast pool of compounds. The enrichment is achieved by comparing the shape/electrostatics similarity between the ligand conformers and the negative image of the target protein's ligand-binding cavity. The established NIB methodology (Virtanen and Pentikäinen, 2010; Niinivehmas et al., 2011, 2015) is employed in building the cavity-based NIB models of the target proteins' ligand-binding sites (PANTHER) and in comparing them against each docking solution (ShaEP). The starting point of the R-NiB workflow (**Figure 1**) is that the ligands are docked into the same target protein's cavity using a standard docking algorithm and, preferably, multiple solutions that roughly fit into the cavity are outputted for the rescoring.

### Molecular Docking Produces Moderate or High Enrichment in the Benchmarking

The AUC and early enrichment values (**Tables 2**, **3**) show that the molecular docking, performed with PLANTS (Korb et al., 2009), worked relatively well with both the DUD and DUD-E datasets (Huang et al., 2006; Mysinger et al., 2012). With the DUD, the AUC values ranged from 0.60 to 0.95, indicating either moderate or substantial enrichment of actives with a majority of the targets (**Table 2**). Notably, the docking for the estrogen receptor alpha agonists (ER-agonist; AUC = 0.81), PR (AUC = 0.63) and the peroxisome proliferator activated receptor gamma (PPARγ; AUC = 0.95) worked so well that the AUC values were not improved by the R-NiB (**Table 2**). As a side note, the DUD sets are small, containing 15–348 actives (**Table 1**) and, accordingly, a difference of a few active ligands in the ranking can sometimes have disproportionate effects on the AUC values. The docking also worked with the more demanding DUD-E ligand sets, which contain many more actives and decoys (**Table 1**), as the AUC values were typically well above 0.50 (**Table 3**). The AUC values could not be improved with the ER-mixed (AUC = 0.74), PPARγ (AUC = 0.85), phosphodiesterase type 5 (PDE5; AUC = 0.78) and cytochrome P450 3A4 (CYP3A4; AUC = 0.61) DUD-E sets using the R-NiB (**Table 3**).

Instead of the AUC values, it is often more practical to concentrate on the early enrichment when estimating the success of the virtual screening. That is to say, a high AUC value does not necessarily guarantee that the very top results contain active hits, even though it is a good metric for estimating the overall success rate of the screening. By and large, the docking struggled in ranking the actives to the very top of the list when inspecting the EF1%DEC or EF5%DEC values with the DUD and DUD-E datasets (**Tables 4**, **5**). Accordingly, the very early enrichment or EF1%DEC was improved by the R-NiB with all of the DUD sets (**Table 4**). With the DUD-E, the R-NiB could not produce improvement for the ER-mixed (EF1%DEC = 21.7%), PPARγ (EF1%DEC = 24.2%), retinoid X receptor alpha (RXRα; EF1%DEC = 11.5%), cyclo-oxygenase 2 (COX2; EF1%DEC = 5.7%), and PDE5 (EF1%DEC = 11.3%; **Table 5**); however, in the remaining six datasets the early enrichment was improved notably (discussed below). The ROC curves, which were plotted using the semi-log10 scale to highlight the very early enrichment, corroborate the numerical trends for both of the benchmark datasets (**Figures 2**, **3**).

### Negative Image Generation for Rescoring Is a Straightforward Process

The NIB model has to contain key features of the target protein's ligand-binding cavity in order to produce enrichment by the R-NiB (**Figure 1**). Firstly, the shape and size of the model should be limited to the cavity area that facilitates the ligand binding. Secondly, if the cavity contains vital hydrogen bond acceptor or donor groups, the NIB model must reflect those features in its charge properties. Each data point in the NIB model can be tested and adjusted iteratively using validated ligand sets that include both active and inactive compounds. This sort of "trial-and-error" refinement is generally


TABLE 2 | The AUC values for the DUD datasets.

If the rescoring produced higher AUC value in comparison to the initial docking (no overlapping standard error ranges), those numbers are shown in bold.

<sup>a</sup>The ligand distance limit used in PANTHER varied between the targets due to the size/shape differences of the binding cavities and the screened ligand sets. Limits included 1.5 Å (ER, AR, MR, PPARγ, PR, RXRα, and COX2), 2.0 Å (GR), and 3.0 Å (PDE5).

<sup>b</sup>The box radius varied between the targets due to the size/shape differences of the binding cavities and screened ligand sets. The radiuses included 6.0 Å (GR, PR and COX2), 7.0 Å (ER-mixed, MR and RXRα), and 8.0 Å (ER-agonist, ER-antagonist, AR, PPARγ and PDE5).

<sup>c</sup>The previously published PANTHER models, optimized for regular NIB screening, were taken from a prior study (Niinivehmas et al., 2015).

TABLE 3 | The AUC values for the DUD-E datasets.


If the rescoring produced higher AUC value in comparison to the initial docking (no overlapping standard error ranges), those numbers are shown in bold.

<sup>a</sup>The ligand distance limit used in PANTHER varied between the targets due to the size/shape differences of the binding cavities and screened ligand sets. Limits included 1.5 Å (ER-mixed, AR, PPARγ, PR, and COX2), 2.0 Å (MR, RXRα, NEU, PDE5, and CYP3A4) and 3.0 Å (GR).

<sup>b</sup>The box radius varied between the targets due to the size/shape differences of the binding cavities and screened ligand sets. The radiuses included 6.0 Å (AR, GR, MR, COX2, NEU, and PR), 7.0 Å (PDE5, RXRα, and CYP3A4), 9.0 Å (PPARγ) and 10.0 Å (ER-mixed).

<sup>c</sup>The previously published PANTHER models, optimized for regular NIB screening, were taken from a prior study (Niinivehmas et al., 2015).

not feasible and, accordingly, the R-NiB methodology was applied here using default easy-to-replicate PANTHER/ShaEP settings (Vainio et al., 2009; Niinivehmas et al., 2015). Effective models were acquired by simply adjusting the cavity detection box radius or by limiting the cavity dimensions with the ligand distance limit in PANTHER (Niinivehmas et al., 2015). The model generation relied solely on the PDB entry used also in the docking and generally the first-tried basic settings were enough to improve the enrichment (**Tables 2**–**5**; **Figures 2**, **3**). For comparison, the rescoring was also performed with prior PANTHER models (**Tables 2**–**5**) optimized for the standard NIB screening (Niinivehmas et al., 2015).

#### Negative Image-Based Rescoring Improves the Early Enrichment With Most Targets

The R-NiB (**Figure 1**) does not rely on superimposing or geometry optimization prior to the similarity comparison of the docking solutions against the cavity-based NIB models. In a nutshell, either the docked ligand poses outputted by the docking


Those EF%DEC values that are at least 1.5-fold compared to the initial docking are shown in bold.

<sup>a</sup>The ligand distance limit used in PANTHER varied between the targets due to the size/shape differences of the binding cavities and the screened ligand sets. Limits included 1.5 Å (ER-agonist, ER-mixed, AR, MR, PPARγ, RXRα, and COX2), 2.0 Å (GR and PR), 3.0 Å (ER-antagonist) and 4.0 Å (PDE5).

<sup>b</sup>The box radius varied between the targets due to the size/shape differences of the binding cavities and screened ligand sets. The radiuses included 6.0 Å (MR and COX2), 7.0 Å (AR and PR), 8.0 Å (ER's, GR, PPARγ and RXRα) and 9.0 Å (PDE5).

<sup>c</sup>The previously published PANTHER models, optimized for regular NIB screening, were taken from a prior study (Niinivehmas et al., 2015).

software match the cavity-based NIB models or they do not; the similarity score (from 1 to 0) of ShaEP reflects this reality. Therefore, it is crucial that the initial docking has sampled the ligand conformers thoroughly and produces "correct" ligand poses that can be discovered by the R-NiB. Understandably, the rescoring cannot enrich active compounds if they are docked completely outside the cavity space that was used in the NIB model generation.

With the DUD datasets (Huang et al., 2006), the AUC values from docking were improved somewhat or greatly with most of the target proteins using the R-NiB (**Table 2**). The AUC improvement was sizeable with the GR (0.60 vs. 0.84), RXRα (0.78 vs. 0.90), mineralocorticoid receptor (MR; 0.80 vs. 0.93) and COX2 (0.81 vs. 0.95), to name a few examples (**Table 2**). Moreover, the R-NiB could improve the AUC values substantially even with the more demanding DUD-E sets (Mysinger et al., 2012), where the docking scoring started to falter (**Table 3**). This positive effect in favor of the R-NiB was seen with a multitude of target proteins, including the androgen receptor (AR; 0.54 vs. 0.76), GR (0.54 vs. 0.74), MR (0.55 vs. 0.74), PR (0.63 vs. 0.74), RXRα (0.77 vs. 0.83), and COX2 (0.66 vs. 0.75). The AUC values worsened or improved marginally for the CYP3A4 (0.61 vs. 0.60) and NEU (0.85 vs. 0.89), respectively, but in these cases the results remained within the margin of error (**Table 3**). The R-NiB clearly could not improve the AUC values for the PDE5, PPARγ and ER-mixed with the DUD-E datasets (**Table 3**). The PDE5 and ER-mixed datasets are particularly demanding, because they both contain two distinct ligand groups for which one cannot build a single satisfactory NIB model (Niinivehmas et al., 2011).

As stated above, it is more important that the virtual screening produces the highest possible early enrichment rather than the best AUC value. To this end, the R-NiB was able to improve the early enrichment somewhat or substantially with most of the target proteins included in the DUD datasets (**Table 4**). The EF1%DEC improvement ranged from 1.9 to 49.1% between the different targets. On average, the EF1%DEC or EF5%DEC improvement was 3.3-fold or 1.8-fold, respectively, and, at best, the EF1%DEC of PR improved 9.0-fold using the R-NiB. A close inspection of the semi-logarithmic ROC curves (**Figure 2**) indicates that the very early enrichment produced by the R-NiB was always as good as or better than that of the original docking scoring (well above the random rate; **Figure 2**). This suggests that the rescoring generally has a positive effect on the yield with the


Those EF%DEC values that are at least 1.5-fold compared to the initial docking are shown in bold.

<sup>a</sup>The ligand distance limit used in PANTHER varied between the targets due to the size/shape differences of the binding cavities and screened ligand sets. Limits included 1.5 Å (ER-mixed, AR, PDE5, GR, MR, PR and COX2), 2.0 Å (RXRα, NEU and CYP3A4) and 3.0 Å (PPARγ).

<sup>b</sup>The box radius varied between the targets due to the size/shape differences of the binding cavities and screened ligand sets. The radiuses included 6.0 Å (AR, GR, MR and NEU), 7.0 Å (RXRα, PR, PDE5 and CYP3A4), 8.0 Å (COX2), 9.0 Å (PPARγ) and 11.0 Å (ER-mixed).

<sup>c</sup>The previously published PANTHER models, optimized for regular NIB screening, were taken from a prior study (Niinivehmas et al., 2015).

tested DUD datasets. The EF1%DEC improvement (**Table 4**) was most prominent with the COX2 (13.5 vs. 62.6%), but the R-NiB also worked exceptionally well based on the EF5%DEC, for example, with the RXRα (30.0 vs. 80.0%), COX2 (35.3 vs. 83.0%), PDE5 (25.5 vs. 39.2%) and ER-agonist (44.8 vs. 59.7%).

Based on the early enrichment values (**Table 5**) and the plotted ROC curves (**Figure 3**), the overall performance of the R-NiB with the DUD-E dataset showed similar trends as with the DUD (**Table 4**; **Figure 2**). The improvement over the original docking was on average 2.5-fold for the EF1%DEC (**Table 5**) despite the fact that the DUD-E ligand sets are much larger than the smaller but better curated DUD datasets (**Table 1**). For example, the EF1%DEC improvement of 2.1% (from 2.0 to 4.1%) with PR might seem minor at first glance, but in terms of absolute compound numbers it is a marked uptick from the discovery of six to 13 actives over the original docking. The EF1%DEC (**Table 5**) was improved by the R-NiB substantially with the AR (1.5 vs. 13.0%), MR (3.2 vs. 11.7%) and NEU (4.1 vs. 13.3%). Although in the case of the RXRα the EF1%DEC values suggested that the docking scoring worked better than the R-NiB (**Table 5**), a close inspection of the semi-logarithmic ROC plot shows that the rescoring actually produced higher very early enrichment (EF0.5%DEC 6.1 vs. 3.8%; **Figure 3**). The EF5%DEC was improved on average 1.3-fold for these targets (**Table 5**) and, for example, the GR (12.0 vs. 22.5%) received a 1.9-fold improvement.

#### Negative Image-Based Rescoring Is Both Ultrafast and Efficient

For the purpose of comparison, the original docking solutions were also re-evaluated using the empirical rescoring algorithm XSCORE (Wang et al., 2002) and the PLP scoring function in PLANTS. Target-specific settings for ligand-receptor interactions such as hydrogen bonding or hydrophobicity are considered via multivariate analysis in XSCORE. Although the R-NiB generally produced better enrichment than XSCORE, the latter algorithm excelled with both the DUD and DUD-E datasets for the RXRα (**Tables 2**–**5**). The rescoring with the PLP function in PLANTS could only in some cases (e.g., COX2) improve the original ChemPLP-based ranking and, generally, the R-NiB produced substantially better results (**Tables 2**–**5**).

The use of non-default XSCORE settings could have produced higher early enrichment; however, similar fine-tuning of the R-NiB models or even PLANTS settings could likely have improved the enrichment as well. By adjusting the assortment of the cavity charge points capable of hydrogen bonding and/or lowering/increasing the weight of the electrostatics in

the result after PANTHER/ShaEP-based rescoring, and the black line gives the result from consensus scoring where both of them are given equal weight (50/50%). The dashed line outlines the random selection (AUC = 0.50). The semi-log10 scale is used only for the x axis to highlight the very early enrichment or lack thereof.

the similarity screening generally improves the enrichment. For example, in our test runs the R-NiB produced notably better early enrichment (EF1%DEC 12.2–23.0%) for the DUD set of the AR with the box radius option when only a few cavity points were added or removed instead of using the default NIB model (data not shown). In fact, one could even overemphasize certain properties (e.g., charge) artificially in the NIB model to produce better enrichment in the rescoring than what the default settings would otherwise allow. Because this kind of rescoring bias does not alter the actual ligand poses, the preferred docking solutions remain within the realm of the possible. The situation can be entirely different if the original docking scoring function, which affects the ligand conformer sampling, is altered radically; i.e., unrealistic conformations could be put forward.

Excluding the time taken for the NIB model generation, the actual rescoring performed with ShaEP is computationally very inexpensive, taking only a fraction of the time required for the initial docking. This is possible because no ligand conformer sampling or even geometry optimization between the NIB model and docked ligand conformers is done. In fact, the ShaEP-based scoring with the DUD sets for the ER-agonist (1.94 ms/comp. vs. ∼24.4 ms/comp.), PDE5 (3.81 ms/comp. vs. ∼35.7 ms/comp.), and COX2 (2.43 ms/comp. vs. ∼54.0 ms/comp.) was roughly 9 to 22 times faster than the XSCORE rescoring, which is already very fast. Similarly, rescoring with the PLP function in PLANTS took roughly double the time with the ER-agonist (1.94 ms/comp. vs. ∼3.21 ms/comp.), PDE5 (3.81 ms/comp. vs. ∼7.15 ms/comp.), and COX2 (2.43 ms/comp. vs. ∼4.54 ms/comp.) datasets when compared to the R-NiB. These benchmark numbers vary depending on the computer set-up. Here, the software was run using a single Intel Xeon CPU (W3670 3.2 GHz) and RAM 12 GB DDR 1333 MHz in a LINUX desktop. The absolute size of the NIB model and that of the compounds being rescored affect the R-NiB performance; however, the differences in the wall time are minor.
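The relative speed of the three rescoring methods can be recomputed from the per-compound timings quoted above. The short Python sketch below only repeats that arithmetic; the dictionary layout and function name are illustrative choices, and the timings are the approximate values reported in the text.

```python
# Per-compound wall times in milliseconds, as quoted in the text (DUD sets).
timings_ms = {
    "ER-agonist": {"ShaEP": 1.94, "XSCORE": 24.4, "PLP": 3.21},
    "PDE5": {"ShaEP": 3.81, "XSCORE": 35.7, "PLP": 7.15},
    "COX2": {"ShaEP": 2.43, "XSCORE": 54.0, "PLP": 4.54},
}


def slowdown_vs_shaep(timings):
    """For each target, divide the other methods' times by the ShaEP time."""
    return {
        target: {method: value / values["ShaEP"]
                 for method, value in values.items() if method != "ShaEP"}
        for target, values in timings.items()
    }


ratios = slowdown_vs_shaep(timings_ms)
for target, r in ratios.items():
    print(f"{target}: XSCORE {r['XSCORE']:.1f}x, PLP {r['PLP']:.1f}x slower than ShaEP")
```

The ratios make the two comparisons explicit: the XSCORE rescoring is about an order of magnitude slower than the ShaEP-based rescoring, whereas the PLP rescoring takes somewhat less than twice as long.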

### DISCUSSION

The negative image-based rescoring or the R-NiB is a truly novel way of rescoring docking solutions, because it does not rely on molecular mechanics force fields or on empirical or knowledge-based descriptors in evaluating the favorability of the ligand binding. For example, the binding free energy is not considered in any shape or form during the rescoring. Although the selected atom charges and van der Waals radii affect the NIB model generation profoundly, the ShaEP-based rescoring itself is a simple matter of shape/electrostatics comparison. No force field-based sampling or even coordinate superimposition is needed. The NIB models can be trained for optimal effect using experimental ligand sets with the "trial-and-error" approach, but generally this is not needed.

### Applicability of Negative Image-Based Rescoring

A NIB model can be built for virtually any target protein as long as there is a solid idea where the potential small-molecule binding or initial docking should happen. The target pocket can be a well-defined and enclosed cavity (see CYP3A4 in **Figures 4A–D** and GR in **Figures 4E–H**), an opening on the protein surface (see NEU in **Figures 4I–L**), a sub-cavity, a groove or even a small dent on the protein surface (**Figure 4**). The R-NiB results with the benchmark sets confirm this hypothesis, because the method improves docking enrichment with a variety of different target proteins (**Tables 2**–**5**; **Figures 2**, **3**) and, more importantly, with physically different kinds of ligand-binding cavities (**Figure 4**). The enrichment values (**Tables 2**–**4**) and semi-logarithmic ROC curves (**Figures 2**, **3**) show that the R-NiB (**Figure 1**) clearly improves the yield with a multitude of DUD-E datasets, including the nuclear receptors AR, GR, MR, and PR, but also with an entirely different kind of target protein, NEU.

Overall, the R-NiB results (**Tables 2**–**5**; **Figures 2**, **3**) show that a satisfactory enrichment can be acquired in most cases by building NIB models by simply adjusting the cavity detection radius or by limiting the cavity search area using a receptor-bound ligand included in the PDB entry (**Figures 1**, **4**). Having protrusions outside this cavity space does not necessarily worsen

partially buried and, thus, no cross-sectioning was done. The contours of the active sites of (B,C) CYP3A4, (F,G) GR, and (J,K) NEU are shown both as opaque surfaces and finalized NIB models (transparent surfaces with charge potential) in the cross-section close-ups. The red, blue, and white dots in the NIB model indicate the negative, positive and neutral cavity dots (or filler atoms) constituting the negative image. The docked poses of five known active compounds (stick models with orange backbone) for (D) CYP3A4, (H) GR, and (L) NEU from PLANTS are shown stacked in the far right.

any ligand's similarity score a lot (a marginal penalty inflicted in the ShaEP scoring); however, it is important to understand that those ligand segments outside the cavity will be effectively ignored in the rescoring.

So, the emphasis of R-NiB is resolutely on the cavity's negative image (**Figure 4**) and it is recommended that ligands that are impractically large for the cavity in question are filtered away before docking and/or rescoring. Essentially, docking sizable ligands with a lot of rotatable bonds (e.g., PPARγ datasets) or with particularly large cavities (e.g., PDE5) is likely to produce errors or difficult-to-ascertain alternative poses that cannot be reliably rescored using the R-NiB. Despite this, in theory, the R-NiB could be used to rescore even docked peptides (not tested here) as long as their binding is dependent on the shape/electrostatics complementarity with the cavity. This narrow focus on the area designated by the NIB model for the ligand binding makes the R-NiB (**Figure 1**) truly a precision technique.

The downside of this narrow focus is that it also limits the usability of different benchmark test sets in evaluating the R-NiB (**Figure 1**). If the test set contains active compounds that bind into completely different or only partially connected ligand-binding sites in the target protein, the R-NiB cannot possibly rank all those ligands high up in the list using a single NIB model (**Figure 4**). Moreover, when dealing with large ligand-binding cavities such as the active site of PDE5, where inhibitors can have very different binding locations and poses with very little overlap, and/or water molecules play a big role in coordinating the ligand binding, a single NIB model simply cannot provide all the information needed for the enrichment. One can try to solve this issue by curating the ligand sets better, limiting the search radius for docking or by applying multiple NIB models to the task. Naturally, this level of focus is not a problem when working in an actual screening project, in which the efforts are centered on a specific binding site or subcavity.

### Recognizing Biologically Relevant Ligand-Binding Poses

The R-NiB is not optimizing the ligand positioning inside the protein's ligand-binding pocket, but merely comparing the earlier produced docking poses against the cavity's shape/electrostatics (**Figure 1**). The highest scored poses for the active compounds might not differ from the original docking; however, the enrichment can improve due to lower ranking of the inactives by the R-NiB. In fact, improvement in the enrichment values is not an absolute guarantee that the "correct" conformers are discovered during the rescoring. With certain ligand-binding pockets and compounds it is very difficult to conclude what is the actual binding pose and there might even exist more than one valid pose (Mobley and Dill, 2009). One can attempt to address this issue by looking at the individual docking solutions, their exact binding interactions and, ultimately, compare them against the experimentally validated data for the same compound or its closely-related structural analogs (**Figure 5**). For example, the R-NiB seems to be able to recognize the biologically relevant binding pose of hydrocortisone with the MR whereas the original docking scoring fails (**Figure 5**).

Because the R-NiB can only reorder the docking solutions, the "correct" pose or ligand cannot emerge on top of the results list if all of the ligand conformers are docked in a completely "wrong" way or even outside the ligand-binding pocket. This is true for all rescoring methodologies, as they mainly reshuffle existing solutions. To a certain extent, this is the case even for force field-based post-processing methodologies, because the initial ligand-receptor complex is crucial for the sampling as well. In certain cases, even a partial shape/electrostatics match with the cavity-based NIB model can give the docked compound a substantially higher ranking and improve the enrichment. By docking the decoys mostly outside the binding cavity, one could also improve the enrichment as long as the actives reside at the site. Here, it was made sure that the docked compounds and the generated NIB models occupied roughly the same 3D space in relation to the protein. The match between the cavity space and the outputted docking solutions is highlighted for the CYP3A4 (**Figure 4C** vs. **Figure 4D**), GR (**Figure 4G** vs. **Figure 4H**), and NEU (**Figure 4K** vs. **Figure 4L**) in **Figure 4**.

### Consensus Scoring—Finding the Balance Between the Scoring Functions

If the initial docking produced the "correct" or at least reasonable pose for the active compound but it was not favored by the docking software, in theory one should be able to recognize it from the multiple outputted poses using a superior scoring method. In reality, all of the scoring methodologies excel on some targets and

aldosterone (stick model with cyan backbone) are shown. (B) The negative image or NIB model (transparent surface) of the MR active site was built using the same PDB entry (Bledsoe et al., 2005) and the 1.5 Å ligand distance limit option in PANTHER. The red and blue dots depict the negatively and positively charged cavity points, respectively, whereas the white dots are neutral. (C) The rescored pose (rank #13) of hydrocortisone (stick model with orange backbone) closely resembles the experimentally verified pose of its structural analog aldosterone (A vs. C). (D) Hence, the pose of hydrocortisone given the highest score by PLANTS (rank #17), showing a reversed pose in comparison to the aldosterone (A vs. D), is likely erroneous (D).



The NIB model producing the highest EF1%DEC (Table 4) was used in the consensus scoring with PLANTS. When optimal and equal (50/50%) weight is used, all datasets produced better EF1%DEC and EF5%DEC enrichments than the docking.

<sup>a</sup>If the ShaEP weight is 1.0, the consensus score comes entirely from the ShaEP rescoring and, vice versa, if the weight is 0, only the PLANTS score is used. The value of 0.50 corresponds to the situation in which the PLANTS docking and the ShaEP rescoring have equal weight in the results. Both the ShaEP and PLANTS scores were normalized to fit the scale from 0 to 1 before combining them. The consensus scoring was not done to acquire the best AUC enrichment possible and, accordingly, on rare occasions the value could decrease (downward arrow) instead of increasing (upward arrow).

<sup>b</sup>ΔEF%DEC corresponds to the EF%DEC difference between the consensus scoring and the original ShaEP rescoring of the same NIB model.

TABLE 7 | The consensus scoring of the DUD-E datasets.


The NIB model producing the highest EF1%DEC (Table 5) was used in the consensus scoring with PLANTS. When optimal weight is used, all datasets produced better EF1%DEC and EF5%DEC enrichments than the docking. In the case of equal (50/50%) weight, only the PPARγ dataset produced weaker early enrichment than the original docking. See Table 6 for further details.

ligand sets for different and sometimes even conflicting reasons. Because both the original docking software PLANTS (Korb et al., 2009) and the similarity comparison algorithm ShaEP (Vainio et al., 2009) output their own scores for each ligand conformer, it is possible to normalize and combine the results and adjust their relative weight with different targets (**Tables 6**, **7**).

This score weighting or consensus scoring (**Tables 6**, **7**) was performed to determine whether the ranking benefitted more from either of the scoring functions and whether there is a generally applicable weight ratio that could be routinely used. Because the emphasis in the consensus scoring was put on the EF1%DEC improvement, the AUC values of the DUD datasets were not necessarily improved (e.g., PPARγ; **Table 2** vs. **Table 6**). Similarly, with the ER-mixed, plagued also by the dualistic nature of the included agonist/antagonist ligands, the AUC values were not improved for the DUD-E (**Table 3** vs. **Table 7**). Nevertheless, focusing on the early enrichment indicates that the consensus scoring worked almost without exception better than the docking for both the DUD (**Table 4** vs. **Table 6**) and DUD-E datasets (**Table 5** vs. **Table 7**). Even a relatively tiny push by the R-NiB (e.g., 10–35% weight from ShaEP) was enough to help the early enrichment (**Tables 6**, **7**).

Dealing with a completely new target protein cavity or heterogeneous ligand set is likely to require re-weighting and careful optimization upon the arrival of experimental results. Despite this, the yield was in most cases improved by simply giving both scoring functions an equal weight in the consensus scoring (**Tables 6**, **7**) instead of using the default PLANTS scoring or the R-NiB alone (**Tables 4**, **5**). With the DUD datasets, the equal weight consensus scoring always produced better early enrichment than the docking, but the non-weighted R-NiB could sometimes work slightly better (see the negative ΔEF values in **Table 6**; **Figure 2**). Similarly, the equal weighting produced better early enrichment than docking scoring alone with the DUD-E datasets; however, the yield for the PPARγ did not benefit from this arrangement. Regardless, with a multitude of targets, the non-weighted R-NiB produced higher early enrichment than the equal weight consensus scoring (see the negative ΔEF values in **Table 7**; **Figure 3**).

Although the equal weighting in the consensus scoring could reduce the early enrichment marginally in certain cases, it generally produced better early enrichment, making it a viable option for future docking screening experiments.

#### CONCLUSIONS

This study demonstrates that by simply focusing on the shape/electrostatics complementarity between the ligand and the receptor protein's binding cavity, the docking performance regarding the early enrichment can be improved across the board. The rescoring is done by generating a negative image of the protein's ligand-binding cavity that is then used directly in the similarity comparison of the docking solutions (**Figure 1**). The results show that the negative image-based rescoring (or the R-NiB) can enhance the success-rate of docking screenings to a level that facilitates effective drug discovery. Moreover, the R-NiB can be used in unison with other docking scoring functions in consensus scoring to improve the early enrichment yet further.

### AUTHOR CONTRIBUTIONS

STK performed the docking and rescoring assays with assistance from SN and MA. PAP wrote the manuscript with help from the co-authors. OTP and PAP designed the experiments based on the original concept by OTP and SL. PAP supervised the study.

### ACKNOWLEDGMENTS

The Finnish IT Center for Science (CSC) is acknowledged for generous computational resources (OTP; Project Nos. jyy2516 and jyy2585).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.00260/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Kurkinen, Niinivehmas, Ahinko, Lätti, Pentikäinen and Postila. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# LeadOp+R: Structure-Based Lead Optimization With Synthetic Accessibility

#### Fang-Yu Lin<sup>1</sup>, Emilio Xavier Esposito<sup>2</sup> and Yufeng J. Tseng<sup>1,3</sup>\*

<sup>1</sup> Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan, <sup>2</sup> exeResearch, LLC, East Lansing, MI, United States, <sup>3</sup> Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan

We previously described a structure-based fragment hopping method for lead optimization using a pre-docked fragment database, "LeadOp," that conceptually replaced "bad" fragments of a ligand with "good" fragments while leaving the core of the ligand intact, thus improving the compound's activity. LeadOp was shown to optimize the query molecules and systematically develop improved analogs for each of our example systems. However, even with fragment-based design from common building blocks, synthesis of the proposed compounds remains a challenge. In this work, "LeadOp+R" was developed based on 198 classical chemical reactions to consider synthetic accessibility while optimizing leads. LeadOp+R first allows the user to identify a preserved space defined by the volume occupied by a fragment of the query molecule to be preserved. LeadOp+R then searches for building blocks with the same preserved space as initial reactants and grows molecules toward the preferred receptor-ligand interactions according to reaction rules from the reaction database in LeadOp+R. Multiple conformers of each intermediate product are considered and evaluated at each step. The conformer with the best group efficiency score is selected as the initial conformer of the next building block until the program has finished optimizing all selected receptor-ligand interactions. The LeadOp+R method was tested with two biomolecular systems: Tie-2 kinase and human 5-lipoxygenase. The LeadOp+R methodology was able to optimize the query molecules and systematically develop improved analogs for each of our example systems. The suggested synthetic routes for compounds proposed by LeadOp+R were the same as the published synthetic routes devised by synthetic/organic chemists.

Keywords: fragment-based, lead optimization, structure-based drug design, computer-assisted synthesis, human 5-lipoxygenase, tie-2 kinase

#### INTRODUCTION

We recently reported a new structure-based fragment hopping method for lead optimization, LeadOp (Lin and Tseng, 2011). Our lead optimization method works by decomposing a chemical structure into fragments, either by chemical or user-defined rules. The fragments are evaluated in a pre-docked fragment database and ranked according to specific fragment-receptor binding interactions. The ranked fragments provide the ability to replace fragments possessing less favorable contributions to binding. With optimal fragments selected, LeadOp reassembles the fragments to form the new drug-like compound. LeadOp is an algorithm that can automatically optimize a query molecule by searching and replacing fragments from a pre-docked fragment database in the active site to generate structures with better binding without prior knowledge of better fragments. Additionally, users can specify parts of structures to be optimized based on known interactions or the user's preference. However, the proposed compounds are not always easy to synthesize. In this study, we demonstrate LeadOp+R, an advanced approach for lead optimization with synthetic accessibility.

#### Edited by:

Leonardo G. Ferreira, University of São Paulo, Brazil

#### Reviewed by:

Alain Couvineau, Institut National de la Santé et de la Recherche Médicale (INSERM), France Yingxia Li, Fudan University, China

> \*Correspondence: Yufeng J. Tseng yjtseng@csie.ntu.edu.tw

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

> Received: 28 September 2017 Accepted: 29 January 2018 Published: 05 March 2018

#### Citation:

Lin F-Y, Esposito EX and Tseng YJ (2018) LeadOp+R: Structure-Based Lead Optimization With Synthetic Accessibility. Front. Pharmacol. 9:96. doi: 10.3389/fphar.2018.00096

A basic difficulty in most applications of computer-aided drug design is that designed (suggested) molecules are often of uncertain synthetic accessibility, leading to a slow feedback-improvement loop between experimental synthesis and modeling design (Hopkins et al., 2004). Various synthesis planning programs, such as WODCA (Ihlenfeldt and Gasteiger, 1996), SYNGEN (Hendrickson and Toczko, 1988), and ROBIA (Socorro and Goodman, 2006), were developed to generate synthetic routes, which involves searching either a database of chemical reactions or a set of transformation rules for reaction centers that match the target compound in order to propose analogous transformations. Route generation tools, mostly retrosynthetic software, can suggest routes based on encoded generalized reaction rules to identify the bond disconnections most apt to lead to synthetically accessible precursor structures (Corey et al., 1975; Corey and Jorgensen, 1976), while Hendrickson's group (Hendrickson et al., 1985) developed a logic-based synthesis design method with formalized reaction constraints. A good example of route generation is Route Designer (Law et al., 2009), which uses rules describing retrosynthetic transformations that are automatically generated from a reaction database and generates complete synthetic routes for target molecules starting from available reactants. Applications combining synthetic route design and de novo design for target binding sites have also been developed, such as SPROUT (Mata et al., 1995), which starts by generating a skeleton, follows with atom substitution to convert the solution skeletons into molecules, and ranks the output according to ease of synthesis. However, because these molecules are generated on the basis of ease of synthesis, the desired core of potential inhibitors cannot easily be preserved.

To make the synthetic-modeling feedback loop more straightforward, we developed and implemented "LeadOp+R": **Lead Op**timization with synthetic accessibility based on chemical **R**eaction route. LeadOp+R is an algorithm that performs structure-based lead optimization while considering the synthetic reactions from reactants to products according to reaction rules. It takes into consideration the chemical reaction environment; this information is based on known chemical syntheses. The synthetic routes suggested by LeadOp+R are examined, using the LeadOp+R reaction database, to ensure the validity of each transformation from a starting reactant into the final product. The reaction rules extracted into the LeadOp+R reaction database do not take into account temporary or unwanted chemical reactions; instead, they consider only direct chemical reactions that transform the starting reactants into products. LeadOp+R's algorithm consists of the following five steps: (i) identify a preserved space (defined by the volume occupied by the fragment of the query molecule that the user chooses to preserve) and search for building blocks with the same preserved space as initial reactants, (ii) search the reaction rules for each reactant identified, (iii) generate reaction products based on the reaction rules, (iv) evaluate the conformations of each product of each reaction, and (v) select the conformer from the previous steps to serve as the reactant for growing the molecule until optimization is fulfilled for every inhibitor–receptor interaction selected by the user. Multiple conformers of each product are considered and evaluated at each step. The conformer with the best group efficiency score is selected as the next reactant, where the group efficiency score is calculated as the binding energy divided by the number of heavy atoms. Thus, this evaluation favors conformers that bind more strongly toward the specified receptor-ligand interactions with fewer heavy atoms (Hopkins et al., 2004; Ciulli et al., 2006; Alex and Flocco, 2007; Saxty et al., 2007; Congreve et al., 2008; Orita et al., 2009). Compounds passing the molecular property filters comprise the final list of proposed compounds. The compounds are then energy-minimized and ranked on the basis of the overall ligand–receptor binding energy. To investigate the interactions between the newly assembled molecules and their receptor, molecular dynamics simulations were performed to explore the compounds' poses and interactions with the solved crystal structure of the receptor.
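The group efficiency criterion described above can be illustrated with a minimal Python sketch. It assumes binding energies in kcal/mol, where a more negative value means stronger binding; the names `Conformer`, `group_efficiency`, and `best_conformer` and the example values are hypothetical and are not part of LeadOp+R.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Conformer:
    """Hypothetical record for one scored conformer of a product."""
    name: str
    binding_energy: float  # assumed kcal/mol; more negative = stronger binding
    heavy_atoms: int       # number of non-hydrogen atoms


def group_efficiency(conf: Conformer) -> float:
    """Binding energy divided by the number of heavy atoms."""
    if conf.heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return conf.binding_energy / conf.heavy_atoms


def best_conformer(conformers: Iterable[Conformer]) -> Conformer:
    """Return the conformer with the lowest (most favorable) score."""
    return min(conformers, key=group_efficiency)


# Illustrative values only, not taken from the study.
candidates = [
    Conformer("conf_a", binding_energy=-6.0, heavy_atoms=20),  # -0.30 per atom
    Conformer("conf_b", binding_energy=-7.0, heavy_atoms=35),  # -0.20 per atom
]
chosen = best_conformer(candidates)
```

In this toy case `conf_b` binds more strongly overall, but `conf_a` is preferred because it achieves more binding energy per heavy atom.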

To demonstrate the LeadOp+R algorithm, we selected the Tie-2 kinase (Hodous et al., 2007) and human 5-lipoxygenase (5-LOX) (Ducharme et al., 2010) protein systems and their associated inhibitors as model systems. The endothelium-specific receptor tyrosine kinase Tie-2 (tyrosine kinase containing Ig and EGF homology domains) is primarily expressed in the vascular endothelium and is involved in vessel branching, sprouting, remodeling, maturation, and stability (Yu, 2005). The role of tyrosine kinases in angiogenesis and in the vascularization of solid tumors has drawn considerable interest (Hasegawa et al., 2007) and is considered to be angiogenesis-dependent in cancer. Interference with the Tie-2 pathway by diverse blocking agents has been shown to suppress tumor growth in xenograft studies (Oliner et al., 2004). The development of Tie-2 kinase inhibitors may block the beneficial anti-inflammatory and vascular stabilizing effects; thus, the discovery of potent Tie-2 kinase inhibitors has advanced into clinical studies (Huang et al., 2010). Lipoxygenases are a family of iron-containing enzymes found in a large variety of organisms, including bacteria and animals. They catalyze the dioxygenation of polyunsaturated fatty acids containing a cis-1,4-pentadiene structure (the first committed structure in a metabolic pathway cascade) and are involved in the initiation of signaling molecule synthesis and in inducing structural or metabolic changes (Steele et al., 1999). Four major isozymes of lipoxygenases have been identified (Ivanov et al., 2010), namely 5-, 8-, 12-, and 15-LOX, which are key enzymes in the metabolism of prostaglandins and leukotrienes. In particular, leukotrienes are produced through the 5-LOX pathway, and increased activity of the 5-LOX pathway is strongly associated with atherosclerosis (Woods et al., 1993). As the 5-LOX biological pathways and byproducts lead to inflammation, discovering a 5-lipoxygenase inhibitor is important to the fields of inflammatory and allergic diseases (Shaffer and Mansmann, 1997).

### MATERIALS AND METHODS

#### Overall Procedure

The general protocol for LeadOp+R is illustrated in **Figure 1** and the details of each step are described in the following sections. Prior to applying the LeadOp+R optimization procedure, a reaction rule database is constructed, containing reaction rules for the reactant moiety, the product moiety, and the building blocks of each reaction. Thus, the participants involved in each reaction are known for synthetic assessment in LeadOp+R. The initial step of LeadOp+R requires the user to select the favored inhibitor-receptor interaction positions for optimization. The inhibitor-receptor interaction positions determine the "direction" for virtual synthesis and optimization. LeadOp+R systematically optimizes and grows a structure until all the user-defined directions are processed. LeadOp+R initiates the analysis with the inhibitor-receptor complex from docking studies or crystal structures. The user can determine which fragment(s) in the query inhibitor (initial compound) to preserve during optimization. To ensure that the initial synthesis is accessible, a building block containing the preserved fragment is used as the initial building block. LeadOp+R then searches the reaction rule database with this building block to identify associated reaction rules. Once the reaction rules and associated participants are identified, the products of each reaction rule are generated virtually. The best binding conformation of each proposed compound is selected from an ensemble of conformers constructed for that compound. The conformer of each compound with the lowest group efficiency value is selected as the initial conformer of the next building block until the program reaches the termination condition. By evaluating the contribution of each product to binding with group efficiency, LeadOp+R selects compounds that bind more strongly yet possess fewer heavy atoms. The compounds passing a set of molecular property filters comprise the final list of proposed compounds. Following a short molecular dynamics simulation, the compounds are energy-minimized and ranked on the basis of the overall ligand–receptor binding (interaction) energy. This provides a series of new and more potent compounds that are chemically accessible.
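The overall procedure can be read as a greedy growth loop over the user-selected interaction directions. The Python sketch below is only an illustration under stated assumptions: `grow_molecule`, `propose`, and `score` are hypothetical names, molecules are represented by plain strings, the products and scores are toy values, and the acceptance cutoff of −0.1 is the group efficiency threshold given later in the Methods.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Molecule = str  # placeholder; a real implementation would use a structure object


def grow_molecule(
    start: Molecule,
    directions: Iterable[str],
    propose: Callable[[Molecule, str], Iterable[Molecule]],
    score: Callable[[Molecule, str], float],
    cutoff: float = -0.1,
) -> List[Molecule]:
    """Greedy growth: accept at most one product per interaction direction."""
    route = [start]
    current = start
    for direction in directions:
        scored: List[Tuple[float, Molecule]] = [
            (score(product, direction), product)
            for product in propose(current, direction)
        ]
        accepted = [item for item in scored if item[0] < cutoff]
        if not accepted:
            continue  # no acceptable product; keep the current molecule
        _, current = min(accepted, key=lambda item: item[0])
        route.append(current)
    return route


# Toy lookup tables standing in for virtual synthesis and scoring.
toy_products: Dict[Tuple[str, str], List[str]] = {
    ("core", "Glu872"): ["core-A", "core-B"],
    ("core-A", "Phe983"): ["core-A-C"],
}
toy_scores = {"core-A": -0.30, "core-B": -0.05, "core-A-C": -0.20}

route = grow_molecule(
    "core",
    ["Glu872", "Phe983"],
    propose=lambda mol, direction: toy_products.get((mol, direction), []),
    score=lambda mol, direction: toy_scores[mol],
)
```

Passing `propose` and `score` as callables keeps the loop independent of any particular reaction database or docking engine, which are outside the scope of this sketch.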

### Example Systems

Tie-2 kinase (PDB: 2p4i), an endothelium-specific receptor tyrosine kinase (Hodous et al., 2007), and the human 5-LOX enzyme (Charlier et al., 2006), a key enzyme in leukotriene biosynthesis, were selected as model systems to examine the LeadOp+R approach. One Tie-2 kinase inhibitor, compound 46 in reference 16 (denoted as compound rA in this study), and a human 5-LOX inhibitor, compound 7 (substituted coumarins) in reference 17 (denoted as compound rB in this study), were selected as the LeadOp+R optimization examples.

### Construction of the LeadOp+R Reaction Database

LeadOp+R collects chemical reactions, building blocks, and reaction rules with the reactant moieties and product moieties of each reaction to construct the LeadOp+R reaction database. LeadOp+R includes 198 classic chemical reactions from the Reaxys database and 2,091 organic building blocks from the commercially available Sigma-Aldrich Co.<sup>1</sup> product library. These building blocks include the typical building blocks in a chemical synthesis such as various nitrogen compounds (amines, isocyanides) and carbonyl compounds (amides, aldehydes, and ketones). A reaction rule in LeadOp+R includes the reactant moieties and product moieties extracted from the full structures of the reactants and products of each reaction collected. In LeadOp+R, the reaction moieties were defined and extracted from a chemical reaction according to the following steps (see **Figure 2** for an illustration of the steps):


Finally, the building blocks with the same reactant moiety for each reaction rule are collected (through a JChem application-programming interface; JChem API) and classified by the reaction. Building blocks for each reaction rule are recorded and used for virtual synthesis in the LeadOp+R algorithm.
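One way to picture the resulting table is as a mapping from each reaction rule to the building blocks that carry one of its reactant moieties. The sketch below is an illustration built on assumptions: the class names, the moiety labels, and the example compounds are hypothetical, and moieties are pre-assigned text labels rather than the result of the substructure search that LeadOp+R performs through the JChem API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class ReactionRule:
    """Hypothetical rule: the moieties consumed and the moiety formed."""
    name: str
    reactant_moieties: FrozenSet[str]
    product_moiety: str


@dataclass(frozen=True)
class BuildingBlock:
    """Hypothetical building block annotated with the moieties it contains."""
    name: str
    moieties: FrozenSet[str]


def classify_by_rule(
    rules: Iterable[ReactionRule], blocks: Iterable[BuildingBlock]
) -> Dict[str, List[str]]:
    """Group building blocks under every rule whose reactant moiety they carry."""
    blocks = list(blocks)
    table: Dict[str, List[str]] = defaultdict(list)
    for rule in rules:
        for block in blocks:
            if rule.reactant_moieties & block.moieties:
                table[rule.name].append(block.name)
    return dict(table)


# Illustrative entries only; labels stand in for substructure matches.
rules = [
    ReactionRule(
        "amide_coupling",
        frozenset({"carboxylic_acid", "primary_amine"}),
        "amide",
    )
]
blocks = [
    BuildingBlock("benzoic_acid", frozenset({"carboxylic_acid"})),
    BuildingBlock("aniline", frozenset({"primary_amine"})),
    BuildingBlock("toluene", frozenset()),
]
rule_table = classify_by_rule(rules, blocks)
```

Looking up a reactant's applicable reactions then reduces to checking which rules share a moiety with it, which is the role the substructure search plays in the method described here.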

#### Identify Reactant

LeadOp+R initiates the analysis of a complexed structure (inhibitor-receptor) taken from a docking study or crystal structure. Initially, the user identifies and preserves the "fragment space" of a query molecule defined by a fragment's volume. LeadOp+R then searches for building blocks with the same volume as the potential initial reactants. Products of each potential initial reactant are virtually synthesized according to the steps below. For each product molecule that passes the evaluation step, that product molecule becomes the next reactant in the next synthesis step.

<sup>1</sup> "Sigma-Aldrich Chemie GmbH." (Steinheim, Germany).

FIGURE 2 | Example of three steps used to construct the table of reaction rules. (A) Identification of reaction cores. The atoms with changed atom attributes are highlighted in red and blue within the two reactants. (B) Extraction of the moieties. (C) Identification of building blocks containing the reactant moieties. (D) Illustration of the steps in generating products.

### Determine Reaction Rules for Each Reactant Identified

When a reactant is identified in the previous step, there may be many potential reactant moieties and reactions associated with it. Each reactant is subjected to substructure searching<sup>2</sup> to identify atom arrangements (moieties) that are part of a chemical reaction rule within the LeadOp+R reaction database. This is done to determine the potential chemical reactions for this specific reactant.

### Generation of Reaction Products Based on Reaction Rules

Once all the potential reaction rules of a reactant are identified, the corresponding products are generated by "reacting" the reactant moieties and participant reactants (**Figure 2D**). In LeadOp+R, each reactant has two parts: one structure matches the reactant moiety and the other structure—excluding the reactant moiety—is denoted as the "clipped reactant." The same definition is used for other building blocks (participants) involved in a reaction. Each product is generated by combining the clipped portion of the reactant and the clipped portion of the participants as well as the product moiety based on the search of the reaction rule.

### Evaluation of the Products for Each Reaction

Thirty conformers of each product are generated using the Java and JChem application-programming interface (Imre et al., 2006). Each conformer is aligned with the preserved space of the query molecule, while maximizing the overlap volumes, using the flexible 3D alignment tool of Marvin<sup>3</sup> (see **Figure 3**). A conformer for each product was selected for the next step if the following criteria are met: (1) the binding mode of each conformer, aligned with the query molecule within the receptor site, has the same inhibitor-receptor interaction direction, and (2) the new moiety has a group efficiency value <−0.1.

### Final Selection by Structure-Based Analysis

The selected conformer for each product serves as the reactant for the next reaction in the selected inhibitor-receptor interaction direction. The molecule continues to grow until all the inhibitor-receptor interaction directions are exhausted. The collection of potential new compounds is reduced using the following criteria: molecular weight <600 g mol<sup>−1</sup> and a calculated lipophilicity (cLogP) <5, chosen with reference to Lipinski's Rule-of-Five (Lipinski et al., 2001). The compounds that pass the molecular property filters comprise the final list of proposed compounds. These compounds are then energy-minimized within the binding site and ranked based on the overall ligand-receptor binding energy.
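The property filter amounts to two threshold checks. A minimal Python sketch follows, assuming that molecular weight and cLogP have already been computed by some external tool; the `Candidate` record, the function name, and the example values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record holding precomputed properties of a compound."""
    name: str
    mol_weight: float  # g/mol
    clogp: float       # calculated lipophilicity


def passes_filters(
    cand: Candidate, max_weight: float = 600.0, max_clogp: float = 5.0
) -> bool:
    """Apply the two cutoffs stated in the text (both strict inequalities)."""
    return cand.mol_weight < max_weight and cand.clogp < max_clogp


def filter_candidates(cands: Iterable[Candidate]) -> List[Candidate]:
    """Keep only the candidates that satisfy both property cutoffs."""
    return [c for c in cands if passes_filters(c)]


# Illustrative values only.
pool = [
    Candidate("cand_1", mol_weight=450.2, clogp=3.8),  # passes both cutoffs
    Candidate("cand_2", mol_weight=612.7, clogp=4.1),  # too heavy
    Candidate("cand_3", mol_weight=520.0, clogp=5.6),  # too lipophilic
]
kept = filter_candidates(pool)
```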

inhibitor-receptor interaction direction (location) is indicated by the dotted red line.

### Molecular Dynamics Simulations

The bound pose of the newly "constructed" compound, as determined with AutoDock Vina (Trott and Olson, 2010), is selected on the basis of the lowest binding free energy and the largest number of favorable ligand-receptor interactions within the binding site. The unfavorable contacts between the docked pose of the energy-minimized "constructed" compound (fragments connected to the initial core of the compound) and the residues within the binding site are alleviated using molecular dynamics simulations, which allow the ligand-receptor complex to explore the local energy landscape. The best complex pose (ligand-receptor interaction) was selected and molecular dynamics was performed using GROMACS version 4.03 (Hess et al., 2008) and the GROMOS 53A6 force field (Oostenbrink et al., 2005). The complexes are placed in a simple cubic periodic box of SPC216 type water molecules (Berendsen et al., 1981), and the distance between the protein and each edge of the box was set to 0.9 nm. To maintain overall electrostatic neutrality and isotonic conditions, Na<sup>+</sup> and Cl<sup>−</sup> ions were randomly positioned within the solvation box. To maintain the proper structure and remove unfavorable van der Waals contacts, a 1,000-step steepest descent energy minimization was employed and terminated when the convergence criterion, a difference between subsequent steps of <1,000 kJ mol<sup>−1</sup> nm<sup>−1</sup>, was met. Following the energy minimization, the system is subjected to a 1,200 ps molecular dynamics simulation at constant temperature (300 K) and pressure (1 atm) with a time step of 0.002 ps (2 fs), and the coordinates of the system are recorded every ps.
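The simulation length, time step, and output interval stated above fix the number of integration steps and stored configurations. The short Python calculation below works in integer femtoseconds to avoid floating-point rounding; the variable and function names are illustrative and are not GROMACS parameters.

```python
# All times in femtoseconds (1 ps = 1,000 fs).
SIMULATION_FS = 1_200_000   # 1,200 ps production run
TIME_STEP_FS = 2            # 0.002 ps integration step
OUTPUT_INTERVAL_FS = 1_000  # coordinates recorded every ps

# Total number of integration steps in the run.
n_steps = SIMULATION_FS // TIME_STEP_FS

# Integration steps between two recorded configurations.
steps_per_frame = OUTPUT_INTERVAL_FS // TIME_STEP_FS

# Recorded configurations, not counting the starting structure.
n_frames = n_steps // steps_per_frame


def frames_in_window(window_ps: int) -> int:
    """Number of recorded configurations inside a window of the trajectory."""
    return (window_ps * 1_000) // OUTPUT_INTERVAL_FS
```

Under these settings the run comprises 600,000 steps and 1,200 recorded configurations, so a 50 ps window of the trajectory contains 50 configurations.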

<sup>2</sup> "JChem." 5.4.1.1 ed. (Budapest, Hungary: ChemAxon Ltd).

<sup>3</sup> "Marvin." 5.4.0.1 ed., (Budapest, Hungary: ChemAxon Ltd).

### RESULTS AND DISCUSSION

#### LeadOp+R Optimization for Tie-2 Kinase Inhibitors

#### Structure-Based Lead Optimization With Synthetic Routes

From the literature (Bridges, 2001), it is known that a good kinase inhibitor should possess a hydrogen-bond donor/acceptor/donor motif to best interact with the backbone carbonyl/NH(amide)/carbonyl presented in the ATP-binding cleft. In the case of Tie-2 kinase, the residues in the active site of the ATP-binding cleft are Ala905 (carbonyl and amide NH) and Glu903 (carbonyl). Additionally, two hydrophobic pockets are part of the active site in the Tie-2 receptor and are designated as the first hydrophobic pocket (HP) and the extended hydrophobic pocket (EHP). We selected a series of Tie-2 inhibitors from the literature (Hodous et al., 2007), which includes a co-crystal structure of inhibitor compound 47 with the Tie-2 receptor (PDB code: 2p4i). In this co-crystal structure, the 2-(methylamino)pyrimidine ring of inhibitor compound 47 interacts with residue Ala905 via two hydrogen bonds and the pyrimidine is also within van der Waals contact of Glu903. The central methyl-substituted aryl ring of compound 47 resides in the first hydrophobic pocket (HP), while the pyridine ring forms an edge-to-face π-stacking interaction with Phe983 of the DFG-motif. The carbonyl oxygen makes a hydrogen bond with the backbone NH of Asp982 (DFG motif) and the aryl amide moiety directs the terminal CF<sub>3</sub>-substituted aromatic ring into the EHP. **Figure 4A** illustrates the ligand-protein interaction of this co-crystal structure.

To demonstrate how LeadOp+R optimizes a compound automatically while considering the potential synthetic route, compound 46, with a biologically determined IC<sub>50</sub> value of 399 nM (Hodous et al., 2007), was used as the query molecule for lead optimization (denoted as compound rA in this study). Compound rA was docked into the Tie-2 binding site and the lowest energy conformation was selected. The selected conformation possessed similar molecular interactions, as discussed earlier, with the Tie-2 active site (**Figure 4A**). The amide functional group of compound rA forms a hydrogen bond with the backbone amide of Asp982, while the pyridine and benzene rings extend into the hydrophobic pocket (HP) and EHP, respectively. The aminobenzoic fragment was designated as the preserved space in this example of LeadOp+R due to the important hydrogen bonding.

To evaluate our algorithm, we compared all of the LeadOp+R generated compounds to Tie-2 kinase inhibitors from the literature and found that nine of the LeadOp+R compounds have also been synthesized and their ability to inhibit Tie-2 kinase measured. The inclusive synthesis of proposed products in each LeadOp+R step, combined with systematic examination of the proposed ligand-receptor interactions, resulted in nine compounds with more potent IC<sub>50</sub> values than the original compound (compound rA). All the LeadOp+R generated compounds were energy-minimized in the active site of Tie-2 and then ranked on the basis of the overall ligand–receptor interaction energy. For the nine compounds previously studied in the literature (Hodous et al., 2007), the priority suggested by the calculated binding energy had the same trend as the experimentally determined IC<sub>50</sub> values. In this study of Tie-2 kinase inhibitor design, three of the nine compounds, denoted as compounds rA1, rA2, and rA3, were selected for further investigation. For these three compounds we found detailed synthetic route information (Hodous et al., 2007) and inhibition potency in the literature. Compounds rA1–rA3 have a higher potency than the query compound rA, and their priority as suggested by the calculated binding energy follows a similar trend to their IC<sub>50</sub> potency. Depictions of compounds rA1–rA3, as well as the corresponding inhibition data from the biological experiments and their predicted binding energies, are provided in **Table 1**.

Molecular dynamics simulations were performed with three LeadOp+R generated compounds, rA1–rA3, to further analyze the ligand-protein interactions within the Tie-2 kinase active site. Following geometry optimization of the compounds with respect to Tie-2, molecular dynamics simulation studies were performed and the unique low-energy conformations of the complexes, from the final 50 ps of the MDS (50 configurations), are shown in **Figures 4B–D**.

In the generated compounds (rA1, rA2, and rA3), both amide arrangements are engaged in strong hydrogen bonds with Asp982 of the DFG-motif (first three residues of the activation loop). The pyrimidine ring in compounds rA1 and rA2 makes key hydrogen bonds with the backbone amide of the linker residue Ala905, situating the pyridine rings in alignment and within edge-to-face π-stacking distance of Phe983 of the DFG-motif. Additionally, the central and terminal aryl rings overlap with only slight differences in orientation for compounds rA1, rA2, and rA3. An additional hydrogen bond forms between the methoxy group of compound rA1 and residue Asp982, while the CF<sub>3</sub> groups are placed in essentially the same location within the EHP for compounds rA2 and rA3. These optimized results indicate that hydrogen-bonding and hydrophobic interactions are important for ligands binding to and inhibiting Tie-2, as previously reported (Hodous et al., 2007).

#### Synthetic Routes Suggested by LeadOp+R

For Tie-2 kinase inhibitors, favorable interactions occur between the ligand and the specific receptor residues Glu872, Asp982, Phe983, Ala905, and Glu903 (see **Figure 4A**). In this example, these interactions are selected as the preferred inhibitor-receptor interactions for LeadOp+R to optimize, starting from the provided query molecule, in a selective and systematic process. Experimental synthetic routes from the literature (Hodous et al., 2007; **Figures 5A**, **6A**, **7A**) and the reaction routes suggested by LeadOp+R (**Figures 5B**, **6B**, **7B**) to generate compounds rA1, rA2, and rA3 are summarized below to demonstrate how LeadOp+R suggests synthetic reaction routes that are similar to those proposed by organic and medicinal chemists. Matched reaction rules are listed to the right of **Figures 4C**, **5C**, **6C**, with details of each synthetic step identified by LeadOp+R, for each product, described below.

**Figure 5A** illustrates the experimental reactions required to synthesize compound rA1 (compound **7**) by reacting **5** (which was generated through transforming **2** into **4**) followed by reacting **1** with **6**. To compare LeadOp+R's suggested virtual synthesis of compound rA1 to proven synthetic routes, we compared the key reaction rules from experimental synthetic steps in the literature.

proposed compound at the binding site are depicted with cyan molecular surface.

**Figure 5B** shows the LeadOp+R suggested synthetic routes to generate compound rA1 using the selected and preferred inhibitor-receptor interactions that allowed LeadOp+R to selectively and systematically optimize the query molecule. Initially, compound **1** was identified as the first reactant by searching all building blocks with the preserved fragment. LeadOp+R then proceeded to produce product **8** by coupling **1** with **6** using reaction rule (i), which conserves the specified preferred interaction with Glu872. The reaction rule suggested by LeadOp+R matched the synthetic step in the literature that forms compound **7** by combining compound **5** and fragment **6**. Next, product **8** was considered as the reactant to interact with compound **2** to generate product **9** by growing molecules with preferred interaction toward Phe983. The second reaction rule (ii) suggested by LeadOp+R led to product **9** and matched the synthetic step in the literature used to synthesize compound **5** by reacting **1** with **4**. It is interesting to note that at this step, the structure marked in red is the current structure **9**, the same partial structure highlighted in red within the final product **7** (compound rA1) in the experimental synthesis. LeadOp+R continued the recursive optimization toward the cavity near Phe983 and Ala905 to transform **9** to **7** (compound rA1) with the third reaction rule (**Figure 5C**). This reaction route suggested by LeadOp+R also matches the experimental synthetic route in the literature to transform **2** into **4**. To this end, LeadOp+R has successfully optimized the query compound rA to compound rA1 and suggested corresponding synthetic routes. In this example, we demonstrated how LeadOp+R controls the synthetic flow by extending the molecules with preferred interactions, available building blocks, and associated reaction rules to achieve fragment-based optimization and synthetic accessibility. Thus, the sequence of reactions to "grow" molecules may not be the same as that verified in experimental synthesis.

TABLE 1 | Rank of the proposed LeadOp+R compounds based on the calculated binding energy, inhibition concentration (IC<sub>50</sub>) of Tie-2 from the literature (Hodous et al., 2007).

All proposed compounds have a lower IC<sub>50</sub> value than the query compound and the suggested priority of the three new compounds (out of 631) has a similar trend as the IC<sub>50</sub> potency values.

**Figure 6A** shows the experimental reaction to synthesize compound rA2 (compound **19**) by reacting **18** (which was generated through the transformation of **13** into **18**) with **12** (which was generated through the reaction of **10** with **11**). To compare the LeadOp+R suggested virtual synthesis route for compound rA2 with the experimental synthetic route, we compared the key reaction rules from the experimental synthetic steps in the literature with the LeadOp+R suggested synthetic routes.

**Figure 6B** shows the LeadOp+R suggested synthetic routes for compound rA2, using the selected and preferred inhibitor-receptor interactions to optimize the query molecule in a selective and systematic manner. Initially, the hydroxybenzoic acid **10** was identified as the first reactant by searching all building blocks with the preserved fragment. LeadOp+R then proceeded to suggest product **12** by reacting **10** with **11** via the first reaction rule (i) that preserves the ligand's interaction with Glu972 of the active site. The reaction rule suggested by LeadOp+R matched the synthetic steps in the literature that form compound **12** from compounds **10** and **11**. Next, product **12** was considered as the reactant to react with compound **13** to generate product **20**, by growing molecules with preferred interaction toward Phe983. The second reaction rule (ii) generates product **20**, and the reaction route suggested by LeadOp+R matches the synthetic steps in the literature to synthesize compound **19** through the reaction of **12** with **18**. LeadOp+R's recursive optimization continues toward the cavity near Phe983 and Ala905 to transform **20** to **19** (compound rA2) via the third reaction rule (iii), **Figure 6C**. This reaction route suggested by LeadOp+R also matched the experimental synthetic step in the literature to transform compound **13** into **18**.

**Figure 7A** shows the experimental reaction to synthesize compound rA3 (compound **22**) by reacting **21** (which was generated through the reaction of **1** with **11**) with **18** (which was synthesized from **13**). To compare LeadOp+R's suggested synthesis route for compound rA3 with the experimental synthetic routes, we compared the key reaction rules from the experimental synthetic steps in the literature with the LeadOp+R suggested synthetic routes.

**Figure 7B** depicts the LeadOp+R suggested synthetic routes to generate compound rA3, using the selected and preferred inhibitor-receptor interactions to optimize the query molecule. Initially, compound **1**, a hydroxybenzoic acid, was identified as the first reactant by searching all building blocks with the preserved fragment indicated in red, **Figure 7B**. LeadOp+R then proceeded to produce compound **21** by reacting **1** with **11** via the first reaction rule (i), directing the growth of the compound (inhibitor) toward the preferred ligand interaction with Glu972. The reaction rule suggested by LeadOp+R matched the synthetic steps in the literature that form compound **21** via the transformation of compound **1** with fragment **11**. Next, product **21** was reacted with compound **13** to generate product **23**, growing the transformed molecule toward Phe983. The second reaction rule (ii) generated product **22** as suggested by LeadOp+R and matches the synthetic steps in the literature to synthesize compound **22** through the reaction of compound **21** with fragment **18**. The recursive optimization of the initial query compound toward the cavity near Phe983 and Ala905 by LeadOp+R transformed compound **23** to **22** (compound rA3) with the third reaction rule (iii), as illustrated in **Figure 7C**. This reaction rule, suggested by LeadOp+R, also matches the experimental synthetic step in the literature to transform **13** into **18**.

LeadOp+R has successfully optimized the query compound rA to compounds rA1, rA2, and rA3 with synthetic routes that match the experimental synthetic routes for each compound. Through the systematic synthesis and constant evaluation of intermediate products via group efficiency, LeadOp+R searched each product and discovered stronger-binding inhibitors. Increased hydrophobic interactions between compound rA1 and the receptor were observed between the compound's aromatic group that resides in the EHP pocket (**Figure 4B**) and the methylpyrimidine. This corresponds to the experimental results, in which rA1 exhibits stronger inhibitory potency than compounds rA2 and rA3.

In the example of Tie-2 inhibitor design, LeadOp+R demonstrates its ability to control the synthetic flow by extending the query molecules to optimize the preferred ligand-receptor interactions while using the available building blocks and associated reaction rules to find the most synthetically accessible route.
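The growth procedure described for the Tie-2 examples can be pictured as a greedy loop: start from the building block that carries the preserved fragment, apply a reaction rule with an available reactant, keep the product that best serves the next preferred interaction, and repeat. The sketch below is a toy illustration only. The `Rule` type, the `grow` function, the `toy_score` table, and the string "products" are hypothetical stand-ins introduced here; the actual LeadOp+R implementation scores docked three-dimensional structures and is not reproduced in this section.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical reaction rule: a label plus the reactants it can couple."""
    name: str
    reactants: Tuple[str, ...]


def grow(seed: str,
         rules: List[Rule],
         targets: List[str],
         score: Callable[[str, str], float]) -> List[Tuple[str, str, str]]:
    """Greedy growth: one step per preferred residue; lower score is better.

    Returns the route as (rule name, reactant, product) tuples.
    """
    route = []
    current = seed
    for residue in targets:
        candidates = [
            (score(f"{current}+{reactant}", residue), rule.name, reactant)
            for rule in rules
            for reactant in rule.reactants
        ]
        if not candidates:
            break  # no rule or reactant available: stop growing
        _, rule_name, reactant = min(candidates)
        current = f"{current}+{reactant}"
        route.append((rule_name, reactant, current))
    return route


def toy_score(product: str, residue: str) -> float:
    """Made-up score: -1.0 if the last added reactant suits the residue."""
    preferred = {"Glu872": "6", "Phe983": "2"}  # illustrative pairing only
    wanted = preferred.get(residue)
    return -1.0 if wanted and product.endswith("+" + wanted) else 0.0


# Reactants "x" and "y" are dummy decoys; "1", "6", "2" echo the labels in the text.
rules = [Rule("rule_i", ("6", "x")), Rule("rule_ii", ("2", "y"))]
route = grow("1", rules, ["Glu872", "Phe983"], toy_score)
print(route)
```

Because each step is chosen by the preferred interaction rather than by retrosynthetic order, the resulting sequence of steps need not coincide with the order used at the bench, which is the point made in the text.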

### LeadOp+R for Human 5-Lipoxygenase Inhibitor

#### Structure-Based Lead Optimization With Synthetic Routes

The human 5-lipoxygenase (5-LOX) enzyme, together with its well-known inhibitors, was selected as the second LeadOp+R test case. To design better 5-LOX inhibitors, structural insight into the 5-LOX active site and its interactions with ligands is helpful; therefore, we selected a theoretical model (comparative/homology protein model) of 5-LOX (Charlier et al., 2006) that agrees well with mutagenesis studies (Hammarberg et al., 1995; Schwarz et al., 2001). The proposed active site of 5-LOX forms a deep, bent cleft (channel) that extends from Phe177 and Tyr181 at the top of the cleft to Trp599 and Leu420 at the bottom (shown in **Figure 8A**). Most of the residues lining the cleft are hydrophobic, with several key polar residues (Gln363, Asn425, Gln557, Ser608, and Arg411) distributed along the channel that can interact with the ligand during the binding process. A small side pocket off the main channel is composed of hydrophobic residues (Phe421, Gln363, and Leu368), and it is postulated that lipophilic interactions between the ligand and receptor may enhance activity. The purported major pharmacophore interactions needed for a ligand to bind to 5-LOX include: (i) two hydrophobic groups, (ii) a hydrogen bond acceptor, (iii) an aromatic ring, and (iv) two secondary interactions. The two secondary interactions are between the ligand and an acidic moiety (amino acid residue) and a hydrogen bond acceptor within the binding pocket of the receptor. The hydrogen bond acceptor of the ligand most likely interacts with the key anchoring points of the receptor (Tyr181, Asn425, and Arg411) to form hydrogen bonds, while Leu414 and Phe421 form a hydrophobic interaction between the ligand and the binding cavity (Charlier et al., 2006).

The 5-LOX inhibitor, compound 7 in the literature (Ducharme et al., 2010), was selected as our initial query molecule (denoted as compound rB in this study); it has an experimentally determined IC<sub>50</sub> value of 145 nM. Compound rB was docked into the computationally derived 5-LOX binding site, and the lowest-energy conformation was submitted to LeadOp+R. This selected pose (conformation) shows ligand-receptor interactions similar to those previously reported (Charlier et al., 2006). The oxochromen ring interacts favorably with the hydrophobic residue Leu414 (CH-π interaction) in the middle of the cavity, while the fluorophenyl group extends into the hydrogen-bond acceptor region in the lower cleft of the active site. The docked conformation of compound rB was selected as the reference inhibitor, with the oxochromen ring serving as the template structure.
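Potencies such as the 145 nM IC<sub>50</sub> of compound rB are often compared on a logarithmic scale, where pIC<sub>50</sub> = −log<sub>10</sub>(IC<sub>50</sub> in mol/L) and a larger value means a more potent inhibitor. The helper below is a minimal sketch of that conversion; the function name `pic50_from_nm` is introduced here for illustration and is not part of LeadOp+R.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


# Query compound rB: IC50 = 145 nM (value taken from the text)
p_rb = pic50_from_nm(145.0)
print(f"pIC50(rB) = {p_rb:.2f}")
```

On this scale, any proposed compound with a lower IC<sub>50</sub> than rB has a pIC<sub>50</sub> above roughly 6.84.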

To evaluate our algorithm, we compared all of the LeadOp+R generated compounds for 5-LOX to the analogs described in the literature and found that six of the LeadOp+R proposed compounds have been synthesized and their biological activities measured (Schwarz et al., 2001). The inclusive synthesis of products at each step, combined with systematic examination of the interactions of the proposed compounds with the receptor, generated six compounds with more potent IC<sub>50</sub> values than the original compound (compound rB). All the LeadOp+R generated compounds were energy minimized within the active site of 5-LOX and then ranked based on the predicted binding energy of the complex; the suggested priority follows the same trend as the IC<sub>50</sub> potency values from the experimental study (Schwarz et al., 2001). In this study of 5-LOX inhibitor design, three compounds (denoted as compounds rB1, rB2, and rB3) of the nine LeadOp+R generated compounds were selected for further investigation. For these three compounds, detailed synthetic information and inhibition potency are available from the literature (Ducharme et al., 2010). Additionally, compounds rB1, rB2, and rB3 have a higher potency than the query compound rB, and their suggested priority, based on predicted binding energy, follows a similar trend to their IC<sub>50</sub> values. The structures of compounds rB1, rB2, and rB3, the corresponding inhibition data from the biological experiments, and their predicted binding energies are listed in **Table 2**.
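One common way to quantify whether a ranking by predicted binding energy "follows the same trend" as measured IC<sub>50</sub> values is a Spearman rank correlation. The sketch below is an assumption about how such a check could be done, not the analysis reported by the authors. The numbers are placeholders chosen for illustration and are not the values from Table 2.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Placeholder data (NOT from Table 2): more negative energy paired with lower IC50
binding_energy = [-52.1, -48.3, -45.0, -40.2]  # arbitrary units
ic50_nm = [20.0, 35.0, 60.0, 120.0]            # hypothetical values
rho = spearman(binding_energy, ic50_nm)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near +1 would indicate that the compounds predicted to bind most strongly (most negative energy) are also the most potent (lowest IC<sub>50</sub>). With only three or four compounds, such a coefficient is descriptive rather than statistically meaningful.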

Molecular dynamics simulation studies were performed with the final poses of compounds rB1, rB2, and rB3 with respect to 5-LOX. The unique low-energy conformations of the complexes, from the last 50 ps of the MDS (50 configurations), are shown in **Figures 8B–D**.
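Fifty configurations from the last 50 ps imply one saved frame per picosecond. The sketch below shows how such a tail window could be selected from a list of saved frames. It is a minimal illustration on synthetic data: the 200 ps trajectory length and the energy values are assumptions introduced here, and `last_window` is a hypothetical helper, not a function of any MD package.

```python
from typing import List, Tuple

Frame = Tuple[float, float]  # (time in ps, potential energy in arbitrary units)


def last_window(frames: List[Frame], window_ps: float) -> List[Frame]:
    """Keep the frames whose time stamp lies within the final window."""
    if not frames:
        return []
    t_end = max(t for t, _ in frames)
    return [(t, e) for t, e in frames if t > t_end - window_ps]


# Synthetic trajectory: one frame per ps for an assumed 200 ps, made-up energies
frames = [(float(t), -100.0 + ((t * 37) % 11)) for t in range(1, 201)]

tail = last_window(frames, 50.0)          # frames at 151 ps ... 200 ps
lowest = min(tail, key=lambda f: f[1])    # lowest-energy frame in the window
print(len(tail), lowest)
```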

Compounds rB1, rB2, and rB3 all reside within the hydrophobic pocket and participate in hydrogen-bonding interactions between the oxygen or nitrogen atoms of the thiazole group and Lys409 and Tyr181. For compounds rB1 and rB3, the fluoro group extends to the hydrogen-bond acceptor in the upper domain of the active site and interacts with Lys409. In addition, the oxochromen ring is in close proximity to Leu414 and potentially forms an important CH-π contact, as indicated in the literature (Charlier et al., 2006). The thiazole structure of compound rB1 also interacts with the 5-LOX hydrophobic residues Leu420 and Leu607, and it has been suggested that these interactions improve ligand binding via complementary hydrophobic interaction between the ligand and receptor. Additionally, favorable interactions occur between the fluoro group and residues Lys409, Arg411, and Tyr181. These contributions to ligand-protein binding probably account for compound rB1's better inhibition compared to compounds rB, rB2, and rB3. These optimized results indicate that hydrogen bonding and hydrophobic interactions are important for ligands binding to, and inhibiting, 5-LOX, as previously reported (Hodous et al., 2007).

#### Synthetic Routes Suggested by LeadOp+R

The favorable interactions between inhibitors and 5-LOX, as stated in the literature, are two hydrogen-bond acceptor interactions within the binding pocket (including ligand interactions with Asn425 and Tyr181), two hydrophobic interaction pockets (including ligand interactions with Leu368, Gln363, Phe421, Arg411, Ile406, Lys409, and Phe177), and aromatic interactions (between the ligand and residues Leu414 and Leu607). In this example, ligand interactions with Asn425, Leu414, Leu607, and Tyr181 are indicated as "preferred" inhibitor-receptor interactions for LeadOp+R to selectively and systematically optimize. Experimental synthetic routes from the literature (Schwarz et al., 2001) (**Figures 9A**, **10A**, **11A**) and the synthetic reaction routes suggested by LeadOp+R (**Figures 9B**, **10B**, **11B**) to generate compounds rB1, rB2, and rB3 are summarized below. To demonstrate LeadOp+R's ability to suggest reaction routes similar or identical to those proposed and executed by synthetic chemists, the matched reaction rules are listed to the right of **Figures 9C**, **10C**, **11C**. Details of each synthetic step, identified by LeadOp+R for each product (proposed compound/inhibitor), are described below.

**Figure 9A** shows the experimental reaction route (Schwarz et al., 2001) to synthesize compound rB1 (compound **30**) by reacting compound **26** (which was generated through the reaction of **24** with **25**) with **29** (which was generated through the reaction of **27** with **28**). To compare the LeadOp+R suggested synthesis with the experimental synthetic route for compound rB1, we compared the key reaction rules for the experimental synthetic steps in the literature with those suggested by LeadOp+R.

**Figure 9B** shows the LeadOp+R suggested synthetic routes to generate compound rB1 using the selected preferred inhibitor-receptor interactions. Compound **24** was identified as the initial reactant by searching all the available building blocks and preserving the molecular fragment. LeadOp+R suggested product **26** by reacting **24** with **25** via the first reaction rule (i), "growing" the compound toward the preferred interaction with Asn425. The reaction rule suggested by LeadOp+R matches the synthetic steps in the literature that yield compound **26** from **24** and **25**. Next, product **26** was considered as the reactant to interact with compound **28** to generate product **31** by extending the ligand toward preferred interactions with Leu414. The second reaction rule (ii) to generate compound **31**, as suggested by LeadOp+R, matches the synthetic routes presented in the literature to synthesize the thioether bond in compound **30** through the reaction of **26** with **29**. It should be noted that in this step, the structure marked in red is compound **31**, and it is the same as the partial structure denoted in red for the final product **30** (compound rB1) in the experimental synthesis. LeadOp+R continues the recursive optimization toward the cavity near Ile406 and generates compound **30** (compound rB1) by reacting **31** with **27** via the third reaction rule (iii), **Figure 9C**. The LeadOp+R suggested reaction route also matches the experimental synthetic step in the literature to synthesize compound **29** through the reaction of **27** with **28**. To this end, LeadOp+R has successfully optimized the query compound rB to compound rB1 and suggested feasible synthetic routes. In this example, we demonstrated LeadOp+R's control of the synthetic flow by extending the molecules to exploit preferred interactions, available building blocks, and associated reaction rules to achieve fragment-based optimization and synthetic accessibility. For these reasons, the sequence of steps to "grow" molecules may not be the same as the published experimental synthesis.

TABLE 2 | Rank of the proposed LeadOp+R compounds based on the calculated binding energy and the inhibition concentration (IC<sub>50</sub>) for 5-LOX from the literature (Ducharme et al., 2010).

All proposed compounds have a lower IC<sub>50</sub> value than the query compound, and the suggested priority of the three new compounds (out of 419) has a similar trend to the IC<sub>50</sub> potency values.

**Figure 10A** depicts the experimental reaction scheme (Schwarz et al., 2001) to synthesize compound rB2 (compound **38**) by reacting **26** (which was generated through the reaction of **24** with **25**) with **37** (which was synthesized through a series of reactions starting with compound **32** to form **37**). To compare LeadOp+R's suggested synthesis of compound rB2 to the experimental synthetic routes, we explored the key reaction rules of the experimental synthetic steps in the literature for the proposed compound.

**Figure 10B** shows the synthetic routes that LeadOp+R selectively and systematically suggested to generate compound rB2 based on the user-specified preferred inhibitor-receptor interactions. Initially, compound **24** was identified as the first reactant by searching all building blocks with the preserved fragment. LeadOp+R then proceeded to produce compound **26** by reacting **24** with **25** via the first reaction rule (i) that directs the suggested compound toward the preferred interaction with Leu414. The reaction rule suggested by LeadOp+R matches the synthetic steps in the literature for the synthesis of compound **26** from compounds **24** and **25**. Next, product **26** was considered as the reactant to react with compound **32** to generate product **39**, again by growing the molecule toward the preferred interaction with Leu414. The second reaction rule (ii) to generate product **39** suggests the same synthetic steps as the literature to synthesize compound **38** by reacting **26** and **27**. The recursive optimization continues to explore the potential ligand interactions with Leu414 and Ile406 to generate compound **38** (compound rB2) by reacting **39** with **35** via the third reaction rule (iii), which corresponds to the literature synthesis of compound **36** by the reaction of **34** and **35**, resulting in the final product, compound rB2.

**Figure 11A** shows the experimental synthesis route (Schwarz et al., 2001) to synthesize compound rB3 (compound **43**) by reacting **40** with **42** (which was generated through the reaction of **35** with **41**). To compare the LeadOp+R suggested route to the experimental route for rB3, we look at the key reaction rules in the literature.

**Figure 11B** shows the LeadOp+R suggested synthetic routes for compound rB3 using the selected preferred inhibitor-receptor interactions. Initially, compound **24** was identified as the first reactant by searching all building blocks with the preserved fragment that is indicated in **Figure 11B** as the red structure. LeadOp+R proceeded to generate compound **26** by reacting **24** with **25** via the first reaction rule (i). Again, this methodology directs the growth of the new ligand toward the preferred interaction with Leu414. The synthetic reactions suggested by LeadOp+R match the synthetic steps presented in the literature that form compound **26**. Next, product **26** was considered the reactant and transformed into product **40** by growing the ligand toward Ile406 of 5-LOX. The second reaction rule (ii) generates compound **40** and matches the synthetic steps discussed in the literature; compound **40** is identified as the same product that is discussed in the literature to synthesize compound **44**. Continuing the recursive optimization to initiate the ligand's interaction with Ile406 and Tyr181 results in the third reaction rule (iii), **Figure 11C**, and leads to compound **43**. Compound **44** was identified as the reactant and reacted with **35** based on the fourth reaction rule (iv), generating compound **42** by reacting **35** with **41**.

LeadOp+R has successfully optimized the query compound rB into compounds rB1, rB2, and rB3 and has suggested corresponding synthetic routes for each compound. Through systematic synthesis and evaluation of intermediates using group efficiency, LeadOp+R searches for "products" with higher calculated binding affinities and improved interactions with the receptor. The larger number of hydrogen-bonding interactions between the oxygen or nitrogen atoms of compound rB1's thiazole group and the receptor (shown in **Figure 8B**) corresponds to the experimental result that rB1 is a more potent inhibitor than the proposed compounds rB2 and rB3. In the example of 5-LOX inhibitor design, we demonstrate LeadOp+R's ability to control the synthetic flow by extending the ligands with preferred interactions, available building blocks, and associated reaction rules.
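The formula LeadOp+R uses for group efficiency is not given in this section. The sketch below assumes the commonly used definition, in which group efficiency is the gain in binding free energy per heavy atom added when a parent structure is extended: GE = −ΔΔG / ΔN<sub>heavy</sub>. The function name and the example numbers are hypothetical and serve only to show the arithmetic.

```python
def group_efficiency(dg_parent: float, dg_product: float,
                     heavy_parent: int, heavy_product: int) -> float:
    """Gain in binding free energy per added heavy atom (assumed definition).

    Energies are binding free energies in kcal/mol (more negative = tighter).
    """
    added = heavy_product - heavy_parent
    if added <= 0:
        raise ValueError("the product must contain more heavy atoms than the parent")
    return -(dg_product - dg_parent) / added


# Hypothetical intermediate: 5 heavy atoms added, binding improves by 1.5 kcal/mol
ge = group_efficiency(dg_parent=-8.0, dg_product=-9.5,
                      heavy_parent=20, heavy_product=25)
print(f"group efficiency = {ge:.2f} kcal/mol per heavy atom")
```

Under this definition, an added group that improves the predicted binding energy with few atoms scores higher than a large group that yields the same improvement, which is one way intermediates can be compared at each growth step.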

### LIMITATION

LeadOp+R is an optimization algorithm that starts with a query reactant (compound); lead optimization works best when the process starts with a good binder that is advantageously positioned in the binding site. The LeadOp+R algorithm does not consider experimental product yield, reaction rate, or reaction conditions of a chemical synthesis; it proposes potential synthetic routes purely based on the chemical reaction rules contained in the chemical reaction database. Consequently, a proposed reaction may be incompatible with specific substituents in the core; the proposed synthetic routes are meant to provide a fast, systematic, and preliminary suggestion based on general reaction (synthesis) rules and structure-based (receptor) ligand optimization. The diversity of the reactant database is a critical factor when searching for the participant reactants, along with the number of different poses sampled at each reaction site. The greater the number and diversity of reaction rules and reactants available for LeadOp+R to explore for the system of interest, the greater the possibility of identifying and generating new compounds.

### CONCLUSION

In this work, we have implemented an algorithm for structure-based lead optimization with synthetic accessibility, called "LeadOp+R." Two model systems, Tie-2 kinase and the human 5-LOX enzyme with their associated inhibitors, were selected to demonstrate the abilities of the LeadOp+R algorithm. We demonstrated how a query molecule was enhanced through structure-based optimization and how a potential synthetic route was proposed based on reaction rules extracted from an in-house synthetic database. In the case of Tie-2, a co-crystallized structure is available, while the human 5-LOX example was performed using a theoretical 5-LOX receptor model (comparative or homology protein model) and a known inhibitor. LeadOp+R generates a set of potential compounds that exhibit better calculated inhibition and better efficiency scores, and it provides synthesis routes based on published reaction mechanisms (contained in an in-house reaction and reactant database). The molecular dynamics simulation analysis further demonstrates that the generated structures preserve the important ligand-protein interactions seen in the crystal structures or reported in the literature. For the proposed compounds with biological inhibition values (IC<sub>50</sub>) obtained from the literature, the LeadOp+R calculated inhibition values correspond (based on rankings) to the literature values. The interactions between the inhibitor and protein, as noted in the literature, were observed throughout the molecular dynamics simulation. Moreover, we identified fragments that created and retained ligand-receptor interactions that were stronger and more consistent than those of the original query compound; these fragments were selected based on reaction rules and discovered in our reactant database.

In short, LeadOp+R is an algorithm that can automatically optimize a query molecule based on reaction routes by searching and selecting reactants that can undergo chemical synthesis, thus generating compounds with better binding affinity for the biological system (receptor) of interest. Additionally, users can indicate specific parts of the query compound to be optimized and assign the predicted binding space (portion of the binding site) for the generated products based on known ligand-receptor interactions or preference. LeadOp+R can not only optimize the lead compounds but also design favorable and practical synthetic routes based on known reaction mechanisms, leading to faster data feedback between experimental and computer-aided molecular design.

### AUTHOR CONTRIBUTIONS

YT: Initiated the concepts; F-YL and YT: Drafted, programmed, and performed the analysis on the projects; EE: Edited and gave comments on the work.

### ACKNOWLEDGMENTS

This work was funded by the Taiwan National Science Council, grant numbers 98-2323-B-002-011-, 106-2622-B-002-008-, 105-3011-F-002-010-, 106-2911-I-002-533, 106-2321-B-002-041-, and 106-3114-B-038-001-. Resources of the Laboratory of Computational Molecular Design and Metabolomics and the Department of Computer Science and Information Engineering of National Taiwan University were used in performing these studies.


**Conflict of Interest Statement:** EE was employed by exeResearch, LLC and The Chem21 Group, Inc.

The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Lin, Esposito and Tseng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Structural Changes Due to Antagonist Binding in Ligand Binding Pocket of Androgen Receptor Elucidated Through Molecular Dynamics Simulations

Sugunadevi Sakkiah<sup>1</sup> , Rebecca Kusko<sup>2</sup> , Bohu Pan<sup>1</sup> , Wenjing Guo<sup>1</sup> , Weigong Ge<sup>1</sup> , Weida Tong<sup>1</sup> and Huixiao Hong<sup>1</sup> \*

<sup>1</sup> Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States, <sup>2</sup> Immuneering Corporation, Cambridge, MA, United States

#### Edited by:

Leonardo G. Ferreira, Universidade de São Paulo, Brazil

#### Reviewed by:

Irene Nobeli, Birkbeck, University of London, United Kingdom Sebastien Fiorucci, University of Nice Sophia Antipolis, France

> \*Correspondence: Huixiao Hong huixiao.hong@fda.hhs.gov

#### Specialty section:

This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

Received: 09 November 2017 Accepted: 25 April 2018 Published: 15 May 2018

#### Citation:

Sakkiah S, Kusko R, Pan B, Guo W, Ge W, Tong W and Hong H (2018) Structural Changes Due to Antagonist Binding in Ligand Binding Pocket of Androgen Receptor Elucidated Through Molecular Dynamics Simulations. Front. Pharmacol. 9:492. doi: 10.3389/fphar.2018.00492

When a small molecule binds to the androgen receptor (AR), a conformational change can occur which impacts subsequent binding of co-regulator proteins and DNA. In order to accurately study this mechanism, the scientific community needs a crystal structure of the wild type AR (WT-AR) ligand binding domain bound with an antagonist. To address this open need, we leveraged molecular docking and molecular dynamics (MD) simulations to construct a structure of the WT-AR ligand binding domain bound with the antagonist bicalutamide. The structure of mutant AR (Mut-AR) bound with this same antagonist informed this study. After molecular docking analysis pinpointed the suitable binding orientation of a ligand in AR, the model was further optimized through 1 µs of MD simulations. Using this approach, three molecular systems were studied: (1) WT-AR bound with agonist R1881, (2) WT-AR bound with antagonist bicalutamide, and (3) Mut-AR bound with bicalutamide. Our structures were very similar to the experimentally determined structures of both WT-AR with R1881 and Mut-AR with bicalutamide, demonstrating the trustworthiness of this approach. In our model, when WT-AR is bound with bicalutamide, Val716/Lys720/Gln733, or Met734/Gln738/Glu897 move and thus disturb the positive and negative charge clumps of the AF2 site. This disruption of the AF2 site is key for understanding the impact of antagonist binding on subsequent coregulator binding. In conclusion, the antagonist induced structural changes in WT-AR detailed in this study will enable further AR research and will facilitate AR targeting drug discovery.

Keywords: androgen receptor, molecular dynamics simulations, induced molecular docking, bicalutamide, agonist, antagonist

### INTRODUCTION

The androgen receptor (AR), a member of nuclear receptor subfamily 3, is a ligand-activated transcription factor. AR is expressed in various tissues of different species and regulates many physiological functions, including bone density, cognition, muscle hypertrophy, and prostate growth and differentiation (Gelmann, 2002). AR and estrogen receptor (ER) are well-characterized nuclear receptor targets of endocrine-active chemicals (Hong et al., 2002; Sakkiah et al., 2016). Copious experimental data and numerous in silico predictive models estimate both estrogenic and androgenic activity (Hong et al., 2002, 2003, 2005, 2012, 2015, 2016a,b; Shen et al., 2013; Ng et al., 2014, 2015a,b; Sakkiah et al., 2016; Ye et al., 2016). AR is a well-established drug target for prostate cancer, which is the second most common cancer by occurrence in men in western countries (Damber and Aus, 2008). Both steroidal and non-steroidal antagonists treat prostate cancer by blocking AR activity. A prolonged treatment course leads to tumor AR mutations, which cause AR antagonists to have a paradoxical effect. A thorough study of WT and mutant AR (Mut-AR) antagonist binding is required to better understand this paradoxical mechanism, which limits therapeutic efficacy.

Full-length AR consists of 919 amino acids translated from 8 exons (Kuiper et al., 1989; Lubahn et al., 1989). Like other nuclear receptors, AR consists of three major functional domains: (1) an NH2-terminal domain, (2) a highly conserved DNA binding domain, and (3) a conserved ligand-binding domain (LBD) (Gao et al., 2005; Sakkiah et al., 2016). The hinge region acts as a bridge between the DNA binding domain and the conserved LBD. Both the AR N-terminal activation function 1 (AF1) in the DNA binding domain and the AR C-terminal activation function 2 (AF2) in the LBD control the transcriptional factors in ligand-independent and ligand-dependent manners, respectively. The AR-LBD (hereafter AR-LBD is termed as AR for simplicity) has three different binding or active sites where an agonist or antagonist can bind and alter AR functions: the ligand binding pocket, the AF2 site, and the binding function 3 (BF3) site. An agonist or a competitive antagonist can bind the AR ligand binding pocket to enhance or depress AR function, respectively. The AF2 site plays a major role in co-activator binding, which starts the transcription of AR-regulated genes. A few antagonists were reported to bind to the AF2 site, which directly blocks the binding of a co-activator protein (Axerio-Cilies et al., 2011). The BF3 site is a newly identified AR surface antagonist binding site. An antagonist can bind in any of these described binding sites to suppress AR activity. Antagonist binding causes conformational changes in the AF2 site, rendering it unsuitable for co-activators to bind AR (Estebanez-Perpina et al., 2007; Estébanez-Perpiñá and Fletterick, 2009). The three-dimensional structure of AR consists of 12 bundles of helices forming three layers (**Figure 1**). Among these 12 helices, H12 plays a major role in AR activation and undergoes a considerable conformational change due to the binding of agonist or antagonist in the ligand binding pocket. 
During agonist or antagonist binding, H12 functions like a "lid" which closes or moves away from the ligand binding pocket, respectively (Bohl et al., 2007; Cantin et al., 2007). When androgen binds the ligand binding pocket of AR, H12 tightly holds co-activator proteins and initiates function. AR antagonists are usually bulkier than agonists and thus require a wider binding pocket. Due to their larger size, antagonists push the residues in H12 (which is near the ligand binding pocket) outward to expand the active site. These structural changes in the ligand binding pocket cause the AF2 site to undergo conformational changes, preventing co-activator protein binding (Estébanez-Perpiñá and Fletterick, 2009). Some mutations in AR cause drug resistance by converting AR antagonist properties into agonist properties. Prostate cancer drug resistance is predominantly driven by AR mutations. For example, mutations T877A (Sack et al., 2001; Bohl et al., 2007), W741L/C (Hara et al., 2003), L701A/T877A (Balbas et al., 2013), and F878L (Balbas et al., 2013; Korpal et al., 2013) in the LBD made the AR antagonists flutamide, R-bicalutamide, and enzalutamide behave as agonists. The mutation T877A significantly increased the activity of AR, as evidenced by the enhanced AR affinity toward progesterone and estrogens (Taplin and Balk, 2004).

There are 90 crystal structures of AR from different species (rat, mouse, chimpanzee, and human) in the Protein Data Bank (PDB<sup>1</sup>) (Berman et al., 2000). Wild type AR (WT-AR) crystal structures exist with either agonists in the ligand binding pocket or antagonists in the AF2 or BF3 sites. Mutant AR (Mut-AR) crystal structures exist with antagonists in the ligand binding pocket. No 3D structure of WT-AR with an antagonist in the ligand binding pocket has been described, likely because an antagonist binding to the AR-chaperone complex does not dissociate the chaperone from AR (Bohl et al., 2005; Sakkiah et al., 2016). To fill this knowledge gap, the AF2 site structural changes in WT-AR that are induced by antagonist binding could be determined via molecular modeling.

<sup>1</sup>www.rcsb.org

Determining the conformational change of a protein induced by a ligand using crystallography is time consuming at best and often infeasible. Several researchers employed molecular dynamics (MD) simulations to characterize H12 structural changes due to antagonist or agonist binding in the AR ligand binding pocket. Zhou J. et al. (2010) utilized replica-exchange MD to characterize structural conformational changes and H12 movement caused by binding of hydroxyflutamide in the ligand binding pocket of WT and mutant (T877A) AR. Using MD simulations, Bisson et al. (2008) proposed that T877A in AR destabilized hydroxyflutamide–Met895 interactions and thus decreased hydroxyflutamide antagonist activity. Additionally, Osguthorpe and Hagler (2011) employed MD simulations and quantum mechanics to discover that an antagonist occupied more space than an agonist, leading to H12 instability. While important contributions to the field, these MD simulations were limited by short time frames and mainly focused on the ligand binding pocket or H12 structural changes (Bisson et al., 2008; Osguthorpe and Hagler, 2011; Liu et al., 2015, 2016, 2017; Wang et al., 2017). Recently, many researchers captured structural changes of various proteins using long time MD simulations (hereafter called "long MD simulations") (Whitten et al., 2005; Dror et al., 2009; Khelashvili et al., 2009; Nury et al., 2010; Gotz et al., 2012; Durrant et al., 2016). For example, Lindorff-Larsen et al. (2011) predicted the folding of 12 proteins using MD simulations ranging from microseconds to a millisecond. Their results unveiled a common principle for the folding of the 12 structurally diverse proteins and, more importantly, demonstrated that long MD simulations are a powerful tool to predict and capture protein conformational changes (Lindorff-Larsen et al., 2011). 
Similarly, Kumar and Purohit (2014) found that long MD simulations significantly increased prediction accuracy when studying cancer associated single nucleotide polymorphisms. Thus, long MD simulations overcome many limitations of short-term MD simulations. Duan et al. (2016) conducted 1 µs MD simulations and explored ligand binding pocket changes during agonist and antagonist binding in WT and Mut-AR. Using bias-exchange metadynamics to study the free energy profile of agonist and antagonist binding to AR, they observed H12 movement and structural changes in the ligand binding pocket of WT-AR driven by agonist and antagonist binding. They also reported that long MD simulations were required to capture H12 movement, whereas short-term simulations miscalculated the H12 structural changes induced by agonist binding (Duan et al., 2016). Hence, in this study, we applied long MD simulations (1 µs) not only to capture H12 movement but also to study AF2 site structural changes due to antagonist binding in the AR ligand binding pocket.

Three AR complex structures were studied to understand the antagonist binding induced structural changes of the AF2 site. R1881 and bicalutamide are a well-known agonist and antagonist of AR, respectively. Structures of AR bound with R1881 and bicalutamide were downloaded from PDB: WT-AR-R1881 (AR with agonist, PDB ID: 1E3G) and Mut-AR-bicalutamide (AR with antagonist, PDB ID: 1Z95). The third AR complex structure, WT-AR-bicalutamide, was absent from PDB and thus was generated using the induced fit molecular docking (IFD) method (explained in the Section "Materials and Methods"). The IFD method explores both the possible binding poses of a ligand in a receptor active site and the associated conformational changes of the side chains near the active site. MD simulations are an important tool to study receptor–ligand interactions at an atomic level over a given time frame. They can also refine the three-dimensional structure of a protein–ligand complex obtained from X-ray crystallography or molecular docking. Here, we leveraged the advantages of IFD and MD simulations together to understand the subtle structural changes in WT-AR due to anti-androgen binding and also to elucidate key co-activator binding residues in the WT-AR AF2 site. Each AR complex structure was subjected to 1 µs of MD simulations to resolve important AF2 site residue rearrangements during the binding of small molecules in WT-AR. Our results will enable design of improved prostate cancer treatments and facilitate endocrine disruption chemical risk assessment through AR-mediated responses.

### MATERIALS AND METHODS

#### Molecular Docking

Rigid docking (giving flexibility only to the ligand) might fail to produce a precise ligand pose because of the rigidity of the protein. In contrast, IFD allows not only the active site but also the side chain orientations of the protein to adjust to fit the pose and conformation of the bound ligand (Zhong et al., 2009). Hence, it can generate many protein-ligand complexes by changing the side chains or the backbone of the protein. The Glide (docking) and Prime (refinement) modules were used in the IFD to determine the possible binding modes of the ligand and the concomitant binding induced conformational changes.

The IFD (Sherman et al., 2006a,b) module<sup>2</sup> from the Schrodinger-Suite (2016b) was used to dock the AR antagonist, bicalutamide, in WT-AR.

The following steps were involved in the IFD employed here (Wang et al., 2008; Luo et al., 2013):


<sup>2</sup>www.schrodinger.com/induced-fit

Protein preparation is one of the most important steps in molecular docking and plays a key role in IFD. The three-dimensional atomic coordinates of WT-AR (PDB ID: 1E3G) (Matias et al., 2000) were retrieved from PDB and used as a receptor for the IFD. The Protein Preparation module<sup>3</sup> was used to add hydrogen atoms and to build the missing side chains, residues, and loops. The OPLS-2001 force field (Jorgensen and Tirado-Rives, 1988; Kaminski et al., 2001; Shivakumar et al., 2010) was used to assign the partial charges. All water molecules were removed and the protein structure was optimized using the OPLS force field. A 10 Å docking grid was generated around the ligand, R1881, in WT-AR. The structure of bicalutamide was obtained from the crystal structure of Mut-AR-bicalutamide (PDB ID: 1Z95) (Bohl et al., 2005) and docked in the generated grid box using Glide XP docking. The Glide XP docking (Halgren et al., 2004; Friesner et al., 2006; Shelley et al., 2007) generated 20 different bicalutamide poses for the WT-AR structural refinements. The Prime module was used to refine the generated WT-AR-bicalutamide complexes. In the Prime refinement, each WT-AR-bicalutamide conformation from the previous step was subjected to side chain and backbone refinements (Jacobson et al., 2004) by selecting the residues within 10 Å from bicalutamide and/or residues from 669 to 918. The Prime energy was calculated and used to rank the refined AR-bicalutamide complexes. The lowest energy conformation (30 kcal/mol) of the refined WT-AR complex was used to re-dock bicalutamide using Glide XP mode. The most favorable binding pose of bicalutamide in WT-AR was selected based on the IFD score (binding energy). The selected WT-AR-bicalutamide complexes were visualized to check the interactions between bicalutamide and the residues in the ligand binding pocket using the Ligand Interactions module in Maestro 11 (Schrodinger-Suite, 2016a).
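The distance-based residue selection used for the Prime refinement (residues within 10 Å of bicalutamide) can be illustrated with a minimal sketch. This is not the Schrödinger implementation; the function name `residues_within`, the input layout, and the toy coordinates are assumptions made purely for illustration.

```python
import math

def residues_within(protein_atoms, ligand_xyz, cutoff=10.0):
    """Return sorted residue numbers having at least one atom within
    `cutoff` Å of any ligand atom.

    protein_atoms: iterable of (residue_number, (x, y, z)) tuples
    ligand_xyz:    list of (x, y, z) ligand atom coordinates
    """
    selected = set()
    for resid, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            selected.add(resid)
    return sorted(selected)

# Toy coordinates (hypothetical, not taken from PDB entry 1E3G)
protein_atoms = [
    (700, (0.0, 0.0, 0.0)),
    (701, (5.0, 0.0, 0.0)),
    (900, (30.0, 0.0, 0.0)),
]
ligand_xyz = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

near = residues_within(protein_atoms, ligand_xyz)
print(near)
```

In practice the coordinates would come from the prepared receptor and docked ligand files; the sketch only shows the selection rule itself.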

#### Molecular Dynamics Simulations

Proteins are dynamic in nature. Thus, understanding atomic level motion is required to capture their profound dynamic mechanisms (Chou and Mao, 1988; Chou et al., 1994; Wang and Chou, 2009). MD simulations have the capacity to analyze the dynamics of an apoprotein or a complex with other molecules in an aqueous environment (Sakkiah et al., 2013a,b). Moreover, MD simulations yield energetically favorable conformations by optimizing a protein-ligand complex, which is needed to understand protein–ligand interactions and ligand binding induced structural changes.

The structures of the WT-AR-bicalutamide complex (obtained from IFD), the WT-AR-R1881 complex (PDB ID: 1E3G) (Matias et al., 2000), and the Mut-AR-bicalutamide complex (PDB ID: 1Z95) (Bohl et al., 2005) were subjected to MD simulations using the Amber 14 package (Case et al., 2005). The topology and coordinate files for the agonist and antagonist were prepared using antechamber. Tleap was used to prepare the topology and coordinate files for the protein as well as to assemble each AR complex for the MD simulations. The Amber03 molecular mechanics force field (Duan et al., 2003) and the general AMBER force field (GAFF) (Wang et al., 2004) were employed for the protein and the ligands (agonist or antagonist), respectively. Each complex structure was immersed in a rectangular box of TIP3P model water (Jorgensen et al., 1983). The boundaries of the water box were 10 Å away from the nearest atoms of the complex. All systems were neutralized by adding Cl<sup>−</sup> ions. The Particle Mesh Ewald (PME) method (Darden et al., 1993) was used to handle long-range electrostatic interactions, and the SHAKE algorithm (Ryckaert et al., 1977) was used to constrain covalent bonds between heavy atoms and hydrogen atoms. A cutoff of 10 Å was used for the short-range interactions (van der Waals and electrostatic interactions). In the first phase, only the solvent was minimized and equilibrated inside the water box. Then, the whole system was minimized by applying steepest descent minimization for 1000 cycles, followed by conjugate gradient energy minimization for 4000 cycles. Subsequently, the whole system was gradually heated from 0 to 310.15 K over a 100 ps period, followed by a 250 ps equilibration simulation of the whole system. In the second phase, the prepared systems were subjected to 1 µs of MD simulations using Amber14. All MD simulations were performed with a time step of 2 fs, and the coordinates were saved every 1 ps. The MD trajectories were visualized and analyzed using PyMol (Schrodinger, 2015) and Visual Molecular Dynamics (Humphrey et al., 1996). The Amber package<sup>4</sup> was used to calculate RMSD values for the protein and ligands as well as RMSF values for residues.
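To make the scale of this protocol concrete, the following sketch converts the stated durations (100 ps heating, 250 ps equilibration, 1 µs production, 2 fs time step, 1 ps save interval) into integration steps and saved frames. The helper `n_steps` and the variable names are illustrative assumptions, not part of any Amber input file.

```python
FS_PER_PS = 1_000            # femtoseconds per picosecond
FS_PER_US = 1_000_000_000    # femtoseconds per microsecond

def n_steps(duration_fs, timestep_fs=2):
    """Number of integration steps needed to cover `duration_fs`."""
    if duration_fs % timestep_fs:
        raise ValueError("duration must be a multiple of the time step")
    return duration_fs // timestep_fs

heating_steps = n_steps(100 * FS_PER_PS)        # 0 -> 310.15 K ramp
equilibration_steps = n_steps(250 * FS_PER_PS)  # equilibration run
production_steps = n_steps(1 * FS_PER_US)       # 1 µs production run
save_every_steps = n_steps(1 * FS_PER_PS)       # coordinates saved every 1 ps
frames_per_system = production_steps // save_every_steps

print(heating_steps, equilibration_steps, production_steps,
      save_every_steps, frames_per_system)
```

Each of the three systems therefore yields on the order of one million saved frames, which is why the later analyses work with a representative structure rather than every frame.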

### RESULTS AND DISCUSSION

#### IFD Produced a Structure of WT-AR-Bicalutamide for MD Simulations

No crystal structure for WT-AR with an antagonist in the ligand binding pocket has been deposited in PDB (accessed on May 19, 2017). To fill this gap, we conducted IFD. Flexibility was given to the active site residues and the ligand during Glide docking. The whole WT-AR-bicalutamide system was refined using the Prime module to predict a suitable binding orientation of bicalutamide in the ligand binding pocket of WT-AR. Among the 20 models generated for WT-AR-bicalutamide, the top 5 models were selected based on their IFD/Glide scores and checked for residue interactions (**Table 1**). Among these 5 complex structures, Model-1, Model-3, and Model-4 showed π–cation interactions with Trp741 and Phe764. Trp741 had van der Waals interactions favorable for agonist binding in the ligand binding pocket of WT-AR (Bohl et al., 2005). In contrast, Model-2 and Model-5 failed to form π–cation interactions with Trp741 or Phe764. Model-3, Model-4, and Model-1 showed three, three, and two hydrogen bond interactions between bicalutamide and WT-AR, respectively. In Model-3 and Model-4, bicalutamide formed hydrogen bonds with Leu704, Asn705, and Arg752. Importantly, the hydrogen bond between the agonist or antagonist and Arg752 in WT-AR is crucial for AR activity (Gao et al., 2005; Bohl et al., 2007; Tan et al., 2015). Bicalutamide in Model-1 failed to form a hydrogen bond with Arg752. Model-3 had a better binding affinity value than Model-4. Interestingly, bicalutamide in Model-3 showed a bent conformation, which is different from the bicalutamide conformation in the Mut-AR (Gao et al., 2005). Previous evidence proposed that bicalutamide forms hydrogen bonds with residues Arg752, Leu704, Asn705, and Gln711 in Mut-AR (Tan et al., 2015). While Model-3 also formed hydrogen bonds with critical residues (Leu704, Asn705, and Arg752), it failed to form a hydrogen bond with Gln711 and did not adopt a pose similar to that of the agonist because of the bulkier tryptophan side chain. Additionally, in Model-3, the 4-fluorophenyl group of bicalutamide moved toward the H12 region to find a suitable position in the WT-AR ligand binding pocket. Hence, Model-3 was selected for subsequent MD simulations of WT-AR-bicalutamide based on IFD score and binding interactions.

TABLE 1 | Induced fit docking (IFD) score and the key residues involved in hydrogen bond interactions between WT-AR and bicalutamide for the top 5 complexes.

TABLE 2 | Three molecular systems in MD simulations.

<sup>3</sup>https://www.schrodinger.com/protein-preparation-wizard

<sup>4</sup>http://ambermd.org/doc12/Amber14.pdf

#### System Stability and Fluctuation Analysis Revealed Stability of AR Structures

We used the three molecular systems listed in **Table 2** (WT-AR-R1881, WT-AR-bicalutamide, and Mut-AR-bicalutamide) to analyze the structural changes in WT-AR due to bicalutamide binding in the ligand binding pocket using MD simulations. All trajectory files obtained from the MD simulations were examined for stability and fluctuation of the systems. The root mean square deviation (RMSD) and root mean square fluctuation (RMSF) were calculated for all systems to measure their structural stability and the spatial fluctuation of residues, respectively. **Figure 2A** plots the RMSD values of the three systems during the 1 µs simulations. The RMSD values converged in the last 100 ns, indicating that the systems had reached a stable state. The WT-AR-R1881 and Mut-AR-bicalutamide systems stabilized at an RMSD value of around 2.0 Å, while the WT-AR-bicalutamide system had a higher RMSD value of about 2.5 Å. An average structure was calculated from the last 100 ns for each of the three systems.

FIGURE 2 | (A) Shows the root mean square deviation (RMSD) plot of the systems during the 1 µs MD simulations. The RMSD values were calculated using AR backbone atoms. The X-axis represents time with a unit of 100 ps and the Y-axis shows RMSD values in Å. (B) Shows the root mean square fluctuation (RMSF) of the Cα atoms of AR systems in the 1 µs MD simulations. The X-axis indicates AR residue number and Y-axis represents RMSF in Å. The residues with RMSF > 2 Å are marked. (C) Demonstrates the structure of WT-AR-R1881, residues with RMSF > 2 Å in the loop regions are marked. These residues are drawn in a stick model. WT-AR-R1881 is color coded in green, WT-AR-bicalutamide in purple, and Mut-AR-bicalutamide in blue.

The structure with the lowest RMSD value compared with the average structure in the last 100 ns was selected as a representative structure for each of the systems to elucidate the structural changes of WT-AR induced by bicalutamide.
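The selection of a representative structure described above can be sketched as follows: average the coordinates over the frames, compute the RMSD of each frame to that average, and keep the frame with the smallest value. This is a simplified illustration that assumes the frames have already been superimposed on a common reference; the function names and toy coordinates are hypothetical and do not reproduce the Amber workflow used in the study.

```python
import math

def average_structure(frames):
    """Coordinate-wise mean over frames; each frame is a list of (x, y, z)."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    return [
        tuple(sum(f[i][k] for f in frames) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]

def rmsd(a, b):
    """RMSD between two equally ordered coordinate lists (no fitting)."""
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(sq / len(a))

def representative_frame(frames):
    """Index and RMSD of the frame closest to the average structure."""
    avg = average_structure(frames)
    values = [rmsd(f, avg) for f in frames]
    best = min(range(len(frames)), key=values.__getitem__)
    return best, values[best]

# Toy trajectory: three pre-aligned frames of two atoms each
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.3, 0.0, 0.0), (1.3, 0.0, 0.0)],
    [(0.6, 0.0, 0.0), (1.6, 0.0, 0.0)],
]
index, value = representative_frame(frames)
print(index, round(value, 6))
```

Choosing an actual frame rather than the average itself avoids the distorted geometry that coordinate averaging can introduce.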

Root mean square fluctuation plots were used to analyze the flexibility of the residues in AR in the 1 µs MD simulations. Examination of the RMSF plots in **Figure 2B** revealed that WT-AR-bicalutamide had larger RMSF values than WT-AR-R1881 and Mut-AR-bicalutamide near the C-terminus of the LBD (mostly near H12). The average RMSF values for WT-AR-bicalutamide, Mut-AR-bicalutamide, and WT-AR-R1881 were 1.29, 1.25, and 1.11 Å, respectively. Five residues (Asn692, Leu728, Gly820, Pro849, and Ser888) in AR had an RMSF of >2.0 Å (**Figure 2B**) and were considered to be flexible residues. These five residues were present in the loop regions of AR (**Figure 2C**). The RMSF values of the active site residues were small, demonstrating the stability of the AR active site.
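As a minimal illustration of the RMSF analysis, the sketch below computes the per-atom RMSF from a set of pre-aligned frames and flags atoms above the 2.0 Å threshold used in the text. The function names, residue labels, and coordinates are toy assumptions; the study itself used the Amber package for this calculation.

```python
import math

def rmsf(frames):
    """Per-atom root mean square fluctuation about the mean position.

    frames: list of frames, each a list of (x, y, z) for the same atoms
            (for example one Cα atom per residue), already superimposed.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for i in range(n_atoms):
        mean = tuple(sum(f[i][k] for f in frames) / n_frames for k in range(3))
        msf = sum(math.dist(f[i], mean) ** 2 for f in frames) / n_frames
        values.append(math.sqrt(msf))
    return values

def flexible_residues(resids, values, threshold=2.0):
    """Residue numbers whose RMSF exceeds `threshold` (in Å)."""
    return [r for r, v in zip(resids, values) if v > threshold]

# Toy data: the first atom is fixed, the second oscillates by ±3 Å
resids = [704, 888]  # labels chosen only for illustration
frames = [
    [(0.0, 0.0, 0.0), (-3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
values = rmsf(frames)
flexible = flexible_residues(resids, values)
print(values, flexible)
```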

#### Key Structural Changes in WT-AR Binding Antagonists

The AR ligand binding pocket accommodates both agonists and antagonists. Most antagonists bind in this site and alter the function of AR. The representative structures of WT-AR-R1881 and WT-AR-bicalutamide obtained from the MD simulations were superimposed to examine the difference between the two systems. Several major structural changes were identified in WT-AR due to the bicalutamide binding compared with agonist binding (R1881) (**Figure 3A**). Comparison of WT-AR-bicalutamide with WT-AR-R1881 showed a distortion at the end of H10 due to bicalutamide binding. Several residues in H10 were changed into a loop, which enabled more flexible movement. The structural conversion of H11 into a loop moved H12 away from the AR ligand binding pocket. Moreover, structural changes were observed when comparing WT-AR and Mut-AR bound with bicalutamide (**Figure 3B**). During bicalutamide binding, H11 was retained in the Mut-AR structure but was changed into a loop in the WT-AR structure (marked by the dotted circle in **Figure 3B**). As expected, Mut-AR-bicalutamide had a similar 3D structure to WT-AR-R1881.

The ligand binding pocket area and volume were calculated using the online Computed Atlas of Surface Topography of proteins (CASTp) server<sup>5</sup>. The area/volume (Å²/Å³) for WT-AR-R1881, WT-AR-bicalutamide, and Mut-AR-bicalutamide were 185/90, 528/321, and 366/193, respectively. As expected, the area and volume of the ligand binding pocket of WT-AR-bicalutamide were larger than those of WT-AR-R1881 and Mut-AR-bicalutamide. Bicalutamide is larger than R1881 and hence moved H12 outward from the ligand binding pocket. The RMSD values comparing WT-AR-R1881 vs. WT-AR-bicalutamide as well as WT-AR-bicalutamide vs. Mut-AR-bicalutamide were calculated for each residue by superimposing the structures using Visual Molecular Dynamics (Humphrey et al., 1996). The residues were ranked based on the computed RMSD values and are plotted in Supplementary Figure S1. The RMSD values showed a gap between 2.8 and 3 Å in both comparisons (Supplementary Figures S1A,B). There were 42 and 37 residues with an RMSD value greater than 2.8 Å between WT-AR-R1881 and WT-AR-bicalutamide and between WT-AR-bicalutamide and Mut-AR-bicalutamide, respectively. These residues are summarized in Supplementary Tables S1, S2. Twenty-two WT-AR-R1881 vs. WT-AR-bicalutamide residues and 26 WT-AR-bicalutamide vs. Mut-AR-bicalutamide residues were in helices (H3, H7, H9, H10, and H12), while the other residues were in loop regions.

<sup>5</sup>http://sts.bioe.uic.edu/castp/calculation.html

The Trp741 mutation played a major role in the conversion of an AR antagonist into an agonist. The flipped Trp741 side chain moved His874 in H10 away from the ligand binding pocket to accommodate bicalutamide. Leu873, Phe876, Thr877, and Met895 were the active site residues in the ligand binding pocket showing RMSD values greater than 3 Å between WT-AR-R1881 and WT-AR-bicalutamide. Thr850, Ser851, His874, Phe878, and Leu881 from H10 also had RMSD values greater than 3 Å (Supplementary Table S1). These structural changes drove the ligand binding pocket of WT-AR to expand to accommodate bicalutamide.
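The per-residue comparison used here (RMSD of each residue between two superimposed structures, ranked and filtered at 2.8 Å) can be sketched as below. The data layout, function names, and coordinates are hypothetical; the study performed this comparison in Visual Molecular Dynamics, and the sketch assumes the two structures are already superimposed.

```python
import math

def per_residue_rmsd(struct_a, struct_b):
    """RMSD per residue between two superimposed structures.

    struct_a, struct_b: dict mapping residue number -> list of (x, y, z)
    for the same atoms in the same order. Residues missing from either
    structure, or with differing atom counts (e.g. a mutated residue),
    are skipped.
    """
    result = {}
    for resid in struct_a.keys() & struct_b.keys():
        a, b = struct_a[resid], struct_b[resid]
        if len(a) != len(b):
            continue
        sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
        result[resid] = math.sqrt(sq / len(a))
    return result

def ranked_above(rmsd_by_resid, threshold=2.8):
    """(residue, RMSD) pairs above `threshold`, largest deviation first."""
    hits = [(r, v) for r, v in rmsd_by_resid.items() if v > threshold]
    return sorted(hits, key=lambda rv: rv[1], reverse=True)

# Toy structures (coordinates are illustrative only)
struct_a = {741: [(0, 0, 0), (1, 0, 0)], 874: [(0, 0, 0), (1, 0, 0)], 895: [(0, 0, 0)]}
struct_b = {741: [(0, 0, 0), (1, 0, 0)], 874: [(3, 0, 0), (4, 0, 0)], 895: [(0, 4, 0)]}

values = per_residue_rmsd(struct_a, struct_b)
moved = ranked_above(values)
print(moved)
```

A threshold placed in a natural gap of the ranked values, as the 2.8 to 3 Å gap reported above, keeps the cut from splitting residues with nearly identical deviations.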

The representative structure of WT-AR-R1881 superimposed better with Mut-AR-bicalutamide than with WT-AR-bicalutamide. The H12 residues in Mut-AR-bicalutamide were not very different from the H12 residues in WT-AR-R1881. All residues in Mut-AR had less than 2.5 Å RMSD compared with WT-AR-R1881; thus, Mut-AR-bicalutamide did not experience large structural changes relative to WT-AR-R1881. The mutant residue Trp741Leu in Mut-AR-bicalutamide had a similar conformation to the wild type residue in WT-AR-R1881. The residues showing RMSD greater than 2.8 Å between WT-AR-bicalutamide and WT-AR-R1881 are listed in Supplementary Table S1.

Lastly, the Mut-AR-bicalutamide and WT-AR-bicalutamide representative structures were superimposed to identify the crucial residues that played important roles in bicalutamide binding to AR. H11 in WT-AR-bicalutamide changed into a loop. The residues 882–894 in the loop region between H10 and H12 gave more flexibility for H12 to move away from the ligand binding pocket in WT-AR-bicalutamide. All these residues had RMSD values greater than 3.5 Å compared with WT-AR-R1881. Notably, the residues from His885 to Asp890 had RMSD values greater than 6 Å. In Mut-AR-bicalutamide, these residues formed H11, which reduced the flexibility of the loop and held H12 close to the ligand binding pocket. As expected, these residues showed RMSD values less than 2.8 Å between WT-AR-R1881 and Mut-AR-bicalutamide. Hence, we posit that the structural change of H11 into a loop in WT-AR-bicalutamide plays an essential role in H12 movement and thus makes the AF2 site unsuitable for co-activator binding. The residues that differ between Mut-AR-bicalutamide and WT-AR-bicalutamide are listed in Supplementary Table S2.

Superimposition of the X-ray crystal structures and the representative structures from our MD simulations had an RMSD value of 1.10 Å for WT-AR-R1881 (**Figure 4A**) and 1.02 Å for Mut-AR-bicalutamide (**Figure 4B**). This indicates that the selected representative structures do not deviate much from the X-ray crystal structures. Furthermore, the orientations of R1881 and bicalutamide were also similar to the crystal structures. The overlay of bicalutamide from the Mut-AR X-ray crystal structure and the representative WT-AR structure from MD simulations had an RMSD value of 5.2 Å (**Figure 4C**). This comparative analysis confirmed that the representative structures of WT-AR-bicalutamide obtained from the MD simulations are reliable and were not obtained by chance. Therefore, the representative structure of WT-AR-bicalutamide could be reliably used to elucidate the structural changes in WT-AR due to antagonist binding.

#### Identification of Critical Residues in the AF2 Site

The AR AF2 site is bound by co-activator proteins, which initiates the transcription of target genes. **Table 3** lists the important residues in WT-AR and their interactions with co-activator proteins (Askew et al., 2007; Estebanez-Perpina et al., 2007; Hsu et al., 2014). The interactions between AR and co-activators were identified from 17 WT-AR-agonist and two Mut-AR-agonist complexes in the PDB. Most of the residues (Val713, Val716, Lys717, Lys720, Phe725, Val730, Gln733, Met734, Ile737, Gln738, Glu893, Met894, and Ile898) in the AF2 site formed hydrophobic interactions with co-activator proteins. Five residues (Val716, Met734, Ile737, Gln738, and Met894) in the AF2 site had hydrophobic interactions with most of the co-activators. Glu897, Lys720, Asp731, and Gln733 formed hydrogen bond interactions with co-activator proteins and Glu897 and Lys720 formed hydrogen bond interactions with most of the co-activators (Askew et al., 2007; Estebanez-Perpina et al., 2007; Hsu et al., 2014). From the structural analysis, it was clear that Val716, Met734, Ile737, Gln738, Met894, Glu897, and Lys720 played a paramount role in tight binding of co-activator proteins.

Comparison of the AF2 site of the three representative structures (WT-AR-R1881, WT-AR-bicalutamide, and Mut-AR-bicalutamide) from the MD simulations shed light on critical residue displacements which prevent co-activator binding. Val713, Val716, Lys717, Lys720, Phe725, Met734, Met894, Glu897, and Ile898 were considerably different between WT-AR-bicalutamide and WT-AR-R1881 (**Figure 5A**). Among these residues, a few showed a considerable deviation in their side chains. The distances for Glu897 (CD), Gln738 (CD), Met734 (SD), Val716 (O), and Lys720 (CG) were 3.8, 4.2, 2.2, 2.0, and 2.2 Å, respectively, between WT-AR-R1881 and WT-AR-bicalutamide. These residues also had different conformations between WT-AR-bicalutamide and Mut-AR-bicalutamide as depicted in **Figure 5B**, with respective distances for Glu897 (CD), Gln738 (CD), Met734 (SD), Val716 (O), and Lys720 (CG) of 3.2, 0.5, 1.8, 1.1, and 3.0 Å.

Val716, Lys720, and Gln733 were previously shown experimentally to form a charge clamp in the AF2 site, which interacts with co-activator proteins (Askew et al., 2007; Estebanez-Perpina et al., 2007; Estébanez-Perpiñá and Fletterick, 2009; Hsu et al., 2014). These residues showed a remarkable deviation between the WT-AR-R1881 and WT-AR-bicalutamide structures in our data. Axerio-Cilies et al. (2011) showed experimentally that Met734 was pushed away from the AF2 site when bicalutamide binds AR. In addition, Zhou X.E. et al. (2010) demonstrated that Glu897 meaningfully interacted with a co-activator protein. Taken together, these previous results support our finding: when bicalutamide binds WT-AR, Met734 and Glu897 move, which causes structural changes in H12. The structural change in H12 renders the AF2 site unsuitable for co-activator protein binding. Lys720, Glu897, Val716, and Met894 were found to play a major role in the binding of co-activator peptides (He et al., 2004; Hur et al., 2004).

#### Electrostatic Potential Surface Analysis Revealed That Bicalutamide Binding Disturbed the Positive and Negative Charge Clamp in the WT-AR AF2 Site

Electrostatic potential surface analysis is one of the most powerful tools to study intramolecular interactions in a protein and intermolecular interactions between a protein and a small molecule (Sakkiah et al., 2013a). The electrostatic potential surface was calculated only for the critical residues in the AF2 site using PyMol (Baker et al., 2001). PyMol automatically generated the electrostatic potential map and smoothed out the local charge density of the nearby atoms (within 10 Å) without taking solvent screening effects into account<sup>6,7</sup>. The electrostatic potential surface of the AF2 site in WT-AR-R1881, WT-AR-bicalutamide, and Mut-AR-bicalutamide is shown in **Figure 6**. WT-AR-R1881 and Mut-AR-bicalutamide had very similar electrostatic potential surfaces in their AF2 sites (**Figures 6A,C**), consistent with the mutation turning the antagonist into an agonist. However, WT-AR-bicalutamide had a very different electrostatic potential surface (**Figure 6B**) compared with the other two structures due to structural changes in the AF2 site caused by the antagonist binding. Five residues (Val716, Lys720, Gln733, Gln738, and Met734) played an important role in the bicalutamide binding induced structural changes of the WT-AR AF2 site. The binding of R1881 in the active site of WT-AR formed a positive (blue) and a negative (red) binding region in the AF2 site (**Figure 6A**). Proximal residue contact closed the positive (caused by Gln733, Lys720, and Val716) and negative (caused by Met734 and Gln738) binding sites of the AF2 site in WT-AR-bicalutamide (**Figure 6B**). The critical residues in the Mut-AR-bicalutamide AF2 site (**Figure 6C**) showed an arrangement similar to that in WT-AR-R1881. Previously, it was shown experimentally that the charge clamp was formed by residues Lys720 and Glu897 (Estebanez-Perpina et al., 2005; Tan et al., 2015). Co-activators can form hydrogen bond interactions with Lys720 and Glu897, leading to high binding affinity with WT-AR. These hydrogen bonds were distorted due to antagonist binding. Bicalutamide binding in the active site of WT-AR moved Lys720 and Glu897, disturbing the charge clamp in the AF2 site and hindering co-activator binding. Hence, the movement of Lys720, Val716, and Gln733 made the AF2 site unsuitable for co-activator binding when bicalutamide is bound. These computational findings give insight into the residues involved in the ligand induced conformational changes of the AF2 site.
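To illustrate what "without taking solvent screening effects into account" means, the sketch below evaluates an unscreened Coulomb potential at a point from nearby point charges within a 10 Å cutoff, and contrasts it with a uniformly screened value. This is a textbook Coulomb sum written for illustration only; it is not PyMol's algorithm, and the function name, charges, and coordinates are assumptions.

```python
import math

# Coulomb constant in kcal·Å/(mol·e²), so potentials are in kcal/(mol·e)
COULOMB_KCAL = 332.0637

def coulomb_potential(point, charges, cutoff=10.0, dielectric=1.0):
    """Potential at `point` from point charges within `cutoff` Å.

    charges: iterable of ((x, y, z), q) with q in elementary charges.
    dielectric=1.0 means no solvent screening; a larger value scales
    the potential down uniformly.
    """
    total = 0.0
    for xyz, q in charges:
        r = math.dist(point, xyz)
        if 0.0 < r <= cutoff:
            total += q / r
    return COULOMB_KCAL * total / dielectric

# Toy charges: one positive and one negative nearby, one beyond the cutoff
charges = [
    ((2.0, 0.0, 0.0), 1.0),
    ((0.0, 4.0, 0.0), -1.0),
    ((20.0, 0.0, 0.0), -1.0),
]
origin = (0.0, 0.0, 0.0)
phi_vacuum = coulomb_potential(origin, charges)
phi_screened = coulomb_potential(origin, charges, dielectric=80.0)
print(phi_vacuum, phi_screened)
```

Because screening is ignored, surface maps of this kind are best read qualitatively, as relative patterns of positive and negative regions rather than as absolute energies.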

### CONCLUSION

No structural details of WT-AR when bound by antagonists have been reported to date. Hence, we applied IFD and 1 µs long MD simulations to elucidate the bicalutamide binding induced structural changes of WT-AR's AF2 site. IFD identified a suitable pose of bicalutamide in the ligand binding pocket of WT-AR. The best WT-AR-bicalutamide structure was selected based both on IFD score and on bicalutamide interactions with the critical residues in the ligand binding pocket of WT-AR. The complexes (WT-AR-R1881, WT-AR-bicalutamide, and Mut-AR-bicalutamide) were optimized by MD simulations using Amber 14. Our results clearly pinpointed residues Val716, Lys720, Gln733, Met734, Gln738, and Glu897 as playing a pivotal role in the formation of the AF2 site in AR. Structural changes or movement of these residues due to bicalutamide binding changed the structure of the AF2 site, making it unsuitable for co-activator protein binding. The electrostatic potential map clearly revealed that the movement of these residues due to bicalutamide binding disturbed the positive and negative charge clamp in the AF2 site of WT-AR. The positive region of the clamp in the AF2 site was distorted due to the movement of residues Lys720, Val716, and Gln733. Experimental validation is needed to confirm the mechanism by which bicalutamide binding induced WT-AR AF2 structural changes impact recruitment of co-factors.

### AUTHOR CONTRIBUTIONS

SS and HH conceived the experiment(s). SS, BP, and WGe conducted the experiments. SS, BP, and WGo analyzed the results. SS, WT, HH, and RK wrote the manuscript. All authors reviewed and approved the manuscript.

### FUNDING

This research was supported in part by an appointment to the Research Participation Program at the National Center for Toxicological Research (SS, BP, and WGo) administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.00492/full#supplementary-material

### REFERENCES

Balbas, M. D., Evans, M. J., Hosfield, D. J., Wongvipat, J., Arora, V. K., Watson, P. A., et al. (2013). Overcoming mutation-based resistance to antiandrogens with rational drug design. eLife 2:e00499. doi: 10.7554/eLife.00499


<sup>6</sup> http://www.bccs.uni.no

<sup>7</sup> http://www.bioinfo.no


suggests a transition in nuclear receptor activation function dominance. Mol. Cell 16, 425–438.



protonation state generation for drug-like molecules. J. Comput. Aided Mol. Des. 21, 681–691.


**Disclaimer:** The content is solely the responsibility of the authors and does not necessarily represent the official views of the Food and Drug Administration. The findings and conclusions in this article have not been formally disseminated by the US Food and Drug Administration (FDA) and should not be construed to represent the FDA determination or policy.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Sakkiah, Kusko, Pan, Guo, Ge, Tong and Hong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.