<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Pharmacol.</journal-id>
<journal-title>Frontiers in Pharmacology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Pharmacol.</abbrev-journal-title>
<issn pub-type="epub">1663-9812</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fphar.2017.00889</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Pharmacology</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>vNN Web Server for ADMET Predictions</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Schyman</surname> <given-names>Patric</given-names></name>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/479211/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Ruifeng</given-names></name>
</contrib>
<contrib contrib-type="author">
<name><surname>Desai</surname> <given-names>Valmik</given-names></name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Wallqvist</surname> <given-names>Anders</given-names></name>
<xref ref-type="author-notes" rid="fn002"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/187800/overview"/>
</contrib>
</contrib-group>
<aff><institution>DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command</institution>, <addr-line>Fort Detrick, MD</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Adriano D. Andricopulo, S&#x000E3;o Carlos Institute of Physics, University of S&#x000E3;o Paulo, Brazil</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Fabio Broccatelli, Genentech, United States; Tero Aittokallio, Institute for Molecular Medicine Finland, Finland; Emilio Benfenati, Istituto Di Ricerche Farmacologiche Mario Negri, Italy</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Patric Schyman <email>pschyman&#x00040;bhsai.org</email></p></fn>
<fn fn-type="corresp" id="fn002"><p>Anders Wallqvist <email>sven.a.wallqvist.civ&#x00040;mail.mil</email></p></fn>
<fn fn-type="other" id="fn003"><p>This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology</p></fn></author-notes>
<pub-date pub-type="epub">
<day>04</day>
<month>12</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>8</volume>
<elocation-id>889</elocation-id>
<history>
<date date-type="received">
<day>22</day>
<month>09</month>
<year>2017</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>11</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2017 Schyman, Liu, Desai and Wallqvist.</copyright-statement>
<copyright-year>2017</copyright-year>
<copyright-holder>Schyman, Liu, Desai and Wallqvist</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>In drug development, early assessments of pharmacokinetic and toxic properties are important stepping stones to avoid costly and unnecessary failures. Considerable progress has recently been made in the development of computer-based (<italic>in silico</italic>) models to estimate such properties. Nonetheless, such models can be further improved in terms of their ability to make predictions more rapidly, easily, and with greater reliability. To address this issue, we have used our variable nearest neighbor (vNN) method to develop 15 absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction models. These models quickly assess some of the most important properties of potential drug candidates, including their cytotoxicity, mutagenicity, cardiotoxicity, drug-drug interactions, microsomal stability, and likelihood of causing drug-induced liver injury. Here we summarize the ability of each of these models to predict such properties and discuss their overall performance. All of these ADMET models are publicly available on our website (<ext-link ext-link-type="uri" xlink:href="https://vnnadmet.bhsai.org/">https://vnnadmet.bhsai.org/</ext-link>), which also offers the capability of using the vNN method to customize and build new models.</p></abstract>
<kwd-group>
<kwd>ADME</kwd>
<kwd>toxicology</kwd>
<kwd>QSAR</kwd>
<kwd>machine learning</kwd>
<kwd>applicability domain</kwd>
<kwd>online web platform</kwd>
<kwd>open access</kwd>
</kwd-group>
<contract-num rid="cn001">CBCall14-CBS-05-2-0007</contract-num>
<contract-sponsor id="cn001">Defense Threat Reduction Agency<named-content content-type="fundref-id">10.13039/100000774</named-content></contract-sponsor>
<contract-sponsor id="cn002">Medical Research and Materiel Command<named-content content-type="fundref-id">10.13039/100000182</named-content></contract-sponsor>
<counts>
<fig-count count="5"/>
<table-count count="4"/>
<equation-count count="8"/>
<ref-count count="50"/>
<page-count count="14"/>
<word-count count="8993"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Drug discovery is a risky, lengthy, and resource-intensive process with high attrition rates. In recent years, the development of assays and computer-based (<italic>in silico</italic>) models to assess absorption, distribution, metabolism, and excretion (ADME) properties has greatly reduced the attrition rate (Waring et al., <xref ref-type="bibr" rid="B46">2015</xref>). The ability to predict these properties quickly and reliably facilitates the exclusion of compounds with potential ADME issues, and thereby helps investigators prioritize which compounds to synthesize and evaluate. However, toxicity remains a hurdle, with an attrition rate of 40% among new compounds identified in the drug discovery phase (Waring et al., <xref ref-type="bibr" rid="B46">2015</xref>). This necessitates careful selection of compounds during drug development to avoid late-stage attrition. As such, there is an urgent need for <italic>in silico</italic> methods that make fast, easy, and reliable predictions of ADME and toxicity (ADMET) properties, which has resulted in several online tools and web platforms for ADMET predictions (Walker et al., <xref ref-type="bibr" rid="B43">2010</xref>; Sushko et al., <xref ref-type="bibr" rid="B42">2011</xref>; Cheng et al., <xref ref-type="bibr" rid="B13">2012</xref>; Maunz et al., <xref ref-type="bibr" rid="B32">2013</xref>; Manganaro et al., <xref ref-type="bibr" rid="B31">2016</xref>; Daina et al., <xref ref-type="bibr" rid="B17">2017</xref>).</p>
<p>Here we provide an overview of our versatile variable nearest neighbor (vNN) method (Liu et al., <xref ref-type="bibr" rid="B29">2012</xref>) and the 15 models we constructed using this method to predict the ADMET properties of potential target compounds. The vNN method has several advantages over existing <italic>in silico</italic> methods. First, it calculates the similarity distance between molecules in terms of their structure, and uses a distance threshold to define a domain of applicability (i.e., all nearest neighbors that meet a minimum similarity threshold constraint). This applicability domain, while limiting vNN-based models to making predictions only for molecules that are similar to the reference molecules, ensures that the predictions they generate are reliable. Second, vNN-based models can be built within minutes and require no re-training when new assay information becomes available, an important feature when keeping quantitative structure-activity relationship (QSAR) models up to date to maintain their performance levels. Finally, as we show throughout this work, the performance characteristics of our vNN-based models are comparable, and often superior, to those of other more elaborate model constructs.</p>
<p>We have developed a publicly available vNN website (<ext-link ext-link-type="uri" xlink:href="https://vnnadmet.bhsai.org/">https://vnnadmet.bhsai.org/</ext-link>). This website provides users with ADMET prediction models that we have developed, as well as a platform for using their own experimental data to update these models or build new ones from scratch. Although we use the vNN method here for predicting ADMET properties, the vNN website can be used to build a variety of classification or regression models.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and methods</title>
<sec>
<title>The vNN method</title>
<p>The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, <xref ref-type="bibr" rid="B50">2000</xref>). This method rests on the premise that compounds with similar structures have similar activities. The simplest form of the k-NN method takes the average property values of the k nearest neighbors as the predicted value. However, because structurally similar compounds tend to show similar biological activity, it is reasonable to weight the contributions of neighbors so that closer neighbors contribute more to the predicted value. One notable feature of the k-NN method is that it always gives a prediction for a compound, based on a constant number, k, of nearest neighbors no matter how structurally dissimilar they are from the compound. An alternative approach is to use a predetermined similarity criterion. We developed the aforementioned vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model&#x00027;s applicability domain (Liu et al., <xref ref-type="bibr" rid="B29">2012</xref>, <xref ref-type="bibr" rid="B28">2015</xref>; Liu and Wallqvist, <xref ref-type="bibr" rid="B30">2014</xref>). When no nearest neighbor meets the criterion, the vNN method makes no prediction.</p>
<p>One of the most widely used measures of the similarity distance between two small molecules is the Tanimoto distance, <italic>d</italic>, which is defined as:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mrow><mml:mi>d</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mi>n</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>P</mml:mi><mml:mo>&#x02229;</mml:mo><mml:mi>Q</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>P</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>Q</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>n</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>P</mml:mi><mml:mo>&#x02229;</mml:mo><mml:mi>Q</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <italic>P</italic> and <italic>Q</italic> are the feature sets of molecules <italic>p</italic> and <italic>q</italic>, <italic>n</italic>(<italic>P</italic> &#x02229; <italic>Q</italic>) is the number of features common to both molecules, and <italic>n</italic>(<italic>P</italic>) and <italic>n</italic>(<italic>Q</italic>) are the total numbers of features for molecules <italic>p</italic> and <italic>q</italic>, respectively. The features used to calculate molecular similarity are often based on atom type (connectivity and chemical properties), such as element, charge, donor, acceptor, and aromatic, but they can also be based on holistic molecular properties, such as molecular weight and partition coefficient (LogP). The predicted biological activity <italic>y</italic> is then given by a weighted average across structurally similar neighbors:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mrow><mml:mi>y</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>&#x003BD;</mml:mi></mml:msubsup><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mi>h</mml:mi></mml:mfrac><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msup></mml:mrow></mml:mstyle></mml:mrow><mml:mrow><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>&#x003BD;</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mi>h</mml:mi></mml:mfrac><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msup></mml:mrow></mml:mstyle></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02264;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mi>d</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow></mml:math></disp-formula>
<p>where <italic>d</italic><sub><italic>i</italic></sub> denotes the Tanimoto distance between a query molecule for which a prediction is made and a molecule <italic>i</italic> of the training set; <italic>y</italic><sub><italic>i</italic></sub> is the experimentally measured activity of molecule <italic>i</italic>; <italic>h</italic> is a smoothing factor, which dampens the distance penalty; <italic>d</italic><sub>0</sub> is a Tanimoto-distance threshold, beyond which two molecules are no longer considered to be sufficiently similar to be included in the average; and <italic>&#x003BD;</italic> denotes the total number of molecules in the training set that satisfy the condition <italic>d</italic><sub><italic>i</italic></sub> &#x02264; <italic>d</italic><sub>0</sub>. The values of <italic>h</italic> and <italic>d</italic><sub>0</sub> are determined from cross-validation studies.</p>
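<p>As a minimal sketch of Equations 1 and 2, assuming each molecule is represented by a plain Python set of feature identifiers (a stand-in for fingerprint features), the prediction could be written as follows. The function names and toy data are hypothetical and are not taken from the vNN web server. Shifting the exponents by their minimum is a numerical-stability choice made for this sketch; it leaves the weighted average of Equation 2 unchanged.</p>
<preformat>
```python
import math


def tanimoto_distance(p, q):
    """Equation 1 for two feature sets: 1 - n(P and Q) / (n(P) + n(Q) - n(P and Q))."""
    p, q = set(p), set(q)
    common = len(p.intersection(q))
    union = len(p) + len(q) - common
    if union == 0:
        return 0.0  # convention assumed here: two empty feature sets count as identical
    return 1.0 - common / union


def vnn_predict(query, training, d0, h):
    """Equation 2: weighted average over training molecules within distance d0.

    training is a list of (feature_set, activity) pairs.
    Returns None when no training molecule qualifies (no prediction is made).
    """
    neighbors = []
    for features, activity in training:
        d = tanimoto_distance(query, features)
        if d > d0:
            continue  # outside the similarity threshold
        neighbors.append(((d / h) ** 2, activity))
    if not neighbors:
        return None
    # Shift exponents by their minimum so the largest weight is exactly 1.
    shift = min(exponent for exponent, _ in neighbors)
    weights = [math.exp(-(exponent - shift)) for exponent, _ in neighbors]
    weighted_sum = sum(w * activity for w, (_, activity) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


# Hypothetical toy data: integer feature identifiers with binary activities.
training_set = [({1, 2, 3, 4}, 1.0), ({1, 2, 5, 6}, 0.0), ({7, 8, 9}, 1.0)]
query = {1, 2, 3}
print(vnn_predict(query, training_set, d0=0.7, h=0.5))
print(vnn_predict(query, training_set, d0=0.1, h=0.5))  # None: no neighbor within d0
```
</preformat>
<p>In this toy example, the second call returns no prediction because the closest training molecule lies at a Tanimoto distance of 0.25, which exceeds the threshold of 0.1.</p>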
<p>To identify structurally similar compounds, we used Accelrys extended-connectivity fingerprints with a diameter of four chemical bonds (ECFP4) (Rogers and Hahn, <xref ref-type="bibr" rid="B38">2010</xref>). We chose ECFP4 fingerprints for the vNN website because they have previously been reported to show satisfactory overall performance in retrieving the active compounds of diverse datasets (Hert et al., <xref ref-type="bibr" rid="B24">2004</xref>; Duan et al., <xref ref-type="bibr" rid="B20">2010</xref>; Schyman et al., <xref ref-type="bibr" rid="B40">2016</xref>). We emphasize that the optimal values of <italic>h</italic> and <italic>d</italic><sub>0</sub> are specific to each combination of fingerprint type and training set, and need to be optimized for each.</p>
</sec>
<sec>
<title>Model validation</title>
<p>We used 10-fold cross-validation (CV) to validate each model and determine the values of <italic>h</italic> and <italic>d</italic><sub>0</sub>. We randomly divided the data into 10 sets, 9 of which we used to develop the model and the 10th to validate it. We repeated this process 10 times, leaving each set of molecules out once. In the next section, we report averages of the 10-fold CV as the performance measures.</p>
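<p>The splitting step can be illustrated with a short sketch. It assumes only that molecules are addressed by integer index; the helper name, the fixed random seed, and the round-robin assignment of shuffled indices to folds are illustrative choices, not details taken from the vNN implementation.</p>
<preformat>
```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists so that each item is held out exactly once."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)  # random division of the data
    folds = [order[i::k] for i in range(k)]  # k roughly equal-sized sets
    for held_out in range(k):
        test = folds[held_out]
        train = [
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        ]
        yield train, test


splits = list(k_fold_indices(25))
print(len(splits))  # one split per fold
```
</preformat>
<p>Performance measures would then be computed on each held-out set and averaged across the folds.</p>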
</sec>
<sec>
<title>Performance measures</title>
<p>We used the following metrics to assess the quality of the classification models:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mrow><mml:mtext>sensitivity</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext>TP</mml:mtext></mml:mrow><mml:mrow><mml:mtext>TP&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;FN</mml:mtext></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mrow><mml:mtext>specificity</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext>TN</mml:mtext></mml:mrow><mml:mrow><mml:mtext>FP&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;TN</mml:mtext></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mrow><mml:mtext>accuracy</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext>TP&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;TN</mml:mtext></mml:mrow><mml:mrow><mml:mtext>TP&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;TN&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;FP&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;FN</mml:mtext></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="E6"><label>(6)</label><mml:math id="M6"><mml:mrow><mml:mtext>kappa</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext>accuracy&#x000A0;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>Pr</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>e</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>Pr</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>e</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<p>where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The metric kappa assesses the quality of binary classifiers (Dunn and Everitt, <xref ref-type="bibr" rid="B21">1995</xref>). Pr(<italic>e</italic>) is an estimate of the probability of a correct prediction by chance. It is calculated as:</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M7"><mml:mrow><mml:mi>Pr</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mtext>e</mml:mtext><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>TP&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;FN</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>TP&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;FP</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>FP&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;TN</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mtext>TN&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;FN</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mtext>TP&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;FN&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;FP&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;TN</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<p>The sensitivity measures a model&#x00027;s ability to correctly detect true positives, whereas the specificity measures its ability to detect true negatives. Kappa compares the probability of correct predictions to the probability of correct predictions by chance. Its value ranges from &#x0002B;1 (perfect agreement between model prediction and experiment) to &#x02212;1 (complete disagreement), with 0 indicating no agreement beyond that expected by chance.</p>
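<p>For readers who wish to reproduce these metrics, Equations 3 to 7 can be sketched directly from the four confusion-matrix counts. The function name and the example counts are hypothetical, and the sketch assumes that every denominator is nonzero.</p>
<preformat>
```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Equations 3 to 7 from confusion-matrix counts.

    Assumes nonzero denominators (at least one positive, one negative,
    and a chance agreement Pr(e) below 1).
    """
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    accuracy = (tp + tn) / total
    # Equation 7: probability of a correct prediction by chance.
    pr_e = ((tp + fn) * (tp + fp) + (fp + tn) * (tn + fn)) / total ** 2
    kappa = (accuracy - pr_e) / (1 - pr_e)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```
</preformat>
<p>For these illustrative counts, the sensitivity is 0.80, the specificity is 0.90, the accuracy is 0.85, Pr(<italic>e</italic>) is 0.50, and kappa is therefore approximately 0.70.</p>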
<p>The performance measure for regression models is given by the Pearson&#x00027;s correlation coefficient (Adler and Parmryd, <xref ref-type="bibr" rid="B2">2010</xref>):</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M8"><mml:mrow><mml:mi>R</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle></mml:mrow></mml:msqrt><mml:msqrt><mml:mrow><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mover 
accent='true'><mml:mi>y</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<p>where <italic>n</italic> is the sample size, <italic>x</italic><sub><italic>i</italic></sub> and <italic>y</italic><sub><italic>i</italic></sub> are the paired sample values, and <inline-formula><mml:math id="M9"><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> and <inline-formula><mml:math id="M10"><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> are the sample means. The correlation coefficient provides a measure of the interrelatedness of numeric properties. Its value ranges from &#x02212;1 (highly anticorrelated) to &#x0002B;1 (highly correlated), and is 0 when the two variables are uncorrelated.</p>
<p>We also calculated the coverage, which we define as the proportion of test molecules with at least one nearest neighbor that meets the similarity criterion. We make no prediction for test molecules that have no neighbor meeting the criterion. The coverage is therefore a measure of the size of the applicability domain of a prediction model.</p>
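<p>Equation 8 translates directly into code. The following is a minimal sketch with a hypothetical function name; it raises an error for empty or mismatched inputs and for constant inputs, for which the coefficient is undefined.</p>
<preformat>
```python
import math


def pearson_r(xs, ys):
    """Equation 8: Pearson correlation coefficient of paired samples."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)


print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # perfectly correlated toy data
```
</preformat>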
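<p>Under this definition, the coverage depends only on the distance from each test molecule to its closest training molecule. A minimal sketch, assuming those nearest-neighbor distances have already been computed (the function name and the distances are hypothetical):</p>
<preformat>
```python
def coverage(nearest_distances, d0):
    """Fraction of test molecules whose closest training molecule lies within d0."""
    if not nearest_distances:
        raise ValueError("at least one test molecule is required")
    covered = sum(1 for d in nearest_distances if d0 >= d)
    return covered / len(nearest_distances)


# Hypothetical nearest-neighbor Tanimoto distances for four test molecules.
print(coverage([0.20, 0.50, 0.90, 0.65], d0=0.6))  # two of four are covered
```
</preformat>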
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec>
<title>The vNN platform</title>
<p>The main purpose of the vNN-based platform is to provide users with a tool to make ADMET predictions and a user-friendly environment to build new models. Hence, the platform offers users two main capabilities that are accessible from the main webpage (<ext-link ext-link-type="uri" xlink:href="https://vnnadmet.bhsai.org/">https://vnnadmet.bhsai.org/</ext-link>) (Figure <xref ref-type="fig" rid="F1">1</xref>): (1) to run prebuilt ADMET models and (2) to build and run customized models.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>The vNN-ADMET main page. From this page, users can run ADMET models or build their own models.</p></caption>
<graphic xlink:href="fphar-08-00889-g0001.tif"/>
</fig>
<p>To use prebuilt ADMET models, users need only provide one or more query molecules as the input (Figure <xref ref-type="fig" rid="F2">2</xref>). They can do this by drawing the molecule, entering the molecular SMILES string (Weininger, <xref ref-type="bibr" rid="B47">1988</xref>) directly on the website, or uploading a text file (csv or txt format) with query molecules in SMILES format. The text file should contain column headers labeled NAME and SMILES. Once users upload the query molecules, they can submit the job. The application will then automatically run all ADMET prediction models. The output will be displayed once all predictions are completed, and a temporary link to the result page will be sent to the user&#x00027;s e-mail address. The results can be downloaded as a table to the user&#x00027;s computer (Figure <xref ref-type="fig" rid="F3">3</xref>). By default, the user will see the ADMET results for our models, which use a restricted applicability domain. However, there is an option to include the results for the remaining compounds, using our unrestricted applicability domain models. The time required to run 100 query compounds is &#x0007E;5 min on the server. However, this may vary depending on the size of the molecules and whether or not the job has been queued.</p>
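<p>A query file of this kind could be prepared with a few lines of code. The sketch below writes the NAME and SMILES headers described above; the comma delimiter, the column order, the function name, and the example molecules are assumptions made for illustration, so the exact layout accepted by the server should be checked on the website.</p>
<preformat>
```python
import csv
import io


def build_query_csv(molecules):
    """Return CSV text with NAME and SMILES header columns.

    molecules is an iterable of (name, smiles) pairs.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["NAME", "SMILES"])
    for name, smiles in molecules:
        writer.writerow([name, smiles])
    return buffer.getvalue()


# Illustrative query molecules.
query_text = build_query_csv([
    ("aspirin", "CC(=O)OC1=CC=CC=C1C(=O)O"),
    ("ethanol", "CCO"),
])
print(query_text)
```
</preformat>
<p>The returned text can be saved to a csv or txt file before uploading.</p>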
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Submit ADMET predictions. On the <italic>Run ADMET Models</italic> page <bold>(top)</bold> users can upload a list of query compounds in SMILES format <bold>(lower left)</bold> or manually enter compounds by using the draw structure feature <bold>(lower right)</bold>.</p></caption>
<graphic xlink:href="fphar-08-00889-g0002.tif"/>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>The ADMET predictions result page. The 15 ADMET predictions for each query molecule are presented on a separate row. Predictions based on models using a restricted applicability domain are shown in solid colors and those based on models using an unrestricted applicability domain are shown in striped colors. Users can download the results from the website into a single file.</p></caption>
<graphic xlink:href="fphar-08-00889-g0003.tif"/>
</fig>
<p>Users can build their own models by selecting either <italic>Build Classification Model</italic> or <italic>Build Regression Model</italic> on the main webpage (Figure <xref ref-type="fig" rid="F1">1</xref>). On the <italic>Build Classification Model</italic> page (Figure <xref ref-type="fig" rid="F4">4</xref>), users are asked to upload a list of molecules in SMILES format and the property of interest, with column headers labeled NAME, SMILES, and PROPERTY. The value of the property should be 1 or 0 for classification models and a real number for regression models. The vNN platform will then automatically run 10-fold CV by varying the Tanimoto-distance threshold (<italic>d</italic><sub>0</sub>) from 0.1 to 1.0 in increments of 0.1, and the smoothing factor (<italic>h</italic>) from 0.1 to 1.0 at each value of <italic>d</italic><sub>0</sub>. Once the calculations are completed, a temporary link to the result page will be sent to the user&#x00027;s e-mail address. The results will be displayed on an interactive webpage where users can select the values for <italic>d</italic><sub>0</sub> and <italic>h</italic> (Equation 2), depending on the optimal performance measures and coverage (Figure <xref ref-type="fig" rid="F4">4</xref>). The time required to build a model with a dataset of 1,000 compounds is &#x0007E;10 min.</p>
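<p>The parameter scan described above can be pictured as a loop over a two-dimensional grid. The sketch below is illustrative only: the function names are hypothetical, the 0.1 step for the smoothing factor is an assumption (the text states the increment only for the Tanimoto distance), and the evaluate callback stands in for a full 10-fold CV run that returns an accuracy and a coverage.</p>
<preformat>
```python
def parameter_grid(start=0.1, stop=1.0, step=0.1):
    """Return all (d0, h) pairs on a regular grid; the step for h is assumed."""
    count = int(round((stop - start) / step)) + 1
    values = [round(start + i * step, 10) for i in range(count)]
    return [(d0, h) for d0 in values for h in values]


def scan(evaluate):
    """Tabulate (d0, h, accuracy, coverage) for every grid point.

    evaluate(d0, h) is a user-supplied callback that must return
    an (accuracy, coverage) pair, for example from 10-fold CV.
    """
    return [(d0, h) + tuple(evaluate(d0, h)) for d0, h in parameter_grid()]


def placeholder_evaluate(d0, h):
    """Placeholder callback returning made-up numbers, not real model results."""
    return (1.0 - 0.2 * d0, d0)


table = scan(placeholder_evaluate)
print(len(table), table[0])
```
</preformat>
<p>Choosing a grid point then involves the same trade-off that the interactive results page exposes: a larger distance threshold typically covers more compounds at the cost of some accuracy.</p>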
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Build a classification model. On the <italic>Build Classification Model</italic> page <bold>(top)</bold>, users can upload their training data and/or draw structures. On the <italic>Build Classification Model Results</italic> page <bold>(bottom)</bold>, users can interactively select/deselect different smoothing factors for comparison. The graph shows the accuracy in 10-fold cross-validation at different Tanimoto distances, where smoothing factors 0.2 and 1.0 are highlighted in green and blue, respectively (strikethrough smoothing factors indicate deselected values). The coverage is shown in gray. The red circle indicates the &#x0201C;best&#x0201D; model performance based on accuracy and coverage, where the black arrows show the corresponding Tanimoto-distance threshold (<italic>d</italic><sub>0</sub> &#x0003D; 0.7) and smoothing factor (<italic>h</italic> &#x0003D; 0.2). Although the accuracy decreases from 90% at <italic>d</italic><sub>0</sub> &#x0003D; 0.6 to 88% at <italic>d</italic><sub>0</sub> &#x0003D; 0.7, the proportion of compounds predicted increases from 60 to 75%, which may be worth the loss in accuracy.</p></caption>
<graphic xlink:href="fphar-08-00889-g0004.tif"/>
</fig>
<p>Users can then select the <italic>Run Custom Model</italic> option to predict the activity of new test molecules (Figure <xref ref-type="fig" rid="F5">5</xref>). To do so, they enter the previously selected values for the <italic>Tanimoto Distance</italic> and <italic>Smoothing Factor</italic>, upload the same molecules used to train the model in the <italic>Upload Compounds with Property</italic> data field, and add the new query molecule(s) in SMILES format in the <italic>Upload Query Compounds</italic> field. The result will be displayed on a new webpage, and a temporary link to that page will also be sent to the user&#x00027;s e-mail address (Figure <xref ref-type="fig" rid="F5">5</xref>).</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Run a customized model. The first step to run a customized model is to upload the training dataset, as well as the selected Tanimoto distance and smoothing factor from Figure <xref ref-type="fig" rid="F4">4</xref>. The second step is to upload query compounds. The results can be downloaded from the <italic>Run Custom Model Results</italic> page (bottom).</p></caption>
<graphic xlink:href="fphar-08-00889-g0005.tif"/>
</fig>
</sec>
<sec>
<title>Available ADMET predictions</title>
<p>The available ADMET prediction models, including their performance measures for the restricted applicability domain model, are summarized in Table <xref ref-type="table" rid="T1">1</xref>. The performance measures for the models using an unrestricted applicability domain are presented in Table <xref ref-type="supplementary-material" rid="SM1">S1</xref> in the Supplementary Material and on our website (<ext-link ext-link-type="uri" xlink:href="https://vnnadmet.bhsai.org/">https://vnnadmet.bhsai.org/</ext-link>). The 15 models cover a diverse set of ADMET endpoints. We will briefly describe these models and their performance measures, as well as the sources from which we retrieved the data. All datasets are available in SMILES format on the vNN web server or in Structure Data Format (SDF) in the Supplementary Material (Datasheet <xref ref-type="supplementary-material" rid="SM2">1</xref>). Some of the models have already been published (Liu et al., <xref ref-type="bibr" rid="B29">2012</xref>, <xref ref-type="bibr" rid="B28">2015</xref>; Liu and Wallqvist, <xref ref-type="bibr" rid="B30">2014</xref>; Schyman et al., <xref ref-type="bibr" rid="B40">2016</xref>). We also present several new models here for the first time.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Performance measures of vNN models in 10-fold cross validation, using a restricted applicability domain.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center"><bold>Data<xref ref-type="table-fn" rid="TN1"><sup>a</sup></xref></bold></th>
<th valign="top" align="center"><bold><italic>d</italic><sub><italic>0</italic></sub><xref ref-type="table-fn" rid="TN2"><sup>b</sup></xref></bold></th>
<th valign="top" align="center"><bold><italic>h</italic><xref ref-type="table-fn" rid="TN3"><sup>c</sup></xref></bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>Sensitivity</bold></th>
<th valign="top" align="center"><bold>Specificity</bold></th>
<th valign="top" align="center"><bold>Kappa</bold></th>
<th valign="top" align="center"><bold><italic>R</italic><xref ref-type="table-fn" rid="TN4"><sup>d</sup></xref></bold></th>
<th valign="top" align="center"><bold>Coverage</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">DILI</td>
<td valign="top" align="center">1,427</td>
<td valign="top" align="center">0.60</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.71</td>
<td valign="top" align="center">0.70</td>
<td valign="top" align="center">0.73</td>
<td valign="top" align="center">0.42</td>
<td/>
<td valign="top" align="center">0.66</td>
</tr>
<tr>
<td valign="top" align="left">Cytotox (HepG2)</td>
<td valign="top" align="center">6,097</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.84</td>
<td valign="top" align="center">0.88</td>
<td valign="top" align="center">0.76</td>
<td valign="top" align="center">0.64</td>
<td/>
<td valign="top" align="center">0.89</td>
</tr>
<tr>
<td valign="top" align="left">HLM</td>
<td valign="top" align="center">3,219</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.81</td>
<td valign="top" align="center">0.72</td>
<td valign="top" align="center">0.87</td>
<td valign="top" align="center">0.59</td>
<td/>
<td valign="top" align="center">0.91</td>
</tr>
<tr>
<td valign="top" align="left">CYP 1A2</td>
<td valign="top" align="center">7,558</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.90</td>
<td valign="top" align="center">0.70</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">0.66</td>
<td/>
<td valign="top" align="center">0.75</td>
</tr>
<tr>
<td valign="top" align="left">CYP 2C9</td>
<td valign="top" align="center">8,072</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.91</td>
<td valign="top" align="center">0.55</td>
<td valign="top" align="center">0.96</td>
<td valign="top" align="center">0.54</td>
<td/>
<td valign="top" align="center">0.76</td>
</tr>
<tr>
<td valign="top" align="left">CYP 2C19</td>
<td valign="top" align="center">8,155</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.87</td>
<td valign="top" align="center">0.64</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">0.58</td>
<td/>
<td valign="top" align="center">0.76</td>
</tr>
<tr>
<td valign="top" align="left">CYP 3A4</td>
<td valign="top" align="center">10,373</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.88</td>
<td valign="top" align="center">0.76</td>
<td valign="top" align="center">0.92</td>
<td valign="top" align="center">0.68</td>
<td/>
<td valign="top" align="center">0.78</td>
</tr>
<tr>
<td valign="top" align="left">CYP 2D6</td>
<td valign="top" align="center">7,805</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.89</td>
<td valign="top" align="center">0.61</td>
<td valign="top" align="center">0.94</td>
<td valign="top" align="center">0.57</td>
<td/>
<td valign="top" align="center">0.75</td>
</tr>
<tr>
<td valign="top" align="left">BBB</td>
<td valign="top" align="center">353</td>
<td valign="top" align="center">0.60</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.90</td>
<td valign="top" align="center">0.94</td>
<td valign="top" align="center">0.86</td>
<td valign="top" align="center">0.80</td>
<td/>
<td valign="top" align="center">0.61</td>
</tr>
<tr>
<td valign="top" align="left">Pgp Substrate</td>
<td valign="top" align="center">822</td>
<td valign="top" align="center">0.60</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.79</td>
<td valign="top" align="center">0.80</td>
<td valign="top" align="center">0.79</td>
<td valign="top" align="center">0.58</td>
<td/>
<td valign="top" align="center">0.66</td>
</tr>
<tr>
<td valign="top" align="left">Pgp Inhibitor</td>
<td valign="top" align="center">2,304</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.85</td>
<td valign="top" align="center">0.91</td>
<td valign="top" align="center">0.73</td>
<td valign="top" align="center">0.66</td>
<td/>
<td valign="top" align="center">0.76</td>
</tr>
<tr>
<td valign="top" align="left">hERG</td>
<td valign="top" align="center">685</td>
<td valign="top" align="center">0.70</td>
<td valign="top" align="center">0.70</td>
<td valign="top" align="center">0.84</td>
<td valign="top" align="center">0.84</td>
<td valign="top" align="center">0.83</td>
<td valign="top" align="center">0.68</td>
<td/>
<td valign="top" align="center">0.80</td>
</tr>
<tr>
<td valign="top" align="left">MMP</td>
<td valign="top" align="center">6,261</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="center">0.89</td>
<td valign="top" align="center">0.64</td>
<td valign="top" align="center">0.94</td>
<td valign="top" align="center">0.61</td>
<td/>
<td valign="top" align="center">0.69</td>
</tr>
<tr>
<td valign="top" align="left">AMES</td>
<td valign="top" align="center">6,512</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="center">0.82</td>
<td valign="top" align="center">0.86</td>
<td valign="top" align="center">0.75</td>
<td valign="top" align="center">0.62</td>
<td/>
<td valign="top" align="center">0.79</td>
</tr>
<tr>
<td valign="top" align="left">MRTD<xref ref-type="table-fn" rid="TN5"><sup>e</sup></xref></td>
<td valign="top" align="center">1,184</td>
<td valign="top" align="center">0.60</td>
<td valign="top" align="center">0.20</td>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="center">0.79</td>
<td valign="top" align="center">0.69</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN1">
<label>a</label>
<p><italic>Number of compounds in the dataset;</italic></p></fn>
<fn id="TN2">
<label>b</label>
<p><italic>Tanimoto-distance threshold value;</italic></p></fn>
<fn id="TN3">
<label>c</label>
<p><italic>Smoothing factor;</italic></p></fn>
<fn id="TN4">
<label>d</label>
<p><italic>Pearson&#x00027;s correlation coefficient;</italic></p></fn>
<fn id="TN5">
<label>e</label>
<p><italic>Regression model</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
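<p>For readers who wish to reproduce measures of the kind listed in Table <xref ref-type="table" rid="T1">1</xref> for their own models, the following sketch derives them from confusion-matrix counts. It assumes the standard definitions, with Kappa taken as Cohen&#x00027;s kappa and coverage taken as the fraction of submitted compounds that receive a prediction. The function name and the counts are hypothetical and are not taken from any vNN dataset.</p>
<preformat>
```python
def classification_measures(tp: int, fn: int, tn: int, fp: int,
                            n_total: int) -> dict:
    """Standard measures from confusion-matrix counts.

    n_total is the number of compounds submitted, including those left
    unpredicted because they fall outside the applicability domain.
    """
    n = tp + fn + tn + fp  # compounds that received a prediction
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the row and column totals
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - chance) / (1 - chance)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "kappa": kappa,
            "coverage": n / n_total}


# Hypothetical counts: 100 predicted compounds out of 125 submitted
m = classification_measures(tp=40, fn=10, tn=45, fp=5, n_total=125)
print({name: round(value, 2) for name, value in m.items()})
```
</preformat>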
<sec>
<title>Blood-brain barrier</title>
<p>The blood-brain barrier (BBB) is a highly selective barrier that separates the circulating blood from the central nervous system (CNS) (Abbott et al., <xref ref-type="bibr" rid="B1">2006</xref>). It allows the passage of water and lipid-soluble molecules, as well as the selective transport of glucose and amino acids. The benefit of predicting BBB-permeable compounds is two-fold: (1) to identify toxicants that could harm the brain, and (2) to design drug molecules that can pass the BBB and reach their target in the CNS.</p>
<p>We developed a vNN-based BBB model, using 353 compounds whose BBB permeability values (log<italic>BB</italic>) were obtained from the literature (Muehlbacher et al., <xref ref-type="bibr" rid="B34">2011</xref>; Naef, <xref ref-type="bibr" rid="B36">2015</xref>). We classified compounds with log <italic>BB</italic> values of &#x0003C;&#x02212;0.3 and &#x0003E;&#x0002B;0.3 as BBB non-permeable and permeable, respectively. To calculate performance measures, we classified BBB permeable and BBB non-permeable compounds as positives and negatives, respectively.</p>
<p>The model predicted whether or not a given compound would pass the BBB, but only for compounds within the applicability domain defined by the training set. The performance measures in Table <xref ref-type="table" rid="T1">1</xref> were calculated from 10-fold CV. The model showed a high overall accuracy of 90% and a kappa value of 0.80, with a coverage of 61%. The size of the dataset limited the applicability domain of the model. However, if new data become available, they can easily be added to the model to increase the applicability domain.</p>
<p>The model performed on par with the best of the BBB models published thus far. Most of the latter models, which used small datasets, are global models applied to any molecule. However, all models have a finite applicability domain (Cherkasov et al., <xref ref-type="bibr" rid="B14">2014</xref>). Indeed, modeling BBB permeability is complicated because there are different possible routes across the barrier, via passive diffusion or protein transport, and no single model accounts for all factors associated with this property. Our vNN model only makes predictions for compounds that are structurally similar enough to the training set molecules to ensure that they have the same type of transport mechanism. Thus, our vNN method accounts for multiple transport routes.</p>
</sec>
<sec>
<title>MMP disruption (mitochondrial toxicity)</title>
<p>Given the fundamental role of mitochondria in cellular energetics and oxidative stress, mitochondrial dysfunction has been implicated in cancer, diabetes, neurodegenerative disorders, and cardiovascular diseases (Pieczenik and Neustadt, <xref ref-type="bibr" rid="B37">2007</xref>). Many pharmaceuticals and environmental toxicants cause mitochondrial dysfunction (Meyer et al., <xref ref-type="bibr" rid="B33">2013</xref>). Therefore, the ability to predict the impact of chemicals on mitochondrial function would be useful. However, predicting mitochondrial toxicants is complicated because mitochondrial dysfunction can result from impairing any of the following: (1) the electron transport chain (ETC), (2) the mitochondrial transport pathway, (3) fatty acid oxidation, (4) the citric acid cycle, (5) mtDNA replication, and (6) mitochondrial protein synthesis.</p>
<p>There are several common experimental techniques to measure mitochondrial function. We used the largest dataset of chemical-induced changes in mitochondrial membrane potential (MMP), based on the assumption that a compound that causes mitochondrial dysfunction is also likely to reduce the MMP. We developed a vNN-based MMP prediction model, using 6,261 compounds collected from a previous study that screened a library of 10,000 compounds (&#x0007E;8,300 unique chemicals) at 15 concentrations, each in triplicate, to measure changes in the MMP in HepG2 cells (Attene-Ramos et al., <xref ref-type="bibr" rid="B6">2015</xref>). The study found that 913 compounds decreased the MMP, whereas 5,395 compounds had no effect. We classified compounds that decreased the MMP as positives and those that did not affect the MMP as negatives.</p>
<p>Our MMP model predicted whether a given compound had the potential to affect the MMP and thereby cause mitochondrial dysfunction. It made predictions for compounds that were well represented in the applicability domain, but not for any other compound. The model showed a high overall accuracy of 89% and a kappa value of 0.61, with a coverage of 69% (Table <xref ref-type="table" rid="T1">1</xref>).</p>
</sec>
<sec>
<title>Cytotoxicity (HepG2)</title>
<p>Cytotoxicity is the degree to which a chemical causes damage to cells. Cytotoxicity assays are widely used to screen compounds for unwanted cell damage, and to identify compounds that could be used, for example, to kill cancer cells. As such, the ability to identify cytotoxic compounds is highly desirable.</p>
<p>We developed a cytotoxicity prediction model, using a training dataset of <italic>in vitro</italic> toxicity against HepG2 cells for 6,097 structurally diverse compounds, which we collected from the ChEMBL database (Bento et al., <xref ref-type="bibr" rid="B7">2014</xref>). In developing our model, we considered compounds with an IC<sub>50</sub> of 10 &#x003BC;M or less in the <italic>in vitro</italic> assay as cytotoxic. We classified cytotoxic compounds as positives and non-toxic compounds as negatives.</p>
<p>The cytotoxicity model performed well, with an overall accuracy of 84% and a kappa value of 0.64 (Table <xref ref-type="table" rid="T1">1</xref>). Because compounds in the dataset achieved only sparse coverage of the chemical space, the model only predicted compounds that were well represented in the dataset. It did not give predictions for other compounds, and thereby avoided misleading results. When using 10-fold CV, the model reliably predicted 89% of the compounds in our dataset.</p>
</sec>
<sec>
<title>Drug-induced liver injury</title>
<p>Over the last 50 years, drug-induced liver injury (DILI) has been the most commonly cited reason for drug withdrawals from the market (Assis and Navarro, <xref ref-type="bibr" rid="B5">2009</xref>). As a result, current drug development efforts are devoted to identifying and eliminating potential DILI compounds. Therefore, a model that predicts at an early stage whether a compound causes liver injury would be highly desirable. However, the mechanisms of DILI are complicated and diverse, making toxicology studies difficult. For example, compounds that cause DILI in humans do not necessarily induce clear liver injury in animal studies.</p>
<p>We collected DILI data from four sources used by Xu et al. (<xref ref-type="bibr" rid="B49">2015</xref>): (1) the U.S. FDA&#x00027;s National Center for Toxicological Research (NCTR dataset) (Chen M. et al., <xref ref-type="bibr" rid="B12">2011</xref>), as well as the datasets of (2) Greene (Greene et al., <xref ref-type="bibr" rid="B22">2010</xref>), (3) Xu (Xu et al., <xref ref-type="bibr" rid="B48">2008</xref>), and (4) Liew (Liew et al., <xref ref-type="bibr" rid="B27">2011</xref>). In the first three datasets, which included pharmaceuticals, we classified a compound as causing DILI if it was associated with a high risk of DILI and not if there was no such risk. We excluded low-risk DILI compounds. In the Liew dataset, which contained both pharmaceuticals and non-pharmaceuticals, we classified a compound as causing DILI if it was associated with any adverse liver effect. DILI-associated compounds were classified as positives and non-DILI compounds as negatives.</p>
<p>The performance measures of the vNN model, using 10-fold CV of the entire dataset excluding duplicated compounds, showed an overall accuracy of 71% and a coverage of 66% (Table <xref ref-type="table" rid="T1">1</xref>). We also used the same datasets and compared our models with some previously published deep learning models (Xu et al., <xref ref-type="bibr" rid="B49">2015</xref>; Table <xref ref-type="table" rid="T2">2</xref>). Considering the complexity and computational time investment involved in training these deep learning models, our vNN models performed relatively well; they performed on par with the deep learning models, albeit with a coverage ranging from 40 to 67%.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Performance measures of vNN DILI models compared with deep learning.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>NCTR<xref ref-type="table-fn" rid="TN6"><sup>a</sup></xref></bold></th>
<th valign="top" align="center"><bold>NCTR<xref ref-type="table-fn" rid="TN6"><sup>a</sup></xref></bold></th>
<th valign="top" align="center"><bold>Greene<xref ref-type="table-fn" rid="TN6"><sup>a</sup></xref></bold></th>
<th valign="top" align="center"><bold>Xu<xref ref-type="table-fn" rid="TN6"><sup>a</sup></xref></bold></th>
<th valign="top" align="center"><bold>Combined<xref ref-type="table-fn" rid="TN6"><sup>a</sup></xref></bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>10-fold CV</bold></th>
<th valign="top" align="center"><bold>Test</bold></th>
<th valign="top" align="center"><bold>Test</bold></th>
<th valign="top" align="center"><bold>Test</bold></th>
<th valign="top" align="center"><bold>10-fold CV</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Accuracy (%)</td>
<td valign="top" align="center">87 (81)</td>
<td valign="top" align="center">75 (70)</td>
<td valign="top" align="center">61 (65)</td>
<td valign="top" align="center">60 (62)</td>
<td valign="top" align="center">85 (85)<xref ref-type="table-fn" rid="TN7"><sup>b</sup></xref></td>
</tr>
<tr>
<td valign="top" align="left">Sensitivity (%)</td>
<td valign="top" align="center">65 (70)</td>
<td valign="top" align="center">64 (80)</td>
<td valign="top" align="center">51 (75)</td>
<td valign="top" align="center">52 (62)</td>
<td valign="top" align="center">83 (84)<xref ref-type="table-fn" rid="TN7"><sup>b</sup></xref></td>
</tr>
<tr>
<td valign="top" align="left">Specificity (%)</td>
<td valign="top" align="center">95 (88)</td>
<td valign="top" align="center">86 (60)</td>
<td valign="top" align="center">75 (46)</td>
<td valign="top" align="center">70 (62)</td>
<td valign="top" align="center">88 (85)<xref ref-type="table-fn" rid="TN7"><sup>b</sup></xref></td>
</tr>
<tr>
<td valign="top" align="left">Coverage (%)</td>
<td valign="top" align="center">40</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">46</td>
<td valign="top" align="center">41</td>
<td valign="top" align="center">67</td>
</tr>
<tr>
<td valign="top" align="left">No of Compounds</td>
<td valign="top" align="center">190</td>
<td valign="top" align="center">185</td>
<td valign="top" align="center">320</td>
<td valign="top" align="center">236</td>
<td valign="top" align="center">475</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN6">
<label>a</label>
<p><italic>Values in parentheses are the deep learning results from Xu et al. (<xref ref-type="bibr" rid="B49">2015</xref>)</italic>.</p></fn>
<fn id="TN7">
<label>b</label>
<p><italic>Values averaged over 60 runs of 10-fold CV</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>Cytochrome P450 inhibition (drug-drug interaction)</title>
<p>Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., <xref ref-type="bibr" rid="B10">2008</xref>). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose&#x02014;an effect known as a drug-drug interaction (Murray, <xref ref-type="bibr" rid="B35">2006</xref>). In drug development, <italic>in vitro</italic> assays are routinely used to assess interactions between drug candidates and CYPs. However, there is a need for <italic>in silico</italic> models that assess potential interactions with CYPs in the early stages of drug development.</p>
<p>We collected data for five main drug-metabolizing CYPs: 1A2, 2D6, 2C9, 2C19, and 3A4. We retrieved CYP inhibitors from ChEMBL (Bento et al., <xref ref-type="bibr" rid="B7">2014</xref>) and classified them as inhibitors if the IC<sub>50</sub> was below 10 &#x003BC;M. We removed from the dataset any duplicates or compounds tested multiple times with contradicting results, in which the reported IC<sub>50</sub> values were below and above the 10 &#x003BC;M threshold value. For all CYPs, we classified inhibitors and non-inhibitors as positives and negatives, respectively.</p>
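<p>The curation rule described above can be sketched as follows. The record layout, the function name <italic>curate</italic>, and the example values are hypothetical, and the treatment of a measurement of exactly 10 &#x003BC;M as a non-inhibitor is an assumption of this sketch rather than a documented choice.</p>
<preformat>
```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

THRESHOLD_UM = 10.0  # IC50 cutoff in micromolar, as stated in the text


def curate(records: Iterable[Tuple[str, float]]) -> Dict[str, int]:
    """Map each compound to 1 (inhibitor) or 0 (non-inhibitor).

    records holds (compound_id, ic50_um) pairs; a compound may repeat.
    Compounds whose measurements fall on both sides of the cutoff are
    dropped; repeated consistent measurements collapse to one entry.
    """
    calls = defaultdict(set)
    for compound_id, ic50_um in records:
        calls[compound_id].add(1 if THRESHOLD_UM > ic50_um else 0)
    return {cid: labels.pop() for cid, labels in calls.items()
            if len(labels) == 1}


# Hypothetical records: C2 is contradictory and is removed
data = [("C1", 2.5), ("C1", 4.0), ("C2", 3.0), ("C2", 25.0), ("C3", 50.0)]
print(curate(data))  # {'C1': 1, 'C3': 0}
```
</preformat>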
<p>The performance measures for the five CYP models are presented in Table <xref ref-type="table" rid="T1">1</xref>. All models achieved high accuracy (87&#x02013;91%) and kappa values (0.54&#x02013;0.68) while maintaining high coverage (75&#x02013;78%).</p>
</sec>
<sec>
<title>hERG blockers</title>
<p>The human ether-&#x000E0;-go-go-related gene (hERG) codes for a potassium ion channel involved in the normal cardiac repolarization activity of the heart (Sanguinetti and Tristani-Firouzi, <xref ref-type="bibr" rid="B39">2006</xref>). Drug-induced blockade of hERG function can cause long QT syndrome, which may result in arrhythmia and death (De Ponti et al., <xref ref-type="bibr" rid="B18">2001</xref>). For this reason, hERG liability is one of the toxicology screens that drug candidates must pass during early pre-clinical studies. Therefore, <italic>in silico</italic> models that identify hERG blockers in the early stages of drug design are of considerable interest.</p>
<p>We retrieved 282 known hERG blockers from the literature and classified compounds with an IC<sub>50</sub> of 10 &#x003BC;M or less as blockers (Wang et al., <xref ref-type="bibr" rid="B44">2012</xref>). We also collected a set of 404 compounds with IC<sub>50</sub> values &#x0003E;10 &#x003BC;M from ChEMBL (Bento et al., <xref ref-type="bibr" rid="B7">2014</xref>) and classified them as non-blockers (Czodrowski, <xref ref-type="bibr" rid="B16">2013</xref>). We classified hERG blockers and non-blockers as positives and negatives, respectively.</p>
<p>The hERG model performed with an overall accuracy of 84%, well-balanced sensitivity and specificity values (84 and 83%, respectively), and a kappa value of 0.68 (Table <xref ref-type="table" rid="T1">1</xref>). The model reliably predicted 80% of the compounds in our dataset when using 10-fold CV. However, the coverage of chemical space by the non-hERG blockers in the dataset was sparse, and only compounds well represented in the dataset were predicted with confidence. Because the model did not give predictions for other compounds, it avoided misleading results. Therefore, users should use this model to flag potential hERG blockers rather than to identify non-hERG blockers.</p>
</sec>
<sec>
<title>Pgp substrates and inhibitors</title>
<p>P-glycoprotein (Pgp) is an essential cell membrane protein that extracts many foreign substances from the cell (Ambudkar et al., <xref ref-type="bibr" rid="B3">2003</xref>). As such, it is a critical determinant of the pharmacokinetic properties of drugs. Cancer cells often overexpress Pgp, which increases the efflux of chemotherapeutic agents from the cell and prevents treatment by reducing the effective intracellular concentrations of such agents&#x02014;a phenomenon known as multidrug resistance (Borst and Elferink, <xref ref-type="bibr" rid="B8">2002</xref>). For this reason, identifying compounds that can either be transported out of the cell by Pgp (substrates) or impair Pgp function (inhibitors) is of great interest. Therefore, using the vNN method, we developed models to predict both Pgp substrates and Pgp inhibitors.</p>
<p>The Pgp substrate dataset was collected by Hou and co-workers (Li et al., <xref ref-type="bibr" rid="B26">2014</xref>). This dataset included measurements for 422 substrates and 400 non-substrates. To generate a large Pgp inhibitor dataset, we combined two datasets (Broccatelli et al., <xref ref-type="bibr" rid="B9">2011</xref>; Chen L. et al., <xref ref-type="bibr" rid="B11">2011</xref>), and removed duplicates to form a combined dataset consisting of a training set of 1,319 inhibitors and 937 non-inhibitors. We classified the Pgp inhibitors (substrates) and non-inhibitors (non-substrates) as positives and negatives, respectively.</p>
<p>The vNN models for identifying Pgp substrates and inhibitors gave accurate and reliable results, showing overall accuracies of 79 and 85%, respectively, when using 10-fold CV, with corresponding kappa values of 0.58 and 0.66. The models gave predictions for 66 and 76% of the compounds in the substrate and inhibitor datasets, respectively (Table <xref ref-type="table" rid="T1">1</xref>). The performance characteristics of these models were comparable, or at times superior, to those of other model constructs (Schyman et al., <xref ref-type="bibr" rid="B40">2016</xref>).</p>
</sec>
<sec>
<title>Chemical mutagenicity (AMES test)</title>
<p>Mutagens are chemicals that cause abnormal genetic mutations, which can lead to cancer. A common way to assess a chemical&#x00027;s mutagenicity is the Ames test (Ames et al., <xref ref-type="bibr" rid="B4">1973</xref>). This test has become the standard for assessing the safety of chemicals and drugs, and has been used to test thousands of molecules. We examined whether the vNN method could effectively use existing data to predict mutagenicity.</p>
<p>We retrieved an Ames mutagenicity dataset consisting of 6,512 compounds, of which 3,503 were Ames-positive (Hansen et al., <xref ref-type="bibr" rid="B23">2009</xref>), and developed a vNN Ames mutagenicity prediction model. The model performed well, with an overall accuracy of 82%; sensitivity and specificity values of 86 and 75%, respectively; and a high kappa value of 0.62 (Table <xref ref-type="table" rid="T1">1</xref>). The model also reliably predicted 79% of the compounds in the Ames dataset when using 10-fold CV. Further details of the model and its prediction performance can be found elsewhere (Liu and Wallqvist, <xref ref-type="bibr" rid="B30">2014</xref>).</p>
</sec>
<sec>
<title>Maximum recommended therapeutic dose</title>
<p>A basic principle of toxicology is that &#x0201C;the dose makes the poison.&#x0201D; For most drugs, the therapeutic dose is limited by toxicity, and the maximum recommended therapeutic dose (MRTD) is an estimated upper daily dose that is safe (Contrera et al., <xref ref-type="bibr" rid="B15">2004</xref>). Investigators carry out toxicological experiments on animals to determine the toxic effects of a drug and the initial dose for human clinical trials. Unfortunately, there is a lack of correlation between animal and human toxicity data. Therefore, we investigated whether the vNN method could predict the MRTD values of new compounds based on known human MRTD data. If so, the values could be used to estimate the starting dose in phase I clinical trials, while significantly reducing the number of animals used in preliminary toxicology studies.</p>
<p>We obtained a dataset of MRTD values publicly disclosed by the FDA, mostly of single-day oral doses for an average adult with a body weight of 60 kg, for 1,220 compounds (most of which are small organic drugs). For modeling purposes, we converted the MRTD unit from mg/kg-body weight/day to mol/kg-body weight/day via the molecular weight of the compound. However, the predicted values on the website are reported in mg/day based upon an average adult weighing 60 kg. We excluded organometallics, high-molecular weight polymers (&#x0003E;5,000 Da), nonorganic chemicals, mixtures of chemicals, and very small molecules (&#x0003C;100 Da). We used an external test set of 160 compounds, which was collected by the FDA for validation. The total dataset for our model contained 1,184 compounds (Liu et al., <xref ref-type="bibr" rid="B29">2012</xref>).</p>
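<p>The two unit conversions mentioned above amount to the following arithmetic. The function names and the example compound are hypothetical; only the 60-kg body weight and the units come from the text.</p>
<preformat>
```python
BODY_WEIGHT_KG = 60.0  # average adult body weight used in the text


def mg_per_kg_to_mol_per_kg(dose_mg_per_kg_day: float,
                            mol_weight_g_per_mol: float) -> float:
    """Convert mg/kg-body weight/day to mol/kg-body weight/day."""
    grams_per_kg_day = dose_mg_per_kg_day / 1000.0
    return grams_per_kg_day / mol_weight_g_per_mol


def mol_per_kg_to_mg_per_day(dose_mol_per_kg_day: float,
                             mol_weight_g_per_mol: float) -> float:
    """Convert mol/kg-body weight/day to mg/day for a 60-kg adult."""
    mg_per_kg_day = dose_mol_per_kg_day * mol_weight_g_per_mol * 1000.0
    return mg_per_kg_day * BODY_WEIGHT_KG


# Hypothetical compound: 300 g/mol, dosed at 5 mg/kg-body weight/day
molar = mg_per_kg_to_mol_per_kg(5.0, 300.0)
print(molar)                                   # mol/kg-body weight/day
print(mol_per_kg_to_mg_per_day(molar, 300.0))  # mg/day for a 60-kg adult
```
</preformat>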
<p>The MRTD model reliably predicted 69% of the FDA MRTD dataset, with a Pearson&#x00027;s correlation coefficient (<italic>R</italic>) of 0.79 between the predicted and measured <italic>log</italic>(MRTD) values, and a mean deviation (mDev) of 0.56 <italic>log</italic> units, using 40-fold CV (Liu et al., <xref ref-type="bibr" rid="B29">2012</xref>). For comparison, we used two popular QSAR regression methods, partial least squares (PLS) and support vector machine (SVM), to develop two global models to fit the training dataset. We evaluated the model performance, using 40-fold CV of the training set. The best PLS model achieved an <italic>R</italic>-value of 0.50 and an mDev of 0.79. The results for the SVM model were at best comparable to those of the best PLS model, with an <italic>R</italic>-value of 0.53 and an mDev of 0.63. For further details of the model, we refer the reader to our previous paper (Liu et al., <xref ref-type="bibr" rid="B29">2012</xref>).</p>
</sec>
<sec>
<title>Human liver microsomal stability</title>
<p>The human liver is the most important organ for drug metabolism. For a drug to achieve effective therapeutic concentrations in the body, it cannot be metabolized too rapidly by the liver. Otherwise, it would need to be administered at high doses, which are associated with high toxicity. To identify and exclude rapidly metabolized compounds (Di et al., <xref ref-type="bibr" rid="B19">2003</xref>), pharmaceutical companies commonly use the human liver microsomal (HLM) stability assay. This has led to the accumulation of a substantial body of HLM stability data in publicly accessible databases.</p>
<p>However, our knowledge of how enzymes in the HLM assay metabolize drugs remains fragmentary. Therefore, we examined whether the vNN method could effectively predict drugs that are rapidly metabolized by the liver. We retrieved HLM data from the ChEMBL database (Bento et al., <xref ref-type="bibr" rid="B7">2014</xref>), manually curated the data, and classified compounds as stable or unstable based on the reported half-life [T1/2 &#x0003E; 30 min was considered stable, and T1/2 &#x0003C; 30 min unstable (Liu et al., <xref ref-type="bibr" rid="B28">2015</xref>)]. The final dataset contained 3,219 compounds. Of these, we classified 2,047 as stable and 1,166 as unstable.</p>
<p>The HLM model performed with an overall accuracy of 81%; sensitivity and specificity values of 72 and 87%, respectively; and a kappa value of 0.59 (Table <xref ref-type="table" rid="T1">1</xref>). The HLM model reliably predicted 91% of the compounds in the HLM dataset when using 10-fold CV. We refer the reader to our original paper for further details of the model and its prediction performance (Liu et al., <xref ref-type="bibr" rid="B28">2015</xref>).</p>
</sec>
</sec>
<sec>
<title>Implementation aspects</title>
<p>The vNN-ADMET web application is hosted on an Apache Tomcat Web server that is accessible via a secure service over Hypertext Transfer Protocol Secure (HTTPS). We developed the application on the basis of a three-tiered architecture, composed of backend database, controller, and presentation tiers. The first tier consists of a PostgreSQL 9.5.7 database that stores user account information, uploaded files, constructed models, and model predictions. The second (controller) tier provides access to the prediction engine and implements the functionality required to create and manage multiple predictions. We implemented this tier using Pipeline Pilot protocols hosted on a local Pipeline Pilot server. The third (presentation) tier provides for visualization of the results, with plotting capabilities for multiple predictions. The controller and presentation tiers were developed using Java Platform, Enterprise Edition 7, Spring Framework 4.2.2, JavaServer Faces 2.2, PrimeFaces 6.0, and BootsFaces 1.0.2. The graphical user interface in the presentation tier uses Web standards supported by modern Web browsers, including Microsoft Edge 38, Chrome version 58, and Firefox version 53, without any need for plugins.</p>
<p>To use the system, the user must register for an account at <ext-link ext-link-type="uri" xlink:href="https://vnnadmet.bhsai.org/">https://vnnadmet.bhsai.org/</ext-link>. Once logged in, the user can build custom models, and run pre-built ADMET and custom models. The data corresponding to a user (login credentials, compounds, models, results, etc.) are not shared with any other user within or outside the system. The uploaded compounds, constructed models, and model predictions are purged from the system every 2 weeks.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>We have presented a web-based vNN prediction platform, with which a user can build and test models as well as predict the ADMET properties of a compound by using our existing tools.</p>
<p>All vNN models performed well with accuracies of &#x0003E;71% (see Table <xref ref-type="table" rid="T1">1</xref> for further details). On average, the models predicted 75% of the compounds in their datasets, using 10-fold CV.</p>
<p>Achieving fair comparisons between a new model and a competing model is always difficult because such comparisons require the same training data, validation data, and performance measures. An important advantage of our platform is that it offers an opportunity for developers to compare their methods with our vNN method, using their training and validation data.</p>
<p>For demonstrative purposes, we quantitatively compared our vNN method with the winning method of the Tox21 challenge (Huang et al., <xref ref-type="bibr" rid="B25">2016</xref>). This challenge was issued in 2014 by the U.S. Toxicology in the 21st Century (Tox21) program, which aims to improve toxicity prediction methods. The Tox21 consortium solicited models that could best predict the toxicity of 10,000 compounds it had tested in 12 different assays (Table <xref ref-type="table" rid="T3">3</xref>). It determined the winners using a final evaluation dataset that was concealed from the participants.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Tox21 assays with PubChem assay identification number.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Assay ID</bold></th>
<th valign="top" align="left"><bold>Assay</bold></th>
<th valign="top" align="left"><bold>PubChem AID</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">AhR</td>
<td valign="top" align="left">Aryl hydrocarbon receptor</td>
<td valign="top" align="center">743122</td>
</tr>
<tr>
<td valign="top" align="left">Aromatase</td>
<td valign="top" align="left">Aromatase</td>
<td valign="top" align="center">743139</td>
</tr>
<tr>
<td valign="top" align="left">AR</td>
<td valign="top" align="left">Androgen receptor</td>
<td valign="top" align="center">743040</td>
</tr>
<tr>
<td valign="top" align="left">AR-LBD</td>
<td valign="top" align="left">Androgen receptor LBD</td>
<td valign="top" align="center">743053</td>
</tr>
<tr>
<td valign="top" align="left">ER</td>
<td valign="top" align="left">Estrogen receptor alpha</td>
<td valign="top" align="center">743079</td>
</tr>
<tr>
<td valign="top" align="left">ER-LBD</td>
<td valign="top" align="left">Estrogen receptor alpha LBD</td>
<td valign="top" align="center">743077</td>
</tr>
<tr>
<td valign="top" align="left">PPAR-g</td>
<td valign="top" align="left">Peroxisome proliferator-activated receptor gamma</td>
<td valign="top" align="center">743140</td>
</tr>
<tr>
<td valign="top" align="left">ARE</td>
<td valign="top" align="left">Nuclear factor antioxidant responsive element</td>
<td valign="top" align="center">743219</td>
</tr>
<tr>
<td valign="top" align="left">ATAD5</td>
<td valign="top" align="left">ATAD5</td>
<td valign="top" align="center">720516</td>
</tr>
<tr>
<td valign="top" align="left">HSE</td>
<td valign="top" align="left">Heat shock factor response element</td>
<td valign="top" align="center">743228</td>
</tr>
<tr>
<td valign="top" align="left">MMP</td>
<td valign="top" align="left">Mitochondrial membrane potential</td>
<td valign="top" align="center">720637</td>
</tr>
<tr>
<td valign="top" align="left">p53</td>
<td valign="top" align="left">p53</td>
<td valign="top" align="center">720552</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Table <xref ref-type="table" rid="T4">4</xref> shows the area under the receiver operating characteristic curve (AUC-ROC) achieved by the 18 leading research teams with their best-performing model for each of the 12 assays. To compare our models with those in Table <xref ref-type="table" rid="T4">4</xref>, we set the Tanimoto distance threshold <italic>d</italic><sub>0</sub> to 1.0 so that we could predict all compounds. The vNN method performed reasonably well in predicting most of the Tox21 assays, ranking between 6th and 13th across the 12 assays. We note that the grand challenge winner used data from PubChem (Wang et al., <xref ref-type="bibr" rid="B45">2009</xref>) and ChEMBL (Bento et al., <xref ref-type="bibr" rid="B7">2014</xref>), in addition to the Tox21 data, which prevents a direct, like-for-like comparison of our results with theirs.</p>
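<p>As a reminder of what the AUC-ROC measures, the sketch below computes it directly from its probabilistic definition: the chance that a randomly chosen active compound receives a higher score than a randomly chosen inactive one, with ties counted as one half. It is a plain-Python illustration with hypothetical labels and scores, not the evaluation code used in the Tox21 challenge.</p>
<preformat>
```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as P(score of an active > score of an inactive); ties count 0.5.

    Pairwise comparison costs O(n_active * n_inactive), which is fine for
    small illustrative datasets.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for b in inactives:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical labels (1 = active) and predicted scores.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(round(auc_roc(y_true, y_score), 3))  # 8 of 9 active/inactive pairs are ordered correctly
```
</preformat>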
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>AUC-ROCs of vNN models and the best 18 models on the final evaluation test of the Tox21 Challenge.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Team</bold></th>
<th valign="top" align="center"><bold>AhR</bold></th>
<th valign="top" align="center"><bold>AR</bold></th>
<th valign="top" align="center"><bold>AR-LBD</bold></th>
<th valign="top" align="center"><bold>ARE</bold></th>
<th valign="top" align="center"><bold>Aromatase</bold></th>
<th valign="top" align="center"><bold>ATAD5</bold></th>
<th valign="top" align="center"><bold>ER</bold></th>
<th valign="top" align="center"><bold>ER-LBD</bold></th>
<th valign="top" align="center"><bold>HSE</bold></th>
<th valign="top" align="center"><bold>MMP</bold></th>
<th valign="top" align="center"><bold>p53</bold></th>
<th valign="top" align="center"><bold>PPAR-g</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">GrandWinner</td>
<td valign="top" align="center">0.928</td>
<td valign="top" align="center">0.807</td>
<td valign="top" align="center">0.879</td>
<td valign="top" align="center">0.840</td>
<td valign="top" align="center">0.834</td>
<td valign="top" align="center">0.793</td>
<td valign="top" align="center">0.810</td>
<td valign="top" align="center">0.814</td>
<td valign="top" align="center">0.865</td>
<td valign="top" align="center">0.942</td>
<td valign="top" align="center">0.862</td>
<td valign="top" align="center">0.861</td>
</tr>
<tr>
<td valign="top" align="left">AMAZIZ</td>
<td valign="top" align="center">0.913</td>
<td valign="top" align="center">0.770</td>
<td valign="top" align="center">0.846</td>
<td valign="top" align="center">0.805</td>
<td valign="top" align="center">0.819</td>
<td valign="top" align="center">0.828</td>
<td valign="top" align="center">0.806</td>
<td valign="top" align="center">0.806</td>
<td valign="top" align="center">0.842</td>
<td valign="top" align="center">0.950</td>
<td valign="top" align="center">0.843</td>
<td valign="top" align="center">0.830</td>
</tr>
<tr>
<td valign="top" align="left">dmlab</td>
<td valign="top" align="center" style="background-color:#939598">0.781</td>
<td valign="top" align="center">0.828</td>
<td valign="top" align="center">0.819</td>
<td valign="top" align="center">0.768</td>
<td valign="top" align="center">0.838</td>
<td valign="top" align="center">0.800</td>
<td valign="top" align="center">0.766</td>
<td valign="top" align="center">0.772</td>
<td valign="top" align="center">0.855</td>
<td valign="top" align="center">0.946</td>
<td valign="top" align="center">0.880</td>
<td valign="top" align="center">0.831</td>
</tr>
<tr>
<td valign="top" align="left">T</td>
<td valign="top" align="center">0.913</td>
<td valign="top" align="center" style="background-color:#939598">0.676</td>
<td valign="top" align="center">0.848</td>
<td valign="top" align="center">0.801</td>
<td valign="top" align="center">0.825</td>
<td valign="top" align="center">0.814</td>
<td valign="top" align="center">0.784</td>
<td valign="top" align="center">0.805</td>
<td valign="top" align="center">0.811</td>
<td valign="top" align="center">0.937</td>
<td valign="top" align="center">0.847</td>
<td valign="top" align="center">0.822</td>
</tr>
<tr>
<td valign="top" align="left">Microsomes</td>
<td valign="top" align="center">0.901</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center">0.804</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center">0.812</td>
<td valign="top" align="center">0.785</td>
<td valign="top" align="center">0.827</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center">0.826</td>
<td valign="top" align="center">0.717</td>
</tr>
<tr>
<td valign="top" align="left">FilipsPL</td>
<td valign="top" align="center">0.893</td>
<td valign="top" align="center">0.736</td>
<td valign="top" align="center">0.743</td>
<td valign="top" align="center">0.758</td>
<td valign="top" align="center" style="background-color:#939598">0.776</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center">0.771</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center" style="background-color:#939598">0.766</td>
<td valign="top" align="center">0.928</td>
<td valign="top" align="center">0.815</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
</tr>
<tr>
<td valign="top" align="left">Charite</td>
<td valign="top" align="center">0.896</td>
<td valign="top" align="center" style="background-color:#939598">0.688</td>
<td valign="top" align="center">0.789</td>
<td valign="top" align="center">0.739</td>
<td valign="top" align="center" style="background-color:#939598">0.781</td>
<td valign="top" align="center">0.751</td>
<td valign="top" align="center" style="background-color:#939598">0.707</td>
<td valign="top" align="center">0.798</td>
<td valign="top" align="center">0.852</td>
<td valign="top" align="center" style="background-color:#939598">0.880</td>
<td valign="top" align="center">0.834</td>
<td valign="top" align="center">0.700</td>
</tr>
<tr>
<td valign="top" align="left">RCC</td>
<td valign="top" align="center" style="background-color:#939598">0.872</td>
<td valign="top" align="center">0.763</td>
<td valign="top" align="center">0.747</td>
<td valign="top" align="center">0.761</td>
<td valign="top" align="center">0.792</td>
<td valign="top" align="center" style="background-color:#939598">0.673</td>
<td valign="top" align="center">0.781</td>
<td valign="top" align="center" style="background-color:#939598">0.762</td>
<td valign="top" align="center" style="background-color:#939598">0.755</td>
<td valign="top" align="center">0.920</td>
<td valign="top" align="center" style="background-color:#939598">0.795</td>
<td valign="top" align="center" style="background-color:#939598">0.637</td>
</tr>
<tr>
<td valign="top" align="left">Frozenarm</td>
<td valign="top" align="center" style="background-color:#939598">0.865</td>
<td valign="top" align="center">0.744</td>
<td valign="top" align="center">0.722</td>
<td valign="top" align="center" style="background-color:#939598">0.700</td>
<td valign="top" align="center" style="background-color:#939598">0.740</td>
<td valign="top" align="center">0.726</td>
<td valign="top" align="center">0.745</td>
<td valign="top" align="center">0.790</td>
<td valign="top" align="center" style="background-color:#939598">0.752</td>
<td valign="top" align="center" style="background-color:#939598">0.859</td>
<td valign="top" align="center" style="background-color:#939598">0.803</td>
<td valign="top" align="center">0.803</td>
</tr>
<tr>
<td valign="top" align="left">ToxFit</td>
<td valign="top" align="center" style="background-color:#939598">0.862</td>
<td valign="top" align="center">0.744</td>
<td valign="top" align="center">0.757</td>
<td valign="top" align="center" style="background-color:#939598">0.697</td>
<td valign="top" align="center" style="background-color:#939598">0.738</td>
<td valign="top" align="center">0.729</td>
<td valign="top" align="center" style="background-color:#939598">0.729</td>
<td valign="top" align="center" style="background-color:#939598">0.752</td>
<td valign="top" align="center" style="background-color:#939598">0.689</td>
<td valign="top" align="center" style="background-color:#939598">0.862</td>
<td valign="top" align="center" style="background-color:#939598">0.803</td>
<td valign="top" align="center">0.791</td>
</tr>
<tr>
<td valign="top" align="left">CGL</td>
<td valign="top" align="center" style="background-color:#939598">0.866</td>
<td valign="top" align="center">0.742</td>
<td valign="top" align="center" style="background-color:#939598">0.566</td>
<td valign="top" align="center">0.747</td>
<td valign="top" align="center" style="background-color:#939598">0.749</td>
<td valign="top" align="center">0.737</td>
<td valign="top" align="center">0.759</td>
<td valign="top" align="center" style="background-color:#939598">0.727</td>
<td valign="top" align="center" style="background-color:#939598">0.775</td>
<td valign="top" align="center" style="background-color:#939598">0.880</td>
<td valign="top" align="center">0.817</td>
<td valign="top" align="center">0.738</td>
</tr>
<tr>
<td valign="top" align="left">SuperToX</td>
<td valign="top" align="center" style="background-color:#939598">0.854</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center" style="background-color:#939598">0.560</td>
<td valign="top" align="center" style="background-color:#939598">0.711</td>
<td valign="top" align="center" style="background-color:#939598">0.742</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
<td valign="top" align="center" style="background-color:#939598">0.862</td>
<td valign="top" align="center" style="background-color:#939598">0.732</td>
<td valign="top" align="center" style="background-color:#939598">&#x02013;</td>
</tr>
<tr>
<td valign="top" align="left">Kibutz</td>
<td valign="top" align="center" style="background-color:#939598">0.865</td>
<td valign="top" align="center">0.750</td>
<td valign="top" align="center">0.694</td>
<td valign="top" align="center" style="background-color:#939598">0.708</td>
<td valign="top" align="center" style="background-color:#939598">0.729</td>
<td valign="top" align="center">0.737</td>
<td valign="top" align="center">0.757</td>
<td valign="top" align="center">0.779</td>
<td valign="top" align="center" style="background-color:#939598">0.587</td>
<td valign="top" align="center" style="background-color:#939598">0.838</td>
<td valign="top" align="center" style="background-color:#939598">0.787</td>
<td valign="top" align="center" style="background-color:#939598">0.666</td>
</tr>
<tr>
<td valign="top" align="left">MML</td>
<td valign="top" align="center" style="background-color:#939598">0.871</td>
<td valign="top" align="center" style="background-color:#939598">0.693</td>
<td valign="top" align="center">0.660</td>
<td valign="top" align="center" style="background-color:#939598">0.701</td>
<td valign="top" align="center" style="background-color:#939598">0.709</td>
<td valign="top" align="center">0.749</td>
<td valign="top" align="center">0.750</td>
<td valign="top" align="center" style="background-color:#939598">0.710</td>
<td valign="top" align="center" style="background-color:#939598">0.647</td>
<td valign="top" align="center" style="background-color:#939598">0.854</td>
<td valign="top" align="center">0.815</td>
<td valign="top" align="center" style="background-color:#939598">0.645</td>
</tr>
<tr>
<td valign="top" align="left">NCI</td>
<td valign="top" align="center" style="background-color:#939598">0.812</td>
<td valign="top" align="center" style="background-color:#939598">0.628</td>
<td valign="top" align="center" style="background-color:#939598">0.592</td>
<td valign="top" align="center">0.783</td>
<td valign="top" align="center" style="background-color:#939598">0.698</td>
<td valign="top" align="center">0.714</td>
<td valign="top" align="center" style="background-color:#939598">0.483</td>
<td valign="top" align="center" style="background-color:#939598">0.703</td>
<td valign="top" align="center">0.858</td>
<td valign="top" align="center" style="background-color:#939598">0.851</td>
<td valign="top" align="center" style="background-color:#939598">0.747</td>
<td valign="top" align="center">0.736</td>
</tr>
<tr>
<td valign="top" align="left">VIF</td>
<td valign="top" align="center" style="background-color:#939598">0.827</td>
<td valign="top" align="center">0.797</td>
<td valign="top" align="center" style="background-color:#939598">0.610</td>
<td valign="top" align="center" style="background-color:#939598">0.636</td>
<td valign="top" align="center" style="background-color:#939598">0.671</td>
<td valign="top" align="center" style="background-color:#939598">0.656</td>
<td valign="top" align="center" style="background-color:#939598">0.732</td>
<td valign="top" align="center" style="background-color:#939598">0.735</td>
<td valign="top" align="center" style="background-color:#939598">0.723</td>
<td valign="top" align="center" style="background-color:#939598">0.796</td>
<td valign="top" align="center" style="background-color:#939598">0.648</td>
<td valign="top" align="center" style="background-color:#939598">0.666</td>
</tr>
<tr>
<td valign="top" align="left">Toxic Avg</td>
<td valign="top" align="center" style="background-color:#939598">0.715</td>
<td valign="top" align="center">0.721</td>
<td valign="top" align="center" style="background-color:#939598">0.611</td>
<td valign="top" align="center" style="background-color:#939598">0.633</td>
<td valign="top" align="center" style="background-color:#939598">0.671</td>
<td valign="top" align="center" style="background-color:#939598">0.593</td>
<td valign="top" align="center" style="background-color:#939598">0.646</td>
<td valign="top" align="center" style="background-color:#939598">0.640</td>
<td valign="top" align="center" style="background-color:#939598">0.465</td>
<td valign="top" align="center" style="background-color:#939598">0.732</td>
<td valign="top" align="center" style="background-color:#939598">0.614</td>
<td valign="top" align="center" style="background-color:#939598">0.682</td>
</tr>
<tr>
<td valign="top" align="left">Swamidass</td>
<td valign="top" align="center" style="background-color:#939598">0.353</td>
<td valign="top" align="center" style="background-color:#939598">0.571</td>
<td valign="top" align="center">0.748</td>
<td valign="top" align="center" style="background-color:#939598">0.372</td>
<td valign="top" align="center" style="background-color:#939598">0.274</td>
<td valign="top" align="center" style="background-color:#939598">0.391</td>
<td valign="top" align="center" style="background-color:#939598">0.680</td>
<td valign="top" align="center" style="background-color:#939598">0.738</td>
<td valign="top" align="center" style="background-color:#939598">0.711</td>
<td valign="top" align="center" style="background-color:#939598">0.828</td>
<td valign="top" align="center" style="background-color:#939598">0.661</td>
<td valign="top" align="center" style="background-color:#939598">0.585</td>
</tr>
<tr>
<td valign="top" align="left">vNN</td>
<td valign="top" align="center">0.883</td>
<td valign="top" align="center">0.716</td>
<td valign="top" align="center">0.626</td>
<td valign="top" align="center">0.727</td>
<td valign="top" align="center">0.786</td>
<td valign="top" align="center">0.699</td>
<td valign="top" align="center">0.738</td>
<td valign="top" align="center">0.770</td>
<td valign="top" align="center">0.793</td>
<td valign="top" align="center">0.882</td>
<td valign="top" align="center">0.808</td>
<td valign="top" align="center">0.690</td>
</tr>
<tr>
<td valign="top" align="left">vNN rank</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">12</td>
<td valign="top" align="center">13</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">13</td>
<td valign="top" align="center">12</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">11</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The vNN parameters were set to h &#x0003D; 0.3 and d<sub>0</sub> &#x0003D; 1.0. Gray cells indicate models showing performance inferior to the vNN models</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>The MMP data we used for our mitochondrial dysfunction model were the same as those used in the Tox21 challenge (Attene-Ramos et al., <xref ref-type="bibr" rid="B6">2015</xref>; Huang et al., <xref ref-type="bibr" rid="B25">2016</xref>). Our MMP model was the seventh-best-performing model, with an AUC-ROC value of 0.882 (with <italic>h</italic> &#x0003D; 0.3 and <italic>d</italic><sub>0</sub> &#x0003D; 1.0). This was comparable to the values obtained with more elaborate and computationally demanding methods, such as deep learning (Table <xref ref-type="table" rid="T4">4</xref>).</p>
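<p>The rank quoted above can be checked against the MMP column of Table <xref ref-type="table" rid="T4">4</xref>. The short sketch below does so, taking the rank as one plus the number of teams with a strictly higher AUC-ROC; this ranking convention and the function name are assumptions made for illustration, and teams without an MMP entry are omitted.</p>
<preformat>
```python
from typing import Iterable

# MMP AUC-ROC values copied from the MMP column of Table 4
# (teams without an MMP entry are left out).
mmp_auc = {
    "GrandWinner": 0.942, "AMAZIZ": 0.95, "dmlab": 0.946, "T": 0.937,
    "FilipsPL": 0.928, "Charite": 0.880, "RCC": 0.920, "Frozenarm": 0.859,
    "ToxFit": 0.862, "CGL": 0.880, "SuperToX": 0.862, "Kibutz": 0.838,
    "MML": 0.854, "NCI": 0.851, "VIF": 0.796, "Toxic Avg": 0.732,
    "Swamidass": 0.828,
}
VNN_MMP_AUC = 0.882  # vNN row of Table 4


def competition_rank(value: float, others: Iterable[float]) -> int:
    """Rank of `value`: 1 + the number of strictly higher scores in `others`."""
    return 1 + sum(1 for v in others if v > value)


vnn_rank = competition_rank(VNN_MMP_AUC, mmp_auc.values())
print(vnn_rank)  # 7, matching the "vNN rank" row of Table 4
```
</preformat>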
<p>Some QSAR methods do not use an applicability domain to determine whether their predictions are reliable. This could lead to the misperception that a model can predict the activity of any molecule. The applicability domain is vital to the vNN method. The user of our platform can adjust it by varying the Tanimoto distance threshold value. Although this could be set to 1 so that the model predicts the activity of any molecule, no model is likely to have an unlimited applicability domain (Liu et al., <xref ref-type="bibr" rid="B28">2015</xref>).</p>
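<p>The role of the distance threshold can be illustrated with a minimal sketch of a vNN-style prediction. It assumes that fingerprints are represented as sets of "on" bit indices, that each reference compound within the threshold <italic>d</italic><sub>0</sub> contributes with a weight of the form exp(-(d/h)^2), where <italic>h</italic> acts as a smoothing factor, and that no prediction is returned when no reference compound qualifies. The function names and toy fingerprints are hypothetical, and the sketch is not the implementation running on the web server.</p>
<preformat>
```python
import math
from typing import FrozenSet, List, Optional, Tuple

Fingerprint = FrozenSet[int]  # set of "on" bit indices (hypothetical encoding)


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto distance: 1 - (shared bits) / (bits set in either fingerprint)."""
    n_union = len(a.union(b))
    if n_union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a.intersection(b)) / n_union


def vnn_predict(
    query: Fingerprint,
    reference: List[Tuple[Fingerprint, float]],
    d0: float,
    h: float,
) -> Optional[float]:
    """Distance-weighted average over reference compounds within d0.

    Returns None when the query is outside the applicability domain,
    i.e., no reference compound lies within d0 (or all weights vanish).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for fingerprint, activity in reference:
        d = tanimoto_distance(query, fingerprint)
        if d0 >= d:
            w = math.exp(-((d / h) ** 2))  # assumed weighting form
            weighted_sum += w * activity
            weight_total += w
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Toy reference set: (fingerprint, activity) with 1.0 = active, 0.0 = inactive.
reference = [
    (frozenset({1, 2, 3, 4}), 1.0),
    (frozenset({1, 2, 3, 5}), 1.0),
    (frozenset({7, 8, 9}), 0.0),
]
query = frozenset({1, 2, 3, 6})

print(vnn_predict(query, reference, d0=0.3, h=0.3))  # None: nearest neighbors are at d = 0.4
print(vnn_predict(query, reference, d0=1.0, h=0.3))  # every reference compound qualifies
```
</preformat>
<p>With the strict threshold, the query is declared out of domain; with the threshold set to 1.0, a value is always returned, but distant compounds contribute almost nothing to it, which is why a prediction obtained this way should be interpreted with care.</p>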
<p>A more reasonable approach to improve a vNN-based model is to increase the applicability domain by adding more reference compounds. A good test of the power of a model to generate prospective predictions is time-split validation, which divides the data into &#x0201C;old&#x0201D; and &#x0201C;new&#x0201D; sets, training the model on the former and validating it on the latter (Sheridan, <xref ref-type="bibr" rid="B41">2013</xref>; Liu et al., <xref ref-type="bibr" rid="B28">2015</xref>). We have previously shown in a time-split validation that, whereas the accuracy of a vNN model is roughly maintained, the number of &#x0201C;new&#x0201D; compounds that it can predict is significantly reduced. However, simply adding a few &#x0201C;new&#x0201D; compounds to the reference set increases the coverage significantly (Liu et al., <xref ref-type="bibr" rid="B28">2015</xref>).</p>
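<p>The effect described above can be reproduced on toy data. The sketch below splits hypothetical records by measurement year, computes the fraction of "new" compounds that have at least one reference compound within a distance cutoff, and then repeats the calculation after moving a single "new" compound into the reference set. For brevity, each compound is described by one made-up numeric descriptor and the distance is an absolute difference, which stands in for the Tanimoto distance between fingerprints; all names and values are assumptions.</p>
<preformat>
```python
from typing import List, Tuple

Record = Tuple[int, float]  # (year measured, toy 1-D descriptor); both hypothetical


def time_split(records: List[Record], cutoff_year: int) -> Tuple[List[Record], List[Record]]:
    """Records up to and including cutoff_year are 'old'; later ones are 'new'."""
    old = [r for r in records if cutoff_year >= r[0]]
    new = [r for r in records if r[0] > cutoff_year]
    return old, new


def coverage_fraction(queries: List[Record], reference: List[Record], max_dist: float) -> float:
    """Fraction of queries with at least one reference compound within max_dist."""
    if not queries:
        return 0.0
    covered = sum(
        1 for q in queries
        if any(max_dist >= abs(q[1] - r[1]) for r in reference)
    )
    return covered / len(queries)


records = [
    (2010, 0.10), (2011, 0.15), (2012, 0.20),                # older chemistry
    (2015, 0.80), (2015, 0.85), (2016, 0.90), (2016, 0.22),  # newer chemistry
]
old, new = time_split(records, cutoff_year=2013)

before = coverage_fraction(new, old, max_dist=0.12)
# Move one "new" compound into the reference set; nothing is refitted.
after = coverage_fraction(new[1:], old + new[:1], max_dist=0.12)
print(before, after)
```
</preformat>
<p>In this toy case, only one of the four "new" compounds is covered by the "old" reference set, whereas all remaining "new" compounds are covered once a single representative of the newer chemistry is added.</p>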
<p>The lack of training data poses an important limitation to the vNN approach. When a dataset is too small, there is a high probability that a target molecule will have no qualified near neighbors in the dataset, and hence a high-quality prediction cannot be made. However, the lack of training data is a limitation for all machine learning methods. The difference is that most such methods build a model no matter how small the training dataset, and will always make a prediction for any input molecule without considering the reliability of the predicted result. In our view, it is better not to give a prediction at all if it is unreliable. This also alerts users to use alternative methods, including experimental measurements, to derive a reliable answer. As more experimental data become available over time, the performance of the vNN method will improve without retraining. This is in contrast to most other machine learning methods, which cannot take advantage of new data without retraining a model.</p>
<p>This property is especially significant for drug discovery labs, because the chemical space they explore is restricted by the target candidates they are investigating. For example, when exploring a new drug target, it is crucial to continuously update the model with new data to ensure that the applicability domain is relevant for the new target. In a vNN-based model, this can be done easily by adding the SMILES strings of the new compounds to the reference dataset. For this reason, we believe that our web-based vNN platform has the potential to greatly accelerate the development of drugs.</p>
</sec>
<sec id="s5">
<title>Author contributions</title>
<p>PS, RL, and AW developed the method, analyzed the data, and wrote the manuscript. VD designed and implemented the web server.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ack>
<p>The opinions and assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the U.S. Army or of the U.S. Department of Defense. This paper has been approved for public release with unlimited distribution.</p>
</ack>
<sec sec-type="supplementary-material" id="s6">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fphar.2017.00889/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fphar.2017.00889/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Table1.DOCX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet1.ZIP" id="SM2" mimetype="application/zip" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abbott</surname> <given-names>N. J.</given-names></name> <name><surname>R&#x000F6;nnb&#x000E4;ck</surname> <given-names>L.</given-names></name> <name><surname>Hansson</surname> <given-names>E.</given-names></name></person-group> (<year>2006</year>). <article-title>Astrocyte-endothelial interactions at the blood-brain barrier</article-title>. <source>Nat. Rev. Neurosci</source>. <volume>7</volume>, <fpage>41</fpage>&#x02013;<lpage>53</lpage>. <pub-id pub-id-type="doi">10.1038/nrn1824</pub-id><pub-id pub-id-type="pmid">16371949</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adler</surname> <given-names>J.</given-names></name> <name><surname>Parmryd</surname> <given-names>I.</given-names></name></person-group> (<year>2010</year>). <article-title>Quantifying colocalization by correlation: the Pearson correlation coefficient is superior to the Mander&#x00027;s overlap coefficient</article-title>. <source>Cytometry A</source> <volume>77</volume>, <fpage>733</fpage>&#x02013;<lpage>742</lpage>. <pub-id pub-id-type="doi">10.1002/cyto.a.20896</pub-id><pub-id pub-id-type="pmid">20653013</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ambudkar</surname> <given-names>S. V.</given-names></name> <name><surname>Kimchi-Sarfaty</surname> <given-names>C.</given-names></name> <name><surname>Sauna</surname> <given-names>Z. E.</given-names></name> <name><surname>Gottesman</surname> <given-names>M. M.</given-names></name></person-group> (<year>2003</year>). <article-title>P-glycoprotein: from genomics to mechanism</article-title>. <source>Oncogene</source> <volume>22</volume>, <fpage>7468</fpage>&#x02013;<lpage>7485</lpage>. <pub-id pub-id-type="doi">10.1038/sj.onc.1206948</pub-id><pub-id pub-id-type="pmid">14576852</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ames</surname> <given-names>B. N.</given-names></name> <name><surname>Durston</surname> <given-names>W. E.</given-names></name> <name><surname>Yamasaki</surname> <given-names>E.</given-names></name> <name><surname>Lee</surname> <given-names>F. D.</given-names></name></person-group> (<year>1973</year>). <article-title>Carcinogens are mutagens: a simple test system combining liver homogenates for activation and bacteria for detection</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>70</volume>, <fpage>2281</fpage>&#x02013;<lpage>2285</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.70.8.2281</pub-id><pub-id pub-id-type="pmid">4151811</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Assis</surname> <given-names>D. N.</given-names></name> <name><surname>Navarro</surname> <given-names>V. J.</given-names></name></person-group> (<year>2009</year>). <article-title>Human drug hepatotoxicity: a contemporary clinical perspective</article-title>. <source>Expert Opin. Drug Metab. Toxicol</source>. <volume>5</volume>, <fpage>463</fpage>&#x02013;<lpage>473</lpage>. <pub-id pub-id-type="doi">10.1517/17425250902927386</pub-id><pub-id pub-id-type="pmid">19416083</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Attene-Ramos</surname> <given-names>M. S.</given-names></name> <name><surname>Huang</surname> <given-names>R.</given-names></name> <name><surname>Michael</surname> <given-names>S.</given-names></name> <name><surname>Witt</surname> <given-names>K. L.</given-names></name> <name><surname>Richard</surname> <given-names>A.</given-names></name> <name><surname>Tice</surname> <given-names>R. R.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Profiling of the Tox21 chemical collection for mitochondrial function to identify compounds that acutely decrease mitochondrial membrane potential</article-title>. <source>Environ. Health Perspect</source>. <volume>123</volume>, <fpage>49</fpage>&#x02013;<lpage>56</lpage>. <pub-id pub-id-type="doi">10.1289/ehp.1408642</pub-id><pub-id pub-id-type="pmid">25302578</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bento</surname> <given-names>A. P.</given-names></name> <name><surname>Gaulton</surname> <given-names>A.</given-names></name> <name><surname>Hersey</surname> <given-names>A.</given-names></name> <name><surname>Bellis</surname> <given-names>L. J.</given-names></name> <name><surname>Chambers</surname> <given-names>J.</given-names></name> <name><surname>Davies</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>The ChEMBL bioactivity database: an update</article-title>. <source>Nucleic Acids Res</source>. <volume>42</volume>, <fpage>D1083</fpage>&#x02013;<lpage>D1090</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkt1031</pub-id><pub-id pub-id-type="pmid">24214965</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Borst</surname> <given-names>P.</given-names></name> <name><surname>Elferink</surname> <given-names>R. O.</given-names></name></person-group> (<year>2002</year>). <article-title>Mammalian ABC transporters in health and disease</article-title>. <source>Annu. Rev. Biochem</source>. <volume>71</volume>, <fpage>537</fpage>&#x02013;<lpage>592</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.biochem.71.102301.093055</pub-id><pub-id pub-id-type="pmid">12045106</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Broccatelli</surname> <given-names>F.</given-names></name> <name><surname>Carosati</surname> <given-names>E.</given-names></name> <name><surname>Neri</surname> <given-names>A.</given-names></name> <name><surname>Frosini</surname> <given-names>M.</given-names></name> <name><surname>Goracci</surname> <given-names>L.</given-names></name> <name><surname>Oprea</surname> <given-names>T. I.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>A novel approach for predicting P-glycoprotein (ABCB1) inhibition using molecular interaction fields</article-title>. <source>J. Med. Chem</source>. <volume>54</volume>, <fpage>1740</fpage>&#x02013;<lpage>1751</lpage>. <pub-id pub-id-type="doi">10.1021/jm101421d</pub-id><pub-id pub-id-type="pmid">21341745</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname> <given-names>C. M.</given-names></name> <name><surname>Reisfeld</surname> <given-names>B.</given-names></name> <name><surname>Mayeno</surname> <given-names>A. N.</given-names></name></person-group> (<year>2008</year>). <article-title>Cytochromes P450: a structure-based summary of biotransformations using representative substrates</article-title>. <source>Drug Metab. Rev</source>. <volume>40</volume>, <fpage>1</fpage>&#x02013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1080/03602530701836662</pub-id><pub-id pub-id-type="pmid">18259985</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Zhao</surname> <given-names>Q.</given-names></name> <name><surname>Peng</surname> <given-names>H.</given-names></name> <name><surname>Hou</surname> <given-names>T.</given-names></name></person-group> (<year>2011</year>). <article-title>ADME evaluation in drug discovery. 10. Predictions of P-glycoprotein inhibitors using recursive partitioning and naive Bayesian classification techniques</article-title>. <source>Mol. Pharm</source>. <volume>8</volume>, <fpage>889</fpage>&#x02013;<lpage>900</lpage>. <pub-id pub-id-type="doi">10.1021/mp100465q</pub-id><pub-id pub-id-type="pmid">21413792</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>M.</given-names></name> <name><surname>Vijay</surname> <given-names>V.</given-names></name> <name><surname>Shi</surname> <given-names>Q.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Fang</surname> <given-names>H.</given-names></name> <name><surname>Tong</surname> <given-names>W.</given-names></name></person-group> (<year>2011</year>). <article-title>FDA-approved drug labeling for the study of drug-induced liver injury</article-title>. <source>Drug Discov. Today</source> <volume>16</volume>, <fpage>697</fpage>&#x02013;<lpage>703</lpage>. <pub-id pub-id-type="doi">10.1016/j.drudis.2011.05.007</pub-id><pub-id pub-id-type="pmid">21624500</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>F.</given-names></name> <name><surname>Li</surname> <given-names>W.</given-names></name> <name><surname>Zhou</surname> <given-names>Y.</given-names></name> <name><surname>Shen</surname> <given-names>J.</given-names></name> <name><surname>Wu</surname> <given-names>Z.</given-names></name> <name><surname>Liu</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>admetSAR: a comprehensive source and free tool for assessment of chemical ADMET properties</article-title>. <source>J. Chem. Inf. Model</source>. <volume>52</volume>, <fpage>3099</fpage>&#x02013;<lpage>3105</lpage>. <pub-id pub-id-type="doi">10.1021/ci300367a</pub-id><pub-id pub-id-type="pmid">23092397</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cherkasov</surname> <given-names>A.</given-names></name> <name><surname>Muratov</surname> <given-names>E. N.</given-names></name> <name><surname>Fourches</surname> <given-names>D.</given-names></name> <name><surname>Varnek</surname> <given-names>A.</given-names></name> <name><surname>Baskin</surname> <given-names>I. I.</given-names></name> <name><surname>Cronin</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>QSAR modeling: where have you been? Where are you going to?</article-title> <source>J. Med. Chem</source>. <volume>57</volume>, <fpage>4977</fpage>&#x02013;<lpage>5010</lpage>. <pub-id pub-id-type="doi">10.1021/jm4004285</pub-id><pub-id pub-id-type="pmid">24351051</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Contrera</surname> <given-names>J. F.</given-names></name> <name><surname>Matthews</surname> <given-names>E. J.</given-names></name> <name><surname>Kruhlak</surname> <given-names>N. L.</given-names></name> <name><surname>Benz</surname> <given-names>R. D.</given-names></name></person-group> (<year>2004</year>). <article-title>Estimating the safe starting dose in phase I clinical trials and no observed effect level based on QSAR modeling of the human maximum recommended daily dose</article-title>. <source>Regul. Toxicol. Pharmacol</source>. <volume>40</volume>, <fpage>185</fpage>&#x02013;<lpage>206</lpage>. <pub-id pub-id-type="doi">10.1016/j.yrtph.2004.08.004</pub-id><pub-id pub-id-type="pmid">15546675</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Czodrowski</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). <article-title>hERG me out</article-title>. <source>J. Chem. Inf. Model</source>. <volume>53</volume>, <fpage>2240</fpage>&#x02013;<lpage>2251</lpage>. <pub-id pub-id-type="doi">10.1021/ci400308z</pub-id><pub-id pub-id-type="pmid">23944269</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daina</surname> <given-names>A.</given-names></name> <name><surname>Michielin</surname> <given-names>O.</given-names></name> <name><surname>Zoete</surname> <given-names>V.</given-names></name></person-group> (<year>2017</year>). <article-title>SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules</article-title>. <source>Sci. Rep</source>. <volume>7</volume>:<fpage>42717</fpage>. <pub-id pub-id-type="doi">10.1038/srep42717</pub-id><pub-id pub-id-type="pmid">28256516</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Ponti</surname> <given-names>F.</given-names></name> <name><surname>Poluzzi</surname> <given-names>E.</given-names></name> <name><surname>Montanaro</surname> <given-names>N.</given-names></name></person-group> (<year>2001</year>). <article-title>Organising evidence on QT prolongation and occurrence of Torsades de Pointes with non-antiarrhythmic drugs: a call for consensus</article-title>. <source>Eur. J. Clin. Pharmacol</source>. <volume>57</volume>, <fpage>185</fpage>&#x02013;<lpage>209</lpage>. <pub-id pub-id-type="doi">10.1007/s002280100290</pub-id><pub-id pub-id-type="pmid">11497335</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Di</surname> <given-names>L.</given-names></name> <name><surname>Kerns</surname> <given-names>E. H.</given-names></name> <name><surname>Hong</surname> <given-names>Y.</given-names></name> <name><surname>Kleintop</surname> <given-names>T. A.</given-names></name> <name><surname>McConnell</surname> <given-names>O. J.</given-names></name> <name><surname>Huryn</surname> <given-names>D. M.</given-names></name></person-group> (<year>2003</year>). <article-title>Optimization of a higher throughput microsomal stability screening assay for profiling drug discovery candidates</article-title>. <source>J. Biomol. Screen</source>. <volume>8</volume>, <fpage>453</fpage>&#x02013;<lpage>462</lpage>. <pub-id pub-id-type="doi">10.1177/1087057103255988</pub-id><pub-id pub-id-type="pmid">14567798</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Duan</surname> <given-names>J.</given-names></name> <name><surname>Dixon</surname> <given-names>S. L.</given-names></name> <name><surname>Lowrie</surname> <given-names>J. F.</given-names></name> <name><surname>Sherman</surname> <given-names>W.</given-names></name></person-group> (<year>2010</year>). <article-title>Analysis and comparison of 2D fingerprints: insights into database screening performance using eight fingerprint methods</article-title>. <source>J. Mol. Graph. Model</source>. <volume>29</volume>, <fpage>157</fpage>&#x02013;<lpage>170</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmgm.2010.05.008</pub-id><pub-id pub-id-type="pmid">20579912</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dunn</surname> <given-names>G.</given-names></name> <name><surname>Everitt</surname> <given-names>B.</given-names></name></person-group> (<year>1995</year>). <source>Clinical Biostatistics: An Introduction to Evidence-based Medicine</source>. <publisher-loc>London</publisher-loc>: <publisher-name>E. Arnold</publisher-name>.</citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Greene</surname> <given-names>N.</given-names></name> <name><surname>Fisk</surname> <given-names>L.</given-names></name> <name><surname>Naven</surname> <given-names>R. T.</given-names></name> <name><surname>Note</surname> <given-names>R. R.</given-names></name> <name><surname>Patel</surname> <given-names>M. L.</given-names></name> <name><surname>Pelletier</surname> <given-names>D. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Developing structure&#x02013;activity relationships for the prediction of hepatotoxicity</article-title>. <source>Chem. Res. Toxicol</source>. <volume>23</volume>, <fpage>1215</fpage>&#x02013;<lpage>1222</lpage>. <pub-id pub-id-type="doi">10.1021/tx1000865</pub-id><pub-id pub-id-type="pmid">20553011</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hansen</surname> <given-names>K.</given-names></name> <name><surname>Mika</surname> <given-names>S.</given-names></name> <name><surname>Schroeter</surname> <given-names>T.</given-names></name> <name><surname>Sutter</surname> <given-names>A.</given-names></name> <name><surname>ter Laak</surname> <given-names>A.</given-names></name> <name><surname>Steger-Hartmann</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Benchmark data set for <italic>in silico</italic> prediction of Ames mutagenicity</article-title>. <source>J. Chem. Inf. Model</source>. <volume>49</volume>, <fpage>2077</fpage>&#x02013;<lpage>2081</lpage>. <pub-id pub-id-type="doi">10.1021/ci900161g</pub-id><pub-id pub-id-type="pmid">19702240</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hert</surname> <given-names>J.</given-names></name> <name><surname>Willett</surname> <given-names>P.</given-names></name> <name><surname>Wilton</surname> <given-names>D. J.</given-names></name> <name><surname>Acklin</surname> <given-names>P.</given-names></name> <name><surname>Azzaoui</surname> <given-names>K.</given-names></name> <name><surname>Jacoby</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2004</year>). <article-title>Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures</article-title>. <source>Org. Biomol. Chem</source>. <volume>2</volume>, <fpage>3256</fpage>&#x02013;<lpage>3266</lpage>. <pub-id pub-id-type="doi">10.1039/b409865j</pub-id><pub-id pub-id-type="pmid">15534703</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>R.</given-names></name> <name><surname>Xia</surname> <given-names>M.</given-names></name> <name><surname>Nguyen</surname> <given-names>D.-T.</given-names></name> <name><surname>Zhao</surname> <given-names>T.</given-names></name> <name><surname>Sakamuru</surname> <given-names>S.</given-names></name> <name><surname>Zhao</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Tox21Challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs</article-title>. <source>Front. Environ. Sci</source>. <volume>3</volume>:<fpage>85</fpage>. <pub-id pub-id-type="doi">10.3389/fenvs.2015.00085</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>D.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Tian</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>H.</given-names></name> <name><surname>Hou</surname> <given-names>T.</given-names></name></person-group> (<year>2014</year>). <article-title>ADMET evaluation in drug discovery. 13. Development of <italic>in silico</italic> prediction models for P-glycoprotein substrates</article-title>. <source>Mol. Pharm</source>. <volume>11</volume>, <fpage>716</fpage>&#x02013;<lpage>726</lpage>. <pub-id pub-id-type="doi">10.1021/mp400450m</pub-id><pub-id pub-id-type="pmid">24499501</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liew</surname> <given-names>C. Y.</given-names></name> <name><surname>Lim</surname> <given-names>Y. C.</given-names></name> <name><surname>Yap</surname> <given-names>C. W.</given-names></name></person-group> (<year>2011</year>). <article-title>Mixed learning algorithms and features ensemble in hepatotoxicity prediction</article-title>. <source>J. Comput. Aided Mol. Des</source>. <volume>25</volume>, <fpage>855</fpage>. <pub-id pub-id-type="doi">10.1007/s10822-011-9468-3</pub-id><pub-id pub-id-type="pmid">21898162</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>R.</given-names></name> <name><surname>Schyman</surname> <given-names>P.</given-names></name> <name><surname>Wallqvist</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>Critically assessing the predictive power of QSAR models for human liver microsomal stability</article-title>. <source>J. Chem. Inf. Model</source>. <volume>55</volume>, <fpage>1566</fpage>&#x02013;<lpage>1575</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jcim.5b00255</pub-id><pub-id pub-id-type="pmid">26170251</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>R.</given-names></name> <name><surname>Tawa</surname> <given-names>G.</given-names></name> <name><surname>Wallqvist</surname> <given-names>A.</given-names></name></person-group> (<year>2012</year>). <article-title>Locally weighted learning methods for predicting dose-dependent toxicity with application to the human maximum recommended daily dose</article-title>. <source>Chem. Res. Toxicol</source>. <volume>25</volume>, <fpage>2216</fpage>&#x02013;<lpage>2226</lpage>. <pub-id pub-id-type="doi">10.1021/tx300279f</pub-id><pub-id pub-id-type="pmid">22963722</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>R.</given-names></name> <name><surname>Wallqvist</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Merging applicability domains for <italic>in silico</italic> assessment of chemical mutagenicity</article-title>. <source>J. Chem. Inf. Model</source>. <volume>54</volume>, <fpage>793</fpage>&#x02013;<lpage>800</lpage>. <pub-id pub-id-type="doi">10.1021/ci500016v</pub-id><pub-id pub-id-type="pmid">24494696</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Manganaro</surname> <given-names>A.</given-names></name> <name><surname>Pizzo</surname> <given-names>F.</given-names></name> <name><surname>Lombardo</surname> <given-names>A.</given-names></name> <name><surname>Pogliaghi</surname> <given-names>A.</given-names></name> <name><surname>Benfenati</surname> <given-names>E.</given-names></name></person-group> (<year>2016</year>). <article-title>Predicting persistence in the sediment compartment with a new automatic software based on the k-Nearest Neighbor (k-NN) algorithm</article-title>. <source>Chemosphere</source> <volume>144</volume>, <fpage>1624</fpage>&#x02013;<lpage>1630</lpage>. <pub-id pub-id-type="doi">10.1016/j.chemosphere.2015.10.054</pub-id><pub-id pub-id-type="pmid">26517391</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maunz</surname> <given-names>A.</given-names></name> <name><surname>G&#x000FC;tlein</surname> <given-names>M.</given-names></name> <name><surname>Rautenberg</surname> <given-names>M.</given-names></name> <name><surname>Vorgrimmler</surname> <given-names>D.</given-names></name> <name><surname>Gebele</surname> <given-names>D.</given-names></name> <name><surname>Helma</surname> <given-names>C.</given-names></name></person-group> (<year>2013</year>). <article-title>lazar: a modular predictive toxicology framework</article-title>. <source>Front. Pharmacol</source>. <volume>4</volume>:<fpage>38</fpage>. <pub-id pub-id-type="doi">10.3389/fphar.2013.00038</pub-id><pub-id pub-id-type="pmid">23761761</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meyer</surname> <given-names>J. N.</given-names></name> <name><surname>Leung</surname> <given-names>M. C.</given-names></name> <name><surname>Rooney</surname> <given-names>J. P.</given-names></name> <name><surname>Sendoel</surname> <given-names>A.</given-names></name> <name><surname>Hengartner</surname> <given-names>M. O.</given-names></name> <name><surname>Kisby</surname> <given-names>G. E.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Mitochondria as a target of environmental toxicants</article-title>. <source>Toxicol. Sci</source>. <volume>134</volume>, <fpage>1</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1093/toxsci/kft102</pub-id><pub-id pub-id-type="pmid">23629515</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Muehlbacher</surname> <given-names>M.</given-names></name> <name><surname>Spitzer</surname> <given-names>G. M.</given-names></name> <name><surname>Liedl</surname> <given-names>K. R.</given-names></name> <name><surname>Kornhuber</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>Qualitative prediction of blood&#x02013;brain barrier permeability on a large and refined dataset</article-title>. <source>J. Comput. Aided Mol. Des</source>. <volume>25</volume>, <fpage>1095</fpage>&#x02013;<lpage>1106</lpage>. <pub-id pub-id-type="doi">10.1007/s10822-011-9478-1</pub-id><pub-id pub-id-type="pmid">22109848</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Murray</surname> <given-names>M.</given-names></name></person-group> (<year>2006</year>). <article-title>Role of CYP pharmacogenetics and drug-drug interactions in the efficacy and safety of atypical and other antipsychotic agents</article-title>. <source>J. Pharm. Pharmacol</source>. <volume>58</volume>, <fpage>871</fpage>&#x02013;<lpage>885</lpage>. <pub-id pub-id-type="doi">10.1211/jpp.58.7.0001</pub-id><pub-id pub-id-type="pmid">16805946</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Naef</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>A generally applicable computer algorithm based on the group additivity method for the calculation of seven molecular descriptors: heat of combustion, logPO/W, logS, refractivity, polarizability, toxicity and logBB of organic compounds; scope and limits of applicability</article-title>. <source>Molecules</source> <volume>20</volume>:<fpage>18279</fpage>. <pub-id pub-id-type="doi">10.3390/molecules201018279</pub-id><pub-id pub-id-type="pmid">26457702</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pieczenik</surname> <given-names>S. R.</given-names></name> <name><surname>Neustadt</surname> <given-names>J.</given-names></name></person-group> (<year>2007</year>). <article-title>Mitochondrial dysfunction and molecular pathways of disease</article-title>. <source>Exp. Mol. Pathol</source>. <volume>83</volume>, <fpage>84</fpage>&#x02013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1016/j.yexmp.2006.09.008</pub-id><pub-id pub-id-type="pmid">17239370</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rogers</surname> <given-names>D.</given-names></name> <name><surname>Hahn</surname> <given-names>M.</given-names></name></person-group> (<year>2010</year>). <article-title>Extended-connectivity fingerprints</article-title>. <source>J. Chem. Inf. Model</source>. <volume>50</volume>, <fpage>742</fpage>&#x02013;<lpage>754</lpage>. <pub-id pub-id-type="doi">10.1021/ci100050t</pub-id><pub-id pub-id-type="pmid">20426451</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sanguinetti</surname> <given-names>M. C.</given-names></name> <name><surname>Tristani-Firouzi</surname> <given-names>M.</given-names></name></person-group> (<year>2006</year>). <article-title>hERG potassium channels and cardiac arrhythmia</article-title>. <source>Nature</source> <volume>440</volume>, <fpage>463</fpage>&#x02013;<lpage>469</lpage>. <pub-id pub-id-type="doi">10.1038/nature04710</pub-id><pub-id pub-id-type="pmid">16554806</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schyman</surname> <given-names>P.</given-names></name> <name><surname>Liu</surname> <given-names>R.</given-names></name> <name><surname>Wallqvist</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Using the variable-nearest neighbor method to identify P-glycoprotein substrates and inhibitors</article-title>. <source>ACS Omega</source> <volume>1</volume>, <fpage>923</fpage>&#x02013;<lpage>929</lpage>. <pub-id pub-id-type="doi">10.1021/acsomega.6b00247</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sheridan</surname> <given-names>R. P.</given-names></name></person-group> (<year>2013</year>). <article-title>Time-split cross-validation as a method for estimating the goodness of prospective prediction</article-title>. <source>J. Chem. Inf. Model</source>. <volume>53</volume>, <fpage>783</fpage>&#x02013;<lpage>790</lpage>. <pub-id pub-id-type="doi">10.1021/ci400084k</pub-id><pub-id pub-id-type="pmid">23521722</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sushko</surname> <given-names>I.</given-names></name> <name><surname>Novotarskyi</surname> <given-names>S.</given-names></name> <name><surname>K&#x000F6;rner</surname> <given-names>R.</given-names></name> <name><surname>Pandey</surname> <given-names>A. K.</given-names></name> <name><surname>Rupp</surname> <given-names>M.</given-names></name> <name><surname>Teetz</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information</article-title>. <source>J. Comput. Aided Mol. Des</source>. <volume>25</volume>, <fpage>533</fpage>&#x02013;<lpage>554</lpage>. <pub-id pub-id-type="doi">10.1007/s10822-011-9440-2</pub-id><pub-id pub-id-type="pmid">21660515</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Walker</surname> <given-names>T.</given-names></name> <name><surname>Grulke</surname> <given-names>C. M.</given-names></name> <name><surname>Pozefsky</surname> <given-names>D.</given-names></name> <name><surname>Tropsha</surname> <given-names>A.</given-names></name></person-group> (<year>2010</year>). <article-title>Chembench: a cheminformatics workbench</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>3000</fpage>&#x02013;<lpage>3001</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq556</pub-id><pub-id pub-id-type="pmid">20889496</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Yu</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>ADMET evaluation in drug discovery. 12. Development of binary classification models for prediction of hERG potassium channel blockage</article-title>. <source>Mol. Pharm</source>. <volume>9</volume>, <fpage>996</fpage>&#x02013;<lpage>1010</lpage>. <pub-id pub-id-type="doi">10.1021/mp300023x</pub-id><pub-id pub-id-type="pmid">22380484</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Xiao</surname> <given-names>J.</given-names></name> <name><surname>Suzek</surname> <given-names>T. O.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Bryant</surname> <given-names>S. H.</given-names></name></person-group> (<year>2009</year>). <article-title>PubChem: a public information system for analyzing bioactivities of small molecules</article-title>. <source>Nucleic Acids Res</source>. <volume>37</volume>, <fpage>W623</fpage>&#x02013;<lpage>W633</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkp456</pub-id><pub-id pub-id-type="pmid">19498078</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Waring</surname> <given-names>M. J.</given-names></name> <name><surname>Arrowsmith</surname> <given-names>J.</given-names></name> <name><surname>Leach</surname> <given-names>A. R.</given-names></name> <name><surname>Leeson</surname> <given-names>P. D.</given-names></name> <name><surname>Mandrell</surname> <given-names>S.</given-names></name> <name><surname>Owen</surname> <given-names>R. M.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>An analysis of the attrition of drug candidates from four major pharmaceutical companies</article-title>. <source>Nat. Rev. Drug Discov</source>. <volume>14</volume>, <fpage>475</fpage>&#x02013;<lpage>486</lpage>. <pub-id pub-id-type="doi">10.1038/nrd4609</pub-id><pub-id pub-id-type="pmid">26091267</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weininger</surname> <given-names>D.</given-names></name></person-group> (<year>1988</year>). <article-title>SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules</article-title>. <source>J. Chem. Inf. Comput. Sci</source>. <volume>28</volume>, <fpage>31</fpage>&#x02013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1021/ci00057a005</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>J. J.</given-names></name> <name><surname>Henstock</surname> <given-names>P. V.</given-names></name> <name><surname>Dunn</surname> <given-names>M. C.</given-names></name> <name><surname>Smith</surname> <given-names>A. R.</given-names></name> <name><surname>Chabot</surname> <given-names>J. R.</given-names></name> <name><surname>de Graaf</surname> <given-names>D.</given-names></name></person-group> (<year>2008</year>). <article-title>Cellular imaging predictions of clinical drug-induced liver injury</article-title>. <source>Toxicol. Sci</source>. <volume>105</volume>, <fpage>97</fpage>&#x02013;<lpage>105</lpage>. <pub-id pub-id-type="doi">10.1093/toxsci/kfn109</pub-id><pub-id pub-id-type="pmid">18524759</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Dai</surname> <given-names>Z.</given-names></name> <name><surname>Chen</surname> <given-names>F.</given-names></name> <name><surname>Gao</surname> <given-names>S.</given-names></name> <name><surname>Pei</surname> <given-names>J.</given-names></name> <name><surname>Lai</surname> <given-names>L.</given-names></name></person-group> (<year>2015</year>). <article-title>Deep learning for drug-induced liver injury</article-title>. <source>J. Chem. Inf. Model</source>. <volume>55</volume>, <fpage>2085</fpage>&#x02013;<lpage>2093</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jcim.5b00238</pub-id><pub-id pub-id-type="pmid">26437739</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zheng</surname> <given-names>W.</given-names></name> <name><surname>Tropsha</surname> <given-names>A.</given-names></name></person-group> (<year>2000</year>). <article-title>Novel variable selection quantitative structure&#x02013;property relationship approach based on the k-nearest-neighbor principle</article-title>. <source>J. Chem. Inf. Comput. Sci</source>. <volume>40</volume>, <fpage>185</fpage>&#x02013;<lpage>194</lpage>. <pub-id pub-id-type="doi">10.1021/ci980033m</pub-id><pub-id pub-id-type="pmid">10661566</pub-id></citation></ref>
</ref-list>
<glossary>
<def-list>
<title>Abbreviations</title>
<def-item><term>Pgp</term>
<def><p>permeability glycoprotein</p></def></def-item>
<def-item><term>MDR</term>
<def><p>multidrug resistance</p></def></def-item>
</def-list>
</glossary>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> The authors were supported by the U.S. Army Medical Research and Materiel Command (Fort Detrick, MD) and by Defense Threat Reduction Agency grant CBCall14-CBS-05-2-0007.</p>
</fn>
</fn-group>
</back>
</article>