ORIGINAL RESEARCH article

Front. Bioinform.

Sec. Drug Discovery in Bioinformatics

This article is part of the Research Topic: Methods, Tools and Algorithms in Drug Discovery Bioinformatics

An Efficient Computational Chemistry Approach to Generating Negative Data for Drug Discovery Pipeline Validation

Provisionally accepted
  • Medical University Sofia, Sofia, Bulgaria

The final, formatted version of the article will be published soon.

Modern virtual high-throughput screening (VHTS) pipelines can be suboptimally validated, with no rigorous studies conclusively demonstrating that each of their steps reliably adds enrichment over the baseline random hit rate. Moreover, the few benchmarking studies that are available focus primarily on the docking stage, which usually lies at or near the beginning of a pipeline, and even there authors tend to use flawed data sets that artificially inflate performance metrics. Herein, we present an alternative approach to pipeline validation and data set generation that requires no additional experimental work or expenditure, yet yields large amounts of negative data for use in VHTS pipeline validation. By randomizing ligands across published experimental structures and generating structural isomers of known binders, practically unlimited amounts of negative data can be generated. The resulting sets of positive and negative data points match closely in key molecular properties and are well suited to pipeline validation. Once generated, such sets should be run through any proposed pipeline, with performance assessed at every step. We stress the importance of using negative data of adequate quality and quantity in validation studies to demonstrate, definitively and verifiably, the utility of a given tool or workflow. Our goal is to help distinguish tools and pipelines that truly accelerate hit discovery and lead optimization from those that promise to do so but do not, so that academia and industry can begin to tackle the many unaddressed medical needs of the 21st century.
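To make the two ideas in the abstract concrete (randomizing ligands across structures and checking enrichment at a pipeline step), the following is a minimal Python sketch. It is not the article's implementation, which the abstract does not describe. The function names `shuffle_pairs` and `enrichment_factor`, the receptor and ligand identifiers, and the `top_fraction` cutoff are all hypothetical, and the sketch assumes that any receptor-ligand pairing absent from the known positives can be treated as a presumed negative.

```python
import random


def shuffle_pairs(pairs, seed=0, max_tries=1000):
    """Reassign ligands to receptors so that no resulting pair is a known positive.

    `pairs` is a list of (receptor_id, ligand_id) tuples taken from
    experimental structures. The returned pairs are only *presumed*
    negatives: a mismatched ligand is assumed not to bind.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to shuffle")
    rng = random.Random(seed)
    known_positives = set(pairs)
    receptors = [receptor for receptor, _ in pairs]
    ligands = [ligand for _, ligand in pairs]
    for _ in range(max_tries):
        shuffled = ligands[:]
        rng.shuffle(shuffled)
        candidates = list(zip(receptors, shuffled))
        # Accept only if every candidate pair differs from all known positives.
        if not any(pair in known_positives for pair in candidates):
            return candidates
    raise RuntimeError("no assignment free of known positives was found")


def enrichment_factor(scores, labels, top_fraction=0.1, higher_is_better=True):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    `labels` holds 1 for a positive and 0 for a negative data point.
    A value near 1.0 means the step does no better than random selection.
    """
    n_total = len(scores)
    n_positive = sum(labels)
    if n_total == 0 or n_positive == 0:
        raise ValueError("need at least one data point and one positive")
    n_top = max(1, round(n_total * top_fraction))
    ranked = sorted(zip(scores, labels), key=lambda item: item[0],
                    reverse=higher_is_better)
    top_positive = sum(label for _, label in ranked[:n_top])
    return (top_positive / n_top) / (n_positive / n_total)


if __name__ == "__main__":
    positives = [("recA", "lig1"), ("recB", "lig2"), ("recC", "lig3")]
    print(shuffle_pairs(positives, seed=1))

    # Hypothetical scores from one pipeline step; two positives ranked first.
    step_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    step_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(enrichment_factor(step_scores, step_labels, top_fraction=0.2))
```

In a validation run along the lines the abstract suggests, `enrichment_factor` would be computed after each step of the pipeline on the combined positive and negative set, so that a step which fails to raise enrichment above the previous one becomes visible.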

Keywords: Cheminformatics, gnina, MM-PBSA, molecular docking, molecular dynamics simulation, negative data, recovery plots, vHTS

Received: 28 Nov 2025; Accepted: 28 Jan 2026.

Copyright: © 2026 Ivanov. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Stefan Ivanov

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.