AUTHOR=Breitenbach Tim , Dandekar Thomas TITLE=Adaptive sampling methods facilitate the determination of reliable dataset sizes for evidence-based modeling JOURNAL=Frontiers in Bioinformatics VOLUME=Volume 5 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/bioinformatics/articles/10.3389/fbinf.2025.1528515 DOI=10.3389/fbinf.2025.1528515 ISSN=2673-7647 ABSTRACT=How can we be sure that there is sufficient data for our model, such that the predictions remain reliable on unseen data and the conclusions drawn from the fitted model would not vary significantly when using a different sample of the same size? We answer these and related questions through a systematic approach that examines the data size and the corresponding gains in accuracy. Assuming the sample data are drawn from a data pool with no data drift, the law of large numbers ensures that a model converges to its ground truth accuracy. Our approach provides a heuristic method for investigating the speed of convergence with respect to the size of the data sample. This relationship is estimated using sampling methods, which introduces a variation in the convergence speed results across different runs. To stabilize results—so that conclusions do not depend on the run—and extract the most reliable information encoded in the available data regarding convergence speed, the presented method automatically determines a sufficient number of repetitions to reduce sampling deviations below a predefined threshold, thereby ensuring the reliability of conclusions about the required amount of data.