ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Medicine and Public Health

Volume 8 - 2025 | doi: 10.3389/frai.2025.1599412

BioBricks.ai: A Versioned Data Registry for Life Sciences Data Assets

Provisionally accepted
Yifan Gao1, Zakariyya Mughal2, Jose A Jaramillo-Villegas3, Marie Corradi4, Alexandre Borrel5, Ben Lieberman2, Suliman Sharif2, John Shaffer2, Karamarie Fecho6, Ajay Chatrath7, Alexandra Maertens1, Marc Teunis4, Nicole C Kleinstreuer8, Thomas Hartung1, Thomas Luechtefeld2*
  • 1Johns Hopkins University, Baltimore, United States
  • 2Insilica, LLC, Bethesda, Maryland, United States
  • 3Facultad de Ciencias de la Salud, Universidad Tecnologica de Pereira, Pereira, Risaralda, Colombia
  • 4HU University of Applied Sciences Utrecht, Utrecht, Netherlands
  • 5Inotiv, Morrisville, New York, United States
  • 6University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States
  • 7Washington University in St. Louis, St. Louis, Missouri, United States
  • 8National Institute of Environmental Health Sciences (NIH), Durham, North Carolina, United States

The final, formatted version of the article will be published soon.

Researchers in biomedicine, public health, and the life sciences often spend weeks or months discovering, accessing, curating, and integrating data from disparate sources, which significantly delays the start of actual analysis and innovation. Instead of countless developers creating redundant and inconsistent data pipelines, BioBricks.ai offers a centralized data repository and a suite of developer-friendly tools to simplify access to scientific data. Currently, BioBricks.ai delivers over ninety biological and chemical datasets. It provides a package manager-like system for installing and managing dependencies on data sources. Each 'brick' is a Data Version Control git repository that supports an updateable pipeline for extracting, transforming, and loading data into the BioBricks.ai backend at https://biobricks.ai. Use cases include accelerating data science workflows and facilitating the creation of novel data assets by integrating multiple datasets into unified, harmonized resources. In conclusion, BioBricks.ai offers an opportunity to accelerate access to and use of public data through a single open platform.

Keywords: public health data, BioBricks.ai, data integration, machine learning, cheminformatics, bioinformatics

Received: 24 Mar 2025; Accepted: 14 Jul 2025.

Copyright: © 2025 Gao, Mughal, Jaramillo-Villegas, Corradi, Borrel, Lieberman, Sharif, Shaffer, Fecho, Chatrath, Maertens, Teunis, Kleinstreuer, Hartung and Luechtefeld. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Thomas Luechtefeld, Insilica, LLC, Bethesda, 20817, Maryland, United States

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.