- ¹Sciome LLC, Research Triangle Park, NC, United States
- ²Division of Translational Toxicology at National Institutes of Environmental Health Sciences, Research Triangle Park, NC, United States
The development of new approach methodologies (NAMs) for next-generation risk assessment (NGRA) requires the integration of diverse data streams. A comprehensive understanding of a chemical’s potential hazard involves combining multiple mechanistic data, usually from in vitro and in silico studies, to build a coherent weight-of-evidence case. Currently, the lack of tools to effectively aggregate and navigate disparate datasets makes regulatory evaluation a challenging process. OrbiTox addresses this need by consolidating millions of data points from multiple domains, i.e., chemical properties, genes, pathways, and bioactivities, into an intuitive and interactive 3D visualization platform. To support comprehensive chemical assessments, OrbiTox incorporates hundreds of quantitative structure–activity relationship (QSAR) models for robust gap-filling of key endpoints. It also facilitates read-across by enabling the retrieval of data-rich chemical analogs with similar structures and metabolic profiles. By unifying experimental data and predictive models within a user-friendly interface, OrbiTox facilitates data-driven chemical safety assessments.
1 Introduction
The field of toxicology is undergoing a paradigm shift, moving from traditional animal testing toward the adoption of new approach methodologies (NAMs) (National Toxicology Program, 2018). Modern toxicological studies and next-generation risk assessment (NGRA) now rely on integrating extensive datasets from non-animal testing, such as in vitro assays, organs-on-chip, or transcriptomics datasets, to elucidate complex mechanistic pathways in approaches such as integrated approaches to testing and assessment (IATA) and read-across (Barrero-Canosa et al., 2025; Caloni et al., 2022; Roe et al., 2025).
Valuable data for these assessments are housed in disparate repositories such as PubChem (Kim et al., 2016), DrugBank (Wishart et al., 2018), ChEMBL (Gaulton et al., 2017), and GenBank (Benson et al., 2013), but this information is often siloed, stored in heterogeneous formats, and of variable quality. While existing platforms, such as the EPA CompTox Chemicals Dashboard (Williams et al., 2017), provide access to substantial information, they can be overwhelming and do not fully support the comprehensive exploration or integration of all necessary multi-source data. Consequently, a significant gap remains for a unified platform capable of integrating these multi-domain datasets.
To fill this gap, we built OrbiTox, a translational discovery platform that houses and interactively visualizes large amounts of multi-domain data, making it possible to extract novel knowledge from the connections across these diverse data types. OrbiTox is also enriched with predictive models to fill data gaps for untested chemicals.
Access to multi-domain data, predictive models, and cheminformatics methods in one software tool can make it easier to answer questions such as: what are the property profiles of chemicals similar to a specific structure of interest? What are the closest analogs of the chemical of interest, and what is their predicted and experimental toxicity profile? Which chemicals are likely to be active or inactive on a given target of interest for a particular toxicity?
2 Methods
2.1 Data content
Data in OrbiTox have been retrieved from publicly available resources and carefully cleaned and curated, with information on their sources retained. The data are structured into interconnected, concentric ‘orbits’ designed for navigation. The orbits are described below, from the outermost to the innermost in the navigation view.
The chemistry orbit houses ∼900,000 chemical substances and their names, SMILES, various identifiers with source links, macro classes, chemical size parameters, etc. A variety of cleaned and harmonized experimental toxicity data from different public sources are associated with a large number of these compounds, e.g., bacterial mutagenicity with and without S9 in OECD-recommended strains (∼6,000 compounds with ∼44,000 measurements), rodent and human carcinogenicity (∼1,300 compounds with ∼1,800 labels), oral, inhalation, and dermal acute toxicity (∼8,000 compounds with ∼16,000 values), ocular irritation (∼400 compounds with ∼700 labels), skin sensitization (∼1,000 compounds with ∼1,800 values), and outcomes in Tox21 and ToxCast assays (∼9,000 compounds with over 500,000 readouts). These substances are further categorized to make it easier to filter and focus on a desired set of chemicals, such as Tox21 chemicals (∼9,000), drugs (∼4,500), purine bases (∼7,000), steroids (∼6,500), and PFAS compounds (∼11,000).
The gene orbit contains gene names, synonyms, and chromosomal locations from the NCBI for ∼41,000 human genes. These genes have ∼1.7M connections, including linkages to chemicals (e.g., in vitro target assays) and other gene targets (e.g., protein–protein interactions).
The pathway orbit contains ∼2,000 annotated pathways with the names of their member genes. There are ∼44,000 connections between the pathway and gene orbits.
The organism orbit is populated with various in vivo toxicity studies representing ∼170 test organisms (or systems), including their genus, family, and life span. There are ∼80,000 connections to the chemical orbit.
2.2 Data organization and connections across orbits
Objects in each orbit are clustered based on their within-orbit similarity. Chemicals in the chemistry orbit are clustered by their pairwise Jaccard similarity computed on Saagar fingerprints (Sedykh et al., 2021). Genes in the gene orbit are clustered by co-expression, derived from gene-level raw read counts for 441,356 human transcriptomic samples from the ARCHS4 portal (Lachmann et al., 2018) that met a pre-defined criterion. Gene co-expression was quantified as the pairwise Pearson correlation coefficient (PCC) between the normalized read counts. The pathway orbit comprises annotated pathways taken mainly from the KEGG, BIOCARTA, and REACTOME databases (Broad Institute, 2025), clustered by the similarity of their gene membership. Organisms in the organism orbit are arranged by inter-organism distances defined by the phylogenetic tree. The order of the orbits loosely follows the complexity of their objects, placing the most complex objects (organisms) closest to the center and chemical structures in the outermost orbit.
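The two within-orbit similarity measures named above can be illustrated in a few lines of plain Python. This is a minimal sketch, not OrbiTox code: fingerprints are represented as hypothetical sets of "on" bit indices (the Saagar substructure definitions are not reproduced), reusing the same Jaccard function for pathway gene membership is our assumption (the text says only "similarity of gene membership"), and expression vectors are assumed to be already normalized. The clustering algorithm applied to these similarities is not specified in the text and is not shown.

```python
import math
from typing import Sequence, Set


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity |a & b| / |a | b| between two sets, e.g. the 'on' bits of
    two binary fingerprints or the member genes of two pathways."""
    union = len(a | b)
    if union == 0:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / union


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length expression vectors
    (one value per transcriptomic sample, assumed to be already normalized)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


# Hypothetical fingerprints (sets of 'on' bit indices) and pathway gene sets.
print(jaccard({1, 5, 9, 12}, {1, 5, 9, 30}))         # 3 shared bits out of 5 distinct bits
print(jaccard({"BECN1", "ATG5"}, {"ATG5", "ULK1"}))  # 1 shared gene out of 3 distinct genes
print(pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

A distance for clustering would typically be taken as one minus either similarity.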
Connections between orbits are visualized based on defined criteria, such as experimental data from assays or established biological relationships. For instance, a chemical is linked to a gene in the ‘gene’ orbit if it is active against that gene target. A gene is linked to a pathway in the ‘pathway’ orbit if it is a known member of that pathway. Finally, an organism is linked to a chemical when bioactivity data (such as toxic effects or phenotypic changes) are available for that chemical in that organism.
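The three linking rules above can be expressed as a small edge-building routine. This is a hypothetical sketch of the rules as stated, not the platform's data model: the `Edge` record, the input tuple layouts, and all identifiers in the example are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Edge:
    source: Tuple[str, str]  # (orbit, object id)
    target: Tuple[str, str]  # (orbit, object id)
    evidence: str            # why the connection exists


def build_edges(
    assay_calls: List[Tuple[str, str, bool]],
    pathway_members: Dict[str, List[str]],
    organism_records: List[Tuple[str, str, str]],
) -> List[Edge]:
    edges: List[Edge] = []
    # Chemical -> gene: only when the chemical was active against the gene target.
    for chemical, gene, active in assay_calls:
        if active:
            edges.append(Edge(("chemical", chemical), ("gene", gene), "active in target assay"))
    # Gene -> pathway: known membership of the gene in the pathway.
    for pathway, genes in pathway_members.items():
        for gene in genes:
            edges.append(Edge(("gene", gene), ("pathway", pathway), "pathway member"))
    # Organism -> chemical: any available in vivo bioactivity record.
    for organism, chemical, effect in organism_records:
        edges.append(Edge(("organism", organism), ("chemical", chemical), effect))
    return edges


# Hypothetical records; identifiers and effects are made up for illustration.
edges = build_edges(
    assay_calls=[("chem-1", "PPARD", True), ("chem-2", "PPARD", False)],
    pathway_members={"autophagy": ["BECN1", "ATG5"]},
    organism_records=[("rat", "chem-1", "phenotype change")],
)
for e in edges:
    print(e.source, "->", e.target, f"({e.evidence})")
```

Here the inactive call for `chem-2` produces no edge, so only evidence of activity creates a chemical-gene connection.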
2.3 QSAR models
To enrich OrbiTox, the current version offers over 150 robust, cross-validated QSAR models. These models serve as in silico NAMs that extend existing experimental data connections with computational predictions. They include models for Tox21 assays (at 100 μM and 10 μM concentration thresholds); bacterial mutagenicity models that predict the outcome of the Ames test in five OECD-recommended strains, both with (+S9) and without (-S9) exogenous metabolic activation; models based on ToxCast data that predict failure modes of cardiotoxicity; models based on carcinogenicity data from NTP technical reports and other regulatory agencies; and a rat oral TD50 model for nitrosamines. In addition to these QSAR predictions, the metabolites of a compound are predicted using SyGMa (Ridder and Wagener, 2008). The training sets have been thoroughly prepared and harmonized so that every modeled endpoint is concisely defined, in compliance with OECD “Principle 1” (a defined endpoint) for the regulatory acceptance of QSAR models (OECD, 2014). Models were built following best practices in the field (Fourches et al., 2010). Saagar molecular descriptors were chosen over ToxPrint and Mordred based on their performance in a benchmarking exercise on classification QSAR modeling (Supplementary Figure S1).
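The cross-validated performance estimate used in the descriptor benchmark (average AUROC over triplicate 5-fold cross-validation, per Supplementary Figure S1) can be sketched without external libraries. Several details are our assumptions rather than the authors' protocol: folds are random rather than stratified, AUROC is computed on the pooled out-of-fold predictions within each repetition and then averaged, and `fit_predict` is a hypothetical placeholder for any modeling method (descriptor calculation and the real learners are not shown).

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the Mann-Whitney probability that a random positive is scored
    higher than a random negative (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def repeated_kfold(n: int, k: int = 5, repeats: int = 3, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for `repeats` reshuffled rounds of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, folds[i]


FitPredict = Callable[[list, list, list], List[float]]


def cv_auroc(X: list, y: List[int], fit_predict: FitPredict, k: int = 5, repeats: int = 3, seed: int = 0) -> float:
    """Mean over repeats of the AUROC of pooled out-of-fold (left-out) predictions."""
    oof = [0.0] * len(y)
    per_repeat = []
    for fold_no, (train, test) in enumerate(repeated_kfold(len(y), k, repeats, seed)):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        for i, p in zip(test, preds):
            oof[i] = p
        if fold_no % k == k - 1:  # last fold of this repeat: every compound predicted once
            per_repeat.append(auroc(y, oof))
    return sum(per_repeat) / len(per_repeat)


# Toy 'model' that returns a single feature as its score (hypothetical data).
X = [0.1, 0.2, 0.8, 0.9, 0.3, 0.7, 0.05, 0.95, 0.4, 0.6]
y = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]
print(cv_auroc(X, y, lambda X_tr, y_tr, X_te: list(X_te)))
```

Comparing descriptor sets then amounts to calling `cv_auroc` with the same folds (same `seed`) and a `fit_predict` built on each descriptor set.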
3 Application description
OrbiTox interface: The user interacts with OrbiTox through a simple 3D interactive interface of menus and windows (Figure 1), designed for high-performance searching, filtering, and exploration of millions of multi-domain data points, with real-time visualization and instantaneous updates. The main display (Figure 2a) consists of interconnected chemistry (orange), gene (blue), pathway (green), and organism (red) ‘orbits.’ The search menu processes user queries and supports fuzzy matching and autocompletion. The filter menu restricts the main display to objects that meet a user-selected combination of textual and numerical constraints. The settings menu allows customization of node attributes, including size, shape, and color.
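The search and filter behavior described above can be approximated with the Python standard library. This is a hypothetical sketch, not OrbiTox's implementation: a sorted-prefix lookup with `bisect` stands in for autocompletion, `difflib.get_close_matches` stands in for fuzzy matching, and the filter combines case-insensitive substring constraints on text fields with inclusive ranges on numeric fields. The `SearchIndex` class, field names, and values are invented.

```python
import bisect
import difflib
from typing import Dict, List, Tuple


class SearchIndex:
    """Hypothetical in-memory index over object names from all orbits."""

    def __init__(self, names: List[str]):
        # Map lowercase key -> display name; keep keys sorted for prefix search.
        self._by_key = {n.lower(): n for n in names}
        self._keys = sorted(self._by_key)

    def autocomplete(self, prefix: str, limit: int = 5) -> List[str]:
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)  # first key >= prefix
        out: List[str] = []
        for key in self._keys[start:]:
            if not key.startswith(p) or len(out) == limit:
                break
            out.append(self._by_key[key])
        return out

    def fuzzy(self, query: str, limit: int = 3, cutoff: float = 0.6) -> List[str]:
        # Best matches first, by difflib's similarity ratio.
        keys = difflib.get_close_matches(query.lower(), self._keys, n=limit, cutoff=cutoff)
        return [self._by_key[k] for k in keys]


def apply_filter(objects: List[Dict], text: Dict[str, str], numeric: Dict[str, Tuple[float, float]]) -> List[Dict]:
    """Keep objects whose text fields contain the given substrings (case-insensitive)
    and whose numeric fields fall within inclusive (low, high) ranges."""
    def keep(obj: Dict) -> bool:
        for field, needle in text.items():
            if needle.lower() not in str(obj.get(field, "")).lower():
                return False
        for field, (low, high) in numeric.items():
            value = obj.get(field)
            if value is None or not (low <= value <= high):
                return False
        return True
    return [o for o in objects if keep(o)]


index = SearchIndex(["Ranirestat", "Ranitidine", "PPARD", "PPARG", "Autophagy"])
print(index.autocomplete("ppa"))   # prefix completion
print(index.fuzzy("ranirestatt"))  # tolerant of a typo
objs = [  # hypothetical objects and values
    {"name": "chem-1", "category": "PFAS", "mw": 414.1},
    {"name": "chem-2", "category": "drug", "mw": 300.0},
    {"name": "chem-3", "category": "PFAS", "mw": 120.0},
]
print(apply_filter(objs, {"category": "pfas"}, {"mw": (200, 500)}))
```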
Figure 2. (a) Organization of multi-domain data in OrbiTox clustered in concentric orbits: chemicals (orange), gene targets (blue), biological pathways (green), and test organisms (red); (b) source data, structural visualization, and experimental results of a compound available in OrbiTox; (c) for a new query compound, its computed physicochemical properties and most similar structure in OrbiTox; (d) generated report including chemical information, structural analogs, predicted toxicities, and read-across results.
OrbiTox Operation: After opening OrbiTox (www.orbitox.org), the user enters the name/ID/SMILES of a compound, a gene, a pathway, or an organism in the search window, depending on the use case of interest. OrbiTox locates the selected object in the appropriate orbit and moves the focus to it. Selecting an object populates the information window, which has three tabs (structure, data, and connections), with relevant information. For example, when a chemical is queried by an accepted ID, OrbiTox lets the user interactively visualize it (red dot, Figure 1) among chemicals with similar structures. The connections tab provides information on its experimental data (toxicity; ADME, i.e., absorption, distribution, metabolism, and excretion; and pharmacology) and on the biological targets and organisms it interacts with (Figure 2b). If the query is a SMILES string, OrbiTox treats it as a new chemical and predicts its outcomes in hundreds of bioactivity assays with QSAR models that provide chemistry-backed reasoning for each prediction (Figure 2c). A customized report, in printable or machine-readable form (Figure 2d), can be generated from the Report menu.
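The branch between focusing on a known object and running predictions for a new structure might be sketched as follows. This is an assumption-laden illustration: the lookup-first order, the hypothetical `index` dictionary, and the simplified SMILES check (a tokenizer for a subset of the SMILES grammar plus a parenthesis balance test) are ours. It does not validate ring closures or valences; a production system would parse the string with a cheminformatics toolkit instead.

```python
import re
from typing import Dict

# Tokens of a simplified SMILES grammar: organic-subset atoms, aromatic atoms,
# bracket atoms, two-digit ring closures, single-digit ring closures,
# bonds, branches, and the '.' component separator.
_SMILES_TOKEN = r"(?:Cl|Br|\[[^\[\]]+\]|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[-=#$:/\\.()])"
_SMILES_RE = re.compile(_SMILES_TOKEN + "+")


def looks_like_smiles(text: str) -> bool:
    """Heuristic: the whole string tokenizes under the simplified grammar
    and its branch parentheses are balanced."""
    if not _SMILES_RE.fullmatch(text):
        return False
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def route_query(query: str, index: Dict[str, str]) -> str:
    """Return 'focus:<id>' for a known name/ID, 'predict' for a new structure,
    or 'not_found'. `index` maps lowercase names/IDs to object ids (hypothetical)."""
    q = query.strip()
    hit = index.get(q.lower())
    if hit is not None:
        return f"focus:{hit}"
    if looks_like_smiles(q):
        return "predict"
    return "not_found"


idx = {"ranirestat": "chem-001", "ppard": "gene-001"}  # hypothetical ids
print(route_query("Ranirestat", idx))
print(route_query("CCCN1CCN(N=O)CC1", idx))  # 1-nitroso-4-propylpiperazine
print(route_query("aspirin", idx))
```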
The user can easily navigate through various menus and options of the user-friendly interface of OrbiTox to conduct desired investigations for objects from any of the four connected domains. Several common applications of OrbiTox, useful in an NGRA process, are listed below.
1. Extract all experimental data available for a query chemical (query entered as the chemical’s name or an accepted ID).
2. Generate a property profile of a chemical based on over a hundred validated QSAR models (query entered as a SMILES string).
3. Visualize data-rich compounds with structures similar to that of the query (the number and similarity distance of similar structures are customizable).
4. Visualize structural features that are responsible for differences in properties of compounds despite being similar in structure.
5. Analyze why, despite similarity in structures, two chemicals have different property profiles.
6. Find correlations between data for a set of compounds tested in two different biological assays for efficient screening, e.g., to select an in vitro assay over an animal assay.
7. Collect data for modeling with compounds that interact with a gene target, such as PPARD, or that have been tested in a particular organism, e.g., Salmonella Typhimurium strain TA100.
8. Identify member genes of a pathway to find potential new therapeutic targets, e.g., genes BECN1 and ATG5 of the autophagy pathway as targets for breast cancer treatment.
9. Conduct a read-across assessment by identifying an analog with similarity in structure and similar physicochemical, biochemical, and metabolic profiles (Figure 1), e.g., N-nitroso-N′-methylpiperazine as a source analog for 1-nitroso-4-propylpiperazine (SMILES, CCCN1CCN(N=O)CC1).
10. Assess potential liabilities of a nanomolar pharmacologically active hit before advancing it, to plan testing strategies, e.g., test Ranirestat (an AKR1B1 inhibitor with IC50 = 15 nM) in a mitochondrial pathway activation assay, in which it is predicted to be an activator with probability 0.832.
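Applications 3 and 9 both start from a similarity-ranked list of data-rich analogs. A minimal sketch, assuming fingerprints are sets of "on" bits and that "data-rich" means a minimum count of experimental data points; `k`, `min_sim`, and `min_data` are hypothetical parameters standing in for the customizable number and similarity of analogs. It ranks by structure only, whereas the read-across in item 9 also weighs physicochemical, biochemical, and metabolic similarity, which is not shown.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def nearest_analogs(
    query_fp: Set[int],
    library: Dict[str, Set[int]],
    data_counts: Dict[str, int],
    k: int = 5,
    min_sim: float = 0.5,
    min_data: int = 1,
) -> List[Tuple[str, float]]:
    """Return up to k (name, similarity) pairs, most similar first, restricted to
    library compounds with similarity >= min_sim and >= min_data data points."""
    hits = []
    for name, fp in library.items():
        if data_counts.get(name, 0) < min_data:
            continue  # skip data-poor compounds: they cannot serve as sources
        sim = jaccard(query_fp, fp)
        if sim >= min_sim:
            hits.append((name, sim))
    hits.sort(key=lambda t: (-t[1], t[0]))  # similarity descending, then name
    return hits[:k]


# Hypothetical fingerprints and data counts.
library = {
    "analog-A": {1, 2, 3, 4, 5},
    "analog-B": {1, 2, 9},
    "analog-C": {1, 2, 3, 4},
    "analog-D": {1, 2, 3, 7},
}
data_counts = {"analog-A": 12, "analog-B": 30, "analog-C": 0, "analog-D": 3}
print(nearest_analogs({1, 2, 3, 4}, library, data_counts))
```

In this example `analog-C` is structurally identical to the query but is excluded because it has no experimental data to read across.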
4 Conclusion
OrbiTox is a web-based translational discovery platform (https://orbitox.org) that enables multi-domain data exploration within an interactive 3D environment. By facilitating connections between data from different domains, the platform opens novel avenues for NGRAs. In addition, OrbiTox uniquely integrates diverse bioactivity data from both in vivo and in vitro sources. Direct comparison of animal and non-animal data is important for building confidence in and establishing the relevance of NAMs, ultimately supporting the transition to animal-free safety testing. Finally, the OrbiTox architecture may be expanded to incorporate more domains, data, and QSAR models, including proprietary data and models.
Data availability statement
Publicly available datasets were analyzed in this study. These data can be found here: https://orbitox.org.
Author contributions
AR: Methodology, Software, Visualization, Writing – review and editing. VG: Conceptualization, Data curation, Project administration, Validation, Writing – original draft, Writing – review and editing. AS: Conceptualization, Data curation, Methodology, Writing – review and editing. AG: Formal analysis, Validation, Writing – review and editing. AB: Validation, Writing – review and editing. BK: Methodology, Visualization, Writing – review and editing. JP: Software, Writing – review and editing. MS: Resources, Software, Writing – review and editing. DP: Data curation, Methodology, Writing – review and editing. DM: Methodology, Writing – review and editing. MB-M: Data curation, Methodology, Writing – review and editing. BH: Methodology, Writing – review and editing. RS: Funding acquisition, Project administration, Validation, Writing – review and editing. NK: Supervision, Writing – review and editing. WC: Supervision, Writing – review and editing.
Funding
The authors declare that financial support was received for the research and/or publication of this article. The research reported in this publication was supported, in part, by the National Institute of Environmental Health Sciences of the National Institutes of Health under award number 75N96024C00003. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Conflict of interest
Authors AR, VG, AS, AG, AB, BK, JP, MS, DP, DM, MB-M, BH, and RS were employed by Sciome LLC.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.
Generative AI statement
The authors declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2025.1710864/full#supplementary-material
SUPPLEMENTARY FIGURE S1 | Average AUROC for the predictions of left-out compounds in triplicate 5-fold cross-validation by different modeling methods using Saagar, ToxPrint, and Mordred descriptors for the Ames bacterial mutagenicity QSAR models. Saagar performs as well as or better than ToxPrint and Mordred across modeling methods.
References
Barrero-Canosa, J., Ebeling, J., Kenny, E. F., Marx-Stoelting, P., Paege, N., Feustel, S., et al. (2025). Human health risk assessment for microbial pesticides in the EU: challenges and perspectives. Environ. Health A Glob. Access Sci. Source 24, 43. doi:10.1186/s12940-025-01196-1
Benson, D. A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., et al. (2013). GenBank. Nucleic Acids Res. 41, D36–D42. doi:10.1093/nar/gks1195
Broad Institute (2025). Molecular signatures database (MSigDB). Available online at: https://www.gsea-msigdb.org/gsea/index.jsp (Accessed June 7, 2025).
Caloni, F., De Angelis, I., and Hartung, T. (2022). Replacement of animal testing by integrated approaches to testing and assessment (IATA): a call for in vivitrosi. Archives Toxicol. 96, 1935–1950. doi:10.1007/s00204-022-03299-x
Fourches, D., Muratov, E., and Tropsha, A. (2010). Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J. Chem. Inf. Model 50, 1189–1204. doi:10.1021/ci100176x
Gaulton, A., Hersey, A., Nowotka, M., Bento, A. P., Chambers, J., Mendez, D., et al. (2017). The ChEMBL database in 2017. Nucleic Acids Res. 45, D945–D954. doi:10.1093/nar/gkw1074
Kim, S., Thiessen, P. A., Bolton, E. E., Chen, J., Fu, G., Gindulyte, A., et al. (2016). PubChem substance and compound databases. Nucleic Acids Res. 44, D1202–D1213. doi:10.1093/nar/gkv951
Lachmann, A., Torre, D., Keenan, A. B., Jagodnik, K. M., Lee, H. J., Wang, L., et al. (2018). Massive mining of publicly available RNA-Seq data from human and mouse. Nat. Commun. 9, 1366. doi:10.1038/s41467-018-03751-6
National Toxicology Program (2018). A strategic roadmap for establishing new approaches to evaluate the safety of chemicals and medical products in the United States. doi:10.22427/NTP-ICCVAM-ROADMAP2018
OECD (2014). Guidance document on the validation of (quantitative) structure-activity relationship [(Q)SAR] models. Paris: OECD Publishing. doi:10.1787/9789264085442-en
Ridder, L., and Wagener, M. (2008). SyGMa: combining expert knowledge and empirical scoring in the prediction of metabolites. ChemMedChem 3, 821–832. doi:10.1002/cmdc.200700312
Roe, H. M., Tsai, H. H. D., Ball, N., Wright, F. A., Chiu, W. A., and Rusyn, I. (2025). A systematic analysis of read-across adaptations in testing proposal evaluations by the European Chemicals Agency. ALTEX 42, 22–38. doi:10.14573/altex.2408292
Sedykh, A. Y., Shah, R. R., Kleinstreuer, N. C., Auerbach, S. S., and Gombar, V. K. (2021). Saagar-A new, extensible set of molecular substructures for QSAR/QSPR and read-across predictions. Chem. Res. Toxicol. 34, 634–640. doi:10.1021/acs.chemrestox.0c00464
Williams, A. J., Grulke, C. M., Edwards, J., McEachran, A. D., Mansouri, K., Baker, N. C., et al. (2017). The CompTox chemistry dashboard: a community data resource for environmental chemistry. J. Cheminform 9, 61. doi:10.1186/s13321-017-0247-6
Keywords: web application, read-across, new approach methodologies, non-animal methods, quantitative structure–activity relationship, computational toxicology
Citation: Ross A, Gombar V, Sedykh A, Green AJ, Borrel A, Kidd B, Phillips J, Shah M, Phadke D, Mav D, Balik-Meisner M, Howard B, Shah R, Kleinstreuer NC and Casey WM (2025) OrbiTox: a visualization platform for NAMs and read-across exploration of multi-domain data. Front. Pharmacol. 16:1710864. doi: 10.3389/fphar.2025.1710864
Received: 22 September 2025; Accepted: 10 November 2025;
Published: 01 December 2025.
Edited by:
Sergey Sosnin, University of Vienna, Austria
Reviewed by:
Andy Nong, Health Canada, Canada
Ugis Sarkans, European Bioinformatics Institute (EMBL-EBI), United Kingdom
Copyright © 2025 Ross, Gombar, Sedykh, Green, Borrel, Kidd, Phillips, Shah, Phadke, Mav, Balik-Meisner, Howard, Shah, Kleinstreuer and Casey. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Vijay Gombar, vijay.gombar@sciome.com
†Present addresses: Nicole Kleinstreuer, NIH OD Division of Program Coordination, Planning, and Strategic Initiatives (DPCPSI), Bethesda, MD, United States
Warren Casey, NIH OD Division of Program Coordination, Planning, and Strategic Initiatives (DPCPSI), Bethesda, MD, United States
Austin Ross¹