MoltiTox: A MultiModal Fusion Model for Molecular Toxicity Prediction

Park, Junwoo; Lee, Sujee

doi:10.3389/ftox.2025.1720651

ORIGINAL RESEARCH article

Front. Toxicol.

Sec. Computational Toxicology and Informatics

Provisionally accepted

Junwoo Park

Sujee Lee*

Sungkyunkwan University, Jongno-gu, Republic of Korea

The final, formatted version of the article will be published soon.

We introduce MoltiTox, a novel multimodal fusion model for molecular toxicity prediction that addresses the limitations of single-modality approaches in drug discovery. MoltiTox integrates four complementary data types: molecular graphs, SMILES strings, 2D images, and 13C NMR spectra. Each input is processed by its own modality-specific encoder: a Graph Isomorphism Network, a Transformer, a 2D CNN, and a 1D CNN. The resulting heterogeneous embeddings are fused through an attention-based mechanism, allowing the model to capture complementary structural and chemical information from multiple molecular perspectives. Evaluated on the Tox21 benchmark across 12 endpoints, MoltiTox achieves a ROC-AUC of 0.831, outperforming its single-modality counterparts. These findings indicate that integrating diverse molecular representations enhances both the robustness and the generalizability of toxicity prediction models. Beyond predictive performance, the 13C NMR data offer complementary chemical insights that are not fully captured by structure- or language-based representations, suggesting a potential contribution to the mechanistic understanding of molecular toxicity. By demonstrating how multimodal integration enriches molecular representations and enhances the interpretability of toxicity mechanisms, MoltiTox provides an extensible framework for developing more reliable models in computational toxicology.
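The attention-based fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion layer, so everything here is an assumption made for illustration: the shared embedding size `EMBED_DIM`, the single query vector, the scaled dot-product scoring, and the helper names `attention_fuse` and `predict_endpoints` are hypothetical and are not the authors' implementation. Random vectors stand in for the outputs of the four encoders.

```python
import math
import random

MODALITIES = ["graph", "smiles", "image", "nmr"]  # the four inputs named in the abstract
EMBED_DIM = 8        # assumed shared embedding size (hypothetical)
NUM_ENDPOINTS = 12   # Tox21 endpoints, as stated in the abstract


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_fuse(embeddings, query):
    """Assumed fusion rule: weight each modality embedding by
    softmax(query . embedding / sqrt(d)) and sum the weighted embeddings."""
    scale = math.sqrt(len(query))
    scores = [dot(query, embeddings[name]) / scale for name in MODALITIES]
    weights = softmax(scores)
    fused = [
        sum(w * embeddings[name][i] for w, name in zip(weights, MODALITIES))
        for i in range(len(query))
    ]
    return fused, dict(zip(MODALITIES, weights))


def predict_endpoints(fused, head_weights, head_biases):
    """One sigmoid output per endpoint (multi-label classification)."""
    return [
        1.0 / (1.0 + math.exp(-(dot(w, fused) + b)))
        for w, b in zip(head_weights, head_biases)
    ]


rng = random.Random(0)
# Stand-ins for encoder outputs; real encoders (GIN, Transformer, 2D CNN, 1D CNN)
# would produce these vectors from the molecule.
embeddings = {name: [rng.gauss(0, 1) for _ in range(EMBED_DIM)] for name in MODALITIES}
query = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]  # would be a learned parameter
head_weights = [[rng.gauss(0, 0.1) for _ in range(EMBED_DIM)] for _ in range(NUM_ENDPOINTS)]
head_biases = [0.0] * NUM_ENDPOINTS

fused, weights = attention_fuse(embeddings, query)
probs = predict_endpoints(fused, head_weights, head_biases)
```

In this sketch, `weights` maps each modality to its attention weight for one molecule, which is one simple way to inspect how much each representation contributes to a prediction; whether MoltiTox exposes its weights this way is not stated in the abstract.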

Keywords: multimodal learning, deep learning, toxicity prediction, Tox21, 13C NMR spectra, drug discovery, cheminformatics

Received: 08 Oct 2025; Accepted: 21 Nov 2025.

Copyright: © 2025 Park and Lee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Sujee Lee, sujeelee@skku.edu
