ORIGINAL RESEARCH article
Front. Pharmacol.
Sec. Experimental Pharmacology and Drug Discovery
Volume 16 - 2025 | doi: 10.3389/fphar.2025.1639979
This article is part of the Research Topic: Advancing Drug Discovery with AI: Drug-Target Interactions, Mechanisms of Action, and Screening.
Leveraging Multimodal Learning for Enhanced Drug-Target Interaction Prediction
Provisionally accepted
North China Electric Power University, Beijing, China
The evolving landscape of artificial intelligence in drug discovery demands increasingly sophisticated approaches to predict drug-target interactions (DTIs) with high precision and generalizability. In alignment with the current surge of interest in AI-driven pharmacological modeling and integrative biomedical data analysis, this study introduces a multimodal framework that enhances DTI prediction by fusing heterogeneous data sources. Conventional methods typically rely on unimodal inputs such as chemical structures or protein sequences; they therefore fail to capture the complex, multi-faceted nature of biochemical interactions and adapt poorly across tasks or to incomplete datasets. These limitations impede generalization beyond narrow benchmarks and reduce interpretability when modalities are missing or noisy. To address these challenges, we propose a comprehensive multimodal learning pipeline composed of three principal innovations. First, the Unified Multimodal Molecule Encoder (UMME) jointly embeds molecular graphs, textual descriptions, transcriptomics, protein sequences, and bioassay data using modality-specific encoders followed by a hierarchical attention-based fusion strategy; the encoder aligns intra- and inter-modal representations while retaining the high-level semantic features critical for interaction prediction. Second, a robust training strategy named Adaptive Curriculum-guided Modality Optimization (ACMO) dynamically prioritizes more reliable or informative modalities during early training and gradually incorporates less certain data via a curriculum mechanism, allowing the model to maintain strong performance under modality absence or noise and thereby mimicking realistic drug screening conditions. Third, we employ a cross-modal contrastive alignment loss and modality dropout scheduling, which together enforce consistency and encourage generalization across diverse data settings. Experiments on multiple benchmark datasets demonstrate that our framework achieves state-of-the-art performance in drug-target affinity estimation and binding prediction, particularly under partial data availability. Ablation studies confirm that both the UMME and ACMO components improve accuracy and robustness.
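To make the fusion and modality-dropout ideas concrete, the following PyTorch sketch shows one way modality-specific encoders can be combined by attention while modalities are randomly withheld during training. It is an illustrative assumption rather than the authors' implementation: the class name AttentionFusionEncoder, the modality names, and all layer sizes are hypothetical, and the paper's hierarchical attention and curriculum-guided scheduling are simplified here to a single attention layer and a fixed dropout probability.

import torch
import torch.nn as nn

class AttentionFusionEncoder(nn.Module):
    """Fuses modality-specific embeddings with attention weights over modalities."""

    def __init__(self, modality_dims, hidden_dim=256):
        super().__init__()
        # One small encoder per modality; assumed to be MLPs here, whereas the
        # paper likely uses dedicated graph/sequence/text encoders.
        self.encoders = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            for name, dim in modality_dims.items()
        })
        self.attn_score = nn.Linear(hidden_dim, 1)  # scalar attention score per modality

    def forward(self, inputs, drop_prob=0.0):
        # Modality dropout: during training, randomly withhold modalities to
        # simulate missing data (the curriculum-style scheduling of drop_prob
        # described in the abstract is not modeled in this sketch).
        names = list(inputs.keys())
        if self.training and drop_prob > 0.0:
            kept = [n for n in names if torch.rand(1).item() >= drop_prob]
            names = kept or names[:1]  # always keep at least one modality
        h = torch.stack([self.encoders[n](inputs[n]) for n in names], dim=1)  # (B, M, H)
        weights = torch.softmax(self.attn_score(h), dim=1)                    # (B, M, 1)
        return (weights * h).sum(dim=1)                                       # fused (B, H)

# Toy usage with made-up feature sizes for three modalities.
model = AttentionFusionEncoder({"graph": 64, "text": 128, "protein": 96})
batch = {
    "graph": torch.randn(4, 64),
    "text": torch.randn(4, 128),
    "protein": torch.randn(4, 96),
}
fused = model(batch, drop_prob=0.2)
print(fused.shape)  # torch.Size([4, 256])

Because the attention weights are computed per example, the fused representation degrades gracefully when a modality is dropped: the remaining modalities simply receive all of the weight, which mirrors the partial-data setting the abstract targets.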
Keywords: multimodal learning, drug-target interaction, molecular encoding, curriculum learning, biomedical data fusion
Received: 03 Jun 2025; Accepted: 08 Sep 2025.
Copyright: © 2025 Sun. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Kaixin Sun, pasemhannay@hotmail.com
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.