ORIGINAL RESEARCH article

Front. Bioinform.

Sec. Integrative Bioinformatics

Volume 5 - 2025 | doi: 10.3389/fbinf.2025.1603133

This article is part of the Research Topic: From codes to cells to care, transforming health care with AI – Proceedings of the 20th Annual Meeting of the MidSouth Computational Biology and Bioinformatics Society (MCBIOS)

NeSyDPP4-QSAR: Discovering DPP-4 Inhibitors for Diabetes Treatment with a Neuro-symbolic AI Approach

Provisionally accepted
  • 1Department of Computer Science, College of Arts and Sciences, University of Alabama at Birmingham, Birmingham, United States
  • 2School of Graduate Biomedical Sciences, The University of Alabama at Birmingham, Birmingham, Alabama, United States

The final, formatted version of the article will be published soon.

Diabetes mellitus (DM) is a global epidemic and among the top ten leading causes of mortality (WHO, 2019), projected to rank seventh by 2030. The US National Diabetes Statistics Report (2021) states that 38.4 million Americans have diabetes. Dipeptidyl peptidase-4 (DPP-4) is the target of FDA-approved therapies for type 2 diabetes mellitus (T2DM). However, current DPP-4 inhibitors are associated with adverse effects, including gastrointestinal issues, severe joint pain (FDA safety warning report), nasopharyngitis, hypersensitivity, and nausea, so identifying novel inhibitors is essential. Moreover, direct in vivo assessment of DPP-4 inhibition is costly and impractical, which makes in silico IC50 prediction a viable alternative for assessing efficacy. Quantitative Structure-Activity Relationship (QSAR) modeling is a widely used computational approach for chemical substance assessment. To build a DPP-4 QSAR model, we employ a Logic Tensor Network (LTN), a neuro-symbolic approach, and compare it against a deep neural network (DNN) and a Transformer as baselines. DPP-4-related data were sourced from PubChem, ChEMBL, BindingDB, and GTP; after deduplication and thresholding, the dataset comprised 6,563 bioactivity records (SMILES-based compounds with IC50 values). A diverse set of features, including descriptors (CDK Extended-PaDEL), fingerprints (Morgan), chemical language model embeddings (ChemBERTa2), LLaMa 3.2 embeddings, and physicochemical properties, was used to train the NeSyDPP4-QSAR model. The best-performing model combined CDK Extended descriptors with Morgan fingerprints, achieving an accuracy of 0.9725, an F1-score of 0.9723, an ROC AUC of 0.9719, and a Matthews correlation coefficient (MCC) of 0.9446. Performance was benchmarked against the two baseline models as well as prior state-of-the-art (SOTA) models. We also conducted an external evaluation using the DTC dataset to ensure fair comparisons and assess model robustness. In this evaluation, NeSyDPP4 achieved strong performance, with an accuracy of 0.9579, an F1-score of 0.9577, an ROC AUC of 0.9565, and an MCC of 0.9171. Overall, the findings suggest that a neuro-symbolic strategy, which integrates neural network-based learning with symbolic reasoning, holds considerable potential for discovering drugs for diabetes mellitus and for classifying the biological activity of candidate inhibitors.
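For readers less familiar with the four classification metrics reported in the abstract (accuracy, F1-score, ROC AUC, and MCC), the following minimal sketch shows how each can be computed for a binary active/inactive classifier. It is an illustration only, not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions introduced here and are unrelated to the DPP-4 dataset.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = active and 0 = inactive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, tn, fp, fn):
    """Harmonic mean of precision and recall; tn is unused but kept for a uniform signature."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random active outscores a random inactive."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one sample of each class")
    # Ties between an active and an inactive score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the paper): four actives followed by four inactives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("F1-score:", f1_score(*counts))
print("MCC:", mcc(*counts))
print("ROC AUC:", roc_auc(y_true, scores))
```

Accuracy, F1-score, and MCC depend on the chosen decision threshold, whereas ROC AUC is computed from the ranking of scores. MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when class balance is a concern.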

Keywords: Neuro-symbolic AI, deep learning, Dipeptidyl peptidase-4, Drug-discovery, machine learning

Received: 31 Mar 2025; Accepted: 13 May 2025.

Copyright: © 2025 Hossain, Saghapour and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Jake Y Chen, Department of Computer Science, College of Arts and Sciences, University of Alabama at Birmingham, Birmingham, United States

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.