ORIGINAL RESEARCH article
Front. Bioinform.
Sec. Integrative Bioinformatics
Volume 5 - 2025 | doi: 10.3389/fbinf.2025.1603133
This article is part of the Research Topic: From codes to cells to care, transforming health care with AI – Proceedings of the 20th Annual Meeting of the MidSouth Computational Biology and Bioinformatics Society (MCBIOS)
NeSyDPP4-QSAR: Discovering DPP-4 Inhibitors for Diabetes Treatment with a Neuro-symbolic AI Approach
Provisionally accepted
1 Department of Computer Science, College of Arts and Sciences, University of Alabama at Birmingham, Birmingham, United States
2 School of Graduate Biomedical Sciences, The University of Alabama at Birmingham, Birmingham, Alabama, United States
Diabetes mellitus (DM) is a global epidemic and among the ten leading causes of mortality (WHO, 2019), and it is projected to rank seventh by 2030. The US National Diabetes Statistics Report (2021) states that 38.4 million Americans have diabetes. Dipeptidyl peptidase-4 (DPP-4) is the target of FDA-approved therapies for type 2 diabetes mellitus (T2DM). However, current DPP-4 inhibitors are associated with adverse effects, including gastrointestinal issues, severe joint pain (reported in an FDA safety warning), nasopharyngitis, hypersensitivity, and nausea, so identifying novel inhibitors is essential. Because direct in vivo assessment of DPP-4 inhibition is costly and impractical, in silico IC50 prediction offers a viable alternative for assessing efficacy. Quantitative structure-activity relationship (QSAR) modeling is a widely used computational approach for assessing chemical substances. To build a DPP-4 QSAR model, we employ Logic Tensor Networks (LTN), a neuro-symbolic approach, and compare its performance against a deep neural network (DNN) and a Transformer as baselines. DPP-4 bioactivity data were sourced from PubChem, ChEMBL, BindingDB, and GTP; after deduplication and thresholding, the dataset comprised 6,563 bioactivity records (SMILES-encoded compounds with IC50 values). A diverse set of features, including descriptors (CDK Extended-PaDEL), fingerprints (Morgan), chemical language model embeddings (ChemBERTa2), Llama 3.2 embeddings, and physicochemical properties, is used to train the NeSyDPP4-QSAR model. The best-performing configuration, which combines CDK Extended and Morgan fingerprints, achieved an accuracy of 0.9725, an F1-score of 0.9723, a ROC AUC of 0.9719, and a Matthews correlation coefficient (MCC) of 0.9446. Performance was benchmarked against the two baseline models (the DNN and the Transformer) as well as prior state-of-the-art (SOTA) models. To ensure fair comparisons and assess model robustness, we also conducted an external evaluation on the DTC dataset.
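The abstract mentions deduplication and thresholding of pooled IC50 records without giving details. The minimal Python sketch below shows one common way to do this, under assumptions that are not taken from the paper: SMILES strings are already canonicalized, duplicate measurements of a compound are collapsed to their median IC50, and a compound is labeled active at or below a hypothetical cutoff of 100 nM. The names `build_labels` and `ACTIVE_CUTOFF_NM` are illustrative only.

```python
from collections import defaultdict
from statistics import median

# Hypothetical activity cutoff (assumption): the abstract mentions thresholding
# but does not state the value, so 100 nM is purely illustrative.
ACTIVE_CUTOFF_NM = 100.0


def build_labels(records, cutoff_nm=ACTIVE_CUTOFF_NM):
    """Deduplicate pooled IC50 records and assign binary activity labels.

    records: iterable of (smiles, ic50_nm) pairs merged from several sources.
    Assumes SMILES are already canonical, so identical strings mean the same compound.
    Returns a dict mapping SMILES -> 1 (active) or 0 (inactive).
    """
    grouped = defaultdict(list)
    for smiles, ic50_nm in records:
        if ic50_nm is None or ic50_nm <= 0:
            continue  # drop missing or non-physical measurements
        grouped[smiles].append(ic50_nm)

    labels = {}
    for smiles, values in grouped.items():
        # Collapse duplicate measurements to their median, then threshold.
        labels[smiles] = 1 if median(values) <= cutoff_nm else 0
    return labels


if __name__ == "__main__":
    pooled = [
        ("CCO", 50.0), ("CCO", 150.0), ("CCO", 80.0),  # three measurements, median 80 nM
        ("c1ccccc1", 5000.0),                          # weak binder
        ("CCN", None),                                 # missing value, dropped
    ]
    print(build_labels(pooled))
```

The median is used here because it is robust to a single outlying assay; a geometric mean on the pIC50 scale is an equally common alternative, and real pipelines would canonicalize SMILES (for example with a cheminformatics toolkit) before grouping.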
In the external DTC evaluation, NeSyDPP4 performed strongly, with an accuracy of 0.9579, a ROC AUC of 0.9565, a Matthews correlation coefficient (MCC) of 0.9171, and an F1-score of 0.9577. Overall, these findings suggest that integrating a neuro-symbolic strategy (neural network-based learning combined with symbolic reasoning) holds considerable potential for discovering DPP-4 inhibitors for diabetes treatment and for classifying compound bioactivity against DPP-4.
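To make the neuro-symbolic idea more concrete, the sketch below shows how a Logic Tensor Networks style objective expresses labeled data as logical axioms and trains by maximizing their fuzzy truth value. It is a simplified, dependency-free illustration, not the authors' implementation: the logistic unit `predicate_active` is a hypothetical stand-in for a neural network, the two axioms (every active compound satisfies Active; every inactive compound satisfies not Active) are assumed, and universal quantification is approximated with a p-mean-error aggregator using a hypothetical p = 2.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predicate_active(x, w, b):
    """Hypothetical neural predicate Active(x): a logistic unit standing in for a real network."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def not_(t):
    """Standard fuzzy negation."""
    return 1.0 - t


def forall_pmean_error(truths, p=2):
    """Universal quantifier as 1 minus the p-mean of the errors (1 - truth)."""
    errors = [(1.0 - t) ** p for t in truths]
    return 1.0 - (sum(errors) / len(errors)) ** (1.0 / p)


def sat_loss(pos, neg, w, b, p=2):
    """Loss = 1 - aggregated satisfaction of the (assumed) labeling axioms."""
    # Axiom 1: for all x among actives, Active(x)
    ax1 = forall_pmean_error([predicate_active(x, w, b) for x in pos], p)
    # Axiom 2: for all x among inactives, not Active(x)
    ax2 = forall_pmean_error([not_(predicate_active(x, w, b)) for x in neg], p)
    # Aggregate the axiom truth values with the same operator.
    sat = forall_pmean_error([ax1, ax2], p)
    return 1.0 - sat


if __name__ == "__main__":
    # Toy two-feature vectors (hypothetical stand-ins for fingerprint bits).
    pos = [[1.0, 0.0], [0.9, 0.1]]
    neg = [[0.0, 1.0], [0.1, 0.8]]
    print(sat_loss(pos, neg, w=[4.0, -4.0], b=0.0))  # weights that separate the classes
    print(sat_loss(pos, neg, w=[-4.0, 4.0], b=0.0))  # reversed weights
```

With these toy inputs, weights that separate the classes give a loss close to 0 and reversed weights give a loss close to 1. In a full model, this loss would be minimized by gradient descent over the network parameters, and further symbolic rules could be added as extra axioms in the same aggregate.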
Keywords: Neuro-symbolic AI, deep learning, Dipeptidyl peptidase-4, drug discovery, machine learning
Received: 31 Mar 2025; Accepted: 13 May 2025.
Copyright: © 2025 Hossain, Saghapour and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jake Y Chen, Department of Computer Science, College of Arts and Sciences, University of Alabama at Birmingham, Birmingham, United States
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.