ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Medicine and Public Health
Volume 8 - 2025 | doi: 10.3389/frai.2025.1561292
This article is part of the Research Topic: Advancing Human Well-being: Environment-Focused AI Technologies
Systematic Analysis of Hepatotoxicity: Combining Literature Mining and AI Language Models
Provisionally accepted
- 1 MicroDiscovery GmbH, Berlin, Baden-Württemberg, Germany
- 2 Department Toxicogenomics, Maastricht University, Maastricht, Netherlands
- 3 Department Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
The body of toxicological knowledge and literature is expanding at an accelerating pace. This rapid growth presents significant challenges for researchers, who must stay abreast of the latest studies while also synthesizing the vast amount of published information.

Our goal is to automatically identify potential hepatotoxicants among more than 50,000 compounds using the wealth of scientific publications and knowledge.

We employ and compare three distinct methods for automatic information extraction from unstructured text: (1) text mining, (2) word embeddings, and (3) large language models. These approaches are combined to calculate a hepatotoxicity score for each compound. We assess the performance of the different methods with a use case on Drug-Induced Liver Injury (DILI).

We calculated a hepatotoxicity score for each of the more than 50,000 compounds. Our results indicate that text mining is effective for this purpose, achieving an Area Under the Curve (AUC) of 0.8 in DILI validation. Large language models performed even better, with an AUC of 0.85, thanks to their ability to interpret the semantic context accurately. Combining these methods further improved performance, yielding an AUC of 0.87 in DILI validation. All findings are available for download to support further research on toxicity assessment.

We demonstrated that automated text mining can successfully assess the toxicity of compounds. A text mining approach appears to be superior to word embeddings. However, the application of a large language model with prompt engineering showed the best performance.
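To make the evaluation idea concrete, the following is a minimal sketch of how per-method scores might be combined and then validated by AUC against binary DILI labels. The abstract does not state how the methods are combined, so the combination rule here (min-max scaling followed by an unweighted mean) is purely an assumption for illustration, and all scores and labels are invented toy values, not data from the study. The function names `roc_auc`, `min_max`, and `combine` are hypothetical.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive compound receives the higher score; ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so that methods on different scales are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def combine(score_sets: Dict[str, Sequence[float]]) -> List[float]:
    """ASSUMED combination rule: unweighted mean of min-max scaled scores."""
    scaled = [min_max(s) for s in score_sets.values()]
    return [sum(col) / len(col) for col in zip(*scaled)]


# Invented toy data: 1 = DILI-positive compound, 0 = DILI-negative compound.
labels = [1, 1, 1, 0, 0, 0]
text_mining = [12, 3, 9, 4, 1, 0]        # e.g. co-occurrence counts (made up)
llm = [0.9, 0.8, 0.3, 0.2, 0.4, 0.1]     # e.g. model-assigned scores (made up)

combined = combine({"text_mining": text_mining, "llm": llm})

for name, scores in [("text mining", text_mining), ("LLM", llm), ("combined", combined)]:
    print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
```

In this toy example each method misranks a different compound, so averaging the scaled scores separates the two classes better than either method alone. This only illustrates why combining complementary methods can raise the AUC; it says nothing about the magnitude of the effect reported in the article.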
Keywords: Toxicology, Hepatotoxicity, text mining, Artificial intelligence (AI), large language model (LLM)
Received: 15 Jan 2025; Accepted: 30 Jun 2025.
Copyright: © 2025 Bauer, Duc Dang, van den Beucken, Schuchhardt and Herwig. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Ralf Herwig, Department Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.