AUTHOR=Stepanov Ihor, Ivasiuk Arsentii, Yavorskyi Oleksandr, Frolova Alina TITLE=Comparative analysis of classification techniques for topic-based biomedical literature categorisation JOURNAL=Frontiers in Genetics VOLUME=Volume 14 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1238140 DOI=10.3389/fgene.2023.1238140 ISSN=1664-8021 ABSTRACT=Scientific articles serve as vital sources of biomedical information, but with the yearly growth in publication volume, processing such vast amounts of information has become increasingly challenging. This difficulty is particularly pronounced when the processing requires the expertise of highly qualified professionals. Our research focused on classifying domain-specific articles to determine whether they contain information about drug-induced liver injury (DILI). DILI is a clinically significant condition and one of the reasons for drug registration failures. The rapid and accurate identification of drugs that may cause this condition can prevent side effects in millions of patients. A text classification method can help regulators, such as the FDA, identify evidence of potential DILI for specific drugs much faster and at massive scale. In our study, we compared several text classification methodologies, including transformers, LSTMs, and information-theoretic and statistics-based methods. We conducted experiments with various approaches to enhance the performance of the models on unbalanced data, which closely resembles real-world scenarios. Additionally, we devised a simple and interpretable text classification method that is as fast as Naïve Bayes while delivering superior performance.