ORIGINAL RESEARCH article
Front. Bioinform.
Sec. Genomic Analysis
Volume 5 - 2025 | doi: 10.3389/fbinf.2025.1633623
This article is part of the Research Topic: AI in Genomic Analysis
Using reinforcement learning in genome assembly: in-depth analysis of a Q-learning assembler
Provisionally accepted
- 1University of the State of Amazonas, Manaus, Brazil
- 2Vale Institute of Technology, Belém, Pará, Brazil
- 3Universidade de São Paulo, São Paulo, Brazil
- 4Université de Montpellier, Montpellier, France
- 5Universidade Federal do Pará, Belém, Brazil
Genome assembly remains an unsolved problem, and de novo strategies (i.e., those run without a reference) are relevant but computationally complex tasks in genomics. Although de novo assemblers have been applied successfully in genomic projects, there is still no 'best assembler', and the choice and setup of assemblers still rely on bioinformatics experts. Thus, as with other computationally complex problems, machine learning has emerged as an alternative (or complementary) way to develop accurate, fast and autonomous assemblers. Reinforcement learning has proven promising for solving complex tasks without supervision, such as games, and there is a pressing need to understand the limits of this approach on 'real-life' problems, such as the DNA fragment assembly problem. In this study, we analyze the boundaries of applying reinforcement learning (RL) to genome assembly. We expand upon the previous approach found in the literature by carefully exploring the learning aspects of the proposed intelligent agent, which uses the Q-learning algorithm. We improved the reward system and optimized the exploration of the state space through pruning and in combination with evolutionary computing (>300% improvement). We tested the new approaches on 23 environments. Our results suggest that the approaches perform unsatisfactorily, in terms of both assembly quality and execution time, providing strong evidence for the poor scalability of the studied reinforcement learning approaches to the genome assembly problem. Finally, we discuss the existing proposal, complemented by attempts at improvement that also proved insufficient. In doing so, we contribute to the scientific community by offering a clear mapping of the limitations and challenges that should be taken into account in future attempts to apply reinforcement learning to genome assembly.
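To make the setting concrete, the following is a minimal sketch of tabular Q-learning applied to a toy read-ordering task. It is not the assembler analyzed in this article. The state encoding (the tuple of reads placed so far), the reward (suffix-prefix overlap length with the previously placed read), the hyperparameters, and all function names (`overlap`, `train`, `greedy_order`, `merge`) are illustrative assumptions.

```python
import random
from collections import defaultdict


def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def train(reads, episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over read orderings (hypothetical toy setup).

    State: tuple of read indices placed so far. Action: index of the next read.
    Reward (assumed): overlap between the last placed read and the new one.
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) to an estimated value
    n = len(reads)
    for _ in range(episodes):
        state = ()
        while len(state) < n:
            actions = [i for i in range(n) if i not in state]
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            reward = overlap(reads[state[-1]], reads[action]) if state else 0
            nxt = state + (action,)
            remaining = [i for i in range(n) if i not in nxt]
            best_next = max((q[(nxt, a)] for a in remaining), default=0.0)
            # Standard Q-learning update.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
    return q


def greedy_order(q, n):
    """Follow the learned Q-values greedily to obtain a read ordering."""
    state = ()
    while len(state) < n:
        actions = [i for i in range(n) if i not in state]
        state += (max(actions, key=lambda a: q[(state, a)]),)
    return list(state)


def merge(reads, order):
    """Concatenate reads in the given order, collapsing pairwise overlaps."""
    contig = reads[order[0]]
    for prev, cur in zip(order, order[1:]):
        contig += reads[cur][overlap(reads[prev], reads[cur]):]
    return contig


if __name__ == "__main__":
    toy_reads = ["GGATTC", "ACGTAC", "TACGGA"]  # made-up reads for illustration
    learned = train(toy_reads)
    order = greedy_order(learned, len(toy_reads))
    print(order, merge(toy_reads, order))
```

In this encoding, the number of complete orderings of n reads is n!, so the table of state-action values grows very quickly with the number of reads. This is one intuitive way to see why scalability is a central concern for tabular approaches of this kind.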
Keywords: reinforcement learning, genome assembly, machine learning, artificial intelligence, Bioinformatics Reinforcement Learning, bioinformatics
Received: 22 May 2025; Accepted: 29 Jul 2025.
Copyright: © 2025 Padovani, Borges, Xavier, De Carvalho, Reali, Chateau and Alves. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Kleber Padovani, University of the State of Amazonas, Manaus, Brazil
Rafael Cabral Borges, Vale Institute of Technology, Belém, Pará, Brazil
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.