
ORIGINAL RESEARCH article

Front. Artif. Intell., 02 December 2025

Sec. Machine Learning and Artificial Intelligence

Volume 8 - 2025 | https://doi.org/10.3389/frai.2025.1683323

This article is part of the Research Topic: Advanced Machine Learning Techniques for Single or Multi-Modal Information Processing.

Deep Q-Managed: a new framework for multi-objective deep reinforcement learning

  • 1Postgraduate Program in Electrical and Computer Engineering, Federal University of Rio Grande do Norte, Natal, Brazil
  • 2Tractian Tecnologies Inc., São Paulo, Brazil
  • 3Federal Institute of Rio Grande do Norte, Pau dos Ferros, Brazil

This paper introduces Deep Q-Managed, a novel multi-objective reinforcement learning (MORL) algorithm designed to discover all policies within the Pareto Front. This approach enhances multi-objective optimization by integrating deep learning techniques, including Double and Dueling Networks, to effectively mitigate the curse of dimensionality and overestimation bias. Deep Q-Managed demonstrates high proficiency in attaining non-dominated multi-objective policies across deterministic episodic environments, adapting to convex, concave, or mixed Pareto Front complexities. Experiments on traditional MORL benchmarks (Deep Sea Treasure, Bountiful Sea Treasure, and Modified Bountiful Sea Treasure) show it consistently achieves maximum hypervolume values (e.g., 1,155 for DST, 3,352 for BST, and 2,632 for MBST) and locates all Pareto Front points. While robust and versatile for practical applications in robotics, finance, and healthcare, this study's validation is currently confined to deterministic episodic settings, with stochastic environments reserved for future work.

1 Introduction

Reinforcement learning (RL) is a pivotal branch within the realm of machine learning, wherein an intelligent agent embarks on a quest to discern and execute optimal behaviors within an unfamiliar setting. This learning paradigm requires the agent to gradually discover the most advantageous actions to take within the myriad states presented by the environment. In the realm of RL, the agent adheres to a fundamental principle of learning through trial and error, meticulously investigating the consequences of various actions when confronted with specific environmental states. This process involves the analysis of scalar feedback signals, which are intricately linked to each action's consequences (Sutton and Barto, 2018). These feedback signals constitute the fundamental pillar of RL, guiding the agent's journey as it relentlessly strives to discern the optimal course of action for each potential scenario, adeptly refining its decision-making skills.

Nevertheless, a significant number of real-world challenges exceed the scope of a simple single feedback signal. Decision-making problems are often complex and require simultaneous optimization of multiple criteria or objectives, which are frequently conflicting. Prominent examples of such intricate challenges include multi-objective scheduling problems in workshops, which necessitate balancing conflicting criteria like production time, energy consumption, and product quality (Zhang et al., 2024). These objectives may share similarities or differences, but most of the time, they are caught up in a web of conflict, a scenario where boosting one objective's performance always results in a decrease in another's, and vice versa. Furthermore, there may exist multiple optimal solutions with varying priorities between objectives (Deb, 2008). This intricate interplay of objectives exemplifies the multifaceted nature of numerous real-world problems, necessitating nuanced resolutions that balance competing interests and priorities.

To address these complex decision-making scenarios, Multi-Objective Reinforcement Learning (MORL) has emerged, extending the RL paradigm to optimize a vector of objectives rather than a single reward signal. Various approaches have surfaced in the literature, all sharing the common goal of crafting specialized techniques for MORL. These techniques can broadly be categorized into two main groups. One group deals with the conversion of a Multi-objective Optimization (MOO) problem into a Single-objective Optimization (SOO) problem through a scalarizing function (Miettinen and Mäkelä, 2002). This approach typically yields a singular solution per run, making the algorithms that use it known as single-policy algorithms. Representative algorithms in this category include W-Learning (Humphrys, 1996), Modular Q-Learning (Karlsson, 1997), Scalarized MORL (Van Moffaert et al., 2013), W-Steering, and Q-Steering (Vamplew et al., 2015).

Another approach involves the search for multiple policies, which can be performed simultaneously or iteratively, one policy per run. The algorithms belonging to this category are referred to as multi-policy algorithms. When it comes to MOO and the pursuit of multiple policies, the most widely recognized and successful algorithms have historically been based on evolutionary algorithms, such as PESA (Corne et al., 2000), NSGA-II (Deb et al., 2002), and SPEA2 (Zitzler et al., 2001). Recent research has significantly advanced the integration of machine learning, particularly RL, with Multi-Objective Evolutionary Algorithms (MOEAs) to enhance their performance in complex optimization tasks, such as large-scale scheduling problems (Zhang et al., 2024). This integration is not only aimed at overcoming their respective limitations but also at creating a more comprehensive and powerful optimization framework, often saving computation time and potentially having lower overall sample-complexity by exploiting the fact that multiple policies need to be produced (Hayes et al., 2022).

Despite these advancements, many existing MORL approaches face significant limitations. A pervasive challenge is their inability to determine every conceivable optimal compromise solution, leaving portions of the objective space unexplored and underutilized. Furthermore, a significant subset of these MORL approaches suffers from limited generalizability, restricting their utility to particular application domains and curtailing their broader applicability.

Another critical hurdle is the curse of dimensionality, a challenge often faced by modern Q-Learning algorithms (Bellman, 1966). The number of states an agent must explore and evaluate increases exponentially as the dimensionality of the state space grows. Practically, as target problems grow more intricate, the state space can expand exponentially, rendering conventional Q-Learning approaches computationally impractical. This issue is of paramount importance, as it severely restricts the scalability and suitability of these algorithms in intricate, granular domains. Effectively mitigating the curse of dimensionality is critical for RL research, as it holds the key to unlocking the full potential of Q-Learning in tackling real-world, large-scale problems.

To overcome these challenges, this study introduces Deep Q-Managed, a new MORL algorithm that aims to discover all policies within the Pareto Front. This new approach builds upon the original Q-Managed framework, first presented in de Oliveira et al. (2021), by incorporating deep learning neural networks. This expansion enhances its potential for tackling more complex problems, empowering the algorithm to navigate and optimize in high-dimensional state spaces with greater efficiency and effectiveness.

Deep Q-Managed employs a hybrid MOO method, combining the use of a linear scalarizing function with an ϵ-constrained approach. A key aspect is the utilization of Deep Double Dueling Networks for the agent's learning. This approach not only aims to solve the curse of dimensionality that plagues traditional tabular RL algorithms (Poggio and Liao, 2018) but also capitalizes on the mitigation of overestimation bias and the accelerated learning convergence facilitated by the combination of these deep learning algorithms. The algorithm retains the main figure of the “manager” (de Oliveira et al., 2021), allowing agents to sequentially learn the set of policies that comprise the Pareto front for a posteriori choice, inheriting the capabilities, simplicity, and performance of single-policy algorithms. This capability to approximate and discover the entire Pareto Front is particularly crucial in highly complex and dynamic design spaces, such as those encountered in smart material optimization, where multi-objective RL frameworks like the Adaptive Pareto Optimization Model (APOM) are developed to approximate continuous Pareto Frontiers and handle multiple conflicting objectives like strength, flexibility, cost, and energy efficiency (Zou, 2025).

This methodology has the potential to generate a set of agents that possess specialized expertise in each of the policies, thereby forming the optimal solution set for a particular environment. The proposed algorithm achieves a desired characteristic of multi-policy algorithms, namely not being limited to a subset of the Pareto Front, and does so with competitive performance. Deep Q-Managed is capable of learning deterministic non-stationary policies and tackling problems where the region of optimal solutions may exhibit various shapes, such as convex, concave, or a combination of both. Furthermore, the algorithm is model-free, inheriting this property from its Q-Learning core. To verify this novel approach, extensive tests were conducted on conventional benchmarks utilized for MORL algorithms in environments where the Pareto Front is either convex, non-convex, or mixed, specifically the Deep Sea Treasure (DST), Bountiful Sea Treasure (BST), and Modified Bountiful Sea Treasure (MBST). While the deep learning techniques employed by the algorithm can handle non-episodic and stochastic environments, this paper solely addresses deterministic episodic problems that possess a clearly defined terminal state, with research into other problem types reserved for future work.

The remainder of this paper is structured as follows: Section 2 introduces key concepts of Deep RL and MORL and presents the proposed Deep Q-Managed algorithm in Section 2.4. Section 3 discusses the experimental results, and Section 4 provides the conclusion and outlines future research directions.

2 Materials and methods

2.1 Deep reinforcement learning

Deep Reinforcement Learning (DRL) is an artificial intelligence sub-field that integrates deep neural networks (DNNs) with reinforcement learning (RL) to address complex decision-making in high-dimensional environments. Traditional RL algorithms frequently encounter difficulties with vast state spaces and intricate tasks (Goodfellow et al., 2016). DRL effectively overcomes these challenges by leveraging DNNs as powerful function approximators, enabling agents to represent and learn from extensive and complex data.

This integration is critical for frameworks like Deep Q-Managed, directly supporting its ability to tackle multi-objective problems. DNNs allow agents to learn complex mappings between states, actions, and rewards, facilitating generalization from observed experiences and decision-making in unexplored state regions (LeCun et al., 2015). By processing high-dimensional input data, DRL agents can extract meaningful features and intricate patterns directly from the environment, which is essential for tasks involving perception, representation, and decision-making in real-world scenarios, including multi-objective scheduling problems (Zhang et al., 2024).

Crucially, DRL techniques directly support the Deep Q-Managed framework by mitigating the curse of dimensionality that limits traditional tabular RL algorithms (Bellman, 1966; Poggio and Liao, 2018). This capability is instrumental in expanding Deep Q-Managed's potential for tackling more complex multi-objective problems and navigating high-dimensional state spaces with greater efficiency and effectiveness. Furthermore, DRL algorithms address the fundamental challenge of balancing exploration (trying new actions) and exploitation (using gathered information to maximize rewards), a key aspect in the successful discovery of optimal policies.

2.2 Multiobjective deep reinforcement learning

Multiobjective reinforcement learning (MORL) is a branch of reinforcement learning that involves multiple, competing goals. In a traditional reinforcement learning problem, an agent learns to take actions in an environment to maximize a single reward signal. In many real-world situations, however, an agent may have multiple objectives that conflict with one another. A robot, for example, may have to navigate a crowded environment while avoiding collisions and conserving energy. In these cases, a single objective function cannot capture all the agent's goals.

MORL algorithms are designed to handle these types of multi-objective optimization problems by considering multiple conflicting objectives at the same time. MORL algorithms use multiple reward signals to represent different objectives, rather than a single reward signal. The algorithms then learn a policy that attempts to balance these competing goals and find a compromise.

One of the most difficult aspects of MORL is determining the objectives and their relative importance. In some cases, the objectives are clear and well-defined; in others, they are hazy or uncertain. MORL algorithms must be able to deal with uncertainty and adapt to changing goals.

Before proceeding to the formal definition of MORL, it is necessary to define a MOO problem. In summary, a MOO problem can be stated as follows:

$$\max\ F(X) = [f_1(X), f_2(X), \ldots, f_m(X)] \quad \text{s.t.} \quad g_l(X) \le 0, \quad l = 1, \ldots, L \tag{1}$$

where m denotes the number of objective functions, L the number of constraint functions of the problem, and X = [x1, …, xN] the vector of variables to be optimized (Deb and Kalyanmoy, 2001; Back, 1996).

In contrast to single-objective problems, where the reward is a scalar value, MORL provides the agent with a reward vector whose length equals the number of objectives: each vector element is the reward associated with one of the objectives the learning agent must optimize, as shown in the reward formulation of Equation 2.

$$\mathbf{R}(s, a) = [R_1(s, a), R_2(s, a), \ldots, R_m(s, a)] \tag{2}$$

Similarly, a vector formulation of the state value function and state-action value function can be structured, as shown in Equations 3, 4, respectively.

$$\mathbf{V}^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, \mathbf{r}_{t+k+1} \,\middle|\, s_t = s\right] \tag{3}$$

$$\mathbf{Q}^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, \mathbf{r}_{t+k+1} \,\middle|\, s_t = s,\, a_t = a\right] \tag{4}$$

The optimal vector state-action can then be defined according to Equation 5.

$$\mathbf{Q}^{*}(s, a) = \mathbf{R}(s, a) + \mathbb{E}\!\left[\gamma \max_{a'} \mathbf{Q}^{*}(s', a')\right] \tag{5}$$

Given that the environment has multiple objectives at the same time, different optimal policies can be found, with the difference being related to the priority given to each. Various optimality criteria can be used to find optimal policies, since it is a MOO, the concept of Pareto dominance is commonly used (Back, 1996; Knowles and Corne, 2002).

Traditionally, the Pareto dominance relation (Pareto, 1964) is used to compare two solutions. We can use it to determine whether a solution is superior or inferior to others. This is formulated in the MORL context by the following definitions:

Definition 2.1. Given two policies π ∈ Π and π′ ∈ Π, policy π is said to dominate policy π′ (π′≺π) if the following criteria are met:

$$\pi' \prec \pi \iff \exists\, j : V_j^{\pi}(s) > V_j^{\pi'}(s) \ \wedge\ \forall\, i \ne j : V_i^{\pi}(s) \ge V_i^{\pi'}(s), \quad \forall s \in S \tag{6}$$

That is, Vπ(s) is strictly better than Vπ′(s) on at least one objective and is not strictly worse than Vπ′(s) on any other objective, across all states.

Definition 2.2. If no other policy π′ ∈ Π dominates policy π ∈ Π, then π is said to be Pareto optimal:

$$\pi \text{ is Pareto optimal} \iff \nexists\, \pi' \in \Pi : \pi \prec \pi' \tag{7}$$

These formulations result in a set of optimal solutions for the Pareto front (Deb and Sundar, 2006).
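As a concrete illustration of Definitions 2.1 and 2.2, the following minimal Python sketch checks Pareto dominance between value vectors and filters a finite set of candidate policy values down to its non-dominated subset. The function and example values are illustrative only and are not part of the published implementation.

```python
import numpy as np

def dominates(v: np.ndarray, v_prime: np.ndarray) -> bool:
    """Return True if value vector v Pareto-dominates v_prime (maximization).

    Mirrors Definition 2.1: v must be at least as good on every objective
    and strictly better on at least one.
    """
    return bool(np.all(v >= v_prime) and np.any(v > v_prime))

def pareto_front(values: np.ndarray) -> np.ndarray:
    """Filter a set of value vectors down to its non-dominated subset."""
    keep = []
    for i, v in enumerate(values):
        if not any(dominates(values[j], v) for j in range(len(values)) if j != i):
            keep.append(i)
    return values[keep]

# Example: three candidate policies evaluated on (time, treasure) objectives.
candidates = np.array([[-3.0, 5.0], [-5.0, 8.0], [-6.0, 4.0]])
print(pareto_front(candidates))   # the last vector is dominated by the first
```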

Overall, MORL is an effective tool for resolving complex, real-world problems with multiple, competing objectives. It enables agents to learn and adapt in complex environments whilst also balancing multiple goals.

2.3 Combined use of Double and Dueling Networks in reinforcement learning

Reinforcement Learning (RL) algorithms have demonstrated significant promise in solving intricate decision-making problems. Two notable advancements in RL, namely Double Q-Learning and Dueling Networks, have garnered considerable attention due to their capacity to improve learning stability and performance. In this section, we explore the combined use of Double and Dueling Networks and highlight their benefits in the context of RL applications.

2.3.1 Double Q-Learning

The Double Q-Learning technique is an enhancement of conventional Q-Learning algorithms that addresses the issue of overestimation bias (Van Hasselt et al., 2016). By decoupling the action selection and value estimation processes, Double Q-Learning reduces overoptimistic value estimations. It maintains two sets of Q-values: an online network for action selection and a target network for unbiased value estimation. This decoupling improves learning stability and results in more accurate value estimates.

In traditional Q-Learning, the action-value function Q(s, a) is updated using the Bellman equation (Sutton and Barto, 2018):

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right) \tag{8}$$

Where Q(s, a) represents the Q-value of taking action a in state s, α is the learning rate, r is the immediate reward, γ is the discount factor, s′ is the next state, and a′ is the next action.

The update equation involves selecting the action a′ that maximizes the Q-value in the next state s′. This may, however, lead to an overestimation of the Q-values because the same Q-function is used for both action selection and value estimation.

This overestimation bias is addressed by Double Q-Learning, which decouples the action selection and value estimation processes. Instead of relying solely on a singular set of Q-values, Double Q-Learning maintains two distinct sets of Q-values, commonly referred to as the online network and the target network.

The online network is used for action selection, where the action with the highest Q-value estimate is chosen. The target network, on the other hand, is periodically updated using the online network's Q-values. This changes the equation for updating to:

$$Q(s, a; \theta_k) \leftarrow Q(s, a; \theta_k) + \alpha \left( r + \gamma\, Q\!\left(s', \operatorname*{arg\,max}_{a'} Q(s', a'; \theta_k);\, \theta_k^{-}\right) - Q(s, a; \theta_k) \right) \tag{9}$$

Here the online network (parameters θk) selects the greedy next action, while the target network (parameters θk-) evaluates it. By decoupling action selection from value estimation in this way, Double Q-Learning reduces the overestimation bias and provides more accurate value estimates. The target-network parameters θk- are only updated every C iterations with the Q-network parameters θk and are held fixed between updates (Van Hasselt et al., 2016).
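A minimal sketch of how the target in Equation 9 can be computed with PyTorch is given below. It assumes that online_net and target_net are modules mapping a batch of states to Q-values over the action set; these names and the batch layout are illustrative assumptions, not the authors' implementation.

```python
import torch

def double_q_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute Double Q-Learning targets as in Equation 9 (illustrative sketch).

    The online network selects the greedy next action; the target network,
    whose parameters are frozen between periodic copies, evaluates it.
    """
    with torch.no_grad():
        next_q_online = online_net(next_states)                 # shape: [batch, |A|]
        best_actions = next_q_online.argmax(dim=1, keepdim=True)
        next_q_target = target_net(next_states).gather(1, best_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q_target
```

In such a sketch the target parameters θk- would be refreshed every C updates, for example with target_net.load_state_dict(online_net.state_dict()).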

2.3.2 Dueling networks

The Dueling Networks approach aims to address the challenge of efficiently estimating state values and action advantages (Wang et al., 2016). By separating the estimation of state values and action advantages, Dueling Networks provide a more effective and focused learning process. The network architecture comprises two streams. One stream estimates the state value function, which denotes the value of being in a particular state, while the other stream estimates the action advantages, which denote the distinctions between actions in that state. This separation facilitates a more accurate representation of state values and action advantages, resulting in enhanced learning and decision-making.

Considering a reinforcement learning problem involving a discrete action space and a set of states, the objective is to acquire an optimal policy that maximizes the anticipated return over time. The state-value function, V(s), denotes the value of being in state s, whereas the action-value function, Q(s, a), denotes the value of taking action a in state s.

The Dueling Network architecture consists of two distinct components: the state value function, V(s), and the action advantage function, A(s, a). This method of decomposition facilitates a more effective assessment of state values and the advantages of actions. The state value function estimates the value of being in a specific state, whereas the action advantage function estimates the disparities in values among actions in that state.

The formal mathematical formulation of Dueling Networks can be expressed as follows, with θ1 representing the weights of the network before the two distinct components, θ2 representing the weights of the advantage function neural network approximator, and θ3 representing the weights of the value function neural network:

$$Q(s, a; \theta_1, \theta_2, \theta_3) = V(s; \theta_1, \theta_3) + \left( A(s, a; \theta_1, \theta_2) - \frac{1}{|\mathcal{A}|} \sum_{a' \in \mathcal{A}} A(s, a'; \theta_1, \theta_2) \right) \tag{10}$$

Where Q(s, a) represents the action-value function, which is the sum of the state value function V(s) and the action advantage function A(s, a). V(s) represents the state value function, which estimates the value of being in state s. A(s, a) represents the action advantage function, which estimates the differences in values between actions in state s. The term $\frac{1}{|\mathcal{A}|} \sum_{a' \in \mathcal{A}} A(s, a'; \theta_1, \theta_2)$ represents the mean advantage value, which ensures the identifiability of the action advantages by subtracting the mean advantage across all actions (Wang et al., 2016).
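The decomposition in Equation 10 maps directly onto a two-stream network. The following PyTorch sketch implements the shared trunk (θ1), the advantage stream (θ2), the value stream (θ3), and the mean-advantage subtraction; layer sizes and names are illustrative assumptions rather than the architecture used in the experiments.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling architecture of Equation 10: shared trunk, advantage stream,
    and value stream, combined with the mean-advantage subtraction that keeps
    the decomposition identifiable."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())   # θ1
        self.advantage = nn.Linear(hidden, n_actions)                         # θ2
        self.value = nn.Linear(hidden, 1)                                     # θ3

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        features = self.trunk(state)
        a = self.advantage(features)                      # A(s, a; θ1, θ2)
        v = self.value(features)                          # V(s; θ1, θ3)
        return v + a - a.mean(dim=1, keepdim=True)        # Q(s, a; θ1, θ2, θ3)
```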

The Dueling Network design utilizes a loss function that encompasses both the state value loss and the advantage loss during training. The state value loss is intended to reduce the discrepancy between estimated and desired state values. The advantage loss tries to reduce the difference between the estimated and target action advantages.

By separating the estimation of state values from action advantages, this design provides for more focused learning and improved generalization across states and actions. It enables the agent to accurately estimate the value of being in a specific state while evaluating the variations in values across actions. This separation improves learning and results in improved performance on reinforcement learning tasks.

2.3.3 Combined use and benefits

The Deep Q-Managed algorithm integrates the combined principles of Double Q-Learning and Dueling Networks, referred to here as Deep Double Dueling networks, as the foundation for its agent's learning mechanism; this combination has shown remarkable performance enhancements in diverse RL applications (Wang et al., 2016; Mnih et al., 2015; Dabney et al., 2018). This integrated approach is a cornerstone of Deep Q-Managed's methodological contributions, specifically designed to address key challenges in MORL and to ensure both clarity and reproducibility (Tamar et al., 2016).

The primary methodological contribution of integrating these techniques into Deep Q-Managed is the effective mitigation of overestimation bias (a benefit of Double Q-Learning) and accelerated, more efficient learning (a benefit of Dueling Networks). This combined framework enables Deep Q-Managed agents to make informed decisions without overly optimistic estimations and facilitates better generalization across diverse states and actions. This robust approach is instrumental in addressing the well-known curse of dimensionality that often limits traditional tabular RL algorithms (Poggio and Liao, 2018).

By leveraging Deep Double Dueling networks, Deep Q-Managed expands its potential for tackling complex multi-objective problems in high-dimensional state spaces with greater efficiency and effectiveness. Furthermore, Deep Q-Managed is a model-free algorithm, inheriting this property from its core Q-Learning framework. The agent iteratively learns sets of deterministic non-stationary policies within episodic environments in order to discover all Pareto-optimal policies.

Within the Deep Q-Managed framework, the agent's learning process, as detailed in Algorithm 1, involves iteratively updating its neural network parameters based on experiences gathered during environmental exploration. The use of Deep Double Dueling networks means that the neural network architecture for approximating Q-values incorporates both the dual network structure for target value calculation and the split stream architecture for state value and action advantage estimation.
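Putting the two previous sketches together, one plausible shape of a single Deep Double Dueling update is shown below. It assumes the DuelingQNetwork and double_q_targets helpers from the earlier sketches and a replay batch of tensors; it is a hedged illustration of the combined technique, not a reproduction of the authors' Algorithm 1.

```python
import torch
import torch.nn.functional as F

def training_step(online_net, target_net, optimizer, batch, gamma=0.99):
    """One illustrative Deep Double Dueling update: the dueling network serves as
    both online and target Q-estimator, and the double-Q target decouples action
    selection from evaluation."""
    states, actions, rewards, next_states, dones = batch
    q_pred = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    targets = double_q_targets(online_net, target_net, rewards, next_states, dones, gamma)
    loss = F.smooth_l1_loss(q_pred, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```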

Algorithm 1. Deep Q-Managed.

2.4 Deep Q-Managed

In this section, the Deep Q-Managed algorithm is described in detail. This algorithm is an enhanced version of the Q-Managed approach to MORL, and it incorporates concepts from MOO methods such as the epsilon-constraint and scalarization techniques. The algorithm's name reflects the combination of Deep Q-Learning techniques in a multi-objective version with the presence of a component that manages the behavior of the algorithm, the manager, proposed in de Oliveira et al. (2021).

Deep Q-Managed is capable of learning deterministic non-stationary policies and tackling problems where the region of optimal solutions may exhibit various shapes, such as convex, concave, or a combination of both. Furthermore, the algorithm is model-free, since its core derives from Q-Learning and inherits this property. Lastly, it is noteworthy that, despite the ability of the deep learning techniques employed by the algorithm to handle non-episodic and stochastic environments, this paper solely addresses deterministic episodic problems that possess a clearly defined terminal state; the application of this technique to other problem types is left for future work.

The manager concept is employed to gain a thorough understanding of the interactions between the learning agent and the objective environment, as well as to intervene in the actions being undertaken; its contribution is illustrated in Figure 1. The primary function of this component is to assist the learning agent in identifying the full set of optimal policies.


Figure 1. Deep Q-Managed framework: manager-environment-agent interaction for multi-objective reinforcement learning. This diagram illustrates the interplay of the agent, environment, and manager within the Deep Q-Managed algorithm. The agent learns optimal behaviors by interacting with the environment (actions, states, and rewards). The manager acts as a crucial supervisory component, observing the environment and strategically intervening in the agent's learning.

As the guide for the learning agent, the manager is responsible for observing what happens in the environment. Thus, whenever a learning agent performs an action that results in a terminal state, the manager observes the current policy's value function to compare it to the best value found and stored up to that point.

Each learned policy is associated with a terminal state; the convergence criterion is discussed in Section 3. Whenever the agent reaches a terminal state associated with a previously acquired policy, the manager compares the value function of the current policy against the best one stored so far, in order to evaluate whether the path to that final state is superior to the stored one. If the comparison is favorable, the manager updates the policy to reflect the better path learned in the current episode. Otherwise, the manager modifies the learning agent's action to avoid unnecessary repetition, replacing it with one that moves the agent to a random neighboring state other than the terminal state. This strategy allows the agent to learn other policies while retaining the possibility of switching to a superior one if it is found.

It is noteworthy that upon the learning agent's convergence to a particular policy, the randomness probability parameter is reset to 0.3, thereby enabling the agent to randomly explore the environment searching for policies for the undiscovered final states.

Given the difficulty of identifying the impact of one objective on another, the trade-off between objectives is recognized as extremely complicated, and there is no guarantee that the solution found will correspond to the parameterization used (Das and Dennis, 1997). The algorithm circumvents this problem by using a synthetic objective function in which no objective is given more importance than any other. This can be characterized as a linear scalarizing function:

$$\forall\, o \in [1, m]: \quad w_o = \frac{1}{m}, \qquad \sum_{o=1}^{m} w_o = 1 \tag{11}$$

This approach relieves the decision-maker of the concern about which objective to prioritize and if there is a solution corresponding to the parameterization chosen a priori.
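For illustration, the equal-weight scalarization of Equation 11 reduces to averaging the components of the reward vector. The following small sketch assumes a two-objective reward; the example numbers are hypothetical.

```python
import numpy as np

def equal_weight_scalarize(reward_vector: np.ndarray) -> float:
    """Equal-weight linear scalarization of Equation 11: every objective
    receives weight 1/m, so no objective is prioritized a priori."""
    m = reward_vector.shape[-1]
    weights = np.full(m, 1.0 / m)
    return float(weights @ reward_vector)

# Example with two objectives (time penalty, treasure value).
print(equal_weight_scalarize(np.array([-3.0, 24.0])))   # -> 10.5
```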

The algorithm terminates when the number of converged policies equals the number of constraints, as each constraint represents a terminal state. This prevents the agent from continuing to explore the environment after it has already learned all the policies.

The algorithm builds on the well-established Deep Q-Learning framework, which has demonstrated remarkable efficacy in single-objective reinforcement learning tasks. Deep Q-Learning trains a neural network to approximate the Q-values of state-action pairs, which represent the anticipated cumulative reward of taking a specific action in a particular state. By learning these Q-values, the agent can make informed decisions about which actions to take in different situations to maximize its cumulative reward.

The proposed Algorithm 1 involves the agent learning a set of policies for traversing environments with multiple objectives by iteratively updating its neural network parameters based on the experiences acquired during exploration. The fundamental innovation lies in the incorporation of a management mechanism that dynamically adjusts the agent's behavior to encourage exploration and guarantee convergence toward optimal solutions.

The management system works by periodically assessing the agent's progress and detecting alignment with optimal policies. Upon detection of convergence, the management mechanism intervenes by resetting agent parameters and blocking the encountered solution path. This enables the agent to explore alternate routes and continuously refine its policies to attain a more comprehensive coverage of the solution space.

In addition to managing the exploration-exploitation trade-off and overseeing the agent's learning process, the manager also plays a critical role in optimizing the agents' search for optimal solutions. During the main loop of learning, when the agent attempts to navigate to a final state associated with an already converged solution, the manager intervenes to assess the quality of the current solution. If the manager determines that the current solution is superior to those previously discovered, it initiates a process to unblock the path to that solution. By unblocking the path, the agent is granted the opportunity to explore this potentially better route, thus enabling it to potentially discover more efficient and effective solutions. This dynamic intervention by the manager ensures that the agent's learning process remains adaptive and responsive to evolving conditions, ultimately enhancing the algorithm's ability to identify optimal policies in complex environments.
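Since Algorithm 1 is only available as a figure, the following heavily simplified Python sketch captures the manager behavior described above, namely accepting an improved path to a known terminal state or redirecting the agent otherwise. All names and the exact bookkeeping are hypothetical and do not reproduce the authors' implementation.

```python
import random

class ManagerSketch:
    """Hypothetical, simplified view of the manager's bookkeeping: it stores the
    best scalarized return found for each terminal (treasure) state and decides
    whether to accept the agent's move or redirect it toward further exploration."""

    def __init__(self):
        self.best_return = {}   # terminal state -> best scalarized return seen so far

    def on_terminal(self, terminal_state, episode_return, alternative_actions):
        # A better path to a known treasure is accepted and stored (path "unblocked").
        if episode_return > self.best_return.get(terminal_state, float("-inf")):
            self.best_return[terminal_state] = episode_return
            return None                              # accept the agent's action
        # Otherwise redirect the agent to a random non-terminal neighbor so it
        # keeps searching for policies associated with undiscovered treasures.
        return random.choice(alternative_actions)
```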

2.4.1 Experimental setup

The proposed Deep Q-Managed algorithm was tested on traditional MORL benchmarks and their variations, Deep Sea Treasure (DST) (Vamplew et al., 2008) and Bountiful Sea Treasure (BST) (Van Moffaert et al., 2014), to assess its performance and capabilities. In addition to these benchmarks, the algorithm was also evaluated on a variation of the BST proposed in de Oliveira et al. (2021), the Modified Bountiful Sea Treasure (MBST). This section describes the benchmarks used, how the tests were carried out, and how they were evaluated.

Deep Sea Treasure (DST), shown in Figure 2a, is an episodic deterministic problem in which a learning agent controls a vessel on an underwater expedition searching for treasure. The environment is a grid of 11 rows and 10 columns containing 10 treasure locations, each with a different value: the lowest treasure value is located closest to the starting point and the highest value farthest away, so the value increases with distance from the starting point.


Figure 2. Benchmark environments used to evaluate the Deep Q-Managed algorithm. (a) Deep Sea Treasure (DST), (b) Bountiful Sea Treasure (BST), and (c) Modified Bountiful Sea Treasure (MBST). In each grid-based environment, the agent controls a submarine starting from the upper-left corner and navigates to treasure locations with varying rewards. DST features a concave Pareto Front, BST a convex Pareto Front, and MBST a mixed shape with local concavities. (d) Illustrates the corresponding Pareto Fronts for each environment, which serve as ground truth for assessing the ability of Deep Q-Managed to identify all optimal trade-off solutions.

Each episode begins with the submarine in the upper-left corner and ends when the learning agent comes across any of the ten treasure locations, regardless of value, or completes 1,000 actions. The agent can navigate the environment by moving in any of the four cardinal directions: (I) up, (II) right, (III) down, or (IV) left. Any movement that causes the agent to leave the grid is disregarded, keeping it in the same position.

In this environment, the learning agent has two goals: to minimize the time needed to reach the treasure location and to maximize the value of the treasure. The reward vector is composed of two elements. The first of them is a penalty of –1 for each action performed. The second is the value of the treasure, which is 0 until the agent moves to a location containing a treasure. The DST's Pareto Front is globally concave with local concavities.
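To make this reward structure concrete, the sketch below implements a DST-like grid environment with a two-element reward vector (time penalty, treasure value). The grid size matches the description above, but the treasure coordinates and values used here are placeholders rather than the published benchmark map.

```python
import numpy as np

class TreasureGridSketch:
    """Minimal sketch of a DST-like episodic environment with a vector reward:
    element 0 is the -1 time penalty per action, element 1 the treasure value
    (0 until a treasure cell is reached). Treasure layout is a placeholder."""

    ACTIONS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}   # up, right, down, left

    def __init__(self, n_rows=11, n_cols=10, treasures=None):
        self.shape = (n_rows, n_cols)
        self.treasures = treasures or {(1, 0): 1.0, (10, 9): 100.0}   # placeholder values
        self.pos = (0, 0)

    def reset(self):
        self.pos = (0, 0)                         # episode starts in the upper-left corner
        return self.pos

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if 0 <= r < self.shape[0] and 0 <= c < self.shape[1]:
            self.pos = (r, c)                     # moves off the grid are ignored
        treasure = self.treasures.get(self.pos, 0.0)
        reward = np.array([-1.0, treasure])       # [time objective, treasure objective]
        done = treasure > 0.0                     # episode also ends after 1,000 actions
        return self.pos, reward, done
```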

The only difference between the BST, shown in Figure 2b, and the DST is in the treasure values, which give the Pareto Front a globally convex shape. Everything remains the same as the DST in terms of environment structure, learning agent movements, and reward system.

Lastly, there is the variation of the BST, the MBST, pictured in Figure 2c. This variation changes the location of one treasure site, the optimal time needed to reach it, and some treasure values. With the new treasure position, the learning agent must enter a 'hole' between two other treasure locations, which makes learning more difficult because the agent can end the episode prematurely by reaching one of the two neighboring treasures on the way. For this variation, the Pareto Front has a concavity in the middle, dividing the curve into two convex parts.

3 Results

In this section, we present the outcomes of our investigation into the integration of Double and Dueling Networks in the Deep Q-Managed approach to MORL. The purpose of this study was to investigate the performance enhancements and benefits achieved by integrating these two key advancements into the Q-Managed algorithm.

This section provides a thorough analysis of the results obtained from our experiments. We demonstrate the performance improvements achieved through the combined use of Double and Dueling Networks in the analyzed RL tasks. Specifically, we present empirical evidence for accelerated learning convergence, improved sample efficiency, and enhanced decision-making capabilities of RL agents. Since overestimation bias is reduced, more informed and reliable decision-making can be achieved, while the separation of state values and action advantages provides a more focused learning process.

The parameters utilized for conducting the experiments are described in Table 1. The hyperparameter values were chosen empirically through a series of tests; the tested settings yielded policies of the same quality, differing only slightly in the number of episodes needed for convergence.

Table 1. Hyperparameters used in the Deep Q-Managed experiments.

A stringent convergence criterion was established to ensure a thorough evaluation and robust learning of the agent. This criterion introduces a distinct and crucial requirement: the agent must replicate the identical path to a treasure state a predetermined number of times. This criterion distinguishes our approach by setting a convergence delta of zero, thus rendering it exceptionally meticulous in gauging the agent's learning.

The convergence criterion serves as a litmus test for the agent's proficiency in reaching the final states in a consistent and reliable manner. By mandating the repetition of the same path toward a treasure state, we emphasize the significance of not only discovering a successful trajectory once but mastering it. This stringent criterion not only measures the agent's ability to learn but also requires an elevated level of accuracy in its actions.

In contrast to convergence criteria that solely focus on attaining a predetermined reward or achieving consistent performance, this methodology does not allow for approximation or variability. The agent is compelled precisely to replicate its successful routes, thereby ensuring a level of comprehension and proficiency that is uncommon in convergence assessments.
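A small sketch of this zero-delta criterion is given below, assuming each episode's trajectory is recorded as a tuple of visited states; the helper name and data layout are illustrative assumptions.

```python
from collections import deque

def has_converged(recent_paths: deque, threshold: int) -> bool:
    """Path-repetition convergence check (sketch of the criterion described above):
    the agent is considered converged to a treasure state only when the last
    `threshold` episodes followed exactly the same path (convergence delta = 0)."""
    if len(recent_paths) < threshold:
        return False
    last = list(recent_paths)[-threshold:]
    return all(path == last[0] for path in last)

# Usage: keep a bounded history of state sequences per terminal state.
history = deque(maxlen=300)
history.extend([((0, 0), (1, 0))] * 150)
print(has_converged(history, threshold=150))   # True
```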

The hypervolume indicator, which is provided by the learning agent after each experiment, serves as the metric utilized to evaluate the algorithms' performance (Vamplew et al., 2011). The resulting hypervolume of policies is then compared to the corresponding hypervolume of the best-case scenario represented by the Pareto Front.

The selection of the corresponding reference points for the hypervolume indicator was motivated by the desire to establish a fair comparison between the proposed algorithm and those cited in the literature. There are different methodologies for selecting reference points; however, this specification must consider the Pareto front's shape in addition to other aspects to be evaluated, such as performance or solution distribution. According to Ishibuchi et al. (2018), the methodology replicated in this work, with reference points much worse than the nadir point, is adequate for the type of problem and what is being evaluated.
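For the two-objective benchmarks used here, the hypervolume indicator can be computed with a simple sweep over the sorted front. The sketch below assumes maximization of both objectives and a reference point chosen worse than the nadir point, as discussed above; the example front and reference point are hypothetical.

```python
import numpy as np

def hypervolume_2d(front: np.ndarray, reference: np.ndarray) -> float:
    """2-D hypervolume (maximization) of a non-dominated front with respect to
    a reference point worse than the nadir point."""
    # Sort by the first objective; the second is then non-increasing for a
    # non-dominated front, so the dominated region decomposes into rectangles.
    pts = front[np.argsort(front[:, 0])]
    hv, prev_x = 0.0, reference[0]
    for x, y in pts:
        hv += (x - prev_x) * (y - reference[1])
        prev_x = x
    return hv

# Hypothetical (time, treasure) front and reference point.
front = np.array([[-1.0, 1.0], [-3.0, 2.0]])
print(hypervolume_2d(front, reference=np.array([-25.0, 0.0])))   # -> 46.0
```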

The proposed methodology for the Deep Q-Managed algorithm has proven to be efficacious and efficient in identifying all optimal policies of the Pareto Front, and this holds for all environments. In terms of efficiency, the algorithm behaved similarly in all tests and learned the optimal policies, successfully repeating the paths to all final states, which is considered convergence to an optimal policy for each state.

Figure 3 provides a visualization of the experimentation, showcasing the hypervolume averages across 20 independent runs. This serves as a testament to the performance of the proposed approach, as it consistently attains maximum hypervolume values across all three target environments. Specifically, in the DST environment, the hypervolume peaks at 1,155. In the BST environment, it soars to 3,352. Finally, in the MBST environment, the maximum hypervolume is 2,632.


Figure 3. Average hypervolume values obtained by Deep Q-Managed across 20 independent runs for the three benchmark environments: (a) Deep Sea Treasure (DST), (b) Bountiful Sea Treasure (BST), and (c) Modified Bountiful Sea Treasure (MBST). The x-axis represents the number of training episodes, while the y-axis shows the hypervolume relative to the true Pareto Front. Different curves correspond to varying convergence thresholds (100–300 repetitions). Higher thresholds impose stricter convergence criteria, requiring the agent to replicate paths more consistently before confirming a policy. Results show that Deep Q-Managed consistently reaches the maximum hypervolume in all environments, though convergence in MBST is slower and more computationally demanding.

Furthermore, Figure 3 demonstrates the exploration of different threshold values as a parameter for verifying convergence to a treasure state. These thresholds play a pivotal role in assessing the rigor of the agent's convergence. A higher threshold value signifies a more stringent evaluation criterion, demanding that the agent precisely replicate the same path to a treasure state a greater number of times. This adaptability in threshold values underscores the versatility of our approach, capable of accommodating diverse challenges and learning scenarios.

Notably, our experimental results underscore the robustness and adaptability of the proposed approach. Across all threshold values and environment combinations, our method consistently identifies and encompasses all points in the Pareto Front. This exceptional capability highlights the approach's versatility and its ability to excel in a multitude of environments.

The consistent inclusion of all Pareto Front points across different scenarios underscores the adaptability and robustness of our approach, positioning it as a contender in the realm of MORL methodologies.

The Deep Q-Managed approach has demonstrated its capabilities in the analyzed environments. The robust learning capabilities, efficient convergence, and adaptability of this approach have established it as a contender in the field. In the present investigation, our objective is not solely to introduce the most recent advancement of the Deep Q-Managed algorithm, but also to meticulously evaluate its advancements and performance in comparison to its predecessor and other pertinent algorithms from the existing literature.

Our primary objective is to elucidate the enhancements and innovations that have been incorporated into the Deep Q-Managed approach. By conducting this analysis, we aim to provide a comprehensive understanding of the algorithm's strengths, thereby illuminating its potential for broader applications.

To accomplish this evaluation, we have devised a comparative framework. This framework entails a side-by-side examination of the latest Deep Q-Managed algorithm with the previous version of the algorithm proposed in de Oliveira et al. (2021). Moreover, we derive insights from two additional noteworthy algorithms proposed in Van Moffaert (2016), namely Linear Scalarized and Chebyshev Scalarized, which were devised and tested on the same benchmarks analyzed in this work. The 'optimistic' prefix was added to these algorithms' names to indicate that the results presented herein were exclusively derived from the test phase, with any outcomes from the training phase intentionally omitted. This multi-faceted comparison allows us to assess the progress made by the Deep Q-Managed approach and provides valuable context for its achievements.

Our proposed approach is evaluated against these relevant methodologies, not only in terms of learning efficiency, but also in terms of its adaptability to diverse tasks. The rigorous evaluation of the algorithm across a suite of environments demonstrates its versatility and potential.

To accomplish this comparison, Figure 4 presents a comparative evaluation of the proposed Deep Q-Managed algorithm, alongside the three other pertinent strategies. The purpose of this evaluation is to emphasize the capabilities and characteristics of each algorithm in navigating the test scenarios.


Figure 4. Comparative performance of Deep Q-Managed against three reference algorithms: (i) Q-Managed (de Oliveira et al., 2021), (ii) Linear Scalarized, and (iii) Chebyshev Scalarized (Van Moffaert, 2016). Results are reported on the three benchmark environments (DST, BST, MBST) in terms of achieved hypervolume relative to the true Pareto Front. Both Deep Q-Managed and Q-Managed consistently identify all Pareto-optimal points across environments, while scalarized methods locate only subsets of the Front (particularly underperforming in DST). Deep Q-Managed requires more episodes to discover the full Front but achieves faster identification of initial solutions and higher robustness across environments.

From this analysis, it is striking that only the Deep Q-Managed and the Q-Managed approaches consistently found all the points in the Pareto front across all environments. This demonstrates their flexibility and robustness in addressing the multi-objective nature of the tasks. Despite not being able to find the full Pareto front, the Linear Scalarized and Chebyshev approaches demonstrated commendable performance in the BST and MBST environments. It is noteworthy that their final hypervolume in DST falls short of expectations, achieving only half of the maximum value. This disparity in performance across environments underscores the unique challenges posed by each setting and the varying adaptability of the algorithms.

An examination of the algorithms that found the full Pareto front reveals an interesting trade-off between Deep Q-Managed and the other approaches. While Deep Q-Managed required more episodes to find all points within the Pareto front, it exhibited a notable advantage in discovering most of the initial points more quickly. Its efficiency in identifying the initial points, especially those associated with shorter paths or not located on plateaus, demonstrates efficient exploration of rewarding regions.

The remarkable capability of Deep Q-Managed, alongside Q-Managed, to consistently locate all points in the Pareto front across diverse environments merits highlighting. The disparity in performance observed in DST, BST, and MBST underscores the intricate and contextually dependent nature of these tasks. Furthermore, the tradeoff between exploration and exploitation efficiency in Deep Q-Managed underscores its unique approach to multi-objective problem-solving. This provides a nuanced perspective on algorithmic performance and provides valuable insight into their strengths and adaptability across challenging environments.

To further examine the trade-offs inherent in Deep Q-Managed's performance, Figure 5a examines the number of episodes required by the algorithm to reach all final states, spanning 20 independent executions in all tested environments, each with distinct convergence thresholds. The resulting investigation illuminates the inherent stochastic properties of the Deep Q-Managed strategy and emphasizes its flexibility and effectiveness, particularly in scenarios involving lower convergence thresholds.


Figure 5. Convergence analysis of Deep Q-Managed across different conversion thresholds (100–300 repetitions) for the three benchmark environments: (a) distribution of the number of episodes required for the agent to converge to all treasure states, and (b) distribution of the processing time (minutes) needed to achieve full convergence. Results are averaged over 20 independent runs. DST consistently requires fewer episodes and less computation, reflecting its simpler structure, whereas MBST demands substantially more episodes and processing time due to its higher complexity and local concavities. These results illustrate the trade-off between stricter convergence criteria, which ensure policy reliability, and the additional computational resources required.

It is noteworthy that the Deep Q-Managed algorithm exhibits a total number of convergence episodes comparable to the 3,000 episodes observed for Q-Managed. In certain instances, the algorithm can achieve significantly faster convergence, indicating its capacity for rapid learning and decision-making in environments with less stringent convergence criteria. This performance in scenarios that necessitate precise and swift convergence exemplifies the algorithm's adaptability and its potential to excel.

To build on the analysis of the total number of episodes needed for full convergence, Figure 5b delves into the processing time required by the Deep Q-Managed algorithm to complete each environment across various threshold configurations. Since the duration of the operations is entirely dependent on the hardware employed, it is pertinent to mention that, for the experiments presented here, a system comprising an Intel i7-8565U CPU, 16 GB of RAM, and an NVIDIA GeForce MX150 GPU was utilized.

The box plots depicted in Figure 5b highlight the varying processing times required by the Deep Q-Managed algorithm when tackling different environments and threshold combinations. It is evident that the DST environment consistently requires less processing time for completion. This outcome is in accordance with expectations, as DST also necessitated a smaller total number of episodes, thereby demanding fewer computational resources.

Conversely, the MBST environment consistently exhibits a notably higher processing time requirement. This observation emphasizes the complexity and resource-intensive nature of this environment compared to the others. The extended processing times in MBST reflect the additional computational effort needed to navigate its multi-objective landscape effectively.

The varying processing times underscore the flexibility of the Deep Q-Managed algorithm in handling diverse environments and threshold configurations. Although processing times may vary, the algorithm demonstrates its ability to efficiently allocate computational resources to address the distinct challenges posed by each environment. This ability to adapt is particularly advantageous when the significance of computational efficiency is paramount.

The analysis presented here showed that the Deep Q-Managed algorithm is a multifaceted approach to MORL, characterized by a set of tradeoffs and distinct advantages that set it apart from other methodologies. These characteristics contribute to its adaptability, efficiency, and innovative contributions.

One notable trade-off observed in Deep Q-Managed pertains to its stochastic convergence behavior. Across different environments and convergence thresholds, the number of episodes required for completion is variable. While this can entail longer convergence times in some cases, it also underscores the algorithm's capacity to adapt to different challenges. This adaptability is in accordance with the dynamic nature of reinforcement learning tasks, where precise convergence times can exhibit significant variations.

It was also found that the algorithm's processing time varies considerably across environments and threshold configurations. The complexity of the environment and the stringency of the convergence criteria are among the factors that cause this variability. While Deep Q-Managed may require longer processing times in certain instances, this attribute highlights its capacity to allocate computational resources effectively, ensuring that challenging tasks are tackled with the appropriate computational effort.

However, despite these tradeoffs, Deep Q-Managed boasts several notable advantages. One of its greatest strengths lies in its ability to explore and exploit points in the Pareto front. It consistently identifies all points within the Pareto front, demonstrating its adeptness in multi-objective problem-solving, a vital skill in practical applications where decisions often involve balancing multiple objectives.

Furthermore, Deep Q-Managed achieves competitive convergence, especially in scenarios with lower conversion thresholds. This efficiency demonstrates its adaptability to learning and making decisions in environments that require precise and rapid convergence. Its ability to perform well across diverse environments underscores its robustness and versatility.

Another asset is the proposed approach's resource allocation capability. It utilizes computational resources efficiently, tailoring its efforts to meet the specific challenges posed by each environment.

The algorithm thus presents a compelling mix of trade-offs and advantages, reflecting the dynamic and multifaceted nature of reinforcement learning environments. Its stochastic convergence and fluctuating processing times are balanced by competitive convergence, adaptive resource allocation, and strong multi-objective capabilities. These strengths position Deep Q-Managed as an approach with the potential to excel in demanding, real-world applications.

4 Conclusion

This paper introduced Deep Q-Managed, a multi-objective reinforcement learning framework that combines Double Q-learning and Dueling Network architectures to address overestimation bias and the curse of dimensionality. Extending the original Q-Managed method with deep neural networks, the approach enables systematic discovery of Pareto-optimal policies across environments with convex, concave, and mixed Pareto Fronts.

Across all benchmarks, Deep Q-Managed consistently attained the maximum hypervolume and successfully identified every point of the Pareto Front. Compared to scalarized baselines, it demonstrated superior coverage and robustness, while retaining competitive convergence rates relative to the original Q-Managed framework. Notably, convergence in DST and BST was achieved efficiently, whereas MBST required longer training and higher computational effort due to its more complex Pareto structure. This highlights a characteristic trade-off of the method: broader policy discovery at the expense of increased computational demand in challenging environments.

The present study focused on deterministic episodic environments and used averaged hypervolume indicators as the primary measure of performance. Looking forward, extending validation to stochastic and continuous control tasks [e.g., MuJoCo (Todorov et al., 2012)], incorporating formal statistical analyses, and exploring parallel multi-agent training are promising directions to strengthen the generality, scalability, and robustness of the approach.

Deep Q-Managed thus offers a flexible and reproducible framework for multi-objective reinforcement learning. With open-source code available at https://github.com/xarmison/deep_q_managed, we aim to encourage further validation and application in real-world domains such as robotics, finance, and healthcare.

In conclusion, Deep Q-Managed represents an advancement in the field of MORL. It exhibits adaptability and robustness, navigating environments with convex, concave, or hybrid Pareto fronts. Its prowess at locating and exploiting points in the Pareto front demonstrates its aptitude for tackling multiple objectives. The core concept behind Deep Q-Managed is to provide a versatile, effective, and cooperative framework for tackling the diverse obstacles in multi-objective environments.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://github.com/xarmison/deep_q_managed.

Author contributions

RM: Writing – review & editing, Visualization, Writing – original draft, Methodology, Formal analysis, Investigation, Conceptualization, Software. TF: Validation, Project administration, Writing – review & editing, Formal analysis, Supervision. LS: Writing – review & editing, Formal analysis, Validation, Supervision. AD: Supervision, Writing – review & editing, Validation, Project administration, Formal analysis.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPq).

Conflict of interest

RM was employed by Tractian Tecnologies Inc.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript. Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Bäck, T. (1996). Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford: Oxford University Press. doi: 10.1093/oso/9780195099713.001.0001

Bellman, R. (1966). Dynamic programming. Science 153, 34–37. doi: 10.1126/science.153.3731.34

Corne, D. W., Knowles, J. D., and Oates, M. J. (2000). “The pareto envelope-based selection algorithm for multiobjective optimization,” in International Conference on Parallel Problem Solving From Nature (Springer), 839–848. doi: 10.1007/3-540-45356-3_82

Dabney, W., Rowland, M., Bellemare, M., and Munos, R. (2018). “Distributional reinforcement learning with quantile regression,” in Proceedings of the AAAI Conference on Artificial Intelligence. doi: 10.1609/aaai.v32i1.11791

Das, I., and Dennis, J. E. (1997). A closer look at drawbacks of minimizing weighted sums of objectives for pareto set generation in multicriteria optimization problems. Struct. Optim. 14, 63–69. doi: 10.1007/BF01197559

de Oliveira, T. H. F., de Souza Medeiros, L. P., Neto, A. D. D., and Melo, J. D. (2021). Q-managed: a new algorithm for a multiobjective reinforcement learning. Expert Syst. Appl. 168:114228. doi: 10.1016/j.eswa.2020.114228

Deb, K. (2008). “Introduction to evolutionary multiobjective optimization,” in Multiobjective Optimization (Springer), 59–96. doi: 10.1007/978-3-540-88908-3_3

Deb, K., and Kalyanmoy, D. (2001). Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley & Sons, Inc. Available online at: https://www.academia.edu/33099123/Deb_2001_Multi_Objective_Optimization_Using_Evolutionary_Algorithms (Accessed November 10, 2025).

Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evolut. Comput. 6, 182–197. doi: 10.1109/4235.996017

Deb, K., and Sundar, J. (2006). “Reference point based multi-objective optimization using evolutionary algorithms,” in Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, 635–642. doi: 10.1145/1143997.1144112

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. London: MIT press.

Hayes, C. F., Rădulescu, R., Bargiacchi, E., Källström, J., Macfarlane, M., Reymond, M., et al. (2022). A practical guide to multi-objective reinforcement learning and planning. Auton. Agent Multi. Agent Syst. 36:26. doi: 10.1007/s10458-022-09552-y

Humphrys, M. (1996). Action selection methods using reinforcement learning. From Animals Animats 4, 135–144. doi: 10.7551/mitpress/3118.003.0018

Ishibuchi, H., Imada, R., Setoguchi, Y., and Nojima, Y. (2018). How to specify a reference point in hypervolume calculation for fair performance comparison. Evol. Comput. 26, 411–440. doi: 10.1162/evco_a_00226

Karlsson, J. (1997). Learning to Solve Multiple Goals. Rochester, NY: University of Rochester.

Knowles, J., and Corne, D. (2002). “On metrics for comparing nondominated sets,” in Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No. 02TH8600) (IEEE), 711–716. doi: 10.1109/CEC.2002.1007013

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521, 436–444. doi: 10.1038/nature14539

Miettinen, K., and Mäkelä, M. M. (2002). On scalarizing functions in multiobjective optimization. OR Spectrum 24, 193–213. doi: 10.1007/s00291-001-0092-9

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature 518, 529–533. doi: 10.1038/nature14236

Pareto, V. (1964). Cours d'économie politique, volume 1. Paris: Librairie Droz. doi: 10.3917/droz.paret.1964.01

Poggio, T., and Liao, Q. (2018). Theory I: deep networks and the curse of dimensionality. Bull. Polish Acad. Sci. 6:125924. doi: 10.24425/bpas.2018.125924

Sutton, R. S., and Barto, A. G. (2018). Reinforcement Learning: An Introduction. London: MIT press.

Tamar, A., Wu, Y., Thomas, G., Levine, S., and Abbeel, P. (2016). “Value iteration networks,” in Advances in Neural Information Processing Systems, 29. doi: 10.24963/ijcai.2017/700

Todorov, E., Erez, T., and Tassa, Y. (2012). “Mujoco: a physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE), 5026–5033. doi: 10.1109/IROS.2012.6386109

Vamplew, P., Dazeley, R., Berry, A., Issabekov, R., and Dekker, E. (2011). Empirical evaluation methods for multiobjective reinforcement learning algorithms. Mach. Learn. 84, 51–80. doi: 10.1007/s10994-010-5232-5

Vamplew, P., Issabekov, R., Dazeley, R., and Foale, C. (2015). “Reinforcement learning of pareto-optimal multiobjective policies using steering,” in AI 2015: Advances in Artificial Intelligence: 28th Australasian Joint Conference, Canberra, ACT, Australia, November 30-December 4, 2015, Proceedings 28 (Springer), 596–608. doi: 10.1007/978-3-319-26350-2_53

Vamplew, P., Yearwood, J., Dazeley, R., and Berry, A. (2008). “On the limitations of scalarisation for multi-objective reinforcement learning of pareto fronts,” in Australasian Joint Conference on Artificial Intelligence (Springer), 372–378. doi: 10.1007/978-3-540-89378-3_37

Van Hasselt, H., Guez, A., and Silver, D. (2016). “Deep reinforcement learning with Double Q-Learning,” in Proceedings of the AAAI Conference on Artificial Intelligence. doi: 10.1609/aaai.v30i1.10295

Van Moffaert, K. (2016). Multi-criteria reinforcement learning for sequential decision making problems. PhD thesis, Vrije Universiteit Brussel.

Van Moffaert, K., Brys, T., and Nowé, A. (2014). Efficient weight space search in multi-objective reinforcement learning. Technical report, Vrije Universiteit Brussel.

Van Moffaert, K., Drugan, M. M., and Nowé, A. (2013). “Scalarized multi-objective reinforcement learning: novel design techniques,” in 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) (IEEE), 191–199. doi: 10.1109/ADPRL.2013.6615007

Wang, Z., Schaul, T., Hessel, M., Hasselt, H., Lanctot, M., and Freitas, N. (2016). “Dueling network architectures for deep reinforcement learning,” in International Conference on Machine Learning (PMLR), 1995–2003.

Zhang, W., Xiao, G., Gen, M., Geng, H., Wang, X., Deng, M., et al. (2024). Enhancing multi-objective evolutionary algorithms with machine learning for scheduling problems: recent advances and survey. Front. Ind. Eng. 2:1337174. doi: 10.3389/fieng.2024.1337174

Zitzler, E., Laumanns, M., and Thiele, L. (2001). SPEA2: improving the strength Pareto evolutionary algorithm. TIK Report 103. doi: 10.3929/ethz-a-004284029

Zou, Y. (2025). Smart material optimization using reinforcement learning in multi-dimensional self-assembly. Front. Mater. 12:1526892. doi: 10.3389/fmats.2025.1526892

Keywords: Deep Q-Learning, Double Q-Learning, dueling networks, multiobjective reinforcement learning, machine learning

Citation: Menezes R, Freire de Oliveira TH, de Souza Medeiros LP and Dória Neto AD (2025) Deep Q-Managed: a new framework for multi-objective deep reinforcement learning. Front. Artif. Intell. 8:1683323. doi: 10.3389/frai.2025.1683323

Received: 10 August 2025; Accepted: 27 October 2025;
Published: 02 December 2025.

Edited by:

Jinjia Zhou, Hosei University, Japan

Reviewed by:

Tiago Ferrão Venâncio, Polytechnic Institute of Portalegre, Portugal
Iryana Muhammad, Universitas Malikussaleh, Indonesia

Copyright © 2025 Menezes, Freire de Oliveira, de Souza Medeiros and Dória Neto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Richardson Menezes, richardson-santiago@hotmail.com
