ORIGINAL RESEARCH article
Front. Robot. AI
Sec. Robot Learning and Evolution
Volume 12 - 2025 | doi: 10.3389/frobt.2025.1652050
This article is part of the Research Topic: Reinforcement Learning for Real-World Robot Navigation
Trustworthy Navigation with Variational Policy in Deep Reinforcement Learning
Provisionally accepted
1 Rochester Institute of Technology (RIT), Rochester, United States
2 The University of Texas Rio Grande Valley, Brownsville, United States
Developing a reliable and trustworthy navigation policy with deep reinforcement learning (DRL) for mobile robots is extremely challenging, particularly in real-world, highly dynamic environments. Exploring and navigating unknown environments without prior knowledge, while avoiding obstacles and collisions, remains a demanding task for mobile robots. This study introduces Trust-Nav, a novel trustworthy navigation framework that uses variational policy learning to quantify uncertainty in the estimates of the robot's actions, localization, and map representation. Trust-Nav employs a Bayesian variational approximation of the posterior distribution over the parameters of the policy network. Policy-based and value-based learning are combined to guide the robot's actions in unknown environments. We derive the propagation of the variational moments through all layers of the policy network, employing a first-order approximation for the nonlinear activation functions. The uncertainty in the robot's actions is measured by the variational covariance propagated through the DRL policy network, while the uncertainty in the robot's localization and mapping is embedded in the reward function and is derived from the classical Theory of Optimal Experimental Design. The total loss function optimizes the parameters of the policy and value networks to maximize the robot's cumulative reward in an unknown environment. Experiments conducted in the Gazebo robotics simulator demonstrate the superior performance of the proposed Trust-Nav model in achieving robust autonomous navigation and mapping. Trust-Nav consistently outperforms deterministic DRL approaches, particularly in complex environments involving noisy conditions and adversarial attacks.
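The abstract names two mechanisms without implementation detail: first-order variational moment propagation through the policy network, and an Optimal-Experimental-Design-style uncertainty term in the reward. The following minimal NumPy sketch illustrates both under the assumption of mean-field Gaussian variational posteriors over the weights; the function and variable names (propagate_linear, propagate_tanh, oed_reward) are illustrative and do not come from the paper.

    # Minimal sketch (not the authors' code): variational moment propagation
    # through one linear layer plus a tanh activation, assuming mean-field
    # Gaussian posteriors over the weights, independent of the layer input.
    import numpy as np

    def propagate_linear(mu_x, Sigma_x, W_mean, W_var):
        """Propagate the input mean/covariance through a variational linear layer.

        mu_x:    (d_in,)        input mean
        Sigma_x: (d_in, d_in)   input covariance
        W_mean:  (d_out, d_in)  variational means of the weights
        W_var:   (d_out, d_in)  variational variances of the weights (mean-field)
        """
        mu_z = W_mean @ mu_x
        # Cross term from the input covariance plus a diagonal contribution
        # from the weight variances: sum_j W_var[i, j] * E[x_j^2].
        Sigma_z = W_mean @ Sigma_x @ W_mean.T
        second_moment_x = np.diag(Sigma_x) + mu_x**2
        Sigma_z += np.diag(W_var @ second_moment_x)
        return mu_z, Sigma_z

    def propagate_tanh(mu_z, Sigma_z):
        """First-order (delta-method) approximation through a tanh nonlinearity."""
        J = np.diag(1.0 - np.tanh(mu_z)**2)   # Jacobian of tanh at the mean
        return np.tanh(mu_z), J @ Sigma_z @ J.T

    def oed_reward(base_reward, P_slam, lam=0.1):
        """Hypothetical reward shaping: penalize localization/map uncertainty
        via the D-optimality criterion, i.e., the log-determinant of the
        SLAM covariance P_slam."""
        return base_reward - lam * np.linalg.slogdet(P_slam)[1]

    # Action uncertainty is the covariance propagated to the policy output.
    mu, Sigma = np.array([0.2, -0.5]), 0.01 * np.eye(2)
    W_mean, W_var = np.random.randn(3, 2), 0.05 * np.ones((3, 2))
    mu, Sigma = propagate_tanh(*propagate_linear(mu, Sigma, W_mean, W_var))
    action_uncertainty = np.trace(Sigma)   # A-optimality-style scalar summary

The scalar summaries used here (trace and log-determinant of a covariance matrix) correspond to the A- and D-optimality criteria of classical Optimal Experimental Design; the paper itself should be consulted for the specific criterion and weighting actually used in the Trust-Nav reward.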
Keywords: deep reinforcement learning, robot uncertainty, trustworthy navigation, variational policy, moment propagation
Received: 23 Jun 2025; Accepted: 03 Sep 2025.
Copyright: © 2025 Dera, Van Aardt, Ernst, Nadeem and Pedraza. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Dimah Dera, Rochester Institute of Technology (RIT), Rochester, United States
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.