AUTHOR=Bockrath Karla , Ernst Liam , Nadeem Rohaan , Pedraza Bryan , Dera Dimah TITLE=Trustworthy navigation with variational policy in deep reinforcement learning JOURNAL=Frontiers in Robotics and AI VOLUME=Volume 12 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/robotics-and-ai/articles/10.3389/frobt.2025.1652050 DOI=10.3389/frobt.2025.1652050 ISSN=2296-9144 ABSTRACT=Introduction: Developing a reliable and trustworthy navigation policy in deep reinforcement learning (DRL) for mobile robots is extremely challenging, particularly in real-world, highly dynamic environments. In particular, exploring and navigating unknown environments without prior knowledge, while avoiding obstacles and collisions, remains difficult for mobile robots. Methods: This study introduces Trust-Nav, a novel trustworthy navigation framework that uses variational policy learning to quantify uncertainty in the estimation of the robot’s action, localization, and map representation. Trust-Nav employs a Bayesian variational approximation of the posterior distribution over the parameters of the policy neural network. Policy-based and value-based learning are combined to guide the robot’s actions in unknown environments. We derive the propagation of variational moments through all layers of the policy network and employ a first-order approximation for the nonlinear activation functions. The uncertainty in the robot’s action is measured by the variational covariance propagated through the DRL policy network. The uncertainty in the robot’s localization and mapping is embedded in the reward function and is derived from the classical theory of optimal experimental design.
The total loss function optimizes the parameters of the policy and value networks to maximize the robot’s cumulative reward in an unknown environment. Results: Experiments conducted using the Gazebo robotics simulator demonstrate the superior performance of the proposed Trust-Nav model in achieving robust autonomous navigation and mapping. Discussion: Trust-Nav consistently outperforms deterministic DRL approaches, particularly in complicated environments involving noisy conditions and adversarial attacks. This integration of uncertainty into the policy network promotes safer and more reliable navigation, especially in complex or unpredictable environments. Trust-Nav offers a step toward deployable, self-aware robotic systems capable of recognizing and responding to their own limitations.
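
The moment propagation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a mean-field Gaussian over the weights (independent entries with mean M and variance S), a deterministic bias, and a tanh activation, none of which the abstract specifies. The function names `linear_moments` and `tanh_moments` are hypothetical. The linear step is exact under these assumptions; the activation step uses a first-order Taylor expansion around the input mean.

```python
import math


def linear_moments(M, S, b, mu, Sigma):
    """Mean and covariance of y = W x + b.

    Assumptions (illustrative only): W[i][k] ~ N(M[i][k], S[i][k]) with
    independent entries, b deterministic, and x independent of W with
    mean mu and covariance Sigma.
    """
    n_out, n_in = len(M), len(mu)
    mean = [sum(M[i][k] * mu[k] for k in range(n_in)) + b[i]
            for i in range(n_out)]
    # Term from input uncertainty: M Sigma M^T.
    m_sigma = [[sum(M[i][k] * Sigma[k][l] for k in range(n_in))
                for l in range(n_in)] for i in range(n_out)]
    cov = [[sum(m_sigma[i][l] * M[j][l] for l in range(n_in))
            for j in range(n_out)] for i in range(n_out)]
    # Term from weight uncertainty: only the diagonal is affected because
    # the rows of W are independent of each other.
    for i in range(n_out):
        cov[i][i] += sum(S[i][k] * (Sigma[k][k] + mu[k] ** 2)
                         for k in range(n_in))
    return mean, cov


def tanh_moments(mean, cov):
    """First-order (Taylor) propagation through an elementwise tanh.

    mean_out ~= tanh(mean); cov_out ~= J cov J^T, where J is the diagonal
    Jacobian of tanh evaluated at the input mean.
    """
    grad = [1.0 - math.tanh(m) ** 2 for m in mean]
    out_mean = [math.tanh(m) for m in mean]
    n = len(mean)
    out_cov = [[grad[i] * cov[i][j] * grad[j] for j in range(n)]
               for i in range(n)]
    return out_mean, out_cov


# Usage: a deterministic observation passed through one uncertain layer.
x_mean = [1.0, -2.0]
x_cov = [[0.0, 0.0], [0.0, 0.0]]
W_mean = [[0.5, 0.0], [0.0, 0.25]]
W_var = [[0.01, 0.01], [0.04, 0.0]]
bias = [0.0, 0.0]

y_mean, y_cov = linear_moments(W_mean, W_var, bias, x_mean, x_cov)
a_mean, a_cov = tanh_moments(y_mean, y_cov)
# The diagonal of a_cov is a per-output variance; in a policy network the
# covariance at the final layer would play the role of action uncertainty.
```

Stacking these two steps layer by layer carries a mean and a covariance from the observation to the action output. A diagonal weight covariance keeps the sketch short; a richer variational family would change the weight-uncertainty term.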