ORIGINAL RESEARCH article

Front. Robot. AI, 18 December 2025

Sec. Robot Learning and Evolution

Volume 12 - 2025 | https://doi.org/10.3389/frobt.2025.1625968

This article is part of the Research Topic: Advances and Challenges in Mobile Robot Design and Control for Diverse Environments.

Adaptive mapless mobile robot navigation using deep reinforcement learning based improved TD3 algorithm

  • 1Department of Information Technology, Central University of Kashmir, Ganderbal, Jammu and Kashmir, India
  • 2Department of Computer Science and Engineering, National Institute of Technology, Srinagar, Jammu and Kashmir, India

Navigating in unknown environments without prior maps poses a significant challenge for mobile robots due to sparse rewards, dynamic obstacles, and limited prior knowledge. This paper presents an Improved Deep Reinforcement Learning (DRL) framework based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for adaptive mapless navigation. In addition to architectural enhancements, the proposed method offers theoretical benefits: it incorporates a latent-state encoder and predictor module to transform high-dimensional sensor inputs into compact embeddings. This compact representation reduces the effective dimensionality of the state space, enabling smoother value-function approximation and mitigating overestimation errors common in actor–critic methods. It uses intrinsic rewards derived from prediction error in the latent space to promote exploration of novel states. The intrinsic reward encourages the agent to prioritize uncertain yet informative regions, improving exploration efficiency under sparse extrinsic reward signals and accelerating convergence. Furthermore, training stability is achieved through regularization of the latent space via a maximum mean discrepancy (MMD) loss. By enforcing consistent latent dynamics, the MMD constraint reduces variance in target value estimation and results in more stable policy updates. Experimental results in simulated ROS2/Gazebo environments demonstrate that the proposed framework outperforms standard TD3 and other improved TD3 variants. Our model achieves a 93.1% success rate and a low 6.8% collision rate, reflecting efficient and safe goal-directed navigation. These findings confirm that combining intrinsic motivation, structured representation learning, and regularization-based stabilization produces more robust and generalizable policies for mapless mobile robot navigation.

1 Introduction

Deep reinforcement learning (DRL) has emerged as a powerful framework for control and decision-making in robotics, enabling end-to-end learning of complex navigation policies without explicit programming. RL methods such as Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) have demonstrated success in video games and continuous control tasks (Lillicrap et al., 2015). In robotics, DRL promises to overcome the limitations of classical planners by learning directly from sensor observations and environmental interactions (Kober and Peters, 2013; Tang et al., 2024). In particular, mapless navigation, steering a mobile robot to a goal without a priori maps, is an active research area due to its importance for deployment in unknown or dynamic environments where mapping is difficult or costly. Traditional approaches rely on explicit mapping and path planning algorithms (e.g., SLAM followed by A* or DWA) (Fox et al., 1997; Khatib, 1986), but these can fail in cluttered or partially observable settings and require manual tuning. DRL can potentially learn robust local navigation behaviors directly from sensor inputs, adapting to unseen obstacles (Tai et al., 2017). However, DRL for mapless navigation faces challenges such as sparse rewards, sample inefficiency, and safety constraints (Li et al., 2023).

To address these, we develop an Improved Twin Delayed DDPG (Improved TD3) algorithm that augments the standard TD3 architecture with several enhancements: a learned latent state representation via an autoencoder-style encoder, an auxiliary predictor model that estimates the next latent state given the current state and action, and intrinsic rewards based on prediction error. Our method is motivated by recent work on curiosity-driven exploration and representation learning in RL (Pathak et al., 2017), and the need for efficient exploration in mapless navigation (Ruan et al., 2022). In summary, our contributions are:

• A comprehensive extension of the TD3 algorithm for mapless mobile robot navigation, incorporating a latent encoder-predictor and intrinsic reward that guides exploration.

• A detailed implementation including mathematical definitions of the encoder, critic and actor networks, and intrinsic reward computation.

• Experimental validation in ROS2/Gazebo simulation showing that the improved TD3 significantly outperforms the standard TD3 baseline in mean reward, success rate, and collision avoidance.

The rest of the paper is organized as follows: Section 2 reviews related work; Section 3 presents the standard TD3 algorithm; Section 4 details the improved TD3 method; Section 5 describes experiments, metrics, and results; and Section 6 concludes with a summary and future directions.

2 Related work

2.1 DRL for navigation

Reinforcement learning has been increasingly applied to robotic navigation tasks. Early RL-based navigation used discrete controllers or small state spaces, but modern DRL uses neural networks to handle high-dimensional inputs (e.g., images or LIDAR). For instance, Tai et al. (2017) demonstrated a mapless navigation planner trained end-to-end with asynchronous DDPG using a sparse 10-dimensional laser scan and relative target position; their planner transferred from simulation to a real robot without explicit mapping.

More recent works address exploration and sample efficiency in mapless navigation. For example, Yadav et al. (2023) employed Soft Actor-Critic (SAC) with curriculum learning and dual prioritized replay for mobile robot navigation, highlighting the challenge of sparse rewards. Ruan et al. (2022) introduced a curiosity-based intrinsic motivation combined with a temporal cognition module for visual navigation, suggesting that self-generated rewards can improve exploration. Li et al. (2023) combined a human-designed gap-detection planner with a DRL agent, effectively incorporating prior knowledge to accelerate learning in complex scenes. These approaches emphasize the value of intrinsic rewards or hybrid methods to overcome DRL limitations in navigation.

Parallel to DRL, classical navigation methods remain widely used. Techniques such as potential fields (Khatib, 1986) and the Dynamic Window Approach (DWA) (Fox et al., 1997) perform reactive obstacle avoidance given a map or sensor data, but often require careful design and can get stuck in local minima. Hybrid methods have also been explored: Liu et al. (2024) proposed TD3-DWA, which combines TD3 with DWA by treating the DWA parameters as quantities tunable by the policy. Our work avoids the use of traditional planners such as DWA or A*, relying instead on learning-based policies guided by goal-relative information and onboard sensing.

2.2 TD3 and actor-critic methods

Our baseline algorithm is Twin Delayed DDPG (TD3) (Fujimoto et al., 2018), an off-policy actor-critic method for continuous control. TD3 addresses function-approximation error in DDPG (Lillicrap et al., 2016) by three tricks: (1) using two independent Q-networks and taking the minimum for target computation (clipped double-Q learning); (2) adding clipped noise to the target policy (policy smoothing regularization); and (3) delaying actor updates relative to critic updates. TD3 typically outperforms vanilla DDPG and some on-policy methods due to reduced overestimation and variance. Several works have since extended or optimized TD3 for robotics.

Actor-critic methods like TD3, DDPG, and SAC (Haarnoja et al., 2018) have been favored in robotics for handling continuous actions and sample reuse via replay. SAC adds entropy maximization for exploration stability (Haarnoja et al., 2018). Asynchronous variants like A3C/A2C (Mnih et al., 2016) also apply to navigation, but rely on on-policy updates. In our work we build on TD3 due to its strong performance and off-policy efficiency, while incorporating additional modules to tackle exploration in unknown environments.

2.3 Intrinsic motivation and representation learning

Intrinsic reward strategies have proven effective in improving exploration when extrinsic rewards are sparse. A common approach is to train a predictor network and use its error as a curiosity signal (Pathak et al., 2017). For instance, Pathak et al. (2017) defined curiosity as the error in predicting the consequence of actions in a learned feature space, encouraging exploration of novel states. Others have used prediction uncertainty or state-counts for intrinsic rewards. In line with this, our Improved TD3 uses the error of a learned state-transition predictor (operating in latent space) as an intrinsic reward, thus motivating the agent to visit states where its model is poor.

2.4 Improved TD3 variants

Several recent studies have extended the TD3 framework for mobile robot navigation. Jeng and Chiang (2023) introduced survival-based penalty shaping in TD3 to improve convergence and reduce collision. Yang et al. (2024) added an intrinsic curiosity module (ICM) and randomness-enhancement to help the agent escape sparse-reward local optima. Kashyap and Konathalapalli (2025) compared TD3 against DDPG and DQN on a TurtleBot3 platform using ROS2 and LiDAR, showing that TD3 offered the best performance.

Neamah and Mayorga Mayorga (2024) proposed Optimized TD3 (O-TD3) with prioritized replay and parallel critic updates, achieving high success in human-crowded scenarios. Raj and Kos (2024) implemented dynamic delay updates, Ornstein–Uhlenbeck noise, and curriculum transfer learning on top of TD3. Huang et al. (2024) developed LP-TD3, which fuses LSTM memory, prioritized experience replay, and intrinsic curiosity rewards, significantly improving convergence and generalization.

3 Standard TD3 algorithm

We first review the standard TD3 algorithm (as in Fujimoto et al. (2018)) to establish notation. We consider a Markov decision process with continuous state space $\mathcal{S} \subseteq \mathbb{R}^n$ and action space $\mathcal{A} \subseteq \mathbb{R}^m$. At each time step $t$, the agent observes state $s_t$, selects action $a_t = \pi_\phi(s_t)$ given the deterministic policy (actor) $\pi_\phi$, and receives reward $r_t$ and next state $s_{t+1}$. The goal is to maximize the discounted return $R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$.

Algorithm 1 summarizes standard TD3. Here the actor and critic are typically multilayer neural networks with hidden layers of size 256, ReLU activations, and a final tanh on the actor output (to enforce action bounds). TD3’s three heuristics (minimum of two Q’s, target action noise, and delayed updates) together reduce overestimation and variance. In practice, we set the policy noise std $\sigma_{\text{noise}} = 0.2$, noise clip $c = 0.5$, policy delay $d = 2$, and soft-update rate $\tau = 0.005$, as in Fujimoto et al. (2018). These settings are also used in our improved Algorithm 2 for fair comparison.

Algorithm 1. Standard TD3.

Algorithm 2. Improved TD3 (ITD3).

The TD3 critic consists of two Q-networks $Q_{\theta_1}(s,a)$ and $Q_{\theta_2}(s,a)$ with parameters $\theta_1, \theta_2$, and corresponding target networks $Q_{\theta'_1}$ and $Q_{\theta'_2}$. The actor (policy) network is $\pi_\phi(s)$ with parameters $\phi$, and has a delayed target $\pi_{\phi'}(s)$. The algorithm proceeds as follows.
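For concreteness, the target computation that combines clipped double-Q learning with target-policy smoothing can be sketched in PyTorch-style code as below. The function and variable names are ours, not taken from the authors' implementation, and the replay batch is assumed to hold tensors of shape [batch, ·]:

```python
import torch

def td3_target(batch, actor_target, critic1_target, critic2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """Compute y = r + gamma * min(Q'_1, Q'_2)(s', a~) with a smoothed target action."""
    state, action, reward, next_state, done = batch  # tensors sampled from the replay buffer
    with torch.no_grad():
        # Target-policy smoothing: add clipped Gaussian noise to the target action.
        noise = (torch.randn_like(action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (actor_target(next_state) + noise).clamp(-max_action, max_action)
        # Clipped double-Q learning: take the element-wise minimum of the two target critics.
        q1 = critic1_target(next_state, next_action)
        q2 = critic2_target(next_state, next_action)
        target_q = reward + gamma * (1.0 - done) * torch.min(q1, q2)
    return target_q
```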

4 Proposed improved TD3 (ITD3)

Our Proposed Improved TD3 (ITD3) augments the standard architecture with an encoder–predictor module, an intrinsic reward signal, and latent space regularization via MMD. We describe each component and the overall training procedure in detail.

4.1 Encoder–predictor architecture

We introduce a learned latent encoding of the state to capture useful features. Let the original state vector be $s \in \mathbb{R}^n$ (e.g., LIDAR readings plus robot pose). An encoder network $E_\psi(s)$ produces a latent embedding $z_s = E_\psi(s) \in \mathbb{R}^d$ (we use $d = 256$). Concretely, $E_\psi$ is a feedforward network with two hidden layers of size $h_{\text{enc}} = 256$ and nonlinear activations (e.g., ReLU), followed by an AvgL1Norm layer normalizing the output. Formally:

$z_s = \text{AvgL1Norm}\big(\sigma(W_3\,\sigma(W_2\,\sigma(W_1 s)))\big),$

where $\sigma$ is the ReLU activation, $W_i$ are weight matrices, and AvgL1Norm normalizes the vector.
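A minimal sketch of this encoder is shown below, assuming that AvgL1Norm divides each latent vector by the mean of its absolute values; the exact definition of the normalization layer is not given in the text, so that detail is an assumption:

```python
import torch
import torch.nn as nn

def avg_l1_norm(x, eps=1e-8):
    # Normalize each vector by the mean absolute value of its entries
    # (our assumption for the AvgL1Norm layer described in the text).
    return x / x.abs().mean(dim=-1, keepdim=True).clamp(min=eps)

class Encoder(nn.Module):
    """State encoder E_psi: s -> z_s, two hidden ReLU layers of width 256."""
    def __init__(self, state_dim, latent_dim=256, hidden=256):
        super().__init__()
        self.l1 = nn.Linear(state_dim, hidden)
        self.l2 = nn.Linear(hidden, hidden)
        self.l3 = nn.Linear(hidden, latent_dim)

    def forward(self, state):
        h = torch.relu(self.l1(state))
        h = torch.relu(self.l2(h))
        return avg_l1_norm(torch.relu(self.l3(h)))
```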

Given $z_s$ and an action $a \in \mathbb{R}^m$, we also define a predictor network that estimates the next latent state. Specifically, we compute $\hat{z}_{s'} = P_\psi(z_s, a)$ via two hidden layers of size 256 and ReLU activation:

$\hat{z}_{s'} = P_\psi(z_s, a) = W_5\,\sigma\big(W_4\,[z_s; a]\big) + b_5,$

with the concatenated input $[z_s; a] \in \mathbb{R}^{d+m}$. This predictor shares weights with the encoder beyond the first layer, ensuring $P$ and $E$ use the same latent basis. The latent of the actual next state is $z_{s'} = E_\psi(s_{t+1})$.

The actor (policy) network is modified to condition on the latent: instead of taking only $s$, it first encodes $s$ to $z_s$ and then computes the action. In practice we implement a split architecture: the actor first processes $s$ through a hidden layer, concatenates the resulting features with $z_s$, then applies two more layers and a tanh output layer. The critic networks likewise take $(s, a)$, but also incorporate $z_s$ and $z_{sa} = P_\psi(z_s, a)$ as additional inputs: each Q-network processes $[z_s, z_{sa}]$ alongside $[s, a]$ (with appropriate layers). This way, the critic value depends on both the raw state-action pair and the learned latent embedding.
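The sketch below, which builds on the Encoder sketch above, illustrates one possible wiring of the predictor and a multi-input Q-network. The specific weight-tying scheme (reusing the encoder's second layer) and the plain concatenation of $[s, a, z_s, z_{sa}]$ are our assumptions, since the text only states that the critic processes the latents "with appropriate layers":

```python
import torch
import torch.nn as nn

class LatentPredictor(nn.Module):
    """Predictor P_psi: (z_s, a) -> z_hat_s'.
    Assumed wiring: a first layer maps [z_s; a] to the hidden width, the second
    layer is tied to the encoder's second layer (W_2), and a predictor-specific
    output layer produces the predicted latent."""
    def __init__(self, encoder, latent_dim=256, action_dim=2, hidden=256):
        super().__init__()
        self.proj = nn.Linear(latent_dim + action_dim, hidden)
        self.shared = encoder.l2          # weight tying with the encoder (assumption)
        self.out = nn.Linear(hidden, latent_dim)

    def forward(self, z, action):
        h = torch.relu(self.proj(torch.cat([z, action], dim=-1)))
        h = torch.relu(self.shared(h))
        return self.out(h)

class MultiInputCritic(nn.Module):
    """One Q-network Q(s, a, z_s, z_hat); ITD3 keeps two of these plus target copies.
    Plain concatenation of the inputs is a simplification of the description in the text."""
    def __init__(self, state_dim, action_dim, latent_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 2 * latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, z, z_hat):
        return self.net(torch.cat([state, action, z, z_hat], dim=-1))
```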

4.1.1 Benefits of the encoder–predictor

TD3 can directly process continuous states; however, our encoder–predictor compresses high-dimensional sensor streams (e.g., hundreds of LiDAR beams plus pose) into a compact latent vector. Research on representation learning shows that such compact, structured representations improve sample efficiency and performance by simplifying the critic’s function-approximation task and avoiding overfitting to noisy inputs. The predictor further supplies an auxiliary self-supervised signal that organizes latent features according to the dynamics, encouraging better generalization. In our experiments, training curves (see Section 5) reveal that ITD3 attains high evaluation rewards earlier than a baseline TD3 operating directly on raw states, and exhibits lower variance in Q-values and losses, highlighting improved sample efficiency and stability. The latent embedding dimension is set to $d = 256$ to balance information preservation and simplicity. A very large latent dimension risks redundancy and overfitting, adding parameters without improving performance and potentially slowing convergence, whereas a very small dimension can discard essential features, leading to information loss and suboptimal policies. During preliminary hyperparameter tuning, smaller latent dimensions (e.g., 128) resulted in reduced final success rates and poorer generalization, while larger dimensions (e.g., 512) offered no noticeable benefit and increased training time. Thus $d = 256$ serves as a reasonable trade-off between information retention and simplicity, consistent with state representation learning guidelines: it substantially compresses the high-dimensional LiDAR and pose observations while providing sufficient capacity to capture salient structure, offering tangible benefits without introducing unnecessary redundancy.

4.1.2 Weight-sharing rationale and expressive capacity

Although the predictor shares weights with the encoder after the first hidden layer, this design choice does not unduly limit expressiveness. Weight sharing, also known as weight tying, is a well-established regularization technique in neural networks: by reducing the number of distinct parameters, it saves memory and helps prevent overfitting. In language models, weight tying between embedding and softmax layers reduces parameter count and improves generalization. Similarly, sharing the encoder’s second-layer weights ($W_2$) ensures that the encoded latent $z$ and the predicted latent $\hat{z}$ live in the same feature space, aligning the predictor’s output with the encoder’s representation and stabilizing the intrinsic reward signal. The predictor still has its own final layer, so it retains sufficient expressive power to model the state-transition dynamics.

4.2 Intrinsic reward via prediction error

We incorporate an intrinsic reward $r_t^{\text{int}}$ to encourage exploration of states that are hard to predict. After sampling a transition $(s_t, a_t, s_{t+1})$ from the replay buffer, we compute the latent encodings $z_t = E_\psi(s_t)$ and $z_{t+1} = E_\psi(s_{t+1})$, and the predicted next latent $\hat{z}_{t+1} = P_\psi(z_t, a_t)$.

We define the raw prediction error as

$e_t = \|\hat{z}_{t+1} - z_{t+1}\|_2.$

Let the running maximum be

$M_t = \max(M_{t-1}, e_t), \qquad M_0 = \varepsilon.$

We set $\varepsilon = 10^{-8}$. The normalized intrinsic reward is

$\tilde{r}_t^{\text{int}} = \frac{e_t}{M_t + \varepsilon} \in [0, 1],$

and the total reward used for TD target computation is

$R_t = r_t^{\text{ext}} + \beta\,\tilde{r}_t^{\text{int}}, \qquad \beta = 0.1,$

where $r_t^{\text{ext}}$ is the environment reward. To prevent intrinsic bonuses from destabilizing learning when the predictor is initially inaccurate, we employ two controls. First, we scale the bonus by a small constant $\beta = 0.1$, keeping it modest relative to extrinsic rewards. Second, we normalize the prediction error $e_t = \|\hat{z}_{t+1} - z_{t+1}\|_2$ by a running maximum $M_t = \max(M_{t-1}, e_t)$ with $M_0 = \varepsilon$. The normalized curiosity $\tilde{r}_t^{\text{int}} = e_t/(M_t + \varepsilon) \in [0, 1]$ bounds the intrinsic term $\beta\,\tilde{r}_t^{\text{int}}$ within $[0, \beta]$, avoiding large spikes early in training and preventing the intrinsic component from dominating the learning signal.
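A small helper illustrating this bounded curiosity bonus is sketched below; the class and variable names are ours, and updating the running maximum once per sampled batch is an assumption about implementation detail:

```python
import torch

class CuriosityBonus:
    """Normalized latent-prediction-error bonus, bounded to [0, beta] as in Section 4.2."""
    def __init__(self, beta=0.1, eps=1e-8):
        self.beta = beta
        self.eps = eps
        self.running_max = eps  # M_0 = eps

    def __call__(self, z_next_pred, z_next):
        # Raw prediction error e_t = ||z_hat_{t+1} - z_{t+1}||_2 per transition.
        error = torch.norm(z_next_pred - z_next, p=2, dim=-1)
        # Running maximum M_t = max(M_{t-1}, e_t), here taken over the batch.
        self.running_max = max(self.running_max, error.max().item())
        # Normalized curiosity in [0, 1], scaled by beta so the bonus lies in [0, beta].
        return self.beta * error / (self.running_max + self.eps)

# Total reward used for the TD target: R_t = r_ext + bonus
# bonus = curiosity(z_next_pred, z_next); total_reward = extrinsic_reward + bonus
```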

4.3 Encoder training and disentanglement loss

While the intrinsic reward shapes exploration, the encoder–predictor must itself be learned. At each training step, we update the encoder parameters ψ to minimize the prediction error. Specifically, we compute the loss for a sampled batch:

$\mathcal{L}_{\text{enc}} = \mathbb{E}\big[\|\hat{z}_{t+1} - z_{t+1}\|_2^2\big] + \lambda_{\text{MMD}}\,\text{MMD}(z_t, \hat{z}_{t+1}),$

where $\text{MMD}(\cdot,\cdot)$ is the maximum mean discrepancy between the distribution of encoded states $z_t$ and predicted states $\hat{z}_{t+1}$ (using a Gaussian kernel) (Gretton et al., 2012). We use $\lambda_{\text{MMD}} = 1.0$. The MMD term acts as a disentanglement or regularization loss: it encourages the latent distribution of $(s, a)$ transitions to match the marginal latent distribution of states, promoting consistency. In practice, the code computes MMD by sampling pairwise kernel differences. The encoder loss is then backpropagated and the encoder parameters $\psi$ updated by Adam. We use a learning rate of $3 \times 10^{-4}$ for the encoder.
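The encoder loss can be computed, for example, with a standard biased MMD estimator under a Gaussian kernel; the kernel bandwidth $\sigma$ below is an assumed hyperparameter not reported in the text:

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows in x and y.
    dist_sq = torch.cdist(x, y, p=2).pow(2)
    return torch.exp(-dist_sq / (2.0 * sigma ** 2))

def mmd_loss(z, z_hat, sigma=1.0):
    """Biased MMD^2 estimate between encoded latents z and predicted latents z_hat."""
    k_zz = gaussian_kernel(z, z, sigma).mean()
    k_hh = gaussian_kernel(z_hat, z_hat, sigma).mean()
    k_zh = gaussian_kernel(z, z_hat, sigma).mean()
    return k_zz + k_hh - 2.0 * k_zh

# Encoder-predictor loss with lambda_mmd = 1.0:
# loss = ((z_hat_next - z_next) ** 2).sum(-1).mean() + 1.0 * mmd_loss(z, z_hat_next)
```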

4.4 Actor and critic updates

After computing the intrinsic reward and updating the encoder, we update the critics. We first form the target Q-value using the combined reward:

$a_{t+1} = \pi_{\phi'}(s_{t+1}), \qquad Q_{\text{target}} = R_t + \gamma \min_{i=1,2} Q_{\theta'_i}\big(s_{t+1}, a_{t+1}, z_{t+1}, \hat{z}_{t+1}\big),$

where $\gamma = 0.99$ is the discount factor. The critic networks $Q_{\theta_1}, Q_{\theta_2}$ are then trained to regress to $Q_{\text{target}}$. We use the standard MSE loss and update $\theta_1, \theta_2$ via Adam with learning rate $3 \times 10^{-4}$.

We update the actor every policy_freq=2 critic updates. Specifically, we compute the actor loss as the negative Q-value under the current critic:

$\mathcal{L}_{\text{actor}} = -\mathbb{E}_{s \sim \mathcal{B}}\big[Q_{\theta_1}\big(s, \pi_\phi(s), z_s, \hat{z}_{s'}\big)\big],$

where $z_s = E_\psi(s)$ and $\hat{z}_{s'} = P_\psi(z_s, \pi_\phi(s))$ for the on-policy action. This encourages the policy to choose actions with a high predicted Q-value. We take a gradient step on $\phi$ with learning rate $3 \times 10^{-4}$.

We use soft updates for the target networks every training step:

$\theta'_i \leftarrow \tau\theta_i + (1-\tau)\theta'_i, \qquad \phi' \leftarrow \tau\phi + (1-\tau)\phi',$

with $\tau = 0.005$ as is standard in TD3.

Algorithm 2 provides the high-level training loop. For clarity, we summarize the main steps in words below.

• Intrinsic/extrinsic reward: Each transition yields extrinsic reward plus a scaled curiosity bonus based on normalized latent prediction error.

• Encoder/predictor update: Minimize mean-squared error between predicted and actual next latent, plus MMD regularization.

• Critic update: Compute TD-target using minimum of two target critics and total reward; minimize standard MSE loss.

• Actor update: Every few steps, maximize the Q-value by gradient ascent on the first critic; update actor target network.
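The sketch below ties these update steps together in one ITD3 training iteration. It is illustrative only: the network and optimizer names, the choice to detach the latents during the critic update, and passing the predicted latent $P_\psi(z_{t+1}, a_{t+1})$ to the target critics are our assumptions about implementation details:

```python
import torch

def itd3_update(step, batch, nets, opts, gamma=0.99, tau=0.005, policy_freq=2,
                policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """One ITD3 critic/actor update (illustrative sketch; wiring is our assumption)."""
    s, a, r_total, s_next, done = batch            # r_total already includes the curiosity bonus
    enc, pred, actor, actor_t, q1, q2, q1_t, q2_t = nets

    with torch.no_grad():
        # Smoothed target action and target Q with latent inputs.
        noise = (torch.randn_like(a) * policy_noise).clamp(-noise_clip, noise_clip)
        a_next = (actor_t(s_next) + noise).clamp(-max_action, max_action)
        z_next = enc(s_next)
        z_next_hat = pred(z_next, a_next)
        target_q = r_total + gamma * (1.0 - done) * torch.min(
            q1_t(s_next, a_next, z_next, z_next_hat),
            q2_t(s_next, a_next, z_next, z_next_hat))

    # Latents are treated as fixed inputs here; the encoder/predictor have their own loss (Section 4.3).
    z = enc(s).detach()
    z_hat = pred(z, a).detach()
    critic_loss = ((q1(s, a, z, z_hat) - target_q) ** 2).mean() + \
                  ((q2(s, a, z, z_hat) - target_q) ** 2).mean()
    opts["critic"].zero_grad()                     # one optimizer over both critics (assumption)
    critic_loss.backward()
    opts["critic"].step()

    if step % policy_freq == 0:
        pi = actor(s)
        z_pi = pred(z, pi)                         # predicted next latent under the current policy
        actor_loss = -q1(s, pi, z, z_pi).mean()    # only the actor optimizer steps here
        opts["actor"].zero_grad()
        actor_loss.backward()
        opts["actor"].step()

    # Soft (Polyak) target updates every training step, as described in Section 4.4.
    for net, net_t in [(q1, q1_t), (q2, q2_t), (actor, actor_t)]:
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```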

4.4.1 Training stability

We examined whether joint updates of the encoder–predictor, critics, and actor could introduce instability. These modules are optimized with distinct losses and disjoint parameter sets, which mitigates direct gradient interference. To further ensure stability, especially when the predictor is inaccurate early in training, we bound the intrinsic term by normalizing the latent prediction error with a running maximum and scaling it by a small coefficient ($\beta = 0.1$), thereby constraining it to $[0, \beta]$. Standard TD3 stabilizers (policy delay $d = 2$, target policy noise with clipping, soft target updates) and MMD regularization also damp abrupt changes. Throughout training, critic losses, actor gradient norms, and encoder losses remained well-behaved with no spikes or divergence.

4.4.2 Multi-input critic

Our critic receives $(s, a, z, \hat{z})$ as input, which increases the number of input features compared to a standard TD3 critic. This design does not lead to overfitting for several reasons. First, the latent vectors $z$ and $\hat{z}$ are compact (256-dimensional) summaries of the high-dimensional state, so they reduce rather than inflate the effective input space. Second, the latent space is regularized via the MMD loss, which encourages smoothness and discourages spurious correlations. Third, weight sharing acts as a form of regularization by reducing the number of trainable parameters. Our training and evaluation curves show no signs of overfitting: the critic’s loss remains stable and the evaluation performance tracks the training performance closely. Thus, the multi-input critic benefits from richer information without sacrificing generalization.

5 Experiments and results

We evaluate our Improved TD3 (ITD3) approach on a simulated indoor navigation task. The setup uses ROS2 Humble and Gazebo 11 Classic on Ubuntu 22.04, with an NVIDIA Quadro P4000 GPU, an Intel Xeon E5-1650 v4 CPU @ 3.60 GHz × 12, and 64 GB RAM. The agent is a differential-drive mobile robot equipped with a 2D LIDAR. The state vector $s$ includes the LIDAR distance measurements (truncated to a fixed range) and the agent’s relative goal position and orientation. The action space consists of continuous linear and angular velocities (bounded by ±1 m/s and ±1 rad/s).
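As an illustration of how such an observation and action interface might look, the snippet below assembles a LiDAR-plus-goal observation and clips the velocity command; the 3.5 m truncation range and the (distance, heading) goal encoding are assumptions, as the exact values are not specified in the text:

```python
import numpy as np

def build_observation(lidar_ranges, goal_distance, goal_heading, max_range=3.5):
    # Truncate the LiDAR ranges to a fixed maximum (3.5 m is our assumption) and
    # append the goal-relative distance and heading described in the setup.
    ranges = np.clip(np.asarray(lidar_ranges, dtype=np.float32), 0.0, max_range)
    return np.concatenate([ranges, [goal_distance, goal_heading]]).astype(np.float32)

def clip_action(linear, angular, v_max=1.0, w_max=1.0):
    # Bound the continuous action to +/-1 m/s linear and +/-1 rad/s angular velocity.
    return float(np.clip(linear, -v_max, v_max)), float(np.clip(angular, -w_max, w_max))
```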

We train the ITD3 model using the structured procedure detailed in Algorithm 2. The goal in each episode is a fixed target location within the environment with randomly placed obstacles (see Figure 5). Episodes are capped at 500 steps. We report results averaged over 100 test episodes after training, measuring success rate, collision rate, and time to reach the goal. The hyperparameters are given in Table 1.

Table 1. Hyperparameters and system details.

5.1 Evaluation metrics

During training, we periodically evaluate the current policy (without exploration noise) over several episodes to measure performance. We plot the mean total reward per evaluation epoch to track learning progress. After training, we report: (1) Success Rate: fraction of episodes reaching the goal; (2) Collision Rate: fraction of episodes ending in collision (Timeouts are treated as failures and counted under collisions); (3) Average Time: mean steps to reach goal (for successes).
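These metrics can be computed from per-episode outcomes as in the short helper below, a sketch under the stated convention that timeouts count as failures alongside collisions:

```python
def summarize_episodes(outcomes, steps):
    """Success rate, collision rate (timeouts counted among the failures), and
    average steps over successful episodes, following Section 5.1.
    `outcomes` holds "success", "collision", or "timeout" per test episode."""
    n = len(outcomes)
    success_steps = [st for o, st in zip(outcomes, steps) if o == "success"]
    success_rate = len(success_steps) / n
    collision_rate = 1.0 - success_rate            # collisions + timeouts = all failures
    avg_steps = sum(success_steps) / max(len(success_steps), 1)
    return success_rate, collision_rate, avg_steps
```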

5.2 Quantitative results

Figure 1 shows the evaluation rewards over training epochs. The shaded regions show individual episode rewards, and the solid lines are running averages. We see that Improved TD3 quickly attains higher rewards and stabilizes around a larger mean. This indicates that the intrinsic rewards and encoding help the agent learn a more effective navigation policy.

Figure 1. Mean evaluation total reward. Running mean (window = 10) is shown as thicker lines.

5.3 Training curve analysis

To assess the learning behavior of the proposed Improved TD3 agent, we analyzed several key training metrics using TensorBoard visualizations, including the Q-values, Q-max, and critic loss, together with their corresponding target counterparts. Comparing ITD3 with a baseline TD3 (without the encoder–predictor) confirms these advantages: ITD3’s evaluation reward curve rises more quickly and stabilizes with a narrower confidence band, indicating faster learning and reduced variance. This aligns with the expectation that compact latent representations facilitate sample-efficient value approximation.

5.3.1 Q and target Q

The Q-values represent the expected return estimated by the critic networks for the current policy. As shown in Figure 2, the Q-values initially drop due to random policy actions, then gradually increase, stabilizing around the 15 mark. This improvement indicates that the critic successfully learns to evaluate more rewarding state-action pairs. The Target Q-values, generated by the target critic networks, follow a similar trend but with slightly smoother progression. This aligns with their role in providing stable targets during training updates, essential for avoiding overestimation bias and ensuring training stability.

Figure 2. Evolution of Q-values from the current critic network (left) and the target critic network (right) during training.

5.3.2 Qmax and target Qmax

These plots capture the maximum predicted Q-values across all actions for a given state. In Figure 3, Qmax shows sharp increases at around 100k and 450k steps, indicating sudden improvements in the agent’s policy that allow for higher expected returns. The stability in Qmax beyond these points suggests convergence toward optimal actions. Similarly, the target Qmax curve displays delayed but parallel improvement trends, which reinforces the critic’s evolving confidence in the best action-value estimates.

Figure 3. Qmax predicted by the critic (left) and target critic (right) networks.

5.3.3 Critic loss

The loss curve in Figure 4 reveals a high variance in early training (0–300k steps), corresponding to unstable critic predictions due to random exploration. Gradual smoothing and reduction of loss beyond 500k steps reflect more accurate value function approximations as the critic aligns with the target Q-values. A lower and stable loss near the end of training is indicative of convergence and reduced estimation error.

Figure 4. Critic loss showing the mean squared error between predicted and target Q-values.

5.3.4 Summary of interpretations

• Q-values (Critic): Indicate growing understanding of long-term returns; increasing and stabilizing over time.

• Target Q-values: Act as a stable guide for updating the critic; follow similar trend but smoother.

• Qmax: Reflects peaks in learned optimal policies; abrupt rises correlate with performance breakthroughs.

• Target Qmax: Lags Qmax slightly but confirms learning trend.

• Loss: Decreasing loss validates effective convergence of the critic networks.

These metrics collectively validate that the improved TD3 model exhibits stable and progressive learning behavior, effectively minimizing critic errors and maximizing action-value predictions over training steps.

5.4 Evaluation of improved TD3 performance

To evaluate the effectiveness of our Improved TD3 (ITD3) framework, we conducted extensive testing. The metrics used include success rate, collision rate and average time to goal as shown in Table 2.

Table 2. Performance metrics of Improved TD3 over 100 test episodes. All episodes terminate either upon success or collision; timeouts are treated as failures and counted under collisions.

The Improved TD3 agent achieves a success rate of 93.14%, indicating reliable goal-reaching behavior across diverse and previously unseen environments. The collision rate is as low as 6.84%, showing that the agent learns to avoid obstacles effectively. Moreover, the average number of steps to reach the goal is reduced to 12.91, suggesting faster convergence and efficient navigation.

These improvements stem from our enhancements to the standard TD3 framework, including latent state encoding, intrinsic curiosity-driven rewards, and MMD-based regularization. Together, they enable the robot to explore more intelligently, learn more efficiently, and generalize better across different scenarios.

5.5 Qualitative analysis

Figure 5 shows the top view of the Gazebo simulation environment, in which the black circle under the spotlight is the robot and the white circles are the obstacles. The walls and the surroundings also act as obstacles. Figure 6 shows the RViz view of the same Gazebo environment depicted in Figure 5. Here, the red arcs represent the robot’s LIDAR scan, and the presence of obstacles can be visualised through this laser scan, as shown by the green curves in Figure 6. The sensor data clearly indicates four obstacles; in total there are six, but due to their alignment two are hidden behind the others, which is why the robot directly senses only four.

Figure 5. Top view of the Gazebo simulation environment.

Figure 6. Corresponding Rviz view of the Gazebo simulation environment shown in Figure 5.

Figure 7 demonstrates that the robot successfully moved toward the goal location, marked by the green dot, while avoiding all obstacles. The red line represents the path taken by the robot; the curves along this path indicate the obstacles that the robot avoided while navigating successfully toward the goal, and the other red lines depict previous trajectories taken by the robot. A detailed observation of the laser scan reveals that the robot has effectively sensed the obstacles within its range, including the walls and surrounding structures. Figure 8 demonstrates that the robot has successfully evaded all the obstacles in its path while moving towards the goal location. The policy learned to drive the robot forward while avoiding collisions. The presence of intrinsic reward during training helped the agent maintain broad coverage with its sensors, encouraging diverse exploration and helping avoid local traps such as corners or walls. Across many trials, the Improved TD3 robot consistently navigates effectively. These qualitative observations align with the quantitative gains: improved exploration leads to faster and safer goal-reaching.

Figure 7. Rviz visualization showing the robot’s navigation trajectory.

Figure 8. Gazebo environment during navigation phase.

5.6 Benchmark comparison

We evaluated our Improved TD3 (ITD3) against the state of the art by comparing key metrics reported in related work, including training efficiency, success and collision rates and cumulative reward. For example, Jeng and Chiang (2023) introduced a survival-penalty reward shaping in an end-to-end TD3/DDPG framework. They report that TD3 converges faster and more stably than DDPG, yielding a higher task success rate in evaluation. In their parking and maze scenarios, TD3 achieved a markedly higher success rate while keeping collisions low, demonstrating the benefit of their collision-penalty reward shaping.

Yang et al. (2024) focus on intrinsic rewards. By integrating an Intrinsic Curiosity Module (ICM) and a Randomness-Enhanced Module (REM) with TD3, they report that their ICM + REM-TD3 method achieved an 83.5% success rate over 1000 test episodes, significantly higher than baseline DRL methods, with only 3 episodes exceeding the time limit. This corresponds to a collision rate of only about 16.2% (compared to 23.1% for vanilla TD3 in their Table 1), and a substantial reduction in average episode length. These results indicate that the intrinsic reward mechanism accelerated learning and exploration, improving success and reducing collisions. Yang et al. (2024) also report the performance of an A3C baseline, which achieved a 62.5% success rate with a 32.4% collision rate in their Table 1. Compared to TD3 and TD3-based intrinsic-reward variants, A3C shows noticeably weaker navigation and poorer obstacle-avoidance capability, indicating limited exploration efficiency under sparse-reward conditions.

Kashyap and Konathalapalli (2025) compare TD3, DDPG, and DQN on a TurtleBot3 platform in static and dynamic obstacle courses. They report that TD3 yields the most efficient navigation overall. Quantitatively, TD3 reduced travel time by roughly 14%–27% relative to DDPG and 28%–55% relative to DQN, while also cutting collision counts per episode by similar factors. In other words, TD3 achieved the highest success rates and average rewards across diverse scenarios, leveraging ROS2 and LiDAR integration for robust perception. The combination of high success and low collisions in these tests underscores the robustness of TD3 under more complex, sensor-rich settings.

Neamah and Mayorga Mayorga (2024) propose an optimized TD3 tuned for crowded human–robot interaction. They report exceptionally high performance: after training for 12,000 episodes, their method achieved a 92% success rate in evaluation, requiring on average only 772 steps (about 11 s) to reach each target. By contrast, their baseline DQN only achieved 64% success. These results highlight that a well-tuned TD3 can learn very efficient, collision-free navigation policies even in highly dynamic human-populated environments.

In addition to TD3-based methods, other reinforcement learning algorithms have also been applied to mapless navigation. On-policy methods such as A3C (Mnih et al., 2016) and PPO (Schulman et al., 2017) learn directly from raw sensor data and have demonstrated success in similar ROS-based environments, but typically require more training samples and careful reward shaping. For instance, in a ROS 2 + Gazebo setup with LiDAR inputs, a PPO-trained TurtleBot3 achieved an 82% success rate after extensive training (Schulman et al., 2017). Similarly, A3C has been used for end-to-end navigation (Mirowski et al., 2017), but its sample-inefficient on-policy updates often result in moderate performance unless combined with auxiliary objectives or curricula. Off-policy Soft Actor-Critic (SAC) (Haarnoja et al., 2018) tends to be more sample-efficient; however, SAC alone can struggle with sparse rewards in navigation. Recent extensions such as SAC-LSTM (Zhang and Chen, 2023), which incorporates memory, report success rates of 89% in highly dynamic scenarios, highlighting that augmenting SAC with memory and exploration modules improves performance. These comparisons suggest that our ITD3’s combination of intrinsic motivation and representation learning yields competitive or better performance than A3C, PPO, and SAC in similar mapless navigation tasks, with faster convergence and higher success rates.

Other recent works likewise report strong performance. For instance, Raj and Kos (2024) used a DQN-based DRL controller and demonstrated high target-reaching success rates and reward gains compared to PPO on complex mazes. Huang et al. (2024) apply an improved TD3 with LSTM and curiosity modules to mapless inspection tasks, reporting performance improvements (e.g., higher success and reward) over prior baselines with curiosity-driven exploration. Taken together, these benchmarks show that our Improved TD3 (ITD3) consistently matches or exceeds the success and efficiency of recent mapless navigation methods: all report success rates in the 70%–90% range with low collision rates, and our results fall at the high end of this range.

In parallel to navigation, related perception-driven architectures have improved robustness in thermal or cross-modal settings. Deep-IRTarget leverages dual-domain spatial and frequency cues to enhance thermal target localization (Zhang et al., 2022), while DFANet preserves modality-specific IR/RGB information before attention-based fusion, yielding more reliable downstream representations (Zhang et al., 2024). Further, cognition-driven structural-prior modeling has been shown effective for instance-dependent label noise, where STMN aligns transition matrices with human-plausible structural constraints to suppress invalid label transitions (Zhang et al., 2025). These findings reinforce the broader value of structured representation learning and prior-guided regularization, which conceptually motivate our use of compact latent encoding with regularization and intrinsic learning signals.

In addition to scalar metrics, we note key architectural enhancements across these works. Many use advanced reward shaping: Jeng and Chiang (2023) employ a “survival penalty” function to combat sparse rewards, enabling collision-free paths, while Yang et al. (2024) integrate a curiosity-driven ICM and a randomness-enhancement (REM) module to provide dense intrinsic rewards and encourage exploration. Kashyap and Konathalapalli (2025) use middleware (ROS2) and LiDAR sensors to improve perception and robustness, and Huang et al. (2024) augment TD3 with recurrent memory (LSTM) and curiosity-based rewards. In our design, although the encoder–predictor adds modest complexity, the benefits of compressing the observation space (improved sample efficiency and stability) and the auxiliary predictive signal justify its inclusion. Representation-learning principles suggest that an optimal latent space should be low-dimensional yet informative: dimensions that are too high risk redundancy and overfitting, whereas overly compressed spaces omit critical features. Our architecture adheres to this principle, and our experiments indicated that $d = 256$ achieves this balance for the sensor modalities considered. We also incorporate an adaptive reward scheme to stabilize learning. Collectively, these innovations (survival penalties, intrinsic rewards, sensor fusion, memory models) aim to improve convergence and final policy quality; by comparison, our method achieves better training efficiency and navigation performance.

6 Discussion

The results and comparisons presented in Table 3 highlight key trends in the development of TD3-based mapless navigation strategies. A major insight is that incremental algorithmic enhancements, including intrinsic motivation, reward shaping, and memory-augmented architectures, play a decisive role in improving both learning efficiency and final policy quality.

Table 3. Benchmark comparison of TD3-based navigation methods.

Among these, intrinsic reward mechanisms such as those used by Yang et al. (2024) (ICM and REM) and in our Improved TD3 (ITD3) method stand out for their ability to tackle the sparse reward problem. ITD3’s intrinsic module, driven by prediction error in latent space, offers dense learning signals that encourage broader exploration. This aligns with Yang et al. (2024)’s findings, where intrinsic feedback significantly improved convergence and reduced timeouts. Unlike their setup, however, our approach integrates latent encoding and MMD regularization, producing more structured exploration.

In addition, Jeng and Chiang (2023) and Neamah and Mayorga Mayorga (2024) underscore the value of reward shaping and hyperparameter tuning. Their success rates (92%) are comparable to ours, but they rely either on custom-designed penalties or extensive critic optimization. In contrast, our method achieves a slightly higher success rate (93.1%) by combining architectural modularity (encoder, predictor, curiosity) with latent space regularization via MMD, minimizing the need for aggressive tuning.

From a robotics perspective, Kashyap and Konathalapalli (2025) demonstrate the practical applicability of standard TD3 using real-world middleware (ROS2) and perception modules (LiDAR), which validates its feasibility in deployed systems. Our method builds on this by incorporating mapless adaptability and reward-driven learning, making it more flexible for unstructured environments.

The synthesis of the benchmarking results reveals three critical patterns.

• Curiosity-driven learning is central: All methods with intrinsic components (ITD3, Yang et al. (2024), Huang et al. (2024)) report noticeable gains in exploration and success.

• Representation learning improves generalization: ITD3’s use of latent space encoding and MMD regularization helps the agent generalize better to novel situations.

• Modular architectures perform better: ITD3’s combination of encoder-predictor and intrinsic rewards shows that thoughtfully combining components (rather than just tuning TD3) can improve robustness.

These findings validate our proposed approach as a competitive and extensible framework for adaptive robot navigation in real-world settings.

7 Conclusion and future scope

We have proposed an improved mapless mobile robot navigation algorithm for unknown environments using an enhanced Twin Delayed Deep Deterministic Policy Gradient (TD3) framework. Our contributions aimed at addressing key limitations in standard DRL-based navigation—namely, sparse extrinsic rewards and limited exploration in dynamic, cluttered environments.

The proposed algorithm integrates three critical components: (i) a latent-state encoder–predictor module to abstract high-dimensional sensor inputs into compact, informative embeddings; (ii) an intrinsic reward mechanism based on prediction error to guide exploration toward underrepresented states; and (iii) latent space regularization via maximum mean discrepancy (MMD) to promote consistent and disentangled representations. Together, these components enhance both sample efficiency and policy robustness.

Experimental evaluations in ROS2/Gazebo simulation environments demonstrate the effectiveness of our approach. Compared to the standard TD3 baseline and several recent TD3 variants, our Improved TD3 (ITD3) model achieved the highest recorded success rate of 93.1% while reducing the collision rate to 6.8%. These results confirm that the intrinsic rewards encourage diverse and meaningful exploration, while the latent representation facilitates generalization across unseen scenarios. Furthermore, training curves and critic metrics reveal improved convergence behavior, reduced variance, and stable Q-value estimation throughout training.

Our benchmarking against related works further validates the competitiveness of our method. While other studies have explored reward shaping, curiosity modules, or representation learning individually, our framework unifies these strategies within a modular architecture. The addition of MMD regularization to enforce latent consistency represents a novel combination that enhances training in stochastic environments.

Several promising avenues exist for future work.

• Real-world deployment: We plan to transfer the trained policies onto physical robots, addressing sim-to-real challenges using domain randomization, sensor noise modeling, and transfer learning techniques.

• Scalability and hierarchical control: Extending the model to larger environments and multi-room layouts may benefit from hierarchical RL strategies or global–local policy decomposition.

• Multi-agent coordination: Adapting the Improved TD3 framework for cooperative multi-robot tasks can enable distributed exploration and shared learning.

• Hybrid model-based extensions: Incorporating lightweight dynamics models or predictive forward models alongside the encoder may further improve data efficiency and planning capabilities.

• Safety-aware adaptation: Integrating safety constraints, human-in-the-loop feedback, or formal verification mechanisms can enable deployment in safety-critical domains.

• Multi-step Extensions: Our intrinsic reward currently relies on single-step prediction error, which encourages exploration by rewarding states where the immediate dynamics model is inaccurate. However, for complex or slowly evolving dynamics, one-step prediction may not capture longer-term uncertainties. A potential extension is to train the predictor to forecast multiple future latents $(\hat{z}_{t+1}, \hat{z}_{t+2}, \ldots, \hat{z}_{t+k})$ and compute a multi-step curiosity bonus, defined, for example, as

$r_t^{\text{multi}} = \sum_{j=1}^{k} \lambda^{j-1}\,\|\hat{z}_{t+j} - z_{t+j}\|_2,$

where $\lambda \in (0, 1)$ discounts errors at longer horizons. Such a signal would encourage the agent to explore areas where its dynamics model is inaccurate over a longer time span, potentially improving robustness and planning for complex tasks. Nevertheless, multi-step prediction is more challenging to learn due to compounding errors, and balancing its computational cost with performance gains is an open research question. We leave the implementation of multi-step curiosity and temporal-consistency constraints for future work and note that one-step prediction was sufficient to achieve strong performance in our navigation tasks.
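A hypothetical implementation of such a multi-step bonus is sketched below, rolling the predictor forward on its own outputs; the rollout scheme, the discount $\lambda = 0.9$, and the horizon $k$ are illustrative assumptions rather than part of the proposed method:

```python
import torch

def multi_step_curiosity(pred, encoder, states, actions, k=3, lam=0.9):
    """Hypothetical k-step curiosity bonus: roll the predictor forward from z_t
    using the executed actions and sum discounted latent prediction errors.
    `states` holds s_t..s_{t+k} and `actions` holds a_t..a_{t+k-1} as tensors."""
    with torch.no_grad():
        z_hat = encoder(states[0])                     # z_t
        bonus = 0.0
        for j in range(1, k + 1):
            z_hat = pred(z_hat, actions[j - 1])        # model rollout: z_hat_{t+j}
            z_true = encoder(states[j])                # encoded true latent z_{t+j}
            bonus = bonus + (lam ** (j - 1)) * torch.norm(z_hat - z_true, p=2, dim=-1)
    return bonus
```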

This paper presents a robust and extensible solution to adaptive navigation in unstructured settings, bridging the gap between raw sensor-driven DRL and practical deployment readiness. The ITD3 algorithm we proposed serves as a strong foundation for future autonomous systems capable of real-time decision-making in unknown environments.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

SN: Methodology, Formal Analysis, Data curation, Software, Investigation, Conceptualization, Writing – review and editing, Visualization, Writing – original draft. ZN: Resources, Supervision, Writing – review and editing, Validation. MC: Validation, Writing – review and editing, Resources, Supervision.

Funding

The authors declare that no financial support was received for the research and/or publication of this article.

Acknowledgments

We gratefully acknowledge that our improved TD3 algorithm implementation was adapted from the TD3 algorithm implementation in an open source repository by Ahmed Nurye (2024), available under the MIT License.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Fox, D., Burgard, W., and Thrun, S. (1997). The dynamic window approach to collision avoidance. IEEE Robotics and Automation 4 (1), 23–33. doi:10.1109/100.580977

Fujimoto, S., van Hoof, H., and Meger, D. (2018). “Addressing function approximation error in actor-critic methods,” in Proceedings of the 35th international conference on machine learning (ICML), 80, 1587–1596.

Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., and Smola, A. (2012). A kernel two-sample test. J. Mach. Learn. Res. 13, 723–773.

Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). “Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proceedings of the international conference on machine learning (ICML).

Huang, B., Xie, J., and Yan, J. (2024). Inspection robot navigation based on improved td3 algorithm. Sensors 24 (8), 2525. doi:10.3390/s24082525

Jeng, S.-L., and Chiang, C. (2023). End-to-end autonomous navigation based on deep reinforcement learning with a survival penalty function. Sensors 23 (20), 8651. doi:10.3390/s23208651

Kashyap, A. K., and Konathalapalli, K. (2025). Autonomous navigation of ros2 based turtlebot3 in static and dynamic environments using intelligent approach. Int. J. Inf. Technol. doi:10.1007/s41870-025-02500-5

Khatib, O. (1986). Real-time obstacle avoidance for manipulators and mobile robots. Int. J. Robotics Res. 5 (1), 90–98. doi:10.1177/027836498600500106

Kober, J., and Peters, J. (2013). Reinforcement learning in robotics: a survey. Int. J. Robotics Res. 32 (11), 1238–1274. doi:10.1177/0278364913495721

Li, H., Qin, J., Liu, Q., and Yan, C. (2023). An efficient deep reinforcement learning algorithm for mapless navigation with gap-guided switching strategy. J. Intelligent and Robotic Syst. 108 (43), 43. doi:10.1007/s10846-023-01888-1

Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., et al. (2015). Continuous control with deep reinforcement learning. arXiv Preprint arXiv:1509.02971. doi:10.48550/arXiv.1509.02971

Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., et al. (2016). “Continuous control with deep reinforcement learning,” in 4th international conference on learning representations (ICLR). arXiv:1509.02971.

Liu, H., Zhou, C., Gao, Z., Shen, Y., Zou, Y., and Wang, Q. (2024). Td3 based collision free motion planning for robot navigation. arXiv Preprint arXiv:2405.15460, 247–250. doi:10.1109/cisce62493.2024.10653233

Mirowski, P., Grimes, M., and Malinowski, M. (2017). “Learning to navigate in complex environments,” in International conference on learning representations (ICLR Workshop). Available online at: https://arxiv.org/abs/1611.03673.

Mnih, V., Badia, A., Mirza, M., Graves, A., Lillicrap, T., Harley, T., et al. (2016). “Asynchronous methods for deep reinforcement learning,” in Proceedings of the international conference on machine learning (ICML).

Neamah, H. A., and Mayorga Mayorga, O. A. (2024). Optimized td3 algorithm for robust autonomous navigation in crowded and dynamic human-interaction environments. Results Eng. 24, 102874. doi:10.1016/j.rineng.2024.102874

Pathak, D., Agrawal, P., Efros, A. A., and Darrell, T. (2017). “Curiosity-driven exploration by self-supervised prediction,” in Proceedings of the international conference on machine learning (ICML).

Raj, R., and Kos, A. (2024). Intelligent mobile robot navigation in unknown and complex environment using reinforcement learning technique. Sci. Rep. 14 (1), 22852. doi:10.1038/s41598-024-72857-3

Rana, A., and Kaveendran, B. (2025). Deep reinforcement learning with PPO for autonomous Mobile robot navigation using ROS 2 framework. Int. J. Res. Appl. Sci. and Eng. Technol. 13 (7), 2119–2125. doi:10.22214/ijraset.2025.73330

Ruan, X., Li, P., Zhu, X., Liu, X., and Liu, P. (2022). A target-driven visual navigation method based on intrinsic motivation exploration and space topological cognition. Sci. Rep. 12, 3462. doi:10.1038/s41598-022-07264-7

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv Preprint arXiv:1707.06347. doi:10.48550/arXiv.1707.06347

Tai, L., Paolo, G., and Liu, M. (2017). “Virtual-to-real deep reinforcement learning: continuous control of mobile robots for mapless navigation,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Tang, C., Abbatematteo, B., Hu, J., Chandra, R., Martín-Martín, R., and Stone, P. (2024). Deep reinforcement learning for robotics: a survey of real-world successes. arXiv Preprint arXiv:2408.03539v1. doi:10.48550/arXiv.2408.03539

Yadav, H., Xue, H., Rudall, Y., Bakr, M., Hein, B., Rückert, E., et al. (2023). Deep reinforcement learning for mapless navigation of autonomous mobile robot. Int. J. Comput. Sci. Trends Comput. Commun. (IJCSTCC), 283–288. doi:10.1109/icstcc59206.2023.10308456

Yang, J., Liu, Y., Zhang, J., Guan, Y., and Shao, Z. (2024). Mobile robot navigation based on intrinsic reward mechanism with td3 algorithm. Int. J. Adv. Robotic Syst. 21 (5), 17298806241292893. doi:10.1177/17298806241292893

Zhang, Y., and Chen, P. (2023). Path planning of a mobile robot for a dynamic indoor environment based on an sac-lstm algorithm. Sensors 23 (24), 9802. doi:10.3390/s23249802

Zhang, R., Xu, L., Yu, Z., Shi, Y., Mu, C., and Xu, M. (2022). Deep-irtarget: an automatic target detector in infrared imagery using dual-domain feature extraction and allocation. IEEE Trans. Multimedia 24, 1735–1749. doi:10.1109/tmm.2021.3070138

Zhang, R., Li, L., Zhang, Q., Zhang, J., Xu, L., Zhang, B., et al. (2024). Differential feature awareness network within antagonistic learning for infrared-visible object detection. IEEE Trans. Circuits Syst. Video Technol. 34 (8), 6735–6748. doi:10.1109/tcsvt.2023.3289142

Zhang, R., Cao, Z., Yang, S., Si, L., Sun, H., Xu, L., et al. (2025). Cognition-driven structural prior for instance-dependent label transition matrix estimation. IEEE Trans. Neural Netw. Learn. Syst. 36 (2), 3730–3743. doi:10.1109/tnnls.2023.3347633

Keywords: adaptive navigation, deep reinforcement learning, mapless navigation, mobile robot, twin delayed DDPG

Citation: Nasti SM, Najar ZA and Chishti MA (2025) Adaptive mapless mobile robot navigation using deep reinforcement learning based improved TD3 algorithm. Front. Robot. AI 12:1625968. doi: 10.3389/frobt.2025.1625968

Received: 14 May 2025; Accepted: 11 November 2025;
Published: 18 December 2025.

Edited by:

Shaoming He, Beijing Institute of Technology, China

Reviewed by:

Du Xinwu, Henan University of Science and Technology, China
Ruiheng Zhang, Beijing Institute of Technology, China

Copyright © 2025 Nasti, Najar and Chishti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shoaib Mohd Nasti, shoaibnasti@cukashmir.ac.in

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.