AUTHOR=Wang Zhenggang, Song Shuhong, Cheng Shenghui
TITLE=Path planning of mobile robot based on improved double deep Q-network algorithm
JOURNAL=Frontiers in Neurorobotics
VOLUME=19
YEAR=2025
URL=https://www.frontiersin.org/journals/neurorobotics/articles/10.3389/fnbot.2025.1512953
DOI=10.3389/fnbot.2025.1512953
ISSN=1662-5218
ABSTRACT=Aiming at the problems of slow network convergence, poor reward convergence stability, and low path planning efficiency in traditional deep reinforcement learning algorithms, this paper proposes a BiLSTM-D3QN (Bidirectional Long Short-Term Memory Dueling Double Deep Q-Network) path planning algorithm based on the DDQN (Double Deep Q-Network) decision model. First, a Bidirectional Long Short-Term Memory (BiLSTM) network is introduced to give the network memory, increasing the stability of decision making and making the reward converge more stably. Second, a Dueling Network is introduced to further mitigate the neural network's overestimation of Q-values, allowing the network to be updated more quickly. Third, adaptive re-prioritized Experience Replay based on a frequency penalty function is proposed, which extracts important and fresh data from the experience pool to accelerate the convergence of the neural network. Finally, an adaptive action selection mechanism is introduced to further optimize action exploration. Simulation experiments show that the BiLSTM-D3QN path planning algorithm outperforms traditional deep reinforcement learning algorithms in terms of network convergence speed, planning efficiency, stability of reward convergence, and success rate in simple environments; in complex environments, the path length planned by BiLSTM-D3QN is 20 m shorter than that of the improved ERDDQN (Experience Replay Double Deep Q-Network) algorithm, the number of turning points is 7 fewer, the planning time is 0.54 s shorter, and the success rate is 10.4% higher. These results demonstrate the superiority of the BiLSTM-D3QN algorithm in terms of network convergence speed and path planning performance.
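NOTE=The abstract combines three standard deep-RL ingredients: a BiLSTM encoder for memory, a dueling value/advantage head, and the double-DQN target rule. The sketch below is not the authors' code; it is a minimal PyTorch illustration of how those pieces typically fit together, with all layer sizes, names (BiLSTMDuelingQNet, double_dqn_target), and hyperparameters being assumptions for illustration only. The paper's frequency-penalty prioritized replay and adaptive action selection are not reproduced here.

```python
# Hypothetical sketch of a BiLSTM dueling Q-network with a double-DQN target.
# Not the paper's implementation; dimensions and names are illustrative.
import torch
import torch.nn as nn


class BiLSTMDuelingQNet(nn.Module):
    """Bidirectional LSTM encoder followed by a dueling head:
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # BiLSTM over a short history of observations gives the network memory.
        self.encoder = nn.LSTM(obs_dim, hidden, batch_first=True, bidirectional=True)
        feat = 2 * hidden  # concatenated forward and backward hidden states
        self.value = nn.Sequential(nn.Linear(feat, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(nn.Linear(feat, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, seq_len, obs_dim); use features at the last time step.
        out, _ = self.encoder(obs_seq)
        h = out[:, -1, :]
        v = self.value(h)                               # (batch, 1)
        a = self.advantage(h)                           # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)      # dueling aggregation


def double_dqn_target(online: nn.Module, target: nn.Module,
                      reward: torch.Tensor, next_obs_seq: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Standard double-DQN target: the online net selects the next action,
    the target net evaluates it, which reduces Q-value overestimation."""
    with torch.no_grad():
        next_actions = online(next_obs_seq).argmax(dim=1, keepdim=True)
        next_q = target(next_obs_seq).gather(1, next_actions).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```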