ORIGINAL RESEARCH article
Front. Robot. AI
Sec. Robot Learning and Evolution
This article is part of the Research Topic: Reinforcement Learning for Real-World Robot Navigation
LG-H-PPO: Offline Hierarchical PPO for Robot Path Planning on a Latent Graph
Provisionally accepted
- 1 China University of Petroleum (East China), Qingdao, China
Abstract
The path-planning capability of autonomous robots in complex environments is crucial for their widespread real-world deployment. However, long-horizon decision-making and sparse reward signals pose significant challenges to traditional reinforcement learning (RL) algorithms. Offline hierarchical reinforcement learning (HRL) offers an effective approach by decomposing tasks into two stages: high-level subgoal generation and low-level subgoal attainment. Advanced offline HRL methods such as Guider and HIQL typically introduce latent spaces in the high-level policy to represent subgoals, thereby handling high-dimensional states and improving generalization. However, these approaches require the high-level policy to search for and generate subgoals within a continuous latent space. This remains a complex and sample-inefficient problem for policy-optimization algorithms, particularly policy-gradient methods such as PPO, and often leads to unstable training and slow convergence. To address this core limitation, this paper proposes a novel offline hierarchical PPO framework, LG-H-PPO (Latent Graph-based Hierarchical PPO). The core innovation of LG-H-PPO is to discretize the continuous latent space into a structured "latent graph." By turning high-level planning from difficult "continuous creation" into simple "discrete selection," LG-H-PPO substantially reduces the learning difficulty of the high-level policy. Preliminary experiments on standard D4RL offline navigation benchmarks demonstrate that LG-H-PPO achieves significant advantages over strong baselines such as Guider and HIQL in both convergence speed and final task success rate. The main contribution of this paper is to introduce graph structure into latent-variable HRL planning, which effectively simplifies the action space of the high-level policy and improves the training efficiency and stability of offline HRL algorithms on long-horizon navigation tasks. It lays a foundation for future offline HRL research combining latent-variable representations with explicit graph planning.
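To make the latent-graph idea concrete, the following is a minimal sketch, not the authors' implementation: all function names, the choice of k-means clustering, and the trajectory format are illustrative assumptions. It discretizes encoded offline states into graph nodes and links nodes that occur consecutively within a trajectory, so the high-level policy only has to pick among a node's successors (a small discrete action set) instead of generating a point in a continuous latent space.

```python
# Illustrative sketch only; NOT the paper's released code. Shows one plausible
# way to discretize a latent space into a "latent graph" and expose graph
# nodes as a discrete action set for a high-level policy. All names here
# (build_latent_graph, high_level_action_set) are hypothetical.

import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans


def build_latent_graph(latents, traj_ids, n_nodes=64, seed=0):
    """Cluster latent states into discrete nodes and connect nodes that
    occur consecutively within the same offline trajectory.

    latents  : (N, d) array of latent states from a pretrained encoder.
    traj_ids : (N,) array mapping each latent state to its trajectory.
    Returns (kmeans, edges), where edges[u] is the set of successor nodes.
    """
    kmeans = KMeans(n_clusters=n_nodes, n_init=10, random_state=seed).fit(latents)
    labels = kmeans.labels_
    edges = defaultdict(set)
    for i in range(len(labels) - 1):
        # Only link consecutive states drawn from the same trajectory.
        if traj_ids[i] == traj_ids[i + 1] and labels[i] != labels[i + 1]:
            edges[labels[i]].add(labels[i + 1])
    return kmeans, edges


def high_level_action_set(kmeans, edges, latent_state):
    """Map the current latent state to its graph node and return the
    candidate subgoal nodes: a small, discrete action set, replacing
    search over the continuous latent space."""
    node = int(kmeans.predict(latent_state[None])[0])
    return node, sorted(edges.get(node, {node}))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(5000, 16))     # stand-in for encoded offline states
    traj_ids = np.repeat(np.arange(50), 100)  # 50 trajectories of length 100
    kmeans, edges = build_latent_graph(latents, traj_ids, n_nodes=32)
    node, candidates = high_level_action_set(kmeans, edges, latents[0])
    print(f"current node {node}, {len(candidates)} candidate subgoal nodes")
```

In a discrete setting like this, a standard PPO categorical policy over the candidate node indices can be trained on the offline data, which is the "continuous creation to discrete selection" simplification the abstract attributes to LG-H-PPO.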
Keywords: latent graph, offline hierarchical PPO, offline reinforcement learning, robot path planning, sparse reward
Received: 01 Nov 2025; Accepted: 08 Dec 2025.
Copyright: © 2025 Han. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Xiang Han