ORIGINAL RESEARCH article
Front. Mar. Sci.
Sec. Marine Affairs and Policy
Volume 12 - 2025 | doi: 10.3389/fmars.2025.1598380
This article is part of the Research Topic: Emerging Computational Intelligence Techniques to Address Challenges in Oceanic Computing
Advancing Ship Automatic Navigation Strategy with Prior Knowledge and Hierarchical Penalty in Irregular Obstacles: A Reinforcement Learning Approach to Enhanced Efficiency and Safety
Provisionally accepted
1 Naval Architecture and Shipping College, Guangdong Ocean University, Zhanjiang, Guangdong Province, China
2 Guangdong Ocean University, Zhanjiang, China
With the global wave of intelligence and automation, ship autopilot technology has become key to improving the efficiency of marine transportation, reducing operating costs, and ensuring navigation safety. However, existing reinforcement learning (RL)-based autopilot methods still face challenges such as low learning efficiency, redundant invalid exploration, and limited obstacle avoidance capability. To address these issues, this research proposes the GEPA model, which integrates prior knowledge and a hierarchical reward-and-penalty mechanism to optimize the autopilot strategy of unmanned vessels based on a deep Q-network (DQN). The GEPA model introduces prior knowledge to guide the agent's decision-making, reducing invalid exploration and accelerating learning convergence, and combines this with a hierarchical composite reward-and-penalty mechanism that improves the rationality and safety of the autopilot through end-point incentives, path-guided rewards, and irregular-obstacle avoidance penalties. Experimental results show that the GEPA model outperforms existing methods in navigation efficiency, training convergence speed, path smoothness, obstacle avoidance capability, and safety, with the number of training rounds needed to complete the task reduced by 24.85%, the path length
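The hierarchical composite reward-and-penalty mechanism described in the abstract (end-point incentive, path-guided reward, and graded obstacle penalty) can be sketched as a single reward function. This is an illustrative sketch only: the function name, weights, and zone thresholds below are assumptions for exposition, not values from the paper.

```python
import math

def composite_reward(pos, goal, prev_pos, obstacle_dist,
                     goal_radius=1.0, danger_dist=2.0, caution_dist=5.0):
    """Hypothetical hierarchical composite reward: combines an end-point
    incentive, a path-guided progress reward, and a tiered obstacle
    penalty into one scalar. All coefficients are illustrative."""
    # End-point incentive: large terminal bonus for reaching the goal region.
    if math.dist(pos, goal) <= goal_radius:
        return 100.0

    # Path-guided reward: positive when this step reduced distance to goal.
    progress = math.dist(prev_pos, goal) - math.dist(pos, goal)
    reward = 1.0 * progress

    # Hierarchical obstacle penalty: harsher the closer the nearest obstacle.
    if obstacle_dist <= danger_dist:        # collision-risk zone: flat heavy penalty
        reward -= 50.0
    elif obstacle_dist <= caution_dist:     # caution zone: graded penalty
        reward -= 5.0 * (caution_dist - obstacle_dist) / (caution_dist - danger_dist)
    return reward
```

In a DQN training loop this scalar would replace a sparse goal-only reward, so the agent receives dense guidance toward the goal while the tiered penalty discourages trajectories that skirt irregular obstacle boundaries.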
Keywords: deep reinforcement learning, unmanned ship, prior knowledge, hierarchical composite rewards and penalties, irregular obstacles
Received: 23 Mar 2025; Accepted: 24 Apr 2025.
Copyright: © 2025 Zhang, Li, Liang, Wang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jiawen Li, Guangdong Ocean University, Zhanjiang, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.