ORIGINAL RESEARCH article
Front. Robot. AI
Sec. Field Robotics
Volume 12 - 2025 | doi: 10.3389/frobt.2025.1609765
This article is part of the Research Topic "Autonomous Robotic Systems in Aquaculture: Research Challenges and Industry Needs"
Leveraging Learned Monocular Depth Prediction for Pose Estimation and Mapping on Unmanned Underwater Vehicles
Provisionally accepted
- 1ETH Zürich, Zurich, Switzerland
- 2Tethys Robotics, Zurich, Switzerland
- 3NTNU, Trondheim, Sør-Trøndelag, Norway
- 4SINTEF Ocean, Trondheim, Sør-Trøndelag, Norway
This paper presents a general framework that integrates visual and acoustic sensor data to enhance localization and mapping in complex, highly dynamic underwater environments, with a particular focus on fish farming. The pipeline enables net-relative pose estimation for Unmanned Underwater Vehicles (UUVs) and depth prediction within net pens from visual data alone, by combining deep learning-based monocular depth prediction with sparse depth priors derived from a classical Fast Fourier Transform (FFT)-based method. We further introduce a method to estimate a UUV's global pose by fusing these net-relative estimates with acoustic measurements, and demonstrate how the predicted depth images can be integrated into the wavemap mapping framework to generate detailed 3D maps in real time. Extensive evaluations on datasets collected in industrial-scale fish farms confirm that the presented framework can accurately estimate a UUV's net-relative and global position in real time and provide 3D maps suitable for autonomous navigation and inspection.
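To make the idea of combining a monocular depth prediction with sparse depth priors more concrete, the following is a minimal sketch of one generic approach: fitting a global scale and shift so that a relative depth map agrees with a few metric depth samples. This is an illustrative assumption, not the authors' pipeline, which the abstract does not specify at this level of detail. The function name `align_depth_to_sparse_priors`, the pixel layout, and the synthetic data are all hypothetical.

```python
import numpy as np


def align_depth_to_sparse_priors(relative_depth, prior_pixels, prior_depths):
    """Fit scale s and shift t so that s * relative_depth + t matches sparse priors.

    Assumption (not from the paper): the network output is a relative depth map
    and the sparse priors are metric depths at known (row, col) pixels.
    """
    rows, cols = prior_pixels[:, 0], prior_pixels[:, 1]
    d = relative_depth[rows, cols]
    # Linear least squares over the two unknowns [s, t].
    A = np.stack([d, np.ones_like(d)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, prior_depths, rcond=None)
    return scale * relative_depth + shift, scale, shift


# Synthetic example: a relative depth map and four hypothetical metric priors.
rng = np.random.default_rng(0)
relative = rng.uniform(0.1, 1.0, size=(48, 64))
true_metric = 2.5 * relative + 0.3  # ground truth used only to build the priors
pixels = np.array([[5, 7], [20, 40], [33, 12], [45, 60]])
priors = true_metric[pixels[:, 0], pixels[:, 1]]

metric, scale, shift = align_depth_to_sparse_priors(relative, pixels, priors)
```

With noisy priors, a robust variant (for example, discarding outliers before the fit) would likely be preferable; the plain least-squares fit is used here only to keep the sketch short.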
Keywords: localization, mapping, UUVs, depth prediction, aquaculture
Received: 10 Apr 2025; Accepted: 22 May 2025.
Copyright: © 2025 Job, Botta, Reijgwart, Ebner, Studer, Siegwart and Kelasidi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Marco Job, ETH Zürich, Zurich, Switzerland
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.