ORIGINAL RESEARCH article

Front. Robot. AI

Sec. Field Robotics

Volume 12 - 2025 | doi: 10.3389/frobt.2025.1609765

This article is part of the Research Topic "Autonomous Robotic Systems in Aquaculture: Research Challenges and Industry Needs".

Leveraging Learned Monocular Depth Prediction for Pose Estimation and Mapping on Unmanned Underwater Vehicles

Provisionally accepted
Marco Job¹*, David Botta¹, Victor Reijgwart¹, Luca Ebner², Andrej Studer², Roland Siegwart¹, Eleni Kelasidi³,⁴
  • ¹ ETH Zürich, Zurich, Switzerland
  • ² Tethys Robotics, Zurich, Switzerland
  • ³ NTNU, Trondheim, Sør-Trøndelag, Norway
  • ⁴ SINTEF Ocean, Trondheim, Sør-Trøndelag, Norway

The final, formatted version of the article will be published soon.

This paper presents a general framework that integrates visual and acoustic sensor data to enhance localization and mapping in complex, highly dynamic underwater environments, with a particular focus on fish farming. By combining deep learning-based monocular depth prediction with sparse depth priors derived from a classical Fast Fourier Transform (FFT)-based method, the pipeline enables net-relative pose estimation for Unmanned Underwater Vehicles (UUVs) and depth prediction within net pens solely from visual data. We further introduce a method to estimate a UUV's global pose by fusing these net-relative estimates with acoustic measurements, and demonstrate how the predicted depth images can be integrated into the wavemap mapping framework to generate detailed 3D maps in real time. Extensive evaluations on datasets collected in industrial-scale fish farms confirm that the presented framework can accurately estimate a UUV's net-relative and global position in real time and can provide 3D maps suitable for autonomous navigation and inspection.¹
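The abstract names the two ingredients of the depth pipeline (learned monocular depth and sparse FFT-based priors) without detailing how they are combined. The following Python sketch shows one plausible way such a combination could work; it is not the authors' implementation. It assumes (i) the net appears as a roughly fronto-parallel periodic pattern, so the dominant frequency of an intensity profile across it gives the mesh period in pixels; (ii) a pinhole camera with a known focal length in pixels and a known physical mesh spacing; and (iii) a network that predicts depth only up to an unknown global scale and shift, which is then fitted to the sparse metric priors by least squares. All function names and parameter values (`dominant_period_px`, `net_distance_m`, `fit_scale_shift`, `align_dense`, `focal_px`, `mesh_size_m`) are hypothetical.

```python
import math

# Hypothetical sketch: metric depth from monocular relative depth plus sparse
# FFT-derived net-distance priors. Not the authors' implementation.


def dominant_period_px(profile):
    """Dominant spatial period (pixels) of a 1-D intensity profile across the net.

    Uses a naive O(n^2) discrete Fourier transform to stay dependency-free;
    a real system would apply an FFT routine to 2-D image patches.
    """
    n = len(profile)
    if n < 4:
        raise ValueError("profile too short")
    mean = sum(profile) / n
    centered = [v - mean for v in profile]  # remove the DC component
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):  # frequency bins up to Nyquist, skipping k = 0
        re = sum(c * math.cos(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        im = -sum(c * math.sin(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return n / best_k  # k cycles over n pixels -> n / k pixels per cycle


def net_distance_m(period_px, focal_px, mesh_size_m):
    """Pinhole model for a fronto-parallel net: period_px = focal_px * mesh_size_m / Z."""
    return focal_px * mesh_size_m / period_px


def fit_scale_shift(relative, metric):
    """Closed-form least squares for s, t minimising sum((s * r + t - m) ** 2)."""
    n = len(relative)
    if n != len(metric) or n < 2:
        raise ValueError("need at least two matching (relative, metric) pairs")
    sr, sm = sum(relative), sum(metric)
    srr = sum(r * r for r in relative)
    srm = sum(r * m for r, m in zip(relative, metric))
    denom = n * srr - sr * sr
    if abs(denom) < 1e-12:
        raise ValueError("relative depths at the priors must not all be equal")
    s = (n * srm - sr * sm) / denom
    t = (sm - s * sr) / n
    return s, t


def align_dense(relative_map, s, t):
    """Apply the fitted scale and shift to every pixel of a relative depth map."""
    return [[s * r + t for r in row] for row in relative_map]


if __name__ == "__main__":
    # Synthetic net profile: intensity repeats every 16 px across a 128 px patch.
    profile = [math.cos(2 * math.pi * i / 16) for i in range(128)]
    z = net_distance_m(dominant_period_px(profile), focal_px=800.0, mesh_size_m=0.02)
    print(f"net distance from FFT prior: {z:.3f} m")

    # Metric priors at three pixels (e.g. from net_distance_m on three patches)
    # paired with the network's relative depth at the same pixels.
    s, t = fit_scale_shift([0.1, 0.2, 0.4], [0.7, 0.9, 1.3])
    print(align_dense([[0.1, 0.3], [0.2, 0.5]], s, t))
```

Fitting only a global scale and shift preserves the relative structure predicted by the network and needs as few as two priors at pixels with distinct relative depth. If the network predicts inverse depth (disparity) instead, the same fit would be carried out in inverse-depth space before converting to metric depth.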

Keywords: localization, mapping, UUVs, depth prediction, aquaculture

Received: 10 Apr 2025; Accepted: 22 May 2025.

Copyright: © 2025 Job, Botta, Reijgwart, Ebner, Studer, Siegwart and Kelasidi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Marco Job, ETH Zürich, Zurich, Switzerland

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.