
ORIGINAL RESEARCH article

Front. Robot. AI

Sec. Field Robotics

Volume 12 - 2025 | doi: 10.3389/frobt.2025.1628213

This article is part of the Research Topic: Autonomous Robotic Systems in Aquaculture: Research Challenges and Industry Needs

Deep learning methods for 3D tracking of fish in challenging underwater conditions for future perception in autonomous underwater vehicles

Provisionally accepted
  • Norwegian University of Science and Technology, Trondheim, Norway

The final, formatted version of the article will be published soon.

Due to their utility in replacing workers in tasks unsuitable for humans, unmanned underwater vehicles (UUVs) have become increasingly common tools in the fish farming industry. However, earlier studies and anecdotal evidence from farmers indicate that farmed fish tend to move away from and avoid intrusive objects, such as vehicles that are deployed and operated inside net pens. Such responses could indicate discomfort associated with the intrusive objects, which in turn can lead to stress and impaired welfare in the fish. To avoid this, vehicles and their control systems should be designed to automatically adjust operations when they perceive that they are repelling the fish. A necessary first step in this direction is to develop on-vehicle observation systems that assess object/vehicle-fish distances in real time and provide inputs to the control algorithms. Due to their small size and low weight, modern cameras are ideal for this purpose. Moreover, the ongoing rapid developments within deep learning are enabling increasingly sophisticated methods for analysing camera footage. To explore this potential, we developed three new pipelines for the automated assessment of fish-camera distances in video and images. These were complemented by a recently published method, yielding four pipelines in total: SegmentDepth, BBoxDepth and SuperGlue, which are based on stereo vision, and DepthAnything, which is monocular. Overall performance was investigated using field data, comparing the fish-camera distances from the methods with distances measured using a sonar. The four methods were then benchmarked by comparing the number of objects detected, and the quality and overall accuracy of the stereo matches (stereo-based methods only). SegmentDepth, DepthAnything and SuperGlue performed well in comparison with the sonar data, yielding mean absolute errors (MAE) of 0.205 m (95% CI: 0.050-0.360), 0.412 m (95% CI: 0.148-0.676) and 0.187 m (95% CI: 0.073-0.300), respectively, and were integrated into the ROS2 framework to enable real-time application in fish behaviour identification and control of robotic vehicles such as UUVs.
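To illustrate the kind of computation underlying the stereo-based pipelines and the benchmarking against sonar, the minimal sketch below shows depth-from-disparity for a rectified stereo pair and a mean absolute error with a bootstrap 95% confidence interval. This is not code from the article: the camera parameters and distance values are illustrative assumptions, and the article does not state how its confidence intervals were computed, so the bootstrap here is only one plausible choice.

```python
import numpy as np

def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth from disparity for a rectified stereo pair: Z = f * B / d."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    return focal_px * baseline_m / np.clip(disparity_px, 1e-6, None)

def mae_with_bootstrap_ci(estimates_m, reference_m, n_boot=10_000, alpha=0.05, seed=0):
    """MAE against a reference (e.g. sonar), with a bootstrap (1 - alpha) CI."""
    errors = np.abs(np.asarray(estimates_m) - np.asarray(reference_m))
    rng = np.random.default_rng(seed)
    boot_means = np.array([
        rng.choice(errors, size=errors.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return errors.mean(), (lo, hi)

# Hypothetical values -- not measurements or parameters from the study.
focal_px, baseline_m = 1200.0, 0.12          # assumed camera focal length and stereo baseline
disparities = np.array([72.0, 60.0, 45.0])   # matched-feature disparities in pixels
print(stereo_depth(disparities, focal_px, baseline_m))  # fish-camera distances in metres

camera_estimates = np.array([1.9, 2.4, 3.1, 1.7, 2.8])  # pipeline estimates (m)
sonar_reference  = np.array([2.0, 2.2, 3.3, 1.6, 2.9])  # sonar distances (m)
mae, (ci_lo, ci_hi) = mae_with_bootstrap_ci(camera_estimates, sonar_reference)
print(f"MAE = {mae:.3f} m (95% CI: {ci_lo:.3f}-{ci_hi:.3f})")
```

The same error-and-interval computation could be run per method (SegmentDepth, SuperGlue, DepthAnything) against the sonar reference to reproduce the style of comparison reported in the abstract.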

Keywords: Aquaculture, fish tracking, challenging optical conditions, perception in underwater robotics, deep learning

Received: 14 May 2025; Accepted: 22 Sep 2025.

Copyright: © 2025 Føre, O'Brien and Kelasidi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Martin Føre, martin.fore@ntnu.no

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.