ORIGINAL RESEARCH article

Front. Plant Sci.

Sec. Technical Advances in Plant Science

Volume 16 - 2025 | doi: 10.3389/fpls.2025.1610443

This article is part of the Research Topic: Machine Vision and Machine Learning for Plant Phenotyping and Precision Agriculture, Volume II.

A Dual-task Segmentation Network Based on Multi-head Hierarchical Attention for 3D Plant Point Cloud

Provisionally accepted
Dan Pan1, Baijing Liu2, Lin Luo3, An Zeng3*, Yuting Zhou1, Kaixin Pan1, Zhiheng Xian4, Yulun Xian5 and Licheng Liu2
  • 1School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, Guangdong, China
  • 2School of Information Engineering, Guangdong University of Technology, Guangzhou, China
  • 3School of Computer Science and Engineering, Guangdong University of Technology, Guangzhou, China
  • 4Guangzhou Huitong Agricultural Technology Co., Ltd., Guangzhou, China
  • 5Guangzhou iGrowLite Agricultural Technology Co., Ltd., Guangzhou, China

The final, formatted version of the article will be published soon.

The development of automated, non-destructive, high-throughput plant phenotyping systems fundamentally relies on accurate segmentation of botanical structures at both the semantic and instance levels. However, most existing approaches depend heavily on empirically determined threshold parameters and rarely integrate semantic and instance segmentation within a unified framework. To address these limitations, this study introduces a methodology that leverages 2D images of real plants, i.e., Caladium bicolor, captured with a custom-designed plant cultivation platform, from which a high-quality 3D point cloud dataset was generated through reconstruction. Building on this foundation, we propose a streamlined Dual-Task Segmentation Network (DSN) that incorporates a multi-head hierarchical attention mechanism to achieve superior segmentation performance. Operating on manually annotated 3D point cloud data, the dual-task framework performs both stem-leaf semantic segmentation and individual leaf instance identification within the DSN architecture. The network features a dual-branch design: one branch predicts the semantic class of each point, while the other embeds points into a high-dimensional vector space for instance clustering. Multi-task joint optimization across the two branches is facilitated through a Multi-Value Conditional Random Field (MV-CRF) model. Benchmark evaluations validate the framework's efficacy for semantic segmentation, yielding 99.16% macro-averaged precision, a 95.73% class-wise recognition rate, and an average Intersection over Union of 93.64%, while comparative analyses confirm its superiority over nine benchmark architectures for 3D point cloud analysis. For instance segmentation, the model achieved leading scores of 87.94%, 72.36%, and 71.61% on the corresponding metrics. Furthermore, ablation studies validated the effectiveness of the network's design and substantiated the rationale behind each architectural choice.
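To illustrate the dual-branch idea described in the abstract, the following is a minimal sketch (in PyTorch) of a network that couples a shared per-point encoder with multi-head attention and two output heads: one producing per-point semantic logits and one producing per-point instance embeddings for subsequent clustering. The encoder, layer sizes, and module names are illustrative assumptions, not the authors' exact DSN configuration, and the MV-CRF refinement step is not reproduced here.

    # Illustrative sketch only: a dual-branch point cloud segmentation model with
    # multi-head self-attention over per-point features. Layer sizes and the
    # PointNet-style encoder are assumptions, not the published DSN architecture.
    import torch
    import torch.nn as nn

    class DualTaskSegNet(nn.Module):
        def __init__(self, in_dim=3, feat_dim=128, num_classes=2, embed_dim=32, num_heads=4):
            super().__init__()
            # Shared per-point encoder (stand-in for the paper's backbone).
            self.encoder = nn.Sequential(
                nn.Linear(in_dim, 64), nn.ReLU(),
                nn.Linear(64, feat_dim), nn.ReLU(),
            )
            # Multi-head attention lets every point attend to all others,
            # aggregating contextual information across the plant structure.
            self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
            # Branch 1: per-point semantic logits (e.g., stem vs. leaf).
            self.sem_head = nn.Linear(feat_dim, num_classes)
            # Branch 2: per-point embedding used for instance clustering (e.g., individual leaves).
            self.ins_head = nn.Linear(feat_dim, embed_dim)

        def forward(self, pts):                      # pts: (B, N, 3)
            feats = self.encoder(pts)                # (B, N, feat_dim)
            ctx, _ = self.attn(feats, feats, feats)  # global context via self-attention
            feats = feats + ctx                      # residual fusion of local and global features
            sem_logits = self.sem_head(feats)        # (B, N, num_classes)
            ins_embed = self.ins_head(feats)         # (B, N, embed_dim)
            return sem_logits, ins_embed

    # Usage: instance labels would come from clustering ins_embed (e.g., mean-shift);
    # in the full method, a CRF-style step would jointly refine both outputs.
    model = DualTaskSegNet()
    points = torch.rand(1, 2048, 3)                  # one synthetic cloud of 2048 points
    sem, emb = model(points)
    print(sem.shape, emb.shape)                      # (1, 2048, 2) and (1, 2048, 32)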

Keywords: Automated plant phenotyping, 3D point cloud segmentation, Multi-head attention, Instance segmentation, Semantic segmentation, Multi-Value Conditional Random Field (MV-CRF)

Received: 12 Apr 2025; Accepted: 18 Jun 2025.

Copyright: © 2025 Pan, Liu, Luo, Zeng, Zhou, Pan, Xian, Xian and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: An Zeng, School of Computer Science and Engineering, Guangdong University of Technology, Guangzhou, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.