ORIGINAL RESEARCH article
Front. Plant Sci.
Sec. Sustainable and Intelligent Phytoprotection
This article is part of the Research Topic: Integrating Visual Sensing and Machine Learning for Advancements in Plant Phenotyping and Precision Agriculture.
GrapeUL-YOLO: Bidirectional Cross-Scale Fusion with Elliptical Anchors for Robust Grape Detection in Orchards
Provisionally accepted
Guangdong Polytechnic of Science and Technology, Zhuhai, China
Accurate grape detection in orchards is a core component of automated harvesting. To address the challenges of orchard environments, such as complex backgrounds, variable lighting, and dense fruit occlusion, this study proposes a highly robust real-time grape detection model, Grapevine Ultra-Lightweight YOLO (GrapeUL-YOLO). Built on YOLOv11, the model improves detection performance through three innovations. First, it adopts a Cross-Scale Residual Feature Backbone (CSRB) as the feature extraction network, combining a 16× downsampling operation with modules such as C3k2_SP and SPPELAN to reduce computational complexity while retaining multi-scale grape features, from small sub-clusters to entire clusters. Second, it constructs an Adaptive Bidirectional Fusion Network (ABFN) in the detection neck; through CARAFE content-aware upsampling and a bidirectional cross-scale concatenation mechanism, it strengthens the interaction between spatial details and semantic information, improving feature fusion in densely occluded scenes. Third, it designs a shape-adaptive detection head that uses customized elliptical anchor boxes to match the natural shape of grape clusters and detects targets of different sizes according to scale division. On the Embrapa WGISD dataset, GrapeUL-YOLO achieves an mAP@0.5 of 0.912 and an mAP@0.5:0.95 of 0.576, outperforming nine mainstream models including CenterNet and YOLOv11. With only 5.11M parameters and an average detection time of 16.9 ms per image, the model balances high precision with a lightweight design, providing an efficient solution for automated grape detection and harvesting in orchards.
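The abstract's elliptical anchors imply an overlap measure between ellipse-shaped regions rather than rectangles. The paper's actual matching procedure is not given here; the following is a minimal illustrative sketch (not the authors' implementation) that approximates the IoU of two axis-aligned ellipses by rasterizing them on a shared grid, which is one simple way such anchors could be scored against annotated grape clusters.

```python
import numpy as np

def elliptical_iou(e1, e2, resolution=400):
    """Approximate IoU of two axis-aligned ellipses via grid rasterization.

    Each ellipse is given as (cx, cy, a, b), where a and b are the
    horizontal and vertical semi-axes. Accuracy grows with `resolution`.
    """
    # Shared bounding box covering both ellipses.
    x_min = min(e1[0] - e1[2], e2[0] - e2[2])
    x_max = max(e1[0] + e1[2], e2[0] + e2[2])
    y_min = min(e1[1] - e1[3], e2[1] - e2[3])
    y_max = max(e1[1] + e1[3], e2[1] + e2[3])

    xs = np.linspace(x_min, x_max, resolution)
    ys = np.linspace(y_min, y_max, resolution)
    gx, gy = np.meshgrid(xs, ys)

    def inside(e):
        # Standard ellipse inequality: ((x-cx)/a)^2 + ((y-cy)/b)^2 <= 1
        return ((gx - e[0]) / e[2]) ** 2 + ((gy - e[1]) / e[3]) ** 2 <= 1.0

    m1, m2 = inside(e1), inside(e2)
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return inter / union if union else 0.0
```

For example, an ellipse fully contained in another yields an IoU equal to their area ratio, so an anchor with semi-axes (1, 2) inside a ground-truth ellipse with semi-axes (2, 4) scores about 0.25. A real detector would use such a score inside its label-assignment step and likely replace rasterization with a closed-form or sampled approximation for speed.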
Keywords: lightweight object detection, cross-scale feature fusion, occluded fruit detection, orchard automation, anchor optimization
Received: 09 Sep 2025; Accepted: 28 Nov 2025.
Copyright: © 2025 Zhu, Yu and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Zhenghong Yu
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
