- Vocational and Technical College, Hebei Normal University, Shijiazhuang, China
Introduction: In modern agriculture, tomatoes are a key economic crop whose harvesting is complicated by complex growth environments; traditional object detection technologies are limited in performance and struggle to accurately identify and locate ripe and small-target tomatoes under leaf occlusion and uneven illumination.
Methods: To address these issues, this study takes YOLOv8n as the baseline model and improves it around the core needs of tomato detection. It first analyzes YOLOv8n's inherent bottlenecks in feature extraction and small-target recognition and then proposes targeted schemes: to boost feature extraction, a Space-to-Depth convolution module (SPD) is introduced by restructuring convolutional operations; to improve small-target detection, a dedicated small-target detection layer is added and integrated with the Parallelized Patch-Aware Attention mechanism (PPA); to balance performance and efficiency, a lightweight Slim-Neck structure and a self-developed Detect_CBAM detection head are adopted; finally, the Distance-Intersection over Union (DIoU) loss function optimizes gradient distribution during training. Experiments are conducted on the self-built “tomato_dataset” (7,160 images: 5,008 for training, 720 for validation, and 1,432 for testing), with bounding box precision, recall, mAP@0.5, mAP@0.5:0.95, Parameters, and FLOPs as evaluation metrics, and performance comparisons are made with mainstream YOLO models (YOLOv5n, YOLOv6n, YOLOv8n), lightweight models (SSD-MobileNetv2, EfficientDet-D0), and two-stage algorithms (Faster R-CNN, Cascade R-CNN).
Results: The improved model achieves 89.6% precision, 87.3% recall, 93.5% mAP@0.5, and 58.6% mAP@0.5:0.95, significantly outperforming YOLOv8n, most comparative models, and the two-stage algorithms in both detection accuracy and efficiency.
Discussion: In conclusion, this study solves detection problems of ripe and small-target tomatoes in polymorphic environments, improves the model’s accuracy and robustness, provides reliable technical support for automated harvesting, and contributes to modern agricultural intelligent development.
1 Introduction
Tomatoes, a globally cultivated and economically vital crop, are rich in vitamin C, dietary fiber, and other nutrients, offering antioxidant and digestive benefits. They play a crucial role in the fresh consumption and food processing industries (Ali et al., 2020). In 2023, the global tomato market exceeded $50 billion, with an annual production of over 180 million tons. Major producers such as China, India, and the United States continue to expand cultivation, driving industry growth. As tomato cultivation continues to expand in scale, the inefficiencies and labor-intensive drawbacks of traditional manual harvesting have become increasingly apparent. Concurrently, intensifying global aging trends have led to worsening labor shortages across agricultural sectors (Yoshida et al., 2022). Furthermore, delayed harvesting of mature fruits results in significant yield and quality reductions due to ethylene diffusion and nutrient competition (Alexander and Grierson, 2002). These compounding factors collectively render traditional manual harvesting inadequate for large-scale production demands. Driven by the rapid advancement of electronics and computer science, machine-vision techniques are progressively supplanting traditional manual inspection in labor-intensive horticultural operations such as fruit and vegetable detection and harvesting (Hou et al., 2023). In the specific context of intelligent tomato picking, the technology leverages non-contact sensing, high-precision feature extraction, and fully automated decision-making pipelines. Owing to its non-destructive and highly robust nature, this paradigm shift not only elevates operational efficiency to an unprecedented level but also constitutes the core impetus propelling the modernization of precision agriculture.
Thus, developing an efficient and accurate mature-tomato detection framework is key to automated harvesting. It can precisely locate fruits in highly polymorphic greenhouses, cutting identification errors and labor costs. Large-scale deployment of machine-vision-based tomato detection will accelerate agricultural production's shift toward full automation and intelligence, boosting operational precision and throughput while reducing manual dependence. Most importantly, this refined fruit-identification technology optimizes resource use efficiency, benefiting sustainable agriculture and environmental conservation (Botero-Valencia et al., 2025).
Despite significant progress in tomato object detection, existing methods still face technical challenges in real agricultural scenarios—greenhouses have complex lighting, heavy foliage occlusion, and high target-background similarity, while open fields add variable weather interference, both affecting detection stability and leaving room to improve accuracy and real-time performance. To address this, this study proposes a YOLOv8n-based improved algorithm targeting key issues like dense fruits, foliage occlusion, color confusion, and open-field complex weather interference. Its main contributions are as follows:
1. The SPD (Sunkara and Luo, 2022) convolution module is introduced. This module employs space-to-depth downsampling to effectively enhance the model's feature extraction capability for small objects. Meanwhile, a hierarchical structure dedicated to small-object detection is added, and the PPA (Xu et al., 2024) mechanism is integrated to further improve the detection accuracy of small-sized and ripe tomatoes.
2. The lightweight Slim-Neck (Li et al., 2024) structure is integrated, which, by optimizing the feature fusion and information transmission mechanisms, significantly reduces the computational load and parameter scale while maintaining model performance. This design adapts to the requirements of processing complex features in scenarios with foliage occlusion, as it not only reduces the operational burden on embedded terminals but also ensures the effective extraction and transmission of target features in occluded environments, thereby enhancing the detection stability of the model under foliage interference.
3. A self-developed lightweight Detect_CBAM detection head and the DIoU loss (Zheng et al., 2019) are incorporated. The Detect_CBAM head optimizes detection accuracy via channel and spatial attention, focusing on fog/snow-obscured tomatoes in open fields. The DIoU loss improves bounding box regression precision and model robustness to polymorphic backgrounds, addressing target-background contrast fluctuations from open-field illumination changes. It compensates for lightweight-related accuracy loss, supports robotic arm grasping, and enhances detection accuracy in illuminated open fields.
2 Related work
In recent years, rapid advances in artificial intelligence have revolutionized intelligent agricultural inspection. Within agricultural vision systems, deep-neural-network-based object detection leverages hierarchical feature learning to automatically extract multi-level semantic representations, endowing it with superior performance in crop-fruit detection. Responding to the pressing demands of agricultural modernization, researchers worldwide have converged on deep-learning detection paradigms, focusing on Two-stage and One-stage frameworks. The Two-stage object detection algorithm adopts a phased processing strategy: the first stage generates candidate boxes through a Region Proposal Network (RPN) and filters out potential target regions, while in the second stage, a Convolutional Neural Network (CNN) (Girshick et al., 2014) mines deep features to achieve precise classification and localization of targets.
Scholars have carried out a series of targeted improvements and agricultural application studies based on this framework. For example, Sun et al. (2018) introduced a deep-transfer-learning paradigm for localizing key organs of tomato plants against cluttered backgrounds. Hu et al. (2019) integrated Faster R-CNN (Ren et al., 2017) with an intuitionistic fuzzy set to enable automatic detection of individual mature tomatoes. Widiyanto et al. (2021) leveraged Faster R-CNN for simultaneous maturity classification and detection, demonstrating feasibility within automated ripeness-monitoring pipelines. However, robustness against occlusion, fruit overlap, and variable illumination was not thoroughly validated. Wang et al. (2021) proposed an improved Faster R-CNN tailored for young-fruit detection by incorporating multi-scale feature fusion and an attention mechanism, achieving reliable pre-harvest identification. However, missed detections persist under dense or heavily occluded canopies, and the elevated data and computational demands hinder real-time deployment on resource-constrained devices. Wang et al. (2022) further advanced a refined Faster R-CNN variant, MatDet, for multi-target maturity assessment in complex scenes. The architecture couples a ResNet-50 backbone with a Path Aggregation Network (PANet) (Sandler et al., 2018) and employs RoIAlign to refine bounding-box accuracy. Afonso et al. (2020) applied the Mask R-CNN (He et al., 2017) algorithm to tomato detection in greenhouse environments. Huang et al. (2020) fused fuzzy logic with Mask R-CNN to automatically grade cherry-tomato ripeness levels. Zu et al. (2021) deployed Mask R-CNN with a ResNet50-FPN backbone for simultaneous detection and segmentation of mature green tomatoes in greenhouse environments, specifically addressing color similarity to the background, occlusion, and fruit overlap, yet the intricate model structure incurs slow inference, and small or partially occluded fruits remain prone to omission. Minagawa and Kim (2022) utilized Mask R-CNN to detect tomato clusters and predict individual-fruit harvest timing, integrating the detection output into picking-time decision frameworks.
The One-stage object detection algorithm adopts an end-to-end single-stage processing strategy, eliminating the need to generate candidate boxes. It directly extracts features through the backbone network and simultaneously completes target location prediction and category classification on a single network branch. Thanks to its efficient single-stage architecture, it demonstrates significant advantages in real-time detection scenarios, and in recent years, it has given rise to a series of innovative applications and improved algorithms in the agricultural field. Yuan et al. (2020) introduced the InceptionV2 architecture into the SSD (Liu et al., 2016) framework. By leveraging a parallel extraction mechanism for multi-scale receptive fields, they enhanced the recognition robustness of cherry tomatoes in greenhouse environments. Liu et al. (2022) proposed TomatoDet, an anchor-free detector built upon CenterNet (Law and Deng, 2018), by integrating a convolutional block attention module to enhance feature representation and adopting circular bounding representation to refine tomato-shape fitting. The method markedly improves accuracy and robustness for tomato detection in greenhouse environments. Additionally, the YOLO (You Only Look Once) (Redmon et al., 2016) series of algorithms has garnered widespread attention due to its fast and accurate detection performance, particularly in crop detection, localization, classification, maturity assessment, and automated picking tasks. This advantage is particularly evident in tomato fruit detection and localization tasks. Currently, numerous studies have focused on tomato detection and recognition using models based on the YOLOv8 series, while lightweight solutions optimized based on the YOLOv5 and YOLOv10 series have also emerged, with relevant details shown in Table 1.
Table 1. Comparison of key features and mAP@0.5 performance of YOLOv8-based models by different authors (N/M-Not mentioned, M-Mentioned).
Macroscopically, existing models have three key limitations restricting performance breakthroughs. For small target detection enhancement, most methods have clear flaws: while Huang et al. (2024)’s YOLOv8-TD adds a small target detection layer to improve feature extraction, and Solimani et al. (2024) uses SE attention to enhance small target recognition, the former fails to deeply optimize small tomatoes’ spatial correlation and weak features, and the latter focuses on multi-phenotype small target issues without fully exploring cross-scale feature fusion for preserving small target details. Additionally, though Wang et al. (2024)’s NVW-YOLOv8s and Huang et al. (2025)’s AITP-YOLO perform well in multi-ripeness detection or complex scenario adaptation, both lack exclusive enhancement for extremely small tomatoes, leading to poor small tomato detection stability in complex agricultural scenarios.
In terms of lightweight design, existing methods also have common problems: The THYOLO algorithm proposed by Zeng et al. (2023) reconstructs the YOLOv5s backbone network, prunes the Neck layer, and implements NCNN-based quantization. However, its lightweight strategy does not systematically optimize the efficiency of multi-scale feature fusion, and there is still a performance compromise in dense small target detection; the S-YOLO model by Sun (2024) achieves lightweight only through channel pruning and basic module improvement, and fails to deeply solve the problem of feature transmission loss after pruning, resulting in reduced adaptability of the model in greenhouse environments with different planting layouts. Furthermore, the YOLOv8-EA model by Fu et al. (2024) achieves lightweight with the help of the EfficientViT Network and C2f-Faster Module, but faces the problem of increased inference delay due to complex module stacking; the model by Wang et al. (2024) adopts the Normalization-based Attention Module for lightweight, and there is still room for improvement in the coordination between feature optimization and model compression.
In terms of the pertinence of attention mechanism design, most studies only apply attention modules at a “general adaptation” level, lacking customization for tomato detection scenarios. The MLCA attention proposed by Song et al. (2024) does not integrate the spatial distribution features of tomato fruits and branches, leading to insufficient target feature focusing when small tomatoes are densely occluded by leaves. The Linear Attention + Cascaded Group Attention by Fu et al. (2024) fails to adapt to the color and morphological differences of tomatoes at different ripeness stages, easily causing feature confusion between color-turning tomatoes and the background. Sun (2024)’s SE attention only focuses on the channel dimension and ignores dynamic changes in tomato planting density, resulting in inaccurate positioning of small tomatoes in high-density scenarios. In summary, these issues stop attention mechanisms from fully focusing on effective features and suppressing interference, restricting the model’s performance breakthrough in complex agricultural scenarios.
These common issues manifest in more detailed ways in specific studies. Zeng et al. (2023) proposed the lightweight THYOLO algorithm to address redundancy in tomato detection models and heavy hardware reliance. By reconstructing the YOLOv5s backbone, pruning the Neck layer, optimizing hyperparameters, and applying NCNN-based quantization, they developed an Android real-time detection APP. Experiments showed THYOLO outperformed YOLOv5s and other lightweight models, reducing agricultural detection’s dependence on high-performance hardware and supporting low-cost automated picking. However, its dense small-target detection ability and generalization across tomato varieties still need improvement.
Song et al. (2024)’s FastMLCA-YOLOv8 model enhanced the recognition and localization capabilities of string tomato pedicel picking points in complex environments. It achieved this by replacing the bottleneck structure, introducing an attention mechanism, and improving algorithms. Its performance is better than that of YOLOv8. But it faces challenges in handling severely occluded pedicels and acquiring depth information of extremely short pedicels. It also lacks universality for different varieties of string tomatoes.
Fu et al. (2024) proposed the YOLOv8-EA model. It integrates multiple modules to optimize multi-stage tomato detection performance. However, the relatively complex model structure increases training difficulty. In practical applications, it has higher hardware requirements, which are not conducive to deployment on resource-constrained devices.
Sun (2024) developed the S-YOLO model. It significantly improved feature extraction capabilities in complex greenhouse environments. This was done by constructing a lightweight structure, designing improved modules, proposing enhanced algorithms, and integrating attention mechanisms. Yet it has poor adaptability to greenhouse environments with different planting scales and layouts. Its detection performance drops significantly under extreme lighting conditions.
Huang et al. (2024) proposed the YOLOv8-TD model, which enhances feature capture via an added small target detection layer, reduces model scale through pruning, and restores detection accuracy with distillation—achieving a balance of high accuracy, lightweight performance, and real-time capability for tomato detection. However, it exhibits poor accuracy for immature green tomatoes, and its universality remains unvalidated, as its optimization methods have not been applied to other crop detection models to confirm adaptability across agricultural scenarios.
Solimani et al. (2024) addressed imbalanced data and small target recognition in tomato plant multi-phenotype detection. They proposed an integrated solution based on data balancing, with YOLOv8 as the core and SE attention for enhancement. It outperforms Faster R-CNN and YOLOv5x, accurately detecting small targets to support tomato phenotyping and agricultural research. However, it lacks in tomato phenotype semantic differentiation and cross-scenario adaptability.
Wang et al. (2024) proposed a multi-ripeness tomato detection and segmentation solution: it uses a foreground-foreground class balance strategy in data processing, with NVW-YOLOv8s as the core model. It eases semi-ripe sample scarcity, achieving 91.4% detection mAP@0.5, 90.7% segmentation mAP@0.5, and 60.2 fps, providing a reference for tomato yield estimation and multi-ripeness crop detection. However, it lacks a dedicated enhancement mechanism for extremely small tomatoes and has not been validated under extreme lighting or dynamic blur, leading to weak anti-interference capability.
Liang et al. (2025)’s CTDA model enhanced the detection capability of cherry tomatoes in complex environments through specific modules. It outperforms the baseline YOLOv8 model. However, the model’s missed detection rate increases in blurred scenarios. Its anti-interference ability against dynamic image blurring needs urgent improvement. Additionally, the training dataset does not fully cover different growth stages of cherry tomatoes, which may restrict its adaptability in more complex practical scenarios.
Zhao et al. (2025) proposed the Ta-YOLO model based on YOLOv8n. It improved the detection performance of occluded small tomatoes in complex greenhouse environments through multi-module improvements. It achieved an mAP@0.5 of 84.4% on the self-built dataset. However, the model has relatively low detection accuracy for yellow fruits. There are also cases of missed detection for partially occluded fruits in densely planted scenarios. Moreover, the insufficient number of yellow fruit samples in the training data affects its stability in detecting tomatoes of different ripeness levels.
Deng et al. (2025) proposed the YOLOv8-TP model based on YOLOv8n-pose. It realized the synchronous recognition of string tomatoes and their picking points by replacing modules, introducing mechanisms, and enhancing feature extraction capabilities. However, the detection accuracy of this model may decline under extreme lighting conditions. It also faces difficulties in accurately identifying fruits and determining picking points when dealing with highly complex growth postures of string tomatoes.
Huang et al. (2025) proposed AITP-YOLO. It addresses small-target missed detection, poor complex-scenario generalization, and model redundancy in tomato ripeness detection via four-detector-head collaboration, multi-scale feature fusion, Shape-IoU, and pruning. It balances accuracy, lightweight design, and real-time performance, providing both technical support and a paradigm. However, it fails to resolve complex background interference and has weak generalization across tomato varieties and scenarios: its dataset covers only one local Sichuan tomato variety and greenhouse samples, excluding open fields, other varieties, and extreme scenarios.
Extensive research has been done on tomato object detection, but key gaps and unexplored directions remain, directly restricting the practical application of related technologies in agricultural scenarios. First, most current studies rely on a single data source, limiting models’ ability to capture comprehensive features. This is especially problematic for small tomatoes occluded by branches and leaves or color-turning tomatoes similar to the background, where feature extraction lacks integrity and discriminability—polymorphic data fusion could address such information-dimensional limitations. Second, while deep learning ensures high tomato detection accuracy, its high computational cost is notable in mobile and edge environments. Existing lightweight solutions have critical flaws: some sacrifice dense small-target detection for mobile deployment, others require high hardware due to complex module stacking, and still others only use basic pruning without solving feature transmission loss. These issues prevent balancing accuracy, real-time performance, and mobile compatibility, greatly restricting large-scale use in scenarios like field picking robots. Third, most attention mechanisms lack customization for tomato detection and only meet general adaptation needs. Some focus solely on the channel dimension, others are unoptimized for cluster tomatoes’ complex growth, and still others lack exclusive enhancement for extremely small tomatoes. This leads to poor weak-feature capture and interference suppression in complex scenarios, with missed detections of partially occluded fruits in high-density planting. Fourth, existing models have poor scenario adaptability. Most focus on controlled environments like greenhouses and ignore open-field complexities. Under open-field dynamic blur, extreme lighting, heavy rain, or strong light, models either have sharply higher missed detection rates or lower accuracy. When transferred from greenhouses to open fields, they are easily affected by image noise and target feature distortion, significantly reducing robustness.
To address the above issues, this study proposes an innovative tomato detection solution focused on filling research gaps and solving practical pain points, with breakthroughs in three aspects:
First, for insufficient small-target detection accuracy: The P2 layer retains small tomatoes’ basic features to avoid shallow spatial information loss in downsampling; the SPD convolution module extracts weak-feature targets’ fine-grained information via multi-scale spatial decomposition to strengthen cross-layer feature correlations; combined with the PPA attention mechanism, it suppresses background interference, reduces missed detections of extremely small/weak-feature targets, and addresses insufficient attention to small tomatoes’ spatial correlation in traditional methods.
Second, to balance lightweight design and performance: The Slim-Neck strategy streamlines feature fusion paths to cut computational complexity while ensuring multi-scale feature integrity and avoiding post-lightweight feature loss; the self-developed Detect_CBAM detection head optimizes feature screening/weighting from dual attention dimensions, improving target discriminability and compressing parameters to break high hardware requirements. Their combination reduces computational load without accuracy loss, laying a foundation for edge device deployment and balancing lightweight advantages with small-target detection performance.
Third, for insufficient attention mechanism customization and poor scenario adaptability: The PPA attention mechanism focuses on small tomatoes’ weak features—strengthening responses to exclusive features in the channel dimension and locating occluded areas to adapt to complex growth patterns in the spatial dimension. The Detect_CBAM detection head further deepens feature screening by prioritizing high-discriminability channels and enhancing fruit area weights. This dual effect boosts anti-interference in open-field scenarios to solve poor robustness, while compressing parameters, ensuring multi-scale fusion quality, and enhancing feature expression pertinence—avoiding over-generalization and insufficient focus of general attention mechanisms and balancing detection accuracy and inference efficiency.
In summary, the improved YOLOv8n addresses core limitations of existing tomato detection technologies and achieves breakthroughs in small-target accuracy, lightweight-performance balance, scenario-specific attention optimization, and full-scenario robustness. It delivers higher accuracy, stronger environmental robustness, and better generalization in agricultural scenarios, providing an efficient solution for practical agricultural detection and supporting full-scenario intelligent tomato detection. More details on model structure and experimental verification are elaborated in subsequent chapters.
3 Data collection and pre-processing methods
3.1 Data collection
This study employs the public tomato dataset ‘Tomato’ from Google Kaggle. This dataset comprises 895 JPG-format images with a resolution of 500×400 pixels that authentically depict greenhouse cultivation environments. The images were captured using 12-megapixel smartphones within glass greenhouses at the National Engineering Research Center for Protected Agriculture, Chongming Base (Li et al., 2020b). They comprehensively represent polymorphic greenhouse environments, including varying shooting angles, occlusions, fruit overlaps, and distant small targets, demonstrating high congruence with real-world agricultural settings. Given its environmental authenticity and scene complexity, this dataset was selected as the initial dataset for model training. The polymorphic greenhouse tomato dataset is shown in Figure 1.
Figure 1. Polymorphic greenhouse tomato dataset, including (a) Horizontal shot, (b) Diagonal downward at 45°, (c) Diagonal upward at 45°, (d) Overhead shot, (e) Fruit overlapping, (f) Foliage occlusion, and (g) Small targets at a distance.
3.2 Data augmentation and annotation methods
To enhance the recognition accuracy, robustness, and generalization ability of deep learning models for tomato fruits in variable greenhouse environments, and to address issues such as abrupt illumination changes, occlusion, and uneven sample distribution, this study applies data augmentation techniques (Shorten and Khoshgoftaar, 2019) to the images. Vertical and horizontal flips were used to simulate different observation angles and object inversion scenarios, effectively enriching sample diversity. Random adjustments of brightness (47%), contrast (50%), and saturation (50%) were made to enhance the distinguishability of image features and improve the model's adaptability to complex illumination variations. Gaussian noise with a mean of 0 and a standard deviation of 0.1 was added, with the noise amplitude primarily ranging from -0.3 to 0.3, simulating subtle noise interference in real-world scenarios and strengthening the model's anti-interference capability. Pepper noise was added, setting 10%-30% of pixels to pure black and forming medium-density dark spots to simulate interference such as local shadows or sensor dead pixels. Salt noise was added, setting 10%-30% of pixels to pure white and forming medium-density bright spots to simulate interference such as strong light reflection or camera overexposure. The enhancement effects of some greenhouse tomato images are shown in Figure 2.
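For illustration only, the following is a minimal NumPy sketch of several of the augmentation operations described above (flips, Gaussian noise, pepper noise, salt noise); the brightness/contrast/saturation jitter is omitted for brevity. Parameter values follow the text, the actual augmentation tooling used to build the dataset is not specified, and bounding-box coordinates would additionally need to be transformed alongside flipped images.

```python
import numpy as np

def flip(img, vertical=False):
    """Mirror the image; annotation boxes must be mirrored accordingly."""
    return img[::-1] if vertical else img[:, ::-1]

def add_gaussian_noise(img, mean=0.0, std=0.1):
    """img is a float array in [0, 1]; noise amplitude is clipped to [-0.3, 0.3]."""
    noise = np.clip(np.random.normal(mean, std, img.shape), -0.3, 0.3)
    return np.clip(img + noise, 0.0, 1.0)

def add_pepper_noise(img, low=0.10, high=0.30):
    """Set 10%-30% of pixels to pure black (local shadows, sensor dead pixels)."""
    ratio = np.random.uniform(low, high)
    mask = np.random.rand(*img.shape[:2]) < ratio
    out = img.copy()
    out[mask] = 0.0
    return out

def add_salt_noise(img, low=0.10, high=0.30):
    """Set 10%-30% of pixels to pure white (specular reflection, overexposure)."""
    ratio = np.random.uniform(low, high)
    mask = np.random.rand(*img.shape[:2]) < ratio
    out = img.copy()
    out[mask] = 1.0
    return out

# Example: chain two augmentations on a dummy 400x500 RGB image.
img = np.random.rand(400, 500, 3)
aug = add_salt_noise(add_gaussian_noise(img))
```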
To ensure the reliability and rigor of the dataset, 8 types of data augmentation were applied to each of the training set (626 images), validation set (90 images), and test set (179 images) in the original dataset. Meanwhile, to avoid data bias, the “Make Sense” platform (https://www.makesense.ai/) was used to conduct systematic re-annotation and verification of the original dataset. By clearly defining annotation specifications and unifying annotation standards in advance, the foundation for annotation consistency was established from the data source. On this premise, a mechanism combining annotation by three annotators and random inspection review was adopted. First, three annotators completed the annotation independently. Then, random inspections were conducted to compare the annotation results to correct annotation biases promptly. For ambiguous samples, an additional method of joint discussion by two annotators was adopted to determine the annotation results. Finally, a quality control closed-loop was formed from three dimensions: annotation standards, process control, and special sample handling, which strictly ensured annotation quality. The annotation workflow is shown in Figure 3. After the above processing, the “tomato_dataset” containing 7,160 images was finally constructed. It was divided into a training set (5,008 images), a validation set (720 images), and a test set (1,432 images) according to the 7:1:2 ratio of the original dataset.
4 Improved YOLOv8n object detection model
4.1 YOLOv8 object detection network architecture
YOLOv8 is based on the fully convolutional architecture of YOLOv5 and employs an anchor-free mechanism, consisting of three components: backbone, neck, and head. The backbone and neck introduce the C2f module (a faster Cross Stage Partial bottleneck with two convolutions) to replace the traditional C3 module (a Cross Stage Partial bottleneck with three convolutional layers), reducing computational load by halving the number of channels. The neck utilizes PANet to fuse multi-scale features, enhancing the integration of high-level semantic features and low-level spatial features. The head adopts a decoupled detection head to separate the regression and prediction branches, minimizing task interference. It also incorporates the Distribution Focal Loss (DFL) (Li et al., 2020a) and Complete-Intersection over Union (CIoU) loss to improve detection accuracy, particularly for bounding box regression in overlapping or low-contrast scenarios.
The YOLOv8 series includes five versions (n/s/m/l/x). Considering the future requirements for real-time performance and deployment convenience, YOLOv8n was chosen in this study as the base model. As the version with the smallest number of parameters and minimal computational load, YOLOv8n is suitable for fast inference in resource-constrained environments, and its concise architecture avoids parameter redundancy in single-class detection tasks. This selection enables efficient real-time detection and easy deployment, aligning with the practical demands of agricultural applications.
4.2 YOLOv8n improved model
4.2.1 Space-to-depth convolution module
The performance of current convolutional neural networks (CNNs) has a bottleneck in small object detection. The core reason is that during the traditional downsampling process, strided convolutions and pooling layers directly discard some pixels through the “interval sampling” mechanism, resulting in irreversible loss of fine-grained features. This defect is particularly prominent in greenhouse tomato detection scenarios. Small tomatoes account for an extremely low proportion of pixels in the feature map, and traditional downsampling easily discards key recognition information such as fruit stalks and peel textures, which in turn leads to missed detection. The SPD-Conv module addresses this problem by means of an innovative “spatial-channel dimension rearrangement” mechanism. Its Space-to-Depth (SPD) layer can transfer spatial information to the channel dimension without loss, while the non-strided convolution layer can enhance feature discriminability through learnable parameters. The combination of the two significantly improves the detection accuracy of small tomatoes. In this study, the SPD-Conv module, which consists of an SPD layer and a non-strided convolution layer, is embedded between the Conv module and the C2f module of the backbone network. The specific implementation logic is as follows:
The core of the SPD layer lies in achieving “downsampling without information loss” through “regularized sub-feature map extraction” and “channel dimension concatenation”. First, an intermediate feature map $X$ with a size of $S \times S \times C_1$ is input, and a series of sub-feature maps is segmented according to the following rule:

$f_{x,y} = \{X(i, j) \mid (i + x) \bmod \mathrm{scale} = 0,\ (j + y) \bmod \mathrm{scale} = 0\}, \quad x, y \in \{0, 1, \ldots, \mathrm{scale} - 1\}$

Moreover, for any original sub-feature map $f_{x,y}$, the sub-map is composed of all elements $X(i, j)$ such that $i + x$ and $j + y$ are divisible by $\mathrm{scale}$. That is to say, each sub-map downsamples $X$ with $\mathrm{scale}$ as the factor. In terms of principle, traditional strided downsampling (stride 2) directly discards 75% of the pixels, while SPD ensures the integrity of spatial information by extracting all pixels into sub-maps according to divisibility and then concatenating them along the channel dimension. Every pixel in the input feature map (including the pixels in odd rows and columns that would be lost in traditional downsampling) is assigned to a unique sub-map and is finally preserved through the channel dimension. Specifically, as shown in Figure 4, when $\mathrm{scale} = 2$ is taken as an example, four sub-maps $f_{0,0}$, $f_{1,0}$, $f_{0,1}$, $f_{1,1}$ are obtained. Each sub-map has a size of $\frac{S}{2} \times \frac{S}{2} \times C_1$. Then, they are concatenated along the channel dimension to generate a new feature map $X'$: its spatial dimension is reduced by a factor of $\mathrm{scale}$, and its channel dimension is increased by a factor of $\mathrm{scale}^2$ (i.e., transforming from $S \times S \times C_1$ to $\frac{S}{\mathrm{scale}} \times \frac{S}{\mathrm{scale}} \times \mathrm{scale}^2 C_1$). That is, the SPD layer converts the feature map $X$ into an intermediate feature map $X'$. After the SPD layer, a convolutional layer with a stride of 1 is added. This layer uses $C_2$ filters, where $C_2 < \mathrm{scale}^2 C_1$, and converts $X'$ into $X''$ of size $\frac{S}{\mathrm{scale}} \times \frac{S}{\mathrm{scale}} \times C_2$. In greenhouse environments, tomatoes have the problems of “small size and local occlusion”. The sub-map extraction of SPD can retain the local pixels in occluded areas. For example, for a tomato half-covered by a leaf, the unoccluded half of the pixels will be completely extracted into the corresponding sub-maps, and subsequent channel concatenation aggregates this local information in the channel dimension, facilitating model recognition. Meanwhile, the non-strided convolution layer can compress channels while purposefully preserving discriminative features such as the tomato calyx and red peel, avoiding the loss of this fine-grained information caused by uneven sampling in traditional strided convolution. Thus, it reduces the number of channels while preserving key features. This enhances the model's ability to extract effective features from small targets, thereby improving the overall performance.
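For illustration (not the authors' exact implementation), a minimal PyTorch sketch of the SPD-Conv operation with scale = 2 follows; the channel sizes and the SiLU activation are assumptions.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    def __init__(self, c1, c2, scale=2):
        super().__init__()
        self.scale = scale
        # Non-strided conv compresses scale^2 * c1 channels down to c2 (c2 < scale^2 * c1).
        self.conv = nn.Sequential(
            nn.Conv2d(c1 * scale * scale, c2, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(c2),
            nn.SiLU(),
        )

    def forward(self, x):
        s = self.scale
        # Space-to-depth: every pixel is kept and moved into the channel dimension.
        subs = [x[..., i::s, j::s] for i in range(s) for j in range(s)]
        x = torch.cat(subs, dim=1)  # (B, s*s*C, H/s, W/s)
        return self.conv(x)

# Example: a 64-channel 80x80 map becomes a 128-channel 40x40 map without discarding pixels.
y = SPDConv(64, 128)(torch.randn(1, 64, 80, 80))
print(y.shape)  # torch.Size([1, 128, 40, 40])
```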
4.2.2 Adding a small target detection layer
In the YOLOv8n model architecture, input images are processed through three feature maps (P3, P4, P5) for contextual association before entering the neck structure, which then outputs target-related information and predicted object locations. For a 640×640 input, these three feature maps have sizes of 80×80 pixels, 40×40 pixels, and 20×20 pixels, respectively, with the core objective of enabling effective detection of targets of different sizes. Specifically, the minimum visible target sizes detectable by the P3, P4, and P5 detection heads are approximately 8×8 pixels, 16×16 pixels, and 32×32 pixels, respectively. However, in practical tomato detection scenarios, due to the small size of some tomato targets, the existing YOLOv8n architecture is prone to missed detections.
To effectively address this issue, this study adds a small-target-specific detection layer P2 to YOLOv8n's existing three detection layers. The new layer's 160×160-pixel feature map boosts the network's focus on small targets and cuts missed detections. This high-resolution feature map preserves more image detail: each element corresponds to a smaller original-image region, enabling finer target segmentation and better attention to details. By combining shallow positional and deep semantic information, it achieves more accurate small-target localization and identification, offering a more reliable solution for tomato detection.
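As a quick illustration, assuming the standard 640×640 YOLOv8 input resolution (an assumption, not a statement of this study's training configuration), the relation between a detection layer's stride, its feature-map size, and the smallest target it can reasonably resolve is:

```python
input_size = 640
for name, stride in [("P2", 4), ("P3", 8), ("P4", 16), ("P5", 32)]:
    fmap = input_size // stride
    print(f"{name}: stride {stride:>2} -> {fmap}x{fmap} feature map, "
          f"minimum visible target roughly {stride}x{stride} px")
```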
4.2.3 Parallelized patch-aware attention mechanism
In greenhouse tomato detection, the complex growth environment and varying growth stages cause tomato morphology, color, and other features to be easily disturbed. This leads to partial tomatoes being overlooked or incompletely characterized during conventional feature extraction—single-scale convolution operations either lose local details due to insufficient receptive fields or discard global context due to excessive downsampling. The Parallelized Patch-Aware (PPA) attention mechanism addresses this through a two-tier strategy of multi-branch feature extraction and adaptive attention enhancement: it leverages parallel local, global, and serial convolution branches to comprehensively capture tomato features at different scales, avoiding feature omissions. Meanwhile, its attention mechanism adaptively enhances features critical for tasks like tomato recognition and ripeness judgment, while suppressing interference from backgrounds and irrelevant factors.
The multi-branch feature extraction module is implemented by combining patch-aware operations and multi-level convolution. By adjusting the patch parameter $p$, the patch-aware process is finely decomposed into three parallel paths: local, global, and serial convolution branches, each focusing on capturing feature information of different scales of the target. Given an input feature tensor $F \in \mathbb{R}^{H \times W \times C}$ (where $H$, $W$ are spatial dimensions and $C$ is the channel count), point convolution is first used to preprocess it; this step performs channel dimensionality reduction and reconfiguration without changing spatial resolution, reducing the channel count from $C$ to $C'$ while preserving spatial dimensions, and obtains $\tilde{F} \in \mathbb{R}^{H \times W \times C'}$. This not only reduces subsequent computational costs but also compresses redundant channel information, laying a robust and efficient computational foundation for patch splitting. Based on this, the three branches are then executed respectively.
The local and global branches are distinguished by controlling the patch size parameter $p$, which is achieved by aggregating and rearranging non-overlapping patches in the spatial dimension. Moreover, by calculating the attention matrix between non-overlapping patches, the extraction and interaction of local and global features are enabled. In this process, computationally efficient unfold and reshape operations are adopted to divide $\tilde{F}$ into a set of spatially continuous, non-overlapping patches of size $p \times p$. Subsequently, an averaging operation in the channel dimension is performed to obtain a patch descriptor, and then a feed-forward network (FFN) (Vaswani et al., 2017) is used for linear calculation (parameterized by weight matrices $W_1$, $W_2$ and biases $b_1$, $b_2$). Finally, an activation function is applied to obtain the probability distribution of the linearly calculated features in the spatial dimension, and weights are adjusted accordingly such that the key patches are adaptively selected.
Local branch $F_{local}$: With the help of unfolding and reshaping operations, $\tilde{F}$ is dissected into a set of spatially continuous patches. For example, if $\tilde{F}$ has a spatial size of $H \times W$, it is split into $(H/p) \times (W/p)$ patches, each covering the core pixel group of a small tomato. Subsequently, a feed-forward neural network (FFN) is used for linear transformation, an activation function derives the probability distribution of the linearly calculated features in the spatial dimension, and weights are adjusted accordingly. Finally, $F_{local}$ is output, which focuses on depicting the local detail features of the target. This design can completely retain the visible-region pixels of tomatoes occluded by leaves, avoiding missed detections caused by local pixels being overwhelmed by the background in traditional convolution.
Global branch $F_{global}$: The operation process is similar to that of the local branch but uses a larger patch parameter. By processing the patch set, $F_{global}$ is output to capture the global structural features of the target. For densely distributed tomato clusters, the global branch can distinguish adjacent tomatoes from leaf interference through the spatial correlation between patches, reducing false detections and solving the problem that small tomatoes occupy a low proportion of the feature map and are easily confused with the background.
Serial convolution branch: Three $3 \times 3$ convolutional layers are used in series to replace the traditional $3 \times 3$, $5 \times 5$, and $7 \times 7$ convolutional layers. The first convolution extracts tomato edge features, the second convolution fuses edges and local textures, and the third convolution captures the overall tomato morphology. These generate three outputs $F_{conv1}$, $F_{conv2}$, and $F_{conv3}$, which are then summed to obtain $F_{conv}$, realizing the extraction of multi-level features of the target.
The results of the above three branches are aggregated to obtain $\tilde{F}' = F_{local} + F_{global} + F_{conv}$. To further enhance the self-adaptability of the features, an attention mechanism is incorporated, which is composed of channel attention and spatial attention applied in order. Specifically, $\tilde{F}'$ sequentially undergoes operations with a channel attention map $M_c$ and a spatial attention map $M_s$, as shown in Equations 1–3:
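The equation bodies referenced above are not reproduced in the text; a reconstruction consistent with the symbol definitions that follow is:

$$F_{c} = M_c(\tilde{F}') \otimes \tilde{F}' \quad (1)$$

$$F_{s} = M_s(F_{c}) \otimes F_{c} \quad (2)$$

$$F_{out} = \delta\big(\mathcal{B}(F_{s})\big) \quad (3)$$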
Here, $\otimes$ represents element-wise multiplication. $F_c$ is the feature after channel screening and $F_s$ the feature after spatial screening. $\delta$ is the linear rectification function (ReLU), $\mathcal{B}$ represents the batch normalization (BN) operation, and $F_{out}$ is the final output of the PPA module.
In the weighting process, let the aggregated feature consist of output units $u_i$, where $u_i$ represents the $i$-th output unit, and denote the weighting result by $\hat{u}_i$. The feature selection process is carried out for each unit, with the output $\hat{u}_i = u_i \cdot \mathrm{sim}(u_i, \xi)$. Here, $u_i$ and $\xi$ are the specific parameters involved, and $\mathrm{sim}(\cdot, \cdot)$ is a cosine similarity function with values in [0,1]. $\xi$ serves as a task vector to specify task-relevant units. Each unit determines its degree of association with the task vector by measuring the cosine similarity. Channel selection is accomplished via a linear transformation, followed by reshaping and interpolation to generate the final local feature $F_{local}$ and global feature $F_{global}$. This mechanism can adaptively adjust feature weights for different tasks; for example, ripeness judgment focuses on fruit surface color, and occluded-target detection focuses on local edges, effectively mitigating performance degradation caused by universal feature extraction. The structure diagram of the PPA attention mechanism is shown in Figure 5, where each design component is tailored to address core challenges in greenhouse tomato detection: patch partitioning mitigates small-target detail degradation, multi-branch fusion enables scale-adaptive feature capture, and attention refinement suppresses background clutter interference, thereby achieving the objective of accurate feature extraction with efficient enhancement in the feature processing pipeline.
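For illustration, a deliberately simplified PyTorch sketch of the PPA idea follows: three parallel branches (local patches, global patches, serial 3×3 convolutions) are aggregated and then refined by channel and spatial attention as in Equations 1–3. The patch handling, channel widths, and attention sub-modules here are rough stand-ins, not the module of Xu et al. (2024).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplePPA(nn.Module):
    def __init__(self, c_in, c_mid):
        super().__init__()
        self.reduce = nn.Conv2d(c_in, c_mid, 1)  # point conv: C -> C'
        self.serial = nn.ModuleList([nn.Conv2d(c_mid, c_mid, 3, padding=1) for _ in range(3)])
        # channel attention (squeeze-excite style stand-in)
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(c_mid, c_mid // 4, 1), nn.ReLU(),
                                nn.Conv2d(c_mid // 4, c_mid, 1), nn.Sigmoid())
        # spatial attention (7x7 conv over pooled descriptor)
        self.sa = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())
        self.bn = nn.BatchNorm2d(c_mid)

    def patch_branch(self, x, p):
        # crude patch-aware stand-in: average within p x p patches, then upsample back
        pooled = F.avg_pool2d(x, p)
        return F.interpolate(pooled, size=x.shape[-2:], mode="nearest")

    def forward(self, x):
        x = self.reduce(x)
        local_f = self.patch_branch(x, 2)        # local branch (small patches)
        global_f = self.patch_branch(x, 4)       # global branch (larger patches)
        conv_f, h = 0, x
        for conv in self.serial:                 # serial 3x3 convs (receptive fields 3, 5, 7)
            h = conv(h)
            conv_f = conv_f + h
        f = local_f + global_f + conv_f          # aggregate the three branches
        f = f * self.ca(f)                       # Eq. (1): channel attention
        avg = f.mean(dim=1, keepdim=True)
        mx, _ = f.max(dim=1, keepdim=True)
        f = f * self.sa(torch.cat([avg, mx], dim=1))  # Eq. (2): spatial attention
        return F.relu(self.bn(f))                # Eq. (3): BN + ReLU output

y = SimplePPA(64, 32)(torch.randn(1, 64, 80, 80))
print(y.shape)  # torch.Size([1, 32, 80, 80])
```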
4.2.4 Lightweight slim-neck feature fusion network
Adding and improving the small-target detection layer sharply increases the model's computation, reducing inference speed. To cut computing resource use while ensuring accuracy, Slim-Neck's lightweight GSConv and VoV-GSCSP modules are used for the lightweight construction of the neck feature fusion network. This eases the cost of the added computation and balances model accuracy, computing resources, and operation speed.
4.2.4.1 GSConv module
GSConv comprises three parts: Standard Convolution (SC), Depth-wise Separable Convolution (DSC) and Channel Shuffle. SC extracts rich inter-channel correlated features. DSC captures specific spatial information while cutting computation sharply, and Channel Shuffle fuses features from the two, enhancing feature interactivity and expressiveness. This lets GSConv stay lightweight while having strong feature extraction performance. Its principle (shown in Figure 6) is as follows.
The input feature map has C1 channels. First, its channels are split evenly into two parts, each with C2/2 channels. One part goes to the SC branch, which is a standard 2D convolution layer with batch normalization-2D and activation layers. SC slides a kernel over the feature map to fully capture inter-channel correlated info and retain rich feature details, but has a higher computation cost. The other part enters the DSC branch, which consists of a 2D depth-wise convolution layer with batch normalization-2D and activation layers. DSC first conducts convolution on each channel separately, then fuses channel info via point-wise convolution. This cuts computation greatly, but has insufficient inter-channel info exchange.
The two feature map parts processed by SC and DSC undergo a concat in the channel dimension, restoring the feature map channel count to C2. Finally, Channel Shuffle is applied to the concatenated feature map to disrupt channel order, allowing features from different branches to permeate and fuse mutually. This enhances feature representation ability while reducing computation and outputs a C2-channel feature map.
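For illustration, a hedged PyTorch sketch of GSConv in the commonly published Slim-Neck form is given below. Note that in this form the depth-wise branch operates on the output of the standard-convolution branch rather than on a separate split of the input, and the kernel size and activation are assumptions; the paper's exact variant may differ.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Standard conv -> depth-wise conv -> concat -> channel shuffle (Slim-Neck style)."""
    def __init__(self, c1, c2, k=3, s=1):
        super().__init__()
        c_ = c2 // 2
        self.sc = nn.Sequential(                                   # standard convolution branch
            nn.Conv2d(c1, c_, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_), nn.SiLU())
        self.dsc = nn.Sequential(                                  # depth-wise branch
            nn.Conv2d(c_, c_, k, 1, k // 2, groups=c_, bias=False),
            nn.BatchNorm2d(c_), nn.SiLU())

    def forward(self, x):
        x1 = self.sc(x)
        x2 = self.dsc(x1)
        y = torch.cat([x1, x2], dim=1)                             # restore C2 channels
        b, c, h, w = y.shape                                       # channel shuffle (2 groups)
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)

# Example: 128 -> 256 channels with spatial downsampling by 2.
out = GSConv(128, 256, s=2)(torch.randn(1, 128, 40, 40))
print(out.shape)  # torch.Size([1, 256, 20, 20])
```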
4.2.4.2 VoV-GSCSP module
To effectively improve model efficiency, this study constructs the GSBottleneck module, shown in Figure 7, based on GSConv and depth-wise separable convolution. The GSBottleneck consists of two GSConv layers and a standard convolution branch and introduces a residual connection mechanism. After the depth-wise separable convolution operation, its output is concatenated and fused with the features of the main branch, which extracts features effectively while controlling the amount of computation. Subsequently, a cross-stage partial network module, VoV-GSCSP, is designed using a one-shot aggregation method; combined with subsequent convolution processing, it enhances the model's perception of multi-channel feature maps. This study uses the VoV-GSCSP module to replace all C2f modules in the neck network. On one hand, it further lightens the model and reduces computational overhead. On the other hand, it improves model performance and enhances the feature fusion and detection capabilities.
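Continuing the previous sketch (and reusing the GSConv class defined there), the following is a hedged rendering of the GSBottleneck and the one-shot-aggregation VoV-GSCSP block; the number of stacked bottlenecks, the hidden width, and the shortcut form are illustrative assumptions rather than this paper's exact configuration.

```python
import torch
import torch.nn as nn

class GSBottleneck(nn.Module):
    """Two GSConv layers plus a lightweight convolution shortcut, fused residually."""
    def __init__(self, c1, c2):
        super().__init__()
        self.gs = nn.Sequential(GSConv(c1, c2), GSConv(c2, c2))
        self.shortcut = nn.Conv2d(c1, c2, 1, bias=False)

    def forward(self, x):
        return self.gs(x) + self.shortcut(x)

class VoVGSCSP(nn.Module):
    """One-shot aggregation: split into two paths, process one with GSBottlenecks, concat, fuse."""
    def __init__(self, c1, c2, n=1):
        super().__init__()
        c_ = c2 // 2
        self.cv1 = nn.Conv2d(c1, c_, 1, bias=False)
        self.cv2 = nn.Conv2d(c1, c_, 1, bias=False)
        self.m = nn.Sequential(*(GSBottleneck(c_, c_) for _ in range(n)))
        self.cv3 = nn.Conv2d(2 * c_, c2, 1, bias=False)

    def forward(self, x):
        return self.cv3(torch.cat([self.m(self.cv1(x)), self.cv2(x)], dim=1))

# Example: a drop-in replacement position for a C2f block in the neck.
y = VoVGSCSP(256, 256)(torch.randn(1, 256, 40, 40))
print(y.shape)  # torch.Size([1, 256, 40, 40])
```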
4.2.5 Improved detection head
To address the accuracy decline caused by lightweight networks and overcome the defects of the existing YOLOv8n detection head, we first analyze its limitations in feature processing. The detection head fails to effectively distinguish the importance of features: feature weights between channels and at spatial positions tend to be uniform. This makes the features of small or occluded tomato targets easily overshadowed by the background or large tomato targets, leaving feature value underutilized and ultimately affecting detection accuracy.
This study proposes the Detect_CBAM detection head. Its core is to integrate the Convolutional Block Attention Module (Woo et al., 2018) into the YOLOv8n detection head structure. With the help of two mechanisms, namely Channel Attention and Spatial Attention, it adaptively enhances features related to the target and suppresses irrelevant background noise. The core idea of this detection head is to use the attention mechanism to dynamically adjust the weights of the feature map in the channel and spatial dimensions, allowing the model to focus on key regions and accurately screen feature information, thereby improving detection accuracy.
CBAM plays a key role in feature processing. Its channel attention module evaluates the channels of the input feature map, aggregates each channel's feature information via global average and max pooling, and generates weights through a Multi-Layer Perceptron (MLP) to adaptively adjust channel responses, helping the model focus on the channels critical for target detection. Its spatial attention module creates a spatial descriptor via channel-dimension average and max pooling and derives spatial weights via a convolutional layer to adjust spatial responses. This enables accurate focus on target regions, effective background suppression, and, for small or occluded targets, precise localization and enhanced feature expression, preventing their features from being overshadowed by the background or by large targets. In this way, the feature value is fully exploited and detection accuracy is boosted. From the overall design perspective, the Detect_CBAM head improves performance while keeping the model lightweight: CBAM has a simple structure, few parameters, and low computational cost, imposing no excessive burden on the model. It improves accuracy through effective feature use without a sharp increase in complexity, balancing lightweight design and high accuracy; its structure diagram is shown in Figure 8.
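As a hedged sketch, the CBAM block (Woo et al., 2018) used inside the Detect_CBAM head can be written as follows; the reduction ratio, the 7×7 spatial kernel, and the exact insertion point before the head's decoupled branches are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, c, r=16, k=7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(c, c // r, 1, bias=False), nn.ReLU(),
                                 nn.Conv2d(c // r, c, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        # Channel attention: shared MLP over global average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: 7x7 conv over channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

# In the improved head, each incoming feature map would pass through CBAM before the
# decoupled regression/classification convolutions (a design assumption for illustration).
out = CBAM(256)(torch.randn(1, 256, 40, 40))
print(out.shape)  # torch.Size([1, 256, 40, 40])
```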
4.2.6 DIoU loss function
The bounding box regression loss function is a critical component of the YOLOv8n model for optimizing the position and shape of bounding boxes, playing a vital role in object detection tasks. It measures the difference between the model’s predicted bounding boxes and the ground-truth bounding boxes—the smaller the regression loss, the more accurately the model can localize object boundaries. As the detection box loss function in YOLOv8, CIoU Loss not only focuses on the overlapping area between the predicted and ground-truth boxes but also considers the consistency of center point distance and aspect ratio. However, it does not achieve the optimal balance between computational complexity and small-target detection performance.
To address this issue, the DIoU loss function is adopted, which abandons the consideration of aspect ratio consistency to reduce computational load. In object detection scenarios, this avoids potential negative effects caused by the aspect ratio term and enables a more efficient focus on target localization. As a result, the model can capture the positions of tomato targets more quickly and accurately in polymorphic scenarios, effectively improving the recall rate and accuracy of object detection. Additionally, DIoU Loss directly optimizes the normalized distance between the center points of the predicted and target boxes, achieving faster convergence during training and reducing the time and computational resources required for training. The calculation formulas for DIoU Loss are shown in (Equations 4, 5):
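The referenced formulas are not reproduced in the text; the standard form of DIoU Loss (Zheng et al., 2019), consistent with the parameter definitions below, is:

$$\mathrm{IoU} = \frac{\left|B \cap B^{gt}\right|}{\left|B \cup B^{gt}\right|} \quad (4)$$

$$\mathcal{L}_{\mathrm{DIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}\!\left(b, b^{gt}\right)}{c^{2}} \quad (5)$$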
Among them, Intersection over Union (IoU) (Yu et al., 2016), namely the intersection-over-union ratio, is a key metric in the field of object detection for evaluating the overlap degree between the predicted bounding box and the ground-truth bounding box. Let $B$ represent the region defined by the predicted bounding box, and $B^{gt}$ represent the region defined by the ground-truth bounding box. $|B \cap B^{gt}|$ denotes the area of the intersection of the two regions, i.e., the area of the overlapping part between the predicted box and the ground-truth box, and $|B \cup B^{gt}|$ represents the area of the union of the two regions, which is the total area covered by the predicted bounding box and the ground-truth bounding box combined. $b$ and $b^{gt}$ denote the center points of the predicted bounding box $B$ and the ground-truth bounding box $B^{gt}$, respectively. $\rho(b, b^{gt})$ is the Euclidean distance between the center point $b$ of the predicted bounding box and the center point $b^{gt}$ of the ground-truth bounding box, and $c$ is the diagonal length of the smallest enclosing box that covers both bounding boxes. The geometric meanings of these parameters are shown in Figure 9.
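Complementing the formulas, a hedged PyTorch sketch of the DIoU loss for corner-format boxes follows; the epsilon handling and mean reduction are illustrative choices rather than the training code of this study.

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 4) tensors of (x1, y1, x2, y2) boxes."""
    # intersection and union
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # squared centre distance rho^2(b, b_gt)
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((cp - ct) ** 2).sum(dim=1)
    # squared diagonal c^2 of the smallest enclosing box
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(dim=1) + eps
    return (1 - iou + rho2 / c2).mean()
```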
4.2.7 Structure of the improved YOLOv8n network model
The model improvements primarily involve optimizing multiple network components and integrating key mechanisms. In the backbone network, SPD convolution is inserted between the Conv and C2f modules to optimize feature extraction. This enhancement strengthens the capture of detailed features in tomato images, improves the model’s ability to represent polymorphic morphologies and varying lighting conditions, and boosts overall generalization.
In the neck network, C2f modules are replaced with VoV-GSCSP modules, and the PPA attention mechanism is integrated. This combination achieves a lightweight design while mitigating the loss of critical information during downsampling, which is particularly beneficial for detecting small or occluded tomatoes.
In the head network, a dedicated small target detection layer is added, and the Detect_CBAM detection head is adopted to fuse shallow and deep features. This not only improves tomato detection accuracy and recall but also enhances multi-scale detection capabilities.
Additionally, DIoU Loss is introduced to optimize bounding box regression accuracy, strengthen the model's robustness under polymorphic and complex backgrounds as well as lighting variations, and ensure high computational efficiency, making it suitable for deployment on resource-constrained devices and facilitating practical detection applications. The architecture of the improved YOLOv8n network is illustrated in Figure 10.
5 Experiments and result analysis
5.1 Experimental environment configuration and parameter configuration
Table 2 presents the experimental environment configuration for this study, while Table 3 presents the parameter settings.
5.2 Experimental evaluation metrics
In this experiment, the evaluation indicators for model performance include bounding box precision ($P$), recall ($R$), mAP@0.5, mAP@0.5:0.95, model parameters (Parameters), and floating-point operations (FLOPs). The calculations for precision and recall are shown in (Equations 6, 7):
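The referenced equation bodies are not reproduced in the text; the standard definitions, consistent with the symbols below, are:

$$P = \frac{TP}{TP + FP} \quad (6)$$

$$R = \frac{TP}{TP + FN} \quad (7)$$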
$TP$ (true positive) indicates that the target in the image is accurately recognized and that the IoU with the real target exceeds the set threshold. $FP$ (false positive) indicates that the target is not successfully recognized during the detection process, i.e., the IoU between the detected bounding box and the real target is lower than the set threshold. $FN$ (false negative) indicates a target that is not detected. The calculation process of the mean Average Precision (mAP) is shown in (Equation 8).
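Likewise, the referenced formula can be reconstructed as the mean, over classes, of the area under each precision-recall curve:

$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N}\int_{0}^{1} P_i(R)\,\mathrm{d}R \quad (8)$$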
Among them, $N$ represents the number of target categories detected in the dataset. mAP@0.5 and mAP@0.5:0.95 are indicators commonly used to evaluate model performance. In this study, unless otherwise specified, mAP defaults to mAP@0.5. The selection of mAP@0.5 (with IoU=0.5) as the core evaluation metric is based on a comprehensive consideration of actual tomato picking scenarios. From the perspective of fruit morphology and robotic arm grasping logic, tomatoes are mostly round or oval in shape, which allows agricultural picking robotic arms to achieve stable grasping as long as they cover 60%-70% of the fruit diameter. IoU=0.5 ensures that the predicted bounding box covers the main area of the fruit, and the flexible structure and force-control feedback of agricultural picking robotic arms can compensate for slight positioning deviations. From the perspective of production needs, IoU=0.5 balances the trade-off between reducing missed detections and avoiding false detections: it tolerates positioning deviations in scenarios such as leaf occlusion and fruit overlap, controlling the missed detection rate below 8%, while filtering out invalid predictions that only have edge overlap with the background, keeping the false detection rate within 5%. From the perspective of industry and technical adaptation, IoU=0.5 is a universal threshold for berry crop detection, adopted in studies such as Sun (2024) and Fu et al. (2024). It also aligns with the deployment goal of the lightweight model in this study; choosing a higher threshold would require roughly a 30% increase in computational complexity, which is inconsistent with the requirement for embedded-device deployment. mAP@0.5 focuses on the trend of the model's precision as the recall changes, while mAP@0.5:0.95 comprehensively evaluates the overall performance of the model under various IoU thresholds to accurately reflect the matching degree between the detection boxes and the ground-truth boxes.
5.3 Analysis of experimental results
5.3.1 Ablation experiment
To verify the performance enhancement of the improved modules on the model, the same equipment and the “tomato_dataset” dataset are used for training and testing. The experiment designs 12 groups of ablation experiments, as shown in Tables 4, 5. Specifically, Table 4 presents the experimental data after individually adding the small target detection layer, introducing SPD convolution, the VoV-GSCSP module, the PPA attention mechanism, the Detect_CBAM detection head, or adopting DIoU Loss as the loss function. Table 5 displays the experimental results of the integrated application of the above-mentioned modules. In addition, to further verify the reliability of the results, we repeated the experiment three times on the YOLOv8 model integrated with all improved modules under the same conditions and calculated the average value ± standard deviation of performance metrics such as mAP and precision for each group. This approach helps reduce random errors and improves the stability and credibility of the results. We use the $P$, $R$, mAP, number of parameters, and FLOPS of YOLOv8n as the baseline values for the ablation experiments. In the tables, “√” indicates that the YOLO model incorporates the corresponding module.
Table 4. Ablation experiments on YOLOv8n with Individual innovative modules, using precision, recall, mAP, parameters, and FLOPS as baselines.
Table 5. Ablation experiments on YOLOv8n with multiple innovative modules, using precision, recall, mAP, parameters, and FLOPS as baselines.
Based on the ablation experiment data, the improvements to YOLOv8n yield remarkable results in tomato target detection. First, introducing the SPD module significantly increases mAP@0.5 to 91.5% and mAP@0.5:0.95 to 58.1%, enhancing the model’s mean average precision. Adding the small target detection layer boosts bounding box precision by 2.7%, keeps mAP@0.5 at 91.5% (a 1% gain over the baseline), and reduces parameters to 2.92×10⁶. Adding the PPA attention mechanism improves bounding box precision by 2%, mAP@0.5 by 1.4%, and mAP@0.5:0.95 by 1.1%. When only the VoV-GSCSP module is added, the model achieves an mAP@0.5 of 91.7%, with parameters reduced to 2.80×10⁶ and FLOPS decreased to 7.2G, indicating that the module not only offers lightweight advantages but also effectively optimizes the overall detection precision of the model. Meanwhile, integrating the self-developed Detect_CBAM detection head elevates mAP@0.5 to 92.3%, highlighting its key role in enhancing detection precision. Replacing the original loss function with DIoU Loss increases recall to 85.5% and mAP@0.5 to 92%, demonstrating superior performance in optimizing bounding box regression and overall precision.
When both the small target detection layer and the SPD convolution module are incorporated, the model’s bounding box precision rises to 91.7% (a 4.1% improvement over the baseline), and mAP@0.5:0.95 increases by 2.3%. Adding the PPA attention mechanism on this basis further raises recall to 86.7% (a 1.9% gain), significantly improving detection capability. To reduce the parameter count and computational load with minimal precision loss, the VoV-GSCSP module is then added; this lowers recall, mAP@0.5, mAP@0.5:0.95, and floating-point operations, but raises bounding box precision by 2.5% compared with the non-lightweight version. To compensate for the resulting decline in some precision metrics, integrating the Detect_CBAM detection head improves recall by 2.5% and mAP@0.5 by 1.1%.
Finally, with the addition of DIoU Loss, the model achieves a bounding box precision of 89.6%, a recall of 87.3%, mAP@0.5 of 93.5%, and mAP@0.5:0.95 of 58.5%, increases of 2%, 2.5%, 3.0%, and 1.2% respectively, over the baseline. Overall, despite increased computation under some configurations, key metrics such as bounding box precision, recall, mAP@0.5, and mAP@0.5:0.95 all show varying degrees of improvement, fully validating the effectiveness of the proposed improvements and significantly enhancing the model’s overall detection performance.
5.3.2 Comparative experiment
5.3.2.1 Comparative experiment of loss functions
In this study, six loss functions were compared with the CIoU loss function used in YOLOv8, evaluating model performance via four metrics: precision, recall, mAP@0.5, and mAP@0.5:0.95. The experimental results are presented in Table 6.
Table 6. A comparison of six loss functions with YOLOv8’s CIoU is made, assessing performance via precision, recall, mAP@0.5, and mAP@0.5:0.95.
As shown in the table, DIoU stands out among loss functions like CIoU and EIoU for its unique advantages. It incorporates the distance between predicted and ground-truth box centers, normalizes this distance using the smallest enclosing rectangle’s diagonal length, and maintains scale invariance. This design gives non-overlapping boxes a clear optimization direction, significantly accelerating model convergence. Experimental data show DIoU achieves 85.5% recall and 92.0% mAP@0.5, confirming its effectiveness in improving target localization accuracy and detection completeness, making it particularly suitable for tomato detection scenarios requiring efficient bounding box regression optimization.
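For reference, the DIoU loss described above has the standard form

\[ \mathcal{L}_{\mathrm{DIoU}} = 1 - \mathrm{IoU} + \frac{\rho^2\!\left(\mathbf{b}, \mathbf{b}^{gt}\right)}{c^2} \]

where \(\rho(\cdot)\) is the Euclidean distance between the centers of the predicted box \(\mathbf{b}\) and the ground-truth box \(\mathbf{b}^{gt}\), and \(c\) is the diagonal length of the smallest rectangle enclosing both boxes.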
Loss value curves were plotted based on the training performance of each loss function, as shown in Figure 11. This figure presents curves for seven loss functions (CIoU, DIoU, EIoU, GIoU, SIoU, inner_CIoU, inner_SIoU) across training and validation stages, covering three tasks: bounding box regression (box), classification (cls), and object detection (dfl).
In the training phase’s train/box_loss curve, DIoU exhibits an initial loss value similar to CIoU, with a smooth, orderly decline. While its early-stage decline rate is slightly slower than that of CIoU and EIoU, this stability enables continuous, steady reduction of the gap between predicted and ground-truth bounding boxes during optimization, avoiding training oscillations caused by excessively rapid optimization. In later stages, when other loss functions slow down or fluctuate, DIoU maintains a relatively stable convergence rhythm.
In the train/cls_loss curve, despite an overall tortuous downward trend with multiple fluctuations, DIoU shows smaller fluctuations compared to CIoU and SIoU (which fluctuate significantly). This indicates DIoU can maintain stability in classification loss optimization during training, enabling the model to learn classification features more robustly.
For the train/dfl_loss optimization, DIoU decreases rapidly initially and then slows. The early rapid reduction of object detection-related losses lays a foundation for initial model training, while the downward trend slows in later stages, the convergence effect remains strong, reflecting its sustainability in optimizing object detection losses.
In the validation stage, DIoU’s val/box_loss curve shows a gentle, stable downward trend. Although it lacks the early rapid decline advantage of inner_SIoU, it continuously improves bounding box positioning accuracy on the validation set with a stable optimization rhythm, avoiding performance fluctuations from over-optimization. DIoU retains these stable characteristics in val/cls_loss and val/dfl_loss validation: while not showing absolute dominance at any stage, its consistent performance ensures stability in classification and object detection on the validation set.
In comparison, other loss functions have trade-offs: inner_SIoU, for example, shows obvious advantages in early-stage bounding box loss decline during training and validation, but weakens in later stages. Inner_CIoU maintains good stability in classification loss during training and validation, but lacks outstanding performance in bounding box and object detection loss optimization. In contrast, DIoU’s stability and sustainability across bounding box, classification, and object detection loss optimization-both in training and validation-provide reliable support for model performance improvement at all stages.
5.3.2.2 Lightweight model comparative experiments
In the model architecture, introducing the small object detection layer, SPD convolution module, and PPA attention mechanism adds new network layers, which significantly increases computational complexity. This surge in computation not only slows model inference speed but may also reduce overall detection performance. Therefore, to effectively reduce the model’s demand for computational resources while ensuring the detection accuracy of small objects, the model is made lightweight. The following is a comparative experiment of two modules proposed in the Slim-Neck structure. The experimental results are shown in Table 7 and Figure 12.
Table 7. A comparison of lightweight YOLOv8n variant performance with different module combinations—with precision, recall, mAP, parameters, and FLOPS as evaluation indicators.
Based on the comprehensive analysis of the above experimental results, in terms of detection accuracy, the model with only the VoV-GSCSP module added has a bounding box precision of 91.2%, mAP@0.5 of 90.7% and mAP@0.5:0.95 of 58.6%, showing good overall detection capability. In terms of parameters and computational load, it has 5.78×10⁶ parameters and 19.4G FLOPS, balancing accuracy and low computational resource needs. Though its 81.1% recall rate is the lowest among the four models, it strikes a better balance between precision and computational consumption, meeting practical detection accuracy requirements without excessive resource usage, making it the optimal choice.
5.3.2.3 Comparative experiments with mainstream models
To comprehensively evaluate the performance of the improved model, this paper conducts comparative experiments with two categories of mainstream models. One category includes current mainstream YOLO series object detection models, such as YOLOv5n, YOLOv8n, YOLOv6n, YOLOv8s, YOLOv8m, YOLOv8l, YOLOv9t, YOLOv10n, and YOLOv11n. The other category covers lightweight object detection models, including SSD-MobileNetv2 (Sandler et al., 2018), SSD-MobileNetv3 (Howard et al., 2019), EfficientDet-D0 (Tan et al., 2020), and Nanodet (RangiLyu, 2021). Additionally, to further verify the advantages of the improved model in small object detection and occlusion scenarios, comparative experiments with two-stage algorithms, namely Faster R-CNN and Cascade R-CNN (Cai and Vasconcelos, 2021), are also carried out. Specific values are shown in Table 8.
Table 8. A comparison of key metrics for various object detection models in tomato detection tasks, using precision, recall, mAP, parameters, and FLOPS as evaluation indicators.
In terms of bounding box precision, the improved model reaches 89.6%, higher than most mainstream models, such as YOLOv8n (87.6%) and YOLOv8s (89.1%); it is exceeded only by SSD-MobileNetv2, SSD-MobileNetv3, and, slightly, by YOLOv6n (90.5%) and YOLOv8m (90.9%). Faster R-CNN achieves a bounding box precision of 38.9% and Cascade R-CNN 68.6%, both lower than that of the improved model, reflecting the effectiveness of the proposed improvements. This shows that in tomato detection, the model can accurately identify targets with relatively few false detections.
The recall rate reflects the model’s ability to detect all targets. The recall rate of the improved model reaches 87.3%. It significantly outperforms models like SSD-MobileNetv2 (45.0%), SSD-MobileNetv3 (47.4%), EfficientDet-D0 (66.7%), as well as YOLOv5n and YOLOv6n. Faster R-CNN has a recall rate of 72.8%, and Cascade R-CNN has a recall rate of 59.7%, both of which are lower than that of the improved model. This means the model can more comprehensively capture target tomatoes in images and effectively reduce missed detections.
For the mean average precision at an intersection over union (IoU) of 0.5 (mAP@0.5), the improved model achieves 93.5%. It ranks first among all compared models. It not only significantly surpasses models like YOLOv8n but also outperforms models such as Faster R-CNN (65.7%), Cascade R-CNN (89.3%), SSD-MobileNetv2 (73.3%), SSD-MobileNetv3 (80.1%), EfficientDet-D0 (77.9%), and Nanodet (82.4%). This indicates more accurate target localization and recognition under the conventional IoU standard.
In terms of mAP@0.5:0.95, the improved model scores 58.6%. This score is higher than that of most compared models, including Faster R-CNN (26.3%) and Cascade R-CNN (53%), SSD-MobileNetv2 (39.3%), SSD-MobileNetv3 (46.4%), EfficientDet-D0 (43.1%), and Nanodet (46.5%). It demonstrates strong generalization ability and detection stability under different IoU standards. However, this metric of the improved model is slightly lower than that of YOLOv8s (61.6%), and the core reason lies in the differences in architecture and design objectives between the two models. YOLOv8s is supported by a parameter scale of 11.13×10⁶ (approximately twice that of the improved model’s 5.78×10⁶) and a computational complexity of 28.4G FLOPs (approximately 1.5 times that of the improved model’s 19.4G FLOPs). It captures fine-grained features of tomatoes through deeper C2f modules and wider feature channels, and without the constraint of a lightweight design, it can fully retain edge features to optimize the localization accuracy of medium-sized tomatoes, which account for 65% of the test set. In contrast, the improved model takes adaptation to embedded picking robots as its core goal. Its lightweight design, with parameters only 52% of those of YOLOv8s, limits feature expression. Although the newly added P2 layer optimizes the detection of extremely small targets, it introduces background noise during feature fusion. Additionally, the Slim-Neck structure also leads to the loss of some fine-grained features, and these two factors together affect the bounding box accuracy under high IoU thresholds. This performance gap represents a reasonable trade-off between “lightweight deployment” and “general high-precision performance”. The improved model is more in line with the low-cost deployment requirements of agricultural automation. It also outperforms most lightweight models in key tomato detection scenarios such as complex lighting and multi-fruit overlap. Moreover, the SPD convolution and PPA attention mechanism further enhance the ability to distinguish features between fruits and the background, reducing false detections and missed detections, which verifies the practical value of the proposed improvement scheme.
Regarding parameters and computational load, the improved model has 5.78×10⁶ parameters and a computational load of 19.4G FLOPS. Faster R-CNN has 137.10×10⁶ parameters and a computational load of 402.2G FLOPs. Cascade R-CNN has 65.4×10⁶ parameters and a computational load of 196G FLOPs. Although the improved model’s number of parameters is higher than that of lightweight models like SSD-MobileNetv2, SSD-MobileNetv3, Nanodet, YOLOv5n, and YOLOv8n, it is still much lower than that of Faster R-CNN and Cascade R-CNN. Compared with YOLOv8s (11.13×10⁶) and YOLOv8m (25.84×10⁶), it is also within a reasonable range. Its FLOPS are lower than those of models like YOLOv8s (28.4G) and YOLOv8m (78.7G), and far lower than those of Faster R-CNN and Cascade R-CNN. This shows that compared with models that have higher parameters or computational load but similar performance, the improved model has more advantages in terms of computational resource efficiency.
In summary, the improved model performs excellently in detection accuracy. Its precision, recall rate, mAP@0.5, and mAP@0.5:0.95 all rank among the top of mainstream models. At the same time, compared with models like SSD-MobileNetv2, SSD-MobileNetv3, EfficientDet-D0, and Nanodet, it also has a significant improvement in comprehensive performance. Moreover, compared with two-stage algorithms like Faster R-CNN and Cascade R-CNN, it achieves better detection performance with much fewer parameters and lower computational load. Meanwhile, it balances the consumption of computational resources well, showing remarkable comprehensive advantages. This makes it more suitable for practical tomato detection tasks. To visually observe the performance of the improved model, Figure 13 plots the mAP@0.5 curves of all models. Although its average precision is slightly lower than that of other models in the first 200 epochs, it surpasses all compared models between 200 and 300 epochs.
5.3.3 Comparative experiment on model performance in complex weather
The generalization ability in open-field environments is a key factor restricting the practical application of tomato detection models. Unlike controlled greenhouse environments, open fields are often affected by uncontrollable natural weather conditions such as light, sudden rain, fog, and snow. These conditions easily distort image features and thus lead to the degradation of model detection accuracy. This bottleneck directly limits the model’s application in open-field precision agriculture. Breaking through this limitation will significantly expand the model’s application scope in precision agriculture and provide technical support for the in-depth integration of visual detection technology into the open-field agricultural production process.
To verify the robustness of the improved YOLOv8n model under complex weather conditions in open fields and its cross-domain adaptability from greenhouses to open fields, this study simulated five typical complex weather scenarios (Intense illumination, Overcast day, Rainy day, Snowy day, Foggy day) on the ‘Tomato’ dataset through data augmentation. Each weather scenario contains 895 images, and all are designed based on the actual interference characteristics of open-field environments:
5.3.3.1 Simulation and parameter design of the intense illumination scenario
In the simulation of the Intense Illumination scenario, this study constructed a lighting environment covering typical high-light conditions of open fields in summer by setting a 3-level fixed brightness enhancement gradient and combining it with the adjustment of contrast and color parameters. Each scenario’s parameters are accurately matched to the actual agricultural environment, and the details are as follows:
Increase the image brightness by 40% while controlling the contrast within the range of 10%–20% to simulate a condition where the natural light intensity is 1.4 times that of the “standard agricultural environment”. The corresponding actual light intensity is 80,000–120,000 lux. Adjust the color parameters simultaneously: hue shift ±5, saturation increased by 5%–15%, and lightness increased by 10%–20% to reproduce the characteristics of fruit surface reflection and color perception deviation under intense light.
Increase the brightness by 50% and set the contrast to 15%–25% to simulate a condition where the light intensity is 1.5 times that of the standard environment. The actual light intensity ranges from 90,000 to 130,000 lux, with saturation increased by 8%–18% and lightness increased by 12%–22% to further approximate the real visual interference under moderately intense light.
Increase the brightness by 60% and adjust the contrast to 20%–30% to simulate a condition where the light intensity is 1.6 times that of the standard environment. The actual light intensity is 100,000–140,000 lux. Set the color parameters to hue shift ±7, saturation increased by 10%–20%, and lightness increased by 15%–25% to accurately reproduce the visual environment under high-intensity light.
The above three-level scenarios cover the common fluctuation range of intense light intensity in open fields.
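One way such brightness, contrast, and HSV adjustments can be implemented is sketched below with OpenCV; the function name, the mapping of the percentages above onto the parameters, and the default values are illustrative assumptions, not the exact augmentation pipeline used in this study.

```python
import cv2
import numpy as np

def simulate_intense_light(img_bgr, brightness_gain=1.4, contrast_gain=1.15,
                           hue_shift=5, sat_gain=1.10, val_gain=1.15):
    """Approximate one level of the intense-illumination scenario."""
    img = img_bgr.astype(np.float32)
    img *= brightness_gain                        # e.g., +40% brightness
    mean = img.mean()
    img = (img - mean) * contrast_gain + mean     # 10%-20% contrast boost around the mean
    img = np.clip(img, 0, 255).astype(np.uint8)

    # Hue/saturation/lightness adjustment in HSV space (OpenCV hue range is 0-179).
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] + hue_shift) % 180
    hsv[..., 1] = np.clip(hsv[..., 1] * sat_gain, 0, 255)
    hsv[..., 2] = np.clip(hsv[..., 2] * val_gain, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```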
5.3.3.2 Simulation and parameter design of the overcast day scenario
In the simulation of the Overcast Day scenario, this study constructed a low-light environment that fits the typical overcast working conditions of open fields by fixing the brightness reduction parameters and combining them with the adjustment of contrast and color parameters. The scenario parameters are accurately matched to the actual agricultural environment, and the details are as follows:
Reduce the image brightness by a fixed 30% while controlling the contrast within the range of 10%–20% to simulate an overcast condition where the natural light intensity is 70% of the “standard agricultural environment”. The corresponding actual light intensity is 35,000–49,000 lux. Adjust the color parameters simultaneously: fine-tune the lightness by -15%–20% and fine-tune the saturation by -10%–10% to reproduce the characteristics of reduced color saturation and dim overall lightness in the overcast environment, and approximate the visual interference of real overcast scenarios.
5.3.3.3 Simulation and parameter design of the rainy-day scenario
In the simulation of the Rainy Day scenario, this study constructed a simulated environment that fits the actual agricultural production conditions by distinguishing two typical rain intensities (moderate rain and heavy rain) and accurately matching the meteorological characteristics and visual interference parameters under each rain intensity. All scenario parameters are highly consistent with the actual characteristics of rainy days in this region. The specific settings are as follows:
In the simulation of the moderate rain scenario, the raindrop shape parameters are set to a diameter of 0.8–1.2 mm and a length of 2.7–3.3 mm before landing. In terms of lighting conditions, a brightness coefficient of 0.85 is used to simulate the cloud shading effect, maintaining the light intensity at 30,000–50,000 lux (approximately 50%–70% of that on sunny days). This conforms to the actual visual perception of “soft light and distinguishable crop details” in the moderate rain environment. A blur value of 3 px is used to simulate the slight light scattering effect under this humidity, further improving the authenticity of the scenario. In addition, based on field observation data, it is set that rain lines only cause a slight occlusion of 5%–15% on the edge of leaves and local parts of fruits, avoiding excessive interference with the key recognition features of crops.
In the simulation of the heavy rain scenario, the raindrop shape parameters are adjusted to a diameter of 1.5–2.5 mm and a length of 5.25–6 mm before landing to accurately reproduce the visual characteristics of heavy rainfall. In terms of lighting, a brightness coefficient of 0.75 is used to simulate the double shading effect of clouds and rain curtains, maintaining the light intensity at 15,000–30,000 lux (only 30%–60% of that on sunny days). This conforms to the actual characteristic of “visible crop outline but blurred details” in the heavy rain environment. A blur value of 5 px is used to reproduce the strong scattering effect, simulating the interference of rainwater on light propagation. In terms of occlusion effect, it is set that rain lines can cover 1/3–1/2 of the leaf width, forming the effect of “local feature occlusion but overall outline retention”. This not only reflects the interference of heavy rain on detection but also conforms to the visual presentation of crops in the actual planting scenario.
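The rain parameters above can be approximated with a simple synthetic overlay; the sketch below (darkening by the brightness coefficient, Gaussian blur for light scattering, and randomly drawn rain streaks) is an illustrative assumption of the procedure rather than the exact augmentation code used in this study.

```python
import cv2
import numpy as np

def simulate_rain(img_bgr, brightness_coef=0.85, blur_px=3,
                  n_streaks=300, streak_len=15, rng=None):
    """Approximate a rainy-day image: cloud darkening, scattering blur, and rain streaks."""
    rng = rng or np.random.default_rng(0)
    h, w = img_bgr.shape[:2]

    # Cloud shading and light scattering.
    img = np.clip(img_bgr.astype(np.float32) * brightness_coef, 0, 255).astype(np.uint8)
    img = cv2.GaussianBlur(img, (blur_px | 1, blur_px | 1), 0)   # kernel size must be odd

    # Draw slanted, semi-transparent rain streaks.
    layer = np.zeros_like(img)
    for _ in range(n_streaks):
        x, y = int(rng.integers(0, w)), int(rng.integers(0, h))
        cv2.line(layer, (x, y), (x + 3, y + streak_len), (200, 200, 200), 1)
    return cv2.addWeighted(img, 1.0, layer, 0.4, 0)
```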
5.3.3.4 Simulation and parameter design of the snowy day scenario
In the simulation of the Snowy Day scenario, this study constructed a simulated environment that fits the actual agricultural production conditions by distinguishing three typical meteorological snowfall levels (light snow, moderate snow, and heavy snow) and accurately matching the meteorological characteristics and visual interference parameters under each snowfall level. All scenario parameters are highly consistent with the actual characteristics of winter snowy days in this region. The specific settings are as follows:
In the simulation of the light snow scenario, the horizontal visibility is set to be ≥ 1,000 meters. In terms of visual adjustment, the image brightness is slightly increased (slightly higher than that on sunny days but without dazzling reflection), and the color saturation is slightly reduced to present a soft tone. The image only has an extremely slight cool tone tendency (no excessive color cast). Within 5 meters nearby, the leaf texture of crops and the surface details of fruits are clearly distinguishable. Snowflakes adhere to the plant surface in sparse dots without forming a covering layer. Scenery beyond 50 meters only shows a slight “granular feeling” due to a small number of snowflakes in the air, which effectively avoids blocking the key recognition features of crops and conforms to the actual scenario of “normal operation of basic field work” under this snowfall level.
In the simulation of the moderate snow scenario, the horizontal visibility is set to be approximately 500–1,000 meters. In terms of visual adjustment, the image brightness is controlled within a reasonable range of “slightly lower than that on sunny days but higher than that in heavy snow”, the color saturation is significantly reduced, and the image shows a cool tone shift of -2 to -5 (in line with the visual characteristics of a low-temperature environment). Within 3 meters nearby, the main outline of the crops is clear with only thin snow covering the edges. The outline of plants 10–20 meters away becomes hazier, and the scenery beyond 30 meters only shows a rough outline. This not only objectively reflects the degree of interference of moderate snow on detection but also effectively retains the key recognition features of crops, conforming to the actual scenario of “close-range operation required for fine agricultural work” under this snowfall level.
In the simulation of the heavy snow scenario, the horizontal visibility is set to be ≤ 100 meters. In terms of visual adjustment, the image brightness is controlled within the range of “slightly lower than that on sunny days but not overly dark due to snow reflection”. The color saturation is significantly reduced to present a pale tone, and the image shows an obvious cool tone tendency (in line with the visual characteristics of a freezing low-temperature environment). Within 1–2 meters nearby, the general shape of crops is distinguishable, but their surfaces are covered with thick snow. The outline of plants 3–5 meters away is hazy, and the row spacing needs careful identification to distinguish. Scenery beyond 10 meters only shows blurred color blocks or rough outlines. This not only objectively reflects the strong interference of heavy snow on detection but also retains the core recognition features of crops, conforming to the actual scenario of “almost impossible to carry out fine agricultural work” under this snowfall level.
5.3.3.5 Simulation and parameter design of the foggy day scenario
In the simulation of the Foggy Day scenario, this study constructed a simulated environment that fits the actual agricultural production conditions by distinguishing two typical fog concentration levels (light fog and dense fog) and accurately matching the meteorological characteristics and visual interference parameters under each fog condition. All scenario parameters are highly consistent with the actual characteristics of foggy days in this region. The specific settings are as follows:
In the simulation of the light fog scenario, the scenario visibility is set to 500–1,000 meters. In terms of visual adjustment, the image brightness is moderately reduced (slightly lower than that on sunny days but without obvious dimness), and the color saturation is slightly weakened to present a soft tone while fully retaining the basic color of crops. Within 10 meters nearby, the leaf texture of crops and the surface details of fruits are clearly distinguishable. Scenery beyond 50 meters only has slightly blurred edges, which avoids excessive interference with the key recognition features of crops and conforms to the actual scenario of “normal operation of basic field work” under these fog conditions.
In the simulation of the “dense fog” scenario, the scenario visibility is set to 200–500 meters. In terms of visual adjustment, the image brightness is significantly reduced (showing a soft dimness but without affecting the recognition of the main body of crops), and the color saturation is significantly weakened to present an elegant tone without changing the basic color of crops. Within 5 meters nearby, the main outline of the crops is clear. Scenery beyond 30 meters only shows a rough silhouette. The outline of plants 10–20 meters away is hazy, and their details are obviously weakened. This not only objectively reflects the degree of interference of dense fog on detection but also effectively retains the key recognition features of nearby crops, conforming to the actual scenario of “close-range operation required for fine agricultural work” under this fog condition.
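One common way to approximate such fog levels is the atmospheric scattering model I = J·t + A·(1 − t), where a lower transmission t corresponds to denser fog; the sketch below applies this model under assumed parameter values and is not necessarily the exact procedure used here.

```python
import numpy as np

def simulate_fog(img_bgr, transmission=0.6, airlight=235):
    """Blend the scene radiance J with a bright airlight A according to transmission t.
    transmission ~0.7-0.8 roughly corresponds to light fog; ~0.4-0.5 to dense fog."""
    j = img_bgr.astype(np.float32)
    fogged = j * transmission + airlight * (1.0 - transmission)
    return np.clip(fogged, 0, 255).astype(np.uint8)
```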
Subsequently, we systematically evaluated the core indicators (Precision, Recall, mAP@0.5, mAP@0.5:0.95) of the improved YOLOv8n under each weather scenario. We also conducted a horizontal comparison with the Original Greenhouse Scenario (where the mAP@0.5 of the YOLOv8n model is 90.3%) to quantify the magnitude of performance degradation, and the comparison results are summarized in Table 9.
Table 9. Comparison of key metrics for the improved YOLOv8n model across different extreme weather scenarios in tomato detection tasks, with precision, recall, mAP@0.5, and mAP@0.5:0.95 as evaluation indicators.
With mAP@0.5 as the core, we quantified the performance degradation of each scenario relative to the greenhouse scenario. Meanwhile, combined with the changes in Precision (accuracy) and Recall (recall rate), we conducted an in-depth analysis of the model’s detection characteristics under different complex environments:
For the intense illumination scenario: from Mild Intense Light to High-Intensity Light, the Precision of the improved model is 87.1%, 87.3%, and 86.5% in turn, maintaining a relatively high level overall. This indicates that the model’s accuracy in positive-sample determination is little affected by intense illumination. The Recall values are 79.6%, 81.5%, and 79.8%, slightly higher under Moderate Intense Light and slightly lower under Mild Intense Light and High-Intensity Light, reflecting relatively better coverage of tomato targets under Moderate Intense Light. In terms of mAP@0.5, it is 89.0% under Mild Intense Light, a degradation of 1.3% compared to the greenhouse scenario; 88.0% under Moderate Intense Light, a degradation of 2.3%; and 87.0% under High-Intensity Light, a degradation of 3.3%. The overall pattern is that the stronger the light, the larger the degradation, but the degradation amplitude remains below 5% at all three levels, indicating that the model can still maintain good detection performance stability under the typical high-light working conditions of open fields in summer.
For the overcast scenario: under Overcast Day, the Precision of the improved model is 86.6%, the Recall drops to 70.2%, and the mAP@0.5 is 80.7%, a degradation of 9.6%. The combined effect of low illumination and reduced color saturation increases the probability of missed detections of tomato targets, making this one of the scenarios with the largest degradation.
For the rainy scenario: under Moderate Rain, the Precision reaches 92.9%, the Recall is 81.3%, and the mAP@0.5 reaches 91.2%, an increase of 0.9% over the greenhouse scenario. This is because the occlusion of tomatoes by rainwater is relatively small under Moderate Rain and the light conditions are closer to those seen during model training, so the model’s accuracy and coverage in identifying positive samples both improve, demonstrating excellent adaptability in this scenario. Under Heavy Rain, the Precision is 90.4%, the Recall is 80.5%, and the mAP@0.5 is 90.6%, a slight improvement of 0.3% (i.e., a degradation of −0.3%). Although rain-curtain occlusion and light scattering are more severe under Heavy Rain, the model can still balance Precision and Recall well through adaptive learning of parameters such as raindrop morphology and light scattering, reflecting good adaptability to the “rain intensity-occlusion-performance” correlation.
For the snowy scenario: under Light Snow, the Precision of the improved model is 85.7%, the Recall is 82.2%, and the mAP@0.5 is 88.4%, a degradation of 1.9%. At this level, snowflakes are sparse and cause little occlusion of the key recognition features of tomatoes, so the model can accurately capture the targets. Under Moderate Snow, the Precision is 86.2%, the Recall is 84.2%, and the mAP@0.5 is 89.0%, a degradation of 1.3%. As the snowfall increases, the edges of tomatoes begin to be covered with thin snow, but the model can still identify the targets through the main contour and color features, leading to a slight increase in Recall. Under Heavy Snow, the Precision is 85.0%, the Recall is 83.1%, and the mAP@0.5 is 88.4%, a degradation of 1.9%. Although snow coverage increases and visibility decreases, the model still recognizes the overall shape and basic color of tomatoes. The degradation amplitude grows with increasing snow coverage and decreasing visibility, but remains below 3% in all cases, indicating strong robustness against snow interference.
For the foggy scenario: Under Light Fog, the Precision of the improved model is 85.4%, the Recall is 72.2%, and the mAP@0.5 is 81.8%, with a degradation of 8.5%. The slight scattering of fog blurs the details of distant tomatoes, resulting in a decrease in Recall, but the nearby targets can still be accurately identified. Under Dense Fog, the Precision is 82.2%, the Recall is 70.0%, and the mAP@0.5 is 78.0%, with a degradation of 12.3% (calculated by subtracting 78.0 from 90.3). Due to the “blurring-transmission” characteristic of fog, the propagation of light is seriously hindered, and a large number of key features of tomato targets (such as texture and color) are lost. As a result, both the Precision and Recall of the model decrease significantly. The degradation amplitude shows a strong negative correlation with visibility, and the degradation exceeds 10% under Dense Fog, making it the most challenging scenario for the model.
Further analysis of the model’s applicable environmental range shows that the improved YOLOv8n remains reliable when the environment meets the following conditions: Mild to High-Intensity Light (actual light intensity 80,000–140,000 lux), where the light is strong but does not excessively distort tomato features; Moderate to Heavy Rain (corresponding light intensity 15,000–50,000 lux), where rainwater occlusion and light scattering remain within the model’s adaptive range; Light to Heavy Snow (horizontal visibility ≥ 100 meters), where snow does not severely occlude key tomato features; and Light Fog (visibility 500–1,000 meters), where the loss of target detail caused by fog scattering is limited. Under these conditions, the mAP@0.5 degradation of the improved YOLOv8n model is below 5%, and Precision and Recall remain at a high level, enabling stable and accurate detection of tomatoes. Only under extreme conditions such as Overcast Day (light intensity 35,000–49,000 lux), where low illumination and dim color significantly reduce the recognizability of tomato features, and Dense Fog (visibility 200–500 meters), where strong light scattering and target blurring make feature extraction difficult, is the performance degradation relatively obvious. In such cases, the model should be applied carefully according to scenario requirements or paired with auxiliary sensing equipment during actual deployment to enhance detection capability.
6 Discussion
6.1 Analysis of detection performance in polymorphic scenarios
Images from the test set that depict polymorphic and complex greenhouse environments were selected. They were used to compare the detection performance of different models across various scenarios. These scenarios include fruit overlapping, multiple fruits, distant small targets, branch occlusion, leaf occlusion, and extreme occlusion. Figure 14 presents a comparison between the improved YOLOv8n model and other YOLO-series models. The other models are YOLOv5n, YOLOv6n, YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, YOLOv9t, YOLOv10n and YOLOv11n.
YOLOv5n performed poorly in several scenarios. It could not accurately outline the contour of a single fruit in the case of fruit overlapping. It had target omissions in multi-fruit environments. Its ability to detect distant small targets was weak, leading to frequent omissions. It also missed some tomatoes occluded by branches or leaves and occasionally produced false detections. In extreme occlusion scenarios, the omission problem of YOLOv5n was extremely prominent. A large number of tomatoes deeply occluded by leaves were not detected. It could only identify tomatoes with little occlusion.
YOLOv6n had low accuracy in fruit overlapping scenarios, multi-fruit scenarios, and extreme occlusion scenarios. Its detection boxes were blurred. It had insufficient precision for distant small targets, which resulted in frequent omissions. It faced difficulties in detecting occluded tomatoes and was accompanied by a large number of false detection annotations.
YOLOv8n had a limited ability to distinguish overlapping regions. It had a large number of omissions in multi-fruit detection and extreme occlusion scenarios. It frequently missed distant small targets. Due to insufficient feature extraction, false detection was prominent when there was occlusion.
Although YOLOv8s, YOLOv8m, and YOLOv8l showed improved capabilities as the model size increased, they still struggled to distinguish fruit boundaries in fruit-overlapping scenarios. The completeness of their multi-fruit detection needed optimization. They had high omission rates for distant small targets. In addition, there were gaps in their precision and recall when there was occlusion, and the false detection situation varied among different models. For extreme occlusion, they performed better than smaller models, but their detection accuracy for targets with very little exposure was still not high, and there were still certain omission situations.
YOLOv9t, YOLOv10n, and YOLOv11n showed improvements in various scenarios. However, the accuracy and completeness of their fruit overlapping and multi-fruit detection still need to be enhanced. They could not solve the omission problem of distant small targets. In extreme occlusion situations, YOLOv9t and YOLOv10n still had omissions for highly occluded tomatoes. YOLOv11n could detect some occluded tomatoes that smaller models missed, and its robustness was improved to a certain extent.
Compared with the baseline model, the improved YOLOv8n significantly reduces distant small-target omission rate via SPD convolution and small-target detection layer synergy, while maintaining high small-tomato detection precision. The PPA attention and Detect_CBAM enhance anti-occlusion ability—better detecting branch/leaf-occluded tomatoes and having lower false detection rates in leaf occlusion. Visual results show more correct detection boxes and fewer omissions/false detections in most complex scenarios, highlighting good completeness and accuracy.
However, in extreme occlusion, detection difficulty rises. The model can locate tomatoes with larger exposed areas, but fails to detect an extremely small upper-right tomato and has positioning deviation for a severely occluded lower tomato.
Mechanistically, this limitation stems from the PPA local branch patch size being larger than “extremely small exposed area” pixels and SPD’s limited ultra-small target detail retention. Practically, the model identifies most core targets but misses those with extremely small exposed areas—this shows current performance boundaries and points to subsequent optimization: targeted module parameter adjustment or auxiliary technologies to expand ultra-extreme occlusion detection ability.
6.2 Verification of model robustness and cross-domain adaptability in complex weather scenarios
To intuitively present the detection performance of the improved YOLOv8n under complex weather conditions, this study simulated typical complex climate scenarios common in open-field environments (including intense illumination, overcast days, rain, snow, and fog) using data augmentation technology, and selected mild intense light, moderate rain, moderate snow, and light fog as the core comparison scenarios. On this basis, for these uncontrolled complex scenarios, visual analysis and comparison were conducted on the detection results of the improved YOLOv8n and other mainstream YOLO series models (YOLOv5n, YOLOv6n, YOLOv8n/s/m/l, YOLOv9t, YOLOv10n, YOLOv11n). The specific comparison results are shown in Figure 15.
In intense illumination, YOLOv5n had severe specular reflection covering fruit edges with highlights and causing missed detections. YOLOv6n failed to suppress light spots well, mistaking them for tomatoes and leading to heavy false detections. YOLOv8 series were affected by high-brightness regions, making the target-background contrast drop sharply and causing severe tomato missed detections. YOLOv9t, YOLOv10n, and YOLOv11n slightly improved high-brightness suppression but still had missed and false detections. The improved model used SPD convolution’s multi-scale filtering to better remove specular reflection interference with no missed detections.
In overcast conditions, low brightness and compressed contrast weakened fruit-leaf grayscale differences. YOLOv5n’s limited feature extraction reduced discrimination between similar-colored fruits and leaves, leading to blurred detection boxes and higher missed rates. YOLOv6n had no false detections but suffered from missed detections. The YOLOv8 series, limited by poor target feature discriminability, mainly had missed detections without new false ones. Models like YOLOv9t had a weaker ability to detect tomatoes against grayscale backgrounds, with much higher missed rates than in other scenarios. The improved model relied on SPD convolution’s low-light detail capture combined with PPA attention’s multi-scale feature fusion to reduce missed detections caused by low contrast.
In rainy scenarios, rain streak and water droplet texture noise heavily interfered with detection. YOLOv5n could not filter rain streaks well, leading to frequent missed detections when fruits were covered by raindrops. YOLOv6n had much higher false detection rates in rain curtains. The YOLOv8 series had obvious detection box drift due to rain streak noise. YOLOv9t and other models were sensitive to dense rain streaks, causing both missed and false detections. The improved model used PPA attention to suppress rain streak noise and Detect_CBAM’s spatial attention for localization to greatly reduce rain streak-induced detection deviations.
In snowy conditions, snowflakes on white backgrounds created strong interference. YOLOv6n had false detections due to extremely low target contrast under white reflections. YOLOv8 series failed to extract low-contrast features well in snow, leading to tomato missed detections or incomplete boxes. YOLOv9t, YOLOv10n, YOLOv11n, and other models had much lower recall rates under white background interference. The improved model used SPD convolution’s edge enhancement to strengthen tomato contour features in snow and combined channel attention to focus on color features, improving target discriminability.
In foggy scenarios, fog made images hazy and edges weak. YOLOv5n had high target missed rates due to low visibility. YOLOv6n had no false detections but obvious missed detections. YOLOv8 series often mistook fog clusters for fruits, causing frequent false detections. YOLOv9t, YOLOv10n, YOLOv11n, and other models had poor adaptability to foggy low contrast, leading to lower overall detection accuracy. The improved model used Detect_CBAM’s channel-spatial dual attention to focus accurately on fruit regions, avoiding fog cluster misjudgment.
Visualization showed that the improved YOLOv8n had far more yellow correct detection boxes than other models in all five weather scenarios. It also had many fewer red missed detection boxes and blue false detection boxes. This fully confirms its robustness and cross-domain adaptability in complex weather, laying a solid foundation for its practical use in uncontrolled agricultural environments like open fields.
6.3 Analysis of feature attention mechanisms and generalization capability
In the process of agricultural automation and intelligence, the detection of tomatoes throughout their growth cycle in greenhouse environments is a core component for achieving precise production management. While optimizing the detection performance for mature tomatoes, this study expands the detection capability of the improved YOLOv8n model to include immature fruits through architectural modifications, a feature of critical significance for yield prediction, nutritional regulation, and harvesting schedule planning. To analyze the model’s decision-making logic in polymorphic scenarios, Grad-CAM (Selvaraju et al., 2016; Chattopadhyay et al., 2017) technology was employed to generate target detection heatmaps, visualizing the model’s attention distribution across image regions and revealing its feature capture mechanisms and generalization capabilities.
Grad-CAM calculates gradients of target-category feature maps from the final convolutional layer, processes them via global average pooling and ReLU activation, and maps the model’s decision basis to pixel-level heat response maps. This method does not require modifications to the network structure or retraining, applies to various CNN architectures, and can intuitively locate key regions of model attention—the deeper red areas contribute more to detection, followed by yellow areas, while blue areas are regarded as background redundant information. This visualization tool provides an interpretability basis for model debugging and performance optimization.
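A minimal Grad-CAM sketch following the procedure described above (hooks on a final convolutional layer, gradient pooling, ReLU, and normalization) is given below; it is written for a generic torchvision classifier purely as an illustration, not for the detection model used in this study.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()
target_layer = model.layer4[-1]          # last convolutional block
acts, grads = {}, {}

target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

def grad_cam(x, class_idx):
    """Return a heatmap in [0, 1] highlighting regions supporting class_idx."""
    score = model(x)[0, class_idx]
    model.zero_grad()
    score.backward()
    weights = grads["v"].mean(dim=(2, 3), keepdim=True)        # GAP over gradients
    cam = F.relu((weights * acts["v"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to [0, 1]
    return cam[0, 0]

heatmap = grad_cam(torch.randn(1, 3, 224, 224), class_idx=0)
```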
As shown in Figure 16, traditional YOLO models show clear flaws in their heatmaps. YOLOv5n has discrete heat responses with blurred target-background boundaries and struggles to suppress interference. YOLOv6n covers fruit bodies but concentrates heat on local high-contrast areas, relying on single cues and extracting incomplete features for irregular fruits. YOLOv8n shows weak heat responses and offset hotspots for small targets, causing poor feature capture and positioning. The larger YOLOv8s/m/l struggle to balance feature efficiency with fine-detail retention. Even the newer YOLOv9t/10n/11n fail to solve the core greenhouse problems: over-reliance on single features, unstable positioning, and poor multi-scale adaptability. These flaws highlight the key optimization directions addressed by our improved model.
This study reveals the improved model’s anti-interference mechanisms and multi-stage detection potential through heatmap visualization, providing a typical case for interpretability analysis of deep learning models in agricultural scenarios. Its performance in full-growth-cycle detection lays a technical foundation for the intelligent management of greenhouse tomatoes.
6.4 Analysis of picking application value
The improved model in this study demonstrates three core values in practical tomato picking scenarios, particularly providing key technical support for automated picking, and is significantly superior to existing detection methods:
1. Traditional models generally have low recall rates for small targets and occluded fruits in polymorphic environments. Through the collaborative optimization of the added small-object detection layer and SPD convolution module, this study has increased the recall rate to 87.3%, with the most significant improvement in the detection rate of small fruits. Notably, the PPA attention mechanism plays a key role here: its local branch can capture the residual local features of tomatoes, such as calyx texture and fruit edge gradients, under extreme occlusion, while the global branch screens the spatial correlation between dense tomato clusters and the background—this dual-branch design ensures that even tomatoes with only a tiny portion of their fruit surface exposed are not missed. When applied to robotic arm picking in the future, it can reduce the missed picking of mature fruits and enhance picking completeness.
2. In automated picking, false detections translate directly into invalid robotic-arm actions and potential fruit damage. The self-developed Detect_CBAM detection head in this study has significantly reduced the false detection rate through channel and spatial attention mechanisms, especially in leaf-dense areas, where the ability to distinguish “fruit-leaf” features has been significantly improved. Specifically, the channel attention of Detect_CBAM assigns tomato color channels a much higher weight than leaf channels, effectively suppressing leaf feature interference; the spatial attention further locates the tomato position by analyzing the pixel contrast of local areas, avoiding the misjudgment of leaf shadows as tomatoes (a generic sketch of this channel-spatial attention structure is given after this list). When combined with robotic arms in the future, it can reduce invalid actions and fruit damage, and improve picking quality.
3. Existing high-precision models (e.g., YOLOv8m) feature large parameter scales and high computational costs, making them difficult to adapt to the hardware resources of low-cost picking robots. This study conducts a lightweight design on the feature extraction layer of the model. On the premise of ensuring detection accuracy, it reduces hardware requirements and has good adaptability to scenarios such as complex lighting conditions and dense fruits in greenhouses, thus providing technical feasibility for the large-scale application of robotic arm picking equipment in the future.
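As referenced in item 2 above, the sketch below shows a generic CBAM block (channel attention followed by spatial attention) in PyTorch; it illustrates the attention mechanism on which Detect_CBAM is based, not the exact structure of the self-developed detection head.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Generic CBAM block: channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Channel attention: shared MLP over global-average and global-max pooled descriptors.
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: convolution over channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feat = torch.randn(1, 256, 20, 20)      # e.g., a detection-head feature map
refined = CBAM(256)(feat)               # same shape, attention-weighted
```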
6.5 Future research directions
This section discussed the improved YOLOv8n’s performance in tomato detection under complex greenhouse environments. Through comparative analysis and visualization, we verified the positive effects of the improvement strategies: the model showed higher accuracy, completeness, and robustness in polymorphic scenarios, and heatmap analysis revealed its precise attention to target regions, confirming generalization in polymorphic agricultural scenarios. Despite significant performance gains, the model has boundaries: in extreme occlusion, it still misses tomatoes with only pixel-level exposed surfaces or deep occlusion. This objectively presents current limitations and clarifies subsequent optimization directions. Several potential improvement directions still need further exploration:
6.5.1 Deepening full-growth-cycle detection
This study verified the model’s potential to detect immature fruits via Grad-CAM, but color confusion between immature fruits and leaves persists. Future research can expand datasets and explore multi-task learning frameworks—integrate time-series growth data with phenotypic parameter association analysis to enhance feature discrimination of fruits at different growth stages, improve full-cycle detection accuracy, and provide quantitative indices for high-throughput breeding screens. This capability can also be extended to broader phenotyping applications like dynamic monitoring of daily fruit growth rates across crops.
6.5.2 Expanding small-target detection techniques and optimizing adaptability to extreme occlusion scenarios
While the current model improves distant small-target detection via SPD convolution and small-target detection layer collaboration, it faces two challenges: extremely small-pixel targets lack sufficient features; in extreme occlusion, tomatoes deeply covered by leaves expose minimal pixel surfaces, and existing mechanisms fail to activate their responses. Future work can introduce super-resolution preprocessing to enhance tiny region details, reduce PPA attention’s local branch patch size for tiny feature capture, and adjust Detect_CBAM’s spatial attention weights. Targeted module parameter optimization or auxiliary technologies can break extremely small-target detection bottlenecks and expand the model’s capability in ultra-extreme occlusion.
6.5.3 Verification of lightweight deployment versatility and edge device adaptability
This study lightened the model via Slim-Neck, but embedded edge deployment remains incomplete: hardware driver adaptation and real-time inference optimization still need breakthroughs. Future work will adopt knowledge distillation, using the current lightweight model as a teacher network to train a more streamlined student network and reduce edge device load; optimize model operators for the computing architecture of edge hardware to improve inference compatibility; and conduct comparative experiments on different edge devices to test practical inference metrics, clarify optimal deployment schemes for different hardware, and advance algorithm-hardware collaborative adaptation research in resource-constrained scenarios.
6.5.4 Investigating collaborative mechanisms between algorithms and agricultural equipment
The current model focuses on visual detection and has not been integrated with agricultural production equipment. Extending the improved algorithm to tomato-picking robots in future research—exploring applications of detection results in practical operations such as robot path planning and fruit grasping—will facilitate the transformation of research outcomes into agricultural automation scenarios and boost agricultural production efficiency.
6.5.5 Fusion of multimodal agronomic parameters
Future research can integrate hyperspectral imaging and thermal infrared data into the existing detection framework, combine it with the PPA attention mechanism’s multi-scale feature analysis to build models for fruit external and intrinsic quality prediction. By analyzing the coupling law between fruit surface temperature distribution and color changes, it can accurately predict harvest time and quality grading. Meanwhile, an open phenotyping dataset covering fruit developmental sequences of multiple varieties and environments can be established to support algorithm generalization verification. This provides key technical support for precision agriculture growth regulation and breeding, and aligns with broader phenotyping applications like crop monitoring, plant health assessment, and breeding-related trait extraction.
6.5.6 Exploring the generalized application of the improved strategies in multi-crop detection
The proposed PPA attention and self-designed Detect_CBAM detection head work well in tomato detection, and their ability to address occlusion and small-target issues can be extended to strawberries, grapes, and similar crops along three specific directions. First, adaptive optimization for target crops: for small-fruit crops, reduce the PPA local branch patch size to match tiny fruit-stalk features and optimize Detect_CBAM channel attention to strengthen the color differences between strawberry surfaces and leaves; for clustered crops, optimize the PPA global branch’s spatial correlation calculation and adjust the Detect_CBAM spatial attention receptive field to avoid detection box overlap among dense berries. Second, build a multi-crop shared dataset: collect data on strawberry leaf occlusion, grape cluster overlap, and similar cases, label unified indicators, and record Detect_CBAM’s cross-crop key parameters to provide data and parameter support. Third, lightweight generalization verification: integrate Slim-Neck, PPA, and Detect_CBAM into models for other crops, verify performance on edge devices, and confirm each module’s contribution via ablation experiments to improve the universality of the strategies.
7 Conclusion
To address the challenges of tomato detection in polymorphic greenhouse environments, this study presents an enhanced tomato detection method based on an improved YOLOv8n architecture. Through collaborative optimization of multiple modules and data augmentation strategies, the proposed method significantly improves detection accuracy and model generalization, providing reliable technical support for automated tomato picking. The main conclusions are as follows:
1. This study used image data augmentation to boost the greenhouse tomato dataset diversity and model generalization. It introduced the SPD convolution module for lossless downsampling between Conv and C2f modules and added non-strided convolutions to enhance small-target feature extraction. A dedicated small-target detection layer fused with the PPA attention mechanism, using 160×160-pixel large-scale feature maps and multi-branch capture, reduced missed detections in occluded or overlapping scenarios to ensure automated picking completeness. The VoV-GSCSP structure replaced the traditional neck network, simplifying layers and parameters to cut model complexity and computational demands. The Detect_CBAM head used channel-spatial attention to enhance tomato features, suppress background interference, and reduce invalid robotic picking. The DIoU loss function optimized bounding box regression, accelerated convergence, and provided more accurate coordinates for picking path planning.
2. Comparative experiments demonstrate that after training on the tomato_dataset, the improved model significantly outperforms the YOLOv8n baseline model and mainstream comparative models in terms of comprehensive performance. Its precision, recall, mAP@0.5, and mAP@0.5:0.95 are improved by 2%, 2.5%, 3.0%, and 1.3%, respectively, compared with YOLOv8n. Visualization analysis further highlights its advantages: in polymorphic scenarios such as fruit overlapping and distant small targets, the improved model exhibits more accurate detection bounding boxes, more complete target recognition, effectively reduced missed detections, and precise localization of small targets. Grad-CAM heatmaps reveal that the model can suppress background interference and focus on tomato target regions in polymorphic scenarios.
This study provides a visual detection solution for intelligent tomato picking that balances accuracy and efficiency. Through multi-module collaborative optimization and data augmentation strategies, it breaks through the technical bottlenecks of tomato detection in polymorphic greenhouse environments, offers a reusable technical paradigm for the research and development of intelligent agricultural equipment, and provides an important reference for the automated detection and picking of other fruits and vegetables, promoting the practical deployment of automated agricultural detection technologies.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author/s.
Author contributions
QLi: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Formal Analysis. JM: Writing – review & editing, Writing – original draft, Validation, Investigation, Data curation. PZ: Resources, Data curation, Writing – review & editing, Investigation. QLv: Writing – review & editing, Supervision, Visualization. CF: Project administration, Writing – review & editing, Funding acquisition.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This study was supported by two projects: the Central Government Guidance Fund for Local Scientific and Technological Development (Regional Science and Technology Innovation System Project) and the Graduate Student Innovation Ability Training Support Program, under grant numbers 246Z0302G and XCXZZBS202663, respectively.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Afonso, M., Fonteijn, H., Fiorentin, F. S., Lensink, D., Mooij, M., Faber, N., et al. (2020). Tomato fruit detection and counting in greenhouses using deep learning. Front. Plant Sci. 11. doi: 10.3389/fpls.2020.571299
Alexander, L. and Grierson, D. (2002). Ethylene biosynthesis and action in tomato: a model for climacteric fruit ripening. J. Exp. Bot. 53, 2039–2055. doi: 10.1093/jxb/erf072
Ali, M. Y., Sina, A. A. I., Khandker, S. S., Neesa, L., Tanvir, E. M., Kabir, A., et al. (2020). Nutritional composition and bioactive compounds in tomatoes and their impact on human health and disease: A review. Foods 10, 45. doi: 10.3390/foods10010045
Botero-Valencia, J., García-Pineda, V., Valencia-Arias, A., Valencia, J., Reyes-Vera, E., Mejia-Herrera, M., et al. (2025). Machine learning in sustainable agriculture: systematic review and research perspectives. Agriculture 15, 377. doi: 10.3390/agriculture15040377
Cai, Z. and Vasconcelos, N. (2021). Cascade R-CNN: high quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1483–1498. doi: 10.1109/TPAMI.2019.2956516
Chattopadhyay, A., Sarkar, A., Howlader, P., and Balasubramanian, V. N. (2017). “Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA. pp. 839–847. doi: 10.1109/WACV.2018.00097
Deng, L., Ma, R., Chen, B., and Song, G. (2025). A detection method for synchronous recognition of string tomatoes and picking points based on keypoint detection. Front. Plant Sci. 16. doi: 10.3389/fpls.2025.1614881
Fu, Y., Li, W., Li, G., Dong, Y., Wang, S., Zhang, Q., et al. (2024). Multi-stage tomato fruit recognition method based on improved YOLOv8. Front. Plant Sci. 15. doi: 10.3389/fpls.2024.1447263
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014). “Rich feature hierarchies for accurate object detection and semantic segmentation,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Columbus, OH, USA), 580–587. doi: 10.1109/CVPR.2014.81
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). “Mask R-CNN,” in 2017 IEEE International Conference on Computer Vision (ICCV) (Venice, Italy) 2980–2988. doi: 10.1109/ICCV.2017.322
Hou, G., Chen, H., Jiang, M., and Niu, R. (2023). An overview of the application of machine vision in recognition and localization of fruit and vegetable harvesting robots. Agriculture 13, 1814. doi: 10.3390/agriculture13091814
Howard, A., Sandler, M., Chen, B., Wang, W., Chen, L. C., Tan, M., et al. (2019). “Searching for MobileNetV3,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (Seoul, Korea (South)), 1314–1324. doi: 10.1109/ICCV.2019.00140
Hu, C., Liu, X., Pan, Z., and Li, P. (2019). Automatic detection of single ripe tomato on plant combining faster R-CNN and intuitionistic fuzzy set. IEEE Access 7, 154683–154696. doi: 10.1109/ACCESS.2019.2949343
Huang, L., Chen, J., Li, H., Huang, Y., She, K., and Hao, K. (2024). Excellent tomato detector based on pruning and distillation to balance accuracy and lightweight. Comput. Electron. Agric. 227, 109520. doi: 10.1016/j.compag.2024.109520
Huang, W., Liao, Y., Wang, P., Chen, Z., Yang, Z., Xu, L., et al. (2025). AITP-YOLO: improved tomato ripeness detection model based on multiple strategies. Front. Plant Sci. 16. doi: 10.3389/fpls.2025.1596739
Huang, Y. P., Wang, T. H., and Basanta, H. (2020). Using fuzzy mask R-CNN model to automatically identify tomato ripeness. IEEE Access 8, 207672–207682. doi: 10.1109/ACCESS.2020.3038184
Law, H. and Deng, J. (2018). “CornerNet: detecting objects as paired keypoints,” in Computer Vision – ECCV 2018 (ECCV) (Springer International Publishing, Cham), 765–781. doi: 10.1007/978-3-030-01264-9_45
Li, H., Li, J., Wei, H., Liu, Z., Zhan, Z., and Ren, Q. (2024). Slim-neck by GSConv: A lightweight-design for real-time detector architectures. J. Real-Time Image Process. 21, 62. doi: 10.1007/s11554-024-01436-6
Li, X., Wang, W., Wu, L., Chen, S., Hu, X., Li, J., et al. (2020a). Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection. arXiv. abs/2006.04388. doi: 10.48550/arXiv.2006.04388
Li, Z., Xu, L., and Zhu, S. (2020b). Pruning of network filters for small dataset. IEEE Access 8, 4522–4533. doi: 10.1109/ACCESS.2019.2963080
Liang, Z., Zhang, C., Lin, Z., Wang, G., Li, X., and Zou, X. (2025). CTDA: an accurate and efficient cherry tomato detection algorithm in complex environments. Front. Plant Sci. 16, 1492110. doi: 10.3389/fpls.2025.1492110
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., et al. (2016). “SSD: single shot multiBox detector,” in Computer Vision – ECCV 2016. Eds. Leibe, B., Matas, J., Sebe, N., and Welling, M. (Springer International Publishing, Cham), 21–37. doi: 10.1007/978-3-319-46448-0_2
Liu, G., Hou, Z., Liu, H., Liu, J., Zhao, W., and Li, K. (2022). TomatoDet: Anchor-free detector for tomato detection. Front. Plant Sci. 13. doi: 10.3389/fpls.2022.942875
Minagawa, D. and Kim, J. (2022). Prediction of harvest time of tomato using mask R-CNN. AgriEngineering 4, 356–366. doi: 10.3390/agriengineering4020024
RangiLyu (2021). NanoDet-Plus: Super fast and high accuracy lightweight anchor-free object detection model. Available online at: https://github.com/RangiLyu/nanodet (Accessed September 1 2025).
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). “You only look once: unified, real-time object detection,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Las Vegas, NV, USA: IEEE) 779–788. doi: 10.1109/cvpr.2016.91
Ren, S. Q., He, K. M., Girshick, R., and Sun, J. (2017). Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149. doi: 10.1109/TPAMI.2016.2577031
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L. C. (2018). “MobileNetV2: inverted residuals and linear bottlenecks,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (Salt Lake City, UT, USA: IEEE) 4510–4520. doi: 10.1109/CVPR.2018.00474
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2016). Grad-CAM: visual explanations from deep networks via gradient-based localization. arXiv, 618–626. doi: 10.1109/ICCV.2017.74
Shorten, C. and Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep learning. J. Big. Data 6, 60. doi: 10.1186/s40537-019-0197-0
Solimani, F., Cardellicchio, A., Dimauro, G., Petrozza, A., Summerer, S., Cellini, F., et al. (2024). Optimizing tomato plant phenotyping detection: Boosting YOLOv8 architecture to tackle data complexity. Comput. Electron. Agric. 218, 108728. doi: 10.1016/j.compag.2024.108728
Song, G., Wang, J., Ma, R., Shi, Y., and Wang, Y. (2024). Study on the fusion of improved YOLOv8 and depth camera for bunch tomato stem picking point recognition and localization. Front. Plant Sci. 15. doi: 10.3389/fpls.2024.1447855
Sun, X. (2024). Enhanced tomato detection in greenhouse environments: a lightweight model based on S-YOLO with high accuracy. Front. Plant Sci. 15. doi: 10.3389/fpls.2024.1451018
Sun, J., He, X., Ge, X., Wu, X., Shen, J., and Song, Y. (2018). Detection of key organs in tomato based on deep migration learning in a complex background. Agriculture 8, 196. doi: 10.3390/agriculture8120196
Sunkara, R. and Luo, T. (2022). “No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects,” in Joint European conference on machine learning and knowledge discovery in databases (Springer, Cham), 443–459. doi: 10.1007/978-3-031-26409-2_27
Tan, M., Pang, R., and Le, Q. V. (2020). “EfficientDet: scalable and efficient object detection,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). (Seattle, WA, USA: IEEE) 10778–10787. doi: 10.1109/CVPR42600.2020.01079
Vaswani, A., Shazeer, N. M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. arXiv. abs/1706.03762. doi: 10.48550/arXiv.1706.03762
Wang, Z., Ling, Y., Wang, X., Meng, D., Nie, L., An, G., et al. (2022). An improved Faster R-CNN model for multi-object tomato maturity detection in complex scenarios. Ecol. Inf. 72, 101886. doi: 10.1016/j.ecoinf.2022.101886
Wang, P., Niu, T., and He, D. (2021). Tomato young fruits detection method under near color background based on improved faster R-CNN with attention mechanism. Agriculture 11, 1059. doi: 10.3390/agriculture11111059
Wang, A., Qian, W., Li, A., Xu, Y., Hu, J., Xie, Y., et al. (2024). NVW-YOLOv8s: An improved YOLOv8s network for real-time detection and segmentation of tomato fruits at different ripeness stages. Comput. Electron. Agric. 219, 108833. doi: 10.1016/j.compag.2024.108833
Widiyanto, S., Wardani, D. T., and Pranata, S. W. (2021). “Image-based tomato maturity classification and detection using faster R-CNN method,” in 2021 5th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT). (Ankara, Turkey) 130–134. doi: 10.1109/ISMSIT52890.2021.9604534
Woo, S., Park, J., Lee, J.-Y., and Kweon, I. S. (2018). CBAM: Convolutional Block Attention Module (Cham: Springer International Publishing), 3–19. doi: 10.1007/978-3-030-01234-2_1
Xu, S., Zheng, S., Xu, W., Xu, R., Wang, C., Zhang, J., et al. (2024). “HCF-net: hierarchical context fusion network for infrared small object detection,” in 2024 IEEE International Conference on Multimedia and Expo (ICME). (Niagara Falls, ON, Canada: IEEE) 1–6. doi: 10.1109/ICME57554.2024.10687431
Yoshida, T., Onishi, Y., Kawahara, T., and Fukao, T. (2022). Automated harvesting by a dual-arm fruit harvesting robot. Robomech. J. 9, 19. doi: 10.1186/s40648-022-00233-9
Yu, J., Jiang, Y., Wang, Z., Cao, Z., and Huang, T. (2016). UnitBox: an advanced object detection network. arXiv. doi: 10.48550/arXiv.1608.01471
Yuan, T., Lv, L., Zhang, F., Fu, J., Gao, J., Zhang, J., et al. (2020). Robust cherry tomatoes detection algorithm in greenhouse scene based on SSD. Agriculture 10, 160. doi: 10.3390/agriculture10050160
Zeng, T., Li, S., Song, Q., Zhong, F., and Wei, X. (2023). Lightweight tomato real-time detection method based on improved YOLO and mobile deployment. Comput. Electron. Agric. 205, 107625. doi: 10.1016/j.compag.2023.107625
Zhao, Y., Chen, Y., Xu, X., He, Y., Gan, H., Wu, N., et al. (2025). Ta-YOLO: overcoming target blocked challenges in greenhouse tomato detection and counting. Front. Plant Sci. 16. doi: 10.3389/fpls.2025.1618214
Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2019). Distance-IoU loss: faster and better learning for bounding box regression. arXiv. abs/1911.08287. doi: 10.48550/arXiv.1911.08287
Keywords: YOLOv8n, tomato detection, small object detection layer, polymorphic environment, deep learning
Citation: Li Q, Mao J, Zhao P, Lv Q and Fu C (2026) Optimizing polymorphic tomato picking detection: improved YOLOv8n architecture to tackle data under complex environments. Front. Plant Sci. 16:1660480. doi: 10.3389/fpls.2025.1660480
Received: 06 July 2025; Accepted: 23 September 2025;
Published: 14 January 2026.
Edited by:
Peiman Zandi, Aarhus University, Denmark
Reviewed by:
Ameer Hussain Jarwar, Institute of Cotton Research (CAAS), China
Mingyou Chen, Foshan University, China
Mouhannad Alsalem, University of Natural Resources and Life Sciences Vienna, Austria
Joseph Teguh Santoso, Universitas STEKOM, Indonesia
Copyright © 2026 Li, Mao, Zhao, Lv and Fu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Qing Lv, lvqing@hebtu.edu.cn; Chao Fu, fuchao@hebtu.edu.cn