AUTHOR=Fu Yuliang, Li Weiheng, Li Gang, Dong Yuanzhi, Wang Songlin, Zhang Qingyang, Li Yanbin, Dai Zhiguang

TITLE=Multi-stage tomato fruit recognition method based on improved YOLOv8

JOURNAL=Frontiers in Plant Science

VOLUME=Volume 15 - 2024

YEAR=2024

URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2024.1447263

DOI=10.3389/fpls.2024.1447263

ISSN=1664-462X

ABSTRACT=To address the low recognition and localization efficiency and poor accuracy of multi-stage tomato recognition in complex environments, this study proposes a method based on an improved YOLOv8 (You Only Look Once version 8) model, dubbed YOLOv8-EA. First, the EfficientViT network replaces the backbone of the original YOLOv8 model, reducing the model's parameter count and enhancing its feature extraction capabilities. Second, partial convolution is integrated into the C2f module to form the C2f-Faster module, which further accelerates the model's inference. Third, the bounding box loss function is changed to SIoU, which speeds up model convergence and improves detection accuracy and precision. Finally, an auxiliary detection head (Aux-Head) module is incorporated to bolster the network's learning potential. On the self-built dataset, the accuracy, recall and average precision of the YOLOv8-EA model are 91.4%, 88.7% and 93.9%, respectively, and the detection speed is 163.33 frames/s. Compared to the baseline YOLOv8n network, the model weight is increased by 2.07 MB; the accuracy, recall and average precision are improved by 10.9, 11.7 and 7.2 percentage points, respectively; and the detection speed is improved by 42.1%. The detection accuracies for unripe, semi-ripe and ripe tomatoes are 97.1%, 91% and 93.7%, respectively. On the public dataset, the accuracy, recall and average precision of YOLOv8-EA are 91%, 89.2% and 95.1%, respectively, and the detection speed is 1.8 ms. The enhanced model recognizes tomatoes at various stages more efficiently and accurately, providing a technical reference for intelligent tomato harvesting.