AUTHOR=Li Yan, Li Chunping, Zhu Tingting, Zhang Shurong, Liu Li, Guan Zhanpeng TITLE=A recognition model for winter peach fruits based on improved ResNet and multi-scale feature fusion JOURNAL=Frontiers in Plant Science VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2025.1545216 DOI=10.3389/fpls.2025.1545216 ISSN=1664-462X ABSTRACT=With the continuous advancement of modern agricultural technologies, the demand for precision fruit-picking techniques has been increasing. This study addresses the challenge of accurately recognizing and harvesting winter peaches by proposing WinterPeachNet, a novel recognition model based on the residual network (ResNet) architecture, aimed at enhancing the accuracy and efficiency of winter peach detection, even in resource-constrained environments. WinterPeachNet improves network performance by integrating a depthwise separable inverted bottleneck ResNet (DIBResNet), a bidirectional feature pyramid network (BiFPN) structure, the GhostConv module, and the YOLOv11 detection head (v11detect). The DIBResNet module, based on the ResNet architecture, introduces an inverted bottleneck structure and depthwise separable convolution, enhancing the depth and quality of feature extraction while reducing the model’s computational complexity. The GhostConv module further improves detection accuracy by reducing the number of convolution kernels. Additionally, the BiFPN structure strengthens the model’s ability to detect objects of different sizes by fusing multi-scale feature information. The introduction of v11detect further optimizes object localization accuracy. On the winter peach detection task, WinterPeachNet achieves precision P = 0.996, recall R = 0.996, mAP50 = 0.995, and mAP50-95 = 0.964.
The high efficiency of the WinterPeachNet model makes it highly adaptable in resource-constrained environments, enabling effective object detection at a relatively low computational cost.
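The abstract attributes part of the reduced computational cost to depthwise separable convolution. As a rough illustration of why that substitution is cheaper, the sketch below compares weight counts for a standard convolution and a depthwise separable one. The function names and the layer sizes (64 input channels, 128 output channels, 3x3 kernel) are assumptions chosen for illustration; they are not taken from the paper and do not describe the actual DIBResNet configuration. Bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv (no bias)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard conv weights:            {std}")
    print(f"depthwise separable conv weights: {sep}")
    print(f"ratio (separable / standard):     {sep / std:.3f}")
```

For these sizes the separable layer needs roughly 12% of the weights of the standard layer, which matches the general ratio 1/c_out + 1/k² for this factorization. The same ratio applies to multiply-accumulate operations at a fixed output resolution.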