ORIGINAL RESEARCH article
Front. Plant Sci.
Sec. Sustainable and Intelligent Phytoprotection
This article is part of the Research Topic: Smart Plant Pest and Disease Detection Machinery and Technology: Innovations for Sustainable Agriculture
Paddy Pest Image Segmentation based on Multiscale attention Fusion VM-UNet
Provisionally accepted
- 1 SIAS University, Xinzheng, China
- 2 Xijing University, Xi'an, China
Precise paddy pest image segmentation (PPIS) in real-time natural environments is an important and challenging research problem. Convolutional Neural Networks (CNNs) and Transformers are the most popular architectures for image segmentation, but they are limited by weak modeling of global dependencies and by quadratic computational complexity, respectively. To address this, a multiscale attention fusion VM-UNet (MSAF-VMUNet) for PPIS is constructed. It combines the long-range dependency modeling ability of the Visual State Space Model (VSS) with the precise localization capability of U-Net at low computational complexity. In the model, a multiscale VSS (MSVSS) block captures long-range contextual information, and an improved attention fusion (IAF) module is designed for multi-level feature learning between the encoder and decoder. An attention VSS module is introduced in the bottleneck layer so that the model adaptively emphasizes key features and suppresses redundant information. Compared with VM-UNet, MSAF-VMUNet can effectively model global-local and contextual relationships at the scale layer, and it improves detection performance for pests of varying size and shape without increasing computational complexity. Experimental results on the paddy pest subset of the public IP102 dataset validate that MSAF-VMUNet can effectively address the key challenges in field PPIS, including small pest detection, occlusion and noise handling, and preprocessing requirements. The PPIS precision is 79.17%, which is 15.51% and 3.39% higher than that of the traditional U-Net and the recent VM-UNet, respectively. The model provides an effective and reliable solution for pest detection and control systems in smart agriculture.
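The abstract reports segmentation precision without stating the formula. As a minimal sketch, assuming precision here means the usual pixel-wise ratio TP / (TP + FP) over binary masks (1 = pest, 0 = background), it could be computed as follows. The function name `pixel_precision` and the toy masks are hypothetical and are not taken from the article.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def pixel_precision(pred: Mask, target: Mask) -> float:
    """Pixel-wise precision TP / (TP + FP) for two binary masks.

    Assumption: 1 marks a pest pixel and 0 marks background.
    Returns 0.0 when the prediction contains no pest pixels.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")

    true_pos = 0
    false_pos = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            if p == 1:
                # A predicted pest pixel is correct only if the label agrees.
                if t == 1:
                    true_pos += 1
                else:
                    false_pos += 1

    predicted = true_pos + false_pos
    return true_pos / predicted if predicted else 0.0


# Toy 3x4 masks (hypothetical data, for illustration only).
pred_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
true_mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

# 5 predicted pest pixels, 3 of them correct.
precision = pixel_precision(pred_mask, true_mask)
```

Under this definition, a dataset-level score would typically be obtained by averaging per-image values or by pooling TP and FP counts over all test images. Which of the two the authors used is not stated in the abstract.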
Keywords: Paddy pest image segmentation (PPIS), VM-UNet, Improved attention fusion (IAF), Multiscale VSS (MSVSS), Multiscale attention fusion VM-UNet (MSAF-VMUNet)
Received: 10 Sep 2025; Accepted: 16 Dec 2025.
Copyright: © 2025 Zhang, Shao and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Yu Shao
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
