AUTHOR=Yan Jingkun, Yan Tianying, Ye Weixin, Lv Xin, Gao Pan, Xu Wei
TITLE=Cotton leaf segmentation with composite backbone architecture combining convolution and attention
JOURNAL=Frontiers in Plant Science
VOLUME=Volume 14 - 2023
YEAR=2023
URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2023.1111175
DOI=10.3389/fpls.2023.1111175
ISSN=1664-462X
ABSTRACT=Plant leaf segmentation, and in particular the accurate recognition of leaf edges, provides the data needed to measure plant phenotypic parameters automatically. Backed by abundant computing power, deep-learning-based segmentation models have achieved encouraging results in segmenting plants in complex field environments. However, adapting the backbone of current state-of-the-art segmentation models to cotton leaf segmentation incurs substantial trial-and-error costs (e.g., expert experience and computation). Therefore, taking into account the computational demands of mainstream Transformer backbones that integrate attention mechanisms, a simple and effective semantic segmentation architecture (our model) built on a composite backbone was proposed. In this study, images of cotton leaves in five typical conditions against complex field backgrounds (normal, spotted lesions, regional lesions, occluded leaves, and uneven illumination) were collected at the budding, flowering, and boll stages, then preprocessed and labeled to form an 800-image dataset. The composite backbone of our model combines CoAtNet and Xception, where CoAtNet integrates the attention mechanism of Transformers into convolution operations. Our model, built on DeepLab v3+, further incorporates a multi-scale feature fusion mechanism and an auxiliary supervision strategy. The experimental results showed that our model outperformed the benchmark segmentation models PSPNet, DANet, CPNet, and DeepLab v3+ on the cotton leaf dataset, especially in leaf edge segmentation (MIoU: 0.940, BIoU: 0.608).
The composite backbone of our model combines the convolution of convolutional neural networks with the attention of Transformers, reducing the computational requirements of Transformers while maintaining excellent performance. In addition, it lowers the trial-and-error cost of adapting segmentation architectures to specific agricultural applications. The composite backbone of our model thus offers a potential scheme for high-throughput detection of plant phenotypic features.
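The abstract reports MIoU and BIoU without defining them. The sketch below is a minimal illustration under stated assumptions, not the paper's evaluation code: it assumes MIoU is the per-class IoU averaged over classes, and that BIoU follows a common Boundary IoU formulation, i.e. IoU computed only over a band of width `d` pixels just inside each mask's contour. The band width `d`, the class set, and all function names (`rect_mask`, `mean_iou`, `inner_boundary`, `boundary_iou`) are hypothetical; the paper's exact definitions may differ.

```python
from typing import List

Mask = List[List[int]]  # 2-D map, one integer label per pixel


def rect_mask(h: int, w: int, y0: int, y1: int, x0: int, x1: int) -> Mask:
    """Hypothetical helper: h x w map with 1s in rows y0..y1-1 and cols x0..x1-1."""
    return [[int(y0 <= y < y1 and x0 <= x < x1) for x in range(w)] for y in range(h)]


def mean_iou(pred: Mask, gt: Mask, num_classes: int) -> float:
    """Per-class IoU |P & G| / |P | G|, averaged over classes present in either map."""
    scores = []
    for c in range(num_classes):
        inter = union = 0
        for prow, grow in zip(pred, gt):
            for p, g in zip(prow, grow):
                inter += int(p == c and g == c)
                union += int(p == c or g == c)
        if union:  # assumption: classes absent from both maps are skipped
            scores.append(inter / union)
    return sum(scores) / len(scores)


def inner_boundary(mask: Mask, d: int) -> Mask:
    """Foreground pixels within d pixels (square neighbourhood) of the background.

    Equivalent to the mask minus its erosion by a (2d+1)x(2d+1) square;
    pixels outside the image are treated as background (zero padding).
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # A pixel survives erosion only if its whole window is in-image foreground.
            interior = all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                for dy in range(-d, d + 1)
                for dx in range(-d, d + 1)
            )
            out[y][x] = 0 if interior else 1
    return out


def boundary_iou(pred: Mask, gt: Mask, d: int = 1) -> float:
    """IoU restricted to the inner boundary bands of two binary masks."""
    pb, gb = inner_boundary(pred, d), inner_boundary(gt, d)
    inter = union = 0
    for prow, grow in zip(pb, gb):
        for p, g in zip(prow, grow):
            inter += p & g
            union += p | g
    return inter / union if union else 1.0


if __name__ == "__main__":
    gt = rect_mask(6, 6, 1, 5, 1, 5)    # toy 4x4 "leaf" (class 1) on background (class 0)
    pred = rect_mask(6, 6, 1, 5, 1, 4)  # prediction misses the leaf's right-hand column
    print(round(mean_iou(pred, gt, num_classes=2), 4))
    print(round(boundary_iou(pred, gt, d=1), 4))
```

For this toy example the script prints 0.7917 and 0.5714: missing one edge column costs far more in the boundary-only score than in MIoU, which is why a boundary metric is a stricter test of leaf edge segmentation. Real evaluations typically choose `d` relative to image size and compute BIoU per class.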