AUTHOR=Lu Yuxiang , Wang Jiahe , Wang Dan , Liu Tang TITLE=Efficient greenhouse segmentation with visual foundation models: achieving more with fewer samples JOURNAL=Frontiers in Environmental Science VOLUME=12 YEAR=2024 URL=https://www.frontiersin.org/journals/environmental-science/articles/10.3389/fenvs.2024.1395337 DOI=10.3389/fenvs.2024.1395337 ISSN=2296-665X ABSTRACT=The Vision Transformer (ViT) model based on self-supervised learning has achieved outstanding performance in natural image segmentation, demonstrating its broad prospects in visual tasks. However, its performance declines in remote sensing because of the varying perspectives of remote sensing images and the unique optical properties of certain features, such as the translucency of greenhouses. In addition, the high cost of training visual foundation models (VFMs) makes it difficult to deploy them from scratch for a specific scene. This study explores the feasibility of rapidly deploying visual foundation models on new tasks by using the embedding vectors they generate as prior knowledge to enhance the performance of traditional segmentation models. We found that these embedding vectors help the segmentation model converge rapidly and significantly improve segmentation accuracy and robustness with the same number of trainable parameters. Furthermore, our comparative experiments showed that using only about 40% of the annotated samples can match or even exceed the performance of traditional segmentation models trained on all samples, which has important implications for reducing reliance on manual annotation.
For greenhouse detection and management in particular, our method significantly improves the accuracy of greenhouse segmentation and reduces dependence on annotated samples, helping the model adapt more quickly to different lighting conditions and enabling more precise monitoring of agricultural resources. This study not only demonstrates the potential of visual foundation models in remote sensing tasks but also opens new avenues for the large-scale and diversified expansion of downstream tasks.
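The core idea of the abstract, using frozen foundation-model embeddings as prior knowledge for a lightweight, trainable segmentation head, can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the function name, feature shapes, nearest-neighbour upsampling, and the 1x1-convolution-as-matmul are all assumptions made for clarity.

```python
import numpy as np

def fuse_embeddings(cnn_feats, vfm_emb, weight, bias):
    """Hypothetical fusion step: upsample frozen VFM patch embeddings to the
    trainable branch's resolution, concatenate along channels, and apply a
    1x1 convolution (a per-pixel matmul) to produce class logits.

    cnn_feats: (Cc, H, W) features from a small trainable branch
    vfm_emb:   (Cv, h, w) frozen foundation-model patch embeddings,
               with H and W divisible by h and w
    weight:    (num_classes, Cc + Cv) 1x1-conv weights
    bias:      (num_classes,) per-class bias
    """
    Cc, H, W = cnn_feats.shape
    Cv, h, w = vfm_emb.shape
    # nearest-neighbour upsample: repeat each patch embedding over its pixels
    prior = vfm_emb.repeat(H // h, axis=1).repeat(W // w, axis=2)
    fused = np.concatenate([cnn_feats, prior], axis=0)        # (Cc+Cv, H, W)
    # 1x1 conv == matmul over the channel axis at every spatial position
    logits = np.einsum("kc,chw->khw", weight, fused) + bias[:, None, None]
    return logits

# toy example with made-up shapes: 8 CNN channels, 32 embedding channels
feats = np.random.randn(8, 16, 16)
emb = np.random.randn(32, 4, 4)          # stand-in for a frozen VFM output
weight = np.random.randn(2, 40)          # 2 classes: greenhouse / background
bias = np.random.randn(2)
logits = fuse_embeddings(feats, emb, weight, bias)
print(logits.shape)  # (2, 16, 16)
```

In a real system the embedding would come from a frozen pretrained encoder and only the small branch plus the fusion head would be trained, which is consistent with the abstract's claim of improved accuracy at the same trainable-parameter count.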