AUTHOR=Bao Qin-Zhou, Yang Yi-Xin, Li Qing, Yang Hai-Chao TITLE=Zero-shot instance segmentation for plant phenotyping in vertical farming with foundation models and VC-NMS JOURNAL=Frontiers in Plant Science VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2025.1536226 DOI=10.3389/fpls.2025.1536226 ISSN=1664-462X ABSTRACT=Introduction: Image instance segmentation is essential for plant phenotyping in vertical farms, yet the diversity of plant types and limited annotated image data constrain the performance of traditional supervised techniques. These challenges necessitate a zero-shot approach to enable segmentation without relying on specific training data for each plant type. Methods: We present a zero-shot instance segmentation framework combining Grounding DINO and the Segment Anything Model (SAM). To enhance box prompts, Vegetation Cover Aware Non-Maximum Suppression (VC-NMS) incorporating the Normalized Cover Green Index (NCGI) is used to refine object localization by leveraging vegetation spectral features. For point prompts, similarity maps with a max distance criterion are integrated to improve spatial coherence in sparse annotations, addressing the ambiguity of generic point prompts in agricultural contexts. Results: Experimental validation on two test datasets shows that our enhanced box and point prompts outperform SAM’s everything mode and Grounded SAM in zero-shot segmentation tasks. Compared to the supervised method YOLOv11, our framework demonstrates superior zero-shot generalization, achieving the best segmentation performance on both datasets without target-specific annotations. Discussion: This study addresses the critical issue of scarce annotated data in vertical farming by developing a zero-shot segmentation framework.
The integration of domain-specific indices (NCGI) and prompt optimization techniques provides an effective solution for plant phenotyping, highlighting the potential of weakly supervised models in agricultural computer vision where extensive manual annotation is impractical.