ORIGINAL RESEARCH article

Front. Plant Sci.

Sec. Technical Advances in Plant Science

Volume 16 - 2025 | doi: 10.3389/fpls.2025.1666374

Swin-YOLO-SAM: A Hybrid Transformer-Based Framework Integrating Swin Transformer, YOLOv12, and SAM-2.1 for Automated Identification and Segmentation of Date Palm Leaf Diseases

Provisionally accepted
Ali Saeed Alzahrani1, Abid Iqbal1, Wisal Zafar2, Ghassan Husnain2*
  • 1King Faisal University, Al Ahsa, Saudi Arabia
  • 2CECOS University of Information Technology and Emerging Sciences, Peshawar, Pakistan

The final, formatted version of the article will be published soon.

The cultivation of date palm (Phoenix dactylifera L.) is acutely impacted by numerous fungal, bacterial, and pest-related diseases that diminish yield, spoil fruit quality, and undermine long-term agricultural sustainability. Traditional disease-monitoring methods rely heavily on expert knowledge, do not scale, and depend on classical models that generalize poorly to real-world conditions. Advances in deep learning over the past two decades, particularly Convolutional Neural Networks (CNNs), have enabled significantly greater automation. However, CNNs still require relatively large labeled datasets and struggle with ambiguous or complex backgrounds, small lesions, and overlapping symptoms when diagnosing plant diseases. To address these difficulties, we introduce a hybrid Transformer deep learning framework built from four modules: (1) a Swin Transformer for hierarchical image classification; (2) YOLOv12 for real-time detection; (3) Grounding DINO with SAM-2.1 for zero-shot segmentation; and (4) a Vision Transformer (ViT) with a regression head for predicting disease severity. This combined architecture delivers accurate detection, precise segmentation, and quantification of disease severity in real-world, low-annotation scenarios and under adverse environmental conditions. Experimental findings on a curated dataset of 13,459 palm leaf images show that the proposed model outperforms previous CNN-based models, with a classification accuracy of 98.91%, a precision of 98.85%, a recall of 96.8%, and an
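The abstract describes a four-stage pipeline (classification, detection, segmentation, severity regression). A minimal structural sketch of how such stages could be composed is shown below; the function names, signatures, and return shapes are illustrative assumptions, not the authors' published API, and the heavy models (Swin Transformer, YOLOv12, Grounding DINO + SAM-2.1, ViT) are replaced with placeholder callables.

```python
# Hypothetical sketch of the four-stage Swin-YOLO-SAM pipeline described above.
# All names and shapes here are assumptions for illustration only.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DiagnosisResult:
    disease_class: str                      # stage 1: Swin Transformer classifier output
    boxes: List[Tuple[int, int, int, int]]  # stage 2: YOLOv12 lesion boxes (x1, y1, x2, y2)
    masks: List[list]                       # stage 3: one SAM-2.1-style mask per box
    severity: float                         # stage 4: ViT regression head output


def diagnose(image,
             classify: Callable,
             detect: Callable,
             segment: Callable,
             estimate_severity: Callable) -> DiagnosisResult:
    """Compose the pipeline: classify -> detect -> segment -> severity."""
    disease = classify(image)                    # hierarchical image classification
    boxes = detect(image)                        # real-time lesion detection
    masks = [segment(image, b) for b in boxes]   # zero-shot segmentation per detected box
    severity = estimate_severity(image, masks)   # severity score from segmented regions
    return DiagnosisResult(disease, boxes, masks, severity)


# Stub stages stand in for the real models so the sketch runs end to end:
result = diagnose(
    image=None,
    classify=lambda img: "leaf_spot",
    detect=lambda img: [(10, 10, 50, 50)],
    segment=lambda img, box: [[1, 1], [1, 0]],
    estimate_severity=lambda img, masks: 0.42,
)
print(result.disease_class, result.severity)
```

The design point carried over from the abstract is the loose coupling of stages: because segmentation is zero-shot (prompted by detection boxes), the segmentation stage needs no disease-specific training labels, which is what makes the low-annotation setting feasible.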

Keywords: Date Palm Disease, Plant Disease Detection, Severity Prediction, Zero-shot Segmentation, Automated Crop Monitoring, Grounding DINO, Segment Anything Model (SAM)

Received: 15 Jul 2025; Accepted: 25 Aug 2025.

Copyright: © 2025 Alzahrani, Iqbal, Zafar and Husnain. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Ghassan Husnain, CECOS University of Information Technology and Emerging Sciences, Peshawar, Pakistan

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.