AUTHOR=Alzahrani Ali Saeed, Iqbal Abid, Zafar Wisal, Husnain Ghassan
TITLE=Swin-YOLO-SAM: a hybrid Transformer-based framework integrating Swin Transformer, YOLOv12, and SAM-2.1 for automated identification and segmentation of date palm leaf diseases
JOURNAL=Frontiers in Plant Science
VOLUME=Volume 16 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2025.1666374
DOI=10.3389/fpls.2025.1666374
ISSN=1664-462X
ABSTRACT=The cultivation of date palm (Phoenix dactylifera L.) is acutely affected by numerous fungal, bacterial, and pest-related diseases that diminish yield, spoil fruit quality, and undermine long-term agricultural sustainability. Traditional disease-monitoring methods rely heavily on expert knowledge, do not scale, and depend on classical models that do not generalize readily to real-world conditions. Advances in deep learning over the last two decades, particularly with Convolutional Neural Networks (CNNs), have enabled far greater automation. However, CNNs still require relatively large labeled datasets and struggle with ambiguous or cluttered backgrounds, small lesions, and overlapping symptoms when diagnosing plant diseases. To address these difficulties, we introduce a hybrid Transformer-based deep learning framework built on four modules: 1) Swin Transformer for hierarchical image classification, 2) YOLOv12 for real-time detection, 3) Grounding DINO with SAM-2.1 for zero-shot segmentation, and 4) a Vision Transformer (ViT) with a regression head for predicting disease severity. This combined architecture delivers accurate detection, precise segmentation, and quantification of disease severity in real-world, low-annotation settings and under adverse environmental conditions. Experimental findings on a curated dataset of 13,459 palm leaf images show that the proposed model outperforms prior CNN-based models, with a classification accuracy of 98.91%, a precision of 98.85%, a recall of 96.8%, and an F1-score of 96.4%. These results set a new benchmark for automated, scalable, and interpretable disease diagnosis in precision agriculture.
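
The abstract's four-stage pipeline (Swin Transformer classification, YOLOv12 detection, Grounding DINO + SAM-2.1 zero-shot segmentation, and a ViT regression head for severity) can be approximated with off-the-shelf components. The sketch below is illustrative only and is not the authors' released code: the checkpoint names (microsoft/swin-tiny-patch4-window7-224, google/vit-base-patch16-224-in21k, yolo12n.pt, IDEA-Research/grounding-dino-tiny, facebook/sam2.1-hiera-large), the SeverityRegressor class, and all glue logic are assumptions made for illustration.

```python
"""
Minimal sketch of the four-stage pipeline described in the abstract:
classification -> detection -> zero-shot segmentation -> severity regression.
Checkpoints and wiring are assumptions; the paper's training code is not reproduced.
"""
import torch
from torch import nn
from PIL import Image
from transformers import (
    AutoImageProcessor,
    SwinForImageClassification,  # stage 1: hierarchical classification
    ViTModel,                    # stage 4: backbone for severity regression
)

# Stage 1 -- Swin Transformer classifier (generic ImageNet checkpoint as a stand-in
# for a model fine-tuned on the palm leaf disease classes).
swin_processor = AutoImageProcessor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
swin = SwinForImageClassification.from_pretrained("microsoft/swin-tiny-patch4-window7-224")

# Stage 2 -- YOLOv12 lesion detector via the Ultralytics API (weights name is an assumption).
# from ultralytics import YOLO
# detector = YOLO("yolo12n.pt")

# Stage 3 -- Grounding DINO + SAM-2.1 for zero-shot segmentation (hypothetical checkpoints).
# from transformers import AutoModelForZeroShotObjectDetection
# from sam2.sam2_image_predictor import SAM2ImagePredictor
# grounder = AutoModelForZeroShotObjectDetection.from_pretrained("IDEA-Research/grounding-dino-tiny")
# segmenter = SAM2ImagePredictor.from_pretrained("facebook/sam2.1-hiera-large")


class SeverityRegressor(nn.Module):
    """Stage 4 -- ViT backbone with a scalar regression head for disease severity (assumed design)."""

    def __init__(self, backbone_name: str = "google/vit-base-patch16-224-in21k"):
        super().__init__()
        self.backbone = ViTModel.from_pretrained(backbone_name)
        self.head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # Use the [CLS] token embedding as a global image representation.
        cls_token = self.backbone(pixel_values=pixel_values).last_hidden_state[:, 0]
        return self.head(cls_token).squeeze(-1)  # predicted severity score per image


def classify(image: Image.Image) -> int:
    """Run the Swin classifier and return the predicted class index."""
    inputs = swin_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = swin(**inputs).logits
    return int(logits.argmax(dim=-1))
```

In this reading, the Swin classifier assigns a disease label to the whole leaf image, the (commented-out) YOLOv12 and Grounding DINO + SAM-2.1 stages localize and mask individual lesions, and the ViT regression head maps the image to a continuous severity score; how the stages are chained and trained is specified only in the full paper.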