
ORIGINAL RESEARCH article

Front. Plant Sci.

Sec. Sustainable and Intelligent Phytoprotection

This article is part of the Research Topic: Highlights of 1st International Conference on Sustainable and Intelligent Phytoprotection (ICSIP 2025)

Identification of Tobacco Leaf Diseases Using Hyperspectral Imaging and Machine Learning with SHAP Interpretability Analysis

Provisionally accepted
Peng Luo 1, Yang Yang 2, Zhang Huilai 1, Man Yi 2, Xianguo Zhou 2, Yide Yang 2, Chen Huabao 1, Min Yan 2*, Chunxian Jiang 1*
  • 1 College of Agronomy, Sichuan Agricultural University, Chengdu, China
  • 2 Yibin Municipal Company of Sichuan Provincial Tobacco Corporation, Yibin, China

The final, formatted version of the article will be published soon.

Tobacco leaf diseases significantly affect yield and quality, underscoring the need for rapid and nondestructive diagnostic tools. Although hyperspectral imaging (HSI) has been applied in tobacco pathology, most existing studies focus on single diseases and lack generalized, interpretable frameworks for multi-class identification. In this study, hyperspectral images of healthy leaves and of leaves with four major diseases (brown spot, wildfire, tobacco mosaic virus [TMV], and potato virus Y [PVY]) were collected to construct a balanced, leaf-independent dataset. Pixels were grouped by leaf ID, and the entire dataset was strictly partitioned at the leaf level to prevent pixel-level data leakage and ensure generalization to unseen leaves. Multiple preprocessing techniques, wavelength-selection methods, and machine-learning classifiers were systematically compared. A compact artificial neural network (ANN) model integrating Savitzky-Golay preprocessing and wavelength selection by the successive projections algorithm (SPA) achieved the best overall performance while requiring only a small number of informative wavelengths. A Transformer model provided slightly stronger predictive capacity but depended on full-spectrum inputs and incurred substantially higher computational cost. Pixel-level predictions enabled lesion-area-based severity estimation for the two leaf-spot diseases. SHAP analysis highlighted physiologically meaningful spectral regions associated with pigment absorption and structural variation. Overall, this study presents an efficient and interpretable HSI framework for multi-disease tobacco diagnosis, supporting the development of practical hyperspectral or multispectral systems.
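The leaf-level partitioning and the lesion-area-based severity estimate described above can be illustrated with a minimal sketch. This is not the authors' code. The function names `split_by_leaf` and `lesion_severity`, the 30% test fraction, the label strings, and the definition of severity as the share of leaf pixels predicted as diseased are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def split_by_leaf(leaf_ids, test_fraction=0.3, seed=0):
    """Split pixel indices so that no leaf contributes pixels to both sets.

    leaf_ids: one leaf identifier per pixel (hypothetical input layout).
    test_fraction: share of leaves (not pixels) held out; assumed value.
    """
    unique_leaves = sorted(set(leaf_ids))
    if len(unique_leaves) < 2:
        raise ValueError("need at least two leaves to split at the leaf level")

    # Shuffle whole leaves, then hold out a subset of them.
    rng = random.Random(seed)
    rng.shuffle(unique_leaves)
    n_test = round(len(unique_leaves) * test_fraction)
    n_test = min(len(unique_leaves) - 1, max(1, n_test))
    test_leaves = set(unique_leaves[:n_test])

    # Every pixel follows its leaf, which prevents pixel-level leakage.
    train_idx = [i for i, leaf in enumerate(leaf_ids) if leaf not in test_leaves]
    test_idx = [i for i, leaf in enumerate(leaf_ids) if leaf in test_leaves]
    return train_idx, test_idx


def lesion_severity(pixel_labels, healthy_label="healthy"):
    """Fraction of a leaf's pixels predicted as non-healthy (assumed definition)."""
    if not pixel_labels:
        raise ValueError("leaf has no pixels")
    lesion_pixels = sum(1 for label in pixel_labels if label != healthy_label)
    return lesion_pixels / len(pixel_labels)


# Toy usage: 12 pixels drawn from 4 leaves.
leaf_ids = ["leaf_a"] * 4 + ["leaf_b"] * 3 + ["leaf_c"] * 3 + ["leaf_d"] * 2
train_idx, test_idx = split_by_leaf(leaf_ids, test_fraction=0.25, seed=42)
train_leaves = {leaf_ids[i] for i in train_idx}
test_leaves = {leaf_ids[i] for i in test_idx}
print("leaves shared between sets:", train_leaves & test_leaves)
```

Splitting the list of leaves rather than the list of pixels is the design choice that matters here: a random pixel-level split would place neighbouring pixels of the same lesion in both training and test sets and inflate the reported accuracy.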

Keywords: hyperspectral imaging, machine learning, tobacco leaf diseases, disease classification, SHAP analysis

Received: 24 Sep 2025; Accepted: 02 Dec 2025.

Copyright: © 2025 Luo, Yang, Huilai, Yi, Zhou, Yang, Huabao, Yan and Jiang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Min Yan
Chunxian Jiang

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.