AUTHOR=Zeng Qingtian, Sun Jian, Wang Shansong
TITLE=DIC-Transformer: interpretation of plant disease classification results using image caption generation technology
JOURNAL=Frontiers in Plant Science
VOLUME=Volume 14 - 2023
YEAR=2024
URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2023.1273029
DOI=10.3389/fpls.2023.1273029
ISSN=1664-462X
ABSTRACT=Disease image classification systems play a crucial role in identifying disease categories in the field of agricultural diseases. However, current plant disease image classification methods can only predict the disease category and do not explain the characteristics of the predicted disease images. To address this limitation, this paper employs image description generation technology to produce distinct descriptions for different plant disease categories. A two-stage model called DIC-Transformer, which encompasses three tasks (detection, interpretation, and classification), is proposed. In the first stage, Faster R-CNN, with the Swin Transformer as its backbone, is used to detect the diseased area and generate the feature vector of the diseased image. In the second stage, the model uses the Transformer to generate image captions. It then generates an image feature vector weighted by text features, which improves the performance of image classification in the subsequent classification decoder. Additionally, a dataset containing text and visualizations for agricultural diseases (ADCG-18) was compiled. The dataset contains images of 18 diseases and descriptive information about their characteristics. Using ADCG-18, the DIC-Transformer is compared with 11 existing classical caption generation methods and 10 image classification models. The evaluation metrics for captions include BLEU-1 to BLEU-4, CIDEr-D, and ROUGE. DIC-Transformer achieves BLEU-1, CIDEr-D, and ROUGE scores of 0.756, 450.51, and 0.721, which are 0.01, 29.55, and 0.014 higher, respectively, than those of the highest-performing comparison model, Fc. The classification evaluation metrics include accuracy, recall, and F1 score; DIC-Transformer achieves an accuracy of 0.854, a recall of 0.854, and an F1 score of 0.853, which are 0.024, 0.078, and 0.075 higher, respectively, than those of the highest-performing comparison model, MobileNetV2. The results indicate that the DIC-Transformer outperforms the comparison models in both classification and caption generation.
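The abstract reports accuracy, recall, and F1 score for an 18-class problem but does not say how per-class scores are averaged. As an illustration only, the following minimal Python sketch computes the three metrics under the assumption of support-weighted averaging, for which recall equals accuracy by construction. The function name `classification_report`, the class labels, and the toy predictions are hypothetical and are not taken from the paper or from ADCG-18.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return (accuracy, weighted recall, weighted F1) for multi-class labels.

    Assumption: per-class scores are averaged with weights proportional to
    each class's support. The source does not state its averaging scheme.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n

    weighted_recall = 0.0
    weighted_f1 = 0.0
    for cls, sup in support.items():
        tp = correct[cls]
        recall = tp / sup
        precision = tp / predicted[cls] if predicted[cls] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = sup / n
        weighted_recall += weight * recall
        weighted_f1 += weight * f1

    return accuracy, weighted_recall, weighted_f1


# Hypothetical toy labels, purely for illustration.
y_true = ["rust", "rust", "blight", "blight", "mildew", "mildew"]
y_pred = ["rust", "blight", "blight", "blight", "mildew", "rust"]
acc, rec, f1 = classification_report(y_true, y_pred)
print(f"accuracy={acc:.3f} recall={rec:.3f} f1={f1:.3f}")
```

With macro averaging instead, recall and accuracy would generally differ on an imbalanced dataset, so the averaging scheme matters when comparing reported numbers across models.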