Edited by: Spyros Fountas, Agricultural University of Athens, Greece
Reviewed by: Jun Liu, Weifang University of Science and Technology, China; Sanjaya Tripathy, Birla Institute of Technology, Mesra, India
This article was submitted to Technical Advances in Plant Science, a section of the journal Frontiers in Plant Science
†These authors have contributed equally to this work
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Disease spots on grape leaves can be detected using image processing and deep learning methods. However, detection accuracy and efficiency remain challenging. When a disease spot is relatively small, the convolutional feature information is fuzzy and the detection results are unsatisfactory; in particular, detection is difficult when the spot occupies fewer than 32 × 32 pixels in the image. To effectively address this problem, we present an algorithm for the detection of black rot on grape leaves based on super-resolution image enhancement and a convolutional neural network. First, the original image is up-sampled with bilinear interpolation, which enhances local details and increases the number of pixels in the image. Then, the enhanced images are fed into the proposed YOLOv3-SPP network for detection. In the proposed network, the intersection over union (IOU) in the original YOLOv3 network is replaced with the generalized intersection over union (GIOU). In addition, we add a spatial pyramid pooling (SPP) module to improve the detection performance of the network. Finally, the official pre-trained weights of YOLOv3 are used for fast convergence. The test set test_pv from Plant Village and the test set test_orchard collected in orchard fields were used to evaluate the network performance. The results on test_pv show that YOLOv3-SPP detects grape leaf black rot with 95.79% precision and 94.52% recall, which is 5.94% higher in precision and 10.67% higher in recall than the original YOLOv3. The results on test_orchard show that the method proposed in this paper can be applied in the field environment with 86.69% precision and 82.27% recall, and the precision and recall improve to 94.05 and 93.26% on images with a simple background.
Therefore, the detection method proposed in this work effectively addresses the task of detecting small targets and improves the detection of grape leaf black rot.
Grapes are among the most widely grown economic fruits in the world and are often used in the production of wine, fermented beverages, and raisins (Kole et al.,
Currently, machine vision technologies are widely used for detection and classification tasks in various fields. In the early stages of research on grape leaf diseases using machine learning, Agrawal et al. (
Recently, deep learning has been extensively used for the purpose of detection and classification in various applications. Felzenszwalb et al. (
However, it is noteworthy that existing networks, such as AlexNet, RCNN, and Fast-RCNN, suffer from non-negligible missed detections and low recall for small spots, for instance, when the spot occupies fewer than 32 × 32 pixels (Bosquet et al.,
Super-resolution is an image processing method commonly used in the field of remote sensing (Xie et al.,
To improve the detection accuracy of low-resolution small targets in grape black rot spot detection, in this work, we propose a method for detecting black rot on grape leaves based on super-resolution image enhancement and deep learning. The proposed method uses an improved loss function for detection. In addition, we introduce a spatial pyramid pooling (SPP) module into the detection network, which effectively increases the receptive range of the backbone features and separates the most important contextual features. Moreover, the proposed method improves the target recall and precision as compared with the max pooling technique.
The major contributions of this work are as follows.
We enhance the grape leaf images by using bilinear interpolation.
We improve the YOLOv3 network: the IOU in the original YOLOv3 network is replaced with GIOU, and an SPP module is added to improve the detection performance of the network.
We perform experiments and analysis to evaluate the effectiveness of the super-resolution image enhancement and improved YOLOv3 network for grape black rot detection.
The data used for the experiments in this work come from the open Plant Village dataset. We select 1,180 images of grape leaf black rot for disease detection and use LabelImg to annotate the diseased parts of the leaves. The average number of disease spots per image is around 15, with more than 17,000 detection targets in total. Before training, we divide the 1,180 images into training and test sets: 1,072 images for training the network and 108 images as the test set for evaluation, named test_pv. In addition, 108 images of grape leaves with black rot spots collected in the orchard environment form an extra test set, named test_orchard. During network training, we further divide the training set into two parts, a training set and a validation set, at a ratio of 9:1. In a convolutional neural network, the training set is used for model fitting, while the validation set is a separate sample set used during training to tune the hyperparameters of the model and to preliminarily evaluate its ability. The test set is used to evaluate the generalization ability of the final model.

In this work, the number of epochs is 200, the batch size is 8, the learning rate is 0.001, and the size of the input image is 256 × 256. The dataset is organized in the COCO format. We use pre-trained model weights to accelerate convergence. The experiments are conducted on Windows 10 with the PyTorch deep learning framework, on a computer with a GeForce GTX 1070 Ti GPU (8 GB) and an AMD Ryzen 5 1600X six-core processor. The Python language is used for programming.
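The 9:1 training/validation split described above can be sketched as follows; the file names and the fixed seed are illustrative assumptions, not part of the original pipeline.

```python
import random

def split_dataset(image_paths, train_ratio=0.9, seed=42):
    """Shuffle and split a list of annotated image paths into
    training and validation subsets (9:1, as in this work)."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]

# 1,072 training images -> 964 used for fitting, 108 for validation
train, val = split_dataset([f"img_{i:04d}.jpg" for i in range(1072)])
print(len(train), len(val))  # 964 108
```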
To improve the resolution of the original image, we use a software method that produces a single high-quality, high-resolution image from a set of low-quality, low-resolution images. This method of transforming images is known as super-resolution reconstruction (Shen et al.,
The process of bilinear interpolation.
In
where f(Q11), f(Q21), f(R1), f(R2), and f(P) represent the values of the corresponding points, respectively.
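The bilinear interpolation described above can be written out as a minimal sketch. The points Q12 and Q22 (the upper two corners) are assumed from the standard formulation, since the text names only Q11, Q21, R1, R2, and P.

```python
def bilinear(f_q11, f_q21, f_q12, f_q22, x1, x2, y1, y2, x, y):
    """Interpolate the value at P = (x, y) from the four surrounding
    grid points Q11=(x1,y1), Q21=(x2,y1), Q12=(x1,y2), Q22=(x2,y2)."""
    # First interpolate along x at the two rows y1 (-> R1) and y2 (-> R2)
    f_r1 = (x2 - x) / (x2 - x1) * f_q11 + (x - x1) / (x2 - x1) * f_q21
    f_r2 = (x2 - x) / (x2 - x1) * f_q12 + (x - x1) / (x2 - x1) * f_q22
    # Then interpolate along y between R1 and R2 to obtain f(P)
    return (y2 - y) / (y2 - y1) * f_r1 + (y - y1) / (y2 - y1) * f_r2

# Midpoint of a unit cell: the average of the four corner values
print(bilinear(0.0, 1.0, 1.0, 2.0, 0, 1, 0, 1, 0.5, 0.5))  # 1.0
```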
To improve the accuracy of grape leaf black rot spot detection, we design an improved YOLOv3 network. The YOLO (Redmon et al.,
In the original YOLOv3 target detection network, the intersection over union (IOU) of the bounding box and the ground truth is used in the loss function. In the improved network proposed in this work, the IOU is replaced by the GIOU. The GIOU is calculated as shown in (2).
where Ac denotes the area of the minimum closure region of the two boxes, i.e., the ground truth and the predicted bounding box, and U denotes the area of the union of the two boxes. By using the GIOU as a loss function, we avoid the problem that arises when the two target boxes have no overlap, so the gradient is continuously updated and better regression boxes are obtained during training.
The SPP works on the idea of the spatial pyramid (He et al.,
The SPP structure in the improved YOLOv3 proposed in this work.
The YOLOv3-SPP network proposed in this work is implemented by improving the original YOLOv3. The original YOLOv3 uses Darknet-53 as the backbone network, which mainly consists of 5 residual blocks. The structure combines the ideas of the residual neural network and the feature pyramid network (FPN). Up-sampling fusion is adopted to detect targets independently on fused feature maps at three different scales: 16 × 16, 32 × 32, and 64 × 64. The size of the minimum prediction cell is 8 × 8 (image size divided by grid size, 512/64), which effectively captures feature information at both low and high levels. This end-to-end network is not only more accurate but also computationally efficient. In this work, the SPP module is introduced in the Conv6 layer of YOLOv3. The YOLOv3-SPP network structure is shown in
The improved YOLOv3 network proposed in this work.
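An SPP block of the kind used in YOLOv3-SPP can be sketched in PyTorch as follows. The kernel sizes (5, 9, 13) are the common YOLOv3-SPP configuration and are an assumption here, since the paper does not list them explicitly.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling block: parallel max-pools with stride 1
    and 'same' padding, concatenated with the identity on the channel
    axis, so the spatial size is preserved while the receptive field grows."""
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
            for k in kernels
        )

    def forward(self, x):
        # Channels grow 4x (identity + three pooled branches)
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

x = torch.randn(1, 512, 16, 16)
print(SPP()(x).shape)  # torch.Size([1, 2048, 16, 16])
```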
We use the precision (P) and recall (R) as evaluation metrics. The precision of any algorithm is computed as
where TP represents the true positives, i.e., the number of manually labeled grape disease pixels that overlap with pixels in the region automatically detected by the model as grape disease. FP represents the false positives, i.e., the number of pixels in the region manually considered as background but automatically detected by the model as grape disease region pixels. We calculate the recall by using the following expression.
where FN denotes the false negatives, i.e., the number of pixels that are manually labeled as grape disease area pixels but are detected by the model as background pixels.
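The two metrics above reduce to a short computation; as a check, plugging in the TP/FP/FN counts reported later for YOLOv3-SPP on test_pv reproduces the stated figures.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Counts reported for YOLOv3-SPP on test_pv: TP=1427, FP=87, FN=105
p, r = precision_recall(1427, 87, 105)
print(f"{p:.2%} {r:.2%}")  # 94.25% 93.15%
```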
The size of an image in the original dataset is 256 × 256, with an average file size of around 20 KB. After applying the super-resolution technique, the input images are enlarged to 512 × 512, with an average file size of around 100 KB.
The comparison of the original image and the enhanced image.
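In PyTorch (the framework used in this work), the 256 × 256 → 512 × 512 bilinear up-sampling step can be sketched as below; the random tensor is only a stand-in for a real leaf image.

```python
import torch
import torch.nn.functional as F

# A random stand-in for a 256x256 RGB leaf image, in (N, C, H, W) layout
img = torch.rand(1, 3, 256, 256)

# Bilinear up-sampling to 512x512, as applied before detection
enhanced = F.interpolate(img, size=(512, 512), mode="bilinear",
                         align_corners=False)
print(enhanced.shape)  # torch.Size([1, 3, 512, 512])
```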
In this work, we use the annotated images to train the network. The network is trained for 200 epochs which takes around 6 h. The training results are presented in
The training performance of the original dataset YOLOv3-SPP network.
In
The enhanced image dataset with annotation information is fed into the YOLOv3-SPP network for training. Note that the network parameters are the same as those used when training on the original images. This training process takes around 7 h. The training results are presented in
The training performance of YOLOv3-SPP network by using the enhanced dataset.
In order to evaluate the performance of the proposed network in terms of detection accuracy, we train the original YOLOv3 network and the YOLOv3-SPP network by using the original images. The precision and recall of both techniques are compared in
The training results of the detection algorithm by using the original images.
It is evident from
It is evident from
In order to further verify the detection accuracy of the original and the proposed YOLOv3, we present the recognition results of the test_pv in
The detection results of the test_pv before and after the improvement of YOLOv3 network.
Network | TP | FP | FN | Precision | Recall
YOLOv3 | 1,283 | 145 | 259 | 89.85% | 83.75% |
YOLOv3-SPP | 1,427 | 87 | 105 | 94.25% | 93.15% |
The comparison of detection results before and after the improvement of YOLOv3 network.
In this work, bilinear interpolation (BL) is used to enhance the images, and the enhanced images are then used for target detection. Before selecting BL, we compare it against two other super-resolution methods, i.e., nearest interpolation and enhanced deep residual networks (EDSR) (Lim et al.,
We enhance the 1,072 original images in the training dataset with all three aforementioned super-resolution methods. These images are input into the proposed YOLOv3-SPP network for training. The training results are shown in
The training results of YOLOv3-SPP network after image enhancement by using different super-resolution algorithms.
As presented in
Note that the red and blue curves are almost identical, while the green curve is slightly below the other two. After processing the original images with the three super-resolution image enhancement methods, the recalls approach 1, all higher than when training the network directly on the original images. This indicates that super-resolution enhancement of the images before feeding them to the CNN for detection is better than using the original images directly. The evaluation metrics, i.e., precision and recall, make it evident that the network performs best when the images are enhanced using the BL technique rather than nearest interpolation. As compared to the EDSR method, image enhancement with the BL method yields higher precision and almost equal recall; moreover, the BL method involves no complex residual convolutions and is relatively less computationally intensive. We enhance the 108 images in test_pv with the three aforementioned methods and then use the resulting images to perform detection with the proposed YOLOv3-SPP network. The corresponding results are shown in
The evaluation of test_pv by using different super-resolution methods.
Metric | BL | Nearest interpolation | EDSR
TP | 1,448 | 1,430 | 1,450 |
FP | 65 | 78 | 70 |
FN | 84 | 102 | 82 |
Precision | 95.79% | 94.83% | 95.39% |
Recall | 94.52% | 93.34% | 94.65% |
The training results of the proposed YOLOv3-SPP network by using the original and the enhanced images are presented in
The training results of the network before and after image enhancement.
The test results of the proposed YOLOv3-SPP trained with the original and the enhanced images on test_pv are shown in
The detection results on test_pv set.
Test set | Targets | TP | FP | FN | Precision | Recall
Original image of test_pv | 1,532 | 1,427 | 87 | 105 | 94.25% | 93.15% |
Super-resolution of test_pv | 1,532 | 1,448 | 65 | 84 | 95.79% | 94.52% |
The detection results before and after the data is enhanced for test_pv.
The test_pv results demonstrate that super-resolution image enhancement combined with deep learning can improve the detection of grape leaf black rot. An additional test set, test_orchard, was used to test the effectiveness of the proposed method in the orchard environment. It contains 108 images of grape leaves with 1,275 spots of grape leaf black rot from different orchard environments. The results of spot identification by YOLOv3-SPP are shown in
The detection results before and after the data is enhanced for test_orchard.
The detection results on test_orchard set.
Test set | Targets | TP | FP | FN | Precision | Recall
Original image test_orchard set | 1,275 | 1,028 | 186 | 247 | 84.68% | 80.63% |
Super-resolution test_orchard set | 1,275 | 1,049 | 161 | 226 | 86.69% | 82.27% |
The detection precision and recall on test_orchard are lower than on test_pv because the test_orchard images come from orchards, whereas the test_pv images from Plant Village were photographed indoors, and the orchard environment is complex compared with the indoor one. To compare the influence of different image acquisition approaches on detection, the test_orchard images were classified into single-leaf and multi-leaf images based on the number of grape leaves in each image. The detection results of the multi-leaf images are shown in
Detection results of multi-leaf images.
The statistical results of the detection of single-leaf and multi-leaf images for test_orchard.
Image type | Targets | TP | FP | FN | Precision | Recall
Single-leaf | 701 | 610 | 71 | 91 | 89.57% | 87.02% |
Multi-leaf | 574 | 439 | 90 | 135 | 82.99% | 76.48% |
This shows that acquiring images of single grape leaves is more conducive to detection.
The background of the images also affects detection, so test_orchard was divided into two subsets, simple-background images and complex-background images, based on the status of the background. Images that contain fruits, branches, or soil in addition to grape leaves are considered complex background; otherwise, they are considered simple background. The detection results of the complex-background images are shown in
Detection results of grapevine leaf black rot with complex background.
The statistical results of the detection of different background for test_orchard.
Background | Targets | TP | FP | FN | Precision | Recall
Simple background | 712 | 664 | 42 | 48 | 94.05% | 93.26% |
Complex background | 563 | 385 | 119 | 178 | 76.39% | 68.38% |
The tests and analysis on images collected from orchards show that the method proposed in this work can be used for the detection of grape leaf black rot in the natural environment, and the detection results are satisfactory, especially in the case of a simple background. The analysis also provides a reference for field image acquisition: avoid including objects other than the grape leaf in the image.
In this work, we propose an improved YOLOv3-SPP model for the detection of black rot on grape leaves. The method replaces the loss function in the original YOLOv3 with GIOU and adds an SPP module. We enhance the training images of YOLOv3-SPP with the BL super-resolution method. The model is evaluated on two test sets, one from the Plant Village dataset and one from orchards. The results show that the YOLOv3-SPP network performs better for grape leaf black rot detection, with a precision of 95.79% and a recall of 94.52% on the Plant Village test set. On the orchard test set, the precision is 86.69% and the recall is 82.27%, which also outperforms detection on the original (non-enhanced) images of that test set. In addition, the precision and recall improve to 94.05 and 93.26% for images without fruits, branches, or soil in the background. Moreover, enhancing the training set with the BL method improves both precision and recall. The current method requires image enhancement before training the deep learning network; in future work, we will attempt to combine these steps.
The original contributions presented in the study are included in the article/
JZ, MC, and HY conceived the idea and proposed the method. JZ and QW contributed to the preparation of equipment and acquisition of data, and wrote and tested the code. JZ, QW, and MC validated the results. JZ and HY wrote the paper. JZ, HY, and ZC revised the paper. All authors read and approved the final manuscript.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: