A lightweight convolutional neural network for recognition of severity stages of maydis leaf blight disease of maize

Maydis leaf blight (MLB) of maize (Zea mays L.), a serious fungal disease, is capable of causing up to 70% damage to the crop under severe conditions. Disease severity is considered one of the important factors for proper crop management and overall crop yield. Therefore, it is essential to identify the disease at the earliest possible stage to limit yield loss. In this study, we created an image database of the maize crop, MDSD (Maydis leaf blight Disease Severity Dataset), containing 1,760 digital images of MLB disease collected from different agricultural fields and categorized into four groups viz. healthy, low, medium and high severity stages. Next, we proposed a lightweight convolutional neural network (CNN) to identify the severity stages of MLB disease. The proposed network is a simple CNN framework augmented with two modified Inception modules, making it a lightweight and efficient multi-scale feature extractor. The proposed network achieved 99.13% classification accuracy with an f1-score of 98.97% on the test images of MDSD. Furthermore, the class-wise accuracy levels were 100% for healthy samples, 98% for low severity samples and 99% for the medium and high severity samples. In addition, our network significantly outperforms the popular pretrained models, viz. VGG16, VGG19, InceptionV3, ResNet50, Xception, MobileNetV2, DenseNet121 and NASNetMobile, on the MDSD image database. The experimental findings revealed that our proposed lightweight network excels at identifying the images of severity stages of MLB disease despite complicated background conditions.


Introduction
In India, maize (Zea mays L.) is the third most important cereal grain crop. The crop is grown in both the kharif and rabi seasons across the country (Kaur et al., 2020). It is considered the 'Queen of Cereals' due to its multiple use cases, such as a staple food for human beings, feed and fodder for livestock, raw material for several processed foods and industrial products, a rich source of starch and so on. As per the reports, around 31.65 mt of maize was produced across the country during 2020-2021 (ICAR-IIMR, 2021). Every year, around 13.2% of the total crop yield is damaged by several disease-causing pathogens (Aggarwal et al., 2021). Among these diseases, Maydis leaf blight or MLB (aka Southern corn leaf blight) is a serious fungal disease across the maize-growing regions of India. The country's warm and humid climate is extremely favorable for disease development (Malik et al., 2018). MLB is caused by the fungus Bipolaris maydis (Nisik. & Miyake) Shoemaker 1959. In the early stages, its symptoms appear as small, oval to diamond-shaped, necrotic to brown-colored lesions on the leaf surfaces. These lesions elongate as the disease progresses (Aggarwal et al., 2021). It is reported that this disease alone is capable of damaging up to 70% of the total crop yield under severe conditions. Disease severity is an important parameter that measures the intensity of disease symptoms in the affected portion of the crop and is crucial for disease management. Therefore, our first and foremost aim must be to identify and control the disease at the earliest possible severity stage to minimize the risk of potential yield loss of the maize crop. However, the conventional approach for identifying the severity stages involves visual observations and laboratory analysis.
These approaches, however, require highly trained and experienced personnel, which often makes them practically infeasible. Hence, there is a pressing need for a precise, quick, cost-effective and automated approach to identify the disease severity stages under field conditions.
In recent years, several computer vision techniques have been applied to challenging agricultural problems (Kamilaris and Prenafeta-Boldú, 2018). In this connection, convolutional neural networks (aka CNNs) are considered the benchmark for different image-based identification problems in the agriculture domain. CNN approaches have eased the image recognition process by automatically extracting features from the images, as compared to the hand-engineered feature extraction of traditional machine learning approaches (LeCun et al., 2015). For the diagnosis of diseases as well as their severity stages, CNNs have shown significantly better results than traditional image processing and machine learning techniques. In this context, a very limited number of works have been reported to diagnose disease severity stages in maize crop using in-field images. Therefore, we proposed a novel lightweight CNN for identifying the severity stages of MLB disease in maize crop. This network would be a practical and viable solution for the farm community of the country. The main contributions of this study are provided below:
• Created an image database known as MDSD (Maydis leaf blight Disease Severity Dataset) containing digital images of maize leaves infected with MLB disease covering all severity stages. The images of MDSD were collected in a non-destructive manner with natural field backgrounds from different agricultural fields.
• Proposed a lightweight and efficient convolutional neural network (CNN) model augmented with modified Inception modules. The proposed network is trained and validated on the images of the MDSD database for automatic identification of severity stages of MLB disease.
• Conducted a comparative analysis of the prediction performance between the proposed model and a few popular state-of-the-art pretrained networks to evaluate the effectiveness of the proposed network.
This article is organized into six sections. Section 1 (the present section) highlights the importance of the maize crop, the devastating effect of MLB disease, the constraints of conventional approaches to disease recognition and management, and the importance of computer vision-based technologies; Section 2 explores and briefly discusses the related works relevant to the present study; Section 3 explains the materials and methodologies used to carry out the current study; Section 4 reports and discusses the experimental results and findings of the study; Section 5 presents the ablation studies; and Section 6 concludes the whole study, highlighting its impact and crucial findings and outlining the future perspectives of this study.

Related work
In this section, we briefly discuss the methodologies proposed by research works from across the globe for recognizing diseases as well as their severity stages. In recent years, deep learning-based techniques have been gaining momentum for identifying diseases of several crops. Several authors, like Mohanty et al. (2016); Sladojevic et al. (2016); Ferentinos (2018); Barbedo (2019) and Atila et al. (2021), focused on identifying the diseases of crops by applying a variety of deep learning models such as state-of-the-art networks, transfer learning models, custom-defined models, hybrid CNN models and many more. These works targeted identifying diseases of multiple crops with a single deep learning model. In contrast, most of the reported works aimed at crop-specific disease identification problems, such as for the rice crop (Chen et al., 2020; Rahman et al., 2020), wheat crop (Picon et al., 2019), tomato crop (Fuentes et al., 2018; Zhang et al., 2018), maize crop (DeChant et al., 2017; Priyadharshini et al., 2019; Lv et al., 2020; Haque et al., 2021; Haque et al., 2022a; Haque et al., 2022b), etc. The experimental findings of these research works reported significant results by employing several types of CNN-based networks to identify the diseases using color images. Some of these works used lab-based images of crop diseases, such as PlantVillage, for their model development, while some have used in-field images.
Nowadays, the identification of severity stages of diseases has also attracted the attention of researchers. Significant works have been carried out to identify disease severity stages using digital images. Wang et al. (2017) applied transfer learning of popular deep CNN models to diagnose disease severity in apple plants and obtained more than 94% classification accuracy on the test dataset. They used publicly available images and assessed them into four categories of severity stages for their experiment. Liang et al. (2019) proposed a robust approach for disease diagnosis and disease severity estimation of several crops using deep learning models. Verma et al. (2020) worked on tomato late blight disease, and Prabhakar et al. (2020) worked on estimating the severity stages of tomato early blight disease using deep CNN models. Recently, Sibiya and Sumbwanyambe (2021) used deep CNN models to classify images of common rust disease of maize crop into four classes of severity levels. They applied fuzzy logic-based techniques to automatically categorize the diseased images into severity categories. Another study classified the stem rust disease of wheat crop into four severity categories using deep convolutional neural networks. Chen et al. (2021) worked on estimating the severity of rice bacterial leaf streak disease using a segmentation-based approach. Wang et al. (2021) proposed an image-segmentation-based approach integrating a deep CNN model to recognize severity stages of downy mildew, powdery mildew and viral diseases of cucumber crops. Ji and Wu (2022) proposed a fuzzy logic integrated deep learning model for detecting the severity levels of grape black measles disease. Liu et al. (2022) developed a two-stage CNN model for diagnosing the severity of Alternaria leaf blotch disease of the apple plant. A summary of the previous works is provided in Table 1.

Flow of the proposed approach
The workflow of the proposed disease severity identification approach is depicted graphically in Figure 1. First, digital images of MLB disease of maize crop were captured from the fields and the MDSD image database was created. Next, the images were labelled into their respective severity categories based on domain experts' observations and saved into respective folders on the storage disk. Then, the images were pre-processed and augmented to increase the training dataset. After that, the whole image dataset was split into two categories viz. training and testing sets, and the proposed CNN model was trained and validated. Finally, based on the performance evaluation, the MLB disease-severity identification model and its architecture were finalized.

Image acquisition
In this study, we created an image database known as MDSD containing digital images of maize leaves affected with MLB disease. The images were collected in a non-destructive manner from several agricultural plots located at Bidhan Chandra Krishi Visvavidyalaya, Kalyani (22.9920°N, 88.4495°E) and ICAR-Indian Agricultural Research Institute, New Delhi (28.6331°N, 77.1525°E) during 2018-2020. Digital cameras (Nikon D3500 W/AF) and smartphones (Redmi Y2 and Asus Max Pro M1) were used for capturing the images under normal daylight conditions. We collected the images of MLB disease by focusing the camera lens on the symptomatic portions of leaves starting from the disease incidence stage to the highest severity stage with complex field backgrounds.

Disease severity stages
The images of MLB disease were thoroughly verified and categorized into four groups based on their symptomatic characteristics viz. healthy (no symptoms), low severity, medium severity and high severity stages, as provided in Table 2. The categorization into severity groups was done under the strict supervision of subject matter specialists (domain experts) in maize pathology at ICAR-IIMR, Ludhiana, India. Sample images of each category of MLB disease are shown in Figure 2.
Figure 1. Overall framework of the proposed approach for recognition of severity stages of MLB disease.

Image pre-processing
Prior to the training process, slight pre-processing of the raw images was required for better modelling. First, unwanted images (duplicate, noisy, out-of-focus and blurred images) were discarded. After that, the images were resized to 256 × 256 pixels, keeping hardware constraints in mind and for better interpretation by the proposed model.

Image augmentation
In order to increase the number of images for model training, synthetic images were generated and augmented with the original dataset. Here, we used two techniques to generate the synthetic images: geometric transformation and brightness adjustment. The overall summary of images in the MDSD database is provided in Table 2.

Geometric transformation
Geometric transformation means transforming the orientation of the images. In this study, we randomly applied several geometric transformations to generate artificial images, involving rotation (90°, 180° and 270°), flipping (top-down and left-right), skewing and zooming. The geometric transformations were applied using the 'Augmentor' library (Bloice et al., 2019), which provides translation-invariant transformations of the images.
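As an illustration, the rotation and flip transforms above can be sketched with NumPy. This is a hypothetical stand-in for the Augmentor pipeline actually used; the function name and random seed are ours, not the authors':

```python
import numpy as np

def random_geometric_transform(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply one randomly chosen geometric transform (rotation or flip),
    mirroring the kinds of augmentations described in the text."""
    choice = rng.integers(0, 5)
    if choice == 0:
        return np.rot90(img, k=1)   # rotate 90 degrees
    if choice == 1:
        return np.rot90(img, k=2)   # rotate 180 degrees
    if choice == 2:
        return np.rot90(img, k=3)   # rotate 270 degrees
    if choice == 3:
        return np.flipud(img)       # top-down flip
    return np.fliplr(img)           # left-right flip

# Example: augment a dummy 256 x 256 RGB image
rng = np.random.default_rng(42)
img = np.zeros((256, 256, 3), dtype=np.uint8)
aug = random_geometric_transform(img, rng)
print(aug.shape)
```

Skewing and zooming are omitted here because they need interpolation; a library such as Augmentor handles those cases.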

Brightness adjustment
As the images were captured using different devices and at different times, they were not homogeneous in terms of illumination. Light intensity on the diseased images greatly affects computer vision techniques. Hence, we applied a gamma function to our images to generate synthetic images with different brightness levels. The gamma function is an image processing technique that applies a non-linear adjustment to individual pixel values to encode and decode the luminance of an image. The gamma function can be defined mathematically by the following formula (eq. 1):

i_out = a × (i_in)^(1/g)   (eq. 1)

where i_in is the input image with pixel values scaled from [0, 255] to [0, 1], g is the gamma value, i_out is the output image scaled back to [0, 255] and a is a constant (mainly equal to 1). Gamma values g < 1 shift the image towards the darker end of the spectrum, gamma values g > 1 make the image brighter, and g = 1 leaves the input image unchanged.
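A minimal sketch of this gamma adjustment follows. It assumes the common convention in which the exponent is 1/g (so that g > 1 brightens and g < 1 darkens, consistent with the behaviour described above); the function name is illustrative:

```python
import numpy as np

def adjust_gamma(img: np.ndarray, gamma: float, a: float = 1.0) -> np.ndarray:
    # eq. 1: scale pixels to [0, 1], apply a * i ** (1 / gamma),
    # then scale back to [0, 255]. gamma > 1 brightens, gamma < 1 darkens.
    i_in = img.astype(np.float64) / 255.0
    i_out = a * np.power(i_in, 1.0 / gamma)
    return np.clip(i_out * 255.0, 0, 255).astype(np.uint8)

img = np.full((2, 2), 64, dtype=np.uint8)
brighter = adjust_gamma(img, gamma=2.0)
darker = adjust_gamma(img, gamma=0.5)
print(brighter[0, 0], darker[0, 0])
```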

Proposed lightweight CNN model
In this study, we proposed a lightweight convolutional neural network (CNN) to identify the severity stages of MLB disease of maize crop. In this network, we incorporated modified Inception modules into a simple CNN framework, enhancing the network's fine-grained and multi-scale feature extraction capability. The proposed model is composed of several computational modules, which are discussed in the following subsections:

CRB layer (crb)
The CRB is the most important layer in the proposed lightweight model; it encompasses three popular operations viz. convolution, ReLU and batch normalization, as shown in Figure 3. The main function of the CRB layer is to generate pattern detectors from the images in the form of feature maps.

Convolution operation (conv)
The convolution operation extracts inherent features (aka feature maps) from the input images using a set of kernels/filters (LeCun et al., 1998). The kernels/filters are smaller than the input images, e.g. 3 × 3 or 1 × 1. Mathematically, the convolution operation is expressed by eq. 2:

z_k^l = w_k^l * x_k^l + b_k^l   (eq. 2)

where z_k^l denotes the output feature map of the k-th input at the l-th layer of the model, x_k^l denotes the k-th input feature map at the l-th layer of the model, and w_k^l and b_k^l denote the weights and bias at the l-th layer of the model.

ReLU operation (ReLU)
ReLU (Rectified Linear Unit) is the most widely used activation function in CNN models; it enhances the non-linear attributes of the input feature maps (Haque et al., 2021). The ReLU function requires little computation, hence speeding up the overall training process. Its convergence is faster than that of other activation functions, and it induces sparsity in the feature maps. It is expressed by the following equation (eq. 3):

ReLU(z_k) = max(0, z_k)   (eq. 3)

where z_k denotes the output feature map of the k-th input feature map.

Batch normalization operation (BN)
The batch normalization process transforms a batch of inputs (say m) to have zero mean and unit standard deviation. It speeds up the training process and handles the internal covariate shift of the input feature maps (Ioffe, 2017). Batch normalization is expressed by the following equations (eq. 4 and eq. 5):

ẑ_i = (z_i − E(z_i)) / √(var(z_i))   (eq. 4)
y_i = g × ẑ_i + b   (eq. 5)

where y_i denotes the output feature map, ẑ_i is the normalized input feature map, E(z_i) denotes the mean of the input feature map z_i, var(z_i) denotes the variance of the input feature map z_i, and g and b are the trainable scaling and offset factors of the network.
Figure 3. Framework of the CRB module of the proposed model.
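The three CRB operations (eqs. 2-5) can be sketched end to end in NumPy. This is an illustrative single-channel forward pass, not the authors' TensorFlow implementation; shapes and values are toy examples:

```python
import numpy as np

def conv2d(x, w, b):
    # 'valid' convolution (really cross-correlation, as in CNN frameworks), eq. 2
    h, wd = x.shape
    kh, kw = w.shape
    out = np.zeros((h - kh + 1, wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return out

def relu(z):
    # eq. 3: element-wise max(0, z)
    return np.maximum(0.0, z)

def batch_norm(z, g=1.0, b=0.0, eps=1e-5):
    # eqs. 4-5: normalise to zero mean / unit variance, then scale and shift
    z_hat = (z - z.mean()) / np.sqrt(z.var() + eps)
    return g * z_hat + b

def crb(x, w, bias):
    # the CRB layer: Convolution -> ReLU -> Batch Normalization
    return batch_norm(relu(conv2d(x, w, bias)))

x = np.arange(25, dtype=np.float64).reshape(5, 5)   # toy 5 x 5 input
w = np.ones((3, 3)) / 9.0                           # 3 x 3 averaging kernel
y = crb(x, w, bias=0.0)
print(y.shape)
```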

Maxpool module (pool)
The maxpooling operation extracts the maximum element from the respective region of the feature map covered by the pooling kernel (Chollet, 2021). The maxpool layer outputs the most promising features from the input images without adding any extra trainable parameters to the network. In the proposed model, we applied maxpooling with a kernel size of 3 × 3 and strides of 1 and 2.
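The pooling step above can be sketched as a sliding-window maximum in NumPy (an illustrative single-channel version; the real model applies it per feature map):

```python
import numpy as np

def maxpool2d(x, k=3, stride=1):
    # slide a k x k window with the given stride and keep the maximum element
    h, w = x.shape
    oh = (h - k) // stride + 1
    ow = (w - k) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + k, j * stride:j * stride + k].max()
    return out

x = np.arange(16, dtype=np.float64).reshape(4, 4)
print(maxpool2d(x, k=3, stride=1))   # 2 x 2 output of window maxima
```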

Modified inception module (incep)
Generally, the 'Inception' module of the Inception networks approximates an optimal sparse structure using the available dense components of the network (Szegedy et al., 2015; Szegedy et al., 2016). In this study, we proposed a modified Inception module with a few changes to the kernel sizes, number of filters and parallel convolutions. In the proposed Inception module, we applied symmetrical (1 × 1) and asymmetrical convolution kernels in parallel with a maxpool operation. Here, we factorized convolutions with spatial filters of size n × n (e.g. 3 × 3 or 5 × 5) into asymmetrical convolutions with filter sizes n × 1 and 1 × n (e.g. 3 × 1 and 1 × 3; 5 × 1 and 1 × 5). Prior to each asymmetrical convolution, one 1 × 1 convolution kernel was incorporated to reduce the representational bottleneck of the network. We also applied ReLU after each convolution operation to induce sparsity in the feature maps (as shown in Figure 4). Finally, the outputs from all parallel convolutions and the maxpool layer were concatenated and passed to the next layer of the network.

GAP module (gap)
The GAP or Global Average Pooling is a pooling operation that produces a feature vector by computing the average of each feature map. It aggressively summarizes the presence of a feature in an image by downsampling each entire input feature map to a single value (Lin et al., 2013). The purpose of the GAP layer is to reduce the chance of overfitting, as it does not add any extra learnable parameters to the network.
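The GAP operation reduces to a single mean per feature map, as this short NumPy sketch shows (channel-last layout assumed for illustration):

```python
import numpy as np

def global_average_pool(feature_maps):
    # feature_maps: (H, W, C) -> vector of length C, one average per map
    return feature_maps.mean(axis=(0, 1))

fm = np.ones((8, 8, 4))   # toy stack of four 8 x 8 feature maps
fm[:, :, 1] = 3.0
vec = global_average_pool(fm)
print(vec)
```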

Softmax layer (softmax)
A softmax layer was added at the end of the proposed CNN model. The softmax layer contains the same number of nodes as the number of classes in the dataset under study. The softmax function generates output probability values from the input feature vectors; it converts the non-normalized feature vectors of the network into a probability distribution over the predicted output classes (Bouchard, 2007). Mathematically, the softmax function is expressed as the following equation (eq. 6):

softmax(z_j) = exp(z_j) / Σ_k exp(z_k)   (eq. 6)

where z_j denotes the j-th item of the output feature vector.
Figure 4. Architecture of the proposed modified Inception module.
The overall framework of the proposed network is provided graphically in Figure 5. A detailed layer-wise description, including layer names, kernel/filter sizes, strides, output shapes, number of kernels/filters and number of trainable parameters, is provided in Table 3.
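The softmax function of eq. 6 can be sketched as follows (the max-subtraction is a standard numerical-stability trick, not something stated in the text; the logit values are illustrative):

```python
import numpy as np

def softmax(z):
    # eq. 6, with the max subtracted for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1, -1.0])   # one logit per severity class
probs = softmax(logits)
print(probs.sum())   # probabilities sum to 1
```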

Evaluation metrics
We evaluated the prediction performance of the proposed CNN model on the images of the testing data. We computed the confusion matrix (CM), which presents the model's prediction performance in a tabular fashion. In the CM, the row elements denote the actual values, while the column entries present the predicted values. The diagonal elements represent the correct predictions (i.e. true positives (TP) and true negatives (TN)), while the incorrect predictions (i.e. false positives (FP) and false negatives (FN)) are denoted by the off-diagonal elements. We also computed the relevant evaluation metrics, namely recall, precision and f1-score.
Figure 5. Overall architectural framework of the proposed CNN model.

Recall: recall measures the proportion of actual positives that were correctly identified. It is computed as

Recall = TP / (TP + FN)

Precision: precision measures the proportion of predicted positives that were actually correct. It is computed as

Precision = TP / (TP + FP)

f1-Score: the f1-score is a measure that tells us about the robustness of the model. It is the harmonic mean of precision and recall:

f1-Score = 2 × (Precision × Recall) / (Precision + Recall)
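These metrics can be computed per class directly from a confusion matrix, as in this generic sketch (the 2 × 2 matrix below is a toy example, not the paper's results):

```python
import numpy as np

def per_class_metrics(cm):
    # cm[i, j]: number of samples of true class i predicted as class j
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)              # diagonal = correct predictions
    fp = cm.sum(axis=0) - tp      # column sums minus diagonal
    fn = cm.sum(axis=1) - tp      # row sums minus diagonal
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

cm = [[8, 2],
      [1, 9]]
p, r, f1 = per_class_metrics(cm)
print(np.round(p, 3), np.round(r, 3), np.round(f1, 3))
```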

Results and discussion
In this study, 1,760 images of MLB disease of maize were collected into the MDSD database from agricultural fields and then augmented to 14,389 images. The MDSD image database is categorized into four groups viz. healthy, low severity, medium severity and high severity, based on the intensity levels of the disease symptoms on the leaves. We randomly split the whole dataset into training and testing sets in the ratio of 80:20. The proposed convolutional neural network (CNN) was trained and tested with the MDSD dataset for automated diagnosis of severity stages of MLB disease. In this approach, several combinations of CRB and Inception modules were attempted; however, the CNN with 10 CRB and 2 modified Inception modules gave the optimal classification performance. Furthermore, to inspect the effectiveness of the proposed model, we also employed a few state-of-the-art pretrained models, viz. VGG16, VGG19, InceptionV3, ResNet50, Xception, MobileNetV2, DenseNet121 and NASNetMobile, in this study. All the models were trained and tested with similar hyperparameters and configurations, as shown in Table 4. All the model architectures were implemented in Python using TensorFlow, an open-source deep learning framework. We performed all the experimental analyses utilizing the high computation power of Tesla V100 GPUs in NVIDIA DGX servers. In the present study, we trained and validated our proposed CNN model for 500 epochs using a batch size of 128 (according to hardware feasibility) on the MDSD database. Our proposed model achieved a training accuracy of 99.78% with a loss of 0.046, whereas the testing accuracy was 99.13% with a loss of 0.0317. We present the epoch-wise training and testing behavior (for both classification accuracy and loss) of the proposed model in Figures 6A, B to showcase the model's efficiency on the images of the MDSD database.
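The 80:20 split can be sketched as a simple index shuffle. This is a generic illustration under our own seed, not the authors' exact procedure (and it ignores stratification by class):

```python
import numpy as np

def train_test_split(n_samples, test_ratio=0.2, seed=42):
    # shuffle indices and split them 80:20, as described for the MDSD dataset
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_ratio)
    return idx[n_test:], idx[:n_test]

train_idx, test_idx = train_test_split(14389)   # augmented MDSD size
print(len(train_idx), len(test_idx))
```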
The experimental findings on the testing set of the MDSD image database show that our proposed model achieved an overall classification accuracy (99.13%) far better than that of the employed pretrained networks, as shown in Figures 7A, B. Among the state-of-the-art pretrained models, DenseNet121 achieves the highest accuracy of 95.65% on the test dataset (shown in Figure 7A); the rest of the models achieve accuracies within 85 to 92%. The proposed model also obtained the lowest testing loss (0.0317) of all, while the DenseNet121 model reaches 0.1063 (as seen in Figure 7B). These experimental results demonstrate the superiority and effectiveness of the proposed model over the popular pretrained models.
FIGURE 6
Epoch-wise behaviour of training and testing of the proposed CNN model: (A) classification accuracy, training vs testing; and (B) loss, training vs testing.

FIGURE 7
Comparative performance of the proposed model and the pretrained models: (A) model-wise classification accuracies on test data; and (B) model-wise testing loss.

The interpretation of the model's performance based on classification accuracy and training loss alone would not be sufficient. Hence, we calculated the average f1-scores of all the models to evaluate them in an unbiased way. We present the obtained f1-scores of the models (proposed as well as pretrained) in Figure 8. It is quite evident from Figure 8 that our proposed model obtained a higher f1-score than the pretrained models on the testing dataset of MLB disease. Our proposed model's prediction performance on the MLB disease dataset was far better than that of the popular pretrained models. This result implies that our proposed CNN model can identify unknown images of the MDSD database and classify them into their respective severity classes.
To better understand the prediction performance of our proposed model, we present the confusion matrix in Figure 9. Figure 9 shows that our proposed model was 100% accurate in predicting the healthy samples, 98% accurate for the low severity samples, and 99% accurate for both the medium and high severity samples. Moreover, we also computed recall, precision and f1-score to present the class-wise prediction performance of the proposed model, as shown in Table 5. Table 5 shows that the proposed model obtained quite high scores (approx. 99%) for all three metrics. It is evident from the confusion matrix and the performance metrics (recall, precision and f1-score) that our model performed remarkably well for all severity classes of MLB disease in the MDSD database. The model's performance was appreciable not only for healthy and high severity images but also for low severity images, in which the symptoms of the disease are very mild. This result supports the significance of the proposed CNN model in recognizing severity levels for unknown images of MLB disease of maize crop.
From the overall analysis of all the employed models, it is apparent that our proposed lightweight CNN model outperforms the popular pretrained models in identifying the severity stages of MLB disease. Most importantly, the proposed model can identify images of MLB disease severity even under complex background conditions. This makes the proposed CNN model a practical and cost-effective approach for identifying the appropriate disease severity stages for researchers, subject matter specialists and farmers under field conditions.

FIGURE 8
f1-scores of the models obtained on the testing dataset.

FIGURE 9
Confusion matrix of the proposed model on the testing dataset.

Ablation studies
In this section, we present the ablation studies for selecting the optimum number of Inception modules and the best optimization function for the proposed model. First, we trained our proposed CNN model with 0, 1, 2 and 3 Inception modules. The experimental results reported in Figures 10A, B depict that the proposed CNN framework achieved around 95% testing accuracy without any Inception module. The accuracy kept increasing as the number of Inception modules increased, as shown in Figure 10A, and the proposed model showed the best prediction performance (classification accuracy of 99.13%) with two Inception modules. From Figure 10B, it is apparent that the testing loss decreased as the number of Inception modules increased, and the proposed model achieved the lowest testing loss (i.e. 0.0317) with two Inception modules. Hence, two Inception modules were selected for the proposed CNN model.
We also conducted experiments with different optimization functions, which play a huge role in model convergence and feature learning. We experimented with four optimization functions viz. stochastic gradient descent (SGD), RMSProp, Adam and Nadam in the proposed model and present the results in Figure 10C. Among the four optimization functions, the Nadam function showed the best performance on the MLB disease severity dataset of maize crop.

Conclusion
In this study, we addressed a major issue of crop management, i.e., disease severity staging, by proposing a deep learning-based diagnosis approach. In this regard, we created an image database known as MDSD containing images of MLB disease at four different severity stages viz. healthy, low severity, medium severity and high severity. Next, we proposed a novel lightweight CNN model to identify the severity stages of MLB disease using the images of MDSD. The proposed CNN model's basic framework comprises a stack of computational layers, including the CRB layer (convolution, ReLU and batch normalization), augmented with two modified Inception modules. On the test dataset, our proposed model reported 99.13% classification accuracy with an f1-score of 98.97%, which is superior to most of the popular state-of-the-art pretrained models. Furthermore, the overall experimental analysis demonstrated that our proposed CNN model efficiently captures the promising features of images with complex backgrounds and classifies them into their respective severity classes. Therefore, this automated approach for identifying the severity stages of MLB disease using the proposed CNN model would be feasible and cost-effective for the farm community and subject matter specialists. However, in the present study, the proposed CNN model only applies to the MLB disease of maize crop. In the future, the study can be expanded to identify severity stages of other major diseases of maize and of other crops, subject to the availability of image datasets.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.