Edited by: John-Lewis Zinia Zaukuu, Kwame Nkrumah University of Science and Technology, Ghana

Reviewed by: Vijander Singh, Netaji Subhas University of Technology, India; Balkis Aouadi, Hungarian University of Agriculture and Life Sciences, Hungary

This article was submitted to Nutrition and Food Science Technology, a section of the journal Frontiers in Nutrition

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Insect infestation is an important indicator in inspection and quarantine and must be checked in imported and exported fruits such as “Yali” pears (a duck head-shaped pear variety). Detecting insect-affected Yali pears online, in real time, and accurately during commercial sorting would therefore improve the competitiveness of Yali pears in the import and export trade. This paper aims to establish an online, real-time discrimination model for recessive insect damage in Yali pears during commercial sorting. The visible-near-infrared (Vis-NIR) spectra of Yali pear samples were pretreated to reduce noise interference and improve the spectral signal-to-noise ratio (SNR). The Competitive Adaptive Reweighted Sampling (CARS) method was adopted to select feature modeling variables, while Partial Least Squares Discriminant Analysis (PLS-DA), Support Vector Machine (SVM), and Convolutional Block Attention Module-Convolutional Neural Networks (CBAM-CNN) were used to establish online discriminant models. T-distributed Stochastic Neighbor Embedding (T-SNE) and Gradient-weighted Class Activation Mapping (Grad-CAM) were used to display the clustering and the attention distribution of the spectral features in the deep learning models. The results show that the online discriminant model built with SGS pretreatment and the CBAM-CNN deep learning method performs best, with accuracies of 96.88% on the calibration set and 92.71% on the validation set. The prediction time for a single pear is 0.032 s, which meets the online sorting requirements.

Yali pears (duck head-shaped pears) are rich in nutrients, taste good, and have a huge consumer market (

Visible-near-infrared (Vis-NIR) spectroscopy is a non-destructive detection method which has been widely used in the internal quality detection of fruits and crops (

This paper proposes an online, rapid, non-destructive method based on Vis-NIR for detecting internal defects of Yali pears, with the aim of rapidly screening recessive insect damage in Yali pears during commercial sorting. First, different pretreatment methods are applied to the original spectra to suppress noise, enhance information, and improve the signal-to-noise ratio: Savitzky-Golay Smoothing (SGS), Spectral Standardization (SS), Max-min Normalization (MMN), and Standard Normal Variate Transformation (SNV). Second, Competitive Adaptive Reweighted Sampling (CARS) is used to select the optimal feature variables. Last, Partial Least Squares Discriminant Analysis (PLS-DA) and Support Vector Machines (SVM) are used to establish shallow learning models for online discrimination of Yali pears, and Convolutional Block Attention Module-Convolutional Neural Networks (CBAM-CNN) is used to establish a deep learning model for the same task. The goal is an accurate and rapid online discrimination model for recessive insect pests inside fruits that improves the quality sorting of fresh fruits.

The online sorting device for insect-affected Yali pears is shown in

Schematic diagram of the Vis-NIR online sorting device for Yali pears.

Schematic diagram of light source (halogen tungsten lamps) layout.

Prior to spectral acquisition, the parameters of the spectrometer must be calibrated. First, the light sources were turned on for 30 min to ensure the thermal stability of light sources and detector. Second, a PTFE ball with a diameter of about 70 mm was used to calibrate each fruit cup, so as to maintain the consistency of measurement background. In the fruit cup, the pear samples were placed in a way that the connection direction between the pear stalk (C) and the pear pedicel (D) was always perpendicular to the running direction of the conveyor belt.

The working process of the Vis-NIR online sorting device for Yali pears is shown as follows. The Yali pear samples are placed according to

The experimental samples were 960 Yali pears produced in Hebei province. The samples were transported to the laboratory in refrigerated vehicles and then stored at a constant temperature of 20 °C for 21 days. Prior to the experiment, stains and moisture were removed from the surface of the pears. After the Vis-NIR spectra of the samples had been collected, each pear was cut open to identify internal pests. The first cut was made along the A–B connection direction and the second along the C–D connection direction, as shown in

Insect-affected Yali pear samples.

A total of 960 Yali pear samples were divided at a ratio of 4:1 into a calibration set and a validation set using the SPXY algorithm. SPXY is developed on the basis of the Kennard-Stone (KS) algorithm; it ensures the diversity and representativeness of the selected samples by calculating the distances between the spectral feature values of different samples. SPXY can effectively cover the multi-dimensional vector space, avoiding the over-fitting or poor prediction that results when samples differ too little or are identical, and thereby improves the stability and accuracy of the model (

Sample set information.

| Sample set | Healthy pears | Insect-affected pears | Total number | Proportion (%) |
| --- | --- | --- | --- | --- |
| Calibration set | 468 | 300 | 768 | 80 |
| Validation set | 124 | 68 | 192 | 20 |
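The 4:1 SPXY split can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the usual SPXY formulation, in which Kennard-Stone selection is run on a joint distance that adds the normalized spectral (x) distance and the normalized response (y) distance, and it assumes the class label is used as y. The function names and the random demo data are hypothetical.

```python
import numpy as np

def pairwise_dist(a):
    """Euclidean distance matrix computed from the Gram matrix (memory friendly)."""
    sq = (a ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * a @ a.T
    return np.sqrt(np.maximum(d2, 0.0))

def _scaled(d):
    """Divide a distance matrix by its maximum, guarding against an all-zero matrix."""
    m = d.max()
    return d / m if m > 0 else d

def spxy_split(x, y, n_cal):
    """Return (calibration indices, validation indices) chosen by SPXY."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(x), -1)
    d = _scaled(pairwise_dist(x)) + _scaled(pairwise_dist(y))  # joint x-y distance

    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(x)) if k not in selected]

    # Repeatedly add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_cal:
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)

# Hypothetical demo data: 50 "spectra" with 20 variables, labels 0 (healthy) / 1 (insect-affected).
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(50, 20))
y_demo = np.array([0, 1] * 25)
cal_idx, val_idx = spxy_split(x_demo, y_demo, n_cal=40)  # 4:1 ratio
print(len(cal_idx), len(val_idx))
```

With 960 samples and `n_cal=768`, the same call would produce set sizes matching the table above; which samples are selected depends on the actual spectra.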

As shown in

For the feature variable selection method, CARS method is one of the most commonly used methods in the selection of fruit spectral variables, which combines Monte Carlo sampling and PLS model regression coefficients (

Partial Least Squares Discriminant Analysis (PLS-DA) is a multivariate statistical method under supervision, which integrates the basic functions of PCA, canonical correlation analysis and multiple regression analysis, and can compress data and extract feature information (

Support Vector Machine (SVM) is an algorithm based on small-sample statistics theory, which finds out the optimal classification hyperplane by maximizing the geometric interval between the classification hyperplane and the data (

In Formula (1), $x_i$ is the support vector of the calibration sample; $y_i$ is the category of the corresponding sample, which takes the value −1 or +1; $\omega$ is the normal vector of the hyperplane; and $\xi_i$ is the slack variable.
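For reference, the standard soft-margin SVM problem with sample $x_i$, category $y_i$, normal vector $\omega$, and slack variable $\xi_i$ can be written as below. This is the textbook formulation and is offered only as a sketch; the penalty parameter $C$, the bias $b$, and the number of calibration samples $n$ are assumptions that the surrounding text does not define.

```latex
\min_{\omega,\, b,\, \xi} \quad \frac{1}{2}\lVert \omega \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.} \quad y_i\left(\omega^{\top} x_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
```

A larger $C$ penalizes margin violations more heavily, and a smaller $C$ tolerates more of them in exchange for a wider margin.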

As two representative shallow learning methods, PLS-DA and SVM are both widely employed in Vis-NIR qualitative analysis. Each has its own way of processing spectral data: PLS-DA is suited to linear problems and SVM to nonlinear problems, but neither method can select feature variables by itself. When the amount of data is large or the differences between spectra are small, PLS-DA and SVM classify poorly and depend heavily on the quality of the spectral data. Under such circumstances, spectral pretreatment must be combined with feature selection to obtain a desirable classification result.

Convolutional Neural Network (CNN) is one of the widely studied deep learning algorithms with characteristics of local connection, weight sharing and down-sampling (

The structure of CBAM module is shown in

Convolutional block attention module (CBAM):
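The two attention steps of a CBAM block can be sketched in plain NumPy as follows. This follows the commonly published CBAM design (channel attention from a shared two-layer perceptron over average- and max-pooled descriptors, then spatial attention from a convolution over the channel-wise average and max maps). It is not the network used in this paper: the feature-map size, reduction ratio, 7 × 7 kernel, and random weights are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W). Returns one weight in (0, 1) per channel."""
    avg = feat.mean(axis=(1, 2))                 # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))                   # global max pooling -> (C,)

    def mlp(v):
        # Shared two-layer perceptron with ReLU in between.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))

def spatial_attention(feat, kernel):
    """kernel: (2, k, k). Returns one weight in (0, 1) per spatial position."""
    stacked = np.stack([feat.mean(axis=0), feat.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H x W
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]

# Hypothetical sizes: 8 channels on a 32 x 32 map, reduction ratio 4, random weights.
rng = np.random.default_rng(0)
channels, height, width, ratio = 8, 32, 32, 4
feat = rng.normal(size=(channels, height, width))
w1 = rng.normal(size=(channels // ratio, channels)) * 0.1
w2 = rng.normal(size=(channels, channels // ratio)) * 0.1
kernel = rng.normal(size=(2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
print(out.shape)
```

Because both attention maps lie between 0 and 1, the block only rescales the input features; in a trained network the weights are learned so that informative channels and positions are suppressed least.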

For the collected spectral data, this paper has realized the identification of insect-affected pears in three steps. The flowchart of the research method is shown in

Research method flowchart.

In this paper, classification accuracy (Accuracy), classification accuracy of healthy pears (RH) and classification accuracy of insect-affected pears (RB) were used to comprehensively evaluate the prediction accuracy of the model. When the accuracy, RH, and RB values are closer to 100%, the classification performance is better. The evaluation indexes can be calculated according to the following Formulas (3–4).

Here, the two terms with subscript $e$ are, respectively, the number of healthy pears misclassified as insect-affected pears and the number of insect-affected pears misclassified as healthy pears.
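The three indexes can be computed from the class sizes and the two misclassification counts, as in the sketch below. It assumes the usual definitions (the share of correctly classified samples overall, among healthy pears, and among insect-affected pears); the function and argument names are hypothetical. The example counts are the validation-set counts reported later for the CBAM-CNN model (124 healthy pears with 5 misclassified, 68 insect-affected pears with 9 misclassified).

```python
def classification_metrics(n_healthy, healthy_wrong, n_insect, insect_wrong):
    """Return (Accuracy, RH, RB) in percent.

    healthy_wrong: healthy pears misclassified as insect-affected.
    insect_wrong:  insect-affected pears misclassified as healthy.
    """
    rh = 100.0 * (n_healthy - healthy_wrong) / n_healthy
    rb = 100.0 * (n_insect - insect_wrong) / n_insect
    total = n_healthy + n_insect
    accuracy = 100.0 * (total - healthy_wrong - insect_wrong) / total
    return accuracy, rh, rb

accuracy, rh, rb = classification_metrics(124, 5, 68, 9)
print(f"Accuracy={accuracy:.2f}%  RH={rh:.2f}%  RB={rb:.2f}%")
# prints: Accuracy=92.71%  RH=95.97%  RB=86.76%
```

These values agree with the validation results reported for the CBAM-CNN model, which suggests the assumed definitions match the ones used in the paper.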

The Vis-NIR spectra of Yali pears are shown in

Vis-NIR spectra of healthy pears and insect-affected pears

The spatial distribution of two kinds of pear samples in calibration set is analyzed by T-distributed Stochastic Neighbor Embedding (T-SNE) method. The T-SNE method is a nonlinear data-driven tool for dimension-reduction and visualization, which shows better performance in data visualization compared with other tools (

Feature visualization of healthy pears and insect-affected pears.

Because the spectral data of Yali pears are voluminous and the spectral information is redundant and overlapping, the CARS algorithm is used to select feature variables and reduce the high collinearity between spectral variables. In each adaptive weighted sampling run, the variables with larger absolute regression coefficients in the PLS model are retained as a new subset and the variables with smaller weights are removed. In the implementation of the CARS method, the number of Monte Carlo sampling runs was varied from 20 to 50 in steps of 1, and the sampling ratio was varied between 0.2 and 0.8. With 27 sampling runs and a sampling ratio of 0.8, the RMSECV reaches its minimum value of 0.33. After CARS screening, a total of 70 wavelength variables were selected, and this combination of wavelength variables gives the best result. The distribution of feature spectral variables of Yali pears selected by CARS method is shown in

Distribution of feature spectral variables for Yali pears selected by CARS method.
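The variable-elimination part of CARS can be sketched as follows. The sketch assumes the exponentially decreasing function from the standard CARS formulation, which keeps all variables in the first run and two in the last, together with retention of the variables that have the largest absolute PLS regression coefficients. The PLS fitting, the Monte Carlo sample selection, and the adaptive reweighted sampling step are omitted. The function names are hypothetical, and applying the schedule to 1,024 variables over 27 runs is an assumption made for illustration only.

```python
import math

def cars_retention_schedule(n_variables, n_runs):
    """Number of variables kept in each run under r_i = a * exp(-k * i)."""
    a = (n_variables / 2) ** (1 / (n_runs - 1))
    k = math.log(n_variables / 2) / (n_runs - 1)
    return [max(2, round(n_variables * a * math.exp(-k * i)))
            for i in range(1, n_runs + 1)]

def keep_top_variables(coefficients, n_keep):
    """Indices of the n_keep variables with the largest absolute coefficients."""
    ranked = sorted(range(len(coefficients)),
                    key=lambda i: abs(coefficients[i]), reverse=True)
    return sorted(ranked[:n_keep])

schedule = cars_retention_schedule(n_variables=1024, n_runs=27)
print(schedule[0], schedule[-1])                      # first run keeps all, last run keeps two
print(keep_top_variables([0.1, -0.9, 0.3, 0.05], 2))
```

In a full CARS run, the subset with the lowest RMSECV across all runs is chosen as the final variable set.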

Selecting the number of principal factors is the key to establishing the PLS-DA model of Yali pears. Leave-one-out cross validation is used to determine the optimal number of factors: the Accuracy and RMSECV of the calibration set are calculated for each PLS model, and the number of factors corresponding to the minimum RMSECV is selected as optimal. The calibration results of different pretreatment methods and CARS modeling variable optimization methods combined with PLS-DA model are shown in

Calibration results of PLS-DA models based on different pretreatment methods and modeling variables.

| Modeling variables | Pretreatment | Accuracy (%) | RH (%) | RB (%) |
| --- | --- | --- | --- | --- |
| All variables | None | 80.73 | 88.98 | 66.89 |
| All variables | SGS | 84.89 | 88.68 | 79.00 |
| All variables | MMN | 76.43 | 85.47 | 62.33 |
| All variables | SNV | 71.88 | 82.05 | 56.00 |
| Selected variables | None | 82.81 | 86.567 | 76.92 |
| Selected variables | SGS | 85.42 | 88.46 | 80.67 |
| Selected variables | MMN | 80.08 | 84.61 | 73.00 |
| Selected variables | SNV | 73.18 | 82.27 | 59.00 |

It can be seen from

The Grid Search (GS) is used to optimize the kernel function,

Calibration results of SVM models based on different pretreatment methods and modeling variables.

| Modeling variables | Pretreatment | Accuracy (%) | RH (%) | RB (%) |
| --- | --- | --- | --- | --- |
| All variables | None | 75.00 | 75.601 | 73.568 |
| All variables | SGS | 80.59 | 83.72 | 75.59 |
| All variables | MMN | 83.59 | 84.76 | 81.52 |
| All variables | SS | 84.51 | 84.42 | 84.67 |
| Selected variables | None | 80.33 | 83.37 | 75.43 |
| Selected variables | SGS | 81.38 | 83.51 | 77.74 |
| Selected variables | MMN | 87.24 | 87.63 | 82.82 |
| Selected variables | SS | 90.88 | 92.16 | 88.85 |

It can be seen from

Because the CBAM-CNN model takes two-dimensional image data as input while the original spectral data are one-dimensional, the spectra must be transformed into a two-dimensional image matrix before modeling. In this transformation, fixed-length vectors are cut from the spectrum from front to back and stacked row by row to form the matrix. In this paper, 1,024 waveband points covering the whole waveband are selected and divided evenly into rows of 32 waveband points each. The resulting row vectors are stacked to create a 32 × 32 two-dimensional spectral matrix, and the visualization of the two-dimensional spectral matrix is shown in

Original spectra of Yali pears and the transformed two-dimensional spectral image
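The one-dimensional to two-dimensional transformation amounts to a row-major reshape, as the minimal sketch below shows. The function name and the synthetic spectrum are hypothetical; only the 1,024-point length and the 32 × 32 target shape come from the text.

```python
import numpy as np

def spectrum_to_image(spectrum, side=32):
    """Cut a spectrum into consecutive `side`-point segments and stack them as rows."""
    spectrum = np.asarray(spectrum, dtype=float)
    if spectrum.size != side * side:
        raise ValueError(f"expected {side * side} waveband points, got {spectrum.size}")
    return spectrum.reshape(side, side)   # row r holds points r*side ... r*side + side - 1

# Synthetic stand-in for one measured spectrum with 1,024 waveband points.
demo_spectrum = np.linspace(0.0, 1.0, 1024)
image = spectrum_to_image(demo_spectrum)
print(image.shape)   # (32, 32)
```

Adjacent wavebands stay adjacent within a row, while the vertical neighbour of a point lies 32 wavebands away, so the two image axes describe spectral distance at different scales.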

The structure of CBAM-CNN model is shown in

Structure diagram of CBAM-CNN model.

In the network structure of the CBAM-CNN model, a batch normalization (BN) layer is added to mitigate vanishing gradients and accelerate the convergence of the network; it also has a certain regularization effect. Because some over-fitting occurred during training, an L2 regularization term is added to constrain the weights, and a Dropout layer is added to randomly deactivate 20% of the neurons. In the setting of hyperparameters,

Calibration results of CBAM-CNN models based on different pretreatment methods and modeling variables.

| Modeling variables | Pretreatment | Accuracy (%) | RH (%) | RB (%) |
| --- | --- | --- | --- | --- |
| All variables | None | 85.42 | 86.35 | 83.95 |
| All variables | MMN | 88.41 | 95.18 | 77.32 |
| All variables | SS | 94.53 | 95.65 | 92.63 |
| All variables | SNV | 90.62 | 92.97 | 87.68 |
| Selected variables | None | 86.85 | 89.73 | 82.13 |
| Selected variables | SGS | 80.21 | 83.93 | 74.24 |
| Selected variables | SS | 83.33 | 93.97 | 65.51 |
| Selected variables | SNV | 75.39 | 77.99 | 71.13 |

It can be seen from

The effectiveness of the CBAM module is verified by constructing a plain CNN model whose network structure, hyperparameters, and spectral data optimization scheme are identical to those of the CBAM-CNN model. For the CNN model with SGS pretreatment, the calibration accuracy, RH, and RB reach 95.18, 96.15, and 93.67%, respectively. Compared with the CBAM-CNN model, the accuracy decreases from 96.88 to 95.18%, indicating that the CBAM-CNN model pays more attention to key feature points in the spectral data than the CNN model and thus classifies insect-affected pears better.

The optimal results of PLS-DA model, SVM model and CBAM-CNN model are selected to analyze the specific classification of Yali pears in the validation set for each model, as shown in

Classification results of the validation set samples by PLS-DA, SVM, and CBAM-CNN models.

| Model | Pretreatment | Sample status | Number of samples | Correctly classified | Misclassified | Accuracy (%) | RH (%) | RB (%) | Prediction time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PLS-DA | SS-CARS | Healthy | 124 | 117 | 7 | 90.63 | 94.36 | 83.82 | 0.018 |
| | | Insect-affected | 68 | 57 | 11 | | | | |
| SVM | SNV-CARS | Healthy | 124 | 115 | 9 | 81.25 | 92.74 | 60.29 | 0.025 |
| | | Insect-affected | 68 | 41 | 27 | | | | |
| CBAM-CNN | SGS | Healthy | 124 | 119 | 5 | 92.71 | 95.97 | 86.76 | 0.032 |
| | | Insect-affected | 68 | 59 | 9 | | | | |

The calibration set is slightly imbalanced, which introduces some classification bias in every model: the classification accuracy of healthy pears (RH) is higher than that of insect-affected pears (RB).

For the PLS-DA model, the validation-set accuracy, RH, and RB are 90.63, 94.36, and 83.82%, respectively. A total of 18 Yali pears are misclassified, so the classification results are average. PLS is sensitive to the difference between groups, but the difference between the spectral data groups of Yali pears in this experiment is large, which leads to misclassification. For the SVM model, the validation-set accuracy, RH, and RB are 81.25, 92.74, and 60.29%, respectively. A total of 36 Yali pears are misclassified, so the classification results are poor. Although the spectral data are pretreated, the data points of the two classes still overlap and cannot be separated by a suitable hyperplane; besides,

For the CBAM-CNN model, the validation-set accuracy, RH, and RB are 92.71, 95.97, and 86.76%, respectively. A total of 14 Yali pears are misclassified, which is the best result of the three models. Because an attention module is added to the network, more attention is paid to the spectral characteristics in the spatial and channel dimensions, which reduces the impact of the unbalanced data. The 14 misclassified pears result from the similarity between the two types of original spectra, a difference that SGS pretreatment fails to amplify. In general, however, the CBAM-CNN model with SGS pretreatment alone performs better than the other models.

The validation-set accuracy of the traditional shallow PLS-DA learning model is 90.63%, and its prediction time for a single pear is 0.018 s, which is relatively short. The validation-set accuracy of the CBAM-CNN deep learning model is 92.71%, and its prediction time for a single pear is 0.032 s, which is relatively long. In actual online sorting, the detection time of a single pear consists mainly of the integration time of the spectrometer and the time the model needs to predict the sample spectrum. Here the integration time is about 0.08 s and the prediction time is 0.032 s, so the total detection time of a single pear is 0.112 s. The production line conveys six pears per second, so the average transmission time of a single fruit cup is 0.167 s. The total detection time of a single pear is therefore shorter than the transmission time of a fruit cup, and the margin meets the requirements of online analysis.
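The timing argument can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (0.08 s integration time, 0.032 s prediction time, six pears per second); the function name is hypothetical.

```python
def online_timing(integration_s, prediction_s, pears_per_second):
    """Return (total detection time, fruit-cup interval, whether the budget is met)."""
    total_s = integration_s + prediction_s
    cup_interval_s = 1.0 / pears_per_second
    return total_s, cup_interval_s, total_s < cup_interval_s

total_s, cup_interval_s, ok = online_timing(0.08, 0.032, 6)
print(f"total={total_s:.3f} s, cup interval={cup_interval_s:.3f} s, "
      f"margin={cup_interval_s - total_s:.3f} s, meets budget={ok}")
# prints: total=0.112 s, cup interval=0.167 s, margin=0.055 s, meets budget=True
```

The margin of about 0.055 s per pear is what remains for data transfer and actuation, which the text does not quantify.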

In this paper, the T-SNE method is adopted to visualize the output features of CBAM-CNN model, so as to analyze the data clustering of Yali pear samples by the model. The Gradient-weighted Class Activation Mapping (Grad-CAM) is used to visualize the attention area of CBAM module, in a bid to locate the attention distribution in the spectral features.

The visualization of output features by T-SNE is shown in

Visualization of output features by T-SNE:

The Grad-CAM method is used to generate class activation mappings for localizing the portion of spectral features displayed in the attention module. The feature heatmap is superimposed with the original image, which further shows the localization of some regions in the image by the network, and more intuitively explains the learning ability of the network (

Visualization of Grad-CAM attention area:

The localization of the attention area of spectral matrix image by CBAM module is shown in

The visualization analysis of the optimal model and its features shows that the deep learning model after spectral pretreatment classifies insect-affected Yali pears better than the shallow learning models after pretreatment and CARS feature variable optimization. The deep learning model extracts and learns features automatically and uses activation functions, an attention mechanism, and pooling layers, thereby performing both feature extraction and feature variable selection on the spectral data. Therefore, the CBAM-CNN deep learning model discriminates insect-affected Yali pears better than the PLS-DA and SVM shallow learning models.

This paper proposes a non-destructive, rapid, online method based on Vis-NIR for detecting internal defects of Yali pears, in order to rapidly identify recessive insect-affected Yali pears during commercial sorting. Different pretreatment methods were combined with CARS feature variable optimization to establish the PLS-DA and SVM shallow learning models and the CBAM-CNN deep learning model for online discrimination. T-SNE and Grad-CAM were used to cluster the output features of the model and to visualize its attention area. The experimental results show that the recognition accuracy of the PLS-DA and SVM shallow learning online discriminant models improves to more than 80% after spectral pretreatment and CARS feature variable optimization. The online discriminant model based on spectra pretreated by SGS and the CBAM-CNN deep learning method performs best: its accuracy is 96.88% on the calibration set and 92.71% on the validation set, and its prediction time for a single Yali pear is 0.032 s. Compared with the shallow learning methods, the deep learning method makes full use of its autonomous feature extraction and learning ability, which simplifies the modeling process and yields good feature clustering and attention areas. The Vis-NIR model proposed in this paper meets the accuracy and time requirements of online detection; hence, it can be applied to detect insect-affected Yali pears during commercial sorting in the future.

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

YH: theoretical methods, funding acquisition, manuscript writing, and writing—review and editing. CZ: experiments, software, method implementation, and writing and editing. XL: experimentation, supervision, and writing—review and editing. ZL: funding acquisition and writing—review. All authors contributed to this article and approved the submitted version.

This study was funded by the National Natural Science Foundation of China (grant number 31960497), Jiangxi Provincial Natural Science Foundation of China (grant numbers 20212BAB204009 and 20202ACB211002), Primary Research & Development Plan of Jiangxi Province of China (grant number 20212BBE53016), and Intelligent Detection and Management System of Axle Workshop (grant number SF/QH-TZ05- 2021-281).

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.