Garlic-YOLO-DD: a lightweight object detection algorithm for garlic damage detection

Gao, Yun; Ma, Xiaodan; Xia, Zhennan; Qi, Tao; Wang, Xin; He, Zhuang; Chen, Gang

doi:10.3389/fpls.2025.1702045

ORIGINAL RESEARCH article

Front. Plant Sci., 06 January 2026

Sec. Plant Bioinformatics

Volume 16 - 2025 | https://doi.org/10.3389/fpls.2025.1702045

This article is part of the Research Topic: New Trends in Distributed and Autonomous Intelligent Systems for Crop Production

Yun Gao

Xiaodan Ma

Zhennan Xia

Tao Qi

Xin Wang

Zhuang He

Gang Chen^*

School of Information Engineering, Changchun College of Electronic Technology, Changchun, China

To address the challenge of applying garlic damage detection models in resource-constrained environments, this study proposes Garlic-YOLO-DD, a lightweight single-stage object detection algorithm based on YOLOv11n. The model targets the high computational complexity and excessive parameter counts of existing methods, achieving efficient and accurate garlic damage recognition suitable for real-time applications. Specifically, replacing conventional convolutional modules in the backbone network with the ADown module significantly reduces parameters and computational load. Simultaneously, integrating the parameter-free SimAM attention mechanism enhances localization and feature extraction capabilities for subtle lesion areas. The efficient BiFPN architecture optimizes the original feature fusion network, improving both speed and effectiveness in multi-scale feature integration. Experiments conducted on a self-built garlic damage dataset demonstrate that, relative to YOLOv11n, the Garlic-YOLO-DD model reduces the number of parameters to 57.96%, decreases computational load by 20.63%, increases inference speed by 15.97%, and improves mAP@50% by 27.64 percentage points. This study provides a computer vision solution for automated garlic damage detection in intelligent agricultural systems.

1 Introduction

Garlic is a globally cultivated and economically important cash crop whose quality directly affects agricultural profitability and market value (Puspitasari et al., 2024; Martínez-López et al., 2022; Anum et al., 2024; Song et al., 2025). During post-harvest handling, garlic bulbs are highly susceptible to damage from mechanical operations or environmental factors (Madhu et al., 2025; 2019a; Kale et al., 2024; Tullo, 2023; Makarichian et al., 2021; Li et al., 2024; Makarichian et al., 2025). This damage not only diminishes garlic’s commercial value but also accelerates decay and spoilage, posing significant food safety concerns (Tavoni et al., 2024). Hence, achieving rapid and precise detection of garlic damage is crucial for safeguarding agricultural product quality and advancing automated processing (Wu et al., 2019; Jia et al., 2020; Raki et al., 2024).

In recent years, deep learning-based object detection techniques have demonstrated significant potential in the field of agricultural visual inspection (Dalal and Mittal, 2025; Kanna et al., 2024; Gong and Wu, 2025; Silva et al., 2024; Pagire et al., 2025; Badgujar et al., 2024; Liu et al., 2025; Zhao et al., 2021). However, existing high-performance models typically involve high computational complexity and massive parameter scales, while also relying on expensive GPU computing power (Cao et al., 2023; Chen et al., 2025). This severely limits the practical application of these models in resource-constrained scenarios, such as embedded devices, mobile terminals, or on-site real-time detection environments (Lazarou and Exarchos, 2024; Liu et al., 2025; Yang et al., 2024). Therefore, developing a lightweight damage detection model that balances high accuracy with low resource consumption has become a critical challenge for advancing the practical implementation of intelligent agricultural detection technologies (Yang et al., 2024; Islam et al., 2025; Rezk et al., 2025; Choe and Lee, 2023).

Prior studies have explored YOLO models for mold detection (Sun et al., 2022) and crop detection in field settings (Lan et al., 2024). Others have applied hyperspectral imaging (Ge et al., 2024; Jin et al., 2022) or transformer-based architectures (Bi et al., 2022) for crop classification, or have sought to identify and classify crop quality by improving traditional convolutional neural networks or proposing new image classification networks (Yu et al., 2025; Qi et al., 2023; Zhao et al., 2023; Yu et al., 2025; Koklu et al., 2021; Yu et al., 2024; Joshi et al., 2021; Song et al., 2025; Zhang et al., 2025; Sun et al., 2024; Wang et al., 2024). While these methods achieve high accuracy, they often rely on computationally intensive models or costly imaging systems, which hinders their practical application in real-world production settings.

The primary contributions of this work are delineated as follows:

1. We present Garlic-YOLO-DD, an algorithm that innovates on the YOLOv11n (Khanam and Hussain, 2024) architecture to address the critical need for low-latency, high-precision garlic damage detection in practical agricultural settings.

2. To alleviate computational burdens, we architect a more efficient backbone by substituting standard convolutions with the lightweight ADown subsampling module, leading to a substantial reduction in model size and operations.

3. We incorporate the SimAM attention module, which operates without introducing additional parameters, to augment the network’s capacity for capturing discriminative features of small and inconspicuous damages, thereby elevating localization precision.

4. The original feature fusion network is refined by adopting the BiFPN paradigm, which facilitates superior cross-scale connections and contributes notably to the acceleration of inference and the enhancement of detection performance.

2 Materials and methods

2.1 Data collection and preprocessing

During image acquisition, this study employed the Honor Magic6 smartphone as the imaging device, with the experimental setup illustrated in Figure 1. Experiments were conducted in a darkroom to eliminate ambient light interference, utilizing only two sets of bidirectional controllable light sources for illumination to ensure stable and uniform lighting conditions. This configuration effectively minimized shadows and reflections caused by ambient light, guaranteeing the acquisition of high-quality images. Garlic samples (Variety Fengchan No. 1, sourced from Heze City, Shandong Province, China) were randomly placed on a black light-absorbing cloth to simulate their arrangement on a conveyor belt. The image acquisition device was mounted vertically on a top bracket to ensure consistency in image capture. All garlic datasets constructed in this study were annotated with bounding boxes using the online tool Make Sense.

Figure 1

Diagram of a garlic quality evaluation setup. It includes a darkroom with overhead camera, lighting equipment on both sides, and garlic positioned at the bottom. A connected laptop is shown on the left for data analysis.

Figure 1. Garlic image capture system.

Figure 2 shows representative samples from the garlic dataset constructed in this study. The dataset comprises 462 high-resolution images, randomly partitioned into training, validation, and testing sets at a ratio of 7:2:1. All images were uniformly resized to 640×640 pixels before being fed into the model. To enhance model generalization, random rotation augmentation, a built-in technique in the YOLO model, was applied during training. Each garlic instance was assigned to one of three categories: normal, partially damaged (intact root system), or root-damaged. A black light-absorbing cloth was used as the background to simulate the industrial scenario of conveyor belt transportation. This design reduces visual noise, enabling the model to focus more effectively on damage features and thereby enhancing the practical application value of the dataset.
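The 7:2:1 partition described above can be sketched as follows. This is a minimal illustration only: the file names, the random seed, and the rounding rule (floor for training and validation, remainder to testing) are assumptions, since the text does not report the exact number of images per split.

```python
import random

def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle items and split them into train/val/test by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seed is a hypothetical choice for reproducibility
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # the remainder goes to the test set
    return train, val, test

image_ids = [f"garlic_{i:04d}.jpg" for i in range(462)]  # hypothetical file names
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))  # 323 92 47
```

With 462 images, this rounding rule gives 323, 92, and 47 images; a different rounding convention would shift one or two images between splits.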

Figure 2

Multiple garlic bulbs are shown against a dark background. Some are highlighted with red boxes, indicating different conditions. Three close-ups are displayed on the right: one of normal garlic, one with localized damage not affecting the roots, and one with root damage. Orange arrows connect the highlighted bulbs to their corresponding close-ups.

Figure 2. Garlic image data display.

Normal garlic samples: These garlic bulbs exhibit an intact appearance with compact, plump cloves and undamaged skin. Such garlic qualifies as high-quality agricultural produce with significant market value (Madhu et al., 2019b).

Partially Damaged Garlic (intact root system): These bulbs exhibit superficial blemishes such as minor skin abrasions, surface scratches, or slight clove deformation (Wang et al., 2025). Damage is confined to the exterior, with the internal structure and root system remaining intact. Despite visible cosmetic flaws, structural integrity and functionality are largely preserved, retaining some edible and commercial value.

Garlic with root damage: These samples exhibit significant root damage, including breakage or rot (Yang et al., 2024). Damaged roots impede water and nutrient absorption, ultimately leading to quality deterioration and reduced shelf life. Additionally, bulbs with root damage are unsuitable for planting and possess lower commercial value.

2.2 Model construction

2.2.1 YOLOv11n

Figure 3 illustrates the overall architecture of the base model YOLOv11n adopted in this study. As a lightweight variant within the YOLOv11 series, this model adheres to the mature “backbone-neck-head” framework. To maintain low computational overhead, the network employs depthwise separable convolutions and cross-stage partial (CSP) modules, which also promote efficient gradient flow during backpropagation. The backbone network employs multi-scale feature propagation, while the neck module utilizes an enhanced Feature Pyramid Network (FPN). This module achieves feature fusion through bidirectional top-down and bottom-up paths, effectively integrating fine-grained information from lower layers with rich semantic information from higher layers. This significantly enhances the model’s ability to recognize garlic damage across scales. In the head module, YOLOv11n employs a decoupled design, separating the classification task and bounding box regression task into independent branches. This design reduces task interference caused by feature sharing, thereby improving detection accuracy. The final prediction layer outputs confidence scores, class probabilities, and bounding box coordinates for each detected object.

Figure 3

Flowchart diagram of a neural network architecture divided into three sections: Backbone, Neck, and Head. The Backbone includes convolutional layers, labeled “Conv” and “C3K2,” with shortcuts noted. The Neck involves concatenation (Concat) and upsampling operations, also with “C3K2.” The Head includes detection outputs. The structure illustrates data flow and processing steps from input to detection.

Figure 3. The architecture of YOLOv11n.

2.2.2 Garlic-YOLO-DD

In this study, we implemented three enhancement strategies on the baseline model YOLOv11n to improve its performance in garlic damage detection, as shown in Figure 4. First, we introduced the lightweight ADown module to replace conventional convolutional modules in the backbone network. This approach significantly reduces computational cost and parameter size while enhancing multi-scale representation capabilities. Second, a parameter-free SimAM attention module was introduced at the end of the backbone network. Driven by an energy function mechanism, this module enhances the network’s ability to focus on damaged areas and improves localization accuracy for subtle defect features. Finally, in the feature fusion stage, the original path aggregation network was replaced with a weighted bidirectional feature pyramid network (BiFPN). This design achieves more adaptive multi-scale fusion through bidirectional cross-scale connections and learnable feature weights, strengthening detection performance across different damage scales. These synergistic optimizations enable the model to maintain a lightweight design while achieving higher accuracy and more efficient inference capabilities.

Figure 4

Neural network architecture diagram for processing images. The flow includes various convolution operations, splits, concatenations, downsampling, upsampling, and detection modules. Each block is labeled with parameters or functions like Conv, ADown, C3K, and Detect. The diagram shows a hierarchical structure with connections demonstrating data flow between operations.

Figure 4. The architecture of Garlic-YOLO-DD.

2.2.3 ADown module

The ADown module (Asymmetric Downsampling) (Wang et al., 2024) is an efficient asymmetric downsampling architecture first introduced in YOLOv9 to mitigate the information loss commonly associated with traditional downsampling during feature map compression. ADown achieves more efficient downsampling while preserving richer feature information by decoupling spatial reduction from channel expansion. As shown in Figure 5, its core employs a parallel branch design: one branch performs spatial downsampling via asymmetric convolutions, while the other branch preserves feature responses through pooling operations. The outputs from both branches are subsequently fused. This asymmetric decomposition significantly reduces computational parameters while diversifying the receptive field. Subsequent experimental results confirm this conclusion: ADown maintains subsampling efficiency while more effectively preserving fine-grained features. It is particularly well-suited for visual tasks requiring precise detection of minute objects, such as identifying damage in garlic.
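To give a rough sense of where the parameter savings come from, the sketch below counts convolution weights for the two-branch layout described for Figure 5 (a 3×3 stride-2 convolution on one half of the channels, and max pooling followed by a 1×1 convolution on the other half) and compares it with a single 3×3 stride-2 convolution. The channel widths are hypothetical, the baseline is an assumed stand-in for the replaced downsampling layer, and bias and normalization parameters are ignored.

```python
def conv_weights(c_in, c_out, k):
    """Weight count of a k x k convolution (bias and BatchNorm ignored)."""
    return c_in * c_out * k * k

def standard_down_weights(c_in, c_out):
    # Baseline assumption: a single 3x3, stride-2 convolution.
    return conv_weights(c_in, c_out, 3)

def adown_weights(c_in, c_out):
    # Each branch sees half of the input channels and emits half of the output channels.
    half_in, half_out = c_in // 2, c_out // 2
    branch_conv3 = conv_weights(half_in, half_out, 3)  # 3x3, stride 2, padding 1
    branch_conv1 = conv_weights(half_in, half_out, 1)  # max pool (no weights), then 1x1 conv
    return branch_conv3 + branch_conv1

c_in, c_out = 128, 256  # hypothetical channel widths
ratio = adown_weights(c_in, c_out) / standard_down_weights(c_in, c_out)
print(adown_weights(c_in, c_out), standard_down_weights(c_in, c_out), round(ratio, 3))
# 81920 294912 0.278
```

Under these assumptions the two-branch layout needs 10/36 of the weights of the single convolution, independent of the channel widths chosen, because both branches operate on halved channel counts and the pooling operations carry no weights.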

Figure 5

Flowchart of a neural network architecture. Input X passes through average pooling and chunk process. It splits into two paths: one with convolution (kernel three, stride two, padding one) and the other with max pooling followed by convolution (kernel one, stride one, padding zero). Both paths merge in a concatenation layer, leading to output.

Figure 5. The architecture of ADown.

2.2.4 BiFPN architecture

BiFPN (Weighted Bi-directional Feature Pyramid Network) (Tan et al., 2020) is a weighted bidirectional feature pyramid network whose core concept is to optimize multi-scale feature fusion through efficient cross-scale connections and learnable feature weights. This architecture introduces three key improvements over traditional FPN (top-down) and PANet (bottom-up) structures: First, it removes nodes with only one input edge, simplifying the network structure. Second, it adds skip connections between input and output nodes at the same scale to promote feature reuse. Finally, it introduces learnable weight parameters to perform adaptive weighted fusion based on the importance of different input features. This design enables BiFPN to efficiently aggregate feature maps at different resolutions with low computational cost, enhancing the network’s representation capability for multi-scale objects. Experiments demonstrate that BiFPN significantly reduces computational overhead while maintaining high accuracy, making it highly suitable for integration into lightweight object detection models.

The architecture of the BiFPN module is illustrated in Figure 6. The symbols P3 through P7 represent the multi-scale output layers from the backbone network, where each layer produces a feature map with specific channel and spatial dimensions. For instance, the feature map from the P3 layer has a spatial size equal to the input image resolution divided by 2³, while P4 corresponds to the input resolution divided by 2⁴, and so forth, with P7 features being reduced by a factor of 2⁷. These features are denoted in the diagram as P3_in to P7_in. In the figure, uncolored circles represent feature maps, colored circles denote computational operators, and weighted connections indicate learnable weights W. The mathematical operation formula associated with each operator is shown in Equations 1–8.
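As a quick check of the resolution rule stated above, the following sketch lists the spatial side length of each pyramid level for the 640×640 input used in this study. The function name is illustrative only.

```python
def pyramid_sizes(input_size=640, levels=range(3, 8)):
    """Spatial side length of each pyramid level P_k for a square input."""
    return {f"P{k}": input_size // (2 ** k) for k in levels}

print(pyramid_sizes())
# {'P3': 80, 'P4': 40, 'P5': 20, 'P6': 10, 'P7': 5}
```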

Figure 6

Diagram showing a series of repeated blocks with inputs (P7_in to P3_in) on the left and outputs (P7_out to P3_out) on the right. Circular nodes are connected by labeled paths (W31 to W72). Intermediate nodes (P4_td, P5_td, P6_td) are depicted with directional arrows connecting the nodes, indicating data flow through the network. The structure demonstrates a pattern of interconnected processes with multiple levels of input, transformation, and output.

Figure 6. The architecture of BiFPN.

$$P7_{out} = \mathrm{Conv}\left(\frac{P7_{in} \cdot W71 + \mathrm{Resize}(P6_{out}) \cdot W72}{W71 + W72 + \epsilon}\right) \tag{1}$$

$$P6_{td} = \mathrm{Conv}\left(\frac{P6_{in} \cdot W61 + \mathrm{Resize}(P7_{in}) \cdot W62}{W61 + W62 + \epsilon}\right) \tag{2}$$

$$P6_{out} = \mathrm{Conv}\left(\frac{P6_{in} \cdot W63 + P6_{td} \cdot W64 + \mathrm{Resize}(P5_{out}) \cdot W65}{W63 + W64 + W65 + \epsilon}\right) \tag{3}$$

$$P5_{td} = \mathrm{Conv}\left(\frac{P5_{in} \cdot W51 + \mathrm{Resize}(P6_{td}) \cdot W52}{W51 + W52 + \epsilon}\right) \tag{4}$$

$$P5_{out} = \mathrm{Conv}\left(\frac{P5_{in} \cdot W53 + P5_{td} \cdot W54 + \mathrm{Resize}(P4_{out}) \cdot W55}{W53 + W54 + W55 + \epsilon}\right) \tag{5}$$

$$P4_{td} = \mathrm{Conv}\left(\frac{P4_{in} \cdot W41 + \mathrm{Resize}(P5_{td}) \cdot W42}{W41 + W42 + \epsilon}\right) \tag{6}$$

$$P4_{out} = \mathrm{Conv}\left(\frac{P4_{in} \cdot W43 + P4_{td} \cdot W44 + \mathrm{Resize}(P3_{out}) \cdot W45}{W43 + W44 + W45 + \epsilon}\right) \tag{7}$$

$$P3_{out} = \mathrm{Conv}\left(\frac{P3_{in} \cdot W31 + \mathrm{Resize}(P4_{td}) \cdot W32}{W31 + W32 + \epsilon}\right) \tag{8}$$

Within the overall architecture of BiFPN, Resize operations align feature map resolutions through upsampling or downsampling, while convolutional layers (Conv) perform the subsequent feature transformations. The constant ϵ prevents division by zero, thereby maintaining numerical stability and ensuring reliable training dynamics. Through this repeated, efficient integration mechanism, BiFPN generates more discriminative and robust multi-scale feature information. This cross-scale aggregation capability enhances the model’s detection accuracy, particularly because garlic damage of varying severity appears at different scales. The BiFPN architecture better balances recognition performance for both large and small objects. Furthermore, the learnable weights reduce the proposed model’s reliance on manual design and hyperparameter tuning, enhancing the network’s overall adaptability and learning capacity.
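The weighted fusion inside each node of Equations 1–8 can be illustrated with a small sketch. It operates on flat lists instead of tensors, omits the trailing Conv, and assumes the inputs have already been resized to a common shape. The weight values, the value of ϵ, and the clamping of weights at zero are assumptions made for illustration, not settings reported in this study.

```python
def fuse(features, weights, eps=1e-4):
    """Normalized weighted fusion: sum(w_i * f_i) / (sum(w_i) + eps), element-wise.

    features: list of equally sized flat feature lists (already resized).
    weights:  one scalar per input feature.
    """
    w = [max(0.0, wi) for wi in weights]  # clamp at zero (assumption) so the ratio stays bounded
    denom = sum(w) + eps
    return [sum(wi * f[j] for wi, f in zip(w, features)) / denom
            for j in range(len(features[0]))]

# Toy stand-ins for P4_in and Resize(P5_td), flattened to four values each.
p4_in = [1.0, 2.0, 3.0, 4.0]
p5_td_resized = [3.0, 2.0, 1.0, 0.0]
p4_td = fuse([p4_in, p5_td_resized], weights=[0.75, 0.25])  # hypothetical W41, W42
print([round(v, 3) for v in p4_td])
```

Because the weights are normalized by their sum, each fused value stays between the smallest and largest input value at that position, which is what keeps the fusion numerically well behaved as the weights are learned.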

2.2.5 SimAM attention mechanism

SimAM (Simple Attention Module) (Yang et al., 2021) is a parameter-free attention mechanism proposed based on neuroscience-inspired saliency detection theory. Its core concept involves quantifying the importance of each neuron in feature maps through an energy function. SimAM treats individual neurons as independent processing units, estimating saliency by measuring the difference between a neuron and its surrounding environment. Neurons with lower energy are considered more information-rich. Based on this, SimAM automatically assigns three-dimensional attention weights to amplify relevant features and suppress redundant information—all without introducing any trainable parameters. Its structural diagram is shown in Figure 7.

Figure 7

Diagram showing a process involving a cube labeled X with dimensions C, H, and W. An arrow labeled “Generation” points to three parallel panels labeled “3-D weights.” Another arrow labeled “Expansion” points to a rearranged stack labeled H, C, and W. A horizontal arrow labeled “Fusion” connects the two processes.

Figure 7. The architecture of SimAM.

The effectiveness of the SimAM module stems from its ability to apply three-dimensional attention weights, a mechanism that simultaneously integrates global features and local details. This enables the model to capture subtle variations and complex patterns within the data, thereby achieving more precise identification of garlic with varying degrees of damage. The module optimizes the attention mechanism through an energy function based on the principle of saliency, evaluating the uniqueness and relevance between neurons. Based on this concept, the energy function is formulated as follows:

$$e_{t}(w_{t}, b_{t}, y, x_{i}) = (y_{t} - \hat{t})^{2} + \frac{1}{M - 1} \sum_{i = 1}^{M - 1} (y_{0} - \hat{x}_{i})^{2} \tag{9}$$

Here, $\hat{t} = w_{t} t + b_{t}$ and $\hat{x}_{i} = w_{t} x_{i} + b_{t}$ represent linear transformations of $t$ and $x_{i}$, respectively, where $t$ and $x_{i}$ denote the target neuron and other neurons within a single channel of the input feature $X \in \mathbb{R}^{B \times C \times H \times W}$. The index $i$ spans the spatial dimensions, and $M = H \times W$ indicates the total number of neurons in that channel. The terms $w_{t}$ and $b_{t}$ correspond to the weight and bias of the transformation. All values in Equation 9 are scalars. Equation 9 reaches its minimum when $\hat{t}$ equals $y_{t}$ and all other $\hat{x}_{i}$ equal $y_{0}$, where $y_{t}$ and $y_{0}$ are two distinct values. Minimizing Equation 9 is therefore equivalent to measuring the linear separability between the target neuron $t$ and all other neurons in the same channel. For simplicity, binary labels (i.e., 1 and -1) are adopted, and a regularizer is incorporated into Equation 9. The resulting energy function is formulated as follows:

$$e_{t}(w_{t}, b_{t}, y, x_{i}) = \frac{1}{M - 1} \sum_{i = 1}^{M - 1} \left(-1 - (w_{t} x_{i} + b_{t})\right)^{2} + \left(1 - (w_{t} t + b_{t})\right)^{2} + \lambda w_{t}^{2} \tag{10}$$

Theoretically, each channel corresponds to M energy functions. Solving all these equations using iterative solvers such as SGD would be computationally expensive. However, Equation 10 admits a fast closed-form solution for $w_{t}$ and $b_{t}$, which can be derived as follows:

$$w_{t} = - \frac{2 (t - \mu_{t})}{(t - \mu_{t})^{2} + 2 \sigma_{t}^{2} + 2 \lambda} \tag{11}$$

$$b_{t} = - \frac{1}{2} (t + \mu_{t}) w_{t} \tag{12}$$

In this context, $\mu_{t} = \frac{1}{M - 1} \sum_{i = 1}^{M - 1} x_{i}$ and $\sigma_{t}^{2} = \frac{1}{M - 1} \sum_{i = 1}^{M - 1} (x_{i} - \mu_{t})^{2}$ represent the mean and variance computed across all neurons within the channel except the target neuron t. Since the closed-form solutions given in Equation 11 and Equation 12 are derived at the channel level, it is reasonable to assume that all pixels in a single channel follow the same distribution. Under this assumption, the mean and variance can be computed once across the entire set of neurons and reused for every neuron in the channel. This approach substantially reduces computational overhead by avoiding repeated calculation of μ and σ for each spatial location. Consequently, the minimum energy can be efficiently computed as follows:

$$e_{t}^{*} = \frac{4 (\hat{\sigma}^{2} + \lambda)}{(t - \hat{\mu})^{2} + 2 \hat{\sigma}^{2} + 2 \lambda} \tag{13}$$

The mean and variance are estimated as $\hat{\mu} = \frac{1}{M} \sum_{i = 1}^{M} x_{i}$ and $\hat{\sigma}^{2} = \frac{1}{M} \sum_{i = 1}^{M} (x_{i} - \hat{\mu})^{2}$, respectively. Equation 13 indicates that a lower energy value $e_{t}^{*}$ corresponds to greater discriminability of neuron t relative to its surrounding neurons, implying higher importance in visual processing. Thus, the significance of each neuron can be quantified as $1/e_{t}^{*}$. By incorporating a scaling operator for feature refinement, the overall optimization process of the module can be formulated as follows:

$$\tilde{X} = \mathrm{sigmoid}\left(\frac{1}{E}\right) \odot X \tag{14}$$

In Equation 14, E denotes the aggregation of all $e_{t}^{*}$ values across both channel and spatial dimensions, and ⊙ represents element-wise multiplication. A sigmoid function is applied to constrain excessively large values in E, and the result is multiplied by the original feature map X to produce a weighted feature map. This operation preserves the relative importance of each neuron, as the sigmoid function is a monotonic transformation.
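Equations 13 and 14 can be traced on a single channel with a few lines of plain Python. This is a minimal sketch, not the implementation used in this study: it processes one H×W channel stored as nested lists, uses the 1/M mean and variance estimates given above, and assumes λ = 1e-4.

```python
import math

def simam_channel(channel, lam=1e-4):
    """Apply SimAM weighting (Equations 13 and 14) to one H x W channel."""
    values = [v for row in channel for v in row]
    m = len(values)
    mu = sum(values) / m                           # channel mean (mu hat)
    var = sum((v - mu) ** 2 for v in values) / m   # channel variance (sigma hat squared)

    def weight(t):
        # 1 / e_t* from Equation 13, followed by the sigmoid of Equation 14.
        inv_energy = ((t - mu) ** 2 + 2 * var + 2 * lam) / (4 * (var + lam))
        return 1.0 / (1.0 + math.exp(-inv_energy))

    return [[weight(t) * t for t in row] for row in channel]

# Toy 3 x 3 channel: one neuron (value 5.0) stands out from a flat background.
channel = [[1.0, 1.0, 1.0],
           [1.0, 5.0, 1.0],
           [1.0, 1.0, 1.0]]
out = simam_channel(channel)
center_weight = out[1][1] / 5.0
corner_weight = out[0][0] / 1.0
print(round(center_weight, 3), round(corner_weight, 3))
```

The neuron that differs most from the channel mean receives the larger attention weight, while background neurons are scaled down more strongly. No trainable parameters are involved; λ is the only setting.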

3 Results and analysis

3.1 Experimental environment, parameter settings, and model evaluation metrics

All experiments were conducted on a workstation equipped with an Intel(R) Xeon(R) W-2245 CPU (3.9GHz) and an NVIDIA Quadro RTX 5000 GPU (16GB), running Windows 10. The software environment included Anaconda3 (2021.11), PyCharm as the integrated development environment (IDE), and PyTorch 2.1.2 built on Python 3.8.19. The training hyperparameters are listed in Table 1. To ensure consistency, all algorithms were executed under identical hardware and software configurations.

Table 1

Table 1. The setting of hyperparameters during the model training process.

In this study, the performance of the model was evaluated based on several established metrics: recall (R), precision (P), F1 score (F1), average precision (AP), and mean average precision (mAP). Within the context of binary classification, the following standard definitions were applied: true positives (TP) correspond to positive instances correctly classified; false positives (FP) denote negative instances mistakenly identified as positive; false negatives (FN) refer to positive samples incorrectly classified as negative; and true negatives (TN) indicate negative samples accurately recognized. Precision is defined as the ratio of correctly predicted positive samples to all instances predicted as positive. Recall measures the proportion of actual positive samples that are correctly identified. The F1 score is the harmonic mean of precision and recall, serving as a balanced evaluation metric that more comprehensively reflects the recognition performance of individual categories. AP is the area under the precision-recall curve of a single category, and mAP is the mean of the AP values over all categories, reflecting overall detection performance. mAP@50% counts a detection as correct when its intersection-over-union (IoU) with the ground truth is at least 0.5, whereas mAP@50-95% averages over IoU thresholds from 0.5 to 0.95. A higher score indicates greater accuracy across all categories. The specific calculation methods for these evaluation metrics are detailed in Equations 15–19.

$$P = \frac{TP}{TP + FP} \tag{15}$$

$$R = \frac{TP}{TP + FN} \tag{16}$$

$$F1 = \frac{2 \times R \times P}{R + P} \tag{17}$$

$$AP = \int P(R)\, dR \times 100\% \tag{18}$$

$$mAP = \frac{\sum_{i = 1}^{N} AP_{i}}{N} \tag{19}$$
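Equations 15–19 translate directly into code. The sketch below is illustrative: the counts and curve points are made up, and the trapezoidal rule is a simplifying assumption for the integral in Equation 18 (detection toolkits usually integrate an interpolated precision envelope instead).

```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

def average_precision(recalls, precisions):
    """Trapezoidal area under a P(R) curve given as matching, recall-sorted lists."""
    area = 0.0
    for i in range(1, len(recalls)):
        area += (recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2
    return area

def mean_average_precision(ap_per_class):
    return sum(ap_per_class) / len(ap_per_class)

# Hypothetical counts for one class: 45 hits, 5 false alarms, 15 misses.
p, r = precision(45, 5), recall(45, 15)
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3))  # 0.9 0.75 0.818

# Hypothetical three-point precision-recall curve.
ap = average_precision([0.0, 0.5, 1.0], [1.0, 0.8, 0.6])
print(round(ap, 3))  # 0.8
```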

When designing and evaluating lightweight deep learning models, we prioritize the following four lightweight evaluation metrics, which reflect the model’s computational efficiency, memory usage, and inference speed.

The number of parameters measures the total count of trainable variables, directly impacting memory consumption and computational load. Lightweight architectures typically employ efficient layer designs and parameter compression strategies to reduce storage and processing costs (Choi et al., 2025). Floating-point operations (FLOPs) estimate the computational workload of a single forward pass. Effectively reducing FLOPs lowers energy consumption and enhances model applicability on low-power devices (Junaid et al., 2024). Model size refers to the storage space occupied by saved weights. Techniques like quantization and compression are commonly used to reduce model size, enabling deployment on embedded platforms (Yuan et al., 2013). FPS (frames per second) indicates the throughput of model inference, representing the number of images processed per second. Higher FPS supports real-time applications such as video analysis (Smistad et al., 2019). In this study, FPS is measured with a batch size of 1. In the ablation and comparative experiments, all reported values are the average of five independent assessments.
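The FPS protocol described above (batch size of 1, averaged over five assessments) can be sketched as follows. The timing procedure used in this study is not described in further detail, so the warm-up count, the number of timed runs, and the placeholder inference function are all assumptions.

```python
import time

def measure_fps(infer, sample, warmup=5, runs=50):
    """Average frames per second of infer(sample), one image per call."""
    for _ in range(warmup):              # warm-up calls are excluded from timing
        infer(sample)
    start = time.perf_counter()
    for _ in range(runs):
        infer(sample)
    elapsed = time.perf_counter() - start
    return runs / elapsed

def dummy_infer(image):
    # Placeholder for a real forward pass: sums the pixels of a fake image.
    return sum(sum(row) for row in image)

fake_image = [[0.5] * 64 for _ in range(64)]
fps_runs = [measure_fps(dummy_infer, fake_image) for _ in range(5)]
print(sum(fps_runs) / len(fps_runs))     # mean over five assessments
```

When timing a real model on a GPU, the device should be synchronized before each clock reading, since kernel launches are asynchronous and would otherwise make the measured time too short.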

3.2 Ablation experiment results

As shown in Table 2, this ablation study systematically evaluated the effectiveness of the three modules introduced in this research. The baseline model YOLOv11n achieved mAP@50% of 64.94%, mAP@50-95% of 58.66%, and an F1 score of 59.35%, with 2.583 million parameters. When replacing the standard downsampling module with the ADown module, mAP@50% improved to 87.55%, while reducing parameters to 2.104 million and increasing inference speed from 144 fps to 156 fps. Using BiFPN alone slightly improved recognition accuracy while significantly reducing model parameters. The SimAM module substantially enhanced recognition performance without adding extra parameters, increasing accuracy by 11.00 percentage points to 59.86% and boosting the F1 score by 6.54 percentage points to 65.89%. Meanwhile, the combination of modules demonstrates significant synergistic effects. The collaborative application of ADown and BiFPN generates an ultra-low-parameter model achieving a high mAP@50%, highlighting their complementary roles in efficient feature extraction and fusion for garlic damage detection. Integrating ADown with SimAM further enhances performance, underscoring the powerful interaction between structured downsampling and parameter-free attention mechanisms. Finally, integrating all three enhancement strategies, Garlic-YOLO-DD achieves an impressive 92.58% mAP@50% while maintaining minimal parameters and high recognition speed. These results further validate the effectiveness of each module and the soundness of the overall architecture.
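The relative changes quoted in this paragraph and in the abstract can be cross-checked with simple arithmetic. The inputs below are the figures stated in the text; the last two values are derived here from the abstract's ratios and should be read as implied estimates, not as reported results.

```python
baseline = {"map50": 64.94, "params_m": 2.583, "fps": 144}
adown_only = {"map50": 87.55, "params_m": 2.104, "fps": 156}
final_map50 = 92.58

gain_pp = final_map50 - baseline["map50"]                      # percentage points
adown_param_ratio = adown_only["params_m"] / baseline["params_m"]
adown_fps_gain = adown_only["fps"] / baseline["fps"] - 1

# Derived from the abstract's ratios, not reported directly in this section.
implied_final_params_m = baseline["params_m"] * 0.5796
implied_final_fps = baseline["fps"] * 1.1597

print(round(gain_pp, 2))                  # 27.64
print(round(adown_param_ratio * 100, 1))  # 81.5
print(round(adown_fps_gain * 100, 1))     # 8.3
print(round(implied_final_params_m, 3))   # 1.497
print(round(implied_final_fps, 1))        # 167.0
```

The 27.64 figure is a difference between two percentages, so it is a gain in percentage points, not a relative improvement.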

Table 2

Table 2. Detailed procedure for the ablation experiment.

To systematically evaluate the performance of Garlic-YOLO-DD on normal, partially damaged, and root-damaged garlic during training, this study compared and analyzed its Precision-Confidence (P-C), Recall-Confidence (R-C), F1-Confidence (F1-C), and Precision-Recall (PR) curves. As shown in Figure 8, these curves reflect the model’s classification reliability, recall capability, and inter-class performance differences at various confidence thresholds. The P-C curves for all three categories remain high across all confidence levels, indicating stable, high precision. The normal garlic curve rises steeply toward the upper-right quadrant, indicating strong alignment between predicted confidence and actual precision with minimal false positives. In contrast, the partially damaged and root-damaged curves fluctuate slightly in the low-confidence range, reflecting greater morphological variability within these damage categories. As confidence increases, however, their precision stabilizes at high levels, supporting the model’s robust discrimination capability. The R-C curves decrease monotonically as the confidence threshold increases, as expected. Root damage shows the gentlest initial decline, indicating higher sensitivity for this class. The F1-C curves all show distinct peaks, which help determine the confidence interval that best balances precision and recall. The optimal threshold is highest for normal garlic and relatively lower for partially damaged garlic. Finally, the PR curves further confirm the model’s overall performance. All curves cluster tightly in the upper-right quadrant, with the normal garlic category achieving the highest values and a near-ideal curve shape. Both damaged garlic categories also show relatively good recognition performance, confirming the model’s effectiveness in damage identification.
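The confidence-dependent curves described above can be illustrated with a minimal Python sketch. It assumes each detection has already been matched to the ground truth (for example by IoU), and the detections, counts, and function name are hypothetical values chosen for illustration; they are not taken from the garlic dataset or the authors' code.

```python
def metrics_at_threshold(detections, num_ground_truth, threshold):
    """Return (precision, recall, f1) for the detections kept at `threshold`.

    `detections` is a list of (confidence, is_true_positive) pairs; matching
    predictions to ground-truth boxes is assumed to have been done already.
    """
    kept = [is_tp for confidence, is_tp in detections if confidence >= threshold]
    true_positives = sum(kept)
    # With no detections kept, precision is undefined; 1.0 is a common convention.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / num_ground_truth
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical detections for one class: (confidence, matched a ground-truth box?)
toy_detections = [(0.95, True), (0.90, True), (0.80, False),
                  (0.70, True), (0.40, False), (0.30, True)]
toy_ground_truth = 5  # hypothetical number of labelled objects

thresholds = [0.25, 0.50, 0.85]
curve = [metrics_at_threshold(toy_detections, toy_ground_truth, t) for t in thresholds]
# The F1-C peak is the threshold whose F1 value is largest.
best_threshold = max(zip(thresholds, curve), key=lambda item: item[1][2])[0]

for t, (p, r, f1) in zip(thresholds, curve):
    print(f"conf>={t:.2f}  precision={p:.3f}  recall={r:.3f}  F1={f1:.3f}")
print("threshold with highest F1:", best_threshold)
```

Sweeping the threshold over a fine grid instead of three values would trace the full P-C, R-C, and F1-C curves. In this toy data, recall can only fall as the threshold rises, which is the monotonic trend noted for the R-C curves.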

Four graphs display performance metrics for garlic classification. The graphs include Precision-Confidence, Recall-Confidence, F1-Confidence, and Precision-Recall curves. Lines represent different garlic conditions: normal, root damage, local damage, and all classes. The Precision-Recall curve shows a mean average precision of 0.926 at 0.5.

Figure 8. Precision-confidence curve, Recall-confidence curve, F1-confidence curve, and Precision-Recall curve of Garlic-YOLO-DD during training.

Figure 9 compares the proposed Garlic-YOLO-DD model with the baseline YOLOv11n on the validation set over the training epochs, focusing on the bounding box regression loss and mAP@50% curves. The results show that Garlic-YOLO-DD converges more stably and reaches a lower final loss. The baseline YOLOv11n exhibits high loss values during early training and persistent fluctuations throughout the training process, ultimately converging to a relatively high loss value. This pattern indicates potential difficulty in optimizing localization for this task. In contrast, Garlic-YOLO-DD’s loss curve descends faster in the early epochs, converges more smoothly without significant fluctuations, and ends at a lower value. The substantial reduction in regression loss indicates that Garlic-YOLO-DD’s architectural optimizations enhance the model’s localization capability, yielding more precise bounding box coordinates for garlic recognition. For the mAP@50% curves, Garlic-YOLO-DD likewise shows better convergence characteristics and final performance than YOLOv11n. Specifically, the YOLOv11n curve has a relatively flat initial ascent and stays at a low level throughout training, resulting in a lower final mAP@50%. The Garlic-YOLO-DD curve rises more steeply at first, suggesting faster feature learning and more efficient gradient propagation as a result of the structural optimization. More importantly, this curve not only converges faster initially but also maintains higher performance throughout the entire training process. This comparison supports the effectiveness of the model’s structural design from a training dynamics perspective.

Two line graphs comparing the performance of Garlic-YOLO-DD and YOLOv11n. The top graph shows loss over 300 epochs, with both models decreasing steadily before plateauing. Garlic-YOLO-DD has a lower final loss. The bottom graph shows mAP50 over 300 epochs, with Garlic-YOLO-DD outperforming YOLOv11n throughout, reaching a higher mAP score.

Figure 9. Comparison of the bounding box loss curves before and after model improvement, and comparison of mAP@50% before and after model improvement.

To visually assess the performance gap between the baseline model YOLOv11n and Garlic-YOLO-DD, Figure 10 presents a comparison visualization of detection results on the test set before and after model refinement. Most notably, YOLOv11n generates a high number of false detections, misclassifying damaged garlic as intact garlic, reflecting its insufficient feature discrimination capability. In contrast, Garlic-YOLO-DD demonstrates outstanding detection performance on the test set. This model significantly reduces false detection rates and successfully identifies damaged garlic instances. This improvement can be attributed to the SimAM module, which enhances feature attention, while the BiFPN architecture improves multi-scale feature integration. In summary, this visual analysis strongly supports the findings of the quantitative study, confirming that Garlic-YOLO-DD not only outperforms the baseline model numerically but also demonstrates higher reliability in the testing task.

Comparison of garlic detection using two models: YOLOv11n and Garlic-YOLO-DD. Each model identifies garlic heads and root damage with confidence scores displayed in blue. Upper images represent YOLOv11n results; lower images show Garlic-YOLO-DD results, both evaluated on arrays of garlic heads.

Figure 10. Comparison of recognition performance before and after model improvement.

3.3 Comparative experiment results

To comprehensively evaluate the proposed model, this study conducted comparative experiments with nine classic object detection algorithms: Faster R-CNN, RT-DETR, YOLOv5n, YOLOv6n, YOLOv8n, YOLOv10n/s, YOLOv11n, and YOLOv12n. As shown in Table 3, all algorithms were evaluated under identical experimental conditions. The proposed Garlic-YOLO-DD model achieved the highest detection accuracy, with an mAP@50% of 92.58%, surpassing the second-best YOLOv10s (89.06%) by 3.52 percentage points. Garlic-YOLO-DD also achieved a precision of 87.48%, a recall of 84.26%, and an F1 score of 85.84%. Importantly, it achieves these accuracy gains while remaining highly lightweight. The model contains only 1.497 million parameters, a 31.4% reduction compared to the already compact YOLOv5n. Garlic-YOLO-DD has a computational cost of 5.0 GFLOPs, lower than all comparison models. Its weight file is only 3.4 MB, just 20.5% of the size of the YOLOv10s model. In terms of inference efficiency, the model reaches 167 frames per second, surpassing all lightweight counterparts in the study and demonstrating a clear throughput advantage.

Table 3. Comparative experiments with classic models.

To investigate the impact of input image size on detection performance, this study systematically evaluated Garlic-YOLO-DD’s garlic damage detection performance at different resolutions. As shown in Table 4, as the resolution increased from 240×240 pixels to 960×960 pixels, a distinct nonlinear relationship emerged between input scale and model performance. The optimal input size was 640×640 pixels, where the model achieved the best overall metrics: an mAP@50% of 92.58%, an mAP@50-95% of 86.59%, a precision of 87.48%, a recall of 84.26%, and an F1 score of 85.84%. At this resolution, the model demonstrated the highest localization accuracy and classification performance.

Table 4. The impact of different image sizes on model results.

It is noteworthy that recognition performance did not improve monotonically with increased resolution. For instance, increasing the input size from 340×340 to 460×460 resulted in a noticeable decline in recognition performance, indicating that the model is sensitive to input scale. At 760×760 pixels, recall improved to 88.52%, but precision dropped to 80.26%, suggesting a rising false alarm rate. These results underline the importance of balancing feature detail against computational cost when selecting input image dimensions. The 640×640 resolution provides sufficient spatial information for reliable garlic damage detection while avoiding the performance degradation caused by excessive computation, offering practical guidance for model deployment in real-world scenarios.
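The precision-recall trade-off at the two resolutions can be made concrete with the harmonic-mean definition of the F1 score. The worked example below uses the precision and recall reported above for 640×640 and 760×760. The 760×760 F1 value is derived here from those two figures rather than quoted from Table 4, so it may differ from the tabulated value by rounding.

```latex
F_1 = \frac{2PR}{P + R}

F_1^{(640)} = \frac{2 \times 87.48 \times 84.26}{87.48 + 84.26} \approx 85.84\%

F_1^{(760)} = \frac{2 \times 80.26 \times 88.52}{80.26 + 88.52} \approx 84.19\%
```

Under these figures, the higher recall at 760×760 does not offset the lower precision, which is consistent with 640×640 being the better operating point.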

As shown in Figure 11, this study investigates the impact of embedding the attention mechanism at different depths within the backbone network on model performance. Five schemes were designed for comparison, ranging from shallow to deep layers of the model backbone. Experimental results demonstrate that the embedding position of the attention mechanism significantly influences Garlic-YOLO-DD’s performance in garlic damage detection, with performance improvements exhibiting a clear positive correlation with embedding depth.

Flowchart depicting a neural network backbone structure with various layers labeled from left to right: Conv (k=3, s=2, p=1) twice, followed by C3K2 with a shortcut, ADown, C3K2 with a shortcut, another ADown, C3K2 with a shortcut, SPPF, and C2PSA with a shortcut. Each layer is organized under five plans.

Figure 11. Schematic diagram showing the attention mechanism positioned at different locations.

Table 5 shows the performance of Garlic-YOLO-DD when the SimAM attention mechanism is embedded at different depths of the backbone network. When SimAM is placed only at the shallow stage (Plan 1), the model achieves an mAP@50% of 82.36% and an F1 score of 77.07%, indicating limited improvement. As the module is embedded at deeper levels, performance gradually improves. Plan 2 and Plan 3 increase mAP@50% to 84.22% and 86.72%, with F1 scores of 78.15% and 81.28%, respectively. This indicates that applying attention to deeper features captures more semantically discriminative information. When SimAM is embedded deeper still (Plan 4), mAP@50% reaches 91.95% and the F1 score rises to 85.30%, which are 9.59 and 8.23 percentage points higher than Plan 1, respectively. When SimAM is placed at the deepest layer (Plan 5), Garlic-YOLO-DD achieves the best results for garlic damage recognition: an mAP@50% of 92.58%, an mAP@50-95% of 86.59%, a precision of 87.48%, a recall of 84.26%, and an F1 score of 85.84%. These results indicate that embedding the attention mechanism at a deeper level enables the model to focus more effectively on the key areas for damage detection.

Table 5. Comparison of results from placing attention mechanisms in different positions.
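For readers unfamiliar with the attention module being repositioned in these plans, the following is a minimal pure-Python sketch of the weighting commonly associated with SimAM, applied to a single channel. It is an illustration under stated assumptions, not the authors' implementation: the function name `simam_channel`, the list-of-lists input, and the default `e_lambda` value are choices made here for readability, and a practical version would operate on batched tensors in a deep learning framework.

```python
import math


def simam_channel(feature_map, e_lambda=1e-4):
    """Apply SimAM-style weighting to one channel given as a 2D list of floats.

    The module has no learnable weights; `e_lambda` is a fixed regularizer.
    The channel must contain at least two values.
    """
    values = [v for row in feature_map for v in row]
    n = len(values) - 1
    mean = sum(values) / len(values)
    # Squared deviation of every position from the channel mean.
    sq_dev = [[(v - mean) ** 2 for v in row] for row in feature_map]
    variance = sum(d for row in sq_dev for d in row) / n

    weighted = []
    for row, dev_row in zip(feature_map, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Positions that deviate more from the mean get a larger score.
            score = d / (4.0 * (variance + e_lambda)) + 0.5
            attention = 1.0 / (1.0 + math.exp(-score))  # sigmoid
            out_row.append(v * attention)
        weighted.append(out_row)
    return weighted


# Hypothetical 3x3 channel with one distinctive activation in the centre.
example_map = [[1.0, 1.0, 1.0],
               [1.0, 5.0, 1.0],
               [1.0, 1.0, 1.0]]
weighted_map = simam_channel(example_map)
centre_weight = weighted_map[1][1] / example_map[1][1]
corner_weight = weighted_map[0][0] / example_map[0][0]
print(f"centre weight={centre_weight:.3f}, corner weight={corner_weight:.3f}")
```

In this toy channel the distinctive centre activation receives a larger attention weight than the uniform background. That is the behaviour one would expect to help emphasize small damaged regions, and it adds no parameters to the model.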

4 Discussion

The Garlic-YOLO-DD model proposed in this study demonstrates outstanding performance in garlic damage detection tasks. It exhibits exceptional comprehensive advantages in accuracy, lightweight design, and inference speed, providing an innovative solution for lightweight visual detection of garlic damage. The following sections will discuss the model’s performance, structural design, application potential, and limitations.

Firstly, Garlic-YOLO-DD maintains an extremely low parameter count and computational cost while achieving an mAP@50% of 92.58% and an F1 score of 85.84%, significantly outperforming several mainstream lightweight models. This result further validates the combination of the introduced modules. Notably, with a 42.04% reduction in parameters, the model’s mAP@50% is 27.64 percentage points higher than that of YOLOv11n, demonstrating that the architecture design significantly improves parameter efficiency.
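The comparison with YOLOv11n involves two kinds of change: a relative reduction (parameters) and an absolute difference between two percentages (mAP@50%). The short Python sketch below reproduces that arithmetic using only figures reported in this article (2.583 M and 1.497 M parameters, 64.94% and 92.58% mAP@50%, 144 and 167 fps). The function names are illustrative and are not part of the authors' code.

```python
def relative_change_percent(baseline, improved):
    """Relative change of `improved` with respect to `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


def percentage_point_gain(baseline_percent, improved_percent):
    """Absolute difference between two values that are already percentages."""
    return improved_percent - baseline_percent


# Figures reported in the article for YOLOv11n (baseline) and Garlic-YOLO-DD.
param_change = relative_change_percent(2.583, 1.497)   # parameters, millions
fps_change = relative_change_percent(144, 167)         # frames per second
map_gain_points = percentage_point_gain(64.94, 92.58)  # mAP@50%, in points
map_gain_relative = relative_change_percent(64.94, 92.58)

print(f"parameters: {param_change:.2f}%")
print(f"inference speed: {fps_change:+.2f}%")
print(f"mAP@50%: {map_gain_points:+.2f} points ({map_gain_relative:+.2f}% relative)")
```

The sketch recovers the 42.04% parameter reduction, the 15.97% speed-up, and the 27.64-point mAP@50% gain. Expressed relative to the baseline value, the same mAP@50% gain is about 42.6%, which is why the unit matters when quoting it.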

Secondly, the ablation studies demonstrate the critical role of each introduced module. The ADown module, through its asymmetric convolution design, preserves feature information during downsampling, boosting model accuracy while substantially lowering computational overhead. The parameter-free SimAM attention mechanism enhances the model’s ability to focus on damaged areas of garlic without substantially increasing the computational burden. BiFPN enhances multi-scale representations through weighted bidirectional feature fusion, improving consistency and discrimination across varying damage sizes. Experiments with five attention placement schemes reveal that embedding the attention mechanism deeper within the backbone captures more damage-related information, leading to larger performance gains. Comparative experiments on input image resolution indicate that the model performs best at 640×640 pixels, which suggests that this resolution lets the model capture the most damage information in garlic bulbs; lower or higher resolutions reduce its effectiveness in identifying damaged areas.
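The weighted fusion used by BiFPN can be sketched in a few lines of Python. The sketch assumes the fast normalized fusion rule, in which each input feature gets a non-negative weight and the weights are normalized by their sum; whether Garlic-YOLO-DD uses exactly this variant is an assumption. The function name, the flat-list features, and the example weights are hypothetical, and the resizing and convolution steps that surround fusion in a real network are omitted.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, normalized weights.

    `features` is a list of equally sized lists (already resized to a common
    resolution); `weights` holds one fusion weight per feature.
    """
    clipped = [max(w, 0.0) for w in weights]  # ReLU keeps weights non-negative
    total = sum(clipped) + eps                # eps avoids division by zero
    length = len(features[0])
    return [
        sum(w * feature[i] for w, feature in zip(clipped, features)) / total
        for i in range(length)
    ]


# Hypothetical two-level fusion: equal weights give (almost) the plain average.
shallow = [1.0, 2.0]
deep = [3.0, 4.0]
fused_equal = fast_normalized_fusion([shallow, deep], [1.0, 1.0])
# A negative weight is clipped to zero, so only the first input contributes.
fused_clipped = fast_normalized_fusion([shallow, deep], [1.0, -1.0])
print(fused_equal, fused_clipped)
```

In a trained network the fusion weights would be learned per fusion node rather than fixed, which lets the network decide how much each scale contributes to the fused feature.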

Although the base model YOLOv11n did not achieve optimal performance in the garlic recognition task, selecting it as the base model more clearly highlights the effect of the subsequent enhancements. The benefit of the proposed improvement strategy may, however, vary across base models, which warrants further investigation. Comparative experiments confirm that Garlic-YOLO-DD not only surpasses stronger base models such as YOLOv6n in accuracy but also maintains a significant advantage in model complexity.

Despite these promising results, several limitations remain and will be addressed in future research. First, the training and validation data were collected under specific imaging conditions, which limits their diversity and leaves generalization uncertain. Future work should validate the model’s ability to identify garlic damage under varying lighting conditions, garlic varieties, and camera hardware. Additionally, the current model distinguishes only three categories (normal, partially damaged, and root-damaged garlic). Extending it to further damage categories and anomalies, such as moldy, sprouted, or diseased garlic, would enhance its value. Second, while Garlic-YOLO-DD has a low parameter count, a low computational cost, and a high inference speed, its real-time performance on resource-constrained edge devices requires further validation.

Furthermore, a real-time garlic conveying and sorting device will be incorporated into future research. In the envisioned system, as garlic bulbs move along a conveyor belt, multi-angle images are captured and analyzed in real time to identify root or skin damage within milliseconds, and the sorting device automatically rejects non-compliant bulbs when defects are detected. Such a system would replace manual inspection, improve sorting consistency and throughput, reduce labor costs, and avoid direct contact with agricultural products. The model thus offers a promising technical pathway for intelligent post-harvest processing of garlic, illustrating how computer vision can translate into tangible agricultural productivity.

5 Conclusion

The lightweight garlic damage detection model Garlic-YOLO-DD proposed in this study integrates the ADown downsampling module, the parameter-free SimAM attention mechanism, and the BiFPN feature fusion architecture. This approach significantly reduces parameter count and computational complexity while substantially improving recognition accuracy. Experimental results show that the model achieves an mAP@50% of 92.58% on a self-built garlic damage dataset with only 1.497 million parameters and an inference speed of 167 frames per second. Its overall performance shows a clear advantage over current mainstream lightweight detection models. Garlic-YOLO-DD provides a computer vision solution for rapid, non-destructive detection of garlic damage, with promising prospects for practical application.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

Author contributions

YG: Conceptualization, Investigation, Software, Writing – original draft, Writing – review & editing. XM: Data curation, Methodology, Supervision, Writing – original draft. ZX: Formal analysis, Project administration, Validation, Writing – review & editing. TQ: Funding acquisition, Resources, Visualization, Writing – original draft. XW: Software, Supervision, Validation, Writing – original draft. ZH: Methodology, Project administration, Visualization, Writing – original draft. GC: Conceptualization, Data curation, Formal analysis, Funding acquisition, Writing – original draft, Writing – review & editing.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Anum, H., Tong, Y. X., and Cheng, R. F. (2024). Different preharvest diseases in garlic and their eco-friendly management strategies. Plants-Basel 13, 14. doi: 10.3390/plants13020267

PubMed Abstract | Crossref Full Text | Google Scholar

Badgujar, C. M., Poulose, A., and Gan, H. (2024). Agricultural object detection with you only look once (Yolo) algorithm: A bibliometric and systematic literature review. Comput. Electron. Agric. 223, 18. doi: 10.1016/j.compag.2024.109090

Crossref Full Text | Google Scholar

Bi, C. G., Hu, N., Zou, Y. Q., Zhang, S., Xu, S. Z., and Yu, H. L. (2022). Development of deep learning methodology for maize seed variety recognition based on improved swin transformer. Agronomy-Basel 12, 20. doi: 10.3390/agronomy12081843

Crossref Full Text | Google Scholar

Cao, J. S., Sen, R., Interlandi, M., Arulraj, J., and Kim, H. (2023). Gpu database systems characterization and optimization. Proc. Vldb Endowment 17, 441–454. doi: 10.14778/3632093.3632107

Crossref Full Text | Google Scholar

Chen, Z. Y., Yu, H. L., Song, S. Z., Bi, C. G., Guo, J. Y., and Ling, X. (2025). J-rice-resnext: A deep learning-enhanced framework for high-accuracy japonica rice varietal classification in precision agriculture. Ind. Crops Prod 237, 18. doi: 10.1016/j.indcrop.2025.122197

Crossref Full Text | Google Scholar

Choe, H. O. and Lee, M. H. (2023). Artificial intelligence-based fault diagnosis and prediction for smart farm information and communication technology equipment. Agriculture-Basel 13, 19. doi: 10.3390/agriculture13112124

Crossref Full Text | Google Scholar

Choi, G., Lee, K., and Kwak, N. (2025). Waps-quant: low-bit post-training quantization using weight-activation product scaling. IEEE Access 13, 79534–79547. doi: 10.1109/access.2025.3566307

Crossref Full Text | Google Scholar

Dalal, M. and Mittal, P. (2025). A systematic review of deep learning-based object detection in agriculture: methods, challenges, and future directions. Cmc-Computers Materials Continua 84, 57–91. doi: 10.32604/cmc.2025.066056

Crossref Full Text | Google Scholar

Ge, Y. F., Song, S. Z., Yu, S., Zhang, X. L., and Li, X. F. (2024). Rice seed classification by hyperspectral imaging system: A real-world dataset and a credible algorithm. Comput. Electron. Agric. 219, 13. doi: 10.1016/j.compag.2024.108776

Crossref Full Text | Google Scholar

Gong, X. Y. and Wu, Q. F. (2025). Fruit detection methods based on deep learning in agricultural planting: A systematic literature review. IEEE Access 13, 96092–96110. doi: 10.1109/access.2025.3573364

Crossref Full Text | Google Scholar

Islam, M. T., Swapnil, S. S., Billal, M. M., Karim, A., Shafiabady, N., and Hassan, M. M. (2025). Resource constraint crop damage classification using depth channel shuffling. Eng. Appl. Artif. Intell. 144, 19. doi: 10.1016/j.engappai.2025.110117

Crossref Full Text | Google Scholar

Jia, G., YueMing, T., YongQing, Z., FangYao, L., YueJian, L., and MingJun, M. (2020). Effect of map technology on storage quality of post-harvest dry garlic. Xi'nan nongye xuebao 33, 1573–1579. doi: 10.16213/j.cnki.scjas.2020.7.035

Crossref Full Text | Google Scholar

Jin, B. C., Zhang, C., Jia, L. Q., Tang, Q. Z., Gao, L., Zhao, G. W., et al. (2022). Identification of rice seed varieties based on near-infrared hyperspectral imaging technology combined with deep learning. ACS Omega 15. doi: 10.1021/acsomega.1c04102

PubMed Abstract | Crossref Full Text | Google Scholar

Joshi, D., Butola, A., Kanade, S. R., Prasad, D. K., Mithra, S. V. A., Singh, N. K., et al. (2021). Label-free non-invasive classification of rice seeds using optical coherence tomography assisted with deep neural network. Optics Laser Technol. 137, 7. doi: 10.1016/j.optlastec.2020.106861

Crossref Full Text | Google Scholar

Junaid, M., Aliev, H., Park, S., Kim, H., Yoo, H., and Sim, S. (2024). Hybrid precision floating-point (Hpfp) selection to optimize hardware-constrained accelerator for cnn training. Sensors 24, 22. doi: 10.3390/s24072145

PubMed Abstract | Crossref Full Text | Google Scholar

Kale, R. B., Khandagale, K., Ramadas, S., Gavhane, A. D., Gedam, P., and Mahajan, V. (2024). Unravelling physiological disorders in onion and garlic: critical assessment and bibliometric visualization. Front. Plant Sci. 15. doi: 10.3389/fpls.2024.1500917

PubMed Abstract | Crossref Full Text | Google Scholar

Kanna, S. K., Kumaraperumal, R., Pazhanivelan, P., Jagadeeswaran, R., and Prabu, P. C. (2024). Yolo deep learning algorithm for object detection in agriculture: A review. J. Agric. Eng. 55, 15. doi: 10.4081/jae.2024.1641

Crossref Full Text | Google Scholar

Khanam, R. and Hussain, M. (2024). Yolov11: an overview of the key architectural enhancements. Arxiv. doi: 10.48550/arXiv.2410.17725

Crossref Full Text | Google Scholar

Koklu, M., Cinar, I., and Taspinar, Y. S. (2021). Classification of rice varieties with deep learning methods. Comput. Electron. Agric. 187, 8. doi: 10.1016/j.compag.2021.106285

Crossref Full Text | Google Scholar

Lan, M. Y., Liu, C. J., Zheng, H. W., Wang, Y. W., Cai, W. X., Peng, Y. T., et al. (2024). Rice-yolo: in-field rice spike detection based on improved yolov5 and drone images. Agronomy-Basel 14, 23. doi: 10.3390/agronomy14040836

Crossref Full Text | Google Scholar

Lazarou, E. and Exarchos, T. P. (2024). Predicting stress levels using physiological data: real-time stress prediction models utilizing wearable devices. AIMS Neurosci. 11, 76–102. doi: 10.3934/Neuroscience.2024006

PubMed Abstract | Crossref Full Text | Google Scholar

Li, Y. H., Li, C., Liu, L. P., Zhou, K., and Hou, J. L. (2024). Design and test of the plant-correcting reel for harvesting lodging garlic plants. Int. J. Agric. Biol. Eng. 17, 59–68. doi: 10.25165/j.ijabe.20241701.7784

Crossref Full Text | Google Scholar

Liu, R. H., Shao, Z. X., Yu, Z. Z., and Li, R. (2025). Research on real-time helmet detection and deployment based on an improved yolov7 network with channel pruning. Signal Image Video Process. 19, 10. doi: 10.1007/s11760-024-03584-5

Crossref Full Text | Google Scholar

Liu, Y. L., Yang, D. G., Song, T. T., Ye, Y. C., and Zhang, X. (2025). Yolo-ssp: an object detection model based on pyramid spatial attention and improved downsampling strategy for remote sensing images. Visual Comput. 41, 1467–1484. doi: 10.1007/s00371-024-03434-y

Crossref Full Text | Google Scholar

Madhu, B., Mudgal, V. D., and Champawat, P. S. (2019a). Storage of garlic bulbs (Allium sativum L.): A review. J. Food Process Eng. 42, 6. doi: 10.1111/jfpe.13177

Crossref Full Text | Google Scholar

Madhu, B., Mudgal, V. D., and Champawat, P. S. (2019b). Storage of garlic bulbs (Allium sativum L.): A review. J. Food Process Eng. 42, 6. doi: 10.1111/jfpe.13177

Crossref Full Text | Google Scholar

Madhu, B., Mudgal, V. D., and Champawat, P. S. (2025). Sustainable garlic production and value chain development: A holistic review of post-harvest technological interventions, processing, and product diversification. J. Food Process Eng. 48, 18. doi: 10.1111/jfpe.70205

Crossref Full Text | Google Scholar

Makarichian, A., Ahmadi, E., Chayjan, R. A., Zafari, D., and Mohtasebi, S. S. (2025). Use of the electronic nose to monitor the influences of modified atmosphere packaging on the storage of contaminated garlic. Heliyon 11, e42609. doi: 10.1016/j.heliyon.2025.e4260910.1016/j.heliyon.2025.e42609

PubMed Abstract | Crossref Full Text | Google Scholar

Makarichian, A., Chayjan, R. A., Ahmadi, E., and Mohtasebi, S. S. (2021). Assessment the influence of different drying methods and pre-storage periods on garlic (Allium sativum L.) aroma using electronic nose. Food Bioproducts Process. 127, 198–211. doi: 10.1016/j.fbp.2021.02.016

Crossref Full Text | Google Scholar

Martínez-López, J. A., López-Urrea, R., Martínez-Romero, A., Pardo, J. J., Montoya, F., and Domínguez, A. (2022). Improving the sustainability and profitability of oat and garlic crops in a mediterranean agro-ecosystem under water-scarce conditions. Agronomy-Basel 12, 17. doi: 10.3390/agronomy12081950

Crossref Full Text | Google Scholar

Pagire, V., Chavali, M., and Kale, A. (2025). A comprehensive review of object detection with traditional and deep learning methods. Signal Process. 237, 28. doi: 10.1016/j.sigpro.2025.110075

Crossref Full Text | Google Scholar

Puspitasari, Nurmalina, R., Hariyadi, and Agustian, A. (2024). Systems thinking in sustainable agriculture development: A case study of garlic production in Indonesia. Front. Sustain. Food Syst. 8. doi: 10.3389/fsufs.2024.1349024

Crossref Full Text | Google Scholar

Qi, H. N., Huang, Z. H., Sun, Z. Y., Tang, Q. Z., Zhao, G. W., Zhu, X. H., et al. (2023). Rice seed vigor detection based on near-Infrared hyperspectral imaging and deep transfer learning. Front. Plant Sci. 14. doi: 10.3389/fpls.2023.1283921

PubMed Abstract | Crossref Full Text | Google Scholar

Raki, H., Aalaila, Y., Taktour, A., and Peluffo-Ordóñez, D. H. (2024). Combining ai tools with non-destructive technologies for crop-based food safety: A comprehensive review. Foods 13, 32. doi: 10.3390/foods13010011

PubMed Abstract | Crossref Full Text | Google Scholar

Rezk, N. G., Attia, A. F., El-Rashidy, M. A., El-Sayed, A., and Hemdan, E. E. (2025). An efficient iot-based crop damage prediction framework in smart agricultural systems. Sci. Rep. 15, 15. doi: 10.1038/s41598-025-12921-8

PubMed Abstract | Crossref Full Text | Google Scholar

Silva, J., de Siqueira, V. S., Mesquita, M., Vale, L. S. R., Marques, T. D. B., da Silva, J. L. B., et al. (2024). Deep learning for weed detection and segmentation in agricultural crops using images captured by an unmanned aerial vehicle. Remote Sens. 16, 23. doi: 10.3390/rs16234394

Crossref Full Text | Google Scholar

Smistad, E., Ostvik, A., and Pedersen, A. (2019). High performance neural network inference, streaming, and visualization of medical images using fast. IEEE Access 7, 136310–136321. doi: 10.1109/access.2019.2942441

Crossref Full Text | Google Scholar

Song, S. Z., Chen, Z. Y., Yu, H. L., Xue, M. X., and Liu, J. L. (2025). Rapid and accurate classification of mung bean seeds based on hpmobilenet. Front. Plant Sci. 15. doi: 10.3389/fpls.2024.1474906

PubMed Abstract | Crossref Full Text | Google Scholar

Song, J. P., Jia, H. X., Wang, Y., Zhang, X. H., Yang, W. L., Zhang, T. T., et al. (2025). Evaluation of the effects of degradable mulching film on the growth, yield and economic benefit of garlic. Agronomy-Basel 15, 13. doi: 10.3390/agronomy15010093

Crossref Full Text | Google Scholar

Sun, F. Z., Li, R., Zhu, J. Y., Peng, H., Li, Z., Jiang, J. L., et al. (2024). Comparison analysis of moisture-Dependent orthotropic elasticity between earlywood and latewood in chinese fir using digital image correlation. Ind. Crops Prod 220, 10. doi: 10.1016/j.indcrop.2024.119185

Crossref Full Text | Google Scholar

Sun, K., Zhang, Y. J., Tong, S. Y., Tang, M. D., and Wang, C. B. (2022). Study on rice grain mildewed region recognition based on microscopic computer vision and yolo-V5 model. Foods 11, 16. doi: 10.3390/foods11244031

PubMed Abstract | Crossref Full Text | Google Scholar

Tan, M., Pang, R., and Le, Q. V. (2020). Efficientdet: scalable and efficient object detection. Arxiv. doi: 10.1109/CVPR42600.2020

Crossref Full Text | Google Scholar

Tavoni, M., Andreoni, P., Calcaterra, M., Calliari, E., Deubelli-Hwang, T., Mechler, R., et al. (2024). Economic quantification of loss and damage funding needs. Nat. Rev. Earth Environ. 5, 411–413. doi: 10.1038/s43017-024-00565-7

Crossref Full Text | Google Scholar

Tullo, A. (2023). Design, and construction of power-operated garlic (Allium sativum L.) bulbs breaker for Ethiopian garlic. Appl. Res. Innovation 1, 10–23. doi: 10.54536/ari.v1i2.2021

Crossref Full Text | Google Scholar

Wang, S. Y., Meng, Y. B., Wang, Y. J., Li, H., and Zhang, X. D. (2025). Design and test of a low-damage garlic seeding device based on rigid-flexible coupling. Agriculture-Basel 15, 19. doi: 10.3390/agriculture15192079

Crossref Full Text | Google Scholar

Wang, C.-Y., Yeh, I. H., and Liao, H.-Y. M. (2024). Yolov9: learning what you want to learn using programmable gradient information. Arxiv. doi: 10.1007/978-3-031-72751-1_1

Crossref Full Text | Google Scholar

Wang, L., Zhang, H. C., Bian, L. M., Zhou, L., Wang, S. Y., and Ge, Y. F. (2024). Poplar seedling varieties and drought stress classification based on multi-source, time-series data and deep learning. Ind. Crops Prod 218, 16. doi: 10.1016/j.indcrop.2024.118905

Crossref Full Text | Google Scholar

Wu, X. Q., Xu, H., and Liu, J. Y. (2019). Rapid evaluation of garlic cultivars by front-face fluorescence and independent component analysis. Acta Alimentaria 48, 316–323. doi: 10.1556/066.2019.48.3.6

Crossref Full Text | Google Scholar

Yang, B., Yang, S., Wang, P., Wang, H., Jiang, J. M., Ni, R. R., et al. (2024). Frpnet: an improved faster-resnet with paspp for real-time semantic segmentation in the unstructured field scene. Comput. Electron. Agric. 217, 11. doi: 10.1016/j.compag.2024.108623

Crossref Full Text | Google Scholar

Yang, L., Zhang, R.-Y., Li, L., and Xie, X. (2021). Simam: A simple, parameter-free attention module for convolutional neural networks (PMLR).

Google Scholar

Yang, K., Zhou, Y. L., Shi, H. L., et al. (2024). Research and experiments on adaptive root cutting using a garlic harvester based on a convolutional neural network. Agriculture-Basel 14, 25. doi: 10.3390/agriculture14122236

Crossref Full Text | Google Scholar

Yu, H., Chen, Z., Liu, X., Song, S., and Chen, M. (2025). Improving efficientnet_B0 for distinguishing rice from different origins: A deep learning method for geographical traceability in precision agriculture. Curr. Plant Biol., 100501. doi: 10.1016/j.cpb.2025.100501

Crossref Full Text | Google Scholar

Yu, H. L., Chen, Z. Y., Song, S. Z., Chen, M. J., and Yang, C. L. (2024). Classification of rice seeds grown in different geographical environments: an approach based on improved residual networks. Agronomy-Basel 14, 24. doi: 10.3390/agronomy14061244

Crossref Full Text | Google Scholar

Yu, H. L., Chen, Z. Y., Song, S. Z., Qi, C. Y., Liu, J. L., and Yang, C. L. (2025). Rapid and non-destructive classification of rice seeds with different flavors: an approach based on hpfasternet. Front. Plant Sci. 15. doi: 10.3389/fpls.2024.1502631

PubMed Abstract | Crossref Full Text | Google Scholar

Yuan, D., Yang, Y., Liu, X., et al. (2013). A highly practical approach toward achieving minimum data sets storage cost in the cloud. IEEE Trans. Parallel Distributed Syst. 24, 1234–1244. doi: 10.1109/tpds.2013.20

Crossref Full Text | Google Scholar

Zhang, T. T., Tong, J. P., Li, L., Yuan, J. D., Yuan, X. T., Yuan, F. Y., et al. (2025). Integration of swir hyperspectral imaging, uhplc-Q-tof/ms and optimized machine learning for non-destructive authentication of pseudostellaria heterophylla. Ind. Crops Prod 234, 14. doi: 10.1016/j.indcrop.2025.121534

Zhao, J. F., Ma, Y., Yong, K. C., Zhu, M., Wang, Y. Q., Wang, X., et al. (2023). Rice seed size measurement using a rotational perception deep learning model. Comput. Electron. Agric. 205, 16. doi: 10.1016/j.compag.2022.107583

Zhao, W., Yamada, W., Li, T. X., Digman, M., and Runge, T. (2021). Augmenting crop detection for precision agriculture with deep visual transfer learning: a case study of bale detection. Remote Sens. 13, 17. doi: 10.3390/rs13010023

Keywords: garlic damage detection, lightweight network, object detection, precision agriculture, YOLO

Citation: Gao Y, Ma X, Xia Z, Qi T, Wang X, He Z and Chen G (2026) Garlic-YOLO-DD: a lightweight object detection algorithm for garlic damage detection. Front. Plant Sci. 16:1702045. doi: 10.3389/fpls.2025.1702045

Received: 09 September 2025; Revised: 03 December 2025; Accepted: 08 December 2025; Published: 06 January 2026.

Edited by:

Xu Zheng, University of Electronic Science and Technology of China, China

Reviewed by:

Milind B. Ratnaparkhe, ICAR Indian Institute of Soybean Research, India
Jo-Ann Magsumbol, Polytechnic University of The Philippines Sablayan Campus, Philippines
Haitao Wu, Beijing Forestry University, China

Copyright © 2026 Gao, Ma, Xia, Qi, Wang, He and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Gang Chen, zhuanghe20250808@163.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.