
ORIGINAL RESEARCH article

Front. Plant Sci., 04 February 2026

Sec. Sustainable and Intelligent Phytoprotection

Volume 17 - 2026 | https://doi.org/10.3389/fpls.2026.1716046

This article is part of the Research Topic: Integrating Visual Sensing and Machine Learning for Advancements in Plant Phenotyping and Precision Agriculture

A digital twin–driven deep learning framework for online quality inspection in tobacco transplanting

Qiuyang Zhao1†, Erdeng Ma2†, Jian Zhao1,3,4*, Zekun You1, Jiahui Liu1,3, Dong Zhao1,3*
  • 1School of Technology, Beijing Forestry University, Beijing, China
  • 2Yunnan Academy of Tobacco Agricultural Sciences, Kunming, China
  • 3Key Lab of State Forestry Administration on Forestry Equipment and Automation, Beijing Forestry University, Beijing, China
  • 4State Key Laboratory of Efficient Production of Forest Resources, Beijing, China

Tobacco transplanting quality inspection is crucial for tobacco production, as it directly affects crop yield and the quality of tobacco leaves. Accurate transplanting status detection and assessment provide essential support for replanting decisions and transplanting machine optimization. Traditional methods rely on manual inspection, which suffers from high cost, low efficiency, and unstable results. To tackle these issues, this paper proposes a deep learning and digital twin–driven online quality inspection method for tobacco transplanting, which consists of four core modules: transplanting status detection, multi-sensor data fusion, digital twin visualization, and operational optimization feedback. A lightweight improved YAN-YOLO11 algorithm is proposed, capable of identifying normal, exposed-root, and buried seedlings. By fusing GNSS positioning data with visual detection results, the system estimates in-row spacing and assesses missed planting and double planting. A virtual–real interactive closed loop of “collection–detection–mapping–feedback” is established via the digital twin: by visualizing operational status in real time and generating replanting path suggestions, the system provides guidance for operation management and significantly improves inspection efficiency. Field experiments demonstrate that, compared with YOLO11n, YAN-YOLO11 improves precision and recall by 2.4% and 2.5%, respectively; mAP@50 increased by 3% to 80.9% ± 1.4%, and mAP@0.5:0.95 increased by 5.8% to 54.2% ± 1.0%, while model complexity was significantly reduced. The system achieves real-time performance of 30 FPS in the field, with an overall recognition accuracy of 90.74%, meeting practical application requirements.
This study effectively enhances the digitalization, automation, and refined management of tobacco transplanting operations, providing a theoretical foundation and practical solution for the intelligent transformation of transplanting machinery and precision crop management.

1 Introduction

Tobacco, as one of the most important cash crops worldwide, has long occupied a significant position in the agricultural economies of many countries (Liu and Filippidis, 2024). In recent years, with the continuous expansion of the global market, the demand for both tobacco quality and yield has been increasing, and tobacco cultivation has been accelerating its transformation from traditional labor-intensive modes toward efficient mechanization, automation, and intelligence. Within the entire process of tobacco agricultural production, the transplanting stage is particularly critical, as its operational quality directly affects plant growth, final yield, and the quality of tobacco leaves (Basir et al., 2021; Sun et al., 2024). Accurate transplanting quality inspection allows for the timely detection of issues such as missed planting, double planting, and uneven in-row spacing, thus providing data support for replanting operations to improve land use efficiency. Furthermore, it identifies abnormal statuses like seedling burial and root exposure, offering essential evidence for evaluating the performance of transplanting equipment. However, current traditional methods for assessing the quality of tobacco seedling transplanting mainly rely on manual inspection and row-by-row counting. This approach is not only labor-intensive and time-consuming but is also constrained by the operator’s experience and attentiveness, making it prone to human error and difficult to ensure the accuracy and consistency of statistical results (Fan et al., 2018). Therefore, it is urgent to develop an efficient, objective, and automated transplanting quality inspection method to promote the deeper application of smart agriculture in the tobacco sector.

With the advancement of smart agriculture, digital technologies represented by digital twin and deep learning have been extensively studied and applied in agricultural production. The concept of the digital twin was first proposed by Michael Grieves in the early 21st century, initially applied in manufacturing and engineering, where physical systems were replicated through digital technologies for monitoring, prediction, and optimization (Sun et al., 2024). In the agricultural domain, its potential to improve precision farming and sustainability has been widely recognized (Escriba-Gelonch et al., 2024). Chen et al. (2023) proposed a digital twin and data-driven online monitoring method for transplanting machines in plant factories, which can assess transplanting effects in real time. By enabling real-time interaction between virtual mapping and physical equipment, this approach optimizes the transplanting process and effectively prevents quality issues caused by mechanical resonance. Li et al. (2025) developed a digital twin system combined with deep learning for monitoring beak deformities in caged laying hens. Through image recognition, the system provided real-time feedback on flock health status, thereby optimizing the early detection of beak deformities and improving management in poultry farming.

In crop cultivation, digital twins create virtual farmland models to simulate various crop management processes. By leveraging advanced technologies such as artificial intelligence and the Internet of Things (IoT), they can significantly enhance productivity and management efficiency in agricultural production (Nasirahmadi and Hensel, 2022). Wang et al. (2024) proposed an intelligent sugarcane breeding system based on artificial intelligence, blockchain, and digital twin technology, which enabled real-time data circulation and optimization, greatly improving breeding efficiency and shortening the breeding cycle. Zhang et al. (2023) designed a digital twin system for plant factories, which enabled real-time monitoring and regulation of multiple environmental variables within plant factories. This system optimized the control of plant growth environments and provided intelligent decision support for plant factory managers. Xu et al. (2025) developed a digital twin system for the growth process of winter wheat. By integrating UAV-based remote sensing and IoT devices, the system monitored the growth status of winter wheat in real time and provided accurate growth prediction and optimized management strategies.

In transplanting inspection, especially in research on seedling detection, deep learning and object detection have been widely applied due to their advantages of high accuracy and real-time performance. Cui et al. (2023) proposed a real-time missed-seedling detection and counting method for paddy fields based on YOLOv5s and the ByteTrack algorithm. Through a lightweight design, this method improved detection accuracy for small and overlapping seedlings and significantly reduced counting time. Vong et al. (2022) combined UAV imagery with deep learning models to propose a method for detecting maize emergence uniformity based on plant density, the standard deviation of plant spacing, and the average number of imaging days after emergence. They also developed a field mapping function, providing valuable references for farm-level decision-making. Liu et al. (2023) developed a maize emergence evaluation system based on UAV imagery and deep learning, in which a YOLO model was used to achieve efficient detection of maize seedlings and assessment of emergence uniformity. Furthermore, recent studies have also made progress in addressing detection challenges related to small objects and complex backgrounds. For example, the Seedling-YOLO model significantly enhanced the detection accuracy and speed for exposed seedlings and missing holes in broccoli transplanting by introducing ELAN-P modules and attention mechanisms (Zhang et al., 2024); meanwhile, the LSOD-YOLO algorithm optimized YOLOv8 through a cross-layer output reconstruction module, effectively resolving the missed detection of small objects while ensuring a lightweight design (Wang et al., 2025).

For tobacco transplanting seedling recognition, Shahid et al. (2024) proposed an aerial imagery-based tobacco plant counting framework. By integrating YOLOv7 with the SORT algorithm and overlapping detection, this framework achieved real-time and precise seedling detection and counting, significantly improving the efficiency and accuracy of crop emergence monitoring. Most existing studies rely on UAVs or high-resolution remote-sensing imagery for recognition. Although these methods offer higher acquisition efficiency, they lack sufficient coordination and real-time capability with transplanting operations. Zhang et al. (2022) designed a machine vision-based monitoring system for lodging and air-pocket conditions in rapeseed mat-type seedling transplanters. Mounted at the rear of the transplanter, the system collected and processed images in real time, effectively monitoring and assessing the operational status of the transplanter and improving the accuracy and timeliness of transplanting quality monitoring. Despite the achievements of the aforementioned studies, current tobacco transplanters primarily operate in an “open-loop” mode and generally lack integrated real-time perception of seedling posture, plant spacing, and transplanting status; operators often cannot obtain immediate feedback on operation quality. Advanced image-based agricultural inspection systems have developed rapidly. For instance, Johansen et al. (2020) used UAV imagery to predict tomato biomass, and Jiang et al. (2022) utilized multispectral inversion for quinoa phenotyping. However, these methods mostly serve as “passive observers,” focusing on post-transplant offline monitoring rather than real-time interaction.

At present, the application of digital twin technology to tobacco transplanting quality inspection is still in the exploratory stage. For the specific requirements of tobacco transplanting, no comprehensive solution has yet emerged that integrates digital twins, visual recognition, data fusion, and real-time processing, making innovation and optimization at both the system architecture and algorithmic levels urgently necessary. How to realize transplanting quality inspection through automated approaches, reduce labor costs, improve inspection accuracy, and provide real-time feedback for equipment performance optimization and replanting operations has become an important research topic. To address these challenges, this paper proposes and develops a real-time tobacco transplanting quality inspection system that integrates deep learning and digital twin technology. The system takes the optimized YAN-YOLO11 object detection algorithm as its core, enabling high-precision detection of transplanted tobacco seedlings and accurate assessment of transplanting status. By incorporating multi-sensor fusion for precise in-row spacing estimation and combining it with an interactive digital twin visualization platform built on Unity3D, the system achieves real-time mapping between the virtual and physical fields, providing efficient decision support for precise replanting and operational optimization.

The main contributions of this paper are summarized as follows:
  1. Multi-sensor data fusion: high-precision measurement of plant spacing and logical assessment of missed and double planting, achieved by fusing GNSS positioning data with visual detection results.
  2. Digital twin visualization and feedback: a virtual–real interactive closed loop built on Unity3D, which not only maps operational scenarios in real time but also generates replanting path suggestions based on anomaly clustering analysis.
  3. Improved YAN-YOLO11 algorithm: a lightweight object detection algorithm that significantly improves the recognition accuracy of transplanted seedlings in complex field environments while keeping computational cost low.

The remainder of this paper is organized as follows: Section 2 presents the architecture design of the online quality inspection method for tobacco transplanting operations. Section 3 focuses on the YAN-YOLO11 detection algorithm applied in transplanting quality inspection. Section 4 elaborates on the visualization and feedback functions of the digital twin system. Section 5 verifies the system’s performance and application effectiveness through experiments. Finally, Section 6 concludes the study and discusses directions for future optimization.

2 Digital twin system architecture

The architecture of the proposed deep learning– and digital twin–driven tobacco transplanting quality inspection system is shown in Figure 1. The hardware of the inspection system mainly comprises a CMOS camera, a GNSS receiver, and a Raspberry Pi 5. The system can be flexibly deployed on an independent mobile chassis or an automatic transplanter, provided that the sensors move along the center of the tobacco ridge during data collection. The hardware specifications in this study are as follows: the CMOS camera has a resolution of 1920×1080 pixels and a frame rate of 30 fps; the GNSS receiver supports multi-frequency, multi-constellation operation (BDS/GPS/GLONASS/GALILEO), achieving a static horizontal positioning accuracy of ±(1 cm + 1 ppm) and a vertical accuracy of ±(2.5 cm + 1 ppm).

Figure 1
Diagram illustrating a digital monitoring system for a tobacco field. Detection equipment includes a GNSS receiver, camera, and Raspberry Pi, highlighted for data collection. This connects via HTTP to a cloud server for data storage and transplanting status assessment. The cloud server communicates with a digital twin terminal using TCP, providing feedback. The flow visually connects the field, equipment, server, and terminal in an iterative process.

Figure 1. Overall framework of the digital twin system.

As a distributed terminal, the Raspberry Pi connects to the camera via USB to capture video images and receives geographic location information from the GNSS receiver through a UART interface. Data preprocessing is carried out on the Raspberry Pi, including image enhancement of video frames and assigning timestamps to each frame, which are used to align data from different sensors through timestamp matching. The preprocessed data are then uploaded to the cloud server via the HTTP protocol.
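As a concrete illustration, the timestamp-matching step can be sketched as a nearest-neighbour lookup over a time-sorted list of GNSS fixes. This is a minimal Python sketch under assumed data layouts; the function name and tuple format are hypothetical, not the system's actual implementation:

```python
import bisect

def nearest_gnss_fix(frame_ts, gnss_fixes):
    """Return the (timestamp, lat, lon) fix closest in time to a video frame.

    gnss_fixes must be sorted by timestamp; a production pipeline would
    also reject matches whose time offset exceeds a tolerance.
    """
    times = [fix[0] for fix in gnss_fixes]
    i = bisect.bisect_left(times, frame_ts)
    # The nearest fix is either just before or just after the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(gnss_fixes)]
    return min((gnss_fixes[j] for j in candidates),
               key=lambda fix: abs(fix[0] - frame_ts))

fixes = [(0.0, 24.64190, 102.92840),
         (0.1, 24.64191, 102.92841),
         (0.2, 24.64192, 102.92842)]
print(nearest_gnss_fix(0.12, fixes))  # fix at t = 0.1 is nearest
```

Because the fixes are sorted, each frame is matched in O(log n), which keeps the alignment cheap enough for the Raspberry Pi.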

The cloud server is responsible for transplanting status detection and assessment, data matching across different sensors, and data storage. A deep learning model is employed to detect tobacco seedlings in the video frames, and the transplanting status is then assessed based on the detection results. Data synchronization between the video frames captured by the CMOS camera and the geographic location information is achieved through strict system timestamp alignment.

On the digital twin side, the system adopts a client–server architecture based on the Transmission Control Protocol (TCP) (Mahmoodi Khaniabadi et al., 2023; Mishra and Sharma, 2023). Through the TCP, the system communicates with the cloud server to acquire transplanting status and geographic location information in real time. Using this information, it establishes virtual mappings of transplanted seedlings within the digital twin tobacco field, thereby achieving dynamic synchronization between the physical entities and the virtual scene.

The digital twin system not only displays transplanting quality status in the virtual tobacco field in real time but also generates alarms for abnormal transplanting cases and feeds the detected abnormal information back to the client. The feedback includes specific abnormal locations—such as missed planting points or root exposure points—as well as optimization suggestions. Through this feedback mechanism, the system forms a “collection–detection–mapping–feedback” closed loop, thereby improving the automation and precision of tobacco transplanting operations.

3 Deep learning driven transplanting quality inspection method

3.1 Dataset construction

The dataset in this study was collected from tobacco fields in Mile City, Honghe Hani and Yi Autonomous Prefecture, and Chengjiang City, Yuxi, Yunnan Province. To ensure data diversity, the collection includes transplanting images under two operation modes (manual and mechanical transplanting), lighting environments such as direct sunlight, backlight, and cloud shadows, and soil conditions (red soil and sandy soil with different particle sizes). Examples of images under different lighting and soil conditions are shown in Figure 2. The acquisition devices were a CMOS camera and a Huawei P50 smartphone. All videos were recorded at a resolution of 1920×1080 pixels, with a frame rate of 30 fps, and stored in MP4 and MOV formats. The collected videos were processed with a 1:50 frame extraction ratio, resulting in a total of 1,080 raw natural images.

Figure 2
Five images display plant growth in different soil types. The top row shows plants in coarse-grained sandy soil, fine-grained sandy soil, and silty soil. The bottom row features plants in coarse-grained red soil and fine-grained red soil. Each scene depicts small plants emerging from the soil.

Figure 2. Typical examples of the dataset.

To obtain clearer images with more distinct features and easier recognition, the raw natural images were preprocessed. Under natural lighting conditions, the illumination of tobacco fields is complex: transplanting pits and equipment occlusion can generate shadows, strong light can cause leaf reflections on transplanted seedlings, and on cloudy days the overall image contrast is relatively low. Therefore, contrast-limited adaptive histogram equalization (CLAHE) was employed to enhance the raw image data. The core idea of this method is to adaptively adjust the local contrast of images to improve visual quality, making it particularly suitable for images captured under complex lighting conditions (Reza, 2004; Kryjak et al., 2022).

A total of 1,080 images containing transplanted seedlings were divided into 865 for the training set and 215 for the validation set. The dataset was annotated using the LabelImg tool; to ensure uniform and accurate annotation standards, all images were annotated by a single researcher. Since the ratio of the three class labels in the raw image data was 8:1:1, with only a small number of abnormal transplanted seedlings, data augmentation was applied to images containing abnormal seedlings to mitigate class imbalance and improve the model’s generalization and robustness (Wang et al., 2024). The augmentation methods included random flipping, random brightness, random contrast, random saturation, and random noise, as illustrated in Figure 3. After augmentation, the training set contained 2,764 images and the validation set 755, with 3,804 labels of normal transplanted seedlings and 3,047 labels of abnormal transplanted seedlings. The label ratio of the three classes was thus adjusted to 2:1:1, significantly increasing the proportion of abnormal samples (Santos et al., 2022; Chen et al., 2024).
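For illustration, several of the augmentation operations listed above can be sketched with NumPy alone. The function names and parameter values are illustrative choices, not the exact settings used to build the dataset:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_horizontal_flip(img, p=0.5):
    # Flip left-right with probability p (img is an HxWx3 uint8 array).
    return img[:, ::-1] if rng.random() < p else img

def random_brightness(img, max_delta=40):
    # Shift all pixels by one random offset, clipped back to uint8 range.
    delta = int(rng.integers(-max_delta, max_delta + 1))
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def salt_pepper_noise(img, amount=0.02):
    # Set a random fraction of pixel positions to pure black or white.
    noisy = img.copy()
    mask = rng.random(img.shape[:2]) < amount
    vals = rng.choice(np.array([0, 255], dtype=np.uint8), size=int(mask.sum()))
    noisy[mask] = vals[:, None]  # broadcast the value across the 3 channels
    return noisy
```

Random contrast and saturation follow the same pattern (a per-image multiplicative factor around the mean, or a scaling in HSV space).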

Figure 3
Original and augmented images of soil with seedlings. The set includes: (A) Original Image, (B) Image after CLAHE, and (C) Augmented Images, including Salt Noise, Random Saturation, Random Brightness, Random Contrast, Pepper Noise, Gaussian Noise, Random Vertical Flip, and Random Horizontal Flip. Each image shows variations in color, brightness, and texture.

Figure 3. Dataset processing: (A) original image; (B) CLAHE preprocessing result; (C) data augmentation result.

3.2 Improved YAN-YOLO11 detection algorithm

Owing to the requirements of the inspection system for real-time digital twin applications, real-time performance and accuracy are the key criteria for selecting and improving detection algorithms. In terms of model selection, RCNN and related models can achieve high-precision object detection; however, due to their use of multi-stage pipeline processing, they involve high computational cost and long processing time. Furthermore, compared with pixel-level segmentation models (e.g., DeepLab, U-Net), object detection frameworks such as the YOLO series adopt single-stage and end-to-end architectures. They can effectively capture transplanting status and coordinates without the overhead of dense prediction, possessing sufficient detection accuracy while meeting real-time requirements (Sapkota et al., 2024; Wang et al., 2025).

After comparing the performance of different YOLO versions on multiple public datasets, YOLO11 was found to achieve high accuracy with relatively low computational cost, making it an ideal choice for applications requiring both speed and precision (Jegham et al., 2024). The key improvements of YOLO11 include the introduction of the Cross-Stage Partial Self-Attention (C2PSA) module and the replacement of the C2f module with the C3k2 module, which enable more effective cross-layer contextual information capture and improve efficiency and speed with smaller kernels while maintaining accuracy (Sapkota et al., 2025).

Considering the requirements for accuracy and real-time performance in the transplanting detection system of this study, and taking into account the size characteristics of the target objects, we propose an improved YAN-YOLO11 detection algorithm, whose network architecture is illustrated in Figure 4.

Figure 4
Flowchart illustrating a neural network with three sections: backbone, neck, and head. The backbone includes convolutional and PConv layers, labeled from 0 to 10. The neck comprises top-down and bottom-up paths with AVG, Concat, and RepHMS functions, labeled from 11 to 31. The head contains the Detect_dyhead function, labeled 32, connected by a flow indicating data processing through each layer.

Figure 4. Network architecture of YAN-YOLO11.

In the detection head, to achieve a lightweight design while maintaining high detection accuracy, Dynamic Convolution (Dynamic Conv) was introduced on top of the original decoupled head and depthwise separable convolution. Their combined effect reduces computational complexity and the number of parameters while effectively preserving the model’s representational capacity. Traditional convolutional layers use a single kernel to process input features, whereas dynamic convolution introduces input-dependent coefficients α on top of the conventional convolution. A dynamic convolution with M kernels can be expressed as Equation 1:

Y = Σ_{i=1}^{M} αi (Wi * X)    (1)

where Y is the output, X is the input, * denotes the convolution operation, Wi is the i-th convolution kernel, and αi is the corresponding dynamic coefficient. The dynamic coefficients α are generated adaptively from the input X by first applying global average pooling and then passing the result through a two-layer MLP module with softmax activation, i.e., Equation 2:

α = softmax(MLP(Pool(X)))    (2)

Compared with conventional convolutional layers, the introduction of dynamic coefficients brings more parameters while incurring almost no additional computational overhead (Han et al., 2024). By incorporating dynamic coefficients, the dynamic convolution head can adaptively adjust convolutional weights, thereby enhancing the feature representation capability.
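A toy NumPy sketch of Equations 1–2 illustrates why this is cheap (the shapes and MLP widths are arbitrary assumptions): because Equation 1 is linear in the kernels, the M weighted convolutions collapse into a single convolution with the aggregated kernel Σ αi Wi, so the extra kernels cost parameters but almost no inference-time compute:

```python
import numpy as np

def conv2d_valid(X, K):
    """Plain 'valid' 2-D correlation, standing in for the * of Eq. 1."""
    kh, kw = K.shape
    H, W = X.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (X[i:i + kh, j:j + kw] * K).sum()
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))           # toy single-channel feature map
kernels = rng.standard_normal((4, 3, 3))  # M = 4 candidate 3x3 kernels
w1 = rng.standard_normal(6)               # toy two-layer MLP weights
w2 = rng.standard_normal((4, 6))

# Eq. 2: alpha = softmax(MLP(GlobalAvgPool(X)))
alpha = softmax(w2 @ np.tanh(w1 * X.mean()))

# Eq. 1: Y = sum_i alpha_i (W_i * X)
Y = sum(a * conv2d_valid(X, K) for a, K in zip(alpha, kernels))

# By linearity, the M convolutions fuse into one with the mixed kernel.
Y_fused = conv2d_valid(X, np.tensordot(alpha, kernels, axes=1))
assert np.allclose(Y, Y_fused)
```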

In the backbone network, to enhance the ability to analyze low-level features of small targets, the standard convolutions in the 1st, 3rd, 5th, and 7th layers were replaced with pinwheel-shaped convolutions (PConv). The pinwheel-shaped convolution module employs an asymmetric padding strategy to generate four convolution kernels in the horizontal and vertical directions, which process horizontal and vertical regions of the image separately. Through multiple convolutional operations, it processes the input feature map X(h1, w1, c1), thereby enhancing the ability to capture target features. Its forward propagation can be expressed as Equations 3–7:

X1(h′, w′, c′) = SiLU(BN(X_P(1,0,0,3)(h1, w1, c1) * W1(1,3,c′)))    (3)
X2(h′, w′, c′) = SiLU(BN(X_P(0,3,0,1)(h1, w1, c1) * W2(3,1,c′)))    (4)
X3(h′, w′, c′) = SiLU(BN(X_P(0,1,3,0)(h1, w1, c1) * W3(1,3,c′)))    (5)
X4(h′, w′, c′) = SiLU(BN(X_P(3,0,1,0)(h1, w1, c1) * W4(3,1,c′)))    (6)
Y(h2, w2, c2) = SiLU(BN(Cat(X1, …, X4) * W(2,2,c2)))    (7)

Here, W1(1,3,c′) denotes a 1×3 convolution kernel with c′ output channels; P(1,0,0,3) specifies the number of padding pixels in the left, right, top, and bottom directions; Cat denotes the concatenation operation; and W(2,2,c2) is the final normalization convolution kernel (Yang et al., 2025). Due to its unique padding scheme, pinwheel-shaped convolution forms a receptive field that attenuates outward in a manner similar to a Gaussian distribution. By adopting grouped convolution, it significantly enlarges the receptive field while minimizing the number of parameters (Luo et al., 2016; Zhang et al., 2017). When the kernel size is k = 3, the receptive field of a standard convolution is 9, whereas the pinwheel-shaped convolution expands it to 25 through the combination of four directional kernels, achieving a 177% increase.

In the neck structure, to address the issues of insufficient multi-scale feature fusion and inadequate small-object detection capability, a Multi-Branch Auxiliary Feature Pyramid Network (MAFPN) was introduced, which optimizes the feature processing pipeline through a bidirectional auxiliary fusion mechanism. In the structure, layers 11–21 form a top-down path that aggregates shallow high- and low-resolution layers with earlier features, enabling multidirectional gradient information exchange and enhancing the representation of medium- and large-scale objects. Layers 22–31 constitute a bottom-up path that fuses shallow backbone features with neck outputs, preserving shallow spatial information and providing richer details for small-object detection (Ke et al., 2024; Shi et al., 2025). To improve feature extraction efficiency, a Re-parameterized Heterogeneous Convolution (RepHMS) module was introduced into the neck feature processing. This module leverages a Global Heterogeneous Kernel Selection (GHSK) mechanism, which employs convolution kernels of different sizes across feature layers to capture multi-scale features ranging from local to global. In addition, through a re-parameterization mechanism, the multiple parallel multi-scale depthwise convolution kernels used during training are re-parameterized into a single kernel at the inference stage, thereby significantly enlarging the receptive field with almost no additional computational overhead (Yang et al., 2024).

3.3 Transplanting status assessment

The assessment and classification of transplanting status are at the core of this study. We propose a transplanting status assessment method based on seedling detection and positional information. By detecting normal, buried, and exposed-root seedlings in real time and combining GPS data with positional analysis, the method further assesses abnormal statuses such as double planting and missed planting. The transplanting statuses are prioritized in descending order as follows: Double Planting, Missed Planting, Seedling Burial, Root Exposure, and Normal Transplanting, and their assessment criteria are detailed in Table 1.

Table 1
www.frontiersin.org

Table 1. Transplanting status assessment criteria.

Using the YAN-YOLO11 model described earlier to process video frames, normal, exposed-root, and buried seedlings are detected. When a detection box passes through the center of the frame, the seedling is counted, and both the detected category and the frame timestamp are recorded. GNSS data are synchronized with video frames through timestamps, enabling the assignment of geographic coordinates (longitude and latitude) to each detected seedling. To calculate the in-row spacing of seedlings, the geographic coordinates must be converted into planar coordinates. Since the agronomic requirement for in-row spacing is generally around 0.5 m, a simplified approximate planar projection method was adopted for rapid calculation within small areas. The conversion process is described in Equations 8–10:

longitude direction:

x = longitude × meters_per_deg_lon    (8)
meters_per_deg_lon = 111000 × cos(ref_lat)    (9)

latitude direction:

y = latitude × 111000    (10)

Here, meters_per_deg_lon denotes the longitude coefficient, ref_lat is the reference latitude, and 111,000 is the approximate latitude conversion coefficient in m/° (meters per degree) (Snyder, 1997). The theoretical effective radius of the Local Tangent Plane (LTP) projection method (i.e., ENU coordinate conversion) can typically reach over 10 km; the “small-scale area” defined in this study refers to fields with an operational area of typically 1 km × 1 km (Drake, 2002). Using the converted planar coordinates, the Euclidean distance between adjacent seedlings is calculated and then compared with the agronomic standard spacing for further assessment. For missed planting, interpolation is used to estimate the coordinates of the missed-planting points. For double planting, consecutively detected seedlings with small in-row spacing are grouped as a single event, and their centroid coordinates are taken to represent the double-planting point.
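The spacing computation and assessment logic can be sketched as follows. This is a hedged illustration of Equations 8–10: the threshold factors for missed and double planting are placeholder values rather than the criteria of Table 1, and ref_lat is assumed to be given in degrees:

```python
import math

M_PER_DEG_LAT = 111_000.0  # approximate metres per degree of latitude

def to_planar(lat, lon, ref_lat):
    """Eqs. 8-10: simplified planar projection for small areas."""
    x = lon * M_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    y = lat * M_PER_DEG_LAT
    return x, y

def in_row_spacings(points, ref_lat):
    """Euclidean distance between consecutive seedlings, in metres."""
    xy = [to_planar(lat, lon, ref_lat) for lat, lon in points]
    return [math.dist(a, b) for a, b in zip(xy, xy[1:])]

def assess_spacing(d, target=0.5, low=0.5, high=1.5):
    # Threshold factors are illustrative, not the paper's Table 1 values.
    if d < low * target:
        return "double planting"
    if d > high * target:
        return "missed planting"
    return "normal"

# Seedlings along one ridge at ~24.64 deg N, nominally 0.5 m apart,
# with one ~1.0 m gap (a candidate missed-planting point).
ref_lat = 24.6419
step = 0.5 / (M_PER_DEG_LAT * math.cos(math.radians(ref_lat)))  # deg lon per 0.5 m
pts = [(ref_lat, 102.9284 + k * step) for k in (0, 1, 2, 4)]
for d in in_row_spacings(pts, ref_lat):
    print(round(d, 3), assess_spacing(d))
```

For a missed-planting gap, the midpoint of the two bounding seedlings can then be interpolated as the replanting coordinate, as described above.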

4 Digital twin design

In recent years, with the implementation of Agriculture 4.0, the exploration of digital twin technology in agricultural production practices has been deepening, gradually transforming areas such as agricultural production management and the optimization of agricultural machinery and equipment (Zhang et al., 2025). By employing high-fidelity models in virtual environments to achieve real-time mapping of physical entities, the field operation process can be perceived and identified in real time. This enables the evaluation of operational quality and feedback on detected issues, significantly enhancing the intelligence and refinement of agricultural production management (Verdouw et al., 2021). In this study, Unity3D was selected as the visualization platform for the digital twin system, with C# adopted as the primary programming language. The Unity platform offers powerful three-dimensional modeling and real-time rendering capabilities, supports efficient data interaction and cross-platform deployment, and provides flexible and scalable technical support for virtual simulation and visualization of tobacco transplanting (Lu et al., 2024).

4.1 Construction and mapping of virtual scenes

The digital twin platform establishes a stable connection via the TCP protocol with the transplanting status detection and assessment program deployed on the cloud server, enabling real-time data acquisition. The cloud server encapsulates the assessed results into JSON format for transmission. Each data entry corresponds to a transplanting point and contains transplanting status and geographic coordinates. An example is shown below:

{“latitude”: 24.64190, “longitude”: 102.92841, “state”: “buried”}.
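Although the digital twin client itself is implemented in C#/Unity3D, the message handling can be illustrated in Python. Since TCP delivers a byte stream rather than discrete messages, some framing convention is required; newline-delimited JSON is assumed here as one plausible choice (the paper does not specify the framing):

```python
import json

def parse_stream(buffer: str):
    """Split a decoded TCP stream into complete JSON messages.

    Newline-delimited framing is an assumption: everything before the
    last newline is parsed, and the trailing partial line is returned
    so it can be prepended to the next read.
    """
    *lines, rest = buffer.split("\n")
    return [json.loads(line) for line in lines if line.strip()], rest

msgs, rest = parse_stream(
    '{"latitude": 24.64190, "longitude": 102.92841, "state": "buried"}\n'
    '{"latitude": 24.64191, "lon'
)
print(msgs[0]["state"])  # buried
print(repr(rest))        # incomplete second message, kept for the next read
```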

Upon receiving the data, the digital twin converts the geographic latitude–longitude coordinates into a local East–North–Up (ENU) two-dimensional coordinate system for the virtual scene. The system employs a geographic coordinate projection conversion algorithm based on the origin of the tobacco field plot, taking the initial position of the detection system in the field as that origin. Using the origin coordinates (φ0, λ0) and the clockwise angle θ between the ridge direction and true north as references, each transplanting point with geographic coordinates (φ, λ) is transformed into (X, Z) coordinates in the virtual scene. The conversion process is as follows (Equations 11–13):

\(L_\lambda = R \cdot \cos\varphi_0 \cdot (\lambda - \lambda_0) \cdot \dfrac{\pi}{180}\)  (11)

\(L_\varphi = R \cdot (\varphi - \varphi_0) \cdot \dfrac{\pi}{180}\)  (12)

\(\begin{bmatrix} X \\ Z \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} L_\lambda \\ L_\varphi \end{bmatrix} + \begin{bmatrix} x_{\mathrm{off}} \\ z_{\mathrm{off}} \end{bmatrix}\)  (13)

Here, Lλ and Lφ denote the distances in the east–west and north–south directions, respectively. R is the semi-major axis of the WGS-84 ellipsoid, R = 6,378,137 m. xoff and zoff represent the translation offsets along the horizontal and vertical axes, respectively (Teunissen and Montenbruck, 2017). Through this method, the transplanting points can be mapped into the virtual scene while preserving their actual relative spatial relationships, as illustrated in Figure 5.
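The coordinate conversion of Equations 11–13 can be sketched in a few lines of Python; the sign convention chosen for the clockwise rotation θ is our assumption:

```python
import math

R = 6_378_137.0  # WGS-84 semi-major axis, metres

def geo_to_scene(phi, lam, phi0, lam0, theta_deg, x_off=0.0, z_off=0.0):
    """Map geographic (phi, lam) in degrees to scene (X, Z) coordinates.

    (phi0, lam0) is the plot origin; theta_deg is the clockwise angle between
    the ridge direction and true north (rotation sign convention assumed).
    """
    # Equations 11-12: east-west and north-south distances on the local tangent plane
    L_lam = R * math.cos(math.radians(phi0)) * math.radians(lam - lam0)
    L_phi = R * math.radians(phi - phi0)
    # Equation 13: rotate by theta, then translate into scene coordinates
    t = math.radians(theta_deg)
    X = math.cos(t) * L_lam + math.sin(t) * L_phi + x_off
    Z = -math.sin(t) * L_lam + math.cos(t) * L_phi + z_off
    return X, Z
```

With θ = 0 and the origin at the equator, one degree of longitude maps to roughly 111,319.49 m along X, matching the small-angle conversion used in the text.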

Figure 5
Spherical coordinate system diagram showing a globe with labeled points \((\varphi, \lambda)\) and \((\varphi_0, \lambda_0)\), connected by a line \(R\). Mathematical equations in a box show transformations. Arrows indicate translation to an X, Y, Z plane with a farm field, characterized by rows of plants, marked \((X, Z)\). Coordinate rotation \(\theta\) is shown between the globe and field. Arrows represent north (N) and east (E) on the globe and X, Y, Z axes on the field.

Figure 5. Coordinate transformation of transplanting points.

In the transplanting status display of the virtual scene, ridge models are dynamically generated as prefabs through programmatic control. The three-dimensional structure of ridges is implemented using mesh grids, and transplanting pits are created through local mesh deformations to replicate the traces of transplanting operations. Based on the received transplanting status information, the system dynamically selects the number and placement of seedling models, thereby presenting the five transplanting statuses: normal, burial, root exposure, missed planting, and double planting. For example, in the normal transplanting status, one seedling model is placed at the center of each pit. In the root exposure status, the seedling model is shifted upward to expose part of the substrate. In the double planting status, two or more seedlings are placed within the same pit, with spatial distribution and angles adjusted to avoid overlapping. As shown in Figure 6, these operations enable the virtual scene to accurately reflect various transplanting statuses and spatial distribution characteristics, thus achieving fine-grained mapping and visualization from real tobacco fields to the virtual environment.

Figure 6
Illustration of seedling planting techniques on a ridge. Top left shows a seedling and the initial ridge model. Top right depicts the ridge mesh and its deformation with pits. The bottom image demonstrates various planting outcomes: normal transplanting, missed planting, double planting, buried seedling, and exposed root.

Figure 6. Modeling of five transplanting statuses.

During the construction of the virtual scene, the system first generates ridge prefabs in batches according to field parameters and arranges them with specified width, length, and spacing, thereby accurately reproducing the structure of tobacco fields. As each transplanting data entry is received, the corresponding transplanting status in the virtual scene is updated in real time. To enhance performance under large-scale data, the platform adopts an object pooling mechanism and model simplification strategies, ensuring that the virtual scene runs smoothly even under complex operational conditions and meets the real-time requirements of intelligent field supervision.
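The object pooling mechanism is a general pattern: released model instances are recycled rather than destroyed and re-created. A minimal sketch of the idea (in Python rather than the platform's C#, where the pooled objects would be prefab instances) is:

```python
class SeedlingPool:
    """Minimal object pool: reuse instances instead of re-creating them."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds a fresh instance
        self._free = []          # released instances awaiting reuse

    def acquire(self):
        # Reuse a released object when available, otherwise create a new one.
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        self._free.append(obj)
```

Acquiring after a release returns the same instance, avoiding per-frame allocation and garbage collection spikes under large-scale point data.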

4.2 Transplanting quality inspection and visualization

This system leverages digital twin technology to provide visualization of transplanting quality inspection in virtual scenes, as well as abnormal transplanting alarms and replanting guidance. The system interface consists of multiple functional modules, including virtual entity display, physical entity images, data signal monitoring, transplanting data visualization, and replanting prompts, as shown in Figure 7.

Figure 7
Tobacco Transplanting Quality Detection System illustration showing a virtual plot of dirt with seedlings, accompanied by a close-up image of seedlings in a row. A scatter plot presents the distribution of transplanting statuses such as exposed, normal, missed, buried, and double. A bar graph displays the statistics of transplanting status, highlighting the number of plants in each category. Received data with latitude, longitude, and state are listed on the left, and transplanting exception warnings are in red text at the bottom right.

Figure 7. Digital twin interface of the tobacco transplanting quality inspection system.

In the virtual entity window, the system dynamically reproduces the actual operation of transplanting equipment through three-dimensional models based on the real-time received transplanting status and spatial location information, achieving high synchronization between physical operations and the virtual space. In the transplanting data visualization window, the platform continuously counts and updates the number and proportion of different transplanting statuses, including normal transplanting, missed planting, double planting, burial, and root exposure. For each abnormal transplanting status, the system quantifies it using dedicated indicators and automatically calculates the overall abnormality rate. The system employs bar charts to display the variations in quantity or proportion of each transplanting status, helping operators gain a comprehensive understanding of field operation quality. Replanting prompts are presented in the form of scatter plots, which intuitively mark all abnormal points on the spatial distribution map of the field, thereby facilitating the rapid localization and identification of problem areas.

4.3 Operational optimization feedback

In the design of the digital twin system, operational optimization feedback is not only a response mechanism for abnormal transplanting statuses but also a critical link for achieving dynamic interaction between the virtual scene and real-world tobacco field management. The system monitors transplanting statuses in real time, identifying four types of abnormal points—missed planting, double planting, root exposure, and burial—and generates replanting recommendation lists or optimal replanting paths based on their spatial clustering. This provides precise and timely decision support for field operation management.

The feedback mechanism functions through a dual-channel approach targeting both human operators and automated machinery. First, for manual intervention, the system transmits abnormal information in real time to the digital twin terminal. Through spatial clustering analysis of abnormal areas, it calculates the optimal replanting path and generates detailed suggestions, including the types of anomalies to be addressed. These outputs serve as intuitive guidance for manual replanting, enabling field managers to quickly locate problem areas.
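As an illustration of replanting path generation, the abnormal points can be ordered by a greedy nearest-neighbour heuristic; the paper does not specify its routing algorithm, so this is only a stand-in sketch:

```python
import math

def replanting_path(start, abnormal_points):
    """Order abnormal points by repeatedly visiting the nearest unvisited one.

    A simple heuristic sketch, not the system's actual routing algorithm.
    """
    remaining = list(abnormal_points)
    path, current = [], start
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nxt)
        path.append(nxt)
        current = nxt
    return path
```

Starting at the field entrance, this visits abnormal points in a short, intuitive order; a true shortest route would require solving a travelling-salesman instance.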

Crucially, to achieve a fully interactive Digital Twin, the system extends this feedback to automated closed-loop regulation. By interacting with the control unit (ECU) of automatic transplanters, the system dynamically optimizes operational strategies. Specifically, utilizing the real-time recognition results, the system calculates deviation parameters and transmits control signals back to the machine. For adaptive depth adjustment, if a high frequency of “Buried Seedlings” or “Exposed Roots” is detected, the system sends a compensation signal to the depth control actuator (e.g., an electric push rod) to automatically raise or lower the planting mechanism. Similarly, for spacing calibration, feedback signals are generated to fine-tune the planting frequency or travel speed based on real-time spacing variations. This automated adjustment of key parameters not only improves replanting efficiency but also enhances equipment performance, achieving refined and intelligent transplanting operations.

Relying on this real-time synchronization, the operational optimization feedback mechanism enables a robust closed-loop interaction between the virtual and physical domains. The virtual scene functions not only as a visualization platform but as an active decision-making center. Through the closed-loop design of “collection-detection-mapping-feedback-execution,” the digital twin system ensures that dynamic adjustments are promptly transferred to real-world operations, effectively improving the automation and precision of tobacco transplanting.

5 Results and discussion

5.1 Performance comparison of different models

To achieve efficient detection of post-transplanting tobacco seedling images and to select the most suitable deep learning models for this task, lightweight versions of YOLOv5, YOLOv8, YOLOv10, and YOLO11 were trained and tested on the same post-transplanting image dataset under identical runtime environments and parameter settings, with the results shown in Table 2 and Figure 8.


Table 2. Results of baseline model comparison test.

Figure 8
Line graph showing the mAP at 0.5 and training loss over 300 epochs for various YOLO versions. Solid lines represent mAP: YOLOv5n (black), YOLOv8n (green), YOLOv10n (blue), and YOLOv11n (red). Dashed lines represent respective training losses in matching colors. mAP increases quickly then stabilizes around 0.8, while training loss decreases steadily, indicating model training effectiveness.

Figure 8. Training precision and loss curves of four models.

Based on the comparison of training results across the baseline models, it is evident that YOLO11n demonstrates the optimal balance between accuracy and efficiency. First, regarding the mAP@0.5 metric, YOLO11n performs comparably to YOLOv10n, with both outperforming YOLOv5n and YOLOv8n. Notably, its mAP@0.5:0.95 surpasses that of all other models. This superiority is attributed to YOLO11’s enhanced feature extraction capabilities; its improved backbone and neck architectures enable more precise capture of tobacco seedling features. Second, in terms of Loss curves, YOLOv10n exhibits significantly higher loss values than the other models, indicating greater difficulty in optimization within complex scenarios and a failure to effectively fit the data. In contrast, YOLO11n demonstrates the lowest loss values and the fastest convergence speed. This validates YOLO11’s optimized training pipeline and architectural design, allowing it to rapidly minimize prediction errors and maintain exceptional training stability, even with fewer parameters.

In summary, YOLO11n not only inherits the generational advantage of “fewer parameters and higher precision” but also excels in training convergence and robustness. Its strong adaptability to edge computing environments, combined with its capability for feature extraction in complex backgrounds, fully satisfies the stringent requirements of this tobacco transplanting quality inspection system for high precision, real-time performance, and stability. Therefore, selecting YOLO11n as the baseline model for improvement in this project is the optimal choice.

5.2 Evaluation and ablation study of YAN-YOLO11

The model training configuration for this study is as follows: The AutoDL cloud computing platform was selected, equipped with an NVIDIA GeForce RTX 4090 GPU, a 20 vCPU Intel Xeon Platinum 8470Q processor, and 64 GB of RAM. The operating system was Linux, and the deep learning framework employed was PyTorch 2.4.1, running on Python 3.8.10 and CUDA 11.8. Regarding hyperparameters, the initial learning rate was set to 0.01, the batch size was set to 16, and the AdamW optimizer was selected. The model was trained for 300 epochs with an early stopping patience of 30 epochs, and the input image size was set to 640 × 640 pixels. The training results are shown in Figure 9.

Figure 9
Loss and metric graphs over 200 epochs. Top row: training box, classification, and distribution focal loss decreases; precision and recall increase. Bottom row: validation box, classification, and distributional losses decrease; mAP50 and mAP50-95 metrics increase. Results and smooth curves are shown.

Figure 9. Training and validation performance curves of the YAN-YOLO11 model.

To further verify the effectiveness of the improvements in YAN-YOLO11, ablation experiments were conducted, and the results are presented in Table 3. The analysis shows that the improved dynamic convolution detection head contributes substantially to model lightweighting, while the pinwheel-shaped convolution and the enhanced neck structure substantially improve accuracy (Bochkovskiy et al., 2020). Compared with the original YOLO11, the improved YAN-YOLO11 achieved increases of 2.4% in precision and 2.5% in recall. The mean average precision at an IoU threshold of 0.5 (mAP@0.5) and over IoU thresholds of 0.5:0.95 (mAP@0.5:0.95) improved by 3% and 5.8%, respectively, while the number of parameters and floating-point operations (FLOPs) were reduced by 19.8% and 9.5%.


Table 3. Results of ablation experiments.

To evaluate the robustness of the model, we trained YAN-YOLO11 multiple times with different random seeds and report the mean ± standard deviation (SD) for the key metrics: Precision 86.8% ± 1.7%, Recall 74.4% ± 2.7%, mAP@0.5 80.9% ± 1.4%, and mAP@0.5:0.95 54.2% ± 1.0%. These values reflect the stability and magnitude of the improvements across runs. The corresponding 95% confidence intervals (CI) are [86.0%, 87.6%] for Precision, [73.4%, 75.4%] for Recall, [80.0%, 81.8%] for mAP@0.5, and [53.0%, 55.4%] for mAP@0.5:0.95. Furthermore, we verified the significance of these improvements using a binomial test, taking the original YOLO11n results as the baseline; all p-values were below 0.05, indicating that the improvements are statistically significant.

To further analyze the robustness and limitations of the system in complex scenarios, Figure 10 illustrates typical recognition results, particularly typical recognition errors. In the examples of correct detections, the system demonstrates strong robustness under complex lighting conditions such as strong backlighting and cloud shadows. This is attributed to the CLAHE preprocessing and the enhanced feature extraction capabilities of the YAN-YOLO11 backbone, which effectively suppress environmental interference. However, false identifications, missed detections, or duplicate detections still occur in certain extreme scenarios. “Misclassification” errors (e.g., classifying exposed-root seedlings as normal seedlings) typically occur when the root area is visually ambiguous due to soil clumps. “Missed detections” mainly occur when seedlings are severely occluded by soil (deeply buried) or when the leaf area is extremely small. In future work, we plan to integrate 3D depth information and temporal information. By employing sequence analysis, we aim to resolve these visual ambiguities and further reduce false detections.

Figure 10
Detection images show seedlings in soil with labels indicating detection confidence scores. Red circles highlight areas of false detections, buried seedlings, exposed roots, and duplicates. A central list categorizes each corresponding type of detection error.

Figure 10. Analysis of typical detection failure cases in field scenarios.

Overall, the improvements enhanced detection accuracy while further optimizing model lightweighting, demonstrating that the proposed algorithm performs effectively in detecting post-transplanting tobacco seedlings.

5.3 System verification experiments

To verify the functionality and performance of the online detection system, field experiments were conducted in tobacco fields in Chengjiang City (Yuxi) and Mile City (Honghe Hani and Yi Autonomous Prefecture), Yunnan Province, with the hardware system mounted on a self-propelled crawler chassis, as shown in Figure 11. The Chengjiang experimental plot consisted of 10 ridges, each with 21 theoretically transplanted seedlings, totaling 210 theoretical transplanting points; the Mile experimental plot consisted of 8 ridges, each with 48 theoretically transplanted seedlings, totaling 384 theoretical transplanting points. The two plots thus comprised 594 theoretical transplanting points in total. The experimental results show that the system accurately identified transplanting points and their statuses, achieving 539 correct determinations and a transplanting status accuracy of 90.74%, as detailed in Table 4. The related demonstration video and code have been made publicly available: https://doi.org/10.5281/zenodo.17075402.

Figure 11
(A) Two individuals operate a machine planting seedlings in a field. (B) Three people are manually planting in the soil. (C) Rows of planted seedlings are seen in a neatly organized field. (D) A person stands in a field with a tablet, monitoring equipment among planted rows.

Figure 11. (A) Semi-automatic transplanting machine operation site; (B) manual transplanting operation site; (C) tobacco field after transplanting; (D) experimental validation environment.


Table 4. Results of system field validation experiments.

To evaluate the statistical stability of the system’s accuracy, we performed a statistical analysis of the overall transplanting status assessment accuracy. The 95% Confidence Interval (CI), calculated with the Wilson score interval method, places the system’s true accuracy within [88.14%, 92.82%], confirming that, at the current sample size, the reported results possess high statistical reliability. Furthermore, to test whether the system’s performance significantly exceeds a practical baseline, we conducted significance tests against an 85% reference accuracy. The binomial test yielded p = 2.16e-05 and the Z-test p = 4.46e-05, both far below 0.05, demonstrating that the system is effective and holds practical application value.
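The reported Wilson interval can be reproduced from the 539 correct determinations out of 594 points with a few lines of Python (z = 1.96 for a 95% level):

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_interval(539, 594)  # about (0.8814, 0.9282)
```

This matches the stated interval of [88.14%, 92.82%].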

During the experiment, the digital twin system realized end-to-end integration of model detection, spatial mapping, and virtual scene visualization, enabling the acquired transplanting images and data to be transmitted and intuitively reflected in the virtual scene in real time. During operation, the system continuously received detection results and transplanting status judgments from the YAN-YOLO11 model, covering five statuses: normal transplanting, missed planting, double planting, buried seedlings, and exposed-root seedlings. Through latitude–longitude coordinate transformation, all detected points were accurately mapped to the corresponding positions in the virtual tobacco field, achieving dynamic synchronization between physical entities and virtual entities. The system also integrated bar charts, scatter plots, and other statistical and visualization tools to automatically display the numbers and distributions of various transplanting statuses.

In terms of system performance, we adopted a distributed architecture: the Raspberry Pi serves solely as an edge device responsible for data acquisition, preprocessing, and transmission, while the core YAN-YOLO11 model inference is deployed on a cloud server. The YAN-YOLO11 model achieved an average inference speed of 30.66 FPS on the cloud server equipped with an NVIDIA RTX 4090 GPU, sufficient for stable and smooth real-time analysis of video frames and status output. To verify the real-time performance and stability of the digital twin system, the Unity engine’s performance analysis tool, the Profiler, was used to analyze scene performance during system operation. While continuously receiving and rendering multiple types of point data, the platform achieved an average per-frame time of 33 ms and an average frame rate of 30 FPS. During testing, the digital twin platform demonstrated excellent load-bearing capacity and smooth interactive response, with all transplanting point positions and statuses accurately reproduced in the scene.

To quantify the real-time synchronization performance of the digital twin system, we analyzed the end-to-end latency, which primarily consists of three components: data transmission latency, model inference and data fusion latency, and virtual scene rendering and feedback latency. Specifically, the average latency for capturing images and GNSS data from the edge device (Raspberry Pi) and transmitting them to the cloud server for model inference is 20 ms. The average time required to complete YAN-YOLO11 inference, GNSS data fusion, and transplanting status assessment on the cloud server (RTX 4090) is 32.6 ms. Furthermore, in the Unity digital twin platform, the average latency for receiving all computational results and rendering the updated virtual scene is approximately 33 ms. Consequently, the total end-to-end synchronization latency is approximately 85.6 ms. This latency is significantly lower than the time interval between two tobacco seedlings at a typical transplanter speed of 0.5 m/s, ensuring that the digital twin system is capable of achieving quasi-real-time monitoring and feedback.
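The latency budget can be checked with simple arithmetic; the 0.3 m in-row spacing used below is an assumed illustrative value, not a figure from this study:

```python
# Latency components measured in the experiments (milliseconds)
transmit_ms, inference_ms, render_ms = 20.0, 32.6, 33.0
total_ms = transmit_ms + inference_ms + render_ms   # 85.6 ms end to end

speed_mps = 0.5    # typical transplanter speed from the text
spacing_m = 0.3    # assumed in-row spacing (hypothetical)
interval_ms = spacing_m / speed_mps * 1000.0        # ~600 ms between seedlings

# feedback completes well before the next seedling is planted
assert total_ms < interval_ms
```

Even at this conservative spacing the budget leaves a margin of several hundred milliseconds per seedling.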

Regarding the recognition results of YAN-YOLO11 in the validation, Figure 12 shows that, among the three target categories (normal transplanting, buried seedlings, and exposed-root seedlings), the model recognized normal transplanting seedlings with significantly higher accuracy than exposed-root and buried seedlings. The correct prediction rate for normal seedlings reached 95.20%, with high precision and recall and a curve approaching the upper-right corner, indicating excellent recognition performance for this category. In contrast, recognition accuracy for the other categories was relatively lower, with noticeable confusion between exposed-root and buried seedlings. In actual transplanting operations, buried and exposed-root seedlings occur far less frequently than normal transplanting, double planting, and missed planting. As a result, the overall recognition accuracy verified in the experiment, 90.74%, was significantly higher than the performance of YAN-YOLO11 during training. This indicates that the model’s high accuracy on normal transplanting seedlings raised the overall accuracy, while the impact of misclassification between exposed-root and buried seedlings was partially mitigated.

Figure 12
Image consists of two panels. Panel A displays a confusion matrix with classes: Seedling, Root, Buried Seedling, and Background, showing high accuracy for Seedlings. Panel B features a precision-recall curve for the same classes, indicating performance metrics: Seedling (0.866), Root (0.818), Buried Seedling (0.762), and all classes averaged at 0.815 mAP at 0.5 threshold.

Figure 12. (A) Confusion matrix of YAN-YOLO11; (B) precision–recall curves.

The impact of recognition errors on status assessment was also examined during the experiment, since recognition results directly influence transplanting status judgment. When a seedling is missed by the detector, the apparent distance between its neighbors increases, causing the missed point to be classified as missed planting. When a false detection occurs, the ground or other background may be mistaken for a transplanted seedling, shortening its distance to correctly identified seedlings and producing a false double-planting classification. Misclassification among categories, by contrast, does not affect the proportions of missed or double planting. Therefore, the overall transplanting point and status judgment accuracy should be similar to the recognition accuracy.

Overall, the improved YAN-YOLO11 network demonstrated high accuracy in recognizing normal, exposed-root, and buried transplanting seedlings. Combined with geographic location information, it reliably and stably detected double planting and missed planting events, yielding results highly consistent with manual inspection. The system exhibited excellent real-time performance, low status update latency, and met the requirements for field operations.

5.4 Limitations and future work

Although the deep learning and digital twin-based online inspection system proposed in this study meets practical application requirements in terms of accuracy and real-time performance, we objectively acknowledge several limitations in the current work, which also highlight directions for future research.

First, the dataset exhibits limited variations, as it was primarily collected from two locations in Yunnan Province under two main soil types (red soil and sandy soil) and specific weather conditions (sunny and cloudy). While data augmentation techniques were employed to address class imbalance for abnormal states (e.g., buried and exposed-root seedlings), the model’s robustness to these synthetically generated features remains inferior to that achieved with real samples. This constraint may affect the system’s performance in diverse geographical regions, soil compositions, tobacco varieties, or more extreme environmental conditions, such as heavy rain or dense fog, where lighting and occlusion variations could be more pronounced. The transferability of the model across these broader scenarios has not been fully validated. Future efforts will involve expanding the dataset through extensive field collections to include larger-scale, more diverse real abnormal samples, as well as incorporating cross-regional and cross-variety migration tests. Additionally, advanced techniques like domain adaptation, zero/few-shot learning, or the generation of synthetic data (e.g., using generative adversarial networks to simulate varied lighting, occlusion, and environmental scenarios) could enhance the model’s generalization and environmental adaptability.

Second, the system has notable computational requirements, particularly for edge deployment on resource-constrained devices like the Raspberry Pi. Although real-time processing (approximately 30 FPS) is achieved on the cloud server, edge-based inference demands high computational power, which could limit scalability in low-power field environments. This is compounded by potential synchronization delays between virtual and physical models in the digital twin framework, for which quantitative metrics (e.g., latency measurements) are currently lacking. Future optimizations could explore lighter model architectures, model compression techniques (e.g., pruning or quantization), or dedicated edge hardware such as the NVIDIA Jetson series to improve energy efficiency and enable higher-frequency real-time closed-loop control.

Third, the system’s effectiveness depends heavily on precise sensor calibration, including the alignment of the CMOS camera, GNSS receiver, and timestamp matching across data streams. Any misalignment or drift in sensor calibration—due to vibrations during field operations, environmental interference, or hardware inaccuracies—could introduce errors in multi-sensor data fusion, in-row spacing estimation, and transplanting status assessment. This reliance underscores the need for robust calibration protocols in practical deployments. To mitigate this, future work could integrate automated calibration mechanisms or fault-tolerant sensor fusion algorithms.

In addition to addressing these limitations, potential improvements include transitioning to transformer-based detectors (e.g., DETR or Vision Transformers), which could better handle complex occlusion and lighting conditions by leveraging attention mechanisms for global context understanding, potentially surpassing the convolutional focus of YOLO models. Furthermore, enhancing the digital twin with 3D simulations—incorporating depth data from additional sensors could provide more immersive visualizations and predictive capabilities for transplanting scenarios, enabling advanced what-if analyses for operational optimization.

6 Conclusion

This study addresses the quality inspection of tobacco transplanting and the optimization of field operations. We proposed and implemented an online quality inspection method that integrates deep learning driven detection with digital twin visualization. Leveraging the improved YAN-YOLO11 model and multi-sensor data fusion, the system achieves accurate transplanting status detection and assessment, as well as spatial localization and dynamic visualization of abnormal cases. In summary, the main conclusions are as follows:

1. Transplanting Status Detection: The proposed lightweight improved YAN-YOLO11 algorithm can achieve high-precision detection of normal seedlings, buried seedlings, and exposed-root seedlings. Compared with YOLO11n, its precision and recall increased by 2.4% and 2.5%. The mean average precision (mAP@0.5 and mAP@0.5:0.95) increased by 3% and 5.8%. At the same time, the model complexity was significantly reduced, achieving a balance between accuracy and real-time performance.

2. Multi-sensor Data Fusion: By fusing GNSS positioning data with visual detection results, the system achieves high-precision estimation of in-row spacing and assessment of missed planting and double planting. In field experiments, the accuracy of transplanting status assessment reached 93.81%, which was highly consistent with manual inspection results, demonstrating the stability and reliability of this module.

3. Digital Twin Visualization: A virtual tobacco field was constructed on the Unity3D platform to achieve three-dimensional dynamic mapping and real-time visualization of detection results. The system maintained a stable frame rate of about 30 FPS, which allowed it to clearly reflect the spatial distribution of transplanting points and enhance the intelligence of field operations.

4. Operational Optimization Feedback: The system can generate pending replanting lists and optimal replanting routes based on the spatial clustering of abnormal points, providing intuitive guidance for operators. At the same time, the feedback serves as data support for the optimization of transplanting equipment operational parameters. It establishes a “collection-detection-mapping-feedback” closed loop and effectively improves both operational efficiency and equipment performance.

Overall, the digital twin system for tobacco transplanting quality inspection developed in this study markedly enhanced the digitalization and refinement of transplanting operations, aligned with the vision of Agriculture 4.0, and accelerated the intelligent transformation of tobacco production management. With the continuous improvement of algorithms and platforms, this system will be further refined for applications in complex environments, such as large-scale multi-plot fields and multi-device access. It will be extended to a broader range of crop types and field operational stages, facilitating the deep integration of agricultural production with digital twin systems and enhancing its practical application in commercial projects.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.

Ethics statement

Consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions

QZ: Conceptualization, Investigation, Validation, Visualization, Writing – original draft. EM: Investigation, Project administration, Resources, Supervision, Writing – review & editing. JZ: Funding acquisition, Resources, Writing – review & editing. ZY: Data curation, Investigation, Validation, Writing – review & editing. JL: Writing – review & editing. DZ: Supervision, Writing – review & editing.

Funding

The authors declare that this study received funding from China National Tobacco Corporation Yunnan Province Branch. The funder was not involved in the study design; the collection, analysis, or interpretation of data; the writing of this article; or the decision to submit it for publication.

Acknowledgments

We would like to express our sincere gratitude to the China National Tobacco Corporation Yunnan Province Branch for their financial support and for providing valuable resources for this study.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.


Keywords: crop management, deep learning, digital twin, online quality inspection, tobacco, transplanting

Citation: Zhao Q, Ma E, Zhao J, You Z, Liu J and Zhao D (2026) A digital twin–driven deep learning framework for online quality inspection in tobacco transplanting. Front. Plant Sci. 17:1716046. doi: 10.3389/fpls.2026.1716046

Received: 30 September 2025; Revised: 01 January 2026; Accepted: 09 January 2026;
Published: 04 February 2026.

Edited by:

Sathishkumar Samiappan, The University of Tennessee, United States

Reviewed by:

Magdi A. A. Mousa, King Abdulaziz University, Saudi Arabia
Pathmanaban Pugazhendi, Easwari Engineering College, India

Copyright © 2026 Zhao, Ma, Zhao, You, Liu and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jian Zhao, zhaojian1987@bjfu.edu.cn; Dong Zhao, zhaodong68@bjfu.edu.cn

†These authors have contributed equally to this work
