- 1 Research Institute of Agricultural Engineering, Chongqing Academy of Agricultural Sciences, Chongqing, China
- 2 College of Information and Electrical Engineering, China Agricultural University, Beijing, China
The canopy characteristics of crops are essential for assessing crop growth status and conducting phenotype analysis. As one of the key indicators of crop growth, accurate canopy coverage assessment can provide a strong foundation for crop growth and yield monitoring. Considering plant growth differences, this study investigated a statistical method for assessing canopy coverage using visual technology, focusing on lettuce as the research subject. Firstly, a multi-variety, multi-growth-stage hydroponic lettuce image dataset was constructed, which lays a data foundation for the construction of a semantic segmentation model. Secondly, to ensure the precision of semantic segmentation, this study proposed a Channel-Axial-Spatial (CAS) attention mechanism module from the perspective of feature enhancement. To satisfy the lightweight demands of practical model deployment, this study replaced the original backbone network of PSPNet with MobileNetv3, greatly reducing model complexity while minimizing performance degradation. Finally, we developed a group lettuce canopy coverage acquisition system using Python in conjunction with PyQt5 and embedded the pre-trained CAS-PSPNet and MobileNetv3-PSPNet models into the system for effectiveness verification. By integrating the proposed attention mechanism module with PSPNet, the integrated model outperformed FCN, UNet, SegNet, Deeplabv3+, GCN, ExFuse, ENet, BiseNet, FusionNet, LinkNet, RefineNet, LWRefineNet, and PSPNet in semantic segmentation of lettuce plant groups, achieving a Mean Intersection over Union (MIoU) of 0.9832. The MIoU of the lightweight-improved PSPNet is 0.9717, with a model size of 9.3 MB. The results show that the proposed semantic segmentation method can accurately capture crop canopy coverage, offering a feasible solution for real-time crop growth monitoring.
1 Introduction
With the increasing adoption and promotion of smart agriculture, precise management and control of various agricultural tasks have emerged as a top priority. Accurate monitoring of crop growth status plays a critical role in making precise decisions for all aspects of agricultural production (Kamilaris and Prenafeta-Boldú, 2018; Kubota et al., 2019; Tian et al., 2021; Shu et al., 2023). The precise detection of growth indicators such as crop leaf area and canopy coverage serves a dual purpose: it not only provides solid data support for crop growth models and yield estimation but also reflects the adaptability of crops to their specific environmental conditions (Chang et al., 2021; Mohammadi et al., 2021). However, traditional crop growth parameter estimation predominantly relies on manual measurement by agricultural technicians, resulting in significant labor input and time consumption, especially in large-scale cultivation scenarios (Keller, 2018; Lee et al., 2018; Zhang, 2020; Yoosefzadeh-Najafabadi et al., 2021). Therefore, achieving real-time, accurate, and automatic estimation of crop growth status in complex agricultural scenarios remains a pivotal challenge in the agricultural production process.
Plant phenotyping is an emerging technology for analyzing the external traits of crops, such as shape, structure, size, and color, which result from the interplay of genotype and environment (Makanza et al., 2018; Xu et al., 2020). By combining artificial intelligence, machine vision, and automation technologies, high-throughput, accurate, and efficient plant phenotyping technologies have developed rapidly, finding applications in crop breeding, nutrient monitoring, stress analysis, disease detection, freshness estimation, etc. (Gou et al., 2018; Ma et al., 2018; Singh et al., 2018; Zhang et al., 2018a; Das et al., 2021, 2022). In addition, a high-throughput plant phenotyping platform can employ multiple sensors to measure important physical data of plants, such as structural characteristics, plant height, volume, color, fresh weight, wilting degree, and flower/fruit count (Masuda, 2021). Among these sensors, the visual sensor, as a key component of the phenotyping platform, can realize functions such as image information processing and feature extraction, and is characterized by low cost and rapid processing speed (Deery et al., 2021). Consequently, leveraging plant phenotyping technology enables accurate real-time estimation of crop growth status.
Nevertheless, plant phenotyping technology encounters challenges in real-world agricultural production scenarios, including its vulnerability to complex backgrounds and environmental interference (Lee et al., 2018). Therefore, to achieve an accurate estimation of crop growth status, it is essential to separate the research object from intricate backgrounds and address the accurate segmentation of that object (Jiang et al., 2018). Traditional image segmentation methods mainly include techniques based on regions, thresholds, graph theory, edges, energy functionals, cluster analysis, wavelet transforms, mathematical morphology, artificial neural networks, and genetic algorithms (Fan et al., 2021; G. and Gopi, 2021; Mohammadi et al., 2021; Nikbakhsh et al., 2021; Quan et al., 2021a). Although these methods impose no strict requirements on dataset size, they still face difficulties such as noise interference, uneven grayscale, parameter setting, filter selection, and slow operation speed (Adams et al., 2020). Moreover, many traditional segmentation methods are tailored to specific images, making it challenging to achieve optimal segmentation results in dynamically changing complex environments (Rangarajan and Purushothaman, 2020).
The application of information technologies, including artificial intelligence and computer vision, in the agricultural field provides a technical guarantee for accurate crop segmentation under complex environmental conditions (Guo et al., 2021). For example, Sadashivan et al. (2021) proposed a fully automatic, artificial-intelligence-based segmentation method for UAV-captured images. The method was applied to five datasets covering three different crops and different growth stages to calculate the leaf area index. Yang et al. (2020) took leaf images captured against complex backgrounds as the research object, using Mask R-CNN as the leaf segmentation model and VGG16 as the leaf classification model. The results show that the average misclassification error of the segmentation model is 1.15%, and the accuracy of the classification model is 91.5%. Zhu et al. (2020) proposed an automatic segmentation approach for field corn comprising skeleton extraction, skeleton-based coarse segmentation, and fine segmentation based on stem-and-leaf classification. Their results reveal an impressive average accuracy of 0.964, which applies equally well to fully unfolded leaves and wrapped leaves. To monitor the growth, size, and yield of plants, Trivedi and Gupta (2021) proposed an automatic monitoring method based on U-Net. The method achieved accuracies of 94.91%, 94.93%, and 95.05% on the training, validation, and test datasets, respectively. Bhagat et al. (2022) introduced a new leaf segmentation and counting method, Eff-UNet++, by combining EfficientNet-B4 and UNet++ with the goal of segmenting individual plant leaves. This method realizes accurate extraction of image features through EfficientNet-B4 and improves the reliability of the segmentation algorithm. Lu et al. (2022) presented a new method for robust automatic plant segmentation in color images, which enhanced plant-background separation through unconstrained optimization of a linear combination of the RGB channels. Chang et al. (2021) embedded artificial intelligence methods, such as segmentation models, into smart greenhouse systems. These systems enable real-time monitoring of growth parameters such as leaf number, leaf area, and dry matter mass, demonstrating the feasibility and efficiency of artificial intelligence modeling in agriculture-related tasks. Marani et al. (2021) proposed a new method to improve cluster pixel segmentation for grape bunches. It compared pre-trained models including AlexNet, GoogLeNet, VGG16, and VGG19 in the segmentation process, among which VGG19 exhibited the most favorable performance, achieving an average segmentation accuracy of 80.58%. Notably, obtaining growth parameters such as leaf area through instance segmentation is only one aspect of AI segmentation methods. Zenkl et al. (2022) constructed a semantic segmentation model using DeepLab v3+ with high-resolution outdoor winter wheat datasets of different genotypes and growth stages. The IoU for plants and soil reached 0.77 and 0.90, respectively, outperforming support vector machine regression and random forest classification. Li et al. (2022) proposed PlantNet, a dual-function segmentation network that can simultaneously perform semantic segmentation and instance segmentation.
Its performance was validated on tobacco, tomato, and sorghum datasets: the IoU of semantic segmentation reached 85.86%, while the average precision of instance segmentation reached 83.30%. Du et al. (2023) reported the plant segmentation transformer (PST) for point cloud segmentation, which achieved an MIoU of 93.96% for semantic segmentation on a rapeseed plant dataset; by combining PST with PointGroup, instance segmentation reached an average precision of 88.83%. In addition, segmentation can also facilitate the counting of flowers and fruits, as well as the acquisition of fruit phenotypic parameters (Zabawa et al., 2020; Li et al., 2021; Rahim et al., 2021). Li et al. (2024) proposed a CNN-based Edge Enhancement Network (EENet), which strengthens the model's ability to learn edge information during training, thereby improving crop segmentation accuracy. Picon et al. (2025) introduced a semantic segmentation framework for crop condition assessment and studied crop disease identification via semantic segmentation on a self-constructed dataset. Kong et al. (2024) proposed a novel segmentation network based on the YOLO architecture, enabling precise delineation of crop contours without additional computational resource demands. Cao et al. (2025) proposed a nearly unsupervised crop segmentation method, DepthCropSeg, which leverages Depth Anything V2 to generate pseudo-labels and selects optimal labeled data for model training, substantially reducing the time required for data annotation. Li et al. (2025) designed a multi-level knowledge distillation framework tailored to agricultural scenarios, integrating high-level semantic information with low-level texture features during distillation, which significantly enhanced segmentation accuracy in complex environments. Zheng et al. (2024) proposed a lightweight semantic segmentation model based on UAV visible-light imagery for multi-crop classification, providing technical support for crop categorization and intelligent field monitoring with drones. Mu et al. (2026) proposed PCSNet, a hybrid semantic segmentation model for UAV remote sensing, used mainly for segmenting 14 crops and 2 background classes; its MIoU reached 84.61%. Gangwar et al. (2025) proposed a tomato leaf disease segmentation model, Tomato TransDeepLab, which can not only recognize leaf diseases but also quantify disease severity. Kang et al. (2025) developed an integrated IoT and computer vision monitoring system for crop growth management; the system uses a recursive image segmentation model to process sequential images for continuous growth monitoring and was validated on cabbage. Guerra Ibarra et al. (2025) compared the effects of different backbones on UNet segmentation performance and identified VGGNet as a backbone suitable for precise segmentation of leaves, fruits, and backgrounds. Zhao et al. (2025) proposed YOMASK, a field lettuce segmentation model, which achieved a segmentation accuracy of 95.41% by improving the network's feature extraction and fusion.
Existing research has achieved relevant results in the field of crop segmentation, but there are still the following limitations. (1) Research on crop segmentation in large-scale field planting scenarios is mostly focused on disease classification, crop classification, and other aspects. There is relatively little research on crop growth supervision, especially in small-scale facility planting scenarios. (2) Most existing research focuses on the construction of segmentation models and growth parameter statistics for specific growth stages or individual leaves, with relatively few studies on the entire growth stage and population crops. (3) There are relatively few intelligent platforms for crop segmentation and growth parameter statistics, and research and algorithm implementation are not closely related.
Therefore, this study takes hydroponic lettuce as the research object, fully considering the differences in variety and growth stage, and constructs a group lettuce semantic segmentation model from two perspectives: model performance improvement and lightweight model deployment. Growth monitoring and statistical analysis of multiple varieties of lettuce at different growth stages are then carried out. The specific contributions are as follows:
1. A semantic segmentation dataset was constructed for multiple varieties of lettuce at different growth stages, taking into account the phenotype and growth differences of different lettuce varieties.
2. To ensure the segmentation accuracy of the model, this study proposes a new attention mechanism CAS, which effectively improves the segmentation reliability of PSPNet.
3. To meet the lightweight requirements of model deployment, this study compared the improvement effects of different lightweight backbone networks on PSPNet and found that PSPNet based on MobileNetv3 achieved a basic balance between model size and model precision.
4. A semantic segmentation model suitable for multiple varieties of lettuce and an intelligent platform for obtaining canopy coverage have been developed, effectively improving the efficiency of obtaining key growth parameters for lettuce populations.
2 Materials and methods
2.1 Experimental field
Considering that differences in region, planting pattern, etc. may lead to differences in the growth status of the same crop, experiments were carried out in both Chongqing and Beijing, from March to May and from October to December 2024, respectively. The planting scenarios are shown in Figure 1. The dataset was collected from three varieties of hydroponic lettuce (Small cream green, Batavia, and Boston cream), denoted V1, V2, and V3. Images were collected after the lettuce was transplanted to the hydroponic system.
Figure 1. Experimental scenarios in Chongqing and Beijing. (a) represents the stereoscopic cultivation in Chongqing, and (b) represents the flat cultivation in Beijing.
2.2 Image acquisition
The cultivation environments in Chongqing and Beijing differ. Specifically, the temperature in Chongqing is relatively suitable from March to May, resulting in a relatively fast lettuce growth rate, so data were collected every 5 days in Chongqing. In Beijing, lettuce grows more slowly due to the lower temperatures between October and December, so data were collected every 8 days. Following seedling transplantation (as shown in Figure 2), data collection in Chongqing occurred on the 5th, 11th, 17th, 23rd, and 29th days post-transplantation, with 100 images collected each time. Similarly, data collection in Beijing took place on the 9th, 18th, 27th, 36th, and 45th days after transplantation, with 100 images collected each time. In this way, 500 images were collected over the entire growth period in each of Chongqing and Beijing, for a total of 1,000 images for model training. During image acquisition, a fixed bracket was used to keep the camera plane 40 cm from, and parallel to, the cultivation bed plane, minimizing acquisition errors and ensuring the reliability of subsequent result comparisons, as shown in Figure 3.
2.3 Image pre-processing
The acquisition of canopy coverage of the group lettuce is based on the accurate segmentation of lettuce plants. According to the requirements of the segmentation network for the dataset, the Labelme software was used to annotate lettuce plants. In addition, considering the difference in lettuce growth cycles between Chongqing and Beijing, the 500 images collected in each region were separately divided into training, validation, and test sets at a ratio of 6:2:2. The corresponding subsets from the two regions were then merged to create a model construction dataset with 600, 200, and 200 images for training, validation, and testing, respectively. During model training, images were resized to 640×640 pixels to meet the fundamental requirements of effective training.
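To make this step concrete, the following is a minimal Python sketch (not the authors' released code) of the 6:2:2 partition and the 640×640 resize; the directory layout, file extension, and random seed are assumptions.

```python
# Sketch of the per-region 6:2:2 split and 640x640 resize described above.
import random
from pathlib import Path

from PIL import Image

def split_dataset(image_dir: str, seed: int = 0):
    """Randomly split one region's 500 images into train/val/test at 6:2:2."""
    images = sorted(Path(image_dir).glob("*.jpg"))  # assumed file layout
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (images[:n_train],                  # e.g. 300 training images
            images[n_train:n_train + n_val],   # e.g. 100 validation images
            images[n_train + n_val:])          # e.g. 100 test images

def resize_for_training(src: Path, dst: Path, size=(640, 640)):
    """Resize an RGB image to the 640x640 input size used for training."""
    Image.open(src).convert("RGB").resize(size, Image.BILINEAR).save(dst)
```

The two regions' subsets of the same category can then simply be concatenated to form the merged 600/200/200 dataset.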
2.4 Multi-strategy modeling methods
Crop development, from seedling to mature plant, is a dynamic growth process, and it is promising to monitor the phenotypic changes of crops across different growth stages through machine vision technology to characterize their growth status. PSPNet is a pyramid scene parsing network (Zhao et al., 2017) that addresses the limitations of fully convolutional networks by effectively integrating local and global feature information through a pyramid pooling module, thereby enhancing model precision. Notably, both model reliability and the lightweight deployment requirements of practical applications need to be considered. Therefore, this study proposed CAS-PSPNet and MobileNetv3-PSPNet, focusing on performance enhancement and model lightweighting, respectively. This enables accurate determination of the canopy coverage of the group lettuce at different growth stages, laying a good foundation for growth monitoring and analysis of lettuce throughout its entire growth cycle. The improvement strategy for PSPNet is shown in Figure 4.
Figure 4. The network structure of the improved PSPNet. (a) Input image. (b) Feature map. (c) Pyramid pooling module. (d) Final prediction.
2.4.1 Input data
This study mainly constructs the semantic segmentation model for the group lettuce datasets of three varieties and five growth stages. Taking into account GPU computational capacity and the dataset characteristics, we randomly partitioned the datasets of different varieties and growth stages into training, validation, and test subsets. The training and validation datasets participate in model construction, while the test dataset serves to evaluate the reliability of the model.
2.4.2 Backbone
The backbone feature extraction network of PSPNet is ResNet50. To meet the lightweight requirements of practical model deployment, this study uses the lightweight backbones MobileNetv3 (Howard et al., 2017) and InceptionNeXt (Yu et al., 2023) to replace ResNet50. Both MobileNetV3 and InceptionNeXt offer the advantage of being lightweight, but they achieve this in slightly different ways: MobileNetv3 relies on neural architecture search (NAS) and SE attention mechanisms, while InceptionNeXt mainly relies on bottleneck layers and residual connections.
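As an illustration of the backbone swap, the sketch below wraps torchvision's MobileNetV3 so it produces a dense feature map in place of ResNet50's final-stage features; this is an assumed construction (the paper's exact layer choices and the PSPNet head are not shown).

```python
# Sketch: a MobileNetV3 feature extractor to feed a PSPNet-style head.
import torch.nn as nn
from torchvision.models import mobilenet_v3_large  # torchvision >= 0.13

class MobileNetV3Backbone(nn.Module):
    """Returns the final-stage feature map of MobileNetV3-Large."""
    def __init__(self):
        super().__init__()
        # Keep only the convolutional trunk; drop the classifier head.
        self.features = mobilenet_v3_large(weights=None).features

    def forward(self, x):
        # Input (B, 3, H, W) -> output (B, 960, H/32, W/32),
        # which the pyramid pooling module then consumes.
        return self.features(x)
```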
2.4.3 Feature enhancement module
To ensure the reliability of subsequent pixel-level classification, this study explored the enhancement effects of Axial (Ho et al., 2019), CBAM (Woo et al., 2018), and ECA (Wang et al., 2020) on the features produced by the backbone feature extraction network. Comprehensively considering the actual improvement effects of the different attention mechanisms on the network, inspired by the feature enhancement approach of the Axial attention mechanism, and given that CBAM enhances features in both the channel and spatial dimensions, this study introduced the row attention enhancement module and column attention enhancement module from the Axial attention mechanism to further enhance CBAM, yielding an attention mechanism module called CAS. The CAS module is added to the end of the backbone feature extraction network to realize a secondary enhancement of plant features. The network structure of CAS is shown in Figure 5.
Assume that the feature map $F$ obtained from the backbone network generates two spatial context descriptors, $F_{avg}^{c}$ and $F_{max}^{c}$, after entering the channel attention module. The two descriptors are then passed to a shared multi-layer perceptron (MLP) to generate the channel attention map $M_C$. This part focuses the network on its target, the lettuce plants, by assigning higher weights to the plant-related portions of the feature map than to the background. The specific calculation process is shown in Equation 1:

$M_C(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F_{avg}^{c})) + W_1(W_0(F_{max}^{c}))\big) \quad (1)$

where $\sigma$ represents the sigmoid function, $W_0$ and $W_1$ are the weights of the MLP, $F_{avg}^{c}$ represents the average-pooled feature, and $F_{max}^{c}$ represents the max-pooled feature.

In actual planting scenes, interference from the external environment, such as light, causes a certain color deviation in the plants. To ensure the reliability of subsequent segmentation, and inspired by the feature enhancement concept of the Axial attention mechanism, we made the following improvement after the channel attention module. The channel attention feature map $M_C$ obtained above is fed into the row attention mechanism $\mathrm{Attn}_{row}$ and the column attention mechanism $\mathrm{Attn}_{col}$ to obtain the row feature $F_r$ and column feature $F_c$ (details of $\mathrm{Attn}_{row}$ and $\mathrm{Attn}_{col}$ can be found in Ho et al. (2019)); the row and column features are then summed to form the input feature of the next module. The specific calculation process is shown in Equation 2:

$F' = \mathrm{Attn}_{row}(M_C) + \mathrm{Attn}_{col}(M_C) \quad (2)$

where $F'$ represents the output feature after the axial feature enhancement module.

Finally, the feature $F'$ after the secondary enhancement is fed into the spatial attention module to generate a spatial attention feature map, guiding the network to where the target object, i.e., the lettuce plant, is located. MaxPool and AvgPool operations are performed on $F'$ in turn, and the results are concatenated to generate effective feature descriptors $F_{avg}^{s}$ and $F_{max}^{s}$. These descriptors are then fed into a convolution layer to generate the spatial attention feature map $M_S$. The specific calculation process is shown in Equation 3:

$M_S(F') = \sigma\big(f^{7\times7}([F_{avg}^{s}; F_{max}^{s}])\big) \quad (3)$

where $\sigma$ represents the sigmoid function and $f^{7\times7}$ represents a convolution with a kernel size of 7 × 7.
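A minimal PyTorch sketch of the CAS module as described by Equations 1-3 follows. The head count and the exact axial formulation are assumptions (standard multi-head self-attention applied along rows and columns); see Ho et al. (2019) for the full axial design.

```python
# Sketch of CAS: channel attention (Eq. 1), axial row/column attention
# (Eq. 2), then spatial attention (Eq. 3). `channels` must divide by `heads`.
import torch
import torch.nn as nn

class CAS(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, heads: int = 4):
        super().__init__()
        # Eq. 1: shared MLP over avg- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        # Eq. 2: self-attention along rows and along columns.
        self.row_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        # Eq. 3: 7x7 convolution over stacked avg/max spatial maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Eq. 1: M_C = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))
        m_c = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                            self.mlp(x.amax(dim=(2, 3))))
        x = x * m_c.view(b, c, 1, 1)
        # Eq. 2: attend along width (rows) and height (columns), then sum.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        f_r = self.row_attn(rows, rows, rows)[0].reshape(b, h, w, c).permute(0, 3, 1, 2)
        f_c = self.col_attn(cols, cols, cols)[0].reshape(b, w, h, c).permute(0, 3, 2, 1)
        x = f_r + f_c
        # Eq. 3: M_S = sigmoid(conv7x7([AvgPool(F'); MaxPool(F')]))
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```

In CAS-PSPNet, a module of this form would sit between the backbone output and the pyramid pooling module.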
2.4.4 Pyramid pooling module
Traditional fully convolutional networks rely on local receptive fields and lack understanding of the overall structure of the image, resulting in category confusion when segmenting large-sized objects or complex scenes. PPM utilizes parallel processing of features at different scales to enable the model to simultaneously focus on local details and global context, thereby improving the segmentation accuracy and boundary clarity of targets at different scales.
The PPM is used to construct a global scene prior on the last-layer feature map $F$ of the ResNet50 backbone. This module carries out feature fusion across four different scales (1×1, 2×2, 3×3, 6×6). Through the pooling operations, the global information of feature maps at different scales is preserved, allowing contextual information to be better learned and ensuring the segmentation performance of the model. The specific calculation process is shown in Equations 4 and 5. The $k$-th pooling branch is given by Equation 4:

$P_k = \mathrm{Up}\big(\mathrm{Conv}_{1\times1}(\mathrm{AvgPool}_{s_k}(F))\big), \quad s_k \in \{1, 2, 3, 6\} \quad (4)$

where $\mathrm{AvgPool}_{s_k}$ denotes adaptive average pooling to an output size of $s_k \times s_k$ and $\mathrm{Up}$ denotes bilinear upsampling to the spatial size of $F$. The final output feature is given by Equation 5:

$F_{out} = \mathrm{Concat}(F, P_1, P_2, P_3, P_4) \quad (5)$
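The following is a compact PyTorch sketch of Equations 4 and 5, following the standard PSPNet design (Zhao et al., 2017); the per-branch channel reduction and bilinear upsampling are conventional choices rather than details reported here.

```python
# Sketch of the pyramid pooling module: pool at 1x1, 2x2, 3x3, 6x6,
# reduce channels, upsample, and concatenate with the input feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_channels: int, bins=(1, 2, 3, 6)):
        super().__init__()
        out = in_channels // len(bins)  # each branch contributes C/4 channels
        self.branches = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_channels, out, 1, bias=False),
                          nn.BatchNorm2d(out), nn.ReLU(inplace=True))
            for b in bins)

    def forward(self, f):
        h, w = f.shape[2:]
        # Eq. 4: P_k = Up(Conv_1x1(AvgPool_{s_k}(F)))
        pyramids = [F.interpolate(branch(f), size=(h, w),
                                  mode="bilinear", align_corners=False)
                    for branch in self.branches]
        # Eq. 5: F_out = Concat(F, P_1, ..., P_4)
        return torch.cat([f] + pyramids, dim=1)
```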
2.5 Model training and evaluation
The experiments in this study were mainly conducted on an NVIDIA RTX 3090 GPU (24 GB). In addition, to make better use of pre-trained weights during model construction, we adopted a combined freeze-and-unfreeze training strategy. Once the semantic segmentation model for group lettuce plants of multiple varieties and growth stages is constructed, it needs to be validated. Commonly used evaluation indicators for semantic segmentation models include Pixel Accuracy (PA) and Mean Intersection over Union (MIoU). In addition, Model-size, FPS, and Time-consuming are also considered as evaluation indicators to further evaluate the reliability of the model. The specific calculation of these indicators is shown in Equations 6-8:

$\mathrm{PA} = \frac{TP + TN}{TP + TN + FP + FN} \quad (6)$

$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \quad (7)$

$\mathrm{MIoU} = \frac{1}{C}\sum_{i=1}^{C}\frac{TP_i}{TP_i + FP_i + FN_i} \quad (8)$
where FP, TP, TN, and FN represent false positive, true positive, true negative, and false negative, respectively, and C represents the total number of categories.
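For reference, PA and MIoU can be computed from a confusion matrix as in the following sketch (an assumed implementation, not the authors' code):

```python
# Sketch: PA and MIoU (Eqs. 6-8) from a C x C confusion matrix
# whose rows are ground-truth classes and columns are predictions.
import numpy as np

def confusion_matrix(pred: np.ndarray, label: np.ndarray, num_classes: int):
    idx = num_classes * label.flatten() + pred.flatten()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes,
                                                                num_classes)

def pixel_accuracy(cm: np.ndarray) -> float:
    return float(np.diag(cm).sum() / cm.sum())  # correctly classified / all

def mean_iou(cm: np.ndarray) -> float:
    tp = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - tp          # TP + FP + FN per class
    return float(np.mean(tp / np.maximum(union, 1)))
```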
3 Results
3.1 Comparison of semantic segmentation methods
To address the accurate semantic segmentation of lettuce in complex agricultural scenes, this study compared a total of 16 typical semantic segmentation methods to select the best modeling approach. The evaluated methods include FCN (Shelhamer et al., 2017), UNet (Ronneberger et al., 2015), SegNet (Badrinarayanan et al., 2017), Deeplabv3+ (Chen et al., 2018), GCN (Peng et al., 2017), ExFuse (Zhang et al., 2018b), ENet (Paszke et al., 2016), BiseNet (Yu et al., 2018), FusionNet (Quan et al., 2021b), LinkNet (Chaurasia and Culurciello, 2018), RefineNet (Lin et al., 2017), LWRefineNet (Nekrasov et al., 2019), PSPNet (Zhao et al., 2017), Swin Transformer (Liu et al., 2021), SegNeXt (Guo et al., 2022), and DDRNet (Pan et al., 2023). The specific modeling results are shown in Table 1.
As shown in Table 1, the above 16 methods performed well on the lettuce plant datasets of multiple varieties and different growth stages collected in this study, with each method exhibiting distinct advantages. Comparing the evaluation indicators shows that lightweight semantic segmentation models such as FCN, SegNet, BiseNet, LinkNet, and LWRefineNet have slightly lower PA and MIoU than PSPNet, while offering smaller model sizes and better FPS performance. These differences arise because lightweight models reduce network complexity and computational cost by adjusting the depth or width of the network; this trade-off also weakens the model's ability to extract features, causing a decrease in segmentation precision. Striking a balance between model size and precision thus remains a challenge in lightweight model design. For instance, the early down-sampling strategy in the ENet architecture results in spatial information loss, impacting its segmentation accuracy. UNet, a well-known method in medical image segmentation, excels in edge segmentation precision; however, in this study, UNet faced challenges when segmenting plant stems because of their similarity to the background, in some cases misclassifying stems as part of the background and thus achieving lower segmentation precision. Focusing on expanding receptive fields and multi-scale feature fusion, methods such as Deeplabv3+, ExFuse, FusionNet, and RefineNet extract features through deep convolution, feature fusion across layers, skip connections, etc. to improve segmentation precision, but their performance is slightly lower than PSPNet due to the high similarity between background features and some plant stems, as well as their relatively high model complexity. GCN is another semantic segmentation method based on graph convolutional networks; as the number of layers increases, the representation vectors of nodes tend to become consistent, leading to over-smoothing problems and high model complexity, so both its performance and FPS are lower than PSPNet. PSPNet explicitly aggregates multi-scale global features through the PPM, while Swin Transformer relies on a self-attention mechanism to indirectly capture long-range dependencies (which may be limited by local windows). SegNeXt and DDRNet focus more on local feature extraction, so PSPNet shows stronger segmentation consistency in complex scenes such as small objects and boundary-blurred areas. Therefore, we selected PSPNet, which performed best on the core evaluation indicators (PA and MIoU) on the collected datasets, as the main semantic segmentation method in this study.
3.2 Modeling results for improving model performance based on attention mechanism
To ensure the reliable performance of semantic segmentation models in complex agricultural scenarios, this study explored the enhancement effects of several mainstream attention mechanisms (e.g., CBAM, Axial, ECA) on PSPNet, with the primary objective of improving the robustness of lettuce plant segmentation through feature enhancement. Furthermore, we integrated the attention mechanism module (CAS) proposed in this study with PSPNet and compared its improvement effects with those of other mechanisms. The specific results are shown in Table 2.
As shown in Table 2, the segmentation effects of PSPNet combined with different attention mechanisms (ECA, CBAM, Axial, and CAS) were compared. The results indicate that PSPNet’s performance was further enhanced when integrated with ECA and CAS. The ECA attention mechanism enhances feature representation by introducing channel attention into convolution operations. This enables the network to capture inter-channel dependencies, adaptively adjust channel feature weights, better focus on important features, and suppress less relevant ones, thereby improving feature discriminability. Compared to the baseline PSPNet, ECA-PSPNet shows slight improvements in PA, MIoU, and FPS, with increases of 0.0003, 0.001, and 1.7293, respectively. However, since ECA primarily focuses on channel relationships, it may not fully leverage spatial information. In contrast, CBAM extracts both channel and spatial features to enhance the model’s expressive and generalization capabilities, though its efficiency may decrease for distantly related features. The Axial attention mechanism captures global context by decomposing attention calculations along different axes, thereby improving model performance. The CAS attention mechanism proposed in this study not only considers channel and spatial features but also employs row-wise and column-wise attention mechanisms for secondary enhancement of channel features. A key indicator of semantic segmentation effectiveness, MIoU, increased significantly by 0.0022 with CAS, demonstrating that the proposed mechanism can effectively improve segmentation performance for lettuce across multiple varieties and growth stages.
3.3 Modeling results based on model lightweight improvement
Constructing a semantic segmentation model for the group lettuce not only emphasizes the improvement of model performance, but also considers the lightweight deployment requirements of the model. On the basis of PSPNet, this study replaced the original backbone network ResNet50 with lightweight backbone networks MobileNetv3 and InceptionNeXt, respectively, to explore the lightweight improvement effect of the model. The specific modeling results are compared in Table 3.
As shown in Table 3, MobileNetV3-PSPNet experiences a smaller performance loss compared to its pre-improvement state, while showing significant improvements in Time, FPS, and Model-size. In particular, the model size has been reduced by approximately 20 times, which greatly enhances inference speed and meets real-time requirements in practical applications. In contrast, InceptionNeXt-PSPNet exhibits a relatively large performance drop compared to its earlier version, even though its model size is only halved. Although InceptionNeXt also achieves relatively fast inference speed, its segmentation accuracy is slightly lower than that of MobileNetV3. This is because the complex and powerful structure of InceptionNeXt makes it more susceptible to interference from background features during feature extraction. As a result, some stem pixels of plants are misclassified as background pixels, leading to reduced segmentation precision. Furthermore, due to its smaller number of parameters, MobileNetV3 has a lower computational load and faster inference speed, giving it an advantage in tasks with high real-time demands. Additionally, the design of MobileNetV3 is more concise and efficient. It incorporates residual connections and lightweight attention mechanisms, enabling better learning of feature representations and contextual information, which ensures stronger performance in downstream tasks such as segmentation. Overall, considering both reliability and lightweight needs, MobileNetV3-PSPNet is more suitable for practical production environments.
3.4 Construction of a canopy coverage acquisition platform
To facilitate the practical application of the aforementioned models for monitoring lettuce growth, this study developed a canopy coverage acquisition system for lettuce using Python and PyQt5. The system supports online and offline image data acquisition, weight file loading, and multi-model segmentation result acquisition. The system interface is shown in Figure 6.
As shown in Figure 6, the system is designed with four main functions for obtaining lettuce canopy coverage: threshold setting, image loading, pre-trained model loading, and result output. First, users can set the segmentation precision threshold using the slider at the top of the interface. Second, image or video data can be loaded by clicking the “Import File” or “Turn on” button on the left. Third, the corresponding pre-trained models (PSPNet, CAS-PSPNet, and MobileNetv3-PSPNet) are loaded by sequentially clicking their respective buttons. Finally, after clicking the “Start” button, the segmentation results from the selected model are displayed in the central area, while the calculated canopy coverage for the lettuce plants is shown in the display area on the right. The accuracy of canopy coverage calculation depends on the reliability of the semantic segmentation model. It is derived by calculating the ratio of plant-area pixels to the total image pixels in the model’s output. The system allows for the simultaneous output of segmentation results from both the baseline and improved models, as well as the canopy coverage information based on the CAS-PSPNet and MobileNetv3-PSPNet methods. This design facilitates a direct comparison between the two approaches.
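As a rough illustration of the interface logic (threshold slider, file import, model run, coverage display), a heavily simplified PyQt5 sketch is shown below; widget names, layout, and the placeholder result are illustrative only, not the deployed system's code.

```python
# Minimal PyQt5 sketch of the canopy coverage system's interface logic.
import sys

from PyQt5.QtCore import Qt
from PyQt5.QtWidgets import (QApplication, QFileDialog, QLabel, QPushButton,
                             QSlider, QVBoxLayout, QWidget)

class CoverageWindow(QWidget):
    def __init__(self):
        super().__init__()
        self.threshold = QSlider(Qt.Horizontal)       # segmentation threshold
        self.import_btn = QPushButton("Import File")  # load image/video data
        self.start_btn = QPushButton("Start")         # run the loaded model
        self.result = QLabel("Canopy coverage: --")   # coverage display area
        layout = QVBoxLayout(self)
        for w in (self.threshold, self.import_btn, self.start_btn, self.result):
            layout.addWidget(w)
        self.import_btn.clicked.connect(self.load_image)
        self.start_btn.clicked.connect(self.run_model)

    def load_image(self):
        self.path, _ = QFileDialog.getOpenFileName(self, "Select image")

    def run_model(self):
        # Placeholder: here the pre-trained model would segment the image and
        # the plant-pixel ratio would be displayed (see Section 3.5).
        self.result.setText("Canopy coverage: (run model here)")

if __name__ == "__main__":
    app = QApplication(sys.argv)
    win = CoverageWindow()
    win.show()
    sys.exit(app.exec_())
```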
3.5 Acquisition of canopy coverage of the group lettuce
In this section, we used the canopy coverage acquisition system (Section 3.4) to obtain canopy coverage data for lettuce plants across different varieties and growth stages. Taking the number of pixels as the basic statistical unit, we calculated the proportion of group plant pixels within the whole image and used this proportion as the measure of canopy coverage. The specific results are shown in Figure 7.
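In code, this statistic reduces to a one-line ratio over the predicted mask (a sketch, assuming a binary plant/background mask):

```python
# Canopy coverage as the fraction of pixels labeled as plant.
import numpy as np

def canopy_coverage(mask: np.ndarray) -> float:
    """mask: binary array from the segmentation model, 1 = plant pixel."""
    return float(mask.sum()) / mask.size  # plant pixels / all pixels
```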
Figure 7. Acquisition of canopy coverage of the group lettuce in different varieties and growth stages.
Figure 7 shows the segmentation results and canopy coverage information of the group lettuce under different varieties and growth stages based on the CAS-PSPNet semantic segmentation method. It can be found that the model achieves a satisfactory segmentation performance, except for the minor background segmentation errors between leaves in the V3 variety in the last row. The canopy coverage information for lettuce of different varieties and growth stages, obtained through semantic segmentation models, can be used as an important indicator to measure the growth status of the group lettuce. This study also statistically analyzed the changes in canopy coverage of the group lettuce of different varieties and growth stages, as shown in Figure 8.
As shown in Figure 8, the canopy coverage information shows an increasing trend with the advancement of growth stage, and variations in canopy coverage are observed among different varieties of lettuce at different growth stages. The group lettuce canopy coverage acquisition system can be used in subsequent research for automatic calculation of the canopy coverage information of different crops and automatic analysis of the dynamic growth changes of different crop varieties from both quantitative and qualitative perspectives, which is certainly promising and applicable.
4 Discussion
4.1 Necessity of obtaining canopy coverage of the group plant
Accurate acquisition of canopy coverage or leaf area is one of the important indicators for evaluating plant growth. Some studies obtain the canopy ground cover of crop groups in field scenes and compare the advantages and disadvantages of various sensors. However, obtaining canopy coverage with an RGB digital camera in this way relies on the pixel discrimination rule (green − red)/(green + red) > 0 (Deery et al., 2021). This method can greatly improve detection speed in practical applications, but there may be a certain gap in accuracy compared with deep learning methods (Yang et al., 2020; Sadashivan et al., 2021). Other studies, limited by the cultivation mode (pot or pipe cultivation), mainly estimate the leaf area of a single plant; however, in actual large-scale planting, the canopy coverage or leaf area of a single plant has inherent limitations and can hardly reflect the growth status of the same batch of group plants (Rangarajan and Purushothaman, 2020; Mohammadi et al., 2021; Trivedi and Gupta, 2021; Bhagat et al., 2022). Therefore, this study fully considered the individual growth differences within group plants and conducted a statistical analysis of these differences. Specifically, for the group plants in the same monitoring area, we used the ratio of the number of pixels of the target plant to the number of pixels of the whole viewing area to evaluate the growth difference of individual plants. The specific results are shown in Figure 9.
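For comparison, the classical RGB ground-cover rule cited above can be sketched as follows (assuming an (H, W, 3) RGB array; the small epsilon guards against division by zero):

```python
# Sketch of the (green - red)/(green + red) > 0 cover rule (Deery et al., 2021).
import numpy as np

def rgb_cover_fraction(image: np.ndarray) -> float:
    """image: (H, W, 3) uint8 array in RGB channel order."""
    r = image[..., 0].astype(np.float32)
    g = image[..., 1].astype(np.float32)
    index = (g - r) / np.maximum(g + r, 1e-6)
    return float((index > 0).mean())  # a pixel counts as canopy when G > R
```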
Figure 9. Analysis of individual plant growth differences. (a) S1-V1 (b) S1-V2 (c) S1-V3 (d) S2-V1 (e) S2-V2 (f) S2-V3.
As shown in Figure 9, we selected partial images of S1 and S2, the early growth stages of the three lettuce varieties, to analyze individual plant growth differences. Individual growth differences among plants of different lettuce varieties already occur in the early growth stages. As shown in panels (a), (b), and (c), the individual canopy coverage of multi-variety lettuce plants at the S1 stage can be basically divided into three levels: 0%-1%, 1%-2%, and 2%-3%. Similarly, as shown in panels (d), (e), and (f), the individual canopy coverage at the S2 stage can be generally divided into three levels: 2%-3%, 3%-4%, and 4%-7%. The reason may be that root development differs among plants during the seedling stage, leading to differences in nutrient absorption that are ultimately reflected in individual growth differences. Therefore, this study evaluated the growth of large-scale crops by obtaining the canopy coverage of group lettuce under greenhouse cultivation.
4.2 The advantages and challenges of practical application
In Section 4.1, this study confirmed that individual plants exhibit growth differences. The canopy coverage of the group plants was obtained using the semantic segmentation method, and the growth status of lettuce of different varieties and growth stages was monitored (see Section 3.5 for details). Although PSPNet can be improved by combining it with the ECA attention mechanism without increasing model size, challenges remain in capturing the dynamic growth of different lettuce varieties over the whole life cycle in practical applications. Therefore, to improve the segmentation of group lettuce plants over the whole growth period, this study proposed a new attention mechanism called CAS. CAS reinforces the features of the PSPNet backbone, which significantly improves the MIoU indicator and ensures accurate segmentation in practice. Considering the lightweight deployment requirements in actual planting scenarios, this study also replaced the original ResNet50 backbone of PSPNet with MobileNetv3 and InceptionNeXt. The results showed that PSPNet based on MobileNetv3 achieved a good balance between model performance and model complexity. Both MobileNetV3 and InceptionNeXt have fewer parameters and smaller model sizes, which helps reduce storage and computing resource consumption and improve model efficiency; however, compared with InceptionNeXt, MobileNetV3 has fewer parameters and is more lightweight.
It is well known that deep learning-based segmentation methods show significant advantages in accuracy and generalization over traditional segmentation methods. However, besides benefiting from the strengths of deep models, they also rely on large amounts of manually labeled sample data. To explore the generalization and applicability of the proposed segmentation model in other planting scenarios, this study collected images of group lettuce from partial pipeline cultivation for validation, as shown in Figure 10. The proposed semantic segmentation model displayed good generalization performance in these scenarios. However, considering that the annotation of segmentation datasets is extremely time-consuming, semi-supervised or unsupervised learning seems promising for subsequent research.
In the practical application of segmentation models, different agricultural scenarios and crop types present different challenges. The field environment is usually complex, with varying soil types, lighting conditions, weather conditions, crop growth stages, etc. These variations may change the appearance and texture of crops, thereby decreasing segmentation precision. The internal environment of facility greenhouses is relatively stable but still faces uneven lighting, and high-density planting and staggered cultivation of multiple crops in greenhouses may place higher demands on the accuracy of semantic segmentation algorithms. Crop types also vary in shape, size, and color, and even the same crop type may exhibit different appearances at different growth stages and in different environments, making it very difficult to design a universal semantic segmentation model. The method proposed in this study is applicable only to close-range growth monitoring scenarios such as facility greenhouses and plant factories, rather than large-scale scenes such as fields; it is therefore necessary to expand the corresponding image datasets and fine-tune the model to meet field application needs. Finally, there is no leaf occlusion between plants in the early growth stage, but occlusion inevitably occurs in the late growth stage. In this case, accurate acquisition of plant canopy coverage may also need to consider the spacing between plants, which is easy to control in scenarios such as facility greenhouses and plant factories.
5 Conclusion
Canopy coverage is one of the core indicators for measuring lettuce growth, and the automated acquisition of this indicator is crucial for dynamically monitoring lettuce growth status. However, individual growth differences among group plants at different growth stages make it difficult to accurately evaluate the growth status of group lettuce from the canopy coverage of individual plants. Focusing on greenhouse hydroponic lettuce, this study improves the model from two perspectives: improving model precision and reducing model size. Firstly, an attention mechanism, CAS, is proposed to improve PSPNet; it considers channel, spatial, and axial features simultaneously, addressing issues such as the high similarity between some plant stems and the background, as well as differences in edge growth across lettuce varieties. The resulting MIoU reaches 0.9832. Meanwhile, to meet the deployment requirements of practical planting scenarios, this study improved PSPNet by replacing its backbone with a lightweight network, establishing a lightweight semantic segmentation model, MobileNetv3-PSPNet, with an MIoU of 0.9717. In addition, a group lettuce canopy coverage acquisition system was constructed based on PyQt5 to facilitate the automatic analysis of dynamic growth changes for further exploration. When loaded into the system, the pre-trained model enables automatic calculation of the canopy coverage of group lettuce. This study provides a feasible, automated, and low-cost method and platform for real-time monitoring of crop growth in practical planting scenarios such as facility greenhouses and plant factories, and it can be extended to other crops for growth management in the future. At present, the proposed method is only suitable for small-scale, close-range hydroponic lettuce growth monitoring; further data acquisition and model fine-tuning will be required to extend it to other crops and scenes. It is therefore suggested to further expand the model's scope of application across varieties and planting scenarios, and to obtain more crop growth evaluation parameters with this method to enrich its functionality and improve its performance.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Author contributions
HL: Writing – original draft, Writing – review & editing. PZ: Writing – review & editing. JZ: Writing – review & editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the project “Creation and application of green, efficient and intelligent aquaculture factory” [Grant No. 2022YFD2001700].
Acknowledgments
The authors would like to thank the editor and reviewers for their valuable input, time, and suggestions to improve the quality of the manuscript.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Adams, J., Qiu, Y., Xu, Y., and Schnable, J. C. (2020). Plant segmentation by supervised machine learning methods. Plant Phenome J. 3, 1–11. doi: 10.1002/ppj2.20001
Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495. doi: 10.1109/TPAMI.2016.2644615
Bhagat, S., Kokare, M., Haswani, V., Hambarde, P., and Kamble, R. (2022). Eff-UNet++: A novel architecture for plant leaf segmentation and counting. Ecol. Inform. 68, 101583. doi: 10.1016/j.ecoinf.2022.101583
Cao, S., Xu, B., Zhou, W., Zhou, L., Zhang, J., Zheng, Y., et al. (2025). The blessing of Depth Anything: An almost unsupervised approach to crop segmentation with depth-informed pseudo labeling. Plant Phenomics 7, 100005. doi: 10.1016/j.plaphe.2025.100005
Chang, C. L., Chung, S. C., Fu, W. L., and Huang, C. C. (2021). Artificial intelligence approaches to predict growth, harvest day, and quality of lettuce (Lactuca sativa L.) in a IoT-enabled greenhouse system. Biosyst. Eng. 212, 77–105. doi: 10.1016/j.biosystemseng.2021.09.015
Chaurasia, A. and Culurciello, E. (2018). LinkNet: Exploiting encoder representations for efficient semantic segmentation. 2017 IEEE Vis. Commun. Image Process. VCIP 2017 2018, 1–4. doi: 10.1109/VCIP.2017.8305148
Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 11211 LNCS, 833–851. doi: 10.1007/978-3-030-01234-2_49
Das, S., Kar, S., and Sekh, A. A. (2021). “FGrade: A large volume dataset for grading tomato freshness quality,” in Computer vision and image processing. CVIP 2020. Communications in computer and information science. Eds. Singh, S. K., Roy, P., Raman, B., and Nagabhushan, P. (Springer Singapore, Singapore), 455–466.
Das, S., Mondal, P., Quraishi, M. I., Kar, S., and Sekh, A. A. (2022). “Freshness quality detection of Tomatoes Using Computer Vision,” in Artificial Intelligence. ISAI 2022. Communications in Computer and Information. Eds. Turki, T., Ghosh, T. K., Joardar, S., and Barman, S. (Springer Nature Switzerland, Cham), 243–255. doi: 10.1007/978-3-031-22485-0_22
Deery, D. M., Smith, D. J., Davy, R., Jimenez-Berni, J. A., Rebetzke, G. J., and James, R. A. (2021). Impact of varying light and dew on ground cover estimates from active NDVI, RGB, and LiDAR. Plant Phenomics 2021, 9842178. doi: 10.34133/2021/9842178
Du, R., Ma, Z., Xie, P., He, Y., and Cen, H. (2023). PST: Plant segmentation transformer for 3D point clouds of rapeseed plants at the podding stage. ISPRS J. Photogramm. Remote Sens. 195, 380–392. doi: 10.1016/j.isprsjprs.2022.11.022
Fan, Y., Chen, Y., Chen, X., Zhang, H., Liu, C., and Duan, Q. (2021). Estimating the aquatic-plant area on a pond surface using a hue-saturation-component combination and an improved Otsu method. Comput. Electron. Agric. 188, 106372. doi: 10.1016/j.compag.2021.106372
G., J. B. and Gopi, E. S. (2021). An hierarchical approach for automatic segmentation of leaf images with similar background using kernel smoothing based Gaussian process regression. Ecol. Inform. 63, 101323. doi: 10.1016/j.ecoinf.2021.101323
Gangwar, A., Rani, G., and Singh Dhaka, V. (2025). Tomato TransDeepLab: A Robust Framework for Tomato Leaf Segmentation, Disease Severity prediction, and crop loss estimation. IEEE Access 13, 170147–170160. doi: 10.1109/ACCESS.2025.3611307
Gou, F., van Ittersum, M. K., Couëdel, A., Zhang, Y., Wang, Y., van der Putten, P. E. L., et al. (2018). Intercropping with wheat lowers nutrient uptake and biomass accumulation of maize, but increases photosynthetic rate of the ear leaf. AoB Plants 10, 1–15. doi: 10.1093/aobpla/ply010
Guerra Ibarra, J. P., Cuevas de la Rosa, F. J., and Hernandez Vidales, J. R. (2025). Evaluation of the effectiveness of the UNet model with different backbones in the semantic segmentation of tomato leaves and fruits. Horticulturae 11, 1–19. doi: 10.3390/horticulturae11050514
Guo, M. H., Lu, C. Z., Hou, Q., Liu, Z. N., Cheng, M. M., and Hu, S. M. (2022). SegNeXt: rethinking convolutional attention design for semantic segmentation. Adv. Neural Inf. Process. Syst. 35, 1–15. doi: 10.48550/arXiv.2209.08575
Guo, R., Qu, L., Niu, D., Li, Z., and Yue, J. (2021). LeafMask: towards greater accuracy on leaf segmentation. Proc. IEEE Int. Conf. Comput. Vis. 2021, 1249–1258. doi: 10.1109/ICCVW54120.2021.00145
Ho, J., Kalchbrenner, N., Weissenborn, D., and Salimans, T. (2019). Axial attention in multidimensional transformers. Comput. Vis. Pattern Recognit. 1–11. doi: 10.48550/arXiv.1912.12180
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., et al. (2017). MobileNets: efficient convolutional neural networks for mobile vision applications. Proc. IEEE Comput. Soc Conf. Comput. Vis. Pattern Recognit. doi: 10.48550/arXiv.1704.04861
Jiang, Y., Li, C., Paterson, A. H., Sun, S., Xu, R., and Robertson, J. (2018). Quantitative analysis of cotton canopy size in field conditions using a consumer-grade RGB-D camera. Front. Plant Sci. 8. doi: 10.3389/fpls.2017.02233
Kamilaris, A. and Prenafeta-Boldú, F. X. (2018). Deep learning in agriculture: A survey. Comput. Electron. Agric. 147, 70–90. doi: 10.1016/j.compag.2018.02.016
Kang, C., Mu, X., Novaski Seffrin, A., Di Gioia, F., and He, L. (2025). A recursive segmentation model for bok choy growth monitoring with Internet of Things (IoT) technology in controlled environment agriculture. Comput. Electron. Agric. 230, 109866. doi: 10.1016/j.compag.2024.109866
Keller, K. (2018). Soybean leaf coverage estimation with machine learning and thresholding algorithms for field phenotyping. Available online at: https://www.plant-phenotyping.org/lw_resource/datapool/systemfiles/elements/files/6b71e293-949c-11e8-8a88-dead53a91d31/current/document/0032.pdf.
Kong, X., Liu, T., Chen, X., Jin, X., Li, A., and Yu, J. (2024). Efficient crop segmentation net and novel weed detection method. Eur. J. Agron. 161, 127367. doi: 10.1016/j.eja.2024.127367
Kubota, C., Meng, C., Masoud, S., Son, Y., and Tronstad, R. (2019). Advanced technologies for large- scale plant factories — Integration of industrial and systems engineering crop production. Plant Factory Using Artificial Light. 353–362. doi: 10.1016/B978-0-12-813973-8.00033-6
Lee, U., Chang, S., Putra, G. A., Kim, H., and Kim, D. H. (2018). An automated, high-throughput plant phenotyping system using machine learning-based plant segmentation and image analysis. PloS One 13, 1–17. doi: 10.1371/journal.pone.0196615
Li, J., Pu, F., Chen, H., Xu, X., and Yu, Y. (2024). Crop segmentation of unmanned aerial vehicle imagery using edge enhancement network. IEEE Geosci. Remote Sens. Lett. 21, 1–5. doi: 10.1109/LGRS.2024.3358983
Li, D., Shi, G., Li, J., Chen, Y., Zhang, S., Xiang, S., et al. (2022). PlantNet: A dual-function point cloud segmentation network for multiple plant species. ISPRS J. Photogramm. Remote Sens. 184, 243–263. doi: 10.1016/j.isprsjprs.2022.01.007
Li, Z., Xiang, L., Sun, J., Liao, D., Xu, L., and Wang, M. (2025). A multi-level knowledge distillation for enhanced crop segmentation in precision agriculture. Agric. 15, 1–25. doi: 10.3390/agriculture15131418
Li, S., Yan, Z., Guo, Y., Su, X., Cao, Y., Jiang, B., et al. (2021). SPM-IS: An auto-algorithm to acquire a mature soybean phenotype based on instance segmentation. Crop J 10(5), 1412–1423. doi: 10.1016/j.cj.2021.05.014
Lin, G., Milan, A., Shen, C., and Reid, I. (2017). RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. Proc. - 30th IEEE Conf. Comput. Vis. Pattern Recognition CVPR 2017 2017, 5168–5177. doi: 10.1109/CVPR.2017.549
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., et al. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. IEEE Int. Conf. Comput. Vis., 9992–10002. doi: 10.1109/ICCV48922.2021.00986
Lu, Y., Young, S., Wang, H., and Wijewardane, N. (2022). Robust plant segmentation of color images based on image contrast optimization. Comput. Electron. Agric. 193, 106711. doi: 10.1016/j.compag.2022.106711
Ma, J., Du, K., Zheng, F., Zhang, L., Gong, Z., and Sun, Z. (2018). A recognition method for cucumber diseases using leaf symptom images based on deep convolutional neural network. Comput. Electron. Agric. 154, 18–24. doi: 10.1016/j.compag.2018.08.048
Makanza, R., Zaman-Allah, M., Cairns, J. E., Magorokosho, C., Tarekegne, A., Olsen, M., et al. (2018). High-throughput phenotyping of canopy cover and senescence in maize field trials using aerial digital canopy imaging. Remote Sens. 10, 330. doi: 10.3390/rs10020330
Marani, R., Milella, A., Petitti, A., and Reina, G. (2021). Deep neural networks for grape bunch segmentation in natural images from a consumer-grade camera. Precis. Agric. 22, 387–413. doi: 10.1007/s11119-020-09736-0
Masuda, T. (2021). Leaf area estimation by semantic segmentation of point cloud of tomato plants. Proc. IEEE Int. Conf. Comput. Vis. Workshops (ICCVW), 1381–1389. doi: 10.1109/ICCVW54120.2021.00159
Mohammadi, V., Minaei, S., Mahdavian, A. R., Khoshtaghaza, M. H., and Gouton, P. (2021). Estimation of leaf area in bell pepper plant using image processing techniques and artificial neural networks. Proc. 2021 IEEE Int. Conf. Signal Image Process. Appl. (ICSIPA), 173–178. doi: 10.1109/icsipa52582.2021.9576778
Mu, Y., Wang, H., Hu, J., Sun, Y., Zhu, C., Zhu, H., et al. (2026). PCSNet: A lightweight semantic segmentation model for low-altitude remote sensing mixed crop segmentation for rapid acquisition of planting information. Comput. Electron. Agric. 241, 111297. doi: 10.1016/j.compag.2025.111297
Nekrasov, V., Shen, C., and Reid, I. (2019). Light-weight RefineNet for real-time semantic segmentation. Proc. Br. Mach. Vis. Conf. (BMVC 2018), 1–19. doi: 10.48550/arXiv.1810.03272
Nikbakhsh, N., Baleghi, Y., and Agahi, H. (2021). A novel approach for unsupervised image segmentation fusion of plant leaves based on G-mutual information. Mach. Vis. Appl. 32. doi: 10.1007/s00138-020-01130-0
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: convolutional networks for biomedical image segmentation. arXiv, abs/1505.04597. doi: 10.48550/arXiv.1505.04597
Pan, H., Hong, Y., Sun, W., and Jia, Y. (2023). Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes. IEEE Trans. Intell. Transp. Syst. 24, 3448–3460. doi: 10.1109/TITS.2022.3228042
Paszke, A., Chaurasia, A., Kim, S., and Culurciello, E. (2016). ENet: A deep neural network architecture for real-time semantic segmentation. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 1–10. doi: 10.48550/arXiv.1606.02147
Peng, C., Zhang, X., Yu, G., Luo, G., and Sun, J. (2017). Large kernel matters — Improve semantic segmentation by global convolutional network. Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR 2017), 1743–1751. doi: 10.1109/CVPR.2017.189
Picon, A., Eguskiza, I., Galan, P., Gomez-Zamanillo, L., Romero, J., Klukas, C., et al. (2025). Crop-conditional semantic segmentation for efficient agricultural disease assessment. Artif. Intell. Agric. 15, 79–87. doi: 10.1016/j.aiia.2025.01.002
Quan, P., Lou, Y., Lin, H., Liang, Z., and Di, S. (2021a). Research on fast identification and location of contour features of electric vehicle charging port in complex scenes. IEEE Access PP, 1. doi: 10.1109/ACCESS.2021.3092210
Quan, T. M., Hildebrand, D. G. C., and Jeong, W. K. (2021b). FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics. Front. Comput. Sci. 3, 613981. doi: 10.3389/fcomp.2021.613981
Rahim, U. F., Utsumi, T., and Mineno, H. (2021). Comparison of grape flower counting using patch-based instance segmentation and density-based estimation with convolutional neural networks. Int. Symp. Artif. Intell. Robot. 2021, 72. doi: 10.1117/12.2605670
Rangarajan, A. K. and Purushothaman, R. (2020). A vision based crop monitoring system using segmentation techniques. Adv. Electr. Comput. Eng. 20, 89–100. doi: 10.4316/AECE.2020.02011
Sadashivan, S., Bhattacherjee, S. S., Priyanka, G., Pachamuthu, R., and Kholova, J. (2021). Fully automated region of interest segmentation pipeline for UAV based RGB images. Biosyst. Eng. 211, 192–204. doi: 10.1016/j.biosystemseng.2021.08.032
Shelhamer, E., Long, J., and Darrell, T. (2017). Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 640–651. doi: 10.1109/TPAMI.2016.2572683
Shu, M., Li, Q., Ghafoor, A., Zhu, J., Li, B., and Ma, Y. (2023). Using the plant height and canopy coverage to estimate maize aboveground biomass with UAV digital images. Eur. J. Agron. 151, 126957. doi: 10.1016/j.eja.2023.126957
Singh, A. K., Ganapathysubramanian, B., Sarkar, S., and Singh, A. (2018). Deep learning for plant stress phenotyping: trends and future perspectives. Trends Plant Sci. 23, 883–898. doi: 10.1016/j.tplants.2018.07.004
Tian, Z., Ma, W., Yang, Q., and Duan, F. (2021). Application status and challenges of machine vision in plant factory — A review. Inf. Process. Agric. 9(2), 195–211. doi: 10.1016/j.inpa.2021.06.003
Trivedi, M. and Gupta, A. (2021). Automatic monitoring of the growth of plants using deep learning-based leaf segmentation. Int. J. Appl. Sci. Eng. 18, 1–9. doi: 10.6703/IJASE.202106_18(2).003
Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., and Hu, Q. (2020). ECA-Net: Efficient channel attention for deep convolutional neural networks. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 11531–11539. doi: 10.1109/CVPR42600.2020.01155
Woo, S., Park, J., Lee, J. Y., and Kweon, I. S. (2018). CBAM: Convolutional block attention module. Eur. Conf. Comput. Vis. 11211, 3–19. doi: 10.1007/978-3-030-01234-2_1
Xu, X., Li, H., Yin, F., Xi, L., Qiao, H., Ma, Z., et al. (2020). Wheat ear counting using K-means clustering segmentation and convolutional neural network. Plant Methods 16, 1–13. doi: 10.1186/s13007-020-00648-8
Yang, K., Zhong, W., and Li, F. (2020). Leaf segmentation and classification with a complicated background using deep learning. Agronomy 10(11), 1721. doi: 10.3390/agronomy10111721
Yoosefzadeh-Najafabadi, M., Tulpan, D., and Eskandari, M. (2021). Application of machine learning and genetic optimization algorithms for modeling and optimizing soybean yield using its component traits. PloS One 16, 1–18. doi: 10.1371/journal.pone.0250665
Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N. (2018). BiSeNet: Bilateral segmentation network for real-time semantic segmentation. Lect. Notes Comput. Sci. 11217, 334–349. doi: 10.1007/978-3-030-01261-8_20
Yu, W., Zhou, P., Yan, S., and Wang, X. (2023). InceptionNeXt: when Inception meets ConvNeXt. arXiv, abs/2303.16900. doi: 10.48550/arXiv.2303.16900
Zabawa, L., Kicherer, A., Klingbeil, L., Töpfer, R., Kuhlmann, H., and Roscher, R. (2020). Counting of grapevine berries in images via semantic segmentation using convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 164, 73–83. doi: 10.1016/j.isprsjprs.2020.04.002
Zenkl, R., Timofte, R., Kirchgessner, N., Roth, L., Hund, A., Van Gool, L., et al. (2022). Outdoor plant segmentation with deep learning for high-throughput field phenotyping on a diverse wheat dataset. Front. Plant Sci. 12, 774068. doi: 10.3389/fpls.2021.774068
Zhang, X.-D. (2020). “Machine learning,” in Matrix Algebra Approach to Artificial intelligence (Springer Singapore, Singapore), 223–440. doi: 10.1007/978-981-15-2770-8_6
Zhang, C., Si, Y., Lamkey, J., Boydston, R. A., Garland-Campbell, K. A., and Sankaran, S. (2018a). High-throughput phenotyping of seed/seedling evaluation using digital image analysis. Agronomy 8, 1–14. doi: 10.3390/agronomy8050063
Zhang, Z., Zhang, X., Peng, C., Xue, X., and Sun, J. (2018b). ExFuse: Enhancing feature fusion for semantic segmentation. Lect. Notes Comput. Sci. 11214, 273–288. doi: 10.1007/978-3-030-01249-6_17
Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyramid scene parsing network. Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR 2017), 6230–6239. doi: 10.1109/CVPR.2017.660
Zhao, Y., Li, T., Wen, W., Lu, X., Yang, S., Fan, J., et al. (2025). YOMASK: An instance segmentation method for high-throughput phenotypic platform lettuce images. Comput. Electron. Agric. 230, 109868. doi: 10.1016/j.compag.2024.109868
Zheng, Z., Yuan, J., Yao, W., Yao, H., Liu, Q., and Guo, L. (2024). Crop classification from drone imagery based on lightweight semantic segmentation methods. Remote Sens. 16(21), 4099. doi: 10.3390/rs16214099
Keywords: artificial intelligence, computer vision, growth monitoring, hydroponic crops, multiple growth stages, semantic segmentation
Citation: Liu H, Zhang P and Zheng J (2026) An intelligent method and platform for obtaining lettuce canopy coverage. Front. Plant Sci. 17:1749000. doi: 10.3389/fpls.2026.1749000
Received: 18 November 2025; Revised: 19 January 2026; Accepted: 27 January 2026;
Published: 12 February 2026.
Edited by: Zhenghong Yu, Guangdong Polytechnic of Science and Technology, China
Reviewed by: Yanan Li, Wuhan Institute of Technology, China; Zhang Weizheng, Zhengzhou University of Light Industry, China
Copyright © 2026 Liu, Zhang and Zheng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Jishu Zheng, imzhengjishu@163.com