Towards practical object detection for weed spraying in precision agriculture

Weeds pose a persistent threat to farmers’ yields, but conventional methods for controlling weed populations, like herbicide spraying, pose a risk to the surrounding ecosystems. Precision spraying aims to reduce harms to the surrounding environment by targeting only the weeds rather than spraying the entire field with herbicide. Such an approach requires weeds to first be detected. With the advent of convolutional neural networks, there has been significant research trialing such technologies on datasets of weeds and crops. However, the evaluation of the performance of these approaches has often been limited to the standard machine learning metrics. This paper aims to assess the feasibility of precision spraying via a comprehensive evaluation of weed detection and spraying accuracy using two separate datasets, different image resolutions, and several state-of-the-art object detection algorithms. A simplified model of precision spraying is proposed to compare the performance of different detection algorithms while varying the precision of the spray nozzles. The key performance indicators in precision spraying that this study focuses on are a high weed hit rate and a reduction in herbicide usage. This paper introduces two metrics, namely, weed coverage rate and area sprayed, to capture these aspects of the real-world performance of precision spraying and demonstrates their utility through experimental results. Using these metrics to calculate the spraying performance, it was found that 93% of weeds could be sprayed by spraying just 30% of the area using state-of-the-art vision methods to identify weeds.


I. INTRODUCTION
The current agricultural approach to weeding in arable crops is to spray an entire field with a selective herbicide that kills the weeds but does not harm the crops. Such a broadcast spraying approach is easy to deliver, requiring only a sprayer to dispense the herbicide, but wasteful, since much of the area sprayed does not contain weeds, being either bare earth or crops (see Figure 3). Precision agriculture aims to use ideas from artificial intelligence (AI) and robotics to create agricultural solutions that can be delivered at the level of individual plants, instead of entire fields. An important application within precision agriculture is automated weeding, which aims to detect and target individual weeds, either through the precise delivery of herbicide [1] onto the weeds while avoiding wastage, or by using a laser [2] or a mechanical tool [1]. In the work presented here, we are concerned with precise delivery of herbicide in a real-world farm setting. Any approach to automated weeding needs to identify the weeds, and there have been numerous attempts to use computer vision to do this (see Section II), many using AI methods based on machine learning (ML). However, most of this work has treated automated weeding as purely a computer vision problem: datasets are collected and annotated, object identifiers are trained, and the resulting models are optimised with respect to accuracy and/or mAP for classifying crops and weeds. We argue that while these metrics are important, there are additional measures that must be considered in assessing the feasibility of ML models for use in precision spraying. Here we focus on three of these.
First, the weed coverage rate (WCR) is more important than conventional ML-based metrics. WCR identifies the proportion of the weeds that could be targeted by a sprayer triggered by the model. WCR depends on the accuracy of the model, but also takes into account the resolution of the spray. Second, we need to understand the area sprayed, that is, the area covered in herbicide by the precision sprayer, in order to understand the saving in herbicide compared with current practice. Third, it is important to know whether the approach is practical, because you cannot simply pile more GPUs on board a tractor operating in an open field, where issues like compute power and energy consumption come into play. Real-world automated weeding (see Figure 1) will need to process many images very rapidly, so here we use inference speed as a proxy for comparing the practicality of different ML approaches.
The primary contribution of this paper is to assess the feasibility of precision spraying by comparing a selection of standard ML-based vision methods applied to weed detection using the additional metrics we have introduced (WCR, area sprayed and inference speed). Section II briefly describes the state-of-the-art in ML-based vision approaches applied to agriculture. Section III explains our methodology and Section IV details the experiments we conducted which form the basis of our comparison. Section V analyses these results and Section VI closes with conclusions.

II. RELATED WORK
Initial approaches to weed detection used machine learning algorithms with handcrafted features based on differences in colour, shape or texture. [3] extracted local binary patterns and used support vector machines for plant discrimination. This kind of method generally requires only a relatively small dataset for model development. However, it might fail to generalise under different field conditions. Deep learning-based methods for computer vision are becoming increasingly popular, offering an end-to-end weed detection solution that deals with the issue of generalisation. An adjusted YOLOv3 model was developed for bindweed detection in sugar beet fields [4]. This work provides good detection accuracy (mAP = 0.829), but it was based on the assumption of only one weed species in crop fields. [5] trained a CenterNet model only for crop detection and treated all remaining vegetation objects as weeds. In this case, the model focuses only on crop detection, without considering the specific weed species in fields. In addition, [6] developed a fully convolutional network integrating sequential information for robust weed detection in fields. The results show better generalisability to unseen fields compared to the other approaches in [7], [8]. In order to remedy the scarcity of labelled samples for training current deep supervised networks, [9] proposed a pipeline to generate annotated synthetic weed and crop data. Similar work based on the cut-and-paste approach for synthetic image generation can be seen in [4] for weed detection. Additionally, generative adversarial networks (GANs) were exploited in weed identification with transfer learning [10].
Fig. 1: A diagrammatic explanation of a small precision sprayer. (a) A boom is mounted on the back of a vehicle, carrying 5 cameras and a set of spray nozzles. Images are captured in real-time, as the vehicle moves. The system processes images, identifies weeds and their location within images, and targets spraying to hit the weeds. Sequences of images captured may provide overlapping views of the field, or there may be gaps between images. (b) A top-down view of the sprayer. The direction of travel is indicated in the drawing, from the bottom to the top of the illustration. Each camera captures its own image stream over time. Each row of images represents the set captured by the cameras at the same point in time; each column represents the passage of time as the tractor drives. The images here do not overlap.
Conventional CNNs require millions of floating point parameters and consume a lot of compute resources. Therefore, there have been many attempts to optimise networks for fast inference. An approach that has been extremely successful is quantization. This aims to retain the performance of CNNs while decreasing the precision of weights and/or activations. Binarisation, a 1-bit quantization first used in [11], is the most extreme approach. BinaryNet [12] and XNOR-Nets [13] binarize both the weights and activations to achieve a further speed increase and a further reduction in memory usage. Bi-Real Net [14] aimed to improve the accuracy of these initial approaches by retaining the real-valued outputs of 1-bit convolution using a shortcut. BiDet [15] uses a binary network implementation based on Bi-Real Net for object detection using both SSD [16] and Faster R-CNN [17]. To our knowledge, only one other study has investigated the use of binary neural networks for identifying weeds. In [18], nine varieties of weed in the DeepWeeds dataset were classified using a binary neural network. The authors found that most of the accuracy could be retained after downsampling the images to just 32 × 32 × 3, so that, running on the Intel DE1-SoC FPGA, an impressive inference time of 1.059 ms could be achieved using just 6.04 watts of power. However, in this study we focus on object detection instead of image classification and only look at how these networks perform on GPUs.
Once weeds have been detected, a robotic weed control system can act on the detections for precision weed management. Spot spraying or selective spraying, the application case in this paper, refers to switching individual nozzles ON/OFF to deliver chemicals only to weeds. Hussain et al. [19], [20] described a smart sprayer based on a deep learning detection model and evaluated the entire system under different weather conditions. Other than spraying, robotic mechanical precision weeding [1] is an alternative that is suitable for weed management in organic farming, but it typically sacrifices operational efficiency compared with the spraying approach. Our study estimates the most suitable deep learning-based object detector by exploiting multiple datasets collected at different locations. This detector is determined with an emphasis on the balance between detection accuracy and inference speed, for the development of robotic spraying in field environments.

III. METHODOLOGY
The overall scenario we are considering is shown in Figure 1. A boom, mounted on the back of a farm vehicle, carries both cameras and nozzles for dispensing herbicide. To be practical, images captured by the cameras must be processed, and weeds detected, in time for the weeds to be sprayed as the nozzles pass over them. Here we describe how we evaluate the effectiveness of different approaches to this detection problem.
A conventional metric to evaluate the performance of object detectors is mean Average Precision (mAP). To calculate mAP, first, a threshold is defined for the IoU value and this is used to distinguish between true positive (TP), false positive (FP) and false negative (FN) detections. The IoU value, or Intersection over Union, measures the accuracy of the predicted bounding box circumscribing the object identified and is equal to |A ∩ B|/|A ∪ B|, where A is the ground truth (labelled) bounding box, B is the bounding box predicted by the model, and |·| denotes area.
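As a minimal sketch of the IoU computation described above, the following Python function assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) tuples in pixel coordinates. The box format and the function name are illustrative assumptions, not necessarily those of the implementation used in our experiments.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this format is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0
```

For example, two 2 × 2 boxes offset by one pixel in each direction overlap in a 1 × 1 region and have a union of 7, giving an IoU of 1/7, which is well below a threshold of 0.5.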
Precision (P) is the proportion of correctly identified objects over all objects identified (correct or incorrect), TP/(TP + FP). Recall (R) is the proportion of correctly identified objects over all ground truth objects (identified or missed), TP/(TP + FN). These metrics are used to compute average precision (AP): $AP = \sum_n (R_n - R_{n-1}) P_n$, where n is the IoU threshold rank and $P_n$ and $R_n$ are the precision and recall at that rank. Because AP is dependent on the IoU threshold value, mAP takes into account different threshold values and corresponding variations in the relationship between precision and recall. Typically, n = 10 threshold values are chosen, where the thresholds range from 0.5 to 0.95 in steps of 0.05, i.e. {0.5, 0.55, 0.6, ..., 0.95}.
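The way the IoU threshold turns detections into TP, FP and FN counts, and hence into precision and recall, can be sketched as follows. This is an illustration only: the greedy one-to-one matching, the ordering of predictions by confidence, and the names `count_detections` and `precision_recall` are assumptions, and standard evaluation toolkits may differ in detail.

```python
def count_detections(iou_rows, n_ground_truth, iou_threshold):
    """Count TP, FP and FN at one IoU threshold.

    iou_rows[p][g] is the IoU between prediction p and ground-truth box g,
    with predictions ordered by descending confidence (an assumption).
    Each ground-truth box can be matched by at most one prediction.
    """
    matched = set()
    tp = 0
    for row in iou_rows:
        candidates = [
            (value, g)
            for g, value in enumerate(row)
            if g not in matched and value >= iou_threshold
        ]
        if candidates:
            _, best_g = max(candidates)  # best remaining ground-truth box
            matched.add(best_g)
            tp += 1
    fp = len(iou_rows) - tp   # predictions with no matching object
    fn = n_ground_truth - tp  # objects with no matching prediction
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Precision TP/(TP+FP) and recall TP/(TP+FN), guarding empty cases."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# The ten IoU thresholds 0.5, 0.55, ..., 0.95 mentioned in the text.
IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]

# Toy example: 3 predictions against 2 ground-truth objects.
iou_rows = [[0.9, 0.0], [0.6, 0.1], [0.0, 0.55]]
for t in (0.5, 0.75):
    print(t, precision_recall(*count_detections(iou_rows, 2, t)))
```

In the toy example, raising the threshold from 0.5 to 0.75 turns the third prediction from a true positive into a false positive, so both precision and recall drop. This is why a single threshold gives an incomplete picture.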
While mAP is a good measure of the performance of an object detector, it focuses on identifying the object very precisely. In practice, for a sprayer, the level of precision for spraying is limited by the spray nozzle, and, as we will see, it is possible for an object detector that has a relatively low mAP to still be good enough to be effective in ensuring that weeds are covered with herbicide.
To get a better idea of how the precision of the detector impacts the precision of spraying, we have devised a new metric which we call weed coverage rate (WCR). This estimates how many of the weeds in the test data would be sprayed, rather than how many are detected. We start by deciding how many sprayer nozzles n will operate on the area within an image. In practice it is possible to spray at a resolution of around 10cm which (see below) means that we could have 3 or 4 nozzles spraying independent sections of the ground covered by one image. Each image i is then split, along its length, into n stripes. If a weed is detected in a given stripe (that is, the bounding box intersects with the stripe), an area that is the height of the stripe, and stretches to either side of the bounding box, is added to the spray area S_i. This is illustrated in Figure 2.
The spray area is larger than just the bounding boxes because the size is partly determined by the width of the spray (compare Figure 2a and Figure 2c). WCR differs from mAP because that additional area, while increasing the herbicide used, will sometimes "hit" weeds that have evaded detection. WCR is defined as follows. A weed w in image i is counted as having been sprayed if it is wholly contained in the spray area: $\mathrm{sprayed}(w) = 1$ if $w \subseteq S_i$, and $0$ otherwise. Then WCR is the fraction of the weeds that are counted as having been sprayed, expressed as a percentage: $WCR = 100 \times \frac{\sum_{w \in W} \mathrm{sprayed}(w)}{|W|}$, where $W$ is the set of all weeds in the test data. In addition, we compare the area that has been sprayed with the total area of all the images: $AreaSprayed = 100 \times \frac{\sum_i |S_i|}{\sum_i |I_i|}$, where $|S_i|$ is the area of the spray area and $|I_i|$ the area of image i. Since area sprayed is a proportion of the total area that would be covered by broadcast spraying, the volume of herbicide saved is proportional to 100 − AreaSprayed.
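The construction of the spray area and the two metrics can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not our evaluation code: boxes are (x_min, y_min, x_max, y_max) tuples, the stripes divide the image in the y direction (as in Figure 2), the `margin` by which the spray extends to either side of a box is a hypothetical parameter, and all function and key names are illustrative. Both metrics are returned as percentages.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (lo, hi) intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged


def stripe_bounds(height, n_nozzles):
    """(top, bottom) y-limits of each of the n stripes."""
    step = height / n_nozzles
    return [(s * step, (s + 1) * step) for s in range(n_nozzles)]


def spray_area(detections, width, height, n_nozzles, margin=0.0):
    """Per-stripe sprayed x-intervals triggered by detected weed boxes."""
    per_stripe = []
    for top, bottom in stripe_bounds(height, n_nozzles):
        hits = [
            (max(0.0, x1 - margin), min(float(width), x2 + margin))
            for x1, y1, x2, y2 in detections
            if y1 < bottom and y2 > top  # box intersects this stripe
        ]
        per_stripe.append(merge_intervals(hits))
    return per_stripe


def is_sprayed(weed, sprayed, height):
    """True if the weed box is wholly contained in the spray area."""
    x1, y1, x2, y2 = weed
    bounds = stripe_bounds(height, len(sprayed))
    for (top, bottom), intervals in zip(bounds, sprayed):
        if y1 < bottom and y2 > top:
            # The part of the weed in this stripe must lie in one interval.
            if not any(lo <= x1 and x2 <= hi for lo, hi in intervals):
                return False
    return True


def wcr_and_area_sprayed(images, n_nozzles, margin=0.0):
    """WCR and area sprayed (both in percent) over a list of images."""
    hit = total = 0
    sprayed_area = total_area = 0.0
    for img in images:
        w, h = img["width"], img["height"]
        sprayed = spray_area(img["detections"], w, h, n_nozzles, margin)
        stripe_h = h / n_nozzles
        sprayed_area += sum(
            (hi - lo) * stripe_h for iv in sprayed for lo, hi in iv
        )
        total_area += w * h
        total += len(img["weeds"])
        hit += sum(is_sprayed(weed, sprayed, h) for weed in img["weeds"])
    wcr = 100.0 * hit / total if total else 0.0
    area = 100.0 * sprayed_area / total_area if total_area else 0.0
    return wcr, area


image = {
    "width": 100, "height": 40,
    "weeds": [(10, 5, 20, 15), (50, 25, 60, 35), (12, 22, 18, 30)],
    "detections": [(10, 5, 20, 15), (50, 25, 60, 35)],  # third weed missed
}
for n in (1, 2):
    print(n, wcr_and_area_sprayed([image], n))
```

In this toy image, with one nozzle the undetected third weed is still covered, because it lies within the x-range sprayed for the first weed. With two nozzles it is missed, while the area sprayed halves. This is the trade-off that the two metrics are intended to expose.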
In order to establish the requirements in terms of frame rate, we need to make some assumptions about the design of a sprayer. The boom on a typical sprayer is 24m, and the recommended height to operate the boom above the crop canopy is 50cm [21]. We established empirically that, at this height, a typical camera with a 1.8:1 aspect ratio (see below) can cover 550mm by 305mm. The maximum speed of a sprayer is widely taken to be around 15mph, or 6.7m/s (see, for example, [22]), to prevent spray drift. With the long edge of the image aligned along the spray boom, we need 44 cameras, and each camera can cover 305mm in the direction of travel. At 6.7m/s, we will need to process 22 frames per second per camera (assuming no overlap), and across all 44 cameras the required frame rate will be 968 frames per second. With the short edge of the image aligned along the boom, we would need 79 cameras and process 13 frames per second per camera, giving a required frame rate of 1027 frames per second.
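The arithmetic behind these frame-rate requirements can be reproduced with a short Python sketch. The function name and the use of millimetre units are illustrative choices. Rounding both the camera count and the per-camera frame rate up to whole numbers is an assumption that matches the figures quoted above.

```python
import math


def required_frame_rate(boom_mm, along_boom_mm, along_travel_mm, speed_mm_s):
    """Cameras needed, frames per second per camera, and total frame rate.

    Assumes non-overlapping images and rounds both counts up.
    """
    cameras = math.ceil(boom_mm / along_boom_mm)
    fps_per_camera = math.ceil(speed_mm_s / along_travel_mm)
    return cameras, fps_per_camera, cameras * fps_per_camera


BOOM_MM = 24_000     # 24m boom
SPEED_MM_S = 6_700   # 6.7m/s, roughly 15mph

# Long edge (550mm) along the boom, short edge (305mm) along travel.
print(required_frame_rate(BOOM_MM, 550, 305, SPEED_MM_S))
# Short edge (305mm) along the boom, long edge (550mm) along travel.
print(required_frame_rate(BOOM_MM, 305, 550, SPEED_MM_S))
```

The two calls give 44 cameras at 22 frames per second (968 in total) and 79 cameras at 13 frames per second (1027 in total), respectively.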

IV. EXPERIMENTS
A. Datasets
We used two weed/crop datasets: the Lincoln beet (LB) dataset, which we collected and annotated as part of this work, and the Belgium beet (BB) dataset from [4]. Both datasets contain pictures from fields in which sugar beet was grown commercially, and the images show beet and harmful weeds annotated with their respective bounding box locations. The BB dataset contains 2506 images of 1800 × 1200 pixels. The LB dataset consists of 4405 images of 1920 × 1080 pixels. The LB images were extracted from videos recorded at different points in time in three sugar beet fields. The data collection dates range from May to June 2021, and each field was scanned on at least four different dates, a week apart, to record weeds at different stages of growth. For all the scanning sessions, the distance from the camera to the ground was approximately 50cm. Two cameras were used: one with 12 megapixels, a 26mm focal length and an f/1.6 aperture, and one with 64 megapixels, a 29mm focal length and an f/2.0 aperture. The original size of pictures from both cameras was 2160 × 3840 pixels. The fields used for the LB dataset are in Lincoln, UK, at different locations with differing conditions such as the type of soil, distribution of the plants, and weed varieties. Fig. 3 shows examples of the BB dataset and the three fields used to create the LB dataset. The two datasets differ in item distribution and item visibility. The BB dataset has a lower number of items per picture than the LB dataset. In terms of visibility, the items in the LB dataset are proportionally smaller than the items in the BB dataset, and the items in the BB dataset have higher levels of inter-item occlusion. Table I shows the visibility and distribution characteristics of both datasets, and the characteristics of each of the items in the dataset.

B. Data preparation
For the experiments, each dataset was randomly split into training, test and validation sets with 70%, 20%, and 10% of the dataset images, respectively. To evaluate the approaches under different image resolutions, the images and the annotations were resized to three different sizes that maintain the width/height ratio: 960 × 540 pixels, 640 × 360 pixels, and 320 × 180 pixels.
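A minimal sketch of this preparation step is given below. The random seed, the integer rounding of the split sizes, the box format (x_min, y_min, x_max, y_max) and the function names are all assumptions made for illustration, since these details are not specified above.

```python
import random


def split_dataset(items, seed=0):
    """Shuffle and split into 70% training, 20% test and 10% validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * 70 // 100
    n_test = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # the remaining ~10%
    return train, test, validation


def resize_box(box, old_size, new_size):
    """Scale a bounding box annotation along with its image."""
    x1, y1, x2, y2 = box
    old_w, old_h = old_size
    new_w, new_h = new_size
    return (x1 * new_w / old_w, y1 * new_h / old_h,
            x2 * new_w / old_w, y2 * new_h / old_h)


# The three target resolutions used in the experiments.
TARGET_SIZES = [(960, 540), (640, 360), (320, 180)]

train, test, validation = split_dataset(range(4405))  # LB dataset size
print(len(train), len(test), len(validation))
print([resize_box((192, 108, 960, 540), (1920, 1080), s) for s in TARGET_SIZES])
```

Scaling the annotations by the same factors as the image keeps each box aligned with the plant it labels at every resolution.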

C. Models and Parameters
For the identification of weeds, we implemented one-stage detectors, two-stage detectors and a binary one-stage detector. The one-stage models are Yolov5m [23], Yolov3 [24] and Yolov4 [25], where all three models use Darknet-53 [24] (DN-53) as a backbone. The two-stage detectors are based on Faster R-CNN [17] with three different backbones, one of which is a ResNet-50 backbone [26] with a Feature Pyramid Network (FPN). During training, for the one-stage and two-stage models, the optimiser was stochastic gradient descent (SGD), the learning rate was 1 × 10⁻², the momentum was 0.937, and the learning decay was 5 × 10⁻⁴. In both model varieties, the networks were pre-trained on the COCO dataset [29].
The binary detector uses BiDet [15] and is based on the single shot detector (SSD) [16] with a Bi-Real Net [14] backbone based on VGG-16 [30]. The optimiser used for the binary model was Adam [31]; the learning rate was 1 × 10⁻³, the momentum 0.9 and the learning decay 0, and the network was pre-trained on ImageNet [32].
Training and testing were run on an RTX 2080 Ti GPU with 11 GB of VRAM. For all detectors, the batch size used for training (that is, the number of images fed to the model simultaneously) was the maximum number of images that could fit in GPU memory along with the detector. The number of training epochs was 300 for the one-stage and two-stage detectors and 350 for the binary detector. For each model, the weights used for testing were those with the highest mAP on the validation set during the training process. For testing and inference, the non-maximum-suppression threshold was 0.45.

D. Results
We evaluated the trained models in several ways across both the BB and LB datasets and report both traditional metrics (mAP) and our additional metrics: inference speed in frames per second (FPS), weed coverage rate (WCR) and area sprayed. Analysis of the results and discussion are given in the next section (V). Table II provides a conventional evaluation, giving mAP and speed of inference with batch size = 1. The approaches with the best performance on these metrics are highlighted. Table III also provides inference speed, but exploits the ability of the models and the hardware they are run on to operate in parallel. Results are given for a range of batch sizes. Table IV reports weed coverage rate and area sprayed for one to four nozzles. Figure 4 shows how WCR and area sprayed vary with the number of sprayer nozzles across the set of detectors. While weed coverage decreases with the number of nozzles, WCR decreases more slowly than area sprayed. We also plotted the WCR and area sprayed against mAP (Figure 5). Note that while we have results for all image sizes, space limitations mean we only show results for 640 × 360 images.

V. DISCUSSION
Table II shows that YoloV5m has the highest mAP values for both sugar beet and weeds, on both the LB and BB datasets; a traditional ML evaluation would stop there. But recalling our practical motivation to devise a strategy for assessing the feasibility of ML methods for precision spraying applications, we need to examine the results in more depth. Table II shows that YoloV3 has the fastest inference speed. The trade-off here is to select YoloV5m and process 68.96 frames per second very accurately or select YoloV3 and process 75.18 frames per second, but less accurately. Next, we look at the trade-offs that come with different batch sizes. Table II only reports results where batch size = 1, whereas Table III compares results for higher batch sizes. If images from 44 cameras were processed in parallel (i.e. batch size = 44), a frame rate of 277 or 333 frames per second (BB or LB, respectively) could be achieved with YoloV5m on a single GPU. While this is below the 968 frames per second required to spray at our target pace of 15mph, a total of 3-4 GPUs would make in-the-field spraying with YoloV5m feasible.
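The estimate of 3-4 GPUs follows from dividing the required frame rate by the single-GPU throughput. The sketch below makes this explicit, under the simplifying assumption (ours, for illustration) that throughput scales linearly with the number of GPUs and that camera streams can be divided evenly between them.

```python
import math


def gpus_needed(required_fps, fps_per_gpu):
    """Smallest number of GPUs whose combined throughput meets the target."""
    return math.ceil(required_fps / fps_per_gpu)


REQUIRED_FPS = 968  # 44 cameras at 22 frames per second each

# YoloV5m throughput at batch size 44, as quoted in the text.
for dataset, fps in {"BB": 277, "LB": 333}.items():
    print(dataset, gpus_needed(REQUIRED_FPS, fps))
```

Under these assumptions, four GPUs are needed at the BB throughput and three at the LB throughput.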
Fig. 5: mAP plotted against WCR (left) and area sprayed (right) with 3 nozzles for BB (top row) and LB (bottom).
Further, we investigate the trade-offs when considering our WCR and area sprayed measures. Figure 4 and Table IV contrast results when varying the number of sprayer nozzles per image from 1 to 4. For both datasets, using 3 nozzles seems to produce a significant reduction in area sprayed while retaining a good weed coverage rate, which is what we want: high WCR combined with low area sprayed means that we are hitting large numbers of weeds while wasting less herbicide. While the highest WCR on the BB dataset is achieved by YoloV5m at 88.2% for 3 nozzles, Faster-RCNN (50-FPN) exhibits only a small reduction in WCR, at 85.78%, but sprays only 17.08% of the area compared to 33.03%. For the LB dataset with 3 nozzles, YoloV5m achieves a WCR of 98.9%, but 68.6% of the area is sprayed, whereas SSD achieves a WCR of 96.94% but only needs to spray 35.36% of the area. Figure 5 illustrates the relationships between the traditional mAP results (from Table II) and the new WCR and area sprayed metrics. While WCR and area sprayed are both positively correlated with mAP, these results suggest that the detector which exhibits the ideal balance between weed coverage and herbicide usage may not be the detector with the highest mAP.
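To make the trade-off concrete, the following sketch takes the 3-nozzle figures quoted in the paragraph above and computes, for each dataset, how many percentage points of WCR are given up and how many percentage points of area (and hence herbicide) are saved by choosing the alternative detector over YoloV5m. The dictionary layout and function name are illustrative assumptions.

```python
# (WCR %, area sprayed %) with 3 nozzles, as quoted in the text.
RESULTS = {
    "BB": {"YoloV5m": (88.2, 33.03), "Faster-RCNN (50-FPN)": (85.78, 17.08)},
    "LB": {"YoloV5m": (98.9, 68.6), "SSD": (96.94, 35.36)},
}


def trade_off(baseline, alternative):
    """Points of WCR lost and points of area saved by the alternative."""
    wcr_base, area_base = baseline
    wcr_alt, area_alt = alternative
    return round(wcr_base - wcr_alt, 2), round(area_base - area_alt, 2)


for dataset, detectors in RESULTS.items():
    baseline = detectors["YoloV5m"]
    for name, values in detectors.items():
        if name != "YoloV5m":
            print(dataset, name, trade_off(baseline, values))
```

On BB, a loss of 2.42 points of WCR buys a reduction of 15.95 points in area sprayed. On LB, a loss of 1.96 points buys a reduction of 33.24 points.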
One limitation of our results is that they were computed on non-contiguous images. This means the entire spray resolution needs to fit within one image. However, if our images were contiguous we could calculate the impact on WCR and area by allowing the spray to run over into the next image frame. Other limitations are that our model does not account for the time taken for the sprayer to turn on and off, and it assumes rectangular spray zones, whereas these are typically circular. Spray drift due to wind force and vehicle movement is also an influencing factor for precision targeting. Modelling these aspects will provide more accurate estimates of sprayer performance, and they are all elements that we will address in future work.

VI. CONCLUSION
This paper has demonstrated that selective spraying is feasible with state-of-the-art object detection. Our experiments show that the models considered are fast enough to detect and spray weeds on-the-fly, while our WCR metric helps to demonstrate, more clearly than conventional ML metrics, that the accuracy of our weed detection would be adequate to achieve a similar hit rate to broadcast spraying. At the same time, the area sprayed metric highlights options that produce clear reductions in area sprayed and hence herbicide required. As with many multi-criteria optimisation problems, there is no single clear winner. Our new metrics help highlight the advantages and drawbacks of different approaches, demonstrating that when it comes to practical deployment, it's not just about mAP.
In future, we will look to improve the frame rate achievable on a single GPU. Additionally, we will improve our WCR and area sprayed estimations by using contiguous images and taking into account more properties of the spray, like area shape and sprayer response times.

Fig. 2: WCR. The spray area (green), weeds (red) and beet (blue) for the same image and one to four nozzles. When a nozzle is triggered, it sprays an area that starts before and finishes after (in the x direction) the relevant weed bounding boxes and fills (in the y direction) the relevant fraction of the image for the number of nozzles. Note how the area sprayed decreases as the number of nozzles increases, and how the rightmost weed triggers two "sprays" for 4 nozzles.

Fig. 3: Examples of the different fields in our datasets. Clockwise from top left: BB dataset, LB location 1, LB location 2, and LB location 3.

Fig. 4: The number of nozzles plotted against WCR (left) and area sprayed (right) for BB (top row) and LB (bottom).

TABLE I: Characteristics of the BB and LB datasets at dataset level (top) and at item level (bottom).

TABLE II: Performance for object detectors with the BB (top) and LB (bottom) datasets. The fastest speeds (FPS) and highest mAP scores are highlighted.

TABLE III: Speed in frames per second (FPS) for object detectors with different batch sizes for the BB dataset (top) and LB (bottom).

TABLE IV: WCR and area sprayed for 1-4 nozzles, for BB (top) and LB (bottom). Highlighted values are explained in the text.