A Low-Cost and Unsupervised Image Recognition Methodology for Yield Estimation in a Vineyard

Yield prediction is a key factor in optimizing vineyard management and achieving the desired grape quality. Classical yield estimation methods, which consist of manual sampling within the field on a limited number of plants before harvest, are time-consuming and frequently insufficient to obtain representative yield data. Non-invasive machine vision methods are therefore being investigated to assess and implement a rapid grape yield estimation tool. This study aimed at an automated estimation of yield in terms of cluster number and size from high resolution RGB images (20 MP) taken with a low-cost UAV platform in zones representative of the vigor variability within an experimental vineyard. The flight campaigns were conducted in different light conditions and canopy cover levels for the 2017 and 2018 crop seasons. An unsupervised recognition algorithm was applied to derive cluster number and size, which were used for estimating yield per vine. The results related to the number of clusters detected in different conditions, and the weight estimation for each vigor zone are presented. The segmentation results in cluster detection showed a performance of over 85% in partial leaf removal and full ripeness conditions, and allowed grapevine yield to be estimated with more than 84% accuracy several weeks before harvest. The application of innovative technologies in field-phenotyping such as UAV, high-resolution cameras and visual computing algorithms enabled a new methodology to assess yield, which can save time and provide an accurate estimate compared to the manual method.


INTRODUCTION
Grapes are one of the most widely grown fruit crops in the world: vineyards cover a total area of 7.5 million hectares and produce a total yield of 75.8 million metric tons, of which 36% are fresh grapes, 8% raisins and 48% wine grapes (International Organisation of Vine and Wine [OIV], 2017).
Monitoring and grading in-field grape ripeness and health status is extremely important for valuable production, for both the table grape and premium wine markets (Bindon et al., 2014;Ivorra et al., 2015;Portales and Ribes-Gomez, 2015;Pothen and Nuske, 2016). As for other crops, yield monitoring in terms of cluster number and size is key information in viticulture (Fanizza et al., 2005;Cabezas et al., 2006;Costantini et al., 2008). Traditionally, yield prediction is conducted manually and routinely evaluated by visual or destructive methods, like those proposed by the International Organisation of Vine and Wine [OIV] (2007), which may involve problems such as low efficiency in terms of time and sampling representativeness (Pothen and Nuske, 2016). Moreover, yield evaluation by visual inspection, by means of cluster counting and size estimation, is subjective, resulting in variation between the results of different observers.
Field conditions, frequently on a slope and plowed soil, cause measurement difficulties, especially in the summer due to extremely high temperature. In that respect, advances in technology are key to the future of agriculture. Computer vision is a powerful tool for measuring dimensions and size distribution of shaped particles. Specific user-coded computer vision applications of particle size may require advanced programming using a proprietary programming language environment such as Visual C or MATLAB with specialized image processing toolboxes (Igathinathane et al., 2008). ImageJ is a Java-based, multithreaded, freely available, open source platform, independent and public domain image processing and analysis program developed at the National Institutes of Health (NIH), United States (Schneider et al., 2012). ImageJ has a built-in option for analyzing particles, which produces output parameters such as number of particles, areas, perimeters, and major and minor axes. In our study, the shaped particles are vine clusters. This software provides many image analysis tools, including algorithms based on threshold detection value to generate the binary image used for segmentation. Otsu's threshold is one of the most widely used threshold techniques for the vegetation segmentation process (Ling and Ruzhitsky, 1996;Shrestha et al., 2004). Gebhardt et al. (2006) converted the RGB images into grayscale generating local homogeneity images to detect a homogeneity threshold.
The main limitation of threshold techniques is stability of the binarization accuracy of the system, as any mis-segmentation is generally caused by an error in the detected threshold. So, if the detected threshold is not appropriately estimated, the generated segmentation process will be strongly affected. Another issue is related to the effect of light conditions on the vegetation segmentation results obtained, particularly in sunny and overcast conditions. The use of different color spaces instead of RGB can overcome the unwanted light effects. A widely used color model is the LAB color model, whose coordinates represent the lightness of the color (L*), its position between magenta and green (a*) and between yellow and blue (b*). The LABFVC algorithm proposed by Liu et al. (2012) is an automatic Fractional Vegetation Cover (FVC) extracting algorithm for digital images and is based on the premise that the representations of vegetation and soil in the LAB color space approximately follow Gaussian distributions. Song et al. (2015) proposed a modified methodology based on the LABFVC algorithm as an automatic shadow-resistant algorithm in the LAB color space (SHAR-LABFVC). Yang et al. (2015) suggested an HSV (hue, saturation, value) color space method for greenness identification of maize seedling images acquired outdoors.
Computer vision has also been widely applied for size and weight evaluation because it is non-destructive and highly efficient (Costa et al., 2011;Cubero et al., 2011;Hao et al., 2016).
In agriculture and the food industries, due to recent advances in computing and robotics, most computer vision applications are related to the measurement of external properties, such as color, size, shape and defects of fruits, vegetables, fish and eggs (Zhang et al., 2014;Soltani et al., 2015;Moallem et al., 2017;Su et al., 2017).
Recently, research has focused on estimating grape yield components using stereo vision (Ivorra et al., 2015) and comparing 2D imaging technology with a direct 3D laser scanning system (Tello et al., 2016). The latter successfully achieved a geometric reconstruction of the morphological volume of the cluster from 2D features, which also proved to work better than the direct 3D laser scanning system. All these studies were conducted in the lab on grape samples collected manually, while image processing has been used successfully in the field with prototype rovers or tractors to assess key canopy features, such as yield (Dunn and Martin, 2004;Nuske et al., 2011;Diago et al., 2012;Nuske et al., 2014;Aquino et al., 2018) and leaf area. Moreover, recent papers (Roscher et al., 2014;Kicherer et al., 2015) have reported some initial results on the application of image analysis for high-throughput phenotyping in vineyards. Those solutions demonstrated high performance but also weak points, such as the long monitoring time due to slow forward speed and problems related to soil trafficability. Furthermore, those solutions enhanced image quality using artificial light at night, with all the related risks for operators in a sloping vineyard or in narrow vineyard rows.
The above papers proved that this new framework, based on computing-robotics-machine vision, can be applied successfully for the evaluation of cluster attributes and components. In addition, precision viticulture is experiencing substantial growth due to the availability of improved and cost-effective instruments such as UAVs (Unmanned Aerial Vehicles; Matese et al., 2015). It has been demonstrated that rapid technological advances in unmanned aerial systems foster the use of these systems for a plethora of applications (Gago et al., 2015;Pôças et al., 2015;Bellvert et al., 2016;Di Gennaro et al., 2016;Poblete-Echeverría et al., 2017;Romboli et al., 2017;Santesteban et al., 2017;Matese and Di Gennaro, 2018), opening also new perspectives to traditional remote sensing (Sun and Du, 2018).
The aim of this study was to develop a fast and automated methodology to provide an early yield evaluation (pre-harvest) in support of vineyard management. The UAV approach described allows a fast monitoring taking into account a few images acquired in representative points of vineyard variability, which is fundamental to export this application to a large vineyard with moderate times for monitoring and image processing. This method aims to provide useful information, overcoming the limits of ground observations (soil trafficability and forward speed). To our knowledge, this is the first study to apply these technologies in a partially defoliated vineyard with vertical pruning system (VPS), while a first attempt was successfully applied to detect melons in pre-harvest (Zhao et al., 2017). The proposed methodology was evaluated with high resolution RGB images acquired by UAV in a vineyard located in Tuscany (Italy), with a commercial digital camera and without the help of artificial light.

Figure 1A). Sangiovese cv. (Vitis vinifera) vines were trained to a vertical shoot-positioned trellis and spur-pruned single cordon with four two-bud spurs per vine. Vine spacing was 2.2 m × 0.75 m (inter-row and intra-row), rows were NW-SE oriented, and the vineyard was on a slight southern slope at 355 m above sea level. Pest, soil and canopy management were performed following the farm practices.

Case Study and Experimental Design
A preliminary flight campaign on the vineyard was performed with a UAV platform on June 27th 2017, which is a key period in this winemaking area for estimation of vine vigor variability (Romboli et al., 2017). Remote sensing multispectral images were collected in order to characterize vigor spatial variability and identify representative vigor zones for the grape detection analysis. Within each of the representative zones of high and low vigor, five vines were selected. On August 9th those vines were monitored by a UAV equipped with an RGB camera to assess tool performance in cluster detection in two different light conditions combined with two different leaf management conditions: (i) target shaded by leaves and shadow, (ii) target partially free of leaves and directly illuminated by the sun. Those conditions were evaluated by performing a flight before (i) and after (ii) a partial defoliation, which was done by removing the leaves in the fruiting zone from just one side of the row, to maintain partial cluster protection and not excessively alter the vegetative-productive balance of plants. This is a widespread agronomic practice that favors grape ripening and health while limiting the risk of summer heat stress. Two datasets were acquired, the first in the morning (10:00) in the worst condition (W), with clusters covered by leaves and in shadow; and the second in the afternoon (14:00) in the best condition (B), with leaves partially removed and clusters directly illuminated by the sun. On the same day, a third UAV survey was conducted with the aim of confirming the distribution variability detected during the first flight in June. The same flight campaigns as 2017 were repeated on June 26th and August 8th 2018. The experimental design was modified according to the preliminary results obtained in 2017: the acquisition was conducted using the better performing of the two conditions tested, and the number of experimental parcels was expanded.
In detail, two blocks per vigor area of eight plants each were identified, for a total of 32 sample plants.

Ground-Truth Measurements
In the 2017 season, the characterization of vigor variability within the vineyard was performed taking into account 18 vines in each vigor zone: five sample vines monitored in detail by UAV and 13 chosen randomly around the sample vines. In the 2018 season, ground truth measurements were taken on 16 sample vines monitored in detail by UAV for each vigor zone. For each vine, vegetative and yield-related data were monitored. As an indicator of vine vigor, total shoot fresh mass was determined in the field for each vine in the dormant period, prior to pruning. In both years, the production of each sample vine was characterized in the field in terms of cluster number (ripe and unripe) and yield.

UAV Platform and Sensors
Flight campaigns were performed using an open-source UAV platform consisting of a modified multi-rotor Mikrokopter (HiSystems GmbH, Moormerland, Germany; Figure 1B).
Autonomous flight is managed by an on-board navigation system, which consists of a GPS module (U-blox LEA-6S) connected to a navigation board (Navy-Ctrl 2.0) and a flight control unit (Mikrokopter Flight Controller ME V2.1) that controls six brushless motors. Two communication systems consist of a duplex transmitter at 2.4 GHz (Graupner) and a WiFi module (Mikrokopter) at 2.4 GHz to control the UAV navigation and monitor flight parameters, while a WiFi module provides video data transmission at 5.8 GHz ensuring real-time image acquisition control by the ground operator. The flight planning was managed through Mikrokopter Tool software, which allows a route of waypoints to be generated as a function of the sensor Field Of View, overlap between images and ground resolution needed. Maximum payload is approximately 1 kg, ensuring 15 min of operating time with one 11,000 mAh 4S battery. A universal camera mount equipped with three servomotors allows an accurate image acquisition through compensation of tilt and rolling effects.
Frontiers in Plant Science | www.frontiersin.org
The UAV was equipped with multispectral and RGB cameras, to monitor vegetative status and perform image detection, respectively. The multispectral camera ADC-Snap (Tetracam Inc., CA, United States) mounts a global shutter CMOS sensor that acquires 1280 × 1024 pixel images (1.3 MP) in the green (520-600 nm), red (630-690 nm), and near-infrared (760-900 nm) bands. Multispectral image acquisition was performed at 50 m above the ground, yielding a ground resolution of 0.025 m/pixel and 70% of overlap in both directions. The RGB camera is a Sony Cyber-shot DSC-QX100 (Sony Corporation, Tokyo, Japan), which mounts a 20.2 megapixel CMOS Exmor R sensor and a Carl Zeiss Vario-Sonnar T lens. The RGB camera acquired high resolution data of sample plants selected within representative zones of the vineyard variability. The flight altitude was set at 10 m above the ground and the sensor was placed at 45° with respect to the vertical at the ground and 90° with respect to the direction of advancement, providing a ground resolution of 0.002 m/pixel on the fruit zone. Flight planning was made by choosing a route of waypoints following the direction of the rows, which allows images of the area of interest on adjacent rows to be acquired.

Multispectral Data Processing
PixelWrench2 software (www.tetracam.com) provides color processing of Tetracam RAW files and then radiometric correction, thanks to a calibration target acquired before each remote-sensing campaign. The images were subsequently processed in a series of steps with Agisoft Photoscan Professional (v.1.4.1; www.agisoft.ru), a commercial computer vision software package, and Quantum GIS (www.qgis.org), an open source GIS software, to provide a vigor map (Matese et al., 2015). The filtering procedure of the pure canopy pixels was assessed with a digital elevation model output produced from Agisoft Photoscan software. The basis of the procedure was that vine rows have a greater height from the ground and can easily be discriminated by Otsu's global thresholding, an algorithm that allows discrimination of two different zones: vine rows and ground. Assuming the correspondence between NDVI and vigor (Costa-Ferreira et al., 2007;Fiorillo et al., 2012), the first step is derivation of the NDVI computed by the following equation:

$$\mathrm{NDVI} = \frac{R_{nir} - R_{red}}{R_{nir} + R_{red}}$$

where $R_{nir}$ and $R_{red}$ are the reflectance in near infrared and red bands (Rouse et al., 1973). Matlab software (v.7.11.0.584, 2010) was used to interpolate pure canopy pixel values with a moving average window and elaborate a NDVI map.
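As an illustration of the NDVI step, the following minimal sketch computes per-pixel NDVI from red and near-infrared reflectance and averages it over canopy pixels only. The function names, the canopy mask input and the sample values are hypothetical and introduced here for illustration; the study itself carried out this processing with PixelWrench2, Agisoft Photoscan, Quantum GIS and Matlab.

```python
def ndvi(r_nir, r_red):
    """NDVI for one pixel; returns 0.0 when both reflectances are zero."""
    total = r_nir + r_red
    if total == 0:
        return 0.0
    return (r_nir - r_red) / total


def mean_canopy_ndvi(nir_band, red_band, canopy_mask):
    """Average NDVI over pixels flagged as pure canopy (mask value True)."""
    values = [
        ndvi(n, r)
        for n_row, r_row, m_row in zip(nir_band, red_band, canopy_mask)
        for n, r, m in zip(n_row, r_row, m_row)
        if m
    ]
    return sum(values) / len(values) if values else float("nan")


# Hypothetical 2 x 2 reflectance tiles (illustrative values only).
nir = [[0.60, 0.50], [0.30, 0.55]]
red = [[0.20, 0.10], [0.30, 0.05]]
mask = [[True, True], [False, True]]  # False = ground pixel filtered out
print(round(mean_canopy_ndvi(nir, red, mask), 3))
```

Masking before averaging mirrors the idea of restricting the vigor map to pure canopy pixels, so that bare soil between rows does not lower the zone average.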

Computer Vision and Machine Learning: From RGB to Cluster Yield Workflow
Digital images acquired by the UAV camera are stored in three dimensions using the RGB color space. For the extraction of color patterns such as vegetation, background or other components, color images offer more dimensions for image segmentation. Analyzing the distribution curve for different components permits the classification by determining a threshold in one-dimensional space. Our methodology is completely unsupervised, using ImageJ software and a threshold function with Otsu's method. Image processing essentially involves the creation of binary images of the particles from RGB images before being processed by the application algorithm. Figure 2 shows the methodology flowchart.

Selection of RGB Images
The first step of the RGB image analysis workflow was the "supervised" selection of images centered on the sample plants in different conditions for each vigor zone (Figure 3A).

Lab Stack Conversion
Initially RGB was converted to Lab Stack using ImageJ standard commands. The Lab color space mathematically describes all perceivable colors in the three dimensions: L for lightness, a* and b* for the color opponents green-red and blue-yellow. The yellow/blue opponent colors are represented along the b* axis, with blue at negative b* values and yellow at positive b* values. The component b* was found to be strictly correlated with cluster color (Figure 3B). A Gaussian filter with radius equal to 2 was applied in order to enhance the distribution function (histogram; Figure 3C).
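To make the role of the b* component concrete, here is a minimal sketch of a standard sRGB to CIE L*a*b* conversion (D65 white point) that extracts the b* channel of a small image. It is not the ImageJ implementation, whose numerical details may differ; the function names are hypothetical, and the Gaussian smoothing step is left out for brevity.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (D65 white point)."""
    def to_linear(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # Linear sRGB to CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def b_star_channel(rgb_image):
    """Return the b* value of every pixel in a nested list of (r, g, b) tuples."""
    return [[srgb_to_lab(*px)[2] for px in row] for row in rgb_image]


# Pure blue gives a negative b*, pure yellow a positive b*.
image = [[(0, 0, 255), (255, 255, 0)]]
print([[round(v, 1) for v in row] for row in b_star_channel(image)])
```

Working on b* alone reduces the three-channel image to a single channel in which bluish targets and yellowish-green foliage sit at opposite ends of the histogram.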

Thresholding
The thresholding step was tried with different images and Otsu's method was chosen to locate a threshold value automatically based on each image's condition. Otsu's threshold clustering algorithm searches for the threshold that minimizes the intraclass variance, defined as a weighted sum of variances of the two classes. The algorithm assumes that the image contains two classes of pixels following a bi-modal histogram (foreground and background pixels); it then calculates the optimum threshold separating the two classes so that their combined spread (intraclass variance) is minimal or, equivalently (because the sum of pairwise squared distances is constant), so that their inter-class variance is maximal.
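The equivalence between the two criteria can be written out explicitly. The notation below is introduced here for illustration and does not appear in the original: for a candidate threshold $t$, $\omega_0(t)$ and $\omega_1(t)$ are the fractions of pixels in the two classes, $\mu_0(t)$ and $\mu_1(t)$ their means, $\sigma_0^2(t)$ and $\sigma_1^2(t)$ their variances, and $\sigma_T^2$ the total variance of the image, which does not depend on $t$.

$$\sigma_w^2(t) = \omega_0(t)\,\sigma_0^2(t) + \omega_1(t)\,\sigma_1^2(t), \qquad \sigma_b^2(t) = \omega_0(t)\,\omega_1(t)\,\bigl[\mu_0(t) - \mu_1(t)\bigr]^2$$

$$\sigma_T^2 = \sigma_w^2(t) + \sigma_b^2(t) \;\Longrightarrow\; \arg\min_t \sigma_w^2(t) = \arg\max_t \sigma_b^2(t)$$

Because the between-class term only needs class weights and means, implementations usually maximize $\sigma_b^2(t)$ rather than minimize $\sigma_w^2(t)$.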
Two threshold routines were applied sequentially to the filtered b* images. The first routine applied the automatic Otsu's threshold in order to identify the pixels considered as vegetation (Figure 3D). Those pixels were removed from the images (Figure 3E) and a new histogram of the image was obtained. The second routine, with a second Otsu's threshold, automatically detected only the clusters (Figure 3F); the cluster values were then converted into a mask (Figure 3G).
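The two sequential thresholds can be sketched as follows. This is a simplified stand-in for the ImageJ routines, not the authors' implementation: the function names are hypothetical, b* values are assumed to be rescaled to integer levels 0-255, and vegetation is assumed to lie on the high (yellowish) side of the first threshold while clusters lie on the low (bluish) side of the second. The text does not state which side each class falls on.

```python
def otsu_threshold(values, levels=256):
    """Return the level t that maximizes between-class variance (class 0: v <= t)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0
    sum0 = 0
    for t in range(levels):
        w0 += hist[t]
        if w0 == 0:
            continue  # no pixels yet in the lower class
        w1 = total - w0
        if w1 == 0:
            break  # upper class is empty from here on
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def two_stage_cluster_mask(b_values):
    """First Otsu pass drops assumed vegetation, second pass isolates clusters."""
    t1 = otsu_threshold(b_values)
    remaining = [v for v in b_values if v <= t1]  # vegetation (v > t1) removed
    t2 = otsu_threshold(remaining)
    mask = [v <= t2 for v in b_values]  # True = cluster pixel
    return t1, t2, mask


# Hypothetical pixel values: foliage near 200, background near 120, clusters near 40.
pixels = [200] * 50 + [120] * 30 + [40] * 20
t1, t2, mask = two_stage_cluster_mask(pixels)
print(t1, t2, sum(mask))
```

Recomputing the histogram after the first pass matters: with foliage removed, the remaining pixels are closer to the two-class assumption that Otsu's method relies on.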

Analyze Particles
The next step was to set the scale by drawing a line on a target of known length, so that the tool could relate the number of pixels along the drawn line to a real distance and produce spatially calibrated images. The ImageJ "Analyze Particles" routine was then invoked, which generates the number and dimensions of the particles (clusters). In the "Analyze Particles" dialog, the particle areas to be considered were set at 200 to infinity pixel units, covering only cluster dimensions, circularity was set at 0.25-1.00, and the Overlay mask option was selected to display the number of shapes.
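The size and circularity limits can be illustrated with a short sketch. It assumes each particle is already described by its area and perimeter in pixels, uses the common definition of circularity as 4π·area/perimeter², and converts areas with a reference line of known length. The function names and sample particles are hypothetical and do not reproduce ImageJ's internals.

```python
import math


def circularity(area, perimeter):
    """Circularity as 4*pi*area / perimeter^2, capped at 1.0 (1.0 = perfect circle)."""
    if perimeter <= 0:
        return 0.0
    return min(1.0, 4.0 * math.pi * area / perimeter ** 2)


def keep_particle(area_px, perimeter_px, min_area=200, min_circ=0.25, max_circ=1.0):
    """Apply the size (>= 200 px) and circularity (0.25-1.00) limits from the text."""
    circ = circularity(area_px, perimeter_px)
    return area_px >= min_area and min_circ <= circ <= max_circ


def area_px_to_cm2(area_px, line_length_px, line_length_cm):
    """Convert a pixel area to cm^2 using a reference line of known length."""
    cm_per_px = line_length_cm / line_length_px
    return area_px * cm_per_px ** 2


# Hypothetical particles as (area, perimeter) in pixels.
particles = [(1257, 126), (150, 45), (500, 400)]
kept = [p for p in particles if keep_particle(*p)]
print(kept)
```

The area limit discards small specks, while the circularity limit discards thin, elongated shapes such as shoots or wires that pass the color thresholds.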

Statistical Analysis
The classification performance of the unsupervised methodology described was evaluated through a sensitivity index or true positive rate (TPR): TPR = (true clusters automatically classified / total clusters observed) × 100.
The TPR expresses, as a percentage, the ratio between the true clusters automatically classified and the total number of clusters observed. The accuracy of the UAV approach was instead assessed by a percent accuracy index, calculated by subtracting the estimated value from the measured one, dividing that number by the measured value and multiplying the quotient by 100.
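Both indices are simple ratios, sketched below. One assumption should be stated clearly: the verbal definition of the percent accuracy index yields a relative error, so the sketch reports accuracy as 100 minus that value. This reading is an inference, not a formula given in the text, but it reproduces the accuracy of about 84.4% quoted in the Discussion when applied to the 2018 low-vigor yields reported in the Results (1559.2 measured, 1315.7 estimated). The function names are hypothetical.

```python
def true_positive_rate(true_detected, total_observed):
    """Percentage of observed clusters that were correctly detected."""
    return 100.0 * true_detected / total_observed


def percent_accuracy(measured, estimated):
    """Assumed reading: 100 minus the absolute relative error in percent."""
    return 100.0 - abs(measured - estimated) / measured * 100.0


# Illustrative cluster counts (hypothetical): 17 of 20 clusters detected.
print(round(true_positive_rate(17, 20), 1))

# 2018 low-vigor zone yields per vine as reported in the Results.
print(round(percent_accuracy(1559.2, 1315.7), 1))
```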

Yield Estimation
Yield estimation was performed by using the cluster surface derived from image analysis and the yield per vine weighed by traditional ground sampling. As a first step, for the 2017 season, a linear regression was obtained between cluster surface (cm²) and grapes sampled (g) on representative vines in both high and low vigor areas. Following the 2017 results, the linear regression parameters were applied to the 2018 dataset to calculate yield from remote sensing data, which was finally validated against ground truth measurements.
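The calibrate-then-apply procedure can be sketched with an ordinary least squares fit. All numbers below are hypothetical placeholders and are not the study's data; the function name is likewise illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# Hypothetical calibration pairs (cluster area in cm^2, weighed yield in g).
area_2017 = [50.0, 80.0, 120.0, 170.0, 210.0]
yield_2017 = [230.0, 370.0, 520.0, 700.0, 930.0]
slope, intercept, r2 = fit_line(area_2017, yield_2017)

# Apply the calibration to hypothetical areas from the following season.
area_2018 = [300.0, 450.0]
predicted = [slope * a + intercept for a in area_2018]
print(round(r2, 3), [round(p) for p in predicted])
```

Note that applying a calibration to areas outside the range seen in the calibration year is an extrapolation, which is one reason validation against ground truth in the second season is needed.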

Cost-Benefit Analysis
The cost analysis was applied to three ideal vineyard sizes: 5, 10 and 50 ha, according to the last European Farm structure survey (Farm indicators by agricultural area, farm type, standard output, legal form and NUTS 2 regions; Eurostat, 2018). The approach was adopted to account for all the expenses associated with data acquisition and processing, for both the proposed methodology and the traditional survey, grouped into four broad categories, plus the cost for equipment (UAV and RGB camera) purchase:
-Survey timing, i.e., the time needed to make a survey in two zones per hectare. For the traditional ground survey, it was calculated as about 25 min/ha to move within the vineyard, count and measure the average yield (10 vines). For the UAV, 3 min/ha was estimated: 1 min for take-off and landing and 2 min for image acquisition.
-Survey costs include the man-hour costs to monitor the vineyard in a traditional way and to perform a UAV data acquisition flight. The cost for a single man-hour was considered at $16/h for a skilled worker who undertakes the traditional survey and at $24/h for a trained UAV pilot.
-Elaboration timing, which for the traditional survey takes into account the digitization of the data observed and measured in the field (2 min/ha), while for the proposed UAV methodology it is the time to select the single images of interest and perform the automatic recognition (10 min/ha).
-Elaboration costs include the man-hour costs to digitize the data acquired by the skilled worker, with a related cost of $16/h for ground data and $20/h for image analysis.
-Cost of UAV+RGB camera, based on a DJI Phantom 3 platform + 4K 12 Megapixel camera purchase with a 3 years depreciation cycle ($620 total cost, $17.2 per month in depreciation costs) and assuming only one survey per year.
All the reported costs are provided by Payscale (www.payscale.com, accessed September 09, 2018), a website that provides information about salary, benefits and compensation, while the costs of UAV+RGB are readily available on the internet (Google Shopping, accessed September 30, 2018).

Representative Zones
Calculation of the NDVI values allowed the identification of two representative zones of the vigor heterogeneity within the vineyard (Figure 4): one representative of the HV zone (high vigor) and the other of the LV zone (low vigor). The vigor within the vineyard detected during the preliminary flight in June 2017 (Figure 4A) was confirmed by the vigor map produced by the flight campaign in August 2017 (Figure 4B).
Remote sensing data related to vigor variability within the vineyard were confirmed by ground measurements for vegetative data acquired through the sampling of 18 vines in each zone identified by UAV survey (Table 1).

Cluster Characteristics Results
The results of cluster detection in the worst condition (W; target partially covered by leaves and shaded) and in the best condition (B; target partially free of leaves and directly illuminated by the sun) are shown in Figure 5, which depicts the workflow of the image processing steps in HV (left) and LV (right) zones performed with ImageJ software. Figure 5 shows the visual results of the 2017 season related to cluster characteristics detection from raw RGB images acquired by UAV (Figure 5A), LAB image processing (Figure 5B), automatic cluster detection (Figure 5C) and lastly RGB images with a cluster overlay mask (Figure 5D). In 2017, the one-side defoliation left very few leaves in the low vigor zone, as a consequence of the minimal leaf coverage due to the extremely dry season.
The results provided by image analysis algorithm performed with the ImageJ software in B and W conditions within the two vigor zones are summarized in Table 2. Regarding the results of the 2018 season, only the best condition was taken into account on a larger sampling number of plants.
Values are the mean ± SD. Statistical difference was assessed by Student's t-test: *** P < 0.001; ** P < 0.01.

For each vigor × condition, Table 2 reports the mean and standard deviation of number of clusters per vine monitored by ground observation (Clusters per vine), presence of green clusters (Green Clusters per vine), clusters detected by UAV (Clusters per vine UAV), number of clusters detected by UAV adjusted taking into account the presence of oversized clusters assessed as double clusters (Clusters per vine UAV ADJ).
This was performed applying a threshold on the cluster dimension monitored by UAV, which allowed the presence of two very close clusters to be detected, first identified as one oversized cluster. The UAV methodology performance in different conditions of vigor and image quality was calculated through a TPR index. The UAV methodology applied in HV_B and LV_B identified 100.0% of ripe grapes in 2017 season, while poorer performances were found with HV_W (54.9%) and LV_W (26.4%). The image analysis of 2018 season provided a lower performance in single cluster segmentation on ripe grapes with HV_B (84.8%) and LV_B (97.7%).
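The adjustment for double clusters amounts to counting any oversized detection as two clusters. A minimal sketch follows; the size threshold and the sample areas are hypothetical, since the text does not report the value used, and the function name is illustrative.

```python
def adjusted_cluster_count(areas_cm2, oversize_threshold_cm2):
    """Count each detection once, or twice when its area exceeds the threshold."""
    return sum(2 if area > oversize_threshold_cm2 else 1 for area in areas_cm2)


# Hypothetical detections for one vine; 100 cm^2 is an illustrative threshold.
detected_areas = [30.0, 45.0, 120.0]
print(len(detected_areas), adjusted_cluster_count(detected_areas, 100.0))
```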

Yield Estimation
Following these results, the data obtained from the best performing method (_B) were used to calculate the production values per plant in each vigor zone in both seasons. The zonal statistics calculated with ImageJ allowed the clusters to be counted and sized, processing the surface area exposed by the 2D image acquired by UAV. The total values of cluster area per vine extracted on high (168.6 ± 84.0 cm²) and low (69.2 ± 35.6 cm²) vigor plants in the 2017 season were converted to cluster weight through a linear regression with the ground truth measurements.
The yield per vine was computed from the 2018 season dataset, according to the correlation parameters provided by 2017 preliminary results (R² = 0.90) obtained taking the best condition dataset into account. The estimated yield was validated with ground sample data; the results are shown in Figure 6.

The mean ± standard deviation values of: Clusters per vine - number of clusters per vine monitored by ground observation; Green Clusters per vine - unripe clusters; Clusters per vine UAV - clusters detected by UAV; Clusters per vine UAV ADJ - number of clusters detected by UAV adjusted taking into account the presence of double clusters. TPR and TPR_Ripe are the true positive rates related to total and ripe clusters.

Table 3 reports yield values recorded and estimated during the 2 years trial. In the 2017 season, the average production per plant in different vigor zones was calculated both with traditional ground measurements (HV = 803.7 ± 356.9, LV = 371.9 ± 202.0) and total cluster weight per vine detected by UAV (HV = 682.7 ± 279.6, LV = 323.0 ± 165.1). The yield estimation in the 2018 season confirmed the results of the previous year with a higher value both in ground measurements (HV = 2838.1 ± 1346.4, LV = 1559.2 ± 1066.4) and UAV estimation (HV = 2602.8 ± 1339.4, LV = 1315.7 ± 605.0). The results related to yield data estimation in the worst condition evaluated during the 2017 season confirm the need for line of sight to apply this methodology.

DISCUSSION
Remote sensing images acquired during June by a UAV platform equipped with a multispectral camera allowed the assessment of spatial variation in terms of vigor in the experimental vineyard in both seasons. As reported by Romboli et al. (2017), the end of June is a good time to characterize vineyard vigor variability, and the map produced by the flight campaign in August confirmed the variability within the vineyard detected during the preliminary flight in June (Figure 4).
The results of the experimentation showed that the proposed methodology has difficulty discriminating green clusters within the canopy. As reported in Grocholsky et al. (2011), the clusters detected with minimal number of false negatives were due to unripe grapes and small clusters related to lateral shoot production. However, the exclusion of unripe clusters during the evaluation of vineyard yield potential gives added value to the effectiveness of the methodology. Unripe clusters cause a product quality loss by conferring a series of sensory characteristics and often producing astringent, bitter and low-alcohol wines (Peyrot des Gachons and Kennedy, 2003;Canals et al., 2005;Kontoudakis et al., 2011).
The cluster detection approach showed a poor performance in the worst condition dataset, substantially due to the leaf cover that blocked line of sight of the cluster, so the yield analysis was only performed on the best dataset. However, fruit zone defoliation is a widely used canopy management practice to improve light exposure for grape quality (Reynolds et al., 1986;Bergqvist et al., 2001;Downey et al., 2006;Cohen et al., 2012;Romboli et al., 2017). Defoliation at the beginning of veraison could cause higher incidence of sunburn damage, but defoliation 2-3 weeks before harvest is a widespread practice also in warm winemaking areas, reducing the risk of fungal attack on grape clusters due to improved air circulation, decreased humidity and better penetration of fungicide sprays (Pieri and Fermaud, 2005;Sabbatini and Howell, 2010;Noyce et al., 2016). In that sense, our study takes into account a partial defoliation treatment only on the morning side (north-east) of the canopy, aiming to prevent sunburn and obtain the positive effects of this canopy management.
For the UAV yield estimation results, we combined clusters detected with cluster weight calculated on the basis of cluster dimensions extracted by pixel counts from high resolution images and cluster weight observed by ground measurements in a commercial vineyard in Italy. At the end of the 2017 season, the UAV yield prediction approach was evaluated through the comparison of yield ground measurements on a large sample of vines in each vigor zone and data extrapolated from UAV images on a restricted number of vines in each zone. The correlation between the cluster segmentation approach and ground truth measurements on 2017 data was used to estimate yield from the 2018 image analysis, and the estimate was then compared with traditional field measurements. The yield calculated from UAV images provided high accuracy (over 84.4% in both years), following the strong variability within the vineyard and identifying almost double the yield in the HV zone compared with the LV zone, as clearly shown in biomass sampling data for both years. The yield data showed a great inter-annual variability because the two seasons were completely different. The 2017 season was extremely hot and dry compared to 2018, and grape production, in terms of cluster number and weight, was exceptionally low. A direct consequence was less leaf coverage and a greater cluster separation in the 2017 season, which favored the method performance compared to the survey in 2018.
An accurate estimation of the yield several weeks before harvesting by a fast and non-destructive method, such as the one described in this paper, can provide very valuable information for the farmer for canopy management decisions, such as grape trimming, as well as for harvest planning. The proposed methodology is a good alternative to traditional measurement, due to its accuracy and relative speed, and would provide the farmer with a promising tool for yield prediction in a fast and precise way. In recent years, innovative solutions have been proposed for grape image analysis based on lab (Tello et al., 2016) or on-the-go measurements (Aquino et al., 2018). Those methods are precise as the result of a proximal sensing approach, but are weak in terms of timing, which plays a key role in agriculture management. Diago et al. (2015) found that the best results (R² between 69 and 95% in berry detection and between 65 and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The model's capacity based on image analysis to predict berry weight was 84%. Tello et al. (2016) studied cluster length, width and elongation by 2D image analysis and found significant and strong correlations with the manual methods with r = 0.959, 0.861, and 0.852, respectively. Aquino et al. (2018) applied mathematical morphology and a pixel classification method, which yielded overall average Recall and Precision values of 0.876 and 0.958, respectively. The lab approach is very time-consuming due to the need for destructive grape sampling in the field and transport to the lab, followed by image acquisition and analysis. Regarding on-the-go monitoring methods, the camera moving along the inter-row must be very close to the side of the canopy, providing an image with high resolution but covering a very small canopy area.
Consequently, in order to monitor a reasonable number of vines it is necessary to acquire a large number of images, which massively increases the time needed for data processing. The advantage of the UAV approach is that, working at a greater distance from the target vines, it can capture up to 10 plants in one image while still providing enough resolution to correctly discriminate the clusters within the canopy. Moreover, the ground solution is dependent on terrain conditions; in fact, wet, sloping, ploughed or uneven soil could affect the linear advancement of the platform and therefore image quality. Our methodology works well with vertical shoot positioning, which is a common and widely used trellis system owing to its many advantages: compatibility with vineyard mechanization, suitability for many grape varieties, and reduced risk of fungal disease thanks to good air circulation and light exposure.
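The trade-off between camera distance, resolution and the number of vines per image can be illustrated with the standard pinhole-camera relations for image footprint and ground sampling distance (GSD). The sketch below assumes the camera axis is perpendicular to the target plane; the function names and every numeric value (sensor width, focal length, distance, image width, vine spacing) are hypothetical placeholders and are not the settings used in this study.

```python
def footprint_width_m(sensor_width_mm, focal_length_mm, distance_m):
    """Width of the scene covered by one image at the given camera-target distance."""
    return sensor_width_mm * distance_m / focal_length_mm


def gsd_cm_per_px(footprint_m, image_width_px):
    """Ground sampling distance: size of one pixel on the target plane, in cm."""
    return footprint_m * 100.0 / image_width_px


def whole_vines_in_image(footprint_m, vine_spacing_m):
    """Number of complete vine spacings that fit across the image footprint."""
    return int(footprint_m // vine_spacing_m)


# Placeholder camera and vineyard geometry (assumptions, not values from the study).
footprint = footprint_width_m(sensor_width_mm=12.0, focal_length_mm=8.0, distance_m=5.0)
gsd = gsd_cm_per_px(footprint, image_width_px=5000)
vines = whole_vines_in_image(footprint, vine_spacing_m=1.0)
```

Doubling the distance doubles both the footprint and the GSD, so more vines fit in each image at the price of coarser pixels on each cluster.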
The best results depended on flight planning for correct image acquisition (flight altitude, gimbal angle, speed, image acquisition frequency, etc.) and on sunlight conditions (solar angle, light and shadows). This could be expected given the differences in grape hue, color, size and cluster compactness, which were accentuated by the level of ripeness at the time of image acquisition. All these features greatly influenced the contour detection algorithms.
A cost-benefit analysis was conducted for three ideal vineyard sizes and two scenarios: (i) accounting for a contractor service, thus not including the cost of purchasing a UAV and related maintenance costs or hiring an agronomist, (ii) assuming the cost of UAV+camera (Table 4).
Overall, for all three vineyard sizes (5, 10, and 50 ha), using a UAV to implement the proposed methodology appears to save the most time but is detrimental in terms of total cost because of the UAV and RGB camera purchase. This methodology requires a minimum farm size of 60 ha (data not shown) to amortize the fixed-cost investments. However, where the field size is smaller than 60 ha, specialist UAV service providers, sharing of farming equipment and cooperative approaches may make the platform available to different farmers (Zarco-Tejada et al., 2014). In fact, excluding platform and camera purchase, the final cost favors the UAV. For the three spatial scales analyzed, savings are always slightly less than 50% compared to the cost of the traditional methodology. Although neither the error of the proposed methodology nor the error of manual pre-harvest inspection can be accounted for in the cost analysis, it should be noted that this methodology is able to capture the variability in production within the vineyard and between the 2 years of analysis, with a certain tendency to underestimate the yield. It should also be considered that traditional survey methods are labor intensive and subject to observer bias (Zhou et al., 2018), whereas the automated methodology could reduce this bias, help breeders in crop yield phenotyping, and help farmers plan crops more efficiently, reducing labor costs and optimizing the available resources.
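The break-even reasoning behind a minimum farm size can be sketched with a simple linear cost model: the traditional survey is assumed to cost a fixed amount per hectare, while the owned-UAV option adds a fixed equipment cost to a lower per-hectare cost. The model, the function names and all inputs below are illustrative assumptions in arbitrary cost units; they were invented to make the arithmetic easy to follow and are not the values in Table 4.

```python
def traditional_cost(area_ha, manual_cost_per_ha):
    """Cost of the manual survey, assumed proportional to area."""
    return manual_cost_per_ha * area_ha


def owned_uav_cost(area_ha, fixed_equipment_cost, uav_cost_per_ha):
    """Cost of the UAV survey when platform and camera are purchased."""
    return fixed_equipment_cost + uav_cost_per_ha * area_ha


def break_even_area_ha(fixed_equipment_cost, manual_cost_per_ha, uav_cost_per_ha):
    """Area at which both options cost the same; None if the UAV never catches up."""
    saving_per_ha = manual_cost_per_ha - uav_cost_per_ha
    if saving_per_ha <= 0:
        return None
    return fixed_equipment_cost / saving_per_ha


# Hypothetical inputs in arbitrary cost units (NOT taken from Table 4).
MANUAL_PER_HA = 100.0   # traditional survey cost per hectare
UAV_PER_HA = 55.0       # UAV survey cost per hectare, excluding equipment
FIXED_EQUIPMENT = 2700.0  # UAV + camera cost attributed to the period considered

area = break_even_area_ha(FIXED_EQUIPMENT, MANUAL_PER_HA, UAV_PER_HA)
```

Below the break-even area the fixed equipment cost dominates, which is why a contractor service or shared equipment is the more economical option for small vineyards in this simplified model.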
For our case study, a first flight was needed to identify zones with different vigor and to recommend partial defoliation. It must be taken into account that the latter is a common technique that is usually performed by the farmer some weeks before harvest, so it would entail no additional cost. To identify the zones with different vigor, the cost of a further flight could be avoided by relying on the direct experience of the farmer or by consulting the freely available NDVI maps derived from satellite platforms such as Sentinel 2 or Landsat 8.

CONCLUSION
The application of innovative technologies in field phenotyping such as UAV, digital image analysis tools and image interpretation techniques promises a methodology for yield and quality traits estimation in a vineyard in order to rapidly monitor representative zones in a large acreage, improve the quality of recording and minimize error variation between samples.
The methodology for cluster detection and image analysis described in this paper has proven to be a useful and reliable tool for yield assessment in a vineyard. The approach for image acquisition and data processing is simple and low-cost, as it only needs a commercial RGB camera, a base-level UAV platform and free image analysis software.
This study analyses the potential of UAV technology to estimate yield in a vineyard from high-resolution RGB images under partial leaf removal and full ripeness conditions several weeks before harvest. First, an unsupervised cluster detection approach was tested on two different datasets, acquired in the worst conditions (leaf cover, shaded fruit) and the best conditions (partially defoliated and directly illuminated fruit), in both high and low vigor zones during the 2017 season. A linear correlation between yield per vine and ground truth measurements in different vigor zones was then performed. The correlation parameters were applied to the 2018 dataset, providing promising yield prediction performance with about 12% under-estimation. Further tests are necessary to extend and confirm the preliminary results obtained from this study, in terms of camera settings (exposure, etc.), optimal environmental conditions (time of day, angle of incidence of the sun, etc.), gimbal settings (camera angle of inclination), flight parameters (speed, flight altitude, overlap, etc.) and crop features (more varieties and different crops, training system, phenological stage, crop management, plant spacing, etc.). However, the data are decidedly encouraging. Indeed, given the continuous technological development in image analysis tools, cameras and UAV performance, it will be possible to improve the efficacy of the methodology in terms of accuracy, time required for data acquisition and analysis, and costs. This optimized tool could be a useful support for both phenotyping research and agronomic management.

AUTHOR CONTRIBUTIONS
SDG and AM designed the experiment and coordinated the activity. SDG, AM, PC, and AB performed the UAV data acquisition. SDG, AM, PC, and PT performed the data analysis and wrote the manuscript. All authors read and approved the final manuscript.