Remote Estimation of Rice Yield With Unmanned Aerial Vehicle (UAV) Data and Spectral Mixture Analysis

The accurate assessment of rice yield is crucially important for China’s food security and sustainable development. Remote sensing (RS), as an emerging technology, is expected to be useful for rice yield estimation especially at regional scales. With the development of unmanned aerial vehicles (UAVs), a novel approach for RS has been provided, and it is possible to acquire high spatio-temporal resolution imagery on a regional scale. Previous reports have shown that the predictive ability of vegetation index (VI) decreased under the influence of panicle emergence during the later stages of rice growth. In this study, a new approach which integrated UAV-based VI and abundance information obtained from spectral mixture analysis (SMA) was established to improve the estimation accuracy of rice yield at heading stage. The six-band image of all studied rice plots was collected by a camera system mounted on an UAV at booting stage and heading stage respectively. And the corresponding ground measured data was also acquired at the same time. The relationship of several widely-used VIs and Rice Yield was tested at these two stages and a relatively weaker correlation between VI and yield was found at heading stage. In order to improve the estimation accuracy of rice yield at heading stage, the plot-level abundance of panicle, leaf and soil, indicating the fraction of different components within the plot, was derived from SMA on the six-band image and in situ endmember spectra collected for different components. The results showed that VI incorporated with abundance information exhibited a better predictive ability for yield than VI alone. And the product of VI and the difference of leaf abundance and panicle abundance was the most accurate index to reliably estimate yield for rice under different nitrogen treatments at heading stage with the coefficient of determination reaching 0.6 and estimation error below 10%.

The accurate assessment of rice yield is crucially important for China's food security and sustainable development. Remote sensing (RS), as an emerging technology, is expected to be useful for rice yield estimation especially at regional scales. With the development of unmanned aerial vehicles (UAVs), a novel approach for RS has been provided, and it is possible to acquire high spatio-temporal resolution imagery on a regional scale. Previous reports have shown that the predictive ability of vegetation index (VI) decreased under the influence of panicle emergence during the later stages of rice growth. In this study, a new approach which integrated UAV-based VI and abundance information obtained from spectral mixture analysis (SMA) was established to improve the estimation accuracy of rice yield at heading stage. The six-band image of all studied rice plots was collected by a camera system mounted on an UAV at booting stage and heading stage respectively. And the corresponding ground measured data was also acquired at the same time. The relationship of several widely-used VIs and Rice Yield was tested at these two stages and a relatively weaker correlation between VI and yield was found at heading stage. In order to improve the estimation accuracy of rice yield at heading stage, the plot-level abundance of panicle, leaf and soil, indicating the fraction of different components within the plot, was derived from SMA on the six-band image and in situ endmember spectra collected for different components. The results showed that VI incorporated with abundance information exhibited a better predictive ability for yield than VI alone. And the product of VI and the difference of leaf abundance and panicle abundance was the most accurate index to reliably estimate yield for rice under different nitrogen treatments at heading stage with the coefficient of determination reaching 0.6 and estimation error below 10%.
Keywords: rice, yield, remote sensing (RS), unmanned aerial vehicle (UAV), vegetation index (VI), spectral mixture analysis (SMA) INTRODUCTION Rice (Oryza sativa L.) is one of the most important grain crops in the world, especially in China. There are more than half of the China's population regarding rice as the staple food (Xiao et al., 2002). The accurate estimation of rice yield is of significance to ensure food security and promote sustainable development.
Remote sensing (RS) is a technique which can obtain the information about an object without making physical contact with the object (Wikipedia 1 ) and it can efficiently obtain canopy spectra data in a non-destructive way, which carries valuable information indicating the canopy interaction with solar radiation such as vegetation absorption and scattering (Thenkabail et al., 2011). The vegetation canopy spectra are closely related to vegetation growth. In the visible range, vegetation has strongly absorption of light due to the pigments and presents as low reflectance (Woolley, 1971). But in the nearinfrared (NIR) range, vegetation reflectance is relatively high and affected by thick plant tissues and canopy structure (Gausman et al., 1969). A series of studies have been developed to relate the vegetation spectra to vegetation growth parameters such as chlorophyll content (Gitelson et al., 2006;Wu et al., 2008;Feret et al., 2011), leaf area index (LAI) (Broge and Leblanc, 2001;Viña et al., 2011) and biomass (Thenkabail et al., 2000;Hansen and Schjoerring, 2003), and thus a lot of vegetation indices (VIs) calculated from reflectance of different spectra ranges (Hatfield et al., 2008) have been proposed to accurately estimate these parameters. For example, Peng et al. (2013) used Enhanced Vegetation Index (EVI) and Wide Dynamic Range Vegetation Index (WDRVI) obtained from MODIS to accurately estimate the gross primary productivity in crops with coefficients of variation below 20% in maize and 25% in soybean (Peng et al., 2013). Li et al. (2014) applied several red-edge based spectral indices to estimate plant nitrogen uptake with the coefficient of determination above 0.76. Parametric statistical approaches based on VIs are by far the simplest and most studied variable estimation approaches and have been widely used in monitoring crop growth (Verrelst et al., 2015). And the changes of crop growth status, which can be effectively monitored by spectral measures, directly determine its ultimate yield. Therefore, VIs have also exhibited good potential in remote estimation of crop yield especially at the large scales (Rahman et al., 2012;Sun et al., 2017). Becker-Reshef et al. (2010) estimated the wheat yield in Kansas and Ukraine with a 7% error by using time series Normalized Difference Vegetation Index (NDVI) data from MODIS. Holzman et al. (2014) came up with a new method to estimate regional crop yield using the Temperature Vegetation Dryness Index (TVDI) with the RMSE values ranged from 12 to 13% for soybean and 14 to 22% for wheat. Sakamoto et al. (2014) mapped U.S. corn yields successfully using Wide Dynamic Range Vegetation Index (WDRVI) derived from timeseries MODIS data with the estimation error below 30% at the state level. Generally, VI-based methods are the mainstream approach for crop yield prediction, and a lot of regression algorithms of using VIs to estimate crop yield were established including simple 1 https://en.wikipedia.org/wiki/Remote_sensing linear functions and complex non-linear functions (Tucker et al., 1980). Many experiments showed that the use of appropriate VI was the key to crop yield estimation instead of the complex function structures in VI-based methods (Gholizadeh et al., 2015). Therefore, it is particularly important for crop yield evaluation to get the exact VIs.
However, there may be a considerable discrepancy between pixel sizes of RS images (e.g., 1 km for MODIS satellite image) and much smaller sizes of croplands (e.g., usually 33 m × 20 m in South China) due to the limitation of the spatial resolution as well as the landscape fragmentation (Gong et al., 2018). In this case, one pixel of a RS image may contain several land cover types and the signal of this pixel (mixed pixel) is the outcome of various land cover components that have significantly different spectra. VI, derived from the spectra of such mixed pixels, may encompass the unexpected information of the components not related to yield, which could lower the precision of yield estimation. The mixed pixel is always a problem that have to be discussed in the application of RS technique. Even for the high spatial resolution images, the problem of mixed pixel still exists because of the smaller cropland components such as flower, fruit and grain. This problem is more obvious when addressing rice yield prediction with RS data in VI-based method. Rice is a type of grain crop, the panicle will gradually emerge in paddy rice field when rice steps into reproductive phase and may last until maturation. In the early stage of rice panicle emergence, the panicle was bright green and scarcely distributed in rice canopy, while rice leaves were still the dominant component in paddy rice field. With the continuous growth of rice, the amount of panicle increased and its color gradually changed to yellow. Since rice panicle had the dramatically different spectra from rice leaves, the remotely detected canopy spectra of rice after panicle emergence were greatly mixed by panicle and leaf spectra, and the accuracy of estimated crop growth parameters in VI-based method would decrease. Zhou et al. (2017) found that the appearance of the rice panicle enhances the difficulty of LAI estimation and yield prediction in the later growth stages of rice. And Sakamoto et al. (2011) found the same result in rice that the color index performed well-before the booting stage but poorly in the late growth stages. Harrell et al. (2011) reported that the reduced predictive ability at heading stage was most likely associated with the uneven emergence of panicles into the sensor field of view. These studies showed that the canopy spectra remotely obtained from rice in late growth stages were susceptible to the interference of uneven emergence of panicles, and the factor of spectral mixture that influence the rice yield estimation must be considered.
Spectral mixture analysis (SMA) was extensively used to quantify the spectral contributions from different components in a mixed pixel. It assumed that the spectrum of a mixed pixel was a linear or non-linear combination of its constituent spectral components (called endmember) weighted by their subpixel fractional cover (called abundance) (Somers et al., 2011). Once the pure spectra of endmembers were obtained, the fraction of each endmember within a mixed pixel can be estimated based on its mixed spectra (Bioucas-Dias et al., 2012). SMA has been widely applied in RS for evaluating vegetation properties. Gitelson et al. (2002) developed an approach to estimate vegetation fraction in sampling zones based on measured spectra of two endmembers (bare soil and dense vegetation). Lobell and Asner presented a SMA approach that employs time series of MODIS data to estimate subpixel fractions of land cover types with R 2 reaching 0.8 (Lobell and Asner, 2004). Therefore, SMA can be a good tool to analyze the influence of spectral mixture to rice yield estimation and it also has good potential to improve the accuracy of rice yield estimation by combining with VIs.
In recent years, unmanned aerial vehicles (UAVs) have become an increasingly used platform for RS application in Precision agriculture due to its high spatial and temporal resolutions (Zarcotejada et al., 2013;Candiago et al., 2015;Fang et al., 2016;Aasen and Bolten, 2018). And the UAV-collected data is playing a more and more important role in monitoring crop growth. Based on multi-band images obtained by an UAV system, this study explores to improve VI-based approach for rice yield estimation by combining with SMA.

Study Area
The study site was located at the Rice Experiment and Research Base of Huazhong Agricultural University near Wuxue City, Hubei Province, China (30.1117 • N, 115.5892 • E). In this investigation, the data from 24 rice plots was studied - Figure 1A. They were of the size about 20 m 2 including around 330 plants and all planted with the same hybrid of rice. The field management for these plots were similar except that different levels of nitrogen fertilizer were applied. Eight levels of nitrogen fertilizer (0, 3, 5.5, 8.5, 11, 14, 16.5, and 19.5 kg/ha) were utilized, and each level was repeated on three randomly distributed plots - Figure 1B. The growing period for rice in our study was from June to September, and field experiments were conducted during booting (13 August, 2015) and heading (29 August, 2015) stages of rice growth. At these two stages, rice has gradually completed the transformation from vegetative period to reproductive period. There was no obvious panicle in rice field at booting stage; on the contrary, the panicle began to emerge at heading stage. During the course of each experiment, one UAV flight was arranged to obtain the image of all rice plots. After the UAV flight (from 11:00 am to 2:00 pm), the corresponding ground measurements were carried out in situ immediately.

Field Data Collection
The LAI and leaf chlorophyll content were measured at booting (13 August, 2015) and heading (29 August, 2015) stage of rice growth. Ground rice canopy LAI was measured using a SunScan canopy analysis system (Delta-T Devices, Ltd., Burwell, Cambridge, United Kingdom) under windless conditions and stable light levels. For each plot, three LAI readings were acquired and their average represented the canopy LAI of the plot. A SPAD-502 Chlorophyll Meter (Soil and Plant Analyzer Development Chlorophyll Meter, Spectrum Technologies, Inc., Plainfield, IL, United States; abbreviated as SPAD) was utilized to measure the leaf chlorophyll content of rice. Five positions were selected in each plot, and the rice leaf chlorophyll content was measured in each position. For each position in plot, three SPAD values of top layer leaf, middle layer leaf, and bottom layer leaf were recorded and the average SPAD indicated the rice leaf chlorophyll content of the position. And the plot level leaf chlorophyll content was the average of five SAPD values in five positions.
At the heading stage of rice, some samples of typical ground features were collected and their spectra were measured as the endmember spectra used for SMA. The spectra of six kinds of endmember were measured using an ASD Field Spec 4 spectrometer (Analytical Spectral Devices Inc., Boulder, CO, United States), including top layer leaf (TL), bottom layer leaf (BL), top layer panicle (TP), bottom layer panicle (BP), dry soil (DS), and wet soil (WS). Samples of different endmember were collected from the studied area and their spectra were measured in situ immediately, and the measurements were carried out in a stable light condition after UAV flight. For soil spectra collection, we walked into the rice field and selected several representative sample-collecting spots in the ridges between plots. The measurement of soil spectra was conducted using ASD probe pointing to the ground vertically at the appropriate height to make sure the field of view was full of wet or dry soil with no other land cover features, and the averaged spectra were used as soil spectra. In the same way, the panicle spectra were obtained with ASD probe pointing downward approximately 10 cm above the panicle samples. Since the rice panicle was small and granular, the samples of panicle were put on a black background and evenly spread. As for the leaf spectra, a selfilluminated leaf clip of ASD was used and then the spectra of top layer leaf and bottom layer leaf were taken respectively. In the whole study area, five positions were selected randomly and the spectra of top layer leaf, bottom layer leaf, top layer panicle, and bottom layer panicle were gained respectively for each plot. Similarly, the averaged spectra were used as their endmember spectra.
At maturity, the all rice plants in each plot were harvested manually for determination of grain yield. The seeds were cleaned and exposed to the sun until their weight did not change. And then all the dry seeds were weighted according to plots and the plot-level yield was obtained.

Surface Reflectance and Vegetation Index Derived From UAV Data
The image of study plots was acquired using a Mini-MCA system mounted on a UAV (S1000, SZ DJI Technology, Co., Ltd., Shenzhen, China) on 13 August and 29 August, 2015 - Figure 2. The Mini-MCA system consists of an array of six individual miniature digital cameras (Mini-MCA 6, Tetracam, Inc., Chatsworth, CA, United States) - Figure 2C. Each camera imager was equipped with a customer-specified band pass filter centered at a wavelength of 490, 550, 670, 720, 800, or 900 nm, respectively, and the band width was 10 nm. These bands were selected since they were commonly employed to analyze vegetation growth-related parameters (Behrens et al., 2006;Ray et al., 2010;Kira et al., 2015).  The Mini-MCA system was attached to the UAV on a gimbal which can help to compensate for the UAV movement (pitch and roll) during the flight and guarantee close to nadir image collection - Figure 2B (Turner et al., 2014). Since the imaging system had a significant camera mis-registration effect, six cameras were co-registered in the laboratory prior to the flight so that corresponding pixels of each lens were spatially overlapping in the same focal plane (Jhan et al., 2016). UAV flight was conducted under clear skies with little cloud cover between 10:00 and 14:00 local time when the changes in the solar zenith angle were minimal. The image was collected at approximately 60 m above the ground with the spatial resolution of 3 cm around.
In this study, an empirical linear correction method was applied to transform image Digital Numbers (DNs) into surface Reflectance (ρ) (Dwyer et al., 1995;Laliberte et al., 2011). Four calibration ground targets were placed in the cameras' field of view as a standard for image radiometric corrections and the image taken from Mini-MCA system included all of them. The calibration targets used in this study have the relatively constant reflectance of 0.06, 0.24, 0.48, and 1 respectively throughout the visible to NIR wavelengths and are specially utilized for aerial image radiometric calibration. As a linear relationship was assumed between DNs and ρ, the canopy surface reflectance can be calculated as ρ λ = DN λ × Gain λ + Offset λ (λ = 490, 550, 670, 720, 800 and 900 nm) (1) Where ρ λ and DN λ are the surface reflectance and digital number, respectively, of a given pixel at wavelength λ; Gain λ ; and Offset λ are gains and bias of the camera at different wavelengths respectively. For each wavelength, Gain λ and Offset λ can be calculated using the least-square method from ρ and DN values (referring to DN 0.06 , DN 0.24 , DN 0.48 , and DN 1 ) of four calibration targets.
For each of 24 plots, we defined a maximum rectangle in the image that fit the plot - Figure 1B. The rectangle included approximated 18000 pixels, and the plot-level VI was retrieved by averaging all of the per-pixel values within the rectangle. The VI tested in this study are shown in Table 1.

Fully Constrained Least Squares Linear Spectral Mixture Analysis
Although UAV made it possible to get high resolution data, one pixel on a UAV image still encompassed several land cover components especially for crops with distinct grains like rice.
When rice was at heading stage, there were six dominant components visible in the field of view - Figure 3, including top layer leaf (TL), bottom layer leaf (BL), top layer panicle (TP), bottom layer panicle (BP), dry soil (DS), and wet soil (WS). And their continuous spectra were collected by an ASD spectrometer used as the endmember spectra for SMA - Figure 4. Compared with the ground measured reflectance spectra, the spectra acquired by the MCA camera onboard UAV were discrete wavebands (490, 550, 670, 720, 800, and 900 nm, 10 nm band width). For each endmember, with reference to the wavelength range of each MCA waveband, the average of ground measured endmember reflectance in the corresponding wavelength range was calculated as the endmember reflectance used for the SMA of UAV images. Therefore, six kinds of endmember reflectance were obtained as: ρ(TL), ρ(BL), ρ(TP), ρ(BP), ρ(DS), and ρ(WS).
In this study, fully constrained least squares linear SMA method was employed to estimate the abundance fractions of components present in an image pixel. For linear spectral mixing model, a mixed pixel was defined as a linear combination of components with their relative concentrations (Heinz, 2001). In this case, the reflectance ρ of a mixed pixel at wavelength λ can be approximated as where e is noise or can be interpreted as a measurement error, N is the number of selected endmembers, Abd i denotes the abundance fraction of endmember i, and ρ λ (i) is the reference reflectance of endmember i at wavelength λ. And for the fully constrained linear mixing model, two partially constraints were considered which were referred to as non-negatively constraint and sum-to-one constraint 0 ≤ Abd i ≤ 1; and   As shown above, the value of abundance was non-negative with its maximum constrained to 1 and for each mixed pixel the sum of the endmember abundance equals to 1. Generally, the abundance that meets these conditions can indicate the proportion of the corresponding endmember within the mixed pixel. An abundance of 0 means no presence of the particular endmember in this pixel, while an abundance of 1 indicates that this pixel is a pure pixel of the particular endmember.
To solve the abundance of these selected six components, a least squares method was applied (Pu et al., 2014). In this study, we used MATLAB (MATLAB 2016a, MathWorks, Inc., Natick, MA, United States) to derive the abundance gray images of six selected endmembers whose gray level values meant the estimated abundance of the corresponding endmember in the image pixel. The plot-level abundance was calculated in the same manner as plot-level reflectance, and for each plot the same maximum rectangle was utilized again. In consideration of endmember selection, the panicle abundance (Abd P ) and leaf abundance (Abd L ) were the sum of the corresponding elements in top layer and bottom layer together.

Data Analysis Among UAV Data, Ground Measured Data, and Rice Yield
In this study, the IBM SPSS Statistics (Statistical Product and Service Solutions 22.0, IBM, Armonk, NY, United States) was used to statistically describe and analyze data. Firstly, a normally distribution test was applied to the rice yield data and the product data of LAI and SPAD value (LAI × SPAD) at booting and heading stage. The result of Shapiro-Wilk test served as the indicator to test whether the data is normally distributed or not. And then, the correlation analysis and regression analysis were successively applied. The Pearson correlation coefficient (r) was exhibited as the result of correlation analysis. And for regression analysis, adjusted R square (R 2 ), root mean square error (RMSE) and p-value were analyzed and compared. The detailed calculation method of above evaluation indices can be found in the help document of SPSS.
The relationship between ground measured data (LAI × SPAD) and rice yield was firstly established at booting and heading stage respectively. And the plot-level VI derived from UAV data was correlated with rice yield directly and compared with the ground measured data. The difference of these two rice growth stages was discussed and analyzed.
At the heading stage of rice, in consideration of the impact of spectral mixture within 1 pixel, the leaf and panicle make different spectral contributions to the reflectance of the mixed pixel. For each studied plot, different proportion of leaf and panicle may affect the accuracy of yield estimation based on plotlevel VI at heading stage. In order to eliminate the influence, plot-level VI was incorporated with abundance information to relate with rice yield and four relationships were developed using 23 samples (one of the rice yield data was obviously wrong): (1) yield vs. VI, (2) yield vs. VI × Abd L , (3) yield vs. VI × Abd P and (4) yield vs. VI × Abd L−P . The different estimation ability for rice yield of these four kinds of index was compared and evaluated.

Algorithm Establishment Using Leave One Out Cross-Validation
The final rice yield estimation model was established using a leave one out cross-validation method. Leave one out crossvalidation is s statistics method widely applied in model establishment and validation (Fielding and Bell, 1997). It divided the samples into two groups, one for training and the other one is used for validation. For leave one out crossvalidation, the validation set just included one sample and the training and validating process was repeated K times (K equals to the number of samples, K = 23 in this study). For each time i, K-1 samples were used iteratively as training data to calibrate the coefficients (Coef i ) of the algorithm with the accuracy was measured in terms of coefficients of determination (R 2 i ), and the remaining single sample was used for validation to obtain the estimation error (E i ). The training and validating process was repeated K times until every single sample was used exactly one time for validation data. After K iterations, the coefficients and accuracy of the final algorithm can be produced as

Relationship Between Ground Measured Data and Rice Yield
In general, the product of LAI and leaf-level SPAD (LAI × SPAD) was used to estimate canopy chlorophyll status of rice and has been proven to be a promising index to predict rice yield (Liu et al., 2017). In this study, LAI × SPAD was calculated according to the ground measured LAI and SPAD value both at booting and heading stage. Among all plots which have yield data, one of the plot-level LAI records was missing and 22 LAI × SPAD values was calculated at these two stages. Before correlation analysis and regression analysis, a normal distribution test was applied to the rice yield data and LAI × SPAD data. The result of Shapiro-Wilk test turned out that the yield and LAI × SPAD data followed normal distribution - Table 2. And a simple linear regression analysis of LAI × SPAD and yield was presented in Figure 5A. At booting stage, the result provided a satisfactory linear fitting equation between yield and LAI × SPAD with a relatively high R 2 value (R 2 = 0.627 * * ). However, at heading stage, an obviously lower R 2 value (R 2 = 0379 * * ) was found compared with booting stage.

Correlations of Vegetation Index With Yield and LAI × SPAD
Vegetation index calculated from reflectance of different wavebands could be used to estimate rice growth parameters and thus has a good potential to predict the rice yield. To compare the relationships of VI to LAI × SPAD and yield, we performed a correlation analysis at booting and heading stages. The result of correlation analysis indicated that the VIs were correlated positively to LAI × SPAD and yield except VARI, with 0.01 significance levels at two individual stages - Table 3. On the whole, the Pearson correlation coefficients (r) of VI and LAI × SPAD were obviously higher than that of VI and yield at these two stages. Compared with heading stage, most VIs exhibited better correlation with yield and LAI × SPAD at booting stage. Generally, the VIs which had better correlation with LAI × SPAD also produced higher r values with yield at both two stages - Figure 6. At booting stage, SR and NDRE showed most strong correlations with yield (r was 0.756 and 0.730 respectively), and the r values of most VIs were above 0.7 except CI green , EVI2 and EVI.
Where it came to heading stage, the situation was different, the r values of all VIs were below 0.7. Among the test indices, NDRE and NDVI had the highest r values with yield (r was 0.697 and 0.695 respectively). In addition, the relationships of VARI vs. yield appeared obvious extremely weak correlation at both booting and heading stage. For the most relevant VI with yield at each stage (SR at booting stage and NDRE at heading stage), their correlation with LAI × SPAD had little difference (r was 0.844 and 0.818 respectively), but the result was obviously different when correlated with yield (r was 0.756 and 0.697 respectively).

Spectral Mixture Analysis in Rice Field
In consideration of the impact of mixed pixel, SMA was applied to the UAV image obtained at rice heading stage. According to the ground measured spectra, there was obvious spectral difference in various endmembers - Figure 4. In the top layer, both leaf and panicle spectra appeared the peak and valley configuration as that of other typical vegetation spectra characteristics. Nevertheless, leaf reflectance was a little higher than panicle reflectance in blue bands (7 vs. 5%) and it is much higher than panicle reflectance in NIR bands. On the contrary, leaf reflectance was a bit lower than panicle reflectance in red bands. Compared to the top layer, both leaf reflectance and panicle reflectance were low in the bottom layer, but leaf reflectance was still higher than panicle reflectance in NIR bands. As for soil endmember, the reflectance decreased at all wavelengths with soil moisture increasing.
In this study, abundance image of each component was obtained using fully constrained least squares linear SMA and the spectra of selected endmember measured by ASD. As shown in Figure 7, significant differences existed in abundance images of different endmembers. On the whole, the brightness of the dry and wet soil abundance images was relatively low. And the bright pixels were mainly clustered in the ridges surrounding the plots in these two abundance images, while the pixels located at rice growing area were really dark. For the other four abundance images, leaf abundance images were obviously brighter than panicle abundance images both in top and bottom layers which was fully corresponded to the actual occurrence in paddy field. Noted that obvious brightness heterogeneity was existed among different plots in the images, and such heterogeneity patterns were quite different in panicle abundance images which indicated the uneven emergence of rice panicles. In consideration of endmember selection, the sum of panicle abundance in top layer and bottom layer was calculated as panicle abundance (Abd P ), and the leaf abundance (Abd L ) was obtained in the same way. A normal distribution test was also used to Abd L and Abd P data, the result of statistical description and Shapiro-Wilk test was presented in Table 2. Evidently, Abd L concentrated near 1, while Abd P concentrated near 0. The aggregation effect of abundance data was more apparent in the fit plot of yield with Abd L and Abd P - Figure 5B.

Rice Yield Estimation Using Vegetation Index and Abundance Data
At heading stage, the uneven emergence of panicles may influence the accuracy of rice yield estimation. Since panicle abundance and leaf abundance were the indicators of how much panicle had been emerging, in our proposed approach the VIs were incorporated with the information of plot-level panicle abundance (Abd P ) and leaf abundance (Abd L ) to estimate the yield of rice. In this part, yield was firstly correlated with VI, VI × Abd L , and VI × Abd P respectively, and the Pearson correlation coefficients (r) were compared - Table 4. Generally, after multiplied by Abd L or Abd P , most VIs produced relatively higher r values with yield than VI alone. The result of correlation analysis indicated that the VI × Abd L was correlated positively with yield; on the contrary, VI × Abd P showed a negative correlation. And the correlation is numerically higher in VI × Abd P and yield than in VI × Abd L and yield. In  particular, VARI showed a really weak correlation with yield at heading stage (r was −0.386), and there was even no correlation in VARI × Abd L and yield. However, after multiplied by Abd P , VARI showed a relatively strong correlation (r was −0.734).
For further analysis, regression analysis has been used, and three linear relationship were developed using 23 samples: (1) yield vs. VI, (2) yield vs. VI × Abd L and (3) yield vs. VI × Abd P . Adjusted R 2 , RMSE and p-value were obtained in SPSS - Table 5. For all tested indices, multiplying abundance data (VI × Abd P and VI × Abd L ) significantly increased the goodness of fit with yield, and using VI × Abd P to regress with rice yield was more accurate than using VI × Abd L with lower RMSE and higher Adjusted R 2 values. Among the Abd L multiplied indices (VI × Abd L ), NDVI × Abd L had the best goodness of fit with yield (Adjusted R 2 was 0.555), while CI green × Abd P for Abd P multiplied indices (Adjusted R 2 was 0.615). Whereas the saturation phenomenon of NDVI, the GNDVI × Abd L was also taken into account. As shown in Figure 5C, the linear fitting result of yield and these three indices (NDVI × Abd L , GNDVI × Abd L and CI green × Abd P ) was acceptable with R 2 above 0.56, and CI green × Abd P produced the highest R 2 (R 2 was 0.633). However, the saturation phenomenon of NDVI was still existed in NDVI × Abd L , and the CI green × Abd P data concentrated near 0 which caused a phenomenon similar to saturation. There was no obvious saturation in the fit plot of yield and GNDVI × Abd L . The Abd P multiplied indices (VI × Abd P ) could produce higher R 2 when regressed with yield, but was subjected to saturation (the value was closed to 0). As for Abd L multiplied indices, they were not susceptible to saturation although their R 2 was relatively lower. And Figure 5B showed that Abd L related positively with yield but Abd P showed a negatively correlation. Therefore, the difference of Abd L and Abd P (Abd L−P ) was next used to incorporate with VI. In this way, the correlation analysis and regression analysis were also applied to the relationship of VI × Abd L−P and yield -Tables 4, 5. Compared with VI, VI × Abd L−P acquired better goodness of fit when regressed with yield, Adjusted R 2 could reach 0.574 and RMSE below 0.282 (NDRE × Abd L−P ). And for the best two indices (NDRE × Abd L−P and GNDVI × Abd L−P ), there was no obvious saturation existing in fit plot - Figure 5D. Then on the base of these, leave one out cross-validation approach was utilized to obtain the final yield estimation model. The specific estimation formulas and the goodness of fit between measured yield and estimated yield was shown in Figure 5E.

DISCUSSION
The primary purpose of this study was to improve the accuracy of rice yield estimation at heading stage based on the UAV data.
Vegetation index (VI) and SMA were incorporated to construct new yield estimation algorithm.
In this paper, the same two sets of data were acquired at booting and heading stage respectively, including ground measured data (LAI × SPAD) and UAV data. At these two stages, rice gradually completed the transformation from vegetative growth to reproductive growth. The interval between each two stages was around 2 weeks, and rice had similar growth status except the influence of panicle. At booting stage, the rice plant flourished but with no panicle appearing in canopy. On the contrary, the panicle gradually emerged in canopy at heading stage with the growth of rice plant. At these two stages, the changes of rice leaves were not obvious - Figure 3.
First of all, the ground measured LAI × SPAD was correlated with rice yield directly at the two individual stages. Before regression analysis, LAI × SPAD data and yield data have passed the normal distribution test. As shown in Figure 5A, the goodness of fit between LAI × SPAD and yield at booting stage was better than that at heading stage. And this result revealed that the booting stage had better predictive ability for rice yield based on ground measured data. The reason for this was probably that the panicle began to emerge at heading stage. Sun et al. (2010) found that the appearance of panicle may lead to the changes of canopy spectral reflectance and thus the predictive ability for yield decreased during the later stages of rice growth. In our study, the LAI × SPAD was calculated from ground measured LAI and SPAD, and the LAI data was obtained by a Sunscan canopy analysis system. According to the principle of the used SunScan instrument, each organ of rice plant (including leaf, panicle, and stem) contributed to the LAI value. Compared with booting stage, the LAI value of heading stage contained extra panicle information. Actually, the LAI × SPAD which derived from green leaves was associated with rice yield (Liu et al., 2017). Therefore, the booting stage had better predictive ability for rice yield than heading stage based on ground measured LAI × SPAD.
The plot-level VI derived from UAV data was also correlated with rice yield. We chose 10 widely used VIs which were successfully applied in estimating vegetation grow-related parameters such as chlorophyll content, LAI, vegetation fraction and grain yield. The similar result was found that the Pearson correlation coefficient (r) between VI and yield at booting stage was mostly higher than that at heading stage - Table 3. And the correlation analysis was also applied to analyze the relationship between VI and ground measured LAI × SPAD. Although the correlation between VI and LAI × SPAD had a bit difference at these two stages, most VIs were well-correlated with LAI × SPAD (r above 0.75). Moreover, the correlation of VI and yield was consistent with the correlation of VI and LAI × SPAD - Figure 6. This result revealed that the VI which had a better good correlation with LAI × SPAD probably produced a higher r value with yield. The LAI × SPAD was, so to speak, a bridge between VI and yield. VI derived from UAV data had correlation with LAI × SPAD and thus had a potential to estimate yield. As shown in Table 3, the satisfactory correlation could be found between VI and LAI × SPAD at both two stages, but the LAI × SPAD correlated VIs showed obviously different correlation with yield at booting and heading stage. As mentioned above, the ground measured LAI × SPAD contained the extra panicle information at heading stage. In this way, the VI, correlated with LAI × SPAD, may also be affected by panicle emergence which lead to the decrease of yield estimation based on UAV data at heading stage.
At heading stage, the plot-level VI was calculated from mixed components including different proportion of panicles, using VI alone for yield regression may introduce unexpected uncertainties. Therefore, the SMA was utilized to improve the predictive ability of VI in rice yield estimation at heading stage.
At this stage, the rice plant had really luxuriant growth and the paddy field was almost covered by rice plants. The canopy of rice was comprised of leaf and panicle - Figure 3. And the background of field was mainly soil (wet soil and dry soil), because the water was drained off. In addition, the field management of rice was strict and there were no other plants affecting the growth of rice such as weed. Therefore, six endmembers were selected, including top layer leaf (TL), bottom layer leaf (BL), top layer panicle (TP), bottom layer panicle (BP), dry soil (DS), and wet soil (WS). Based on fully constrained least squares linear SMA, the abundance images of six mainly endmembers in paddy field were derived. Of the six abundance images we obtained, the brightness of dry and wet soil abundance images was relatively low. And compared to panicle abundance image, the leaf abundance image was obviously brighter both in top layer and bottom layer. This was consistent with the actual situation of paddy field. At heading stage, rice leaves grew lushly and the leaves occupied the largest proportion in paddy field. Due to the different nitrogen treatments applied in 24 plots, the growth situation of rice plant in different plots significantly varied from each other even at the same growth stage. It is observed that there was obvious difference among different plots in leaf abundance images and panicle abundance images - Figure 7. In consideration of endmember selection, the sum of panicle abundance in top layer and bottom layer was calculated as panicle abundance (Abd P ), and the sum of leaf abundance in top layer and bottom layer was calculated as leaf abundance (Abd L ). The result of analysis indicated that neither Abd L nor Abd P followed normal distribution and Abd L concentrated near 1, while Abd P concentrated near 0. This numerical distribution characteristic was in accordance with the sum-to-one constraint condition in fully constrained least squares linear SMA. Accordingly, the abundance data was not adequate for yield estimation due to its aggregation effect - Figure 5B.
Note that, the LAI × SPAD of green leaf was associated with rice yield closely and the VI obtained from UAV data had a good correlation with LAI × SPAD at booting and heading stage. However, the LAI × SPAD derived from VI contained extra panicle information at heading stage which decreased the predictive ability of VI in rice yield. The abundance data, indeed, implies the proportion information of different components in rice field. In this case, Abd L was incorporated with VI to extract the portion of green leaf in total LAI × SPAD (contained panicle portion) and thus the VI × Abd L was used to estimate rice yield at heading stage. For comparison purpose, the relationship of VI × Abd P and yield was also analyzed. The result of correlation analysis revealed that after multiplied by Abd L or Abd P , most VIs produced relatively higher r values with yield than VI alone - Table 3. And the VI × Abd L was correlated positively with yield, while VI × Abd P showed a negative correlation. According to the sum-to-one constraint condition of fully constrained SMA, the correlation of VI × Abd L and yield was opposite to that of VI × Abd P and yield. Obviously, the VI × Abd L , reflected the LAI × SPAD contributed by green leaf, had a positive correlation with yield. The result of analysis also revealed that VI × Abd P exhibited a better correlation with yield than VI × Abd L did. However, as shown in Figure 5C, VI × Abd P concentrated on 0 and the aggregation effect of VI × Abd P produced a relatively higher R 2 . The reason was that Abd P was close to 0 and the value of VI × Abd P deeply depended on Abd P . Although there was no aggregation phenomenon in VI × Abd L , the saturation of VI itself still existed in its corresponding VI × Abd P (NDVI and NDVI × Abd L ).
The Abd L multiplied indices (VI × Abd L ) and Abd P multiplied indices (VI × Abd P ) both had its advantages and disadvantages, VI × Abd P could produce higher r but subjected to aggregation, while VI × Abd L was not affected by aggregation but produced relatively lower r. Therefore, Abd L and Abd P were combined together to give full play to the advantages of the two. In consideration of that Abd L correlated positively with yield and Abd P correlated negatively with yield - Figure 5B, the difference of Abd L and Abd P was calculated as Abd L−P and the relationship between VI × Abd L−P and yield was analyzed. The regression analysis showed that the goodness of fit in VI × Abd L−P and yield was better than that in VI and yield (with higher Adjusted R 2 ) - Table 5. However, not all VIs had better regression results with yield after multiplied by Abd L−P (such as CI green × Abd L−P and VARI × Abd L−P ). And the VI × Abd L−P was influenced by the VI itself, if the VI had a poor correlation with yield, the corresponding VI × Abd L−P may also show a bad correlation with yield. And the result of regression analysis in VI × Abd L−P and yield was better than that in VI × Abd L and yield but worse than that in VI × Abd P and yield. It was important to note that there was no obvious aggregation effect in VI × Abd L−P - Figure 5D. The Abd L−P multiplied indices (VI × Abd L−P ) combined the advantages of VI × Abd L and VI × Abd P and performed a better goodness of fit with no obvious aggregation effect in the relationship of VI × Abd L−P and yield. Among all the tested VI × Abd L−P , two indices which had highest Adjusted R 2 were selected (NDRE × Abd L−P and GNDVI × Abd L−P ) and then leave one out cross-validation approach was utilized to obtain the final yield estimation model - Figure 5E. The result proved that NDRE and GNDVI multiplied by Abd L−P could accurately estimate the rice yield at heading stage with R 2 reaching 0.6 and estimation errors below 10%.
In this study, we developed a new approach to estimate rice yield at heading stage using the integration of VI and abundance information retrieved from the UAV image. The approach was simple but it given some significant enlightenment for the yield estimation of grain crop like rice. In the application of RS, the impact of spectral mixture must be considered and this is also important to UAV RS technology which has a really high spatial resolution. And the SMA is a good way to get rid of the influence of different spectral components contained in remotely sensed images. Although the endmembers proposed in our approach were limited to rice yield estimation at heading stage, this work may offer a theoretical framework for yield estimation in grain crops which have obvious grain with significantly different spectra from their leaves. In the future study, we will try to apply this approach to satellite data and in other crops.

CONCLUSION
In this study, we developed an approach to improve the estimation of rice yield at heading stage using UAV-based Vegetation Index and abundance data. Compared with booting stage, a relatively weaker relationship between VI and rice yield was found at heading stage. The reason was the uneven emergence of rice panicle at heading stage which caused the decrease of predictive ability of VI for rice yield. In order to improve the accuracy of yield estimation at heading stage, a fully constrained least squares linear spectral mixture method was used to eliminate the influence of the panicle appearance on yield estimation. The abundance images of six mainly endmembers in paddy field was produced based on the six-band UAV image and ground measured spectra, including top layer leaf, bottom layer leaf, top layer panicle, bottom layer panicle, dry soil, and wet soil. The integration of plot-level VI and abundance information can estimate rice yield more accurately than using VI alone. Among the test VIs, NDRE, and GNDVI multiplied by the difference of leaf and panicle abundance were the most accurate for yield estimation in rice under different nitrogen fertilizer treatment with estimation errors below 10%.

AUTHOR CONTRIBUTIONS
SF conceived of the research ideas and built the infrastructure for the study site to make this research possible. SW provided rice yield and advised on data collection. YG and YP designed the experiments in detail and provided valuable guidance on data analysis. RZ and XW provided important insights and suggestions on this research from the perspective of agronomists. BD performed the majority of the data processing and provided the writing of this paper. All authors read and approved the final manuscript and significant contributions to this manuscript.