Integrating random forest and crop modeling improves the crop yield prediction of winter wheat and oil seed rape

The fast and accurate yield estimates with the increasing availability and variety of global satellite products and the rapid development of new algorithms remain a goal for precision agriculture and food security. However, the consistency and reliability of suitable methodologies that provide accurate crop yield outcomes still need to be explored. The study investigates the coupling of crop modeling and machine learning (ML) to improve the yield prediction of winter wheat (WW) and oil seed rape (OSR) and provides examples for the Free State of Bavaria (70,550 km2), Germany, in 2019. The main objectives are to find whether a coupling approach [Light Use Efficiency (LUE) + Random Forest (RF)] would result in better and more accurate yield predictions compared to results provided with other models not using the LUE. Four different RF models [RF1 (input: Normalized Difference Vegetation Index (NDVI)), RF2 (input: climate variables), RF3 (input: NDVI + climate variables), RF4 (input: LUE generated biomass + climate variables)], and one semi-empiric LUE model were designed with different input requirements to find the best predictors of crop monitoring. The results indicate that the individual use of the NDVI (in RF1) and the climate variables (in RF2) could not be the most accurate, reliable, and precise solution for crop monitoring; however, their combined use (in RF3) resulted in higher accuracies. Notably, the study suggested the coupling of the LUE model variables to the RF4 model can reduce the relative root mean square error (RRMSE) from −8% (WW) and −1.6% (OSR) and increase the R 2 by 14.3% (for both WW and OSR), compared to results just relying on LUE. Moreover, the research compares models yield outputs by inputting three different spatial inputs: Sentinel-2(S)-MOD13Q1 (10 m), Landsat (L)-MOD13Q1 (30 m), and MOD13Q1 (MODIS) (250 m). The S-MOD13Q1 data has relatively improved the performance of models with higher mean R 2 [0.80 (WW), 0.69 (OSR)], and lower RRMSE (%) (9.18, 10.21) compared to L-MOD13Q1 (30 m) and MOD13Q1 (250 m). Satellite-based crop biomass, solar radiation, and temperature are found to be the most influential variables in the yield prediction of both crops.


Introduction
Accurate crop monitoring in response to climate change at a regional scale plays a significant role in developing agricultural policies, improving food security, forecasting, and analyzing global trade trends (Jeong et al., 2016). The emergence of new technologies, such as simulation crop growth models (CGMs) and machine learning (ML), to synthesize and analyze large-scale data with high computing performance has increased the ability to accurately predict crop yields Bogard et al., 2020;Ersoz et al., 2020;Shahhosseini et al., 2020;Washburn et al., 2020). These technologies have each provided unique capabilities and significant advancements in prediction performance; however, they have been mainly assessed separately, and there may be benefits in integrating them to increase further prediction accuracy (Daw et al., 2017;Shahhosseini et al., 2021).
Since the 1960s, CGMs have reached a high degree of success in simulating the behaviour of real crops (i.e., by predicting their final state of total biomass or harvestable yield) (Dhillon et al., 2020). CGMs are a set of mathematical equations pre-trained using a diverse set of experimental data from various environments and are further refined (or calibrated) for more accurate predictions in each study (Kasampalis et al., 2018). They are increasingly applied as tools for decision-making and research, providing quantitative and temporal information on plant growth and development by including the effect of various climate variables (Mirschel et al., 2004;Murthy, 2004). Because CGMs lack spatial information, many studies have used them for forecasting applications by integrating them with remote sensing (RS) data (Clevers et al., 2002). The RS technology provides synoptic, timely, repetitive, and cost-effective information about the surface of the Earth (Justice et al., 2002;Ali et al., 2022); however, the cloud and shadow gaps in the optical satellite data can hinder or limit CGMs from producing accurate yield results (Roy et al., 2008;Gevaert and García-Haro, 2015). To fill the data gaps, many studies have successfully used multitemporal data fusion, combining the data obtained from two different sensors of different spatial and temporal scales (Benabdelouahab et al., 2019;Htitiou et al., 2019;Dhillon et al., 2020;Lebrini et al., 2020).
Due to its public availability of code and simplicity of usage, the spatial and temporal adaptive reflectance fusion model (STARFM) (Gao et al., 2006) is widely used to combine Landsat or Sentinel-2 with the Moderate Resolution Imaging Spectroradiometer (MODIS) (Xie et al., 2016;Zhu et al., 2017;Cui et al., 2018;Lee et al., 2019). Numerous studies successfully utilized the multi-temporal data fusion for deriving the leaf area index (LAI), or fraction of absorbed photosynthetically active radiation (FPAR) obtained from vegetation indices, e.g., the normalized difference vegetation index (NDVI), in combination with CGMs to estimate crop biomass or yield in different study regions around the world (Hwang et al., 2011;Bhandari et al., 2012). Similarly, many studies have compared the performance of different CGMs by implementing them on the same crop and in the same study region (Eitzinger et al., 2004;Dhillon et al., 2020). For example, in the preceding work, we compared five CGMs for simulating the biomass of selected winter wheat (WW) fields in the federal state of Mecklenburg-West Pomerania in northeast Germany. We found that the AquaCrop and semi-empiric Light Use Efficiency (LUE) are highly applicable and precise than the WOFOST, CERES-Wheat, and CropSyst (Dhillon et al., 2020).
Even though CGMs have a reasonable prediction accuracy, they are not readily applicable due to the data calibration requirements, long runtimes, and data storage constraints (Drummond et al., 2003;Puntel et al., 2016;Shahhosseini et al., 2019). Moreover, their specified designs restrict them to considering only limited climate parameters, whereas the other essential climate elements were neglected, which might benefit from further increasing the prediction accuracy. On the other hand, ML models can deal with linear and non-linear relationships by obtaining quality results with lower runtimes-plus, they can input a vast range of climate elements, likely positively affecting the accuracy of crop yields (De'ath and Fabricius, 2000). Moreover, they are easy to implement as they usually provide a less complex calibration and have fewer data storage constraints (Shahhosseini et al., 2019). Numerous ML algorithms (such as linear regression, decision tree, and random forest (RF)) were applied to the RS data for the prediction of agronomic variables (Basso and Liu, Frontiers in Remote Sensing frontiersin.org 02 Khaki and Wang, 2019;Haque et al., 2020;Khaki et al., 2020). Exemplary, the RF is a non-parametric advanced classification and regression tree (CART) analysis method that has been researched widely in many scientific fields. Most applications of RF have been focused on its utility as a classification tool with only limited studies exploring its regression capabilities for predicting crop yields (Vincenzi et al., 2011;Mutanga et al., 2012;Fukuda et al., 2013). However, some studies found that the RF could overfit data, making it unstable for crop yield estimation (Breiman, 2001;Segal, 2004). Moreover, RF could partially depend on variables of less importance that might affect the prediction accuracy negatively (Jeong et al., 2016). Therefore, coupling ML models with CGMs could be tested by training an RF model with the output of a crop model so that the RF can have the potential of overfitting issues within the range of training data.
Many studies have combined CGMs with simple regression models; however, to our knowledge, there are rare studies systematically investigating the effect of coupling ML and CGMs (Shahhosseini et al., 2021). The present study hypothesized that merging CGMs with ML models will improve yield prediction accuracy by combining the spatial crop biomass output of the LUE model (considered the most accurate, precise, and reliable (Dhillon et al., 2020)) with the RF model for WW and oil seed rape (OSR) in Bavaria. For this study, different RF models (RF1 (input: NDVI), RF2 (input: climate variables), RF3 (input: Normalized Difference Vegetation Index (NDVI) + climate variables), RF4 (input: LUE generated biomass + climate variables), and one semi-empiric LUE model were designed with different input requirements to find the best predictors of crop monitoring. In addition, the study investigates the accuracy of model outputs based on the spatial resolution of the RS products (without cloud and shadow gaps) inputting two STARFM-derived synthetic NDVI products (Landsat (L)-MOD13Q1 (30 m, 8 days) and Sentinel-2 (S)-MOD13Q1 (10 m, 8 days and one real NDVI product (MOD13Q1 (250 m, days)) (Dhillon et al., 2022). The specific research objectives include: (i) Explore whether only NDVI (RF1) or climate elements (RF2) or both NDVI and climate elements (RF3) are the best predictors of crop monitoring using RF models; (ii) Investigate the performance of LUE alone and its coupling with RF (RF4) for crop yield prediction of WW and OSR; (iii) Highlight the effect of different spatial scales (10 or 30 or 250 m) for crop yield estimation.

Materials and methods
The study's general workflow shows the input criteria for four different RF models (RF1, RF2, RF3, and RF4) and one LUE model designed to calculate the crop yield for Bavaria in 2019 ( Figure 1).
Firstly, the pixel level satellite and climate inputs are masked out for every field of every region of Bavaria using the InVeKos data (source: www.ec.europa.eu/info/index_en) for WW and OSR. Secondly, the spatiotemporal-metrics (STMs) [such as minimum, maximum, mean, standard deviation (sd) and sum] for pixel-based time series (between the SOS and the EOS of WW and OSR) are calculated for every field. Then the field values are integrated at a regional level.
The STMs of NDVI data and climate elements are inputted into respective RF models in the following steps. The NDVI is the only spatial input for the RF1, whereas the yield output of the model is tested at different spatial resolutions of 10, 30, and 250 m. Similarly, the climate variables (CV) are used as input for the RF2 model. The RF3 model combines satellite NDVI and CV and tests the yield prediction at different spatial scales. Prior to that, LUE model results of crop yield are generated by inputting NDVI data and climate elements (CV2) required by the model. In the last steps, the LUE model (crop biomass as an input) is linked with the RF4 model. As CV2 is already inputted in the LUE model, for RF4, CV3 (CV2 are subtracted from the CV) is used as an input. The study's main objective is to test the performance by coupling the crop simulation model and machine learning; therefore, the LUE model is coupled with the RF4 model, and the crop yield performance is tested for different satellite products.
All RF models are trained with 70% and tested with 30% of the crop yield data available at the regional level for both WW and OSR from the legal authorities [(i.e., Bayerisches Landesamt für Statistik (LfStat)]. Two synthetic (L-MOD13Q1: 30 m and 8-day; S-MOD13Q1: 10 m and 8-day) and one real-time (MOD13Q1: 250 m and 8-day) satellite NDVI time series [completed in preceding work (Dhillon et al., 2022)] is used as an input criterion for the RF and LUE models. The input NDVI and the climate data are selected for the respective start of the season (SOS) and end of the season (EOS) for WW and OSR in 2019.

Study area
The study region is the federal state of Bavaria located between 47˚N and 50.5˚N, and between 9˚E and 14˚E, in the southeastern part of Germany ( Figure 2). The climate of the region is influenced by the region's topography, with higher elevations in the south (northern edge of the Alps) and east (Bavarian Forest and Fichtel Mountains). The mean annual precipitation sums range from approx. 500 to above 3100 mm, with wetter conditions in the southern parts of Bavaria. The mean annual temperature ranges from −3.3°C to 11°C, but in most the regions, the temperature ranges between 8°C and 10°C (Dhillon et al., 2022). As the largest state in Germany, Bavaria covers an area of approx. 70,500 km 2 , covering almost one-fifth Frontiers in Remote Sensing frontiersin.org 03 of Germany. The federal state is divided into 96 counties with 71 rural districts (so-called "Landkreise") and 25 city districts (so-called "Kreisfreie Städte"). For the year 2019, the landcover of Bavaria covers 31.56% of the area under agriculture (Dhillon et al., 2022).
The study investigates the importance of 8-day temporal satellite data with different spatial resolutions and climate data of several meteorological parameters in crop yield prediction. The updated InVeKos data (of 2019) is used to obtain the reference field information of WW and OSR for every district of Bavaria. Table 1 briefly describes the user data and indicates the spatial and temporal resolutions.

Satellite data
The present study used two synthetic [L-MOD13Q1 (30 m, 8-days) and S-MOD13Q1 (10 m, 8-days)] and one real-time [MOD13Q1 (250 m, 8-days)] NDVI time series, which were generated in preceding work by (Dhillon et al., 2022)] as an input to the RF and LUE models. In the synthetic NDVI products, the cloud and shadow gaps in the real-time Sentinel-2 and Landsat data for 2019 were filled using the spatial and temporal adaptive reflectance fusion model (STARFM), which blends the coarse spatial resolution of MODIS and high spatial resolution of Landsat/Sentinel-2 data. In addition, the MOD13Q1 V6 product (just the MODIS NDVI time series without image fusion) is also

FIGURE 1
Conceptual framework of the study that explains the methodology of four random forest (RF1, RF2, RF3, and RF4) models with different input requirements to predict crop yield for winter wheat (WW) and oil seed rape (OSR). The semi-empiric light use efficiency (LUE) model is coupled with the RF4 model. CV belongs to climate variables and CV2 are the set of CV required by the LUE model. CV3 (CV minus CV2) are the set of CV required by the RF4 model. Landsat(L)-MOD13Q1, Sentinel-2(S)-MOD13Q1, and MOD13Q1 are the satellite inputs [generated by (Dhillon et al., 2022)].

Frontiers in Remote Sensing
frontiersin.org 04 selected as an input to the RF and LUE models to allow the comparison of crop yield outputs at high (10 m), medium (30 m) and coarse (250 m) spatial scales. The 8-day time series for the RS products are considered starting at the day of the year (DOY) 1 (1 January) till 353 (19 December) for 2019. For crop modelling and machine learning algorithms, this study inputs the 8-day satellite datasets from the stem elongation phases till the flowering stages of both WW and OSR. For OSR, the SOS is 15th February, and the EOS is 20th April 2019 (Zamani-Noor andFeistkorn, 2022). And for WW, the SOS and EOS period lies between 15th of April to 30th of June 2019 (Harfenmeister et al., 2021).

FIGURE 2
Overview of the study region with spatial information of winter wheat (WW) and oil seed rape (OSR) fields (left). The dark green color shows the fields of WW and dark orange shows the fields of OSR in 2019. The enlargement (displayed with a dark red box on the top left map) shows the detailed version of WW and OSR fields. The bottom right map shows the different districts with their administrative zones in Bavaria. temperature, solar radiation (Rs, MJm −2 day −1 ), sunshine hours (N, hours), relative sunshine duration (n, hours), precipitation (P, mm), soil moisture (Sm, mm), relative humidity (RH, %), wind speed at 2 m height (WS, ms −1 ), runoff (RO, mm), deep percolation (DP, mm), snow cover (Snow, mm), extra-terrestrial radiation (Ra, MJm −2 day −1 ), Sublimation (Sublim, mm) and evapotranspiration (E, mm). The CV were obtained by dynamical downscaling the ERA5 reanalysis dataset (Hersbach et al., 2020), provided by the European Centre for Medium-Range Weather Forecasts, to a horizontal grid resolution of 2000 m using the hydrologically enhanced Weather Research and Forecasting model (Gochis et al., 2018;Skamarock et al., 2019). A detailed description of the selected downscaling approach is provided by (Arnault et al., 2018) and (Rummler et al., 2019). For this research, the daily climate data is aggregated into 8 days' temporal products and adapted to the RF and LUE models. Like the satellite data, the present study considers the 8day climate data for the same SOS and EOS for WW and OSR as described in Section 2.1.1.

InVeKos data
The InVeKos data is field-based data used to identify the fields of WW and OSR in 2019. The data is collected through the Integrated Administration Control Systems (IACS) (www.etc. europa.eu/info/index_en) that is available for all agricultural plots in the European Union (EU) countries by allowing farmers to graphically indicate their agricultural areas. In the IACS, EU countries are responsible for administering and controlling payments to farmers through a principle called shared management.

Bayerisches Landesamt für statistik (LfStat) crop yield data
The LfStat crop yield is a database that provides crop yield of 29 crop categories including WW and OSR in Bavaria at a regional level (source: https://www.statistikdaten.bayern.de/genesis/online/ , Statistics Code: 41241). In this study, LfStat crop yield data of WW [total number of observations (n = 65 and OSR n = 50)] is used for training (70%) and testing (30%) the RF models and for validating the LUE model (100%), respectively (see Figure 1).

LUE model
The study used the semi-empiric LUE model based on the principle of light use efficiency theory (Monteith, 1972;Monteith, 1977). As it is proven to be reliable, precise, and accurate, this study used the same approach to calculate crop yield and biomass as adopted by (Dhillon et al., 2020). The model monitors the growth of WW and OSR by assessing the impact of climate variables over a period of 8 days between their respective SOS and EOS and calculates the crop biomass as a cumulative sum. The climate variables used by the LUE model are (CV2) T_ mean, T_max, T_dew, Rs, and RH. The model is based on a semiempirical approach and calculates the aboveground biomass as a cumulative sum between the stem elongation phase till the flowering stage of both WW and OSR (Eq. 1), where PAR is photosynthetically active radiation (MJ m −2 d −1 ), FPAR is the fraction of PAR absorbed by the canopy, є is the actual light-use efficiency (g C M J −1 ) and i is the time frame between the SOS (stem elongation) and the EOS (flowering period) of both crops. The total aboveground biomass calculated by the LUE model is equivalent to the net primary productivity (NPP) (kg ha −1 yr −1 ) (Monteith, 1972;Gitelson et al., 2012). The detailed stepwise procedure of the LUE model is explained in (Dhillon et al., 2020).

Random forest (RF) model
The study trained and used four RF models (RF1, RF2, RF3, and RF4) (see Figure 1), binary-bias machine-learning methods, to predict crop yields for WW and OSR. RF can be used for classification and regression purposes and this study used it as a regression tool. The RF model is trained by many classification and regression trees (CARTs) that are grown with a random subset of predictors Many random trees are generated when the source data for the model is bootstrapped and, finally, the forest (group of random trees) of the CART is averaged. A more detailed explanation of the model is provided by (Breiman, 2001). The study used the 'randomForest' package in the software R for each RF model (Liaw and Wiener, 2002;Team, 2013) (Table 2). The value of mtry has been approximately considered by dividing the total number of predictors by 3. A variable analysis tool from the randomForest package analyses the variable importance. The mean decrease accuracy is used as a measure of variable importance. The out-of-bag (OOB) performance estimation is analyzed for assessing the performance by averaging the node's mean decrease accuracy before and after permuted.

Statistical analysis
The modeled crop yield data from four RF and LUE models are validated with the Landesamt district-wise yield data collected from the statistics department of Bavaria for the year 2019. From the modeled and referenced yield, the determination coefficient (R 2 ) (Eq. 2), mean error (ME), root mean square error Frontiers in Remote Sensing frontiersin.org 06 (RMSE), and relative RMSE (RRMSE) (Eqs 3, 4, 5) are calculated. The lower the value of ME, RMSE and RRMSE the better the model performed. This study considers RRMSE <15% as good agreement; 15-30% as moderate agreement; and >30% as poor agreement (Yang et al., 2014). A linear regression model (LRM) is performed to establish a linear relationship between the referenced and modeled yield of WW and OSR at different spatial scales (10, 30, and 250 m).
where P i is the predicted value, O i is the observed value, Ō is the observed mean value, and n is the total number of observations. The significance of the used models is checked by analyzing the probability value (p-value), which is calculated using the LRM with a H 0 that there is no correlation between the referenced and modeled yield, and an H 1 that the relationship exists. To perform this test, a significance level (called alpha (α)) is set to 0.05. A p-value of less than 0.05 shows that a model is significant, and it rejects the H 0 that there is no relationship.

RF1: NDVI as the only predictor of crop yield monitoring
With L-MOD13Q1, S-MOD13Q1, and MOD13Q1 NDVI inputs, the RF1 model performed significantly for WW and OSR (p-value <0.05). The R 2 obtained from the S-MOD13Q1 NDVI product has a higher accuracy than the L-MOD13Q1 and MOD13Q1 ( Figure 3). Based on the R 2 of different spatial resolutions of the NDVI products for WW and OSR, the RF models resulted in descending order as S-MOD13Q1 (10 m), L-MOD13Q1 (30 m), and MOD13Q1 (250 m), with R 2 values as 0.66/0.61, 0.66/0.50, and 0.60/0.26, respectively. For quality and precision, the ME and RMSE values give a more complete picture of the performance of RF with NDVI as the only predictor. The ME and RMSE of WW from MOD13Q1 (8.21 dt/ha and10.30 dt/ha) are higher than that of L-MOD13Q1 (8.18 dt/ha and 10.20 dt/ha) and S-MOD13Q1 (5.65 dt/ha and 7.96 dt/ha), respectively ( Figures  3A,C,E). Similarly, for OSR, S-MOD13Q1 has the lowest ME and RMSE of 2.76 dt/ha and 3.76 dt/ha, as compared to L-MOD13Q1 and MOD13Q1 ( Figures 3B,D,F). Overall, the S-MOD13Q1 (RRMSE = 11.40% (WW)/11.23% (OSR)) results are more accurate than L-MOD13Q1 (14.33%/12.28%) and MOD13Q1 (14.83%/14.32%) for both WW and OSR.

RF2: Climate variables (CV) as the only predictors of crop yield monitoring
With climate elements as input parameters, the RF2 model performed significantly for both WW and OSR (p-value <0.05) ( Figures 4A,B). The R 2 obtained for WW has shown a higher accuracy (R 2 = 0.57) than the OSR (R 2 = 0.50). However, the OSR (RMSE = 4.23 dt/ha) resulted in higher preciseness than the WW (RMSE = 10.60 dt/ha). Moreover, the RRMSE for WW shows moderate agreement (15.28%) between the observed and predicted yield. The CV importance for WW are mainly N, E, Ra, Tdew, Sm, and Rs; however, for OSR, Tmin, WS, Ra, Rs, and Snow are of high importance (Figures 4C,D).

RF3: NDVI and CV predictors for crop monitoring
With climate parameters and L-MOD13Q1, S-MOD13Q1, and MOD13Q1 NDVI inputs, the RF3 model performed significantly for WW and OSR (p-value <0.05). The R 2 obtained from the S-MOD13Q1 NDVI product has a higher accuracy than the L-MOD13Q1 and MOD13Q1 ( Figure 5). Based on the R 2 of different spatial resolutions of the NDVI products for WW and OSR, the RF models resulted in descending order as S-MOD13Q1 (10 m), L-MOD13Q1 (30 m), and MOD13Q1 (250 m), with R 2 values of 0.75 (WW)/0.66(OSR), 0.72/0.61, and 0.67/0.53, respectively. For quality and precision, the ME and RMSE values give a more complete picture of the performance of RF with NDVI as the only predictor. The ME and RMSE of WW from MOD13Q1 (5.56 dt/ha and 8.10 dt/ha) are higher than that of L-MOD13Q1 (5.45 dt/ha and 7.98 dt/ha) and S-MOD13Q1 (4.94 dt/ha and 7.56 dt/ha), respectively ( Figures  5A,C,E). Similarly, for OSR, S-MOD13Q1 has the lowest ME and RMSE of 2.70 dt/ha and 3.78 dt/ha, as compared to L-MOD13Q1 (2.77 dt/ha and 3.85 dt/ha) and MOD13Q1 (3.11 dt/ha and 4.08 dt/ha) ( Figures 5B,D,F). The RRMSE is decreased by −6.57% and −7.23% between S-MOD13Q1 (10.66% (WW) and 11.67% (OSR)) and MOD13Q1 (11.41% and 12.58%) for WW and OSR, respectively. The mean and sum of NDVI have a higher impact on the accuracy assessment of WW yield; however, NDVI has less impact on the crop yield prediction of OSR ( Figure 6). Other than that, E, Ra, Sm, and N have a higher influence on WW's yield prediction ( Figure 6A). For OSR, Snow, Temperature, and Sm have shown a higher influence ( Figure 6B).

Overall comparison of models
The bar plots in Figure 10 show the accuracy assessment of estimating crop yields of WW and OSR using different models with different inputs. For the model RF1 (where NDVI is the only predictor), the MOD13Q1 has the lowest R 2 [0.60 (WW)/0.26 (OSR)] while both L-MOD13Q1 and S-MOD13Q1 have almost the same R 2 values (0.66/0.50) for both WW and OSR. However, for WW, the RMSE and RRMSE has shown a different trend with lower values (7.96 dt/ha, 11.40%) for S-MOD13Q1, and higher (10.22 dt/ha, >14.00%) for both MOD13Q1 and L-MOD13Q1. The RF2 model (where climate variables are predictors), has shown the lowest accuracy [R 2 : 0.57 (WW), 0.50 (OSR)] and preciseness (RMSE: 10.6 dt/ha, 4.23 dt/ha) as compared to the RF1 model. The RF3 model (where CV and NDVI are the predictors), has improved the accuracy estimation of RF1 with higher R 2 (CV + S-MOD13Q1> CV + L-MOD13Q1> CV + MOD13Q1) and lower RMSE and RRMSE (CV + S-MOD13Q1< CV + L-MOD13Q1< CV + MOD13Q1) for both WW and OSR. On the other hand, the LUE model has further improved the R 2 and RMSE values for both crops than the RF3 model. The accuracy

Accuracy assessment based on different spatial inputs
The box plots in Figure 11 show the contribution of different spatial inputs to LUE and RF models crop yield estimations of Bavaria for WW and OSR. Among all models, the S-MOD13Q1 (10 m) result in higher mean R 2 [0.80 (WW)/0.69 (OSR)], lower RMSE (dt/ha) (6.38/3.05), Frontiers in Remote Sensing frontiersin.org 10

FIGURE 6
Bar plots of the variable importance of WW and OSR after validation of the modeled yield with referenced yield using the RF3 model. The green color represents WW, and the orange color represents OSR. (A) Variable importance for WW (B) Variable importance for OSR.  Moreover, the MOD13Q1 resulted in better performance for OSR than WW. For L-MOD13Q1 and MOD13Q1, the mean R 2 (0.77 (WW)/0.69 (OSR), 0.67/0.55), RMSE (dt/ha) (7.29/3.06,

FIGURE 10
Bar plots for the overall accuracy assessment of WW and OSR with four RF (RF1, RF2, RF3, RF4) and one LUE model with different input variables (shown in the legend at right). (A) R 2 , (B) RMSE, (C) ME, and (D) RRMSE for WW and (E) R 2 , (F) RMSE, (G) ME, and (H) RRMSE for OSR using different models with various inputs, respectively.

Spatial distribution of crop yields for WW and OSR on regional scale
The maps in Figures 12, 13 describe the region-wise spatial distribution of referenced and predicted (obtained from RF1, RF2, RF3, LUE and RF4) yield for WW and OSR by inputting S-MOD13Q1 (10 m) in Bavaria for the year of 2019, respectively. For both crops, the yield prediction by the RF4 (coupling of LUE and RF) has better synchronization with the observed yield results compared to the other four models. The referenced OSR and WW yields have higher values in the southern regions of Bavaria. Almost all models for OSR have shown higher values in respective regions; however, for WW, only RF4 and LUE modelled yields obtained higher values (>85 dt/ha) and other models have estimated between 55 and 85 dt/ha. The referenced OSR yield values for the central part of Bavaria observed higher yield between 32 and 44 dt/ha; however only RF4 and RF3 models had predicted the accurate amount.

Discussion
This study addresses the importance of coupling the RF model with the LUE model to improve the accuracy of crop yield estimation of WW and OSR for Bavaria in 2019. The present study is among the rare other studies that ensemble models to increase crop yield predictability. This study demonstrated that introducing the LUE output spatial biomass plus climate parameters into the RF model (RF4) and utilizing them as inputs to a prediction task on average can decrease the prediction error measure by RMSE from 5.03-4.98 dt/ha (for WW) and 2.14-1.96 dt/ha (for OSR). In addition, the predictions made by the RF4 model show less bias towards the actual regional yields. Similar studies in this area are only limited to coupling the simplest statistical models with crop growth models (Chakraborty et al., 2005;Hadria et al., 2006;De Wit and Van Diepen, 2007;Dente et al., 2008). However, a related study has coupled the Agricultural Production Systems Simulator (APSIM) variables into machine learning models and estimated the decrease of RMSE between 7 and 20% (Shahhosseini et al., 2021).
The cloud and shadow gaps in the optical satellite data can hinder or limit yield prediction algorithms from producing accurate yield results (Roy et al., 2008;Gevaert and García-Haro, 2015). Many studies employing satellite images aimed to compensate the data gaps present in satellite data by fusing it with another data source for various applications of remote sensing (Barbedo, 2022). The research is conducted at different spatial scales where multiple spatial resolution satellite products [two STARFM-derived synthetic NDVI products (L-MOD13Q1 (30 m, 8 days) and S-MOD13Q1  (Dhillon et al., 2022). The study highlights the importance of high spatial scales in achieving accurate crop yield results. For example, the input products with 10-m resolutions (R 2 > 0.75) resulted in higher accuracy than the 30-m (R 2 > 0.72) and 250-m (R 2 > 0.65) satellite products using RF2, RF3, and RF4 models for WW. Previous studies have also shown that high spatial resolution could significantly improve the accuracy of crop yields (Huang et al., 2016;Dhillon et al., 2020;LIU et al., 2021).
Moreover, the results of the study at hand also demonstrate that variable selection plays an important role in achieving more accurate crop predictions. The time series vegetation index (VI) data derived from satellite images are known as a better predictor for many applications of remote sensing (Zhang et al., 2003;Wardlow et al., 2007;Zhong et al., 2012;Zhang et al., 2013;Kim et al., 2014;Shen et al., 2015); however, this study highlights that NDVI alone could not be used to achieve accurate crop yield results (WW: R 2 < 0.65, OSR: R 2 < 0.45). Moreover, the research found that the combined use of NDVI and climate parameters can help to improve the model performance (WW: R 2 > 0.70, OSR: R 2 > 0.60). The inclusion of relevant climate parameters positively impacted the yield prediction for WW and OSR. For example, extra-terrestrial radiation has higher variable importance for WW and snow cover for OSR. Furthermore, the crop and phenology-related variables (LUE biomass), solar radiation, soil moisture and temperature are the most influential variables in increasing the yield accuracy for WW and OSR.
This study also compares the performance of LUE when used with and without the random forest model. Similar to other studies, the LUE model resulted precisely and accurately with an average R 2 of 0.78 and 0.76 and an RMSE of 6.10 and 2.47 dt/ha for WW and OSR at different spatial scales, respectively (Dhillon et al., 2020). However, a drastic improvement in the accuracy has been seen when the LUE model was linked with the random forest model by including more climate variables as an input. This coupling has increased the R 2 from 0.78 to 0.85 and 0.76 to 0.81 for WW and OSR using different satellite inputs, respectively.
The simplicity and reliability of the present study conclude that this design needs to be implemented for different periods, locations, and crop types to improve the global yield estimation for developing agricultural policies, improving food security, forecasting, and analyzing global trade trends. The study stresses coupling the LUE model with the RF model; however, the applicability of other crop models, such as WOFOST, AquaCrop, or CERES Wheat, on coupling with ML or deep learning (DL) could be tested. Moreover, the study only includes the year 2019 for the state of Bavaria, but the same design could be transferred and tested to other geographical regions at any time scale. Inclusion of climatic variables such as solar radiation, extra-terrestrial radiation, soil moisture, temperature, snow cover (for OSR) and evapotranspiration would be recommended in future studies. Due to the availability of the crop validation data (LfStat) on a regional level, the study integrated the pixel-level information into the district level. This transfer of data (from field to district level) could result in a loss of information, and it might negatively impact the accuracy of the algorithm outcomes. Therefore, to justify the potential of satellite data and machine learning algorithms in crop monitoring, the study recommends testing and validating methodology at the field level. Moreover, as the study validates the accuracy of WW and OSR, the study design might also be tested for different crop types such as Maize, Sugarcane, Rice, etc.

Conclusion
Conclusively, this study stressed the positive impact of combining crop modeling and machine learning to improve the prediction accuracies for the application of agricultural monitoring. Moreover, the crop and phenology-related inputs (LUE biomass), extra-terrestrial radiation, solar radiation, evapotranspiration, extra-terrestrial radiation, soil moisture, snow cover (for OSR) and temperature are the most influential variables that are needed to be considered for increasing the yield accuracy in future studies. The present research concludes the findings as follows: (i) To answer if NDVI or CV is the better predictors of crop yield, the study found that the individual use of NDVI (in RF1) and climate variables (in RF2) would be less accurate in yield prediction than they are used together (in RF3) in machine learning. The accuracy assessment when NDVI is used alone as a crop yield predictor is lower (WW: R 2 < 0.65, OSR: R 2 < 0.45) than it is used together with the climate variables (WW: R 2 > 0.70, OSR: R 2 > 0.60). (ii) To find if the coupling of ML and CGM results in higher accuracy, the study investigated that linking the LUE model's output with the RF model's input (RF4) would increase the crop yields' accuracy drastically. The coupling has decreased the RMSE by -1.00% (for WW) and -8.4% (for OSR), decreased the RRMSE from -8% (WW) and -1.6% (OSR), and increased the R 2 by 14.3% (for both WW and OSR), compared to results of LUE. (iii) To find the impact of high spatial resolution on crop yield estimation, the study concludes that the RS inputs with 10m resolutions resulted in higher accuracy than the 30-m and 250-m with RF2, RF3, LUE, and RF4 models for WW and OSR.
Moreover, the present study is performed at the regional level; however, the availability of field-level yield information could be useful for implementing a similar methodology and obtaining more accurate outcomes. The study design needs to be implemented for different periods, locations, and crop types to improve the global yield estimation for developing agricultural policies, improving food security, forecasting, and analyzing global trade trends. The accurate validations of WW and OSR broaden the scope of the study. Therefore, the simple and reliable design of the study could be tested for other crop types such as maize, sugarcane, or rice on a global scale.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions
MD wrote the first draft of the manuscript. MD and TU contributed to editing. MD and TU contributed to the analysis. MD and TU contributed to conceptualization. MD contributed to methodology. MD and TU contributed to formal analysis. TU contributed to supervision. MS-D, C-KF, TR, and JA contributed to data curation. MD, TD, CK-F, TR, JA, IS-D, and TU contributed to resources. MD, CK-F, and TD contributed to project administration. TU and IS-D contributed to funding acquisition.

Funding
This study was conducted within the LandKlif project funded by the Bavarian Ministry of Science and the Arts via the Bavarian Climate Research Network (bayklif, www.bayklif.de). This publication was supported by the Open Access Publication Fund of the University of Wuerzburg.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.