An integrated, probabilistic model for improved seasonal forecasting of agricultural crop yield under environmental uncertainty

We present a novel forecasting method for generating agricultural crop yield forecasts at the seasonal and regional-scale, integrating agroclimate variables and remotely-sensed indices. The method devises a multivariate statistical model to compute bias and uncertainty in forecasted yield at the Census of Agricultural Region (CAR) scale across the Canadian Prairies. The method uses robust variable-selection to select the best predictors within spatial subregions. Markov-Chain Monte Carlo (MCMC) simulation and random forest-tree machine learning techniques are then integrated to generate sequential forecasts through the growing season. Cross-validation of the model was performed by hindcasting/backcasting and comparing forecasts against available historical data (1987–2011) for spring wheat ( Triticum aestivum L.). The model was also validated for the 2012 growing season by comparing forecast skill at the CAR, provincial and Canadian Prairie region scales against available statistical survey data. Mean percent departures between wheat yield forecasted were under-estimated by 1–4% in mid-season and over-estimated by 1% at the end of the growing season. This integrated methodology offers a consistent, generalizable approach for sequentially forecasting crop yield at the regional-scale. It provides a statistically robust, yet ﬂexible way to concurrently adjust to data-rich and data-sparse situations, adaptively select different predictors of yield to changing levels of environmental uncertainty, and to update forecasts sequentially so as to incorporate new data as it becomes available. This integrated method also provides additional statistical support for assessing the accuracy and reliability of model-based crop yield forecasts in time and space.


MOTIVATION
There is increasing worldwide concern over the social, environmental and economic destructive potential of extreme climatic events and cumulative environmental impacts (e.g., climate variability, invasive pests) (FAO, 2011). Extreme events associated with climate variability disrupt water, energy and food production, supply and availability (Jentsch et al., 2007). Climate extremes are having a major impact on inter-annual wheat production worldwide (USDA, 2013). Unanticipated extreme events can be large enough in some countries to offset a significant portion of the increases in average yields that arose from technology, carbon dioxide fertilization, and other factors (Lobell et al., 2011). Reliable crop forecasts have the potential to greatly aid decisionmakers in identifying potential risks and benefits to increase crop production and to gauge uncertainty, particularly during times where production is uncertain, or across regions where production is highly variable (Hansen, 2005;Littell et al., 2011). Across Africa, food aid imports and emergency assistance, strategic food reserves, and the granting of private firm licences for food import and exports all rely on crop forecasts (Jayne and Rashid, 2010). Extreme climatic events (i.e., drought/flood, extreme cold/heatwaves) impact major food-producing regions in Canada, United States, Russia, China, Australia and North/South Korea, thereby raising the prospect of higher commodity prices and localized food shortages in the future (Global Infomation and Early-Warning System (GIEWS), Food and Agriculture Organization of the United Nations/FAO 1 ). As more extreme weather events are anticipated to accompany a warming climate, improved methods for forecasting crop production and its response to climate and other agronomic factors, are becoming increasingly important to guide agricultural producers in making more informed in-season crop management and financial decisions (IPCC, 2013).
The benefit and value of crop forecasts varies according to a range of criteria, namely, relevance, reliability, stakeholder engagement, holism and accuracy (Hammer et al., 2001;McIntosh et al., 2007). These aspects consider the type, availability, coverage of expert knowledge and empirical data, including the required lead-time, time-step, and duration for forecasting. Utilizing available satellite remote-sensing data, alongside observer-based field survey data provides a more rapid and less costly approach to repeated sampling and updating of regional or national-scale forecasts over large cropland areas of interest, while also helping to increasing forecast accuracy, reliability and consistency. Furthermore, crop yield and production forecasting is gaining increasing interest in ecological science because of the availability of large volumes of data from observational monitoring networks and satellite remote-sensing platforms, increases in computational power, and advances in multivariate optimization methodologies. There is also increasing attention on addressing societal needs for better (i.e., more robust) strategies for managing and exploiting natural resources sustainably, given significant global economic, social and environmental change (Luo et al., 2011).

CROP YIELD FORECASTING
A forecasting system has two primary functions: generation and control, for which there are many different types of designs with varying capabilities and levels of data and knowledge integration. Forecast generation includes acquiring data to revise the forecasting model, producing a statistical forecast and presenting results to the user. Forecast control involves monitoring the forecasting process to detect out-of-control conditions and identifying opportunities to improve forecasting performance. Some systems use simple, calibrated reference curves or semi-empirical equations, while others use statistical models or more complex agro-ecosystem process models. Because of the complex interplay of variables affecting crop yield, a general auto-regressive, integrated moving-average time-series (ARIMA) type model for forecasting is often not suitable. Different levels of engagement relate to the operational cost to survey crops in relation to gain in forecast accuracy, available input data streams, and choice, complexity and extent of innovation of a given model-data assimilation approach (Hammer et al., 2001;Stone and Meinke, 2005). To supplement our focus on crop yield forecasting in the Canadian context, we refer interested readers to supplementary information provided on existing operational systems for crop yield forecasting at various spatial and temporal scales (Supplementary Material).

AGROECOSYSTEM DYNAMICS AND MEASUREMENT UNCERTAINTY
Agroecosystems are systems that comprise dynamic processes that interact and take place over multiple spatial and temporal scales. As a consequence, there are significant changes in the leading predictors that control the underlying state for an observed response at any point in time and space. As leading predictors change, so to, can the signal to noise ratio (SNR) of the underlying processes, introducing loss and gain of information for statistical estimation and forecasting. The SNR can be estimated as a ratio of total variance output from a model to the total variance of model inputs. Recently, the uncertainty associated with temperature-driven processes in regional-scale crop models has been reported to be, on average, 12% higher than climate model uncertainty (Koehler et al., 2013). Reducing such uncertainty requires focusing not only on the climate inputs, but also on testing structural model assumptions on crop development that include: changing senescence, the influence that crop water status and changing carbon dioxide levels can have on canopy temperature and heat stress on a crop, crop growth duration and length of day (Hoffmann and Rath, 2013).
Integrating data from different measurement platforms can help to hedge the risk of relying on a single source of information, especially under situations of high data sparsity and environmental variability. Spatial-based models aid in identifying particular crops, times and regions that are most affected by environmental impacts and to assist efforts to measure and analyze ongoing efforts to adapt (Challinor et al., 2003;Hansen, 2005;Matthews et al., 2013). Ecological models consider agroecosystems as an inter-connected system of plant, soil, atmospheric and other underlying, interacting processes. Statistical models, on the other hand, typically focus on representing and understanding individual components, specific processes and/or scales of interactions within an agroecosystem. Nonetheless, statistical models are being increasingly developed and applied to understand whole system dynamics and behavior and "big data" problems. Traditionally, ecological forecasting has typically been based on process-oriented models, informed by data in arbitrary ways. Although most ecological models incorporate some representation of mechanistic processes, many models are generally not adequate to quantify real-world dynamics and provide reliable forecasts with accompanying estimates of uncertainty (Littell et al., 2011). This is because such mechanistic models can be easily over-parameterized by requiring tens to hundreds of parameters and variables to explain an observed pattern or response "signal," without substantially increasing the SNR by explaining a greater fraction of unexplained "noise." In contrast, probabilistic approaches seek to optimize the smallest number of parameters and variables required to explain a signal and minimize the unexplained noise concurrently. This ensures that the SNRs of model predictions and forecasts decrease and are more robust to any changes in input parameters and variables. Moreover, it may be the case that there are limits to crop yield predictability in terms of limits in the achievable SNR. Such limits may be not depend on whether deterministic/probabilistic model is used, nor model complexity in terms of the number of parameters or variables in a model. major crop commodities can be determined using historical data alone, many other considerations are required to calibrate and generate predictions of future production and its uncertainty. The advancement and changes in crop genetics, farming technology and efficiency, improved agronomic management practices, and ecological dynamics (i.e., soil, water and air/climate) requires one to add a sufficient level of detail into the construction of models that are able to more reliably forecast crop yield and production (Challinor, 2009;Matthews et al., 2013). Moreover, statistical methods aid in accounting for and minimizing measurement bias within climate time-series data that is used in agricultural models and yield forecasting (Hoffmann and Rath, 2012). Applying models at different spatial and temporal scales also contributes additional scaling-based uncertainty, so that the accuracy of a forecasting model at one spatial scale may not be the same, when the same model is applied at another scale (Ewert et al., 2011).
Many current model-based forecast methods are very complex, operate only at the point or field scale, and/or utilize crop phenology information or remote-sensing data, but not both. These aspects limit the application of mechanistic, dynamic crop and agroecosystem models when predicting at the regional-scale, and favor a statistical (i.e., probabilistic) approach that generates predictions based on a smaller subset of leading predictors. In summary, current challenges in using models for forecasting, include: (1) high environmental variability, volatility/stochasticity and spatial heterogeneity, (2) the need for forecasts to be assessed in a dynamic framework, (3) high levels of bias and mismatch of spatial and temporal scales between coupled ocean-atmosphere climate-and crop growth-model forecasts, and, (4) whether required institutional and policy arrangements exist to provide a broader environmental-economic agricultural risk framework to ensure crop forecasts offer a beneficial, viable option for producers to improve their economic livelihood and adopt in the longer term. Such arrangements must provide sufficient flexibility in agricultural management to be able to respond to different levels of perceived, real needs and risks. Many crop forecasting systems only consider a single source of variability. This fails to account for crop yield variability that may be explained by the combined effect of both agroclimate and remote sensing indices. Furthermore, models based on remote-sensing indices alone are only able to provide accurate forecasts late in the growing season when crop growth becomes visible. Likewise, forecasts that do incorporate agroclimate indices, such as precipitation and temperature, fail to include information from remote-sensing data and/or do not make use of the spatial dependence of crop yields.

RESEARCH OBJECTIVE
Our research objective was to devise a probabilistic model to forecast crop yield and to increase forecast skill in time and space by integrating both agroclimate and satellite remote-sensing data. This involved tracking uncertainty across large regions and between subregions, and updating of forecasts sequentially through growing season. This approach addresses several knowledge gaps, namely: the need to utilize satellite remote-sensing data and to incorporate spatially-dependent environmental variation influencing crop development and growth, and the need to provide a probabilistic, adaptable approach that can more objective tune, adjust and sequentially improve its forecasts under changes in the amount of data available and level of environmental variability. The model integrates state-of-the-art Bayesian statistical forecasting techniques employing Markov-Chain Monte Carlo (MCMC)-based simulation, robust regression, variable-selection and random forest-tree machine learning. Specifically, (1) we conducted a sensitivity analysis of the model, comparing its performance under simulated changes in the data that is available for forecasting, (2) compared model performance between different subregions as the model selects and ranks different explanatory or predictor variables to reduce uncertainty in its forecasts, and (3) cross-validated the model was performed by hindcasting/backcasting it and comparing its forecasts against available historical data (1987-2011) for spring wheat (Triticum aestivum L.). It was also validated for the 2012 growing season, comparing its forecast skill at the CAR, provincial and Canadian Prairie region scales against available statistical survey data. Finally, we identify ways the model could be further enhanced, improved and made more reliable when being applied to different crops and large crop production areas.

STUDY REGION
Our study region was the Canadian Prairies encompassing the provinces of Alberta, Saskatchewan and Manitoba (Figure 1). This region accounts for roughly 91% of Canada's total wheat production (Statistics Canada, 2012a). The boundaries of the 40 Census Agricultural Regions (CARs) spanning the study region originated from the 2011 census of agriculture (Statistics Canada, 2012c). The Prairies are the northernmost branch of the Great Plains of North America with a flat or rolling plains landform and is the major region where wheat is grown in Canada and the most altered of Canada's ecozones. The mountains to the west block precipitation that would otherwise fall within the region, making the western portion very dry. Strong Chinook winds also reduce humidity and deliver brief episodes of spring-like conditions in the western regions during winter. Precipitation generally increases eastward. Temperatures are extreme due to negligible ocean buffering. Winter temperatures average −10 • C and summers average 15 • C and are very cold. The growing season is defined as the period of each year when crops can be grown. In Canada the growing season is usually determined by the days between last and first frost (i.e., May to October). Growing season precipitation varies between 300 and 400 mm. The dominant soils within this region are the Chernozemic soils. The Chernozemic soils are strongly determined by the accumulation, decomposition and transformation of soil organic matter within the topsoil (or A horizon), whereby native grassland vegetation and climate both influence the amount and nature of organic matter retained within the soil. Deposition of plant material belowground in the grassland system has been the primary factor whereby soil organic matter accumulates (Fuller, 2010). Regional variations in climate and vegetation distribution have formed the distinct soil zones, namely the: Brown, Dark Brown, Black, and Dark Gray zones that have different depths and soil organic matter content within their surface layers (Pennock et al., 2011).

Agro-climate
Historical crop yield data from 1987 to 2011 for each of the CARs was obtained from the Field Crop Reporting Series of the Canadian Crop Condition Assessment Program (CCAP) of the Agriculture Division, Statistics Canada (Statistics Canada, 2007, 2012a. Station-based daily minimum/maximum temperatures and precipitation data were provided by Environment Canada and other partner institutions through the Drought Watch program, operated by the National Agro-climate Information Service (NAIS) of Agriculture and Agri-Food Canada (AAFC). This data was quality controlled and temporally-interpolated to provide a continuous, historical daily time-series from 1987 to 2012. A total of 259 climate stations were involved in the study (Figure 1). Agro-climatic variables considered in the model were growing-degree days (GDD) above an ambient air temperature of 5 • C, soil water availability (SWA), precipitation (P), and crop water deficit index (WDI) defined as WDI = (1 − AET/PET), where AET and PET are actual and potential evapotranspiration (Moran et al., 1994). AET and PET were computed using the Baier and Robertson algorithm (Baier and Robertson, 1996). Soil water availability (SWA) was defined, at any given time, as the percentage of available soil water holding capacity (AWHC). AWHC at the location of each climate station was determined from soil data obtained from the Soil Landscapes of Canada database from AAFC's Canadian System of Soil Classification (CANSIS, SLC Version 3.2, http://sis.agr.gc.ca/cansis/nsdb/slc/v3. 2/intro.html). Crop-specific AWHC depends on soil type and is defined as the difference between soil water field capacity and wilting point, estimated as the cumulative amount of soil water to a maximum of 1 m soil depth or root-restricting soil layer under a difference in volumetric water at hydrostatic pressure of −33 and −500 kPa (Bootsma et al., 2005). Thermaltime (i.e., GDD-based) indices are widely referenced and used in crop development and growth response to temperature as well as crop heat stress and as an indicator of critical temperature thresholds for vernalization (e.g., 3 • C of cold time) and for tracking crop developmental and growth. For example, in the case of cereal crops like barley and wheat indices are used to track the phenological stages of emergence, tillering, stem elongation, heading/flowering/anthesis, ripening, or for oilseeds like canola: sowing/germination, emergence, leaf initiation, stem elongation, bud formation, flowering, pod development, maturity (Ferris et al., 1998;Bootsma et al., 2005;Carew et al., 2009;Robertson et al., 2013). Nonetheless, other indices or measures of thermal-time requirements for crops such as biometeorological time (BMT) may be more accurate when crop end-use quality aspects are of interest, which for wheat includes: flour protein, farinograph dough development time and farinograph stability (Saiyed et al., 2009). Station-based minimum/maximum temperatures, precipitation and AWHC were input into a cropspecific soil water balance equation called the Versatile Soil Moisture Budget (VSMB) to generate the corresponding agroclimatic indices for each climate station at a daily time step (i.e., GDD, SWA, AET, PET) (DeJong, 1988;Baier and Robertson, 1996). More than one climate station was referenced within most CARs, but the number of stations within a given CAR did vary. In generating the agro-climate indices using VSMB, the start and end of growing season is determined by the accumulation of sufficient heat units and soil moisture. Daily agro-climatic values were temporally-averaged by month. We provide a summary of our data sources used in simulating, training and validating the forecasting model in Table 1.

Satellite, remotely-sensed indices
Normalized-difference vegetation index (NDVI) was also included as a predictor of yield. It is defined as NDVI= (ρNIR − ρRED)/(ρNIR + ρRED), where ρNIR and ρRED are the nearinfrared and infrared portions of the electromagnetic spectrum of (0.75-1.5 µm or 841-876 nm) and (0.6-0.7 µm or 620-670 nm), respectively. This index is based on the contrast between the maximum absorption in the red due to chlorophyll pigments (0.4-0.7 µm) and the maximum reflection in the infrared by leaf cellular structure (0.7-1.1 µm) (Habourdane et al., 2004). The NDVI historical time-series were then combined to generate weekly composites south of 60 • latitude (i.e., across the agricultural land area of Canada). The values of ρVIS and ρNIR are bounded between 0 and 1. NDVI itself varies between −1 (indicating no vegetation) and 1 (indicating dense vegetation). Weekly NDVI imagery data was compiled from historical data from the Advanced Very High Resolution Radiometer (AVHRR, 1 km resolution), U.S. National Oceanographic and Atmospheric Administration (NOAA) for years 1987-2012. Further quality control and processing in generating NDVI weekly composites was conducted (Reichert and Caissy, 2002). NDVI values were not crop-specific. Weekly NDVI indices from Julian week 18 to 36 (i.e., May to September) of each year were aggregated into 3-week moving means.

FORECASTING METHODOLOGY
The integrated methodology includes possible crop-specific input data, and additional auxiliary indices that can be integrated and involved various modules and different computational steps (Figure 2). The forecast model was coded using the R Statistical Language (Version 2.15.3, R Development Core Team) (R Foundation, 2013) and made use of the following R library packages: MASS, robustbase, impute, WGCNA, np, mnormt, mcmc, tmvtnorm, Caret, and randomforest. The model's leading, tuning parameters and their library association, description and default values are summarized in Table 2.

Model calibration and training
The forecasting equations are based on Bayesian theory, and are an adaptation of a Bayesian auxiliary variable approach of Tanner and Wong (1987). These equations are used to jointly estimate and predict the underlying "environmental" state (i.e., prescribed by a set of agro-climate variables), the set of model parameters, including any additional auxiliary variables (e.g., remotely-sensed indices). Our statistical problem involves the prediction of future state and response, so is hereafter termed forecasting. Some ambiguity or discrepancy in terminology exists in the ecological and statistical sciences in regard to forecasting to better distinguish relevant terminology. Here, we adopt the terminology of Luo et al. (2011) in defining forecasting, prediction, projection, and prognosis, where forecasting involves a probabilistic statement on future states of an ecological system after data are assimilated into a model, prediction generates future states of an ecological system based on logical consequences of model structure, projection

www.frontiersin.org
June 2014 | Volume 2 | Article 17 | 5 FIGURE 2 | Major components of the sequential, spatial-based Bayesian forecasting model of regional-scale crop yield. Model parameters that are tuned and fixed within each model component are identified to the right. The procedure consists of calibration and a prediction steps. Hindcasting/ backcasting via leave-one-out cross-validation (LOOCV) is performed using the input data as training data. Sequential-based forecasting is accomplished via the random-forests learning algorithm. This methodology is able to integrate other possible auxiliary historical indices, such as regionally-downscaled indices derived from El-Nino Southern Oscillation (ENSO) index, Pacific Decadal (PDO), and the Pacific North American (PNA) climate teleconnection index. Forecast indices generated from complex agroecosystem models, such as WOFOST (WOrld FOod STudies, Wageningen UR) agricultural production model and APSIM, the Agricultural Production Systems sIMulator. Forecast indices derived from downscaled future CCCma-CanRCM4 denotes CanRCM4 regional climate model (RCM) scenario output produced by the Canadian Centre for Climate Modeling and Analysis (CCCma) for the North American region with a horizontal grid resolution of approx. 25 km and available at: http://www.cccma.ec.gc.ca/data/canrcm/CanRCM4/ could, in the future, also be integrated for generating regional yield forecasts. Adapted from Kouadio and Newlands (2014), with permission.
generates future states of an ecological system conditioned upon scenarios, and prognosis is a subjective judgment of future states of an ecological system. There are numerous methodological and practical advantages conferred by introducing auxiliary variables in an estimation and forecasting problem: (1) when computing the likelihood density of the observed data, f (y|θ) is analytically intractable, (2) when the inclusion of the auxiliary variables simplifies or improves the computation using Markov-Chain Monte-Carlo (hereafter, MCMC), whereby, f (y|x, θ) is a more complete data likelihood, and can be computed faster than the observed data likelihood, f (y|θ), (3) when a statistical problem involves latent or hidden variables, (4) when underlying states are controlled by dynamic processes occurring at multiple spatial and temporal scales (i.e., hierarchical problem), (5) when spatial interpolation and/or upscaling of point-or site-based data is impractical due to data sparsity and/or high spatial heterogeneity, and fine-and coarser-grained information must instead be integrated, (6) a response has a strong spatial dependence and estimation and forecasting is needed across a large spatial region, (7) when auxiliary variables track a response variable faster (i.e., near-real time), reducing delay, and (8) to improve a sequential forecast, by conditioning it at a given fixed or variable time-step.

Bayesian inference
Bayesian inference uses Bayes' rule to update the probability estimate for a hypothesis as additional evidence is learned, given by; where H is any hypothesis whose probability is influenced by data or observational evidence E, p(H) is the prior probability (probablity of H before E is observed), p(H|E) is the posterior probability of H after E is observed. p(E|H) is the probablity of observing E given H, also called the likelihood, where p(E) is the marginal likelihood or evidence. Bayes' rule therefore states that the "posterior probability is proportional to the prior times likelihood," or similarly, "posterior is prior times likelihood over evidence." The posterior is the result of updating our prior information with data (i.e., learning). Since the denominator does not depend on any H, it acts as a proportionality or normalization constant, and is obtained by averaging the "prior times likelihood" (i.e., numerator) over all possible hypotheses, H, such that, If we observe data, y, from a sampling density (i.e., distribution of the data), p(y|θ), where θ is a vector of parameters, and assign θ a prior p(θ), then, following from Equations (1, 2), the posterior density of θ is then; We introduce underlying state variables, q, (referred to as a Bayesian data augmentation, auxiliary or latent variable approach). The resultant joint posterior probability, taking into account both the θ parameters and the state variables, is given by; The posterior of model parameters is then; Because the necessary integration to obtain the posterior distribution is analytically intractable, MCMC using the Metropolis-Hastings algorithm is typically used to obtain a sample from the joint posterior distribution of q and θ (Albert, 2009). We assume that the parameters, θ, have a multivariate normal distribution (MVN) as the conjugate prior density. The Monte-Carlo standard errors and confidence intervals are computed using the method of batch-means (i.e., that generally requires posterior samples of at least 1000) (Jones et al., 2006).
For sequential-based forecasting, we adopt a state-space modeling framework, and express the probability densities above, as functions of a set of observed, p(y t |x t , θ), and underlying process, p(x t |x t−1 , θ) discrete states of a dynamic system, for a sequence of times t = (1, . . . , T), that evolve from an initial state, p(x 1 , θ).
If we denote and discretize any auxiliary explanatory variables, denoted as z t , then, Equation (4), can be re-expressed (Tanner and Wong, 1987;King, 2012), as; where, The x t denote environmental states (i.e., a vector X of agroclimate variables or indices), and z t denote states of additional auxiliary explanatory variables (i.e., a vector Z of remotely-sensed variables or indices). At each iteration of an MCMC algorithm, the model parameters and auxiliary variables are updated. One can further identify different spatial regions, s, such that x t ← x s,t and z t ← z s,t . The posterior predictive or forecast density is either the replication of y given the model (denoted y rep ) or the prediction of a new and unobserved y, (denoted y new ), given the model. This is the likelihood of the replicated or predicted data, averaged over the posterior distribution p(θ|y), given by; We assume a linear model that, in matrix notation, for a general response vector, Y, is given by; under the distributional assumptions, where D is the design matrix, β is a vector of unknown model parameters, and is a vector of independently and identically-distributed normal random errors with mean zero and variance σ 2 . Our design matrix, D, involves two sets of explanatory variables, such that D=(X|Z), having columns x 1 , . . . , x n p augmented with columns z 1 , . . . , z n q . N n denotes the MVN of dimension, n, with mean μ, variance-covariance matrix and I the identity matrix.
Crop yield within each CAR region was modeled as a multivariate regression equation with spatially-varying coefficients (Banarjee et al., 2004), whereŷ i,j denotes the estimated or expected value of y i,j , the crop yield for year j (i.e., to distinguish calibration yearly timestep from the forecast monthly time-step, denoted by t), where and z (l) i,j denote the l predictor variables for i at time j. The total number of predictor (i.e., n p agro-climate and n q auxiliary remotely-sensed) is n = n p + n q . The coefficients, β (l) i,j are spatially and temporally-varying. Uncertainty, ε s,i is independent and normally distributed (random error) with mean zero and variance σ 2 i . The regression coefficients, γ i,0 (yield intercept), and γ i,1 (technology trend coefficient) are used to de-trend the yield data and α is a lag-1 autoregressive term. The technology trend accounts for historical increases in yield from genetics, management practices to improve soil fertility, water conservation and minimize soil erosion and nutrient leaching. The technology trend in yield was assumed to be linear. The inter-annual autocorrelation in yield was assumed to vary across CARs.
For the i th region, the design matrix, for fixed i, is given , . . . , Z n i , which is associated with the full/complete model parameter vector, i = γ 0 , γ 1 , α, β i , σ 2 i . In matrix notation, . .

Robust regression
Robust regression is a specialized technique used in the model to ensure it is less sensitive to outliers and may be applied to situations of unequal variance (Khan et al., 2007(Khan et al., , 2010. This technique was applied to account for heteroscedasticity and outliers in the historical data during model training and calibration. Heteroscedasticity occurs when the variance of an explanatory or predictor variable is dependent on its value. Robust regression also provides a flexible and general technique for modeling based on residuals, because it is less influenced by the presence of outliers. Variable-selection is then employed for prediction in the case where there are no significant outliers. Robust regression is a compromise between excluding outliers entirely from the analysis and treating all the data points equally, as it is done in ordinary least squares (OLS) regression. The idea of robust regression is to weigh the observations differently based on how well-behaved the observations are. Several approaches to robust estimation have been proposed, including R-estimators and Lestimators. However, M-estimators are more widely used because of their generality, high breakdown point, and efficiency. Mestimators are a generalization of maximum likelihood estimators (MLEs). Standard MM-type regression estimates use a bi-square re-descending score function and returns a highly robust and efficient estimator (with 50% breakdown point and 95% asymptotic efficiency for normal errors). The tuning parameters of lmrob, the R package we used to implement robust regression, comprises an MM-type robust linear regression estimator that consists of an initial S-estimate, followed by an M-estimate via regression that enables one to specify a breakdown point and asymptotic efficiency consistent with normal distributional assumptions (Koller and Stahel, 2011).

Bootstrap least-angle regression
Bootstrap robust least angle regression (B-RLARS) was applied in selecting the leading m ≤ n q potential predictors from z 1 , z 2 , . . . , z n q after adjusting for the effects of x 1 , x 2 . . . , x n p . Let z 1, * , z 2, * , . . . , z m, * denote the leading n q ranked variables.
A final, reduced model, with m * ≤ n q predictors, is obtained with robust leave-one-out cross validation (LOOCV), by selecting auxiliary predictors from z 1, * , z 2, * , . . . , z n q , * , after adjusting for the influence of x 1 , x 2 . . . , x n p (Khan et al., 2007(Khan et al., , 2010. Let the complete model design matrix, as D C,T,(n p +m * ) following the selection of best-fit predictors. Let be the vector of model parameters corresponding to the joint distribution of D C,T,(n p +m * ) (recall the model parameter vector is i = γ 0 , γ 1 , α, β i , σ 2 i ). The conditional likelihood function of y 2 , y 3 , . . . , y n given y 1 is then; f y 2 , . . . , y n |y 1 , , D C,T,(n p +m * ) = f y 2 |y 1 f y 3 |y 1 , y 2 · · · f y n |y 1 , y 2 , . . . , y n − 1 (13) where the CAR subscript, i, was omitted. Given that y 2 , ..., y n is independent of given y 1 , , and D C,T,(n p + m * ) , it follows that; p y 2 , . . . , y n , D C,T,(n p +m * ) |y 1 , , = p y 2 , . . . , y n |y 1 , , D C,T,(n p + m * ) Under a fully Bayesian approach, prior distributions for both and must be specified. We assumed a separation of variables, such that, p ( , ) = p ( ) p ( ), and constructed an empirical prior distribution for by residual bootstrapping the data from neighboring CARs. Information from neighboring CARs was also considered, employing an approach previously outlined and cross-validated by Bornn and Zidek (2012). This boostrapping procedure was as follows: the neighboring CARs of each given CAR were identified and the calibrated yield regression equation was fit to the given CAR and other surrounding, neighboring CARs simultaneously. The fitted models from the set of CAR neighbors were then cross-validated using the data for the given CAR. The top k ranked CARs, which could comprise a single or multiple neighbors to a given CAR, were then finally selected based on obtaining the minimal cross-validation error. This method was found to generate a more meaningful prior distribution of the model parameters by providing additional spatial covariance support (i.e., considers the residual spatial covariance between CARs). The joint posterior density function can be expressed as; p(y new |y) = θ p y new |β, σ 2 p β, σ 2 |y dβdσ 2 The model predicted future values of the model variables (i.e., forecasted) within a given CAR using only the information from selected model predictors. This assumed a Gaussian prior distribution for each predictor variable, as a conjugate prior for the joint multivariate Gaussian posterior likelihood distribution. It is well-known that choosing a conjugate prior ensures that the resulting posterior distribution is of the same distribution family as the prior (with a closed-form solution), and helps to avoid over-fitting on small training samples. However, the selected predictor variables were determined not to be good predictors of one another and were uncorrelated (results not shown here). For this reason, the forecast method instead uses the entire set of available variables when selecting the best subset of predictors that jointly estimates the unobserved values of future variables. This was first accomplished for each CAR using the multi-variate adaptive regression splines (MARS) non-parametric algorithm, but was later substituted to use the Random Forests algorithm (Chipanshi et al., 2012;Newlands and Zamar, 2012;Kouadio and Newlands, 2014). The random forests algorithm proved to be more computationally efficient and accurate because it created multiple bootstrapped regression trees without pruning and averages the outputs, very effective in reducing variance and error in high dimensional data sets (Breiman, 2001;Chen et al., 2012).
Also, by incorporating non-parametric Bayesian priors, we realized that our methodology would be more flexible and generally applicable in modeling with a wide set of variables and indices. It would also not have to assume a conjugate prior, which in many realistic or real-world cases is inappropriate. Model complexity is automatically determined by the model selection method we used because it specifies a maximum number of predictors (R-LARS followed by robust cross-validation).

SENSITIVITY ANALYSIS AND VALIDATION METHODOLOGY
Dynamic sensitivity analysis was conducted by simulating the model for different input settings of the leading parameters. We also experimented by including and withholding precipitation (P) as an input variable (i.e., a variable with high stochasticity) to evaluate the change this had on forecast error. Ranges for these simulations were set by considering both default values provided in statistical literature linked with the R library code and values prescribed from the application of various algorithms to real-world data, as in the case of the random forest algorithm parameters. The goal of cross-validation was to estimate the expected level of fit of a model to a data set that was independent of the data that were used to train the model. We cross-validated our model forecasts against historical data (1987-2011) by backcasting/hindcasting involving statistical bootstrapping and then leave-one-out cross-validation (LOOCV). The LOOCV selects a single (i.e., year) observation from the original full record as a sample of validation data, and the remaining observations as the training data. This was repeated such that every observation that exists in the sample (i.e., year) was used once as validation data. We performed an additional test of this crossvalidation procedure by removing more than just 1 year of data at a time. This considered samples (K) of 1-5 years, following a K-fold cross-validation procedure. This additional test evaluated how robust the forecasts were to removal of training data/loss of input information termed forecast degeneracy. The model was also forecasted for the 2012 growing season and its output was compared to available statistical survey data across a range of spatial scales. In 2012, a total of 64% of land across the Great Plains/Midwestern United States is under abnormally dry to exceptional drought conditions (United States Department of Agriculture (USDA) Drought Monitor). NOAA has reported that the January-June period in 2012 was the warmest first half of any year on record for the contiguous United States, and the warmest the area has ever experienced since the dawn of record-keeping in 1895 (United States National Climatic Data Center, NCDC). In 2012, nearly 50% of corn and 37% of the soybeans grown in the United States were rated poor to very poor, with three-quarters of US cattle acreage in drought-affected areas. As this year exhibited such extreme conditions, it provided data with high environmental variability and a strong test of the accuracy and robustness of the model's forecasting methodology.

OBSERVED VARIABILITY
The observed variability of the agroclimate and remotely-sensed vegetation variables was considerable when pooled across CARs and years within the study region (Figure 3). In these box-plots (and throughout this paper), the solid line indicates the median error, with ends of the box indicating the upper and lower quartiles as a measure of data spread spanning 50% of the dataset and eliminating the influence of outliers. The whiskers are the two lines outside of the box that extend to the highest and lowest data values. An increasing trend in growing season mean-monthly precipitation (mm) (see inset a) was evident, explained by frequent and larger convective-driven storm events deliver rainfall across the region, accompanying increased summer temperature. Overwinter precipitation and accompanying spring thaw in May raised crop available water and reduces the soil capacity to store available water (inset b) until later in the season, when rainfall decreases (inset a) and significant soil water evaporative losses typically occur. This pattern of variability was consistent with the observed increase of crop water deficit index (WDI) (inset c). Lower/upper quartiles for GDD ranged between 180 and 450 GDD over the growing season, with 180 on average accumulated by end of May. Over the years in the climate station record, many of the CARs experienced extreme rainfall (as outliers in the box-plot of distribution quartiles). CARs also experience extreme temperatures in the middle of the growing season, signified by numerous outlier values in June from the plot of the GDD index (degree days above 5 • C) (inset d). The variability of AVHRR-based NDVI (Figure 3, inset E) identified many outlier CARs that are situated far outside of the main distribution quartiles, indicating a general tendency for over-estimation of NDVI early in the growing season, and under-estimation mid-season, likely due to surface water backscatter.

MODEL SENSITIVITY
The sensitivity of forecasts was tested by simulating the model under changes in the: (1) number of predictors, (2) threshold for removing correlation variables (CORR trim), (3) threshold for the upper percentile for identifying outliers in cross-validation (CV trim), and (4) chain-size specified in the MCMC-simulation sampling from the joint posterior distribution (Equation 14) (Figure 4). These results tracks the change in CAR-based rootmean-square error (RMSE) uncertainty in forecasted yield versus each of the leading design parameters, whereby CORRtrim must be greater than 0.93 and a CV trim must be a value between 0.01 and 0.04 to minimize RMSE. These parameters were regionallycalibrated or tuned and set to 0.95 and 0.04, respectively. MCMC chain-size did not have a large effect on the median, or 10% and 90% percentiles or forecast RMSE uncertainty. The sensitivity to MCMC chain-size did not appear to change through the growing season, as the curve slopes were mostly flat for July, August and September. Therefore, we set the chain-size to a reasonable, mid-point value of 1000 to reduce run-time without any substantial loss of precision. A minimum of four predictors minimized RMSE. The inclusion of additional predictors, such as standard deviation of each of the predictor variables, did not make a significant difference (i.e., neither raised nor lowered forecast uncertainty). An additional sensitivity test of model on the effect of including precipitation (P) on forecast uncertainty was conducted (Figure 5). Selected regression results are summarized for two CARs that generated the best and worst regression fits  yield (A,B), residual versus predicted yield (C,D), and the relative importance or proportion of observed variance explained by each of the best-fit selected predictors (E,F) were generated. Model error reduces when considering from one to up to three predictors. Model accuracy increased when NDVI was included with the agroclimate variables (i.e., WDI, GDD, SWA) and year. Based on these findings, the default maximum number of model predictors was set to be five to regionally-calibrating yield across all the CARs in the study region. This sensitivity analysis identified whether all or a subset of these variables are necessary to obtain a best-fit in each CAR. For instance, with both agroclimate and NDVI indices as predictors, WDI, year and NDVI were selected in CAR 4741. When we constrained model input to NDVI only, RMSE for this region became very large, and is why CAR 4741 appears as an outlier for the "NDVI," and is no longer an outlier in the "agroclimate and NDVI indices" crossvalidation run (see Figure 5, right inset). This was also the case for CAR 4607 and 4609, where NDVI and year were the only predictors selected by the variable-selection algorithm of the forecast model. Further sensitivity-checks were made by re-running the forecast model and examining the change of RMSE of the output for different values of other design parameters listed in Table 2. In many instances, no appreciable effect was measured to support or justify changing the default settings obtained from the scientific literature, as was the case for the random forest tree algorithm. Although planting date did vary across our region (mostly by 15 days min to max), it was not selected as a leading covariate of yield and when forced into the model also did not have a significant effect on RMSE uncertainty, so we fixed this variable.

MODEL VALIDATION
The spatial pattern of model forecast uncertainty (RMSE) across all the CARs was generated for different combinations of the input predictors (Figure 7). Change in RMSE for many of the CARs was dependent on how the model was constrained in selecting only the prescribed maximum number and type of predictor variable. Combining both NDVI and agroclimate predictors enabled minimal uncertainty in most CARs (i.e., within 150-250 kg/ha). However, there were CARs in northern Alberta (4860), central Saskatchewan (4730), and south-eastern Manitoba (4609 and 4610) for which uncertainty could not be appreciably reduced further.
Mean percent departures between historically observed and 2012 forecasted yield varied between under-estimation of 1-4% at the mid-point (i.e., July) of the May-October crop growing season, to over-estimation of less than 1% at end of season (i.e., September) ( Table 3). This compares to a slightly higher mean percent departure based on independent, field crop-reporting yield data in 2012, of under-estimation of 3.6% mid-season, to less than 1% of over-estimation end-season (Table 4). Yield was forecasted for 2012 to be 3386, 2471, 2559 kg/ha in Alberta, Saskatchewan and Manitoba, respectively. Across the Canadian Prairies, forecasted yield was 2803 kg/ha, varying between 2513 and 3092 at the 90% confidence level (Figure 8). Further inspection of yield variability within the two CARs with the highest FIGURE 4 | Sensitivity of forecasted error in yield (RMSE, kg/ha) (y-axes) to leading model design parameters (x-axes). The solid line, dotted, and dashed lines track the median yield across: all CARs, CAR 4841 (low sensitivity case), and CAR 4731 (high sensitivity case), respectively, for: (A) maximum number of predictors (trim factor to 0.04 and number of predictors varies from 1 to 10), (B) CORR trim factor (i.e., threshold for removing correlated variables), with the maximum number of predictors set to 5, (C) CV trim factor (i.e., a parameter of the lmrob() R library algorithm of the robustbase package that is a percentage for the upper percentile of observations to drop based on their cross-validated errors). The correlation trim factor was set to 0.04 on this sensitivity run, and number of predictors was set to 5 and the x-axis is in log scale, and (D) shows two panels divided by the solid horizontal line-the upper panel shows the range defined by the difference between the 10% and 90% percentiles of forecast yield in relation to changes in the MCMC chain size (mcmc.n). This considered all CARs and years of historical input data (i.e., 1987-2011). The three lines are the results for July (dotted), August (dashed), and September (solid). The lower panel tracks RMSE in forecasted yield corresponding to the same monthly forecasts in the upper panel.
forecast error (i.e., CV > 18%) for CARs (4610 and 4860) revealed that they exhibit unique patterns that departed from our model assumptions and variability observed within other CARs (Figure 9). Sample autocorrelation functions (ACFs) of historical crop yield that track the strength of the inter-annual dependence in yield between successive years, revealed an extended (i.e., 3year duration) correlation with a decaying or tapering pattern for CAR 4610, indicating an autoregressive random process of order 3, AR(3), with a positive coefficient. For CAR 4860, an alternating and tampering pattern was identified, indicating an autoregressive process of order 1, AR(1), with a negative slope coefficient.
The RMSE uncertainty associated with the median forecast within CARs was 330 kg/ha, varying between 280 and 360 kg/ha at the 25% and 75% percentile, respectively, from LOOCV analysis of forecasts across all historical years. For the 2012 growing season, the median was less at 270 kg/ha, ranging between 230 and 300 kg/ha. When cross-validating the model forecasts, the number of years that were left out was varied from 1 to up to 5 years at a time (Figure 10). While we utilized a LOOCV procedure with 1 year left out at a time. The additional cross-validation results from the K-fold cross-validation led support for this specification, indicating that specifying any higher number of years left out at a time, only reduced the model's forecast accuracy (RMSE error) by less than 5%.

MODEL PERFORMANCE
Model forecasts were sensitive to both the choice of design parameters and the set of input explanatory (predictor) variables. Agroclimate variables contributed significantly to improving overall multivariate regression fits across most CARs. The CARs, where the model performed the worst, only had NDVI and year as explanatory variables. However, NDVI did explain index (WDI), growing degree-days (GDD), soil water availability (SWA), where * denotes P is not included, and * * denotes P is included; NDVI; WDI, GDD, SWA, and NDVI. Note: seeding date was assumed fixed, and year, t, was included as an additional input variable in all the sensitivity runs. The ID's of outlier CARs are indicated. a significant portion of the observed yield variance in time and space (Figure 6). Because the sensitivity of forecasts to changes in the leading model design parameters was successfully minimized. This suggests that selecting the best predictors of yield within each CAR region, thresholding the detection of cross-correlation between predictors (so as to avoid multi-collinearity), combining information between neighboring CAR regions, and using robust regression to detect outliers, significantly improved robustness of the model's forecasts across the Canadian Prairies (i.e., a large study region).
A simple linear regression model to predict Canadian spring wheat crop yield and applied at the provincial scale achieved 53-77% accuracy relying solely on agro-climate indices (i.e., growing degree day (GDD), precipitation (P), actual/potential evapotranspiration (AET, PET) and crop water deficit index WDI) (Qian et al., 2009). At the provincial scale, percent departure values for 2012, and LOOCV results across the entire region (all CARs) supports that our integrated model forecasts attains a higher overall or forecast skill (CV < 10% or accuracy > 90%), even accuracy does vary between CARs. With further adjustment to our model to allow for alternating autocorrelation with negative slope coefficients, we infer the two CARs with the highest uncertainty would also be greatly improved. Seeded area, irrigation and/or pests could also explain why these two CARs depart from all others, warranting further investigation. Also, without using agroclimate data, and using only satellite remote-sensing data (i.e., NDVI) was shown to predict hstorical wheat yield ranging between 47 and 80% accuracy at the CAR scale and across the Canadian Prairies (Mkhabela et al., 2011). Our findings show that while NDVI is a strong predictor of yield, combining agroclimate and NDVI indices substantially increases the accuracy and confidence in regional forecasts. One can also better discriminate between key ecological variables (i.e., temperature and available soil water) driving yield variability in time and space, which would otherwise, not be possible when only considering a single index.
Globally, wheat accounts for 20% of calories consumed. It is one of the most important crops, alongside rice and maize (corn). Our integrated model forecasted spring wheat yield for 2012 of 2803 kg/ha (Canadian Prairies) and associated uncertainty of 2513-3092 at the 90% confidence level agrees to close to 0.1% of end-of-season field survey yield, and 4% at the start of the growing season. This forecasted yield for 2012 compares reasonably well to independent crop insurance data for 1965-2007 on spring wheat yield across the Prairies accounting for a progressive increase of 0.8-1%/year technology-trend in yield during the period of 2007-2012. Crop insurance data indicated highest and lowest average yields of 4589 and 80.9 kg/ha in 2005 and 1988, respectively, and an overall historical average of 1991.2 kg/ha (Robertson et al., 2013). Total factor productivity (TFP) growth rate in crop production  for the Canadian Prairies was estimated to be 1.77%, where technology change contributes close to 83%, in addition to the smaller contributions from scale effects and changes in the degree of technical efficiency (e.g., improved machinery or crop genetics). TFP has, historically, increased the most in Manitoba, explained in part by reduced use of summer fallow, but historically has lagged within Alberta (Stewart et al., 2009). Also, Canadian Plant Breeder Rights (PBR) qualifying wheat cultivars and improved soil quality have increased wheat yield by 37.2 (1%) to 54.5 kg/ha (2%) within the Canadian Prairies (Carew et al., 2009). Our forecast model could also better account for technology-change, rather than assuming a fixed, linear rate of productivity growth within each CAR based on historical data and trends. Instead, a more detailed equation involving percent of wheat seeded area, spatial cultivar diversity index, average cultivar age and annual cropland planted could be introduced (Carew et al., 2009) in addition to the year time trend already considered (that captures advances in non-genetic technology). However, this could make our regional-scale model too complex. The model generated robust forecasts of spring wheat yield for this extreme year 2012 based on historical data. It was also able to pinpoint two CARs in southern Manitoba  yield (A,B), residual versus predicted yield (C,D) and the relative importance or proportion of observed variance explained by each of the best-fit selected predictors (E,F) are provided. NDVI-30-32(28-30) denotes the 3-week moving average NDVI for Julian weeks 30-32, and WDI-7 denotes monthly-average of water deficit index in July.
that experienced extreme growing conditions in 2012, but also identify that yield within this area exhibits extended correlation with a duration of up to 3 years-very likely attributed to the impact of "persistent" drought and flooding conditions. While TFP has increased most in Manitoba, monthly rainfall variability is high within these two CARs, varying between 17 and 20%.
More climate stations and rainfall data for these two CARs might therefore help to reduce forecast error in this region.

MODEL LIMITATIONS
Our model currently does not consider crop genetics and does not input cultivar-specific data on yield or their developmental   could also enable a determination of the best spatial resolution to integrate agroclimate, remote-sensing vegetation and phenology indices in generating regional-scale forecasts. Recent advances in multivariate time series segmentation methods, such as that described in Graves and Pedrycz (2009) could then be applied to historical remote sensing data to help automate the identification of transition points between stages directly from spatially-referenced, longitudinal time-series of yield obtained from culitvar crop-breeding field trials. We aim to further explore how the performance of our forecast model could be improved by considering crop genetics by including an auxiliary index based on APSIM agroecosystem model forecasts across a series of site-specific long-term cropping sites across the Canadian Prairies. The use of auxiliary indices derived from more complex models of agroecosystem soil, water, air and agronomic management interactions could offer further potential reductions in forecast uncertainty, and an ability to extend our seasonal forecasts and link them to longer-term yield forecasts.
Given that our model does not yet integrate crop phenology, cultivar differences, pest infestations, the interpretation and broader application of our current model to other regions and crops is limited. However, representing such differences at a regional-scale is still a big challenge due to lack of such data   Figure 5). Unlike the single year model forecast results for 2012, the LOOCV procedure that considers all possible combinations of missing data (i.e., years) in the historical input data, also identifies CAR 4609 as an outlier. (B) Variation in model forecast error (i.e., skill) in terms of "degeneracy" in the input data, simulated by increasing the number of years at a time that were left out in the LOOCV cross-validation procedure. Results correspond to a total of 1000 MCMC runs. CAR 4606 was excluded as it had only 7 years of missing data. across large regions, so applying our model to other areas could generate useful insights for decision-makers, where a lack of data exists. Building a component to spatial track pest infestations from pest monitoring data, like that of wheat midge (Sitodiplosis mosellana), a common agricultural pest within most areas of the world where non-resistant wheat varieties are still grown, could also improve the reliability of the forecast model. With further improvement and extension of input data, and refinement of its design, we anticipate our forecast model could provide reliable and robust regional-scale crop yield forecasts. Alongside using our model to forecast crop yield, participatory workshops and stakeholder knowledge-sharing and engagement is critical for enabling open discussion of any real and perceived differences in model and farmer forecast skill. Such approaches have proved invaluable for improving forecast models and their relevance, impact and broader applicability (Pease et al., 1993;Potgieter et al., 2003;Roncoli, 2006;Crane et al., 2010).

IMPROVING FORECAST SKILL
Several key improvements for our forecasting methodology and use of statistical algorithms for variable-and model-selection could be further explored. A more general autocorrelation function, which can account for extended patterns, could be tested. Also, heavier penalties for selecting predictor variables whose values are available late in the growing season could also be tested. This would make the selection of the best set of predictors through the growing season to be less constrained and dependent on historical patterns and thus more responsive to future environmental changes affecting yield. Also, tuning of the random forest algorithm parameters could be explored using the tuneRF R library set of algorithms. Even though future changes may be uncertain and depart significantly from historical trends, the necessity of an adequate length and quality of historical data does impose a requirement to partially-or fullycalibrated model-based forecasts using our integrated methodology. Recently, the accuracy in forecasting crop yield (i.e., wheat, barley and canola) also using NDVI remote-sensing and agroclimate data and employing multivariate linear regression and a variety of machine learning techniques other than the random forest tree algorithm, such as model-based recursive partitioning (MOB) and Bayesian neural networks (BNN) has been investigated (Johnson, 2013). Hierarchical clustering across the CAR regions was used for variable-selection. Based on a Mean Absolute Error (MAE)-based forecast skill score, reportedly multiple linear regression (MLR) achieves the highest skill for spring wheat and canola, while BNN and MOB are able to further increasing skill for barley. This may indicate that if new auxiliary agroclimate or remote-sensing indices are utilized which are non-linearly related or introduce a non-normal (e.g., highly skewed or multimodal) probability prior distribution, or regional-scale climate and yields become more non-linearly related in the future, forecasting algorithms that are able to better deal with non-linearity may need to be substituted to ensure quality in crop forecasts. Here, substituting our MCMC algorithm with an integrated nested Laplace approximation (INLA) that could offer a computationally cheaper alternative. The INLA algorithm would enable representing agro-climate, remote-sensing or other predictors as functions that can be each fit to a different model with a different form of spatial dependence and degrees of non-linearity.
There are also several key improvements to the input data that could be undertaken. Assimilating additional climate station data within certain CARs that exhibit strong spatial heterogeneity (compared to other CARs) in temperature and precipitation could provide further reductions in forecast uncertainty. This could help reduce uncertainty, in addition to accounting for autocorrelation in these variables, within the two CAR's in southern Manitoba that showed highest RMSE that we speculate is due to localized drought and flooding conditions that have occurred there. Validating the agroclimate index for soil water availability (SWA) that was derived from the VSMB soil water model against other independent data would be informative before testing other available remotely-sensed indices (i.e., other than NDVI), such as EVI2 (Enhanced Vegetation Index) for this region could improve forecasting there. Integration of EVI2 remote-sensing index would incorporate the blue portion of the light spectrum and help to better track crop yield by reducing sensitivity of the NDVI to this backscatter. Also, summer storms can deliver large amounts of rainfall in the Canadian Prairies and even in semi-arid environments, sufficient long periods of sustained high rainfall can raise the water table (upper level of saturated soil-water zone) causing groundwater flooding and prolonged conditions of excess surface water, ponding and runoff. Integrating other remotely-sensed indices could be used to track waterlogging (excess water) between the development stages of tillering and anthesis. This would increase the reliability of the forecast model to recognize, respond and forecast yield under extreme conditions of increase rainfall intensity expected to occur in the future due to increased climate variability. Currently, precipitation was removed as a direct, leading predictor given it did not improve forecast accuracy because of its high variance. Instead, precipitation was indirectly factored into the WSI index. Nonetheless, precipitation, early in the growing season, can have a leading influence on attainable yield (He et al., 2013). For example, Bolton and Friedl (2013) have recently reported that significant precision in predicting corn yield in the United States can be achieved using the Normalized Difference Water Index (NDWI) (R 2 = 0.69 in semi-arid areas) as well as the two-band EVI2 of NASA's MODerate-resolution Imaging Spectroradiometer (MODIS) onboard NOAA's Terra satellite, predicting corn and soybean yield better than NDVI. From 2000 onwards, NDVI data, available the MODIS could also be utilized (MODIS, 250 m resolution, Level-2 Gridded (L-2G) surface reflectance data (collection V005)) (NASA, 2012). A statistical regression-based comparison of MODIS with AVHRR 16-day (i.e., bi-weekly) NDVI composite data indicates that for the row crop and small grains land cover classes, over 90% of the variation observed in MODIS NDVI is associated with variation in AVHRR NDVI values (Gallo et al., 2004). Findings from this comparison also indicate that there remains some residual cloud contamination in both data sets that contributes to outliers. Therefore, replacing AVHRR NDVI with MODIS NDVI post-2000 may not reduce the occurrence of outliers, unless cloud contamination is dealt with first. For this reason, AVHRR was deemed adequate for the purposes of our modeling, pending further quality control (i.e., explanation and subsequent reduction) of NDVI residual variance. Cloud removal, cropland area and boundary masking and other quality control measures for MODIS NDVI are described in Davidson et al. (2009). Leaf area index (LAI), soil-adjusted vegetation index (SAVI) could also be integrated and help track yield across different developmental stages based on existing evidence they correlate well with the different crop phenological stages (Kumar et al., 1999;Graves and Pedrycz, 2009). Other combined and improved spectral indices have been developed, validated to better incorporate non-linearity between NDVI and crop biophysical parameters and under water-limited conditions (Habourdane et al., 2004;Eitel et al., 2008;Smith et al., 2008). Further work could also optimize the model to the time period within the growing season that is the most sensitive and most significant for establishing final yield. So, instead of referencing WSI across the entire season, WSI for the most sensitive time period of crop growth could be used. It is important to highlight that inclusion of remote-sensing indices is not to complete replace field crop survey data in training and validating seasonal crop forecasts, but to supplement, improve and add value to such forecasts in both a spatial and temporal context.

CONCLUDING REMARKS
Our forecast modeling findings demonstrate that forecast skill at the regional-scale can be improved with an integrated approach involving the integration of different auxiliary indices (e.g., agroclimate and remotely-sensed) within a probabilistic, Bayesian framework that incorporates both data and model structural uncertainty. The need for improved operational forecasting methods is particularly urgent because forecasting models are so crucial for guiding and informing agricultural crop production, management and policy decisions. Models also provide forecasts with extended "early-warning" or lead-time, enabling stake-holders to better respond to potential impacts and emerging risks. Recent results from the world's largest standardized inter-comparison of ensemble-based model projections of climate change impacts on wheat crop yield calibrated against >300 field data-sets, reveals net-uncertainties of 20-30% CV when partially calibrated, and 2-7% CV for fully-calibrated models . Such uncertainty is explained, in part, due to observed variability in climate, soil water holding capacity, sowing date and agronomic management (i.e., fertilizer application) Carter, 2013). Our findings indicate that by coupling satellite, remote-sensing with agroclimate data, forecast uncertainty in model-based crop yield forecasts can be further reduced to within the range of 1-4% (i.e., for spring wheat in the Canadian Prairies). This integrated methodology offers a consistent, generalizable approach for sequentially forecasting crop yield at the regional-scale. It provides a statistically robust, flexible way to concurrently adjust to data-rich and data-sparse situations, adaptively select different predictors of yield to changing levels of environmental uncertainty, and to update forecasts sequentially so as to incorporate new data as it becomes available. It also provides additional statistical information (i.e., bias, variance, sensitivity and cross-validation statistics) for better assessing the reliability of generated crop yield forecasts in time and space.
We aim to further apply our integrated forecast methodology to generate crop forecasts across Canada by embedding it within an operational forecasting system called the Integrated Canadian Crop Yield Forecaster (ICCYF). This operational decision-support tool with provide forecasts for all major Canadian crops with national coverage across all major agricultural areas. This will require the validation of our model for other major Canadian crops such as barley, canola and soybean. Operational crop outlooks likely will delivered by AAFC and publically-released in partnership with Statistic Canada's Crop Condition Assessment Program. With further extension and validation testing of this forecasting methodology to other major Canadian grain and oilseed crops (e.g., barley and canola), this method will provide a reliable approach for generating more rapid and lower cost crop forecasts each year, within the growing season, and across Canada's agricultural extent. Alongside our continued research work, further international collaborative efforts will seek to improve the design of a consistent and reliable international framework for crop yield and production forecasting and outlook reporting that integrates remote-sensing and agroclimate predictors (Nikolova et al., 2012). Our findings reported here demonstrate that our integrated methodology can provide robust and accurate regional-scale forecasts of spring wheat yield. It offers a flexible, scalable approach for integrating additional auxiliary indices, independent of spatial or temporal scale. In this way, it can integrate remotely-sensed vegetation or finer-scale indices based on field site data. Oursensitivity and cross-validation model findings also provide rigorous analytical testing of the relative benefits of using and combining difference and diverse sources of information in terms of better explaining and reducing uncertainty in model-based forecasts. This integrated method also provides additional statistical support for assessing the accuracy and reliability of model-based crop yield forecasts in time and space. We aim, in the future, to collaborate internationally in applying and validating the integrated forecasting methodology to other countries and continents that experience different regional climate and cropping conditions.

ACKNOWLEDGMENTS
This research was funded under the Growing Forward One, Sustainable AGriculture Environmental Systems (SAGES) Program of Agriculture and Agri-Food Canada (AAFC), with additional support from AAFC's National Agro-climate Information Service (NAIS) and the Growing Forward Two, Canadian federal funding program. We thank Charles Serele (Federal Department of National Defence), Budong Qian and Richard Warren (AAFC) for their assistance in acquiring and manipulating agro-climate input data. We thank Andrew Davidson for help in processing and preparing the NDVI remotesensing/earth observational data. We also acknowledge feedback from a broader set of government collaborators involved in the operational development and delivery of the crop outlook prototype tool: C. Champagne, P. Cherneski, B. Daneshfar, A. Davidson, X. Geng, E. Gorelov, D. Qi, R. Rieger, and D. Waldner from AAFC and F. Bedard and G. Reichert from Statistics Canada for their support and contributions. We thank R. Armstrong (AAFC) for analyzing the autocorrelation in the wheat data, and R. Carew, L. Townley-Smith, and B. MacGregor for feedback and discussions on this research linked with ecological and economic aspects. We also thank three anonymous reviewers for their feedback that helped to improve our manuscript. A prototype version of this forecast model for generating operational crop outlooks is called the Integrated Canadian Crop Yield Forecaster (ICCYF) has been coded and tested using the open source software and statistical libraries provided by the R Statistical Software (R Development Core Team, 2013). ArcGIS TM , (ESRI TM , Version 10, 2010) was used to visualize model output, processing spatial data and generating crop outlook maps. A user-guide is available with the R code for non-commercial and academic research. With further extension and testing of the forecasting method to other major crops and across Canada's agricultural extent, it is anticipated that a web-based portal will, in the future, deliver seasonal, operational crop outlook reports.