AUTHOR=Morales Alejandro, Villalobos Francisco J. TITLE=Using machine learning for crop yield prediction in the past or the future JOURNAL=Frontiers in Plant Science VOLUME=Volume 14 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2023.1128388 DOI=10.3389/fpls.2023.1128388 ISSN=1664-462X ABSTRACT=The use of machine learning (ML) in agronomy has been increasing exponentially since the start of the century, including data-driven predictions of crop yields from farm-level information on soil, climate and management. In this study we explore the effect of the choice of predictive algorithm, amount of data and data partitioning strategy on predictive performance, using synthetic datasets from biophysical crop models. We simulated sunflower and wheat data using OilcropSun and Ceres-Wheat from DSSAT for the period 2001-2020 in five areas of Spain. Simulations were performed for farms differing in soil depth and management. The dataset of simulated yields was analysed with different algorithms (regularized linear models, random forest, artificial neural networks) using farm-level data on seasonal weather, management and soil. Data partitioning for training and testing was performed with ordered data (i.e., older data for training, newest data for testing) in order to compare the ability of the different algorithms to predict future yields by extrapolating from past data. The Random Forest algorithm performed better than artificial neural networks and regularized linear models and was easy to execute. However, even the best models showed a limited advantage over the predictions of a sensible baseline (average yield of the farm in the training set). Errors in seasonal weather forecasting were not taken into account, so real-world performance is expected to be even closer to the baseline. 
Application of AI algorithms for yield prediction should always include a comparison with the best guess to evaluate whether the increase in predictive power compensates for the additional cost of the data required by the model. Crop models validated for the region and cultivars of interest may be used before data collection to establish the potential advantage, as illustrated in this study.
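
The ordered partitioning and the baseline comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout (`farm`, `year`, `yield`), the function names and the toy yield values are all hypothetical, and only the standard library is used. The sketch splits the data by year, computes the baseline as the average training-set yield of each farm, and expresses a model's advantage as the fraction of baseline error it removes.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def temporal_split(records, first_test_year):
    """Ordered partitioning: older seasons for training, newest for testing."""
    train = [r for r in records if r["year"] < first_test_year]
    test = [r for r in records if r["year"] >= first_test_year]
    return train, test


def farm_mean_baseline(train):
    """Baseline prediction: average yield of each farm in the training set."""
    yields_by_farm = defaultdict(list)
    for r in train:
        yields_by_farm[r["farm"]].append(r["yield"])
    return {farm: mean(values) for farm, values in yields_by_farm.items()}


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    return sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def relative_gain(rmse_model, rmse_baseline):
    """Fraction of the baseline error removed by a model (<= 0 means no gain)."""
    return 1.0 - rmse_model / rmse_baseline


# Hypothetical toy data (yields in t/ha); not taken from the study.
records = [
    {"farm": "A", "year": 2001, "yield": 2.0},
    {"farm": "A", "year": 2002, "yield": 4.0},
    {"farm": "A", "year": 2003, "yield": 3.0},
    {"farm": "B", "year": 2001, "yield": 1.0},
    {"farm": "B", "year": 2002, "yield": 3.0},
    {"farm": "B", "year": 2003, "yield": 4.0},
]

train, test = temporal_split(records, first_test_year=2003)
baseline = farm_mean_baseline(train)

observed = [r["yield"] for r in test]
baseline_pred = [baseline[r["farm"]] for r in test]
baseline_rmse = rmse(observed, baseline_pred)
print(f"Baseline RMSE: {baseline_rmse:.3f} t/ha")
```

Any candidate model (for example a random forest trained on `train`) would be scored on the same `test` records, and `relative_gain(model_rmse, baseline_rmse)` would then indicate whether its advantage is large enough to justify the cost of collecting the input data.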