ORIGINAL RESEARCH article

Front. Plant Sci.

Sec. Plant Biophysics and Modeling

Volume 16 - 2025 | doi: 10.3389/fpls.2025.1617831

This article is part of the Research Topic: Emerging Methodologies in Genotype-Phenotype Models for Crop Improvement

Enhancing Yield Prediction from Plot-Level Satellite Imagery through Genotype and Environment Feature Disentanglement

Provisionally accepted
  • 1 Iowa State University, Ames, United States
  • 2 University of Nebraska-Lincoln, Lincoln, Nebraska, United States

The final, formatted version of the article will be published soon.

Accurately predicting yield during the growing season enables improved crop management and better resource allocation for both breeders and growers. Existing yield prediction models for an entire field or individual plots are based on satellite-derived vegetation indices (VIs) and widely used machine learning-based feature extraction models, including principal component analysis (PCA) and autoencoders (AE). Here, we significantly enhance pre-harvest yield prediction at plot scale using Compositional Autoencoders (CAE), a deep-learning-based feature extraction approach designed to disentangle genotype (G) and environment (E) features, applied to high-resolution, plot-level satellite imagery. Our approach uses a dataset of approximately 4,000 satellite images collected from replicated plots of 84 hybrid maize varieties grown at five distinct locations across the U.S. Corn Belt. By deploying the CAE model, we improve the separation of genotype and environment effects, enabling more accurate incorporation of genotype-by-environment (GxE) interactions for downstream prediction tasks. Results show that the CAE-based features improve early-stage yield predictions by up to 10% compared to traditional autoencoder-based features and outperform VIs by 9% across various growth stages. The CAE model also excels in separating environmental factors, achieving a high silhouette score of 0.919, indicating effective clustering of environmental features. Moreover, the CAE consistently outperforms standard models when predicting yield for unseen environments and unseen genotypes, demonstrating strong generalizability. This study demonstrates the value of disentangling G and E effects for providing more accurate and early yield predictions that support informed decision-making in precision agriculture and plant breeding.
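
For readers unfamiliar with the silhouette score cited above, the following is a minimal sketch of the standard definition, not the authors' evaluation code. The abstract does not state which distance metric or implementation was used, so Euclidean distance is an assumption here, and the feature vectors and location labels (`toy_features`, `toy_labels`) are hypothetical toy values. For each sample, the score compares the mean distance to members of its own cluster (a) with the smallest mean distance to members of any other cluster (b), and averages (b - a) / max(a, b) over all samples. Values near 1 indicate tight, well-separated clusters.

```python
import math
from collections import defaultdict


def silhouette_score(features, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance assumed)."""
    # Group sample indices by cluster label (here: growing location).
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(features[i], features[j]) for j in own if j != i)
        a /= len(own) - 1
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(features[i], features[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2-D "environment" features for plots at two locations.
toy_features = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = ["loc_A", "loc_A", "loc_B", "loc_B"]

score = silhouette_score(toy_features, toy_labels)
print(f"silhouette score: {score:.3f}")
```

On this toy input the two locations are far apart relative to their within-location spread, so the score is close to 1. Assigning the same points to labels that mix the two groups drives the score below zero.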

Keywords: representation learning, genotype × environment interactions, crop yield prediction, satellite data, latent feature extraction

Received: 25 Apr 2025; Accepted: 31 Aug 2025.

Copyright: © 2025 Powadi, Jubery, Tross, Shrestha, Coffey, Schnable, Schnable and Ganapathysubramanian. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
James C Schnable, University of Nebraska-Lincoln, Lincoln, 68588, Nebraska, United States
Patrick Schnable, Iowa State University, Ames, United States
Baskar Ganapathysubramanian, Iowa State University, Ames, United States

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.