
OPINION article

Front. Plant Sci., 20 January 2026

Sec. Technical Advances in Plant Science

Volume 16 - 2025 | https://doi.org/10.3389/fpls.2025.1754446

This article is part of the Research Topic: Integrating Environmental Data and Genomic Resources for Accelerated Plant Adaptation and Crop Improvement

Rethinking genomic selection under environmental uncertainty: toward learnable and dynamic environmental representations

Jun Yan* and Xueyang Wang
  • Frontiers Science Center for Molecular Design Breeding, State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, China Agricultural University, Beijing, China

Introduction

The environmental dimension of crop performance has become both the greatest challenge and the greatest opportunity in modern breeding. Increasing climate volatility, soil degradation, and diverse management practices make phenotypes inherently context dependent. Traditional genomic selection (GS) frameworks—built on linear mixed models—excel at capturing additive genetic effects, but when historical datasets are sparse or infrequently updated, they tend to compress environmental variation into categorical factors or a few fixed covariates (Millet et al., 2019). Such fixed-covariate abstraction often fails to capture the complexity, heterogeneity, and temporality of real-world environments, particularly when trial networks are spatially limited or when weather realizations deviate sharply from historical norms.

Importantly, the challenge of representing environmental heterogeneity is not unique to plant breeding. For several decades, spatial ecology has emphasized that ecological patterns and processes are fundamentally scale-dependent, shaped by spatial structure, heterogeneity, and nonlinear interactions across space and time (Levin, 1992). Levin’s seminal articulation of the problem of pattern and scale highlighted that no single spatial or temporal resolution can fully capture environmental complexity, a principle that remains highly relevant for understanding genotype–environment interactions in agricultural systems.

Recent advances in envirotyping, remote sensing, and phenomics have transformed the environment from an external disturbance into a measurable, data-rich component of the phenotype (Xu et al., 2022). In parallel, machine learning (ML) and artificial intelligence (AI) have demonstrated a growing capacity to infer complex, biologically meaningful traits from partial, indirect, or proxy measurements, effectively learning latent structural and functional representations of plant systems (Guo et al., 2025). Such advances underscore the potential of data-driven models to move beyond explicit feature specification toward representation learning, while also enabling the capture of complex, nonlinear genotype–environment (G×E) relationships directly from data (Crossa et al., 2025; Yan and Wang, 2023). Together, these developments imply that the environment should no longer be treated merely as a nuisance to adjust for, but instead as a learnable representation—a structured modality to be modeled, interpreted, and ultimately designed around.

The constraints of fixed-covariate G×E frameworks

Classical extensions of genomic selection that explicitly model G×E, such as reaction-norm formulations and factor-analytic (FA) structures, are typically implemented within linear mixed-model frameworks using best linear unbiased prediction (BLUP) and have long provided the statistical foundation for multi-environment prediction. In their standard form, environments are incorporated as fixed effects and G×E is modeled as a random deviation with predefined covariance structures, often expressed through Kronecker products between genomic and environmental kernels (Cuevas et al., 2025). When trial structures are balanced and environmental gradients are well represented, these frameworks deliver interpretable and regularly updatable estimates; reaction-norm models, for instance, can incorporate daily weather covariates and be re-estimated as new data arrive (Toda et al., 2024).
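To make the Kronecker construction concrete, the following minimal sketch builds a genomic kernel and an environmental kernel from toy data and combines them into a G×E covariance. The marker matrix, covariate values, and the helper name linear_kernel are invented for illustration; this is one common way to form such kernels, not the procedure of any cited study.

```python
import numpy as np

def linear_kernel(X):
    """Standardise columns and return Z Z' / p, a simple linear similarity kernel."""
    Z = X - X.mean(axis=0)
    sd = Z.std(axis=0)
    keep = sd > 0                      # drop constant (e.g. monomorphic) columns
    Z = Z[:, keep] / sd[keep]
    return Z @ Z.T / Z.shape[1]

rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(4, 50)).astype(float)  # 4 genotypes x 50 SNPs coded 0/1/2
env_cov = rng.normal(size=(3, 6))                          # 3 environments x 6 covariates

K_g = linear_kernel(markers)   # 4 x 4 genomic relationship kernel
K_e = linear_kernel(env_cov)   # 3 x 3 environmental similarity kernel

# Covariance of the GxE deviations over all (environment, genotype) pairs,
# ordered with genotype varying fastest within each environment.
K_ge = np.kron(K_e, K_g)       # 12 x 12
print(K_ge.shape)
```

In a mixed model, K_ge would scale the variance of the random G×E term; its fixed form is what the text means by a predefined covariance structure.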

Despite their historical success, these parametric frameworks are increasingly strained in the era of high-throughput envirotyping, as datasets outgrow the scale, structure, and assumptions of their original designs. A central limitation lies in the a priori specification of G×E functional forms—most commonly linear or low-order polynomial—which can fail to capture threshold behaviors, stage-specific stress responses, and genotype-dependent sensitivities observed under complex or fluctuating environments. These structural constraints are further reflected in declining predictive performance when models are transferred across seasons or geographic regions. Such losses in accuracy are only partly attributable to sampling noise and have been increasingly linked to rigid environmental covariate representations, non-stationary climate trajectories, and confounded genotype-by-year effects that are difficult to resolve without extensive multi-year data (Cooper et al., 2021). Moreover, coarse dimensionality-reduction strategies that summarize weather variables over predefined temporal windows can obscure fine-scale temporal dynamics and interactions that are critical for environment-sensitive responses (Garnica and Ojiambo, 2025).
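The point about coarse temporal summaries can be illustrated with a toy example. In the sketch below, two hypothetical seasons share the same mean temperature but differ in when a heatwave occurs. The season length, temperatures, threshold, and the "flowering" window are all assumed values chosen for illustration only.

```python
from statistics import mean

SEASON = 90                    # days in a toy growing season
FLOWERING = range(40, 50)      # hypothetical heat-sensitive window (days 40 to 49)

def season_with_heatwave(start, length=5, base=26.0, spike=38.0):
    """Daily maximum temperature: constant baseline plus one heatwave."""
    return [spike if start <= d < start + length else base for d in range(SEASON)]

def hot_days(series, window, threshold=35.0):
    """Number of days in `window` whose temperature exceeds `threshold`."""
    return sum(series[d] > threshold for d in window)

early = season_with_heatwave(start=10)   # heatwave during vegetative growth
late = season_with_heatwave(start=42)    # heatwave during flowering

# The season-long mean cannot tell the two environments apart.
print(mean(early) == mean(late))         # True

# A stage-resolved covariate separates them.
print(hot_days(early, FLOWERING), hot_days(late, FLOWERING))   # 0 5
```

A model that sees only the seasonal mean treats these two environments as identical, even though only the second exposes the crop to heat during the assumed sensitive stage.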

Consequently, the next generation of GS is set to transcend purely parametric interaction modeling and enter an environmental-representation paradigm, one that autonomously extracts informative environmental features without sacrificing the interpretability or variance-partitioning insights inherited from classical mixed-model theory.

Learning to represent the environment

Latent and transferable encodings

Latent representation learning refers to a class of machine learning approaches that learn compact, low-dimensional embeddings from high-dimensional data while preserving the structures most relevant for downstream prediction or inference (Kopf and Claassen, 2021). Modern algorithms increasingly use such methods to jointly encode genomic, phenotypic, and environmental inputs, rather than relying on predefined covariates or hand-crafted interaction terms. Technically, these representations are often learned through architectures such as variational autoencoders (VAEs), which project heterogeneous data streams into a shared latent space. Within this space, environmental variation can be disentangled from genetic structure, enabling models to generalize across populations, traits, and environments by learning transferable environmental embeddings rather than environment-specific regression coefficients (Zhao et al., 2025). This paradigm shifts the modeling burden from explicit regression to learned structure: rather than fitting genotype-specific slopes to predefined covariates, the model learns an environmental embedding space that reflects hidden regularities in stress patterns, climate variability, and management differences. Such embeddings form the conceptual bridge between predictive modeling and adaptive breeding under changing climates.
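As a rough sketch of what a VAE-style environmental encoder computes, the code below runs one forward pass with linear encoder and decoder maps and evaluates the two standard loss terms (reconstruction error and KL divergence to a standard normal prior). All weights are random and untrained, and the dimensions and variable names are assumptions for illustration; this is not the architecture of any cited model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_env, n_cov, n_latent = 8, 20, 3
X = rng.normal(size=(n_env, n_cov))      # toy environment-by-covariate matrix

# Untrained linear encoder/decoder weights (hypothetical; a real model learns these).
W_mu = rng.normal(scale=0.1, size=(n_cov, n_latent))
W_logvar = rng.normal(scale=0.1, size=(n_cov, n_latent))
W_dec = rng.normal(scale=0.1, size=(n_latent, n_cov))

def vae_forward(X, rng):
    mu = X @ W_mu                                   # posterior mean of each embedding
    logvar = X @ W_logvar                           # posterior log-variance
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps             # reparameterisation trick
    X_hat = z @ W_dec                               # reconstruction of the covariates
    recon = 0.5 * np.sum((X - X_hat) ** 2, axis=1)  # unit-variance Gaussian reconstruction term
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)  # KL to N(0, I)
    return z, recon, kl

z, recon, kl = vae_forward(X, rng)
loss = np.mean(recon + kl)               # negative ELBO up to an additive constant
print(z.shape, float(loss))
```

Each row of z is a low-dimensional embedding of one environment. Training would minimise loss so that nearby embeddings correspond to environments with similar covariate profiles.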

Multi-modal fusion and dual-extraction architectures

Crop performance is rarely determined by a single data modality. Empirical studies in quantitative genetics, functional ecology, and complex systems consistently show that genotype–environment interactions emerge from nonlinear and hierarchical couplings across genetic, physiological, and environmental layers (Crossa et al., 2025; Levin, 1992). Multi-modal fusion frameworks therefore use dual-extraction encoders to process each modality independently before integrating them through attention or gating mechanisms (Ren et al., 2024). This approach achieves two goals simultaneously: it preserves the identity and scale of each data type, and it enables the model to learn cross-modal synergies, such as how temperature and soil moisture jointly influence allelic expression or how management practices modify genetic potential. Attention weights or gating coefficients can then serve as interpretable signals, indicating which environmental features and temporal windows most influence trait expression. In this way, these architectures can turn otherwise opaque black-box predictions into biologically interpretable hypotheses.
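The encode-separately-then-fuse idea can be sketched as follows. Two independent encoders map toy SNP and environmental inputs to embeddings of equal width, and a sigmoid gate decides, per hidden unit, how much each modality contributes. This is a generic gated-fusion sketch with random, untrained weights and invented dimensions; it is not the published dual-extraction model.

```python
import numpy as np

rng = np.random.default_rng(2)

def dense(x, W, b):
    """One tanh layer, standing in for a deeper modality-specific encoder."""
    return np.tanh(x @ W + b)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

n, p_snp, p_env, d = 5, 100, 12, 8
snps = rng.integers(0, 3, size=(n, p_snp)).astype(float)   # toy genotypes coded 0/1/2
env = rng.normal(size=(n, p_env))                           # toy environmental covariates

# Separate encoders, one per modality (weights are untrained placeholders).
W_g, b_g = rng.normal(scale=0.05, size=(p_snp, d)), np.zeros(d)
W_e, b_e = rng.normal(scale=0.3, size=(p_env, d)), np.zeros(d)
h_g = dense(snps, W_g, b_g)
h_e = dense(env, W_e, b_e)

# The gate sees both embeddings and outputs a mixing weight in (0, 1) per hidden unit.
W_gate = rng.normal(scale=0.3, size=(2 * d, d))
gate = sigmoid(np.concatenate([h_g, h_e], axis=1) @ W_gate)
fused = gate * h_g + (1.0 - gate) * h_e

w_out = rng.normal(size=d)
y_hat = fused @ w_out                    # toy trait prediction
print(gate.round(2)[0])                  # gate values for the first sample
```

After training, gate values close to 1 would indicate hidden units driven mainly by the genomic embedding and values close to 0 units driven mainly by the environment, which is the sense in which gating coefficients can be inspected.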

Temporal and dynamic environmental modeling

Environmental effects are inherently dynamic: the timing, duration, and sequence of stress events often determine yield and stability more than their cumulative magnitude, because plants perceive and integrate environmental cues through nonlinear, developmentally gated, and memory-dependent physiological processes rather than linear accumulation. Static environmental averages obscure these critical temporal dynamics. Recent algorithmic advances incorporate temporal embeddings and gated attention layers to model environmental trajectories as structured sequences of signals acting on distinct developmental states, rather than as undifferentiated time series (Yao et al., 2025). In this context, dynamic modeling captures not only when stresses occur, but which environmental variables matter at which developmental stages, and how their effects depend on prior exposure and spatial context. By explicitly accounting for heterogeneous temporal variables—such as short-term extremes, cumulative stresses, and season-long trends—these models can identify windows of sensitivity and genotype-specific adaptive strategies. When coupled with spatially explicit envirotyping, temporal modeling further enables the joint learning of spatiotemporal environmental structure, reframing the environment as a spatially heterogeneous and temporally dynamic field that interacts with genotype across both dimensions.
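A minimal sketch of stage-aware temporal attention is given below. Daily weather is projected into a hidden space, a per-stage embedding is added so that the same weather signal is represented differently at different developmental stages, and a softmax over days produces attention weights. The four 30-day stages, the random query vector, and all weights are assumptions for illustration, not components of any cited method.

```python
import numpy as np

rng = np.random.default_rng(3)

T, n_var, d = 120, 4, 6                      # days, weather variables, hidden width
weather = rng.normal(size=(T, n_var))        # toy daily series for one environment
stage = np.minimum(np.arange(T) // 30, 3)    # hypothetical: 4 stages of 30 days each

W_in = rng.normal(scale=0.3, size=(n_var, d))
stage_emb = rng.normal(scale=0.3, size=(4, d))   # one vector per developmental stage
query = rng.normal(size=d)                       # stands in for a genotype-derived query

h = weather @ W_in + stage_emb[stage]        # daily signal plus developmental context
scores = h @ query / np.sqrt(d)              # scaled dot-product score for each day
weights = np.exp(scores - scores.max())
weights /= weights.sum()                     # softmax over days
env_summary = weights @ h                    # attention-weighted environment vector

top_days = np.argsort(weights)[::-1][:5]     # days receiving the most attention
print(top_days, env_summary.shape)
```

With trained weights, clusters of high-attention days would point to candidate windows of sensitivity. Here the weights are random, so the selected days carry no biological meaning.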

Explainability and automation in environment-aware modeling

As models grow in complexity, interpretability becomes both a scientific and operational necessity. Explainable AI (XAI) techniques are now integral to modern GS pipelines, quantifying the relative influence of environmental and genetic features through Shapley values, attention visualization, or saliency mapping (He et al., 2025; Yu et al., 2025). Such tools bridge the gap between prediction and understanding, enabling breeders to trace how particular environmental drivers affect specific genotypes or traits.
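To show what a Shapley attribution measures, the sketch below computes exact Shapley values for a three-feature toy model by enumerating all feature coalitions. The prediction function, its coefficients, and the feature names are invented for illustration. Practical tools approximate this computation because exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

def predict(x):
    """Toy yield model with a heat x drought interaction (invented for illustration)."""
    return (10.0 + 2.0 * x["marker"] - 1.5 * x["heat"] - 1.0 * x["drought"]
            - 0.5 * x["heat"] * x["drought"])

def shapley(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`."""
    names = list(x)
    n = len(names)

    def value(subset):
        # Features in `subset` take their observed value, the others the baseline value.
        return predict({k: (x[k] if k in subset else baseline[k]) for k in names})

    phi = {}
    for k in names:
        others = [m for m in names if m != k]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += weight * (value(set(S) | {k}) - value(set(S)))
        phi[k] = total
    return phi

x = {"marker": 1.0, "heat": 2.0, "drought": 1.0}
baseline = {"marker": 0.0, "heat": 0.0, "drought": 0.0}
phi = shapley(predict, x, baseline)
print(phi)   # {'marker': 2.0, 'heat': -3.5, 'drought': -1.5}
```

The attributions sum to the difference between the prediction at x and at the baseline (here 7.0 - 10.0 = -3.0), and the interaction penalty is split equally between heat and drought.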

Equally transformative is the emergence of automated machine learning (AutoML) pipelines that unify environmental feature engineering, model selection, and hyperparameter optimization (He et al., 2025). These automated systems not only enhance reproducibility and scalability but also democratize access to advanced modeling, enabling breeding programs to implement AI-driven predictions without specialized expertise. By combining automation with explainability, GS is evolving from a research methodology into a practical decision-support system for adaptive breeding.

From environmental covariates to multi-modal environmental representations

The emerging concept of environmental omics represents a frontier in characterizing agroecosystems, focusing on their biological and biochemical composition rather than traditional physical or climatic descriptors. It integrates soil and rhizosphere microbiomes, as well as management-associated biochemical signatures, providing high-dimensional, functionally informative representations of the external environment (Gioti et al., 2024). When such microbial data are independent of plant phenotypes and sampled in situ, they can be treated as external environmental variables (Dwivedi et al., 2025). By capturing these diverse molecular layers, environmental omics enables a mechanistic view of plant–environment interactions, offering unprecedented insights into how environmental states may modulate molecular regulation, stress signaling, and adaptive responses.

Modern multi-modal deep learning frameworks can now integrate these diverse inputs—genomic, phenomic, enviromic, and environmental-omic—treating each as an interconnected modality within a unified representational space. This design enables models to uncover latent couplings between molecular responses and external stimuli, providing mechanistic insights into how environmental conditions modulate gene expression, metabolism, and trait outcomes.

Furthermore, aligning these learned environmental embeddings with mechanistic crop models—for instance, mapping latent representations to parameters governing growth, canopy energy balance, or soil–water dynamics—bridges data-driven adaptability with physiological interpretability. Such hybrid systems combine the generalization strength of machine learning with the causal rigor of process-based modeling, representing a decisive step toward biologically grounded, environment-aware genomic prediction.

Discussion

Despite remarkable methodological progress, fully realizing environment-aware GS remains a formidable challenge. The bottlenecks now extend beyond computation to data harmonization, model generalization, and biological grounding. A central obstacle is the heterogeneity of environmental data: envirotyping datasets differ widely in spatial scale, temporal resolution, and metadata completeness. Without standardized descriptors and interoperable formats, even advanced models struggle to learn transferable environmental representations.

At the algorithmic level, generalization under novel or extreme climates remains unresolved. Most models are trained on historical datasets that capture only a subset of future climatic variability. When confronted with novel conditions, their predictive accuracy often collapses, underscoring the need for adaptive, transferable frameworks. Transfer learning and meta-learning provide promising directions by enabling models trained in one environment to reuse learned representations in another (Li et al., 2024, 2025), reducing domain dependence and improving robustness under unseen or rapidly changing climatic conditions.
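One simple form of representation reuse is to freeze an encoder obtained from source environments and fit only a small prediction head on scarce target-environment data. The sketch below assumes a random matrix as a stand-in for a pre-trained encoder and uses a closed-form ridge head. The data, dimensions, and penalty value are placeholders, and the cited tools may use quite different strategies.

```python
import numpy as np

rng = np.random.default_rng(5)
p, d = 15, 4

# Stand-in for an encoder pre-trained on source environments (hypothetical weights).
W_enc = rng.normal(scale=0.3, size=(p, d))

def encode(X):
    """Frozen encoder: its weights are reused, not updated, in the target environment."""
    return np.tanh(X @ W_enc)

# A small labelled sample from the new (target) environment.
X_tgt = rng.normal(size=(12, p))
y_tgt = rng.normal(size=12)

H = encode(X_tgt)
lam = 1.0                                        # ridge penalty (assumed value)
beta = np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ y_tgt)   # fit the head only

y_pred = encode(rng.normal(size=(3, p))) @ beta  # predictions for new target records
print(beta.shape, y_pred.shape)
```

Only d head parameters are estimated from the target data, which is why this kind of reuse can help when observations under the new conditions are limited.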

Another major frontier is the heterogeneity of trait architectures. Highly heritable traits respond well to linear models, whereas low-heritability or stress-dependent traits demand more expressive architectures capable of capturing nonlinear G×E dependencies (Drouault et al., 2025; Guo et al., 2025; Millet et al., 2019). No single model can perform optimally across all objectives; one potential way forward is to deploy Task-Adaptive Routing Frameworks (TARFs)—combining specialized sub-models tailored to specific trait–environment contexts through dynamic gating and routing mechanisms (Bai et al., 2024).
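The routing idea can be sketched as a soft mixture of experts, in which a gating function maps task descriptors to weights over specialised sub-models. The example below uses one linear expert and one small nonlinear expert with random, untrained weights; the context descriptors (for example heritability and a stress index) are hypothetical. It is a generic mixture-of-experts gate, not the routing algorithm of the cited reference.

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

n, p = 6, 10
features = rng.normal(size=(n, p))     # toy genotype and environment features
context = rng.normal(size=(n, 2))      # hypothetical task descriptors per record

# Expert 1: linear (additive) predictor. Expert 2: small nonlinear predictor.
w_lin = rng.normal(size=p)
W_h, w_o = rng.normal(scale=0.3, size=(p, 4)), rng.normal(size=4)
experts = np.stack([features @ w_lin,
                    np.tanh(features @ W_h) @ w_o], axis=1)   # shape (n, 2)

W_route = rng.normal(size=(2, 2))
route = softmax(context @ W_route)     # per-record routing weights over the experts
y_hat = (route * experts).sum(axis=1)  # routed prediction
print(route.round(2))
```

After training, the routing weights would indicate which expert a given trait and environment context relies on, for instance the linear expert for highly heritable traits.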

Experimental validation remains critical. Model-derived environmental features should be physiologically interpreted and tested in the field, closing the loop between inference and biological understanding. Advancing environment-aware GS thus requires a cohesive system that combines standardized data, adaptive algorithms, and iterative validation—a pathway toward predictive breeding that not only forecasts performance but also reveals principles of adaptation.

Building on the conceptual framework illustrated in Figure 1, the future of environment-aware GS lies in dynamic, learnable environmental representations rather than static covariates. By integrating multi-modal bio-environmental inputs through deep learning engines, models can perform latent embedding, cross-modal fusion, and temporal modeling, capturing key environmental drivers and sensitive developmental windows. Coupled with automated and explainable AI, this approach enables robust prediction under novel or extreme climatic conditions while enhancing biological interpretability. In this paradigm, predictive breeding moves beyond fitting historical data to uncover principles of adaptation and guide the design of resilient genotypes, truly operationalizing the vision of intelligent, environment-aware selection.

Figure 1
Flowchart showing the process of environment-aware genomic selection. It begins with multi-modal bio-environmental inputs like genomics, phenomics, enviromics, environmental omics, and temporal context. Conventional G×E modeling uses linear mixed models. Learnable environmental representations include representation learning, temporal modeling, multi-modal fusion, and automated modeling. The process results in robust predictions, explainability, and optimized breeding decisions.

Figure 1. A conceptual framework for environment-aware genomic selection driven by learnable representations.

Author contributions

JY: Conceptualization, Writing – original draft, Writing – review & editing. XW: Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This study was supported by the National Key Research and Development Program of China (2023YFF1000100) and the National Natural Science Foundation of China (32341036).

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was used in the creation of this manuscript. Generative AI was employed exclusively for refining English grammar and style; the authors retain full and sole responsibility for scientific accuracy, intellectual content, and all editorial decisions.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Bai, J., Sun, J. C., Wang, Z. G., Zhao, X. W., Wen, A. J., Zhang, C. L., et al. (2024). An adaptive intelligent routing algorithm based on deep reinforcement learning. Comput. Commun. 216, 195–208. doi: 10.1016/j.comcom.2023.12.039

Cooper, M., Voss-Fels, K. P., Messina, C. D., Tang, T., and Hammer, G. L. (2021). Tackling G x E x M interactions to close on-farm yield-gaps: creating novel pathways for crop improvement by predicting contributions of genetics and management to crop productivity. Theor. Appl. Genet. 134, 1625–1644. doi: 10.1007/s00122-021-03812-3

Crossa, J., Montesinos-Lopez, O. A., Costa-Neto, G., Vitale, P., Martini, J. W. R., Runcie, D., et al. (2025). Machine learning algorithms translate big data into predictive breeding accuracy. Trends Plant Sci. 30, 167–184. doi: 10.1016/j.tplants.2024.09.011

Cuevas, J., Crossa, J., Montesinos-Lopez, A., Martini, J. W. R., Gerard, G. S., Ortegon, J., et al. (2025). Enhancing wheat genomic prediction by a hybrid kernel approach. Front. Plant Sci. 16. doi: 10.3389/fpls.2025.1605202

Drouault, J., Palaffre, C., Millet, E. J., Rodriguez, J., Martre, P., Johnson, K., et al. (2025). A reaction norm for flowering time plasticity reveals physiological footprints of maize adaptation. G3 (Bethesda) 15. doi: 10.1093/g3journal/jkaf095

Dwivedi, S. L., Vetukuri, R. R., Kelbessa, B. G., Gepts, P., Heslop-Harrison, P., Araujo, A. S. F., et al. (2025). Exploitation of rhizosphere microbiome biodiversity in plant breeding. Trends Plant Sci. 30, 1033–1045. doi: 10.1016/j.tplants.2025.04.004

Garnica, V. C. and Ojiambo, P. S. (2025). Leveraging window-pane analysis with environmental factor loadings of genotype-by-environment interaction to identify high-resolution weather-based variables associated with plant disease. Front. Plant Sci. 16, 1637130. doi: 10.3389/fpls.2025.1637130

Gioti, A., Theodosopoulou, D., Bravakos, P., Magoulas, A., and Kotoulas, G. (2024). The bioinformatics landscape in environmental omics: Lessons from a national ELIXIR survey. iScience 27, 110062. doi: 10.1016/j.isci.2024.110062

Guo, W., Wang, F., Lv, J., Yu, J., Wu, Y., Wuriyanghan, H., et al. (2025). Phenotyping, genome-wide dissection, and prediction of maize root architecture for temperate adaptability. Imeta 4, e70015. doi: 10.1002/imt2.70015

He, K., Yu, T., Gao, S., Chen, S., Li, L., Zhang, X., et al. (2025). Leveraging automated machine learning for environmental data-driven genetic analysis and genomic prediction in maize hybrids. Adv. Sci. (Weinh) 12, e2412423. doi: 10.1002/advs.202412423

Kopf, A. and Claassen, M. (2021). Latent representation learning in biology and translational medicine. Patterns (N Y) 2, 100198. doi: 10.1016/j.patter.2021.100198

Levin, S. A. (1992). The problem of pattern and scale in ecology. Ecology 73, 1943–1967. doi: 10.2307/1941447

Li, J. L., Zhang, D. F., Yang, F., Zhang, Q. S., Pan, S. H., Zhao, X. Y., et al. (2024). TrG2P: A transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield. Plant Commun. 5, 100975. doi: 10.1016/j.xplc.2024.100975

Li, Y., Ren, S., Li, J., Lee, J., Wan, J., and Gan, X. (2025). MeNet: A mixed-effect deep neural network for multi-environment genomic prediction of agronomic traits. Plant Commun., 101620. doi: 10.1016/j.xplc.2025.101620

Millet, E. J., Kruijer, W., Coupel-Ledru, A., Alvarez Prado, S., Cabrera-Bosquet, L., Lacube, S., et al. (2019). Genomic prediction of maize yield across European environmental conditions. Nat. Genet. 51, 952–956. doi: 10.1038/s41588-019-0414-y

Ren, Y., Wu, C., Zhou, H., Hu, X., and Miao, Z. (2024). Dual-extraction modeling: A multi-modal deep-learning architecture for phenotypic prediction and functional gene mining of complex traits. Plant Commun. 5, 101002. doi: 10.1016/j.xplc.2024.101002

Toda, Y., Sasaki, G., Ohmori, Y., Yamasaki, Y., Takahashi, H., Takanashi, H., et al. (2024). Reaction norm for genomic prediction of plant growth: modeling drought stress response in soybean. Theor. Appl. Genet. 137, 77. doi: 10.1007/s00122-024-04565-5

Xu, Y., Zhang, X., Li, H., Zheng, H., Zhang, J., Olsen, M. S., et al. (2022). Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. Mol. Plant 15, 1664–1695. doi: 10.1016/j.molp.2022.09.001

Yan, J. and Wang, X. (2023). Machine learning bridges omics sciences and plant breeding. Trends Plant Sci. 28, 199–210. doi: 10.1016/j.tplants.2022.08.018

Yao, Z., Yao, M., Wang, C., Li, K., Guo, J., Xiao, Y., et al. (2025). GEFormer: A genotype-environment interaction-based genomic prediction method that integrates the gating multilayer perceptron and linear attention mechanisms. Mol. Plant 18, 527–549. doi: 10.1016/j.molp.2025.01.020

Yu, T., Zhang, H., Chen, S., Gao, S., Liu, Z., Wang, J., et al. (2025). EXGEP: a framework for predicting genotype-by-environment interactions using ensembles of explainable machine-learning models. Brief Bioinform. 26. doi: 10.1093/bib/bbaf414

Zhao, X. Y., Tan, C. W., Sun, F. Z., Li, J. L., Zhang, D. F., Zhang, Q. S., et al. (2025). VMGP: A unified variational auto-encoder based multi-task model for multi-phenotype, multi-environment, and cross-population genomic selection in plants. Artif. Intell. Agr 15, 829–842. doi: 10.1016/j.aiia.2025.06.007

Keywords: environmental representation, envirotyping, explainable AI, G×E interaction, genomic selection

Citation: Yan J and Wang X (2026) Rethinking genomic selection under environmental uncertainty: toward learnable and dynamic environmental representations. Front. Plant Sci. 16:1754446. doi: 10.3389/fpls.2025.1754446

Received: 26 November 2025; Revised: 23 December 2025; Accepted: 29 December 2025; Published: 20 January 2026.

Edited by:

Ruslan Kalendar, University of Helsinki, Finland

Reviewed by:

Leonardo Datri, Universidad Nacional del Centro de la Provincia de Buenos Aires (UNCPBA), Argentina

Copyright © 2026 Yan and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jun Yan, eWFuanVuQGNhdS5lZHUuY24=
