AUTHOR=Aviles Toledo Claudia, Crawford Melba M., Tuinstra Mitchell R. TITLE=Integrating multi-modal remote sensing, deep learning, and attention mechanisms for yield prediction in plant breeding experiments JOURNAL=Frontiers in Plant Science VOLUME=Volume 15 - 2024 YEAR=2024 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2024.1408047 DOI=10.3389/fpls.2024.1408047 ISSN=1664-462X ABSTRACT=In this study, a multi-modal deep learning architecture that assimilates inputs from heterogeneous data streams, including high-resolution hyperspectral imagery, LiDAR point clouds, and environmental data, to forecast maize crop yields, is proposed. The architecture includes attention mechanisms that assign varying levels of importance to different modalities and temporal features, reflecting the dynamics of plant growth and environmental interactions. The interpretability of the attention weights is investigated in multi-modal networks that seek to both improve predictions and attribute crop yield outcomes to genetic and environmental variables. This approach also contributes to increased interpretability of the model's predictions. The temporal attention weight distributions were examined to identify relevant factors and critical growth stages that contribute to the predictions. The results of this study affirm that the attention weights are consistent with recognized biological growth stages, thereby substantiating the network's capability to learn biologically interpretable features. Accuracies of the model's predictions of yield ranged from 0.82-0.93 $R^2_{ref}$ in the genetics-focused study, further highlighting the potential of attention-based models. The primary objective of this research is to explore and evaluate the potential contributions of deep learning network architectures that employ stacked LSTM for end-of-season maize grain yield prediction. 
A secondary aim is to expand the capabilities of these networks by adapting them to better accommodate and leverage the multi-modal properties of remote sensing data. Further, this research facilitates understanding of how multi-modal remote sensing data align with the physiological stages of maize. In both plant breeding and crop management, interpretability plays a crucial role in instilling trust in AI-driven approaches and enabling the provision of actionable insights. To the best of our knowledge, this is the first study that investigates the use of hyperspectral and LiDAR UAV time series data for interpreting plant growth stages within deep learning networks and for forecasting plot-level maize grain yield using late fusion of modalities with attention mechanisms.
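
The abstract describes two ideas that a small example may make more concrete: temporal attention, which weights the time steps of a sequence, and late fusion, which combines per-modality predictions. The following is a minimal sketch under stated assumptions and is not the authors' network. The dot-product scoring, the `query` vector, the softmax-weighted fusion, and all function names and toy values are hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden_states, query):
    """Weight each time step of a sequence and return a summary vector.

    hidden_states: list of T vectors, standing in for per-date LSTM outputs.
    query: hypothetical learned vector (assumed dot-product scoring).
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights


def late_fusion(modality_predictions, modality_scores):
    """Combine per-modality yield predictions with softmax weights (assumed)."""
    weights = softmax(modality_scores)
    fused = sum(w * p for w, p in zip(weights, modality_predictions))
    return fused, weights


if __name__ == "__main__":
    # Toy values only: three time steps with two features each.
    states = [[0.1, 0.3], [0.8, 0.5], [0.4, 0.9]]
    context, time_weights = temporal_attention(states, query=[1.0, 0.5])
    print("temporal weights:", time_weights)

    # Toy per-modality predictions (e.g. hyperspectral branch, LiDAR branch).
    fused, modality_weights = late_fusion([10.2, 11.0], [0.3, 0.1])
    print("fused prediction:", fused)
```

In a sketch like this, the temporal weights are the quantities one would inspect against known growth stages, since they sum to one across time steps and indicate which dates contribute most to the summary vector.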