
REVIEW article

Front. Plant Sci., 29 January 2026

Sec. Technical Advances in Plant Science

Volume 16 - 2025 | https://doi.org/10.3389/fpls.2025.1732979

This article is part of the Research Topic: Integrating Environmental Data and Genomic Resources for Accelerated Plant Adaptation and Crop Improvement.

Artificial intelligence in plant science: from image-based phenotyping to yield and trait prediction

Tong Wang1*†, Ran Tong2†, Ting Xu3, Yue Li4, Yonghao Chen5
  • 1Department of Plant Science and Landscape Architecture, University of Connecticut, Storrs, CT, United States
  • 2Mathematics and Statistics Department, University of Texas at Dallas, Richardson, TX, United States
  • 3Department of Computer Science, University of Massachusetts Boston, Boston, MA, United States
  • 4Department of Computer Science, Purdue University, West Lafayette, IN, United States
  • 5San Diego Health, University of San Diego, San Diego, CA, United States

With the development of artificial intelligence (AI) alongside advanced imaging and remote sensing technologies, plant research is transitioning from manual measurement to automated data collection. High-throughput image-based phenotyping enables precise, automated acquisition of traits across spatial and temporal scales, from controlled laboratory settings to complex field environments. Furthermore, AI facilitates the integration of satellite observations, unmanned aerial vehicle (UAV) imaging, soil and climate data, and spatiotemporal information to improve the precision of trait monitoring and yield prediction. These advances strengthen the ability to evaluate and predict crop performance under variable environmental conditions. By merging AI methodologies with plant phenotyping and yield forecasting, this review offers a cross-disciplinary paradigm for accurate and sustainable modern agriculture.

1 Introduction

Global food systems are increasingly challenged by rapid population growth, climate variability, and limited natural resources, creating an urgent need for innovative strategies to enhance crop productivity and ensure sustainable agricultural production (van Dijk et al., 2021). Conventional methods for phenotyping and yield forecasting are laborious and unable to capture the complex interactions among genotype, environment, and management that shape crop performance.

Artificial intelligence (AI) is now widely applied across scientific disciplines (Jumper et al., 2021; Pyzer-Knapp et al., 2022; Rajpurkar et al., 2022; Advancing geoscience with AI, 2024; Murphy et al., 2024; Serrano et al., 2024; Bodnar et al., 2025). In recent years, the use of machine learning (ML) and deep learning (DL) in agriculture has expanded rapidly, propelled by substantial methodological advances in algorithm development and the growing availability of large crop datasets (Joshi et al., 2023; Waqas et al., 2025). These approaches remedy long-standing weaknesses in both laboratory and field workflows by taking over tasks that are tedious or require expert handling (Fahlgren et al., 2015). Plant scientists are increasingly adopting computational procedures to automatically quantify image-based traits and to predict crop yield from data collected across locations. This shift comes at a pivotal time, as global agriculture faces rising food demand and shifting weather patterns. Securing high crop productivity amid environmental change will demand agile, high-yielding agricultural systems that are difficult to achieve with traditional phenotyping and breeding strategies alone. Computational techniques shorten breeding cycles, improve accuracy, and reduce the burden of repeated trait measurements (van Klompenburg et al., 2020; Crossa et al., 2025a).

In this review we present a two-part summary of how AI has begun to transform plant science: (i) image-based phenotyping, where algorithms analyze plant images at scales ranging from single organs to field canopies; and (ii) yield and trait prediction through remote sensing, where diverse environmental factors such as rainfall and temperature are combined to estimate crop yield. The growing adoption of plant imaging marks a shift across scientific disciplines. By applying reliable phenotypic information and integrating advanced modeling tools, plant biologists can accelerate crop development; by coupling high-throughput phenotypic data and predictive algorithms with complex environmental data, the crop improvement cycle can be both fast-tracked and enhanced in precision and efficacy (Figure 1).


Figure 1. Conceptual framework of AI-enabled plant phenotyping and yield forecasting: (1) image-based phenotyping; (2) multi-source remote sensing.


2 Unified AI framework for plant phenotyping and trait analysis

Modern AI has moved from engineered features to end-to-end DL with automatic representation learning (Figure 2).


Figure 2. Framework of AI modeling in plant phenotyping and remote sensing. Traditional ML methods such as SVM, RF, and GBLUP served early phenotypic prediction tasks; with the advent of DL, advanced models including CNN, RNN, Transformer, GNN, multimodal learning, and FL enable efficient and scalable data analysis for phenotyping and sensing.

2.1 Traditional machine learning

Random forest (RF) suits nonlinear, high-dimensional data and has been used effectively in trait prediction and remote sensing (Belgiu and Drăguţ, 2016); Support Vector Machines (SVMs) improve generalization by maximizing class margins (Cortes and Vapnik, 1995); and bagging underpins the robustness of RF (Breiman, 2001). Genomic Best Linear Unbiased Prediction (GBLUP) is a linear mixed model widely used in genomic selection (Clark and van der Werf, 2013).
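As a minimal, illustrative sketch (not drawn from any cited study), the following scikit-learn snippet fits RF and SVR baselines to synthetic tabular features of the kind used in trait prediction, such as vegetation indices or weather summaries; all shapes and values are placeholders:

```python
# Sketch: RF and SVR baselines for trait prediction from tabular features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))            # 200 plots x 12 synthetic features
y = X[:, 0] * 0.8 + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=200)

rf = RandomForestRegressor(n_estimators=500, random_state=0)
svr = SVR(kernel="rbf", C=10.0)

for name, model in [("RF", rf), ("SVR", svr)]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.2f}")
```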

2.2 Deep feature learning

2.2.1 Convolutional neural networks

CNNs are designed to capture 2D structural variation and first excelled at handwritten character recognition, where graph transformer networks (GTNs) enabled joint training of multiple processing modules. Early work such as LeNet illustrated the effectiveness of CNNs in learning hierarchical features from raw images (Lecun et al., 1998). Building on this foundation, AlexNet marked a major advance in large-scale image classification by leveraging graphics processing units (GPUs) and the rectified linear unit (ReLU) activation function (Krizhevsky et al., 2017). Subsequent CNN architectures pushed performance further: VGG deepened networks using small convolutional filters (Simonyan and Zisserman, 2015), and ResNet introduced residual connections that enabled very deep models (He et al., 2016). U-Net and related encoder-decoder models are effective for dense prediction tasks because they combine multi-scale features (Ronneberger et al., 2015).
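The following is a minimal sketch, in the LeNet/AlexNet lineage, of a small CNN classifier for a hypothetical three-class leaf-disease task; the architecture, class count, and input size are illustrative only:

```python
# Sketch: a tiny CNN for 64x64 RGB leaf crops (hypothetical 3-class task).
import torch
import torch.nn as nn

class SmallLeafCNN(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                 # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                 # 32 -> 16
        )
        self.head = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)
        return self.head(h.flatten(1))

logits = SmallLeafCNN()(torch.randn(8, 3, 64, 64))  # batch of 8 images
print(logits.shape)  # torch.Size([8, 3])
```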

2.2.2 Recurrent neural networks and sequence modeling

Recurrent Neural Networks (RNNs) incorporate feedback connections that create an internal memory of sequential data. Basic RNNs suffer from vanishing or exploding gradients when learning very long-term dependencies. Two gating architectures were introduced to mitigate this: the Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) and the Gated Recurrent Unit (GRU) (Cho et al., 2014). Another pivotal innovation was the attention mechanism, first applied to neural machine translation through jointly learning to align and translate (Bahdanau et al., 2014).
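As a sketch of how such gated recurrent models are applied to seasonal plant data, the toy LSTM below regresses a scalar trait from a per-plot time series (e.g., weekly NDVI plus weather covariates); all dimensions are assumptions for illustration:

```python
# Sketch: an LSTM over a per-plot seasonal time series regressing a trait.
import torch
import torch.nn as nn

class SeasonLSTM(nn.Module):
    def __init__(self, n_features: int = 4, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, weeks, features)
        _, (h_n, _) = self.lstm(x)        # h_n: (1, batch, hidden)
        return self.out(h_n[-1]).squeeze(-1)

yhat = SeasonLSTM()(torch.randn(16, 20, 4))  # 16 plots, 20 weeks, 4 features
print(yhat.shape)  # torch.Size([16])
```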

2.2.3 Transformers and self-attention

The limitations of RNNs on long sequences led to the emergence of Transformers, which rely on self-attention to process entire sequences in parallel by relating each element to every other (Vaswani et al., 2017). BERT uses bidirectional encoders to generate rich representations for downstream tasks (Devlin et al., 2019). In computer vision, Vision Transformers (ViTs) demonstrated that an image can be treated as a sequence of patches and still achieve competitive results with Transformer architectures (Dosovitskiy et al., 2021). Hierarchical variants such as the Shifted Window (Swin) Transformer introduce localized attention windows and multiscale processing to improve efficiency on high-resolution images (Liu et al., 2021).
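A compact sketch of the two ingredients a ViT adds on top of self-attention follows: converting an image into a sequence of patch tokens, and scaled dot-product attention relating every patch to every other. This is a bare-bones illustration, not a full ViT:

```python
# Sketch: image -> patch tokens, then single-head self-attention.
import torch
import torch.nn.functional as F

def patchify(img, patch=16):
    # img: (B, C, H, W) -> (B, N, C*patch*patch), N = (H/patch)*(W/patch)
    B, C, H, W = img.shape
    x = img.unfold(2, patch, patch).unfold(3, patch, patch)
    return x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)

def self_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

tokens = patchify(torch.randn(2, 3, 224, 224))      # (2, 196, 768)
out = self_attention(tokens, tokens, tokens)        # global context per patch
print(out.shape)  # torch.Size([2, 196, 768])
```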

2.2.4 Graph neural networks

Graph Neural Networks (GNNs) extend deep learning to graph-structured data. One influential GNN variant is the Graph Convolutional Network (GCN) (Kipf and Welling, 2017), which generalizes the notion of convolution to graph neighborhoods. More broadly, Message Passing Neural Networks (MPNNs) provide a framework where messages sent along edges can be learned functions of both neighbor and edge features (Gilmer et al., 2017). The Graph Attention Network (GAT) introduces attention weights on edges (Velickovic et al., 2018).
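The following toy example sketches a single GCN-style layer in the spirit of Kipf and Welling: node features are averaged over graph neighborhoods with self-loops under symmetric normalization and then linearly transformed. The four-node "plot adjacency" graph is invented for illustration:

```python
# Sketch: one GCN layer on a toy graph of four neighboring field plots.
import torch

def gcn_layer(A, X, W):
    # A: (N, N) adjacency, X: (N, F_in) node features, W: (F_in, F_out)
    A_hat = A + torch.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(dim=1)
    D_inv_sqrt = torch.diag(deg.pow(-0.5))       # symmetric normalization
    return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)

A = torch.tensor([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=torch.float32)
X = torch.randn(4, 8)                            # 8 features per plot
H = gcn_layer(A, X, torch.randn(8, 16))
print(H.shape)  # torch.Size([4, 16])
```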

2.2.5 Multimodal learning

Baltrusaitis et al. provide a comprehensive survey of multimodal learning, tracing the progression from independent per-modality processing to modern approaches that learn joint representations (Baltrusaitis et al., 2019). Multimodal deep learning integrates heterogeneous data types within a single framework and uses fusion and alignment strategies to learn shared representations, improving performance across diverse tasks (Shaban and Yousefi, 2024). Vision-and-language models such as ViLBERT (Lu et al., 2019) and LXMERT (Tan and Bansal, 2019) employ co-attentional transformer layers that let image regions and language tokens interact and influence one another. The CLIP model exemplifies this methodology: trained on image-text pairs to produce similar embeddings for corresponding pairs and distinct embeddings for non-corresponding pairs, it learns a general correspondence between images and natural language descriptions that enables zero-shot transfer (Radford et al., 2021).
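A minimal sketch of a CLIP-style symmetric contrastive objective is shown below; the embeddings are random stand-ins for the outputs of real image and text encoders:

```python
# Sketch: CLIP-style contrastive loss pulling matched image-text pairs
# together in a shared embedding space.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature          # (B, B) similarity matrix
    targets = torch.arange(len(img))            # pair i matches pair i
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = clip_loss(torch.randn(32, 512), torch.randn(32, 512))
print(float(loss))
```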

2.2.6 Federated learning

Modern AI systems often operate on data held in separate silos, such as user smartphones or different universities and institutions, that cannot be pooled directly because of privacy concerns. FL solves this problem by allowing models to be trained collaboratively without gathering all the raw data in one place (McMahan et al., 2017). Federated Averaging (FedAvg) aggregates model parameters from each client, making communication in federated networks more efficient (Kairouz et al., 2021).
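The server-side step of FedAvg reduces to a dataset-size-weighted average of client parameters; the sketch below illustrates this with toy state dictionaries (raw data never leaves the clients):

```python
# Sketch: server-side FedAvg aggregation over client model parameters.
import torch

def fedavg(client_states, client_sizes):
    total = sum(client_sizes)
    avg = {}
    for key in client_states[0]:
        avg[key] = sum(
            (n / total) * state[key]
            for state, n in zip(client_states, client_sizes)
        )
    return avg

# Toy example: three clients share a model with one weight tensor.
clients = [{"w": torch.full((2, 2), float(i))} for i in range(3)]
print(fedavg(clients, client_sizes=[100, 200, 700])["w"])  # tensor of 1.6s
```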


2.3 Methodological foundation of AI in plant phenotyping and remote sensing

Each family of methods involves trade-offs between convergence and accuracy (Table 1; Figure 2). The overall trend shows model design moving in a more general, flexible, and real-world-aligned direction.


Table 1. Comparative summary of AI methods for plant phenotyping applications.

3 Image-based phenotyping: from lab research to field applications

Traditional plant phenotyping, which relies on manual measurements of plant height and leaf size, is slow, small in scale, and subjective. Modern image-based approaches enable nondestructive, large-scale trait acquisition through automated analysis. An early breakthrough in image-based plant phenotyping was achieved through automated segmentation of time-series plant images under variable backgrounds (Minervini et al., 2014). Advances in imaging hardware (high-resolution cameras, multi/hyperspectral sensors, depth sensors) and robotics now allow plants to be imaged from many angles and modalities, from controlled greenhouses to open fields. Improvements in computational image analysis have greatly increased accuracy, allowing consistent extraction of morphological and physiological information across a wide range of temporal and spatial scales (Tardieu et al., 2017).

Modern image-based phenomics allows large-scale, non-invasive assessment of essential plant characteristics. By reviewing more than 200 studies on crop diseases, animal health, and aquaculture, researchers identified several common challenges in data and practical implementation and emphasized the need for flexible, efficient, and scalable AI models to address them (Nawaz et al., 2025). Recent advances in computational techniques have considerably improved the accuracy, efficiency, and scope of trait analysis. These changes help resolve long-standing problems such as manual scoring, environmental variation, and low temporal resolution. More accurate trait characterization also makes phenotypic data easier to integrate, which speeds up the identification of new traits and the development of improved crops (Tardieu et al., 2017; Jiang and Li, 2020; Nabwire et al., 2021).

This section reviews how imaging technologies and AI algorithms for plant phenotyping have progressed, and how they are being deployed from controlled lab settings to real-world field applications (Figure 3, Table 2).


Figure 3. Overview of image-based phenotyping from lab to field. From controlled-environment phenotyping (left, e.g., greenhouse imaging systems with cameras on rails or conveyor belts) to field phenotyping (right, e.g., UAV surveys over crop plots).


Table 2. Representative AI-based image phenotyping applications.


3.1 Advances in deep learning for plant image analysis

3.1.1 Deep learning reshaping plant image analysis

Over the last five years, DL has become the dominant approach for analyzing large plant image datasets, largely replacing earlier hand-crafted image analysis techniques. DL models can learn complex visual patterns of plant architecture, health, and development directly from data, reducing the need for human-designed features or heuristics (Murphy et al., 2024).

PhenoAI is an open-source Python framework for processing PhenoCam time-series images. It integrates quality control, vegetation segmentation, and phenological metrics extraction. In a black spruce case study in Quebec, it accurately detected key phenological stages, reducing manual work and improving monitoring efficiency (Kumar et al., 2025).

3.1.2 CNNs: the foundation of modern plant image analysis

CNNs were among the earliest DL architectures widely applied in plant phenotyping. Using large image datasets, researchers have trained CNNs to identify subtle phenotypic differences and to recognize plant structure (Ubbens et al., 2018). For example, CNN-based semantic segmentation models have been applied to partition images into meaningful classes, distinguishing plant pixels from background or even segmenting individual leaves and organs in rosette plants and cereal canopies, thereby enabling precise measurement of traits such as leaf area, shape, and number. CNN models such as LeafNet have also been used for stress and disease phenotyping. By pairing classification networks with saliency maps, researchers can highlight which regions of a leaf or plant (e.g., edges, color patches) contributed most to a disease or drought stress prediction, aligning these regions with known symptoms and improving biological interpretability (Barré et al., 2017; Beikmohammadi et al., 2022).
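To make the saliency-map idea concrete, the sketch below computes a vanilla gradient-based saliency map, i.e., the gradient of the top class score with respect to input pixels, using a stand-in linear model in place of a trained leaf classifier:

```python
# Sketch: vanilla saliency (gradient of class score w.r.t. input pixels).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 3))  # stand-in
img = torch.randn(1, 3, 64, 64, requires_grad=True)
score = model(img)[0].max()          # top class score
score.backward()
saliency = img.grad.abs().max(dim=1).values   # (1, 64, 64) pixel importance
print(saliency.shape)
```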

In one early demonstration of DL's potential, Pound et al. achieved over 97% accuracy in detecting and localizing plant organs from images. Their deep CNN-based system automatically located root tips and other root and shoot structures with high accuracy, and the pipeline recovered 12 of 14 QTLs that human experts had previously identified (Pound et al., 2017). This result showed that automated image processing simplifies phenotyping and signaled the field's move toward a data-driven era.

3.1.3 Vision transformers: broadening the analytical scope

Vision Transformers (ViTs) bring techniques for processing plant images that go beyond conventional models. Through self-attention, Transformers capture long-range interdependence and global spatial relationships in images. This makes it easier to represent the overall shape and context of a plant, whereas CNNs attend only to local regions. ViTs are increasingly used for plant disease detection and precision farming; they handle heterogeneous datasets well and perform strongly in hybrid models. Current research on transfer learning, model compression, and attention visualization is reducing data and computing requirements, underscoring their suitability for field use in crop monitoring and management (Mehdipour et al., 2025). ViT-based models can perform plant phenotyping tasks with competitive accuracy and efficiency. For example, a lightweight ViT model (PMVT) designed for mobile devices accurately classified plant diseases in the field (Li et al., 2023). Another method combined convolutional features with transformer-based context integration (ST-CFI) to improve leaf disease detection (Yu et al., 2025). A further study combined two ViT modules for extracting image features with a temporal transformer to model time-series data, then added seed information to estimate soybean production on Canadian farms; this method reduced prediction error by more than 40% compared with the baseline and showed that seed traits play a key role, especially in identifying low-yield plots (Bi et al., 2023). Research indicates that Vision Transformers mitigate human and structural biases in image analysis, allowing researchers to examine images more closely and extract better features for plants with complicated traits.

3.1.4 Overcoming data limitations through model fusion

A study showed that employing a deep CNN segmentation network alongside an ensemble bagging technique significantly enhanced segmentation accuracy on a limited set of labeled crop images (Zhan et al., 2024). Deep 1D/2D CNNs were used to forecast yields in food-insecure areas from normalized difference vegetation index (NDVI) and climate data. While the method transferred to Algeria (2002-2018), it performed worse than simple ML models and NDVI baselines, highlighting the limitations of small datasets for yield prediction (Sabo et al., 2023). As methods improve, they make outputs easier to interpret and deepen our understanding of phenotypic variance.

3.2 Core computer vision tasks: segmentation, detection, and trait extraction


Quantitative trait analysis builds on basic computer vision tasks such as segmentation, regression, and classification (Mostafa et al., 2023). Key vision tasks in plant phenotyping include semantic segmentation, object detection, and trait extraction through 3D reconstruction (Figure 4). AI has accelerated progress in all of these areas.


Figure 4. Controlled-environment lab phenotyping. In controlled laboratory settings, acquired images are processed using semantic segmentation to identify specific plant organs (e.g., flower, leaf) and disease symptoms. Object detection algorithms further classify traits. UAV-based imaging enables large-scale trait extraction, and 3D reconstruction provides structural and morphological measurements.

3.2.1 Semantic segmentation

Semantic segmentation categorizes each pixel individually rather than treating the image as a whole (Osco et al., 2021b). This pixel-level detail reveals plant structure precisely, making it possible to measure traits such as leaf area, canopy coverage, and organ morphology. Zenkl et al. employed a DeepLabV3+ model, trained on a variety of wheat images, to effectively distinguish plants from the background across different genotypes (Zenkl et al., 2021). A UAV-enabled framework predicted cotton yield from semantic segmentation of cotton using YOLO and SAM models (Reddy et al., 2024). Li et al. developed PSegNet, a framework for instance segmentation of 3D plant point clouds that enables accurate organ identification; it combines voxel-based downsampling with multi-scale segmentation to deliver better results across plant species (Li et al., 2022). These examples demonstrate a shift toward sophisticated segmentation techniques that bring AI-level precision.
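Once a segmentation mask is available, simple canopy traits fall out directly; the sketch below derives canopy coverage and projected area from a binary mask, with an assumed ground sampling distance (GSD) standing in for a real camera calibration:

```python
# Sketch: deriving simple canopy traits from a binary segmentation mask
# (plant = 1, background = 0), e.g., as produced by a DeepLabV3+ model.
import numpy as np

def canopy_traits(mask: np.ndarray, gsd_cm: float = 0.5):
    plant_px = int(mask.sum())
    coverage = plant_px / mask.size                 # fraction of image
    area_cm2 = plant_px * gsd_cm ** 2               # projected leaf area
    return {"coverage": coverage, "area_cm2": area_cm2}

mask = np.zeros((512, 512), dtype=np.uint8)
mask[100:300, 150:350] = 1                          # toy plant region
print(canopy_traits(mask))
```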

3.2.2 Object detection

Modern object detectors such as Faster R-CNN and YOLO (Ren et al., 2017; Luo et al., 2024) have made tasks like counting fruits, finding cereal spikes, and spotting leaf defects far easier. One of the biggest remaining problems is generalizing across species or varieties, since models that work well on one type of fruit may fail on others that look different. To address this, domain adaptation strategies are increasingly used.
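Detection-based counting typically reduces to confidence filtering plus non-maximum suppression (NMS) over predicted boxes; the sketch below illustrates this on synthetic detections (e.g., wheat spikes), independent of any particular detector:

```python
# Sketch: counting objects from detector output via greedy NMS.
# Boxes are (x1, y1, x2, y2); data here is synthetic.
import numpy as np

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def count_objects(boxes, scores, conf=0.5, iou_thr=0.5):
    keep = []
    for i in np.argsort(scores)[::-1]:              # highest score first
        if scores[i] < conf:
            continue
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return len(keep)

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 150]])
scores = np.array([0.9, 0.6, 0.8])
print(count_objects(boxes, scores))  # 2: the duplicate box is suppressed
```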

Zhang et al. and Guo et al. studied fruit detection models to "fill the species gap". Domain adaptation enhances the transferability of fruit identification models, enabling cross-species application without the need for human labeling. When CycleGAN-based visual translation and pseudo-labeling were used, models trained on oranges performed well on apples and tomatoes (Zhang et al., 2021). Transfer performance has improved considerably thanks to cross-domain detection frameworks that use GANs and knowledge distillation to align features across images. To keep instance- and image-level representations consistent across domains, SSDA-YOLO combines YOLOv5, Mean Teacher distillation, and style transfer (Zhou et al., 2023). Adding contextual aggregation to improve global feature learning eased the transfer of models from orange datasets to tomato and apple datasets, showing that fruit recognition can be made more generalizable and efficient (Guo et al., 2023). FEWheat-YOLO (based on YOLOv11n) is a lightweight, efficient model that achieved R² = 0.941 for wheat spike detection (Wu et al., 2025). DomAda-FruitDet is an anchor-free domain-adaptive model that bridges gaps between foreground and background using multi-scale prediction and adaptive sampling, making auto-labeling more precise and faster and thereby making fruit recognition more general (Zhang W. et al., 2024).

These tactics, together with self-training, enabled precise identification of fruits without manual labels. This form of domain adaptation is crucial for applying models to new crops and to regions with limited data.

3.2.3 Trait extraction and 3D reconstruction

Plant researchers have built an AI system that analyzes drone images of crops from multiple angles. The technique automatically measures plant height and panicle length and creates realistic 3D crop canopy models. It routes images to the best-suited ViT model based on clarity, noise, and blurriness, yielding higher segmentation accuracy and computational efficiency than typical CNN approaches. This modular design shows that Transformer architectures can handle a variety of precision-agriculture imaging situations (Gopalan et al., 2025). Summarizing the architectures used in agricultural research, we focus on two types of Transformer-based models: pure and hybrid (Xie et al., 2024). When CNN and Transformer components are combined, Transformers work across viewpoints to capture the overall shape of the plant, while CNNs gather fine details such as leaf margins and panicle textures. In 2023, Hu et al. introduced FOTCA, a hybrid plant disease detection model using adaptive Fourier neural operators and CNN-Transformer fusion. This design improves generalization and recognition by collecting both local and global information (Hu et al., 2023).

In addition to 2D image analysis, computational techniques are being used to reconstruct plant morphology in three dimensions and to extract structural characteristics. Multi-view imagery and depth sensing make it possible to generate 3D point clouds or volumetric models, and Transformers are now built into CNN-based stereo networks for plant research. UAV-based remote sensing offers high-resolution monitoring that pairs well with yield data. DL models (CNN, LSTM, CNN-LSTM, ConvLSTM, 3D-CNN) were used to analyze UAV RGB time series and meteorological data in Finland to estimate yield; the 3D-CNN achieved the highest accuracy, with an MAE of 219 kg/ha (5.5% MAPE) over a 15-week sequence and 293 kg/ha (7.2% MAPE) using early-season data (Nevavuori et al., 2020). EdgeMVSFormer is a transformer-based multi-view stereo method that reconstructs detailed 3D plant models from UAV images (Cheng et al., 2025). SegVoteNet is a novel framework for 3D sorghum canopy phenotyping that integrates UAV-based data acquisition, NeRF-based reconstruction, and DL-driven analysis. By integrating VoteNet and PointNet++ in a unified backbone, the model performs semantic segmentation and panicle detection directly from point clouds, offering a scalable approach for canopy and panicle trait characterization in sorghum (James et al., 2025). Ninomiya (2022) analyzed high-throughput field phenotyping technologies, concentrating on diverse crop canopy characteristics including plant height, coverage, biomass, stress indicators, and organ identification and counting; improvements in imaging, 3D reconstruction, sensor technologies, UAVs, and computational methods have made these observations possible (Ninomiya, 2022). Gaillard et al. devised a high-throughput voxel carving technique to reconstruct 3D models of sorghum plants from multiple RGB photos, facilitating the assessment of canopy features relevant to light interception and genetic research (Gaillard et al., 2020). These 3D reconstructions make it possible to assess properties that are hard to see in 2D images, such as plant architecture, canopy volume, and branching patterns.
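As an illustration of a common preprocessing step for such point-cloud pipelines (e.g., the voxel-based downsampling used ahead of networks like PSegNet), the sketch below collapses a synthetic canopy cloud to one centroid per occupied voxel:

```python
# Sketch: voxel-grid downsampling of a plant point cloud.
import numpy as np

def voxel_downsample(points: np.ndarray, voxel: float = 0.01):
    # points: (N, 3) in meters; keep one centroid per occupied voxel.
    keys = np.floor(points / voxel).astype(np.int64)
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    out = np.zeros((inv.max() + 1, 3))
    counts = np.bincount(inv).astype(float)
    for d in range(3):
        out[:, d] = np.bincount(inv, weights=points[:, d]) / counts
    return out

cloud = np.random.rand(100_000, 3) * 0.5            # synthetic 0.5 m canopy
print(voxel_downsample(cloud, voxel=0.02).shape)     # far fewer points
```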

In short, DL-based segmentation, detection, and trait extraction dramatically increase the number of traits that can be measured from images, especially across varied conditions.

3.3 Extending image-based phenotyping to real-world field conditions

One difficulty with image-based phenotyping is transferring approaches from controlled greenhouses to field settings, where changes in lighting, background, weather, and plant structure can make models less reliable (Figure 5). Three primary research trajectories are tackling this deficiency.


Figure 5. Image-based phenotyping to real-world field conditions. Domain adaptation allows models to generalize across crops and environments, while edge deployment supports real-time trait analysis. High-throughput field phenotyping allows rapid trait measurement in real-world agricultural environments.


3.3.1 Robust model adaptation

Models that work well in one situation (such as greenhouse photos of plants that all develop similarly) frequently fail in another (such as field plots with changing conditions). Recent research has investigated semi-self-supervised domain adaptation, utilizing minimal manual annotation alongside video-derived data and neural encoder-decoder architectures to produce substantial training datasets. These methods have shown good results in a variety of settings, suggesting they could make DL applications in agriculture more stable and scalable (Ghanbari et al., 2024). As noted in the previous section, researchers now use domain adaptation and intensive data augmentation to make models more robust.

For example, large annotated field image datasets such as the Global Wheat Head Detection (GWHD) dataset (David et al., 2020) have been established to facilitate training of wheat spike detection models across diverse genotypes and locations. Exposure to such diverse training data helps models learn more generalized representations. Techniques such as adversarial domain adaptation and synthetic data generation are also applied. One study proposed a semi-self-supervised CNN with probabilistic diffusion, requiring minimal manual labels and achieving robust crop image segmentation across variable field conditions (weather, lighting changes) (Ghanbari et al., 2024). The focus is on careful dataset design and curation. Training that covers a wide range of environmental circumstances makes models more stable when they are used in the real world. Transfer learning through fine-tuning on small datasets is commonly used to quickly adapt models to new crops or sites without having to relabel a lot of data.

Recent work has investigated Generative Adversarial Networks (GANs) to reduce dependence on extensive annotated datasets. Varela et al. introduced the Efficiently Supervised GAN (ESGAN) framework, which pairs large amounts of unlabeled UAV footage with a limited collection of annotated samples to precisely categorize heading phases in Miscanthus breeding populations (Varela et al., 2025). ESGAN matched the accuracy of fully supervised models using only 1% of the labeled data, substantially reducing the annotation burden. Its generator-discriminator design copes well with changing field conditions, making it a scalable solution for complicated agricultural situations (Depaepe, 2025).

3.3.2 Edge deployment on mobile devices and UAVs

To bring these capabilities to the field, models need to run on edge devices such as drones, robots, and mobile phones without relying on the cloud. Lightweight architectures and model compression methods such as pruning, quantization, knowledge distillation, and deep compression make it possible to run on less powerful hardware with far less memory and compute. Deep compression preserves accuracy while cutting network storage needs by more than 30 times; tests on AlexNet and VGG-16 showed that compressed models can fit in on-chip memory, which speeds up inference and reduces power draw, enabling complex networks on mobile and embedded devices (Han et al., 2016). MobileNetV3 is an efficient CNN design produced through hardware-aware neural architecture search, the NetAdapt algorithm, and further architectural advances, yielding MobileNetV3-Large for resource-rich applications and MobileNetV3-Small for constrained ones. These models deliver faster and more accurate classification, detection, and semantic segmentation; MobileNetV3-Large with the LR-ASPP decoder outperforms MobileNetV2 at lower latency (Howard et al., 2019).
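For a concrete flavor of such compression, the sketch below applies two generic, off-the-shelf PyTorch steps, magnitude pruning followed by dynamic int8 quantization, to a toy model. This is an illustration of the techniques, not the cited papers' exact pipelines, and API names reflect recent PyTorch releases:

```python
# Sketch: magnitude pruning, then dynamic int8 quantization, on a toy model.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# 1) Remove the 50% smallest-magnitude weights in each linear layer.
for m in model:
    if isinstance(m, nn.Linear):
        prune.l1_unstructured(m, name="weight", amount=0.5)
        prune.remove(m, "weight")        # make the pruning permanent

# 2) Quantize remaining weights to int8 for faster CPU inference.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
print(qmodel(torch.randn(1, 128)).shape)
```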

DL and smartphones together make large-scale crop disease detection feasible. A landmark study trained a CNN on 54,306 leaf images spanning 14 species and 26 diseases, achieving 99.35% accuracy and demonstrating the potential for large-scale smartphone-based detection (Mohanty et al., 2016).

Recent UAV phenotyping research has focused on lightweight hardware suitable for field deployment. Improvements now allow data to be processed in real time, reducing the need to move raw data off the device. Several studies show that optimized embedded models can match the accuracy of larger cloud-based systems, suggesting they can keep performing well in the field.

Measuring features such as leaf color, leaf area index (LAI), chlorophyll content, biomass, and yield by hand is time-consuming and inefficient. UAV remote sensing platforms (UAV-RSPs) carrying different sensors now offer a flexible, fast, and non-destructive way to do large-scale phenotyping (Yang et al., 2017). Osco and colleagues later reviewed 232 studies utilizing UAV-acquired imagery, surveying application areas, sensor integration, and classification and regression methods; their review showed how well-designed models can support crop monitoring and field phenotyping (Osco et al., 2021a). Guo et al. examined recent developments in UAS-based plant phenotyping, emphasizing its cost-effectiveness, versatility, and potential to integrate many sensors across settings (Guo et al., 2021).

CimageA is an extensible software system that uses machine vision and ML to analyze canopy images automatically, helping determine phenotypes such as LAI, canopy coverage, and plant height quickly and accurately (Fu et al., 2024). To balance speed and precision, two optimized models were developed: MobileNetV2-UNet and FFB-BiSeNetV2, with TensorRT used to optimize them for execution on a Jetson AGX Xavier. MobileNetV2-UNet reduced parameters and computation while speeding up inference; FFB-BiSeNetV2 reached a mean Intersection over Union (IoU) of 80.28%. Both models ran in real time (more than 40 frames per second), showing that UAV-based image analysis is a practical way to manage weeds and protect crops (Lan et al., 2021).

Lightweight model design, capable embedded hardware, and UAV sensing platforms together are producing the next generation of scalable solutions for plant phenotyping.

3.3.3 High-throughput field phenotyping

Drones with RGB or multispectral sensors can quickly image field plots and extract traits faster for precision agriculture and breeding. Future breeding advances require cost-effective, field-deployed high-throughput phenotyping platforms (HTPPs) that combine large-scale, non-invasive sensing with automated environmental monitoring and fast ways to collect, score, and analyze data (Araus and Cairns, 2014).

A detailed review examined how unmanned aerial system (UAS)-based high-throughput phenotyping can be used in breeding programs, covering tools, methods, direct trait measurement, predictive breeding, QTL discovery, and the potential to accelerate genetic gain (Khuimphukhieo and da Silva, 2025). Moreover, DL has enhanced both the throughput and accuracy of these UAV phenotyping pipelines. Yang et al. developed a UAV image analysis method that accurately extracts plot boundaries from field trial images, greatly speeding up trait measurements (Yang et al., 2021). Tang et al. demonstrated automated plot delineation and biomass estimation in alfalfa field trials: using canopy area and plant height features extracted from images, the model achieved an R² of about 0.6 for biomass prediction without manual input. Multitask DL frameworks have also been applied to predict multiple traits from the same images (Tang et al., 2021). UAV-based multisensor data fusion and GeoAI (the combination of geospatial and artificial intelligence research) were applied to high-throughput maize phenotyping: UAVs equipped with hyperspectral, thermal, and LiDAR sensors were used to predict eight yield- and nitrogen-related traits. Extended NDSI analysis, classical ML models (SVM, RF), and a multitask CNN were compared; integrating hyperspectral and LiDAR data improved prediction accuracy, and the multitask CNN matched or exceeded single-task models, demonstrating the potential of GeoAI and sensor fusion for field-scale trait prediction (Nguyen et al., 2023).

Tausen et al. developed Greenotyper, a low-cost, distributed HTP platform using 180 Raspberry Pi cameras to monitor 1,800 clover plants, generating over 355,000 images in one experiment. A U-Net based pipeline achieved ~98% plant localization accuracy and 95% segmentation accuracy, and all images were processed within one day at 96% system uptime, demonstrating an efficient and scalable alternative to traditional phenotyping platforms (Tausen et al., 2020). MtCro is a multi-task framework that models associated phenotypes together in a shared parameter space; it outperformed DNNGP and SoyDNGP on large wheat and maize datasets, improving prediction accuracy by up to 9% and strengthening multi-phenotype prediction (Chao et al., 2025).

The result is a scalable phenotyping pipeline: a drone flies over a field, and yield-relevant features are automatically extracted from each plot within minutes of data acquisition. This gives plant breeders near-real-time, data-driven insights for selection decisions, lets crop researchers run trials on thousands of genotypes in different settings, and helps farmers manage their fields with precision.

3.4 Open Science platform in plant phenotyping

As image-based phenotyping grows, data need to be shared and made more open. Open platforms let users add new features or functions, encouraging collaboration within the community. Tools such as PlantCV (Gehan et al., 2017) and Image Harvest (Knecht et al., 2016) offer researchers standardized image analysis methods at no cost. Deep Plant Phenomics offers pre-trained neural networks for plant phenotyping and enables researchers to tailor the models to their particular requirements; in Arabidopsis thaliana, it has shown promising mutant classification and age regression with high counting accuracy (Ubbens and Stavness, 2017). Furthermore, the publication of the Global Wheat Head Detection (GWHD) dataset was a collaborative initiative aimed at establishing a substantial and varied benchmark for head detection algorithms (David et al., 2020).

Alongside software, there is an initiative for enhanced data governance and adherence to Findable, Accessible, Interoperable, and Reusable (FAIR) standards (Papoutsoglou et al., 2023).

Common benchmark datasets improve algorithm development and allow research organizations to compare results. Researchers are standardizing data to ease collaboration, and they increasingly share code and models, making results easier to replicate and strengthening trust in open science.

4 Remote sensing for yield and trait prediction

Beyond image-based phenotyping, recent analytical methods draw on data from several sources to better characterize environmental conditions.

Forecasting agricultural yield is a priority in plant science. Traditional agronomic methods rely on empirical criteria and provide only basic representations of biological processes, making it difficult to understand agricultural systems, and many fail to capture the complex genotype × environment (G×E) interactions that shape agricultural outcomes. In recent years, AI has shown a remarkable ability to find patterns in large, heterogeneous datasets, complementing traditional models. Recent progress in HTP has enabled multi-source frameworks that integrate different data for yield and trait prediction. Inputs may include time-series satellite imagery, weather data, soil maps, management records, and even genomic information about crop varieties. Because the data are diverse in nature (spatial, temporal), flexible AI architectures are employed to fuse them. At the same time, challenges remain in model generalizability across environments and seasons when predictions are used to inform agricultural decisions or policy.

In this section, we will outline how multi-source data are integrated for crop prediction, the algorithmic advances for spatial-temporal modeling, and key applications from farm to global scales (Figure 6; Table 3).


Figure 6. AI-enabled workflow for crop monitoring and decision support. Multisource environmental, phenotypic, and management data are processed using machine learning and deep learning models. Model fusion and calibration produce spatial yield and trait maps, enabling real-time, data-driven agricultural management.


Table 3. Representative AI-based models for crop yield and trait forecasting.


4.1 Diverse data sources and multi-source inputs

Crop yields are influenced by multiple factors: weather throughout the season (rainfall, temperature extremes, solar radiation), soil properties (nutrient levels, texture, water-holding capacity, pH), and management (planting date, fertilization, irrigation). A typical multi-source yield prediction pipeline might take as input a sequence of satellite or UAV images (capturing the crop's canopy development via vegetation indices), a time series of weather variables, static soil and topography maps, and categorical variables such as crop type or cultivar. AI models ingest these disparate inputs, whose effects are hard to specify in explicit models, and learn to associate patterns across them with final yield.
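Since vegetation indices recur throughout these pipelines, the following sketch computes NDVI from red and near-infrared reflectance bands, with random arrays standing in for calibrated imagery:

```python
# Sketch: NDVI = (NIR - Red) / (NIR + Red), computed per pixel.
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    return (nir - red) / (nir + red + 1e-9)

nir = np.random.rand(100, 100)   # stand-ins for calibrated reflectance
red = np.random.rand(100, 100)
print(ndvi(nir, red).mean())
```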

4.1.1 Machine learning approaches for prediction

Researchers have shown that ensembles of ML models can effectively map and predict yields by learning from past yield statistics and environmental data. For example, Newman and Furbank (2021) compiled a continent-wide yield database covering ten major crops and extensive environmental and management variables; using an ensemble of ML methods, they achieved cross-continental yield prediction accuracies exceeding R² = 0.8 for some crops, despite high environmental and management diversity (Newman and Furbank, 2021). Progress has also been made in using remote sensing and ML for detecting and managing invasive plant species (IPS): high-resolution imagery from UAVs, satellites, and hyperspectral sensors, combined with ML approaches, has improved mapping accuracy and ecological assessment of IPS (Zaka and Samat, 2024). Integrating technologies such as LiDAR and cross-disciplinary modeling can further improve monitoring and conservation under climate and urbanization pressures. UAV-derived vegetation indices combined with ML have been used for accurate corn yield prediction with limited training data: using multispectral data from the V6 and R5 stages under different soil treatments, support vector regression (SVR) and k-nearest neighbors (KNN) performed best, with red-edge and chlorophyll indices as key predictors (Kumar et al., 2023). Such studies indicate that, when enough historical data are available, ML approaches can capture yield determinants across climate zones and farming systems.

At local scales, researchers use freely available satellite imagery and weather data to estimate yields without relying on additional field surveys. In mango orchards, Torgbor et al. developed a time-series modeling framework for yield forecasting without in-field fruit counting: using eight years of yield data from 51 orchard blocks together with remotely sensed vegetation indices and climatic variables, ML models (e.g., RF, SVM) achieved accurate yield predictions while reducing the need for extensive ground sampling (Torgbor et al., 2023). Genomic prediction models use genome-wide DNA markers (e.g., SNPs) to predict individuals' breeding values, termed genomic estimated breeding values (GEBVs), without extensive field testing. These ML models have been widely adopted in plant breeding and can predict cultivar performance in different environments by accounting for G×E interactions (Crossa et al., 2025b).

A wide range of ML algorithms have been applied to model G×E. GBLUP is a linear mixed model using the genomic relationship (G) matrix (Nishio and Satoh, 2015); it captures additive genetic effects and is robust for large marker sets. Bayesian genomic models such as Bayes A, Bayes B, Bayes C, and the Bayesian LASSO can improve prediction when some QTL have large effects (Shi et al., 2021). Reproducing Kernel Hilbert Space (RKHS) methods have been widely utilized in plant breeding for their capacity to model non-linear genetic architectures, including epistatic and G×E interactions; as one of the earliest ML approaches adopted in this field, RKHS regression projects marker genotype data into a high-dimensional function space via a kernel function (Gianola and van Kaam, 2008; De Los Campos et al., 2010). RF has been used for genomic prediction and can handle many predictors by averaging over many tree models; it can model G×E by including environment indicators or interactions as features (Stephan et al., 2015). SVMs find optimal separating hyperplanes and can be used with kernels for regression (SVR); they have been tested on genomic datasets to capture non-linear genotype-phenotype relationships in very large datasets (Montesinos López et al., 2022). These ML approaches are highly scalable, and because satellite data and gridded weather are readily available, it is feasible to extend yield prediction to regions where field surveys are impractical.
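To make GBLUP concrete, the sketch below implements it as kernel ridge regression on a VanRaden-style genomic relationship matrix G = W Wᵀ / p built from standardized markers. The variance ratio lambda is assumed known here, whereas in practice it is estimated (e.g., by REML), and all data are simulated:

```python
# Sketch: GBLUP as ridge regression on the genomic relationship matrix.
import numpy as np

rng = np.random.default_rng(1)
n, p = 150, 1000                         # individuals x SNP markers
M = rng.integers(0, 3, size=(n, p)).astype(float)   # genotypes coded 0/1/2
W = (M - M.mean(0)) / (M.std(0) + 1e-9)  # center and scale markers
G = W @ W.T / p                          # genomic relationship matrix

true_u = W @ rng.normal(scale=0.05, size=p)          # simulated genetic values
y = 10 + true_u + rng.normal(scale=0.5, size=n)      # simulated phenotype

lam = 1.0                                # assumed sigma_e^2 / sigma_u^2
u_hat = G @ np.linalg.solve(G + lam * np.eye(n), y - y.mean())  # GEBVs
print(np.corrcoef(u_hat, true_u)[0, 1])  # prediction accuracy
```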

4.1.2 Combined deep learning for seasonal yield prediction

Neural networks, such as Multi-Layer Perceptrons (MLPs) and CNNs, can automatically capture complex feature interactions, particularly when handling large-scale datasets. As extensively reviewed, DL methods can integrate diverse data sources, including image-based HTP and genomic markers, to enhance G×E predictions. Emerging architectures such as the DNN Genomic Prediction (DNNGP) framework set a precedent for fusing multi-omics data (Wang et al., 2023). A pioneering example of a multi-source architecture for yield prediction is DeepAgroNet, which employs a multi-branch neural network to handle different data types: one branch is a CNN that processes spatial features from satellite images, another is an LSTM that captures temporal patterns in sequential weather data, and a third is a feed-forward network for static variables. By merging these branches, the model acquires a unified representation of environmental and agricultural conditions across locations and time periods (Ashfaq et al., 2025b).
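The sketch below illustrates a multi-branch fusion network in the spirit of the DeepAgroNet description above (CNN for imagery, LSTM for weather, MLP for static covariates, merged into one yield head); layer sizes and input shapes are invented placeholders, not the published architecture:

```python
# Sketch: multi-branch fusion of imagery, weather series, and static data.
import torch
import torch.nn as nn

class MultiSourceYieldNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.img = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())       # -> (B, 16)
        self.weather = nn.LSTM(5, 16, batch_first=True)  # 5 daily variables
        self.static = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
        self.head = nn.Linear(16 * 3, 1)

    def forward(self, img, weather, static):
        _, (h, _) = self.weather(weather)
        z = torch.cat([self.img(img), h[-1], self.static(static)], dim=1)
        return self.head(z).squeeze(-1)

net = MultiSourceYieldNet()
yhat = net(torch.randn(2, 4, 64, 64),      # 4-band satellite patch
           torch.randn(2, 120, 5),         # 120-day weather series
           torch.randn(2, 8))              # soil/topography features
print(yhat.shape)  # torch.Size([2])
```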

The FL-AGRN model predicts crop yields by integrating attention-based GNNs and RNNs inside an FL architecture (Nirosha and Vennila, 2025a). A GNN-RNN framework for nationwide crop yield prediction explicitly considered both geographical and temporal correlations; built on data from more than 2,000 U.S. counties spanning 1981 to 2019, it outperformed linear, tree-based, and DL baselines (Fan et al., 2022). FL-AGRN attained high accuracy (Mean Absolute Error (MAE) = 1.23, Root Mean Square Error (RMSE) = 1.45, R² = 0.99) through distributed training with a Federated Gaussian Average aggregator and careful data preprocessing, all while preserving data confidentiality and scalability; it outperformed traditional methods and improved yield forecasts (Nirosha and Vennila, 2025b). Agri-GNN was likewise developed to model crop yield by explicitly addressing both spatial and genotypic interactions among farming plots. Built on a GraphSAGE architecture, Agri-GNN combines vegetation indices, genotype, time, and location data and outperforms standard ML methods, showing the potential of graph-based neural architectures for agriculture (Gupta and Singh, 2023).

Temporal dynamics are especially important in yield formation: crops progress through growth stages, and mid-season indicators (e.g., a severe drought during flowering, or a period of low NDVI indicating stress) strongly influence final yield, so AI models that track them enable early detection. Recurrent neural networks such as LSTM and GRU are natural choices for time-series input. In winter wheat yield prediction, researchers showed that feeding multi-month NDVI time series into an LSTM improved yield estimates compared with single-image or cumulative-index approaches; the sequential data enabled the model to account for early growth patterns and stress (Sharma et al., 2020; Zhang et al., 2024). Moreover, LSTM-based models can integrate information across the entire season, from emergence to grain filling. A recent study combined a meta-transformer for multi-source feature fusion with temporal graph neural networks for sequential modeling, achieving nearly 97% accuracy on the EPFL dataset and outperforming LSTM, CNN, and standard Transformer baselines (Sarkar et al., 2024). Convolutional temporal networks and even 1D CNNs have also been used to extract features from weather time series for yield prediction and to map summer crops from Landsat EVI time series, where they can outperform RNN, RF, and SVM in accuracy and F1 score (Saha et al., 2025).

Transformer-based models, which excel at long-sequence tasks, have been applied to multi-year yield forecasting. The attention mechanism can in principle learn to weight the most critical periods (e.g., weeks of drought stress) more heavily. In one study, a transformer model outperformed LSTMs for early-season county-level yield prediction in the U.S., likely because of its ability to flexibly incorporate historical information and early-season imagery of varying importance. The Multi-Modal Spatial-Temporal Vision Transformer (MMST-ViT) integrates satellite imagery with meteorological data through multi-modal, spatial, and temporal transformer modules, accounting for both short-term weather variability and long-term climate influences; experiments in more than 200 U.S. counties showed that MMST-ViT outperformed comparable methods on multiple evaluation metrics (Lin et al., 2023). A later-withdrawn ICLR 2025 submission by Najjar et al. examined the interpretability of multimodal learning frameworks for crop yield prediction, using Transformer-based architectures to fuse modalities such as satellite imagery and environmental data while applying self-attention and feature attribution techniques to improve interpretability and potentially yield more robust predictions (Najjar et al., 2025).

4.2 Modeling temporal-spatio dynamics

Combining both spatial and temporal dimensions is a key problem in crop modeling. Topography, soil heterogeneity, microclimates, the timing of stresses (drought, heat, pest outbreaks), and the path of crop growth (e.g., rapid early growth vs. slow start) can all affect how crops grow and how much they yield.

In the traffic domain, TedTrajRec was previously proposed to improve trajectory recovery: PD-GNN models periodic, topology-aware traffic patterns, while TedFormer, a time-aware Transformer with neural ordinary differential equations, learns temporal dependencies from irregularly sampled data (Sun et al., 2025). In plant applications, AI-based approaches address spatial heterogeneity by incorporating remote sensing at multiple resolutions, such as combining coarse satellite data for regional climate context with high-resolution drone images for fine-grained within-field features. AI can extract spatial features (textures, patterns) that correlate with crop health, while graph-based AI applied to geospatial grids can capture interactions between neighboring areas (e.g., disease spread or irrigation patterns).

4.2.1 Spatio-temporal dynamics

Early spatio-temporal rice mapping mainly used phenology- or feature-based analysis of optical and SAR time-series data with thresholding and rule-based classification. A phenology-based algorithm (ILMP) combining Landsat and Moderate-Resolution Imaging Spectroradiometer (MODIS) achieved over 93% accuracy in Nanchang (2015), performing better than NLCD in fragmented cropland (Ding et al., 2020). The PKI method using Sentinel-1 SAR time series achieved 97.99% accuracy in paddy rice mapping, outperforming phenology-based methods and enabling large-scale application (Lin et al., 2024).

Early rule-based methods laid the basis for DL approaches that capture complex spatiotemporal signals, and crop mapping has since shifted from phenology analysis to data-driven feature extraction. A lightweight CNN combined with parcel-based image analysis was developed for crop classification using Sentinel-2 time-series imagery; applied to two regions in Türkiye, the model achieved overall accuracies of 89.3% and 88.3%, outperforming VGG-16, ResNet-50, and U-Net and demonstrating cost-efficient crop mapping (Altun and Turker, 2025). Another approach to spatio-temporal modeling is the use of encoder-decoder frameworks that take sequences of input images (or weather maps) and predict sequences of outputs (such as maps of predicted yield), with the CNN capturing spatial features and the LSTM modeling phenological dynamics. A recent soybean study proposed a CNN-LSTM model for county-level soybean yield prediction across CONUS, integrating weather and MODIS data via Google Earth Engine; the hybrid model outperformed standalone CNN or LSTM approaches for both in-season and end-of-season prediction, showing promise for broader crop applications (Sun et al., 2019).

An attention-based Geo-CBAM-CNN model was developed for crop classification using Sentinel-2 time-series imagery. By integrating geographic information into an attention module, it effectively mitigated spatial heterogeneity and enhanced spectral-spatial feature extraction, achieving 97.82% overall accuracy across multiple U.S. regions and showing strong spatial adaptability for large-scale applications (Wang et al., 2021). To address nonlinear spatiotemporal dependencies in yield prediction, a knowledge-guided Spatial-Temporal Attention Graph Network (KSTAGE) was proposed; by integrating spectral features with prior knowledge through temporal attention and spatial graph modeling, KSTAGE achieved significant improvements over baseline models in county-level yield prediction in both China and the U.S. (Qiao et al., 2023). CNN-plus-self-attention models effectively capture fine spatial and long-term temporal patterns from high-resolution imagery: the DL framework SepHRNet, combining HRNet with self-attention, was proposed for crop mapping, achieving 97.5% accuracy and outperforming state-of-the-art models on the ZueriCrop dataset (Goyal et al., 2025). For large-scale corn yield estimation, a deep spatiotemporal framework (DeepCropNet, DCN) was developed that combines attention-based LSTMs for temporal features with multitask learning for spatial features. Applied to U.S. Corn Belt data (1981-2016), DCN outperformed LASSO and RF (RMSE = 0.82 vs. 1.14 and 1.05 Mg ha⁻¹), effectively capturing temporal effects and spatial patterns (Lin et al., 2020).

Both Cropformer and AgriFM are Transformer-based models: Cropformer focuses on multi-scenario crop classification, while AgriFM extends to large-scale, multi-source spatiotemporal crop mapping. Cropformer, a self-supervised, fine-tunable Transformer, enables accurate and transferable multi-scenario crop classification from time-series remote sensing data (Wang et al., 2025). AgriFM is a multi-source remote sensing foundation model designed for crop mapping through unified multi-scale spatiotemporal feature extraction. Built on a modified Swin Transformer and pre-trained on over 25 million samples from MODIS, Landsat, and Sentinel-2, it supports diverse tasks such as cropland mapping, boundary delineation, and early-season crop classification, and it outperforms existing remote sensing foundation models (RSFMs), showing strong scalability and adaptability (Li et al., 2025).

A newer trend is the spatial-temporal (ST) GNN: by structuring the data as a graph, these approaches can incorporate domain knowledge (e.g., adjacency or shared water sources) and let the model learn how information flows in both space and time, as sketched below. The Spatial-Temporal Synchronous Graph Convolutional Network (STSGCN) was proposed to address complex localized spatiotemporal correlations and heterogeneities in network data forecasting (Song et al., 2020). Another study introduced a hyperspectral maize nitrogen prediction model that combines a dynamic spectral-spatiotemporal attention mechanism with a Graph Neural Network (GNN). The technique attained high accuracy (R² = 0.96) and surpassed models such as SVM, RF, ResNet, and ViT, exhibiting strong adaptability across developmental stages and geographical contexts (Lu et al., 2025).
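
The toy sketch below shows the ST-GNN idea in its simplest form (not STSGCN itself): a normalized-adjacency graph convolution mixes county features spatially, and a GRU propagates the mixed features through time. The three-node graph and all dimensions are hypothetical.

import torch
import torch.nn as nn

# Minimal spatial-temporal GNN sketch: spatial mixing via a symmetrically
# normalized adjacency matrix, then temporal modeling with a GRU per node.
class STGraphGRU(nn.Module):
    def __init__(self, adj, in_dim, hidden):
        super().__init__()
        d_inv = torch.diag(adj.sum(1).pow(-0.5))
        self.register_buffer("A_hat", d_inv @ adj @ d_inv)  # normalized adjacency
        self.mix = nn.Linear(in_dim, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (N_nodes, T, in_dim)
        h = torch.relu(self.mix(torch.einsum("ij,jtf->itf", self.A_hat, x)))
        out, _ = self.gru(h)               # each node is a "batch" element
        return self.head(out[:, -1])       # one prediction per node

# Toy 3-node graph with self-loops (e.g., neighboring counties).
adj = torch.tensor([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])
model = STGraphGRU(adj, in_dim=5, hidden=16)
print(model(torch.randn(3, 8, 5)).shape)   # torch.Size([3, 1])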

Taken together, these advances mark a transition from rule-based phenological mapping to integrated spatiotemporal modeling enabled by advanced AI techniques.

4.2.2 Operational deployment and decision support

Remote sensing and spatially explicit yield prediction can inform farming and policy decisions. Satellite-based AI models allow the United Nations (UN) Food and Agriculture Organization (FAO) to monitor drought effects on crops worldwide in near real time. The FAO’s Agricultural Stress Index System (ASIS) combines digital innovation with decades of satellite data, such as long-term AVHRR-derived vegetation health indicators. ASIS supports anticipatory action, crop insurance, and drought management at both national and regional scales (Van Hoolst et al., 2016; Rojas, 2021).

4.3 Representative applications with case studies

From research to practice, data-driven yield and trait prediction has proven effective for crop management. Representative applications include:

4.3.1 Field trial analytics

Field trials at multiple sites generate thousands of experimental crop genotypes. AI models are used to predict key traits such as yield and drought tolerance from imagery together with meteorological and soil data at each location. A maize study used multitask DL to predict multiple traits from UAV imagery simultaneously, improving the selection of promising hybrids. GNNs have also been employed to leverage genetic relatedness alongside environmental data, enhancing the prediction of genotype performance across trial sites (Zhou et al., 2025).
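
A toy multitask setup echoing the maize study is sketched below: one shared image encoder, a separate head per trait, and a summed loss. The trait names, sizes, and data are illustrative placeholders.

import torch
import torch.nn as nn

# Multitask sketch: shared encoder, per-trait regression heads, summed loss.
encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
heads = nn.ModuleDict({"yield": nn.Linear(16, 1), "drought_tol": nn.Linear(16, 1)})

x = torch.randn(8, 3, 64, 64)                    # 8 UAV plot crops (toy)
targets = {k: torch.randn(8, 1) for k in heads}  # toy trait labels
z = encoder(x)                                   # shared features (8, 16)
loss = sum(nn.functional.mse_loss(heads[k](z), targets[k]) for k in heads)
print(float(loss))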

4.3.2 Regional yield forecasts

Prior to harvest, governments and commodity markets seek reliable yield forecasts. AI models ingest satellite time series of crop cover together with weather data to forecast yields at district or county level. Ashfaq et al. demonstrated wheat yield forecasting in South Asia using an LSTM on NDVI sequences (Ashfaq et al., 2025a), and similar approaches have been tested for corn and soy in the U.S. Corn Belt (Han et al., 2024). These forecasts can inform logistics (e.g., storage and transportation needs) and market pricing, and support early interventions.
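
A minimal sketch of such a sequence model is shown below: an LSTM reads a district's NDVI composites, a linear head outputs yield, and truncating the sequence gives an earlier (typically less accurate) in-season forecast. Dimensions and data are placeholders, not the cited studies' configurations.

import torch
import torch.nn as nn

# District-level LSTM yield forecaster driven by NDVI sequences (toy data).
lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)

ndvi = torch.rand(16, 24, 1)       # 16 districts x 24 NDVI composites
out, _ = lstm(ndvi)
full_season = head(out[:, -1])     # forecast at end of season

out_early, _ = lstm(ndvi[:, :12])  # same model on the first half of the season
in_season = head(out_early[:, -1])
print(full_season.shape, in_season.shape)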

4.3.3 Stress and resilience trait prediction

Beyond yield, AI can predict traits such as drought tolerance, disease risk, and nutrient deficiencies from remote sensing data. Models have been trained to forecast drought stress indices for each pixel of a satellite image, showing where crops are struggling to access water (Desloires et al., 2024). ML can also anticipate pest and disease outbreaks by detecting subtle canopy changes before symptoms become visible, enabling early management actions.

Across these applications and case studies, typical problems include making models generalize to novel situations and making forecasts interpretable for end users. SHAP (SHapley Additive exPlanations) is one method used to quantify which inputs most influenced a prediction (Lundberg and Lee, 2017; Crane-Droesch, 2018). With advancements in hardware and data infrastructure, the obstacles to large-scale deployment of these models are diminishing.
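
A minimal SHAP example is sketched below using the shap library's TreeExplainer on a toy random-forest yield model; the three feature columns (rainfall, growing degree days, soil nitrogen) and the synthetic data are hypothetical.

import numpy as np
import shap                                   # pip install shap
from sklearn.ensemble import RandomForestRegressor

# Explain a toy RF yield model: per-sample SHAP values sum (with the base
# value) to each prediction; mean |SHAP| gives global feature importance.
rng = np.random.default_rng(0)
X = rng.random((200, 3))                      # toy rainfall, GDD, soil N
y = 3 * X[:, 0] + X[:, 1] + rng.normal(0, 0.1, 200)

model = RandomForestRegressor(n_estimators=100).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)        # (200, 3) contributions

print(np.abs(shap_values).mean(axis=0))       # "rainfall" should dominate here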

4.4 Field-based AI-enabled yield forecasting

A growing body of evidence demonstrates that AI methods substantially improve yield forecasting across remote-sensing and environmental data streams. Among these approaches, both ML and DL methods have shown clear advantages in capturing the complex spatial and temporal patterns that drive yield.

4.4.1 Machine learning based yield forecasting

Accurate yield estimation increasingly relies on remote sensing, and the growing data volume has made machine learning essential for handling complex, nonlinear information and improving prediction accuracy in modern agricultural systems (Chlingaryan et al., 2018). A global assessment across wheat, maize, and potato showed that RF provides notably higher yield-prediction accuracy than linear regression, with RF achieving RMSEs of 6-14% compared to 14-49% for linear models. The yield data came from various sources and regions for model training and testing: gridded global wheat grain yield, maize grain yield from US counties over thirty years, and potato tuber and maize silage yield from the northeastern seaboard region, supporting large-scale climate-driven yield forecasting (Jeong et al., 2016). An ML ensemble was used to forecast maize yields across three U.S. Corn Belt states using both complete and partial in-season weather information; weighted and average ensemble models achieved high accuracy and generated reliable early-season forecasts as early as June 1, with an RRMSE of 9.2% (Shahhosseini et al., 2020).
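
The spirit of such RF-versus-linear comparisons can be reproduced on synthetic data, as in the sketch below: a nonlinear yield response leaves the linear model with visibly higher RMSE. The data-generating function is invented purely for illustration.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Toy comparison: RF captures a nonlinear yield response that a linear model misses.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (500, 4))               # toy climate/soil predictors
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(0, 0.05, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for name, model in [("linear", LinearRegression()),
                    ("random forest", RandomForestRegressor(n_estimators=200))]:
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: RMSE = {rmse:.3f}")       # RF should be clearly lower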

4.4.2 Deep learning based yield forecasting

Using multi-temporal MODIS surface reflectance data across major U.S. soybean-producing states, You et al. developed a deep Gaussian process framework that reduced county-level RMSE by approximately 30% relative to the best traditional remote-sensing models and delivered earlier-season forecasts with steadily improving skill. When aggregated to the national level, the model also achieved about 15% lower MAPE than USDA survey-based estimates in August and September, demonstrating competitive performance well before harvest (You et al., 2017). A DL framework combining CNN and RNN architectures has been used to forecast corn and soybean yields across the U.S. Corn Belt, integrating environmental and management time-series data. The CNN-RNN model achieved notably low errors (RMSE of 8-9% of average yield), outperforming RF and LASSO, and demonstrated strong generalization to unseen environments. Its design enables the extraction of temporal environmental signals, allowing attribution analyses to quantify the influence of weather, soil conditions, and management practices on yield variation (Khaki and Wang, 2019). A Transformer-based Informer model has been applied to rice yield forecasting across the Indo-Gangetic Plains by integrating time-series satellite data, environmental variables, and 2001–2016 yield records. The model outperformed multiple ML and DL baselines (R² = 0.81; RMSE = 0.41 t ha⁻¹) and achieved stable within-season accuracy (R² ≈ 0.78) as early as two months before maturity. NIRv and late-season growth stages emerged as dominant predictors, reinforcing the model’s predictive strength and interpretability (Liu et al., 2022).

Collectively, these case studies provide convergent and quantitative evidence that AI enables more accurate, earlier, and more transferable yield forecasting than traditional statistical or process-based models across satellite, UAV, and environmental data sources.

5 Challenges of deploying AI in real agricultural environments

Despite major advances in trait identification, environmental sensing, and yield prediction, deploying AI in modern farming systems remains challenging under real-world conditions.

5.1 Data limitations and domain shift

5.1.1 Data limitations

Agricultural production systems exhibit substantial spatial and temporal variability, and this complexity poses major challenges for developing reliable AI models. Plant traits change across environmental conditions, reflecting the strong phenotypic plasticity of plants. Variations in day length, water supply, nutrient availability, or light intensity can lead to substantial differences in plant architecture, making the quantification of structural and developmental variation a central task in phenomics research (Poorter et al., 2019). Crop yield is influenced by interacting factors such as climate, soil conditions, fertilizer inputs, and varietal differences (Ansarifar et al., 2021). Data availability and sharing remain uneven across crops, regions, and production scales (Wu et al., 2023). HTP platforms provide valuable sensor-based phenotypic data, but most available datasets cover few sites and seasons and lack consistent formats, limiting comparability and broader use (Danilevicz et al., 2021).

5.1.2 Domain shift

These data limitations exacerbate domain shift between the data-rich environments used for model training and the data-poor systems where predictions are often needed. Accurate yield estimation for major U.S. crops such as corn and soybean has become increasingly important under growing climate variability and production uncertainty, yet these crops are grown across highly diverse agroecological zones that differ in climate patterns, soil properties, and management practices. Such regional heterogeneity introduces domain shift, meaning that models trained in one production environment may not generalize well to others (Ma et al., 2021; Ma et al., 2023). Smallholder systems in South Asia exhibit severe spatial heterogeneity (fields are small, fragmented, and managed with diverse practices), making their data distribution fundamentally different from that of the large, uniform fields typically used to train remote-sensing models. Studies in the region show reduced yield-mapping accuracy under these mismatches, illustrating a clear form of spatial domain shift (Jain et al., 2016).

5.2 Operational and sensor constraints

Many high-performing deep learning models require computational resources, stable data pipelines, and technical expertise that are often unavailable in the settings that serve farmers. Farmers and extension agents generally need simplified, low-maintenance decision-support tools rather than complex, opaque systems. If AI tools become too complicated, agronomists and breeders are reluctant to trust or adopt them, and it becomes difficult to diagnose whether errors arise from sensor noise, environmental heterogeneity, or broader domain-shift effects (Gardezi et al., 2024). Sensor instability further reduces the reliability of model inputs. Imagery from UAV, proximal, and satellite platforms frequently varies in spatial resolution, spectral fidelity, temporal coverage, and calibration quality, while sensor aging and calibration drift can introduce systematic biases into repeated measurements. To mitigate these inconsistencies, combining data from multiple sensing platforms and leveraging sensor-fusion or ensemble approaches can improve robustness by averaging over individual sensor limitations (Karmakar et al., 2024; Wang et al., 2025).
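
One simple, commonly used fusion rule is inverse-variance weighting, sketched below for three platforms estimating the same canopy variable; noisier or drifting sensors are automatically down-weighted. The estimates and noise levels are hypothetical.

import numpy as np

# Inverse-variance fusion of redundant sensor estimates (toy values):
# e.g., canopy cover from UAV, proximal, and satellite platforms.
estimates = np.array([0.62, 0.58, 0.70])   # per-platform estimates
variances = np.array([0.01, 0.02, 0.09])   # hypothetical noise levels

weights = (1 / variances) / (1 / variances).sum()
fused = (weights * estimates).sum()
print(f"fused estimate: {fused:.3f}, weights: {np.round(weights, 2)}")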

Taken together, these operational, sensor, and interpretability constraints help explain why models that perform well in research environments often fail to generalize in real-world production systems.

5.3 Future directions

Future progress will depend on several key areas: (1) model interpretability, which will allow predictions and explanatory outputs to align more closely with management needs; (2) the incorporation of methods into multiple crop monitoring systems, enabling more consistent and timely field assessment; (3) the adoption of FAIR data principles, which will support broader multi-site datasets and improve generalizability across diverse field conditions; and (4) opportunities for future advancement, including streamlined architectures that can be deployed in resource-limited settings and integrative approaches that enhance operational decision support.

Looking ahead, continued improvements in data quality, model design, and agronomic integration will further enhance the practical use of AI in agriculture. Expanded multi-site datasets will support cross-regional generalization, lightweight architectures will ease deployment constraints, and advances in interpretability will allow models to provide explanations that better match real management requirements.

6 Conclusion

The amalgamation of AI with imaging technologies and multi-source data is transforming plant phenotyping from a conventional low-throughput, labor-intensive process into an intelligent, data-driven, high-throughput framework. Progress across AI architectures has significantly improved the precision and flexibility of contemporary agriculture.

7 Discussion and perspectives

This review connects modeling methods with plant phenotyping and multi-source remote sensing for prediction. Several challenges, and corresponding perspectives for the future, remain.

7.1 Model interpretability

Techniques such as gradient-weighted class activation mapping (Grad-CAM) (Selvaraju et al., 2017) and SHAP analysis (Desloires et al., 2024) help clarify how specific features or image regions influence model predictions. Nevertheless, inconsistent data, limited computing power, and poor model interpretability remain major obstacles to the widespread use of modeling approaches in agriculture. To overcome these problems, researchers are now developing lightweight analytical tools that are easier to test and deploy, and that are more flexible and interpretable across different types of production systems, so as to meet the needs of real-world agriculture.
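
For reference, Grad-CAM can be implemented in a few lines, as in this sketch on a toy CNN: gradients of the class score with respect to the last convolutional feature map give per-channel weights, and their weighted, rectified sum yields a coarse relevance heatmap. The network here is illustrative, not a production phenotyping model.

import torch
import torch.nn as nn

# Minimal Grad-CAM sketch (Selvaraju et al., 2017) on a toy two-class CNN.
cnn = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)
head = nn.Linear(16, 2)                        # e.g., healthy vs. stressed

x = torch.randn(1, 3, 64, 64, requires_grad=True)
feats = cnn(x)                                 # (1, 16, 64, 64), in the graph
logits = head(feats.mean(dim=(2, 3)))          # global-average-pooled classifier
score = logits[0, 1]                           # class of interest

grads = torch.autograd.grad(score, feats)[0]   # d(score)/d(feature map)
weights = grads.mean(dim=(2, 3))               # per-channel importance (1, 16)
cam = torch.relu((weights[:, :, None, None] * feats).sum(dim=1))
cam = cam / (cam.max() + 1e-8)                 # normalized heatmap (1, 64, 64)
print(cam.shape)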

7.2 Multiple crop monitoring systems for remote sensing

Crop monitoring systems (CMS) have been established worldwide to support agricultural management and food security assessment. Representative platforms include FAO’s ASIS, the European Commission’s Anomaly Hotspots of Agricultural Production (ASAP) and Monitoring Agricultural Resources (MARS), the Group on Earth Observations Global Agricultural Monitoring Initiative (GEOGLAM) Crop Monitor, NASA Harvest’s Global Agricultural Monitoring (GLAM), China’s CropWatch, USDA VegScape, and several national systems such as FASAL in India and VEGA in Russia. These platforms share a common focus on monitoring crop growth conditions and assessing environmental drivers (Wu et al., 2023).

Other platforms provide a comprehensive and flexible foundation for agricultural and environmental observation. Satellite platforms (MODIS, Sentinel-2, Landsat-8/9, PlanetScope, WorldView, GF series) offer broad spatial coverage and long time series. UAV-based platforms deliver centimeter-level spatial resolution and flexible flight scheduling, enabling detection of fine-scale structures. Radar and microwave platforms (Sentinel-1 SAR, RADARSAT, ALOS PALSAR, SMAP) provide all-weather observations unaffected by cloud cover or illumination, supporting soil moisture mapping and crop structural analysis.

Most systems rely on time-series profiling and spatial analysis to evaluate crop status. However, differences in classification standards continue to limit the comparability of these systems across countries; international collaboration should promote standardized protocols and data sharing, enabling more transparent and comparable crop monitoring worldwide.

7.3 FAIR data principles

Data sharing and standardization are necessary to accelerate plant phenotyping and sensing. Sensing technologies and methodologies have progressed, but sensed data often differ between platforms and can therefore be fragmentary and inconsistent. Shared standards such as FAIR make data comparable, so that traits can be modeled across different crop types, regions, and environmental conditions.

Setting up a unified open data platform would facilitate the sharing and permanent preservation of research products. This implies more shared code, more shared common model parameters, and a single metadata framework describing all project datasets accumulated over time and across locations, which in turn lowers the technical barriers to collaboration across sites. A sketch of such a metadata record is given below.
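
As a loose illustration only (not an established schema such as MIAPPE, which real systems should prefer), a minimal FAIR-oriented metadata record might look like the following; every field name and value is a hypothetical placeholder.

import json

# Hypothetical FAIR-oriented metadata record for a phenotyping dataset.
record = {
    "identifier": "doi:10.xxxx/placeholder",   # findable: persistent ID (placeholder)
    "title": "UAV NDVI time series, wheat trial",
    "license": "CC-BY-4.0",                    # accessible/reusable terms
    "crop": "Triticum aestivum",
    "trait": "canopy NDVI",
    "platform": "UAV multispectral",
    "spatial_coverage": {"lat": 41.8, "lon": -72.2},
    "temporal_coverage": ["2024-05-01", "2024-08-15"],
    "format": "GeoTIFF",                       # interoperable, open format
    "units": "dimensionless (NDVI)",
}
print(json.dumps(record, indent=2))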

Looking further ahead, internationally standardized and interoperable agricultural data can be developed to support reproducible research and permanent reuse, extracting the greatest long-term value from these agricultural data sources.

7.4 Opportunities for future advancement

Future research in plant phenotyping and yield prediction is expected to place increasing emphasis on integration, standardization, and interpretability, ensuring that phenotypic, environmental, and genomic data are findable, accessible, interoperable, and reusable across platforms and institutions. This will facilitate large-scale collaboration, enable more rigorous cross-study comparisons, and reduce barriers to data reuse.

Moreover, AI is entering the era of foundation models. As these models become widely used, truly understanding and managing them will require close interdisciplinary collaboration spanning not only technical aspects but also social, ethical, and legal dimensions (Bommasani et al., 2022). In plant research, it is insufficient to concentrate solely on attaining lightweight, accurate models; equal emphasis must be placed on guaranteeing the transparency and interpretability of the analytical process. Both are necessary. Finally, computer scientists, plant biologists, agronomists, and policymakers will need to work closely together to make this vision a reality; together, these approaches could help build production systems that endure.

In summary, we propose building an integrated architecture for global agriculture. Achieving this vision requires not only unified, lightweight AI models but also broad collaboration. Building open agricultural ecosystems around AI-based phenotyping and prediction is poised to transform how new production systems are generated over the next several decades.

Author contributions

TW: Conceptualization, Investigation, Resources, Supervision, Writing – original draft, Writing – review & editing. RT: Conceptualization, Writing – original draft, Writing – review & editing. TX: Resources, Writing – review & editing. YL: Writing – review & editing. YC: Writing – review & editing.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author TW declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abbreviations

MLP, Multi-Layer Perceptron; MMST-ViT, Multi-Modal Spatial-Temporal Vision Transformer; NDVI, Normalized Difference Vegetation Index; QTL, Quantitative Trait Locus/Loci; RF, Random Forest; RKHS, Reproducing Kernel Hilbert Space; RMSE, Root Mean Square Error; RNN, Recurrent Neural Network; RSFM, Remote Sensing Foundation Model; SIFT, Scale-Invariant Feature Transform; SHAP, SHapley Additive exPlanations; SOS/EOS, Start/End of Season; STSGCN, Spatial-Temporal Synchronous Graph Convolutional Network; Swin Transformer, Shifted Window Transformer; SVM/SVR, Support Vector Machine/Support Vector Regression; TFT, Temporal Fusion Transformer; UAS, Unmanned aerial system; UAV, Unmanned Aerial Vehicle; UAV-RSPs, UAV Remote Sensing Platforms; UN, United Nations; ViT, Vision Transformer.

References

Akkus, C., Chu, L., Djakovic, V., Jauch-Walser, S., Koch, P., Loss, G., et al. (2023). Multimodal Deep Learning. arXiv Prepr arXiv230104856.

Altun, M. and Turker, M. (2025). Integration of convolutional neural networks with parcel-based image analysis for crop type mapping from time-series images. Earth Sci. Inf. 18, 303. doi: 10.1007/s12145-025-01819-8

Ansarifar, J., Wang, L., and Archontoulis, S. V. (2021). An interaction regression model for crop yield prediction. Sci. Rep. 11, 17754. doi: 10.1038/s41598-021-97221-7

Araus, J. L. and Cairns, J. E. (2014). Field high-throughput phenotyping: the new crop breeding frontier. Trends Plant Sci. 19, 52–61. doi: 10.1016/j.tplants.2013.09.008

Ashfaq, M., Khan, I., Afzal, R. F., Shah, D., Ali, S., and Tahir, M. (2025a). Enhanced wheat yield prediction through integrated climate and satellite data using advanced AI techniques. Sci. Rep. 15, 18093. doi: 10.1038/s41598-025-02700-w

Ashfaq, M., Khan, I., Shah, D., Ali, S., and Tahir, M. (2025b). Predicting wheat yield using deep learning and multi-source environmental data. Sci. Rep. 15, 26446. doi: 10.1038/s41598-025-11780-7

Advancing geoscience with AI (2024). Nat. Geosci. 17, 947. doi: 10.1038/s41561-024-01572-5

Bahdanau, D., Cho, K., and Bengio, Y. (2014). “Neural machine translation by jointly learning to align and translate,” in CoRR. abs/1409.0. arXiv preprint arXiv:1409.0473. doi: 10.48550/arXiv.1409.0473

Baltrusaitis, T., Ahuja, C., and Morency, L. P. (2019). Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443. doi: 10.1109/TPAMI.2018.2798607

Barré, P., Stöver, B. C., Müller, K. F., and Steinhage, V. (2017). LeafNet: A computer vision system for automatic plant species identification. Ecol. Inform 40, 50–56. doi: 10.1016/j.ecoinf.2017.05.005

Beikmohammadi, A., Faez, K., and Motallebi, A. (2022). SWP-LeafNET: A novel multistage approach for plant leaf identification based on deep CNN. Expert Syst. Appl. 202, 117470. doi: 10.1016/j.eswa.2022.117470

Belgiu, M. and Drăguţ, L. (2016). Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm Remote Sens 114, 24–31. doi: 10.1016/j.isprsjprs.2016.01.011

Bi, L., Wally, O., Hu, G., Tenuta, A. U., Kandel, Y. R., and Mueller, D. S. (2023). A transformer-based approach for early prediction of soybean yield using time-series images. Front. Plant Sci. 14. doi: 10.3389/fpls.2023.1173036

Bodnar, C., Bruinsma, W. P., Lucic, A., Stanley, M., Allen, A., Brandstetter, J., et al. (2025). A foundation model for the Earth system. Nature. 641, 1180–1187. doi: 10.1038/s41586-025-09005-y

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., et al. (2022). On the opportunities and risks of foundation models. 1–214, arXiv preprint arXiv: 2108.07258. doi: 10.48550/arXiv.2108.07258

Breiman, L. (2001). Random forests. Mach. Learn 45, 5–32. doi: 10.1023/A:1010933404324

Chao, D., Wang, H., Wan, F., Yan, S., Fang, W., and Yang, Y. (2025). MtCro: multi-task deep learning framework improves multi-trait genomic prediction of crops. Plant Methods 21, 12. doi: 10.1186/s13007-024-01321-0

Cheng, Y., Liu, Z., Lan, G., Xu, J., Chen, R., and Huang, Y. (2025). Edge_MVSFormer: edge-aware multi-view stereo plant reconstruction based on transformer networks. Sensors 25, 2177. doi: 10.3390/s25072177

Chlingaryan, A., Sukkarieh, S., and Whelan, B. (2018). Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review. Comput. Electron Agric. 151, 61–69. doi: 10.1016/j.compag.2018.05.012

Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., et al. (2014). “Learning phrase representations using {RNN} Encoder{--}Decoder for statistical machine translation,” in Proceedings of the 2014 conference on empirical methods in natural language processing ({EMNLP}). Eds. Moschitti, A., Pang, B., and Daelemans, W. (Association for Computational Linguistics, Doha, Qatar), 1724–1734. doi: 10.3115/v1/D14-1179

Clark, S. A. and van der Werf, J. (2013). Genomic best linear unbiased prediction (gBLUP) for the estimation of genomic breeding values. Methods Mol. Biol. 1019, 321–330. doi: 10.1007/978-1-62703-447-0_13

Cortes, C. and Vapnik, V. (1995). Support-vector networks. Mach. Learn 20, 273–297. doi: 10.1007/BF00994018

Crane-Droesch, A. (2018). Machine learning methods for crop yield prediction and climate change impact assessment in agriculture. Environ. Res. Lett. 13, 114003. doi: 10.1088/1748-9326/aae159

Crossa, J., Martini, J. W. R., Vitale, P., Pérez-Rodríguez, P., Costa-Neto, G., Fritsche-Neto, R., et al. (2025a). Expanding genomic prediction in plant breeding: harnessing big data, machine learning, and advanced software. Trends Plant Sci. 30, 756–774. doi: 10.1016/j.tplants.2024.12.009

Crossa, J., Montesinos-Lopez, O. A., Costa-Neto, G., Vitale, P., Martini, J. W. R., Runcie, D., et al. (2025b). Machine learning algorithms translate big data into predictive breeding accuracy. Trends Plant Sci. 30, 167–184. doi: 10.1016/j.tplants.2024.09.011

Danilevicz, M. F., Bayer, P. E., Nestor, B. J., Bennamoun, M., and Edwards, D. (2021). Resources for image-based high-throughput phenotyping in crops and data sharing challenges. Plant Physiol. 187, 699–715. doi: 10.1093/plphys/kiab301

David, E., Madec, S., Sadeghi-Tehran, P., Aasen, H., Zheng, B., Liu, S., et al. (2020). Global wheat head detection (GWHD) dataset: A large and diverse dataset of high-resolution RGB-labelled images to develop and benchmark wheat head detection methods. Plant Phenomics 2020, 3521852. doi: 10.34133/2020/3521852

De Los Campos, G., Gianola, D., Rosa, G. J. M., Weigel, K. A., and Crossa, J. (2010). Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genet. Res. (Camb). 92, 295–308. doi: 10.1017/S0016672310000285

Depaepe, T. (2025). Harder, better, faster, stronger, and with less annotated data: ESGAN and plant sciences. Plant Physiol. 198, kiaf171. doi: 10.1093/plphys/kiaf171

Desloires, J., Ienco, D., and Botrel, A. (2024). Early season forecasting of corn yield at field level from multi-source satellite time series data. Remote Sens 16, 1573. doi: 10.3390/rs16091573

Devlin, J., Chang, M. W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL HLT 2019–2019 Conf North Am. Chapter Assoc. Comput. Linguist Hum. Lang Technol. - Proc. Conf. 1, 4171–4186. doi: 10.18653/v1/N19-1423

Ding, M., Guan, Q., Li, L., Zhang, H., Liu, C., and Zhang, L. (2020). Phenology-based rice paddy mapping using multi-source satellite imagery and a fusion algorithm applied to the poyang lake plain, southern China. Remote Sens 12, 1022. doi: 10.3390/rs12061022

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). “an image is worth 16X16 words: transformers for image recognition at scale,” in ICLR 2021 - 9th int conf learn represent. doi: 10.48550/arXiv.2010.11929

Fahlgren, N., Gehan, M. A., and Baxter, I. (2015). Lights, camera, action: high-throughput plant phenotyping is ready for a close-up. Curr. Opin. Plant Biol. 24, 93–99. doi: 10.1016/j.pbi.2015.02.006

Fan, J., Bai, J., Li, Z., Ortiz-Bobea, A., and Gomes, C. P. (2022). A GNN-RNN approach for harnessing geospatial and temporal information: application to crop yield prediction. Proc. AAAI Conf Artif. Intell. 36, 11873–11881. doi: 10.1609/aaai.v36i11.21444

Fu, H., Lu, J., Cui, G., Nie, J., Wang, W., She, W., et al. (2024). Advanced plant phenotyping: unmanned aerial vehicle remote sensing and cimageA software technology for precision crop growth monitoring. Agronomy 14, 2534. doi: 10.3390/agronomy14112534

Gaillard, M., Miao, C., Schnable, J. C., and Benes, B. (2020). Voxel carving-based 3D reconstruction of sorghum identifies genetic determinants of light interception efficiency. Plant Direct. 4, 1–16. doi: 10.1002/pld3.255

Gardezi, M., Joshi, B., Rizzo, D. M., Ryan, M., Prutzer, E., Brugler, S., et al. (2024). Artificial intelligence in farming: Challenges and opportunities for building trust. Agron. J. 116, 1217–1228. doi: 10.1002/agj2.21353

Gehan, M. A., Fahlgren, N., Abbasi, A., Berry, J. C., Callen, S. T., Chavez, L., et al. (2017). PlantCV v2: Image analysis software for high-throughput plant phenotyping. PeerJ. 5, e4088. doi: 10.7717/peerj.4088

Ghanbari, A., Shirdel, G. H., and Maleki, F. (2024). Semi-self-supervised domain adaptation: developing deep learning models with limited annotated data for wheat head segmentation. Algorithms. 17, 1–12. doi: 10.3390/a17060267

Gianola, D. and van Kaam, JBCHM. (2008). Reproducing kernel hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics. 178, 2289–2303. doi: 10.1534/genetics.107.084285

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). “Neural message passing for Quantum chemistry,” in Proceedings of the 34th international conference on machine learning, vol. 70. (Sydney, New South Wales, Australia: JMLR.org), 1263–1272. (ICML’17). doi: 10.48550/arXiv.1704.01212

Gopalan, B., Nascimento, N., and Monga, V. (2025). Modular transformer architecture for precision agriculture imaging, arXiv preprint arXiv:2508.03751. doi: 10.48550/arXiv.2508.03751

Goyal, P., Mitra, A., and Sinha, M. (2025). “SepHRNet: Generating High-Resolution Crop Maps from Remote Sensing imagery using HRNet with Separable Convolution,” in Proceedings of the 8th International Conference on Data Science and Management of Data (12th ACM IKDD CODS and 30th COMAD). Jodhpur, India: Association for Computing Machinery (ACM), 27–34. doi: 10.1145/3703323.3703327

Guo, W., Carroll, M. E., Singh, A., Swetnam, T. L., Merchant, N., Sarkar, S., et al. (2021). UAS-based plant phenotyping for research and breeding applications. Plant Phenomics 2021, 9840192. doi: 10.34133/2021/9840192

Guo, A., Sun, K., and Wang, M. (2023). Domain adaptive fruit detection method based on multiple alignments. J. Intell. & Fuzzy Syst. 45, 5837–5851. doi: 10.3233/JIFS-232104

Gupta, A. and Singh, A. (2023). Agri-GNN: A novel genotypic-topological graph neural network framework built on graphSAGE for optimized yield prediction, Vol. 1–19, arXiv preprint arXiv:2310.13037. doi: 10.48550/arXiv.2310.13037

Han, J., Kwon, Y., Choi, Y. S., and Kang, S. (2024). Improving chemical reaction yield prediction using pre-trained graph neural networks. J. Cheminform 16, 25. doi: 10.1186/s13321-024-00818-z

Han, S., Mao, H., and Dally, W. J. (2016). “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” in International Conference on Learning Representations 2016 (ICLR 2016). San Juan, Puerto Rico, USA: OpenReview / ICLR, 1–14.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). Las Vegas, Nevada, USA: IEEE, 770–778. doi: 10.1109/CVPR.2016.90

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Comput. 9, 1735–1780. doi: 10.1162/neco.1997.9.8.1735

Howard, A., Sandler, M., Chen, B., Wang, W., Chen, L. C., Tan, M., et al. (2019). “Searching for mobileNetV3,” in Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019). Seoul, Republic of Korea: IEEE/CVF, 1314–1324. doi: 10.1109/ICCV.2019.00140

Hu, B., Jiang, W., Zeng, J., Cheng, C., and He, L. (2023). FOTCA: hybrid transformer-CNN architecture using AFNO for accurate plant leaf disease image recognition. Front. Plant Sci. 14, 1–12. doi: 10.3389/fpls.2023.1231903

Jain, M., Srivastava, A. K., Balwinder-Singh, Joon, R. K., McDonald, A., Royal, K., et al. (2016). Mapping smallholder wheat yields and sowing dates using micro-satellite data. Remote Sens 8, 860. doi: 10.3390/rs8100860

James, C., Chandra, S. S., and Chapman, S. C. (2025). A scalable and efficient UAV-based pipeline and deep learning framework for phenotyping sorghum panicle morphology from point clouds. Plant Phenomics 7, 100050. doi: 10.1016/j.plaphe.2025.100050

Jeong, J. H., Resop, J. P., Mueller, N. D., Fleisher, D. H., Yun, K., Butler, E. E., et al. (2016). Random forests for global and regional crop yield predictions. PloS One 11, e0156571. doi: 10.1371/journal.pone.0156571

Jiang, Y. and Li, C. (2020). Convolutional neural networks for image-based high-throughput plant phenotyping: A review. Plant Phenomics. 2020, 4152816. doi: 10.34133/2020/4152816

You, J., Li, X., Low, M., Lobell, D., and Ermon, S. (2017). Deep gaussian process for crop yield prediction based on remote sensing data. Proc. Thirty-First AAAI Conf. Artif. Intell., 4559–4565. doi: 10.1609/aaai.v31i1.11172

Joshi, A., Guevara, D., and Earles, M. (2023). Standardizing and centralizing datasets for efficient training of agricultural deep learning models. Plant Phenomics 5, 84. doi: 10.34133/plantphenomics.0084

Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. doi: 10.1038/s41586-021-03819-2

Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., et al. (2021). Advances and open problems in federated learning. Foundations Trends Mach. Learn. 14, 1–210. doi: 10.1561/2200000083

Karmakar, P., Teng, S. W., Murshed, M., Pang, S., Li, Y., and Lin, H. (2024). Crop monitoring by multimodal remote sensing: A review. Remote Sens Appl. Soc. Environ. 33, 101093. doi: 10.1016/j.rsase.2023.101093

Khaki, S. and Wang, L. (2019). Crop yield prediction using deep neural networks. Front. Plant Sci. 10. doi: 10.3389/fpls.2019.00621

Khuimphukhieo, I. and da Silva, J. A. (2025). Unmanned aerial systems (UAS)-based field high throughput phenotyping (HTP) as plant breeders’ toolbox: A comprehensive review. Smart Agric. Technol. 11, 100888. doi: 10.1016/j.atech.2025.100888

Kipf, T. N. and Welling, M. (2017). “Semi-Supervised classfication with graph convolutional networks,” in 5th International Conference on Learning Representations (ICLR 2017). Palais des Congrès Neptune, Toulon, France: ICLR, 1–14. arXiv preprint arXiv: 1609.02907. doi: 10.48550/arXiv.1609.02907

Knecht, A. C., Campbell, M. T., Caprez, A., Swanson, D. R., and Walia, H. (2016). Image Harvest: an open-source platform for high-throughput plant image processing and analysis. J. Exp. Bot. 67, 3587–3599. doi: 10.1093/jxb/erw176

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90. doi: 10.1145/3065386

Kumar, A., Khare, S., and Rossi, S. (2025). PhenoAI: A deep learning Python framework to process close-range time-lapse PhenoCam data. Ecol. Inform 88, 103134. doi: 10.1016/j.ecoinf.2025.103134

Kumar, C., Mubvumba, P., Huang, Y., Dhillon, J., and Reddy, K. (2023). Multi-stage corn yield prediction using high-resolution UAV multispectral data and machine learning models. Agronomy 13, 1277. doi: 10.3390/agronomy13051277

Lan, Y., Huang, K., Yang, C., Lei, L., Ye, J., Zhang, J., et al. (2021). Real-time identification of rice weeds by UAV low-altitude remote sensing based on improved semantic segmentation model. Remote Sens 13, 4370. doi: 10.3390/rs13214370

Lecun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proc. IEEE. 86, 2278–2324. doi: 10.1109/5.726791

Li, D., Li, J., Xiang, S., and Pan, A. (2022). PSegNet: simultaneous semantic and instance segmentation for point clouds of plants. Plant phenomics (Washington DC). 2022, 9787643. doi: 10.34133/2022/9787643

Li, W., Liang, S., Chen, K., Chen, Y., Ma, H., Xu, J., et al. (2025). AgriFM: A multi-source temporal remote sensing foundation model for crop mapping, arXiv preprint arXiv:2505.21357. doi: 10.48550/arXiv.2505.21357

Li, G., Wang, Y., Zhao, Q., Yuan, P., and Chang, B. (2023). PMVT: a lightweight vision transformer for plant disease identification on mobile devices. Front. Plant Sci. 14. doi: 10.3389/fpls.2023.1256773

Lin, F., Crawford, S., Guillot, K., Zhang, Y., Chen, Y., Yuan, X., et al. (2023). “MMST-viT: climate change-aware crop yield prediction via multi-modal spatial-temporal vision transformer,” in 2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France: IEEE/CVF, 5751–5761. doi: 10.1109/ICCV51070.2023.00531

Lin, S., Qi, Z., Li, X., Zhang, H., Lv, Q., and Huang, D. (2024). A phenological-knowledge-independent method for automatic paddy rice mapping with time series of polarimetric SAR images. ISPRS J. Photogramm Remote Sens 218, 628–644. doi: 10.1016/j.isprsjprs.2024.09.035

Lin, T., Zhong, R., Wang, Y., Xu, J., Jiang, H., Xu, J., et al. (2020). DeepCropNet: a deep spatial-temporal learning framework for county-level corn yield estimation. Environ. Res. Lett. 15, 034016. doi: 10.1088/1748-9326/ab66cb

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., et al. (2021). “Swin transformer: hierarchical vision transformer using shifted windows,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021). Montreal, QC, Canada: IEEE/CVF, 9992–10002. doi: 10.1109/ICCV48922.2021.00986

Liu, Y., Wang, S., Chen, J., Chen, B., Wang, X., Hao, D., et al. (2022). Rice yield prediction and model interpretation based on satellite and climatic indicators using a transformer method. Remote Sens. 14, 5045. doi: 10.3390/rs14195045

Lu, J., Batra, D., Parikh, D., and Lee, S. (2019). “ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks,” in Proceedings of the 33rd international conference on neural information processing systems (Curran Associates Inc, Red Hook, NY, USA). doi: 10.48550/arXiv.1908.02265

Lu, F., Zhang, B., Hou, Y., Xiong, X., Dong, C., Lu, W., et al. (2025). A spatiotemporal attention-guided graph neural network for precise hyperspectral estimation of corn nitrogen content. Agronomy 15, 1041. doi: 10.3390/agronomy15051041

Lundberg, S. M. and Lee, S. I. (2017). “A unified approach to interpreting model predictions,” in Proceedings of the 31st international conference on neural information processing systems (Curran Associates Inc, Red Hook, NY, USA), 4768–4777.

Luo, J., Liu, Z., Wang, Y., Tang, A., Zuo, H., and Han, P. (2024). Efficient small object detection you only look once: A small object detection algorithm for aerial images. Sensors. 24, 7067. doi: 10.3390/s24217067

Ma, Y., Yang, Z., Huang, Q., and Zhang, Z. (2023). Improving the transferability of deep learning models for crop yield prediction: A partial domain adaptation approach. Remote Sensing. 15, 4562. doi: 10.3390/rs15184562

Ma, Y., Zhang, Z., Yang, H. L., and Yang, Z. (2021). An adaptive adversarial domain adaptation approach for corn yield prediction. Comput. Electron Agric. 187, 106314. doi: 10.1016/j.compag.2021.106314

McMahan, B., Moore, E., Ramage, D., Hampson, S., and Arcas, B. (2017). “Communication-efficient learning of deep networks from decentralized data,” in Proceedings of the 20th international conference on artificial intelligence and statistics, vol. 54 . Eds. Singh, A. and Zhu, J. (Fort Lauderdale, Florida, USA: Proceedings of Machine Learning Research), 1273–1282. doi: 10.48550/arXiv.1602.05629

Mehdipour, S., Mirroshandel, S. A., and Tabatabaei, S. A. (2025). Vision transformers in precision agriculture: A comprehensive survey, arXiv preprint arXiv: 2504.21706. doi: 10.48550/arXiv.2504.21706

Minervini, M., Abdelsamea, M. M., and Tsaftaris, S. A. (2014). Image-based plant phenotyping with incremental learning and active contours. Ecol. Inform 23, 35–48. doi: 10.1016/j.ecoinf.2013.07.004

Mohanty, S. P., Hughes, D. P., and Salathé, M. (2016). Using deep learning for image-based plant disease detection. Front. Plant Sci. 7. doi: 10.3389/fpls.2016.01419

Montesinos López, O. A., Montesinos López, A., and Crossa, J. (2022). “Support vector machines and support vector regression BT,” in Multivariate statistical machine learning methods for genomic prediction. Eds. Montesinos López, O. A., Montesinos López, A., and Crossa, J. (Springer International Publishing, Cham), 337–378. doi: 10.1007/978-3-030-89010-0_9

Mostafa, S., Mondal, D., Panjvani, K., Kochian, L., and Stavness, I. (2023). Explainable deep learning in plant phenotyping. Front. Artif. Intell. 6, 1203546. doi: 10.3389/frai.2023.1203546

Murphy, K. M., Ludwig, E., Gutierrez, J., and Gehan, M. A. (2024). Deep learning in image-based plant phenotyping. Annu. Rev. Plant Biol. 75, 771–795. doi: 10.1146/annurev-arplant-070523-042828

Nabwire, S., Suh, H. K., Kim, M. S., Baek, I., and Cho, B. K. (2021). Review: Application of artificial intelligence in phenomics. Sensors. 21, 1–19. doi: 10.3390/s21134363

Najjar, H., Pathak, D., Nuske, M., and Dengel, A. (2025). Intrinsic explainability of multimodal learning for crop yield prediction. Comput. Electron Agric. 239, 111003. doi: 10.1016/j.compag.2025.111003

Nawaz, U., Zaheer, M. Z., Khan, F. S., Cholakkal, H., Khan, S., and Anwer, R. M. (2025). AI in agriculture: A survey of deep learning techniques for crops, fisheries and livestock. 1–42, arXiv preprint arXiv: 2507.22101. doi: 10.48550/arXiv.2507.22101

Nevavuori, P., Narra, N., Linna, P., and Lipping, T. (2020). Crop yield prediction using multitemporal UAV data and spatio-temporal deep learning models. Remote Sens 12, 4000. doi: 10.3390/rs12234000

Newman, S. J. and Furbank, R. T. (2021). Explainable machine learning models of major crop traits from satellite-monitored continent-wide field trial data. Nat. plants. 7, 1354–1363. doi: 10.1038/s41477-021-01001-0

Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A. Y., et al. (2011). Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011). Bellevue, Washington, USA: Microtome Publishing on behalf of the International Machine Learning Society (IMLS). 689–696.

Nguyen, C., Sagan, V., Bhadra, S., and Moose, S. (2023). UAV multisensory data fusion and multi-task deep learning for high-throughput maize phenotyping. Sensors 23, 1827. doi: 10.3390/s23041827

Ninomiya, S. (2022). High-throughput field crop phenotyping: current status and challenges. Breed Sci. 72, 3–18. doi: 10.1270/jsbbs.21069

Nirosha, U. and Vennila, G. (2025a). “Federated learning-powered hybrid model for crop prediction: combining graph neural networks with RNNs,” in 2025 Global Conference in Emerging Technology (GINOTECH 2025). PUNE, India: IEEE, 1–6.

Nirosha, U. and Vennila, G. (2025b). Enhancing crop yield prediction for agriculture productivity using federated learning integrating with graph and recurrent neural networks model. Expert Syst. Appl. 289, 128312. doi: 10.1016/j.eswa.2025.12832

Nishio, M. and Satoh, M. (2015). Genomic best linear unbiased prediction method including imprinting effects for genomic evaluation. Genet. Sel Evol. 47, 32. doi: 10.1186/s12711-015-0091-y

Osco, L. P., Marcato Junior, J., Marques Ramos, A. P., de Castro Jorge, L. A., Fatholahi, S. N., de Andrade Silva, J., et al. (2021a). A review on deep learning in UAV remote sensing. Int. J. Appl. Earth Obs Geoinf 102, 102456. doi: 10.1016/j.jag.2021.102456

Osco, L. P., Nogueira, K., Marques Ramos, A. P., Faita Pinheiro, M. M., Furuya, D. E. G., Gonçalves, W. N., et al. (2021b). Semantic segmentation of citrus-orchard using deep neural networks and multispectral UAV-based imagery. Precis Agric. 22, 1171–1188. doi: 10.1007/s11119-020-09777-5

Papoutsoglou, E. A., Athanasiadis, I. N., Visser, R. G. F., and Finkers, R. (2023). The benefits and struggles of FAIR data: the case of reusing plant phenotyping data. Sci. data. 10, 457. doi: 10.1038/s41597-023-02364-z

Poorter, H., Niinemets, Ü., Ntagkas, N., Siebenkäs, A., Mäenpää, M., Matsubara, S., et al. (2019). A meta-analysis of plant responses to light intensity for 70 traits ranging from molecules to whole plant performance. New Phytol. 223, 1073–1105. doi: 10.1111/nph.15754

Pound, M. P., Atkinson, J. A., Townsend, A. J., Wilson, M. H., Griffiths, M., Jackson, A. S., et al. (2017). Deep machine learning provides state-of-the-art performance in image-based plant phenotyping. Gigascience. 6, 1–10. doi: 10.1093/gigascience/gix083

Pyzer-Knapp, E. O., Pitera, J. W., Staar, P. W. J., Takeda, S., Laino, T., Sanders, D. P., et al. (2022). Accelerating materials discovery using artificial intelligence, high performance computing and robotics. NPJ Comput. Mater 8, 84. doi: 10.1038/s41524-022-00765-z

Qiao, M., He, X., Cheng, X., Li, P., Zhao, Q., Zhao, C., et al. (2023). KSTAGE: A knowledge-guided spatial-temporal attention graph learning network for crop yield prediction. Inf Sci. (Ny) 619, 19–37. doi: 10.1016/j.ins.2022.10.112

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021). Learning transferable visual models from natural language supervision. Proc. Mach. Learn Res. 139, 8748–8763. doi: 10.48550/arXiv.2103.00020

Rajpurkar, P., Chen, E., Banerjee, O., and Topol, E. J. (2022). AI in health and medicine. Nat. Med. 28, 31–38. doi: 10.1038/s41591-021-01614-0

Reddy, J., Niu, H., Scott, J. L. L., Bhandari, M., Landivar, J. A., Bednarz, C. W., et al. (2024). Cotton yield prediction via UAV-based cotton boll image segmentation using YOLO model and segment anything model (SAM). Remote Sens 16, 4346. doi: 10.3390/rs16234346

Ren, S., He, K., Girshick, R., and Sun, J. (2017). Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149. doi: 10.1109/TPAMI.2016.2577031

Rojas, O. (2021). Next generation agricultural stress index system (ASIS) for agricultural drought monitoring. Remote Sens 13, 959. doi: 10.3390/rs13050959

Ronneberger, O., Fischer, P., and Brox, T. (2015). “U-net: convolutional networks for biomedical image segmentation BT,” in Medical image computing and computer-assisted intervention – MICCAI 2015. Eds. Navab, N., Hornegger, J., Wells, W. M., and Frangi, A. F. (Springer International Publishing, Cham), 234–241. doi: 10.1007/978-3-319-24574-4_28

Sabo, F., Meroni, M., Waldner, F., and Rembold, F. (2023). Is deeper always better? Evaluating deep learning models for yield forecasting with small data. Environ. Monit Assess. 195, 1153. doi: 10.1007/s10661-023-11609-8

Saha, S., Kucher, O. D., Utkina, A. O., and Rebouh, N. Y. (2025). Precision agriculture for improving crop yield predictions: a literature review. Front. Agron. 7. doi: 10.3389/fagro.2025.1566201

Sarkar, S., Dey, A., Pradhan, R., Sarkar, U. M., Chatterjee, C., Mondal, A., et al. (2024). Crop yield prediction using multimodal meta-transformer and temporal graph neural networks. IEEE Trans. AgriFood Electron. 2, 545–553. doi: 10.1109/TAFE.2024.3438330

Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017). “Grad-CAM: visual explanations from deep networks via gradient-based localization,” in 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE/CVF, 618–626. doi: 10.1109/ICCV.2017.74

Serrano, D. R., Luciano, F. C., Anaya, B. J., Ongoren, B., Kara, A., Molina, G., et al. (2024). Artificial intelligence (AI) applications in drug discovery and drug delivery: revolutionizing personalized medicine. Pharmaceutics 16, 1328. doi: 10.3390/pharmaceutics16101328

Shaban, A. and Yousefi, S. (2024). Multimodal deep learning. Springer Optim Its Appl. 211, 209–219. doi: 10.1007/978-3-031-53092-0_10

Shahhosseini, M., Hu, G., and Archontoulis, S. V. (2020). Forecasting corn yield with machine learning ensembles. Front. Plant Sci. 11, 1–16. doi: 10.3389/fpls.2020.01120

Sharma, S., Rai, S., and Krishnan, N. C. (2020). Wheat crop yield prediction using deep LSTM model, arXiv preprint arXiv:2011.01498v1. doi: 10.48550/arXiv.2011.01498

Shi, S., Li, X., Fang, L., Liu, A., Su, G., Zhang, Y., et al. (2021). Genomic prediction using bayesian regression models with global–local prior. Front. Genet. 12. doi: 10.3389/fgene.2021.628205

Simonyan, K. and Zisserman, A. (2015). “Very deep convolutional networks for large-scale image recognition,” in 3rd International Conference on Learning Representations (ICLR 2015). San Diego, CA, USA: ICLR, 1–14. arXiv preprint arXiv: 1409.1556.

Song, C., Lin, Y., Guo, S., and Wan, H. (2020). Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. Proc. AAAI Conf Artif. Intell. 34, 914–921. doi: 10.1609/aaai.v34i01.5438

Stephan, J., Stegle, O., and Beyer, A. (2015). A random forest approach to capture genetic effects in the presence of population structure. Nat. Commun. 6, 7432. doi: 10.1038/ncomms8432

Sun, T., Chen, Y., Zheng, B., Sun, W., and May, L. G. (2025). Learning spatio-temporal dynamics for trajectory recovery via time-aware transformer. IEEE Trans. Intelligent Transportation Syst. 26, 16584–16601. doi: 10.1109/TITS.2025.3574100

Sun, J., Di, L., Sun, Z., Shen, Y., and Lai, Z. (2019). County-level soybean yield prediction using deep CNN-LSTM model. Sensors 19, 4363. doi: 10.3390/s19204363

Tan, H. and Bansal, M. (2019). “Lxmert: learning cross-modality encoder representations from transformers,” in Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). Eds. Inui, K., Jiang, J., Ng, V., and Wan, X. (Association for Computational Linguistics, Hong Kong, China), 5100–5111. doi: 10.18653/v1/D19-1514

Tang, Z., Parajuli, A., Chen, C. J., Hu, Y., Revolinski, S., Medina, C. A., et al. (2021). Validation of UAV-based alfalfa biomass predictability using photogrammetry with fully automatic plot segmentation. Sci. Rep. 11, 3336. doi: 10.1038/s41598-021-82797-x

Tardieu, F., Cabrera-Bosquet, L., Pridmore, T., and Bennett, M. (2017). Plant phenomics, from sensors to knowledge. Curr. Biol. 27, R770–R783. doi: 10.1016/j.cub.2017.05.055

Tausen, M., Clausen, M., Moeskjær, S., Shihavuddin, A., Dahl, A. B., Janss, L., et al. (2020). Greenotyper: image-based plant phenotyping using distributed computing and deep learning. Front. Plant Sci. 11, 1181. doi: 10.3389/fpls.2020.01181

Torgbor, B. A., Rahman, M. M., Brinkhoff, J., Sinha, P., and Robson, A. (2023). Integrating remote sensing and weather variables for mango yield prediction using a machine learning approach. Remote Sens. 15, 3075. doi: 10.3390/rs15123075

Ubbens, J., Cieslak, M., Prusinkiewicz, P., and Stavness, I. (2018). The use of plant models in deep learning: an application to leaf counting in rosette plants. Plant Methods 14, 6. doi: 10.1186/s13007-018-0273-z

Ubbens, J. R. and Stavness, I. (2017). Deep plant phenomics: A deep learning platform for complex plant phenotyping tasks. Front. Plant Sci. 8, 1190. doi: 10.3389/fpls.2017.01190

van Dijk, M., Morley, T., Rau, M. L., and Saghai, Y. (2021). A meta-analysis of projected global food demand and population at risk of hunger for the period 2010–2050. Nat. Food 2, 494–501. doi: 10.1038/s43016-021-00322-9

Van Hoolst, R., Eerens, H., Haesen, D., Royer, A., Bydekerke, L., Rojas, O., et al. (2016). FAO’s AVHRR-based Agricultural Stress Index System ASIS for global drought monitoring. Int. J. Remote Sens. 37, 418–439. doi: 10.1080/01431161.2015.1126378

van Klompenburg, T., Kassahun, A., and Catal, C. (2020). Crop yield prediction using machine learning: A systematic literature review. Comput. Electron. Agric. 177, 105709. doi: 10.1016/j.compag.2020.105709

Varela, S., Zheng, X., Njuguna, J., Sacks, E., Allen, D., Ruhter, J., et al. (2025). Breaking the barrier of human-annotated training data for machine learning-aided plant research using aerial imagery. Plant Physiol. 197, kiaf132. doi: 10.1093/plphys/kiaf132

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). “Attention is all you need,” in Advances in Neural Information Processing Systems 30 (NIPS 2017). Long Beach, CA, USA: Curran Associates, Inc., 5999–6009. doi: 10.5555/3295222.3295349

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903. doi: 10.48550/arXiv.1710.10903

Wang, C., Ling, L., Kuai, J., Xie, J., Ma, N., You, L., et al. (2025). Integrating UAV and satellite LAI data into a modified DSSAT-rapeseed model to improve yield predictions. Field Crops Res. 327, 109883. doi: 10.1016/j.fcr.2025.109883

Wang, H., Yan, S., Wang, W., Chen, Y., Hong, J., He, Q., et al. (2025). Cropformer: An interpretable deep learning framework for crop genomic prediction. Plant Commun. 6, 101223. doi: 10.1016/j.xplc.2024.101223

Wang, K., Abid, M. A., Rasheed, A., Crossa, J., Hearne, S., and Li, H. (2023). DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. Mol. Plant 16, 279–293. doi: 10.1016/j.molp.2022.11.004

Wang, Y., Zhang, Z., Feng, L., Ma, Y., and Du, Q. (2021). A new attention-based CNN approach for crop mapping using time series Sentinel-2 images. Comput. Electron. Agric. 184, 106090. doi: 10.1016/j.compag.2021.106090

Waqas, M., Naseem, A., Humphries, U. W., Hlaing, P. T., Dechpichai, P., and Wangwongchai, A. (2025). Applications of machine learning and deep learning in agriculture: A comprehensive review. Green Technol. Sustain. 3, 100199. doi: 10.1016/j.grets.2025.100199

Wu, B., Zhang, M., Zeng, H., Tian, F., Potgieter, A. B., Qin, X., et al. (2023). Challenges and opportunities in remote sensing-based crop monitoring: a review. Natl. Sci. Rev. 10, nwac290. doi: 10.1093/nsr/nwac290

Wu, H., Wu, W., Huang, Y., Liu, S., Liu, Y., Zhang, N., et al. (2025). FEWheat-YOLO: A lightweight improved algorithm for wheat spike detection. Plants 14, 3058. doi: 10.3390/plants14193058

Xie, W., Zhao, M., Liu, Y., Yang, D., Huang, K., Fan, C., et al. (2024). Recent advances in Transformer technology for agriculture: A comprehensive survey. Eng. Appl. Artif. Intell. 138, 109412. doi: 10.1016/j.engappai.2024.109412

Yang, C., Baireddy, S., Cai, E., Crawford, M., and Delp, E. J. (2021). “Field-based plot extraction using UAV RGB images,” in 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2021). Montreal, QC, Canada: IEEE/CVF, 1390–1398. doi: 10.1109/ICCVW54120.2021.00160

Yang, G., Liu, J., Zhao, C., Li, Z., and Huang, Y. (2017). Unmanned aerial vehicle remote sensing for field-based crop phenotyping: current status and perspectives. Front. Plant Sci. 8, 1111. doi: 10.3389/fpls.2017.01111

Yu, S., Xie, L., and Dai, L. (2025). ST-CFI: Swin Transformer with convolutional feature interactions for identifying plant diseases. Sci. Rep. 15, 25000. doi: 10.1038/s41598-025-08673-0

Zaka, M. M. and Samat, A. (2024). Advances in remote sensing and machine learning methods for invasive plants study: A comprehensive review. Remote Sens. 16, 3781. doi: 10.3390/rs16203781

Zenkl, R., Timofte, R., Kirchgessner, N., Roth, L., Hund, A., Van Gool, L., et al. (2021). Outdoor plant segmentation with deep learning for high-throughput field phenotyping on a diverse wheat dataset. Front. Plant Sci. 12, 774068. doi: 10.3389/fpls.2021.774068

Zhan, Y., Zhou, Y., Bai, G., and Ge, Y. (2024). Bagging improves the performance of deep learning-based semantic segmentation with limited labeled images: A case study of crop segmentation for high-throughput plant phenotyping. Sensors 24, 3420. doi: 10.3390/s24113420

Zhang, L., Li, C., Wu, X., Xiang, H., Jiao, Y., and Chai, H. (2024). BO-CNN-BiLSTM deep learning model integrating multisource remote sensing data for improving winter wheat yield estimation. Front. Plant Sci. 15, 1500499. doi: 10.3389/fpls.2024.1500499

Zhang, W., Chen, K., Wang, J., Shi, Y., and Guo, W. (2021). Easy domain adaptation method for filling the species gap in deep learning-based fruit detection. Hortic. Res. 8, 119. doi: 10.1038/s41438-021-00553-8

Zhang, W., Zheng, C., Wang, C., and Guo, W. (2024). DomAda-fruitDet: domain-adaptive anchor-free fruit detection model for auto labeling. Plant Phenomics 6, 135. doi: 10.34133/plantphenomics.0135

Zhou, H., Huang, F., Lou, W., Gu, Q., Ye, Z., Hu, H., et al. (2025). Yield prediction through UAV-based multispectral imaging and deep learning in rice breeding trials. Agric. Syst. 223, 104214. doi: 10.1016/j.agsy.2024.104214

Zhou, H., Jiang, F., and Lu, H. (2023). SSDA-YOLO: Semi-supervised domain adaptive YOLO for cross-domain object detection. Comput. Vis. Image Underst. 229, 103649. doi: 10.1016/j.cviu.2023.103649

Keywords: AI, image analysis, phenotyping, precision agriculture, remote sensing, yield prediction

Citation: Wang T, Tong R, Xu T, Li Y and Chen Y (2026) Artificial intelligence in plant science: from image-based phenotyping to yield and trait prediction. Front. Plant Sci. 16:1732979. doi: 10.3389/fpls.2025.1732979

Received: 27 October 2025; Revised: 06 December 2025; Accepted: 26 December 2025;
Published: 29 January 2026.

Edited by:

Ruslan Kalendar, University of Helsinki, Finland

Reviewed by:

Chien Van Ha, Texas Tech University, United States
Krishna Kumar Rai, Amity University, India

Copyright © 2026 Wang, Tong, Xu, Li and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Tong Wang, wangtongnelly@gmail.com

†These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.