
ORIGINAL RESEARCH article

Front. Plant Sci., 06 February 2026

Sec. Sustainable and Intelligent Phytoprotection

Volume 16 - 2025 | https://doi.org/10.3389/fpls.2025.1696730

UAV multispectral sensing and data-driven modeling for precision onion yield prediction

Sagar M. Wayal1, Shardul Parab2, Anusha Raj1, Kiran Khandagale1, Sanket Bhegde2, Mukund Dawale1, Indira Bhangare1, Mahesh Khaire1, Yogesh Kadam1, Zafar Shaikh2, V. Karuppaiah1, Pranjali Gedam1, Bhushan Bibwe1, Sanket J. More1, Lakesh K. Sharma3, Vijay Mahajan1 and Suresh J. Gawande1*
  • 1ICAR-Directorate of Onion and Garlic Research, Pune, India
  • 2TIH Foundation for IoT & IoE, Mumbai, India
  • 3Soil, Water, and Ecosystem Sciences Department, University of Florida, IFAS, Gainesville, FL, United States

The integration of unmanned aerial vehicle (UAV)-assisted remote sensing with the Internet of Things (IoT) and Internet of Everything (IoE) offers a robust platform for optimizing precision agriculture by capturing spatiotemporal variability in crop growth. In this context, the present study aimed to predict the bulb yield of rainy-season onion crops across four staggered planting dates using UAV-based multispectral imagery. Canopy reflectance mosaics acquired at key growth stages, along with vegetation indices (VIs), viz. NDVI, NDRE, SAVI, LAI, NORM2, and GNDVI, were extracted for yield modeling. Yield prediction models at three onion growth stages were developed and assessed using five machine learning algorithms: linear regression (lm), random forest (rf), support vector machine with radial kernel (svmRadial), gradient boosting (gbm), and elastic net regression (glmnet), with model training and evaluation performed using 10-fold cross-validation. Among these, random forest consistently outperformed the other models at all growth stages, showing promising results at the bulb development stage, with a training R² = 0.944, RMSE = 1.919 t ha−1, MAE = 1.523 t ha−1, and a validation R² = 0.755, RMSE = 3.824 t ha−1, and MAE = 3.11 t ha−1. The support vector machine also demonstrated strong generalization (training R² = 0.787; validation R² = 0.716), highlighting its predictive capability. Year-wise evaluation revealed notable interannual variability in model performance, with models trained on data from 2024 outperforming those from 2023. Overall, these results demonstrate the efficacy of UAV-derived multispectral sensing, combined with machine learning, as an effective, scalable, and timely approach for reliable onion yield prediction and decision support in rainy-season onion crops under varying agronomic conditions.

1 Introduction

Onion (Allium cepa L.) is a critical staple crop worldwide, valued for its nutritional benefits, extended storability, and central role in culinary applications. India stands as the world’s largest producer, with onion cultivated across an area of 1.7 Mha and contributing 30.19 Mt to global onion production in 2022–23 (FAOSTAT, 2025). Owing to its geographical location and diverse agro-climatic conditions, onion in India is cultivated across three distinct seasons, namely: (1) rainy season (June to September, referred to as Kharif onion); (2) post-rainy season (September to December, also called late Kharif onion); and (3) winter season (November–December to March, also called Rabi onion). Among these seasons, rainy-season onion production faces significant agronomic and economic challenges. Although rainy-season onions account for only 20%–25% of India’s total onion output, they play a crucial role in price stabilization because of their timely arrival, which coincides with the depletion of stored bulbs from the previous post-rainy or winter season (Gadge and Lawande, 2012). Yet, the rainy-season crop is highly susceptible to biotic stresses, including damping-off, anthracnose, purple blotch, and Stemphylium blight, owing to excessive moisture and unpredictable weather patterns, which often result in severe yield losses and sharp market price fluctuations (Moka et al., 2021; Salunke et al., 2022). Accurately forecasting onion yield is therefore essential for proactive supply chain management, early warning of shortages, and effective policy response, irrespective of the growing season.

Traditional methods for estimating crop yield rely on manual field surveys and destructive sampling, which are accurate at small scales but labor-intensive and impractical for timely, large-scale yield assessments. In contrast, satellite-based remote sensing has extended monitoring capabilities to larger areas (Usha and Singh, 2013). However, large-scale monitoring suffers from limitations related to spatial resolution, cloud cover, and infrequent revisit intervals, which are particularly problematic during the rainy season. These constraints necessitate the development of advanced, high-resolution, and timely monitoring tools for robust onion yield prediction under real-world farming conditions (Rahimi and Jung, 2024).

Recent advances in agricultural research have increasingly focused on deploying unmanned aerial vehicles (UAVs) for disease detection, plant health monitoring, and precise pesticide application due to their operational flexibility and close-range, high-resolution imaging (Faical et al., 2014). UAVs have become increasingly popular in precision agriculture because of their ability to capture high-resolution imagery at close range, surpassing many of the limitations of conventional satellite systems (Maes and Steppe, 2019). UAV-assisted multispectral imaging is now widely recognized for its value in modeling biotic and abiotic stresses, with numerous studies demonstrating successful automated disease detection in major crops, including wheat, potato, banana, cotton, peanut, and tomato (Su et al., 2018; Rodriguez et al., 2021; Ye et al., 2020; Wang et al., 2023; Chen et al., 2020; Abdulridha et al., 2020). These findings underscore the growing potential of UAV-based remote sensing for timely, data-driven crop management strategies.

Additionally, UAV-aided imagery has emerged as a powerful tool for crop growth monitoring and biomass estimation through non-destructive approaches, offering high spatiotemporal resolution that enhances its utility in agricultural applications (Fu et al., 2014). UAV platforms equipped with RGB, hyperspectral, and multispectral sensors have been effectively deployed to assess crop health and predict yield (Zhang and Kovacs, 2012). Vegetation indices (VIs), such as NDVI, ratio vegetation index (RVI), and leaf area index (LAI), have been extensively used to monitor crop growth and forecast yield across various crops, including rice (Din et al., 2017; Liu et al., 2025), wheat (Fu et al., 2020; Zhu et al., 2024), corn (Geipel et al., 2014; Bao et al., 2024), barley (Bendig et al., 2015), sugarcane (Ruwanpathirana et al., 2024), and onion (Córcoles et al., 2013). Although extensive UAV-based research exists for major cereals, relatively few studies have focused on commercially important vegetable crops such as onion. However, recent efforts have demonstrated the potential of UAV-derived multispectral imagery in onion research, including biomass monitoring (Ballesteros et al., 2018), yield prediction (Kang et al., 2020), and growth pattern analysis (Duarte-Correa et al., 2023; Farooqui et al., 2024). These emerging studies highlight the promise of UAV technology for improved monitoring and decision-making in onion cultivation. Earlier studies have effectively leveraged multiple machine learning algorithms to predict crop yield based on vegetation indices. Random forest (rf) has been validated for maize yield prediction using a ranking approach with NDVI, NDRE, and GNDVI (Ramos et al., 2020), while support vector machine models have been applied to wheat, potato, tomato, banana, and maize (Ayalew and Lohani, 2023). 
Gradient boosting (gbm) has been used for yield prediction in cassava, maize, plantains, potatoes, rice, sorghum, soybean, sweet potato, wheat, and yam (Mahesh and Soundrapandiyan, 2024). Despite these advances, significant gaps remain in evaluating and comparing the suitability of various vegetation indices for onion yield prediction. Leaf area index, a key indicator of crop photosynthesis, is widely recognized as a critical predictor, alongside plant height and crop surface models (Verger et al., 2014; Bendig et al., 2015, 2014). For onions and similar vegetable crops, plant height alone often lacks predictive reliability because of complex canopy structure and variable biomass accumulation. Moreover, most previous onion studies have focused on a single sowing or planting date, leaving intra-seasonal VI dynamics associated with staggered planting insufficiently explored. Understanding these temporal variations across growth stages is vital for developing robust and generalizable yield prediction models. This study introduces a growth stage-specific framework leveraging UAV-derived multispectral vegetation indices collected from the vegetative to bulb development stages across four staggered rainy-season plantings. The objectives of the current study were (1) to systematically analyze relationships between multiple vegetation indices and onion yield and (2) to assess the temporal dynamics of vegetation indices under staggered planting conditions. We hypothesize that high-resolution, UAV-derived multispectral vegetation indices collected at key growth stages, analyzed temporally and integrated with machine learning models, can accurately and robustly predict onion yield across staggered planting dates.

2 Materials and methods

2.1 Location of study area

The experiment was conducted during the rainy season (July–December) in 2023 and 2024 at the ICAR–Directorate of Onion and Garlic Research, Pune, Maharashtra, India. The experimental site is geographically located at a latitude of 18°50’27.99” N and a longitude of 73°53’12.88” E (WGS 84; EPSG:4326), as shown in Figure 1. Figure 2 presents the average monthly maximum and minimum temperatures, as well as the total monthly precipitation recorded during the study period (i.e., July 1 to December 31) for both years. The study area lies in Maharashtra, India’s leading onion-producing state, contributing nearly 40% of the country’s total onion production (Sharma and Chauhan, 2024).


Figure 1. Map showing the geographical location of the study area used for UAV-based data collection and analysis.


Figure 2. Temperature, relative humidity, average rainfall days, and precipitation of the study area during July–December of 2023 and 2024 (source: IMD, Pune).

To capture intra-seasonal variability, four trials were established using staggered planting dates at 15-day intervals. Each trial followed a uniform experimental design comprising 15 subplots arranged in four rows. The first three rows comprised four subplots each, while the fourth row had three subplots. Each subplot measured 1 m × 8 m (8 m²) and was laid out on a broad bed furrow (BBF) system. A 1 m buffer was maintained between adjacent trials to minimize edge effects and ensure spatial independence during UAV flights and analysis. The nursery of the red onion cultivar ‘Bhima Super’, recommended for rainy-season cultivation, was raised at 15-day intervals on a BBF 15 cm in height and 120 cm in top width, with a 45 cm furrow. Seedlings were transplanted 45–50 days after sowing at a spacing of 10 cm × 15 cm. Before transplanting, seedlings were dipped for at least 1 h in a solution containing carbendazim (50% WP; 1 g L−1) and carbosulfan (25% EC; 2 mL L−1). The recommended fertilizer dose of 110:40:60:30 kg N:P:K:S ha−1 was applied, wherein the full dose of P and K, along with half of the N, was applied at transplanting, and the remaining N was top-dressed in two equal splits to ensure efficient nutrient utilization and sustained crop growth (Thangasamy and Lawande, 2015). The dates of transplanting (DOT) and dates of harvesting (DOH) are provided in Table 1. Onion bulbs were harvested at maturity when pseudostem lodging exceeded 50% (approximately 90–115 days after transplanting), and the marketable yield in each treatment was recorded. A total of 120 experimental plots were evaluated across four field trials conducted over two consecutive years (2023 and 2024).


Table 1. Details of multi-spectral data acquisition and intervals.

2.2 Data collection and processing

2.2.1 Acquisition of UAV images

A MicaSense RedEdge-P multispectral camera mounted on a UAV was employed to capture multispectral imagery across key crop growth stages. Flights were conducted under clear-sky conditions between 12:00 and 15:00 h at an altitude of 30 m above ground level and a flight speed of 2 m s−1. Image acquisition was planned with 80% forward overlap and 60% side overlap, resulting in 165–170 images per field per flight. The camera captured five multispectral bands (blue: 475 ± 32 nm; green: 560 ± 24 nm; red: 668 ± 16 nm; red edge: 717 ± 12 nm; near-infrared: 842 ± 46 nm) and one high-resolution panchromatic band (634.5 ± 46 nm) on each flight pass. The MicaSense RedEdge-P camera was equipped with a 6.3 mm diagonal sensor for multispectral imaging and an 11.1 mm sensor for panchromatic imaging, with a pixel size of 3.45 µm. The multispectral bands featured a resolution of 1450 × 1088 pixels (1.58 MP) with a 4:3 aspect ratio, while the panchromatic band had a resolution of 2464 × 2056 pixels (5.1 MP) and a 6:5 aspect ratio. The focal length was 5.5 mm for the multispectral bands and 10.3 mm for the panchromatic band, with a field of view of 49.6° HFOV × 38.3° VFOV (multispectral) and 44.5° HFOV × 38.0° VFOV (panchromatic). The system captured 2–3 images per second, storing files in 16-bit raw TIFF format. Each image was georeferenced using the WGS 1984 datum and projected into UTM zone 43N to enable precise spatial analysis. Prior to each flight, radiometric calibration images were captured using standard reflectance panels placed on the ground, allowing accurate radiometric correction and reflectance conversion of UAV imagery. Detailed specifications of the camera system, flight parameters, and image acquisition schedule are summarized in Tables 1 and 2.


Table 2. Details of bands and bandwidth utilized for data collection.

2.2.2 Image processing

A set of images was processed using the automated processing template “Ag-Multispectral” in Pix4Dmapper 4.9.0 to perform cloud removal and radiometric correction, resulting in reflectance orthomosaics. Pre-processing included radiometric calibration, correction for sun sensor angles and irradiance, lens distortion correction (radial and tangential), and cloud removal. Radiometric corrections accounted for solar angle variations corresponding to the time of each flight (generally conducted between 12:00 and 15:00 h), ensuring consistent reflectance values across datasets. ArcGIS 10.3 was used to mosaic and composite the spectral bands. Onion field plots were delineated using shapefiles, and six multispectral bands were processed to remove soil pixels and eliminate edge artifacts from the orthomosaics. Vegetation index maps and the zonal statistics extracted for each plot are illustrated in Figure 3, and the overall processing workflow in Figure 4. Soil pixels were filtered using a mask based on the hue index, which classified each pixel as either soil or vegetation using a threshold value (Matias et al., 2020).
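The hue-threshold masking step can be sketched as follows. This is an illustrative Python sketch only (the study performed masking in ArcGIS); it uses the generic HSV hue formula and a hypothetical “green” hue window rather than the exact index and threshold of Matias et al. (2020):

```python
import numpy as np

def rgb_to_hue(red, green, blue):
    """Standard HSV hue (degrees) computed per pixel from RGB reflectance arrays."""
    rgb = np.stack([red, green, blue], axis=-1)
    cmax = rgb.max(axis=-1)
    delta = cmax - rgb.min(axis=-1)
    hue = np.zeros_like(cmax)
    nz = delta > 0  # pixels with defined hue (not pure gray)
    m = nz & (cmax == red)
    hue[m] = (60 * (green[m] - blue[m]) / delta[m]) % 360
    m = nz & (cmax == green)
    hue[m] = 60 * (blue[m] - red[m]) / delta[m] + 120
    m = nz & (cmax == blue)
    hue[m] = 60 * (red[m] - green[m]) / delta[m] + 240
    return hue

def vegetation_mask(red, green, blue, green_range=(70, 160)):
    """Boolean mask: True where the pixel hue falls in a (hypothetical) green window."""
    hue = rgb_to_hue(red, green, blue)
    return (hue >= green_range[0]) & (hue <= green_range[1])
```

Pixels outside the mask would be treated as soil and excluded before computing plot-level zonal statistics.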


Figure 3. Orthomosaic map overlaid with field shapefiles and vegetation index layers, along with extracted zonal statistics for each plot.


Figure 4. Workflow diagram illustrating the complete methodology adopted in this study, including UAV data acquisition, preprocessing, vegetation index generation, and subsequent analytical steps.

In this study, six vegetation indices were selected based on extensive literature evidence demonstrating their relevance to crop canopy structure, biomass accumulation, chlorophyll content, and vegetation vigor. These indices capture different aspects of plant physiology across growth stages, from early canopy development to maturity.

NDVI (normalized difference vegetation index) is one of the most widely used indicators of crop growth and canopy vigor. It is sensitive to chlorophyll content and canopy density, making it suitable for monitoring plant health and biomass, particularly during mid-season growth (Tucker, 1979). NDRE (normalized difference red edge index) provides greater sensitivity than NDVI during mid-to-late growth stages when canopies are dense, as the red-edge band avoids the saturation effect observed in the red band (Gitelson et al., 1996; Davidson et al., 2022). SAVI (soil-adjusted vegetation index) was designed to minimize the influence of soil background reflectance on vegetation signals, which is particularly important during early growth stages with sparse canopy cover (Huete, 1988). GNDVI (green normalized difference vegetation index) uses the green band instead of red, improving sensitivity to chlorophyll concentration and nitrogen status (Gitelson et al., 1996). NORM2 exploits the contrast between red and green reflectance and is sensitive to leaf pigmentation changes, plant stress, and senescence (Haboudane et al., 2004). LAI (leaf area index) is defined as the total leaf surface area per unit ground area. Empirical models using multispectral reflectance have been developed to estimate LAI (Haboudane et al., 2004). All vegetation indices were calculated from processed multispectral reflectance mosaics using the formulas provided in Equations 1–6 and Table 3.


Table 3. Details of vegetation indices used, their formulas, and corresponding references.

NDVI = (NIR − Red) / (NIR + Red)    (1)
NDRE = (NIR − Red Edge) / (NIR + Red Edge)    (2)
SAVI = 1.5 × (NIR − Red) / (NIR + Red + 0.5)    (3)
GNDVI = (NIR − Green) / (NIR + Green)    (4)
NORM2 = (Red − Green) / (Red + Green)    (5)
LAI = 3.618 × [2.5 × (NIR − Red) / (NIR + 6 × Red − 7.5 × Blue + 1)] − 0.118    (6)

Where: Red – Red band, Green – Green band, Blue – Blue band, NIR – Near infrared band, Red Edge – Red Edge band.
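As a concrete illustration, Equations 1–6 can be computed per pixel (or per plot mean) directly from the reflectance bands. The NumPy sketch below is not the study’s implementation; the small epsilon guarding the division is an assumption added here to handle degenerate bare-soil pixels:

```python
import numpy as np

def vegetation_indices(blue, green, red, rededge, nir):
    """Compute the six VIs of Equations 1-6 from reflectance arrays (values in [0, 1])."""
    eps = 1e-9  # assumed guard against division by zero; not part of Eqs. 1-6
    ndvi  = (nir - red) / (nir + red + eps)                          # Eq. 1
    ndre  = (nir - rededge) / (nir + rededge + eps)                  # Eq. 2
    savi  = 1.5 * (nir - red) / (nir + red + 0.5)                    # Eq. 3
    gndvi = (nir - green) / (nir + green + eps)                      # Eq. 4
    norm2 = (red - green) / (red + green + eps)                      # Eq. 5
    evi   = 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)
    lai   = 3.618 * evi - 0.118                                      # Eq. 6
    return {"NDVI": ndvi, "NDRE": ndre, "SAVI": savi,
            "GNDVI": gndvi, "NORM2": norm2, "LAI": lai}
```

Applied to whole reflectance mosaics, this yields the index layers from which the plot-level zonal statistics are extracted.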

2.3 Statistical analysis

The linear relationship between each vegetation index (VI) and onion yield was assessed at the block level by performing Pearson’s correlation analysis. Additionally, principal component analysis (PCA) was applied to explore multivariate relationships and reduce dimensionality. Subsequently, multiple machine learning algorithms, including simple linear regression (lm), random forest (rf), support vector machine with radial kernel (svmRadial), gradient boosting machine (gbm), and elastic net regularized regression (glmnet), were evaluated separately for each growth stage, each year, and for the pooled dataset across all trials/planting dates to improve prediction accuracy. The selected algorithms represent a spectrum of modeling philosophies. Linear methods (lm, glmnet) provide interpretability and efficiency but may underperform with nonlinear relationships; ensemble learners (rf, gbm) improve robustness and handle complex interactions; and kernel-based approaches (svmRadial) capture nonlinear patterns but are sensitive to parameter tuning. Similar combinations have been widely adopted in crop yield prediction studies, where rf and svm consistently outperform classical regression models, and regularized models (glmnet) guard against overfitting (Jeong et al., 2016; Jiang et al., 2004; Hanuman et al., 2021). A k-fold cross-validation method was used to evaluate machine learning model performance to address the limited data generated by the experimental design. The dataset was randomly divided into k = 10 folds, and in each iteration, nine (10 − 1) folds (90% of the data) were used for training, while the remaining one fold (10%) was used for testing. A 10-fold cross-validation with a single repetition was employed to compute R2, RMSE, and MAE to assess the performance of the different ML models. This process was repeated until every fold had served as a validation set, ensuring that all observations contributed to both training and testing. 
Such resampling increases statistical efficiency, reduces bias, and provides more reliable error estimates than single-split approaches (Kohavi, 1995; Arlot and Celisse, 2010). Model hyperparameters were optimized via 10-fold cross-validation using caret’s default tuning grids. The final selected parameters were: rf (mtry = 2, using √p predictors per split), svmRadial (cost [C] = 0.25, σ = 0.86), gbm (50 trees, depth = 1, learning rate = 0.1), and glmnet (α = 0.1, λ = 0.058). This optimization strategy ensured robustness against overfitting and provided reliable estimates of performance generalization. All analyses were conducted in R v4.4.1 (R Core Team, 2024) using the caret package (Kuhn, 2008).

Principal component analysis is a multivariate statistical technique used to reduce the dimensionality of a dataset while preserving as much variability as possible. It does so by transforming the original variables into a new set of uncorrelated variables called principal components (PCs) (Paul et al., 2013). In accordance with PCA methodology, dimensionality was reduced by transforming the original matrix of vegetation indices into principal components. The decision to retain principal components was based on the Kaiser criterion (eigenvalues >1) and cumulative variance contribution (>95%), ensuring that the selected components explained most of the variability in the dataset while avoiding redundancy. The PCA transformation can be mathematically expressed as follows:

Given the matrix X (n × 7) of scaled vegetation-index data, PCA transforms X into Z (n × 3); the transformation is defined in Equation 7:

Z = XW    (7)

Where: X (n × 7) is the matrix of vegetation indices, with n samples and p = 7 variables; Z (n × 3) is the matrix of principal-component scores; and W (7 × 3) is the matrix containing the first three eigenvectors of the covariance matrix of the scaled data.
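The transformation of Equation 7 can be sketched via an eigendecomposition of the covariance matrix of the standardized data. The NumPy version below is illustrative (the study used R); it returns scores, loadings, and explained-variance ratios, from which the Kaiser (eigenvalue > 1) and cumulative-variance (> 95%) criteria can be applied:

```python
import numpy as np

def pca_transform(X, n_components=3):
    """Project standardized data onto its leading eigenvectors (Z = X_scaled @ W, Eq. 7)."""
    # standardize columns to zero mean and unit variance
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = np.cov(Xs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort descending by variance
    W = eigvecs[:, order[:n_components]]     # loading matrix, p x k
    Z = Xs @ W                               # score matrix, n x k
    explained = eigvals[order] / eigvals.sum()
    return Z, W, explained
```

By construction the resulting score columns are uncorrelated, with variances equal to the sorted eigenvalues.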

Machine learning modeling

To assess the effectiveness of spectral indices in predicting onion yield across different growth stages, machine learning methods were trained using principal components (PCs) derived from vegetation indices (VIs). PCA was employed to reduce dimensionality while retaining 95% of the cumulative variance. To ensure unbiased model evaluation, PCA was fitted separately within each training fold during cross-validation, and the resulting transformation was applied to the corresponding test set. This procedure prevents data leakage from the test data into the training process. Five modeling algorithms—linear regression (lm), random forest (rf), support vector machine with radial kernel (svmRadial), gradient boosting machine (gbm), and elastic net regression (glmnet)—were trained using 10-fold cross-validation implemented through the caret package in R. All performance metrics presented in the Results represent cross-validated estimates derived from out-of-fold predictions. The dataset was partitioned into ten equal folds. In each iteration of the cross-validation process, nine folds were used for model training, including the application of PCA transformation, while the remaining one fold was held out as the validation set. This process was repeated ten times, ensuring that each fold served as the validation set once. Model performance was assessed using root mean square error (RMSE), coefficient of determination (R²), and mean absolute error (MAE). Final performance metrics were calculated by aggregating results across all 10 training and validation sets. A simple linear regression model based on vegetation indices was used for onion yield estimation (Equation 8). The estimation accuracy of each vegetation index and planting date was evaluated, and the results were calculated as follows.
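A leakage-free analogue of this workflow can be sketched in Python with scikit-learn (the study itself used R’s caret). Scaling and PCA live inside the pipeline, so both are re-fitted on each training fold only; the data here are synthetic stand-ins, and the model settings are illustrative:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-ins: 120 plots x 6 vegetation indices, and a yield response.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.5, size=120)

# Because scaling and PCA sit inside the pipeline, they are refitted within
# each training fold; the held-out fold never leaks into the transform.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),  # retain 95% of cumulative variance
    RandomForestRegressor(n_estimators=500, random_state=1),
)
cv = KFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_validate(model, X, y, cv=cv,
                        scoring=("r2", "neg_root_mean_squared_error",
                                 "neg_mean_absolute_error"))
print(round(float(scores["test_r2"].mean()), 3))
```

The out-of-fold R², RMSE, and MAE are then averaged across the ten folds, mirroring how the cross-validated estimates in the Results are aggregated.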

Y(yield) = β1·x + β0 + ε    (8)

Where: Y is the actual yield and x is a spectral index such as NDVI, NDRE, SAVI, GNDVI, LAI, or NORM2; β1 is the regression coefficient of the independent variable, β0 is the intercept, and ε is the error term.

Random forest regressor is an ensemble learning model that efficiently processes large-scale datasets by integrating multiple decision trees. This algorithm can obtain more accurate and stable predictions through noise reduction (Breiman, 2001). In this study, the random forest (rf) regressor was used to predict onion yield from the principal components as independent variables, employing the rf method in R’s caret package with 10-fold cross-validation, where Z (n × 3) is the matrix of principal components (explanatory variables) and y (n × 1) is the vector of onion yield values (dependent variable).

The random forest regressor consists of an ensemble of T decision trees. Each decision tree t in the forest is trained on a bootstrap sample of the data and makes a prediction hₜ(Z). The final prediction Ŷ is obtained by averaging the predictions of all individual trees.

Mathematically, this is expressed in Equation 9.

Ŷᵢ = (1/T) Σₜ₌₁ᵀ hₜ(Zᵢ)    (9)

Where: Ŷᵢ is the predicted yield for the ith sample, T is the number of trees (500 by default in rf), and hₜ(Zᵢ) is the prediction of tree t for the ith sample. Each tree is grown using a random subset of mtry = 2 candidate features at each split.
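Equation 9 can be verified directly on a fitted forest: the ensemble prediction equals the mean of the individual tree predictions. The scikit-learn sketch below uses synthetic data and is illustrative only (caret’s mtry corresponds to max_features here):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins: 100 samples of 3 principal components and a yield response.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 3))
y = 2 * Z[:, 0] + rng.normal(scale=0.3, size=100)

# T = 500 trees; max_features=2 plays the role of caret's mtry = 2.
rf = RandomForestRegressor(n_estimators=500, max_features=2,
                           random_state=0).fit(Z, y)

# Eq. 9: the forest prediction is the mean of the individual tree predictions.
z_new = Z[:5]
per_tree = np.stack([t.predict(z_new) for t in rf.estimators_])
assert np.allclose(per_tree.mean(axis=0), rf.predict(z_new))
```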

Support vector machine with a radial basis function (RBF) kernel is a robust model that creates an epsilon-insensitive tube around the regression function and does not penalize predictions falling within this tolerance band (Vapnik, 2013). The model is obtained by minimizing the total loss while maximizing the margin using the RBF (Gaussian) kernel function, with epsilon (ϵ) = 0.1, regularization parameter (C) = 0.25, and sigma (σ) = 0.86. The mathematical formulation of the SVR model is presented in Equations 10–11, where Z (n × 3) is the matrix of principal components (explanatory variables) and y (n × 1) is the vector of onion yield values (dependent variable).

The SVR function can be written as:

f(Z) = ω·φ(Z) + b    (10)

Where: ω ∈ ℝᵖ is the weight vector, φ(Z) is the mapping function that transforms the input space Z into a higher-dimensional feature space, and b is the bias term.

min over ω, b, ξ, ξ*:  (1/2)‖ω‖² + C Σᵢ₌₁ⁿ (ξᵢ + ξᵢ*)    (11)

Where: ω and b are the model parameters, n is the number of sample points, and ξᵢ, ξᵢ* ≥ 0 are slack variables.
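An illustrative scikit-learn counterpart of the tuned SVR (the study used caret’s svmRadial, backed by kernlab) is sketched below on synthetic data. kernlab’s RBF kernel is exp(−σ‖x − x′‖²) and scikit-learn’s is exp(−γ‖x − x′‖²), so the reported σ = 0.86 is mapped to gamma here; treat this mapping and the data as assumptions of the sketch:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-ins: 100 samples of 3 principal components, nonlinear response.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 3))
y = np.sin(Z[:, 0]) + 0.1 * rng.normal(size=100)

# Tuned settings from the text: epsilon = 0.1, C = 0.25, sigma = 0.86 (-> gamma).
svr = SVR(kernel="rbf", C=0.25, epsilon=0.1, gamma=0.86).fit(Z, y)
pred = svr.predict(Z[:5])
```

Residuals smaller than epsilon incur no loss, which is what makes the fitted tube insensitive to small prediction errors.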

Gradient boosting is an ensemble supervised machine learning algorithm that combines multiple weak learners to create a final model (Friedman, 2001). The concept of boosting stems from the need to convert weak learners into stronger predictors. The gradient boosting algorithm requires a loss function to be optimized, a weak learner for making predictions, and an additive model for precise estimation. The mathematical formulation follows a stage-wise procedure for each base learner from m = 1 to M iterations, as shown in Equation 12:

Fₘ(Z) = Fₘ₋₁(Z) + ν·hₘ(Z)    (12)

Where: hₘ(Z) is the weak learner fitted at stage m, and ν = 0.1 is the shrinkage (learning rate). Each tree is built by minimizing the loss Σᵢ₌₁ⁿ L(yᵢ, Fₘ₋₁(Zᵢ) + hₘ(Zᵢ)), plus a regularization term.
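The stage-wise update of Equation 12 can be observed through staged predictions: each stage adds ν·hₘ(Z) to the running model. The scikit-learn sketch below uses synthetic data and the tuned settings from the text (50 depth-1 trees, ν = 0.1); it is illustrative, not the study’s implementation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-ins: 100 samples of 3 principal components, nonlinear response.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 3))
y = Z[:, 0] ** 2 + 0.1 * rng.normal(size=100)

# Tuned caret settings: 50 trees, interaction depth 1, learning rate 0.1.
gbm = GradientBoostingRegressor(n_estimators=50, max_depth=1,
                                learning_rate=0.1).fit(Z, y)

# staged_predict exposes F_1(Z), F_2(Z), ...: each stage adds nu * h_m(Z).
stages = list(gbm.staged_predict(Z[:3]))
assert len(stages) == 50
assert np.allclose(stages[-1], gbm.predict(Z[:3]))
```

With depth-1 trees (stumps) the model is additive in single-feature steps, which keeps it strongly regularized.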

The glmnet regression model, as implemented in the caret package in R, provides a flexible framework for fitting generalized linear models of the independent variables. The model combines L1 (lasso) and L2 (ridge) regularization to balance feature selection and coefficient shrinkage (Friedman et al., 2010). The objective function of the elastic net model is shown in Equation 13.


min over β₀, β:  (1/2N) Σᵢ₌₁ᴺ (yᵢ − β₀ − Σⱼ₌₁ᵖ Zᵢⱼβⱼ)² + λ[ (1 − α)‖β‖₂²/2 + α‖β‖₁ ]    (13)

Where: yᵢ is the observed yield for the ith sample, Zᵢⱼ is the jth principal component of the ith sample, β₀ and β are the intercept and regression coefficients, λ is the penalty strength (tuned via cross-validation), and α is the mixing parameter.
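An illustrative scikit-learn version of Equation 13 follows (the study used R’s glmnet). The parameter mapping is an assumption of this sketch: glmnet’s mixing parameter α corresponds to l1_ratio and its penalty λ to alpha in scikit-learn’s ElasticNet, whose objective has the same lasso/ridge blend:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic stand-ins: 100 samples of 3 principal components, linear response.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 3))
y = Z @ np.array([1.5, -0.7, 0.2]) + 0.1 * rng.normal(size=100)

# Tuned caret settings: alpha (mixing) = 0.1 -> l1_ratio; lambda = 0.058 -> alpha.
enet = ElasticNet(alpha=0.058, l1_ratio=0.1).fit(Z, y)
coef = enet.coef_
```

With a small mixing parameter the penalty is mostly ridge-like, shrinking coefficients toward zero without aggressively zeroing them out.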

2.4 Evaluation metrics

Multispectral and morphological data obtained from the experiment were used to establish rainy-season onion yield estimation models. Three evaluation metrics were used to assess model performance: the coefficient of determination (R²) (Equation 14), root mean square error (RMSE) (Equation 15), and mean absolute error (MAE) (Equation 16). The coefficient of determination (R²) measures the proportion of variance explained by the regression model, while MAE and RMSE measure the average absolute error and the average magnitude of errors between predicted and observed values, respectively.

R² = 1 − Σᵢ₌₁ᴺ (Yᵢ − Ŷᵢ)² / Σᵢ₌₁ᴺ (Yᵢ − Ȳ)²    (14)
RMSE = √[ (1/N) Σᵢ₌₁ᴺ (Yᵢ − Ŷᵢ)² ]    (15)
MAE = (1/N) Σᵢ₌₁ᴺ |Yᵢ − Ŷᵢ|    (16)

Where Yᵢ and Ŷᵢ are the actual and predicted onion yields, Ȳ is the mean yield, and N is the sample size. A combination of software and statistical tools was used for image processing, data analysis, and predictive model fitting, including Pix4Dmapper, ArcGIS, and R, as illustrated in the workflow flowchart.
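The three metrics of Equations 14–16 can be computed directly from observed and predicted yields; a minimal NumPy sketch:

```python
import numpy as np

def r2_rmse_mae(y, yhat):
    """Eqs. 14-16: R-squared, RMSE, and MAE between observed and predicted yield."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    resid = y - yhat
    r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()  # Eq. 14
    rmse = float(np.sqrt((resid ** 2).mean()))                 # Eq. 15
    mae = float(np.abs(resid).mean())                          # Eq. 16
    return r2, rmse, mae
```

For example, a constant over-prediction of 1 t ha−1 leaves MAE and RMSE both at 1 while R² depends on the spread of the observed yields.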

3 Results

The yield distribution across trials and years (Supplementary Figure S1) revealed substantial variation, with consistently higher yields recorded in the 2024 season compared with 2023. This improvement may reflect more favorable environmental conditions during 2024. Notably, Trials 3 and 4 in 2024 showed higher median yields and reduced variability, indicating more stable performance under those conditions. Complementing these findings, density plots of vegetation indices by year and crop growth stage (Supplementary Figure S2) highlighted clear spectral differences across stages and seasons. During the bulb development stage, higher densities of VIs such as NDVI, LAI, and SAVI were observed in 2024, suggesting improved canopy vigor and overall crop health compared with 2023.

3.1 Correlation of bulb yield and spectral indices

Vegetation indices exhibited varying degrees of correlation with final yield across different trials, growth stages, and years (Supplementary Figures S3–S6). Overall, NDRE, NDVI, and GNDVI consistently showed strong positive correlations, particularly during the bulb development stage across all four trials. In Trial 1, NDRE (R = 0.92, p < 0.001) and NDVI (R = 0.88, p < 0.001) were the strongest predictors during bulb development, while at earlier stages, such as bulb initiation and the vegetative phase, only moderate correlations were observed. In Trial 2, NDRE, GNDVI, and LAI in the 2024 season exhibited exceptionally strong correlations (R > 0.9) across all growth stages, confirming the reliability of these indices under favorable conditions. In contrast, the 2023 season showed lower and more inconsistent correlations compared with 2024, particularly during the vegetative and bulb initiation stages. Trial 3 demonstrated strong and stable performance of NDRE (R = 0.96, p < 0.001) and LAI (R = 0.92, p < 0.001) during the bulb development stage, along with moderate to strong associations at bulb initiation in 2024. The vegetative stage in Trial 3 showed moderate correlations, with GNDVI and NDVI performing slightly better than other indices. Trial 4 showed comparable trends, with NDRE and NORM2 emerging as the most consistent vegetation indices across stages, particularly in 2023. However, correlations during the vegetative stage in 2024 were generally weaker, with some indices, such as LAI and SAVI, exhibiting non-significant relationships. Across all trials, the bulb development stage consistently demonstrated the strongest correlations between vegetation indices and yield, reaffirming its role as the most appropriate growth stage for yield prediction. NDRE and NDVI emerged as the most robust and generalizable indices, followed by GNDVI and LAI. 
These results underscore the importance of stage-specific and trial-specific vegetation index selection to enhance modeling accuracy in precision agriculture.
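The stage-wise correlation screening described above can be illustrated with a short Python sketch. The data, column names, and noise levels below are synthetic stand-ins invented for illustration, not the study's measurements; the study computed per-trial, per-stage Pearson correlations against plot-level bulb yield.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical plot-level records: VI values at one growth stage alongside
# final bulb yield (t/ha). All values are simulated for illustration.
rng = np.random.default_rng(42)
n_plots = 30
yield_t_ha = rng.uniform(10, 30, n_plots)
vis = {
    "NDVI":  0.020 * yield_t_ha + rng.normal(0, 0.05, n_plots),
    "NDRE":  0.015 * yield_t_ha + rng.normal(0, 0.03, n_plots),
    "GNDVI": 0.018 * yield_t_ha + rng.normal(0, 0.08, n_plots),
}

# Pearson R and p-value for each index against yield, mirroring the values
# reported per trial and stage in Supplementary Figures S3-S6.
for name, values in vis.items():
    r, p = pearsonr(values, yield_t_ha)
    print(f"{name}: R = {r:.2f}, p = {p:.3g}")
```

Indices whose R exceeds a chosen threshold at a given stage would then be retained for that stage's yield model, which is the selection logic the trial-wise results above motivate.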

3.2 Principal component analysis

Principal component analysis (PCA) reduced the complexity of the data across growth stages by analyzing spectral indices both separately for each year and in pooled datasets. For each growth stage, six vegetation indices (NDVI, NDRE, GNDVI, NORM2, SAVI, and LAI) were standardized and subjected to PCA; the variance contributions of the first (PC1), second (PC2), and third (PC3) principal components are presented in Figure 5. The loading matrices derived from PCA indicated the relative contributions of each original variable to the principal components. Overall, PC1 consistently captured the majority of the variance with substantial loadings from all VIs, indicating their strong collective influence. In 2023, the first two principal components (PC1 and PC2) accounted for most of the total variance across all growth stages, explaining 85.2% and 12.4% at the vegetative stage, 84.5% and 12.8% at bulb initiation, and 90.0% and 6.1% at the bulb development stage, respectively. In contrast, during the 2024 season, PC1 alone explained a dominant portion of the variance, capturing 98.4% at the vegetative stage, 96.0% at the bulb initiation stage, and 98.1% at the bulb development stage. This indicates a high degree of correlation among the indices, suggesting that they responded similarly during that season. For the pooled dataset, the first two components (PC1 and PC2) collectively explained most of the variance across all growth stages. Specifically, at the vegetative stage, PC1 and PC2 accounted for 94.1% and 4.4% of the variance, respectively; at the bulb initiation stage, 87.2% and 11.1%; and at the bulb development stage, 84.3% and 13.9%.
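A minimal Python sketch of this standardize-then-decompose workflow is given below. The six-column matrix is simulated with a shared canopy-vigor signal so that PC1 dominates, loosely mimicking the highly correlated 2024 pattern; none of the numbers correspond to the study's data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in for the plot-level VI matrix at one growth stage;
# the six columns mirror NDVI, NDRE, GNDVI, NORM2, SAVI, and LAI.
rng = np.random.default_rng(0)
vi_names = ["NDVI", "NDRE", "GNDVI", "NORM2", "SAVI", "LAI"]
latent = rng.normal(size=(40, 1))                      # shared vigor signal
X = latent @ rng.uniform(0.5, 1.0, size=(1, 6)) \
    + rng.normal(0, 0.1, size=(40, 6))                 # index-specific noise

# Standardize each index, then decompose, as described for each stage/year.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=3).fit(X_std)
explained = 100 * pca.explained_variance_ratio_
print("Explained variance (%):", np.round(explained, 1))

# Loadings: contribution of each original VI to PC1 (cf. loading heatmaps).
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
for name, load in zip(vi_names, loadings[:, 0]):
    print(f"{name}: PC1 loading = {load:+.2f}")
```

Uniformly high PC1 loadings in this sketch correspond to the tight clustering of all indices along PC1 reported for 2024.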

Figure 5
Composite image of PCA analyses across three stages: bulb development, bulb initiation, and vegetative growth. Each stage includes a scree plot, a PCA biplot, and a loadings heatmap. Scree plots depict explained variance; biplots show variable relationships with yield data points in a color gradient; heatmaps illustrate variable loadings on principal components with colors ranging from blue to red.

Figure 5. PCA of VIs for pooled-year data across crop growth stages: scree plot, biplot, and loading matrix heatmap. (C) Vegetative stage, (B) Bulb initiation stage, and (A) Bulb development stage.

At the vegetative stage in 2023, NDVI, NDRE, and NORM2 showed strong associations with PC1, while LAI and SAVI exhibited stronger associations with PC2. This pattern was confirmed by the corresponding loading matrix, indicating that LAI and SAVI made substantial contributions to PC2. In contrast, the 2024 biplot revealed tight clustering of all indices with high loadings on PC1, reflecting a high degree of correlation and a consistent physiological response. The loading matrix supported this observation, showing uniformly positive and high loadings for all indices on PC1, with minimal loadings on PC2, indicating coherence across indices (Figure 6).

Figure 6
Three-panel figure showing PCA results. Panel A displays a scree plot for bulb development with the first dimension at 89.4%. The biplot shows yield correlation with NDVI and SAVI. The heatmap highlights strong PC1 contributions from NDVI and SAVI. Panel B presents a scree plot for bulb initiation, with the first dimension at 84.5%. The biplot displays yield association with NDRE and GNDVI. The heatmap indicates significant contributions of NDRE and NORM2 to PC1. Panel C shows a scree plot for vegetative stage, with the first dimension at 77.7%. The biplot reveals NDVI and LAI as key yield influencers. The heatmap shows NDVI and LAI impacting PC1 and PC2.

Figure 6. PCA of VIs for the year 2024 across crop growth stages: scree plot, biplot, and loading matrix heatmap. (C) Vegetative stage, (B) Bulb initiation stage, and (A) Bulb development stage.

The biplot of the pooled dataset exhibited intermediate behavior. While most indices aligned closely with PC1, similar to the pattern observed in 2024, LAI and SAVI showed slight deviations toward PC2, reflecting minor interannual variation. The pooled loading matrix reflected this pattern, showing strong contributions from all indices to PC1, with slight secondary loadings on PC2 for structure-related indices. At the bulb initiation stage, the 2023 PCA biplot showed a clear orientation of chlorophyll-related indices, such as NDVI, GNDVI, and NORM2, along PC1, indicating that these indices explained most of the variation during this stage. However, a moderate spread along PC2 was also observed. In contrast, the 2024 biplot revealed stronger coherence among vegetation indices. The pooled dataset biplot for the bulb initiation stage exhibited intermediate characteristics between the two years, with most indices, such as NDVI, NDRE, and NORM2, aligning strongly with PC1, while minor deviations were observed for certain indices along PC2. The pooled loading matrix showed strong contributions from key indices to PC1, with slight secondary loadings on PC2, especially for structure-related indices.

At the bulb development stage, the 2023 PCA biplot revealed moderate separation of variables along the first two principal components. NDRE, GNDVI, and NORM2 were strongly aligned with PC1, while NDVI and LAI exhibited slight deviations toward PC2. The corresponding contribution heatmap showed strong positive loadings on PC1 and negative or mixed contributions on PC2 for certain indices (Figure 7). In contrast, the 2024 PCA biplot exhibited a more pronounced unidimensional pattern, with all indices—NDRE, GNDVI, NDVI, NORM2, and LAI—tightly clustered and positively aligned with PC1. The pooled PCA analysis again revealed intermediate characteristics. NDRE, GNDVI, and NORM2 maintained strong positive loadings on PC1, while SAVI and LAI diverged toward PC2, indicating slight interannual variation in canopy structural traits. The heatmap confirmed this pattern, with robust contributions from NDRE and GNDVI on PC1, while SAVI and LAI showed negative or mixed contributions on PC2. Importantly, PCA biplots overlaid with yield vectors indicated that yield was most closely associated with PC1, particularly in the direction of NDVI, NDRE, GNDVI, NORM2, and SAVI, implying that these indices are key predictors of bulb yield during the bulb development stage. This observation is further supported by the contribution heatmap, in which NDVI, NDRE, GNDVI, NORM2, and SAVI showed high positive contributions to PC1.

Figure 7
Three panels displaying principal component analysis (PCA) data for bulb development, initiation, and vegetative stages in 2024. Each panel includes a scree plot showing the percentage of explained variances, a PCA biplot with yield data as colored points, and a PCA loadings heatmap. The scree plots indicate high variance explained by the first dimension: 98.7%, 96.2%, and 98.4% respectively. Biplots show variable loadings; heatmaps highlight the strength of association between variables and principal components. Yield, measured in tons per hectare, varies from 10 to 30 across the panels.

Figure 7. PCA of VIs for the year 2023 across crop growth stages: scree plot, biplot, and loading matrix heatmap. (C) Vegetative stage, (B) Bulb initiation stage, and (A) Bulb development stage.

3.3 Machine learning tools

The results are presented as boxplots of model performance metrics (RMSE and R²) for both combined and year-wise datasets (2023 and 2024) across different growth stages, as shown in Supplementary Figures S7–S10. These figures illustrate model performance across growth stages and demonstrate the stability and generalization ability of the machine learning (ML) models. To assess interannual variability, model performance was also evaluated separately for 2023 and 2024. For the combined dataset across both years and all growth stages, random forest (rf) consistently achieved the highest predictive accuracy, followed by svmRadial and gbm, while linear models (lm and glmnet) showed substantially lower performance (Table 4). Model performance at the vegetative stage is illustrated in Figure 8, where all five ML algorithms exhibited varying degrees of prediction accuracy. At the vegetative stage, the svmRadial model generalized well, with the lowest RMSE (4.891 t ha−1 for training and 5.276 t ha−1 for validation) and moderate R² values (0.714 for training and 0.645 for validation), indicating moderate predictive ability. In contrast, linear regression and glmnet models exhibited higher errors (RMSE ≈ 7.39 t ha−1) and lower R² values (0.193 for the training set and 0.234 for validation), indicating poor performance. The rf model exhibited the highest predictive power during training (R² = 0.921), but its comparatively lower validation R² of 0.579 indicated a tendency toward overfitting. A similar trend was observed at the bulb initiation stage (Figure 9), where rf achieved the highest R² (0.92) and the lowest RMSE (2.174 t ha−1) on the training set but showed reduced performance on the validation data (R² = 0.654 and RMSE = 4.902 t ha−1). The gbm and svmRadial models followed with moderate generalization (gbm: RMSE = 4.362 t ha−1 for training and 5.119 t ha−1 for validation; R² = 0.71 for training and 0.63 for validation; svmRadial: RMSE = 4.849 t ha−1 for training and 5.572 t ha−1 for validation; R² = 0.67 for training and 0.58 for validation). During the bulb development stage, all ML models exhibited strong predictive capability (Figure 10). Among the evaluated models, rf demonstrated the highest overall accuracy, achieving the lowest RMSE of 1.919 ± 0.048 t ha−1 and the highest R² of 0.944 ± 0.003 on the training set. Importantly, rf maintained moderate generalization, with a validation RMSE of 3.824 ± 0.787 t ha−1 and R² of 0.755 ± 0.136. The svmRadial model also performed robustly, showing balanced metrics across training (R² = 0.787 ± 0.026; RMSE = 3.912 ± 0.125 t ha−1) and validation (R² = 0.716 ± 0.158; RMSE = 4.388 ± 0.752 t ha−1), suggesting strong generalization ability. The gbm model showed moderate predictive performance, maintaining a training R² of 0.743 ± 0.018 and a validation R² of 0.649 ± 0.144, highlighting its potential as a robust predictive model. Linear models, including glmnet and lm, were comparatively less accurate but still demonstrated acceptable cross-validated performance, with R² values of 0.575 ± 0.208 and RMSE around 5.113 ± 0.912 t ha−1.
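The evaluation protocol used throughout this section, 10-fold cross-validated R² and RMSE for each learner, can be sketched in Python with scikit-learn as an analogue of the study's workflow (the method labels lm, rf, svmRadial, gbm, and glmnet mirror R caret conventions). Only the random forest is shown, and the features and yields below are synthetic stand-ins for the PCA-based VI scores, not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in: three PCA-derived VI features against bulb yield (t/ha).
rng = np.random.default_rng(7)
X = rng.normal(size=(120, 3))                    # e.g. PC1-PC3 scores
y = 20 + 4.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 2.0, size=120)

# 10-fold CV reporting held-out R2 and RMSE, analogous to the study's
# resampling scheme for each of the five learners.
cv = KFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_validate(
    RandomForestRegressor(n_estimators=300, random_state=7),
    X, y, cv=cv,
    scoring={"R2": "r2", "negRMSE": "neg_root_mean_squared_error"},
)
val_r2 = scores["test_R2"].mean()
val_rmse = -scores["test_negRMSE"].mean()
print(f"validation R2   = {val_r2:.3f}")
print(f"validation RMSE = {val_rmse:.3f} t/ha")
```

Comparing the analogous training-fold scores against these held-out scores is what exposes the rf overfitting tendency noted above (high training R², lower validation R²).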

Table 4

Table 4. Growth stage-wise evaluation of ML models for onion yield prediction using PCA-based VI features: combined data.

Figure 8
Five scatter plots comparing predicted versus observed yield for different models: gbm, glmnet, lm, rf, and svmRadial. Each plot shows various colored and shaped markers representing trials and years. R-squared and RMSE values are provided for each model, indicating fit accuracy. A dashed line represents the ideal prediction line.

Figure 8. Validation performance of onion yield prediction models at the vegetative stage: observed versus predicted values using gbm, glmnet, lm, rf, and svmRadial with 10-fold cross-validation (n = 105).

Figure 9
Scatter plots comparing predicted versus observed yield during bulb initiation for different models: gbm, glmnet, lm, rf, and svmRadial. Each plot shows data points in varied colors and shapes representing trials and years. Correlation coefficients (R²) and root mean square errors (RMSE) are indicated for each model. Diagonal lines represent ideal predictions.

Figure 9. Validation performance of onion yield prediction models at the bulb initiation stage: observed versus predicted values using gbm, glmnet, lm, rf, and svmRadial with 10-fold cross-validation (n = 120).

Figure 10
Scatter plots comparing predicted vs. observed yield for bulb development using different models: gbm, glmnet, lm, rf, and svmRadial. Each plot shows a dashed line indicating perfect prediction, with varied marker colors and shapes representing trials and years. R-squared and RMSE values denote model accuracy, with rf showing the highest R-squared at 0.752 and lowest RMSE at 3.896.

Figure 10. Validation performance of onion yield prediction models at the bulb development stage: observed versus predicted values using gbm, glmnet, lm, rf, and svmRadial with 10-fold cross-validation (n = 120).

Year-wise model evaluations revealed marked interannual variability, with 2024 consistently outperforming 2023 across all growth stages and model types. At the vegetative stage in 2024 (Table 5), all models showed strong predictive capability, with rf achieving the lowest RMSE of 3.554 ± 0.689 t ha−1 and an R² of 0.866 ± 0.044, followed closely by svmRadial (R² = 0.902 ± 0.058; RMSE = 4.071 ± 0.899 t ha−1). Even linear models, such as lm and glmnet, demonstrated high predictive power (R² = 0.822 ± 0.101; RMSE ≈ 4.26 t ha−1), indicating good generalization under favorable seasonal conditions. Conversely, in 2023 (Table 6), the same models exhibited substantially lower performance. Although rf remained the top performer (R² = 0.564 ± 0.251; RMSE = 4.978 ± 1.958 t ha−1), its generalization ability was reduced compared with 2024. The svmRadial model yielded an R² of 0.548 ± 0.344 and an RMSE of 4.071 ± 2.102 t ha−1. At the bulb initiation stage in 2024, rf again delivered robust performance (R² = 0.798 ± 0.102; RMSE = 3.696 ± 0.791 t ha−1), while gbm (R² = 0.792; RMSE = 3.693 t ha−1) and svmRadial (R² = 0.796 ± 0.294) also demonstrated high predictive accuracy. Linear models (lm and glmnet) maintained strong R² values of approximately 0.795. In contrast, during 2023, only rf (R² = 0.595 ± 0.322; RMSE = 3.976 t ha−1), gbm (R² = 0.505 ± 0.300; RMSE = 4.623 t ha−1), and svmRadial (R² = 0.548 ± 0.273; RMSE = 4.490 ± 1.347 t ha−1) showed moderate performance. The bulb development stage demonstrated the highest model performance across both years. In 2024, gbm achieved the most accurate predictions (R² = 0.909 ± 0.038; RMSE = 2.933 ± 0.336 t ha−1), followed by rf (R² = 0.889 ± 0.087; RMSE = 2.747 ± 0.807 t ha−1). Notably, linear models also performed exceptionally, with both lm and glmnet reaching R² = 0.882 ± 0.056 and RMSE ≈ 3.0 t ha−1. In 2023, rf and svmRadial retained relatively good performance (R² = 0.622 and 0.631; RMSE = 3.889 and 4.266 t ha−1, respectively), whereas the performance of linear models declined markedly, with reduced R² values and increased RMSE, likely reflecting year-to-year variability.

Table 5

Table 5. Growth stage-wise evaluation of ML models for onion yield prediction using PCA-based VI features: Year-2024 data.

Across all growth stages, the random forest model consistently outperformed other algorithms in terms of R², RMSE, and MAE. This trend is evident when jointly considering Tables 4–6, where rf achieved the best or near-best performance in each case. The consistency of rf’s superior results reinforces the robustness of this algorithm for yield prediction based on PCA-derived vegetation index features. The support vector machine with radial kernel also demonstrated strong predictive capability, showing slightly lower R² values than rf but minimal differences between training and validation sets, indicating better generalization and reduced overfitting.

Table 6

Table 6. Growth stage-wise evaluation of ML models for onion yield prediction using PCA-based VI features: Year-2023 data.

4 Discussion

Timely and precise monitoring of crop growth and health is essential for optimizing agricultural management, improving yield forecasting, and supporting resource-efficient interventions (Cheng et al., 2016). In this context, UAV-assisted remote sensing has seen substantial advancements in agriculture over the past decade. UAVs provide the unique ability to collect high-resolution, real-time spatiotemporal data that reveal subtle changes in crop growth dynamics and canopy architecture (Calera et al., 2017). While these technologies have been extensively utilized for cereals and high-value horticultural crops (Zhou et al., 2017; Li et al., 2020), their application to onion—a shallow-rooted and leaf-specific crop with distinctive canopy and phenological traits—remains relatively novel. The present study investigated the utility of multispectral UAV imagery for predicting bulb yield in rainy-season onion cultivated under four different planting dates. Planting date is a critical determinant influencing onion growth and productivity because of its effects on thermal time accumulation, radiation interception, and photoperiod sensitivity (Brewster, 2008; Devulkar et al., 2015). The predictive performance of different vegetation indices (VIs), especially when measured at key growth stages, was assessed for yield estimation. Results indicated that remote sensing-based monitoring, when synchronized with critical phenophases, enables reliable prediction of final bulb yield across varying planting windows. These findings corroborate earlier reports by Pinter et al. (2003), Sakamoto et al. (2013), and Sun et al. (2020), which suggest that phenology-linked spectral sensing enhances the accuracy of crop modeling.

The present study demonstrated significant differences in vegetative vigor, bulb initiation timing, and canopy development across the four planting dates. These differences influenced spectral reflectance patterns and, consequently, yield prediction accuracy. In early-transplanted onions, the longer vegetative period led to denser canopies and increased chlorophyll accumulation, whereas late-transplanted onions experienced accelerated phenological progression due to reduced accumulation of growing degree days. Consequently, canopy reflectance and associated VIs exhibited varying levels of saturation and sensitivity across growth stages and planting dates. This dynamic interaction underscores the importance of timing UAV data acquisition to specific crop developmental phases, as emphasized by Tucker et al. (2005) and Yang et al. (2020).

Among the VIs evaluated, NDVI, NDRE, SAVI, LAI, NORM2, and GNDVI exhibited consistent relationships with final bulb yield, particularly when measured during the bulb development stage. NDVI, one of the most widely used indices, showed moderate to strong correlations with yield but tended to saturate under conditions of high LAI. This saturation effect, reported in several previous studies (Haboudane et al., 2004; Zarco-Tejada et al., 2016), limits NDVI’s effectiveness during late vegetative or pre-bulking phases when canopy closure approaches 100%. Conversely, indices such as GNDVI outperformed NDVI in dense canopies because of their greater sensitivity to subtle changes in chlorophyll content and structural biomass. GNDVI uses the green band instead of the red band, thereby reducing saturation effects and more effectively capturing nitrogen-induced variability in leaf pigment concentration (Delegido et al., 2011).

SAVI, which is designed to minimize the influence of soil background reflectance during early growth stages, was especially useful when canopy coverage was below 50%, as observed in early DAT imagery. The soil adjustment factor incorporated in this index enhances stability under partially vegetated conditions, consistent with findings by Mulla (2013) and Johansen et al. (2019). Notably, VI performance varied not only with crop growth stage but also across different planting dates. For example, NDVI exhibited relatively stable performance in early-planted onions, whereas GNDVI performed better in late-transplanted onions. This variation may be attributed to differences in canopy geometry, radiation use efficiency, and stress accumulation under varying environmental regimes. Sakamoto et al. (2013) and Xue and Su (2017) similarly reported that environmental and physiological shifts associated with planting time influence spectral responses and require tailored modeling strategies.
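The indices contrasted in this passage follow standard formulations, written out below. The band reflectance values in the example are illustrative only; the soil-adjustment factor L = 0.5 is the common default for intermediate canopy cover, not a value reported by the study.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def gndvi(nir, green):
    """Green NDVI: substitutes the green band for red, reducing
    saturation under dense canopies."""
    return (nir - green) / (nir + green)

def savi(nir, red, L=0.5):
    """Soil-Adjusted Vegetation Index; L dampens soil-background
    reflectance when canopy cover is partial."""
    return (1 + L) * (nir - red) / (nir + red + L)

# Illustrative reflectances for a partially vegetated onion plot.
nir, red, green = 0.45, 0.08, 0.10
print(f"NDVI  = {ndvi(nir, red):.3f}")     # 0.698
print(f"GNDVI = {gndvi(nir, green):.3f}")  # 0.636
print(f"SAVI  = {savi(nir, red):.3f}")     # 0.539
```

Note how SAVI falls below NDVI for the same red/NIR pair: the L term in the denominator pulls the value down, which is precisely the soil-background correction that stabilizes the index at sub-50% canopy cover.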

Canopy reflectance and its derived indices are fundamentally linked to chlorophyll content, leaf area index, and biomass accumulation, all of which are influenced by nutrient management, water availability, and plant stress status. The chlorophyll-rich leaves of actively growing onions strongly absorb red light and reflect near-infrared (NIR) radiation, forming the basis for vegetation indices such as NDVI and GNDVI. However, additional biophysical factors, including leaf angle distribution and canopy shadowing, can modulate these reflectance patterns. Such structural attributes often vary with cultivar, planting density, and growth rate, highlighting the need for location- and variety-specific calibration of yield prediction models (Xue and Su, 2017; Zhang et al., 2020).

Environmental factors such as solar radiation, relative humidity, temperature, and soil moisture played a pivotal role in the spectral performance of vegetation indices (VIs). These variables influence the physiological status of the crop, altering reflectance characteristics by affecting water content, pigment concentration, and stomatal conductance. For instance, cloud cover during image acquisition may lower near-infrared (NIR) reflectance because of reduced canopy temperature and transpiration rates. Sakamoto et al. (2013) and Tucker et al. (2005) emphasized the need to include environmental covariates in spectral models to reduce prediction error and improve model robustness.

The regression models developed in this study, based on both single and combined VIs, achieved moderate to high prediction accuracy (R² = 0.60–0.82). Models utilizing combined VIs, which integrate both pigment-sensitive and structural indices, consistently outperformed those built on individual indices, demonstrating that the synergistic use of complementary spectral information enhances yield estimation. These findings align with the results of Chlingaryan et al. (2018) and Liakos et al. (2018), who emphasized that multivariable modeling better captures the complex interactions underlying crop productivity.

Although linear regression models offer simplicity and ease of interpretation, their performance is constrained by assumptions of linearity and a limited capacity to handle noise and multicollinearity among predictor variables. Onion growth and yield, being influenced by numerous interacting physiological and environmental variables, may therefore benefit from more sophisticated modeling approaches. In this context, machine learning algorithms such as random forest (rf), support vector machine with radial kernel (svmRadial), gradient boosting machine (gbm), and elastic net regression (glmnet) have demonstrated superior performance in agricultural applications by capturing complex, nonlinear relationships (Liakos et al., 2018; Chlingaryan et al., 2018). Future research could further leverage these advanced algorithms to enhance yield prediction accuracy and robustness, particularly under highly variable field conditions or in multi-environment trials.

Temporal dynamics emerged as a crucial factor influencing the strength of relationships between vegetation indices and yield. The study observed that spectral data acquired during the bulb development stage (approximately 60–70 DAT) exhibited the highest correlation with final bulb yield. This period coincides with peak leaf area, optimal radiation interception, and maximum canopy greenness (Brewster, 2008). At this phase, vegetative traits are closely linked with yield potential, enabling accurate prediction (Zhou et al., 2017; Zaman-Allah and Vergara, 2015; Junior et al., 2025). However, for early planting scenarios in which vegetative growth is prolonged, the optimal imaging window may shift slightly forward or backward depending on crop phenology and local microclimate conditions. Thus, stage-specific image acquisition strategies should be aligned with the onion crop’s physiological development.

Consistent with these findings, superior model performance was observed for the 2024 data compared with 2023, underscoring interannual yield variability and suggesting that volatile weather dynamics contributed to deviations in VI values and actual yield across both years (Huang et al., 2021; Knight et al., 2024). Accurate crop yield prediction is essential for agricultural planning and decision-making, enabling stakeholders to optimize resource allocation and mitigate food security risks (Clercq and Mahdi, 2024). Yield prediction remains challenging because of complex interactions between crop development and yield-affecting environmental variables such as weather and soil fertility status (Pham et al., 2022). Interannual yield variation—year-to-year fluctuation in crop yields—further complicates prediction efforts because of the dynamic interplay among environmental factors and management practices (Mohan et al., 2025). Mitigating interannual yield variation requires a multifaceted strategy combining genetic improvements, adaptive agronomic practices, and technology integration. Crop diversification can buffer against environmental stresses and pest outbreaks. Adaptive techniques such as laser land leveling, timely sowing, efficient use of beneficial microbes, and precision irrigation enhance climate resilience (Rakshit et al., 2022). The deployment of drought-tolerant varieties and optimized water use is especially vital in water-scarce regions. In addition, practices such as conservation tillage, crop rotation, and organic amendments improve soil health and enhance yield stability under variable conditions (Dhankher and Foyer, 2018; Ortuani et al., 2019).

Furthermore, the consistent performance of vegetation indices across growth stages and planting dates highlights their potential for developing generalized models for bulb yield prediction. However, model transferability is constrained by factors such as soil background reflectance, crop management practices, sensor characteristics, and atmospheric conditions (Hsiao et al., 2019; Khaki et al., 2020). These challenges necessitate rigorous image calibration, including the use of reflectance panels, ground control points (GCPs), and standardized processing algorithms to ensure data consistency across flights and growing seasons (Zarco-Tejada et al., 2001; Tucker et al., 2005). Planting time also plays a critical role in shaping onion crop physiology. Early planting in this study resulted in extended vegetative growth, greater biomass accumulation, and delayed bulb initiation (Kinoshita et al., 2024). Conversely, late planting induced stress by reducing accumulated growing degree days, thereby limiting vegetative expansion and bulb size. These growth responses were effectively detected through differences in spectral reflectance, supporting the view that phenology-driven variability must be accounted for in yield prediction models (Kang et al., 2020). Sakamoto et al. (2013) similarly demonstrated that differential growth patterns across sowing windows can influence the accuracy of satellite-based biomass estimation. While NDVI remains a standard vegetation index in remote sensing because of its simplicity and widespread validation, its limitations under dense canopies necessitate the use of complementary indices. Indices such as GNDVI, which utilize green reflectance, are more effective in detecting subtle changes in chlorophyll content, particularly under moderate to high canopy densities (Delegido et al., 2011). 
These findings indicate that the selection of appropriate vegetation indices or index combinations should be tailored to the crop species and the specific growth stage targeted for prediction. Although this study employed UAVs at the field scale, scaling up to regional or national applications may require integration with satellite-based data. Nevertheless, the high spatial resolution of UAV imagery makes it valuable for calibrating and validating coarser-resolution remote sensing products. UAV platforms also provide flexible flight planning, enabling on-demand monitoring following weather events, irrigation, or early signs of crop stress. Integrating UAV-based vegetation indices with physiological models and crop simulation platforms such as DSSAT or APSIM represents an emerging area of interest. Such integration could facilitate the translation of spectral data into agronomic insights, including biomass partitioning, nutrient requirements, and harvest index estimation, thereby enabling real-time decision-making by farmers, researchers, and policymakers. The synergy between remote sensing, crop modeling, and Internet of Things (IoT) infrastructure represents a promising frontier for precision agriculture (Chlingaryan et al., 2018).

In the current study, random forest regression consistently performed well across both individual years and pooled datasets, demonstrating strong generalization. This was evidenced by low RMSE values (training: 1.911 t ha−1; cross-validation: 3.824 t ha−1) and MAE (training: 1.459 t ha−1; validation: 3.110 t ha−1), along with relatively high R² values (training: 0.939; validation: 0.755), indicating the model’s ability to accurately explain yield variability at the bulb development stage. Comparable results were reported by Ramos et al. (2020), who predicted maize yield using UAV-based vegetation indices. Random forest has also been successfully applied to estimate the yield of bulbous vegetables such as garlic using UAV-based multispectral imagery across multiple sensors and phenological stages (Marcone et al., 2024). Random forest models are particularly effective at handling nonlinear relationships between predictor and target variables, making them well suited for complex agricultural modeling (Fernández-Delgado et al., 2014; Elbaşi et al., 2024). Moreover, they can capture complex interactions among environmental, soil, and biological factors (Meng et al., 2025; Sadasivam, 2021), which has led to their widespread adoption in yield prediction and other agricultural applications (Fu et al., 2025). Notably, the support vector machine with radial kernel also demonstrated strong generalization, with a smaller gap between training (R² = 0.787) and validation (R² = 0.716) performance compared with random forest, highlighting its robustness against overfitting. Support vector machines with radial kernels are well suited for handling complex, high-dimensional datasets with relatively limited sample sizes, a scenario common in field-based agricultural experiments (Fernández-Delgado et al., 2014; Kumar et al., 2025). Their kernel-based approach enables the capture of nonlinear patterns in canopy reflectance data while remaining less sensitive to multicollinearity than linear regression methods. These findings underscore the complementary strengths of random forest (high accuracy and flexible modeling) and svmRadial (robust generalization), supporting their reliability for yield prediction at specific crop growth stages in rainy-season onion.

This study validates the utility of UAV-derived multispectral imagery for predicting onion bulb yield across staggered planting dates. Vegetation indices showed significant associations with yield, particularly when measured during the bulb development stage. The effects of planting time, environmental conditions, and crop growth stage on vegetation index performance underscore the need for context-specific model calibration (Vidican et al., 2023). Integrating stage-specific spectral data with robust modeling approaches positions UAV-assisted remote sensing as a valuable tool for advancing onion agronomy. Ongoing advances in sensor technology, data analytics, and agronomic modeling will further improve the precision, scalability, and utility of these tools in supporting sustainable and profitable onion production systems (Din et al., 2021; Messina et al., 2020). The findings of this study are relevant not only for individual growers but also for researchers, policymakers, crop insurance agencies, and stakeholders throughout the supply chain. Early and accurate yield predictions can inform input allocation, price forecasting, and logistics management. For semi-perishable crops such as onion, where sudden gluts or shortages can lead to price volatility, remote sensing offers a proactive tool for large-scale crop area estimation and yield forecasting (Darwin et al., 2021; Pham et al., 2022). The study further underscores the importance of multi-year and multi-location validation of UAV-based yield prediction systems. Incorporating ground-truth measurements, such as chlorophyll meters, leaf area sensors, or plant biomass sampling, would enhance the calibration of predictive models. In addition, advances in cloud-based platforms, edge computing, and AI-driven analytics are making it increasingly feasible to analyze UAV data in near real time, reducing the lag between data acquisition and actionable insights (Samuel and Baysal-Gurel, 2019; Maes and Steppe, 2018). 
The findings of the present study offer practical applications for precision crop management, yield forecasting, and advisory services by enabling accurate, field-scale prediction of onion yield from UAV-based multispectral data. However, high drone costs, shortages of trained operators, the complexities of data acquisition (particularly when integrating UAV data with satellite imagery), and privacy concerns may limit widespread adoption. UAVs also face constraints in large-area monitoring and are weather dependent, whereas satellite imagery, although broader in spatial coverage, provides lower spatial resolution, directly affecting the scale and precision of agricultural monitoring and yield prediction (Bazrafkan et al., 2025; Toosi et al., 2025). For scalable adoption, future research should prioritize automated UAV workflows, capacity building for technical operators, and the development of hybrid UAV–satellite systems, along with socioeconomic evaluations to support broad and sustainable implementation across diverse farming systems (Jabed and Azmi Murad, 2024).

5 Conclusion

This study demonstrated the potential of UAV-based multispectral imagery for effective yield prediction in rainy-season onion, evaluated across four planting dates over two consecutive years (2023 and 2024) in India. Stronger correlations between vegetation indices and bulb yield were observed in 2024 than in 2023, underscoring interannual variability. Among the growth stages evaluated, the bulb development stage consistently showed the strongest association with yield in both years. To further address interannual variability, future research should incorporate a wider range of planting dates, multiple seasons and years, and diverse agro-climatic zones. Among the spectral indices tested, NDVI, NDRE, GNDVI, and NORM2 demonstrated the strongest correlations with bulb yield across both years. Of the five machine learning algorithms employed, the random forest (rf) and support vector machine with radial basis kernel (svmRadial) models effectively captured yield variability and generalized well for prediction. Overall, the findings confirm that UAV-acquired multispectral imagery, combined with robust modeling approaches, offers a reliable and scalable solution for predicting onion yield under rainy-season cultivation.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

SW: Software, Methodology, Writing – review & editing, Supervision, Writing – original draft, Investigation, Formal Analysis, Conceptualization, Visualization, Data curation, Validation. SP: Writing – original draft, Data curation. AR: Data curation, Writing – original draft. KK: Writing – original draft, Writing – review & editing, Investigation, Validation, Methodology, Supervision. SB: Writing – original draft, Data curation. MD: Formal Analysis, Methodology, Data curation, Writing – original draft. IB: Writing – original draft, Data curation. MK: Writing – original draft, Data curation. YK: Writing – original draft, Data curation. ZS: Data curation, Writing – original draft. VK: Supervision, Writing – review & editing, Project administration, Resources, Funding acquisition, Investigation, Validation. PG: Writing – review & editing, Investigation, Project administration, Funding acquisition, Validation, Supervision. BB: Validation, Writing – review & editing, Investigation, Funding acquisition, Supervision. SM: Investigation, Software, Funding acquisition, Validation, Conceptualization, Resources, Formal Analysis, Writing – review & editing, Project administration, Supervision, Writing – original draft, Methodology, Visualization. LS: Writing – original draft, Validation. VM: Writing – original draft, Funding acquisition. SG: Writing – original draft, Investigation, Funding acquisition, Writing – review & editing, Supervision, Project administration, Conceptualization.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the TIH Foundation for IoT & IoE, Mumbai.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2025.1696730/full#supplementary-material

References

Abdulridha, J., Ampatzidis, Y., Kakarla, S. C., and Roberts, P. (2020). Detection of target spot and bacterial spot diseases in tomato using UAV-based and benchtop-based hyperspectral imaging techniques. Precis. Agric. 21, 955–978. doi: 10.1007/s11119-019-09703-4

Arlot, S. and Celisse, A. (2010). A survey of cross-validation procedures for model selection. Stat. Surv. 4, 40–79. doi: 10.1214/09-SS054

Ayalew, A. T. and Lohani, T. K. (2023). Prediction of crop yield by support vector machine coupled with deep learning algorithm procedures in Lower Kulfo watershed of Ethiopia. J. Eng. 2023, 6675523. doi: 10.1155/2023/6675523

Ballesteros, R., Ortega, J. F., Hernandez, D., and Moreno, M. A. (2018). Onion biomass monitoring using UAV-based RGB imaging. Precis. Agric. 19, 1–18. doi: 10.1007/s11119-018-9560-y

Bao, L., Li, X., Yu, J., Li, G., Chang, X., Yu, L., et al. (2024). Forecasting spring maize yield using vegetation indices and crop phenology metrics from UAV observations. Food Energy Secur. 13, e505. doi: 10.1002/fes3.505

Bazrafkan, A., Igathinathane, C., Bandillo, N., and Flores, P. (2025). Optimizing integration techniques for UAS and satellite image data in precision agriculture — a review. Front. Remote Sens. 6. doi: 10.3389/frsen.2025.1622884

Bendig, J., Bolten, A., Bennertz, S., Broscheit, J., Eichfuss, S., and Bareth, G. (2014). Estimating biomass of barley using crop surface models (CSMs) derived from UAV-based RGB imaging. Remote Sens. 6, 10395–10412. doi: 10.3390/rs61110395

Bendig, J., Yu, K., Aasen, H., Bolten, A., Bennertz, S., Broscheit, J., et al. (2015). Combining UAV-based plant height from crop surface models, visible, and near infrared vegetation indices for biomass monitoring in barley. Int. J. Appl. Earth Observ Geoinformati 39, 79–87. doi: 10.1016/j.jag.2015.02.012

Breiman, L. (2001). Random forests. Mach. Learn. 45, 5–32. doi: 10.1023/A:1010933404324

Brewster, J. L. (2008). Onions and other vegetable alliums. Vol. 15 (CABI).

Calera, A., Campos, I., Osann, A., D’Urso, G., and Menenti, M. (2017). Remote sensing for crop water management: from ET modelling to services for the end users. Sensors 17, 1104. doi: 10.3390/s17051104

Chen, T., Yang, W., Zhang, H., Zhu, B., Zeng, R., Wang, X., et al. (2020). Early detection of bacterial wilt in peanut plants through leaf-level hyperspectral and unmanned aerial vehicle data. Comput. Electron. Agric. 177, 105708. doi: 10.1016/j.compag.2020.105708

Cheng, T., Yang, Z., Inoue, Y., Zhu, Y., and Cao, W. (2016). Preface: Recent advances in remote sensing for crop growth monitoring. Remote Sens. 8, 116. doi: 10.3390/rs8020116

Chlingaryan, A., Sukkarieh, S., and Whelan, B. (2018). Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review. Comput. Electron. Agric. 151, 61–69. doi: 10.1016/j.compag.2018.05.012

Clercq, D. D. and Mahdi, A. (2024). Feasibility of machine learning-based rice yield prediction in India at the district level using climate reanalysis data. arXiv preprint (Cornell University). doi: 10.48550/arxiv.2403.07967

Córcoles, J. I., Ortega, J. F., Hernández, D., and Moreno, M. A. (2013). Estimation of leaf area index in onion (Allium cepa L.) using an unmanned aerial vehicle. Biosyst. Eng. 115 (1), 31–42. doi: 10.1016/j.biosystemseng.2013.02.002

Darwin, B., Dharmaraj, P., Prince, S., Popescu, D. E., and Hemanth, D. J. (2021). Recognition of bloom/yield in crop images using deep learning models for smart agriculture: A review. Agronomy 11, 646. doi: 10.3390/agronomy11040646

Davidson, C., Jaganathan, V., Sivakumar, A. N., Czarnecki, J. M. P., and Chowdhary, G. (2022). NDVI/NDRE prediction from standard RGB aerial imagery using deep learning. Comput. Electron. Agric. 203, 107396. doi: 10.1016/j.compag.2022.107396

Delegido, J., Verrelst, J., Alonso, L., and Moreno, J. (2011). Evaluation of sentinel-2 red-edge bands for empirical estimation of green LAI and chlorophyll content. Sensors 11, 7063–7081. doi: 10.3390/s110707063

Devulkar, N. G., Bhanderi, D. R., More, S. J., and Jethava, B. (2015). Optimization of yield and growth in onion through spacing and time of planting. Int. J. Green Farm 6, 305–307.

Dhankher, O. P. and Foyer, C. H. (2018). Climate resilient crops for improving global food security and safety. Plant Cell Environ. 41, 877–884. doi: 10.1111/pce.13207

Din, M., Zheng, W., Rashid, M., Wang, S., and Shi, Z. (2017). Evaluating hyperspectral vegetation indices for leaf area index estimation of Oryza sativa L. at diverse phenological stages. Front. Plant Sci. 8, 820. doi: 10.3389/fpls.2017.00820

Din, N. U., Naz, B., Zai, S., and Ahmed, W. (2021). Onion crop monitoring with multi-spectral imagery using deep neural network. Int. J. Adv. Comput. Sci. Appl. 12 (5). doi: 10.14569/IJACSA.2021.0120537

Duarte-Correa, D., Rodríguez-Reséndiz, J., Díaz-Flórez, G., Olvera-Olvera, C. A., and Álvarez-Alvarado, J. M. (2023). Identifying growth patterns in arid-zone onion crops (Allium cepa) using digital image processing. Technologies 11, 67. doi: 10.3390/technologies11030067

Elbaşi, E., Mostafa, N., Zaki, C., Al-Arnaout, Z., Topcu, A. E., and Saker, L. (2024). Optimizing agricultural data analysis techniques through AI-powered decision-making processes. Appl. Sci. 14, 8018. doi: 10.3390/app14178018

Eykerman, A. (2022). Feasibility of remote sensing of leeks for fertilization management (Doctoral dissertation, Ghent University).

Faiçal, B. S., Costa, F. G., Pessin, G., Ueyama, J., Freitas, H., Colombo, A., et al. (2014). The use of unmanned aerial vehicles and wireless sensor networks for spraying pesticides. J. Syst. Architect 60, 393–404. doi: 10.1016/j.sysarc.2014.01.004

FAOSTAT (2025). Food and agriculture data. Available online at: https://www.fao.org/faostat (Accessed June 27, 2025).

Farooqui, N. A., Haleem, M., Khan, W., and Ishrat, M. (2024). Precision agriculture and predictive analytics: Enhancing agricultural efficiency and yield. Intelligent Techniques for Predictive Data Analytics, 171–188. doi: 10.1002/9781394227990.ch9

Fernández-Delgado, M., Cernadas, E., Barro, S., and Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15, 3133–3181.

Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232. doi: 10.1214/aos/1013203451

Friedman, J. H., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. software 33, 1–22. doi: 10.18637/jss.v033.i01

Fu, Z., Jiang, J., Gao, Y., Krienke, B., Wang, M., Zhong, K., et al. (2020). Wheat growth monitoring and yield estimation based on multi-rotor unmanned aerial vehicle. Remote Sens. 12, 508. doi: 10.3390/rs12030508

Fu, H., Lü, J., Li, J., Zou, W., Tang, X., Ning, X., et al. (2025). Winter wheat yield prediction using satellite remote sensing data and deep learning models. Agronomy 15, 205. doi: 10.3390/agronomy15010205

Fu, Y., Yang, G., Wang, J., Song, X., and Feng, H. (2014). Winter wheat biomass estimation based on spectral indices, band depth analysis and partial least squares regression using hyperspectral measurements. Comput. Electron. Agric. 100, 51–59. doi: 10.1016/j.compag.2013.10.010

Gadge, S. S. and Lawande, K. E. (2012). Crop damage due to climatic change: a major constraint in onion farming. Indian Res. J. Ext. Educ. 2, 38–41.

Geipel, J., Link, J., and Claupein, W. (2014). Combined spectral and spatial modeling of corn yield based on aerial images and crop surface models acquired with an unmanned aircraft system. Remote Sens. 6, 10335–10355. doi: 10.3390/rs61110335

Gitelson, A. A., Kaufman, Y. J., and Merzlyak, M. N. (1996). Use of a green channel in remote sensing of global vegetation from EOS-MODIS. Remote Sens. Environ. 58, 289–298. doi: 10.1016/S0034-4257(96)00072-7

Haboudane, D., Miller, J. R., Pattey, E., Zarco-Tejada, P. J., and Strachan, I. B. (2004). Hyperspectral vegetation indices and novel algorithms for predicting green LAI of crop canopies: Modeling and validation in the context of precision agriculture. Remote Sens. Environ. 90, 337–352. doi: 10.1016/j.rse.2003.12.013

Hanuman, V., Pinnamaneni, K. V., and Singh, T. (2021). “Best fit radial kernel support vector machine for intelligent crop yield prediction method,” in Machine learning and information processing: proceedings of ICMLIP 2020 (Springer Singapore, Singapore), 457–467.

Hsiao, J., Swann, A. L. S., and Kim, S.-H. (2019). Maize yield under a changing climate: The hidden role of vapor pressure deficit. Agric. For. Meteorol 279, 107692. doi: 10.1016/j.agrformet.2019.107692

Huang, L., Wang, F., Liu, Y., and Zhang, Y. (2021). Night temperature determines the interannual yield variation in hybrid and inbred rice widely used in central China through different effects on reproductive growth. Front. Plant Sci. 12, 646168. doi: 10.3389/fpls.2021.646168

Huete, A. R. (1988). A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 25, 295–309. doi: 10.1016/0034-4257(88)90106-X

Jabed, M. A. and Azmi Murad, M. A. (2024). Crop yield prediction in agriculture: A comprehensive review of machine learning and deep learning approaches, with insights for future research and sustainability. Heliyon 10, e40836. doi: 10.1016/j.heliyon.2024.e40836

Jeong, J. H., Resop, J. P., Mueller, N. D., Fleisher, D. H., Yun, K., Butler, E. E., et al. (2016). Random forests for global and regional crop yield predictions. PloS One 11, e0156571. doi: 10.1371/journal.pone.0156571

Jiang, D., Yang, X., Clinton, N., and Wang, N. (2004). An artificial neural network model for estimating crop yields using remotely sensed information. Int. J. Remote Sens. 25, 1723–1732. doi: 10.1080/0143116031000150068

Johansen, K., Morton, M. J. L., Malbeteau, Y. M., Aragon, B., Al-Mashharawi, S. K., Ziliani, M. G., et al. (2019). Unmanned aerial vehicle-based phenotyping using morphometric and spectral analysis can quantify responses of wild tomato plants to salinity stress. Front. Plant Sci. 10. doi: 10.3389/fpls.2019.00370

Júnior, M. R. B., de Azevedo Sales, L., dos Santos, R. G., Vargas, R. B. S., Tyson, C., and de Oliveira, L. P. (2025). Forecasting yield and market classes of Vidalia sweet onions: A UAV-based multispectral and texture data-driven approach. Smart Agric. Technol. 10, 100808. doi: 10.1016/j.atech.2025.100808

Kang, Y. S., Jang, S. H., Park, J. W., Song, H. Y., Ryu, C. S., Jun, S. R., et al. (2020). Yield prediction and validation of onion (Allium cepa L.) using key variables in narrowband hyperspectral imagery and effective accumulated temperature. Comput. Electron. Agric. 178, 105667. doi: 10.1016/j.compag.2020.105667

Khaki, S., Wang, L., and Archontoulis, S. V. (2020). A CNN-RNN framework for crop yield prediction. Front. Plant Sci. 10. doi: 10.3389/fpls.2019.01750

Kinoshita, T., Hamano, M., and Atsushi, Y. (2024). Onion cultivars and set planting date during summer for an early winter harvest in northern Japan. Japan Agric. Res. Quarterly: JARQ 58, 165–173. doi: 10.6090/jarq.58.165

Knight, C., Khouakhi, A., and Waine, T. W. (2024). The impact of weather patterns on inter-annual crop yield variability. Sci. Total Environ. 955, 177181. doi: 10.1016/j.scitotenv.2024.177181

Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), 1137–1145.

Kuhn, M. (2008). Building predictive models in R using the caret package. J. Stat. Software 28, 1–26. doi: 10.18637/jss.v028.i05

Kumar, C., Dhillon, J., Huang, Y., and Reddy, K. (2025). Explainable machine learning models for corn yield prediction using UAV multispectral data. Comput. Electron. Agric. 231, 109990. doi: 10.1016/j.compag.2025.109990

Li, B., Xu, X., Zhang, L., Han, J., Bian, C., Li, G., et al. (2020). Above-ground biomass estimation and yield prediction in potato by using UAV-based RGB and hyperspectral imaging. ISPRS J. Photogrammet Remote Sens. 162, 161–172. doi: 10.1016/j.isprsjprs.2020.02.013

Liakos, K. G., Busato, P., Moshou, D., Pearson, S., and Bochtis, D. (2018). Machine learning in agriculture: A review. Sensors 18, 2674. doi: 10.3390/s18082674

Liu, J., Wang, W., Li, J., Mustafa, G., Su, X., Nian, Y., et al. (2025). UAV remote sensing technology for wheat growth monitoring in precision agriculture: comparison of data quality and growth parameter inversion. Agronomy 15, 159. doi: 10.3390/agronomy15010159

Maes, W. H. and Steppe, K. (2019). Perspectives for remote sensing with unmanned aerial vehicles in precision agriculture. Trends Plant Sci. 24, 152–164. doi: 10.1016/j.tplants.2018.11.007

Mahesh, P. and Soundrapandiyan, R. (2024). Yield prediction for crops by gradient-based algorithms. PloS One 19, e0291928. doi: 10.1371/journal.pone.0291928

Marcone, A., Impollonia, G., Croci, M., Blandinières, H., Pellegrini, N., and Amaducci, S. (2024). Garlic yield monitoring using vegetation indices and texture features derived from UAV multispectral imagery. Smart Agric. Technol. 8, 100513. doi: 10.1016/j.atech.2024.100513

Matias, F. I., Caraza-Harter, M. V., and Endelman, J. B. (2020). FIELDimageR: An R package to analyze orthomosaic images from agricultural field trials. Plant Phenome J. 3, e20005. doi: 10.1002/ppj2.20005

Meng, W., Li, X., Zhang, J., Pei, T., and Zhang, J. (2025). Monitoring of soybean bacterial blight disease using drone-mounted multispectral imaging: A case study in northeast China. Agronomy 15, 921. doi: 10.3390/agronomy15040921

Messina, G., Praticò, S., Siciliani, B., Curcio, A., Di Fazio, S., and Modica, G. (2020). “Monitoring onion crops using UAV multispectral and thermal imagery: preliminary results,” in Innovative biosystems engineering for sustainable agriculture, forestry and food production. MID-TERM AIIA 2019. Lecture notes in civil engineering, vol. 67. Eds. Coppola, A., Di Renzo, G., Altieri, G., and D’Antonio, P. (Springer, Cham). doi: 10.1007/978-3-030-39299-4_94

Mohan, R. N. V. J., Rayanoothala, P. S., and Sree, R. P. (2025). Next-gen agriculture: integrating AI and XAI for precision crop yield predictions. Front. Plant Sci. 15. doi: 10.3389/fpls.2024.1451607

Moka, S., Singh, N., and Buttar, D. S. (2021). Identification of potential native chitinase-producing Trichoderma spp. and its efficacy against damping-off in onion. Eur. J. Plant Pathol. 161, 289–300. doi: 10.1007/s10658-021-02321-9

Mulla, D. J. (2013). Twenty five years of remote sensing in precision agriculture: Key advances and remaining knowledge gaps. Biosyst. Eng. 114, 358–371. doi: 10.1016/j.biosystemseng.2012.08.009

Ortuani, B., Sona, G., Ronchetti, G., Mayer, A., and Facchi, A. (2019). Integrating geophysical and multispectral data to delineate homogeneous management zones within a vineyard in Northern Italy. Sensors 19, 3974. doi: 10.3390/s19183974

Paul, L. C., Suman, A. A., and Sultan, N. (2013). Methodological analysis of Principal Component Analysis (PCA) method. Int. J. Comput. Eng. Manage. 16, 32–38.

Pham, H. T., Awange, J. L., Kühn, M., Nguyen, B. V., and Bui, L. K. (2022). Enhancing crop yield prediction utilizing machine learning on satellite-based vegetation health indices. Sensors 22, 719. doi: 10.3390/s22030719

Pinter, P. J. Jr., Hatfield, J. L., Schepers, J. S., Barnes, E. M., Moran, M. S., Daughtry, C. S. T., et al. (2003). Remote sensing for crop management. Photogrammetric Eng. Remote Sens. 69, 647–664. doi: 10.14358/PERS.69.6.647

Rahimi, E. and Jung, C. (2024). Evaluating the applicability of Landsat 8 data for global time series analysis. Front. Remote Sens. 5, 1492534. doi: 10.3389/frsen.2024.1492534

Rakshit, A., Meena, V. S., Chakraborty, S., Sarkar, B., and Ghosh, S. (2022). Editorial: adaptive farming sustainability practices: fundamentals to advances. Front. Sustain. Food Syst. 6. doi: 10.3389/fsufs.2022.823437

Ramos, A. P. M., Osco, L. P., Furuya, D. E. G., Gonçalves, W. N., Santana, D. C., Teodoro, L. P. R., et al. (2020). A random forest ranking approach to predict yield in maize with uav-based vegetation spectral indices. Comput. Electron. Agric. 178, 105791. doi: 10.1016/j.compag.2020.105791

Rodriguez, J., Lizarazo, I., Prieto, F., and Angulo-Morales, V. (2021). Assessment of potato late blight from UAV-based multi-spectral imagery. Comput. Electron. Agric. 184, 106061. doi: 10.1016/j.compag.2021.106061

Ruwanpathirana, P. P., Sakai, K., Jayasinghe, G. Y., Nakandakari, T., Yuge, K., Wijekoon, W. M. C. J., et al. (2024). Evaluation of sugarcane crop growth monitoring using vegetation indices derived from RGB-based UAV images and machine learning models. Agronomy 14, 2059. doi: 10.3390/agronomy14092059

Sadasivam, G. S. (2021). Crop yield prediction using granular SVM. Int. J. Recent Technol. Eng. (IJRTE) 9, 85. doi: 10.35940/ijrte.f5417.039621

Sakamoto, T., Gitelson, A. A., and Arkebauer, T. J. (2013). MODIS-based corn grain yield estimation model incorporating crop phenology information. Remote Sens. Environ. 131, 215–231. doi: 10.1016/j.rse.2012.12.017

Salunkhe, V. N., Gedam, P., Pradhan, A., Gaikwad, B., Kale, R., and Gawande, S. (2022). Concurrent waterlogging and anthracnose-twister disease in rainy-season onions (Allium cepa): Impact and management. Front. Microbiol. 13, 1063472. doi: 10.3389/fmicb.2022.1063472

Samuel, C. H. and Baysal-Gurel, F. (2019). Unmanned aircraft system (UAS) technology and applications in agriculture. Agronomy 9, 618. doi: 10.3390/agronomy9100618

Sharma, D. and Chauhan, A. (2024). Kharif onion production in India- present status and future potential: A review. Agric. Rev. 45, 653–660. doi: 10.18805/ag.R-2455

Su, J., Liu, C., Coombes, M., Hu, X., Wang, C., Xu, X., et al. (2018). Wheat yellow rust monitoring by learning from multi-spectral UAV aerial imagery. Comput. Electron. Agric. 155, 157–166. doi: 10.1016/j.compag.2018.10.017

Sun, G., Wang, X., Yang, H., and Zhang, X. (2020). A canopy information measurement method for modern standardized apple orchards based on UAV multimodal information. Sensors 20, 2985. doi: 10.3390/s20102985

R Core Team (2024). R: A language and environment for statistical computing. Version 4.4.1 (R Foundation for Statistical Computing).

Thangasamy, A. and Lawande, K. E. (2015). Integrated nutrients management for sustainable onion production. Indian J. Hortic. 72, 347–352. doi: 10.5958/0974-0112.2015.00068.7

Toosi, A., Samadzadegan, F., and Dadrass Javan, F. (2025). Toward the optimal spatial resolution ratio for fusion of UAV and Sentinel-2 satellite imageries using metaheuristic optimization. Adv. Space Res. 75, 5254–5282. doi: 10.1016/j.asr.2025.01.034

Tucker, C. J., Pinzon, J. E., Brown, M. E., Slayback, D. A., Pak, E. W., Mahoney, R., et al. (2005). An extended AVHRR 8-km NDVI dataset compatible with MODIS and SPOT vegetation NDVI data. Int. J. Remote Sens. 26, 4485–4498. doi: 10.1080/01431160500168686

Tucker, C. J. (1979). Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 8, 127–150. doi: 10.1016/0034-4257(79)90013-0

Usha, K. and Singh, B. (2013). Potential applications of remote sensing in horticulture: A review. Scientia Hortic. 153, 71–83. doi: 10.1016/j.scienta.2013.01.008

Vapnik, V. (2013). The nature of statistical learning theory (Springer science & business media).

Verger, A., Vigneau, N., Chéron, C., Gilliot, J. M., Comar, A., and Baret, F. (2014). Green area index from an unmanned aerial system over wheat and rapeseed crops. Remote Sens. Environ. 152, 654–664. doi: 10.1016/j.rse.2014.06.006

Vidican, R., Mălinaș, A., Ranta, O., Moldovan, C., Marian, O., Ghețe, A., et al. (2023). Using remote sensing vegetation indices for the discrimination and monitoring of agricultural crops: A critical review. Agronomy 13, 3040. doi: 10.3390/agronomy13123040

Wang, C., Chen, Y., Xiao, Z., Zeng, X., Tang, S., Lin, F., et al. (2023). Cotton blight identification with ground framed canopy photo-assisted multi-spectral UAV images. Agronomy 13, 1222. doi: 10.3390/agronomy13051222

Xue, J. and Su, B. (2017). Significant remote sensing vegetation indices: A review of developments and applications. J. Sensors 2017, 1353691. doi: 10.1155/2017/1353691

Yang, Q., Shi, L., Han, J., Yu, J., and Huang, K. (2020). A near real-time deep learning approach for detecting rice phenology based on UAV images. Agric. For. Meteorol 287, 107938. doi: 10.1016/j.agrformet.2020.107938

Ye, H., Huang, W., Huang, S., Cui, B., Dong, Y., Guo, A., et al. (2020). Recognition of banana fusarium wilt based on UAV remote sensing. Remote Sens. 12, 938. doi: 10.3390/rs12060938

Zaman-Allah, M., Vergara, O., Araus, J. L., et al. (2015). Unmanned aerial platform-based multi-spectral imaging for field phenotyping of maize. Plant Methods 11, 35. doi: 10.1186/s13007-015-0078-2

Zarco-Tejada, P. J., González-Dugo, M. V., and Fereres, E. (2016). Seasonal stability of chlorophyll fluorescence quantified from airborne hyperspectral imagery as an indicator of net photosynthesis in the context of precision agriculture. Remote Sens. Environ. 179, 89–103. doi: 10.1016/j.rse.2016.03.024

Zarco-Tejada, P. J., Miller, J. R., Mohammed, G. H., Noland, T. L., and Sampson, P. H. (2001). Estimation of chlorophyll fluorescence under natural illumination from hyperspectral data. Int. J. Appl. Earth Observ Geoinformati 3, 321–327. doi: 10.1016/S0303-2434(01)85039-X

Zhang, Z., Li, Z., Chen, Y., Zhang, L., and Tao, F. (2020). Improving regional wheat yields estimations by multi-step-assimilating of a crop model with multi-source data. Agric. For. Meteorol 290, 107993. doi: 10.1016/j.agrformet.2020.107993

Zhang, C. and Kovacs, J. M. (2012). The application of small unmanned aerial systems for precision agriculture: a review. Precis. Agric. 13, 693–712. doi: 10.1007/s11119-012-9274-5

Zhou, X., Zheng, H. B., Xu, X. Q., He, J. Y., Ge, X. K., Yao, X., et al. (2017). Predicting grain yield in rice using multi-temporal vegetation indices from UAV-based multi-spectral and digital imagery. ISPRS J. Photogrammet Remote Sens. 130, 246–255. doi: 10.1016/j.isprsjprs.2017.05.003

Zhu, J., Li, Y., Wang, C., Liu, P., and Lan, Y. (2024). Method for monitoring wheat growth status and estimating yield based on UAV multispectral remote sensing. Agronomy 14, 991. doi: 10.3390/agronomy14050991

Keywords: crop modeling, machine learning, multi-spectral sensors, onion production, precision agriculture, remote sensing, vegetation indices, yield prediction

Citation: Wayal SM, Parab S, Raj A, Khandagale K, Bhegde S, Dawale M, Bhangare I, Khaire M, Kadam Y, Shaikh Z, Karuppaiah V, Gedam P, Bibwe B, More SJ, Sharma LK, Mahajan V and Gawande SJ (2026) UAV multispectral sensing and data-driven modeling for precision onion yield prediction. Front. Plant Sci. 16:1696730. doi: 10.3389/fpls.2025.1696730

Received: 05 September 2025; Revised: 11 December 2025; Accepted: 29 December 2025;
Published: 06 February 2026.

Edited by:

Parvathaneni Naga Srinivasu, Amrita Vishwa Vidyapeetham University, India

Reviewed by:

Nafees Akhter Farooqui, Integral University, India
Xianzhi Wang, Yunnan University, China
Xiaowen Wang, Jiangsu University, China

Copyright © 2026 Wayal, Parab, Raj, Khandagale, Bhegde, Dawale, Bhangare, Khaire, Kadam, Shaikh, Karuppaiah, Gedam, Bibwe, More, Sharma, Mahajan and Gawande. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Suresh J. Gawande, sureshgawande76@gmail.com
