- Department of Electronics Communication Engineering, SRM Institute of Science and Technology, Vadapalani Campus, Chennai, India
Introduction: Early and accurate detection of crop stress is vital for sustainable agriculture and food security. Traditional vegetation indices such as NDVI and NDWI often fail to detect early-stage water and structural stress due to their limited spectral sensitivity.
Method: This study introduces two novel hyperspectral indices — Machine Learning-Based Vegetation Index (MLVI) and Hyperspectral Vegetation Stress Index (H_VSI) — which leverage critical spectral bands in the Near-Infrared (NIR), Shortwave Infrared 1 (SWIR1), and Shortwave Infrared 2 (SWIR2) regions. These indices are optimized using Recursive Feature Elimination (RFE) and serve as inputs to a Convolutional Neural Network (CNN) model for stress classification.
Results: The proposed CNN model achieved a classification accuracy of 83.40%, effectively distinguishing six levels of crop stress severity. Compared to conventional indices, MLVI and H_VSI enable detection of stress 10–15 days earlier and exhibit a strong correlation with ground-truth stress markers (r = 0.98).
Discussion: This framework is suitable for deployment with UAVs, satellite platforms, and precision agriculture systems.
1 Introduction
Sustainable crop production is increasingly challenged by abiotic stresses such as drought, nutrient deficiency, and heat. These stresses often lead to significant yield losses if not detected and mitigated early (Koh et al., 2022). Traditional monitoring indices, including the normalized difference vegetation index (NDVI) and the normalized difference water index (NDWI), focus primarily on chlorophyll content and have limited capability in detecting early-stage or non-chlorophyll-related stress responses (Sun et al., 2017; Lu et al., 2020). Varghese et al. (2021) reviewed the capabilities of Sentinel-2 for drought detection, reinforcing the need to advance beyond traditional multispectral sensors toward hyperspectral solutions for early stress detection (Pertiwi et al., 2024). Hence, there is a strong need for more sensitive and accurate detection tools.
Hyperspectral imaging (HSI) has emerged as a transformative technology in remote sensing, offering high spectral resolution that can detect subtle physiological changes in crops. With hundreds of contiguous spectral bands, HSI captures detailed reflectance patterns sensitive to plant water status, canopy structure (Lutz et al., 2024), and stress-related biochemical properties (Okyere et al., 2024). These spectral signatures, especially in the Near-Infrared (NIR) and Shortwave Infrared (SWIR) regions, are particularly useful for monitoring crop stress (You et al., 2024).
Integrating machine learning (ML) and deep learning (DL) techniques with HSI has further enhanced its potential, enabling automated feature selection and robust classification performance (Raja et al., 2022). For instance, convolutional neural networks (CNNs) and support vector machines (SVMs) have been widely used to analyze spectral-spatial information for early disease and stress detection in crops (Varghese et al., 2021; Hui et al., 2023).
Despite these advancements, most existing approaches lack generalizability across different stress types and fail to provide early warnings. Moreover, the indiscriminate use of all spectral bands increases computational overhead and reduces model interpretability (Wei et al., 2021). Studies have shown that redundant spectral information can negatively impact classifier performance unless optimal band selection is applied (Benelli et al., 2020).
To bridge these gaps, we propose two novel hyperspectral vegetation indices—Machine Learning-Based Vegetation Index (MLVI) and Hyperspectral Vegetation Stress Index (H_VSI)—that leverage recursive feature elimination (RFE) for data-driven band selection. These indices, when fed into a 1D CNN classifier, enable efficient and accurate early stress detection. The proposed CNN model achieved a classification accuracy of 83.40% and successfully differentiated six levels of crop stress severity.
This study aims to:
i. Develop machine learning-optimized vegetation indices using RFE;
ii. Integrate these indices into a 1D CNN model for multi-class stress classification; and
iii. Evaluate the proposed framework against conventional indices using hyperspectral data.
The rest of the paper is structured as follows: Section 2 discusses the related works; Section 3 outlines the methodology, including data acquisition, preprocessing, and feature selection; Section 4 presents the results and discussion, including index performance, classification metrics, and geospatial stress mapping; Section 5 compares the proposed method with existing works; Section 6 concludes the study and discusses future directions for precision agriculture deployment.
2 Related work
2.1 Hyperspectral imaging in crop stress detection
Hyperspectral imaging (HSI) captures reflectance across hundreds of narrow bands in the visible, near-infrared (NIR), and short-wave infrared (SWIR) regions, enabling detection of early stress-induced changes in plant physiology. Multiple studies have shown that stress-related alterations, such as reductions in leaf water content, pigment degradation, and changes in canopy structure, correlate with spectral variations, particularly in the SWIR region (Koh et al., 2022). Okyere et al. (2024) demonstrated that hyperspectral imaging combined with ML models effectively captured both drought and nitrogen stress interactions in wheat, highlighting its dual diagnostic potential. You et al. (2024) emphasize the role of surface parameterization and its influence on remote sensing outputs, which can be crucial when interpreting hyperspectral data in variable terrain or microclimates. UAV-mounted HSI systems offer fine spatial resolution and large-area coverage, making them well suited to early stress detection across a range of crops, including wheat, pearl millet, potato, and maize (Khanna et al., 2019). Wei et al. (2021) showed how point-based hyperspectral readings, when combined with multivariate regression, could effectively estimate grapevine water status, underscoring the value of spectral resolution for water stress analysis.
2.2 Machine and deep learning for stress classification
Machine learning (ML) models such as Support Vector Machines (SVM), Random Forest (RF), and Deep Neural Networks (DNN) have been successfully used for classifying crop stress using HSI (Lu et al., 2020). For instance, SVMs have achieved high accuracy (>90%) in detecting drought and nutrient deficiencies (Varghese et al., 2021). However, their performance relies heavily on the quality and selection of input features.
Deep learning models, especially 1D and 2D Convolutional Neural Networks (CNNs), have demonstrated strong potential in extracting hierarchical features directly from raw or preprocessed spectral data. CNNs outperform traditional ML in cases with large datasets but require considerable computational power and risk overfitting when spectral inputs are not optimized (Wei et al., 2021). AutoML and meta-learning approaches have also shown promise in learning from limited training data (Benelli et al., 2020).
Recent advancements in ensemble learning and hybrid deep models have demonstrated the power of integrating multiple learning paradigms for complex classification tasks (Chen B. et al., 2024). For example, Bao et al. (2023) proposed a Deep Forest (DF)-based model for Golgi protein classification, which combines decision tree ensembles with representation learning, offering robustness with limited data and avoiding the need for extensive hyperparameter tuning. Such an approach is well suited to hyperspectral stress analysis, where labeled data are limited (Chen H. et al., 2024). Similarly, Duarte-Carvajalino et al. (2021) utilized feature-engineered inputs combined with machine learning classifiers for cleft lip and palate reconstruction, demonstrating the value of domain-specific feature extraction in improving model interpretability and classification accuracy. These insights align with our study’s design, where Recursive Feature Elimination (RFE) is used to derive optimized hyperspectral vegetation indices (MLVI, H_VSI) that feed into a CNN for robust stress classification.
2.3 Vegetation index development
Traditional vegetation indices such as NDVI and NDWI are limited by their reliance on broad spectral bands and chlorophyll sensitivity, which hampers early detection of abiotic stress. Recent research has focused on developing more stress-specific indices by targeting NIR and SWIR wavelengths linked to water content, leaf thickness, and pigment loss (Murphy et al., 2020). Examples include the Leaf Water Vegetation Index (LWVI), which is designed for drought monitoring in durum wheat, and indices optimized for pearl millet and groundnut using ML-based feature selection (Kim et al., 2015; Malounas et al., 2024).
Despite these efforts, most indices remain manually crafted and are not dynamically optimized for varying crop conditions or stress types. Moreover, few studies incorporate both band selection and index formulation into a unified classification framework.
2.4 Crop-specific applications
In wheat, hyperspectral indices combined with SVM or RF classifiers have classified drought stress levels under varying nitrogen regimes with accuracies exceeding 94% (Roy et al., 2023). In maize, unsupervised learning techniques have detected stress up to 10 days earlier than NDVI (Gessner et al., 2023). In potato, CNN and RF models trained on hyperspectral features have predicted water stress with high precision, even on small datasets. In tomato, Zhang et al. (2024) and Tola et al. (2025) used spectral indices to assess salinity stress, demonstrating the broader utility of reflectance-based stress diagnosis across crop types.
These findings highlight the cross-crop applicability of HSI and ML models but also point to a need for better generalization and computational efficiency.
2.5 Research gap and study contribution
While existing methods demonstrate strong performance, they often rely on fixed-band inputs and perform binary classification of stress presence. They lack the ability to generalize across diverse stress types or quantify stress severity (Zhou et al., 2022). Moreover, full-spectrum HSI models introduce unnecessary computational complexity and limit scalability in real-world deployments.
This study bridges these gaps by:
● Proposing two new hyperspectral vegetation indices (MLVI and H_VSI) derived from machine learning-guided Recursive Feature Elimination (RFE);
● Embedding these indices into a CNN classifier for accurate, multi-level stress classification;
● Validating the model on real-world UAV-acquired hyperspectral datasets, demonstrating its practical value in precision agriculture (Abbas et al., 2023).
Table 1 highlights the diverse applications of machine learning techniques in hyperspectral-based crop stress detection, demonstrating their effectiveness across various crops and stress conditions.
Traditional vegetation indices, such as NDVI and NDWI, have been widely used for vegetation monitoring. However, their effectiveness in detecting early stress conditions is limited due to their reliance on broad spectral bands (Das et al., 2023). Recent advancements in hyperspectral imaging and machine learning have enabled more precise stress detection by leveraging narrow spectral features. Studies have demonstrated that hyperspectral indices incorporating SWIR bands can detect stress earlier than NDVI. Machine learning-based approaches, such as Random Forest and CNNs, have further improved classification accuracy by automatically selecting the most relevant spectral bands.
3 Methodology
This section outlines the complete workflow for hyperspectral stress classification, including data acquisition details, preprocessing, feature selection, vegetation index formulation, and deep learning-based classification, as illustrated in Figures 1 and 2.
3.1 Dataset description
This study utilizes the GHISACONUS Hyperspectral Spectral Library, comprising reflectance data acquired from NASA’s EO-1 Hyperion sensor. The dataset spans a spectral range of 437–2345 nm and includes detailed metadata on location, crop type, and growth stage. Key bands relevant to plant stress were identified, including:
X661 (Red) – chlorophyll absorption,
X854 (NIR) – canopy structure,
X1649 (SWIR1) – water content,
X2133 (SWIR2) – leaf dryness.
3.2 Spectral preprocessing
To enhance signal quality, spectral reflectance data were first smoothed using the Savitzky-Golay filter (Sara and Rajasekaran, 2025), which fits a low-degree polynomial across a moving window to reduce noise while preserving peak shapes. This step ensures that small but meaningful spectral variations related to stress are retained. Following smoothing, Z-score normalization was applied to each spectral band (Alordzinu et al., 2021; Johnson et al., 2024b). This standardized the input by centering values around a mean of zero and scaling them to unit variance, which is essential for preventing bias during model training.
3.2.1 Savitzky-Golay filtering
In this study, the Savitzky-Golay filter was applied during the preprocessing stage to enhance the quality of the hyperspectral reflectance data. This step was essential for minimizing noise introduced by environmental variability, sensor drift, and atmospheric interference, which can distort subtle spectral cues relevant to early stress detection. By smoothing the data while retaining key spectral patterns, the filter improved the reliability of subsequent feature selection and classification stages.
The polynomial smoothing equation (Equation 1) enhances the reflectance signal by replacing each point with a weighted sum of its neighbors within a moving window of 2m + 1 points:
Savitzky-Golay formula:
$$\hat{y}_i = \sum_{j=-m}^{m} c_j \, y_{i+j}$$
Where $\hat{y}_i$ is the smoothed value, $y_{i+j}$ are the neighboring points, and $c_j$ are the filter coefficients.
Figure 3 demonstrates the effect of the filter:
● The raw spectral data (red line with circles) shows noticeable fluctuations and noise.
● The smoothed spectral curve (blue dashed line with squares) closely follows the overall trend of the raw data but with significantly reduced variability.
● This preprocessing step ensured that the spectral input to machine learning and vegetation index calculations remained biologically meaningful and robust to noise.
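The smoothing step can be illustrated with a minimal sketch using SciPy's `savgol_filter`. The synthetic spectrum, the window length of 11, and the polynomial order of 2 are assumptions chosen for illustration only; the study does not report the filter settings it used.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic reflectance spectrum: a smooth curve plus random sensor-like noise.
rng = np.random.default_rng(0)
wavelengths = np.linspace(437, 2345, 200)          # nm, spanning the dataset range
clean = 0.3 + 0.2 * np.sin(wavelengths / 300.0)    # illustrative smooth signal
noisy = clean + rng.normal(0.0, 0.02, wavelengths.size)

# window_length and polyorder are assumed values, not taken from the study.
smoothed = savgol_filter(noisy, window_length=11, polyorder=2)

def rmse(a, b):
    """Root-mean-square difference between two spectra."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

noise_before = rmse(noisy, clean)      # deviation of the raw spectrum
noise_after = rmse(smoothed, clean)    # deviation after smoothing
```

A wider window removes more noise but can flatten narrow absorption features, so the window is usually kept small relative to the width of the spectral features of interest.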
Z-score normalization was then applied so that the filtered spectral data were scaled consistently across bands before feature selection and CNN input.
3.2.2 Normalization: scaling of reflectance values across spectral bands
To ensure consistency and stability across the high-dimensional hyperspectral inputs, Z-score normalization was applied to all selected spectral bands prior to index formulation and CNN classification. This standardization technique transforms each spectral value by subtracting the mean and dividing by the standard deviation of its respective band. As hyperspectral reflectance data often contains varying magnitudes across wavelengths due to differences in sensor response and biophysical properties, normalization is essential for preventing feature dominance and improving model convergence.
This preprocessing step plays a critical role in our pipeline, especially when feeding selected bands (e.g., X854, X1649, X2133) or derived indices (MLVI and H_VSI) into the 1D CNN. By centering the data around zero mean and scaling to unit variance, the model is able to learn more efficiently from all spectral features without bias toward specific band ranges.
Z-score normalization was applied using Equation 2:
$$Z = \frac{X - \mu}{\sigma}$$
Where:
X = original reflectance value
μ = mean of the spectral band
σ = standard deviation of the spectral band
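As a minimal sketch of the per-band standardization described above, the following NumPy function centers and scales each column of a samples-by-bands matrix. The reflectance values are made up, and the helper name `zscore_per_band` is hypothetical.

```python
import numpy as np

def zscore_per_band(reflectance):
    """Standardize each band (column) to zero mean and unit variance."""
    mu = reflectance.mean(axis=0)
    sigma = reflectance.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)   # guard against constant bands
    return (reflectance - mu) / sigma

# Rows are samples; columns stand in for X854, X1649, X2133 (made-up values).
X = np.array([[0.45, 0.30, 0.20],
              [0.50, 0.28, 0.18],
              [0.40, 0.35, 0.25],
              [0.55, 0.25, 0.15]])
Z = zscore_per_band(X)
```

When the data are later split into training, validation, and test sets, the mean and standard deviation would typically be estimated on the training split only and reused for the other splits.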
3.3 Spectral correlation analysis
A spectral correlation matrix was generated to visualize redundancy and inter-band relationships. Bands exhibiting strong correlation with known stress indicators were prioritized for further analysis. Heatmaps were used to identify regions with high information density and eliminate irrelevant wavelengths.
Figure 4a shows the correlation heatmap of hyperspectral features, with the regions corresponding to the visible (VIS), near-infrared (NIR), and shortwave infrared (SWIR) bands highlighted. Strong intra-region correlations are observed within the VIS and NIR bands, while the SWIR bands show distinct correlation behavior (Figure 4b), indicating their importance for water and structural stress detection (Kumar and Shankar, 2024). The highlighted divisions support the selection of bands from multiple spectral regions for stress-sensitive index formulation.

Figure 4. (a) Correlation heatmap of hyperspectral features in the dataset (b) Features highlighting VIS and SWIR Bands.
The Pearson correlation coefficient r, computed using Equation 3, quantifies the linear dependency between spectral bands. The correlation matrix (Figure 4) uses this metric to identify highly correlated band pairs for elimination and bands with low redundancy and high stress sensitivity for retention.
Pearson’s correlation coefficient formula:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
Where,
$x_i$ and $y_i$ are the individual sample points
$\bar{x}$ is the mean of the X values
$\bar{y}$ is the mean of the Y values
n is the number of paired observations
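The correlation-based redundancy check can be sketched as follows. The function `pearson_r` implements the Pearson formula directly, and `redundant_pairs` flags band pairs whose absolute correlation exceeds a threshold. Both helper names, the threshold of 0.95, and the synthetic bands are assumptions for illustration; the study does not state its cut-off.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equally long 1D arrays."""
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

def redundant_pairs(bands, threshold=0.95):
    """Return index pairs (i, j), i < j, whose |r| exceeds the threshold."""
    corr = np.corrcoef(bands, rowvar=False)        # columns are bands
    upper = np.triu(np.abs(corr), k=1)             # keep each pair once
    i_idx, j_idx = np.where(upper > threshold)
    return list(zip(i_idx.tolist(), j_idx.tolist()))

rng = np.random.default_rng(1)
b0 = rng.normal(size=100)
b1 = b0 + rng.normal(scale=0.05, size=100)   # nearly a copy of b0
b2 = rng.normal(size=100)                    # independent band
bands = np.column_stack([b0, b1, b2])
pairs = redundant_pairs(bands)               # candidate pairs for elimination
```

One band from each flagged pair could then be dropped before feature selection, keeping the one with the stronger relationship to the stress labels.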
Hyperspectral data contain hundreds of bands, many of which are redundant or noisy. PCA reduces these to the top 30 components (Ruan et al., 2023), capturing the most important variance in the data. These components are used as input for training the CNN model. The plots (Figures 5a, b) show how much total variance is retained as more principal components are added: the first 3 components explain over 80% of the data variance, and 30 components retain over 95%, ensuring minimal information loss.
The different vegetation indices and their respective sensitivities are summarized in Table 2. NDVI primarily reflects chlorophyll concentration but often exhibits delayed responses under stress conditions. NDWI enhances sensitivity to water stress, while H_VSI extends detection to early water and structural stress. The MLVI index, optimized through machine learning-driven band selection, offers superior early-stage stress detection by integrating multiple stress-sensitive spectral regions (Felix et al., 2025).
3.4 Band selection using recursive feature elimination
Recursive Feature Elimination (RFE) was used in conjunction with a Random Forest estimator to select the top 10 stress-sensitive spectral bands. This dimensionality reduction step improves both classification performance and model interpretability by focusing on biologically relevant features. The selected bands (e.g., X854, X1649, X2133) formed the foundation for vegetation index design.
Given the high dimensionality of hyperspectral data, feature reduction is essential to improve model performance and interpretability. Two complementary approaches were used:
● Recursive Feature Elimination (RFE): This method ranks spectral bands based on their importance in a machine learning model (e.g., SVM) and iteratively eliminates the least significant bands. The selected bands were used to construct two custom indices: MLVI and H_VSI. RFE band selection is explained in Algorithm 1.
● Principal Component Analysis (PCA): PCA was applied to visualize data variance and support unsupervised dimensionality reduction. While PCA outputs were not used directly in model training, they were used for exploratory analysis and variance validation (Lin et al., 2014).
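A minimal sketch of RFE-based band selection with scikit-learn is given below, assuming a Random Forest estimator and a target of 10 retained bands as stated in this section. The synthetic data, the informative band positions (5, 17, 30), the step size, and the number of trees are illustrative assumptions rather than the study's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for a spectral library: 300 samples x 40 bands.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 40))

# Labels depend only on bands 5, 17 and 30 (hypothetical stress-sensitive bands).
score = X[:, 5] - X[:, 17] + 0.5 * X[:, 30]
y = np.digitize(score, bins=[-1.0, 0.0, 1.0])     # four ordinal classes

# RFE repeatedly fits the estimator and drops the least important bands.
estimator = RandomForestClassifier(n_estimators=100, random_state=0)
selector = RFE(estimator, n_features_to_select=10, step=2)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)      # indices of retained bands
ranking = selector.ranking_                       # 1 = retained, larger = dropped earlier
```

The retained indices would be mapped back to wavelength labels (for example X854 or X1649) before index formulation.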
The hyperspectral data were first standardized and then reduced using Principal Component Analysis (PCA) to retain only the most informative features for CNN classification, as illustrated in Figure 5.
PCA extracts the directions of greatest variance from the hyperspectral dataset. Band selection follows the PCA step, and spectral correlation analysis of the dataset is used to guide it.
PCA transforms input data X into uncorrelated components using Equation 4:
$$Z = XW$$
Where:
X: standardized spectral data (after scaling)
W: matrix of eigen-vectors (principal components)
Z: transformed data (principal components)
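The projection Z = XW and the cumulative explained variance can be sketched with a NumPy eigendecomposition of the covariance matrix. The helper name `pca_transform`, the synthetic 20-band data, and the choice of three components are assumptions made for illustration.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project standardized data X onto its top principal components (Z = XW)."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :n_components]                 # matrix of leading eigenvectors
    cumulative = np.cumsum(eigvals / eigvals.sum())
    return X @ W, W, cumulative

# Synthetic data: 20 correlated "bands" driven by 3 hidden factors plus noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize per band

Z, W, cumulative = pca_transform(X, n_components=3)
```

Plotting `cumulative` against the component count gives the kind of retained-variance curve used to choose how many components to keep.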
3.5 Vegetation index formulation
Based on the RFE-selected bands, two novel vegetation indices were formulated:
● Machine Learning-Based Vegetation Index (MLVI): Combines bands most predictive of early stress from NIR and SWIR regions.
● Hyperspectral Vegetation Stress Index (H_VSI): Captures stress-sensitive spectral contrast using a weighted band-ratio approach optimized for drought and nutrient deficiencies.
Hyperspectral Vegetation Stress Index (H_VSI), defined in Equation 5:
This index integrates NIR, SWIR1, and SWIR2 to detect stress levels, including water stress and structural damage. NDVI-based indices focus on chlorophyll (Red/NIR) but are weak at detecting water stress. H_VSI directly accounts for changes in leaf water content through SWIR1 and SWIR2, making it better suited to detecting drought stress early.
Machine Learning-Based Vegetation Index (MLVI) formulated as shown in Equation 6:
MLVI dynamically selects the best spectral bands using Recursive Feature Elimination (RFE) and machine learning models. In this work, the X854, X1649, and X2133 spectral bands were selected for analyzing the early stress response (Equation 7).
The vegetation index derivation workflow is illustrated in Figure 6. Initially, hyperspectral reflectance data are organized into a matrix format with each sample represented across hundreds of spectral bands (Sharma et al., 2023b). Stress conditions are labeled according to severity (healthy, moderate, high stress), forming a supervised dataset. To reduce dimensionality and emphasize the most informative wavelengths, Recursive Feature Elimination (RFE) is applied, iteratively selecting the top 2–3 bands most correlated with stress levels. A custom vegetation index is then formulated using a normalized difference structure based on these selected bands. The derived index is computed pixel-wise over the hyperspectral imagery, converting complex spectral information into a stress-sensitive grayscale map. Finally, the index output is used as input for ML classifiers such as CNN or SVM to automate vegetation stress detection and mapping. This workflow enables efficient, robust, and physiologically meaningful stress monitoring from hyperspectral remote sensing data.
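The pixel-wise normalized-difference step of this workflow can be sketched as below. The function shows only the generic (a - b) / (a + b) structure; it is not the MLVI or H_VSI formula of this study, and the band pairing, the helper name `normalized_difference`, and the reflectance values are illustrative assumptions. NDVI is included as a familiar reference case.

```python
import numpy as np

def normalized_difference(band_a, band_b, eps=1e-12):
    """Generic (a - b) / (a + b) index, computed element-wise."""
    band_a = np.asarray(band_a, dtype=float)
    band_b = np.asarray(band_b, dtype=float)
    return (band_a - band_b) / (band_a + band_b + eps)   # eps avoids division by zero

# Tiny 2 x 2 "image" of reflectance values (made-up numbers).
nir = np.array([[0.50, 0.45], [0.30, 0.25]])     # e.g. X854
swir1 = np.array([[0.20, 0.22], [0.28, 0.30]])   # e.g. X1649
red = np.array([[0.05, 0.06], [0.12, 0.15]])     # e.g. X661

ndvi = normalized_difference(nir, red)            # standard NDVI
nd_nir_swir1 = normalized_difference(nir, swir1)  # illustrative NIR/SWIR1 contrast
```

Because the operation is element-wise, the same function applies unchanged to full image arrays, producing the grayscale index map described in the workflow.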
3.6 Deep learning-based classification
To perform multi-level classification of crop stress, a one-dimensional Convolutional Neural Network (1D-CNN) was developed and trained using two proposed vegetation indices MLVI and H_VSI as input features. These indices condense critical spectral information related to chlorophyll content, water availability, and structural degradation, enabling efficient and interpretable input representation.
The CNN architecture comprises two convolutional layers with ReLU activation functions, followed by max pooling to reduce feature dimensionality and emphasize key spectral patterns. A dropout layer was incorporated to prevent overfitting and improve generalization. The final softmax layer classified each input into one of six crop stress severity levels (Duarte-Carvajalino et al., 2021), ranging from healthy to extreme stress.
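To make the layer sequence concrete, the following NumPy sketch runs a single forward pass through two convolutional layers with ReLU, max pooling, and a six-class softmax output. It is not the trained model: the input length of 12, the filter counts (8 and 16), the kernel width of 3, and the random weights are all hypothetical, since these details are not reported here.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1D convolution (cross-correlation, as in deep learning libraries).
    x: (length, c_in); kernels: (k, c_in, c_out)."""
    k = kernels.shape[0]
    out_len = x.shape[0] - k + 1
    out = np.empty((out_len, kernels.shape[2]))
    for t in range(out_len):
        # Contract the (k, c_in) window against every output filter.
        out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the sequence axis."""
    usable = (x.shape[0] // size) * size
    return x[:usable].reshape(-1, size, x.shape[1]).max(axis=1)

def softmax(z):
    e = np.exp(z - z.max())              # subtract max for numerical stability
    return e / e.sum()

def forward(x, params):
    """conv -> ReLU -> conv -> ReLU -> max pool -> flatten -> dense softmax.
    Dropout acts as an identity at inference time, so it is omitted."""
    h = relu(conv1d(x, params["k1"], params["b1"]))
    h = relu(conv1d(h, params["k2"], params["b2"]))
    h = max_pool1d(h, size=2).ravel()
    return softmax(h @ params["w_out"] + params["b_out"])

# Hypothetical sizes: 12-step input, 8 then 16 filters of width 3, six classes.
rng = np.random.default_rng(4)
seq_len, n_classes = 12, 6
params = {
    "k1": rng.normal(scale=0.1, size=(3, 1, 8)), "b1": np.zeros(8),
    "k2": rng.normal(scale=0.1, size=(3, 8, 16)), "b2": np.zeros(16),
}
flat = ((seq_len - 2 - 2) // 2) * 16     # length after two valid convs and pooling
params["w_out"] = rng.normal(scale=0.1, size=(flat, n_classes))
params["b_out"] = np.zeros(n_classes)

x = rng.normal(size=(seq_len, 1))        # one standardized input sequence
probs = forward(x, params)               # class probabilities, healthy to extreme
```

In practice the same structure would be expressed in a deep learning framework so that the weights can be trained by backpropagation.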
To optimize model performance, Recursive Feature Elimination (RFE) was used to identify a subset of the most informative spectral bands. These selected bands were then used to compute the MLVI and H_VSI indices, which were fed as structured 1D sequences into the CNN. This approach reduced computational complexity and ensured the network focused on stress-relevant wavelengths.
Given the sequential nature of hyperspectral data—with spectral bands ordered by wavelength—the 1D-CNN was well-suited to capture both local spectral patterns and long-range dependencies across the spectrum. The architecture learned subtle transitions in reflectance associated with physiological stress responses such as chlorophyll breakdown, water loss, and tissue degradation. This design enabled robust performance across varying stress conditions, achieving a classification accuracy of 83.40%.
For benchmarking, traditional classifiers including Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM) were evaluated using the same inputs. While LDA highlighted spectral separability and SVM handled high-dimensional inputs effectively, both were outperformed by the CNN, which demonstrated superior ability to learn both spatially localized and spectrally sequential features.
This flowchart (Figure 7a) illustrates a six-step pipeline for hyperspectral crop stress classification. It includes data acquisition, preprocessing (Savitzky-Golay filtering and normalization), stress labeling, RFE-based band selection, CNN-based classification, and final stress prediction with 83.40% accuracy. Each step systematically transforms raw spectral input into actionable stress level insights.
The model (Figure 7b) ingests vegetation indices derived from RFE-selected bands and includes convolutional, pooling, dropout, and softmax layers. This architecture captures localized spectral features and sequential dependencies to classify six crop stress severity levels with high accuracy.

Figure 7. (a) Proposed MLVI band selected - Steps for Hyperspectral stress classification (b) 1D-CNN architecture.
3.7 Model training and evaluation strategy
To ensure robust and unbiased model performance, the dataset was partitioned using a stratified sampling strategy into 70% training, 15% validation, and 15% testing subsets. Stratification preserved the distribution of stress severity classes across all splits. The training set was used for model fitting, the validation set for hyperparameter tuning, and the test set exclusively for final performance assessment.
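A stratified 70/15/15 partition can be sketched with two calls to scikit-learn's `train_test_split`. The synthetic features, the balanced six-class labels, and the random seeds are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(600, 3))
y = np.repeat(np.arange(6), 100)      # six balanced stress classes (synthetic)

# First hold out 30%, then halve that hold-out into validation and test sets.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=0)
```

Passing `stratify` at both stages keeps the class proportions of the full dataset in each of the three subsets.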
The proposed CNN model was trained using the categorical cross-entropy loss function, calculated using Equation 8:
$$L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$
Where $y_i$ is the ground truth label, $\hat{y}_i$ is the predicted probability for class i, and C is the number of classes.
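As a worked illustration of the categorical cross-entropy loss, the sketch below evaluates it for two samples with six classes and averages over the batch. The predicted probabilities are made up, and the clipping constant `eps` is an assumed safeguard against taking the logarithm of zero.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Batch mean of -sum_i y_i * log(y_hat_i) for one-hot labels."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, six classes (C = 6); probabilities are made up.
y_true = np.array([[1, 0, 0, 0, 0, 0],
                   [0, 0, 1, 0, 0, 0]], dtype=float)
y_pred = np.array([[0.70, 0.10, 0.05, 0.05, 0.05, 0.05],
                   [0.10, 0.10, 0.50, 0.10, 0.10, 0.10]])
loss = categorical_cross_entropy(y_true, y_pred)
```

With one-hot labels, only the probability assigned to the true class contributes, so the loss here is the mean of -log(0.70) and -log(0.50).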
Model optimization was conducted using the Adam optimizer with an initial learning rate of 0.001. A learning-rate scheduler (ReduceLROnPlateau) dynamically adjusted the learning rate when the validation loss stagnated, helping to limit overfitting and improve convergence.
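The plateau rule behind this kind of scheduler can be sketched in plain Python. This is a simplified re-implementation for illustration, not the framework callback itself; the reduction factor of 0.5, the patience of 2 epochs, the minimum rate, and the validation losses are all assumed values.

```python
def reduce_lr_on_plateau(val_losses, lr=0.001, factor=0.5, patience=2, min_lr=1e-6):
    """Return the learning rate used at each epoch.

    Simplified rule: when the validation loss fails to improve for
    `patience` consecutive epochs, multiply the rate by `factor`.
    """
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        history.append(lr)               # rate in effect during this epoch
        if loss < best:
            best, wait = loss, 0         # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
    return history

# Made-up validation losses with two stagnation periods.
val_losses = [1.0, 0.8, 0.8, 0.81, 0.7, 0.7, 0.7]
lr_history = reduce_lr_on_plateau(val_losses)
```

In this example the rate stays at 0.001 for the first four epochs and is halved once the loss has stalled for two epochs.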
3.7.1 Performance metrics
To quantitatively assess model performance, the following standard metrics were calculated using Equations 9–13:
The above metrics were evaluated on the test dataset using confusion matrices and ROC curves for each classifier.
In addition to conventional metrics, we evaluated Matthews Correlation Coefficient (MCC) to assess the balance and robustness of classification performance, especially under class imbalance conditions. The proposed 1D CNN model achieved the highest MCC of 0.659, outperforming SVM (0.570) and LDA (0.528). These results highlight the CNN’s superior ability to correctly classify both positive and negative stress levels across multiple severity classes. This underscores the robustness and suitability of the MLVI-CNN framework for real-world hyperspectral crop stress detection tasks.
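The metrics discussed in this subsection can be computed from a confusion matrix as in the sketch below. Macro averaging over classes and the multi-class generalization of MCC are assumptions about how the scores were aggregated, and the labels are a small made-up example with three classes.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                       # rows: true class, columns: predicted
    return cm

def classification_metrics(cm):
    """Accuracy, macro precision/recall/F1 and multi-class MCC."""
    tp = np.diag(cm).astype(float)
    pred_k = cm.sum(axis=0).astype(float)   # predicted count per class
    true_k = cm.sum(axis=1).astype(float)   # true count per class
    total = float(cm.sum())
    precision = np.divide(tp, pred_k, out=np.zeros_like(tp), where=pred_k > 0)
    recall = np.divide(tp, true_k, out=np.zeros_like(tp), where=true_k > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    # Multi-class MCC computed from the confusion matrix totals.
    num = tp.sum() * total - np.dot(pred_k, true_k)
    den = np.sqrt((total ** 2 - np.dot(pred_k, pred_k)) *
                  (total ** 2 - np.dot(true_k, true_k)))
    return {
        "accuracy": tp.sum() / total,
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
        "mcc": num / den if den > 0 else 0.0,
    }

y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]
metrics = classification_metrics(confusion_matrix(y_true, y_pred, n_classes=3))
```

Unlike accuracy, MCC uses the full confusion matrix, which is why it is informative when the stress classes are imbalanced.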
4 Results and discussion
4.1 Performance of novel indices
The proposed indices, MLVI and H_VSI, demonstrated substantial improvements over traditional vegetation indices such as NDVI, NDWI, and PRI in early stress detection. Through optimization via Recursive Feature Elimination (RFE), MLVI was able to detect stress signals 10–15 days earlier than NDVI, particularly in water- and heat-stressed vegetation zones (Figure 8). This improvement aligns with prior research that highlights the sensitivity of SWIR/NIR bands to physiological stress.
A correlation analysis (Figure 9) revealed that MLVI had a stronger correlation with actual stress levels (r = 0.98) compared to NDVI (r = 0.86), indicating a more accurate relationship between MLVI and plant health status. The threshold behavior of MLVI across stress levels is summarized in Table 3, where values approaching 1.0 indicated healthy vegetation, while negative values corresponded to severe stress, characterized by declining NIR reflectance and increasing SWIR1/SWIR2 absorption.
The temporal profile shown in Figure 10 further emphasizes that while NDVI remains relatively stable, MLVI and H_VSI demonstrate dynamic variation, particularly between Julian dates 150–250, signaling their superior sensitivity to early and mid-stage stress development (Khanna et al., 2019; Anand and Sharma, 2024).
4.2 Machine learning classification performance
To assess classification performance, ML models including LDA, SVM, and a 1D CNN were evaluated. The 1D CNN outperformed traditional models, achieving an accuracy of 83.40%, while SVM and LDA yielded 78.97% and 77.40%, respectively (Figure 11).
The CNN’s hybrid architecture allowed it to effectively extract both localized and sequential spectral patterns, making it more adept at capturing complex stress signals from hyperspectral inputs.
Figure 12 and Table 4 compare additional metrics (precision, recall, F1-score), all of which were highest for the CNN. The ROC curves in Figure 13a further support the model’s reliability, with AUC scores exceeding 0.95 for all six stress classes, and a micro-average AUC of 0.98. Figure 13b demonstrates stable learning behavior, with minimal overfitting across training and validation sets.

Figure 13. (a) ROC curve for multi-class level stress classification (b) training and validation accuracy over epochs for 1D CNN model.
Figure 14 presents confusion matrices for the three models. The CNN model showed improved prediction across all classes, especially for subtle stress stages like “Healthy” and “Extreme Stress”. The terminology was unified by mapping stages to stress classes: Class 0 (Healthy) to Class 5 (Extreme Stress).

Figure 14. Confusion matrices comparing the classification performance of three machine learning models - Linear Discriminant Analysis (LDA), Support Vector Machine (SVM) and 1D Convolutional Neural Network (1D CNN) - on vegetation stress stage detection.
4.3 Comparison of feature extraction methods
Figure 15 highlights the classification accuracy of different input strategies. NDVI-based input achieved only 68.00% accuracy due to its limited spectral sensitivity. PCA-based inputs improved performance to 75.00% by capturing key variance but still lacked domain-specific feature focus. MLVI-based input outperformed both, achieving 83.40% accuracy, demonstrating the advantage of ML-guided band selection (NIR, SWIR1, SWIR2).
Confusion matrices in Figures 16, 17 validate this observation. The MLVI-CNN model displayed the most balanced performance across all stress stages, with minimal misclassification, especially in early and moderate stress conditions.
4.4 Geospatial stress visualization
Considering the role of irrigation patterns and mechanical stress, studies such as Hui et al. (2023) could provide context for associating field-level sprinkler stress patterns with spectral stress zones. Geospatial mapping (Figure 18) of MLVI-derived stress scores using GPS coordinates and Folium-based heatmaps revealed clearly defined high-stress zones within the test field. Bright regions corresponded to severe stress, while dark regions indicated healthy crops. These spatial insights align with previous studies using fused or pansharpened hyperspectral imagery, further validating the robustness of MLVI for precision agriculture applications (Dao et al., 2021).
5 Comparison of proposed method with existing works with different datasets
Table 5 compares the proposed method (MLVI + 1D CNN) with several existing studies. The comparison includes key metrics such as classification accuracy and indicates which stress types are detected, along with the corresponding references.
The proposed method (MLVI + 1D CNN) achieved the highest classification accuracy (83.40%) among the compared studies. It is also more specialized in detecting both early-stage water stress and structural stress through MLVI’s RFE-optimized band selection. It outperforms several advanced models (e.g., ATSFCNN, SSFNet) that use full-spectrum hyperspectral input, highlighting the benefit of optimized feature engineering (Sankararao et al., 2023; Li et al., 2025).
6 Conclusion
This study presents the development and evaluation of two novel hyperspectral vegetation indices, MLVI and H_VSI, designed for early detection of crop stress. By leveraging RFE for optimal band selection and a 1D CNN for robust classification, the proposed method significantly outperforms conventional indices such as NDVI and NDWI in terms of sensitivity and classification accuracy. MLVI demonstrated a stronger correlation with stress indicators (r = 0.98) and detected stress conditions 10–15 days earlier than conventional indices. The 1D CNN model achieved a classification accuracy of 83.40%, further validating the effectiveness of the selected spectral features. Geospatial visualization using MLVI enabled the mapping of stress intensity across agricultural fields, offering actionable insights for precision agriculture. These findings highlight the potential of hyperspectral-ML approaches to improve early stress detection and crop health monitoring.
6.1 Novelty & strength
To the authors' knowledge, MLVI is the first machine learning-optimized vegetation index that integrates hyperspectral band selection with deep learning. In this study, it surpassed conventional indices in early detection, spectral sensitivity, and robustness.
6.2 Limitations & error analysis
Mild stress was sometimes misclassified as healthy.
Performance drops when the selected bands are contaminated by cloud cover.
Biochemical validation has not yet been performed.
6.3 Generalizability
Future studies will test MLVI on varied crop types and field trials across regions including India. Compatibility with UAV sensors and real-time monitoring is under evaluation.
6.4 Implications
Using MLVI could improve early interventions, reduce water/fertilizer misuse, and enable large-scale crop health monitoring. Such methods support climate-smart agriculture and yield protection.
7 Future work
Future research will focus on integrating these indices with real-time UAV and satellite-based hyperspectral platforms, improving model generalization across diverse crop types, and coupling spectral indices with biochemical validation to enhance interpretability and robustness.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.
Author contributions
PS: Writing – original draft, Writing – review & editing. ASE: Writing – review & editing.
Funding
The author(s) declare that no financial support was received for the research and/or publication of this article.
Acknowledgments
We thank SRM Institute of Science and Technology for providing infrastructure and research support.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Abbas, A., Zhang, Z., Zheng, H., Alami, M. M., Alrefaei, A. F., Abbas, Q., et al. (2023). Drones in plant Disease Assessment, Efficient Monitoring, and Detection: A way forward to smart agriculture. Agronomy 13, 1524. doi: 10.3390/agronomy13061524
Alordzinu, K. E., Li, J., Lan, Y., Appiah, S. A., Aasmi, A. A., Wang, H., et al. (2021). Ground-Based hyperspectral remote sensing for estimating water stress in tomato growth in sandy loam and silty loam soils. Sensors 21, 5705. doi: 10.3390/s21175705
Anand, S. and Sharma, R. (2024). Pansharpening and spatiotemporal image fusion method for remote sensing. Eng. Res. Express 6, 022201. doi: 10.1088/2631-8695/ad3a34
Bao, W., Gu, Y., Chen, B., and Yu, H. (2023). Golgi_DF: Golgi proteins classification with deep forest. Front. Neurosci. 17. doi: 10.3389/fnins.2023.1197824
Benelli, A., Cevoli, C., and Fabbri, A. (2020). In-field hyperspectral imaging: An overview on the ground-based applications in agriculture. J. Agric. Eng. 51, 129–139. doi: 10.4081/jae.2020.1030
Cai, J., Boust, C., and Mansouri, A. (2024). ATSFCNN: a novel attention-based triple-stream fused CNN model for hyperspectral image classification. Mach. Learn. Sci. Technol. 5, 015024. doi: 10.1088/2632-2153/ad1d05
Chen, H., Chen, H., Zhang, S., Chen, S., Cen, F., Zhao, Q., et al. (2024). Comparison of CWSI and Ts-Ta-VIs in moisture monitoring of dryland crops (sorghum and maize) based on UAV remote sensing. J. Integr. Agric. 23, 2458–2475. doi: 10.1016/j.jia.2024.03.042
Chen, B., Li, N., and Bao, W. (2024). CLPr_in_ML: cleft lip and palate reconstructed features with machine learning. Curr. Bioinf. 20, 179–193. doi: 10.2174/0115748936330499240909082529
Dao, P. D., He, Y., and Proctor, C. (2021). Plant drought impact detection using ultra-high spatial resolution hyperspectral images and machine learning. Int. J. Appl. Earth Observ. Geoinform. 102, 102364. doi: 10.1016/j.jag.2021.102364
Das, S., Biswas, A., VimalKumar, C., and Sinha, P. (2023). Deep learning analysis of rice blast disease using remote sensing images. IEEE Geosci. Remote Sens. Lett. 20, 1–5. doi: 10.1109/lgrs.2023.3244324
Duarte-Carvajalino, J. M., Silva-Arero, E. A., Góez-Vinasco, G. A., Torres-Delgado, L. M., Ocampo-Paez, O. D., and Castaño-Marín, A. M. (2021). Estimation of water stress in potato plants using hyperspectral imagery and machine learning algorithms. Horticulturae 7, 176. doi: 10.3390/horticulturae7070176
Felix, M. J. B., Main, R., Watt, M. S., Arpanaei, M., and Patuawa, T. (2025). Early detection of water stress in kauri seedlings using multitemporal hyperspectral indices and inverted plant traits. Remote Sens. 17, 463. doi: 10.3390/rs17030463
Gessner, U., Reinermann, S., Asam, S., and Kuenzer, C. (2023). Vegetation stress monitor—Assessment of drought and temperature-related effects on vegetation in Germany analyzing MODIS time series over 23 years. Remote Sens. 15, 5428. doi: 10.3390/rs15225428
Hui, X., Zhao, H., Zhang, H., Wang, W., Wang, J., and Yan, H. (2023). Specific power or droplet shear stress: Which is the primary cause of soil erosion under low-pressure sprinklers? Agric. Water Manage. 286, 108376. doi: 10.1016/j.agwat.2023.108376
Johnson, I., Mary, X. A., Raj, A. P. W., Chalmers, J., Karthikeyan, M., and Jeyabose, A. (2024b). Deep-Millet: A Deep learning model for Pearl Millet Disease identification to envisage precision Agriculture. Environ. Res. Commun. 6, 105031. doi: 10.1088/2515-7620/ad8415
Khanna, R., Schmid, L., Walter, A., Nieto, J., Siegwart, R., and Liebisch, F. (2019). A spatio temporal spectral framework for plant stress phenotyping. Plant Methods 15. doi: 10.1186/s13007-019-0398-8
Kim, D. M., Zhang, H., Zhou, H., Du, T., Wu, Q., Mockler, T. C., et al. (2015). Highly sensitive image-derived indices of water-stressed plants using hyperspectral imaging in SWIR and histogram analysis. Sci. Rep. 5. doi: 10.1038/srep15919
Koh, J. C., Banerjee, B. P., Spangenberg, G., and Kant, S. (2022). Automated hyperspectral vegetation index derivation using a hyperparameter optimisation framework for high-throughput plant phenotyping. New Phytol. 233, 2659–2670. doi: 10.1111/nph.17947
Kumar, N. and Shankar, V. (2024). Application of artificial intelligence-based modelling for the prediction of crop water stress index. Res. Square (Research Square). doi: 10.21203/rs.3.rs-3900676/v1
Li, Z., Duan, P., Zheng, J., Xie, Z., Kang, X., Yin, J., et al. (2025). SSFNET: spectral-spatial fusion network for hyperspectral remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 1. doi: 10.1109/tgrs.2025.3549075
Lin, C., Chen, J., Su, P., and Chen, C. (2014). Eigen-feature analysis of weighted covariance matrices for LiDAR point cloud classification. ISPRS J. Photogramm. Remote Sens. 94, 70–79. doi: 10.1016/j.isprsjprs.2014.04.016
Lu, B., Dao, P., Liu, J., He, Y., and Shang, J. (2020). Recent advances of hyperspectral imaging technology and applications in agriculture. Remote Sens. 12, 2659. doi: 10.3390/rs12162659
Lutz, N., Rodriguez-Veiga, P., and Menor, I. O. (2024). Estimating vegetation structure and aboveground carbon storage in Western Australia using GEDI LiDAR, Landsat, and Sentinel data. Environ. Res. Ecol. 3, 045004. doi: 10.1088/2752-664x/ad7f5a
Malounas, I., Paliouras, G., Nikolopoulos, D., Liakopoulos, G., Bresta, P., Londra, P., et al. (2024). Early detection of broccoli drought acclimation/stress in agricultural environments utilizing proximal hyperspectral imaging and AutoML. Smart Agric. Technol. 8, 100463. doi: 10.1016/j.atech.2024.100463
Murphy, M. E., Boruff, B., Callow, J. N., and Flower, K. C. (2020). Detecting frost Stress in wheat: A Controlled Environment Hyperspectral Study on wheat plant components and implications for multispectral field sensing. Remote Sens. 12, 477. doi: 10.3390/rs12030477
Okyere, F. G., Cudjoe, D. K., Virlet, N., Castle, M., Riche, A. B., Greche, L., et al. (2024). Hyperspectral imaging for phenotyping plant drought stress and nitrogen interactions using multivariate modeling and machine learning techniques in wheat. Remote Sens. 16, 3446. doi: 10.3390/rs16183446
Pertiwi, S., Ipung, H. P., and Sukarno, B. P. W. (2024). Prototype of chili pathogen early detection system by using multispectral NIR/NUV. IOP Conf. Ser. Earth Environ. Sci. 1386, 12032. doi: 10.1088/1755-1315/1386/1/012032
Raja, S. P., Sawicka, B., Stamenkovic, Z., and Mariammal, G. (2022). Crop prediction based on characteristics of the agricultural environment using various feature selection techniques and classifiers. IEEE Access 10, 23625–23641. doi: 10.1109/access.2022.3154350
Roy, B., Sagan, V., Haireti, A., Newcomb, M., Tuberosa, R., LeBauer, D., et al. (2023). Early detection of drought stress in durum wheat using hyperspectral imaging and photosystem sensing. Remote Sens. 16, 155. doi: 10.3390/rs16010155
Ruan, S., Cang, H., Chen, H., Yan, T., Tan, F., Zhang, Y., et al. (2023). Hyperspectral classification of frost damage stress in tomato plants based on Few-Shot learning. Agronomy 13, 2348. doi: 10.3390/agronomy13092348
Sankararao, A. U. G., Rajalakshmi, P., and Choudhary, S. (2023). Machine Learning-Based Ensemble band selection for early water stress identification in groundnut canopy using UAV-Based hyperspectral imaging. IEEE Geosci. Remote Sens. Lett. 20, 1–5. doi: 10.1109/lgrs.2023.3284675
Sankararao, A. U. G., Priyanka, P., Rajalakshmi, P., and Choudhary, S. (2021). "CNN based water stress detection in chickpea using UAV based hyperspectral imaging," in IEEE International India Geoscience and Remote Sensing Symposium (InGARSS), Ahmedabad, India, 2021. pp. 145–148. doi: 10.1109/InGARSS51564.2021.9791948
Sara, K. and Rajasekaran, E. (2025). High Spatiotemporal Resolution Land Surface Temperature reveals fine-scale hotspots during heatwave events over India. Environ. Res. Commun. doi: 10.1088/2515-7620/adc0f2
Sharma, C., Barkataki, N., and Sarma, U. (2023b). A deep neural network with electronic nose for water stress prediction in Khasi Mandarin Orange plants. Meas. Sci. Technol. 34, 125152. doi: 10.1088/1361-6501/acf8e3
Sun, H., Liu, W., Wang, Y., and Yuan, S. (2017). Evaluation of typical spectral vegetation indices for drought monitoring in Cropland of the North China Plain. IEEE J. Select. Topics Appl. Earth Observ. Remote Sens. 10, 5404–5411. doi: 10.1109/jstars.2017.2734800
Tola, E., Al-Gaadi, K. A., Madugundu, R., Zeyada, A. M., Edrris, M. K., Edrees, H. F., et al. (2025). The use of spectral vegetation indices to evaluate the effect of grafting and salt concentration on the growth performance of different tomato varieties grown hydroponically. Horticulturae 11, 368. doi: 10.3390/horticulturae11040368
Varghese, D., Radulović, M., Stojković, S., and Crnojević, V. (2021). Reviewing the potential of Sentinel-2 in assessing the drought. Remote Sens. 13, 3355. doi: 10.3390/rs13173355
Wei, H., Grafton, M., Bretherton, M., Irwin, M., and Sandoval, E. (2021). Evaluation of point hyperspectral reflectance and multivariate regression models for grapevine water status estimation. Remote Sens. 13, 3198. doi: 10.3390/rs13163198
You, Y., Huang, C., and Zhang, Y. (2024). Assessing the sensitivity of snow depth simulations to land surface parameterizations within Noah-MP in Northern Xinjiang, China. Remote Sens. 16, 594. doi: 10.3390/rs16030594
Zhang, X., Vinatzer, B. A., and Li, S. (2024). Hyperspectral imaging analysis for early detection of tomato bacterial leaf spot disease. Sci. Rep. 14, 27666. doi: 10.1038/s41598-024-78650-6
Keywords: hyperspectral imaging, vegetation index, machine learning, crop stress, early detection, remote sensing
Citation: S P and Shirly Edward A (2025) MLVI-CNN: a hyperspectral stress detection framework using machine learning-optimized indices and deep learning for precision agriculture. Front. Plant Sci. 16:1631928. doi: 10.3389/fpls.2025.1631928
Received: 20 May 2025; Accepted: 07 July 2025;
Published: 17 September 2025.
Edited by:
Gemine Vivone, National Research Council (CNR), Italy
Reviewed by:
Wenzheng Bao, Xuzhou University of Technology, China
Firozeh Solimani, Politecnico di Bari, Italy
Copyright © 2025 S and Shirly Edward. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: A. Shirly Edward, ZWR3YXJkc0Bzcm1pc3QuZWR1Lmlu