AUTHOR=Cuenca-Romero Carmen, Apolo-Apolo Orly Enrique, Rodríguez Vázquez Jaime Nolasco, Egea Gregorio, Pérez-Ruiz Manuel TITLE=Tackling unbalanced datasets for yellow and brown rust detection in wheat JOURNAL=Frontiers in Plant Science VOLUME=Volume 15 - 2024 YEAR=2024 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2024.1392409 DOI=10.3389/fpls.2024.1392409 ISSN=1664-462X ABSTRACT=This study assesses the efficacy of hyperspectral data in detecting yellow and brown rust in wheat. Four machine learning models were evaluated: ANN (Artificial Neural Network), SVM (Support Vector Machine), RF (Random Forest), and GNB (Gaussian Naïve Bayes). The SVM and RF models demonstrated the highest accuracies, mainly when datasets enhanced with SMOTE (Synthetic Minority Over-sampling Technique) were used. For yellow rust, the RF model achieved 70% accuracy using unaltered data, while for brown rust, the SVM model performed best, with an accuracy of 63% when SMOTE was applied to the training set. The study underscores the potential of spectral data and machine learning (ML) in plant disease detection. It calls for further research into data processing techniques, particularly the application of SMOTE and its impact on model performance.
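
The abstract mentions applying SMOTE to the training set only. As a rough illustration of that idea, the sketch below interpolates new minority-class samples between existing ones, after the train/test split has been made. It is not the study's pipeline: the function name `smote`, the toy two-feature values, the split sizes, and the neighbour count `k` are all assumptions chosen for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic samples, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to other
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: two made-up "spectral band" features per sample.
healthy = [[0.10 + 0.01 * i, 0.50 + 0.02 * i] for i in range(12)]
diseased = [[0.30 + 0.02 * i, 0.20 + 0.01 * i] for i in range(4)]

# Split first, so the held-out samples remain untouched real measurements.
train_healthy, test_healthy = healthy[:9], healthy[9:]
train_diseased, test_diseased = diseased[:3], diseased[3:]

# Oversample only the minority class of the training split until it is balanced.
extra = smote(train_diseased, n_new=len(train_healthy) - len(train_diseased), k=2)
balanced_diseased = train_diseased + extra
```

Oversampling after the split keeps synthetic points out of the test set, so reported accuracy reflects real samples only. In practice a maintained implementation would normally be used instead of a hand-written one.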