AUTHOR=Gelete Tadele Bedo, Pasala Pernaidu, Abay Nigus Gebremedhn, Woldemariam Gezahegn Weldu, Yasin Kalid Hassen, Kebede Erana, Aliyi Ibsa TITLE=Integrated machine learning and geospatial analysis enhanced gully erosion susceptibility modeling in the Erer watershed in Eastern Ethiopia JOURNAL=Frontiers in Environmental Science VOLUME=Volume 12 - 2024 YEAR=2024 URL=https://www.frontiersin.org/journals/environmental-science/articles/10.3389/fenvs.2024.1410741 DOI=10.3389/fenvs.2024.1410741 ISSN=2296-665X ABSTRACT=Gully erosion is a serious environmental and agricultural problem in watershed ecosystems; therefore, identifying susceptible areas using advanced machine learning (ML) and geospatial analysis is necessary for sustainable management and ecosystem health. This study integrated XGBoost, random forest (RF), support vector machine (SVM), and neural network (NN) models with geospatial analysis to predict gully erosion susceptibility (GES) and identify conditioning factors in the Erer watershed in eastern Ethiopia. This study identified 22 geoenvironmental factors and modeled GES using 1200 inventory points (70% for training and 30% for validation). The performance and robustness of the ML models were validated using the area under the curve (AUC), accuracy, precision, sensitivity, specificity, kappa coefficient, F1 score, and logarithmic loss. Relative slope position was the most influential factor in GES, with 100% importance in SVM and RF and 95% importance in XGBoost, while rainfall received 100% importance in NN. The XGBoost model demonstrated robustness and superior performance in GES predictions and mapping (AUC = 0.97), achieving the highest accuracy (0.91), precision (0.92), and kappa value (0.81) while maintaining a low log loss (0.0394). However, the SVM model outperformed the other models in recognizing areas resistant/susceptible to gully erosion, with the highest sensitivity (0.97), specificity (0.98), and F1 score (0.91).
The NN predicted the largest area (13.74%) to have very high susceptibility, followed by SVM (11.69%), XGBoost (10.65%), and RF (7.85%), while XGBoost identified most areas (70.19%) as having very low susceptibility. The ensemble technique outperformed individual models with high AUC (0.99), accuracy (0.935), precision (0.925), sensitivity (0.975), specificity (0.954), kappa (0.858), and F1 score (0.949), classifying 36.48% of the area as very low, 26.51% as low, 16.24% as moderate, 11.55% as high, and 9.22% as very high susceptibility. District-level GES analyses revealed that the Babile, Fedis, Harar, and Meyumuluke districts were the most susceptible, with high GES areas of 32.4%, 21.3%, 14.3%, and 13.6%, respectively. This study offers accurate, robust, and flexible ML models with comprehensive validation metrics for enhanced GES modeling, identifying conditioning factors, and thereby supporting decision-making for sustainable watershed conservation and land degradation prevention.
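
The abstract reports accuracy, precision, sensitivity, specificity, kappa, F1 score, and logarithmic loss without stating how they were computed. The following is a minimal sketch of the standard textbook definitions for a binary gully/non-gully classifier, not the authors' code. The function names `binary_metrics` and `log_loss` and the confusion-matrix counts in the usage lines are hypothetical and chosen only for illustration; the sketch also assumes every denominator is non-zero.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier.

    tp/fp/tn/fn are counts of true/false positives/negatives.
    Assumes all denominators are non-zero (illustrative sketch only).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    losses = []
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)


# Hypothetical counts (NOT from the study), for illustration only.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Kappa is worth reporting alongside accuracy because it discounts agreement expected by chance, which matters when the gully and non-gully classes are imbalanced.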