AUTHOR=Li Xiaojun, Zhang Xiaobo, He Jun, Song Yixiang, Li Yanqi
TITLE=Enhancing landslide dam stability prediction: a data-driven framework integrating missing data imputation and optimal threshold discrimination
JOURNAL=Frontiers in Earth Science
VOLUME=Volume 13 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/earth-science/articles/10.3389/feart.2025.1642791
DOI=10.3389/feart.2025.1642791
ISSN=2296-6463
ABSTRACT=Introduction: Accurate prediction of landslide dam stability is critical for mitigating downstream hazards, but reliable models are hindered by incomplete inventories due to missing data. This study addresses this gap by integrating advanced imputation techniques with machine learning (ML) to enhance prediction accuracy and applicability. Methods: We compiled a global inventory of 518 landslide dam cases (25% missing data rate) and evaluated five imputation methods: generative adversarial imputation nets (GAIN), missForest, multiple imputations by chained equations (MICE), K-nearest neighbors (KNN), and mean most-frequency (MMF). Imputed datasets were used to train four ML models (SVM, RF, XGBoost, LR), with GAIN-SVM further optimized via Youden-index-based threshold discrimination. Results: GAIN achieved the lowest RMSE (0.205) for continuous variables and 66.0% accuracy for categorical data. The GAIN-SVM combination yielded the highest predictive performance (AUC = 0.823), surpassing traditional methods by 15.2%. Threshold optimization improved classification accuracy by 3.1–9.3% for ambiguous cases (probabilities ~0.5). Discussion: The framework enables robust stability assessments even with incomplete field data, supporting emergency decision-making in landslide-prone regions. Its integration into early warning systems could enhance risk mitigation in data-scarce areas.
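
NOTE=The abstract mentions Youden-index-based threshold discrimination without giving the procedure. The sketch below shows one common way to choose such a threshold: scan candidate cut-offs and keep the one that maximizes J = sensitivity + specificity - 1. It is an illustration under stated assumptions, not the authors' implementation. The function name `youden_threshold`, the label coding (1 = positive class) and the example probabilities are all hypothetical.

```python
from typing import List, Tuple


def youden_threshold(y_true: List[int], scores: List[float]) -> Tuple[float, float]:
    """Return (threshold, J) that maximizes Youden's J statistic.

    J = sensitivity + specificity - 1.
    A case is predicted positive when its score is >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = 0.5, float("-inf")
    # Candidate cut-offs: every distinct predicted score, in ascending order.
    for t in sorted(set(scores)):
        true_pos = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        true_neg = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = true_pos / positives + true_neg / negatives - 1.0
        # Strict comparison keeps the lowest threshold when several tie.
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical predicted probabilities (assumed coding: 1 = positive class).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.42, 0.55, 0.45, 0.60, 0.80, 0.90]
threshold, j = youden_threshold(y_true, scores)
print(threshold, j)
```

On this toy data the selected cut-off falls below the default 0.5, so the positive case scored 0.45 is classified correctly. This is the kind of near-0.5 case the abstract calls ambiguous. In practice the threshold would be chosen on validation data rather than on the cases being evaluated.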