ORIGINAL RESEARCH article
Front. Earth Sci.
Sec. Geohazards and Georisks
Volume 13 - 2025 | doi: 10.3389/feart.2025.1642791
This article is part of the Research Topic: Natural Disaster Prediction Based on Experimental and Numerical Methods
Enhancing Landslide Dam Stability Prediction: A Data-Driven Framework Integrating Missing Data Imputation and Optimal Threshold Discrimination
Provisionally accepted
- 1Shanxi Vocational University of Engineering Science and Technology, Jinzhong, China
- 2China Academy of Building Research, Beijing, China
- 3Hebei University of Technology, Beichen District, China
- 4Hebei University of Technology, Tianjin, China
- 5Hebei Polytechnic of Building Materials, Qinhuangdao, China
Accurate prediction of landslide dam stability is crucial for preventing and mitigating threats to downstream communities. However, the development of reliable predictive models is hindered by missing data in landslide dam inventories. To overcome this challenge, this study proposes a data-driven approach that incorporates missing data imputation to enhance the applicability and accuracy of landslide dam stability predictions. Using a collected landslide dam inventory of 518 cases with a missing rate of 25%, several imputation methods, namely generative adversarial imputation nets (GAIN), missForest, multiple imputation by chained equations (MICE), K-nearest neighbors (KNN), and mean most-frequency (MMF), were used to estimate missing values and improve the completeness of the datasets. The imputed datasets were then used to predict landslide dam stability with several machine learning models: support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and logistic regression (LR). Our key innovation lies in coupling GAIN with SVM, enhanced by Youden-index-based threshold optimization for stability classification. The results demonstrate GAIN's superiority: it achieved the lowest RMSE (0.205) for continuous variables and 66.0% accuracy for categorical data. The GAIN-SVM combination yielded the highest predictive performance (AUC = 0.823), outperforming traditional methods by 15.2%. Youden-index thresholding further improved classification accuracy by 3.1-9.3% for ambiguous cases (predicted probabilities near 0.5), addressing a critical gap in existing models. The framework enables rapid and reliable stability assessments even with incomplete field data, supporting emergency decision-making and timely hazard mitigation in landslide-prone, data-scarce regions.
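To illustrate the threshold optimization step mentioned above, the following is a minimal sketch of Youden-index threshold selection, where J = sensitivity + specificity - 1 is maximized over candidate cut-offs. It is not the authors' implementation: the function names `youden_threshold` and `accuracy`, the label convention (1 = positive class), and the ten toy labels and scores are all assumptions made purely for illustration and do not come from the landslide dam inventory.

```python
from typing import Sequence, Tuple


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A case is classified as positive when its score is >= threshold.
    Ties in J are resolved in favor of the lowest threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = 0.5, float("-inf")
    # Every distinct predicted score is a candidate cut-off.
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


def accuracy(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Fraction of cases classified correctly at the given threshold."""
    hits = sum(1 for y, s in zip(y_true, scores) if (s >= threshold) == (y == 1))
    return hits / len(y_true)


# Hypothetical toy data (NOT from the paper): predicted probabilities from some
# classifier, with several positive cases scored just below 0.5.
toy_labels = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]
toy_scores = [0.10, 0.20, 0.30, 0.35, 0.42, 0.45, 0.48, 0.55, 0.70, 0.90]

best_t, best_j = youden_threshold(toy_labels, toy_scores)
print("Youden-optimal threshold:", best_t, "J:", round(best_j, 3))
print("accuracy at 0.5:", accuracy(toy_labels, toy_scores, 0.5))
print("accuracy at optimal threshold:", accuracy(toy_labels, toy_scores, best_t))
```

In this toy example the default cut-off of 0.5 misclassifies positive cases whose scores fall just below it, while the Youden-optimal cut-off recovers them. This mirrors, in spirit only, the ambiguous-case behavior described in the abstract.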
Keywords: Landslide dam stability, Missing data imputation, Generative adversarial imputation nets, Machine learning, Threshold optimization, Geohazard risk assessment
Received: 07 Jun 2025; Accepted: 10 Jul 2025.
Copyright: © 2025 Li, Zhang, He, Song and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jun He, Hebei University of Technology, Beichen District, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.