Landslide susceptibility assessment of upper Yellow River using coupling statistical approaches, machine learning algorithms and SBAS-InSAR technique

Zeng, Jin; Tuo, Wanbing; Wang, Xinchao; Zhao, Xingchang

doi:10.3389/feart.2025.1652646

ORIGINAL RESEARCH article

Front. Earth Sci., 29 August 2025

Sec. Geohazards and Georisks

Volume 13 - 2025 | https://doi.org/10.3389/feart.2025.1652646


Jin Zeng¹

Wanbing Tuo²*

Xinchao Wang¹

Xingchang Zhao¹

¹School of Geological Engineering, Qinghai University, Xining, Qinghai, China
²School of Engineering, Qinghai Institute of University, Xining, Qinghai, China

Landslide disasters frequently occur in the upper reaches of the Yellow River, particularly within the Gonghe to Xunhua section. A precise evaluation of landslide susceptibility is vital for effective disaster prevention and mitigation. Integrated models that combine statistical methods with machine learning techniques have been widely adopted for landslide susceptibility assessments. However, the quality and composition of the positive sample training data have a significant impact on the accuracy of the outcomes. This study uses historical landslide data from the region and applies two statistical approaches, the information value (IV) and certainty factor (CF) methods, alongside three machine learning models: Random Forest (RF), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGBoost). Six integrated models (IV-RF, IV-SVM, IV-XGBoost, CF-RF, CF-SVM, and CF-XGBoost) are developed to evaluate landslide susceptibility in the Yellow River’s upper reaches (from Gonghe to Xunhua). The Receiver Operating Characteristic (ROC) curve and Accuracy (ACC) values are used to assess the models’ performance, and the susceptibility results are compared with the spatial distribution of new landslides identified using Small Baseline Subset-Interferometric Synthetic Aperture Radar (SBAS-InSAR) technology and optical remote sensing images. The CF-XGBoost model is identified as the most effective. New landslide data were then added to the positive sample dataset to retrain the CF-XGBoost model, enhancing its predictive performance. The methodology proposed in this study not only enables effective evaluation of the accuracy and reliability of the results derived from the integrated models, but also addresses the limitations caused by the untimely or insufficient acquisition of landslide samples. Furthermore, the resulting landslide susceptibility assessment establishes a reliable technical foundation for local disaster management authorities to formulate scientifically sound risk mitigation and control strategies.

1 Introduction

Landslides are a common geological hazard, distinguished by their sudden occurrence and widespread impact (Jia et al., 2022; Jiang et al., 2022), presenting direct threats to nearby infrastructure and the safety of residents’ lives and property (Pareek et al., 2025). In the upper reaches of the Yellow River (from Gonghe to Xunhua), the region’s complex geological features, steep topography, sparse vegetation, and increasing human activities in recent years have led to a higher frequency of landslides (Tu et al., 2023; Zhao et al., 2022). Therefore, it is essential to improve the management of landslide risks and enhance the capacity for disaster prevention and mitigation in this area. Landslide susceptibility assessment, a key method for disaster prevention, helps identify high-risk zones through precise, reliable, and efficient technical systems, providing a scientific foundation for effective disaster reduction and prevention efforts (Wang and Bai, 2023; He et al., 2023; Bhandary et al., 2013).

The goal of landslide susceptibility assessment is to forecast the likelihood of landslides by examining the spatial patterns of past landslides and the factors that influence their occurrence in a specific area (Sabatakakis et al., 2014; Rohan et al., 2023). The development of landslide disasters is influenced by a combination of internal factors (e.g., topography, geology, geological structure, transportation, and water systems) and external triggers (e.g., rainfall, earthquakes, and human engineering activities). The likelihood of a landslide varies depending on these factors (Lu et al., 2024). Traditional statistical methods calculate the probability of landslides by establishing mathematical relationships; they are simple and straightforward to apply but struggle to capture the complex interactions between landslides and various factors, leading to relatively low prediction accuracy (Zhang et al., 2022). With advancements in computer technology, machine learning models have increasingly been used for landslide susceptibility prediction (Dou et al., 2023; Qi et al., 2024; Huang et al., 2023). Unlike traditional statistical methods, machine learning models are capable of identifying nonlinear relationships between landslides and influencing factors, significantly improving prediction accuracy (Huang et al., 2020). However, single machine learning models often struggle to match training data with real-world conditions, making it difficult to fully capture the nonlinear interactions between landslides and evaluation factors. Combining statistical methods with machine learning models can help address this issue (Umar et al., 2014). The integration of these methods for landslide susceptibility assessment has become a prominent trend in research. For example, Wang et al. used the IV and CF methods along with the RF model for landslide susceptibility assessment in Ningnan County, demonstrating that the integrated model performed better than individual models (Wang J. et al., 2024). Liu et al. proposed the SF-Stacking method, which incorporates spatial heterogeneity and feature selection, for landslide susceptibility assessment in Yibin City. The results showed that SF-Stacking outperformed individual models such as BPNN, SVM, and KNN in terms of accuracy (Liu and Chen, 2024). Wang Jingjing et al. employed a bidirectional long short-term memory model based on landslide density (LD-BiLSTM) for landslide susceptibility assessment in Luding County, achieving higher accuracy compared to both the RF and IV models. These studies indicate that integrated models can effectively overcome the limitations of single models and improve landslide prediction accuracy (Wang L. et al., 2024).

In the existing body of literature, many studies have relied on historical landslide data as training datasets for landslide susceptibility assessments (Hong et al., 2024; Mao et al., 2021; Xing et al., 2021; Gu et al., 2024; Xing et al., 2023), often overlooking newly occurring landslide events. However, older, larger, and more destructive landslides may since have been mitigated through measures such as slope reinforcement by the relevant geological disaster management authorities, so predictions based solely on historical data could be less accurate. This study introduces a coupled approach that integrates statistical methods, machine learning models, and SBAS-InSAR technology to assess landslide susceptibility in the upper reaches of the Yellow River. The study is structured in three key components. First, historical landslide data from 1998 to 2012, provided by the China Geological Survey (https://www.cgs.gov.cn/), were used to form the sample set. Two statistical methods (IV and CF) were combined with advanced machine learning models, including RF, SVM, and XGBoost. Landslide susceptibility predictions were generated for six integrated models (IV-RF, IV-SVM, IV-XGBoost, CF-RF, CF-SVM, CF-XGBoost). Second, SBAS-InSAR technology along with optical remote sensing images was applied to detect new landslides that occurred in the study area from 2021 to 2023. The newly identified landslides were then compared with the susceptibility results from the six models, which revealed that the CF-XGBoost model was the most effective. Finally, the newly identified landslide data were incorporated into the CF-XGBoost model as a positive sample set to calculate the landslide susceptibility index for the study area, and risk zoning was performed using the natural breaks method. These findings provide an important scientific foundation for landslide risk management and prevention in the upper reaches of the Yellow River (from Gonghe to Xunhua).

In summary, to address the current limitations in research on landslide susceptibility assessment in the upper reaches of the Yellow River, the Gonghe to Xunhua section was selected as the study area for conducting systematic landslide susceptibility prediction. The main contributions of this study are as follows:

• A positive sample set was established based on historical landslide points identified in the study area between 1998 and 2012. Landslide susceptibility was predicted by integrating statistical methods with advanced machine learning techniques. The predictive performance of the integrated model was further validated using newly identified landslide points from 2021 to 2023, which were detected through SBAS-InSAR and optical remote sensing imagery.

• The newly identified landslide points were incorporated into the positive sample set to update it. The optimal model (CF-XGBoost) was retrained using the updated dataset, resulting in an improved landslide susceptibility assessment for the upper reaches of the Yellow River. This approach ensures that the training samples remain temporally relevant and enhances the model’s predictive accuracy.

2 Materials and methods

2.1 Research area

The study area, the Gonghe to Xunhua section of the upper reaches of the Yellow River, is located in the southeastern part of the Qinghai-Tibet Plateau. The topography features elevated areas in the west, north, and south, with lower elevations in the east. Altitudes in this region range from 1657 to 4121 m. The river passes through significant areas, including the Longyang Gorge, Lijia Gorge, Guide County, Jianzha County, and Xunhua County (Wang Q. et al., 2024; Fei et al., 2023; Du et al., 2023). The climate in the study area is a plateau continental type, with average annual rainfall over the last 5 years ranging from 550 to 670 mm. Due to the relatively low precipitation, the normalized difference vegetation index (NDVI) remains below 0.3 in most parts of the region, indicating sparse vegetation and considerable desertification. In this ecologically fragile environment, which is further affected by river erosion and human engineering activities, landslide occurrences are common (Shi et al., 2019; Dong et al., 2018). Figure 1 illustrates the general situation and historical landslide data for the area. These landslides pose significant risks to infrastructure along the riverbanks and threaten the safety of residents and their property.

Figure 1

Map showing study area in Qinghai Province, China, highlighting counties and key features like the Yellow River and historical landslides. Elevation is indicated by gradient colors ranging from low in green to high in red. Inset maps locate the region within China and the Qinghai and Gansu provinces. A legend clarifies symbols and colors used.

Figure 1. Overview and historical landslide distribution of the study area.

2.2 Data sources

To accurately detect new, unrecorded landslides, this study employed a method that combines SBAS-InSAR technology with optical remote sensing imagery. Data from Sentinel-1A ascending and descending tracks from January 2021 to December 2023 were used, with 123 scenes for ascending and 149 scenes for descending tracks. SRTM external elevation data with a 30 m resolution and precise orbit data were utilized for orbit error correction. Optical remote sensing images from Landsat-8, also with a 30 m resolution, were chosen for this study. The specific data sources are shown in Table 1. For the selection of non-landslide points, this study randomly selected 167 non-landslide points outside the 2 km buffer zone of landslide locations to maintain a 1:1 ratio between positive and negative samples. This balanced sampling approach prevents potential degradation in ensemble model accuracy caused by imbalanced sample distribution. The classification criteria of evaluation factors, spatial distribution of landslide points, and detailed procedures for non-landslide point selection are visually presented in Figure 2.
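The non-landslide sampling step described above can be illustrated with a minimal rejection-sampling sketch. It assumes point coordinates in a projected system with metre units and a rectangular study extent; the function name, the toy coordinates, and the bounding box are hypothetical and are not taken from the study.

```python
import math
import random

def sample_non_landslide_points(landslides, extent, n_points, buffer_m=2000.0, seed=0):
    """Rejection-sample points farther than buffer_m from every landslide point.

    landslides: list of (x, y) in a projected CRS with metre units (assumption).
    extent: (xmin, ymin, xmax, ymax) bounding box of the study area (assumption).
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = extent
    negatives = []
    max_tries = 1000 * n_points  # guard against an extent fully covered by buffers
    tries = 0
    while len(negatives) < n_points and tries < max_tries:
        tries += 1
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Keep the candidate only if it lies outside every 2 km buffer
        if all(math.hypot(x - lx, y - ly) > buffer_m for lx, ly in landslides):
            negatives.append((x, y))
    return negatives

# Toy example: three landslide points in a 20 km x 20 km box, sampled at a 1:1 ratio
landslides = [(5000.0, 5000.0), (12000.0, 8000.0), (15000.0, 15000.0)]
negatives = sample_non_landslide_points(
    landslides, (0.0, 0.0, 20000.0, 20000.0), n_points=len(landslides))
```

In practice the candidates would also need to be restricted to the actual study-area polygon rather than a bounding box; that step is omitted here.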

Table 1

Table 1. Evaluation factors and data sources.

Figure 2

Figure 2. The classification of each evaluation factor and the landslide point and non-landslide location.

Landslide disasters develop through a complex process, typically influenced by a combination of natural factors and human engineering activities (Nguyen et al., 2025). In this study, 16 evaluation factors were initially selected from five key landslide influencing categories: geological environment, topography and geomorphology, meteorology and hydrology, vegetation and soil, and human engineering activities. These factors include elevation, slope, aspect, plan curvature, profile curvature, topographic wetness index, normalized difference vegetation index, rainfall, distance to faults, distance to rivers, distance to roads, formation lithology, land use, surface roughness, topographic relief, and surface cutting degree. To maintain consistency in the spatial representation of each factor, a 30 m spatial resolution was applied. Continuous factors were classified using the natural break method, while discrete factors were categorized based on their actual states (Wu et al., 2016). More details are provided in Supplementary Material.

2.3 Methods

2.3.1 Evaluation factor screening

To ensure accurate landslide prediction results, it is important to conduct a correlation analysis of the 16 primary evaluation factors to assess their independence (Li et al., 2022). While all selected factors play a role in the development of landslide hazards, strong correlations between them can affect the evaluation outcomes and cause collinearity problems (Yang C. et al., 2023). Therefore, screening the evaluation factors is crucial to maintain the accuracy of the results. In this study, the Pearson correlation coefficient method was employed, and its calculation formula is as follows (Li C. et al., 2024):

r_{x y} = \frac{\sum x_{i} y_{i} - n \bar{x} \bar{y}}{(n - 1) s_{x} s_{y}} = \frac{n \sum x_{i} y_{i} - \sum x_{i} \sum y_{i}}{\sqrt{n \sum x_{i}^{2} - {(\sum x_{i})}^{2}} \sqrt{n \sum y_{i}^{2} - {(\sum y_{i})}^{2}}} (1)

The correlation between the factors is measured by the Pearson correlation coefficient ( $r_{x y}$ ) calculated with Equation 1. An absolute value of 0 < $|r_{x y}|$ ≤ 0.3 indicates a weak correlation; 0.3 < $|r_{x y}|$ ≤ 0.5 indicates a moderate correlation; and 0.5 < $|r_{x y}|$ ≤ 1 indicates a strong correlation.
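As an illustration of this screening step, the following sketch evaluates the right-hand form of Equation 1 for every factor pair and flags pairs whose absolute coefficient exceeds 0.5. The factor values are invented toy numbers and the function names are hypothetical; this is not the authors' implementation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation using the sums-based form of Equation 1."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return num / den

def strongly_correlated_pairs(factors, threshold=0.5):
    """Return (name_a, name_b, r) for every factor pair with |r| above the threshold."""
    names = list(factors)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_r(factors[names[i]], factors[names[j]])
            if abs(r) > threshold:
                pairs.append((names[i], names[j], r))
    return pairs

# Toy factor samples (hypothetical values, one entry per sample point)
factors = {
    "slope": [5.0, 12.0, 20.0, 28.0, 35.0],
    "surface_roughness": [1.00, 1.02, 1.06, 1.13, 1.22],
    "aspect": [90.0, 10.0, 300.0, 180.0, 45.0],
}
pairs = strongly_correlated_pairs(factors)
```

With these toy values only the slope and surface roughness pair is flagged, so one of the two would be a candidate for removal.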

2.3.2 Statistical approaches

The IV method is based on assessing the uncertainty of information. An information value is calculated for each class of every evaluation factor affecting landslides in the study area; a higher information value suggests a greater likelihood of landslide occurrence. The formula for computing the information value is as follows (Lv et al., 2024):

I V (X_{i}, Y) = \ln \frac{N_{i} / N}{S_{i} / S} (2)

In Equation 2, $I V (X_{i}, Y)$ represents the information quantity value of evaluation factor $X_{i}$ for landslide event $Y$ ; $N_{i}$ represents the number of landslides distributed within the evaluation factor $X_{i}$ ; $N$ indicates the total number of landslides in the study area. Additionally, $S_{i}$ represents the area covered by the evaluation factor $X_{i}$ , and $S$ represents the overall area of the study region.
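A minimal sketch of Equation 2 follows. The landslide counts and areas in the example are hypothetical, and returning negative infinity for a class without landslides is a convention chosen here for illustration, not one stated in the study.

```python
import math

def information_value(n_i, n_total, s_i, s_total):
    """IV = ln((N_i / N) / (S_i / S)) for one class of one evaluation factor."""
    if n_i == 0:
        # ln(0) is undefined; -inf is an illustrative convention (assumption)
        return float("-inf")
    return math.log((n_i / n_total) / (s_i / s_total))

# Hypothetical class holding 40 of 167 landslides on 10% of the study area
iv_high = information_value(40, 167, 10.0, 100.0)
# Hypothetical class holding 5 of 167 landslides on 20% of the study area
iv_low = information_value(5, 167, 20.0, 100.0)
```

A class whose share of landslides exceeds its share of area yields a positive IV, and a class with proportionally fewer landslides yields a negative IV.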

The certainty factor (CF) method estimates the tendency of landslide occurrence for each state of an index factor by comparing the landslide density in that state with the prior landslide density of the whole study area, using landslide point data. The CF value ranges from −1 to 1, where, similar to the IV method, a higher value indicates a greater tendency for landslides to occur. The formula for calculating CF is as follows (Ding et al., 2025):

C F = \left\{ \begin{array}{ll} \frac{P P_{a} - P P_{s}}{P P_{s} (1 - P P_{a})}, & P P_{a} < P P_{s} \\ \frac{P P_{a} - P P_{s}}{P P_{a} (1 - P P_{s})}, & P P_{a} \geq P P_{s} \end{array} \right. (3)

In Equation 3, $P P_{a}$ denotes the ratio of landslide points to the area within the evaluation factor region, while $P P_{s}$ represents the ratio of the total number of landslide points to the total area of the study region.
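The piecewise form of Equation 3 can be sketched as below. The densities are assumed to be expressed as landslide points per grid cell so that they lie between 0 and 1; the cell counts in the example are hypothetical, and the explicit zero for equal densities is added only to avoid a division by zero.

```python
def certainty_factor(pp_a, pp_s):
    """CF from Equation 3; pp_a is the class density, pp_s the study-area density."""
    if pp_a == pp_s:
        return 0.0  # equal densities: no tendency either way (also avoids 0/0)
    if pp_a > pp_s:
        return (pp_a - pp_s) / (pp_a * (1.0 - pp_s))
    return (pp_a - pp_s) / (pp_s * (1.0 - pp_a))

# Hypothetical densities in landslide points per grid cell
pp_s = 167 / 1_000_000   # all landslides over all cells
pp_a = 40 / 100_000      # landslides in one factor class over the cells of that class
cf_high = certainty_factor(pp_a, pp_s)
cf_empty = certainty_factor(0.0, pp_s)  # a class with no landslides
```

A class denser in landslides than the study area as a whole gives a CF between 0 and 1, and a class with no landslides gives the lower bound of −1.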

2.3.3 Machine learning algorithms

Random forest (RF) is an ensemble learning algorithm that integrates multiple classification and regression trees. It constructs several decision trees using subsets of the data, aggregates the predictions from these trees, and ultimately determines the optimal result (Akinci, 2022). The RF algorithm randomly selects portions of the training dataset and features from these samples to train each individual learner, ensuring both independence among the trees and greater accuracy in the aggregated predictions. This method surpasses the performance of a single decision tree by averaging the outcomes, which minimizes overfitting and enhances predictive accuracy (Yang et al., 2024). The fundamental formula is as follows:

f (x) = \frac{1}{K} \sum_{i = 1}^{K} t_{i} (x) (4)

In Equation 4, $f (x)$ is the prediction result of the random forest; $K$ is the number of regression trees; $t_{i} (x)$ is the prediction result of the $i$th regression tree.

Support Vector Machine (SVM) is a widely used machine learning model for classification and regression tasks, with its primary concept being the identification of an optimal hyperplane to separate various categories of data (Zhang et al., 2023; Huang et al., 2022). Initially, all evaluation factors ( $x_{i}$ ) for each sample are organized into a vector $X_{j}$ . Subsequently, the corresponding vectors of all samples are compiled into the training dataset ( $X_{j}, u_{j}$ ), where $u_{j}$ represents the training output, specifically the probability of landslide occurrence. Ultimately, the samples are mapped from the low-dimensional input space into a high-dimensional feature space, in which a linear fitting function is constructed (Li Z. et al., 2024), as demonstrated below:

f (x) = \sum_{i = 1}^{n} (a_{i} - a_{i}^{*}) K (x_{i}, x) + b (5)

In Equation 5, the terms $a_{i}$ and $a_{i}^{*}$ represent Lagrange multipliers; the kernel function is denoted as $K (x_{i}, x)$ ; and the threshold is represented by $b$ .

The eXtreme Gradient Boosting (XGBoost) algorithm optimizes the loss function using second-derivative information and decides whether to split a node based on whether the split reduces the objective function. The core formula is as follows (Guo et al., 2024):

θ (x) = \sum_{j = 1}^{T} \left[ \left( \sum μ_{i} \right) w_{j} + \frac{1}{2} \left( \sum h_{i} + λ \right) w_{j}^{2} \right] + γ T (6)

In Equation 6, the objective function is represented by $θ (x)$ ; the regularization coefficients are denoted by $γ$ and $λ$ ; the first partial derivative of the loss function is represented by $μ_{i}$ ; the second partial derivative of the loss function is denoted by $h_{i}$ ; the number of leaves is indicated by $T$ ; and the weight of leaf node $j$ is represented by $w_{j}$ . The inner sums run over the samples assigned to leaf $j$ .

2.3.4 SBAS-InSAR technology

This study utilizes Sentinel-1A ascending and descending orbit data from January 2021 to December 2023 to calculate the 3-year average annual surface deformation rate in the study area using SBAS-InSAR technology. The SBAS-InSAR processing involves correcting orbit errors with precise orbit data and a DEM, and performing phase unwrapping using the minimum-cost flow method (Yang S. et al., 2023). To generate sufficient interferometric pairs, the temporal baseline is set to 90–120 days and the spatial baseline to 120 m. A deformation rate threshold of 10 mm/a is applied, with values below this considered stable. Where the deformation rate exceeds this threshold, optical remote sensing images are visually interpreted to assess whether the area is affected by landslides.

2.3.5 Accuracy evaluation

To ensure the validity of the research method, the accuracy (ACC) and the ROC curve were used to assess the models’ performance. Accuracy is the proportion of correctly predicted samples out of the total number of samples. The ACC value serves as a direct indicator of the model’s precision, with larger values reflecting higher accuracy (Wang J. et al., 2024). The ROC curve is frequently used to evaluate the classification effectiveness of a model: it plots the true positive rate (TPR) against the false positive rate (FPR), and the area under the curve (AUC) measures the model’s accuracy (Liu and Chen, 2024). A higher AUC value signifies improved model accuracy and stronger predictive performance. ACC is defined as follows (Qi et al., 2024):

\mathrm{ACC} = \frac{T P + T N}{T P + T N + F P + F N} (7)

In Equations 7–9, $T P$ and $F N$ refer to the number of landslide points correctly and incorrectly predicted by the model, respectively; $F P$ and $T N$ represent the number of non-landslide points that are incorrectly and correctly predicted by the model, respectively.

Precision is the proportion of samples predicted as landslides by the model that are actual landslide samples. It is defined as follows:

\mathrm{Precision} = \frac{T P}{T P + F P} (8)

Recall is the proportion of correctly predicted landslide samples among all actual landslide samples. It is defined as follows:

\mathrm{Recall} = \frac{T P}{T P + F N} (9)

The F1-Score is the harmonic mean of precision and recall, and it quantitatively balances the accuracy and completeness of a model’s landslide predictions. It is defined in Equation 10:

\mathrm{F1\text{-}Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} (10)
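For illustration, Equations 7 to 10 can be evaluated directly from the four confusion-matrix counts. The counts in the example are hypothetical and do not correspond to any model in this study; returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(tp, tn, fp, fn):
    """ACC, Precision, Recall and F1-Score from confusion-matrix counts (Equations 7-10)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"acc": acc, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical validation counts: 50 landslide and 50 non-landslide points
metrics = classification_metrics(tp=35, tn=45, fp=5, fn=15)
```

In this toy case precision (0.875) is noticeably higher than recall (0.70), the same qualitative pattern that arises when a model is conservative about predicting the landslide class.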

Assuming the model shows good accuracy, the new landslide data identified by SBAS-InSAR technology in conjunction with optical remote sensing images are compared with the landslide susceptibility prediction results from different models. If all the new landslide data fall within high-risk areas, the effectiveness of the model for landslide susceptibility assessment in the upper reaches of the Yellow River is confirmed, thereby verifying the accuracy of the model’s predictions.

The technical approach employed in this study is depicted in Figure 3. First, a correlation analysis was conducted on the initially selected 16 evaluation indicators using the Pearson correlation coefficient method. Strongly correlated factors were eliminated to establish a landslide susceptibility evaluation index system. Second, 167 historical landslide locations within the study area were selected as positive samples, while 167 non-landslide points, randomly chosen from areas outside the 2-km buffer zones surrounding historical landslides, were used as negative samples. To address spatial autocorrelation, a spatial block cross-validation approach was applied. Specifically, all samples were first divided into a regular 10 × 10 geographic grid based on their spatial coordinates. Subsequently, group-based cross-validation was conducted to ensure that all samples within the same spatial block were exclusively assigned to either the training set or the validation set, thereby maintaining spatial independence between these two datasets. Under this scheme, 70% of the samples were used for training and 30% for validation. The IV and CF values of each influencing factor were calculated and integrated with three models (RF, XGBoost, and SVM) to generate six integrated models. Through model training and prediction, six landslide susceptibility maps were produced. The performance of these models was evaluated using ROC curves together with accuracy and precision metrics, and susceptibility results were classified using the natural breaks method. Additionally, an overlay analysis was performed between the susceptibility zonation results and 227 newly identified landslides detected via InSAR and optical imagery to further validate the methodology’s reliability. Finally, the optimal model (CF-XGBoost) was applied to predict landslide susceptibility using training data derived from new landslide points, non-landslide points, and evaluation factors. For the training data, 227 new landslide points were defined as positive samples, with an equal number of non-landslide points randomly selected outside a 2 km buffer zone of these new landslides. Regarding the evaluation factors, NDVI, rainfall, and land use required resampling, while other factors were treated as static over time.
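The spatial block assignment described above can be sketched as follows. This is one possible reading of the procedure, not the authors' code: blocks rather than individual samples are split at roughly 70/30, so the resulting sample proportions are only approximately 70% and 30%. The random coordinates, the function names, and the block-level split fraction are assumptions.

```python
import random

def block_ids(points, n_rows=10, n_cols=10):
    """Assign each (x, y) point to a cell of a regular grid spanning the data extent."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)

    def cell(v, lo, hi, n):
        if hi == lo:
            return 0
        # Clamp the upper edge of the extent into the last cell
        return min(int((v - lo) / (hi - lo) * n), n - 1)

    return [cell(y, ymin, ymax, n_rows) * n_cols + cell(x, xmin, xmax, n_cols)
            for x, y in points]

def block_split(points, val_fraction=0.3, seed=0):
    """Hold out whole blocks so that no block contributes to both sets."""
    ids = block_ids(points)
    blocks = sorted(set(ids))
    random.Random(seed).shuffle(blocks)
    n_val = max(1, round(val_fraction * len(blocks)))
    val_blocks = set(blocks[:n_val])
    train_idx = [i for i, b in enumerate(ids) if b not in val_blocks]
    val_idx = [i for i, b in enumerate(ids) if b in val_blocks]
    return train_idx, val_idx, ids

# Toy data: 334 sample locations (167 + 167) with hypothetical coordinates in metres
rng = random.Random(42)
points = [(rng.uniform(0, 100_000), rng.uniform(0, 60_000)) for _ in range(334)]
train_idx, val_idx, ids = block_split(points)
```

Because whole blocks are held out, nearby samples cannot appear on both sides of the split, which is the property the spatial block design is meant to guarantee.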

Figure 3

Flowchart depicting a landslide susceptibility prediction model with four sections. Section I details data preprocessing using Pearson correlation and training samples. Section II describes integrating models like RF, SVM, and XGBoost, using CF and IV methods, leading to a landslide susceptibility index. Section III evaluates model accuracy with an ROC curve, accuracy scores, and susceptibility results using Jenks natural breaks classification. Section IV focuses on predicting landslide susceptibility with CF-XGBoost, incorporating new landslide data and field verification, and highlights the importance of evaluation factors.

Figure 3. Technical flow chart of this paper.

3 Results and analysis

3.1 SBAS-InSAR and new landslide identification results

Using Sentinel-1A data from January 2021 to December 2023 in the study area, this research employs SBAS-InSAR technology to calculate the average annual surface deformation rate over the past 3 years. The findings are presented in Figure 4. Specifically, Figure 4a depicts the deformation rate in the ascending orbit direction, while Figure 4b illustrates the deformation rate in the descending orbit direction. The entire SBAS-InSAR technical workflow was implemented using the SARscape module within ENVI 5.6 software.

Figure 4

Topographic maps showing three distinct figures. Panel (a) depicts land velocity with a gradient from red to blue indicating a range from negative one hundred thirty-nine to seventy-three millimeters per annum. Panel (b) illustrates velocity ranging from negative two hundred thirty-seven to one hundred twenty-three millimeters per annum. Panel (c) provides a detailed map of the area, highlighting counties, gorges, the Yellow River, lakes, landslides, and geological strata, with colors representing different geological features and structures. A scale and compass rose are included in each panel for orientation.

Figure 4. Average annual surface deformation rate and spatial distribution of new landslides in the study area (2021–2023). (a) represents deformation in the ascending direction, (b) represents deformation in the descending direction, and (c) illustrates the spatial distribution characteristics of different landslides.

To achieve accurate landslide identification within the study area, deformation rates were overlaid onto Landsat imagery and Google Earth basemaps. A deformation threshold of 10 mm/a was established based on ascending and descending orbital datasets, with areas exhibiting deformation rates below this threshold classified as relatively stable regions. Preliminary landslide boundaries were delineated by comparing regional deformation rates against the established threshold. Subsequently, visual interpretation methods were systematically applied to verify each preliminary landslide polygon. The final landslide inventory presented in Figure 4c identifies 227 new landslides throughout the study area. Among these, 171 landslides were detected in ascending orbit data (indicated by white points in Figure 4c), 154 landslides were identified in descending orbit data (blue points), and 98 landslides showed detection consistency in both orbital directions (red points). Historical landslides are represented by black points in Figure 4c. Spatial distribution analysis reveals approximately 40 new landslides occupy pre-existing landslide footprints, while the majority constitute newly developed slope failures. Notably, both newly identified and historical landslides demonstrate clustered distributions concentrated within the Longyang Gorge, Lijia Gorge, Jianzha County, and Hualong County sectors.
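As a consistency check on these counts, and assuming that every new landslide was detected in at least one of the two orbital directions, the inclusion-exclusion principle reproduces the inventory total (the symbols $N_{asc}$ , $N_{desc}$ , and $N_{both}$ are introduced here only for illustration):

N_{new} = N_{asc} + N_{desc} - N_{both} = 171 + 154 - 98 = 227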

3.2 Screening primary factor

Before training the coupled models, a correlation analysis was performed on the primary evaluation factors to prevent data redundancy due to high correlations, which could impact the models’ precision and the accuracy of landslide predictions. Pearson correlation coefficients were calculated to evaluate the relationships between the factors, as shown in Figure 5. The results reveal that the absolute correlation coefficient between surface roughness and slope exceeds 0.5, indicating a strong correlation. Excluding surface roughness led to an improvement of about 0.02 in the AUC value for each coupled model. As a result, surface roughness was excluded from the subsequent landslide susceptibility modeling.

Figure 5

Correlation matrix heatmap showing relationships between variables labeled EL, SL, AS, PLC, PRC, TWI, SR, RA, SCD, NDVI, DR, DTR, DF, FL, RFL, and LU. Color gradient from blue to red indicates correlation strength, with red representing positive and blue negative correlations. Statistical significance at $ p \leq 0.05 $ is marked with asterisks. A prominent correlation is between SL and SR at 0.66.

Figure 5. Pearson correlation coefficient graph.

3.3 IV and CF values of the second-level partition of each evaluation factor

Prior to calculating the IV and CF values for the secondary sub-regions of each factor, continuous factors should be classified using the natural breaks method, while discrete factors should be categorized according to their actual states. Following this, the respective areas and landslide counts within each classification interval of the factors are tallied. Subsequently, the IV and CF values for each classified factor are respectively computed based on Equation 2 and Equation 3.

The IV and CF values for secondary zones across different evaluation factors reflect their contribution to landslide occurrence, with higher values signifying a stronger influence. According to the calculation results, the conditions with the highest IV and CF values are: elevation ranging from 2590 to 2754 m, slope between 20° and 30°, north-facing slopes (337.5°–360°), plan curvature from −1 to 0, profile curvature between −5.8 and −3.8, topographic wetness index less than 4.9, topographic relief ranging from 71 to 109 m, surface cutting degree between 40 and 60 m, NDVI ranging from 0.31 to 0.39, distance to rivers between 400 and 800 m, distance to roads less than 400 m, distance to faults between 800 and 1200 m, lithology from Paleogene to recent, rainfall exceeding 651 mm/year, and land use as cultivated land, as detailed in Supplementary Material.

3.4 Model accuracy evaluation

Hyperparameter optimization is a critical step for enhancing the overall performance of machine learning models. It not only strengthens model robustness and generalization capabilities, effectively mitigating overfitting and improving training stability, but also significantly reduces computational resource consumption.

This study employed a strategy combining random search with fivefold cross-validation to identify the optimal hyperparameter configuration. Specifically, 2000 sets of hyperparameters were randomly sampled. For each set, a fivefold cross-validation procedure was performed: the training set was uniformly partitioned into five mutually exclusive subsets. Sequentially, models were trained on four subsets while the remaining subset served as the validation set for performance evaluation, rotating the validation set across each fold. After evaluating all parameter combinations, the system selected the hyperparameter set achieving the highest average AUC score across the five cross-validation rounds as the final configuration.
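The search loop described above can be sketched in plain Python as follows. A toy one-dimensional nearest-neighbour scorer stands in for the RF, SVM, and XGBoost models, and the synthetic data, the parameter range, and all function names are hypothetical; the demo uses 20 sampled configurations rather than 2000 to keep the run short.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k mutually exclusive folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        return 0.5  # undefined for a single-class fold; neutral value is an assumption
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def random_search_cv(fit_predict, sample_params, X, y, n_iter=2000, k=5, seed=0):
    """Return the sampled parameter set with the highest mean AUC over k folds."""
    rng = random.Random(seed)
    folds = k_fold_indices(len(X), k, seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = sample_params(rng)
        fold_aucs = []
        for f in range(k):
            val = folds[f]  # one fold validates, the other k - 1 folds train
            train = [i for g in range(k) if g != f for i in folds[g]]
            scores = fit_predict(params, [X[i] for i in train],
                                 [y[i] for i in train], [X[i] for i in val])
            fold_aucs.append(auc([y[i] for i in val], scores))
        mean_auc = sum(fold_aucs) / k
        if mean_auc > best_score:
            best_params, best_score = params, mean_auc
    return best_params, best_score

def knn_fit_predict(params, X_train, y_train, X_val):
    """Toy stand-in model: score = mean label of the k nearest training samples."""
    k = params["k"]
    scores = []
    for x in X_val:
        nearest = sorted(range(len(X_train)), key=lambda i: abs(X_train[i] - x))[:k]
        scores.append(sum(y_train[i] for i in nearest) / k)
    return scores

# Synthetic data: one feature, noisy labels (hypothetical)
data_rng = random.Random(1)
X = [data_rng.random() for _ in range(100)]
y = [1 if x + data_rng.gauss(0.0, 0.2) > 0.5 else 0 for x in X]
best_params, best_score = random_search_cv(
    knn_fit_predict, lambda rng: {"k": rng.randint(1, 15)}, X, y, n_iter=20)
```

The same loop applies to any model that can be wrapped in a `fit_predict` callable; with scikit-learn, an equivalent search is typically expressed through its randomized search utility rather than written by hand.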

The optimal hyperparameters of each model are shown in Table 2. Hyperparameter optimization and model training were implemented in Python 3.9 using the scikit-learn (sklearn) package.

Table 2. Main hyperparameters of the integrated model.

The IV and CF values derived from the information value method and the certainty factor method were used to train three machine learning models (RF, XGBoost, and SVM), yielding landslide susceptibility results for six integrated models. The ROC curves and ACC values for each integrated model are presented in Figure 6. Across the precision indices, the machine learning models coupled with the certainty factor method generally outperformed those coupled with the information value method. The AUC values of all integrated models exceeded 0.84, indicating strong fitting accuracy and predictive capability. Among them, the CF-XGBoost model achieved the highest AUC value, 0.916. The similar performance of the integrated models may be attributed to the resemblance between randomly generated non-landslide points and the environmental conditions of landslide points, minimizing the impact of subjective influences.

Graph (a) is a Receiver Operating Characteristic (ROC) curve comparing six models—CF-XGBoost, CF-RF, IV-XGBoost, IV-SVM, CF-SVM, and IV-RF—with their respective AUC values, showing model performance. Graph (b) is a bar chart illustrating the accuracy scores of the same models, ranging from 0.746 to 0.866.

Figure 6. ROC curve and ACC value. (a) represents the ROC curve, and (b) represents the ACC value.

Table 3 shows the comprehensive precision metrics for the six ensemble models. The CF-XGBoost model demonstrates relatively superior overall accuracy. However, all models exhibit substantially lower recall than the other precision metrics. This limitation arises from two primary factors: 1) the small number of positive-class instances (n = 167) in the training dataset, which hinders effective feature learning; and 2) the fixed decision threshold of 0.5, which imposes a stringent criterion for assigning the positive class. As a result, the models tend to minimize false positives while increasing the likelihood of false negatives in landslide detection. This is also the main reason why 9% of the new landslides occurred outside high-risk areas, as shown in Table 4. Nevertheless, the consistently high precision values indicate that positive predictions are reliable.

Table 3. All the metrics of six models.
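The effect of the decision threshold on precision and recall can be made concrete with a short sketch. The labels and probabilities below are invented for illustration and are not taken from Table 3; the helper name is hypothetical.

```python
def precision_recall(y_true, y_prob, threshold=0.5):
    """Precision and recall when probabilities >= threshold count as landslide."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels (1 = landslide) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.45, 0.32, 0.4, 0.35, 0.2, 0.1]

strict = precision_recall(y_true, y_prob, threshold=0.5)   # (1.0, 0.5)
relaxed = precision_recall(y_true, y_prob, threshold=0.3)  # precision 2/3, recall 1.0
```

In this toy case the 0.5 threshold yields perfect precision but misses half of the landslides, whereas a lower threshold recovers all of them at the cost of two false alarms, which mirrors the trade-off described above.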

In addition to using the above five precision metrics, SBAS-InSAR technology combined with optical remote sensing images was also employed to compare and analyze the spatial distribution of landslides identified by the integrated models and the predicted landslide susceptibility. This allowed for further validation of the models’ predictive accuracy. When comparing the landslide susceptibility results obtained from statistical analysis with the spatial distribution of newly detected landslides, as shown in Table 4, it is evident that most of the new landslides are concentrated in moderate, high, and very high-risk areas, with only a small fraction located in low-risk regions. The CF-XGBoost model predicted that 91% of the new landslides occurred in high-risk and very high-risk areas, with no new landslides found in low-risk areas, further confirming its superior performance in landslide prediction.

Table 4. Overlay analysis of landslide susceptibility results and new landslide data.
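An overlay of this kind reduces to counting how many new landslides fall in each susceptibility class. The sketch below assumes that the class at each landslide location has already been sampled from the susceptibility map; the ten labels are hypothetical and do not reproduce Table 4.

```python
from collections import Counter

CLASSES = ["very low", "low", "moderate", "high", "very high"]

def overlay_shares(landslide_classes):
    """Percentage of landslides falling in each susceptibility class."""
    counts = Counter(landslide_classes)
    total = len(landslide_classes)
    return {c: 100.0 * counts.get(c, 0) / total for c in CLASSES}

# Hypothetical class labels sampled at ten landslide locations.
sampled = ["very high"] * 6 + ["high"] * 3 + ["moderate"]
shares = overlay_shares(sampled)
high_or_above = shares["high"] + shares["very high"]
```

The combined share in the high and very high classes is the quantity reported as 91% for the CF-XGBoost model.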

3.5 Landslide susceptibility results

In this study, the natural breaks method was adopted to classify the landslide susceptibility indices predicted by the six integrated models into five classes: very low, low, moderate, high, and very high susceptibility. The landslide susceptibility thresholds corresponding to each class are shown in Table 5. The results demonstrate that the threshold structure defined by the natural breaks method enables the CF-XGBoost model to isolate very low-risk zones precisely and to distinguish very high-risk zones effectively, supporting its superior predictive performance. Compared with the other ensemble models, this threshold structure enhances the characterization of the spatial gradient of landslide probability, which has practical significance for geohazard risk management planning.

Table 5. Six integrated model landslide susceptibility thresholds.
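For readers unfamiliar with natural breaks, the idea can be sketched as an exact one-dimensional partition that minimizes the within-class sum of squared deviations. This is an assumption about the general technique, not a reproduction of the GIS tool used in the study, whose implementation may differ in detail. The dynamic programme below costs O(k·n²) and therefore suits a sample of index values rather than a full raster; the function names and data are hypothetical.

```python
import bisect

def natural_breaks(values, n_classes):
    """Upper bound of each class from a minimum-variance partition."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the squared deviation of any slice in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k classes;
    # cut[k][j]: start index of the last class in that best split.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], cut[k][j] = c, i
    # Walk back through the stored cuts to recover the class upper bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = cut[k][j]
    return bounds[::-1]

def classify(value, bounds):
    """Index of the class whose upper bound first reaches the value."""
    return min(bisect.bisect_left(bounds, value), len(bounds) - 1)

# Hypothetical susceptibility indices forming three obvious groups.
sample = [0.1, 0.12, 0.15, 0.5, 0.52, 0.55, 0.9, 0.92, 0.95]
bounds = natural_breaks(sample, 3)  # [0.15, 0.55, 0.95]
```

Because the breaks follow the distribution of each model's output, the thresholds in Table 5 differ from model to model.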

The landslide susceptibility results of each integrated model are illustrated in Figure 7. The findings show that areas to the north of Lijiaxia, Jianzha County, and Hualong County exhibit higher risk levels, corresponding with zones of landslide concentration. This is due to factors such as steep terrain, relatively high rainfall, a dense population along the Yellow River, frequent human engineering activities, and weak stratigraphic lithology, which collectively increase the likelihood of landslides. However, the susceptibility maps differ among the models, mainly because of variations in statistical methods and model characteristics. In terms of statistical methods, different approaches may result in significant discrepancies during feature extraction in the training phase, thereby affecting the composition of feature subsets used in model development. With regard to the machine learning models themselves, tree-based ensemble models such as RF and XGBoost are capable of effectively capturing and utilizing nonlinear relationships among features. In contrast, SVMs with linear kernel functions depend heavily on the linear separability of the input features. Moreover, key hyperparameters in tree-based models, such as the number of trees, maximum depth, and learning rate, have a direct impact on model performance. Similarly, the regularization parameter plays a critical role in determining the performance of SVMs with linear kernels. For instance, the CF-RF, CF-SVM, IV-RF, and IV-SVM models identify fewer low-risk regions but more areas in the medium to high-risk categories. On the other hand, the IV-XGBoost model identifies more low-risk areas but provides lower prediction accuracy for landslides. The CF-XGBoost model successfully predicts high-risk areas based on historical landslide data, with a strong alignment to actual landslide distributions.

Six panels of landslide susceptibility maps depicting different modeling techniques for a region along the Yellow River, with susceptibility levels from very low to very high. Panels (a), (b), (c) represent CF-XGBoost, CF-RF, CF-SVM models, and panels (d), (e), (f) represent IV-XGBoost, IV-RF, IV-SVM models. Colors indicate susceptibility, lakes are in blue, and new landslides are marked. Scale bars included.

Figure 7. Results of historical landslide susceptibility mapping. (a–f) respectively represent the model results of CF-XGBoost, CF-RF, CF-SVM, IV-XGBoost, IV-RF, and IV-SVM.

4 Discussion

4.1 Prediction of landslide susceptibility in the upper reaches of the Yellow River (from Gonghe to Xunhua section)

Using historical landslide data as the training sample set for the integrated model, it was found that the CF-XGBoost model demonstrated high accuracy and effective prediction performance. Consequently, this model was applied to predict landslide susceptibility in the upper reaches of the Yellow River (from Gonghe to Xunhua). When processing evaluation factors, only the normalized difference vegetation index and rainfall data from 2021 to 2023 were updated, while other factors remained consistent throughout the year. To ensure the authenticity of the existing landslide data and the accuracy of positive sample data, new landslide data identified by SBAS-InSAR technology and optical remote sensing images were selected as the training sample set.

Following model retraining with an increased number of positive samples (from 167 to 227), the CF-XGBoost model demonstrated a notable improvement in Recall, which increased from 0.818 to 0.911, as shown in Table 6. This indicates that after sufficiently learning the characteristics of the positive class, the model successfully captured a greater number of actual landslides. However, Precision experienced a slight decrease of 0.028. This decline is attributable to an inevitable increase in false positives (non-landslides incorrectly classified as landslides) as the model reduced the number of missed detections. Overall, the model exhibited improvements in its discriminative ability (AUC), overall accuracy (ACC), landslide detection performance (Recall), and comprehensive performance (F1-Score). These enhancements collectively indicate strengthened model generalization capability and stability.

Table 6. Comparison of model accuracy between historical samples and new samples.

Figure 8a illustrates the landslide susceptibility zoning results obtained by training the CF-XGBoost model with the new landslide data. Compared to the historical landslide susceptibility evaluations, Jianzha County and Hualong County remain in high-risk zones, but there is an increase in low-risk areas, with the distribution of high-risk zones becoming clearer, particularly concentrated in Jianzha, Hualong, and Xunhua Counties. This suggests that the use of older historical landslide data could lead to inaccuracies in identifying high-risk areas. Field surveys in a high-risk area of Jianzha County, shown in Figures 8b–f, further validated the model's ability to accurately identify the spatial distribution and landslide susceptibility of new landslides. The red area in Figure 8b highlights the sliding boundary, while Figure 8c shows subsidence on the slope surface, with the red area marking the active landslide front that has subsided by about 1 m. Figure 8d offers a closer look at Figure 8c, and Figures 8e,f display tensile fractures caused by the active landslides.

A series of images and a map showing landslide susceptibility near the Yellow River. Image (b) to (f) depict various views of landscapes with marked landslide areas and visible soil erosion. The map (a) illustrates susceptibility levels using color coding, ranging from very low (green) to very high (red), with indicators for new landslides and survey areas, highlighting lakes in blue and the river in light blue.

Figure 8. New landslide susceptibility results and detailed map of field investigation. (a) displays the prediction results of landslide susceptibility in the study area. (b) shows landslides verified through field investigation. (c) illustrates the front scarp of an active landslide, and (d) demonstrates a detailed view of settlement features in (c). (e,f) represent tensile fractures caused by the active landslide.

The landslide susceptibility prediction results indicate that urban development and infrastructure planning should prioritize avoiding areas of high and very high susceptibility, directing siting efforts toward zones of low and very low susceptibility. Furthermore, comprehensive factor importance analysis reveals that disaster prevention measures require enhanced implementation in high and very high susceptibility regions during concentrated rainfall seasons, particularly in areas exhibiting dense concentrations of high susceptibility zones such as Jianzha, Hualong, and Xunhua counties.

4.2 Feature importance analysis

Machine learning models not only offer strong predictive performance but also quantitatively assess the importance of each evaluation factor, providing insights into the contribution of various factors to landslide occurrence and facilitating the development of targeted preventive strategies. In this study, Weight is employed as the metric for calculating feature importance within the CF-XGBoost model. This metric quantifies feature importance by tallying the total number of times each feature is utilized as a split node across all trees in the ensemble. The weights of each factor calculated by the CF-XGBoost model are shown in Figure 9a. It can be observed that all factors play a significant role in landslide hazard development. According to the magnitude of factor weights, the three factors with greater impact on landslide hazards are rainfall, slope aspect, and stratigraphic lithology.
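The Weight metric described above amounts to counting split nodes. The sketch below makes that explicit on a hand-written toy ensemble whose trees are nested dictionaries. The tree structure, the factor names, and the normalization to shares are assumptions for illustration only. In the xgboost Python package the raw counts are available from `Booster.get_score(importance_type="weight")`.

```python
from collections import Counter

def split_counts(trees):
    """Count how often each feature is used as a split node across all trees."""
    counts = Counter()
    stack = list(trees)
    while stack:
        node = stack.pop()
        if "feature" in node:  # internal node; leaves carry only a value
            counts[node["feature"]] += 1
            stack.extend([node["left"], node["right"]])
    return counts

def normalized_weights(trees):
    """Split counts rescaled so that the importances sum to 1."""
    counts = split_counts(trees)
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()}

# Hypothetical two-tree ensemble; leaf values are placeholders.
leaf = {"value": 0.0}
tree_1 = {"feature": "rainfall",
          "left": {"feature": "aspect", "left": leaf, "right": leaf},
          "right": {"feature": "rainfall", "left": leaf, "right": leaf}}
tree_2 = {"feature": "lithology",
          "left": leaf,
          "right": {"feature": "rainfall", "left": leaf, "right": leaf}}

weights = normalized_weights([tree_1, tree_2])
# rainfall is used in 3 of the 5 splits, aspect and lithology in 1 each
```

A count-based measure rewards features that are split on often, regardless of how much each split improves the fit, which is one reason to cross-check it against a second model.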

Two horizontal bar charts compare evaluation factors based on scores. Chart (a) uses a blue gradient with RFL scoring highest at 0.203. Chart (b) uses a green gradient with RFL scoring highest at 0.262. Other factors are listed with descending scores.

Figure 9. Importance of evaluation factors. (a) represents the CF-XGBoost model, and (b) represents the CF-RF model.

To further identify the principal controlling factors influencing landslides, this study computed the feature importance ranking using the CF-RF model, as illustrated in Figure 9b. Consistent with the results from the CF-XGBoost model, rainfall, slope aspect, and lithology remain the most significant factors affecting landslide occurrence. However, a discrepancy exists between the two models regarding the relative importance ranking of slope aspect and lithology, which is likely attributable to differences in their hyperparameter configurations. Overall, the CF-XGBoost and CF-RF models exhibit a high degree of consistency in the ranking of all factor weights, with both confirming the predominant role of rainfall among the influencing factors.
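One way to quantify the consistency between two importance rankings is the Spearman rank correlation. The study does not report such a statistic, so the sketch below is only a suggestion, and the weights in it are placeholders rather than the values in Figure 9.

```python
def ranks(scores):
    """Rank factors from 1 (largest score) downward; ties are not handled."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}

def spearman(scores_a, scores_b):
    """Spearman rank correlation for two score dicts with the same keys."""
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ra)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in ra)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Placeholder weights for two models; only the ordering matters here.
model_a = {"rainfall": 0.30, "aspect": 0.25, "lithology": 0.20,
           "elevation": 0.15, "relief": 0.10}
model_b = {"rainfall": 0.35, "lithology": 0.25, "aspect": 0.20,
           "elevation": 0.12, "relief": 0.08}

rho = spearman(model_a, model_b)  # 0.9: only aspect and lithology swap places
```

A value close to 1 indicates that the two models order the factors almost identically even when individual weights differ.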

Analysis combined with the Supplementary Material shows that when annual rainfall exceeds 636 mm, rainfall promotes landslide occurrence, while north-south slope aspects have a significant influence on landslides. Furthermore, based on geotechnical mechanical properties, this study divides the lithology of the research area into four categories: hard, moderately hard, moderately weak, and weak, as shown in Figure 4c. Most new and old landslides occur in moderately weak lithology areas. This is because rainwater preferentially infiltrates south-facing slopes, further softening the already fragile lithology. After rainwater infiltrates north-facing slopes, low water evaporation leads to long-term high soil moisture content, continuously reducing the shear strength of geomaterials and increasing the sliding force. Therefore, the interaction among rainfall, slope aspect, and stratigraphic lithology significantly increases landslide hazard risk.

Secondary factors influencing landslides are elevation, topographic relief, and vegetation coverage. This is particularly significant within the ranges of elevation 2590–2754 m, topographic relief 71–109 m, and vegetation coverage (NDVI) 0.31–0.39. This occurs because gravitational potential energy increases with elevation difference, and root stabilization effectiveness weakens in areas with low vegetation coverage, leading to tension crack formation under the self-weight of geomaterials. Additionally, under rainfall infiltration, crack generation in slopes is accelerated by these combined influences. The influence of other factors on landslides is relatively small, demonstrating that landslides in the Upper Yellow River region are mainly controlled by the area’s unique geographical conditions. Human activity factors such as distance to roads and land use exhibit no significant effects on landslide movement.

4.3 Comparison with existing studies

4.3.1 Comparison of the spatial distribution of newly detected landslides with existing studies

Due to the extensive spatial coverage of the study area, field verification of all identified landslides across the entire region was impractical. Consequently, a comparative analysis with existing monitoring results was performed to validate the accuracy of the landslide detection outcomes presented in this study. The detected landslides are predominantly distributed in southeastern Longyang Gorge, north of Lijia Gorge, Jianzha County, and Hualong County. The comparative analysis revealed a high degree of consistency between the landslide detection results obtained in this study and those reported in previous research. For instance, Du et al. employed Stacking-InSAR integrated with optical remote sensing imagery to identify landslide distribution within the upper reaches of the Yellow River (Du et al., 2023). Similarly, Zhao et al. utilized SBAS-InSAR combined with optical remote sensing imagery to determine the precise geographical locations of landslides in this region (Zhao et al., 2022). Notably, the landslides identified by both research groups were also primarily located in southeastern Longyang Gorge, north of Lijia Gorge, Jianzha County, and Hualong County.

4.3.2 Comparison between model predictive results and existing studies

Existing research on landslide susceptibility assessment in the upper Yellow River remains limited. The most comparable study is that of Li et al. (2016), who employed the Analytic Hierarchy Process (AHP) to evaluate susceptibility in the Longyang Gorge to Gongboxia Gorge segment. Their results identified high susceptibility zones across the northwestern and southwestern sectors of Longyang Gorge. While the present study similarly detected localized high-susceptibility areas in these sectors, their spatial extent is significantly reduced relative to Li et al.'s findings, with low-susceptibility domains predominating. This discrepancy likely stems from methodological differences, divergent evaluation criteria, and temporal environmental variations.

4.3.3 Comparative analysis of the model’s landslide prediction performance with existing studies

The integrated model (CF-XGBoost) employed in this study demonstrated superior performance in landslide prediction. Predictive results on new landslide data revealed that 91% of the landslides were located within high-risk and very high-risk zones. Comparative analysis with existing research indicates that this performance remains highly competitive. For example, Zhu et al. (2024) evaluated how different non-landslide sample selection methods, specifically the whole-area random selection method, the buffer method, the frequency ratio method, and the Analytic Hierarchy Process (AHP), affected RF and XGBoost model performance in Huize County. Their optimal model (XGBoost-AHP) correctly predicted 85.03% of landslides. Yu et al. (2025) proposed a novel framework based on Dynamic Ensemble Selection (DES) to capture the spatial development patterns of different landslide types, conducting experiments in Wanzhou District, Chongqing, China. The DES model achieved an accuracy of 80.84% in classifying landslides into high-risk and very high-risk zones. Zhou et al. (2024) conducted a landslide susceptibility assessment for the Zigui-Badong section of China's Three Gorges Reservoir area using a coupled approach integrating ensemble learning and machine learning. Their best-performing integrated model (LR-MLP-Boosting) correctly identified 82.34% of landslide pixels as situated within high-risk and very high-risk zones.

In summary, the upper reaches of the Yellow River constitute a landslide-prone region in China, yet research on landslide susceptibility assessment in this area remains limited. Consequently, this study's approach integrating statistical methods, machine learning models, and SBAS-InSAR technology for landslide susceptibility evaluation in the upper Yellow River holds significant scientific merit. Furthermore, comparative analysis with existing studies reveals that the CF-XGBoost model employed in this work demonstrates superior landslide predictive performance.

4.4 Limitations and prospects

A primary limitation of this study stems from the temporal mismatch between the modeling and validation datasets: the historical landslide inventory covers the period 1998–2012, while the InSAR deformation observations used for model validation span 2021–2023. Environmental changes that may have occurred during this interval, such as land use transitions (e.g., urbanization, deforestation) and alterations in vegetation cover (e.g., degradation or succession), could reduce the model’s applicability to current conditions by modifying key landslide-controlling mechanisms. These mechanisms include root reinforcement, rainfall-infiltration-runoff interactions, and pore-water pressure dynamics. Such environmental variability may introduce systematic biases into model predictions, potentially leading to underestimation of current instability risks in areas with significant vegetation loss or, conversely, overestimation of risk in fundamentally altered environments. Therefore, although the model primarily reflects landslide occurrence patterns under historical environmental conditions, direct application of its predictions to interpret InSAR observations from 2021 to 2023 should be approached cautiously and integrated with concurrent assessments of environmental change. Future research should incorporate time-series remote sensing data (e.g., on land use and vegetation cover dynamics) to update model parameters and develop dynamic risk assessment frameworks compatible with near-real-time InSAR monitoring.

Evaluation factor selection strongly influences machine learning model accuracy. This study initially considered 16 potential landslide-influencing factors. The application of Pearson's correlation coefficient method led to the exclusion of surface roughness, resulting in 15 causative factors for model training. However, data availability limitations precluded the incorporation of certain factors. For instance, earthquakes, as natural and uncontrollable phenomena, frequently trigger numerous landslides. Thus, future studies should prioritize earthquake-related factors to enhance the comprehensiveness and robustness of the analysis.
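Correlation screening of this kind can be sketched as follows. The 0.8 cutoff, the generic factor names, and the sample values are all assumptions for illustration; the text above gives neither the threshold used nor the factor with which surface roughness was correlated.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlated_pairs(factors, threshold=0.8):
    """List factor pairs whose absolute correlation exceeds the threshold."""
    names = sorted(factors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(factors[a], factors[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical factor values sampled at five locations.
factors = {
    "factor_a": [5, 12, 20, 28, 35],
    "factor_b": [1.0, 1.02, 1.06, 1.13, 1.22],  # rises with factor_a
    "factor_c": [0.4, 0.1, 0.35, 0.2, 0.3],     # unrelated to the others
}
flagged = correlated_pairs(factors)
```

Only the pair factor_a and factor_b is flagged here, so one of the two would be dropped before model training.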

Additionally, the natural breaks method was used to partition the susceptibility index predicted by the integrated model; it maximizes inter-group differences while minimizing intra-group variation. Future research should explore alternative partitioning methods to achieve more realistic zoning.

5 Conclusion

This study aimed to enhance landslide hazard prediction in the upper reaches of the Yellow River (Gonghe to Xunhua section) by obtaining high-precision and accurate landslide susceptibility evaluation results. Historical landslide data were used to train six integrated models (IV-RF, IV-SVM, IV-XGBoost, CF-RF, CF-SVM, CF-XGBoost), and each model’s accuracy was assessed using ROC curves and ACC values. New landslide data, identified through SBAS-InSAR technology and optical remote sensing images, were then overlaid with the susceptibility results from each model. The model that performed best, CF-XGBoost, was analyzed further. The results from this model, based on the new landslide data, were used as a key factor in predicting landslide occurrences in the study area. The key conclusions are as follows:

1 The CF-XGBoost model provided the highest accuracy among the six integrated models, with an AUC value of 0.916. Overlaying the model’s predictions with new landslide data showed a high degree of accuracy, with 91% of new landslides identified in high-risk and very high-risk areas, and no landslides detected in low-risk areas.

2 Spatial differences were observed between the susceptibility results based on historical data and those using new landslide data. The models using new landslide data more accurately reflected actual conditions, whereas those based on historical data tended to misidentify high-risk areas due to long-term landslide control, which led to inaccurate positive sample data for training.

3 The landslide susceptibility evaluation indicated that the highest-risk areas are concentrated in Jianzha County, Hualong County, and Xunhua County. Based on the factor weights, natural geographical conditions are the primary drivers of landslide occurrence, with rainfall being the most significant external factor. As such, landslide prevention efforts should be intensified in these counties during the rainy season.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

JZ: Writing – original draft, Methodology, Visualization, Investigation, Writing – review and editing. WT: Formal Analysis, Investigation, Supervision, Writing – review and editing. XW: Investigation, Writing – review and editing. XZ: Investigation, Writing – review and editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was funded by Qinghai Institute of Technology “Kunlun Talent” Talent Introduction Research Project (2023-QLGKLYCZX-25).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feart.2025.1652646/full#supplementary-material

References

Akinci, H. (2022). Assessment of rainfall-induced landslide susceptibility in Artvin, Turkey using machine learning techniques. J. Afr. Earth Sci. 191, 104535. doi:10.1016/j.jafrearsci.2022.104535

Bhandary, N. P., Dahal, R. K., Timilsina, M., and Yatabe, R. (2013). Rainfall event-based landslide susceptibility zonation mapping. Nat. Hazards 69, 365–388. doi:10.1007/s11069-013-0715-x

Ding, D., Wu, Y., Wu, T., and Gong, C. (2025). Landslide susceptibility assessment in Tongguan District, Anhui, China using information value and certainty factor models. Sci. Rep. 15, 12275. doi:10.1038/s41598-025-93704-z

Dong, G., Zhang, F., Liu, F., Zhang, D., Zhou, A., Yang, Y., et al. (2018). Multiple evidences indicate no relationship between prehistoric disasters in Lajia site and outburst flood in upper Yellow River valley, China. Sci. China Earth Sci. 61, 441–449. doi:10.1007/s11430-017-9079-3

Dou, H., Huang, S., Jian, W., and Wang, H. (2023). Landslide susceptibility mapping of mountain roads based on machine learning combined model. J. Mt. Sci. 20, 1232–1248. doi:10.1007/s11629-022-7657-2

Du, J., Li, Z., Song, C., Zhu, W., Ji, Y., Zhang, C., et al. (2023). InSAR-based active landslide detection and characterization along the upper reaches of the Yellow River. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 16, 3819–3830. doi:10.1109/JSTARS.2023.3263003

Fei, X., Tian, Y., Zhao, C., Liu, H., and Chen, H. (2023). Identification and deformation monitoring of unstable slopes in Longyangxia Reservoir area, the upper reach of Yellow River, China based on multi-temporal InSAR technology. J. Earth Sci. Environ. 45 (03), 578–589. doi:10.19814/j.jese.2022.11042

Gu, T., Duan, P., Wang, M., Li, J., and Zhang, Y. (2024). Effects of non-landslide sampling strategies on machine learning models in landslide susceptibility mapping. Sci. Rep. 14, 7201. doi:10.1038/s41598-024-57964-5

Guo, F., Wu, D., Ge, M., Dong, J., Fang, H., and Tian, D. (2024). The influence of continuous variable factor classification and machine learning model on the accuracy of landslide susceptibility evaluation. Inf. Sci. Wuhan. Univ. doi:10.13203/j.whugis20230413

He, Y., Wang, W., Zhang, L., Chen, Y., Chen, Y., Chen, B., et al. (2023). An identification method of potential landslide zones using InSAR data and landslide susceptibility. Geomat. Nat. Hazards Risk 14 (1), 2185120. doi:10.1080/19475705.2023.2185120

Hong, H., Wang, D., Zhu, A., and Wang, Y. (2024). Landslide susceptibility mapping based on the reliability of landslide and non-landslide sample. Expert Syst. Appl. 243, 122933. doi:10.1016/j.eswa.2023.122933

Huang, F., Cao, Z., Guo, J., Jiang, S., Li, S., and Guo, Z. (2020). Comparisons of heuristic, general statistical and machine learning models for landslide susceptibility prediction and mapping. CATENA 191, 104580. doi:10.1016/j.catena.2020.104580

Huang, W., Ding, M., Li, Z., Zhuang, J., Yang, J., Li, X., et al. (2022). An efficient user-friendly integration tool for landslide susceptibility Mapping Based on Support Vector Machines: SVM-LSM toolbox. Remote Sens. 14, 3408. doi:10.3390/rs14143408

Huang, F., Xiong, H., Yao, C., Catani, F., Zhou, C., and Huang, J. (2023). Uncertainties of landslide susceptibility prediction considering different landslide types. J. Rock Mech. Geotech. 15, 2954–2972. doi:10.1016/j.jrmge.2023.03.001

Jia, H., Wang, Y., Ge, D., Deng, Y., and Wang, R. (2022). InSAR Study of landslides: early detection, Three-Dimensional, and long-term surface displacement Estimation—A case of Xiaojiang River Basin, China. Remote Sens. 14, 1759. doi:10.3390/rs14071759

Jiang, Z., Zhao, C., Yan, M., Wang, B., and Liu, X. (2022). The early identification and spatio-temporal characteristics of loess landslides with SENTINEL-1A datasets: a case of Dingbian County, China. Remote Sens. 14, 6009. doi:10.3390/rs14236009

Li, Y., Zhu, H., and Chen, S. (2016). Landslide hazard assessment in the upper reaches of Yellow River based on AHP Method. Sci. Surv. Mapp. 41 (08), 67–70+75. doi:10.16251/j.cnki.1009-2307.2016.08.014

Li, B., Liu, K., Wang, M., He, Q., Jiang, Z., Zhu, W., et al. (2022). Global dynamic rainfall-induced landslide susceptibility mapping using machine learning. Remote Sens. 14, 5795. doi:10.3390/rs14225795

Li, C., Liu, Y., Lai, S., Wang, D., He, X., and Liu, Q. (2024a). Landslide susceptibility analysis based on the coupling model of logistic regression and support vector machine. J. Nat. Disasters 33 (02), 75–86. doi:10.13577/j.jnd.2024.0208

Li, Z., Leng, L., Sun, Y., Huo, Y., and He, Y. (2024b). Landslide susceptibility assessment in the river cascade development basin based on the IV-LM coupling model. Bull. Surv. Mapp., 237–241. doi:10.13474/j.cnki.11-2246.2024.S147

Liu, Y., and Chen, C. (2024). Landslide susceptibility evaluation method considering spatial heterogeneity and feature selection. Acta Geod. Cartogr. Sinica 53 (7), 1417–1428.

Lu, J., He, Y., Zhang, L., Zhang, Q., Gao, B., Chen, H., et al. (2024). Ensemble learning landslide susceptibility assessment with optimized non-landslide samples selection. Geomat. Nat. Hazards Risk 15 (1), 2378176. doi:10.1080/19475705.2024.2378176


Lv, Z., Wang, S., Yan, S., Han, J., and Zhang, G. (2024). Landslide susceptibility assessment based on multisource remote sensing considering inventory quality and modeling. Sustainability 16, 8466. doi:10.3390/su16198466


Mao, Y., Mwakapesa, D., Wang, G., Nanehkaran, Y., and Zhang, M. (2021). Landslide susceptibility modelling based on AHC-OLID clustering algorithm. Adv. Space Res. 68, 301–316. doi:10.1016/j.asr.2021.03.014


Nguyen, D., Tiep, N., Bui, Q., Le, H., Prakash, I., Costache, R., et al. (2025). Landslide susceptibility mapping using RBFN-based ensemble machine learning models. Comput. Model. Eng. Sci. 142 (1), 467–500. doi:10.32604/cmes.2024.056576


Pareek, T., Bhuyan, K., Westen, C. V., Rajaneesh, A., Sajinkumar, K. S., and Lombardo, L. (2025). Analyzing the posterior predictive capability and usability of landslide susceptibility maps: a case of Kerala, India. Landslides 22, 655–670. doi:10.1007/s10346-024-02389-4


Qi, T., Meng, X., and Zhao, Y. (2024). Landslide susceptibility assessment in active tectonic areas using machine learning algorithms. Remote Sens. 16, 2724. doi:10.3390/rs16152724


Rohan, T., Shelef, E., Mirus, B., and Coleman, T. (2023). Prolonged influence of urbanization on landslide susceptibility. Landslides 20, 1433–1447. doi:10.1007/s10346-023-02050-6


Sabatakakis, N., Koukis, G., Vassiliades, E., and Lainas, S. (2014). Landslide susceptibility zonation in Greece. Nat. Hazards 65, 523–543. doi:10.1007/s11069-012-0381-4


Shi, X., Yang, C., Zhang, L., Jiang, H., Liao, M., Zhang, L., et al. (2019). Mapping and characterizing displacements of active loess slopes along the upstream Yellow River with multi-temporal InSAR datasets. Sci. Total Environ. 674, 200–210. doi:10.1016/j.scitotenv.2019.04.140


Su, C. (2023). Study on the risk evaluation of geoenvironmental hazards in the mountainous areas of southern Ningxia in the middle and upper reaches of the Yellow River (master’s thesis). Chang'an University. doi:10.26976/d.cnki.gchau.2023.001209


Tu, K., Ye, S., Zou, J., Hua, C., and Guo, J. (2023). InSAR displacement with high-resolution optical remote sensing for the early detection and deformation analysis of active landslides in the upper Yellow River. Water 15, 769. doi:10.3390/w15040769


Umar, Z., Pradhan, B., Ahmad, A., Jebur, M., and Tehrany, M. (2014). Earthquake induced landslide susceptibility mapping using an integrated ensemble frequency ratio and logistic regression models in West Sumatera Province, Indonesia. CATENA 118, 124–135. doi:10.1016/j.catena.2014.02.005


Wang, X., and Bai, S. (2023). Landslide susceptibility mapping and interpretation in the upper Minjiang River Basin. Remote Sens. 15, 4947. doi:10.3390/rs15204947


Wang, J., Jaboyedoff, M., Chen, G., Luo, X., Derron, M., Hu, Q., et al. (2024a). Landslide susceptibility prediction and mapping using the LD-BiLSTM model in seismically active mountainous regions. Landslides 21, 17–34. doi:10.1007/s10346-023-02141-4


Wang, L., Lv, G., Du, J., Zhu, J., Zhao, G., Wang, D., et al. (2024b). InSAR detection and spatiotemporal characteristics of active landslides in the Maqin section of the upper Yellow River. Inf. Sci. Wuhan. Univ. doi:10.13203/j.whugis20240490


Wang, Q., Xiong, J., Cheng, W., Cui, X., Pang, Q., Liu, J., et al. (2024c). Landslide susceptibility mapping methods coupling with statistical methods, machine learning models and clustering algorithms. J. Geo-Inf. Sci. 26 (3), 620–637. doi:10.12082/dqxxkx.2024.230427


Wu, X., Shen, S., and Niu, R. (2016). Landslide susceptibility prediction using GIS and PSO-SVM. Inf. Sci. Wuhan. Univ. 41 (05), 665–671. doi:10.13203/j.whugis20130566


Xing, X., Wu, C., Li, J., Li, X., Zhang, L., and He, R. (2021). Susceptibility assessment for rainfall-induced landslides using a revised logistic regression method. Nat. Hazards 106, 97–117. doi:10.1007/s11069-020-04452-4


Xing, Y., Huang, S., Yue, J., Chen, Y., Xie, W., Wang, P., et al. (2023). Patterns of influence of different landslide boundaries and their spatial shapes on the uncertainty of landslide susceptibility prediction. Nat. Hazards 118, 709–727. doi:10.1007/s11069-023-06025-7


Yang, C., Liu, L., Huang, F., Huang, L., and Wang, X. (2023a). Machine learning-based landslide susceptibility assessment with optimized ratio of landslide to non-landslide samples. Gondwana Res. 123, 198–216. doi:10.1016/j.gr.2022.05.012


Yang, S., Li, D., Liu, Y., Xu, Z., Sun, Y., and She, X. (2023b). Landslide identification in human-modified alpine and canyon area of the Niulan River Basin based on SBAS-InSAR and optical images. Remote Sens. 15, 1998. doi:10.3390/rs15081998


Yang, X., Fan, X., Wang, K., and Zhou, Z. (2024). Research on landslide susceptibility prediction model based on LSTM-RF-MDBN. Environ. Sci. Pollut. Res. 31, 1504–1516. doi:10.1007/s11356-023-31232-x


Yu, L., Pradhan, B., and Wang, Y. (2025). A comparative study of various combination strategies for landslide susceptibility mapping considering landslide types. Geosci. Front. 16 (2), 101999. doi:10.1016/j.gsf.2024.101999


Zhang, Z., Deng, M., Xu, S., Zhang, Y., Fu, H., and Li, Z. (2022). Comparison of landslide susceptibility assessment models in Zhenkang County, Yunnan Province, China. Chin. J. Rock Mech. Eng. 41 (01), 157–171. doi:10.13722/j.cnki.jrme.2021.0360


Zhang, Y., Xu, P., Liu, J., He, J., Yang, H., Zeng, Y., et al. (2023). Comparison of LR, 5-CV SVM, GA SVM, and PSO SVM for landslide susceptibility assessment in Tibetan Plateau area, China. J. Mt. Sci. 20, 979–995. doi:10.1007/s11629-022-7685-y


Zhao, S., Zeng, R., Zhang, Z., Wang, H., and Meng, X. (2022). Early identification and influencing factors of potential landslides in the upper reaches of the Yellow River, China. Mt. Res. 40 (2), 249–264. doi:10.16089/j.cnki.1008-2786.000669


Zhou, C., Wang, Y., Cao, Y., Singh, R. P., Ahmed, B., Motagh, M., et al. (2024). Enhancing landslide susceptibility modelling through a novel non-landslide sampling method and ensemble learning technique. Geocarto Int. 39 (1), 2327463. doi:10.1080/10106049.2024.2327463


Zhu, Y., Sun, D., Wen, H., Zhang, Q., Ji, Q., Li, C., et al. (2024). Considering the effect of non-landslide sample selection on landslide susceptibility assessment. Geomat. Nat. Hazards Risk 15 (1), 2392778. doi:10.1080/19475705.2024.2392778


Keywords: upper Yellow River (China), statistical approaches, machine learning, SBAS-InSAR technology, landslide susceptibility assessment

Citation: Zeng J, Tuo W, Wang X and Zhao X (2025) Landslide susceptibility assessment of upper Yellow River using coupling statistical approaches, machine learning algorithms and SBAS-InSAR technique. Front. Earth Sci. 13:1652646. doi: 10.3389/feart.2025.1652646

Received: 24 June 2025; Accepted: 04 August 2025; Published: 29 August 2025.

Edited by:

Chong Xu, Ministry of Emergency Management, China

Reviewed by:

Bo Liu, China University of Geosciences, China
Yang Dongxu, Chengdu University of Technology, China
Zhihan Wang, Yangtze University, China

Copyright © 2025 Zeng, Tuo, Wang and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Wanbing Tuo, wbtuo@qhit.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.