ORIGINAL RESEARCH article

Front. Environ. Sci., 13 June 2022

Sec. Environmental Informatics and Remote Sensing

Volume 10 - 2022 | https://doi.org/10.3389/fenvs.2022.897254

Application of a Novel Hybrid Machine Learning Algorithm in Shallow Landslide Susceptibility Mapping in a Mountainous Area

  • 1. Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran

  • 2. Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran

  • 3. Department of Civil, Environmental and Natural Resources Engineering, Lulea University of Technology, Lulea, Sweden

  • 4. Research Institute of Forests and Rangelands, Agricultural Research, Education and Extension Organization (AREEO), Tehran, Iran

  • 5. Research Geomorphologies, Ministry of Forests, Lands, Natural Resource Operations and Rural Development, Prince George, BC, Canada

  • 6. Department of Earth and Environment, Institute of Environment, Florida International University, Miami, FL, United States

  • 7. The Center for Artificial Intelligence and Environmental Sustainability (CAIES) Foundation, Bihar, India

  • 8. Department of Geoinformation, Faculty of Built Environment and Surveying, University Technology Malaysia (UTM), Johor Bahru, Malaysia

Abstract

Landslides can be a major challenge in mountainous areas that are influenced by climate and landscape changes. In this study, we propose a hybrid machine learning model, called RoFRF, based on a rotation forest (RoF) meta classifier and a random forest (RF) decision tree classifier for landslide prediction in a mountainous area near Kamyaran city, Kurdistan Province, Iran. We used 118 landslide locations and 25 conditioning factors, whose predictive usefulness was measured using the chi-square technique in a 10-fold cross-validation analysis. We used the sensitivity, specificity, accuracy, F1-measure, Kappa, and area under the receiver operating characteristic curve (AUC) to validate the performance of the proposed model against the Artificial Neural Network (ANN), Logistic Model Tree (LMT), Best First Tree (BFT), and RF models. The validation results demonstrated that the landslide susceptibility map produced by the hybrid model had the highest goodness-of-fit (AUC = 0.953) and higher prediction accuracy (AUC = 0.919) compared to the benchmark models. The hybrid RoFRF model proposed in this study can be used as a robust predictive model for landslide susceptibility mapping in mountainous regions around the world.

1 Introduction

In recent years, population growth and urban development have contributed to an increase in natural disasters in both developed and developing countries (Huppert and Sparks, 2006), although the impacts are generally more serious in developing countries (Alcántara-Ayala, 2002). Landslides are among the most common damaging geohazards, especially in mountainous regions. Landslide is a general term for a variety of mass movements in soil or rock moving downslope by gravity (Malamud et al., 2004). Over the last century (1903–2004), landslides alone accounted for 17% of the world’s natural disasters, with the highest annual damage in Europe estimated at US$17 million (Koehorst et al., 2005). Landslides have resulted in hundreds of billions of dollars in damages to the built environment, thousands of fatalities, and numerous environmental impacts and are an important driver of landscape change (Aleotti and Chowdhury, 1999; Schuster and Highland, 2001; Geertsema et al., 2009; Fan et al., 2019; Kadirhodjaev et al., 2020). In developing countries, more than 0.5% of the Gross Domestic Product (GDP) is lost every year due to landslides (Chen et al., 2015).

The topic of landslide susceptibility mapping (LSM) has become increasingly popular over the last decade and is being continuously fine-tuned to mitigate landslide hazards, inform land use planning, and improve the prediction accuracy of upcoming landslides. The core idea of LSM has been to explore the association between historical landslides and different causing factors in order to predict the likelihood of upcoming landslides. To analyze these associations, researchers have suggested and used many methods that range from simple and straightforward expert-based and statistical methods to advanced and complex methods derived from machine learning. The expert knowledge methods such as analytical hierarchy process (AHP) (Althuwaynee et al., 2016) and spatial multicriteria evaluation (SMCE) (Meena et al., 2019) and bivariate and multivariate statistics such as frequency ratio (FR) (Chen et al., 2015), weights of evidence (WoE) (Razavizadeh et al., 2017), weighted linear combinations (WLC) (Hung et al., 2016), statistical index (SI) (Razavizadeh et al., 2017), certainty factor (CF) (Wang et al., 2019), index of entropy (IOE) (Chen et al., 2015), and logistic regression (LR) (Sun et al., 2021b) are the first generation methods used for LSM worldwide. These methods have clear processes, and their outcomes are easy to understand and interpret.

The next generation of the methods used for LSM has originated from machine learning, which involves hundreds of algorithms such as artificial neural network (ANN) (Lucchese et al., 2021), adaptive neurofuzzy inference system (ANFIS) (Jaafari et al., 2017), random forest (RF) (Park and Kim, 2019), support vector machine (SVM) (Yao et al., 2008), decision tree (DT) (Dou et al., 2019a), Naïve Bayes (NB) (Nguyen and Kim, 2021), Bayesian logistic regression (BLR) (Abedini et al., 2019), best first decision tree (BFT) (Chen et al., 2018), and deep learning neural network (Ghasemian et al., 2022). With the improvement of artificial intelligence, machine learning has become the most widely applied approach for LSM.

The other stream of research on machine learning modeling of LSM has combined different methods/algorithms to achieve more accurate prediction results. For example, Pham et al. (2017) reported that the rotation forest (RoF) technique improved the predictive ability of the Naïve Bayes tree for landslide prediction. Nguyen et al. (2017) showed that the landslide predictive ability of the instance-based learning classifier can be improved by the RoF technique. He et al. (2019) combined the Credal Decision Tree with the RoF technique and achieved improved accuracy for landslide prediction. In a recent study, Fang et al. (2021) demonstrated that the performance of the decision tree models could be significantly improved when they were integrated with the RoF technique. The key advantage of the RoF technique as a meta classifier is that it can balance accuracy and diversity and decrease bias and overfitting of the modeling process.

Reliability and accuracy of future probabilities are the most important characteristics of a landslide susceptibility map. While machine learning is now widely used in LSM, there is no best method for accurately predicting landslides, especially not for the regions of varying levels of geoenvironmental complexity. However, the experience, to date, suggests comparing several different methods and selecting the optimal one to generate an accurate landslide susceptibility map for a given region. Furthermore, the evaluation of the usefulness of different conditioning factors via feature screening techniques and the optimization of different methods in terms of parameters are other important subjects in the field of landslide modeling (Sun et al., 2021a; Zhou X. et al., 2021).

Iran’s vast mountainous areas have been shaped and modified by ongoing tectonic forces, producing faults, fractures, and sensitive lithology, priming the country for landslides (Shafizadeh-Moghadam et al., 2019). Increased development, industrial and agricultural activity, and human encroachment on the natural environment, driven by extensive land use change in forested areas in recent decades, have increased the vulnerability to landslides. Susceptibility mapping and an understanding of landslide mechanisms are necessary to reduce or control landslide damage.

In this study, we combined a metaclassifier algorithm with a standalone algorithm as a base classifier to increase the predictive power of the base classifier by reinforcing the parameters used in the model during the calibration phase. The main contribution of our study is to explore how the RoF classifier and a random forest (RF) decision tree classifier generate a hybrid predictive model, called RoFRF, that provides an opportunity to pilot hybrid modeling of LSM and insights into feature screening and selection and parameter optimization. We developed the hybrid RoFRF model using the datasets belonging to Asadi et al. (2022) and Ghasemian et al. (2022) from the Kamyaran area in the Kurdistan Province, Iran, but with a different set of algorithms and results.

2 Study Area

The study area is in the southwest of the Kurdistan Province and covers an area of about 150 km² (Figure 1). The minimum and maximum elevations of the study area are 850 and 2,328 m, respectively, with a height difference of 1,478 m (Figure 1). The average annual rainfall for the period from 2001 to 2019 ranges from 438 to 560 mm and the average annual temperature is 14.15°C. Based on the De Martonne climatic classification index, the climate of the study area is semi-arid (Asadi et al., 2022). Bedrock geology belongs to the structural zone of Sanandaj-Sirjan and the high Zagros zone, typified by basalts and shales with intercalations of volcanic rocks. The six main land cover classes are dry farming, semi-dense forest, low-dense forest, semi-dense pasture, dense pasture, and woodland. The predominant land covers in the study area are semi-dense forests and dry farming (Ghasemian et al., 2022).

FIGURE 1

The study area is affected by active faults of the Zagros Main Recent Fault and is underlain by formations such as marl and shale. These conditions, together with topographical and geomorphological factors (steep slopes and soil erosion) and improper human interference (e.g., road construction on the Kashtar to Yozidar route and the removal of slope bases), have resulted in landslides and make the study area one of the most susceptible regions of the Kurdistan Province and the country (Asadi et al., 2022). Landslides in the study area are typically shallow, with rupture surfaces less than 2–3 m deep. Figure 2 shows photos of a number of the landslides that occurred in the study area.

FIGURE 2

3 Methodology

The methodology of this research is shown in Figure 3. We selected 25 conditioning factors for the terrain hosting landslides derived from the topographical, geological and land cover maps, meteorological data, digital elevation model (DEM), documentary sources, field surveys, and Google Earth Imagery. We classified our landslides into two groups; training landslides (80%) and validation landslides (20%) (Xie et al., 2021a; Xie et al., 2021b). We then developed the hybrid RoFRF model and compared its performance to the four benchmark models including RF, ANN, BFT, and logistic model tree (LMT) using area under ROC and other statistical measures. The main steps of the methodology are described in the following subsections.

FIGURE 3

3.1 Data Collection

3.1.1 Landslide Inventory

From a total of 118 landslide points detected in the study area, we subdivided the landslides into two datasets with 80% (94 landslides) in a training dataset and 20% (24 landslides) in a validation dataset. These 118 landslides were selected from the 175 landslides that have been previously used by Asadi et al. (2022) and Ghasemian et al. (2022).
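The 80/20 partition described above can be sketched in a few lines. This is only an illustration: the random shuffle, the seed value, and the function name `split_inventory` are assumptions, since the text does not say how the 94 training and 24 validation landslides were chosen.

```python
import random

def split_inventory(n_landslides=118, train_fraction=0.8, seed=1):
    """Randomly split landslide indices into training and validation sets.

    The shuffle and the seed are illustrative assumptions; the source only
    reports the resulting sizes (94 and 24).
    """
    indices = list(range(n_landslides))
    rng = random.Random(seed)            # fixed seed for a reproducible split
    rng.shuffle(indices)
    n_train = round(n_landslides * train_fraction)   # 118 * 0.8 = 94.4 -> 94
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_ids, valid_ids = split_inventory()
print(len(train_ids), len(valid_ids))
```

Rounding 94.4 down to 94 reproduces the dataset sizes reported in the text.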

3.1.2 Landslide Conditioning Factors

In this study, we selected 25 landslide conditioning factors: elevation, slope, aspect, annual solar radiation, curvature, plan curvature, profile curvature, valley depth, vector ruggedness measure (VRM), topographic wetness index (TWI), stream power index (SPI), slope length (LS), topographic position index (TPI), terrain ruggedness index (TRI), normalized difference vegetation index (NDVI), land use, lithology, soil texture, rainfall, fault density, road density, river density, and distance to faults, roads, and rivers (Table 1).

TABLE 1

| Category | Parameter | Scale/resolution |
|---|---|---|
| Topographic map | Elevation | ALOS PALSAR DEM resampled into 10 m |
| | Slope | 10 m |
| | Aspect | 10 m |
| | Curvature | 10 m |
| | Plan curvature | 10 m |
| | Profile curvature | 10 m |
| | Annual solar radiation | 10 m |
| | Slope length (SL) | 10 m |
| | Valley depth (VD) | 10 m |
| Geological map | Geology | Geo-map, 1:100,000 |
| | Distance from fault | Geo-map, 1:100,000 |
| | Faults density | |
| Hydrological map | Topographic Wetness Index (TWI) | 10 m |
| | Stream Power Index (SPI) | |
| | Distance to rivers (m) | 10 m |
| | River density | |
| | Rainfall (mm) | Average rainfall data of different stations |
| Land cover map | Soil type | Projected soil map (1:25,000) |
| | Land use | Iranian land use map |
| | VRM | |
| | TRI | |
| | TPI | 10 m |
| | NDVI | Landsat 8 |
| | Distance to Roads (m) | Topographical map, 1:25,000, Google Earth image |
| | Road density | |

Data layers considered as effective factors in the analysis.

Topographic factors (slope, aspect, elevation, profile curvature, plan curvature, and slope length), which have been previously identified as the most influential landslide causing factors, were derived from a DEM of the study area (Zhang et al., 2019b; Wang et al., 2021). Another DEM-derived factor was annual solar radiation, which may affect the incidence of landslides. The land use and land cover were visually interpreted using high-resolution satellite imagery. NDVI can show surface reflectance of the area and yield quantitative estimates of the vegetation biomass and growth (Li J. et al., 2021; Liu et al., 2022), which can influence landslides. TWI, SPI, TPI, VRM, and valley depth are secondary DEM products that have been widely used for landslide prediction modeling (Li and Zhang, 2008; Zhang et al., 2019a; Dou et al., 2019b; Lan et al., 2021; Zhao et al., 2021). We incorporated the mean annual rainfall data from 2001 to 2019 to generate a rainfall map using the inverse distance weighted (IDW) method (Chao et al., 2021; de Jesus et al., 2021). The strength and permeability of soils and rocks are controlled in part by structural variations and lithological formations (Jiang et al., 2021). We extracted lithological units from the geological maps (1:100,000 scale). Soil texture has a significant impact on landslide occurrence (Geertsema et al., 2009). The density maps, which include fault density, road density, and river density, were prepared using GIS-based techniques to quantify their effects (Yin et al., 2022b; Chen et al., 2022) on landslide occurrences. The distance maps, which include distance to rivers, distance to roads, and distance to faults, were also prepared using GIS-based techniques (Lee et al., 2017) (Table 2).
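As an illustration of the IDW interpolation mentioned for the rainfall map, the sketch below estimates a value at one location from surrounding stations. The station coordinates and values are hypothetical, and the exponent `power=2` is an assumed common default; the text does not report the exponent or the software used.

```python
import math

def idw(x, y, stations, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations: list of (sx, sy, value) tuples.
    power: distance exponent (2 is an assumed default, not from the source).
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value              # exactly on a station: use its value
        w = 1.0 / d ** power          # closer stations get larger weights
        num += w * value
        den += w
    return num / den

# Hypothetical stations: coordinates in km, mean annual rainfall in mm.
stations = [(0.0, 0.0, 438.0), (10.0, 0.0, 560.0), (0.0, 10.0, 500.0)]
estimate = idw(5.0, 0.0, stations)
print(round(estimate, 1))
```

Because the weights are positive and normalized, every interpolated value lies between the smallest and largest station values.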

TABLE 2

| Factors | Classes |
|---|---|
| Slope (°) | (1) 0–13; (2) 14–22; (3) 23–30; (4) 31–42; and (5) >43 |
| Aspect | (1) flat; (2) north; (3) north east; (4) east; (5) south east; (6) south; (7) south west; (8) west; and (9) north west |
| Elevation (m) | (1) 850–1,000, (2) 1,000–1,200, (3) 1,200–1,400, (4) 1,400–1,600, (5) 1,600–1,800, (6) 1,800–2,000, (7) 2,000–2,200, and (8) 2,200–2,400 |
| Curvature | (1) highly concave (−51.20)–(−3.79), (2) concave (−3.79)–(−1.12), (3) flat (−1.12)–(0.54), (4) convex (0.54)–(3.21), and (5) very convex (3.21)–(33.9) |
| Plan curvature | (1) [(−28.51)–(−1.43)], (2) [(−1.43)–(−0.44)], (3) [(−0.44)–(0.34)], (4) [(0.34)–(1.53)], and (5) [(1.53)–(21.09)] |
| Profile curvature | (1) [(−23.05)–(−2.29)], (2) [(−2.29)–(−0.519)], (3) [(−0.519)–(0.272)], (4) [(0.272)–(2.05)], and (5) [(2.05)–(27.4)] |
| Solar radiation | (1) 80,000–430,000, (2) 440,000–540,000, (3) 550,000–630,000, (4) 640,000–700,000, and (5) 710,000–810,000 |
| VRM | (1) 0–0.0302, (2) 0.0303–0.0795, (3) 0.0796–0.151, (4) 0.152–0.274, and (5) 0.275–0.699 |
| VD | (1) 0–37.9, (2) 38–87.7, (3) 87.8–149, (4) 150–233, and (5) 234–508 |
| SPI | (1) 0–1,510, (2) 1,520–1,600, (3) 1,610–3,110, (4) 3,120–26,500, and (5) 26,600–390,000 |
| TWI | (1) 0.0895–2.62, (2) 2.63–3.32, (3) 3.33–4.15, (4) 4.16–6.26, and (5) 6.26–10.70 |
| TRI | (1) 0–2.64, (2) 2.65–4.75, (3) 4.76–7.74, (4) 7.75–13.4, and (5) 13.5–44.9 |
| TPI | (1) (−75.7)–(−9.77), (2) (−9.77)–(−2.83), (3) (−2.83)–(2.94), (4) (2.94)–(11.03), and (5) (11.03)–(71.7) |
| LS | (1) 0–6.88, (2) 6.89–13.1, (3) 13.2–19.6, (4) 19.7–28.2, and (5) 28.3–87.8 |
| Land cover | (1) Dry farming; (2) Semi-dense forest; (3) Low-dense forest; (4) Semi-dense pasture; (5) Dense pasture; and (6) Woodland |
| NDVI | (1) (−0.351)–(−0.064), (2) (−0.064)–(0.008), (3) (0.008)–(0.099), (4) (0.099)–(0.260), and (5) (0.260)–(0.759) |
| Rainfall (mm) | (1) 438–440, (2) 440–480, (3) 480–520, and (4) 520–560 |
| Distance to fault (m) | (1) 0–100, (2) 101–200, (3) 201–300, (4) 301–400, and (5) >400 |
| Distance to road (m) | (1) 0–100, (2) 101–200, (3) 201–300, (4) 301–400, and (5) >400 |
| Distance to river (m) | (1) 0–100, (2) 101–200, (3) 201–300, (4) 301–400, and (5) >400 |
| Fault density (km/km²) | (1) 0–0.67, (2) 0.671–1.84, (3) 1.85–3.01, (4) 3.02–4.41, and (5) 4.42–7.12 |
| Road density (km/km²) | (1) 0–0.440, (2) 0.440–1.210, (3) 1.210–1.914, (4) 1.914–2.772, and (5) 2.772–5.610 |
| River density (km/km²) | (1) 0–0.5551, (2) 0.5551–1.4608, (3) 1.4608–2.4542, (4) 2.4542–3.7983, and (5) 3.7983–7.4505 |
| Lithology | (1) OLc, (2) QT1, (3) Residential area, (4) Ksh, (5) Kul, (6) Mb, (7) Kpl, (8) Pel, (9) PEf, and (10) MZD. P.2pL |
| Soil | (1) Silty Loam, (2) Clay Loam, (3) Loam, (4) Sandy Loam, and (5) Silty Clay |

Landslide conditioning factors with their classes.

3.2 Machine Learning Algorithms

3.2.1 Artificial Neural Networks

ANN is one of the most widely used ML algorithms for capturing complex trends in multivariate datasets. The features in ANN use independent statistical distribution, self-learning, and interdependent memory (Zhang et al., 2021). Although it is known as a black box model in which the modeling processes are not specified, it has been widely used for pattern recognition, classification, and solving regression problems (Zhang et al., 2022). ANNs have multiple nodes that imitate biological neurons in the human brain and therefore the ANN model is often applied in the medical field. It is also used in landslide susceptibility mapping to draw relationships between landslides and a host of conditioning factors. The ANN approach has certain advantages over the other statistical techniques. ANNs have input layers (conditioning factors), hidden layers (with activation functions), and output layers (landslide and nonlandslide labels) (Yin et al., 2022a). The neurons process inputs by multiplying each entry by the corresponding weight and summing the products. In turn, the sum is processed using a nonlinear transfer function. ANNs learn by adjusting the weights between the neurons according to the errors between the actual output values and the target output values. After a number of learning iterations, the neural network yields a model that predicts target values from the given input values. We used a back propagation (BP) algorithm for learning in the ANN, employing the error signal Es as a measure of the network’s performance.
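The weighted-sum, transfer-function, and weight-adjustment steps described above can be illustrated with a single neuron. This is a minimal sketch, not the network used in the study: the sigmoid transfer function, the bias term, the learning rate, and the sample values are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Nonlinear transfer function (assumed; the source does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    # Multiply each input by its weight, sum the products, then transform.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

def update_weights(inputs, weights, bias, target, rate=0.5):
    """One gradient step on the squared error 0.5 * (out - target)^2."""
    out = neuron_output(inputs, weights, bias)
    delta = (out - target) * out * (1.0 - out)   # error times sigmoid slope
    new_weights = [w - rate * delta * x for w, x in zip(weights, inputs)]
    return new_weights, bias - rate * delta

# Hypothetical sample: two scaled conditioning factors, landslide label 1.
inputs, target = [0.2, 0.7], 1.0
weights, bias = [0.1, -0.3], 0.0
initial_error = abs(target - neuron_output(inputs, weights, bias))
for _ in range(200):
    weights, bias = update_weights(inputs, weights, bias, target)
final_error = abs(target - neuron_output(inputs, weights, bias))
print(initial_error, final_error)
```

Back propagation applies the same kind of update layer by layer, passing the error signal from the output layer back through the hidden layers.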

An ANN becomes a more robust model when relationships between the training datasets are not known. The neurons connect these layers, each using a direct link to connect with other neurons. The links have weights that reflect the power of the outgoing signal (Zhou et al., 2022).

The neurons in each layer were linked to the front and rear neurons with each associated weight (Khandelwal et al., 2018). There are two kinds of networks used in ANN: recurrent neural networks and feed-forward neural networks (FNN). The FNN, based on BP, is a well-known network used in many studies with excellent performance (Luan et al., 2022). Therefore, in this study, FNN was selected for the prediction of the Cv. To validate the performance of FNN, we used different quantitative validation indexes, namely the root mean square error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R2) (Li et al., 2021b; Li et al., 2021a). A detailed description of such measurements is presented in several previously published works. These indexes are expressed as follows (Zhou et al., 2021a; Zhou et al., 2021b; Xu et al., 2021):

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2} \quad (1)$$

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right| \quad (2)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2} \quad (3)$$

where $y_i$ is the actual output, $\hat{y}_i$ is the predicted output, $\bar{y}$ is the mean of the $y_i$, and $m$ is the number of samples used (Li B. et al., 2021; Zhou W et al., 2021).
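The three validation indexes named above follow their standard definitions and can be computed as in the sketch below. The sample values are hypothetical and serve only to show the arithmetic.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    m = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / m)

def mae(actual, predicted):
    """Mean absolute error."""
    m = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / m

def r2(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical labels and network outputs.
actual = [1, 0, 1, 1, 0]
predicted = [0.9, 0.2, 0.8, 0.6, 0.1]
print(rmse(actual, predicted), mae(actual, predicted), r2(actual, predicted))
```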

3.2.2 Logistic Model Tree

The LMT is an ML algorithm that combines the functionality of model trees and logistic regression, especially for classification problems (Landwehr et al., 2005). Therefore, it is advantageous in selecting relevant features in the data, and it is considered an equivalent to the model tree algorithm for categorical outcomes (Landwehr et al., 2005). A simplified version of the LMT equation is

$$P(C \mid x) = \frac{\exp\left(L_C(x)\right)}{\sum_{C'=1}^{D} \exp\left(L_{C'}(x)\right)} \quad (4)$$

where D is the number of outcome categories and $L_C(x)$ is the logistic regression function fitted for category C at a leaf. The LMT has also proven to be a better algorithm when dealing with spatial data extracted from remote sensing images (Colkesen and Kavzoglu, 2017). It reduces the likelihood of overfitting through cross-validation of the logistic regression at each node and through pruning of the tree (Breiman et al., 1984).
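To make the leaf-level computation concrete, the sketch below turns one linear logistic function per outcome category into class probabilities. It assumes the softmax form commonly associated with LMT leaves; the coefficients, the two-class setup, and the function name `class_posteriors` are hypothetical.

```python
import math

def class_posteriors(x, leaf_models):
    """Class probabilities from per-class linear functions at one leaf.

    leaf_models: one (intercept, coefficients) pair per outcome category.
    The softmax form is an assumption about the missing equation.
    """
    logits = [b + sum(c * xi for c, xi in zip(coefs, x))
              for b, coefs in leaf_models]
    top = max(logits)                       # subtract max for stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical leaf with two categories (nonlandslide, landslide).
leaf = [(0.0, [0.0, 0.0]), (-1.0, [2.0, 0.5])]
probs = class_posteriors([0.8, 0.4], leaf)
print(probs)
```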

3.2.3 Best First Tree

BFT builds binary trees in which each internal node has exactly two outgoing edges. Here, the splitting process selects the “best” node, that is, the node whose split reduces impurity the most among all the available nodes (Shi, 2007). Essential to BF trees is deciding which attribute to split on and how to do so. Here, information and Gini gains are employed. The information value is determined by an entropy function, expressed as follows:

$$\mathrm{Entropy}(p_1, p_2, \dots, p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i \quad (5)$$

where $p_i$, i = 1, 2, …, n, is the probability of each class and the sum of the $p_i$ is 1 (Quinlan, 1986).

Discovering the maximal Gini gain or information gain for a split at a node demands the finding of minimal values of the weighted sum of the information values (Gini index) of its successor nodes (Shi, 2007). This process ends when all the nodes reach a specific number of expansions. The best first decision tree learning process handles both the categorical and numerical variables, expanding the “best” node first.
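The impurity calculations behind split selection can be sketched as follows. The gain of a split is the parent impurity minus the weighted sum of the impurities of its two successor nodes, so maximizing the gain is the same as minimizing that weighted sum. The label lists are hypothetical, and the helper names are introduced only for this example.

```python
import math

def entropy(probs):
    """Information value of a class distribution (in bits)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini index of a class distribution."""
    return 1.0 - sum(p * p for p in probs)

def class_probs(labels):
    n = len(labels)
    return [labels.count(c) / n for c in sorted(set(labels))]

def split_gain(parent, left, right, impurity=entropy):
    """Parent impurity minus the weighted impurity of the two successors."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(class_probs(left)) \
             + (len(right) / n) * impurity(class_probs(right))
    return impurity(class_probs(parent)) - weighted

# Hypothetical node: 1 = landslide, 0 = nonlandslide.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [1, 0, 0, 0]
print(split_gain(parent, left, right))                  # information gain
print(split_gain(parent, left, right, impurity=gini))   # Gini gain
```

A best-first learner would evaluate such gains for every candidate split of every expandable node and expand the node with the largest gain first.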

3.2.4 Rotation Forest

RoF is a widely used ensemble method that was first introduced by Rodriguez et al. (2006). In RoF, the Principal Component Analysis (PCA) is used to extract features to build the training sets. RoF can enhance both the individual accuracy of the base classifiers and the diversity within the ensemble simultaneously (Rodriguez-Galiano et al., 2012). Because of this, the RoF model is commonly used in LSM to achieve a higher prediction accuracy (Hong et al., 2019). Let x = (x1, x2, … , xn) be the vector of the conditioning factors, v = (v1, v2) the vector of the landslide and nonlandslide classes, and H the training set. E1, E2, …, EL denote the classifiers in the ensemble, and R denotes the feature set. First, R is separated into K subsets, where each subset contains T = n/K conditioning factors. Then, for classifier Ei, we obtain Rij (the jth subset of landslide influencing factors) and Hij (the training set for the Rij features). Using the bootstrap technique, H’ij is randomly drawn from Hij with 75% of its size. Subsequently, H’ij is transformed to obtain T × 1 coefficient vectors, which can be presented as $a_{ij}^{(1)}, \dots, a_{ij}^{(T)}$. Then, by arranging these coefficient vectors, the rotation matrix Ri is formed as follows:

$$R_i = \begin{bmatrix} a_{i1}^{(1)}, \dots, a_{i1}^{(T)} & [0] & \cdots & [0] \\ [0] & a_{i2}^{(1)}, \dots, a_{i2}^{(T)} & \cdots & [0] \\ \vdots & \vdots & \ddots & \vdots \\ [0] & [0] & \cdots & a_{iK}^{(1)}, \dots, a_{iK}^{(T)} \end{bmatrix} \quad (6)$$

After that, for a specified test sample δ, the confidence $\mu_k(\delta)$ of each class is determined by the mean combination method as follows:

$$\mu_k(\delta) = \frac{1}{L}\sum_{i=1}^{L} d_{i,k}\left(\delta R_i^{a}\right), \quad k = 1, \dots, c \quad (7)$$

where $d_{i,k}(\delta R_i^{a})$ represents the probability produced by the classifier Ei for the hypothesis that δ belongs to class k, $R_i^{a}$ is the rotation matrix with its columns rearranged to match the original feature order, and c denotes the number of classes. In the end, δ is allocated to the class that has the highest confidence.
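Two of the steps described above, partitioning the feature set into K subsets and combining the classifier outputs by their mean, can be sketched without any numerical library. The PCA rotation itself is deliberately omitted here, and the feature names, probabilities, and function names are hypothetical.

```python
import random

def split_features(feature_names, k, seed=1):
    """Randomly partition the feature set into k disjoint subsets."""
    names = list(feature_names)
    random.Random(seed).shuffle(names)
    return [names[i::k] for i in range(k)]

def mean_confidence(probabilities):
    """Average the class probabilities over all classifiers.

    probabilities[i][k] is the probability of class k from classifier i.
    """
    n_classifiers = len(probabilities)
    n_classes = len(probabilities[0])
    return [sum(p[k] for p in probabilities) / n_classifiers
            for k in range(n_classes)]

def predict(probabilities):
    """Assign the sample to the class with the highest mean confidence."""
    conf = mean_confidence(probabilities)
    return max(range(len(conf)), key=conf.__getitem__)

# 25 placeholder factor names split into K = 5 subsets of T = 5 factors.
subsets = split_features([f"factor_{i}" for i in range(25)], k=5)

# Hypothetical outputs of three base classifiers: [nonlandslide, landslide].
outputs = [[0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
print(mean_confidence(outputs), predict(outputs))
```

In a full implementation each subset would be rotated by PCA on a bootstrap sample before the base classifier is trained on the rotated data.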

3.3 Model Performance Evaluation

To evaluate the model performance, we employed a variety of statistical index-based methods, including sensitivity (SST), specificity (SPF), accuracy (ACC), F1-measure, Matthews correlation coefficient (MCC), Kappa, and the receiver operating characteristic (ROC) curve. All the statistical metrics were computed based on the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). TP represents the number of landslide pixels (value 1) that are correctly classified as landslides. TN represents the number of nonlandslide pixels (value 0) that are correctly classified as nonlandslides. FP is the number of nonlandslide pixels that are incorrectly classified as landslides, while FN is the number of landslide pixels that are incorrectly classified as nonlandslides (Zhou et al., 2022). These statistical index-based metrics are described in Eqs 8–15 as follows:

$$\mathrm{SST} = \frac{TP}{TP + FN} \quad (8)$$

$$\mathrm{SPF} = \frac{TN}{TN + FP} \quad (9)$$

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \quad (10)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (11)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (12)$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (13)$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \quad (14)$$

$$\mathrm{Kappa} = \frac{P_{obs} - P_{exp}}{1 - P_{exp}} \quad (15)$$

where $P_{obs}$ is the observed agreement (equal to ACC) and $P_{exp}$ is the agreement expected by chance. SST is the ratio of landslide pixels that have been correctly classified as landslides, and it indicates how well the landslide model classifies the shallow landslide pixels (Zhou et al., 2022). SPF is the ratio of nonlandslide pixels that are correctly classified as nonlandslides, and it indicates how well the landslide model classifies the nonlandslide pixels. ACC is the ratio of nonlandslide and landslide pixels that are correctly classified (Bennett et al., 2013). The ACC demonstrates how well the landslide model works overall. The precision measure is the ratio of the correctly retrieved pixels to the number of retrieved pixels. The recall measure represents the number of correctly retrieved pixels divided by the number of relevant pixels in the test dataset. The F1-measure is used to assess the landslide diagnosis function; it is a way to combine and balance both precision and sensitivity in a single measure (Konishi and Suga, 2018).
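The confusion-matrix metrics can be checked numerically with a short sketch. The counts below are the RoFRF validation counts reported in Table 4 (TP = 186, TN = 13, FP = 6, FN = 11). The formulas are the standard binary definitions, which is an assumption about the authors' software: the sensitivity, specificity, and accuracy reproduce the tabulated values, while F1 and Kappa may be computed under a different convention in the tables.

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Standard binary metrics from confusion-matrix counts."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    # Chance agreement for Cohen's Kappa from the marginal totals.
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_exp) / (1 - p_exp)
    return {
        "sensitivity": sensitivity, "specificity": specificity,
        "accuracy": accuracy, "precision": precision,
        "f1": f1, "mcc": mcc, "kappa": kappa,
    }

metrics = confusion_metrics(tp=186, tn=13, fp=6, fn=11)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```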

The Matthews correlation coefficient (MCC) is a correlation coefficient between the observed and predicted binary classifications that yields a value from −1 to +1 (+1 represents a perfect prediction, 0 no better than random, and −1 total disagreement). The Kappa index is used to assess the acceptability of landslide models. The values of this index vary from −1 (unacceptability) to +1 (acceptability) (Bennett et al., 2013). The AUC is a measure used to assess the model performance (Fawcett, 2006). The ROC curve is built with sensitivity on the y-axis and 1-specificity on the x-axis over a range of cut-off thresholds. The area under the ROC curve (AUC) indicates the capability of a model to distinguish between the shallow landslide and nonlandslide pixels. A model is ideal or perfect if AUC is one, and weak or inaccurate if AUC is near 0.5, the value expected from random guessing.
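The AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen landslide pixel receives a higher score than a randomly chosen nonlandslide pixel. The sketch below uses this pairwise form, with ties counted as one half; the labels and scores are hypothetical.

```python
def auc(labels, scores):
    """AUC as the fraction of (landslide, nonlandslide) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # pair ranked correctly
            elif p == q:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical susceptibility scores for five pixels.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(auc(labels, scores))
```

The pairwise loop is quadratic in the number of pixels, so a rank-based implementation is preferable for full raster maps.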

3.4 Chi-Square

A chi-squared test (χ2) is a statistical hypothesis test used to determine whether there is a statistically significant difference between the observed and expected frequencies in one or more categories of variables. The test depends on the number of cells in the contingency table and the total number of observations of the main factors (Bryant and Satorra, 2012). For the evaluation of the value of landslide predictors by the chi-square algorithm, we defined the null hypothesis first. This hypothesis states that knowing the level of a landslide predictor does not aid in the prediction of landslide incidence (Sarkar and Kanungo, 2004); that is, the variables are independent.

H0: Variable X (e.g., aspect) and variable Y (e.g., landslide occurrence) are independent.

H1: Variable X (e.g., aspect) and variable Y (e.g., landslide occurrence) are not independent.

The chi-square statistic is calculated according to Eq. 16:

$$\chi^2 = \sum_{i} \frac{\left(O_i - E_i\right)^2}{E_i} \quad (16)$$

where χ2 is the chi-square statistic, $O_i$ is the observed value, and $E_i$ is the expected value.
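As a worked illustration of the chi-square statistic for one conditioning factor, the sketch below builds the expected counts under independence from the row and column totals of a contingency table. The counts are hypothetical, and the sketch assumes every expected count is greater than zero.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table.

    table[i][j]: observed count for factor class i and outcome j
    (e.g., j = 0 for nonlandslide, j = 1 for landslide).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if factor class and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts: two factor classes by two outcomes.
observed = [[30, 10], [20, 40]]
stat, dof = chi_square(observed)
print(stat, dof)
```

A larger statistic means a stronger departure from independence, which is why the factors can be ranked by their chi-square merit.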

4 Results and Analysis

4.1 The Most Important Factors in the Modeling Procedure

Figure 4 shows the role and relative importance of the conditioning factors on the shallow landslide occurrence in our study area based on the average merit (AM) of the chi-square feature selection technique in a 10-fold cross-validation analysis. The results indicate that the distance to road has the highest impact (AM = 235.737) on landslides in the study area, followed by road density (AM = 124.198), lithology (AM = 108.694), land use (AM = 80.921), NDVI (AM = 42.228), soil (AM = 31.774), elevation (AM = 30.733), aspect (AM = 27.662), annual solar radiation (AM = 25.426), slope (AM = 15.538), VRM (AM = 13.489), rainfall (AM = 12.521), TWI (AM = 12.391), LS (AM = 11.563), distance to fault (AM = 11.210), and TRI (AM = 9.064).

FIGURE 4

4.2 Model Result, Validation, and Comparison

Table 3 shows the performance of the models on the training dataset using various statistical measures including specificity, sensitivity, accuracy, F1-measure, Kappa, and AUC. The results show that the hybrid RoFRF model has the highest sensitivity (1.000), which indicates that all of the landslide locations (100%) have been correctly classified as landslide. However, RF has the highest specificity (1.000; 100%), indicating that 100% of the nonlandslide locations have been correctly classified as nonlandslide locations. This is followed by the RoFRF (0.989; 98.9%), LMT (0.750; 75%), BFT (0.725; 72.5%), and ANN (0.624; 62.4%) models. The accuracy metric shows that the hybrid RoFRF model has the highest value (0.999; 99.9%), indicating that this model is able to correctly classify the landslide and nonlandslide locations as landslide and nonlandslide, respectively.

TABLE 3

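| Metric | ANN | LMT | BFT | RF | RoFRF |
|---|---|---|---|---|---|
| TP | 720 | 733 | 730 | 752 | 751 |
| TN | 53 | 57 | 58 | 88 | 94 |
| FP | 32 | 19 | 22 | 0 | 1 |
| FN | 41 | 37 | 36 | 62 | 0 |
| Sensitivity | 0.946 | 0.952 | 0.953 | 0.924 | 1.000 |
| Specificity | 0.624 | 0.750 | 0.725 | 1.000 | 0.989 |
| Accuracy | 0.914 | 0.934 | 0.931 | 0.931 | 0.999 |
| F1-measure | 0.912 | 0.931 | 0.929 | 0.993 | 0.999 |
| Kappa | 0.544 | 0.634 | 0.628 | 0.963 | 0.994 |
| AUC | 0.918 | 0.944 | 0.860 | 0.999 | 1.000 |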

Model results using the training dataset.

The LMT model was ranked second with an accuracy of 0.934, followed by the RF and BFT (0.931) and ANN (0.914) models. The F1-measure shows the highest value of 0.999 for the hybrid RoFRF model and the lowest value of 0.912 for ANN. The values for the RF, LMT, and BFT models are 0.993, 0.931, and 0.929, respectively. The lowest and highest Kappa values are 0.544 and 0.994 for the ANN and RoFRF models, respectively, while RF (0.963), LMT (0.634), and BFT (0.628) rank in between. The AUC value of the hybrid RoFRF model is 1.000, which shows that the goodness-of-fit of the hybrid RoFRF model is the highest (100%), followed by the RF (0.999), LMT (0.944), ANN (0.918), and BFT (0.860) models (Table 3).

Table 4 shows the prediction accuracy of the five models obtained using the validation dataset. These results are important for assessing the predictive power, applicability, and robustness of the models. According to Table 4, the sensitivity values are 0.953 for BFT, 0.944 for RoFRF and ANN, and 0.938 for RF and LMT. However, specificity is the highest for the hybrid RoFRF model (0.684; 68.4%), followed by the ANN (0.650; 65%), BFT (0.625; 62.5%), and LMT and RF (0.571; 57.1%) models. The highest value of accuracy is 0.921 (92.1%) for the hybrid RoFRF model, followed by the ANN and BFT (0.917; 91.7%) and the RF and LMT (0.903; 90.3%) models. The F1-measure for the hybrid RoFRF model and the BFT model is 0.917, the highest value, whereas it is 0.913 for ANN and 0.900 for RF and LMT. Although the BFT model has the highest Kappa (0.578), it has the lowest AUC (0.829). In contrast, the hybrid RoFRF model, with a Kappa value of 0.561, has an AUC of 0.933 (93.3%), indicating that this model is highly capable of predicting landslides. The LMT model has the second-highest value of AUC (0.904; 90.4%), and ANN, RF, and BFT have AUC equal to 0.888 (88.8%), 0.853 (85.3%), and 0.829 (82.9%), respectively (Table 4).

TABLE 4

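| Metric | ANN | LMT | BFT | RF | RoFRF |
|---|---|---|---|---|---|
| TP | 185 | 183 | 183 | 183 | 186 |
| TN | 13 | 12 | 15 | 12 | 13 |
| FP | 7 | 9 | 9 | 9 | 6 |
| FN | 11 | 12 | 9 | 12 | 11 |
| Sensitivity | 0.944 | 0.938 | 0.953 | 0.938 | 0.944 |
| Specificity | 0.650 | 0.571 | 0.625 | 0.571 | 0.684 |
| Accuracy | 0.917 | 0.903 | 0.917 | 0.903 | 0.921 |
| F1-measure | 0.913 | 0.900 | 0.917 | 0.900 | 0.917 |
| Kappa | 0.544 | 0.479 | 0.578 | 0.479 | 0.561 |
| AUC | 0.888 | 0.904 | 0.829 | 0.853 | 0.933 |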

Model results using the validation dataset.
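As a cross-check on Table 4, the sensitivity, specificity, accuracy, and Kappa rows can be recomputed from the TP, TN, FP, and FN counts. The sketch below assumes the standard confusion-matrix definitions and Cohen's Kappa; the function and variable names are illustrative and not taken from the study. The F1-measure is left out because its exact definition is not restated in this section.

```python
# Confusion-matrix counts (TP, TN, FP, FN) copied from Table 4.
TABLE_4_COUNTS = {
    "ANN": (185, 13, 7, 11),
    "LMT": (183, 12, 9, 12),
    "BFT": (183, 15, 9, 9),
    "RF": (183, 12, 9, 12),
    "RoFRF": (186, 13, 6, 11),
}


def confusion_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy, and Cohen's Kappa."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)   # share of landslide cells found
    specificity = tn / (tn + fp)   # share of non-landslide cells found
    accuracy = (tp + tn) / n       # observed agreement
    # Agreement expected by chance, from the marginal totals of the 2x2 table.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


for model, counts in TABLE_4_COUNTS.items():
    metrics = confusion_metrics(*counts)
    print(model, {name: round(value, 3) for name, value in metrics.items()})
```

Under these assumed definitions the recomputed values agree with the tabulated ones to within rounding in the third decimal place.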

4.3 Parameter Optimization

In the modeling procedure, successful and reasonable results depend heavily on the parameter values defined by the user. We tuned the parameters by trial and error, checking the resulting performance measures such as AUC (Janizadeh et al., 2019; Pham et al., 2020a; Hong et al., 2020). Table 5 presents the parameter values employed in each model.

TABLE 5

| Model | Parameters |
|---|---|
| RoFR | Debug: true; maxDepth: 0; numFeatures: 0; numbTrees: 10; seed: 1 |
| RF | Debug: False; maxGroup: 1; minGroup: 3; numIterations: 10; numberOfGroup: False; RemovedPercentage: 50; seed: 1 |
| ANN | Debug: False; ClusteringSeed: 1; minStdDev: 0.1; numClusters: 2; ridge: 1.0E-8 |
| BFT | Debug: False; heuristic: True; minNumObj: 2; numFoldsPruning: 5; pruningStrategy: post-pruning; seed: 1; sizePer: 1.0; useErrorRate: True; useGini: True; useOneSE: False |
| LMT | Debug: False; convertNominal: False; errorOnProbabilities: False; fastRegression: True; minNumInstances: 15; numBoostingIterations: 1; SplitOnResiduals: False; useAIC: False; WeightTrimBeta: 0.0 |

Parameters of machine learning algorithms for shallow landslide susceptibility mapping.
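The trial-and-error tuning described in this section can be organized as a loop that tries candidate settings and keeps the one with the best score. The sketch below is only an illustration of that idea: `trial_and_error`, the candidate values, and the `toy_auc` scoring function are hypothetical stand-ins, and the two parameter names are borrowed from Table 5 purely as labels. In practice the scoring function would train a model and return its validation AUC.

```python
from itertools import product


def trial_and_error(evaluate, candidates):
    """Try every combination of candidate values and keep the best-scoring one."""
    names = sorted(candidates)
    best_params, best_score = None, float("-inf")
    for values in product(*(candidates[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # e.g. validation AUC of a model trained with params
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_auc(params):
    """Hypothetical score surface; it stands in for training and validating a model."""
    return (0.9
            - 0.01 * abs(params["numIterations"] - 10)
            - 0.001 * abs(params["removedPercentage"] - 50))


# Hypothetical candidate values, not the search space used in the study.
CANDIDATES = {
    "numIterations": [5, 10, 20],
    "removedPercentage": [25, 50, 75],
}

best_params, best_score = trial_and_error(toy_auc, CANDIDATES)
print(best_params, best_score)
```

An exhaustive loop like this is feasible only for small candidate sets; manual trial and error usually explores far fewer combinations.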

4.4 Landslide Susceptibility Maps

We computed a landslide susceptibility index (LSI) for each pixel of the study area using the probability distribution function of each machine learning model. We classified the LSIs of the RoFRF and LMT maps using the quantile classification method, and those of the BFT, RF, and ANN maps using the geometric interval classification method. The LSIs were reclassified into five susceptibility classes: very low susceptibility (VLS; dark green), low susceptibility (LS; light green), moderate susceptibility (MS; yellow), high susceptibility (HS; orange), and very high susceptibility (VHS; red). Figure 5 shows the landslide susceptibility maps produced by the hybrid RoFRF model and the benchmark models used in this study.
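To make the quantile classification concrete, the following minimal sketch assigns LSI values to the five classes so that each class holds roughly the same number of pixels. The LSI values are invented for illustration, and the cut points come from Python's `statistics.quantiles`, which may differ slightly from the break values a GIS package would compute. The geometric interval method is not sketched here.

```python
from bisect import bisect_right
from statistics import quantiles

CLASS_LABELS = ["VLS", "LS", "MS", "HS", "VHS"]


def quantile_classes(lsi_values):
    """Assign each LSI value to one of five classes of roughly equal size."""
    # Four cut points split the value range into five quantile bins.
    breaks = quantiles(lsi_values, n=5, method="inclusive")
    # bisect_right counts how many cut points lie at or below the value.
    return [CLASS_LABELS[bisect_right(breaks, value)] for value in lsi_values]


# Hypothetical LSI values for ten pixels (not taken from the study).
lsi = [0.05, 0.12, 0.20, 0.33, 0.41, 0.58, 0.64, 0.77, 0.85, 0.96]
print(quantile_classes(lsi))
```

Because quantile breaks depend on the distribution of the values, the same LSI can fall into different classes on maps from different models.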

FIGURE 5

Because distance to road and road density were identified as the most important factors in the modeling process, the HS and VHS classes are concentrated around the road networks. We enlarged a rectangular area on the left side of each susceptibility map to show how the landslide locations (training and validation) correspond to the mapped susceptibility classes.

4.5 Accuracy Assessment and Comparison

We evaluated the goodness-of-fit and prediction accuracy of the hybrid model and the four soft computing benchmark models using the training and validation datasets, respectively (Figures 6, 7). Figure 6A indicates that the hybrid model, with an AUC of 0.953 (95.3%), has the highest performance among the models, while Figure 6B shows that its predictive power is 0.919 (91.9%). In comparison (Figures 7A,B), the hybrid RoFRF model outperforms the LMT (training AUC = 0.903; validation AUC = 0.909), ANN (training AUC = 0.869; validation AUC = 0.894), RF (training AUC = 0.833; validation AUC = 0.878), and BFT (training AUC = 0.827; validation AUC = 0.798) models in both goodness-of-fit and prediction accuracy.
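For readers less familiar with the AUC, it can be read as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell. The sketch below computes it that way by direct pairwise comparison; the labels and scores are invented for illustration and `roc_auc` is a hypothetical helper, not the software routine used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (landslide, non-landslide) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Hypothetical validation cells: 1 = landslide, 0 = non-landslide.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.92, 0.85, 0.70, 0.65, 0.40, 0.35, 0.20, 0.10]
print(roc_auc(labels, scores))
```

The pairwise form is quadratic in the number of cells, so it suits small validation sets; rank-based formulas give the same value more efficiently.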

FIGURE 6

FIGURE 7

5 Discussion

The main objective of our study was to model the spatial distribution of landslide susceptibility and to produce a susceptibility map with high prediction reliability. Therefore, we focused on evaluating the performance of different machine learning methods, a crucial step of a landslide modeling project (Brenning, 2005; Reichenbach et al., 2018). While numerous methods have been suggested and used for landslide modeling over the past decades, machine learning methods have been preferred by many researchers (Merghadi et al., 2020). In recent years, several researchers have acknowledged the efficiency of ensemble learning techniques in improving the performance of machine learning methods (Pham et al., 2020b; Nhu et al., 2020). To test the performance of single models against an ensemble model, we first measured the significance of the conditioning factors using the chi-square technique with 10-fold cross-validation and identified distance to roads and road density as the most significant factors related to landslide occurrence in the study area. Similar results have been reported from other regions where transport infrastructure crosses steep terrain (Jaafari et al., 2017; Schlögl and Matulla, 2018). Old road networks, which were planned for low traffic and axle loads, are at extremely high risk of landslides. Hence, maintenance and landslide mitigation measures for these roads should be considered (Schlögl et al., 2019).

We assessed the models’ results via a validation process to compare the ability of the four single models to spatially predict landslides. Ten performance measures indicated that the RF model had a better goodness-of-fit (using the training dataset) and prediction ability (using the validation dataset) than the other three single models (i.e., ANN, LMT, and BFT). Many other studies have demonstrated the superiority of the RF model over other machine learning methods, such as best-first decision tree and Naïve Bayes tree (Chen et al., 2018), and artificial neural network and logistic regression (Smith et al., 2021). RF is a powerful nonlinear machine learning method for classification problems that can cope with multicollinearity and nonlinear dependencies among the variables (Boulesteix et al., 2012). Being a nonparametric method, RF can be regarded as the most flexible machine learning method (James et al., 2021), with the ability to handle multiclass and skewed datasets (Guyennon et al., 2021). Given this superiority, we selected the RF model as the base model and combined it with the RoF technique to develop a hybrid predictive model, i.e., RoFRF. The new hybrid RoFRF model significantly improved the prediction performance of the base RF model. This is reasonable because models based on ensemble learning techniques can reduce both the variance and bias of the modeling process and avoid overfitting, and so gain the highest prediction performance (Nhu et al., 2020; Tran et al., 2020). The key to the efficiency of the RoF technique is that it increases the diversity and the individual accuracy of the ensemble members simultaneously. Diversity is promoted by using principal component analysis (PCA) to perform feature extraction for each base classifier, whereas accuracy is maintained by keeping all the principal components and using the whole dataset to train each base classifier (Park et al., 2019). Similar studies have also shown that the RoF technique can improve the training performance (i.e., goodness-of-fit) and validation performance (i.e., predictive ability) of base classifiers for landslide prediction. In sum, the RoF technique is fast, generalizes well, and is simple to implement, which makes it a favored choice for developing powerful ensemble models for landslide prediction.
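The rotation step described above can be written compactly. The block below is a sketch that follows the general formulation of Rodriguez et al. (2006); the notation is an assumption introduced here for illustration and is not taken from this article. The feature set is split at random into K subsets, PCA on the j-th subset for the i-th base classifier gives a coefficient matrix C_{i,j} whose columns are all of its principal components, and these matrices form a block-diagonal rotation matrix R_i. After its columns are rearranged to match the original feature order (written R_i^a), the base classifier D_i is trained on the rotated data, and the ensemble averages the class scores d_{i,omega} of its L members.

```latex
% Rotation matrix for the i-th base classifier (assumed notation).
\[
R_i =
\begin{bmatrix}
C_{i,1} & 0 & \cdots & 0 \\
0 & C_{i,2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & C_{i,K}
\end{bmatrix},
\qquad
C_{i,j} = \bigl[\, a_{i,j}^{(1)}, \ldots, a_{i,j}^{(M_j)} \,\bigr]
\]
% Training data of D_i and the averaged ensemble score for class \omega.
\[
D_i \ \text{is trained on}\ \bigl( X R_i^{a},\, Y \bigr),
\qquad
\mu_{\omega}(x) = \frac{1}{L} \sum_{i=1}^{L} d_{i,\omega}\bigl( x R_i^{a} \bigr)
\]
```

Because every principal component is kept, no information is discarded by the rotation; the diversity among ensemble members comes from the random feature split and the resulting different rotations. In RoFRF, each D_i would be an RF classifier.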

Overall, our study demonstrated that, for a given study area, it is reasonable to select the most influential controlling factors via feature screening methods and to identify the most accurate method via parameter optimization and comparison of multiple models. A comparative approach makes it possible to investigate the capability of multiple models to produce accurate and reliable susceptibility maps. This is an improvement on the traditional approach, which often selects a single model and may overlook other, potentially better, models. Hybrid modeling therefore provides a framework that accurately analyzes historical landslides and conditioning factors and improves the reliability of the prediction of future landslides.

6 Conclusion

The aim of this study was to develop a hybrid Rotation Forest - Random Forest (RoFRF) model and to compare it with the Artificial Neural Network (ANN), Logistic Model Tree (LMT), Best First Tree (BFT), and Random Forest (RF) models for mapping landslide susceptibility in part of the Kamyaran area in Kurdistan Province, Iran. To this end, 25 landslide conditioning (or controlling) factors were compiled from different sources and applied as inputs to the models: elevation, slope angle, aspect, curvature, profile curvature, plan curvature, solar radiation, VRM, VD, SPI, TWI, TRI, TPI, LS, NDVI, rainfall, distance to fault, distance to road, distance to river, fault density, road density, river density, lithology, land use, and soil. The relative importance of each factor was then examined based on the Average Merit (AM) score, and 16 factors were selected as important and used for the modeling procedure. In the next step, after preparing the landslide inventory map, the landslide locations were divided into training and validation datasets for the modeling and evaluation processes, respectively. The proposed hybrid method can derive the benefits of base classifiers using different ensemble learning strategies. The present study demonstrated an efficient way to combine different types of landslide susceptibility methods, hybrid learning, and deep learning to obtain a more accurate map. The most important findings of our study are summarized as follows:

  • 1) Identifying the most influential controlling factors in the occurrence of shallow landslides and preparing susceptibility maps are the basic strategies for controlling this phenomenon and selecting the most appropriate and practical options. According to the AM score, 16 factors affected the occurrence of shallow landslides; the most important was the distance to roads, followed by road density. Our results demonstrate that more careful road construction, maintenance, and route planning need to be considered to reduce future landslide occurrence.

  • 2) Parameter optimization contributes to the best performance of the models and thereby to the accuracy with which future landslides are predicted.

  • 3) Sensitivity, accuracy, specificity, Kappa, RMSE, and AUC metrics were used to evaluate the models; they showed that the hybrid RoFRF model had a better goodness-of-fit and prediction accuracy than the ANN, LMT, BFT, and RF models. The hybrid model performed well in predicting shallow landslide occurrence.

  • 4) Our results showed that hybrid modeling using ensemble techniques, such as RF, is promising for shallow landslide susceptibility mapping. This approach can be used as a tool for shallow landslide hazard avoidance and mitigation worldwide.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Author contributions

BG, HS, AS, NA-A, AJ, MG, AM, SS, and AA contributed equally to the work. BG, HS, and AS collected field data and conducted the landslide susceptibility analysis. BG, HS, AS, NA-A, SS, and AJ wrote the manuscript. MG, AM, SS, and AA provided critical comments in planning this article and edited the manuscript. All the authors discussed the results and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the University of Kurdistan, Iran, under grant number 00-9-34027-4469.

Acknowledgments

The authors would like to thank the Vice Chancellorship of Research and Technology, University of Kurdistan, Sanandaj, Iran, for supplying the required data, reports, useful maps, and their nationwide geodatabase to the first author (BG) under a postdoctoral fellowship scheme.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • 1

    Abedini, M., Ghasemian, B., Shirzadi, A., Shahabi, H., Chapi, K., Pham, B. T., et al. (2019). A Novel Hybrid Approach of Bayesian Logistic Regression and its Ensembles for Landslide Susceptibility Assessment. Geocarto Int. 34 (13), 1427–1457. 10.1080/10106049.2018.1499820

  • 2

    Alcantara-Ayala, I. (2002). Geomorphology, Natural Hazards, Vulnerability and Prevention of Natural Disasters in Developing Countries. Geomorphology 47 (2-4), 107–124. 10.1016/S0169-555X(02)00083-1

  • 3

    Aleotti, P., and Chowdhury, R. (1999). Landslide Hazard Assessment: Summary Review and New Perspectives. Bull. Eng. Geol. Env. 58 (1), 21–44. 10.1007/s100640050066

  • 4

    Althuwaynee, O. F., Pradhan, B., and Lee, S. (2016). A Novel Integrated Model for Assessing Landslide Susceptibility Mapping Using CHAID and AHP Pair-Wise Comparison. Int. J. Remote Sens. 37 (5), 1190–1209. 10.1080/01431161.2016.1148282

  • 5

    Asadi, M., Goli Mokhtari, L., Shirzadi, A., Shahabi, H., and Bahrami, S. (2022). A Comparison Study on the Quantitative Statistical Methods for Spatial Prediction of Shallow Landslides (Case Study: Yozidar-Degaga Route in Kurdistan Province, Iran). Environ. Earth Sci. 81 (2), 1–21. 10.1007/s12665-021-10152-4

  • 6

    Bennett, N. D., Croke, B. F. W., Guariso, G., Guillaume, J. H. A., Hamilton, S. H., Jakeman, A. J., et al. (2013). Characterising Performance of Environmental Models. Environ. Model. Softw. 40, 1–20. 10.1016/j.envsoft.2012.09.011

  • 7

    Boulesteix, A.-L., Janitza, S., Kruppa, J., and König, I. R. (2012). Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics. WIREs Data Min. Knowl. Discov. 2 (6), 493–507. 10.1002/widm.1072

  • 8

    Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees, Int. Group 432, 151–166. 10.1201/9781315139470

  • 9

    Brenning, A. (2005). Spatial Prediction Models for Landslide Hazards: Review, Comparison and Evaluation. Nat. Hazards Earth Syst. Sci. 5 (6), 853–862. 10.5194/nhess-5-853-2005

  • 10

    Bryant, F. B., and Satorra, A. (2012). Principles and Practice of Scaled Difference Chi-Square Testing. Struct. Equ. Model. A Multidiscip. J. 19 (3), 372–398. 10.1080/10705511.2012.687671

  • 11

    Chao, L., Zhang, K., Wang, J., Feng, J., and Zhang, M. (2021). A Comprehensive Evaluation of Five Evapotranspiration Datasets Based on Ground and Grace Satellite Observations: Implications for Improvement of Evapotranspiration Retrieval Algorithm. Remote Sens. 13 (12), 2414. 10.3390/rs13122414

  • 12

    Chen, W., Li, W., Hou, E., Bai, H., Chai, H., Wang, D., et al. (2015). Application of Frequency Ratio, Statistical Index, and Index of Entropy Models and Their Comparison in Landslide Susceptibility Mapping for the Baozhong Region of Baoji, China. Arab. J. Geosci. 8 (4), 1829–1841. 10.1007/s12517-014-1554-0

  • 13

    Chen, W., Zhang, S., Li, R., and Shahabi, H. (2018). Performance Evaluation of the GIS-Based Data Mining Techniques of Best-First Decision Tree, Random Forest, and Naïve Bayes Tree for Landslide Susceptibility Modeling. Sci. Total Environ. 644, 1006–1018. 10.1016/j.scitotenv.2018.06.389

  • 14

    Chen, Z., Liu, Z., Yin, L., and Zheng, W. (2022). Statistical Analysis of Regional Air Temperature Characteristics before and after Dam Construction. Urban Clim. 41, 101085. 10.1016/j.uclim.2022.101085

  • 15

    Colkesen, I., and Kavzoglu, T. (2017). The Use of Logistic Model Tree (LMT) for Pixel- and Object-Based Classifications Using High-Resolution WorldView-2 Imagery. Geocarto Int. 32 (1), 71–86. 10.1080/10106049.2015.1128486

  • 16

    De Jesus, J. B., Kuplich, T. M., de Carvalho Barreto, Í. D., Da Rosa, C. N., and Hillebrand, F. L. (2021). Temporal and Phenological Profiles of Open and Dense Caatinga Using Remote Sensing: Response to Precipitation and its Irregularities. J. For. Res. 32 (3), 1067–1076. 10.1007/s11676-020-01145-3

  • 17

    Dou, J., Yunus, A. P., Tien Bui, D., Merghadi, A., Sahana, M., Zhu, Z., et al. (2019a). Assessment of Advanced Random Forest and Decision Tree Algorithms for Modeling Rainfall-Induced Landslide Susceptibility in the Izu-Oshima Volcanic Island, Japan. Sci. Total Environ. 662, 332–346. 10.1016/j.scitotenv.2019.01.221

  • 18

    Dou, J., Yunus, A. P., Xu, Y., Zhu, Z., Chen, C.-W., Sahana, M., et al. (2019b). Torrential Rainfall-Triggered Shallow Landslide Characteristics and Susceptibility Assessment Using Ensemble Data-Driven Models in the Dongjiang Reservoir Watershed, China. Nat. Hazards 97 (2), 579–609. 10.1007/s11069-019-03659-4

  • 19

    Fan, X., Scaringi, G., Korup, O., West, A. J., Westen, C. J., Tanyas, H., et al. (2019). Earthquake-Induced Chains of Geologic Hazards: Patterns, Mechanisms, and Impacts. Rev. Geophys. 57 (2), 421–503. 10.1029/2018rg000626

  • 20

    Fang, Z., Wang, Y., Duan, G., and Peng, L. (2021). Landslide Susceptibility Mapping Using Rotation Forest Ensemble Technique with Different Decision Trees in the Three Gorges Reservoir Area, China. Remote Sens. 13 (2), 238. 10.3390/rs13020238

  • 21

    Fawcett, T. (2006). An Introduction to ROC Analysis. Pattern Recognit. Lett. 27 (8), 861–874. 10.1016/j.patrec.2005.10.010

  • 22

    Geertsema, M., Highland, L., and Vaugeouis, L. (2009). Landslides–disaster Risk Reduction. Berlin, Heidelberg: Springer, 589–607.

  • 23

    Ghasemian, B., Shahabi, H., Shirzadi, A., Al-Ansari, N., Jaafari, A., Kress, V. R., et al. (2022). A Robust Deep-Learning Model for Landslide Susceptibility Mapping: a Case Study of Kurdistan Province, Iran. Sensors 22 (4), 1573. 10.3390/s22041573

  • 24

    Guyennon, N., Salerno, F., Rossi, D., Rainaldi, M., Calizza, E., and Romano, E. (2021). Climate Change and Water Abstraction Impacts on the Long-Term Variability of Water Levels in Lake Bracciano (Central Italy): A Random Forest Approach. J. Hydrology Regional Stud. 37, 100880. 10.1016/j.ejrh.2021.100880

  • 25

    He, Q., Xu, Z., Li, S., Li, R., Zhang, S., Wang, N., et al. (2019). Novel Entropy and Rotation Forest-Based Credal Decision Tree Classifier for Landslide Susceptibility Modeling. Entropy 21 (2), 106. 10.3390/e21020106

  • 26

    Hong, H., Miao, Y., Liu, J., and Zhu, A.-X. (2019). Exploring the Effects of the Design and Quantity of Absence Data on the Performance of Random Forest-Based Landslide Susceptibility Mapping. Catena 176, 45–64. 10.1016/j.catena.2018.12.035

  • 27

    Hong, H., Tsangaratos, P., Ilia, I., Loupasakis, C., and Wang, Y. (2020). Introducing a Novel Multi-Layer Perceptron Network Based on Stochastic Gradient Descent Optimized by a Meta-Heuristic Algorithm for Landslide Susceptibility Mapping. Sci. Total Environ. 742, 140549. 10.1016/j.scitotenv.2020.140549

  • 28

    Hung, L. Q., Van, N. T. H., Duc, D. M., Ha, L. T. C., Van Son, P., Khanh, N. H., et al. (2016). Landslide Susceptibility Mapping by Combining the Analytical Hierarchy Process and Weighted Linear Combination Methods: a Case Study in the Upper Lo River Catchment (Vietnam). Landslides 13 (5), 1285–1301. 10.1007/s10346-015-0657-3

  • 29

    Huppert, H. E., and Sparks, R. S. J. (2006). Extreme Natural Hazards: Population Growth, Globalization and Environmental Change. Phil. Trans. R. Soc. A 364 (1845), 1875–1888. 10.1098/rsta.2006.1803

  • 30

    Jaafari, A., Rezaeian, J., and Omrani, M. S. (2017). Spatial Prediction of Slope Failures in Support of Forestry Operations Safety. Croat. J. For. Eng. 38 (1), 107–118.

  • 31

    James, G., Witten, D., Hastie, T., and Tibshirani, R. (2021). An Introduction to Statistical Learning. New York, USA: Springer, 1557.

  • 32

    Janizadeh, S., Avand, M., Jaafari, A., Phong, T. V., Bayat, M., Ahmadisharaf, E., et al. (2019). Prediction Success of Machine Learning Methods for Flash Flood Susceptibility Mapping in the Tafresh Watershed, Iran. Sustainability 11 (19), 5426. 10.3390/su11195426

  • 33

    Jiang, S., Zuo, Y., Yang, M., and Feng, R. (2021). Reconstruction of the Cenozoic Tectono-Thermal History of the Dongpu Depression, Bohai Bay Basin, China: Constraints from Apatite Fission Track and Vitrinite Reflectance Data. J. Petroleum Sci. Eng. 205, 108809. 10.1016/j.petrol.2021.108809

  • 34

    Kadirhodjaev, A., Rezaie, F., Lee, M.-J., and Lee, S. (2020). Landslide Susceptibility Assessment Using an Optimized Group Method of Data Handling Model. ISPRS Int. J. Geo-Inf. 9 (10), 566. 10.3390/ijgi9100566

  • 35

    Khandelwal, M., Marto, A., Fatemi, S. A., Ghoroqi, M., Armaghani, D. J., Singh, T. N., et al. (2018). Implementing an ANN Model Optimized by Genetic Algorithm for Estimating Cohesion of Limestone Samples. Eng. Comput. 34 (2), 307–317. 10.1007/s00366-017-0541-y

  • 36

    Koehorst, B., Kjekstad, O., Patel, D., Lubkowski, Z., Knoeff, J., and Akkerman, G. (2005). Workpackage 6 Determination of Socio-Economic Impact of Natural Disasters. Assessing Socioeconomic Impact in Europe, 173

  • 37

    Konishi, T., and Suga, Y. (2018). Landslide Detection Using COSMO-SkyMed Images: A Case Study of a Landslide Event on Kii Peninsula, Japan. Eur. J. Remote Sens. 51 (1), 205–221.

  • 38

    Lan, Z., Zhao, Y., Zhang, J., Jiao, R., Khan, M. N., Sial, T. A., et al. (2021). Long-term Vegetation Restoration Increases Deep Soil Carbon Storage in the Northern Loess Plateau. Sci. Rep. 11 (1), 1–11. 10.1038/s41598-021-93157-0

  • 39

    Landwehr, N., Hall, M., and Frank, E. (2005). Logistic Model Trees. Mach. Learn. 59 (1-2), 161–205. 10.1007/s10994-005-0466-3

  • 40

    Lee, S., Hong, S.-M., and Jung, H.-S. (2017). A Support Vector Machine for Landslide Susceptibility Mapping in Gangwon Province, Korea. Sustainability 9 (1), 48. 10.3390/su9010048

  • 41

    Li, B., Yang, J., Yang, Y., Li, C., and Zhang, Y. (2021). Sign Language/gesture Recognition Based on Cumulative Distribution Density Features Using UWB Radar. IEEE Trans. Instrum. Meas. 70, 1–13. 10.1109/tim.2021.3092072

  • 42

    Li, H., Deng, J., Feng, P., Pu, C., Arachchige, D. D., and Cheng, Q. (2021a). Short-Term Nacelle Orientation Forecasting Using Bilinear Transformation and ICEEMDAN Framework. Front. Energy Res. 697, 780928. 10.3389/fenrg.2021.780928

  • 43

    Li, H., Deng, J., Yuan, S., Feng, P., and Arachchige, D. D. (2021b). Monitoring and Identifying Wind Turbine Generator Bearing Faults Using Deep Belief Network and EWMA Control Charts. Front. Energy Res. 9, 770. 10.3389/fenrg.2021.799039

  • 44

    Li, J., Zhao, Y., Zhang, A., Song, B., and Hill, R. L. (2021). Effect of Grazing Exclusion on Nitrous Oxide Emissions during Freeze-Thaw Cycles in a Typical Steppe of Inner Mongolia. Agric. Ecosyst. Environ. 307, 107217. 10.1016/j.agee.2020.107217

  • 45

    Li, Z.-J., and Zhang, K. (2008). Comparison of Three GIS-Based Hydrological Models. J. Hydrol. Eng. 13 (5), 364–370. 10.1061/(asce)1084-0699(2008)13:5(364)

  • 46

    Liu, B., Spiekermann, R., Zhao, C., Püttmann, W., Sun, Y., Jasper, A., et al. (2022). Evidence for the Repeated Occurrence of Wildfires in an Upper Pliocene Lignite Deposit from Yunnan, SW China. Int. J. Coal Geol. 250, 103924. 10.1016/j.coal.2021.103924

  • 47

    Luan, D., Liu, A., Wang, X., Xie, Y., and Wu, Z. (2022). Robust Two-Stage Location Allocation for Emergency Temporary Blood Supply in Postdisaster. Discrete Dyn. Nat. Soc., 2022. 10.1155/2022/6184170

  • 48

    Lucchese, L. V., De Oliveira, G. G., and Pedrollo, O. C. (2021). Mamdani Fuzzy Inference Systems and Artificial Neural Networks for Landslide Susceptibility Mapping. Nat. Hazards 106 (3), 2381–2405. 10.1007/s11069-021-04547-6

  • 49

    Malamud, B. D., Turcotte, D. L., Guzzetti, F., and Reichenbach, P. (2004). Landslide Inventories and Their Statistical Properties. Earth Surf. Process. Landforms 29 (6), 687–711. 10.1002/esp.1064

  • 50

    Meena, S., Mishra, B., and Tavakkoli Piralilou, S. (2019). A Hybrid Spatial Multi-Criteria Evaluation Method for Mapping Landslide Susceptible Areas in Kullu Valley, Himalayas. Geosciences 9 (4), 156. 10.3390/geosciences9040156

  • 51

    Merghadi, A., Yunus, A. P., Dou, J., Whiteley, J., Thaipham, B., Bui, D. T., et al. (2020). Machine Learning Methods for Landslide Susceptibility Studies: A Comparative Overview of Algorithm Performance. Earth-Science Rev. 207, 103225. 10.1016/j.earscirev.2020.103225

  • 52

    Nguyen, B.-Q.-V., and Kim, Y.-T. (2021). Landslide Spatial Probability Prediction: a Comparative Assessment of Naïve Bayes, Ensemble Learning, and Deep Learning Approaches. Bull. Eng. Geol. Environ. 80 (6), 4291–4321. 10.1007/s10064-021-02194-6

  • 53

    Nguyen, Q. K., Bui, D. T., Hoang, N. D., Trinh, P. T., Nguyen, V. H., and Yilmaz, I. (2017). A Novel Hybrid Approach Based on Instance Based Learning Classifier and Rotation Forest Ensemble for Spatial Prediction of Rainfall-Induced Shallow Landslides Using GIS. Sustain. Switz. 9 (5), 813. 10.3390/su9050813

  • 54

    Nhu, V.-H., Shirzadi, A., Shahabi, H., Chen, W., Clague, J. J., Geertsema, M., et al. (2020). Shallow Landslide Susceptibility Mapping by Random Forest Base Classifier and its Ensembles in a Semi-arid Region of Iran. Forests 11 (4), 421. 10.3390/f11040421

  • 55

    Park, S., Hamm, S.-Y., and Kim, J. (2019). Performance Evaluation of the GIS-Based Data-Mining Techniques Decision Tree, Random Forest, and Rotation Forest for Landslide Susceptibility Modeling. Sustainability 11 (20), 5659. 10.3390/su11205659

  • 56

    Park, S., and Kim, J. (2019). Landslide Susceptibility Mapping Based on Random Forest and Boosted Regression Tree Models, and a Comparison of Their Performance. Appl. Sci. 9 (5), 942. 10.3390/app9050942

  • 57

    Pham, B. T., Jaafari, A., Nguyen-Thoi, T., Van Phong, T., Nguyen, H. D., Satyam, N., et al. (2020a). Ensemble Machine Learning Models Based on Reduced Error Pruning Tree for Prediction of Rainfall-Induced Landslides. Int. J. Digital Earth 14, 1–22. 10.1080/17538947.2020.1860145

  • 58

    Pham, B. T., Phong, T. V., Nguyen-Thoi, T., Trinh, P. T., Tran, Q. C., Ho, L. S., et al. (2020b). GIS-based Ensemble Soft Computing Models for Landslide Susceptibility Mapping. Adv. Space Res. 66 (6), 1303–1320. 10.1016/j.asr.2020.05.016

  • 59

    Pham, B. T., Bui, D. T., Dholakia, M. B., Prakash, I., Pham, H. V., Mehmood, K., et al. (2017). A Novel Ensemble Classifier of Rotation Forest and Naïve Bayer for Landslide Susceptibility Assessment at the Luc Yen District, Yen Bai Province (Viet Nam) Using GIS. Geomatics, Nat. Hazards Risk 8 (2), 649–671. 10.1080/19475705.2016.1255667

  • 60

    Quinlan, J. R. (1986). Induction of Decision Trees. Mach. Learn. 1 (1), 81–106. 10.1007/bf00116251

  • 61

    Razavizadeh, S., Solaimani, K., Massironi, M., and Kavian, A. (2017). Mapping Landslide Susceptibility with Frequency Ratio, Statistical Index, and Weights of Evidence Models: a Case Study in Northern Iran. Environ. Earth Sci. 76 (14), 1–16. 10.1007/s12665-017-6839-7

  • 62

    Reichenbach, P., Rossi, M., Malamud, B. D., Mihir, M., and Guzzetti, F. (2018). A Review of Statistically-Based Landslide Susceptibility Models. Earth-Science Rev. 180, 60–91. 10.1016/j.earscirev.2018.03.001

  • 63

    Rodriguez, J. J., Kuncheva, L. I., and Alonso, C. J. (2006). Rotation Forest: A New Classifier Ensemble Method. IEEE Trans. Pattern Anal. Mach. Intell. 28 (10), 1619–1630. 10.1109/tpami.2006.211

  • 64

    Rodriguez-Galiano, V. F., Ghimire, B., Rogan, J., Chica-Olmo, M., and Rigol-Sanchez, J. P. (2012). An Assessment of the Effectiveness of a Random Forest Classifier for Land-Cover Classification. ISPRS J. Photogrammetry Remote Sens. 67, 93–104. 10.1016/j.isprsjprs.2011.11.002

  • 65

    Sarkar, S., and Kanungo, D. P. (2004). An Integrated Approach for Landslide Susceptibility Mapping Using Remote Sensing and GIS. Photogramm. Eng. Remote Sens. 70 (5), 617–625. 10.14358/pers.70.5.617

  • 66

    Schlögl, M., and Matulla, C. (2018). Potential Future Exposure of European Land Transport Infrastructure to Rainfall-Induced Landslides throughout the 21st Century. Nat. Hazards Earth Syst. Sci. 18 (4), 1121–1132. 10.5194/nhess-18-1121-2018

  • 67

    Schlögl, M., Richter, G., Avian, M., Thaler, T., Heiss, G., Lenz, G., et al. (2019). On the Nexus between Landslide Susceptibility and Transport Infrastructure–An Agent-Based Approach. Nat. Hazards Earth Syst. Sci. 19 (1), 201–219. 10.5194/nhess-19-201-2019

  • 68

    Schuster, R. L., and Highland, L. (2001). Socioeconomic and Environmental Impacts of Landslides in the Western Hemisphere. Citeseer, 148. 10.3133/ofr01276

  • 69

    Shafizadeh-Moghadam, H., Minaei, M., Shahabi, H., and Hagenauer, J. (2019). Big Data in Geohazard; Pattern Mining and Large Scale Analysis of Landslides in Iran. Earth Sci. Inf. 12 (1), 1–17. 10.1007/s12145-018-0354-6

  • 70

    Shi, H. (2007). Best-first Decision Tree Learning. Hamilton, New Zealand: The University of Waikato.

  • 71

    Smith, H. G., Spiekermann, R., Betts, H., and Neverman, A. J. (2021). Comparing Methods of Landslide Data Acquisition and Susceptibility Modelling: Examples from New Zealand. Geomorphology 381, 107660. 10.1016/j.geomorph.2021.107660

  • 72

    Sun, D., Shi, S., Wen, H., Xu, J., Zhou, X., and Wu, J. (2021a). A Hybrid Optimization Method of Factor Screening Predicated on GeoDetector and Random Forest for Landslide Susceptibility Mapping. Geomorphology 379, 107623. 10.1016/j.geomorph.2021.107623

  • 73

    Sun, D., Xu, J., Wen, H., and Wang, D. (2021b). Assessment of Landslide Susceptibility Mapping Based on Bayesian Hyperparameter Optimization: A Comparison between Logistic Regression and Random Forest. Eng. Geol. 281, 105972. 10.1016/j.enggeo.2020.105972

  • 74

    Tran, Q. C., Minh, D. D., Jaafari, A., Al-Ansari, N., Minh, D. D., Van, D. T., et al. (2020). Novel Ensemble Landslide Predictive Models Based on the Hyperpipes Algorithm: A Case Study in the Nam Dam Commune, Vietnam. Appl. Sci. 10 (11), 3710. 10.3390/app10113710

  • 75

    Wang, Q., Guo, Y., Li, W., He, J., and Wu, Z. (2019). Predictive Modeling of Landslide Hazards in Wen County, Northwestern China Based on Information Value, Weights-Of-Evidence, and Certainty Factor. Geomatics, Nat. Hazards Risk 10 (1), 820–835. 10.1080/19475705.2018.1549111

  • 76

    Wang, S., Zhang, K., Chao, L., Li, D., Tian, X., Bao, H., et al. (2021). Exploring the Utility of Radar and Satellite-Sensed Precipitation and Their Dynamic Bias Correction for Integrated Prediction of Flood and Landslide Hazards. J. Hydrology 603, 126964. 10.1016/j.jhydrol.2021.126964

  • 77

    Xie, W., Li, X., Jian, W., Yang, Y., Liu, H., Robledo, L. F., et al. (2021a). A Novel Hybrid Method for Landslide Susceptibility Mapping-Based Geodetector and Machine Learning Cluster: A Case of Xiaojin County, China. ISPRS Int. J. Geo-Inf. 10 (2), 93. 10.3390/ijgi10020093

  • 78

    Xie, W., Nie, W., Saffari, P., Robledo, L. F., Descote, P.-Y., and Jian, W. (2021b). Landslide Hazard Assessment Based on Bayesian Optimization-Support Vector Machine in Nanping City, China. Nat. Hazards 109 (1), 931–948. 10.1007/s11069-021-04862-y

  • 79

    Xu, J., Wu, Z., Chen, H., Shao, L., Zhou, X., and Wang, S. (2021). Study on Strength Behavior of Basalt Fiber-Reinforced Loess by Digital Image Technology (DIT) and Scanning Electron Microscope (SEM). Arab. J. Sci. Eng. 46 (11), 11319–11338. 10.1007/s13369-021-05787-1

  • 80

    Yao, X., Tham, L. G., and Dai, F. C. (2008). Landslide Susceptibility Mapping Based on Support Vector Machine: a Case Study on Natural Slopes of Hong Kong, China. Geomorphology 101 (4), 572–582. 10.1016/j.geomorph.2008.02.011

  • 81

    Yin, L., Wang, L., Keim, B. D., Konsoer, K., and Zheng, W. (2022a). Wavelet Analysis of Dam Injection and Discharge in Three Gorges Dam and Reservoir with Precipitation and River Discharge. Water 14 (4), 567. 10.3390/w14040567

  • 82

    Yin, L., Wang, L., Zheng, W., Ge, L., Tian, J., Liu, Y., et al. (2022b). Evaluation of Empirical Atmospheric Models Using Swarm-C Satellite Data. Atmosphere 13 (2), 294. 10.3390/atmos13020294

  • 83

    Zhang, K., Ali, A., Antonarakis, A., Moghaddam, M., Saatchi, S., Tabatabaeenejad, A., et al. (2019a). The Sensitivity of North American Terrestrial Carbon Fluxes to Spatial and Temporal Variation in Soil Moisture: An Analysis Using Radar-Derived Estimates of Root-Zone Soil Moisture. J. Geophys. Res. Biogeosci. 124 (11), 3208–3231. 10.1029/2018jg004589

  • 84

    Zhang, K., Wang, S., Bao, H., and Zhao, X. (2019b). Characteristics and Influencing Factors of Rainfall-Induced Landslide and Debris Flow Hazards in Shaanxi Province, China. Nat. Hazards Earth Syst. Sci. 19 (1), 93–105. 10.5194/nhess-19-93-2019

  • 85

    Zhang, K., Shalehy, M. H., Ezaz, G. T., Chakraborty, A., Mohib, K. M., and Liu, L. (2022). An Integrated Flood Risk Assessment Approach Based on Coupled Hydrological-Hydraulic Modeling and Bottom-Up Hazard Vulnerability Analysis. Environ. Model. Softw. 148, 105279. 10.1016/j.envsoft.2021.105279

  • 86

    Zhang, Y., Liu, F., Fang, Z., Yuan, B., Zhang, G., Lu, J., et al. (2021). Learning from a Complementary-Label Source Domain: Theory and Algorithms. IEEE Transactions on Neural Networks and Learning Systems.

  • 87

    Zhao, X., Xia, H., Pan, L., Song, H., Niu, W., Wang, R., et al. (2021). Drought Monitoring over Yellow River Basin from 2003–2019 Using Reconstructed MODIS Land Surface Temperature in Google Earth Engine. Remote Sens. 13 (18), 3748. 10.3390/rs13152934

  • 88

    Zhou, G., Long, S., Xu, J., Zhou, X., Song, B., Deng, R., et al. (2021a). Comparison Analysis of Five Waveform Decomposition Algorithms for the Airborne LiDAR Echo Signal. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 14, 7869–7880. 10.1109/jstars.2021.3096197

  • 89

    Zhou, G., Zhang, R., and Huang, S. (2021b). Generalized Buffering Algorithm. IEEE Access 9, 27140–27157. 10.1109/access.2021.3057719

  • 90

    Zhou, W., Lv, Y., Lei, J., and Yu, L. (2019). Global and Local-Contrast Guides Content-Aware Fusion for RGB-D Saliency Prediction. IEEE Trans. Syst. Man Cybern. Syst. 51 (6), 3641–3649. 10.1109/tsmc.2019.2957386

  • 91

    Zhou, W., Guo, Q., Lei, J., Yu, L., and Hwang, J. (2021). IRFR-Net: Interactive Recursive Feature-Reshaping Network for Detecting Salient Objects in RGB-D Images. IEEE Trans. Neural Netw. Learn. Syst., 1–13. 10.1109/TNNLS.2021.3105484

  • 92

    Zhou, X., Wen, H., Zhang, Y., Xu, J., and Zhang, W. (2021). Landslide Susceptibility Mapping Using Hybrid Random Forest with GeoDetector and RFE for Factor Optimization. Geosci. Front. 12 (5), 101211. 10.1016/j.gsf.2021.101211

  • 93

    Zhou, Y., Xu, G., Tang, K., Tian, L., and Sun, Y. (2022). Video Coding Optimization in AVS2. Inf. Process. Manag. 59 (2), 102808. 10.1016/j.ipm.2021.102808

Summary

Keywords

landslide susceptibility, spatial modeling, rotation forest, random forest, decision tree, GIS, Iran

Citation

Ghasemian B, Shahabi H, Shirzadi A, Al-Ansari N, Jaafari A, Geertsema M, Melesse AM, Singh SK and Ahmad A (2022) Application of a Novel Hybrid Machine Learning Algorithm in Shallow Landslide Susceptibility Mapping in a Mountainous Area. Front. Environ. Sci. 10:897254. doi: 10.3389/fenvs.2022.897254

Received

15 March 2022

Accepted

29 April 2022

Published

13 June 2022

Volume

10 - 2022

Edited by

Yusen He, Grinnell College, United States

Reviewed by

Jiahao Deng, DePaul University, United States

Jagabandhu Roy, University of Gour Banga, India

Haijia Wen, Chongqing University, China

Copyright

*Correspondence: Himan Shahabi,

This article was submitted to Environmental Informatics and Remote Sensing, a section of the journal Frontiers in Environmental Science

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
