ORIGINAL RESEARCH article

Front. Earth Sci., 12 January 2026

Sec. Geohazards and Georisks

Volume 13 - 2025 | https://doi.org/10.3389/feart.2025.1722201

This article is part of the Research Topic: Prevention, Mitigation, and Relief of Compound and Chained Natural Hazards, volume III

Spatial distribution patterns and landslide susceptibility analysis from a global–local perspective along the Zhuzhou-Guangzhou section of the Beijing–Guangzhou railway

Hanxing Liu1,2, Chong Xu1,2,3*, Liye Feng1,2, Peng Wang4, Jingjing Sun1,2, Xuewei Zhang1,2, Juanling Wang5, Qihao Sun6, Kang Li1
  • 1National Institute of Natural Hazards, Ministry of Emergency Management of China, Beijing, China
  • 2Key Laboratory of Compound and Chained Natural Hazards Dynamics, Ministry of Emergency Management of China, Beijing, China
  • 3School of Geology and Mining Engineering, Xinjiang University, Urumqi, China
  • 4Beijing Engineering Corporation Limited, Beijing, China
  • 5China Railway Xi’an Group Company Limited, Xi’an, Shaanxi, China
  • 6China Railway Design Corporation, Tianjin, China

The Beijing–Guangzhou Railway is a critical transportation corridor in China, and its Zhuzhou–Guangzhou segment is particularly susceptible to landslides owing to steep terrain and marked climatic variability. This study combined 713 large-scale landslide relics with 14 influencing factors (topography, geology, and hydrology) to analyze landslide spatial distribution and proposed a multiscale landslide susceptibility assessment framework based on a “global modeling–local analysis” approach. The contribution of each factor to susceptibility was quantified using a random forest model coupled with the SHapley Additive exPlanations (SHAP) method. Spatial analysis revealed that landslides are concentrated in areas at 188–752 m elevation, with slopes of 15°–30°, curvatures of −0.35 to 0.22, Jurassic strata, and high precipitation, and are significantly influenced by river erosion and railway engineering activities. The susceptibility model performed well (AUC = 0.88). Global analysis showed that high and very high susceptibility areas are primarily located in the central and southern sections of the study area, particularly in northern Lechang, Shaoguan, and Suxian District, Chenzhou. Local analysis of three representative sections further indicated that slope, Topographic Wetness Index (TWI), elevation, and curvature are the primary hazard factors, with the order of contribution varying across sections, reflecting the regional characteristics and complexity of landslide mechanisms along the railway. This study provides a scientific basis for landslide risk prevention and control along the Beijing–Guangzhou Railway and a transferable reference for susceptibility assessment in similar regions.

1 Introduction

Landslides, as highly destructive and sudden geological hazards, pose serious threats to infrastructure and the safety of residents in mountainous regions (Hungr et al., 2014). Railways, a vital component of China’s transportation system, frequently traverse geologically complex regions such as mountainous and hilly terrains, rendering them particularly susceptible to landslides and related geohazards such as debris flows and rockfalls. The Beijing–Guangzhou Railway, one of the most important north–south transportation arteries in China, passes through several provinces with high geological hazard susceptibility, including Hunan and Guangdong. It serves as a national Class I trunk line with heavy passenger and freight traffic. The Zhuzhou–Guangzhou section of the railway runs through low mountainous and hilly areas, characterized by intense geomorphic dissection, slow vegetation recovery, active geological structures, and intense and frequent precipitation. Combined with increasing human activities, these conditions contribute to a prominent risk of landslide disasters along the route. In 2020, a landslide triggered by heavy rainfall on the Beijing-Guangzhou Railway in Yongxing County, Hunan Province, caused a train to derail after colliding with the collapsed material, resulting in direct economic losses estimated at 3.14 million USD (Bureau, 2020). This incident severely threatened operational safety, emphasizing the urgent need for effective landslide risk mitigation in this region.

In light of such threats, landslide susceptibility assessment serves as a critical tool for disaster prevention and mitigation, aiming to quantify the probability of landslide occurrence in a region by analyzing influencing factors such as topography, geology, and climate (Aleotti and Chowdhury, 1999; Reichenbach et al., 2018). Since the 1970s, landslide susceptibility evaluation methods have undergone significant development (Kubwimana et al., 2021; Huang, 2023). Early research methods primarily relied on qualitative analysis based on expert experience, such as the Analytic Hierarchy Process (AHP) and fuzzy comprehensive evaluation methods (Xu et al., 2009; Corominas et al., 2014; Liu et al., 2021; Wang et al., 2024b). Although these methods exhibit certain applicability in scenarios with limited data, their outcomes are often limited in objectivity and reproducibility (Yong et al., 2022). With the advancement of data acquisition capabilities, quantitative methods based on mathematical statistics, such as the information value method and frequency ratio method, have gradually become mainstream (Kamp et al., 2008; Xu et al., 2010; Yuan et al., 2022). However, these methods still exhibit limitations when dealing with complex nonlinear relationships, such as the reliance on subjective judgment for weight determination and the difficulty in fully capturing the intricate interactions between landslides and multiple influencing factors (Hong et al., 2017a; Kouhartsiouk and Perdikou, 2021).

In recent years, the introduction of machine learning techniques has provided new perspectives for landslide susceptibility evaluation (Merghadi et al., 2020; Fang et al., 2021; He et al., 2023; Guo et al., 2024). The Random Forest (RF) model has gradually emerged as a mainstream method for landslide susceptibility evaluation due to its strong resistance to overfitting and excellent compatibility with high-dimensional data (Bai et al., 2011; Youssef et al., 2016; Habumugisha et al., 2022). Studies have shown that the RF model performs exceptionally well in regional landslide prediction, with its AUC values typically surpassing those of traditional methods. For instance, Hong et al. compared logistic regression with the Random Forest model and found that the latter achieved higher prediction accuracy in the Wuyuan region (Hong et al., 2017b). Zhao et al. validated the superiority of the RF model in the Hengduan Mountains region through a comparison of six different models (Zhao et al., 2022). Zhang et al. utilized the RF and XGBoost models to assess landslide susceptibility in Fengjie County, Chongqing. The results demonstrated that the RF model outperformed the XGBoost model in landslide susceptibility evaluation for this region, with higher AUC values, accuracy, and F-scores (Zhang et al., 2023).

However, existing research largely focuses on natural geographical units such as watersheds and counties, lacking targeted analysis of linear engineering corridors (such as railways). Railway corridors not only have complex terrain and highly patchy landslide distribution along their lines, but are also influenced by multiple factors such as human disturbance and operational safety requirements, resulting in strong spatial heterogeneity and mechanistic diversity in their disaster risks. Current research on railway-related geological hazards mainly concentrates on the substructural response, deformation monitoring, or seismic behavior of engineering structures such as bridges and tunnels. For example, Sam et al. conducted a systematic study on the seismic behavior of tunnels under high ground stress conditions (Sam, 2024). In addition, although advanced technologies such as microseismic methods have provided insights into local ground motion (Susilo et al., 2024), and the importance of triggering factors such as extreme rainfall is increasingly recognized (Yanfatriani et al., 2024), systematic modeling studies of natural geological conditions and landslide susceptibility across entire railway corridors remain insufficient. In particular, there is still a lack of spatial mechanism analysis and local diagnostic studies for typical high-risk sections.

To address these gaps, this study focuses on the Zhuzhou–Guangzhou section of the Beijing–Guangzhou Railway and proposes a multiscale landslide susceptibility assessment framework based on a “global modeling–local analysis” approach. By integrating extensive landslide relic data with 14 influencing factors (including topographic, geological, and hydrological variables), the framework employs a Random Forest model for analysis and incorporates the SHapley Additive exPlanations (SHAP) interpretability method to extract factor contributions, thereby enhancing model transparency and explainability.

Building on the overall model, this study further identifies high-risk segments along the railway and develops a “typical section detailed analysis module.” Three representative regions are selected to examine the spatial distribution of dominant factors, differences in factor responses, and disaster-triggering mechanisms, enabling fine-scale spatial identification and risk interpretation at the segment level. This refined approach not only addresses the limited adaptability of traditional models in complex terrains but also embodies an in-depth extension from whole-region modeling to localized diagnosis, providing scientific support for segmented risk management and differentiated mitigation strategies of landslide hazards along railways.

2 Study area and data

2.1 Study area

The Beijing-Guangzhou Railway spans the north-south axis of China, extending from 39°48′N to 23°05′N latitude and from 113°04′E to 114°53′E longitude. It extends from Fengtai District in Beijing in the north to Guangzhou City in Guangdong Province in the south, with a total length of approximately 2,298 km. The railway traverses three major geographical regions: the North China Plain, the Jianghan Plain, and the Pearl River Delta. It intersects with several key rail lines, including the Beijing-Shanghai Railway, the Shanghai-Kunming Railway, the Hankou-Yichang Railway, and the Wuhan-Guangzhou High-Speed Railway, forming the core framework of China’s railway network. This study selects the Zhuzhou to Guangzhou section of the Beijing–Guangzhou Railway and its surrounding areas as the research focus. The railway passes through several cities including Zhuzhou, Hengyang, Chenzhou, Shaoguan, and Qingyuan, covering a total length of approximately 668 km (Figure 1). This section is located on the eastern edge of the Nanling tectonic belt and is one of the segments of the Beijing–Guangzhou Railway that traverses the most mountainous terrain. Along the route, it frequently crosses narrow valleys, steep slopes, and river-cut landscapes.

Figure 1. Location map of the study area.

The study area features complex terrain dominated by mountains and hills, with significant elevation variation, ranging from approximately 200 to 800 m above sea level. Certain local sections have steep slopes and deep valleys, with well-developed gullies. The region exhibits diverse lithologies, primarily granite, gneiss, sandstone, shale, and clastic rocks. Some rock masses have well-developed joints and fractures and weak resistance to weathering, resulting in loose stratigraphic structures and soft lithologies. These conditions reduce slope stability and favor the development of geological hazards such as landslides. The climate is subtropical monsoon, with four distinct seasons and rainfall that coincides with the warm period. Annual precipitation ranges from about 400 to 1,600 mm, and the mean annual temperature is approximately 15 °C–20 °C. Heavy rainfall events are concentrated mainly from April to June, and frequent extreme precipitation serves as a significant external trigger for landslides. The combined influence of these factors makes the Beijing–Guangzhou Railway corridor a landslide-prone area, posing considerable threats to the safe operation of the railway.

2.2 Large-scale landslide relics inventory

Landslide relics refer to areas where landslides have occurred or are currently occurring. While most of these areas have stabilized, they may still undergo deformation under external forcing such as earthquakes or rainfall. These relics serve as the only direct data source for understanding the developmental context of local landslides (Guerriero et al., 2021; Wang et al., 2024c). This study uses multi-source high-resolution satellite imagery (QuickBird, IKONOS, SPOT-5, etc.) provided by Google Earth to systematically identify and catalog landslide relics in the study area by visual interpretation (Highland, 2004; Li et al., 2024; Xue et al., 2025). The inventory was verified against historical materials and existing research to ensure its reliability.

Landslides with an area of ≥0.05 km² were defined as large landslide relics. A total of 713 large landslide relics were cataloged, with individual landslide areas ranging from 0.05 to 2.02 km², and a total landslide area of 117.35 km² (Figures 2, 3). Investigating the inventory of landslide relics aids in identifying high-risk landslide areas and assessing the likelihood and potential impacts of future landslides, thereby providing a scientific basis for formulating effective disaster prevention and mitigation strategies (Feng et al., 2024b; Gao et al., 2024; Huang et al., 2024; Xie et al., 2025).

Figure 2. Typical landslide images in the study area. (a) at 23°44′26.37″N, 113°47′15.85″E; (b) at 24°28′16.83″N, 112°09′1.82″E; (c) at 23°44′22.27″N, 113°46′30.93″E; (d) at 24°29′43.20″N, 112°13′37.10″E.

Figure 3. Large-scale landslide relics distribution map.

2.3 Influencing factors

Landslides are the result of interactions among various factors, including geological conditions, topography, hydrogeological conditions, seismic activity, human activities, and climate change (Brabb, 1984; Cruden, 1991). Therefore, based on the characteristics of the study area, this paper selected 14 factors in four categories: topography, basic geology, hydrometeorology, and surface cover (Figure 4 and Table 1). The topographic factors include elevation, slope, aspect, Terrain Position Index (TPI), and curvature; the basic geological factors include geology, distance to fault, and peak ground acceleration (PGA); the hydrometeorological factors include average precipitation, distance to river, and Topographic Wetness Index (TWI); and the surface cover factors include Fractional Vegetation Cover (FVC), distance to railway, and landcover. These 14 influencing factors were used for landslide susceptibility analysis.

Figure 4. Factor classification maps. (a) Elevation; (b) Slope; (c) Aspect; (d) TPI; (e) Curvature; (f) Geology; (g) Average precipitation; (h) FVC; (i) TWI; (j) PGA; (k) Landcover; (l) Distance to Fault; (m) Distance to River; (n) Distance to Railway.

Table 1. Environmental impact factors and data sources.

The elevation data were sourced from SRTM (Jarvis et al., 2008), and slope, aspect, and curvature information were extracted using GeoScience software. The slope position data were obtained through the Land Facet Corridor program. The TWI was generated based on DEM within the GRASS GIS environment. The land cover type was derived from the land cover dataset GLCNMO (Tateishi et al., 2011). The fault data were obtained from the National Seismic Active Fault Data and categorized into 11 classes (Xu et al., 2016; Wu et al., 2024). The stratigraphic data were sourced from the 1:2.5 million Geological Map of China (China Geological Survey) and partial geological maps of Asian countries (United States Geological Survey), and classified into 12 categories based on geological age. The average precipitation data were obtained from global climate data with a resolution of 1 km, and the average precipitation map for the study area was generated using linear interpolation methods (Fick and Hijmans, 2017). The PGA data were sourced from the United States Geological Survey (USGS), representing the coseismic PGA distribution map derived from a combination of station data and numerical simulations following seismic events (Xu et al., 2019). All factor data were processed into 30 m × 30 m grid format to prepare for subsequent modeling.

To avoid multicollinearity among the influencing factors, this study conducted a correlation analysis of the 14 factors using the Pearson correlation coefficient (Huang et al., 2025). Figure 5 shows that the absolute values of all pairwise correlations are below 0.7, indicating no severe multicollinearity; all 14 factors were therefore retained as model inputs.

Figure 5. Pearson correlation analysis of influencing factors.
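
The screening rule described above (keep factors only if every pairwise |r| is below 0.7) can be sketched in a few lines. This is a minimal illustration, not the authors' workflow: the factor values are hypothetical toy numbers, and `elevation_copy` is an invented near-duplicate column added only to show what a flagged pair looks like.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_collinear(factors, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(factors), 2):
        r = pearson(factors[a], factors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical toy samples; real inputs would be factor values at sample points.
factors = {
    "elevation": [120, 340, 560, 610, 890, 1020],
    "slope": [5, 22, 14, 31, 18, 27],
    "elevation_copy": [125, 338, 570, 600, 900, 1015],  # invented duplicate
}
flagged = flag_collinear(factors)
print(flagged)
```

Only the invented duplicate pair is flagged here; an empty list would correspond to the situation reported for the 14 factors.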

3 Methods

This paper uses the random forest method to evaluate the landslide susceptibility of the Zhuzhou–Guangzhou section of the Beijing–Guangzhou Railway and its surrounding areas. The main steps are: (a) Data preparation: compile a local landslide database and prepare the influencing factors; (b) Data processing: use a 30-m grid as the mapping unit, reclassify the influencing factors to simplify calculation, select random points in non-landslide areas equal in number to the landslide samples as negative samples, assign values of 1 and 0 to the positive and negative sample points, respectively, and randomly split all samples into training and test sets in an 8:2 ratio; (c) Model training and evaluation: train the model on the training set, then verify and evaluate it on the test set; (d) Landslide susceptibility evaluation: use the trained model to predict regional landslide susceptibility (Figure 6).

Figure 6. Workflow diagram.

3.1 Model sampling

In this study, a total of 1,426 sample points were selected and the ratio of positive to negative samples was 1:1. The positive samples were obtained by converting 713 large landslide relics into points, while the negative samples were created as random points in non-landslide areas. Values of 1 and 0 were assigned to the positive and negative sample points, respectively, to provide sample data for subsequent model training (Figure 7).

Figure 7. Distribution map of sample points.
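
The 1:1 sampling and 8:2 split described in this section can be sketched as follows. This is an illustrative outline only: the coordinates are hypothetical placeholders, the function names are invented for this example, and a real workflow would also have to exclude landslide polygons when drawing negative points. The exact train/test counts depend on how the split is rounded, so the counts below are not the authors' figures.

```python
import random


def build_samples(landslide_pts, candidate_pts, seed=42):
    """Label landslide points 1 and an equal number of random candidates 0."""
    rng = random.Random(seed)
    negatives = rng.sample(candidate_pts, k=len(landslide_pts))
    labelled = [(p, 1) for p in landslide_pts] + [(p, 0) for p in negatives]
    rng.shuffle(labelled)
    return labelled


def split_train_test(samples, train_fraction=0.8):
    """Split an already shuffled sample list into training and test parts."""
    cut = int(round(len(samples) * train_fraction))
    return samples[:cut], samples[cut:]


# Hypothetical coordinates standing in for 713 relic centroids and for
# candidate points located outside mapped landslides.
landslides = [(i, i + 1) for i in range(713)]
candidates = [(1000 + i, i) for i in range(5000)]

samples = build_samples(landslides, candidates)
train, test = split_train_test(samples)
print(len(samples), len(train), len(test))
```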

3.2 Random Forests

Random Forests (RF) is an ensemble method that individually trains binary decision trees, primarily known for its exceptional generalization capability and flexibility (Breiman, 2001; Torres-Vázquez et al., 2025). The RF algorithm is widely applicable to both classification and regression problems and exhibits relatively high tolerance to outliers and noise, making it less prone to overfitting (Zhang et al., 2017; Dey et al., 2024). The RF algorithm learns by constructing multiple decision trees, with each tree trained on a different random subsample of the dataset. Compared to a single decision tree, RF provides more robust and accurate prediction results. The mathematical expression of the RF algorithm is Equation 1:

$$F(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x, \theta_b) \tag{1}$$

where $B$ is the number of decision trees, $T_b(x, \theta_b)$ is the prediction function of the $b$-th tree, and $\theta_b$ represents the parameters of the $b$-th tree. The RF algorithm reduces variance and enhances the model’s generalization capability by averaging the predictions of multiple trees. It demonstrates excellent predictive performance across various data distributions and has become a preferred method for classification and prediction tasks.
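
To make the averaging in Equation 1 concrete, the sketch below bags depth-1 trees (stumps) on a single feature and averages their outputs. It is a deliberately simplified stand-in for a full random forest: there is no random feature selection, each tree has a single split, and the slope values and labels are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a depth-1 tree: one threshold on one feature, minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    if best is None:  # bootstrap sample with a single distinct x value
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def fit_forest(xs, ys, n_trees=25, seed=42):
    """Train each tree T_b on its own bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def forest_predict(forest, x):
    """Equation 1: F(x) is the mean of the B individual tree predictions."""
    return sum(tree(x) for tree in forest) / len(forest)


# Hypothetical slope angles (degrees) and landslide labels.
slopes = [3, 5, 8, 12, 20, 24, 28, 33, 35, 40]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
forest = fit_forest(slopes, labels)
print(forest_predict(forest, 5), forest_predict(forest, 35))
```

Individual stumps disagree on where to split because each sees a different resample; averaging them smooths out that disagreement, which is the variance reduction the text refers to.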

This study employs the GridSearchCV method to repeatedly tune hyperparameters on the training set. The search ranges include: n_estimators [100, 120, 140, 160, 180], max_depth [3, 5, 7, 10, None] (where None indicates no depth restriction), min_samples_split [2, 5, 10], min_samples_leaf [1, 2, 4], and max_features [‘sqrt’, ‘log2’]. A 10-fold cross-validation is applied, and the optimal parameters are selected by maximizing the validation set AUC. The final optimal hyperparameter combination is as follows: the random forest consists of 140 decision trees, with a maximum depth of 5, a minimum split sample number of 2, and a minimum of 1 sample per leaf node. The dataset is randomly divided into training and test sets in an 8:2 ratio, and the training features are standardized. The susceptibility model is then constructed with a fixed random seed of 42.
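
A hedged sketch of this tuning procedure with scikit-learn is shown below. It is not the authors' script: the data are synthetic (`make_classification` stands in for the 14-factor sample table), the grid is cut down to two parameters so the example runs quickly, and feature standardization is omitted for brevity. The scoring metric, 10-fold cross-validation, 8:2 split, and seed of 42 follow the description above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data: 14 features, binary label (not the study's data).
X, y = make_classification(
    n_samples=300, n_features=14, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Reduced grid for illustration; the paper searches wider ranges and also
# tunes min_samples_split, min_samples_leaf, and max_features.
param_grid = {
    "n_estimators": [100, 140],
    "max_depth": [5, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",  # select parameters by cross-validated AUC
    cv=10,              # 10-fold cross-validation on the training set
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(best_params, round(test_auc, 3))
```

Keeping the test set out of the grid search, as here, means the reported test AUC is not influenced by the hyperparameter selection.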

3.3 Model validation

The receiver operating characteristic (ROC) curve is a tool used to evaluate the performance of binary classification models. It illustrates the model’s performance at different thresholds by plotting the true positive rate (TPR) against the false positive rate (FPR), and it is widely used to assess machine learning models (Zhang et al., 2007). The formulas are as follows (Equations 2, 3):

$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{2}$$
$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{3}$$

where TP indicates that both the prediction and the actual value are landslides; FP indicates that the prediction is a landslide while the actual value is not; TN indicates that both the prediction and the actual value are non-landslides; and FN indicates that the prediction is a non-landslide while the actual value is a landslide. The area under the ROC curve (AUC) directly reflects the accuracy of the model, with a value range of 0–1. The closer the AUC is to 1, the higher the model’s accuracy and the better its classification performance.
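
As a small worked example of Equations 2 and 3, the sketch below computes the confusion counts at each threshold, builds the ROC points, and integrates them with the trapezoidal rule. The labels and scores are hypothetical, and the function names are chosen for this illustration only.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted as landslide."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, sweeping the threshold from high to low."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(y_true, scores, t)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels (1 = landslide) and predicted susceptibility scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(confusion_counts(y_true, scores, 0.65))  # (2, 1, 2, 1)
print(auc(roc_points(y_true, scores)))         # 8/9, about 0.889
```

The AUC of 8/9 matches the pairwise interpretation: of the nine landslide/non-landslide pairs, eight have the landslide scored higher.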

3.4 SHAP interpretability method

To reveal the specific contributions of each influencing factor to landslide susceptibility prediction in the Random Forest model, this study introduces the SHapley Additive exPlanations (SHAP) method (Xiao et al., 2024; Halder et al., 2025). SHAP is based on Shapley value theory from cooperative game theory and aims to distribute the model output fairly among the input features. Compared with LIME (Local Interpretable Model-agnostic Explanations), which may produce unstable interpretations due to local sampling, and permutation importance, which provides only global feature rankings, SHAP provides both global feature importance and local instance-level interpretations. This is crucial for revealing the differences in the dominant landslide factors across typical railway sections within the study area (Fisher et al., 2019; Chen and Fan, 2024). For any model input feature vector x = (x1, x2, x3, …, xM) with a model prediction output f(x), the SHAP method calculates the marginal contribution value ϕi of each feature xi, satisfying the following additive property (Equation 4):

$$f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i \tag{4}$$

Here, ϕ0 represents the baseline output of the model (i.e., the expected output when no feature information is available), and ϕi denotes the contribution of the i-th feature. The Shapley value is calculated as the average marginal contribution over all possible subsets of features (Equation 5):

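$$\phi_i = \sum_{S \subseteq \{1,\dots,M\} \setminus \{i\}} \frac{|S|!\,(M-|S|-1)!}{M!}\left[f_{S\cup\{i\}}\left(X_{S\cup\{i\}}\right) - f_S\left(X_S\right)\right] \tag{5}$$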

Here, S is any subset of features excluding feature i, and fS is the expected model output conditioned on the features in subset S. By considering all possible feature combinations, SHAP values quantify the average contribution of a single feature to the model’s prediction, thereby ensuring theoretical fairness and consistency. In this study, the Python shap library is used to interpret the Random Forest model, obtaining both global and local contribution distributions for each factor, which assists in identifying the key driving factors of landslide occurrence and their directional effects.
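
Equations 4 and 5 can be checked by brute force on a tiny model. The sketch below enumerates every subset S explicitly, which is only feasible for a handful of features. It makes one simplifying assumption that should be stated clearly: fS is approximated by fixing the features outside S at a baseline value rather than averaging over the data. The toy model, its coefficients, and the baseline are hypothetical, and this is not how the shap library computes values for tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by subset enumeration (Equation 5).

    Assumption: features outside the subset are set to `baseline`.
    """
    m = len(x)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(m)]
        return model(z)

    phis = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        phi = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for combo in combinations(others, size):
                s = set(combo)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return value(set()), phis  # (phi_0, [phi_1, ..., phi_M])


def toy_model(z):
    """Hypothetical susceptibility score with a slope-TWI interaction."""
    slope, twi, elevation = z
    return 0.02 * slope + 0.05 * twi + 0.001 * slope * twi + 0.0002 * elevation


x = [25.0, 8.0, 600.0]         # sample to explain: slope, TWI, elevation
baseline = [10.0, 5.0, 300.0]  # hypothetical reference point

phi0, phis = shapley_values(toy_model, x, baseline)
print(phi0, phis)
print(abs(phi0 + sum(phis) - toy_model(x)) < 1e-9)  # additivity, Equation 4
```

Elevation enters the toy model additively, so its contribution is simply 0.0002 × (600 − 300) = 0.06, while the slope-TWI interaction term is shared between the two interacting features.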

4 Results and analysis

4.1 Spatial distribution of large-landslide relics

To analyze the spatial distribution characteristics of landslides more accurately, we selected landslide number density (LND) and landslide area percentage (LAP) as indicators of landslide abundance (Huang et al., 2022). LND reflects the number of landslides per unit area within a factor class, aiding in the identification of high-frequency landslide regions (Equation 6); LAP represents the proportion of a factor class’s area that is occupied by landslides, offering a clear view of the spatial extent of landslides (Equation 7):

$$\mathrm{LND} = \frac{\text{Landslide number}}{\text{Area of factor class (CA)}} \tag{6}$$
$$\mathrm{LAP} = \frac{\text{Landslide area}}{\text{Area of factor class (CA)}} \tag{7}$$

where CA represents the area of each class of the influencing factors. Through these two indicators, we can comprehensively evaluate the abundance and spatial distribution of landslides. This enables an in-depth investigation of the relationship between influencing factors and landslides, as well as revealing the distribution patterns of large landslide relics in the study area (Li et al., 2021; Cui et al., 2022).
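
A short sketch of Equations 6 and 7 follows. The class names, counts, and areas are hypothetical values chosen for illustration, not figures from the inventory, and the multiplication by 100 to express LAP as a percentage is an assumption based on how LAP values are reported in the text.

```python
def landslide_abundance(classes):
    """Compute (LND, LAP) for each factor class.

    `classes` maps a class name to a tuple of
    (landslide count, landslide area in km^2, class area CA in km^2).
    """
    results = {}
    for name, (count, landslide_area, class_area) in classes.items():
        lnd = count / class_area                   # landslides per km^2
        lap = 100.0 * landslide_area / class_area  # percent (assumed scaling)
        results[name] = (lnd, lap)
    return results


# Hypothetical elevation classes and values.
classes = {
    "188-304 m": (150, 24.0, 20000.0),
    "583-752 m": (90, 16.8, 6000.0),
}

for name, (lnd, lap) in landslide_abundance(classes).items():
    print(f"{name}: LND = {lnd:.4f} per km^2, LAP = {lap:.2f}%")
```

In this toy case the class with fewer landslides has the higher LND and LAP because its area is much smaller, which is why the normalized indicators are more informative than raw counts.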

4.1.1 Topographic factors

The elevation of the study area is divided into nine classes using the natural breaks method: −51–91, 91–188, 188–304, 304–435, 435–583, 583–752, 752–956, 956–1,222, and 1,222–2,030 m (Figure 8). The large landslide relics are predominantly distributed within the elevation range of 188–752 m, comprising 518 landslides, which account for 72.65% of the total large landslide relics in the study area. The landslide number density (LND) and landslide area percentage (LAP) are highest in the elevation range of 583–752 m, with values of 0.0153% and 0.28%, respectively, indicating a higher probability of landslide occurrence in this range.

Figure 8. Relationships between elevation and LND and LAP.

For the slope, most landslides occur between 15° and 30°, accounting for about 70.13% of the total number of landslides. Among these, the slope range of 24.09°–29.92° has the highest number of landslides, totaling 187 (Figure 9). Only 2.24% of the landslides occurred in the 0°–9° slope range. The highest LND and LAP values were found in the 29.92°–71.23° interval, and these metrics continued to increase from 12.01° to 15.66°, suggesting that landslide density roughly rises with slope.

Figure 9. Relationships between slope and LND and LAP.

Figure 10 shows the relationship between curvature and LND and LAP. The large-scale landslide relics in this area are mainly distributed in the curvature range of −0.35 to 0.22, with a total of 536 landslides, which account for 75.18% of the total number. LND and LAP are highest in the range of −0.58 to −0.35. This is mainly because negative curvature values indicate concave terrain, which tends to concentrate water flow, increasing soil saturation and reducing soil shear strength, thereby elevating the risk of landslides. In addition, these areas may also feature complex geological structures, fragmented rocks, and intense weathering, further exacerbating landslide susceptibility.

Figure 10. Relationships between Curvature and LND and LAP.

Apart from flat areas, the number of large-scale landslide relics is relatively balanced across aspect classes, with slightly more landslides on northeast- and southwest-facing slopes (110 and 108, respectively) (Figure 11). The LND reaches its highest value of 0.0086 on northeast-facing slopes and its lowest value of 0.0048 on northwest-facing slopes. In contrast, the LAP is highest on southwest-facing slopes, at 0.16%. This pattern can be attributed to variations in sunlight, precipitation, and vegetation growth across different slope aspects. Northeast- and southwest-facing slopes receive strong solar radiation, leading to significant thermal expansion and contraction of rocks and soil, which promotes weathering and fragmentation and results in poor stability. Additionally, these aspects are more susceptible to landslides due to the substantial impact of precipitation and surface runoff. On southwest-facing slopes, dense vegetation may increase soil porosity and moisture content, reducing shear strength. Furthermore, vegetation can alter the path and velocity of surface runoff, resulting in greater susceptibility to landslides.

Figure 11
Bar and line graph showing various data related to landslides by aspect direction: CA (shown in blue bars), landslide number (gray line with circles), LND (pink line with squares), and LAP (blue line with triangles). The x-axis represents different aspects (flat, north, northeast, east, southeast, south, southwest, west, northwest), and the y-axes show CA, landslide number, LND, and LAP.

Figure 11. Relationships between aspect and LND and LAP.

For slope position, the number and area of landslides in the steep slope class are the highest, at 547 and 36,764.26 km², respectively. Both LND and LAP are also highest in this class, which is consistent with field observations (Figure 12). Steep slopes are characterized by complex geological conditions and by rock and soil with low shear strength, making them prone to the formation of sliding surfaces. The development of joints, fractures, bedding planes, and faults in the geological structure further increases the likelihood of landslides. Additionally, the study area is located in a subtropical humid monsoon climate zone with high annual rainfall. Substantial infiltration of rainwater saturates the soil and rock layers on the slopes, increasing the weight of the sliding mass and reducing the shear strength of the soil and rock layers. These conditions make landslides more likely to occur.

Figure 12
Bar and line graph showing six terrain categories on the x-axis: Valleys, Upper Slopes, Steep Slopes, Ridges, Lower Slopes, and Gentle Slopes. The y-axis on the left represents contributing area (CA) in square kilometers with blue bars. The y-axis on the right shows the landslide number and percentage (LND and LAP) with red and pink lines. Steep Slopes have the highest CA and landslide impact.

Figure 12. Relationships between TPI and LND and LAP.

4.1.2 Basic geological factors

Most of the landslides are distributed in Jurassic, Devonian, Cambrian, and Precambrian strata, which together account for about 77.84% of the total. Only 29 landslides occurred in the three categories from Quaternary to Cretaceous (Figure 13). The highest LND is 0.086 in the Jurassic, followed by 0.0071 in the Devonian, and the lowest is in the Tertiary. The lithology of the Jurassic, Devonian, Cambrian, and Precambrian strata has low shear strength and is prone to the formation of sliding surfaces. In contrast, the lithology of the Quaternary to Cretaceous strata has higher shear strength and stability, resulting in relatively fewer landslides. The highest LND in the Jurassic strata is likely due to their widespread distribution within the study area and to a lithological structure and geological conditions that are more conducive to landslide occurrence.

Figure 13
Bar and line chart showing geological data for different periods: Quaternary to Precambrian. Blue bars represent CA in square kilometers, gray circles show landslide numbers, pink squares indicate LND in square kilometers, and blue triangles illustrate LAP percentage. Data is plotted on dual y-axes, with CA and landslide numbers on the left, and LND and LAP on the right.

Figure 13. Relationships between geology and LND and LAP.

Distance to faults was divided into 11 categories at 1 km intervals: 0–1 km, 1–2 km, 2–3 km, 3–4 km, 4–5 km, 5–6 km, 6–7 km, 7–8 km, 8–9 km, 9–10 km, and >10 km (Figure 14). The majority of landslides occur in the >10 km category, with 569 landslides accounting for 79.80% of the total; the remaining landslides are distributed approximately equally among the other categories. The LND fluctuates considerably, from a maximum of 0.0092 at 6–7 km from faults to a minimum of 0.0043 at 1–2 km. The peak at 6–7 km is attributed primarily to the fragmentation of rock masses in secondary fault zones, the coupling effects of steep terrain and heavy rainfall, and disturbance by human activity. The minimum in the near-fault zone (1–2 km) is attributed to consolidated and stable rock masses, limited development, and monitoring biases.

Figure 14
Bar and line graph comparing cumulative area (CA), landslide numbers, landslide number density (LND), and landslide area percentage (LAP) across distances to a fault. The CA is represented by bars, while LND and LAP are shown with lines marked by squares and triangles, respectively. The data reveals peaks at various distances, particularly above ten kilometers. The right vertical axis corresponds to landslide numbers and percentages, while the left axis corresponds to cumulative area.

Figure 14. Relationship between Distance to fault and LND and LAP.

4.1.3 Hydrometeorological factors

Figure 15 shows that 85.69% of landslides occur within the average precipitation range of 1,485–1,802 mm. Within this range, the 1,652–1,702 mm interval has the highest number of landslides, totaling 111. This is primarily because the ample moisture in this range increases soil saturation and pore water pressure, reducing slope stability and making landslides more likely to occur. The LND and LAP are highest in the 1,802–1,995 mm rainfall range, with values of 0.1680% and 0.21%, respectively. This is primarily because the study area has an average precipitation ranging from 400 to 1,600 mm, and regions within the 1,802–1,995 mm range are relatively small. However, when these regions experience high rainfall, the probability of landslides significantly increases, resulting in the highest LND and LAP values.

Figure 15
Bar and line chart showing the relationship between average precipitation (1,366 to 1,995 mm) and landslide metrics. Blue bars represent CA in square kilometers. The gray line indicates landslide numbers, the red line with squares represents landslide number density (LND) per square kilometer, and the purple line with triangles shows landslide area percentage (LAP). The data covers nine precipitation intervals.

Figure 15. Relationship between average precipitation and LND and LAP.

About 74.33% of the landslides occurred within 1,941.62 m of rivers (Figure 16). The 0–447.21 m interval contains the largest number of landslides (168), while the 5,375.87–8,747.57 m interval contains the fewest (4). This is mainly because river scouring at the toe of a slope removes its support, leading to slope instability and landslides. In addition, geological conditions near rivers are often complex, with faults and weak rock layers that are inherently prone to landslides. Coupled with river erosion, these conditions further promote landslide occurrence.

Figure 16
Bar and line graph showing the relationship between distance to a river and four metrics: CA (km²), landslide number, LND (km²), and LAP (%). The x-axis displays distance categories, while the y-axes display values for these metrics. Blue bars represent CA, gray dots represent landslide numbers, red squares represent LND, and pink triangles represent LAP. CA decreases with distance, while LND and LAP show varying trends.

Figure 16. Relationship between distance to river and LND and LAP.

Figure 17 indicates that 616 landslides occur in the TWI 491–865 interval, accounting for 86.40% of the total. In contrast, the 1,030–2,150 interval contains only 1.54% of the total. The LND and LAP first increase and then decrease with TWI, reaching their highest values of 0.0164% and 0.26%, respectively, within the 679–783 range. When the TWI value is low, the terrain is dry and soil moisture content is low, resulting in a lower probability of landslides. As TWI increases, soil moisture content rises and shear strength decreases, leading to a higher probability of landslides and an increase in LND and LAP. At excessively high TWI values, the soil becomes overly moist or even saturated, further reducing shear strength; despite this, the probability of landslides decreases due to increased stability, causing LND and LAP to decline. The peak values occur within the TWI range where humidity is optimal.

Figure 17
Bar and line graph illustrating landslide metrics across TWI intervals. The blue bars represent cumulative area in square kilometers. A gray line with circular markers denotes the number of landslides. Red square markers represent landslide number density, and blue triangles indicate landslide area percentage. The horizontal axis shows TWI intervals, and the vertical axes represent different measurement units. The graph highlights a peak in landslides around the 491-697 interval, followed by a decline.

Figure 17. Relationship between TWI and LND and LAP.

4.1.4 Surface coverage factors

The FVC 41–52 interval contains the largest number of landslides (125, or 17.53% of the total), while the FVC −1–9 interval contains the fewest (26, or 3.65%) (Figure 18). The highest LND is observed in the FVC 86–100 range, primarily because the areas with high vegetation cover along the Zhuzhou-Guangzhou section of the Beijing-Guangzhou Railway are located in regions with complex geological structures and steep terrain, where slope stability is already poor. High FVC values cannot completely offset these unfavorable geological and topographic conditions. Once triggering factors are present, landslides are still prone to occur, resulting in large LND and LAP.

Figure 18
Bar and line chart comparing catchment area (CA), landslide number, landslide number density (LND), and landslide area percentage (LAP) against FVC (fractional vegetation cover) ranges. The blue bars represent CA, the gray line with circles shows landslide numbers, the red line with squares indicates LND, and the blue line with triangles represents LAP. Overall, CA initially rises and then decreases, while the other metrics fluctuate with FVC, indicating correlations between FVC and landslide-related metrics.

Figure 18. Relationship between FVC and LND and LAP.

Most of the landslides are distributed in four land cover classes: broadleaf evergreen forest, broadleaf deciduous forest, needleleaf evergreen forest, and tree open. Together these contain 525 large-scale landslide relics, accounting for 73.63% of the total (Figure 19). The tree open class contains the most (148), while the urban and mixed forest classes contain only single-digit numbers. The LND and LAP fluctuate greatly: the highest LND, 0.0126, occurs in broadleaf evergreen forest, and the highest LAP, 0.23%, occurs in the herbaceous class.

Figure 19
Bar and line chart depicting land cover types against cumulative area (CA), landslide number, landslide density (LND), and landslide occurrence percentage (LAP). Categories include various forest types, crops, and urban areas. Bars show CA in kilometers squared, while lines represent landslide data. Urban areas have the highest CA, while mixed forests show noticeable landslide activity. Different axes on the right indicate LND and LAP percentages.

Figure 19. Relationship between landcover and LND and LAP.

For distance to railway, seven categories were defined: 0–0.2 km, 0.2–0.5 km, 0.5–1 km, 1–2 km, 2–5 km, 5–10 km, and >10 km (Figure 20). Landslides mainly occurred in the >10 km category, with 674 events, accounting for 94.45% of the total. Landslides were relatively few in the other categories. The LND (landslide number density) shows significant variation, with the highest value of 0.0070 observed in the 0.2–0.5 km range and the lowest value of 0.0007 in the 1–2 km range. The highest LND at 0.2–0.5 km from the railway is primarily attributed to the strong disturbance of slope stability caused by railway engineering activities, including slope cutting, train-induced vibrations, and drainage modifications. Slope cutting and excavation during railway construction directly weaken the slope toe support, leading to stress concentration on the potential sliding surface. The installation and modification of drainage ditches may alter surface and groundwater flow paths, increasing pore water pressure. Vibrations and cyclic loads generated by train operation can also cause cumulative plastic deformation of the slope soil, accelerating the reactivation of old landslides. Furthermore, railway maintenance, reinforcement, and expansion activities may also alter the local stress balance, increasing the risk of slope instability.
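The assignment of a landslide to one of the seven distance-to-railway categories can be sketched as below. This is an illustrative example only: the source does not state which class receives a value lying exactly on a boundary, so the upper-inclusive rule used here is an assumption, and the sample distances are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Upper edges of the first six classes, in km; the seventh class is open-ended.
EDGES_KM = [0.2, 0.5, 1.0, 2.0, 5.0, 10.0]
LABELS = ["0–0.2 km", "0.2–0.5 km", "0.5–1 km", "1–2 km", "2–5 km", "5–10 km", ">10 km"]


def distance_class(distance_km):
    """Map a distance to its category label (boundary values go to the lower class)."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return LABELS[bisect_left(EDGES_KM, distance_km)]


# Hypothetical distances from landslide centroids to the railway, in km.
sample_distances = [0.05, 0.3, 0.2, 12.0, 7.5, 10.0]
classes = [distance_class(d) for d in sample_distances]
counts = Counter(classes)
print(counts)
```

Dividing each count by the area of the corresponding buffer zone then gives the LND per category, which is why a narrow buffer such as 0.2–0.5 km can have the highest density despite containing few landslides.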

Figure 20
Graph showing the relationship between distance to railway (km) and various metrics: CA (km²) represented by bars, Number as line with circles, LND (km⁻²) with squares, and LAP (%) with triangles. Distances range from 0-0.2 to over 10 km, indicating variations across categories.

Figure 20. Relationship between Distance to railway and LND and LAP.

Overall, landslides in the study area are predominantly distributed within the elevation range of 188–752 m, slope angles of 15°–30°, and the curvature range of −0.35 to 0.22. The number of landslides on northeast and southwest aspects is slightly higher, and the number and area of landslides in the steep slope class are the highest. Landslides mostly occur in strata such as the Jurassic and within the average precipitation range of 1,485–1,802 mm. The number of landslides is highest in areas with FVC of 41–52, while the landslide density is highest in the FVC range of 86–100. Most landslides occur more than 10 km from faults, but the highest landslide density is found 6–7 km from faults. Approximately 74.33% of landslides are located within 1,941.62 m of rivers, with the highest number occurring within 0–447.21 m. Landslides with TWI in the 491–865 interval account for 86.40% of the total, and the landslide density and area percentage are highest in the 679–783 interval. Landslides are predominantly observed in forested land cover types, with the highest landslide density occurring in broadleaf evergreen forest. Conversely, the highest percentage of landslide area is found in regions covered by herbaceous vegetation. Notably, the greatest number of landslides (94.45%) occurred at distances greater than 10 km from the railway; however, the highest landslide density (LND = 0.0070) was observed within the 0.2–0.5 km range. This distinctive distribution pattern likely reflects localized slope stability disturbances caused by railway engineering activities.

4.2 Model validation

In this study, fourteen influencing factors were selected: elevation, slope, aspect, TPI, curvature, geology, average precipitation, FVC, TWI, PGA, landcover, distance to faults, distance to railway, and distance to rivers. The samples were split 8:2 into training and validation sets to construct a landslide susceptibility assessment model for the Zhuzhou to Guangzhou section of the Beijing-Guangzhou Railway.
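An 8:2 split of this kind can be sketched as follows. The example is illustrative and rests on two assumptions not stated in this section: that the split is stratified by class, and that the number of non-landslide samples equals the 713 landslide samples. The function name and seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, valid_idx) with the class ratio preserved in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(valid_idx)


# 713 landslide samples (label 1); the equal number of non-landslide
# samples (label 0) is an assumption made for this illustration.
labels = [1] * 713 + [0] * 713
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))
```

Splitting within each class keeps the landslide to non-landslide ratio identical in the training and validation sets, so validation metrics are not distorted by an accidental class imbalance.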

To comprehensively evaluate model performance, this study used a combination of metrics, including the area under the ROC curve (AUC), precision, recall, and F1 score. The results show that the RF model achieved an AUC value of 0.88, indicating a high accuracy in landslide prediction and an effective ability to distinguish between landslide and non-landslide occurrences (Figure 21). Furthermore, confusion matrix analysis (see Table 2) reveals an overall accuracy of 0.77. Specifically, the landslide class achieved a precision of 0.88, recall of 0.91, and an F1-score of 0.79; while the non-landslide class had a precision of 0.88, recall of 0.64, and an F1-score of 0.74, demonstrating balanced and robust performance in predicting both classes. Overall, the model shows good applicability and reliability.
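For reference, the per-class metrics named above follow directly from confusion-matrix counts. The sketch below uses hypothetical counts (not those of Table 2) and then applies the F1 identity to the non-landslide precision and recall quoted in the text.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; Table 2 holds the study's values.
tp, fp, fn, tn = 80, 20, 10, 90
precision, recall, f1 = class_metrics(tp, fp, fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")

# Non-landslide class as reported in the text: precision 0.88, recall 0.64.
non_landslide_f1 = round(f1_score(0.88, 0.64), 2)
print(non_landslide_f1)  # 0.74
```

The same identity can be used to cross-check the landslide-class values against Table 2.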

Figure 21
ROC curve showing the relationship between the true positive rate and false positive rate. The blue line represents the testing data with an AUC of 0.88, and the green line represents the training data with an AUC of 0.95. A red diagonal line indicates random chance.

Figure 21. ROC curve.

Table 2

Table 2. Confusion matrix of RF model.

4.3 Global susceptibility assessment of large-scale landslide relics

The susceptibility assessment model, once its accuracy had been verified, was used to predict landslide susceptibility across the study area. The Natural Breaks (Jenks) method (Jenks, 1967) was then used to reclassify the results into five susceptibility levels: very low, low, moderate, high, and very high (Figure 22).
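The idea behind natural-breaks classification can be sketched with a small dynamic program that chooses class boundaries minimizing the within-class sum of squared deviations (the Fisher-Jenks formulation). This is an illustrative stand-in, not the GIS routine used in the study; the probabilities are synthetic, and the quadratic running time makes it suitable only for small samples rather than full rasters.

```python
from bisect import bisect_left


def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for the optimal 1-D partition."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean of data[i:j]."""
        s = prefix[j] - prefix[i]
        return prefix_sq[j] - prefix_sq[i] - s * s / (j - i)

    inf = float("inf")
    # cost[c][j]: lowest cost of splitting data[:j] into c classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back from the full data set to recover each class's upper bound.
    breaks, j = [], n
    for c in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = start[c][j]
    return breaks[::-1]


LEVELS = ["very low", "low", "moderate", "high", "very high"]
# Synthetic susceptibility probabilities, for illustration only.
probabilities = [0.02, 0.05, 0.07, 0.30, 0.33, 0.52, 0.55, 0.74, 0.78, 0.95, 0.97]
breaks = jenks_breaks(probabilities, len(LEVELS))
levels = [LEVELS[min(bisect_left(breaks, p), len(LEVELS) - 1)] for p in probabilities]
print(breaks)
```

Because the breaks follow gaps in the data rather than fixed intervals, the five classes generally cover unequal probability ranges and unequal areas.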

Figure 22
Topographic map showing the area between Zhuzhou and Guangzhou, with the railway marked in black. Color-coded susceptibility levels range from very low (green) to very high (red). A legend and scale are provided.

Figure 22. Large-landslide relics susceptibility map.

As shown in Figure 22, the high and very high susceptibility zones in the study area are primarily concentrated in the central and southern regions, with additional high and very high susceptibility zones in the northeastern part of the study area. The area of each susceptibility zone was statistically analyzed, and the area ratios were calculated. The results show that the very low susceptibility zone covers approximately 43,034.36 to 45,784.54 km², accounting for 41.22%–38.64% of the total study area; the low susceptibility zone covers about 23,240.75 to 23,314.76 km², representing 20.99%–18.7%; the moderate susceptibility zone covers roughly 18,063.42 to 17,110.09 km², accounting for 15.40%–16.22%; the high susceptibility zone covers approximately 15,939.49 to 14,311.06 km², making up 12.89%–14.31%; and the very high susceptibility zone covers around 10,546.77 to 11,092.54 km², which is 9.50%–9.6% of the total study area (Table 3). Overall, the southern section of the Beijing-Guangzhou Railway from Zhuzhou to Guangzhou is more prone to landslide hazards than the northern section, likely due to the steeper terrain in the southern region. Therefore, it is recommended that relevant authorities enhance monitoring and early warning systems in these high-risk areas, develop targeted disaster prevention and mitigation measures, and improve local residents’ awareness of disaster prevention and their emergency response capabilities.

Table 3

Table 3. Landslide susceptibility grade area and area ratio.

4.4 Importance of influencing factors

For the importance analysis of influencing factors, this study employed two methods: the built-in importance analysis of Random Forest and the SHAP interpreter (Xiao et al., 2024; Halder et al., 2025). First, the built-in Random Forest method was used to evaluate the contribution of each influencing factor within the model. This method primarily relies on two metrics: Mean Impurity Decrease and Mean Accuracy Decrease (Yu et al., 2024). Using Python, we calculated and sorted the importance values of the 14 influencing factors. The results are as follows: Slope (0.3910) > TWI (0.1393) > Elevation (0.0936) > Average Precipitation (0.0899) > Curvature (0.0631) > Slope Position (0.0444) > Lithology (0.0316) > Land Cover Type (0.0307) > Distance to Fault (0.0285) > Distance to Railway (0.0277) > Vegetation Cover (0.0261) > Aspect (0.0204) > Distance to River (0.0132) > PGA (0.0003) (Figure 23). The ranking shows that slope, TWI, average precipitation, and elevation have relatively higher importance, whereas PGA and distance to river have lower importance.
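The Mean Accuracy Decrease idea can be illustrated with a small permutation test: shuffle one factor at a time and record how much accuracy falls. The sketch below is a simplified stand-in that uses a hypothetical threshold rule in place of the trained random forest and synthetic data with only two factors; none of the names or values come from the study.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model reproduces the label."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for f in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in rows]
            rng.shuffle(column)
            permuted = [row[:f] + (v,) + row[f + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in model: flags a cell as landslide-prone above 20° slope.
def toy_model(row):
    return row[0] > 20.0


# Synthetic samples of (slope in degrees, aspect in degrees).
data_rng = random.Random(1)
rows = [(data_rng.uniform(0, 45), data_rng.uniform(0, 360)) for _ in range(200)]
labels = [toy_model(row) for row in rows]

importances = permutation_importance(toy_model, rows, labels)
print(dict(zip(["slope", "aspect"], importances)))
```

In this toy setting the aspect column is ignored by the rule, so shuffling it costs nothing, whereas shuffling slope destroys the signal. The impurity-based metric works differently: it sums the reduction in node impurity attributable to each factor over all trees, without re-evaluating accuracy.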

Figure 23
Bar chart showing the importance of features related to some measurement.

Figure 23. Factor importance according to the RF method.

To evaluate the importance of the influencing factors more comprehensively, we further employed the SHAP interpreter. SHAP values provide a more intuitive representation of the contribution of each feature to the model’s output (Lundberg and Lee, 2017; Xiao et al., 2024). Figure 24 illustrates the feature importance ranking based on SHAP, sorted by the mean absolute SHAP value of each feature. Importance decreases from top to bottom, and each point in the figure represents a sample. The color gradient from red to blue indicates feature values from highest to lowest. The analysis indicates that slope, TWI, average precipitation, and elevation are the most influential factors for landslide occurrence, while distance to rivers, distance to faults, and PGA remain of lower importance. Slope exhibits a clear positive correlation with landslide occurrence: the steeper the slope, the more prone it is to landslides. TWI exhibits a certain negative correlation with landslide occurrence, with lower TWI values having a higher impact on landslide initiation.

Figure 24
SHAP summary plot showing feature impacts on a model output. Features include Slope, TWI, Elevation, Average Precipitation, Curvature, TPI, and others. SHAP values range from negative to positive, with colors representing feature values from high (pink) to low (blue). The horizontal axis indicates SHAP values, while features are listed vertically.

Figure 24. Feature importance based on SHAP.

A comparison of the two importance analysis methods shows that slope, TWI, elevation, and average precipitation are the dominant factors influencing landslide occurrence in the study area. However, discrepancies exist in the importance rankings of certain factors, such as slope, TPI, land cover type, and vegetation cover. This is primarily due to the different mechanisms behind the two methods: the random forest algorithm assesses feature importance based on Gini impurity or permutation importance, while SHAP is based on Shapley values, which account for the marginal contribution of each feature across all possible combinations. SHAP is more sensitive to data distribution and variations in model predictions, and it can capture complex interactions between features. In contrast, random forest provides a relatively stable importance ranking but does not explicitly consider feature interactions. Despite these local ranking differences, both methods consistently identify the key triggering factors of landslides in the study area, offering theoretical support for accurately identifying high-risk zones and formulating differentiated prevention and mitigation strategies.
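The phrase "marginal contribution of each feature across all possible combinations" can be made concrete with a brute-force Shapley computation on a toy model. This is a sketch for intuition only: SHAP implementations use far more efficient algorithms for tree ensembles, and the three-factor value function and its coefficients below are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for value_fn(frozenset of present feature indices)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical model output when only the listed factors are "switched on":
# index 0 = slope, 1 = TWI, 2 = PGA (which this toy model ignores).
def toy_value(present):
    slope = 1.0 if 0 in present else 0.0
    twi = 1.0 if 1 in present else 0.0
    return 0.4 * slope + 0.1 * twi + 0.2 * slope * twi


phi = shapley_values(toy_value, 3)
print([round(p, 3) for p in phi])
```

The 0.2 interaction term is shared equally between slope and TWI, so their attributions become 0.5 and 0.2, and the three values sum to the model output of 0.7. An importance measure that does not apportion interactions in this way can rank the same factors differently, which is one source of the discrepancies described above.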

4.5 Local analysis of landslide susceptibility in representative sections

The Zhuzhou–Guangzhou section of the Beijing–Guangzhou Railway crosses two provinces, Hunan and Guangdong, encompassing 21 cities. Among them, Qingyuan City hosts the largest number of large landslide relics, accounting for 34.08% of the total landslides, followed by Guangzhou City, which accounts for 19.50%. Based on the landslide susceptibility assessment results, this study selects three typical segments along the railway corridor—Chenzhou urban area, Lechang City in Shaoguan, and Yingde City in Qingyuan—for focused analysis. These sections are not only high-risk and extremely high-risk landslide areas directly traversed by railways, but are also significantly representative in terms of geological structure, precipitation distribution, and landslide density. Chenzhou District is characterized by weak rock layers and fault zones, Lechang District by hard rocks and deeply incised river valleys, and Yingde District by karst landforms and dense faults. In addition, the precipitation amounts of the three areas differ, with the Yingde District having the highest annual precipitation, and all three are located within high-density landslide areas identified in this survey. Through SHAP value interpretation, the dominant factors and their mechanisms driving landslide formation in each segment are revealed, embodying a hierarchical assessment strategy of “global modeling—localized interpretation.”

As shown in Figure 25, the very high and high susceptibility zones in the urban area of Chenzhou cover approximately 598.03 km², accounting for 27.66% of the total area. These zones are mainly distributed in Feitianshan Town, Bailutang Town, Wugaishan Town, Aoshang Town, and Yangtianhu Yao Ethnic Township. Because the number of samples in this section is relatively limited, direct factor contribution analysis may lead to unstable results. To address this issue, a bootstrap sampling strategy was adopted, performing 100 resampling iterations with replacement. In each iteration, the SHAP method was applied to extract feature contributions, and the average contribution values and standard deviations were calculated to derive a robust factor ranking and corresponding plots. This approach improves the reliability and interpretability of factor diagnostics under small-sample conditions. The results indicate that slope (0.151), Topographic Wetness Index (TWI, 0.062), and elevation (0.041) are the primary controlling factors for landslide susceptibility in this area, followed by curvature (0.031) and average precipitation (0.026) (Figure 26). The region features complex geological structures with developed faults and severely weathered strata, including weak rock types such as mudstone and shale, which easily soften and destabilize under rainfall. Frequent mining activities further weaken slope stability. Moreover, the area experiences abundant and concentrated rainfall, with frequent heavy storms. Combined with the steep terrain, these conditions readily trigger landslide disasters.
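The bootstrap procedure described above can be sketched as follows. As a simplification, this example resamples rows of a precomputed SHAP matrix rather than re-running the explainer in each iteration, which may differ from the authors' implementation; the SHAP values are synthetic and the function name is hypothetical.

```python
import random
from statistics import mean, stdev


def bootstrap_mean_abs(shap_rows, feature_names, n_iterations=100, seed=0):
    """Resample rows with replacement; return {feature: (mean, std)} of mean |SHAP|."""
    rng = random.Random(seed)
    n = len(shap_rows)
    per_iteration = {name: [] for name in feature_names}
    for _ in range(n_iterations):
        sample = [shap_rows[rng.randrange(n)] for _ in range(n)]
        for k, name in enumerate(feature_names):
            per_iteration[name].append(mean(abs(row[k]) for row in sample))
    return {name: (mean(vals), stdev(vals)) for name, vals in per_iteration.items()}


# Synthetic SHAP matrix: 60 samples, three factors with different magnitudes.
features = ["slope", "TWI", "elevation"]
multipliers = [((i % 7) - 3) / 3 for i in range(60)]
shap_rows = [(0.15 * m, 0.06 * m, 0.04 * m) for m in multipliers]

summary = bootstrap_mean_abs(shap_rows, features)
ranking = sorted(features, key=lambda name: summary[name][0], reverse=True)
for name in ranking:
    avg, sd = summary[name]
    print(f"{name}: {avg:.3f} ± {sd:.3f}")
```

The standard deviation across iterations is what the error bars in a plot such as Figure 26 would represent; factors whose intervals overlap should not be treated as reliably ranked relative to one another.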

Figure 25
Map showing a colored elevation analysis of a region with areas ranging from very low (green) to very high (red). Key locations include Beihu, Yongxing, and Yizhang. A railway is depicted in black. An inset highlights a detailed section with the same color gradation. A legend indicates the color meanings, and a scale measures distance.

Figure 25. Landslide susceptibility results in Chenzhou District.

Figure 26
Bar chart showing SHAP feature importance for Chenzhou with bootstrap. The top features are slope, TWI, average precipitation, and elevation, ranked by mean SHAP value. Error bars indicate variability.

Figure 26. Factor importance ranking in Chenzhou District.

Lechang City in Shaoguan is the area with the highest concentration of very high and high landslide susceptibility zones along the Zhuzhou–Guangzhou section of the Beijing–Guangzhou Railway. As shown in Figure 27, the susceptibility zones in this city cover the following areas: very high, 324.88 km² (13.40%); high, 629.52 km² (25.97%); moderate, 598.31 km² (24.69%); low, 491.61 km² (20.28%); and very low, 379.43 km² (15.65%). The very high and high susceptibility zones are mainly concentrated in the northern towns of Huangpu, Baishi, Liangjiang, and Jiufeng, and in the central town of Dayuan.

Figure 27
Map depicting environmental vulnerability in Lechang and surrounding regions. Color-coded areas range from very low (green) to very high (red) vulnerability. A legend and scale bar are included.

Figure 27. Landslide susceptibility results in Lechang District.

The SHAP contribution ranking is: slope (0.138), curvature (0.041), TWI (0.038), elevation (0.036), and distance to fault (0.025) (Figure 28). This area lies in mid-to-low mountainous terrain with dramatic topographic variation. The dominant lithologies are intrusive rocks and carbonate rocks, with thick, loose weathered layers. The railway often crosses valley sections where surface water and groundwater erosion weakens and destabilizes slope toes. Additionally, concentrated rainfall and human activities such as transportation engineering construction are key triggers of landslides here.

Figure 28
Bar chart titled

Figure 28. Factor importance ranking in Lechang City.

According to the landslide susceptibility results for Yingde City, Qingyuan, the very high and high susceptibility zones cover 600.13 km² and 965.24 km², respectively, together accounting for 27.76% of the total area. These zones are mainly concentrated in Shigutang Town, Hengshitang Town, Shakou Town, Wangbu Town, Donghua Town, and Dawan Town (Figure 29).

Figure 29
Map depicting the landslide susceptibility of a region around Yingde. Areas are color-coded: green (very low), yellow (low), orange (moderate), red (high), and purple (very high). A railway is marked with a line, and the compass direction is indicated. An inset highlights a high-risk area.

Figure 29. Landslide susceptibility results in Yingde City.

Figure 30 presents the SHAP analysis results for Yingde City, showing that slope (0.161), TWI (0.055), elevation (0.047), curvature (0.033), and average precipitation (0.029) are the dominant factors influencing landslide susceptibility. This area has strong surface water infiltration capacity, where heavy rainfall can rapidly increase subsurface pore water pressure, triggering landslides. Meanwhile, fractured rock masses and developed faults exacerbate slope instability potential. The average precipitation reaches 1906.2 mm, mainly concentrated during the flood season from April to September, further increasing the frequency and intensity of landslide occurrences.

Figure 30
SHAP summary plot for Yingde showing the impact of various features on model output. Features include slope, TWI, elevation, and more, with SHAP values ranging from negative to positive. Color gradient indicates feature values from low (blue) to high (red).

Figure 30. SHAP summary plot (Yingde City).

5 Discussion

5.1 Spatial heterogeneity of landslides and segmental differences

This study evaluated landslide susceptibility along the Zhuzhou–Guangzhou segment of the Beijing–Guangzhou Railway, achieving high overall model performance (AUC = 0.88). However, the spatial distribution of landslide susceptibility exhibits significant heterogeneity. High susceptibility zones are primarily concentrated in specific areas characterized by fragmented terrain, intense rainfall, and fractured lithology, such as the urban district of Chenzhou, Lechang City in Shaoguan, and Yingde City in Qingyuan. Local SHAP value analysis reveals that while slope remains the dominant controlling factor across all three regions, the ranking of other contributing factors differs markedly, reflecting the region-specific characteristics of landslide triggers.

In the urban district of Chenzhou, the top five controlling factors are slope (0.151), TWI (0.062), elevation (0.041), curvature (0.031), and average precipitation (0.026). The region has a complex and fragmented geological structure with well-developed faults and intense weathering of strata. The widespread distribution of weak rock layers (such as mudstone and shale) is a key intrinsic factor contributing to the region’s high susceptibility to landslides. These rock layers have low cohesion and poor permeability; after rainfall infiltration, pore water pressure rises rapidly, and the rock mass softens upon contact with water, resulting in a sharp loss of internal friction angle and cohesion, thus creating favorable material sources and mechanical conditions for landslides. Furthermore, extensive mining activities in the area cause significant disturbance to slope structures (Xia, 2008). The importance of TWI and precipitation highlights the key role of rainfall-induced pore water pressure increases in destabilizing weak rock layers. The combined effects of tectonic environment, climatic conditions, and anthropogenic engineering disturbances constitute the fundamental causes of frequent landslides in this area.

The landslide mechanisms in Lechang City, Shaoguan, are primarily driven by the combined effects of topography and rainfall. Hard intrusive, metamorphic, and carbonate rocks crop out in the area; although these rocks are strong, long-term weathering and tectonic activity have produced well-developed joints and fissures. This significantly reduces the overall shear strength of the rock mass and increases its permeability. Residual slope deposits, reaching thicknesses of 5–30 m, further weaken slope stability (Zhang and Zhang, 2014). Engineering disturbances, such as railway construction, have caused stress redistribution within the slope mass, exacerbating instability. Continuous scouring of the slope toe by the Wujiang River further weakens slope support. Intense and concentrated rainfall frequently triggers slope failures. Landslides in Lechang therefore result from several superimposed factors, indicating that mitigation strategies should comprehensively address geological, engineering, and hydrological factors, with particular emphasis on toe reinforcement and hydrological management.

The distribution of landslides in Yingde City, Qingyuan, is controlled by the combined effects of steep topography, intense weathering profiles, and valley incision. The area features karst landforms and a dense fault zone, which lead to rapid increases in pore water pressure. Coupled with concentrated rainfall, this results in a typical causative pattern characterized by steep slopes, heavy rainfall, and fractured rock masses (Zhao, 2018). The coupled influence of hydrogeological processes and tectonic conditions significantly increases landslide susceptibility in the area, highlighting the need for enhanced hydrological monitoring and identification of structurally weak zones.

Based on the observed spatial heterogeneity of landslide susceptibility, differentiated prevention and mitigation strategies should be adopted for different high-risk segments. In extremely high susceptibility zones—such as Huangpu Town in Lechang City and Feitianshan Town in Suxian District—automated monitoring equipment, including inclinometers and rain gauges, should be deployed to establish a threshold-based early warning system tailored to dominant triggering factors (Reichenbach et al., 2018). In areas with significant anthropogenic disturbance, such as mining zones in Chenzhou, priority should be given to composite retaining structures using rock bolts and lattice beams (Dai et al., 2002). For rainfall-sensitive areas like Yingde City, subsurface drainage systems are recommended. Additionally, community emergency preparedness should be strengthened through multiple communication channels such as television, radio, and the internet (Petley, 2012; Han and Wu, 2024). To enhance monitoring capabilities, a space-air-ground integrated system combining UAV-based LiDAR and InSAR technologies should be implemented (Jaboyedoff et al., 2012; Feng et al., 2024a; Feng et al., 2024b). Such region-specific mitigation strategies will provide strong technical support for the long-term safe operation of the Beijing–Guangzhou railway corridor.

5.2 Refined evaluation framework: Applicability and innovation

In response to the practical demands for high-precision hazard identification and prevention in railway corridor projects, this study proposes and develops a refined landslide susceptibility evaluation framework based on a “global modeling–local refinement” approach. At the macro scale, the framework employs a Random Forest model to comprehensively explore the nonlinear coupling relationships between landslide occurrence and 14 influencing factors, including topography, geology, and climate. The model performs well, with an AUC of 0.88, high classification accuracy, and robust generalization capability.
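As a reminder of what the reported AUC measures, the following minimal sketch computes it as the probability that a randomly chosen landslide sample receives a higher susceptibility score than a randomly chosen non-landslide sample. The labels and scores are hypothetical illustration values, not data from this study, and the function name `roc_auc` is an assumption introduced here.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: the fraction of (landslide, non-landslide)
    pairs in which the landslide sample has the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test samples: 1 = landslide, 0 = non-landslide
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.91, 0.78, 0.42, 0.35, 0.50, 0.12, 0.66, 0.70]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.88 means that most landslide samples are ranked above most non-landslide samples.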

Unlike black-box models such as deep learning, RF not only exhibits higher computational efficiency but also retains interpretability (Merghadi et al., 2020). Moreover, through feature randomization and ensemble mechanisms, RF effectively reduces overfitting risks, making it well-suited for complex geological environments with multidimensional and heterogeneous input features.

Multiple studies support RF’s superiority in landslide susceptibility assessment. For instance, Kavzoglu and Teke compared RF, XGBoost, and NGBoost and found that these ensemble methods outperform traditional single models in both training and testing phases, demonstrating stronger generalization capabilities (Kavzoglu and Teke, 2022). Similarly, a study by Chauhan et al. in the Indian Himalayas confirmed that RF and XGBoost not only provide accurate predictions but also offer valuable references for land-use planning and disaster management (Chauhan et al., 2025). Additionally, researchers have noted that RF’s feature randomization and majority voting mechanisms help prevent overfitting while enhancing generalization, particularly for high-dimensional geological data (Luu et al., 2024; Tun et al., 2024).

To improve model interpretability, this study incorporates the SHAP method based on the global Random Forest model, enabling quantitative contribution analysis at the factor level. Compared with traditional feature importance ranking methods, SHAP not only quantifies the global importance of each influencing factor but also reveals local effects at the individual sample level (Lundberg and Lee, 2017).
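The idea of a per-sample (local) attribution can be illustrated with exact Shapley values on a toy model. This is a sketch under stated assumptions, not the study's implementation: `toy_model`, its coefficients, the sample `x`, and the single reference point `baseline` are all hypothetical, and absent factors are replaced by one baseline value rather than averaged over a background dataset as SHAP does.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each factor for one sample `x`.
    Factors outside a coalition take their value from `baseline`."""
    names = list(x)
    n = len(names)

    def value(subset):
        z = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return model(z)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for r in range(n):
            # Shapley weight for coalitions of size r that exclude factor i
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for s in combinations(others, r):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi[i] = total
    return phi


def toy_model(z):
    # Hypothetical susceptibility score with a slope-TWI interaction;
    # precipitation is deliberately unused to show a zero attribution.
    return 0.02 * z["slope"] + 0.05 * z["twi"] + 0.001 * z["slope"] * z["twi"]


x = {"slope": 25.0, "twi": 8.0, "precip": 1.6}         # hypothetical sample
baseline = {"slope": 10.0, "twi": 5.0, "precip": 1.5}  # hypothetical reference
phi = shapley_values(toy_model, x, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.4f}")
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that makes per-sample contributions comparable across factors. Averaging the absolute attributions over many samples yields a global importance ranking.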

In recent years, SHAP has been widely applied in geological hazard risk assessments, including landslides and floods, significantly improving model transparency and reliability (Zeng et al., 2024; Choubin et al., 2025). In this study, SHAP analysis elucidates the spatial heterogeneity and nonlinear influences of key factors such as slope gradient, TWI, and average precipitation, providing a scientific basis for precise disaster prevention and mitigation strategies.

Traditional landslide susceptibility studies have mostly focused on uniform regional-scale modeling, making it difficult to fully reflect the complex spatial heterogeneity and differences in disaster mechanisms along linear engineering corridors (Jebur et al., 2014; Hong et al., 2015; Ali et al., 2022; Liu et al., 2022; Liu et al., 2024; Wang et al., 2024a). To overcome these limitations, this study innovatively introduces a “local refinement” module in the refined evaluation framework, selecting three high-risk typical sections in Qingyuan, Chenzhou, and Shaoguan for comparative analysis of fine-scale factor distribution characteristics and disaster mechanisms. The results show significant differences among the sections in terms of geomorphic types, lithological combinations, and external disturbance intensity. The ranking and contribution of dominant controlling factors exhibit distinct regional characteristics, revealing the spatial heterogeneity of landslide-prone environments.
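A section-wise comparison of this kind can be sketched as follows: per-sample SHAP values are grouped by section and each factor is ranked by its mean absolute contribution. The section labels, factor names, and SHAP values below are hypothetical placeholders, and the helper `rank_factors_by_section` is an assumption introduced for illustration; the source does not state how its local rankings were computed.

```python
from collections import defaultdict


def rank_factors_by_section(records):
    """records: iterable of (section, {factor: shap_value}) pairs.
    Returns {section: [(factor, mean_abs_shap), ...]} sorted in descending order.
    Assumes every record of a section lists the same set of factors."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for section, shap_row in records:
        counts[section] += 1
        for factor, value in shap_row.items():
            sums[section][factor] += abs(value)
    return {
        section: sorted(
            ((factor, total / counts[section]) for factor, total in factors.items()),
            key=lambda item: item[1],
            reverse=True,
        )
        for section, factors in sums.items()
    }


# Hypothetical per-sample SHAP values for two placeholder sections
records = [
    ("section_A", {"slope": 0.20, "twi": -0.05, "elevation": 0.02}),
    ("section_A", {"slope": -0.10, "twi": 0.07, "elevation": 0.04}),
    ("section_B", {"slope": 0.12, "twi": 0.01, "elevation": -0.09}),
    ("section_B", {"slope": 0.08, "twi": -0.03, "elevation": 0.05}),
]
ranking = rank_factors_by_section(records)
for section, factors in ranking.items():
    print(section, [(name, round(score, 3)) for name, score in factors])
```

In this toy input, slope ranks first in both sections while TWI and elevation swap places, which mirrors the pattern described above: similar dominant factors with section-specific ordering.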

Based on the analysis of differences in typical sections, this study further examines the model’s cross-regional applicability from the perspective of performance evaluation. To preliminarily verify the transferability of the proposed framework, a comparative analysis was conducted using random forest model results reported in previous studies from different regions. For instance, Hong et al. applied the random forest model in Wuyuan, Jiangxi Province, achieving an AUC of approximately 0.86 (Hong et al., 2017b); Gupta et al. obtained comparable accuracy (AUC = 0.876) along the NH-7 highway in the Himalayan region of India (Gupta et al., 2025); and Akinci et al. reported an AUC of about 0.87 in the Artvin coastal area of Turkey (Akinci et al., 2020). These comparable results suggest that, despite substantial variations in geological structures and environmental conditions, the random forest model maintains high predictive capability. This indicates that the proposed framework exhibits strong robustness and reliable cross-regional applicability. Further pilot verification will be carried out along railway and ultra-high-voltage transmission corridors with diverse geological settings to quantitatively assess its potential for cross-regional deployment.

5.3 Limitations of methods and data

Although the landslide susceptibility assessment framework developed in this study achieves good spatial accuracy and model interpretability, certain methodological and data limitations remain. First, regarding data accuracy, this study primarily utilizes 30-m resolution DEM and remote sensing data for factor extraction. While sufficient for regional-scale analysis, the relatively coarse spatial resolution may obscure key geomorphological features in areas with significant local elevation variations or complex geological structures, limiting the model’s sensitivity to microtopographic changes. Therefore, future research should consider incorporating higher-resolution data sources (such as 10-m or finer DEMs and LiDAR data) to more precisely characterize landslide-prone environments. In addition, this study’s landslide inventory, derived from remote sensing interpretation, focuses primarily on large landslide remnants (≥0.05 km²). While this strategy ensures the saliency of the samples at the regional scale and the stability of model training, it may not fully capture the distribution patterns of small and shallow landslides that also pose threats to railway operations, thereby introducing potential biases into the detailed assessment of local risks along railway lines.

Regarding model adaptability, while random forest models possess strong generalization ability and interpretability, they are essentially static and cannot fully capture the dynamic triggering processes of landslides. External triggers such as extreme rainfall or earthquake–rainfall coupling can significantly alter surface hydrological conditions and the mechanical state of soil and rock, thereby affecting slope stability and model predictions. Future research could investigate the model’s sensitivity and stability under various external disturbances through numerical simulations or time-series modeling, gradually constructing dynamic, time-series susceptibility models capable of responding to environmental changes in real time. At the same time, the limited sample size in some typical areas may reduce the generalization ability of traditional statistical models. To address this, the present study introduces a bootstrap analysis strategy to enhance the interpretative stability in small-sample regions, although this method still has inherent limitations in statistical sampling.
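The source does not detail its bootstrap procedure, so the following is only a generic sketch of the technique: a percentile bootstrap confidence interval for the mean absolute SHAP value of one factor in a small-sample section. The input values, the number of resamples, and the function name `bootstrap_ci` are assumptions made for illustration.

```python
import random


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`.
    Resamples with replacement and takes empirical quantiles of the means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = []
    for _ in range(n_resamples):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical |SHAP| values of slope for eight samples in one section
abs_shap_slope = [0.18, 0.12, 0.21, 0.09, 0.15, 0.17, 0.11, 0.14]
low, high = bootstrap_ci(abs_shap_slope)
mean_value = sum(abs_shap_slope) / len(abs_shap_slope)
print(f"mean = {mean_value:.4f}, 95% interval = [{low:.4f}, {high:.4f}]")
```

If the intervals of two factors overlap strongly, their relative ranking in that section should be read with caution. With very few samples the resamples reuse the same observations, which is the kind of sampling limitation noted above.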

In terms of factor construction and data integration, this study considers 14 types of factors, encompassing topography, geology, and climate. While these factors cover the main disaster-causing environments, the quantification of human activities—such as railway construction, slope reinforcement, and mining disturbances—remains limited, which may lead to an underestimation of local landslide risks. Moreover, the data sources are primarily static historical remote sensing and environmental datasets, without incorporating dynamic monitoring information such as real-time rainfall or groundwater fluctuations. The use of historical landslide relics may also introduce temporal biases, including changes in land use, vegetation cover, or local climate conditions since the landslide events, potentially affecting the accuracy of susceptibility analysis. Future work could integrate multi-source data (e.g., radar, InSAR, groundwater monitoring) and, building upon the existing RF–SHAP framework, develop time-series dynamic landslide susceptibility models to enable real-time updates and dynamic early warning. Additionally, given sufficient data availability, more specific railway engineering disturbance indicators could be incorporated to improve the targeting and predictive accuracy of landslide risk assessment along railway corridors.

6 Conclusion

This study focuses on the Zhuzhou–Guangzhou section of the Beijing–Guangzhou Railway and analyzes the spatial distribution patterns of landslides based on 713 large-scale landslide relics and 14 environmental factors (topography, geology, and hydrology). A landslide susceptibility assessment framework was constructed, integrating the random forest model with the SHAP interpretability method. The study proposes a multi-scale collaborative approach of “global modeling and local analysis,” enabling systematic evaluation from regional modeling to the diagnosis of key controlling factors in representative sections.

Spatial analysis shows that landslides are primarily distributed at elevations between 188 and 752 m, on slopes of 15°–30°, and at curvatures of −0.35 to 0.22. Northeast- and southwest-facing aspects are slightly more common. The highest number and area of landslides occur on steeper slopes. Landslides occur predominantly in Jurassic strata and areas of high precipitation. FVC and proximity to rivers also significantly influence their distribution. The highest landslide density is within 0.2–0.5 km of railway lines, reflecting the localized perturbations caused by railway construction on slope stability.

The susceptibility model performed well (AUC = 0.88). Global results showed that high and very high susceptibility areas were mainly concentrated in the central and southern parts of the study area, especially in Suxian District of Chenzhou City and the northern part of Lechang City of Shaoguan City. Global factor importance analysis indicates that slope, TWI, elevation, and average precipitation are the primary controlling factors for landslide occurrence in the study area. By further applying the SHAP method for localized factor interpretation in typical high-risk segments (e.g., Chenzhou, Shaoguan, Qingyuan), it was found that although dominant factors are generally similar, their contribution rankings differ, reflecting the spatial heterogeneity and complexity of landslide triggering mechanisms along the railway corridor. For example, landslides in Suxian District are mainly influenced by the combined effects of mining disturbance and rainfall; the Lechang segment in Shaoguan requires focused attention on the interaction between engineering disturbances and hydrological factors; while the Yingde segment in Qingyuan is predominantly controlled by tectonic and hydrological interactions. These findings provide data support and mechanistic insights for segmented landslide risk identification and differentiated prevention strategies along the railway.

This framework enhances the adaptability of traditional models in complex geological settings and exemplifies an integrated analytical approach from regional modeling to localized detailed diagnosis, offering strong engineering applicability and potential for wider adoption. Future work may integrate remote sensing monitoring and measured rainfall data to enable dynamic model updating and real-time early warning capabilities, further supporting precise landslide disaster prevention and control practices for the railway.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

HL: Methodology, Investigation, Supervision, Conceptualization, Visualization, Formal Analysis, Writing – original draft. CX: Conceptualization, Methodology, Investigation, Writing – review and editing, Supervision, Funding acquisition. LF: Conceptualization, Writing – review and editing. PW: Writing – review and editing, Data curation. JS: Writing – review and editing, Data curation. XZ: Writing – review and editing, Data curation. JW: Project administration, Writing – review and editing. QS: Project administration, Writing – review and editing. KL: Methodology, Writing – review and editing.

Funding

The authors declare that financial support was received for the research and/or publication of this article. This work was supported by a grant from Chongqing Water Resources Bureau, China (Project No. CQS24C00836), Research Institute of China Southern Power Grid Co., Ltd. [1500002024030103SJ00003 (CG1500062001647685-001)], Research Institute of China Southern Power Grid Co., Ltd. [1500002024030103SJ00009 (CG1500062001634723-001)], and Key Project of China Railway Design Corporation (Project No. 2023A0226409).

Conflict of interest

Author PW was employed by Beijing Engineering Corporation Limited. Author JW was employed by China Railway Xi'an Group Company Limited. Author QS was employed by China Railway Design Corporation.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The authors declare that this study received funding from China Southern Power Grid Co., Ltd. and China Railway Design Corporation. The funders had the following involvement in the study: provision of essential research data.

The author CX declared that they were an editorial board member of Frontiers at the time of submission. This had no impact on the peer review process and the final decision.

Generative AI statement

The authors declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Akinci, H., Kilicoglu, C., and Dogan, S. (2020). Random forest-based landslide susceptibility mapping in coastal regions of Artvin, Turkey. ISPRS Int. J. Geo Information 9 (9), 553. doi:10.3390/ijgi9090553

Aleotti, P., and Chowdhury, R. (1999). Landslide hazard assessment: summary review and new perspectives. Bull. Eng. Geol. Environment 58, 21–44. doi:10.1007/s100640050066

Ali, S. A., Parvin, F., Pham, Q. B., Khedher, K. M., Dehbozorgi, M., Rabby, Y. W., et al. (2022). An ensemble random forest tree with SVM, ANN, NBT, and LMT for landslide susceptibility mapping in the rangit river watershed, India. Nat. Hazards 113 (3), 1601–1633. doi:10.1007/s11069-022-05360-5

Bai, S., Lü, G., Wang, J., Zhou, P., and Ding, L. (2011). GIS-Based rare events logistic regression for landslide-susceptibility mapping of lianyungang, China. Environ. Earth Sci. 62, 139–149. doi:10.1007/s12665-010-0509-3

Brabb, E. E. (1984). Innovative approaches to landslide hazard and risk mapping. Tokyo, Japan: Japan Landslide Society.

Breiman, L. (2001). Random forests. Mach. Learning 45, 5–32. doi:10.1023/a:1010933404324

Bureau, G. R. S. A. A. (2020). Announcement on the investigation findings of the “March 30” major railway accident involving the derailment of passenger train T179 on the Beijing-Guangzhou railway line. Available online at: http://www.nra.gov.cn/zzjg/jgj/gzgl/gsgz/202004/t20200430_337548.shtml.

Chauhan, V., Gupta, L., and Dixit, J. (2025). Landslide susceptibility assessment for Uttarakhand, a Himalayan state of India, using multi-criteria decision making, bivariate, and machine learning models. Geoenvironmental Disasters 12 (1), 2. doi:10.1186/s40677-024-00307-3

Chen, C., and Fan, L. (2024). Interpretability of statistical, machine learning, and deep learning models for landslide susceptibility mapping in three gorges Reservoir area. arXiv Preprint arXiv:2405.11762. doi:10.48550/arXiv.2405.11762

Choubin, B., Jaafari, A., Henareh, J., Karimi, O., and Hosseini, F. S. (2025). Explainable artificial intelligence (XAI) for interpreting predictive models and key variables in flood susceptibility. Results Eng. 27, 105976. doi:10.1016/j.rineng.2025.105976

Corominas, J., van Westen, C., Frattini, P., Cascini, L., Malet, J.-P., Fotopoulou, S., et al. (2014). Recommendations for the quantitative analysis of landslide risk. Bull. Engineering Geology Environment 73, 209–263. doi:10.1007/s10064-013-0538-8

Cruden, D. (1991). A simple definition of a landslide. Bull. Eng. Geol. Environ. 43 (1), 27–29. doi:10.1007/bf02590167

Cui, Y., Hu, J., Xu, C., Miao, H., and Zheng, J. (2022). Landslides triggered by the 1970 Ms 7.7 Tonghai earthquake in Yunnan, China: an inventory, distribution characteristics, and tectonic significance. J. Mt. Sci. 19 (6), 1633–1649. doi:10.1007/s11629-022-7321-x

Dai, F., Lee, C. F., and Ngai, Y. Y. (2002). Landslide risk assessment and management: an overview. Eng. Geology 64 (1), 65–87. doi:10.1016/s0013-7952(01)00093-x

Dey, H., Haque, M. M., Shao, W., VanDyke, M., and Hao, F. (2024). Simulating flood risk in Tampa Bay using a machine learning driven approach. NPJ Nat. Hazards 1 (1), 40. doi:10.1038/s44304-024-00045-4

Fang, R., Liu, Y., and Huang, Z. (2021). A review of the methods of regional landslide hazard assessment based on machine learning. Chin. J. Geol. Hazard Control 32 (4), 1–8. doi:10.16031/j.cnki.issn.1003-8035.2021.04-01

Feng, L., Qi, W., Xu, C., Yang, W., Yang, Z., Xiao, Z., et al. (2024a). Landslide research from the perspectives of Qinling mountains in China: a critical review. J. Earth Sci. 35 (5), 1546–1567. doi:10.1007/s12583-023-1935-9

Feng, L., Xu, C., Tian, Y., Li, L., Sun, J., Huang, Y., et al. (2024b). Landslides of China's qinling. Geoscience Data J. 11 (4), 725–741. doi:10.1002/gdj3.246

Fick, S. E., and Hijmans, R. J. (2017). WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int. Journal Climatology 37 (12), 4302–4315. doi:10.1002/joc.5086

Fisher, A., Rudin, C., and Dominici, F. (2019). All models are wrong, but many are useful: learning a variable's importance by studying an entire class of prediction models simultaneously. J. Mach. Learn. Res. 20 (177), 177–181. Available online at: http://jmlr.org/papers/v20/18-760.html.

Gao, H., Xu, C., Xie, C., Ma, J., and Xiao, Z. (2024). Landslides triggered by the July 2023 extreme rainstorm in the Haihe River Basin, China. Berlin, Germany: Springer.

Guerriero, L., Prinzi, E. P., Calcaterra, D., Ciarcia, S., Di Martire, D., Guadagno, F. M., et al. (2021). Kinematics and geologic control of the deep-seated landslide affecting the historic center of Buonalbergo, southern Italy. Geomorphology 394, 107961. doi:10.1016/j.geomorph.2021.107961

Guo, F., Lai, P., Huang, F., Liu, L., Wang, X., and He, Z. (2024). Literature review and research progress of landslide susceptibility mapping based on knowledge graph. Earth Sci. 49 (5), 1584–1606. doi:10.3799/dqkx.2023.058

Gupta, K., Yunus, A. P., Siddique, T., and Ahamad, A. (2025). Landslide susceptibility along National Highway-7 in the Himalayas using random forest-based machine learning tool. J. Earth Syst. Sci. 134 (2), 74. doi:10.1007/s12040-025-02533-1

Habumugisha, J. M., Chen, N., Rahman, M., Islam, M. M., Ahmad, H., Elbeltagi, A., et al. (2022). Landslide susceptibility mapping with deep learning algorithms. Sustainability 14 (3), 1734. doi:10.3390/su14031734

Halder, K., Srivastava, A. K., Ghosh, A., Das, S., Banerjee, S., Pal, S. C., et al. (2025). Improving landslide susceptibility prediction through ensemble recursive feature elimination and meta-learning framework. Sci. Rep. 15 (1), 5170. doi:10.1038/s41598-025-87587-3

Han, Z., and Wu, G. (2024). Why do people not prepare for disasters? A national survey from China. Npj Nat. Hazards 1 (1), 1. doi:10.1038/s44304-024-00001-2

He, L., Wu, X., He, Z., Xue, D., Luo, F., Bai, W., et al. (2023). Susceptibility assessment of landslides in the Loess plateau based on machine learning models: a case study of xining city. Sustainability 15 (20), 14761. doi:10.3390/su152014761

Highland, L. (2004). Landslide types and processes. Herndon, VA: USGS Report.

Hong, H., Pradhan, B., Xu, C., and Bui, D. T. (2015). Spatial prediction of landslide hazard at the Yihuang area (China) using two-class kernel logistic regression, alternating decision tree and support vector machines. Catena 133, 266–281. doi:10.1016/j.catena.2015.05.019

Hong, H., Chen, W., Xu, C., Youssef, A. M., Pradhan, B., and Tien Bui, D. (2017a). Rainfall-induced landslide susceptibility assessment at the Chongren area (China) using frequency ratio, certainty factor, and index of entropy. Geocarto International 32 (2), 1–16. doi:10.1080/10106049.2015.1130086

Hong, H., Tsangaratos, P., Ilia, I., Chen, W., and Xu, C. (2017b). Comparing the performance of a logistic regression and a random forest model in landslide susceptibility assessments. The Case of Wuyaun Area, China. In: Advancing culture of living with landslides: volume 2 advances in landslide science (Cham: Springer), 1043–1050.

Huang, W. (2023). Landslide susceptibility assessment in large range based on deep learning: a case study of the Qinghai-Tibet Plateau transportation corridor. Xi'an, China: Master, Chang’an University.

Huang, Y., Xu, C., Li, L., He, X., Cheng, J., Xu, X., et al. (2022). Inventory and spatial distribution of ancient landslides in Hualong County, China. Land 12 (1), 136. doi:10.3390/land12010136

Huang, Y., Xu, C., He, X., Cheng, J., Huang, Y., Wu, L., et al. (2024). Distribution characteristics and cumulative effects of landslides triggered by multiple moderate-magnitude earthquakes: a case study of the comprehensive seismic impact area in Yibin, Sichuan, China. Landslides 21 (12), 2927–2943. doi:10.1007/s10346-024-02351-4

Huang, Y., Xu, C., He, X., Cheng, J., Xu, X., and Tian, Y. (2025). Landslides induced by the 2023 Jishishan Ms 6.2 earthquake (NW China): spatial distribution characteristics and implication for the seismogenic fault. Npj Nat. Hazards 2 (1), 14. doi:10.1038/s44304-025-00064-9

Hungr, O., Leroueil, S., and Picarelli, L. (2014). The varnes classification of landslide types, an update. Landslides 11, 167–194. doi:10.1007/s10346-013-0436-y

Jaboyedoff, M., Oppikofer, T., Abellán, A., Derron, M.-H., Loye, A., Metzger, R., et al. (2012). Use of LIDAR in landslide investigations: a review. Nat. Hazards 61, 5–28. doi:10.1007/s11069-010-9634-2

Jarvis, A., Guevara, E., Reuter, H., and Nelson, A. (2008). Hole-filled SRTM for the globe: version 4: data grid. France: CGIAR Consortium for Spatial Information.

Jebur, M. N., Pradhan, B., and Tehrany, M. S. (2014). Optimization of landslide conditioning factors using very high-resolution airborne laser scanning (LiDAR) data at catchment scale. Remote Sens. Environ. 152, 150–165. doi:10.1016/j.rse.2014.05.013

Jenks, G. F. (1967). The data model concept in statistical mapping. Int. Yearb. Cartogr. 7.

Kamp, U., Growley, B. J., Khattak, G. A., and Owen, L. A. (2008). GIS-based landslide susceptibility mapping for the 2005 Kashmir earthquake region. Geomorphology 101 (4), 631–642. doi:10.1016/j.geomorph.2008.03.003

Kavzoglu, T., and Teke, A. (2022). Predictive performances of ensemble machine learning algorithms in landslide susceptibility mapping using random forest, extreme gradient boosting (XGBoost) and natural gradient boosting (NGBoost). Arabian J. Sci. Eng. 47 (6), 7367–7385. doi:10.1007/s13369-022-06560-8

Kouhartsiouk, D., and Perdikou, S. (2021). The application of DInSAR and Bayesian statistics for the assessment of landslide susceptibility. Nat. Hazards 105 (3), 2957–2985. doi:10.1007/s11069-020-04433-7

Kubwimana, D., Brahim, L. A., and Abdelouafi, A. (2021). A new approach in the development and analysis of the landslide susceptibility map of the hillslopes of Bujumbura, Burundi. Eureka Phys. Eng., 26–34. doi:10.21303/2461-4262.2021.001724

Li, L., Xu, C., Xu, X., Zhang, Z., and Cheng, J. (2021). Inventory and distribution characteristics of large-scale landslides in Baoji city, Shaanxi province, China. ISPRS Int. J. Geo Information 11 (1), 10. doi:10.3390/ijgi11010010

Li, T., Xu, C., Li, L., and Xu, J. (2024). The landslide traces inventory in the transition zone between the Qinghai-Tibet Plateau and the Loess Plateau: a case study of Jianzha County, China. Front. Earth Sci. 12, 1370992. doi:10.3389/feart.2024.1370992

Liu, F., Wang, L., Xiao, D., and Wang, J. (2021). Evaluation of landslide susceptibility in Ningnan County based on fuzzy comprehensive evaluation. J. Nat. Disasters 30 (5), 237–246. doi:10.13577/j.jnd.2021.0523

Liu, R., Yang, X., Xu, C., Wei, L., and Zeng, X. (2022). Comparative study of convolutional neural network and conventional machine learning methods for landslide susceptibility mapping. Remote Sens. 14 (2), 321. doi:10.3390/rs14020321

Liu, Y., Xu, S., Liu, C., and Ma, Y. (2024). Landslide susceptibility assessment considering multi-method integrated feature selection and negative sample optimization. Bull. Surveying Mapping (9), 74. doi:10.13474/j.cnki.11-2246.2024.0914

Lundberg, S. M., and Lee, S.-I. (2017). A unified approach to interpreting model predictions. Adv. Neural Information Processing Systems, 30.

Luu, C., Ha, H., Tran, X. T., Vu, T. H., and Bui, Q. D. (2024). Landslide susceptibility and building exposure assessment using machine learning models and geospatial analysis techniques. Adv. Space Res. 74 (11), 5489–5513. doi:10.1016/j.asr.2024.08.046

Merghadi, A., Yunus, A. P., Dou, J., Whiteley, J., ThaiPham, B., Bui, D. T., et al. (2020). Machine learning methods for landslide susceptibility studies: a comparative overview of algorithm performance. Earth Science Rev. 207, 103225. doi:10.1016/j.earscirev.2020.103225

Petley, D. (2012). Global patterns of loss of life from landslides. Geology 40 (10), 927–930. doi:10.1130/g33217.1

Reichenbach, P., Rossi, M., Malamud, B. D., Mihir, M., and Guzzetti, F. (2018). A review of statistically-based landslide susceptibility models. Earth Science Reviews 180, 60–91. doi:10.1016/j.earscirev.2018.03.001

Sam, J. (2024). The effects of seismic behavior on high ground stress soft rock tunnel: a review. Civ. Eng. J. 10 (9), 3090–3121. doi:10.28991/cej-2024-010-09-020

Susilo, A., Zulaikah, S., Pohan, A. F., Hasan, M. F. R., Hisyam, F., Rohmah, S., et al. (2024). Vulnerability index assessment for mapping ground movements using the microtremor method as geological hazard mitigation. Civ. Eng. J. 10 (5), 1616–1626. doi:10.28991/cej-2024-010-05-017

Tateishi, R., Uriyangqai, B., Al-Bilbisi, H., Ghar, M. A., Tsend-Ayush, J., Kobayashi, T., et al. (2011). Production of global land cover data–GLCNMO. Int. J. Digital Earth 4 (1), 22–49. doi:10.1080/17538941003777521

Torres-Vázquez, M. Á., Herrera, S., Gincheva, A., Halifa-Marín, A., Cavicchia, L., Di Giuseppe, F., et al. (2025). Enhancing seasonal fire predictions with hybrid dynamical and random forest models. NPJ Nat. Hazards 2 (1), 20. doi:10.1038/s44304-025-00069-4

Tun, S. H., Changnv, Z., and Jamil, F. (2024). GIS-based landslide susceptibility assessment using random forest and support vector machine models: a case study of Chin State, Myanmar. Acta Geodyn. Geomaterialia 21 (3). doi:10.13168/AGG.2024.0019

Wang, B., Li, S., Xu, W., Yang, Y., and Li, Y. (2024a). A comparative study of landslide susceptibility evaluation based on three different machine learning algorithms. Northwest. Geol. 57 (1), 34–43. doi:10.12401/j.nwg.2023033

Wang, J., Zang, M., Xu, C., Liu, T., and Huang, Y. (2024b). Landslide susceptibility assessment following the 2022 Luding earthquake: a coupled analytic hierarchy process and area under the receiver operating characteristic curve algorithm. J. Eng. Geol. 32 (5), 1696–1711. doi:10.13544/j.cnki.jeg.2024-0176

Wang, W., Huang, Y.-d., Xu, C., Shao, X.-y., Li, L., Feng, L.-y., et al. (2024c). Identification and distribution of 13003 landslides in the northwest margin of Qinghai-Tibet Plateau based on human-computer interaction remote sensing interpretation. China Geol. 7 (2), 171–187. doi:10.31035/cg2023140

Wu, X., Xu, X., Yu, G., Ren, J., Yang, X., Chen, G., et al. (2024). The China active faults database (CAFD) and its web system. Earth Syst. Sci. Data 16 (7), 3391–3417. doi:10.5194/essd-16-3391-2024

Xia, L. (2008). Application of RS in mine exploration and monitor: a case study in Suxian, Chenzhou, Hunan. Master's thesis. Beijing: China University of Geosciences.

Xiao, X., Zou, Y., Huang, J., Luo, X., Yang, L., Li, M., et al. (2024). An interpretable model for landslide susceptibility assessment based on optuna hyperparameter optimization and Random Forest. Geomatics, Nat. Hazards Risk 15 (1), 2347421. doi:10.1080/19475705.2024.2347421

Xie, C., Xu, C., Huang, Y., Liu, J., Jin, J., Xu, X., et al. (2025). Detailed inventory and initial analysis of landslides triggered by extreme rainfall in the northern Huaiji County, Guangdong Province, China, from June 6 to 9, 2020. Geoenvironmental Disasters 12 (1), 7. doi:10.1186/s40677-025-00311-1

Xu, C., Dai, F., Yao, X., Chen, J., Tu, X., Sun, Y., et al. (2009). GIS-based landslide susceptibility assessment using analytical hierarchy process in Wenchuan earthquake region. Chin. J. Rock Mech. Eng., 28. doi:10.3321/j.issn:1000-6915.2009.z2.100

Xu, C., Dai, F., Yao, X., Zhao, Z., and Xiao, J. (2010). GIS platform and certainty factor analysis method based Wenchuan earthquake induced landslide susceptibility evaluation. J. Eng. Geol. 18 (1), 15. doi:10.3969/j.issn.1004-9665.2010.01.003

Xu, X., Han, Z., Yang, X., Zhang, S., Yu, G., Zhou, B., et al. (2016). Seismotectonic map in China and its adjacent regions. Beijing: Seismological Press.

Xu, C., Xu, X., Zhou, B., and Shen, L. (2019). Probability of coseismic landslides: a new generation of earthquake-triggered landslide hazard model. J. Eng. Geol. 27 (05), 1122–1130. doi:10.13544/j.cnki.jeg.2019084

Xue, Z., Xu, C., Zhang, Z., Feng, L., Li, H., Zhang, H., et al. (2025). Inventory of landslide relics in Zhenxiong County based on human-machine interactive visual interpretation, Yunnan Province, China. Front. Earth Sci. 12, 1518377. doi:10.3389/feart.2024.1518377

Yanfatriani, E., Marzuki, M., Vonnisa, M., Razi, P., Hapsoro, C. A., Ramadhan, R., et al. (2024). Extreme rainfall trends and hydrometeorological disasters in tropical regions: implications for climate resilience. Emerg. Sci. J. 8 (5), 1860–1874. doi:10.28991/esj-2024-08-05-012

Yong, C., Jinlong, D., Fei, G., Bin, T., Tao, Z., Hao, F., et al. (2022). Review of landslide susceptibility assessment based on knowledge mapping. Stoch. Environ. Res. Risk Assess. 36 (9), 2399–2417. doi:10.1007/s00477-021-02165-z

Youssef, A. M., Pourghasemi, H. R., Pourtaghi, Z. S., and Al-Katheeri, M. M. (2016). Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 13, 839–856. doi:10.1007/s10346-015-0614-1

Yu, B., Xing, H., and Yan, J. (2024). Susceptibility assessment of multi-hazards using random forest—back propagation neural network coupling model: a Hangzhou city case study. Sci. Rep. 14 (1), 21783. doi:10.1038/s41598-024-71053-7

Yuan, X., Liu, C., Nie, R., Yang, Z., Li, W., Dai, X., et al. (2022). A comparative analysis of certainty factor-based machine learning methods for collapse and landslide susceptibility mapping in Wenchuan County, China. Remote Sens. 14 (14), 3259. doi:10.3390/rs14143259

Zeng, T., Wang, L., Zhang, Y., Cheng, P., and Wu, F. (2024). Landslide susceptibility modeling and interpretability based on CatBoost-SHAP model. Chin. J. Geol. Hazard Control 35 (1), 37–50. doi:10.16031/j.cnki.issn.1003-8035.202309035

Zhang, H., and Zhang, G. (2014). Distribution and prevention of geo-hazards in Lechang city. J. Geol. Hazards Environ. Preserv. 25 (04), 47–50. doi:10.3969/j.issn.1006-4362.2014.04.008

Zhang, X., Jiang, C., and Luo, M. (2007). Application of ROC analysis in machine learning. Comput. Eng. Appl. (04), 243–248. doi:10.3321/j.issn:1002-8331.2007.04.074

Zhang, K., Wu, X., Niu, R., Yang, K., and Zhao, L. (2017). The assessment of landslide susceptibility mapping using random forest and decision tree methods in the Three Gorges Reservoir area, China. Environ. Earth Sci. 76, 405–420. doi:10.1007/s12665-017-6731-5

Zhang, W., He, Y., Wang, L., Liu, S., and Meng, X. (2023). Landslide susceptibility mapping using random forest and extreme gradient boosting: a case study of Fengjie, Chongqing. Geol. J. 58 (6), 2372–2387. doi:10.1002/gj.4683

Zhao, J. (2018). Research on geological disaster development characteristics and formation conditions in Yingde City, Guangdong Province. Ground Water 40 (03), 108–109. doi:10.3969/j.issn.1004-1184.2018.03.037

Zhao, J., Zhang, Q., Wang, D., Wu, W., and Yuan, R. (2022). Machine learning-based evaluation of susceptibility to geological hazards in the Hengduan mountains region, China. Int. J. Disaster Risk Sci. 13 (2), 305–316. doi:10.1007/s13753-022-00401-w

Keywords: landslide susceptibility assessment, spatial distribution patterns, random forest, Beijing–Guangzhou railway, SHAP

Citation: Liu H, Xu C, Feng L, Wang P, Sun J, Zhang X, Wang J, Sun Q and Li K (2026) Spatial distribution patterns and landslide susceptibility analysis from a global–local perspective along the Zhuzhou-Guangzhou section of the Beijing–Guangzhou railway. Front. Earth Sci. 13:1722201. doi: 10.3389/feart.2025.1722201

Received: 10 October 2025; Accepted: 14 November 2025;
Published: 12 January 2026.

Edited by:

Gioacchino Francesco Andriani, University of Bari Aldo Moro, Italy

Reviewed by:

Yang Hailong, Chengdu University of Technology, China
Thapthai Chaithong, Kasetsart University, Thailand
Hooman Mousavi, K. N. Toosi University of Technology, Iran

Copyright © 2026 Liu, Xu, Feng, Wang, Sun, Zhang, Wang, Sun and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Chong Xu, xc11111111@126.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.