ORIGINAL RESEARCH article

Front. Earth Sci., 12 January 2026

Sec. Geohazards and Georisks

Volume 13 - 2025 | https://doi.org/10.3389/feart.2025.1710586

Global landslide mapping using U-Net architecture with diverse backbones across multi-regional and multi-sensor remote sensing datasets

Naveen Chandra1,2, Himadri Vaidya3,4, Kumar Abhinav1, Sansar Raj Meena5,6*
  • 1Geomorphology, Wadia Institute of Himalayan Geology, Dehradun, Uttarakhand, India
  • 2Academy of Scientific and Innovative Research, Ghaziabad, Uttar Pradesh, India
  • 3Department of Computer Science and Engineering, Graphic Era Hill University, Dehradun, India
  • 4Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India
  • 5Machine Intelligence and Slope Stability Laboratory, Department of Geosciences, University of Padova, Padova, Italy
  • 6Center for Remote Sensing, Department of Earth and Environment, Boston University, Boston, MA, United States

Landslides remain a constant hazard to societies in hilly regions worldwide, necessitating accurate and scalable detection techniques. In this study, we assess the performance of a cutting-edge deep learning architecture, U-Net, combined with seven backbone network families (EfficientNet, ResNet, Inception, MobileNet, VGG, ResNeXt, and SENet) and their 29 variants, for the semantic segmentation of landslides using multiple high-resolution remote sensing datasets. Our experiments encompass two geographically diverse and challenging datasets, namely the High-Resolution Global Landslide Detector Database (HRGLDD) (which includes South/Southeast Asia, East Asia, and Latin America) and the Large-scale Multi-source High-resolution Landslide Dataset (LMHLD). Performance is evaluated using standard segmentation metrics, including Intersection over Union (IoU), Precision, Recall, and F-score. The experimental results underscore the superior performance of the Squeeze-and-Excitation (SE) family of backbone networks, notably SENet154, SE-ResNet-152, and SE-ResNeXt-50/101, across all three decoder architectures. Specifically, the U-Net + SE-ResNeXt_101 model achieved the highest F-score of 0.9569, followed by U-Net + SE-ResNeXt_50 with an F-score of 0.9471 on the HRGLDD dataset. The study provides a comprehensive benchmark of encoder-decoder combinations for landslide mapping, emphasizing the importance of backbone selection in achieving segmentation accuracy. Our findings serve as a valuable resource for future remote sensing applications in geohazard mapping, particularly in regions with limited ground truth availability.

1 Introduction

Landslides are among the most critical natural hazards worldwide, particularly in mountainous and seismically active areas. Triggered by factors such as heavy rainfall, earthquakes, snowmelt, and human activities like deforestation and slope alteration, landslides pose significant threats to human life, infrastructure, and the environment (Ren et al., 2025). The increasing frequency and intensity of landslide events, due to climate change and land-use modifications, have intensified the need for timely detection and mapping techniques. Accurate detection of landslides plays a vital role in disaster risk reduction. Traditional techniques of landslide identification, such as field investigations and aerial photography interpretation, though accurate at small scales, are labour-intensive, time-consuming, and impractical for large or inaccessible areas. These limitations led to the adoption of remote sensing technologies, which provide a synoptic and repeatable means of acquiring information across large areas.

Remote sensing data, mainly from Synthetic Aperture Radar (SAR) and optical sensors, have supported the mapping of landslides with varying spatial and spectral resolutions. When combined with Geographic Information Systems (GIS), remote sensing technology has enriched the capability of landslide hazard zonation mapping and change analysis. However, delineating precise landslide boundaries and distinguishing landslides from other landscape features remains a challenge. At present, four primary approaches are applied for landslide detection using remote sensing data: visual interpretation, pixel-based classification, object-oriented analysis, and artificial intelligence (AI)-driven techniques (Han et al., 2023). Each technique offers different advantages and limitations. Visual interpretation depends on expert understanding, where analysts manually examine remote sensing imagery to find landslide features. Although this method can be precise due to human proficiency, it is fundamentally time-consuming and laborious. To overcome the inefficiency of visual analysis, pixel-based techniques have been introduced. These methods, such as the binary classification approach, categorize individual pixels as either landslide or non-landslide/background. While more efficient, these methods face challenges in distinguishing landslides from other environmental features with similar spectral characteristics, leading to classification errors, particularly in heterogeneous terrains. Object-oriented approaches attempt to address this problem by applying multiscale segmentation and evaluating image primitives (for example, shape, texture, and spectral information) to classify regions instead of individual pixels (Lu et al., 2011). While object-oriented methods can yield enhanced spatial coherence, they need careful tuning of segmentation parameters and threshold values. Moreover, their performance might degrade when applied to large and topographically complex areas, particularly when rapid segmentation is essential. A study by Keyport et al. (2018) critically analysed the benefits and drawbacks of both pixel-based and object-oriented methodologies for landslide detection, emphasizing the necessity for more adaptive and scalable solutions.

To overcome these limitations, researchers have gradually moved towards AI-based methods. These techniques are capable of learning complex patterns directly from the data, removing the need for manual thresholding or expert-defined tuning. Machine learning (a sub-discipline of AI) techniques, for example random forests and support vector machines (Piralilou et al., 2019), have been extensively applied for landslide detection from remote sensing data (Tehrani et al., 2022). Although these methods offer automation and scalability, their performance relies on user-defined feature selection and may degrade in heterogeneous or new environments.

In addition, deep learning methods have demonstrated superior performance in semantic segmentation (Minaee et al., 2021), image classification (Zeng et al., 2021), and object detection (Zhao et al., 2019) tasks due to their capability to learn hierarchical features directly from data. Deep learning has also shown notable potential in geohazard analysis studies due to its ability to automatically learn complex patterns from large datasets (Ma and Mei, 2021). It has been effectively applied in detecting landslide events with improved accuracy, for instance using Mask R-CNN (Liu et al., 2024; Ullo et al., 2021) and the U-Net architecture (Devara et al., 2024; Ghorbanzadeh et al., 2021; Meena et al., 2022a; Meena et al., 2022b). YOLO (You Only Look Once) models have also been successfully applied for localizing landslides (Chandra and Vaidya, 2024; He et al., 2025; Ma et al., 2025). Moreover, hybrid models, for example U-Net + OBIA (Ghorbanzadeh et al., 2022; Kaushal et al., 2024), have also been suggested previously. In recent times, attention mechanisms have boosted the accuracy of deep learning models in landslide detection. Various studies have shown their efficacy in focusing on relevant features and increasing model performance, for example U-Net + CBAM (Lin et al., 2024), U-Net + SENet (Chen et al., 2023), AMU-Net (Wei et al., 2023), PConv-simAM-SegFormer (Yang et al., 2025), and YOLOv10+CBAM (Chandra et al., 2025).

Despite notable progress in deep learning-based landslide detection, several critical gaps persist. Most existing landslide detection studies primarily focus on single-region or localized datasets, which restricts systematic understanding of model behavior across diverse geomorphological, climatic, and sensor conditions. Very limited studies have explored the performance of U-Net architectures when integrated with a diverse set of modern backbone networks across varied global topographies for landslide detection. Moreover, systematic comparative evaluations of U-Net architectures coupled with multiple backbone families across globally distributed, heterogeneous datasets remain largely unexplored. Previous research has typically validated model performance within specific geographic zones without examining its robustness under varying terrain types, image resolutions, and environmental complexities. In contrast, this study addresses these limitations by conducting a comprehensive global-scale assessment of 29 U-Net backbone combinations using heterogeneous datasets collected from different continents, offering new insights into model adaptability, scalability, and the influence of backbone selection on landslide detection performance. To address the above challenges, this study’s objectives are threefold.

1. Evaluate the performance of the U-Net architecture, integrated with seven backbone families (EfficientNet, ResNet, Inception, MobileNet, VGG, ResNeXt, and SENet with their respective variants), forming 29 combinations for landslide detection across varied terrain and datasets.

2. Employ two diverse remote sensing-based landslide datasets, namely HRGLDD and LMHLD, collected from several continents and covering different terrain types, to systematically assess generalizability and to identify the most robust deep learning architecture and backbone combinations for detecting landslides under different environmental settings.

3. Propose a unified automated mapping framework capable of accurately detecting landslides across varying geographic and topographic conditions for global-scale applicability.

Based on the objectives above, this study addresses the following research questions (RQs):

RQ1: How do different backbone-integrated U-Net architectures influence model generalization across diverse geomorphological, climatic, and environmental settings?

RQ2: To what extent can a standardized, cross-dataset evaluation framework reveal the robustness and consistency of semantic segmentation models when applied to multi-source, multi-resolution global landslide datasets?

RQ3: How can heterogeneous remote sensing data sources be efficiently leveraged within deep learning frameworks to enhance spatiotemporal robustness and real-time operational capability in landslide detection?

2 Datasets

In our research work, we used two diverse and geographically distributed datasets to assess the performance of our proposed models for landslide detection and mapping. Specifically, the HRGLDD (Meena et al., 2022a) and LMHLD (Liu et al., 2025) datasets each contribute distinctive features and challenges, ensuring robust model assessment across dissimilar landscapes and climatic zones. Figure 1 presents the geographic distribution of landslide study sites used in this research, spanning multiple continents and diverse environmental settings, enabling a comprehensive global-scale evaluation. These sites correspond to the locations represented in the datasets employed throughout the study.

Figure 1
Map of the world highlighting landslide events with red stars. Notable locations include the United States, Mexico, Haiti, Brazil, Chile, Iceland, Italy, Turkey, Georgia, Kyrgyzstan, Nepal, India, Myanmar, Democratic Republic of Congo, Zimbabwe, China, Vietnam, Taiwan, Japan, Indonesia, Papua New Guinea, and New Zealand. A legend in the bottom left indicates stars represent landslide events.

Figure 1. Geographic distribution of landslide study sites used in this research for global-scale evaluation. The map highlights locations corresponding to the datasets used.

HRGLDD is an open-access dataset developed for landslide mapping using high-resolution satellite imagery. This dataset utilizes PlanetScope images (2017–2022), which offer a 3-m pixel resolution, making it exceptionally suitable for detailed landslide detection and analysis (Meena et al., 2022a). HRGLDD comprises landslide instances from diverse global physiographical regions, including East Asia, South Asia, Southeast Asia, South America, and Central America. The dataset includes five instances each of rainfall and earthquake-triggered landslides. The spatial extent of the dataset covers 5,047.93 km², comprising 3,825 georeferenced tiles, with a total of 7,193 mapped landslides occupying 53.07 km² of combined landslide area. These tiles are sampled from twelve globally distributed physiographic regions, namely, Porgera in Papua New Guinea, Kodagu (India), Rolante (Brazil), the Tiburon Peninsula (Haiti), Rasuwa in Nepal, Hokkaido (Japan), Wenchuan and Longchuan (China), Sumatra in Indonesia, Hpa-An in Myanmar, Kaikōura (New Zealand), and Uvira in the Democratic Republic of the Congo. These regions are carefully selected to represent a wide range of geomorphological and topographical settings, providing a comprehensive basis for developing and testing landslide detection models. These events span various geographical areas, offering a robust dataset for understanding different landslide triggers and their manifestations in diverse environments. The HRGLDD dataset consists of standardized image patches, each containing four PlanetScope image bands: Red, Green, Blue, and Near-Infrared (NIR), along with a binary mask that indicates the presence of landslide areas. Including these bands allows for comprehensive analysis and feature extraction, leveraging the spectral information in the different bands. The binary mask serves as the ground truth for training and validating landslide detection models. The dataset details are provided in Table 1.
The dataset is publicly available and can be freely accessed and downloaded from Zenodo (https://zenodo.org/records/7189381).

Table 1. Details of the HR-GLDD (reproduced after Meena et al., 2022a).

The LMHLD dataset (Liu et al., 2025) encompasses landslide events from seven globally distributed regions, each corresponding to specific years of occurrence. These include Wenchuan, China (2008); Rio de Janeiro, Brazil (2011); Gorkha, Nepal (2015); Jiuzhaigou, China (2015); Taiwan, China (2018); Hokkaido, Japan (2018); and Emilia-Romagna, Italy (2023). The dataset integrates imagery from multiple satellite platforms, namely IKONOS (0.8 m), RapidEye (5 m), Sentinel-2 (10 m), Gaofen-2 (1 m), and PlanetScope (3 m), resulting in spatial resolutions ranging from 0.8 m to 10 m. In total, LMHLD contains 25,365 tiles and documents more than 31,296 mapped landslides, with landslide areas spanning from 1 to 681,850 pixels. This broad variation in geographic settings, temporal windows, and sensor characteristics positions LMHLD as a comprehensive benchmark for developing and evaluating landslide detection models under diverse imaging and environmental conditions. The dataset details are summarized in Table 2. This dataset is freely available and can be accessed and downloaded from Zenodo (https://zenodo.org/records/11519933).

Table 2. Details of the LMHLD (reproduced after Liu et al., 2025).

All spectral bands were normalized using min–max scaling to account for radiometric variability arising from different sensors, acquisition conditions, and illumination settings. Additionally, images were converted into fixed-size patches, and binary landslide masks were standardized to a consistent class representation, where 1 denotes landslide and 0 denotes background. Despite their global coverage, multi-source landslide datasets inherently present several limitations. Annotation quality is not uniform across regions due to differing interpretation protocols and reference data availability. Moreover, sensor heterogeneity spanning 0.8 m (IKONOS) to 10 m (Sentinel-2) introduces substantial variability in spectral detail, texture, and landslide boundary clarity. Environmental factors such as vegetation density, illumination, and seasonal conditions further amplify intra-class variability. HRGLDD and LMHLD jointly offer complementary characteristics for evaluating model performance. HRGLDD provides radiometrically consistent, globally distributed imagery suited for establishing baseline performance, whereas LMHLD, with its multi-sensor and multi-resolution complexity, serves as a rigorous test for assessing the robustness and adaptability of the proposed models under heterogeneous real-world conditions.
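The preprocessing steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the function names, the per-band (rather than global) scaling, the non-overlapping tiling, and the 128-pixel patch size are all assumptions made for illustration.

```python
import numpy as np

def min_max_normalize(image, eps=1e-8):
    """Scale each band of an (H, W, C) image to [0, 1] independently.
    Per-band scaling is an assumption; the text only says 'min-max scaling'."""
    image = image.astype(np.float32)
    band_min = image.min(axis=(0, 1), keepdims=True)
    band_max = image.max(axis=(0, 1), keepdims=True)
    return (image - band_min) / (band_max - band_min + eps)

def binarize_mask(mask):
    """Map every non-zero label to 1 (landslide); 0 stays background."""
    return (mask > 0).astype(np.uint8)

def to_patches(array, patch_size):
    """Cut an (H, W, ...) array into non-overlapping square patches.
    Rows and columns that do not fill a whole patch are dropped."""
    height, width = array.shape[:2]
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patches.append(array[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)

# Synthetic 4-band scene and label raster standing in for real imagery.
rng = np.random.default_rng(0)
scene = rng.integers(0, 4096, size=(256, 384, 4))
labels = rng.integers(0, 3, size=(256, 384))

image_patches = to_patches(min_max_normalize(scene), 128)
mask_patches = to_patches(binarize_mask(labels), 128)
print(image_patches.shape, mask_patches.shape)
```

Normalizing each scene before tiling keeps the scaling consistent across all patches cut from that scene; normalizing each patch separately would be an equally plausible reading of the text.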

3 Methodology

This section describes the methodological framework of our study. We first introduce the U-Net architecture, which serves as the baseline model and the foundation for our approach. Then, we present an overview of the wide range of state-of-the-art CNN backbones integrated with U-Net to enhance its feature extraction capabilities. Further, we describe the proposed network configurations resulting from these integrations. Lastly, we outline the training settings, machine configurations, and evaluation criteria used to assess the capability of the models across multiple datasets.

3.1 U-Net model

U-Net, a CNN introduced by Ronneberger et al. (2015), is a widely accepted model for segmentation tasks, including remote sensing image segmentation and landslide detection, due to its high accuracy and efficient use of training data. The U-Net architecture consists of two main paths: (1) Encoder (Contracting Path): This part of the network captures the context in the input image (Figure 2). It is composed of repeated blocks of two 3 × 3 convolutional layers, each followed by a Rectified Linear Unit (ReLU), and a 2 × 2 max pooling operation for downsampling. With each downsampling step, the number of feature channels doubles, allowing the network to learn increasingly abstract and spatially compressed features. (2) Decoder (Expanding Path): This path enables precise localization using upsampling operations. Each upsampling step consists of a 2 × 2 transposed convolution (up-convolution), a concatenation with the corresponding feature map from the encoder, and two 3 × 3 convolutions with ReLU activations. The concatenation of high-resolution features from the encoder with upsampled features helps recover spatial details lost during downsampling. The final layer is a 1 × 1 convolution that maps each feature vector to the desired number of classes (here, binary classification for landslide vs. non-landslide). U-Net remains a suitable architecture for landslide detection due to its encoder–decoder design with skip connections, which preserves fine spatial structures while simultaneously capturing broader contextual information necessary for segmenting complex terrain. Its ability to retain boundary sharpness makes it particularly effective for delineating landslides of varying shapes, sizes, and background heterogeneity. Moreover, U-Net integrates readily with a wide range of pre-trained backbone encoders, enabling stronger feature extraction and improved representation learning.
In this study, U-Net is employed as the primary segmentation framework, and its performance is systematically evaluated across diverse backbone families and globally distributed landslide datasets to rigorously assess robustness, adaptability, and cross-regional generalization.
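To make the channel and resolution bookkeeping of the two paths concrete, the following pure-Python sketch traces feature-map shapes through a plain U-Net. It assumes 'same'-padded convolutions (the original U-Net used unpadded ones), 64 base channels, four pooling stages, and a single output channel; the function name and these defaults are illustrative, not taken from the study.

```python
def unet_shape_trace(height, width, base_channels=64, depth=4):
    """Return (stage, channels, height, width) tuples for a plain U-Net.

    Assumes 'same'-padded convolutions, so the spatial size changes only
    at max pooling (halved) and up-convolution (doubled).
    """
    trace = []
    skips = []
    channels, h, w = base_channels, height, width

    # Encoder: two 3x3 convolutions per level, then 2x2 max pooling.
    for level in range(depth):
        trace.append((f"enc{level + 1}", channels, h, w))
        skips.append((channels, h, w))
        h, w = h // 2, w // 2
        channels *= 2  # feature channels double after each downsampling

    trace.append(("bottleneck", channels, h, w))

    # Decoder: the 2x2 up-convolution halves the channels and doubles H, W;
    # the encoder map of the same size is then concatenated on channels.
    for level in range(depth, 0, -1):
        skip_channels, skip_h, skip_w = skips.pop()
        channels //= 2
        h, w = h * 2, w * 2
        if (h, w) != (skip_h, skip_w):
            raise ValueError("input size must be divisible by 2**depth")
        trace.append((f"dec{level}_concat", channels + skip_channels, h, w))
        trace.append((f"dec{level}", channels, h, w))

    # Final 1x1 convolution maps to one channel (landslide vs. background).
    trace.append(("output_1x1", 1, h, w))
    return trace

for stage in unet_shape_trace(640, 640):
    print(stage)
```

For a 640 × 640 input, the trace reaches a 40 × 40 bottleneck and returns to 640 × 640 at the output, which is why input sizes divisible by 2 to the power of the depth are convenient.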

Figure 2
Diagram of a U-Net architecture showing layers and processes. Arrows indicate components: input (orange), convolution (blue), max-pooling (green), output (dark green), concatenation (grey), up-convolution (red), and softmax (orange). The structure resembles an encoder-decoder network.

Figure 2. Architecture of the U-Net model.

3.2 Description of backbone networks

In this research work, seven backbone families are integrated with segmentation architectures to evaluate their effectiveness in landslide detection across varied terrain and datasets. The EfficientNet family (B0–B7) uses a compound scaling approach that uniformly scales network depth, width, and input resolution, enabling better performance with fewer parameters and lower computational overhead. This makes these models well-suited for processing high-resolution satellite imagery efficiently. The ResNet family, including ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152, utilizes residual learning through skip connections to mitigate the vanishing gradient problem, allowing the training of deeper networks while retaining strong feature extraction abilities across different spatial hierarchies. The Inception family, comprising InceptionV3 and InceptionResNetV2, exploits multi-scale convolutional filters to capture features at several receptive fields. This makes them effective for detecting landslides with variable shapes and sizes across diverse scenes. In contrast, MobileNet and MobileNetV2 are lightweight models that use depth-wise separable convolutions to minimize computational load, making them suitable for real-time applications in low-resource settings while maintaining competitive accuracy. The SENet (squeeze-and-excitation network) family, comprising SE-ResNet and SE-ResNeXt variants, introduces a channel attention mechanism that recalibrates feature responses, thereby enhancing the network’s ability to focus on critical regions, which is particularly beneficial in identifying subtle landslide patterns. The VGG family (VGG16 and VGG19) is known for its deep yet simple design, using a stack of convolutional layers with small (3 × 3) filters. Though computationally demanding, VGG networks are powerful in capturing spatial features and have consistently provided robust results in segmentation tasks. The ResNeXt architecture, including ResNeXt-50 and ResNeXt-101, is an extension of the ResNet family that introduces cardinality, i.e., the number of parallel paths within a block. Unlike traditional ResNets that emphasize increasing depth or width, ResNeXt increases model capacity by aggregating multiple transformations (paths) of the same topology in a split-transform-merge approach. This grouped convolution method allows for more efficient learning and improved feature representation without significantly increasing computational cost. Hence, ResNeXt models are well-suited for tasks like landslide detection, where learning subtle and complex spatial patterns in remote sensing imagery is vital. Together, these backbones offer a wide range of feature extraction approaches, allowing a comprehensive comparison of their performance for global landslide mapping tasks across different datasets and terrain complexities.
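The channel attention used by the SENet family can be sketched in a few lines of NumPy. This is a simplified illustration rather than the implementation used in the experiments: biases are omitted, the weights are random instead of learned, and the reduction ratio of 4 and all names are assumptions.

```python
import numpy as np

def squeeze_excitation(features, w_reduce, w_expand):
    """Recalibrate a (C, H, W) feature map with channel attention.

    w_reduce has shape (C // r, C) and w_expand has shape (C, C // r);
    they stand in for the two fully connected layers (biases omitted).
    """
    squeezed = features.mean(axis=(1, 2))                # squeeze: global average pool -> (C,)
    hidden = np.maximum(w_reduce @ squeezed, 0.0)        # bottleneck layer with ReLU
    gates = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))   # excitation: sigmoid gates in (0, 1)
    return features * gates[:, None, None], gates        # scale every channel by its gate

rng = np.random.default_rng(0)
channels, reduction = 16, 4
features = rng.normal(size=(channels, 8, 8))
w_reduce = rng.normal(scale=0.1, size=(channels // reduction, channels))
w_expand = rng.normal(scale=0.1, size=(channels, channels // reduction))

recalibrated, gates = squeeze_excitation(features, w_reduce, w_expand)
print(recalibrated.shape, float(gates.min()), float(gates.max()))
```

Each channel is multiplied by a single scalar between 0 and 1, so the block can suppress uninformative channels without changing the spatial layout of the feature map.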

3.3 Proposed network

In the proposed network, the main architectural enhancement involves the integration of a CNN backbone directly at the initial stage of the encoder, wherein the input image is first processed through the backbone’s initial convolutional and downsampling layers, effectively replacing the standard convolutional blocks of the U-Net with a pre-trained, feature-rich encoder. These layers progressively extract multilevel semantic features while reducing spatial dimensions. The backbone’s hierarchical design enables the model to capture both low-level and high-level representations more effectively than standard U-Net encoders. The input to the model is an RGB image of shape (H, W, 3). This image is passed through the encoder (backbone) to extract multiscale features. The encoder is derived from standard CNN backbones, which are used without their classification head (i.e., without the fully connected layers). The encoder processes the input image through successive convolutional blocks, each progressively reducing the spatial resolution while increasing the depth (number of filters), and learns increasingly abstract and semantic features. Conv1 corresponds to low-level features (e.g., edges, textures), while conv2, conv3, conv4, and conv5 extract increasingly high-level features. Skip connections link each encoder block to its corresponding decoder block at the same spatial level. These connections pass forward fine-grained spatial details that may be lost during downsampling. This mechanism allows the decoder to reconstruct the output segmentation map with high spatial precision. The decoder performs upsampling and refinement to gradually reconstruct the spatial resolution of the input image. Upsampling doubles the spatial dimensions of the feature maps, and the upsampled feature map is concatenated with the corresponding encoder feature map (via skip connection).
The concatenated features are passed through a series of convolutional layers, each followed by Batch Normalization and ReLU activation, which refine and learn to decode the features into pixel-wise classification outputs. This process is repeated across all decoder blocks, restoring the spatial resolution step-by-step. The final output is produced by a Conv2D layer with filters = classes, using a 3 × 3 kernel and 'same' padding to maintain resolution. This is followed by a sigmoid activation function to generate the segmentation mask. The model is architecturally symmetric, where the encoder reduces dimensionality and extracts features, and the decoder restores spatial resolution. By allowing interchangeable backbones, the framework can be optimized for different computational constraints and performance needs. Skip connections are crucial to reconstructing fine details, especially in dense prediction tasks like semantic segmentation. Pre-trained backbones improve the representational power of the encoder, leading to better segmentation accuracy. This approach balances deep semantic understanding (via backbone) with fine-grained localization (via skip connections), making it well-suited for complex image segmentation tasks such as landslide detection.
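The upsample-and-concatenate step and the final sigmoid output can be sketched as follows. This is a shape-level illustration only: it uses nearest-neighbour upsampling, omits the convolution, Batch Normalization, and ReLU layers, and the 0.5 threshold for turning probabilities into a mask is an assumption that the text does not state.

```python
import numpy as np

def upsample2x(features):
    """Nearest-neighbour upsampling of a (C, H, W) map to (C, 2H, 2W)."""
    return features.repeat(2, axis=1).repeat(2, axis=2)

def decoder_merge(decoder_features, encoder_skip):
    """Upsample the decoder input and concatenate the encoder skip map
    along the channel axis, as done at the start of each decoder block."""
    upsampled = upsample2x(decoder_features)
    if upsampled.shape[1:] != encoder_skip.shape[1:]:
        raise ValueError("skip connection and upsampled map differ in size")
    return np.concatenate([upsampled, encoder_skip], axis=0)

def sigmoid_mask(logits, threshold=0.5):
    """Turn output logits into a binary mask (threshold is an assumption)."""
    probabilities = 1.0 / (1.0 + np.exp(-logits))
    return (probabilities >= threshold).astype(np.uint8)

deep = np.ones((256, 20, 20))    # coarse decoder input
skip = np.zeros((128, 40, 40))   # encoder map from the matching level
merged = decoder_merge(deep, skip)
print(merged.shape)  # (384, 40, 40)
```

The merged tensor carries both the upsampled semantic features and the fine-grained encoder features, which the subsequent convolutions then fuse.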

3.4 Training settings and machine configurations

In our experiments, all datasets were evaluated independently using standardized training and validation splits, without any cross-dataset merging. This strategy ensured that model performance was assessed within each dataset under consistent experimental conditions. All backbone variants were trained using identical hyperparameters including an image size of 640 × 640 pixels, a batch size of 16, and 500 training epochs. Stochastic Gradient Descent was employed as the optimizer with an initial learning rate of 0.01, a momentum factor of 0.937, and a weight decay of 0.0005. These hyperparameters were selected based on empirical training stability and established practices in remote sensing semantic segmentation, ensuring reliable convergence across datasets with varying spatial resolutions and sensor characteristics. This unified training protocol supports fair and reproducible comparison across all 29 U-Net–backbone combinations. All experiments were conducted in a Python 3.10.12 environment using PyTorch 2.0.1+cu117, and training was performed on a high-performance system equipped with dual NVIDIA GeForce RTX 4090 GPUs (CUDA:0 with 24,207 MiB and CUDA:1 with 24,210 MiB of memory).
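As a worked illustration of the optimizer settings, the sketch below applies SGD with momentum and weight decay to a single scalar parameter on a toy quadratic. It uses the common formulation in which weight decay is added to the gradient as an L2 term; whether the training framework used exactly this variant is an assumption, and the toy objective and the 500 update steps are illustrative and unrelated to the 500 training epochs.

```python
def sgd_momentum_step(param, grad, velocity,
                      lr=0.01, momentum=0.937, weight_decay=0.0005):
    """One SGD update for a scalar parameter.

    Returns (new_param, new_velocity). Weight decay is applied as an
    L2 term added to the gradient before the momentum update.
    """
    grad = grad + weight_decay * param
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x, v = 0.0, 0.0
for _ in range(500):
    x, v = sgd_momentum_step(x, 2.0 * (x - 3.0), v)

print(x)
```

Because of the weight-decay term, the iterate settles slightly below 3, at the point where 2(x − 3) + 0.0005x = 0, which shows how the decay pulls parameters toward zero.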

3.5 Evaluation criteria

To evaluate the performance of the proposed models, standard metrics are employed. These include precision, which estimates the ratio of correctly identified landslide pixels amongst all pixels classified as landslides by the model; recall, which signifies the proportion of actual landslide pixels that were correctly identified by the model; and F-score, which is the harmonic mean of precision and recall, offering a balanced assessment when there is a trade-off between false positives and false negatives (Li, 2025). We also estimated the IoU, which computes the overlap between the predicted landslide area and the ground truth, divided by their union (Li, 2025). Mathematically, they are given by Equations 1–4 (Ji et al., 2020; Liu et al., 2025).

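$$\text{Precision} = \frac{\text{True positive}}{\text{True positive} + \text{False positive}} \tag{1}$$

$$\text{Recall} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}} \tag{2}$$

$$\text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$

$$\text{IoU} = \frac{\text{True positive}}{\text{True positive} + \text{False positive} + \text{False negative}} \tag{4}$$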
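Equations 1–4 can be computed directly from a pair of binary masks, as in the following NumPy sketch. The function name and the convention of returning 0.0 when a denominator is zero are assumptions; the study does not state how empty masks were handled or whether metrics were averaged per tile or pooled over all pixels.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Precision, recall, F-score, and IoU for binary masks (1 = landslide)."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)

    tp = int(np.sum(pred & truth))     # landslide pixels correctly detected
    fp = int(np.sum(pred & ~truth))    # background predicted as landslide
    fn = int(np.sum(~pred & truth))    # landslide pixels that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "iou": iou}

truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0]])
pred = np.array([[1, 1, 1, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 0]])

metrics = segmentation_metrics(pred, truth)
print(metrics)
```

In this toy case there are three true positives, one false positive, and one false negative, giving precision = recall = F-score = 0.75 and IoU = 0.6. For a single confusion matrix the two overlap measures are linked by IoU = F / (2 − F), so they rank models identically when computed on pooled pixels.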

4 Experimental results

Here, the quantitative and qualitative results of the proposed U-Net architecture combined with diverse backbones are described in detail.

4.1 Quantitative findings

HRGLDD: Table 3 shows the quantitative results on the training set. Among all the U-Net and backbone combinations, U-Net + SE-ResNeXt_101 showed the best overall performance, attaining the highest F-score (0.9569) and Intersection over Union (IoU: 0.918), with precision and recall values above 0.95 and the lowest loss (0.0145). This indicates its superior capability in accurately detecting landslide areas while minimizing false positives and negatives. Other high-performing combinations include U-Net + SE-ResNeXt_50 (F: 0.9471, IoU: 0.9005), U-Net + MobileNet (F: 0.9408, IoU: 0.8892), and U-Net + EfficientNet_b6 (F: 0.9403, IoU: 0.888), all of which achieved robust accuracy and low losses, making them viable choices for practical applications. The EfficientNet family, especially variants b5 to b7, consistently showed strong performance, suggesting that their scalable and efficient design is well-suited for remote sensing-based segmentation tasks. Conversely, models such as U-Net + VGG_19 and U-Net + VGG_16 performed comparatively poorly, with F-scores under 0.78 and high losses (>0.07), underlining their limited ability to capture complex landslide features in high-resolution imagery. Likewise, traditional ResNet backbones like ResNet_34 and ResNet_152 delivered moderate performance, with F-scores of 0.7818 and 0.8239, respectively. Overall, the training results suggest that newer and deeper backbone architectures, especially those with attention mechanisms (e.g., SE blocks), markedly enhance U-Net’s segmentation capability. These findings demonstrate the importance of backbone selection in optimizing deep learning-based landslide detection frameworks across diverse terrains.

Table 3. Training Results of HRGLDD dataset.

The validation results of U-Net with several backbones on the HRGLDD dataset (Table 4) highlight each model’s performance on unseen data. Among the evaluated combinations, the models using SE-based attention backbones such as U-Net + SE-ResNeXt_50 (F-score: 0.6807, IoU: 0.5323) and U-Net + SE-ResNet_152 (F-score: 0.6736, IoU: 0.5223) demonstrated superior validation performance. These models produced comparatively high F-scores and maintained balanced precision and recall values, indicating robust predictive ability on unseen data. Likewise, U-Net + InceptionResNet_V2 and U-Net + EfficientNet_b7 achieved high F-scores (above 0.67) and decent IoU values, underlining their efficacy in capturing landslide features across complex environments. Conversely, backbones such as MobileNet_V2, ResNet_152, and ResNeXt_50 exhibited reduced performance, with F-scores below 0.57, suggesting possible underfitting and weaker generalization. Some models, such as U-Net + MobileNet, attained high precision (0.7413) but relatively low recall (0.5241), suggesting that they missed some landslide regions. Moreover, VGG_16 and VGG_19 performed competitively, with F-scores around 0.64, low loss, and comparatively high recall, suggesting their continued utility in certain high-resolution remote sensing tasks.

Table 4. Validation Results of HRGLDD dataset.

LMHLD: The training results of the U-Net architecture integrated with several backbone networks on the LMHLD dataset (Table 5) show strong overall performance across most configurations, emphasizing the adaptability and robustness of the architecture for landslide detection. The SE-based architectures, mainly U-Net + SE-ResNet_50 (F-score: 0.9627, IoU: 0.9283, Loss: 0.034) and U-Net + SeNet_154 (F-score: 0.962, IoU: 0.9269, Loss: 0.0347), achieve the best evaluation metrics, indicating a superior capability to learn landslide features with marginal error. Similarly, U-Net + SE-ResNet_101 and SE-ResNeXt_101 perform well, with F-scores above 0.95 and low loss values, confirming the strength of squeeze-and-excitation mechanisms in capturing channel-wise dependencies. Among the EfficientNet family, U-Net + EfficientNet_b7 (F: 0.9494, IoU: 0.9039) and EfficientNet_b5 (F: 0.9462, IoU: 0.8983) display high accuracy. The InceptionResNet_V2 and MobileNet models also produced competitive results, both exceeding an F-score of 0.945, indicating that these architectures can successfully capture complex landslide patterns. The ResNet variants showed varied performance: U-Net + ResNet_18 outperforms deeper variants such as ResNet_50 and ResNet_152, achieving an F-score of 0.9469. VGG_16 and VGG_19 also perform well, with F-scores of 0.9598 and 0.9587, respectively.

Table 5. Training Results of LMHLD dataset.

The validation results of the U-Net model with diverse backbone networks on the LMHLD dataset (Table 6) reveal steady and competitive performance across most architectures, although with lower metrics than in training, and indicate the generalization capability of each network. The EfficientNet variants, mainly EfficientNet_b5, b6, and b7, attained the highest F-scores (0.8379, 0.8375, and 0.8379, respectively) and IoU values above 0.723. Their precision and recall values also remained high (b5: P = 0.8283, R = 0.8518), suggesting balanced detection with few false positives and few missed landslides. Backbones such as SE-ResNeXt_101, SE-ResNeXt_50, and SE-ResNet_50 also performed competitively (F: ∼0.832–0.834), although SE-ResNet_101 and SE-ResNet_152 showed high losses (1.6 and 1.16), indicating overfitting during training despite decent F-scores. Architectures such as MobileNet, Inception_V3, and ResNet_34 produced competitive results with F-scores above 0.81, supporting their use in limited computational settings. ResNeXt_50 and VGG_16 also reported high loss values (1.30 and 0.43, respectively), suggesting inconsistencies between predictions and actual labels, probably due to the higher variability in landslide form across the dataset. In summary, EfficientNet_b5-b7 and SE-ResNeXt-based backbones delivered the best validation performance, highlighting their usefulness for generalization across the global and diverse landslide scenarios represented in the LMHLD dataset.
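The difference between training and validation scores can be made explicit as a simple generalization gap. The sketch below uses only F-scores quoted in this section; the helper name `generalization_gap` is hypothetical, and the gap is offered as an illustrative diagnostic rather than as a measure used in the study.

```python
# (training F-score, validation F-score) pairs quoted in the text.
reported = {
    ("HRGLDD", "SE-ResNeXt_50"): (0.9471, 0.6807),
    ("LMHLD", "EfficientNet_b5"): (0.9462, 0.8379),
    ("LMHLD", "EfficientNet_b7"): (0.9494, 0.8379),
}


def generalization_gap(train_f, val_f):
    """Absolute drop in F-score from training to validation."""
    return train_f - val_f


for (dataset, backbone), (train_f, val_f) in reported.items():
    gap = generalization_gap(train_f, val_f)
    print(f"{dataset:7s} {backbone:16s} gap = {gap:.4f}")
```

A larger gap suggests that a model fits the training patches more closely than it generalizes, which is consistent with the overfitting interpretation given for the high-loss SE-ResNet variants.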

Table 6. Validation Results of LMHLD dataset.

To indicate performance variability, we analyzed metric distributions across all 29 U-Net backbone models. On HRGLDD, F-scores ranged from 0.755 to 0.957 (mean ≈0.90, standard deviation (σ) ≈ 0.05) and IoU from 0.611 to 0.918 (mean ≈0.82, σ ≈ 0.07). On LMHLD, F-scores varied between 0.904 and 0.963 (mean ≈0.94, σ ≈ 0.02) and IoU between 0.826 and 0.928 (mean ≈0.89, σ ≈ 0.03). The consistent model ranking across datasets indicates that observed differences reflect backbone architecture effects rather than random variation.
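A summary of this kind can be computed with the Python standard library. The sketch below is illustrative only: it uses the seven HRGLDD training F-scores quoted in the article text rather than the full 29-model table, so its mean and standard deviation will not reproduce the values above. The function name `summarize` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics

# HRGLDD training F-scores quoted in the article text (a subset of the
# 29 models; the full table is not reproduced here).
quoted_f_scores = {
    "SE-ResNeXt_101": 0.9569,
    "SE-ResNeXt_50": 0.9471,
    "MobileNet": 0.9408,
    "EfficientNet_b6": 0.9403,
    "ResNet_152": 0.8239,
    "ResNet_34": 0.7818,
    "VGG_19": 0.7554,
}


def summarize(values):
    """Return range, mean and standard deviation of a metric."""
    values = list(values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # Population form (assumption); statistics.stdev gives the
        # sample form instead.
        "std": statistics.pstdev(values),
    }


summary = summarize(quoted_f_scores.values())
print({name: round(value, 3) for name, value in summary.items()})
```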

Moreover, we evaluated the computational complexity of all U-Net backbone configurations in terms of parameter count and giga floating-point operations (GFLOPs). MobileNet and MobileNetV2 backbones are the most computationally efficient, with fewer than ∼9 M parameters and GFLOPs per image below 57, making them suitable for resource-constrained or near–real-time landslide mapping. EfficientNet variants (B0–B2) provide favorable efficiency–accuracy trade-offs (10–14 M parameters, <66 GFLOPs per image), while larger variants (B3–B7) exhibit rapidly increasing complexity, reaching 75 M parameters and ∼166 GFLOPs per image. ResNet and SE-ResNet models (18–50) fall within a mid-range complexity regime (14–35 M parameters, 68–133 GFLOPs), whereas deeper versions (101/152) incur substantially higher computational costs (>190 GFLOPs) with marginal efficiency gains. ResNeXt and SE-ResNeXt architectures further enhance representational capacity through grouped convolutions and channel attention but demand higher resources (136–197 GFLOPs per image). Inception-based models (Inception-V3 and Inception-ResNet-V2) are computationally intensive due to multi-branch designs (>120 GFLOPs), emphasizing multi-scale feature extraction at increased cost. VGG-based U-Net variants exhibit very high FLOPs (309–377 GFLOPs) despite moderate parameter counts, limiting their practical scalability. SENet-154–based U-Net is the most computationally demanding (>122 M parameters, 409 GFLOPs per image), offering maximal capacity at significant memory and runtime expense.
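To make the two complexity measures concrete, the sketch below counts parameters and floating-point operations for a single 2-D convolution layer, the dominant building block of all the backbones considered. The function name `conv2d_cost`, the 256 × 256 feature-map size, and the channel counts are assumptions chosen for illustration. Counting conventions also differ between tools (some report multiply-accumulate operations, others count the multiply and the add separately), and the convention behind the figures above is not stated, so absolute numbers from this sketch should not be compared directly with them.

```python
def conv2d_cost(in_ch, out_ch, kernel, out_h, out_w, groups=1, bias=True):
    """Return (parameters, FLOPs) for one 2-D convolution layer.

    FLOPs are counted as 2 x multiply-accumulates, i.e. one multiply
    and one add per weight per output position (an assumed convention).
    """
    if in_ch % groups or out_ch % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output channel only sees in_ch / groups input channels.
    weights = (in_ch // groups) * out_ch * kernel * kernel
    params = weights + (out_ch if bias else 0)
    macs = weights * out_h * out_w
    return params, 2 * macs


# Standard 3x3 convolution, 64 -> 64 channels, 256 x 256 output.
dense_params, dense_flops = conv2d_cost(64, 64, 3, 256, 256)

# Same layer as a grouped convolution with 32 groups (ResNeXt-style).
grouped_params, grouped_flops = conv2d_cost(64, 64, 3, 256, 256, groups=32)

print(f"dense:   {dense_params} params, {dense_flops / 1e9:.2f} GFLOPs")
print(f"grouped: {grouped_params} params, {grouped_flops / 1e9:.3f} GFLOPs")
```

The grouped variant illustrates why grouped convolutions reduce the cost of an individual layer; ResNeXt-style networks typically reinvest that saving in wider layers, which is in line with the higher overall GFLOPs reported for them here.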

4.2 Qualitative findings

We conducted a detailed qualitative evaluation using visual outputs generated by the U-Net models, which were integrated with various backbone architectures. We first present the Grad-CAM visualizations (Figures 3a, 4a) to interpret the behavior of the proposed U-Net models with different backbone architectures. The warmer color regions (e.g., red and yellow hues) indicate areas with a higher contribution to landslide prediction, representing regions that strongly influence the model’s decision. Cooler colors represent lower influence and are commonly associated with background elements such as vegetation, stable terrain, or shadowed areas. This contrast demonstrates the model’s ability to localize discriminative landslide characteristics while suppressing misleading background information. On both the HRGLDD and LMHLD datasets, stronger backbone models exhibit more focused and spatially coherent activations over landslide regions, while suppressing irrelevant background elements such as vegetation, shadows, and terrain texture.
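As a reminder of how such heatmaps are obtained, the following minimal sketch implements the core Grad-CAM computation: channel weights are the spatial average of the gradients, and the map is the ReLU of the weighted sum of activations. It operates on plain nested lists so that it runs without a deep learning framework. The choice of target layer and of the scalar score being differentiated (for segmentation, for example, the summed landslide logits) is not specified in this section and is left as an assumption; the toy activations and gradients are hypothetical.

```python
def grad_cam(activations, gradients):
    """Compute a normalized Grad-CAM map.

    activations, gradients: lists of K feature maps, each an H x W
    nested list, taken from the same convolutional layer.
    """
    k = len(activations)
    h = len(activations[0])
    w = len(activations[0][0])
    # Channel importance: global average of the gradients per channel.
    weights = [sum(sum(row) for row in grad) / (h * w) for grad in gradients]
    # Weighted sum over channels followed by ReLU.
    cam = [[max(0.0, sum(weights[c] * activations[c][i][j] for c in range(k)))
            for j in range(w)]
           for i in range(h)]
    # Scale to [0, 1] so the map can be rendered as a heatmap.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Two hypothetical 2 x 2 feature maps: channel 0 supports the landslide
# class (positive gradients), channel 1 opposes it (negative gradients).
activations = [[[1.0, 0.0], [0.0, 0.0]],
               [[0.0, 0.0], [0.0, 1.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]],
             [[-1.0, -1.0], [-1.0, -1.0]]]

heatmap = grad_cam(activations, gradients)
print(heatmap)
```

In this toy case only the top-left cell stays warm, because the second channel has a negative weight and its contribution is removed by the ReLU.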

Figure 3. (a) Grad-CAM visualizations showing class-discriminative activation maps of selected U-Net backbone models on the HRGLDD dataset. (b) Visual outcomes on the HRGLDD dataset, showing binary detection results from the best-performing models of each backbone family.

Figure 4. (a) Grad-CAM visualizations showing class-discriminative activation maps of selected U-Net backbone models on the LMHLD dataset. (b) Visual outcomes on the LMHLD dataset, presenting binary detection results from the best-performing models of each backbone family.

Further, we present the corresponding binary segmentation maps, where white pixels (value = 1) signify detected landslides and black pixels (value = 0) indicate non-landslide (background) areas. This visual comparison is performed on two benchmark datasets: HRGLDD and LMHLD, as shown in Figures 3b, 4b, respectively. For both, these outputs illustrate the models’ ability to delineate landslides of varying sizes (small, medium, and large) as well as complex shapes.

On the HRGLDD dataset, strong backbone models (e.g., SE-ResNeXt_101) exhibit coherent and well-defined segmentation, accurately capturing landslides with highly variable shapes and sizes (small, medium, large, and complex) while retaining boundary precision. These models effectively suppress confusion arising from visually similar background features such as vegetation, exposed rocks, and shadowed slopes, whereas lower-performing backbones show occasional misclassification under such conditions. This highlights the influence of richer spatial contextual encoding on the model’s visual outputs. On the LMHLD dataset, where image characteristics vary widely due to multi-sensor, multi-resolution inputs and complex mountainous terrain, the qualitative trends remain consistent. High-performing backbones continue to delineate landslides clearly, even in heterogeneous environments. In contrast, weaker models demonstrate visible degradation in boundary sharpness and increased omission of small or fragmented landslide features. Hence, the qualitative results confirm that integrating advanced backbones into the U-Net framework advances the model’s ability to detect diverse landslide shapes across multifaceted terrains while minimizing background confusion, making it more suitable for real-world hazard mapping.

5 Discussion

In this section, we assess the real-world applicability and limitations of the proposed U-Net-based architectures. We investigate the performance across globally diverse datasets, explore model robustness in cross-regional scenarios, compare with existing benchmark studies, and identify directions for future improvement. The insights aim to inform the development of scalable, adaptable, and high-precision landslide detection frameworks for varied geomorphic and environmental conditions.

5.1 Global-scale evaluation under diverse landslide scenarios

To comprehensively assess the generalizability and robustness of the proposed U-Net architecture combined with various backbone networks, we expanded our experiments to include the GVLM (Global Very-High Resolution Landslide Mapping) dataset (Zhang et al., 2023). This dataset consists of 17 bitemporal very high-resolution (VHR) image pairs, each with a spatial resolution of 0.59 m, covering a total area of 163.77 km². It spans landslide sites across six continents (Asia, Africa, North America, South America, Europe, and Oceania), capturing a wide variety of geographies, climates, vegetation types, land cover groups, and disaster triggers, including rainfall, earthquakes, floods, snowmelt, and hurricanes. Its complex spatial patterns (shapes, sizes, and distributions), distinct spectral characteristics, geometric complexities, and temporal variations make GVLM a highly challenging benchmark for evaluating model generalization and transferability. By applying our models to this globally distributed dataset, we aimed to demonstrate their potential for cross-regional landslide mapping.

Among the evaluated models, U-Net + SE-ResNeXt_101 attained the highest F-score of 0.9776, with a notable IoU of 0.9646 and the lowest loss of 0.1055, representing superior global performance and spatial accuracy in detecting landslides across diverse geographic and climatic regions. It is closely followed by U-Net + ResNet_18 (F = 0.9598, IoU = 0.9227) and U-Net + MobileNet (F = 0.9395, IoU = 0.9376), indicating that lightweight and efficient architectures can also generalize well to heterogeneous global terrains. Moreover, models such as SE-ResNeXt_101, ResNeXt_50, and MobileNet balanced accuracy and efficiency and demonstrated high generalization potential with relatively low loss values. However, models based on EfficientNet exhibited varied performance: while EfficientNet_b3 and b0 achieved reasonably high F-scores (above 0.79), others such as EfficientNet_b4, b5, and b6 underperformed significantly, with b5 reporting an F-score as low as 0.5526. Similarly, VGG-based models yielded the lowest overall performance (VGG_16: F = 0.6124), likely due to their shallow architecture and limited feature extraction capacity in globally diverse terrain conditions.

Figure 5 presents radar charts of each metric for all the evaluated combinations. Models exhibiting simultaneously high F-score and IoU with low loss form broader and more balanced radar polygons, indicating stronger cross-regional generalizability and stable feature transfer to unfamiliar terrains. In contrast, models with narrow or irregular polygon shapes reflect inconsistent performance, typically caused by weak boundary agreement, reduced detection reliability, or higher prediction uncertainty on this geographically distinct dataset. By visualizing all backbones together, the radar charts clearly distinguish families such as SE-ResNeXt and ResNet variants, which maintain robust performance across all three metrics, from weaker backbones that struggle with spatial accuracy or produce higher errors. This multidimensional representation therefore highlights not only the absolute metric values but also the relative stability and adaptability of each architecture under global-scale domain shift. Overall, these findings confirm that integrating high-performing, scalable backbone networks with U-Net significantly enhances its cross-regional mapping capabilities, particularly in complex global scenarios involving diverse trigger mechanisms, topographic complexities, and land cover types.

Figure 5. Quantitative results on the GVLM Dataset (Cross-Regional Application): Radar chart representation of model performance using three metrics, F-score (a,d,g,j), IoU (b,e,h,k), and Loss (c,f,i,l). Explicitly: U-Net with EfficientNet variants (b_0 to b_7) (a–c), U-Net with SE-based backbones (SE-ResNet and SE-ResNeXt) (d–f), U-Net with ResNet variants (g–i), and U-Net with other architectures, including Inception, MobileNet, and VGG (j–l).

5.2 Multi-dataset evaluation under heterogeneous regional conditions

To further examine the adaptability and robustness of the proposed networks, we conducted experiments on four additional region-specific landslide datasets: Diverse Mountainous Landslide Dataset (DMLD) (Chen et al., 2024), Luding (Wang et al., 2023), Bijie (Ji et al., 2020), and the Nepal Landslide Detection dataset (NpLDD) (Bragagnolo et al., 2021). These datasets differ significantly in terms of geographic location, topographic complexity, satellite sources, and triggering mechanisms such as rainfall or earthquakes, offering unique testbeds to evaluate model performance under diverse environmental settings. The DMLD dataset, collected in Yunnan, southwest China, and the Luding dataset, from Sichuan Province, provide high-resolution imagery of landslide-prone regions characterized by steep terrain and dense vegetation. The Bijie dataset, covering Guizhou Province, encompasses a vast mountainous area with a diverse range of landslide types. The Nepal dataset represents the complex and highly dynamic Himalayan landscape, where landslides are frequently triggered by seismic activity and intense rainfall, making it a critical region for evaluating model performance under extreme geophysical conditions. This cross-dataset evaluation not only highlights the flexibility and adaptability of the deep learning frameworks but also provides insights into their real-world applicability for large-scale and multi-regional landslide mapping.

5.2.1 Results of DMLD

When assessed on the DMLD dataset, U-Net + SeNet_154 achieved the highest performance, with an F-score of 0.9591, IoU of 0.9217, and the lowest loss value of 0.0012, indicating remarkable segmentation capability (Table 7). Moreover, U-Net + MobileNet, U-Net + SE-ResNet_152, U-Net + SE-ResNeXt_101, and U-Net + EfficientNet_b7 yielded F-scores above 0.93, indicating the robustness of lightweight and attention-based backbones. The EfficientNet family, mainly versions b2, b7, and b0, also showed consistently high performance, reflecting the power of compound scaling in deep networks. Some moderate performances were noted, such as U-Net + InceptionResNet_V2 and U-Net + ResNeXt_101 (F-score around 0.93). The ResNet_101 and ResNet_152 combinations delivered only moderate results (F-scores between 0.70 and 0.73). Additionally, U-Net + SE-ResNet_50 and U-Net + ResNet_50 underperformed, likely due to overfitting, as indicated by their relatively low IoU values and high loss metrics.

Table 7. Results of the DMLD dataset.

5.2.2 Results of Luding

Among all models, U-Net + SE-ResNet_152 emerged as the best performer, achieving the highest F-score of 0.9819, IoU of 0.9644, and the lowest loss of 0.0102, demonstrating excellent detection ability (Table 8). Other high-performing models include SE-ResNet_101, SE-ResNet_18, and SeNet_154, all with F-scores exceeding 0.97, indicating the strength of Squeeze-and-Excitation (SE) based networks in enhancing spatial and channel-wise feature representation. MobileNet also performed remarkably well (F-Score = 0.9712), offering a lightweight alternative with competitive accuracy. In contrast, deeper ResNet variants like ResNet_50 and classical architectures like VGG_16 and VGG_19 showed lower performance, with VGG_19 achieving the weakest results (F-Score = 0.7241, IoU = 0.5684, Loss = 0.1582), likely due to their limited ability to capture fine-grained contextual features in complex mountainous terrain. Overall, the results reinforce the advantage of modern backbone networks with attention and residual mechanisms, particularly SE-based models, in improving landslide detection performance in high-resolution remote sensing imagery like that of the Luding dataset.

Table 8. Results of the Luding dataset.

5.2.3 Results of Bijie

Among all configurations, the U-Net + InceptionResNet_V2 model shows the highest F-score of 0.9868, IoU of 0.974, and the lowest loss of 0.0042, reflecting near-perfect prediction accuracy (Table 9). It is closely followed by models using SE-based ResNet and ResNeXt backbones, such as SE-ResNet_101 (F = 0.9829), SE-ResNet_18 (F = 0.9841), and SE-ResNeXt_101 (F = 0.9843), all showing exceptional precision and recall with minimal training loss. Other well-performing models include ResNet_34 (F = 0.9853) and EfficientNet_b6 (F = 0.9785), which also demonstrated an excellent balance between accuracy and generalization. In contrast, models with VGG backbones (VGG_16 and VGG_19) underperformed (with F-scores below 0.62), suggesting their limited ability to handle the spatial complexity of the Bijie terrain.

Table 9. Results of the Bijie dataset.

5.2.4 Results of NpLDD

Considering NpLDD, the top-performing models included U-Net + SE-ResNet_152 (F: 0.961), SE-ResNeXt_50 (F: 0.96), SE-ResNet_101 (F: 0.9565), SE-ResNeXt_101 (F: 0.9566), SeNet_154 (F: 0.9558), and MobileNet (F: 0.9549), as given in Table 10. These architectures consistently achieved high IoU values (above 0.91), extremely low loss (around 0.001), and balanced precision–recall trade-offs (typically above 0.95). These results emphasize their strong segmentation capability and reliability in delineating landslide boundaries, despite the challenging topographic variation and vegetation interference in Nepal. The EfficientNet family also produced competitive results, particularly b0 through b5, all yielding F-scores between 0.94 and 0.95 with a smooth precision–recall balance. In contrast, InceptionResNet_V2 (F: 0.8152) and ResNet_34 (F: 0.8183) underperformed, indicating less effective feature learning or generalization for this specific geography. Traditional VGG-based backbones also exhibited a drop in performance: U-Net + VGG_19 and VGG_16 produced the lowest F-scores and high loss values, suggesting a limited capacity to handle the spatial complexity and fine-grained features of landslides in this dataset.

Table 10. Results of the NpLDD dataset.

5.2.5 Key insights and implications

SE-based backbones, particularly SENet and SE-ResNeXt variants, outperform other models due to their channel-attention mechanism, which adaptively recalibrates feature responses to emphasize landslide-relevant spectral and textural cues while suppressing confusing background information. This is crucial for landslide detection, where soil, moisture variations, and sharp morphological boundaries often exhibit subtle contrasts with surrounding vegetation, rock, or shadows. When coupled with ResNeXt’s grouped-convolution (multi-branch) feature learning, the squeeze-and-excitation modules enhance discriminative feature representation and boundary precision, leading to higher robustness and consistently improved F-score and IoU across diverse terrains and sensor conditions.

The cross-regional and cross-dataset experiments provide insights directly transferable to real-world hazard response situations. For instance, the strong and consistent performance of SE-based U-Net models across GVLM and multiple regional datasets indicates that these architectures can support rapid post-disaster mapping when new satellite imagery becomes available after major earthquakes or extreme rainfall events. In such scenarios, the models can generate binary landslide masks without region-specific retraining, enabling emergency agencies to quickly locate newly triggered landslides, prioritize field inspections, assess blocked transport corridors, and guide relief operations. Thus, although the study is centered on global-scale benchmarking, the demonstrated robustness across heterogeneous terrains and sensor conditions highlights the practical utility of these models for real-world, time-critical landslide hazard assessment.
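The channel recalibration performed by a squeeze-and-excitation module can be illustrated with a small, framework-free sketch: global average pooling (squeeze), two fully connected layers with ReLU and sigmoid (excitation), and channel-wise rescaling. The function name `squeeze_excite`, the toy feature maps, and the hand-picked weights are hypothetical; in a trained backbone the weights are learned, and biases (omitted here for brevity) are normally included.

```python
import math


def squeeze_excite(feature_maps, w1, w2):
    """Apply a squeeze-and-excitation step to C feature maps.

    feature_maps: list of C maps, each an H x W nested list.
    w1: (C / r) x C weight matrix of the reduction layer.
    w2: C x (C / r) weight matrix of the expansion layer.
    Returns the per-channel gates and the rescaled feature maps.
    """
    # Squeeze: one descriptor per channel via global average pooling.
    squeezed = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
                for fm in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * z for w, z in zip(row, squeezed)))
              for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[gate * value for value in row] for row in fm]
              for fm, gate in zip(feature_maps, gates)]
    return gates, scaled


# Two hypothetical 2 x 2 channels and hand-picked weights (illustration).
feature_maps = [[[2.0, 2.0], [2.0, 2.0]],
                [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 1.0]]        # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]     # expands back to 2 channel gates

gates, scaled = squeeze_excite(feature_maps, w1, w2)
print([round(g, 4) for g in gates])
```

With these weights the first channel receives a gate close to 1 and the second a gate close to 0, which is the mechanism by which informative channels are emphasized and less useful ones are suppressed.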

5.3 Comparative analysis with benchmark studies

The previous work by Meena et al. (2022a) and Liu et al. (2025) played a foundational role by developing the HRGLDD and LMHLD datasets, respectively, across diverse global regions and by creating initial performance benchmarks using several U-Net-based architectures. Their primary focus was to validate and highlight the value of their newly curated datasets through the application of deep learning models. In contrast, our study builds upon this foundation by exploring a broader range of powerful backbone networks integrated with the U-Net architecture. A direct comparison between our best- and lowest-performing models and those from the previous study (Meena et al., 2022a) highlights substantial improvements. Our best-performing model, U-Net + SE-ResNeXt_101, attained an F-score of 0.9569 on the HRGLDD dataset. In contrast, the best model from the previous study (U-Net) reported an F-score of 0.7904 (Meena et al., 2022a). This corresponds to an absolute improvement of 0.1665, or a relative improvement of approximately 21.07%, showing the substantial gains brought by integrating a powerful backbone such as SE-ResNeXt_101. Moreover, our lowest-performing model, U-Net + VGG_19, achieved an F-score of 0.7554, which still exceeded that of the lowest-performing model from the previous study (Attn-res-U-Net, with an F-score of 0.6477) (Meena et al., 2022a). This indicates an absolute gain of 0.1077, or a relative improvement of around 16.63%, even at the lower end of model performance.

For the LMHLD dataset, our best-performing model, U-Net + SE-ResNet_50, achieved an F-score of 0.9627. This marks an absolute improvement of 0.0929 over the previous best (Trans U-Net, F-score: 0.8698) (Liu et al., 2025), corresponding to a relative improvement of approximately 10.68%. Furthermore, our lowest-performing model, U-Net + ResNet_152, achieved an F-score of 0.9042, which still surpassed the lowest-performing model in the earlier study (Liu et al., 2025), Res U-Net (F-score: 0.8442), yielding an absolute gain of 0.06, or a relative improvement of about 7.11%. These improvements and comparisons demonstrate the effectiveness of the backbone architecture in augmenting landslide detection performance using the U-Net architecture. Moreover, we also emphasize the potential for further enhancing landslide detection performance using the HRGLDD and LMHLD datasets, thereby fulfilling the original intent of the benchmark studies to drive continued model improvement.
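The absolute and relative improvements quoted in this comparison follow from simple arithmetic, which the short sketch below reproduces to within rounding from the F-scores given in the text. The function name `improvement` and the dictionary labels are hypothetical.

```python
def improvement(new_score, old_score):
    """Return (absolute gain, relative gain in percent)."""
    absolute = new_score - old_score
    relative_pct = 100.0 * absolute / old_score
    return absolute, relative_pct


# (this study, benchmark study) F-scores quoted in the text.
comparisons = {
    "HRGLDD best (SE-ResNeXt_101 vs U-Net)": (0.9569, 0.7904),
    "HRGLDD lowest (VGG_19 vs Attn-res-U-Net)": (0.7554, 0.6477),
    "LMHLD best (SE-ResNet_50 vs Trans U-Net)": (0.9627, 0.8698),
    "LMHLD lowest (ResNet_152 vs Res U-Net)": (0.9042, 0.8442),
}

for label, (ours, theirs) in comparisons.items():
    absolute, relative_pct = improvement(ours, theirs)
    print(f"{label}: +{absolute:.4f} absolute, +{relative_pct:.2f}% relative")
```

The relative figure is expressed with respect to the earlier score, so the same absolute gain counts for more when the baseline is lower.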

5.4 Limitations and future scope

While our study explored a wide array of backbone architectures integrated with the U-Net model for landslide detection across multiple regional datasets, some limitations persist, offering valuable avenues for future research. A key limitation is the dependency on high-quality, expertly annotated datasets, which remain region-specific, uneven in interpretation quality, and labor-intensive to produce. This constraint restricts large-scale operational deployment, especially in remote and data-scarce regions where timely mapping is most needed. In applied early-warning settings, additional challenges may arise from data latency, cloud cover, limited temporal continuity, and variations across sensor types, which can affect the reliability of timely landslide detection (Zhang B. et al., 2024).

To address these challenges, future research should emphasize multi-temporal data integration, where sequential satellite observations (e.g., pre- and post-event imagery or time-series optical/SAR data) can enable change detection and progressive slope instability monitoring, thereby supporting near–real-time early warning and rapid post-disaster response. In addition, semi-supervised and weakly supervised learning frameworks can leverage large volumes of unlabeled or sparsely labeled remote sensing data, reducing dependence on high-quality manual annotations that are difficult to obtain during emergencies. Such approaches can facilitate faster model adaptation to newly affected regions and improve the scalability of early warning systems. While the current study concentrates on optical satellite imagery and assesses a range of backbone architectures within the U-Net framework across diverse geographic areas, future research should explore the integration of multi-source remote sensing datasets. Such integration will support the development of more comprehensive landslide early warning and monitoring systems. Concerning practical deployment, some deeper and more complex backbones may pose challenges in resource-constrained environments due to high computational demands; model optimization techniques can therefore be applied. Overall, advancing toward real-time, interpretable, and generalizable landslide detection systems requires continued exploration of robust networks, adaptive learning strategies, and scalable implementation methods appropriate for diverse global locations.

6 Conclusion

This research work systematically evaluated 29 hybrid U-Net models integrating different backbone architectures for landslide detection across diverse spatial and environmental contexts. Through experimentation on multiple regional and global datasets, we established a robust benchmarking framework to evaluate architectural adaptability, feature extraction efficiency, and cross-regional generalizability in high-resolution remote sensing scenarios. The findings demonstrate that the architectural design of the encoder strongly influences the model’s capacity to characterize complex geomorphological patterns, mainly in heterogeneous terrain influenced by varied landslide triggers. Backbone-integrated U-Nets showed improved delineation of landslide boundaries, reduced contextual confusion, and stable performance across unseen geographical domains. The consistent performance across global datasets supports the operational potential of these models for geospatial hazard assessment. This work establishes a scalable foundation for developing high-fidelity landslide segmentation pipelines and provides a transferable architecture benchmarking methodology applicable to broader remote sensing-based environmental monitoring tasks.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://doi.org/10.5281/zenodo.7189381, https://zenodo.org/records/11424988, https://github.com/zxk688/GVLM, https://github.com/RS-CSU/DMLD-Dataset, https://pan.cdut.edu.cn/link/B007C24A04BAC995CC7D782DE0483C8F, https://gpcv.whu.edu.cn/data/Bijie_pages.html, https://zenodo.org/records/3675410.

Author contributions

NC: Conceptualization, Data curation, Formal Analysis, Funding acquisition, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review and editing. HV: Conceptualization, Data curation, Methodology, Validation, Visualization, Writing – review and editing. KA: Methodology, Writing – review and editing. SM: Conceptualization, Formal Analysis, Funding acquisition, Visualization, Writing – review and editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This research work is supported by the Department of Science and Technology, Science and Engineering Research Board, New Delhi, India, under Grant No: EEQ/2022/000812. Open Access funding provided by Università degli Studi di Padova | University of Padua, Open Science Committee.

Acknowledgements

The authors would like to thank the director of the Wadia Institute of Himalayan Geology, Dehradun, for his continuous encouragement and motivation. The manuscript’s contribution number is WIHG/0438.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Bragagnolo, L., Rezende, L. R., Da Silva, R. V., and Grzybowski, J. M. V. (2021). Convolutional neural networks applied to semantic segmentation of landslide scars. CATENA 201, 105189. doi:10.1016/j.catena.2021.105189

Chandra, N., and Vaidya, H. (2024). Automated detection of landslide events from multi-source remote sensing imagery: performance evaluation and analysis of YOLO algorithms. J. Earth Syst. Sci. 133 (3), 127. doi:10.1007/s12040-024-02327-x

CrossRef Full Text | Google Scholar

Chandra, N., Vaidya, H., Satyam, N., Tang, X., Singh, S., and Meena, S. R. (2025). A novel multi-layer attention boosted YOLOv10 network for landslide mapping using remote sensing data. Trans. GIS 29 (2), e70023. doi:10.1111/tgis.70023

CrossRef Full Text | Google Scholar

Chen, H., He, Y., Zhang, L., Yao, S., Yang, W., Fang, Y., et al. (2023). A landslide extraction method of channel attention mechanism U-Net network based on Sentinel-2A remote sensing images. Int. J. Digital Earth 16 (1), 552–577. doi:10.1080/17538947.2023.2177359

CrossRef Full Text | Google Scholar

Chen, J., Zeng, X., Zhu, J., Guo, Y., Hong, L., Deng, M., et al. (2024). The diverse Mountainous Landslide dataset (DMLD): a high-resolution remote sensing landslide dataset in diverse mountainous regions. Remote Sens. 16 (11), 1886. doi:10.3390/rs16111886

CrossRef Full Text | Google Scholar

Devara, M., Maurya, V. K., and Dwivedi, R. (2024). Landslide extraction using a novel empirical method and binary semantic segmentation U-NET framework using sentinel-2 imagery. Remote Sens. Lett. 15 (3), 326–338. doi:10.1080/2150704X.2024.2320178

CrossRef Full Text | Google Scholar

Ghorbanzadeh, O., Crivellari, A., Ghamisi, P., Shahabi, H., and Blaschke, T. (2021). A comprehensive transferability evaluation of U-Net and ResU-Net for landslide detection from Sentinel-2 data (case study areas from Taiwan, China, and Japan). Sci. Rep. 11 (1), 14629. doi:10.1038/s41598-021-94190-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Ghorbanzadeh, O., Shahabi, H., Crivellari, A., Homayouni, S., Blaschke, T., and Ghamisi, P. (2022). Landslide detection using deep learning and object-based image analysis. Landslides 19 (4), 929–939. doi:10.1007/s10346-021-01843-x

CrossRef Full Text | Google Scholar

Han, Z., Fang, Z., Li, Y., and Fu, B. (2023). A novel Dynahead-Yolo neural network for the detection of landslides with variable proportions using remote sensing images. Front. Earth Sci. 10, 1077153.

CrossRef Full Text | Google Scholar

He, L., Zhou, Y., Liu, L., Zhang, Y., and Ma, J. (2025). Application of the YOLOv11-seg algorithm for AI-based landslide detection and recognition. Sci. Rep. 15 (1), 12421. doi:10.1038/s41598-025-95959-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Ji, S., Yu, D., Shen, C., Li, W., and Xu, Q. (2020). Landslide detection from an open satellite imagery and digital elevation model dataset using attention boosted convolutional neural networks. Landslides 17 (6), 1337–1352. doi:10.1007/s10346-020-01353-2

CrossRef Full Text | Google Scholar

Kaushal, A., Gupta, A. K., and Sehgal, V. K. (2024). A semantic segmentation framework with U-Net-pyramid for landslide prediction using remote sensing data. Sci. Rep. 14 (1), 30071. doi:10.1038/s41598-024-79266-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Keyport, R. N., Oommen, T., Martha, T. R., Sajinkumar, K. S., and Gierke, J. S. (2018). A comparative analysis of pixel- and object-based detection of landslides from very high-resolution images. Int. J. Appl. Earth Observation Geoinformation 64, 1–11. doi:10.1016/j.jag.2017.08.015

CrossRef Full Text | Google Scholar

Li, Y. (2025). The research on landslide detection in remote sensing images based on improved DeepLabv3+ method. Sci. Rep. 15 (1), 7957. doi:10.1038/s41598-025-92822-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Lin, H., Li, L., Qiang, Y., Xu, X., Liang, S., Chen, T., et al. (2024). A method for landslide identification and detection in high-precision aerial imagery: progressive CBAM-U-net model. Earth Sci. Inf. 17 (6), 5487–5498. doi:10.1007/s12145-024-01465-6

CrossRef Full Text | Google Scholar

Liu, X., Xu, L., and Zhang, J. (2024). Landslide detection with Mask R-CNN using complex background enhancement based on multi-scale samples. Geomatics, Nat. Hazards Risk 15 (1), 2300823. doi:10.1080/19475705.2023.2300823

CrossRef Full Text | Google Scholar

Liu, G., Wang, Y., Chen, X., Du, B., Li, P., Wu, Y., et al. (2025). LMHLD: a large-scale multi-source high-resolution landslide dataset for landslide detection based on deep learning. arXiv. doi:10.48550/ARXIV.2502.19866

CrossRef Full Text | Google Scholar

Lu, P., Stumpf, A., Kerle, N., and Casagli, N. (2011). Object-Oriented change detection for landslide rapid mapping. IEEE Geosci. Remote Sens. Lett. 8, 701–705. doi:10.1109/LGRS.2010.2101045

CrossRef Full Text | Google Scholar

Ma, Z., and Mei, G. (2021). Deep learning for geological hazards analysis: data, models, applications, and opportunities. Earth-Science Rev. 223, 103858. doi:10.1016/j.earscirev.2021.103858

CrossRef Full Text | Google Scholar

Ma, R., Yu, H., Liu, X., Yuan, X., Geng, T., and Li, P. (2025). InSAR-YOLOv8 for wide-area landslide detection in InSAR measurements. Sci. Rep. 15 (1), 1595. doi:10.1038/s41598-024-84626-3

PubMed Abstract | CrossRef Full Text | Google Scholar

Meena, S. R., Nava, L., Bhuyan, K., Puliero, S., Soares, L. P., Dias, H. C., et al. (2022a). HR-GLDD: a globally distributed dataset using generalized DL for rapid landslide mapping on HR satellite imagery. ESSD – Land/Geology Geochemistry. doi:10.5194/essd-2022-350

CrossRef Full Text | Google Scholar

Meena, S. R., Soares, L. P., Grohmann, C. H., van Westen, C., Bhuyan, K., Singh, R. P., et al. (2022b). Landslide detection in the Himalayas using machine learning algorithms and U-Net. Landslides 19 (5), 1209–1229. doi:10.1007/s10346-022-01861-3

CrossRef Full Text | Google Scholar

Minaee, S., Boykov, Y. Y., Porikli, F., Plaza, A. J., Kehtarnavaz, N., and Terzopoulos, D. (2021). Image segmentation Using Deep Learning: a Survey. IEEE Trans. Pattern Analysis Mach. Intell. 43, 1. doi:10.1109/TPAMI.2021.3059968

PubMed Abstract | CrossRef Full Text | Google Scholar

Piralilou, S. T., Blaschke, T., and Ghorbanzadeh, O. (2019). An integrated approach of machine-learning models and Dempster-Shafer Theory for Landslide Detection. doi:10.13140/RG.2.2.33724.18567

CrossRef Full Text | Google Scholar

Ren, X., Wu, X., Zhai, D., Wang, X., He, N., and Tarif, M. (2025). ResM-FusionNet for efficient landslide detection algorithm with a hybrid architecture. Sci. Rep. 15 (1), 13080. doi:10.1038/s41598-025-98230-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Ronneberger, O., Fischer, P., and Brox, T. (2009). “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention (Cham: Springer international publishing), 234–241.

CrossRef Full Text | Google Scholar

Tehrani, F. S., Calvello, M., Liu, Z., Zhang, L., and Lacasse, S. (2022). Machine learning and landslide studies: recent advances and applications. Nat. Hazards 114, 1197–1245. doi:10.1007/s11069-022-05423-7

CrossRef Full Text | Google Scholar

Ullo, S., Mohan, A., Sebastianelli, A., Ahamed, S., Kumar, B., Dwivedi, R., et al. (2021). A new mask R-CNN-Based method for improved landslide detection. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 14, 3799–3810. doi:10.1109/JSTARS.2021.3064981

CrossRef Full Text | Google Scholar

Wang, H., Liu, J., Zeng, S., Xiao, K., Yang, D., Yao, G., et al. (2023). A novel landslide identification method for multi-scale and complex background region based on multi-model fusion: YOLO + U-Net. Landslides 21, 901–917. doi:10.1007/s10346-023-02184-7

CrossRef Full Text | Google Scholar

Wei, R., Ye, C., Sui, T., Zhang, H., Ge, Y., and Li, Y. (2023). A feature enhancement framework for landslide detection. Int. J. Appl. Earth Observation Geoinformation 124, 103521. doi:10.1016/j.jag.2023.103521

CrossRef Full Text | Google Scholar

Yang, S., Wang, Y., Zhao, K., Liu, X., Mu, J., and Zhao, X. (2025). Partial convolution-simple attention mechanism-SegFormer: an accurate and robust model for landslide identification. Eng. Appl. Artif. Intell. 151, 110612. doi:10.1016/j.engappai.2025.110612

CrossRef Full Text | Google Scholar

Zeng, D., Liao, M., Tavakolian, M., Guo, Y., Zhou, B., Hu, D., et al. (2021). Deep Learning for Scene Classification: a Survey (Version 2). arXiv. doi:10.48550/ARXIV.2101.10531

CrossRef Full Text | Google Scholar

Zhang, X., Yu, W., Pun, M.-O., and Shi, W. (2023). Cross-domain landslide mapping from large-scale remote sensing images using prototype-guided domain-aware progressive representation learning. ISPRS J. Photogrammetry Remote Sens. 197, 1–17. doi:10.1016/j.isprsjprs.2023.01.018

CrossRef Full Text | Google Scholar

Zhang, B., Tang, J., Huan, Y., Song, L., Shah, S. Y. A., and Wang, L. (2024). Multi-scale convolutional neural networks (CNNs) for landslide inventory mapping from remote sensing imagery and landslide susceptibility mapping (LSM). Geomatics, Nat. Hazards Risk 15 (1), 2383309. doi:10.1080/19475705.2024.2383309

CrossRef Full Text | Google Scholar

Zhao, Z.-Q., Zheng, P., Xu, S., and Wu, X. (2019). Object detection with deep learning: a review (arXiv:1807.05511). arXiv. doi:10.48550/arXiv.1807.05511

CrossRef Full Text | Google Scholar

Keywords: backbone networks, deep learning, landslides, remote sensing, semantic segmentation

Citation: Chandra N, Vaidya H, Abhinav K and Meena SR (2026) Global landslide mapping using U-Net architecture with diverse backbones across multi-regional and multi-sensor remote sensing datasets. Front. Earth Sci. 13:1710586. doi: 10.3389/feart.2025.1710586

Received: 22 September 2025; Accepted: 02 December 2025;
Published: 12 January 2026.

Edited by:

Chong Xu, Ministry of Emergency Management, China

Reviewed by:

Lifang Wang, Hunan Vocational College of Engineering, China
Guo Haojia, Lanzhou University, China

Copyright © 2026 Chandra, Vaidya, Abhinav and Meena. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Sansar Raj Meena, sansarraj.meena@unipd.it

ORCID: Naveen Chandra, orcid.org/0000-0002-0957-097X; Himadri Vaidya, orcid.org/0009-0003-5780-0041; Kumar Abhinav, orcid.org/0009-0008-3732-2229; Sansar Raj Meena, orcid.org/0000-0001-6175-6491
