ORIGINAL RESEARCH article

Front. Environ. Sci., 21 January 2026

Sec. Environmental Informatics and Remote Sensing

Volume 13 - 2025 | https://doi.org/10.3389/fenvs.2025.1657984

This article is part of the Research Topic: New Artificial Intelligence Methods for Remote Sensing Monitoring of Coastal Cities and Environment

Segmentation of Arctic coastal shoreline and bluff edges using optical satellite imagery and deep learning

  • 1Department of Computer Science, University of Texas at El Paso, El Paso, TX, United States
  • 2Biological Sciences Department, University of Texas at El Paso, El Paso, TX, United States
  • 3Environmental Science and Engineering Program, University of Texas at El Paso, El Paso, TX, United States

The Arctic coastline spans multiple countries, supports Indigenous livelihoods, and plays a vital role in the Arctic system. Rapid climate change is accelerating permafrost thaw, sea-level rise, and coastal erosion, underscoring the need for decision making to be informed by accurate delineation of tundra shoreline (instantaneous water line) and bluff edge (vegetation–slope boundary) position and change trends. To address this need, we compared two image segmentation approaches for mapping Arctic land and water interfaces from high-resolution satellite imagery: (1) U-Net, a supervised convolutional neural network trained on expert-annotated scenes, and (2) Differentiable Feature Clustering (DifFeat), an unsupervised model applied in a minimally supervised manner via expert-guided cluster selection. The shoreline and bluff edge boundaries were derived from the segmented land and water masks using an automated interface extraction approach. DifFeat achieved higher segmentation accuracy, with IoU values of 0.95 (water) and 0.92 (land), compared to U-Net’s 0.58 and 0.50, respectively. U-Net produced reliable results and benefited from infrared and vegetation spectral indices, but required extensive annotation and showed limited generalization to UAV imagery. DifFeat achieved superior results without manual annotation, reducing the dependence on labeled data and completing training 99.87% faster than U-Net. These findings highlight the complementary strengths of supervised and minimally supervised models for Arctic coastal mapping, with DifFeat offering a scalable, label-efficient solution for long-term coastal-change monitoring. Future work will integrate elevation data to further improve bluff edge feature detection.

1 Introduction

During the past 40 years, the Arctic has warmed nearly four times faster than the global average (Rantanen et al., 2022). Much of the Arctic coastal zone is underlain by permafrost, ground that remains frozen for two or more years (National Snow and Ice Data Center, 2023), which stabilizes landscapes and stores large amounts of carbon (Schuur et al., 2015). As permafrost thaws, it disrupts the Arctic carbon balance and reduces the ocean’s ability to absorb atmospheric carbon dioxide, amplifying global climate feedbacks (Jones et al., 2018; Nielsen et al., 2024). Thawing also destabilizes the ice-rich coastal bluffs, triggering rapid erosion and land loss. Erosion rates have doubled over the past half century and are projected to increase two to three times by 2100 (Nielsen et al., 2022). This process threatens local infrastructure, cultural heritage, and Indigenous subsistence practices by reshaping ecosystems and disrupting food webs (Cassidy et al., 2024; Nielsen et al., 2024; Fritz et al., 2017; Juma et al., 2025). Figure 1C shows an example of bluff collapse, where entire large sections of the coastal landscape can erode into the ocean. When combined with sea level rise and permafrost subsidence, total land loss can be six to eight times greater than erosion alone (Creel et al., 2024). However, long-term understanding of Arctic coastal change remains limited: approximately 86% of the Arctic coastline lacks positional data (i.e., spatial and temporal information of landscape features) (Jones et al., 2018). As erosion accelerates and social vulnerability increases, so does the need for scalable and automated methods to monitor Arctic coastal change (Nielsen et al., 2022; Irrgang et al., 2022).

Figure 1. (A) Base map showing WorldView-2 imagery captured in July 2010, highlighting the approximate 11 km-long shoreline study area (outlined in purple) adjacent to Elson Lagoon, near the Iñupiat village of Utqiaġvik on the North Slope of Alaska. (B) A UAV orthoimage highlights a coastal section with manually annotated shoreline (blue) and bluff edge (orange) lines used as reference. (C) Ground-based image showing a heavily eroded coastal tundra section along Elson Lagoon.

Before the advent of deep learning, methods to identify coastal features traditionally relied on field surveys, manual mapping, or digitization in Geographic Information Systems (GIS). These approaches are resource-intensive, prone to human error, and particularly difficult to scale in remote Arctic regions where accessibility and field conditions are challenging (Jones et al., 2018; Irrgang et al., 2022). In recent years, high-resolution satellite imagery has become more accessible through improvements in spatial resolution, revisit frequency, and the expansion of open-access platforms such as Landsat and Sentinel (Gabarró et al., 2023; Beamish et al., 2020; Wenzl et al., 2024). However, imagery alone does not yield detailed geospatial features without automated analytical techniques. Machine learning approaches have been increasingly applied to classify land and water regions and to identify shoreline positions from imagery (Bengoufa et al., 2021). Models such as Random Forests (RF) (Aryal et al., 2021; McAllister et al., 2022; Bengoufa et al., 2021), Support Vector Machines (SVM) (McAllister et al., 2022; Bengoufa et al., 2021), Decision Trees, K-Nearest Neighbors (KNN) (McAllister et al., 2022), and XGBoost (Efimova et al., 2020) have been applied in this context. Although effective for broad land–water classification, these models often lack the spatial precision needed to delineate complex Arctic coastlines or capture subtle geomorphic transitions. To overcome these limitations, deep learning approaches leverage spatial context and hierarchical feature extraction to provide improved boundary accuracy and generalization for large-scale Arctic coastal mapping (Scala et al., 2024; Ronneberger et al., 2015).

Deep learning methods have shown strong performance in land–water segmentation and shoreline mapping, outperforming traditional machine learning approaches in spatial precision, robustness to scene variability, and generalization across diverse landscapes (Aryal et al., 2021; Clark et al., 2022). Among these, convolutional neural networks (CNNs), particularly U-Net models (Aryal et al., 2021; Philipp et al., 2022; Yang et al., 2020; Heidler et al., 2021; Dang et al., 2022; Park and Song, 2024), have proven effective at capturing both fine spatial detail and broader contextual patterns, making them well suited for delineating land–water interfaces even in settings with limited or partially annotated training data (Aryal et al., 2021). However, most U-Net–based shoreline studies have been developed and validated on temperate or tropical coasts, with limited adaptation to Arctic environments. Applying such models in the Arctic presents additional challenges, including short ice-free seasons that limit the availability of optical imagery, persistent cloud cover that reduces the number of usable satellite scenes (Nielsen et al., 2022; Bartsch et al., 2020), and complex permafrost-related geomorphologies such as polygonal tundra, thaw lake basins, and ice wedge networks that can introduce spectral and textural noise in classifications (Bartsch et al., 2020; Nielsen et al., 2022; Irrgang et al., 2022). Although pixel-wise segmentation of land and water enables accurate shoreline detection, Arctic coastal monitoring also requires delineation of bluff edge boundaries, an equally important but often overlooked component for quantifying land loss and erosion dynamics (Jones et al., 2018). In this study, the shoreline is defined as the instantaneous water line, and the bluff edge as the transition from vegetated tundra to a non-vegetated slope or cliff, as shown in Figure 1B (blue: shoreline; orange: bluff edge).

Although supervised deep learning models like U-Net (Ronneberger et al., 2015) have demonstrated success in semantic segmentation tasks using partially labeled data, their application to Arctic land–water segmentation and boundary extraction remains limited. Performing detailed land and water segmentation that can support accurate shoreline and bluff edge delineation typically requires large, high-quality annotated datasets, an effort that is time-consuming and difficult to scale in the Arctic. In contrast, unsupervised or minimally supervised deep learning approaches for Arctic coastal mapping remain largely unexplored, despite their potential to reduce reliance on labeled data. Recent foundation models such as the Segment Anything Model (SAM) (Kirillov et al., 2023) demonstrate zero-shot segmentation without task-specific labels, but were not considered here due to their reliance on per-image prompting (points, boxes, or masks), high computational demands for high-resolution satellite imagery, and training on natural image datasets that differ from the narrow, elongated geomorphic features typical of Arctic coasts. In addition, SAM’s outputs can be sensitive to scale and orientation, reducing consistency when applied to such linear coastal forms. These constraints motivate the exploration of alternative minimally supervised segmentation methods, such as Differentiable Feature Clustering (DifFeat) (Kim et al., 2020), which can produce consistent land–water delineations from which geomorphic boundaries are derived without per-image manual prompting.

To address the limitations and challenges of different deep learning methods, we investigate two approaches for semantic segmentation of Arctic land and water surfaces. The models are trained on WorldView-2 satellite imagery of Elson Lagoon (Figure 1) and are also evaluated on high-resolution UAV imagery of the same site, enabling assessment of cross-sensor generalization across differing spatial resolutions. We apply U-Net, a supervised convolutional neural network trained on manually annotated satellite scenes, and DifFeat (Kim et al., 2020), an unsupervised deep learning algorithm trained on a single unlabeled image, with minimal supervision introduced through expert-guided cluster selection of land and water classes on one image tile. To our knowledge, this represents the first application of an unsupervised deep learning framework with minimal expert supervision for Arctic coastal mapping. From the resulting land and water segmentation, shoreline and bluff edge boundaries are automatically derived using an interface extraction approach; these geomorphic boundaries serve as critical indicators of permafrost-driven land loss and coastal change. This framework enables a direct comparison between supervised and minimally supervised models, examining trade-offs in segmentation accuracy, annotation effort, and cross-sensor generalization. Specifically, this study aims to (1) evaluate both models for accurate land and water segmentation that supports reliable shoreline and bluff edge extraction, (2) compare supervised and minimally supervised approaches in terms of performance and generalization, and (3) assess their potential for larger-scale Arctic coastal monitoring applications.

2 Methodology

We implemented two deep learning-based approaches, U-Net and Differentiable Feature Clustering (DifFeat), to perform semantic segmentation of land and water surfaces using high-resolution optical imagery. The following subsections outline dataset preparation, the preprocessing pipeline, and the interface extraction approach, and describe the implementation and inference of the two models.

2.1 Dataset source

For segmentation experiments, we used five pan-sharpened WorldView-2 (WV2) images (red, green, blue (RGB) and infrared1 (IR1) bands only) with less than 20% cloud coverage, acquired between May and September in multiple years from 2010 to 2024. The imagery was provided by the Polar Geospatial Center (PGC) at the University of Minnesota through their cooperative agreement with the U.S. National Science Foundation’s Office of Polar Programs. The spatial resolution of the WV2 images is approximately 0.5 m ground sample distance (GSD), providing sufficient detail for land–water segmentation (European Space Agency, 2025). Each image is approximately 18,000–25,000 pixels in height and 17,000–25,000 pixels in width. The off-nadir angles for the five images were 11°, 25°, 31.6°, 42.9°, and 44.2°, reflecting differences in satellite viewing geometry which may influence shoreline and bluff visibility.

The geographic region was constrained to Elson Lagoon, a shallow embayment near the Iñupiat community of Utqiaġvik (Figure 1). Elson Lagoon was selected because Arctic lagoons are geomorphologically diverse and ecologically significant, representing approximately 44% of the Alaskan Beaufort Sea coastline and offering a strong analog for Arctic coastal systems more broadly (Miller et al., 2021). Additionally, coastal erosion has been monitored along the Elson Lagoon shoreline for over eight decades, providing a valuable resource of in situ data that can be used to validate newly developed machine learning-based approaches. The imagery captures diverse geomorphic elements, including land-water interfaces, coastal bluffs, and permafrost-related landscape features, which are crucial for studying the impacts of climate change and coastal erosion in the Arctic. The dataset also reflects the unique imaging challenges of the region, including low contrast, mixed spectral textures between the shoreline, vegetation, and lagoon water, as well as seasonal variability. These complexities make the data particularly valuable for the Arctic more broadly, as they represent common issues encountered in Arctic satellite image analysis.

2.2 Annotation

The five WV2 scenes from Elson Lagoon were manually annotated using ArcGIS Pro (version 3.5) to delineate land and water regions. These annotations served as ground-truth labels for the supervised model training. Annotation was performed by an expert with direct field experience at the site and prior in situ data collection over multiple seasons. The imagery selected for annotation featured ideal environmental conditions: minimal cloud cover, no snow or ice, clear visibility of land–water transitions, and favorable sensor viewing angles. The labeling scheme focused on identifying only land and water classes. Areas not belonging to either category, such as beaches, sloping bluffs, tundra surfaces, or small inland water bodies, were left unannotated and thus implicitly treated as background. This structure allows for clear delineation of the main surface types while preserving the natural transitions needed for subsequent boundary extraction.

Annotations were created as polygonal masks, enabling a consistent delineation of spatial regions. Polygon boundaries were drawn along visually identifiable transitions based on surface color, texture, and shading. In areas with low contrast or shadowing, expert interpretation was used, guided by site-specific knowledge and prior field observations, to approximate the most probable land–water separation. Despite favorable conditions, challenges such as spectral confusion between shallow water and wet sediment, vegetation gradients along bluffs, and surface moisture occasionally complicated boundary placement. To ensure consistency, detailed labeling guidelines were developed and applied uniformly in all five scenes, each requiring approximately 2.5–3.5 h to annotate. Figures 2A,B show examples of the imagery and corresponding annotated polygons, where land and water features are visualized in tan and teal, respectively. Supplementary Figure S1 provides additional visualization of the manual annotations overlaid on WV2 imagery, illustrating the spatial characteristics and delineation detail of the digitized features.

Figure 2. Illustration of the DifFeat workflow for identifying target features: (A) False-color input RGB image slice, (B) Corresponding manually annotated water (blue) and land (green) polygon, (C) Segmentation map generated by the DifFeat model, (D) Target boundaries extracted from the segmentation map, and (E) Final predictions overlaid on the input image slice.

2.3 Data preparation

The satellite imagery used in this study underwent multiple pre-processing steps to ensure compatibility with the U-Net and DifFeat models. These steps standardized spatial alignment, improved feature separability, and prepared the data for model input. The main steps are described below.

2.3.1 Image and shapefile matching

Each WV2 satellite image was paired with its corresponding manually annotated land and water polygons. All raster images and shapefiles were maintained in a consistent projected coordinate reference system (NAD83 (2011) datum, UTM zone 4N, EPSG:26904) to ensure spatial alignment across datasets. Because the annotations were created directly on the source imagery, they aligned precisely with the corresponding scenes, providing an accurate basis for model training and evaluation.

2.3.2 Adding normalized bands

To enhance spectral separability between land and water surfaces, two normalized spectral indices were computed and appended to the original four-band imagery (RGB + IR1): the Normalized Difference Water Index (NDWI) (McFeeters, 1996) and the Normalized Difference Vegetation Index (NDVI) (Rouse et al., 1974). NDWI highlights open-water regions, improving the detection of subtle water boundaries, while NDVI highlights vegetated terrain, aiding the distinction of land-water boundaries under varying illumination and surface conditions.
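For illustration, this index computation can be sketched as follows (the band ordering R, G, B, IR1 and the small epsilon guard against division by zero are assumptions, not taken from the study’s implementation):

```python
import numpy as np

def add_indices(tile: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Append NDWI and NDVI to a 4-band (R, G, B, IR1) tile of shape (H, W, 4)."""
    r, g, nir = tile[..., 0], tile[..., 1], tile[..., 3]
    ndwi = (g - nir) / (g + nir + eps)  # McFeeters (1996): highlights open water
    ndvi = (nir - r) / (nir + r + eps)  # Rouse et al. (1974): highlights vegetation
    return np.dstack([tile, ndwi, ndvi])  # (H, W, 6) composite
```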

2.3.3 Image and label slicing

All experiments were carried out in PyTorch on a server equipped with two NVIDIA GeForce RTX 3090 GPUs (24 GB each, total 48 GB) running CUDA 12.4 and driver 550.90.07, enabling efficient parallel training. Because the WV2 scenes were too large to process in their entirety, both imagery and annotation masks were divided into smaller tiles of size 512 × 512 pixels. Polygon annotations were rasterized into class-coded mask arrays, where pixel values corresponded to the following classes: 0 = background, 1 = water, and 2 = land. The slicing procedure ensured perfect spatial alignment between each image tile and its corresponding label. To prevent overrepresentation of homogeneous regions, the dataset was curated so that approximately 90% of the image tiles contained both land and water pixels, capturing transition zones essential for learning land and water interfaces. The remaining 10% of tiles represented single-class regions (e.g., open water or inland tundra) to help the model distinguish pure land and water surfaces and improve generalization to spatially uniform areas. For tiles at image edges smaller than the target tile size, zero-padding was applied to both imagery and masks to maintain consistent dimensions across the dataset.
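A minimal sketch of this tiling step is shown below; it assumes the polygon annotations have already been rasterized into a class-coded mask (e.g., with rasterio.features.rasterize), and the helper name is illustrative:

```python
import numpy as np

def slice_scene(image: np.ndarray, mask: np.ndarray, tile: int = 512):
    """Cut an (H, W, C) scene and its (H, W) class-coded mask
    (0 = background, 1 = water, 2 = land) into aligned tiles,
    zero-padding partial tiles at the right and bottom edges."""
    tiles = []
    H, W = mask.shape
    for y in range(0, H, tile):
        for x in range(0, W, tile):
            img_t = image[y:y + tile, x:x + tile]
            msk_t = mask[y:y + tile, x:x + tile]
            ph, pw = tile - img_t.shape[0], tile - img_t.shape[1]
            if ph or pw:  # pad partial tiles up to the full tile size
                img_t = np.pad(img_t, ((0, ph), (0, pw), (0, 0)))
                msk_t = np.pad(msk_t, ((0, ph), (0, pw)))
            tiles.append((img_t, msk_t, y, x))  # keep offsets for mosaicking
    return tiles
```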

2.3.4 Normalization of image bands

Each image tile, including the RGB and infrared (IR) bands, was normalized independently using z-score normalization, where pixel values were scaled according to the mean and standard deviation of the training dataset. NoData pixels and padded regions were excluded from these calculations to avoid biasing the normalization statistics. The NDWI and NDVI indices were computed from the original (non-normalized) reflectance values and, being ratio-based measures, were already normalized by definition and required no further adjustment. For inference, the mean and standard deviation of the RGB and IR bands derived from the training data were applied to maintain consistency and mitigate lighting or sensor-specific variation across scenes.
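A minimal sketch of this normalization, assuming NoData and padded pixels are marked by a sentinel value of zero:

```python
import numpy as np

def fit_band_stats(train_tiles, nodata: float = 0.0):
    """Per-band mean/std of the R, G, B, IR bands over the training set,
    excluding NoData/padded pixels (identified here by an assumed sentinel)."""
    pixels = np.concatenate([t.reshape(-1, t.shape[-1]) for t in train_tiles])
    valid = ~np.all(pixels[:, :4] == nodata, axis=1)
    return pixels[valid, :4].mean(axis=0), pixels[valid, :4].std(axis=0)

def z_score(tile: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Apply training-set z-score normalization to the first four bands;
    the ratio-based NDWI/NDVI bands are left unchanged."""
    out = tile.astype(np.float32).copy()
    out[..., :4] = (out[..., :4] - mean) / std
    return out
```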

2.4 Data splitting and experimental setup

For all experiments, the dataset consisted of five WorldView-2 (WV2) satellite images acquired between 2010 and 2024. These images were divided into distinct subsets: three images were designated for training, one for validation, and one for testing. All tiles used for model evaluation were exclusively obtained from the held-out evaluation image, ensuring strict separation between training, validation, and test data. The tiling process resulted in 528 training tiles, 196 validation tiles, and 255 testing tiles, each measuring 512×512 pixels. While some tiles partially overlapped spatially across acquisitions, differences in capture year, illumination, and viewing geometry ensured that the evaluation imagery remained distinct from the training and validation datasets. This protocol was consistently applied to both the U-Net and DifFeat models, providing a robust basis for comparison and minimizing the risk of overfitting to specific acquisition conditions.

2.5 Cross-domain evaluation on large-scale WV2 and UAV imagery

To assess model generalization beyond training conditions, we evaluated both U-Net and DifFeat on large-scale Arctic imagery: (1) a large-scale WorldView-2 (WV2) satellite scene (18,000×20,000 pixels) covering the Elson Lagoon region, and (2) a high-resolution unoccupied aerial vehicle (UAV) orthomosaic (20,000×20,000 pixels at 2 cm/pixel) from the same coastal area. These scenes were not part of the training or validation data, allowing us to evaluate cross-scale and cross-source generalization. The large-scale WV2 image was processed in the same manner as the training dataset, including band selection, normalization, and tiling into 512×512 pixel patches for model inference. The predictions of each tile were mosaicked to produce a seamless segmentation of the entire scene, with boundary extraction performed as described in Sections 2.6.1 and 2.7.1.

The UAV imagery was collected on 6 August 2024 using a Quantum Systems Trinity F90+ fixed-wing UAV equipped with a Micasense Altum-PT multispectral sensor. The flight was carried out 120 m above ground level (AGL) over the Elson Lagoon coastline, following a corridor mapping path with 80% forward and side overlap. The resulting dataset was processed using a standard structure-from-motion (SfM) workflow, producing a 2 cm/pixel ground sampling distance (GSD) orthomosaic. The images were downsampled by extracting 5120×5120 slices and resizing them to 512×512 to match the input size of the model, thus preserving large-scale spatial patterns while reducing fine spatial detail. The same preprocessing, inference, and boundary extraction steps as with the training data were applied to ensure methodological consistency. No model re-training or fine-tuning was performed for these tests. The results of the large-scale WV2 image and the UAV orthomosaic were used exclusively to assess the ability of the trained models to generalize to new spatial scales and imaging platforms.
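A minimal sketch of the downsampling step (the use of OpenCV and the non-overlapping window layout are illustrative assumptions):

```python
import cv2
import numpy as np

def downsample_uav(ortho: np.ndarray, window: int = 5120, out: int = 512):
    """Extract 5120x5120 windows from the 2 cm/pixel orthomosaic (assumed
    RGB here) and resize each to 512x512 (~20 cm/pixel), preserving
    large-scale patterns while reducing fine detail to match the model input."""
    tiles = []
    H, W = ortho.shape[:2]
    for y in range(0, H - window + 1, window):
        for x in range(0, W - window + 1, window):
            patch = ortho[y:y + window, x:x + window]
            tiles.append(cv2.resize(patch, (out, out),
                                    interpolation=cv2.INTER_AREA))
    return tiles
```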

2.6 U-Net

U-Net, a fully convolutional neural network originally developed for biomedical image segmentation (Ronneberger et al., 2015), features an encoder–decoder structure with skip connections that allow simultaneous capture of global context and fine spatial detail. The encoder path extracts high-level contextual features through successive convolution and down-sampling operations, while the decoder path progressively reconstructs spatial details using up-sampling and convolution layers. Skip connections between corresponding encoder and decoder stages allow spatial information lost during down-sampling to be preserved and integrated during reconstruction, giving U-Net its distinctive U-shaped architecture.

In this study, U-Net was employed to perform semantic segmentation of land and water regions in high-resolution Arctic satellite imagery. Accurate delineation of these classes is essential for subsequent extraction of shoreline and bluff edge boundaries via post-processing. Because Arctic coastal transitions can be narrow, low contrast, and highly variable in appearance, the localization capability of U-Net is particularly well suited to capture fine-scale land-water interfaces that define these geomorphic boundaries. Training images and their corresponding label masks were prepared following the procedures described in Section 2.3, including normalization, tiling, and class balance adjustments, emphasizing tiles containing both land and water pixels.

To enhance generalization, data augmentation was applied during training using random rotations and horizontal and vertical flips, exposing the model to varied orientations of coastal features. The network was trained using a hybrid loss function combining Dice loss, which mitigates class imbalance, and Boundary loss, which sharpens land–water transitions by penalizing errors near class boundaries (Aryal et al., 2021). Optimization was performed with the Adam optimizer using a batch size of 8 and an initial learning rate of 0.001. A learning rate scheduler reduced the learning rate after 30 epochs to improve convergence stability. Model performance was evaluated using Intersection over Union (IoU), Precision, and Recall, computed for the land and water classes. These metrics provide a comprehensive assessment of segmentation accuracy and boundary fidelity, directly influencing the quality of the derived shoreline and bluff edge boundaries. Quantitative results are presented in the Results section.
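For illustration, the hybrid loss can be sketched as below. The Dice term follows the standard soft-Dice formulation; the boundary term shown here up-weights per-pixel cross-entropy by a precomputed distance-to-boundary weight map, which is one common formulation, and the exact boundary-loss variant used in the study may differ:

```python
import torch
import torch.nn.functional as F

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Soft Dice loss; logits (N, 3, H, W), integer target (N, H, W)."""
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(target, probs.shape[1]).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(2, 3))
    denom = probs.sum(dim=(2, 3)) + onehot.sum(dim=(2, 3))
    return 1.0 - (2.0 * inter / (denom + eps)).mean()

def hybrid_loss(logits, target, boundary_weight_map, w: float = 1.0):
    """Dice loss plus a boundary term that emphasizes errors near class
    boundaries via a precomputed distance-based weight map (N, H, W)."""
    ce = F.cross_entropy(logits, target, reduction="none")  # per-pixel CE
    return dice_loss(logits, target) + w * (ce * boundary_weight_map).mean()
```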

2.6.1 Inference

During inference, each large WV2 image was divided into smaller patches matching the model’s input size. The trained U-Net model produced per-pixel class probabilities (softmax) for each patch, which were converted to discrete class labels (0 = background, 1 = water, 2 = land) using an argmax operation. Both the predicted class maps and, optionally, the derived boundaries were written to pre-allocated larger arrays that preserved the original spatial dimensions and alignment across tiles. From each predicted patch, shoreline and bluff edge boundaries were extracted using the interface extraction method (Section 2.8), which identifies the interfaces between land, water, and background regions. This boundary extraction step can be performed per patch or after mosaicking, depending on the desired output format. After processing all patches, the complete mosaicked raster was assembled to reconstruct the full scene.
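A minimal sketch of this tile-wise inference and mosaicking logic, reusing the tile offsets from the slicing step (helper names and bookkeeping are illustrative):

```python
import numpy as np
import torch

@torch.no_grad()
def predict_scene(model, tiles, scene_shape, device: str = "cuda"):
    """Argmax over per-pixel softmax logits for each 512x512 patch, written
    into a pre-allocated full-scene class map (0 = background, 1 = water,
    2 = land). `tiles` yields (tile, mask, row-offset, col-offset) tuples."""
    class_map = np.zeros(scene_shape, dtype=np.uint8)
    for img_t, _, y, x in tiles:
        inp = torch.from_numpy(img_t).permute(2, 0, 1).unsqueeze(0).float()
        pred = model(inp.to(device)).argmax(dim=1).squeeze(0).cpu().numpy()
        h = min(img_t.shape[0], scene_shape[0] - y)  # drop padded margins
        w = min(img_t.shape[1], scene_shape[1] - x)
        class_map[y:y + h, x:x + w] = pred[:h, :w].astype(np.uint8)
    return class_map
```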

The final outputs were then vectorized: the land and water segmentation was converted to polygon geometries, and when boundaries were extracted, the shoreline and bluff edge features were exported as polyline shapefiles. These products provide both region and line-based representations suitable for further spatial or ecological analysis. Unless explicitly stated elsewhere, no additional post-processing (e.g., smoothing or morphological filtering) was applied within the core U-Net pipeline. For cross-domain evaluation, U-Net models trained on satellite imagery were also applied to UAV-derived orthomosaics acquired for the same region in 2024 without additional fine-tuning to assess generalization performance.

2.7 Differentiable feature clustering

Differentiable Feature Clustering (DifFeat) is an unsupervised learning technique that leverages feature representation learning for image segmentation (Kim et al., 2020). Its key advantage lies in its ability to perform segmentation without the need for labeled data, making it particularly valuable in Arctic settings where annotated datasets are scarce or expensive to obtain. DifFeat clusters pixels based on learned feature representations extracted from a convolutional feature map, where each pixel is represented as a multidimensional vector across channels. Clustering is performed by assigning each pixel to the channel (cluster) with the highest activation, effectively grouping pixels with similar spectral–textural characteristics. In the implementation, the initial and minimum number of clusters are user-defined parameters. The model begins from the specified maximum and iteratively merges similar pixel groups during optimization until the minimum threshold is reached, controlling the final segmentation granularity. The optimization follows the loss formulation of Kim et al. (2020), combining a Cross-Entropy loss, which encourages grouping of similar pixels, and an L1 loss, which enforces spatial continuity and smoothness by penalizing abrupt feature changes between neighboring pixels. Together, these mechanisms produce spatially coherent clusters that correspond to distinct geomorphic surface types.
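For illustration, one optimization step can be sketched as follows, assuming a small CNN `net` that maps an image tile to a per-pixel response map of C channels (the helper name and the weighting factor `mu` are illustrative):

```python
import torch
import torch.nn as nn

def diffeat_step(net: nn.Module, image: torch.Tensor,
                 optimizer: torch.optim.Optimizer, mu: float = 5.0):
    """One unsupervised step in the spirit of Kim et al. (2020):
    cross-entropy against the per-pixel argmax cluster assignment plus an
    L1 spatial-continuity penalty between neighboring responses."""
    optimizer.zero_grad()
    response = net(image)            # (1, C, H, W) per-pixel activations
    target = response.argmax(dim=1)  # each pixel -> highest-activation channel
    ce = nn.functional.cross_entropy(response, target)
    lh = (response[:, :, 1:, :] - response[:, :, :-1, :]).abs().mean()
    lv = (response[:, :, :, 1:] - response[:, :, :, :-1]).abs().mean()
    loss = ce + mu * (lh + lv)       # mu balances grouping vs. smoothness
    loss.backward()
    optimizer.step()
    return loss.item(), target.squeeze(0)
```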

As an unsupervised approach, DifFeat generates multiple clusters from the input image slices (Figure 2C). The model does not assign semantic classes to these clusters; its role is limited to grouping pixels with similar features. In our workflow, this is adapted into a minimally supervised method: a single manual step is used to visually inspect clusters from a reference slice and identify those corresponding to the two target classes (water and land). Once the target clusters are identified, their cluster IDs are fixed and used consistently across all slices and images, enabling fully automated extraction in subsequent predictions. If a selected cluster does not appear in a slice, it simply reflects that the class is absent from that portion of the scene. This fixed-ID mapping is central to reusing the model output without reintroducing manual work.
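A minimal sketch of the fixed-ID mapping (the specific cluster IDs are purely illustrative):

```python
import numpy as np

# Cluster IDs fixed once, after expert inspection of a reference slice;
# the values below are examples only.
WATER_ID, LAND_ID = 3, 7

def clusters_to_classes(cluster_map: np.ndarray) -> np.ndarray:
    """Map DifFeat cluster IDs to the class coding used elsewhere
    (0 = background, 1 = water, 2 = land); other clusters -> background."""
    out = np.zeros_like(cluster_map, dtype=np.uint8)
    out[cluster_map == WATER_ID] = 1
    out[cluster_map == LAND_ID] = 2
    return out
```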

DifFeat’s segmentation approach provides flexibility in cluster granularity, supporting both coarse-grained and fine-grained clustering depending on the desired level of feature separation. An important advantage of DifFeat in our research is its ability to operate effectively with minimal data, processing as little as a single image slice of size 224×224 or 512×512 pixels. This is particularly useful in Arctic coastal monitoring, where data are scarce due to remote site locations and environmental challenges such as persistent cloud cover. The input image slice is obtained from the slicing process explained in Section 2.3.3, normalized to standardize the pixel intensities and improve the stability of the feature clustering. The model can process various combinations of spectral bands, including RGB, RGB+IR, or RGB+IR+NDWI+NDVI, providing flexibility in using additional information for clustering. The model is initialized with random weights and trained from scratch, ensuring that the learned features are specific to Arctic coastal imagery.

The resulting class-coded masks, which comprise land, water and background regions, were subsequently processed using the automated interface extraction workflow (Section 2.8) to derive boundaries of the shoreline and the bluff edge. This unified post-processing step ensures methodological consistency between the supervised and unsupervised approaches, enabling direct comparison of segmentation accuracy, generalization, and data efficiency. Figure 2 illustrates the complete DifFeat workflow: (A) input WV2 image slice, (B) ground-truth mask displayed for comparison only (not used for model training), (C) clustering or segmentation map obtained from the model, (D) automatically extracted shoreline and bluff edge boundaries from land and water segments, and (E) the boundaries overlaid on the input image. Together, these panels demonstrate how DifFeat, despite operating without labeled data, produces coherent land and water delineations from which geomorphically meaningful shoreline and bluff edge boundaries can be objectively derived.

2.7.1 Inference

During inference, the model processed each large WV2 image in slices (overlapping or non-overlapping) that matched the input size (512×512 pixels). Predictions were generated for each slice, and the fixed cluster ID mapping established during the reference selection step was applied to retain only the clusters corresponding to land and water. The slice-level outputs were then mosaicked into a seamless array matching the original scene dimensions. The resulting segmentation map was exported via GeoPandas as a shapefile with polygon geometries representing land and water regions. Optionally, these polygons were passed through the automated interface extraction workflow (Section 2.8) to obtain shoreline and bluff edge polylines. Supplementary Figure S2 illustrates this process: (A) shows the raw DifFeat prediction overlaid on the evaluation image, and (B) presents the corresponding derived boundaries after optional refinement.

By refinement, we refer to the limited manual post-processing performed for visualization purposes and to generate clean boundary shapefiles without artifacts at tile edges, as shown in Figure 5 and Supplementary Figure S2B. This manual post-processing involved removing small noisy detections, refining edge continuity, and ensuring spatial coherence across tile boundaries. This cleaning step, though derived from automated output, was necessary because the model occasionally introduced artifacts in the large mosaic. The latter included misclassification of small inland water bodies or tundra features as bluff edges, and false shoreline detections along tile corners containing open water. These manual adjustments were applied only for the preparation of final visualizations and presentation materials, and were not used during automated boundary extraction or for quantitative evaluation of model performance. For cross-domain evaluation, DifFeat trained solely on one image slice of satellite imagery was also applied to a full satellite imagery scene and UAV orthomosaics without any fine-tuning, allowing for the assessment of generalization across sensors and spatial scales.

2.8 Post-processing: interface extraction

Following segmentation, we derived geomorphic boundaries corresponding to the shoreline and bluff edge from the predicted land, water and background masks. Specifically, the model predicts two explicit classes, land and water, while unclassified regions (e.g., beach or bluff slopes) are treated as background. These boundaries are not directly predicted by the model but are instead derived from the spatial interfaces between the segmented regions. Boundaries were extracted by identifying adjacency relationships among these classes. For each class map, we first defined binary masks for land, water, and background. Using 8-connected morphological dilation with a 3 × 3 structuring element, we delineated geomorphic boundaries by locating water pixels adjacent to land or background (shoreline) and land pixels adjacent to water or background (bluff edge). To remove artifacts, small closed loops were filtered out and boundary pixels along image edges were discarded. This approach ensures that both shoreline and bluff edge lines are objectively derived from the predicted land and water regions, without requiring manual delineation for each image or direct training of linear features. The same procedure can be applied to either ground-truth or predicted masks, enabling consistent boundary extraction for evaluation and analysis.
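A minimal sketch of this extraction, omitting the loop-filtering and image-edge cleanup steps:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def extract_interfaces(class_map: np.ndarray):
    """Derive shoreline and bluff edge pixels from a class-coded mask
    (0 = background, 1 = water, 2 = land) via 8-connected 3x3 dilation."""
    struct = np.ones((3, 3), dtype=bool)  # 8-connected neighborhood
    water, land, bg = (class_map == 1), (class_map == 2), (class_map == 0)
    # Shoreline: water pixels adjacent to land or background
    shoreline = water & binary_dilation(land | bg, structure=struct)
    # Bluff edge: land pixels adjacent to water or background
    bluff_edge = land & binary_dilation(water | bg, structure=struct)
    return shoreline, bluff_edge
```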

2.9 Evaluation metrics

Model performance was quantitatively assessed using four standard region-based metrics: Intersection over Union (IoU), Precision, Recall, and F1 score. All metrics were computed separately for the land and water classes using the manually annotated polygons as ground-truth reference. IoU was calculated as the ratio of the intersection to the union of predicted and ground-truth pixels for each class. Precision represents the proportion of predicted pixels that are correct, while Recall quantifies the proportion of ground-truth pixels that are correctly identified. The F1 score is the harmonic mean of Precision and Recall, summarizing segmentation accuracy for each class. These metrics were reported for both U-Net and DifFeat on the held-out evaluation set, allowing for direct comparison of segmentation performance across model types and spectral input combinations.
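For illustration, the per-class metrics can be computed as follows (the epsilon guard against empty classes is an assumption):

```python
import numpy as np

def class_metrics(gt: np.ndarray, pred: np.ndarray, cls: int, eps: float = 1e-9):
    """IoU, Precision, Recall, and F1 for one class of a class-coded mask."""
    g, p = (gt == cls), (pred == cls)
    tp = np.logical_and(g, p).sum()   # correctly predicted class pixels
    fp = np.logical_and(~g, p).sum()  # predicted but not in ground truth
    fn = np.logical_and(g, ~p).sum()  # ground truth pixels that were missed
    iou = tp / (tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return iou, precision, recall, f1
```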

3 Results

The following results summarize the performance of both segmentation approaches on land–water classification tasks and derived boundary extraction in Arctic coastal imagery. The results are organized into three components: (1) quantitative evaluation of both models across different combinations of spectral inputs for land-water segmentation, (2) comparative analysis of segmentation quality, training efficiency, and boundary coherence between U-Net and DifFeat, and (3) assessment of both models on large-scale and multi-source imagery to evaluate cross-domain generalization.

3.1 Quantitative evaluation of U-Net and DifFeat across spectral inputs

Table 1 summarizes the segmentation performance of U-Net and DifFeat across three spectral input combinations (RGB; RGB+IR; RGB+IR+NDWI+NDVI).

Table 1. Comparison of U-Net and DifFeat performance across input modalities using region-based metrics (↑: higher is better).

3.1.1 U-Net performance across spectral inputs

For the water class, U-Net performance improved markedly with additional spectral information. The IoU increased from 0.16 (RGB) to 0.60 (RGB+IR), indicating that near-infrared bands enhanced discrimination of water from land. Using RGB inputs alone, the U-Net detected only the most distinct water regions, yielding very high precision (0.97) but low recall (0.16), indicating that many true water pixels were missed. Adding infrared bands increased sensitivity (recall = 0.77) with moderately reduced precision (0.72), resulting in a much higher F1 score (0.28 → 0.75) and more balanced segmentation. Including NDWI and NDVI produced only minor or negative changes (IoU = 0.58, F1 = 0.74), suggesting redundancy among the derived indices and other spectral bands included in the model.

For the land class, U-Net achieved an IoU of 0.45 with RGB inputs and 0.43 with the infrared band added. Performance improved slightly with NDWI and NDVI (IoU = 0.50, F1 = 0.66), likely reflecting enhanced vegetation–water contrast. Overall, U-Net relied strongly on spectral richness for accurate water segmentation, while land segmentation showed more modest gains.

3.1.2 DifFeat performance across spectral inputs

DifFeat, trained without labels, achieved consistently high segmentation accuracy across all input combinations. For the water class, performance was already high with RGB inputs (IoU = 0.92) and further improved with the addition of the infrared band (IoU = 0.98), confirming that near-infrared information enhances separability between water and surrounding terrain. When the NDWI and NDVI indices were included, performance decreased slightly (IoU = 0.95, F1 = 0.97) but remained well balanced, with precision and recall both exceeding 0.93.

For the land class, DifFeat maintained high and stable performance across all input combinations, with IoU increasing from 0.71 to 0.92 and F1 scores from 0.83 to 0.95, highlighting the benefit of incorporating additional spectral and index-based inputs. The model effectively utilized these complementary features, demonstrating robust generalization. Overall, DifFeat consistently outperformed U-Net under all spectral configurations, exhibiting superior adaptability to spectral variation and feature redundancy.

3.2 Comparison of training efficiency and predictions of U-Net and DifFeat

The training times, parameters, and configurations for the U-Net and DifFeat clustering models are summarized in Table 2. U-Net contains approximately 1.943 M learnable parameters, compared to 0.106 M for DifFeat, reflecting the deeper encoder–decoder structure of U-Net versus the shallower CNN in DifFeat. This makes DifFeat substantially lighter in terms of model complexity. The U-Net model required approximately 58 min (3,538 s) to complete 100 epochs using 528 labeled training images. In contrast, the DifFeat clustering model completed 100 epochs in just 4.61 s on a single unlabeled image tile, a 99.87% reduction in training time. Both models were trained using 512×512×6 (six-band composite) input slices to ensure consistency in spectral inputs for comparison. These results highlight the substantial contrast in model size and training efficiency between the two approaches.

Table 2. Number of learnable parameters, total training time, and training settings for the evaluated models.

Figure 3 presents qualitative comparisons between U-Net and DifFeat predictions across five representative Arctic coastal scenes. Each row (A–E) shows a WV2 scene, the manually annotated land (green), water (blue), and background (cream) reference mask, and the predicted segmentation results from both models using different spectral inputs. Columns three and four show U-Net predictions trained on RGB and RGB + IR + NDWI + NDVI inputs, respectively, while columns five and six display DifFeat predictions under the same input configurations. The labels “S” and “B” denote shoreline and bluff edge similarity scores, respectively, computed as the Intersection over Union (IoU) between each model’s predicted boundaries and the ground-truth reference (shown for visualization only).

Figure 3. Segmentation results from U-Net and DifFeat showing land (green), water (blue), and background (cream) regions across five Arctic coastal scenes. Rows (A–E) represent various test-set examples. Columns show false-color RGB input, ground truth reference masks, and predictions using RGB and RGB+IR+NDWI+NDVI inputs. “S” and “B” indicate the IoU scores for the water and land classes, respectively, when comparing model predictions with manual annotations (ground truth).

Across the examples, U-Net produced smooth, continuous land and water regions, closely following the manually annotated masks in most cases. Adding infrared and normalized indices improved delineation of subtle or low-contrast transitions, particularly where shallow water or moist tundra obscures spectral differences. In contrast, DifFeat generated more granular clusters with greater local variability, yet effectively captured fine-scale transition zones between land and water even without labeled training data. Multispectral inputs improved both models, yielding better spatial coherence and reduced false detections. To illustrate, Rows C and E of Figure 3 demonstrate the influence of surface texture and illumination, where both models exhibited small gaps or over-segmentation in darker regions when using RGB inputs. The addition of infrared and normalized indices mitigated these artifacts, improving consistency and reducing misclassification.

Figure 4 presents the corresponding shoreline and bluff edge boundaries derived from the segmentation outputs shown in Figure 3. The first column shows the same WV2 input scenes for reference, while the subsequent panels display the extracted boundaries corresponding to each segmentation result in the same spatial arrangement. Blue lines represent the shoreline and orange lines represent the bluff edge, both automatically delineated from the predicted masks using the interface extraction algorithm described in Section 2.8. The visual comparison highlights that U-Net boundaries were generally smoother and more continuous, whereas DifFeat boundaries were finer and often better aligned with subtle geomorphic transitions, though occasionally fragmented. When infrared and normalized indices were included, both models achieved improved alignment with the ground-truth boundaries with reduced false positives. Together, these results demonstrate that both models can generate accurate land/water segmentation, and corresponding shoreline/bluff edge boundaries, with U-Net favoring spatial smoothness and DifFeat excelling in fine-scale texture capture, providing complementary strengths for Arctic coastal monitoring.

Figure 4. Automatically extracted shoreline and bluff-edge boundaries derived from the land-water segmentations. Rows (A–E) represent various test-set examples. The first column shows the same WorldView-2 input scenes as in Figure 3 for reference, while the remaining columns display the corresponding predicted boundaries. Blue lines indicate shorelines, and orange lines indicate bluff edges.

3.3 Evaluation on large-scale and multi-source imagery

3.3.1 DifFeat predictions on WV2 satellite imagery

Figure 5A shows the full WV2 scene with shoreline and bluff edge boundaries automatically extracted from the DifFeat land and water segmentation results. A zoomed-in view in Figure 5B (approximately 2 km in length) demonstrates close spatial alignment between the extracted boundaries and the locations where these landforms would be expected, based on expert visual interpretation. However, some localized errors remain in the derived boundary positions. For example, Figure 5C highlights a region where the extracted interfaces are offset by a few pixels, resulting in water sediments being misclassified as bluff edges and wave patterns interpreted as shoreline. In Figure 5D, both the predicted bluff edge and the shoreline are spatially offset from the reference boundaries, and the shoreline shows a larger misalignment. Conversely, Figure 5E shows a correctly delineated bluff edge, but with a minor misalignment of the shoreline.

Figure 5. DifFeat model predictions on a WV2 image scene for two target classes: bluff edge (orange) and shoreline (blue). (A) The map and insets are shown after post-processing to improve visual and spatial coherence. Arrows indicate regions of interest highlighted in sub-figures (B–E). (B) Shows areas where predicted boundaries for both classes align with the reference. (C) Shows regions where predictions for both classes do not match the reference. (D) Highlights a region where the predicted bluff edge and shoreline are spatially offset from the reference, with a larger misalignment for the shoreline. (E) Illustrates a minor misaligned shoreline prediction. Line widths differ for each class to enhance clarity.

Overall, the derived boundaries correspond well to the expected shoreline and bluff edge positions. This alignment is illustrated in Supplementary Figure S1, where the post-processed DifFeat predictions are overlaid on manually annotated features for visual comparison. It is important to note that manual annotations were digitized as land and water polygons, where one boundary edge represents the actual shoreline or bluff edge. As a result, model predictions align with the intended feature edge, rather than the polygon centroid or area, as illustrated in Supplementary Figure S1. Supplementary Figures S2–S4 provide additional examples of raw and cleaned predictions, spatial discontinuities, and clustering behavior. In tundra regions, the model occasionally misclassified polygonal ice-wedge features, surface cracks, and high-centered polygons as bluff edges (see Supplementary Figure S3 for examples of these misclassifications, panels D–F). Boundary artifacts were also observed along the edges of water bodies and at tile borders, where portions of open water were sometimes interpreted as shoreline. These discontinuities occurred even when slice overlap was introduced during inference.

3.3.2 U-Net predictions on WV2 satellite imagery

U-Net land and water segmentation results for the large-scale WV2 scene are shown in Supplementary Figure S5 (top: extracted bluff edge boundaries; bottom: extracted shoreline boundaries), using models trained on different combinations of input bands.

For bluff edge boundaries (Supplementary Figure S5, top), U-Net trained with RGB + IR + NDVI + NDWI produced a continuous but noisy band that generally aligned with the actual bluff, though occasional extensions into the shoreline or water occurred. The RGB + IR model, however, struggled with turbid water, highlighting the benefit of additional spectral indices. For shoreline boundaries (Supplementary Figure S5, bottom), the RGB + IR + NDVI + NDWI model produced sparse outputs with scattered noise over water, while the RGB + IR model generated more continuous, though noisier, boundary segments that extended into the tundra. Both models showed limited improvement from the six-band composite.

These results contrast with the predictions from smaller, mixed land-water patches shown in Figure 3, where U-Net produces smoother segmentation, especially in land regions. However, the model’s performance deteriorates in homogeneous patches, such as those containing only land or only water (Supplementary Figure S7, rows 1–3). In these regions, U-Net often generates spurious segments of the opposite class (e.g., isolated water pixels in land-only patches or land pixels in water-only patches). This issue is prominent in large-scale imagery, where many tiles are homogeneous. In such cases, the increased false positives accumulate, resulting in fragmented and noisy boundaries in the raw segmentation output. This behavior is mainly due to U-Net’s challenge in accurately classifying pure, single-class regions, leading to spurious predictions that disrupt the overall segmentation.

3.3.3 DifFeat predictions on drone imagery

The land–water segmentation outputs and extracted boundaries for the UAV scene are shown in Figure 6A, with derived bluff edge boundaries highlighted in green. Across the image, the model consistently produced land–water separations that, after interface extraction, yielded bluff edge lines in strong visual agreement with reference boundaries. Figure 6B shows a zoomed-in region where the extracted bluff edge boundary aligns closely with the manually delineated reference. However, in Figure 6C, tundra features are occasionally misclassified as bluff edges in the segmentation mask, reflecting generalization limitations. The model did not produce distinct shoreline boundaries from the UAV segmentation, likely due to weaker spectral contrast between water and adjacent tundra surfaces in the downsampled UAV data. This limitation may also relate to differences in image resolution, spectral response, or surface conditions unique to the UAV dataset.

Figure 6. Evaluation of the DifFeat model on a UAV-derived orthomosaic. (A) Bluff edge predictions made by the DifFeat model trained on WV2 image slices, shown on a UAV orthoimage of the same site. (B,C) highlight regions of correct and incorrect predictions, where the model’s performance varies across different sections.

3.3.4 U-Net predictions on drone imagery

The U-Net land–water segmentation model, trained on annotated WV2 satellite imagery, was evaluated on UAV orthomosaics to test cross-domain generalization. Supplementary Figure S6 shows the derived boundaries extracted from the segmentation outputs, with blue representing the shoreline and orange representing the bluff edge. The shoreline boundaries were sparse and fragmented, appearing as short, discontinuous line segments separated by large unclassified regions. These extracted boundaries did not consistently follow the true shoreline and were often interrupted by gaps, with false positives scattered across tundra areas and over water surfaces exhibiting spectral variation. Bluff edge boundaries were largely absent; the few segments that appeared were fragmented and confined mostly to image margins or corners, without forming coherent or continuous lines.

4 Discussion

In this study, we performed land and water segmentation of Arctic coastal imagery using both a supervised deep learning model (U-Net) and an unsupervised clustering method, DifFeat. U-Net serves as a representative supervised baseline widely used in coastal studies, while DifFeat demonstrates the potential to reduce dependence on annotated data in data-sparse environments. From the resulting land and water masks, shoreline and bluff edge boundaries were automatically derived using an interface extraction algorithm. DifFeat was trained entirely without labeled data; minimal manual supervision was employed only for cluster selection, where an expert identified the clusters corresponding to land and water. To our knowledge, this is the first comparison of supervised and unsupervised approaches for these geomorphic boundaries in an Arctic setting, providing a foundation for automated coastal change detection. Beyond shoreline mapping, our study also derives another coastal morphological feature, the bluff edge, which is often overlooked but serves as a critical indicator for coastal change analysis and land-ocean exchange.

U-Net established a reliable supervised baseline for land and water segmentation, performing consistently across spectral inputs when applied to WV2 imagery. The inclusion of the infrared (IR) band substantially improved water segmentation accuracy by enhancing contrast between tundra and open water, while NDVI and NDWI contributed minor or inconsistent gains. Land segmentation remained comparatively stable and less sensitive to spectral augmentation, indicating that land surfaces were easier to classify than water. When applied to UAV imagery, U-Net produced sparse and fragmented segmentation masks, reflecting limited cross-domain generalization from satellite-trained models. While the U-Net baseline provided a useful comparative model for tile-wise training and evaluation, it did not generalize well to full-scene or cross-domain imagery, producing scattered and noisy predictions. This stems from U-Net’s contrasting performance on mixed versus homogeneous patches, which becomes critical when scaling to whole-scene analysis. Since a larger proportion of tiles in full-scene WV2 imagery are homogeneous, the noise accumulates, becoming more prominent in large-scale outputs, leading to fragmented and unreliable boundaries.

DifFeat demonstrated strong and consistent performance for land and water segmentation, maintaining stable precision and recall across all spectral input combinations. This stability indicates that the model effectively learned discriminative spectral–spatial features without supervision and remained robust to input variability. Water segmentation benefited from the inclusion of infrared information, while vegetation and water indices (NDVI, NDWI) provided smaller but consistent improvements for land classification. The outputs were spatially coherent and boundary-aligned, producing clean masks from which shoreline and bluff edge boundaries were automatically extracted using the interface extraction algorithm. Overall, DifFeat’s consistent performance across sensors and spectral inputs highlights its potential as a label-efficient and generalizable alternative to supervised segmentation in data-limited Arctic environments. Its unsupervised clustering framework can be readily extended to other remote sensing applications, where grouping pixels with similar spatial-spectral characteristics enables efficient mapping of diverse surface features without the need for retraining or labeled data.

Beyond segmentation accuracy, DifFeat's efficiency is a notable strength. Training the DifFeat model on a single 512×512 image tile completed in just a few seconds, and large-scene inference required minimal computational and manual effort; this training time was 99.87% shorter than that of U-Net. Despite this simplicity, the outputs were visually coherent and directly suitable for downstream refinement. This efficiency opens opportunities to use DifFeat outputs as sparse supervisory labels or enriched input features for supervised models such as U-Net, reducing the need for extensive manual annotation while improving boundary precision (Clark et al., 2022). In data-scarce Arctic settings where fine-grained delineation is essential, such a hybrid strategy offers a scalable and label-efficient pathway for future coastal monitoring.
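
One way such a hybrid could be wired together, sketched below under the assumption of integer cluster identifiers and a conventional 255 ignore value, is to convert expert-selected DifFeat clusters into sparse pseudo-labels that a supervised model consumes through an ignore index in its loss.

    import numpy as np

    def clusters_to_pseudolabels(cluster_map, land_ids, water_ids, ignore=255):
        """Convert expert-selected cluster ids into sparse land/water
        pseudo-labels.

        Pixels belonging to unassigned clusters receive the ignore value,
        so a supervised model (e.g., U-Net trained with
        torch.nn.CrossEntropyLoss(ignore_index=255)) skips them during
        training. Cluster id lists are whatever the expert picked per
        scene; this helper is illustrative only."""
        labels = np.full(cluster_map.shape, ignore, dtype=np.uint8)
        labels[np.isin(cluster_map, land_ids)] = 0   # land
        labels[np.isin(cluster_map, water_ids)] = 1  # water
        return labels

The ignore value restricts the supervised loss to pixels where the unsupervised clustering was confident, which is what makes DifFeat outputs usable as sparse supervisory labels rather than full ground truth.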

Since the training data contained only land and water labels, the shoreline and bluff edge were implicitly represented within a single shared land-water boundary. As a result, U-Net learned coarse class separations rather than the precise geomorphic interfaces needed for accurate boundary delineation. Although U-Net serves as a strong supervised baseline, recovering fine spatial details, especially at object boundaries, remains challenging (Zhou et al., 2023). Future work could explore boundary-aware or attention-enhanced U-Net variants to improve localization along geomorphic boundaries. More precise, line-based annotations could also benefit both training and evaluation, though such data remain difficult to obtain at large scales.
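
As one concrete example of a boundary-aware modification, the sketch below builds a per-pixel weight map that upweights the loss near the land-water interface, in the spirit of the weight map of Ronneberger et al. (2015); the w0 and sigma values are illustrative and would require tuning.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def boundary_weights(mask, w0=10.0, sigma=5.0):
        """Per-pixel loss weights that emphasize the land-water boundary.

        mask: binary array (1 = water, 0 = land). Weights decay with the
        Euclidean distance to the nearest pixel of the opposite class."""
        # distance_transform_edt(mask) gives, on water pixels, the distance
        # to the nearest land pixel (and zero on land); the complementary
        # transform covers land pixels, so the elementwise maximum yields
        # each pixel's distance to the class boundary.
        d = np.maximum(distance_transform_edt(mask),
                       distance_transform_edt(1 - mask))
        return 1.0 + w0 * np.exp(-(d ** 2) / (2.0 * sigma ** 2))

Multiplying the per-pixel cross-entropy by such a map concentrates gradient signal along the interface, which directly targets the boundary localization weakness discussed above.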

DifFeat predictions, meanwhile, require post-processing to mitigate tiling artifacts and occasional tundra misclassifications. These issues likely stem from local variability in cluster assignments and the absence of topographic context. Future work should therefore focus on automating cluster selection to improve spatial consistency across tiles and on integrating auxiliary datasets such as DEMs, which may enhance bluff edge detection by capturing subtle elevation transitions. Incorporating DEMs could likewise benefit supervised models such as U-Net by providing complementary elevation cues for boundary refinement. A further limitation is that both models were trained solely on cloud-free, open-water season imagery, limiting their generalization to imagery acquired under more challenging conditions such as residual snow, sea-ice cover, or cloud shadows. Incorporating post-processing workflows for cloud and shadow removal or spectral inpainting could help address these limitations.
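
As a simple example of the kind of post-processing meant here, the sketch below removes speckle and pinholes from a binary water mask with standard morphological filters; the size threshold is an illustrative value that would need tuning to the sensor's ground sampling distance.

    import numpy as np
    from skimage.morphology import remove_small_objects, remove_small_holes

    def clean_mask(water_mask, min_size=500):
        """Suppress isolated misclassifications in a binary water mask.

        Drops connected water components smaller than min_size pixels,
        then fills holes of the same scale, yielding a smoother mask for
        downstream interface extraction."""
        m = water_mask.astype(bool)
        m = remove_small_objects(m, min_size=min_size)       # remove speckle
        m = remove_small_holes(m, area_threshold=min_size)   # fill pinholes
        return m

Such filtering addresses isolated tundra misclassifications but not seams between tiles; those are better handled by overlapping tile inference or by the automated cluster selection proposed above.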

In summary, this study demonstrates that supervised and unsupervised segmentation approaches are complementary tools for Arctic coastal monitoring. U-Net offers structured, annotation-driven learning, whereas DifFeat enables scalable, data-efficient mapping without manual labels. DifFeat's core strength lies in grouping pixels with similar spectral and spatial characteristics into semantically meaningful clusters at varying levels of granularity, which an expert can then interpret and assign semantic labels. Together, these approaches advance the automated extraction of shoreline and bluff edge features, two key indicators of permafrost-driven land loss, establishing a foundation for scalable Arctic coastal monitoring and supporting future efforts in climate adaptation, risk assessment, and community resilience across rapidly changing Arctic permafrost landscapes.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

HB: Writing – review and editing, Formal Analysis, Writing – original draft, Software, Validation, Conceptualization, Methodology, Visualization. SV: Conceptualization, Funding acquisition, Writing – review and editing, Visualization, Validation, Data curation, Supervision, Methodology. OF: Supervision, Methodology, Project administration, Software, Formal Analysis, Resources, Validation, Writing – review and editing. SP: Data curation, Validation, Writing – review and editing. CT: Funding acquisition, Writing – review and editing.

Funding

The authors declare that financial support was received for the research and/or publication of this article. This research was supported by the Office of Polar Programs at the US National Science Foundation (Grant Nos: 1656026, 1836861, 1927373, 2318378, and 2322664), and the National Aeronautics and Space Administration (Grant No: 80NSSC21K1164).

Acknowledgements

The authors would like to thank the Utqiaġvik Iñupiat Corporation (UIC) Science and the Iñupiat people of the Utqiaġvik community for their continued support. Additionally, we thank past and present members of the Systems Ecology Lab (SEL) for help with data collection. Geospatial support for this work was provided by the Polar Geospatial Center (PGC) under NSF-OPP awards 1043681, 1559691, and 2129685. The authors acknowledge the use of ChatGPT (version 4, OpenAI) for assistance in grammar checking, rephrasing, and improving the overall readability of the manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that Generative AI was used in the creation of this manuscript. Generative AI, in particular, ChatGPT (version 4, OpenAI) was used for assistance in grammar checking, rephrasing, and improving the overall readability of the manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fenvs.2025.1657984/full#supplementary-material

References

Aryal, B., Escarzaga, S. M., Vargas Zesati, S. A., Velez-Reyes, M., Fuentes, O., and Tweedie, C. (2021). Semi-automated semantic segmentation of Arctic shorelines using very high-resolution airborne imagery, spectral indices and weakly supervised machine learning approaches. Remote Sens. 13, 4572. doi:10.3390/rs13224572

Bartsch, A., Ley, S., Nitze, I., Pointner, G., and Vieira, G. (2020). Feasibility study for the application of synthetic aperture radar for coastal erosion rate quantification across the Arctic. Front. Environ. Sci. 8, 143. doi:10.3389/fenvs.2020.00143

Beamish, A., Raynolds, M. K., Epstein, H., Frost, G. V., Macander, M. J., Bergstedt, H., et al. (2020). Recent trends and remaining challenges for optical remote sensing of Arctic tundra vegetation: a review and outlook. Remote Sens. Environ. 246, 111872. doi:10.1016/j.rse.2020.111872

Bengoufa, S., Niculescu, S., Mihoubi, M. K., Belkessa, R., Rami, A., Rabehi, W., et al. (2021). Machine learning and shoreline monitoring using optical satellite images: case study of the Mostaganem shoreline, Algeria. J. Appl. Remote Sens. 15, 026509. doi:10.1117/1.jrs.15.026509

Cassidy, G., Wiseman, M., Lange, K., Eilers, C., and Bradley, A. (2024). Seasonal coastal erosion rates calculated from PlanetScope imagery in Arctic Alaska. Remote Sens. 16, 2365. doi:10.3390/rs16132365

Clark, A., Moorman, B., Whalen, D., and Vieira, G. (2022). Multiscale object-based classification and feature extraction along Arctic coasts. Remote Sens. 14, 2982. doi:10.3390/rs14132982

Creel, R., Guimond, J., Jones, B. M., Nielsen, D. M., Bristol, E., Tweedie, C. E., et al. (2024). Permafrost thaw subsidence, sea-level rise, and erosion are transforming Alaska's Arctic coastal zone. Proc. Natl. Acad. Sci. 121, e2409411121. doi:10.1073/pnas.2409411121

Dang, K. B., Vu, K. C., Nguyen, H., Nguyen, D. A., Nguyen, T. D. L., Pham, T. P. N., et al. (2022). Application of deep learning models to detect coastlines and shorelines. J. Environ. Manag. 320, 115732. doi:10.1016/j.jenvman.2022.115732

Efimova, A., Bartsch, A., and Pointner, G. (2020). Arctic coastline mapping with Sentinel-2 data. AGU Fall Meet. Abstr. 2020, C003–C008.

European Space Agency (2025). WorldView-2 mission overview.

Fritz, M., Vonk, J. E., and Lantuit, H. (2017). Collapsing Arctic coastlines. Nat. Clim. Change 7, 6–7. doi:10.1038/nclimate3188

Gabarró, C., Hughes, N., Wilkinson, J., Bertino, L., Bracher, A., Diehl, T., et al. (2023). Improving satellite-based monitoring of the polar regions: identification of research and capacity gaps. Front. Remote Sens. 4, 952091. doi:10.3389/frsen.2023.952091

Heidler, K., Mou, L., Baumhoer, C., Dietz, A., and Zhu, X. X. (2021). HED-UNet: combined segmentation and edge detection for monitoring the Antarctic coastline. IEEE Trans. Geosci. Remote Sens. 60, 1–14. doi:10.1109/tgrs.2021.3064606

Irrgang, A. M., Bendixen, M., Farquharson, L. M., Baranskaya, A. V., Erikson, L. H., Gibbs, A. E., et al. (2022). Drivers, dynamics and impacts of changing Arctic coasts. Nat. Rev. Earth Environ. 3, 39–54. doi:10.1038/s43017-021-00232-1

Jones, B. M., Arp, C. D., Whitman, M. S., Grosse, G., and Nitze, I. (2018). A decade of remotely sensed observations highlight complex processes linked to coastal permafrost bluff erosion in the Arctic. Geophys. Res. Lett. 45, 4872–4881.

Juma, G. A., Meunier, C. L., Herstoff, E. M., Irrgang, A. M., Fritz, M., Weber, C., et al. (2025). Future Arctic: how will increasing coastal erosion shape nearshore planktonic food webs? Limnol. Oceanogr. Lett. 10, 5–17. doi:10.1002/lol2.10446

Kim, W., Kanezaki, A., and Tanaka, M. (2020). Unsupervised learning of image segmentation based on differentiable feature clustering. IEEE Trans. Image Process. 29, 8055–8068. doi:10.1109/tip.2020.3011269

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., et al. (2023). “Segment anything,” in Proceedings of the IEEE/CVF international conference on computer vision, 4015–4026.

McAllister, E., Payo, A., Novellino, A., Dolphin, T., and Medina-Lopez, E. (2022). Multispectral satellite imagery and machine learning for the extraction of shoreline indicators. Coast. Eng. 174, 104102. doi:10.1016/j.coastaleng.2022.104102

McFeeters, S. K. (1996). The use of the normalized difference water index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 17, 1425–1432. doi:10.1080/01431169608948714

Miller, C. A., Bonsell, C., McTigue, N. D., and Kelley, A. L. (2021). The seasonal phases of an Arctic lagoon reveal the discontinuities of pH variability and CO2 flux at the air–sea interface. Biogeosciences 18, 1203–1221. doi:10.5194/bg-18-1203-2021

National Snow and Ice Data Center (2023). Frozen ground and permafrost.

Nelson, P. R., Maguire, A. J., Pierrat, Z., Orcutt, E. L., Yang, D., Serbin, S., et al. (2022). Remote sensing of tundra ecosystems using high spectral resolution reflectance: opportunities and challenges. J. Geophys. Res. Biogeosciences 127, e2021JG006697. doi:10.1029/2021jg006697

Nielsen, D. M., Pieper, P., Barkhordarian, A., Overduin, P., Ilyina, T., Brovkin, V., et al. (2022). Increase in Arctic coastal erosion and its sensitivity to warming in the twenty-first century. Nat. Clim. Change 12, 263–270. doi:10.1038/s41558-022-01281-0

Nielsen, D. M., Chegini, F., Maerz, J., Brune, S., Mathis, M., Dobrynin, M., et al. (2024). Reduced Arctic Ocean CO2 uptake due to coastal permafrost erosion. Nat. Clim. Change 14, 968–975. doi:10.1038/s41558-024-02074-3

Park, S., and Song, A. (2024). Shoreline change analysis with deep learning semantic segmentation using remote sensing and GIS data. KSCE J. Civ. Eng. 28, 928–938. doi:10.1007/s12205-023-1604-9

Philipp, M., Dietz, A., Ullmann, T., and Kuenzer, C. (2022). Automated extraction of annual erosion rates for Arctic permafrost coasts using Sentinel-1, deep learning, and change vector analysis. Remote Sens. 14, 3656. doi:10.3390/rs14153656

Rantanen, M., Karpechko, A. Y., Lipponen, A., Nordling, K., Hyvärinen, O., Ruosteenoja, K., et al. (2022). The Arctic has warmed nearly four times faster than the globe since 1979. Commun. Earth Environ. 3, 168. doi:10.1038/s43247-022-00498-3

Ronneberger, O., Fischer, P., and Brox, T. (2015). "U-Net: convolutional networks for biomedical image segmentation," in Medical image computing and computer-assisted intervention (MICCAI) (Springer), 234–241.

Rouse, J., Haas, R., Schell, J., and Deering, D. (1974). "Monitoring vegetation systems in the Great Plains with ERTS," in Third Earth Resources Technology Satellite-1 Symposium (Technical presentations, NASA), 1, 309–317.

Scala, P., Manno, G., and Ciraolo, G. (2024). Semantic segmentation of coastal aerial/satellite images using deep learning techniques: an application to coastline detection. Comput. Geosciences 192, 105704. doi:10.1016/j.cageo.2024.105704

Schuur, E. A., McGuire, A. D., Schädel, C., Grosse, G., Harden, J. W., Hayes, D. J., et al. (2015). Climate change and the permafrost carbon feedback. Nature 520, 171–179. doi:10.1038/nature14338

Wenzl, M., Baumhoer, C. A., Dietz, A. J., and Kuenzer, C. (2024). Vegetation changes in the Arctic: a review of Earth observation applications. Remote Sens. 16, 4509. doi:10.3390/rs16234509

Yang, T., Jiang, S., Hong, Z., Zhang, Y., Han, Y., Zhou, R., et al. (2020). Sea-land segmentation using deep learning techniques for Landsat-8 OLI imagery. Mar. Geod. 43, 105–133. doi:10.1080/01490419.2020.1713266

Zhou, X., Wang, J., Zheng, F., Wang, H., and Yang, H. (2023). An overview of coastline extraction from remote sensing data. Remote Sens. 15, 4865. doi:10.3390/rs15194865

Keywords: deep learning, segmentation, shoreline, bluff edge, U-Net, differentiable feature clustering, satellite imagery, unsupervised learning

Citation: Bagavathyraj H, Vargas Zesati S, Fuentes O, Peterson S and Tweedie CE (2026) Segmentation of arctic coastal shoreline and bluff edges using optical satellite imagery and deep learning. Front. Environ. Sci. 13:1657984. doi: 10.3389/fenvs.2025.1657984

Received: 02 July 2025; Accepted: 06 November 2025;
Published: 21 January 2026.

Edited by:

Peng Liu, Chinese Academy of Sciences (CAS), China

Reviewed by:

Dustin Whalen, Geological Survey of Canada, Canada
Giulio Passerotti, The University of Melbourne Department of Computing and Information Systems, Australia
Heidi Rodenhizer, Woods Hole Research Center, United States

Copyright © 2026 Bagavathyraj, Vargas Zesati, Fuentes, Peterson and Tweedie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Harshavardhini Bagavathyraj, hbagavathyr@miners.utep.edu
