- 1School of Agriculture and Science, University of KwaZulu-Natal, Durban, South Africa
- 2Department of Geography, University of South Africa, Florida Campus, Johannesburg, South Africa
Effective and accurate land use and land cover classification (LULC-C) is an indispensable exercise for various environmental management objectives, including past and future land-use dynamics, flood and runoff modelling. However, LULC-C is subject to several limitations, such as labour-intensive derivation of a labelled dataset. So, we aim to enhance LULC-C using auxiliary features: elevation, slope, aspect, distance from-(road, railway stations, rivers, water, and town), global human settlement built-up layer and remote sensing indices in the heterogeneous landscape of eThekwini (EM) and Nelson Mandela Bay Metropolitan (NMBM) using Sentinel-2 and random forests (RF). We compared two classification scenarios: (1) feature set including bands and indices, and (2) feature set including bands, indices, and auxiliary features. We trained and tested RF using block cross-validation and random hold-out (70/30) split and validated the classified image using independent validation and 30% subset, through overall accuracy (OA) and F1-score. The study quantified the uncertainty using a 95% confidence interval with bootstrapping samples of 1000 iterations, and quantify the significance of scenario 2 using McNemar and p-value. Pixel-wise quantity and allocation disagreement were derived to compare classification scenarios against the two 2020 reference maps for South African National Land Cover and Environmental System Research Institute. A class-by-class pixel comparison between classification scenarios underscores the potential of auxiliary features. While classification scenarios achieved comparable accuracy, scenario 2 superseded scenario 1 in all classification scheme. Using an independent validation, the study found confidence interval (CI) for OA of 83.63% CI: 77.78–88.89 improved to 89.47% CI: 84.79–94.15, respectively, for scenario 1 and scenario 2 over EM. Confirmed by NMBM, where OA of 82.29% CI: 76.57–87.43 stabilised to 88.57% CI: 84.00–93.14 for scenario 1 and scenario 2. The performance improvement was statistically significant, attaining p-values of 0.03 and 0.02, respectively, for EM and NMBM using an independent validation set. However, while using 30% validation subset, results show in-significant improvement in NMBM attaining p-value =
1 Introduction
Land use and land cover classification (LULC-C) have become an indispensable tool in environmental monitoring and to understand the past, present and future land use dynamics (Kgaphola et al., 2023). It is also critical for biodiversity change analysis (Musetsho et al., 2021), flood mitigation and mapping (Ashfaq et al., 2025), and groundwater modelling (Shandu and Atif, 2023). LULC-C have gained prominence over the years and has escalated research evaluating the land use and land cover change (Nxumalo et al., 2025) as well as future land use forecasting (Tahir et al., 2025). This enables the derivation of a conclusion pertaining alteration of the ecosystem, such as foreseeing an increase in built-up areas (Adam et al., 2023), change in the distribution of forest and vegetation cover (Musetsho et al., 2021) which can increase susceptibility to flooding. Understanding LULC configuration is relevant in modelling change trajectories, which aid in human actions and policy directions (Wang et al., 2024). The spatial and temporal characteristics of LULC change constitute a key feature in the strategic planning for climate, biodiversity, hydrology, forestry, food security, and urban expansion (Zhai et al., 2021), to achieve sustainable land development goals (Wang et al., 2024). Among other applications, LULC-C have been used to understand the change of the African Great Lakes and future anticipation (Kayitesi et al., 2024), urbanisation and surface temperature relationship (Thi Hang and Alshayeb, 2025), as well as impact on hydrological process (Alawi and Özkul, 2023; Bhungeni et al., 2025).
Countless research has shown that accurate LULC-C can provide insight into understanding land transformation and allows characterisation of the ways in which diverse human activities are altering the Earth's surface (Zhang et al., 2025), by identifying a shift in rural-urban migration (Nahid et al., 2025). Changes such as urbanisation and the development of human settlement threaten a watershed system (Magidi et al., 2024). Therefore, as human-driven transformation is increasingly noticeable, the need for precise and timely classification continues to grow, aiding in up-to-date and accurate information from data collection (sensor) to the algorithm (classification and forecasting) (Giri et al., 2005).
The South African east coast, particularly eThekwini (EM) and Nelson Mandela Bay Metropolis (NMBM), is faced with urbanisation leading to land alteration and conversion. Decline in vegetation and forest cover to built-up, paved and asphalt surfaces accelerates the rate of water flow while reducing the rate of infiltration. One of the notable environmental problems around eThekwini has been the loss of forest and vegetation (Buthelezi et al., 2024b). South Africa is characterised by the challenging accessibility of high-resolution data that can help determine LULC at a fine scale. Nonetheless, the region is gaining more attention pertaining to environmental analysis (Buthelezi et al., 2024a).
The evolution of remote sensing satellite systems has demonstrated remarkable strides in providing improved data sources for mapping LULC. While satellites are often affected by temporal, spatial, spectral, and radiometric characteristics, they have been renowned for their efficient data collection at a large scale and in inaccessible regions (Abdelmajeed and Juszczak, 2024). Remote sensing satellites provide data in multi-spectral and hyperspectral bands, enabling the retrieval and characterisation of various surface features. These spectral elements are uniquely sensitive to various surface objects, and their combinations can facilitate the detection of subtle changes (Cristille et al., 2024). Publicly available products such as Landsat (5, 7, 8, 9) and Sentinel-2 have been widely used due to their accessibility. Despite their utility, these sensors are unavailable for accurate identification of fine-scale landscape features, and commercial sensors such as PlanetScope offer improved spatial resolution (Acharki, 2022). Still, they are often limited by cost and access.
Leveraging advances in satellite data, machine learning (ML), and deep learning (DL) has greatly improved image classification by offering high accuracy and adaptability. Among these reliable algorithms that have demonstrated robust performance in tasks are random forest (RF) (Badshah et al., 2024; Kasahun and Legesse, 2024) and support vector machine (SVM), (Bhungeni et al., 2024; Faheem et al., 2024). Given the ML and DL effectiveness in LULC-C, they strongly rely on the effective labelled dataset, which can be time and labour-intensive to derive (Xie et al., 2025). Remote sensing satellite indices have long been used as an addition and supplement to LULC-C. Indices such as the normalised difference vegetation index (NDVI) (Yimer et al., 2024), normalised difference water index (NDWI) (Qu et al., 2021), modified normalised difference water index (MNDWI) (Amini et al., 2022), enhanced vegetation index (EVI) (Hurskainen et al., 2019), normalised difference moisture index (NDMI) and index-based built-up index (IBI) are fundamental (Amini et al., 2022).
In addition to LULC-C with satellite bands and indices, the literature has evidence that the addition of auxiliary features in LULC forecasting is effective. To predict future LULC, elevation, slope, and population (Zhang et al., 2023), proximity (road, built-up, rail) and aspect (Indraja et al., 2024), are being utilised. Among other studies have evaluated parameters such as geomorphology, soil type (Zoungrana et al., 2015), population (people per pixel) (Hurskainen et al., 2019) precipitation and soil temperature (Kavzoglu and Bilucan, 2023).
This indicates that supplementary data may capture contextual environmental and topographic information that is not readily apparent in raw spectral data (Setiawan et al., 2024; Thi Hang and Alshayeb, 2025). However, there is limited evidence on how these auxiliary parameters could influence the present LULC-C. Therefore, this aids the need for empirical evidence investigating auxiliary datasets that can enhance LULC-C.
The present study shifts from the traditional use of satellite bands and indices. It aims to fill the gap by evaluating the effect of incorporating auxiliary datasets in LULC-C using the RF algorithm over two distinct study regions (EM and NMBM) based on spectral-rich Sentinel-2 data. We set two different LULC-C scenarios, (1) with Sentinel-2 bands and indices, and (2) with Sentinel-2 bands, indices and auxiliary features. Specifically, we investigated auxiliary features such as elevation, slope, aspect, distance from road, railway stations, rivers, water, town, and the global human settlement layer (GHSL) built-up. In parallel, we used Sentinel-2 bands and indices such as NDVI, NDWI, MNDWI, EVI, water ratio index (WRI), NDMI, IBI, normalised difference built-up index (NDBI), bare soil index (BSI) and automatic water extraction index no-shadow (AWEI_nsh). These datasets have not been simultaneously explored in LULC-C approaches. The RF classifier was chosen based on its superiority in LULC-C. Model performance was assessed using a confusion matrix to derive overall accuracy, producer accuracy, user accuracy, F1-score, error omission and error commission. Additionally, spatial block cross-validation and RF model tuning were employed to select the best parameters. Quantity and allocation disagreement were used to compare our classification scheme against the two 2020 reference maps, the (1) Environmental System Research Institute (ESRI) global LULC and (2) South African National Land Cover (SANLC) dataset. Comparing scenarios 1 and 2 against SANLC and ESRI gives an idea of which scenario performs better. Rather than comparing model accuracy, we extend to compare model disagreement with acceptable classification maps.
This work contributes to a better understanding of an integrated framework for LULC mapping, particularly in a heterogeneous landscape using auxiliary features. Enhancement of LULC classification using auxiliary features reduces labour-intensive derivation of labelled dataset and improves class separability, increasing the degree of accuracy and confidence in land cover analysis and policy management.
2 Methodology
2.1 Study area
The eThekwini (EM) and Nelson Mandela Bay (NMBM) Metropolitan Municipalities (Figure 1) are two among eight metropolitan municipalities of South Africa, along the east coast, bordered by the Indian Ocean on the eastern seaboard. EM and NMBM features different landscapes, and the majority of the EM landmass is made up of urbanised districts, such as the city of Durban, which is encircled by rural villages and peri-urban residential areas and informal settlements (Buthelezi et al., 2024c). The EM is featured by complex topography, transitioning from the coastal zone to the hill inland region on the west and then to the foothills of the Great Drakensberg Mountains. While EM elevation ranges between 0 and 874 m above sea level, the east coast featured by elevation < 131 m, making it favoured for development. Whilst the region is favoured by forest cover (Buthelezi et al., 2024b), it accounts for agricultural practice, especially in the north-eastern part. EM also accounts grassland and wetland prolonged on the river mouth of the uMngeni river.
Figure 1. Spatial location of the two among South African metros (eThekwini and Nelson Mandela Bay Metropolitan Municipalities). Reference A–F were used to visualise and compare the classification schemes (scenario 1 and 2, SANLC and ESRI-LC).
Similar to EM, NMBM have witnessed rural-urban migration, leading to population growth and conversion of land cover to built-up areas. The major city of the NMBM, Port Elizabeth, known as Gqeberha, is the oldest serving city in South Africa, which has witnessed a high rise of industries, residential, retail, and farming in the periphery of the city (Odindi et al., 2012). Nelson Mandela Bay is characterised by elevation ranges between 0 and 894.80 m, whilst characterised mainly by elevation ranges below 291 m. The landmass of NMBM follows a pattern where the central to east portion is highly developed. The south and west are favoured by forest and agriculture. NMBM features diverse South African biomes, including grassland, which characterises different species of grass. Due to its landscape, sloping towards the east, NMBM is characterised by wetland, which dominates especially over slopes of 39%, where over 80% of them are considered depressions, seeps and wetland flats (Melly et al., 2017). References A–F in Figure 1 are randomly selected regions which were used to compare the visual agreement of classification scenarios against reference classifications (Figure 9).
2.2 Data and materials
2.2.1 Study framework
The integration of indices and auxiliary features into LULC-C was conducted using the conceptual framework (Figure 2). The acquired dataset includes auxiliary data, Copernicus Sentinel-2 harmonised surface reflectance image, 2020 SANLC (Ngcofe and Nkoana, 2023) data available at (https://gee-community-catalog.org/projects/sa_nlc/) and 2020 ESRI-LC (Karra et al., 2021) open source global dataset available at (https://livingatlas.arcgis.com/landcoverexplorer/). The steps follow a pre-processing, data conversion, and projection of data to a local coordinate reference system. Training and testing sets were derived in Google Earth Pro in polygons, and random points were generated within those polygons in Google Earth Engine (GEE). To derive polygons in Google Earth Pro, we utilised imagery acquired in July 2020 and August 2020, respectively, for EM and NMBM. GEE cloud-based platform facilitates data storage and geoprocessing, making it a greater interest for scientific research (Gorelick et al., 2017; Velastegui-Montoya et al., 2023). Furthermore, additional validation points were generated in GEE. During the validation process, the study follows a two-fold validation, block-cross validation and random hold-validation. After selecting and applying the best tuned model, we validated the classified raster using a 30% validation (subset of random split) and independent validation. Further comparison was done to evaluate the disagreement between classification scenarios against SANLC and ESRI-LC using quantity and allocation disagreement. While SANLC is a local land cover map, ESRI-LC acts as a benchmarking comparison. We used SANLC and ESRI-LC to evaluate whether our classification approach with auxiliary features provides reliability and adaptability in both local and global scales. Both these referenced datasets are products of Sentinel-2, where their reported reliable overall accuracy, 91.5% for SANLC (Ngcofe and Nkoana, 2023), and 75% for ESRI-LC (Karra et al., 2021). Overall, this study used ArcGIS, Python (Jupyter notebook) and GEE.
Figure 2. The representation of the conceptual framework followed. This flow chart summarises the data collection, pre-processing, preparation training tuning and testing of the models and validation of the classified raster.
2.2.2 Satellite data
The spectral-rich Sentinel-2 dataset with 10–60 m spatial resolution and a 5-day temporal resolution (Kavzoglu and Bilucan, 2023) was employed. Satellite images were obtained from the publicly available GEE data catalogue for June–August 2020, not investigating seasonal change; instead, we computed the median collections. Sentinel-2 data Level 2 collection, which is geometrically corrected surface reflectance, was used in this study. The selection of Sentinel-2 was based on factors such as accessibility, proportion of cloud cover, spectral and spatial characteristics. We further selected bands (blue, green, red, and near infrared) with 10 m resolution, and short-wave infrared (SWIR) (SWIR-1 and SWIR-2) with 20 m spatial resolution, for land monitoring. Two bands (SWIR1 and SWIR2) were up-sampling to correspond 10 m spatial resolution. We applied bilinear interpolation to resample SWIR bands in ArcGIS. We observed that resampling SWIR bands can introduce a spectral distortion and reduced classification accuracy. However, it is important to harmonise SWIR bands if they will be used alongside 10 m bands. Among other studies have used cubic interpolation (X. Huang et al., 2024), nearest neighbour to down sample 10 m bands to 30 m (Sisay et al., 2023) and hierarchical fusion network to sharpen 20 and 60 m bands to 10 m (Wu et al., 2023).
2.2.3 Remote sensing indices
In this study, we investigated the effectiveness of NDVI, NDWI, MNDWI, EVI, WRI, NDMI, BSI, NDBI, and IBI, composed of several indices such as NDBI, NDVI, MNDWI, as well as AWEI_nsh, which is effective in removing dark built-up areas and other non-water pixels (Khalid et al., 2021), computed in GEE using equations in Supplementary Table A1. The contribution of each index to a model predictive performance was tested, and indexes were employed respectively in each classification scheme.
2.2.4 Auxiliary data
The auxiliary features describing environmental, human, and topographic factors which are otherwise not directly recorded by the satellite spectral reflectance is summarised in Table 1. Among these auxiliary features, proximity layers, distance from road, town, and railway station, record geographical patterns of human settlement and infrastructure. Whereas terrain elements (slope, aspect, elevation, and topographic position index) aid in identifying vegetation and land usage in mountainous or riparian areas, and rivers and waterbodies identify resources needed by humans and the location of water.
Table 1. Relevance of auxiliary features to land use and land cover classification schemes and class separability.
The topographic dataset (elevation, slope, and aspect) was derived from a digital elevation model (DEM) computed from 25 m contour lines sourced from the South Africa geospatial portal of National Geo-spatial Information (NGI) (http://www.cdngiportal.co.za/CDNGIPortal/). Topographic position index (TPI), aspect and slope are all derivatives of the elevation model. NGI is considered a component of the Department of Agriculture, Land Reform and Rural Development (DALRRD). To derive DEM, we followed the process of converting contour lines to triangulated irregular networks (TIN), and TIN to raster using 3D Analyst in ArcGIS. The initial dataset (road, town centres, railway stations) was sourced from an open-source database (https://hub.tumidata.org/ and https://gis-ethekwini.opendata.arcgis.com/). In this study, we used both water bodies and rivers, hypothesising that they differ in terms of water availability. The river dataset obtained from the government institution of the South African Department of Water and Sanitation, freely available at (https://www.dws.gov.za/iwqs/gis_data/river/rivs500k.aspx), and water bodies were initially derived from GEE sourced from Joint Research Centre (JRC) Global Surface Water Mapping Layer V1.4 (https://code.earthengine.google.com/), accounting for permanent water bodies, consisting of 30 m resolution, detailed by Pekel et al. (2016). Furthermore, proximity parameters, distance from the town centre, road, railway station, rivers, and from waterbodies were computed using Euclidean distance in ArcGIS. While using Euclidean, we accounted for the edge effect by incorporating nearby features into equations. For instance, we not only used cities within each metro to calculate Euclidean distance and derive a raster. This process helps smooth edges. Euclidean distance indicates the probability of class belonging, such that a small distance to a class indicates a high likelihood that the object belongs to that class (Huang and Jia, 2012). Additionally, a global human settlement layer (GHSL) built-up describing artificial places where people live (Pesaresi et al., 2024), available in GEE. We defined and used an image collection (JRC/GHSL/P2023A/GHS_POP/2020). In addition to auxiliary features, the study derives K-Means clustering to guide the unsupervised learning process and the class aggregation technique (F. Zhu et al., 2025). The derived auxiliary variables were up-sampled to 10 m using a bilinear kernel, corresponding to Sentinel-2 resolution, and orthorectified into the local reference system using ArcGIS South Africa National Coordinate System Universal Transverse Mercator (UTM), Haartebeeshoek 1994 datum (Longitude of zone origin 31 and 25) respectively for EM and NMBM.
2.3 Class description
The identification and distinction of several LULCs in this study are based on the South African land cover class level 1 (SALCC1) provided by the Department of Forestry, Fisheries and the Environment (DFFE). The South African land cover classification based on level 1 (class) and level 2 (subclass) is described in Table 2. The South African landcover classification is based on the hierarchical structure derived from the South African Spatial Data Infrastructure (Act No. 54 of 2003) (DRDLR, 2017). For the current study literature, we aggregated the class of grassland together with shrubland (fynbos, succulent karoo, nama karoo), which is dominant especially in the Western Cape and Northern Cape. Also, we integrated quarries and barren areas into one class (barren) due to their similar spectral reflectance. Although LULC-C can be aggregated into numerous classes, this study investigates those that are fundamental for hydrological system and flood alteration. Where classes like built-up areas limit infiltration and increase the chance of runoff, while forests increase the rate of infiltration and reduce runoff.
While the two metros have a heterogeneous landscape, there is a great difference between their wetland ecosystems. eThekwini has a complex wetland system challenging to identify unlike NMBM. eThekwini wetlands systems are mainly found in the floodplain associated with rivers like uMngeni.
2.4 LULC variable extraction and selection
Feature importance and selection is fundamental in machine learning, and this helps to select the most influential parameters (Amoakoh et al., 2021), using various statistical measures (Qu et al., 2021). This helps a balanced distribution among the datasets (Jastrzębski et al., 2016; Kumaraperumal et al., 2025). For this study, we investigated RF feature importance, Pearson correlation coefficient, SHaply Additive exPlanation (SHAPs) values per class, and Chi-square test in Python. All these methods are used for feature selection but with distinct functions of assessing relationship between variables. According to our knowledge, combining this method provides a solid foundation for feature selection.
We determined the correlation among variables; those with a correlation threshold of 0.90 indicate redundancy. Together with the correlation matrix threshold, we selected one with exceedingly high feature importance and with exceptionally high Chi-square; this gives enough evidence on the contribution of bands, indices and auxiliary features in classifying LULC maps. While confirming no standard pipeline between RF feature important and Chi-square, we used Chi-square to assess the statistical association between each feature and the target class, helping us to deprioritise weakly informative features. Not formally integrating Chi-square with other feature selection techniques into a unified ranking algorithm poses a limitation. This study adopted an expert-driven selection approach.
2.4.1 Variable correlation matrix
The study assesses the Pearson correlation between variables to prevent the use of redundant variables in LULC-C. We set a threshold of 90% (0.90), where a variable with a correlation between 0.90 and 1.00, their significance was confirmed by high SHAP values. Highly correlated variables may somewhat look like duplicate variables, where using both does not add value to LULC-C scenarios. A correlation matrix was developed, highlighting the correlation between all variables (selected bands, indices and auxiliary features), individually for both study areas.
2.4.2 Feature importance
To evaluate the potential of each auxiliary and index variable in classifying LULC, we employed RF-based feature importance scores in Python. The feature importance ranked the parameters according to relative importance score between (0–1), equating (0%–100%). These tell the contribution of each variable in classifying LULC classes. This is a straightforward approach that has been explored by numerous researchers in determining variable contribution to prediction (Buthelezi et al., 2024c).
2.4.3 SHAP values
The SHaply Additive explanation (SHAP) (Lundberg et al., 2017), is widely known methods used to evaluate feature attribution (Hosseiny et al., 2022; Lamane et al., 2025). The SHAP assigned each variable an important value for prediction (Lundberg et al., 2017). While giving details on which feature impact model predictive performance by removing one feature at a time and trying the combination of features to see how the new combination change’s predictive ability (McCarty et al., 2021). The SHAP values comprise of two parts: (1) game, referred to the outcome of the learning model and (2) players, which indicate feature of the model (Kavzoglu and Bilucan, 2023). SHAP can be used to rank feature importance while giving details per class feature contribution.
2.4.4 Chi-square
Chi-square feature evaluation ranking features by their statistical dependency on the target variable was utilised similarly to (Ma et al., 2017). The comparative tests of independence can be applied using the chi-squared approach, by assessing two types of comparison: the test of goodness of fit and the test of independence (Zhao et al., 2010.). The ranking of all features was obtained by computing the chi-squared score of the classes. To apply the chi-squared statistics to identify data discrepancies, the numeric properties were converted to discrete values (Liu and Setiono, 1995; Ma et al., 2017).
2.5 Random forest model development
2.5.1 Training, testing and validation datasets
For model training and testing data, we used a Google Earth Pro dated to the same months as the Sentinel-1 dataset (June-August 2020) to derive multiple polygons for various classes. The derived polygons in a compressed version of Keyhole Markup Language (KML) were converted into shapefiles in ArcGIS to be used in GEE. We generated, random points within those polygons in GEE (Sothe et al., 2017), preparing training and testing in point form. The choice of deriving random points within polygons were motivated by the fact that generating labelled points in GEE or Google Earth Pro can be subjective (Mawasha and Britz, 2022) and labour-intensive. The approach of deriving random points inside polygons reduces subjectivity and gives a systematic and repeatable sampling strategy, which matches the approach of random sampling classes in remote sensing. Random sampling within defined polygons is computationally efficient and covered the scope of this study. The additional validation points were solely derived in GEE, accounting for factors like subjectivity. The validation points did not form set with training and testing label data. The validation set was used to validate the classified raster by extract raster values and computing a matrix of the classified raster against true observed classes.
This study employed two ways testing strategies. We conducted a spatial cross-validation and random hold-out validation using training and testing (labelled) samples to evaluate model robustness under different sampling schemes. During a spatial block cross-validation (block-cv) process, we applied RF tunning and assessed model performance-based on macro-F1 score. After carefully analysing a best tuned mode, we extracted RF parameters which were then applied to random hold-out approach. These two approaches (block-cv and random hold-out) were compared against each other. Finally, random hold-out approach was used to classify images.
Cross validation is used to test model performance, where the training set is split into different sections based on chosen cross-validation methods. It trained model on the data in all training folds and test model on the remaining test folds (Sweet et al., 2023). The final model performance is evaluated by averaging the scores of test folds. Spatial similarity and structure in the dataset can lead to model prediction error, when covariate predictors allow model to fit these structural patterns (Roberts et al., 2017). Spatial block-cv has a potential to minimise the spatial dependency problems. While splitting data strategically rather than randomly, block-cv accounts for important choices such as block-shape, block size, number of folds, and assignment of block to folds (Stock, 2025).
The study applied block-cv using 0.01° grid approximately 1 km by 1 km in both stud areas, which was then converted to feature collection. This fine grid was chosen based on multiple-testing and observation of model outcome. The motivation of choosing 0.01°, follows the testing approach that was done. We observed 0.001° to result in high accuracy, however it can be associated with autocorrelation.
Each block was assigned a unique number and cross validation folds were formed by leaving entire spatial block out for testing while training on the remaining blocks, Table 3. A total of 5-folds (Wang et al., 2023), was created and assigned to blocks. This ensure that training and testing samples are spatially independent, unlike random hold-out split that can lead to spatial leakage due to the spatial autocorrelation of satellite pixels. By that, grid search was performed using five RF hyperparameters defined in Section 2.4.3 in Table 7. We used macro-F1 to select the best model for classification scenario 1 (feature set including bands and index) and scenario 2 (feature set including bands, indices and auxiliary layers). The best model with parameters was used to train random hold out.
Following block-cv, we adopted a random hold-out with 70/30 training and testing split to meet our ML model utilisation. The same training sets were used in different modelling scenarios to train the models. Using the same training and testing set ensures that differences in classification accuracy are solely due to input variables, being referred to as auxiliary, bands and indices for this study scope.
The resulting labelled samples with blocks were statistically aggregated and assigned to 5-folds. The samples count per fold are fairly event which was ideal for block-cv. The EM slightly range between 333 and 415 whereas NMBM ranges between 175 and 202 (Table 4). These differences in fold distribution between metros was due to the number of derived labelled dataset.
We further computed the fold mean class prevalence for both metros (Table 5). The mean prevalence shows that Built-up dominated both regions, especially in EM where is accounted for 125 blocks versus 48 in NMBM. Wetland on the hand are relatively small. While the built-up areas dominated the training dataset, the classification task was affected by class imbalance. We therefore evaluated per-class producers’ accuracy (PA), user accuracy (UA) and F1-score to better represent minority classes (Supplemetary Tables A1–4). The inclusion of auxiliary features improved representation of minority classes, increasing F1-score, PA and UA.
In this study, we rigorously trained ML model. Firstly, after evaluating the performance of block-cv and random hold-out. We compared their efficiency by deriving delta macro- and micro-F1. Secondly, after training and applying random hold-out RF, we classified raster and evaluated model performance using (1) 30% random hold-out test and (2) using independent validation set, which was random chosen ensuring that points do not fall within training and validation polygons (Supplementary Figure A1). The 30% subset testing approach which is normally used in supervised classification techniques was used together with independent validation to validate classified raster. The study acknowledge the challenge of accessing high spatial resolution data enabling identification and labelling of wetland (Table 6).
Table 6. The aggregated labelled samples used during random hold-out model training, and validation. Training and testing data are subset of 70/30 and independent validation referred to extra derived dataset that do not fall in 70/30 split.
2.5.2 Random forest model
A RF classifier by Breiman (2001) was chosen because of its resilience to overfitting (Z. Zhu et al., 2016), ability to reduce outliers, ability to characterise high-dimensional data (Breiman, 2001), relatively easy parameterisation (Hurskainen et al., 2019), and robustness against collinear data (Qu et al., 2021), excelling in both classification and regression (Faheem et al., 2024). Each RF tree is defined using randomly chosen samples with replacement (Kavzoglu and Bilucan, 2023). By selecting a random selection of training samples and minimising the correlation with other trees, the RF classifier essentially uses bootstrap aggregation to lower the variance in decision trees before estimating their parameter. Then the output from each decision tree is compiled, and an unweighted mean and majority vote of all the trees is used to produce the outcome (Hosseiny et al., 2022). RF model is flexible and can be further improved by feature reduction (Qu et al., 2021).
2.5.3 Parameter tuning
ML algorithms consist of hyperparameters that need adjustment (tuning) for the model to perform at its best. In RF, parameters must be set: (1) the number of trees to be developed (ntrees/numberOfTrees) in a model and (2) the number of features used at each node to produce a tree (mtry/variablePerSplit) (Hurskainen et al., 2019), (3) minimum leaf population (minLeafPopulation) referred to a minimum number of samples required to form a leaf node where smaller values results in deeper trees, (4) maximum nodes (maxNodes) defining the number of node per tree, limiting trees depth indirectly, (5) bagged fraction (bagFraction) (Svoboda et al., 2022) describing the fraction of training data used per tree also known as bootstrapping; and finally (6) (seed) which is used for reproducibility and was fixed for consistent model training (Table 7). It is recommended to select the ntree parameter so that repeated RF runs yield reliable results (Breiman, 2001).
Table 7. The key RF model hyperparameter tuning used in the study to find the optimal model. The study used model 3 parameters after evaluating the mean macro-F1 score, which is illustrated in Supplementary Figure A1.
During training of spatial block-cv, we select best model (model 3) by comparing average macro-F1 for both scenarios and region. A model with high macro-F1 was used to select parameters for random hold-out classification. This study did not investigate OOB error for spatial-cv, unlike during training of random hold-out. For spatial-cv we prioritise external validation metrics (OA, micro- and macro-F1) to give insight into model diagnostic. However, during block-cv tuning RF models proved reliability in predicting LULC class, scoring accuracy between 80% and 97% (Supplementary Figure B1). This suggests that defined parameters were comparably effective where scenario 2 supersede scenario 1.
2.5.4 Accuracy assessment and model testing
The overall scope of this study was to investigate the impact of integrating auxiliary features into traditional LULC-C; therefore, evaluating the performance of this approach at different levels proves that classification is significant. We extended the traditional approach of validation, which uses a traditional matrix to test model effectiveness. This accounts for testing model performance using a subset of 30% validation set and using additional validation points to extract classifications raster values for evaluation. Corresponding to 30% subset testing, overall accuracy (OA), producer accuracy (PA), user accuracy (UA), omission error (OE), commission error (CE), and F1-score were used. We did not consider kappa for model testing, considering the kappa shortcoming (Pontius and Millones, 2011), and has been discouraged (Olofsson et al., 2014).
2.5.4.1 Model testing
We derived a confusion matrix using the validation set and calculated the overall accuracy (OA) (Tesfaye et al., 2024). Secondly, from the confusion matrix, we calculated producers’ (PA) and users/consumers’ accuracy (UA or CA) (Qu et al., 2021) and F1-score, and omission error (OE), together with commission error (CE). All metrics calculation and derivations were concluded in percentages.
In parallel, these can be calculated using true positive (TP) referred to as a hit, false negative (FN), referred to as a miss, false negative (FN) referred to as a false alarm, and true negative (TN), referred to as correct negatives or correct rejections. Common concepts such as precision and recall evolve from producer and user accuracy. While these validation measures scaled between 0% and 100%, the remaining portion presents error (omitted from correct classification). Where error of omission can be calculated using producer accuracy, and error of commission is calculated using user accuracy.
The out-of-bag error (OOB) was used to confirm the model training efficiency and model training accuracy. The model OOB error estimates serve as an unbiased measure of the generalisation of the model (Rahmati et al., 2016). Further information on OOB can be found in the study Wiesmeier et al. (2011) and Hakim et al. (2022).
2.5.4.2 Classified images comparison with reference maps
In addition to the model testing matrix, we used an additional validation set to assess the classified images for scenarios 1 and 2 against ESRI-LC and SANLC. We investigated the degree of disagreement (Pontius and Millones, 2011; Pontius et al., 2025) in Python. This approach evaluates the efficiency of classified scenarios against reference maps. This was achieved by computing the matrix in GEE between scenario 1 vs. ESRI-LC, scenario 2 vs. ESRI-LC, scenario 1 vs. SANLC and scenario 2 vs. SANLC. For this purpose, we computed quantity disagreement (QD), allocation disagreement (AD), total disagreement (TD) and map agreement index (MAI). These metrics were employed solely to evaluate disagreement between classified maps.
The overall disagreement was decomposed into the allocation and quantity disagreement (Pontius and Millones, 2011). The allocation disagreement compares the spatial arrangement of pixels, while quantity disagreement assesses the disparity in the quantity of a specific class between the classified and reference map (Pontius and Millones, 2011; Warrens, 2015). Moreover, quantity disagreement can address a question concerning the size of classes, while, on the other hand, allocation disagreement addresses a question concerning the allocation of classes. This agreement goes beyond explaining disagreement, where metrics such as Kappa focus on the agreement (Pontius et al., 2025). However, it has been facilitated by literature such as (Warrens, 2015). In the context of overall disagreement, there is no value regarded universally as an indication of good agreement. However, the indication should be interpreted in the context of the study in which it is part. The lower the value of overall disagreement, the higher the agreement between the reference map and the comparison. However, if the overall disagreement between reference and comparison is substantial, the measure can be used to investigate allocation and quantity disagreement. In conditions where quantity disagreement is relatively higher, this means there are substantial differences in the categories of the maps. On the other hand, where allocation disagreement is relatively higher, this indicates that there are substantial differences in the spatial allocation of the categories (Warrens, 2015). Further information on the evaluation and comparison of metrics, including allocation and disagreement, can be found in the literature (Pontius and Millones, 2011; Warrens, 2015; Pontius et al., 2025).
The SANLC map sourced from DFFE was originally derived from Sentinel-2 multispectral imagery using Computer Automated Land Cover (CALC) over South Africa for 2020, reporting 91.5% overall accuracy (Ngcofe and Nkoana, 2023). This dataset consists of 73 land cover classes, gazetted standards by South African National Standards (SANS 19144-2). To be used for validation in this study, we aggregated these classes to SALCC1, meeting the classification criteria used in this study. Since this was provided by a government department, it is believed to be accurate as reported.
On the other hand, the ESRI World Cover dataset provides a global land cover dataset for multiple time points, 2017–2024, at 10 m resolution based on Sentinel-1 data, which can be sourced from the application (https://livingatlas.arcgis.com/landcover/) and available in GEE. The study employed the 2020 ESRI World Cover dataset, which consists of 9 land cover classes (water, trees, flooded vegetation, crops, built-up, bare ground, snow/ice, clouds and rangeland) with an average accuracy of over 75% (Karra et al., 2021). Both reference maps (SANLC and ESRI-LC) were reprojected and reclassified to match our study areas’ LULC scheme, ensuring schematic consistency across all classes.
3 Results
3.1 Auxiliary dataset
The auxiliary features Figures 3a–i gives an overview of the spatial pattern of each phenomenon in EM. Using ArcGIS, distances from the variable using Euclidean distance were derived. While elevation Figure 3a, indicates that the region reached 874.00 m above sea level, decreasing from west to east, making the east coast of the eThekwini range between 0 and 131 m, favoured for development and human settlement. A very high slope
Figure 3. eThekwini Metropolitan auxiliary features: Nelson Mandela Bay Metropolitan auxiliary features: (a) elevation, (b) slope, (c) aspect, (d) distance from roads, (e) distance from rail stations, (f) distance from rivers, (g) distance from water, (h) distance from town and (i) GHSL built-up areas.
Due to the situation of both study areas bordered by the Indian Ocean on the east coast, Nelson Mandela Bay follows a similar elevation pattern to eThekwini and is dominated by elevation
Figure 4. Nelson Mandela Bay Metropolitan auxiliary features: (a) elevation, (b) slope, (c) aspect, (d) distance from roads, (e) distance from rail stations, (f) distance from rivers, (g) distance from water, (h) distance from town and (i) GHSL built-up areas.
3.2 Variable selection
3.2.1 Correlation
The assessment of variable correlation based on Pearson correlation in Figure 5 clearly reveals the relationship among auxiliary features, indices and Sentinel-2 bands used to derive LULC maps of the study area. The study identifies a strong positive correlation
3.2.2 Feature importance
Furthermore, the feature importance plot derived from RF assessing all the proposed variables scaled to 0.10 equating (100%) was ranked according to the importance of the variable in LULC-C. Our results rank the importance of variables differently between our study areas, while EM LULC-C (Figure 6a) identifies GHSL built-up areas as the most influential factor, followed by NDWI and identifies aspect and TPI as the least influential factors. GHSL Built-up areas, K-Means and SWIR1 were identified as the most influential factors in NMBM (Figure 6b), while considering TPI, IBI and distance from rivers, the least influential. Nonetheless, these results indicate the importance of especially GHSL built-up area dataset in classifying LULC, together with other auxiliary variables, suggesting that they can be employed to enhance classification and model performance.
3.2.3 SHAP values
Unlike RF feature importance, SHAP values (Figure 7) give an overview of significance per class. Features that show high correlation were compared in the SHAP value plot. Features such as green, SWIR1, MNDWI, NDBI and EVI relative have less SHAP values than their correlation pairs (Figure 7a). Whereas in Figure 7b, SWIR1 shows a significantly higher SHAP value than SWIR2. While considering NDBI over BSI and NDVI over EVI. The importance of variables was incomparable between EM and NMBM. For instance, SWIR1, NDVI, and NDBI were important in NMBM. Whereas NDVI, BSI and SWIR2 were important in EM.
3.2.4 Chi-square test
Additionally, we limitedly investigated Chi-square to filter and illustratively measure statistical dependencies between variables. Features such as IBI and TPI show relatively low Chi-Square (Figure 8). Although IBI shows relative improvement for SHAP values, Chi-square confirms that it cannot explain the dependent variable alone. With this result, IBI was pronounced as not significant for the classification. Since the main scope of this study was to investigate the influence of auxiliary features, we did not renounce TPI given its relatively low Chi-square score.
Figure 8. Chi-square test measures of statistical dependency between variables and LULC classes (EM and NMBM).
3.3 Selected bands and auxiliary features
After investigating the contribution and correlation among bands, based on a Pearson correlation, RF variable importance measures and Chi-square test, the consistency variables we selected for LULC-C approaches. Respectively, Table 8 indicates variables per category; for instance, the selected bands, indices together with auxiliary features used for scenarios. Similar auxiliary data were selected in both study areas, and that was not the case for bands and indices. For EM, a SWIR1 was chosen over SWIR2 in NMBM and BSI, and NDVI were chosen over NDBI and EVI. The selection of SWIR1 over SWIR2 was decided based on their high
Table 8. A combination of features (bands, indices and auxiliary features) for LULC classification scenarios.
3.4 Classification maps
The classified LULC maps underscore the significance of bands and variable selection in class separation as in addition to spectral index separation. While all our models show the efficiency of the RF model in classifying LULC maps, they achieve high accuracy. The spatial distribution and areas of class are separable from these models. Our model in the proposed two scenarios, using scenario 1 and scenario 2, shows significant improvement as more features are added. Comparable to SANLC and ESRI-LC, these classifications depict different spatial extents of classes, especially the built-up class. Our classification scenarios 1 and 2, in Figures 9a,b, indicate that built-up is concentrated in the eastern part of the EM, while scenario 2 indicates that built-up is scattered towards north, south, and west. Scenario 1 disagrees with the result of scenario 2, where scenario 1 does not indicate a high concentration of built-up areas. Unlike in EM, the NMBM scenarios 1 and 2 follow the same distribution of classes. Nonetheless, SANLC (Figure 9c) and ESRI-LC (Figure 9d) show somewhat different spatial distribution of classes, where they argue that EM is largely occupied by built-up areas.
Figure 9. Land use and land cover classification scenarios against the two reference maps. Scenario 1 (a, e), scenario 2 (b, f), SANLC map (c, g) and ESRI-LC map (d, h) respectively for EM and NMBM.
Moreover, the classification scenarios improved in NMBM. Classification scenarios reveal similar patterns of class distribution as compared to reference maps. The northern part of NMBM, in all classifications, shows a different pattern of forest and grassland distribution. The visual inspection indicates that there is a degree of disagreement between maps.
Overall comparison proves that the integration of the auxiliary features adapted well in the NMBM landscape compared to EM.
3.4.1 Comparison between classification
The difference in class distribution extent could be the result of model training differences, where SANLC built-up areas consist of numerous classes such as vegetation within built-up zones, whereas our derived training dataset differentiates and identifies built-up areas not inclusive of vegetation along residential areas and urban. The chosen area of reference (A–F), Figure 10, which can be referred to as Figure 1 for its spatial location, indicates the prediction of different classes for both EM and NMBM and all classifications. The significant difference between these results is noticeable in small holdings or sparse buildings in rural communities, where vegetation or grassland can be depicted by satellite. In Figure 10, reference A–B, the clear distinction between scenario predictions and SANLC and ESRI-LC can be identified, where SANLC and ESRI-LC classify built-up with vegetation as built-up. In that case, SANLC and ESRI-LC overestimated built-up zones. Similarly, in reference B where SANLC overestimated built-up areas, whereas other classifications identified grassland, forest, and built-up classes. The issue arises with reference C, where scenario 1 predicted cropland and scenario 2 elevated the cropland prediction, due to spectral reflectance, ESRI-LC gives a totally different estimation, indicating grassland. SANLC, on the other hand, overestimated the region as built. Scenario 1 and 2 in reference F gives clear class distinctions, where SANLC and ESRI-LC could not detect the pattern of cropland. There is, however, a strong agreement for classification between scenarios 1 and 2 in reference D. Overall, scenario 2 has the ability to distinguish small changes in class variation.
Figure 10. Land use and land cover classification scenarios comparison against the 2020 reference maps (SANLC and ESRI-LC). The location of reference A-E is described in Figure 1.
3.4.2 Statistical area analysis
To facilitate the difference between the area coverages of each class in different classification scenarios, we computed the statistical summary in Figure 11 and Table 9. We identify a gradual difference in built-up, grassland, and cropland between classification techniques, especially in EM. Respectively for EM, while SANLC proved built-up to be a dominant class, covering up to 1183.65 km2 (46.3%) followed by forest at 663.98 km2 (26%), scenario 2 identifies built-up to be 601.96 km2 (23.6%), forest to be 860.73 km2 (33.7%), scenario 1 recognise built-up at 419.76 km2 (16.4%) while considering forest at 893.86 km2 (35.0%), and ESRI-LC on other side highlight that EM built-up account for 1191.61 km2 (46.6%) while considering forest at 702.32 km2 (27.5%).
Figure 11. The area representation and comparison between the classification scenarios and the reference maps.
Table 9. The area representation and comparison in percentage (%) between the classification scenarios and the reference maps..
Similarly, NMBM (Figure 11; Table 9), ESRI-LC results show that built-up, forest, and grassland are among the dominant classes. In NMBM, scenario 1, scenario 2 identify grassland to be dominant over other classes, respectively (802.65 km2 and 732.72 km2), while SANLC and ESRI-LC indicate forest as the dominant class, 708.55 km2 (36.15%) and 769.24 km2 (39.25%). Unlike in EM, for NMBM, there is a high degree of agreement between classes, whereas in EM, a huge difference is observable between the areas of each class, especially for built-up, grassland and water.
Between classifications in both study regions, the study identifies almost similar areas for wetland, barren, and cropland. These results further identify challenges in LULC-C models.
3.4.3 Scenario 1 and scenario 2 comparison
While investigating the area difference between classification scenarios 1 and 2 which were the core investigations in this study. The study highly the contribution of auxiliary features to traditional land cover classification. The study demonstrates a significant difference in built-up, cropland, and grassland between classifications. Change from scenario 1 to scenario 2 is calculated by subtracting scenario 1 from scenario 2, Equation 1. Over EM, scenario 2 significantly projected built-up area with a difference of 182.20 km2 from scenario 1, while indicating major loss of cropland, forest and grassland. However, scenario 2 and scenario 1 reflect equivalent waterbodies, barren and wetland. The changes of area in NMBM, on the other hand, slightly differ from EM. Where scenario 2 results in the declination of built-up areas 23.75 km2, forest 45.59 km2 and grassland 21.36 km2. Grassland increased significantly from model scenario 1 to scenario 2, attaining 76.93 km2, indicating the potential of scenario 2 to classify grassland, Figure 12.
Figure 12. The difference of an area in each class between classification scenario 2 and classification scenario 1 (scenario 2 – scenario 1).
3.4.4 Pixel-based confusion and agreement of classes between scenario 1 and scenario 2
Although scenario 1 and scenario 3 used different classification features, there is, however, ideal agreement between classes. These scenarios show a very stable agreement between water, strong agreement, but some changes for forest, moderate stability with disagreement for built-up, high agreement between cropland, stable agreement for grassland and weak agreement for wetland, especially over EM. Over EM, about 84.94% wetland pixels were classified as forest in scenario 2, indicating potential improvement Figure 13.
3.5 Spatial block-cv and random hold-out validation
When validating block-cv, RF best tuned model shows that classes were effectively predicted with their accuracy ranging between 81.26–81.82 and 93.64–94.25 at EM, respectively, for scenario 1 and scenario 2, Supplementary Table B1. While EM confirms this stability, NMBM shows a remarkable predictive performance, attaining between 91.96–92.65 and 95.25–96.17, F1-score per class, respectively for scenario 1 and scenario 2. The quantification of F1-score, PA and UA per class (Supplementary Tables C1–4) shows that cropland and wetland resulted in confusion, resulting in reduced accuracy compared to other class.
In all validations (OA and Macro-F1) in both regions, accuracy improved from scenario 1 to scenario 2, Supplementary Table C5. These results suggest the stability of scenario 2 integration with block-cv. Although scenario 1 itself pronounced high accuracy, scenario 2 surpassed its performance. The result also suggests the potential of random hold-out training and validation methods. The difference between block-cv and random hold-out delta (Supplementary Table C6) suggests a significant of scenario 2, which maintain small delta between block-cv and random hold-out. For EM, scenario 2 attain OA delta of −1.31%, whereas scenario 1 attain −3.22%. Whereas in NMBM, scenario 2 attains OA delta of −0.08, where scenario 1 achieves −2.43. Confirming the strength of scenario 2, this pattern is also followed by Macro-F1, attain −3.30% for scenario 2 and –5.60 for scenario 1 in NMBM, Supplementary Table C6.
3.6 Classified raster validation
The defined RF models performed exceptionally well in classifying LULC maps of our study area. The overall OA was performed during model validation, employing a two-step validation of the classified raster. The validation with 30% random hold-out test data differs from independent validation, which utilised independent samples. This process is renounced for its ability to evaluate a model with data that it has never seen. However, this validation is underlined by several limitations. In this regard, scenario 2 performs exceptionally well compared to scenario 1, Figure 14. The independent validation shows reduced OA in both classification scenarios and regions. For instance, over NMBM, 30% validation indicates that scenario 1 attains 97.66% while scenario 2 stands out with 99.33%. Independent validation shows somewhat different outcomes having stabilised from 82.29% to 88.57% (Figure 14).
Figure 14. Overall accuracy assessment of the classified raster, respectively for 30% validation and independent validation.
While using an independent validation set, the study found confidence interval (CI) for OA of 83.63% [CI: 77.78–88.89] improved to 89.47% [CI: 84.79–94.15], respectively for scenario 1 and scenario 2 over EM. This finding was confirmed by NMBM, where OA of 82.29% [CI: 76.57–87.00] stabilised to 88.57% [CI: 84.00–93.14] for scenario 1 and scenario 2 (Supplementary Table D1). Similarly, 30% validation shows improvement from scenario 1 to scenario 2, although this method was more effective compared to independent validation. The performance improvement was statistically significant, attaining p-values of 0.03 and 0.015, respectively, for EM and NMBM using an independent validation set. Whereas the improvement was significant in EM attaining p =
3.7 Matrix and validation assessment
3.7.1 Classified image validation with 30% random hold-out
Furthermore, we investigated the accuracy measure using UA, PA, F1-score, OE, and CE over two study regions, based on 30% testing sets from a random hold-out. The consideration of auxiliary variables shows stability for UA, PA and F1-score, Figure 15. Although scenario 1 achieved high accuracy OA
Figure 15. Classified raster accuracy assessment based on scenario 1 and scenario 2 using confusion matrix (PA, UA, F1-score, OE and CE) for 30% validation over EM and NMBM.
3.7.2 Classified image validation with independent validation data
The additional derived dataset, which validates the classified image, shows somewhat different accuracies compared to validation using the 30% validation. Raster validation, Figure 16, indicates the associated degree of error for all classification scenarios. For scenario 1 over EM, the wetland resulted in a high 45% omission error, while the forest resulted in 40.54% commission error. This was improved in scenario 2 over EM, where forest CE was reduced to 33.33% and wetland CE was reduced to 35%. This transition between errors shows improvement and reliability of the scenario 2 classification approach. In contrast, highlighting the confusion and instability of wetland and forest training.
Figure 16. Classified raster accuracy assessment based on scenario 1 and scenario 2 using confusion matrix (PA, UA, F1-score, OE and CE) for an independent validation set over EM and NMBM.
On the other hand, scenario 1 over NMBM also indicates a high OE of 36% for wetland and 28.57% CE for grassland. Comparable, this was subsequently improved when compared to scenario 2, where wetland OE was reduced to 28% and grassland CE remains at 31.25%. Overall assessment shows that OE and CE were reduced while increasing accuracy when classifying images using scenario 2. Wetlands show elevated commission errors, particularly near urban fringes and low-lying terrain. This is likely due to spectral confusion with cropland and built-up areas, seasonal variability in water content, and smoothing effects from bilinear sampling.
Although an independent validation is useful in strengthening the credibility and generalisability of our results. We found inconsistencies in more generalisability compared to 30% validation. Since model performs well in 30% validation, and not well in independent validation, this can suggest overfitting and ability of model to memorise training patterns but without solid generalisation. Additionally, this could be associated with data collection approaches.
3.8 Disagreement between scenarios and reference maps
The disagreement computed using QD, AD, TD, and MAI assists in investigating and testing the overall disagreement between classified scenarios and reference maps, to evaluate the global compatibility classification approach (Figure 17). When comparing classification scenarios against ESRI-LC and SANLC, our models highlight the associated degree of disagreement. In EM, scenario 2 shows improvement in terms of disagreement with the reference classification maps. Scenario 2 against ESRI-LC depicted exceptionally low QD (21.94%) and AD (10.58%), compared to scenario 1 with QD (29.03%) and AD (15.43%). Nonetheless, scenario 2 shows improved total agreement with reference maps compared to scenario 1, concluding reliable performance.
Figure 17. Classification scenario 1 and 2 degree of disagreement against the 2020 ESRI-LC and SANLC reference maps (EM and NMBM). Indicating overall disagreement and agreement between classifications maps.
Additionally, almost similar pattern is observed for NMB. Although classification scenarios improved agreement with reference maps, underpinned the effect of heterogeneous environments. Overall, QD were found to fall below 11%, with AD below 22% and overall, MAI lies between 62% and 79%. Although classification scenarios were derived from the same training set, scenario 1 and scenario 2 do not agree with 100% having (78.27% and 87.94%) respectively, for EM and NMBM. This indication shows that scenario 2 was significant in changing classification in EM over NMBM. These results also indicate an observed TD of 21.78% and 12.06) for scenario 1 and scenario 2, respectively, for EM and NMBM. The dynamic disagreement between the models indicates the degree of dissimilarity between classification approaches.
4 Discussion
It is widely accepted that the LULC-C is vital for several reasons, such as tracking the change of LULC patterns over time, land ownership and conflict, and understanding the impact of LULC change on various environmental processes, such as hydrological modelling and surface runoff. The current study facilitates the enhancement of LULC-C using Sentinel-2 bands, with varied remote sensing indices and an auxiliary features. The performance of our model in classifying LULC was validated by cross-validation, OA, PA, F1-score, OE, CE, and further comparison using disagreement (QD, AD, TD and MAI), against SANLC and ESRI-LC.
While classification scenarios 1 and 2 achieved comparable accuracy, scenario 2 superseded scenario 1 in all classification schemes in both study regions. While using an independent validation set, the study found confidence interval (CI) for OA of 83.63% CI: 77.78–88.89 improved to 89.47% CI: 84.79–94.15, respectively, for scenario 1 and scenario 2 over EM. This finding was confirmed by NMBM, where OA of 82.29% CI: 76.57–87.43 stabilised to 88.57% CI: 84.00–93.14 for scenario 1 and scenario 2. The performance improvement was statistically significant, attaining p-values of 0.03 and 0.02, respectively, for EM and NMBM using an independent validation set. However, while using 30% validation subset, results show in-significant improvement in NMBM attaining p =
The integration of auxiliary data improves the model accuracy, complementing spectral and texture from satellite images, aligning with literature discovery (Qu et al., 2021; Yimer et al., 2024). Although using indices together with bands can result in high accuracy, as depicted from our literature and complementing other studies that have witnessed the importance of remote sensing bands in LULC-C (Tahir et al., 2025). Using auxiliary data illustratively gives a better estimation of class distribution.
Our model classification and class distribution correspond with the literature, such as Buthelezi et al. (2024c), who investigated LULC change over EM spanning from 2002 to 2022, and found impervious surface to be concentrated in the north and eastern part of the region. Although Buthelezi et al. (2024c) did not investigate the 2020 LULC, the spatial distribution of classes in 2022 and 2017 can be referenced in our classification. However, our results show a somewhat wide extent achieved by the integration of auxiliary data. While the model increased spectral separability, the built-up class was significantly improved. The fact that we found a greater extent of built-up area was due to the inclusion of the GHSL built-up, distance from town and roads dataset in the model. Furthermore, the contribution of auxiliary parameters to LULC corroborates with other studies, which identified elevation, slope degree to be influential (Hurskainen et al., 2019). Additionally, this literature confirms the importance of city centres, roads, and railway stations, as used in other studies, for instance, Zhang et al. (2023) used these parameters in addition to topographic data for LULC forecasting and demonstrated their capability in maintaining the CA-Markov model. This further motivated the importance of classifying LULC. Complementary, the performance of RF in this study aligns with the proven reliability and robustness of the RF classifier. Where, Zafar et al. (2024) compared RF against classification and regression tree, support vector machine for LULC-C using Google Earth Engine, and proved that in all class predictions, RF attain very high user and producer accuracy. Furthermore, the Sentinel-2 dataset provided effective high spatial and spectral resolution and data availability (Hosseiny et al., 2022).
This study is unique in providing an in-depth integration of auxiliary data into LULC-C, whereas other studies have studied regional LULC-C and change through remote sensing satellite bands and index combination. To our knowledge, this study is among the first to integrate topographic and distance-based auxiliary features into multiclass LULC classification for South African metros using Sentinel-2 composites. Although South Africa’s LULC is understood to a wide extent, they do not facilitate the improvement of LULC-C with the use of additional auxiliary data. Building on these study findings, a critical tracking of urban transformation can be enhanced.
Although, we proved the enhancement of integrating auxiliary data to LULC-C. We acknowledge the limitations, (i) RF model was trained using a weak dataset (Tang et al., 2025), not extensively derived. (ii) During the collection of training data, instead of collecting ground truth data, the study relied on using satellite images with reference to the SANLC map. The absence of ground-truth label data significantly pose a challenge for label accuracy, particularly in spectrally ambiguous classes such as wetlands and cropland. Although, a high-resolution Google Earth Pro images were used for labelling data, future works should incorporate ground truth validation to strengthen external validity and support transferability of the classification approach and confidence in class-levels. For both metros, (iii) we selected Sentinel-2 image and used a median image for dry-winter (June, July and August) without investigating different seasons, which can have an impact on classes that change seasonally and limit phenological generalisation. Selecting June, July and August image ensure that we minimise cloud cover and vegetation phenology effect. Finally, (iii) the up-sampling of auxiliary data to correspond to Sentinel-2 spatial resolution may potentially increase their discriminative power. The categorisation of wetland within study areas, especially EM was challenging, and this limitation was overcome by identifying wetland areas which are widely recognised and those that have been confirmed by SANLC maps, this still need intense evaluation.
Future research directions should focus on the spatiotemporal generalisation test across additional South African landscapes, incorporating multi-seasonal, different time point composites capturing seasonal dynamics and exploit phenological differences, evaluate model transferability and incorporate in situ validation to strengthen model validity, and improve confidence in class-levels distinction. We also aim to evaluate and refine auxiliary feature selection and explore ensemble-based fusion to improve class separability, especially wetland and cropland separability. Effective derivation of a training dataset in more complex landscape can be tested to enrich the current understanding of the auxiliary features for LULC-C.
5 Conclusion
This study addresses a critical LULC classification, shifting from the traditional approach of comparing and assessing the efficiency of the classifier. The primary objective was to assess the efficiency of integrating auxiliary featuress, proximity parameters (distance from water, rivers, road, railway stations, town centre), topographic data (elevation, slope, aspect) and GHSL built-up into LULC-C over a complex terrain of eThekwini and Nelson Mandela Bay Metropolitan Municipalities landscapes based on the RF machine learning algorithm.
Key findings from this study indicate that the integration of auxiliary featuress has a considerable positive impact on increasing model performance and overall LULC-C. Auxiliary dataset not only increases model performance, but it also significantly improves class distinctions and separation, which might otherwise be missed by traditional classification which uses only band and indices. This study further demonstrates an open-source geospatial data can be employed in LULC-C literature to define very fine and coarse class separation.
The implications of these study findings are profound and underline the significant of auxiliary features to land cover mapping. Valuable to urban development and land use change management strategies by providing valuable insight into which dataset could be employed to identify built-up areas. Taking from this model, urban expansion can be addressed by a global human settlement layer, especially in a continuously developing city like eThekwini, where informal settlements are witnessed to be scattered over urban peripheries. Moreover, since this was the first time these parameters were tested using a local SA dataset and comparing using satellite bands, indices and auxiliary featuress at the SA heterogeneous landscape, the study contributes to the existing body of knowledge on remote sensing-based LULC-C methodology. However, the analysis of this study was limited to two South African metros, eThekwini and Nelson Mandela Bay, therefore we advise cautious over generalising the results to other South African landscapes. Nonetheless, the heterogeneity of these regions provides valuable empirical evidence for diverse urban and peri-urban settings.
In conclusion, this study demonstrates the potential of integrating auxiliary data (terrain and proximity features) into multiclass LULC classification using Sentinel-2 imagery. This approach can be used to support the complexity and labour-intensive derivation of a training set for LULC classification. Further proactive approaches should be considered to enhance classification accuracy, especially for wetland separability. While promising, the findings are subject to limitations including lack of field data validation, seasonal and multi-time-point constraints and limited geographical scope. Future research scope will expand spatial coverage, model transferability, and incorporate multi-seasonal, time point, and pursue ground-based validation to enhance model robustness and generalisability.
Data availability statement
The raw data supporting the conclusion of this article will be made available by the author, without undue reservation. The script used to process and formulate the findings can be found: https://github.com/idcossa/LULC_ClassificationApproaches-EMandNMBM. Additional sources of data acquisition are as follows: SANLC maps: https://gee-community-catalog.org/projects/sa_nlc/; ESRI-LC map: https://livingatlas.arcgis.com/landcoverexplorer/; Topographic features: http://www.cdngiportal.co.za/CDNGIPortal/; Road, towns and railways: https://hub.tumidata.org/ and https://gis-ethekwini.opendata.arcgis.com/; Rivers: https://www.dws.gov.za/iwqs/gis_data/river/rivs500k.aspx.
Author contributions
IS: Resources, Investigation, Writing – original draft, Software, Formal Analysis, Visualization, Data curation, Conceptualization, Methodology, Writing – review and editing. SX: Visualization, Formal Analysis, Conceptualization, Investigation, Writing – review and editing, Supervision. MG: Visualization, Conceptualization, Investigation, Formal Analysis, Supervision, Writing – review and editing.
Funding
The authors declare that no financial support was received for the research and/or publication of this article.
Acknowledgments
Irvin D. Shandu was funded by the Postgraduate Bursary of the South African National Space Agency (SANSA) for 2025.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The authors declare that Generative AI was used in the creation of this manuscript. QuillBot.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frsen.2025.1697897/full#supplementary-material
References
Abdelmajeed, A. Y. A., and Juszczak, R. (2024). Challenges and limitations of remote sensing applications in northern peatlands: present and future prospects. Remote Sens. 16 (3), 591. doi:10.3390/RS16030591
Acharki, S. (2022). PlanetScope contributions compared to Sentinel-2, and Landsat-8 for LULC mapping. Remote Sens. Appl. Soc. Environ. 27, 100774. doi:10.1016/J.RSASE.2022.100774
Adam, E., Masupha, N. E., and Xulu, S. (2023). Spatial assessment and prediction of urbanization in Maseru using Earth observation data. Applied Sciences 3 (10), 5854. doi:10.3390/APP13105854
Alawi, S. A., and Özkul, S. (2023). Evaluation of land use/land cover datasets in hydrological modelling using the SWAT model. H2Open J. 6 (1), 63–74. doi:10.2166/h2oj.2023.062
Amini, S., Saber, M., Rabiei-Dastjerdi, H., and Homayouni, S. (2022). Urban land use and land cover change analysis using random forest classification of landsat time series. Remote Sens. 14 (11), 2654. doi:10.3390/rs14112654
Amoakoh, A. O., Aplin, P., Awuah, K. T., Delgado-fernandez, I., Moses, C., Alonso, C. P., et al. (2021). Testing the contribution of multi-source remote sensing features for random forest classification of the greater amanzule tropical peatland. Sensors 21 (10), 3399. doi:10.3390/s21103399
Ashfaq, S., Tufail, M., Niaz, A., Muhammad, S., Alzahrani, H., and Tariq, A. (2025). Flood susceptibility assessment and mapping using GIS-based analytical hierarchy process and frequency ratio models. Glob. Planet. Change 251, 104831. doi:10.1016/j.gloplacha.2025.104831
Badshah, M. T., Hussain, K., Rehman, A. U., Mehmood, K., Muhammad, B., Wiarta, R., et al. (2024). The role of random forest and Markov chain models in understanding metropolitan urban growth trajectory. Front. For. Glob. Change 7, 1345047. doi:10.3389/ffgc.2024.1345047
Bhungeni, O., Ramjatan, A., and Gebreslasie, M. (2024). Evaluating machine-learning algorithms for mapping LULC of the uMngeni catchment area, KwaZulu-Natal. Remote Sens. 16 (12), 2219. doi:10.3390/rs16122219
Bhungeni, O., Gebreslasie, M., and Ramjatan, A. (2025). LULC change detection and future LULC modelling using RF and MLPNN-Markov algorithms in the uMngeni catchment, KwaZulu-Natal, South Africa. Front. Environ. Sci. 13, 1543524. doi:10.3389/fenvs.2025.1543524
Birhane, E., Ashfare, H., Fenta, A. A., Hishe, H., Gebremedhin, M. A., G. wahed, H., et al. (2019). Land use land cover changes along topographic gradients in Hugumburda national forest priority area, Northern Ethiopia. Remote Sens. Appl. Soc. Environ. 13, 61–68. doi:10.1016/J.RSASE.2018.10.017
Buthelezi, M. N. M., Lottering, R., Peerbhay, K., and Mutanga, O. (2024a). Optimising forest rehabilitation and restoration through remote sensing and machine learning: mapping natural forests in the eThekwini municipality. Remote Sens. Appl. Soc. Environ. 36, 101335. doi:10.1016/J.RSASE.2024.101335
Buthelezi, M. N. M., Lottering, R. T., Peerbhay, K. Y., and Mutanga, O. (2024b). A machine learning approach to mapping suitable areas for forest vegetation in the eThekwini municipality. Remote Sens. Appl. Soc. Environ. 35, 101208. doi:10.1016/J.RSASE.2024.101208
Buthelezi, M. N. M., Lottering, R. T., Peerbhay, K. Y., and Mutanga, O. (2024c). Predicting land use and land cover change dynamics in the eThekwini municipality: a machine learning approach with Landsat imagery. J. Spatial Sci. 69, 1241–1263. doi:10.1080/14498596.2024.2378362
Cristille, P. L., Bernhard, E., Cox, N. L. J., Bernard-Salas, J., and Mangin, A. (2024). Earth observation multi-spectral image fusion with transformers for Sentinel-2 and Sentinel-3 using synthetic training data. Remote Sens. 16 (16), 3107. doi:10.3390/RS16163107
DRDLR (2017). South African land cover classes and definitions approved in terms of spatial data infrastructure act NO 54, 2003. Available online at: www.gpwonline.co.za.
Faheem, Z., Kazmi, J. H., Shaikh, S., Arshad, S., and Mohammed, S. (2024). Random forest-based analysis of land cover/land use LCLU dynamics associated with meteorological droughts in the desert ecosystem of Pakistan. Ecol. Indic. 159, 111670. doi:10.1016/J.ECOLIND.2024.111670
Giri, C., Zhu, Z., and Reed, B. (2005). A comparative analysis of the global land cover 2000 and MODIS land cover data sets. Remote Sens. Environ. 94 (1), 123–132. doi:10.1016/J.RSE.2004.09.005
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., and Moore, R. (2017). Google Earth engine: planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27. doi:10.1016/J.RSE.2017.06.031
Hakim, W. L., Nur, A. S., Rezaie, F., Panahi, M., Lee, C. W., and Lee, S. (2022). Convolutional neural network and long short-term memory algorithms for groundwater potential mapping in anseong, South Korea. J. Hydrology Regional Stud. 39, 100990. doi:10.1016/j.ejrh.2022.100990
Hosseiny, B., Abdi, A. M., and Jamali, S. (2022). Urban land use and land cover classification with interpretable machine learning – a case study using Sentinel-2 and auxiliary data. Remote Sens. Appl. Soc. Environ. 28, 100843. doi:10.1016/J.RSASE.2022.100843
Huang, Z., and Jia, X. (2012). Integrating remotely sensed data, GIS and expert knowledge to update object-based land use/land cover information. Int. J. Remote Sens. 33 (4), 905–921. doi:10.1080/01431161.2010.536182
Huang, X., Yin, Y., Feng, L., Tong, X., Zhang, X., Li, J., et al. (2024). A 10 m resolution land cover map of the Tibetan Plateau with detailed vegetation types. Earth Syst. Sci. Data 16 (7), 3307–3332. doi:10.5194/ESSD-16-3307-2024
Hurskainen, P., Adhikari, H., Siljander, M., Pellikka, P. K. E., and Hemp, A. (2019). Auxiliary datasets improve accuracy of object-based land use/land cover classification in heterogeneous savanna landscapes. Remote Sens. Environ. 233, 111354. doi:10.1016/j.rse.2019.111354
Indraja, G., Aashi, A., and Vema, V. K. (2024). Spatial and temporal classification and prediction of LULC in brahmani and baitarni basin using integrated cellular automata models. Environ. Monit. Assess. 196 (2), 117. doi:10.1007/s10661-023-12289-0
Jastrzębski, S., Leśniak, D., and Czarnecki, W. M. (2016). Learning to SMILE(S). Available online at: http://arxiv.org/abs/1602.06289.
Karra, K., Kontgis, C., Statman-Weil, Z., Mazzariello, J. C., Mathis, M., and Brumby, S. P. (2021). Global land use/land cover with sentinel 2 and deep learning. Int. Geoscience Remote Sens. Symposium (IGARSS), 4704–4707. doi:10.1109/IGARSS47720.2021.9553499
Kasahun, M., and Legesse, A. (2024). Machine learning for urban land use/cover mapping: Comparison of artificial neural network, random forest and support vector machine, a case study of dilla town. Heliyon 10 (20), e39146. doi:10.1016/J.HELIYON.2024.E39146
Kavzoglu, T., and Bilucan, F. (2023). Effects of auxiliary and ancillary data on LULC classification in a heterogeneous environment using optimized random forest algorithm. Earth Sci. Inf. 16 (1), 415–435. doi:10.1007/s12145-022-00874-9
Kayitesi, N. M., Guzha, A. C., Tonini, M., and Mariethoz, G. (2024). Land use land cover change in the African Great Lakes region: a spatial–temporal analysis and future predictions. Environ. Monit. Assess. 196 (9), 852. doi:10.1007/s10661-024-12986-4
Kumaraperumal, R., Raj, M. N., Pazhanivelan, S., Jagadesh, M., Selvi, D., Muthumanickam, D., et al. (2025). Data mining techniques for LULC analysis using sparse labels and multisource data integration for the hilly terrain of Nilgiris district, Tamil Nadu, India. Earth Science Informatics, 18 (1). doi:10.1007/s12145-024-01586-y
Kgaphola, M. J., Ramoelo, A., Odindi, J., Mwenge Kahinda, J. M., Seetal, A. R., and Musvoto, C. (2023). Impact of land use and land cover change on land degradation in rural semi-arid South Africa: case of the greater sekhukhune district municipality. Environ. Monit. Assess. 195 (6), 710. doi:10.1007/s10661-023-11104-0
Khalid, H. W., Khalil, R. M. Z., and Qureshi, M. A. (2021). Evaluating spectral indices for water bodies extraction in western Tibetan Plateau. Egypt. J. Remote Sens. Space Sci. 24 (3), 619–634. doi:10.1016/J.EJRS.2021.09.003
Lamane, H., Mouhir, L., Moussadek, R., Baghdad, B., Kisi, O., and El Bilali, A. (2025). Interpreting machine learning models based on SHAP values in predicting suspended sediment concentration. Int. J. Sediment Res. 40 (1), 91–107. doi:10.1016/J.IJSRC.2024.10.002
Liu, H., and Setiono, R. (1995). Chi2: feature selection and discretization of numeric attributes. Proc. Int. Conf. Tools Artif. Intell., 388–391. doi:10.1109/TAI.1995.479783
Lundberg, S., Lundberg, S. M., Allen, P. G., and Lee, S.-I. (2017). A unified approach to interpreting model predictions. doi:10.48550/arXiv.1705.07874
Ma, L., Fu, T., Blaschke, T., Li, M., Tiede, D., Zhou, Z., et al. (2017). Evaluation of feature selection methods for object-based land cover mapping of unmanned aerial vehicle imagery using random forest and support vector machine classifiers. ISPRS Int. J. Geo-Information 6 (2), 51. doi:10.3390/IJGI6020051
Magidi, J., Bangira, T., Kelepile, M., and Shoko, M. (2024). Land use and land cover changes in notwane watershed, Botswana, using extreme gradient boost (XGBoost) machine learning algorithm. Afr. Geogr. Rev. 44, 497–517. doi:10.1080/19376812.2024.2424378
Mawasha, T., and Britz, W. (2022). Detecting land use and land cover change for a 28-year period using multi-temporal landsat satellite images in the Jukskei River catchment, Gauteng, South Africa. South Afr. J. Geomatics 11 (1). doi:10.4314/sajg.v11i1.2
McCarty, D., Lee, J., and Kim, H. W. (2021). Machine learning simulation of land cover impact on surface urban heat Island surrounding park areas. Sustainability 13 (22), 12678. doi:10.3390/SU132212678
Melly, B. L., Schael, D. M., Rivers-Moore, N., and Gama, P. T. (2017). Mapping ephemeral wetlands: manual digitisation and logistic regression modelling in Nelson Mandela Bay Municipality, South Africa. Wetl. Ecol. Manag. 25 (3), 313–330. doi:10.1007/s11273-016-9518-7
Musetsho, K. D., Chitakira, M., and Nel, W. (2021). Mapping land-use/land-cover change in a critical biodiversity area of South Africa. Int. J. Environ. Res. Public Health 18 (19), 10164. doi:10.3390/ijerph181910164
Nahid, S., Kumar, R. P., Acharya, P., Kumar, K., and Purohit, S. (2025). Forecasting urban expansion in Delhi-NCR: integrating remote sensing, machine learning, and Markov chain simulation for sustainable urban planning. GeoJournal 90 (2), 72. doi:10.1007/s10708-025-11317-5
Ngcofe, L., and Nkoana, K. (2023). Sentinel-2 land cover product comparison: south African National land cover 2020 vs ESRI global land cover 2020. Abstr. Int. Cartogr. Assoc. 6, 1–2. doi:10.5194/ica-abs-6-186-2023
Nxumalo, N., Nzimande, N. P., and Xulu, S. (2025). Geo-temporal analysis of land use dynamics in the Dolphin Coast, KwaZulu-Natal, South Africa, using RapidEye and PlanetScope imagery. Frontiers in Environmental Science, 13, 1639760. doi:10.3389/FENVS.2025.1639760
Odindi, J., Mhangara, P., and Kakembo, V. (2012). Remote sensing land-cover change in Port Elizabeth during South Africa’s democratic transition. South Afr. J. Sci. 108 (5). doi:10.4102/sajs.v108i5/6.886
Olofsson, P., Foody, G. M., Herold, M., Stehman, S. V., Woodcock, C. E., and Wulder, M. A. (2014). Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 148, 42–57. doi:10.1016/J.RSE.2014.02.015
Pekel, J. F., Cottam, A., Gorelick, N., and Belward, A. S. (2016). High-resolution mapping of global surface water and its long-term changes. Nature 540 (7633), 418–422. doi:10.1038/nature20584
Pesaresi, M., Schiavina, M., Politis, P., Freire, S., Krasnodębska, K., Uhl, J. H., et al. (2024). Advances on the global human settlement layer by joint assessment of Earth observation and population survey data. Int. J. Digital Earth 17 (1), 2390454. doi:10.1080/17538947.2024.2390454
Pontius, R. G., and Millones, M. (2011). Death to kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment. Int. J. Remote Sens. 32 (15), 4407–4429. doi:10.1080/01431161.2011.552923
Pontius, R. G., Francis, T., and Millones, M. (2025). A call to interpret disagreement components during classification assessment. Int. J. Geogr. Inf. Sci. 39 (7), 1373–1390. doi:10.1080/13658816.2025.2469830
Qu, L., Chen, Z., Li, M., Zhi, J., and Wang, H. (2021). Accuracy improvements to pixel-based and object-based LULC classification with auxiliary features from google Earth engine. Remote Sens. 13 (3), 453. doi:10.3390/rs13030453
Rahmati, O., Reza, H., and Melesse, A. M. (2016). Catena application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran region, Iran. Catena 137, 360–372. doi:10.1016/j.catena.2015.10.010
Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera-Arroita, G., et al. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40 (8), 913–929. doi:10.1111/ECOG.02881
Setiawan, O., Rahayu, A. A. D., Samawandana, G., Tata, H. L., Dharmawan, I. W. S., Rachmat, H. H., et al. (2024). Unraveling land use land cover change, their driving factors, and implication on carbon storage through an integrated modelling approach. Egypt. J. Remote Sens. Space Sci. 27 (4), 615–627. doi:10.1016/j.ejrs.2024.08.002
Shandu, I. D., and Atif, I. (2023). An integration of geospatial modelling and machine learning techniques for mapping groundwater potential zones in Nelson Mandela Bay, South Africa. WaterSwitzerl. 15 (19), 3447. doi:10.3390/w15193447
Sisay, G., Gesesse, B., Fürst, C., Kassie, M., and Kebede, B. (2023). Modeling of land use/land cover dynamics using artificial neural network and cellular automata Markov chain algorithms in Goang watershed, Ethiopia. Heliyon 9 (9), e20088. doi:10.1016/J.HELIYON.2023.E20088
Sothe, C., de Almeida, C. M., Liesenberg, V., and Schimalski, M. B. (2017). Evaluating Sentinel-2 and Landsat-8 data to map successional Forest stages in a subtropical Forest in Southern Brazil. Remote Sens. 9 (8), 838. doi:10.3390/RS9080838
Stock, A. (2025). Choosing blocks for spatial cross-validation: lessons from a marine remote sensing case study. Front. Remote Sens. 6, 1531097. doi:10.3389/frsen.2025.1531097
Svoboda, J., Štych, P., Laštovička, J., Paluba, D., and Kobliuk, N. (2022). Random Forest classification of land use, land-use change and forestry (LULUCF) using Sentinel-2 data—A case study of Czechia. Remote Sens. 14 (5), 1189. doi:10.3390/RS14051189
Sweet, L., Müller, C., Anand, M., and Zscheischler, J. (2023). Cross-validation strategy impacts the performance and interpretation of machine learning models. Artif. Intell. Earth Syst. 2 (4). doi:10.1175/AIES-D-23-0026.1
Tahir, Z., Haseeb, M., Mahmood, S. A., Batool, S., Abdullah-Al-Wadud, M., Ullah, S., et al. (2025). Predicting land use and land cover changes for sustainable land management using CA-Markov modelling and GIS techniques. Sci. Rep. 15 (1), 3271. doi:10.1038/s41598-025-87796-w
Tang, Y., Zhang, G., Liu, J. K., and Qin, R. (2025). Weakly supervised land-cover classification of high-resolution images with low-resolution labels through optimized label refinement. Int. J. Remote Sens. 46 (5), 1913–1937. doi:10.1080/01431161.2024.2443612
Tesfaye, W., Elias, E., Warkineh, B., Tekalign, M., and Abebe, G. (2024). Modeling of land use and land cover changes using google Earth engine and machine learning approach: implications for landscape management. Environ. Syst. Res. 13 (1), 31. doi:10.1186/s40068-024-00366-3
Thi Hang, H., and Alshayeb, M. J. (2025). Technological innovations in urbanization and land surface temperature analysis: a remote sensing and machine learning case study of Delhi. Environ. Technol. Innovation 38, 104164. doi:10.1016/j.eti.2025.104164
Velastegui-Montoya, A., Montalván-Burbano, N., Carrión-Mero, P., Rivera-Torres, H., Sadeck, L., and Adami, M. (2023). Google Earth engine: a global analysis and future trends. Remote Sens. 15 (14), 3675. doi:10.3390/RS15143675
Wang, Y., Khodadadzadeh, M., and Zurita-Milla, R. (2023). Spatial+: a new cross-validation method to evaluate geospatial machine learning models. Int. J. Appl. Earth Observation Geoinformation 121, 103364. doi:10.1016/J.JAG.2023.103364
Wang, Y., Jiang, R., Yang, M., Xie, J., Zhao, Y., Li, F., et al. (2024). Spatiotemporal characteristics and driving mechanisms of land use/land cover (LULC) changes in the Jinghe River Basin, China. J. Arid Land 16 (1), 91–109. doi:10.1007/s40333-024-0051-x
Warrens, M. J. (2015). Properties of the quantity disagreement and the allocation disagreement. Int. J. Remote Sens. 36 (5), 1439–1446. doi:10.1080/01431161.2015.1011794
Wiesmeier, M., Barthold, F., Blank, B., and Kögel-Knabner, I. (2011). Digital mapping of soil organic matter stocks using random forest modeling in a semi-arid steppe ecosystem. Plant Soil 340, 7–24. doi:10.1007/s11104-010-0425-z
Wu, J., Lin, L., Zhang, C., Li, T., Cheng, X., and Nan, F. (2023). Generating Sentinel-2 all-band 10-m data by sharpening 20/60-m bands: a hierarchical fusion network. ISPRS J. Photogrammetry Remote Sens. 196, 16–31. doi:10.1016/J.ISPRSJPRS.2022.12.017
Xie, G., Liu, P., Chen, Z., Chen, L., Ma, Y., and Zhao, L. (2025). Time series remote sensing image classification with a data-driven active deep learning approach. Sensors 25 (6), 1718. doi:10.3390/S25061718
Yimer, S. M., Bouanani, A., Kumar, N., Tischbein, B., and Borgemeister, C. (2024). Comparison of different machine-learning algorithms for land use land cover mapping in a heterogenous landscape over the Eastern Nile river basin, Ethiopia. Adv. Space Res. 74 (5), 2180–2199. doi:10.1016/j.asr.2024.06.010
Zafar, Z., Zubair, M., Zha, Y., Fahd, S., and Ahmad Nadeem, A. (2024). Performance assessment of machine learning algorithms for mapping of land use/land cover using remote sensing data. Egypt. J. Remote Sens. Space Sci. 27 (2), 216–226. doi:10.1016/J.EJRS.2024.03.003
Zhai, H., Lv, C., Liu, W., Yang, C., Fan, D., Wang, Z., et al. (2021). Understanding spatio-temporal patterns of land use/land cover change under urbanization in Wuhan, China, 2000–2019. Remote Sens. 13 (16), 3331. doi:10.3390/RS13163331
Zhang, Z., Hörmann, G., Huang, J., and Fohrer, N. (2023). A random forest-based CA-Markov model to examine the dynamics of land use/cover change aided with remote sensing and GIS. Remote Sens. 15 (8), 2128. doi:10.3390/rs15082128
Zhang, Y., Lv, J., Wang, T., Zhang, K., and Wu, Y. (2025). Assessment of ecological risk under different SSP-RCP scenarios of the Xinjiang province in China. Sci. Rep. 15 (1), 8345. doi:10.1038/s41598-024-81879-w
Zhao, Z., Morstatter, F., Sharma, S., Alelyani, S., and Liu, H. (2010). Advancing feature selection research − ASU feature selection repository. Available online at: https://www.kyb.tuebingen.mpg.de/bs/people/andre/spider.htm.
Zhu, Z., Gallant, A. L., Woodcock, C. E., Pengra, B., Olofsson, P., Loveland, T. R., et al. (2016). Optimizing selection of training and auxiliary data for operational land cover classification for the LCMAP initiative. ISPRS J. Photogrammetry Remote Sens. 122, 206–221. doi:10.1016/J.ISPRSJPRS.2016.11.004
Zhu, F., Zhu, C., Fang, Z., Lu, W., and Pan, J. (2025). Using constrained K-Means clustering for soil texture mapping with limited soil samples. Agronomy 15 (5), 1220. doi:10.3390/AGRONOMY15051220
Keywords: auxiliary features, land use and land cover classification, random forest, block-cross validation, global human settlement built-up layer
Citation: Shandu ID, Xulu S and Gebreslasie M (2026) Enhancing land cover classification in the heterogeneous landscape by integrating auxiliary data with Sentinel-2 imagery using the random forest algorithm. Front. Remote Sens. 6:1697897. doi: 10.3389/frsen.2025.1697897
Received: 02 September 2025; Accepted: 06 November 2025;
Published: 09 January 2026.
Edited by:
Xinghua Li, Wuhan University, ChinaReviewed by:
Ery Suhartanto, Universitas Brawijaya Jurusan Teknik Pengairan, IndonesiaAli Al-Ghamdi, King Saud University, Saudi Arabia
Copyright © 2026 Shandu, Xulu and Gebreslasie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Irvin D. Shandu, MjI0MTY2MjEwQHN0dS51a3puLmFjLnph, aWRjb3NzYUBnbWFpbC5jb20=