Integrating Angular Backscatter Response Analysis Derivatives Into a Hierarchical Classification for Habitat Mapping

Accurate maps of biological communities are essential for monitoring and managing marine protected areas but more information on the most effective methods for developing these maps is needed. In this study, we use Wilsons Promontory Marine National Park in southeast Australia as a case study to determine the best combination of variables and scales for producing accurate habitat maps across the site. Wilsons Promontory has full multibeam echosounder (MBES) coverage coupled with towed video, remotely operated underwater vehicle (ROV) and drop video observations. Our study used an image segmentation approach incorporating MBES backscatter angular response curve and bathymetry derivatives to identify benthic community types using a hierarchical habitat classification scheme. The angular response curve data were extracted from MBES data using two different methods: 1) angular range analysis (ARA) and 2) backscatter angular response (AR). Habitat distributions were predicted using a supervised Random Forest approach combining bathymetry, ARA, and AR derivatives. Variable importance metrics indicated that ARA derivatives, such as grain size, impedance and volume heterogeneity were more important to model performance than AR derivatives mean, skewness, and kurtosis. Additionally, this study investigated the impact of segmentation software settings when creating segmented surfaces and their impact on overall model accuracy. We found using fine scale segmentation resulted in the best model performance. These results indicate the importance of incorporating backscatter derivatives into biological habitat maps and the need to consider scale to increase the accuracy of the outputs to help improve the spatial management of marine environments.


INTRODUCTION
Coastal regions support a wide range of human activity, such as commercial and recreational fisheries, aquaculture, and resource extraction. These anthropogenic pressures often result in threats to the local biodiversity and ecosystems. To successfully protect and conserve marine areas, more information on geological and ecological characteristics are required (Brown et al., 2011). The production of habitat maps is a key decision-support tool for conservation management and planning (Costello, 2009) but requires detailed information regarding the physical characteristics of the seafloor. Many studies have shown the utility in data from multibeam echosounders (MBES), including both bathymetry and backscatter, for producing accurate and useful habitat maps (Lecours et al., 2017;Lacharité et al., 2018;Fakiris et al., 2019).
Bathymetry provides information that has long been used to delineate habitats due to its correlation with light attenuation, which influences the distribution of habitat forming algal species and their associated communities (Kendal et al., 2005;Ready et al., 2010;Rees et al., 2014;MacDonald et al., 2016;Sen et al., 2016). Derivatives from bathymetry, such as geomorphological complexity, rugosity and slope, have also contributed to understanding the distributions of sessile invertebrate (Kendal et al., 2005), and macroalgal assemblages (Toohey et al., 2007). In addition to the bathymetry information provided by MBES systems, backscatter, which is the intensity of the acoustic signal return used to infer seafloor geological and biological characteristics, is also acquired and can be converted into continuous mosaics.
Characterising the seafloor using backscatter requires more parameters compared to bathymetric measurements, most importantly target strength, and incident angle (Lurton et al., 2015). Target strength is a description of how much acoustic energy is redirected or scattered back to the echosounder with rough surfaces scattering a greater portion of the acoustic signal back to the receiver compared to smooth surfaces, which scatter much less signal. The backscatter intensity is also impacted by the incidence angle, with increasingly weaker returns as the incidence angle increases (Lurton et al., 2015).
Previous studies have shown the usefulness of backscatter derived mosaics for classifying biological communities such as rhodolith and seagrass beds through unsupervised clustering (Ryan et al., 2007;Hamilton and Parnum, 2011). However, to create a backscatter mosaic, the angular response must be removed, resulting in omission of useful data. Angular backscatter intensity, or backscatter angular response (AR), can be processed (or extracted) to investigate in detail the backscatter intensity as a function of incidence angle, scattered from the seafloor surfaces (Lurton et al., 2015) under the assumption of a homogenous seabed. From the backscatter response curve further variables such as the mean, slope, kurtosis and skewness of the curves can be derived and used as variables for the classification process as in Che Hasan et al. (2012). Combining bathymetric and backscatter AR derivatives often results in more accurate habitat classifications, compared to bathymetric derivatives alone (Che Hasan et al., 2012;Sen et al., 2016) and the inclusion of AR derivatives with bathymetry and backscatter derivatives can increase overall map accuracy up to 5% according to previous studies (Che Hasan et al., 2012Hasan et al., , 2014Fakiris et al., 2019).
The above process differs from angular range analysis (ARA) approach, which uses a geo-acoustic inversion model to predict sediment classes (Fonseca and Mayer, 2007). ARA attempts to use the angular backscatter intensity information to estimate type of sediment as well as other sediment properties, such as the sediment grain size, index of impedance and volume heterogeneity (Mulhearn, 2000;Fonseca et al., 2009). These inversion parameters are useful for identifying differences in the seafloor that can be attributed to variations in features and biological habitats (Brown and Blondel, 2009;Che Hasan et al., 2014;Lecours et al., 2016a). However, there are no studies that assess the importance of combined AR and ARA mosaics on classification accuracy.
Incorporating AR derivatives (mean, slope, kurtosis, and skewness) and ARA inversion parameters (sediment grain size, index of impedance, and volume heterogeneity) in a classification with high resolution bathymetry and backscatter mosaics is challenging due to the mismatch of spatial resolution (Fonseca et al., 2009;Che Hasan et al., 2012). Both AR and ARA are "halfswath time-series" data that require uninterrupted time-series for the port and starboard swaths, with swath width increasing as depth increases . Considering this requirement, one assumption when processing AR and ARA is that the seafloor is homogenous at the scale of the swath and areas where the seafloor is heterogeneous results in less reliable outputs (Clarke et al., 1997;Fonseca et al., 2009;Che Hasan et al., 2012). Additionally, the resulting derivative mosaics, such as grain size, are spatially limited to the cross track length of the port, and starboard swaths (Fonseca et al., 2009). Although challenging to incorporate into habitat classifiers, adding AR and ARA into benthic habitat classification has been shown to improve classification prediction accuracy despite the cost of coarser spatial resolution (Clarke, 1994;Clarke et al., 1997;Lamarche et al., 2011;Che Hasan et al., 2012).
ARA is limited if substrate properties change part way across the swath and as a result is difficult to integrate into high resolution habitat maps. One promising method for incorporating AR derivatives and ARA inversion parameters into higher resolution habitat maps is by extracting this information from homogeneous regions or segments. The goal of segmentation is to separate the study area into multiple varying sized objects based on spectral and spatial characteristics . This process can be achieved by performing spatial segmentation of the corrected bathymetric (spatial) and backscatter (spectral) mosaics. Using segmented polygons created from the bathymetric and backscatter surfaces to create segmented mosaics of ARA curves identifies spatial differences in seafloor properties that may be difficult to derive from the ARA curves alone. Image segmentation has the added benefit of reducing the impact of MBES backscatter artefacts on model predictions by smoothing outliers present in the mosaic within homogeneous objects (Lecours et al., 2017).
Artefacts in MBES bathymetry and backscatter datasets are common, and may result from MBES installation, sea state, and user input. These artefacts can be further reduced at the pixel level by varying the analytical scale (kernel size) when generating derivative surfaces. By increasing the analytical scale, artefacts are smoothed due to the influence of nearby records (MacMillan and Shary, 2009). In addition to reducing artefacts within a MBES dataset, analytical scale can have an impact on model performance as demonstrated by Wilson et al. (2007) who found that a single fixed analytical scale cannot capture all features of interest and as a result may impact model performance. Including derivatives at multiple analytical scales has been demonstrated to increase model performance (Wilson et al., 2007;Porskamp et al., 2018). Using multiple analytical scales allows for the derivative surfaces that best capture the variability in the environment to be used in classifying predetermined habitat classes (Wilson et al., 2007).
We explore methods for including multibeam bathymetry and backscatter derivatives, ARA inversion parameters and AR Frontiers in Remote Sensing | www.frontiersin.org May 2022 | Volume 3 | Article 903133 3 derivatives, using image segmentation along with multiscale environmental derivatives as variables in a Random Forest (RF) classifier. For each classification hierarchy explored in this study we evaluate: 1) How the inclusion of ARA inversion parameters and AR derivatives influences habitat classification models by assessing variable importance through mean decrease in accuracy, and 2) How image segmentation approaches influence overall model accuracies at varying scales.
Ultimately, this study aims to determine the most effective methods for accurately classifying marine habitats to aid in marine spatial management, including the management of marine protected areas.

Study Site
The study site is located at Wilsons Promontory Marine National Park (MNP) in the temperate coastal zone of Victoria, southeastern Australia (Figure 1). Wilsons Promontory Marine National Park was designated in 2002 and is the largest marine national park in Victoria. Recreational fishing, commercial fishing, or any activities that disturb the seafloor are not permitted within the park. It is classified under the International Union for Conservation of Nature and Natural Resources (IUCN) category II, which means it protects a functioning ecosystem, and supports education and recreation.
The Park is characterised by a coastal granitic reef with areas of complex sand dunes further from shore. In addition, there are several deep (up to 90 m) nearshore scoured depressions. Wilson's Promontory MNP supports high biodiversity (Edmunds et al., 2012) with 180 fish, 404 molluscs, 191 crustaceans, and 1,633 plants/algae species (Parks, 2017). Biodiversity is likely high due to the seasonal convergence of the East Australian and South Australian currents (Kennedy et al., 2014). High energy shallow rocky reefs are dominated by fucoid macroalgae, primarily Phyllospora comosa. Deeper reefs to the south of the park are largely dominated by sessile invertebrate communities.

MBES Collection and Processing
Bathymetry and backscatter were collected using a multibeam echosounder (Kongsberg EM 2040C) aboard the research vessel MV Yolla at a frequency of 300 kHz with an automated ping and fixed pulse rate dependent on survey conditions. Differential GPS positions were post-processed using POSPac MMS. Bathymetry data were processed with CARIS HIPS and SIPS (v8.1A). The backscatter mosaic was processed with FMGT (v.7.4.1) to correct for source level, absorption and spreading losses, and insonified areas. To remove the angular effect, we used a trend Angular Varying Gain (AVG) correction using angular interval of 20°to 60°within 300 window sizes. All soundings were processed manually at 1 m resolution and corrected to the Australian height datum (AHD). Bathymetry and backscatter derivatives were generated at kernel sizes ranging from 3 × 3 to 201 × 201 pixels resulting in products at analysis scales of 3,5,7,9,11,21,51,101,201 m. The kernel sizes used in this study were selected following methods from Porskamp et al., 2018, with additional larger kernel sizes (i.e. 51, 101, and 201) to test if there were upper limits to which kernel sizes performed best. ArcGIS Pro (ArcGIS, 2020) was used was used for all kernel analysis through the focal statistics tool.

Segmentation
Image segmentation partitions the surface into continuous regions that share similar attributes (Blaschke, 2010). Our study used ENVI FX (ENVI, 2020) to perform Image segmentation on the FMGT (FMGT, 2020) backscatter mosaic and CARIS (CARIS, 2017) bathymetry digital elevation model (DEM). ENVI FX uses a hybrid segmentation method that combines region-growing and edge-based methods . The scale parameter is defined as the percentage of the normalized cumulative distribution of the pixel values in the gradient or intensity image (Kucharczyk et al., 2020). For example, a scale parameter of 10 indicates that the region growing would start with the lowest 10 percent of pixel values, essentially a minimum parameter for initial regions. After the initial regions are created, adjacent regions are merged based on their similarity of brightness, texture and colour. Selection of scale parameters were guided by previous studies that used image segmentation for benthic habitat map segmentation (Adger, 1997). To determine if scale selection in image segmentation influences overall model accuracy, our study used a scale parameter of 15 (fine scale: 4,531 objects), 30 (medium scale: 2,586) and 45 (broad scale: 1,608 objects).

AR and ARA Creation
To prepare for AR derivatives, first, the raw backscatter *.all files were processed using CMST-GA MB Process v17.05.07 (Gavrilov and Parsons, 2014) which converted the raw backscatter valuesto backscatter intensity values presented as a function of incidence angle (i.e. angular response) in MATLAB format over 5 consecutive pings. Secondly, a custom MATLAB script (v2018b) (Che Hasan et al., 2014) was used to extract the AR derivatives mean, slope, kurtosis and skewness for each segmented polygon, as discussed above. This tool outputs each AR derivative as a mosaic, which were gridded at a resolution of 1 m.
ARA curves were calculated and corrected using FMGT and converted into continuous raster mosaics, as a function of incident angle over five consecutive pings, with the port and starboard curves calculated separately. The study area ranges in depth from 2 to 90 m, using 5 pings and resulting in a footprint of 1.85 m for the deepest areas of the park and 0.5 m in the shallowest areas of the park. A geophysical model (Jackson et al., 1986) is fitted to these curves in FMGT, which produces estimates of the following parameters impedance, grain size, and volume heterogeneity (Fonseca et al., 2002;Fonseca and Mayer, 2007). Impedance is defined as the ratio between acoustic pressure and the velocity of the soundwave, indicating "hardness" of the seafloor (Fonseca et al., 2009;Hou et al., 2018) and grain size represents sediment grain size measured in millimetres (Manik and Jaya, 2016). As the present study did not extend to sediment analysis, raw grain size values are used as a relative comparison between grain sizes. Volume heterogeneity provides information on density and porosity of the sediment (Jackson and Briggs, 1992). To reduce the impact of "swath resolution" and possible artefacts from sediment type changing mid swath, the ARA derivatives central tendencies (mean, and standard deviation) were calculated for each segment. All output raster mosaics were re-gridded at a scale of 1 m to match all derivatives. A summary of all derivatives can be found in Table 1. One limitation to the ARA approach is that the models are for unconsolidated sediments. Therefore, impedance and grain size for gravel and rock come out as the same from the ARA inversion. As a result, sites with large areas of bedrock or gravel may find ARA derivatives grain size, impedance and volume less important in the RF model. The Jackson model (Jackson et al., 1986) specifies that the classifications are only applicable for sonar systems between 10 and 100 kHz; however, previous studies have shown (Jackson and Ivakin, 1998) that roughness and volume parameters essentially have no frequency dependence. Due to the limitations of this method and our lack of sediment samples at this site, which would be necessary to groundtruth classifications, we used ARA as a measure of relative grain size and the heterogeneity of the sediment. These parameters are helpful when defining biological habitat.

Groundtruth Collection and Classification
We compiled all available seabed observations across the park to inform the habitat mapping process. Video observations were collected between 2013 and 2016: 85 drop video samples collected in 2014, 41 ROV video samples collected in 2013, 7 AUV flights collected in 2016 and 7 towed video transects collected in 2016. Video was classified using the Combined Biotope Classification Scheme (CBiCS) (Edmunds and Flynn, 2015), which has been adopted by the Victorian Department of Environment, Land, Water and Planning. Video observations were classified into categories according to six of the CBiCS hierarchical levels. Three hierarchical levels of the scheme; broad habitat classes (BC2), habitat complexes (BC3), and biotope complexes (BC4) were used in the present study ( Table 2). In order to limit spatial autocorrelation in the groundtruth data, each groundtruth source was split into point observations (i.e., continuous towed video transects were separated into points, each with latitude, longitude and class). Variograms for each CBiCS class were created in R v3.6.1 following methods by Lyons et al., 2018 to determine the minimum distance between all points to reduce spatial autocorrelation. The groundtruth dataset was then stratified by class and randomly subset with a minimum distance of 35 m to reduce the spatial autocorrelation. To limit the impact of rare classes on model accuracy any classes with less than 25 observations were excluded ( Table 2). A random forests (RF) approach was used to derive rule-based relationships between geophysical derivatives and observational data using the "randomForest" package in R (Liaw and Wiener, 2002). The RF classification approach reduces overfitting by including the results of multiple trees from iterative bootstrap samples (Cutler et al., 2007). RF reduces bias via random selection of variables. Stephens and Diesing (2014) show that tree-based methods, particularly RF, resulted in more accurate classifications than competing methods when predicting sediment classes. For a workflow flowchart see Figure 2.
Model parameters mtry (i.e., amount of variables to sample at each split) and ntree (i.e., number of trees, aiming for the minimum number to stabilize the error) were determined using model optimization procedures in the 'caret' package from R (Kuhn, 2020), and set to 11 and 300 respectively for all fine segmentation models. Variable importance for each RF model was determined by using mean decrease in accuracy. Mean decrease in accuracy is calculated by using the out-of-bag error (prediction error utilizing bootstrap aggregating) for each tree for each variable. The differences between each variable are then averaged over all trees (Han et al., 2016). The higher the mean decrease in accuracy the more important the variable is to the model. Many studies demonstrate the benefits of permutating variable importance, however these studies are typically focused on the field of genomics; running simulated studies with as little as 5 predictor variables that are highly correlated, and primarily categorical (Strobl et al., 2007;Nicodemus et al., 2010). Running high dimensional (162 variables) large observational (708) models over multiple permutations is computationally intensive (Degenhardt et al., 2017), therefore we did not average variable importance over multiple permutations. Predictors from this study were assessed using Pearson product-moment correlation ( Figure 3). Any predictor variables with a correlation value greater than 0.8 or smaller than 0.8 were assessed by variable importance, with the lowest ranking variable being removed, reducing the correlation below the threshold for the retained variable (retained variables are shown in Table 3). Correlated variables within the model can lead to large variances when small changes within the collated variables occur, which is why it is best practice to remove correlated variables (Ohlemüller et al., 2008). For more details on variable importance refer to Porskamp et al., 2018. The same predictor variables were selected for the three segmentation sizes, broad, medium and fine, to allow comparisons across segmentation scales.

Predictive Mapping
Classified habitat maps were created using the "ModelMap" package in R (Freeman and Frescino, 2009). Model accuracies were FIGURE 3 | Pearson product-moment correlation plots for fine segmentation. See variable code names in Table 3. determined following Lyons et al. (2018) methods, which randomly splits groundtruth data into training and validation samples. This process is repeated 500 times and the mean and variance of the error metrics are used as an indication of map confidence.

One-Way ANOVA
To determine differences between image segmentation scales, overall model accuracies were used. For each image segmentation scale, RF models were run 500 times, each time with a different training and validation dataset for a total of 1,500 overall accuracies. This resulted in three groups, fine, medium and broad, each with 500 overall accuracy values. To test if there was a significant difference between the three groups, we applied a one-way analysis of variance (ANOVA) with an α value of 0.05.

Image Segmentation Results
Overall, the three image segmentations createsimilar objects within the backscatter mosaic ( Figure 4). Comparing the three different segmentation scales, broad medium and fine, reveals that overall model accuracy is within 1% of each scale ( Table 4). The overall accuracy medians for BC3 and BC4 were approximately 10% less than BC2. Standard deviation is low across the tested segmentations. There was a significant difference between model means as determined by one-way ANOVA (F (2, 1,497) = 17.21, p < 0.001).

ARA Derivative Results
Most of the site was circalittoral fine sand at the BC4 level followed by sandy low-profile reef wave surge communities. Example outputs form continuous unsegmented ARA derivatives; impedance, volume, and grain size ( Figure 5).
To demonstrate class separability, mean angular response curves were generated for BC2, and BC4 based on groundtruth observations ( Figure 6). A figure was not generated for BC3 because the curves were similar to BC2. Both circalittoral rock (CR) and infralittoral rock (IR) BC2 classes have similar angular response curves with CR slightly higher than IR ( Figure 6A). The differences between CR and IR decrease as incidence angle increases. Differences between these two classes could be driven by depth differences between the two classes. Sublittoral sediment (SS) shows good separation from both IR and CR as it starts the highest dB value and ends at the lowest for the three classes ( Figure 6A).  Sublittoral fine sand (SSfa) BC4 class was the most different of the seven classes starting lower than all classes with a fairly consistent drop off for both near and outer incidence angles ( Figure 6B). High energy Ecklonia-Phyllospora comosa communities (EP) shows the strongest discrimination characteristic from 0 to 30°. Although Ecklonia-Phyllospora comosa communities are found on consolidated substrate, the presence of dense algae could be a possible cause for near beams (0-30°) to have such low intensities. Circalittoral mixed sediments (CMs), circalittoral coarse sediments (CCS), moderate to high complexity circalittoral rock with covering of small colonies and well-spaced erect sponges (MCS) and sandy low profile reef wave surge communities (SLP) show the most separation between 0 and 15°with the outer beams having similar intensities ( Figure 6B). High energy circalittoral rock with bushy branching and low erect sponges (HCB) has the strongest FIGURE 5 | Example surfaces for the ARA derivatives Grain size, Impedance, and Volume with groundtruth points coloured by category. Top row are ARA derivatives post segmentation, bottom row are ARA derivatives pre segmentation.
Frontiers in Remote Sensing | www.frontiersin.org May 2022 | Volume 3 | Article 903133 9 discrimination in the near beams (0-20°). BC2 classes were more disguisable than BC4 classes, which is likely due to BC2 class separation focused on substratum while BC4 classes are separated by biological communities.

Habitat Classification Maps
The habitat classification map for BC2 shows that "sublittoral sediments" dominate the Wilsons Promontory site, while "circalittoral rock" and "infralittoral rock" are found fringing along the granitic headlands and islands (Figure 7). The BC3 classification map is similar to BC2, but the sublittoral sediment classes are further split into "sublittoral mixed sediments" and "sublittoral sand and muddy sand".
The smallest class by area for BC4 was "circalittoral coarse sediment", found in the southwestern section of the study site at a mean depth of 45 ± 9 m. The next smallest class by area is "moderate to high complexity circalittoral rock with covering of small colonies and well-spaced erect sponges" and is found primarily in the west near shore with a mean depth of 45 ± 7 m "Circalittoral fine sand" is the largest class by area and is distributed primarily in the east and south of the site with patches in the west. "Circalittoral fine sand" has a mean depth of 44 ± 7 m. The next largest class by area is "circalittoral mixed sediments", which are found deeper than "circalittoral fine sand" with a mean of 49 ± 8 m "Circalittoral mixed sediments" are found throughout the western section of Wilsons Promontory with patches near the headlands in the south. The biotope complex "high energy circalittoral rock with bushy branching and low erect sponges" are found at a mean depth of 58 ± 14 m with the majority located just south of the headlands. "High energy Ecklonia-Phyllospora communities" are present along the coastline and fringing off the Southwestern Islands at a mean depth of 27 ± 8 m. The final class, "sandy low-profile reef wave surge communities", is scattered within "circalittoral mixed sediments" with a mean depth of 48 ± 9 m. Overall model accuracies and class accuracies for BC2, BC3 and BC4 can be found in Supplementary Tables S1, S2, S3 respectively.

Variable Importance Comparisons
Variable importance plots for the fine scale segmentation model are shown in Figure 8. Bathymetry VRM exhibited the greatest mean decrease in accuracy for hierarchy BC2, while Bathymetry mean, and grain size mean were greatest for BC3 and BC4 respectively. The ARA derivatives volume, impedance and grain size had consistently higher mean decrease in accuracy values over other variables, such as backscatter mean, backscatter standard deviation, slope, and curvature. The AR derivative skewness was ranked lowest for hierarchy BC2 and BC4, while AR slope ranked lowest for BC3.
For medium scale segmentation, impedance mean exhibited the greatest mean decrease in accuracy for hierarchy BC2 and BC3, and grain size mean was greatest for BC4 ( Figure 9). The ARA derivatives volume, impedance and grain size had consistently higher mean decrease in accuracy values over other variables, such as backscatter mean, backscatter standard deviation, slope, curvature, eastness and northness. The AR derivative kurtosis was ranked lowest for hierarchy BC3, while curvature was ranked lowest for BC2 and BC4 (Figure 9).
For broad scale segmentation bathymetry mean exhibited the greatest mean decrease in accuracy for hierarchy BC2 and BC4, and complexity was greatest for BC3 ( Figure 10). Unlike the fine and medium scale segmentation models impedance and grain size mean had lower importance than broad scale segmentation models. The AR derivative kurtosis was ranked lowest for hierarchy BC3, while curvature was ranked lowest for BC2. The lowest ranking derivative for BC4 was AR slope.

DISCUSSION
Benthic habitat mapping provides important decision-support tools for spatial management of marine resources and associated species diversity. Quantifying and improving map accuracy increases certainty for decision makers. Compared to terrestrial studies, marine studies are often limited in the information available to  Table 2 for full class names.
Frontiers in Remote Sensing | www.frontiersin.org May 2022 | Volume 3 | Article 903133 classify habitats, which has, in part, driven the development of new technologies and methods used to improve accuracy of habitat classification (Fakiris et al., 2019). This study builds on previous work conducted by Che Hasan et al., 2014;Fonseca et al. (2009) and is the first to include ARA derivatives grain size, impedance and volume with contemporary environmental variables into a RF model to classify biotic communities. These combinations of variables were incorporated into nine RF models to classify habitats at three hierarchical scales: broad habitats, habitat complexes and biotic complexes, and using three levels of segmentation. The results of this study directly compare the importance of environmental variables that best describe habitat distribution within Wilsons Promontory Marine National Park. Additionally, this study demonstrated that methods for creating segmented surfaces using image segmentation affect overall model accuracy, indicating the importance of testing and developing methods for optimising the

Impact of Variables
Across the models, ARA derivatives ranked high for variable importance, whereas AR derivatives ranked lowest for six of the nine models. The inversion parameters from ARA, volume heterogeneity, grain size, and impedance were more important to all nine models than values from backscatter mean, backscatter standard deviation and backscatter VRM. In some cases, backscatter derivatives and ARA inversion parameters were correlated with one another, but in all cases the ARA derivatives were more important. These results support the hypothesis by Fonseca et al. (2009), which found that ARA inversion parameters modelled from MBES observation correlated well with substrate photos and would likely aid in discrimination and delineation of benthic habitat maps.
The utility of both ARA inversion parameters and AR derivatives at Wilsons Promontory is probably due to the dominance of sediment classes at this site. ARA and AR derivatives are particularly good at separating sediment classes (Hamilton and Parnum, 2011;Rzhanov et al., 2012;Huang et al., 2013;Zhi et al., 2014;Daniell et al., 2015;Siwabessy et al., 2017). This ability to delineate sediment using ARA derivative properties likely resulted in the higher ranking of ARA derivatives. Previous studies have explored how ARA derivatives impact model performance through separating geological classes (Fonseca et al., 2009;Rzhanov et al., 2012) as well as AR derivatives (Huang et al., 2013;Che Hasan et al., 2014;Daniell et al., 2015;Siwabessy et al., 2017) but none have compared variable importance between both methods of including angular response data.
Each of the ARA derivatives, grain size, impedance, and volume heterogeneity, had varying effects on the classifications across the hierarchical scheme. These ARA derivatives have had limited application in habitat classification studies due to difficulty incorporating with high resolution bathymetry and backscatter datasets because of the mismatch in spatial resolution (Fonseca et al., 2009;Che Hasan et al., 2012). To incorporate ARA derivatives with contemporary environmental variables, the derivatives must be segmented, which may reduce the overall resolution of the habitat classification map. However, these derivatives describe and delineate sediment physically (Fonseca et al., 2009;Hamilton and Parnum, 2011;Rzhanov et al., 2012;Che Hasan et al., 2014) and because sediment influences the distribution of biotic communities (Ryan et al., 2007;McBreen et al., 2008), could explain why grain size, impedance and volume rank highly among the biological classification scheme. Relative grain size was a variable of high importance for both the BC3 and BC4 classifications, probably due to grain size often driving the distribution of those biotic communities associated with sediment (Seiderer and Newell, 1999;McBreen et al., 2008). Grain size is also the primary factor used in the hierarchy to define the classes in the groundtruthing datasets-so this would be expected. Impedance is found to be influential in delineating classes in BC2 and BC3, which are primarily defined by geological features, such as sediment versus rock. Although ranked lower than impedance and grain size, volume mean, and volume standard deviation are present in each model and in each hierarchy. Fonseca et al. (2009) hypothesized that volume homogeneity would be useful for delineation of benthic habitat maps and our study supports this hypothesis. Volume is a measure of sediment "heterogeneity" and is an important variable for identifying changes in sediment types (Fonseca et al., 2009).
ARA derivatives outranked many contemporary environmental variables such as slope, eastness, northness, complexity, backscatter and, most notably, bathymetry. Slope, eastness, northness, and complexity represent various degrees of complexity in the seafloor (Lecours et al., 2016b). Wilson's Promontory has low variability in the seafloor for the majority of the site with areas of high complexity associated with the extension of granitic headlands, outcrops and high current areas where sediment waves were observed only making up a small area of the park (see Kennedy et al., 2014). The lower coverage of high complexity substrates likely caused the lower importance of those variables associated with complexity compared to ARA derivatives. Previous studies that use MBES data to classify seafloor habitat have typically found bathymetry (depth) to be the highest ranking variable in terms of importance (Che Hasan et al., 2014;Rattray et al., 2015;Lecours et al., 2016a;Ierodiaconou et al., 2018). Since our site is dominated by granitic reefs it is important to note that the transition of the granitic reefs is relatively steep and can cause issues with lack of observations across the depth range. In this study ARA derivatives grain size and impedance contributed to differentiating the classes specified, resulting in high mean decrease in accuracy values. A possible explanation for ARA derivatives' high importance is likely due to Wilsons Promontory being dominated by sediment communities (91% coverage) with fringing "reef" based communities covering the remaining area (9% coverage). CBiCS reef-based classes are frequently delineated by the macroalgae and sessile invertebrate communities with their distributions influenced by the amount of light they receive, unlike sediment-based classes. Daniell et al. (2015) demonstrated that AR curves can be used to identify hard, soft and, most relevant to benthic habitat delineation, mixed substrate types. To determine ARA's effectiveness at delineating mixed substrate types and reef communities' future studies should explore ARA derivatives importance at sites with higher diversity of reef-based communities.

Hierarchical Comparison
As the number of classes increases among the hierarchy, overall model accuracy decreases, which has been previously demonstrated using RF models (Bernard et al., 2009;Wilson et al., 2007;Porskamp et al., 2018). As the hierarchy becomes more complex (increased number of classes) more rare classes with fewer observations are included that reduce the ability of the models to accurately predict rare classes (Hernandez et al., 2006). To reduce this effect, we excluded any classes with less than 25 observations. A reduction in model accuracy with the introduction or rare classes is a limitation observed in other classification schemes such as EUNIS (Galparsoro et al., 2012). The roughly 10% drop in accuracy from BC2 to BC3 was much larger than BC3 to BC4 (3%) and is likely due to class delineation in BC2 driven primarily by geologic features and the fewer number of classes in BC2. Both BC3 and BC4 have multiple classes over the same substrate type, and therefore have closer overall accuracies. One method to reduce the impact of classes with few observations influencing overall model accuracy at each hierarchy is planning groundtruthing surveys to focus on capturing as many biotic communities as possible through more representative and balanced sampling to reduce the number of repetitions over the same substrate type (Daniell et al., 2015;Foster et al., 2018). However, even better planned groundtruthing surveys may not capture truly rare classes that have a small spatial footprint resulting in these classes being excluded from the classification. Often, we are in a position where groundtruthing is combined from several different sources, basically discovery surveys rather than part of a long-term monitoring program. Removing rare classes that are a key interest for management bodies is not an ideal solution, however there are alternative statistical methods for including rare classes such as using a balanced random forest design (Chen et al., 2004;Diesing, 2020) or by using a Synthetic Minority Over-sampling Technique (Chawla et al., 2002). These methods should be explored in future studies to increase the ability to retain rare classes in random forest models. Products used in the classification process are useful in informing future sampling strategies such as spatially balanced designs. Additionally, future studies should investigate using a nested approach when modeling across multiple hierarchies to preserve shared classifications across models.

Segmentation Comparison
When comparing model accuracies across the various segmentation sizes, there was a significant difference between fine, medium, and broad image segmentation scales. Previous studies have noted differences in model performance between segmentation software and methods (Chen, 2008;Clinton et al., 2010;Montereale Gavazzi et al., 2016;Liu et al., 2017) but did not test scale effects within the same software and their impact on model performance. When extracting features using image segmentation software, in our case ENVI, care should be taken in the selection of merging and edge detecting values as they can affect the overall model accuracy. This study found that using a low merge value (e.g., fine segmentation) resulted in the best overall model accuracy ( Table 4). Due to the size of Wilsons Promontory, we were limited in how many segments could be created and processed in a reasonable time. Although our results were statistically significant, a 1% difference between segmentation scales may not be warranted for all future studies. Future research should test additional merge settings to determine at what merge settings model accuracy is highest and beyond which accuracies do not improve. Segmentation scale is likely driven by site characteristics and thus should be tested whenever using an image segmentation approach. One limitation with using the fine segmentation as opposed to medium and broad is that along track artefacts in the outer beams from the backscatter mosaic are detected by the software. These artefacts are therefore visible in the final habitat map as elongated spatial habitat distributions. These artefacts in the backscatter mosaic persists after radiometric and geometric corrections are applied and is therefore likely they are from data acquisition. When acquiring MBES data careful selection of time varied gain, swath width, and pulse type can reduce the number of artefacts visible in a backscatter mosaic but is not always possible to survey constraints such as weather conditions and strict timelines or the prioritisation of bathymetry data over backscatter in the settings used.

CONCLUSION
Overall, this study shows how AR and ARA can be included in predictive habitat mapping when using an image segmentation approach and RF classifier. If the study site is dominated by sediments, including ARA derivatives will potentially improve model performance, resulting in more accurate habitat classification maps. These methods can be applied to any site with bathymetry, backscatter and groundtruth data. This study is a quantitative representation of angular response derivatives, future work should investigate incorporating absolute grain size measurements. Lastly, this study found that image segmentation merge settings affect overall classification accuracy. When applying image segmentation to create segmented surfaces finding the best performing segmented surface should be a priority.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: mapshare.vic.gov.au/coastkit/.

AUTHOR CONTRIBUTIONS
DI conceived the project and led the fieldwork. PP led the analyses and writing with contributions from DI, MY, AR, CB and RC.

FUNDING
This work was supported by the Parks Victoria Research Partners Panel Program-Baseline habitat mapping and improved monitoring of reef habitats in Victoria's marine national parks and sanctuaries. Towed video was classified by Australian Marine Ecology and Fathom Pacific using CBiCS under a project funded by the Department of Environment, Land, Water, and Planning for the development of a hierarchical classification scheme for marine environments in Victoria. GIS laboratory facilities at Deakin University, Warrnambool Campus, Victoria were used for spatial analyses. Autonomous Underwater Vehicle data was sourced from Australia's Integrated Marine Observing System (IMOS) -IMOS is enabled by the National Collaborative Research Infrastructure Strategy (NCRIS).