Testing the Improvement of Coral Reef Associated Fish Distribution Models Based on Multibeam Bathymetry by Adding Seafloor Backscatter Data

Demersal fishes constitute an essential component of the continental shelf ecosystem, and a significant element of fisheries catch around the world. However, collecting distribution and abundance data of demersal fish, necessary for their conservation and management, is usually expensive and logistically complex. The increasing availability of seafloor mapping technologies has led to the opportunity to exploit the strong relationship demersal fish exhibit with seafloor morphology to model their distribution. Multibeam echo-sounder (MBES) systems are a standard method to map seafloor morphology. The amount of acoustic energy reflected by the seafloor (backscatter) is used to estimate specific characteristics of the seafloor, including acoustic hardness and roughness. MBES data including bathymetry and depth derivatives were used to model the distribution of Abalistes stellatus, Gymnocranius grandoculis, Lagocephalus sceleratus, Lethrinus miniatus, Loxodon macrorhinus, Lutjanus sebae, and Scomberomorus queenslandicus. The possible improvement of model accuracy by adding the seafloor backscatter was tested in three different areas of the Ningaloo Marine Park off the west coast of Australia. For the majority of species, depth was a primary variable explaining their distribution in the three study sites. Backscatter was identified to be an important variable in the models, but did not necessarily lead to a significant improvement in the demersal fish distribution models’ accuracy. Possible reasons for this include: the depth and derivatives were capturing the significant changes in the habitat, or the acoustic data collected with a high-frequency MBES were not capturing accurately relevant seafloor characteristics associated with the species distribution. The improvement in the accuracy of the models for certain species using data already available is an encouraging result, which can have a direct impact in our ability to monitor these species.


INTRODUCTION
Coral reef fish constitute an essential component of the continental shelf ecosystem, and a significant element of fisheries catch around the world . Successful management and conservation of these demersal fishes rely on our ability to monitor their abundance and distribution. However, collecting distribution and abundance data is often expensive and logistically complex . Increasing availability of seafloor mapping technologies, such as Multibeam echo-sounders (MBES), has led to the opportunity to exploit the strong relationship demersal fish species exhibit with seafloor morphology to model their distribution in a costeffective manner (Brown et al., 2012).
Multibeam echo-sounders (MBES) are used to map the seafloor by transmitting acoustic energy toward the seafloor. The two-way travel time of this energy, to and from the transducer, combined with the angle of its travel, is used to determine the depth (bathymetry). The amount of acoustic energy reflected by the seafloor (backscatter) is used to estimate specific characteristics of the seafloor, including acoustic hardness and roughness (Fonseca and Mayer, 2007). The importance of depth to the assemblage of demersal fish has been well established (Fitzpatrick et al., 2012;Garcia-Alegre et al., 2014). As well as the direct influence depth has on demersal fish, it is also seen as a proxy for a broader set of variables involved in processes that occur at different levels of the water column which are usually harder to sample, e.g., temperature and light (Sih et al., 2017). Depth derivatives (e.g., ruggedness) are used to describe the complexity of the seafloor which can also influence the distribution of demersal fish at a variety of scales (Monk et al., 2011;Costa et al., 2014). Differences in the seafloor backscatter are used to help discriminate between benthic habitats, which can be closely related to the distribution of demersal species (e.g., sand vs. rock bottom; Monk et al., 2010;Monk et al., 2011). Therefore, the inclusion of seafloor backscatter data in demersal fish distribution models is slowly becoming more common. Multiple descriptors can be derived from the original backscatter data adding several lines of potentially useful information for species distribution modeling (Hasan et al., 2012a).
One of the most common products derived from the raw backscatter data is a mosaic, where the backscattered energy (measured as the backscatter strength on the dB scale, and backscatter intensity on the linear scale) received from different grazing angles is normalized for a certain angle or a range of angles. This method produces a regular grid usually with a resolution equal to the bathymetry layer (Fonseca et al., 2009). However, the relationship between the backscatter strength/intensity and grazing angle is related, for certain frequencies, to particular properties of the seafloor (Fonseca and Mayer, 2007). Normalizing the data to a specific angle dismisses valuable information contained in the angular response curve (ARC) (Hamilton and Parnum, 2011). Another approach is to characterize the seafloor using the Angle vs. Range Analysis (ARA) (Fonseca et al., 2009). During the ARA analysis, the backscatter response observed is compared to expected acoustic response curves based on a mathematical model, the Jackson Model (Jackson et al., 1986). In particular, the ARA analysis can be used to estimate the sediment grain size, which has been shown in some demersal species to be a driver of distribution, or at least a correlate. Previous studies have focused on testing the relevance of including the backscatter and its derivatives to model the distribution of benthic habitat classes (Ierodiaconou et al., 2007;Brown et al., 2012;Hasan et al., 2012a). Less attention has been placed in testing the benefit of adding the angular response data in modeling the distribution of demersal fishes which traditionally included the mosaic image and derivatives e.g., texture features (Hasan et al., 2014).
In the present study, terrain variables were used to model the distribution of fish data derived from Baited Remote Underwater Stereo-Video (stereo-BRUVS). The overarching aim was to test the possible improvement of a model's accuracy if the backscatter data is included. This was tested in three areas of the Ningaloo Marine Park (NMP) with different bathymetry and levels of terrain complexity. Seven species where chosen as an indicative evaluation of the accuracy of species distribution models: starry triggerfish (Abalistes stellatus), Robinson's seabream (Gymnocranius grandoculis), silver toadfish (Lagocephalus sceleratus), red throat emperor (Lethrinus miniatus), sliteye shark (Loxodon macrorhinus), red emperor (Lutjanus sebae), and school mackerel (Scomberomorus queenslandicus). The probability of presence of each of these species was modeled using depth, depth derivatives and backscatter (mosaic and ARC) data as explanatory variables.

Study Area
Ningaloo Reef (NR) is the longest fringing coral reef in Australia, and is considered a biodiversity hotspot and to be in a good state of conservation compared with other coral reefs (Gazzani and Marinova, 2007;Schonberg and Fromont, 2012). The NMP was designed to protect 90% of these iconic waters (CALM and MPRA, 2005). A biodiversity analysis of different phyla including demersal fish, sponges, and soft corals showed the NMP is a biogeographical overlap zone, where more tropical species occur in the northern section and both tropical and temperate species are present in the southern area (Simpson and Waples, 2012). In the present study, three areas of the NMP were used to model the distribution of demersal species of fish using depth derivatives and backscatter information. Mandu in the northern area, Point Cloates in the central area and Gnaraloo in the southern zone (Figure 1).

Baited Remote Underwater Stereo-Video
A multidisciplinary project was conducted in NMP between 2006 and 2009 by the Western Australia Marine Science Institution (WAMSI) and associates (Waples and Hollander, 2008). As part of this project, many aspects of the NMP were studied, including the demersal fish composition using Baited Remote Underwater Stereo-Video (stereo-BRUVS). A total of 656 stereo-BRUVS were deployed across the areas in  included 239 deployments in Mandu, 185 in Pt Cloates and 155 in Gnaraloo (Waples and Hollander, 2008;Simpson and Waples, 2012). A database that included relative abundance, produced by the Australian Institute of Marine Science (AIMS), was used in the present study. The commonly used metric, MaxN, corresponds to the maximum number of individuals of the same species observed together in one frame at any one time, during the analyzed period of the video, and has been shown to provide a conservative estimate of relative abundance (Willis et al., 2000;Cappo et al., 2003). Only the first hour of recording was used for the MaxN estimation analysis, which commences the moment the cameras touch the bottom. More details on the collection and analysis of the stereo-BRUVS can be found in Harvey et al. (2007).

Depth and Depth Derivatives
Depth and depth derivatives MBES surveys of the study areas were conducted in 2008 by Geoscience Australia and AIMS, using a Kongsberg EM3002, operating at 300 kHz. The MBES bathymetry was downloaded from the Geoscience Australia (GA) website as a raster with 3 m resolution. Ten depth derivatives were calculated from the bathymetry as shown in Table 1 (Moore et al., 2010(Moore et al., , 2011. Some of the derivatives were produced using the raster package (Hijmans, 2016) of the free software R (R Development Core Team, 2017) and the rest were produced using Landserf v2.3 as specify in Table 1. Ecological processes occurring at different scales can affect the distribution and abundance of demersal fishes. Therefore, four different windows sizes were used in the production of the derivatives. The finest scale of analysis was fixed by the resolution of the MBES data (3 m) and a 3 × 3 window of analysis, while the other three were chosen based on the spatial dependence of the species. A variogram analysis was used to identify the maximum distance at which the species display spatial dependency (the range) (Holmes et al., 2008). For the species with spatial dependency, the range was above 4 km. Therefore, the scales were chosen to cover the span between the finest resolution and below the maximum range of any of the species using four windows sizes of analysis (number of cells) 3 by 3 (81 m 2 ), 9 by 9 (729 m 2 ), 15 by 15 (2,025 m 2 ), and 21 by 21 (3,969 m 2 ). The largest windows size was selected to be within the maximum range of the species, but also based on the pixel size of the ARA-phi layer (60 m). For the fractal dimension calculation, the smallest windows size allowed in Landserf is 9 by 9. Therefore, the 3 by 3 window analysis was not used for this variable.

Backscatter Derivatives
The backscatter information was included in the models as two different layers. The first one was the full-coverage, 3-m resolution mosaic, downloaded from the GA website. The second one is an approximation of the sediment phi size estimated using the ARA (Fonseca et al., 2009), applied to the raw files.

Angle vs. Range Analysis
The relationship between the backscatter strength and the grazing angle is commonly known as the ARC. ARC is related, for certain frequencies, to particular properties of the seafloor (Hasan et al., 2014). Therefore, the ARC can be used to infer characteristics of the seafloor using the Angle vs. Range Analysis (ARA; Fonseca et al., 2009). In this study, we used the FMGT software (version 7.8) to conduct an ARA analysis using the raw MBES backscatter data. A full description of the method followed during the ARA analysis in FMGT can be found in Fonseca and Mayer (2007), a brief description of the method is given here.
The backscatter angular response is first corrected for radiometric and geometric distortions to locate each ping to its correct angular position. In the next step, a group of consecutive pings is stacked in the along-track direction, 30 pings were stacked. The stack of the pings produces two seafloor patches, one for the port side and another for the starboard side. The size of the patch being analyzed is approximately half of the swath of the MBES system coverage. The stacking of the pings in a patch has the effect of reducing the resolution of the final layer, but it is a necessary step to reduce the speckle noise, typical to any acoustic method. An average ARC calculated for each patch is then compared to a formal mathematical model which relates the observed backscatter with seafloor properties in a process called the ARA-inversion. During the inversion, the model is used to produce an approximation of the acoustic impedance, roughness and consequently the mean grain size of the patch under analysis. An ARA-inversion analysis was conducted for all the patches in the three studied sites to obtain maps of the distribution of grain size, with a resolution of 60 m. During the analysis, only incidence angles between 20 and 60 • were included, as the angles in the near nadir and outer angle regions tend to be noisy with less power of discrimination between different types of substrate (Hasan et al., 2012b).
As part of the WAMSI project, 290 sediment samples were collected using a Van-Veen grab sampler for surface and subsurface material between 2007 and 2006 (Colquhoun et al., 2007). The grain size estimated for this ground-truth data was compared with sediment phi size estimated using the ARA analysis, correlation and regression was used to test the relationship between them.

Species Distribution Models
The environmental variables including depth, depth derivatives, and the backscatter data were used as explanatory variables to explain the probability of presence of A. stellatus, G. grandoculis, L. sceleratus, L. miniatus, L. macrorhinus, L. sebae, and S. queenslandicus. The species were selected based on a minimum 25 presence in each of the sampled areas. All the species included in the present study are carnivores with different degrees of generalist feeding behavior using a variety of benthic habitats ( Table 2). Lutjanids and lethirinids including G. grandoculis, L. miniatus, and L. sebae have a strong association to hard bottom or substrate with a certain degree of vertical relief (Parrish, 1987). L. sceleratus and A. stellatus, on the other hand, have a preference for sandy bottoms (Randall, 1967;Rousou et al., 2014). Seafloor backscatter can help to differentiate hard from sandy bottoms; therefore, this study has the hypothesis that the inclusion of seafloor backscatter will improve the accuracy of the models for the lutjanids and lethirinids species. For S. queenslandicus and L. macrorhinus, water column variables may be more important in explaining their distribution (Collette and Nauen, 1983;Gutteridge et al., 2011), and it is expected that the inclusion of seafloor backscatter data to have a marginal effect on their models.
Random Forest (RF) is a robust statistical method with many advantages to solving ecological problems, including high classification accuracy and particularly high capacity to model complex interactions without statistical pre-assumptions like normality (Breiman, 2001). The algorithm begins by selecting a bootstrap sample from the data, approximately 63% of the original observations are used at least once in the bootstrap sample. The rest of the observations not selected for the bootstrap sample are called out-of-bag (OOB) observations. RF fits a tree to each bootstrap sample, but in each node, only a subsample of the variables is available for the binary partitioning (one-third of the total number of variables in the case of regression and the square Abalistes stellatus Sand, sponge, and weed areas on deep slopes. Feeds on benthic animals Randall, 1967 root in the case of classification). All the trees are fully grown and used to predict the OOB observations. The predicted value for each observation is based on the average value predicted by the trees (Breiman, 2001). In this study, we used RF classification to model the presence/absence of the nine selected species and RF regression for the richness of species. For the RF classification, the sensitivity and specificity were evaluated using the Area Under the Curve (AUC) of the Receiver Operator Curve. The AUC varies between 0 and 1. Values higher than 0.9 are considered outstanding whereas values between 0.9 and 0.7 indicate good performance. Values lower than 0.7 indicate poor prediction and values lower than 0.5 indicate that the model is not better than a random classification (Hosmer et al., 2013). The performance of the models was also tested using the F1, and Kappa statistics. The F1 is the harmonic mean of precision and sensitivity while the Kappa statistic (K) measures the level of chance-corrected agreement between the observed and predicted classes. According to Landis and Koch (1977) the level of agreement measured with K can be classified as K < 0.0 poor, 0 ≤ K < 0.2 slight, 0.21 ≤ K < 0.4 fair, 0.41 ≤ K < 0.6 moderate, 0.61 ≤ K < 0.8 substantial, K > 0.8 almost perfect. The effect of including the backscatter data as explanatory variables in the accuracy of the models was examined using two scenarios, the first one including depth and depth derivatives (DV) and in the second one the two backscatter variables were added (DVBS). A fivefold cross-validation procedure was used, for each fold 65 percent of the data was used to train the model and the rest to test it, an AUC, F1, and K was obtained for each fold and the mean and standard error of the statistics is reported. The difference in the mean AUC, F1, and K for the DV and DVBS scenarios was tested using a t-test in R. The mean importance of the covariables was also calculated. For the RF regression the accuracy was measured by the mean square error (MSE), and also the percentage of variance explained by the model is reported.

Angle vs. Range Analysis
A significant correlation was found in the Mandu area between the phi sediment size estimated using the backscatter data in the ARA analysis and the ground-truth sediment samples grain size (r = 0.59, p < 0.001, r 2 = 0.25, p < 0.001). A significant correlation was also found in the Pt Cloates area between the phi sediment size estimated using the backscatter data in the ARA analysis and the ground-truth sediment samples grain size (r = 0.47, p = 0.003, r 2 = 0.22, p < 0.001). The relationship between the grab grain size and the ARA-phi for the full data combined was also significant (p < 0.001, Figure 2). No significant correlation was found for the Gnaraloo site.

Species Distribution Models
The performance of the models was species-and area-dependent with some species being better modeled in some areas than others and all species models having acceptable levels of accuracy (mean AUC > 0.7) in at least one of the studied sites ( Figure 3A). The effect of adding the backscatter data (DVBS) also varied by species and study site with no consistent improvement in the accuracy of the models. The Mandu area had fewer models of species with acceptable levels of accuracy (mean AUC > 0.7) while Pt Cloates had only one species with model mean AUC consistently < 0.7. A similar pattern was observed in terms of K with lower values K < 0.2 in the Mandu area and higher mean K-values in the Pt Cloates area K > 0.4 considered as moderate performance ( Figure 3B). The majority of the species had relatively high F1 mean > 0.7 (Figure 3C).
For G. grandoculis, the inclusion of the seafloor backscatter had a positive effect on the performance of the models increasing the mean AUC, K and F1 in all the three study sites. The increase on the mean value was significant for the K statistic in the Mandu and Pt Cloates area, and for the F1 in the Pt Cloates area. The significant increase of the F1 in the Pt Cloates area meant the accuracy of the resulting model was F1 > 0.7. The significant increase of the mean K also meant the model including the seafloor backscatter had an accuracy considered moderate (K > 0.4). Although the increase of the mean AUC in the G. grandoculis models was not significant, the mean AUC (± se) for Pt Cloates area was > 0.7 after the inclusion of the seafloor backscatter data (Figure 3A).
The DVBS scenario had a better performance in the models of L. miniatus, and S. queensladicus in at least two of the study sites, with different levels of improvement. The inclusion of DVBS resulted in a significant increase of the mean AUC for L. miniatus in the Pt Cloates area ( Figure 3A) and a significant increase in the mean K in the S. queensladicus model in the Gnaraloo area ( Figure 3B). A significant increase in the mean AUC was observed for the L. sceleratus model (Figure 3A) in the Mandu area, and a significant increase of the mean K in the Gnaraloo area (Figure 3B), when the seafloor backscatter was included; however, the mean AUC of the model in the Mandu area remained < 0.7.

Richness
The RF for the richness of demersal species explained different level of variance in the three study sites ( Table 3). The model with the lowest level of mean explained variance was at Gnaraloo, although, the DVBS scenario produced a significant increase of explained variance by 5% (P < 0.05). In the Mandu area, a significant portion of variance was explained by the models, with more than 25% of mean explained variance. However, no change in explained variance was observed in the DVBS scenario. The richness of demersal fish species was particularly well modeled in the Pt Cloates area with variance explained of greater than 40%. The importance of the backscatter data was evident with an increase of the explained variance in the DVBS scenario, although the increase was not significant (Table 3).

Variables Importance in the Distribution Models
A summary of the most important variables explaining the distribution of the species is shown in Table 4. Depth and seafloor backscatter were the most important variables in the construction of the models for the majority of the species in the three study sites (Table 4). Depth was key for the majority of the species in the three study areas, with some exceptions. Variables related to terrain variability (including roughness, TRI, and standard deviation of depth) were important for many of the species, in particular at a broad-scale (15-21 neighbors). For the models of L. sceleratus, L. macrorhinus, and S. queenlandicus, for example, the terrain variability variables had higher importance than depth in the Pt Cloates area.
The seafloor backscatter, and the ARA-phi layer, were among the three most important variables in the models of six of the seven studied species in at least one of the study areas. For G. grandoculis, the ARA-phi was the second most important variable in the models of the three study sites, confirming its importance for this species as shown by higher mean AUC, F1 and K of the DVBS models compared to the DV scenario. For L. sceleratus DVBS models in both the Mandu and Gnaraloo areas the ARA-phi and the backscatter mosaic ranked among the three most important variables in the models. The ARAphi variable was identified as one of the three most important variables in the models of L. miniatus, S. queenslandicus, in both Pt Cloates and Gnaraloo areas. For the L. macrorhinus and L. sebae model, the ARA-Phi was important in the Gnaraloo and Pt Cloates areas, respectively.
Depth was the most important variable in the construction of the model for the richness of species in the Mandu area, in both DV and DVBS scenarios. For the Pt Cloates area, TRI, followed by the profile curvature were the most important variables explaining the richness of species. The ARA-phi layer was also considered important when included in the model for Pt Cloates, although with a lower ranking. For the Gnaraloo area, slope, profile curvature and depth were the most important variables in the DV model, while for the DVBS model, both the ARA-phi layer and backscatter mosaic were second and third in importance.

DISCUSSION
The accuracy of the species distribution models based on depth and depth derivatives varied among species and study sites. Higher accuracies were observed, in general, for the species in the Pt Cloates area, which is considered to have a complex seafloor. The terrain variables were less successful in modeling the presence of the species in the Mandu area. The addition of the seafloor backscatter in the species distribution models did not necessarily increase the model's accuracy in a significant manner, although, in the majority of the cases the ARA-phi layer was ranked as an important variable when included in the models. The ARA-phi layer was particularly important in the model of G. grandoculis in the three study sites, and L. miniatus in two areas, increasing the accuracy of the models. A significant portion of the species richness variation was explained using the terrain variables, and the addition of the seafloor backscatter improved the accuracy of the model in the Gnaraloo area.

Backscatter Derivatives
A significant relationship was found between the phi size estimated with the ARA analysis and the grain sediment size measured from the grab samples. However, the ARA-phi analysis did not identify coarse gravel sediments (cobbles) with phi values below −3. Previous studies have suggested the inclusion of backscatter, and in particular, the use of the angular response of the backscatter can add to the discrimination between benthic habitats (Hasan et al., 2014). However, the seafloor backscatter intensity can be affected in different ways by the frequency of the echo-sounder, sediment grain size, nature, and magnitude of seabed roughness, and volume scattering by subsurface scatters (Ferrini and Flood, 2006). For example, previous studies have shown that the use of high-frequency MBES (e.g., 300 kHz) can lead to misclassification of coarse sediments when the grain size is larger than the acoustic wavelength of the sonar. In such cases, there is a decrease in the backscatter values for sediments of increasing grain size (i.e., λ = -2.3 φ, equal to 5 mm; Eleftherakis et al., 2014). In this study, something similar was observed with a significant correlation between grain size and the acoustic backscatter (ARA-phi), but low agreement between the two for large grain sizes. Also, scattering register by a high-frequency echo-sounder would be related to seabed surface roughness while scattering by particles under the sediment-water interface will be relatively more important at lower frequencies (Jackson et al., 1986). A previous study compared a high (200 kHz) and low (50 kHz) frequency echosounders and its ability to discriminate sediment grain size and found the higher frequency system failed to differentiate between sediment grain sizes even between mud and sand (Freitas et al., 2008). The importance of the different variables influencing the backscatter of the seafloor can also vary between sampling sites (Ferrini and Flood, 2006). Therefore the seafloor backscatter on its own has limitations to predict seabed characteristics (Ferrini and Flood, 2006). A drawback in the approach adopted in this study was using a constant number of stack pings during the ARA analysis, as the area sampled would then depend on the water depth. As a result, the sampling areas in the shallowest depths were around three times smaller than in the deepest zones. This can produce a misinterpretation of the sediment class in deeper areas, in particular, in areas of transition between two different classes. However, the vast majority of stereo-BRUVs deployments were not located in areas of transition between different ARA-phi classes reducing the risk of mixing sediment classes. Hence, it is unlikely that the different resolutions of the ARA-phi size layer had a significant effect on the species distribution models. The high ranking of the ARA-phi in the models of species distribution, reinforce the idea that the resolution of the variable was appropriate.

Species Distribution Models
For the majority of species, depth was a primary variable in explaining their distribution across the three study sites and for both DV and DVBS scenarios. Depth is a common variable influencing the distribution of species in coral reef areas, as it is related to the effects of light availability on community composition and function (Hill et al., 2014).
The importance of the depth derivatives at different window sizes varied among species and study sites. For the most abundant species, such as A. stellatus, L. sceleratus, and G. grandoculis, broader-scale variables (15 × 15 and 21 × 21 windows size) of TRI, roughness, slope and fractal dimension were considered key variables in explaining their distribution. These results agree with previous studies, showing that broad-scale variables are more relevant for species with higher mobility and larger home ranges that use a variety of benthic habitats (Franklin et al., 2009;Tamburello et al., 2015). For other species, like L. macrorhinus, which had the lowest prevalence in the study, the fine-scale variables were more important in two of the study areas, indicating a higher level of specialization. For the remaining species, a mix of fine and broad-scale variables was important in the construction of the distribution models.
The ARA-phi layer which was calculated with a broad resolution of 60 m, was found to be one of the three most Only the three variables with highest ranking of importance are included for each species and each scenario. The scenario of depth and derivatives (DV) and depth, depth derivatives and backscatter data (DVBS) scenarios are shown.
important variables in the species models. This reaffirms the importance of broad-scale variables for roaming species with a wide niche (Monk et al., 2011;Moore et al., 2011). The backscatter mosaic at 3 m resolution was often included as a key variable, though to a lesser extent. This study investigated the hypothesis that the addition of the seafloor backscatter would increase the accuracy of the models, in particular, for G. grandoculis, L. miniatus, L. sebae, L. sceleratus, and A. stellatus models. Seafloor backscatter data were consistently important in the models of G. grandoculis, increasing the model's accuracy for the three study sites. G. grandoculis is a species that inhabits rocky bottoms (Dorenbosch et al., 2005), which can explain the importance of backscatter in the construction of the models as this variable can be used to differentiate between soft/hard bottoms (Kloser et al., 2010). L. miniatus is associated with sand around coral reefs areas where it feeds on benthic invertebrates, which could explain the importance of the seafloor backscatter in the models of two of the study sites (Carpenter and Niem, 1998). However, results showed only an increase of 2-5% in the model mean AUC for G. grandoculis, L. miniatus and L. sceleratus, in at least two of the study sites, and similar incremental increases in F1 and K. Also, the increase of the mean AUC was only significant for L. sceleratus and L. miniatus, therefore the improvement can only be seen as indicative. L. miniatus is often more prevalent in shallow waters, such as on the Great Barrier Reef where it was found in 12-18 m of water (Newman and Williams, 2001), which may explain the lack of importance in the model accuracy for this species in the Mandu area. Depth might play a more important role at Mandu, where rapid changes in bathymetry are observed (Brooke et al., 2009). L. sceleratus, inhabits offshore sandy bottoms in their early life stages with a habitat shift to deeper or rocky grounds for the largest individuals (Fitzpatrick et al., 2012). The inclusion of the ARA layer may, therefore, add useful information to differentiate between sandy and rocky habitats. For L. sebae, the inclusion of the seafloor backscatter had a positive effect on the accuracy of the models for the Pt Cloates while variables measuring the rugosity of the seafloor were particularly important for this species in the three study sites. This species is associated with exposed reef slope (Fitzpatrick et al., 2012) which could explain the importance of variables related to the complexity of the seafloor as coral reef areas have, in general, higher levels of terrain complexity and rugosity. Depth and backscatter were not considered as important in explaining the distribution of some species. For example, L. macrohinus is a small species of shark whose distribution was more related to variables measuring the rugosity of the seafloor. Another species, S. queenslandicus, is an epipelagic neritic schooling species (Collette and Nauen, 1983;Kailola et al., 1993), which might explain the poor performance of the models for this species in two of the study sites, as variables of the terrain might not be related to its distribution.
Previous studies have found the addition of backscatter metrics can be important in the construction of models of demersal fish distribution (Monk et al., 2011) or suggested further studies were needed to assess the relationship between seafloor backscatter and the assemblage of demersal fish (Schultz et al., 2014). The results of this study showed that the seafloor backscatter was an important variable in the models of demersal fish distribution. However, the inclusion of this variable did not necessarily lead to an improvement in the accuracy of the models. Possible reasons for that may be that the depth and derivatives were capturing the significant changes in the habitat, or that the substrate was not a significant driver for the species distribution.
Another factor is the uncertainty associated with the use of seafloor backscatter to approximate specific characteristics of the seafloor, including roughness and hardness, but also sediment grain size. The amount of energy reflected by the seafloor is affected by the frequency of the MBES, and although higher frequencies are more affected by seabed roughness, they have less penetration in the sediment. For example, a 100 kHz frequency in fine sediment is expected to penetrate between 0.1-1 m (Fonseca et al., 2002), while a 12 kHz can penetrate up to 12 m in muddy deposits (Schneider von Deimling et al., 2013). The level of penetration in the sediments of high frequencies is also highly sensitive to small changes in sediment properties, in particular, between fine sediments (Gaida et al., 2018). Higher frequencies are also less effective to map coarse sediment larger than the wavelength of the MBES, which will have a lower level of acoustic reflectance. Therefore, the high frequency of the MBES (300 kHz) could limit the power of discrimination between benthic habitats (Boscoianu et al., 2008;Schneider von Deimling et al., 2013), that might be relevant for habitat selection of demersal fish. The use of multi-frequencies could increase our ability to discriminate between seabed environments (Feldens et al., 2018;Gaida et al., 2020). Previous studies have shown the use of multiple frequencies, in particular lower frequencies (e.g., 100 kHz) can help differentiate between soft sediments (Costa, 2019). However, the possible improvement of benthic habitat discrimination will also be related to the characteristics of the study site, as shown by Gaida et al. (2018).
The analysis of the 656 stereo-BRUVS showed that only around 3% of the species were moderately prevalent, occurring in ≥ 20% of the sampling point (Simpson and Waples, 2012). In the present study, we included some of these species, those with a minimum of 25 occurrences on each of the three sites in an effort to compare the model performance in different areas of the NMP. Therefore, they all presented a certain degree of generalist behavior which is related to less specialized habitat requirements, and as a result it is more difficult to produce well performing models of the species distributions (Wilson et al., 2008). Finally, the combination of stereo-BRUVS with acoustic data in itself includes a certain degree of uncertainty, for example, by the use of bait which can attract fish from the areas around the deployment. The possible impact of the incorrect location of presence records in the models of species distribution will be less likely for species for which depth and ARA-phi variables were important, considering that both variables have a high local spatial autocorrelation (in an area of 500 m around the stereo-BRUVs deployment) (Naimi et al., 2014). Future studies using more than one frequency are needed to better test the benefit of using the seafloor backscatter in habitat distribution models of species of demersal fish, in particular, for non-generalist species.

CONCLUSION
Demersal species were well modeled with the depth and depth derivatives in the majority of the species analyzed, in at least one of the study sites. The addition of the backscatter data increased the accuracy of the models for some species, in particular, a consistent positive effect was observed for G. grandoculis. Depth derivatives can integrate some of the seafloor roughness information which may explain the limited benefit of adding the backscatter data in some of the species distribution models. Additional information related to the hardness/roughness not included in the depth derivatives were important for some species for which the inclusion of the backscatter data had a positive effect.
For some species the mosaic backscatter layer appeared as an important variable in explaining their distribution. In general, however, the ARA-layer was more important for the variables in the construction of the models. This is an encouraging result that demonstrates that the use of novel derivatives which take advantage of the angular response can produce models with higher accuracies. Although the increase in the accuracy of the models was not significant for the majority of the species, it can be considered an indicative result, and more efforts are needed to confirm this pattern.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The animal study was reviewed and approved by the Curtin University Ethics Committee, AEC approval number: AEC_2014_09.

AUTHOR CONTRIBUTIONS
ML: conceptualization, methodology, formal analysis, writing-original draft, writing-review, and editing. MP: conceptualization, supervision, writing-original draft, writingreview, and editing. BS: conceptualization, methodology, writing-review, and editing. BR: conceptualization, formal analysis, writing-review, and editing. IP: conceptualization, investigation, supervision, writing-review, and editing. All authors contributed to the article and approved the submitted version.

FUNDING
Consejo Nacional de Ciencia y Tecnologia (CONACYT), provided a Ph.D. scholarship CVU 230139 for ML.