A Systematic Review of Remotely Operated Vehicle Surveys for Visually Assessing Fish Assemblages

Anthropogenic activities and greater demands for marine natural resources have led to increases in the spatial extent and duration of pressures on marine ecosystems. Remotely operated vehicles (ROVs) offer a robust survey tool for quantifying these pressures and tracking the success of management intervention across a range of depths, including those inaccessible to most SCUBA diver-based survey methods (∼>30 m). As the strengths, limitations, and biases of ROVs for visually monitoring fish assemblages remain unclear, this review aims to evaluate ROVs as a survey technique and to suggest optimal sampling strategies for use in typical ROV-based studies. Using the search engines Scopus™ and Google Scholar™, 119 publications were identified that used ROVs for visual surveys of fish assemblages. While the sampling strategies and sampling metrics used to annotate the imagery in these publications varied considerably, the total abundance of fish recorded over strip transects of varying dimensions was the most common sampling design. The choice of ROV system appears to be a strong indicator of both the types of surveys available to studies and the success of ROV deployments. For instance, larger, more powerful working-class systems can complete longer and more complex designs (e.g., swath, cloverleaf, and polygonal transects) at greater depths, whereas observation-class systems are less expensive and easier to deploy, but are more susceptible to delays or cancellations of deployments. In more severe sea state conditions, radial transects, or strip transects that employ live-boating or a weight to anchor the tether to the seafloor, can be used to improve the performance of observation-class systems. As these systems often employ shorter tethers, radial transects can also be used to maximize sampling area at greater depths and on large vessels that may rotate substantially while anchored. For highly mobile species, and in survey designs where individuals are likely to be recounted (e.g., transects along oil and gas pipelines), relative abundance (MaxN) may be a more robust sampling metric. By identifying subtle, yet important, differences in the application of ROVs as a tool for visually surveying deep-water marine ecosystems, we identified key areas for improving best practice in future studies.

INTRODUCTION
In light of global anthropogenic threats, such as climate change, pollution, and overexploitation, an increasing number of marine biological communities require conservation (Veitch et al., 2012). Fundamental to these conservation initiatives is the need for robust monitoring of the focal ecosystems (Espinoza et al., 2014; Addison et al., 2018). However, comprehensive monitoring of marine ecosystems, particularly the deeper regions, is difficult, and as a result most of the spatially- and temporally-mature monitoring datasets are constrained to depths shallower than 30 m (e.g., SCUBA diver-based underwater visual census) (Andaloro et al., 2013). While methods do exist for assessing deep-water marine ecosystems, such as longlining (Brooks et al., 2011; Santana-Garcon et al., 2014; McLean et al., 2015) and bottom dredging/trawling (De Leo et al., 2010; D'Onghia et al., 2012), ecological concerns encourage the use of non-destructive methods (Murphy and Jenkins, 2010; Lyle et al., 2014). The development of robust, non-destructive video-based survey techniques is an emerging field that allows for in situ observations of species, their distributions, behaviors and habitat associations in an array of habitats, including these difficult, deep-water ecosystems (Mallet and Pelletier, 2014). However, some of these mobile video-based methods, including autonomous underwater vehicles (AUVs), underwater towed videos (UTVs) and remotely operated vehicles (ROVs), are in their early stages of development and require proper evaluation and standardization before they can be reliably used for biological monitoring (Karpov et al., 2012; Lauermann, 2014).
ROV surveys are a novel video-based tool for assessing fish assemblages, yet have a number of strengths that could make them important in future biological surveys (Huvenne et al., 2018). These strengths include the ability to deploy high-resolution video (forward and downward looking) along fixed and repeatable transects on targeted seabed features, while maneuvering around complex substrate, and recording the track of the vehicle over base-maps (Linley et al., 2013; Macreadie et al., 2018). This combination of strengths allows for quantitative estimates of benthic floral and faunal cover as well as benthic fishes, epi-benthic fishes, pelagic fishes, and ground-truthing of major habitat features (Quattrini et al., 2017). However, the strengths, limitations and biases of ROVs for visually monitoring fish assemblages remain unclear. This review aims to evaluate ROVs as a video-based survey tool and, based on trends in the literature, to suggest standardized operating protocols for typical ROV-based studies.

METHODS
The Scopus™ and Google Scholar™ databases were accessed in November 2018 to search for peer-reviewed journal publications and gray literature that used video footage collected by remotely operated vehicles (ROVs) to visually assess fish assemblages (Figure 1 and Supplementary Material). The review identified 119 publications that used ROVs to visually survey fish communities by using the keyword combination "[(remotely AND operated AND vehicle*) OR (ROV OR ROVs)] AND (fish OR fishes)". Articles that (i) did not use a video-based approach, or (ii) did not survey fish assemblages, were excluded from this review. The publications from this search provided information on the general trends in ROV survey application and comparisons of ROV surveys with other deep-water survey methodologies. As surveys were often conducted over several years, publication years were used to analyze trends. Furthermore, publication years may also better reflect trends in the types of sampling metrics and analyses used for archived video footage from unstandardized industrial ROV surveys on oil and gas structures. ROV system costs were determined from the prices listed online by commercial retailers and inferred for specially-built models owned by research institutions.

Types of Remotely Operated Vehicles
Remotely operated vehicle (ROV) surveys use an unmanned underwater submersible that transmits real-time video observations and environmental readings (e.g., depth, compass heading) via an umbilical tether to the operator at the surface. ROVs are available in a range of systems from smaller observation-class ROVs (∼3-20 kg for mini and ∼30-120 kg for regular-sized models) to larger working-class systems (100-1,500 kg for light and up to 5,000 kg for heavy-duty models), which vary in power, depth rating, accessibility, and additional payload capabilities (Baker et al., 2012; Romano et al., 2017; Huvenne et al., 2018) (Table 1). Since the first publication in 1996, ROV systems have been increasingly used as a deep-water survey method (Figure 2). High-definition video cameras carried as extra payload provide researchers with permanent records of biota and their habitat associations. While some studies used photography to make these assessments, these studies were typically focused on mega-benthic taxa (Salvati et al., 2010; Thresher et al., 2014; Lacharité et al., 2015; Cánovas-Molina et al., 2016), and photography may not be ideal for moving targets such as fish.
With advances in technology, a wider range of ROV models is becoming available, including many low-cost models (Figure 3), allowing researchers greater access to deep-water environments. New gear developments have also led to the creation of hybrid ROV designs capable of autonomous deployments without an external energy source (Huvenne et al., 2018). For example, the Boxfish ROV (https://www.boxfish-research.com/) and the BlueROV2 (https://www.bluerobotics.com) use battery packs that last between 3 and 6 h and can be periodically exchanged for charged batteries on-board the vessel. As the umbilical tether no longer supplies power to the ROV system, the tether thickness is reduced, improving ROV maneuverability. Open source files on the construction of ROVs from low-cost materials, such as those from the OpenROV initiative (https://www.openrov.com) (Jessup, 2014), have further increased the accessibility of these systems to researchers and to the public.

Depths/Locations Surveyed
High maneuverability and deep-water capabilities allow ROVs to make fine-scale assessments of fish assemblages over a wide range of habitats (high relief, ledges, crevices) and depth distributions that may not be suitable for other methods. For example, on highly complex substrates in the Caribbean, Quattrini et al. (2017) determined that 42% of fish species were found at greater depths than previously recorded. The specific depth capabilities of ROV systems vary, with smaller observation-class models typically surveying shallower waters than working-class models (Table 2; Figure 4). ROV surveys have been conducted off the coast of all continents, with the vast majority of studies having been undertaken off the coasts of the United States and Europe (Figure 5). While many studies have taken place in the Mediterranean Sea, the Gulf of Mexico and the north-east and north-west Atlantic Ocean, few studies have been done in the southern hemisphere and Asia (Figure 5).
FIGURE 1 | The sampling parameters identified for visual-based ROV surveys of fish assemblages.
TABLE 1 | General differences in capabilities between observation-class and working-class ROV systems, with "-" signifying qualities that are less/fewer than and "+" signifying qualities that are more/greater than.
Quality | Observation-class | Working-class
Personnel required to operate | - | +
Size of vessel required to deploy from | - | +
Accessibility of use | + |

ROV-Based Surveys
The extensive range of motion of ROVs provides new possibilities in survey methodology that are not achievable with other deep-water (>30 m) or video-based survey techniques. ROV-based sampling strategies often reflected the aims of the study, which could be classified into six major types: (1) surveys in natural habitats, (2) surveys on artificial structures, (3) surveys in marine protected areas (MPAs), (4) opportunistic/exploratory surveys without the use of transects, (5) studies that evaluate the effectiveness of ROVs, and (6) studies that compare ROVs with other survey methods. While ROV surveys have primarily used horizontal strip transects, similar to SCUBA-based underwater visual counts (UVCs), or unstandardized transects (Figure 6), alternative sampling strategies and transect designs (e.g., cloverleaf, radial, and polygonal patterns) are achievable (Table 3; Figure 7).

Metrics Scored From ROV Imagery
Although an extensive array of sampling metrics has been used to analyze ROV surveys, total abundance, diversity, body length estimates, and natural behaviors were consistently used across the different study types (Figure 8A). While surveys in natural habitats have used each type of sampling metric to annotate the video footage, surveys in marine protected areas have only used total abundance, diversity, body lengths, and natural behaviors of fish communities (Figure 8B). Total abundance, the total number of individuals recorded throughout the transect, was the most frequently used metric (Figure 8B; 86% of studies), and was often converted into catch-per-unit-effort (Amend et al., 2001) or catch-per-unit-time (Söffker et al., 2011; Bryan et al., 2013) to facilitate comparisons across locations and between different survey methods (Adams et al., 1995; Pita et al., 2014). As total abundance assumes sampling independence of individuals, fish that can swim faster than the ROV or are attracted to the ROV system itself may be recounted, potentially inflating population estimates. Alternative abundance metrics, such as relative abundance (MaxN), the maximum number of individuals of a species observed at one time during a deployment (Ajemian et al., 2015a,b; McLean et al., 2018), and weighted encounters, where scores are assigned based on the order or frequency that species are seen on transect (Moser et al., 1998; Pradella et al., 2014), were used to mitigate the effect of individual recounts. This metric was most often used on artificial structures (33% of studies), where multidirectional movement along the structure often resulted in fish overtaking the ROV. Percent cover was the least used sampling metric, accounting for one publication that assessed the distribution of juvenile silver hake (Auster et al., 1997).
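The contrast between total abundance and MaxN can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the annotation format (one record per scored video frame with hypothetical `species`, `visible`, and `new` fields), the function name `summarize_counts`, and the data. MaxN is taken here as the largest number of individuals of a species visible in any single scored frame, so an individual that re-enters the field of view cannot raise it.

```python
def summarize_counts(frames, transect_length_m):
    """Return total abundance, MaxN, and individuals per 100 m for each species.

    frames: list of dicts with hypothetical keys
        'species' - species name
        'visible' - individuals of that species visible in the frame
        'new'     - individuals scored as newly entering the frame
    """
    if transect_length_m <= 0:
        raise ValueError("transect length must be positive")
    total, maxn = {}, {}
    for frame in frames:
        sp = frame["species"]
        # Total abundance accumulates every entry, including possible recounts
        total[sp] = total.get(sp, 0) + frame["new"]
        # MaxN keeps only the largest simultaneous count
        maxn[sp] = max(maxn.get(sp, 0), frame["visible"])
    # Effort-standardized total abundance (individuals per 100 m of transect)
    per_100m = {sp: 100.0 * n / transect_length_m for sp, n in total.items()}
    return total, maxn, per_100m


# Hypothetical case: a school of four fish overtakes the ROV three times
frames = [
    {"species": "snapper", "visible": 4, "new": 4},
    {"species": "snapper", "visible": 4, "new": 4},
    {"species": "snapper", "visible": 4, "new": 4},
    {"species": "flathead", "visible": 1, "new": 1},
]
total, maxn, per_100m = summarize_counts(frames, transect_length_m=50.0)
```

In this made-up case the same school is counted three times, so total abundance for the schooling species is 12 while MaxN remains 4.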
Diversity indices were also commonly used for ROV surveys (Figure 8B; 45% of studies) to describe the total number of species recorded per deployment, i.e., species richness (S) (Carpenter and Shull, 2011; Consoli et al., 2016), and the evenness of species within the community, using Pielou's evenness (J'), Shannon's diversity index (H'), and Simpson's diversity index (Johnson et al., 2003; Harter et al., 2009; Ajemian et al., 2015a; Quattrini et al., 2017). Percent occurrence, the percentage of transects in which a species was observed, was also used as a measure of diversity across years with different sample sizes (Auster et al., 1997; Pacunski et al., 2013). Presence/absence (Figures 8A,B) has also been used as a versatile metric for ROV-based studies with unbalanced survey designs (Duffy et al., 2014), where it is effective for inferring distribution ranges and the catchability of species to different survey methods (Karpov et al., 2004).
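For readers less familiar with these indices, the following sketch shows one common way to compute them from per-species counts. It is a minimal illustration under stated assumptions: natural logarithms are used for H', Simpson's index is given in its 1 - Σp² form (other forms exist, and the reviewed studies may have used them), and the function names and data are hypothetical.

```python
import math


def diversity_indices(counts):
    """Compute richness (S), Shannon (H'), Pielou (J'), and Simpson (1 - sum p^2).

    counts: dict mapping species name to number of individuals observed.
    """
    abundances = [n for n in counts.values() if n > 0]
    if not abundances:
        raise ValueError("at least one species must have a positive count")
    total = sum(abundances)
    proportions = [n / total for n in abundances]
    richness = len(abundances)
    shannon = -sum(p * math.log(p) for p in proportions)
    # J' is undefined for a single species; 0.0 is used here as a convention
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    simpson = 1.0 - sum(p * p for p in proportions)
    return {"S": richness, "H": shannon, "J": evenness, "simpson": simpson}


def percent_occurrence(transects, species):
    """Percentage of transects in which the species was seen at least once."""
    if not transects:
        raise ValueError("no transects supplied")
    hits = sum(1 for t in transects if t.get(species, 0) > 0)
    return 100.0 * hits / len(transects)


# Hypothetical data: two equally abundant species, four transects
idx = diversity_indices({"hake": 10, "cod": 10})
transects = [{"hake": 6, "cod": 2}, {"hake": 3}, {"cod": 1, "ling": 4}, {"ling": 0}]
occ = percent_occurrence(transects, "hake")
```

With two equally abundant species, evenness is at its maximum (J' = 1) and H' equals ln 2, which makes the output easy to check by hand.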
FIGURE 4 | Temporal trends in the average maximum depth (m) surveyed using mini-sized observation-class (white), regular-sized observation-class (light gray), light working-class (dark blue), and heavy working-class (light blue) ROV systems.
FIGURE 5 | The location of studies using ROV surveys to visually assess fish assemblages identified using the keywords "remotely operated vehicle*," "ROV or ROV's," and "fish or fishes" in the search engines Scopus™ and Google Scholar™.
As a video-based method, ROVs can obtain accurate body length measurements of fish without the destructiveness of traditional methods, such as trawling or hook and line. These body length measurements can be used to assess trophic structure (Auster et al., 2003; Dance et al., 2011), which is valuable for evaluating the condition of ecosystems and for making comparisons between marine reserves and adjacent habitats. Both calibrated stereo-video and scaling lasers have been used to estimate fish length. A study by Dunlop et al. (2015) found that while stereo-video was less influenced by the orientation and height of the organism recorded, it took considerably longer than lasers to analyze. Length estimates have also been used to standardize sampling area by limiting population counts to a certain transect width and distance in front of the camera's field-of-view (FOV), while maintaining a constant height above the seafloor (Mapula et al., 2016). Current strength and other weather conditions, however, may interfere with the ROV's ability to maintain a constant position in the water column, leading to inconsistent sampling areas (Mapula et al., 2016). Furthermore, studies in turbid waters and studies that survey small-bodied or cryptic species may require the ROV to be flown closer to the seafloor, decreasing the FOV. Variations in applications have thus led to differences in the height above the seafloor (0.2-3.0 m), the transect width (0.35-6.0 m) and the distance recorded in front of the camera (0.5-4.0 m), which may limit comparisons between surveys and locations. Larger, working-class vehicles also typically recorded larger FOVs than observation-class systems (Trenkel et al., 2004a,b; Trenkel and Lorance, 2011; Baker et al., 2012).
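The arithmetic behind laser scaling and strip-transect standardization can be sketched as follows. This is a simplified illustration, not a procedure taken from any of the cited studies: it assumes two parallel lasers with a known spacing, a fish lying at the same range as the laser dots and roughly perpendicular to the camera axis, and a strip of constant width. The function names and numbers are hypothetical.

```python
def laser_scaled_length_cm(fish_px, laser_px, laser_spacing_cm):
    """Estimate fish length from its length in pixels.

    fish_px          - fish length measured on the image, in pixels
    laser_px         - distance between the two laser dots, in pixels
    laser_spacing_cm - known physical spacing of the parallel lasers
    """
    if laser_px <= 0:
        raise ValueError("laser dots must be separated on the image")
    # The lasers give the image scale (cm per pixel) at the target's range
    return fish_px * laser_spacing_cm / laser_px


def strip_density_per_100m2(count, transect_length_m, strip_width_m):
    """Convert a count on a strip transect to individuals per 100 m^2."""
    area_m2 = transect_length_m * strip_width_m
    if area_m2 <= 0:
        raise ValueError("transect length and width must be positive")
    return 100.0 * count / area_m2


# Hypothetical values: lasers 10 cm apart appear 100 px apart on the image
length_cm = laser_scaled_length_cm(fish_px=300, laser_px=100, laser_spacing_cm=10)
# Hypothetical strip: 12 fish counted over a 100 m long, 2 m wide transect
density = strip_density_per_100m2(count=12, transect_length_m=100, strip_width_m=2)
```

Because density scales inversely with strip width, the range of transect widths reported above (0.35-6.0 m) shows why counts from different studies are hard to compare unless the sampled area is reported.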
Although ROVs were initially used for determining substrate composition in deep-water biological assessments as a replacement for manned submersibles (Koenig et al., 2005), advancements in video quality and in ROV technology have allowed them to become a more practical and affordable method for assessing a wide variety of flora and fauna, including elasmobranchs (Benz et al., 2007; Henry et al., 2016), teleost fish (Carpenter and Shull, 2011; Haggarty et al., 2016), cephalopods (Smale et al., 2001; Zeidberg and Robison, 2007), gastropods (Butler et al., 2006; Stierhoff et al., 2012), macro-algae (Spalding et al., 2003), corals (Doughty et al., 2014; Etnoyer et al., 2018) and other macroinvertebrates (Grinyó et al., 2016; Hemery and Henkel, 2016). While ROVs can provide in situ observations of fish, their behaviors and habitat associations that cannot be determined with traditional methods (e.g., trawls, longlines) (Adams et al., 1995; Karpov et al., 2004; Linley et al., 2013), ROV-based sampling strategies must account for the unique challenges of surveying mobile organisms that do not arise in surveys of sessile invertebrates and substrate.
Behaviors naturally exhibited by fish (Figures 8A,B), such as swimming, feeding, and resting (Baker et al., 2012), may allow for a better understanding of small-scale influences on fish distribution (Lorance and Trenkel, 2006) and indicate behaviors that may result in over-counting (Ajemian et al., 2015a). Behavioral reactions of fish to ROVs have been frequently documented (Figures 8A,B), and are important for establishing baseline information for species and locations, evaluating the reliability of data generated by this method, and for making comparisons against other survey methods. Baker et al. (2012) used a basic scale of attraction, avoidance and no reaction, whereas Adams et al. (1995) used a scale that differentiated between weak and strong responses of attraction and repulsion. The type and severity of the reaction to the ROV can be influenced by a variety of factors, including the species, trophic position, and the body size and position of the individual relative to the seafloor as well as to different aspects of the ROV system (e.g., artificial lighting, thruster noise, speed) (Smale et al., 2001; Lorance and Trenkel, 2006; Stoner et al., 2008; Yamamoto et al., 2009; Söffker et al., 2011; Mapula et al., 2016). For example, Laidig et al. (2012) found that more fish reacted to the regular-sized observation-class ROV (57%) than to a larger, manned submersible (11%). The fish that did react to ROV presence were typically smaller-bodied individuals, individuals >1 m above the seafloor and species that aggregate (Laidig et al., 2012). Intuitively, ROVs traveling at greater speeds may increase the frequency and severity of reactions displayed and decrease the probability that cryptic individuals will be detected (Pacunski et al., 2008). However, this review was unable to locate any studies that specifically investigated the effects of different speeds on behavior. Furthermore, while most studies attempted to maintain a relatively constant speed (Meirelles et al., 2015), the actual speed traveled was only calculated in 26% of studies. Speeds reported varied between studies (0.1-1.0 m/s) and within each study (Quattrini et al., 2017), often as a result of current and drag. Standardizing deployment speed is logistically difficult, but may ultimately become more achievable with advancements in thruster technology.
FIGURE 6 | The percentage of studies using horizontal transects (standardized strip/line transects), exploratory transects (unstandardized transects used to make initial baseline inventories of species and community structure), swath transects (strip transects connected to form a grid-like pattern), vertical transects (mobile point count, continuous roving transect, depth-interval transect, vertical strip transect), radial transects (strip transects radiating from a central point), and timed transects (rapid visual count, timed swim, timed stationary counts, and modified timed swim with timed stationary counts) in surveys in natural habitats (n = 62 studies), studies on artificial structures (n = 19 studies), surveys in marine protected areas (n = 5 studies), exploratory surveys (n = 15 studies), surveys evaluating the effectiveness of ROV-based sampling strategies (n = 7 studies), and surveys comparing ROVs against other surveying methods (n = 15 studies).
Artificial lighting is another important factor known to influence organism behavior (Smale et al., 2001), but is critical for night-time sampling, for surveys beyond the photic zone and in areas of high turbidity, and for improving the detection of small-bodied or cryptic species, such as flatfish (Norcross and Mueter, 1999; Pacunski et al., 2013). In an ROV experiment comparing the behavior of sablefish (Anoplopoma fimbria) under different lighting conditions around a bait source, Widder et al. (2005) found that more fish avoided white light than red light. While the majority of ROV studies (∼67%) did not indicate whether lights were used, the depths surveyed by many of these studies suggest that most would have used some form of artificial lighting. The type and intensity of lighting used for ROV surveys, however, were either unspecified (∼16% of studies) or varied considerably. Even though the distribution of many species (e.g., Sebastes spp.) is influenced by the time of day (Hart et al., 2010), few studies (∼2%) used nocturnal sampling. Timed metrics, such as time at first sighting, i.e., the time when a species was first seen (Norcross and Mueter, 1999; Ajemian et al., 2015b; Smith and Lindholm, 2016), and the duration of encounter (Laurenson et al., 2004; Luck and Pietsch, 2008; Trenkel and Lorance, 2011; Mundy et al., 2018), were infrequently used for ROV studies of fish (Figures 8A,B; n = 8 studies), but likely reflect species-specific behaviors. Timed metrics have been used primarily in exploratory surveys for obtaining baseline information on species (Luck and Pietsch, 2008; Mundy et al., 2018) and in studies evaluating ROVs (Figure 8A), where the distance traveled on deployment before first sighting may indicate the sampling power required to survey different species.

Use in Natural Habitats
Biological assessments provide crucial information on population dynamics and species-habitat associations necessary for monitoring and conservation. These assessments must employ sampling strategies that are able to survey fish effectively and representatively over major habitat features across large distances (Trenkel et al., 2004a). Bathymetric maps created by multibeam echosounder (MBES) can be used to locate and stratify sampling over ecologically important habitat structures, such as reefs or other areas of high relief (Linley et al., 2013). Ultra-short baseline (USBL) or long baseline (LBL) transponders can then be used to track and record the precise deployment path taken over multibeam-derived features (Stierhoff et al., 2013). This allows for accurate estimations of transect length and for the specific location of individual sightings to be determined for a better understanding of microhabitat features (Ajemian et al., 2015b). Horizontal strip transects are a straightforward and well-established surveying strategy for providing standardized assessments of fish assemblages (Johnson et al., 2003) and are accessible to most ROV systems. Strip transects that are placed parallel to the coastline or isobath allow for greater replication within depth bands, while strip transects placed perpendicular to the coastline or isobath, with deployments moving shallower, increase the amount of time the seafloor is in view, but may decrease the length of transects on steep topography (Pacunski et al., 2008). Deployments in high wind and current conditions, however, may be limited to traveling down-current from the starting location.
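As a rough illustration of how transect length and mean speed might be derived from acoustic positioning, the sketch below sums great-circle distances between successive position fixes. It assumes the fixes have already been converted to time-stamped latitude and longitude in decimal degrees and cleaned of outliers; actual USBL or LBL output formats differ between systems, and unfiltered position noise would inflate the summed length. The function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; adequate for short transects


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def track_length_m(fixes):
    """Sum the distances between consecutive (time_s, lat, lon) fixes."""
    return sum(
        haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(fixes, fixes[1:])
    )


def mean_speed_m_per_s(fixes):
    """Average speed over the track: length divided by elapsed time."""
    elapsed = fixes[-1][0] - fixes[0][0]
    if elapsed <= 0:
        raise ValueError("fixes must span a positive time interval")
    return track_length_m(fixes) / elapsed


# Hypothetical track: three fixes, 0.0005 degrees of latitude apart, over 300 s
fixes = [(0, -34.0000, 18.5), (150, -33.9995, 18.5), (300, -33.9990, 18.5)]
length = track_length_m(fixes)
speed = mean_speed_m_per_s(fixes)
```

Multiplying the resulting length by the strip width gives the sampled area, and dividing it by elapsed time gives the realized speed, which studies rarely report.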
The length of the umbilical tether and the current-induced drag on the tether cord (Ajemian et al., 2015b) can influence the distance the ROV can travel from the operating vessel. A "live-boat" technique, in which a clump weight is attached to the umbilical tether a short distance above the seafloor (Amend et al., 2001; Bryan et al., 2013; Haggarty et al., 2016), can be used to maintain the ROV at depth (Yamamoto et al., 2009) while allowing the unanchored vessel to move freely with the ROV during deployment (Pacunski et al., 2008). For locations that require additional stabilization from the current, particularly for systems with shorter tethers, and that are at depths shallow enough to anchor the vessel, clump weights can be used to secure the umbilical tether to the seafloor, where the weight acts as a central starting point for radial transects. Radial transects, however, can be time-consuming and are not practical for studies using long transect lengths (>100 m) (Pacunski et al., 2008). Although timed swims are an efficient approach, inconsistent sampling effort between surveys and locations indicates that this method should be used sparingly, when the survey area cannot be determined or for validating transect lengths generated by the USBL tracking system.

Use on Artificial Structures
Artificial structures, such as decommissioned oil and gas platforms (Adams et al., 1995; Andaloro et al., 2013) and artificial reefs (Patterson III et al., 2009; Dance et al., 2011), provide substrate to which corals and sessile invertebrates can attach, creating the habitat complexity necessary for marine communities to prosper. Working-class ROVs have been a long-established tool for inspecting and maintaining underwater pipelines, promoting advancements in these systems as well as the opportunity for researchers to work alongside these companies to collect data on marine communities (Gates et al., 2017). Archived video footage of underwater pipelines can be divided into short strip transects to provide fine-scale population assessments of fish biodiversity (McLean et al., 2017). Swath transects can be used to survey fish near and over low-relief artificial substrate, such as sunken vessels (Ross et al., 2016), whereas radial transects are effective for assessing the influence of high-relief artificial structures on nearby biological communities (Taylor et al., 2014).
As many artificial structures have complex features and areas of high relief, alternative sampling strategies that incorporate vertical movement while minimizing the possibility of tether entanglement need to be considered. Bryan et al. (2013) used a modified timed swim along the hull of the vessels with a series of timed stationary counts to aid in identifying small-bodied and cryptic species, standardized to within 3 m of the reef. On larger artificial reefs, Ajemian et al. (2015b) compared continuous roving transects, a UVC-based approach whereby the ROV follows horizontal transects at the bottom and top of the reef, with depth-interval transects, a modified type of mobile point count that replaces 360-degree spins with stationary timed counts at pre-set depth intervals to avoid tether entanglement. This study found that continuous roving transects were more effective at surveying fish with patchy distributions, including several rare species, whereas depth-interval transects were more effective at recording overall fish community composition, were able to estimate fish densities at distinct depth strata and reduced time spent processing video footage. As sampling independence of individuals would be difficult to ensure for many of these approaches, relative abundance would be a more appropriate sampling metric than total abundance for surveys on artificial structures.

Use in Marine Protected Areas
Marine protected areas (MPAs) are a widely recognized tool for conservation that have been shown to increase the overall density and biomass of organisms within the MPA and the surrounding ecosystem (Barrett et al., 2007; Haggarty et al., 2016). Exploratory transects are beneficial for characterizing the types and quantities of different habitats and obtaining baseline information on community structure and species distributions to inform proposed MPA designs (Quattrini and Ross, 2006) and to concentrate future sampling efforts (Butler et al., 2006). In studies investigating MPA effects, sites within each reserve (impact) are compared with fished location(s) outside of the reserve that have similar depths and habitat profiles (control) (Karpov et al., 2012). While studies that employ a greater number of control sites are less likely to be influenced by site-specific differences in community, sites nearer to the MPA are more likely to be influenced by spill-over effects (Karpov et al., 2012). Collecting data before and after the establishment of an MPA, as part of a before-after-control-impact (BACI) approach, would account for the natural differences between MPAs and control sites, which may more accurately reflect the influence of disturbance events, such as trawling (Lindholm et al., 2015). Since the first publication in 2003, five studies have examined the influence of MPAs on fish communities using ROVs (Auster et al., 2003, 2016; Harter et al., 2009; Karpov et al., 2012), with only one employing a BACI sampling design (Haggarty et al., 2016).
Optimal sampling designs must consider the trade-off between transect lengths that are able to detect target species in significant quantities and the number of replicates that are required to detect population change within and outside of the MPA. Studies using observation-class ROVs would more effectively survey an area using a greater number of replicate transects, whereas working-class systems can employ large swaths (>500 m segments) to survey marine protected areas (Karpov et al., 2012; Lauermann, 2014). Given the few sampling strategies identified, research into alternative sampling designs may provide better insight into strategies that can more effectively detect differences in biological communities between protected and fished sites. However, as gear-selective biases can influence the types and quality of data that are obtained by different survey techniques, one approach should be employed throughout the duration of a BACI study.
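The logic of a BACI comparison can be written as a simple contrast: the change at the protected (impact) sites minus the change at the fished (control) sites over the same period. The sketch below computes only this point estimate from hypothetical transect densities. It is an assumption-laden illustration rather than the analysis used in any cited study, which would typically fit a statistical model with replication and an interaction term.

```python
from statistics import mean


def baci_effect(impact_before, impact_after, control_before, control_after):
    """Return the BACI contrast for one response variable.

    Each argument is a list of replicate values (e.g., fish per 100 m^2).
    A positive value means the impact sites increased more than the controls.
    """
    impact_change = mean(impact_after) - mean(impact_before)
    control_change = mean(control_after) - mean(control_before)
    # Subtracting the control change removes variation shared by both areas
    return impact_change - control_change


# Hypothetical densities from replicate ROV transects
effect = baci_effect(
    impact_before=[4, 6],
    impact_after=[9, 11],
    control_before=[5, 5],
    control_after=[6, 6],
)
```

In this made-up example, density rose by 5 inside the reserve and by 1 outside, so the contrast attributes an increase of 4 to protection. Without the "before" data, the pre-existing difference between sites could not be separated from the reserve effect.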

Exploratory Surveys
Exploratory studies are important for collecting baseline information in locations with little a priori knowledge, in order to obtain a general understanding of species distributions necessary to inform future research directives (Hall-Spencer et al., 2002). These studies generally maintain a relatively straight-line trajectory but may deviate from the intended route to investigate specific habitat features (Thresher et al., 2014). For rarely documented deep-water species, such as sleeper sharks (Benz et al., 2007) and anglerfish (Ho and McGrouther, 2015), as well as unique deep-water communities, such as those at whale falls (Lundsten et al., 2010; Higgs et al., 2014), collecting information on morphological characteristics and behaviors is crucial for understanding the ecology of organisms in these environments. However, as the FOV is unstandardized, absolute abundances cannot be estimated (Stein et al., 2005; Caldwell et al., 2016). As a result, exploratory studies tend to collect qualitative rather than quantitative data (Söffker et al., 2011), which may hamper comparisons between locations and studies.

Method Evaluation Studies
Evaluating the effectiveness of ROVs and ROV-based sampling designs for surveying biological assemblages is essential to understanding the capabilities of this method. Trenkel et al. (2004a) determined that spatial dispersion of individuals had the greatest effect on between-species variation, with aggregating species more susceptible to ROVs than those that were randomly or uniformly distributed. Optimal sampling units (i.e., transect lengths and number of replicates) are dependent on species distributions within the study location, with longer transects increasing the probability of detecting rare and cryptic species and species with patchy distributions (Norcross and Mueter, 1999; Karpov et al., 2004, 2010; Pacunski et al., 2013), but decreasing the overall number of replicates used, due to time constraints (Trenkel et al., 2004b). Furthermore, the number of species accumulated with increasing transect length will eventually plateau, decreasing overall sampling efficiency. Karpov et al. (2010) used a power analysis to assess the relationship between rockfish density and variance on strip transects of different sizes (50, 100, 200, 400, and 800 m²) to determine the optimal sampling unit for surveying rockfish off the west coast of the United States. This study evaluated a few sampling strategies, including randomly allocating transects of different lengths across depth and relief strata, and using long 800 m² transects broken up into different-sized transect segments that were either systematically placed parallel to the shoreline or randomly placed within 500 m² wide rectangular areas (Karpov et al., 2010). Adams et al. (1995) framed statistical power in terms of the sample size required to detect a 50% reduction in the transformed mean abundance at a power of 0.8 and an α of 0.05. Karpov et al. (2004), on the other hand, argued that sample sizes that can detect 1.5 times less than the sample mean may be more practical for detecting depleted species and species that typically undergo large-scale population changes. Furthermore, this study suggested that only species with abundances that can be detected within 3 times the sample mean can be reliably monitored by ROVs (Karpov et al., 2004). The lengths of transects used in ROV surveys of fish, however, varied considerably between studies (0.05-20 km) and even within individual studies (Du Preez and Tunnicliffe, 2011; Baker et al., 2012; Duffy et al., 2014). While numerous approaches have been used to increase the sampling efficiency of ROV surveys, including towing the ROVs behind vessels (Pierdomenico et al., 2016) or tethering the ROVs to camera sleds (Quattrini et al., 2017), these approaches increase deployment speed at the expense of maneuverability (Mortensen et al., 2008). Sophisticated working-class systems that can maintain a precise deployment path, on the other hand, can reduce time spent deploying and retrieving the ROV for each transect by either dividing long strip transects (>500 m) into separate replicates (Karpov et al., 2010) or by connecting strip transects in a continuous grid-like pattern to form a swath transect.
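To illustrate how the power criterion above (a 50% reduction, power of 0.8, α of 0.05) translates into replication, the sketch below uses a textbook normal approximation for comparing two group means. It is not the procedure used by Adams et al. (1995) or Karpov et al. (2004, 2010); the function name, the equal-variance assumption, and the coefficient of variation (CV) values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def transects_per_group(cv, reduction=0.5, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-sample z-test.

    cv        : coefficient of variation (SD / mean) of per-transect abundance
    reduction : proportional drop in the mean to detect (0.5 = 50%)
    Assumes the same SD in both groups (an illustrative simplification).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    # Effect size in SD units: (reduction * mean) / (cv * mean)
    effect = reduction / cv
    return ceil(2 * ((z_alpha + z_power) / effect) ** 2)

# Hypothetical CVs: low for evenly spread species, high for patchy ones.
for cv in (0.5, 1.0, 2.0):
    print(cv, transects_per_group(cv))
```

Under these assumptions the required replication grows with the square of the CV, which is consistent with the point that patchily distributed species need either longer transects (to lower per-transect variability) or more replicates.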
Investigating how different aspects of the ROV apparatus (i.e., lights, thruster speed, size) affect fish behavior may give insight into species-specific differences in gear selectivity and allow the aspects that minimize behavioral bias to be identified. Spanier et al. (1994) used laboratory and field experiments to investigate how different components of the ROV system influence the boldness and feeding behaviors of American lobsters (Homarus americanus). In laboratory experiments, tanks that had ROVs (treatment) were compared with tanks that did not have ROVs (control). Field experiments, in contrast, compared ROVs operating with lights on, thrusters set to 50% speed, and a camera flashing every 15 min or whenever the lobster appeared out of its den (treatment) with ROVs that had the lights, thrusters, and camera flash turned off (control) (Spanier et al., 1994). This study provided the first scientific evidence of behavioral bias toward ROVs but was unable to identify the specific variables that contributed to changes in lobster behavior, as the different components were not individually tested. While Trenkel et al. (2004b) did assess the attraction and repulsion of fish to different intensities of light (1,200 and 2,700 W) and survey speeds (0.25 and 0.5 m/s) of an ROV, this study was unable to generate quantitative estimates of fish abundance.

Method Comparison Studies
Method comparison studies allow researchers to assess the capabilities of each method for surveying fish in different habitats under the same environmental conditions within the study location, as well as the species-specific behaviors and reactions to each survey method that lead to differences in the strengths, limitations, and biases of these methods (Table 4). As a result, comparison studies that are conducted in different locations or that survey different biota may show variations in performance. Study designs for method comparison studies must take into consideration the types of sampling metrics used to annotate the imagery. For instance, true abundance estimates obtained by ROV cannot be used to make reliable comparisons to relative abundance estimates collected by stationary methods, such as baited remote underwater video systems (BRUVS). However, studies using these two methods may be able to provide comparisons of community structure and the efficiency and cost of each method.

RT
RTs use anchored umbilical tethers to improve the performance of ROVs in less-than-ideal sea state conditions while maximizing the sampling area covered with the shorter tethers typically associated with mini-sized systems. Studies that require longer tether lengths (e.g., in deeper water or on long transects) can use live-boating to provide greater stability to STs or OBs, although this sampling strategy is more subject to sea state conditions and may present challenges in high-relief habitats.

Artificial structures

MTS
MTSs provide intensive assessments of common and cryptic fish species on finite structures (Bryan et al., 2013), such as sunken vessels and artificial reefs, that are more practical for mini-class systems than are CRTs and DITs. However, DITs may provide a better representation of community structure along vertical structures, such as oil and gas platform legs, whereas STs may be more appropriate for narrow structures, such as oil and gas pipelines, that may not benefit from stationary abundance counts.
The specific capabilities of ROVs in comparison to other survey methodology should, therefore, be considered when developing appropriate experimental designs (Table 4). ROV surveys provide direct and non-destructive observations of habitat associations and behaviors not attainable by fisheries-dependent methods, such as longlining and bottom trawling (Busby et al., 2005; Bicknell et al., 2016; Consoli et al., 2016; Mapula et al., 2016). Trawls and ROV surveys operate at very different scales, with ROVs intensively sampling a narrow area directly in front of the vehicle that extends a short height off the bottom, and trawls sampling a much wider area, including a larger area off the bottom, but with a greater number of escaped fish (Adams et al., 1995). Consequently, trawls are able to cover large enough areas to compensate for local variability, whereas ROVs must select sampling designs that either employ long strip transect lines or have enough replication selectively placed over targeted habitat features in order to account for patchy distributions (Norcross and Mueter, 1999). The mechanical nature of the sampling gear has resulted in major differences in the type of fish assemblages captured by each method. The catch-per-unit-effort was often higher for ROVs (Norcross and Mueter, 1999), with lower coefficients of variation than for trawls (Adams et al., 1995). While small, benthic, cylindrically-shaped fishes were more susceptible to ROVs (Adams et al., 1995), juveniles under 100 mm (Norcross and Mueter, 1999) and species with larger bodies that were further from the seafloor were more susceptible to trawling (Trenkel et al., 2004a). Overall, ROV surveys are better suited to environmental assessments, as higher abundances allow smaller population changes to be detected (Adams et al., 1995).
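The catch-per-unit-effort and coefficient of variation comparison above can be made concrete with a short calculation. This is only a sketch: effort is assumed here to be area swept, the function names are hypothetical, and the counts and areas are invented for illustration rather than taken from any of the cited studies.

```python
from statistics import mean, stdev

def cpue_per_100m2(counts, area_m2):
    """Convert raw counts per sample to individuals per 100 m^2 of area swept."""
    return [100.0 * c / area_m2 for c in counts]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical data: four ROV transects (200 m^2 each) and four trawl hauls (10,000 m^2 each).
rov_cpue = cpue_per_100m2([12, 15, 9, 14], area_m2=200)
trawl_cpue = cpue_per_100m2([40, 5, 120, 15], area_m2=10_000)

cv_rov = coefficient_of_variation(rov_cpue)
cv_trawl = coefficient_of_variation(trawl_cpue)
print(round(mean(rov_cpue), 2), round(cv_rov, 2))
print(round(mean(trawl_cpue), 2), round(cv_trawl, 2))
```

A lower coefficient of variation means that fewer samples are needed to detect a given proportional change, which is the basis for the claim that higher, less variable ROV counts allow smaller population changes to be detected.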
In a study comparing the ability of bottom trawls, a stereo drop camera system (SDC), and ROVs to discriminate between rockfish species, bottom trawl was the most effective, while SDC was the least effective. Bottom trawl and SDC, on the other hand, were able to record a larger number of fish measurements. A study by Karpov et al. (2004) comparing hook-and-trap and ROV methods determined that ROVs were more efficient than the hook-and-trap approach, but took considerably longer in post-processing. Furthermore, as hook-and-trap methods attract fish from larger areas, rockfish abundances can be overinflated, making these methods less sensitive to actual stock declines (Karpov et al., 2004).
ROVs are able to survey an extensive range of habitats, including deep-water (>40 m) and hazardous environments, that are not attainable by UVCs (Busby et al., 2005; Boavida et al., 2015). As a SCUBA diver-based method, UVCs require training and are limited by the time and the depth divers can spend underwater as well as by the diver's ability to quickly and accurately identify and count fish assemblages (White et al., 2013), which can lead to inconsistent results between divers. While differences between annotators may also lead to discrepancies when processing ROV data, video footage can be reviewed indefinitely with the help of experts if necessary. UVCs were found to be more effective than ROVs at obtaining reliable representations of fish communities at shallow depths (0-6 m and 12-18 m), particularly for crypto-benthic and nekton-benthic species (Andaloro et al., 2013). While Carpenter and Shull (2011) found similar results in a study comparing ROVs with paired-diver surveys of rockfish, the depths at which fish densities were highest were below those attainable with conventional SCUBA equipment. Therefore, despite differences in performance, ROVs are still able to access depths not available to UVCs, making them a more robust biological survey tool (Andaloro et al., 2013).
UTVs and BRUVS are well-established methods of obtaining non-destructive assessments of fish communities (Karpov et al., 2004; Trenkel et al., 2004a). UTVs provide rapid assessments of biodiversity over large distances (Assis et al., 2007), whereas the multidirectional thruster-power capabilities of ROV systems allow for greater maneuverability around complex and high-relief environments, such as rocky reefs, resulting in more detailed observations of biodiversity and improving the detection of cryptic and rare species (Consoli et al., 2016). BRUVS, on the other hand, have been demonstrated to be an effective tool for monitoring carnivorous, omnivorous, and herbivorous fishes throughout Australia (Watson et al., 2010; Caldwell et al., 2016). BRUVS use bait to attract fish in large abundances to the sampling area, avoiding many of the problems associated with zero-inflated datasets and increasing the statistical power to detect change (Cappo et al., 2003; Watson et al., 2005; Malcolm et al., 2011; Dorman et al., 2012). However, individuals attracted to the bait from the surrounding environment may bias habitat associations toward the site being surveyed. Given that BRUVS are unable to be deployed over highly complex environments, species associations may be further skewed toward more accessible habitat types. Additionally, bait plume variability can influence the sampling area of attraction, making standardizing survey effort logistically difficult (Heagney et al., 2007; Wraith et al., 2013). While ROVs can survey a larger area over a greater range of habitats, BRUVS often record greater species densities in the smaller area sampled.
Few studies have compared ROVs with autonomous underwater vehicles (AUVs) and manned submersibles, despite these methods employing similar methodology. In a study comparing the abundances and lengths of fish collected by ROV and manned submersible surveys across different habitats and depths in California, manned submersibles were able to record a greater number of species, body length estimates, and abundances of species that were found closer to the seafloor (Laidig and Yoklavich, 2016). While the number of studies comparing these methods is limited, ROVs are a more practical and inexpensive tool for monitoring (Koenig et al., 2005), and possess greater maneuverability than AUVs and manned submersibles.

CONCLUSION
The use of ROVs as a non-destructive method for visually surveying fish assemblages is a rapidly growing field with over 100 publications since 1995, and 65% of these studies coming from the last decade. Evaluation of the ROV as a survey method has been undertaken, with several publications finding that ROV-based surveys are comparable to more established survey techniques such as BRUVS or UVC. In the age of globally standardized datasets, this review identified the need for standardization of sampling protocols for ROV-based surveys. While some consistency was identified, with (for example) the majority of studies using heavy working-class ROVs, it is clear that not all researchers have access to these expensive units. Recent technological advancements, however, are improving the performance and practicality of observation-class ROVs, with some models meeting or surpassing the performance of larger-sized models (Pacunski et al., 2008). Optimal transect designs need to be selected with consideration of species-specific distributions and characteristics, with patchily distributed or rare/low abundance species requiring either long transect lengths or greater numbers of replicate transects (Trenkel et al., 2004b; Perkins et al., 2016), noting the latter provides greater statistical power. Finite structures, such as artificial reefs, sunken vessels, and oil pipelines, require transect designs that sample fish more intensively within a smaller area. While we have identified a number of commonalities between studies, transect design often appeared to be arbitrarily chosen, with transect type and length, in particular, varying considerably between studies. We suggest that further research is required to better guide researchers in selecting the most appropriate transect design for their particular study. For example, one benefit that is almost never exploited when using ROVs is the ability to alter transect designs as required: strip transects could be altered to include multiple timed stationary counts at specific features of interest (such as caves) should they occur. It should be noted that strict protocols need to be implemented around such a sampling strategy to reduce operator bias.
We identified nine different metrics that were extracted from imagery. Clearly, the suitability of some metrics is likely to change depending on the focus of the study (e.g., high-relief artificial structures), with species-specific mobility and reaction to the ROV system (i.e., attraction and repulsion) influencing the catchability of individuals as well as the potential that individuals will be recounted during surveys. For highly mobile species and in survey designs where individuals are likely to be recounted (e.g., vertical transects on oil and gas pipelines), relative abundance (MaxN) may be a more robust sampling metric. However, the inherent variation in metrics between studies will restrict or preclude the ability to combine datasets in the future to look at larger-scale patterns (noting that video imagery can be reanalyzed if required). Given the extensive range of lights, thrusters, speeds, and ROV sizes available, investigation into behavioral reactions may provide further insight into each of these different aspects, which could then be used toward standardizing more effective survey strategies.
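MaxN is conventionally taken as the maximum number of individuals of a species visible in any single frame, which is why it cannot count the same individual twice. The sketch below shows the calculation on hypothetical annotation data; the function name, species labels, and frame counts are illustrative assumptions and do not come from the reviewed studies.

```python
from collections import defaultdict

def max_n(frame_counts):
    """Return the per-species maximum count seen in any single frame.

    frame_counts: iterable of dicts mapping species -> individuals visible in that frame.
    """
    result = defaultdict(int)
    for frame in frame_counts:
        for species, count in frame.items():
            result[species] = max(result[species], count)
    return dict(result)

# Hypothetical annotations from three frames of one transect.
frames = [
    {"rockfish": 2, "lingcod": 1},
    {"rockfish": 5},
    {"rockfish": 3, "lingcod": 1},
]
print(max_n(frames))  # {'rockfish': 5, 'lingcod': 1}
```

Summing the same frames would give 10 rockfish, which overstates abundance if individuals follow or circle the vehicle; MaxN is conservative in the opposite direction because it can undercount when a school never fits in one frame.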
The choice of ROV system appears to be a strong indicator of both the types of surveys available to studies and the success of ROV deployments (Table 5). For instance, larger, more powerful working-class systems are able to complete longer and more complex designs (e.g., swath, cloverleaf, and polygonal transects) at greater depths while maintaining a more standardized deployment route (Norcross and Mueter, 1999), but are more expensive and difficult to deploy. Studies on highly mobile species should also be cognizant of the distance between study sites and the distance between adjacent sections of complex, multi-directional transect designs. Observation-class systems, on the other hand, are typically flown down-current on strip transects and are more susceptible to delays or cancelations of deployments, with some researchers having to modify their intended sampling strategies (Ruhl et al., 2003; Trenkel et al., 2004b; Bryan et al., 2013; Ajemian et al., 2015b; Rosa et al., 2015). Radial transects, or strip transects that employ live-boating or a weight that anchors the tether to the seafloor, can be used to improve the performance of observation-class systems under severe current and swell conditions. As observation-class systems often employ shorter tethers (<150 m for mini-sized and <300 m for regular-sized models), radial transects may be used to maximize sampling area at greater depths and on large vessels that may rotate substantially while anchored. Further research is clearly needed to provide researchers with the guidance needed to strategically choose between transect designs and lengths as well as the type of metrics that should be annotated from imagery. This will ultimately lead to a more rigorous understanding of how ROVs can be used to visually survey the distribution and abundance of deep-water marine fish.

AUTHOR CONTRIBUTIONS
DS conducted much of the research and writing for this review. NB and JM provided edits, their expertise, and suggestions for publications to be included.

ACKNOWLEDGMENTS
This work was undertaken for the Marine Biodiversity Hub, a collaborative partnership supported through funding from the Australian Government's National Environmental Science Programme. NESP Marine Biodiversity Hub partners include the Institute for Marine and Antarctic Studies, University of Tasmania; CSIRO, Geoscience Australia, Australian Institute of Marine Science, Museum Victoria, Charles Darwin University, University of Western Australia, NSW Office of Environment and Heritage, NSW Department of Primary Industries and the Integrated Marine Observing System. DS would like to thank the Oppermans for their encouragement and support of DS's higher degree education.