Drones Minimize Antarctic Predator Responses Relative to Ground Survey Methods: An Appeal for Context in Policy Advice

Unoccupied aerial systems (UAS) have become common tools for ecological monitoring and management. However, UAS use has the potential to negatively affect wildlife. Both policy makers and practitioners require data about the potential impacts of UAS on natural biota, but few studies exist and some published results conflict. We conducted two experiments to assess the responses of chinstrap penguins (Pygoscelis antarcticus), Antarctic fur seals (Arctocephalus gazella), and leopard seals (Hydrurga leptonyx) to UAS overflights. First, to provide a baseline for assessing disturbance from UAS operations, we compared behavioral responses to UAS flights with those to traditional ground surveys. Second, to inform users and policy makers about preferred flight methods, we assessed behavioral and physiological responses to UAS flown at specific altitudes, during different stages of breeding chronology, and in relation to other site factors. Between January 2017 and March 2018 we conducted 268 UAS flight approaches and 36 ground-based surveys at Cape Shirreff, Antarctic Peninsula. We applied generalized linear mixed effects models and Kruskal-Wallis tests to 10,164 behavioral scores obtained from three independent observers. When directly compared, behavioral responses of all species to UAS overflights at 30 m did not differ from those during control periods, while responses to ground surveys were significantly more intense. Behavioral responses generally increased as the UAS flew lower, and for penguins those increases intensified as the breeding season progressed (i.e., guard and molt stages). We argue that results from UAS wildlife response studies need to be assessed relative to the impacts of alternative methods and within the ecological context of the target species. Finally, we suggest data-driven best practices for both UAS use and the design of future UAS-wildlife response studies.


INTRODUCTION
The use of unoccupied aerial systems (UAS), or drones, for recreational, educational, and commercial purposes has expanded rapidly in recent years (King, 2014; Crutsinger et al., 2016; Johnston, 2019). Concurrently, UAS have become common, often indispensable, tools for ecological monitoring and management (Goebel et al., 2015; Sweeney et al., 2015; Christie et al., 2016; Durban et al., 2016). UAS-based scientific surveys frequently obtain data that are as accurate as, or more accurate than, those from traditional ground-based methods (e.g., Hodgson et al., 2018; Krause et al., 2017; Torres et al., 2018). However, UAS use creates the potential for negative interactions with wildlife (Lambertucci et al., 2015; Mulero-Pázmány et al., 2017; Bennitt et al., 2019). Managers and policy makers have created a patchwork of regulations to mitigate potential impacts, but both policy creation and policy effectiveness are limited by a lack of objective data on the effects of UAS on wildlife (Linchant et al., 2015; Gomez et al., 2016; Wallace et al., 2017). The resulting appeals to ecological monitoring programs to systematically collect UAS-wildlife response data to better inform UAS guidelines (e.g., Smith et al., 2016; Mustafa et al., 2018) have produced varied results.
A major challenge to reconciling disparate wildlife-UAS interaction results is the large number of variables that may affect animal behavior in such studies (Raoult et al., 2020). Wildlife responses may differ depending on the type and size of UAS (McEvoy et al., 2016), method of operation (Vas et al., 2015), amplitude and frequency of noise (Scobie and Hugenholtz, 2016; Arona et al., 2018), age, sex, breeding chronology (Weimerskirch et al., 2018), species (Ramos et al., 2018), and even the individual (Pomeroy et al., 2015; this study), among other variables. Some of this complexity will be resolved by a dramatic increase in the number of UAS wildlife response studies. However, even among studies using similar parameters, some results conflict. For example, horizontal overflights by small UAS (<5 kg) at altitudes of 25-40 m were reported to disturb penguins (Ratcliffe et al., 2015; Rümmler et al., 2016, 2018), dolphins and manatees (Ramos et al., 2018), and albatrosses and petrels (Weimerskirch et al., 2018). Yet other studies reported little to no effect of UAS use under similar conditions for whales (Domínguez-Sánchez et al., 2018; Torres et al., 2018), fur seals (McIntosh et al., 2018), and various species of birds (Vas et al., 2015; McEvoy et al., 2016), even when comparing congeneric penguins in similar field conditions (Goebel et al., 2015; Weimerskirch et al., 2018; this study).
Ultimately, different conclusions about UAS impacts on wildlife may arise from differences in the baseline conditions against which those impacts are assessed. With so many potential variables, it is crucial that behavioral impacts be evaluated against the same standards. For example, in a given system a UAS may elicit a behavioral response, but that result alone provides no basis for deciding whether the UAS is better or worse than another method. We argue that wildlife responses to UAS should be assessed by researchers, and interpreted by policy makers, relative to alternative methods (e.g., Moreland et al., 2015; Barnas et al., 2018). Wildlife researchers should design their studies to compare UAS with commonly used and accepted methods of wildlife study (e.g., ground or manned aerial surveys). Other users should likewise assess UAS impacts relative to those of the alternative. Such context is required to reduce confusion and to truly inform regulation of UAS activities for the benefit of wildlife.
Accurate population counts and body condition measurements of upper trophic-level predators are fundamental to ecosystem management in the Antarctic (Agnew, 1997; Boyd et al., 2006). While UAS facilitate such population counts (Goebel et al., 2015; Pfeifer et al., 2019) and body condition measurements (Krause et al., 2017), a comparison of wildlife behavioral responses to UAS versus alternative observational methods has not been done. To evaluate the behavioral effects of UAS overflights on Antarctic megafauna, we used three target species and conducted two separate experiments. First, we directly compared the behavioral effects of UAS flights at a standard flight altitude (30 m) with those of traditional ground survey methods used to census colonial chinstrap penguins (Pygoscelis antarcticus, CHPE) and Antarctic fur seals (Arctocephalus gazella, AFS), or to measure solitary leopard seals (Hydrurga leptonyx, LS). Results from this survey comparison provide context for the outcomes of the second experiment, an altitude comparison that assessed behavioral and physiological responses to UAS flown at specific altitudes and during different stages of breeding chronology. The altitude comparison study was designed to inform users and policy makers about preferred flight altitudes and methods. Each target species in our study presents unique challenges and opportunities for field study. Therefore, while we applied the same general experimental design to all species, we tailored data collection, where possible, to address covariates relevant to the behavioral responses of each species. For example, during the altitude comparison study over AFS, we assessed the influence of sex, level of human exposure, ambient wind speed, and UAS approach direction (upwind or downwind) on behavioral responses. For LS, we observed both behavioral and physiological (respiration rate) responses and considered molt phase as a potential factor affecting their response to UAS overflight.

MATERIALS AND METHODS
Field studies were conducted at the National Oceanic and Atmospheric Administration (NOAA) research facility on Cape Shirreff (62.47° S, 60.77° W), Livingston Island, Antarctic Peninsula. NOAA's United States Antarctic Marine Living Resources (U.S. AMLR) Program conducts long-term monitoring of Antarctic krill (Euphausia superba) and dependent penguin, seal, and fur seal populations at Cape Shirreff. The breeding season for CHPE and AFS falls between November and March annually (Hinke et al., 2007). LS do not breed at Cape Shirreff, but seasonally resident animals haul out there each year from December through May (Krause et al., 2016). All ground and aerial surveys were conducted between December and March of the 2016/17 and 2017/18 seasons. We used an APH-22 UAS (payload capacity: 1 kg, diameter < 60 cm) configured with a downward-facing Olympus E-PM2 digital camera and a single battery (QuadroPower 6200 mAh LiPo) as payload. This platform was selected because it is robust to prevailing Antarctic temperature and wind conditions, small, portable over rough terrain, and relatively quiet for a small UAS (31.3-57.8 dB at 0-90 m distance; Goebel et al., 2015). The APH-22 and associated flight protocols have been well described in previous studies (Durban et al., 2015; Krause et al., 2017).
To limit disturbance during UAS setup, take-offs, and landings, all flight operations were staged, and all mid-flight altitude changes were made, ≥50 m (range: 55-120 m) horizontally from focal animal groups (Figure 1). All reported flight altitudes are above ground level (AGL).

Data Collection
During each field season we identified healthy (i.e., uninjured, in good body condition) groups of CHPE and AFS, or individual LS, during each of our target breeding stages (Table 1). To control for potentially stronger reactions from birds in smaller groups (e.g., Rush et al., 2018), we selected sections of CHPE colonies of similar size (typically ∼30 individuals) that provided a clear view of both peripheral and interior nests. We selected AFS harems, generally containing a single male and several females (typically ∼10 individuals). We sampled LS individually because they are typically encountered ashore as solitary animals.
For all three species, and for all ground and aerial surveys, we placed automated horizontally facing cameras (Reconyx PC800, 6.1 MP) ∼15 m from the focal group (Figure 2). To limit and control for potential disturbance, cameras were carried carefully into position by a single, experienced researcher ≥5 min before flight operations or ground surveys commenced. Visible reactions of target individuals to camera set-up were minor (e.g., sleeping animals opening their eyes, with no movement or locomotion) and subsided immediately. The Reconyx cameras recorded one photo per second before, during, and after all flights and ground surveys.

Survey Comparison Study
For comparisons of behavioral response between UAS flights and traditional ground survey methodologies, we selected ongoing studies for which UAS methods are comparably accurate to ground methods: CHPE nest and chick censuses, AFS pup censuses (Goebel et al., 2015), and LS body mass and condition measurements (Krause et al., 2017). We flew the UAS missions at an altitude of 30 m (Figure 1A) because this altitude is commonly used in UAS operations over wildlife (Ratcliffe et al., 2015; Krause et al., 2017; Weimerskirch et al., 2018; Raoult et al., 2020) and lies within the range where reported animal responses differ across studies. The length of each flight was set by the time needed to collect the required data (e.g., census counts or body measurements), with a minimum of 60 s. During CHPE ground surveys, two field technicians using binoculars and hand counters approached colonies annually to census nests (December) and, later, chicks (February). Every effort was made to reduce animal disturbance; however, ground surveys typically required close approach (≤5 m) or entry into the colony on foot for accurate counting of dense aggregations. AFS pups were counted across the entire Cape Shirreff population in late December annually. Each breeding group was censused systematically by a single observer using a hand counter and walking within ∼5 m of animal groups. Finally, the U.S. AMLR Program estimates the body condition of leopard seals annually at Cape Shirreff. Before UAS techniques were instituted in 2017 (Krause et al., 2017), measuring the size and mass of LS necessitated sedation captures undertaken by teams of ≥5 researchers.
To maintain independence for the survey comparison study, we monitored separate target groups for each species during UAS overflights, ground surveys, and control periods. UAS overflights were conducted on the same breeding beach and within 4-48 h of ground surveys to ensure that environmental conditions were as similar as possible.

Altitude Comparison Study
FIGURE 1 | A schematic of unoccupied aerial systems (UAS) flight patterns for the (A) survey comparison study and (B) altitude comparison study, using penguins as an example species. All take-offs, landings, and mid-flight altitude changes were conducted ≥50 m from target animals.

TABLE 1 | Target stage definitions (species | stage | definition).
Antarctic fur seal | Non-harem | Late in the breeding season, but before pups wean from their mothers, female-pup pairs disperse widely away from breeding beaches, occasionally forming small groups that are not restrained by males.
Leopard seal | Pre-molt | Adult leopard seals vary substantially in the timing of molt. Pre-molt typically occurs before February, when no evidence of molting (browning of fur, molting patches of fur, etc.) is visible externally.
Leopard seal | Molt | Animals are actively molting. The peak at Cape Shirreff is typically in the first week of February.

The altitude comparison study required repeated occupation of positions over target animals at successively lower altitudes (Figure 1B). We controlled for particularly disruptive, high-engine-noise UAS flight maneuvers by conducting in-flight altitude changes ≥50 m horizontally from target animal groups (Figure 1B). Pilots flew missions manually. During approach to the target groups they held the aircraft altitude constant and used consistent flight speeds (range: 3-4 m/s). Once overhead, the UAS hovered over the group for 60 s at each altitude, starting at 46 m and descending to a lowest altitude of 8 m. Because we flew over target groups repeatedly (Figure 1B), observed reactions may have been caused by repeated or prolonged exposure rather than by detection of the UAS at a specific altitude. However, previous studies indicate that animals typically react upon first detection of the UAS, even on repeated flights (Pomeroy et al., 2015; Vas et al., 2015; Rümmler et al., 2016; Mulero-Pázmány et al., 2017), suggesting that any bias from cumulative impacts of repeated overflights should be minimal (Bennitt et al., 2019). For AFS sampling flights we also recorded wind speed (knots, measured 2 m above the ground during UAS setup with a handheld Kestrel 3000 anemometer) and UAS approach direction (upwind or downwind). Further, to address questions about habituation of AFS to human presence, we included harem groups that routinely experience daily or weekly exposure to humans ("high exposure") and harem groups with limited exposure (0-2 times per year) to humans ("low exposure").
Additionally, for all AFS and LS flights, ground-based observers hid behind natural obstructions 20-30 m from target animals and scored behavioral responses, and respiration rates (LS only), in real time with the aid of 10 × 40 binoculars. The altitude comparison study used a repeated-measures design; therefore, the same target groups were used for each flight mission [all altitudes and the control period (see section "Control Groups")].

Behavioral Scoring From Photographs
We aligned time and date stamps (resolution = 1 s) between Reconyx and UAS photographs taken while the UAS was above target animals. Three independent observers identified clearly visible individual animals within each photograph (e.g., Figure 3) and scored their behavior on a five-point scale. The CHPE scale was taken from Weimerskirch et al. (2018), which was derived from previous reaction studies on penguins (Rümmler et al., 2016 and references therein; Table 2). We developed a similar five-point behavioral reaction scale for AFS and LS (Table 3) based on the U.S. Marine Mammal Protection Act and the regulations governing the taking and importing of marine mammals (50 CFR Part 216). This U.S. law provides a standardized definition of the behavioral reaction of a pinniped to an external stimulus other than direct contact, capture, or death. For each observation period we used the highest score assigned to each animal in each treatment group for statistical analysis.
Study and altitude information was removed from photos and data sheets to prevent subconscious bias among observers. In rare instances (4 of 31 fur seal groups) in which the Reconyx camera failed due to wind or user error, we used the scores recorded by dedicated in situ observers.
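The rule of keeping each animal's highest score per observation period can be illustrated with a short Python sketch. The record layout (observer, treatment, animal ID, score) and the function name are hypothetical; the source does not describe its data format. The 0 (resting) to 4 (escape) range follows the five-point scales used here.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (observer, treatment, animal_id, score)
Record = Tuple[str, str, str, int]
Key = Tuple[str, str, str]


def highest_scores(records: Iterable[Record]) -> Dict[Key, int]:
    """Keep the highest (most intense) score per observer, treatment, and animal."""
    best: Dict[Key, int] = {}
    for observer, treatment, animal, score in records:
        if not 0 <= score <= 4:  # five-point scale: 0 (resting) .. 4 (escape)
            raise ValueError(f"score out of range: {score}")
        key = (observer, treatment, animal)
        best[key] = max(score, best.get(key, score))
    return best


# Invented example records for one treatment ("30 m") and two observers.
records = [
    ("Obs1", "30 m", "AFS-01", 1),
    ("Obs1", "30 m", "AFS-01", 3),
    ("Obs1", "30 m", "AFS-02", 0),
    ("Obs2", "30 m", "AFS-01", 2),
]
print(highest_scores(records))
```

Keeping the observer in the key preserves one score per animal per observer, which is what allows observer to enter the later models as a random effect.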

Control Groups
For both the survey comparison and the altitude comparison experiments our observers scored a set of 60 consecutive photographs (1 min) from each target group, taken ≥2 min before the initiation of UAS flights, or independently from ground surveys. We used these behavior observations as behavioral controls (hereafter: "control").
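Selecting a control window of 60 consecutive 1-s photographs that ends at least 2 min before a flight can be sketched as follows. The assumptions are that timestamps are whole seconds, that the latest qualifying window is preferred, and that gaps in the photo sequence break a run. The function name is hypothetical, and the sketch does not cover controls drawn independently of ground surveys.

```python
from typing import Iterable, List, Optional


def control_window(
    frame_times: Iterable[int],
    flight_start: int,
    n_frames: int = 60,
    min_gap_s: int = 120,
) -> Optional[List[int]]:
    """Return the latest run of n_frames consecutive 1-s frames that ends at
    least min_gap_s before flight_start, or None if no such run exists.

    frame_times are camera timestamps in whole seconds (one photo per second).
    """
    cutoff = flight_start - min_gap_s
    eligible = sorted({t for t in frame_times if t <= cutoff})
    best: Optional[List[int]] = None
    start = 0  # index where the current run of consecutive seconds began
    for i in range(1, len(eligible) + 1):
        # A run ends at the end of the list or where the next frame is not 1 s later.
        if i == len(eligible) or eligible[i] != eligible[i - 1] + 1:
            if i - start >= n_frames:
                best = eligible[i - n_frames:i]  # last n_frames of this run
            start = i
    return best


# Frames at t = 0..399 s with the UAS launched at t = 400 s.
window = control_window(range(0, 400), flight_start=400)
print(window[0], window[-1], len(window))
```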

Observing Respirations
Physiological metrics, like respiration rates, can be used to detect stress responses in wild animals (Mortola, 2015; Weimerskirch et al., 2018). Visual observers positioned themselves with a clear view of the target seal's nose to record respiration rate (breaths/min). Observers counted respirations for 60 s during control periods and at specific UAS overflight altitudes (hereafter: "physiological response").

Data Analysis
The null hypothesis for our study questions was that no difference in behavioral scores (all species) or respiration rates (LS) existed when grouped by specific UAS flight altitudes or survey techniques compared with our controls. We used three modeling frameworks to test for differences between: (1) behavioral scores taken from independent target groups during control, ground, and aerial surveys (hereafter: "survey comparison study"), (2) behavioral scores taken repeatedly from one target group at specific altitudes (hereafter: "altitude comparison study"), and (3) LS respiration rates taken repeatedly from one individual at specific altitudes (part of the "altitude comparison study"). Further, three independent observers ("observer") scored each animal in each treatment group. We conducted all analyses using R 3.5.3 (R Core Team, 2019).

Survey Comparison Study
Our aerial-ground survey comparison data had no pertinent covariates and scores between surveys were independent. Therefore, we tested for differences in Antarctic predator behavioral responses grouped by survey type (levels = Ground, Aerial, Control) using Kruskal-Wallis and multiple-comparison Dunn tests (Dunn, 1964). To account for multiple statistical tests, p-values were adjusted using the Benjamini-Hochberg method (Ogle, 2016).
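These analyses were run in R. As an illustration only, the Python sketch below (standard library only, not the authors' code) computes a tie-corrected Kruskal-Wallis H, pairwise Dunn z-tests on mean ranks with the usual tie adjustment, and Benjamini-Hochberg adjusted p-values. The function names and example scores are invented for this sketch.

```python
from itertools import combinations
from math import erfc, exp, gamma, sqrt
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df."""
    if x <= 0:
        return 1.0
    q, nu = (exp(-x / 2), 2) if df % 2 == 0 else (erfc(sqrt(x / 2)), 1)
    while nu < df:  # raise df by 2 per step with the standard recurrence
        q += (x / 2) ** (nu / 2) * exp(-x / 2) / gamma(nu / 2 + 1)
        nu += 2
    return q


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=pvals.__getitem__)
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def kruskal_dunn(groups):
    """Tie-corrected Kruskal-Wallis H, its p-value, and BH-adjusted Dunn p-values."""
    labels = list(groups)
    pooled = [x for g in labels for x in groups[g]]
    n = len(pooled)
    ranks = midranks(pooled)
    mean_rank, start = {}, 0
    for g in labels:
        size = len(groups[g])
        mean_rank[g] = sum(ranks[start:start + size]) / size
        start += size
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())  # sum of (t^3 - t) over tie groups
    h = 12 / (n * (n + 1)) * sum(len(groups[g]) * mean_rank[g] ** 2 for g in labels)
    h = (h - 3 * (n + 1)) / (1 - ties / (n ** 3 - n))
    p = chi2_sf(h, len(labels) - 1)
    pairs = list(combinations(labels, 2))
    raw = []
    for a, b in pairs:
        var = (n * (n + 1) / 12 - ties / (12 * (n - 1))) * (1 / len(groups[a]) + 1 / len(groups[b]))
        z = (mean_rank[a] - mean_rank[b]) / sqrt(var)
        raw.append(2 * (1 - NormalDist().cdf(abs(z))))  # two-sided p-value
    return h, p, dict(zip(pairs, benjamini_hochberg(raw)))


# Invented behavioral scores (0-4) for three survey types.
scores = {
    "Control": [0, 0, 1, 0, 1],
    "Aerial": [0, 1, 0, 1, 0],
    "Ground": [2, 3, 1, 2, 4],
}
h, p, dunn = kruskal_dunn(scores)
print(round(h, 3), round(p, 4), {k: round(v, 4) for k, v in dunn.items()})
```

With three survey types the H statistic has 2 degrees of freedom. The tie correction matters here because five-point behavioral scores produce many tied values.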

Altitude Comparison Study
The behavioral scores at specific altitudes are categorical, ordinal (the scores follow a specific sequential order), and non-normally distributed. We applied generalized ordinal logistic mixed effects models (also known as proportional odds models or cumulative link mixed models) implemented with the R package ordinal (Christensen, 2018), with behavioral response as the dependent variable. For all species in all models we assigned flight altitude (levels = Control, 46 m, 30 m, 15 m, 8 m) as a fixed effect and observer (levels = Obs1, Obs2, Obs3) as a random effect. We also evaluated the potential importance of a suite of covariate effects when those data were available for a given species. For CHPE we tested breeding stage ("stage"; levels = Incubation, Guard, Molt) as a fixed effect. For AFS we evaluated breeding stage ("stage"; levels = Harem, Post-harem, Non-harem), sex (levels = Male, Female), level of human exposure ("exposure"; levels = High, Low), wind speed (range = 4-16 knots), and UAS approach direction ("approach"; levels = Upwind, Downwind) as fixed effects. For LS we tested molt stage ("molt stage"; levels = Pre-molt, Molt, Post-molt) as a fixed effect and seal ID (the unique identity of each leopard seal; 25 levels, one per seal) as a random effect.
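To show how a cumulative link model turns coefficients into predicted score probabilities, here is a minimal Python sketch. It assumes the parameterization documented for the ordinal package, logit P(score ≤ j) = θ_j − η, where η is the linear predictor (fixed-effect coefficients plus the observer random intercept). All threshold and coefficient values are invented, not estimates from this study.

```python
from math import exp
from typing import List, Sequence


def logistic(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))


def category_probs(thresholds: Sequence[float], eta: float) -> List[float]:
    """P(score = j), j = 0..J, under logit P(score <= j) = theta_j - eta."""
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]


def expected_score(probs: Sequence[float]) -> float:
    return sum(j * p for j, p in enumerate(probs))


# Hypothetical values: four thresholds give five score categories (0-4).
thetas = [1.0, 2.5, 4.0, 5.5]
control = category_probs(thetas, eta=0.0)
# Hypothetical low-altitude coefficient (2.0) plus an observer offset (0.3).
low_altitude = category_probs(thetas, eta=2.0 + 0.3)
print([round(p, 3) for p in control])
print(round(expected_score(control), 2), round(expected_score(low_altitude), 2))
```

Under this parameterization a positive coefficient lowers every cumulative probability P(score ≤ j), shifting mass toward higher (more intense) scores. This is why positive altitude coefficients are read as stronger reactions.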
For the physiological response portion of the altitude comparison study, we used respiration rate (breaths/min) as a marker of physiological change in response to UAS presence. These data are counts (range: 3-16) that are not normally distributed. We applied generalized linear mixed effects models with a Poisson distribution (log link) implemented with the R package lme4 (Bates et al., 2015), with respiration rate as the dependent variable and altitude and molt stage as fixed effects. Seal ID was included as a random effect because mammalian resting respiration rates vary among individuals with mass, age, cardiovascular health, and other factors (Mortola, 2015 and references therein). Finally, we calculated a semi-partial R² (R²β) to evaluate the generalized variance explained by our fixed effects (Jaeger et al., 2017).
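The Python sketch below, a minimal illustration that is not the lme4 fit itself, shows how log-link Poisson coefficients translate into expected respiration rates. It also shows the Pearson-residual dispersion ratio commonly used for the overdispersion check described under Model Assumption Evaluations (Bolker et al., 2009). The coefficient values are invented. Taking residual df as observations minus estimated parameters is an assumed convention, since residual df is only approximate for mixed models.

```python
from math import exp
from typing import Sequence


def expected_rate(intercept: float, altitude_coef: float = 0.0,
                  stage_coef: float = 0.0, seal_offset: float = 0.0) -> float:
    """Expected breaths/min under log(mu) = intercept + altitude effect
    + molt-stage effect + seal-level random intercept."""
    return exp(intercept + altitude_coef + stage_coef + seal_offset)


def pearson_dispersion(observed: Sequence[float], fitted: Sequence[float],
                       n_params: int) -> float:
    """Sum of squared Pearson residuals divided by residual df.
    Values well above 1 suggest overdispersion relative to the Poisson."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)


# Hypothetical log-scale coefficients (not estimates from this study).
b0, b_30m = 2.2, -0.15
control = expected_rate(b0)
at_30m = expected_rate(b0, altitude_coef=b_30m)
print(round(control, 2), round(at_30m, 2), round(at_30m / control, 3))  # ratio equals exp(b_30m)
```

On the log scale a negative altitude coefficient multiplies the expected rate by exp(coefficient), a value below 1. This is how the negative 46 and 30 m coefficients translate into lower predicted respiration rates.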
Candidate model effects were tested individually using paired likelihood ratio tests (Pinheiro and Bates, 2000;Bolker et al., 2009) compared with the base model (intercept and observer as a random effect only). All significant effects were included in a forward stepwise model selection process informed both by comparing model AIC (Akaike, 1973) and paired likelihood ratio tests.
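The likelihood ratio and AIC comparisons can be sketched in Python as follows. The log-likelihoods in the example are invented. The chi-square tail probability uses the standard recurrence for integer degrees of freedom, and the function names are hypothetical. When the tested term is a variance component (such as the seal ID random effect), the null value lies on the boundary of the parameter space, and the naive chi-square p-value is conservative (Pinheiro and Bates, 2000).

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for a positive integer df."""
    if x <= 0:
        return 1.0
    q, nu = (exp(-x / 2), 2) if df % 2 == 0 else (erfc(sqrt(x / 2)), 1)
    while nu < df:  # raise df by 2 per step with the standard recurrence
        q += (x / 2) ** (nu / 2) * exp(-x / 2) / gamma(nu / 2 + 1)
        nu += 2
    return q


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df_diff: int):
    """LRT statistic 2 * (logL_full - logL_reduced) and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion, 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * loglik


# Hypothetical: base model vs. base model plus a three-level factor (2 extra parameters).
stat, p = likelihood_ratio_test(-120.0, -115.0, df_diff=2)
print(stat, round(p, 4))
print(aic(-120.0, 4), aic(-115.0, 6))
```

As a check, chi2_sf(1.29, 2) ≈ 0.524 and chi2_sf(10.26, 4) ≈ 0.036, which match the p-values reported for the leopard seal molt-stage and respiration-altitude comparisons.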

Model Assumption Evaluations
We verified that each of our generalized linear mixed effects models met the relevant assumptions (linearity, absence of collinearity, homoscedasticity, normality of residuals, and absence of influential data points). We also evaluated all linear mixed effects models for overdispersion (Bolker et al., 2009). Finally, for ordinal logistic models we examined the condition number of the Hessian, which measures how well defined (identifiable) the fitted model is; values below 10^4 indicate a well-defined model (Christensen, 2018).

TABLE 3 | Behavioral reaction scale for Antarctic fur seals and leopard seals (score | behavior | definition).
1 | Awake | Eyes open, but no movement of the head or body (except those movements listed for "Resting").
2 | Looking | Specifically looking at, and visually following, a source of disturbance (UAS or survey biologist). Heads may be turned upwards or laterally, tracking an object.
3 | Orientation change | Horizontal movement of less than 2 body lengths in total distance. This includes "spinning around," short movements, and aggressive behaviors directed at the UAS or survey biologists (not other animals), including gapes (opening the jaws wide in the direction of the disturbance) or vocalizations.
4 | Escape | The strongest reaction to the presence of UAS or researchers, involving locomotion of more than 2 body lengths from the original position.

RESULTS
Survey Comparison Study
For all three species, behavioral response scores did not differ significantly between UAS aerial surveys (30 m altitude) and control observations. In contrast, behavioral responses to ground surveys were significantly higher than those during aerial surveys and control periods (Table 5 and Figure 4). However, even during ground surveys, escape responses were rare for CHPE (6.2% of scores) and AFS (1.3% of scores). Leopard seals responded to manual sedation captures by moving to the water post-capture 100% of the time (Supplementary Figure 1).

Chinstrap Penguins
The behavioral response scores of chinstrap penguins when the UAS hovered at 46 and 30 m were indistinguishable from those recorded during the control period, but increased at 15 and 8 m (Figure 5A). The most common behavioral scores were resting ("0") and vigilant ("1") for all flights above 30 m, and escape ("4") occurred only during low-altitude molt-stage flights and ground-based chick counts (Supplementary Figure 2). Ordinal logistic models were well defined (condition number of the Hessian < 10^3), and the most informative model included effects for altitude, stage, and the interaction between altitude and stage (Table 6).
The behavioral responses of chinstrap penguins were significantly higher during UAS overflights at 15 and 8 m [χ²(1) = 226.7, p << 0.0001]. The coefficients for 15 and 8 m were positive, indicating that lower altitudes are likely to increase behavioral scores. Behavioral reactions at 46 or 30 m did not inform any model, suggesting that they could not be distinguished from behaviors during controls (Table 6). In combination with altitude effects, behavioral reaction scores increased when birds were guarding chicks and molting [χ²(8) = 516.9, p << 0.0001].

Antarctic Fur Seals
Fur seal behavioral responses at 46 m did not differ from the control; however, responses were progressively higher for 30, 15, and 8 m flights (Figure 5B). The most common behaviors were resting ("0") and awake ("1") for all flights ≥30 m, and escape ("4") reactions were rare at all altitudes [observer mean = 21.0 (s.d. 19.1) of 1,418 scored reactions; Supplementary Figure 3]. The coefficients for 30, 15, and 8 m flights were positive [χ²(4) = 334.2, p << 0.0001; Table 7], indicating that reactions were stronger at lower altitudes. Ordinal logistic models were well defined (condition number of the Hessian < 10^3). The most informative model included effects for altitude, approach direction, sex, and human exposure level. Males, and animals with low previous exposure to humans, were more likely to receive higher scores than females and high-exposure animals, respectively [χ²(1) = 18.0, p << 0.0001].

Leopard Seals
As for CHPE and AFS, the behavioral reaction scores of leopard seals increased with decreasing UAS altitude, particularly below 30 m (Figure 5C), and ordinal logistic mixed models were well defined (condition number of the Hessian < 10^3). The most informative model included altitude as a significant effect [χ²(4) = 54.44, p << 0.0001] but did not include molt stage [χ²(2) = 1.29, p = 0.524; Table 8]. While looking ("2") reactions were more common for leopard seals than for the other species, there was never an escape ("4") reaction to UAS flights at any altitude (Supplementary Figure 5). Leopard seal respiration rates were highest for control animals, measured well before UAS launch, and the lowest rates were recorded while the UAS was flying at 46 and 30 m overhead (Figure 6). The generalized linear mixed effects model fit the respiration data well and met all tested assumptions. Model comparison indicated that seal ID [χ²(1) = 45.76, p << 0.0001] as a random effect, and altitude [χ²(4) = 10.26, p = 0.0363] and stage [χ²(2) = 8.93, p = 0.0115] as fixed effects, significantly informed the model (Table 9). Observer was not included as a random effect because respiration rates were recorded by only a single observer. The informative coefficients for altitude (46 and 30 m) were both negative (Table 9), indicating that flights at those altitudes predicted lower respiration rates.

Observer Differences
Assigned behavioral scores from all three observers showed similar patterns, but were systematically offset (e.g., Supplementary Figure 6). Observer as a random effect was significantly informative to every model framework in which it was tested (paired likelihood ratio test p < 0.05, Tables 6-8).

DISCUSSION
On the Importance of Context
It is clear from this and other studies that even small battery-powered drones can affect natural biota. The interpretation of such effects, however, requires context for both UAS use objectives and the ecology of target species. For example, behavioral reactions during 30 m survey overflights for all species were indistinguishable from the control, while responses to ground surveys were significantly more intense (Table 5 and Figure 4). However, even if UAS-induced effects at 30 m had been detected, results like these need to be assessed and interpreted relative to the impacts of alternative methods. Moreover, even these ground-survey effects require additional context. The ground-based techniques used in this study have been refined over decades to limit disturbance and are widely accepted by practitioners and regulators as appropriate for long-term monitoring (e.g., Hinke et al., 2007; Goebel et al., 2009; Krause et al., 2020). Further, UAS use may be impractical for some field surveys for a variety of reasons (e.g., environment, weather); therefore, established ground-based methodologies like these may be the best option in some cases.
All three species of Antarctic predators in this study reacted to the presence of the UAS; however, in context, those reactions were likely not harmful. Mean behavioral reactions for both pinniped species at all UAS flight altitudes never met the minimum threshold for minor, transitory (Level B) harassment as defined by the U.S. Marine Mammal Protection Act. In fact, even at the individual level, escape reactions were rare (LS = 0%, AFS = 1.5%). And while it was common for CHPE to briefly look at the UAS overhead (Supplementary Figure 2), such behavior is natural during their breeding season. Chinstrap penguins and their congeners are evolutionarily adapted to disturbance from predatory birds [e.g., brown skuas (Stercorarius antarcticus), kelp gulls (Larus dominicanus), and giant petrels (Macronectes giganteus)]. Short-term drone overflights induce as much or less disturbance than overflights by these predators, which occur multiple times per hour, all day, throughout the breeding season (Emslie et al., 1995); unlike predators, drones never steal eggs or chicks.

Behavioral Response
Despite substantial differences in life history, size, social behavior, and ecology, all three focal species in this study showed similar patterns in their behavioral responses to small UAS flying at specific altitudes. Behavioral responses to flights at higher altitudes were limited but increased for flights below 30 m. For CHPE, behavioral responses also intensified over the course of the breeding season. This influence of breeding stage on behavioral response supports earlier findings (e.g., Mulero-Pázmány et al., 2017; Weimerskirch et al., 2018). While we were not able to test leopard seals during their pup-rearing stages, differences in molt stage apparently did not affect their reaction to the presence of UAS (Table 8).
Antarctic fur seals were more sensitive to 30 m overflights than CHPE, and males were more likely to react than females. Both emphasize a key aspect of AFS response to observational stimuli. Otariids have a polygynous social structure during their breeding season where a single male controls a territory and actively retains females within that space (Bonner, 1994). As a result, female reactions to external stimuli frequently initiate a chain reaction where the male rushes through the harem to curtail female movement, impacting and disturbing other animals along the way. Hence, disturbances are often amplified. Habituation by regular human exposure on regularly monitored study beaches appears to decrease sensitivity to UAS use (e.g., Supplementary Figure 4C). Finally, the detection of UAS noise is a primary source of animal disturbance across animal taxa (Scobie and Hugenholtz, 2016;Mulero-Pázmány et al., 2017). We found fur seals were significantly more likely to react when approached from upwind ( Table 7) likely because UAS noise is more strongly propagated to target animals during upwind approaches. Therefore, we suggest that behavioral reactions would be lower if groups are approached from downwind. Finally, behavioral response studies typically rely on a subjective assessment of animal reaction. However, studies to date have relied almost exclusively on behavioral observations from a single observer which may introduce an unknown bias. Despite similarities in scoring between observers in this study (Supplementary Figure 6), inter-observer variance was informative to every model tested in this study. It seems prudent to obtain behavioral scores from ≥3 independent observers so that observer effects can be assessed and propagated through modeling frameworks. Moreover, accurately tracking behavioral change in real time from multiple animals is at best problematic. 
Digitally recording target animals with photographs or video allows review by multiple observers and provides a permanent record that ensures reproducibility.
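The upwind effect described above can be taken into account during flight planning. The following is a minimal sketch. It assumes that the direction the UAS approaches from and the meteorological wind direction (the direction the wind blows from) are both recorded as compass bearings. The function names and the 45°/135° cutoffs are hypothetical illustrations, not the classification used in this study.

```python
def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two compass bearings (0-180 degrees)."""
    d = abs(a_deg - b_deg) % 360.0
    return 360.0 - d if d > 180.0 else d


def classify_approach(approach_from_deg: float, wind_from_deg: float) -> str:
    """Label a UAS approach as 'upwind', 'downwind', or 'crosswind'.

    approach_from_deg: bearing from the animal group toward the side the UAS
        approaches from.
    wind_from_deg: meteorological wind direction (where the wind blows FROM).

    A UAS arriving from the side the wind blows from is upwind of the animals,
    so the wind carries its noise toward them. The 45/135 degree cutoffs are
    hypothetical and only for illustration.
    """
    diff = angular_difference(approach_from_deg, wind_from_deg)
    if diff < 45.0:
        return "upwind"
    if diff > 135.0:
        return "downwind"
    return "crosswind"


# Example: a westerly wind (blowing from 270 degrees).
print(classify_approach(260, 270))  # UAS comes from the west: upwind
print(classify_approach(90, 270))   # UAS comes from the east: downwind
```

Using the smallest angular difference keeps the comparison correct across the 0°/360° wrap (e.g., 350° and 10° differ by 20°).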

Physiological Response
Physiological metrics provide insight into wildlife responses to disturbance that are not detectable externally (Weimerskirch et al., 2002). For example, an elevated heart rate could indicate a stress response even if no behavioral change is detected (Weimerskirch et al., 2018). Although the resting respiration rates of large aquatic mammals are substantially lower than those of terrestrial mammals of similar mass, their pulmonary and cardiovascular systems are coupled in the same way; therefore, changes in respiration rate are highly correlated with changes in heart rate (e.g., Mortola, 2015). We detected predictable changes in leopard seal respiration rates across a range (3–16 breaths/min) similar to that observed during sedation captures (U.S. AMLR unpublished data). Paradoxically, however, the pattern of respiration rate changes in response to UAS differed from the pattern of behavioral responses because physiological responses peaked during the control (no UAS) period (Figure 6). Also in contrast to behavioral responses, model comparison indicated that molt stage and the higher altitudes (46 and 30 m) were significant effects (Table 9). However, the altitude coefficients were negative, signifying that seals had higher respiration rates during the control period than when the UAS was hovering overhead at 30 m. Because adult leopard seals have no terrestrial predators, it seems likely that the elevated control respiration rates were a carryover effect of human researchers arriving in the area rather than an effect of the UAS. We suggest that future studies use an extended (>15 min) acclimation period after camera placement. Finally, baseline respiration rates were significantly higher during molt than during the pre- or post-molt periods. This is logical because molting and new fur growth require a substantial increase in metabolism (Costa and Crocker, 1996), and therefore in respiration rate.
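Respiration rates of this kind can be derived by counting breaths in fixed observation windows. Expressing each animal's treatment-period rate relative to its own control-period rate is one simple way to control for individual differences in resting respiration. The following is a minimal sketch. It assumes breath onsets have been timestamped (in seconds) from video. The function names, window layout, and example data are hypothetical. This baseline subtraction illustrates the idea and does not reproduce the model comparison reported in Table 9.

```python
def breaths_per_minute(breath_times_s, window_start_s, window_end_s):
    """Respiration rate (breaths/min) within [window_start_s, window_end_s)."""
    duration_min = (window_end_s - window_start_s) / 60.0
    if duration_min <= 0:
        raise ValueError("window must have a positive duration")
    n = sum(1 for t in breath_times_s if window_start_s <= t < window_end_s)
    return n / duration_min


def change_from_own_baseline(breath_times_s, control, treatment):
    """Treatment-window rate minus the same animal's control-window rate.

    control and treatment are (start_s, end_s) tuples. A negative value means
    the animal breathed faster during the control window than during the
    treatment window.
    """
    baseline = breaths_per_minute(breath_times_s, *control)
    return breaths_per_minute(breath_times_s, *treatment) - baseline


# Hypothetical breath onsets (s) for one seal: a 2-min control window
# followed by a 2-min hover window.
seal_breaths = [10, 25, 40, 55, 70, 85, 100, 115, 135, 165, 195, 225]
print(breaths_per_minute(seal_breaths, 0, 120))                       # 4.0
print(breaths_per_minute(seal_breaths, 120, 240))                     # 2.0
print(change_from_own_baseline(seal_breaths, (0, 120), (120, 240)))   # -2.0
```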

CONCLUSION
Small UAS allow researchers to obtain data comparable to those from traditional collection methods (e.g., Krause et al., 2017; Hodgson et al., 2018) in ways that are typically more cost-effective and safer (Sasse, 2003; Watts et al., 2012), and at larger scales (Sweeney et al., 2015). We have shown that, when evaluated in the appropriate context and operated responsibly, UAS can also be less invasive to wildlife than traditional observation techniques.
We suggest that wildlife managers and policy makers take a precautionary approach when considering the regulation of UAS use. Given the myriad differences in UAS types, site-specific conditions, observation requirements, and species-specific sensitivity to overflights, we recommend a policy emphasis on in situ risk assessments rather than fixed requirements or exclusions. Future UAS-wildlife response studies should use experimental designs that explicitly compare wildlife responses to UAS with responses to non-UAS data collection methods.

Best Practices for UAS Use
Our results reinforce previously established best practices for conducting UAS flights near wildlife (Hodgson and Koh, 2016; Mulero-Pázmány et al., 2017). Scientists collecting counts or body measurements from penguins or pinnipeds should use a small, battery-powered UAS (Goebel et al., 2015) and fly at the highest altitude that still provides adequate sensor resolution (Scobie and Hugenholtz, 2016; Mustafa et al., 2018). Particular caution is warranted if data collection requires flights later in the breeding season or at altitudes below 30 m, or when studying harem breeders such as otariids. Pilots should also approach animal groups from downwind where possible. Where available, small UAS should be considered a primary, low-disturbance method for obtaining such data.
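"The highest altitude that still provides adequate sensor resolution" can be estimated from the standard nadir ground sample distance (GSD) relation: GSD = sensor width × altitude / (focal length × image width). The following is a minimal sketch. The camera parameters are hypothetical placeholders and should be replaced with the specifications of the sensor actually flown. The 30 m caution flag simply echoes the threshold named in the best practices.

```python
def ground_sample_distance_cm(altitude_m, sensor_width_mm, focal_length_mm, image_width_px):
    """Ground distance covered by one pixel (cm/px) in a nadir image."""
    gsd_m = (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)
    return gsd_m * 100.0


def max_altitude_m(target_gsd_cm, sensor_width_mm, focal_length_mm, image_width_px):
    """Highest altitude (m) at which nadir imagery still meets target_gsd_cm."""
    return (target_gsd_cm / 100.0) * focal_length_mm * image_width_px / sensor_width_mm


def plan_altitude(target_gsd_cm, camera, caution_below_m=30.0):
    """Return (max altitude, flag); flag is True if that altitude is below caution_below_m."""
    alt = max_altitude_m(target_gsd_cm, **camera)
    return alt, alt < caution_below_m


# Hypothetical camera specification (substitute your own sensor's values).
camera = {"sensor_width_mm": 13.2, "focal_length_mm": 8.8, "image_width_px": 5472}

print(round(ground_sample_distance_cm(30, **camera), 3))  # about 0.822 cm/px at 30 m
print(plan_altitude(1.0, camera))   # about 36.5 m; no extra caution flag
print(plan_altitude(0.5, camera))   # about 18.2 m; flagged (below 30 m)
```

If the required resolution forces the aircraft below 30 m, a higher-resolution sensor or a longer focal length may be preferable to flying lower.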

Best Design for UAS-Wildlife Disturbance Studies
Although UAS wildlife response studies have increased in recent years, the available data are not yet sufficient to resolve the numerous factors that influence wildlife responses in many systems. UAS technology also continues to evolve rapidly. To provide the most useful data to wildlife managers and policy makers, we suggest that future studies:
• Be explicit in their study design, analysis, and presentation about the context of alternative, non-UAS options.
• Obtain behavioral scores from ≥3 independent observers so that individual observer variance can be assessed and propagated through modeling frameworks, which will increase confidence in inference.
• Control for individual-level effects on physiological parameters (e.g., heart or respiration rate) when possible.
• Record wildlife responses with photographs or video.
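The recommendation to score with ≥3 independent observers can be paired with a quick descriptive check before modeling. The following is a minimal sketch. It assumes that scores are stored in long format as (event, observer, score) records. The function name and example records are hypothetical. The check complements, rather than replaces, including observer as a random effect in a mixed model.

```python
from collections import defaultdict
from statistics import mean, median


def observer_offsets(scores, min_observers=3):
    """Mean deviation of each observer from the per-event median score.

    scores: iterable of (event_id, observer_id, score) tuples.
    Events scored by fewer than min_observers observers are skipped, so
    offsets are only computed where a meaningful consensus exists.
    """
    by_event = defaultdict(dict)
    for event_id, observer_id, score in scores:
        by_event[event_id][observer_id] = score

    deviations = defaultdict(list)
    for event_scores in by_event.values():
        if len(event_scores) < min_observers:
            continue
        consensus = median(event_scores.values())
        for observer_id, score in event_scores.items():
            deviations[observer_id].append(score - consensus)

    return {obs: mean(devs) for obs, devs in deviations.items()}


# Hypothetical records: observer C tends to score higher than the consensus.
records = [
    ("f1", "A", 1), ("f1", "B", 1), ("f1", "C", 2),
    ("f2", "A", 0), ("f2", "B", 1), ("f2", "C", 1),
    ("f3", "A", 2), ("f3", "B", 2),  # only two observers: skipped
]
print(observer_offsets(records))  # {'A': -0.5, 'B': 0, 'C': 0.5}
```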

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The animal study was reviewed and approved by the NMFS-SWFSC Institutional Animal Care and Use Committee, Permit # SWPI 2014-03R. Additionally, research was completed in accordance with Marine Mammal Protection Act (MMPA) Permit # 20599, and Antarctic Conservation Act (ACA) Permit # Watters 2017-012.

AUTHOR CONTRIBUTIONS
DK, JH, MG, and WP designed the study. DK conducted all the field work. DK and JH trained and supervised the technicians and analyzed the data. DK and JH wrote the manuscript with editing from WP and MG. All authors contributed to the article and approved the submitted version.

FUNDING
Financial, infrastructure, and logistical support was provided by the U.S. AMLR Program, NOAA Fisheries.