Estimating Bycatch Mortality for Marine Mammals: Concepts and Best Practices

Fisheries bycatch is the greatest current source of human-caused deaths of marine mammals worldwide, with severe impacts on the health and viability of many populations. Recent regulations enacted in the United States under the Fish and Fish Product Import Provisions of its Marine Mammal Protection Act require nations with fisheries exporting fish and fish products to the United States (hereafter, “export fisheries”) to have or establish marine mammal protection standards that are comparable in effectiveness to the standards for United States commercial fisheries. In many cases, this will require estimating marine mammal bycatch in those fisheries. Bycatch estimation is conceptually straightforward but can be difficult in practice, especially if resources (funding) are limiting or for fisheries consisting of many, small vessels with geographically-dispersed landing sites. This paper describes best practices for estimating bycatch mortality, which is an important ingredient of bycatch assessment and mitigation. We discuss a general bycatch estimator and how to obtain its requisite bycatch-rate and fisheries-effort data. Scientific observer programs provide the most robust bycatch estimates and consequently are discussed at length, including characteristics such as study design, data collection, statistical analysis, and common sources of estimation bias. We also discuss alternative approaches and data types, such as those based on self-reporting and electronic vessel-monitoring systems. This guide is intended to be useful to managers and scientists in countries having or establishing programs aimed at managing marine mammal bycatch, especially those conducting first-time assessments of fisheries impacts on marine mammal populations.

INTRODUCTION
Fisheries bycatch is the greatest current source of human-caused deaths of marine mammals worldwide (Lewison et al., 2004; Read et al., 2005; Avila et al., 2018). Bycatch occurs when species not targeted by fishers are incidentally and unintentionally hooked, entangled, or entrapped by fishing gear (Hall et al., 2000). Most species of marine mammals (cetaceans, pinnipeds, sirenians, and sea otters) are affected by bycatch (Reeves et al., 2013), with hundreds of thousands or perhaps millions of individuals killed annually (Read et al., 2006). Most bycatch occurs in gillnet fisheries (Read, 2008; Reeves et al., 2013), but there is notable bycatch in other types of gear as well, including but not limited to longlines, set nets, stow nets, seines, trawls, and pot or trap gear. Before the 1990s, prior to the enactment of key amendments to the United States Marine Mammal Protection Act (MMPA), hundreds of thousands of dolphins were killed each year in Eastern Tropical Pacific tuna purse-seine fisheries alone (Hall, 1998). Fisheries-related mortality has been the dominant factor, or at least a major contributing factor, in causing population decline or preventing population recovery (e.g., from historical whaling and sealing impacts) of many marine mammal species. Examples of species highly affected by bycatch include the North Atlantic right whale (Eubalaena glacialis), Hector's dolphin (Cephalorhynchus hectori), New Zealand sea lion (Phocarctos hookeri), the franciscana (Pontoporia blainvillei), the nearly extinct vaquita (Phocoena sinus), and the extinct baiji (Lipotes vexillifer) (Wang et al., 2006; Slooten, 2007; Turvey et al., 2007; Chilvers, 2008; Secchi, 2010; Rolland et al., 2016; Taylor et al., 2017; Jaramillo-Legoretta et al., 2019; also see Brownell et al., 2019).
In 2016, the United States enacted regulations under the MMPA aimed at reducing marine mammal bycatch in international fisheries 1 . The regulations, stemming from the Fish and Fish Product Import Provisions of the MMPA (hereafter, "MMPA Import Provisions") 2 , require fisheries exporting fish and fish products to the United States (hereafter, "export fisheries") to have or establish marine mammal protection standards that are comparable in effectiveness to the standards for United States commercial fisheries. To continue exporting their products to the United States, nations must apply for and receive a "comparability finding" from the United States National Oceanic and Atmospheric Administration 3 . To achieve a comparability finding, the harvesting nation's program regulating an export fishery must: (1) prohibit the intentional killing or serious injury 4 of marine mammals in the fishery, and (2) conduct marine mammal stock (population) assessments that establish bycatch limits for those marine mammal populations interacting with export fisheries, estimate marine mammal bycatch in those fisheries, and take measures if necessary to reduce total bycatch below the bycatch limits. Alternatively, harvesting nations may adopt other approaches, such as the implementation of bycatch mitigation measures, that are comparable in effectiveness to United States standards for export fisheries [Food and Agriculture Organization of the United Nations [FAO], 2021]. These comparability requirements are conceptually straightforward but can be difficult to achieve in practice, especially for the most economically challenged countries (Williams et al., 2016).
In the United States, the Guidelines for Assessing Marine Mammal Stocks [GAMMS; National Marine Fisheries Service [NMFS], 2016] provide guidance on the key assessment elements: estimating stock abundance, estimating bycatch mortality and serious injury, and comparing the latter to conservation reference points derived from the former. For example, in the United States, bycatch is compared to the conservation reference point called the Potential Biological Removal (PBR) level 5 , which is calculated from an estimate of the minimum population size and other parameters. PBR is defined conceptually in the MMPA and operationalized from a management strategy evaluation study by Wade (1998). More generally, comprehensive reviews of protected species reference point estimation and assessment frameworks have been conducted by Lonergan (2011), Moore et al. (2013), Curtis et al. (2015), and Moore and Curtis (2016).
The objective of this paper is to describe best practices for estimating bycatch mortality, which is a key ingredient for population or stock assessment, whereby the mortality estimates are compared to a conservation or limit reference point. Reference point estimation is tied to estimating population size, a topic thoroughly reviewed by Hammond et al. (2021), this issue. An in-depth description of the broader assessment framework is found in Wade et al. (2021), this issue. Readers should examine these papers to understand the broader management context within which bycatch estimation takes place under the MMPA Import Provisions and how estimating marine mammal bycatch and population size relate to each other.
There are important precursors to designing a program to estimate bycatch in a fishery (discussed more thoroughly by Wade et al., 2021). The first is making use of exploratory data (which may need to be collected anew) to characterize the fishery (number of vessels, vessel types, gears used, when and where fished, target species, etc.) and identify marine mammal populations that might interact with it. Our use of the term "fishery" is consistent with its usage under the United States List of Fisheries and List of Foreign Fisheries. That is, a fishery is characterized by a collection of fishers using similar methods (e.g., vessel and gear types), fishing for certain target species, operating in a certain place and time. Examples include the United States drift gillnet fishery for swordfish and thresher shark off the United States West Coast, or the Mexican demersal longline fishery for deepwater snappers in the Gulf of Mexico. Our use of marine mammal "population" is consistent with definitions provided by the GAMMS [National Marine Fisheries Service [NMFS], 2016], i.e., a group of interbreeding individuals that is more or less demographically independent from other groups. The United States marine mammal Stock Assessment Reports provide numerous examples of defined population "stocks" (e.g., Carretta et al., 2021). A marine mammal population may occur entirely within the geographic range of a fishery or the fishery and marine mammal population may only slightly overlap in space or time.
In the absence of data, inferences about the likelihood of bycatch occurrence can be made through exploring the spatial overlap of marine mammal populations and fishing gears known to catch or entangle marine mammals. If bycatch is known or expected to occur, and if negligible impacts to the population cannot be ruled out, then this points to the need to mitigate or undertake a formal bycatch estimation program, which is the focus of the remainder of this paper. Estimation should be prioritized for high-risk gears and fisheries that interact with marine mammal populations at particular risk (see risk categories in Box 1 in Wade et al., 2021, this issue). In the context of complying with MMPA Import Provisions, priority should be given to those fisheries categorized as "Export Fisheries" on NOAA's List of Foreign Fisheries 6 . Once obtained, bycatch estimates can then be compared to conservation reference points that depend on the population's size and growth rate to assess the likely or potential impacts of the fishery on the population's viability and whether mitigation actions are needed.
We proceed by discussing a general bycatch estimator and how to obtain the bycatch-rate and fisheries-effort data needed to apply the estimator. Scientific observer programs provide the most robust source of information for estimation and should be used when possible. Observer programs are therefore discussed at length, including program-design considerations, data collection, statistical analysis, and common sources of estimation bias. In addition to their value for directly estimating bycatch, scientific observer programs can also be used to assess and improve compliance with required mitigation measures, estimate the efficacy of such measures (e.g., comparing bycatch rate estimates before and after mitigation, or in sectors of the fisheries with vs. without mitigation), and improve the quality of information provided by fishermen (Cox et al., 2007; Porter, 2010; Snyder and Erbaugh, 2020). Because scientific observer programs tend to be expensive and logistically difficult to implement, we also discuss alternative approaches and data types for making bycatch inferences, and the caveats associated with these. This document is intended to be useful to managers and scientists in harvesting nations maintaining or establishing regulatory programs aimed at reducing marine mammal bycatch, including for the purposes of achieving a comparability finding under the MMPA Import Provisions.
6 https://www.fisheries.noaa.gov/foreign/international-affairs/list-foreign-fisheries

BYCATCH MORTALITY ESTIMATOR
Bycatch mortality is the total number of animals that die (or are expected to die) in a fishery from interacting with fishing gear. Bycatch mortality is typically estimated annually for each gear-specific fishery affecting a defined population. Summing across all fisheries interacting with the population provides a total annual estimate for the population. A general point estimator of bycatch mortality for population i is:

$$\mu_{it} = N_{it}\, c_{it}\, E_{t}\, m_{it}$$

where the expected bycatch mortality in year t, $\mu_{it}$, is the product of animal abundance in the population ($N_{it}$), total fishing effort ($E_{t}$), a scaling parameter referred to as catchability ($c_{it}$) (which has the unit bycatch $N^{-1}$ effort$^{-1}$ and can be thought of as the likelihood that a single animal in the population would be caught by a single unit of fishing effort), and the fraction of bycaught animals that are dead or expected to die ($m_{it}$; the bycatch mortality rate, BMR), noting that for some types of gear, animals may be released or escape alive after being fatally injured. $N_{it}$ and $c_{it}$ are correlated and in practice will often be difficult to estimate separately. For example, $c_{it}$ will be lower if $N_{it}$ is defined as the entire population (including potentially large numbers that never overlap with the fishery), whereas $c_{it}$ will be higher if $N_{it}$ refers to just those animals in the area of the fishery, which may be difficult to estimate. Although there may be cases where $c_{it}$ is explicitly estimated (e.g., from a concurrent time series of abundance and bycatch data; Moore and Curtis, 2016), more typically the product $N_{it} c_{it}$ is estimated as a single parameter referred to as "bycatch per unit effort" (BPUE), or b, where

$$b_{it} = N_{it}\, c_{it}, \qquad \text{so that} \qquad \mu_{it} = b_{it}\, E_{t}\, m_{it}.$$

The sections below give details on how these terms may be estimated, and Figure 1 illustrates the associated decisions that must be made.
The most accurate way to estimate BPUE is with data from scientific fisheries observer programs, whereby a representative sample of fishing effort is directly observed by independent observers aboard fishing vessels, and the number of marine mammals bycaught (and killed or injured) is recorded. A simple point estimate for BPUE can be calculated as bycatch observed divided by effort observed. For example, if researchers observed 100 gillnet sets and two dolphins were captured, the BPUE would be two dolphins per 100 sets, or 0.02 dolphins per set. If instead of observing and recording sets, the researchers observed 10 complete fishing trips (which might last many days and include multiple sets and retrievals of one or multiple gear types) and counted six dolphins captured, then BPUE would be 0.6 dolphins per trip.
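The arithmetic in the examples above can be written out as a minimal sketch. It assumes the estimator takes the form of BPUE multiplied by total effort and by the bycatch mortality rate; the function names, the fleet total of 1,500 sets, and the mortality rate of 1.0 are hypothetical values chosen only for illustration.

```python
def bpue(observed_bycatch: float, observed_effort: float) -> float:
    """Bycatch per unit effort: animals caught per observed unit of effort."""
    if observed_effort <= 0:
        raise ValueError("observed_effort must be positive")
    return observed_bycatch / observed_effort


def bycatch_mortality(b: float, total_effort: float, m: float = 1.0) -> float:
    """Expected mortality = BPUE x total fleet effort x bycatch mortality rate."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must be between 0 and 1")
    return b * total_effort * m


# Examples from the text: 2 dolphins in 100 sets, 6 dolphins in 10 trips.
b_per_set = bpue(2, 100)    # dolphins per set
b_per_trip = bpue(6, 10)    # dolphins per trip

# Hypothetical fleet total (1,500 sets) and mortality rate (1.0), for illustration.
mu_sets = bycatch_mortality(b_per_set, total_effort=1500, m=1.0)
print(b_per_set, b_per_trip, mu_sets)
```

Note that the effort unit used for BPUE (sets or trips) must match the unit in which total fleet effort is expressed.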
The fraction of a fishing fleet's effort that is observed is referred to as the "observer coverage." BPUE is more precisely estimated for populations in which animals are caught in greater numbers (because either N or c is higher), and in fisheries with higher observer coverage. Small populations for which bycatch is an infrequent or rare event pose particular bycatch estimation challenges (Martin et al., 2015;Gray and Kennelly, 2018;Wakefield et al., 2018) and require fairly high observer coverage levels to avoid severe biases due to small sample size. Curtis and Carretta (2020) developed the observer coverage calculator ObsCovgTools in R (R Core Team, 2019) that calculates coverage levels required to meet user-defined bycatch estimation objectives. Objectives include estimating bycatch to a desired precision level, estimating the probability of observing bycatch when it exists in a fishery, and providing an upper confidence limit for bycatch, even if no bycatch is observed. Outputs are conditioned on inputs, such as total effort in the fishery and expected BPUE and sampling variance, which can be obtained from a pilot study or borrowed from a similar study, or based on expert opinion. Under the MMPA, performance tests of the PBR control rule used for setting conservation reference points are based on bycatch estimation coefficients of variation (CVs) of 0.3 or better (Wade, 1998), so we suggest this as a reasonable default input for the target precision.
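To give a feel for how coverage, BPUE, and precision interact, the following is a rough sketch under a simple Poisson assumption for bycatch events. It is not the method implemented in ObsCovgTools; it ignores overdispersion and the finite-population correction, and all function names and input values are hypothetical.

```python
import math


def prob_detect(bpue: float, total_effort: float, coverage: float) -> float:
    """P(at least one bycatch event is observed), assuming Poisson events."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    expected_observed = bpue * total_effort * coverage
    return 1.0 - math.exp(-expected_observed)


def coverage_for_cv(bpue: float, total_effort: float, target_cv: float = 0.3) -> float:
    """Coverage at which the Poisson CV, 1/sqrt(expected events), meets the target."""
    needed_events = 1.0 / target_cv ** 2
    return min(1.0, needed_events / (bpue * total_effort))


def upper_bpue_if_none_observed(observed_effort: float, conf: float = 0.95) -> float:
    """One-sided upper bound on BPUE when zero events are seen (Poisson)."""
    return -math.log(1.0 - conf) / observed_effort


# Hypothetical fishery: 1,000 sets per year, expected BPUE of 0.02 per set.
print(prob_detect(0.02, 1000, coverage=0.1))
print(coverage_for_cv(0.02, 1000, target_cv=0.3))
print(upper_bpue_if_none_observed(observed_effort=100))
```

Under these assumptions, a target CV of 0.3 requires roughly 11 expected observed events, which illustrates why rare-event bycatch can demand high coverage.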
In addition to having adequate levels of observer coverage, statistically valid BPUE estimates demand the use of well-trained observers and an appropriate survey design. Designing an observer training program and prescribing field protocols (datasheets, etc.) are beyond the scope of this paper, but numerous resources address these topics in detail and should be consulted when designing an observer program [e.g., Pacific Islands Regional Office Observer Program [PIRO-OP], 2017; Northeast Fisheries Science Center [NEFSC], 2019]. As for survey design, the goal is to obtain observer data from a representative sample of the fishery with respect to the suite of attributes that characterize fishing effort, such as the geographic distribution of effort, temporal distribution on diurnal and seasonal timescales, vessel and gear characteristics, and types of effort (e.g., sets, hooks, etc.). For example, a gillnet fishery might have the following hypothetical characteristics: 30% of sets in July, 60% of sets in August, and 10% of sets in September; 30% of sets over the continental shelf and 70% of sets offshore; 50% of sets using long nets deployed from large vessels, and 50% of sets using shorter nets deployed from smaller vessels. Ideally, to avoid a biased estimate, the observer dataset would have effort in roughly the same proportions; i.e., the observed component of the fishery would be a microcosm of the whole fishery. If this is not possible, stratification of the sampling and effort can potentially reduce problems associated with non-proportional sampling, as long as the important strata are identified and adequately sampled (see below under discussion of biases). Sampling does not need to be exactly proportional to the effort in each stratum. Indeed, strata sample sizes can be adjusted to the variance in BPUE in each stratum to increase the estimate's precision without introducing significant bias.
The most statistically valid estimates typically are achieved by stratified random sampling, whereby the fishing effort is subdivided into relatively homogeneous subgroups with respect to a particular variable (e.g., by area or season) (e.g., Liggins et al., 1997; Cotter and Pilling, 2007; Benoît and Allard, 2009). Precision is improved especially by sampling more intensively in strata where variance of the bycatch is higher (if this is known), for example if bycatch rates in one month tend to be more variable than bycatch rates in other months. This would be the recommended approach if there were sufficient knowledge of all the fishing vessels and their schedule of fishing deployments, and if all vessels could accommodate observers [e.g., bunk and deck space for the observer(s)]. In this scenario, one would randomly select a certain percentage of gear deployments ahead of time and place observers on the vessels expecting to make those deployments. But this is rarely practical in fisheries because of, among other things, uncertainties about who is fishing when and where, and the unwillingness (if observer program participation is voluntary) or inability to accommodate observers (see section "Sources of Bias in Bycatch Estimation"). Whatever the circumstances, the observer program must be diligent about obtaining a sample that represents the fishery as accurately and precisely as possible (Benoît and Allard, 2009; Benoît et al., 2012; Mangi et al., 2015; Fernandes et al., 2021). It would also be beneficial to sample representatively in relation to spatial-temporal variation in animal density, although this information will often not be available.
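One standard way to formalize "sampling more intensively where variance is higher" is Neyman allocation, in which the observed sample in each stratum is proportional to the stratum's total effort multiplied by the standard deviation of bycatch per effort unit in that stratum. The sketch below is illustrative only; the paper does not prescribe this rule, and the stratum sizes and standard deviations are hypothetical.

```python
def neyman_allocation(stratum_effort, stratum_sd, n_total):
    """Split n_total observed effort units across strata in proportion to
    (total effort in stratum) x (SD of bycatch per effort unit in stratum)."""
    weights = {h: stratum_effort[h] * stratum_sd[h] for h in stratum_effort}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one stratum needs positive effort and SD")
    return {h: n_total * w / total_weight for h, w in weights.items()}


# Hypothetical fishery of 1,000 sets split 30/60/10 percent by month,
# with assumed (not measured) SDs of bycatch per set in each month.
effort = {"July": 300, "August": 600, "September": 100}
sd = {"July": 0.1, "August": 0.3, "September": 0.1}

allocation = neyman_allocation(effort, sd, n_total=110)
for month, n_sets in allocation.items():
    print(month, round(n_sets, 1))
```

In practice the allocation would be rounded to whole sets or trips, and a minimum sample would usually be kept in every stratum so that each stratum's BPUE can still be estimated.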
In situations where the effort is well characterized (e.g., how much fishing effort is occurring when, where, and how), but the observed fishing effort is extremely non-representative (e.g., zero or very small sample sizes in one or more strata), statistical approaches can be used in some cases to eliminate bias in bycatch estimates. Statistical approaches are discussed further below in the "Biases in Bycatch Estimation . . ." section.

ESTIMATING m, BYCATCH MORTALITY RATE
Observers typically document bycaught marine mammals as "dead" or "released/escaped alive," often with an assessment of the type of gear interaction, observation of any gear remaining on the animal, and characterization of any injuries. Animals that escape or are released alive might be uninjured or, if injured, could die later or recover and survive. Thus, an unbiased estimate of bycatch mortality, the bycatch mortality rate (BMR), requires an estimate of the proportion of bycaught individuals that die, whether immediately or eventually (i.e., post-release mortality). In the United States, following a bycatch event and based on data collected at the time of detection and observation, the bycaught individual is categorized as "dead," "seriously injured," or "not seriously injured." Those categories are based on guidelines developed through scientific analyses of data on injury severity and outcome, where "seriously injured" was defined as an animal having a greater than 50 percent chance of dying after release, and "not seriously injured" as the animal having a less than 50 percent chance (National Marine Fisheries Service [NMFS], 2012a,b).
Ideally, to determine the post-release mortality rate, bycaught individuals would be tagged prior to release and monitored afterward. Although this approach has been used for marine fish and sharks (e.g., Davis, 2002; Cadigan and Brattey, 2006; Campana et al., 2009; Carruthers et al., 2009; Patterson et al., 2014) and marine turtles (e.g., Álvarez de Quevedo et al., 2013; Stacy et al., 2016; Maxwell et al., 2018; Parga et al., 2020), it has not been employed with marine mammals. Punt et al. (2021) used a modeling approach to estimate the post-release mortality rate of two pinniped species bycaught in Chilean purse seine and trawl fisheries. In practice, most bycatch mortality rate estimates are based on small data sets, categorical assignments (e.g., Andersen et al., 2008; Olaya-Ponzone et al., 2020), or expert assessments. The United States injury guidelines are based on either analyses of scarring data or subsequent observations documenting the condition, health, and fate of known individuals following the detection of injuries due to interactions with fishing gear (see case studies in Andersen et al., 2008). Only a small number of published studies provide estimates of BMR (e.g., Wells et al., 2008; Cassoff et al., 2011; Dolman and Moore, 2017; Pettis et al., 2017; Olaya-Ponzone et al., 2020).
It may not always be necessary to estimate BMR. For example, small cetaceans and pinnipeds caught in gillnets and some trawl fisheries are typically found dead. Conservatively, in the absence of data specific to a study population and fishery, it is prudent to set BMR to 1.0 for marine mammals captured in gillnet fisheries, as suffocating or drowning in the nets is by far the most likely outcome. For bycatch in other types of gear (e.g., purse seine, longline, some trawl), approximate values for BMR might be taken from the literature for similar species and gears, but in fact few such estimates are available.

ESTIMATING E, FISHING EFFORT
General Principles for Estimating E
Scaling the observed-sample estimate of BPUE to an estimate of total bycatch in a fishery requires knowing the total amount of effort in the fishery (this is the sampling frame within which a subset of effort has been observed). Critically, the effort metric used for estimating BPUE and for characterizing the whole fishery must be the same. For example, if observers collect data for a random sample of fishing trips with an estimate of how many marine mammals on average are caught per trip, then the total number of trips made by the fleet must be quantified to properly extrapolate to the whole fishery. Similarly, if BPUE is quantified for a gillnet fishery by observing a random number of gillnet deployments (sets), then the total number of gillnet sets made by the fishery must also be known or estimated. In some observer programs, observers monitor all fishing activity over the course of a particular period (e.g., 24 h) and BPUE is measured as the number of bycatch events per effort-period (e.g., per-day); in this case, the number of effort-days (# boats x the # days each boat operates) would need to be known for the fleet. If the sampling frame is incomplete because the size and extent of the fleet has not been accurately determined, then the total bycatch mortality will be underestimated (i.e., negatively biased). Therefore, diligence is needed to identify all of the vessels operating in a particular fishery throughout its range.
Ideally, the units of fishing effort measured should be those most directly related to the amount of bycatch that occurs. For example, for a longline fishery, one might quantify the number of longline sets, or more coarsely, the number or total duration of longline fishing trips. However, the number of hooks on the line and their soak time (e.g., "hook-hours") more closely relates to the likelihood of an animal being bycaught. This distinction is relatively unimportant if large numbers of longline sets or trips (i.e., effort units) are randomly sampled. However, if the size of the observed sample is small or the sampling is biased, then bycatch mortality may be more accurately estimated by measuring the number of bycatch events per hook-hour and scaling this to hook-hours in the fleet (rather than quantifying bycatch per longline trip and the number of trips). Effort recorded in finer units can always be re-scaled into coarser units as needed, whereas data recorded in coarse units cannot be more finely resolved. Of course, there are trade-offs to how finely one measures effort. Obtaining fleetwide information about the number of vessels and trips is easier and less costly than monitoring the number of hook-hours, for example. In addition, coarser units tend to be more statistically independent. For example, observations of bycatch-per-trip are more likely to be statistically independent than observations of bycatch-per-set, since set data will be correlated in time and space within the same trips. Observations at coarser scales thus tend to give more valid estimates of precision unless autocorrelations in hierarchical or nested datasets are properly taken into account.

Estimating Fishing Effort in Practice
Measures of fishing effort vary greatly, as do the methods for quantifying those measures. McCluskey and Lewison (2008) reviewed the types of effort measures available for different types of fishing fleets around the world, including artisanal or small-scale and industrial fleets (as well as recreational and IUU 7 fleets, not discussed further here). Though not well-defined (Tietze, 2016; Smith and Basurto, 2019), our use of "small-scale" refers to fleets that tend toward having lower capital or technological investment, being operated at the household/family level (rather than by companies), and having smaller vessel sizes. In the extreme, these fleets can consist of thousands of such vessels dispersed across vast geographic areas. Effort in small-scale fleets is usually poorly documented and rarely quantified, due to factors such as lack of awareness, funds and infrastructure, and institutional capacity. Interviews with a large, representative sample of small-scale fishers may be the most practical way to get useful estimates of effort (e.g., Gómez-Muñoz, 1990; Moore et al., 2010), and often measures of fishing effort will necessarily be crude. For example, Lewison and Moore (2012), working with Nigerian colleagues, identified the number of fishing villages in each of three Nigerian states. For each state, they randomly sampled the villages, counted the number of fishing vessels on the beach in these villages and interviewed fishers there (stratified by boat or gear type) to obtain information about fishing methods, gears, seasonality, fish catch, and bycatch of marine mammals and sea turtles. The fishing effort metric was the average number of boats per village, multiplied by the number of fishing villages along the entire coast to estimate the number of boats per state. BPUE, also obtained from the interview data, was quantified in terms of animals caught per vessel per year. Rough total bycatch estimates were derived as catch per vessel (per year) multiplied by the number of active vessels in the state.
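The village-based calculation described above can be sketched as follows. All numbers are invented for illustration and are not taken from Lewison and Moore (2012); the function names are likewise hypothetical.

```python
from statistics import mean


def estimate_fleet_size(boats_per_sampled_village, n_villages_total):
    """Mean boats per sampled village, scaled up to all fishing villages."""
    return mean(boats_per_sampled_village) * n_villages_total


def rough_annual_bycatch(fleet_size, animals_per_vessel_year):
    """Rough total: interview-based BPUE (per vessel per year) x active vessels."""
    return fleet_size * animals_per_vessel_year


# Hypothetical state: 40 fishing villages, 4 of them sampled at random.
boats_counted = [12, 8, 20, 10]
fleet = estimate_fleet_size(boats_counted, n_villages_total=40)

# Hypothetical interview result: 0.1 animals caught per vessel per year.
total = rough_annual_bycatch(fleet, animals_per_vessel_year=0.1)
print(fleet, total)
```

Because both inputs come from small samples and recalled information, an estimate of this kind is best read as an order-of-magnitude indication rather than a precise total.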
For industrial fisheries, a greater variety of methods for quantifying fishing effort data are generally available. In addition to interview approaches, industrial fleets are more amenable to implementing observer programs. Fleet-wide effort can be quantified through complete dockside monitoring when all vessels return to one central port or a few main ports, or using logbook data, whereby fishers record when, where, how, and how much they fish (e.g., Roman et al., 2014). Collecting spatially and temporally explicit information about fishing effort (e.g., through logbook data) is extremely valuable, especially if total fishing effort (or BPUE) needs to be modeled rather than estimated using design-based approaches (McCluskey and Lewison, 2008). Inaccuracy is a potential pitfall of both interview-based and logbook data due to response bias (e.g., Cosgrove et al., 2016; Northridge et al., 2017; Luck et al., 2020).
The most accurate data on fishing effort are obtained from electronic logbooks that provide spatial and temporal fishing effort data, such as via a vessel monitoring system (VMS) in which data are uploaded via satellite on a regular schedule. A challenge to this approach is the resistance commonly shown by fishers to being monitored. Nevertheless, the availability of VMS and Automatic Identification Systems (AIS) data has led to recent advances in the ability to make inferences about fishing activity (effort levels and distribution) using computer science algorithms such as Global Fishing Watch 8 (e.g., Kroodsma et al., 2018).
In the absence of data to estimate effort directly, effort can sometimes be predicted or inferred from other characteristics of the fishery using models (e.g., McCluskey and Lewison, 2008; Greenstreet et al., 2009; Soykan et al., 2014; Johnson et al., 2017; Adibi et al., 2020), although the accuracy of such models may be difficult to validate and they may rely on unrealistic or unsupported assumptions or inaccurate information. For example, fish catch (landings) has been used as a proxy for fishing effort, either directly or through models, but landings data themselves are often inaccurate (e.g., Batista et al., 2015; Pauly and Zeller, 2016).

ESTIMATING µ, TOTAL MORTALITY, AND ITS UNCERTAINTY
Given (1) BPUE (b) and mortality (m) estimates obtained from an unbiased sample of observer data from a fishery and (2) an estimate of that fishery's total fishing effort (E) in comparable units, the simplest and most common estimator for total bycatch is a ratio estimator, whereby b × m is multiplied by E. Equivalently, if e is the amount of effort observed, so that observer coverage P = e/E is the proportion of the fleet observed, then the bycatch estimator can also be expressed as the bycatch mortality in the observed sample divided by P (Julian and Beeson, 1998; Carretta et al., 2004). For example, if 100 effort units out of 1,000 in the fleet are observed (P = 0.1), then estimated total bycatch mortality, µ = b × m × 1,000 = observed bycatch mortality/0.1. Variance in this estimate is commonly calculated using resampling (e.g., bootstrapping) or delta methods (e.g., Zhou, 2002; Manly, 2011; Cruz et al., 2018). An advantage of bootstrapping is that it facilitates the accounting of variance on appropriate (independent) observational units. Often the independent sampling unit in an observer program is the fishing trip (e.g., it might be possible, given a rough schedule of fishing trips, to sample these randomly) whereas the multiple gear deployments observed within that trip are correlated (occurring in similar time and space and with similar methods). Treating each day or gear deployment as the observational unit would likely over-estimate the precision (underestimate the variance) of the estimates, whereas resampling fishing trips in the bootstrap analysis provides a valid variance estimate. Precision of the bycatch estimate is typically reported using coefficients of variation (CVs), along with other standard precision measures, such as 95% confidence intervals. As noted above, in the United States, performance tests of the PBR framework are based on the assumption that the CV for bycatch in an individual year is 0.3 or less.
8 globalfishingwatch.org
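A minimal sketch of the ratio estimator with a trip-level bootstrap is given below. It assumes fishing trips are the independent sampling units and that each observed trip records its effort (sets) and bycatch count; the data, function names, and number of bootstrap replicates are hypothetical, and a percentile interval is only one of several possible interval methods.

```python
import random


def ratio_estimate(trips, total_effort, m=1.0):
    """trips: list of (sets_observed, animals_bycaught), one tuple per trip."""
    observed_effort = sum(t[0] for t in trips)
    observed_bycatch = sum(t[1] for t in trips)
    return (observed_bycatch / observed_effort) * m * total_effort


def bootstrap_cv(trips, total_effort, m=1.0, n_boot=2000, seed=1):
    """Resample whole trips (not sets) so within-trip correlation is preserved."""
    point = ratio_estimate(trips, total_effort, m)
    if point == 0:
        raise ValueError("no bycatch observed; CV is undefined for a ratio estimate")
    rng = random.Random(seed)
    reps = sorted(
        ratio_estimate([rng.choice(trips) for _ in trips], total_effort, m)
        for _ in range(n_boot)
    )
    mean_rep = sum(reps) / n_boot
    sd = (sum((r - mean_rep) ** 2 for r in reps) / (n_boot - 1)) ** 0.5
    ci = (reps[int(0.025 * n_boot)], reps[int(0.975 * n_boot) - 1])
    return point, sd / point, ci


# Hypothetical observer data: 10 trips, 100 sets, 4 dolphins; fleet total 1,000 sets.
trips = [(10, 0), (12, 1), (8, 0), (10, 0), (9, 2),
         (11, 0), (10, 0), (10, 1), (10, 0), (10, 0)]
point, cv, ci = bootstrap_cv(trips, total_effort=1000)
print(point, round(cv, 2), ci)
```

With 100 of 1,000 sets observed (P = 0.1) and four animals caught, the point estimate equals the observed bycatch divided by P. The explicit error for zero observed bycatch mirrors the limitation of simple ratio estimators in years with no observed events.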
The above "design-based" methods for estimating bycatch and bycatch mortality assume that bycatch in the observed portion of the fishery can be extrapolated to the whole fishery, because the study is designed in a representative way. In many situations, bycatch is better estimated using a model-based approach, rather than simple ratio estimators. Examples include when the sample data are biased (not collected using a random or other representative sampling scheme), when multiple years of data have been collected and inferences about current bycatch levels can be informed by data from prior years, when multi-year datasets include years when no bycatch was observed (CVs cannot be calculated for these years using a simple ratio estimator), or when one desires to make probabilistic or predictive inferences about the likelihood of bycatch mortality exceeding a bycatch-limit reference point (e.g., in the current or a future year; Martin et al., 2015;Cruz et al., 2018;Carretta et al., 2019;Stock et al., 2019). Model-based approaches are discussed further in the next section.

Non-representative Sampling
Biased sampling (e.g., extreme over- or under-sampling of the fishing fleet with respect to characteristics such as area, season, gear, or vessel type) should be avoided if possible, but if the total fishing effort is well characterized, then stratifying the sample of observer data can help address some biases. For example, if a fishery operates over a 3-month period, with most effort occurring in the second month, but most of the observed effort comes from the first month, then bycatch (and variance) can be estimated separately for each month (stratum) and the stratum estimates combined to obtain the total bycatch mortality. However, it is important in this scenario that sampling within each stratum is largely representative of the fishing occurring within the stratum. Ideally, stratification should be built into the study design, to ensure sufficient representativeness and to ensure the adequacy of within-stratum sample sizes. "Post hoc stratification" may not overcome severe design biases, such as when sample sizes are very small or absent within some strata, or when sampling biases exist across multiple attributes of the fishery (e.g., sampling in the third month under-represented an important fishing area).
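The 3-month example above can be illustrated with a short Python sketch. All numbers are hypothetical, including the stratum variances, which are assumed here to have been obtained separately (for instance, from a within-stratum bootstrap of trips) and to be independent across strata.

```python
import math

def stratified_bycatch(strata):
    """Sum stratum-level ratio estimates; variances add if strata are independent."""
    total = 0.0
    variance = 0.0
    for s in strata:
        rate = s["deaths"] / s["observed_effort"]  # deaths per unit of observed effort
        total += rate * s["fleet_effort"]
        variance += s["variance"]
    return total, math.sqrt(variance) / total

# Hypothetical fishery: most fleet effort is in month 2, most observed effort in month 1.
strata = [
    {"month": 1, "observed_effort": 60, "deaths": 3, "fleet_effort": 200, "variance": 30.0},
    {"month": 2, "observed_effort": 20, "deaths": 3, "fleet_effort": 600, "variance": 2500.0},
    {"month": 3, "observed_effort": 20, "deaths": 1, "fleet_effort": 200, "variance": 95.0},
]

stratified_total, stratified_cv = stratified_bycatch(strata)

# Unstratified (pooled) ratio estimate for comparison
pooled_total = (sum(s["deaths"] for s in strata)
                / sum(s["observed_effort"] for s in strata)
                * sum(s["fleet_effort"] for s in strata))

print(f"stratified = {stratified_total:.0f}, pooled = {pooled_total:.0f}")
```

In this made-up case the pooled estimate (about 70) falls well below the stratified estimate (about 110), because the over-sampled first month has a lower bycatch rate than the under-sampled second month.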
Model-based approaches can be useful when sampling biases cannot be remedied by stratification, and in some other contexts. Model-based estimators use statistical relationships between potential explanatory variables (e.g., properties of a fishing deployment in a certain time and place) and a response variable (e.g., bycatch mortality) to make predictions about bycatch mortality in the unobserved component of the fishery. If the sample data capture the range of variation in the important explanatory variables, then these relationships can be described (modeled) and used to predict bycatch throughout the fishery, provided that the covariate values are known for all the fishing effort (e.g., from fishery logbooks). For example, Carretta et al. (2019) used a random-forest machine learning approach to estimate marine mammal, sea turtle and seabird bycatch in the California drift-gillnet fishery based on quantified relationships between observed bycatch and a suite of fishing-set characteristics (location, diurnal and seasonal time variables, bathymetry, oceanography, gear characteristics, etc.) (see also Stock et al., 2019, for a random-forest example). Authier et al. (2021, this issue) showed how regularized multilevel regression with post-stratification could be used to estimate bycatch from non-representative sampling. Another common framework for estimating bycatch mortality using covariate data is generalized linear or generalized additive modeling (GLMs or GAMs), which can be implemented in a frequentist (Orphanides, 2009; Cruz et al., 2018; Stock et al., 2019) or Bayesian estimation framework (e.g., Martin et al., 2015; Moore and Curtis, 2016).
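As a minimal sketch of the GLM idea, the following Python code fits a Poisson regression with a log link and an effort offset to observer data, then predicts bycatch for all fleet effort recorded in logbooks. Everything here is an illustrative assumption: the data are invented, the single covariate (night versus day sets) is hypothetical, and the hand-written Newton-Raphson fit stands in for the statistical software that a real analysis would use. With one binary covariate the model is equivalent to stratification; the approach becomes more useful with several covariates.

```python
import math

def fit_poisson_glm(records, n_iter=50, tol=1e-10):
    """Fit E[deaths] = effort * exp(b0 + b1 * x) by Newton-Raphson.

    records: iterable of (effort, x, deaths) for each observed set.
    """
    total_deaths = sum(y for _, _, y in records)
    total_effort = sum(e for e, _, _ in records)
    b0, b1 = math.log(total_deaths / total_effort), 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for e, x, y in records:
            mu = e * math.exp(b0 + b1 * x)
            g0 += y - mu              # score for the intercept
            g1 += (y - mu) * x        # score for the covariate
            h00 += mu                 # information matrix entries
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def predict_total(b0, b1, logbook):
    """Predicted deaths summed over logbook records of (effort, x)."""
    return sum(e * math.exp(b0 + b1 * x) for e, x in logbook)

# Hypothetical observer data: (effort, night_set, deaths) per observed set.
day_sets = [(1.0, 0, 1)] * 2 + [(1.0, 0, 0)] * 38      # 2 deaths in 40 day sets
night_sets = [(1.0, 1, 1)] * 6 + [(1.0, 1, 0)] * 14    # 6 deaths in 20 night sets
observed = day_sets + night_sets

# Hypothetical logbook totals for the whole fleet: 900 day sets, 100 night sets.
logbook = [(900.0, 0), (100.0, 1)]

b0, b1 = fit_poisson_glm(observed)
model_total = predict_total(b0, b1, logbook)
ratio_total = 8 / 60 * 1000   # simple ratio estimate ignoring the covariate
print(f"model-based = {model_total:.1f}, simple ratio = {ratio_total:.1f}")
```

Here observers over-sampled night sets, which have the higher bycatch rate, so the simple ratio estimate is inflated relative to the model-based prediction that uses the logbook covariate.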
Models can be particularly useful when multiple years of data exist, allowing information-rich years to inform bycatch estimates in more data-limited years, to resolve the problem of unestimable CVs in years when no bycatch is observed, and to evaluate longitudinal relationships to bycatch mortality such as a change in management actions (e.g., Carretta et al., 2019). Bayesian methods in particular are useful for obtaining probabilistic inferences, such as the probability that the bycatch rate has changed in response to a management action or that bycatch mortality exceeds a limit or other threshold (Martin et al., 2015;Moore and Curtis, 2016).
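One simple way to see how a Bayesian analysis yields such probabilities, even in a year with no observed bycatch, is a conjugate gamma-Poisson model. The sketch below is only an illustration under strong assumptions: the bycatch rate is assumed constant across years, the prior is assumed to summarize earlier years of observer data, total mortality is approximated as rate × fleet effort, and all numbers and names are hypothetical.

```python
import random

def exceedance_probability(deaths, observed_effort, fleet_effort, limit,
                           prior_shape, prior_rate, n_draws=20000, seed=1):
    """Posterior mean of total mortality and P(total mortality > limit).

    Assumes deaths ~ Poisson(rate * observed_effort) and rate ~ Gamma(shape, rate),
    so the posterior is Gamma(prior_shape + deaths, prior_rate + observed_effort).
    """
    rng = random.Random(seed)
    shape = prior_shape + deaths
    rate = prior_rate + observed_effort
    # random.gammavariate takes (shape, scale), where scale = 1 / rate
    draws = [rng.gammavariate(shape, 1.0 / rate) * fleet_effort
             for _ in range(n_draws)]
    p_exceed = sum(d > limit for d in draws) / n_draws
    return shape / rate * fleet_effort, p_exceed

# Hypothetical current year: no deaths seen in 100 observed effort units.
# Hypothetical prior from earlier years: 4 deaths in 400 units, plus a weak Gamma(0.5, 0) base.
post_mean, p_exceed = exceedance_probability(
    deaths=0, observed_effort=100, fleet_effort=1000, limit=20,
    prior_shape=4.5, prior_rate=400.0)
print(f"posterior mean = {post_mean:.1f}, P(mortality > limit) = {p_exceed:.3f}")
```

A simple ratio estimator would return zero with no estimable CV for this year, whereas the posterior still gives a non-zero expected mortality and a probability of exceeding the limit.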
Taking a model-based approach may be the only option for obtaining valid estimates of bycatch if a sampling design is non-representative. Importantly, however, a model-based design cannot always provide unbiased estimation if the survey design is poor. In particular, if important covariates are not adequately sampled across their range of variation, or if many observations are not statistically independent, then the covariate relationships can be incorrect. As described earlier, there is no good substitute for a well-designed survey and fishery observer program.

Inaccurate Counts by Observers
Bycatch mortality estimates can be biased due to inaccurate counts of observed bycatch events (typically undercounts). Undercounts occur when observers are unable to record every bycatch event that occurs during a watch period. Observers may be engaged in other data collection tasks and not detect bycaught individuals, particularly those not brought on deck. The number of bycaught animals recorded by an observer can be less than the number that were actually bycaught because marine mammals caught on hooks can "drop-off, " or those entangled in nets can "drop-out, " at any time during the fishing or retrieval of the gear (e.g., Hamer et al., 2011). These problems can be exacerbated if the crew inadvertently or deliberately fail to inform the observer of the presence of bycaught individuals, or surreptitiously release or shake an animal out, or off, the gear. This source of bias can be minimized by assuring the cooperation of crews, although in practice it cannot be eliminated because it is very difficult to estimate the frequency of drop-offs and drop-outs that occur out of sight from the vessel.

Deployment Effects
"Deployment effects" refer to factors that make it logistically infeasible to carry out the planned sampling design, forcing non-representative sampling of the fishery (Benoît and Allard, 2009; Faunce and Barbeaux, 2011; Cahalan and Faunce, 2020; Fernandes et al., 2021). These factors (Table 1) include unequal ability to observe different vessel types in the fishery, nonparticipation in the observer program by fishers, inability to observe the fleet operating in certain locations and periods, sub-optimal allocation of observer effort due to incomplete knowledge of the fleet, and other logistical constraints.

Observer Effects
"Observer effects" occur when fishers use different gear or fishing methods, target different fish species, fish in different areas or at different times, reduce effort per trip, or handle bycatch differently when observers are on board (Liggins et al., 1997;Benoît and Allard, 2009;Faunce and Barbeaux, 2011), presumably to reduce the chance of bycatch occurring (or being detected and reported). Observer effects result in observer data that are not representative of the entire fleet and may not accurately reflect the bycatch that occurred on the observed trips. Subsequently, bycatch mortality estimates are biased and likely more precise than is warranted (Cotter and Pilling, 2007). Observer effects and the resulting "observer biases" are difficult to confirm [National Marine Fisheries Service [NMFS], 2011]. In some studies, observer effects have been inferred based on catch statistics that differed significantly between observed and unobserved portions of a fleet (e.g., Wahlen and Smith, 1985;Walsh et al., 2002;Cotter and Pilling, 2007;Burns and Kerr, 2008;Faunce and Barbeaux, 2011;Kirkwood et al., 2020); some other studies failed to find such differences (e.g., Liggins et al., 1997). It is widely assumed that observer effects are common (e.g., Faunce and Barbeaux, 2011), especially when (i) captains and/or crew believe that observer data can be used against them (e.g., have enforcement consequences or lead to disadvantageous management changes) [National Marine Fisheries Service [NMFS], 2011], that an onboard observer requires additional effort on their part, or that having an observer on board constrains their behavior in some way (Cotter and Pilling, 2007), or (ii) fishers believe that greater profits can be made without observers on board (Furlong and Martin, 2000). One way to address this challenge is for managers to identify and provide incentives for fishers to cooperate, thereby helping to ensure the safety of observers and the integrity of their data. 
Potential incentives might include financial compensation, increased quota allocation, access to closed areas or seasons, permit fee relief, or access to restricted access fisheries.
TABLE 1 (recovered fragment; the column labels below are inferred, and the beginning of the first row is missing).

[...]
Possible solution: Using electronic monitoring to observe bycatch on vessels that cannot take observers, or placing observers on an alternate platform (e.g., another vessel).

Factor: Location and time
Description: Observers are less likely to be placed on vessels operating out of certain locations (e.g., remote ports) or during certain times of the season.
Resulting bias: Vessels operating in different parts of the fishing grounds or at different times (e.g., seasons) have different bycatch rates, and observer coverage is not proportional to effort in those different areas or times.
Possible solution: Detailed understanding of distribution of fishing effort and marine mammals, and the factors that affect their dynamics, to ensure representative observer coverage.

Factor: Sub-optimal allocation
Description: The optimal allocation of observers requires knowing the universe of trips, which is only possible at the end of the sampling period (e.g., fishing season), yet observers have to be allocated to trips while the season is underway.
Resulting bias: The real distribution of fishing effort differs significantly from the anticipated distribution upon which observer deployments were based; observed effort is not representative of the fishery.
Possible solution: Adaptively modifying observer placements based on within-season monitoring of fishing effort.

Factor: Logistical constraints
Description: The ability to deploy observers deviates from the initial survey design, for example during periods of intense fishing effort.
Resulting bias: Some portions of the fishing effort are under-sampled by observers, and those portions have significantly different bycatch rates compared to the rest of the effort.
Possible solution: Anticipating factors that could "overwhelm" an observer coverage design prior to deploying observers.

Factor: Volunteer participation
Description: Operators who volunteer to accommodate observers are more likely to comply with bycatch mitigation measures than operators who do not.
Resulting bias: The bycatch rates on vessels that volunteer are significantly less than those on vessels that do not volunteer.
Possible solution: Requiring all vessels to carry observers, or independently estimating the bycatch rate in the unobserved portion of the fishery.

If significant observer effects have been documented or are suspected, the problem can be handled in several ways, although the best approach is to address potential biases in the initial design of the observer program (Benoît and Allard, 2009). Theoretically, bycatch mortality estimates could be corrected if there is an estimate of the bias introduced by the observer effects, although such an estimate is rarely if ever available (Punt, 1999). Bias can be reduced by increasing observer coverage or deploying electronic monitoring devices on the unobserved portion of the fleet, although the latter may introduce its own sources of error (see "Electronic vessel monitoring as an alternative to observer programs" below) and may not be economically feasible. The bias should decrease to zero as observer coverage increases to 100%, although there is still the potential for bias due to unrepresentative sampling within trips or to fishers influencing the ability of observers to conduct their duties as required by the observer program (Benoît and Allard, 2009).
Finally, a fishery can be stratified such that unbiased bycatch estimates are obtained from the observable vessels, thus confining the problems and bias to just a portion of the fishery, which can be subject to targeted monitoring to account for the under-representation (Furlong and Martin, 2000;Benoît and Allard, 2009).

Cryptic Bycatch Mortality
In general, "cryptic mortality" refers to human-caused mortality that is not, or cannot be, observed. Bycatch should be estimated across all fisheries for a given marine mammal population. However, it is relatively uncommon that all fisheries are observed, and IUU fisheries are, of course, unobserved. Cryptic deaths and injuries can (1) occur in observed fisheries when deaths and injuries are not detected by observers (e.g., drop-offs and drop-outs), (2) go undetected because some fisheries are not observed, or (3) result from "ghost-fishing" (Gilman et al., 2013). Several methods have been developed to estimate the magnitude of overall cryptic mortality (e.g., Williams et al., 2011; Peltier et al., 2012; Barbieri et al., 2013; Gilman et al., 2013; Prado et al., 2013; Wells et al., 2015; Carretta et al., 2016), from which it may be possible to estimate cryptic bycatch mortality. The most common approach estimates the recovery rate of carcasses as the ratio of the number of known deaths due to all causes (obtained, for example, from stranding data) to the estimated total number of deaths in the population (e.g., from a population model). The product of the inverse of the recovery rate and the number of known deaths due to fisheries interactions, excluding those documented by observers, provides an estimate of the undetected (i.e., cryptic) fisheries-related mortality. This approach depends strongly on the assumption that the detection rate of deaths due to fisheries interactions is not different from the overall detection rate. Cryptic mortality from all sources, not just bycatch in fisheries, has been estimated to be one-half to two-thirds, and in extreme cases up to and exceeding 90%, of total mortality for marine mammal populations (see references above).

[...], 2011], forcing an "optimal" observer coverage level typically much less than 100%.
While observer programs are generally regarded as the most accurate approach for estimating bycatch, some less-than-ideal alternatives exist that, under favorable circumstances and if implemented well, can provide information to support cruder assessments of marine mammal bycatch in a fishery.

Self-Reporting (Logbooks or Interview Data) as an Alternative to Observer Programs
Vessel logbook data and data collected through "dock-side" interviews, in addition to providing information about fishing effort, can provide information about marine mammal bycatch. Bycatch data collected by these methods are generally incomplete and inaccurate, usually in the direction of under-reporting (e.g., Walsh et al., 2002;Emery et al., 2019;see Mangi et al., 2016 for discussion of the efficacy of self-reporting). It is widely assumed that logbook data are incomplete and inaccurate because fishers are not skilled at collecting fisheries data (e.g., Faunce, 2011;Faunce and Barbeaux, 2011;Sampson, 2011;Mangi et al., 2016), or that they withhold information they believe could have negative consequences for them. Gilman et al. (2019) suggested that fishers "may have an economic or regulatory disincentive to record accurate data." This problem may be less severe where there are strict legal requirements to report bycatch in logbooks, with surveillance, enforcement and punishments in place. Indeed, the use of electronic monitoring (see below) has been shown to improve the quality of logbook data (Emery et al., 2019). If logbook reporting can be assumed to be consistent throughout the fishery, then such data can be useful for extrapolating/estimating from more reliable data (observer program) that are limited in time and/or space.
Many interview-based assessments have been conducted to obtain semi-quantitative or qualitative information for characterizing fisheries (describing gears and vessel types, spatial or temporal patterns of fishing effort, and interactions with target and bycatch species), and for risk mapping, spatial planning and understanding socio-economic drivers of fisheries management issues (e.g., Moore et al., 2010; Liu et al., 2016; Whitty, 2016; Pilcher et al., 2017; Braulik et al., 2018). An advantage of using interview-based approaches to quantify bycatch mortality is the relatively low cost and relative logistical ease of talking with fishers compared to implementing an observer program. However, as is the case for logbook data, interview data are likely to provide biased inference if fishers are not forthcoming and honest (e.g., for fear of regulations that will limit their fishing opportunities), and interview responses are prone to memory error and interviewer effects. Conducting interviews is itself an art that requires skill and training (e.g., Moore et al., 2010; Lewison and Moore, 2012).
Nevertheless, there can be circumstances where self-reporting from logbooks or interviews provides useful information, at the very least providing information on minimum bycatch levels and on when and where at least some bycatch is occurring, in which gear types, and for which species (although accurate species identification can also be a problem with logbook and interview data, as fishers are unlikely to have been trained in species identification). Information from self-reporting can be useful for determining whether an observer program is needed, and, if it is, for guiding initial planning (e.g., prioritizing which fisheries or areas to observe first).

Electronic Vessel Monitoring as an Alternative to Observer Programs
An alternative to using fisheries observers is electronic monitoring using various technologies, such as GPS or AIS, video cameras, and gear sensors, that capture information on fishing location, catch, bycatch, and discards. Electronic monitoring systems can be used to monitor compliance with catch retention requirements or bycatch of protected species. Systems are now available that can monitor fishing activities on a vessel, and they are starting to supplement data collected by observers or to obtain data from previously unmonitored fisheries (Gilman et al., 2019). These systems integrate GPS units, hard disks, gear sensors and video cameras that provide a visual record of what was caught when and where, including bycatch (Mangi et al., 2015;van Helmond et al., 2020). Gear sensors can improve the efficiency of data collection and storage. For example, a reel sensor can determine when a longline is being retrieved and turn the system on only at those times. van Helmond et al. (2020) reviewed 100 pilot studies and 12 operational implementations, as of 2018, to monitor catch from around the world. As electronic monitoring systems are in the early stages of development and use, it is not yet clear how effective they will be at detecting and accurately recording data on marine mammal bycatch. Nonetheless, a number of systems deployed to monitor protected-species bycatch have reported marine mammal or seabird bycatch (McElderry et al., 2007;Evans and Molony, 2011;Kindt-Larsen et al., 2012;Bartholomew et al., 2018;Emery et al., 2019;Glemarec et al., 2020;van Helmond et al., 2020).
While these systems may collect data on numbers of species with relatively high precision, they cannot yet match observers in many tasks (e.g., species identification, measuring and weighing, sample collection) . On the other hand, electronic monitoring systems can collect some data that observers cannot necessarily collect consistently (e.g., precise time and location of individual events, nature of handling and disposition of animals), and they can collect data on 100% of the effort during a fishing trip; observers miss some effort when they are off duty or ill, or weather prevents them from being on deck. Important costs associated with electronic monitoring systems include the often substantial time and funding needed to review and analyze the video streams, although advances in machine learning software hold promise for addressing this issue, and the need for video storage, which can be expensive (Margolis and Alger, 2020). Several authors have identified strengths and weaknesses of electronic monitoring systems and compared the technology to traditional methods (e.g., Mangi et al., 2015;Suuronen and Gilman, 2020; Table 2). Because these systems can be "on" all the time, or started and stopped remotely or automatically based on sensor input, the fishers do not know when the system is collecting data. Further, the data are likely to be subsampled later, which also prevents fishers from knowing when they are being monitored. Therefore, the use of these systems could eliminate an observer effect, or discourage fishers from attempting to influence the data collected by an observer, when electronic monitoring is used to supplement observer data. Further, a sampling design applied to the recorded data could be completely representative and would not suffer from a deployment effect. 
Electronic monitoring can create a record that, for the duration that it is stored, can be revisited to verify information or resampled to address new questions, although most current applications retain raw data only for finite periods because of high data storage costs and infrastructure requirements.
Impediments to deploying and implementing electronic monitoring include resistance from fishers out of concern about the upfront cost, difficulties of installation, especially on small vessels, and privacy issues (McElderry et al., 2007;Mangi et al., 2015). Fishers may consider electronic monitoring an intrusion into their private workspace (Plet-Hansen et al., 2017) and may argue that camera surveillance reflects a governmental mistrust against them (Mangi et al., 2015). There are also concerns that some bycaught marine mammals may not be brought close enough to the vessel to be seen on camera, and regarding the capability of the video cameras to record sufficient detail to confirm the species identification of marine mammals in the water alongside the vessel and determine the extent of their injuries, particularly at night.

What Can Be Inferred Without Bycatch Monitoring Data?
There is no substitute for bycatch monitoring, but in the complete absence of a bycatch data collection system, there are indirect ways to infer whether bycatch is occurring and whether the impacts are likely to be trivial or worse. For example, beach-stranded and at-sea carcasses can provide information on interactions with fisheries and be used to help determine the need for an observer program. In the United States, for example, stranding-network volunteers document human-caused injuries and deaths (e.g., as evidenced by vessel strikes, gunshot wounds, hooks, line, or net, or knife marks), and the data from strandings are used in marine mammal stock assessments [National Marine Fisheries Service [NMFS], 2016]. Often, carcasses bear clear evidence of a fishery interaction, although it is often not possible to link each case to a specific fishery or type of fishing activity. Stranding data can rarely be used to estimate bycatch mortality directly, but in some cases, models applied to stranding data have been used to infer estimates of the proportion of carcasses likely to strand ashore or minimum bycatch levels (e.g., Moore and Read, 2008; Williams et al., 2011; Carretta et al., 2016; Peltier et al., 2016, 2020). For pinnipeds, animals at rookeries can show direct evidence of entanglement; Page et al. (2004) used such data to calculate minimum entanglement mortality estimates. If a minimum estimate itself approaches or exceeds a bycatch-limit reference point (e.g., PBR), that may be sufficient to conclude that a management problem exists that needs to be addressed through an active effort to collect bycatch data more directly to inform mitigation.
It is widely understood that certain gear types represent a predictable threat to particular groups of marine mammals (Wade et al., 2021). For example, vertical buoy lines used to mark and retrieve fixed gear such as crab, lobster or fish traps have the potential to entangle large whales, and to result in their serious injury and death, but may not be a threat to smaller species. In contrast, gill nets are a serious threat to most marine mammals, including porpoises, dolphins and pinnipeds, as well as whales and sirenians. Similar to buoy lines, trawls, seines and longlines can be significant threats to particular marine mammals. Careful comparison by experts of the characteristics of an unobserved fishery with those of similar fisheries with known bycatch rates, combined with consideration of the extent of spatial-temporal overlap between the fishery and the distribution of marine mammal populations, can be used to make qualitative inferences about the likelihood of a population-level problem. Inferences of any kind can be made stronger by drawing upon multiple lines of information.

CONCLUDING REMARKS
This paper is especially intended for fisheries managers and researchers attempting to conduct first-time assessments of fisheries impacts on marine mammal populations. We have tried to break down the daunting challenge of estimating bycatch mortality, highlighting key central concepts, best practices, and typical impediments to obtaining good estimates. Bycatch estimates need to be compared to conservation reference points, which are derived for marine mammal populations mainly from estimates of population size. Population size and reference point estimation are not covered here, but we have provided references on these topics, and a more complete treatment of estimating abundance and reference points can be found in Hammond et al. (2021) and Wade et al. (2021), respectively, in this issue. Scientific observer programs are the only known way to obtain the data needed to estimate bycatch accurately. We therefore place considerable emphasis on this topic and hope the principles discussed in this paper will be useful for those developing fledgling observer programs. Importantly, the main principles (e.g., estimators and measurement units, survey design and statistical considerations, sources of bias) should be useful for the application of alternative bycatch estimation approaches (e.g., using logbooks, interviews) to the extent that these can be incorporated. Alternatives to observer programs have the key advantage of cost-effectiveness. If done well, they can provide useful information for the assessment process and in some cases may be sufficient for determining whether bycatch mitigation is required.

AUTHOR CONTRIBUTIONS
JM and DH designed and wrote the manuscript. TF helped design and produced the figures. All co-authors developed the concept, contributed to writing and editing, and approved its publication. All authors contributed to the article and approved the submitted version.

FUNDING
Support for this project was provided by the Lenfest Ocean Program (Contract ID: #31008).