Methods for Monitoring for the Population Consequences of Disturbance in Marine Mammals: A Review

Assessing the non-lethal effects of disturbance and their population-level consequences is a significant ecological and conservation challenge, because it requires extensive baseline knowledge of behavioral patterns, life-history and demography. However, for many marine mammal populations, this knowledge is currently lacking and it may take decades to fill the gaps. During this time, undetected population declines may occur. In this study we identify methods that can be used to monitor populations subject to disturbance and provide insights into the processes through which disturbance may affect them. To identify and address the knowledge gaps highlighted above, we reviewed the literature to identify suitable response variables and methods for monitoring these variables. We also used existing models of the population consequences of disturbance (PCoD) to identify demographic characteristics (e.g., the proportion of immature animals in the population, or the ratio of calves/pups to mature females) that may be strongly correlated with population status and therefore provide early warnings of future changes in abundance. These demographic characteristics can be monitored using established methods such as visual surveys combined with photogrammetry, and capture-recapture analysis. Individual health and physiological variables can also inform PCoD assessment and can be monitored using photogrammetry, remote tissue sampling, hands-on assessment and individual tracking. We then conducted a workshop to establish the relative utility and feasibility of all these approaches for different groups of marine mammal species. We describe how future marine mammal monitoring programs can be designed to inform population-level analysis.


INTRODUCTION
Investigating the sublethal effects of disturbance and their consequences at the population level remains a significant ecological and conservation challenge. It requires extensive baseline knowledge of behavioral patterns, life-history and demography. However, for most marine mammal populations, this knowledge is currently lacking and it could take many years to fill these gaps. During this time, undetected population declines may occur. Typically, marine mammal populations are monitored via surveys to determine their size or density. Whilst there are well-established methods for estimating the size of marine mammal populations, such as line-transect surveys for cetaceans (e.g., Wade and Gerrodette, 1993) or telemetry-corrected haulout counts for pinnipeds (e.g., Thompson and Harwood, 1990), these are expensive, particularly in the case of cetacean populations. They also tend to provide imprecise estimates, because marine mammal populations are often spread over wide areas and individuals cannot be sighted while they are submerged. Consequently, monitoring programs based on these methods typically only have the power to detect large changes (Taylor et al., 2007; Jewell et al., 2012). Additionally, if a species is long-lived, it may take many years before changes in vital rates manifest themselves as changes in population size. There may, therefore, be merit in monitoring demographic characteristics (such as the age- or stage-structure of the population) and indicators of individual health that can provide an early warning of population-level effects and help to identify some of the drivers of change (National Academies of Sciences Engineering and Medicine, 2017).
Working committees established by the National Research Council of the United States National Academies and the United States Office of Naval Research (National Research Council, 2005; New et al., 2014; National Academies of Sciences Engineering and Medicine, 2017) developed a conceptual model for assessing the population consequences of disturbance (PCoD). Those efforts led to a mathematical framework documented in Pirotta et al. (2018) (Figure 1). It describes how disturbance may impact both the behavior and physiology of an individual, and how changes in these characteristics may affect that individual's vital rates either directly (an acute effect) or indirectly via its health (a chronic effect). An individual's health integrates the potential effects of physiological and behavioral responses to stressors over a time scale that is longer than the duration of the responses themselves but shorter than the response time of vital rates. Changes in health indices can therefore provide an early indication of future reductions in vital rates, such as survival and reproduction. A key requirement for the implementation of the PCoD framework is therefore the ability to assess the health of individuals. A variety of health indices have been proposed, including allostatic load, energy stores, immune status, organ status, stress levels, contaminant burden, and parasite load (National Academies of Sciences Engineering and Medicine, 2017).
The PCoD framework can be used to forecast the possible consequences of a range of disturbance scenarios. However, in data-poor situations these forecasts have significant uncertainty associated with them, which can only be reduced by decades of research. Here, we review methods for monitoring such populations that can also provide insights into the processes through which disturbance may affect health and vital rates. This review provides both a retrospective and prospective view of how appropriate monitoring methods might be selected for a given marine mammal population. We also explore how the statistical power of a monitoring program to detect changes can be improved and identify contextual variables to aid interpretation of observed patterns.
We summarize the results of a literature review, and the outputs from an expert consultation. In addition, we explore the practicality of different methods for collecting appropriate datasets and use existing PCoD models to explore the potential of different demographic characteristics for providing an early warning of population decline. We conclude by describing how monitoring can inform future PCoD analysis of the effects of anthropogenic activities on marine mammal populations.

Literature Review and Workshop
Although we reviewed the published literature spanning a wide range of monitoring methods, we do not provide an exhaustive review here. Instead, we focus on six methods that have been applied to monitor marine mammal populations. They are:
• hands-on assessment: capture-release, live stranding and necropsies,
• remote tissue sampling,
• capture-recapture approaches (that is, methods for which capture-recapture analyses can be applied to data once collected),
• photogrammetry,
• individual tracking,
• visual and acoustic surveys.
We categorized marine mammal species into the following groups: deep-diving cetaceans (including beluga and narwhal); baleen whales; coastal dolphins and porpoises (including river dolphins); oceanic dolphins; land-breeding pinnipeds; and ice-breeding pinnipeds. We did not consider sea otters, sirenians or polar bears.
The literature review process was supported by a workshop at which invited experts from a range of marine mammal research disciplines (Supplementary Table S1) were asked to assess and prioritize monitoring approaches for the different marine mammal species groups. Experts followed a lines of evidence (LoE) approach (e.g., Ross, 2000; Baruch et al., 2009; Amidan et al., 2015) to help identify the most suitable methods to monitor demographic characteristics and health variables (see Supplementary Material). Using the LoE approach, a set of response variables was identified, based on their "relative utility" (see Supplementary Material). Note that the relative utility scores are non-linear and not directly comparable between methods for monitoring demography and individual health. The readiness of the available methodologies for monitoring these variables (i.e., their "feasibility") was also assessed. The feasibility and relative utility scores were plotted to identify approaches that were both feasible and had high utility.
FIGURE 1 | "The Population Consequences of Disturbance (PCoD) conceptual framework, the boxes within the dashed gray boundary line represent the effects of exposure to a stressor and a range of ecological drivers on the vital rates of an individual animal. The effects are then integrated across all individuals in the population to project their effects on the population's dynamics" [reproduced from Pirotta et al. (2018)].

Demographic Characteristics That May Provide an Early Warning of Population Decline
Population consequences of disturbance models developed using the outputs of earlier expert elicitations were used to evaluate whether the monitoring of demographic characteristics, rather than population size or density, may provide an early warning that a population is declining. In these elicitations, experts were asked to predict the potential effects of different levels of disturbance on the vital rates (individual survival and fertility) of a number of different marine mammal species. Here we focus on models for three species (harbor porpoise Phocoena phocoena, bottlenose dolphin and Blainville's beaked whale Mesoplodon densirostris) that have different life history strategies and that are exposed to different types of disturbance. The harbor porpoise and bottlenose dolphin population models were based on studies of the effects of noise associated with the construction of offshore wind farms in the North Sea (King et al., 2015). The Blainville's beaked whale model was based on studies of the effects of sonars used in Navy exercises at the Atlantic Undersea Training and Evaluation Center (AUTEC) in the Bahamas (Moretti et al., 2014; Booth et al., 2016; Harwood and Booth, 2016). We investigated the sensitivity of the following demographic characteristics to changes in vital rates that might be caused by disturbance:
• the ratio of calves/pups to mature females;
• the proportion of immature animals in the population.
The first of these characteristics will be sensitive to changes in fertility and calf survival, the second will be sensitive to changes in fertility, calf survival and juvenile survival.
Details of the expert elicitation process we used can be found in Booth et al. (2016) and Donovan et al. (2016). In all the expert elicitations we asked the experts for their best estimates of the number of days of disturbance that would be required to have any effect on survival or fertility, the maximum likely effect of disturbance on these vital rates, and the number of days of disturbance that would be required to have this maximum effect. We also asked them for an estimate of the uncertainty they associated with these values. This information allowed us to construct a set of response functions of the form shown in King et al. (2015) for each expert. The opinions of all the experts were combined to provide a probability density surface for each function (e.g., see Figure 3 in King et al., 2015). In order to investigate the potential effects of a particular disturbance activity on a population, we obtained the views of many hundreds of "virtual" experts by sampling at random from these density surfaces. The functions from each virtual expert were incorporated into a stage-structured population model of the kind described in King et al. (2015) in order to investigate their implications for population dynamics.
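The three quantities elicited from each expert (the number of days of disturbance needed for any effect, the maximum likely effect, and the number of days needed to reach that maximum) can be turned into a response function. The sketch below is illustrative only: it assumes a piecewise-linear shape, which is not necessarily the form used in King et al. (2015), and the function name, parameter names and numerical values are hypothetical.

```python
def disturbance_effect(days, onset_days, saturation_days, max_effect):
    """Proportional reduction in a vital rate after `days` of disturbance.

    ASSUMED shape (illustrative, not taken from the source):
    no effect up to `onset_days`, a linear rise to `max_effect`
    at `saturation_days`, and a constant effect beyond that.
    """
    if saturation_days <= onset_days:
        raise ValueError("saturation_days must exceed onset_days")
    if days <= onset_days:
        return 0.0
    if days >= saturation_days:
        return max_effect
    fraction = (days - onset_days) / (saturation_days - onset_days)
    return max_effect * fraction


def disturbed_rate(baseline_rate, days, onset_days, saturation_days, max_effect):
    """Apply the proportional reduction to a baseline survival or fertility rate."""
    reduction = disturbance_effect(days, onset_days, saturation_days, max_effect)
    return baseline_rate * (1.0 - reduction)


# Hypothetical expert: no effect below 20 days, a 30% reduction from 60 days.
for d in (10, 40, 80):
    print(d, round(disturbed_rate(0.85, d, 20, 60, 0.30), 4))
```

In a fuller analysis, one such function per vital rate would be drawn for each "virtual" expert and passed to the population model.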
For the simulated harbor porpoise populations, we investigated the effect of different levels of disturbance occurring over 10 years on a population of 10,000 animals. For bottlenose dolphins we investigated the effect of similar levels of disturbance on a smaller population (200 animals) because most coastal populations of this species are relatively small (Rosel et al., 2011;Cheney et al., 2018a). In both cases we examined the population consequences of the response functions predicted by 500 virtual experts, and compared the values of the two demographic characteristics at various times during the first 10 years with the ratio of the overall predicted decline in population size to the maximum decline in population size. Initially, we only accounted for variation between the opinions of the different virtual experts. However, environmental variation will also affect the value of the two demographic characteristics we chose to examine. We therefore re-ran the harbor porpoise simulations allowing survival and fertility to vary from year to year using experts' predictions of the level of environmental variability in these rates (see King et al., 2015). For the bottlenose dolphin population, we also took account of demographic stochasticity (the chance variation in survival and fertility between individuals which can affect the dynamics of small populations).
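To illustrate why demographic stochasticity matters for a population of 200 animals but much less for one of 10,000, the following minimal sketch draws the fate of each individual independently and compares the year-to-year spread in the realized survival rate. The survival probability of 0.9, the number of simulated years and the function names are assumptions made for illustration; they are not values from the PCoD models.

```python
import random
import statistics


def realized_survival(n_animals, survival_prob, rng):
    """Fraction of animals surviving one year when each survives independently."""
    survivors = sum(1 for _ in range(n_animals) if rng.random() < survival_prob)
    return survivors / n_animals


def spread_of_survival(n_animals, survival_prob, n_years, seed=1):
    """Standard deviation of the realized survival rate across simulated years."""
    rng = random.Random(seed)
    rates = [realized_survival(n_animals, survival_prob, rng)
             for _ in range(n_years)]
    return statistics.stdev(rates)


# Hypothetical survival probability; population sizes follow the text above.
small_pop_spread = spread_of_survival(200, 0.9, n_years=200)
large_pop_spread = spread_of_survival(10_000, 0.9, n_years=200)
print("spread, N = 200:   ", round(small_pop_spread, 4))
print("spread, N = 10,000:", round(large_pop_spread, 4))
```

The same underlying survival probability produces a visibly wider range of realized rates in the smaller population, which is the effect the bottlenose dolphin simulations account for.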
Beaked whales on Navy testing ranges are likely to be subject to the same pattern of disturbance over many years. For the Blainville's beaked whale example, we therefore examined the implications of 500 virtual experts' predictions of the effect of 44 days of disturbance each year (a typical pattern for United States Navy ranges, Moretti, 2019) for the long-term growth rate of a population. We then compared these long-term growth rates with the ratio of calves to mature females and the proportion of immature animals in the population as indicated by the stable age distribution associated with this long-term growth rate. We also calculated values for these demographic characteristics that would be obtained from samples of 1,000 or 100 individuals from a large population.
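The link between a long-term growth rate, its stable stage distribution and the two demographic characteristics can be sketched as follows. This is a simplified, hypothetical three-stage, female-only model (so the calf ratio refers to female calves), not the model of King et al. (2015); all vital rates are placeholder values, and "immature" is assumed here to mean calves plus juveniles. The last function mimics estimating the characteristics from a finite sample of individuals.

```python
import random


def projection_matrix(calf_survival, juvenile_survival, adult_survival,
                      fertility, maturation):
    """Stage matrix for (calf, juvenile, mature female); columns = current stage."""
    return [
        [0.0, 0.0, fertility * adult_survival],
        [calf_survival, juvenile_survival * (1.0 - maturation), 0.0],
        [0.0, juvenile_survival * maturation, adult_survival],
    ]


def stable_stage(matrix, iterations=1000):
    """Power iteration: returns (long-term growth rate, stable stage proportions)."""
    n = len(matrix)
    vec = [1.0 / n] * n
    growth = 1.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        growth = sum(nxt)  # vec sums to 1, so this converges to the growth rate
        vec = [x / growth for x in nxt]
    return growth, vec


def indicators(counts):
    """(calves per mature female, proportion immature) from stage counts."""
    calves, juveniles, mature = counts
    ratio = calves / mature if mature > 0 else float("nan")
    prop_immature = (calves + juveniles) / (calves + juveniles + mature)
    return ratio, prop_immature


def sampled_indicators(proportions, sample_size, rng):
    """Indicators estimated from a random sample of individuals."""
    draws = rng.choices(range(3), weights=proportions, k=sample_size)
    counts = [draws.count(stage) for stage in range(3)]
    return indicators(counts)


# Hypothetical vital rates; the second matrix has reduced fertility.
base_growth, base_stages = stable_stage(projection_matrix(0.8, 0.9, 0.95, 0.25, 0.15))
dist_growth, dist_stages = stable_stage(projection_matrix(0.8, 0.9, 0.95, 0.15, 0.15))
rng = random.Random(0)
print("exact:", indicators(base_stages), indicators(dist_stages))
print("n = 1,000:", sampled_indicators(dist_stages, 1000, rng))
print("n = 100:  ", sampled_indicators(dist_stages, 100, rng))
```

Repeating the sampling step many times would show how much more the estimates scatter around the exact values when only 100 individuals are sampled.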

Monitoring Methodologies for PCoD
Hands-on Assessment: Capture-Release, Live Stranding and Necropsies
Hands-on assessment makes use of animals that have been caught and then released, animals under managed care, and stranded or bycaught animals. The method received consistently high relative utility scores across all species groups for both demographic characteristics and health variables (see Supplementary Tables S2-S5 and Supplementary Figures S1, S2). However, the feasibility of using the method was considered to be low for deep-diving cetaceans, baleen whales and oceanic dolphins, except in situations where bycaught or harvested animals were available for study.
The demographic characteristics that can be estimated from hands-on assessment include age at sexual maturity and age at first pregnancy, sex ratio, and survival and pregnancy rate (see Supplementary Table S2). For example, whether or not an animal is pregnant can be assessed in a live animal using ultrasound, hormone analysis or physical examination of sex organs (e.g., Kjeld et al., 2006;Galatius et al., 2013;Kellar et al., 2013;Wells et al., 2014). Ultrasound has also been used to measure blubber thickness in stranded and bycaught small delphinids (Joblon et al., 2014) and live baleen whales, specifically North Atlantic right whales and southern right whales (Eubalaena australis) (Moore et al., 2001;Miller et al., 2011;Nousek-McGregor et al., 2013). Serum, urine and blubber samples can also provide a wide range of omics biomarkers, immune function markers and hormone measurements (see Mello and Oliveira, 2016 for a comprehensive review) (Supplementary Table S3). The age of individual animals can be estimated from growth layers in teeth (e.g., dolphin species, Hohn and Fernandez, 1999;pinnipeds, Blundell and Pendleton, 2008) or earplugs (e.g., baleen whales, Trumble et al., 2013), and from fatty acid concentration in blubber (e.g., odontocetes, Koopman et al., 2003;Herman et al., 2008).
Hands-on assessments of live animals can be performed as part of capture-release health assessments (see Hall et al., 2010) or individual-tracking studies, although this approach is typically limited to pinnipeds, small delphinids and porpoises (Supplementary Tables S4, S5 and Supplementary Figures S1, S2). For example, bottlenose dolphins in Sarasota Bay, United States have been captured since the 1980s to conduct health assessments and to obtain demographic data such as sex ratio, age structure, pregnancy rates, survival rates and age at maturity (Wells and Scott, 1990; Wells et al., 2004). A range of health indicators can be obtained via these capture-release assessments (Schwacke et al., 2013; Schwacke and Wells, 2015; Unal et al., 2018). Serum samples and ultrasound have been used to assess physiological state and pregnancy status, respectively, in pinnipeds captured for individual-tracking studies (e.g., Roletto, 1993; Mellish et al., 2004, 2006; Greig et al., 2010). Animals under managed care also provide opportunities for more controlled experimental studies (Champagne et al., 2017, 2018), though the applicability of the results to wild animals is poorly understood. Similar information can be collected from hands-on assessment of animals that strand live or dead. However, samples from these animals may be biased and not representative of the healthy population (described in Jones, 1981; Bilgmann et al., 2011; Peltier et al., 2012). In addition, deep-diving cetaceans and oceanic dolphins are less likely to be available for this kind of sampling than other marine mammal species groups because they wash ashore less frequently than coastal species. Even when a stranded carcass is available, its suitability to provide information on demographic characteristics and health variables will depend on its level of decomposition.
The sample sizes obtained from hands-on assessments of stranded animals are usually small, but larger samples may be available from bycaught animals and animals harvested for subsistence or during culls.

Remote Tissue Sampling
Tissue samples may be collected remotely using biopsy darts, and from blows and feces (Piggott and Taylor, 2003; Hunt et al., 2013). Remote tissue sampling using biopsy darts was awarded high feasibility and moderate utility scores for all species groups (Supplementary Tables S4, S5 and Supplementary Figures S1, S2). Sampling using blows and feces was awarded lower feasibility and utility scores.
Samples obtained in these ways can provide information on a suite of health measures, including stress indicators (e.g., stress hormones, omics markers of chronic stress), levels of reproductive hormones, body condition indices and immune function markers. Remote sampling techniques do not require handling of the animal and therefore have applications for a wider range of marine mammal species. Biopsy samples of blubber can be analyzed to obtain data on sex ratios, reproductive hormones and wax/sterol esters or fatty acids to estimate the age/stage class of each individual (Supplementary Tables S2, S3). Remote sampling methods have been used to measure reproductive hormone levels in blubber samples from delphinid species (e.g., Kellar et al., 2009; Trego et al., 2013), baleen whales (e.g., bowhead whales, Kellar et al., 2013; humpback whales, Vu et al., 2015) and deep-diving cetaceans such as sperm whales (Sinclair et al., 2015). Sampling the blow from respiring animals has become increasingly common over the past decade (e.g., Hogg et al., 2009; Dunstan et al., 2012; Hunt et al., 2013; Thompson et al., 2014; Bennett et al., 2015; Apprill et al., 2017; Pirotta et al., 2017; Geoghegan et al., 2018; Nelsons et al., 2019). Along with fecal sampling, it provides a non-invasive technique for monitoring the health of pinnipeds (Harvey, 1989; Fossi et al., 1997; Trites and Joy, 2005; Deagle and Tollit, 2007), killer whales (Hanson et al., 2010; Ford et al., 2011; Ayres et al., 2012) and baleen whale species (reviewed in Hunt et al., 2013). It is possible to obtain measures of a number of physiological markers from fecal samples. These include stress hormones, reproductive hormones, thyroid hormone metabolites (as indicators of nutritional stress), gut microflora (including parasite load), exposure to toxins, prey DNA and fecal hormone metabolites (used to assess acute vs. chronic stress markers) (Hunt et al., 2004, 2018, 2019).
As with blow sampling, there is still a need for further work to validate these approaches and to understand how measurements obtained from blows and feces compare with those obtained via biopsy (Hunt et al., 2018).

Capture-Recapture Analysis
Capture-recapture analysis (also known as mark-recapture analysis) involves the initial "capture" and "marking" of individual animals, usually by taking a series of photographs that allow that individual to be identified from their markings and injuries (photo-identification or photo-ID). These individuals may be "recaptured" when subsequent photographs are compared to a catalog of known individuals. This method received high feasibility scores for all species groups except oceanic dolphins, and high utility scores for demographic characteristics.
Studies on coastal and oceanic dolphins have used the scratches and nicks on dorsal fins (e.g., bottlenose dolphins, Wells and Scott, 1990), or the dorsal fin shape and saddle patch markings of killer whales (e.g., Kuningas et al., 2014) to identify individuals. Capture-recapture techniques have been used extensively in coastal bottlenose dolphins to estimate a range of demographic characteristics (Hansen and Wells, 1996; Norman et al., 2004; De Wet, 2013; Schwacke et al., 2013; Fair et al., 2014). Studies on baleen whales have used fluke patterns (e.g., humpback whales, Gabriele et al., 2017), patterns of calluses and crenulations (e.g., southern right whales, Carroll et al., 2011) and patterns of pigmentation, scarring and barnacles (e.g., gray whales, Yakovlev and Tyurneva, 2005). Deep-diving cetaceans can be identified using nicks and marks on the trailing edge of flukes (e.g., sperm whales, Matthews et al., 2001) and patterns of scars (e.g., beaked whales, Falcone et al., 2009; Rosso et al., 2011), and pinnipeds can be identified uniquely using their pelage patterns (e.g., harbor seals, Cordes and Thompson, 2015; ringed seals, Zhelezniakov et al., 2015). Other methods of capture have also been employed. For example, genetic tagging (using genotyping) has been most extensively explored in baleen whales (Palsbøll et al., 1997; Lukacs and Burnham, 2005; Wiig et al., 2011). Capture-recapture analysis has also been used with telemetry data to estimate survival probabilities of first-year gray seals (Halichoerus grypus) (McConnell et al., 2004). Non-permanent methods of marking have been used with pinnipeds; these include dyes or paints, flipper tags and shaving patches of fur. Dyes and paints are often used for shorter-term studies, such as tracking gray seal pups from birth to first molt (e.g., Büche and Stubbings, 2016), but they are less effective for long-term studies because there is a high risk of mark loss, which will affect the re-sighting rate.
Capture-recapture analysis has been used to assess population structure and demographic variables in a wide range of marine mammal species (e.g., Aguilar et al., 2013;Moretti et al., 2017;Schorr et al., 2017). For example, the Sarasota Dolphin Research Program have been conducting photo-ID studies of bottlenose dolphins since 1970 (Wells, 2014) to provide information on population size, survival rate, fecundity rate and age at maturity (e.g., Wells and Scott, 1990;Rosel et al., 2011;Bassos-Hull et al., 2013;Wells, 2014). Such baseline data provide a means to compare demographic characteristics before and after perturbation events, and to identify the early warnings of declines.
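As a minimal illustration of how photo-ID "captures" and "recaptures" translate into an abundance estimate, the sketch below applies Chapman's bias-corrected form of the two-sample Lincoln-Petersen estimator. This is an assumption chosen for simplicity: the studies cited above are not stated to use this estimator, it assumes a closed population with no mark loss, and estimating survival would require open-population models instead. The counts in the usage lines are invented.

```python
def chapman_estimate(n_first, n_second, n_recaptured):
    """Chapman's bias-corrected two-sample abundance estimate.

    n_first:      individuals identified in the first photo-ID session
    n_second:     individuals identified in the second session
    n_recaptured: individuals identified in both sessions
    """
    if n_recaptured > min(n_first, n_second):
        raise ValueError("recaptures cannot exceed either sample size")
    return (n_first + 1) * (n_second + 1) / (n_recaptured + 1) - 1


# Hypothetical catalog: 50 animals photographed in session one,
# 40 in session two, 20 of which were already in the catalog.
estimate = chapman_estimate(50, 40, 20)
print(round(estimate, 1))
```

Fewer recaptures for the same effort imply a larger estimated population, which is why precision depends strongly on the re-sighting rate.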

Photogrammetry
Photogrammetry and videogrammetry (henceforth collectively referred to as "photogrammetry") provide a non-invasive method for analyzing photographs to provide information on body shape, health and nutritional status. When images are combined with information on scale, they can be used to estimate morphometric characteristics (e.g., length, girth, etc.). Such measurements can provide information on the age or stage class of an individual, provided a suitable growth curve or other calibration data are available, and on individual body condition. This method received high feasibility scores, and high to moderate utility scores for all species groups (Supplementary Figures S1, S2).
The simplest form of photogrammetry used in marine mammal surveys involves a single camera deployed from a vessel, aircraft or land with a suitable scale indicator, such as an object of known size or a precisely calibrated camera lens (Dawson et al., 2017). The use of unmanned aerial systems or "drones" has become increasingly common and has facilitated significant advances in the field. However, there are concerns that it may cause short-term disturbance to some species (Fettermann et al., 2019). Single-camera photogrammetry is often conducted at the same time as marine mammal surveys.
For stereo-photogrammetry, simultaneous photos are taken from two cameras a known distance apart. This method does not require a scaling object to be present in the image. Parallel-laser photogrammetry involves mounting dual lasers onto the camera system. The laser dots in the resulting photographs can be used as a scale in order to obtain morphometric measurements (e.g., Durban and Parsons, 2006). For example, the dorsal fin length of some dolphin species can be used as a predictor of total body length; therefore, if images are obtained of the dorsal fin and the two laser dots, the length of the dorsal fin can be measured from the images and the total body length can be estimated. This has been demonstrated in Hector's dolphins (Cephalorhynchus hectori) (Webster et al., 2010) and spinner dolphins (Stenella longirostris) (Karczmarski et al., 2005). In the Moray Firth, United Kingdom, the length of bottlenose dolphin calves has been shown to be significantly correlated with survival through the first winter (Cheney et al., 2018b). There is, therefore, the potential for body size measurements to provide an estimate of calf survival rates. Parallel-laser photogrammetry can also provide data on body condition if the images are taken from above to give both length and width measurements. Three-dimensional photogrammetry has been used to estimate dimensions, condition and mass for pinnipeds on land (e.g., Steller sea lions, Waite et al., 2007; southern elephant seals, Postma et al., 2013). Obtaining photos from multiple angles for animals that are wholly or partially submerged is more challenging.
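The arithmetic behind parallel-laser measurements can be sketched as follows: the known separation of the two lasers gives a centimeters-per-pixel scale, which converts a dorsal fin length in pixels to centimeters, and a species-specific regression then predicts total body length. The sketch assumes that the fin and the laser dots lie in the same plane, perpendicular to the camera axis. The pixel coordinates, the 10 cm laser separation and the regression coefficients are all hypothetical placeholders, not values from the studies cited above.

```python
import math


def pixel_distance(p, q):
    """Euclidean distance between two (x, y) pixel coordinates."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def fin_length_cm(laser_a, laser_b, fin_start, fin_end, laser_separation_cm):
    """Dorsal fin length in cm, scaled by the known distance between laser dots."""
    cm_per_pixel = laser_separation_cm / pixel_distance(laser_a, laser_b)
    return pixel_distance(fin_start, fin_end) * cm_per_pixel


def body_length_cm(fin_cm, slope, intercept):
    """Total length predicted from fin length with a linear calibration.

    `slope` and `intercept` must come from a calibration study for the
    species in question; the values used below are placeholders.
    """
    return intercept + slope * fin_cm


# Hypothetical image: laser dots 200 px apart, lasers mounted 10 cm apart.
fin = fin_length_cm((0, 0), (200, 0), (100, 100), (400, 500), 10.0)
total = body_length_cm(fin, slope=5.0, intercept=20.0)
print(fin, total)
```

Any angle between the animal and the image plane shortens the apparent fin length, so in practice images would need to be screened for orientation before measurement.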
Photogrammetry has been widely used to provide information on individual health, particularly for North Atlantic and southern right whales (Pettis et al., 2004;Christiansen et al., 2018) and other baleen whale species (Burnett et al., 2019). Recent advances of this methodology include the use of three-dimensional volumetric models to allow estimation of mass (Beltran et al., 2018;Adamczak et al., 2019;Christiansen et al., 2019), and longitudinal studies to assess reproductive investment in calves (Christiansen et al., 2018).
Photogrammetry also allows a number of visible indicators of health to be recorded [reviewed by Hall et al. (2010)]. These include rake marks and epidermal lesions (Thompson and Hammond, 1992;Hughes-Hanks et al., 2005;Van Bressem et al., 2009), and the shape of the post-nuchal (Gryzbek, 2013;Reed et al., 2015) and scapular depressions (e.g., Bradford et al., 2012). Rolland et al. (2007) used a combination of information from visual indicators and fecal sampling (to assess parasite load) to provide a single health metric for individual North Atlantic right whales. This information was then analyzed in a Bayesian framework to explore the links between health metrics, vital rates and population status (Schick et al., 2013a;Rolland et al., 2016).

Individual Tracking
Telemetry has been used to track the movements of individual marine mammals over short (Johnson and Tyack, 2003; Miller et al., 2012c; DeRuiter et al., 2013), intermediate (Mate et al., 2016) and longer time scales (Mate et al., 2000, 2007; Peterson et al., 2012; Hindell et al., 2016). Such approaches have been widely used for assessing baseline behavior and to study the responses of animals to disturbance sources. Telemetry studies have provided information on residency patterns and activity budgets in a wide range of species (e.g., McConnell et al., 2004; Aarts et al., 2008; Patterson et al., 2010; Laidre and Heide-Jorgensen, 2012; McClintock et al., 2012, 2013). While demographic information can potentially be collected using this method (e.g., McConnell et al., 2004), it was considered to have low utility for monitoring demographic characteristics at present. However, it received a high score for feasibility for monitoring health in deep-diving cetaceans and seals. This is because the analysis of dive behavior, particularly "drift dives" (Biuw et al., 2003, 2007), can provide information on the buoyancy of an individual that can be used as an index of body condition. This analysis has been used to derive a measure of body density in elephant seals (Aoki et al., 2011; Miller et al., 2012b; New et al., 2014), fur seals (Costa et al., 1989; Page et al., 2005) and northern bottlenose whales (Hyperoodon ampullatus) (Miller et al., 2016). Its use is currently being explored for humpback, blue and long-finned pilot whales (P. Miller, pers. comm.). In some cases it may also be possible to correlate buoyancy with pregnancy (e.g., Crocker et al., 1997).

Visual and Acoustic Surveys
Visual and acoustic surveys are standard methodologies for estimating marine mammal density using vessel, aerial or land platforms (e.g., Buckland et al., 2001, 2004). However, visual surveys can also provide information on demographic characteristics, such as mother-calf ratios, which can be used to estimate birth rate (Kogi et al., 2004; Koski et al., 2008; Currey et al., 2009; Perryman et al., 2010) (see Supplementary Tables S2, S3).
Acoustic surveys of marine mammals are usually conducted using some form of Passive Acoustic Monitoring (PAM) (e.g., McDonald and Fox, 1999; Mellinger and Barlow, 2003; Mellinger et al., 2007, 2011; Marques et al., 2009; Mellinger and Heimlich, 2013), which relies on detecting the sounds produced by marine mammals. It is best developed for cetaceans but it has been used successfully for pinnipeds. Characterizing the vocalizations made by the species that are being surveyed is critical to the success of this approach (Supplementary Tables S4, S5 and Supplementary Figures S1, S2). PAM surveys are routinely conducted using towed hydrophones deployed from vessels (Barlow and Taylor, 2005; Gillespie et al., 2005, 2010; Barlow et al., 2013) and, more recently, from gliders and other autonomous mobile platforms (Baumgartner and Fratantoni, 2008; Klinck et al., 2012; Baumgartner et al., 2013). However, in these approaches the number of animals detected is often limited by the length of time that a suitable towing platform is available. Fixed PAM installations allow for cost-effective long-term monitoring over limited spatial extents and have been used to estimate the density of a number of cetacean species (Harris, 2012; Harris et al., 2013; Thomas et al., 2017; Carlén et al., 2018). These installations can generate significant sample sizes, and thereby increase the ability to detect trends (Gerrodette et al., 2011).

Demographic Characteristics That May Provide an Early Warning of Population Decline
Here we describe the outcomes of the PCoD model simulations described in the section "Materials and Methods."
The Ratio of Calves to Mature Females
Figure 2A shows the relationship between the maximum reduction in harbor porpoise population size recorded during each simulation and the ratio of calves to mature females in the third year of disturbance. Although there was a good correlation between the pairs of values, there are some clear outliers, where a large reduction in population size was not matched by a change in the ratio of calves to mature females. These outliers correspond to the opinions of a small number of virtual experts who predicted that disturbance would have a large effect on juvenile survival, but very little effect on fertility or calf survival. In addition, Figure 2A overestimates the power of this demographic characteristic to provide an early warning of population decline because it does not account for the effects of environmental variation, which will also affect the stage-structure of the population. Figure 3A shows the relationship between maximum population decline and the ratio of calves to mature females when environmental variation was included in the simulations. The predictive power is much reduced. Figure 4A shows the relationship for bottlenose dolphins between the maximum reduction in population size recorded in each simulation and the mean ratio of calves to mature females in the first 3 years of disturbance, including the effects of environmental variation. The mean ratio was used rather than the value from a single year because of the small population size and low fertility rate for bottlenose dolphins, which resulted in large variations in the predicted number of calves born each year. Although there is a clear correlation between the pairs of values, there is much variability, with a wide range of values of the ratio corresponding to each reduction in population size.
In order to investigate the effectiveness of monitoring the ratio of calves to mature females for Blainville's beaked whales, we modeled a population with the same demographic rates as those observed by Claridge (2013) for an undisturbed population in the Bahamas. The effects of disturbance resulted in reductions of between 0% and 6% in the predicted population growth rate. The ratio of calves to mature females derived from the stable stage structure for each disturbed population was a highly reliable predictor of long-term population growth rate. However, these values are not presented here because, in practice, it would be impossible to estimate this demographic characteristic with the kind of precision that is provided by the stable stage structure. Instead, Figure 5A shows the relationship if estimates of the ratio of calves to mature females were based on a sample of 1,000 animals, and Figure 5C shows the relationship for a sample of 100 animals.
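The way in which a stable stage structure and a long-term growth rate follow from a set of vital rates can be illustrated with a small stage-structured matrix model. The sketch below is not the PCoD model used in this study: the three stages (calf, immature, mature female), the female-only structure, the function names and all vital rates are hypothetical values chosen purely for illustration, and disturbance is represented only as a reduction in fertility.

```python
def project(matrix, vector):
    """Multiply a square matrix (list of rows) by a stage vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def stable_stage(matrix, iterations=2000):
    """Power iteration: return (growth rate, stable stage proportions)."""
    vector = [1.0 / len(matrix)] * len(matrix)
    growth = 1.0
    for _ in range(iterations):
        projected = project(matrix, vector)
        growth = sum(projected)  # vector sums to 1, so this estimates lambda
        vector = [x / growth for x in projected]
    return growth, vector


def build_matrix(calf_survival, immature_survival, adult_survival,
                 maturation, fertility):
    """Hypothetical female-only model: calf, immature, mature female."""
    return [
        [0.0, 0.0, fertility],
        [calf_survival, immature_survival * (1 - maturation), 0.0],
        [0.0, immature_survival * maturation, adult_survival],
    ]


def summarize(fertility):
    """Growth rate, calf:mature-female ratio and proportion not yet mature."""
    matrix = build_matrix(calf_survival=0.8, immature_survival=0.9,
                          adult_survival=0.95, maturation=0.15,
                          fertility=fertility)  # all rates are assumptions
    growth, (calves, immatures, mature) = stable_stage(matrix)
    return growth, calves / mature, calves + immatures


baseline = summarize(fertility=0.15)
disturbed = summarize(fertility=0.10)  # disturbance modeled as lower fertility
for label, (growth, calf_ratio, prop_immature) in (("baseline", baseline),
                                                   ("disturbed", disturbed)):
    print(f"{label}: growth={growth:.3f}, calves per mature female="
          f"{calf_ratio:.3f}, proportion not mature={prop_immature:.3f}")
```

In this simplified model the ratio of calves to mature females at the stable stage structure equals fertility divided by the growth rate, which is why the ratio tracks the growth rate so closely when it is known without sampling error.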

The Proportion of Immature Animals in the Population
Figure 2B shows the relationship between the maximum reduction in population size and the proportion of immature animals in a harbor porpoise population after 5 years of disturbance. To allow time for the effects of disturbance on fertility and calf survival to influence this demographic characteristic, we chose a later date than that used for the ratio of calves to mature females (above). Figure 3B shows the same relationship when environmental stochasticity was included in the simulations. As with the ratio of calves to mature females, there were a large number of outliers from a simple relationship when there was no environmental stochasticity, and the addition of environmental stochasticity substantially increased the uncertainty associated with the relationship. Results for bottlenose dolphins (Figure 4B) were similar.
For Blainville's beaked whale, Figure 5 shows the relationships between population growth rate and the proportion of immature animals for samples of 1,000 individuals (Figure 5B) and 100 individuals (Figure 5D).
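The effect of sample size on the precision of an estimated proportion can be sketched with a simple binomial approximation. The example below is an illustration only: it assumes that animals are sampled independently and at random (which is unlikely to hold for animals encountered in groups), the true proportion of 0.25 is a hypothetical value, and the function names are our own.

```python
import math


def standard_error(p, n):
    """Binomial standard error of a proportion estimated from n animals."""
    return math.sqrt(p * (1 - p) / n)


def proportion_interval(p, n, z=1.96):
    """Approximate 95% interval (normal approximation), clipped to [0, 1]."""
    half_width = z * standard_error(p, n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


true_proportion = 0.25  # hypothetical proportion of immature animals
for sample_size in (1000, 100):
    low, high = proportion_interval(true_proportion, sample_size)
    print(f"n={sample_size}: approximate 95% interval {low:.3f} to {high:.3f}")
```

Under these assumptions, reducing the sample from 1,000 to 100 animals widens the interval by a factor of about 3.2 (the square root of 10).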

Effectiveness for Monitoring
We used the results of the simulations described above to evaluate the likely effectiveness of a monitoring program based on either the ratio of calves to mature females, or the proportion of immature animals. Specifically, we examined whether these demographic characteristics could be used to provide an early warning of a potential population reduction of 40% or more by the end of the period of disturbance.
It was not possible to identify an appropriate early warning threshold for the ratio of calves to mature females in year 3 of the harbor porpoise simulations that did not result in unacceptable numbers of false positives (simulations in which the ratio of calves to mature females fell below the threshold in year 3, but the maximum population decline was actually less than 40%) or false negatives (simulations in which the population declined by at least 40% but the ratio of calves to mature females in year 3 was above the threshold). For example, although a ratio of calves to mature females of 0.25 resulted in relatively few (21%) false positives, the false negative rate was 50%. However, results using the proportion of immature animals in the population were more encouraging. For example, using a threshold of 0.2 for the proportion of immature animals in year 5 correctly identified 81% of all declines greater than 40%, and had a false positive rate of only 10%. Similar results were obtained for bottlenose dolphins: early warning thresholds that occurred in a high proportion of the simulations in which there was a population decline of at least 40% also had a high false positive rate (45-50%).
In general, these results suggest that the ratio of calves to mature females may be problematic as an early warning indicator, but that the proportion of immature animals in the population may be a more robust indicator of a potential population reduction.
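The evaluation of a candidate early warning threshold described above can be sketched as follows. This is a minimal illustration rather than the code used for the analysis: the six simulated values are invented, the function name is our own, and we assume that the false positive rate is expressed as a fraction of simulations without a large decline and the false negative rate as a fraction of simulations with one.

```python
def warning_rates(indicator, max_decline, threshold, decline_cutoff=0.4):
    """Return (false positive rate, false negative rate) for one threshold.

    A warning is raised when the indicator falls below the threshold.
    """
    false_pos = false_neg = declines = non_declines = 0
    for value, decline in zip(indicator, max_decline):
        warned = value < threshold
        if decline >= decline_cutoff:
            declines += 1
            false_neg += not warned   # large decline, but no warning
        else:
            non_declines += 1
            false_pos += warned       # warning, but no large decline
    fp_rate = false_pos / non_declines if non_declines else float("nan")
    fn_rate = false_neg / declines if declines else float("nan")
    return fp_rate, fn_rate


# Invented values for six simulations (illustration only):
# proportion of immature animals, and maximum proportional decline.
proportion_immature = [0.30, 0.18, 0.22, 0.15, 0.28, 0.19]
maximum_decline = [0.10, 0.55, 0.45, 0.60, 0.20, 0.05]

fp_rate, fn_rate = warning_rates(proportion_immature, maximum_decline,
                                 threshold=0.2)
print(f"false positive rate={fp_rate:.2f}, false negative rate={fn_rate:.2f}")
```

Repeating this calculation over a range of thresholds shows the trade-off between the two error rates for a given indicator.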
Results from the Blainville's beaked whale analysis were qualitatively similar. Although the proportion of immature animals in a sample of 1,000 animals appears to provide a good indicator of long-term population growth rate (Figure 5B), it is extremely unlikely that 1,000 animals can be classified from surveys on a regular basis, given the observed densities of this species. However, the proportion of immature animals in a sample of 100 individuals can provide a reliable indication that the population is declining: the proportion of immatures in all samples from simulated populations with a growth rate lower than 0.98 (i.e., declining by 2% per annum) was always less than 0.2 (Figure 5D).

DISCUSSION
Using a combination of literature review, an expert workshop and additional analysis we identified a set of methodologies for monitoring demographic characteristics and health variables that can be used to inform PCoD analyses for marine mammals.
In general, information on demographic characteristics, such as the ratio of calves to mature females and the proportion of immature animals in the population, can be collected using established approaches. Monitoring of these characteristics is most commonly achieved via capture-recapture techniques (usually photo-ID, although genetic and electronic tagging methods have also been used) that can also provide information on stage-specific survival rates and fertility. Although these approaches are labor intensive, they are well established and capable of providing robust estimates of key demographic variables (Supplementary Table S3). Analysis of population simulation data indicated that some of these demographic characteristics, particularly the proportion of immature animals in the population, can provide an early warning of population decline. This conclusion is supported by analyses of the potential effects of body condition on vital rates using bioenergetics models. For example, Hin et al. (2019) used a dynamic energy budget (DEB) model of the North Atlantic long-finned pilot whale to investigate the potential effects of disturbance on lifetime reproductive success. They found that disturbance could lead to a large reduction in the proportion of calves that survived to weaning. Moretti (2019) used a variant of this model to show a similar effect of disturbance on Blainville's beaked whale.
A wide range of methods can potentially provide information on individual health. However, further validation and standardization of these methods, together with a better understanding of natural variation in the health variables about which they provide information, is needed to advance their utility.
Photo-ID and photogrammetry can be used to provide a visual assessment of health variables such as body condition. Information on features such as the presence of rake marks and epidermal lesions, which may be useful for health assessment and individual identification, can also be collected during visual surveys (Supplementary Table S4). New tools are being developed to monitor the physiology of animals and estimate physiological states and body condition remotely using telemetry (e.g., Ponganis and McDonald, 2015; Williams, 2015; Elmegaard et al., 2016; Ponganis, 2017; McDonald et al., 2018; Costa et al., 2019; Fahlman et al., 2019; Madsen and Van Der Hoop, 2019). Transmission of this information via satellite could allow longitudinal monitoring of these variables over a period of months (Miller et al., 2019).
Hands-on assessment of captive animals, animals in accessible locations (such as seal breeding colonies), and stranded, bycaught or harvested individuals can provide estimates for a range of health variables. However, these animals may not be representative of the population because of sampling biases toward individuals that are easier to catch and handle, or are more likely to be bycaught or strand. Remote tissue sampling can be used to provide information on the same set of health variables, but there may also be issues about the representativeness of the sampled individuals.
FIGURE 5 | Relationship between the long-term growth rate of a Blainville's beaked whale population and (A) the ratio of calves to mature females estimated from a random sample of 1,000 animals, (B) the proportion of immature animals estimated from a random sample of 1,000 animals, (C) the ratio of calves to mature females estimated from a random sample of 100 animals, and (D) the proportion of immature animals estimated from a random sample of 100 animals.

Designing an Appropriate Monitoring Program
Here we focus on issues that need to be considered in designing a monitoring program that will use the methods identified as appropriate by the literature survey and workshop.

Population Structure
One of the key prerequisites for designing a monitoring program is to identify an appropriate unit of assessment. However, this can be challenging for marine mammal populations. In many cases, the unit of assessment will simply be those animals that can be accessed and sampled. Local populations of this kind may not be closed to migration. As a result, undocumented or time-varying immigration and emigration may make it difficult to interpret observed changes in demographic characteristics. It is therefore important that a monitoring program is capable of generating estimates of migration rates. The most appropriate methods for this are capture-recapture analysis and genetic analysis of tissue samples.

Quantifying Uncertainty
In order to detect changes over time in demographic characteristics or health measures, it is essential to have reliable estimates of the uncertainty (precision and accuracy) associated with these measurements. Information on levels of uncertainty is also a fundamental component of PCoD models. From our review, we determined that information on precision is limited for many of the methods that we considered. The main exception was for the estimates of demographic characteristics that come from capture-recapture analyses. The use of covariates to improve the precision of capture-recapture analysis has been reviewed previously, as has the use of information from auxiliary studies to improve estimates of recapture probability (e.g., Hewitt et al., 2010). In photogrammetric analyses, the posture (or orientation) of animals, the turbidity of the water, the altitude and type of the camera, weather conditions and observer measurement error can all contribute to uncertainty in estimates. However, these sources of uncertainty can be quantified (Webster et al., 2010; Christiansen et al., 2018), and biases can be reduced with an appropriate data collection strategy (Koski et al., 2006, 2009, 2013).

Sampling Scale
The appropriate scale of sampling (both spatial and temporal) will depend on the variables and species of interest, the methods being used and the overall objectives of the monitoring program. In terms of the species of interest, the National Academies of Sciences Engineering and Medicine (2017) identified four species groups, based on their ranging behavior:
• Animals that can be sampled on land or ice;
• Accessible resident populations;
• Species that have large ranges but are accessible at certain times of year or during migrations;
• Open ocean species.
A wide range of monitoring methods can be used with populations that fall into the first three categories. However, as noted in the Results (and see Supplementary Tables S4 and S5), sampling open ocean species will always be logistically challenging.

Understanding Natural Variation
In order to understand observed changes in demographic characteristics or health variables that may be a consequence of disturbance, it is important to document natural variability under undisturbed conditions. Although body condition is a potentially useful measure of health, observed changes in body condition may be the result of a change in environmental quality, perhaps because the population is approaching carrying capacity and competition for resources is high (e.g., Estes et al., 2009), rather than a result of exposure to disturbance. Body condition also varies markedly between seasons and life history stages. For example, the DEB model developed by Hin et al. (2019) indicates that body condition (as estimated by the proportion of body mass available as an energy reserve) of undisturbed lactating pilot whales and their calves may fall to potentially life-threatening levels during the first year of lactation. It is therefore important, whenever possible, to identify the life history stage of each individual when estimating its body condition.
One way to calibrate changes observed during monitoring is to study a suitable reference population against which the focal population can be compared. For example, Claridge (2013) monitored demographic characteristics of Blainville's beaked whales on AUTEC in the Bahamas and at a study site at Abaco, approximately 150 km away, where sonar activity is rare. However, Benoit-Bird (2017) observed significant differences in prey distribution and abundance between the two sites. These make it difficult to unequivocally attribute observed differences in demographic characteristics between the two sites to the effects of disturbance (Moretti, 2019).
An alternative approach is to collect information on contextual variables, such as environmental quality and prey resources, at the focal study site and include these as covariates in subsequent analyses. This may involve site-specific prey resource mapping (e.g., Friedlaender et al., 2016), or the inclusion of information on larger scale environmental perturbations, such as the North Atlantic Oscillation (Drinkwater et al., 2003; Greene and Pershing, 2003) or El Niño Southern Oscillation (Tershy et al., 1991; Wilson et al., 2001), that are known to affect primary productivity.

Adding Value to Programs That Monitor Density or Abundance
The estimation of density or abundance is a standard component of most marine mammal monitoring programs, usually via visual, PAM or capture-recapture surveys (e.g., Wilson et al., 1999). As noted above, the statistical power of such time series data to detect change is often low, because of the uncertainty associated with the individual estimates. For example, Moretti (2019) calculated that, depending on the number of days of sampling each year, 25-30 years of photo-ID data would be required to detect an annual decline of 5% in the Blainville's beaked whale population at AUTEC. However, standard density and abundance surveys can be augmented with other methods that can provide data on additional demographic and health variables. For example, individual pigmentation patterns can be documented during visual and capture-recapture studies, and for some oceanic dolphin species this information can be used to distinguish between adults and juveniles (Perrin, 1970; Perrin et al., 1976; Herzing, 1997; Bertulli et al., 2016). Photogrammetric data, and biopsy, fecal and blow samples can also be collected during capture-recapture studies and used to provide information on health variables. Information on features such as the presence of rake marks and epidermal lesions, which may be useful for health assessment and individual identification, can be collected during visual and capture-recapture studies (Supplementary Table S4).
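The dependence of statistical power on the length of a time series of abundance estimates can be explored with a simple simulation. The sketch below is not the calculation performed by Moretti (2019): it assumes one abundance estimate per year with independent lognormal error, a hypothetical coefficient of variation of 0.5, a log-linear regression of abundance on year, and a one-sided test that uses a normal approximation to the test statistic (which will be somewhat liberal for short series). The function name and all parameter values are our own.

```python
import math
import random
import statistics


def trend_power(years, annual_decline, cv, n_sims=2000, z_crit=1.645, seed=1):
    """Fraction of simulated time series in which a decline is detected."""
    rng = random.Random(seed)
    sigma = math.sqrt(math.log(1 + cv ** 2))  # lognormal sd on the log scale
    log_rate = math.log(1 - annual_decline)   # true slope of log(abundance)
    t = list(range(years))
    t_mean = statistics.fmean(t)
    sxx = sum((x - t_mean) ** 2 for x in t)
    detected = 0
    for _ in range(n_sims):
        # Simulated log abundance estimates (relative to the starting value).
        y = [x * log_rate + rng.gauss(0, sigma) for x in t]
        y_mean = statistics.fmean(y)
        slope = sum((x - t_mean) * (v - y_mean) for x, v in zip(t, y)) / sxx
        resid = [v - y_mean - slope * (x - t_mean) for x, v in zip(t, y)]
        slope_se = math.sqrt(sum(r * r for r in resid) / (years - 2) / sxx)
        detected += slope / slope_se < -z_crit  # one-sided test for decline
    return detected / n_sims


# Hypothetical scenario: 5% annual decline, CV of 0.5 on each annual estimate.
power_short = trend_power(years=5, annual_decline=0.05, cv=0.5)
power_long = trend_power(years=25, annual_decline=0.05, cv=0.5)
print(f"power after 5 years: {power_short:.2f}")
print(f"power after 25 years: {power_long:.2f}")
```

Under these assumptions, power increases with the number of years of data and with the precision of the individual estimates, which is why imprecise annual estimates may require decades of monitoring before a decline can be detected.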

Informing PCoD Analysis
The key feature of the PCoD framework is that it links behavioral and physiological changes that occur as a response to disturbance to demography via a series of transfer functions. However, as noted above, obtaining the data required to parameterize these functions is challenging.
There is considerable scope for using monitoring data to inform the link between disturbance exposure and behavioral and physiological change. For example, Moretti et al. (2014) used PAM data, which were available almost continuously from a fixed hydrophone array, to parameterize a risk function relating changes in Blainville's beaked whale deep-diving behavior to received levels of sonar. Short-term telemetry studies have provided similar information (Harris et al., 2018), and it is expected that the incorporation of acoustic dosimeters into satellite tags will make it possible to quantify behavioral and physiological responses to acoustic disturbance over much longer time periods. Such telemetry devices could also provide information on the effects of behavioral and physiological changes on some health variables, particularly body condition (e.g., Costa et al., 2019).
There is less scope for monitoring programs to inform the link between vital rates and health variables, because of the time scales involved and the (often unobserved) contextual variables that complicate assessments. Nevertheless, programs that monitor both health variables and demographic characteristics can be used to infer the nature of this relationship. For example, Rolland et al. (2016) used data from a long-term photo-ID study of North Atlantic right whales to examine the relationship between a composite measure of individual health and demographic trends in the population.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

AUTHOR CONTRIBUTIONS
CB and JH designed and performed the research. All authors analyzed the data and workshop outputs, and wrote the manuscript.

FUNDING
This work was sponsored by the Office of Naval Research: Marine Mammal Biology Program, under award N000141612858.

ACKNOWLEDGMENTS
We are grateful to the workshop participants (see Supplementary Table S1