Overhauling Ocean Spatial Planning to Improve Marine Megafauna Conservation

Telemetry is a key, widely used tool to understand marine megafauna distribution, habitat use, behavior, and physiology; however, a critical question remains: “How many animals should be tracked to acquire meaningful data sets?” This question has wide-ranging implications including considerations of statistical power, animal ethics, logistics, and cost. While power analyses can inform sample sizes needed for statistical significance, they require some initial data inputs that are often unavailable. To inform the planning of telemetry and biologging studies of marine megafauna where few or no data are available or where resources are limited, we reviewed the types of information that have been obtained in previously published studies using different sample sizes. We considered sample sizes from one to >100 individuals and synthesized empirical findings, detailing the information that can be gathered with increasing sample sizes. We complement this review with simulations, using real data, to show the impact of sample size when trying to address various research questions in movement ecology of marine megafauna. We also highlight the value of collaborative, synthetic studies to enhance sample sizes and broaden the range, scale, and scope of questions that can be answered.


INTRODUCTION
Tracking studies of marine animals have proliferated in recent years as a new generation of miniaturized, cost-effective, and reliable telemetry tags are deployed on an ever-increasing array of species (Fig. 1; Evans et al. 2013). These technological advances have led to a dramatic increase in the use of the Argos and GPS satellite systems to track highly migratory marine vertebrates, including the large-bodied marine megafauna that surface to breathe or spend time on land, enabling transmitters to communicate with satellites. Simultaneously, an expansion of acoustic telemetry networks to track gill-breathing animals that remain submerged, like tunas or sharks, has occurred. These advances also have been coupled with an increasing diversity in the sensors available on tags and other biologging devices (Hebblewhite and Haydon 2010, Hussey et al. 2015, Kays et al. 2015). Combined with advanced analytical techniques, these technological innovations have transformed our knowledge of movement patterns, behavior, habitat use, and ecophysiology of animals, with movement data driving a series of positive conservation outcomes across multiple taxa, such as the creation of marine protected areas and other conservation zones (Hays 2019). However, the deployment of tags can involve procedures that stress the target animals (Wilson and McMahon 2006), including capture and restraint, anesthesia, chemical immobilization, and surgery (Harcourt et al. 2010). Further, costs and logistics associated with purchase and deployment of tags are considerable.
For example, satellite-linked Argos tags cost several thousand US$ per unit with ongoing operational costs for satellite time. Consideration of these various elements leads to a fundamental but complex question: what is the minimum number of animals that should be tagged and tracked for a given study to deliver sufficient data to address the research aims (McMahon 2006, Hays et al. 2016), while ensuring the number of animals tagged complies with ethical pillars of the three R's of Reduce, Replace, and Refine (Russell and Burch 1959)?
There are well-established metrics to determine sample size and provide the statistical power necessary to draw probability-based conclusions from data sets (Green 1989, Johnson et al. 2015). Hence, answering the question of how many tags to deploy in any given study would seem straightforward. However, power analyses require some initial data or knowledge of the expected movements of animals, such as on the variance of the behavior being studied, or on the movement range. This information is often not available because researchers are studying new species or working in new areas. Furthermore, an important caveat to the application of power analyses is that many of these studies are still in the "discovery" phase and the most interesting or relevant questions or observations are still unknown.
FIG. 1. Across a broad range of species and habitats, electronic tags are used to assess patterns of animal movement. Across studies, a prevailing question is "how many animals need to be tagged?" To illustrate the breadth of tracking studies, this figure shows (A) an ocean sunfish (Mola mola) fitted with a satellite tag, (B) a jellyfish (Rhizostoma octopus) equipped with a time-depth recorder, (C) a hatchling green turtle (Chelonia mydas) equipped with a miniature acoustic tag, (D) a juvenile loggerhead sea turtle (Caretta caretta) equipped with an Argos satellite tag, (E) a ruddy turnstone (Arenaria interpres) equipped with a light-based geolocator tag on its leg, and (F) a harbor seal (Phoca vitulina) with a "mobile phone tag" that relays Fastloc-GPS locations via the mobile phone network. In each panel the scale bar is 10 cm. Photographs courtesy of Graeme Hays, Gower Coast Adventures, Joan Costa, George Balazs, Erik Kleyheeg, and Paul Thompson.
Although it is always prudent to undertake power analyses when possible, here, we take a complementary approach to assist the planning of telemetry and biologging studies of marine megafauna where little or no prior data are available. We focus on marine megafauna and satellite tracking, given the growth of this area, but some of our conclusions are relevant to other biologging approaches. For example, data storage tags that measure parameters such as diving and body acceleration are widely deployed on marine megafauna, and the increasing use of acoustic arrays, often in networks spanning thousands of kilometers, means that acoustic tags are also widely used within this group, including smaller life stages of some taxa, such as hatchling sea turtles (Thums et al. 2013). We do not focus on smaller-bodied, commercial species because there are complexities associated with sample size for this size range that need separate consideration, for example their common fine-scale stock structure (Righton et al. 2007). Here, we review the types of information that have been obtained by studies with different sample sizes of marine megafauna (Fig. 2). In doing so, we provide guidance for researchers embarking on tracking studies of marine megafauna by summarizing what has been achieved with sample sizes from one to well over 100 individuals. We provide examples of simulation exercises that can be used to estimate the sample size needed to address specific questions. We show evidence that significant advances can be made with small sample sizes while highlighting the benefits obtained from employing greater sample sizes, and supplement this review with simulations from real data to illustrate how the ability to answer specific research questions changes with the sample size of tracked individuals.
We illustrate this by showing how different sample sizes are needed when addressing different questions of interest for the same taxa (using turtles as an example), and also when addressing the same question (using home range or utilization area as an example) for multiple taxa (sharks, seabirds, seals). We also highlight the value of data sharing and showcase some of the seminal discoveries made by combining data across studies to reach very large sample sizes.
The value of different sample sizes is best exemplified in work from individuals who pioneered tagging on the same system or species, necessarily starting with small numbers of tags before attaining larger sample sizes that altered the scope of their work allowing new questions to be addressed. For example, a thread of work tracking leatherback turtles in the Atlantic began with n = 3 (Hays et al. 2004a), progressed to n = 21 (Fossette et al. 2010), then n = 106 (Fossette et al. 2014) to recently become part of a study involving >2,500 tracked marine animals across multiple species (Sequeira et al. 2018). At each iteration, the questions that were addressed changed, and this increasing capability is reflected in the synthesis presented here.

DARE TO DREAM (SAMPLE SIZE OF ONE)
Many researchers assert that tracking studies with sample sizes of one are of no value, but the history of animal tracking includes many startling discoveries made from tracking one individual. While, statistically, a sample size of one is expected to capture a "normal" or common trajectory, the value of such studies actually lies in their ability to show that certain feats are possible. Examples of extraordinary feats detected in single animal studies include the journey of >1,000 km by a leatherback turtle (Dermochelys coriacea) tagged off South Africa (Hughes et al. 1998), and the discovery that white sharks (Carcharodon carcharias) can last more than a month on a single large meal (Carey et al. 1982, but see Semmens et al. 2013). Additionally, despite multiple tags having been used in another white shark study, it was the track from a single white shark traversing an entire ocean basin while performing deep dives to nearly 1,000 m (combined with photo ID data) that was central to the discovery that these sharks are not coastal obligates (Bonfil et al. 2005). A single tagged sea turtle was also found to routinely conduct sequences of dives each 6-8 h followed by short interdive surface intervals suggesting operation within its aerobic dive limit and fundamentally altering the expectations of the ecophysiological capacity for this species (Hochscheid et al. 2005). Individual tracks can also provide significant information with conservation implications. For example, the track of a single grey whale (Eschrichtius robustus) tagged within the feeding grounds of the critically endangered western stock off Sakhalin Island, Russia, and migrating to the breeding lagoons of the eastern stock in Baja California, Mexico, questioned whether these two stocks were indeed distinct (Mate et al. 2015). This individual whale also broke the world record (previously held by a humpback whale) for the longest known mammalian migration at 22,511 km.
Data from one individual can also reveal aspects of behavior linked to physical abilities and, if sampled at very high frequency, they can provide high-resolution movement information. For example, flipper sensors attached to a turtle revealed how swimming effort was linked to depth-dependent, air-mediated buoyancy and swim angle (Hays et al. 2004c). Finally, and importantly, a sample size of one may provide critical proof of concept for novel equipment or attachment procedures, providing a starting point for follow-up studies. For example, one of the first animals tracked by Argos satellite-linked tags was a plankton-feeding basking shark (Cetorhinus maximus) that oriented along thermal fronts for 17 d (Priede 1984). The species was studied further with increasing numbers of tags providing insight into other ecologically relevant questions (Southall et al. 2006).
As animal-borne tags are increasingly used to obtain data on the environment, single tags can also provide highly valuable data that would be difficult to obtain with any other observing system. For example, the use of a CTD (conductivity, temperature, and depth) tag on a single southern elephant seal (Mirounga leonina) provided an 8-month hydrographic profile that allowed an assessment of the seasonal evolution of the upper ocean (Meredith et al. 2011). Similarly, a CTD-tagged Weddell seal (Leptonychotes weddellii) provided some of the first data on the wintertime conditions over the Weddell Sea continental shelf (Nicholls et al. 2008). Indeed, marine mammals, and particularly seals, now provide the bulk of the physical oceanographic observations in the polar regions and are a central component of the global ocean observing system (Treasure et al. 2017). Despite the common perception of the limited value of a sample size of one, the examples above show evidence that even a single tag can provide ground-breaking information allowing insights into population- and species-level ecology and guiding future studies.

UNDERSTANDING VARIABILITY (SAMPLE SIZES UP TO 10)
As sample sizes increase, so too does the probability that tags will reveal individual variability in the behavior being observed. Statements based on such data can move from possible limits of animal performance to plausible and ecologically valuable metrics for the species, such as diving behavior, home ranges, and foraging areas. Variations in individual foraging patterns have been observed with surprisingly small sample sizes. For example, three distinct foraging patterns were detected in data derived from nine Galapagos sea lions (Zalophus wollebaeki; Villegas-Amtmann et al. 2008), which were, in subsequent studies, correlated with differences in the physiological capability of these animals (Villegas-Amtmann and Costa 2010). Sample sizes of only a few individuals may also be immensely valuable when high-resolution temporal data are available. This is the case for diving data of marine vertebrates downloaded from the archive of recovered tags that are equipped with pressure sensors (e.g., SPLASH tags, pop-off satellite-linked archival transmitters [PSAT], and dive loggers), which allow for greater insight into the environmental and physiological drivers of movement patterns (Deutsch et al. 2003, Meekan et al. 2015). This type of high-resolution temporal data is more easily collected for animals that return to areas that are predictable in space and time (e.g., breeding areas) and thus facilitate tag recovery. This is because the data that are transmitted to satellite are binned summaries only and the detailed patterns of vertical movements are only available in the tag archives. So, for animals that do not return to breeding or over-wintering sites, such as whale sharks (Rhincodon typus), the detailed patterns of vertical movements can only be obtained when detached tags are recovered by chance (e.g., when these sharks wash up on beaches).
Such limitations to data acquisition, in addition to problems with tag failure and loss, need to be factored into the initial sample size of tags. Therefore, information on the expected return of data from all animals tagged is important when writing ethics approvals, to estimate the cost of the project and to define the research scope.
Although larger sample sizes typically are recommended for many ecological questions, a sample size of up to 10 individuals may be immensely valuable for some applications. For example, when testing and developing new methods or technologies, deploying >10 tags may lead to potentially unforeseen negative impacts on animals and waste financial resources. A sample size ≤10 may also be appropriate when studying critically endangered species. Indeed, in such cases, the limit of ≤10 might be enforced by permitting agencies. When generating hypotheses about unknown phenomena, a sample size of up to 10 tags could also be a good starting point, allowing this exploration phase to dictate if the phenomenon is worth exploring further. Also, a sample size of ≤10 may be appropriate for species or questions that are difficult to study, such as following social groups on long migrations, or where high logistics costs for deployment may limit funds available for tags, as is the case, for instance, for killer whales (Orcinus orca; Durban and Pitman 2012).
FIG. 2. Examples of tracking studies using various sample sizes to understand different animal movement and behaviors. (A) Track of a great white shark showing a transoceanic migration from South Africa to northwestern Australia; color indicates average sea surface temperature ranging from −7.6°C (indigo) to 32.5°C (red) (adapted from Bonfil et al. 2005). (B) Track from a leatherback turtle revealing that the species was able to travel thousands of kilometers (adapted from Hughes et al. 1998). (C) Tracked movement of eight green turtles equipped off Diego Garcia, Chagos, used to evaluate effectiveness of marine protected areas in the region (adapted from Hays et al. 2014b). (D) Movements of grey reef sharks in the Great Barrier Reef, Australia, showing site fidelity to single reefs; white circles: acoustic receiver locations; brown polygon: buffer zone; white, teal, yellow, pink and red polygons: different management zones (adapted from Heupel et al. 2010). (E) Comparison of vertical movement patterns of basking sharks, bigeye tuna, Atlantic cod, leatherback turtles, Magellanic Penguins, and juvenile basking sharks across taxa showing Lévy-like scaling laws (plots shown on log10 scale; adapted from Sims et al. 2008).
An early example of the value of relatively small sample sizes is a satellite tagging study of six Wandering Albatrosses (Diomedea exulans), which revealed individuals travelling thousands of kilometers in a single foraging trip during an incubation shift in the southwestern Indian Ocean (Jouventin and Weimerskirch 1990). Although such a small data set might not provide sufficient precision to determine preferred foraging areas, the consistency of the distances covered provoked a fundamental shift in how researchers thought about habitat use by these birds. Similarly, for 50 yr, basking sharks were thought to hibernate in deep waters of the North Atlantic Ocean during winter until satellite tracking of five individuals showed that they exhibit extensive horizontal and vertical movements at this time.
As sample size increases, variability in space use can be defined in more detail. For example, while a study of nine leatherback turtles in the Atlantic Ocean revealed individuals all moving in disparate directions (Hays et al. 2004b), the majority of nine grey reef sharks (Carcharhinus amblyrhynchos) in the Great Barrier Reef, Australia, showed fidelity to a single reef, while one individual undertook a 134-km movement across the deep open ocean (Heupel et al. 2010). Inclusion of nine individuals in the latter study indicated that although large movements were present, they were not representative. In contrast, the former study indicated that a larger sample size is required to fully understand patterns in movement for leatherback turtles.
When a study species is rare or endangered, small sample sizes are unavoidable, but their value is amplified because they may represent a larger proportion of the population (McMahon and Hays 2006). However, the scope of questions that can be addressed for such species through tracking and biologging is likely to be constrained by low sample sizes. Sample sizes of approximately 10 tagged individuals have been useful in identifying responses to environmental variation, and possible drivers of movement of some species. For example, the diving behavior of 10 satellite-tracked female Antarctic fur seals (Arctocephalus gazella) highlighted their differential use of oceanographic features (Lea and Dubroca 2003). Insights into size- or sex-based differences in behavior can also become evident. For example, active acoustic tracking of only two male and two female benthic catsharks during a 14-d period (Sims et al. 2001) suggested sexual segregation by habitat. This result stimulated further studies that revealed the mechanisms underlying these sex differences in behavioral patterns (Wearmouth et al. 2012).
To further exemplify how small sample sizes can reveal sex-based differences in marine megafauna, we used a simulation exercise to explore how differences in breeding periodicity between male and female turtles can be detected with even small increments in sample size. Understanding these differences in breeding periodicity is important because turtles have temperature-dependent sex determination, and rising incubation temperatures due to climate change will likely produce sex ratios increasingly skewed toward females. For our simulation, we used information published in a recent study for loggerhead turtles (Caretta caretta) in the Mediterranean, where males tend to return to breed after 1 yr with a probability of 0.76 (p_male = 0.76), whereas females returned after longer intervals (i.e., the probability of returning to breed after 1 yr was p_female = 0.00; Hays et al. 2014a). These probabilities were originally based on tracks from 25 individuals (17 males and 8 females), but we use them here to show the likelihood of the same biological conclusion being reached with smaller sample sizes. Using these probabilities, and assuming equal numbers of tracked males and females, we ran 1,000 simulations for sample sizes ranging from one to eight female and male individuals, randomly selecting the number of males and females that would be recorded to return after 1 yr. When only three males and three females were tracked, the probability of recording a significant difference in numbers returning was only 0.331, but this rose to 0.983 when eight females and eight males were tracked (Fig. 3A). This simulation exercise illustrates how researchers can use available data to optimize the number of deployments they need to address their question of interest.
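A minimal Python sketch of the kind of simulation described above may help readers adapt it to their own system. The text does not state which significance test was applied, so the two-sided Fisher exact test used here is an assumption, as are the function names `fisher_exact_two_sided` and `detection_probability` and the random seed. Only the return probabilities (0.76 and 0.00), the 1,000 replicates, and the P < 0.05 threshold come from the text. Because the test is assumed, the output is not expected to reproduce the probabilities reported above (with this particular test, three animals per sex can never reach P < 0.05).

```python
import random
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "returns" in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def detection_probability(n_per_sex, p_male=0.76, p_female=0.0,
                          n_sims=1000, alpha=0.05, seed=1):
    """Fraction of simulated studies in which the sex difference is significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Number of tracked males and females recorded to return after 1 yr.
        males = sum(rng.random() < p_male for _ in range(n_per_sex))
        females = sum(rng.random() < p_female for _ in range(n_per_sex))
        p = fisher_exact_two_sided(males, n_per_sex - males,
                                   females, n_per_sex - females)
        if p < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, detection_probability(n))
```

Swapping in a different test only requires replacing `fisher_exact_two_sided`; the surrounding loop is independent of how significance is assessed.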

DEFINING THE NORM (SAMPLE SIZES OF A FEW 10S UP TO 100)
A better assessment of overall patterns of movement or behavior at the population scale may be possible after tens of individuals of the same species have been tagged. While specifying the sample size needed for these types of studies is challenging, simulation exercises can be useful as exploratory tools to understand how much data are needed. Using another simulation exercise, we illustrate how confidence in observed results can be improved by sample sizes increasing from <10 to a few tens of tags (Fig. 3B). As an example of a study question, we focused here on what is the clutch frequency of turtles, i.e., the frequency with which eggs are laid within and among seasons, which is a critical life-history trait for quantifying population trends of turtles. The number of nesting females in a population is typically determined by counting tracks on beaches associated with nesting and then dividing by a nominal mean frequency of clutches. A recent study that tracked 10 green turtles (Chelonia mydas) in Diego Garcia, Indian Ocean, showed that their mean clutch frequency was six (Esteban et al. 2017), and led to the understanding that the population at this locality was about one-half the size of that estimated from previous studies that patrolled beaches on foot to intercept females when they nested. Using the probabilities obtained in Esteban et al. (2017), we can simulate how the confidence limits on estimates of mean clutch frequency change with sample size. For each sample size (3-40), we ran 1,000 simulations and then determined the standard deviation (SD) of the estimate for mean clutch frequency, which reflects the variation in the estimate of mean clutch frequency that might be recorded with that sample size (Fig. 3B). 
When the sample size was three, the SD was ~1.20 (i.e., the 95% confidence limit on the estimate of mean clutch frequency that might have been derived was ±3 clutches), but when the sample size was increased to 30, the SD reduced to 0.38, and to 0.34 when the sample size was 40 individuals (i.e., 95% confidence limit = ±0.11 clutches). Examples of improvement on previous results through increased sample sizes are also found in published literature. For example, assessment of the diving behavior of 13 female northern elephant seals showed maximum dive durations of 106 min (Le Boeuf et al. 2000) and was confirmed as a good approximation in a later study with a sample of 211 females aimed at identifying drivers of their large-scale distribution and interannual variability in foraging and breeding success (Robinson et al. 2012). Despite the different focus of these two studies, the later data confirmed that the earlier study had a large enough sample size to provide a general understanding of the dive behavior of the species.
FIG. 3. Simulation examples to understand the effects of sample size when addressing different key questions for the same guild (i.e., turtles). (A) Probability of finding differences in breeding periodicity of loggerhead turtles by simulating the number of males and females that would be recorded to return after 1 yr and then testing if there is a significant difference (P < 0.05) in the numbers of returning males and females for increasing sample sizes up to 10 individuals. (B) Standard deviation of the estimate for mean clutch frequency for green turtles reflecting the variation that might be recorded for the mean estimate with different sample sizes. (C) Percentage of individuals perceived to travel to locations 1 (blue), 2 (green), and 3 (red) shown in the schematic representation displayed in the center of the figure as the number of tags deployed increases from 5 to 40. The central scheme depicts movement dispersion and probabilities of detection of dispersion to different locations and detection of a rare event, with arrow width proportional to probability of dispersion from the tagging location X to each of the locations 1 (blue), 2 (green), and 3 (red) (0.65, 0.30, and 0.05) for a population of 100 individuals. (D) Percentage of the population expected to travel to each of the locations 1, 2, and 3 depicted in the central scheme showing a decrease in the confidence intervals as the number of tags increases. (E) Representation of the confidence intervals for detection of possible rare events such as colonization of a new site.
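The clutch-frequency exercise described above can be sketched in Python as follows. The clutch counts in `HYPOTHETICAL_CLUTCHES` are invented for illustration (they are chosen only so that their mean is six) and are not the data of Esteban et al. (2017), which are not reproduced in the text. Resampling with replacement from the observed values is likewise an assumption about how each simulated study was drawn, so the SD values produced here will differ from those reported above.

```python
import random
import statistics

# Hypothetical clutch counts for 10 tracked females (mean = 6).
# These are illustrative values, NOT the published data.
HYPOTHETICAL_CLUTCHES = [3, 4, 5, 5, 6, 6, 7, 7, 8, 9]


def sd_of_mean_estimate(observed, sample_size, n_sims=1000, seed=1):
    """SD of the mean clutch frequency across simulated studies of a given size."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_sims):
        # One simulated study: draw `sample_size` females with replacement.
        simulated = rng.choices(observed, k=sample_size)
        means.append(statistics.mean(simulated))
    return statistics.stdev(means)


if __name__ == "__main__":
    for n in (3, 10, 20, 30, 40):
        print(n, round(sd_of_mean_estimate(HYPOTHETICAL_CLUTCHES, n), 3))
```

Under this resampling scheme, the SD of the estimate is expected to shrink roughly in proportion to one over the square root of the sample size, which is why gains in precision flatten out at larger sample sizes.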
Commonly, tagging studies aim to quantify space use and identify important utilization areas (e.g., 50% kernel densities). Such estimates are highly sensitive to sample size due to variability in movement among individuals, as shown by Gutowsky et al. (2015) with albatrosses. That study demonstrated that the sensitivity of group-level space-use estimates stabilizes with increasing sample size of albatrosses, in that the areas covered by space-use estimates generated from data sets comprising different individuals roughly approached an asymptote in median area estimates around a mean sample size of 17-21 individuals. However, the range of estimates remained large, with the 95% and 50% contour area estimates varying by 7.2 and 1 million km², respectively. For other seabirds, like European Shags (Phalacrocorax aristotelis) and Black-legged Kittiwakes (Rissa tridactyla), sample sizes of 39 and 83 have been used, respectively, to estimate space use (Soanes et al. 2013). Estimates of area utilization are also highly dependent on the animal's range and the context of habitat utilization. For example, a sample size of 30 was sufficient for calculating the area used by flatback turtles (Natator depressus) during the nesting season but not for calculating the typically larger area used post breeding (Thums et al. 2018b).
To demonstrate the effect of sample size on utilization area and kernel estimates for a range of species, we used a resampling approach to test whether an asymptotic relationship between sample size and monthly utilization area estimates was attained. We did this for probability contours of 50% and 25% (typically considered of relevance to marine spatial planning) using tracking data from six different species in the Pacific Ocean (results detailed in Fig. 4). Together, these studies demonstrate the power and limitations of a moderate number of tags to improve our understanding of animal movements. Another example showing how an increasing number of tracks can assist our understanding of animal movement was a study tracking 75 loggerhead turtles across the Mediterranean finding that they exhibit disparate dispersal patterns. The study highlighted that extending protected areas to include 10 of the core sites used by loggerhead turtles would result in better protection for 64% of the population (Schofield et al. 2013).
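The logic of a resampling approach like the one mentioned above can be sketched in Python. This is a deliberately simplified stand-in: instead of kernel density contours, it measures the area used as the number of occupied grid cells, and the function name `area_accumulation`, the input format (a dictionary mapping each individual to the set of grid cells it visited), and the toy data are all assumptions made for illustration. It shows only how repeatedly drawing subsets of individuals yields a mean and spread of area estimates for each sample size, from which an asymptote can be judged.

```python
import random
import statistics


def area_accumulation(cells_by_individual, cell_area_km2, n_draws=200, seed=0):
    """Mean and SD of pooled area used for every possible sample size.

    cells_by_individual: dict mapping an individual ID to the set of grid
    cells (any hashable label) that the individual occupied.
    Returns a list of (sample_size, mean_area, sd_area) tuples.
    """
    rng = random.Random(seed)
    ids = list(cells_by_individual)
    curve = []
    for n in range(1, len(ids) + 1):
        areas = []
        for _ in range(n_draws):
            # Draw n individuals without replacement and pool their cells.
            chosen = rng.sample(ids, n)
            occupied = set().union(*(cells_by_individual[i] for i in chosen))
            areas.append(len(occupied) * cell_area_km2)
        curve.append((n, statistics.mean(areas), statistics.pstdev(areas)))
    return curve


if __name__ == "__main__":
    # Toy data: three hypothetical animals on a grid of 100 km^2 cells.
    toy_tracks = {
        "a": {(0, 0), (0, 1)},
        "b": {(0, 1), (1, 1)},
        "c": {(0, 0), (1, 1)},
    }
    for n, mean_area, sd_area in area_accumulation(toy_tracks, 100.0):
        print(n, mean_area, sd_area)
```

A curve whose mean stops increasing and whose SD shrinks as individuals are added suggests that most of the variability among individuals has been captured; a curve that is still rising at the largest sample size suggests it has not.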
To depict the effect of sample size on our understanding of dispersal of individuals from a population, we used a simulation of a hypothetical population of 100 individuals in location X where tagging took place, and then assumed that individuals had probabilities of 0.65, 0.30, and 0.05 of going to location 1, 2, and 3, respectively (Fig. 3C). Increasing the number of randomly tagged individuals from 5 to 50, and repeating this procedure 10,000 times, showed that accurate detection of movements to location 3 was only possible at the higher number of tags (n ~ 40). Moreover, precision around the percentage of the population travelling to each location increased with increasing numbers of tag deployments. In our example, 95% confidence intervals for the percentage of the population travelling to location 1 narrowed from 61.0-69.1% to 64.1-66.0% as sample size increased from 1 to 40 tags, with similar reductions obtained for the other locations. As we have demonstrated, power analysis needs some understanding of the system to allow the model to be parameterized and can be used to assess if there is further information likely to be obtained by tagging more individuals. However, it is important to highlight that simulation results only provide an idea of how many representative tracks are needed and do not consider the excess tags needed to account for potential problems with data acquisition, such as early tag failure or loss prior to exhaustion of battery, as mentioned earlier. So, our interpretation of the results presented above is that little further detail would be gained after obtaining more than 40 representative tracks to answer a specific question about dispersal patterns. However, new and different questions may emerge to justify further tag deployments.
Examples would include the need to assess inter-annual variability in movements or to address tagging sampling design to adjust not only for sample size but also sex ratio of animals tagged, size range, or range of capture and release sites.
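The dispersal simulation described above can be sketched in Python along the following lines. Several details are assumptions: the population of 100 is split into exactly 65, 30, and 5 individuals bound for locations 1, 2, and 3; tagged animals are drawn without replacement; and the interval returned is the central 95% of the simulated per-study estimates. The function name `simulate_dispersal` is also hypothetical. Because the way the published intervals were computed is not described in the text, the numbers produced here should not be expected to match those reported above.

```python
import random


def simulate_dispersal(n_tags, n_reps=10000, seed=1):
    """Simulate tagging n_tags animals from a population of 100.

    Returns (probability that at least one tagged animal goes to the rare
    location 3, central 95% range of the estimated percentage going to
    location 1).
    """
    rng = random.Random(seed)
    # Assumed fixed allocation reflecting probabilities 0.65, 0.30, and 0.05.
    population = [1] * 65 + [2] * 30 + [3] * 5
    detected_rare = 0
    pct_loc1 = []
    for _ in range(n_reps):
        tagged = rng.sample(population, n_tags)  # tag without replacement
        if 3 in tagged:
            detected_rare += 1
        pct_loc1.append(100 * tagged.count(1) / n_tags)
    pct_loc1.sort()
    lower = pct_loc1[int(0.025 * n_reps)]
    upper = pct_loc1[int(0.975 * n_reps) - 1]
    return detected_rare / n_reps, (lower, upper)


if __name__ == "__main__":
    for n in (5, 10, 20, 30, 40):
        p_rare, interval = simulate_dispersal(n)
        print(n, round(p_rare, 3), interval)
```

Extra tags to cover expected failures or losses are not modelled here; in practice the number of deployments would need to exceed the number of representative tracks suggested by such a simulation.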
As sample size increases, improved evaluation of the use of marine protected areas (MPAs) also becomes possible. Although the following studies provide only examples of detected patterns for the sample size used, what is crucial here is that having a large enough sample size across different seasons, sites or stages (e.g., breeding vs. non-breeding) allows detection of gradients across other variables of interest including environmental variables for habitat use detection. For example, acoustic tagging of 57 sharks showed that only one-half of the available protected space was used while sharks made excursions in and out of MPAs at consistent locations along the boundaries (Knip et al. 2012). Deployment of multiple tens of tags (simultaneous or staggered in time) can, therefore, provide insight into the scale of spatiotemporal movements to assist tailoring MPA design for improved effectiveness. Similarly, tens of tags can assist the assessment of movement variability driven by changes in environmental conditions. For example, behavioral changes by 32 fur seals were associated with strong El Niño conditions (Lea et al. 2006), movement of 40 bonnethead sharks (Sphyrna tiburo) changed in association with decreased salinity due to freshwater discharge (Ubeda et al. 2009), and foraging success of 50 Little penguins (Eudyptula minor) was shown to relate to boundary current anomalies in different years (Carroll et al. 2016). Detection of philopatry in highly migratory species has also been possible when using a sample size of tens of tags. Jorgensen et al. (2010) showed high philopatry in the migratory behavior of white sharks based on the results from 68 satellite-linked tags and revealed a predictive migratory cycle within the same network of coastal hotspots for a genetically distinct population. The larger sample sizes used in these examples enabled researchers to claim that their results were representative of the wider population of these species.
FIG. 4. ... 1-114 individuals). Plots show means and standard deviation of home range area, with mean estimates initially increasing as a function of the number of individuals tracked (the home range area of one individual is likely much smaller than the utilization distribution of 10 individuals). Once most of the variability in the population is captured, the estimate of space use of the population stabilizes, resulting in an asymptote in the plot. Estimates of home range size approached an asymptote for Northern elephant seals and salmon sharks (species with data sets of 57-108 individuals) at sample sizes of 20-40 individuals in most months at all contour levels. In contrast, for estimates calculated from sample sizes between 10 and 30 individuals for (A) Black-footed Albatross, Laysan Albatross, Sooty Shearwater, and Pacific bluefin tuna that were recorded to undertake their trans-Pacific migration, and (B) white sharks from June through September, the sample size was insufficient to observe an asymptote in estimates of utilization area (especially at the largest probability contours that would capture rare events). There were also large confidence intervals around the area estimates for these species' data sets, implying that larger data sets were needed to increase the precision and accuracy of the estimates.
Although an individual study might include only a few tags, sample sizes in the 10s (and greater) can be obtained by pooling data across studies, allowing researchers to pose new questions and search for general patterns. For example, the compilation of eight studies with low individual sample sizes (1-13 summing to 50 tags) across the Mediterranean Sea and the Pacific, Atlantic, and Indian Oceans confirmed previous concerns of high sea turtle mortalities by fisheries (Hays et al. 2003). The same applies to multispecies studies, where even low sample sizes for individuals of different species pooled together allow some level of interspecies comparisons. For example, informed comparison of vertical movement patterns and their statistical properties across taxa was obtained with data from 31 individuals from seven species (Sims et al. 2008). While the sample size of the latter study was relatively small, the high resolution of the diving data contained in the tracks, which included over one million data points, allowed for a comparative multispecies analysis.

DEFINING POPULATION PARAMETERS (SAMPLE SIZES ~100)
With the implicit assumption that each tag results in an appropriate amount of data (e.g., number of locations and enough resolution), improved accuracy in our understanding of patterns (e.g., space use) can be obtained using a larger number of tagged animals (see examples of northern elephant seals and salmon sharks, Lamna ditropis, in Fig. 4A). As sample sizes approach 100, it becomes possible to assess movement behavior between populations of the same species and across large areas. For example, 101 tracks of leatherback turtles were used to define areas of high susceptibility to by-catch across the Atlantic Ocean (Fossette et al. 2014). In this example, a large sample size was necessary to encompass a range of different nesting populations, all of which foraged within the Atlantic. Likewise, Breed et al. (2006) investigated segregation of seasonal foraging habitats of grey seals from 95 tagged individuals. In cases where sex or age leads to segregated behavior, the number of tags needed to detect specific patterns of movement and to identify potential behavioral mechanisms will necessarily increase, and more so if a comparison across populations is to be completed. As the spatial scale under consideration increases, so too does the minimum number of tags, until even sample sizes of 100 may be insufficient. For example, when Sequeira et al. (2013) compiled all publicly available tracking data for whale sharks, they found that the existing ~100 tracks (average 90 d deployment with a range from hours to >3 yr) were insufficient to reveal global migration patterns.
Assessment of animal health and increasing anthropogenic impacts on movement is also highly relevant and urgently sought for many species. For example, data from 136 West Indian manatees were used to assess rehabilitation success following release (Adimey et al. 2016). However, the large sample sizes needed for assessing effects at the species level are not commonly available (but see Fossette et al. 2014), and pooling data across species of the same guild might provide the means to obtain relevant information. This was the case for a data set of 113 oceanic sharks examined to detect spatial overlap with commercial fisheries. This data set comprised tracks from six species (average of 17 tags per species) and led to the revelation that shark hotspots in the North Atlantic Ocean may be at risk from overfishing (Queiroz et al. 2016). Similarly, passive acoustic tracking of 116 reef sharks of five species (average of 17 tags per species) together with 25 hawksbill turtles (Eretmochelys imbricata) determined the long-term, fine-scale space use inside and outside a marine protected area (MPA) for each species. This study also revealed that a modest increase in MPA size could lead to a 34% increase in spatial coverage of these predators' movements (Lea et al. 2016).

MOVING TOWARD BIG DATA ANALYSIS (VERY LARGE SAMPLE SIZES; ≫100)
Common areas of space use at large spatial scales can be revealed using a large number of tagged individuals (≫100). For example, Wakefield et al. (2013) used tracking data of 184 Northern Gannets from different breeding areas to assess the levels of foraging area overlap around the British Isles. A much larger tracking data set of 287 individual elephant seals led to an improved understanding of how these seals utilize the circumpolar habitat in the Southern Ocean (Hindell et al. 2016). Large data sets also allow application of big data approaches, which are scalable to very large numbers of tracks (e.g., as used in human mobility studies). A recent example of the application of such approaches to tracking data of 272 southern elephant seals showed that, despite idiosyncrasies in movement, a clear signature of directed movement emerged, highlighting the presence of intrinsic drivers of movement such as memory (Rodriguez et al. 2017). In addition, sample sizes in the hundreds can reveal correlated or coordinated movement patterns among individuals. An example is the coherent movement patterns suggested by the sonification of movement (i.e., the generation of sound based on the movement patterns in the tracking data) of over 300 northern elephant seals tagged over ~10 yr in the Northeast Pacific Ocean (Duarte et al. 2018). These studies show that the use of techniques that can deal with big data (Leek et al. 2017) might bring new insights to movement ecology.
Very large sample sizes of single species can also be useful to increase the probability of defining events not commonly detected using tags, such as colonization of a new site or mortality (Hays et al. 2003). To illustrate this point, we extended the simulation exercise presented above to consider how many tags would be needed to detect a rare event with a probability of 0.001 and showed that hundreds of tags would be required (Fig. 3E).
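The order of magnitude quoted above can be checked with a simple calculation. The sketch below is not the simulation behind Fig. 3E; it assumes, for illustration only, that each tag independently records the rare event with the same probability p, so that the chance of at least one detection among n tags is 1 - (1 - p)^n. The function names are hypothetical.

```python
import math


def detection_probability(p_event, n_tags):
    """Chance that at least one of n_tags records the event.

    Assumes every tag independently has the same probability p_event
    of recording the event (an illustrative simplification).
    """
    return 1.0 - (1.0 - p_event) ** n_tags


def tags_needed(p_event, target):
    """Smallest number of tags whose detection probability reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log1p(-p_event))


# Rare event with per-tag probability 0.001, as in the text.
for target in (0.5, 0.8, 0.95):
    n = tags_needed(0.001, target)
    print(f"target {target:.2f}: {n} tags "
          f"(detection probability {detection_probability(0.001, n):.3f})")
```

Under these assumptions, an even chance of observing the event at least once already requires 693 tags, which is consistent with the statement that hundreds of tags would be needed; higher detection probabilities push the requirement into the thousands.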
For multiple species, the quantity of information returned climbs dramatically as sample size increases to many hundreds, particularly for assessing movement patterns in response to resource fields within the same geographical extent. For example, in East Antarctica, a compilation of 268 satellite tracks for six top predators including penguins, albatrosses, and seals revealed areas of particular ecological significance for these multiple species (Raymond et al. 2015). Maxwell et al. (2013) used tracks from 685 individuals of eight species in the North Pacific to show high variability in the distribution of cumulative impacts across species and highlight that effective spatial management will need to account for trade-offs among stressors. These individuals had been tagged as part of the Tagging of Pacific Predators (TOPP) project, a much larger collaborative effort under the Census of Marine Life field program, which led to the deployment of an unprecedented number of tags (4,300). Of these, 1,791 tracks were used in a single study to assess space use by multiple predatory species in the Pacific Ocean, highlighting hotspots, migration pathways, and niche partitioning among species (Block et al. 2011), and were used to predict how climate change will affect the available habitat for different species (Hazen et al. 2013). Another subset of 1,648 tracks representing 14 species was also used to show annual patterns of movements through the high seas and across geopolitical boundaries in the Pacific Ocean (Harrison et al. 2018). Most recently, the coastal movements of 2,181 individuals from 92 species including fish, sharks, turtles, and marine mammals were used to identify four distinct functional movement classes in the coastal waters of Australia, with these classes emerging only through aggregating data across the entire data set (Brodie et al. 2018).
Finally, the Marine Megafauna Movement Analytical Program (MMMAP) used >2,500 individual tracks across 50 species of marine vertebrates including whales, sharks, seals, seabirds, polar bears, sirenians, and turtles, to show that, unlike terrestrial animals, movement patterns in marine animals are strongly conserved across species regardless of evolutionary history, with movements being more complex in coastal waters than in the open ocean (Sequeira et al. 2018). As these large aggregated data sets increase further in size, their temporal and spatial coverage may become sufficient to retrospectively detect signals of climate change or other perturbations in the movement patterns of species (Weimerskirch et al. 2012).
In the last decade, many tens of thousands of tags have been deployed on animals and, if shared, the resulting data sets will allow for powerful analysis at large spatiotemporal scales (Thums et al. 2018a). Such data sets can assist in answering topical questions (Hays et al. 2016), refine conservation benefits (Allen and Singh 2016), and facilitate the use of big data approaches to enhance our understanding of animal movements (Meekan et al. 2017, Rodriguez et al. 2017). The advantages of data sharing for researchers are clear (Nguyen et al. 2017) and well recognized in some fields of scientific inquiry such as molecular ecology and physical oceanography. Encouragingly, some tracking programs already have some type of open data policy, and a large range of online repositories are now available (Campbell et al. 2016), including ZoaTrack (Dwyer et al. 2015), Movebank, the Integrated Marine Observing System (IMOS), and the Ocean Tracking Network (OTN). The increasing use of telemetry technology also supports unprecedented opportunities for collaboration among researchers studying different species.
By combining satellite tracking with acoustic detection and making relatively minor compromises on equipment sampling parameters (i.e., scanning range of tag frequencies and using collaborative acoustic monitoring arrays), there is potential for researchers to expand the spatial and temporal range of tracking efforts and collect data for multiple species simultaneously (Lidgard et al. 2014, Aven et al. 2015). The large but heterogeneous data sets acquired by pooling data from a variety of sources will present a challenge for analysis, data visualization, and storage. Ways to overcome such challenges have already been addressed in other disciplines. For example, studies of human mobility interrogate massive and rapidly growing databases of geolocations available from smart phones and internet records, which describe the movements of humans (Gonzalez et al. 2008). Although such studies focus on a single species (humans; Homo sapiens), they have shown the power of data encompassing tens of thousands of individuals to address questions associated with collective responses and with processes occurring at the population level. Notable examples include the study of epidemics, transmission of culture or mood (Mocanu et al. 2013), and the development of models describing mobility patterns (e.g., radiation model; Simini et al. 2012).

CONCLUSION
The answer to "how many animals should be tracked?" depends intrinsically on the species of interest, on the tagging methods used, and, primarily, on the question that needs to be addressed, including its spatial and temporal coverage (see examples in Table 1). Tracking studies usually develop in stages: (1) an initial phase of "innovation and discovery" that commonly involves small sample sizes (N ≤ 10); (2) a stage of "confirmation and consolidation" of results with intermediate sample sizes (10 < N ≤ 100); and (3) more synthetic, overarching, and inter-disciplinary studies involving larger sample sizes (N ≫ 100). At each stage, the impact of the sample size on the key conclusions can be assessed (e.g., the proportion of individuals travelling to different sites), and the outcomes of this assessment can be used to objectively plan how the sample size needs to be increased to answer different questions with the required level of confidence.

Table 1 (partially recovered; … marks missing text).

Sample size: …
Examples: … (Priede 1984); detected a >1,000 km trip for leatherback turtles (Hughes et al. 1998); demonstrated the potential ecophysiological capacity (aerobic limits) for loggerhead turtles (Hochscheid et al. 2005); revealed a link between swimming effort and depth-dependent, air-mediated buoyancy and swim angle (Hays et al. 2004c); proved that elephant seals could act as samplers of the environment, providing an 8-month CTD hydrographic profile that allowed an assessment of the seasonal evolution of the upper ocean (Meredith et al. 2011); first free-ranging heart rate recorded for an adult female southern elephant seal during the post-breeding migration (Hindell and Lea 1998)

Sample size: N ≤ 10
Type of information: (1) initial insights into individual variability, scale of movements, and drivers of movement; (2) generate hypotheses
Examples: revealed that Wandering Albatrosses travel thousands of kilometers in foraging trips during an incubation shift (Jouventin and Weimerskirch 1990); revealed that basking sharks exhibited extensive horizontal and vertical movements during winter rather than hibernating; provided evidence for reverse diel vertical migration in basking sharks (Sims et al. 2005); revealed diel vertical migration for 10 individuals; recorded the first dive profiles outside the nesting season based on three individuals (Hays et al. 2004a); revealed that nine individuals all moved in disparate directions in the Atlantic Ocean (Hays et al. 2004b); revealed that the vertical distribution of southern elephant seals' prey is tightly related to light level (Jaud et al. 2012); identified three distinct foraging patterns for Galapagos sea lions (Villegas-Amtmann et al. 2008)

Sample size: 10 < N < 100
Type of information: (1) estimate space use; (2) characterize spatiotemporal patterns; (3) identify specific behaviors (e.g., sex and age differences)
Examples: defined space use for albatrosses (Gutowsky et al. 2015), shags, and kittiwakes (Soanes et al. 2013); showed that foraging success of penguins relates to boundary current anomalies in different years (Carroll et al. 2016); used …

Sample size: …
Type of information: (1) quantify habitat use over large spatial scales; (2) assess shifts in space use with time, among subpopulations or with gender, age class, and period (e.g., breeding cycles); (3) estimate susceptibility to interactions with human activities; (4) allow multispecies assessments at large spatial scales
Examples: assessed the levels of foraging area overlap for Northern Gannets from different breeding areas (Wakefield et al. 2013); revealed that oceanic shark hotspots may be at risk from overfishing (Queiroz et al. 2016); used to define areas of high susceptibility to fisheries by-catch at the scale of the Atlantic Ocean based on 106 tracks (Fossette et al. 2014); led to a better understanding of how seals use the circumpolar habitat in the Southern Ocean based on 287 seals (Hindell et al. 2016); allowed application of big data approaches to show memory as an intrinsic driver of movement for southern elephant seals (Rodriguez et al. 2017); revealed correlated or coordinated movements in a 10-yr movement data set of northern elephant seals, suggested through sonification (Duarte et al. 2018); revealed areas of particular ecological significance for multiple species (Raymond et al. 2015); showed high variability in the distribution of cumulative impacts across multiple species (Maxwell et al. 2013); highlighted hotspots, migration pathways, and niche partitioning among multiple species in the Pacific Ocean (Block et al. 2011); showed that movement patterns in marine animals are strongly conserved across species regardless of evolutionary history (Sequeira et al. 2018)

As sample size increases, both in relation to the number of individuals tracked and the length of individual tracks, there is improved ability to resolve a range of questions associated with movement, such as home-range estimates, migration patterns including identification of high-use corridors, migration distance and variability in destinations, and foraging search patterns. How large a sample size is needed to resolve these various movement components to a certain level of confidence will depend on the extent of individual variability and on the behavior of the species being tracked. We caution that the same number of tags can also lead to very different data depending on when the tags are deployed and the duration of the tag deployment. For example, for pinnipeds, tagging individuals close to molting may result in a track of very short duration with the tag coming off before its battery is exhausted, while post-molt deployments will likely result in 8-9 months of tracking data (Treasure et al. 2017). For species that display different seasonal movement patterns, such as sirenians, the data obtained with the same sample size can range from detection of little movement in the peaks of summer or winter to hundreds of kilometers of movement captured in spring and fall (Aven et al. 2016). In the latter example, if a tag continues to function for 9-10 months, both high-resolution local data and wider regional habitat use can be obtained. In such cases, a small number of well-timed, long-duration tag deployments may yield more or better information than larger sample sizes deployed at the wrong time. Moreover, as variability increases, so too will the sample size needed to resolve research questions. Similarly, variability has implications in studies pooling data sets across species and aiming to make inferences on comparisons across groups. In such cases, the number of individuals representing each specific group will affect the high-level inferences that can be made based on the pooled data sets. Comparing changes in space use over time is only as powerful as the smallest within-year group size; however, pooled data sets are generally useful to draw conclusions across groups.
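One way to assess how sample size affects a key conclusion, such as the proportion of individuals travelling to a given site, is to look at how the confidence interval around that proportion narrows as more animals are tagged. The sketch below is an illustration rather than a method used in this review: it assumes individuals are independent and uses the Wilson score interval, and the observed proportion of 30% is a hypothetical value.

```python
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * (p_hat * (1.0 - p_hat) / n
                      + z * z / (4.0 * n * n)) ** 0.5 / denom
    return centre - half_width, centre + half_width


# Hypothetical case: 30% of tagged animals are observed travelling to a site.
for n in (10, 50, 100):
    low, high = wilson_interval(round(0.3 * n), n)
    print(f"N = {n:>3}: 95% interval {low:.2f} to {high:.2f}")
```

With 10 tags the interval spans roughly half of the possible range, so the same observed proportion supports a far weaker conclusion than it does with 100 tags. A target interval width can be used in this way to plan the next increase in sample size.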
We suggest that the planning of a tracking study should include a thorough search of the published literature where similar questions have been addressed (even if for other species). For example, studies provided in Table 1 show the types of questions that have been asked for species of different guilds with increasing sample size and can be used as a guide for minimum numbers required by future studies. If prior information is available for the specific study species, then the use of simulation exercises similar to those presented in Figs. 3 and 4 (refer to DataS1: Sequeira et al_Simulation Code.R) can be informative. Also, when estimating utilization areas and kernel densities, a full evaluation of sensitivity to sample size should be carried out and results should be reported with the confidence estimates (Fig. 4).
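A sensitivity evaluation of this kind can be sketched as a resampling exercise: draw random subsets of tracked individuals of increasing size, recompute the area used, and report the mean and standard deviation at each size. The example below is a minimal Python illustration and is not the R code supplied in DataS1. It uses synthetic tracks and, as a deliberately crude stand-in for a kernel utilization distribution, counts occupied grid cells; all function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_tracks(n_individuals, n_fixes=50, seed=1):
    """Synthetic data: each animal moves around its own centre (km units)."""
    rng = random.Random(seed)
    tracks = []
    for _ in range(n_individuals):
        cx, cy = rng.gauss(0, 20), rng.gauss(0, 20)  # individual centre
        tracks.append([(rng.gauss(cx, 5), rng.gauss(cy, 5))
                       for _ in range(n_fixes)])
    return tracks


def occupied_area(tracks, cell_km=10.0):
    """Area proxy: number of occupied grid cells times cell area (km^2)."""
    cells = {(int(x // cell_km), int(y // cell_km))
             for track in tracks for x, y in track}
    return len(cells) * cell_km ** 2


def area_vs_sample_size(tracks, sizes, n_resamples=200, seed=2):
    """Mean and SD of the area proxy for random subsets of each size."""
    rng = random.Random(seed)
    curve = {}
    for k in sizes:
        # Subsets are drawn without replacement from the tracked animals.
        areas = [occupied_area(rng.sample(tracks, k))
                 for _ in range(n_resamples)]
        curve[k] = (statistics.mean(areas), statistics.stdev(areas))
    return curve


tracks = simulate_tracks(60)
curve = area_vs_sample_size(tracks, sizes=[1, 5, 10, 20, 40, 60])
for k, (mean_area, sd_area) in curve.items():
    print(f"n = {k:>2}: mean area {mean_area:8.1f} km^2, SD {sd_area:7.1f}")
```

If the mean area is still rising at the largest subset size, the data set is probably too small to characterize space use for the population. Because subsets are drawn without replacement, the spread necessarily shrinks to zero when all individuals are included, so the standard deviation at the full sample size should not be read as a measure of precision.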
It is often not possible to do a priori assessments of the importance of sample size as the various tracking outcomes are not known. In such cases, we suggest that the question to be addressed is explicitly defined so it becomes clear in which phase of research the question falls, i.e., "innovation and discovery," "confirmation and consolidation," or "synthetic, overarching, inter-disciplinary approach." Depending on the phase, the relative sample size (small, intermediate, or large) becomes easier to estimate. Once this target sample size has been identified, it then becomes useful to consult Table 1 to have an idea of the types of questions that have been addressed with specific sample sizes for different taxa. Generally, within each phase, the largest sample size that is feasible within ethical and logistical constraints should be employed. This is because larger sample sizes will provide greater confidence in species- or population-level inferences. However, sample sizes will necessarily be lower for rare or cryptic species, small or critically endangered populations, and when tagging may be too disruptive. The number of individuals tagged within populations, the amount and resolution of data, as well as their accuracy also impact the types of questions that can be addressed. Therefore, in addition to the practical limitations in sample size in such situations, there will also be financial and research scope limitations.
Recent advances made in the field of telemetry and bio-logging have led to an exponential increase in satellite telemetry studies (Thums et al. 2018a), with very large sample sizes (≫1,000 tracks) recently starting to appear in the literature (Block et al. 2011, Brodie et al. 2018, Sequeira et al. 2018). In spite of that, a sample size of one with sufficient track length can still lead to scientific insights. This is particularly relevant for species that have never been tracked before, when previous deployments have not been successful, or when testing new sensors. In such situations, and where the current knowledge of a species' movement is still in its infancy, any new insights from small sample sizes have the potential to significantly advance knowledge. In contrast, for species where tracking is well established (e.g., some seals or turtles and seabirds), the questions relating to population densities, biologically important areas, population structure, or social networking will require tracks of many individuals, or can be addressed by retrospective analysis after combining existing data across studies and including multiple researchers. Clearly, there are many challenges to statistically estimate an appropriate sample size for telemetry studies across the many and varied contexts. Our review highlights these challenges and provides recommendations based on examples and data simulations to assist in decision making.