Learning and Animal Movement

Lewis, Mark A.; Fagan, William F.; Auger-Méthé, Marie; Frair, Jacqueline; Fryxell, John M.; Gros, Claudius; Gurarie, Eliezer; Healy, Susan D.; Merkle, Jerod A.

doi:10.3389/fevo.2021.681704

REVIEW article

Front. Ecol. Evol., 09 July 2021

Sec. Behavioral and Evolutionary Ecology

Volume 9 - 2021 | https://doi.org/10.3389/fevo.2021.681704

This article is part of the Research Topic: Cognitive Movement Ecology

Mark A. Lewis¹,²*†

William F. Fagan³†

Marie Auger-Méthé⁴,⁵

Jacqueline Frair⁶

Jerod A. Merkle¹⁰

¹Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, Canada
²Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
³Department of Biology, University of Maryland, College Park, College Park, MD, United States
⁴Department of Statistics, University of British Columbia, Vancouver, BC, Canada
⁵Institute for the Oceans and Fisheries, University of British Columbia, Vancouver, BC, Canada
⁶Department of Environmental and Forest Biology, State University of New York, Syracuse, NY, United States
⁷Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
⁸Physics Department, Goethe University, Frankfurt, Germany
⁹School of Biology, University of St Andrews, St Andrews, United Kingdom
¹⁰Department of Zoology and Physiology, University of Wyoming, Laramie, WY, United States

Integrating diverse concepts from animal behavior, movement ecology, and machine learning, we develop an overview of the ecology of learning and animal movement. Learning-based movement is clearly relevant to ecological problems, but the subject is rooted firmly in psychology, including a distinct terminology. We contrast this psychological origin of learning with the task-oriented perspective on learning that has emerged from the field of machine learning. We review conceptual frameworks that characterize the role of learning in movement, discuss emerging trends, and summarize recent developments in the analysis of movement data. We also discuss the relative advantages of different modeling approaches for exploring the learning-movement interface. We explore in depth how individual and social modalities of learning can matter to the ecology of animal movement, and highlight how diverse kinds of field studies, ranging from translocation efforts to manipulative experiments, can provide critical insight into the learning process in animal movement.

Introduction

Animal movement, in the form of translocation from one locale to another, takes many forms and is critical to ecological processes. This understanding has given rise to the rapidly growing discipline called movement ecology (Nathan, 2008). Concurrently, the subject of learning has been studied from the perspective of animal behavior, both in the context of ecological interactions and in the context of movement itself (Box 1 and Table 1). Animal behavior has a well-established and celebrated history of understanding learning and there has been recent growth in connecting learning and memory to animal movement behavior (e.g., Fagan et al., 2013). At the same time, a recent explosion of ideas about machine learning is now creating new perspectives on understanding animal movement based on algorithms.

BOX 1. Definitions of terms associated with learning.

This box defines terms central to a synthesis of concepts from animal behavior, ecology, psychology, and certain quantitative methods.

Foundational Concepts

Learning:

Psychology-based definition: the cause-effect process leading to information acquisition that occurs as a result of an individual’s experience.

Task-based definition: improved performance for a specific task, based on experience.

Memory: The storage, retention and retrieval of information.

Spatial memory: The memory for where objects/resources/places are in space. Representation of space. Encodes spatial relationships or configurations.

Supervised machine learning: The process by which the machine is trained to perform a task where some input data are already labeled with the correct output. It can be compared to learning in the presence of a supervisor or teacher.

Statistical learning theory: An unsupervised framework for machine learning that deals with the problem of extracting statistically relevant correlations from data.

Modes of Learning

Associative learning: When an animal makes an association between a stimulus and an outcome. Two forms are:

Classical (Pavlovian) conditioning: an animal associates a biologically relevant stimulus (e.g., food) with a previously irrelevant stimulus. For example, a dog presented the sound of a bell rung alongside the presentation of food, will come to salivate at the sound of the bell in the absence of food. Another example would be that a raccoon learns that garbage cans contain food.

Operant (instrumental) conditioning: the behavior of an animal is controlled by the consequences of that behavior. Typically, this behavior develops through sequential reinforcement (e.g., a raccoon learns how to open the garbage can to get food and is rewarded).

Positive reinforcement: Behavior is rewarded and then increases.

Negative reinforcement: Behavior is increased through avoidance of an unpleasant stimulus (a form of operant, or instrumental, conditioning).

Punishment or Inhibitory learning: Behavior is decreased through avoidance of an unpleasant stimulus. This contrasts with negative reinforcement, where the behavior increases.

Reinforcement learning: From machine learning: The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them. This is synonymous with trial and error learning. As in optimal foraging in ecology, the focus is on the balance between exploration (of unfamiliar objects/places) and exploitation (of current knowledge).

Online learning: From machine learning: A technique for implementing machine learning based on data becoming available in a sequential order and then being used to update the best predictor for future data at each step.

Habituation: after repeated exposure, an animal decreasingly responds to a stimulus. The stable end state is the animal’s level of tolerance of a stimulus and the outcome is higher tolerance.

Sensitization: after repeated exposure, an animal increasingly responds to a stimulus. The stable end state is the animal’s level of tolerance of a stimulus, and the outcome is decreased tolerance.

Latent learning: an animal learns by gathering and storing information, without immediate reward.

Pathways of Learning

Social learning: Also called “transmission,” this is an umbrella term that includes transfer of skills, concepts, rules and strategies that occur in social contexts and can affect individual behavior. These include:

Social facilitation: An animal has an increased probability of performing a behavior in the presence of a conspecific.

Local enhancement: An individual’s interest in an object or location is mediated by the interest or movement of others.

Imitation: Novel copying of a model behavior through observation that results in a reliably similar outcome.

Cultural transmission: Social transmission leading to the development of traditions that are passed down from generation to generation.

Vertical vs. horizontal learning: Sometimes referred to as parent vs. peer learning, this dichotomy characterizes the generational source of social information.

Information center: Particular locations or events that provide opportunity for information exchange. For example, a community roost may enable individuals to follow well-fed peers to new foraging locations.

Direct information exchange: An animal is provided sender-based, actively communicated information by another individual. For example, honeybees tell their sisters the locations of rewarding flowers.

Optimization-related Terms

Genetic algorithm: A population of candidate solutions to an optimization problem that evolve toward better solutions.

Policy: In machine learning, the mapping of states to actions (e.g., a hungry animal begins to hunt).

Utility function: In machine learning, the assignment of weights or values to agent states. Actions are selected by comparing the values of the predicted states that derive from particular action. For example, a policy involving search vs. sit-and-wait strategies will yield different outcomes for a hungry animal.

Adaptive movement: When animals modify their movement in response to a change. In models, adaptive implies movement behaviors that confer fitness/performance benefits.

TABLE 1

Table 1. Case studies of learning and animal movement.

Along with these recent developments, the ability of ecologists to track animal movements and behaviors remotely in the wild has been steadily increasing. The collection of massive amounts of data on animal movement, primarily via satellite tracking, is now possible at a scale and level of detail previously unimaginable and can be linked with similarly improving remotely sensed or modeled environmental data (e.g., vegetation, anthropogenic disturbance, terrain, NDVI, snow depth) (Kays et al., 2015). Furthermore, more recent advances in bio-logging (e.g., accelerometers, proximity measures, audio- and video-recording devices) provide direct information on some of the physiological (e.g., jaw movement, heart rate, cortisol, stable isotopes, reproductive status) and social (e.g., interactions with conspecifics) contexts of movements (Wilmers et al., 2015). This coupling of movement patterns with the movement context has created opportunities to infer learning mechanisms and meld ideas from animal behavior, movement ecology, and machine learning in the context of the ecology of learning and animal movement. We develop such a synthesis here.

We start with a focus on learning as a means for acquiring information and making decisions. Employing two related definitions of learning, one from psychology and the other related to computer science, we evaluate the benefits, costs and limitations of learning in the context of animal movement. Next, we address the modality of learning in animal movement, ranging from individual to social. We then develop links to related disciplines: psychology, animal cognition, and machine learning. We close by reviewing approaches to studying the process of learning and animal movement, whether from experimental or observational studies, discussing the role that models can play in this endeavor, and suggesting areas for future developments.

Individual Information Acquisition and Decision Making

Definition of Learning

We start with a psychology-based definition of learning, which states that learning is the information acquisition that occurs via an individual’s experience that results in a detectable and consistent change in neurophysiology and/or behavior (Box 1). Movement intersects with this definition of learning in several key ways. First, movement will give rise to learning if the movement facilitates information acquisition by introducing an animal to a new environment (e.g., information on forage availability) or state (e.g., information from increased vigilance). Second, the learned information can give rise to new movement decisions if the information acquired is used to change movement patterns (e.g., switching to area-restricted search in regions of high forage availability). Lastly, learning can be about movement itself, for example, when an animal learns where and when to migrate by imitating conspecifics (e.g., crane migration). Figure 1 depicts these connections among movement, information processing, the environment, and the internal states of the animal.

FIGURE 1

Figure 1. A conceptualization of learning in the context of animal movement. An individual’s environment (green, including social context) and its internal state (gray) can both influence the onset of information gathering via the attention that an individual pays to landscape features (arrows 1 and 2, respectively). As currently understood by psychologists, the information gathering pathway involving attention, perception, learning, and memory appears inside the animal’s brain (pink, unlabeled arrows) ultimately providing input to a movement decision (arrow 3). Both the individual’s environment (arrow 4) and its internal state (arrow 5) can then shape and modify the link between memory and movement. The movement decision has ramifications for the environment (arrow 6) and for the internal state (arrow 7). Lastly, the environment can alter an individual’s internal state directly (arrow 8) without invoking information gathering and memory, often via social interactions.

Laboratory studies of learning can be used to seek out direct cellular evidence for neurophysiological changes arising from information acquisition and storage via functional magnetic resonance imaging (Marsh et al., 2010). However, these approaches are impractical in studies of wild animals, for which most ecologically relevant evidence for learning comes from observing changes in behavior as a result of experience. Thus, although the psychology-based definition of learning above does not strictly involve decision-making, the ecological implications of learning are often intimately tied to experience and the decision-making process. This emphasis on process means that movement-related learning is more similar to how machine learning is defined: improved performance for a specific task as a result of prior experience. This definition, which we refer to as the task-based definition, differs from the psychological definition because it is directly tied to experience-based improvements in performance for a specific task (Box 1).

The Learning Process

The process of learning includes all the steps needed for information acquisition based on experiences encountered. Broadly, these steps include attention to relevant information, perception of the information, acquisition of that information, and, finally, storage, retention, and retrieval (memory) of that information. At this point, the information can be acted upon, for example, to make a movement decision (Figure 1).

Diverse factors may impede or enhance an animal’s attention to information from its environment or from other individuals. For example, animals in unfamiliar environments may be more (or less) observant of environmental cues (Wolfe, 1969) and certain types of social interaction may increase or decrease attentiveness, leading to social learning (Heyes, 1994). Other factors, such as the internal state of an animal (Dorrance and Zentall, 2001) or its risk sensitivity (Bacon et al., 2010) may also play a role in determining attentiveness (Figure 1).

The perception and acquisition of information depend on an animal’s sensory capacities. For most animals, certain sensory cues will be easier to detect than others, which can lead to different hierarchies of inputs, which may be altered contextually. For example, many aural and olfactory cues may be more important than visual information at night (Zollner and Lima, 1999). Once acquired, information must be committed to memory as part of the learning process. Spatially distributed information may be stored as a cognitive map, sometimes in a network-based non-Euclidean format (Noser and Byrne, 2014). Storage and retrieval of learned information are essential for decision making, which can be based on recent events or information from long ago (Polansky et al., 2015; Abrahms et al., 2019).

A test of successful learning is the ability to make a decision using information from past experiences that discriminates among alternative strategies. For example, in laboratory studies, exposure to spatially distributed food rewards in mazes can affect the movement choices of rats (Leonard and McNaughton, 1990). Similarly, for wolves, memory-related statistical metrics like “time since last visit” to a location may form the basis for movement decision discrimination (Schlägel et al., 2017). Of course, this link between experiences and decision making is both complex and context-dependent, being modulated by layers of complexity regarding habitats, social status, and internal states (Figure 1). The so-called diffusion theory for learning posits that the brain does not solve decision-making problems exactly but uses algorithms that optimize the speed and accuracy of choices (Bogacz, 2007).
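The speed-accuracy trade-off that diffusion-type decision models formalize can be illustrated with a minimal simulation. The Python sketch below accumulates noisy evidence until it reaches one of two decision thresholds. The function names, drift rate, noise level, time step, and thresholds are illustrative assumptions rather than values or methods taken from Bogacz (2007).

```python
import random


def simulate_decision(drift, threshold, noise=1.0, dt=0.01, rng=None,
                      max_steps=100_000):
    """Accumulate noisy evidence until it hits +threshold or -threshold.

    Returns (correct, decision_time). A positive drift means the upper
    threshold is the correct choice. All parameter values are hypothetical.
    """
    rng = rng or random.Random()
    evidence = 0.0
    for step in range(1, max_steps + 1):
        # Euler step of a drift-diffusion process.
        evidence += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        if evidence >= threshold:
            return True, step * dt
        if evidence <= -threshold:
            return False, step * dt
    # No threshold reached: fall back on the sign of the evidence.
    return evidence > 0, max_steps * dt


def summarize(drift, threshold, trials=2000, seed=1):
    """Return (accuracy, mean decision time) over repeated simulated choices."""
    rng = random.Random(seed)
    results = [simulate_decision(drift, threshold, rng=rng)
               for _ in range(trials)]
    accuracy = sum(correct for correct, _ in results) / trials
    mean_time = sum(time for _, time in results) / trials
    return accuracy, mean_time


if __name__ == "__main__":
    for threshold in (0.5, 1.5):
        accuracy, mean_time = summarize(drift=1.0, threshold=threshold)
        print(f"threshold={threshold}: accuracy={accuracy:.2f}, "
              f"mean time={mean_time:.2f}")
```

In this sketch, raising the threshold makes choices slower but more accurate, which is the trade-off such models are meant to capture.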

Benefits and Costs of Learning

All mobile organisms face a wide variety of spatial challenges that influence individual fitness and present opportunities for decision making shaped by learning. Foraging opportunities and energetic constraints are patchy in space and time, in large part because the underlying physical and biotic processes are also patchy. Optimal foraging theory (McNamara and Houston, 1985; Stephens and Krebs, 1987; Mangel and Clark, 1988) provides a framework for understanding how benefits accrue from foraging in patches that offer the highest returns of energy or nutrient intake per unit time relative to time or energetic costs. Lost opportunities for social interaction, breeding, reproductive care, or shelter, and the risks of mortality due to predation, parasitism, or disease can then be considered.

When the rate of environmental change varies across time and space, as is common along elevation or rainfall gradients, theory suggests an animal may be able to improve its fitness through appropriate patterns of nomadic or migratory movement (e.g., Fryxell and Sinclair, 1988). Field studies support this theory. For example, migratory ungulates can choose patches at a landscape scale that yield appreciable improvement in rates of energy gain, even when such gains are transitory and require continual nomadic repositioning (Fryxell et al., 2004; Holdo et al., 2009). Memory can also influence the choice of movement patterns, such as the balance between range residency and migration (e.g., Shaw and Couzin, 2013). For example, when undergoing seasonal transitions between ranges, migratory ungulates can obtain fitness benefits by remembering previous trajectories (Bracis and Mueller, 2017; Jesmer et al., 2018; Merkle et al., 2019).

Researchers have investigated how learning can influence and confer advantages to moving organisms. Agent-based models of foragers with spatial memory have shown how fitness accrues from moving to acquire reliable information, even when that movement samples sub-optimal patches (Bracis et al., 2015). This is particularly clear when naïve animals are presented with an unfamiliar environment and movement is exploratory. However, even experienced individuals can benefit by spatially sampling a dynamic environment, in particular when resources can be depleted (Boyer and Walsh, 2010) or predation risk can change (Bracis et al., 2018). In this case, movement keeps current the information needed for appropriate decision making.

Given that foraging often results in resource depletion, fitness may also be improved through informed departure criteria based on marginal value leaving rules (Charnov, 1976; Arditi and Dacorogna, 1988; Brown, 1988). The field of “sampling behavior” (Stephens, 1987) extends ideas originally developed within the optimal foraging theory framework, which traditionally assumed that animals are omniscient (Krebs and Inman, 1992; Stephens et al., 2007). One sampling framework considers when animals should visit a patch to assess whether it has changed in value (Green, 1980), whereas another framework focuses on the benefit accrued by tracking a changing environment (Shettleworth et al., 1988). Foragers that sample patches or track changing conditions are learning about the current state of the environment (Stephens, 1987). Informed decision making about which patches to feed in and how long to do so requires reliable expectations regarding resource availability, predation risk, and energetic costs across an individual’s home range, as well as the capacity to estimate these same variables at a given spatial location. For example, primates foraging on fruit track the productivity of different trees and possibly fruit ripeness (Janson and Byrne, 2007). Overall, environmental predictability appears to be essential for the origin and success of movements based on learning and the reshaping of movement strategies based on experience more generally (Mueller et al., 2011; Riotte-Lambert and Matthiopoulos, 2020).
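The marginal value leaving rule mentioned above can be made concrete with a small numerical sketch. Assuming, purely for illustration, a saturating gain function g(t) = g_max(1 − e^(−rate·t)) within a patch and a fixed travel time between patches, the optimal residence time is the point at which the marginal gain rate g′(t) falls to the long-term average rate g(t)/(travel time + t). The function names and parameter values below are hypothetical and are not drawn from Charnov (1976).

```python
import math


def gain(t, g_max=10.0, rate=0.5):
    """Cumulative energy gained after t time units in a depleting patch."""
    return g_max * (1.0 - math.exp(-rate * t))


def long_term_rate(t, travel_time, g_max=10.0, rate=0.5):
    """Average intake rate when every patch is left after t time units."""
    return gain(t, g_max, rate) / (travel_time + t)


def optimal_residence_time(travel_time, g_max=10.0, rate=0.5,
                           t_max=100.0, tol=1e-9):
    """Solve g'(t) * (travel_time + t) = g(t) for t by bisection."""
    def excess(t):
        marginal = g_max * rate * math.exp(-rate * t)
        return marginal * (travel_time + t) - gain(t, g_max, rate)

    # excess() is positive at t = 0 and decreases monotonically, so a
    # single sign change lies in (0, t_max) for these parameter values.
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for travel_time in (1.0, 5.0):
        t_star = optimal_residence_time(travel_time)
        print(f"travel time {travel_time}: leave after {t_star:.2f} time units")
```

Under these assumptions, longer travel between patches lengthens the optimal stay in each patch. A forager that must learn the gain curve or travel time from experience would have to estimate these quantities rather than know them in advance, which is where sampling behavior enters.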

Learning can also help improve fitness even when spatial movement processes are not directly tied to foraging (e.g., territorial defense, migration, reproduction) (Box 2). For example, learning can provide advantages in dominance interactions (Kokko et al., 2006), efficiency of movement (Stamps, 1995), effective escape from predators (Brown, 2001), and large-scale dispersal decisions (Barry et al., 2020), all of which can translate into fitness benefits (Brown et al., 2008; Patrick and Weimerskirch, 2017). For territorial species, learning can influence how conflicts drive pattern formation (Stamps and Krishnan, 1999, 2001; Sih and Mateo, 2001) and alter strategies for territorial defense (Potts and Lewis, 2014; Schlägel and Lewis, 2014; Schlägel et al., 2017). For migratory species, this includes determining least-cost migration corridors between seasonal ranges (Bischof et al., 2012; Poor et al., 2012).

BOX 2. Learning and Movement Processes.

Movement is the spatial consequence of a number of different behaviors by animals. For example, a predator searching for predictable but mobile prey must change its location in space to increase the chances it will encounter a prey item. In many situations (e.g., predictable environments or regularly available prey), learning can reduce uncertainty and increase success in such spatial behaviors. We outline a selection of these below:

Search and attack in predation—When prey live in a complex and heterogeneous environment, predators may benefit by adjusting their search and attack behavior over time (Stephens et al., 2007). When predators detect their prey through visual, auditory, or olfactory cues, they can use associative learning to refine their “search image” and improve their ability to detect and attack prey (Ishii and Shimada, 2010). For instance, desert ants (Cataglyphis fortis) use associative learning to connect specific odors to food, and then use this food-odor memory to assist their next foraging journey (Huber and Knaden, 2018).

Escape from a predator—Spending time in familiar space allows animals to learn motor programs that enhance efficient movement within that space (Stamps, 1995). For instance, in response to a pursuing human, Eastern Chipmunks (Tamias striatus) within their home range (i.e., familiar space) take half as much time and travel half as far to reach a refuge compared to when outside their home range (Clarke et al., 1993).

Foraging bouts—An animal’s rate of energy gain while foraging can increase by collecting information about the environment (Stephens and Krebs, 1987), provided that the environment changes in an at least somewhat predictable way. In most of these cases, animals use associative learning to connect the reward of a food source with some aspect (e.g., color, nearby landmark) of that food source. For instance, Rufous Hummingbirds learned the location of flowers that they had emptied in a foraging trial, and in subsequent trials did not waste time visiting them again (Healy and Hurly, 1995).

Navigation and migration—Migratory movements notably occur at spatial scales that greatly exceed perceptual abilities of animals (mammals: Teitelbaum et al., 2015; birds: Alerstam et al., 2003). Thus, it is expected that animal migration is at least partly based on memory of past experience (though some migrations appear to be innate). When migration has a learned component, learning is likely used to improve migratory performance. For instance, social learning of migration helps ungulates improve energy gain (Jesmer et al., 2018) and helps birds reduce costs (Mueller et al., 2013).

Home range or territory selection—The decision process of choosing the size and location of home range or territories can be thought of as a learning process of integrating new information about the distribution of resources of a landscape (Mitchell and Powell, 2004). For instance, home range size is often larger in areas with fewer resources available (e.g., Morellet et al., 2013; Viana et al., 2018). Further, increased exploration events, presumably to sample new locations when others are unavailable, can result in still larger home ranges (Merkle et al., 2015).

While learning may have benefits, acquiring information based on experience does not come without costs. For example, information gathering can require substantial investment in time and/or energy, and may heighten risk (Eliassen et al., 2007) or come at the expense of lost opportunities for foraging, social interaction, or search for suitable breeding sites (Dall et al., 2005). The machinery for learning also exacts an energetic cost (Isler and Van Schaik, 2006; Niven, 2016). Furthermore, retained memories may negatively affect the acquisition of new information, and so there may be a trade-off between memory retention and acquiring new memories (Tello-Ramos et al., 2019).

Limitations to Measuring Learning From Animal Movement Patterns

Typical methods for recognizing learning in animal movement patterns do not measure the acquisition of information directly but rather rely on the task-based definition of learning, which requires improved performance for a specific task, based on acquired experience (Box 1). There are limitations to such methods, which pose challenges to inferring learning from uncontrolled, field-derived data. Unambiguously explaining a particular movement is a general challenge in the study of wildlife, where context, perception, internal states, and particular environmental cues all determine an animal’s response, but are often unobserved. For example, the “time since last visit” behavior in wolves, mentioned above, may not require memory, but could be explained by information from decaying scent marks (Schlägel and Lewis, 2014).

Obvious and obscure alternative explanations to learning and memory must be carefully considered in uncontrolled field studies. Table 2 categorizes a number of movement studies according to the level of evidence for learning—from strong to simply consistent with learning. For each we provide other, non-learning interpretations of the data that cannot be definitively excluded (Table 2).

TABLE 2

Table 2. Mapping empirical examples of learning to machine learning concepts.

Pathways of Learning for Animal Movement

Individuals can experience or gain information about their environment via different pathways—individually (i.e., by direct interaction with the environment; Dall et al., 2005) or socially (i.e., by observing others; Bandura and Walters, 1963; Rendell et al., 2010)—with learning demonstrated by a change in an individual’s behavior due to its experience (Box 1).

Individual Learning

Much of an animal’s individual learning is associative; that is, the individual learns by making an association between a stimulus and an outcome. Associative learning may arise either from classical (Pavlovian) conditioning, where an animal associates a biologically relevant stimulus (e.g., food) with a previously irrelevant stimulus (e.g., railway tracks), or from operant (instrumental) conditioning, where the behavior of the animal is controlled by the consequences of that behavior (e.g., feeding on grain on tracks leads to a food reward) (Pearce and Bouton, 2001).

These learning processes can make a behavior more likely through positive reinforcement (via rewards) or negative reinforcement (via unpleasant stimuli), or less likely through punishment or inhibitory learning (again, via unpleasant stimuli). For example, a bear foraging on railway tracks (Murray et al., 2017) might be more likely to forage when it finds grain (positive reinforcement) but less likely to forage through negative interactions with moving trains (punishment or inhibitory learning). Additionally, it might increase its vigilance through negative interactions with moving trains (negative reinforcement).

One associative learning mode relevant to animal movement is discrimination learning, where an animal learns to respond differently to distinct stimuli. For example, because homing pigeons can discriminate between the presence and absence of anomalies in magnetic fields, magnetoreception could be used for navigation (Mora et al., 2004).

Two non-associative learning modes that are relevant to movement are habituation (decreased response to a stimulus after repeated exposure) and sensitization (increased response to a stimulus after repeated exposure). These modes depend on the strength of association between stimulus and outcome, rather than the association itself. For example, the sensory responsiveness of honey bees declines after bees receive low-sucrose sugar solutions (habituation) and increases after offerings of high-sugar solutions (sensitization) (Scheiner, 2004). In turn, the sensory responsiveness of honey bees constrains individual foraging plasticity and skews the collective foraging decisions of colonies (Scheiner, 2004).

Another mode of learning, latent learning, is relevant to animal movement (Franks et al., 2007). Latent learning involves the gathering and storing of information, without immediate reward, such as when animals learn their migration route away from breeding grounds after they are born (e.g., in autumn) and must use that information to return in springtime. Box 1 provides further details on these modes of learning.

Social Learning

Social learning is an umbrella term for the learning pathway that includes transfer of skills, concepts, rules, and strategies that occur in social contexts and can affect individual behavior. Types of social learning include (i) social facilitation (increased probability of performing a behavior in the presence of a conspecific), (ii) local enhancement (an individual’s interest in an object or location mediates interest/movement by others), and (iii) imitation (novel copying of a model behavior through observation that results in a reliably similar outcome) (Visalberghi and Fragaszy, 1990). Note that these are distinct from the transfer of declarative or procedural information via direct information exchange, such as in bee dancing, to relay information concerning resource locations (Leadbeater and Chittka, 2007).

Each type of social learning is relevant to movement ecology. For example, social facilitation explains bison movement: individuals are more likely to travel to a given new location when in a group where another animal had knowledge of that location (Sigaud et al., 2017). Following behavior occurs in ants where leaders provide guidance to naïve individuals concerning the location of resources (Franks and Richardson, 2006), and in elephants where matriarchs lead herds to waterholes not known to the rest of the group (Fishlock et al., 2016). Imitation can be seen in fish, where translocation experiments demonstrate how naïve individuals learn migration routes through association with experienced individuals (Helfman and Schultz, 1984), as well as in replacement experiments where the long-term re-use of resting and mating sites can be socially learned rather than quality-based (Warner, 1988).

Individual learning can interact with social learning. For example, independent exploration allows ants to improve upon the paths they have learned via social learning through tandem running (Franklin and Franks, 2012). Here, independent exploration is the basis for improvement of route navigation, which can then be distributed within a colony via “information cascades.” More generally, individual learning may be modulated by associational acquisition, where options for individual learning are constrained by the individuals with which an animal associates (Fragaszy and Visalberghi, 2004).

Social learning is emphasized through existing social bonds, such as parent-offspring relationships. For example, elephants will learn resource locations in complex landscapes through both vertical and horizontal transmission (Bowell et al., 1996) and maternal-offspring pairs of whales may complete entire migrations together (Hamilton and Cooper, 2010), thus enhancing the potential for social learning.

However, social learning does not always confer a net benefit (Giraldeau et al., 2002), and may result in costly strategies of movement and resource use (Sigaud et al., 2017). For example, tested alone, adult female guppies that had shoaled with trained conspecifics as they swam to food used the same route used by their trained fellows, even if the route taken by the trained shoal was longer and more energetically costly than were alternative routes (Laland and Williams, 1997; Giraldeau et al., 2002).

Learning and Space Use: Connections to Other Disciplines

We distinguish two fundamental constructs for learning in conjunction with animal movement: updating the world model and building a new world model. To understand the difference between these, it helps to assume that the animal has a cognitive model of the world ($\hat{Q}$) and a set of “policy rules” (β) for mapping conditions (including the snapshot of that cognitive model and the state or priorities of the animal) into outcomes, in particular movement decisions. The policy rules can be thought of as the coefficients of a function governing outcomes in terms of conditions. Within this construct, updating the world model refers to the process of movement through a world, acquiring and storing information about the world, updating the world model $\hat{Q}$, and acting upon that knowledge according to the fixed set of policy rules β. The learning process itself is limited to updating the world model. Note that this kind of learning is only meaningful if the world itself is dynamic, with resources or threats moving, regenerating, or depleting in a way that makes it necessary to update expectations. When confronted with a new world, either via dispersal, translocation, or a significant perturbation to the existing world, the very structure of the world model and the policy rules both require adjustment by building a new world model. These two fundamental kinds of learning are schematized in Figure 2, where an elk’s movement among three dynamic patches permits constant updating of information (updating the world model), a process which relies on moving between those patches. But when a patch is significantly perturbed, or becomes unusable in a novel way, the fundamental structure of the world needs to be altered (building a new world model), and novel policy rules to govern interaction with novel elements must be developed.
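The “updating the world model” construct can be made concrete with a small sketch. The Python fragment below is illustrative only and is not taken from any of the cited studies: the three-patch layout, the logistic regrowth, the intake fraction, and the names `regrow`, `choose_patch`, `step`, `q_hat`, and `beta` are all assumptions. The forager holds an estimate of each patch's resource level (standing in for $\hat{Q}$), applies a fixed pair of policy coefficients (standing in for β), and revises only its estimates as it moves.

```python
def regrow(level, rate=0.2, capacity=1.0):
    """Logistic regrowth of one patch over one time step (assumed dynamics)."""
    return level + rate * level * (1.0 - level / capacity)

def choose_patch(q_hat, position, beta):
    """Fixed policy: score = beta[0] * expected resource - beta[1] * distance."""
    scores = [beta[0] * q - beta[1] * abs(i - position) for i, q in enumerate(q_hat)]
    return max(range(len(q_hat)), key=lambda i: scores[i])

def step(world, q_hat, position, beta, intake=0.8):
    """One move: act on the world model, then update it from what was observed."""
    # In memory, patches are assumed to regrow in the same way as the real ones.
    q_hat = [regrow(q) for q in q_hat]
    world = [regrow(w) for w in world]
    position = choose_patch(q_hat, position, beta)
    eaten = intake * world[position]
    world[position] -= eaten
    # Learning is limited to the world model: memory is overwritten by observation.
    q_hat[position] = world[position]
    return world, q_hat, position, eaten

world = [1.0, 0.6, 0.3]   # true resource levels in three patches (hypothetical)
q_hat = [0.5, 0.5, 0.5]   # forager's initial, uninformed world model
beta = (1.0, 0.1)         # policy coefficients; never modified in this sketch
position, visits = 0, []
for _ in range(12):
    world, q_hat, position, eaten = step(world, q_hat, position, beta)
    visits.append(position)
print(visits)
```

Building a new world model would correspond to changing the structure of `q_hat` (for example, adding a new category of landscape element) and the form of `choose_patch`, which this sketch deliberately leaves fixed.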

Figure 2. A schematic representation of a forager’s movement rules in a heterogeneous landscape, how a stable set of rules might be applied, and how landscape disturbance could force an update to the movement rules via learning. In a pre-disturbance world (left three columns), the forager (denoted by the white elk symbol) occupies a landscape with three depletable and renewable resource patches and a water body. The “real world” is represented in the top row, with all of its complexity. The second row represents the forager’s model of that world, which distils the complexity to the most relevant information. Shapes indicate different landscape elements, while colors reflect a quantitative score: darker greens are regenerated, paler greens are depleted. The forager has two movement rules in this landscape (bottom row): (1) move from depleted resource patch to a regenerated resource patch and (2) avoid the water body. The pre-disturbance movements rely on a dynamically updated spatial memory, as the forager learns about a changing environment. Post-disturbance, the forager’s world model changes after it gains information about the loss of a potential foraging area, e.g., a new oil well destroys one of the patches. Accordingly, the forager’s world model is refined to include a novel categorical element (orange triangle), with its own avoidance rule for movement (dynamic learning).

The main distinction between updating the world model and building a new world model appears in a slightly different form in the machine-learning literature, where the two kinds of learning are labeled as base-level and meta-level. Specifically, “The base-level learning problem is the problem of learning functions, just like regular supervised learning. The meta-level learning problem is the problem of learning properties of functions, i.e., learning entire function spaces” (Thrun and Pratt, 1998). The function spaces in our analogy comprise $\hat{Q}$, whereas the learning functions are the coefficients β. In the neurosciences, the terms model-based and model-free reinforcement learning are used in analogy with base-level and meta-level learning (Doll et al., 2012).

Cognitive ecologists typically have stringent experimental criteria for identifying learning. For example, experimentation plus control conditions sufficient to rule out alternate explanations are fundamental to confirming the existence of social learning (Reader and Biro, 2010). In this framework, experimentation could involve manipulation of physical aspects of the environment, individual animals via translocations or similar means, or the routes governing social transmission of information. Rare cases where a wild population can be experimentally manipulated provide the strongest cases for demonstrating and parameterizing memory-based movements (Ranc et al., 2020).

It is also interesting to note that complex behaviors that appear to involve decision-making can arise from other mechanisms of self-organized behavior. Self-organization occurs when simple rules lead to emergent behavior (Gros, 2015). A prominent theoretical example is cellular automata, whereby a specific rule set, such as “the game of life,” gives rise to agent-like configurations that may travel, replicate, and combine. Self-organized robots (Box 3) can exhibit emergent behavior, such as autonomous direction reversal, which an external observer could mistakenly interpret as decision-making (Kubandt et al., 2019). Because self-organization is not purposeful, an agent based solely on self-organizational principles will not be able to improve its score in a given task, that is, to “learn.” However, complex, emergent behavior that appears to be adaptable can confound efforts to recognize signals of learning in movement data.

BOX 3. Robotics: learning by mobile autonomous agents.

Robots that move and act autonomously, learning as they go, are confronted with tasks that parallel, in some ways, the life needs faced by moving animals. As in living animals, future decisions by a mobile autonomous robot hinge on what the learning robot experiences and encounters. Consequently, it is interesting to investigate how animal decision making about movement (Figure 1) may be understood using concepts commonly used in robotics and control theory (Jordan and Mitchell, 2015).

The basic model of an autonomous learner includes the following ingredients:

1) The external environment (e.g., spatial locations of forage).

2) An internal state representation, sometimes termed a world representation (e.g., an individual’s location, energy level and knowledge of forage locations).

3) A set of possible actions (e.g., foraging strategies).

4) A policy map that relates state representations to actions (e.g., anticipated energy gain from each foraging strategy).

5) Information acquisition, which is a consequence of actions interacting with the environment and the state representations (e.g., accumulated information on forage locations).

6) Value functions that quantify benefits and consequence of actions as represented by the internal states (e.g., benefits and consequences of choosing a foraging strategy, given an individual’s location, energy level and knowledge of forage locations).

A robot’s state representation simplifies all the information in the environment to a manageable (pruned and stylized) subset of relevant information that can eventually be linked to actions. Unsupervised state representations (Lesort et al., 2018), in which there are no performance measures, may be particularly relevant as constructs for how learning operates in animals. State representations allow the policy map to act on a dimensionally reduced decision space (the collection of states), which dramatically simplifies the task of learning individual policies.

A policy map structures the relationship of the robot’s state representation to possible actions. A policy map may be complete, mapping all possible states to actions, or calculated on the run. Monte Carlo tree search, as used in the Go program AlphaGo from Google Deepmind (Silver et al., 2017), determines the next move via an extensive stochastic search. As an additional complication, a robot may possess several policy maps and then select among the alternatives in a rule-based fashion.

Specified in this way, the basic details of a mobile autonomous robot map quite closely onto a formal conceptualization of the learning process in the context of animal movement (Figure 1).

Machine Learning Approaches

Machine learning tasks involve an explicit goal, such as parameter estimation or classification, and require a clear objective function, such as minimizing a cost function or correctly classifying data. To the extent that animals also have clear objective functions (e.g., ultimately: increasing individual fitness; proximally: eating, avoiding being eaten, reproducing), and that these objectives might be satisfied by performing a specific movement-related task (e.g., selecting appropriate places to forage), it is useful to draw a general analogy between a machine-learning algorithm and an animal that learns. As described above, we use the term task-based learning when referring to this type of process.

Types of Machine Learning

Machine learning has three main learning paradigms: supervised learning, statistical (unsupervised) learning, and reinforcement learning (Box 1). Training data for supervised learning is labeled with the correct output (Jordan and Mitchell, 2015). However, statistical and reinforcement machine learning do not require labeled training data and thus may be more directly applicable to animal learning. Statistical learning attempts to extract statistically relevant correlations from data (Hastie et al., 2009), whereas reinforcement learning attempts to maximize a cumulative reward through a balance between exploitation of current knowledge and exploration of new strategies (Sutton and Barto, 2017; Box 1).

A wide range of machine learning approaches emphasizes the importance of improvement through experience (Jordan and Mitchell, 2015), which is close to some definitions of animal learning. Good examples are artificial neural networks (ANN), a class of biologically inspired statistical learning algorithms. The input of an ANN, typically the sensory perception of the agent or animal, is propagated through a network of idealized neurons, which can be readjusted by experience-generated reward signals. The sophistication of the ANN can be increased via multiple layers (referred to as deep learning). The output of the ANN induces observable behavior, although it may suffer from overfitting the model to the particular data set at hand. Another way to incorporate the effects of improvement through experience is via evolutionary computing. This method mimics the trial-and-error process of natural evolution, with inheritance, mutation, and crossing over providing the material upon which selection, via reward signals, acts.

The Bayesian probabilistic model for inference provides another perspective on learning. While Bayesian reasoning is most often applied for statistical tasks such as parameter estimation and complex model fitting, it is also a central, probabilistic model for human cognition and learning (Chater et al., 2006; Tenenbaum et al., 2006). In the context of animal movement, prior information represents existing knowledge or existing preference sets (e.g., spatial memory and selection coefficients). Bayesian perspectives readily permit prior knowledge to be updated with new data (experiences) gained by an animal’s movement through the environment. For example, Michelot et al. (2019) draw an explicit analogy between stochastic rule-based animal movement and a Gibbs sampler performing Markov chain Monte Carlo sampling. The resulting posterior distributions accurately reflect the animal’s resource selection function (RSF).
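As a minimal illustration of how prior knowledge can be updated with new experience, the sketch below uses a conjugate Beta-Bernoulli update. It is not the approach of Michelot et al. (2019) or of any other cited study; the scenario (an animal tracking the probability that a patch yields food) and the names `update_belief`, `posterior_mean`, `a`, and `b` are assumptions made purely for illustration, and the visit outcomes are invented.

```python
def update_belief(a, b, found_food):
    """Conjugate Beta-Bernoulli update after one visit to a patch."""
    return (a + 1, b) if found_food else (a, b + 1)

def posterior_mean(a, b):
    """Expected probability of finding food under a Beta(a, b) belief."""
    return a / (a + b)

# Prior: weak belief that food is found on about half of all visits (assumed).
a, b = 2.0, 2.0
# Hypothetical sequence of visit outcomes; not data from any study.
outcomes = [True, True, False, True, True, True]
for found_food in outcomes:
    a, b = update_belief(a, b, found_food)
print(a, b, posterior_mean(a, b))
```

Here the prior plays the role of spatial memory, and each visit shifts the belief toward the observed frequency of success, with early experiences carrying less weight as evidence accumulates.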

As introduced above, reinforcement learning is a paradigm involving iterated remapping of situations to actions with the goal of maximizing a numerical reward (Sutton and Barto, 2017). Learners are not provided with rules, but must instead employ repeated trials to discover relationships between actions and rewards. This framework has strong parallels to experience-based frameworks for animal learning. Indeed, the temporal difference algorithm from machine learning calculates a reward-prediction error, reflecting how much better the world is than expected (Sutton and Barto, 2017). This algorithm closely resembles the Rescorla-Wagner learning rule (Rescorla and Wagner, 1972), a mainstay from animal learning theory, which posits that the change in associative strength during learning is proportional to the difference between the reward received and the reward predicted. By way of example, a schematic of the reinforcement optimizer for a computer learning to play the game Go is broadly similar to schematics of animal behavior and learning (Table 2). In both frameworks, an agent takes actions (movements) in the environment, and the outcomes of those actions are processed by an interpreter (cognitive model), which either “rewards” or “punishes” the agent, thereby modifying its internal state and modifying its subsequent actions. Additional aspects of realism are that rewards can be short term or delayed, and that the appropriateness of actions is not provided initially but must be learned via exploration.
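The resemblance between the two rules can be seen in a few lines of code. The following is a simplified sketch in standard textbook form, not an implementation from any cited work; the function names, the learning rate of 0.2, and the discount of 0.9 are assumptions chosen for illustration.

```python
def rescorla_wagner(v, reward, learning_rate=0.2):
    """Updated associative strength: V <- V + learning_rate * (reward - V)."""
    prediction_error = reward - v
    return v + learning_rate * prediction_error

def td_error(reward, v_current, v_next, discount=0.9):
    """Temporal-difference reward-prediction error for a single transition."""
    return reward + discount * v_next - v_current

# Repeated pairing of a cue with a reward of 1.0 (hypothetical trials).
v = 0.0
history = []
for trial in range(20):
    v = rescorla_wagner(v, reward=1.0)
    history.append(v)
print(round(history[0], 3), round(history[-1], 3))
```

When the value of the next state is zero, `td_error` reduces to `reward - v_current`, which is exactly the prediction error used in `rescorla_wagner`; the temporal difference form extends this by also accounting for expected future reward.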

Criteria of machine learning applied to animal learning

The machine learning literature provides concrete criteria for identifying if an algorithm has learned (Thrun and Pratt, 1998). Specifically, given (1) a task, (2) training experience, and (3) a performance measure, if performance at the task improves with experience, the algorithm is said to have learned. This is a useful framework for interpreting observational animal movement data. For example, for the sheep and moose in Jesmer et al. (2018) the task was maximizing energy intake and the training experience was several years of moving around the landscape. The performance measure was whether the animals adopted a migratory movement strategy to track variability in energy availability across space and time. Because of an increase in the proportion of migrants in the population over time (and, thereby, an increase in the proportion of individuals with increased energy intake), the animals likely had “learned”. Other instances of mapping empirical examples to machine learning concepts, given in Table 2, include hummingbird traplining, crane migration, and experimental elk translocation.
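The three criteria translate directly into a simple check on observational data: does the performance measure trend upward with training experience? The sketch below uses an ordinary least-squares slope as one possible, deliberately crude, test. The numbers are invented for illustration and are not the values reported by Jesmer et al. (2018); the function names `slope` and `improves_with_experience` are likewise assumptions.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def improves_with_experience(experience, performance, min_slope=0.0):
    """True if performance increases with experience faster than min_slope."""
    return slope(experience, performance) > min_slope

# Hypothetical data: years since translocation and proportion of migrants.
years = [0, 10, 20, 30, 40]
proportion_migrants = [0.05, 0.10, 0.30, 0.45, 0.60]
print(slope(years, proportion_migrants))
print(improves_with_experience(years, proportion_migrants))
```

A positive trend of this kind is consistent with learning but does not establish it, since demographic turnover or environmental change could produce the same pattern; a real analysis would also need uncertainty estimates and suitable controls.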

A major challenge to applying machine learning criteria to moving animals involves identifying the task and performance measure in meaningful ways, given the animals’ spatial context and scale of movement. Survival and reproduction are the ultimate tasks, but foraging, resting, finding a mate, and avoiding predation are all proximal tasks. Nonetheless, the framework helpfully and unambiguously associates movement in the environment with training experience. Table 2 cross-references a machine-learning example with field studies that provided experimental evidence of learning.

Machine learning may suggest new avenues for research in learning and animal movement. Active topics include feature extraction, in which derived values are intended to be informative and non-redundant (for example, preference for exploring as yet unvisited locations in mice or composition of feeding groups in jackdaws), and feature selection, which is the choice of a subset of goal-relevant features (for example, availability of resources for mice or foraging efficiency for jackdaws) (Valletta et al., 2017; Maekawa et al., 2020). These subjects must also play a role in the information processing associated with learning and animal movement; developing the connections may provide new insights.

A particularly interesting learning challenge involves updating the world model (as described above) in a familiar rather than novel landscape. For example, in the foraging models of Bracis et al. (2015, 2018), the task is maximization of instantaneous energy intake, the training experience is the movement (together with the acquisition of information for updating the cognitive map), and the performance measure is the amount of forage obtained. This challenge can be connected to that of online statistical machine learning (Box 1), where data become available in a sequential order and are used to update the best predictor for future data at each step.

Could machine learning move beyond an analogy by providing specific hypotheses about the way animals learn to move? It has done so, but the cases are few. By way of example, foraging bumblebees were manipulated in a laboratory environment by presentation with artificial blue and yellow flowers dispensing sucrose solution according to probabilistic reward schedules, and their sampling strategy was compared to the results under the equivalent two-armed bandit reinforcement learning decision rules (Keasar et al., 2002). These decision rules describe optimal behavior of gamblers choosing repeatedly between options that differ in reward probability, without any prior information. In this case, the bees’ behaviors were generally consistent with the decision rule predictions.
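To give a sense of what a two-armed bandit learner looks like, the sketch below implements an epsilon-greedy rule. This is a simple heuristic chosen for brevity; it is neither the optimal decision rule referred to above nor the specific rule tested by Keasar et al. (2002). The reward probabilities, the exploration rate `epsilon`, and the function name `run_bandit` are assumptions.

```python
import random

def run_bandit(reward_probs, n_visits=500, epsilon=0.1, seed=1):
    """Simulate an epsilon-greedy forager choosing between two flower types."""
    rng = random.Random(seed)
    counts = [0, 0]        # visits to each flower type
    values = [0.0, 0.0]    # running estimate of reward probability per type
    for _ in range(n_visits):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                      # explore at random
        else:
            arm = 0 if values[0] >= values[1] else 1    # exploit best estimate
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: move the estimate toward the observed reward.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

# Hypothetical schedule: blue flowers reward 30% of visits, yellow 70%.
counts, values = run_bandit((0.3, 0.7))
print(counts, [round(v, 2) for v in values])
```

The learner starts with no prior information, samples both options, and gradually concentrates its visits on the option with the higher estimated payoff, which is the qualitative pattern against which observed sampling behavior can be compared.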

Learning About Learning: Methods and Approaches

Experimental vs. Observational Frameworks for Gathering Evidence of Learning in Movement

Researchers have inferred connections between learning and animal movement via classical experiments, observational studies, and translocation/reintroduction efforts. These diverse data types provide distinct insights into how movement can be used to infer learning.

Experimental Studies

Informative experimental studies of learning and movement derive from both field and laboratory settings (Jacobs and Menzel, 2014). Many experimental studies involve insects. Indeed, study of insect navigation propelled much of the early understanding of animal behavior and movement and includes work by Nobel Prize winners Tinbergen and von Frisch. Examples range from moving landmarks to show the effects on navigation to food sources (Wystrach and Graham, 2012) to displacing individuals to show the effects on path integration when returning to an organizing center (Collett and Collett, 2000). Experimental resource manipulations have been used to demonstrate that hummingbirds can learn abstract concepts like spatial position (Henderson et al., 2006) and can encode spatial location on the basis of surrounding landmarks (Flores-Abreu et al., 2012). When applied to roe deer, experimental resource manipulation in a field environment demonstrates that memory, rather than perception, drives foraging decisions (Ranc et al., 2020). Elsewhere, Preisler et al. (2006) tracked elk movements in relation to experimental treatments involving all-terrain vehicles (ATV). They found that elk were more likely to respond to ATVs when on an ATV route, even if the ATV was far away. These data suggest that elk have learned to associate ATV presence with their routes.

In laboratory settings, radial mazes and water mazes (e.g., Leonard and McNaughton, 1990) have been used to study how quickly rodents can learn movement routes and improve their efficiency. Elsewhere, laboratory arenas built for insects have demonstrated that pesticide exposure can impair spatial learning of resource locations by bumblebees (Stanley et al., 2015).

Sometimes field and laboratory experiments can be combined with great benefit, including comparisons among three classic model systems (homing pigeons, bees, and rats; Jacobs and Menzel, 2014). For example, experimental lesioning studies of young homing pigeons, followed by release in unfamiliar areas, demonstrate that immature birds are very good at learning movement routes and that there is a consolidation phase during which experiences (e.g., encounters with landmarks) are neurally encoded (Bingman et al., 2005).

Observational Studies

To assess learning in observational studies, researchers must analyze how an animal behaves at a given time based on local conditions and past experiences. Observational studies typically record the location of animals and thus their experiences over relatively long time-frames (e.g., multiple years, or entire lifetimes). Remotely sensed geographic and climatological data then provide the local conditions the animal is experiencing during movement. Additional information on the behavioral and physiological states of the animal may also be relevant. Fortunately, the ongoing evolution in remote animal tracking and sensing technology means that researchers are increasingly able to infer physiological and behavioral states over long periods of time (Kays et al., 2015).

Data on repeated movement patterns can help differentiate learning hypotheses. For example, data on repeated migration routes have helped distinguish whether animals follow resource gradients, rely on memory to navigate, or learn from experience to shape their movement decisions (Mueller et al., 2013; Merkle et al., 2019). However, long-term tracking data may also be sufficient for analysis. For example, wolf movement data have identified how animals follow resource levels, but that they may also rely on the memory of time since last visit to a location (Schlägel et al., 2017). Augmenting tracking data with information that the animals might gather, for example the location of kill sites (Gurarie et al., 2011) or profitable forage patches (Merkle et al., 2014), can further enhance our understanding of how animals monitor their environment (Gurarie et al., 2011).

Comparative studies can be useful for identifying instances of learning. For example, comparing the movement efficiency of juveniles and adults shows that seabirds start by exploring their landscape and then learn to identify the good foraging areas and cues as adults (de Grissac et al., 2017; Votier et al., 2017; Grecian et al., 2018; Wakefield et al., 2019). Effects of early-life experience can be identified by analyzing the site fidelity of animals to their breeding ground (Weinrich, 1998) and by comparing the migration patterns of offspring to those of their mothers (Colbeck et al., 2013). Finally, comparing the movement of cultural groups, especially if sympatric, can help to assess the effect of culturally transmitted information on animals’ space use (Kendal et al., 2018; Owen et al., 2019).

Translocations and Reintroductions

Some management actions involve human-aided displacements of animals, either from captivity (reintroductions) or from wild populations (translocations). Tracking the animals released in such manipulations can provide unique opportunities to understand how the animals adapt to their new environments (He et al., 2019). For example, recurring short displacements (such as when animals are repeatedly taken to the same sampling station for physiological samples) can be used to assess how quickly the animal learns the return route to its home range (Biro et al., 2007).

Translocations of animals into existing populations can aid understanding of learning when movement behaviors of individuals new to the environment can be compared to those of already-resident individuals. For example, quantifying the rate of convergence of movement metrics between new arrivals and residents could help estimate learning rates. In addition, if translocated animals, such as elk, are sourced from areas that differ in predation risk (or other factors) but released in a common space, comparison of the survival and movement patterns could be useful to understanding how previous experience shapes learning (Frair et al., 2007). Translocations of social animals may also create opportunities for newly arrived individuals to learn from resident conspecifics (Dolev et al., 2002).
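One way to quantify such convergence, offered here only as a sketch, is to assume that the gap between a movement metric for new arrivals and the same metric for residents shrinks exponentially, and to estimate the decay rate by least squares on the log-transformed gap. The exponential form, the function name `convergence_rate`, and the synthetic data are assumptions; none of this is drawn from the cited translocation studies.

```python
import math

def convergence_rate(days, gap):
    """Estimate r in gap(t) = gap0 * exp(-r * t) from strictly positive gaps."""
    logs = [math.log(g) for g in gap]
    n = len(days)
    mean_t = sum(days) / n
    mean_l = sum(logs) / n
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(days, logs))
    sxx = sum((t - mean_t) ** 2 for t in days)
    return -sxy / sxx   # negative slope of log(gap) against time

# Synthetic example: gap in daily displacement (km) between translocated and
# resident animals, generated with a known rate so recovery can be checked.
true_rate = 0.05
days = [0, 10, 20, 30, 40, 50]
gap = [4.0 * math.exp(-true_rate * t) for t in days]
print(round(convergence_rate(days, gap), 4))
```

With real tracking data the gap would be noisy and might change sign, so a nonlinear fit with an error model would be more appropriate than this log-linear shortcut.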

Overall, comparing movements of animals in novel environments over years or even generations with historical populations can reveal learning and cultural transmission and identify the rate at which animals gain knowledge. For example, Jesmer et al. (2018) found that it took multiple decades for translocated bighorn sheep and moose to regain the capacity to identify and follow the optimal forage gradients that existed in their landscapes as they migrated. Likewise, tracking the movement of prey species before and after the introduction of predators into a landscape affords unique opportunities for investigating how animals learn to avoid predators (Ford et al., 2015).

Uncontrolled Experiments

Beyond intentional displacements, other management actions can serve as uncontrolled experiments for learning. For example, aversive conditioning, which is routinely used in wildlife conflict management, could provide guidance on the mode of learning (Bejder et al., 2009) and may help contrast the efficacy of different deterrence systems. Ronconi and Clair (2006) showed that presence-activated deterrent systems were more useful than were randomly activated systems for limiting the landing of waterfowl on tailings ponds from oil extraction. Likewise, fences involving bee hives were more likely to turn away elephants than were bush fences (King et al., 2011) and problem elk repeatedly chased by humans and dogs stayed further from town (Kloppers et al., 2005).

Rapid changes in habitat can also serve as uncontrolled experiments. For example, because ungulates will select recently burned areas (Allred et al., 2011), monitoring animal movement in fire-prone systems could help understand how these animals learn about and navigate to novel habitats. Studying movement in the vicinity of new obstacles (e.g., pipelines and roads) and passageways (e.g., road-crossing structures) could help to understand how animals change their spatial patterns as they learn to circumvent barriers and make use of new structures (McDonald and Clair, 2004; Ford and Clevenger, 2018).

Identifying and Characterizing Learning

Analytical and computational tools have a special role to play in the context of learning and animal movement. They can be used both to develop new theory, and in inference regarding actual movement behaviors.

Modeling Frameworks for Exploring How Learning Operates

Dynamical systems models are often used to investigate learning and animal movement in a purely theoretical context (Table 3). The most common purpose is to investigate possible emergent patterns, which arise from the inclusion of learning in movement models. Here spatial location and spatial memory are given by variables that change in time and space, and dynamical rules postulate how these variables could change through the interplay of movement and learning. The actual form of the dynamical systems ranges from difference equations used to analyze home ranges (van Moorter et al., 2009), to “record-keeping” models of cognitive maps based on incremental experiences (Spencer, 2012), to partial differential equations used to analyze searching ability (Berbert and Lewis, 2018), to stochastic processes used to investigate patrolling ability (Schlägel and Lewis, 2014). Agent-based simulations have also been used to track the development of complex spatial movement behaviors via learning (Tang and Bennett, 2010; Avgar et al., 2013). A review of the ways in which decisions can be integrated into agent-based models is given in DeAngelis and Diaz (2018). Often a balance is required between current perceptual information and memories of long-term averages, and between random exploration and determinism when exploiting resources (see Boyer and Walsh, 2010; Bracis and Mueller, 2017). When it comes to the sharing of information between individuals, ephemeral public information about resource locations can lead to permanent aggregations of memory-based foragers that move via circuits (traplines) (Riotte-Lambert and Matthiopoulos, 2019), and sometimes the rules for near-optimal traplines can be developed based on simple heuristics (Lihoreau et al., 2013). Theoretical studies can investigate relationships or feedbacks between movement and learning that generate patterns similar to those seen in nature. They can also be used to explore the environments in which learning might confer benefits. Intriguingly, in the face of an uncertain heterogeneous environment, it may be better for individuals to overestimate environmental quality, as optimistic animals can learn the true value of the environment faster, allowing for a higher rate of exploration (Berger-Tal and Avgar, 2012). Theoretical explorations are particularly useful for studying the “updating the world model” type of learning, where it is more difficult to make a clear distinction between precipitating events of experiences and movement outcomes in observational data.

Table 3. Models for learning and animal movement.

Machine learning is emerging as a powerful paradigm for the analysis of many biological systems. In the context of learning and animal movement, these approaches can map environmental conditions to movement behavior outcomes without necessarily investigating the learning process itself (see, for example, Mueller et al., 2011; Wijeyakulasuriya et al., 2020). Furthermore, as described earlier, machine learning algorithms can serve as prototype models for the process of animal learning itself.

Testing for Change Over Time in Key Movement Metrics

Across diverse data types, a key indicator of learning is a change quantified as a function of “time in the environment” (Figure 3). While not sufficient to say confidently that learning has occurred, a strong signal that an animal’s movement behavior has changed with experience suggests that it is learning. For example, a group of newly translocated animals would be expected to stay very close to their point of release as they focus on learning attributes of their new environment, but to wander more widely as time since release increases and they start to exploit their new environment (e.g., total daily displacement, He et al., 2019). It has been proposed that Lévy walks may arise from a learning process wherein animals attempt to learn optimally from their environment (Namboodiri et al., 2016). In this situation the change from simple random (Brownian) motion to a Lévy walk pattern of movement could be interpreted as learning (but see, for example, Benhamou and Collet, 2015 for a critique of this type of formalism).

Figure 3. Exemplar movement patterns associated with learning. We represent clusters of movement activity as squiggles and range displacement events as periods of directed motion. In each of the three examples, the process of learning alters the pattern of movement in a statistically detectable manner. Exploration becomes exploitation through repeated visitation (top row). Conditioned responses to habitat elements may manifest as before/after displacement events (middle row). Information gathering during a juvenile (or otherwise naïve) phase may yield improved efficiency of travel (bottom row). In all three examples, one or more key metrics will exhibit time-dependence (right column).

Decreases in the rate of range expansion over time indicate that translocated individuals may have learned to favor certain parts of the landscape. In this case, exploration shifts to an exploitation phase (Berger-Tal et al., 2014) as translocated animals exhibit a greater probability of revisiting previously visited areas in a goal-directed manner (Figure 3, top row), and may ultimately establish home ranges (Moorcroft and Lewis, 2006). Similarly, exposure to a hostile landscape element (e.g., human habitation) may condition wild animals to avoid such elements, altering their spatial distribution to favor locations far from habitation (Figure 3, middle row). This issue has been particularly well-investigated with elephants (Hoare and Du Toit, 1999; Cheptou et al., 2017).
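The shift from exploration to exploitation can be illustrated with a simple grid-based sketch: count the cumulative number of grid cells used (a crude proxy for range size) and flag each fix as a revisit or not. The grid resolution, the track, and the early/late split are all hypothetical choices made for illustration.

```python
def cell(pos, size=1.0):
    """Map an (x, y) position to a square grid cell of side `size`."""
    return (int(pos[0] // size), int(pos[1] // size))

def range_and_revisits(track, size=1.0):
    """Return the cumulative number of cells used and a revisit flag per fix."""
    seen = set()
    cumulative, revisit = [], []
    for pos in track:
        c = cell(pos, size)
        revisit.append(c in seen)  # True if this cell was used before
        seen.add(c)
        cumulative.append(len(seen))
    return cumulative, revisit

# Hypothetical track: six exploratory fixes, then repeated use of three cells.
track = [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5), (2.5, 1.5), (1.5, 1.5), (0.5, 1.5),
         (0.5, 0.5), (1.5, 0.5), (0.5, 0.5), (1.5, 1.5), (0.5, 0.5), (1.5, 0.5)]

cumulative, revisit = range_and_revisits(track)
half = len(track) // 2
early_rate = sum(revisit[:half]) / half
late_rate = sum(revisit[half:]) / (len(track) - half)
print(cumulative, early_rate, late_rate)
```

A cumulative curve that flattens while the revisit rate rises would be consistent with the exploration-to-exploitation shift described above; real analyses would also need to account for sampling rate and home-range estimation method.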

Animals that “sample” different landscapes during exploratory movements may ultimately settle in landscapes featuring the kinds of elements they encountered and exploited during the exploration phase. This can occur during dispersal, when animals effectively sample and make decisions in an environment about which they are completely naïve. Wolves that had greater exposure to and use of human elements during a dispersal phase, in particular relatively little-used forest roads, showed less avoidance of those elements in their new territories, suggesting that they might have learned that the benefits of using those human elements outweigh the risks (Barry et al., 2020). Translocation, which can be considered an artificial and more abrupt dispersal, also requires decision making in novel environments. Changes in movement behavior (and improved survival) were recorded following translocation of naïve elk from a savannah landscape in Alberta to a forested landscape in eastern Canada (Fryxell et al., 2008).

Migration can also feature time-dependence in characteristics of movement (Figure 3, bottom row). For example, both Mueller et al. (2013) and Jesmer et al. (2018) report changes in migration performance as a function of animals’ time in a landscape (Table 2). On smaller scales, foraging journeys from a central place and other kinds of daily activity patterns can show the same kind of performance gains (e.g., reduced tortuosity) as a function of experience or age (Franklin and Franks, 2012; de Grissac et al., 2017; Votier et al., 2017; Wakefield et al., 2019; Table 1). Resulting spatial patterns of movement can be complex, exhibiting increased speed and goal-directedness (Noser and Byrne, 2014) and even providing evidence of future-oriented cognitive mechanisms (Janmaat et al., 2014). Emerging patterns may include periodic recursions (Riotte-Lambert et al., 2013) as well as sequential movements, where locations are revisited in a regular order (Ayers et al., 2015, 2018; De Groeve et al., 2016; Riotte-Lambert et al., 2017).
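One common way to quantify tortuosity is the straightness index, the ratio of net displacement to total path length. The sketch below compares two hypothetical foraging trips; the coordinates are invented for illustration and are not drawn from the studies cited above.

```python
import math

def straightness(path):
    """Net displacement divided by path length (1 = perfectly straight)."""
    length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    if length == 0:
        return float("nan")  # undefined for a stationary animal
    return math.dist(path[0], path[-1]) / length

# Hypothetical outbound trips from a nest at (0, 0) to a food patch at (4, 0).
naive_trip = [(0, 0), (1, 2), (2, -1), (3, 2), (4, 0)]
experienced_trip = [(0, 0), (2, 0.2), (4, 0)]

print(f"naive: {straightness(naive_trip):.2f}")
print(f"experienced: {straightness(experienced_trip):.2f}")
```

Tracking this index across successive trips by the same individual would give one time-dependent metric of the kind shown in the right column of Figure 3.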

Statistical Inference to Identify Learning in Movement Processes

Analytical and computational tools may also be used to infer learning processes from data. For example, the step-selection function (SSF; Fortin et al., 2005) is of particular utility when location data are sampled at regular intervals, and it allows for inference of movement parameters that depend on different habitat types. Computationally efficient approaches such as integrated step selection analysis (iSSA; Avgar et al., 2016) provide practitioners a straightforward way to evaluate movement decisions against actual observations. A generalized form of the SSF, termed the coupled SSF (Potts et al., 2014), allows for the inclusion of memory and past social interactions as one or more spatio-temporal maps, sometimes referred to as cognitive maps. Although superficially similar to a changing habitat layer, the contents of the cognitive maps are particular to each individual as they are populated by information gleaned from the individual’s past experiences (Fagan et al., 2013). With such an SSF, one can test how the individual’s movement behavior is governed by cognitive maps whose contents arise from different types of memories or social interactions. Coupled SSFs have been used to test for evidence of memory (Polansky et al., 2015; Oliveira-Santos et al., 2016; Schlägel et al., 2017) and learning (Merkle et al., 2014) in animal movement patterns.
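To make the idea concrete, the following is a minimal sketch of the conditional-logit likelihood that underlies step-selection analysis, with one habitat covariate and one memory covariate (1 if the candidate location was previously visited). The data, the two covariates, and the plain gradient-ascent fit are all assumptions made for illustration; this is not the coupled SSF of Potts et al. (2014), and in practice one would use established conditional logistic regression software.

```python
import math

def log_likelihood(beta, strata):
    """Conditional-logit log-likelihood; candidate 0 in each stratum is the used step."""
    total = 0.0
    for candidates in strata:
        scores = [sum(b * x for b, x in zip(beta, c)) for c in candidates]
        total += scores[0] - math.log(sum(math.exp(s) for s in scores))
    return total

def gradient(beta, strata):
    """Used-step covariates minus their probability-weighted mean, summed over strata."""
    grad = [0.0] * len(beta)
    for candidates in strata:
        weights = [math.exp(sum(b * x for b, x in zip(beta, c))) for c in candidates]
        norm = sum(weights)
        for k in range(len(beta)):
            expected = sum(w * c[k] for w, c in zip(weights, candidates)) / norm
            grad[k] += candidates[0][k] - expected
    return grad

def fit(strata, n_cov, rate=0.1, iterations=2000):
    """Maximize the log-likelihood by fixed-step gradient ascent from zero."""
    beta = [0.0] * n_cov
    for _ in range(iterations):
        g = gradient(beta, strata)
        beta = [b + rate * gk for b, gk in zip(beta, g)]
    return beta

# Hypothetical data: each stratum lists (habitat, memory) covariates for the
# used step (first entry) followed by two available steps.
strata = [
    [(1, 1), (0, 0), (1, 0)],
    [(0, 1), (1, 0), (0, 0)],
    [(1, 0), (0, 1), (0, 0)],
    [(1, 1), (0, 1), (1, 0)],
    [(0, 0), (1, 1), (0, 0)],
]

beta_hat = fit(strata, n_cov=2)
print("habitat and memory coefficients:", beta_hat)
```

A positive memory coefficient in such a model would indicate that, all else being equal, previously visited locations are selected more often than expected from availability.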

Analysis via SSF assumes that animals’ location data are known without error. If error is significant, as it can be for marine systems, a different class of models, known as state-space models, is needed. State-space models are hierarchical and feature separate models for the movement process and the measurement error process. These models can be modified to include a hidden Markov process, whose latent state is determined by physiological status (e.g., searching or traveling) or by learning (Avgar et al., 2016). Such models, while flexible, may suffer from parameter estimability issues (Auger-Méthé et al., 2016) and must be implemented with care.
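The separation of movement and measurement processes can be sketched with the simplest possible case: a one-dimensional random walk observed with Gaussian error, filtered with a Kalman filter. The variances and observations below are hypothetical, and the sketch deliberately omits the hidden Markov (behavioral state) component described above.

```python
def kalman_filter(observations, process_var, obs_var, init_mean, init_var):
    """Filter a 1-D random walk observed with Gaussian measurement error."""
    mean, var = init_mean, init_var
    filtered = []
    for y in observations:
        # Movement process: x_t = x_{t-1} + process noise
        var = var + process_var
        # Measurement process: y_t = x_t + observation noise
        gain = var / (var + obs_var)
        mean = mean + gain * (y - mean)
        var = (1 - gain) * var
        filtered.append((mean, var))
    return filtered

# Hypothetical noisy positions along one axis.
observations = [0.0, 1.2, 0.8, 2.1, 1.9]
filtered = kalman_filter(observations, process_var=0.1, obs_var=1.0,
                         init_mean=0.0, init_var=1.0)
for y, (m, v) in zip(observations, filtered):
    print(f"observed {y:.1f} -> estimated {m:.2f} (variance {v:.2f})")
```

The filtered variance is always smaller than the observation variance, reflecting the information gained by combining the movement model with each noisy fix.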

Conclusion and New Horizons

Traditionally, studies of animal learning and movement have taken place in controlled laboratory environments or small-scale field studies. Thanks to animal tracking technologies, increasingly detailed observations of how free-ranging animals move and interact are possible, leading to opportunities to formulate and test new ideas about learning and movement. We summarize a variety of outstanding new opportunities as grand challenges in Box 4. However, potential pitfalls accompany this exciting development. Alternative explanations to learning must be considered, and if these alternatives cannot be ruled out, then we can only infer that observations are consistent with learning (Table 2).

BOX 4. Grand challenges in the study of animals learning to move.

How animals learn to move in novel environments. As a key form of experimental manipulation on animals in the wild, translocations and reintroductions have provided unique insights into the role of social learning of migration and the time-lags required to re-establish migration routes (Mueller et al., 2013; Jesmer et al., 2018). Designing efforts to collect pre-translocation movement data that could be compared with post-release data would allow insight into the ways animals learn to move in novel environments.

Social learning. Social learning is particularly hard to study in the context of animal movement because it requires simultaneous information on the location of multiple individuals (Fragaszy and Visalberghi, 2004; Sigaud et al., 2017). One promising approach for studies in this area involves the deployment of animal tracking collars with proximity detectors that can be used to characterize and quantify how known individuals spend their time near or far from other known individuals.

Near-term prediction of movement. Successful prediction of movement, even over modest time horizons of one or a few days, requires a strong, probabilistic representation of animals’ decision-making process. With such a representation in hand it would become possible to gauge how novel experiences shape subsequent movements.

Understanding fitness consequences of learning on population interactions. Learning about movement affects interactions with other individuals (conspecifics, predators, prey and so forth), as well as with the environment. While much has been done to connect individual learning to the environment via optimal foraging (Stephens and Krebs, 1987), there is not yet a comprehensive theory for the influence of learning about movement on population level interactions and the subsequent impacts of these interactions on individual fitness. A natural place to start investigating these feedbacks would be social insects.

Machine learning as a source for new testable hypotheses regarding animal learning and movement. This contrasts with simply providing an interesting analogy for the learning process. While the multi-armed bandit problem has been applied as a model for insect foraging (e.g., Keasar et al., 2002), there are few other cases. However, ML algorithms (for example, K nearest neighbors, decision trees) provide intriguing hypotheses for how learning could proceed. A good place to start would be to build on connections between the theory of ML and the theory of learning, such as the similarity between the reward-prediction error that the temporal difference algorithm from machine learning calculates (Sutton and Barto, 2017) and the Rescorla-Wagner learning rule in cognitive science (Rescorla and Wagner, 1972). To date, little has been done on applying machine learning in this way, but this is an intriguing area for future research.
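The connection between temporal difference learning and the Rescorla-Wagner rule mentioned in the final challenge of Box 4 can be sketched in a few lines. The version below assumes a single cue, a single learning rate, and a fixed reward (a simplification of the full Rescorla-Wagner model); the parameter values are hypothetical. Under these assumptions, a TD(0) update with no discounting of future value reduces to the Rescorla-Wagner update.

```python
def rescorla_wagner(v, reward, alpha):
    """One Rescorla-Wagner update of associative strength v (single cue)."""
    return v + alpha * (reward - v)

def td0(v_state, v_next, reward, alpha, gamma):
    """One TD(0) update of the value of the current state."""
    delta = reward + gamma * v_next - v_state  # reward-prediction error
    return v_state + alpha * delta

alpha, reward = 0.2, 1.0
v_rw = v_td = 0.0
for trial in range(20):
    v_rw = rescorla_wagner(v_rw, reward, alpha)
    # With gamma = 0 the next state's value is ignored, so the two rules coincide.
    v_td = td0(v_td, v_next=0.0, reward=reward, alpha=alpha, gamma=0.0)

print(v_rw, v_td)
```

With a nonzero discount factor the two rules diverge, because TD learning then also propagates value from later states, which is one way the ML formulation generates hypotheses that go beyond the classical rule.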

There are two possible approaches to ruling out alternative explanations to learning. First, field observations can be transformed into controlled experiments via manipulations, as in the hummingbird example in Table 2. While allowing for incisive analysis, this approach limits the scientific questions to those where such experiments can be set up. A second possible solution is to collect more direct data on the individual experiences over a lifetime, including the environmental features of locations animals visit, physiological measurements, and sensory data as made possible by daylight sensors and collar cameras.

Exciting approaches to studying learning and animal movement arise from “uncontrolled” experiments, specifically translocations, reintroductions, aversive conditioning, and rapid environmental change. Understanding learning in the context of relocations and environmental change may ultimately help with understanding how animals can adapt to an increasingly complex world, driven by elevated levels of anthropogenic impacts.

The emergence of machine learning as a dominant paradigm for solving human problems provides fertile ground for modeling and understanding learning from animal movement patterns. Here, processes such as reinforcement learning have close natural ties to animals learning to move to maximize fitness (e.g., optimal foraging). As machine learning algorithms are currently improving and evolving, we expect this field to shed light on further possible models for learning and animal movement. However, as described in the fifth Grand Challenge of Box 4, machine learning has yet to meet its full promise as a reliable source for new testable hypotheses regarding animal learning and movement. This is despite the recognition that animal cognition and communication can be closely tied to computational models (Ma, 2015) and that behavioral decisions can often be best formulated by simple algorithmic models (heuristics) (Hutchinson and Gigerenzer, 2005).

Overall, the subject of learning and animal movement is at a crucial point in development and a host of new possibilities are on the horizon. Our goal in this review has been to set the context for these new possibilities and point out some future directions.

Author Contributions

ML and WF designed and organized the review, secured the funding, and led writing of the manuscript. MA-M, JF, JMF, CG, EG, SH, and JM contributed to the review and helped write the manuscript. All authors contributed to the article and approved the submitted version.

Funding

We acknowledge the following grants for supporting this research: NSERC Discovery (ML and MA-M), NSF DMS-1853465 (WF and EG), and Canada Research Chairs Program (ML and MA-M).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We thank Banff International Research Station (BIRS) for hosting a Focused Research Group on Learning and Animal Movement in the spring of 2019. BIRS is an inspiring environment for research, with foraging elk right outside the window, and is where the review manuscript was conceived and started. We thank the reviewers and Handling Editor for helpful comments and suggestions, which improved the manuscript. O. Couriot, T. Hoffman, A. Swain, Y. Salmaniw, P. Thompson, and H. Wang provided helpful feedback on the manuscript. We thank K. Budinski for help with the references and citations.

References

Abrahms, B., Hazen, E. L., Aikens, E. O., Savoca, M. S., Goldbogen, J. A., Bograd, S. J., et al. (2019). Memory and resource tracking drive blue whale migrations. Proc. Natl. Acad. Sci. U.S.A. 116, 5582–5587. doi: 10.1073/pnas.1819031116

Alerstam, T., Hedenstrom, A., and Akesson, S. (2003). Long-distance migration: evolution and determinants. Oikos 103, 247–260. doi: 10.1034/j.1600-0706.2003.12559.x

Allred, B. W., Fuhlendorf, S. D., Engle, D. M., and Elmore, R. D. (2011). Ungulate preference for burned patches reveals strength of fire-grazing interaction. Ecol. Evol. 1, 132–144. doi: 10.1002/ece3.12

Arditi, R., and Dacorogna, B. (1988). Optimal foraging on arbitrary food distributions and the definition of habitat patches. Am. Nat. 131, 837–846. doi: 10.1086/284825