Intrinsic motivations and open-ended development in animals, humans, and robots: an overview

This editorial article introduces the Frontiers Research Topic and Electronic Book (eBook) on Intrinsic Motivations (IMs), which involved the publication of 24 articles with the journals Frontiers in Psychology – Cognitive Science and Frontiers in Neurorobotics. The main objective of this Frontiers Research Topic is to present state-of-the-art research on IMs and open-ended development from an interdisciplinary perspective involving human and animal psychology, neuroscience, and computational perspectives. We first introduce in this section the main themes and concepts on IMs from different interdisciplinary perspectives. These themes and concepts have been reviewed more extensively in other works (e.g., see Barto et al., 2004; Oudeyer and Kaplan, 2007; Mirolli and Baldassarre, 2013; Barto, 2013), but they are briefly reported here both to meet the needs of the reader new to the field and to introduce the concepts and terms we use in the succeeding sections. In the next four sections, we give an overview of the Topic contributions grouped by four themes. A final section draws the conclusions. 
 
Autonomous development and lifelong open-ended learning are hallmarks of intelligence. Higher mammals, and especially humans, engage in activities that do not appear to directly serve the goals of survival, reproduction, or material advantage. Rather, many activities seem to be carried out “for their own sake” (Berlyne, 1966), play being a prime example, but including other activities driven by curiosity and interest in novel stimuli or surprising events. Autonomously setting goals and working to acquire new forms of competence are also examples of activities that often do not confer obvious evolutionary benefit. Activities like these are thus said to be driven by intrinsic motivations (Baldassarre and Mirolli, 2013a). IMs facilitate the cumulative and virtually open-ended acquisition of knowledge and skills that can later be used to accomplish fitness-enhancing goals (Singh et al., 2010; Baldassarre, 2011). IMs continue during adulthood, and they underlie several important human phenomena such as artistic creativity, scientific discovery, and subjective well-being (Ryan and Deci, 2000b; Schmidhuber, 2010). 
 
IMs were proposed within the animal literature to explain aspects of behavior that could not be explained by the dominant theory of motivation postulating that animals work to reduce physiological imbalances (Hull, 1943). The term “intrinsic motivation” was first used to describe a “manipulation drive” hypothesized to explain why rhesus monkeys would engage with mechanical puzzles for long periods of time without receiving extrinsic rewards (Harlow et al., 1950). Other studies showed how animal instrumental actions can be conditioned with the delivery of apparently neutral stimuli: for example, monkeys were trained to perform actions to gain access to a window from which they could observe conspecifics (Butler, 1953), and mice were trained to perform actions that resulted in clicks or in moving the cage platform (Kish, 1955). The psychological literature on IMs initially linked them to the perceptual properties of stimuli, such as their complexity, novel appearance, or surprising features (Berlyne, 1950, 1966). Later, IMs were also related to action, in particular to the competence (“effectance”) that an agent can acquire to willfully make changes in its environment (White, 1959). This relation of IMs with action and their effects was later linked to the possibility of autonomously setting one's own goals (Ryan and Deci, 2000a). 
 
Computational approaches, in particular machine learning and autonomous robotics, are concerned with IMs and open-ended development because these are thought to have the potential to lead to the construction of truly intelligent artificial systems, in particular systems that are capable of improving their own skills and knowledge autonomously and indefinitely. The relation of these studies with those on IMs in psychology was first highlighted by Barto et al. (2004) and Singh et al. (2005). The investigation of IMs from a computational perspective can lead to theoretical clarifications, in particular with respect to the computational mechanisms and functions that might underlie IMs (Mirolli and Baldassarre, 2013). IM mechanisms have been classified as being either knowledge-based or competence-based (Oudeyer and Kaplan, 2007): the former are based on measures related to the acquisition of information, and the latter on measures related to the learning of skills. More recently, knowledge-based IMs have been further divided into novelty-based IMs and prediction-based IMs (Baldassarre and Mirolli, 2013b; Barto et al., 2013). Novelty-based IMs are elicited by the experience of stimuli that are not in the agent's memory (e.g., novel objects, or novel object-object or object-context combinations); prediction-based IMs are related to events that surprise the agent by violating its explicit predictions.
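The distinction between novelty-based and prediction-based IMs can be illustrated with a minimal Python sketch. It is an illustration only and is not taken from any of the cited works: the class names `NoveltySignal` and `SurpriseSignal`, the count-based memory, and the one-entry-per-context predictor are all assumptions chosen for brevity.

```python
from collections import Counter


class NoveltySignal:
    """Novelty-based signal: high for stimuli that are absent from memory."""

    def __init__(self):
        self.memory = Counter()  # how often each stimulus has been experienced

    def __call__(self, stimulus):
        count = self.memory[stimulus]
        self.memory[stimulus] += 1
        # 1.0 for a never-seen stimulus, decaying with repeated exposure
        return 1.0 / (1.0 + count)


class SurpriseSignal:
    """Prediction-based signal: high when an explicit prediction is violated."""

    def __init__(self):
        self.prediction = {}  # context -> outcome the agent currently expects

    def __call__(self, context, outcome):
        expected = self.prediction.get(context)
        self.prediction[context] = outcome  # simplest possible predictor update
        if expected is None:
            return 0.0  # no explicit prediction yet, so nothing is violated
        return 1.0 if expected != outcome else 0.0


novelty = NoveltySignal()
surprise = SurpriseSignal()
print(novelty("red ball"), novelty("red ball"))  # first exposure vs. repeat
print(surprise("press lever", "click"))          # no expectation yet
print(surprise("press lever", "light"))          # expectation violated
```

In this sketch a stimulus can be familiar yet surprising (a known outcome in an unexpected context) or novel yet unsurprising (a new stimulus about which no prediction existed), which is the reason for keeping the two signals separate.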
 
These distinctions have been formalized in the computational models proposed in the literature. Seminal works in machine learning (Schmidhuber, 1991), later developed to function in robots (Oudeyer et al., 2007), have proposed algorithms rewarding actions that allow the agent to improve the quality of a “predictor” component with which it anticipates the effects that such actions produce on the environment. Other researchers have proposed robots capable of detecting and focussing on novel stimuli (e.g., Marsland et al., 2005), or systems capable of detecting anomalies in datasets (Nehmzow et al., 2013). Additional research threads have focussed on action and control, in particular on IMs guiding the autonomous acquisition of motor skills (Barto et al., 2004), on the decision about which of several skills to practice at any time (Schembri et al., 2007; Santucci et al., 2013), and on the autonomous formation of goals guiding skill acquisition (Baranes and Oudeyer, 2013). Other computational mechanisms related to the idea of IMs are being proposed in the growing field of active learning, in particular in relation to supervised learning systems (Settles, 2010).
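The idea of rewarding actions that improve the quality of a predictor can be sketched as follows. This is a toy illustration under stated assumptions, not the algorithm of Schmidhuber (1991) or Oudeyer et al. (2007): the class name `PredictionProgressReward`, the scalar effects, the running-average predictor, and the two-window comparison of errors are all hypothetical simplifications.

```python
from collections import defaultdict, deque


class PredictionProgressReward:
    """Intrinsic reward = decrease of the predictor's error over time.

    For each action, the mean prediction error over an older window is
    compared with the mean over the most recent window of equal size.
    """

    def __init__(self, learning_rate=0.5, window=2):
        self.learning_rate = learning_rate
        self.window = window
        self.predicted_effect = defaultdict(float)  # action -> expected effect
        self.errors = defaultdict(lambda: deque(maxlen=2 * window))

    def step(self, action, observed_effect):
        # Error of the predictor before it learns from this experience
        delta = observed_effect - self.predicted_effect[action]
        self.predicted_effect[action] += self.learning_rate * delta

        history = self.errors[action]
        history.append(abs(delta))
        if len(history) < 2 * self.window:
            return 0.0  # not enough experience to measure progress yet

        errors = list(history)
        older = sum(errors[: self.window]) / self.window
        recent = sum(errors[self.window:]) / self.window
        return older - recent  # positive while the predictor is improving


model = PredictionProgressReward()
rewards = [model.step("push", 1.0) for _ in range(6)]
print(rewards)
```

With a fully learnable effect, as in the usage lines above, the reward is positive while the predictor improves and shrinks toward zero once the effect has become predictable, so the agent loses interest in what it has already learned.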
 
Recent neuroscientific investigations are revealing brain mechanisms that possibly underlie the IM systems investigated in the behavioral and computational literature. Unfortunately, such investigations are carried out under agendas different from the one on IMs, e.g., in relation to dopamine, memory, motor learning, goal-directed behavior, and conflict monitoring, so comprehensive views are still missing. A large body of research shows how the hippocampus, a compound brain system playing pivotal functions for memory, has the capacity to detect the novelty of various aspects of experience, from the novelty of single items to the novelty of item-item and item-context associations (Ranganath and Rainer, 2003; Kumaran and Maguire, 2007). This detection is then capable of triggering the release of neuromodulators, such as dopamine, that modulate the functioning and learning processes of the hippocampus itself and other brain areas, e.g., of the frontal cortex involved in higher cognition, action planning, and action execution (Lisman and Grace, 2005). Other studies have shown that unexpected stimuli can activate the superior colliculus, a midbrain structure that plays a key role in oculomotor control, which in turn causes phasic bursts of dopamine affecting trial-and-error learning processes happening in the basal ganglia, a brain region known to be involved in learning to select actions and other cortical contents (Redgrave and Gurney, 2006). Dopamine signals have also been shown to have an interesting direct relationship with information seeking (Bromberg-Martin and Hikosaka, 2009). Noradrenaline, another neuromodulator targeting a large part of the brain, has been shown to be involved in signaling violations of the agent's expectations (Sara, 2009).
The failure (Carter et al., 1998) or success (Ribas-Fernandes et al., 2011) in accomplishing goals and sub-goals, possibly themselves set by IMs, has been shown to have neural correlates that might affect succeeding motivation, engagement, and learning. Bio-inspired/bio-constrained computational modeling is linking some of these neuroscientific results to specific computational mechanisms, e.g., in relation to dopamine (e.g., see the pioneering work of Kakade and Dayan, 2002, and Mirolli et al., 2013) and goal-directed behavior (Baldassarre et al., 2013).
 
The 24 interdisciplinary contributions to the present Research Topic can be clustered into four groups. The first group of six contributions (IMs and brain and behavior) focuses on different types of IM mechanisms implemented in the brain. The second group of five contributions (IMs and attention) focuses on the role of IMs in attention. The third group of eight contributions (IMs and motor skills) focuses on IMs as drives for the acquisition of manipulation and navigation skills, often with an emphasis on their function in enabling cumulative, open-ended development. Finally, the fourth group of five contributions (IMs and social interaction) focuses on the relationship between IMs and social phenomena, a novel area of investigation of IMs that is increasingly attracting the attention of researchers.



INTRINSIC MOTIVATIONS, BRAIN AND BEHAVIOR
The theoretical contribution of Barto et al. (2013) argues for the importance of distinguishing between novelty and surprise on the basis of a comprehensive analysis of the computational literature related to the two. It then shows the utility of the distinction for improved understanding of brain and behavior phenomena where the two are often confused. Andringa et al. (2013) present a broad view of possible relationships between IMs and control, exploration, and agency, linking these processes to the specialization of the left and right hemispheres of the brain and showing how the interplay between these can lead to a progressive sophistication of cognition. Shah and Gurney (2014) propose a computational model that investigates how the basal ganglia, modulated by IMs, can lead to a dynamical shift from noise-based exploration to repetition that can support the acquisition of both simple and more complex motor skills (in the present case, simulated reaching skills). Boedecker et al. (2013) propose a computational model based on the distinction between dorsal and ventro-medial basal ganglia regions (supporting respectively habitual and goal-directed behavior). Through the model, the authors analyze the relation between these brain regions and IMs concerning reasoning costs and the value of information. This analysis is used to account for some empirical phenomena concerning the relationship between extrinsic motivations and IMs. Fiore et al. (2014) propose a biologically-constrained computational model that also focuses on different portions of the basal ganglia. The model shows how these regions can be differentially regulated by a unique tonic dopaminergic signal, linked to both intrinsic and extrinsic motivations, on the basis of their different sensitivity to dopamine.
The model, also tested with the simulated humanoid robot iCub, shows how these modulatory mechanisms can play important adaptive functions for the control of overt attention, manipulation, and goal-directed processes. Thirkettle et al. (2013) introduce the novel "Joystick experimental paradigm" developed to study intrinsically and extrinsically driven acquisition of actions. The authors demonstrate the function and effectiveness of this paradigm by presenting behavioral experiments grounded in the neuroscientific literature and concerning the acquisition of non-trivial motor actions.

INTRINSIC MOTIVATIONS AND ATTENTION
The computational work of Lonini et al. (2013) builds on a previous binocular system in which an IM learning signal is generated on the basis of the capacity of the system to reconstruct images encoded with sparse-coding features. This signal guides the acquisition of attention and vergence skills by reinforcement learning. The contribution here focuses on demonstrating the robustness of the system, in particular for recovering from disturbances and for self-recalibration. Di Nocera et al. (2014) present a behavior-based architecture that uses curiosity drives to improve the attentional capabilities of a reinforcement learning robot engaged in solving simulated survival "extrinsic" tasks. Overall, the work shows the utility of IMs to improve attention and, based on this, action selection. Mather (2013) briefly reviews research related to the familiarity-to-novelty attention shift observed in babies, and, on this basis, highlights the challenges that this phenomenon poses to theories on IMs. Perone and Spencer (2013) also deal with the familiarity-to-novelty shift. In particular, the authors propose a dynamical-field model that offers an explanation of the phenomenon as emerging from the autonomous accumulation of visual experience under the guidance of novelty-based IMs. Schlesinger and Amso (2013), referring to the results of tests of both human and computational agents engaged in solving a visual-exploration task, propose that free viewing of natural images in human infants can be understood as the effect of intrinsically motivated visual exploration driven by the goal of producing predictable gaze sequences. The authors highlight the implications of their approach for understanding visual development in infants.

INTRINSIC MOTIVATIONS AND MOTOR SKILLS
Santucci et al. (2013) focus on the problem of which IM signals are best suited to decide which skills to learn by reinforcement learning given a set of tasks.
By comparing the results of systems receiving different IM signals, they show that the best IM signals are those based on mechanisms that measure the improvement of the skill competence rather than the errors, or error improvements, of predictors of the action effects on the environment. In a theoretical machine learning contribution, Schmidhuber (2013) proposes a system that automatically invents computational problems in order to train an increasingly general problem solver. IM signals driving learning are generated when the system finds more efficient skills to solve all the problems generated thus far. In a similar vein, Ngo et al. (2013) propose an architecture for controlling a simulated and a real Katana robot interacting with a blocks-world. The system is capable of self-generating goals based on its confidence in its predictions about how the environment will react to its actions. Zahedi et al. (2013) propose the use of task-independent IMs to support task-dependent learning on the basis of the mutual information of the past and future elements of sensor streams (predictive information). The authors conclude that a combination of predictive information with external rewards is recommended only for hard tasks, where it speeds up learning but at the cost of a loss in asymptotic performance. Metzen and Kirchner (2013) propose a reinforcement learning model that self-generates tasks on the basis of graphs of states and selects the skills to learn on the basis of both novelty-based and prediction-based IMs. The system is tested with navigating and octopus-like simulated robots acting in continuous domains. Inspired by infant cognition, Pitti et al. (2013) present a reinforcement-learning bio-inspired gain-fields system for learning task-sets (areas of the sensorimotor space having a common underlying cause-effect structure).
The system, tested in a cognitive task and with a Kinova robot arm, is capable of recognizing a given task-set as familiar and can create a new representation for it on the basis of its uncertainty and related prediction errors. Frank et al. (2014) propose a system for controlling the humanoid robot iCub that explores the state-action space on the basis of information gain maximization so as to improve the learning of the world model used for real-time motion planning. Law et al. (2014) present a schema-based memory system inspired by child early sensorimotor development for controlling the iCub robot. The system undergoes a staged learning process to acquire eye-arm reaching skills and basic manipulation skills under the guidance of novelty- and prediction-based IMs, and the progressive release of constraints focussing attention and learning on relevant experiences.
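The notion of predictive information mentioned above, i.e., the mutual information between past and future elements of a sensor stream, can be made concrete with a small Python sketch. It is a simplified illustration and not the estimator used by Zahedi et al. (2013): it assumes a discrete sensor stream, restricts "past" and "future" to a single time step each, and uses a plain plug-in estimate; the function name `predictive_information` is hypothetical.

```python
import math
from collections import Counter


def predictive_information(stream):
    """Plug-in estimate, in bits, of I(s_t; s_{t+1}) for a discrete stream."""
    pairs = list(zip(stream[:-1], stream[1:]))  # (past, future) samples
    n = len(pairs)
    joint = Counter(pairs)
    past = Counter(p for p, _ in pairs)
    future = Counter(f for _, f in pairs)

    info = 0.0
    for (p, f), count in joint.items():
        # p(p, f) * log2( p(p, f) / (p(p) * p(f)) ), written with raw counts
        info += (count / n) * math.log2(count * n / (past[p] * future[f]))
    return info


print(predictive_information([0, 1, 0, 1, 0, 1, 0, 1, 0]))  # alternating stream
print(predictive_information([0, 0, 0, 0, 0]))              # constant stream
```

An alternating stream is both varied and predictable and therefore carries high predictive information, whereas a constant stream carries none: the measure rewards sensor histories that are rich yet still allow the future to be anticipated from the past.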

INTRINSIC MOTIVATIONS AND SOCIAL PHENOMENA
In a contribution based on game theory, Merrick and Shafi (2013) propose the concept of "optimally motivating incentive" for game players, and show how different instances of such an incentive (i.e., strong power, affiliation, and achievement motivation) can be used in both modeling human behavior and designing effective artificial agents. The theoretical contribution of Triesch (2013) starts from the idea of IMs serving the function of learning "efficient coding" of sensory data and proposes that imitation can emerge as the consequence of a general intrinsic drive to compress information that leads to matching one's own actions with those of the imitated tutor. Moulin-Frier et al. (2013) propose a model of the initial staged development of speech in infants. IMs initially drive the system to learn the control of phonation, then to produce unarticulated sounds, and finally to produce proto-syllables. The model is tested with a simulator of the vocal tract, the auditory system, the agent's motor control, and social interactions with peers. The contribution of Ogino et al. (2013) proposes a reinforcement learning model of parent-child engagement where learning signals, similar to phasic dopamine signals, are caused by both extrinsic and intrinsic information, in particular related to the presence and novelty of emotional facial expressions. Finally, Jauffret et al. (2013) propose a bio-inspired neural architecture that uses a prediction-based algorithm applied to sensorimotor contingencies to solve complex navigation tasks and is capable of asking for help in dead-lock situations.

CONCLUDING REMARKS
The papers of the present Research Topic testify to the existence of ample interest in the Topic issues. At the same time, they show that the literature on IMs is still characterized by a heterogeneity of perspectives on their possible roles in cognition and behavior and on the possible mechanisms supporting them. On the one hand, this heterogeneity is expected given the recency of the attempts to systematize the psychological, neuroscientific, and computational views on IMs within broad interdisciplinary frameworks. On the other hand, the heterogeneity is also an indication of the richness of intrinsically motivated phenomena, of their importance for animals' cognition and behavior, and of their utility for the design of autonomous robots and intelligent machines. The richness of this topic is expected to result in a further strengthening of the research in the field over the near future.

www.frontiersin.org | September 2014 | Volume 5 | Article 985