Intrinsic motivations and open-ended development in animals, humans, and robots: an overview
- 1Laboratory of Computational Embodied Neuroscience, Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
- 2Department of Psychology, University of Sheffield, Sheffield, UK
- 3Department of Clinical and Social Sciences in Psychology, University of Rochester, Rochester, New York, USA
- 4Department of Computer Science, University of Massachusetts Amherst, Massachusetts, USA
This editorial article introduces the Frontiers Research Topic and Electronic Book (eBook) on Intrinsic Motivations (IMs), which comprises 24 articles published in the journals Frontiers in Psychology – Cognitive Science and Frontiers in Neurorobotics. The main objective of this Research Topic is to present state-of-the-art research on IMs and open-ended development from an interdisciplinary perspective spanning human and animal psychology, neuroscience, and computational approaches. In this section, we first introduce the main themes and concepts on IMs from these different perspectives. These themes and concepts have been reviewed more extensively in other works (e.g., see Barto et al., 2004; Oudeyer and Kaplan, 2007; Mirolli and Baldassarre, 2013; Barto, 2013), but they are briefly reported here both to meet the needs of the reader new to the field and to introduce the concepts and terms we use in the succeeding sections. In the next four sections, we give an overview of the Topic contributions grouped by four themes. A final section draws the conclusions.
Autonomous development and lifelong open-ended learning are hallmarks of intelligence. Higher mammals, and especially humans, engage in activities that do not appear to directly serve the goals of survival, reproduction, or material advantage. Rather, many activities seem to be carried out “for their own sake” (Berlyne, 1966), play being a prime example, but including other activities driven by curiosity and interest in novel stimuli or surprising events. Autonomously setting goals and working to acquire new forms of competence are also examples of activities that often do not confer obvious evolutionary benefit. Activities like these are thus said to be driven by intrinsic motivations (Baldassarre and Mirolli, 2013a). IMs facilitate the cumulative and virtually open-ended acquisition of knowledge and skills that can later be used to accomplish fitness-enhancing goals (Singh et al., 2010; Baldassarre, 2011). IMs continue during adulthood, and they underlie several important human phenomena such as artistic creativity, scientific discovery, and subjective well-being (Ryan and Deci, 2000b; Schmidhuber, 2010).
IMs were proposed within the animal literature to explain aspects of behavior that could not be explained by the dominant theory of motivation postulating that animals work to reduce physiological imbalances (Hull, 1943). The term “intrinsic motivation” was first used to describe a “manipulation drive” hypothesized to explain why rhesus monkeys would engage with mechanical puzzles for long periods of time without receiving extrinsic rewards (Harlow et al., 1950). Other studies showed how animal instrumental actions can be conditioned with the delivery of apparently neutral stimuli: for example, monkeys were trained to perform actions to gain access to a window from which they could observe conspecifics (Butler, 1953), and mice were trained to perform actions that resulted in clicks or in moving the cage platform (Kish, 1955). The psychological literature on IMs initially linked them to the perceptual properties of stimuli, such as their complexity, novel appearance, or surprising features (Berlyne, 1950, 1966). Later, IMs were also related to action, in particular to the competence (“effectance”) that an agent can acquire to willfully make changes in its environment (White, 1959). This relation of IMs with action and their effects was later linked to the possibility of autonomously setting one's own goals (Ryan and Deci, 2000a).
Computational approaches, in particular machine learning and autonomous robotics, are concerned with IMs and open-ended development because these are thought to have the potential to lead to the construction of truly intelligent artificial systems, in particular systems that are capable of improving their own skills and knowledge autonomously and indefinitely. The relation of these studies to those on IMs in psychology was first highlighted by Barto et al. (2004) and Singh et al. (2005). The investigation of IMs from a computational perspective can lead to theoretical clarifications, in particular with respect to the computational mechanisms and functions that might underlie IMs (Mirolli and Baldassarre, 2013). IM mechanisms have been classified as being either knowledge-based or competence-based (Oudeyer and Kaplan, 2007): the former are based on measures related to the acquisition of information, and the latter on measures related to the learning of skills. More recently, knowledge-based IMs have been further divided into novelty-based IMs and prediction-based IMs (Baldassarre and Mirolli, 2013b; Barto et al., 2013). Novelty-based IMs are elicited by the experience of stimuli that are not in the agent's memory (e.g., novel objects, or novel object-object or object-context combinations); prediction-based IMs are related to events that surprise the agent by violating its explicit predictions.
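To make the novelty/prediction distinction concrete, the following Python fragment is a minimal sketch under assumptions of our own: a count-based memory for novelty and a table of predicted outcome probabilities for surprise. The function names `novelty_signal` and `surprise_signal` and both formulas are hypothetical illustrations, not mechanisms taken from any of the cited works.

```python
from collections import Counter


def novelty_signal(stimulus, memory: Counter) -> float:
    """Novelty-based signal (illustrative): high when the stimulus is
    absent from, or rare in, the agent's memory."""
    return 1.0 / (1.0 + memory[stimulus])


def surprise_signal(predicted_probs: dict, observed) -> float:
    """Prediction-based signal (illustrative): high when the observed
    outcome was given low probability by the agent's explicit prediction."""
    return 1.0 - predicted_probs.get(observed, 0.0)


memory = Counter()
first_encounter = novelty_signal("red_ball", memory)   # 1.0: never stored
memory["red_ball"] += 3                                # stimulus is now familiar
later_encounter = novelty_signal("red_ball", memory)   # 0.25: novelty has faded

# A familiar outcome can still be surprising if it violates a prediction.
prediction = {"light_on": 0.9, "light_off": 0.1}
surprise = surprise_signal(prediction, "light_off")
```

The point of the sketch is only that the two signals depend on different internal quantities: novelty on what is stored in memory, surprise on what was predicted.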
These distinctions have been formalized in the computational models proposed in the literature. Seminal works in machine learning (Schmidhuber, 1991), later developed to function in robots (Oudeyer et al., 2007), have proposed algorithms rewarding actions that allow the agent to improve the quality of a “predictor” component with which it anticipates the effects that such actions produce on the environment. Other researchers have proposed robots capable of detecting and focussing on novel stimuli (e.g., Marsland et al., 2005), or systems capable of detecting anomalies in datasets (Nehmzow et al., 2013). Additional research threads have focussed on action and control, in particular on IMs guiding the autonomous acquisition of motor skills (Barto et al., 2004), on the decision about which of several skills to practice at any time (Schembri et al., 2007; Santucci et al., 2013), and on the autonomous formation of goals guiding skill acquisition (Baranes and Oudeyer, 2013). Other computational mechanisms related to the idea of IMs are being proposed in the growing field of active learning, in particular in relation to supervised learning systems (Settles, 2010).
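The idea of rewarding improvements of a predictor, rather than its raw error, can be sketched as follows. This is a simplified illustration and not the algorithm of Schmidhuber (1991) or Oudeyer et al. (2007): the class name `LearningProgressReward`, the sliding-window comparison, and the `window` parameter are assumptions made for the example.

```python
from collections import deque


class LearningProgressReward:
    """Intrinsic reward equal to the recent decrease in prediction error.

    Illustrative assumption: progress is measured by comparing the mean
    error of an older window with the mean error of the most recent window.
    """

    def __init__(self, window: int = 5):
        self.window = window
        # Keep just enough history for one older and one recent window.
        self.errors = deque(maxlen=2 * window)

    def update(self, prediction_error: float) -> float:
        self.errors.append(prediction_error)
        if len(self.errors) < 2 * self.window:
            return 0.0  # not enough history to estimate progress yet
        errs = list(self.errors)
        older = sum(errs[: self.window]) / self.window
        recent = sum(errs[self.window:]) / self.window
        return older - recent  # positive while the predictor is improving


lp = LearningProgressReward(window=2)
rewards = [lp.update(e) for e in [1.0, 0.8, 0.6, 0.4]]
```

Under this scheme, a situation whose prediction error stays constant (whether because it is already mastered or because it is unlearnable) yields zero reward, whereas a situation in which the error is falling yields positive reward.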
Recent neuroscientific investigations are revealing brain mechanisms that possibly underlie the IM systems investigated in the behavioral and computational literature. Unfortunately, such investigations are typically carried out under research agendas other than that of IMs (e.g., in relation to dopamine, memory, motor learning, goal-directed behavior, and conflict monitoring), so comprehensive views are still missing. A large body of research shows how the hippocampus, a compound brain system that plays a pivotal role in memory, has the capacity to detect the novelty of various aspects of experience, from the novelty of single items to the novelty of item-item and item-context associations (Ranganath and Rainer, 2003; Kumaran and Maguire, 2007). This detection is then capable of triggering the release of neuromodulators, such as dopamine, that modulate the functioning and learning processes of the hippocampus itself and of other brain areas, e.g., of the frontal cortex involved in higher cognition, action planning, and action execution (Lisman and Grace, 2005). Other studies have shown that unexpected stimuli can activate the superior colliculus, a midbrain structure that plays a key role in oculomotor control, which in turn causes phasic bursts of dopamine affecting trial-and-error learning processes in the basal ganglia, a brain region known to be involved in learning to select actions and other cortical contents (Redgrave and Gurney, 2006). Dopamine signals have also been shown to have an interesting direct relationship with information seeking (Bromberg-Martin and Hikosaka, 2009). Noradrenaline, another neuromodulator targeting a large part of the brain, has been shown to be involved in signaling violations of the agent's expectations (Sara, 2009). The failure (Carter et al., 1998) or success (Ribas-Fernandes et al., 2011) in accomplishing goals and sub-goals, possibly themselves set by IMs, has been shown to have neural correlates that might affect subsequent motivation, engagement, and learning. Bio-inspired/bio-constrained computational modeling is linking some of these neuroscientific results to specific computational mechanisms, e.g., in relation to dopamine (e.g., see the pioneering work of Kakade and Dayan, 2002, and Mirolli et al., 2013) and goal-directed behavior (Baldassarre et al., 2013).
The 24 interdisciplinary contributions to the present Research Topic can be clustered into four groups. The first group of six contributions (IMs and brain and behavior) focuses on different types of IM mechanisms implemented in the brain. The second group of five contributions (IMs and attention) focuses on the role of IMs in attention. The third group of eight contributions (IMs and motor skills) focuses on IMs as drives for the acquisition of manipulation and navigation skills, often with an emphasis on their function in enabling cumulative, open-ended development. Finally, the fourth group of five contributions (IMs and social interaction) focuses on the relationship between IMs and social phenomena, a novel area of investigation of IMs that is increasingly attracting the attention of researchers.
2. Intrinsic Motivations, Brain and Behavior
The theoretical contribution of Barto et al. (2013) argues for the importance of distinguishing between novelty and surprise on the basis of a comprehensive analysis of the computational literature related to the two. It then shows the utility of the distinction for improved understanding of brain and behavior phenomena where the two are often confused. Andringa et al. (2013) present a broad view of possible relationships between IMs and control, exploration, and agency, linking these processes to the specialization of the left and right hemispheres of the brain and showing how the interplay between these can lead to a progressive sophistication of cognition. Shah and Gurney (2014) propose a computational model that investigates how the basal ganglia, modulated by IMs, can lead to a dynamical shift from noise-based exploration to repetition that can support the acquisition of both simple and more complex motor skills (in the present case, simulated reaching skills). Boedecker et al. (2013) propose a computational model based on the distinction between dorsal and ventro-medial basal ganglia regions (supporting respectively habitual and goal-directed behavior). Through the model, the authors analyze the relation between these brain regions and IMs concerning reasoning costs and the value of information. This analysis is used to account for some empirical phenomena concerning the relationship between extrinsic motivations and IMs. Fiore et al. (2014) propose a biologically-constrained computational model that also focuses on different portions of the basal ganglia. The model shows how these regions can be differentially regulated by a single tonic dopaminergic signal, linked to both intrinsic and extrinsic motivations, on the basis of their different sensitivity to dopamine. The model, also tested with the simulated humanoid robot iCub, shows how these modulatory mechanisms can play important adaptive functions for the control of overt attention, manipulation, and goal-directed processes. Thirkettle et al. (2013) introduce the novel “Joystick experimental paradigm” developed to study intrinsically and extrinsically driven acquisition of actions. The authors demonstrate the function and effectiveness of this paradigm by presenting behavioral experiments grounded in the neuroscientific literature and concerning the acquisition of non-trivial motor actions.
3. Intrinsic Motivations and Attention
The computational work of Lonini et al. (2013) builds on a previous binocular system in which an IM learning signal is generated on the basis of the capacity of the system to reconstruct images encoded with sparse-coding features. This signal guides the acquisition of attention and vergence skills by reinforcement learning. The contribution here focuses on demonstrating the robustness of the system, in particular for recovering from disturbances and for self-recalibration. Di Nocera et al. (2014) present a behavior-based architecture that uses curiosity drives to improve the attentional capabilities of a reinforcement learning robot engaged in solving simulated survival “extrinsic” tasks. Overall, the work shows the utility of IMs to improve attention and, based on this, action selection. Mather (2013) briefly reviews research related to the familiarity-to-novelty attention shift observed in babies, and, on this basis, highlights the challenges that this phenomenon poses to theories on IMs. Perone and Spencer (2013) also deal with the familiarity-to-novelty shift. In particular, the authors propose a dynamical-field model that offers an explanation of the phenomenon as emerging from the autonomous accumulation of visual experience under the guidance of novelty-based IMs. Schlesinger and Amso (2013), referring to the results of tests of both human and computational agents engaged in solving a visual-exploration task, propose that free viewing of natural images in human infants can be understood as the effect of intrinsically motivated visual exploration driven by the goal of producing predictable gaze sequences. The authors highlight the implications of their approach for understanding visual development in infants.
4. Intrinsic Motivations and Open-Ended Development of Motor Skills
Santucci et al. (2013) focus on the problem of which IM signals are best suited to decide which skills to learn by reinforcement learning given a set of tasks. By comparing the results of systems receiving different IM signals, they show that the best IM signals are those based on mechanisms that measure the improvement of the skill competence rather than the errors, or error improvements, of predictors of the action effects on the environment. In a theoretical machine learning contribution, Schmidhuber (2013) proposes a system that automatically invents computational problems in order to train an increasingly-general problem solver. IM signals driving learning are generated when the system finds more efficient skills to solve all the problems generated thus far. In a similar vein, Ngo et al. (2013) propose an architecture for controlling a simulated and a real Katana robot interacting with a blocks-world. The system is capable of self-generating goals based on its confidence in its predictions about how the environment will react to its actions. Zahedi et al. (2013) propose the use of task-independent IMs to support task-dependent learning on the basis of the mutual information of the past and future elements of sensor streams (predictive information). The authors conclude that a combination of predictive information with external rewards is recommended only for hard tasks, where it speeds up learning but at the cost of a loss in asymptotic performance. Metzen and Kirchner (2013) propose a reinforcement learning model that self-generates tasks on the basis of graphs of states and selects the skills to learn on the basis of both novelty-based and prediction-based IMs. The system is tested with navigating and octopus-like simulated robots acting in continuous domains. Inspired by infant cognition, Pitti et al. (2013) present a reinforcement-learning bio-inspired gain-fields system for learning task-sets (areas of the sensorimotor space having a common underlying cause-effect structure). The system, tested in a cognitive task and with a Kinova robot arm, is capable of recognizing a given task-set as familiar and can create a new representation for it on the basis of its uncertainty and related prediction errors. Frank et al. (2014) propose a system for controlling the humanoid robot iCub that explores the state-action space on the basis of information gain maximization so as to improve the learning of the world model used for real-time motion planning. Law et al. (2014) present a schema-based memory system inspired by early sensorimotor development in children for controlling the iCub robot. The system undergoes a staged learning process to acquire eye-arm reaching skills and basic manipulation skills under the guidance of novelty- and prediction-based IMs, and the progressive release of constraints focussing attention and learning on relevant experiences.
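As an illustration of the notion of predictive information mentioned above (the mutual information between past and future elements of a sensor stream), the following Python fragment estimates it in the simplest possible setting. It is a sketch under strong assumptions that are ours and not those of Zahedi et al. (2013): the stream is discrete, "past" and "future" are reduced to a single time step each, and probabilities are estimated by plain counting. The function name `predictive_information` is hypothetical.

```python
import math
from collections import Counter


def predictive_information(stream) -> float:
    """Empirical mutual information, in bits, between consecutive symbols
    of a discrete sensor stream (one-step past vs. one-step future)."""
    pairs = list(zip(stream[:-1], stream[1:]))
    n = len(pairs)
    joint = Counter(pairs)                    # counts of (past, future) pairs
    past = Counter(p for p, _ in pairs)       # marginal counts of the past symbol
    future = Counter(f for _, f in pairs)     # marginal counts of the future symbol
    mi = 0.0
    for (p, f), c in joint.items():
        # (c/n) * log2( p(p,f) / (p(p) * p(f)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (past[p] * future[f]))
    return mi


alternating = [0, 1] * 50   # the next reading is fully determined by the current one
constant = [0] * 100        # nothing varies, so there is nothing to predict

pi_alternating = predictive_information(alternating)   # close to 1 bit
pi_constant = predictive_information(constant)          # 0 bits
```

The two toy streams show why such a quantity can act as a task-independent drive: it is high only for sensor streams that are both varied and predictable, and it vanishes for streams that never change.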
5. Intrinsic Motivations and Social Phenomena
In a contribution based on game theory, Merrick and Shafi (2013) propose the concept of “optimally motivating incentive” for game players, and show how different instances of such an incentive (i.e., strong power, affiliation, and achievement motivation) can be used in both modeling human behavior and designing effective artificial agents. The theoretical contribution of Triesch (2013) starts from the idea of IMs serving the function of learning “efficient coding” of sensory data and proposes that imitation can emerge as the consequence of a general intrinsic drive to compress information that leads to matching one's own actions with those of the imitated tutor. Moulin-Frier et al. (2013) propose a model of the initial staged development of speech in infants. IMs initially drive the system to learn the control of phonation, then to produce unarticulated sounds, and finally to produce proto-syllables. The model is tested with a simulator of the vocal tract, the auditory system, the agent's motor control, and social interactions with peers. The contribution of Ogino et al. (2013) proposes a reinforcement learning model of parent-child engagement where learning signals, similar to phasic dopamine signals, are caused by both extrinsic and intrinsic information, in particular related to the presence and novelty of emotional facial expressions. Finally, Jauffret et al. (2013) propose a bio-inspired neural architecture that uses a prediction-based algorithm applied to sensorimotor contingencies to solve complex navigation tasks and is capable of asking for help in dead-lock situations.
6. Concluding Remarks
The papers of the present Research Topic testify to the existence of ample interest in the issues addressed by the Topic. At the same time, they show that the literature on IMs is still characterized by a heterogeneity of perspectives on their possible roles in cognition and behavior and on the possible mechanisms supporting them. On the one hand, this heterogeneity is expected given the recency of the attempts to systematize the psychological, neuroscientific, and computational views on IMs within broad interdisciplinary frameworks. On the other hand, the heterogeneity is also an indication of the richness of intrinsically motivated phenomena, of their importance for animals' cognition and behavior, and of their utility for the design of autonomous robots and intelligent machines. The richness of this topic is expected to result in a further strengthening of the research in the field in the near future.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This research has received funds from the European Commission 7th Framework Programme (FP7/2007-2013), “Challenge 2—Cognitive Systems, Interaction, Robotics,” Grant Agreement No. ICT-IP-231722, Project “IM-CLeVeR—Intrinsically Motivated Cumulative Learning Versatile Robots.” This Frontiers Topic was accomplished as a deliverable of the IM-CLeVeR Project.
Andringa, T. C., van den Bosch, K. A., and Vlaskamp, C. (2013). Learning autonomy in two or three steps: linking open-ended development, authority, and agency to motivation. Front. Psychol. 4:766. doi: 10.3389/fpsyg.2013.00766
Baldassarre, G., Mannella, F., Fiore, V. G., Redgrave, P., Gurney, K., and Mirolli, M. (2013). Intrinsically motivated action-outcome learning and goal-based action recall: a system-level bio-constrained computational model. Neural Netw. 41, 168–187. doi: 10.1016/j.neunet.2012.09.015
Baldassarre, G. (2011). “What are intrinsic motivations? A biological perspective,” in Proceedings of the International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob-2011), eds A. Cangelosi, J. Triesch, I. Fasel, K. Rohlfing, F. Nori, P.-Y. Oudeyer, et al. (New York, NY: IEEE), E1–E8.
Baldassarre, G., and Mirolli, M. (2013b). “Intrinsically motivated learning systems: an overview,” in Intrinsically Motivated Learning in Natural and Artificial Systems, eds G. Baldassarre and M. Mirolli (Berlin: Springer-Verlag), 1–14. doi: 10.1007/978-3-642-32375-1_1
Barto, A. (2013). “Intrinsic motivation and reinforcement learning,” in Intrinsically Motivated Learning in Natural and Artificial Systems, eds G. Baldassarre and M. Mirolli (Berlin: Springer-Verlag), 17–47. doi: 10.1007/978-3-642-32375-1_2
Barto, A. G., Singh, S., and Chentanez, N. (2004). “Intrinsically motivated learning of hierarchical collections of skills,” in International Conference on Developmental Learning (ICDL2004), eds J. Triesch and T. Jebara (New York, NY: IEEE), 112–119.
Boedecker, J., Lampe, T., and Riedmiller, M. (2013). Modeling effects of intrinsic and extrinsic rewards on the competition between striatal learning systems. Front. Psychol. 4:739. doi: 10.3389/fpsyg.2013.00739
Carter, C. S., Braver, T. S., Barch, D. M., Botvinick, M. M., Noll, D., and Cohen, J. D. (1998). Anterior cingulate cortex, error detection, and the online monitoring of performance. Science 280, 747–749. doi: 10.1126/science.280.5364.747
Fiore, V. G., Sperati, V., Mannella, F., Mirolli, M., Gurney, K., Friston, K., et al. (2014). Keep focussing: striatal dopamine multiple functions resolved in a single mechanism tested in a simulated humanoid robot. Front. Psychol. 5:124. doi: 10.3389/fpsyg.2014.00124
Frank, M., Leitner, J., Stollenga, M., Forster, A., and Schmidhuber, J. (2014). Curiosity driven reinforcement learning for motion planning on humanoids. Front. Neurorobot. 7:25. doi: 10.3389/fnbot.2013.00025
Jauffret, A., Cuperlier, N., Tarroux, P., and Gaussier, P. (2013). From self-assessment to frustration, a small step toward autonomy in robotic navigation. Front. Neurorobot. 7:16. doi: 10.3389/fnbot.2013.00016
Lonini, L., Forestier, S., Teuliere, C., Zhao, Y., Shi, B. E., and Triesch, J. (2013). Robust active binocular vision through intrinsically motivated learning. Front. Neurorobot. 7:20. doi: 10.3389/fnbot.2013.00020
Mirolli, M., and Baldassarre, G. (2013). “Functions and mechanisms of intrinsic motivations: the knowledge versus competence distinction,” in Intrinsically Motivated Learning in Natural and Artificial Systems, eds G. Baldassarre and M. Mirolli (Berlin: Springer-Verlag), 49–72.
Mirolli, M., Baldassarre, G., and Santucci, V. G. (2013). Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcement driving both action acquisition and reward maximization: a simulated robotic study. Neural Netw. 39, 40–51. doi: 10.1016/j.neunet.2012.12.012
Moulin-Frier, C., Nguyen, S. M., and Oudeyer, P.-Y. (2013). Self-organization of early vocal development in infants and machines: the role of intrinsic motivation. Front. Psychol. 4:1006. doi: 10.3389/fpsyg.2013.01006
Nehmzow, U., Gatsoulis, Y., Kerr, E., Condell, J., Siddique, N., and McGinnity, M. T. (2013). “Novelty detection as an intrinsic motivation for cumulative learning robots,” in Intrinsically Motivated Learning in Natural and Artificial Systems, eds G. Baldassarre and M. Mirolli (Berlin: Springer-Verlag), 185–207. doi: 10.1007/978-3-642-32375-1_8
Ngo, H., Luciw, M., Forster, A., and Schmidhuber, J. (2013). Confidence-based progress-driven self-generated goals for skill acquisition in developmental robots. Front. Psychol. 4:833. doi: 10.3389/fpsyg.2013.00833
Ribas-Fernandes, J. J. F., Solway, A., Diuk, C., McGuire, J. T., Barto, A. G., Niv, Y., et al. (2011). A neural signature of hierarchical reinforcement learning. Neuron 71, 370–379. doi: 10.1016/j.neuron.2011.05.042
Ryan, R. M., and Deci, E. L. (2000b). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. Am. Psychol. 55, 68–78. doi: 10.1037/0003-066X.55.1.68
Schembri, M., Mirolli, M., and Baldassarre, G. (2007). “Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot,” in Proceedings of the 6th International Conference on Development and Learning, eds Y. Demiris, D. Mareschal, B. Scassellati, and J. Weng (New York, NY: IEEE), E1–E6.
Schlesinger, M., and Amso, D. (2013). Image free-viewing as intrinsically-motivated exploration: estimating the learnability of center-of-gaze image samples in infants and adults. Front. Psychol. 4:802. doi: 10.3389/fpsyg.2013.00802
Schmidhuber, J. (1991). “A possibility for implementing curiosity and boredom in model-building neural controllers,” in Proceedings of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats, eds J. A. Meyer and S. W. Wilson (Cambridge, MA: MIT Press/Bradford Books), 222–227.
Schmidhuber, J. (2013). Powerplay: training an increasingly general problem solver by continually searching for the simplest still unsolvable problem. Front. Psychol. 4:313. doi: 10.3389/fpsyg.2013.00313
Shah, A., and Gurney, K. N. (2014). Emergent structured transition from variation to repetition in a biologically-plausible model of learning in basal ganglia. Front. Psychol. 5:91. doi: 10.3389/fpsyg.2014.00091
Singh, S., Barto, A., and Chentanez, N. (2005). “Intrinsically motivated reinforcement learning,” in Advances in Neural Information Processing Systems 17: Proceedings of the 2004 Conference, eds L. K. Saul, Y. Weiss, and L. Bottou (Cambridge, MA: The MIT Press).
Singh, S., Lewis, R., Barto, A., and Sorg, J. (2010). Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Trans. Auton. Mental Dev. 2, 70–82. doi: 10.1109/TAMD.2010.2051031
Thirkettle, M., Walton, T., Redgrave, P., Gurney, K., and Stafford, T. (2013). No learning where to go without first knowing where you're coming from: action discovery is trajectory, not endpoint based. Front. Psychol. 4:638. doi: 10.3389/fpsyg.2013.00638
Keywords: intrinsic motivations, novelty and surprise, cumulative learning and development, computational models, autonomous robotics, reinforcement learning, brain and behavior, review
Citation: Baldassarre G, Stafford T, Mirolli M, Redgrave P, Ryan RM and Barto A (2014) Intrinsic motivations and open-ended development in animals, humans, and robots: an overview. Front. Psychol. 5:985. doi: 10.3389/fpsyg.2014.00985
Received: 14 July 2014; Accepted: 19 August 2014;
Published online: 09 September 2014.
Edited and reviewed by: Eddy J. Davelaar, Birkbeck College, UK
Copyright © 2014 Baldassarre, Stafford, Mirolli, Redgrave, Ryan and Barto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.