Influence of the reward on the abstractions
I. Mikhailova and C. Goerick
Honda Research Institute Europe GmbH, Germany
1. Introduction
The integration of multiple subsystems is key to building an autonomous system. In this contribution we consider the interface between two such subsystems: reward and abstractions. Both have been investigated thoroughly, but, to our knowledge, mostly in isolation from each other (Oudeyer and Kaplan, 2007), (Pezzulo and Castelfranchi, 2007). Our analysis leads to a categorization of abstraction types based on the influence of the reward rather than on the content of the representation. With the resulting overview we show how the integration of different abstraction types supports ongoing development.
2. Categorization of abstraction types
We first clarify what we mean by abstractions. The system can observe its current state in sensorimotor space as well as causality, either between time-delayed measurements (a predictive model) or between measurements in different sensory channels (an associative model). Both types of causality can be formalized as expectations. As the observation space is huge, most implementations aim at clustering and discretizing this space. This is where abstractions, i.e. action or perception primitives, come into play. We use the term abstraction for this initial stage, in contrast to the term symbol, which implies the existence of symbol-symbol links and a grammar.
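The clustering step can be illustrated with a minimal sketch. The toy data, the two-dimensional "sensorimotor space", and the deterministic initialization are our own illustrative assumptions, not part of any cited system; any standard clustering method would serve the same role.

```python
import numpy as np

def kmeans(obs, init_idx, iters=20):
    """Cluster sensorimotor observations into primitives ('abstractions')."""
    centers = obs[init_idx].astype(float)
    for _ in range(iters):
        # assign each observation to its nearest primitive
        d = ((obs[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # move each primitive to the mean of its assigned observations
        for j in range(len(centers)):
            if (labels == j).any():
                centers[j] = obs[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
# toy 2-D sensorimotor space: two well-separated clusters of observations
obs = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                 rng.normal(1.0, 0.1, (50, 2))])
centers, labels = kmeans(obs, init_idx=[0, -1])
```

Each resulting center plays the role of one perception primitive; an observation is abstracted to the index of its nearest center.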
2.1 Statistical structuring of the system-environment interaction
The observations made by the system obviously depend on the behavior of the system (Lungarella and Sporns, 2005). If the behavior follows reward optimization, it indirectly influences abstraction building. This statistical influence can be made more explicit; for example, Hart et al. (2006) refine the discretization when the entropy of the behavior choice is high.
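The entropy criterion can be sketched as follows. The action-choice counts, the threshold of one bit, and the per-state layout are our own illustrative assumptions; they stand in for whatever statistics the behavior system actually records.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# hypothetical action-choice counts per discrete state (rows = states)
counts = np.array([[9, 1, 0],    # state 0: near-deterministic choice
                   [4, 3, 3]])   # state 1: ambiguous choice
probs = counts / counts.sum(axis=1, keepdims=True)

# states whose behavior-choice entropy exceeds 1 bit get a finer discretization
to_refine = [s for s in range(len(probs)) if entropy(probs[s]) > 1.0]
```

Only the ambiguous state is flagged; the near-deterministic one needs no refinement.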
2.2 Discrimination according to reward
The reward can itself be part of the observation. The observation space can then be segmented directly according to the amount of reward (Ishiguro et al., 1996). Alternatively, the reward or value function can be used as a teaching signal for clustering, as discussed in (Koerner and Matsumoto, 1998).
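A minimal sketch of reward-based segmentation: observations are grouped by binned reward level rather than by their appearance in sensor space. The observation values, reward values, and bin edges are illustrative assumptions.

```python
import numpy as np

# hypothetical (observation, reward) pairs collected during interaction
obs = np.array([0.1, 0.2, 0.8, 0.9, 0.5])
reward = np.array([0.0, 0.0, 1.0, 1.0, 0.5])

# segment the observation space by reward level: low / medium / high
bins = np.digitize(reward, [0.25, 0.75])
segments = {b: obs[bins == b] for b in np.unique(bins)}
```

Observations that co-occur with similar reward end up in the same segment, regardless of how far apart they lie in sensor space.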
2.3 Optimization of abstractions with respect to reward
Optimization is a qualitatively different way in which the reward can influence the building of abstractions (Wolpert and Kawato, 1998). The authors assume an initial clustering of the sensorimotor flow into predictive models, each with a corresponding inverse model that produces a desired action. As a result of repeated action optimization and differentiated model updates, each pair of predictive and inverse models becomes specialized to a particular context.
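The specialization mechanism can be sketched for the forward-model half of such an architecture: each module's update is weighted by its responsibility, i.e. how well it predicted the last observation. The linear models, the parameter values, and the noise scale are our own illustrative assumptions and greatly simplify the cited architecture.

```python
import numpy as np

# two hypothetical forward models predicting the next sensory value x' = w * x
w = np.array([0.5, 2.0])     # one parameter per module
lr, sigma = 0.1, 0.2         # learning rate, assumed prediction-noise scale

def step(x, x_next, w):
    pred = w * x                               # each module's prediction
    err = x_next - pred
    # responsibility: normalized likelihood of each module's prediction
    lik = np.exp(-err**2 / (2 * sigma**2))
    resp = lik / lik.sum()
    # responsibility-weighted update: the better module specializes further
    w += lr * resp * err * x
    return w, resp

x, x_next = 1.0, 2.0                           # context generated by w ≈ 2
w, resp = step(x, x_next, w)
```

The module that already fits the current context takes nearly all the responsibility, so repeated updates drive each module toward its own context.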
2.4 Optimization of abstractions with respect to internal processing
Finally, abstractions can be introduced as a result of optimizing the operations to be executed on the abstraction layer, e.g. optimization of memorization or planning (Toussaint and Storkey, 2006).
Figure 1 schematically shows the different types of abstractions discussed above. A natural question is whether we should prefer one type over another. We believe that a discussion about the best-suited abstraction type is counterproductive. Instead, we investigate how one abstraction type supports the building of another.
3. Incremental development
Ongoing development presupposes that the system can build on its abilities by creating favorable interaction, by skill optimization, and by control reorganization, in the sense that learned regularities are used by higher mental functions (simulation, planning, attention). If one of the system's reward functions evaluates the quality of the interaction with the environment, then with the help of the simple abstractions described in 2.1 and 2.2 the system can optimize this interaction in order to enhance the learning of expectation models. These models can in turn initialize the optimization leading to type 2.3. The learned expectations also allow the use of higher mental functions and thus the building of type 2.4. For reasons of stability it is important that the tight coupling of abstractions to behavior, as in 2.3 and 2.4, occurs at later stages. The results of testing these ideas in real-world applications are summarized in (Mikhailova, 2010).
4. Conclusions
Appropriate knowledge representation is one of the hardest problems in research on cognitive systems. In this contribution we did not ask what the best representation is; instead we categorized the different possibilities from the perspective of a complete system that monitors reward acquisition. We showed how one abstraction type supports the building of another in the sense of ongoing development. We hope to encourage a more intensive discussion on the integration of multiple types of learning, as started in (Balkenius, 1994).
References
Balkenius, C. (1994). Biological learning and artificial intelligence. Lund University Cognitive Studies, 30.
Hart, S., Ou, S., Sweeney, J., and Grupen, R. (2006). A framework for learning declarative structure. In Robotics: Science and Systems Workshop on Manipulation for Human Environments. Philadelphia, Pennsylvania.
Ishiguro, H., Sato, R., and Ishida, T. (1996). Robot oriented state space construction. In IEEE/Robotics Society of Japan Int'l Conf. Intelligent Robots and Systems. IEEE.
Koerner, E. and Matsumoto, G. (1998). Cortical architecture and self-referential control for brain-like processing in artificial neural systems. Artificial Life and Robotics, 2(3):170-178.
Lungarella, M. and Sporns, O. (2005). Information self-structuring: Key principle for learning and development. In International Conference on Development and Learning, pages 25-30.
Mikhailova, I. (2010). System design for autonomous open-ended acquisition of new behaviors. Suedwestdeutscher Verlag fuer Hochschulschriften.
Oudeyer, P.-Y. and Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics, 1(6).
Pezzulo, G. and Castelfranchi, C. (2007). The symbol detachment problem. Cognitive Processing, 8(2):115-131.
Toussaint, M. and Storkey, A. J. (2006). Probabilistic inference for solving discrete and continuous state Markov decision processes. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning. ACM.
Wolpert, D. and Kawato, M. (1998). Multiple paired forward and inverse models for motor control. Neural Networks, 11:1317-1329.
Keywords: Knowledge representation, System Integration
Conference: IEEE ICDL-EPIROB 2011, Frankfurt, Germany, 24 Aug - 27 Aug, 2011.
Presentation Type: Poster Presentation
Topic: Architectures
Citation: Mikhailova I and Goerick C (2011). Influence of the reward on the abstractions. Front. Comput. Neurosci. Conference Abstract: IEEE ICDL-EPIROB 2011. doi: 10.3389/conf.fncom.2011.52.00026
Received: 11 Apr 2011; Published Online: 12 Jul 2011.
*Correspondence: Dr. Inna Mikhailova, Honda Research Institute Europe GmbH, Offenbach/Main, 63073, Germany, inna.mikhailova@honda-ri.de