Event Abstract

Toward a Goal Directed Construction of State Spaces

  • 1 Frankfurt Institute for Advanced Studies, Germany

Reinforcement learning of complex tasks presents at least two major problems. The first is caused by sensory data that are irrelevant to the task. Representing irrelevant information wastes computational resources: the state space becomes high-dimensional and learning becomes too slow. It is therefore important to represent only the relevant data. Unsupervised learning methods such as independent component analysis (ICA) can be used to encode the state space [1]. While these methods can separate sources of relevant and irrelevant information under certain conditions, they nevertheless represent all of the data.
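To make this point concrete, the following sketch (our illustration, not part of the abstract) uses scikit-learn's FastICA to unmix a task-relevant signal from an irrelevant one; the signals and mixing matrix are invented for the example. The unsupervised objective recovers both components and gives no hint as to which one the task actually needs.

```python
# A minimal sketch of encoding sensory data with ICA; assumes scikit-learn.
# The sources and mixing matrix below are illustrative, not from the abstract.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two latent sources: one task-relevant, one irrelevant.
relevant = np.sin(2 * t)
irrelevant = rng.laplace(size=t.size)
sources = np.c_[relevant, irrelevant]

# The agent observes linear mixtures of both sources.
mixing = np.array([[1.0, 0.5],
                   [0.7, 1.2]])
observations = sources @ mixing.T

# ICA recovers independent components, but it encodes *all* of them:
# nothing in the unsupervised objective marks which component the task needs.
ica = FastICA(n_components=2, random_state=0)
components = ica.fit_transform(observations)
print(components.shape)  # (2000, 2) -- relevant and irrelevant alike
```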

The second problem arises when information about the environment is incomplete, as in so-called partially observable Markov decision processes (POMDPs). This leads to the perceptual aliasing problem: different world states appear identical to the agent even though a different decision must be made in each of them. To overcome this problem, the agent must continually estimate the current state using previous information as well. This estimation is traditionally performed with Bayesian approaches such as Kalman filters and hidden Markov models [2].
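For the discrete case, such Bayesian estimation reduces to a belief update over hidden states. The sketch below (our illustration; the transition and observation matrices are invented) shows the standard predict-correct step used in HMM/POMDP belief tracking [2].

```python
# A minimal sketch of a discrete Bayes filter (HMM/POMDP belief update).
# T and O below are illustrative assumptions, not taken from the abstract.
import numpy as np

# T[a][s, s'] : probability of reaching s' from s under action a
T = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
# O[s', o] : probability of observing o in state s'
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def belief_update(belief, action, observation):
    """b'(s') is proportional to O(o | s') * sum_s T(s' | s, a) * b(s)."""
    predicted = belief @ T[action]             # prediction step
    posterior = predicted * O[:, observation]  # correction step
    return posterior / posterior.sum()

b = np.array([0.5, 0.5])  # uniform initial belief over the two hidden states
b = belief_update(b, action=0, observation=1)
print(b)                  # posterior belief after one observation
```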

The above-mentioned methods for solving these two problems are based purely on the statistics of the sensory data, without considering any goal-directed behaviour. Recent findings from biology suggest an influence of the dopaminergic system on even early sensory representations, which indicates a strong task influence [3,4]. Our goal is to model such effects in a reinforcement learning approach. Standard reinforcement learning methods often assume a pre-defined state space. In this study, we extend the traditional reinforcement learning methodology by incorporating a feature detection stage and a predictive network, which together define the state space of the agent. The predictive network learns to predict the current state from the previous state and the previously chosen action, i.e. it becomes a forward model. We present a temporal-difference-based learning rule for training the weight parameters of these additional network components. Simulation results show that the performance of the network is maintained both in the presence of task-irrelevant features and in a non-Markovian environment in which the input is invisible at randomly occurring time steps.
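The sketch below illustrates the core idea in its simplest form: TD(0) value learning in which a learned linear forward model predicts the current state from the previous state and action, and stands in for the input whenever the observation is missing. All dimensions, learning rates, the random stand-in environment, and the update rules are our own illustrative assumptions (the feature detection stage is omitted); this is not the authors' exact architecture or learning rule.

```python
# A minimal sketch: TD(0) with a linear forward model that substitutes for
# missing observations. Everything here is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(1)
n_state, n_action = 4, 2
gamma, alpha_v, alpha_f = 0.9, 0.1, 0.05

v = np.zeros(n_state)                                 # linear value weights
F = rng.normal(0, 0.1, (n_action, n_state, n_state))  # forward model per action

s = np.zeros(n_state); s[0] = 1.0                     # current state representation

for step in range(1000):
    a = rng.integers(n_action)
    s_pred = F[a] @ s                        # forward-model prediction of next state

    observed = rng.random() > 0.2            # input invisible 20% of the time
    if observed:
        s_next = np.zeros(n_state)
        s_next[rng.integers(n_state)] = 1.0  # stand-in for a real observation
        F[a] += alpha_f * np.outer(s_next - s_pred, s)  # prediction-error update
    else:
        s_next = s_pred                      # prediction defines the agent's state

    r = 1.0 if s_next.argmax() == n_state - 1 else 0.0  # toy reward
    delta = r + gamma * v @ s_next - v @ s   # TD error
    v += alpha_v * delta * s                 # TD(0) weight update
    s = s_next
```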

The model presents a link between reinforcement learning, feature detection and predictive networks and may help to explain how the dopaminergic system recruits cortical circuits for goal-directed feature detection and prediction.

References

1. Independent component analysis: a new concept? P. Comon. Signal Processing, 36(3):287-314 (1994).

2. Planning and acting in partially observable stochastic domains. L. P. Kaelbling, M. L. Littman and A. R. Cassandra. Artificial Intelligence, 101(1-2):99-134 (1998).

3. Practising orientation identification improves orientation coding in V1 neurons. A. Schoups, R. Vogels, N. Qian and G. Orban. Nature, 412:549-553 (2001).

4. Reward-dependent modulation of working memory in lateral prefrontal cortex. S. W. Kennerley and J. D. Wallis. J. Neurosci., 29(10):3259-3270 (2009).

Conference: Bernstein Conference on Computational Neuroscience, Frankfurt am Main, Germany, 30 Sep - 2 Oct, 2009.

Presentation Type: Poster Presentation

Topic: Abstracts

Citation: Saeb S and Weber C (2009). Toward a Goal Directed Construction of State Spaces. Front. Comput. Neurosci. Conference Abstract: Bernstein Conference on Computational Neuroscience. doi: 10.3389/conf.neuro.10.2009.14.019


Received: 25 Aug 2009; Published Online: 25 Aug 2009.

* Correspondence: Sohrab Saeb, Frankfurt Institute for Advanced Studies, Frankfurt, Germany, saeb@fias.uni-frankfurt.de