Event Abstract

A developmental agent for learning features, environment models, and general robotics tasks

  • Brandon Rohrer, Sandia National Laboratories, United States

BECCA, a developmental agent, is described and demonstrated performing a high-dimensional visual servoing task. BECCA learns 1) a feature representation of its state space, 2) a model of its environment, and 3) how to behave in order to receive reward. It learns these things concurrently in an on-line and incremental fashion, without any prior knowledge of its environment or the nature of its inputs and outputs. (See [4] for the full paper.)

Biological developmental agents, such as children, learn both feature representations and world models through their actions and interactions. Learning a feature representation is the act of mapping low-level inputs onto higher-level perceptual symbols or categories. Learning a world model is the act of recording observed features in order to capture salient aspects of the agent’s experience. Learning feature representations and world models that are both useful and biologically plausible is among the chief technical challenges for those seeking to create developmental agents [1]. This work is an effort to address the problems of integrated feature, model, and task learning in a unified framework.

METHOD
A BECCA agent interacts with the world by taking actions, making observations, and receiving reward. Formulated in this way, natural world interaction is a general reinforcement learning (RL) problem, and BECCA is a potential solution.
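
The interaction cycle described above can be sketched as a simple loop. This is an illustrative sketch only, written in Python rather than the MATLAB of the published implementation; `ToyWorld`, `RandomAgent`, and `run` are hypothetical names that do not come from the BECCA code, and the one-dimensional world is an assumption made to keep the example small.

```python
import random


class ToyWorld:
    """Hypothetical 1-D world: reward is given whenever the agent is at position 0."""

    def __init__(self, position=3):
        self.position = position

    def step(self, action):
        # The action is -1, 0, or +1; the world answers with an observation and a reward.
        self.position += action
        observation = [self.position]
        reward = 1.0 if self.position == 0 else 0.0
        return observation, reward


class RandomAgent:
    """Placeholder agent: picks actions at random and ignores its inputs."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def step(self, observation, reward):
        return self.rng.choice([-1, 0, 1])


def run(world, agent, n_steps):
    """Alternate agent and world steps and return the accumulated reward."""
    observation, reward = world.step(0)  # initial observation, no movement
    total_reward = 0.0
    for _ in range(n_steps):
        action = agent.step(observation, reward)
        observation, reward = world.step(action)
        total_reward += reward
    return total_reward
```

Any agent that exposes the same `step(observation, reward)` interface could be substituted for `RandomAgent`. The agent needs no knowledge of what the observation or action values mean, which matches the setting described here.
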
BECCA’s feature creation algorithm identifies patterns in the agent’s input that are repeated and thus likely to have semantic relevance. It works by partitioning the elements of the input vector into groups whose activity is correlated. In a pixel array exposed to a video stream of broadcast television, for example, the correlation between two neighboring pixels will be much higher than that between distant pixels, so a small number of pixels grouped by correlation will be closely related in space. The groups of input elements form input subspaces, and unit vectors in these subspaces represent features. The feature creator creates new features by adopting novel inputs, a process also known as imprinting. An input must be sufficiently different from existing features in order to be imprinted. Returning to the pixel array example, once a small group of correlated pixels has been formed, features will be created based on patterns observed in those pixels. These may include horizontal, vertical, and diagonal edges, as well as uniform intensity and center-surround patterns.
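
The two steps in this paragraph, grouping by correlation and imprinting, might look roughly like the sketch below. It is not BECCA's actual algorithm. The greedy grouping rule, the use of Pearson correlation, the cosine-similarity novelty test, and both threshold values are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def group_by_correlation(history, threshold=0.8):
    """Greedily group input elements; history[t][i] is element i at time t.

    An element joins the first existing group whose first member it
    correlates with at or above the threshold; otherwise it starts a new group.
    """
    n_inputs = len(history[0])
    columns = [[row[i] for row in history] for i in range(n_inputs)]
    groups = []
    for i in range(n_inputs):
        for group in groups:
            if pearson(columns[i], columns[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


def normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def maybe_imprint(features, group_input, similarity_threshold=0.9):
    """Append the normalized input as a new feature if it is novel enough.

    The input counts as novel when its cosine similarity to every existing
    feature is below the threshold. Returns True if a feature was added.
    """
    candidate = normalize(group_input)
    if not any(candidate):
        return False  # nothing to imprint from an all-zero input
    for feature in features:
        if sum(a * b for a, b in zip(candidate, feature)) >= similarity_threshold:
            return False
    features.append(candidate)
    return True
```

In this sketch, each group keeps its own `features` list, so every feature is a unit vector in that group's input subspace.
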

In addition to creating and updating the feature set at each time step, the feature creator projects the inputs onto existing features to calculate feature votes. These feature votes are then subjected to a winner-take-all operation, such as might be implemented in a neural network with mutual inhibition. A single feature in each group remains active, and the set of active features is passed on to the reinforcement learner. It is also fed back and combined with the next observation to form the input for the next time step. The recursive nature of the feature creation algorithm allows more complex features to be created from combinations of simpler ones.
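
A minimal sketch of the voting and winner-take-all steps follows. It assumes that a vote is the dot product of a group's input with each unit-vector feature and that the winning feature's activity is set to 1.0 while the others are set to 0.0. The abstract does not specify either detail, and the function names are hypothetical.

```python
def feature_votes(group_input, features):
    """Project a group's input onto each feature (dot product) to get its vote."""
    return [sum(a * b for a, b in zip(group_input, feature)) for feature in features]


def winner_take_all(votes):
    """Keep only the highest-voted feature active; ties go to the first one."""
    activity = [0.0] * len(votes)
    if votes:
        winner = max(range(len(votes)), key=votes.__getitem__)
        activity[winner] = 1.0  # assumption: binary activity for the winner
    return activity


def active_features(inputs, groups, features_by_group):
    """Apply voting and winner-take-all independently within each group."""
    active = []
    for group, features in zip(groups, features_by_group):
        group_input = [inputs[i] for i in group]
        active.append(winner_take_all(feature_votes(group_input, features)))
    return active
```

To mirror the feedback described above, the per-group activity lists returned by `active_features` could be flattened and concatenated with the next observation before the next call.
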

The reinforcement learner takes in feature activity and reward and selects an action to execute. The reward map associates features with reward by approximating the correlation between reward and each feature. An attention filter selects the most salient feature at each time step as the attended feature. Working memory is a weighted combination of several recent attended features and any recent actions. The attended feature and working memory are used to update the model. The model is a table of cause–effect pairs, where each effect is an attended feature and each cause is the working memory from the preceding time step. The model also contains a record of the number of times each pair is observed. Rarely observed pairs are periodically removed. This table of cause–effect pairs provides a record of common transitions in feature space, as well as any actions that may have been taken to precipitate them.
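
The model's bookkeeping can be illustrated with a small table of counted cause-effect pairs. This sketch makes several simplifying assumptions that are not taken from the source: working memory is updated by exponential decay, causes are reduced to hashable tuples such as `(feature, action)` so they can serve as dictionary keys, and pruning uses a fixed minimum count. `TransitionModel` and `update_working_memory` are hypothetical names.

```python
from collections import Counter


def update_working_memory(memory, attended, decay=0.5):
    """Blend the newest attended-feature vector into a decaying memory vector."""
    return [decay * m + a for m, a in zip(memory, attended)]


class TransitionModel:
    """Table of (cause, effect) pairs with observation counts."""

    def __init__(self):
        self.counts = Counter()

    def update(self, cause, effect):
        # cause: hashable summary of the previous working memory
        # effect: the attended feature observed at the current time step
        self.counts[(cause, effect)] += 1

    def prune(self, min_count=2):
        """Remove pairs observed fewer than min_count times."""
        rare = [pair for pair, n in self.counts.items() if n < min_count]
        for pair in rare:
            del self.counts[pair]

    def predicted_effects(self, cause):
        """Return the effects recorded for a cause, with their counts."""
        return {e: n for (c, e), n in self.counts.items() if c == cause}
```

For example, after `update(("f1", "left"), "f2")` has been called three times and `update(("f1", "right"), "f3")` once, calling `prune()` leaves only the first pair. This mirrors the periodic removal of rarely observed pairs.
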

RESULTS
A simulated task similar to the Morris water maze [2] was constructed to demonstrate BECCA in operation. Complete MATLAB code for the simulation and BECCA implementation (version 0.3.5) can be found at [3].

Acknowledgements

This work was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

References

[1] A. Cangelosi, G. Metta, G. Sagerer, S. Nolfi, C. L. Nehaniv, K. Fischer, J. Tani, B. Belpaeme, G. Sandini, L. Fadiga, B. Wrede, K. Rohlfing, E. Tuci, K. Dautenhahn, J. Saunders, and A. Zeschel. Integration of action and language knowledge: A roadmap for developmental robotics. IEEE Transactions on Autonomous Mental Development, 2(3):167–195, 2010.

[2] R. G. M. Morris. Spatial localization does not require the presence of local cues. Learning and Motivation, 12(2):239–260, 1981.

[3] B. Rohrer. BECCA code page. http://www.sandia.gov/rohrer/code.html, 2011.

[4] B. Rohrer. Sandia technical report SAND2011-2275 C: A developmental agent for learning features, environment models, and general tasks. http://www.sandia.gov/rohrer/doc/Rohrer11DevelopmentalAgentLearning.pdf, 2011.

Keywords: agent, biologically-inspired, cognitive architecture, feature creation, natural world interaction, reinforcement learning

Conference: IEEE ICDL-EPIROB 2011, Frankfurt, Germany, 24 Aug - 27 Aug, 2011.

Presentation Type: Poster Presentation

Topic: Architectures

Citation: Rohrer B (2011). A developmental agent for learning features, environment models, and general robotics tasks. Front. Comput. Neurosci. Conference Abstract: IEEE ICDL-EPIROB 2011. doi: 10.3389/conf.fncom.2011.52.00001

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 17 Jun 2011; Published Online: 12 Jul 2011.

* Correspondence: Dr. Brandon Rohrer, Sandia National Laboratories, Albuquerque, NM, United States, brrohre@sandia.gov