Event Abstract

Connectionist model of action learning and naming

Igor Farkas 1*

  • 1 Comenius University, Department of Applied Informatics, Slovakia

I. Introduction

The recent rise of interest in cognitive developmental robotics reflects an attempt to provide new understanding of how cognitive functions develop, by means of a synthetic approach, and of how physical embodiment shapes information structuring through interactions with the environment (Asada et al., 2009). These endeavours also include grounding language learning (e.g. of action words) in sensorimotor behavior (see e.g. Lallee et al., 2010; Marocco et al., 2010; Sugita and Tani, 2005). We propose here a connectionist system embedded in a simulated robot that learns to link sensorimotor behavior with action words, using an ecologically more plausible learning of motor patterns than earlier models.

We use a computer simulator of the humanoid robot iCub (Tikhanoff et al., 2008). The robot's task was to learn (1) to execute actions related to an object placed on a table in front of the robot, within its reach, and (2) to name the action it was currently executing. In both tasks, the required action was determined by its name and the target position. We use the same environment as Sugita and Tani (2005), i.e. three objects (block, cylinder, ball), each existing in three colors (red, green, blue) and subject to three physical actions (point, touch, push). However, the task complexity is higher because we impose no restrictions upon the target location and its properties (that is, any object can be located at any of the three positions), unlike Sugita and Tani (2005). Hence, 216 different visual inputs are possible in total. The lexicon contains 9 words used for action naming (3 for actions, 3 for colors and 3 for object types). Each sentence is a composition of two words, where the target object is referred to either in terms of its type (e.g. touch block) or its color (e.g. point red).
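
For concreteness, the resulting two-word command space can be enumerated as follows (a minimal sketch; the identifiers are ours and not taken from the original implementation):

```python
# Hypothetical enumeration of the two-word commands described above.
actions = ["point", "touch", "push"]
shapes = ["block", "cylinder", "ball"]
colors = ["red", "green", "blue"]

# Each command pairs an action word with either a shape word or a color word,
# e.g. "touch block" or "point red".
commands = [(action, target) for action in actions for target in shapes + colors]
assert len(commands) == 18  # 3 actions x (3 shapes + 3 colors)
```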

II. Model architecture

The model is composed of three neural-network-based modules that are trained separately and in different ways. The first module is trained to determine the target position in the visual scene from the low-level visual information and the feature-based target information. The second module is trained to execute actions, based on a linguistic command, leading to the requested sequential motor behavior. The third module is trained to provide the linguistic description of the executed action. Below we describe each of the modules in more detail.

A. Target localizer

The first module is a multi-layer perceptron (MLP) with a single hidden layer that is trained to map the triple [IMAGE, COLOR, TARGET] onto [TGT-POSITION]. IMAGE is the bitmap obtained from a real image containing the three objects in the scene (with different shapes), filtered with an image-processing OpenCV module in order to extract object edges (hence providing the shape information). The bitmap was downscaled from the iCub's retinal image (one eye) to 60x20 pixels in gray values, rescaled to the interval [0,1]. COLOR is the localist encoding of the object colors, represented by 9 units (three for each object), so for each input image three units have the value 1 and the others are 0. TARGET information is provided by a word denoting the color (3 units) or the shape (3 units) of the target object. The output POSITION is coded with three units (left, middle, right). The MLP's task is to localize the target position (i.e. to drive attention to it) from the available input information (i.e. the complete scene and one of the two target properties). The MLP has 50 hidden units, and all units have a sigmoid activation function. Since the TARGET inputs are words, the target localizer can be interpreted as a word-understanding module that focuses attention on the target object, described by its shape or color.

We used two-thirds of the inputs for training the MLP with the standard error back-propagation algorithm (for 600 epochs); the rest was used for testing. The MLP achieved an accuracy of roughly 95% on the testing data, which we consider sufficient for our task.
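
The following is a minimal sketch of such a target localizer, assuming the dimensions stated above (a 60x20-pixel image, 9 color units, 6 target-word units, 50 sigmoid hidden units, 3 position outputs); the learning rate and weight initialization are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TargetLocalizer:
    """MLP mapping [IMAGE(1200), COLOR(9), TARGET(6)] onto position (left, middle, right)."""
    def __init__(self, n_in=1200 + 9 + 6, n_hid=50, n_out=3, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_hid, n_in + 1))  # +1 for the bias input
        self.W2 = rng.normal(0, 0.1, (n_out, n_hid + 1))
        self.lr = lr

    def forward(self, x):
        h = sigmoid(self.W1 @ np.append(x, 1.0))
        y = sigmoid(self.W2 @ np.append(h, 1.0))
        return h, y

    def train_step(self, x, target):
        """One step of standard error back-propagation (squared-error loss)."""
        h, y = self.forward(x)
        d_out = (y - target) * y * (1 - y)
        d_hid = (self.W2[:, :-1].T @ d_out) * h * (1 - h)
        self.W2 -= self.lr * np.outer(d_out, np.append(h, 1.0))
        self.W1 -= self.lr * np.outer(d_hid, np.append(x, 1.0))
        return y
```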

B. Action learner

The action learning module is based on a biologically plausible reinforcement learning (RL) paradigm which does not assume the availability of the arm's joint values corresponding to the target location. Since the model operates in continuous state and action spaces, we based our module on an actor-critic architecture, for which a continuous version of the RL algorithm has been proposed. Specifically, we use the CACLA algorithm (van Hasselt and Wiering, 2007) and we experimented with its modifications. The action learning module consists of two MLPs: an "actor" that learns actions, and a "critic" that learns to estimate the rewards associated with different states, which are used for training the actor.

Both the actor and the critic take as inputs the agent-state information (arm position and one touch sensor), the action type and the target position. The iCub's arm proprioceptive information comprises four degrees of freedom (the hand shape does not change), rescaled to the interval [-1,1]. The actor performs the mapping from [STATE, ACTION, TGT-POSITION] onto [STATE-CHANGE]. The target object position is coded in a localist way and is taken from the pretrained target localizer's output. The actor's output represents the arm position change, which is added to the input in the next time step. The actor has 40 hidden neurons, which makes it an 11-40-4 network. The actor's exploration during learning is driven by a 4-dimensional Gaussian distribution centered on the action predicted by the actor, given its current parameters (weights). The critic maps [STATE, ACTION, TGT-POSITION] onto the expected [REWARD] and has 20 hidden units. The reward grows with decreasing distance between the effector and the target (we chose the Manhattan metric for the point action and the Euclidean distance for the other two actions), so we assume that the action learner is provided with this visually-based information.
Both the actor and the critic were trained with the RL version of back-propagation learning (CACLA), via gradient ascent with respect to the network weights. Both networks use a tanh activation function.
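
The sketch below illustrates an actor-critic training step in the spirit of CACLA, with the input/output dimensions given above; the critic is trained here with a standard temporal-difference target, and the discount factor, learning rate and exploration width are assumptions of the sketch rather than the values used in the model:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_mlp(n_in, n_hid, n_out):
    """A small tanh MLP with one hidden layer (bias folded into the weight matrices)."""
    return [rng.normal(0, 0.1, (n_hid, n_in + 1)),
            rng.normal(0, 0.1, (n_out, n_hid + 1))]

def forward(net, x):
    W1, W2 = net
    h = np.tanh(W1 @ np.append(x, 1.0))
    y = np.tanh(W2 @ np.append(h, 1.0))
    return h, y

def backprop(net, x, target, lr=0.01):
    """One gradient step moving the network output toward `target` (squared-error loss)."""
    W1, W2 = net
    h, y = forward(net, x)
    d_out = (y - target) * (1 - y ** 2)
    d_hid = (W2[:, :-1].T @ d_out) * (1 - h ** 2)
    W2 -= lr * np.outer(d_out, np.append(h, 1.0))
    W1 -= lr * np.outer(d_hid, np.append(x, 1.0))

# Actor: 11 inputs -> 4 joint-angle changes; critic: 11 inputs -> scalar value estimate.
actor = make_mlp(11, 40, 4)
critic = make_mlp(11, 20, 1)

def select_action(x, sigma=0.1):
    """Gaussian exploration centered on the actor's predicted arm-position change."""
    _, a_pred = forward(actor, x)
    return a_pred + rng.normal(0.0, sigma, size=4)

def cacla_update(x, a, reward, x_next, gamma=0.95):
    """One CACLA update (van Hasselt and Wiering, 2007)."""
    _, v = forward(critic, x)
    _, v_next = forward(critic, x_next)
    td_target = reward + gamma * v_next[0]
    backprop(critic, x, td_target)      # move the value estimate toward the TD target
    if td_target > v[0]:                # positive TD error: the explored action did
        backprop(actor, x, a)           # better than expected, so the actor imitates it
```
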
A crucial part of the algorithm turned out to be the design of the reward function, which should allow differentiation between the three actions (separating touch from push was at first somewhat problematic). For the touch action, the agent is penalized if the arm touches or pushes a wrong object, or if the target object gets pushed. For the push action, once the arm effector reaches the target position, the target must be pushed beyond a defined threshold in order to obtain a reward; if the target is touched but not pushed, the agent is penalized.
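
A sketch of a reward function satisfying the constraints just described is given below; the penalty values and the exact contact flags are illustrative assumptions, not the values used in the model:

```python
import numpy as np

def shaped_reward(action, effector, target, flags):
    """Illustrative reward for one time step. `flags` holds booleans describing the
    physical outcome: 'wrong_contact' (a non-target object was touched or pushed),
    'target_touched', and 'target_pushed' (pushed beyond a defined displacement threshold)."""
    if action == "point":
        # Reward grows as the Manhattan distance to the target shrinks.
        return -np.sum(np.abs(effector - target))
    if action == "touch":
        # Penalize contact with a wrong object, or pushing the target away.
        if flags["wrong_contact"] or flags["target_pushed"]:
            return -1.0
        return -np.linalg.norm(effector - target)  # Euclidean distance
    if action == "push":
        if flags["target_pushed"]:                 # pushed beyond the threshold
            return 1.0
        if flags["wrong_contact"] or flags["target_touched"]:
            return -1.0                            # touched but not pushed
        return -np.linalg.norm(effector - target)
    raise ValueError(f"unknown action: {action}")
```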

The second module was trained on roughly 200 episodes (i.e. sequences associated with a single action) per location, which turned out to be sufficient for achieving the task: the actor generalized well when initiated from novel arm positions.

C. Action naming

The third module is an echo-state network (ESN; Lukosevicius and Jaeger, 2009), an efficient recurrent network capable of processing sequences, trained to generate the linguistic description of the just executed action. The ESN performs the mapping from [AL-HID, TL-HID] onto [SENTENCE]. AL-HID is the hidden-layer activation of the action learner, which allows the network to predict (and hence name) the action type. TL-HID is the hidden-layer activation of the target localizer, which allows the network to predict the information about the object (type or color) and hence name it. SENTENCE is the spatial representation of the target two-word sentence. An ESN reservoir with 50 linear units turned out to work well (even better than units with a tanh activation function). The output representation is spatial, but we were interested in how the target words come to be predicted from the current reservoir activations. We experimented with various ways of training the output (readout) ESN weights. The best method uses all reservoir activations, concatenated together to form a reservoir matrix; the readout weights are then computed using the pseudo-inverse method.
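
A minimal sketch of such a linear-reservoir ESN with a pseudo-inverse readout follows; the reservoir constants, and the assumption that the static sentence representation serves as the teacher output at every time step of a sequence, are ours:

```python
import numpy as np

rng = np.random.default_rng(2)

class LinearESN:
    """Echo-state network with linear reservoir units and a pseudo-inverse readout."""
    def __init__(self, n_in, n_res=50, spectral_radius=0.9):
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        self.W = W * (spectral_radius / np.max(np.abs(np.linalg.eigvals(W))))
        self.W_out = None

    def run(self, inputs):
        """Collect reservoir states for one input sequence (linear reservoir units)."""
        x = np.zeros(self.W.shape[0])
        states = []
        for u in inputs:
            x = self.W @ x + self.W_in @ u
            states.append(x.copy())
        return np.array(states)

    def fit(self, sequences, sentences):
        """Concatenate the reservoir states of all training sequences into one matrix
        and solve for the readout weights with the Moore-Penrose pseudo-inverse."""
        X = np.vstack([self.run(seq) for seq in sequences])
        Y = np.vstack([np.tile(s, (len(seq), 1)) for seq, s in zip(sequences, sentences)])
        self.W_out = np.linalg.pinv(X) @ Y

    def predict(self, inputs):
        return self.run(inputs) @ self.W_out
```

In this setting, the ESN input at each step would be the concatenation of the two hidden-layer activations, e.g. LinearESN(n_in=90) for the 40-unit action-learner and 50-unit target-localizer hidden layers.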

When training the ESN to associate motor sequences with (static) linguistic labels, the labels were observed to appear at the ESN output during testing as early as a few steps after the sequence was initiated. Our hypothesis that naming would start with the verb (i.e. that action-related output neurons would become activated faster) was hence not confirmed. However, the network showed a certain degree of indecisiveness regarding the object information, sometimes (in roughly 15% of cases) producing two simultaneously active output neurons rather than one. We are in the process of optimizing this module's performance.

III. Conclusion

We have presented a novel connectionist model that masters the task of executing actions and naming them, hence linking sensorimotor behavior with language. Unlike some previous models, which are based on learning to predict the next proprioceptive state via supervision, our model does not assume that this proprioceptive information is given; instead, the motor behavior must be learned from interaction with the environment. Further experiments are being run to fine-tune the model. It will also be valuable to scale up the model.

Acknowledgements

This research was supported by the Slovak Grant Agency for Science (VEGA), grant # 1/0439/11. The results are based on the master's thesis of Tomas Malik, supervised by the author.

References

Asada M., K. Hosoda, Y. Kuniyoshi, H. Ishiguro, T. Inui, Y. Yoshikawa, M. Ogino, and C. Yoshida, Cognitive developmental robotics: a survey, IEEE Transactions on Autonomous Mental Development, vol.1, no.1, 12-34, 2009.

Lallee S., C. Madden, M. Hoen, and P. Dominey, Linking language with embodied and teleological representations of action for humanoid cognition, Frontiers in Neurorobotics, vol.4, 2010.

Marocco D., A. Cangelosi, K. Fischer, and T. Belpaeme, Grounding action words in the sensorimotor interaction with the world: experiments with a simulated icub humanoid robot, Frontiers in Neurorobotics, vol.4, 2010.

Sugita Y. and J. Tani, Learning semantic combinatoriality from the interaction between linguistic and behavioral processes, Adaptive Behavior, vol.13, 33-52, 2005.

Tikhanoff V., P. Fitzpatrick, F. Nori, L. Natale, G. Metta, and A. Cangelosi, The iCub humanoid robot simulator, Advanced Robotics, vol.1, no.1, 22-26, 2008.

van Hasselt H., and M. Wiering, Reinforcement learning in continuous action spaces, in IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), 2007, 272-279.

Lukosevicius M. and H. Jaeger, Reservoir computing approaches to recurrent neural network training, Computer Science Review, vol.3, no.3, 127-149, 2009.

Keywords: Action Learning, connectionist networks, Language, reinforcement learning

Conference: IEEE ICDL-EPIROB 2011, Frankfurt, Germany, 24 Aug - 27 Aug, 2011.

Presentation Type: Poster Presentation

Topic: Architectures

Citation: Farkas I (2011). Connectionist model of action learning and naming. Front. Comput. Neurosci. Conference Abstract: IEEE ICDL-EPIROB 2011. doi: 10.3389/conf.fncom.2011.52.00019

Received: 11 Apr 2011; Published Online: 12 Jul 2011.

* Correspondence: Dr. Igor Farkas, Comenius University, Department of Applied Informatics, Bratislava, 84248, Slovakia, farkas@fmph.uniba.sk