Learning indoor robot navigation using visual and sensorimotor map information

As a fundamental research topic, autonomous indoor robot navigation continues to be a challenge in unconstrained real-world indoor environments. Although many models for map-building and planning exist, it is difficult to integrate them due to the high amount of noise, dynamics, and complexity. Addressing this challenge, this paper describes a neural model for environment mapping and robot navigation based on learning spatial knowledge. Considering that a person typically moves within a room without colliding with objects, this model learns the spatial knowledge by observing the person's movement using a ceiling-mounted camera. A robot can plan and navigate to any given position in the room based on the acquired map, and adapt it based on having identified possible obstacles. In addition, salient visual features are learned and stored in the map during navigation. This anchoring of visual features in the map enables the robot to find and navigate to a target object by showing an image of it. We implement this model on a humanoid robot and tests are conducted in a home-like environment. Results of our experiments show that the learned sensorimotor map masters complex navigation tasks.


INTRODUCTION
Spatial cognition refers to humans' and animals' ability of gathering information about the environment, organizing and using the spatial knowledge, and revising it when the environment changes (Montello, 2001). It is a fundamental ability which helps us to achieve different tasks like navigation and grasping. The spatial knowledge of an environment is presented with features and relationships as a sensorimotor map, which enables an animal to navigate flexibly based on the abstract information stored in the sensorimotor map and to make detours by adjusting the map structure when an obstacle appears. Tolman (1948) observed rats not behaving in a simple stimulus-response fashion but based on some form of internal spatial representation of the environment, which he termed a "cognitive map".
Considering that humans and animals can achieve various complex tasks, how information is processed in the brain should be the key for realizing an intelligent robot (Burgess et al., 2002). Therefore, the study of human spatial cognition is crucial for research on robot mobile behavior. A model based on spatial knowledge has the advantages of high robustness against sensor noise, good adaptation capability during environmental change, and high efficiency, which could overcome the challenges of indoor navigation such as high complexity of the environment and the possible dynamic changes during robot navigation. Furthermore, such a model supports the integration of a service robot into an ambient assistant living (AAL) setup.
Consistent with the suggestion that a spatial environment can be represented with sensorimotor features and actions associated with changes in the sensory input (Zetzsche et al., 2009), we develop a neural-inspired model for robot navigation based on learning sensorimotor representation of an indoor environment.
The focus of this system is to bring into the real world a neural network for planning and navigation based on a model of spatial memory that resembles hippocampal place cells (Toussaint, 2006;Martinet et al., 2011).
The sensorimotor map consists of: (1) a spatial memory that learns the environment through visual perception, (2) an action memory for learning actions associated with state transitions in the spatial memory, and (3) an action layer that controls the robot's behavior based on the associated action input. While related models for navigation have only been tested in simulation (Toussaint, 2006;Martinet et al., 2011), we have further developed our model into a real world scenario. To handle sensor and actuator noise present in a real environment, the positions of the robot and navigation target are represented by multiple hypotheses. Based on these distributed representations in the spatial memory, multiple action memories associated with state transitions combine in the competitive action layer, which yields a robust and smooth control signal for navigation. To handle the dynamics of a real environment, a reflex-like behavior avoids obstacles based on the robot's sensor signals and reduces the corresponding action memory weights. Thereby the robot remembers the obstacles in its spatial memory of the sensorimotor map and avoids them proactively in the future. Weight reduction and recovery, together with dynamic space representations, enable life-long model adaptation. A further unique feature of our approach is that by anchoring the appearance features of the environment with the states in the spatial memory, visual associations are linked to specific locations in the map, which is inspired by biological evidence [e.g., visual landmarks helps desert ants to return home (Collett et al., 1998)]. This visual anchoring allows the robot to achieve complex tasks such as fetching an object by showing an image of it. Together, these developments bring a neural network model for navigation into a dynamic real environment.
The reminder of this article is organized as follows: Section 2 briefly reviews the findings of spatial cognition and the related computational models developed in recent years, and section 3 presents the idea and the goal of our model. Section 4 provides insight into the model and the details of each component. Section 5 describes the mechanism of robot path planning, navigation control, reflex-like obstacle avoidance and the adaptation of the map connections based on the feedback of obstacle avoidance. Finally, we evaluate the test cases in section 6 and summarize in section 7.

RELATED WORK
Recent advances in neuroscience provide insight about the neural mechanisms of spatial cognition in humans and animals. An important finding are the place cells in the hippocampus of rats by O' Keefe and Dostrovsky (1971) and Andersen et al. (2006). Their activity rate is strongly related to the rat's location in the environment. Later on, head direction cells were found in the rat's brain [first in the postsubicular cortex of the hippocampus (Taube et al., 1990a,b), then in other related brain areas (Mizumori and Williams, 1993;Chen et al., 1994;Lavoie and Mizumori, 1994;Taube, 1995;Blair et al., 1998;Cho and Sharp, 2001)]. These cells fire selectively when the rat faces a specific orientation and provide a signal of the rat's heading direction during navigation (Pennartz et al., 2011). Also, anticipatory head direction signals are found in the anterior thalamus (Blair and Sharp, 1995;Taube and Muller, 1998). Together, these cells constitute a coordinate system that provides a representation of the location based on the animal's internal position sense in the environment.
A neural map can be represented in different ways, for example as a topological map (Cuperlier et al., 2007;Martinet et al., 2011), a continuous attractor network (Samsonovich and McNaughton, 1997;Milford et al., 2004;Samsonovich and Ascoli, 2005), etc. Toussaint (2006) developed a model using a self-growing mechanism (Fritzke, 1995) that forms a map with a dynamic size, which is flexible for exploring an unknown environment. The RATSLAM model developed by Milford and Wyeth (2010) provides a nature-inspired way for mapping, which represents the spatial information in its pose cells by combining the internal sensing and the external vision perception. However, the experience map in RATSLAM builds mainly line-like trajectories in space rather than mesh-like representations of space, due to a strict rule to connect cells. This constrains the generation of flexible navigation.
To acquire robust robot navigation, Weiller et al. (2010) proposed an unsupervised learning method to learn navigation behavior associated with state transitions in an unsupervised manner and control the robot during navigation by selecting the action with the highest value. Weber and Triesch (2008) and Witkowski (2007) present neural network models that learn associations between adjoining states and the action that links them. In addition, closed-loop control models for other behaviors, e.g., arm reaching, apply similar methods of planning (Herbort et al., 2010). A drawback of these models is that the representation of the state space is hardwired, which means that they only work in a well-defined environment. The action model in these models is discrete, and the robot is controlled by a winner neuron's action signal. Hence, the executed action might not be accurate in the continuous real world because of the discretisation error, or a fine-meshed action space would be required, which increases the learning effort strongly.
Given a population code for state estimation, there will be different actions suggested by the network for the robot to take. In order to allow their integration, the action layer is implemented as a neural field. A neural fields model has been seen as a simple but effective way to model motion perception (Giese, 1998), and has drawn more attention in the robotic area because of its distributed representation and the dynamic integration of information (Cuperlier et al., 2005;Toussaint, 2006;Torta et al., 2011). For example, Erlhagen and Bicho (2006) achieved a goal-directed robot navigation behavior with realtime obstacle avoidance with dynamic neural fields (DNFs). Because of the distributed information encoding, i.e., the neural population coding, the DNFs approach is able to generate stable signals by updating the activation patterns with noise canceling.

MOTIVATION OF OUR NEURAL APPROACH
Our neural architecture models the spatial context, the reward signal, the decision making and the action response as a whole. The spatial knowledge of the environment is modeled with an internal representation, i.e., the sensorimotor map, which allows the robot to select actions dynamically and to adapt its strategy when the environment changes. Not only the state transition but also the actions corresponding to them are represented in the map. In order to accelerate learning and to avoid possible danger caused by the robot's active exploration, the map is built by observing the movement of a person using our localization method with a ceiling-mounted camera (Yan et al., 2011). This design has been chosen in the context of an AAL setup that uses a small socially assistive robot as a communication interface to the person (The KSERA project ). The ceiling camera presents a cheap and little intrusive solution to localize a person anywhere within a larger room, even when the small robot cannot directly see the person. At the same time, it supplies high-level visual input about the robot location in allocentric coordinates, consistent with hippocampal place coding, activating directly the neurons in the spatial memory. This bypasses the need of learning a visual system that localizes from the robot's camera image, as has been done by Wyss et al. (2006). The localization input, which comes via particle filters (Yan et al., 2011), is compatible with distributed neural coding (Deneve, 2005;Huang and Rao, 2009;Wilson and Finkel, 2009).
The map building itself resembles "latent learning," where there is no task during the exploring of the room by a person. When the navigation task is active, the sensorimotor map together with the reward signals from the target position support a model-based reinforcement learning, which guides the robot for maneuvering to the target. Because the robot control is not calibrated, but learned, the model can be applied easily in different rooms without camera calibration. Parts of this work have been presented in Yan et al. (2012b).
Our model tackles the challenge of applying a neural system in realistic settings, and uses a humanoid Nao robot as experimental platform. Other than pure simulation models that simplify the environment or model the system dynamics exactly without considering noise, our robot needs to navigate by finding a movement direction among 360 • in our home setup, which requires further refinement to match requirements of real application. For example, in order to obtain a continuous robot control, our model represents the individual action for each state transition to obtain a high control accuracy with minimal memory requirement and uses a population code to represent the state in a probabilistic manner. Using this distributed representation, multiple state transitions are active at the same time and the corresponding action signals are merged together via a ring-form neural field. This results in a smooth and continuous action control, where actions are generated that did not occur during map learning. Moreover, the sensorimotor map does not need to predefine its state space, because it adapts itself using latent learning while a person explores the unknown environment. As no goal is needed here, the path planning is not learned for a specific target, which permits a flexible navigation behavior toward an arbitrary possibly moving target.
According to Penner and Mizumori (2012), the dorsolateral striatum (generates automatic behavior when appropriate) and the dorsal striatum (coordinates the goal-directed behavior) are important for animals to generate flexible navigation behaviors. Our framework models the dorsolateral striatum with a reflexlike behavior for interaction with the obstacles and updates the connectivity of connections in the sensorimotor map, which resembles the dorsal striatum to coordinate the path planning and provides a pro-active obstacle avoidance property. Consistent with the advice of using a hybrid control architecture (Murphy, 2000), our system fuses the path-based and behavior-based navigation in a constructive manner.
In order to build up a memory of the environment context, features of the current camera view are extracted and associated with the current state of the robot in the spatial memory. This memory is essential to accomplish complex cognitive tasks, for example to ground visual appearances to a location (Bellotto et al., 2008). In our case, a camera integrated in the robot's head captures the appearance of the environment during navigation. The anchored memory of objects is used to let the robot locate an object in an environment by showing a picture of it after the robot has observed the environment properly through its explorative navigation.

ARCHITECTURE
The architecture of our navigation framework is shown in Figure 1. Three sensors are applied: (1) a robot camera fixed on the robot's head for extracting the visual features of the environment, (2) a ceiling-mounted camera to localize the person and the robot, and (3) sonar sensors installed on the robot's chest for detecting obstacles. The position information of the person and the robot is used for building up the sensorimotor map and planning navigation, and the visual features are used for generating an appearance memory that associates with the robot's position in the map. The interaction model generates reflexbased behavior for obstacle avoidance and also adapts the spatial memory so that the robot can remember and avoid the obstacle proactively next time.
The core of the system is the sensorimotor map. As shown in Figure 2, it consists of three components: (1) a spatial memory that learns the structure and the appearance of the environment, (2) an action memory that learns an inverse control model, and (3) an action layer for robot control. We will first describe these components and then the mechanism of planning and interaction with obstacles.

SPATIAL MEMORY
The spatial memory layer represents the spatial information of an environment. It contains two types of spatial information: states that present the features and connections that present the relations between different features. When a person, as observed by the ceiling-mounted camera, or the robot visits a novel location, features of this location as well as the relation with FIGURE 1 | Overall system architecture.
FIGURE 2 | Architecture of the sensorimotor map. The red curve above shows the activation bump of the neural field (action layer) which is used for robot control.

Frontiers in Neurorobotics
www.frontiersin.org October 2013 | Volume 7 | Article 15 | 3 neighboring places will be stored. The features of the spatial memory can be presented in different forms, for example, they could be readings of a laser scanner, etc. In our case, it is the position information ("where") measured by the ceilingmounted camera, combined with feature information ("what") that is observed from the robot's camera. When the robot is at a specific position, the neurons whose features match with the ones obtained from the current position will be activated. This resembles the neural activity of the place cells in the hippocampus, whose firing rate is dependent to the current location.
During navigation, the desired next state will be estimated in the spatial memory layer to guide the robot to the target position (see Section 5). As output of the spatial memory layer, the activity signal from the current state of the robot, i.e., q c (red line in Figure 2), and the desired next state, i.e., q n (blue line in Figure 2), will be sent to the action memory layer. The computation of q c and q n is described in the following section.
The spatial memory is represented by a Growing When Required (GWR) network (Marsland et al., 2002). Compared with the Growing Neural Gas model (Fritzke, 1995) that we used in Yan et al. (2012b), GWR does not grow over time but only when novelty is detected, which provides better convergence properties. The network consists of a set A of neurons, each associated with feature vectors v, and a set N of connections to describe the relations (i 1 , i 2 ) between neurons i 1 and i 2 in A.
The different features presented in the neurons are the x, y coordinate information on the image from the ceilingmounted camera, v c = {x, y}, and the appearance memory, which resembles the visuospatial perception, based on visual keypoints extracted from the robot's camera v r = {k 1 , k 2 , . . . }. These features are used to determine the position of the robot as well as the target. Neurons and connections will be allocated or updated dynamically using a competitive Hebbian learning rule.
Because a person's walking behavior is different to the robot's, the map built by observing the person's movement might not be totally suitable for a robot. Hence, robot-environment interaction is essential for the robot to adapt its navigation strategy online. We therefore define a connection weight c i 1 i 2 ∈ [0, 1] for each connection (i 1 , i 2 ) to indicate how "easy" a robot can move along this connection. The higher c i 1 i 2 is, the easier is the connection for the robot to walk through. When an obstacle is detected or the robot has difficulties walking further, c i 1 i 2 will be decreased and may reach zero. When a connection is created, its connection weight c i 1 i 2 will be initialized to 1 and adapted during the robot navigation. Details about this adaptation will be described in section 5. The following part of this section will explain the methods of map building based on the position and the visual features.

Learning position information
The map building is based on the position information ξ = {x, y} from the person or the robot localization and orientation model using the ceiling-mounted camera [for details about the personand robot-localization please see Yan et al. (2011Yan et al. ( , 2012a]. We first find the winner neuron i * and the second winner neuron i * * by calculating the map activity s (later we will use s p to indicate a person and s r to indicate a robot in section 5) based on ξ with the following equations: A new node will be added between neurons i * and i * * when the following two conditions hold: (1) The activity of the winner neuron s i * is smaller than a threshold activity a t , which means that the person is far from the position represented by any map unit, and (2) The firing counter h i * has become smaller than h t , which means this neuron cannot move a lot any more.
Neuron insertion includes the following steps: • Insert a new neuron r with the average weights of the winner neuron and the current position: • Insert edges between r and i * as well as between r and i * * • Remove the current connection between i * and i * * The coordinate information of the winner neuron (i.e., v c i * ) as well as its neighborhood neurons (v c i for all directly adjacent neurons of i * ) will be updated. Each neuron is assigned an age factor age i , which can increase incrementally, and a firing counter h to control the adaptation efficiency. In order to improve the convergence of the network and to have a homogeneous distribution of the neurons, the adaptation is as follows: where i * and n are the fixed learning rates of the winner and the neighborhood neurons, and h i * and h n are the corresponding firing counters. The firing counters (initialized with h 0 > 0) are calculated as follows: where τ b , τ n , and μ are constant parameters for controlling the adaptation. The firing counters indicate how "active" a neuron is. When a neuron is added in the network, its firing counter is initialized as a high value h 0 , which allows it to adapt its features quickly. During the iteration, h decreases toward a small value μh 0 and the neuron loses its mobility, which ensures that the neuron's positions become stable. After adaptation, we increase the age of all edges that connect with neuron i * : and delete the connection whose age is over a threshold age m . Isolated neurons that have no neighborhood will be deleted as well. For details of parameter setting please see Table A1 in the Appendix.

Anchoring the appearance memory
The robot's camera captures vision information when the robot walks during or after map building. The neuron i * closest to the robot's location will be active in this case and the visual features extracted from the robot's camera will be assigned to v r i * : A buffer is defined for each neuron to store the last 64 visual features when the robot visits the corresponding place, irrespective of its orientation. We use SURF features (Bay et al., 2006) to present the information of keypoints (see Figure 6A). Each keypoint contains the x, y position of the feature point in the image of the robot camera and a 64-dimensional vector that represents the image gradients. As a result, the robot learns a memory by associating the extracted visual features with its current location in the spatial memory during navigation. When the robot visits the same place in the map again with different orientation, the features from the new point of view will be inserted to the same neuron corresponding to this place. This memory is used for locating an observed object by comparing the similarity between the features extracted from an image of the target and the visual features stored in the appearance memory. For details of computation of SURF features as well as feature matching please read the original paper (Bay et al., 2006).

FORWARD AND INVERSE MODEL
The forward and inverse model represents the robot control signals coupled with the state transition in the spatial memory layer. The action information is learned in the weights w ki 1 i 2 which connect fully with the neurons k in the action layer. Depending on the way of controlling the robot, the action information can be presented in a different form, for example the force, velocity, angle value, etc. In our case the robot is controlled by adjusting its heading direction. For this, we use a ring-form DNF, which will be described in the next section. During map building, when a robot moves in a room and its spatial representation changes from i 1 to i 2 , the action executed at that time will be associated with this state transition. As a result, the robot learns the action for each state transition and is able to select the most appropriate behavior for navigation. We use the second order weights {w ki 1 i 2 }, which is also called Sigma-Pi weights (Weber and Wermter, 2007), to store the action information associated with the state transfer in the spatial memory. When the robot shall move from i 1 to i 2 in the spatial memory, the corresponding second order weight will be activated with the input signal s r i 1 of the current state i 1 and q d i 2 of the desired next state i 2 , and the output I k toward unit k in the action layer will be computed as follows: where the calculation of q d i 1 will be described in section 5. Because of the distributed representation of the robot's position, multiple connections may be activated at the same time. Equation (14) sums up those inputs.
The neural field of the action layer has 36 nodes, hence k ∈ {1, 2, . . . , 36}. Assume that there are m neurons in the spatial memory layer, then the total number of connection weights w ki 1 i 2 are 36m 2 , which grows quadratically according to the number of spatial memory neurons. However, most of the weights are zero since the action layer is sparsely connected with the spatial memory layer.
These connection weights can be trained online. The action learning can be done based on observing the person's movement, or based on the robot's location and motion information. Here we describe the method of action learning only regarding the observation of the person's movement because in principle the action learning based on the robot's movement is similar. The winner neurons with respect to the person's position will be determined at first and all the connections c i 1 i * between the winner neuron i * and its neighborhood neurons (indexed with i 1 ) will be adapted. Assuming that a connection c i 1 i * is active, the direction associated with this connection is calculated and the corresponding weights w ki 1 i * are trained. Since there are two possible walking directions for every connection (from i 1 to i 2 and from i 2 to i 1 ), we train both weights at the same time as follows: (1) According to the position x i 1 , y i 1 of neuron {i 1 } and x i * , y i * of neuron {i * } of the spatial memory, we calculate the possible orientation o i 1 i * of connection c i 1 i * using inverse trigonometric functions: And then we calculate the opposite orientation o i * i 1 : (2) Two bumps of activation with the size of the DNF will be created in the shape of a circular normal distribution, one around the link orientation: and another around the opposite orientation: where p ki 1 i * is the k-th connection weight of the action memory for orientation o i 1 i * , κ is a constant and J 0 (κ) is the modified Bessel function of order 0 (Abramowitz and Stegun, 1965): (3) Minimize the errors between the current activation p and the second order weights w, i.e., 1 2 ||p ki 1 i * − w ki 1 i * || 2 . Gradient descent leads to: where η is a fixed learning rate.
Note that the spatial map and the forward and inverse model is based on spatial relations, but not on temporal ones during person moving, because the person's motion commands are unknown. In case of learning the sensorimotor by observing the robot's movement, the forward and inverse model can be built based on the robot's action signal.

ACTION LAYER
The action layer generates the robot control signals based on the active action units during navigation. A DNF model is used to merge these action signals and to adjust the robot's walking orientation by showing the desired robot orientation. The DNF is a biologically-inspired model of the neural dynamics in cortical tissues (Amari, 1977), which is of interest in robotics to generate dynamic behavior (Cuperlier et al., 2005;Erlhagen and Bicho, 2006). A one-dimensional ring-form DNF with 36 neurons is implemented in our work representing the desired robot orientation in 10 • increments. The DNF is capable of integrating the multiple action codes received by the action layer and adjusting the robot's motion with a smooth orientation behavior. Each neuron k of the DNF has a membrane potential u k that represents the activity and lateral connections n kj with other neighbor neurons j. Through the following updating rule (Equation 21), the DNF generates an activation bump dynamically to show the suggested orientation for the next step.
where h is a rest potential, τ is a temporal decay rate of the membrane potential, and I k is the input stimulus of the k-th neuron received from the second order weights that encodes the desired robot orientation. We use here a Gaussian-function with negative offset as the function n kj to describe the lateral interaction of neurons: where β is a scaling factor, σ 2 a variance, k, j the index positions of neurons and c a positive constant. The function f (u) is a sigmoid transfer function of a single neuron with a constant offset g: The robot's desired orientation O d is calculated using vector averaging: We control the robot's navigation by giving it a differential orientation command: where O p is the present robot's estimated orientation (see Yan et al., 2012a), c is a constant rotation speed parameter and d is a constant threshold. Details about parameter setting are listed in Table A2 in the Appendix.

PLANNING AND NAVIGATION
As described in Equation (14), the control signal entering the DNF network is computed based on the connection weights w, the activity of the current state q c and the activity of the next desired state q d . Because the connection weights w are trained during map building and q c can be computed with respect to the robot's position, this section focuses on how to determine the next desired state q d . We therefore assign for each neuron of the spatial memory a reward value that spreads from the target states representations iteratively with an exponential decrease. First, however, the navigation target needs to be marked in the spatial memory. Depending on the navigation task, we use the coordinate information v c or the appearance memory feature Frontiers in Neurorobotics www.frontiersin.org October 2013 | Volume 7 | Article 15 | 6 v r as initial signals to define the target. To this end, an initial reward r i (0) at the target location is calculated with the following steps: (1) Calculate the goodness-of-match signals m i of the neuron i in the spatial memory. For approaching a person, we calculate the signals m based on the distance between the person's position ξ and the neuron's coordinate v c . Assume that the person's position is distributed with a position list ξ s (indexed in s) with corresponding probabilities w s where s w s = 1, the m is computed as: This means, the closer v c i of neuron i to ξ s is, the higher the activity this neuron has. For finding an object, we compare the similarity between v r i of neuron i and the features v obj of the target object as follows: where v r i are the learned keypoint features (see Equation 13) and v obj are the keypoint features extracted from the target object. The more visual keypoints of the test object match v r i of the neuron i, the higher the activity this neuron will have. If no matched feature is found, m i = 0.
(2) Normalize the match signals with a softmax function: (3) Assign m i to initial reward signals with a threshold filter: where p indices the neuron of the initial reward signal.
Multiple units will contribute to localize the target object/person since the person's location is presented with a probabilistic distribution and the object features may be represented at different positions. For each r p i (0) > 0, the reward signals will spread separately to the neurons connecting to the neuron i (listed in nl) iteratively with a discount factor λ and the corresponding connection weight c ij (see Equation 38): where the neighborhood list nl will be updated for each iteration as follows: nl(t + 1) = n After the spreading phase, the final signal r j of each neuron {j} will be calculated by summing up all the reward signals from the target's location distribution: As shown in Figure 5, the gray-scaled brightness of the neurons indicates the reward spreading from the target location. The brighter the color is, the higher the reward this neuron has. Based on these reward signals, the robot plans its action by calculating the next position it should reach. Assume that the robot's position is represented by a group of neurons α in the spatial memory. The next possible position should be among the neighborhood neurons that connect with neurons in α directly. The activity q d i 2 of these neighborhood neurons i 2 , which connect with neurons i 1 ∈ α, is computed as follows: where s r i 1 is neuron activity of the robot detection (see Equation 1) and c i 1 i 2 is the connection weight (see Equation 38). The higher q d i 2 is, the more desirable it is for the robot to be at this position. We scale q d i 2 to the range of [0, 1] by dividing all the q d i with the maximal value max i (q d i ). Theoretically, the distance of reward spread is unlimited, but the strength of the signal decreases exponentially with a constant discount factor, which leads to small gradients at large distances. For each decision making the robot only evaluates the states around the current location and uses a soft-max function to retrieve those with highest reward value. Neuronal noise would impose a limit at which the gradient toward the goal cannot be evaluated, but in the computer implementation, the limit will occur when the computer cannot distinguish the higher value due to the precision of the double float value. However, during our experiments this situation never appears.

INTERACTION WITH OBSTACLES
An interaction mechanism is essential to provide the robot with some "reflex" behavior to protect the robot passively during navigation and to adapt the navigation strategy online. Since the spatial memory is built using a person's actual movement, it may well fit here, since it helps showing that some positions where the person can go are not accessible by the robot. The connection weights c i 1 i 2 in the spatial memory indicate how "easy" the robot can follow that link. They are further adapted depending on the interaction with the environment.
A small humanoid Nao robot is chosen for evaluating our architecture. It is equipped with various sensors, among them we use one camera in the head to observe the environment and the two pairs of sonar sensors for detecting obstacles during walking (Figure 3). Both sensors can detect the distance to obstacles robustly between 30 and 80 cm. A higher sensor value indicates a larger distance. Some routes learned from observing the person may be difficult for the robot to walk through, for example the path (shown in Figure 3) between the sofa and the tea table with a width of only 25 cm. Here, the sonar sensors indicate an obstacle in front of the robot. We simulate the sonar sensor in the simulator for modeling the obstacle avoidance behavior which is described in section 5.1. In order to incorporate a reflex-like obstacle avoidance behavior, for each step, we compute a signal G(s 1 , s 2 ) according to the sonar sensor signals s 1 and s 2 with a non-linear function: where a, b and c are constant parameters. Two kinds of obstacle avoidance strategy will be triggered based on G(s 1 , s 2 ). When G(s 1 , s 2 ) is below a threshold, i.e., G(s 1 , s 2 ) < γ, the robot will turn away from the obstacle based on the sonar signals: where c o is a defined parameter. When G(s 1 , s 2 ) < 0.8γ, the robot will walk slowly backwards besides turning. The weight adaptation is as follows. When G(s 1 , s 2 ) < γ, we update the connection i 1 i 2 * with the highest value of s r i 1 q d i 2 , i.e., the connection along the current traveling direction: Then, the connection weight c i 1 i 2 * is adapted based on the sensor inputs: where τ 1 is a learning rate. When the robot approaches an obstacle, the connection weight c i 1 i 2 * will thereby be decreased and even converged to zero when the obstacle gets too close. Since obstacles may also be removed from the environment, we consider the following method for recovering the connection weights in this case. When G(s 1 , s 2 ) > γ, all the connection weights around the current robot position are adapted with Equation (39): where e is a threshold of the distance and τ 2 is a learning rate smaller than τ 1 . For our robot we adjust parameters as given in Table A3 in the Appendix.

EXPERIMENTAL RESULTS
To evaluate the performance of our model, experiments are conducted in a simulator 1 as well as in a real home-like laboratory. Figure 4A shows the GUI of the simulator. It mimics a real environment and is used for demonstrating the map building and the navigation functionality. We also simulate the mechanism of sonar sensors, which can be visualized. The setup of the real test environment is shown in Figure 4B. A ceiling-mounted camera with a resolution of 320 × 240 pixels is used to localize the person and the robot (for details please see Yan et al., 2011). A fish-eye lens gets a wide field of view of the whole room with a single camera but at the price of strong image distortion. Since the map building is based on the internal spatial representation from the ceiling camera view, this distortion does not interfere with the robot navigation behavior.
The following tasks are included in the experiments. First, the spatial memory is built by observing the movement of a person within the room. Second, once the map is built, the robot can either navigate to the target autonomously or be controlled by a joystick in order to learn the environment further. While the robot walks, appearance features are extracted from the robot's head camera (with a resolution of 320 × 240 pixels) and stored to the corresponding spatial state. Based on this memory, the robot is capable of locating and navigating to the target object when showing an image of it. Third, since obstacle avoidance leads to a decay of the spatial memory connection weights, we evaluate the performance of interaction with environments by comparing the navigation behavior before and after obstacle avoidance. Details are presented in the following sections.

MAP LEARNING BY OBSERVING PERSON MOVEMENT
At the beginning of map learning, the spatial memory is initialized with two neurons linked with each other with a connection (see Figure 5E). Then, based on the person's position estimated by the visual input, the closest neuron to the person (winner neuron in yellow) and the second closest neuron (second winner neuron in blue) will be computed. The winner and its neighborhood neurons will be drawn to the person's position and new neurons will be inserted (see Figure 5F). The spatial memory will grow automatically when a person moves to a new place in the room, until most of the free space has been visited ( Figure 5G). After the sensorimotor map building phase, we control the robot remotely to explore the room and to memorize the appearance of the environment. The visual features extracted from the robot's camera   Figure 6A) will be registered to the corresponding neuron where the robot is located. For details of the learning rules please see Equations (1-13).

NAVIGATION TO A PERSON
The navigation task can be started after the map building. The robot will first be localized (see Figure 5G, the robot is localized with a red spot, and the red short bar shows its estimated orientation) and the neuron activities s r i that represent the current position of the robot are computed. A reward signal then spreads from the person position using Equations (27-30) (see yellow spot) through the entire network with an exponential decrease, which is visualized through the brightness of the nodes (see Equations 31-33). The brighter the neurons are, the higher reward they have. Then, based on the current robot position and the reward signals in the sensorimotor map, the robot calculates the next desired state and generates the motion signals with Equation (14). The motion signals are merged in the DNF and the activities of the neurons of the DNF are updated (cf. Equations 21-23). In the lower row of Figure 5 we visualize the neuron activities of the DNF by a red circle surrounding the robot's position with a basic radius of 15 pixels where activations are zero, and a larger radius where activations are larger. An activation bump is built up which determines the desired orientation for navigation . During the robot's movement, the state representations are changing and the activation bump will be updated to the new desired orientation. Also, when the target person moves, the reward spreading will be changed and the robot replans its behavior in real time 2 . Because of the dynamic behavior of the DNF, the robot will adjust its orientation slowly and walk in a natural way instead of reacting suddenly to noisy measurements. When the robot gets close to the person, the navigation will be achieved and the robot will stop walking ( Figure 5H).

OBJECT FINDING BASED ON APPEARANCE FEATURE MEMORY
To test the object finding task, we show a picture of an object to the robot, which might have been observed by its head camera during the explorative navigation. Then, the robot searches its visual memory to find the states that contain the visual features resembling the features of the target object. As shown in Figure 6A, the reward signals are reset according to matching features extracted from the shown image and the features v r of neurons (cf. Equation 13), and spread from the winner neuron to the entire network (see Equations 28-33). Then the robot can approach the object with the same method used for approaching a person 3 . Twenty experiments are conducted for locating different objects. For each experiment we first let the robot walk around the room and learn the appearance of the environment. Objects are placed in different positions each time, which are observed by the robot during navigation. After learning, we check if the robot can find the correct position of each object by showing an image of it. The object is considered to be identified correctly if the distance between the target node, i.e., the neuron with the highest reward in the spatial memory (see the brightness of neurons in the map) and the object is smaller than 30 pixels. As summarized in Figure 6B, the results vary with the objects. The docking station has the lowest success rate because of its simple structure and few detectable features. Due to the constraint of the robot hardware, the robot can provide images with 10 frames per second (fps) and features may be missing because of the image blur. The book and the bottle can be localized easily because sharp features can be extracted from their surfaces. Also, the light condition of the environment influences the experiments.

MAP ADAPTATION DURING NAVIGATION
The adaptation of the map during navigation is an essential ability of the robot to interact with the environment and adapt the spatial knowledge based on its interaction with obstacles. We set up a test scenario shown in Figure 7. The robot can navigate to the target position based on the map learned from observing the movement of the person, and among the connections in the map, the route masked in red seems better for navigation and the robot attempts to choose this way. However, since this narrow path has only 25 cm width, which is smaller than the limit of detection range of the sonar sensors, the robot is unable to walk through this path. The robot should then realize this difficulty by obstacle detection and adapt the sensorimotor map accordingly. We will start this task several times from the same position to check how the navigation performance improves. The trajectories of different trials are shown in Figure 7. The robot first tries to navigate through the narrow path and fails due to the warning signal of the sonar sensors. According to this feedback, the corresponding connection weight decreases. At a certain point, when the connection weight is small enough, the robot's behavior will be changed. We then start the navigation from the same initial position again. As the trajectories of the first FIGURE 6 | Anchoring appearance features in the sensorimotor map. (A) Shows the features extracted from the robot's camera during navigation. We use SURF features (Bay et al., 2006) for representing the appearance shown as red circles. (B) Shows the results of object finding. Each object is experimented for 20 times in our lab environment. The black columns show the number of successful tests and the success rate on average is 84.2%.

Frontiers in Neurorobotics
www.frontiersin.org October 2013 | Volume 7 | Article 15 | 10 adaptation shows, the robot turns immediately to avoid the obstacle pro-actively. That means, the robot has learnt the suitable way for walking through map adaptation. We evaluate the performance of the navigation from different positions and a subset of the experiments is displayed in Figure 8A. A map is built each time and then we let the robot navigate to the person's position shown as squares in Figure 8A. Because the obstacles are detected by sonar sensors, the robot will keep a certain distance to the obstacles. Thus, the robot's behavior differs from a person's: the planned navigation paths are often blocked by corners of furniture which are easy for a person to avoid (see Test 1, 2, and 3). Also, a new obstacle was placed in the room (Test 4) which was not present during mapping. During the robot's own exploration, the robot detects the obstacles and adapts the map by deactivating the connections close to them. As a consequence, after the adaptation the robot would choose a safer route for approaching the target. In some cases (for example Test 3) the time for normal navigation behavior increases, because the robot may walk in a longer path to avoid obstacles. The time analysis of these experiments is shown in Figure 8B. As we can see, the total time of navigation decreases significantly and the run of the second adaptation is close to the manual control.

SUMMARY
We have presented a novel neural framework for robot navigation in a cluttered environment based on sensorimotor map learning. The system consists of multiple layers of neural networks, which combine map building and localization with planning and navigation. The spatial memory is represented by a GWR network with self-organizing learning, which is related to dynamic [e.g., cell growth (Eriksson et al., 1998)] place cells in the hippocampus (Gorchetchnikov and Grossberg, 2007). The ceiling-mounted camera simulates a high-level visual perception model not only for robot localization and map building, but also person detection from an arbitrary position in the room. Our map learning and adaptation is inspired from the principles of sensorimotor learning (Wolpert et al., 2011): (1) observational learning that develops the map and the corresponding motor skills by watching a moving person and (2) error-based learning during navigation that adapts the map, and hence its navigation strategy based on interaction with obstacles. During navigation, the robot learns the appearance of the environment by anchoring the object's features to the corresponding neuron in the spatial memory, which simulates the visuospatial perception and enables the robot to combine the cognitive tasks of locating and navigating to an object held in memory. The navigation is planned in real time based on the reward signal spread through the spatial memory network from the target position. Since there is activity away from the robot's actual location, this could correspond to the activation patterns observed in hippocampal cells, which do not strictly encode the current position of a rat, but represent places it is considering to visit in the near future (Van Der Meer and Redish, 2010). The navigation tasks to different targets were achieved successfully in a home-like laboratory. Our research represents the first adaptation of the kind of models of Toussaint (2006) and Martinet et al. (2011) to the real world, transferring the new concepts from a simplified to a real environment with higher complexity. Interesting further developments in our model have arisen from the needs of the real world. First, while the goal position and the current robot position can be determined perfectly in simulation, it becomes difficult in the real environment due to the uncertainty of the measurement. A distributed neural coding is therefore used (i.e., multiple hypotheses) to represent the positions of the robot and the target. This brings the model closer to the spiking and redundant population coding in real neurons, and is useful to represent the locations from which the robot can see an object, as the object may be observed from different positions. Second, unlike in the simulator where a state transition with a certain action can be modeled deterministically, the effect of an action may be uncertain due to the presence of noise. Therefore, in our model we back-propagate the rewards on the state level, which is corrected through observation during navigation.
Our model is able to perform a flexible navigation to reach an arbitrary target without pre-training with a fixed goal position. The robot moves within a broad home-like environment, which requires selection of a continuous direction among 360 • , rather than discrete choices as in mazes or corridors. Hence, we merge different actions weighted with their corresponding activities to generate actions that are more precise and robust with respect to the discretization of the grid of the spatial map. Another aspect that needed to be considered for real-world integration is the focus on the obstacle avoidance and the adaptive map and planning behavior in a dynamic environment, which is an attractive topic in robotics. This allows perpetual learning which is required in a dynamic environment.
Overall, the key achievement of this work is the successful development of a neural model for robot indoor navigation and visual appearance anchoring to realize cognitive tasks such as finding and approaching an object. The obstacle avoidance validates the model in a dynamic environment, which requires to incorporate a simple reflex to unknown obstacles. To remember them in spatial memory leads to improved performance.

OUTLOOK
The presented architecture learns a sensorimotor map through observing the movement of a person in a room in order to accelerate the mapping phase. However, it might not be suitable for a person to explore some places (for example a room with high temperature) for safety reasons. Accelerating the room mapping without support from a person is essential in this case. In future work we plan to therefore focus on learning a sensorimotor map via observing the active exploration movement of a robot. The ceiling-mounted camera is an effective external sensor that needs to be installed, but also constrains the flexibility of the system. We will therefore also consider to build up the map using only the head camera of the robot.