Neurobiologically Inspired Mobile Robot Navigation and Planning

After a short review of biologically inspired navigation architectures, mainly relying on modeling the hippocampal anatomy, or at least some of its functions, we present a navigation and planning model for mobile robots. This architecture is based on a model of the hippocampal and prefrontal interactions. In particular, the system relies on the definition of a new cell type “transition cells” that encompasses traditional “place cells”.


INTRODUCTION
Understanding human cognition is a very difficult problem. We choose to focus on navigation and planning behaviors. Our work follows an iterative strategy divided into two related parts. First, simulations allow us to define a minimal model that isolates a cognitive function, based on biological data and experiments with animals. Second, we implement our models on robotic platforms because we need a physical interaction with the environment (embodiment). This phase also allows us to validate or invalidate the simulation model and to suggest new modifications to it. Hence, the present model does not take into account some modifications already introduced in our latest simulation work (such as the integration of grid cells and the return of idiothetic information into EC, see Section Conclusion).
Navigation in an unknown environment requires the agent or the robot to select the appropriate action to perform. This task may be complex when several actions are possible, and different approaches have therefore been proposed to choose what to do next. In traditional robotics, many methods rely on a combination of different algorithms that have to be triggered appropriately (and concurrently) when necessary. Hence, one challenge is to develop a system that autonomously decides the appropriate behavior for the goal to achieve. We try to address this problem under the following constraints: (1) the model should be biologically grounded; (2) the model should be as minimal as possible; (3) input is limited to visual information (no ultrasound, laser, GPS, . . .); (4) the homunculus problem, where one has to develop an external algorithm in order to perform an action, should be avoided.
The first item imposes a neural coding. Rate coding is enough for the present model. More biologically plausible models are also being developed by our team and serve as a basis for robotic control architectures. Concerning the second point, we adopt a constructivist approach. The third point is linked with the first one. We do not want to provide additional information that could solve a crucial problem for animal navigation (such as setting up the environment for the needs of the experiment: tagging objects, . . .).
Finally, the last point is maybe the most demanding. For instance, it is not necessary to "see" the map of an environment in order to be able to use it. Introduction of transition cells instead of place cells (PCs) is an answer to this question (see Section Place Cells and Subsection Transition Cells Coding).
To point out the difference between a coding relying only on PCs and one relying on PCs and transition cells, let us take the following example. A first description of a path can look like this: "in A turn 10 degrees to the left and go straight until reaching place B, then turn 40 degrees to the right until reaching place C." Instead, we have chosen to describe it as: "in A use the transition AB to reach place B, next use transition BC to reach place C." Each transition can be linked with the movement used to go from one place to another, for instance transition AB with the movement "turn 10 degrees to the left and go straight" for going from place A to place B. Once two waypoints, their corresponding transition, and the movement are learned, the duration of the displacement does not matter. Only the order of the elements in the sequence is important.
This example shows the natural way of coding a path by using a graph where the nodes are the places and the edges code for the movement needed to join them. This is the case, for instance, in the model developed by Mallot et al. (1995). However, this graph has no neuronal grounding: the movements have to be extracted by an external algorithm (see also Subsection Navigation with Topological Maps). Similarly, Hafner (2000a) suggests that a representation of the environment is stored and contains information on the direction of the path between pairs of locations. This definition resembles our definition of a sensory-motor transition cell. Trullier (Brunel and Trullier, 1998; Trullier, 1998) also proposes to replace the directional goal cells by directional PCs. But this directionality tends to vanish because of the recurrent links in CA3. Finally, Chavarriaga et al. (2003) also propose to use directional PCs in their model.
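The contrast between the two path descriptions can be made concrete with a small sketch. It is only an illustration of the data structure, not of the neural implementation presented later; the function names and the movement labels are hypothetical, and only the angles come from the example in the text.

```python
from typing import Dict, List, Tuple

Transition = Tuple[str, str]  # (source place, destination place)

# Hypothetical movement labels attached to each learned transition.
movements: Dict[Transition, str] = {
    ("A", "B"): "turn 10 degrees left, go straight",
    ("B", "C"): "turn 40 degrees right, go straight",
}


def path_to_transitions(places: List[str]) -> List[Transition]:
    """Turn an ordered list of places into the ordered list of transitions."""
    return list(zip(places[:-1], places[1:]))


def movements_for(places: List[str]) -> List[str]:
    """Look up the single movement associated with each transition of a path."""
    return [movements[t] for t in path_to_transitions(places)]


plan = path_to_transitions(["A", "B", "C"])
commands = movements_for(["A", "B", "C"])
```

Only the order of the transitions is stored; nothing in this representation depends on how long each displacement takes.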
From a biological point of view, it is rather difficult to isolate transition cell activity from PC activity, assuming that there really is a difference. However, some findings suggest that transition cells may exist. For instance, weights learned between transition cells during exploration may be the elementary blocks of sequence learning and may correspond to the oval-shaped place fields in the deep layers of EC (Frank et al., 2004). Samsonovich and McNaughton (1997) also report oriented place fields when the animal goes to a goal. Wiener suggests that the theta rhythm is used to synchronize HS neurons in order to organize the ordered activation of neurons having adjacent or overlapping place fields (Wiener et al., 2002). This description of the relationships between place fields may also be implemented by the activation of transition cells predicted by successive PCs. Poucet et al. (2004) have reported activities predicting the goal in hippocampal PCs. Finally, several works have found directional firing in PCs in some constrained environments (such as star or plus mazes) (Markus et al., 1995; Muller et al., 1994).
The idea of transition cell coding was inspired by a neurobiological model of timing and temporal sequence learning in the hippocampus (HS) (Banquet et al., , 2005). From a robotic point of view, a natural question is why use transitions instead of places, and what the advantages of this coding are. To answer this question briefly, we have to focus on the drawbacks of planning models using PCs. Several bio-inspired approaches rely on PCs, but to better illustrate our approach we will only focus on our past model, which makes it easy to underline the problems it suffers from and the way we have followed to solve them. First, note that a PC may be linked with the movement needed to reach a goal without any map. Indeed, this sensory-motor association may be generalized to the whole environment (Gaussier et al., 2000b) using the fact that PCs remain active over a fairly large area. However, this simple reactive mechanism is not enough in an environment composed of several rooms, or when there are contradictory motivations. A cognitive map solves these drawbacks (Subsection Autonomous Cognitive Map Building) by linking successively reached PCs together.
The action selection mechanism has to be integrated. Indeed, by associating an action with a place, it is possible to define a sensory-motor unit. But then, the choice of the direction to follow may be ambiguous because, in some places, several actions can be associated with the same neuron, as in a T-maze (see Figure 1). In this example, from place B the robot has learned during exploration that it can go either to C by turning left or to D by turning right. Both movements are thus linked with place B. In this case, which movement should the robot select if it must go to C? One way of selecting the action in a place-based model is to use an external mechanism applied to the cognitive map: the gradient algorithm. But while this solution is sufficient for a navigation task, it might be more difficult to find an external mechanism for more complex tasks such as robot arm control. Moreover, from a biological point of view, using an external algorithm "looking for" the gradient of activity leads to the famous homunculus problem: "who is looking at the PC activity?" Thus, to overcome these drawbacks, we have chosen not to use PCs directly for planning in our model. We use instead transitions between two PCs successively winning the recognition competition. Such spatiotemporal transitions are explicitly coded on neurons called transition cells. Transitions are better suited for sensory-motor associations than places since only one direction can be linked with a transition: the movement used to go from A to B with the transition cell AB (see Figure 3). This property solves the second drawback listed above. We also introduce in this article the possibility of having AA transitions. The first problem will be solved by the way we exploit the cognitive map built with transition cells (see Subsection Autonomous Planning Using the Cognitive Map and Motor Transitions).

Figure 1 (caption text recovered from the extraction): Both movements are thus linked with place B, so that in B it is impossible to choose which one to perform. In the case of transition learning, if exploration leads to the sequence AB, BC, CB, BD, then when in A, the sequence performed will be AB and then BD directly.
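The T-maze ambiguity can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the exploration sequence AB, BC, CB, BD is the one mentioned for the T-maze example, while the movement labels and variable names are hypothetical.

```python
from collections import defaultdict

# Exploration sequence A->B, B->C, C->B, B->D with hypothetical movement labels.
explored = [
    ("A", "B", "forward"),
    ("B", "C", "left"),
    ("C", "B", "back"),
    ("B", "D", "right"),
]

place_to_moves = defaultdict(set)  # place-based sensory-motor association
transition_to_move = {}            # transition-based association

for src, dst, move in explored:
    place_to_moves[src].add(move)
    transition_to_move[(src, dst)] = move

# Places linked with more than one movement cannot select an action on their own.
ambiguous_places = sorted(p for p, m in place_to_moves.items() if len(m) > 1)
```

With the place-based coding, B ends up linked with both "left" and "right", whereas each transition keeps exactly one movement.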
We will focus in this paper on the "all neuron" architecture from the visual input processing to the motor commands, which is rarely the case in any other similar model. Thus, we describe all components of the architecture following the information stream from the visual input processing until planning. This architecture has been tested in various environments (Subsection Autonomous Planning Using the Cognitive Map and Motor Transitions).
The outline of the paper is the following. After a brief sketch of the main hippocampal anatomical structures and functions (Section Hippocampus in Short), we describe the PCs as found in the HS (Section Place Cells), then we give a short review of navigation models using PCs (Section Biologically Inspired Navigation Models). These two parts may give useful pointers to PC modeling. We do not, however, discuss in detail the relevance of the different models cited. Finally, we present our navigation architecture (Section Our Navigation Architecture). Readers familiar with PCs may go directly to this section.

HIPPOCAMPUS IN SHORT
Navigation and planning in an unknown environment requires memory and prediction abilities. One brain structure involved in these processes is the hippocampus (HS). In particular, the functional interplay between HS, entorhinal cortex (EC), prefrontal cortex (PF), and nucleus accumbens (ACC) is a central issue in understanding the biological substrate of navigation and planning (Brown and Sharp, 1995;Hok et al., 2005;Taha et al., 2007). So, we will first give a brief overview of the HS structure and then a functional overview of HS processing. More details on the hippocampus may be found in books such as (Amaral et al., 2006).
We will make no differences between the right and the left HS hemispheres. One may refer to the rat's hippocampal anatomy for a more precise description (Amaral and Witter, 1995). Although there is variation among mammals in the size and shape of the hippocampus, its intrinsic circuitry is very distinctive and is conserved across species (Kolb and Tees, 1990). The trisynaptic loop is the name for the connectivity of the different hippocampal structures (see Figure 2).
We will begin the loop from EC. The perforant path fibers from EC layer II convey the main information stream into HS. They arrive on the granule cells of the dentate gyrus (DG) and on the pyramidal cells of CA3. It was in this pathway that long-term potentiation (LTP) was first discovered. Neurons from layers III and IV project onto the pyramidal cells of CA1 and the Subiculum (SUB). Dendrites of the CA3 pyramidal cells are the target of the mossy fibers from DG. Part of the CA3 axons (Schaffer collaterals) go to CA1. The distal cells of CA1 project onto SUB. The loop is closed by the projection from the distal cells of CA1 and the proximal cells of SUB onto the lateral part of EC, and by the projection from the proximal cells of CA1 and the distal cells of SUB onto the medial part of EC. The reciprocal links also exist. PF is also the target of direct fibers from CA1, and in turn projects to ACC. ACC also receives links from CA1 and SUB.
CA3 has a large number of recurrent links. This has led to the hypothesis that it acts as an auto-associative memory.
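To illustrate what an auto-associative memory built on recurrent links can do, the sketch below uses a Hopfield-style network. This is a standard textbook technique chosen here for illustration only; it is not a model of CA3 taken from this paper, and the two stored patterns are arbitrary.

```python
import numpy as np


def store(patterns: np.ndarray) -> np.ndarray:
    """Hebbian outer-product weights over recurrent links, no self-connections."""
    weights = patterns.T @ patterns
    np.fill_diagonal(weights, 0)
    return weights


def recall(weights: np.ndarray, cue: np.ndarray, steps: int = 5) -> np.ndarray:
    """Synchronously update all units a fixed number of times."""
    state = cue.copy()
    for _ in range(steps):
        state = np.where(weights @ state >= 0, 1, -1)
    return state


# Two arbitrary, mutually orthogonal +/-1 patterns.
patterns = np.array([
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
])
weights = store(patterns)

cue = patterns[0].copy()
cue[0] = -cue[0]  # corrupt one element of the first pattern
completed = recall(weights, cue)
```

Starting from the corrupted cue, the recurrent dynamics restore the stored pattern, which is the pattern-completion property usually meant by "auto-associative".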
As Redish (2001) points out, two main empirical facts have driven research on the functional role of HS: (1) the finding of PCs (Section Place Cells) that fire only when the animal is at a particular location; (2) the fact that hippocampal lesions impair navigation capabilities and cause anterograde amnesia, particularly in humans.
Based on these observations, two main theories explain the hippocampal functions. Marr (1971) suggested that HS may constitute a working memory (short-term memory), mandatory if one wants to access stored sequences in order to repeat them. HS would also guide the cortex in learning multimodal sequences. The emphasis is thus on the temporal role of HS ("memory theory"). Others think that HS generates a cognitive map acting as a context for events that would be reactivated in the cortex (O'Keefe and Nadel, 1978). This map is mainly used for navigation, thus for spatial purposes ("cognitive map theory").
Both theories may converge if one considers that it is the comparison of current inputs with the memories of previously visited location ("memory theory") that enables spatial localization ("cognitive map theory"). Thus, spatial memory is a part of episodic memory. However, it is still an open debate whether phylogenetically spatial memory existed before episodic memory in HS. As mentioned by Healy (1998), HS functional role seems to be similar in rodents and humans: ". . . spatial memory in rodents, as well as conscious recollection and explicit memory expression in humans, are prime examples of fundamental declarative memory function mediated across species by the hippocampus." More details on the functional role of HS may be found in Burgess et al. (2001), Corbit and Balleine (2000), Papez (1937), Whishaw et al. (1995).

PLACE CELLS
Many neurobiologically inspired navigation models rely on the building of PCs. We will however show in Subsection Transition Cells Coding that PCs are not always enough and may be generalized to transition cells.
A PC has a firing pattern strongly correlated with a particular location in the environment. Namely, one PC fires strongly when the animal is at some location, and not when it is somewhere else. The topology of the environment is not preserved since two close PCs in HS may code for two far away locations in the environment.
The place field is the projection in the environment of the locations where a particular PC fires. The firing activity is maximal at the "center" of the place field and decreases almost monotonically as one goes away from the center.
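A place field with these properties can be sketched as a bump of activity centered on one location. The Gaussian profile, the `width` parameter, and the function name below are assumptions made for illustration; the text only states that the activity is maximal at the center and decreases almost monotonically with distance.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def place_cell_activity(position: Point, center: Point, width: float = 0.5) -> float:
    """Firing level in [0, 1]: 1 at the field center, decreasing with distance.

    The Gaussian shape and the width value are illustrative assumptions.
    """
    dx = position[0] - center[0]
    dy = position[1] - center[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * width * width))


center = (1.0, 2.0)
at_center = place_cell_activity(center, center)
near = place_cell_activity((1.2, 2.0), center)
far = place_cell_activity((3.0, 2.0), center)
```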
PCs were initially found in the rat's hippocampus, in different regions called CA1 and CA3 (O'Keefe and Dostrovski, 1971). Later, other structures connected with HS were found to exhibit PCs: the superficial (Quirk et al., 1992; Sharp, 1999) and deep (Frank et al., 2000) EC, the DG (Jung and McNaughton, 1993) and the SUB (Sharp and Green, 1994), with a strong tendency in the latter case to have a directional response.
Some PC properties (non-exhaustive) are the following. In a new environment, PCs are rapidly recruited (Jeffery and Hayman, 2004; Wilson and McNaughton, 1993). Place fields are stable in time: the same PCs code for the same location from one trial to the next in the same environment, even if the two trials are separated by several months (Thompson and Best, 1990). PCs do not rely solely on visual information, as they may be active in the dark (Markus et al., 1994; Muller and Kubie, 1987; Quirk et al., 1990). Likewise, blind rats develop PCs. These results show that visual input is not the sole information channel triggering PC firing: path integration (Etienne and Jeffery, 2004) or odor (Schenk, 1995, 1998; Wallace et al., 2002) may also be used. Displacement (rotation . . . ) of distal landmarks leads to the displacement of the place fields (Cressant et al., 1997). The same PC may fire in two distinct environments and have totally different place fields (Kubie and Ranck, 1983). Proximal and distal landmarks do not have the same impact on PCs (Muller and Kubie, 1987). The place field is also linked to the animal's behavior (Poucet et al., 2004).
We can also note some differences across species. Studies on Rhesus monkeys have revealed "view cells" instead of PCs (Rolls and O'Mara, 1995). These cells fire when the monkey is looking at a particular part of the environment. Recently, these cells have also been discovered in the human brain (Ekstrom et al., 2003). A hypothesis may explain this phenomenon (Araujo et al., 2001; Gaussier and Joulain, 1998; Gaussier et al., 2001; Rolls, 1999). Indeed, the monkey's visual field is approximately 180 degrees, compared with about 320 degrees for the rat, for instance. Hence, the larger visual field of the rat enables it to base its visual recognition system on a large panorama. Thus, the localization systems of rats and monkeys may be very similar, distinguished only by the width of their visual field.
From a robotic point of view, PCs provide very interesting information since they can code for the localization of the robot (Section Biologically Inspired Navigation Models). This illustrates the interest of the biomimetic approach: modeling PCs provides a fairly direct way of using rich information sources like vision for self-localization, which may be more complex to handle in a classical robotic architecture (Ayache and Faugeras, 1989; Moutarlier and Chatila, 1990).
It seems that two kinds of PCs exist. The first kind are PCs from EC, where the modalities coming into the HS begin to merge. The place field of these cells is large and noisy; it may even be split into several distinct parts for the same environment.
The second kind are PCs from HS (in regions DG, CA1 and CA3), which have a smaller and more precise place field than those in EC. It seems that DG acts as a noise filter and selects the appropriate place field. According to Redish and Touretzky (1997), EC PCs also carry contextual information (in particular concerning the actual location). DG would select the place field corresponding to the actual environment.
It is also worth mentioning that PC activity is modulated by "head direction cells" (Ranck, 1985; Skaggs et al., 1995). Head direction cells fire for a particular direction (orientation of the rat's head) and are almost silent otherwise. They have been found in different parts of the brain: the lateral dorsal nucleus of the thalamus (Blair and Sharp, 1995; Mizumori and Williams, 1992; Taube, 1995), the lateral mammillary nuclei (Leonhard et al., 1996), the striatum (Wiener, 1993), and the posterior cortex (Chen et al., 1994). Finally, "grid cells" have recently been discovered in the dorsocaudal portion of the medial EC (Hafting et al., 2005). They code for a topographic representation of the environment: neighboring grid cells code for the same orientation and the same step. The model presented in this paper does not take these cells into account. However, a computational model of navigation including grid cells has been developed. It remains so far in simulation and has not yet led to a robotic implementation, whereas the model presented in this paper runs on robots (Gaussier et al., 2007).
We only present here a brief review of the most popular PC models. This non-exhaustive list is given to show that PCs can be modeled by one or more competitive networks over a sensory layer. Zipser (1985) proposed the first PC computational model. Based on two neuronal layers, it makes the assumption that the PC response is a function of the difference between the learned visual cues for a given location and the current visual input. Sharp (1991) presents a three-layer model (one input layer and two competitive layers) exhibiting PCs with a very realistic firing pattern. O'Keefe and Burgess (Burgess and Hartley, 2002; Burgess and O'Keefe, 1996) have developed a detailed model of the matching between place fields and visual cues, extending O'Keefe's (1991) centroid model. Jensen proposes an accurate timing model accounting for the theta phase precession of PCs (Jensen and Lisman, 1996). We will now detail in the next section some HS models used for navigation.
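The general idea shared by these models, a competitive layer over a sensory layer where each cell responds according to the mismatch between learned and current cues, can be sketched as follows. This is a loose illustration and not Zipser's or Sharp's actual equations: the exponential response, the `sigma` parameter, the winner-take-all step, and the cue values are all assumptions.

```python
import numpy as np


def pc_responses(learned_cues: np.ndarray, current_input: np.ndarray,
                 sigma: float = 1.0) -> np.ndarray:
    """Response of each cell as a decreasing function of the cue mismatch.

    learned_cues has shape (n_cells, n_cues); current_input has shape (n_cues,).
    The exponential form is an illustrative assumption.
    """
    diff = learned_cues - current_input
    return np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * sigma ** 2))


def winner(responses: np.ndarray) -> int:
    """Winner-take-all competition: index of the most active cell."""
    return int(np.argmax(responses))


# Hypothetical cue vectors learned by two cells, and the current view.
learned = np.array([[1.0, 0.2, 0.5],
                    [0.1, 0.9, 0.4]])
view = np.array([0.9, 0.3, 0.5])

responses = pc_responses(learned, view)
best = winner(responses)
```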

BIOLOGICALLY INSPIRED NAVIGATION MODELS
Robotics uses a wide range of algorithms for solving navigation and planning problems. We present here models inspired by the biological anatomy or functioning of the brain (mainly from rodents, even if some navigation strategies based on insects may also be efficient (Cartwright and Collett, 1983; O'Keefe and Nadel, 1978)). A broad overview of navigation strategies and spatial representation may be found in Gervet and Pratte (1999). We further restrict our study to architectures that model the HS, or at least exhibit functions attributed to it. Most of these models rely on PCs. Some models use the associative properties of CA3 in order to create event chains (McNaughton and Nadel, 1996), maps or graphs (Muller et al., 1996; Trullier and Meyer, 2000), or attractor networks (Samsonovich and McNaughton, 1997). Other models use vector fields on a map of the environment (Burgess and Hartley, 2002; O'Keefe, 1991). We will first detail navigation architectures that are simulated or implemented on robots and do not use any (topological) map (Subsection Navigation Without Maps). Then, we will present models using topological maps (Subsection Navigation with Topological Maps). One can refer to Franz and Mallot (2000) and Trullier et al. (1997) for definitions and a classification of navigational strategies with more examples.

Navigation without maps
In the following, we will only cite some navigational models relying on either homing, planning by Q-learning, or a "recognition-triggered response" strategy. Models for the homing strategy share some common properties, such as the association between PCs and directions leading to the goal. Zipser (1986) proposes a model that enables navigation based on landmarks. Directions leading to the goal are linked with directional place fields through Hebbian learning. The current direction is updated by idiothetic information. Burgess and coworkers (Burgess and O'Keefe, 1996; Burgess et al., 1994; Burgess et al., 1997) propose a model including "goal cells." Distances to obstacles (walls) are obtained through visual information. These distances are learned during a first exploration phase. Thus, PCs fire at a given (learned) distance from the obstacles. The orientation of the robot is obtained by path integration, periodically reset according to a fixed visual reference giving the north direction. Each goal is coded by a set of goal cells representing the directions leading to it. To reach a goal, the robot learns the direction to take from four different positions around it. Gaussier and coworkers (Gaussier and Zrehen, 1995; Gaussier et al., 1997; Gaussier et al., 2000a) have proposed an HS model where PCs learn the location of a robot based on visual landmarks. The association of a direction given by a compass with the actual visual scene around a goal makes it possible to reach it. This model serves as a basis for the one explained in the following and will be detailed thereafter (Section Our Navigation Architecture).
Some authors propose to add planning to homing by using Q-learning. Brown and Sharp (Brown and Sharp, 1995; Sharp et al., 1996) make the hypothesis that the control of movements is performed by the ACC, taking inputs from both HS PCs and head direction cells from the postsubiculum. The output of their model is a direction leading to a rewarding location (goal). The association between these directions and PCs is achieved through repeated learning. When the goal has to be reached, the selection of the direction to follow is based on a strategy close to Q-learning. As a consequence, when a long sequence of actions has to be performed before reaching the goal, the system cannot determine which actions to reward (problem of delayed reward). Gerstner (1999, 2000) have developed a feedforward architecture where PCs are created based on visual and path integration information. Localization is computed as the center of gravity of these activities. Planning is then performed through Q-learning.
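For readers unfamiliar with Q-learning, the following sketch shows the standard tabular update rule on a toy corridor of four places, with the place index standing in for the winning PC. It is not the model of Brown and Sharp nor that of Gerstner and coworkers; the corridor, the two directions, and the values of `ALPHA` and `GAMMA` are assumptions chosen to make the delayed-reward effect visible.

```python
import numpy as np

N_PLACES, N_DIRECTIONS = 4, 2  # directions: 0 = west, 1 = east
ALPHA, GAMMA = 0.5, 0.9        # learning rate and discount (illustrative values)
GOAL = N_PLACES - 1


def step(place: int, direction: int):
    """Move one place west or east along the corridor; reward only at the goal."""
    nxt = min(max(place + (1 if direction == 1 else -1), 0), N_PLACES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def run_episode(q: np.ndarray, policy) -> None:
    """Run from place 0 to the goal, applying the tabular Q-learning update."""
    place = 0
    while place != GOAL:
        action = policy(place)
        nxt, reward = step(place, action)
        q[place, action] += ALPHA * (reward + GAMMA * q[nxt].max() - q[place, action])
        place = nxt


def always_east(place: int) -> int:
    return 1


q = np.zeros((N_PLACES, N_DIRECTIONS))
run_episode(q, always_east)
after_one = q.copy()
run_episode(q, always_east)
```

After the first episode only the action taken just before the goal has a non-zero value; the reward reaches earlier places one step per episode, which is the delayed-reward difficulty mentioned above.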
Some models rely on a recognition-triggered response. These models are based on learning and use sequences of intermediate places, linked with the corresponding movements, that bring the agent closer to the final goal. Although they also use sequences of linked places, they differ from topological map navigation in that the sequences are not connected together and instead form separate paths. McNaughton and Nadel (1996) have proposed a model where HS acts as an associative memory. When exploring, each view and each performed movement is linked with the preceding view as being the consequence of this movement. This coding is performed on the CA3 recurrent links. Thus, routes forming chains of view/movement associations are learned. The PC response is directional. Blum and Abbott (1996) model CA3 with asymmetrical long-term potentiation (LTP). This architecture learns routes to a goal. This model was extended to take several goals into account (Gerstner and Abbott, 1996). The main drawback of this model is that all PCs must know where the goal is.

Navigation with topological maps
Topological maps code the relationships between locations. They may be derived from a metric map (Thrun, 1998) or not. The models explained previously cannot be used for planning an entire path from the current robot position toward the goal. Indeed, the recognition-triggered response is limited to using the same sequence for reaching a given goal. The routes defined are then independent from each other and thus not connected. On the contrary, in topological navigation, the spatial representation is independent from the goal, and the same representation may be used for reaching different goals. The topology is often represented by a graph where the nodes represent the locations and the edges represent how to go from one node to the other. We will now list some implementations of this strategy. Mataric (1991) proposes a model taking inspiration from experiments on rats. The robot uses sonar and a compass. On a first level, this information triggers elementary behaviors such as wall following, or predefined ones when the robot reaches crossings or dead ends. The architecture follows Brooks' (1981) subsumption. At a second level, landmarks are detected. They are created by combining the movement performed by the robot, its inputs, and the direction given by the compass. These landmarks are used on a third layer for creating a topological map coding the adjacency between them. The robot is able to navigate to a goal by using a diffusion mechanism on the map from the goal to the current node (Mataric, 1992).
Recce and Harris (1996) have proposed a model where HS is an autoassociative memory following Marr's theory. This memory stores relationships and distances between the surrounding landmarks and the goal. This enables the robot to locate itself. The model relies on an egocentric map of space, located in the neocortex and updated by idiothetic information, and the hippocampus stores snapshots of this egocentric map. Bachelder and Waxman (1995) use a PC model. Contrary to Recce and Harris (1996), they suppose that the map is coded in HS. PCs are the nodes of the map, connected by the movement decision. The first level of the architecture is the localization. It is performed by dividing the environment into several regions characterized by a specific configuration of objects. An ART network classifies these regions. A second level stores the topological map where the movement from one region to another is learned. The network was implemented on a real robot but in a very simplified environment (big black objects with lights at the corners to simplify their recognition as landmarks). Owen and Nehmzow (1998) have proposed a first model where the input comes from an omnidirectional sonar. Location information is stored in a graph. A new location is created when the input is different enough from the already learned ones. This similarity is tested explicitly. When a location is supposed to be different from the learned ones, the robot tries to reach the surrounding known locations. If it fails, a new location is added to the graph. More recently, Nehmzow and Owen (2000) proposed an architecture using visual input. Based on the inputs, the environment is clustered into regions. Each node in the map contains information on the direction, the distance, and the apparent size of the region. The final behavior of the robot comes from the coupling between several elementary behaviors (going back to a learned location, wall following . . . ).
Trullier and Meyer (2000) model HS as a cognitive graph. HS is viewed as a heteroassociative network learning the sequence of reached locations. Thus, a topological representation of the environment is stored. The model has the same goal cells as Burgess, combined with PCs and head direction cells for navigation. The cognitive graph is coded in CA3 recurrent links with a bias coming from the goal cells. When the robot has to reach a goal, information spreads along this graph. The main drawback is the poor biological relevance of this model, particularly regarding the modulation of the recurrent links by goal cells. Obstacles may also be a problem for the diffusion of the goal information on the graph. Hafner (2000a) adds coding of the movement orientation in the PC activity. As in Trullier's model, movement (here only the angle) would modulate the recurrent links between locations. Thus, PCs build the nodes of a self-organized map similar to a Kohonen map (Hafner, 2000b).
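Several of the models above plan by diffusing goal information over the graph and then climbing the resulting gradient. The sketch below shows one generic way to do this on a small graph shaped like the T-maze of the introduction. It is not the algorithm of any specific cited model: the max-based propagation rule, the `decay` value, and the function names are assumptions.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]


def diffuse(graph: Graph, goal: str, decay: float = 0.9) -> Dict[str, float]:
    """Spread goal activity over the graph, attenuated by `decay` at each edge.

    Each node keeps the largest attenuated activity among its neighbours,
    so the activity decreases with the number of edges to the goal.
    """
    activity = {node: 0.0 for node in graph}
    activity[goal] = 1.0
    changed = True
    while changed:
        changed = False
        for node, neighbours in graph.items():
            for nb in neighbours:
                candidate = decay * activity[nb]
                if candidate > activity[node]:
                    activity[node] = candidate
                    changed = True
    return activity


def next_node(graph: Graph, activity: Dict[str, float], current: str) -> str:
    """Gradient ascent: move to the neighbour with the highest activity."""
    return max(graph[current], key=lambda n: activity[n])


# T-maze-like topology: A - B, with B connected to both C and D.
graph: Graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
activity = diffuse(graph, goal="C")
first_hop = next_node(graph, activity, "A")
second_hop = next_node(graph, activity, first_hop)
```

Note that the gradient is read out by a function external to the graph, which is precisely the kind of external mechanism the introduction identifies as a homunculus-like problem.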
The two following models are the main inspiration for the one we have developed. The architecture proposed by Schmajuck and Thieme (1992) has two layers. The first one encodes the topological representation, the second one selects the movement to perform. Inputs are views and places that are predetermined. Learning reinforces the link between a view node and a neighboring place node so that, after learning, a view node predicts the corresponding place node. Diffusion among place nodes allows planning a path through vicarious trial and error. This diffusion is also used in our map, but we do not need any vicarious trial and error mechanism for planning. Mallot and co-workers (Schölkopf and Mallot, 1995) propose a model where the nodes of a graph are local views and the edges are the directions of the movements. Contrary to Schmajuck and Thieme's model, the outputs of the architecture are not places, but directions leading to the goal. Learning of the matching between view sequences and movements is performed in a similar way as in Bachelder and Waxman (1995). Franz et al. (1997, 1998) have also used an architecture based on Mallot's. The interesting concept behind the link between view cells and movement, together with its practical limitations, motivated us to explicitly code transitions on neurons instead of edges. Finally, it is worth mentioning that some works combine genetic algorithms and neural networks (Floreano and Mondada, 1996; Mondada and Floreano, 1995). We will now develop our architecture for navigation and planning in an unknown environment.

OUR NAVIGATION ARCHITECTURE
As shown in the previous sections, in most bio-inspired models, localization is based on particular neurons found in CA, where transitions between them only occur in an implicit manner (e.g., edges of a graph). In our model, we also use PCs (Subsection Autonomous Place Building) that learn patterns specific to given locations (spatial landmark constellations, see Subsection Autonomous Landmark Extraction and Recognition Based on Characteristic Points), but we do not directly use them to plan or build a map. We rather use neurons (transition cells) that explicitly code for these spatiotemporal transitions. Details of their creation and prediction are given in Subsection Transition Cells Coding.
We propose here a unified neuronal framework based on a hippocampal and prefrontal model where vision, place recognition, dead-reckoning (Subsection Autonomous Creation of Motor Transitions), and planning (Subsection Autonomous Planning Using the Cognitive Map and Motor Transitions) are fully integrated (see Figure 3 for an overview of the architecture).
During exploration, transition cells are created and allow learning a cognitive map whose construction is explained in Subsection Autonomous Cognitive Map Building. Next, we will show why and how these transition cells may be combined with an integrated movement coming from proprioceptive information (Subsection Autonomous Creation of Motor Transitions). When a plan is needed, transitions are predicted and filtered from the most activated PCs (similar to multiple hypothesis position tracking), as explained in Subsection Transition Cells Coding. These transitions are then biased via top-down information from the cognitive map (Subsection Autonomous Planning Using the Cognitive Map and Motor Transitions). In a discussion (Section Discussion), we will give some keys on how the control of exploration and planning behaviors can be performed in order to allow navigation in a partially discovered and dynamically modified environment. We will conclude with improvements that may be proposed in our model. Parameters used for planning in the experiment of Figure 12 are given in the appendix.

Autonomous landmark extraction and recognition based on characteristic points
The visual processing of our architecture is inspired by the mechanisms used by some insects, like honey bees, and some mammals, like the rat, for self-localization. Observations of their visual processing have led to the identification of two main streams of information: the what and the where. The first allows identifying the characteristic points found in the retinal image and the second gives information on their locations in this image. Fusion of these two streams of information allows creating a constellation of landmarks with their azimuths.
We choose to adopt this strategy for the following reasons: First, a set of landmarks and azimuths is enough to define a particular place without any need of a metric map (Gaussier and Zrehen, 1995). Second, local correlation is more efficient and robust than global correlation since it allows only taking care of the recognition of some characteristic points and their relative motions from their learned position.
Setup and algorithm. In our architecture, images are taken by a panoramic camera at low resolution. This allows handling lighter images so that the process can be performed in real time, and it enhances the robustness of the characteristic points found (high frequencies are removed). In order to eliminate problems induced by luminance variability, we only use the gradient image as input of the system (a 1500 × 240 pixel image extracted from the 640 × 480 pixel panoramic image, which is originally circular). This gradient image is then convolved with a difference of Gaussians (DOG) filter in order to detect characteristic points (Gaussier and Joulain, 1998; Gaussier et al., 1997). The standard deviations σ1 and σ2 of the Gaussian functions are given in the appendix. Two processes then occur in parallel. The first is a learning process that codes for these characteristic points: a log-polar transform of the local area extracted around each characteristic point is computed to improve pattern recognition when small rotations and/or scale variations of this small image occur. The neurons that learn these local views are named landmark units. In the second process, an angular position relative to the north, given by a compass, is computed for each landmark (O'Keefe and Nadel, 1978; Tinbergen, 1951). This angle is coded on a neural population and a Gaussian diffusion is used to allow generalization (Giovannangeli et al., 2006).
A soft competition between landmark units, allowing several interpretations of a given local snapshot, is then computed to increase robustness. Learning and activity equations of landmarks units as well as more details on the impact of this soft competition can be found in Giovannangeli et al. (2006).
A simple feedback inhibition then allows selecting a single landmark unit at a time. The whole process can thus be seen as a spotlight mechanism based on an attention process. This process is repeated until a given number (N) of the most activated landmark units has been used (see Equation (2)). The number of visible landmarks needed is a trade-off between the robustness of the algorithm and the speed of the process. If all landmarks are fully recognized, only three of them are needed. But as some of them may not be recognized, for example under changing conditions such as occlusion, a greater number is used to guarantee robustness.
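As an illustration, the detection and spotlight steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation running on the robot: the function names, the kernel size, the two standard deviations, the inhibition radius, and the wrap-around handling of image borders are all hypothetical choices, and the input is a synthetic image rather than a real gradient image.

```python
import numpy as np

def dog_kernel(size, sigma1, sigma2):
    """Difference of two normalized 2-D Gaussians (assumes sigma1 < sigma2)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    g1 = np.exp(-r2 / (2.0 * sigma1**2)) / (2.0 * np.pi * sigma1**2)
    g2 = np.exp(-r2 / (2.0 * sigma2**2)) / (2.0 * np.pi * sigma2**2)
    return g1 - g2

def dog_response(gradient_image, kernel):
    """Circular convolution via FFT (wrap-around borders are a simplification)."""
    k = kernel.shape[0]
    padded = np.zeros(gradient_image.shape, dtype=float)
    padded[:k, :k] = kernel
    # Move the kernel centre to index (0, 0) so peaks stay aligned with the input.
    padded = np.roll(padded, (-(k // 2), -(k // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(gradient_image) * np.fft.fft2(padded)))

def spotlight(response, n_points, inhibition_radius):
    """Select the n strongest points one at a time, inhibiting each neighbourhood."""
    resp = response.copy()
    points = []
    for _ in range(n_points):
        row, col = np.unravel_index(np.argmax(resp), resp.shape)
        points.append((int(row), int(col)))
        resp[max(0, row - inhibition_radius): row + inhibition_radius + 1,
             max(0, col - inhibition_radius): col + inhibition_radius + 1] = -np.inf
    return points

# Synthetic stand-in for a gradient image: two isolated points of different strength.
image = np.zeros((40, 120))
image[10, 20] = 1.0
image[30, 90] = 0.5
points = spotlight(dog_response(image, dog_kernel(9, 1.0, 2.0)),
                   n_points=2, inhibition_radius=5)
```

The loop in `spotlight` plays the role of the feedback inhibition: once a point has been used, its neighbourhood is suppressed so that the next most activated point can win.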
Our visual system provides both the what (on a layer called Pr, for perirhinal cortex) and the where (on a layer called Ph, for parahippocampal cortex) information: the recognition of a small 32 × 32 pixel image in log-polar coordinates, and the azimuth of the corresponding characteristic point. Figure 4 shows the different steps of the process. What and where information is then merged in a matrix of neurons [a product space (PS)], leading to a spatial landmark unit constellation. Again, details and a study of this process can be found in Banquet et al. (1997, 2005), Gaussier and Zrehen (1995), and Giovannangeli et al. (2006). This product space allows measuring the distance between two visual configurations.

Learning small local views. $W^{Pr}_{k,ij}(t)$ is the weight of the link from pixel $(i, j)$ to the $k$th landmark. The weights $W^{Pr}_{k,ij}$ are initialized to 0. Learning a small local view around one characteristic point is a one-shot learning (one iteration step) on a neuron $k$ recruited according to the following rule: $R^{Pr}_k = 1$ when recruited, and $R^{Pr}_k = 0$ otherwise. $I_{ij}(t)$ is pixel $(i, j)$ from the small local view $I$ at time $t$. The recruited neuron is a landmark unit.
Activity of the $k$th landmark unit, $X^{Pr}_k(t)$, is computed according to the following equation: In a second step, the product $I_{kl}$ of these two activities is computed by: In a last step, activity of neurons in PS is computed by: This activity is reset after each complete exploration of all landmarks of an image.

Learning in PS.
A PS neuron learns to be activated when a landmark is recognized under a given angle. This activity may be maintained for nearby angles by convolving the angular information with a Gaussian function. The response is then maximal for the learned angle and decreases as the robot moves away. Below a given threshold, the response is set to zero. This threshold is called the vigilance. This parameter is similar to the vigilance parameter of Grossberg and Carpenter (Carpenter and Grossberg, 1987; Carpenter et al., 1991) because it determines the threshold at which the difference between the learned landmark and the currently perceived one is too high, leading to a new learning.
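The behavior of a single PS neuron described above can be illustrated with a short sketch. It is only a sketch under assumptions: the function name, the Gaussian width, the vigilance value, and the use of a plain product between the landmark recognition level and the angular match are hypothetical choices, not the equations of the model.

```python
import math

def ps_response(landmark_recognition, learned_azimuth_deg, current_azimuth_deg,
                sigma_deg=20.0, vigilance=0.5):
    """Response of one product-space neuron (illustrative form only).

    The angular match is a Gaussian of the wrapped azimuth difference; the
    response is cut to zero when it falls below the vigilance threshold.
    """
    # Smallest signed angular difference, wrapped into [-180, 180).
    diff = (current_azimuth_deg - learned_azimuth_deg + 180.0) % 360.0 - 180.0
    angular_match = math.exp(-diff**2 / (2.0 * sigma_deg**2))
    response = landmark_recognition * angular_match
    return response if response >= vigilance else 0.0

at_learned_angle = ps_response(1.0, 30.0, 30.0)    # maximal response
near_learned_angle = ps_response(1.0, 350.0, 10.0)  # 20 degrees away, across north
far_from_learned_angle = ps_response(1.0, 30.0, 90.0)  # below vigilance, set to zero
```

Wrapping the difference keeps the response continuous across north, so a landmark learned at 350 degrees is still matched when seen at 10 degrees.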
Learning is performed on the weights between Ph and PS. The weight is maximal for the angle under which the corresponding landmark was learned. Weight learning is the following:

Autonomous place building
The spatial landmarks constellation on PS, resulting from the visual input process, characterizes one location. We use a neural network (ECs, see Section Hippocampus in Short) to learn the activity pattern on PS. A neuron coding for this location is called a "place cell" (Section Place Cells).
PC Activity. In our model, each PC neuron is linked with all neurons of the PS. Their activity is computed as a scalar product between the vector of activity on PS and the vector of the weights of the corresponding links. The activity of a PC then results from the computation of the distance between the learned and the current local view.
Activity of the jth PC is expressed as follows: . If the robot is at the exact position where the PC has been learned, its activity is maximal (equal to one). A priori generalization is an interesting property of this model. When the robot moves from this position, the activity of this PC decreases according to the distance between the learned position and the current one. Hence, a PC keeps a certain amount of activity around the learned position that corresponds to the place field of the PC (Section Place Cells). A more biologically plausible model can be found in Banquet et al. (2005).

PC learning.
A PC neuron thus categorizes a particular pattern of activity on the PS and hence a particular location (see Figure 5). Learning of PC neurons follows a Hebbian-like rule in which $\lambda_1$ is a decay term and $\lambda_2$ is a learning constant. Recruitment of a new neuron for encoding a new location occurs during exploration of the unknown environment. This mechanism is performed autonomously, without any external signal, relying only on the PC population activity. If the activities of all previously learned PCs are below a given recognition threshold (RT), then a new neuron is recruited for coding this new location.
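The recruitment mechanism can be illustrated as follows. This is a minimal sketch under assumptions: the class and method names are hypothetical, the activity is taken to be a normalized scalar product (so that it equals one at the learned place), and the Hebbian-like rule is replaced by a one-shot copy of the PS pattern.

```python
import numpy as np

class PlaceCellLayer:
    """Toy place-cell layer with threshold-based recruitment (illustrative)."""

    def __init__(self, recognition_threshold=0.8):
        self.recognition_threshold = recognition_threshold
        self.weights = []  # one stored PS pattern per recruited place cell

    def activities(self, ps_pattern):
        """Normalized scalar product between the PS pattern and each weight vector.

        Assumes a non-zero PS pattern.
        """
        ps = np.asarray(ps_pattern, dtype=float)
        norm = np.linalg.norm(ps)
        return np.array([np.dot(w, ps) / (np.linalg.norm(w) * norm)
                         for w in self.weights])

    def observe(self, ps_pattern):
        """Return (index of the winning cell, whether it was just recruited)."""
        acts = self.activities(ps_pattern)
        if acts.size == 0 or acts.max() < self.recognition_threshold:
            # No existing cell recognizes this pattern: recruit a new one.
            self.weights.append(np.asarray(ps_pattern, dtype=float).copy())
            return len(self.weights) - 1, True
        return int(np.argmax(acts)), False

layer = PlaceCellLayer(recognition_threshold=0.8)
first = layer.observe([1.0, 0.0, 0.0, 1.0])    # nothing learned yet: recruit cell 0
nearby = layer.observe([1.0, 0.0, 0.0, 0.9])   # still recognized by cell 0
elsewhere = layer.observe([0.0, 1.0, 1.0, 0.0])  # unrecognized: recruit cell 1
```

Raising `recognition_threshold` makes recruitment more frequent, which corresponds to a denser set of learned places.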
At a given place, every existing PC responds with an analog recognition value that may be seen as a robot position probability. If at a given place, several PCs respond with an activity greater than the recognition threshold, there are two options: let only one neuron win the competition, or keep the activity of all neurons. In the first case, there are sudden changes in the movements when a new neuron wins. In the second one, the final movement is a combination of different transitions (Cuperlier et al., 2005). Thus, at a given location several neurons in ECs are firing.
The density of locations learned depends on the level of this threshold, but also on the robot position in the environment. Namely, more locations are learned near walls or doors due to the fast changes in the angular position that can occur near landmarks, or in the (dis)appearance of landmarks caused by these obstacles. In other locations, small changes produce a small variation in the PC activity. When the environment has been entirely explored, and thus fully covered by PCs, a PC responds specifically for each location (see Figures 6 and 13). Consequently, the PC neural layer gives our robot a way to localize itself inside the environment it has discovered.
Experimental PC formation has also been tested in outdoor environments (Giovannangeli et al., 2006). The results confirmed the mathematical model, which predicts that the size of the place field grows proportionally with the distance to the landmarks.

Transition cells coding
We focus in this paper on a planned navigational task. This task leads us to focus on the motor trajectory, the spatiotemporal path, used by our robot in the environment. We thus follow a spatial interpretation of this trajectory that can be described by successive way points (places) in the environment. We have shown in the previous section how we may autonomously build these way points. We will now show how we use this information in order to navigate.
A first idea for creating transition cells could be to use a full "matrix" (a matrix with current places along the lines and previous ones along the columns) for coding all the possible combinations of the input (PCs). However, this would consume too much memory.
We know the number of possible transitions starting from a given place cell is limited (Cuperlier et al., 2006b), since we only take into account transitions that can really be performed and not all combinations of PCs (there is, on average, a maximum of six possible starting PCs linked with a given one in our experiments). Thus, we can use this information for modeling the transition cells layer (see Figure 7). Transition cells are located in CA1/CA3 taken as a whole. We refer to this structure as CA.
Each neuron of a given line receives projections from both all neurons of ECs activated by the current location at time t and from the neurons coding the PC at time t − 1. Each transition neuron belongs to a particular neighborhood supervised by a single ECs neuron (a line in Figure 7). No learning is allowed on those links and their weights are not sufficient to trigger alone any activity on the associated transition neurons. Conversely, each transition neuron is connected to all DG neurons through conditional links, initialized with random low weights inferior to the threshold θ on CA neurons (see Equation (10)). The activation of ECs neurons triggers learning between the weights coming from the activated neurons in DG and the corresponding CA neuron (see Equation (10)). Once those weights are learned, the single activity of the corresponding DG, in a prediction mode, allows the activity of the transition neuron even if no signal comes from ECs.
Based on temporal proximity, this structure allows coding spatial proximity using information from currently and previously recognized PCs. Furthermore, we can notice that PCs have not really disappeared from our new coding. Since transitions link two successively recognized PCs separated by only one time step, and since a PC's place field can be quite large, it becomes possible to recognize the place A at time t − 1 and still the same place A at time t, thus leading to code a transition AA. This kind of transition is the equivalent of a PC in transition coding. In our model, no movement is linked with these transition cells. We only associate a movement to a transition linking two different PCs.
Compared to a "full matrix model," this structure leads to a reduction of the memory cost: the number of neurons decreases from N × N to 6N, with N the number of possible PCs. This gain is important since almost all the subsequent neuronal structures of the architecture keep the same number of neurons. This gain is even more important for the structure encoding the cognitive map since this structure has recurrent links (see Section Hippocampus in Short). This decrease in the number of neurons is paid for by an increase in the number of links from 2N × N to 6N(N + 1). But this increase is quite small and only applies to this structure, not to the next ones.
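The counts quoted above can be checked with a small worked example. The function names and the choice N = 100 are hypothetical; the formulas are the ones stated in the text (N × N versus 6N neurons, 2N × N versus 6N(N + 1) links).

```python
def full_matrix_cost(n_places):
    """Neurons and links of the full 'matrix' coding of all PC pairs."""
    return {"neurons": n_places * n_places, "links": 2 * n_places * n_places}

def transition_layer_cost(n_places, max_transitions_per_place=6):
    """Neurons and links when each place supervises at most six transition cells."""
    neurons = max_transitions_per_place * n_places
    # Each transition neuron receives N links from DG plus one link from ECs.
    return {"neurons": neurons, "links": neurons * (n_places + 1)}

full = full_matrix_cost(100)          # 10000 neurons, 20000 links
sparse = transition_layer_cost(100)   # 600 neurons, 60600 links
```

For N = 100 the number of neurons drops by a factor of about 17, while the number of links grows by a factor of about 3.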
DG neurons store the previous location. Hence, activity on DG is the following: CA neurons have the following activity: Learning in CA allows increasing the weights between DG and CA. Hence, after learning, the sole activation of DG is enough for activating a neuron on CA. This allows predicting all transitions based on the current location. The learning equation is piecewise: before learning, a weight is a small random value inferior to θ; after learning, it takes a value large enough for the DG input alone to activate the CA neuron. Among the predicted transitions, the choice of the transition to perform will be done by the cognitive map (Subsection Autonomous Cognitive Map Building).
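The learning and prediction of transitions can be illustrated with a symbolic stand-in. This sketch is an assumption-laden simplification: places are plain labels, the class name and methods are hypothetical, and the neural weights between DG and CA are replaced by a dictionary lookup.

```python
class TransitionMemory:
    """Symbolic stand-in for the DG/CA transition layer (illustrative only)."""

    def __init__(self):
        self.from_place = {}        # previous place -> set of learned transitions
        self.previous_place = None  # plays the role of the DG memory

    def step(self, current_place):
        """Learn the transition (previous place, current place), then update DG."""
        if self.previous_place is not None:
            transition = (self.previous_place, current_place)
            self.from_place.setdefault(self.previous_place, set()).add(transition)
        self.previous_place = current_place

    def predict(self, place):
        """All learned transitions starting from the given place."""
        return sorted(self.from_place.get(place, ()))

memory = TransitionMemory()
for place in ["A", "A", "B", "C", "B", "E"]:
    memory.step(place)
predicted_from_a = memory.predict("A")
predicted_from_b = memory.predict("B")
```

Recognizing A twice in a row yields the transition AA, the transition-coding equivalent of a place cell.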

Autonomous creation of motor transitions
Each motor transition cell is linked with the direction used to go from the starting location to the ending location. For instance, going from place A to place B creates a transition cell AB. This transition is linked with the direction (relative to the north) for going from A to B. This direction is given by integrating all direction changes, given by a compass, performed from the starting place A up to the creation of B. The distance is obtained using the robot wheel encoders to compute elementary displacement vectors. Direction changes can result from a new movement vector generated by the exploration mechanism (random exploration) or from the obstacle avoidance mechanism. A unique integrated vector summarizes all these movement changes. The integrated vector is reset when entering a different PC. An internal signal is computed from the automatic detection of a new winning PC at time t by temporal differences on the ECs layer. This signal is used to trigger the sensory-motor association. As several close directions can be used to go from one PC to another, we use a learning mechanism (not described here) that increases the weights coding for the most often used direction (Cuperlier, 2006).
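The integration of elementary displacements into a single movement vector can be sketched as follows. The class name, the clockwise-from-north heading convention, and the units are assumptions made for the example.

```python
import math

class MovementIntegrator:
    """Accumulates elementary displacements between two place cells (illustrative)."""

    def __init__(self):
        self.reset()

    def reset(self):
        """Called when a different place cell wins: start a new integrated vector."""
        self.east = 0.0
        self.north = 0.0

    def add_displacement(self, distance, heading_deg):
        """Add one elementary step; heading is measured clockwise from north (assumed)."""
        heading = math.radians(heading_deg)
        self.east += distance * math.sin(heading)
        self.north += distance * math.cos(heading)

    def direction_deg(self):
        """Direction of the integrated vector relative to north, in [0, 360)."""
        return math.degrees(math.atan2(self.east, self.north)) % 360.0

    def length(self):
        return math.hypot(self.east, self.north)

integrator = MovementIntegrator()
integrator.add_displacement(1.0, 0.0)   # one unit towards north
integrator.add_displacement(1.0, 90.0)  # one unit towards east (e.g., obstacle avoidance)
direction = integrator.direction_deg()
distance = integrator.length()
```

Whatever detours occur between A and B, only the resulting vector is associated with the transition AB.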

Autonomous cognitive map building
Since our robotic model is inspired by the animat approach (Meyer and Wilson, 1991), we use three contradictory animal-like motivations (eating, drinking, and resting). Each one is associated with a satisfaction level that decreases over time and increases when the robot is on the proper source. When a level of satisfaction falls below a given threshold, the corresponding motivation is triggered so that the robot has to reach a place where this need can be satisfied. Hence, this place becomes the goal to reach. More sources can be added and one can increase the number of sources associated with a given motivation. Other motivations linked with levels and given places may also be added. Curiosity may be modeled by the inhibition of known transitions and a random choice between the remaining possible directions.
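The dynamics of one motivation can be illustrated with a few lines of code. This is a sketch only: the class name, the linear decay and refill, and all numerical values are hypothetical.

```python
class Drive:
    """One motivation with a satisfaction level in [0, 1] (illustrative values)."""

    def __init__(self, name, level=1.0, decay=0.01, refill=0.2, threshold=0.3):
        self.name = name
        self.level = level
        self.decay = decay
        self.refill = refill
        self.threshold = threshold

    def update(self, on_source):
        """Satisfaction increases on the proper source and decreases otherwise."""
        if on_source:
            self.level = min(1.0, self.level + self.refill)
        else:
            self.level = max(0.0, self.level - self.decay)

    def is_triggered(self):
        """The motivation is triggered when satisfaction falls below the threshold."""
        return self.level < self.threshold

thirst = Drive("drinking", level=0.31, decay=0.02)
thirst.update(on_source=False)
triggered_after_decay = thirst.is_triggered()
thirst.update(on_source=True)
triggered_after_drinking = thirst.is_triggered()
```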
Experiments carried out on rats have led to the definition of cognitive maps used for path planning (Tolman, 1948). From the original Tolman definition, we keep the "latent learning" ability. We do not think, however, that cognitive maps are enough for taking shortcuts. Shortcuts rely on either a metric map or a global path integration mechanism. Most cognitive map models are based on graphs showing how to go from one place to another (Arbib and Lieblich, 1977; Bachelder and Waxman, 1994; Bugmann et al., 1995; Franz et al., 1998; Schmajuk, 1996; Schmajuk and Thieme, 1992; Schölkopf and Mallot, 1995; Trullier et al., 1997). They mainly differ in the way they use the map in order to find the shortest path, in the way they react to dynamical environment changes, and in the way they achieve contradictory goal satisfactions. Other works use rule-based algorithms (the classical functional approach) that can exhibit the desired behaviors. We will not discuss them in this paper, but one can refer to Donnart and Meyer (1996).
In our model, learning the cognitive map is performed continuously during the exploration phase of the unknown environment (latent learning) by linking transition cells successively reached. At the same time, if a source is present at the destination place, the corresponding transition is associated with a motivation neuron. After some time, exploring the environment leads to the creation of the cognitive map (see Figure 8). This map may be seen as a graph where each node is a transition and each edge records that the path between two transitions was used. The edges have a weight $W^{PF-PF}_{ij}$ set to an arbitrary value (0.99 in the experiments) if $i \neq j$. If $i = j$, $W^{PF-PF}_{ij} = 0$. This value may be increased if the link is used, and decreased if it is not. It is possible to use a learning rule on these edges so that after some time, some weights are reinforced and others decreased. The reinforced edges correspond to paths that are often used. In particular, this is the case when some particular locations have to be reached more often than others (Gaussier et al., 2000b).
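The construction of the map from successively reached transitions can be sketched as follows. This is a minimal sketch under assumptions: the function name and the dictionary representation are hypothetical, places are plain labels, and a transition is never linked to itself (zero weight for i = j).

```python
def build_cognitive_map(place_sequence, link_weight=0.99):
    """Link each transition to the transition reached right after it.

    Returns a dictionary {(first_transition, next_transition): weight}.
    """
    transitions = list(zip(place_sequence, place_sequence[1:]))
    weights = {}
    for first, following in zip(transitions, transitions[1:]):
        if first != following:  # no link from a transition to itself
            weights[(first, following)] = link_weight
    return weights

cognitive_map = build_cognitive_map(["A", "B", "C", "D"])
```

Because the map is built only from what has been experienced, it grows during exploration without any goal being required, which is the latent learning property mentioned above.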

Autonomous planning using the cognitive map and motor transitions
Figure caption: The motivation activates all possible transitions arriving in "D," here "CD." This activity diffuses on the cognitive map according to the described algorithm. Activity in "BC" is higher than in "BE" or "BB." At the same time, in CA, different transitions from "B" are predicted ("BB," "BC," and "BE"). They activate the corresponding motor transitions. The bias coming from the cognitive map enhances "BC," leading to the corresponding motor command.
The need to plan is defined by a motivation to satisfy a certain need (eat, drink, rest . . . ). These needs are functions evolving in time between 0 and 1. An arbitrary threshold may be defined for each need. Below this threshold there is no motivation; above it, the corresponding motivation is triggered. This means that the transitions leading to the goal cell (where the need may be satisfied) are activated. This activation is then diffused on the cognitive map graph, each node taking the maximal incoming value, which is the product between the weight on the link and the activity of the node emitting the link. After stabilization, this diffusion process gives the shortest path between all nodes and the goal nodes. This is a neural version of the Bellman-Ford algorithm (Bellman, 1958; Revel et al., 1998) (see Figure 9). Hence, activity on PF is the following:

Initialization
• $i_0$ is the transition activated if there is a motivation for reaching that goal (there may be several transitions activated)

While the network is not stable
• for all $i$, $X^{PF}_i \leftarrow \max_j \left( W^{PF-PF}_{ij} \cdot X^{PF}_j \right)$
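The diffusion described above can be illustrated with a short sketch. It is written under assumptions that are not taken from the model's equations: goal transitions are clamped to an activity of 1, transitions are plain labels, a key (first, following) means that `following` was reached right after `first`, and the function name is hypothetical.

```python
def diffuse_motivation(weights, goal_transitions, max_iterations=1000):
    """Max-product diffusion of the goal activity over the cognitive map.

    weights: {(first, following): weight}, with weights strictly below 1.
    Activity flows backwards, from `following` to `first`.
    """
    nodes = set(goal_transitions)
    for first, following in weights:
        nodes.update((first, following))
    activity = {node: 0.0 for node in nodes}
    for goal in goal_transitions:
        activity[goal] = 1.0  # assumed initial value of the activated transitions
    for _ in range(max_iterations):
        changed = False
        for (first, following), weight in weights.items():
            candidate = weight * activity[following]
            if candidate > activity[first]:  # keep the maximal incoming value
                activity[first] = candidate
                changed = True
        if not changed:  # the network is stable
            break
    return activity

# Map loosely following the figure caption: from B one can go to C, to E, or stay in B.
example_map = {
    ("BB", "BC"): 0.99, ("BB", "BE"): 0.99,
    ("BC", "CD"): 0.99, ("BE", "EC"): 0.99, ("EC", "CD"): 0.99,
}
pf_activity = diffuse_motivation(example_map, ["CD"])
```

With all weights equal, the activity of a transition decreases with the number of links separating it from the goal, so following the most active predicted transition yields the shortest learned path.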
When the robot is at a particular location A, all learned transitions beginning with A are possible. The top-down effect of the cognitive map is to bias the possible transitions such that the ones chosen by the cognitive map have a higher value. This small bias is enough to select/filter the appropriate transitions via a competition mechanism.
Bias of the predicted transitions coming from CA by the cognitive map in PF is performed in the ACC. The activity on ACC is given by: where $X^{CA}_i(t)$ is the activity of transition $i$ coming from CA, and $X^{PF}_i(t)$ is the value of the diffusion of the motivation on PF for the same transition. $M$ is a binary variable indicating whether planning is required or not. So, when exploring, $M = 0$ and activity in ACC is the same as in CA. When planning ($M = 1$), a transition in ACC has an activity depending on its distance to the goal ($X^{PF}_i(t)$) and on how well it is recognized as the current transition ($X^{CA}_i(t)$). Merging several transitions is performed by a neural field (Amari, 1977; Schöner et al., 1995). Description of the mechanism is beyond the scope of this article, but may be found in Cuperlier et al. (2006a) and Quoy et al. (2003).
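The effect of the bias can be illustrated with a short sketch. The multiplicative form used here is only one hypothetical expression that satisfies the two properties stated above (ACC equals CA when M = 0, and depends on both CA and PF when M = 1); it should not be read as the model's equation. The function names are assumptions, and a simple winner-take-all replaces the neural field.

```python
def bias_transitions(ca_activity, pf_activity, planning):
    """Assumed form: X_ACC = X_CA * (1 + M * X_PF), with M = 1 when planning."""
    m = 1.0 if planning else 0.0
    return {transition: x_ca * (1.0 + m * pf_activity.get(transition, 0.0))
            for transition, x_ca in ca_activity.items()}

def select_transition(acc_activity):
    """Winner-take-all stand-in for the competition between transitions."""
    return max(acc_activity, key=acc_activity.get)

# Transitions predicted from place B, all equally recognized.
ca = {"BB": 0.8, "BC": 0.8, "BE": 0.8}
# Diffused motivation values, higher for the transition closest to the goal.
pf = {"BB": 0.9801, "BC": 0.99, "BE": 0.9801}

acc_exploring = bias_transitions(ca, pf, planning=False)
acc_planning = bias_transitions(ca, pf, planning=True)
chosen = select_transition(acc_planning)
```

Even though the PF values differ by about one percent only, this small bias is enough for the competition to select "BC".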
Our cognitive map is a topological map. Thus, our system cannot infer a path the robot has never experienced before (see Figure 11). But this system can nevertheless take a shortcut among the different paths previously realized. The shortcut is then a sequence of transitions previously learned but not necessarily in the same order (see Figure 1).
We have tested the creation of the map in several environments. We display in Figure 10 the result in an environment taking the shape of an "eight." In a first robotic experiment, we have verified that the cognitive map was correctly created. For the sake of simplicity, we have forced learning at five different places in an open environment (9.9 m × 8.4 m) by manually setting the vigilance parameter to one, instead of relying on the recognition threshold (RT). The map correctly displays the adjacency between learned places (see Figure 13).
In a second experiment, creation of PCs was done without any supervision: the recognition threshold based on the vigilance value autonomously providing the learning signal. Figure 12 shows the corresponding cognitive map. The map does not cover the full environment because exploration was only partial.

CONCLUSION
Though relying on the identification of places, our model is able to overcome the shortcomings of PC models by introducing transition cells. The choice of the movement to perform for going from one location to the other is directly triggered by the activation of the corresponding transition. Tests have been successfully carried out in indoor environments. The architecture, based on neuronal modeling, runs in real time on a robot. The processes are distributed over three dual-core 3 GHz Pentium 4 processors.
The biological proof of transition cells may be hard to achieve, as it would be difficult to observe the difference between (directional) PCs and transition cells in CA. Some neurobiological works found that PC firing is directional in constrained parts of the environment and nondirectional in open environments. Our model can account for these results in the following way: when the environment is constrained, a transition can only be linked with two others (one before and one after), whereas in an open environment transitions are possible with all adjacent transitions. Thus, in an open environment, the place field might seem nondirectional (see Figure 14).
We are able to propose a unified vision of the spatial (navigation) and temporal (memory) functions of the HS (Banquet et al., 2005). Current simulation work of the group focuses on several biologically relevant issues, like the integration of grid cells and the feedback loop from SUB to deep EC layers (Gaussier et al., 2007).

DISCUSSION
Exploration periods may be alternated with planning periods. The choice of the behavior is obtained through the self regulation of two control variables: first, the motivational information which allows triggering a planning behavior; and second, a detection signal while a new transition is learned which triggers a period of exploration if the planning behavior leads the robot in a place still unknown (case of an incomplete map). Planning then restarts as soon as the robot is able to predict transitions from the current place.
Our model, currently running on robots (Koala robots and Labo3 robots), has interesting properties in terms of autonomous behavior. Namely, localization relies only on vision (and a compass). Once exploration has been done, the robot may find its way back to any goal even when many people are freely moving around in the rooms. The model performance is hard to quantify, and its assessment mainly relies on measurements of the visual system performance (what happens when many landmarks are hidden, shifted . . . ) (Giovannangeli et al., 2006). However, this model has some drawbacks. We are not able to build a Cartesian map of the environment because all locations learned are robot centered. However, the places in the cognitive map and the directions used give a skeleton of the environment.
The size of the goal location has to be the same as the size of the place field of the corresponding PC. Consequently, we need a new mechanism to adapt the vigilance in order to autonomously fit the size of the place field. Some parameters have to be set: the recognition threshold (Subsection Autonomous Place Building) and the number of detected landmarks to use per panorama. The first parameter determines the density of built places. The higher the threshold, the more places are created. The second determines, partially (because it also depends on physical characteristics of the environment, like the distance of the detected landmarks), the recognition robustness of PCs. The greater this number, the lower the risk that PC activity decreases because of an occlusion of landmarks (e.g., in a dynamic environment this can happen when people move in the room).
Transitions used in this model may also be the elementary block of a sequence learning process. However, going from a graph of transitions to a sequence of transitions of any length is still an open question. A scaling problem also appears when one wants to code several different maps. Each map should be linked with a kind of context signal (which floor or which room) that should be able to "reload" the previously learned map (or a part of it) into the different neural structures used here.

CONFLICT OF INTEREST STATEMENT
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

APPENDIX: MODEL PARAMETERS
We list in the table below the parameters used to perform the experiment of Figure 12.

Parameter    Equation    Value
λ1           (7)         0.05
λ2           (7)         0.9
θ            (9)         0.15

The next table gives the neural population size used for the same experiment. The neural population size of layer Pr, EC, CA, and of the cognitive map may change according to the size of the environment. For example, the number given here is much greater than strictly needed for a room of this size (9.9 m × 8.4 m). Hence, many neurons remain "unused" and may code for another room.

Layer    Number of neurons
Ph       220
Pr       90