A Neurocomputational Model of Goal-Directed Navigation in Insect-Inspired Artificial Agents

Despite their small size, insect brains are able to produce robust and efficient navigation in complex environments. In social insects such as ants and bees, these navigational capabilities are guided by orientation-directing vectors generated by a process called path integration. During this process, the insect integrates compass and odometric cues to estimate its current location as a vector, the so-called home vector, which guides it back to the nest on a straight path. Social insects further acquire and retrieve path integration-based vector memories that are anchored globally to the nest or based on visual landmarks. Although existing computational models have reproduced similar behaviors, a neurocomputational model of vector navigation that includes the acquisition of vector representations has not been described before. Here we present a model of the neural mechanisms, embedded in a modular closed-loop control architecture, that enable vector navigation in artificial agents. The model consists of a path integration mechanism, reward-modulated global learning, random search, and action selection. The path integration mechanism integrates compass and odometric cues to compute a vectorial representation of the agent's current location as neural activity patterns in circular arrays. A reward-modulated learning rule enables the acquisition of vector memories by associating the local food reward with the path integration state. A motor output is computed from the combination of vector memories and random exploration. In simulation, we show that the neural mechanisms enable robust homing and localization, even in the presence of external sensory noise. The proposed learning rules lead to goal-directed navigation and route formation under realistic conditions. Consequently, we provide a novel approach to vector learning and navigation in a simulated, situated agent, linking behavioral observations to their possible underlying neural substrates.


Symbol        Description                                           Value
δ             Kronecker delta
x_i^M(t)      activity of the ith neuron in the memory layer        ℝ≥0
λ             memory leak parameter                                 0.0075
x_i^HV(t)     activity of the ith neuron in the home vector array   ℝ≥0
w_ij(t)       weights of the decoding layer                         ℝ≥0
θ_HV(t)       home vector angle (vector average of x_i^HV)          [0, 2π)
l_HV(t)       length of the home vector                             ℝ≥0
m_HV(t)       motor signal of the home vector angle                 ℝ≥0
r(t)          food reward at the feeder

Experimental Platforms
For our simulation results, we applied two different experimental platforms: First, we embedded the closed-loop control into a two-dimensional simulated point agent (Fig. S1i) for large-scale numerical results. Second, we used a simulated, embodied agent based on the hexapod walking robot AMOS II (Fig. S1ii; Manoonpong et al. (2013)). Both agents are able to perceive sensory input about compass direction, walking speed, and landmark detection, as well as food reward and internally generated signals (e.g., foraging state). These external and internal signals are fed into our model.
Our navigation model produces an output signal which controls the steering direction of the agent. The embodied agent applies a central pattern generator (CPG)-based neural locomotion control, which consists of modular neural networks generating a variety of periodic patterns and coordinating all leg joints. Thus, the agent is able to produce a multitude of different, insect-like behavioral patterns. The resulting behaviors include omnidirectional walking and insect-like gaits, which can be controlled manually or autonomously driven by exteroceptive sensors, such as a camera (Zenker et al., 2013), a laser scanner (Kesper et al., 2013), or infrared sensors (Goldschmidt et al., 2014). All neural networks in the CPG-based locomotion control are modeled using a discrete-time non-spiking neuron model with different activation functions (see Manoonpong et al. (2013) for details).
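To make this concrete, the following minimal sketch implements a generic discrete-time non-spiking (rate-based) neuron layer of the kind used in such CPG networks. The update equation o(t+1) = f(W o(t) + b + i(t)), the tanh activation, and the two-neuron oscillator weights are illustrative assumptions, not the exact formulation of Manoonpong et al. (2013).

```python
import numpy as np

def neuron_update(activity, weights, bias, inputs, act_fn=np.tanh):
    """One discrete-time update of a non-spiking neuron layer:
    o(t+1) = f(W o(t) + b + i(t))."""
    return act_fn(weights @ activity + bias + inputs)

# Example: a two-neuron recurrent oscillator as a minimal CPG core
# (weight values chosen illustratively to produce periodic output).
W = np.array([[ 1.4, 0.4],
              [-0.4, 1.4]])
o = np.array([0.1, 0.0])
for _ in range(5):
    o = neuron_update(o, W, bias=np.zeros(2), inputs=np.zeros(2))
    print(o)
```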

Agent motion dynamics and foraging statistics
In this subsection, we derive the agent's trajectory dynamics and foraging statistics for modeling social insects.

[Figure S1. Experimental platforms. i) The two-dimensional point agent simulation NaviSim is applied for large-scale numerical experiments. ii) The Lpzrobots framework (Der and Martius, 2012), containing the Modular Robot Control Environment and the simulated artificial agent based on the six-legged walking robot AMOS II. The agent has six legs (R0, R1, R2, L0, L1, L2), and each leg has three joints: the thoraco-coxal (TC) joint enables forward and backward movements, the coxa-trochanteral (CTr) joint enables elevation and depression of the leg, and the femur-tibia (FTi) joint enables extension and flexion of the tibia. The agent also contains a multitude of proprio- and exteroceptive sensors. Here we apply a compass sensor, a walking speed sensor, and infrared (IR) sensors.]

Both platforms are open-source projects and are available at https://github.com/degoldcode/NaviSim (NaviSim) and https://github.com/georgmartius/lpzrobots (Lpzrobots), respectively.

In both the point and embodied agent cases, the motion trajectory of the agent can be modeled using the current compass orientation φ(t) and the walking speed v(t). The agent's position coordinates x and y are described by the following differential equations:

    ẋ(t) = v(t) cos φ(t),
    ẏ(t) = v(t) sin φ(t).

In the two-dimensional simulation, we numerically integrate these equations using the forward Euler method with interval step size ∆t as follows:

    x(t + ∆t) = x(t) + ∆t v(t) cos φ(t),
    y(t + ∆t) = y(t) + ∆t v(t) sin φ(t).

The orientation of the agent is updated according to the differential equation

    φ̇(t) = k_φ Σ(t),

where Σ(t) is the control output of our model, and k_φ = π is a scaling factor. In random foraging, we apply a Gaussian normal distribution N(0, exp(t)) for the turning rate, which corresponds to a correlated random walk of the agent (Bovet and Benhamou, 1988). It has been shown that such a random walk leads to mean foraging distances L proportional to the square root of the simulation time, L ∝ √t.
Similarly, the path integration errors can be shown to follow the same square-root scaling law.
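To illustrate the motion model, the sketch below integrates the position and orientation equations with the forward Euler method under random turning. The step size, speed, and fixed noise level sigma are illustrative choices rather than values from the paper (in particular, the time-dependent variance of the turning distribution above is simplified to a constant).

```python
import numpy as np

def simulate_random_walk(T=1000, dt=0.1, v=1.0, sigma=0.3, k_phi=np.pi, seed=0):
    """Forward Euler integration of the agent motion model:
    x' = v cos(phi), y' = v sin(phi), phi' = k_phi * Sigma(t),
    where Sigma(t) is here a Gaussian random turning signal
    (a correlated random walk; sigma is an illustrative noise level)."""
    rng = np.random.default_rng(seed)
    x, y, phi = 0.0, 0.0, 0.0
    path = [(x, y)]
    for _ in range(T):
        Sigma = rng.normal(0.0, sigma)   # control output: random turning
        phi += dt * k_phi * Sigma        # orientation update
        x += dt * v * np.cos(phi)        # forward Euler position update
        y += dt * v * np.sin(phi)
        path.append((x, y))
    return np.array(path)

path = simulate_random_walk()
print("final distance from nest:", np.hypot(*path[-1]))
```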

Directed walk using sine error compensation
In order to generate directed motion of an agent towards a desired orientation, we apply a turning rate based on a sinusoidal function (Mittelstaedt, 1962, 1985; Vickerstaff and Di Paolo, 2005), which minimizes the angular difference between the desired and the actual orientation of the agent. The angular difference, or error, is defined as

    δ(t) = θ − φ(t),

where θ is the desired orientation and φ is the actual orientation. Thus, a turning rate given by

    φ̇(t) ∝ sin δ(t)    (S8)

leads to right turns (φ̇ < 0) when the actual orientation is to the left of the desired orientation (δ < 0), and vice versa (see Fig. S3).
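A minimal sketch of this steering law follows, assuming a unit gain on the sine term (the gain and function name are illustrative):

```python
import numpy as np

def turning_rate(theta_desired, phi_actual, gain=1.0):
    """Sine error compensation: turning rate proportional to sin(delta).

    delta = theta - phi is the angular error; sin() handles the
    wrap-around at +/-pi automatically, so no explicit angle
    normalization is needed."""
    delta = theta_desired - phi_actual
    return gain * np.sin(delta)

# Example: agent heading east (phi = 0), goal to the north-east (theta = pi/4).
print(turning_rate(np.pi / 4, 0.0))   # positive -> left turn toward the goal
```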

Generating searching patterns
An interesting behavior arises from the unstable fixed point of Eq. S8. When the agent overshoots the home or goal position, its angular error changes rapidly from zero to a value close to the unstable fixed point at ±π. Note that if the agent's angular error is exactly ±π, the turning rate is zero by definition. However, if δ is merely close to the unstable fixed point, the agent slowly turns to the left or right, and the turning rate then increases in the respective turning direction. As the agent's orientation aligns with the desired orientation, the turning rate decreases back to zero. As a result, the agent performs loops around the desired position in a searching pattern (Vickerstaff and Di Paolo, 2005). Indeed, such looped searching patterns have been observed in desert ants (Wehner and Srinivasan, 1981), as well as in honeybees (Reynolds et al., 2007).

[Figure S3. Sketch of the turning dynamics using a sine function of the angular difference. The dynamical system consists of two fixed points: a stable one at x = 0 and an unstable one at x = ±π. The system has been shown to be equivalent to a linearly damped pendulum (Vickerstaff, 2007).]
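Combining the steering law with the motion model reproduces such looping search behavior. The following sketch is illustrative (start position, noise level, and gains are arbitrary choices); a small noise term is added to break the symmetry at the unstable fixed point:

```python
import numpy as np

def search_loops(goal=(0.0, 0.0), T=4000, dt=0.05, v=1.0, seed=1):
    """Steer toward a fixed goal with phi' = sin(delta). After the agent
    overshoots the goal, delta jumps close to +/-pi (the unstable fixed
    point) and the agent curves back, producing looping search paths."""
    rng = np.random.default_rng(seed)
    x, y, phi = -10.0, 0.0, 0.0                        # start west of the goal
    path = []
    for _ in range(T):
        theta = np.arctan2(goal[1] - y, goal[0] - x)   # desired orientation
        phi += dt * np.sin(theta - phi) + 0.01 * rng.normal()  # tiny noise
        x += dt * v * np.cos(phi)
        y += dt * v * np.sin(phi)
        path.append((x, y))
    return np.array(path)

path = search_loops()
print("min distance to goal:", np.min(np.hypot(path[:, 0], path[:, 1])))
```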

Proof: phasor addition of inverted home vector and global vector leads to goal-directed phasor
Here, we prove that adding the phasors given by the inverted home vector (HV) and the global vector (GV) yields a phasor whose phase corresponds to the orientation towards the goal. We define the HV as vector a (angle θ_HV, length l_HV) and the GV as vector b (angle θ_GV, length l_GV). We will show that the vector b − a, which connects the agent's current position and the goal (see Fig. S4), is represented by the sum of the HV and GV phasors. We assume that the agent controls its heading orientation φ(t) according to the following differential control

    φ̇ ∝ l_HV sin(θ_HV − π − φ) + l_GV sin(θ_GV − φ),

where vector a is inverted by subtracting π from its phase. For convenience, we drop the time dependences from now on.
Representing the phasors in the complex plane ℂ with c sin(x) = Im(c e^{ix}) leads to

    φ̇ ∝ Im[(l_GV e^{iθ_GV} − l_HV e^{iθ_HV}) e^{−iφ}].

[Figure S4. Sketch of vector-based navigation. In order to derive the correct orientation towards the goal based on a stored vector b, the agent has to subtract the current position vector a derived from path integration.]
Clearly, (l_GV e^{iθ_GV} − l_HV e^{iθ_HV}) is the vector (b − a) expressed in complex polar coordinates. Thus, defining the agent-to-goal vector representation in complex polar coordinates as l_goal e^{iθ_goal}, we obtain

    φ̇ ∝ Im[l_goal e^{iθ_goal} e^{−iφ}] = l_goal sin(θ_goal − φ).

Therefore, the addition of the inverted home vector phasor and the global vector phasor leads to a phasor that describes the vector connecting the agent's current location to the goal position.
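The result can be verified numerically with complex arithmetic; the vector lengths and angles below are arbitrary examples:

```python
import numpy as np

# Arbitrary example: home vector a and global (goal) vector b as phasors.
l_hv, th_hv = 3.0, 0.8     # home vector length and angle (illustrative)
l_gv, th_gv = 5.0, 2.1     # global vector length and angle (illustrative)

a = l_hv * np.exp(1j * th_hv)
b = l_gv * np.exp(1j * th_gv)

# Phasor addition of the global vector and the inverted home vector:
goal_phasor = b + a * np.exp(1j * np.pi)    # inversion = phase shift by pi

# Direct computation of the agent-to-goal vector b - a:
direct = b - a

print(np.allclose(goal_phasor, direct))      # True
print("goal direction:", np.angle(direct))   # theta_goal
```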

DERIVING AN ADAPTIVE EXPLORATION RATE BASED ON A GRADIENT RULE
Here, we derive an adaptive exploration rate based on a gradient rule (Triesch, 2005, 2007) for a foraging agent in random environments (i.e., containing randomly distributed goals). The exploration rate only accounts for rewards received over time. We define the time-discounted cumulative reward as

    v(t) = Σ_{t'=0}^{t} γ^{t−t'} r(t'),

which is given by the update rule v ← r + γv. We assume the exploration rate with respect to v to be given by the sigmoid

    ε(v) = 1 / (1 + exp(βv)),

where β > 0 is the inverse temperature. For later convenience, we derive the partial derivatives of ε, which are given by

    ∂ε/∂v = −β ε (1 − ε),    (S10)
    ∂ε/∂β = −v ε (1 − ε).    (S11)

We derive a gradient rule that changes β so as to bring the probability distribution f_ε(ε) of ε(t) closer to an exponential distribution f_exp(λ, ε) = λ exp(−λε), assuming a fixed mean. This maximizes the mutual information between the input distribution f_v(v) and the output distribution f_ε(ε), such that the exploration activity matches the environmental needs. The two distributions are related through the derivative by the change-of-variables formula

    f_ε(ε) = f_v(v) / |∂ε/∂v|.    (S12)

We consider the Kullback-Leibler (KL) divergence as a measure of the closeness of two probability distributions:

    D(f_ε ‖ f_exp) = ∫ f_ε(ε) ln( f_ε(ε) / f_exp(λ, ε) ) dε = −H[ε] + λ E[ε] − ln λ,

where H[ε] denotes the entropy of ε and E[·] the expectation. By substituting −H[ε] using the relation given by Eq. S12, we derive

    −H[ε] = E[ln f_v(v)] − E[ln |∂ε/∂v|] = −H[v] − E[ln |∂ε/∂v|],

which leads to

    D = −H[v] − E[ln |∂ε/∂v|] + λ E[ε] − ln λ.

Considering that the input entropy H[v] is independent of β, the partial derivative of the KL divergence with respect to β is given by

    ∂D/∂β = −E[∂/∂β ln |∂ε/∂v|] + λ E[∂ε/∂β],

where we use the derivatives from Eqs. S10 and S11: since ln |∂ε/∂v| = ln β + ln ε + ln(1 − ε), we have ∂/∂β ln |∂ε/∂v| = 1/β − v(1 − 2ε). Thus, the partial derivative of the KL divergence is given by

    ∂D/∂β = E[−1/β + v(1 − (2 + λ)ε + λε²)],

where we used f_v(v)/f_ε(ε) = |∂ε/∂v|. Finally, the stochastic gradient-descent rule is given by

    ∆β = −η ∂D/∂β = η [1/β − v(1 − (2 + λ)ε + λε²)],

which we apply as an update β ← β + ∆β, with learning rate η > 0.
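The resulting mechanism can be summarized in a few lines of code. Note that this sketch assumes the sigmoidal form of ε(v) used in the derivation above (following Triesch, 2005, 2007), and all parameter values are illustrative:

```python
import numpy as np

class AdaptiveExploration:
    """Adaptive exploration rate with a gradient-based inverse temperature.

    v tracks the time-discounted cumulative reward (v <- r + gamma * v),
    epsilon = 1 / (1 + exp(beta * v)) maps reward to exploration, and
    beta follows the gradient rule
        d_beta = eta * (1/beta - v * (1 - (2 + lam) * eps + lam * eps**2)),
    pushing the distribution of epsilon toward an exponential one.
    All parameter values here are illustrative."""

    def __init__(self, gamma=0.99, lam=1.0, eta=0.01, beta=1.0):
        self.gamma, self.lam, self.eta = gamma, lam, eta
        self.beta, self.v = beta, 0.0

    def step(self, reward):
        self.v = reward + self.gamma * self.v           # discounted reward trace
        eps = 1.0 / (1.0 + np.exp(self.beta * self.v))  # exploration rate
        d_beta = self.eta * (1.0 / self.beta
                             - self.v * (1.0 - (2.0 + self.lam) * eps
                                         + self.lam * eps**2))
        self.beta = max(self.beta + d_beta, 1e-6)       # keep beta > 0
        return eps

explorer = AdaptiveExploration()
for t in range(5):
    r = 1.0 if t == 2 else 0.0                          # a single reward event
    print(round(explorer.step(r), 3))
```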

PSEUDOCODE OF LEARNING ALGORITHM FOR ADAPTIVE VECTOR NAVIGATION
Algorithm 1: Learning algorithm for adaptive vector navigation

Repeat at each simulation time t:
  Step 1: Update the sensory inputs (compass φ(t), speed s(t), internal states σ(t), and rewards R(t)) from the environmental interactions of the agent.
  Step 3: Update the global vector array activities x_i^GV(t) and weights w_i^GV(t) using Eqs. 12-15.
  Step 8: Update the agent's position according to the control output.
Until the maximum simulation time is reached (t = T).
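The sketch below shows how these steps compose into a simulation loop. Since only Steps 1, 3, and 8 are listed above, the agent and environment classes and their method names (sense, update_global_vectors, apply_control) are hypothetical placeholders, not the model's actual interface:

```python
import numpy as np

class DummyAgent:
    """Minimal stand-in agent; the method names are hypothetical
    placeholders for Steps 1, 3, and 8, not the paper's API."""
    def __init__(self):
        self.x, self.y, self.phi = 0.0, 0.0, 0.0
        self.weights = np.zeros(18)               # illustrative array size

    def sense(self, env):
        # Step 1: compass phi, speed s, internal state sigma, reward R
        return self.phi, 1.0, 0, env.reward_at(self.x, self.y)

    def update_global_vectors(self, phi, s, R):
        # Step 3: placeholder for the reward-modulated weight update
        self.weights += 0.0 * R                   # no-op stub

    def apply_control(self, env, dt):
        # Step 8: move according to the (here trivial) control output
        self.x += dt * np.cos(self.phi)
        self.y += dt * np.sin(self.phi)

class DummyEnvironment:
    def reward_at(self, x, y):
        return 1.0 if np.hypot(x - 5.0, y) < 0.5 else 0.0

def run_simulation(agent, env, T=10.0, dt=0.1):
    """Skeleton of Algorithm 1: Steps 1, 3, and 8 per loop iteration."""
    t = 0.0
    while t < T:
        phi, s, sigma, R = agent.sense(env)       # Step 1
        agent.update_global_vectors(phi, s, R)    # Step 3
        agent.apply_control(env, dt)              # Step 8
        t += dt

run_simulation(DummyAgent(), DummyEnvironment())
```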