Reclaiming saliency: rhythmic precision-modulated action and perception

Computational models of visual attention in artificial intelligence and robotics have been inspired by the concept of a saliency map. These models account for the mutual information between the (current) visual information and its estimated causes. However, they fail to consider the circular causality between perception and action. In other words, they do not consider where to sample next, given current beliefs. Here, we reclaim salience as an active inference process that relies on two basic principles: uncertainty minimisation and rhythmic scheduling. For this, we make a distinction between attention and salience. Briefly, we associate attention with precision control, i.e., the confidence with which beliefs can be updated given sampled sensory data, and salience with uncertainty minimisation that underwrites the selection of future sensory data. Using this, we propose a new account of attention based on rhythmic precision-modulation and discuss its potential in robotics, providing numerical experiments that showcase advantages of precision-modulation for state and noise estimation, system identification and action selection for informative path planning.


Introduction
Attention is a fundamental cognitive ability that determines which events from the environment, and the body, are preferentially processed [Itti and Koch, 2001]. For example, the motor system directs the visual sensory stream by orienting the fovea centralis (i.e., the retinal region of highest visual acuity) towards points of interest within the visual scene. Thus, the confidence with which the causes of sampled visual information are inferred is constrained by the physical structure of the eye, and eye movements are necessary to minimise uncertainty about visual percepts [Ahnelt, 1998]. In neuroscience, this can be attributed to two distinct, but highly interdependent, attentional processes: (i) attentional gain mechanisms reliant on estimating the sensory precision of current data [Feldman and Friston, 2010, Yang et al., 2016a], and (ii) attentional salience that involves actively engaging with the sensorium to sample appropriate future data [Friston, 2019, Lengyel et al., 2016]. Put simply, we formalise the fundamental difference between attention, as optimising perceptual processing, and salience, as optimising the sampling of what is processed. This highlights the dynamic, circular nature with which biological agents acquire, and process, sensory information.
Understanding the computational mechanisms that undergird these two attentional phenomena is pertinent for deploying apt models of (visual) perception in artificial agents [Klink et al., 2014, Mousavi et al., 2016, Atrey et al., 2019] and robots [Frintrop and Jensfelt, 2008, Begum and Karray, 2010, Ferreira and Dias, 2014, Lanillos et al., 2015a]. Previous computational models of visual attention, used in artificial intelligence and robotics, have been inspired (and limited) by the feature integration theory proposed by Treisman and Gelade [1980] and the concept of a saliency map [Tsotsos et al., 1995, Itti and Koch, 2001, Borji and Itti, 2012]. Briefly, a saliency map is a static two-dimensional 'image' that encodes stimulus relevance, e.g., the importance of a particular region. These maps are then used to isolate relevant information for control (e.g., to direct foveation to the maximum-valued region). Accordingly, computational models reliant on this formulation do not consider the circular dependence between action selection and cue relevance, and simply use these static saliency maps to guide action.
In this article, we adopt a first principles account to disambiguate the computational mechanisms that underpin attention and salience, and provide a new account of attention. Specifically, our formulation can be effectively implemented for robotic systems and facilitates both state-estimation and action selection. For this, we associate attention with precision control, i.e., the confidence with which beliefs can be updated given (current) sampled sensory data. Salience is associated with uncertainty minimisation that influences the selection of future sensory data. This formulation speaks to a computational distinction between action selection (i.e., where to look next) and visual sampling (i.e., what information is being processed). Importantly, recent evidence demonstrates the rhythmic nature of these processes via a theta-cycle coupling that fluctuates between high and low precision, as unpacked in Sec. 2. From a robotics perspective, resolving uncertainty about states of affairs speaks to a form of Bayesian optimality, in which decisions are made to maximise expected information gain [Lindley, 1956, Sajid et al., 2021a]. The duality between attention and salience is important for resolving uncertainty and enabling active perception. Significantly, it addresses an important challenge for defining autonomous robotic systems that can balance optimally between data assimilation (i.e., confidently perceiving current observations) and exploratory behaviour to maximise information gain [Bajcsy et al., 2018].
In what follows, we review the neuroscience of attention and salience (Sec. 2) to develop a novel (computational) account of attention based on precision-modulation that underwrites perception and action (Sec. 3). Next, we face-validate our formulation within a robotics context using numerical experiments (Sec. 4). The robotics implementation instantiates a free energy principle (FEP) approach to information processing [Friston, 2010]. This allows us to modulate the (appropriate) precision parameters to solve relevant robotics challenges in perception and control; namely, state-estimation (Sec. 4.2.2), system identification (Sec. 4.2.3), planning (Sec. 4.3), and active perception (Sec. 4.3.3). We conclude with a discussion of the requisite steps for instantiating a full-fledged computational model of precision-modulated attention, and its implications in a robotics setting.

Attention and salience in neuroscience
Our interactions with the world are guided by efficient gathering and processing of sensory information. The quality of these acquired sensory data is reflected in attentional resources that select sensations which influence our beliefs about the (current and future) states of affairs [Lengyel et al., 2016, Yang et al., 2016b]. This selection is often related to gain control, i.e., an increase of neural spikes when an object is attended to. However, gain control only accounts for half the story because we can only attend to those objects that are within our visual field. Accordingly, if a salient object is outside the centre of our visual field, we orient the fovea to points of interest. This involves two separate, but often conflated, processes: attention and salience, where the former relates to processing current visual data, and the latter to ensuring the agent samples salient data in the future. That these two processes are strongly coupled is exemplified by the pre-motor theory of attention [Rizzolatti et al., 1987], which highlights the close relationship between overt saccadic sampling of the visual field and the covert deployment of attention in the absence of eye movements. Specifically, it posits that covert attention is realised via processes that are generated by particular eye movements but inhibit the action itself. In this sense, it does not distinguish between covert and overt types of attention.
From a first principles (Bayesian) account, it is necessary to distinguish between attention and salience because they speak to different optimisation processes. Explicitly, attention acts as a precision-dependent (neural) gain control mechanism that facilitates optimisation of the current sampled sensory data [Feldman and Friston, 2010, Desimone, 1996]. Conversely, salience is associated with selection of future data that reduces uncertainty [Mirza et al., 2016a, Friston et al., 2015a]. Put simply, it is possible to optimise attention in the absence of eye movements and active vision, whereas salience is necessary to optimise the deployment of eye movements. In what follows, we formalise this distinction with a particular focus on visual attention [Kanwisher and Wojciulik, 2000], and discuss recent findings that speak to a rhythmic coupling that underwrites periodic deployment of gain control and saccades, via modulation of distinct precision parameters.

Attention as neural gain control
Neural gain control (or precision) can be regarded as an amplifier of neural communication during attention tasks [Eldar et al., 2013, Reynolds et al., 2000]. Specifically, an increase in gain amplifies the postsynaptic responses of neurons to their pre-synaptic input. Thus, gain control rests on synaptic modulation that can emphasise, or preferentially select, a particular type of sensory data. From a Bayesian perspective [Spratling, 2008, Parr et al., 2018], this speaks to the confidence with which beliefs can be updated given sampled sensory data (i.e., optimal state estimation) under a generative model [Whiteley and Sahani, 2008, Parr et al., 2018]. For example, affording high precision to certain sensory inputs would lead to confident Bayesian belief updating. In contrast, low precision reduces the influence of sensory input by attenuating the precision of the likelihood, relative to a prior belief, so that current observations would do little to resolve ensuing uncertainty. Thus, sampled visual data (from different areas) can be predicted with varying levels of precision, where attention accentuates sensory precision. The deployment of precision or attention is influenced by competition between stimuli (i.e., which sensory data to sample) and prior beliefs. Interestingly, casting attention as precision or, equivalently, synaptic gain offers coherence between biased competition [Desimone, 1996], predictive coding [Spratling, 2008] and generic active inference schemes [Feldman and Friston, 2010, Parr et al., 2018, Brown et al., 2013, Kanai et al., 2015].
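The effect of sensory precision on belief updating described above can be illustrated with a minimal sketch. It assumes a scalar Gaussian prior and a scalar Gaussian likelihood (a simplification that is not taken from the text), and the function name and numerical values are hypothetical.

```python
def gaussian_update(prior_mean, prior_precision, observation, sensory_precision):
    """Conjugate Gaussian update: precisions add, and the prediction error
    is weighted by the relative sensory precision (the 'gain')."""
    posterior_precision = prior_precision + sensory_precision
    gain = sensory_precision / posterior_precision
    posterior_mean = prior_mean + gain * (observation - prior_mean)
    return posterior_mean, posterior_precision


# Hypothetical numbers: the same observation under high and low sensory precision.
prior_mean, prior_precision = 0.0, 1.0
observation = 2.0

attended_mean, attended_precision = gaussian_update(
    prior_mean, prior_precision, observation, sensory_precision=9.0)
unattended_mean, unattended_precision = gaussian_update(
    prior_mean, prior_precision, observation, sensory_precision=0.25)
```

With high sensory precision the posterior mean moves most of the way towards the observation, whereas with low precision it stays close to the prior, which is the sense in which precision plays the role of attentional gain here.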
Naturally, gain control is accompanied by neuronal variability, i.e., sharpened neural responses for the same task over time. Consistent with gain control, these fluctuations in neural responses across trials can be explained by precision-engineered message passing [Clark, 2013] via (i) normalization models [Reynolds and Heeger, 2009, Ruff and Cohen, 2016], (ii) temperature parameter manipulation [Mirza et al., 2019, Parr et al., 2018, Parr and Friston, 2017, Feldman and Friston, 2010], or (iii) introduction of (conjugate hyper-)priors that are either pre-specified [Sajid et al., 2021b] or optimised using uninformed priors [Friston et al., 2003, Anil Meera and ...]. Recently, these approaches have been used to simulate attention by accentuating predictions about a given visual stimulus [Reynolds and Heeger, 2009, Feldman and Friston, 2010, Ruff and Cohen, 2016]. For example, normalization models propose that every neuronal response is normalized within its neuronal ensemble (i.e., the surrounding neuronal responses) [Heeger, 1992, Louie and Glimcher, 2019]. Thus, to amplify the neuronal response of a particular neuron, the neuronal pool has to be inhibited such that that particular neuron has a sharper evoked response [Schmitz and Duncan, 2018]. Importantly, these (superficially distinct) formulations simulate similar functions using different procedures to accentuate responses over a particular neuronal pool for a given neuron or a group of neurons. This introduces shifts in precision to produce attentional gain and the precision of neuronal encoding.
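The normalization idea in the paragraph above can be sketched as a simple divisive normalisation, in which each response is divided by the summed activity of its pool. This is a deliberately reduced toy version, not the full model from the cited literature; the function name, the constant `sigma` and the drive values are assumptions made for illustration.

```python
def normalised_responses(drives, attention_gains, sigma=1.0):
    """Toy divisive normalisation: each attention-weighted drive is divided
    by a constant plus the summed weighted drive of the whole pool."""
    weighted = [g * d for g, d in zip(attention_gains, drives)]
    pool = sigma + sum(weighted)
    return [w / pool for w in weighted]


# Three neurons with identical stimulus drive (hypothetical values).
drives = [1.0, 1.0, 1.0]
neutral = normalised_responses(drives, attention_gains=[1.0, 1.0, 1.0])
attended = normalised_responses(drives, attention_gains=[3.0, 1.0, 1.0])
```

In this sketch, raising the gain on the first neuron increases its normalised response and, through the shared pool, suppresses the responses of the other two, giving the attended neuron a relatively sharper response.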

Salience as uncertainty minimisation
In the neurosciences, (visual) salience refers to the 'significance' of particular objects in the environment. Salience often implicates the superior colliculus, a region that encodes eye movements [White et al., 2017]. This makes intuitive sense, as the superior colliculus plays a role in the generation of eye movements, being an integral part of the brainstem oculomotor network [Raybourn and Keller, 1977], and salient objects provide information that is best resolved in the centre of the visual field, thus motivating eye movements to that location. For this reason, we understand salience as a quintessentially action-driving phenomenon. Mathematically, salience has been defined as Bayesian surprise [Itti and Koch, 2001, Itti and Baldi, 2009], intrinsic motivation [Oudeyer and Kaplan, 2009], and subsequently, epistemic value under active inference [Mirza et al., 2016b, Parr et al., 2018]. Active inference, a Bayesian account of perception and action [Friston et al., 2017a, Da Costa et al., 2020], stipulates that action selection is determined by uncertainty minimisation. Formally, uncertainty minimisation speaks to minimisation of an expected free energy functional over future trajectories [Da Costa et al., 2020, Sajid et al., 2021a]. This action selection objective can be decomposed into epistemic and extrinsic value, where the former pertains to exploratory drives that encourage resolution of uncertainty by sampling salient observations, e.g., only checking one's watch when one does not know the time. However, after checking the watch there is little epistemic value in looking at it again. Generally, the tendency to seek out new locations, once uncertainty has been resolved at the current fixation point, is called inhibition of return [Klein, 2000].
From an active inference perspective, this phenomenon is prevalent because a recent action has already resolved the uncertainty about the time and checking again would offer nothing more in terms of information gain. Accordingly, salience involves seeking sensory data that have a predictable, uncertainty-reducing effect on current beliefs about states of affairs in the world [Parr et al., 2018, Mirza et al., 2016b]. Thus salience contends with beliefs about data that must be acquired and the precision of beliefs about policies (i.e., action trajectories) that dictate it. Formally, this emerges from the imperative to maximise the amount of information gained regarding beliefs, from observing the environment. Happily, prior studies have made the connection between eye movements, salience, and precision manipulation [Friston et al., 2011, Brown et al., 2013, Crevecoeur and Kording, 2017]. This connection emerges from planning strategies that allow the agent to minimise uncertainty by garnering the right kind of data.
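The watch example can be made concrete with a small sketch of expected information gain for a discrete hidden state. It assumes two hypotheses and two outcomes, and scores each candidate action by the mutual information between outcomes and states; the likelihood tables and belief values are hypothetical, and this covers only the epistemic part of the expected free energy.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def expected_information_gain(belief, likelihood):
    """Mutual information between outcomes and hidden states for one action.

    belief[s] is the current belief over states; likelihood[s][o] is the
    probability of outcome o under state s if the action is taken.
    """
    n_states, n_outcomes = len(belief), len(likelihood[0])
    predictive = [sum(belief[s] * likelihood[s][o] for s in range(n_states))
                  for o in range(n_outcomes)]
    ambiguity = sum(belief[s] * entropy(likelihood[s]) for s in range(n_states))
    return entropy(predictive) - ambiguity


# Hypothetical likelihoods: looking at the watch is informative, a blank wall is not.
watch = [[0.95, 0.05], [0.05, 0.95]]
wall = [[0.5, 0.5], [0.5, 0.5]]

uncertain_belief = [0.5, 0.5]    # before checking the time
resolved_belief = [0.99, 0.01]   # after checking the time

gain_watch_before = expected_information_gain(uncertain_belief, watch)
gain_wall_before = expected_information_gain(uncertain_belief, wall)
gain_watch_after = expected_information_gain(resolved_belief, watch)
```

Under these assumptions the watch is the salient target while beliefs are uncertain, and its expected information gain collapses once the uncertainty has been resolved, which mirrors the inhibition of return described above.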
Next, we consider recent findings on how the coupling of these two mechanisms, attention and salience, may be realised in the brain.

Rhythmic coupling of attention and salience
To illustrate the coupling between attention and salience, we turn to a recent rhythmic theory of attention. The theory proposes that coupling of saccades, during sampling of visual information, happens at neuronal and behavioural theta oscillations; a frequency of 3-8Hz [Fiebelkorn and Kastner, 2019, 2021]. This frequency simultaneously allows for: (i) a systematic integration of visual samples with action, and (ii) a temporal schedule to disengage and search the environment for more relevant information.
Given that gain control is related to increased sensory precision, we can accordingly relate saccadic eye movements to decreased precision. This introduces saccadic suppression, a phenomenon that decreases visual gain during eye movements [Crevecoeur and Kording, 2017]. This phenomenon was described by Helmholtz, who observed that externally initiated eye movements (e.g., when one gently presses the side of an eye) elude the saccadic suppression that accompanies normal eye movements, so that we see the world shift, because optic flow is not attenuated [Helmholtz, 1925]. An interesting consequence of this is that, as eye movements happen periodically [Rucci et al., 2018, Benedetto et al., 2020], there must be a periodic switch between high and low sensory precision, with high precision (or enhanced gain) during fixations and low precision (or suppressed gain) during saccades. Interestingly, it has been shown that, rather than resetting the neural periodicity, action is better understood as something that aligns within an already existing rhythm [Tomassini et al., 2017, Hogendoorn, 2016]. Additionally, the fidelity of sensory sampling has been shown to fluctuate rhythmically around 3Hz [Benedetto and Morrone, 2017], suggesting that action emerges rhythmically when visual precision is low [Hogendoorn, 2016], triggering salience.
Building upon this, we hypothesise that theta rhythms generated in the fronto-parietal network [Helfrich et al., 2018, Fiebelkorn and Kastner, 2020] couple saccades with saccadic suppression, causing the switches between visual sampling and saccadic shifting. This introduces a diachronic aspect to the belief updating process [Parr and Pezzulo, 2021, Sajid et al., 2022]; i.e., sequential fluctuations between attending to current data (perception) and seeking new data (action). This supports empirical findings that both eye movements [Sommer and Wurtz, 2006] and filtering irrelevant information [Nakajima et al., 2019, Phillips et al., 2016, Fiebelkorn and Kastner, 2020] are initiated in this cortical network. Interestingly, both eye movements and visual filtering then propagate to sub-cortical regions, i.e., the superior colliculus, for saliency map composition [White et al., 2017], and the thalamus, for gain control [Kanai et al., 2015], respectively. Furthermore, this is consistent with recent findings that the periodicity of neural responses is important for understanding the relation between motor responses and sensory information, i.e., perception-action coupling [Benedetto et al., 2020]. Importantly, theta rhythms also speak to the speed (i.e., the temporal schedule) with which visual information is sampled from the environment [Busch and VanRullen, 2010, Dugué et al., 2015, Helfrich et al., 2018]. This means that visual information is not sampled continuously, as our visual experiences would suggest, but rather is made of successive discrete samples [VanRullen, 2016].
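The hypothesised rhythmic switching can be sketched as a sensory precision that oscillates at a theta frequency, with a saccade triggered whenever precision falls below a threshold. The 4 Hz frequency is an assumed value within the 3-8Hz range quoted above; the cosine profile, precision bounds, threshold and all names are illustrative assumptions rather than quantities from the text.

```python
import math

THETA_HZ = 4.0     # assumed theta frequency within the 3-8 Hz range
DT = 0.005         # simulation step in seconds (assumption)
THRESHOLD = 0.2    # precision below which a saccade is scheduled (assumption)


def sensory_precision(t, low=0.1, high=1.0):
    """Precision oscillating between `low` and `high` (arbitrary scale)."""
    phase = 2.0 * math.pi * THETA_HZ * t
    return low + (high - low) * 0.5 * (1.0 + math.cos(phase))


def schedule(duration):
    """Label each time step as 'sample' (high precision) or 'saccade' (trough)."""
    steps = int(round(duration / DT))
    events = []
    for k in range(steps):
        t = k * DT
        mode = "saccade" if sensory_precision(t) < THRESHOLD else "sample"
        events.append((t, mode))
    return events


events = schedule(1.0)
# Times at which the schedule switches from sampling to a saccade.
saccade_onsets = [t for (t, mode), (_, previous) in zip(events[1:], events[:-1])
                  if mode == "saccade" and previous == "sample"]
```

In this toy schedule, one second of simulated time contains one saccade per theta cycle, each falling in a trough of precision, while the remaining time is spent sampling at high precision.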
In summary, the computations that underwrite attention and active vision are coupled and exhibit circular causality. Briefly, selective attention and sensory attenuation optimize the processing of sensory samples and determine which particular visual percepts are inferred. In turn, this determines the appropriateness of future eye movements (or actions). Interestingly, the close functional (and computational) link between the two mechanisms endorses the pre-motor theory of attention.
Figure 1: A graphical illustration of the precision-modulated account of perception and action. Salience and attention are computed based upon beliefs (assumed to be) encoded in parts of the fronto-parietal network and realised in distinct brain regions: superior colliculus (SC) for planning as inference and thalamus for perception as inference, respectively. To deploy attentional processes efficiently, these two mechanisms have to be aligned, which is done rhythmically, hypothetically at theta frequency. This coupling enables the saccadic suppression phenomenon through fluctuations in precision (on an arbitrary scale). When precision is low (i.e., the trough of the theta rhythm), the saccade emerges. Note that there might be distinct processes inhibiting the action (e.g., covert attention). On the other hand, high precision facilitates confident inferences about the causes of visual data. Under this account, the thalamus is used for initiating gain control (or visual sampling in general) by providing stronger sensory input, while the superior colliculus dictates the next saccades, which lead to the most informative fixation positions.

Proposed precision-modulated account of attention and salience
Here, we introduce our precision-modulated account of perception and action. A graphical illustration is provided in Figure 1. For this, we turn to attention and salient action selection, which have their roots in biological processes relevant for acquiring task-relevant information. Under an active inference account, attention influences (posterior) state estimation and can be associated with increased precision of belief updating and gain control, as described in Sec. 2.1. Furthermore, this is distinct from salience, despite interdependent neuronal composition and computations.
Further alignment between the two constructs can be revealed by considering the temporal scheduling between movement (i.e., action) and perception for uncertainty resolution. We postulate that this perception-action coupling is best understood as a periodic fluctuation between minimising uncertainty and precision control. Subsequently, action is deployed to reduce uncertainty. Such an alignment specifies what stimulus is selected and under what level of precision it is processed. It has been hypothesised that action alignment with precision is due to the structure of the eye, which provides precise information in the fovea and requires the agent to foveate the most informative stimulus. We extend this by considering the periodic deployment of gain control with saccades [Hogendoorn, 2016, Nakayama and Motoyoshi, 2019, Tomassini et al., 2017, Benedetto and Morrone, 2017].
Accordingly, our formulation defines attention as precision control and salience as uncertainty minimisation supported by discrete sampling of visual information at a theta rhythm. This synchronises perception and action in an oscillatory fashion [Hogendoorn, 2016]. Importantly, a Bayesian formulation of this can be realised as precision manipulation over particular model parameters. We reserve further details for Sec. 4.

Summary
Based upon our review, we propose a precision-modulated account of attention and salience, emphasising the diachronic realisation of action and perception. In the following sections, we investigate a realisation of this model for a robotic system.

Precision-based attention for Robotics
The previous section introduced a conceptual account to explain the computational mechanisms that undergird attention based on neuroscience findings. We focused on reclaiming saliency as an active process that relies on neural gain control, uncertainty minimisation and structured scheduling. Here, we describe how we can mathematically realise some of these mechanisms in the context of well-known challenges in robotics. Endowing robots with this type of attention may be crucial for filtering the sensory signals and internal variables that are relevant for estimating the robot/world state and completing any task. More importantly, the active component of salience (i.e., behaviour) is essential to interact with the world, as argued in active perception approaches [Bajcsy et al., 2018].
We revisit the standard view of attention in robotics by introducing sensory precision (inverse variance) as the driving mechanism for modulating both perception and action [Clark, 2013, Friston et al., 2011]. Although saliency was originally described to underwrite behaviour, most models used in robotics, strongly biased by computer vision approaches, focus on computing the most relevant region of an image [Borji and Itti, 2012] (mainly computing human fixation maps), relegating action to a secondary process. Illustratively, state-of-the-art deep learning saliency models, as shown in the MIT saliency benchmark [Bylinskii et al., 2019], do not have the action as an output. Conversely, the active perception approach properly defines the action as an essential process of active sensing to gather the relevant information. Our proposed model, based on precision-modulated action and perception coupling, (i) places attention as essential for state-estimation and system identification, and (ii) reclaims saliency as a driver for information-seeking behaviour, as proposed in early works [Tsotsos et al., 1995], but goes beyond human fixation maps for both improving the model of the environment (exploration) and solving the task (exploitation). In what follows, we highlight the key role of precision by reviewing relevant brain-inspired attention models deployed in robotics (Sec. 4.1). We propose precision-modulated attentional mechanisms for robots in three contexts: perception (Sec. 4.2), action (Sec. 4.3) and active perception (Sec. 4.3.3). The precision-modulated perception is formalised for a robotics setting via (i) state estimation (i.e., estimating the hidden states of a dynamic system from sensory signals; Sec. 4.2.2), and (ii) system identification (i.e., estimating the parameters of the dynamic system from sensory signals; Sec. 4.2.3). Next, we show that precision-modulated action can be realised through precision optimisation (planning future actions; Sec. 4.3.2) and discuss practical considerations for coupling with precision-modulated perception (precision-based active perception; Sec. 4.3.3). Table 1 summarises our proposed precision manipulations to solve relevant problems in robot perception and action. Table 2 provides the definitions of precision within our mechanism.
Table 2: Definitions of precision within our mechanism.
Prior parameter precision: the robot's confidence in its prior parameters $\eta^\theta$.
Noise precision $\tilde{\Pi}$: the inverse covariance of all noises (Eq. 5).
Posterior parameter precision $\Pi^\theta$: the robot's confidence in its parameter estimates.

Previous brain-inspired attention models in robotics
Brain-inspired attention has been mainly addressed in robotics from a 'passive' visual saliency perspective, e.g., which pixels of the image are the most relevant. This saliency map is then generally used to foveate the most salient region. This approach was strongly influenced by early computational models of visual attention [Tsotsos et al., 1995, Itti and Koch, 2001]. The first models deployed in robots were bottom-up, where the sensory input was transformed into an array of values that represents the importance (or salience) of each cue. Thus, the robot was able to identify which region of the scene it had to look at, independently of the task performed; see Borji and Itti [2012] for a review on visual saliency. These models have also been useful for acquiring meaningful visual features in applications such as object recognition [Orabona et al., 2005, Frintrop, 2006], localisation, mapping and navigation [Frintrop and Jensfelt, 2008, Kim and Eustice, 2013, Roberts et al., 2012]. Saliency computation was usually employed as a helper for the selection of the relevant characteristics of the environment to be encoded, thus reducing the amount of information to be processed.
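For contrast with the active account developed in this article, the 'passive' pipeline described above (compute a static map, then foveate its maximum) can be sketched in a few lines. The contrast-against-the-mean measure is a toy stand-in for the feature channels used in real bottom-up models, and the function names and test image are hypothetical.

```python
import numpy as np


def saliency_map(image):
    """Toy bottom-up saliency: absolute contrast of each pixel against the
    image mean, rescaled to [0, 1]. Not a model from the cited literature."""
    contrast = np.abs(image - image.mean())
    peak = contrast.max()
    return contrast / peak if peak > 0 else contrast


def next_fixation(saliency):
    """Foveate the maximum-valued location, returned as (row, column)."""
    return np.unravel_index(np.argmax(saliency), saliency.shape)


# Hypothetical 5x5 scene with one bright pixel.
image = np.zeros((5, 5))
image[1, 3] = 1.0

fixation = next_fixation(saliency_map(image))
repeat_fixation = next_fixation(saliency_map(image))
```

Because the map depends only on the image and not on what the agent already believes, repeated calls return the same fixation; nothing in this pipeline accounts for the uncertainty that a previous fixation has already resolved.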
More refined methods of visual attention employed top-down modulation, where the context, task or goal biases the relevance of the visual input. These methods were used, for instance, to identify humans using motion patterns [Butko et al., 2008, Morén et al., 2008]. A few works also focused on object/target search applications, where top-down and bottom-up saliency attention were used to find objects or people in a search and rescue scenario [Rasouli et al., 2020].
Attention has also been considered in human-robot interaction and social robotics applications [Ferreira and Dias, 2014], mainly for scene or task understanding [Ude et al., 2005, Kragic et al., 2005, Lanillos et al., 2016], and gaze estimation [Shon et al., 2005] and generation [Lanillos et al., 2015a]. This includes, for instance, computing where the human is looking, where the robot should look, or which object should be grasped. Furthermore, multi-sensory and 3D saliency computation has also been investigated [Lanillos et al., 2015b]. Finally, more complex attention behaviours, particularly designed for social robotics and based on human non-verbal communication, such as joint attention, have also been addressed. Here the robot and the human share the attention of one object through meaningful saccades, i.e., head/eye movements [Kaplan and Hafner, 2006, Nagai et al., 2003, Lanillos et al., 2015a].
Although attention mechanisms have been widely investigated in robotics, especially to model visual cognition [Begum and Karray, 2010, Kragic et al., 2005], the majority of the works have treated attention as an extra feature that can help the visual processing, instead of a crucial component needed for the proper functioning of the cognitive abilities of the robot [Lanillos and Cheng, 2018a]. Furthermore, these methods tended to leave action generation out of the attention process. One of the reasons for not including saliency computation in robotic systems is that the majority of the models only output 'human-fixation map' predictions, given a static image. Saliency computation introduces extra computational complexity, which can be finessed by visual segmentation algorithms (e.g., line detectors in autonomous navigation). However, it neither resolves uncertainty nor selects actions that maximise information gain in the future. In essence, the incomplete view of attention models that output human-fixation maps has arguably obscured the huge potential of neuroscience-inspired attentional mechanisms for robotics.
Our proposed model of attention, based on precision modulation, abandons the narrow view of attention and saliency currently held in robotics by explicitly modelling attention within state estimation, learning and control. This places attentional processes at the core of the robot's computation, rather than treating them as an add-on. In the following sections, we describe the realisation of our precision-based attention formulation in robotics using common practical applications as the backbone motif.

Precision-modulated perception
We formalise precision-modulated perception from a first principles Bayesian perspective, explicitly the free energy principle approach proposed by Friston et al. [2011]. Practically, this entails optimising precision parameters over (particular) model parameters.
Through numerical examples we show how our model is able to perform accurate state estimation [Bos et al., 2021] and stable parameter learning [Meera and Wisse, 2021a,b]. To illustrate the approach, we first introduce a dynamic system modelled as a linear state space system in robotics (Sec. 4.2.1); we used this formulation in all our numerical experiments. We briefly review the formal terminologies for a robotics context to appropriately situate our precision-based mechanism for perception. Explicitly, we introduce: precision modelling (by adapting a known form of the precision matrix), precision learning (by learning the full precision matrix), and precision optimisation (using precision as an objective function during learning). As a reminder, precision modelling is associated with (instantaneous) gain control and precision learning (at slower time scales) is associated with optimising that control.

Precision for state space models
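A linear dynamic system can be modelled using the following state space equations (boldface notation denotes components of the real system and non-boldface notation its estimates):
$$\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u} + \mathbf{w}, \qquad \mathbf{y} = \mathbf{C}\mathbf{x} + \mathbf{z}, \tag{1}$$
where $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ are constant matrices defining the system parameters, $\mathbf{x} \in \mathbb{R}^n$ is the system state (usually an unobserved variable), $\mathbf{u} \in \mathbb{R}^r$ is the input or control actions, $\mathbf{y} \in \mathbb{R}^m$ is the output or the sensory measurements, $\mathbf{w} \in \mathbb{R}^n$ is the process noise with precision $\mathbf{\Pi}^w$ (or inverse covariance $(\mathbf{\Sigma}^w)^{-1}$), and $\mathbf{z} \in \mathbb{R}^m$ is the measurement noise with precision $\mathbf{\Pi}^z$.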
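For instance, we can describe a mass-spring-damper system (depicted in Fig. 2b) using state space equations. A mass ($m = 1.4$ kg) is attached to a spring with elasticity constant ($k = 0.8$ N/m), and a damper with a damping coefficient ($b = 0.4$ N s/m). When a force ($u(t) = e^{-0.25(t-12)^2}$) is applied on the mass, it displaces the mass by $x$ from its equilibrium point. The linear dynamics of this system is given by:
$$\begin{bmatrix} \dot{x} \\ \ddot{x} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -\frac{k}{m} & -\frac{b}{m} \end{bmatrix} \begin{bmatrix} x \\ \dot{x} \end{bmatrix} + \begin{bmatrix} 0 \\ \frac{1}{m} \end{bmatrix} u. \tag{2}$$
Note that Eq. (2) has the form of the state equation in Eq. (1), with state vector $[x, \dot{x}]^T$ and with $A$ and $B$ given by the two matrices above.
Now we introduce attention as precision modulation, assuming that the robotic goal is to minimise the prediction error [Friston et al., 2011, Lanillos and Cheng, 2018b, Meera and Wisse, 2020], i.e., to refine its model of the environment and perform accurate state estimation, given the information available. In other words, the robot has to estimate $x$ and $u$ from an input prior $\eta^u$ with a prior precision of $P^u$, given the measurements $y$, parameters $A$, $B$, $C$ and noise precisions $\Pi^w$ and $\Pi^z$. Formally, the prediction error $\tilde{\epsilon}$, which combines the errors of the sensory measurements $\tilde{\epsilon}^y$, control input reference $\tilde{\epsilon}^u$ and state $\tilde{\epsilon}^x$, is:
$$\tilde{\epsilon} = \begin{bmatrix} \tilde{\epsilon}^y \\ \tilde{\epsilon}^u \\ \tilde{\epsilon}^x \end{bmatrix} = \begin{bmatrix} \tilde{y} - \tilde{C}\tilde{x} \\ \tilde{u} - \tilde{\eta}^u \\ D^x\tilde{x} - \tilde{A}\tilde{x} - \tilde{B}\tilde{u} \end{bmatrix} \quad \begin{matrix} \text{sensory prediction error} \\ \text{control input prediction error} \\ \text{state prediction error} \end{matrix} \tag{3}$$
Note that $\tilde{\epsilon}^y = \tilde{y} - \tilde{C}\tilde{x}$ is the difference between the observed measurement and the predicted sensory input given the state (footnote 4). Here $D^x$ performs the (block) derivative operation, which is equivalent to shifting up all the components in generalised coordinates by one block.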
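The mass-spring-damper example can be simulated with a short sketch that uses the constants and input force quoted above. Several choices here are assumptions that do not come from the text: the output matrix `C` measures position only, the integration uses a forward Euler step, the dynamics are noise-free, and the helper `weighted_sensory_error` is a scalar, non-generalised stand-in for a precision-weighted sensory prediction error.

```python
import numpy as np

m, k, b = 1.4, 0.8, 0.4                       # constants quoted in the text
A = np.array([[0.0, 1.0], [-k / m, -b / m]])  # state matrix for [position, velocity]
B = np.array([[0.0], [1.0 / m]])              # input matrix
C = np.array([[1.0, 0.0]])                    # assumption: only position is measured


def force(t):
    """Input force u(t) = exp(-0.25 (t - 12)^2) from the text."""
    return np.exp(-0.25 * (t - 12.0) ** 2)


def simulate(duration=32.0, dt=0.01, noise_std=0.0, seed=0):
    """Forward-Euler simulation; noise is added to the measurements only."""
    rng = np.random.default_rng(seed)
    steps = int(round(duration / dt))
    times = np.arange(steps) * dt
    outputs = np.zeros(steps)
    x = np.zeros((2, 1))
    for i, t in enumerate(times):
        outputs[i] = (C @ x).item() + noise_std * rng.standard_normal()
        x = x + dt * (A @ x + B * force(t))
    return times, outputs


def weighted_sensory_error(y_obs, x_est, precision_z):
    """Sensory prediction error y - C x, scaled by the measurement precision."""
    return precision_z * (y_obs - (C @ x_est).item())


times, y = simulate()
example_error = weighted_sensory_error(1.0, np.array([[0.4], [0.0]]), precision_z=2.0)
```

The last line shows how the same raw error is amplified or attenuated by the precision assigned to the measurement, which is the role the text attributes to attention.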
We can estimate the state and input using the Dynamic Expectation Maximisation (DEM) algorithm [Friston et al., 2008, Meera and Wisse, 2020], which optimises a variational free energy bound $F$ to be tractable (footnote 5); this bound is given in Eq. (4). Crucially, $\tilde{\Pi}$ is the generalised noise precision that modulates the contribution of each prediction error to the estimation of the state and the computation of the action. Thus, $\tilde{\Pi}$ is equivalent to attentional gain. For instance, we can model the precision matrix to attend to the most informative signal derivatives in $\tilde{y}$. Concisely, the precision $\tilde{\Pi}$ has the following form:

$$\tilde{\Pi} = \mathrm{diag}\left(S \otimes \Pi^z,\; P^{\tilde{u}},\; S \otimes \Pi^w\right), \tag{5}$$

where $S$ is the smoothness matrix. In Sec. 4.2.2, we show that modelling the precision matrix $\tilde{\Pi}$ using the $S$ matrix improves the estimation quality.

Footnote 4: The tilde over a variable denotes generalised coordinates, i.e., the variable includes all its temporal derivatives. Thus, $\tilde{\epsilon}$ is the combined prediction error of outputs, inputs and states. For example, the generalised output $\tilde{y}$ is given by $\tilde{y} = [y, y', y'', \ldots]^T$, where the prime operator denotes the derivatives. We use generalised coordinates to achieve accurate state and input estimation in the presence of (coloured) noise by modelling the time-dependent quantities ($x$, $v$, $y$, $w$, $z$) in generalised coordinates. This involves keeping track of the evolution of the trajectory of the probability distributions of states, instead of just their point estimates. Here, the coloured noises $w$ and $z$ are modelled as white noise convolved with a Gaussian kernel. The use of generalised coordinates has recently been shown to outperform classical approaches under coloured noise in real quadrotor flight [Bos et al., 2021].

Footnote 5: Note that this expression of the variational free energy uses the Laplace and mean-field approximations commonly used in the FEP literature.
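To illustrate how the precision acts as a gain on prediction errors, the following is a generic sketch of a Laplace-approximated free energy over the generalised prediction error. It is written under the assumption of Gaussian noise and is not necessarily the exact expression used by the authors; constants and terms that do not depend on the states are omitted.

```latex
% Generic Gaussian (Laplace) form, assumed for illustration only
F(\tilde{x}, \tilde{u}) \;\approx\; -\tfrac{1}{2}\,\tilde{\epsilon}^{T}\,\tilde{\Pi}\,\tilde{\epsilon}
  \;+\; \tfrac{1}{2}\ln\lvert\tilde{\Pi}\rvert

% Gradient with respect to the generalised state: every prediction error
% enters the update scaled by its precision
\frac{\partial F}{\partial \tilde{x}}
  \;=\; -\left(\frac{\partial \tilde{\epsilon}}{\partial \tilde{x}}\right)^{T}\tilde{\Pi}\,\tilde{\epsilon}
```

In this form, a block of $\tilde{\Pi}$ with small entries (for example, the block weighting a high-order derivative) contributes little to the gradient, which is the sense in which precision plays the role of attentional gain.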
The full free energy functional (the time integral of the free energy, $\bar{F} = \int F\,dt$, at optimal precision) that the robot optimises to perform state estimation and system identification is described in Eq. (6); for readability, we omit the details of the derivation of this cost function, and we refer to  for further details.
Here, $\epsilon^\theta = \theta - \eta^\theta$ and $\epsilon^\lambda = \lambda - \eta^\lambda$ are the prediction errors of the parameters and hyper-parameters (footnote 6). $\bar{F}$ consists of two main components: i) precision-weighted prediction errors and ii) precision-based entropy. The dominant role of precision in the free energy objective is reflected in how modulating these precision parameters can have a profound influence on perception and behaviour. The theoretical guarantees for stable estimation [Meera and Wisse, 2021b], and its application on real robots, make this formulation very appealing for robotic systems.
Note that we can manipulate three kinds of precision within the state space formulation: i) prior precision ($P^{\tilde{u}}$, $P^\theta$, $P^\lambda$), ii) conditional precision on estimates ($\Pi^X$, $\Pi^\theta$, $\Pi^\lambda$) and iii) noise precision ($\Pi^z$, $\Pi^w$). Therefore, to learn the correct parameter values $\theta$, we i) learn the parameter precision $\Pi^\theta$, ii) model the prior parameter precision $P^\theta$, and iii) learn the noise precisions $\Pi^w$ and $\Pi^z$ (parameterised using $\lambda$).

State and input estimation
State estimation is the process of estimating the unobserved states of a real system from (noisy) measurements. Here, we show how we can achieve accurate estimation through precision modulation in a linear time-invariant system under the influence of coloured noise [Meera and Wisse, 2020]. State estimation in the presence of coloured noise is inherently challenging, owing to the non-white nature of the noise, which is often ignored in conventional approaches such as the Kalman Filter [Welch et al., 1995]. Figure 2 summarises a numerical example that shows how one can use precision modulation to focus on the less noisy (lower) derivatives of the measurements, relative to the imprecise higher derivatives. This enables the robot to use the most informative data for state and input estimation, while discarding imprecise input. Figure 2b depicts the mass-spring-damper system used. The numerical results show that the quality of the estimation increases as the embedding order increases, while the lack of information in the higher order derivatives of the sensory input does not affect the final performance, owing to the precision modulation. The higher order derivatives (Fig. 2a) are less precise than the lower derivatives, reflecting the loss of information in higher derivatives. The state and input estimation was performed using the optimisation framework described in the previous section. The quality of estimation is shown in Fig. 2c, where the input estimate using six derivatives (blue curve) is closer to the real input (yellow curve) than the estimate using only one derivative (red curve). The quality of the estimation is reported as the sum of squared errors (SSE) in the estimation of states and inputs with respect to the embedding order (number of signal derivatives considered).
Figure 2: An illustration of an attention mechanism in state and input estimation. The quality of the estimation improves as the embedding order (number of derivatives) of generalised coordinates is increased. However, the imprecise information in the higher order derivatives of the sensory input $y$ does not affect the final performance of the observer because of attentional selection, which selectively weighs the importance afforded to each derivative in the free energy optimisation scheme.

Footnote 6: System identification involves the estimation of system parameters (denoted by $\theta$, e.g., vectorised $A$), given $y$ and $u$, starting from a parameter prior $\eta^\theta$ with prior precision $P^\theta$, and a prior on the noise hyper-parameter $\eta^\lambda$ with prior precision $P^\lambda$. Note that we parametrise the noise precisions ($\Pi^w$ and $\Pi^z$) using $\lambda = [\lambda^z, \lambda^w]^T \in \mathbb{R}^{2\times 1}$ through an exponential relation (e.g., $\Pi^w(\lambda^w) = \exp(\lambda^w) I_{n\times n}$).

To obtain accurate state estimation by optimising the precision parameters, we recall that the precision weights the prediction errors. From Eq. (5), the structural form of $\tilde{\Pi}$ is mainly dictated by the smoothness matrix $S$, which establishes the interdependence between the components of a variable expressed in generalised coordinates (e.g., the dependence between $y$, $y'$ and $y''$ in $\tilde{y}$). For instance, the $S$ matrix for a Gaussian kernel is as follows:
where $s$ is the kernel width of the Gaussian filter that is assumed to be responsible for serial correlations in measurement or state noise. Here, the order of generalised coordinates (number of derivatives under consideration) is taken as six ($S \in \mathbb{R}^{7\times 7}$). For practical robotics applications, the measurement frequency is high, resulting in $0 < s < 1$. It can be observed that the diagonal elements of $S$ decrease because $s < 1$, resulting in a higher attention (or weighting) on the prediction errors from the lower derivatives when compared to the higher derivatives. The higher the noise colour (i.e., as $s$ increases), the higher the weight given to the higher state derivatives (the last diagonal elements of $S$ increase). This reflects the fact that smooth fluctuations have more information content in their higher derivatives. Having established the potential importance of precision weighting in state estimation, we now turn to the estimation (i.e., learning) of precision in any given context.
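The behaviour of the diagonal of $S$ described above can be checked numerically. The sketch below assumes that the noise autocorrelation is Gaussian, $\rho(t) = \exp(-t^2/(4s^2))$, and builds the covariance between derivative orders $i$ and $j$ as $(-1)^i \rho^{(i+j)}(0)$ before inverting it; this is a common construction for generalised noise, assumed here to correspond to the matrix in the text. The function names and the plain Gauss-Jordan inverse are illustrative choices.

```python
def invert(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(M)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def smoothness_matrix(n, s):
    """Precision S over n generalised orders for kernel width s (assumed Gaussian)."""
    # Even derivatives of rho(t) = exp(-t^2 / (4 s^2)) at t = 0; odd ones vanish.
    r = [0.0] * (2 * n)
    coeff = 1.0
    for k in range(n):
        coeff *= (1 - 2 * k)  # cumulative product 1, -1, 3, -15, ...
        r[2 * k] = coeff / (2.0 * s * s) ** k
    # Covariance between derivative orders i and j: (-1)^i * rho^(i+j)(0).
    V = [[(-1) ** i * r[i + j] for j in range(n)] for i in range(n)]
    return invert(V)

S = smoothness_matrix(3, 0.5)
diagonal = [S[i][i] for i in range(3)]
```

For three generalised orders, the diagonal works out to $(3/2,\; 2s^2,\; 2s^4)$, so for $s < 1$ the weight on successive derivatives shrinks, and it grows again as $s$ increases.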

System identification
This section shows how to optimise system identification by means of precision learning [Anil Meera and Wisse, 2021b]. Specifically, we show how to fuse prior knowledge about the dynamic model with the data to recover unknown parameters of the system through an attention mechanism. This involves learning 1) the parameters and 2) the noise precisions. Our model 'turns' the attention to the least precise parameters and uses the data to update those parameters to increase their precision. This allows faster parameter learning.
For the sake of clarity, we again use the mass-spring-damper system as the driving example (Sec. 4.2.1). We formalise system identification as evaluating the unknown parameters $k$, $m$ and $b$, given the input $u$, the output $y$, and the general form of the linear system in Eq. (2). Figure 3 depicts the process of learning unknown parameters (dotted boxes denote the processes inside the robot brain). The robot measures its position $x(t)$ using its sensors (e.g., vision or range sensor). We assume that the robot has observed the behaviour of a mass-spring-damper system before, or that a model is provided by the expert designer. However, some of the parameters are unknown. The robot can reuse the previously learned model of the system to relearn the new system. This can be realised by setting a high prior precision on the known parameters and a low prior precision on the unknown parameters. By means of precision learning, the robot uses the sensory signals to learn the parameter precision $\Pi^\theta$, thereby improving the confidence in the parameter estimates $\theta$. This directs the robot's attention towards the refinement of the parameters with the least precision, as they are the most uncertain.

Figure 3: The schematic of the robot's attention mechanism for learning the least precise parameters of a given generative model of a mass-spring-damper system.

The requisite parameter learning proceeds by gradient ascent on the free energy functional given in Eq. (6). The parameter precision learning proceeds by tracking the negative curvature of $\bar{F}$ as $\Pi^\theta = -\partial^2 \bar{F} / \partial\theta^2$ [Anil Meera and Wisse, 2021]. The learning process, by means of variational free energy optimisation (maximisation), is shown in Fig. 3b. The learning involves two parallel processes: precision learning (Fig. 3a) and parameter learning (Fig. 3c). Precision learning comprises parameter precision learning (top graph), i.e., identifying the precision of an approximate posterior density for the parameters being estimated, and noise precision learning (bottom graph). The high prior precision on the known system parameters (0 and 1) and the low prior precision on the unknown system parameters ($-\frac{k}{m}$, $-\frac{b}{m}$ and $\frac{1}{m}$, highlighted in blue) direct attention towards learning the unknown parameters and their precision. Note that in Fig. 3a, the precisions of the three unknown parameters start from a low prior precision of $P^\theta = 1$ and increase with each iteration, whereas the precision of the known parameters (0 and 1) remains constant ($3.3 \times 10^6$). The noise precisions are learned simultaneously, starting from a low prior precision of $P^{\lambda^w} = P^{\lambda^z} = 1$ and finally converging to the true noise precision (dotted black line). Both precisions are used to learn the three parameters of the system (Fig. 3b), which start from randomly selected values within the range [-2, 2] and finally converge to the true parameter values of the system ($\theta_3 = -\frac{k}{m} = -0.5714$, $\theta_4 = -\frac{b}{m} = -0.2857$ and $\theta_6 = \frac{1}{m} = 0.7143$), denoted by black dotted lines. From an attentional perspective, the lower plot in Fig. 3a is particularly significant here. This is because the robot discovers that the data are more informative than initially assumed, thereby leading to an increase in its estimate of the precision of the data-generating process.
This means that the robot is not only using the data to optimise its beliefs about states and parameters (system identification), it is also using these data to optimise the way in which it assimilates these data.
In summary, precision-based attention, in the form of precision learning, helps the robot to accurately learn unknown parameters by fusing prior knowledge with new incoming data (sensory measurements), and attending to the least precise parameters.
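The idea that parameter precision tracks the negative curvature of the objective and grows as data are assimilated can be illustrated with a deliberately simplified scalar stand-in. The sketch below is not the DEM scheme: it assumes a single unknown gain $\theta$ in $y = \theta u + z$, a known noise precision, and a quadratic (Gaussian) objective, for which the negative curvature with respect to $\theta$ increases by $\Pi^z u^2$ with each sample. All names, the sinusoidal input and the default values are hypothetical.

```python
import math
import random

def learn_parameter(theta_true=-0.5714, noise_std=0.1, prior_mean=0.0,
                    prior_precision=1.0, n_samples=200, seed=0):
    """Recursive Gaussian update of one parameter and of its precision."""
    rng = random.Random(seed)
    noise_precision = 1.0 / noise_std ** 2  # assumed known in this sketch
    mean, precision = prior_mean, prior_precision
    history = [precision]
    for i in range(n_samples):
        u = math.sin(0.1 * i)  # hypothetical excitation signal
        y = theta_true * u + rng.gauss(0.0, noise_std)
        # The negative curvature of the quadratic objective grows by Pi_z * u^2.
        new_precision = precision + noise_precision * u * u
        mean = (precision * mean + noise_precision * u * y) / new_precision
        precision = new_precision
        history.append(precision)
    return mean, history

estimate, precision_history = learn_parameter()
```

Starting from a low prior precision of 1, the precision on the unknown parameter rises with every informative sample, while the estimate moves from the prior mean towards the true value; a parameter given a very high prior precision would barely move under the same update.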

Precision-modulated exploration and exploitation in system identification
Exploration and exploitation in the parameter space can be advantageous to robots during system identification. Precision-based attention (here, the prior precision) allows a graceful balance between the two (footnote 7). A very high prior precision encourages exploitation and biases the robot towards believing its priors, while a low prior precision encourages exploration and makes the robot sensitive to new information.

Figure 4: Precision-based attention allows model learning that balances exploration and exploitation, mediated by the prior precisions on the parameters $P^\theta$. The higher the $P^\theta$, the higher the attention on the prior parameters $\eta^\theta$ and the lower the attention on the sensory signals while learning.
We again use the mass-spring-damper system example, but with different prior parameter precisions $P^\theta$. The prior parameters are initialised at random and learned using optimisation. Figure 4b shows the increase in parameter estimation error (SSE) as the prior parameter precision $P^\theta$ increases, until it finally saturates. The bottom left region (circled in red) indicates the region where the prior precision is low, encouraging exploration with high attention on the sensory signals for learning the model. This regime over-exposes the robot to its sensory signals by neglecting the prior parameters. The top right region (circled in red) indicates the biased robot, where the prior precision is high, encouraging the robot to exploit its prior beliefs by retaining high attention on prior parameters. This regime biases the robot into being confident about its priors and disregarding new information from the sensory signals. Between those extreme regimes (blue curve), the prior precision balances the exploration-exploitation trade-off. Figure 4a shows how increased attention to sensory signals helped the robot to recover from poor initial estimates of parameter values and converge towards the correct values (dotted black line). Conversely, in Fig. 4c, high attention on prior parameters did not help the robot to learn the correct parameter values.
These results establish that prior precision modelling allows balanced exploration and exploitation of parameter space during system identification. Although the results show that an over-exposed robot provides better parameter learning, we show in the next section that this is not always the case.
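The qualitative shape of the error curve, rising with the prior precision and then saturating, can be reproduced with a scalar toy model. The sketch below assumes a single unknown gain, noise-free data, a fixed noise precision and a deliberately wrong prior mean; these choices and all names are hypothetical and only illustrate the mechanism, not the reported experiment.

```python
import math

def estimation_error(prior_precision, theta_true=-0.5714, prior_mean=2.0,
                     noise_precision=100.0, n_samples=50):
    """Squared error of the posterior mean of one parameter (noise-free data)."""
    su2 = suy = 0.0
    for i in range(n_samples):
        u = math.sin(0.1 * i)  # hypothetical excitation signal
        y = theta_true * u
        su2 += u * u
        suy += u * y
    # Precision-weighted fusion of the prior mean and the data.
    mean = (prior_precision * prior_mean + noise_precision * suy) / (
        prior_precision + noise_precision * su2)
    return (mean - theta_true) ** 2

# Sweep the prior precision over ten orders of magnitude.
errors = [estimation_error(10.0 ** e) for e in range(-2, 9)]
```

With a low prior precision the estimate is dominated by the data and the error is negligible; as the prior precision grows, the estimate is pulled towards the (wrong) prior and the error saturates at the squared distance between the prior mean and the true value.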

Noise estimation
In real-world applications, sensory measurements are often highly noisy and unpredictable. Furthermore, the robot does not have access to the noise levels. Thus, it needs to learn the noise precision ($\Pi^z$) for accurate estimation and robust control. Precision-based attention enables this learning. In what follows, we show how one can estimate $\Pi^z$ using noise precision learning, and that biasing the robot towards prior beliefs can be advantageous in highly noisy environments.
Consider again the mass-spring-damper system in Figure 5b, where heavy rainfall or snow corrupts the visual sensory signals. We evaluate the parameter estimation error under different noise conditions, using different levels of noise variance (inverse precision). For an over-exposed robot (only attending to sensory measurements; left plot of Fig. 5a), the estimation error increases as the noise strength increases, to a point where the error surpasses that of a prior-biased robot. This shows that, in a highly noisy environment, a robot that is confident in its prior model and assigns low attention to sensory signals outperforms an over-exposed robot that assigns high attention to sensory signals. The right plot of Fig. 5a shows the quality of noise precision learning for an over-exposed robot. It can be seen that all the data points in red lie close to the blue line, indicating that the estimated noise precision is close to the real noise precision. Therefore, the robot is capable of recovering the correct sensory noise levels even when the environment is extremely noisy and accurate parameter estimation is difficult.
These numerical results show that the attention mechanism, by means of noise precision learning, allows the estimation of the noise levels in the environment and thereby protects against over-fitting or overconfident parameter estimation.

Figure 5: Simulations demonstrating how a biased robot could be advantageous, especially while learning in a highly noisy environment. As the sensor noise increases, the quality of parameter estimation deteriorates to a point where an explorative robot generates higher parameter estimation errors than the biased robot that relies on its prior parameters. However, the sensor noise estimation is accurate even for high noise environments, demonstrating the success of the attention mechanism using noise precision learning.
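Noise precision learning with an exponential parameterisation, $\Pi^z = \exp(\lambda^z)$, can be sketched for the simplest case of scalar, white residuals. The example assumes the Gaussian objective $-\frac{1}{2}\exp(\lambda)\sum r^2 + \frac{N}{2}\lambda$ and ascends its gradient with a fixed rate; the learning rate, iteration count, starting value and function name are hypothetical, and coloured noise and prior terms are ignored.

```python
import math
import random

def learn_noise_precision(residuals, lam0=0.0, rate=0.5, n_iter=300):
    """Gradient ascent on the log-precision lam, with precision = exp(lam)."""
    n = len(residuals)
    ss = sum(r * r for r in residuals)
    lam = lam0  # lam0 = 0 corresponds to an initial precision of 1
    for _ in range(n_iter):
        # Gradient of -0.5*exp(lam)*ss + 0.5*n*lam w.r.t. lam, divided by n/2.
        lam += rate * (1.0 - math.exp(lam) * ss / n)
    return math.exp(lam)

# Synthetic residuals with standard deviation 0.1, i.e., a true precision of 100.
rng = random.Random(1)
residuals = [rng.gauss(0.0, 0.1) for _ in range(2000)]
estimated = learn_noise_precision(residuals)
```

The fixed point of the update is the inverse of the mean squared residual, so the estimate climbs from an initial precision of 1 towards the true value regardless of whether the environment is mildly or extremely noisy.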
Summary. We have shown how precision-based attention, through precision modelling and learning, yields accurate robot state estimation, parameter identification and sensory noise estimation. In the next section, we discuss how action is generated in this framework.

Precision-modulated action
Selecting the optimal sequence of actions to fulfil a task is essential for robotics [LaValle, 2006]. One of the most prominent challenges is to ensure robust behaviour given the uncertainty emerging from the highly complex and dynamic real world in which robots have to operate. A proper attention system should provide action plans that resolve uncertainty and maximise information gain. For instance, it may minimise the information entropy, thereby encouraging repeated sensory measurements (observations) on high uncertainty sensory information.
Salience, which in neuroscience is sometimes identified as Bayesian surprise (i.e., divergence between prior and posterior), describes which information is relevant to process. We go one step further by defining the saliency map as the epistemic value of a particular action [Friston et al., 2015b]. Thus, the (expected) divergence now becomes the mutual information under a particular action or plan. This makes the saliency map more sophisticated because it is an explicit measure of the reduction in uncertainty or mutual information associated with a particular action (i.e., active sampling), and more pragmatic because it tells you where to sample data next, given current Bayesian beliefs.
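Reading salience as the epistemic value of an action can be made concrete for a binary occupancy belief. The sketch below computes the expected information gain (the mutual information between occupancy and the detector outcome) for sampling a given location; the detector hit rate and false alarm rate, the candidate locations and their beliefs are hypothetical values chosen for illustration.

```python
import math

def entropy(p):
    """Entropy (nats) of a Bernoulli belief with probability p."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)

def epistemic_value(p_occupied, hit_rate=0.9, false_alarm=0.1):
    """Expected information gain about occupancy from one detection."""
    p_detect = hit_rate * p_occupied + false_alarm * (1.0 - p_occupied)
    expected_posterior_entropy = 0.0
    # Loop over the two outcomes: detection and no detection.
    for p_obs, lik in ((p_detect, hit_rate), (1.0 - p_detect, 1.0 - hit_rate)):
        if p_obs > 0.0:
            posterior = lik * p_occupied / p_obs  # Bayes rule
            expected_posterior_entropy += p_obs * entropy(posterior)
    return entropy(p_occupied) - expected_posterior_entropy

beliefs = {"A": 0.5, "B": 0.9, "C": 0.02}  # hypothetical occupancy beliefs
saliency = {loc: epistemic_value(p) for loc, p in beliefs.items()}
next_sample = max(saliency, key=saliency.get)
```

Under these assumptions the most uncertain location is the most salient one, so the resulting map says where to sample next rather than which features are currently conspicuous.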
We first describe a precision representation usually used in information gathering problems, and then how to directly generate action plans through precision optimisation. Afterwards, we discuss the realisation of the full-fledged model presented in the neuroscience section for active perception. We use the informative path planning (IPP) problem, described in Fig. 6, as an illustrative example to drive intuitions.

Figure 6: The UAV, in a realistic simulation environment, plans a finite look-ahead path to minimise the uncertainty of its human occupancy map (e.g., modelled as a Gaussian process) of the world. The planned path is then executed, during which the UAV flies and captures images at a constant measurement frequency. After the data acquisition is complete, a human detection algorithm is executed to detect all the humans in the images. These detections are then fused into the UAV's human location map. The cycle is repeated until the uncertainty of the map is completely resolved (this usually implies enough area coverage and repeated measurements on uncertain locations). The ground truth of the human occupancy map and the UAV belief are shown in (c) and (b), respectively. The final map approaches the ground truth and all seven humans on the ground are correctly detected.

Precision maps as saliency
One of the popular approaches in information gathering problems is to model the information map as a distribution (e.g., using Gaussian processes). This is widely used in applications such as target search, coverage and navigation. The robot keeps track of an occupancy map and the associated uncertainty map (covariance matrix or inverse precision). While the occupancy map records the presence of the target on the map, the uncertainty map records the quality of those observations. The goal of the robot is to learn the distribution using some learning algorithm [Marchant and Ramos, 2014]. A popular strategy is to plan the robot path such that it minimises the future uncertainty of the map [Popović et al., 2017]. In Sec. 4.3.2, we will show how we can use the map precision to perform active perception, i.e., optimise the robot path for maximal information gain. Optimising the map precision drives the robot towards an exploratory behaviour.

Precision optimisation for action planning
To introduce precision-based saliency, we use an exemplary application of search and rescue. The goal is to find all humans using an unmanned aerial vehicle (UAV) [Lanillos, 2013, Lanillos et al., 2014, Rasouli et al., 2020, Meera et al., 2019]. We use precision for two purposes: i) precision optimisation for action planning (planning the flight path) and ii) precision learning for map refinement. In contrast to previous models of action selection within active inference in robotics [Oliver et al., 2021], here precision explicitly drives the agent behaviour. Figure 7 describes the scenario in simulation. The seven human targets on the ground are correctly identified by the UAV. We can formalise the solution as the UAV actions (next flight path) that minimise the future uncertainties of the human occupancy map. In our precision-based attention scheme, this objective is equivalent to maximising the posterior precision of the map. Figure 8 shows the reduction in map uncertainty after subsequent assimilation of the measurements (camera images from the UAV, processed by a human detector). The map (and its precision) is learned using a recursive Kalman Filter by fusing the human detector outcome onto the map (and its precision). The algorithm drives the UAV towards the least explored regions in the environment, defined by the precision map.
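The loop of planning on the precision map and fusing detections with a recursive Kalman update can be sketched on a tiny one-dimensional map. This is a toy under strong assumptions, not the simulation described above: cells are independent, the detector is noise-free with a fixed measurement precision, and the planner greedily visits the cell with the lowest precision (the choice that most reduces the summed posterior variance). All names and values are hypothetical.

```python
def kalman_update(mean, precision, z, meas_precision):
    """Scalar Kalman (Bayesian) fusion written in precision form."""
    post_precision = precision + meas_precision
    post_mean = (precision * mean + meas_precision * z) / post_precision
    return post_mean, post_precision

def plan_and_sense(truth, n_steps, meas_precision=4.0, prior_precision=1.0):
    """Greedy planner: always visit the cell whose map precision is lowest."""
    means = [0.5] * len(truth)
    precisions = [prior_precision] * len(truth)
    visits = []
    for _ in range(n_steps):
        cell = min(range(len(truth)), key=lambda c: precisions[c])
        z = truth[cell]  # idealised, noise-free detector output (assumption)
        means[cell], precisions[cell] = kalman_update(
            means[cell], precisions[cell], z, meas_precision)
        visits.append(cell)
    return means, precisions, visits

truth = [0.0, 1.0, 0.0, 0.0, 1.0]  # hypothetical occupancy ground truth
means, precisions, visits = plan_and_sense(truth, n_steps=10)
```

Because each visit raises the precision of the visited cell, the greedy rule covers unexplored cells first and only then returns for repeated measurements, which mirrors the coverage-then-refinement behaviour described in the text.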
Furthermore, Fig. 9 shows an example of uncertainty resolution under false positives. In this case, human targets are moved to the bottom half of the map. The first measurement provides a wrong human detection with high uncertainty. However, after repeated measurements at the same location in the map, the algorithm was capable of resolving this ambiguity and finally learned the correct ground truth map. Hence, the sought behaviour is to take actions that encourage repeated measurements at uncertain locations to reduce uncertainty.

Figure 7: The simulation environment on the left consists of a tall building at the centre, surrounded by seven humans lying on the floor. The goal of the UAV is to compute the action sequence that allows maximum information gathering, i.e., that minimises the uncertainty about the humans' locations. On the right is the final occupancy map, coloured with the probability of finding a human at that location. It can be observed that all humans in the simulation environment were correctly detected by the robot.
Although the IPP example illustrates how to generate control actions through precision optimisation, the task, by construction, is constrained to explicitly reduce uncertainty. This is similar to the description of visual search in [Friston et al., 2012], where the location was chosen to maximise information gain. Information gain (i.e., the Bayesian surprise expected following an action) is a key part of the expected free energy functional that underwrites action selection in active inference. In brief, expected free energy can be decomposed into two parts: the first corresponds to the information gain above (a.k.a. epistemic value or affordance), and the second corresponds to the expected log evidence or marginal likelihood of sensory samples (a.k.a. pragmatic value). When this likelihood is read as a prior preference, it contextualises the imperative to reduce uncertainty by including a goal-directed imperative. For example, in the search paradigm above, we could have formulated the problem in terms of reducing uncertainty about whether each location was occupied by a human or not. We could have then equipped the agent with prior preferences for observing humans.
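The two-part decomposition described above is commonly written as follows. This is the standard form from the active inference literature, given here as a sketch; the symbols ($a$ for an action or plan, $s$ for latent states, $o$ for outcomes, $q$ for approximate posterior beliefs and $p(o)$ for prior preferences over outcomes) are notational assumptions that do not appear in the text.

```latex
% Expected free energy of an action a (to be minimised)
G(a) \;=\;
  -\,\underbrace{\mathbb{E}_{q(o \mid a)}\Big[ D_{KL}\big[\, q(s \mid o, a) \,\|\, q(s \mid a) \,\big] \Big]}_{\text{epistemic value (expected information gain)}}
  \;-\;
  \underbrace{\mathbb{E}_{q(o \mid a)}\big[ \ln p(o) \big]}_{\text{pragmatic value (expected log preference)}}
```

With flat preferences the second term is constant across actions and action selection reduces to maximising information gain, as in the IPP example; non-trivial preferences (e.g., for observing humans) add the goal-directed imperative.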
In principle, this would have produced searching behaviour until uncertainty about the scene had been resolved, after which the robot would seek out humans, simply because these are its preferred outcomes. In thinking about how this kind of neuroscience-inspired or biomimetic approach could be implemented in robotics, one has to consider carefully the precision afforded to sensory inputs (i.e., the likelihood of sensory data, given its latent causes), and how this changes during robotic flight and periods of data gathering. This brings us back to precision modulation and the temporal scheduling of searching and securing data. In the final section, we conclude with a brief discussion of how this might be implemented in future applications.

Precision-based active perception
Figure 10: Precision-modulated attention model adapted to the action-perception loop in robotics. Each cycle consists of two steps: 1) action (planning and execution of a finite-time look-ahead of the robot path for data collection) and 2) perception (learning using the collected data). This scheduling, using a finite-time look-ahead plan, is quite common in real applications and of particular importance when processing is computationally expensive, e.g., slow classification rates, non-scalable data fusion algorithms, exponential planners, etc. However, the benefits of incorporating an 'optimal' scheduled loop driven by precision should be further studied.
In this section, we discuss the realisation of a biomimetic brain-inspired model in relation to existing solutions in robotics in the context of path planning. Figure 10 compares our proposed precision-modulated attention model (from Fig. 1) with the action-perception loop widely used in robotics. By analogy with eye saccades to the next visual sample, the UAV flies (action) over the environment to assimilate sensory data for an informed scene construction (perception). Once the flight time of the UAV is exhausted (similar to the saccade window of the eye), the action is complete, after which the map is updated and the next flight path is planned.
In standard applications of active inference, the information gain is supplemented with expected log preferences to provide a complete expected free energy functional [Sajid et al., 2021a]. This accommodates the two kinds of uncertainty that actions and choices typically reduce. The first kind of uncertainty is inherent in unknowns in the environment. This is the information gain we have focused on above. The second kind of uncertainty corresponds to expected surprise, where surprise rests upon a priori expected or preferred outcomes. As noted above, equipping robots with both epistemic and pragmatic aspects to their action selection or planning could produce realistic and useful behaviour that automatically resolves the exploration-exploitation dilemma. This follows because the expected free energy contains the optimal mixture of epistemic (information-seeking) and pragmatic (preference-seeking) components. Usually, after a period of exploration, the preference-seeking components predominate because uncertainty has been resolved. Although expected free energy provides a fairly universal objective function for sentient behaviour, it does not specify how to deploy behaviour and sensory processing optimally. This brings us to the precision modulation model, inspired by neuroscientific considerations of attention and salience.
Hence, there is a key difference between biological and robotic implementations of the search behaviour, which is the use of continuous oscillatory precision to modulate visual sampling and movement cycles, as opposed to arbitrary discrete action and perception steps currently used in robotics. Importantly, our salience formulation speaks to selecting future data that reduces this uncertainty. For instance, we have shown-in the information gathering IPP example described in the previous subsection-that by optimising precision we also optimise behaviour.
We argue for the potential need for, and the advantages of, realising precision-based temporal scheduling, as described in our brain-inspired model, in two practically relevant test cases: (i) learning dynamic models and (ii) information-seeking applications.
In Section 4.2.4, we have shown how the exploration-exploitation trade-off can be mediated by the prior parameter precision during learning. However, the accuracy-precision curve (Fig. 4b) is often unavailable in practice because the true parameter values are unknown, which makes modelling the prior precision challenging. An alternative would be to use a precision-based temporal scheduling mechanism to alternate between exploration and exploitation by means of a varying $P^\theta$ (similar to Fig. 10) during learning, such that system identification is neither biased nor over-exposed to sensory measurements. In Fig. 5a, we showed how noise levels influence estimation accuracy, and how biasing the robot by modelling $P^\theta$ can be beneficial in highly noisy environments. A precision-based temporal scheduling mechanism by means of a varying $P^\theta$ could provide a balanced solution between a biased robot (that exploits its model) and an exploratory one.
Furthermore, temporal scheduling, in the same way that eye saccades are generated, can be adapted for information gathering applications, such as target search, simultaneous localization and mapping, environment monitoring, etc. For instance, one could introduce precision-modulated scheduling for solving the IPP, scheduling perception (map learning) and action (UAV flight). Precision modulation will switch between action and perception: when the precision is high, perception occurs (cf. visual sampling), and when the precision is low, action occurs (cf. eye movements). This switch, which is often implemented in the robotics literature using a budget for flight time, will now be dictated by precision dynamics.
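The switching rule described above (perception when precision is high, action when it is low) can be sketched with an oscillating precision signal. The sinusoidal form, the 5 Hz frequency (chosen to lie in the theta range), the threshold and the function names are hypothetical; the text does not specify how the precision dynamics would be generated in a robot.

```python
import math

def precision_phase(t, freq_hz=5.0, baseline=1.0, amplitude=0.8):
    """Hypothetical oscillating sensory precision (theta-like rhythm)."""
    return baseline + amplitude * math.sin(2.0 * math.pi * freq_hz * t)

def schedule(t, threshold=1.0):
    """High precision selects perception (sampling); low precision selects action."""
    return "perception" if precision_phase(t) >= threshold else "action"

# One 5 Hz cycle sampled every 10 ms.
timeline = [schedule(k * 0.01) for k in range(20)]
```

In this sketch the flight-time budget of a conventional IPP loop is replaced by the phase of the precision signal: map learning is gated to the high-precision half-cycle and UAV motion to the low-precision half-cycle.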
In short, we have sketched the basis for a future realisation of precision-based active perception, where the robot computes the actions to minimise the expected uncertainty. While most attentional mechanisms in robotics are limited to providing a 'saliency' map highlighting the most relevant features, our attention mechanism proposes a general scheduling mechanism with action in the loop with perception, both driven by precision.

Concluding remarks
We have considered attention and salience as two distinct processes that rest upon oscillatory precision control processes. Accordingly, they require particular temporal considerations: attention to reliably estimate latent states from current sensory data, and salience for uncertainty reduction regarding future data samples. This formulation addresses visual search from a first principles (Bayesian) account of how these mechanisms might manifest, and of the circular causality that undergirds them via a rhythmic theta-coupling. Crucially, we have revisited the definition of salience from the visual neurosciences, where it is read as Bayesian surprise (i.e., the Kullback-Leibler divergence between prior and posterior beliefs). We took this one step further and defined salience as the expected Bayesian surprise (i.e., epistemic value) of a particular action (e.g., sampling this set of data) [Friston et al., 2017b, Sajid et al., 2021a]. Formulating salience as the expected divergence renders it the mutual information under a particular action (or action trajectory). For brevity, our narrative was centred around visual attention and its realisation via eye movements. However, this model does not strictly need to be limited to visual information processing, because it addresses sensorimotor and auditory processing in general. This means it explains how action and perception can be coupled in other sensory modalities. For instance, [Tomassini et al., 2017] showed that visual information is coupled with finger movements at a neural theta rhythm.
The point of contact with the robotics use of salience emerges because the co-variation between a particular parameterisation and the inputs is a measure of the mutual information between the data and its estimated causes. In this sense, both definitions of salience reflect the mutual information (or information about a particular representation of a latent cause) afforded by an observation or consequence. However, our formulation is more sophisticated because it is an explicit measure of the reduction in uncertainty (i.e., mutual information) associated with a particular action (i.e., active sampling), and it specifies where to sample data next, given current Bayesian beliefs. These processes (attention and salience) are a consequence of the precision of beliefs over distinct model parameters. Explicitly, attention contends with precision over the causes of (current) outcomes, and salience contends with beliefs about the data that has to be acquired and precision over beliefs about the actions that dictate it. Since both processes can be linked via precision manipulation, the crucial thing is the precision that differentiates whether the agent acquires new information (under high precision) or resolves uncertainty by moving (low precision).
The focus of this work has been to illustrate the importance of optimising precision at various places in generative models used for data assimilation, system identification and active sensing. A key point, implicit in these demonstrations, rests upon the mean field approximation used in all applications. Crucially, this means that getting the precision right matters, because the updates of posterior estimates of states, parameters and precisions all depend upon each other. This may be particularly pertinent for making the most sense of samples that maximise information gain. In other words, although attention and salience are separable optimisation processes, they depend upon each other during active sensing. This was the focus of our final numerical studies of action planning.
To face-validate our formulation, we evaluated precision-modulated attentional processes in the robotic domain. We presented numerical examples to show how precision manipulation underwrites accurate state and noise estimation (e.g., selecting relevant information), as well as allowing system identification (e.g., learning unknown parameters of the dynamics). We also showed how one can use precision-based optimisation to solve interesting problems, such as informative path planning in search and rescue scenarios. Thus, in contrast to previous uses of attention in robotics, we placed attention and saliency as integral processes for efficient gathering and processing of sensory information. Accordingly, 'paying attention' is not only about filtering the current flow of information from the sensors but also about performing those actions that minimise expected uncertainty. Still, the full potential of our proposal has yet to be realised, as precision-based attention should be able to account for prior preferences beyond the IPP problem (e.g., localising people using UAVs). Finally, we briefly considered the realisation of temporal scheduling for information gathering tasks, opening up interesting lines of research to provide robots with biologically plausible attention.

Author Contributions
AAM and FN are responsible for the novel account and its translation to robotics. All authors contributed to conception and design of the work. AAM, FN, PL and NS wrote the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.