<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Hum. Neurosci.</journal-id>
<journal-title>Frontiers in Human Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Hum. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5161</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnhum.2014.00825</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research Article</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Uncertainty in perception and the Hierarchical Gaussian Filter</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Mathys</surname> <given-names>Christoph D.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/19655"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Lomakina</surname> <given-names>Ekaterina I.</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="aff" rid="aff5"><sup>5</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/159849"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Daunizeau</surname> <given-names>Jean</given-names></name>
<xref ref-type="aff" rid="aff6"><sup>6</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Iglesias</surname> <given-names>Sandra</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Brodersen</surname> <given-names>Kay H.</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Friston</surname> <given-names>Karl J.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/20407"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Stephan</surname> <given-names>Klaas E.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/5889"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London</institution> <country>London, UK</country></aff>
<aff id="aff2"><sup>2</sup><institution>Max Planck UCL Centre for Computational Psychiatry and Ageing Research</institution> <country>London, UK</country></aff>
<aff id="aff3"><sup>3</sup><institution>Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich and ETH Zurich</institution> <country>Zurich, Switzerland</country></aff>
<aff id="aff4"><sup>4</sup><institution>Laboratory for Social and Neural Systems Research (SNS Lab), Department of Economics, University of Zurich</institution> <country>Zurich, Switzerland</country></aff>
<aff id="aff5"><sup>5</sup><institution>Department of Computer Science, ETH Zurich</institution> <country>Zurich, Switzerland</country></aff>
<aff id="aff6"><sup>6</sup><institution>Institut du Cerveau et de la Moelle &#x000C9;pini&#x000E8;re, H&#x000F4;pital Piti&#x000E9; Salp&#x000EA;tri&#x000E8;re</institution> <country>Paris, France</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Hauke R. Heekeren, Freie Universit&#x000E4;t Berlin, Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Dirk Ostwald, Freie Universit&#x000E4;t Berlin, Germany; Mateus Joffily, Centre National de la Recherche Scientifique, France</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Christoph D. Mathys, Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, 12 Queen Square, London WC1N 3BG, UK e-mail: <email>c.mathys&#x00040;ucl.ac.uk</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to the journal Frontiers in Human Neuroscience.</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>19</day>
<month>11</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="collection">
<year>2014</year>
</pub-date>
<volume>8</volume>
<elocation-id>825</elocation-id>
<history>
<date date-type="received">
<day>22</day>
<month>02</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>27</day>
<month>09</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2014 Mathys, Lomakina, Daunizeau, Iglesias, Brodersen, Friston and Stephan.</copyright-statement>
<copyright-year>2014</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract><p>In its full sense, perception rests on an agent&#x00027;s model of how its sensory input comes about and the inferences it draws based on this model. These inferences are necessarily uncertain. Here, we illustrate how the Hierarchical Gaussian Filter (HGF) offers a principled and generic way to deal with the several forms that uncertainty in perception takes. The HGF is a recent derivation of one-step update equations from Bayesian principles that rests on a hierarchical generative model of the environment and its (in)stability. It is computationally highly efficient, allows for online estimates of hidden states, and has found numerous applications to experimental data from human subjects. In this paper, we generalize previous descriptions of the HGF and its account of perceptual uncertainty. First, we explicitly formulate the extension of the HGF&#x00027;s hierarchy to any number of levels; second, we discuss how various forms of uncertainty are accommodated by the minimization of variational free energy as encoded in the update equations; third, we combine the HGF with decision models and demonstrate the inversion of this combination; finally, we report a simulation study that compared four optimization methods for inverting the HGF/decision model combination at different noise levels. These four methods (Nelder&#x02013;Mead simplex algorithm, Gaussian process-based global optimization, variational Bayes and Markov chain Monte Carlo sampling) all performed well even under considerable noise, with variational Bayes offering the best combination of efficiency and informativeness of inference. Our results demonstrate that the HGF provides a principled, flexible, and efficient&#x02014;but at the same time intuitive&#x02014;framework for the resolution of perceptual uncertainty in behaving agents.</p></abstract>
<kwd-group>
<kwd>uncertainty</kwd>
<kwd>volatility</kwd>
<kwd>Bayesian inference</kwd>
<kwd>hierarchical modeling</kwd>
<kwd>filtering</kwd>
<kwd>free energy</kwd>
<kwd>learning</kwd>
<kwd>decision-making</kwd>
</kwd-group>
<counts>
<fig-count count="10"/>
<table-count count="0"/>
<equation-count count="67"/>
<ref-count count="44"/>
<page-count count="24"/>
<word-count count="13585"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="s1">
<title>Introduction</title>
<p>Perception has long been proposed to take place in the context of prediction (Helmholtz, <xref ref-type="bibr" rid="B21">1860</xref>). This entails that agents have a model of the environment which generates their sensory input. Probability theory formally prescribes how agents should learn about their environment from sensory information, given a model. This rests on sequential updating of beliefs according to Bayes&#x00027; theorem, where beliefs represent inferences about hidden states of the environment in the form of posterior probability distributions. It is this process that we refer to as perception. Beliefs about hidden states are inherently uncertain. This uncertainty has two sources. First, even when states are constant, the amount of sensory information will in general be too little to infer them exactly. This has been referred to as <italic>informational uncertainty</italic> or <italic>estimation uncertainty</italic> (Payzan-LeNestour and Bossaerts, <xref ref-type="bibr" rid="B34">2011</xref>). The second source of uncertainty is the possibility that states change with time, i.e., <italic>environmental uncertainty</italic>.</p>
<p>Various models have suggested how an agent may deal with an environment fraught with both kinds of uncertainty (e.g., Yu and Dayan, <xref ref-type="bibr" rid="B43">2003</xref>, <xref ref-type="bibr" rid="B44">2005</xref>; Nassar et al., <xref ref-type="bibr" rid="B30">2010</xref>; Payzan-LeNestour and Bossaerts, <xref ref-type="bibr" rid="B34">2011</xref>; Wilson et al., <xref ref-type="bibr" rid="B42">2013</xref>). Here, we discuss an alternative approach that derives closed form update equations for the hidden states, and crucially, for the uncertainty about them, by variational inversion of a generic hierarchical generative model that reflects the time-varying structure of the environment in its higher levels. This derivation has the advantage that the resulting updates optimize a clearly defined objective function, namely variational free energy. Since this quantity is an approximation to surprise (i.e., to the negative log-probability of sensory input), the updates are optimal in the sense that they minimize surprise, given an agent&#x00027;s individual model. Furthermore, the updates explicitly reflect informational and environmental uncertainty.</p>
<p>Our approach makes use of a framework which assumes that agents have an internal generative model of their sensory input. This model is generative in the sense that it describes how sensory inputs are generated by the external world. It does this by assigning a probability (the likelihood) to each sensory input given states (which vary with time) and parameters (which are constant in time) and by completing this with a prior probability distribution for states and parameters. While the purpose of the model is to predict input emanating from the external world, it is internal in the sense that it reflects the agent&#x00027;s <italic>beliefs</italic> about how sensory inputs are generated by the external world.</p>
<p>While Bayesian belief updating is optimal from the perspective of probability theory, it requires computing complicated integrals which are not tractable analytically and difficult to evaluate in real time. Although some attempts to design a Bayesian model of how biological agents learn in a changing environment were remarkably successful in explaining empirical behavior (Behrens et al., <xref ref-type="bibr" rid="B3">2007</xref>, <xref ref-type="bibr" rid="B2">2008</xref>), they were restricted by the computational burden imposed by these models and the assumption that the learning process was identical across subjects. Recently, however, theoretical advances have enabled computationally efficient approximations to exact Bayesian inference during learning (e.g., Friston, <xref ref-type="bibr" rid="B16">2009</xref>; Daunizeau et al., <xref ref-type="bibr" rid="B7">2010a</xref>,<xref ref-type="bibr" rid="B8">b</xref>) and have furnished a basis for biologically plausible mechanisms that might underlie belief updating in the brain. These approaches rest on variational Bayesian techniques which optimize a free-energy bound on the surprise about sensory inputs, given a model of the environment, and represent a special case of the general &#x0201C;Bayesian brain&#x0201D; hypothesis (Dayan et al., <xref ref-type="bibr" rid="B12">1995</xref>; Knill and Pouget, <xref ref-type="bibr" rid="B25">2004</xref>; K&#x000F6;rding and Wolpert, <xref ref-type="bibr" rid="B26">2006</xref>; Friston, <xref ref-type="bibr" rid="B16">2009</xref>; Doya et al., <xref ref-type="bibr" rid="B13">2011</xref>). This hypothesis has been highly influential in recent years, shaping concepts of brain function and inspiring the design of many specific computational models (see Friston and Dolan, <xref ref-type="bibr" rid="B17">2010</xref>, for review). 
However, for practical applications to empirical data, a general purpose modeling framework has been lacking that would allow for straightforward &#x0201C;off the shelf&#x0201D; implementations of models explaining trial-wise empirical data (e.g., behavioral responses, eye movements, evoked response amplitude in EEG etc.) from the Bayesian brain perspective. This is in contrast to reinforcement learning (RL) models which, due to their simplicity and computational efficiency, have found widespread application in experimental neuroscience, for example, in the analysis of functional magnetic resonance imaging (fMRI) and behavioral data (for reviews, see Daw and Doya, <xref ref-type="bibr" rid="B10">2006</xref>; O&#x00027;Doherty et al., <xref ref-type="bibr" rid="B32">2007</xref>).</p>
<p>To fill this gap and provide a generic, robust and flexible framework for analysis of trial-wise data from the Bayesian brain perspective, we recently introduced the Hierarchical Gaussian Filter (HGF), a hierarchical Bayesian model (Mathys et al., <xref ref-type="bibr" rid="B29">2011</xref>) in which states evolve as coupled Gaussian random walks, such that each state determines the step size of the evolution of the next lower state (for examples of applications, cf. Iglesias et al., <xref ref-type="bibr" rid="B22">2013</xref>; Joffily and Coricelli, <xref ref-type="bibr" rid="B23">2013</xref>; Vossel et al., <xref ref-type="bibr" rid="B41">2013</xref>). Based on a mean-field approximation to the full Bayesian solution, we derived analytic update equations whose form resembles RL updates, with dynamic learning rates and precision-weighted prediction errors. These highly efficient update equations made our approach well suited for filtering purposes, i.e., predicting the value of (and, crucially, the uncertainty about) a hidden and moving quantity based on all information acquired up to a certain point. Our original formulation (Mathys et al., <xref ref-type="bibr" rid="B29">2011</xref>) only contained three levels; here, we extend the HGF explicitly to any number of levels and show that the update equations maintain the same form across all levels because they are derived on the basis of the same coupling. Furthermore, the derivation of the variational energies involved in the inversion is given in much more detail than in Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>). Note that &#x0201C;perceptual uncertainty&#x0201D; has a broader meaning here than in Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>), where it was used more narrowly for that part of the informational uncertainty that relates to sensory input.</p>
<p>Furthermore, in this paper we describe how the HGF is applied in the &#x0201C;observing the observer&#x0201D; framework developed by Daunizeau et al. (<xref ref-type="bibr" rid="B7">2010a</xref>,<xref ref-type="bibr" rid="B8">b</xref>). This framework is based on a clear separation of two model components: first, the agent&#x00027;s perception of (inference about) its environment, i.e., the posterior estimates provided by the agent&#x00027;s model of how its sensory input is generated; second, the agent&#x00027;s observed actions (i.e., decisions or responses), which are (probabilistic) consequences of the agent&#x00027;s beliefs about its environment. We call the first model <italic>perceptual</italic>, while the second is the <italic>decision</italic> or <italic>response</italic> model.</p>
<p>The &#x0201C;observing the observer&#x0201D; framework is meta-Bayesian in that it enables Bayesian inference (by an observer or experimenter) on Bayesian inference (by a subject). It requires four elements: (1) a generative model of sensory inputs (i.e., a perceptual model), (2) a computationally efficient and robust method for model inversion, (3) a loss function for actions depending on the inferred state, and (4) a decision model. A specific suggestion for the first two elements is contained in Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>). In what follows, we extend and generalize this description, discussing specifically the nature of the coupling between levels, choice of coordinates at higher levels, and how to deal with sensory inputs that arrive at irregular time intervals.</p>
<p>In the following section of this paper, we set out our theoretical framework. We first define the HGF model formally. We then proceed with its variational inversion, which gives us closed-form one-step update equations. Next, we show how the HGF can serve as a perceptual model for any decision model that provides a mapping from the HGF&#x00027;s representations of the environment to the probability of an observed decision. We then derive an objective function whose optimization leads to maximum-a-posteriori (MAP) estimates for the parameters of the HGF and the decision model, followed by a short discussion of how the choice of decision model affects which HGF parameters can be estimated.</p>
<p>In the next section, we turn to examples and simulations. We first deal with categorical outcomes and sensory uncertainty. To complement this, we introduce a decision model for binary choices and use it to give an example of model inversion and comparison based on two different but closely related decision models. We do this by estimating model parameters from empirical data in a single subject, juxtaposing the two different response models (which do and do not take into account the uncertainty of beliefs) and the ensuing differences in inferred state trajectories. Next, we introduce a decision model for a one-armed bandit. As in the preceding examples of decision models, we base the decision rule on the agent&#x00027;s expected loss under a given loss function. In the last part of this section, we report the results of a simulation study which demonstrates the feasibility of accurate model inversion, i.e., inferring known HGF parameter values from observed decisions. This test of model inversion was performed under different levels of noise and using four different optimization methods (Markov chain Monte Carlo, the Nelder&#x02013;Mead simplex algorithm, Gaussian process-based global optimization, and variational Bayes).</p>
<p>Together, the theoretical derivations and simulation results provided in this paper generalize the framework of the HGF and demonstrate its utility for estimating individual approximations to Bayes-optimality from observed decision-making under uncertainty.</p>
</sec>
<sec>
<title>Theoretical framework</title>
<sec>
<title>The hierarchical Gaussian filter (HGF)</title>
<p>The goal of the model introduced in Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>) is simple and general: to describe how an agent learns about a continuous uncertain quantity (i.e., random variable) <italic>x</italic> that moves. One generic way of describing this motion is a Gaussian random walk:</p>
<graphic xlink:href="fnhum-08-00825-e0001.tif"/>
<p>where <italic>k</italic> is a time index, and <italic>x</italic><sup>(<italic>k</italic> &#x02212; 1)</sup> and &#x003D1; are the mean and variance (not standard deviation) of a Gaussian distribution, respectively. In this formulation, the volatility in <italic>x</italic> is governed by the positive constant &#x003D1; (in this paper, we define volatility as the variance of a time series per unit of time); however, there is in principle no reason to assume that volatility is constant. To allow for changes in volatility, we replace &#x003D1; by a positive function <italic>f</italic> of a second random variable, <italic>x</italic><sub>2</sub>, while <italic>x</italic> becomes <italic>x</italic><sub>1</sub>:</p>
<graphic xlink:href="fnhum-08-00825-e0002.tif"/>
<p>We may now further assume that <italic>x</italic><sub>2</sub> performs a Gaussian random walk of its own, with a constant variance &#x003D1;, so that the model for <italic>x</italic><sub>2</sub> is the same as the one for <italic>x</italic> in Equation 1. This addition of levels of Gaussian random walks coupled by their variances can now continue up to any number <italic>n</italic> of levels in the hierarchy, as illustrated in Figure <xref ref-type="fig" rid="F1">1</xref>. At each level <italic>i</italic>, the coupling to the next higher level <italic>i</italic> &#x0002B; 1 is given by a positive function <italic>f<sub>i</sub></italic>(<italic>x</italic><sub><italic>i</italic> &#x0002B; 1</sub>) which represents the variance or step size of the random walk:</p>
<graphic xlink:href="fnhum-08-00825-e0003.tif"/>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Overview of the Hierarchical Gaussian Filter (HGF)</bold>. The model represents a hierarchy of coupled Gaussian random walks. The levels of the hierarchy relate to each other by determining the step size (volatility or variance) of a random walk. The topmost step size is a constant parameter &#x003D1;.</p></caption>
<graphic xlink:href="fnhum-08-00825-g0001.tif"/>
</fig>
<p>At the top level, instead of <italic>f<sub>n</sub></italic>, we have &#x003D1;:</p>
<graphic xlink:href="fnhum-08-00825-e0004.tif"/>
<p>To complete our model, we still need to define the <italic>f<sub>i</sub></italic> in Equation 3. A flexible and straightforward approach to this is to allow any positive analytic <italic>f<sub>i</sub></italic>, but to expand it in powers to first order to give it a simple functional form. However, since <italic>f<sub>i</sub></italic> has to be positive, we cannot approximate it by expanding it directly. Instead, dropping indices for clarity, we expand its logarithm (for details, see Appendix A), which motivates our definition of coupling between levels:
<disp-formula id="E1"><label>(5)</label><mml:math id="M1"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula></p>
<p>As we will see below, this form of coupling has the additional advantage of enabling the derivation of simple one-step update equations under a mean-field approximation.</p>
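<p>The first-order expansion of the logarithm that motivates Equation 5 can be sketched as follows. This is a minimal illustration and not the derivation given in Appendix A: the expansion point <italic>a</italic> is a hypothetical choice introduced here, and level indices are dropped as in the text above.</p>
<preformat>
```latex
\[
\log f(x) \approx \log f(a) + \frac{f'(a)}{f(a)}\,(x - a) = \kappa x + \omega,
\]
\[
\kappa \overset{\text{def}}{=} \frac{f'(a)}{f(a)}, \qquad
\omega \overset{\text{def}}{=} \log f(a) - a\,\frac{f'(a)}{f(a)},
\]
\[
f(x) \approx \exp(\kappa x + \omega) > 0 .
\]
```
</preformat>
<p>Because the exponential is positive for any real argument, the approximation remains a valid variance for all values of <italic>x</italic>, which a direct first-order expansion of <italic>f</italic> would not guarantee.</p>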
<p>By further assuming that inputs (observations) <italic>u</italic><sup>(<italic>k</italic>)</sup> are generated by means of a Gaussian emission distribution of the form</p>
<graphic xlink:href="fnhum-08-00825-e0005.tif"/>
<p>where <inline-formula><mml:math id="M68"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub><italic>u</italic></sub> denotes the precision of the emission distribution, the model defined by Equations 3 and 5 can be used for online prediction of <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>1</sub>, i.e., <italic>filtering</italic>. Considering that the model consists of a hierarchy of Gaussian random walks, this motivates why we refer to it as the <italic>HGF</italic>.</p>
<p>To illustrate this, we might imagine a time series of financial data. Here, <italic>u</italic><sup>(<italic>k</italic>)</sup> could be the observed returns of a particular security, and <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>1</sub> is then the underlying quantity (the true return after observation noise <inline-formula><mml:math id="M69"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sup>&#x02212;1</sup><sub><italic>u</italic></sub> has been filtered out). The parameter &#x003C9;<sub>1</sub> is the tonic (i.e., time-invariant) part of the (log-)volatility of <italic>x</italic><sub>1</sub>, while &#x003BA;<sub>1</sub><italic>x</italic><sub>2</sub> is the phasic (i.e., time-varying) part. Accounting for the scaling by &#x003BA;<sub>1</sub>, <italic>x</italic><sub>2</sub> is then the scaled phasic log-volatility of <italic>x</italic><sub>1</sub>. This pattern repeats until the top of the hierarchy is reached. One of the advantages of the HGF is that volatility is captured hierarchically: not only are returns volatile, but so is their volatility, and the volatility of the volatility, etc.</p>
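<p>The generative model described in this section can be sketched in a few lines of Python. This is an illustrative sampler under stated assumptions, not the implementation used in this paper: the function name <monospace>simulate_hgf</monospace>, the zero initial states, the seed, and all parameter values are hypothetical, and the step variance at each level is assumed to depend on the current-trial value of the next higher level.</p>
<preformat>
```python
import numpy as np


def simulate_hgf(n_trials, kappa, omega, theta, pi_u, x0=None, seed=0):
    """Sample hidden states and inputs from an n-level hierarchy of random walks.

    kappa, omega : coupling parameters for levels 1..n-1 (length n-1 each)
    theta        : constant step variance of the top level n
    pi_u         : precision of the Gaussian emission distribution
    Returns (x, u) with x of shape (n_trials, n) and u of shape (n_trials,).
    """
    rng = np.random.default_rng(seed)
    kappa = np.asarray(kappa, dtype=float)
    omega = np.asarray(omega, dtype=float)
    n = len(kappa) + 1
    x = np.zeros((n_trials + 1, n))  # row 0 holds the initial states
    if x0 is not None:
        x[0] = x0
    u = np.zeros(n_trials)
    for k in range(1, n_trials + 1):
        # Top level: Gaussian random walk with constant variance theta.
        x[k, n - 1] = rng.normal(x[k - 1, n - 1], np.sqrt(theta))
        # Lower levels, top to bottom: variance exp(kappa_i * x_{i+1} + omega_i).
        # Assumption: the current-trial value of the higher level is used.
        for i in range(n - 2, -1, -1):
            step_var = np.exp(kappa[i] * x[k, i + 1] + omega[i])
            x[k, i] = rng.normal(x[k - 1, i], np.sqrt(step_var))
        # Input: Gaussian emission around x_1 with variance 1 / pi_u.
        u[k - 1] = rng.normal(x[k, 0], np.sqrt(1.0 / pi_u))
    return x[1:], u


# Hypothetical three-level example (all values chosen for illustration only).
states, inputs = simulate_hgf(
    n_trials=200, kappa=[1.0, 1.0], omega=[-4.0, -4.0], theta=0.01, pi_u=100.0
)
print(states.shape, inputs.shape)  # (200, 3) (200,)
```
</preformat>
<p>Sampling from the top level downwards mirrors the direction of the coupling: each level must be drawn before the level whose step size it determines.</p>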
</sec>
<sec>
<title>Approximate inversion for sensory input at irregular intervals</title>
<p>Sensing takes place at the bottom of the hierarchy: <italic>u</italic> is the agent&#x00027;s sensory input. To allow for input that comes at irregular intervals, we can multiply the variance of the random walks at all levels by the time <italic>t</italic><sup>(<italic>k</italic>)</sup> that elapses between the arrival of inputs <italic>u</italic><sup>(<italic>k</italic> &#x02212; 1)</sup> and <italic>u</italic><sup>(<italic>k</italic>)</sup>:</p>
<graphic xlink:href="fnhum-08-00825-e0006.tif"/>
<p>This proportionality of the variance to time reflects the fact that the mean squared distance of a quantity performing a Gaussian random walk from its origin is proportional to time (cf. the connection between Gaussian random walks, Brownian motion, and the heat equation; Evans, <xref ref-type="bibr" rid="B14">2010</xref>). For inputs at regular intervals, we may simply set <italic>t</italic><sup>(<italic>k</italic>)</sup> &#x0003D; 1 for all <italic>k</italic>, effectively removing <italic>t</italic><sup>(<italic>k</italic>)</sup> from the model.</p>
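<p>The scaling of the step variance with elapsed time can be checked numerically. The following sketch, with a hypothetical helper <monospace>step_over_interval</monospace> and hypothetical values for the variance per unit time and the interval length, compares one step across an interval with the same number of consecutive unit-time steps; it assumes a constant variance per unit of time and is not part of the original analysis.</p>
<preformat>
```python
import numpy as np


def step_over_interval(x_prev, var_per_unit_time, dt, rng):
    """One Gaussian random-walk step across an interval of length dt.

    The step variance is dt * var_per_unit_time.
    """
    return rng.normal(x_prev, np.sqrt(dt * var_per_unit_time))


rng = np.random.default_rng(1)
theta = 0.5          # variance per unit of time (hypothetical value)
dt = 4               # time elapsed between two inputs (hypothetical value)
n_samples = 200_000  # number of independent walks

# (a) A single step that spans the whole interval.
single = step_over_interval(np.zeros(n_samples), theta, dt, rng)

# (b) dt consecutive steps of unit duration.
multi = np.zeros(n_samples)
for _ in range(dt):
    multi = step_over_interval(multi, theta, 1.0, rng)

# Both sample variances should be close to dt * theta = 2.0.
print(single.var(), multi.var())
```
</preformat>
<p>The agreement of the two variances is what allows inputs at irregular intervals to be treated by a single rescaled step rather than by simulating the unobserved intermediate steps.</p>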
<p>We can now derive update equations using the variational inversion method introduced in Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>). This approximate inversion assumes Gaussian posteriors at all levels with means &#x003BC;<sub><italic>i</italic></sub> and precisions (inverse variances) &#x003C0;<sub><italic>i</italic></sub>:</p>
<graphic xlink:href="fnhum-08-00825-e0007.tif"/>
<p>where &#x003C7; <inline-formula><mml:math id="M70"><mml:mrow><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover></mml:mrow></mml:math></inline-formula> {&#x003BA;<sub>1</sub>, &#x003C9;<sub>1</sub>, &#x02026;, &#x003BA;<sub><italic>n</italic>&#x02212;1</sub>, &#x003C9;<sub><italic>n</italic>&#x02212;1</sub>, &#x003D1;}. This is an approximation because the true posterior distribution <italic>p</italic>(<italic>x</italic><sup>(<italic>k</italic>)</sup><sub><italic>i</italic></sub>|<italic>u</italic><sup>(1)</sup>, &#x02026;, <italic>u</italic><sup>(<italic>k</italic>)</sup>, &#x003C7;) will in general deviate from a Gaussian shape. A discussion of the variational nature and the implications of this approximation can be found in Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>). The sufficient statistics &#x003BC;<sub><italic>i</italic></sub> and &#x003C0;<sub><italic>i</italic></sub> are the quantities that are updated after each new input <italic>u</italic> according to the following equations (cf. the Discussion, where we give a natural interpretation of them in terms of learning rate and prediction error):
<disp-formula id="E2"><label>(9)</label><mml:math id="M2"><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mtext>&#x02009;</mml:mtext><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mfrac><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:msubsup><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:math></disp-formula>
<disp-formula id="E3"><label>(10)</label><mml:math id="M3"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msubsup><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
with
<disp-formula id="E4"><label>(11)</label><mml:math id="M4"><mml:mrow><mml:msubsup><mml:mi>v</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mi>&#x003D1;</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="E5"><label>(12)</label><mml:math id="M5"><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></disp-formula>
<disp-formula id="E6"><label>(13)</label><mml:math id="M6"><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>v</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="E7"><label>(14)</label><mml:math id="M7"><mml:mrow><mml:msubsup><mml:mi>&#x003B4;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>v</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></disp-formula></p>
<p>Variances (i.e., inverse precisions) are denoted by &#x003C3;<sup>(<italic>k</italic>)</sup><sub><italic>i</italic></sub> &#x0003D; 1/&#x003C0;<sup>(<italic>k</italic>)</sup><sub><italic>i</italic></sub>. Note that irregular intervals between inputs are fully accounted for by the factor <italic>t</italic><sup>(<italic>k</italic>)</sup> in Equation 11. The updates of Equations 9 and 10 apply to all levels except the first, where they take a different form:
<disp-formula id="E8"><label>(15)</label><mml:math id="M8"><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>u</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:msubsup><mml:mi>&#x003B4;</mml:mi><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></disp-formula>
<disp-formula id="E9"><label>(16)</label><mml:math id="M9"><mml:mrow><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>u</mml:mi></mml:msub></mml:mrow></mml:math></disp-formula>
with
<disp-formula id="E10"><label>(17)</label><mml:math id="M10"><mml:mrow><mml:msubsup><mml:mi>&#x003B4;</mml:mi><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></disp-formula>
where <inline-formula><mml:math id="M71"><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sup>(<italic>k</italic>)</sup><sub>1</sub> and <inline-formula><mml:math id="M72"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sup>(<italic>k</italic>)</sup><sub>1</sub> are defined by Equations 12 and 13. The different form of the updates at the first level arises because, at the first level, the direction of inference is from <italic>u</italic> to <italic>x</italic><sub>1</sub>, which appears in the <italic>mean</italic> of the Gaussian in Equation 6, while at all higher levels, the direction of inference is from <italic>x</italic><sub><italic>i</italic></sub> to <italic>x</italic><sub><italic>i</italic> &#x0002B; 1</sub>, which appears in the <italic>variance</italic> of the Gaussian in Equation 3. This results in the updates being driven by different kinds of prediction errors: value prediction errors (VAPEs) at the first level and volatility (i.e., variance) prediction errors (VOPEs) at all higher levels. We elaborate on this distinction in the Discussion below.</p>
<p>The details of this inversion are given in Appendix B. The notation chosen here emphasizes the role of precisions in the updating process more than that used in Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>); the two notations are, however, equivalent.</p>
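<p>As an illustration, the update step of Equations 9 to 17 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function name <italic>hgf_update</italic> and all argument names are hypothetical, <italic>pi_u</italic> stands for the input precision that appears in Equations 15 and 16, <italic>theta</italic> stands for &#x003D1;, and list index 0 corresponds to level 1.</p>
<preformat>
```python
import math

def hgf_update(mu, pi, u, t, kappa, omega, theta, pi_u):
    """One update step for an n-level hierarchy (sketch of Equations 9 to 17).

    mu, pi       : posterior means and precisions at time k-1 (lists of length n)
    u, t         : new input and the time elapsed since the previous input
    kappa, omega : coupling and tonic parameters for levels 1..n-1 (length n-1)
    theta        : step-size parameter of the top level
    pi_u         : precision of the input (assumed known here)
    """
    n = len(mu)
    # Equation 11: predicted step variances at every level
    v = [t * math.exp(kappa[i] * mu[i + 1] + omega[i]) for i in range(n - 1)]
    v.append(t * theta)
    # Equations 12 and 13: predictions and their precisions
    mu_hat = list(mu)
    pi_hat = [1.0 / (1.0 / pi[i] + v[i]) for i in range(n)]

    new_mu = [0.0] * n
    new_pi = [0.0] * n
    # First level, Equations 15 to 17: driven by the value prediction error
    delta_u = u - mu_hat[0]
    new_pi[0] = pi_hat[0] + pi_u
    new_mu[0] = mu_hat[0] + pi_u / new_pi[0] * delta_u

    # Higher levels, Equations 9 and 10: driven by volatility prediction errors
    for i in range(1, n):
        # Equation 14, evaluated at the level below (already updated)
        delta = (1.0 / new_pi[i - 1]
                 + (new_mu[i - 1] - mu_hat[i - 1]) ** 2) * pi_hat[i - 1] - 1.0
        w = kappa[i - 1] * v[i - 1] * pi_hat[i - 1]
        new_pi[i] = pi_hat[i] + 0.5 * w ** 2 * (
            1.0 + (1.0 - 1.0 / (v[i - 1] * pi[i - 1])) * delta)
        if not new_pi[i] > 0.0:
            raise ValueError("non-positive precision at level %d" % (i + 1))
        new_mu[i] = mu_hat[i] + 0.5 * w / new_pi[i] * delta
    return new_mu, new_pi
```
</preformat>
<p>Calling the function once per input and feeding its output back in as the next <italic>mu</italic> and <italic>pi</italic> yields a trajectory of representations. The explicit check on the updated precision is a design choice of this sketch; it simply stops when an update would produce a non-positive precision.</p>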
</sec>
<sec>
<title>Maximum-a-posteriori (MAP) parameter estimation</title>
<p>Given the above update equations, initial representations <inline-formula><mml:math id="M73"><mml:mrow><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> (i.e., the means &#x003BC;<sub><italic>i</italic></sub> and precisions &#x003C0;<sub><italic>i</italic></sub> of the states <italic>x<sub>i</sub></italic> at time <italic>k</italic> &#x0003D; 0), and priors on the perceptual parameters &#x003C7;, we could invert the model (i.e., estimate the values of &#x003C7; that lead to least aggregate surprise about sensory inputs) from sensory inputs alone. This would provide us with state trajectories and parameters that represent an ideal Bayesian agent, where &#x0201C;ideal&#x0201D; means experiencing the least surprise about sensory inputs. 
However, our goal is usually different; it is to estimate subject-specific parameters (which encode the individual&#x00027;s approximation to Bayes-optimality) from his/her observed behavior, as formalized in the &#x0201C;observing the observer&#x0201D; framework (Daunizeau et al., <xref ref-type="bibr" rid="B7">2010a</xref>,<xref ref-type="bibr" rid="B8">b</xref>). To achieve this goal, we will now bring the HGF into this framework. This requires the introduction of a response model which links the agent&#x00027;s current estimates <inline-formula><mml:math id="M74"><mml:mrow><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> of states to expressed decisions <italic>y</italic><sup>(<italic>k</italic>)</sup> and which also contains subject-specific parameters <inline-formula><mml:math 
id="M75"><mml:mrow><mml:mi>&#x003B6;</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B6;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003B6;</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>. For example, a useful response model (e.g., Iglesias et al., <xref ref-type="bibr" rid="B22">2013</xref>), which we also use in our simulation study below, is the unit-square sigmoid, which maps the predictive probability <italic>m<sup>(k)</sup></italic> that the next outcome will be 1 onto the probabilities <italic>p</italic>(<italic>y</italic><sup>(<italic>k</italic>)</sup> &#x0003D; 1) and <italic>p</italic>(<italic>y</italic><sup>(<italic>k</italic>)</sup> &#x0003D; 0) that the agent will choose response 1 or 0, respectively:
<disp-formula id="E11"><label>(18)</label><mml:math id="M11"><mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B6;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x0200B;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200B;&#x0200B;</mml:mtext><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msup><mml:mi>m</mml:mi><mml:mi>&#x003B6;</mml:mi></mml:msup></mml:mrow><mml:mrow><mml:msup><mml:mi>m</mml:mi><mml:mi>&#x003B6;</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mtext>&#x0200B;</mml:mtext><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>m</mml:mi><mml:mtext>&#x0200B;</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:msup></mml:mrow></mml:mfrac><mml:mtext>&#x0200B;</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>y</mml:mi></mml:msup><mml:mo>&#x000B7;</mml:mo><mml:mtext>&#x0200B;</mml:mtext><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mtext>&#x0200B;</mml:mtext><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>m</mml:mi><mml:mtext>&#x0200B;</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:msup></mml:mrow><mml:mrow><mml:msup><mml:mi>m</mml:mi><mml:mi>&#x003B6;</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mtext>&#x0200B;</mml:mtext><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>m</mml:mi><mml:mtext>&#x0200B;</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:msup></mml:mrow></mml:mfrac><mml:mtext>&#x0200B;</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:m
sup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
where, for clarity, we have omitted time indices on <italic>y</italic> and <italic>m</italic>. For this to serve as a response model for the inversion of the HGF, the predictive probability <italic>m</italic> &#x0003D; <italic>m</italic>(&#x003BB;) has to be a function of the quantities &#x003BB; the HGF keeps track of. This model is explained and discussed in detail below. <bold>Figure 4</bold> is a graphical representation of it. For our present purposes, the only important point is that it contains a parameter &#x003B6; that captures how deterministically <italic>y</italic> is associated with <italic>m</italic>. The higher &#x003B6;, the more likely the agent is to choose the option that is more likely according to its current belief. Since <italic>m</italic> is a deterministic function of &#x003BB;, we can write <italic>p</italic>(<italic>y</italic> | <italic>m</italic>, &#x003B6;) &#x0003D; <italic>p</italic>(<italic>y</italic> | &#x003BB;, &#x003B6;).</p>
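<p>The unit-square sigmoid of Equation 18 is compact enough to write out directly. The following Python sketch is illustrative only; the function name <italic>unit_square_sigmoid</italic> is hypothetical, and the code assumes a binary response <italic>y</italic>, a predictive probability <italic>m</italic> strictly between 0 and 1, and &#x003B6; greater than zero.</p>
<preformat>
```python
def unit_square_sigmoid(y, m, zeta):
    """Probability of response y (0 or 1) under Equation 18.

    m    : predictive probability that the next outcome is 1
    zeta : response parameter; larger values make choices more deterministic
    """
    denom = m ** zeta + (1.0 - m) ** zeta
    p_one = m ** zeta / denom            # probability of choosing response 1
    p_zero = (1.0 - m) ** zeta / denom   # probability of choosing response 0
    return p_one ** y * p_zero ** (1 - y)
```
</preformat>
<p>With &#x003B6; &#x0003D; 1 the function returns <italic>m</italic> itself for <italic>y</italic> &#x0003D; 1 (probability matching), and at <italic>m</italic> &#x0003D; 0.5 both responses are equally likely for any &#x003B6;.</p>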
<p>In general, the joint distribution for observations (i.e., decisions) and parameters of an HGF-based decision model takes the form
<disp-formula id="E12"><label>(19)</label><mml:math id="M12"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003C7;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>&#x003B6;</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003C7;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>&#x003B6;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:munderover><mml:mi>p</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003C7;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>&#x003B6;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <inline-formula><mml:math id="M76"><mml:mrow><mml:mi>u</mml:mi><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>K</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M77"><mml:mrow><mml:mi>y</mml:mi><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>K</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> are the inputs and responses at time points <italic>k</italic> &#x0003D; 1 to <italic>k</italic> &#x0003D; <italic>K</italic>, respectively, and &#x003BB;<sup>(<italic>k</italic>)</sup>(&#x003C7;, &#x003BB;<sup>(0)</sup>, <italic>u</italic>) are the sufficient statistics <inline-formula><mml:math id="M78"><mml:mrow><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> of the hidden states of the HGF at time <italic>k</italic>. The inputs <italic>u</italic> are given because the agent and its observer both know them and the agent uses them to invert its HGF model, resulting in the trajectories &#x003BB;<sup>(<italic>k</italic>)</sup>. This makes it clear that the decision model is not a generative model of sensory input, while the HGF is. Strictly speaking, the decision model does not use the perceptual model directly. Instead, the decision model uses the perceptual model indirectly via its inversion, where input is given. It is also noteworthy that the sets of inputs <italic>u</italic> and observations <italic>y</italic> are finite here (<italic>k</italic> &#x0003D; 1,&#x02026;, <italic>K</italic>), while the HGF is open-ended (cf. <italic>k</italic> &#x0003D; 1, 2, &#x02026; in Equation 1).</p>
<p>The goal now is to find an expression for the maximum-a-posteriori (MAP) estimate for the parameters <inline-formula><mml:math id="M79"><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mover><mml:mtext>=</mml:mtext><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>&#x003C7;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>&#x003B6;</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>. The MAP estimate &#x003BE;<sup>&#x0002A;</sup> of &#x003BE; is defined as
<disp-formula id="E13"><label>(20)</label><mml:math id="M13"><mml:mrow><mml:msup><mml:mi>&#x003BE;</mml:mi><mml:mo>&#x02217;</mml:mo></mml:msup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:munder><mml:mrow><mml:mtext>arg&#x02009;max</mml:mtext></mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:munder><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>We unpack this in Appendix E to make it tractable:
<disp-formula id="E14"><label>(21)</label><mml:math id="M14"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msup><mml:mi>&#x003BE;</mml:mi><mml:mo>&#x02217;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mtext>arg&#x02009;max</mml:mtext></mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mi>k</mml:mi></mml:munder><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003C7;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>&#x003B6;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>+</mml:mo><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>The objective function <italic>Z</italic>(&#x003BE;) that needs to be maximized is therefore the log-joint probability density of the parameters &#x003BE; and responses <italic>y</italic> given inputs <italic>u</italic>:
<disp-formula id="E15"><label>(22)</label><mml:math id="M15"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>Z</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mi>k</mml:mi></mml:munder><mml:mrow><mml:mi>ln</mml:mi></mml:mrow></mml:mstyle><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003C7;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>&#x003B6;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>While the response model gives <italic>p</italic>(<italic>y</italic><sup>(<italic>k</italic>)</sup> | &#x003BB;<sup>(<italic>k</italic>)</sup>, &#x003B6;), the perceptual model (i.e., the HGF) provides the representations &#x003BB;<sup>(<italic>k</italic>)</sup>(&#x003C7;, &#x003BB;<sup>(0)</sup>, <italic>u</italic>) by way of its update equations. The last missing part in Equation 22 is the prior distribution <italic>p</italic>(&#x003BE;). This will be discussed below. There are many alternative optimization procedures to implement the maximization in Equation 21. We have compared four in the simulations discussed below.</p>
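<p>The structure of the objective function in Equation 22 can be sketched as follows. This is a sketch under explicit assumptions rather than the procedure used in our simulations: all function names are hypothetical, the perceptual model is replaced by a simple delta-rule placeholder (not the HGF), the prior is an unnormalized Gaussian on ln &#x003B6; only, and the maximization is a plain grid search over &#x003B6; with the other parameters held fixed.</p>
<preformat>
```python
import math

def log_joint(xi, u, y, trajectory, response_prob, log_prior):
    """Z(xi | u, y) = sum_k ln p(y_k | lambda_k, zeta) + ln p(xi) (Equation 22)."""
    chi, lam0, zeta = xi
    lam = trajectory(chi, lam0, u)  # representations lambda^(k), one per input
    log_lik = sum(math.log(response_prob(y_k, lam_k, zeta))
                  for y_k, lam_k in zip(y, lam))
    return log_lik + log_prior(xi)

# --- Hypothetical stand-ins, for illustration only ---
def toy_trajectory(chi, lam0, u):
    """Delta-rule belief with learning rate chi; a placeholder, not the HGF."""
    m, out = lam0, []
    for u_k in u:
        m = m + chi * (u_k - m)
        out.append(m)
    return out

def sigmoid_response(y, m, zeta):
    """Unit-square sigmoid of Equation 18."""
    p_one = m ** zeta / (m ** zeta + (1.0 - m) ** zeta)
    return p_one ** y * (1.0 - p_one) ** (1 - y)

def log_prior(xi):
    """Standard normal on ln(zeta), up to an additive constant (assumed prior)."""
    _, _, zeta = xi
    return -0.5 * math.log(zeta) ** 2

# Grid search over zeta as a stand-in for a proper optimizer
u = [1, 0, 1, 1]
y = [1, 0, 1, 1]
grid = [(0.5, 0.5, zeta) for zeta in (0.5, 1.0, 2.0, 4.0)]
xi_map = max(grid, key=lambda xi: log_joint(xi, u, y, toy_trajectory,
                                            sigmoid_response, log_prior))
```
</preformat>
<p>Passing the trajectory, response model, and prior as separate callables mirrors the separation between perceptual model, response model, and prior in Equation 22; any of the three can be exchanged without touching the others.</p>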
<p>Finally, one important point in relation to model inversion is model identifiability, which we discuss in detail in Appendix F. In brief, when the posterior mean of the state at level <italic>i</italic> (i.e., &#x003BC;<sub><italic>i</italic></sub>) is included in the response model and thus affects measured behavior, all three quantities at that level (&#x003BC;<sup>(0)</sup><sub><italic>i</italic></sub>, &#x003BA;<sub><italic>i</italic></sub>, and &#x003C9;<sub><italic>i</italic></sub>) can be estimated. If &#x003BC;<sub><italic>i</italic></sub> is not included in the response model, it is advisable to fix two of the three parameters &#x003BC;<sup>(0)</sup><sub><italic>i</italic></sub>, &#x003BA;<sub><italic>i</italic></sub>, and &#x003C9;<sub><italic>i</italic></sub>, reflecting a particular choice of origin and scale on <italic>x<sub>i</sub></italic>. This avoids an overparameterized model.</p>
</sec>
<sec>
<title>Priors and transformed parameter spaces</title>
<p>A crucial part of Bayesian inference is the specification of a prior distribution, in our case <italic>p</italic>(&#x003BE;). There is no principled reason why the priors on the different elements of &#x003BE; should not be independent; therefore, we may assume the following factorization:
<disp-formula id="E16"><label>(23)</label><mml:math id="M16"><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003D1;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x0220F;</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x0220F;</mml:mo><mml:mi>j</mml:mi></mml:munder><mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B6;</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<italic>p</italic>(&#x003B6;<sub><italic>j</italic></sub>) depends on the response model chosen and will have to be dealt with on a case-by-case basis (see below), but the remaining marginal priors are generic and will be discussed in what follows.</p>
<p>The most straightforward case is the prior on &#x003C9;<sub><italic>i</italic></sub>. Since &#x003C9;<sub><italic>i</italic></sub> can take values on the whole real line, it can be estimated in its native space with a (possibly wide) Gaussian prior:</p>
<graphic xlink:href="fnhum-08-00825-e0008.tif"/>
<p>The same applies to &#x003BC;<sup>(0)</sup><sub><italic>i</italic></sub>:</p>
<graphic xlink:href="fnhum-08-00825-e0009.tif"/>
<p>&#x003C3;<sup>(0)</sup><sub><italic>i</italic></sub> has a natural lower bound at zero since it is a variance. We can avoid non-positive values by estimating &#x003C3;<sup>(0)</sup><sub><italic>i</italic></sub> in log-space. That is, we use a log-Gaussian prior:</p>
<graphic xlink:href="fnhum-08-00825-e0010.tif"/>
<p>Just like &#x003C3;<sup>(0)</sup><sub><italic>i</italic></sub>, &#x003D1; is a variance and has a lower bound at zero. In addition to the lower bound, it is desirable to have an upper bound on &#x003D1; because, if &#x003D1; is too large, the assumptions underlying the derivation of the update equations of the HGF no longer hold. Specifically, for large values of &#x003D1; it is possible to get updates that push the precision &#x003C0;<sub><italic>n</italic></sub> at the top level below zero, indicating that the agent knows &#x0201C;less than nothing&#x0201D; about <italic>x</italic><sub><italic>n</italic></sub>. In less extreme cases, a large &#x003D1; may allow &#x003BC;<sub><italic>n</italic></sub> to jump to very high levels, giving rise to improbable inference trajectories (cf. Equations 9 and 11). This is due to a violation of the assumption that the variational energies <italic>I</italic>(<italic>x</italic><sub><italic>i</italic></sub>) are nearly quadratic (see Mathys et al., <xref ref-type="bibr" rid="B29">2011</xref>, for details).</p>
<p>To avoid such violations, it is sensible to place an upper bound on &#x003D1; in addition to the lower bound at zero. This can be achieved by estimating &#x003D1; in &#x0201C;logit-space,&#x0201D; a logistic sigmoid transformation of native space with a variable upper bound <italic>a</italic> &#x0003E; 0:
<disp-formula id="E17"><label>(27)</label><mml:math id="M17"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mtext>logit</mml:mtext><mml:mi>a</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mi>ln</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mi>x</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mi>&#x021D2;</mml:mi><mml:mi>x</mml:mi><mml:mtext>&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mi>a</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mtext>logit</mml:mtext></mml:mrow><mml:mi>a</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>In that space, the prior on &#x003D1; can then be taken as</p>
<graphic xlink:href="fnhum-08-00825-e0011.tif"/>
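<p>As an illustration of the bounded transformation in Equation 27, the following minimal sketch maps a parameter from its native space (0, <italic>a</italic>) to logit-space and back. The function names <monospace>logit_a</monospace> and <monospace>inv_logit_a</monospace> and the example values are assumptions made for this sketch; they are not taken from the HGF Toolbox.</p>
<preformat><![CDATA[
```python
import math


def logit_a(x, a=1.0):
    """Map x in the open interval (0, a) to the real line (Equation 27)."""
    if not 0.0 < x < a:
        raise ValueError("x must lie strictly between 0 and a")
    return math.log(x / (a - x))


def inv_logit_a(z, a=1.0):
    """Map a real number z back to native space, bounded by 0 and a."""
    if z >= 0:
        return a / (1.0 + math.exp(-z))
    ez = math.exp(z)  # rearranged form avoids overflow for very negative z
    return a * ez / (1.0 + ez)


# Illustrative values only: a parameter of 0.7 with an assumed upper bound of 2.
theta = 0.7
z = logit_a(theta, a=2.0)
theta_back = inv_logit_a(z, a=2.0)
```
]]></preformat>
<p>A Gaussian prior placed on <monospace>z</monospace> then corresponds to a prior on the native parameter that has no mass outside (0, <italic>a</italic>), which is the purpose of estimating in logit-space.</p>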
<p>While &#x003BA;<sub><italic>i</italic></sub> can in principle take any real value, flipping the sign of &#x003BA;<sub><italic>i</italic></sub> is equivalent to flipping that of <italic>x</italic><sub><italic>i</italic>&#x0002B;1</sub> (cf. Equation 5). It is therefore useful to adopt the convention that all &#x003BA;<sub><italic>i</italic></sub> &#x0003E; 0. This leads to the intuitive relation that a greater <italic>x</italic><sub><italic>i</italic>&#x0002B;1</sub> means a greater variability in <italic>x<sub>i</sub></italic>; in other words, this makes <italic>f<sub>i</sub></italic> in Equation 3 a monotonically increasing function. A second useful constraint on &#x003BA;<sub><italic>i</italic></sub> is that it is bounded above, for the same reason as &#x003D1;. Consequently, we estimate &#x003BA;<sub><italic>i</italic></sub> in logit-space with the following priors:</p>
<graphic xlink:href="fnhum-08-00825-e0012.tif"/>
<p>The exact specification of the above priors can vary, depending on the experimental context and instructions given to the subject (e.g., whether or not to expect a volatile environment). In most cases, a choice of &#x003BA;<sub><italic>i</italic></sub> and &#x003D1; with upper bounds at or below 2 will be sensible. In cases where there is little movement in <italic>x<sub>n</sub></italic> (the topmost <italic>x</italic>), a choice of &#x003D1; closer to 0 than 1 will be appropriate. Notably, given that choosing a different prior amounts to having a different model, the choice between alternative priors can be evaluated using model comparison (cf. Stephan et al., <xref ref-type="bibr" rid="B38">2009</xref>).</p>
</sec>
</sec>
<sec>
<title>Examples and simulations</title>
<sec>
<title>Categorical outcomes and sensory uncertainty</title>
<p>In the formulation above, <italic>x</italic><sub>1</sub> performs a Gaussian random walk on a continuous scale. However, states of an agent&#x00027;s environment that generate sensory input are often categorical, in the simplest case binary (e.g., present/absent). This fact can be accommodated in the perceptual model above by making the base level, <italic>x</italic><sub>1</sub>, binary (we omit time indices <italic>k</italic> unless they are needed to avoid confusion):
<disp-formula id="E18"><label>(30)</label><mml:math id="M18"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The second level, <italic>x</italic><sub>2</sub>, of the model then describes the tendency of <italic>x</italic><sub>1</sub> toward state 1:
<disp-formula id="E19"><label>(31)</label><mml:math id="M19"><mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:msup><mml:mtext>=Bernoulli</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>;</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
where <inline-formula><mml:math id="M80"><mml:mrow><mml:mi>s</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mi>exp</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:math></inline-formula> is the logistic sigmoid function. Similarly, when <italic>x</italic><sub>1</sub> represents <italic>d</italic> &#x0003E; 2 outcomes, the probability of each can be represented by its own <italic>x</italic><sub>2</sub>, performing (at most) <italic>d</italic> &#x02212; 1 independent random walks. <italic>p</italic>(<italic>x</italic><sub>1</sub> &#x0003D; 0) simply is 1 &#x02212; <italic>p</italic>(<italic>x</italic><sub>1</sub> &#x0003D; 1).</p>
<p>In the three-level HGF for binary outcomes (Mathys et al., <xref ref-type="bibr" rid="B29">2011</xref>), the third level, <italic>x</italic><sub>3</sub>, is at the top, with constant step variance &#x003D1;. The only level with a coupling of the form of Equation 5 is therefore the second level; this allows us to write &#x003BA;<sub>2</sub> &#x02261; &#x003BA; and &#x003C9;<sub>2</sub> &#x02261; &#x003C9;. We can allow for sensory uncertainty by including an additional level at the bottom of the hierarchy that predicts sensory input <italic>u</italic> from the state <italic>x</italic><sub>1</sub>. In the absence of sensory uncertainty, knowledge of the state <italic>x</italic><sub>1</sub> enables accurate prediction of input <italic>u</italic> and vice versa; we may then simply set <italic>u</italic> &#x02261; <italic>x</italic><sub>1</sub> and treat <italic>x</italic><sub>1</sub> as if it were directly observed. A graphical overview of this model is given in Figure <xref ref-type="fig" rid="F2">2</xref>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>The 3-level HGF for binary outcomes</bold>. The lowest level, <italic>x</italic><sub>1</sub>, is binary and corresponds, in the absence of sensory noise, to sensory input <italic>u</italic>. Left: schematic representation of the generative model as a Bayesian network. <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>1</sub>, <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>2</sub>, <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>3</sub> are hidden states of the environment at time point <italic>k</italic>. They generate <italic>u</italic><sup>(<italic>k</italic>)</sup>, the input at time point <italic>k</italic>, and depend on their immediately preceding values <italic>x</italic><sup>(<italic>k</italic> &#x02212; 1)</sup><sub>2</sub>, <italic>x</italic><sup>(<italic>k</italic> &#x02212; 1)</sup><sub>3</sub> and on the parameters &#x003BA;, &#x003C9;, &#x003D1;. Right: model definition. This figure has been adapted from Figures 1, 2 in Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>).</p></caption>
<graphic xlink:href="fnhum-08-00825-g0002.tif"/>
</fig>
<p>For the particular case of the three-level HGF for binary outcomes, the general update equations in Equations 9&#x02013;14 (with <italic>t</italic><sup>(<italic>k</italic>)</sup> &#x0003D; 1 for all <italic>k</italic>) take the following specific form (as previously derived in Mathys et al., <xref ref-type="bibr" rid="B29">2011</xref>, with additional detail in Appendix D):
<disp-formula id="E20"><label>(32)</label><mml:math id="M20"><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:msubsup><mml:mi>&#x003B4;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></disp-formula>
<disp-formula id="E21"><label>(33)</label><mml:math id="M21"><mml:mrow><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
with
<disp-formula id="E22"><label>(34)</label><mml:math id="M22"><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mi>s</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="E23"><label>(35)</label><mml:math id="M23"><mml:msubsup><mml:mi>&#x003B4;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:math></disp-formula>
<disp-formula id="E24"><label>(36)</label><mml:math id="M24"><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="E25"><label>(37)</label><mml:math id="M25"><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:msup><mml:mtext>e</mml:mtext><mml:mrow><mml:mi>&#x003BA;</mml:mi><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula></p>
<p>The update equations for binary outcomes differ from those given in Equations 9 and 10 only at the second level. On all higher levels, they are the generic HGF updates from Equations 9 and 10. This difference is entirely due to the sigmoid transformation that links the first and second level, enabling the filtering of binary outcomes. Note that the second level in the binary case corresponds to the first level in the continuous case, in the sense that each is the lowest level at which a Gaussian random walk takes place.</p>
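<p>To make the second-level update concrete, Equations 32&#x02013;37 can be sketched as a single function that takes the beliefs from trial <italic>k</italic> &#x02212; 1 and the first-level belief &#x003BC;<sub>1</sub> at trial <italic>k</italic> (equal to the input <italic>u</italic> when there is no sensory uncertainty). This is a minimal sketch; the function and argument names are assumptions made for illustration, and the updates of the third level are not shown.</p>
<preformat><![CDATA[
```python
import math


def sigmoid(x):
    """Logistic sigmoid s(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def update_level2(mu1, mu2_prev, sigma2_prev, mu3_prev, kappa, omega, t=1.0):
    """One second-level update of the binary HGF (Equations 32-37).

    Returns the updated mean mu2 and precision pi2.
    """
    muhat1 = sigmoid(mu2_prev)                 # Eq. 34: predicted probability of x1 = 1
    delta1 = mu1 - muhat1                      # Eq. 35: prediction error
    inv_pihat1 = muhat1 * (1.0 - muhat1)       # Eq. 36: this is 1 / pihat1
    pihat2 = 1.0 / (sigma2_prev + t * math.exp(kappa * mu3_prev + omega))  # Eq. 37
    pi2 = pihat2 + inv_pihat1                  # Eq. 33
    mu2 = mu2_prev + delta1 / pi2              # Eq. 32
    return mu2, pi2


# Illustrative first trial with outcome 1, using the initial values and
# parameters listed in the caption of Figure 3.
mu2, pi2 = update_level2(mu1=1.0, mu2_prev=0.0, sigma2_prev=1.0,
                         mu3_prev=0.0, kappa=1.0, omega=-3.0)
```
]]></preformat>
<p>Computing 1/&#x003C0;&#x00302;<sub>1</sub> directly as the product in the denominator of Equation 36 avoids a division by zero when the prediction saturates at 0 or 1; this is a choice made for the sketch, not a statement about any particular implementation.</p>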
<p>To illustrate how the HGF can deal with the simplest kind of informational uncertainty, sensory uncertainty, we simulate two agents, one with high and the other with low sensory uncertainty, who are otherwise equal. Sensory uncertainty is captured by the following relation between the binary state <italic>x</italic><sub>1</sub> and sensory input <italic>u</italic>:</p>
<graphic xlink:href="fnhum-08-00825-e0013.tif"/>
<p>This means that the probability of <italic>u</italic> is Gaussian with precision <inline-formula><mml:math id="M81"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub><italic>u</italic></sub> around a mean of &#x003B7;<sub>1</sub> for <italic>x</italic><sub>1</sub> &#x0003D; 1 and &#x003B7;<sub>0</sub> for <italic>x</italic><sub>1</sub> &#x0003D; 0. In this case (cf. Mathys et al., <xref ref-type="bibr" rid="B29">2011</xref>, Equation 47), the update equation for &#x003BC;<sub>1</sub> is</p>
<disp-formula id="E26"><label>(39)</label><mml:math id="M26"><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>u</mml:mi></mml:msub></mml:mrow><mml:mn>2</mml:mn></mml:mfrac><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003B7;</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>&#x0200B;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>u</mml:mi></mml:msub></mml:mrow><mml:mn>2</mml:mn></mml:mfrac><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003B7;</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>&#x0200B;</mml:mtext><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>&#x0200B;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>u</mml:mi></mml:msub></mml:mrow><mml:mn>2</mml:mn></mml:mfrac><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003B7;</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
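<p>Equation 39 is a Bayesian update of the prediction &#x003BC;&#x00302;<sub>1</sub> by two Gaussian likelihoods. The following minimal sketch evaluates it in an algebraically equivalent log-odds form, which is an assumption of this sketch chosen for numerical robustness when the input precision is high; the function and argument names are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def update_mu1(u, muhat1, eta1, eta0, pihat_u):
    """Posterior belief that x1 = 1 given input u (Equation 39).

    Dividing numerator and denominator of Equation 39 by the numerator gives
    mu1 = s(log prior odds + log likelihood ratio), which is what is computed
    here. Requires 0 < muhat1 < 1.
    """
    log_prior_odds = math.log(muhat1) - math.log1p(-muhat1)
    log_lik_ratio = 0.5 * pihat_u * ((u - eta0) ** 2 - (u - eta1) ** 2)
    return sigmoid(log_prior_odds + log_lik_ratio)


# Illustrative values: an ambiguous input halfway between eta0 = 0 and eta1 = 1
# leaves the prediction unchanged, whatever the input precision.
mu1_ambiguous = update_mu1(u=0.5, muhat1=0.3, eta1=1.0, eta0=0.0, pihat_u=10.0)
# An input at eta1 moves the belief toward 1, more strongly for higher precision.
mu1_clear = update_mu1(u=1.0, muhat1=0.5, eta1=1.0, eta0=0.0, pihat_u=10.0)
```
]]></preformat>
<p>In this form, the role of the input precision is easy to read off: it scales the log likelihood ratio, so a lower precision (higher sensory uncertainty) means that the same input moves the belief less.</p>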
<p>Figure <xref ref-type="fig" rid="F3">3</xref> shows our simulation. We first chose a sequence of true hidden states <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>1</sub>, <italic>k</italic> &#x0003D; 1, &#x02026;, 640. We then drew a sequence of inputs <italic>u</italic><sup>(<italic>k</italic>)</sup> &#x0007E; <inline-graphic xlink:href="fnhum-08-00825-i0001.tif"/> (<italic>x</italic><sup>(<italic>k</italic>)</sup><sub>1</sub>, 0.1), corresponding to a setting of &#x003B7;<sub>1</sub> &#x0003D; 1, &#x003B7;<sub>0</sub> &#x0003D; 0, and <inline-formula><mml:math id="M82"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub><italic>u</italic></sub> &#x0003D; 10 (cf. Equation 38). These inputs were fed into two three-level HGFs that differed only in the amount of sensory uncertainty they assumed: <inline-formula><mml:math id="M83"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sup>&#x02212;1</sup><sub><italic>u</italic></sub> &#x0003D; 0.001 (low) and <inline-formula><mml:math id="M84"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sup>&#x02212;1</sup><sub><italic>u</italic></sub> &#x0003D; 0.1 (high). Clearly, higher input precision leads to greater responsiveness to fluctuations in input, as reflected in the trajectory of belief on tendency <italic>x</italic><sub>2</sub> and in a higher volatility estimate (belief on <italic>x</italic><sub>3</sub>). In the case of low input precision, the volatility estimate keeps declining because most of the variation in input is attributed to noise instead of fluctuations in the underlying tendency toward one outcome category <italic>x</italic><sub>1</sub> or the other. Decisions (purple dots) were simulated using a unit-square sigmoid response model with &#x003B6; &#x0003D; 8 in both cases (cf. Equation 18). 
The consequences of high sensory uncertainty for decision-making in this scenario are apparent: the agent with higher sensory uncertainty is less consistent in favoring one option over the other at any given time. This accords well with recent accounts of psychopathological symptoms as a failure of sensory attenuation (Adams et al., <xref ref-type="bibr" rid="B1">2013</xref>).</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>The consequences of sensory uncertainty</bold>. Simulation of inference on a binary hidden state <italic>x</italic><sub>1</sub> (black dots) using a three-level HGF under low (<inline-formula><mml:math id="M85"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub><italic>u</italic></sub> &#x0003D; 1000, top panel) and high (<inline-formula><mml:math id="M86"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub><italic>u</italic></sub> &#x0003D; 10, bottom panel) sensory uncertainty. Trajectories were simulated using the same input and parameters (except <inline-formula><mml:math id="M87"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub><italic>u</italic></sub>) in both cases: &#x003BC;<sup>(0)</sup><sub>2</sub> &#x0003D; &#x003BC;<sup>(0)</sup><sub>3</sub> &#x0003D; 0, &#x003C3;<sup>(0)</sup><sub>2</sub> &#x0003D; &#x003C3;<sup>(0)</sup><sub>3</sub> &#x0003D; 1, &#x003BA; &#x0003D; 1, &#x003C9; &#x0003D; &#x02212;3, and &#x003D1; &#x0003D; 0.7. Decisions were simulated using a unit-square sigmoid model with &#x003B6; &#x0003D; 8.</p></caption>
<graphic xlink:href="fnhum-08-00825-g0003.tif"/>
</fig>
<p>We now turn to the third and fourth elements defining our Bayesian agent: decision models based on loss functions.</p>
</sec>
<sec>
<title>Decision model for a simple binary loss function</title>
<p>One of the simplest decision situations for an agent is having to choose between two options, only one of which will be rewarded, but both of which offer the same gain (i.e., negative loss), if rewarded. In the three-level version of the HGF from Figure <xref ref-type="fig" rid="F2">2</xref>, we may code one such binary outcome as <italic>x</italic><sub>1</sub> &#x0003D; 1 and the other as <italic>x</italic><sub>1</sub> &#x0003D; 0. This allows us to define a quadratic loss function &#x02113; where making the wrong choice <italic>y</italic> &#x02208; {0, 1} leads to a loss of 1 while the right choice leads to a loss of 0:
<disp-formula id="E27"><label>(40)</label><mml:math id="M27"><mml:mrow><mml:mi>&#x02113;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mtext>-</mml:mtext><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math></disp-formula></p>
<p>The expected loss <inline-graphic xlink:href="fnhum-08-00825-i0002.tif"/> of decision <italic>y</italic>, given the agent&#x00027;s representations &#x003BB;, is then the expectation of &#x02113; under the distributions <italic>q</italic> described by &#x003BB;:</p>
<graphic xlink:href="fnhum-08-00825-e0014.tif"/>
<p>To evaluate this, we must remember that the agent has to rely on its beliefs deriving from time <italic>k</italic> &#x02212; 1 to make decision <italic>y</italic><sup>(<italic>k</italic>)</sup> at time <italic>k</italic>. In the above equation, elements of &#x003BB; therefore have time index <italic>k</italic> &#x02212; 1, while <italic>x</italic><sub>1</sub> and <italic>y</italic> have time index <italic>k</italic>. Specifically, the belief on the outcome probability at the first level is <inline-formula><mml:math id="M88"><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sup>(<italic>k</italic>)</sup><sub>1</sub> &#x0003D; <italic>s</italic> (&#x003BC;<sup>(<italic>k</italic>&#x02212;1)</sup><sub>2</sub>). With <italic>q</italic>(<italic>x</italic><sub>1</sub>) &#x0003D; (&#x003BC;<sub>1</sub>)<sup><italic>x</italic><sub>1</sub></sup>(1 &#x02212; &#x003BC;<sub>1</sub>)<sup>1&#x02212;<italic>x</italic><sub>1</sub></sup> (Mathys et al., <xref ref-type="bibr" rid="B29">2011</xref>, Equation 12), we then have</p>
<graphic xlink:href="fnhum-08-00825-e0015.tif"/>
<p>The optimal decision <italic>y</italic><sup>&#x0002A;</sup> is the one that minimizes expected loss <inline-graphic xlink:href="fnhum-08-00825-i0002.tif"/>:</p>
<graphic xlink:href="fnhum-08-00825-e0016.tif"/>
<p>This simply means that to minimize its losses, the agent should choose the option it believes more likely to be rewarded. It may seem superfluous to go to such lengths to derive such an obvious result, but the purpose of the above is also to give an illustration of the principled way a decision rule can be derived by combining the HGF with a loss function.</p>
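<p>The derivation can be checked numerically. The sketch below, with hypothetical function names, computes the expectation of the quadratic loss of Equation 40 under a Bernoulli belief with <italic>q</italic>(<italic>x</italic><sub>1</sub> &#x0003D; 1) &#x0003D; &#x003BC;&#x00302;<sub>1</sub> and picks the decision with the smaller expected loss.</p>
<preformat><![CDATA[
```python
def expected_loss(y, muhat1):
    """Expectation of (x1 - y)**2 over x1 in {0, 1} with q(x1 = 1) = muhat1."""
    loss_if_x1_is_1 = (1 - y) ** 2
    loss_if_x1_is_0 = (0 - y) ** 2
    return muhat1 * loss_if_x1_is_1 + (1.0 - muhat1) * loss_if_x1_is_0


def optimal_decision(muhat1):
    """Return the y in {0, 1} with minimal expected loss.

    At muhat1 == 0.5 both options are equally good; min() then returns 0,
    which is an arbitrary tie-break in this sketch.
    """
    return min((0, 1), key=lambda y: expected_loss(y, muhat1))


# Illustrative belief: outcome 1 is thought to have probability 0.8.
loss_choose_1 = expected_loss(1, 0.8)
loss_choose_0 = expected_loss(0, 0.8)
best = optimal_decision(0.8)
```
]]></preformat>
<p>For binary <italic>y</italic>, the expected loss of choosing 1 reduces to 1 &#x02212; &#x003BC;&#x00302;<sub>1</sub> and that of choosing 0 to &#x003BC;&#x00302;<sub>1</sub>, so the minimizing choice is the option believed more likely.</p>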
<p>It is, however, unreasonable to assume that human agents will always choose the option that minimizes their expected loss in the current trial, for two reasons. First, if there is more than one trial and the probabilities of the different options are independent, there is an exploration/exploitation tradeoff that makes it worth the agent&#x00027;s while (in the long run) sometimes to choose an option that is not expected to minimize loss in the current trial (Macready and Wolpert, <xref ref-type="bibr" rid="B28">1998</xref>; Daw et al., <xref ref-type="bibr" rid="B11">2006</xref>). Second, biological agents exhibit decision noise (Faisal et al., <xref ref-type="bibr" rid="B15">2008</xref>), e.g., owing to implementation constraints at the molecular, synaptic or circuit level. To allow for exploration and noise, we use a decision model that corresponds to the right-hand side of Equation 43, without taking the limit, instead leaving &#x003B6; as a parameter to be estimated from the data (cf. Equation 18):
<disp-formula id="E28"><label>(44)</label><mml:math id="M28"><mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003BB;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B6;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mi>&#x003B6;</mml:mi></mml:msubsup></mml:mrow><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mi>&#x003B6;</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>y</mml:mi></mml:msup><mml:mo>&#x000B7;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mi>&#x003B6;</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover 
accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></disp-formula></p>
<p>Figure <xref ref-type="fig" rid="F4">4</xref> contains a graph of this function for <italic>p</italic>(<italic>y</italic> &#x0003D; 1) where &#x003B6; plays the role of the noise (or exploration) parameter. This decision model was the basis for the simulations we conducted to assess the accuracy of parameter estimations (results below).</p>
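<p>As a minimal sketch of Equation 44, the unit-square sigmoid can be evaluated numerically as shown here. The function name <monospace>unit_square_sigmoid</monospace> and the belief values are illustrative assumptions and are not taken from the HGF Toolbox.</p>
<preformat>
```python
import numpy as np


def unit_square_sigmoid(mu1_hat, zeta):
    """Probability of response y = 1 under Equation 44.

    mu1_hat: predicted probability that x_1 = 1 (strictly between 0 and 1).
    zeta:    inverse response noise (positive).
    """
    mu1_hat = np.asarray(mu1_hat, dtype=float)
    numerator = mu1_hat ** zeta
    return numerator / (numerator + (1.0 - mu1_hat) ** zeta)


# Illustrative beliefs and noise levels (hypothetical values).
beliefs = np.array([0.2, 0.5, 0.8])
for zeta in (0.5, 1.0, 6.0, 24.0):
    print(zeta, unit_square_sigmoid(beliefs, zeta))
```
</preformat>
<p>For &#x003B6; &#x0003D; 1 the function returns the belief itself (probability matching), and for large &#x003B6; it approaches a step function at 0.5.</p>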
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>The unit square sigmoid (cf. Equations 43, 44)</bold>. The parameter &#x003B6; can be interpreted as inverse response noise because the sigmoid approaches a step function as &#x003B6; approaches infinity.</p></caption>
<graphic xlink:href="fnhum-08-00825-g0004.tif"/>
</fig>
</sec>
<sec>
<title>Inversion example</title>
<p>To illustrate how real datasets can be inverted and different response models compared, we take the data of one subject from Iglesias et al. (<xref ref-type="bibr" rid="B22">2013</xref>). These data consist of 320 inputs <italic>u</italic> and responses <italic>y</italic>. Our perceptual model is the three-level HGF for binary outcomes without sensory noise, and our first choice of decision model is the unit-square sigmoid of Equation 44. Using the HGF Toolbox (<ext-link ext-link-type="uri" xlink:href="http://www.translationalneuromodeling.com/tapas">http://www.translationalneuromodeling.com/tapas</ext-link>), we specify the following priors (mean, variance) in appropriately transformed spaces: &#x003BC;<sup>(0)</sup><sub>2</sub>: (0, 1), &#x003C3;<sup>(0)</sup><sub>2</sub>: (0, 1), &#x003BC;<sup>(0)</sup><sub>3</sub>: (1, 0), &#x003C3;<sup>(0)</sup><sub>3</sub>: (0, 1), &#x003BA;: (0, 2), &#x003C9;: (&#x02212;4, 0), &#x003D1;: (0, 2), and &#x003B6;: (48, 1). The variance of 0 on &#x003BC;<sup>(0)</sup><sub>3</sub> and &#x003C9; fixes these parameters to 1 and &#x02212;4, respectively. The &#x003C3;<sup>(0)</sup><sub>i</sub> and &#x003B6; were estimated in log-transformed space, &#x003BA; in a logit-transformed space with upper bound 6, and &#x003D1; in a logit-transformed space with upper bound 0.005.</p>
<p>We now modify our response model so that its inverse decision temperature is no longer a constant free parameter (&#x003B6;) but the inverse volatility estimate exp (&#x02212; &#x003BC;<sub>3</sub>):
<disp-formula id="E29"><label>(45)</label><mml:math id="M29"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover 
accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>y</mml:mi></mml:msup><mml:mo>&#x000B7;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover 
accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>This means that the more volatile the agent believes its environment to be, the less deterministically it will behave. Because this decision model contains &#x003BC;<sub>3</sub>, it permits us to estimate all parameters, including &#x003BC;<sup>(0)</sup><sub>3</sub> and &#x003C9;. Accordingly, we increase the variance of their priors to 4. The result of this inversion is shown in Figure <xref ref-type="fig" rid="F5">5</xref>. This figure illustrates how the HGF deals with perceptual uncertainty by updating beliefs throughout its hierarchy on the basis of precision-weighted prediction errors. The learning rates &#x003C8;<sub><italic>i</italic></sub> (defined in the figure caption) are adjusted continually and separately at each level, which provides the flexibility needed to adapt to changes in outcome tendency and volatility (i.e., to perceptual uncertainty).</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>Model inversion</bold>. Maximum-a-posteriori parameter estimates are &#x003BC;<sup>(0)</sup><sub>2</sub> &#x0003D; 0.87, &#x003C3;<sup>(0)</sup><sub>2</sub> &#x0003D; 1.20, &#x003BC;<sup>(0)</sup><sub>3</sub> &#x0003D; &#x02212;0.65, &#x003C3;<sup>(0)</sup><sub>3</sub> &#x0003D; 0.88, &#x003BA; &#x0003D; 1.32, &#x003C9; &#x0003D; &#x02212;0.71, and &#x003D1; &#x0003D; 0.0023. These parameter values correspond to the following trajectories: <bold>(A)</bold> Posterior expectation &#x003BC;<sub>3</sub> of log-volatility <italic>x</italic><sub>3</sub>. <bold>(B)</bold> Precision weight <inline-formula><mml:math id="M89"><mml:mrow><mml:msub><mml:mi>&#x003C8;</mml:mi><mml:mn>3</mml:mn></mml:msub><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C0;</mml:mi><mml:mn>3</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:math></inline-formula>, which modulates the impact of prediction error &#x003B4;<sub>2</sub> on log-volatility updates. <bold>(C)</bold> Volatility prediction error &#x003B4;<sub>2</sub>. <bold>(D)</bold> Posterior expectation &#x003BC;<sub>2</sub> of tendency <italic>x</italic><sub>2</sub>. <bold>(E)</bold> Precision weight <inline-formula><mml:math id="M90"><mml:mrow><mml:msub><mml:mi>&#x003C8;</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> (in green), which modulates the impact of input prediction error &#x003B4;<sub>1</sub> on &#x003BC;<sub>2</sub>. 
Since &#x003BC;<sub>2</sub> is in logit space, the function of &#x003C3;<sub>2</sub> as a dynamic learning rate is more easily visible after transformation to <italic>x</italic><sub>1</sub>-space. This results in the red line labeled <inline-formula><mml:math id="M91"><mml:mrow><mml:mi>q</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003C8;</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:msup><mml:mi>s</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>&#x003C8;</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. <bold>(F)</bold> Prediction error &#x003B4;<sub>1</sub> about input <italic>u</italic>. (In Iglesias et al., <xref ref-type="bibr" rid="B22">2013</xref>, Figures S1 and S2, &#x003B4;<sub>1</sub> is defined as an outcome prediction error, which corresponds to the absolute value of &#x003B4;<sub>1</sub> as defined here). <bold>(G)</bold> Black: true probability of input 1. Red: posterior expectation of input <italic>u</italic> &#x0003D; 1, <inline-formula><mml:math id="M92"><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub>1</sub>; this corresponds to a sigmoid transformation of &#x003BC;<sub>2</sub> in <bold>(D)</bold>. Green: sensory input. Orange: subject&#x00027;s observed decisions.</p></caption>
<graphic xlink:href="fnhum-08-00825-g0005.tif"/>
</fig>
<p>Under the Laplace assumption (Friston et al., <xref ref-type="bibr" rid="B18">2007</xref>), the negative variational free energy, an approximation to the log-model evidence, is &#x02212;196.19 for the first response model and &#x02212;188.84 for the second. This corresponds to a Bayes factor of exp (&#x02212;188.84 &#x02212; (&#x02212;196.19)) &#x0003D; 1556, giving the second model a decisive advantage despite the fact that it contains an additional free parameter. In this example, including a measure of higher-level uncertainty has clearly improved our model of a subject&#x00027;s learning and decision-making.</p>
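<p>The Bayes factor quoted above follows directly from the two free-energy values, as this short check illustrates. The variable names are illustrative only.</p>
<preformat>
```python
import math

# Negative variational free energies (approximate log-model evidences)
# reported in the text for the two response models.
log_evidence_model1 = -196.19  # constant zeta as inverse decision temperature
log_evidence_model2 = -188.84  # exp(-mu_3) as inverse decision temperature

# The log-Bayes factor is the difference of the log-evidences.
log_bayes_factor = log_evidence_model2 - log_evidence_model1
bayes_factor = math.exp(log_bayes_factor)

print(round(log_bayes_factor, 2), round(bayes_factor))
```
</preformat>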
<p>Another possible choice for the inverse decision temperature is <inline-formula><mml:math id="M93"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub>2</sub> (cf. Paliwal et al., <xref ref-type="bibr" rid="B33">2014</xref>). This choice is interesting because it is similar to the hypothesis (Friston et al., <xref ref-type="bibr" rid="B19">2013</xref>; Schwartenbeck et al., <xref ref-type="bibr" rid="B37">2013</xref>) that precision serves as the inverse decision temperature in active inference. With a negative variational free energy of &#x02212; 189.63, this model performs similarly to the one with exp (&#x02212; &#x003BC;<sub>3</sub>) as the inverse decision temperature. This similarity in performance is not surprising since <inline-formula><mml:math id="M94"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub>2</sub> is (inversely) driven to a large extent by &#x003BC;<sub>3</sub> (cf. Equations 11 and 13).</p>
</sec>
<sec>
<title>Decision model for a one-armed bandit</title>
<p>As an additional example, we discuss a more complex binary decision task that we used to collect data from human subjects (Cole et al., in preparation). In this variant of a one-armed bandit experiment, subjects were asked to play a series of gambles with the goal of maximizing their overall score (Figure <xref ref-type="fig" rid="F6">6</xref>). On each trial, subjects chose between two options represented by the same two fractals, which had different and time-varying reward probabilities. At any point in time, these probabilities summed to unity, implying that exactly one of the two options would be rewarded. Although subjects knew that probabilities varied throughout the course of the experiment, they were not told the schedule that governed these changes. The schedule included both a period of low volatility and a period of high volatility (Figure <xref ref-type="fig" rid="F6">6</xref>), similar to the task used by Behrens et al. (<xref ref-type="bibr" rid="B3">2007</xref>).</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><bold>One-armed bandit task</bold>. Participants were engaged in a simple decision-making task. Each trial consisted of four phases. (i) Cue phase. Two cards and their costs were displayed. (ii) Decision phase. Once the subject had made a decision, the chosen card was highlighted. (iii) Outcome phase. The outcome of a decision was displayed, and added to the score bar only if the chosen card was rewarded. (iv) Inter-trial interval (ITI). The screen only showed the score bar, until the beginning of the next trial. Our experimental paradigm consisted of a number of phases with different reward structures. Different phase lengths induced both a phase of low volatility (trials 1 through 90) and a phase of high volatility (trials 91 through 160).</p></caption>
<graphic xlink:href="fnhum-08-00825-g0006.tif"/>
</fig>
<p>In order to encourage subjects to switch options above and beyond normal exploration behavior (Macready and Wolpert, <xref ref-type="bibr" rid="B28">1998</xref>), the two cards were associated with varying reward magnitudes. On each trial, magnitudes were drawn from a discrete uniform distribution <inline-graphic xlink:href="fnhum-08-00825-i0003.tif"/>(1, 9) (i.e., rewards would take values from the range {1, 2,&#x02026;, 9} with equal probability).</p>
<p>Subjects began the experiment with an initial score of 0 points. Once a card had been chosen, if that card was rewarded, the associated reward would be added to the current score. The final score at the end of the experiment was translated into monetary reimbursement. The experiment consisted of 160 trials.</p>
<p>Calling the two fractals A and B, we parameterize the agent&#x00027;s response by
<disp-formula id="E30"><label>(46)</label><mml:math id="M30"><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>0</mml:mn><mml:mtext>&#x02009;for&#x02009;choice&#x02009;A</mml:mtext></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>1</mml:mn><mml:mtext>&#x02009;for&#x02009;choice&#x02009;B</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula></p>
<p>Correspondingly, the state <italic>x</italic><sub>1</sub> is
<disp-formula id="E31"><label>(47)</label><mml:math id="M31"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>0</mml:mn><mml:mtext>&#x02009;if&#x02009;A&#x02009;rewarded</mml:mtext></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>1</mml:mn><mml:mtext>&#x02009;if&#x02009;B&#x02009;rewarded</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula></p>
<p>Taking <italic>r<sub>A</sub></italic> and <italic>r<sub>B</sub></italic> to be the rewards for A and B, respectively, we introduce the quadratic loss function
<disp-formula id="E32"><label>(48)</label><mml:math id="M32"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>&#x02113;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>A</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>B</mml:mi></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo 
stretchy='false'>(</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>B</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>A</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>A</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>This corresponds to the following loss table:</p>
<graphic xlink:href="fnhum-08-00825-e0017.tif"/>
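<p>As a quick check of Equation 48, the loss can be evaluated for the four combinations of state and choice. This is a sketch only; the reward magnitudes used here are arbitrary illustrative values.</p>
<preformat>
```python
def loss(x1, y, r_a, r_b):
    """Quadratic loss of Equation 48 (second line).

    x1: 0 if A is rewarded, 1 if B is rewarded.
    y:  0 for choice A, 1 for choice B.
    """
    return -((y + x1 - 1) ** 2) * ((r_b - r_a) * x1 + r_a)


# Arbitrary illustrative reward magnitudes from the set {1, ..., 9}.
r_a, r_b = 3, 7
for x1 in (0, 1):
    for y in (0, 1):
        print(f"x1={x1}, y={y}: loss={loss(x1, y, r_a, r_b)}")
```
</preformat>
<p>The loss equals the negative reward when the chosen option is the rewarded one and is zero otherwise.</p>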
<p>Following the same procedure as above, we get:</p>
<graphic xlink:href="fnhum-08-00825-e0018.tif"/>
<p>With the expected loss from each option on a continuous scale, a simple but powerful decision model is the softmax rule (Sutton and Barto, <xref ref-type="bibr" rid="B40">1998</xref>; Daw et al., <xref ref-type="bibr" rid="B11">2006</xref>)</p>
<graphic xlink:href="fnhum-08-00825-e0019.tif"/>
<p>where <italic>y<sub>i</sub></italic> is one particular option and the sum runs over all options. This means that the decision probabilities are Boltzmann-distributed according to their expected rewards (i.e., their expected negative losses), with the parameter &#x003B6; serving as the analogue of inverse temperature. In our binary case, this evaluates to
<disp-formula id="E33"><label>(52)</label><mml:math id="M33"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003BB;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B6;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>+</mml:mo><mml:mi>&#x003B6;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>B</mml:mi></mml:msub><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>A</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003BB;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B6;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B6;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>B</mml:mi></mml:msub><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>A</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>This is a logistic sigmoid function of the difference <italic>r<sub>B</sub></italic> <inline-formula><mml:math id="M95"><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub>1</sub> &#x02212; <italic>r<sub>A</sub></italic> (1 &#x02212; <inline-formula><mml:math id="M96"><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub>1</sub>), that is, of the expected reward for choice B minus the expected reward for choice A. If the expected reward of choice B exceeds that of choice A, the probability of choice B is greater than one half, and vice versa.</p>
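<p>The binary softmax rule can be sketched as follows, assuming (as in the verbal description above) that choice B becomes more likely as its expected reward exceeds that of choice A. The function names and numerical values are illustrative assumptions.</p>
<preformat>
```python
import math


def logistic(z):
    """Logistic sigmoid s(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def p_choose_b(mu1_hat, r_a, r_b, zeta):
    """Probability of choice B (y = 1) under the binary softmax rule.

    mu1_hat: predicted probability that B is rewarded.
    r_a, r_b: reward magnitudes of options A and B.
    zeta: inverse decision temperature.
    """
    expected_b = r_b * mu1_hat
    expected_a = r_a * (1.0 - mu1_hat)
    return logistic(zeta * (expected_b - expected_a))


# Equal belief in both options, but B offers the larger reward.
print(p_choose_b(0.5, r_a=2, r_b=8, zeta=0.5))
```
</preformat>
<p>With equal expected rewards the function returns exactly one half, whatever the value of &#x003B6;.</p>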
</sec>
<sec>
<title>Simulation study</title>
<p>Given the nontrivial nature of our model, it is important to verify the robustness of the inversion scheme. This, in turn, may depend on the numerical optimization method employed. To assess our ability to estimate the parameters under different optimization schemes, we conducted a systematic simulation study based on the 3-level HGF for binary outcomes. This model is shown graphically in Figure <xref ref-type="fig" rid="F2">2</xref> and was the basis for the studies of Vossel et al. (<xref ref-type="bibr" rid="B41">2013</xref>) and Iglesias et al. (<xref ref-type="bibr" rid="B22">2013</xref>). &#x003BA; was chosen as the perceptual parameter to vary because of the interesting effects it has on the nature of the inferential process (cf. Mathys et al., <xref ref-type="bibr" rid="B29">2011</xref>). The response parameter &#x003B6; was chosen as the second parameter to vary because it represents inverse response noise (cf. Equation 20), i.e., for lower values of &#x003B6; the mapping from beliefs to responses becomes less deterministic, which renders it more difficult to estimate the perceptual parameters.</p>
<p>Simulations took place in four steps:</p>
<list list-type="order">
<list-item><p>We chose a particular sequence of 320 binary inputs <italic>u</italic> &#x0003D; {<italic>u</italic><sup>(1)</sup>, &#x02026;, <italic>u</italic><sup>(320)</sup>}; this was the input sequence in a recent study using the HGF (Iglesias et al., <xref ref-type="bibr" rid="B22">2013</xref>).</p></list-item>
<list-item><p>We chose a particular set of values for the parameters &#x003BE;.</p></list-item>
<list-item><p>We generated 320 binary responses <italic>y</italic> &#x0003D; {<italic>y</italic><sup>(1)</sup>, &#x02026;, <italic>y</italic><sup>(320)</sup>} by drawing from the response distribution given by Equation 44.</p></list-item>
<list-item><p>We estimated &#x003BE;<sup>&#x0002A;</sup> according to Equation 21.</p></list-item>
</list>
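<p>Step 3 of the procedure above can be sketched as follows. In the actual simulations the belief trajectory is produced by the HGF; here a hypothetical sinusoidal trajectory stands in for it, and the function name is an illustrative assumption.</p>
<preformat>
```python
import numpy as np


def simulate_responses(mu1_hat, zeta, rng):
    """Draw binary responses from the unit-square sigmoid (Equation 44)."""
    mu1_hat = np.asarray(mu1_hat, dtype=float)
    p_y1 = mu1_hat ** zeta / (mu1_hat ** zeta + (1.0 - mu1_hat) ** zeta)
    # One Bernoulli draw per trial.
    return rng.binomial(1, p_y1)


rng = np.random.default_rng(0)

# Hypothetical stand-in for the trajectory of predictions over 320 trials;
# it is NOT the output of the HGF.
trials = np.linspace(0.0, 4.0 * np.pi, 320)
mu1_hat = 0.5 + 0.4 * np.sin(trials)

y = simulate_responses(mu1_hat, zeta=6.0, rng=rng)
print(y[:20], y.mean())
```
</preformat>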
<p>Step 1 was only performed once, so that <italic>u</italic> was the same in all simulations. The values of &#x003BE; in step 2 were constant for all parameters except &#x003BA; and &#x003B6;. The values of &#x003BA; and &#x003B6; were taken from a two-dimensional grid in which the &#x003BA; dimension took the values {0.5, 1, 1.5, &#x02026;, 3.5} while the &#x003B6; dimension took the values {0.5, 1, 6, 24}. Steps 3 and 4 were then repeated 1000 times for each value pair on the {&#x003BA;, &#x003B6;} grid (for MCMC, owing to its computational burden, only 100 estimations were performed). The &#x003B6; values on the grid were chosen such that they covered the whole range from very low (&#x003B6; &#x0003D; 24) to very high response noise (&#x003B6; &#x0003D; 0.5, cf. Figure <xref ref-type="fig" rid="F4">4</xref>). The &#x003BA; values were chosen to cover the range observed in an empirical behavioral study using the same inputs <italic>u</italic> (Iglesias et al., <xref ref-type="bibr" rid="B22">2013</xref>). The remaining model parameters were held constant (&#x003C9; &#x0003D; &#x02212;4, &#x003D1; &#x0003D; 0.0025). In total, six parameters were estimated. These were (with the space they were estimated in and the prior mean and variance in that space in brackets): &#x003BC;<sup>(0)</sup><sub>2</sub> (native, 0, 1), &#x003C3;<sup>(0)</sup><sub>2</sub> (log, 0, 1), &#x003C3;<sup>(0)</sup><sub>3</sub> (log, 0, 1), &#x003BA; (logit with upper bound at 6, 0, 9), &#x003D1; (logit with upper bound at 0.005, 0, 9), and &#x003B6; (log, 48, 1). The prior mean of the response parameter &#x003B6; was chosen relatively high to provide shrinkage on the estimation of decision noise.</p>
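<p>To make the bounded logit spaces concrete, the following sketch maps between native and transformed space. It assumes that the logit with upper bound <italic>b</italic> is log(<italic>x</italic>/(<italic>b</italic> &#x02212; <italic>x</italic>)); this definition and the function names are assumptions made for illustration.</p>
<preformat>
```python
import math


def bounded_logit(x, upper):
    """Map x from the interval (0, upper) to the real line (assumed form)."""
    return math.log(x / (upper - x))


def inverse_bounded_logit(z, upper):
    """Map z from the real line back to the interval (0, upper)."""
    return upper / (1.0 + math.exp(-z))


# A prior mean of 0 in the transformed space sits at the midpoint of the
# bounded interval in native space.
print(inverse_bounded_logit(0.0, 6.0))    # kappa, upper bound 6
print(inverse_bounded_logit(0.0, 0.005))  # theta, upper bound 0.005
```
</preformat>
<p>Under this assumption, a prior mean of 0 corresponds to &#x003BA; &#x0003D; 3 and &#x003D1; &#x0003D; 0.0025 in native space.</p>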
<p>This procedure was repeated for four different optimization methods which are commonly used but possess different properties with regard to computational efficiency and robustness to getting trapped in local extrema:</p>
<list list-type="order">
<list-item><p>Nelder-Mead simplex algorithm (NMSA),</p></list-item>
<list-item><p>Gaussian process-based global optimization (GPGO),</p></list-item>
<list-item><p>Variational Bayes (VB),</p></list-item>
<list-item><p>Markov Chain Monte Carlo estimation (MCMC).</p></list-item>
</list>
<p>In brief, NMSA (Nelder and Mead, <xref ref-type="bibr" rid="B31">1965</xref>) is a popular local optimization algorithm which is implemented, for example, in the fminsearch function of Matlab. VB also optimizes locally (by gradient descent); for details see Bishop (<xref ref-type="bibr" rid="B4">2006</xref>, p. 461ff). For our simulation study, we used VB as implemented in the DAVB toolbox, available at <ext-link ext-link-type="uri" xlink:href="http://goo.gl/As8p7">http://goo.gl/As8p7</ext-link> (Daunizeau et al., <xref ref-type="bibr" rid="B9">2009</xref>, <xref ref-type="bibr" rid="B6">2014</xref>). In contrast, GPGO (Rasmussen and Williams, <xref ref-type="bibr" rid="B35">2006</xref>; Lomakina et al., <xref ref-type="bibr" rid="B27">2012</xref>) provides a global optimum of the objective function and is thus potentially more robust than NMSA and VB, albeit computationally more expensive. The final method was MCMC (Gelman et al., <xref ref-type="bibr" rid="B20">2003</xref>, p. 283ff), which served as a &#x0201C;gold standard&#x0201D; against which we compared the other methods. Specifically, we used Gibbs sampling with a one-dimensional Metropolis step for each of the parameters (cf. Gelman et al., <xref ref-type="bibr" rid="B20">2003</xref>, p. 292). For each of the 100 simulation runs (at each point on our parameter grid) we used one chain with a length of 500,000 samples and a burn-in period of 25,000 samples. In summary, our simulations thus consider two algorithms (NMSA and VB) which are computationally very efficient but provide a local optimum only, in comparison to another two algorithms (GPGO and MCMC) which are computationally more expensive but are capable of finding global optima.</p>
<p>All optimization methods could reliably distinguish different values of &#x003BA; at low or moderate decision noise (Figure <xref ref-type="fig" rid="F7">7</xref>). At higher noise levels, estimates became less reliable. With GPGO, VB, and MCMC, they then exhibited a tendency to underestimate &#x003BA;, while NMSA tended toward mid-range values. Nonetheless, substantial differences in &#x003BA; within the range tested could be detected by all four methods even at high levels of noise.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p><bold>Estimation of coupling &#x003BA; by four methods at different noise levels &#x003B6;</bold>. A range of &#x003BA; from 0.5 to 3.5 was chosen based on the range of estimates observed in the analysis of experimental data. Decision noise levels were chosen in a range from very high (0.5) to very low (24). The remaining model parameters were held constant (&#x003C9; &#x0003D; &#x02212;4, &#x003D1; &#x0003D; 0.0025). For each point of the resulting two-dimensional grid, 1000 task runs with 320 decisions each were simulated. Given the fixed sequence of inputs and simulated sequence of decisions, we then attempted to recover the model parameters, including &#x003BA; and &#x003B6;, by four estimation methods: (1) the Nelder-Mead simplex algorithm (NMSA), (2) Bayesian global optimization based on Gaussian processes (GPGO), (3) variational Bayes (VB), and (4) Markov chain Monte Carlo sampling (MCMC). The figure shows boxplots of the distributions of the maximum-a-posteriori (MAP) point estimates for the four methods at each grid point. Boxplots consist of boxes spanning the range from the 25th to the 75th percentile, circles at the median, and whiskers spanning the rest of the estimate range. Horizontal shifts within &#x003B6; levels are for readability. Black bars indicate ground truth.</p></caption>
<graphic xlink:href="fnhum-08-00825-g0007.tif"/>
</fig>
<p>The noise level itself could also be determined by all four methods (Figure <xref ref-type="fig" rid="F8">8</xref>). The methods did not differ appreciably in their performance. They all tended to underestimate the noise level because the prior on &#x003B6; induced mild shrinkage. Errors were smallest at moderate noise levels and increased toward both high and low noise (cf. Figure <xref ref-type="fig" rid="F9">9A2</xref>).</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p><bold>Estimation of noise level &#x003B6; at different levels of coupling &#x003BA;</bold>. &#x003B6; is estimated and displayed here on a logarithmic scale because it has a natural lower bound at 0. See Figure <xref ref-type="fig" rid="F7">7</xref> for key to legend. The figure shows boxplots of the distributions of the maximum-a-posteriori (MAP) point estimates for the four methods at each point of the simulation grid. Horizontal shifts within &#x003BA; levels are for readability. Black bars indicate ground truth.</p></caption>
<graphic xlink:href="fnhum-08-00825-g0008.tif"/>
</fig>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p><bold>Quantitative assessment of parameter estimation</bold>. <bold>(A)</bold> Root mean squared error of MAP estimates by noise level &#x003B6; for all four estimation methods (see Figure <xref ref-type="fig" rid="F7">7</xref> for key to legends). <bold>(A1)</bold> Estimates for &#x003BA; improve with decreasing noise and do not exhibit substantial differences between methods, although NMSA is somewhat better at very high noise. <bold>(A2)</bold> As in Figure <xref ref-type="fig" rid="F8">8</xref>, estimates for &#x003B6; were assessed on a logarithmic scale. <bold>(B)</bold> Confidence of VB and MCMC. <bold>(B1)</bold> Both methods are realistically confident about their inference on &#x003BA; across noise levels, with a slight tendency toward overconfidence with higher noise. <bold>(B2)</bold> This tendency is more pronounced with estimates of &#x003B6;.</p></caption>
<graphic xlink:href="fnhum-08-00825-g0009.tif"/>
</fig>
<p>Figures <xref ref-type="fig" rid="F9">9A1,A2</xref> show the root mean squared error in &#x003BA; and log(&#x003B6;), jointly for all values of &#x003BA;. The results show that the noise level could best be estimated at moderate levels, which is where most estimates from experimental data are in fact found. Again, the methods perform comparably well, with NMSA best at high noise. Figures <xref ref-type="fig" rid="F9">9B1,B2</xref> contrast the performance of VB and MCMC and display the accuracy of the confidence with which VB and MCMC make their estimates. To this end, they use the fact that VB and MCMC estimate the whole posterior distribution. Parameter estimates can therefore be summarized not only as point estimates, but also as posterior central intervals (PCIs; the 95% PCI is the interval that excludes 2.5% of the posterior probability mass on either side). If an estimation method were neither over- nor underconfident, 95% of its 95% PCIs would contain the true parameter value. If the proportion is less than 0.95, this indicates overconfidence; if it is greater than 0.95, underconfidence. Both methods were realistically confident about their inference on &#x003BA; across noise levels, with a slight tendency toward overconfidence with higher noise. This tendency was more pronounced with estimates of &#x003B6;.</p>
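<p>The calibration check described above can be illustrated with a minimal Python sketch. It is not the simulation code used in this study: the function names, the toy conjugate-Gaussian setting, and the parameter values are assumptions made purely for illustration. Because the posterior in this toy case is exact, the proportion of 95% PCIs containing the true value should be close to 0.95, whereas artificially narrowed intervals mimic an overconfident method.</p>
<preformat>
```python
import random
from statistics import NormalDist


def pci_coverage(truths, lowers, uppers):
    """Fraction of intervals [lower, upper] that contain the true value."""
    hits = sum(1 for t, lo, hi in zip(truths, lowers, uppers) if hi >= t >= lo)
    return hits / len(truths)


def simulate_coverage(n_runs=5000, pi_x=1.0, pi_u=4.0, width_scale=1.0, seed=0):
    """Toy check with an exact Gaussian posterior (hypothetical setting).

    width_scale below 1 narrows the intervals to mimic overconfidence.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)  # half-width multiplier of a 95% central interval
    truths, lowers, uppers = [], [], []
    for _ in range(n_runs):
        x = rng.gauss(0.0, pi_x ** -0.5)   # true value drawn from the prior (mean 0)
        u = rng.gauss(x, pi_u ** -0.5)     # one noisy observation of x
        pi_post = pi_x + pi_u              # exact posterior precision
        mu_post = pi_u / pi_post * u       # exact posterior mean for prior mean 0
        half = width_scale * z * pi_post ** -0.5
        truths.append(x)
        lowers.append(mu_post - half)
        uppers.append(mu_post + half)
    return pci_coverage(truths, lowers, uppers)


calibrated = simulate_coverage()
overconfident = simulate_coverage(width_scale=0.5)
print("calibrated coverage:", calibrated)
print("overconfident coverage:", overconfident)
```
</preformat>
<p>In this sketch, a coverage proportion well below 0.95 (second call) corresponds to what the text calls overconfidence.</p>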
</sec>
</sec>
<sec sec-type="discussion" id="s2">
<title>Discussion</title>
<p>In this paper, we have shown that the hierarchical Bayesian model of Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>) can be extended in several ways, resulting in a general framework referred to as the HGF. Furthermore, we have demonstrated how the HGF can be combined with decision models to allow for parameter estimation from empirical data. We start by discussing the nature of the HGF updates in the context of Bayesian inference.</p>
<p>A crucial feature of the HGF&#x00027;s update equations is emphasized by the notation used in Equation 9: the updates of the means are <italic>precision-weighted prediction errors</italic>. For a full understanding of their role, we will first discuss Bayesian updates in the simplest possible case, where they can be calculated exactly. In this simplest case, there is only one hidden state <italic>x</italic> &#x02208; &#x0211D; that is the target of our inference, and there is a Gaussian prior on <italic>x</italic>:</p>
<graphic xlink:href="fnhum-08-00825-e0020.tif"/>
<p>where &#x003BC;<sub><italic>x</italic></sub> is the mean and &#x003C0;<sub><italic>x</italic></sub> the precision. The likelihood of <italic>x</italic> (i.e., the probability of observing the datum <italic>u</italic> &#x02208; &#x0211D; given <italic>x</italic>) is also Gaussian, with precision (inverse observation noise) &#x003C0;<sub><italic>u</italic></sub>:</p>
<graphic xlink:href="fnhum-08-00825-e0021.tif"/>
<p>According to Bayes&#x00027; theorem, the posterior is now also Gaussian:</p>
<graphic xlink:href="fnhum-08-00825-e0022.tif"/>
<p>The posterior precision &#x003C0;<sub><italic>x</italic>|<italic>u</italic></sub> and mean &#x003BC;<sub><italic>x</italic>|<italic>u</italic></sub> can be written as the following analytical and exact one-step updates:
<disp-formula id="E34"><label>(56)</label><mml:math id="M34"><mml:mrow><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>&#x003C0;</mml:mi><mml:mi>u</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C0;</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>u</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="E35"><label>(57)</label><mml:math id="M35"><mml:mrow><mml:msub><mml:mi>&#x003C0;</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x003C0;</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003C0;</mml:mi><mml:mi>u</mml:mi></mml:msub></mml:mrow></mml:math></disp-formula></p>
<p>The update in the mean is a precision-weighted prediction error. The prediction error <italic>u</italic> &#x02212; &#x003BC;<sub><italic>x</italic></sub> is weighted proportionally to the observation precision &#x003C0;<sub><italic>u</italic></sub>, reflecting the fact that the more observation noise there is, the less weight should be assigned to the prediction error. On the other hand, prediction error is weighted inversely proportionally to the posterior precision &#x003C0;<sub><italic>x</italic>|<italic>u</italic></sub>; that is, with higher certainty about <italic>x</italic>, the impact of any new information on its estimate becomes smaller.</p>
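<p>As a minimal numerical illustration of Equations 56 and 57, the following Python sketch applies the exact one-step update to the same datum under low and high observation precision. The function name and the numerical values are hypothetical and chosen only for illustration.</p>
<preformat>
```python
def gaussian_update(mu_x, pi_x, u, pi_u):
    """Exact Gaussian update of a prior (mu_x, pi_x) by a datum u with precision pi_u."""
    pi_post = pi_x + pi_u                          # Equation 57
    mu_post = mu_x + pi_u / pi_post * (u - mu_x)   # Equation 56
    return mu_post, pi_post


# Same prior and same datum, but different observation precision (illustrative values)
mu_noisy, pi_noisy = gaussian_update(mu_x=0.0, pi_x=1.0, u=2.0, pi_u=0.25)
mu_precise, pi_precise = gaussian_update(mu_x=0.0, pi_x=1.0, u=2.0, pi_u=4.0)
print("noisy observation:  ", mu_noisy, pi_noisy)
print("precise observation:", mu_precise, pi_precise)
```
</preformat>
<p>With the noisy observation the mean moves only a small fraction of the way toward the datum, while with the precise observation it moves most of the way, in line with the weighting described above.</p>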
<p>The same precision-weighting of prediction errors appears in the update of the means &#x003BC;<sub><italic>i</italic></sub> of the states <italic>x</italic><sub><italic>i</italic></sub> in the inversion of the general HGF (Figure <xref ref-type="fig" rid="F10">10</xref>, Equation 9):
<disp-formula id="E36"><label>(58)</label><mml:math id="M36"><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mfrac><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:msubsup><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
or, in more compact notation,
<disp-formula id="E37"><label>(59)</label><mml:math id="M37"><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x0221D;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C0;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p><bold>Posterior mean update equation</bold>. Updates are precision-weighted prediction errors. This general feature of Bayesian updating is concretized by the HGF for volatility predictions in a hierarchical setting.</p></caption>
<graphic xlink:href="fnhum-08-00825-g0010.tif"/>
</fig>
<p>Owing to the hierarchical nature of the HGF, the place of the likelihood precision &#x003C0;<sub><italic>u</italic></sub> in Equation 56 is here taken by the precision of the prediction on the level below, <inline-formula><mml:math id="M97"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub><italic>i</italic> &#x02212; 1</sub>, while the posterior precision &#x003C0;<sub><italic>i</italic></sub> in the HGF corresponds exactly to the posterior precision &#x003C0;<sub><italic>x</italic>|<italic>u</italic></sub> in Equation 56. The precision ratio in these updates is the <italic>learning rate</italic> with which the prediction error is weighted. Prediction errors weighted by a learning rate are a defining feature of many reinforcement learning models (e.g., Rescorla and Wagner, <xref ref-type="bibr" rid="B36">1972</xref>). The HGF furnishes a Bayesian foundation for these heuristically derived models in that it provides learning rates that are optimal given a particular agent&#x00027;s parameter setting. The numerator of the precision ratio in Equation 59 contains the precision of the prediction on the level below. This relation makes sense because the higher this precision, the more meaning a given prediction error has. The denominator of the ratio contains the precision of the belief about the level being updated. Again, it makes sense that the update should be inversely proportional to this precision, since the more certain the agent is that it knows the true value of <italic>x<sub>i</sub></italic>, the less inclined it should be to change it. 
What sets the HGF apart from other models with adaptive learning rates (e.g., Sutton, <xref ref-type="bibr" rid="B39">1992</xref>; Nassar et al., <xref ref-type="bibr" rid="B30">2010</xref>; Payzan-LeNestour and Bossaerts, <xref ref-type="bibr" rid="B34">2011</xref>; Wilson et al., <xref ref-type="bibr" rid="B42">2013</xref>) is that its update equations are derived to optimize a clearly defined objective function, variational free energy, thereby minimizing surprise. Furthermore, while the Kalman filter (Kalman, <xref ref-type="bibr" rid="B24">1960</xref>) is optimal for data generated by linear dynamical systems, the HGF has the advantage that it can deal with nonlinear systems because it adapts its volatility estimate as the data come in. This adaptive adjustment of learning rates corresponds to an optimal &#x0201C;forgetting&#x0201D; algorithm that prevents learning rates from becoming too low.</p>
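<p>The role of the precision ratio as a learning rate can be sketched in a few lines of Python that transcribe Equation 58. This is an illustration only, not the toolbox implementation: the function and argument names are hypothetical, and the numerical values are arbitrary rather than the result of a full HGF inversion.</p>
<preformat>
```python
def hgf_mean_update(muhat_i, pi_i, kappa_below, v_below, pihat_below, delta_below):
    """Posterior mean at level i as in Equation 58 (names are hypothetical).

    muhat_i      prediction of the mean at level i
    pi_i         posterior precision at level i
    kappa_below  coupling kappa_(i-1)
    v_below      environmental uncertainty v_(i-1)
    pihat_below  precision of the prediction at level i-1
    delta_below  volatility prediction error delta_(i-1)
    """
    weight = 0.5 * kappa_below * v_below * pihat_below / pi_i  # effective learning rate
    return muhat_i + weight * delta_below, weight


# Same prediction error, different certainty about x_i (arbitrary illustrative values)
mu_uncertain, w_uncertain = hgf_mean_update(1.0, 2.0, 1.0, 0.5, 4.0, 0.8)
mu_certain, w_certain = hgf_mean_update(1.0, 8.0, 1.0, 0.5, 4.0, 0.8)
print("low posterior precision: ", mu_uncertain, w_uncertain)
print("high posterior precision:", mu_certain, w_certain)
```
</preformat>
<p>Quadrupling the posterior precision in this example reduces the weight on the prediction error by a factor of four, which is the behavior described in the text.</p>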
<p>Notably, the prediction error &#x003B4;<sub><italic>i</italic> &#x02212; 1</sub> is a volatility prediction error (VOPE) in the HGF while the prediction errors in the single-level Gaussian updates Equation 56, like the first-level updates in the HGF Equation 15, refer to value prediction errors (VAPEs). While a VAPE captures the error about the magnitude of a hidden state, a VOPE captures the error about the amount of change in a hidden state. The crucial point here is that the levels of the HGF are linked via the variance (or, equivalently, precision) of the prediction of the next lower level. Consequently, the inversion proceeds by updating the higher level based on the variance (or volatility) of the lower level. This becomes apparent in Equation 14. The denominator of the fraction contains predicted uncertainty about the level below, while the numerator contains observed uncertainty. These can again be broken down into informational and environmental uncertainty (see below). Whenever observed uncertainty exceeds predicted, the fraction is greater than one and the VOPE is positive. Conversely, when observed uncertainty is less than predicted, the VOPE is negative.</p>
<p>The two sources of uncertainty, informational and environmental, are clearly visible in the precision of the predictions Equation 13 and in the VOPEs Equation 14. In Equations 13 and 14, &#x003C3;<sup>(<italic>k</italic> &#x02212; 1)</sup><sub><italic>i</italic></sub> is the informational posterior uncertainty about <italic>x</italic><sub><italic>i</italic></sub>, while <inline-formula><mml:math id="M98"><mml:mrow><mml:msubsup><mml:mi>v</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>+</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> is the environmental uncertainty, the magnitude of which is determined by a combination of two kinds of volatility: phasic <inline-formula><mml:math id="M99"><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>+</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula> and tonic (&#x003C9;<sub><italic>i</italic></sub>). The less we know about <italic>x<sub>i</sub></italic>, the greater the informational uncertainty &#x003C3;<sup>(<italic>k</italic> &#x02212; 1)</sup><sub>i</sub>; by contrast, the more volatile the environment is, the greater the environmental uncertainty <italic>v</italic><sup>(<italic>k</italic>)</sup><sub><italic>i</italic></sub>.</p>
<p>The relation between uncertainty (informational and environmental, expected and unexpected) and volatility (phasic and tonic) can be summarized as follows: informational uncertainty could be seen as a form of expected uncertainty which, however, differs from Yu and Dayan (<xref ref-type="bibr" rid="B44">2005</xref>) in that it is defined in terms of posterior variance instead of estimated deviation from certainty. By contrast, environmental uncertainty can be linked to &#x0201C;unexpected&#x0201D; uncertainty and is the result of phasic and tonic volatility. We use the term &#x0201C;environmental&#x0201D; instead of &#x0201C;unexpected&#x0201D; because, in the context of the HGF, unexpected uncertainty is incorporated into the precision of predictions (cf. Equation 13), i.e., there is always some degree of belief that the environment might be changing.</p>
<p>In a review of the literature on different kinds of uncertainty in human decision-making, Bland and Schaefer (<xref ref-type="bibr" rid="B5">2012</xref>) argue that unexpected uncertainty and volatility are often not sufficiently differentiated while Payzan-LeNestour and Bossaerts (<xref ref-type="bibr" rid="B34">2011</xref>) make a further subdistinction of unexpected uncertainty: they differentiate between stochastic volatility and a narrower concept of unexpected uncertainty. This distinction maps exactly onto the difference between tonic and phasic volatility in the HGF. While the Kalman Filter deals optimally with tonic/stochastic volatility, the HGF can also accommodate sudden environmental changes via phasic volatility. An illustration of this ability can be found in Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>), where the U.S. Dollar to Swiss Franc exchange rate time series from the first half of the year 2010 is filtered using the HGF (their Figure 11). In addition to tonic volatility, this time series also reflects a clear change point, i.e., the markets&#x00027; realization that Greece was insolvent. The latter is captured by the HGF in phasic volatility shooting up almost vertically.</p>
<p>Environmental uncertainty is updated by adjusting &#x003BC;<sub><italic>i</italic> &#x0002B; 1</sub>, the estimate of the next higher level. This is done in the VOPE by comparing predicted total uncertainty (informational plus environmental, &#x003C3;<sup>(<italic>k</italic> &#x02212; 1)</sup><sub><italic>i</italic></sub> &#x0002B; <italic>v</italic><sup>(<italic>k</italic>)</sup><sub><italic>i</italic></sub>) to observed total uncertainty <inline-formula><mml:math id="M100"><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula>. In this way, environmental uncertainty estimates are dynamically adapted to changes in the environment, leading to changes in learning rates that reflect an optimal (with respect to avoiding surprise) balance between informational and environmental uncertainty estimates.</p>
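<p>The comparison of predicted and observed total uncertainty can be sketched in Python as follows. The sketch assumes that the VOPE equals the ratio of observed to predicted total uncertainty minus one, which is what the sign behavior described above implies; the function names and all numerical values are hypothetical.</p>
<preformat>
```python
import math


def environmental_uncertainty(t, kappa_i, mu_above_prev, omega_i):
    """v_i = t * exp(kappa_i * mu_(i+1) + omega_i): phasic plus tonic volatility."""
    return t * math.exp(kappa_i * mu_above_prev + omega_i)


def vope(sigma_i, mu_i, muhat_i, sigma_i_prev, v_i):
    """Volatility prediction error, assumed here to be the uncertainty ratio minus one."""
    observed = sigma_i + (mu_i - muhat_i) ** 2   # observed total uncertainty
    predicted = sigma_i_prev + v_i               # informational plus environmental
    return observed / predicted - 1.0


# Illustrative values: tonic volatility omega = -2, no phasic contribution
v = environmental_uncertainty(t=1.0, kappa_i=1.0, mu_above_prev=0.0, omega_i=-2.0)

# A large belief shift signals more change than predicted; a small one signals less
vope_surprising = vope(sigma_i=0.1, mu_i=1.0, muhat_i=0.0, sigma_i_prev=0.1, v_i=v)
vope_calm = vope(sigma_i=0.1, mu_i=0.1, muhat_i=0.0, sigma_i_prev=0.1, v_i=v)
print("environmental uncertainty:", v)
print("VOPE after large shift:", vope_surprising)
print("VOPE after small shift:", vope_calm)
```
</preformat>
<p>A positive VOPE would push the volatility estimate at the next higher level up, and a negative VOPE would push it down.</p>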
<p>At first glance, the precision-weighting of prediction errors may seem different in the &#x0201C;classical&#x0201D; 3-level HGF (Figure <xref ref-type="fig" rid="F2">2</xref>) with categorical outcomes, where the update for &#x003BC;<sub>2</sub> (Equation 32) is:
<disp-formula id="E38"><label>(60)</label><mml:math id="M38"><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:msubsup><mml:mi>&#x003B4;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></disp-formula></p>
<p>At first, this simply looks like an uncertainty-weighted (not precision-weighted) update. However, if we unpack &#x003C3;<sub>2</sub> according to Equation 33 and do a Taylor expansion in powers of <inline-formula><mml:math id="M101"><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub>1</sub>, we see that it is again proportional to the precision of the prediction on the level below:
<disp-formula id="E39"><label>(61)</label><mml:math id="M39"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mtext>-</mml:mtext><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>3</mml:mn></mml:msup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;+</mml:mtext><mml:mi>O</mml:mi><mml:mtext>(4).</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
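<p>The expansion in Equation 61 can be checked numerically with a short Python sketch. The function names and test values are hypothetical; small values of the first-level prediction precision are used because the truncated series is only accurate when the product of the two precisions is well below one.</p>
<preformat>
```python
def sigma2_exact(pihat1, pihat2):
    """Closed form on the left-hand side of Equation 61."""
    return pihat1 / (pihat2 * pihat1 + 1.0)


def sigma2_series(pihat1, pihat2):
    """Expansion in powers of pihat1 up to third order, as in Equation 61."""
    return pihat1 - pihat2 * pihat1 ** 2 + pihat2 ** 2 * pihat1 ** 3


# Truncation error for decreasing pihat1 with pihat2 fixed at 1 (illustrative values)
test_values = (0.25, 0.1, 0.05)
errors = [abs(sigma2_exact(p, 1.0) - sigma2_series(p, 1.0)) for p in test_values]
for p, err in zip(test_values, errors):
    print("pihat1 =", p, "truncation error =", err)
```
</preformat>
<p>The truncation error shrinks rapidly as the expansion variable decreases, consistent with a remainder of fourth order.</p>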
<p>We have further shown a principled way of defining decision models based on perceptual HGF inferences, namely by deriving them from a loss function. Based on such decision models, it is possible to infer model parameters and state trajectories from observed decisions. In the simulation study reported here, we showed that, even with considerable decision noise, model parameters can be reliably inferred from a few hundred data points for binary decisions. Several recent studies have done this in practice, estimating subject-specific HGF parameters from behavioral data. For example, Vossel et al. (<xref ref-type="bibr" rid="B41">2013</xref>) used the HGF to model learning in human subjects performing a Posner task with varying outcome contingencies. This study sought to compare different possible explanations for measured eye movements (saccadic reaction speeds), using a factorial model space comprising three alternative perceptual and three different response models. Model comparison showed that the 3-level HGF had greater model evidence than simpler versions of itself and than Rescorla&#x02013;Wagner learning (Rescorla and Wagner, <xref ref-type="bibr" rid="B36">1972</xref>). This indicates that humans are capable of hierarchically structured learning, exploiting volatility estimates to adapt their learning rate dynamically. The same conclusion emerged from the study of Iglesias et al. (<xref ref-type="bibr" rid="B22">2013</xref>), who used the HGF to analyze human learning of auditory-visual associations that varied unpredictably in time. 
This study subsequently used the trial-wise estimates of precision-weighted prediction errors (i.e., <inline-formula><mml:math id="M102"><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C0;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) in fMRI analyses, demonstrating activation of the dopaminergic midbrain with first-level (i.e., sensory outcome) precision-weighted prediction errors, and activation of the cholinergic basal forebrain with second-level (i.e., probability) precision-weighted prediction errors. These findings resonate with recent proposals that an important aspect of neuromodulatory function is the encoding of precision (Friston, <xref ref-type="bibr" rid="B16">2009</xref>).</p>
<p>The HGF can, in principle, accommodate any form of loss function in the decision model. This choice will depend on the particular question addressed and the assumptions of the application domain (e.g., rationality assumptions). In the examples shown in this paper, we employ loss functions that are quadratic. This reflects the fact that squared losses imply a Gaussian distribution of errors, which is the appropriate choice where little is known about the true distribution because the Gaussian has maximum entropy (i.e., the least arbitrary assumptions) for a given mean and variance. This means that using a quadratic loss function is the most conservative choice in the absence of additional prior knowledge about the error distribution, where the term error refers to the agent&#x00027;s failure to make the choice that minimizes its expected loss.</p>
<p>Since all four methods of inverting the decision model performed well in our simulation study, we may focus on secondary criteria in choosing a method for practical applications. The most important of these criteria are the computational burden imposed and the amount of information contained in the estimate. The best performer in these respects is currently variational Bayes because it is efficient and provides an estimate of the whole posterior distribution for all parameters in addition to an approximation to the free energy bound on the log-model evidence, enabling model comparison. MCMC offers the same in principle, but at a considerably higher computational cost. GPGO is computationally more expensive than VB but may be a strong contender for future cases with multimodal posterior distributions. The weakest contender is NMSA because it is not much more efficient than VB but only offers a point estimate of the MAP parameter values.</p>
<p>In summary, the HGF provides a general and powerful framework for inferring on belief updating processes and learning styles of individual subjects in a volatile environment. This makes it a generic tool for studying perception in a Helmholtzian sense. The simple nature of the HGF updates, which take the form of precision-weighted prediction errors, not only enhances their biological interpretability and plausibility (cf. Friston, <xref ref-type="bibr" rid="B16">2009</xref>) but is also crucial for practical applications. The ability of the HGF to infer learning styles of individual subjects from behavioral data and its support of Bayesian model comparison offer interesting opportunities for studying individual differences and particularly for clinical studies on psychopathology. To facilitate such practical applications, we have developed a software toolbox based on Matlab that is freely available for downloading as part of the TAPAS collection at <ext-link ext-link-type="uri" xlink:href="http://www.translationalneuromodeling.org/tapas/">http://www.translationalneuromodeling.org/tapas/</ext-link>. The HGF toolbox is specifically tailored to the implementation of discrete-time filtering models, as opposed to the DAVB toolbox, which is mostly aimed at inverting dynamic models in continuous time. The HGF toolbox implements most of the models described in this article, plus some additional ones, and will be the focus of a forthcoming article.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</sec>
</body>
<back>
<ack>
<p>The authors gratefully acknowledge support by the UCL-MPS Initiative on Computational Psychiatry and Ageing Research (Christoph D. Mathys), the Ren&#x000E9; and Susanne Braginsky Foundation (Klaas E. Stephan), the Clinical Research Priority Program &#x0201C;Multiple Sclerosis&#x0201D; (Klaas E. Stephan), and the Wellcome Trust (Karl J. Friston).</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adams</surname> <given-names>R. A.</given-names></name> <name><surname>Stephan</surname> <given-names>K. E.</given-names></name> <name><surname>Brown</surname> <given-names>H. R.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name></person-group> (<year>2013</year>). <article-title>The computational anatomy of psychosis</article-title>. <source>Front. Schizophr</source>. <volume>4</volume>:<issue>47</issue>. <pub-id pub-id-type="doi">10.3389/fpsyt.2013.00047</pub-id><pub-id pub-id-type="pmid">23750138</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Behrens</surname> <given-names>T. E. J.</given-names></name> <name><surname>Hunt</surname> <given-names>L. T.</given-names></name> <name><surname>Woolrich</surname> <given-names>M. W.</given-names></name> <name><surname>Rushworth</surname> <given-names>M. F. S.</given-names></name></person-group> (<year>2008</year>). <article-title>Associative learning of social value</article-title>. <source>Nature</source> <volume>456</volume>, <fpage>245</fpage>&#x02013;<lpage>249</lpage>. <pub-id pub-id-type="doi">10.1038/nature07538</pub-id><pub-id pub-id-type="pmid">19005555</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Behrens</surname> <given-names>T. E. J.</given-names></name> <name><surname>Woolrich</surname> <given-names>M. W.</given-names></name> <name><surname>Walton</surname> <given-names>M. E.</given-names></name> <name><surname>Rushworth</surname> <given-names>M. F. S.</given-names></name></person-group> (<year>2007</year>). <article-title>Learning the value of information in an uncertain world</article-title>. <source>Nat. Neurosci</source>. <volume>10</volume>, <fpage>1214</fpage>&#x02013;<lpage>1221</lpage>. <pub-id pub-id-type="doi">10.1038/nn1954</pub-id><pub-id pub-id-type="pmid">17676057</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bishop</surname> <given-names>C. M.</given-names></name></person-group> (<year>2006</year>). <source>Pattern Recognition and Machine Learning</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bland</surname> <given-names>A. R.</given-names></name> <name><surname>Schaefer</surname> <given-names>A.</given-names></name></person-group> (<year>2012</year>). <article-title>Different varieties of uncertainty in human decision-making</article-title>. <source>Front. Neurosci</source>. <volume>6</volume>:<issue>85</issue>. <pub-id pub-id-type="doi">10.3389/fnins.2012.00085</pub-id><pub-id pub-id-type="pmid">22701401</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daunizeau</surname> <given-names>J.</given-names></name> <name><surname>Adam</surname> <given-names>V.</given-names></name> <name><surname>Rigoux</surname> <given-names>L.</given-names></name></person-group> (<year>2014</year>). <article-title>VBA: a probabilistic treatment of nonlinear models for neurobiological and behavioural data</article-title>. <source>PLoS Comput. Biol</source>. <volume>10</volume>:<fpage>e1003441</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003441</pub-id><pub-id pub-id-type="pmid">24465198</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daunizeau</surname> <given-names>J.</given-names></name> <name><surname>den Ouden</surname> <given-names>H. E. M.</given-names></name> <name><surname>Pessiglione</surname> <given-names>M.</given-names></name> <name><surname>Kiebel</surname> <given-names>S. J.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name> <name><surname>Stephan</surname> <given-names>K. E.</given-names></name></person-group> (<year>2010a</year>). <article-title>Observing the Observer (II): deciding when to decide</article-title>. <source>PLoS ONE</source> <volume>5</volume>:<fpage>e15555</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0015555</pub-id><pub-id pub-id-type="pmid">21179484</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daunizeau</surname> <given-names>J.</given-names></name> <name><surname>den Ouden</surname> <given-names>H. E. M.</given-names></name> <name><surname>Pessiglione</surname> <given-names>M.</given-names></name> <name><surname>Kiebel</surname> <given-names>S. J.</given-names></name> <name><surname>Stephan</surname> <given-names>K. E.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name></person-group> (<year>2010b</year>). <article-title>Observing the Observer (I): meta-bayesian models of learning and decision-making</article-title>. <source>PLoS ONE</source> <volume>5</volume>:<fpage>e15554</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0015554</pub-id><pub-id pub-id-type="pmid">21179480</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daunizeau</surname> <given-names>J.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name> <name><surname>Kiebel</surname> <given-names>S. J.</given-names></name></person-group> (<year>2009</year>). <article-title>Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models</article-title>. <source>Phys. D Nonlinear Phenom</source>. <volume>238</volume>, <fpage>2089</fpage>&#x02013;<lpage>2118</lpage>. <pub-id pub-id-type="doi">10.1016/j.physd.2009.08.002</pub-id><pub-id pub-id-type="pmid">19862351</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daw</surname> <given-names>N. D.</given-names></name> <name><surname>Doya</surname> <given-names>K.</given-names></name></person-group> (<year>2006</year>). <article-title>The computational neurobiology of learning and reward</article-title>. <source>Curr. Opin. Neurobiol</source>. <volume>16</volume>, <fpage>199</fpage>&#x02013;<lpage>204</lpage>. <pub-id pub-id-type="doi">10.1016/j.conb.2006.03.006</pub-id><pub-id pub-id-type="pmid">16563737</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daw</surname> <given-names>N. D.</given-names></name> <name><surname>O&#x00027;Doherty</surname> <given-names>J. P.</given-names></name> <name><surname>Dayan</surname> <given-names>P.</given-names></name> <name><surname>Seymour</surname> <given-names>B.</given-names></name> <name><surname>Dolan</surname> <given-names>R. J.</given-names></name></person-group> (<year>2006</year>). <article-title>Cortical substrates for exploratory decisions in humans</article-title>. <source>Nature</source> <volume>441</volume>, <fpage>876</fpage>&#x02013;<lpage>879</lpage>. <pub-id pub-id-type="doi">10.1038/nature04766</pub-id><pub-id pub-id-type="pmid">16778890</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dayan</surname> <given-names>P.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name> <name><surname>Neal</surname> <given-names>R. M.</given-names></name> <name><surname>Zemel</surname> <given-names>R. S.</given-names></name></person-group> (<year>1995</year>). <article-title>The Helmholtz machine</article-title>. <source>Neural Comput</source>. <volume>7</volume>, <fpage>889</fpage>&#x02013;<lpage>904</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1995.7.5.889</pub-id><pub-id pub-id-type="pmid">7584891</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Doya</surname> <given-names>K.</given-names></name> <name><surname>Ishii</surname> <given-names>S.</given-names></name> <name><surname>Pouget</surname> <given-names>A.</given-names></name> <name><surname>Rao</surname> <given-names>R. P. N.</given-names></name></person-group> (<year>2011</year>). <source>Bayesian Brain: Probabilistic Approaches to Neural Coding</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Evans</surname> <given-names>L. C.</given-names></name></person-group> (<year>2010</year>). <source>Partial Differential Equations</source>. <publisher-loc>Providence, RI</publisher-loc>: <publisher-name>American Mathematical Society</publisher-name>.</citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Faisal</surname> <given-names>A. A.</given-names></name> <name><surname>Selen</surname> <given-names>L. P. J.</given-names></name> <name><surname>Wolpert</surname> <given-names>D. M.</given-names></name></person-group> (<year>2008</year>). <article-title>Noise in the nervous system</article-title>. <source>Nat. Rev. Neurosci</source>. <volume>9</volume>, <fpage>292</fpage>&#x02013;<lpage>303</lpage>. <pub-id pub-id-type="doi">10.1038/nrn2258</pub-id><pub-id pub-id-type="pmid">18319728</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). <article-title>The free-energy principle: a rough guide to the brain?</article-title> <source>Trends Cogn. Sci</source>. <volume>13</volume>, <fpage>293</fpage>&#x02013;<lpage>301</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2009.04.005</pub-id><pub-id pub-id-type="pmid">19559644</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K. J.</given-names></name> <name><surname>Dolan</surname> <given-names>R. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Computational and dynamic models in neuroimaging</article-title>. <source>Neuroimage</source> <volume>52</volume>, <fpage>752</fpage>&#x02013;<lpage>765</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2009.12.068</pub-id><pub-id pub-id-type="pmid">20036335</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K.</given-names></name> <name><surname>Mattout</surname> <given-names>J.</given-names></name> <name><surname>Trujillo-Barreto</surname> <given-names>N.</given-names></name> <name><surname>Ashburner</surname> <given-names>J.</given-names></name> <name><surname>Penny</surname> <given-names>W.</given-names></name></person-group> (<year>2007</year>). <article-title>Variational free energy and the Laplace approximation</article-title>. <source>Neuroimage</source> <volume>34</volume>, <fpage>220</fpage>&#x02013;<lpage>234</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2006.08.035</pub-id><pub-id pub-id-type="pmid">17055746</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K.</given-names></name> <name><surname>Schwartenbeck</surname> <given-names>P.</given-names></name> <name><surname>FitzGerald</surname> <given-names>T.</given-names></name> <name><surname>Moutoussis</surname> <given-names>M.</given-names></name> <name><surname>Behrens</surname> <given-names>T.</given-names></name> <name><surname>Dolan</surname> <given-names>R. J.</given-names></name></person-group> (<year>2013</year>). <article-title>The anatomy of choice: active inference and agency</article-title>. <source>Front. Hum. Neurosci</source>. <volume>7</volume>:<issue>598</issue>. <pub-id pub-id-type="doi">10.3389/fnhum.2013.00598</pub-id><pub-id pub-id-type="pmid">24093015</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gelman</surname> <given-names>A.</given-names></name> <name><surname>Carlin</surname> <given-names>J. B.</given-names></name> <name><surname>Stern</surname> <given-names>H. S.</given-names></name> <name><surname>Rubin</surname> <given-names>D. B.</given-names></name></person-group> (<year>2003</year>). <source>Bayesian Data Analysis</source>. <publisher-loc>Boca Raton, FL</publisher-loc>: <publisher-name>Chapman &#x00026; Hall/CRC</publisher-name>.</citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Helmholtz</surname> <given-names>H.</given-names></name></person-group> (<year>1860</year>). <source>Handbuch der Physiologischen Optik</source>. English translation (1962): J. P. C. Southall. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Dover</publisher-name>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iglesias</surname> <given-names>S.</given-names></name> <name><surname>Mathys</surname> <given-names>C.</given-names></name> <name><surname>Brodersen</surname> <given-names>K. H.</given-names></name> <name><surname>Kasper</surname> <given-names>L.</given-names></name> <name><surname>Piccirelli</surname> <given-names>M.</given-names></name> <name><surname>den Ouden</surname> <given-names>H. E. M.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Hierarchical prediction errors in midbrain and basal forebrain during sensory learning</article-title>. <source>Neuron</source> <volume>80</volume>, <fpage>519</fpage>&#x02013;<lpage>530</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuron.2013.09.009</pub-id><pub-id pub-id-type="pmid">24139048</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Joffily</surname> <given-names>M.</given-names></name> <name><surname>Coricelli</surname> <given-names>G.</given-names></name></person-group> (<year>2013</year>). <article-title>Emotional valence and the free-energy principle</article-title>. <source>PLoS Comput. Biol</source>. <volume>9</volume>:<fpage>e1003094</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003094</pub-id><pub-id pub-id-type="pmid">23785269</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kalman</surname> <given-names>R. E.</given-names></name></person-group> (<year>1960</year>). <article-title>A new approach to linear filtering and prediction problems</article-title>. <source>J. Basic Eng</source>. <volume>82</volume>, <fpage>35</fpage>&#x02013;<lpage>45</lpage>.</citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Knill</surname> <given-names>D. C.</given-names></name> <name><surname>Pouget</surname> <given-names>A.</given-names></name></person-group> (<year>2004</year>). <article-title>The Bayesian brain: the role of uncertainty in neural coding and computation</article-title>. <source>Trends Neurosci</source>. <volume>27</volume>, <fpage>712</fpage>&#x02013;<lpage>719</lpage>. <pub-id pub-id-type="doi">10.1016/j.tins.2004.10.007</pub-id><pub-id pub-id-type="pmid">15541511</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>K&#x000F6;rding</surname> <given-names>K. P.</given-names></name> <name><surname>Wolpert</surname> <given-names>D. M.</given-names></name></person-group> (<year>2006</year>). <article-title>Bayesian decision theory in sensorimotor control</article-title>. <source>Trends Cogn. Sci</source>. <volume>10</volume>, <fpage>319</fpage>&#x02013;<lpage>326</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2006.05.003</pub-id><pub-id pub-id-type="pmid">16807063</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Lomakina</surname> <given-names>E.</given-names></name> <name><surname>Vezhnevets</surname> <given-names>A.</given-names></name> <name><surname>Mathys</surname> <given-names>C.</given-names></name> <name><surname>Brodersen</surname> <given-names>K. H.</given-names></name> <name><surname>Stephan</surname> <given-names>K. E.</given-names></name> <name><surname>Buhmann</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <source>Bayesian Global Optimization for Model-Based Neuroimaging</source>. HBM E-Poster. Available online at: <ext-link ext-link-type="uri" xlink:href="https://ww4.aievolution.com/hbm1201/index.cfm?do=abs.viewAbs&#x00026;abs=6320">https://ww4.aievolution.com/hbm1201/index.cfm?do&#x0003D;abs.viewAbs&#x00026;abs&#x0003D;6320</ext-link></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Macready</surname> <given-names>W. G.</given-names></name> <name><surname>Wolpert</surname> <given-names>D. H.</given-names></name></person-group> (<year>1998</year>). <article-title>Bandit problems and the exploration/exploitation tradeoff</article-title>. <source>IEEE Trans. Evol. Comput</source>. <volume>2</volume>, <fpage>2</fpage>&#x02013;<lpage>22</lpage>. <pub-id pub-id-type="doi">10.1109/4235.728210</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mathys</surname> <given-names>C.</given-names></name> <name><surname>Daunizeau</surname> <given-names>J.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name> <name><surname>Stephan</surname> <given-names>K. E.</given-names></name></person-group> (<year>2011</year>). <article-title>A Bayesian foundation for individual learning under uncertainty</article-title>. <source>Front. Hum. Neurosci</source>. <volume>5</volume>:<issue>39</issue>. <pub-id pub-id-type="doi">10.3389/fnhum.2011.00039</pub-id><pub-id pub-id-type="pmid">21629826</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nassar</surname> <given-names>M. R.</given-names></name> <name><surname>Wilson</surname> <given-names>R. C.</given-names></name> <name><surname>Heasly</surname> <given-names>B.</given-names></name> <name><surname>Gold</surname> <given-names>J. I.</given-names></name></person-group> (<year>2010</year>). <article-title>An approximately Bayesian delta-rule model explains the dynamics of belief updating in a changing environment</article-title>. <source>J. Neurosci</source>. <volume>30</volume>, <fpage>12366</fpage>&#x02013;<lpage>12378</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.0822-10.2010</pub-id><pub-id pub-id-type="pmid">20844132</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nelder</surname> <given-names>J. A.</given-names></name> <name><surname>Mead</surname> <given-names>R.</given-names></name></person-group> (<year>1965</year>). <article-title>A simplex method for function minimization</article-title>. <source>Comput. J</source>. <volume>7</volume>, <fpage>308</fpage>&#x02013;<lpage>313</lpage>. <pub-id pub-id-type="doi">10.1093/comjnl/7.4.308</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x00027;Doherty</surname> <given-names>J. P.</given-names></name> <name><surname>Hampton</surname> <given-names>A.</given-names></name> <name><surname>Kim</surname> <given-names>H.</given-names></name></person-group> (<year>2007</year>). <article-title>Model-based fMRI and its application to reward learning and decision making</article-title>. <source>Ann. N.Y. Acad. Sci</source>. <volume>1104</volume>, <fpage>35</fpage>&#x02013;<lpage>53</lpage>. <pub-id pub-id-type="doi">10.1196/annals.1390.022</pub-id><pub-id pub-id-type="pmid">17416921</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paliwal</surname> <given-names>S.</given-names></name> <name><surname>Petzschner</surname> <given-names>F.</given-names></name> <name><surname>Schmitz</surname> <given-names>A. K.</given-names></name> <name><surname>Tittgemeyer</surname> <given-names>M.</given-names></name> <name><surname>Stephan</surname> <given-names>K. E.</given-names></name></person-group> (<year>2014</year>). <article-title>A model-based analysis of impulsivity using a slot-machine gambling paradigm</article-title>. <source>Front. Hum. Neurosci</source>. <volume>8</volume>:<issue>428</issue>. <pub-id pub-id-type="doi">10.3389/fnhum.2014.00428</pub-id><pub-id pub-id-type="pmid">25071497</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Payzan-LeNestour</surname> <given-names>E.</given-names></name> <name><surname>Bossaerts</surname> <given-names>P.</given-names></name></person-group> (<year>2011</year>). <article-title>Risk, unexpected uncertainty, and estimation uncertainty: Bayesian learning in unstable settings</article-title>. <source>PLoS Comput. Biol</source>. <volume>7</volume>:<fpage>e1001048</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1001048</pub-id><pub-id pub-id-type="pmid">21283774</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rasmussen</surname> <given-names>C. E.</given-names></name> <name><surname>Williams</surname> <given-names>C. K. I.</given-names></name></person-group> (<year>2006</year>). <source>Gaussian Processes for Machine Learning</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rescorla</surname> <given-names>R. A.</given-names></name> <name><surname>Wagner</surname> <given-names>A. R.</given-names></name></person-group> (<year>1972</year>). <article-title>A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement,</article-title> in <source>Classical Conditioning II: Current Research and Theory</source>, eds <person-group person-group-type="editor"><name><surname>Black</surname> <given-names>A. H.</given-names></name> <name><surname>Prokasy</surname> <given-names>W. F.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Appleton-Century-Crofts</publisher-name>), <fpage>64</fpage>&#x02013;<lpage>99</lpage>.</citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schwartenbeck</surname> <given-names>P.</given-names></name> <name><surname>FitzGerald</surname> <given-names>T.</given-names></name> <name><surname>Dolan</surname> <given-names>R. J.</given-names></name> <name><surname>Friston</surname> <given-names>K.</given-names></name></person-group> (<year>2013</year>). <article-title>Exploration, novelty, surprise, and free energy minimization</article-title>. <source>Front. Psychol</source>. <volume>4</volume>:<issue>710</issue>. <pub-id pub-id-type="doi">10.3389/fpsyg.2013.00710</pub-id><pub-id pub-id-type="pmid">24109469</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stephan</surname> <given-names>K. E.</given-names></name> <name><surname>Tittgemeyer</surname> <given-names>M.</given-names></name> <name><surname>Kn&#x000F6;sche</surname> <given-names>T. R.</given-names></name> <name><surname>Moran</surname> <given-names>R. J.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name></person-group> (<year>2009</year>). <article-title>Tractography-based priors for dynamic causal models</article-title>. <source>Neuroimage</source> <volume>47</volume>, <fpage>1628</fpage>&#x02013;<lpage>1638</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2009.05.096</pub-id><pub-id pub-id-type="pmid">19523523</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sutton</surname> <given-names>R.</given-names></name></person-group> (<year>1992</year>). <article-title>Gain adaptation beats least squares?,</article-title> in <source>Proceedings of the 7th Yale Workshop on Adaptive and Learning Systems</source> (<publisher-loc>New Haven, CT</publisher-loc>), <fpage>161</fpage>&#x02013;<lpage>166</lpage>.</citation>
</ref>
<ref id="B40">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sutton</surname> <given-names>R. S.</given-names></name> <name><surname>Barto</surname> <given-names>A. G.</given-names></name></person-group> (<year>1998</year>). <source>Reinforcement Learning: An Introduction</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vossel</surname> <given-names>S.</given-names></name> <name><surname>Mathys</surname> <given-names>C.</given-names></name> <name><surname>Daunizeau</surname> <given-names>J.</given-names></name> <name><surname>Bauer</surname> <given-names>M.</given-names></name> <name><surname>Driver</surname> <given-names>J.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Spatial attention, precision, and Bayesian inference: a study of saccadic response speed</article-title>. <source>Cereb. Cortex</source> <volume>24</volume>, <fpage>1436</fpage>&#x02013;<lpage>1450</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhs418</pub-id><pub-id pub-id-type="pmid">23322402</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wilson</surname> <given-names>R. C.</given-names></name> <name><surname>Nassar</surname> <given-names>M. R.</given-names></name> <name><surname>Gold</surname> <given-names>J. I.</given-names></name></person-group> (<year>2013</year>). <article-title>A mixture of delta-rules approximation to Bayesian inference in change-point problems</article-title>. <source>PLoS Comput. Biol</source>. <volume>9</volume>:<fpage>e1003150</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003150</pub-id><pub-id pub-id-type="pmid">23935472</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>A.</given-names></name> <name><surname>Dayan</surname> <given-names>P.</given-names></name></person-group> (<year>2003</year>). <source>Expected and Unexpected Uncertainty: ACh and NE in the Neocortex</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://books.nips.cc/nips15.html">http://books.nips.cc/nips15.html</ext-link> (Accessed: July 29, 2013).</citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>A. J.</given-names></name> <name><surname>Dayan</surname> <given-names>P.</given-names></name></person-group> (<year>2005</year>). <article-title>Uncertainty, neuromodulation, and attention</article-title>. <source>Neuron</source> <volume>46</volume>, <fpage>681</fpage>&#x02013;<lpage>692</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuron.2005.04.026</pub-id><pub-id pub-id-type="pmid">15944135</pub-id></citation>
</ref>
</ref-list>
<app-group>
<app id="A1">
<title>Appendices</title>
<sec>
<title>(A) coupling between levels</title>
<p>Since <italic>f</italic>(<italic>x</italic>) is positive for all <italic>x</italic>, there must be a function <italic>g</italic>(<italic>x</italic>) of which it is the exponential.</p>
<disp-formula id="E40"><label>(A1)</label><mml:math id="M40"><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x0003E;</mml:mo><mml:mn>0</mml:mn><mml:mo>&#x000A0;</mml:mo><mml:mo>&#x02200;</mml:mo><mml:mo>&#x000A0;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>&#x021D2;</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mo>&#x02203;</mml:mo><mml:mo>&#x000A0;</mml:mo><mml:mi>g</mml:mi><mml:mo>:</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x000A0;</mml:mo><mml:mo>&#x02200;</mml:mo><mml:mo>&#x000A0;</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:math></disp-formula>
<p>Expanding <italic>g</italic>(<italic>x</italic>) then amounts to expanding the logarithm of <italic>f</italic>(<italic>x</italic>).</p>
<disp-formula id="E41"><label>(A2)</label><mml:math id="M41"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>g</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>g</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msup><mml:mi>g</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x000B7;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>a</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn>2</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>log</mml:mi><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:mi>log</mml:mi><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi>f</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mi>a</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>a</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac><mml:mo>&#x000B7;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>a</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>O</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>2</mml:mn><mml:mo 
stretchy='false'>)</mml:mo><mml:mo>=</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:munder><mml:munder><mml:mrow><mml:mfrac><mml:mrow><mml:msup><mml:mi>f</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mi>a</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>a</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy='true'>&#x0FE38;</mml:mo></mml:munder><mml:mrow><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mi>&#x003BA;</mml:mi></mml:mrow></mml:munder><mml:mo>&#x000B7;</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:munder><mml:munder><mml:mrow><mml:mi>log</mml:mi><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>a</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi>f</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo 
stretchy='true'>&#x0FE38;</mml:mo></mml:munder><mml:mrow><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mi>&#x003C9;</mml:mi></mml:mrow></mml:munder><mml:mo>+</mml:mo><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn>2</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:mi>&#x003BA;</mml:mi><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi><mml:mo>+</mml:mo><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn>2</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E42"><label>(A3)</label><mml:math id="M42"><mml:mrow><mml:mi>&#x021D2;</mml:mi><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02248;</mml:mo><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BA;</mml:mi><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>Given <italic>f</italic>(<italic>x</italic>), &#x003BA; and &#x003C9; are unique only for a given choice of expansion point <italic>a</italic>, except when <italic>f</italic>(<italic>x</italic>) is the exponential of a first-order polynomial, in which case the above approximation is exact. The greater the weight of higher-order terms in <italic>g</italic>(<italic>x</italic>), the less accurate the approximation will be far from the expansion point <italic>a</italic>. In ignoring second- and higher-order terms, we effectively restrict the HGF to first-order coupling functions (i.e., to coupling functions that are the exponentials of first-order polynomials). However, as this derivation shows, locally (i.e., for small variations of <italic>x</italic> around <italic>a</italic>), this restriction does not matter.</p>
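<p>The approximation in Equations A2 and A3 can be checked numerically. The following is a minimal sketch in Python and is not part of the derivation itself: the function names and the example function 1 &#x0002B; <italic>x</italic><sup>2</sup> are hypothetical choices made only for illustration.</p>
<preformat>
```python
import math


def log_linear_coefficients(f, df, a):
    """Coefficients (kappa, omega) of Eq. A2 for a positive f, expanded at a."""
    kappa = df(a) / f(a)                # kappa = f'(a) / f(a)
    omega = math.log(f(a)) - a * kappa  # omega = log f(a) - a * f'(a) / f(a)
    return kappa, omega


def approx(x, kappa, omega):
    """First-order coupling function of Eq. A3: exp(kappa * x + omega)."""
    return math.exp(kappa * x + omega)


def f(x):
    """Hypothetical positive function, chosen only for illustration."""
    return 1.0 + x * x


def df(x):
    """Derivative of the illustrative function f."""
    return 2.0 * x


def g(x):
    """Exponential of a first-order polynomial (also an illustrative choice)."""
    return math.exp(0.5 * x - 2.0)


def dg(x):
    """Derivative of g."""
    return 0.5 * math.exp(0.5 * x - 2.0)


a = 1.0  # expansion point (an assumption of this sketch)
kappa, omega = log_linear_coefficients(f, df, a)

# Relative error of the approximation near and far from the expansion point a.
rel_err_near = abs(approx(a + 0.01, kappa, omega) - f(a + 0.01)) / f(a + 0.01)
rel_err_far = abs(approx(a + 1.0, kappa, omega) - f(a + 1.0)) / f(a + 1.0)

# For the exponential of a first-order polynomial, the expansion recovers
# the polynomial's own coefficients, so the approximation is exact.
kappa_g, omega_g = log_linear_coefficients(g, dg, a)

print(kappa, omega, rel_err_near, rel_err_far, kappa_g, omega_g)
```
</preformat>
<p>Under these assumptions, the relative error is small close to <italic>a</italic> and grows with distance from it, whereas for the exponential of a first-order polynomial the recovered &#x003BA; and &#x003C9; are the coefficients of that polynomial.</p>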
</sec>
<sec>
<title>(B) variational inversion of the HGF</title>
<p>Variational inversion provides closed-form one-step update equations for the sufficient statistics that describe the agent&#x00027;s belief about the state of its environment. These update equations are derived in two steps. First, for a particular level <italic>i</italic>, 1 &#x0003C; <italic>i</italic> &#x0003C; <italic>n</italic>, of the hierarchy, a mean field approximation is introduced, where the distributions <italic>q</italic>(<italic>x</italic><sub><italic>j</italic></sub>) for all the other levels <italic>j</italic> &#x02260; <italic>i</italic> are assumed to be known and Gaussian, fully described by the known sufficient statistics {&#x003BC;<sub><italic>j</italic></sub>, &#x003C3;<sub><italic>j</italic></sub>}<sub><italic>j</italic>&#x02260;<italic>i</italic></sub>:</p>
<graphic xlink:href="fnhum-08-00825-e0023.tif"/>
<p>In the absence of sensory noise, <italic>x</italic><sub>1</sub> is directly observed; this means that &#x003BC;<sup>(<italic>k</italic>)</sup><sub>1</sub> &#x0003D; <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>1</sub> and &#x003C3;<sup>(<italic>k</italic>)</sup><sub>1</sub> &#x0003D; 0 for all <italic>k</italic>. The approximate posterior <inline-formula><mml:math id="M103"><mml:mover accent='true'><mml:mi>q</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula>(<italic>x</italic><sup>(<italic>k</italic>)</sup><sub><italic>i</italic></sub>) for level <italic>i</italic> at time <italic>k</italic>, given sensory input &#x003BC;<sup>(1 &#x02026; <italic>k</italic>)</sup><sub>1</sub> &#x0003D; {&#x003BC;<sup>(1)</sup><sub>1</sub>, &#x003BC;<sup>(2)</sup><sub>1</sub>, &#x02026;, &#x003BC;<sup>(<italic>k</italic>)</sup><sub>1</sub>} is then</p>
<graphic xlink:href="fnhum-08-00825-e0024.tif"/>
<p>with variational energy <italic>I</italic>,
<disp-formula id="E43"><label>(B3)</label><mml:math id="M43"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:mo>&#x0222B;</mml:mo><mml:mi>q</mml:mi></mml:mrow></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mi>&#x003C7;</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02026;</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>d</mml:mtext><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>q</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:mi>q</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mi>&#x003C7;</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x003D1;</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x0003C;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x0003C;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>We further unpack this to get</p>
<graphic xlink:href="fnhum-08-00825-e0025.tif"/>
<p>Performing this last integral, we obtain</p>
<graphic xlink:href="fnhum-08-00825-e0026.tif"/>
<p>This constitutes the &#x0201C;prediction step&#x0201D; of the update. Specifically, we let the random walk do its work by integrating out all states from the previous time point, thus ensuring that the mean field approximation only applies to current values of states. This is akin to the prediction step in the Kalman filter since it gives us the predictive densities for <italic>x</italic><sup>(<italic>k</italic>)</sup><sub><italic>i</italic></sub> given inputs up to time <italic>k</italic> &#x02212; 1.</p>
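<p>The prediction step can be sketched in a few lines of Python. The sketch assumes, as suggested by the denominator of the last term in Equation B6, that the predictive density for <italic>x</italic><sup>(<italic>k</italic>)</sup><sub><italic>i</italic></sub> keeps the previous posterior mean and adds the random-walk variance <italic>t</italic><sup>(<italic>k</italic>)</sup> exp(&#x003BA;<sub><italic>i</italic></sub>&#x003BC;<sup>(<italic>k</italic> &#x02212; 1)</sup><sub><italic>i</italic> + 1</sub> + &#x003C9;<sub><italic>i</italic></sub>) to the previous posterior variance. The function name <monospace>predict</monospace> and all numerical values are hypothetical.</p>
<preformat>
```python
import math


def predict(mu_prev, sigma_prev, mu_above_prev, kappa, omega, t=1.0):
    """Prediction step for one level of the hierarchy.

    mu_prev, sigma_prev: posterior mean and variance of level i at time k-1.
    mu_above_prev: posterior mean of level i+1 at time k-1.
    kappa, omega: coupling parameters of level i.
    t: time elapsed since the previous input.
    Returns the predictive mean and variance for level i at time k.
    """
    step_variance = t * math.exp(kappa * mu_above_prev + omega)
    mu_hat = mu_prev  # a Gaussian random walk leaves the mean unchanged
    sigma_hat = sigma_prev + step_variance
    return mu_hat, sigma_hat


mu_hat, sigma_hat = predict(mu_prev=0.3, sigma_prev=0.5,
                            mu_above_prev=1.0, kappa=1.0, omega=-2.0)
print(mu_hat, sigma_hat)
```
</preformat>
<p>Under these assumptions, a higher belief about the level above inflates the predictive variance, which is how volatility estimates propagate downward before the new input is processed.</p>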
<p>This now enables us to solve the integral in Equation B3, yielding (see Appendix C)
<disp-formula id="E44"><label>(B6)</label><mml:math id="M44"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:mi>ln</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mrow><mml:mo 
stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>Following the procedure of Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>), we can now calculate the mean and precision of the Gaussian posterior for <italic>x</italic><sup>(<italic>k</italic>)</sup><sub><italic>i</italic></sub>:
<disp-formula id="E45"><label>(B7)</label><mml:math id="M45"><mml:mrow><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mi>I</mml:mi><mml:mrow><mml:mo>&#x02032;</mml:mo><mml:mo>&#x02032;</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="E46"><label>(B8)</label><mml:math id="M46"><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msubsup><mml:mi>&#x003C0;</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:msup><mml:mi>I</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
where double and single primes denote the second and first derivatives, respectively. This gives us Equations 9 and 10.</p>
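<p>To make Equations B7 and B8 concrete, the following Python sketch evaluates the variational energy of Equation B6 term by term and applies the two update equations with central finite-difference derivatives. It is an illustration under stated assumptions rather than the procedure used to obtain Equations 9 and 10: the expansion point is taken to be the previous posterior mean &#x003BC;<sup>(<italic>k</italic> &#x02212; 1)</sup><sub><italic>i</italic></sub>, and the dictionary keys, function names, and numerical values are all hypothetical.</p>
<preformat>
```python
import math


def variational_energy(x, p):
    """Variational energy I(x_i), following the three terms of Equation B6."""
    v = p["t"] * math.exp(p["kappa_below"] * x + p["omega_below"])
    denom = p["sigma_below_prev"] + v
    surprise = p["sigma_below"] + (p["mu_below"] - p["mu_below_prev"]) ** 2
    pred_var = p["sigma_prev"] + p["t"] * math.exp(
        p["kappa"] * p["mu_above_prev"] + p["omega"])
    return (-0.5 * math.log(denom)
            - 0.5 * surprise / denom
            - 0.5 * (x - p["mu_prev"]) ** 2 / pred_var)


def gaussian_update(energy, mu_hat, h=1e-4):
    """Equations B7 and B8 with central finite-difference derivatives."""
    e_minus, e_0, e_plus = energy(mu_hat - h), energy(mu_hat), energy(mu_hat + h)
    d1 = (e_plus - e_minus) / (2.0 * h)            # first derivative I'
    d2 = (e_plus - 2.0 * e_0 + e_minus) / (h * h)  # second derivative I''
    pi = -d2               # Equation B7: posterior precision
    mu = mu_hat + d1 / pi  # Equation B8: posterior mean
    return mu, pi


# Hypothetical sufficient statistics; "below" is level i-1, "above" is i+1,
# and the suffix "_prev" marks quantities from time k-1.
params = dict(t=1.0,
              kappa_below=1.0, omega_below=-2.0,
              sigma_below_prev=0.4, sigma_below=0.3,
              mu_below=1.2, mu_below_prev=0.2,
              kappa=1.0, omega=-3.0,
              sigma_prev=0.6, mu_prev=0.5, mu_above_prev=0.0)

mu_hat = params["mu_prev"]  # assumed expansion point
mu, pi = gaussian_update(lambda x: variational_energy(x, params), mu_hat)
print(mu, pi)
```
</preformat>
<p>Because the lower level changed by more than its predicted variance in this example, the posterior mean should move above the expansion point.</p>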
</sec>
<sec>
<title>(C) calculation of the variational energy</title>
<p>By substituting Equation B5 into Equation B3, we get</p>
<graphic xlink:href="fnhum-08-00825-e0027.tif"/>
<p>We now have to solve these last two integrals. The first one can be solved analytically:</p>
<graphic xlink:href="fnhum-08-00825-e0028.tif"/>
<p>To solve the second integral, we take</p>
<graphic xlink:href="fnhum-08-00825-e0029.tif"/>
<p>The sum of Equations C2 and C3 then is the variational energy of Equation B6, up to a constant term that can be absorbed into the normalization constant <inline-graphic xlink:href="fnhum-08-00825-i0004.tif"/> (cf. Equation B2).</p>
</sec>
<sec>
<title>(D) variational energies for categorical outcomes</title>
<p>Using the notation
<disp-formula id="E47"><label>(D1)</label><mml:math id="M47"><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mstyle displaystyle='true'><mml:mrow><mml:mo>&#x0222B;</mml:mo><mml:mi>q</mml:mi></mml:mrow></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>d</mml:mtext><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></disp-formula>
for the expectation of <italic>f</italic>(<italic>x</italic>) under <italic>q</italic>(<italic>x</italic><sub><italic>j</italic></sub>) together with the definition of the model described graphically in <bold>Figure 2</bold>, we can rewrite Equation B3 as a sum of expectations
<disp-formula id="E48"><label>(D2)</label><mml:math id="M48"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003C7;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x0007C;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x0007C;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mi>&#x003BA;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003D1;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
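<p>The expectation notation of Equation D1 can be illustrated numerically. The Python sketch below assumes a Gaussian <italic>q</italic>(<italic>x</italic><sub><italic>j</italic></sub>) with mean &#x003BC; and variance &#x003C3;, and approximates the integral with the trapezoidal rule; the function name <monospace>gaussian_expectation</monospace>, the grid settings, and the test functions are illustrative choices, not part of the model.</p>
<preformat>
```python
import math


def gaussian_expectation(f, mu, sigma, n=4001, width=10.0):
    """Approximate the expectation of f(x) under a Gaussian q(x).

    mu is the mean and sigma the variance of q. The integral of
    q(x) * f(x) is evaluated with the trapezoidal rule on a grid that
    spans `width` standard deviations on either side of the mean.
    """
    sd = math.sqrt(sigma)
    lower = mu - width * sd
    dx = 2.0 * width * sd / (n - 1)
    norm = math.sqrt(2.0 * math.pi * sigma)
    total = 0.0
    for j in range(n):
        x = lower + j * dx
        weight = 0.5 if j in (0, n - 1) else 1.0  # trapezoid end points
        density = math.exp(-0.5 * (x - mu) ** 2 / sigma) / norm
        total += weight * density * f(x) * dx
    return total


# Second moment: should be close to mu**2 + sigma.
print(gaussian_expectation(lambda x: x * x, mu=1.0, sigma=0.25))

# Expectation of an exponential coupling term exp(kappa * x + omega):
# should be close to exp(kappa * mu + omega + kappa**2 * sigma / 2).
print(gaussian_expectation(lambda x: math.exp(1.0 * x - 2.0), mu=1.0, sigma=0.25))
```
</preformat>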
<p>The term <italic>p</italic> (<italic>u</italic><sup>(<italic>k</italic>)</sup> | <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>1</sub>) is included here so that models with sensory uncertainty, as discussed in Mathys et al. (<xref ref-type="bibr" rid="B29">2011</xref>), are also covered. In cases without such uncertainty (i.e., <italic>x</italic><sub>1</sub> &#x0003D; <italic>u</italic>), the term vanishes. The density <italic>p</italic> (<italic>x</italic><sup>(<italic>k</italic>)</sup><sub>1</sub> | <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>2</sub>) can be taken directly from Equation 5, while</p>
<graphic xlink:href="fnhum-08-00825-e0030.tif"/>
<p>and</p>
<graphic xlink:href="fnhum-08-00825-e0031.tif"/>
<p>This is the &#x0201C;prediction step&#x0201D; (cf. Appendix B).</p>
<p>We only need to determine the <italic>I</italic> (<italic>x</italic><sup>(<italic>k</italic>)</sup><sub><italic>i</italic></sub>) up to a constant because any constant term can always be absorbed into <inline-graphic xlink:href="fnhum-08-00825-i0004.tif"/> when forming <inline-formula><mml:math id="M104"><mml:mover accent='true'><mml:mi>q</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula>(<italic>x</italic><sup>(<italic>k</italic>)</sup><sub><italic>i</italic></sub>) according to Equation B2. For the three levels of our example model, this means
<disp-formula id="E49"><label>(D5)</label><mml:math id="M49"><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x0007C;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mtext>const</mml:mtext><mml:mo>.</mml:mo></mml:math></disp-formula>
<disp-formula id="E50"><label>(D6)</label><mml:math id="M50"><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x0007C;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x0007C;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mi>&#x003BA;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mtext>const</mml:mtext><mml:mo>.</mml:mo></mml:math></disp-formula>
<disp-formula id="E51"><label>(D7)</label><mml:math id="M51"><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x0007C;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mi>&#x003BA;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003D1;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mtext>const</mml:mtext><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>With two exceptions, all integrals on the right-hand sides above can be solved analytically in all cases considered here, including sensory uncertainty and inference on continuous-valued states.</p>
<p>The two exceptions are the following: first, to solve &#x02329;ln <italic>p</italic> (<italic>x</italic><sup>(<italic>k</italic>)</sup><sub>1</sub> | <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>2</sub>)&#x0232A;<sub><italic>q</italic><sub>\1</sub></sub>, we expand ln <italic>s</italic> (<italic>x</italic><sup>(<italic>k</italic>)</sup><sub>2</sub>) to second order around the prior expectation &#x003BC;<sup>(<italic>k</italic> &#x02212; 1)</sup><sub>2</sub> of <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>2</sub>:
<disp-formula id="E52"><label>(D8)</label><mml:math id="M52"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>ln</mml:mi><mml:mi>s</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02248;</mml:mo><mml:mi>ln</mml:mi><mml:mi>s</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>s</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>+</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>&#x02212;</mml:mo><mml:mi>s</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
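<p>The quality of the expansion in Equation D8 can be checked numerically. The Python sketch below assumes that <italic>s</italic> is the logistic sigmoid, for which the first and second derivatives of ln <italic>s</italic> are 1 &#x02212; <italic>s</italic> and <italic>s</italic><sup>2</sup> &#x02212; <italic>s</italic>, matching the coefficients in Equation D8. The function names and the chosen expansion point are illustrative.</p>
<preformat>
```python
import math


def s(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def log_s_quadratic(x, mu):
    """Second-order expansion of ln s(x) around mu (Equation D8)."""
    s_mu = s(mu)
    d = x - mu
    return math.log(s_mu) + (1.0 - s_mu) * d + 0.5 * (s_mu ** 2 - s_mu) * d ** 2


mu = 0.4  # hypothetical prior expectation of x_2
for x in (0.4, 0.6, 1.4, 3.4):
    exact = math.log(s(x))
    approx_value = log_s_quadratic(x, mu)
    print(x, exact, approx_value, abs(exact - approx_value))
```
</preformat>
<p>The printed error should be zero at the expansion point and grow with the distance from it, so the approximation is most trustworthy when the posterior stays close to the prior expectation.</p>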
<p>Second, to solve &#x02329;ln <italic>p</italic> (<italic>x</italic><sup>(<italic>k</italic>)</sup><sub>2</sub> | <italic>x</italic><sup>(<italic>k</italic>)</sup><sub>3</sub>, &#x003BA;, &#x003C9;)&#x0232A;<sub><italic>q</italic>\<sub>2</sub></sub>, we take
<disp-formula id="E53"><label>(D9)</label><mml:math id="M53"><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BA;</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mo>&#x0005C;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>&#x02248;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BA;</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula></p>
<p>Evaluating the integrals in Equations D5&#x02013;D7 yields:
<disp-formula id="E54"><label>(D10)</label><mml:math id="M54"><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mi>ln</mml:mi><mml:mi>s</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>ln</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>s</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="E55"><label>(D11)</label><mml:math id="M55"><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>ln</mml:mi><mml:mi>s</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>2</mml:mn><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mtext>e</mml:mtext><mml:mrow><mml:mi>&#x003BA;</mml:mi><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math></disp-formula>
<disp-formula id="E56"><label>(D12)</label><mml:math id="M56"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:mi>ln</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mtext>e</mml:mtext><mml:mrow><mml:mi>&#x003BA;</mml:mi><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mtext>e</mml:mtext><mml:mrow><mml:mi>&#x003BA;</mml:mi><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>2</mml:mn><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x003D1;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>At the first level, <italic>q</italic> can be calculated directly. By inserting Equation D10 into Equation B2 and noting that, at the first level, <italic>q</italic> &#x0003D; <inline-formula><mml:math id="M105"><mml:mover accent='true'><mml:mi>q</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula> since <inline-formula><mml:math id="M106"><mml:mover accent='true'><mml:mi>q</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula> already conforms to the distributional assumptions (Bernoulli) that <italic>q</italic> is subject to, we find
<disp-formula id="E57"><label>(D13)</label><mml:math id="M57"><mml:mrow><mml:mi>q</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mn>1</mml:mn><mml:mtext>&#x02009;&#x02009;&#x02009;for&#x02009;</mml:mtext><mml:msubsup><mml:mi>x</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn><mml:mtext>&#x02009;&#x02009;&#x02009;otherwise</mml:mtext><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula></p>
<p>This result arises because, in the absence of sensory uncertainty, <italic>I</italic> (<italic>x</italic><sub>1</sub>) is defined only for <italic>x</italic><sub>1</sub> &#x0003D; <italic>u</italic> and takes on a value of &#x0201C;minus infinity&#x0201D; otherwise (cf. Equation D5).</p>
<p>At the second level, applying Equations B7 and B8 to Equation D11 yields Equations 32 and 33. At the third level, we recover the standard HGF variational energy for the top level given in Equation B6, yielding the standard HGF updates of Equations 9 and 10.</p>
</sec>
<sec>
<title>(E) derivation of &#x003BE;<sup>&#x0002A;</sup></title>
<p>&#x003BE;<sup>&#x0002A;</sup> can be unpacked in the following way:
<disp-formula id="E58"><label>(E1)</label><mml:math id="M58"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msup><mml:mi>&#x003BE;</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mtext>arg&#x02009;max</mml:mtext></mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:munder><mml:mtext>&#x02009;</mml:mtext><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mtext>arg&#x02009;max</mml:mtext></mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:munder><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mtext>arg&#x02009;max</mml:mtext></mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:munder><mml:mtext>&#x02009;</mml:mtext><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mtext>arg&#x02009;max</mml:mtext></mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:munder><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;<
/mml:mtext><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mtext>arg&#x02009;max</mml:mtext></mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:munder><mml:mi>ln</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo>,</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mtext>arg&#x02009;max</mml:mtext></mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo>,</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mtext>arg&#x02009;max</mml:mtext></mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mi>k</mml:mi></mml:munder><mml:mrow><mml:mi>ln</mml:mi></mml:mrow></mml:mstyle><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo>,</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mtext>arg&#x02009;max</mml:mtext></mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mi>k</mml:mi></mml:munder><mml:mrow><mml:mi>ln</mml:mi></mml:mrow></mml:mstyle><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003C7;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>&#x003BB;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>&#x003B6;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>ln</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
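<p>The last line of Equation E1 states that the maximum-a-posteriori estimate maximizes the sum of the trial-wise log-likelihoods plus the log-prior. The following Python sketch illustrates this objective for a deliberately simplified case. It assumes that the perceptual parameters are held fixed, so that the beliefs &#x003BB;<sup>(<italic>k</italic>)</sup> are given, and that a single response parameter &#x003B6; is estimated under a unit-square sigmoid response model with a Gaussian prior on ln &#x003B6;. The data, the prior settings, the response model, and all function names are hypothetical choices made for illustration only.</p>
<preformat>
```python
import math


def log_likelihood(y, lam, zeta):
    """ln p(y | lam, zeta) under a unit-square sigmoid response model (an assumption)."""
    p_one = lam ** zeta / (lam ** zeta + (1.0 - lam) ** zeta)
    return math.log(p_one if y == 1 else 1.0 - p_one)


def log_prior(log_zeta, mean=0.0, variance=4.0):
    """Gaussian log-prior on ln(zeta), up to an additive constant (assumed values)."""
    return -0.5 * (log_zeta - mean) ** 2 / variance


def log_joint(log_zeta, responses, predictions):
    """Sum over trials of ln p(y^(k) | lambda^(k), zeta) plus ln p(xi)."""
    zeta = math.exp(log_zeta)
    total = sum(log_likelihood(y, lam, zeta) for y, lam in zip(responses, predictions))
    return total + log_prior(log_zeta)


# Made-up illustrative data: beliefs lambda^(k) and binary responses y^(k).
predictions = [0.8, 0.7, 0.3, 0.6, 0.2, 0.9, 0.4, 0.75]
responses = [1, 1, 0, 0, 0, 1, 1, 1]

# Coarse grid search over ln(zeta); a real analysis would use a proper optimizer.
grid = [i / 100.0 for i in range(-300, 301)]
log_zeta_map = max(grid, key=lambda v: log_joint(v, responses, predictions))
print("MAP estimate of zeta:", math.exp(log_zeta_map))
```
</preformat>
<p>Because the logarithm is monotonic, maximizing this log-joint gives the same estimate as maximizing the posterior itself, which is the step taken between the first and second lines of Equation E1.</p>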
</sec>
<sec>
<title>(F) coordinate choice on higher levels</title>
<p>This appendix deals with the consequences of choosing a particular scale and origin for <italic>x</italic><sub><italic>i</italic></sub> on higher levels (i.e., where <italic>i</italic> is greater than 1 for <italic>x</italic><sub>1</sub> continuous and greater than 2 for <italic>x</italic><sub>1</sub> discrete). These consequences are important for parameter identifiability. Whenever there is an ambiguity between parameter values and a coordinate transformation (i.e., shifting and rescaling) of <italic>x</italic><sub><italic>i</italic></sub>, one or several parameters will not be identifiable. An example will show what we mean by this.</p>
<p>In a three-level HGF with binary <italic>x</italic><sub>1</sub>, the state <italic>x</italic><sub>3</sub> at the third level of the model represents the log-volatility of <italic>x</italic><sub>2</sub>. We now make use of the fact that any change in the initial value &#x003BC;<sup>(0)</sup><sub>3</sub> of &#x003BC;<sub>3</sub> can be neutralized by corresponding changes in &#x003BA; and &#x003C9;. This means that by adjusting &#x003BC;<sup>(0)</sup><sub>3</sub> appropriately, we can set &#x003BA; &#x0003D; 1, thereby making it seemingly disappear from the model. As an example, here are parameter estimates from a behavioral study where estimation was performed with &#x003BC;<sup>(0)</sup><sub>3</sub> set to 1:</p>
<disp-formula id="E59"><label>(F1)</label><mml:math id="M59"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mi>&#x003BA;</mml:mi><mml:mo>=</mml:mo><mml:mn>2.49</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>0.988</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mi>&#x003D1;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.000592</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We can now make &#x003BA; disappear by setting it to 1. This change should be neutral to the model&#x00027;s predictions of input, which can only be achieved by the following compensatory substitutions:
<disp-formula id="E60"><label>(F2)</label><mml:math id="M60"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02192;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>2.49</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;</mml:mtext><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>0.988</mml:mn><mml:mo>&#x02192;</mml:mo><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>0.988</mml:mn><mml:mo>&#x000B7;</mml:mo><mml:msup><mml:mn>2.49</mml:mn><mml:mn>2</mml:mn></mml:msup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>&#x003D1;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.000592</mml:mn><mml:mo>&#x02192;</mml:mo><mml:mi>&#x003D1;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.000592</mml:mn><mml:mo>&#x000B7;</mml:mo><mml:msup><mml:mn>2.49</mml:mn><mml:mn>2</mml:mn></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>At the first two levels, nothing has changed (the trajectory of &#x003BC;<sub>2</sub> and therefore the input predictions <inline-formula><mml:math id="M107"><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula><sub>1</sub> are the same); however, at the third level <italic>x</italic><sub>3</sub> has been rescaled by a factor of 2.49, the inverse of the factor by which &#x003BA; has been rescaled (i.e., 1/2.49). This means that the term
<disp-formula id="E61"><label>(F3)</label><mml:math id="M61"><mml:mrow><mml:msup><mml:mi>v</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:msup><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BA;</mml:mi><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula></p>
<p>is invariant under the above transformation for all time points <italic>k</italic>, leading to the same trajectory in &#x003BC;<sub>2</sub> and &#x003C3;<sub>2</sub> according to Equations 37 and 38 as before.</p>
<p>Effectively, because of the transformation to a fixed &#x003BA; &#x0003D; 1, our estimate of &#x003BA; has been reinterpreted as an estimate of &#x003BC;<sup>(0)</sup><sub>3</sub>. This is a consequence of our freedom to choose coordinates on <italic>x</italic><sub>3</sub>. To take an analogy from geometry, the distance between Zurich and London is an objective geometric quantity that does not change whether it is measured in miles or kilometers; we may even introduce new units where this distance is 1. Likewise, we may always rescale <italic>x</italic><sub>3</sub> individually for each agent (and estimation) such that the coupling &#x003BA; between the second and third level has value 1. Note, however, that this does not prevent the coupling from differing between agents, just as the actual geometric lengths of a mile and a kilometer are different even though they both have value 1 in a mile-based and kilometer-based coordinate system, respectively.</p>
<p>Whenever the representations of <italic>x</italic><sub>3</sub> (i.e., &#x003BC;<sub>3</sub> and &#x003C3;<sub>3</sub>) are part of the observation model, this gives us a direct handle on <italic>x</italic><sub>3</sub> and measures of its representations can immediately be compared between agents, provided we always use the same coordinates (which we would automatically do without even thinking about it; individually rescaling &#x003BA; to 1 would then obviously be unwise). However, in cases where we have no measure of <italic>x</italic><sub>3</sub> through the observation model, there is a fundamental ambiguity between individual differences in coupling (i.e., &#x003BA;) and individual differences in priors for <italic>x</italic><sub>3</sub> (i.e., &#x003BC;<sup>(0)</sup><sub>3</sub>).</p>
<p>Nonetheless, we have to make some choice of coordinates. In setting &#x003BC;<sup>(0)</sup><sub>3</sub> &#x0003D; 1, we choose to take the belief on environmental volatility that an agent begins inference with as the benchmark. Observed differences in learning are then attributed to, inter alia, differences in coupling. This is one way to obtain comparable measures of belief on <italic>x</italic><sub>3</sub> (and consequently, ?) between agents, since one may be able to influence their priors while there is usually no way to equalize their coupling levels.</p>
<p>Just as it is possible to set &#x003BA; to an arbitrary non-zero value while keeping <italic>v</italic> invariant by compensatory substitutions, one can set &#x003C9; to an arbitrary value with invariant <italic>v</italic> using another set of compensatory substitutions. In particular, &#x003C9; can be set to zero, thereby making it seemingly disappear from the model, just as &#x003BA; seems to disappear when set to 1.</p>
<p>We have seen that any change of scale in <italic>x</italic><sub>3</sub> is expressed in a corresponding change of &#x003BA;. However, there is an additional degree of freedom in choosing coordinates on <italic>x</italic><sub>3</sub>: the choice of origin. A change of origin is expressed in a corresponding change of &#x003C9;. If we define
<disp-formula id="E62"><label>(F4)</label><mml:math id="M62"><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mo>&#x02032;</mml:mo></mml:msubsup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:msub><mml:mi>x</mml:mi><mml:mn>3</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:mi>a</mml:mi></mml:mrow></mml:math></disp-formula>
then
<disp-formula id="E63"><label>(F5)</label><mml:math id="M63"><mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BA;</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mn>3</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BA;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mo>&#x02032;</mml:mo></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:mi>a</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x003BA;</mml:mi><mml:msubsup><mml:mi>x</mml:mi><mml:mn>3</mml:mn><mml:mo>&#x02032;</mml:mo></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mi>&#x003C9;</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
with
<disp-formula id="E64"><label>(F6)</label><mml:math id="M64"><mml:mrow><mml:msup><mml:mi>&#x003C9;</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mtext>def</mml:mtext></mml:mrow></mml:mover><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003BA;</mml:mi><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mrow></mml:math></disp-formula></p>
<p>In our example, we have reinterpreted the estimate of &#x003BA; as one of &#x003BC;<sup>(0)</sup><sub>3</sub>. We may now go on to reinterpret it another time, this time as an estimate of &#x003C9; (up to now fixed to &#x02212;4) by shifting the origin on <italic>x</italic><sub>3</sub> such that &#x003BC;<sup>(0)</sup><sub>3</sub> is again fixed to 1:
<disp-formula id="E65"><label>(F7)</label><mml:math id="M65"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>2.49</mml:mn><mml:mo>&#x02192;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mi>&#x003C9;</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>4</mml:mn><mml:mo>&#x02192;</mml:mo><mml:mi>&#x003C9;</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003BA;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mn>2.49</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>4</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>2.51</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>Again, the trajectories at the first two levels have not changed. It is now apparent that by rescaling <italic>x</italic><sub>3</sub> and shifting its origin, we can choose arbitrary values for two out of the three parameters &#x003BC;<sup>(0)</sup><sub>3</sub>, &#x003BA;, and &#x003C9;. However, we repeat that this is only possible as long as we do not measure <italic>x</italic><sub>3</sub> on any objective scale by including its representations in the response model.</p>
<p>Equivalence classes (with equivalence defined as leading to invariant <italic>v</italic>) of parameter values are defined by the following <italic>conservation laws</italic>:
<disp-formula id="E66"><label>(F8)</label><mml:math id="M66"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msup><mml:mi>&#x003BA;</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo>&#x02032;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mi>&#x003C9;</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x003BA;</mml:mi><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:msup><mml:mi>&#x003BA;</mml:mi><mml:mrow><mml:mo>&#x02032;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mi>&#x003D1;</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi>&#x003BA;</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mi>&#x003D1;</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:msup><mml:mi>&#x003BA;</mml:mi><mml:mrow><mml:mo>&#x02032;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo>&#x02032;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mi>&#x003BA;</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>Using these equations, we can now make &#x003BA; and &#x003C9; seemingly disappear by setting them to 1 and 0, respectively. Note that this does not amount to a simplification of the model: it is only a coordinate choice. In our example this means:
<disp-formula id="E67"><label>(F9)</label><mml:math id="M67"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>&#x003C9;</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>2.51</mml:mn><mml:mo>&#x02192;</mml:mo><mml:mi>&#x003C9;</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;</mml:mtext><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02192;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>1.51</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
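<p>As a brief worked step (a sketch added for illustration, not an additional result): inserting &#x003BA;&#x02032; = 1 and &#x003C9;&#x02032; = 0 into (F8) gives the transformed quantities explicitly, as shown below. The values in (F9) are consistent with &#x003BA; = 1 before the transformation, since 1 &#x000B7; 1 &#x02212; 2.51 = &#x02212;1.51; this value of &#x003BA; is inferred here from (F9) and should be read as an assumption.
<disp-formula><mml:math><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo>&#x02032;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>&#x003BA;</mml:mi><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x003C9;</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msup><mml:mi>&#x003D1;</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi>&#x003BA;</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mi>&#x003D1;</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo>&#x02032;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mi>&#x003BA;</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>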
<p>This disappearance of &#x003BA; and &#x003C9; (and in the general case, all &#x003BA;<sub><italic>i</italic></sub> and &#x003C9;<sub><italic>i</italic></sub>) from the model may seem convenient. However, the danger here is confusing coordinates with the underlying reality they describe. Crucially, it is impossible to discuss the choice of coordinates on higher levels in a model that lacks &#x003BA;<sub><italic>i</italic></sub> and &#x003C9;<sub><italic>i</italic></sub>. Yet this choice does not cease to exist merely because one makes it implicitly instead of explicitly.</p>
<p>In summary, to avoid an overparameterized model when &#x003BC;<sub><italic>i</italic></sub> is not included in the response model, it is advisable to be explicitly arbitrary by fixing two out of &#x003BC;<sup>(0)</sup><sub><italic>i</italic></sub>, &#x003BA;<sub><italic>i</italic></sub>, and &#x003C9;<sub><italic>i</italic></sub>, reflecting a choice of origin and scale on <italic>x</italic><sub><italic>i</italic></sub>. By contrast, when &#x003BC;<sub><italic>i</italic></sub> is included in the response model, all of &#x003BC;<sup>(0)</sup><sub><italic>i</italic></sub>, &#x003BA;<sub><italic>i</italic></sub>, and &#x003C9;<sub><italic>i</italic></sub> can be estimated.</p>
</sec>
</app>
</app-group>
</back>
</article>