Simulating Active Inference Processes by Message Passing

The free energy principle (FEP) offers a variational calculus-based description for how biological agents persevere through interactions with their environment. Active inference (AI) is a corollary of the FEP, which states that biological agents act to fulfill prior beliefs about preferred future observations (target priors). Purposeful behavior then results from variational free energy minimization with respect to a generative model of the environment with included target priors. However, manual derivations for free energy minimizing algorithms on custom dynamic models can become tedious and error-prone. While probabilistic programming (PP) techniques enable automatic derivation of inference algorithms on free-form models, full automation of AI requires specialized tools for inference on dynamic models, together with the description of an experimental protocol that governs the interaction between the agent and its simulated environment. The contributions of the present paper are two-fold. Firstly, we illustrate how AI can be automated with the use of ForneyLab, a recent PP toolbox that specializes in variational inference on flexibly definable dynamic models. More specifically, we describe AI agents in a dynamic environment as probabilistic state space models (SSM) and perform inference for perception and control in these agents by message passing on a factor graph representation of the SSM. Secondly, we propose a formal experimental protocol for simulated AI. We exemplify how this protocol leads to goal-directed behavior for flexibly definable AI agents in two classical RL examples, namely the Bayesian thermostat and the mountain car parking problems.


INTRODUCTION
The free energy principle (FEP) offers an ambitious theory for how biological agents perceive and interact with their environment (Friston, 2009(Friston, , 2010. The FEP postulates that in order for an agent to exist (and persist) under time-varying environmental conditions, it must minimize a free energy functional under the agent's internal ("generative") model for environmental observations (Friston et al., 2006). Active inference, which is a corollary of the free energy principle, claims that natural agents act to fulfill prior beliefs about preferred observations (Friston, 2010). These prior beliefs about future observations are part of the agent's internal model specification, and free energy minimization thus ensures that the agent avoids surprising states.
Currently, the derivation of active inference algorithms on free-form dynamic models still requires manual work. Automation of active inference processes might enable practitioners to build more effective, flexible and scalable agents . In addition, automated execution of active inference processes also requires the definition of a formal experimental protocol that governs the interaction between the agent and its environment.
The derivation of a free energy minimizing algorithm can be automated through the use of probabilistic programming (PP) techniques (Tran et al., 2016;Carpenter et al., 2017;Minka et al., 2018). While most PP toolboxes offer free-form modeling tools and automated derivation of flexible inference algorithms, their generality often comes at the cost of increased computational load. In contrast, dynamic models incorporate model-specific structure that can be exploited to increase algorithm performance. Here, message passing on a factor graph description of the generative model is especially suited for inference in flexibly definable dynamic models (Loeliger et al., 2007;Cox et al., 2019).
The current paper details an experimental protocol and simulation environment for the automated derivation and execution of online active inference processes in a dynamic context. Crucially, we illustrate how the message passing approach and proposed experimental protocol cooperate to automate the execution of structured active inference algorithms on flexibly definable generative dynamic models. We address the following questions: 1. How can online active inference processes be operationally described by an experimental protocol? 2. How can active inference processes be automatically derived within the given protocol?
With respect to the first issue, we describe a protocol that formally captures the interactions between an (active inference) agent and its environment. The protocol supports online simulations under situated conditions. Concerning the second issue, the current paper provides a full message-passing based account of active inference, formulated in a Forney-style factor graph (FFG) representation of the internal model (Forney, 2001). The FFG formulation supports flexible model definitions and automated (active) inference execution by message passing-based free energy minimization. Crucially, this automation absolves the need for manual derivations of variational calculus problems and in principle scales up to complex hierarchical and interdependent models, making the approach suitable for industrial-sized applications.
The paper is structured as follows. Sections 2 and 3 offer a short technical rehearsal to active inference and FFGs, respectively. The experimental protocol is detailed in section 4. In section 5 we test our proposed protocol by simulating two classical active inference applications, namely the Bayesian thermostat and the mountain car. These simulations were executed with ForneyLab, which is a freely available toolbox for automated free energy minimization in FFGs that we have developed in our research lab (Cox et al., 2019). Finally, we discuss related work in section 6 and conclude in section 7. Friston (2013) considers the consequences of being alive for the internal informational mechanics of natural agents. This FIGURE 1 | The Markov blanket forms a statistical separation between an agent and its environment. Arrows indicate the "generative" direction. Conditioning on the Markov blanket renders information about the internal states of the agent independent from information about hidden state of the environment. Blanket states for the agent are the variables for controls u t and observations x t . The solid diamond node in the Markov blanket links posterior beliefs over the agent's control variables to environmental actions, and the solid square node in the Markov blanket passes environmental outcomes to agent observations. analysis leads to the conclusion that natural agents appear to exchange information with their environment so as to maximize Bayesian evidence (i.e., minimize free energy) for an internal model of sensory data. We consider the system of interacting agent and environment as drawn in Figure 1. Both the agent and environment are considered dynamical systems with hidden state dynamics. The environment executes a process (y t , z t ) = R t (z t−1 , a t ), where a t is an action, z t a latent state vector and y t is the output signal. The agent holds a generative probabilistic model p t (x, s, u) for the environmental process, where x, s, and u are sequences of observations, internal states and controls. The action at time t in the environmental process is represented in the agent's model by control variable u t and environmental output (y t ) by sensory variable ("observation") x t . The agent has a single goal, namely minimizing free energy, which roughly corresponds to minimizing (precision-weighted) prediction errors for its sensory inputs x (Rao and Ballard, 1999;Huang and Rao, 2011). Minimizing free energy leads to posterior beliefs over both the states s (which affects predictions and thereby prediction errors) and controls, which are observed by the environment as actions that lead to changes in the sensory inputs. These changes in sensory inputs may also affect prediction errors. Thus, inference for the states and controls both lead to free energy minimization. Technically, in order for a natural agent to persevere, it must be both physically and statistically separated from its environment. This statistical skin is called a Markov Blanket, which comprises the sensory and action variables. Sensory variables are affected by internal states of the environment but do not directly affect internal environmental states. Similarly, actions are affected by internal states of the agent but do not directly affect internal states of the agent.

ACTIVE INFERENCE
Because active inference reasons from observations toward controls, the inference process requires the definition of an "inverse" probabilistic model which is sometimes called the recognition model q t . Full Bayesian inversion of p t is in general intractable, so the agent resorts to approximating Bayesian inference by minimizing a variational free energy functional, defined by: Minimizing Equation (1) renders the free energy an upper bound on surprise (negative log-evidence), while simultaneously approximating the (generally unavailable) true posterior p t (s, u|x) with the recognition model q t . In order to render this optimization process tractable, the recognition model is often factorized (Attias, 1999), where a fully factorized recognition model is referred to as the "mean-field" assumption. In order to equip the agent with a sense of "goal-directedness" or "purpose, " the internal model extends over future states and incorporates counter-factual beliefs about desired future outcomes, also referred to as target priors (Parr and Friston, 2018). These target priors lead to high surprisal for observations that are unlikely under the agent's preferences. Free energy minimization then produces (approximate) posterior beliefs over controls that are believed (by the agent) to avoid these undesired (surprising) observations. In the current paper we set the target priors ourselves, but more generally these priors might be set by contextual processes such as other agents or higher-level temporal layers. These ideas are further discussed in section 7.
Previous accounts of active inference introduce an expected free energy that steers goal-directed behavior via a prior over control (Friston et al., 2015;Friston K.J. et al., 2017). Instead, we will employ the internal model formulation by Parr and Friston (2018), which explicitly includes counter-factual prior beliefs over future observations in order to steer behavior. This leads to a unified model specification that allows us to optimize a single free energy functional (see also de Vries and Friston, 2017). Minimizing this functional by message passing simultaneously captures inference over current states (perception) as well as future controls (action/policy planning).
Practical models for active inference that are suitable for industrial application may be complex, layered and embedded in a volatile context. Manual derivation of free energy minimizing algorithms for active inference algorithms will then become prohibitively tedious and error-prone. A graphical representation of the internal model will aid with visualization of complex models and allows for automated derivation of message passing algorithms. The next section introduces Forney-style factor graphs as a graphical framework for automatic derivation of active inference algorithms.

INFERENCE BY MESSAGE PASSING ON FORNEY-STYLE FACTOR GRAPHS
A Forney-style factor graph (FFG), also known as a "normal" factor graph, offers a graphical description of a factorized function (Forney, 2001). Excellent and detailed introductions to FFGs are available in (Loeliger, 2004;. While related graphical formalisms such as Bayesian networks, Markov random fields and bipartite factor graphs offer essentially equivalent formulations (Forney, 2001;Loeliger, 2004), the FFG formalism is especially suited for representing dynamical models (Loeliger et al., 2007). Specifically, the FFG representation requires only a single node and message type, while retaining the explicit representation of variable relations through factor nodes.
As an example factorization, in this section we consider the function of Equation (2), which splits into four factors: In this paper, we assume that the function f is a probability distribution. The FFG for this factorization is drawn in Figure 2 (middle), together with the equivalent bipartite factor graph representation (left) for comparison. In an FFG, variables FIGURE 2 | Bipartite factor graph (left) and Forney-style factor graph (FFG) (middle) for the model of Equation (2), together with the message passing schedule for computing (Equation 3b) (right). In a bipartite factor graph, solid nodes represent factors and round nodes represent variables. In an FFG, edges represent variables and all nodes represent factors, where solid nodes indicate observations. Frontiers in Robotics and AI | www.frontiersin.org correspond to edges and factors are represented by nodes. A node is connected to an edge only if the variable of the edge is an argument in the factor. For instance, node f c connects to edges x 2 and x 3 since f c = f c (x 2 , x 3 ). Now assume that we observe x 3 =x 3 (technically: x 3 is a variable that takes on valuex 3 ) and are interested in computing a posterior distribution for x 4 . The observation of x 3 then imposes an additional constraint δ(x 3 −x 3 ) on the model, which clamps this variable to its observed value. Following (Reller, 2012), we indicate observations by a small solid node, see Figure 2 (right).
The integral for computing the posterior distribution is shown in Equation (3a). A direct approach to solving this integral might be tedious and error-prone. Conveniently, the distributive law allows us to pull unaffected integrands out of integrals, thus unpacking the total integral into a product of sub-integrals, each of which can be interpreted as a message over an edge of the factor graph, cf. Figure 2 (right) and Equation (3b): Consecutive computation of these messages then leads to the solution of Equation (3c), where the (unnormalized) posterior function results from the product of the two colliding messages 3 and 4 . For efficiency reasons the message normalizing constants are often ignored, and messages are represented by a probability distribution with corresponding sufficient statistics. The message can then be interpreted as an information summary for the variable of the corresponding edge. For instance, message 3 in Figure 2 (right) represents the probability distribution for x 4 given the information in the large box at left-hand side of message 3 . While an FFG is principally an undirected graph, we often draw arrows on the edges in order to anchor the message direction. A forward message − → µ aligns with the edge arrow, and a backward message ← − µ aligns against the edge direction. Crucially, the message passing approach to inference allows to reuse pre-derived solutions to specific message updates for elementary factors across multiple models. Implementing these solutions in a look-up table allows us to automate the derivation and execution of message passing algorithms. Well-known algorithms such as (loopy) belief propagation (Forney, 2001), variational message passing (Dauwels, 2007), expectation maximization (Dauwels et al., 2005), and expectation propagation (Cox, 2018) have all been formulated as message passing procedures on an FFG.

Example: Equality Node
In order to exemplify the derivation of a reusable message update rule for an elementary factor, we consider the equality factor (see also Cox et al., 2019), defined as In contrast to a variable in a bipartite graph (Figure 2, left), a variable in an FFG can connect to at most two factors. The equality factor resolves this situation by constraining the information about three variables to be equal (Equation 4). Edges constrained by equality factors can then effectively be regarded as a single variable that is shared among connected factors. An update for the equality node is computed by Equation (5b), for which the schedule is graphically represented by Figure 3 (left). It can be seen that this schedule fuses information from two branches into a single message, much like Bayes rule does. Indeed, in FFGs, equality nodes are often used to connect prior information about a variable with likelihood-based information about that variable.

Dealing With Nonlinear Factors
The modularity of the message passing approach allows to make local approximations. Through local linearization, we can pass messages through nodes that encode nonlinear constraints. Here Frontiers in Robotics and AI | www.frontiersin.org we consider the factor of Equation (6), where g(x) is a nonlinear differentiable function: The forward message is expressed as Equation (7a) and drawn in Figure 3 (right).
When the incoming message 1 is in the exponential family, it can be conveniently parameterized by its sufficient statistics. However, because of the nonlinear transformation, the outgoing message is no longer guaranteed to be a member of the exponential family, which may lead to intractable updates. Therefore, we trade some accuracy for computability by making a local linear approximation to the node function. We choose the approximation pointx as the mean of the incoming message, and expand g locally asg where J g (x) is the Jacobi matrix at the approximation point (or the first derivative in the scalar case). By substituting Equation (8) in Equation (6), we can obtain approximate but tractable message updates. This local linearization strategy for nonlinear factors will be used in section 5.2.

ONLINE ACTIVE INFERENCE IN FACTOR GRAPHS
In section 2 we mentioned the agent's internal model p t , which expresses the agent's prior beliefs about how the environmental process generates observations from actions. In the current section we propose a simulation protocol for online active inference execution by the agent.

Model Specification
We consider an agent with a state-space model (SSM) factorization for its internal model, given by where x, s, and u are sequences with ranges that are implicitly given by the model specification at the right-hand side. Note how the model of Equation (9) differs slightly from a standard SSM factorization (Koller and Friedman, 2009), because it includes additional "target" priors p ′ (x k ) over desired future outcomes. A factor graph representation of Equation (9) is shown in Figure 4. Also note that the probability distribution for the internal model has a subscript t to indicate that the model is time-varying. We will consider these aspects in more detail below. Also note that at time step t, the agent has assumptions about how the environment will evolve over the next T + 1 time steps since we can run this model forward and generate observations x k for k = t, t + 1, . . . , t + T. The horizon T is determined by the information content of the target priors p ′ (x k ). These target priors are generally set by states of contextual processes, i.e., not by this agent but rather by other agents (or higher level processes) that encode unsurprising future outcomes for this agent. In order to distinguish the predictive model for observations p(x k |s k ) from the context-based target prior for observations p ′ (x k ), we label the latter factor with a prime.
We refer to a sequence of future controls u = (u t , u t+1 , . . . , u t+T ) as a policy. Through inference, the posterior over policies becomes dependent on the hidden state sequence s. Prior to inference however, the model requires the definition of a prior belief over policies that constrains attainable control states. In a more general formulation of the internal model, we would write a prior over policies p(u) = p(u t , u t+1 , . . . , u t+T ). Here, for simplicity, we assume p(u) = t+T k=t p(u k ).
FIGURE 4 | The Forney-style factor graph for the agent's generative model (Equation 9).
Frontiers in Robotics and AI | www.frontiersin.org Next to the internal model, we assume that the agent has access to a variational distribution, also known as the recognition distribution, that encodes the agent's posterior beliefs about all latent states. Because the future is by definition unobserved, the recognition distribution includes the future observation variables as well.
The distinction between the agent's prior ("generative") beliefs p t (x, s, u) and posterior ("recognition") beliefs q t (x, s, u) will be finessed below. At the start of the simulation (say at t = 1), we will set q 1 to an uninformative distribution. As time progresses, observations become available and subsequent inference iterations will tune q t to more informed distributions.
Often, but not necessarily so, the recognition distribution is assumed to be fully factorized (this is the mean-field assumption).
In the present article we assume a structured factorization for the recognition distribution that is solely induced by the internal model factorization (Bishop, 2006). In general, we may write Since actions onto the environment are real-valued interventions, we will generally enforce a deterministic posterior belief over policies, i.e., where υ k (upsilon) are parameters that are to be determined in the free-energy minimization process (see section 4.3). In order words, while the prior belief over policies p(u) may contain uncertainties, we will fix the posterior belief over policies q(u) on a single concrete sequence. As time progresses, at each time step, the agent interacts with its environment through exchanging actions and observations, followed by processing the observations. Technically, everything that the agents does can be described as updates to its internal and recognition models. We distinguish three phases per time step: (1) act-execute-observe, (2) infer (process observations), and (3) slide (move one time slice ahead), see Algorithm 1. Next, we consider each step in more detail.

The "Act-Execute-Observe" Step
Since we assumed that the posterior beliefs over control states were constrained by δ(u k − υ k ), we will select the action at time step t as a t = υ t .

4:
Slide // move one time slice ahead 5: end for Technically, in a factor graph context, we will let the agent be the owner of the control state u t and the environment is the owner of action variable a t . The agent and environment are coupled at the agent's Markov blanket by an interface factor δ(u t − a t ). Message passing from the agent to the environment will now pass the agent's posterior belief q(u t ) = δ(u t − υ t ) over control state u t to the action, leading to a t = υ t . Next, this action is imposed onto the environmental process R t that generates outputs y t by where z t refers to the dynamic states of the environmental process. Here, we call environmental processing the execution phase. In similar fashion to the action-interface, if we let y t refer to the output variable andŷ t is the value of the environmental output at time t, then an observation-interface node δ(x t − y t ) at the agent's Markov blanket leads to an incoming message δ(x t −ŷ t ) from observation variable x t . In a real-world setting, the agent does not know the model R t nor the environmental states, but still gets to observe the environmental output. Acting onto the environment and observing the consequences can technically be processed by the agent through updating its internal model to The factor graph of the updated internal model is shown in Figure 5. We use small filled diamond nodes to indicate causal interventions (actions) and small square nodes for observation variables. Note that when we multiply the model with a delta distribution δ(u t − υ t ), we technically overwrite the prior p(u t ) and the open square for p(u t ) is replaced by a black diamond smaller square to indicate the intervention. Similarly, the target prior p ′ (x t ) is omitted because the observation δ(x t −ŷ t ) renders it conditionally independent from the rest of the model.

The "Infer" Step
The internal model has changed as a result of acting and observing. In the infer step, we process the consequences of this change for the model's latent variables. Technically, we update the posterior from q t−1 to q t by free energy minimization. Generally, the recognition distribution will be parameterized by sufficient statistics µ, so we can write q t (x, s, u) = q(x, s, u|µ t ) and free energy minimization amounts to finding µ t by This minimization procedure can be executed through variational message passing (VMP) in the factor graph for the updated internal model p t (x, s, u). The inference process is visualized in Figure 6. VMP involves iteratively updating single (or a few) components of µ while holding the other components fixed. In this procedure, it will be very useful to set the initial value µ init t to µ t−1 . With this initialization, in most cases, the number of VMP iterations to convergence per time step will be quite low.

The "Slide" Step
The slide step implements a time step increment and is best understood by looking at the factor graph in Figure 7. Essentially, we eliminate (marginalize out) the first time slice of the internal and recognition models and add a time slice to the horizon. Then we fix the indexing back to p t .
Formally, marginalization of the first slice and adding a slice at the horizon leads to the following update of the agent's internal model: p(x t+T+1 |s t+T+1 ) p(s t+T+1 |s t+T , u t+T+1 ) p(u t+T+1 ) p ′ (x t+T+1 ) add slice at horizon Frontiers in Robotics and AI | www.frontiersin.org Here, we have processed the action and observation together with sliding a time step in one update. The marginalization of x t , s t−1 , and u t in the first slice of the graph can be executed by message passing. In practice, we can simply re-use message 4 from Figure 6 (after re-normalization) as the new state prior p(s t ). In order to maintain a horizon of T time steps, we add one time slice of the state space model to the horizon (at time step t + T + 1). Again, it is assumed that a contextual agent feeds the target prior p ′ (x t+T+1 ).
Adding and deleting slices at each time step also provides room for changing the model structure of the internal model on the basis of current contextual information. For instance, the observation model p(x t+T+1 |s t+T+1 ) does not need to be the same as p(x t+T |s t+T ). The decision for the structure of p(x t+T+1 |s t+T+1 ) can be postponed until time step t.
Because the recognition model is simply parameterized by its sufficient statistics [q t (x, s, u) = q(x, s, u|µ t )], we do not require an explicit slide for the recognition model. In the next "infer" step, the recognition model is simply initialized with the present statistics (µ init t = µ t−1 ) and updated to the new posterior statistics, without the need for redefining the recognition model. Once the internal model has slided forward by one slice, we increment the time step index by so as to obtain a generative model p t (x, s, u) again (see Equation  9), but now for an increased time step. The Slide phase is followed by repeating the loop, starting with an Act-Execute-Observe step. In summary, online active inference by a dynamic agent proceeds according to updating both its internal and recognition distributions at each time step according to Equations 9, 11, 15, 16, and 17. Actions and observations are technically processed by appending delta factors to the agent's internal model. The effects of these internal model changes on the latent variables in the model are inferred through free energy minimization, which leads to an updated recognition distribution. Next, the model slides forward one time slice and starts a new Act-Execute-Observe step.

SIMULATIONS
In order to illustrate the operationability of the active inference protocol, we simulate two classic active inference problems, namely the Bayesian thermostat Buckley et al., 2017) and the mountain car Sutton and Barto, 2018;Ueltzhöffer, 2018). We use ForneyLab (version 0.9.1) as a tool for automated derivation of message passing algorithms (Cox et al., 2019). ForneyLab (available from https:// github.com/biaslab/ForneyLab.jl) supports flexible specifications of factorized probabilistic dynamic models  and generates high-performance inference algorithms on these models (Cox et al., 2019).
ForneyLab is written in Julia, a high-level programming language for numerical computing. Julia combines an accessible MATLAB-like syntax with C-like performance (Bezanson et al., 2017). Julia's excellent meta-programming capabilities allow ForneyLab to define an intuitive domain-specific model specification syntax, and to automatically compile message passing schedules directly to executable Julia code. Furthermore, Julia's powerful multiple dispatch functionality enables efficient scheduling and algorithm execution.

The Bayesian Thermostat
In this section we simulate an agent that can relocate itself in a temperature gradient field. The agent aims to position itself at a desired temperature relative to a heat source. Our simulation setup is an adaptation of the setup described by Buckley et al. (2017).

Environmental Process Specification
First we specify the environmental process. The temperature T as a function of position z follows the profile Frontiers in Robotics and AI | www.frontiersin.org where T 0 = 100 represents the temperature at the location of the heat source, see Figure 8. The environmental process is steered by actions a t that reflect the velocity of the agent. The output of the environmental process is the observed temperature by the agent. We assume that the agent observes a noisy temperature, leading to the following environmental process equations: In our simulation, we used initial position z 0 = 2 and observation noise variance ϑ = 10 −2 .

Internal Model Specification
The agent's internal model follows the factorization of Equation (9). The goal prior encodes a belief about a preferred temperature x • + = 4, as Furthermore, we endow the agent with an accurate model of system dynamics p(s k |s k−1 , u k ) = N s k |s k−1 + u k , 10 −2 .
However, in order to challenge the agent, we hamper the observation model. Instead of the actual temperature profile (Equation 19b), we use which simply specifies that the observed temperature decreases with position. The internal model definition is completed by specifying a prior for controls and a vague prior for the initial state: p(u k ) = N u k |0, 10 −2 (23a) p(s 0 ) = N s 0 |0, 10 12 .
Substituting Equations (20-23) in Equation (9) then returns the full internal model definition. One section of the generative model FFG is shown in Figure 9. A Julia code snippet 1 for constructing the internal model with ForneyLab is shown in Figure 10.
After having specified the generative model, ForneyLab can be used to automatically generate a message passing algorithm for free energy minimization, see Figure 11. ForneyLab generates Julia source code for the message passing algorithm, see Figure 12 for a snippet of the generated code.

Simulation Results
The agent-environment system was simulated using the experimental protocol as defined in section 4, for 100 time steps and with horizon T = 2, with the additional constraint that the agent is only allowed to act after t = 25. Execution of the experimental protocol amounts to executing the code of Figure 13. Note that this implementation of the experimental protocol ensures that the agent never observes the environmental states of the environment directly, but rather only interacts with the environmental process through the act, execute, and observe functions. This setup emphasizes the idea that the environmental states (z) are not (directly) observable. In principle, this setup is then applicable to a real-world setting where the environmental process is not a simulation.
The graphs of Figure 14 show the simulation results. It can be seen that after the agent is allowed to act, it quickly moves away from the heat source in order to eventually settle at the desired temperature. Apparently, despite the hampered observation model, the agent still attains the desired goal.
placeholder indicates a placeholder for data that can be passed to the algorithm during inference. algo = sumProductAlgorithm ([s; u]) # Generate message passing algorithm code FIGURE 11 | ForneyLab command for automated generation of the message passing algorithm for inference in the Bayesian thermostat simulation. The sum-product algorithm implicitly assumes the recognition model of Equation (11). ForneyLab also supports generating message passing code for alternative constraints on the recognition distribution (see e.g., Cox et al., 2018).   [9].dist * messages [20].dist return marginals end FIGURE 12 | Snippet of the automatically generated inference algorithm code (T = 2) for the Bayesian thermostat. Upon inference, the step! function builds an array of messages using pre-derived update rules (e.g., ruleSPAdditionOutVGG) and returns posterior beliefs over internal and control states.

Mountain Car
In this section we simulate an agent that aims to relocate and park itself on a steep hill. However, the agent's engine is too weak to climb the hill directly. Therefore, a successful agent should first climb a neighboring hill, and subsequently use its momentum to overcome the steep incline on the goal-side. This task is also known as the mountain car problem , which is considered a classical benchmark in the reinforcement learning literature (Sutton and Barto, 2018).

Environmental Process Specification
We start by defining the environmental process, which is similar to the process defined in Ueltzhöffer (2018). We interpret the environmental state z t = (φ t ,φ t ), as the respective position and velocity of the mountain car. The horizontal gravitational force component of the hilly landscape depends upon the mountain car's position by [see Figure 15 (top)] Furthermore, we define a velocity-dependent drag as Through actions, the agent is allowed to set the engine force, which is limited to the interval (−0.04, 0.04), by [see  Assuming unit mass for the mountain car, this leads to the following discrete system dynamics: In our simulation, we choose initial position φ 0 = −0.5 and initial velocityφ 0 = 0. The environmental process generates outcomes as noisy observations of the current state with an observation noise variance θ = 10 −4 · I, where I represents the 2 × 2 identity matrix. This leads to the following environmental system dynamics: where Equations (27a, 27b) are summarized by a transition function g(·).

Internal Model Specification
For the internal model we define observation variables x k = (ξ k ,ξ k ), and encode the agent's target to reach a desired state x • + = (ξ • + ,ξ • + ) = (0.5, 0) at time t = 20 by defining a time-dependent target prior In other words, we set a vague prior belief over short-term observations (t < 20), but aim to reach and remain at the goal state afterwards. We endow the agent with an accurate model of the environmental dynamics. These dynamics are captured by the transition model for the internal states s k = (ζ k ,ζ k ). Because the current state is a non-linear function of the previous state and action, we resort to local linear approximations of the system dynamics as explained in section 3.2. We choose the transition model p(s k | s k−1 , u k ) = N s k |g(s k−1 , u k ), 10 −4 · I , whereg represents the local linear approximation of g in Equation (28a). The observation model simply captures the observation noise FIGURE 16 | A factor graph visualization of one time slice of the agent's internal model for the mountain car problem (excluding the initial state prior).
These priors specify an accurate belief over the initial state and few prior constraints on control signals. Figure 16 shows one time slice of the agent's internal model.

Simulation Results
The simulation results 1 for 30 time steps and with horizon T = 20 are shown in Figure 17. Here, a naive policy (right column) with full throttle to the right does not overcome the steepest part of the incline, while the active inference process (left column) infers the need to move left before engaging full throttle to the right in order to reach the desired goal state.

RELATED WORK
Languages and toolboxes for automated probabilistic inference are increasingly studied in the research literature under the label Probabilistic Programming (PP). Recent state-of-the-art PP toolboxes such as Stan (Carpenter et al., 2017), Edward (Tran et al., 2016), and Infer.NET (Minka et al., 2018) support a broad spectrum of models and algorithms. However, dynamic models incorporate specific structures that may be exploited for improved algorithm efficiency. The SPM toolbox (Friston, 2014) includes specialized routines for simulating active inference processes, but offers limited modeling flexibility. ForneyLab marries flexible model design with automated derivation of FIGURE 17 | A naive policy in the mountain car task applies a full thrust to the right (right column). This naive policy is unable to make the agent traverse the barrier raised by the gravitational force. In contrast, active inference (left column) automatically infers that the agent should first move left (away from its goal) and subsequently use its momentum (and full forward thrust) to traverse the barrier. efficient structured inference algorithms on dynamic models (Cox et al., 2019). Furthermore, because ForneyLab produces inference algorithms as stand-alone (Julia) programs, these algorithms can be manually optimized before execution. Agent-environment interactions can be viewed from the perspective of coupled dynamical systems (Beer, 1995). It was recognized early on that a successful regulating agent must include a model of its environment (Conant and Ashby, 1970). Later efforts in the field of model-predictive control used this idea to build controllers for industrial applications (Camacho and Alba, 2013). Various reinforcement learning techniques have also been applied to the mountain car parking problem (Kuss and Rasmussen, 2004;Fürnkranz et al., 2012;Sutton and Barto, 2018), which can be extended to hierarchical contexts as well (Barto and Mahadevan, 2003). A comparison between active inference, risk-sensitive control (van den Broek et al., 2010) and expected utility maximization is provided by Friston et al. (2015). Further comparisons between active inference and reinforcement learning can be found in Friston (2012), , and Friston and Ao (2012).
While the present paper considers a discrete-time formulation of active inference, simulations of active inference Pio-Lopez et al., 2016;Buckley et al., 2017) have also been formulated and performed in the continuoustime domain (Friston et al., 2008). In the continuous-time treatment, preferred states are encoded as attracting sets in the dynamical system, and active inference leads the system to these attracting sets over time. While the continuoustime formulation allows for a more standard mathematical treatment, the discrete-time formulation supports explicit reasoning about targets at specific time-points and thereby more easily supports specification of priors for value-seeking behavior (Friston et al., 2015;Parr and Friston, 2017).

DISCUSSION AND CONCLUSION
This paper has described a message passing approach to automating simulations of online active inference processes, together with an experimental protocol that governs the interactions between the agent and its environment. We have tested our protocol on two synthetic applications, namely the Bayesian thermostat and the mountain car parking tasks.
Through these examples we have addressed the questions formulated in section 1, and illustrated how: 1. the proposed experimental protocol defines how to simulate the interactions between an active inference agent and its environment (section 4); 2. The ForneyLab toolbox allows for automatic scheduling of message passing algorithms for variational free energy minimization (section 5) in active inference agents.
The FFG formalism offers a modular decomposition of the internal model definition, allowing flexible model adaptations and intuitive visualization of complex models. Moreover, message passing algorithms for free energy minimization can be automatically derived on the FFG formulation of the agent's internal model. Automated derivation with ForneyLab returns the inference algorithm as a Julia program, which can be customized and executed in context of an experimental protocol. The proposed experimental protocol formulates the active inference process at each time step as an interplay between updating an internal (generative) model with actions and outcomes ("act-execute-observe"), followed by updating the recognition model ("infer") with the (statistical) consequences of the change in the generative model. Crucially, the agent and its environment solely interact through the exchange of actions and outcomes.
A current limitation of active inference with ForneyLab is that high-dimensional models may lead to numerical instabilities. Message passing with improved numerical stability is described by Loeliger et al. (2016). Furthermore, the specific message update order as prescribed by the schedule may have an effect on algorithm convergence. However, little theory still exists on optimal scheduling strategies. An interesting idea was mentioned in de Vries and Friston (2017), where it was suggested to approach the scheduling problem as an inference process that is itself subject to the free energy principle.
The presented approach to active inference relies fully on automatable inference methods. This aproach scales in principle to more complex applications that may be of interest to industry as well. For example, the state space models in the examples can be readily extended to hierarchical generative models Senoz and de Vries, 2018), which have been shown to be quite powerful in modeling real-world dynamics (e.g., Turner and Sahani, 2008;Mathys et al., 2014).
In order to construct hierarchical models, the policy priors may optionally depend on higher-order states, e.g., p(u k |s (1) k ), which renders prior constraints on control context-dependent. Similarly, target priors may also be made context-dependent, e.g., p(x k |s (1) k ). Contextual processes may thus influence the agent's behavior by modifying prior statistics, which allows the model design engineer to propose hierarchical and context-aware models. For example, when higher-order states evolve over longer timespans, hierarchical nesting leads to deep temporal models .
Higher-order dynamics could also be learned by free energy minimization (Ramstead et al., 2018). For example, the current simulations internalize a fixed model of the environmental dynamics. By including a prior belief over the dynamics in the internal model, the agent might learn the environmental dynamics from data (Ueltzhöffer, 2018). This adaptive agent then exhibits epistemic behavior and will take action in order to decrease uncertainty about the environmental dynamics Cullen et al., 2018). Moreover, the FFG paradigm supports messages that are computed by local gradient or sampling-based methods (Dauwels, 2007), and even allows for learning complex updates from data with the use of amortization techniques (Stuhlmüller et al., 2013;Gershman and Goodman, 2014). With these techniques, an adaptive agent might learn a rich and accurate model of its environment, leading to more effective behavior.
In summary, the current paper has proposed a scalable approach to automatic derivation of active inference algorithms and a practical view on the implementation of simulated active inference systems. We believe that synthetic active inference holds great promise for future engineering applications.

AUTHOR CONTRIBUTIONS
TvdL performed the simulations. TvdL and BdV wrote the manuscript.