Probabilistic inference in discrete spaces can be implemented into networks of LIF neurons

The means by which cortical neural networks are able to efficiently solve inference problems remains an open question in computational neuroscience. Recently, abstract models of Bayesian computation in neural circuits have been proposed, but they lack a mechanistic interpretation at the single-cell level. In this article, we describe a complete theoretical framework for building networks of leaky integrate-and-fire neurons that can sample from arbitrary probability distributions over binary random variables. We test our framework for a model inference task based on a psychophysical phenomenon (the Knill-Kersten optical illusion) and further assess its performance when applied to randomly generated distributions. As the local computations performed by the network strongly depend on the interaction between neurons, we compare several types of couplings mediated by either single synapses or interneuron chains. Due to its robustness to substrate imperfections such as parameter noise and background noise correlations, our model is particularly interesting for implementation on novel, neuro-inspired computing architectures, which can thereby serve as a fast, low-power substrate for solving real-world inference problems.


I. INTRODUCTION
The ability of the brain to generate predictive models of the environment based on sometimes ambigouous, often noisy and always incomplete sensory stimulus represents a hallmark of Bayesian computation.Both experimental [1,2] and theoretical studies [3][4][5] have explored this highly intriguing but also hotly debated hypothesis.These approaches have, however, remained rather abstract, employing highly idealized neuron and synapse models.
In this study, we explore how recurrent networks of leaky integrate-and-fire (LIF) neurons -a standard neuron model in computational neuroscience -can calculate the posterior distribution of arbitrary Bayesian networks over binary random variables through their spike response.Our work builds upon the findings of three previous studies: In Buesing et al. [5], it was shown how the spike pattern of networks of abstract model neurons can be understood as MCMC sampling from a well-definded class of target distributions.The approach was extended in a follow up paper [6] to Bayesian networks by identifying appropriate network architectures.The theoretical foundation for taking the step from abstract neurons to more realisitic networks of LIF neurons was developed recently in Petrovici et al. [7].In this paper, we follow and extend the approach from Petrovici et al. [7] to the network architectures proposed by Pecevski et al. [6].In particular, we describe a blueprint for designing spiking networks that can perform sample-based inference in arbitrary graphical models.
We thereby provide the first fully functional implementation of Bayesian networks with realistic neuron models.This enables studies in two complementary fields.On one hand, the development of network implementations for Bayesian inference contributes to the open debate on its biological correlate by exploring possible realizations in the brain.These can subsequently guide both targeted experimental research and computational modeling.Furthermore, additional physiological investigation is now made possible, e.g., of the influence of specific neuron and synapse parameters and dynamics or the embedding in surrounding networks.On the other hand, the finding that networks of LIF neurons can implement parallelized inference algorithms provides an intriguing application field for novel computing architectures.Much effort is currently invested into the development of neuroinspired, massively parallel computing platforms, called neuromorphic devices [8][9][10].These devices typically implement models of LIF neurons which evolve in parallel and without a central clock signal.This paper offers a concrete concept for the application of neuromorphic hardware as powerful inference machines.Interestingly, questions similar to the ones mentioned above in a biological context arise for artificial systems as well: the effect of parameter noise or limited bandwidth on functional network models is, for example, the subject of active research [11].
The document is structured as follows.In Sec.II, we review and adapt the theories from Buesing et al. [5], Pecevski et al. [6] and Petrovici et al. [7] to build a complete framework for embedding sampling from probability distributions into the structure and dynamics of networks of LIF neurons.In Sec.III, we provide the required translation rules and demonstrate the feasibility of the approach in computer simulations.We further compare the effect of different synaptic coupling dynamics and present a mechanism based on interneuron chains which significantly improves the sampling quality.Finally, we study robustness to parameter distortions and to correlations in the background noise, as these are likely to be present in any physical substrate, be it biological or artificial.
For the simulations with LIF neurons, we used PyNN [12] with NEST [13] or NEURON [14] as back-end.The simulations with networks of abstract model neurons were conducted in Python.

A. Bayesian Networks as Boltzmann Machines
The joint distribution defined by a Bayesian graph is the product of conditional distributions, one for each RV, with its value conditioned on the values of its parent variables.For a graph with K binary RVs Z k , the joint probability distribution is given by where z k represents the state vector of the variables Z k in Φ k , which we henceforth call principal RVs, and pa k represents the state vector of the parents of Z k .Z is a normalizing constant; w.l.o.g., we assume Φ k > 1.The factor p(z k |pa k ) is called an n th -order factor if it depends on n RVs or rather Such a Bayesian network can be transformed into a second-order Markov random field (i.e., an MRF with a maximum clique size of 2).Here, we follow the recipe described in Pecevski et al. [6].First and second-order factors are easily replaceable by potential functions Ψ k (Z k ) and Ψ k (Z k1 , Z k2 ), respectively.For each n th -order factor Φ k with n > 2 principal RVs, we introduce 2 n auxiliary binary RVs X z k ∈Z k k , where Z k is the set of all possible assignments of the binary vector Z k (Fig. 1 C).Each of these RVs "encode" the probability of a possible state z k within the factor Φ k by introducing the first-order potential functions Ψ where an auxiliary RV X z k k is active if and only if the principal RVs Z k are active in the configuration z k .
Formally, this corresponds to the assignment: In the graphical representation, this amounts to removing all directed edges within the factors and replacing them by undirected edges from the principal to the auxiliary RVs.It can then be verified [6] that the target probability distribution can be represented as a marginal over the auxiliary variables.
As the resulting graph is a second-order MRF, its underlying distribution can be cast in Boltzmann form: where the (symmetric) weight matrices W, V and bias vectors b, a are defined as follows: within second-order factors Φ k 0 otherwise (4) within first-order factors log within second-order factors (6) all other matrix and vector elements being zero.L 1 (•) represents the L 1 norm.In the theoretical model, M exc = ∞ and M inh = −∞, but they receive finite values in the concrete implementation (Sec.II D).From here, it is straightforward to create a corresponding classical Boltzmann machine.We therefore use a simplified notation from here on: we consider the vector Z to include both principal and auxiliary RVs and the Boltzmann distributions over Z are henceforth defined by the block diagonal weight matrix W and the bias vector b.

B. Neural Sampling: An Abstract Model
Gibbs sampling is typically used to update the states of the units in a Boltzmann machine.However, in a spiking network, detailed balance is not satisfied, since spiking neurons do not incorporate reversible dynamics due to the existence of refractory mechanisms.While a nonrefractory neuron can always be brought into the refractory state with sufficient stimulation, the reverse transition is, in general, not possible.It is possible, however, to understand the dynamics of a network of stochastic neurons as Markov chain Monte Carlo (MCMC) sampling.In the following, we use the model proposed in Buesing et al. [5] for sampling from Boltzmann distributions (Eq.3).
In this model, the spike response of a neuron is associated to the state z k of an RV Z k and a spike is interpreted as a state switch from 0 to 1.Each spike is followed by a refractory period of duration τ , during which the neuron remains in the state Z k = 1.The so-called neural computability condition (NCC) provides a sufficient condition for correct sampling, wherein a neuron's FIG. 1. Formulation of an example inference problem as a Bayesian network and translation to a Boltzmann machine.(A) Knill-Kersten illusion from [15].Although the four objects are identically shaded, the left cube is perceived as being darker than the right one.This illusion depends on the perceived shape of the objects and does not occur for, e.g., cylinders.(B) The setup can be translated to a Bayesian network with four binary RVs.The (latent) variables Z1 and Z2 encode the (unknown) reflectance profile and 3D shape of the objects, respectively.Conditioned on these variables, the (observed) shading and 2D contour are encoded by Z3 and Z4, respectively.(C) Representation of the Bayesian network from B as a Boltzmann machine.Factors of order higher than 2 are replaced by auxiliary variables as described in the main text.The individual connections with weights Mexc, M inh → ∞ between each principal and auxiliary variable have been omitted for clarity.
"knowledge" about the state of the rest of the networkand therefore its probability to spike -is encoded in its membrane potential: where Z \k (t) denotes the vector of all other variables Z i with i = k.By solving for Z k (t) = 1, one obtains a logistic neural activation function (Fig. 2 D), which is reminiscent of the update rules in Gibbs sampling: In order for a neuron to be able to track its progression through the refractory period, a refractory variable ζ k is introduced for each neuron, which assumes the value τ following a spike at time t and decreases linearly towards 0 at time t + τ .The transition probability to the state ζ k only depends on the previous state ζ k .The resulting sequence of states For the particular case of a Boltzmann distribution with weight matrix W and bias vector b, the NCC (Eq.8) is satisfied by neurons with the membrane potential represented by a sum of rectangular postsynaptic potentials (PSPs):

C. Neural Sampling with LIF Neurons
In contrast to the abstract model described above, biological neurons exibit markedly different dynamics.Most importantly, the firing times of individual neurons are not stochastic: in-vitro single neuron experiments show how a fixed stimulus sequence triggers a fixed spike train reliably over multiple trials [16].Also, their interaction is not mediated by rectangular PSPs.In-vivo, however, neurons often receive diffuse synaptic stimulus that alters their dynamics in several important ways [17].As demonstrated in Petrovici et al. [7] and described below, under such conditions, deterministic neurons can attain the required stochastic dynamics to sample from arbitrary Boltzmann distributions.
A widely used neuron model that captures the abovementioned characteristics of biological neurons is the leaky integrate-and-fire (LIF) model, which defines neuron membrane dynamics as follows: where C m , g l and E l represent the membrane capacitance, the membrane leakage conductance and the membrane leakage potential, respectively, and generated by synaptic conductances g syn : where w syn i represents the weight of the i th afferent synapse, E rev i its reversal potential and τ syn the synaptic time constant.
In the regime of diffuse synaptic background noise, it can be shown that the temporal evolution of the membrane potential is well approximated by an Ornstein-Uhlenbeck (OU) process with a mean and variance that can be computed analytically [7].This regime is achieved by intense synaptic bombardment from independent Poissonian spike sources with high firing rates ν syn and low synaptic weights.This allows calculating the temporal evolution of the membrane potential distribution p(u|u 0 ), as well as the mean first-passage time T (u 1 , u 2 ) of the membrane potential from u 1 to u 2 .With this, the activation function of a single LIF neuron can be expressed as where P n represents the probability of an n-spike burst and T n the average period of silence following such a burst.Both of these terms be expressed with recursive integrals that can be evaluated numerically [7].
Denoting by u eff the effective membrane potential (i.e., the average membrane potential under constant external stimulus other than the synaptic noise), this yields a sigmoidal activation function σ(u eff ) (see Fig. 2 D), which can be linearly transformed to the logistic activation function σ(v) to match the abtract model in Sec.II B: where u 0 represents the value of u eff for which p(Z = 1) = 1/2.The factor α denotes a scaling factor between the two domains and is equal to 4 [dσ/du eff ( u 0 )] −1 .
From here, a set of parameter translation rules between the abstract and the LIF domain follow, which are explained in more detail in Sec.II E. Fig. 2 E shows the result of sampling with LIF neurons from an example Boltzmann distribution together with the target probability values.

D. Characterization of the Auxiliary Neurons
In the mathematical model in Section II A, the weights between principal and auxiliary RVs are M exc = ∞ and M inh = −∞, to ensure a switching of the joint state whenever one of the auxiliary variables changes its assignment.In a concrete implementation, infinite weights are unfeasible.Here, we set the connection strengths where γ is a fixed number between 5 and 10.Neurons with a bias of M exc,k (M inh,k ) will effectively spike at maximum rate (remain silent), unless driven by afferent neurons with similarly high synaptic weights.
The individual values of the factor Φ k (z k ) are introduced through the bias of the auxiliary neurons: where the factor µ/ min z k [Φ k (z k )] ensures that the argument of the logarithm stays larger than 0 for all possible assignments z k .
Observed variables are clamped to fixed values 0 or 1 by setting the biases of the corresponding principal neurons to very large values (±20), to ensure that they spike either at maximum rate or not at all.This mimics the effect of strong excitatory or inhibitory stimulation.

E. Parameter Translation between Distributions and Networks
In the LIF domain, the bias b can be set by changing the leak potential E l such that the neuron is active with σ(b) for Z \k = 0: where g tot represents the total synaptic conductance and u b eff is the effective membrane potential that corresponds to the bias b: σ(u b eff ) = σ (b).For the translation of synaptic weights, we use the approximate PSP shape of an LIF neuron with conductance-based synapses in the HCS [7]: where, w ki denotes the synaptic weight from neuron i to neuron k and τ eff = C m / g tot the effective membrane time constant.For both the LIF domain and the abstract domain, a presynaptic spike is intended to have the same impact on the postsynaptic neuron, which is approximately realized by matching the average value of the LIF PSPs within a refractory period with the theoretically optimal constant value: Evaluating the integral in Eq. 18 yields the weight translation factor between the abstract and the LIF domain: τ syn e Fig. 4A shows the shape of such an LIF PSP.The shape is practically exponential, due to the extremely short effective membrane time constant in the HCS.We will later compare the performance of the LIF implementation to two implementations of the abstract model from Sec. II B: neurons with theoretically optimal rectangular PSPs of duration τ ref , the temporal evolution of which is defined as and neurons with alpha-shaped PSPs with the temporal evolution ) Here, t 1 and t 2 are the points in time where the alpha kernel is e • t • exp(−t) = 0.5.The value q 1 = 2.3 is a scaling factor and τ α = 17 ms • τ ref 30 ms is the time constant of the kernel [6].
In the abstract neural model, by definition, the rectangular PSPs can not superpose, since their width is identical to the refractory period of the neurons.In LIF neurons, PSPs do not end abruptly, possibly leading to (additive) superpositions, and thereby to deviations from the target distribution.To counteract this effect, we have used the Tsodyks-Markram model of short-term synaptic plasticity [18].Setting the facilitation constant τ facil = 0 leads to u n+1 = U 0 .With the initial utilization parameter U 0 = 1 and the recovery time constant τ rec = τ syn , the parameter R, which describes the recovery of the synaptic strength after the arrival of an action potential, yields FIG. 3. In order to establish a coupling which is closer to the ideal one (rectangular PSP), the following network structure was set up: Instead of using one principal neuron ν per RV, each RV is represented by a neural chain.In addition to the network connections imposed by the translation of the modeled Bayesian graph, feedforward connections between the neurons in this chain are also generated.Furthermore, each of the chain neurons projects onto the first neuron of the postsynaptic interneuron chain (here: all connections from ν i 1 to ν 1 2 ).By choosing appropriate synaptic efficacies and delays, the chain generates a superposition of single PSP kernels that results in a sawtooth-like shape which is closer to the desired rectangular shape than a single PSP.
where ∆t is the time interval between the n th and the (n + 1) th afferent spike.The condition in Eq. 22 is equivalent to a renewing synaptic conductance, which, due to the fast membrane in the HCS, is in turn equivalent to renewing PSPs.

F. Performance Improvement via a Superposition of LIF PSP Kernels
The difference in PSP shapes between the LIF domain and the theoretically optimal abstract model is the main reason why the direct translation to LIF networks causes slight deviations from the target probability distribution.The sometimes strong interaction involved in the expansion of Bayesian networks into Boltzmann machines (see Eq. 5) leads to a large overshoot of the membrane potential at the arrival of a PSP and a nonzero PSP tail beyond t = t spike + τ ref (see Fig. 4 A).
In order to reduce this discrepancy, we replaced the single-PSP-interaction between pairs of neurons by a superposition of LIF PSP kernels.For this, we replaced the single neuron that coded for an RV by a chain of neurons (see Fig. 3).In this setup, the first neuron in a chain is considered the "main" neuron, and only the spikes it emits are considered to encode the state z k = 1.However, all neurons from a chain project onto the main neuron of the chain representing a related RV.This neuron then registers a superposition of PSPs, which can be adjusted (e.g., with the parameter values from Tab. II) to closely approximate the ideal rectangular shape by appropriately setting synaptic weights and delays within as well as between the chains.In particular, the long tail of the last PSP is cut off by setting the effect of the last neuron in the chain to oppose the effect of all the others (e.g., if the interaction between the RVs is to be positive, all neurons in the chain project with excitatory synapses onto their target, while the last one has an inhibitory outgoing connection).While this implementation only scales the number of network components (neurons and synapses) linearly with the chosen length of the chains, it improves the sampling results significantly (Fig. 4 B, C, D gray bars/traces).

III. RESULTS
In the Methods Section, we have provided a comprehensive description of the translation of arbitrary Bayesian graphs to networks of LIF neurons.Now, we apply these networks to several well-studied cognitive inference problems and study their robustness to various types of substrate imperfections.

A. Bayesian Model of the Knill-Kersten Illusion
Fig. 1 illustrates the translation of the Bayesian graph describing the well-studied Knill-Kersten illusion [15] to the LIF domain.We have chosen this experiment since it has been thoroughly studied in literature and it has a rather intuitive Bayesian representation.More importantly, it features some essential properties of Bayesian inference, such as higher-order dependencies within groups of RVs and the "explaining away" effect.The underlying Bayesian model consists of four RVs: Z 1 (reflectance step versus uniform reflectance), Z 2 (cylindrical versus cuboid 3D shape), Z 3 (sawtooth-shaped versus some other shading profile) and Z 4 (round versus flat contour).The network structure defines the decomposition of the underlying probability distribution: The inference problem consists in estimating the relative reflectance of the objects given the (observed) contour and shading.Analytically, this would require calculating p(Z 1 |Z 3 = 1, Z 4 = 0) for the cuboid shapes and p(Z 1 |Z 3 = 1, Z 4 = 1) for the cylindrical ones.
Fig. 4 shows the behavior of the LIF network that represents this inference problem.When no variables are clamped, the network samples freely from the unconstrained joint distribution over the four RVs.The performance of the network, i.e., its ability to sample from the target distribution, is quantified by the Kullback-Leibler (KL) divergence between the target and the sampled distribution normalized by the entropy of the target distribution: with the KL divergence between the sampled distribution q and the target distribution p and the entropy of the target distribution p When presented with the above inference problem the LIF network performs well at sampling from the condi-tional distributions p(Z 1 |Z 3 , Z 4 ) (Fig. 4 B, C).When the stimulus is changed during the simulation, the optical illusion, i.e., the change in the inferred (perceived) 3D shape and reflectance profile, is clearly represented by a change in firing rates of the corresponding principal neurons (Fig. 4 D).For each point in time, the rate is determined by convolution of the spike train with a rectangular kernel At t = 100 s (red line), the evidence is switched: Z 4 = 1 → 0. The switch happens on the time scale of several seconds, as can be seen in the spike raster plot.
When not constrained by prior evidence, i.e., when sampling from the joint distribution over all RVs, the LIF network settles on an equilibrium distribution that lies close to the target distribution (Fig. 4 E, F, green traces).For this particular network, the convergence time is of the order of several tens of seconds.

B. Robustness to Parameter Distortions
We further investigated the robustness of our proposed implementation of Bayesian inference with LIF neurons to low levels of parameter noise (see Tab. I).Here, we focus on fixed-pattern noise, which is inherent to the production process of semiconductor integrated circuits and is particularly relevant for analog neuromorphic hardware [11,19].However, such robustness would naturally also benefit in-vivo computation.
Some of the noise (the one affecting the neuron parameters that are not changed when setting weights and biases) can be completely absorbed into the translation rules from Sec. II C. Once the neurons are configured, their activation curves can simply be measured, allowing a correct transformation from the abstract to the LIF domain.However, while the neurons remain the same between different simulation runs, the weights and biases may change depending on the implemented inference problem and are still subject to noise.Nevertheless, even with a noise level of 10% on the weights and biases, the LIF network still produces useful predictions (Fig. 4 B, C, E yellow bars/traces).

C. Robustness to Noise correlations
The investigated implementation of Bayesian networks ideally requires each neuron to receive independent noise as a Poisson spike train.When aiming for a hardware implementation of large Bayesian networks, this requirement may become prohibitive due to the bandwidth limitations of any physical back-end.We therefore examined the the robustness of our LIF networks to small crosscorrelations between the Poissonian noise channels of individual neurons.
For both the excitatory and the inhibitory background pools, we induced pairwise noise correlations by allowing neurons within the network to share 10% of their background Poisson sources.The controlled cross-correlation of 10% between noise channels is achieved in the following way: each neuron receives Poisson background from three shared and seven private Poisson spike trains.The excitatory and inhibitory noise of each individual neuron remained uncorrelated in order to leave its activation function (Eq.13) unaltered.Each of shared sources projects onto exactly two neurons in order to prevent higher-order correlations.The single Poissonian spike trains have a firing rate of ν/10, such that their superposition is also Poisson, with the target firing rate of ν.With this setup, we were able to verify that small pairwise correlations in the background noise do not significantly reduce the ability of the LIF network to produce useful predictions (Fig. 4 B, C, F orange bars/traces).

D. General Bayesian Networks
In order to study the general applicability of the proposed approach, we quantified the convergence behavior of LIF networks generated from random Bayesian graphs.Here, we used a method proposed in Ide and Cozman [20] to generate random Bayesian networks with K binary RVs and random conditional probabilities.The algorithm starts with a chain graph Z 1 → Z 2 → • • • → Z K and runs for N iterations.In each iteration step, random RV pairs (Z i , Z j ) with i > j are created.If the connection Z i → Z j does not exist, it is added to the graph, otherwise it removed, with two constraints: any pair of nodes may not have more than 7 connections to other nodes and the procedure may not disconnect the graph.For every possible assignment of pa i , the conditional probabilities p pa i i := p(Z i = 1|pa i ) are drawn from a second-order Dirichlet distribution D(p with the multinomial Beta function where Γ(•) denotes the gamma function.We chose the parameters η 1 = η 2 =: η in order to obtain a symmetrical distribution.Fig. 5 A shows three examples of a symmetrical two-dimensional Dirichlet distribution.A larger η favors conditional probabilities which are closer to 0.5 than to the boundaries 0 and 1.We implemented Bayesian networks with K = 5 random variables and N = 50000 The random graphs were then translated to sampling neural networks, both with abstract model neurons and LIF neurons.The performance was tested for sampling from the unconstrained joint distributions over the 5 RVs.In the simulations, we varied η between 0.3 and 10 and created 30 random Bayesian graphs for each η.Each network was then run for a total duration of 100 s.
Fig. 5 B illustrates the average sampling results for the different PSP shapes as a function of the "extremeness" of the randomized conditional probabilities, which is reflected by the parameter η.For larger η, conditionals cluster around 0.5 and the RVs become more independent, making the sampling task easier and therefore improving the sampling performance.The curves show the median of the D norm KL between sampled and target distributions of the 30 random Bayesian graphs.The shaded regions denote the standard error.Overall, the LIF networks perform well, capturing the main modes of the target distributions.As with the Bayesian model of the Knill-Kersten illusion, the main cause of the remaining discrepancy is the difference in PSP shapes between the LIF domain and the theoretically optimal abstract model.A modification of the RV coupling by means of the neuron chains described in Sec.II F leads to a significant improvement of the sampling results for arbitrary Bayesian networks (Fig. 5 B, C gray traces).

IV. DISCUSSION
In this article, we have presented a complete theoretical framework that allows the translation of arbitrary probability distributions over binary RVs to networks of LIF neurons.We build upon the theory from Buesing et al. [5] and Pecevski et al. [6] and extend their work to a mechanistic neuron model widely used in computational neuroscience based on the approach in Petrovici et al. [7].In particular, we make use of the conductance-based nature of membrane dynamics to enable fast reponses of neurons to afferent stimuli.
We have demonstrated how networks of conductancebased LIF neurons can represent probability distributions in arbitrary spaces over binary RVs and can perform stochastic inference therein.By comparing our proposed implementation to the theoretically optimal, abstract model we have shown that the LIF networks produce useful results for the considered inference problems.Our framework allows a comparatively sparse implemen-tation in neural networks, both in terms of the absolute number of neurons as well as considering energy expenditure for communication, since state switches are encoded by single spikes.This compares favorably with other implementations of inference with LIF neurons, based on e.g.firing rates or reservoir computing [21].
The main cause for the deviations of the LIF equilibrium distributions from the target distributions lie in the shape of synaptic PSPs.We shave shown how a more complex coupling mechanism based on interneuron chains can improve inference by allowing a more accurate translation of target distributions to networks of LIF neurons.This kind of interaction provides a connection to other well-studied models of chain-based signal propagation in cortex [11,22,23].A similar interaction kernel shape can be achieved by multiple synapses between two neurons (multapses) lying at different points along a dendrite, causing their PSPs to arrive at the soma with different delays.From a computational point of view, the increased sampling performance due to this coupling mechanism only comes at the cost of linearly increasing network resources.
Furthermore, we have shown that imperfections of the physical substrate of the neuronal implementation, be it biological or artificial, can be well tolerated by our model.In particular, we have studied the robustness of our networks to small levels of parameter noise as well as weak correlations between the noise channels of individual neurons.We therefore regard it as a promising candidate for implementation on neuromorphic devices, which can augment these already efficient networks by providing a fast, low-power emulation substrate.Implemented on such substrates, our networks can serve as a basis for machine learning algorithms, thereby facilitating the development of, e.g., autonomous robotic learning agents.

Fig. 2 A
illustrates the transition of the state variable ζ k .A neuron with ζ k ∈ {0, 1} elicits a spike with probability σ(v k −log τ ).This defines the stochastic neuron model in Buesing et al.[5].

Fig. 2 B
Fig. 2 B illustrates the time courses of the membrane potential v k , the state variable Z k and the refractory variable ζ k of an abstract model neuron.
FIG. 2. Neural sampling: abstract model vs. implementation with LIF neurons.(A) Illustration of the Markov chain over the refractory variable ζ k in the abstract model.(B) Example dynamics of all the variables associated with an abstract model neuron.(C) Example dynamics of the equivalent variables associated with an LIF neuron.(D) Free membrane potential distribution and activation function of an LIF neuron: theoretical prediction vs. experimental results.The blue crosses are the mean values of 5 simulations of duration 200 s.The error bars are smaller than the size of the symbols.Table I lists the used parameter values of the LIF neuron.(E) Performance of sampling with LIF neurons from a randomly chosen Boltzmann distribution over 5 binary RVs.Both weights and biases are chosen from a normal distribution N (µ = 0 , σ = 0.5).The green bars are the results of 10 simulations of duration 100 s.The error bars denote the standard error.

FIG. 4 .
FIG. 4. Comparison of the different implementations of the Knill-Kersten graphical model (Fig. 1): LIF (green), LIF with noised parameters (yellow), LIF with small cross-correlations between noise channels (orange), mLIF PSPs mediated by a superposition of LIF PSP kernels (gray), abstract model with alpha-shaped PSPs (blue), abstract model with rectangular PSPs (red) and analytically calculated (black).The error bars for the noised LIF networks represent the standard error over 10 trials with different noised parameters.All other error bars represent the standard error over 10 trials with identical parameters.(A) Comparison of the four used PSP shapes.(B, C) Inferred marginals of the hidden variables Z1 and Z2 conditioned on the observed (clamped) states of Z3 and Z4.In B, (Z3, Z4) = (1, 1).In C, (Z3, Z4) = (1, 0).The duration of a single simulations is 10 s. (D) Marginal probabilities of the hidden variables reacting to a change in the evidence Z4 = 1 → 0. The change in firing rates (top) appears slower than the one in the raster plot (bottom) due to the smearing effect of the box filter used to translate spike times into firing rates.(E, F) Convergence towards the unconstrained equilibrium distributions compared to the target distribution.In D, the performance of the four different PSP shapes from A is shown.The abstract model with rectangular PSPs converges to D norm KL= 0, since it is guaranteed to sample from the correct distribution in the limit t → ∞.In E, the performance of the three different LIF implementations is shown.

FIG. 5 .
FIG. 5. Sampling from random distributions over 5 RVs with different networks: LIF (green), mLIF (gray), abstract model with alpha-shaped PSPs (blue) and abstract model with rectangular PSPs (red).(A) Distributions for different values of η from which conditionals are drawn.(B) D norm KL between the equilibrium and target distributions as a function of η.The error bars denote the standard error over 30 different random graphs drawn from the same distribution.(C) Evolution of the D norm KL over time for a sample network drawn from the distribution with η = 1.Error bars denote the standard error over 10 trials.

Fig. 5 C
Fig. 5 C shows the temporal evolution of the D norm KL between sampled and target distributions for a sample Bayesian network drawn from the distribution with η = 1 that lied close to the D norm KL median in Fig. 5 B. The curves illustrate the average results of 10 simulations, while the shaded regions denote the standard error.As with the Bayesian model of the Knill-Kersten illusion, the main cause of the remaining discrepancy is the difference in PSP shapes between the LIF domain and the theoretically optimal abstract model.A modification of the RV coupling by means of the neuron chains described in Sec.II F leads to a significant improvement of the sampling results for arbitrary Bayesian networks (Fig.5B, C gray traces).

TABLE I .
Standard neuron and network parameters used in this paper.The network parameter denotes a sample from the uniform distribution unif(0.9,1.1).Parameters of the first chain neuronCapacity of the membrane Cm 0.2 nF Membrane time constant τm 0.1 ms Duration of refractory period τ ref 29.5 ms Decay time of the excitatory synaptic conductance τsyn,exc 30.0 ms Decay time of the inhibitory synaptic conductance τ syn,inh 30.0 ms Reversal potential for excitatory input E rev

TABLE II .
Parameters of the interneuron chain of 6 neurons, which are used to generate the mLIF PSP.