Bayesian Population Decoding of Spiking Neurons

The timing of action potentials in spiking neurons depends on the temporal dynamics of their inputs and contains information about temporal fluctuations in the stimulus. Leaky integrate-and-fire neurons constitute a popular class of encoding models, in which spike times depend directly on the temporal structure of the inputs. However, optimal decoding rules for these models have only been studied explicitly in the noiseless case. Here, we study decoding rules for probabilistic inference of a continuous stimulus from the spike times of a population of leaky integrate-and-fire neurons with threshold noise. We derive three algorithms for approximating the posterior distribution over stimuli as a function of the observed spike trains. In addition to a reconstruction of the stimulus we thus obtain an estimate of the uncertainty as well. Furthermore, we derive a ‘spike-by-spike’ online decoding scheme that recursively updates the posterior with the arrival of each new spike. We use these decoding rules to reconstruct time-varying stimuli represented by a Gaussian process from spike trains of single neurons as well as neural populations.


INTRODUCTION
Understanding how stimuli and other inputs to neurons can be decoded from their spike patterns is an essential step towards understanding neural codes.Neurons communicate by sequences of action potentials, which can be viewed as a sequence of discrete events in time.Many sensory inputs, however, change continuously in time and have variations across a large range of different time scales.Similarly, the occurrence of spikes can depend on continuous electrophysiological signals such as local fi eld potentials (Montemurro et al., 2008;Rasch et al., 2008).Here, we seek to achieve a better understanding of how such continuous signals can be decoded from neuronal spike trains, and how the basic biophysical dynamics of individual neurons affect the encoding process.
We will investigate these questions using leaky integrate-and-fi re neurons (LIFs) (Stein, 1967;Tuckwell, 1988).Leaky integrators constitute a natural choice as they capture basic dynamical properties of neurons, yet are still amenable to analytical studies of dynamic encoding.In this model, a spike is emitted as soon as the integrated input reaches a threshold.Thus, the relative timing of spikes will contain information about the stimulus in the recent past.In the noiseless case, an elegant solution has been proposed for decoding a time-varying stimulus from integrate-and-fi re neurons based on computing the pseudo-inverse (Seydnejad and Kitney, 2001) which can also be used to decode neural populations (Lazar and Pnevmatikakis, 2008).
Here, we seek to generalize from the noiseless to the noisy case.Specifi cally, we study decoding rules for reconstructing timevarying, continuous stimuli from populations of leaky integrateand-fi re neurons with noisy membrane thresholds.Incorporating noise into the model does not only make the model more realistic, but also naturally leads to a Bayesian approach to population coding (Rao et al., 2002;Huys et al., 2007;Natarajan et al., 2008).Each spike constitutes a noisy measurement of the underlying membrane potential and, using the Bayesian formalism, this relationship can be inverted in order to infer the posterior distribution over stimuli (Paninski et al., 2007;Lewi et al., 2009).While many studies have addressed Bayesian population codes and the representation of uncertainty in neural populations (Pouget et al., 2000;Rao et al., 2002;Rao, 2005;Ma et al., 2006), the question of how posterior distributions can be decoded from the spike-times of LIFs has not been studied in detail.Natarajan et al. (2008), Huys et al. (2007) analyzed probabilistic decoding of continuously varying stimuli, but they did not use the LIF neuron model but an inhomogeneous Poisson point process.
A Bayesian decoding rule does not only return a point estimate of the stimulus, but also an estimate of the posterior covariance, representing the residual uncertainty about the stimulus.This uncertainty estimate is of critical importance for a 'spike-by-spike' decoding scheme (Wiener and Richmond, 2003), as it allows one to appropriately weight each observation by its reliability.In addition, the uncertainty directly relates to the accuracy of the neural code.By inspecting the posterior variance of different stimulus features, one can gain insight into the accuracy with which different features are represented by the population.
For the sake of clarity, we choose a simple threshold noise model, which does not affect the dynamics of the integration process but only sets the threshold to a new random value whenever a spike has been elicited (Gerstner and Kistler, 2002).The generation of spikes in this model class can be described by a renewal process.A Gamma point process is obtained as special case in the limit of a large membrane time constant when the threshold values are drawn from a Gamma distribution.In particular, when the exponential distribution is chosen, the spike generation process constitutes an inhomogeneous Poisson process.The Gamma distribution is a computationally convenient distribution which ensures positiveness of the threshold.Therefore, this choice of noise model is conceptually simple, but nevertheless can be used to model a wide range of different spiking statistics.However, even for this simple noise model, the exact shape of the posterior distribution over stimuli can not be obtained in closed form in general and approximations have to be used.Here, we derive three decoding rules based on Gaussian approximations to the posterior distribution.We show that the simple decoder which originates from the noiseless case is biased when introducing threshold noise.We then derive an expression for the bias length and state conditions under which this leads to an improved estimator of the stimulus.Furthermore, we show how this estimate can be updated iteratively every time a new spike is observed.
The paper is organized as follows: In the Section 'Encoding' we describe the basic encoding model as well as the stochastic description of the time-varying input.The decoding in the noiseless case can be extended to include threshold noise as well.This leads to an approximate likelihood, from which we derive several approximations to the full posterior distribution in the Section 'Decoding' .In the Section 'Alternative Methods' we compare the resulting Bayesian decoding schemes to alternative reconstructions, such as the linear decoding (Bialek et al., 1991) andthe Laplace approximation (MacKay, 2003;Rasmussen and Williams, 2006;Paninski et al., 2007) based on the likelihood approximation.Finally, in the Section 'Simulations', we apply the decoding schemes to different scenarios which illustrate different aspects of neural population coding.

ENCODING
The encoding process is split up into two parts: The fi rst one is the neural encoding part, which characterizes the spike generation process for a given stimulus.The second part describes the stimulus ensemble.

LEAKY INTEGRATE-AND-FIRE NEURON WITH THRESHOLD NOISE
We start with the classic leaky integrate-and-fi re neuron model (Tuckwell, 1988;Gerstner and Kistler, 2002).It consists of a membrane potential V t which accumulates the effective input I t .Here, V t and I t are scalar functions if a single neuron is modeled, or vectors if a population is considered.Whenever the membrane potential of a neuron n reaches a pre-specifi ed threshold θ n a spike is fi red and the membrane potential is reset to zero, i.e. lim ( ) .
= In addition to the input I, there is a leak term which drives the membrane potential back to zero when no input is present.Correspondingly, the sub-threshold dynamics of the membrane potential can be described by the following ordinary differential equation (ODE): (1) The time constant τ specifi es the time scale of the neural dynamics.Assuming the time of the last spike is t k − , the membrane potential at any time t before the next spike is given by: ) − , I is a linear functional of the stimulus I depending on the time of the last spike t k − and the current time point t.Due to the additional spiking nonlinearity that governs the dynamics when the membrane potential reaches the threshold, the LIF neuron performs a complex mapping of continuous signals to spike patterns.A simple way of incorporating noise into our model is to vary the threshold from spike to spike in a stochastic fashion.Every time a spike is fi red, the threshold is drawn from a known distribution with density p θ .Thus for every given (constant) stimulus, the resulting point process is a renewal process.
With these assumptions we can write down the likelihood of observing a spike train of one neuron for a given stimulus I t : , −1 defi ned as in Eq. 2 and I ( denotes the stimulus between t k − 1 and t k .The fi rst equality holds because of the renewal property of the spike generation process.In other words, the time of the next spike only depends on the time of the previous spike and the stimulus since then.Subsequently, we change variables from I is only a function of t k because we condition on t k − 1 and I.As the value of the linear functional at the time of a spike equals the threshold θ, we plug in the density for the threshold p θ .The change of variables t k to I is only one-to-one, if one uses the fact, that t k is the fi rst time I equals the threshold.Therefore, plugging in the threshold distribution without accounting for the problem, that F(I) may have been super-threshold turns the last equation into an approximation.If we consider a whole population, the likelihood reads: where t k − denotes the time of the previous spike of the neuron, which fi red a spike at time t k .The threshold distribution p θ might be different for different neurons.For notational simplicity, however, we do not indicate this.In the following the spike times t k are ordered and indexed by the subscript k.Which neuron fi red the spike t k only enters the calculation in the computation of the linear functionals F I t t k k − , ( ).Therefore we drop the dependency of the neuron.
There is no simple way how the sub-threshold condition can be incorporated.However, we can include the condition that at the time of reaching the threshold, the membrane potential V t must be increasing by adding the requirement , − I > 0 (Pillow and Simoncelli, 2002;Arcas and Fairhall, 2003).
For the threshold noise we assume a Gamma distribution with shape parameter α and scale parameter β: As a special case, if the input is non-negative and if the time constant goes to infi nity, the resulting point process is an inhomogeneous Gamma-renewal process.
In this way we obtain an approximate likelihood, when the threshold is varied at the time of spikes.The case of white input noise and fi xed threshold is described in Paninski et al. (2004).This can equivalently be seen as varying the thresholds continuously according to an Ornstein Uhlenbeck process.For the case of softthreshold based likelihoods from the family of Generalized Linear Models (see Jolivet et al., 2006;Paninski et al., 2007).

SPECIFYING THE PRIOR: A MODEL FOR THE STIMULUS
The prior distribution specifi es the assumption about the range and relative frequency of different stimuli.A common approach is to use a maximum entropy prior.In particular, the normal distribution is a maximum entropy distribution for given mean and covariance.As stimuli are functions of time, we have to specify a distribution over functions.We choose a fi nite set of basis functions { f i } and then specify a distribution over the coeffi cients from which all possible functions are generated by a linear superposition: The coeffi cients c i are drawn from the Gaussian prior distribution.We denote the mean and the covariance matrix by c and Σ c , respectively.For stationary processes, a natural choice of basis functions is the Fourier basis.Any superposition of such basis functions will result in a smooth function.Defi ning a covariance structure for the coeffi cients directly translates into the structure of the powerspectrum.Thus, I t is a fi nite-dimensional Gaussian process.Using a fi nite number of basis functions poses a potential diffi culty for the spike generation process described in the previous section.If one uses basis functions which are bounded, so will be any sample from the input process.Therefore, there is a non-zero probability that a threshold is drawn which could never be reached by the membrane potential.However, if we use a fl at power-spectrum, i.e. isotropic covariance for the coeffi cients, and increase the number of Fourier basis functions the process will converge to a Brownian motion.For Brownian motion as input, the membrane potential is an Ornstein-Uhlenbeck process and therefore will eventually exceed any threshold.For the simulations in this paper, we never observed an infi nitely long inter-spike interval.
Using this model for the stimulus we can rewrite the linear functional of the stimulus as an inner product with the stimulus coeffi cients: Ignoring the likelihood term of the fi rst spike time t 0 , we can write down the approximate log-likelihood (Eq. 3) as follows: where the constant does not depend on t k , I t .As Paninski pointed out (Paninski et al., 2007), this model is a Generalized Linear Model (GLM).The resulting encoding process is illustrated in Figure 1.

DECODING
In the previous section, we have seen that the encoding process can be described by a conditional distribution p(r |s), the probability of observing a neural response r, given that a stimulus s was presented.For the task of decoding, an important conceptual distinction can be made between point estimation and probabilistic inference.The latter consists of inferring the full posterior distribution p(s |r): the probability of stimulus s, given that we observed a specifi c neural response r.Point estimation in contrast requires to make a decision for one particular stimulus as a best guess.Typical point estimates are the posterior mean E[s |r] or the stimulus s* for which the posterior distribution takes its maximum (maximum a posteriori, MAP).These choices are optimal for different loss functions.A loss function specifi es the 'cost' of guessing stimulus ŝ if the true stimulus was s.The posterior mean is optimal for the squared error loss ||s − ŝ|| 2 , whereas the MAP is optimal under the 0/1 loss.Although the 0/1 loss, which has a constant loss for arbitrarily small errors, is an arguably unnatural choice for continuous stimuli, MAP decoding is still popular and often performs well also with respect to other loss functions.Further, the posterior mean together with the posterior variance can also be regarded as a Gaussian approximation to the full posterior distribution.
In the following we will start from the noiseless case, re-deriving the pseudo-inverse decoding scheme that has been presented before by (Seydnejad and Kitney, 2001).We show that when introducing noise, the pseudo-inverse can still be seen as an approximate decoding rule, but suffers from an asymptotic bias.In order to cope with this problem, we derive a bias-reduced version as well, which canbe applied in an iterative 'spike-by-spike' fashion.

NOISELESS CASE
In the noiseless case, the problem of inverting the mapping from stimulus to spike-times can be interpreted as a linear mapping (see Seydnejad and Kitney, 2001;Pillow and Simoncelli, 2002;Arcas and Fairhall, 2003).Roughly speaking, each interspike interval defi nes one linear constraint on the set of possible stimuli that could have evoked the observed spike response.The evolution of the membrane potential during an interspike interval is obtained via Eq. 2. As the spike times correspond to threshold crossings of the membrane, we know that the membrane potential hits the threshold θ at time t k : If we represent the stimulus in terms of a linear superposition of basis functions (see Encoding), we can address the decoding problem within the framework of fi nding a linear inverse mapping.Decoding of the stimulus signal I(t) is equivalent to inferring the coeffi cients c i from the observed spike trains.Every interspike interval imposes a linear constraint on the coeffi cients c i .
where the components of y are defi ned as in Eq. 7. Note that Eq. 10 is a necessary condition for the coeffi cients.The unknown coeffi cients c can be uniquely determined if the number of linearly independent constraints is equal to or larger than the number of unknown coeffi cients (see also Figure 2).We can summarize the constraints compactly in a linear equation: In general, a solution to this equation can be found by using the Moore-Penrose pseudo-inverse (Penrose, 1955): The pseudo-inverse is well defi ned even if the matrix L is not square or is rank-defi cient.If the number of interspike intervals exceeds the number of coeffi cients, the pseudo-inverse is given by:

One dimensional stimulus: exact inference
We start with a simple case in which exact inference is possible: the stimulus consists of a constant (one dimensional) input c, i.e.
In this situation, we can write down the likelihood exactly.
For the observations we have: In this case, we do not have to account for the sub-threshold condition as the evolution of the membrane-potential since the last spike is a monotonic function and therefore there is only one possibility to be at the threshold for a given stimulus at a specifi c time.In particular, if the threshold is Gamma distributed (as assumed in 'Introduction'), we see that y k |c is also Gamma distributed with parameters α, β/c.For now we choose c to be Gamma distributed as well (say with parameters α 0 , β 0 ).This choice deviates from the choice in the Section 'Encoding' , but for this choice, we can write down the posterior exactly: Having the posterior in closed form we can calculate the posterior mean as well as the point of maximal posterior probability exactly.Thus, we have in the special case of a constant one-dimensional input a reference for later use (see also Figure 3).

Gaussian factor approximation
The pseudo-inverse solution of the Section 'Introduction' has also a probabilistic interpretation in linear Gaussian models (see also Bishop, 2006): In this setting, it can be interpreted as the posterior mean estimate for data with a Gaussian distribution.In particular, if (for the moment) we assume that the linear functionals y(t k − , t k ) are observed and that c ‫ׅ‬ y(t k − , t k ) is Gaussian distributed around the mean of the threshold θ with a constant variance σ θ 2 , the posterior mean of the coeffi cients c would be the same as the pseudo-inverse described above.However, this setting is not directly applicable to the context of decoding a stimulus from spike times of LIFs: In a linear Gaussian model, the observed functionals y(t k − , t k ) would not be allowed to depend on either c or θ, but they do here.This is most easily explained for a one-dimensional stimulus: We have that θ = cy, and therefore y = θ/c.This can be highly non-Gaussian even if the distribution of θ and c are Gaussian1 .We now derive a probabilistic decoding rule which is analogous to the pseudo-inverse used in the noiseless case.Each observation defi nes a linear constraint: We can approximate the distribution of the threshold by a Gaussian term.Each linear constraint defi nes one factor of the likelihood.That is, p θ in Eq. 3 is replaced with a Gaussian term of the form: = is the variance of the threshold distribution.Additionally, we have replaced θ by its mean θ , because we are not observing θ but t k .Each of these factors peaks at θ = c ‫ׅ‬ y(t k − , t k ), therefore refl ecting the linear constraint.Replacing every term in the likelihood by its corresponding Gaussian approximation and including one Gaussian factor for the prior p(c) ∼ N( c , Σ c ), the posterior is approximated by a Gaussian with the following moments: In ( 16) and ( 17), we have abbreviated y(t k − , t k ) = y k .In addition to the pseudo-inverse (Eq.12), this approximation takes the prior distribution over stimuli into account, specifi ed by the mean c and covariance Σ c of the coeffi cients c.This can be seen by setting Σ c − = 1 0, i.e. by using an uninformative prior.Then the mean of this approximation p is exactly the pseudo-inverse of Eq. 12.Our approach of replacing likelihood factors by Gaussians is similar to the extended Kalman fi lter, where the dynamics is linearized and therefore results in a Gaussian update for the hidden states.However, it is known that this approximation can be biased (see Julier and Uhlmann, 1997;Minka, 2001).Similarly, in our case, the mean of this approximation also does not converge to the true coeffi cient values for increasing number of observed spikes, as shown in Figure 3. Fortunately, under some simplifying assumptions, this bias can be calculated and therefore can be signifi cantly reduced as will be shown in the following section.

Bias reduction of the Gaussian factor approximation
In this section we calculate the asymptotic length bias for the approximate posterior mean of Eq. 16, assuming a correct orientation of the coeffi cient vector.By fi xing a stimulus, i.e. c, we defi ne the average over all resulting interspike intervals E[y k ]: = y and Cov[y k ]: = Σ y .We then fi nd asymptotically for n >> 1 and for a fi xed c: Note that we do not know the distribution of the y k and that this distribution depends on the distribution of the threshold as well as the choice of basis functions.However, the proportion of y in the direction of c is on average of magnitude θ and the variance along c on the other hand is σ θ 2 .Orthogonal to c, we assume, that y has zero mean and fi nite variance.This assumption is justifi ed in the one-dimensional case, because there is simply no orthogonal direction.Empirically, it turns out to reduce the amount of bias substantially; see Figure 6.
Therefore, we can rewrite where Here, c Ќ denotes the basis for the space orthogonal to c and σ σ , , … are the variances in the direction of the basis vectors of c Ќ which are not important for the calculation of the bias.Wecan now compute the asymptotic posterior mean: We obtain ( 21) from ( 20) by using the Woodbury matrix identity.By definition of U, all directions orthogonal to c cancel out and Eq.22 follows.Equation 22shows that (asymptotically) the norm of the posterior mean approximation is biased.The direction, however, is correct.Therefore, the Moore-Penrose pseudo-inverse is unbiased only in the noiseless case when σ θ 2 0 = .In the noisy case, however, we can divide the mean by its asymptotic bias in order to obtain an unbiased estimator for the coefficients.To improve the estimator also in the regime of few observations, we divide only the likelihood part θ θ σ ∑ 2 k y by the asymptotic bias.Therefore we have for the bias-reduced posterior mean: This bias-reduced version of the Moore-Penrose inverse is also plotted in Figure 3, which gives an improved estimate also for a small number of observations.The presented bias-reduced Gaussian approximation can also be rewritten into an online algorithm.The update equations to incorporate one additional observation y k in the current posterior are given by: Together with Eq. 23 we thus obtain a bias-reduced on-line estimator which allows one to recursively improve the stimulus reconstruction on a spike-by-spike basis: We can now compare how well the different approximations perform compared to the exact solution in the one-dimensional case (see Introduction).In Figure 3 the mean squared error is shown as function of the number of observed interspike intervals.Plotted are the error of the MAP estimator (magenta), the exact minimum mean squared error (blue), the Gaussian-Factor approximation, which is the equivalent to the Moore-Penrose pseudo-inverse (red) and the bias reduced Gaussian-Factor approximation (green).Importantly, the solution obtained by the Moore-Penrose pseudoinverse does not converge to the true solution, but hasa strong bias.This bias can lead to a solution which is actually worse than the prior solution.Unfortunately, we do not have access to the exact posterior in general, especially in higher dimensions.Therefore, we need approximation schemes which are generally applicable in the general case, but which perform better than the Moore-Penrose pseudo-inverse.

TWO-DIMENSIONAL CASE
In the section 'Introduction' , we investigated the accuracy of the different reconstruction schemes in the one-dimensional case.If the stimulus is two-or higher-dimensional, the observation of a single spike does not give us full rank information about the stimulus.In the case of a two-dimensional stimulus, three types of scenarios can occur after one interspike interval has been observed: 1.The observation of an interspike interval only leads to one important constraint on the coeffi cients of the basis functions, namely that the membrane potential has to be at the threshold at the time of a spike.For example, if the observed interspike interval is relatively small, solutions which cross the threshold twice or hit the threshold from above, are very unlikely under the prior distribution.Therefore, to stay below threshold, one can neglect constraints other than being atthe threshold at the time of the observed spike, see also Figure 2. In this situation, all approximations should be almost equally good as they all account for this type of constraint.2. If the interspike interval is longer, we might get another important constraint for the posterior, namely by requiring that the threshold is hit from below, not from above.This possibility is ruled out by the Jacobian term of the pseudo-likelihood (Eq.32).Therefore the MAP estimate should be closer to the true posterior mean than the Gaussian or pseudo-inverse approximation, which does not satisfy this constraint.Here, crossing the threshold twice before hitting it again from below is still very unlikely according to the prior and therefore we do not get an effective restriction for the posterior by ruling out all these solutions which cross the threshold twice.3.If the interspike interval is suffi ciently long, both types of violations of crossing the threshold between spike times are probable according to the prior.Some possible stimuli might exist for which the membrane potential would cross the threshold twice before reaching the threshold again at the time of the observed spike.These stimuli are neither ruled out by the pseudo-likelihood nor by the Gaussian approximation.Therefore, both approximations can result in quite poor estimates of the true posterior mean.
To illustrate the three scenarios, we simulated a single neuron with a stimulus consisting of two basis functions, one sine and one cosine function.We obtained an approximation to the true posterior after single observations by rejection sampling.This true posterior refl ects all of the constraints mentioned above.As can be seen in Figure 4, indeed three types of situations can be observed.

ALTERNATIVE METHODS
In the following, we will discuss the relationship between our decoding rule and previously proposed decoding algorithms.In particular, we compare our decoders with an optimal linear decoder, as well as with a Maximum-a-Posteriori decoder (MAP) based on the approximate likelihood.

RELATIONSHIP WITH LINEAR DECODER
Bialek et al. popularized a linear decoder for reconstructing the stimulus from a spike train (Bialek et al., 1991;Rieke et al., 1997).
Here the spike train ∑ − ) is convolved with an acausal linear fi lter K in order to obtain an estimate of the stimulus: The fi lter can be calculated by (see Rieke et al., 1997): where F is the Fourier transform.The average is taken over the joint distribution of stimuli and spike times, which can be done via sampling.Additionally, the stimuli we used are composed by a superposition of sine and cosine functions with discrete frequencies, which we write here as complex functions f l (t) = exp(iω l t).
Hence, the linear fi lter has also only non-vanishing power in those frequencies which are present in the stimulus.
In the noiseless case, the Pseudo-Inverse decoder can be interpreted as a linear fi lter, but one that depends on the particular spike train observed, as we will show in the following.To this end, we replace the stimulus ensemble used to calculate the linear fi lter with a single stimulus consisting of the stimulus reconstructed by the Pseudo-Inverse.That is, we replace F(s)(ω l ) by ∑ , − j i j L θ; see Eq. 12.If we furtherassume that there is no neuronal noise, we can neglect the expectation in the defi nition of the linear fi lter (Eq.12), and defi ne a linear fi lter K p corresponding to the Pseudo-Inverse: Although this equivalence is only valid in the noiseless case, we can use Eq. 29 to illustrate the decoding performed by the Pseudo-Inverse.The linear fi lters we obtain for this decoder is different for different spike trains, refl ecting the increased fl exibility of the Pseudo-Inverse compared to the optimal linear predictor.The different reconstructions and associated fi lters are illustrated in Figure 5.

MAP and Laplace approximation
By inspecting the approximate likelihood (see Eq. 8) we see that the model is a generalized linear model.In this sense it is very similar to the soft-threshold noise model (Jolivet et al., 2006;Paninski et al., 2007).However, the threshold noise there is Poisson-like, whereas here it is Gamma distributed.Further, the soft-threshold likelihood does not account for the fact that the threshold has to be reached from below.By ensuring that the Jacobian of the change of variables in Eq. 3 is positive, however, we can take this constraint into account.One approach for getting a possibly better point estimate is to fi nd the maximum of the approximate posterior density (MAP).To compute this posterior density, we have to multiply Eq. 3 by the prior density (which is Gaussian in our case).In this model, the MAP cannot be determined in closed form, but we may apply gradient ascent in order to fi nd it numerically.If both likelihood and prior are log-concave, which is true for the approximate likelihood and the Gaussian prior used here, the posterior is unimodal (see Paninski et al., 2004).Hence, fi nding the MAP point is a convex problem.The gradient and the Hessian of the log posterior are straightforward to compute.For the sake of clarity, we only write down the gradient and the Hessian for one spike time t k in the sum of Eq. 8: Here, f(t k ) is the vector consisting of all basis functions evaluated at the spike time t k .The Hessian is given by: Applying a gradient ascent scheme yields a point estimate that respects the constraint that the membrane potential crosses the threshold from below.Nevertheless, it does not take into account the sub-threshold condition between spike times: The solution we get might correspond to a membrane potential that crosses the The fi rst column is the true log-likelihood, the second is the approximate log-likelihood obtained by Eq. 43 and the third column is the Gaussian Factor approximation.The true log-likelihood is not available in higher dimensions and is plotted here for comparison and as a reference.It is obtained via rejection sampling.Point estimates are: true posterior mean ( ), MAP ( ), Gaussian Factor mean ( ) and the bias reduced version ( ).For each point estimate a Gaussian prior with unit isotropic covariance was chosen.Each subplot shows the log-likelihood (or its approximation) after one interspike interval is observed.The x and y axes indicate the two dimensions of the stimulus coeffi cients.Each row corresponds to a different scenario with different numbers of effective constraints for the posterior.If only one constraint is active (fi rst row) the true posterior does not differ much from the other approximations, and therefore the point estimates perform all almost equally well.If two constraints are active (the threshold has to be reached from below and the membrane potential has to be at the threshold at the time of a spike) the MAP performs better than the Gaussian Factor approximation.If three constraints are active, the MAP refl ects two of the three constraints and therefore is slightly shifted.As one observation is far away from the asymptotic regime, the Gaussian Factor approximation and its bias reduced version do not differ much.
threshold twice before it hits it again from below.Therefore this point estimate suffers from the same source of bias as the Gaussian factor approximation.
This point estimate can be extended to give an approximation of the uncertainty as well by expanding the posterior to second order around the MAP point.The posterior we are using here is the likelihood (Eq. 3) times a prior term p(c): which itself is an approximation (see Introduction).Unfortunately, computing the normalization constant for this distribution with respect to c is not tractable.We therefore approximate the posterior by a second-order expansion.In other words, the posterior distribution is approximated by a multivariate Gaussian, where the mean of the Gaussian is taken to be the MAP, and the covariance is found by looking at the second-order derivatives of the logposterior at the MAP:

‫ׅ‬
The MAP and the Hessian are calculated by Eqs 30 and 31.This yields a Gaussian approximation known as the Laplace approximation (MacKay, 2003;Rasmussen and Williams, 2006;Paninski et al., 2007;Cunningham et al., 2008).

SIMULATIONS
In this section, we present the results of three different simulations which highlight different aspects of neural population coding of time-varying stimuli with integrate and fi re neurons.As a general framework, we fi rst specify a generative model for the stimulus signal x(t) = [x 1 (t),…, x m (t)] and then we specify a mapping g : x(t) I(t), which can be interpreted as the encoding strategy of the neural population.The dimension of I(t) = [I 1 (t),…, I n (t)] can be different from the number of spatial stimulus components m.Each I i (t) represents one neuron within a population of n neurons.Each spatial component x l (t),l = 1,…,m is represented with a superposition of temporal basis functions f k (t),k = 1,…, M. In the fi rst simulation we have n = m = 1, M = 80.In the second and third simulation n >> m = 1, M = 40.In the last simulation we study the encoding of an amplitude and phase variable with n = m = 2, M = 40.

SIMULATION 1: ONE NEURON, ONE COMPONENT, MANY TEMPORAL DIMENSIONS
In order to evaluate the accuracy of our Gaussian Factor approximation to the posterior when the stimulus has several temporal dimensions (not to be confused with spatial dimensions m), we analyzed the decoding performance as a function of increasing number of observations.To this end, we simulated a neuron with Bayesian population decoding of spiking neurons a stimulus consisting of a random superposition of 40 sine and 40 cosine functions with equally spaced frequencies between 10 and 50 Hz.In each trial the neuron was simulated until 10 4 spikes were accumulated.We calculated the mean squared error over 100 repetitions.Interspike intervals taken into account for reconstruction were randomly selected from the whole time interval of the simulation.
In Figure 6 we see that the simple Gaussian approximation (Gaussian Factor, red, dashed) is indeed biased and the bias is larger for larger noise levels.In the limit of no noise we expect a sharp drop off for the number of spikes equal to the number of dimensions for the stimulus.This is weakened in the presence of noise.For comparison, we also plot the asymptotic error of the Gaussian Factor approximation as derived analytically in the Section 'Decoding' .Additionally the mean squared errors are plotted for the linear decoder (see Introduction) and the bias-reduced version of the Gaussian approximation.The mean squared error for the MAP was obtained by gradient ascent (see Introduction).In order to start with a feasible solution, we initialized the optimizer with the true stimulus coeffi cients, turning the obtained solution in an optimistic estimate of the actual MAP.

SIMULATION 2: MANY NEURONS, MANY TEMPORAL DIMENSIONS
In this simulation, a population of n = 30 neurons with different receptive fi elds were all driven by the same stimulus, which consisted of a superposition of 20 sine and 20 cosine functions 20 were equally spaced between 1 and 100 Hz, and the coeffi cients { } c j j=1 40 were drawn independently from a Gaussian distribution with unit variance.Incorporating a receptive fi eld r i (t) for neuron i in our model can easily be done by pre-fi ltering the stimulus with the corresponding receptive fi eld: Because of the linearity of the convolution, the decoding algorithms stay the same with the exception that the basis functions f k (s) are replaced by r i * f k .The receptive fi elds r i (t) of each neuron were chosen to be a gamma tone: All parameters except the frequency f were fi xed (a = 0.01, b = 0.01[ ] 1 ms , n = 2, φ = 0).The frequencies of each receptive fi eld were drawn from a uniform distribution ranging between 1 and 100 Hz.The resulting receptive fi elds are shown in Figure 7(left).The stimulus and its reconstruction based on the spike times of this population are shown in Figure 7(right).The uncertainty is smaller within periods of higher fi ring rates, yet to a smaller extent than in the next setting (see Figure 9), because here the receptive fi elds have a larger temporal extent.

SIMULATION 3: HETEROGENEITY
Every new spike contributes new information about the stimulus, and leads to a reduction in reconstruction error.However, if the resulting linear constraints are correlated, the reduction can be arbitrarily small.This problem can become particularly severe for interspike intervals observed at different neurons.For example, if the parameters of different neurons (e.g. the receptive fi elds) are the same, spikes of different neurons tend to synchronize, even in the presence of threshold noise.This leads to similar interspike intervals, and thus to highly correlated linear constraints.In this case, the information conveyed by different neurons can be redundant and be of limited use for decoding.
It is plausible that effi cient population codes should have heterogeneity in their receptive fi eld properties, to ensure that different properties of the stimulus are sampled by the population.In our setting, diversity in receptive fi eld parameters would ensure that the constraints are less correlated and that the reconstruction error does not saturate with increasing numbers of neurons.As a result, we expect to get a better reconstruction if we have a larger diversity within the encoding population.
In this simulation, we extend the previous example by systematically varying the degree of similarity in the receptive fi eld properties among the different neurons.To construct heterogeneous populations with different degrees of diversity we sampled the center frequencies of the receptive fi eld (gamma tone) of each neuron from a uniform distribution within a frequency interval centered at 50 Hz (the center frequency of the stimulus used).The degree of diversity was then measured by the length of thisinterval, from 0 to 25 Hz. Figure 8 shows the mean squared error as a function of number of neurons as well as the diversity within the receptive fi elds.From this plot one can see that the rate with which the error drops with increasing number of neurons strongly depends on the degree of diversity.This result confi rms the general idea of redundancy reduction as an effi cient coding strategy.

SIMULATION 4: ENCODING OF AMPLITUDE AND PHASE VARIABLES
In this simulation we consider the case of decoding a twodimensional, time-varying stimulus signal.In particular, we want to illustrate how the encoding of angular variables can be addressed in this framework, as the neural representation of edge orientations or motion directions are frequently studied in neuroscience.Therefore, we use the nonlinear polar coordinate transform to obtain an amplitude and phase variable x(t) = [a(t), ϕ(t)] ‫ׅ‬ as our stimulus signal.For simplicity, we consider the case where this signal is encoded by two neurons with identical temporal receptive fi eld properties but with 90° difference in the preferred stimulus angle.Specifi cally, the encoding model of the two neurons is given by: As temporal basis functions we picked 20 sine and cosine basis functions with discrete equally spaced frequencies between 1 and 10 Hz.The corresponding coeffi cients c k were drawn independently from a Gaussian distribution with variance 2 σ 2 = 0.06.  2 The small variance was chosen such that the resulting signal varies roughly between − π 2 and π 2 .
The neurons were simulated according to Eq. 1, with parameters τ σ θ = , = , = .10 1 0 01 2 θ .As can be seen from Figure 9, the two dimensional signal (bottom two panels) can be reconstructed best in those time intervals which contains spikes (vertical black lines).The reconstruction and uncertainty (obtained via sampling) are transformed into phase and amplitude in the top two panels.

DISCUSSION
How to read out spatio-temporal spike patterns generated by populations of neurons is fundamental to the understanding of neural network computation.Most of the previous studies on population coding were limited to the static case where only spike countsfor a preset time window are considered.For the encoding of continuously varying signals, however, it is important to understand how the accuracy of population codes is affected by the dynamics of neural spike generation.
Here, we studied dynamic population codes with noisy leaky integrate-and-fi re neurons.We presented an algorithm for Bayesian decoding similar to the one presented in Cunningham et al. (2008).In addition, we derived an approximate algorithm which yields a simple spike-by-spike update rule for recursively improving the stimulus reconstruction whenever a new spike is observed.
The decoding rules can also be applied for decoding the spike trains of populations of neurons, not just single neurons.Importantly, we do not have to assume that the neurons are uncoupled, i.e. conditionally independent given the stimulus.In particular, as we assume the encoding model to be known, we would also know the parameters describing the couplings between neurons.Then, the infl uence of one spike of a neuron on the membrane potential of any other neuron is just a known, given input and can be subtracted.Therefore, the same decoding framework can also be used for decoding coupled neurons.
The decoding rule is nonlinear and sensitive to the relative latencies between each spike and its predecessor in the population.However, it is not optimal as it does not use the information that the membrane potential stays below threshold between spikes.To incorporate this kind of knowledge one has to integrate the coeffi cient distribution over the linear halfspace confi ned by the threshold similar to the method described in Paninski et al. (2004Paninski et al. ( , 2007) ) but with the additional complication that, the distribution is not Gaussian.Therefore, the optimal Bayesian decoding rule would be computationally much more expensive.
The main goal of this work was to derive a simple decoding rule that facilitates the analysis of neural encoding strategies such as efficient coding, unsupervised learning, or active sampling.Bayesian approaches are particularly useful for these problemsas they do not yield a point estimate only but also aim at estimating the posterior uncertainty over stimuli.Having access to this uncertainty allows one to optimize receptive fi eld properties or other encoding parameters in order to minimize the reconstruction error or to maximize the mutual information between stimulus and neural population response.In this way it becomes possible to extend unsupervised learning models such as independent component analysis (Bell and Sejnowski, 1995) or sparse coding (Olshausen FIGURE 9 | Decoding of an angular variable.Two neurons were stimulated with a(t)sinϕ(t) and a(t)cosϕ(t), respectively (two bottom panels).Each of those signals was represented by a superposition of 20 sine and 20 cosine functions.From the reconstructed signal, the amplitude a(t ) and the phase angle ϕ(t ) were obtained by taking the Euclidean norm and the arc-tangent, respectively.The reconstruction (dashed) of the original stimulus (solid) was obtained by using the Gaussian approximation with bias correction.Confi dence intervals, indicating one standard deviation of the posterior variance, are plotted in shaded gray.The confi dence intervals of a(t) and ϕ (t) were calculated by drawing 5000 samples from the approximate posterior.
and Field, 1996) to the spatio-temporal domain with spiking neural representations.This seems highly desirable as comparisons between theoretically derived models and experimental measurements would thus become feasible.
Furthermore, animals do not receive the sensory input in a passive way but actively tune their sensory organs to acquire the most useful data, for example by changing gaze or by head movements.Such active sampling strategies are related to the theory of optimal design or active learning (Lewi et al., 2009), where the next measurement is selected in order to minimize the current uncertainty about the signal of interest.Such active sampling strategies give rise to 'saliency maps' , which encode the expected information gain from any particular stimulus.
Maximizing the mutual information between stimulus and neural response is equivalent to minimizing the posterior entropy.Because of the Gaussian approximation, this can be done in our model by performing a gradient descent on the log-determinant of the posterior covariance matrix.The gradient can be calculated from Eq. 17.However, the approximated posterior covariance derived in this paper might also be subject to a systematic deviation from the exact covariance matrix.Therefore, an important extension of the present work would be to correct for a bias in the approximate covariance estimate, too.In general the approximations considered in this paper usually tend to over-estimate the true underlying uncertainty, as they wrongly donot cut-off regions in the parameter space.
In this paper, we chose to represent the stimulus by a superposition of a fi nite set of basis functions as this has some practical advantages.Alternatively, it is possible to start from a full Gaussian process as stimulus model and then derive a discretization for numerical evaluation.Analogous to the mean vector and covariance matrix of a fi nite-dimensional normal distribution, a Gaussian process prior over the stimulus is specifi ed by the mean and covariance function of the process.For numerical evaluation it is necessary to choose a grid of time points yielding a fi nite dimensional normal distribution again.Note that for inference, integrals on the grid points have to be evaluated numerically and therefore a fi ne time resolution for the s i should be chosen.Therefore, the computational load of decoding a discretized Gaussian process is considerably higher.For practical reasons, we can restrict the inference procedure to a time window around the current spikes, provided that the covariance function falls off quickly.In the nonleaky case with no receptive fi elds this is the same setting as in Cunningham et al. (2008).
The extension to the Gaussian process setting is conceptually important as it allows one to replace the somewhat artifi cial threshold noise model by membrane potential noise.The dynamics can then be described by a stochastic differential equation.Although the likelihood is much harder to calculate (Paninski et al., 2004(Paninski et al., , 2007)), it still has the renewal property and therefore a similar approximation scheme might be applicable.However, it has the further complication, that the obtained likelihood is only for a given threshold and therefore the threshold has to be marginalized.We hope that more studies will be devoted to the problem of decoding time-varying stimuli from populations of spiking neurons in the future.In particular, it will be crucial to achieve a good trade-off between the basic dynamics of neural spike generation, the accuracy of posterior estimates and the computational complexity of the decoding algorithm.

FIGURE 1 |
FIGURE 1 | Illustration of the encoding process.We simulated a leaky (τ = 10) integrate-and-fi re neuron with threshold noise (mean 1.0, variance 0.05).The input is a pink noise process consisting of 80 basis functions, 40 sine and 40 cosine, frequencies equally spaced between 1 and 500 Hz.The stimulus is plotted in shaded gray, the membrane potential in black.The threshold is drawn randomly according to a gamma distribution every time a spike (vertical lines) is fi red.

FIGURE 2 |
FIGURE 2 | Example of noiseless decoding for a two dimensional stimulus and its limitations.The inset illustrates the linear constraints that the fi rst and the second interspike interval pose on the two coeffi cients c 1 and c 2 .The driving stimulus is plotted in blue.Vertical bars at the bottom indicate the three observed spike times corresponding to threshold crossings of the membrane potential (solid black).Possible membrane potential trajectories, which obey the linear constraints are plotted in shaded green and red respectively, darker ones have smaller norm.As can be seen the linear constraints only refl ect that the membrane potential has to be at zero at the beginning of an interspike interval and at the threshold at the end of it.They do not refl ect that the membrane potential has to stay below threshold between spike times.Parameters are: τ = 1 ms, frequency for sine and cosine basis functions: 32 Hz.

FIGURE 3 |
FIGURE 3 | Comparison of the mean squared error (MSE) for differentreconstruction methods in the case of a one dimensional stimulus.The best possible estimate is the true posterior mean (exact, blue).The error of the maximum a posteriori (MAP) estimator (magenta) is nearly the same as the error of the exact posterior mean and therefore cannot be distinguished from the exact one.The red line shows the error of the Moore-Penrose pseudo-inverse and the horizontal line indicates its asymptotic bias.The Moore-Penrose pseudo-inverse is called Gaussian Factor approximation (see Encoding).The bias corrected (BC) version of the Gaussian approximation (green) is explained later and here included for completeness (see Decoding).Parameters were: α prior = 20, β prior = 0.5, α θ = 2, β θ = 0.5.

FIGURE 4 |
FIGURE 4| Log-likelihood approximations in two dimensions for three different cases of observations and different approximations to the posterior.The fi rst column is the true log-likelihood, the second is the approximate log-likelihood obtained by Eq. 43 and the third column is the Gaussian Factor approximation.The true log-likelihood is not available in higher dimensions and is plotted here for comparison and as a reference.It is obtained via rejection sampling.Point estimates are: true posterior mean ( ), MAP ( ), Gaussian Factor mean ( ) and the bias reduced version ( ).For each point estimate a Gaussian prior with unit isotropic covariance was chosen.Each subplot shows the log-likelihood (or its approximation) after one interspike interval is observed.The x and y axes indicate the two dimensions of the

FIGURE 5 |
FIGURE 5 | Comparison of the linear decoder and the Gaussian factor approximation. Upper left: Linear fi lter obtained via Eq.28.Upper right: Average linear fi lter for the Pseudo-Inverse or Gaussian factor approximation, see Eq. 29.Bottom: Example of a decoded stimulus for a given spike train by two decoding schemes.The true stimulus is plotted in dashed black, the Gaussian factor reconstruction in red and the linear decoder reconstruction is

FIGURE 6 |
FIGURE 6 | Mean squared error (MSE) as a function of the number of spikes used for the different decoding schemes.The stimulus consists of a superpostion of 40 sine and 40 cosine functions of discrete frequencies equally spaced between 10 and 50 Hz.The time constant of the neuron used for decoding is τ = 25 ms.The MSE is calculated as the average over 100 repetitions for three different noise levels.Horizontal lines indicate the asymptotic bias for the different noise levels.The prior was an isotropic Gaussian with zero mean and covariance matrix 1 ⋅ 25.

FIGURE 7 |
FIGURE 7 | Left: Receptive fi elds of the population, each is a gamma tone with a different frequency, randomly drawn from a uniform distribution between 1 and 100 Hz.Right: A time varying stimulus consisting of a superposition of 20 sine and 20 cosine functions is decoded from spike trains of a population of 30 neurons, each having a noise level of σ θ = 0.05.

FIGURE 8 |
FIGURE 8 | Mean squared error as a function of the number of neuronsand their diversity within their receptive fi elds.Diversity is measured by the width of the uniform distribution from which frequencies for the gamma tone receptive fi elds were drawn.The average is taken over ≥25 repetitions.All other parameters were as in the previous section.