Information Loss Associated with Imperfect Observation and Mismatched Decoding

We consider two causes of information loss when neural activities are passed along and processed in the brain. One is that downstream neurons imperfectly observe the responses of upstream neurons to stimuli. The other is that downstream neurons non-optimally decode the stimulus information contained in the activities of the upstream neurons. To investigate the importance of neural correlation in information processing in the brain, we specifically consider two situations. One is when neural responses are not simultaneously observed, i.e., the data on neural correlation are lost. In this situation, stimulus information is decoded without any specific assumption about neural correlations. The other is when stimulus information is decoded with a wrong statistical model in which neural responses are assumed to be independent even when they are not. We provide information geometric interpretations of these two types of information loss and clarify their relationship. We then concretely evaluate these types of information loss in some simple examples. Finally, we discuss how these evaluations of information loss can be used to elucidate the importance of correlation in neural information processing.


Introduction
Neurons in early sensory areas represent the information of various stimuli from the external world by their noisy activities. The noise inherent in neural activities needs to be properly handled by the nervous system for the information to be accurately processed. One simple but powerful means for coping with the neural noise is population coding. Neurophysiological experiments have shown that many neurons with different selectivities respond to particular stimuli. These findings suggest that the nervous system represents information through population activities, which would be helpful for accurate information processing. This coding scheme for stimulus information is known as population coding.
An important feature of population coding is that the activity of neurons is correlated (Gray et al., 1989; Gawne and Richmond, 1993; Zohary et al., 1994; Meister et al., 1995; Lee et al., 1998; Ishikane et al., 2005; Ohiorhenuan et al., 2010; but see Ecker et al., 2010). A crucial question is how much information the nervous system can extract from correlated population activities. In general, it is difficult for the nervous system to extract the maximal amount of information from correlated activities because two conditions must be satisfied. First, downstream neurons must perfectly observe the responses of the upstream neurons (perfect observation of neural responses). Second, downstream neurons must optimally decode the information from the observed neural responses (optimal decoding of stimulus). In other words, if the observation is imperfect or the decoding is non-optimal, both of which are likely in the nervous system, stimulus information is inevitably degraded. In this work, we discuss the amount of information loss associated with imperfect observation and non-optimal decoding.
Combining the concepts of imperfect observation and mismatched decoding produces four types of situations in which the stimulus is inferred: (1) observation is perfect and decoding is optimal; (2) decoding is optimal but observation is imperfect; (3) observation is perfect but decoding is mismatched; and (4) observation is imperfect and decoding is mismatched. We discuss the inferences in these four types of situations from the viewpoint of information geometry (Amari and Nagaoka, 2000) and then clarify their relationships. We also specifically compute the amount of information obtained through these four inference types for neural responses described by the Gaussian model and by a binary probabilistic model.
This paper is organized as follows. First, we introduce the concept of an exponential family and two probability distributions that belong to the exponential family, i.e., the Gaussian distribution and the binary probabilistic model, which have both been intensively investigated as representative models of neural responses (Abbott and Dayan, 1999; Amari, 2001; Nakahara and Amari, 2002). Second, we provide an information geometric interpretation of the four types of inference mentioned above and describe how to evaluate the information in each of the four types by using the Fisher information. Third, we compute the amount of information in each of the four types of inference in the Gaussian model and explain the relationship between the inference with imperfect observation and that with mismatched decoding. Fourth, we compute the amount of information obtained by the four types of inference in simple binary probabilistic models. Finally, we summarize the results, discuss how to use the two measures introduced in this study for quantifying the importance of neural correlation, and mention some future directions of this work.

Exponential Family of Probability Distributions
Let us denote the conditional probability distribution for a neural response $\mathbf{r} = (r_1, r_2, \ldots, r_N)$ over a population of $N$ neurons evoked by a stimulus $s$ as $p(\mathbf{r}; s)$, where $s$ is a continuous variable. We assume that $p(\mathbf{r}; s)$ belongs to the exponential family. Probability distributions that belong to an exponential family can be written as
\[
p(\mathbf{r}; s) = \exp\left[ \sum_I \theta_I(s) R_I(\mathbf{r}) - \psi(s) \right], \tag{1}
\]
where $R_I(\mathbf{r})$ is a function of the neural responses $\mathbf{r}$, $\theta_I(s)$ is a function of the stimulus $s$, and the normalization constant $\psi(s)$ is also a function of the stimulus $s$. $\theta_I(s)$ is called the natural parameter of the exponential family. Two examples of such probability distributions are investigated in this paper.

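Example 1. Gaussian Distribution
The number of spikes emitted by a neuron over a fixed time period (time-averaged rate) or the total number of spikes emitted by a population of neurons (population-averaged rate), which is denoted by $\mathbf{r}$, may be described by the Gaussian distribution
\[
p(\mathbf{r}; s) = \frac{1}{\sqrt{(2\pi)^N |C(s)|}} \exp\left[ -\frac{1}{2} \left(\mathbf{r} - \mathbf{f}(s)\right)^T C(s)^{-1} \left(\mathbf{r} - \mathbf{f}(s)\right) \right], \tag{2}
\]
where $N$ is the number of neurons, $\mathbf{f}(s)$ is the average number of spikes, and $C(s)$ is the covariance matrix. If we rewrite Eq. 2 as follows, we can see that the Gaussian distribution belongs to the exponential family:
\[
p(\mathbf{r}; s) = \exp\left[ \sum_i \theta_i(s) r_i + \sum_{i,j} \Theta_{ij}(s) r_i r_j - \psi(s) \right].
\]
In this case,
\[
\boldsymbol{\theta}(s) = C(s)^{-1} \mathbf{f}(s), \qquad \Theta(s) = -\frac{1}{2} C(s)^{-1}, \qquad \psi(s) = \frac{1}{2} \mathbf{f}(s)^T C(s)^{-1} \mathbf{f}(s) + \frac{1}{2} \log\left[ (2\pi)^N |C(s)| \right].
\]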

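Example 2. Log-Linear Model of Binary Neural Response
When we analyze neural responses within a short time period (typically ∼1-10 ms), the neural responses are considered stochastic binary variables: $r_i = 1$ when the $i$th neuron fires within the time bin, and $r_i = 0$ when it does not. The joint distribution of $N$ binary random variables can be generally written in the following form (Amari, 2001; Nakahara and Amari, 2002):
\[
p(\mathbf{r}; s) = \exp\left[ \sum_i \theta_i(s) r_i + \sum_{i<j} \theta_{ij}(s) r_i r_j + \cdots + \theta_{12\cdots N}(s) r_1 r_2 \cdots r_N - \psi(s) \right],
\]
where the normalization constant $\psi(s)$ is given by
\[
\psi(s) = \log \sum_{\mathbf{r}} \exp\left[ \sum_i \theta_i(s) r_i + \sum_{i<j} \theta_{ij}(s) r_i r_j + \cdots + \theta_{12\cdots N}(s) r_1 r_2 \cdots r_N \right].
\]
This probability distribution is clearly in the exponential family form. In this case, the functions $R_I(\mathbf{r})$ are the products $r_i$, $r_i r_j$, …, $r_1 r_2 \cdots r_N$, and the natural parameters are the corresponding coefficients $\theta_i(s)$, $\theta_{ij}(s)$, …, $\theta_{12\cdots N}(s)$. Recent investigations have shown that the observed statistics of neural responses can be sufficiently captured by this type of probabilistic model, which contains up to second-order correlation terms (Schneidman et al., 2006; Shlens et al., 2006; Tang et al., 2008), although the importance of higher-order correlations has also been discussed (Amari et al., 2003; Montani et al., 2009; Ohiorhenuan et al., 2010). For simplicity, we consider only the second-order correlations and ignore higher-order correlations in this work, i.e.,
\[
\theta_{i_1 i_2 \cdots i_k}(s) = 0 \quad \text{for } k \ge 3. \tag{13}
\]

Inference of Stimulus and Fisher Information
We consider the inference problem of how accurately the stimulus value $s$ can be estimated when the stochastic neural response $\mathbf{r}$ is given. We assume that the neural response $\mathbf{r}$ is observed many times. The neural response at the $t$th trial is denoted by $\mathbf{r}(t)$ and the number of trials by $T$. If each neural response is independent and identically distributed, the probability distribution for $T$ observations of neural responses is given by
\[
p_T(\mathbf{r}^T; s) = \prod_{t=1}^{T} p(\mathbf{r}(t); s) = \exp\left[ T \left( \sum_I \theta_I(s) \bar{R}_I - \psi(s) \right) \right],
\]
where $\bar{R}_I$ are sufficient statistics for the probability distribution and are given by
\[
\bar{R}_I = \frac{1}{T} \sum_{t=1}^{T} R_I(\mathbf{r}(t)).
\]
We evaluate the accuracy of the estimate by using the Fisher information,
\[
g(s) = E_{p_T}\left[ \left( \frac{\partial l_s}{\partial s} \right)^2 \right] = -E_{p_T}\left[ \frac{\partial^2 l_s}{\partial s^2} \right], \tag{18}
\]
where $l_s = \log p_T(\mathbf{r}^T; s)$ and $E_{p_T}$ denotes the expectation with respect to the distribution $p_T(\mathbf{r}^T; s)$. Through the Cramér-Rao bound, the Fisher information bounds the average squared decoding error for an unbiased estimate as follows:
\[
E\left[ (\hat{s} - s)^2 \right] \ge \frac{1}{g(s)}, \tag{19}
\]
where $s$ is the true stimulus value and $\hat{s}$ is the estimate. Since the inverse of the Fisher information is only a lower bound on the mean squared error, the behavior of the mean squared error and that of the Fisher information can differ in general (Brunel and Nadal, 1998; Yaeli and Meir, 2010). However, the maximum likelihood estimator, which chooses as the estimate the value of $s$ that maximizes the likelihood function $p_T(\mathbf{r}^T; s)$, achieves the Cramér-Rao bound (Eq. 19) as $T \to \infty$. A Bayesian estimator, which is generally a biased estimator, can also achieve the Cramér-Rao bound as $T \to \infty$ because it becomes equivalent to the maximum likelihood estimator in this limit.
For the exponential family, the Fisher information matrix with respect to the natural parameters $\boldsymbol{\theta}$ is
\[
g_{IJ}(\boldsymbol{\theta}) = T \frac{\partial^2 \psi(\boldsymbol{\theta})}{\partial \theta_I \partial \theta_J}.
\]
By using $g_{IJ}(\boldsymbol{\theta})$, the Fisher information with respect to the stimulus $s$ can be written as
\[
g(s) = \sum_{I,J} g_{IJ}(\boldsymbol{\theta}) \frac{\partial \theta_I}{\partial s} \frac{\partial \theta_J}{\partial s}. \tag{25}
\]
The Fisher information $g(s)$ described above determines the accuracy of the estimate of $s$ under two conditions: (1) all sufficient statistics $\bar{R}_I$ are available, and (2) the likelihood function $p(\mathbf{r}; s)$ is exactly known. Regarding the first condition, downstream neurons may not be able to simultaneously access the responses of all upstream neurons. This imperfect observation of neural responses by downstream neurons leads to loss of information. Similarly, regarding the second condition, downstream neurons are unlikely to completely know the likelihood function $p(\mathbf{r}; s)$. Downstream neurons are more likely to only partially know $p(\mathbf{r}; s)$ and to decode the stimulus based on a decoding model $q(\mathbf{r}; s)$, which is not equal to $p(\mathbf{r}; s)$ but partially matches $p(\mathbf{r}; s)$ (Nirenberg et al., 2001; Wu et al.,

If we regard the inverse of the squared decoding error as the Fisher information, the Fisher information for the mismatched decoding model $q(\mathbf{r}; s)$ is
\[
g^*(s) = T \, \frac{\left( E_p\left[ \partial^2 l_q(\mathbf{r}; s) / \partial s^2 \right] \right)^2}{E_p\left[ \left( \partial l_q(\mathbf{r}; s) / \partial s \right)^2 \right]}, \tag{36}
\]
where $l_q(\mathbf{r}; s) = \log q(\mathbf{r}; s)$ and $E_p$ denotes the expectation with respect to $p(\mathbf{r}; s)$. Note that when $q(\mathbf{r}; s) = p(\mathbf{r}; s)$, $g^*(s) = g(s)$.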

Information Geometric Interpretations
We discuss the inference of stimulus when neural responses are only partially observed and that when a mismatched probability distribution is used for decoding, from the information geometric viewpoint (Amari and Nagaoka, 2000). We consider the following four types of inference.

Inference 1 (Perfect observation and matched decoding)
The complete data $(\bar{R}_1, \bar{R}_2)$ are available and the true probability distribution $p(\mathbf{r}; s)$ is used for decoding.

Inference 2 (Imperfect observation)
The true probability distribution $p(\mathbf{r}; s)$ is used for decoding but only partial data $\bar{R}_1$ are available.

Inference 3 (Mismatched decoding)
The complete data $(\bar{R}_1, \bar{R}_2)$ are available but a mismatched probability distribution $q(\mathbf{r}; s)$ is used for decoding.

Inference 4 (Imperfect observation and mismatched decoding)
A mismatched probability distribution $q(\mathbf{r}; s)$ is used for decoding and only partial data $\bar{R}_1$ are available.

We assumed that both the true probability distribution $p(\mathbf{r}; s)$ and the mismatched probability distribution $q(\mathbf{r}; s)$ belong to the exponential family of probability distributions $S$ given in Eq. 1. $S$ is specified by the $n$-dimensional natural parameters $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_n)$. If we take $\boldsymbol{\theta}$ as a coordinate system on the set $S$ of probability distributions, we can regard $S$ as an $n$-dimensional manifold (space). A point in $S$ represents a specific probability distribution determined by the parameters $\boldsymbol{\theta}$. The true statistical model $p(\mathbf{r}; s)$ denoted by $M$ and the mismatched statistical model $q(\mathbf{r}; s)$ denoted

2001; Oizumi et al., 2010). This mismatched decoding of stimuli by downstream neurons also results in loss of information. These two types of information loss are evaluated next.

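Information Loss Caused by Imperfect Observation of Neural Responses
In this work, we specifically consider the situation in which the second-order sufficient statistics $\bar{R}_2$ are not accessible to downstream neurons and only the first-order sufficient statistics $\bar{R}_1$ are available to them. This is related to whether coincidence detector neurons are needed to accurately estimate the stimulus. To evaluate the loss of information associated with loss of data, we first marginalize the joint probability distribution over the unobserved statistics (with the integral replaced by a sum for discrete statistics):
\[
p(\bar{R}_1; s) = \int p(\bar{R}_1, \bar{R}_2; s) \, d\bar{R}_2.
\]
When only $\bar{R}_1$ is observed, the Fisher information with respect to stimulus $s$, which we write as $g_1(s)$ here, is given by
\[
g_1(s) = E\left[ \left( \frac{\partial \log p(\bar{R}_1; s)}{\partial s} \right)^2 \right]. \tag{27}
\]
The information loss associated with loss of $\bar{R}_2$ is
\[
\Delta g(s) = g(s) - g_1(s).
\]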

Information Loss Caused by Mismatched Decoding of Stimulus
We evaluate the loss of information when downstream neurons infer the stimulus parameter $s$ based not on the correct probability distribution $p(\mathbf{r}; s)$ but on a mismatched probability distribution $q(\mathbf{r}; s)$. We assume that $q(\mathbf{r}; s)$ also belongs to the exponential family and that the maximum likelihood estimation based on $q(\mathbf{r}; s)$ is consistent, i.e.,
\[
E_p\left[ \frac{\partial l_q(\mathbf{r}; s)}{\partial s} \right] = 0, \tag{29}
\]
where $E_p$ denotes the expectation with respect to $p(\mathbf{r}; s)$ and $l_q(\mathbf{r}; s) = \log q(\mathbf{r}; s)$. We evaluate the squared decoding error of the maximum likelihood estimation with the mismatched likelihood function $q(\mathbf{r}; s)$ based on $T$ observations of neural responses, $\mathbf{r}^T$. The estimate $\hat{s}_q$ is given by
\[
\hat{s}_q = \arg\max_{s} \sum_{t=1}^{T} l_q(\mathbf{r}(t); s).
\]

For a given candidate point $\hat{x}$ in $D$, the point in $M$ that minimizes the Kullback-Leibler divergence is given by the orthogonal projection of $\hat{x}$ to $M$.
Inference 3
For Inference 3 (Figure 4), we assumed that the maximum likelihood estimation based on a mismatched model $q(\mathbf{r}; s)$ is consistent (Eq. 29). This condition corresponds to the case where the point $p(\mathbf{r}; s)$ in $M$ and the point $q(\mathbf{r}; s)$ in $M^*$, which both represent a given stimulus parameter $s$, are the mutually nearest points in $S$ in terms of the Kullback-Leibler divergence, i.e.,
\[
s = \arg\min_{s'} D_{KL}\left( p(\mathbf{r}; s) \,\|\, q(\mathbf{r}; s') \right). \tag{40}
\]
If we differentiate Eq. 40 with respect to $s'$, we obtain Eq. 29. As in Inference 1, the maximum likelihood estimation based on a mismatched model $M^*$ corresponds to the minimization of the Kullback-Leibler divergence between the observed point $x$ and points in $M^*$,
\[
\hat{s}_q = \arg\min_{s'} D_{KL}\left( x \,\|\, q(\mathbf{r}; s') \right).
\]
This corresponds to the orthogonal projection of the observed point $x$ to $M^*$ (Figure 4). The orthogonal projection to $M^*$ cannot completely eliminate the deviation in the direction perpendicular to $M$, unless $M$ and $M^*$ are parallel. Thus, information is inevitably lost depending on the angle between $M$ and $M^*$. The Fisher information for the mismatched decoding model is given by Eq. 36.

by $M^*$, both of which are parameterized by a single variable $s$, are considered as curves in the manifold $S$, i.e., one-dimensional submanifolds having a coordinate $s$.
Inference 1
First, we describe Inference 1 from the viewpoint of information geometry (Amari, 1982; Amari and Nagaoka, 2000). Let us denote the observed data $(\bar{R}_1^o, \bar{R}_2^o)$ by $x$. $x$ can be considered as a point in $S$, which we call the observed point. The observed point $x$ is distributed near the point $s$ that represents the true probability distribution when stimulus $s$ is presented, $p(\mathbf{r}; s)$. The deviation of $x$ from the point specified by the true stimulus parameter $s$ can be decomposed into the deviation in the direction parallel to $M$ and the deviation in the direction orthogonal to $M$. The maximum likelihood estimation corresponds to the minimizer of the Kullback-Leibler divergence between the distribution corresponding to the observed point $x$ (which is not in $M$ in general) and distributions in $M$,
\[
\hat{s} = \arg\min_{s'} D_{KL}\left( x \,\|\, p(\mathbf{r}; s') \right),
\]
where $\hat{s}$ is the maximum likelihood estimator. The geometric interpretation of the maximum likelihood estimation is the orthogonal projection to $M$ from $x$ (Figure 2). The orthogonal projection completely eliminates the deviation of $x$ from $s$ in the direction orthogonal to $M$, but the deviation in the direction parallel to $M$ remains. This remaining deviation corresponds to the Fisher information of $M$ (Eq. 18). If we use other estimators that are not orthogonal projections to $M$ (e.g., the moment estimator), the decoding error necessarily becomes larger than that of the orthogonal projection.

Inference 2
In the Inference 2 case (Amari, 1995), only $\bar{R}_1$ is observed. Let us define a submanifold $D$, which is formed by the set of observed points where $\bar{R}_1$ is fixed at the observed value $\bar{R}_1^o$ but the unobserved $\bar{R}_2$ takes arbitrary values. Submanifold $D$ is called the data submanifold. The maximum likelihood estimation based on partially observed data corresponds to searching for the pair of points $\hat{x} \in D$ and $\hat{s} \in M$ that minimizes the Kullback-Leibler divergence between $D$ and $M$ (Figure 3), i.e.,
\[
\min_{\hat{x} \in D,\ \hat{s} \in M} D_{KL}\left( \hat{x} \,\|\, \hat{s} \right).
\]
The estimated value of s is expressed as In this model, we compute the Fisher information and the information loss when the data of variance R is lost and those when the decoding model whose variance is mismatched with the actual one is used.
Inference 1
When the data of neural responses $x = (\bar{r}, \bar{R})$ are completely observed and the actual statistical model $M$ is used in decoding, the maximum likelihood estimation of $s$ corresponds to the orthogonal projection from $x$ to $M$ (Figure 6). In this case, we can compute the Fisher information as
\[
I_{F1}(s) = \frac{3T}{s^2}.
\]

Inference 2
When the data of variance $\bar{R}$ are lost, the data manifold $D$ is given by
\[
D = \left\{ (\bar{r}, \bar{R}) \mid \bar{r} = \bar{r}^o \right\}.
\]
In this case, the estimated value of $s$ is the intersection point of $D$ and $M$ (Figure 6). By using Eq. 27, we can obtain the Fisher information to leading order in $T$,
\[
I_{F2}(s) = \frac{T}{s^2},
\]
where we used the fact that the marginal distribution over $\bar{R}$ can be written as
\[
p(\bar{r}; s) = \sqrt{\frac{T}{2\pi s^2}} \exp\left[ -\frac{T(\bar{r} - s)^2}{2 s^2} \right].
\]
This expression can be derived by considering that $\bar{r}$ also obeys a Gaussian distribution and that the mean and the variance of $\bar{r}$ are $s$ and $s^2/T$, respectively.
The information loss is given by
\[
\Delta I_{F2}(s) = I_{F1}(s) - I_{F2}(s) = \frac{2T}{s^2}.
\]

Inference 4
In Inference 4 (Figure 5), the maximum likelihood estimation with partially observed data $\bar{R}_1^o$ and a mismatched probability distribution $q(\mathbf{r}; s)$ corresponds to searching for two points in the data submanifold $D$ and the mismatched model

In this section, we compute the Fisher information obtained by the four types of inference described in the previous section when the probability distributions are Gaussian. We also discuss the relationship between the inferences.

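One-Dimensional Case
Before we deal with the multidimensional Gaussian model, we first consider the one-dimensional case as a toy example. We specifically consider the Gaussian distribution with mean $\mu(s) = s$ and variance $\sigma^2(s) = s^2$:
\[
p(r; s) = \frac{1}{\sqrt{2\pi s^2}} \exp\left[ -\frac{(r - s)^2}{2 s^2} \right].
\]
The statistical model $M = \{N(s, s^2)\}$ is expressed as a curve in the manifold $S = \{N(\mu, \sigma^2)\}$ with coordinates of mean $\mu$ and variance $\sigma^2$ (Figure 6). The probability distribution of $T$ observations of $r$ is given by
\[
p_T(r^T; s) = \exp\left[ T \left( \theta(s) \bar{r} + \Theta(s) \bar{R} - \psi(s) \right) \right],
\]
where $\bar{r}$ and $\bar{R}$ are sufficient statistics,
\[
\bar{r} = \frac{1}{T} \sum_{t=1}^{T} r(t), \qquad \bar{R} = \frac{1}{T} \sum_{t=1}^{T} r(t)^2,
\]
and the natural parameters are $\theta(s) = 1/s$ and $\Theta(s) = -1/(2s^2)$.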
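By using the Fisher information matrix with respect to the natural parameters, we can obtain the Fisher information with respect to stimulus $s$ from Eq. 25:
\[
I_{F1}(s) = T \left[ \mathbf{f}'^T C^{-1} \mathbf{f}' + \frac{1}{2} \mathrm{Tr}\left( C' C^{-1} C' C^{-1} \right) \right],
\]
where $\mathbf{f}' = \partial \mathbf{f} / \partial s$ and $C' = \partial C / \partial s$.

Inference 2
Second, let us consider the Fisher information in the Inference 2 case. We consider the situation in which the second-order sufficient statistics $\bar{R}$ in Eq. 55 are lost and only the first-order sufficient statistics $\bar{\mathbf{r}}$ are observed. The marginalized distribution over the missing data $\bar{R}$ is given by
\[
p(\bar{\mathbf{r}}; s) \propto \exp\left[ T \left( \boldsymbol{\theta} \cdot \bar{\mathbf{r}} + \bar{\mathbf{r}}^T \Theta \bar{\mathbf{r}} - \frac{1}{2} \mathbf{f}^T C^{-1} \mathbf{f} \right) \right],
\]
where the natural parameters $\boldsymbol{\theta}$ and $\Theta$ are
\[
\boldsymbol{\theta} = C^{-1} \mathbf{f}, \qquad \Theta = -\frac{1}{2} C^{-1},
\]
and where we ignored the terms of the order of $1/T$ in the limit of $T \to \infty$. In this case, the Fisher information is computed from the Fisher information matrix with respect to the natural parameters; to leading order in $T$, it is
\[
I_{F2}(s) = T \, \mathbf{f}'^T C^{-1} \mathbf{f}'.
\]

Inference 3
We specifically consider the inference with the following mismatched decoding model to compare it with the inference when $\bar{R}$ is lost:
\[
q(r; s) = \frac{1}{\sqrt{2\pi \sigma_q^2}} \exp\left[ -\frac{(r - s)^2}{2 \sigma_q^2} \right],
\]
where $\sigma_q^2$ is a constant that does not depend on $s$. In this model, the mean is equal to the actual one but the variance is mismatched with the actual one. The maximum likelihood estimation based on the mismatched decoding model $M^*$ corresponds to the orthogonal projection from the observed point $x$ to $M^*$ (Figure 6). By using Eq. 36, we obtain the Fisher information
\[
I_{F3}(s) = \frac{T}{s^2}.
\]

Inference 4
When the mismatched decoding model $q(r; s)$, where the variance is independent of $s$, is used for decoding, the data of variance $\bar{R}$ do not affect the results of the inference. Thus, even if $\bar{R}$ is lost, no information is lost in this mismatched decoding. The Fisher information in the Inference 4 case is the same as that in the Inference 3 case:
\[
I_{F4}(s) = I_{F3}(s) = \frac{T}{s^2}.
\]
As Eqs. 49 and 53 show, $I_{F3}(s)$ is equal to $I_{F2}(s)$. We can also easily show that $I_{F3}(s)$ is equal to $I_{F2}(s)$ in one-dimensional cases in general. However, in the multidimensional case, $I_{F3}(s)$ is not equal to $I_{F2}(s)$. In the next section, we explain the general relationship between $I_{F2}(s)$ and $I_{F3}(s)$ in the multidimensional Gaussian model.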
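The one-dimensional toy model $N(s, s^2)$ can be checked numerically. The following is a minimal simulation sketch, not the authors' procedure: it assumes $s > 0$, uses a closed-form maximum likelihood estimate that follows from setting the derivative of the log-likelihood to zero, and compares the inverse variance of the estimates with $3T/s^2$ (complete data) and $T/s^2$ (sample mean only, which is also the estimate under a decoder with constant variance). The function names, the stimulus value, and the numbers of trials are illustrative assumptions.

```python
import math
import random
import statistics


def mle_full(samples):
    """MLE of s for N(s, s^2) from complete data (assumes s > 0)."""
    r_bar = statistics.fmean(samples)
    second_moment = statistics.fmean(x * x for x in samples)
    # d/ds of the log-likelihood vanishes when s^2 + r_bar * s - second_moment = 0;
    # the positive root is the estimate.
    return (-r_bar + math.sqrt(r_bar * r_bar + 4.0 * second_moment)) / 2.0


def mean_only(samples):
    """Estimate of s when only the sample mean is available.

    This is also the maximum likelihood estimate under a mismatched
    Gaussian decoder whose variance does not depend on s.
    """
    return statistics.fmean(samples)


def empirical_information(estimator, s, T, trials, rng):
    """Inverse variance of an estimator over repeated simulated experiments."""
    estimates = [
        estimator([rng.gauss(s, s) for _ in range(T)]) for _ in range(trials)
    ]
    return 1.0 / statistics.variance(estimates)


rng = random.Random(0)          # fixed seed so the run is reproducible
s_true, T, trials = 2.0, 100, 4000
info_full = empirical_information(mle_full, s_true, T, trials, rng)
info_mean = empirical_information(mean_only, s_true, T, trials, rng)

print("complete data :", info_full, "theory", 3 * T / s_true**2)
print("mean only     :", info_mean, "theory", T / s_true**2)
```

The simulated values should lie close to the two theoretical values, up to sampling error and finite-$T$ corrections.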

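Multidimensional Case
We next consider the multidimensional Gaussian distribution shown in Eq. 2. The probability distribution for $T$ observations of neural responses $\mathbf{r}$ is given by
\[
p_T(\mathbf{r}^T; s) = \exp\left[ T \left( \boldsymbol{\theta}(s) \cdot \bar{\mathbf{r}} + \mathrm{Tr}\left[ \Theta(s) \bar{R} \right] - \Psi(s) \right) \right], \tag{55}
\]
where the sufficient statistics $\bar{\mathbf{r}}$ and $\bar{R}$ are
\[
\bar{\mathbf{r}} = \frac{1}{T} \sum_{t=1}^{T} \mathbf{r}(t), \qquad \bar{R} = \frac{1}{T} \sum_{t=1}^{T} \mathbf{r}(t) \mathbf{r}(t)^T,
\]
the natural parameters $\boldsymbol{\theta}$ and $\Theta$ are
\[
\boldsymbol{\theta}(s) = C(s)^{-1} \mathbf{f}(s), \qquad \Theta(s) = -\frac{1}{2} C(s)^{-1},
\]
and the normalization constant $\Psi(s)$ is
\[
\Psi(s) = \frac{1}{2} \mathbf{f}(s)^T C(s)^{-1} \mathbf{f}(s) + \frac{1}{2} \log\left[ (2\pi)^N |C(s)| \right].
\]

Inference 1
First, let us consider the Fisher information in the Inference 1 case. The Fisher information matrix with respect to the natural parameters is given by

Frontiers in Computational Neuroscience www.frontiersin.org

covariance data, $\bar{R}$, is the statistical model whose covariance matrix is a constant matrix. In this case, the natural parameter $\Theta$ in Eq. 55, which is coupled with $\bar{R}$, does not depend on $s$. Thus, when we use a mismatched model $q(\mathbf{r}; s)$ whose covariance matrix is independent of $s$, the inference does not change even if the data about covariance $\bar{R}$ are lost. Thus, Inferences 3 and 4 result in the same estimate of $s$ and the same Fisher information:
\[
I_{F3}(s) = I_{F4}(s).
\]
To summarize, the relationship between the Fisher information in each of the four inference cases is
\[
I_{F1}(s) \ge I_{F2}(s) \ge I_{F3}(s) = I_{F4}(s).
\]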

Information Loss in Log-Linear Model of Binary Neural Response
In this section, we evaluate the information loss associated with loss of data and mismatched decoding in the log-linear model of binary neural response.

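Two-Neuron Model
As the simplest example, we first consider the two-neuron model,
\[
p(r_1, r_2; s) = \frac{1}{Z(s)} \exp\left[ \theta_1(s) r_1 + \theta_2(s) r_2 + \Theta(s) r_1 r_2 \right].
\]
The probability distribution of the sufficient statistics for $T$ observations is given by
\[
p(\bar{r}_1, \bar{r}_2, \bar{R}_{12}; s) = W(\bar{r}_1, \bar{r}_2, \bar{R}_{12}) \exp\left[ T \left( \theta_1(s) \bar{r}_1 + \theta_2(s) \bar{r}_2 + \Theta(s) \bar{R}_{12} - \log Z(s) \right) \right],
\]
where $\bar{r}_1$, $\bar{r}_2$, and $\bar{R}_{12}$ are sufficient statistics, and $W(\bar{r}_1, \bar{r}_2, \bar{R}_{12})$ is the number of configurations of $(\mathbf{r}(1), \mathbf{r}(2), \ldots, \mathbf{r}(T))$ where the sufficient statistics take the specific values $\bar{r}_1$, $\bar{r}_2$, and $\bar{R}_{12}$.

If we compare the components of the Fisher information matrix when $\bar{R}$ is missing with those when the data are complete, the information loss due to the missing data is seen to be represented in the components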
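From Eqs. 64 and 73, we find that the information loss due to the missing data $\bar{R}$ is
\[
\Delta I_{F2}(s) = I_{F1}(s) - I_{F2}(s) = \frac{T}{2} \mathrm{Tr}\left[ \left( C' C^{-1} \right)^2 \right],
\]
where $C' = \partial C / \partial s$. This information loss depends solely on $C' C^{-1}$ and is never negative; it vanishes only when $C' = 0$.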
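To make the comparison of the Gaussian cases concrete, here is a small numerical sketch for two neurons. It assumes the standard large-$T$ expressions per trial: $\mathbf{f}'^T C^{-1} \mathbf{f}' + \frac{1}{2}\mathrm{Tr}[(C'C^{-1})^2]$ for complete data with the true model, $\mathbf{f}'^T C^{-1} \mathbf{f}'$ when the covariance data are lost, and $(\mathbf{f}'^T C_q^{-1} \mathbf{f}')^2 / (\mathbf{f}'^T C_q^{-1} C C_q^{-1} \mathbf{f}')$ for a decoder with a constant covariance $C_q$. The tuning curves, the covariance values, and all function names are hypothetical and chosen only for illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def quad(u, m, v):
    """Bilinear form u^T m v for 2-vectors u, v and a 2x2 matrix m."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


def fisher_per_trial(df, cov, dcov, cov_q):
    """Return per-trial (I_F1, I_F2, I_F3) for a two-neuron Gaussian model.

    df    : derivative of the mean response f'(s)
    cov   : true covariance C(s)
    dcov  : derivative of the covariance C'(s)
    cov_q : constant covariance assumed by the mismatched decoder
    """
    cov_inv = inv2(cov)
    signal = quad(df, cov_inv, df)                  # f'^T C^{-1} f'
    a = matmul2(dcov, cov_inv)                      # C' C^{-1}
    a2 = matmul2(a, a)
    loss = 0.5 * (a2[0][0] + a2[1][1])              # (1/2) Tr[(C' C^{-1})^2]
    q_inv = inv2(cov_q)
    numerator = quad(df, q_inv, df) ** 2
    denominator = quad(df, matmul2(matmul2(q_inv, cov), q_inv), df)
    return signal + loss, signal, numerator / denominator


# Hypothetical example at s = 1: f(s) = (s, s^2 / 2), so f'(1) = (1, 1).
s = 1.0
df = [1.0, s]
cov = [[1.0 + 0.5 * s * s, 0.3], [0.3, 1.0 + 0.2 * s]]
dcov = [[s, 0.0], [0.0, 0.2]]
identity = [[1.0, 0.0], [0.0, 1.0]]

i1, i2, i3 = fisher_per_trial(df, cov, dcov, identity)
print("I_F1, I_F2, I_F3 per trial:", i1, i2, i3)
```

In this sketch the ordering $I_{F1} \ge I_{F2} \ge I_{F3}$ can be read off directly, and setting `dcov` to zero removes the gap between the first two values.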
Inference 3
Third, let us consider the Fisher information in the Inference 3 case. In the Inference 2 case, we considered that the second-order sufficient statistics $\bar{R}$, which are the variance and covariance data of neural responses, are lost. A mismatched probability distribution $q(\mathbf{r}; s)$ that is comparable with the inference when $\bar{R}$ is lost would be one in which the mean is the same as that in $p(\mathbf{r}; s)$ but the covariance matrix does not match the true covariance matrix in $p(\mathbf{r}; s)$. As a simple example, we assume that the covariance matrix in the mismatched probability distribution is a constant matrix $C_q$ that is independent of $s$:
\[
q(\mathbf{r}; s) = \frac{1}{\sqrt{(2\pi)^N |C_q|}} \exp\left[ -\frac{1}{2} \left(\mathbf{r} - \mathbf{f}(s)\right)^T C_q^{-1} \left(\mathbf{r} - \mathbf{f}(s)\right) \right].
\]
In this case, we can show that the maximum likelihood estimation based on the mismatched probability distribution is consistent, i.e., the condition shown in Eq. 29 is satisfied, as follows:
\[
E_p\left[ \frac{\partial l_q(\mathbf{r}; s)}{\partial s} \right] = \mathbf{f}'^T C_q^{-1} E_p\left[ \mathbf{r} - \mathbf{f}(s) \right] = 0.
\]
By using Eq. 36, we obtain the Fisher information for the mismatched model $q(\mathbf{r}; s)$:
\[
I_{F3}(s) = T \, \frac{\left( \mathbf{f}'^T C_q^{-1} \mathbf{f}' \right)^2}{\mathbf{f}'^T C_q^{-1} C \, C_q^{-1} \mathbf{f}'}.
\]

Inference 4 and comparison
Finally, we consider the Fisher information in the Inference 4 case and compare the four types of inference described above. It is obvious that the Fisher information obtained by Inference 1, $I_{F1}$, is the largest and the Fisher information obtained by Inference 4, $I_{F4}$, is the smallest. On the other hand, the relationship between the Fisher information obtained by Inference 2, $I_{F2}$, and that obtained by Inference 3, $I_{F3}$, is not clear, i.e.,
\[
I_{F1}(s) \ge I_{F2}(s),\ I_{F3}(s) \ge I_{F4}(s).
\]
However, the relationship between $I_{F2}$ and $I_{F3}$ can be clarified by considering the Fisher information obtained by Inference 4, $I_{F4}$. We assumed that a mismatched model that is related to the loss of

where we used the fact that $\bar{R}_{12}^*$ only depends on $\Theta$. The information loss is given by This information loss only depends on $\partial\Theta/\partial s$. Thus, when $\Theta$ is independent of $s$, there is no information loss even if data $\bar{R}_{12}$ are lost.
Inference 3 We next compute the Fisher information when neural correlation is ignored in decoding. We consider the mismatched decoding model q(r 1 , r 2 ) that is the product of the marginal probability distributions of the actual distribution p(r 1 , r 2 ): From the definition of q(r 1 , r 2 ), the averaged values of r 1 and r 2 over the mismatched model q(r 1 , r 2 ) are equal to those over the actual model p(r 1 , r 2 ). Thus, the following relationship holds between u i D s ( ) and the natural parameters in the actual probability distribution p(r 1 , r 2 ): The maximum likelihood estimation based on this mismatched decoding model q(r 1 , r 2 ) is shown to be consistent as follows: In the limit of T → ∞, by using Stirling's formula, W r r R ( , , ) Inference 1 We first compute the Fisher information in the Inference 1 case. The Fisher information matrix with respect to the natural parameters is given by where 〈x〉 = E p [x] = ∑ r xp(r 1 , r 2 ). By using the Fisher information matrix with respect to the natural parameters, we can obtain the Fisher information with respect to stimulus s from Eq. 25: Inference 2 We next compute the Fisher information when the data of neural correlation R 12 are lost. When T is finite, it is difficult to marginalize the probability distribution p r r R ( , , ) 1 2 12 over R 12 because there are many possible R 12 when specific values of r 1 and r 2 are given. However, in the T → ∞ case, we only need to consider the most probable R 12 when r 1 and r 2 are given. By differentiating the argument of the exponential function in Eq. 83 with respect to R 12 , we can obtain the equation for the most probable R 12 : By using Eq. 94, we can compute the Fisher information when R 12 is lost as (109)

Inference 1
We first compute the Fisher information when data are complete and the decoding is optimal. As Eq. 23 shows, the Fisher information can be computed if we evaluate log Z. For analytical tractability, we consider the limit of N→∞. In this case, Z can be calculated as and W(m) is the number of states where m takes a certain value, which is given by In the limit of N→∞, by using Stirling's formula, W(m) can be approximated as We denote the argument of the exponential function in Eq. 110 by F, where In the limit of N→∞, the integral in Eq. 110 can be approximated as where F * is the maximum of the function F. From ∂F/∂m = 0, the value of m that maximizes the function F is the solution of the self-consistent equation We set the solution of Eq. 116 to m * (u, Θ). The Fisher information matrix with respect to the natural parameters θ and Θ is given by When the estimation by a mismatched decoding model is consistent, the Fisher information obtained by the mismatched decoding model can be computed by Eq. 36. The Fisher information is given by where As discussed in the previous section, I F3 (s) is always smaller than I F2 (s). Although the information loss associated with the loss of data R only depends on ∂Θ/∂s, the information loss associated with ignoring correlation in decoding depends not only on ∂Θ/∂s but also on ∂u 1 /∂s and ∂u 2 /∂s. Thus, when neural correlation is ignored in decoding, the information is lost even if Θ is independent of s.
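As a special case, if $\theta_1 = \theta_2$, which means $\langle r_1 \rangle = \langle r_2 \rangle = \langle r \rangle$, the information loss only depends on $\partial\Theta/\partial s$ and is given by
\[
\Delta I_{F3}(s) = T \left( \frac{\partial \Theta}{\partial s} \right)^2 \frac{\langle r_1 r_2 \rangle \left( \langle r \rangle - \langle r_1 r_2 \rangle \right) \left( 1 - 2\langle r \rangle + \langle r_1 r_2 \rangle \right)}{\langle r \rangle + \langle r_1 r_2 \rangle - 2 \langle r \rangle^2}.
\]
In this case, $\Delta I_{F3}(s)$ is equal to $\Delta I_{F2}(s)$ (Eq. 96). This is because if $\theta_1 = \theta_2 = \theta$, only two parameters, namely $\theta$ and $\Theta$, are in the statistical model. This is the same situation as in the one-dimensional Gaussian case illustrated in Figure 6, where $\Delta I_{F3}(s)$ is also equal to $\Delta I_{F2}(s)$.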
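The two-neuron results can be explored numerically with exact enumeration of the four response patterns. The sketch below is illustrative rather than the authors' computation. It assumes the large-$T$ limit, in which the per-trial information available from the firing rates alone is $\mathbf{m}'^T C^{-1} \mathbf{m}'$ (with $\mathbf{m}$ the mean responses and $C$ their covariance), and it evaluates the independent decoder with the ratio form of the mismatched Fisher information. Derivatives with respect to $s$ are taken by central finite differences. The parameter functions passed in as `params` and all function names are hypothetical.

```python
import math

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def probs(theta1, theta2, Theta):
    """Probabilities of the four states of the two-neuron log-linear model."""
    weights = [math.exp(theta1 * r1 + theta2 * r2 + Theta * r1 * r2)
               for r1, r2 in STATES]
    z = sum(weights)
    return [w / z for w in weights]


def moments(p):
    """Return (<r1>, <r2>, <r1 r2>) for state probabilities p."""
    m1 = sum(pi * r1 for pi, (r1, _) in zip(p, STATES))
    m2 = sum(pi * r2 for pi, (_, r2) in zip(p, STATES))
    return m1, m2, p[3]


def fisher_per_trial(params, s, h=1e-5):
    """Per-trial (I_F1, I_F2, I_F3); params(s) returns (theta1, theta2, Theta)."""
    p = probs(*params(s))
    p_plus, p_minus = probs(*params(s + h)), probs(*params(s - h))
    dp = [(a - b) / (2 * h) for a, b in zip(p_plus, p_minus)]
    i1 = sum(d * d / q for d, q in zip(dp, p))      # complete data, true model

    m1, m2, m12 = moments(p)
    dm1 = (moments(p_plus)[0] - moments(p_minus)[0]) / (2 * h)
    dm2 = (moments(p_plus)[1] - moments(p_minus)[1]) / (2 * h)
    v1, v2, c12 = m1 * (1 - m1), m2 * (1 - m2), m12 - m1 * m2

    # Rates only, true model (large-T assumption): m'^T C^{-1} m'.
    det = v1 * v2 - c12 * c12
    i2 = (dm1 * dm1 * v2 - 2 * dm1 * dm2 * c12 + dm2 * dm2 * v1) / det

    # Independent decoder: the score is sum_i a_i (r_i - m_i), a_i = m_i' / v_i.
    a1, a2 = dm1 / v1, dm2 / v2
    i3 = (a1 * dm1 + a2 * dm2) ** 2 / (
        a1 * a1 * v1 + 2 * a1 * a2 * c12 + a2 * a2 * v2)
    return i1, i2, i3


general = lambda s: (0.5 * s, -0.3 * s + 0.2, 0.4 * s)    # all parameters vary
fixed_corr = lambda s: (0.5 * s, -0.3 * s + 0.2, 0.8)     # Theta independent of s
symmetric = lambda s: (0.5 * s, 0.5 * s, 0.4 * s)         # theta1 = theta2

for name, params in [("general", general), ("fixed Theta", fixed_corr),
                     ("symmetric", symmetric)]:
    print(name, fisher_per_trial(params, 1.0))
```

Under these assumptions, the first two values coincide when $\Theta$ does not depend on $s$, and the last two coincide when $\theta_1 = \theta_2$.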

Homogeneous N-Neuron Model
We next consider the case with a large number of neurons. In this case, the Fisher information cannot be analytically computed in general. To restrict ourselves to an analytically tractable model, we here only deal with a probabilistic model of a homogeneous neural population. In this model, $\theta_i(s) = \theta(s)$ for any $i$ and $\theta_{ij}(s) = \Theta(s)$ for any pair of $i$ and $j$, i.e.,
\[
p(\mathbf{r}; s) = \frac{1}{Z(s)} \exp\left[ \theta(s) \sum_i r_i + \Theta(s) \sum_{i<j} r_i r_j \right],
\]
where the normalization constant $Z(s)$ is given by
\[
Z(s) = \mathrm{Tr} \exp\left[ \theta(s) \sum_i r_i + \Theta(s) \sum_{i<j} r_i r_j \right],
\]
where Tr stands for the sum over all possible combinations of the neuron state variables $(r_1, r_2, \ldots, r_N)$. The probability distribution of $T$ observations is given by
\[
p_T(\mathbf{r}^T; s) = \prod_{t=1}^{T} p(\mathbf{r}(t); s).
\]

In summary, no information is lost in a homogeneous neural population even when the data of neural correlation, $\bar{R}$, are lost or when the mismatched model that ignores neural correlation is used for decoding.

Discussion
In this work, we introduced a novel framework for investigating information processing in the brain, in which we studied the information loss caused by two situations: imperfect observation and mismatched decoding. By evaluating the information loss caused by non-simultaneous observations of neural responses, we can quantify the importance of correlated activity. This can also be quantified by similarly evaluating the information loss caused by mismatched decoding that ignores neural correlation. We discussed these two types of loss by giving information geometric interpretations of inferences with partially observed data and of those with a mismatched decoding model, and we elucidated their relationship. We showed that the information loss associated with ignoring correlation in decoding is never smaller than that caused by non-simultaneous observations of neural responses. This is because the inference based on an independent decoding model with complete data is equivalent to the inference based on an independent decoding model with "partial" data where the data of neural correlation are lost, which is naturally no better than the inference based on a correct decoding model with the partial data. This can also be intuitively understood by considering that decoding without the data of correlation considers all possible models of neural correlations, including the correct one, whereas decoding with a mismatched model confines the decoder to a wrong model.
Taking account of the relationship between the two inference methods, we give a simple guide on how to use the two different measures for quantifying the importance of correlation. To address the importance of coincidence detection by downstream neurons without making any specific assumption about the decoding process in higher-order areas in the brain, we should evaluate the information loss caused by non-simultaneous observations. The information loss quantified in this way can be used as the lower bound on the information conveyed by correlated activity. In contrast, the information loss quantified by using the independent decoding model can be used as the upper bound on the information conveyed by correlated activity because neural correlation is ignored not only in the observations but also in the decoding in this quantification. In summary, we consider that both measures should be computed when quantifying the importance of correlations and should be used as the lower bound and upper bound for the importance of correlations.
We considered the case that the stimulus is represented as a continuous variable. In this case, the Fisher information is a suitable measure for quantifying the maximal amount of information that can be extracted from neural responses. When considering a set Similar to in the previous section, the maximum likelihood estimation with the independent decoding model q(r; s) can be shown to be consistent as follows: Since I F1 (s) = I F3 (s) (see Eqs. 120 and 126), there is no information loss when the independent model is used for decoding in a homogeneous neural population (Wu et al., 2001). As for