A generalized ideal observer model for decoding sensory neural responses

We show that many ideal observer models used to decode neural activity can be generalized to a conceptually and analytically simple form. This enables us to study the statistical properties of this class of ideal observer models in a unified manner. We consider in detail the problem of estimating the performance of this class of models. We formulate the problem de novo by deriving two equivalent expressions for the performance and introducing the corresponding estimators. We obtain a lower bound on the number of observations (N) required for the estimate of the model performance to lie within a specified confidence interval at a specified confidence level. We show that these estimators are unbiased and consistent, with variance approaching zero at the rate of 1/N. We find that the maximum likelihood estimator for the model performance is not guaranteed to be the minimum variance estimator even for some simple parametric forms (e.g., exponential) of the underlying probability distributions. We discuss the application of these results for designing and interpreting neurophysiological experiments that employ specific instances of this ideal observer model.


INTRODUCTION
Ideal observer models are an important tool in the effort to understand the neural bases of perception and behavior (FitzHugh, 1957; Ratliff, 1962; De Valois et al., 1967; Ratliff et al., 1968; Talbot et al., 1968; Barlow and Levick, 1969; Barlow et al., 1971; Mountcastle et al., 1972; Johansson and Vallbo, 1979; Bradley et al., 1987; Newsome et al., 1989; Vogels and Orban, 1990; Geisler, 2001). Ideal observer analysis can be applied to the organism as a whole, as in psychophysical studies, or to a specific stage of information processing within the visual system of the organism, as is often done in neurophysiological studies (sometimes referred to as "sequential ideal observer analysis," see Geisler, 1989). Here we focus exclusively on ideal observer models that arise in the analysis and interpretation of neurophysiological data. In this context, we define an ideal "observer" model as a set of operations and processes by which the experimenter optimally decodes stimuli, perceptual decisions, or behavioral outcomes from sensory neural activity (Green and Swets, 1966; Geisler, 1989, 2001, 2004). In the early stages of a sensory system, such an ideal "observer" model can be used to study the efficiency of a neuron. For example, Barlow et al. (1971) used an ideal detector model to compute detection probability from the number of photons absorbed by photoreceptors and related the results to retinal ganglion cell responses. In this manner, they were able to estimate the average number of impulses emitted by a retinal ganglion cell per quantum of light absorbed by photoreceptors. They concluded that ganglion cells are efficient and sensitive. In the intermediate stages of sensorimotor transformation, ideal observer models are often used to optimally decode behavioral choice related information from the responses of a single sensory neuron (Celebrini and Newsome, 1994; Britten et al., 1996). Such analyses associate neural responses with perceptual decisions (rev. Parker and Newsome, 1998). Ideal observer analysis can also be applied to optically imaged cortical signals to assess neural population sensitivity for detection or discrimination (Chen et al., 2006, 2008; Purushothaman et al., 2009; see also rev: Cohen et al., 2011).
The statistical properties of an ideal observer model impact the results. For example, an ideal observer typically yields an unbiased estimate of performance and increasing the number of trials will decrease the variance of this estimate. These assumptions are generally valid when the underlying probability distributions take certain parametric forms but deviations from these assumptions can influence the results. Furthermore, it is not always straightforward to take into account confidence intervals for model performance in interpreting the results. Statistically valid methods of computing confidence intervals are known for some applications (e.g., Agarwal et al., 2005;Sarma et al., 2011) but this is not true in general. Therefore, heuristic or Monte-Carlo simulations are used to compute confidence intervals of ideal observer performance where necessary (e.g., Purushothaman et al., 2009). The main goal of this paper is to investigate the statistical properties and limitations of ideal observer models commonly used in the analyses of neurophysiological data. To achieve this goal, we first generalize four common forms of such ideal observer models.
The first of these was used in studies of the absolute visual detection threshold (Hecht et al., 1942; Hartline et al., 1947; Ratliff, 1962). Hecht et al. (1942) showed that the probability with which human observers detected flashes of light, that presumably delivered a certain average number of quanta of energy ($a$) to the retina, closely followed the probability of drawing a "threshold" number of $n$ or more quanta from a Poisson distribution with mean arrival rate $a$. Analysis of the electrophysiological data of Hartline et al. (1947) from the Limulus eye showed that the frequency with which a neuron emitted at least a criterion number ($N_C$) of impulses also closely followed the probability of drawing $N_C$ or more impulses from the Poisson distribution with arrival rate equal to $a$ (Ratliff, 1962). Implicit in this analysis is the linking hypothesis that the neuron signals to the animal the presence of an external stimulus whenever the number of impulses emitted by the neuron is greater than or equal to $N_C$ (Teller, 1984). Given this hypothesis, the ideal observer model estimates the maximum detection probability for a set of neural responses. It can be said that the criterion $N_C$ is chosen in this model to fit detection probabilities but without regard to the false alarm rate. Since the "maintained" or "background" discharge rate of the neuron also fluctuates (Ratliff et al., 1968; Barlow and Levick, 1969), in some trials, the number of impulses emitted by the neuron will equal or exceed $N_C$ simply due to this random fluctuation and the ideal observer will falsely signal the presence of a stimulus. This false alarm rate is not incorporated into this model.
The second ideal observer model we consider takes the false alarm rate into account [e.g., Barlow and Levick, 1969; rev. Green and Swets, 1966]. Typically, the probability distribution of the number of impulses in the maintained discharge is used to determine $N_C$ so that the probability of false alarm is less than or equal to a predetermined value [e.g., 0.2% in Barlow and Levick (1969)]. The probability distribution for the stimulus-induced response will then determine the detection rate for this criterion. The ideal observer in this analysis performs essentially the same operation as the one above, signaling the presence of a stimulus whenever the number of impulses emitted by the neuron exceeds $N_C$. But this criterion value is chosen based on a constraint on the false alarm rate.
The third model arises in Two-Alternative Forced-Choice (2-AFC) paradigms employed in detection and discrimination studies (Green and Swets, 1966). Typically, a reference and a test stimuli are presented either at two spatial locations (simultaneously) or in two temporal intervals (sequentially). The task of the observer is to indicate the location or the interval in which the test stimulus occurred. Because decisions are based on the comparison of two stimuli or neural responses to two stimuli, there is no need in this case to set a fixed criterion level. For example, the ideal observer can consistently associate the larger response with the test stimulus (e.g., Barlow et al., 1971). Computationally, the experimenter builds two histograms of neural responses, one each for the reference and test stimuli. The correct detection or discrimination probability for the ideal observer in the 2-AFC task is then the average rate at which the observer can correctly identify which sample belongs to which distribution when presented with two random samples, one drawn from the reference distribution and the other from the test distribution (Green and Swets, 1966). This probability can be estimated as the area under the receiver operating characteristic (ROC) curve for the pair of histograms (Green and Swets, 1966).
The fourth model we consider is computationally similar to the third model but has an important conceptual difference in that it is used to predict the choices made by a subject in a 2-AFC task based on the neural responses for near-threshold stimuli (Johansson and Vallbo, 1979; Celebrini and Newsome, 1994; Britten et al., 1996). This analysis can be used to link subjective perceptual decisions to single neuron responses (rev. Parker and Newsome, 1998; Romo, 2001; see also Vallbo and Johansson, 1980). As a consequence, this ideal observer model has found wide application recently (Dodd et al., 2001; Cook and Maunsell, 2002; Romo et al., 2002; Williams et al., 2003; Stoet and Snyder, 2004; Uka and DeAngelis, 2004; Williams et al., 2004; Purushothaman and Bradley, 2005; Pessoa and Padmala, 2005; Gu et al., 2007, 2008; Cohen and Newsome, 2009; Bosking and Maunsell, 2011).
The main difference between the first two ideal observer models and the last two is that the latter models are presented with two observations instead of one, making it possible to render decisions based on a direct comparison of the given observations, independent of a free parameter in the form of a constant criterion number. While this makes the two types of ideal observers different from a functional point of view, it is possible to have a single mathematical framework within which the performance of both types of models can be quantitatively described. Consider an ideal observer with two inputs $r_0$ and $r_1$ and two outputs $C_0$ and $C_1$. Let $P(r_0)$ and $P(r_1)$ be the probability distributions of the two input variables. In the following, we show that with appropriate choices for $C_0$, $C_1$ and $P(r_0)$, $P(r_1)$, this ideal observer can be used for absolute sensory detection tasks (first two categories described above) as well as for 2-AFC tasks (last two categories). In this framework, the performance (i.e., true positive, false positive, true negative, and false negative rates) of all four types of ideal observers can be described using the same closed-form expression. We then address the following questions: 1) How does the performance of the generalized ideal observer compare to the area under the ROC curve? 2) Is it possible to determine a priori the number of input samples required so that the estimated value of the observer's performance will lie within a specified confidence interval at a specified confidence level? 3) Are these estimates unbiased and consistent, i.e., does estimation error decrease with increasing number of observations and at what rate? 4) Do efficient (minimum variance) estimators exist for the performance of these ideal observers? 5) Is the standard method of estimating performance (area under the ROC curve) efficient? Answers to these questions will facilitate a more efficient design of neurophysiological experiments for ideal observer analysis.

GENERALIZED IDEAL OBSERVER EQUATIONS
In the notation introduced above, consider an ideal observer model with inputs $r_0$ and $r_1$. Let $S_0$ and $S_1$ be the two experimental conditions associated with $r_0$ and $r_1$, respectively. The probability distributions $P_0(r_0)$ and $P_1(r_1)$ are given by the conditional distributions $P_0(r_0) = P_0(r_0|S_0)$ and $P_1(r_1) = P_1(r_1|S_1)$. The ideal observer, who has no a priori knowledge of which input sample comes from which condition, makes a prediction to that effect using a "decision rule". If the observer predicts that $r_0$ comes from the condition $S_0$ (or, equivalently, from the distribution $P_0(r_0|S_0)$) and that $r_1$ comes from $S_1$ (i.e., from $P_1(r_1|S_1)$), then the observer will be correct. The opposite association will be incorrect. The variables $r_0$ and $r_1$ may represent the frequency of impulses emitted by the neuron. Without loss of generality, assume that the values of $r_0$ and $r_1$ lie within the upper right quadrant of the real plane, i.e., the sample space consists of all points $r = (r_0, r_1) \in \mathbb{R}^+ \times \mathbb{R}^+$. The decision region $D \subset \mathbb{R}^+ \times \mathbb{R}^+$ consists of all values of $r_0$ and $r_1$ for which the ideal observer makes a correct prediction. Then the probability of correct prediction for this ideal observer is given by

$$P = \iint_D p(r_0, r_1)\, dr_0\, dr_1, \tag{1}$$

where $p(r_0, r_1)$ is the joint probability density function corresponding to the joint probability distribution $P(r_0, r_1)$. In many experiments, the responses to the two conditions are independent random variables. Hence $P(r_0, r_1) = P_0(r_0|S_0)\,P_1(r_1|S_1)$. Furthermore, the optimal decision variable (e.g., the likelihood ratio) or its sufficient statistic involves monotone functions of the two variables $r_0$ and $r_1$, thereby resulting in a partition of the sample space $\mathbb{R}^+ \times \mathbb{R}^+$ into a decision region of the form $D = \{(r_0, r_1) \in \mathbb{R}^+ \times \mathbb{R}^+ \mid r_1 \geq r_0\}$.
Substituting this integration region into Equation (1) and choosing the summation of the elemental areas along the two possible directions yields two equivalent expressions for the performance of the ideal observer as

$$P(p_0, p_1) = \int_0^\infty p_1(r_1) \left[ \int_0^{r_1} p_0(r_0)\, dr_0 \right] dr_1 = E_{p_1}\!\left[ \int_0^{r_1} p_0(r_0)\, dr_0 \right] \tag{2}$$

and

$$P(p_0, p_1) = \int_0^\infty p_0(r_0) \left[ \int_{r_0}^\infty p_1(r_1)\, dr_1 \right] dr_0 = E_{p_0}\!\left[ \int_{r_0}^\infty p_1(r_1)\, dr_1 \right], \tag{3}$$

where $E_f[G] = \int G(x) f(x)\, dx$ denotes the expectation of the function $G$ with respect to the probability density function $f$, and $p_i(r_i)$, $i = 0, 1$ are the marginal probability density functions. It is important to note that $P(p_0(r_0), p_1(r_1)) = 1 - P(p_1(r_1), p_0(r_0))$ and therefore the order of the two distributions in the argument of $P(\cdot, \cdot)$ cannot be exchanged.

This general ideal observer gives rise to the four specific ideal observers described above. In simple detection tasks, the two stimulus conditions are typically $S_1$ = "Stimulus present" and $S_0$ = "Stimulus absent." Choose $p_0(r_0) = \delta(r_0 - N_C)$. Equation (2) then reduces to

$$P = \int_{N_C}^\infty p_1(r_1)\, dr_1,$$

which is the probability $P_1(r_1 \geq N_C)$, the hit rate in the detection task. Thus, for the choice of $p_0(r_0) = \delta(r_0 - N_C)$, the general ideal observer model simplifies to the first category of ideal observers that signal the presence of a stimulus whenever the response of the neuron under consideration equals or exceeds the fixed criterion number $N_C$. The second category of ideal observers used in detection tasks differs from the first only in the choice of the criterion number $N_C$. Therefore these models can be derived using the same choice $p_0(r_0) = \delta(r_0 - N_C)$, with $N_C$ set by the constraint on the false alarm rate. It is also clear that the general observer fully describes the third category of ideal observers used to quantify neural detection and discrimination performance in 2-AFC tasks. Finally, for the fourth category of ideal observers, the two "stimulus" conditions need to be replaced with the two "choices" available to the subject. Thus, this general ideal observer provides a complete description of the four types of ideal observers considered above. We should note that this generalization does not imply that all four categories of ideal observers are functionally or physiologically equivalent. This generalization is just mathematical and provides a unified framework for the following analyses.
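The two detection-task special cases can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function names are hypothetical, the spike counts are toy values invented for the example, and the criterion rule (smallest $N_C$ whose false alarm rate does not exceed a chosen limit) is one plausible reading of the second category of observer.

```python
def hit_rate(stimulus_counts, n_c):
    """Fraction of stimulus-present responses that equal or exceed the criterion n_c.

    This is the sample analogue of P_1(r_1 >= N_C), i.e. the general observer
    with p_0 replaced by a point mass at N_C.
    """
    return sum(x >= n_c for x in stimulus_counts) / len(stimulus_counts)


def criterion_for_false_alarm(background_counts, max_false_alarm):
    """Smallest integer criterion whose false alarm rate is <= max_false_alarm.

    Assumption: a false alarm is a background response that equals or exceeds
    the criterion, mirroring the hit-rate definition above.
    """
    n_c = 0
    while hit_rate(background_counts, n_c) > max_false_alarm:
        n_c += 1
    return n_c


# Toy impulse counts (hypothetical, for illustration only).
background = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
stimulus = [3, 4, 4, 5, 6, 6, 7, 8, 2, 5]

n_c = criterion_for_false_alarm(background, max_false_alarm=0.2)
print(n_c, hit_rate(stimulus, n_c))
```

In this sketch the first category of observer corresponds to calling `hit_rate` with any fixed criterion, while the second first derives the criterion from the background responses.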

ESTIMATORS FOR THE PERFORMANCE OF THE IDEAL OBSERVER
Suppose $R_1 = [r_{11}\ r_{12} \ldots r_{1i} \ldots r_{1N}]$ and $R_0 = [r_{01}\ r_{02} \ldots r_{0k} \ldots r_{0N}]$ are two sets of $N$ samples each obtained in the experiment from the conditions $S_1$ and $S_0$, respectively. In the above notation, the elements of $R_1$ are independent and identically distributed as $P_1$ and those of $R_0$ are similarly drawn from $P_0$.
Let the indicator function $I_{[0, r_{1i}]}(r_{0k})$ equal 1 if $0 \leq r_{0k} \leq r_{1i}$ and 0 otherwise. Then, based on Equation (2), an estimator of the performance of the generalized ideal observer as a function of the samples $R_0$ for a given value of $r_{1i}$ can be proposed as

$$\hat{P}(R_0 \mid r_{1i}) = \frac{1}{N} \sum_{k=1}^{N} I_{[0, r_{1i}]}(r_{0k}).$$

This provides an estimate for the inner integral in Equation (2), given a value of $r_1$. Using all $2N$ samples of both $R_1$ and $R_0$, $P$ can be estimated as

$$\hat{P}(R_0, R_1) = \frac{1}{N} \sum_{i=1}^{N} \hat{P}(R_0 \mid r_{1i}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{k=1}^{N} I_{[0, r_{1i}]}(r_{0k}). \tag{4}$$

The estimator based on Equation (3) can be similarly obtained as

$$\hat{P}(R_0, R_1) = \frac{1}{N^2} \sum_{k=1}^{N} \sum_{i=1}^{N} I_{[r_{0k}, \infty)}(r_{1i}). \tag{5}$$

Equation (4) provides one simple way to estimate the performance of the generalized ideal observer. We pick one sample from $R_1$, say $r_{1i}$, and count the number of samples of $R_0$ that are less than or equal to $r_{1i}$. We repeat this for all samples in $R_1$ and divide the result by $N^2$. Equation (5) provides a similar method. Computationally, this sequence of operations can be rearranged to resemble the operations involved in computing the area under the ROC curve for the normalized frequency histograms constructed from $R_0$ and $R_1$. Thus, there are at least 3 different methods to estimate the performance of this ideal observer. We show below that all three methods compute the area under the ROC curve, empirically constructed from $R_0$ and $R_1$.

www.frontiersin.org September 2013 | Volume 4 | Article 617
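The counting procedure described for Equations (4) and (5) can be sketched as follows. This is an illustrative implementation under the assumption that ties count as correct (the decision region is $r_1 \geq r_0$); the function names are hypothetical, and sorting with binary search is used only to avoid the explicit double loop.

```python
from bisect import bisect_left, bisect_right


def estimate_eq4(r0, r1):
    """Equation (4): for each r1 sample, count the r0 samples <= it; divide by N^2."""
    r0_sorted = sorted(r0)
    # bisect_right gives the number of sorted r0 values that are <= x.
    count = sum(bisect_right(r0_sorted, x) for x in r1)
    return count / (len(r0) * len(r1))


def estimate_eq5(r0, r1):
    """Equation (5): for each r0 sample, count the r1 samples >= it; divide by N^2."""
    r1_sorted = sorted(r1)
    n1 = len(r1_sorted)
    # bisect_left gives the number of sorted r1 values that are < x.
    count = sum(n1 - bisect_left(r1_sorted, x) for x in r0)
    return count / (len(r0) * len(r1))


# Toy samples: 8 of the 9 (r0, r1) pairs satisfy r0 <= r1.
r0 = [1.0, 2.0, 3.0]
r1 = [2.0, 3.0, 4.0]
print(estimate_eq4(r0, r1), estimate_eq5(r0, r1))
```

Both functions count the same set of pairs, summed in a different order, so they return identical values on any pair of samples.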

RELATIONSHIP TO THE AREA UNDER THE ROC CURVE
For a fixed criterion $T$, the hit rate ($\beta$) and false alarm rate ($\alpha$) are

$$\beta(T) = \int_T^\infty p_1(r_1)\, dr_1 \tag{6}$$

and

$$\alpha(T) = \int_T^\infty p_0(r_0)\, dr_0. \tag{7}$$

Using Equation (6), we can rewrite the expression for the performance of the ideal observer in Equation (3) as

$$P = \int_0^\infty p_0(r_0)\, \beta(r_0)\, dr_0.$$

Using Equation (7) in the above, so that $d\alpha = -p_0(T)\, dT$ with $\alpha$ falling from 1 to 0 as $T$ increases from 0 to $\infty$, we have the performance of the ideal observer as

$$P = \int_0^1 \beta\, d\alpha.$$

Since the ROC curve is the plot of $\beta$ against $\alpha$ as the criterion varies from 0 to $\infty$, the quantity $\int \beta\, d\alpha$ is the area under the ROC curve (Figure 1A). Therefore, estimates of the quantities in Equations (2) and (3) are also estimators of the area under the ROC curve. Figures 1, 2 numerically illustrate the fact that estimators (4) and (5) are equivalent to the conventional estimate of performance as the area under the ROC curve. For Figure 1, we assumed Gaussian distributions for $P_0$ and $P_1$ with the mean of $P_1$ greater than that for $P_0$. A random set of 100 samples was drawn from each distribution and the area under the ROC curve was estimated. Performance was also estimated using Equations (4) and (5). The difference between the mean values of the Gaussian distributions was then increased in the range [0.5, 25] in steps of 0.5. The variances were set to $1.28 \times \text{mean}^{1.2}$ to mimic the firing rate statistics of MT neurons (Britten et al., 1992; Purushothaman and Bradley, 2005). The estimates were computed for this entire range of mean values (Figure 1B). The deviation of the estimators (4) and (5) from the area under the ROC curve was expressed as the percent error, 100 × (Area under ROC − estimate)/estimate. The three estimates differed by less than 1.5% from each other (Figure 1C). When the estimates were averaged over 100 repetitions, the errors became negligible (Figure 1D). Simulations with Poisson distributions showed errors in the range of 0–5% (Figures 1E,F).
For Figure 2, we again assumed Gaussian distributions for $P_0$ and $P_1$ with the mean of $P_1$ greater than that for $P_0$. However, in these simulations, the difference between the mean values was held constant while the ratio of the variance of $P_1$ to that of $P_0$ was increased in the range [1, 25]. The percent error was computed as above. These simulations also showed that the estimates averaged over 100 repetitions had negligible error (Figure 2C).
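The equivalence between the pairwise count and the area under the empirical ROC curve can be checked numerically. The sketch below is not the simulation code behind Figures 1 and 2: it assumes a trapezoidal ROC area computed by sweeping the criterion over every observed response, uses hypothetical function names, and picks arbitrary means (10 and 15) with the variance rule $1.28 \times \text{mean}^{1.2}$ quoted in the text.

```python
import math
import random


def roc_area(r0, r1):
    """Trapezoidal area under the empirical ROC curve (hit rate vs. false alarm rate)."""
    thresholds = sorted(set(r0) | set(r1), reverse=True)
    points = [(0.0, 0.0)]  # criterion above every sample: no hits, no false alarms
    for t in thresholds:
        alpha = sum(x >= t for x in r0) / len(r0)  # false alarm rate at criterion t
        beta = sum(x >= t for x in r1) / len(r1)   # hit rate at criterion t
        points.append((alpha, beta))
    area = 0.0
    for (a_prev, b_prev), (a, b) in zip(points, points[1:]):
        area += (a - a_prev) * (b + b_prev) / 2.0
    return area


def pairwise_estimate(r0, r1):
    """Equation (4): fraction of (r0, r1) pairs with r0 <= r1."""
    return sum(x0 <= x1 for x1 in r1 for x0 in r0) / (len(r0) * len(r1))


def gaussian_samples(mean, n, rng):
    """Draw n Gaussian samples with variance 1.28 * mean**1.2."""
    sd = math.sqrt(1.28 * mean ** 1.2)
    return [rng.gauss(mean, sd) for _ in range(n)]


rng = random.Random(0)
r0 = gaussian_samples(10.0, 100, rng)
r1 = gaussian_samples(15.0, 100, rng)
print(roc_area(r0, r1), pairwise_estimate(r0, r1))
```

With continuous-valued samples there are no ties, and the two numbers agree to floating-point precision. With discrete responses such as spike counts, the trapezoidal rule effectively gives tied pairs half weight whereas Equation (4) counts them fully, which is one possible source of small discrepancies between the methods.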

UNBIASED ESTIMATION OF THE IDEAL OBSERVER PERFORMANCE
It is easy to verify that the estimators given in Equations (4) and (5) are unbiased, i.e., their expected values are equal to the true value to be estimated (Van Trees, 1966, pp. 65-73). We note that $\hat{P}(R_0, R_1)$ is a joint transformation of the independent random variables $r_{0k}$ and $r_{1i}$, $i, k = 1, 2, \ldots, N$ and that $r_{ik}$, $k = 1, 2, \ldots, N$ are identically distributed for each $i$. Therefore the expected value of the estimator in Equation (4) can be computed as

$$E[\hat{P}(R_0, R_1)] = \int \hat{P}(R_0, R_1) \prod_{k=1}^{N} p_0(r_{0k}) \prod_{i=1}^{N} p_1(r_{1i})\, dR_0\, dR_1.$$

Substituting Equation (4) into the above equation, we get

$$E[\hat{P}(R_0, R_1)] = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{k=1}^{N} \int_0^\infty p_1(r_{1i}) \int_0^{r_{1i}} p_0(r_{0k})\, dr_{0k}\, dr_{1i} = \frac{1}{N^2} \cdot N^2 P = P.$$

Therefore, $\hat{P}$ in Equation (4) is an unbiased estimator of $P$. Similarly, it can be shown that the estimator of Equation (5) is also unbiased.
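Unbiasedness can also be checked by simulation. The sketch below assumes exponential response distributions, for which the paper's later counter-example gives the true performance in closed form as $P = \alpha_0/(\alpha_0 + \alpha_1)$; the function names, the rates (3 and 1), and the sample sizes are arbitrary choices made for illustration.

```python
import random


def pairwise_estimate(r0, r1):
    """Equation (4): fraction of (r0, r1) pairs with r0 <= r1."""
    return sum(x0 <= x1 for x1 in r1 for x0 in r0) / (len(r0) * len(r1))


def mean_estimate(rate0, rate1, n, repetitions, seed=0):
    """Average the Equation (4) estimate over repeated simulated experiments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repetitions):
        r0 = [rng.expovariate(rate0) for _ in range(n)]
        r1 = [rng.expovariate(rate1) for _ in range(n)]
        total += pairwise_estimate(r0, r1)
    return total / repetitions


rate0, rate1 = 3.0, 1.0
true_p = rate0 / (rate0 + rate1)  # closed form for exponential densities
print(true_p, mean_estimate(rate0, rate1, n=20, repetitions=2000))
```

Even with only 20 samples per condition, the average over many repetitions should sit close to the true value, which is what an unbiased estimator predicts.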

VARIANCE OF THE ESTIMATOR
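The variances of the estimators in Equations (4) and (5) can be computed by first subtracting $P$ from both sides of Equation (4) and squaring:

$$(\hat{P} - P)^2 = \frac{1}{N^4} \left[ \sum_{i,k} \left( I_{[0, r_{1i}]}(r_{0k}) - P \right)^2 + \sum_{(i,k) \neq (l,m)} \left( I_{[0, r_{1i}]}(r_{0k}) - P \right) \left( I_{[0, r_{1l}]}(r_{0m}) - P \right) \right] = \frac{1}{N^4} \left( S_1 + S_2 \right).$$

Expanding the summand of $S_1$ as $\left( I_{[0, r_{1i}]}(r_{0k}) - P \right)^2 = I_{[0, r_{1i}]}(r_{0k}) + P^2 - 2P\, I_{[0, r_{1i}]}(r_{0k})$ (the square of an indicator is the indicator itself) and noting that $E[I_{[0, r_{1i}]}(r_{0k})] = P$, we obtain for the expectation of the first term, $E[S_1] = N^2 P(1 - P)$. Next, we rewrite the second sum as

$$S_2 = S_{21} + S_{22} + S_{23}, \tag{11}$$

where $S_{21}$ collects the terms with $k = m$ and $i \neq l$, $S_{22}$ the terms with $i = l$ and $k \neq m$, and $S_{23}$ the terms with $i \neq l$ and $k \neq m$. Consider the first sum on the right side of Equation (11) above. We compute the expectation of the product

$$E\left[ I_{[0, r_{1i}]}(r_{0k})\, I_{[0, r_{1l}]}(r_{0k}) \right] = E\left[ I_{[0, \min(r_{1i}, r_{1l})]}(r_{0k}) \right].$$

Since $\min(r_{1i}, r_{1l}) \leq r_{1i}$, we get the following bound:

$$E\left[ I_{[0, \min(r_{1i}, r_{1l})]}(r_{0k}) \right] \leq E\left[ I_{[0, r_{1i}]}(r_{0k}) \right] = P.$$

Therefore, we have for the expectation of the first term on the right side of Equation (11)

$$E[S_{21}] \leq N^2 (N - 1) (P - P^2) = N^2 (N - 1) P (1 - P).$$

For the second term, the two indicators share the same $r_{1i}$, so that

$$E\left[ I_{[0, r_{1i}]}(r_{0k})\, I_{[0, r_{1i}]}(r_{0m}) \right] = E_{p_1}\!\left[ \int_0^{r_{1i}} p_0(r_{0k})\, dr_{0k} \int_0^{r_{1i}} p_0(r_{0m})\, dr_{0m} \right] \leq E_{p_1}\!\left[ \int_0^{r_{1i}} p_0(r_{0k})\, dr_{0k} \right] = P,$$

where we used the bound $\int_0^{r_{1i}} p_0(r_{0m})\, dr_{0m} \leq 1$. Therefore, we have for the expectation of $S_{22}$ the bound

$$E[S_{22}] \leq N^2 (N - 1) P (1 - P).$$

Finally, we note that in the last term $S_{23}$, the summand $\left( I_{[0, r_{1i}]}(r_{0k}) - P \right) \left( I_{[0, r_{1l}]}(r_{0m}) - P \right)$ is the product of two independent and zero-mean random variables for $i \neq l$ and $k \neq m$, so that $E[S_{23}] = 0$. Hence the variance of the estimator in Equation (4) has the bound

$$\mathrm{Var}(\hat{P}) = E\left[ (\hat{P} - P)^2 \right] \leq \frac{N^2 P(1 - P) + 2 N^2 (N - 1) P (1 - P)}{N^4} = \frac{(2N - 1)\, P (1 - P)}{N^2}.$$

Similar calculations yield the same bound for the variance of the estimator in Equation (5).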

CONSISTENCY OF THE ESTIMATOR
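Next, we verify if the estimators are consistent, i.e., if the estimates progressively converge to the true value as the number of observations is increased (Van Trees, 1966, pp. 65-73). To do so, we first apply the Tchebycheff-Bienaymé inequality to $\hat{P}$. For any $\varepsilon > 0$, we have

$$P\{ |\hat{P} - P| \geq \varepsilon \} \leq \frac{\mathrm{Var}(\hat{P})}{\varepsilon^2} \leq \frac{(2N - 1)\, P (1 - P)}{N^2 \varepsilon^2}. \tag{16}$$

The right side tends to zero as $N \to \infty$. Thus $\hat{P}$ converges to $P$ in probability as $N \to \infty$ and is a consistent estimator of $P$.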
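The variance bound and the resulting convergence can be illustrated by simulation. The sketch below assumes the bound takes the form $(2N - 1)P(1 - P)/N^2$, which is what the term counts in the variance derivation give ($N^2$ diagonal terms plus two groups of $N^2(N-1)$ cross terms); that explicit form, the function names, and the exponential test distributions are assumptions of this example rather than quotations from the paper.

```python
import random


def pairwise_estimate(r0, r1):
    """Equation (4): fraction of (r0, r1) pairs with r0 <= r1."""
    return sum(x0 <= x1 for x1 in r1 for x0 in r0) / (len(r0) * len(r1))


def empirical_variance(rate0, rate1, n, repetitions, seed=0):
    """Mean squared deviation of the estimate from the true P (exponential case)."""
    rng = random.Random(seed)
    p_true = rate0 / (rate0 + rate1)
    squared_error = 0.0
    for _ in range(repetitions):
        r0 = [rng.expovariate(rate0) for _ in range(n)]
        r1 = [rng.expovariate(rate1) for _ in range(n)]
        squared_error += (pairwise_estimate(r0, r1) - p_true) ** 2
    return squared_error / repetitions


def variance_bound(p, n):
    """Assumed upper bound on the estimator variance: (2N - 1) P (1 - P) / N^2."""
    return (2 * n - 1) * p * (1 - p) / n ** 2


for n in (10, 40):
    print(n, empirical_variance(3.0, 1.0, n, 500), variance_bound(0.75, n))
```

The simulated variance should fall below the bound at each sample size and shrink roughly in proportion to $1/N$, in line with the consistency argument.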

DEVIATION OF AN ESTIMATE FROM THE TRUE VALUE
The above analyses showed that the proposed estimators give an unbiased estimate of the performance of the ideal observer and that as the number of observations increases, the error of estimation (i.e., the variance of the estimator) decreases at the rate of $1/N$. In addition to establishing these properties, the above analyses also give us tools for designing the ideal observer model. Suppose the experiment has been performed and an estimate of the performance of the ideal observer has been obtained for a neuron. It is desirable to determine the likelihood that the true value of the performance lies within a known range of the estimate obtained, i.e., we would like to state a confidence interval for the estimate at a given significance level. Currently, this confidence interval, when reported, is obtained using bootstrapping or other empirical methods. The above analyses provide a tool for quantifying the deviation of a performance estimate from its true value in a simpler and more rigorous manner. Equation (16) can be used for this purpose. Suppose we require the percent error in the estimate, $100 \times |\hat{P} - P| / \hat{P}$, to be less than 5%. This gives $\varepsilon = 0.05 \times \hat{P}$, from which the probability that the true value lies outside this error range can be computed as

$$P\{ |\hat{P} - P| \geq 0.05 \times \hat{P} \} \leq \frac{(2N - 1)\, P (1 - P)}{N^2 (0.05 \times \hat{P})^2}.$$

Thus, the quantity $\alpha = (2N - 1) P (1 - P) / \left( N^2 (0.05 \times \hat{P})^2 \right)$ gives the significance level for the desired confidence interval. We note that since $|\hat{P} - P| \geq \varepsilon$, $\alpha$ does not necessarily depend upon the unknown $P$. For large $N$, $2N \gg 1$. Hence $\alpha \approx 2P(1 - P) / \left( N (0.05 \times \hat{P})^2 \right)$.
We investigated the tightness of this bound using a series of simulations (Figure 3). We simulated $N$ trials by drawing $N$ samples each of $R_0$ and $R_1$ from Gaussian distributions whose mean values differed by progressively increasing amounts so that the true value of the ideal observer performance varied from 0.5 to 1.0. For each set ($R_0$, $R_1$), we obtained one estimate of $P$. We performed this simulation 1000 times and computed the maximum deviation of the estimate from the true value, the average deviation and the minimum deviation for the 1000 estimates. We repeated all of these simulations for Gamma distributions. The results are shown superimposed on the corresponding values of $\varepsilon$ for $\alpha$ values of 0.01 and 0.05 (Figure 3). The same pattern of results was obtained for Poisson and scaled Poisson distributions. These simulations show that for small values of $N$ ($\leq 100$) and $\alpha$ ($= 0.01$), the actual difference between the true and estimated values is much smaller than the theoretical bound $\varepsilon$. At $\alpha = 0.05$ and for higher values of $N$, the theoretical deviation approaches the maximum empirical deviation obtained in the simulations. The implications of the varying tightness of the theoretical bound for experimental design are discussed below.

DESIGNING EXPERIMENTS FOR RELIABLE ESTIMATION OF IDEAL OBSERVER PERFORMANCE
Some previous studies have empirically investigated the number of trials required to obtain a reliable estimate of the ideal observer's performance. For example, Britten et al. (1996) computed "choice probability" separately for odd and even numbered trials. This allowed them to compute a measure of the random dispersion of the probability values. One goal of that investigation was to test whether or not the population average choice probability was significantly different from chance. For the population average choice probability of 0.55, at least 100 trials were required for the odd and even estimates to differ by less than 0.05 (i.e., 0.55-0.5). A different empirical approach was required to estimate the number of trials required to significantly reduce estimation errors in the ROC analysis of optically imaged intrinsic signals (Purushothaman et al., 2009). From the results obtained in the previous section, we can arrive at a general formula for systematically determining the number of trials required for the estimate of the performance of the generalized ideal observer to reach a desired confidence interval. From Equation (16) above, we have, $\forall \varepsilon > 0$, an upper bound on $P\{ |\hat{P} - P| \geq \varepsilon \}$; requiring that this bound not exceed the significance level $\alpha$ determines the required number of trials. First, as an example, we consider the Britten et al. (1996) study. Assume that the true value of choice probability in that study was 0.55. Suppose we require that the estimate should lie within ±0.05 of the true value at an alpha (or significance) level of 0.05, i.e., we require $P\{ |\hat{P} - P| \geq 0.05 \} \leq 0.05$ so that, in concordance with the empirical test performed by Britten et al. (1996), the dispersion in the choice probability estimate reliably excludes the chance value of 0.5. Then the number of trials $N$ should be at least $2\sqrt{0.55(1 - 0.55) / (0.05^2 \times 0.05)} \approx 89$. The empirical test by Britten et al. (1996) yielded $N \approx 100$, quite close to this value. However, the above formula also allows us to determine $N$ at other significance levels. At a significance level of 0.01, we get $N \geq 198$.
While many studies that followed Britten et al. (1996) have used this "100 trials" rule to determine $N$, our analysis shows that fewer trials suffice when higher values are expected for the performance of the ideal observer. For example, multistable percepts are linked to fluctuations in neural activity quite strongly (Dodd et al., 2002) and neurons in higher brain areas also show a strong link between their activity and perceptual decisions (Shadlen and Newsome, 2001). Using Table 1 and Equation (18), it is possible to estimate the required value of $N$ during experimental design. It is also possible to estimate confidence intervals (i.e., $\varepsilon$) for a given value of $N$ during data analysis without resorting to numerical simulations. Table 1 provides a look-up of $\varepsilon$ and $N$ for various values of $P$. As mentioned above, our simulations showed that at a given value of $N$ and $\alpha$, the actual deviation between the true and estimated values was much smaller than the theoretical bound $\varepsilon$ (Figure 3). Therefore, the values of $N$ shown in Table 1 are likely to be overestimates, i.e., fewer trials might suffice to reach the desired confidence interval in some cases.

EFFICIENT ESTIMATORS OF IDEAL OBSERVER PERFORMANCE MAY NOT EXIST
Table 1 note: The expression for obtaining the number of trials required to reach a given confidence interval $\varepsilon$ at a significance level $\alpha$ is $N \approx \sqrt{P(1 - P) / (\varepsilon^2 \times \alpha)}$. Alternatively, for given values $N$ and $\alpha$, the confidence interval can be computed as $\varepsilon \approx \sqrt{P(1 - P) / (N^2 \times \alpha)}$.

Since the performance of the ideal observer can be estimated in more than one way, it is natural to ask if some of these methods are "better" than others. In addition to requiring that estimators be unbiased and consistent, it is also required that estimators should be "efficient" when possible (Van Trees, 1966, pp. 66-73). An efficient estimator has the minimum possible variance among all unbiased estimators for a quantity and therefore will yield the lowest possible error for a given number of observations, on average. Under some conditions, maximum likelihood (ML) estimators are minimum variance estimators. Therefore, it is natural to seek ML estimators for the performance of the ideal observer model. In this section, we first show that $\hat{P}(R_0, R_1)$ is "efficient" in a limited sense. We then present a counter-example to show that the ML estimator for the performance of an ideal observer is not guaranteed to be minimum variance.
We will first describe a limited sense in which $\hat{P}$ is efficient.
Let $M(R_0, R_1) = \sum_{i} \sum_{k} I_{[0, r_{1i}]}(r_{0k})$ denote the number of correct comparisons among the $N^2$ pairs, with observed value $m$. Then, for a given value of $P$, the probability distribution function for $M$ is simply the binomial distribution

$$p(m \mid P) = \binom{N^2}{m} P^m (1 - P)^{N^2 - m}.$$

Therefore it can be verified that the calculation gives the ML estimator as

$$\hat{P}_{ml}(m) = \frac{m}{N^2}.$$

We note also that

$$\frac{\partial \ln p(m \mid P)}{\partial P} = \frac{m}{P} - \frac{N^2 - m}{1 - P} = \frac{N^2}{P(1 - P)} \left[ \hat{P}_{ml}(m) - P \right],$$

i.e., $\hat{P}_{ml}(m)$ satisfies the sufficient condition to be an efficient estimator (Van Trees, 1966, pp. 66-73). In addition, $E_M(\hat{P}_{ml}(m)) = P$. Therefore, $\hat{P}_{ml}(m) = m/N^2$ is an unbiased and efficient estimator of $P$. However, it is important to note that $\hat{P}_{ml}(m)$ is an estimator of $P$ as a function of the transformed random variable $M(R_0, R_1)$ and not as a function of $R_0$ and $R_1$. The following counter-example shows that it is not possible to guarantee that ML estimators of $P$ are minimum variance. Let the two conditional distributions be exponential, with $p_0(r_0) = \alpha_0 \exp(-\alpha_0 r_0)$ and $p_1(r_1) = \alpha_1 \exp(-\alpha_1 r_1)$. We can calculate $P$ for this case using Equation (2) as $P(P_0, P_1) = \alpha_0 / (\alpha_1 + \alpha_0)$. Let us note that 1. if $\alpha_1 = \alpha_0$, then $P = 0.5$, 2. $P \to 1$ as $\alpha_0 \to \infty$ for a given $\alpha_1 < \infty$ (i.e., as the mass of the distribution $P_0$ concentrates near zero while that of $P_1$ remains fixed), and 3. $P \to 0$ as $\alpha_0 \to 0$ for a given $\alpha_1 < \infty$.
We now note that equation (21) cannot be written in the form $\partial \ln p(r_0, r_1 \mid P) / \partial P = [\hat{P}_{ml}(r_0, r_1) - P]\, T(P)$, where $T(P)$ is a function of $P$ alone. Therefore, the sufficient condition for $\hat{P}_{ml}(r_0, r_1)$ to be efficient is not satisfied (e.g., Van Trees, 1966, pp. 66-73). Further, it is also clear that $\hat{P}_{ml}(r_0, r_1)$ is a biased estimator. Hence the ML estimator of $P$ for this case cannot be guaranteed to be minimum variance.
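The bias in the exponential counter-example can be illustrated numerically. The sketch below rests on an assumption that is not spelled out in the text: that the ML estimate from a single pair of observations is the plug-in form $r_1 / (r_0 + r_1)$, obtained by substituting the single-sample rate estimates $\hat{\alpha}_i = 1/r_i$ into $P = \alpha_0 / (\alpha_0 + \alpha_1)$. The function names and the rates (3 and 1) are hypothetical choices for illustration.

```python
import random


def plug_in_estimate(r0, r1):
    """Assumed single-pair ML estimate of P for exponential densities: r1 / (r0 + r1)."""
    return r1 / (r0 + r1)


def mean_plug_in(rate0, rate1, draws, seed=0):
    """Average the plug-in estimate over many independent (r0, r1) pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        r0 = rng.expovariate(rate0)
        r1 = rng.expovariate(rate1)
        total += plug_in_estimate(r0, r1)
    return total / draws


rate0, rate1 = 3.0, 1.0
true_p = rate0 / (rate0 + rate1)
print(true_p, mean_plug_in(rate0, rate1, draws=200000))
```

For these rates the true performance is 0.75, while the average of the plug-in estimate settles near 0.68, so under the stated assumption the estimator is visibly biased. When the two rates are equal, the symmetry of the problem makes the average equal to 0.5 and the bias vanishes.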

DISCUSSION
We proposed a general form of an ideal observer for decoding stimulus information and perceptual decisions from neural responses. We showed that several ideal observer models used in previous studies are special cases of this general form. We investigated the statistical properties of this general ideal observer model. These analyses provide various tools for designing experiments with the goal of using an ideal observer analysis on neural data. We have provided a lower bound on the number of observations required for the estimate to lie within a pre-specified range of its true value ("confidence interval"), within a specified confidence level. We also showed that there is not a uniformly "best" (i.e., minimum variance) estimator for the performance of the ideal observer since the existence of such an estimator depends on the parametric forms of the underlying probability distributions. It is sometimes argued that computing the area under the ROC curve offers a non-parametric way of estimating ideal observer performance. While it is true that this estimation procedure does not depend on the parametric forms of the underlying probability distributions, it is important to note that the resulting estimate will be invariably influenced by the underlying parametric forms. Therefore, for some parametric forms and under some conditions, neither the estimators provided in Equations (4) and (5) nor the area under the ROC curve will be efficient. However, regardless of which estimator is chosen, the relationship between the number of trials, the confidence interval and the confidence level derived in this paper can be used to design the experiment and validate the results.
It is worth noting that the number of trials required for the estimate to lie within a confidence interval at a given confidence level is not the optimum number of trials required for reaching the decision. Therefore, in certain applications other methods, such as sequential probability ratio tests, may be more appropriate (Wald, 1945).