An Information-Theoretic Approach for Evaluating Probabilistic Tuning Functions of Single Neurons

Neuronal tuning functions can be expressed by the conditional probability of observing a spike given any combination of explanatory variables. However, accurately determining such probabilistic tuning functions from experimental data poses several challenges such as finding the right combination of explanatory variables and determining their proper neuronal latencies. Here we present a novel approach of estimating and evaluating such probabilistic tuning functions, which offers a solution for these problems. By maximizing the mutual information between the probability distributions of spike occurrence and the variables, their neuronal latency can be estimated, and the dependence of neuronal activity on different combinations of variables can be measured. This method was used to analyze neuronal activity in cortical area MSTd in terms of dependence on signals related to eye and retinal image movement. Comparison with conventional feature detection and regression analysis techniques shows that our method offers distinct advantages, if the dependence does not match the regression model.

Here, we present a novel approach for evaluating neuronal tuning functions. Using a probabilistic tuning function description, we propose a model-free alternative to regression analysis. By maximizing the mutual information between the neuronal activity and any combination of explanatory variables, the presented method allows the estimation of neuronal latency and gives a measure for the dependence of neuronal activity on different combinations of the explanatory variables. This stands in contrast with typical applications of information-theoretic techniques in neural data analysis (see, e.g., Rieke et al., 1997), which investigate the relevance of spike timing or the influence of correlations in neural populations.
In the following, we demonstrate the application of this novel method on data from MSTd and show that neuronal activity in this cortical area is determined by combinations of retinal (e.g., image velocity) and extraretinal variables (e.g., eye velocity and position).

evaluating probabilistic tuning functions
The method proposed here consists of two components: first, a Bayesian approach for the determination of probabilistic tuning functions, and second, an information-theoretic technique for evaluating these tuning functions by estimating neuronal latencies and selecting those variables that show the greatest dependence on the neuronal activity.

Bayesian approach for tuning function determination
Let S be a binary random variable for the observation of a spike or non-spike, with p S (s) denoting the probability mass function of spike occurrence. The discrete random variable V denotes the observation of a specific combination of explanatory variables with

introduction
Defining the dependence of neuronal activity on certain variables, e.g., presented stimuli, is often the major aim of studies attempting to define neural mechanisms supporting sensory-motor behavior. A neuronal tuning function defines the functional relation between the spiking activity and uni-or multivariate explanatory variables. Virtually every sensory system, from the vertebrate visual cortex to wind-detecting neurons in the cricket cercal system, has been characterized in this way (Rieke et al., 1997;Dayan and Abbott, 2001).
One common approach to determine the dependence of neurons on these variables is regression analysis, in which the spike density function is approximated by one or multiple explanatory variables using linear or other models (Ilg et al., 2004;Ono et al., 2004;Wu et al., 2006). It remains questionable, however, what model assumptions can be made when analyzing neuronal data.
Another difficulty in the analysis of neuronal data is the accurate estimation of latencies between any given variable and associated neuronal activity (Seal et al., 1983;Friedman and Priebe, 1998;Bollimunta et al., 2007). For example, when visual information arrives in cortical area MSTd, which is the dorsal part of the medial superior temporal cortex, it has passed a number of processing stages resulting in a considerable delay with respect to the stimulus. Depending on the properties of the visual input, retinal delay alone accounts for a latency of up to 50 ms (Schmolesky et al., 1998). On the way through thalamus and primary visual areas, the signal propagates with a velocity on the order of 10-100 m/s. Additionally, synaptic transmission produces delays of several milliseconds. Therefore, a proper model of this system has to account for the fact that neuronal activity is a delayed function of the input variables.
An information-theoretic approach for evaluating probabilistic tuning functions of single neurons associated probability mass function p V (v). Then, p S|V (s|v) expresses the conditional probability of observing a spike given any combination of variable values. By multiplying with the sampling rate, this probability translates directly into an expectation value of the rate of spiking activity, and therefore describes a neuronal tuning function. Using Bayes' theorem, p S|V (s|v) can be expressed as the quotient of the joint probability mass function p V,S (v,s) divided by p V (v): The normalization on p V (v) allows the estimation of the tuning function in unbalanced designs (i.e., unequal number of observations across explanatory variables). Figure 1 demonstrates this principle for the two-dimensional case.
Estimates of p V (v) and p V,S (v,s) can be attained by generating histograms of the experimental data. Note that the joint probability mass function p V,S (v,s) critically depends on the assumed neuronal latency. For estimating the optimal number of bins, with which each variable is discretized, we adapted an algorithm proposed by Knuth (2006). According to this, the optimal bin width is defined by the Bayesian estimate of the number of segments of a piecewise constant probability function that is limited to a fixed interval. The most probable solution is determined by a balance between the likelihood function, which describes the probability that a data point can be assigned to a specific bin, and the prior probability that decreases with an increasing number of bins. For smoothing the resulting histograms, we used a symmetrical Gaussian low-pass filter with a SD of two bin widths.
The amount of data needed for generating reasonably finegrained histograms increases exponentially with the number of dimensions. However, the duration one single neuron can be recorded is restricted due to experimental and physiological constraints. Hence, there is a practical limitation on the number of variables that can be modeled or included in this Bayesian approach for tuning function determination.

Mutual information maximization
Entropy H is a measure of the uncertainty of a single random variable. The reduction in uncertainty due to another random variable is called mutual information I (Cover and Thomas, 1991). Mutual information is a measure of the dependence between two random variables. It is symmetric, non-negative, and equal to zero only if both random variables are mutually independent. Mutual information captures all dependencies between random variables, not just second order dependencies which are indicated, for example, by the covariance.
When applied to the two random variables V and S from the previous section, the mutual information I(V;S) can be stated as with H(S) being the entropy of S and H(S|V) the conditional entropy of S given V, also referred to as noise entropy. These are defined by where p S (s) denotes the probability mass function of spike occurrence. The conditional probability p S|V (s|v) denotes the tuning function, determined by the Bayesian approach mentioned in the previous section. This probability and thus H(S|V) as well depend on both the choice of variables analyzed and the choice of latencies between these variables and the neuronal activity.
Maximizing the mutual information between V and S provides an unbiased estimator for the neuronal latency. A proof based on the data-processing inequality theorem in information theory (Cover and Thomas, 1991) is given in the Section "Appendix." The proof requires only the moderate assumption that a constant delay exists between stimulus and neuronal activity. For the simulation of neuronal activity, we used a Poisson process model as described in Rieke et al. (1997). The Poisson model is characterized by the statistical independence of events in disjoint time intervals. The probability distribution for k events in the time interval ∆ is where l(t) denotes the time-dependent firing rate. To simulate a neuron that resembles the two-dimensional tuning function p S|V (s|v), l(t) was set to with sampling rate f s , and v 1 (t) and v 2 (t) being time shifted by the estimated neuronal latencies t 1 and t 2 .

application
The data reported in this paper were recorded in cortical area MSTd from two behaving monkeys (Macaca mulatta, 5-7 kg). The experiments were performed at the Yerkes National Primate Research Center (Atlanta, GA, USA) in compliance with National Institutes of Health Guide for the Care and Use of Laboratory Animals and protocols were reviewed and approved by the Institutional Animal Care and Use Committee at Emory University. For verifying MSTd location we used functional, histological, and MRI criteria. During the experiments monkeys were seated in a primate chair with their head fixed in the horizontal stereotaxic plane in a completely dark room. Only those neurons that showed significant response to moving visual stimuli were analyzed. Visual receptive fields of neurons were mapped by moving a probe stimulus at regularly spaced eccentricities across the visual field. Most receptive fields were large (>30°) and had their center in the contralateral hemifield in accordance with known MSTd properties. Experimental procedures are explained in detail in Ono et al. (2010).

Visual stimuli
Visual large field (LF) stimuli (35° × 35° random dot patterns) were rear projected on a tangent screen. Data were acquired only for those movement directions that were previously identified to be the preferred direction of the neuron, i.e., the direction which elicits maximal spiking activity for a moving LF stimulus in the analyzed neuron. For each neuron two kinds of paradigms were tested: (1) Fixation during moving LF stimulus: The monkey fixated a small target spot located at the center of gaze. After some random time the LF stimulus started to move with constant velocity (5-20°/s) in the neuron's preferred direction for a period between 1000 and 1800 ms. During presentation of the visual motion the monkey still fixated the laser spot, though LF stimulation always produces a slight optokinetic nystagmus (<2°/s; see Figure 2A).
(2) Optokinetic response: As (1) with the difference that the laser spot was turned off when the LF stimulus began to move. In this case, the monkey's eye movements followed the motion (see Figure 2B).
As H(S) is defined by the neuronal activity alone, the maximization of I(V;S) is achieved by minimizing the noise entropy H(S|V). In the limit case of H(S|V) = 0, S and V are one-to-one related and the probability of spike occurrence is uniquely defined by the explanatory variables.
Due to the limitation mentioned in the previous section, the dimension of the tuning function is constrained by the amount of data recorded. Estimating the entropy from a finite number of samples is prone to systematic errors. This so called sampling bias problem is described in Panzeri et al. (2007). Put shortly, the noise entropy tends to be underestimated, as finite sampling makes the neuronal response seem less variable than it really is. In our case, the length of each dataset was around 500K samples. The typical number of bins per dimension was less than 20. Hence, the average amount was more than 1250 samples per stimulus condition for the case of two-dimensional tuning functions. To avoid errors due to an insufficient amount of samples, we limited the analysis to this case. Bins containing less than 32 samples were omitted in the analysis.
To investigate the dependence of a spike on more than two explanatory variables, we determined the tuning functions of a single neuron for any pairwise selection V k of those variables. For each of these pairs neuronal latencies of both variables were estimated by maximizing the mutual information I(V k ;S). As I(V k ;S) quantifies the dependence of the spike on the selected pair of variables, those two variables that are most related to the spiking activity can be determined by comparing the maximal mutual information of the two-dimensional tuning functions.
The mutual information depends on the number of bins used to discretize the variables. Using the algorithm of Knuth (2006; see previous section) an optimal number of bins was estimated for each pair of variables. The average of these optimal numbers of bins was determined and used for comparing the different pairs of variables. For each neuron, this average number of bins was determined separately.

Regression analysis
Regression analysis is a common approach to estimate both the latency and the dependence of neuronal activity on explanatory variables. Spiking activity was represented as a spike density function (sdf), generated by convolving the spike pulses with a Gaussian window function (s = 100 ms). To allow a comparison with the two-dimensional analysis in previous section, the linear regression model consisted of the two regressor variables v 1 and v 2 according to where r represents the Gaussian noise term. The model was fit to the whole dataset. Neuronal latencies were estimated by shifting the variables in steps of 10 ms and searching for the best fit (maximal R 2 ). Furthermore, maximal R 2 values were compared for all pairs of variables.

Simulation of synthetic datasets
For each dataset the two explanatory variables image velocity (v 1 (t)) and eye velocity (v 2 (t)) were generated as 10 s long band-limited (<20 Hz) white noise random signal, sampled at 1 kHz. The mean was zero and the SD was the same as in the example MSTd neuron dataset. Both variables were perfectly uncorrelated. datasets using both the information-based method and regression analysis. Finally, we end this section with the population results for 49 MSTd neurons.

analysis of an exaMple Mstd neuron
We begin with the detailed analysis of an example neuron. The 763 s long dataset contained 19452 detected spikes. The explanatory variables were slightly correlated, with a maximal Pearson correlation coefficient of 0.23 for the pair of variables (image velocity and eye position).

Feature detection technique
Detecting certain features of averaged data from multiple trials associated with a given stimulus is an often used approach for estimating the latency between some signal, e.g., a stimulus variable, and the neuronal activity. Figure 3 shows the mean eye movement traces and spike density function of an example MSTd neuron during optokinetic response to a moving LF stimulus. Signal onset was defined as the time when the trace increased above the 99% confidence limit during the preceding fixation period. Latency of the neuronal activity with respect to any explanatory variable was defined as time interval between adjacent onsets of that signal and neuronal activity. Retinal image velocity is the difference between target and eye velocity. As can be seen in Figure 3, neuronal latency in regard to target, as well as image velocity, was 45 ms. Regarding the eye velocity signal, neuronal activity was leading (in the following denoted as negative latency) by 20 ms.

Mutual information maximization method
Next we analyze data from the same neuron using our informationbased approach. Due to the practical limit of number of variables mentioned in Section 2, we determined the tuning functions for pairwise selections of explanatory variables. Neuronal latencies were also estimated for each pair separately. bits. This information contained in the spiking activity was the maximum that could be explained by the information of that pair of variables. Note that this value depends on the number of bins used to discretize the variables. Here, each explanatory variable was discretized in 22 bins of equal width. The dependence of the estimated latencies on the number of bins is shown in Figure 5.
For both image and eye velocity the latency estimate is robust for a wide range of chosen number of bins.
In the same way the neuronal tuning functions were determined for all variable pairs (Figure 6). Comparing all pairwise selections of considered variables, the mutual information I(V;S) for the pair [image velocity & eye velocity] was maximal. Hence, this pair was most related to spiking activity. As can be seen, the expected rate of spiking activity increased primarily with higher image velocity values. Within a certain range of image velocity, the rate additionally increased with eye velocity, yielding a non-linear dependence of spiking activity on both variables. Estimated latencies of image During both (1) and (2) the constant velocity phase of LF motion was interrupted (600-800 ms after stimulus onset) in some trials by a perturbation of target speed consisting of one sinusoidal cycle (5 Hz, ±10°/s), which increased the range of image and eye velocity (as in Ono et al., 2010).
This combination of two different paradigms has the advantage of yielding a large range of values for both retinal image velocity during fixation trials, as well as eye velocity in the optokinetic response trials.

Data collection and preparation
Single unit activity was analyzed from 49 neurons. Action potentials were detected with both a hardware window discriminator and template matching algorithm.
Eye movements were detected with standard electro-magnetic methods using scleral search coils (Fuchs and Robinson, 1966). Eye and target position feedback signals were processed with antialiasing filters at 200 Hz using 6-pole Bessel filters (latency 5 ms) before digitization at 1 kHz with 16-bit precision. The recorded eye position traces were filtered with an acausal zero phase Gaussian low-pass (cutoff frequency 30 Hz) and three-point differentiated to obtain the velocity traces. Saccades were detected and removed with a slow-phase estimation algorithm as described in Ladda et al. (2007).
We related the neuronal activity to variables supposed to be coded in MSTd during moving LF stimulation (Newsome et al., 1988;Bremmer et al., 1997;Hamed et al., 2003). Retinal variables were image velocity and acceleration, whereas extraretinal variables were eye position, velocity, and acceleration.

results
In this section we demonstrate the analysis of neuronal recordings from MSTd using different methods. Neuronal latency is estimated first using a feature detection technique. We then apply the previously described information-theoretic method on the same data to determine both neuronal latency and dependence of neuronal activity on certain variables. Furthermore we analyze the example MSTd neuron using linear regression analysis. To compare the performance in latency estimation, we simulated 100 neuronal recordings of the example MSTd neuron and analyzed this synthetic paired with. This is primarily due to the low dependence of spiking activity on eye and image acceleration in this specific neuron, as can be seen in Figure 6.
velocity, eye velocity, and eye position depended little on which pair was analyzed. In contrast, estimated latencies of eye acceleration and image acceleration depended strongly on the variable they were  Figure 8. The tuning to these variables was highly non-linear and differed across the neurons. Figure 9 shows the estimated latencies of the variables image velocity and eye velocity. To ensure that for each variable only those neurons were selected, in which this variable was actually related to the activity of that neuron, only those cells were considered that belonged to the pair that showed maximum I(V;S). Each histogram contains the latencies of the neurons for which the corresponding variable belonged to this optimal combination. For instance, image velocity was one of the optimal pair in 20 + 15 + 13 = 48 neurons. Average estimated latencies for image velocity, and eye velocity were 52.5, and −37.1 ms, respectively. For image velocity the SD within the population was very small (11.2 ms). For eye velocity it was larger (50.6 ms).

Regression analysis
For comparison with our previous results we show the results for linear regression analysis of the same MSTd population data.  (47%), 14 (29%), 9 (18%), and 3 (6%) neurons, respectively. Therefore, both approaches, information maximization, and linear regression, agreed in concluding that a combination of an image velocity and an eye movement variable is most related to spiking activity in MSTd. Figure 9 shows the results of latency estimation, using an analogous selection criterion as previous section. Instead of maximal I(V;S), those combinations were selected that showed maximum R 2 of all combinations. For both variables image and eye velocity the SD was much larger in the regression estimates than with the information-based approach.

discussion
We showed that our novel information-theoretic approach is capable of estimating and evaluating probabilistic neuronal tuning functions. By maximizing the mutual information between the probability distributions of spike occurrence and the variables, their neuronal latency can be estimated and the dependence of neuronal activity on different combinations of variables can be measured. In the following we discuss the various techniques for neuronal latency estimation. Finally, we compare our method with other information-theoretic approaches for analyzing neuronal data.

estiMation of neuronal latency
Numerous methods for estimating the latency of spiking activity relative to some variable on a trial-by-trial basis have been published (e.g., Seal et al., 1983;Friedman and Priebe, 1998;Bollimunta et al., 2007). Because the latency can vary for each trial, such detailed analysis is often necessary and might yield better results than the feature detection technique based on averaging as used here. Nonetheless, averaging techniques are widely used in the neurophysiological literature, and many applications, for instance tuning function determination as presented here, require the estimation of a distinct latency for each neuron.

Regression analysis
Analogous to the result of the information-based approach, linear regression analysis determined the pair [image velocity & eye velocity] as being most related to spiking activity (R 2 =0.40). The estimated neuronal latencies, however, differed from the previous results. The best fit was obtained for delaying image velocity by 70 ms and eye velocity by 170 ms.

analysis of synthetic datasets
To further compare the performances of latency estimation for both the information-based and the regression approach, we simulated 100 neuronal recordings based on the non-linear tuning function for [image velocity & eye velocity] of the example MSTd neuron (details are explained in Section 2.2.2). This allowed the a priori definition of the neuronal latency, which was set to the previously estimated values of 50 and −80 ms, respectively.
Results for both approaches are shown in Figure 7. Linear regression analysis was not able to correctly estimate the latencies of 50 and −80 ms. The broad distributions of estimates around 67 (±68) and −61 (±110) ms for image and eye velocity, respectively, demonstrate that a linear model is insufficient for analyzing this example neuron. The information-based approach, on the other hand, was capable of estimating the proper latencies for both variables in all datasets.

Mstd population results
We applied our information-theoretic approach to 49 cells recorded in area MSTd. The mean recording length per neuron was about 500 s, with an average spike count of approximately 18K. The explanatory variables were slightly correlated. The combination [image velocity & eye velocity] for instance exhibited an average Pearson correlation coefficient of −0.23 ± 0.11.

Mutual information maximization method
The While our information-based method considers temporal mean values, a method like feature detection might be better suited, when the primary goal is estimating the actual onset time of a signal. Such an approach requires temporal matching between distinct features in stimulus and response, as for example given by the sudden onset of acceleration in a ramp stimulus. Whenever this is not achievable, this method cannot be applied and temporal averaging methods such as regression analysis or our information-based technique are more appropriate.
Feature detection is an intuitive approach for estimating a distinct neuronal latency. Using this technique, the estimated latency for image velocity was about the same as with our information-theoretic approach in the example neuron. For eye velocity there was a discrepancy of 60 ms between feature detection (−20 ms latency) and our method (−80 ms latency). However, a shift that maximizes the mutual information between two random variables does not necessarily align the on-and offsets of these signals. The labeling is analog to Figure 6. The expected rate of spiking activity increased in general with higher image and eye velocity values. However, the shape of the tuning functions differed remarkably in each neuron.
Brostek et al.  Panzeri, 2009). Several of these studies examined the mutual information between stimulus variables and the neuronal response (e.g., Eckhorn and Pöpel, 1974;Optican and Richmond, 1987;Kjaer et al., 1994;Panzeri and Treves, 1996;Golomb et al., 1997;Rolls et al., 1997;Strong et al., 1998;Butts, 2003;Nirenberg and Latham, 2003;Osborne et al., 2004). Sharpee et al. (2004Sharpee et al. ( , 2006 maximized the mutual information between neuronal responses and certain subspaces of the high-dimensional stimulus. Similarly, our approach maximizes the mutual information between pairwise selected variables. However, these studies analyze the neuronal response, single or multi-unit recordings, to quantify either the information due to spike patterns or the information due to correlations between neurons. Those methods estimate the probabilities of certain spiking patterns in the neuronal response given specific stimuli. These probabilities are usually determined by pooling over a large number of trials, where the same stimulus was presented many times.
Our method, on the other hand, estimates for each sample the probability of spike occurrence given certain stimuli, not caring for certain spiking patterns. This probability is determined by pooling the whole dataset over time. Therefore, our method does not depend on recording a large number of similar trials. In this sense, it rather presents a model-free alternative to the model-based regression analysis.
Previous approaches that were determining the mutual information between stimulus and response were, to our knowledge, less concerned about neuronal latency estimation. When using static stimuli (Optican and Richmond, 1987;Kjaer et al., 1994;Rolls et al., 1997), there is no need for exact knowledge of the neuronal latency. Also, it might be negligible if latencies are relatively short, for instance in data from peripheral neurons in insects (Rieke et al., 1997). When, in contrast, large latencies have to be considered, determining neuronal tuning functions depends critically on estimating these latencies. In such cases, an approach as presented here is required.
Our approach for evaluating neuronal tuning functions is analogous to a method used for the alignment or registration of medical images: the relative position and orientation of two different images is adjusted by transforming one of the images until the mutual information between both intensity distributions is maximized (Collignon et al., 1995;Wells and Viola, 1996). Similar techniques have been used for instance for object detection in computer vision (Shams et al., 2000). Analogous to the spatial alignment used in these approaches, our method performs temporal alignment of two random variables by maximizing the mutual information.

conclusion
We present a novel method for determining and evaluating multidimensional probabilistic tuning functions. Our Bayesian approach allows the identification of arbitrary neuronal tuning functions. It can be applied in unbalanced designs and allows quantification of any possible dependence of the neuronal activity on the explanatory variables. However, the dimension of the tuning function is limited by the length of the neuronal recording.
Maximizing the mutual information allows estimation of neuronal latency and comparison of the coherence between spiking activity and different variable combinations. This informationbased approach does not require parametric modeling of the tuning function and is an appropriate tool for evaluating probabilistic tuning functions defined in the Bayesian framework.
Regression analysis can be applied on a virtually arbitrary number of variables. However, the results strongly depend on the model assumptions. The results shown here demonstrated that a simple linear model is insufficient when analyzing neuronal data. As Figure 6 illustrates, the versatility and non-linear character of neuronal tuning functions means that it can be difficult to find adequate general models.
For image velocity, estimated latencies of the population averaged 20 (±101) ms using linear regression. In contrast, our information-based method yielded a much sharper distribution of latencies averaging 53 (±11) ms. This result equals previous studies that were using feature detection techniques to estimate latencies in MSTd neurons: Kawano et al. (1994) found a latency of 47 (±7) ms to moving LF stimuli; Ono et al. (2010) determined a similar value of 42 (±14) ms.
A signal such as eye velocity is the output of a non-linear dynamic system with a large number of different input signals and recurrent connections. Neuronal activity in MSTd resembles some intermediate stage embedded in this network. Hence, assuming that the activity in MSTd can be explained using a single non-linearity with fixed delayed input variables may oversimplify conditions extant in a complex system. Nevertheless, our method yields consistent and plausible latency estimation under these conditions.

coMparison with other inforMation-theoretic approaches
In recent years various information-theoretic approaches have been used to analyze neuronal data (for reviews see, e.g., Rieke et al., 1997;Borst and Theunissen, 1999;Victor, 2006;Quiroga and  Only those neurons were considered where the variable was actually related to the activity of that neuron. Compared to the results of our information-based approach, the distribution of estimated latencies is much wider when using regression analysis.
individual response properties to the same variables. For this reason, the model-free approach proposed here is particularly suitable for this analysis.

acknowledgMents
This work was supported by the German Federal Ministry of Education and Research Grants 01 GQ 0448, 01 EO 0901, and National Institutes of Health Grants EY013308, RR000166.
Model-based approaches like regression analysis critically depend on the validity of model assumptions. As demonstrated here, simple approaches, such as the linear model evaluated above, are often insufficient for analyzing neuronal data.
By applying this novel technique to data from MSTd neurons, we show that they are tuned for non-linear combinations of retinal image and eye movement signals. Though latencies are quite consistent across neurons, single neurons differ remarkably in their