Edited by: Marc Zirnsak, Stanford University, USA
Reviewed by: Lawrence H. Snyder, Washington University School of Medicine, USA; Xiaomo Chen, Stanford School of Medicine, USA
*Correspondence: Adam P. Morris
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Eye movements are essential to primate vision but introduce potentially disruptive displacements of the retinal image. To maintain stable vision, the brain is thought to rely on neurons that carry both visual signals and information about the current direction of gaze in their firing rates. We have shown previously that these neurons provide an accurate representation of eye position during fixation, but whether they are updated fast enough during saccadic eye movements to support real-time vision remains controversial. Here we show that not only do these neurons carry a fast and accurate eye-position signal, but also that they support in parallel a range of time-lagged variants, including predictive and postdictive signals. We recorded extracellular activity in four areas of the macaque dorsal visual cortex during a saccade task, including the lateral and ventral intraparietal areas (LIP, VIP), and the middle temporal (MT) and medial superior temporal (MST) areas. As reported previously, neurons showed tonic eye-position-related activity during fixation. In addition, they showed a variety of transient changes in activity around the time of saccades, including relative suppression, enhancement, and pre-saccadic bursts for one saccade direction over another. We show that a hypothetical neuron that pools this rich population activity through a weighted sum can produce an output that mimics the true spatiotemporal dynamics of the eye. Further, with different pooling weights, this downstream eye position signal (EPS) could be updated well before (up to 100 ms) or after (up to 200 ms) an eye movement. The results suggest a flexible coding scheme in which downstream computations have access to past, current, and future eye positions simultaneously, providing a basis for visual stability and delay-free visually-guided behavior.
The primate visual system makes use of the exquisite sensitivity of the fovea by continually directing the eye toward new areas of interest. As a consequence of this active strategy, visual information must be combined with up-to-the-moment information about eye (and head) position to make sense of the environment (Soechting and Flanders,
The instantaneous eye position, however, is only one aspect of eye position that is of interest to active vision. Several well-known peri-saccadic phenomena would benefit from easy access to past, current, and future eye positions. For instance, before executing a saccade, the visual system could use information on the future eye position to remap visual information from neurons currently receiving input from a specific spatial location to those receiving input from that location after the saccade (Duhamel et al.,
In the one area where multiple groups have studied the peri-saccadic dynamics of eye position signals (LIP), the results have been quite contradictory. We showed that the EPS in LIP is accurate and precise during fixation (Morris et al.,
In this contribution we test the hypothesis that the dorsal visual system carries in parallel a continuum of time-lagged, continuous eye-position signals, including anticipatory, zero-lag, and delayed signals. We predicted that these EPS are represented in a distributed fashion across the neurons in areas LIP and VIP in the intraparietal sulcus, and areas MT and MST in the superior temporal sulcus. Our hypothesis is built on the idea that even if inputs that explicitly encode eye position are slow to update (Xu et al.,
To test this hypothesis, we developed a novel approach in which we construct a linear decoder whose output provides a metric representation of eye position, and is computed as a weighted sum of instantaneous firing rates in a recorded sample of neurons. The pooling weights are chosen to approximate a specific desired output (e.g., a synthetic EPS that leads the actual eye) and the performance of the decoder is quantified using an independent set of experimental trials (i.e., in cross-validation).
The main difference from the decoding approach used by Graf and Andersen (
We first analyzed how firing rates change around the time of saccades in darkness. Consistent with the results of Xu et al. (
The current study consists of a re-analysis of electrophysiological data reported previously (Morris et al.,
We recorded single-unit action potentials extracellularly using single tungsten-in-glass microelectrode penetrations through the intact dura. Recordings were made in four regions across a total of four hemispheres in two macaque monkeys, including the LIP and VIP areas of the PPC, and the middle temporal (MT) and medial superior temporal (MST) areas. LIP and VIP were recorded within the same hemisphere and MT and MST in the other, in opposite left-right configuration across the two animals. We report the results from all neurons for which we recorded at least 6 trials per experimental condition. A total of 276 neurons were analyzed, including 74 from area LIP, 107 from VIP, and 95 from areas MT and MST combined.
The animal was seated in a primate chair facing a translucent screen (60° × 60° of visual angle) in near darkness and performed an oculomotor task for liquid reward. The animal's head was stabilized using a head-post and eye position was monitored using scleral search coils. The fixation dots were small, faint light-emitting diodes back-projected onto the screen (0.5° diameter, 0.4 cd/m2). Each trial of the animal's task began with fixation on a dot for 1000 ms. The dot then stepped either rightward or downward by 10° (with equal probability), cueing the animal to perform a saccade to the new position and hold fixation for a further 1000 ms. The initial position of the fixation dot was selected pseudorandomly across trials from five possible locations arranged like the value 5 on a standard six-sided die ([
All analyses were performed in MATLAB R2014b (The MathWorks, Inc.). The raw data included timestamps for the recorded spikes for each neuron and eye position data. Spike-times within each trial were expressed relative to the onset of the ~10° amplitude primary saccade (detected offline using eye velocity criteria), and converted to instantaneous firing rates using a 50 ms wide counting window stepped in 25 ms increments from −800 ms to +800 ms. These firing rates were then averaged over trials separately for each of the 10 task conditions (five initial fixation positions and two saccade directions). The data for the five initial positions were then averaged to yield a single firing rate time course for each of the two saccade directions (rightward and downward) for each neuron. For each cortical area, the time courses for the two saccade directions were compiled into matrices,
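The binning step described above can be sketched as follows. This is a minimal Python/NumPy analog of the original MATLAB analysis; the function name and the spike-time input format are our assumptions.

```python
import numpy as np

def spike_rates(spike_times_s, win=0.050, step=0.025, t_min=-0.8, t_max=0.8):
    """Convert saccade-aligned spike times (in seconds) to instantaneous
    firing rates (spikes/s) using a 50 ms counting window stepped in
    25 ms increments from -800 ms to +800 ms around saccade onset."""
    centers = np.arange(t_min, t_max + 1e-9, step)
    rates = np.empty_like(centers)
    for i, c in enumerate(centers):
        lo, hi = c - win / 2.0, c + win / 2.0
        rates[i] = np.sum((spike_times_s >= lo) & (spike_times_s < hi)) / win
    return centers, rates
```

Averaging these per-trial rate vectors within each of the 10 task conditions, and then over the five initial positions, yields one time course per saccade direction per neuron, as described above.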
We used principal component analysis (PCA) to investigate whether a small number of typical firing rate modulation patterns could capture the observed neural dynamics at the time of saccades. As we were particularly interested in saccade-direction-specific dynamics, we first subtracted the response time course for rightward saccades (
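The PCA itself amounts to mean-centering the direction-specific difference time courses and taking their singular value decomposition. A minimal sketch (Python/NumPy rather than the original MATLAB; the array layout is our assumption):

```python
import numpy as np

def pca_components(diff_rates, n_components=3):
    """PCA on saccade-direction-specific time courses.

    diff_rates: (n_neurons, n_timepoints) array of rightward-minus-downward
    firing-rate differences. Returns the first n_components principal
    component time courses and the fraction of variance each explains."""
    X = diff_rates - diff_rates.mean(axis=0, keepdims=True)  # center each time point
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    var_frac = s ** 2 / np.sum(s ** 2)
    return Vt[:n_components], var_frac[:n_components]
```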
The aim of our main analysis was to determine whether the activity of the recorded neurons could be combined into a pair of output variables,
We use matrix notation to link these equations to the data. Using the horizontal channel as an example, we modeled the relationship between the eye's true position and firing rates as:
The design matrix,
Equation 2 is therefore a set of linear equations with unknown parameters β
To estimate β
After estimating β
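The weight estimation and cross-validated readout can be sketched as an ordinary least-squares problem (Python/NumPy rather than the original MATLAB; function names are ours, and the original fitting details may differ):

```python
import numpy as np

def fit_decoder(R_train, eye_train):
    """Estimate pooling weights beta such that eye ≈ R @ beta, where R is
    an (n_samples, n_neurons) matrix of firing rates and eye is the target
    horizontal (or vertical) eye position at each sample."""
    beta, *_ = np.linalg.lstsq(R_train, eye_train, rcond=None)
    return beta

def decode(R_test, beta):
    """Read out the eye-position signal from held-out firing rates."""
    return R_test @ beta

def r_squared(y, y_hat):
    """Coefficient of determination for cross-validated performance."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```

Fitting on one set of trials and evaluating `r_squared` on an independent set implements the cross-validation described above.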
To determine whether a neural population could support predictive or delayed representations of the actual eye movement, the general linear model analysis was repeated using a range of time-lagged eye signals as the target variable for the regression. Specifically, the mean (μ) of the cumulative Gaussian used to model eye position (
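Constructing these time-lagged regression targets amounts to shifting the mean of a cumulative Gaussian. A sketch under assumed parameter values (the transition width `sigma_ms` and the 10° step are illustrative; the original targets were based on the fitted eye-movement profile):

```python
import numpy as np
from math import erf

_verf = np.vectorize(erf)  # elementwise error function (no SciPy needed)

def lagged_target(t_ms, lag_ms, start_pos=0.0, end_pos=10.0, sigma_ms=15.0):
    """Synthetic eye-position target: a cumulative Gaussian stepping from
    start_pos to end_pos (deg). Shifting the mean by lag_ms yields targets
    that lead (negative lag) or trail (positive lag) the actual eye."""
    z = (t_ms - lag_ms) / (sigma_ms * np.sqrt(2.0))
    return start_pos + (end_pos - start_pos) * 0.5 * (1.0 + _verf(z))
```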
The linear readout typically generated sigmoid-like representations of eye position over time. If the output were perfect, the times of these sigmoidal transitions would have matched those of the target signals—that is, the “achieved lag” would match the target lag. To quantify the achieved lag for each target lag condition, the mean outputs of the
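Recovering the achieved lag from a decoded trace can be sketched as estimating the transition time of a cumulative Gaussian; here a simple grid search stands in for the original curve-fitting procedure, and the template parameters are illustrative:

```python
import numpy as np
from math import erf

_verf = np.vectorize(erf)  # elementwise error function (no SciPy needed)

def cum_gauss(t_ms, mu, sigma=15.0, lo=0.0, hi=10.0):
    """Cumulative-Gaussian eye-position profile with transition time mu."""
    z = (t_ms - mu) / (sigma * np.sqrt(2.0))
    return lo + (hi - lo) * 0.5 * (1.0 + _verf(z))

def achieved_lag(t_ms, decoded):
    """Achieved lag of a decoded trace: the transition time mu that
    minimizes the squared error to the cumulative-Gaussian template
    (grid search over mu; the original analysis used curve fitting)."""
    mus = np.arange(-400.0, 401.0, 5.0)
    errs = [np.sum((decoded - cum_gauss(t_ms, mu)) ** 2) for mu in mus]
    return mus[int(np.argmin(errs))]
```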
To analyze how much each recorded neuron contributed to the decoding performance, we first determined, for each target lag, and separately for
We defined a neuron's “pooling weight” as the average of its absolute weight values to
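The truncated definitions above suggest per-neuron measures of roughly the following form. This is a hypothetical reconstruction: averaging absolute weights across the two output channels and normalizing to percentages, plus the max-minus-min-over-mean variation index described in the Results.

```python
import numpy as np

def pooling_weights(beta_x, beta_y):
    """Per-neuron contribution: mean absolute weight across the two output
    channels, expressed as a percentage of the population total (the
    normalization here is our assumption)."""
    w = 0.5 * (np.abs(beta_x) + np.abs(beta_y))
    return 100.0 * w / np.sum(w)

def lag_variation(contrib_by_lag):
    """Range of each neuron's contribution across target lags, expressed
    as a percentage of its mean contribution over lags.
    contrib_by_lag: (n_lags, n_neurons) array."""
    c = np.asarray(contrib_by_lag)
    return 100.0 * (c.max(axis=0) - c.min(axis=0)) / c.mean(axis=0)
```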
We recorded the spiking activity of neurons in macaque areas LIP, VIP, MT, and MST during an oculomotor task consisting of rightward and downward saccades. We have shown previously for this data-set that many neurons in all four areas show tonic changes in firing rate across changes in eye position (Morris et al.,
To illustrate the diversity of peri-saccadic dynamics across neurons, we performed a PCA on the saccade-direction-specific components of the firing rate (see Materials and Methods), separately for each cortical region (MT and MST neurons were pooled). The first three principal components for each cortical area are shown in Figure
The components revealed complex dynamics underlying the firing rates of these neurons. The first component, for example, consisted of a broad enhancement or reduction of activity for all three cortical regions. Because we analyzed the saccade direction-specific time courses, this corresponds to an enhancement/reduction for one saccade direction relative to the other (and is therefore complementary to the general changes in peri-saccadic firing rate discussed in Bremmer et al.,
The PCA serves mainly to illustrate the richness of the response time courses in the population. Next we turn to our population decoding approach, in which such richness is not a sign of “inconsistent behavior” (cf. Xu et al.,
We used a linear decoding approach to reveal information about eye position and eye movements in the recorded neural data (Figure
Each über-neuron took a weighted sum of the recorded neural activity from a given cortical area. In a first analysis, the weights were optimized such that the predicted eye positions represented by the output—
Figure
Coefficients of determination (
The results presented in Figure
We constructed a range of synthetic eye-position signals, each with a different “target lag,” defined as the time interval between the sigmoid step of the regression target and that of the actual eye. A unique set of pooling weights was estimated for the output variables (
Figures
There were, however, limits to the neurons' ability to represent time-lagged eye-position signals. Target signals that were updated more than 100 ms before or 200 ms after the saccade were fit poorly by the linear read-out model. In those cases, the outputs either drifted slowly toward the post-saccadic eye position (e.g., for target lags > 300 ms), or showed a step-like transition at a time that did not match that of the target signal. Figure
To quantify the dynamics of the decoders, we fit cumulative Gaussians to the predicted time courses (shown by the green curves in Figures
Figure
The generally poor performance of the decoders for target signals that led the eye by 300 ms or more was expected. After all, the direction of the impending saccade could not have been known to the neurons earlier than ~213 ms (the average saccade latency) before saccade onset, when the fixation point was displaced. Moreover, the decoders used fixed weights for
We therefore asked at what time
In sum, neurons in the cortical regions we examined supported a continuum of accurate, time-shifted representations of eye movements, including signals that led the eye by as much as 100 ms and lagged by up to 200 ms. Target signals that were updated outside of these bounds were approximated poorly by the über-neurons, reflecting the limits of peri-saccadic information about future and past eye positions.
In principle, an über-neuron could match a target signal by assigning approximately equal weights to all neurons, at one extreme, or high weights to one or a small number of neurons and near-zero to the rest, at the other. We examined the sparseness of the decoded representations by evaluating the contribution of each recorded neuron to the über-neurons for each decoder (see Materials and Methods). The analysis was performed only for decoders that provided an adequate fit to the target signal (defined as a total
Figure
To explore this further, we calculated how much the contribution of each neuron varied across the different lags. Specifically, we calculated the difference between each neuron's maximum and minimum contribution and expressed this difference as a percentage of its mean contribution over lags. Figure
Our analysis shows that neurons in extrastriate cortex and the PPC carry a continuum of time-lagged representations of eye position, including predictive, zero-lag, and postdictive signals. These flexible signals were found in all the regions we studied, including those that are not directly involved in saccade planning (VIP, MT/MST). The representations were not superficially evident in the firing rates of single neurons but were manifest only when population activity was read out appropriately by artificial downstream neurons (“über-neurons”). With different synaptic weightings, über-neurons carried EPS that shifted toward the new fixation position in sync with the actual eye, led it by up to 100 ms, or lagged behind it by up to 200 ms. These peri-saccadic limits on accurate time-lagged EPS align well with the typical duration of the intersaccadic interval during normal vision (≈300 ms, Ballard et al.,
The 100 ms limitation on the predictive effect is likely determined at least in part by the nature of the instructed saccade paradigm used in this study; given the typical visual latencies in these areas, information about the impending saccade direction could not have been available much earlier. The post-saccadic limit of ~200 ms, however, was not constrained by our experimental paradigm and reflects the fading neural memory of past eye positions in the neural population (or at least that available to a linear decoder).
In light of our results, one might consider it curious that there are (to our knowledge) no empirical accounts of cortical neurons that exhibit zero-lag or predictive dynamics like our über-neurons. Indeed, Xu et al. (
In our view, single neurons are unlikely to be devoted exclusively to the purpose of representing eye position. Therefore, a search for über-neurons with dynamics like those reported here would likely be fruitless. Neurons multiplex a large variety of signals in their firing rates and participate in a multitude of functions simultaneously. LIP firing rates, for example, are influenced by visual (Colby et al.,
Nevertheless, we find the concept of a linear über-neuron appealing because it shows that the decoding we perform is not complex and could be achieved in a monosynaptic computation in the brain. Further, it provides a convenient way to visualize the EPS, even though the brain might access these high-dimensional population codes in smarter ways than we can currently imagine. EPS are ubiquitous throughout cortex (V1: Trotter and Celebrini,
In some respects, our conclusions mirror those of a recent study that used probabilistic population decoding to examine eye-position signals in LIP (Graf and Andersen,
First, their decoders chose among a coarse experimental grid of possible eye positions (i.e., the decoder was a classifier), whereas ours estimated eye position as a continuous variable between a start and end position (i.e., it has a metric). The constraints of workable experimental designs result in complex conditional probabilities between eye movement parameters (e.g., the rightmost eye positions in a grid can only be reached by rightward saccades). A classifier can exploit these contingencies to achieve above-chance decoding performance, even though they are unlikely to be useful in real life. A metric decoder is more in line with the type of signal the brain requires for spatial processing, and also has the practical advantage that one can study systematic errors on a fine spatial scale (e.g., Morris et al.,
Second, Graf and Andersen constructed a new decoder for each time window relative to saccade onset. For instance, to decode future eye position, they constructed different Bayesian classifiers based on firing rates in time windows before, during, and after the saccade. This approach accurately quantifies the information available in each of these epochs, but it is not clear how the brain might implement this kind of read-out in which the decoder changes over time. Xu et al. (
Third, Graf and Andersen employed a memory-saccade paradigm, whereas our animals made immediate saccades to visual cues. The memory-saccade paradigm has the advantage that it can dissociate saccade planning from saccade execution and visual signals. Indeed, we cannot rule out the possibility that at least some of the predictive information about future eye positions in our study is derived from visually-evoked activity. Visual influences, however, would be maximal immediately after the onset of the target (i.e., ~130 ms before saccade onset, based on the mean visual latency of 80 ms); and yet, these neurons supported predictive signals that remained stable at this time and were only updated later (e.g., the −100 ms lag reported here, as well as any lag between −100 ms and 0 [data not shown]). This suggests that visual influences are unlikely to account for the predictive eye-position signals reported here. Applying our approach to experiments with stable visual displays or memory-guided saccades is needed to fully resolve this question.
In Graf and Andersen's study, the future eye position could be decoded reliably from the moment the target had been specified, even though the saccade itself did not occur until more than half a second later. This suggests that the decoder was able to use the current eye position and direction-selective planning activity to infer the future eye position (similar to modeling studies: Ziesche and Hamker,
A fourth difference between our study and that of Graf and Andersen is that their decoder provided estimates of
Finally, Graf and Andersen observed that most of the eye position information used by the decoder was obtained from a small subset of neurons (~20), whereas our über-neurons pooled broadly across neurons. This apparent discrepancy perhaps reflects the different emphasis in the studies. Our decoder assigns weights to neurons for their ability to provide consistent information on the full spatiotemporal dynamics of a saccade. This constraint resulted in a distributed code. Graf and Andersen, however, constructed a new classifier in each 250 ms time window; it would seem that a small subset of neurons (i.e., a sparse code) carries most of the information in each of these windows.
In Morris et al. (
Although it was not framed as such, that analysis was in some respects a rudimentary version of the population decoding reported here; that is, neurons in the two sub-populations received binary weights of 1 and −1 (though these weights had to be re-assigned for every saccade direction, unlike in the current study). Our current findings show that given the freedom to choose graded weights, a decoder can generate a near-veridical representation of the eye, as well as a range of time-lagged variants.
This raises the question why the perceptual system makes errors in the context of this well-established experimental paradigm. One explanation is that even if eye position information is veridical, information on the location of the stimulus on the retina may not be (Krekelberg et al.,
We tested this idea informally by omitting the peri-saccadic epoch (i.e., 100 ms either side of the saccade) during the estimation of weights for each neuron, such that the decoder is optimized for representing eye position during fixation. We then predicted the EPS for all times, including the peri-saccadic window. As expected, this EPS was clearly damped, similar to the one we reported previously, and qualitatively consistent with peri-saccadic mislocalization. Of course, there are likely other factors at play, such as uncertainty regarding the temporal onset of the visual flash (Boucher et al.,
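The informal test described above can be sketched as follows (Python/NumPy; the names are ours, and the ±100 ms exclusion window follows the text):

```python
import numpy as np

def fit_fixation_only(R, eye, t_ms, excl_ms=100.0):
    """Estimate pooling weights using only samples recorded outside
    +/-excl_ms of saccade onset, then decode the full time course,
    including the excluded peri-saccadic window.
    R: (n_samples, n_neurons) firing rates; eye: (n_samples,) eye
    position; t_ms: sample times relative to saccade onset."""
    keep = np.abs(t_ms) > excl_ms
    beta, *_ = np.linalg.lstsq(R[keep], eye[keep], rcond=None)
    return R @ beta  # EPS predicted for all times
```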
Although perhaps counter-intuitive, a perfect, zero-lag representation of eye position may in general be no more useful to the brain than a range of other time-lagged signals, such as those that are updated before or after eye movement. In the standard gain-field model of spatial processing, the co-existence of eye-centered visual information and eye-position signals gives rise to an implicit, head-centered representation of visual space (Andersen et al.,
The required delay, however, would vary across cortical areas in accordance with differences in visual latencies. Although this variation is fairly modest in traditional measurements during fixation (tens of milliseconds, Schmolesky et al.,
Taken together, these considerations suggest that a single, global EPS might be insufficient to support stable vision. Our results show that through appropriate synaptic weighting, an EPS can be tailor-made for a given neuron or population to ensure that it is notified of changes in eye position only at the suitable time. That is, the cortex could be furnished with an essentially infinite number of different EPS, all achieved through unique pooling of signals. Local computations, therefore, could incorporate information about past, current, and future eye positions simultaneously. This could allow, for example, self-induced changes in sensory representation to be dealt with differently to those caused by true changes in the outside world (Crapse and Sommer,
Remarkably, our analysis of pooling weights suggests that profoundly different time courses can be achieved through modest local adjustments (20–40% on average) to a coarse and universal weighting template. To an extent, this is not surprising, given that the target signals used here differed only in the timing of the saccade representation and had in common the extensive fixation intervals (the correlation between pairs of target signals was between 0.35 and 0.91; mean = 0.68). Nevertheless, the profound global effects of such subtle changes to the network provide a striking example of the powerful yet obscure nature of distributed population codes. Further, it emphasizes the need for population-level analysis techniques to unmask the underlying representations.
AM, FB, and BK designed and performed the research, and wrote the paper; AM and BK contributed analytic tools and analyzed the data.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This work was supported by the National Health and Medical Research Council of Australia (AM; APP1083898). The contents of the published material are solely the responsibility of the Administering Institution, a Participating Institution, or individual authors and do not reflect the views of the NHMRC. Additional support came from the Eye Institute of the National Institutes of Health, USA (R01EY017605) and the Deutsche Forschungsgemeinschaft (CRC/TRR-135/A1).