Time Course of Target Recognition in Visual Search

Visual search is a ubiquitous task of great importance: it allows us to quickly find the objects that we are looking for. During active search for an object (target), eye movements are made to different parts of the scene. Fixation locations are chosen based on a combination of information about the target and the visual input. At the end of a successful search, the eyes typically fixate on the target. But does this imply that target identification occurs while looking at it? The duration of a typical fixation (∼170 ms) and neuronal latencies of both the oculomotor system and the visual stream indicate that there might not be enough time to do so. Previous studies have suggested the following solution to this dilemma: the target is identified extrafoveally and this event will trigger a saccade towards the target location. However, this has not been experimentally verified. Here we test the hypothesis that subjects recognize the target before they look at it using a search display of oriented colored bars. Using a gaze-contingent real-time technique, we prematurely stopped search shortly after subjects fixated the target. Afterwards, we asked subjects to identify the target location. We find that subjects can identify the target location even when fixating on the target for less than 10 ms. Longer fixations on the target do not increase detection performance but increase confidence. In contrast, subjects cannot perform this task if they are not allowed to move their eyes. Thus, information about the target during conjunction search for colored oriented bars can, in some circumstances, be acquired at least one fixation ahead of reaching the target. The final fixation serves to increase confidence rather than performance, illustrating a distinct role of the final fixation for the subjective judgment of confidence rather than accuracy.


INTRODUCTION
When searching for a known target in a visual scene, eye movements are guided by a combination of retinal input and information about the target stored in working memory. Depending on the task, the same scene can evoke very different scan paths. During free viewing, the most salient locations are preferentially fixated (Parkhurst et al., 2002; Peters et al., 2005; Mannan et al., 2009). When looking for a particular target, however, this pattern changes: locations that share features with the target are preferentially fixated (Williams, 1966; Yarbus, 1967; Zohary and Hochstein, 1989; Wolfe, 1994; Findlay, 1997; Motter and Belky, 1998; Bichot and Schall, 1999; Hooge and Erkelens, 1999; Beutter et al., 2003; Najemnik and Geisler, 2005; Navalpakkam and Itti, 2005; Einhauser et al., 2006; Rajashekar et al., 2006; Ludwig et al., 2007; Rutishauser and Koch, 2007; Tavassoli et al., 2007). That is, stimuli are fixated because of their behavioral relevance rather than their saliency. The more difficult the search, the longer it takes and the more fixations are required (Binello et al., 1995; Williams et al., 1997; Zelinsky and Sheinberg, 1997; Scialfa and Joffe, 1998). Throughout search, two decisions need to be made: where to next move the eyes (planning) and detecting the target. Planning has been extensively studied (Motter and Belky, 1998; Caspi et al., 2004; Najemnik and Geisler, 2005; Rutishauser and Koch, 2007; Zelinsky, 2008). Where to saccade next is largely determined afresh at every fixation with little carry-over of information from the last fixation (Wolfe, 1994; Findlay et al., 2001), that gradual accumulation is likely, this has not been conclusively demonstrated experimentally for two-feature (color, orientation) conjunction search.
One fundamental constraint on the speed of target recognition is imposed by the time required for information to arrive at the appropriate areas of the brain. The human visual system can detect the presence or absence of complex objects within a very short time (Potter, 1976; Thorpe et al., 1996). Stimulus-specific responses measured with surface EEG take at least 150 ms to emerge (Thorpe et al., 1996). The frontal eye fields (FEF) are known to be crucial for initiating voluntary eye movements. In macaque monkeys, the earliest single-neuron responses in FEF emerge after 75 ms. These very early responses are, however, neither stimulus nor response selective (Schmolesky et al., 1998). On the motor side, it takes at least 140 ms to stop the execution of a pre-planned eye movement in humans and monkeys (stop signal reaction time; Hanes and Carpenter, 1999; Emeric et al., 2007). However, in our experiment, the average fixation duration is only 170 ± 70 ms. It is thus conceivable that this is not enough time to detect a target and stop the search before the next saccade is executed. Here, we test this hypothesis.
We use a novel gaze-contingent (Perry and Geisler, 2002; Geisler et al., 2006) experimental paradigm to terminate search with millisecond accuracy after the eyes first come close to the target. We show that subjects' accuracy in detecting the target is high and does not depend on dwell time on the target, even for times as short as 10 ms after landing on the target. Supporting earlier arguments directly, we find that information about the target is acquired at least one fixation ahead. Further, we show that subjects nevertheless choose to fixate the target in order to increase subjective confidence.

MATERIALS AND METHODS
Twenty-four subjects were paid for participating in the experiment. All had normal or corrected-to-normal vision and none were aware of the purpose of the experiment. The experiments were approved by the Caltech Institutional Review Board, and all subjects gave written informed consent. All subjects were tested for red-green color deficiency using 24 color plates (Ishihara, 2004). One subject had to be excluded due to color blindness (not included in the number above).

TASKS - SEARCH ARRAY
We created the search arrays by placing 49 items on a 7 × 7 grid with 3.25° and 2.25° spacing in the x and y directions, respectively (Figure 1). Uniformly distributed position noise of ±1.00° and ±0.50° was added to each grid position (x and y directions, respectively). We then rearranged the items so they would fit inside an imaginary 4 × 3 grid (4 columns and 3 rows; see Figure S1 in Supplementary Material). This imaginary grid was used for deciding whether to report a trial as correct or incorrect to the subject. We used this grid instead of the original 7 × 7 grid to decrease the accuracy necessary for correct target localization. In addition, we used this grid after the experiment to calculate the chance performance of localizing a target correctly. The resulting average distance to the closest neighbor was 2.13°, while the minimal and maximal distances were 2.10° and 2.38°, respectively.
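The jittered grid described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Matlab code: the function name `make_grid_positions`, the centering of the grid on the screen, and the use of a seeded generator are assumptions, and the subsequent rearrangement into the imaginary 4 × 3 grid is not modeled.

```python
import random


def make_grid_positions(n_cols=7, n_rows=7, dx=3.25, dy=2.25,
                        jitter_x=1.00, jitter_y=0.50, seed=None):
    """Return (x, y) item positions in degrees of visual angle.

    Assumption: the grid is centered on the screen center (0, 0).
    """
    rng = random.Random(seed)
    positions = []
    for row in range(n_rows):
        for col in range(n_cols):
            # Nominal grid position, centered on (0, 0).
            x = (col - (n_cols - 1) / 2) * dx
            y = (row - (n_rows - 1) / 2) * dy
            # Uniformly distributed position noise in each direction.
            x += rng.uniform(-jitter_x, jitter_x)
            y += rng.uniform(-jitter_y, jitter_y)
            positions.append((x, y))
    return positions


positions = make_grid_positions(seed=0)
```

With these values the items span at most ±10.75° horizontally and ±7.25° vertically, which fits inside the 25° × 20° display reported for the setup.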
There were four different search item types (e.g. all combinations of red/green and horizontal/vertical). Three out of those four unique item types were present in a particular search array.
The distractors were chosen such that half of them shared the first feature dimension with the target while the other half shared the second feature dimension (e.g. green/horizontal and red/vertical). Each search display consisted of 48 distractors and one target. The item size was 0.50° × 0.25° or vice versa.
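As a hedged illustration of this conjunction-search composition, the following Python sketch builds the item list for one display. It assumes 49 items in total (the 7 × 7 grid, i.e. one target plus 48 distractors); the names `make_items`, `COLORS` and `ORIENTATIONS` are hypothetical and not taken from the original software.

```python
import random

COLORS = ("red", "green")
ORIENTATIONS = ("horizontal", "vertical")


def make_items(target, n_items=49, seed=None):
    """Return a shuffled list of (color, orientation) items.

    Half of the distractors share the target's color, the other half
    share its orientation; the fourth item type is never shown.
    """
    rng = random.Random(seed)
    color, orientation = target
    other_color = COLORS[1 - COLORS.index(color)]
    other_orientation = ORIENTATIONS[1 - ORIENTATIONS.index(orientation)]
    n_distractors = n_items - 1
    half = n_distractors // 2
    items = [target]
    items += [(color, other_orientation)] * half                    # share color
    items += [(other_color, orientation)] * (n_distractors - half)  # share orientation
    rng.shuffle(items)
    return items


items = make_items(("red", "horizontal"), seed=1)
```

Because the item sharing neither feature with the target is left out, exactly three of the four item types appear in any one array, as stated above.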

TASKS - PSYCHOPHYSICS
The screens shown to the subject are illustrated in Figure 2. At the beginning of each trial a blank screen was displayed for 1 s, followed by a white fixation cross at the center of the screen (400 ms display time). At the center of the next screen the target was presented for 1 s. To ensure that subjects started the search at the center of the screen, we subsequently presented a second fixation cross which subjects had to fixate for 400 ms (within a 1.5° radius) to start the trial. If subjects failed to do so, recalibration was started automatically.
Depending on the experiment (see below), the search display (49 colored oriented bars) was present for a period of time between 20 ms and 25 s. Subjects were free to move their eyes (except during experiment 3, where subjects were required to maintain fixation within 1.5° of the center of the screen) and were instructed to find the target as quickly as possible. A trial was terminated either if subjects fixated within an area of 1.5° around the target for at least 400 ms (experiment 1 and 2) or if the trial timed out (whichever was first). The maximum time allowed for each trial was pre-determined before the start of the trial (range: 20 ms to 25 s, depending on the experiment; see below). If, during the maximal time allowed, subjects failed to identify the target by fixating on it for at least 400 ms, the trial was terminated regardless of the subjects' behavior. No manual interaction was required to terminate the trial except in the button press experiment. The mask consisted of 800 randomly positioned red and green rectangles. It was shown for 100 ms to erase any retinal or iconic memory representation of the search display (Breitmeyer, 1984; Yantis and Jonides, 1996; Enns and Di Lollo, 2000). After the mask, an instruction screen was shown for 1 s followed by a fixation screen (white cross, 400 ms). Subjects were required to keep their gaze inside an area of 1.5° of the cross for the trial to continue. Trials where subjects failed to do so were discarded. This is to ensure that subjects did not keep their gaze at their last fixation (which might be the true location of the target). Afterwards, the search screen was shown again with all items replaced by black crosses. Subjects were asked to look at the location where they thought the target was. To indicate their choice, subjects needed to keep their gaze constant inside an area of 1.5° radius for 600 ms.
The next screen asked subjects to indicate their level of confidence (confident, maybe, guessing) for this choice by button press. At the end, subjects received visual feedback (text, displayed for 500 ms) about whether their answer was correct or not.
A trial was considered correct (for the purpose of feedback) if the indicated location was within a 2.50° radius centered on the target. We used this coarse criterion to motivate subjects during the experiments but used a stricter rule for data analysis (see below).
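The dwell-based termination rule described above (gaze within 1.5° of the target for at least 400 ms) can be illustrated with a minimal Python sketch. It is an assumption-laden simplification: it operates on raw gaze samples, whereas the experiment relied on the eye tracker's built-in fixation detection, and the function name `first_dwell_termination` is hypothetical.

```python
import math


def first_dwell_termination(samples, target, radius=1.5, dwell_ms=400):
    """Return the time (ms) at which gaze has stayed within `radius`
    degrees of `target` for `dwell_ms`, or None if that never happens.

    samples: time-ordered list of (t_ms, x_deg, y_deg) gaze samples.
    """
    entered = None  # time at which gaze entered the target region
    for t, x, y in samples:
        if math.hypot(x - target[0], y - target[1]) <= radius:
            if entered is None:
                entered = t
            if t - entered >= dwell_ms:
                return t
        else:
            entered = None  # gaze left the region, restart the dwell clock
    return None


# Example at 1 kHz: gaze far from the target for 100 ms, then on target.
samples = [(t, 5.0, 5.0) for t in range(100)]
samples += [(t, 0.2, 0.1) for t in range(100, 700)]
stop_time = first_dwell_termination(samples, target=(0.0, 0.0))
```

The same function with `dwell_ms=600` corresponds to the response screen, where subjects had to hold their gaze for 600 ms to submit a choice.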
Each subject performed only a single experiment type, which consisted of 8-10 blocks of 32 trials each. Prior to the experiment, subjects were given 20 practice trials (excluded from data analysis).

TASKS - EXPERIMENTS
We performed three different experiments. Each subject only participated in one of the three. In experiment 1, subjects were instructed to find the target as quickly as possible and to indicate their choice by fixating on it. There were two categories of trials: normal and early terminated. In normal trials, subjects were allowed up to 25 s to find the target. In early termination trials, the search screen was removed before the subject was able to find the target. Early terminated trials would either terminate while the subject was fixating the target (early termination on final fixation) or randomly through search (temporal early termination).
FIGURE 2 | [...] The target (here a red horizontal bar) was shown at the center of the screen for 1 s. (D) A fixation screen (white cross, 400 ms) was shown to ensure that subjects always started the search at the center of the screen (fixation enforced). (E) The search screen consisted of 49 items and was shown for a random amount of time (<25 s), sufficient for subjects to find the target in about 50% of all trials. (F) Immediately after search screen offset the mask was shown for 100 ms. (G) An instruction screen (1 s) reminded subjects to indicate the target location. (H) Fixation screen (white cross, 400 ms, fixation enforced) to ensure that subjects did not keep their gaze at the last fixation. (I) The target location screen. A black cross is shown at each point where an item was displayed in the search display. Subjects were instructed to fixate the cross corresponding to the target location for 600 ms in order to submit their choice. Subjects were then asked to indicate their confidence on a scale of 1 (guessing), 2 (maybe) or 3 (highly confident) by button press. Afterwards, feedback was given as to whether the indicated target location was correct (not shown in this figure). Screens shown are not drawn to scale.
We balanced the number of early terminated and normal trials so that in approximately 50% of all trials the duration was long enough for the subject to find the target (normal termination). We chose the next trial duration adaptively by sampling from a log-normal distribution that was generated by taking into account previous trial durations as well as their outcome (correct/incorrect). Subjects did not know whether a trial terminated because they fixated on the target (for 400 ms) or whether it was early terminated by the computer. In experiment 2 ("button press"), subjects were instructed to find the target as quickly as possible and to press a button as soon as they knew where it was. In experiment 2, the trial timeout was always 25 s. In both experiment 1 and 2, subjects were free to move their eyes during search. In experiment 3 ("fixation control"), the search screen was shown for a short amount of time (20-600 ms). Subjects had to maintain fixation within 1.5° of the center of the screen while searching for the target. In all three experiments, the same procedure followed after the end of a trial (mask, fixation, target location indication, confidence indication).
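The text does not spell out how the log-normal distribution was constructed, so the following Python sketch shows only one plausible scheme, stated as an assumption: fit a log-normal to the search times of previously completed trials and draw the next timeout from that same fitted distribution. If search times really followed that distribution, a timeout drawn from it would exceed the next search time in about half of all trials. The function name `next_trial_duration` and the fallback spread of 0.5 are hypothetical.

```python
import math
import random
import statistics


def next_trial_duration(found_times_s, rng, t_max=25.0, default_sigma=0.5):
    """Draw the next trial timeout (seconds) from a log-normal distribution
    fitted to the times at which the target was found in earlier trials.

    Assumption: only successfully completed trials enter the fit.
    """
    logs = [math.log(t) for t in found_times_s]
    mu = statistics.mean(logs)
    # Fall back to a hypothetical default spread with fewer than two trials.
    sigma = statistics.stdev(logs) if len(logs) > 1 else default_sigma
    duration = rng.lognormvariate(mu, sigma)
    return min(duration, t_max)  # never exceed the 25 s maximum


rng = random.Random(0)
history = [0.6, 0.9, 1.2, 0.8]  # illustrative search times in seconds
draws = [next_trial_duration(history, rng) for _ in range(2000)]
```

About half of the sampled timeouts fall above the median of the fitted distribution (the geometric mean of the history), which is the property that balances normal and early terminated trials under this assumption.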
See Supplementary Material for written instructions given to subjects.

EQUIPMENT - EYE TRACKING
Throughout all experiments we recorded subjects' right eye positions using a non-invasive infrared Eyelink-1000 (SR Research Ltd., Osgoode, ON, Canada) eye tracker with a sampling rate of 1,000 Hz. We used the manufacturer's software for calibration and validation (9-point calibration grid). The average radial resolution was 0.39° (the resolution of the eye tracker itself is 0.01°). Fixations were detected using the built-in fixation detection mechanism. We used the system's "real time" data acquisition mechanism which allowed us to react to eye movements with a delay of 2 ms. We confirmed this delay time by randomly sending timestamps during the experiments. Figure 1 shows an example of a scan path.

EQUIPMENT - SOFTWARE AND SCREEN
All experiments were implemented with Matlab (MathWorks, Natick, MA, USA) and the psychophysics toolbox version 3 (Brainard, 1997; Pelli, 1997) running on a Windows PC and a 19-in. CRT monitor (Dell Inc., Round Rock, TX, USA), which was located 80 cm in front of the subject. The maximal luminance (white screen) of the presentation screen was 29 cd/m². Maximal luminance for the green and red channels was 9.6 cd/m² and 6.1 cd/m², respectively. Ambient light levels were below 0.01 cd/m². The background of the screen was set to a light gray (14 cd/m²) in order to reduce contrast. The display size was 25° × 20°. Subjects' heads were stabilized using a chin rest and a forehead rest to avoid head movement. The bit values used for red, green and gray were 255, 255 and 212, respectively. Red and green were not isoluminant. All experiments were run with a vertical screen refresh rate of 120 Hz; hence the refresh interval was roughly 8 ms.
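As a quick consistency check on these numbers, the following Python sketch converts the display size from degrees of visual angle to centimeters at the 80 cm viewing distance and computes the refresh interval. The helper name `deg_to_cm` is hypothetical, and the conversion assumes a flat screen centered on the line of sight.

```python
import math

VIEW_DISTANCE_CM = 80.0  # viewing distance reported in the text
REFRESH_HZ = 120.0       # vertical refresh rate reported in the text


def deg_to_cm(angle_deg, distance_cm=VIEW_DISTANCE_CM):
    """Extent on a flat screen that subtends `angle_deg`, assuming the
    extent is centered on the line of sight."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


width_cm = deg_to_cm(25.0)
height_cm = deg_to_cm(20.0)
refresh_interval_ms = 1000.0 / REFRESH_HZ
```

Under these assumptions the display is roughly 35 cm × 28 cm, which is plausible for the viewable area of a 19-in. CRT, and the refresh interval is about 8.3 ms, in line with the "roughly 8 ms" quoted above.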

DATA ANALYSIS - INCLUDED TRIALS
We classified trials as correct if subjects indicated the correct target location (on the screen with crosses) with an accuracy of at least 1.5°. During the experiment, a trial was reported as correct to the subject if accuracy was at least 2.5°, so as not to discourage subjects.
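The two correctness criteria can be made explicit with a small Python sketch. The function and constant names are hypothetical; only the two radii (1.5° for analysis, 2.5° for feedback) come from the text.

```python
import math

ANALYSIS_RADIUS_DEG = 1.5  # strict criterion used for data analysis
FEEDBACK_RADIUS_DEG = 2.5  # lenient criterion used for on-screen feedback


def score_response(response, target):
    """Classify an indicated location (degrees) against the target location."""
    error = math.hypot(response[0] - target[0], response[1] - target[1])
    return {
        "error_deg": error,
        "correct_analysis": error <= ANALYSIS_RADIUS_DEG,
        "correct_feedback": error <= FEEDBACK_RADIUS_DEG,
    }


# A response 2 degrees off is reported as correct to the subject
# but counted as incorrect in the analysis.
result = score_response(response=(2.0, 0.0), target=(0.0, 0.0))
```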
Results are reported (unless noted otherwise) as the mean over subjects and the standard error over the number of subjects. In case a subject contributed only 1 trial to the current data bin, this subject was not included in the analysis for this particular bin.
Six percent of all trials were excluded from analysis. These trials either timed out or were skipped because the target was too close to the center of the screen, because subjects moved their eyes when they were not supposed to, or because eye movements fell outside of the screen. We also excluded 38.2% of all fixation control trials (experiment 3) because we were only interested in trials where the search screen was shown for less than 400 ms.
The degree of difficulty for the search task was quantified by the number of fixations to find the target. All fixations between stimulus onset and mask onset were counted. We find that the average fixation duration, saccade duration and saccade size are quite stereotypic for all subjects (170 ms ± 70 ms, 44.3 ms ± 4.8 ms and 5.78° ± 0.85°, all ± SD). Consequently, the number of fixations to find the target is proportional to the time it takes to find the target.

EXACT TIMING OF STIMULUS ONSET AND EYE MOVEMENTS
We developed a method to match the actual stimulus presentation duration (taking into account that a typical trial will terminate while the screen is in the middle of its refresh cycle, ∼4 ms) with the eye movements. The resulting uncertainty of this method is 1 ms. In order not to tamper accidentally with this stimulus/eye position matching, we always assume that the stimulus was present for this additional millisecond and thus include this additional data point in our analysis. Note that this is only done after the experiment is finished (offline).
During the actual experiment, some trials are terminated at random times before the subject identifies the target ("early terminated", see above). The procedure described here is used to retrospectively determine where the subjects' eyes were when the search display disappeared from the screen. This data is then used to identify where the subject's eyes were relative to the target and for how long. Figure 3 illustrates the interaction between the eye tracker and the experimental computer during the course of a trial (see Supplementary Methods for details).
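Once stimulus and eye tracker timestamps have been brought onto a common clock, the offline lookup described above reduces to finding the gaze sample and the fixation that were current at screen offset. The Python sketch below illustrates that final step only; the clock alignment itself is assumed to have been done already, and the function name and argument layout are hypothetical.

```python
import bisect
import math


def gaze_at_screen_offset(sample_times, sample_xy, fixation_onsets,
                          offset_ms, target_xy):
    """Return (distance to target in degrees, ms since the onset of the
    current fixation) at the moment the search screen was removed.

    sample_times: sorted gaze timestamps (ms) on the stimulus clock.
    sample_xy: gaze positions (degrees), one per timestamp.
    fixation_onsets: sorted onset times (ms) of detected fixations.
    """
    # Last gaze sample recorded at or before screen offset.
    i = bisect.bisect_right(sample_times, offset_ms) - 1
    if i < 0:
        raise ValueError("no gaze sample before screen offset")
    x, y = sample_xy[i]
    distance = math.hypot(x - target_xy[0], y - target_xy[1])
    # Most recent fixation onset at or before screen offset.
    j = bisect.bisect_right(fixation_onsets, offset_ms) - 1
    dwell = offset_ms - fixation_onsets[j] if j >= 0 else None
    return distance, dwell


# Example at 1 kHz: a fixation 5 degrees away, then one near the target
# that starts at 500 ms; the screen is removed at 508 ms.
times = list(range(1000))
xy = [(5.0, 0.0)] * 500 + [(0.2, 0.0)] * 500
distance, dwell = gaze_at_screen_offset(times, xy, [0, 500], 508, (0.0, 0.0))
```

In this example the gaze was 0.2° from the target and the final fixation had lasted 8 ms at screen offset, the kind of trial that falls into the shortest dwell-time bin.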

EXPERIMENT 1/LOOKING AT THE TARGET
We asked ten subjects to find a target among 48 colored oriented bars (Figures 1 and 2). The target was unique and shown to the subjects before each trial. Subjects were instructed to find the target as fast as possible and trials would terminate in one of the following three ways (experiment 1): (i) normal trials terminated after subjects looked at the target for 400 ms; (ii) trials terminated after subjects looked at the target for less than 400 ms (early termination on final fixation; we tested 1-400 ms); (iii) trials terminated randomly throughout search (temporal early termination), which resulted in subjects' last point of view being 1.5°-20.0° away from the target. Technical constraints make it non-trivial to guarantee that stimulus changes triggered by eye movements are performed with millisecond accuracy. However, software we developed allowed us to do so with an effective time lag of 1 ms (see Materials and Methods for details). Following a mask and a central fixation cross (enforced) at the end of the trial, subjects were asked to look at the location where they thought the target was, along with providing a three-level confidence rating. For normal trials, subjects needed on average 4.38 ± 0.77 fixations (898 ± 197 ms) to find the target. The average saccade amplitude for correct trials was 5.56 ± 0.85°. Subjects correctly identified the target location for 83.3% of all normal trials, with chance corresponding to approximately 13.9%. The average confidence for all correct trials was 2.62 on a scale from 1 (guessing) to 3 (highly confident) (see Materials and Methods for details on how chance performance and confidence were calculated). We analyzed trials in which the search screen was removed and efficiently masked following a fixed interval (<400 ms) after the subjects acquired the target (that is, after they fixated the target within 1.5°; early termination on final fixation).
The amplitude of the last saccade made to the target was on average 5.25 ± 1.18° and was not significantly different from the average saccade amplitude (rank sum, p = 0.43). Trials with a last saccade amplitude of less than 4° were excluded to make sure that we do not look at effects caused by corrective saccades. Surprisingly (Figure 4), detection performance did not depend significantly on the delay between fixation onset and screen offset (ANOVA, p = 0.61, Figure 4A). This was also true for pairwise comparisons between the first and the 2nd-5th bin (t-test, α = 5% uncorrected; Figure 4A). Thus, subjects always knew the target location with high accuracy. Even for trials in which the target was fixated only very shortly (<10 ms), performance was significantly different from chance (p < 0.001). This result also holds on a trial by trial basis (Figure S2 in Supplementary Material). We calculated the mean confidence rating for all trials, regardless of whether the answer was correct or not. In contrast to performance, the mean confidence of all trials (correct and incorrect) did increase as a function of fixation duration from 2.3 ± 0.2 to 2.9 ± 0.1 (ANOVA, p = 0.015, Figure 4B). Mean confidence also increased if only correct trials were considered (ANOVA, p = 0.028; from 2.5 ± 0.3 to 3.0 ± 0, data not shown).
In the case of randomly terminated trials (temporal early termination), performance and confidence did strongly depend on the distance between the last point of view and the target location (performance and confidence ANOVA, p < 0.001, Figure 5A). The median distance was 4.1° for correct trials and 10.1° for incorrect trials. A within-subject ANOVA revealed a significant difference between the two populations of distances (p < 0.001, Figure 5B). Performance was not different from chance once the distance between target and last point of view was greater than 6° (Figure 5A).
So far, only the distance between the fixation point and the target was considered as a factor. Does performance also increase as a function of time? To evaluate this, we used the duration of the last fixation (before interruption). We expect a higher probability of being correct in trials where subjects fixated for longer at a distance where they could possibly detect the target (i.e. <6°). Indeed, we found that trials which terminated within a distance of 4-6° away from the target did show a dependency between the last fixation duration and correctness of the trial (within-subject ANOVA, p = 0.013, Figure 5C) and were independent of the previous saccade length (ANOVA, p = 0.18; data not shown). In contrast, trials which terminated within a distance of 1.5-4° away from the target were independent of last fixation duration (ANOVA, p = 0.97; data not shown). Thus trials that did not terminate on the target were only correct if subjects fixated long enough during the preceding fixation, independent of the previous saccade size.
FIGURE 3 | Illustration of the procedure used to match the recorded eye movement data with the stimulus presentation timing. The goal is to determine the gaze position at the instant when the search screen was replaced by the mask. The stimulus presentation computer sends timestamps throughout the trial to the eye tracking system. Of interest is the exact delay (t_12) between stimulus onset and recorded eye movements. Before stimulus onset (t_1) and after mask offset (t_6) we always send a timestamp so we can calculate t_tracker. The time between stimulus onset and mask onset (t_25) is known.

FIGURE 4 | Performance when trials were terminated after subjects fixated on the target (early termination on final fixation, Experiment 1). (A) Subjects' performance (green bars) was independent of the time between fixating the target and removal of the search screen. Note that for very short fixation times of 1-10 ms, performance is highly significantly different from chance (yellow bars; see Materials and Methods for details). (B) Confidence ratings of all trials (correct and incorrect). Subjects' confidence rating (on a scale 1-3) increases as a function of fixation duration. Error bars are ±s.e. over subjects. Data shown on a non-linear axis to emphasize the first bins. All ANOVA values refer to a one-way ANOVA (see Supplementary Material for details).

EXPERIMENT 2/DETECTING THE TARGET
So far, subjects were required to look at the target to indicate successful localization of the target. Do subjects also look at the target in natural, unconstrained search? We repeated the experiment with the instruction to localize the target and press a button as fast as possible, independent of whether or not subjects fixated the target and for how long (experiment 2). Afterwards, subjects (n = 3, who did not participate in the previous experiment) were asked to indicate the target position and confidence by pressing a button (see Materials and Methods). Subjects correctly identified the target location in 89.9% of all trials. In 91.2% of all correct trials, subjects fixated the target (within 1.5°) before pressing the button. The final fixation lasted on average 327 ms ± 96 ms, which is significantly longer than the average fixation duration (227 ± 142 ms). Thus, subjects looked at the target location for 100 ms longer than a typical search fixation, even though they were not required to do so (data not shown). 8.7% of all trials in experiment 2 terminated when subjects were not looking at the target (r > 1.5°). Still, subjects identified the target correctly in 80.8% of these cases.

EXPERIMENT 3/DETECTION WITHOUT EYE-MOVEMENTS
How much time is required to identify the target in the absence of eye movements? To quantify this, we briefly flashed search arrays (for 20-600 ms) onto the screen and asked subjects (n = 11, who did not participate in the previous experiments) to identify the target's location (experiment 3). Subjects were required to maintain fixation within 1.5° of the center of the screen. The target was always located within 5.0° of the fixation point. If indeed the target is identified at the previous fixation, it should be impossible for subjects to locate the target as quickly as in the previous experiment (<10 ms). First, we performed this task with the mask immediately following the end of the trial as done previously (experiment 3 with mask, r < 5.0°). Subjects (n = 7) needed at least 375 ms to successfully localize the target (Figure 6A, 60% correct, p = 0.05 compared to chance). Performance as well as confidence increased as a function of time (ANOVA, p = 0.002 and p < 0.001, respectively). Thus, the very short presentation times (<10 ms) sufficient to localize the target in active search (experiment 1, early termination on final fixation; successful identification in 76% of all trials) imply that the target location must have been acquired during previous fixations. Demonstrating this, subjects were at chance if we presented the exact same visual information for the same amount of time in the absence of previous fixations (Figure 6A).
We repeated the experiment but this time targets were located outside the previously mentioned 5.0° around the center (experiment 3 with mask, r > 5.0°). Subjects were never able to identify the target better than expected by chance (12.2% ± 9.9%, Figure 6B) and performance only increased weakly with presentation time (ANOVA, p = 0.02). Subjects' confidence was never higher than guessing (1.49 ± 0.18) and did not significantly depend on time (ANOVA, p = 0.13; data not shown). This result supports the previous finding that eye movements are necessary to perform the search task and that target detection is constrained to a specific detection radius.
It could be argued that perhaps the mask we used is not effective in suppressing visual information. Thus, in a control experiment (4 subjects), we presented targets inside 5.0° of the fixation point, this time leaving out the mask at the end of the trial (experiment 3 without mask, r < 5.0°). It is known that target detection can proceed for very short presentation durations if no mask is presented (Thorpe et al., 1996; Fabre-Thorpe et al., 1998; Keysers et al., 2001; Li et al., 2002). It is thus expected that subjects will be able to locate the target in this condition. Confirming this, we found that subjects were able to identify the target with high confidence even at very short presentation times (Figure 6C; 70% correct for <75 ms; p = 0.05 compared to chance; see Figure S3 in Supplementary Material for trial by trial distribution). The masking effect is immediately apparent (compare Figures 6A,C). Therefore, we conclude that our mask was effective.

Frontiers in Human
In our basic experiment, subjects reached performance levels between 76-93% correct (Figure 4A). Why were subjects never perfect? This gap in performance can, in part, be attributed to "return saccade" trials, which have been observed in monkeys as well as humans (e.g., Motter and Belky, 1998; Peterson et al., 2001; Sheinberg and Logothetis, 2001). For the same task, we observed previously (Rutishauser and Koch, 2007) that subjects failed in approximately 12% of all trials to identify the target the first time they were directly looking at it. Subjects eventually returned to and identified the target ("return saccades"). Thus, landing on the target (without time constraint) does not guarantee detection of the target (also see below).

DISCUSSION
Our results show that object detection in active conjunction search for a known target can occur with target fixation times under 10 ms (experiment 1, early termination on final fixation; Figure 4A): subjects correctly identified the target even if they fixated it for less than 10 ms and with a mask present at the end of a trial. This result stands in sharp contrast to our second finding that in the absence of eye movements, much longer times are required to successfully locate the target: the search screen had to be flashed on for at least 375 ms, otherwise subjects were unable to perform the task (experiment 3 with mask, r < 5.0°; Figure 6A). However, in the very same task, object detection can occur with very short display times if no mask is used (experiment 3 without mask, r < 5.0°; Figure 6C). Confirming previous results (Thorpe et al., 1996; Fabre-Thorpe et al., 1998; Keysers et al., 2001; Li et al., 2002; Serre et al., 2007), enough information can be acquired in these short (<75 ms) periods to detect the target.
How can these contradictory results be reconciled? These two cases differ in one substantial point: during active search (as opposed to at fixation), subjects previously fixated other positions in the display before landing on the target and therefore do have access to the relevant visual information, albeit at a non-zero retinal eccentricity, that is, away from the fovea, the point of highest acuity. This could be termed "look-ahead" processing. In contrast, there was not enough time to acquire sufficient information about the target while only fixating on it (as demonstrated by experiment 3 with mask, r < 5.0°). This interval during the previous fixation is indeed crucial as demonstrated by the temporal early termination experiment. Once subjects fixated at a critical distance away from the target, it is the previous fixation duration that determines the success of a trial (Figure 5C).
We therefore conclude that all the information necessary for the identification of our targets (here, comparing the color and orientation of an elongated bar with the target information in working memory) must have been acquired before the subject saccaded towards the target, at least one fixation ahead. This is why once the target is fixated, subjects are always able to correctly detect it even if the target was masked within 10 ms (by a mask that was disruptive enough to erase any retinal and iconic memory representation of the search display) (Breitmeyer, 1984; Kovacs et al., 1995; Yantis and Jonides, 1996; Enns and Di Lollo, 2000). Note that our experiments show that subjects typically, if given the freedom to do so, fixate the target location before indicating knowledge of the target location (experiment 2). Thus, subjects fixate the target before they press the button to terminate the trial. However, our first and principal experiment (experiment 1) demonstrates that this is not necessary: if subjects are forced to identify the target location before ever fixating it (up to 6° away), they nevertheless succeed in identifying the target location (albeit with lower confidence). Thus, while the information necessary to identify the target was available, subjects did not terminate the search if they could choose to do so. One reason why this might be the case is for purposes of confidence, as shown below. Also note that in our first experiment, subjects were instructed to look at the target as fast as they could (to terminate the trial). Our experiment 2, however, shows that subjects also choose to look at the target if not forced to. In fact, they almost always (91% of cases) chose to look at the target before manually terminating the search.
FIGURE 6 | (A) In the absence of eye movements (fixation enforced) and in the presence of a mask, performance and confidence depend strongly on trial duration (ANOVA, p < 0.05) if targets were located within 5.0° of the fixation point. Performance is above chance (yellow bars) for presentation times >375 ms. (B) If targets were located further than 5.0° from fixation, performance (green bars) was never significantly higher than chance (yellow bars) and depended only weakly on time (ANOVA, p = 0.02). (C) Without the mask (and r < 5.0°), performance is independent of time (p = 0.76, ANOVA) and always different from chance. All ANOVA values refer to a one-way ANOVA (see Supplementary Material for details).

FIXATING THE TARGET INCREASES IDENTIFICATION CONFIDENCE
In our experiment, longer final fixation durations only increased the confidence, but not the performance, with which subjects made their decision (Figure 4B). Subjects accurately reported their confidence, as demonstrated by a strong positive correlation between confidence and performance (see Supplementary Material). This finding led us to hypothesize that the reason why subjects looked at the target is to increase confidence. Otherwise there would be no reason for doing so, since the target identity is already known (at least by the saccadic system) before directly looking at it (within ∼5°). We found supporting evidence for this hypothesis in the button press experiment (experiment 2). Subjects almost always (91% of trials) chose to look at the target before they pressed the button even though they were not instructed to do so (confirming previous results; Maioli et al., 2001). They also fixated on the target for longer (327 ms) before pressing the button than during the main experiment (experiment 1). This confirms that the instinctive behavior during active visual search is to first look at the target before confirming its location. However, in some cases subjects chose to identify the target without fixating on it. In these cases subjects were nevertheless highly accurate in identifying the target location (81% correct). Clearly, peripheral identification is only possible if the items (given the resolution limits) can be discriminated when not directly fixating them. For targets that are much more difficult to discriminate than our colored bars (for instance bars that require the type of high spatial acuity information that is only accessible at the fovea), fixating the target directly might be necessary.

INFORMATION PICK-UP FROM EXTRAFOVEAL LOCATIONS
It is thought that the neural substrates for controlling attentional and oculomotor shifts are largely the same (Rizzolatti et al., 1987; Corbetta et al., 1998). Furthermore, spatial attention shifts to the saccade target location prior to the onset of the saccade (Crawford and Muller, 1992; Hoffman and Subramaniam, 1995; Kowler et al., 1995; Deubel and Schneider, 1996). This, in turn, facilitates recognition processes at the location that is about to be fixated. Information necessary for recognition can thus be accessed away from the fovea if it is close enough to the current fixation (Geisler et al., 2006). Based on these findings and constraints imposed by response latencies (see below), it has been hypothesized that the target is typically recognized extrafoveally and not while fixating on it. Here we demonstrate experimentally that this hypothesis is true for the case of active visual search. Additionally we also demonstrate that while subjects detect the target before fixating on it, they nevertheless proceed to saccade to the target (in order to increase their confidence).
There are multiple tasks for which extrafoveal information acquisition has been shown, such as for saccade sequence programming or reading. In the former, eye movements are usually restricted to predefined locations between which subjects will saccade (preprogrammed saccade sequences) (Godijn and Theeuwes, 2003; Caspi et al., 2004; Baldauf and Deubel, 2008). This kind of task is different from natural visual search, where subjects freely explore the search space. In this respect, reading is more similar to visual search, where extrafoveal processing is a known phenomenon ("previewing") (Rayner, 1978; Engbert et al., 2002; McDonald, 2006; Kliegl et al., 2007; Angele et al., 2008). While it has repeatedly been hypothesized that similar processes are at work during recognition in visual search (and this assumption is implicitly built into many models of visual search), here we experimentally demonstrate this process. Note that saccade planning during search (where to fixate next) is also an extrafoveal process.

GENERALIZATION TO OTHER TASKS AND COMPARISON TO PREVIOUS WORK
We used a conjunction search display consisting of 49 items, half of which shared a feature with the target. Based on this configuration, we found that targets could be identified successfully in the periphery up to 6° away. How do these results generalize to other search tasks? While we do not consider any other configurations in this paper, there are several factors that need to be considered. Clearly, the extent to which peripheral processing is possible depends on the target size, distractor density, discrimination difficulty and noise levels. If the search becomes more difficult for any reason, we expect the effective radius within which this processing is possible to shrink, and at some point foveal processing will be required. On the other hand, if targets are sufficiently large and easy to discriminate, multiple fixations (active search) are less beneficial (Eckstein et al., 2001).
It has been shown previously that information about the target can be acquired peripherally (Viviani and Swensson, 1982; Scialfa et al., 1987; Scialfa and Joffe, 1998; Hooge and Erkelens, 1999; Eckstein et al., 2001; Caspi et al., 2004; Najemnik and Geisler, 2008; Zelinsky, 2008) and saccade target selection necessarily relies on parafoveal information. Our work adds to this literature by showing that this information is (for our task configuration) sufficient to make an accurate choice of target location, despite the fact that under typical circumstances subjects almost always look at the target. In contrast, it is not sufficient to make a high-confidence judgment, which requires foveal fixation. This is an important and novel distinction between the accuracy of the decision as such and its subjective confidence. Whether our findings are limited to the conjunction search display we used or generalize to other search configurations remains to be shown.

MODELS OF VISUAL SEARCH
Our subjects found the target after an average of 4.38 ± 0.77 fixations. How many of the 49 items present in the display did the subjects likely consider during the search? In a purely random search without any attentional cues (and assuming that only one item is processed at every fixation), on average 24 saccades would be required to find the target. Clearly, subjects did not need to process every item since they knew which target they were looking for. Top-down attentional cues about the target reduce the number of fixations required to find the target (Williams, 1966; Findlay, 1997; Motter and Holsapple, 2000; Najemnik and Geisler, 2005). For similar search displays, subjects preferentially used color to guide the search (Williams, 1966; Motter and Belky, 1998; Williams and Reingold, 2001; Rutishauser and Koch, 2007). This reduces the number of possible items to 24 and thus the expected number of fixations to 12. Rutishauser and Koch (2007) used a quantitative model of search for the same display as used here to estimate that subjects process on average two to three items per fixation. This reduces the required number of fixations to 4-6. It is not known, however, whether these 2-3 items are processed sequentially or in parallel. Either possibility leaves our finding unchanged, and we thus remain agnostic about this issue. This theoretical consideration fits very well with the data we observed (4.38 ± 0.77 fixations). Our previous work using the same stimuli, as well as similar experiments by others (Findlay et al., 2001), showed that all considered items are located inside a "search radius" of approximately 5-6° around the current fixation. The current experiments confirm this: whereas subjects are able to find the target inside a 5.0° circle around the fixation (Figures 6A,C, experiment 3 with and without mask, r < 5.0°), they are unable to do so for bigger radii (Figure 6B, experiment 3 with mask, r > 5.0°).
Furthermore, during active search we found that performance depends crucially on the distance between the target and the final fixation location when trials are randomly terminated (Figure 5A), and performance is never better than chance for distances larger than 6.0°.
These results clearly emphasize the necessity of eye movements for this kind of search task. It is known that the density of items in a search array is a critical variable (Motter and Belky, 1998; Motter and Holsapple, 2000). To avoid this confound, we kept the density of items constant.
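The fixation-count estimate given earlier in this section can be reproduced with a few lines of arithmetic. The following is only a rough sketch, not the quantitative model of Rutishauser and Koch (2007): it assumes that, on average, half of the candidate items are inspected before the target is reached and that a fixed number of items is processed per fixation. The function name `expected_fixations` is hypothetical and introduced here for illustration.

```python
def expected_fixations(n_candidates: int, items_per_fixation: float) -> float:
    """Back-of-the-envelope estimate (an assumption, not a fitted model):
    on average half of the candidate items are inspected before the target
    is found, and each fixation processes a fixed number of items."""
    return n_candidates / 2 / items_per_fixation


N_ITEMS = 49                     # items in the search display
COLOR_MATCHED = N_ITEMS // 2     # about 24 items remain when search is guided by color

random_search = expected_fixations(N_ITEMS, 1)       # about 24, one item per fixation
color_guided = expected_fixations(COLOR_MATCHED, 1)  # 12 fixations
low = expected_fixations(COLOR_MATCHED, 3)           # three items per fixation
high = expected_fixations(COLOR_MATCHED, 2)          # two items per fixation

OBSERVED_MEAN = 4.38             # mean number of fixations reported in the text

print(f"random search: {random_search:.1f} fixations")
print(f"color guided:  {color_guided:.1f} fixations")
print(f"2-3 items per fixation: {low:.1f} to {high:.1f} fixations")
print(f"observed mean within range: {low <= OBSERVED_MEAN <= high}")
```

Under these assumptions the predicted range is 4 to 6 fixations, which brackets the observed mean of 4.38.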

NEURONAL CORRELATES
This "look ahead of fixation" processing of targets is compatible with a study reporting recordings from object-selective single neurons in the IT cortex in macaque monkeys (Sheinberg and Logothetis, 2001). As in our experiment, they considered fixations to be on target if they landed within 1.5° of the target (target acquisition). There were two key observations related to the time course of target identification: (i) neurons exhibited a differential response to the target approximately 95 ms before the eyes landed on the target. This was only true if the monkey was about to fixate the target. (ii) The same object-selective IT neurons responded again once the eyes landed on the target.
Another fundamental constraint imposed on behavior is the onset latency of single neurons (Rolls and Tovee, 1994). It is frequently assumed that object-selective responses in IT cortex (such as the ones discussed above) are necessary for successful localization of complex objects. The minimum onset latency for monkey IT neurons is 85-95 ms (Nowak and Bullier, 1998; Naya et al., 2003) but can also be considerably later (Sheinberg and Logothetis, 2001). After the target is recognized, the search needs to be stopped (if fixating on the target). This requires approximately 140 ms (Hanes and Carpenter, 1999). Given these latencies and the average fixation duration, IT responses would be too late to induce an eye movement, since triggering a saccade requires the activity to reach other areas (such as FEF) first (see introduction). The different nature of these observations (macaque monkeys, different task) precludes any definitive conclusion with regard to our result. However, it is nevertheless of interest to note that these monkey IT single neurons have properties that seem very similar to what we observed behaviorally. Also note that our task design did not allow express saccades (which have latencies as low as 120 ms; Kirchner and Thorpe, 2006). Of course, it remains speculative which specific neurons are activated during search in humans.
We used backward masking to terminate visual processing after a certain period of time. Measured psychophysically, the mask was effective in disrupting target recognition (see results). Neuronally, however, responses can occur long after mask onset and not all neuronal processing is disrupted (Thompson and Schall, 2000). Thus, the time of stimulus presentation is different from the duration of neuronal processing. Note that the latency argument (see above) does not require that neuronal processing is disrupted by the onset of the mask. Rather the argument states that, given the known response latencies of visual as well as motor neurons, the time available during a single fixation might not be sufficient for the entire process to complete (irrespective of whether the stimulus was masked or not).
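The latency argument in this section can be made concrete with a simple time budget. The sketch below assumes, purely for illustration, that recognition in IT and stopping the search happen strictly one after the other with no overlap. It uses the values quoted in the text: a typical fixation of about 170 ms, a minimum IT onset latency of 85-95 ms, and approximately 140 ms to stop the search. The function names are hypothetical.

```python
TYPICAL_FIXATION_MS = 170   # typical fixation duration quoted in the paper
IT_ONSET_MS = (85, 95)      # minimum onset latency range of monkey IT neurons
STOP_SEARCH_MS = 140        # approximate time needed to stop the search


def serial_latency_ms(onset_ms: int, stop_ms: int = STOP_SEARCH_MS) -> int:
    """Total time if recognition and stopping run strictly in sequence
    (a simplifying assumption; real processing stages may overlap)."""
    return onset_ms + stop_ms


def shortfall_ms(onset_ms: int, fixation_ms: int = TYPICAL_FIXATION_MS) -> int:
    """Time by which the serial process exceeds a single fixation."""
    return serial_latency_ms(onset_ms) - fixation_ms


for onset in IT_ONSET_MS:
    print(
        f"IT onset {onset} ms: serial latency {serial_latency_ms(onset)} ms, "
        f"exceeds a {TYPICAL_FIXATION_MS} ms fixation by {shortfall_ms(onset)} ms"
    )
```

Even with the shortest onset latency, the serial sum exceeds a typical fixation by several tens of milliseconds, which is the sense in which a single fixation might not provide enough time.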

CONCLUSIONS
We conclude that, for our conjunction search task and configuration, the target identification process can be divided into two steps: a first round of processing, sufficient for target identification, takes place while the eyes fixate in the neighborhood of the target but not on it. A second round of processing, during which confidence increases, starts once the final saccade arrives at its goal. In cases where subjects fail to identify the target despite looking at it for ample time (>100 ms), either of these two processes might fail.