Attentional selection of levels within hierarchically organized figures is mediated by object-files

Valdés-Sosa, Mitchell J.; Iglesias-Fuster, Jorge; Torres, Rosario

doi:10.3389/fnint.2014.00091

ORIGINAL RESEARCH article

Front. Integr. Neurosci., 16 December 2014
Volume 8 - 2014 | https://doi.org/10.3389/fnint.2014.00091


Mitchell J. Valdés-Sosa*

Jorge Iglesias-Fuster

Rosario Torres

Cognitive Neuroscience, Cuban Center for Neuroscience, Havana, Cuba

Objects frequently have a hierarchical organization (tree-branch-leaf). How do we select the level to be attended? This has been explored with compound letters: a global letter built from local letters. One explanation, backed by much empirical support, is that attentional competition is biased toward certain spatial frequency (SF) bands across all locations and objects (an SF filter). This view assumes that the global and local letters are carried respectively by low and high SF bands, and that the bias can persist over time. Here we advocate a complementary view in which perception of hierarchical level is determined by how we represent letters in object-files. Although many properties bound to an object-file (e.g., position, color, even shape) can mutate without affecting its persistence over time, we posit that the same object-file cannot be used to store information from different hierarchical levels. Thus, selection of level would be independent of location but not of the way objects are represented at each moment. These views were contrasted via an attentional blink paradigm that presented letters within compound figures, but only one level at a time. Attending to two letters in rapid succession was easier if they were at the same level rather than at different levels, as predicted by both accounts. However, only the object-file account was able to explain why it was easier to report two targets on the same moving object than the same targets on distinct objects. The interference of different masks with target recognition was also easier to predict by the object-file account than by an SF filter. The methods introduced here allowed us to investigate attention to hierarchical levels and to object-files within the same empirical framework. The data suggest that SF information is used to structure the internal organization of object representations, a process best understood by integrating object-file theory with previous models of hierarchical perception.

Introduction

Object Based Attention

Although we can choose to attend to anything that happens at a given spatial location (e.g., the goal zone in a match of the FIFA World Cup), or to a specific feature (e.g., find the black uniforms on the playing field), we often focus on visual objects. The last alternative is especially sensible from an ecological point of view, given that most of our interactions with the world are directed precisely at objects (e.g., we grasp, eat, avoid, or flee from objects, or we boo at them if they fail to score a goal). These alternatives for defining the units of selection are known respectively as spatial-based, feature-based and object-based attention (Serences et al., 2004).

The two-target test (TTT) was developed to identify which of these units attention uses in a specific scenario. This test assumes that there should be little competition between two pieces of information arising within the same unit of attentional selection, in contrast to strong competition when these pieces originate from distinct units (Duncan, 1984; Kravitz and Behrmann, 2011). Thus, the accuracy of reports about two targets has been compared when they belong to the same location/feature/object and when they do not. Note that TTT elegantly keeps several confounding factors other than attention (such as the number of perceptual decisions, working memory load, and response competition) constant across the focused/divided attention comparison.

Use of TTT has shown that under many circumstances object-based attention overrides spatial-based and feature-based mechanisms. For instance, it is easier to discriminate two features belonging to a single object than two features split between two objects, even if these are spatially superimposed (Neisser and Becklen, 1975; Duncan, 1984). Most of the early work on object-based attention used stationary visual objects (with invariant features) as stimuli that were presented with abrupt onsets and offsets. However, in real life, objects move around and their properties change (e.g., soccer players run around the field and can collapse).

In order to conceptualize these dynamic traits, Kahneman et al. (1992) proposed the concept of object-files as mid-level visual representations that would bind the present state of an object to its preceding history, thus "integrating visual information across time to represent a unitary object moving or changing within an ongoing perceptual experience" (Treisman, 1992). These temporary representations would also bind together different features. Only a limited number of object-files can be handled by attention at the same time (Scholl, 2007), consistent with the biased competition model (Desimone and Duncan, 1995), in which attentional competition implies mutual inhibition of object representations.

Changes in a scene are interpreted as updates to an existing object-file if they do not violate spatiotemporal predictability (plausible trajectories) or if the magnitude of feature changes is small (Scholl, 2007). Object-files can survive brief occlusions by other objects or interference from visual masks (Scholl, 2007; Wutz and Melcher, 2013). However, other changes cannot be assimilated into existing object-files and consequently trigger the creation of a new representation (Mitroff et al., 2004, 2005a,b; Scholl, 2007). Updating an attended file is thought to be less attentionally demanding than creating a new object-file (Moore et al., 2007) or shifting attention from one file to another. We will describe a modification of the TTT that optimizes it for research on dynamically changing object-files.

Attention to Hierarchical Levels

Objects frequently possess different hierarchical levels, spanning from the entire entity to increasingly finer subdivisions (e.g., tree-branch-leaf). This has been studied in the laboratory with compound letters: a global letter made out of local letters (see Figure 1A). In what is known as the Navon (1977) task, subjects are asked to make speeded letter identifications from compound figures. The letters are easily recognized when attention is focused on only one level (usually faster for the global level), but with more difficulty when attention is divided between the two levels (Navon, 1977; Kimchi, 2014). In the latter case, the global level usually dominates the competition.

FIGURE 1

Figure 1. (A) A compound letter (a.k.a. Navon pattern). (B) Trial structure represented with schematic stimuli in Experiments 1 and 2, with a grid mask (above) and with a noise mask (below). The example shows one of the four types of trials: a global/local transition. IT: individually titrated durations. See Figure 2 for more realistic gray-tone depictions.

Is there any relationship between object-based attention and attention to hierarchical levels? Unfortunately, these two research topics have been largely studied in isolation from each other (but see Vecera et al., 2000). Some work has equated selection of hierarchical levels with spatial-based attention (Stoffer, 1993; Kim et al., 1999). The idea is that global letters would require larger attentional windows than the smaller local letters. Another proposal equates selection of hierarchical levels with the selection of spatial frequency bands, with global and local information respectively mediated by relatively lower and higher frequencies (Shulman et al., 1986; Shulman and Wilson, 1987; Robertson, 1996; Flevaris et al., 2011a). We will later examine this idea of spatial frequency (SF) filters in more detail. To be fair, proponents of this hypothesis recognize that other cues can be used to signal hierarchical level (e.g., Flevaris et al., 2014).

To our knowledge, the object-file concept has not been used to understand hierarchical perception. Regrettably, TTT, the mainstay of object-based attention research, cannot be fully deployed with traditional compound letters. Although traditional Navon figures admit two targets for divided attention (one at each level), only one is available for focused attention. This precludes the elegant controls inherent to TTT. To overcome this limitation, both compound letters and the TTT must be modified.

A Common Empirical Framework

To pull together research on object-files and hierarchical perception, we need a variant of the TTT that allows the use of both moving, mutable objects and compound letters. We have called this method the two-sequential-target test (TSTT). The TSTT essentially spreads the two targets of the TTT over time (see Valdes Sosa et al., 2003) with variable stimulus onset asynchronies (SOAs). This makes it possible to exploit the attentional blink (see reviews by Egeth and Yantis, 1997; Arnell et al., 2006; Dux and Marois, 2009). The attentional blink is an impairment in the recognition of one visual target (T2) induced by the previous identification of another target (T1). It typically occurs when the targets are separated in time by less than 500 ms and their availability has been curtailed by visual masks. The attentional blink is eliminated when T1 is ignored, which allows attention to focus on T2 (Raymond et al., 1992; Duncan et al., 1994).

Placing the targets in TSTT astride a spatial, a feature, or an object boundary allows us to identify the units of attention in a given situation. If an attentional blink is increased by crossing a boundary (compared to avoiding it), then that boundary probably defines a legitimate unit of attentional selection. One instantiation of TSTT uses two rotating random dot kinematograms that are perceived as superimposed (but transparent) visual surfaces sliding over each other (Valdes-Sosa et al., 2000; Pinilla et al., 2001). Observers are asked to report two sequential jerks in motion direction (T1 and T2). Reports are always accurate for T1. They are also accurate for T2 if it affects the same surface as T1, but a large attentional blink is produced if T2 switches surfaces (unless T1 is ignored).

Similar results have been reported by other groups (Mitchell et al., 2003, 2004; Reynolds et al., 2003; Khoe et al., 2005, 2008; Ciaramitaro et al., 2011). The findings can be understood if we assume that each surface creates an object-file. Updating an already attended object-file is easier than shifting attention to another one. The TSTT has also been used with serial presentation of images of objects that can mutate their properties (Raymond, 2003; Valdes Sosa et al., 2003; Kellie and Shapiro, 2004). The attentional blink is ameliorated when the two targets are construed as variations of the same object-file, but is large when they are interpreted as distributed between competing object-files, or when T2 causes the creation of a new object-file.

The TSTT can also be used with compound letters (Lopez et al., 2002) if one can uncouple the presentation of global and local letters over time. We achieve this by level-specific letter presentation. This method presents a grid of 15 "8" symbols (Figure 1B). For brief periods of time, either some segments within the "8" symbols disappear (unmasking local letters), or complete "8" symbols disappear within the grid (unmasking a global letter). Thus, at any instant letters are shown at only one level, while easily detectable (and ignorable) patterns are presented at the other. This is an important difference from traditional Navon figures, in which letters are always simultaneously present at both levels. The original grid, or a noise pattern, can be used to limit the persistence of the target letters, a requirement for the attentional blink as mentioned above.
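To make the logic of level-specific letter presentation concrete, the Python sketch below represents the display as a 5 × 3 array of "8" figures, each stored as the set of its visible segments. The 5 × 3 layout of the 15 figures, the seven-segment labels, and the letter shapes are assumptions made for illustration, not the article's stimulus files; a "removed" element stands for one lightened to low contrast. What it shows is that a local-letter frame leaves the global arrangement a complete grid, whereas a global-letter frame leaves every remaining element an intact "8", so only one level carries a letter at any moment.

```python
# Hypothetical sketch of level-specific letter frames on a 5 x 3 grid of "8" figures.
# Segment labels follow the usual seven-segment convention (an assumption):
# a=top, b=upper right, c=lower right, d=bottom, e=lower left, f=upper left, g=middle.
ALL_SEGMENTS = frozenset("abcdefg")
ROWS, COLS = 5, 3

# Local letters: which segments of every "8" stay visible (assumed shapes).
LOCAL_GLYPHS = {
    "E": frozenset("adefg"),
    "H": frozenset("bcefg"),
    "S": frozenset("acdfg"),
    "P": frozenset("abefg"),
    "U": frozenset("bcdef"),
}

# Global letters: which of the 5 x 3 "8" figures stay visible (assumed shapes).
GLOBAL_GLYPHS = {
    "E": ["###", "#..", "###", "#..", "###"],
    "H": ["#.#", "#.#", "###", "#.#", "#.#"],
    "S": ["###", "#..", "###", "..#", "###"],
    "P": ["###", "#.#", "###", "#..", "#.."],
    "U": ["#.#", "#.#", "#.#", "#.#", "###"],
}


def frame(level, letter=None):
    """Return a ROWS x COLS grid whose cells are sets of visible segments.

    level='mask'   -> the full grid of intact "8"s (no letter at either level)
    level='local'  -> every "8" loses the same segments, revealing a local letter,
                      while the global arrangement stays a complete grid
    level='global' -> whole "8"s are removed, revealing a global letter,
                      while every remaining element is an intact "8"
    """
    if level == "mask":
        return [[ALL_SEGMENTS] * COLS for _ in range(ROWS)]
    if level == "local":
        return [[LOCAL_GLYPHS[letter]] * COLS for _ in range(ROWS)]
    if level == "global":
        rows = GLOBAL_GLYPHS[letter]
        return [[ALL_SEGMENTS if rows[r][c] == "#" else frozenset() for c in range(COLS)]
                for r in range(ROWS)]
    raise ValueError(f"unknown level: {level!r}")
```

In a trial, the mask frame would alternate with one local or one global frame per target, so the two targets can be placed at the same or at different levels independently.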

Level-specific letter presentation allows us to present two target letters in succession. These may belong to the same or to different hierarchical levels (see Figure 1B). Using a grid as a mask, Lopez et al. (2002) found a large attentional blink at short SOAs in different-level trials, which was absent in same-level trials. The attentional blink lasted about 400 ms for the local/global shift, and more than 1600 ms for the global/local shift. This advantage for local/global shifts could explain the attentional advantage of the global over the local level described for traditional Navon tasks (Kimchi, 2014). Other studies have replicated these results (White et al., 2009; Valdes-Sosa et al., 2014). Note that with level-specific letter presentation we can disassemble the traditional Navon task and control the direction in which attention shifts between levels, which is not possible with the traditional task.

A Common Theoretical Framework

To provide a common conceptual structure for object-based attention and attention to hierarchical levels, we start from the proposal that compound letters must be perceived via object-files, just like any other visual object. Theories of hierarchical perception have ignored this fact (or have downplayed it; see Robertson, 1996). Interestingly, Hübner and Volberg (2005) provide evidence that letter identity and hierarchical level are represented separately in earlier processing stages, and must subsequently be bound for reporting (at a stage they did not specify). This contradicts the traditional assumption that these aspects are inseparable during initial stages of perceptual processing. Their proposal is supported by the fact that subjects make more conjunction errors (i.e., a correct identity from the wrong level) than predicted by the traditional view, particularly when compound letters are presented only briefly. Here we propose that this putative binding of letter and level identities requires the use of an object-file.

Furthermore, we posit that object-files for the different hierarchical levels are independent and have incompatible formats. Treisman (2006) advocated that, depending on the perceptual strategy of the observer, object-files can consist of either single objects, ensembles of objects, or even scenes. Obviously, the types of information that these diverse object-files can bind, as well as their internal structure, must be very different. Object-files can mutate their form without losing their continuity (e.g., an open hand clenching into a fist). This implies that different letters can be represented by the same object-file. However, we stipulate that this only happens if they have a compatible level-specific format.

We have seen that spatiotemporal congruity and featural similarity are necessary to maintain object continuity in the face of sensory change (Scholl, 2007; Flombaum et al., 2008). Object-file persistence also depends on cohesion. This implies that an object-file cannot survive after splitting into distinct identical objects (Mitroff et al., 2004, 2005a). Attending to local letters implies segregating a whole into a collection of parts (Han and Humphreys, 2002), which would destroy the object-file for the global form. Segregation of different object-files is also required for their persistence. This means that if different objects merge, their object-files cannot survive as autonomous entities (Scholl et al., 2001; Mitroff et al., 2005b). This implies that integrating local letters into a global letter (Han and Humphreys, 2002) would eliminate their independent representations.

We propose extending the object-file theory in an additional direction. The creation of a new object-file (attentionally more demanding than its update) should be especially susceptible to the quality of figure/ground segregation (Peterson, 2001; Peterson and Kim, 2001). An object cannot be represented by an object-file until it is parsed from the background (Peterson, 2001). Thus, noise or masking should impair object-file creation, as recently reported by Wutz and Melcher (2013). This deleterious effect of noise and masking should be greater when attention is already invested in an existing object, and is thus less available for new object-file creation. Finally, we assume that the object-files in our experiments do not survive from one trial to another.

Overview of this Article

Our first goal was to replicate our previous findings with the TSTT using compound letters and level-specific letter presentation. Thus, we verified whether different-level trials elicited a larger attentional blink than same-level trials, while including additional controls for several potential confounds (Experiments 1 and 2). Furthermore, in Experiment 2 we measured the duration of the attentional blinks more precisely by using an adaptive staircase procedure. In this experiment we also confirmed the attentional nature of the inter-target interference by having participants ignore T1 in some trials. In these experiments we used two types of mask (with different spatial frequency content) to curtail the persistence of the targets. Previous results (Valdes-Sosa et al., 2014) suggest that these masks impair T2 recognition differently as a function of the latter's hierarchical level, a finding we confirmed here. In Experiment 3 we explored the nature of this effect by simulating the inattention to T2 produced during the attentional blink through a reduction in the contrast of the letters. In Experiment 4, two moving objects were presented, either of which could harbor the compound letter targets. The goal was to directly explore the participation of object-files in attention to hierarchical levels. In this experiment we compared the effects of switching location, hierarchical level, and object on the attentional blink, as well as their interactions. In all these experiments the level of T1 was forewarned by a cue word. Finally, in Experiment 5 we explored the role of endogenous reconfiguration of executive processes (associated with task switching) by manipulating cue validity over trials. In the general discussion, we elaborate the extended object-file theory presented here and compare its ability to explain our data with that of other models of hierarchical perception.

Experiment 1

In this experiment we replicated and extended previous work with TSTT based on level-specific letter presentation (Lopez et al., 2002; White et al., 2009; Valdes-Sosa et al., 2014). The novelty of this approach over traditional Navon figures is that it is possible to separate the presentation of local and global letters by variable amounts of time. Four types of transition between targets were used (global/global, local/local, global/local, and local/global). These were randomly interspersed within the same blocks, in contrast to the blocked procedure used in our previous work (Lopez et al., 2002). The goal was to verify whether the attentional blink was smaller in same-level trials and larger in different-level trials, as previously reported.

We also examined if the quality of the masks temporally enclosing T2 was capable of affecting the size of the attentional blinks, as suggested by a previous study (Valdes-Sosa et al., 2014). One mask, containing relatively higher SF (HSF), was the original grid used to generate the compound stimuli (also used as masks by Hübner and Volberg, 2005; Flevaris et al., 2010). The other mask, containing relatively more low SF (LSF) content, was generated by superimposing random line segments on this grid. In Valdes-Sosa et al. (2014), the two masks were used with different participants, a shortcoming corrected here. Furthermore, our previous work used fixed durations for the global and local letters. Since the latter are more difficult to read, uncontrolled differences in readout times between levels could have existed. This could have distorted estimates of attentional blink magnitudes. Here we controlled this factor by separately titrating in each subject the contrasts and durations of the global and the local letters in order to equate their ease of identification. The Quest staircase algorithm was used for this titration (Watson and Pelli, 1983).

Methods

Participants

Ten university graduate students (4 females) from the Cuban Center for Neuroscience, aged between 25 and 35 years, were recruited for the study. All subjects had normal or corrected-to-normal vision, none had a history of neuropsychiatric disorders, and none were taking psychotropic drugs at the time of the experiment. Written informed consent was obtained from all participants, and the experimental protocol was approved by the ethics committee of the Cuban Center for Neuroscience.

Stimuli

Dark-gray characters (see Figure 1 for a black and white rendering, and Figure 2 for a more realistic gray-tone depiction) were displayed on a light-gray background, at the center of a CRT screen placed 40 cm in front of the observers. Letters were obtained by modifying selected segments of a grid that comprised 15 rectangular "8" figures. Five letters were used (E, H, S, P, U), which could appear at either the global or the local level. Global letters (see Figure 1) were obtained by increasing the saturation (i.e., making the gray lighter) of selected "8" figures (thus reducing their contrast with the background), and roughly occupied the same area as the original grid, which was 100 mm high and 38 mm wide (approximately 7.2 × 2.43° of visual angle). Local letters (see Figure 1) were obtained by increasing the saturation of individual segments within all the "8" figures, and were 18 mm high and 10 mm wide (approximately 1.42 × 0.8° of visual angle).
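Physical sizes can be converted to degrees of visual angle with the standard relation θ = 2·arctan(s / 2d), where s is the stimulus extent and d the viewing distance, both in the same units. A minimal Python sketch follows; it assumes the stimulus is centred on the line of sight, and the function name is illustrative.

```python
import math


def visual_angle_deg(size, distance):
    """Full visual angle (degrees) subtended by an extent `size` centred on the
    line of sight at viewing distance `distance` (both in the same units)."""
    return math.degrees(2 * math.atan(size / (2 * distance)))
```

As a sanity check, an extent of 1 cm viewed from 57.3 cm subtends almost exactly 1°.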

FIGURE 2

Figure 2. Spatial frequency spectra of the stimuli from Experiments 1 and 2. Rotational spectra were obtained for masks and the global and the local letter “E,” as described in the Methods of the article. Areas of coincidence between mask and letter spectra are indicated as mask/letter SF overlap. (A) The two mask types used and global and local letter E from Experiment 1. (B) The two mask types used and global and local letter E from Experiment 2.

The rotational energy spectra of the grid and noise masks, as well as of the global and local "E," were calculated with the sfPlot function (based on the 2-D Fourier transform), as implemented in the SHINE toolbox (Willenbockel et al., 2010). The resulting spectra were weighted by the corresponding values of the contrast sensitivity function equation described by Mannos and Sakrison (1974). Figure 2A shows the energy spectra of the two masks and the letters. The noise mask had more energy at LSF than the grid mask, which had a large peak at HSF. The spectra of the global and local E overlap best with the noise and grid masks, respectively. We will refer to this coincidence in spectral peaks as mask/letter SF overlap.
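The spectral analysis can be approximated as follows. This is not the SHINE sfPlot code; it is a NumPy sketch, under stated assumptions, of a rotationally averaged energy spectrum weighted by the Mannos and Sakrison (1974) contrast sensitivity function, A(f) = 2.6(0.0192 + 0.114f)exp(−(0.114f)^1.1), with f in cycles/degree. The image width in degrees (`image_deg`) is an assumed input needed to convert cycles/image to cycles/degree, and `spectral_overlap` is a hypothetical index (histogram intersection of normalized spectra) offered only to illustrate what "mask/letter SF overlap" could quantify; the article describes the overlap qualitatively.

```python
import numpy as np


def rotational_spectrum(img):
    """Rotationally averaged energy spectrum of a grayscale image.

    Returns (freqs, energy): integer radii in cycles/image from 0 to Nyquist,
    and the mean squared Fourier amplitude on each annulus.
    """
    img = np.asarray(img, dtype=float)
    img = img - img.mean()                        # drop the DC (mean luminance) term
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    radius = np.rint(np.hypot(x - w // 2, y - h // 2)).astype(int)
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    nyquist = min(h, w) // 2
    freqs = np.arange(nyquist + 1)
    return freqs, sums[: nyquist + 1] / counts[: nyquist + 1]


def mannos_sakrison_csf(f_cpd):
    """Mannos & Sakrison (1974) contrast sensitivity; f_cpd in cycles/degree."""
    f_cpd = np.asarray(f_cpd, dtype=float)
    return 2.6 * (0.0192 + 0.114 * f_cpd) * np.exp(-(0.114 * f_cpd) ** 1.1)


def csf_weighted_spectrum(img, image_deg):
    """Rotational spectrum multiplied by the CSF; image_deg (image width in
    degrees of visual angle) is an assumed input for the unit conversion."""
    freqs, energy = rotational_spectrum(img)
    cpd = freqs / image_deg
    return cpd, energy * mannos_sakrison_csf(cpd)


def spectral_overlap(spec_a, spec_b):
    """Hypothetical overlap index: histogram intersection of two spectra, each
    normalized to sum to 1 (1 = identical shape, 0 = no shared frequencies)."""
    a = np.asarray(spec_a, dtype=float) / np.sum(spec_a)
    b = np.asarray(spec_b, dtype=float) / np.sum(spec_b)
    return float(np.minimum(a, b).sum())
```

A vertical sinusoidal grating with 10 cycles per image, for example, produces a rotational spectrum that peaks at radius 10; mask and letter images would be passed through `csf_weighted_spectrum` before comparing them.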

Titration of Letter Contrast and Duration

The contrast used to produce the global and the local letters was titrated for each subject in a separate session (before the main experiment) with a Quest staircase (implemented in Matlab 6.5, Mathworks Inc.; see Watson and Pelli, 1983), using a 75%-correct recognition threshold. Each trial consisted of a randomly selected letter presented for 150 ms, preceded and followed by a mask, each lasting 300 ms. The initial Quest parameters were: beta = 3.5, delta = 0.01, gamma = 0.5, and grain = 0.01. Titration was performed separately for the grid and the noise masks. Letter duration was subsequently titrated with the same staircase. Note that in the other experiments of this article only letter duration was titrated. Mask contrast was not titrated.
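A Quest staircase places each trial at the current Bayesian estimate of threshold and updates a posterior over threshold after every response. The Python sketch below reuses the parameters given above (beta = 3.5, delta = 0.01, gamma = 0.5, grain = 0.01, 75%-correct threshold) with a Weibull function on a log10 intensity axis, in the spirit of Watson and Pelli (1983). It is not the Matlab implementation used in this study: the Gaussian prior (`t_guess`, `t_sd`), the grid span, trial placement at the posterior mean, and the simulated observer are assumptions for illustration.

```python
import numpy as np


def weibull(u, beta=3.5, delta=0.01, gamma=0.5):
    """Quest-style Weibull: probability correct at u (log10 intensity re: threshold)."""
    return delta * gamma + (1 - delta) * (1 - (1 - gamma) * np.exp(-10.0 ** (beta * u)))


def threshold_offset(p_thresh=0.75, beta=3.5, delta=0.01, gamma=0.5):
    """Offset u0 with weibull(u0) == p_thresh, so the threshold T marks that point."""
    inner = (1 - (p_thresh - delta * gamma) / (1 - delta)) / (1 - gamma)
    return np.log10(-np.log(inner)) / beta


class Quest:
    """Minimal grid-based Bayesian staircase (illustrative, not Psychtoolbox)."""

    def __init__(self, t_guess, t_sd, grain=0.01, span=4.0, p_thresh=0.75,
                 beta=3.5, delta=0.01, gamma=0.5):
        self.grid = np.arange(t_guess - span / 2, t_guess + span / 2 + grain / 2, grain)
        self.post = np.exp(-0.5 * ((self.grid - t_guess) / t_sd) ** 2)  # Gaussian prior
        self.post /= self.post.sum()
        self.psy = dict(beta=beta, delta=delta, gamma=gamma)
        self.u0 = threshold_offset(p_thresh, **self.psy)

    def p_correct(self, x, t):
        """Probability correct at log intensity x if the threshold were t."""
        return weibull(x - t + self.u0, **self.psy)

    def next_intensity(self):
        """Place the next trial at the posterior mean of the threshold."""
        return float(np.sum(self.grid * self.post))

    def update(self, x, correct):
        """Multiply the posterior by the likelihood of the observed response."""
        likelihood = self.p_correct(x, self.grid)
        self.post *= likelihood if correct else 1 - likelihood
        self.post /= self.post.sum()


# Simulated observer with an assumed true threshold (illustration only).
rng = np.random.default_rng(0)
true_t = -1.3
q = Quest(t_guess=-1.0, t_sd=0.5)
for _ in range(100):
    x = q.next_intensity()
    q.update(x, rng.random() < q.p_correct(x, true_t))
estimate = q.next_intensity()
```

For duration rather than contrast, the same loop would run on log10 duration; only the physical meaning of x changes.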

Procedure

Two blocks of trials were used, one with the grid mask and the other with the noise mask, with the order counterbalanced across subjects. The level of T1 was forewarned by the word "GLOBAL" or "LOCAL" at the beginning of every trial. Participants then triggered the events presented in Figure 1B by pressing the spacebar on the computer keyboard. The initial mask was presented and, after 300 ms, was briefly replaced by T1, which could be either a global letter or a set of local letters. The letter was then replaced by the inter-target mask. After a delay, the second target (T2) was briefly revealed and then replaced by the final mask. Four stimulus onset asynchrony (SOA) values were used: 200, 400, 800, and 1600 ms. At the end of each trial, the observers reported (in a forced-choice format) both the T1 and T2 letters. According to the levels of the two targets, there were two types of same-level trials (global/global and local/local) and two types of different-level trials (global/local and local/global). These four transition types were presented randomly mixed within a single block. T2 accuracies (computed, here and throughout this article, only from trials with correct T1 recognition) were submitted to a repeated measures ANOVA with three within-subject factors: Mask-type (grid vs. noise), Transition-type (global/global, local/local, global/local, and local/global) and SOA (200, 400, 800, and 1600 ms). The Greenhouse-Geisser correction was used when appropriate in this and all subsequent analyses (Greenhouse and Geisser, 1959).
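The block design and the T2|T1 scoring rule can be sketched as follows: every combination of the four transition types and four SOAs is crossed, shuffled within one mask-type block, and cued with the word for the level of T1; T2 accuracy is then computed only on trials with a correct T1 report. The number of repetitions per cell (`reps`), the independent random choice of T1 and T2 letters, and the dictionary field names are hypothetical, since the text does not specify them.

```python
import itertools
import random
from collections import defaultdict

TRANSITIONS = ["global/global", "local/local", "global/local", "local/global"]
SOAS_MS = [200, 400, 800, 1600]
LETTERS = ["E", "H", "S", "P", "U"]


def make_block(mask, reps, seed=None):
    """One block for a given mask type, with all transitions x SOAs randomly mixed.

    reps (trials per cell) and the independent letter draws are assumptions.
    """
    rng = random.Random(seed)
    trials = []
    for transition, soa in itertools.product(TRANSITIONS, SOAS_MS):
        t1_level = transition.split("/")[0]
        for _ in range(reps):
            trials.append({
                "mask": mask,
                "transition": transition,
                "soa_ms": soa,
                "cue": t1_level.upper(),   # cue word shown at trial start
                "t1": rng.choice(LETTERS),
                "t2": rng.choice(LETTERS),
            })
    rng.shuffle(trials)                    # transition types interspersed in one block
    return trials


def t2_given_t1_accuracy(trials):
    """T2 accuracy per (mask, transition, SOA) cell, using only correct-T1 trials."""
    hits, n = defaultdict(int), defaultdict(int)
    for tr in trials:
        if tr["t1_resp"] != tr["t1"]:
            continue                       # trials with an incorrect T1 are not scored
        key = (tr["mask"], tr["transition"], tr["soa_ms"])
        n[key] += 1
        hits[key] += tr["t2_resp"] == tr["t2"]
    return {key: hits[key] / n[key] for key in n}
```

The resulting cell means, one table per subject, are what would enter the Mask-type × Transition-type × SOA ANOVA.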

Results and Discussion

The mean titrated durations for all stimulus types are shown in Table 1. They were significantly shorter [F_{(1, 9)} = 112.2, p < 0.0001] for global letters (60.1 ms) than for local letters (122 ms), and significantly shorter [F_{(1, 9)} = 9.6, p < 0.012] for grid masks (75 ms) than for noise masks (107 ms). The interaction between these effects was not significant.

TABLE 1

Table 1. Mean titrated durations of letters in Experiment 1.

Recognition of T1 was accurate for all types of trial in every subject (>85%; see Figure 3). The mean T2 accuracies as a function of Mask-type, Transition-type and SOA are shown in Figure 3. The Mask-type effect was not significant. Transition-type was highly significant [F_{(3, 27)} = 130.5, p < 0.0001, η² = 0.35], and SOA was also highly significant [F_{(3, 27)} = 93.046, p < 0.0001, ε = 0.586, η² = 0.12]. The interactions between Mask-type and Transition-type [F_{(3, 27)} = 224.720, p < 0.0001, ε = 0.699, η² = 0.32], between Transition-type and SOA [F_{(9, 81)} = 9.8462, p < 0.0001, ε = 0.416, η² = 0.042], and between Mask-type, Transition-type and SOA [F_{(9, 81)} = 6.7418, p < 0.001, ε = 0.372, η² = 0.024] were all highly significant. Planned comparisons showed that T2 accuracy was higher in same-level than in different-level trials (p < 0.0001).
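The ε values reported with the F statistics are Greenhouse-Geisser estimates of the departure from sphericity, which scale both degrees of freedom of an F test. A minimal NumPy sketch for a single within-subject factor follows, using ε = tr(S)² / ((k − 1) tr(S²)), where S is the covariance matrix of k − 1 orthonormal contrast scores. It assumes a complete subjects × levels table of cell means; interaction terms (such as those above) require contrasts built from products of the factors' contrasts and are not covered by this sketch.

```python
import numpy as np


def helmert_contrasts(k):
    """k x (k-1) matrix of orthonormal contrasts, each orthogonal to the grand mean."""
    m = np.zeros((k, k - 1))
    for j in range(1, k):
        m[:j, j - 1] = 1.0
        m[j, j - 1] = -float(j)
        m[:, j - 1] /= np.sqrt(j * (j + 1))
    return m


def greenhouse_geisser_epsilon(data):
    """Greenhouse-Geisser epsilon for one within-subject factor.

    data: (n_subjects, k_levels) array of cell means, e.g. T2|T1 accuracy per SOA.
    """
    data = np.asarray(data, dtype=float)
    k = data.shape[1]
    scores = data @ helmert_contrasts(k)               # (n, k-1) contrast scores
    s = np.atleast_2d(np.cov(scores, rowvar=False))    # their covariance matrix
    return float(np.trace(s) ** 2 / ((k - 1) * np.trace(s @ s)))


def corrected_df(epsilon, k, n):
    """Greenhouse-Geisser corrected numerator and denominator degrees of freedom."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)
```

Epsilon lies between 1/(k − 1) and 1, and equals 1 for a two-level factor, as in the F_{(1, 9)} tests reported with ε = 1.0. For example, the SOA effect above, F_{(3, 27)} with ε = 0.586, would be evaluated on about 1.76 and 15.8 degrees of freedom.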

FIGURE 3

Figure 3. T2 accuracy as a function of Mask-type, Transition-type and SOA. In this and all subsequent graphs, means and standard errors of target accuracy are plotted, and only trials with correct T1 identification were included in the calculation of T2 accuracy. Note that the corresponding T1 accuracies are plotted at SOA zero. In this and all subsequent figures, the following acronyms are used: GG (global/global), LL (local/local), LG (local/global), and GL (global/local).

An additional ANOVA, including only the same-level trials, exhibited a highly significant SOA effect [F_{(3, 27)} = 19.813, p < 0.0001, ε = 0.665, η² = 0.26]. For both masks, accuracy at 200 and 400 ms was significantly lower than the maximum reached at 1600 ms (each p < 0.0001). The interaction between Mask-type and Transition-type was also significant [F_{(1, 9)} = 6.1132, p < 0.04, ε = 1.0, η² = 0.057]. This was due to higher T2 recognition accuracy in global/global than in local/local trials, found only for the grid mask (p < 0.016).

Another ANOVA was performed including only the different-level trials. The effect of Mask-type was significant [F_{(1, 9)} = 6.378, p < 0.04, η² = 0.014], with slightly lower T2 accuracies for noise than for grid masks. Transition-type was not significant, whereas SOA was highly significant [F_{(3, 27)} = 95.769, p < 0.0001, ε = 0.932, η² = 0.234]. The interaction between SOA and Mask-type was also significant [F_{(3, 27)} = 4.829, p < 0.0081, ε = 0.929, η² = 0.009]. Importantly, the interaction between Mask-type and Transition-type was highly significant, with a strong effect [F_{(1, 9)} = 747.63, p < 0.0001, ε = 1.0, η² = 0.54]. This interaction was further analyzed with planned comparisons. With grid masks the attentional blink was significantly smaller in local/global than in global/local trials (p < 0.0001). The opposite pattern was found for noise masks (p < 0.0001). Finally, the interaction between Mask-type, Transition-type and SOA (see Figure 3) was highly significant [F_{(3, 27)} = 9.2549, p < 0.001, ε = 0.757, η² = 0.03]. This effect was due to a faster recovery from the attentional blink in local/global trials and a much slower recovery in global/local trials with grid masks, with the opposite pattern for noise masks. In this and subsequent experiments, we did not observe any level/identity binding errors (which would have resulted in reporting the targets in the wrong order in different-level trials).

In this experiment, we used level-specific letter presentation to eliminate ambiguity about the hierarchical level of the letters (which precluded level/identity binding errors), while carefully equating the ease of global and local letter recognition. Also, trial types were randomly interspersed within the same blocks, thus avoiding long-term biases that could have resulted from grouping transition types into blocks (as in Lopez et al., 2002; White et al., 2009). Finally, both grid and noise masks were used with the same participants (in contrast with Valdes-Sosa et al., 2014). With these additional controls, we were able to replicate and extend our previous findings.

Small, albeit significant, attentional blinks were found for same-level trials. This effect is perhaps similar to that found in mainstream attentional blink research (Dux and Marois, 2009). Much larger attentional blinks were replicated for different-level trials; with grid masks these were relatively short (approximately 0.5 s) for local/global trials but much longer (>1 s) for global/local trials. The opposite pattern was seen across different-level trials when noise masks were used. This confirmed the findings of Valdes-Sosa et al. (2014), but now with a statistically significant within-subject interaction between Mask-type and T2 level for different-level trials. No evidence for a lag-1 effect was found, although SOAs shorter than those used here would be needed to exclude this possibility.
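One simple way to summarize an attentional-blink curve, given T2|T1 accuracy at each SOA, is to take its depth relative to the longest SOA and the earliest SOA from which accuracy stays close to that baseline. The Python sketch below implements such a summary; the 1600 ms baseline and the 95% recovery criterion are hypothetical choices, not the analysis used in this article, which relied on ANOVAs and planned comparisons.

```python
def blink_summary(acc_by_soa, baseline_soa=1600, criterion=0.95):
    """Hypothetical summary of one attentional-blink curve.

    acc_by_soa: dict mapping SOA (ms) -> T2|T1 accuracy (proportion correct).
    Returns (magnitude, recovery_soa):
      magnitude:    baseline accuracy minus the lowest accuracy at a shorter SOA
      recovery_soa: shortest SOA from which accuracy stays >= criterion * baseline
                    (None if no such SOA exists)
    """
    soas = sorted(acc_by_soa)
    baseline = acc_by_soa[baseline_soa]
    shorter = [acc_by_soa[s] for s in soas if s < baseline_soa]
    magnitude = baseline - min(shorter) if shorter else 0.0
    recovery_soa = None
    for i, soa in enumerate(soas):
        # recovery requires accuracy to stay near baseline at this and all later SOAs
        if all(acc_by_soa[s] >= criterion * baseline for s in soas[i:]):
            recovery_soa = soa
            break
    return magnitude, recovery_soa
```

Applied separately to each transition type and mask, such a summary would express the contrast between short and long blinks described above as different recovery SOAs.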

Only a few studies have examined the attentional blink produced by traditional compound letters. One series of studies (Lawson et al., 1998, 2005; Crewther et al., 2007) presented multiple distracters in addition to a T1 distinguished by color, and a pre-designated symbol as T2. They found unusually long attentional blinks, lasting from 1.5 to 2 s, for all types of transitions. Puzzlingly, there was no reduction of the attentional blink in same-level trials. Their task may have been more difficult than ours due to the use of many distracters. Also, since there was no advance knowledge of T2 level, extra time may have been needed to resolve this uncertainty, as well as the response conflict inherent to traditional Navon figures. By using level-specific letter presentation, we were able to avoid both problems.

Findings closer to ours were reported by Srivastava et al. (2010). They used TSTT with traditional Navon figures, with the targets placed at two out of four spatial locations and followed by visual noise masks. The levels of T1 and T2 were pre-specified at trial start, but the location was forewarned only in their second experiment. In both experiments, same-level trials elicited very small attentional blinks, similar to our results. However, they did not find any difference in attentional blink magnitude between global/local and local/global trials in Experiment 1. This discrepancy could have arisen from the unpredictable T2 location in that experiment. Their Experiment 2 eliminated spatial uncertainty and elicited larger attentional blinks for local/global than for global/local trials, more in line with our findings with the noise masks. Again, the use of level-specific letter presentation probably allowed us to obtain cleaner estimates of attentional dwell times.

How can we explain why large attentional blinks were observed here for different-level trials but were largely absent for same-level trials? One possibility is the "attentional print" posited by Robertson (1996). She hypothesized that identifying a letter within a traditional Navon figure creates this print, which attracts attention to features typical of its hierarchical level. This could be achieved by facilitation of specific SF bands (see also Flevaris et al., 2010). In other words, stimuli would enduringly bias competition between SF bands in favor of their dominant spectral content. This model explains level-specific priming with traditional Navon figures, which consists of faster identification of a letter in one trial when it is presented at the same (relative to a different) level as in the previous trial (Robertson, 1996; Kim et al., 1999). This type of priming presents interesting analogies with our data. However, level-specific priming lasts a long time (on the order of several seconds) and is carried over between different trials. Our same-level facilitation (reflected by the absence of an attentional blink) occurs within the same trial and is present at much shorter intervals (on the order of hundreds of milliseconds). We will return to the relationship between the two phenomena in the next experiment. Nevertheless, an attentional print as described by Robertson could certainly produce the pattern of attentional blinks described in this experiment. This would be a form of feature-based selection.

On the other hand, our extended object-file theory also accommodates all the data from this experiment. If T2 is at the same level as T1, it can be assimilated as an update of the corresponding (recyclable) object-file. This is attentionally undemanding, so attentional blinks should be small for this type of trial. In contrast, on different-level trials a new object-file must be created for T2, which is attentionally very demanding. Since T1 sequesters attention, this entails large attentional blinks for this type of trial. The effects of Mask-type on T2 recognition would be mediated by the deleterious effect of masks on object-file formation, which would interact with the impoverished attention available at T2 presentation. Note that T2 recognition was most impaired when mask/letter SF overlap was larger. These assumptions were tested more directly in Experiment 3. Note also that the attentional print and object-file accounts are not mutually exclusive, and both make identical predictions about Experiment 1.

Experiment 2

The previous experiment showed that, at short SOAs, it was difficult to divide attention between successive compound letters if they occurred at different hierarchical levels. Furthermore, this difficulty was more pronounced when the mask/letter SF overlap for T2 was larger. In the present experiment we aimed to reproduce these findings with the following extensions. First, we tested whether the large attentional blink found in different-level trials was reduced when T1 was disregarded by the observers. This allowed us to assess how much of the T2 impairment was due to sensory interactions and how much to attentional factors. Second, the same test was used to determine whether the effect of Mask-type on attentional blink size found in Experiment 1 also depended on attention. Valdes-Sosa et al. (2014) found a reduced attentional blink with grid masks when T1 was ignored, but did not examine this with noise masks. Therefore, they could not statistically test the interaction of Mask-type with attention in a within-subject design.

Furthermore, we wanted to obtain more precise estimates of the duration of the attentional blinks. To achieve this, we used a staircase psychophysical procedure linking inter-target interval to accuracy in T2 identification (Watson and Pelli, 1983). Note that although we report attentional blink durations in SOA units, as is standard in the literature, we also analyzed the inter-stimulus intervals (ISIs) to take advantage of the procedure for titrating letter readout times introduced in Experiment 1. Recall that this procedure allowed us to adjust the duration of letter presentations to compensate for the slower readout of the local relative to the global level. If we assume that letter readout times were thereby held constant, then the ISIs provide a purer comparison of attentional dwell times. The staircase procedure had the additional advantage of reducing testing to a tolerable duration, since we needed to compare attentional blink durations within the same participants while varying trial types, masks, and attend/ignore-T1 conditions.
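To make the logic of an adaptive procedure of this kind concrete, the following is a minimal sketch of a QUEST-style Bayesian staircase (the family of methods described by Watson and Pelli, 1983) that adapts the T1-T2 SOA. It is not the authors' implementation. The Weibull psychometric function, the slope `beta=3.5`, the guess rate `gamma=0.25`, the lapse rate, the prior, and the SOA grid are all hypothetical values chosen for illustration. Each trial is placed at the posterior mode of the threshold, and the posterior is updated by multiplying in the likelihood of each response. With these parameters, the "threshold" is the SOA at which T2 accuracy reaches roughly 71%, which serves here as a proxy for attentional blink duration.

```python
import math
import random


def weibull_p_correct(soa_ms, threshold_ms, beta=3.5, gamma=0.25, lapse=0.02):
    """Hypothetical psychometric function: P(correct T2 report) rises with SOA.

    gamma is an assumed guess rate and lapse an assumed lapse rate.
    """
    return gamma + (1.0 - gamma - lapse) * (1.0 - math.exp(-((soa_ms / threshold_ms) ** beta)))


class QuestLikeStaircase:
    """Grid-based Bayesian staircase over the threshold SOA (illustrative only)."""

    def __init__(self, grid_ms, prior_mean_ms=500.0, prior_sd_log=0.5):
        self.grid = list(grid_ms)
        # Log-normal prior on threshold, stored as an unnormalized log-posterior.
        self.log_post = [
            -0.5 * ((math.log(t) - math.log(prior_mean_ms)) / prior_sd_log) ** 2
            for t in self.grid
        ]

    def next_soa(self):
        # Test the next trial at the current posterior mode.
        best = max(range(len(self.grid)), key=lambda k: self.log_post[k])
        return self.grid[best]

    def update(self, soa_ms, correct):
        # Multiply in the likelihood of the observed response for every candidate threshold.
        for k, t in enumerate(self.grid):
            p = weibull_p_correct(soa_ms, t)
            self.log_post[k] += math.log(p if correct else 1.0 - p)

    def estimate(self):
        # Posterior mean of the threshold (normalized in a numerically safe way).
        m = max(self.log_post)
        weights = [math.exp(lp - m) for lp in self.log_post]
        total = sum(weights)
        return sum(t * w for t, w in zip(self.grid, weights)) / total


def simulate(true_threshold_ms, n_trials, seed=0):
    """Run the staircase against a simulated observer with a known threshold."""
    rng = random.Random(seed)
    staircase = QuestLikeStaircase(range(50, 1501, 10))
    for _ in range(n_trials):
        soa = staircase.next_soa()
        correct = rng.random() < weibull_p_correct(soa, true_threshold_ms)
        staircase.update(soa, correct)
    return staircase.estimate()


if __name__ == "__main__":
    print(round(simulate(400.0, 200, seed=1)))
```

Because each trial is concentrated near the current threshold estimate, far fewer trials are needed than with a fixed set of SOAs, which is the practical advantage the text attributes to the staircase.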

Another change with respect to Experiment 1 is that the contrast of all stimuli was reversed and the energy of the noise mask was increased, to see whether these physical characteristics of the stimuli had any effect on attentional blink durations. Finally, a control was introduced to examine whether global/local selection could be based on zooming attention in (Stoffer, 1993) on a few selected locations in the visual field (e.g., near fixation). Here, for each local target, letters were unveiled only at a subset of locations within the stimulus matrix. Since these locations were randomly selected for each target, a strategy of monitoring only a few pre-selected placeholders would have made accurate identification of local letters very difficult.
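Why random unveiling defeats a fixed-monitoring strategy can be illustrated with a small probability sketch. The matrix size, the number of unveiled letters, and the number of monitored placeholders below are hypothetical, since they are not specified in this passage. If k of N placeholders are drawn at random for each target and an observer monitors a fixed set of m placeholders, the probability that at least one unveiled letter falls inside the monitored set is 1 - C(N-m, k)/C(N, k). The sketch computes this exactly and checks it by Monte Carlo simulation.

```python
import math
import random


def p_any_in_monitored(n_cells, n_monitored, n_unveiled):
    """Exact probability that a random k-subset of N cells hits a fixed m-cell set."""
    return 1.0 - math.comb(n_cells - n_monitored, n_unveiled) / math.comb(n_cells, n_unveiled)


def simulate_hits(n_rows, n_cols, monitored, n_unveiled, n_targets, seed=0):
    """Monte Carlo estimate: fraction of targets with >= 1 unveiled letter in `monitored`."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    hits = 0
    for _ in range(n_targets):
        # A fresh random subset of locations is unveiled for every local target.
        shown = set(rng.sample(cells, n_unveiled))
        if shown & monitored:
            hits += 1
    return hits / n_targets


if __name__ == "__main__":
    # Hypothetical 5 x 5 matrix, 5 unveiled letters, observer watching only the centre cell.
    print(p_any_in_monitored(25, 1, 5))
    print(simulate_hits(5, 5, {(2, 2)}, 5, 20000, seed=0))
```

Under these assumed values, watching only the central placeholder would capture any unveiled letter on just 20% of targets, so accurate local-letter identification would require spreading attention over the whole matrix.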