From Foreground to Background: How Task-Neutral Context Influences Contextual Cueing of Visual Search

Zang, Xuelian; Geyer, Thomas; Assumpção, Leonardo; Müller, Hermann J.; Shi, Zhuanghua

doi:10.3389/fpsyg.2016.00852

ORIGINAL RESEARCH article

Front. Psychol., 07 June 2016

Sec. Cognitive Science

Volume 7 - 2016 | https://doi.org/10.3389/fpsyg.2016.00852

Xuelian Zang¹,²*

Thomas Geyer²

Leonardo Assumpção²

Hermann J. Müller²,³

Zhuanghua Shi²*

¹China Centre for Special Economic Zone Research, Research Centre of Brain Function and Psychological Science, Shenzhen University, Shenzhen, China
²General and Experimental Psychology, Department of Psychology, Ludwig-Maximilians-Universität Munich, Munich, Germany
³Department of Psychological Science, Birkbeck, University of London, London, UK

Selective attention determines the effectiveness of implicit contextual learning (e.g., Jiang and Leung, 2005). Visual foreground-background segmentation, on the other hand, is a key process in the guidance of attention (Wolfe, 2003). In the present study, we examined the impact of foreground-background segmentation on contextual cueing of visual search in three experiments. A visual search display, consisting of distractor ‘L’s and a target ‘T’, was overlaid on a task-neutral cuboid on the same depth plane (Experiment 1), on stereoscopically separated depth planes (Experiment 2), or spread over the entire display on the same depth plane (Experiment 3). Half of the search displays contained repeated target-distractor arrangements, whereas the other half was always newly generated. The task-neutral cuboid was constant during an initial training session, but was either rotated by 90° or entirely removed in the subsequent test sessions. We found that the gains resulting from repeated presentation of display arrangements during training (i.e., contextual-cueing effects) were diminished when the cuboid was changed or removed in Experiment 1, but remained intact in Experiments 2 and 3 when the cuboid was placed in a different depth plane, or when the items were randomly spread over the whole display but not on the edges of the cuboid. These findings suggest that foreground-background segmentation occurs prior to contextual learning, and only objects/arrangements that are grouped as foreground are learned over the course of repeated visual search.

Introduction

In everyday life, we constantly receive a massive amount of sensory input that would require an unrealistic amount of cognitive resources to be all processed. To ensure the functioning of higher-level mental processes, we benefit from sophisticated attentional mechanisms that help us select and process information that is important, and deselect information that is unimportant, for performing relevant tasks and ongoing actions (Treisman and Gelade, 1980; Wolfe, 2003). To illustrate, imagine a situation in which one searches for a car in a parking lot: search strategies would be different depending on whether one searches for a car in a global scene context (e.g., searching on the east side of the parking deck) or in a local configural context (e.g., searching for a car parked between two cars of the same model/brand). The specific ‘contexts’ in these scenarios would determine how and where attention should be deployed, thus ‘saving’ cognitive resources by processing only the most relevant information to the task at hand.

The interplay between scene-based and configuration-based context has been investigated in a number of studies using the ‘contextual-cueing’ paradigm (e.g., Brockmole and Henderson, 2006; Brooks et al., 2010; Kunar et al., 2013; Rosenbaum and Jiang, 2013). In the standard contextual-cueing task (e.g., Chun and Jiang, 1998; Chun, 2000; Pollmann and Manginelli, 2009a; Geringswald et al., 2012; Annac et al., 2013; Geyer et al., 2013), participants search for a ‘T’-shaped target amongst ‘L’-shaped distractors. Unbeknownst to participants, half of the search displays are presented repeatedly: in these ‘old’ displays, the locations of both the target and the distractors are kept constant across trials (though with target identity being variable). The other half of the search displays present novel item arrangements. In more detail, in these ‘new’ displays, the distractors change locations randomly across trials, while the target locations are nevertheless controlled to equate target location repetition between old and new displays. The common finding is that reaction times (RTs) are faster to targets in old compared to new spatial arrangements, an effect referred to as ‘contextual cueing’. Interestingly, when participants are asked about display repetitions in an explicit old-display recognition test at the end of the search experiment, they are typically unable to discriminate old from new displays to a level better than chance. This has led to the idea that contextual cueing is an implicit-memory effect, though the role of consciousness in contextual cueing has become a controversial issue recently (for a review, see Vadillo et al., 2015).

Since the seminal study of Chun and Jiang (1998), the contextual cueing paradigm has proven to provide a powerful tool in the investigation of visual search and attention. An important issue in the present context concerns whether contextual cueing is itself influenced by attention. Regarding this question, it has been proposed that perceptual segmentation – or visual grouping – regulates the acquisition of contextual memory traces. For example, some studies suggested that contextual cueing is determined by spatial grouping, evidenced by findings that only display items in the vicinity of the target are effectively acquired in contextual learning (e.g., Olson and Chun, 2002; Brady and Chun, 2007; Zang et al., 2015). Other findings (e.g., Brockmole et al., 2006; Kunar et al., 2006; Shi et al., 2013), by contrast, provide strong evidence in favor of the idea that global context is necessary for the cueing effect to occur (i.e., the observers form associations between the target and the entire distractor background). Discrepancies also arise in relation to featural grouping (e.g., Jiang and Chun, 2001; Jiang and Leung, 2005). On the one hand, under conditions in which the search items could be grouped based on common color (e.g., groups of black vs. white items), Jiang and Leung (2005) observed contextual cueing only when the distractors, as well as the target, in a specific (e.g., the white) color group appeared at identical locations (the locations of the distractors in the other color group, e.g., black distractors, were either maintained constant or varied) – suggesting that, by ‘default’, contextual target-distractor associations are formed within individual color groups (see also Geyer et al., 2010). Note that observers in Jiang and Leung’s (2005) study were explicitly instructed to search for a target defined by a pre-specified color (e.g., white), invoking a feature-based attentional set. 
Interestingly, the magnitude of the cueing effect in this ‘attended-old’ condition was comparable to cueing in a ‘both-old’ condition in which all, black and white, distractors appeared at identical locations. On the other hand, when presenting the distractors in different – ‘small’ and, respectively, ‘large’ – sizes, Conci et al. (2011) found contextual cueing to be reduced in the ‘grouping’ condition compared to the ‘standard’ condition in which all distractors were of the same size. This suggests that feature-based attention (to one or the other group of items) might even hamper contextual cueing. Thus, although manipulation of display features (e.g., color, size) does provide a promising tool for investigating the role(s) of feature-based attention and grouping for contextual cueing, the evidence available to date is rather mixed.

Findings from other studies that investigated attentional constraints in relation to scene context complicate the picture of the link between attention and contextual cueing even further. For instance, Brooks et al. (2010) examined contextual cueing in visual search arrays that were presented on the surface of a green ‘table’ located in the center of a real-world scene display, where the repetition of search array configurations and scene displays were manipulated independently. In this condition, both the configuration of the search items and the real-world scene (or, alternatively, either one but not the other) could in principle act as context cues for the search target. The results revealed a contextual cueing effect only in the ‘constant-configuration/variable-scene’ condition, but not in the ‘variable-configuration/constant scene condition’, which led Brooks et al. to propose a ‘configuration-dominant’ influence in contextual cueing. Nonetheless, a ‘scene-dominant’ effect was reported by Rosenbaum and Jiang (2013) when they presented the visual search display across the entire scene, including both central (foveal) and peripheral item locations. Participants were first trained on predictive displays containing both a scene and a search array configuration (i.e., the target location was consistently associated with the same scene and the same search array configuration), and then were tested with two types of search displays: a scene-predictive display, in which the target location was associated with the same scene but embedded in a different search array, and an array-predictive display, in which target location was associated with the same search array but a different scene. The results revealed reliable contextual cueing when the scene, but not array, was predictive, arguing in favor of a more important role for scene-based, as compared to configuration-based, context in contextual learning.

While, on the one hand, the studies reviewed above generally support the idea that contextual learning is subject to perceptual constraints, on the other hand they merely focused on the (relative) extent to which the acquisition of contextual cues is influenced by certain visual properties. Arguably, however, in addition to producing equivocal findings, these studies failed to provide a general view as to how spatial associations are formed in the first place, that is: what are the principles that determine the learning of target-distractor associations (e.g., configuration- vs. scene-based contextual cueing)? Here, we propose that spatial context learning is constrained by a more basic, yet fundamental process, namely: ‘foreground-background segmentation’, which governs how attention is deployed. Foreground-background segmentation has been shown to occur quite early in visual processing, prioritizing the foregrounded ‘candidate’ perceptual units for further processing (e.g., Baylis and Driver, 1992, 1993; Driver et al., 2001; Mazza et al., 2005). Accordingly, attention is biased toward the selected foreground, yielding an enhanced representation (and learning) of foregrounded items (Mazza et al., 2005). Importantly, processes of foreground-background segmentation are not limited to the search items, but rather involve the entire visual scene. On this view, determining the role of foreground-background segmentation may provide a unified account as to how grouping, scene-based information, and configuration-based information influence contextual cueing in visual search (Brockmole et al., 2006; Brooks et al., 2010; Kunar et al., 2013; Rosenbaum and Jiang, 2013). In a nutshell: we propose that the information that is selected as foreground determines the contextual cueing effect.

In order to validate our hypothesis, the current study investigated attentional constraints on context learning by examining scene-based interference in a conventional visual search task. In order to separate grouping effects from context learning, we conducted three experiments presenting visual search items (‘T’ and ‘L’s) together with a task-neutral (i.e., irrelevant for deciding on the required search response) context (a 2D projection of a 3D cuboid, see Figure 1) that was not predictive of the target location. The reason for choosing the cuboid as a task-neutral object was twofold: First, a cuboid object is larger and more salient than the individual search items. It serves as a ‘global shape’ stimulus, enabling us to examine the (novel) effects of global, 3D stimulus attributes on contextual cueing, in addition to the effects of semantic context (Brockmole and Henderson, 2006) or color context (Jiang and Leung, 2005; Geyer et al., 2010). Second, in the real world, visual search operates in 3D environments, and the learning of visual contexts could interact with 3D objects that may exist in the scene. Therefore, a task-neutral cuboid enables us to investigate the interactions between 2D items and 3D objects in contextual cueing. In Experiment 1, all visual search items were located on the edges of the cuboid, ensuring that the shape of the cuboid could be easily picked up as foreground information. In Experiment 2, by contrast, the cuboid was assigned the role of background by virtue of being presented on a different, distant depth plane to the search items. In Experiment 3, the visual search items were randomly spread over the whole display but not on the edges of the cuboid (i.e., with a weak association between items and the cuboid), thus assigning the cuboid to the background during visual search. Following an initial training session, in the test sessions, the cuboid was either rotated or entirely removed to examine possible effects of figure-ground segmentation on contextual cueing.

FIGURE 1. Stimulus configurations and schematic paradigm used in Experiments 1, 2, and 3. Left: possible positions (gray grids) for search items in the three experiments (for both the ‘upward-pointing’ and the ‘downward-pointing’ cuboid). The visual search items were presented on the edges of the pseudo cuboid on the same plane in Experiment 1 (A), projected onto the edges of the pseudo cuboid but presented on a different plane in Experiment 2 (B), and presented at display locations that did not fall on the edges of the ‘upward-pointing’ and ‘downward-pointing’ cuboids in Experiment 3. The grids, numbers, and the gray color were invisible during the actual experiments. The whole display subtended 13.2° × 13.2°. Right: (A) schematic illustration of the three sessions used in Experiment 1: the training session (blocks 1–28), the first test session (blocks 29–30), and the second test session (blocks 31–32). For the old item-based configurations, each target was paired with a particular, consistent distractor set, repeated once per block; for the novel item-based configurations, the target was paired with a newly generated distractor set for each presentation. The task-neutral cuboid was the same for both old and new displays: ‘upward-pointing’ during the training session, ‘downward-pointing’ during the first test session, and absent during the second test session. (B) Schematic illustration of Experiment 2. The visual stimuli used in Experiment 2 were the same as those used in Experiment 1, except that the pseudo cuboid was presented on a deeper depth plane, separated from the search items on the front plane. The schematic illustration was plotted from a viewing angle of -50° in order to show the depth information clearly. In the real experiment, participants wore 3D glasses (Optoma ZF2100) and viewed the display from in front of the visual search items, such that the search items were still on the edges of the cuboid, though in separate planes. (C) Schematic illustration of the three sessions in Experiment 3.

Our hypothesis was that foreground context would play a more important role in contextual guidance than background context, that is, the cuboid would influence contextual learning in Experiment 1 (in which, during learning, observers were unable to separate the cuboid and the search items), but not in Experiments 2 and 3 (in which depth segmentation was possible, or associations between search items and cuboid were weak, permitting the arrangement of the search items to be learned without reference to the cuboid object). Accordingly, we expected a decrease, if not complete abolishment, of the contextual cueing effect after the change (or removal) of the ‘foreground’ cuboid at the transition from training to test/transfer in Experiment 1, but not in Experiments 2 and 3. Alternatively, if processes of foreground-background segmentation do not affect contextual cueing, presenting the cuboid as foreground during learning (i.e., in Experiment 1) should not modulate contextual cueing in the subsequent test session.

Experiment 1

Experiment 1 investigated whether a task-neutral ‘cuboid’ context interpreted as foreground would be encoded in the memory representation underlying contextual cueing. Crucially, we examined whether an already acquired context (in the learning phase of the experiment) would still be used after a change or complete removal of the task-neutral cuboid in the test phase. To this end, the search items were randomly arranged on the edges of the cuboid (see Figure 1 for an example) such that the frame of the cuboid and the search items would be automatically co-located, or linked with each other, in the visual space. As shown by previous studies (Palmer, 1992; Palmer and Rock, 1994; Han et al., 1999), uniform connectedness is a strong factor in perceptual grouping and organization, occurring at a very early stage. Therefore, the task-irrelevant cuboid was expected to be grouped together with the task-relevant visual search items and, thus, be interpreted as foreground context.

Materials and Methods

Although contextual cueing is a stable effect observed repeatedly in previous studies (e.g., Chun, 2000; Goujon et al., 2015), it is important to note that some 30% of the participants may show no, or even negative, contextual cueing (Schlagbauer et al., 2012). As our aim was to examine how the change of the task-neutral cuboid affects contextual cueing, it was crucial to limit investigation of the transfer effect to only those participants who had already learned, and displayed a positive cueing effect in response to, the original (‘old’) displays before the cuboid variation. Since Experiment 1 (as well as Experiments 2 and 3) consisted of two stages, only those participants who exhibited a positive contextual cueing effect in the first, training stage continued on to the second, test stage (for the other participants, the experiment was terminated after the training stage). Two criteria were used to identify positive cueing effects: both the grand mean response times (RTs) over the whole training session and the mean RTs for the last epoch (see definition of ‘epoch’ below) had to be faster for old compared to new displays. This procedure has been used routinely in many other studies investigating transfer effects of contextual cueing (Conci et al., 2011; Conci and Müller, 2012; Zellin et al., 2013a,b).
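The two inclusion criteria can be written as a short check. The following is only a minimal sketch, not the authors' analysis code: the function name `is_positive_learner` is hypothetical, and taking the grand mean as the unweighted mean of the per-epoch means is an assumption made for illustration.

```python
from statistics import mean


def is_positive_learner(old_rts_by_epoch, new_rts_by_epoch):
    """Return True if a participant meets both inclusion criteria.

    Each argument holds one mean RT per training epoch for one participant.
    Assumption: the grand mean is approximated by the unweighted mean of the
    per-epoch means; the text does not specify how it was computed.
    """
    # Criterion 1: grand mean RT over training is faster for old displays.
    grand_faster = mean(old_rts_by_epoch) < mean(new_rts_by_epoch)
    # Criterion 2: mean RT in the last training epoch is faster for old displays.
    last_faster = old_rts_by_epoch[-1] < new_rts_by_epoch[-1]
    return grand_faster and last_faster
```

A participant who is faster on old displays overall, but slower on them in the last epoch, would fail this check and would not continue to the test stage.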

Participants

Eleven participants (eight females, mean age: 26 ± 4.54 years) took part in the training session, ten of whom (seven females, mean age: 26.5 ± 4.45 years) went on to complete the test session. Participants were paid 8 Euro per hour for their participation. The experiment was approved by the ethics committee of the Department of Psychology of LMU Munich.

Apparatus and Stimuli

The experiment was conducted in a sound-attenuated, dimly lit cabin (2.95 cd/m²). The visual displays were presented on a 21-inch LACIE CRT monitor, with a refresh rate of 100 Hz. The viewing distance was set at 57 cm, and kept constant with the use of a chin rest. The search displays comprised 12 search items (each 0.8° × 0.8° of visual angle in size and 24.24 cd/m² in luminance; the display background was gray: 6.33 cd/m²), consisting of one ‘T’-shaped target and eleven ‘L’-shaped distractors. Similar to previous studies (Jiang and Chun, 2001; Olson and Chun, 2002; Zang et al., 2015), the ‘L’ distractors had a small offset (0.12°) at the line junctions to make them more similar to the target ‘T’. The task-neutral object was a ‘pseudo’ cuboid (i.e., a cuboid projected onto a 2D plane, extending 12° × 12° of visual angle; see Figure 1 for an example), composed of nine white lines (24.24 cd/m²). Two cuboid orientations, ‘upward-pointing’ and ‘downward-pointing’, were used for the training and test sessions, respectively. The ‘square’ face of the upward-pointing cuboid was located in the upper-right quadrant, while the downward-pointing cuboid was created by rotating the (upward-pointing) cuboid 90° clockwise, so as to position the square face in the bottom-right quadrant (see Figure 1).
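For readers who wish to relate the visual angles given above to physical sizes on the screen, the standard conversion is size = 2 · d · tan(θ/2). The sketch below applies it at the 57 cm viewing distance stated in the text; the function name `deg_to_cm` is hypothetical, and the conversion assumes a flat screen with the stimulus centered on the line of sight.

```python
import math

VIEWING_DISTANCE_CM = 57.0  # viewing distance stated in the text


def deg_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical extent (cm) of a stimulus subtending angle_deg of visual angle.

    Assumes a flat screen and a stimulus centered on the line of sight.
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


item_size_cm = deg_to_cm(0.8)     # one search item (0.8 deg)
cuboid_size_cm = deg_to_cm(12.0)  # cuboid extent (12 deg)
```

At 57 cm, 1° of visual angle corresponds to very nearly 1 cm on the screen, which is why this viewing distance is a common choice.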

For each search display, the ‘L’ distractors were randomly rotated 0°, 90°, 180°, or 270° from the vertical midline, while the ‘T’ target was rotated 90° either clockwise or counter-clockwise, pointing to the right or to the left (and requiring a ‘left’ or, respectively, ‘right’ response). Both ‘T’ and ‘L’s were randomly placed at 36 possible locations inside an invisible 11 × 11 grid square area, with each location subtending 1.2° × 1.2° of visual angle. The 36 possible locations were selected on the edges, but not the vertices, of the trained cuboid (see left in Figure 1A). In this way, the position of the cuboid was strongly linked to the positions of the search items.

Procedure and Design

Participants were asked to discriminate the orientation of the target letter ‘T’ as fast and accurately as possible by pressing either the left or the right arrow key on the keyboard, using their left- and right-hand index fingers, respectively. Each trial started with the presentation of a central fixation cross for 800–1000 ms, which was immediately followed by a search display. The search display remained on the screen until a response was made or (in the absence of a response) until 10 s had elapsed. The next trial started automatically after a random interval of 1.0–1.2 s. As illustrated in Figure 1A, the experiment consisted of a 28-block training session, two 2-block test sessions, and a 3-block recognition session. Each block of 16 trials contained 8 old and 8 new displays, randomly intermixed. As for display generation, 16 possible target locations were generated for each participant: eight target locations for old and eight target locations for new displays. For old displays, both target and distractor locations (but not the target’s orientation) were kept constant across blocks, whereas for new displays only the target locations (again, not the orientation) were kept constant. By keeping target locations constant in both old and new displays, we equated target location repetition effects between these displays.
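The display-generation logic described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' stimulus code: all function names are hypothetical, the 36 possible locations are represented simply by indices 0–35, distractor orientations are omitted, and whether distractors were allowed to occupy another display's target location is not specified in the text (here they are).

```python
import random

N_LOCATIONS = 36  # possible item locations on the cuboid edges (from the text)
N_ITEMS = 12      # 1 target + 11 distractors per display
N_OLD = 8         # old displays per block
N_NEW = 8         # new displays per block


def sample_distractors(target, rng):
    """Draw 11 distinct distractor locations that exclude the target location."""
    free = [loc for loc in range(N_LOCATIONS) if loc != target]
    return rng.sample(free, N_ITEMS - 1)


def make_design(rng):
    """Pick 16 target locations; fix a distractor set for each old display."""
    targets = rng.sample(range(N_LOCATIONS), N_OLD + N_NEW)
    old_targets, new_targets = targets[:N_OLD], targets[N_OLD:]
    old_distractors = {t: sample_distractors(t, rng) for t in old_targets}
    return old_targets, new_targets, old_distractors


def make_block(design, rng):
    """Build one 16-trial block: 8 old and 8 new displays, randomly intermixed."""
    old_targets, new_targets, old_distractors = design
    trials = []
    for t in old_targets:
        # Old display: same target and distractor locations in every block.
        trials.append({"context": "old", "target": t,
                       "distractors": list(old_distractors[t]),
                       "orientation": rng.choice(["left", "right"])})
    for t in new_targets:
        # New display: target location repeats, distractors are redrawn.
        trials.append({"context": "new", "target": t,
                       "distractors": sample_distractors(t, rng),
                       "orientation": rng.choice(["left", "right"])})
    rng.shuffle(trials)
    return trials
```

Calling `make_design` once per participant and `make_block` once per block reproduces the key property of the design: target locations repeat equally often in old and new displays, so only the distractor configuration distinguishes the two conditions.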

During the training session, an upward-pointing cuboid, with search items presented on its edges (but not vertices), served as the task-neutral scene for both old and new displays. Since the very same cuboid was shown on each trial, it could not cue the target location in any better way for the old compared to the new displays. Therefore, any differences in RT performance between the old and new displays were attributable solely to either the configural context of the search items, or the interaction between the task-neutral cuboid and the search items. The cuboid was rotated by 90° clockwise in the subsequent test session, in both old and new displays, while the configural context of the search items (old displays) was held constant across the two sessions. With this variation, most of the visual search items (more than 88%) were no longer located on the edges of the rotated (downward-pointing) cuboid, thus clearly disrupting any spatial association between the task-neutral cuboid and the search array. In the second test session, the cuboid was entirely removed from the search display.

Once the search task was completed, participants performed three consecutive blocks of recognition trials, with an ‘upward-pointing’ cuboid, a ‘downward-pointing’ cuboid, and ‘no cuboid’, respectively. Participants were told that half of the displays were repeated displays from the search task, and their task was to decide whether or not they had already seen a given display in the previous search task (by pressing the left and right arrow keys to respond ‘yes’ and ‘no’, respectively). The display presentation lasted a maximum of 20 s (i.e., twice as long as the 10 s in the search sessions).

Prior to the experiment, participants practiced the experimental task with upward-pointing cuboids in one block of 16 trials. Only new display configurations were shown during practice. Participants were allowed to take a break between blocks of the experiment.

Results

Search Task

The data of all 11 participants (see Figure 2) were analyzed together for the training session, and those of the 10 participants who completed the whole experiment for the test and recognition sessions. Every 7 consecutive blocks in the training session were grouped into one ‘epoch’, forming 4 training epochs, and each test session (two blocks) was grouped into one epoch, forming epoch 5 (hereafter referred to as ‘test session I’) and epoch 6 (‘test session II’), respectively. The mean RTs of the 10 positive cueing learners, with epoch and context as factors, are shown in Figure 3.

FIGURE 2. Contextual cueing scores (RT differences between the new and old display) in the test session for individual observers in Experiments 1 (A), 2 (B), and 3 (C) respectively.

FIGURE 3. Mean RTs with associated standard errors are shown as a function of experimental epoch and display context (old, indicated by solid-diamond lines, vs. new, indicated by dash-dot lines) for Experiments 1 (A), 2 (B), and 3 (C). Epochs 1–4 were in the training session, while epochs 5 and 6 were the test sessions with the rotated or removed cuboid.

Trials with erroneous responses or ‘outlier’ RTs (shorter than 200 ms or longer than 3 SDs above the mean) were excluded from further analyses. Both the overall mean error and outlier rates of the training session were low (mean error rates: 1.00%; outliers: 2.27%). Note that the error/discard rates of the positive cueing learners were even lower in the test session (<1.00%; a similar result was also observed in Experiments 2 and 3). The error rates were comparable across all conditions: context, F(1,10) = 2.34, p = 0.16, $η_{p}^{2}$ = 0.19, epoch, F(3,30) = 1.47, p = 0.24, $η_{p}^{2}$ = 0.13, and interaction, F(3,30) = 1.79, p = 0.08, $η_{p}^{2}$ = 0.20. That is, accuracy did not improve significantly for any of the context conditions (old, new displays) over the training session.
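The exclusion rule can be illustrated with a short filter. This is a sketch only: the function name `filter_trials` is hypothetical, and computing the mean and SD over the correct trials of a single participant is an assumption, since the text does not state the reference set for the 3-SD criterion.

```python
from statistics import mean, stdev


def filter_trials(trials, min_rt=0.2, n_sd=3.0):
    """Return the RTs (in seconds) that survive the exclusion criteria.

    trials: list of (rt_seconds, correct) tuples for one participant.
    Assumption: mean and SD are computed over that participant's correct
    trials; the text does not specify the reference set.
    """
    # Drop trials with erroneous responses first.
    correct_rts = [rt for rt, correct in trials if correct]
    # Upper cutoff: 3 SDs above the mean RT.
    upper = mean(correct_rts) + n_sd * stdev(correct_rts)
    # Keep RTs of at least 200 ms that do not exceed the upper cutoff.
    return [rt for rt in correct_rts if min_rt <= rt <= upper]
```

Only the upper tail is trimmed relative to the mean; the lower bound is the fixed 200 ms limit, which is meant to catch anticipatory responses.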

Examining training performance of all participants recruited in the experiment, a 2 × 4 repeated-measures ANOVA on RTs with the factors context (old, new displays) and epoch (1–4) revealed significant main effects of context [F(1, 10) = 5.89, p < 0.05, $η_{p}^{2}$ = 0.37] and of epoch [F(1.48, 14.75) = 20.86, p < 0.01, $η_{p}^{2}$ = 0.68], as well as a significant context × epoch interaction [F(3,30) = 3.75, p < 0.05, $η_{p}^{2}$ = 0.27]. RTs were overall 180 ms faster for old compared to new displays, and 329 ms faster in epoch 4 compared to epoch 1. The interaction indicated that contextual cueing developed over the course of training. Additional post hoc tests confirmed that the contextual cueing effect reached significance in epochs 3 and 4 (p < 0.05), but not in epochs 1 (p = 0.27) and 2 (p = 0.18). Taken together, these results are indicative of both procedural learning, indexed by a general speeding-up of task performance across epochs (in all conditions), and contextual learning, that is, an RT advantage for old versus new displays, over the training session.

In the subsequent test sessions, the mean RTs of the ten positive cueing learners (in the training session) appeared somewhat faster for ‘old’ compared to ‘new’ displays. However, this numerical difference was significant neither in epoch 5 (test session I) nor in epoch 6 (test session II), as indicated by paired-sample t-tests: epoch 5, t(9) = 0.09, p = 0.93; epoch 6, t(9) = 1.4, p = 0.20. Additional JZS Bayes Factor (BF) analysis (Rouder et al., 2009) revealed a BF of 4.29 for epoch 5 and of 1.85 for epoch 6. According to Jeffreys (1961), a value greater than 3 provides solid evidence for the null hypothesis. Therefore, the result patterns in the two test sessions favor the null hypothesis (despite a non-significant trend for contextual facilitation in epoch 6). Thus, in summary, the results of the test sessions suggest that, although the cuboid itself was not predictive of the target location, it was nevertheless encoded in the representation driving contextual cueing. As a result, when the aspect of the foreground cuboid was changed (test session I) or when the cuboid was entirely removed (test session II), contextual facilitation was effectively abolished.

To examine the effect of cuboid change on RT performance, a further 2 × 3 repeated-measures ANOVA was performed for the last two training blocks and the test sessions, with context (old, new) and session (training, test session I, test session II) as factors. The results revealed no significant context effect, F(1, 9) = 3.15, p = 0.11, $η_{p}^{2}$ = 0.26, but a significant session effect, F(2,18) = 9.42, p < 0.05, $η_{p}^{2}$ = 0.51, and a significant context × session interaction, F(2,18) = 3.94, p < 0.05, $η_{p}^{2}$ = 0.31, the latter confirming the above finding that contextual cueing decreased significantly from the training to the test sessions: mean RTs to new [old] displays were 1.68 [1.30] s, 1.42 [1.42] s, and 1.35 [1.20] s with the upward-pointing (training session), the downward-pointing (test session I) and the no-cuboid condition (test session II), respectively. As can be seen, however, the reduction of cueing was mainly due to responses to new displays being expedited in the test sessions compared to the training session [by 260 ms in test session I, t(9) = 2.63, p < 0.05; and by 328 ms in test session II, t(9) = 3.51, p < 0.01]. In contrast, for the old displays, responses became significantly faster (compared to training) only when the cuboid was removed (98 ms), t(9) = 2.63, p < 0.05, while tending to be slower when the cuboid changed (121 ms), t(9) = 1.57, p = 0.15. The RT facilitation for new displays suggests that the search task became easier with the rotated, downward-pointing cuboid (or without cuboid) compared to search with the original, upward-pointing cuboid object. In other words, detection of the target on the foreground cuboid may be difficult as such, as reflected in slower RTs. RTs are expedited, in turn, as soon as the cuboid is ‘pushed’ to the background (recall that after the change, more than 88% of the search items no longer appeared on the edges of the cuboid, facilitating segregation of the search items [foreground] from the cuboid [now background]). Interestingly, with the backgrounded downward-pointing cuboid, RTs were longer compared to the no-cuboid condition, suggesting that the downward-pointing cuboid still causes a cost in processing time – perhaps attributable to the demands associated with keeping the irrelevant background out of the search.
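As a quick arithmetic check on the pattern described above, the contextual cueing effect (new minus old RT) can be computed per session from the reported means. The sketch below uses only the rounded values given in the text; the variable and function names are hypothetical. Because the inputs are rounded to two decimals, differences derived from them can deviate by a few milliseconds from figures the authors computed on unrounded data (e.g., the reported 328 ms speed-up for new displays).

```python
# Mean RTs in seconds as reported in the text: (new, old) per session.
reported = {
    "training (upward-pointing cuboid)": (1.68, 1.30),
    "test session I (downward-pointing cuboid)": (1.42, 1.42),
    "test session II (no cuboid)": (1.35, 1.20),
}


def cueing_ms(new_rt_s, old_rt_s):
    """Contextual cueing effect in ms: positive values mean old is faster."""
    return round((new_rt_s - old_rt_s) * 1000)


effects = {session: cueing_ms(new, old) for session, (new, old) in reported.items()}
# effects: 380 ms (training), 0 ms (test session I), 150 ms (test session II)
```

The drop from 380 ms to 0 ms and 150 ms mirrors the significant context × session interaction, while the session-wise change in the new-display means (1.68 s to 1.42 s and 1.35 s) shows that most of the reduction stems from faster responses to new displays.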

A similar expedition of responses would, in principle, also be expected for old displays; however, here the change or entire removal of the cuboid object, overlaid on a (relative to the training session) constant search item configuration, did affect the search performance. The net result would be that two opposing influences cancel each other out: on the one hand, facilitation of responses due to improved (in the downward-pointing cuboid condition) or no longer necessary (in the no-cuboid condition) foreground-background segregation; on the other, inhibition of responses due to partial cuboid-configural changes. This would effectively abolish the contextual cueing effect in the test sessions. Thus, in summary, the results indicate that the foreground task-neutral cuboid was learned together with the spatial context during contextual learning, and the rotation or removal of the cuboid in the test sessions abolished a well-established contextual cueing effect.

Recognition Results

Trials with RTs exceeding 20 s (i.e., on which participants failed to respond in time) were excluded from the analysis; this led to the removal of 0.38% of the data. For the 10 positive cueing learners who finished the whole experiment (training, test, and recognition sessions), the mean hit rates (i.e., old displays correctly identified as repeated) were 61.25, 53.75, and 60.54% in the three consecutive blocks, respectively, which were numerically higher than the false alarm rates (i.e., new displays incorrectly judged as old; 57.05, 53.04, and 48.75%, respectively). However, these differences were not significant: first recognition block, with displays including the upward-pointing cuboid, t(9) = 0.67, p = 0.52, JZS Bayes Factor = 3.64; second block, with displays including the downward-pointing cuboid, t(9) = 0.07, p = 0.95, JZS Bayes Factor = 4.29; third block, without cuboid, t(9) = 1.34, p = 0.21, JZS Bayes Factor = 1.98. As the power of each single (block) test may have been too low to reveal a significant level of explicit recognition, a criticism leveled by Vadillo et al. (2015) against many previous contextual-cueing studies, we collapsed the three recognition blocks to increase statistical power. Nevertheless, the results still revealed no significant effect: t(9) = 0.81, p = 0.44, JZS Bayes Factor = 3.18. A further participant-wise analysis failed to reveal a systematic correlation between recognition performance (d’) in the collapsed recognition blocks and the magnitude of contextual cueing in the last two test blocks, r = 0.05, p = 0.89. Thus, taken together, there was no evidence that contextual cueing in the current experiment was based on explicit memory of old displays.
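For readers unfamiliar with the sensitivity measure d’, the following Python sketch shows the standard signal-detection formula, d’ = z(hit rate) − z(false alarm rate), applied to the group-mean rates reported above. This is an illustration only: the analysis in the text was participant-wise, and d’ computed from group means need not equal the mean of individual d’ values. The function name and block labels are assumptions made for this sketch.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Standard signal-detection sensitivity: z(H) - z(FA)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Group-mean (hit rate, false alarm rate) per recognition block, from the text.
# Block labels are hypothetical names for this sketch.
blocks = {
    "upward_cuboid": (0.6125, 0.5705),
    "downward_cuboid": (0.5375, 0.5304),
    "no_cuboid": (0.6054, 0.4875),
}

dprimes = {name: d_prime(hit, fa) for name, (hit, fa) in blocks.items()}

for name, value in dprimes.items():
    print(f"{name}: d' = {value:.2f}")
```

A d’ of zero corresponds to chance-level discrimination between old and new displays. The values obtained from the group means are all small and positive, in line with the numerically higher but statistically non-significant hit rates.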

Discussion

The major finding of Experiment 1 was that contextual cueing was abolished when the cuboid, serving as a task-neutral context, changed its orientation or was removed in the test session. Two alternative reasons could explain the loss of the contextual-cueing effect. The first is that the search-guiding contextual associations acquired during the training session were established with reference to the cuboid, despite the fact that the cuboid itself was completely non-informative with respect to the target location. In other words, the cuboid was perceptually foregrounded and encoded together with the distractor configuration during contextual learning (‘foreground-learning’ alternative). As a result, contextual cueing was sensitive to the change of the cuboid object. Alternatively, the absence of contextual cueing in the test session may have been due to the change of the cuboid object as such. In this case, contextual cues were learned only in relation to the configuration of the search items; however, retrieval of the learned context was blocked by the salient change of the display, even though this change was task-irrelevant (‘blocked-retrieval’ alternative). The key difference between these two accounts is that the first, ‘foreground learning’, assumes that the foreground context, including the task-neutral cuboid, is learned in conjunction with the spatial-array context; by contrast, the second, ‘blocked retrieval’, holds that contextual memory is constructed solely on the basis of the spatial array, but that retrieval can be blocked by variation of the cuboid. To further dissociate these accounts, we ‘weakened’ the spatial association between the search items and the cuboid object by placing them in separate depth planes in Experiment 2. 
We hypothesized that placing the cuboid in a separate, more distant, depth plane than the search array would effectively assign the former to the background, permitting contextual learning of only the item configuration in the foreground. Thus, on the foreground-learning account, contextual cueing was expected to be evident regardless of the change (removal) of the cuboid object in the test phase of Experiment 2. The blocked-retrieval account, by contrast, would predict diminished contextual cueing following the cuboid change (removal), as already seen in Experiment 1.

Experiment 2

Experiment 2 was designed to examine whether a task-neutral context that is ‘segmented’ into the background by means of 3D depth cues would still be learned together with the item-based spatial context in the foreground during visual search. Previous work (Nakayama and Silverman, 1986) has shown depth to be less costly than other feature dimensions, such as color or motion, in search for a target defined by a cross-dimensional feature conjunction (e.g., searching for a white upward-moving target among black upward-moving and white downward-moving distractors). Targets ‘popped out’ of the display when they were defined by a conjunction of depth with color or depth with motion, but not when they were defined by motion and color. This suggests that depth provides a stronger grouping or segregation cue than color or motion, efficiently guiding observers’ attention to the relevant sub-group (or depth plane) that contains the target (while minimizing the interference from distractors in other depth planes). Thus, assuming that the task-neutral cuboid in Experiment 2 is effectively separated from the visual search plane, if contextual learning relies primarily on the foreground context, the cuboid should not be encoded into the learned, configural memory representation, and thus changing or removing it should not interfere with contextual cueing. Otherwise, if foreground-background segmentation does not affect contextual learning, the findings should be similar to those of Experiment 1.

The method in Experiment 2 was essentially the same as in Experiment 1, except that the trial displays were now shown in 3D (using a 3D projector presentation system). The major difference relative to Experiment 1 was that the 12 search items were shown on the front and the cuboid on the back plane of a 3D (stereoscopic) search display (see Figure 1B for examples). Importantly, the display arrangements were the same as in Experiment 1 when viewed monocularly.