Contextual Cueing Effect in Spatial Layout Defined by Binocular Disparity

Zhao, Guang; Zhuang, Qian; Ma, Jie; Tu, Shen; Liu, Qiang; Sun, Hong-jin

doi:10.3389/fpsyg.2017.01472

ORIGINAL RESEARCH article

Front. Psychol., 31 August 2017

Sec. Cognitive Science

Volume 8 - 2017 | https://doi.org/10.3389/fpsyg.2017.01472

Guang Zhao¹*

Qian Zhuang¹

Jie Ma¹

Shen Tu²

Qiang Liu¹

Hong-jin Sun³

¹Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian, China
²Department of Psychology, Institute of Education, China West Normal University, Nanchong, China
³Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, ON, Canada

Repeated visual context induces higher search efficiency, revealing a contextual cueing effect, which depends on the association between the target and its visual context. In this study, participants performed a visual search task where search items were presented with depth information defined by binocular disparity. When the 3-dimensional (3D) configurations were repeated over blocks, the contextual cueing effect was obtained (Experiment 1). When depth information was randomized across repetitions of a configuration, visual search was not facilitated and the contextual cueing effect was largely abolished (Experiment 2). However, when we applied a small random displacement to the search items in the 2-dimensional (2D) plane but held the depth information constant, the contextual cueing effect was preserved (Experiment 3). We concluded that the contextual cueing effect was robust in the context provided by 3D space with stereoscopic information and, more importantly, that the visual system prioritized stereoscopic information in learning spatial information when depth information was available.

Introduction

In a natural environment, objects rarely exist in isolation. When we interact with objects in a familiar environment, other objects in the environment might serve as a context that allows us to process visual stimuli more efficiently. Previous research has shown that, in visual search, repeated configurations in which search items (the target and distracters) in a given scene appeared in constant locations over blocks could induce faster responses than random configurations in which the target was presented in a varying configuration. This contextual benefit of invariant configurations has been referred to as the contextual cueing effect (Chun and Jiang, 1998; Chun and Phelps, 1999; Chun, 2000; Olson et al., 2001; Kunar et al., 2006; Zhao et al., 2012). Interestingly, while the contextual cueing effect was obtained, participants were not aware of the repetition of the configurations, which indicated that an implicit learning mechanism was involved in the contextual cueing paradigm (Chun and Jiang, 1998; but see also Smyth and Shanks, 2008). It is generally considered that the contextual cueing effect arises from the acquisition of covariations between the location of the target and the locations of the distractor items. The investigation of the contextual cueing effect can help us better understand the learning of critical contextual information and how context guides the deployment of attention (Chun, 2000; Olson and Chun, 2002).

One distinct feature of the contextual cueing effect is said to be its ecological validity in real-world settings, since objects in the world are often present in a given context (Chun and Jiang, 1998; Chun, 2000; Chua and Chun, 2003; Torralba et al., 2006). In fact, invariant contextual information is ubiquitous in real-world 3D scenes. The classical paradigm reported by Chun and Jiang (1998) focused on the invariance in spatial relations between target and context in 2-dimensional space. Moreover, the stimuli used in most of the subsequent studies featuring contextual cueing paradigms involved only 2D displays and lacked depth information. As a result, the learning of spatial relations between target and distractor arrays in a 3-dimensional space has not been well studied.

Studies of visual perception have demonstrated the importance of depth information in the perception of objects and scenes. In fact, researchers have demonstrated that depth cues can override some salient 2D cues in influencing object recognition (Howard and Rogers, 1995; Liu and Ward, 2006; Akhavein and Farivar, 2017). However, the role of depth information remains unclear in the context of implicit learning. In the contextual cueing paradigm, it remains an open question whether the learning of the search context can be extended to the depth dimension.

Chua and Chun (2003) presented computer-rendered artificial scenes that used pictorial cues to give an impression of apparent depth. An array of objects was positioned on a ground plane and apparent depth information was provided through a linear perspective cue. Using such displays, they successfully obtained the contextual cueing effect. Note that in this case, objects further in depth along the z axis would appear higher along the y axis of the fronto-parallel plane (display screen). Therefore, the effect found here could in principle still be considered an effect in 2D space.

Binocular disparity is another important cue to depth (He and Nakayama, 1992; Qian, 1997; Finlayson et al., 2012). It is well known that the human brain can process information about depth using binocular disparities alone (Barlow et al., 1967; Johnston et al., 1994). If depth information defined by binocular disparity is introduced into the contextual cueing paradigm for repeated configurations, the invariant relation between target and context could be defined by the combined location information in the fronto-parallel plane and in depth (x, y, and z coordinates).

Kawahara (2003) attempted to investigate the contextual cueing effect with a stereoscopic display. The experiments presented stimulus items in two depth planes defined by binocular disparity and instructed participants to attend to items in one depth plane and ignore items in the other plane. This design was similar in principle to Jiang and Chun (2001), in which visual context was defined by items of a particular color while items of another color were ignored. In fact, both studies explored selective attention on the role of global and local contexts for the contextual cueing effect. In addition, in the study by Kawahara (2003), participants were required to group items into two parts by binocular disparity, but did not need to search across different depth planes. Real-world scenes are rich in depth information and spatially continuous rather than isolated in one depth plane (Wolfe, 1994). The task in Kawahara (2003), although defined by binocular disparity, was limited in spatial continuity in the depth domain and thus could still be considered a 2D task.

Tsuchiai et al. (2012) used a search task presented in stereoscopic depth to examine the contextual cueing effect in 3D space. There was no detailed description of the exact 3D manipulation in that study, but it is likely that the stimulus items were randomly scattered in different depth planes. They demonstrated a contextual cueing effect using this stereoscopic display; however, no attempt was made to examine whether stereoscopic information actually contributed to the contextual cueing effect.

It is important to point out that, for typical repeated scenes in a stereoscopic display, the invariant relation between target and context, in theory, can still be defined through 2D information (displacement on the x and/or y coordinate) alone. The 3D information (displacement on the z axis) of the items, if repeated, would be redundant in informing the structure of the layout.

In the present study, we set out to demonstrate again that the contextual cueing effect could be obtained in an invariant target-distractors association defined by both 2D and 3D location. More importantly, we went beyond what Tsuchiai et al. (2012) showed by examining whether invariant relation in depth between target and distractors was necessary for the contextual cueing effect when the disparity information of the distractors was available.

In Experiment 1, we varied the binocular disparity of the distracter items in both predictive and random displays and held the disparity information as well as 2D information constant for the predictive displays. The results demonstrated that the contextual cueing effect could be obtained in visual search in depth.

Although the results showed the contextual cueing effect in Experiment 1, the participants may have relied solely on 2D information. In Experiment 2, we examined whether participants actually used the disparity information. Specifically, we tested whether the contextual cueing effect would disappear when the depth information of distracter items was randomized across blocks in predictive configurations while the 2D information of the layout was held constant. The results showed no contextual cueing effect, suggesting that participants might have used the disparity information to learn the predictive displays in Experiment 1.

However, the variation of disparity information in Experiment 2 led to a small displacement along the horizontal axis in the 2D plane as well. Thus, in the predictive displays of Experiment 3, we introduced a comparable 2D displacement in the distracter items while maintaining the disparity information constant across blocks. The results showed that the contextual cueing effect re-emerged, suggesting that the lack of a contextual cueing effect in Experiment 2 was not due to the small 2D displacements created in the process of varying disparity. Consequently, the data suggest that participants indeed relied on disparity information, when it was available, to achieve the contextual cueing effect.

For the contextual cueing effect in a 2D display, it has been proposed that attentional guidance is the mechanism by which the repeated context guides participants’ attention toward the target (Chun and Jiang, 1998). The evidence for such a mechanism came from the slope and intercept of the RT × Set Size function when the set size of the search items was varied in the experiment. It was found that the slope was lower in repeated displays compared to random displays, but such a pattern was not seen for the intercept. However, results from other studies showed that this improvement in search efficiency has been less consistent (Kunar et al., 2007), suggesting that attentional guidance might not be the only mechanism for the contextual cueing effect.

The present study also investigated the mechanism of the contextual cueing effect generated in a 3D setting. We varied the set size of the configurations in all experiments. We fitted a line to the RT × Set Size function and analyzed the slope and intercept of the fitted line. Based on the predictions of previous studies, if contextual cueing was driven by attentional guidance, there would be a downward trend in the slope and the slope for repeated scenes would be lower; otherwise, if contextual cueing was driven by non-search factors (perceptual recognition processing or response selection processing), such a pattern of results would not be seen (Chun and Jiang, 1998; Jiang et al., 2005; Zhao et al., 2012).

Experiment 1

Methods

Participants

Twenty undergraduate students (9 males and 11 females, mean age = 21 years) participated in the experiment and received a small payment. All participants were right-handed, with normal or corrected-to-normal vision. They first performed a task to ensure they could perceive 3D structure with stereo goggles. Participants were naïve to the experimental hypotheses until they had completed the experiment. All participants gave written informed consent prior to the experiment, in accordance with the Declaration of Helsinki. The experimental protocol was approved by the Ethics Committee of Liaoning Normal University, China, and the methods were performed in accordance with the approved guidelines. One participant was excluded from the analyses because of incomplete data collection.

Apparatus and Stimuli

The experiment was conducted in a quiet and dark room (15 m × 7 m). Stimuli were projected onto a film screen using a rear-projection device (JVC projector DLA-SX21). The screen size was 246 cm × 182 cm and the image resolution was 1024 pixels × 768 pixels with a frame rate of 60 Hz. Participants were asked to keep their body steady and viewed the screen from a distance of 3 m. Participants wore stereo goggles that provided two 2D images with a horizontal offset to elicit the perception of 3D. To put participants at ease and to provide a better viewing posture, stimuli were presented on the lower part of the central axis of the screen; that is, the center of the stimuli was 150 cm above the floor.

Within each display, the target was a letter “T” rotated by 90° either clockwise or counter-clockwise, and the distracters were letter “L”s rotated randomly by 0, 90, 180, or 270°. The two lines in each stimulus item were of equal length (2.3°) with a line thickness of 0.1°. Two set sizes were used (7 distracters and 1 target for set size 8, and 11 distracters and 1 target for set size 12). All the search items were randomly distributed over an invisible array of 8 × 6 locations (x and y coordinates). The array grid subtended 34.4° × 25.8° of visual angle. To avoid the formation of collinearities among the stimulus items, the position (x and y coordinates) of each item had a slight random displacement within a range of [0°, 0.8°] along the vertical and horizontal axes. The background of the screen was gray, and the stimuli were always black. To produce stereoscopic depth, each search item was presented to the two eyes with a binocular disparity (z coordinate). The disparity of each search item was randomized within a range of [0.1°, 1°], yielding a perceived distance from the participant within a range of [3.27 m, 14.18 m]. An example stimulus is illustrated in Figures 1A,B.
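The relation between disparity and perceived distance can be illustrated with simple viewing geometry. The sketch below is not the authors' procedure: it assumes uncrossed disparity measured relative to the screen plane at the 3 m viewing distance, and an interocular distance of 6.5 cm, a value that is not reported in the text. Under these assumptions the near end of the range comes out close to the reported value, while the far end is very sensitive to the assumed interocular distance.

```python
import math

def perceived_distance(disparity_deg, viewing_distance_m=3.0, interocular_m=0.065):
    """Distance (m) of a point seen with uncrossed disparity relative to the screen.

    interocular_m is an assumed value; the article does not report it.
    """
    # Vergence angle needed to fixate a point on the screen plane.
    screen_vergence = 2.0 * math.atan(interocular_m / (2.0 * viewing_distance_m))
    # Uncrossed disparity reduces the vergence angle, placing the point farther away.
    point_vergence = screen_vergence - math.radians(disparity_deg)
    if point_vergence <= 0:
        raise ValueError("disparity too large: the lines of sight do not converge")
    return interocular_m / (2.0 * math.tan(point_vergence / 2.0))

near = perceived_distance(0.1)  # smallest disparity used in the experiment
far = perceived_distance(1.0)   # largest disparity used in the experiment
print(f"{near:.2f} m to {far:.2f} m")
```

With zero disparity the function returns the viewing distance itself, which is a quick sanity check on the geometry.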

FIGURE 1. A schematic illustration of the search display. (A) Stimuli presented to the two eyes, respectively. The stimulus presented to one eye is illustrated in black letters and that presented to the other eye is illustrated in gray letters. The d stands for the distance of binocular disparity for each stimulus item (the distances are examples of different depth ranges). For predictive configurations, the distances of elements remain constant across blocks in Experiments 1 and 3, but not in Experiment 2; for random configurations, all the distances of distracters are randomized. (B) Illustration of stereoscopic depth for the search display. (C) An illustration of disparity variation across blocks for predictive configurations in Experiment 2. B1 stands for the perceived depth location of the stimulus in one block, and B2 stands for the perceived depth location of the same item in another block. The image presented to the other eye (e.g., right eye) was shifted so that disparity was randomized within [0.1°, 1°]. The item is therefore perceived at different depths in B1 and B2. (D) An illustration of invariant disparity across blocks for predictive configurations in Experiment 3. The images of the item presented to the two eyes are translated synchronously. The item is therefore perceived at a different fronto-parallel position but at the same depth in B1 and B2.

Design

Three within-subject factors were included: configuration (predictive vs. random), epoch (epochs 1 to 7) and set size (8 vs. 12). There were two types of configurations, predictive and random. Each predictive configuration was presented once in a block and reoccurred in every block throughout the whole experiment. In particular, for the predictive configurations, both the 2D locations (x- and y-values) and the disparity values (z-value) of all the items were repeated across blocks. For the random configurations, both the 2D locations and the disparity of each item were randomized, except that the same set of target locations (x-, y-, and z-values) was used in all blocks. We also balanced, within and between configurations, possible target locations across the four quadrants and across eccentricities. The predictive and random displays had different sets (of equal number) of possible target locations.
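The difference between the two configuration types can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the authors' stimulus code: the function names, the dictionary layout, the example target cells, and the fixed random seed are all hypothetical, and item orientation and the balancing of target locations are omitted.

```python
import random

GRID_COLS, GRID_ROWS = 8, 6    # invisible 8 x 6 array of candidate cells
JITTER_DEG = 0.8               # maximum random offset per axis (deg)
DISPARITY_RANGE = (0.1, 1.0)   # disparity range (deg)

def make_item(cell, rng):
    """One search item: grid cell, x/y jitter, and binocular disparity."""
    return {
        "cell": cell,
        "jitter": (rng.uniform(0, JITTER_DEG), rng.uniform(0, JITTER_DEG)),
        "disparity": rng.uniform(*DISPARITY_RANGE),
    }

def make_display(set_size, target, rng):
    """A target plus set_size - 1 distracters placed in distinct cells."""
    free = [(c, r) for c in range(GRID_COLS) for r in range(GRID_ROWS)
            if (c, r) != target["cell"]]
    cells = rng.sample(free, set_size - 1)
    return {"target": target, "distractors": [make_item(c, rng) for c in cells]}

rng = random.Random(0)                      # hypothetical seed, for reproducibility
predictive_target = make_item((2, 3), rng)  # hypothetical target cells
random_target = make_item((5, 1), rng)

# Predictive: x, y, and disparity of every item are generated once and reused.
predictive = make_display(8, predictive_target, rng)

blocks = []
for _ in range(28):
    # Random: the target stays fixed, the distracters are redrawn in every block.
    blocks.append({"predictive": predictive,
                   "random": make_display(8, random_target, rng)})
```

Reusing the same `predictive` object in every block is what makes the x, y, and z values of that layout invariant, whereas only the target of the random display carries over between blocks.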

The entire experiment consisted of 28 blocks of 16 trials (8 random and 8 predictive trials, each comprising 4 scenes with a set size of 8 and 4 scenes with a set size of 12), for a total of 448 experimental trials. The two set sizes (8 and 12) were randomized within a block. To enhance statistical power in the data analysis, every 4 consecutive blocks were grouped and averaged into one epoch, which resulted in 7 epochs as the time window.
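The trial counts and the block-to-epoch grouping described above can be checked with a few lines of code. This is a bookkeeping sketch only; the variable and function names are hypothetical, and the within-block randomization of trial order is not modeled.

```python
N_BLOCKS = 28
BLOCKS_PER_EPOCH = 4
CONFIGS = ("predictive", "random")
SET_SIZES = (8, 12)
SCENES_PER_CELL = 4  # scenes per configuration x set size within a block

def epoch_of(block):
    """Map a 1-based block number to its 1-based epoch."""
    return (block - 1) // BLOCKS_PER_EPOCH + 1

trials = [
    {"block": b, "epoch": epoch_of(b), "config": c, "set_size": s, "scene": k}
    for b in range(1, N_BLOCKS + 1)
    for c in CONFIGS
    for s in SET_SIZES
    for k in range(SCENES_PER_CELL)
]

print(len(trials))         # 448
print(epoch_of(N_BLOCKS))  # 7
```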

Procedure

Each trial started with a centrally presented fixation cross “+” (500 ms), followed by the search display. Participants were asked to search for the target (left or right orientated “T”) and to respond upon detection as quickly and accurately as possible. Participants responded by pressing one of two keys: the “F” key for the left rotated target and the “J” key for the right rotated target. The trial would terminate when no response was detected for 10 s. After the participant had responded, a blank gray display was shown for 200 ms and then a screen with the word “Next” appeared for 200 ms indicating the onset of the next trial. Before the formal test, participants performed one practice block of 20 trials to become familiar with the task.

Results

The overall mean accuracy was 99.05%, and accuracy showed no significant effects of any condition (all p’s > 0.110). This pattern of results for mean accuracy was similar for Experiments 2 and 3; we therefore do not describe accuracy results for the subsequent experiments and focus mainly on the mean RT data in the data analysis.

For the mean RT data, trials with incorrect responses and RTs below 200 ms or above 4000 ms (representing less than 0.6% of all outliers and errors) were excluded. The mean RTs for predictive and random configurations across epochs for set sizes of 8 and 12 are shown in Figure 2 (left and right panels, respectively). The mean RTs were analyzed in a repeated measures ANOVA of 2 (configuration) × 7 (epoch) × 2 (set size). There were significant main effects of configuration [F(1,18) = 8.227, p = 0.01, η² = 0.314], indicating 39.19 ms faster RTs in predictive configurations than in random configurations; of epoch [F(6,108) = 17.194, p < 0.001, η² = 0.489], indicating a 172.84 ms reduction of RTs over time; and of set size [F(1,18) = 74.735, p < 0.001, η² = 0.806], indicating 147.92 ms faster search times for the smaller set size. Further, the two-way interaction of configuration × epoch was significant [F(6,108) = 4.976, p < 0.001, η² = 0.217]. Post hoc simple effects analysis demonstrated that predictive configurations needed longer search times in the first two epochs (p’s < 0.041), but the situation was reversed from the 4th epoch to the last epoch (p’s < 0.05), indicating a greater contextual benefit of predictive configurations as the epochs progressed. The two-way interaction of set size × epoch [F(6,108) = 4.251, p = 0.015, η² = 0.191] was also significant. Post hoc simple effects analysis demonstrated that RTs for set size 8 were significantly faster than for set size 12 in every epoch (p’s < 0.001), indicating that the larger set size required longer search times throughout the experiment. However, the two-way interaction of set size × configuration and the three-way interaction of configuration × epoch × set size were not reliable.
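The exclusion rule stated at the start of this analysis (incorrect responses, and RTs below 200 ms or above 4000 ms) can be sketched as a simple filter. The trial records below are invented for illustration and are not data from the study; the field names are hypothetical.

```python
def trim_trials(trials, low_ms=200, high_ms=4000):
    """Keep correct trials whose RT lies within [low_ms, high_ms]."""
    return [t for t in trials if t["correct"] and low_ms <= t["rt"] <= high_ms]

def mean_rt(trials):
    """Mean RT (ms) over a non-empty list of trials."""
    return sum(t["rt"] for t in trials) / len(trials)

# Invented example trials (not data from the study).
sample = [
    {"rt": 950, "correct": True},
    {"rt": 150, "correct": True},    # too fast: excluded
    {"rt": 4200, "correct": True},   # too slow: excluded
    {"rt": 1100, "correct": False},  # incorrect response: excluded
    {"rt": 1050, "correct": True},
]

kept = trim_trials(sample)
print(len(kept), mean_rt(kept))  # 2 1000.0
```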

FIGURE 2. (A) Mean correct RTs in each epoch for predictive (red) and random (green) configurations in set size 8 (Left) and set size 12 (Right) of Experiment 1. (B) Search slopes (ms/item) for predictive (red) and random (green) configurations over epoch of Experiment 1. (C) Intercepts (ms) for predictive (red) and random (green) configurations over epoch of Experiment 1.

To capture the overall learning effect, we analyzed the cueing effect: we collapsed RTs across the set size conditions and then computed the difference between predictive and random configurations. A one-way ANOVA on the cueing effect across the learning epochs showed a significant main effect of epoch [F(6,108) = 4.976, p < 0.001, η² = 0.217], indicating that a learning effect was obtained.
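The cueing-effect measure can be sketched as follows. The sketch assumes the effect is computed per epoch as the random minus the predictive mean RT after averaging over set size, so that positive values indicate a benefit for predictive displays; the sign convention, the function name, and the example values are assumptions for illustration, not data from the study.

```python
from collections import defaultdict

def cueing_effect(cell_means):
    """cell_means maps (config, epoch, set_size) -> mean RT in ms.

    Returns {epoch: random RT - predictive RT}, averaged over set sizes.
    """
    pooled = defaultdict(list)
    for (config, epoch, _set_size), rt in cell_means.items():
        pooled[(config, epoch)].append(rt)
    effect = {}
    for epoch in sorted({e for _, e in pooled}):
        rand = sum(pooled[("random", epoch)]) / len(pooled[("random", epoch)])
        pred = sum(pooled[("predictive", epoch)]) / len(pooled[("predictive", epoch)])
        effect[epoch] = rand - pred
    return effect

# Invented cell means for one hypothetical participant (ms).
example = {
    ("predictive", 1, 8): 1200, ("predictive", 1, 12): 1400,
    ("random", 1, 8): 1180, ("random", 1, 12): 1380,
    ("predictive", 7, 8): 1000, ("predictive", 7, 12): 1150,
    ("random", 7, 8): 1060, ("random", 7, 12): 1210,
}
effect = cueing_effect(example)
print(effect)  # {1: -20.0, 7: 60.0}
```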

We further examined how context influenced the efficiency of search using the search slope as a function of set size. Search slopes and intercepts were derived from each individual’s mean data. The slopes as a function of configuration and epoch are shown in Figure 2B, and the corresponding intercepts in Figure 2C. The slope data were analyzed in a repeated measures ANOVA of 2 (configuration) × 7 (epoch). There was a significant main effect of epoch [F(6,108) = 4.251, p < 0.001, η² = 0.191]. However, the main effect of configuration and the two-way interaction were not reliable. For the intercept data, the main effects of configuration and epoch were not reliable, but the two-way interaction of configuration × epoch was marginally significant [F(6,108) = 1.881, p = 0.091, η² = 0.095]. Note that the slope was greater in predictive configurations than in random configurations, whereas the intercept showed the opposite pattern.
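Because only two set sizes were used, the line fitted to the RT × Set Size function passes exactly through the two mean RTs, so slope and intercept follow directly from them. The sketch below illustrates that computation; the function name and the example RTs are hypothetical and are not data from the study.

```python
def search_function(rt_by_set_size):
    """Fit RT = intercept + slope * set_size to two (set_size, RT) points.

    Returns (slope in ms/item, intercept in ms).
    """
    (n1, rt1), (n2, rt2) = sorted(rt_by_set_size.items())
    slope = (rt2 - rt1) / (n2 - n1)
    intercept = rt1 - slope * n1
    return slope, intercept

# Invented mean RTs (ms) for one participant, configuration, and epoch.
slope, intercept = search_function({8: 1000.0, 12: 1160.0})
print(slope, intercept)  # 40.0 680.0
```

On the attentional guidance account, learning should lower the slope for predictive displays, whereas effects confined to non-search stages would appear in the intercept.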

Discussion

Experiment 1 examined whether the contextual cueing effect could take place in scenes containing stimulus items presented in different depth planes through binocular disparity. The results showed that RTs were significantly faster in predictive configurations than in random configurations as learning progressed, indicating that contexts defined by depth can induce contextual cueing. The results replicated the general pattern of results by Tsuchiai et al. (2012), suggesting that learned associations between visual context and target presented in 3D space can facilitate search performance.

The stimuli in both predictive and random configurations were scattered similarly on different depth planes by manipulating the binocular disparity of elements projected to the two eyes. Moreover, the possible target locations were also matched across the configurations. Thus, any difference in results should be attributed only to the repetition of the contextual information.

Experiment 2

Even though the results of Experiment 1 suggested that participants could learn the contextual items presented in different depths, this result cannot exclude the possibility that participants used only the spatial displacement between items in the fronto-parallel plane to learn the spatial layout. To investigate whether disparity information was indeed learned as part of the context in Experiment 1, in Experiment 2 the items in the predictive displays no longer maintained the same disparity information across blocks, even though the 2D displacement between items remained largely invariant.