Contextual Cueing Effect Under Rapid Presentation

In contextual cueing, a previously encountered context facilitates detection of the target embedded in it relative to when the target appears in a novel context. In this study, we investigated whether contextual cueing could develop at an early stage when the search display was presented only briefly. In four experiments, participants searched for a target T in an array of distractor Ls. The results showed that even with a rather short presentation time of the search display, participants were able to learn the spatial context and sped up their responses overall, with the learning effect lasting for a long period. Specifically, the contextual cueing effect was observed both with and without a mask following a 300-ms presentation of the search display. Such context learning under rapid presentation did not operate when only the local context information was repeated, suggesting that a global context was required to guide spatial attention when the viewing time of the search display was limited. Overall, these findings indicate that contextual cueing might arise at an “early” target selection stage and that the global context is necessary for context learning under rapid presentation to function.


INTRODUCTION
Despite the large amount of information that we encounter every day, we have acquired the ability to learn regularities from the environment. Chun and Jiang (1998) introduced a contextual cueing task, which proved to be a powerful tool to scrutinize the processes involved in environmental statistical learning. In their seminal study, Chun and Jiang (1998) asked observers to perform a visual search task in which they had to discriminate the direction of a "T"-shaped target item embedded in a set of "L"-shaped distractor items as fast and as accurately as possible. The "context" is defined by the spatial arrangement of distractors. Two types of displays, repeated and novel contexts (sometimes referred to as old and new contexts, respectively), were presented. In the repeated context, there was a stable relationship between the target and distractor locations, which was repeated across the experimental blocks. The novel context, on the other hand, was used as a control condition, in which distractor locations were determined randomly in every trial and their spatial locations could not predict the target location. It has been widely shown that search in the repeated context is faster than in the novel context (e.g., Chun and Jiang, 1998, 1999; Zheng and Pollmann, 2019; Vadillo et al., 2020b), leading to the suggestion that participants, through repeated encounters of old displays, form some incidental memory about the invariant spatial target-distractor relations in these displays, with this spatial context memory subsequently guiding selective attention more effectively towards the target location. Chun and Phelps (1999) proposed that context memory stores spatial/configural or more general relational information, independent of whether or not this information is acquired implicitly or explicitly.
Frontiers in Psychology | www.frontiersin.org | December 2020 | Volume 11 | Article 603520
On the other hand, context memory differs largely from other forms of explicit memory. For example, (i) it is fast to acquire: five repetitions of the search displays are enough to produce the contextual cueing effect (Chun and Jiang, 1998); (ii) it exhibits a large capacity, in that observers are able to form context memory for as many as 60 repeated displays (e.g., Jiang et al., 2005); and (iii) it is robust against interference over time and can last at least 10 days (e.g., Van Asselen and Castelo-Branco, 2009). However, the temporal properties of context memory are seldom investigated. For instance, how early does contextual cueing occur? Studies with event-related potential (ERP) methods indicate that context may be learned within a rather short time (Olson et al., 2001; Johnson et al., 2007; Schubo, 2009, 2010). For instance, Johnson et al. (2007) employed the typical contextual cueing paradigm in which participants were asked to detect the target "T" and discriminate its orientation among the distractor "Ls." They observed a facilitation in reaction times (RTs) for repeated relative to novel contexts, accompanied by an increase in the amplitude of the N2pc waveform beginning at approximately 200 ms post-stimulus. The N2pc component is defined as the difference in amplitude between the electrode sites contralateral and ipsilateral to the target and is a well-validated electrophysiological signature of the focusing of attention (Luck et al., 1997). The difference between repeated and novel contexts in the N2pc component provides direct evidence that contextual cueing leads to a greater probability of attention being directed to the visual hemifield containing the target (see also Schankin and Schubo, 2009). Thus, around 200 ms after the onset of the search context, participants could already make use of the learned contextual information to guide attention to the target location.
Earlier time differences were also observed with magnetoencephalography (MEG), which showed greater gamma activity to occur 100-300 ms earlier in the repeated than novel conditions (Chaumon et al., 2008).
Although these neurophysiological studies demonstrate a relatively rapid emergence of the contextual cueing effect, these findings have seldom been corroborated in behavioral work. In the classic study by Chun and Jiang (1998), the stimulus display was presented until participants responded to the target item, which enabled participants to form an effective connection between the target and (old) contexts. In the subsequent test phase, the stimulus display was presented for 200 ms only (in Experiment 5). The results showed that context memory could be successfully retrieved and could influence the behavioral response within 200 ms. Note that in this study, there was enough time to learn the stimulus display during the learning stage, as the presentation time of the search display was unlimited. Thus, it remains unknown whether it is possible to learn the context-target association when the search display is presented for only a limited time. In "pop-out" search, the characteristics of the single target (such as color and orientation) are different from those of the interfering stimuli, and thus participants direct their attention to the target location through bottom-up processing, responding to the target efficiently. The evidence from "pop-out" visual search showed that responses could be executed around 600 ms (vs. ca. 1,300 ms in the classical contextual cueing studies by Chun and Jiang, 1998, 2003) after the onset of the search context while also being facilitated by the repeated contextual information (see also Kunar et al., 2008; Geyer et al., 2010). Although these studies did not separate the learning time from the response time, their findings suggest that it is possible to learn the context within 600 ms of display time.
Moreover, evidence from eye-tracking studies (e.g.,  showed that when participants performed the standard T/L search task, the probability that the first saccade went to the target was higher on repeated than on novel displays, suggesting that contextual cueing affects behavior as early as the first saccade. However, it is also possible that contextual cueing cannot manifest within a short presentation time. Kunar et al. (2007) presented results that do not support an early onset of the contextual cueing effect. They manipulated the number of search items and measured the search slopes for the repeated and novel configurations but failed to find an improvement in search slopes for repeated over novel displays. Because they took the visual search slope (which reflects how response times change as the number of search items increases) to be a signature of attentional guidance, they interpreted the lack of a slope difference as evidence that response-level enhancement, rather than early attentional guidance, drives contextual cueing.
Thus, the goal of the present study was to examine directly whether there is an early behavioral gain reflecting the contextual cueing effect. Investigating this question will also help answer the important question of whether the context can be learned within a short exposure time. Previous work suggested that contextual cueing cannot be effectively used until search begins. Thus, a more straightforward method might be to manipulate the duration of the search display directly. To this end, we investigated, for the first time to our knowledge, whether contextual cueing effects can be acquired under rapid presentation of the search display.

EXPERIMENT 1
To investigate whether the spatial context could affect target detection with a rapid presentation of the search display, we employed the classical contextual cueing paradigm (Chun and Jiang, 1998) in which participants performed visual search for the target object "T" among distractor objects "Ls" (see Figure 1). Notably, the search contexts were presented for 500 ms only, in contrast with previous contextual cueing experiments in which the presentation time of the search display was unlimited (Chun and Jiang, 1998). Note that in a previous study with a pop-out search paradigm, a robust contextual cueing effect was observed, and participants' mean response times were around 550-650 ms (Geyer et al., 2010). Here, we set the display presentation duration (i.e., 500 ms) to be less than the response time threshold reported in the Geyer et al. (2010) study, to guarantee that no response could be made (i.e., participants could not finish the visual search task) within the display presentation time. The purpose was to exclude the role of response factors in the learning process of contextual cueing.

Participants
Fifteen naive participants with normal or corrected-to-normal visual acuity (14 females; mean age: 20.47 ± 1.88 years; all right-handed) were recruited from Hangzhou Normal University. The sample size was estimated by a power analysis using G*Power (Prajapati et al., 2010). In previous contextual cueing tasks, the effect sizes were relatively high (e.g., all ηp² values > 0.31 in Makovski and Jiang, 2011; Harris and Remington, 2017; Zhao et al., 2017). Here, we chose a medium effect size (ηp² = 0.25) to estimate the sample size, and the analysis yielded a sample size of 12 participants per experiment to reach a power of 95% at an α level of 0.05. To be more conservative, we recruited 15 participants for each experiment. The study was approved by the ethics committee of the Institutes of Psychological Sciences at Hangzhou Normal University. All participants gave written informed consent prior to the experiment and were paid ¥50 for their participation.
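As a side note on the power analysis, G*Power's ANOVA modules take Cohen's f rather than ηp² as the effect-size input, so a planned ηp² has to be converted first. The following is a minimal sketch of that conversion, using the standard relation f = sqrt(ηp² / (1 − ηp²)). The function name is ours, and the sketch does not reproduce the full sample-size calculation, which also depends on settings (number of measurements, their correlation, nonsphericity correction) that are not reported here.

```python
import math


def eta_p2_to_cohens_f(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f (hypothetical helper name)."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("eta_p2 must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


# Effect size assumed for planning in the text (eta_p^2 = 0.25).
f_planned = eta_p2_to_cohens_f(0.25)
# Lower bound of the effect sizes cited from earlier studies (eta_p^2 > 0.31).
f_previous = eta_p2_to_cohens_f(0.31)

print(f"planned f = {f_planned:.3f}, previous-studies f > {f_previous:.3f}")
```

The planned value is smaller than the lower bound taken from the earlier studies, which is consistent with the stated intention to plan conservatively.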

Apparatus and Stimuli
The experiment was conducted in a dimly lit room (ambient light: <1 cd/m²). Visual stimuli were presented on a 27-in. LCD monitor (1,920 × 1,080 pixels; 120 Hz). Stimulus presentation and response collection were programmed using MATLAB and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) on an ASUS computer. The distance between the eyes and the computer screen was about 54 cm, with participants' head position fixed by a chin rest. The background color was gray (luminance: 11.58 cd/m²), and the stimulus presentation area was divided into an invisible 10 × 10 matrix grid (subtending 12.53° × 12.53° of visual angle). Search items (subtending 0.85° × 0.85° of visual angle) appeared in 12 of the 100 square units, including 11 "L"-shaped distractors (rotated 0, 90, 180, or 270°) and 1 "T"-shaped target rotated 90° to the left or right. The stimuli were presented in white (luminance: 43.34 cd/m²). The number of items and the probability of the target location were equal in each of the four quadrants of the stimulus presentation area. The target never appeared in the central 2 × 2 units to prevent participants from looking at the target immediately after the display onset, as the participants were instructed to fixate the central cross before the display presentation. In addition, 24 locations in the four corners (six locations per corner) were not used as target locations to avoid extreme difficulty in the search task.
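The layout constraints described above can be illustrated with a short sketch. This is not the authors' MATLAB code; it is a Python illustration written under stated assumptions. In particular, the text does not say which six cells are excluded in each corner, so the sketch assumes a triangular corner region, and all function and field names are hypothetical.

```python
import random

GRID = 10                          # 10 x 10 invisible grid
ORIENTATIONS = (0, 90, 180, 270)   # distractor rotations in degrees


def quadrant(row, col):
    """Quadrant index 0-3 of a grid cell."""
    return (row >= GRID // 2) * 2 + (col >= GRID // 2)


def in_center(row, col):
    """Central 2 x 2 units, never used for the target."""
    return row in (4, 5) and col in (4, 5)


def in_corner(row, col):
    # ASSUMPTION: the six excluded cells per corner form a triangle
    # (cells whose row + column distance to the corner is at most 2).
    dr = min(row, GRID - 1 - row)
    dc = min(col, GRID - 1 - col)
    return dr + dc <= 2


def make_display(rng):
    """Draw one display: 1 target plus 11 distractors, 3 items per quadrant."""
    cells = [(r, c) for r in range(GRID) for c in range(GRID)]
    allowed = [rc for rc in cells if not in_center(*rc) and not in_corner(*rc)]
    target = rng.choice(allowed)
    distractors = []
    for q in range(4):
        free = [rc for rc in cells if quadrant(*rc) == q and rc != target]
        n = 2 if q == quadrant(*target) else 3   # target counts as one item
        for rc in rng.sample(free, n):
            distractors.append((rc, rng.choice(ORIENTATIONS)))
    return {
        "target": target,
        "target_side": rng.choice(("left", "right")),
        "distractors": distractors,
    }


display = make_display(random.Random(2020))
print(display["target"], len(display["distractors"]))
```

Under these assumptions each quadrant offers the same number of eligible target cells, so drawing the target uniformly from the allowed cells makes the four quadrants equally likely.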

Design and Procedure
The experiment contained 50 blocks, with 24 trials (12 repeated and 12 novel contexts) in each block, and participants could take a short break every two blocks. The trials with repeated and novel configurations were intermixed randomly in each block. Twelve repeated configurations were randomly generated at the beginning of the experiment and repeated across blocks, whereas 12 novel configurations were newly generated in each block. That is, for each repeated context, the locations of the target and distractors, as well as the distractors' orientations (but not the target's orientation), were kept constant throughout the experiment. For the novel context, except for the target's location (which was constrained to appear at a fixed location in each configuration), both the distractors' locations and orientations varied randomly at each presentation. The orientation of the target (left vs. right) was chosen randomly for each repeated and novel context to avoid potential learning of the target's features in visual search.
Each trial started with a central "+" fixation display lasting for 1,300-1,500 ms. Then the visual search display including target and distractors was presented for 500 ms. Participants were asked to respond to the target stimulus "T" as fast and accurately as possible by pressing the response keys (left and right arrow keys for the "T" that is tilted to left and right, respectively). Following the search display, a blank screen was presented for another 500 ms (see Figure 1). Participants could respond during the presentation time (1 s in total) of both the search display and the blank screen to make the response execution as fast as possible (based on a pilot experiment in which we found that most of the responses could be made within 800 ms). Before the start of the experiment, participants were required to perform a practice session including two blocks of 24 trials each (12 repeated and 12 novel). The stimulus displays were presented for 2,500 and 500 ms (same as the training session) in the two blocks to help participants get familiar with the task gradually (starting from an easy condition and then to a difficult condition). Note that all displays used in the practice session were never reused during the experimental phase. Most importantly, participants were not informed in any way that the spatial layout of some trials would be repeated, nor were they told to memorize the display layout.

Results
In order to improve the power of the statistical analysis, every five blocks were collapsed into one epoch, resulting in 10 epochs in total. Trials with missing or wrong responses were treated as error trials and were not included in the RT analysis. A repeated-measures analysis of variance (ANOVA) with the within-subject factors context (repeated and novel) and epoch (1-10) was conducted on the error rates and RTs. The Greenhouse-Geisser correction was applied when Mauchly's test indicated a violation of sphericity. The same analysis was applied to all the subsequent experiments. In addition, Bayes factors (BFs) were computed for those results that favored the null hypothesis using JASP software (Marsman and Wagenmakers, 2016). In the calculation process, the default Cauchy settings (i.e., r-scale fixed effects = 0.5, r-scale random effects = 1, r-scale covariates = 0.354) and Cauchy prior (scale = 0.707) were used in the ANOVA and t-test, respectively, to calculate BF. Specifically, BF10 was reported to indicate the extent to which the data support the alternative hypothesis (i.e., H1) as compared with the null hypothesis (H0). A BF value larger than 3 is considered to provide substantial evidence for the alternative hypothesis, while a BF less than 1/3 indicates substantial evidence for the null hypothesis (Wetzels et al., 2011).
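The preprocessing step described above (collapsing five blocks into one epoch and excluding error trials from the RT means) can be sketched as follows. The trial record fields (`block`, `context`, `rt`, `correct`) and the demo values are hypothetical and serve only to illustrate the bookkeeping; they are not data from the study.

```python
from collections import defaultdict
from statistics import mean

BLOCKS_PER_EPOCH = 5  # every five blocks form one epoch


def epoch_of(block):
    """Map a 1-based block number to a 1-based epoch number."""
    return (block - 1) // BLOCKS_PER_EPOCH + 1


def summarize(trials):
    """Error rate and mean correct RT per (context, epoch) cell."""
    cells = defaultdict(list)
    for t in trials:
        cells[(t["context"], epoch_of(t["block"]))].append(t)
    summary = {}
    for key, ts in cells.items():
        # Missing and wrong responses both count as errors and are
        # excluded from the RT mean.
        correct_rts = [t["rt"] for t in ts if t["correct"]]
        summary[key] = {
            "error_rate": 1 - len(correct_rts) / len(ts),
            "mean_rt": mean(correct_rts) if correct_rts else None,
        }
    return summary


# Made-up demo trials (rt in ms; rt=None marks a missing response).
demo = [
    {"block": 1, "context": "repeated", "rt": 600.0, "correct": True},
    {"block": 5, "context": "repeated", "rt": None, "correct": False},
    {"block": 6, "context": "novel", "rt": 700.0, "correct": True},
]
result = summarize(demo)
print(result)
```

The contextual cueing effect for an epoch would then be the novel-context mean RT minus the repeated-context mean RT for that epoch.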

Discussion
Experiment 1 showed that under a presentation time of 500 ms, the error rates were lower and RTs were shorter in the repeated compared with novel contexts, suggesting that the learning of the spatial context can facilitate target detection even when the search display is presented for a relatively short time. Moreover, there was a main effect of epoch for both error rates and speed, indicative of procedural learning as the experiment progressed (e.g., Shiffrin and Schneider, 1977). Note that the responses (mean RT = 654 ms) were much faster when the response time was limited to 1 s, compared with the mean RT (more than 1 s) in previous similar studies with unlimited presentation time of the search display and unlimited response time (i.e., the search display remained on the screen until the response; see, e.g., Chun and Jiang, 1998, 1999), which indicates that responses can be speeded by setting response boundaries. However, it seems that limiting the response time also made the task more difficult. The initial error rates were rather high, but the accuracy improved greatly after a period of training (error rates decreased from 35 to 16%). Given that a 500-ms presentation time is sufficient for learning the contextual information with mean RTs around 600-700 ms, it is possible that participants could already encode and extract the contextual information before 500 ms. Next, we reduced the presentation time to 300 ms to examine whether the contextual cueing effect would still occur.

EXPERIMENT 2
In Experiment 2, we further investigated whether the repeated spatial context could be learned when limiting the presentation time to 300 ms (which is approximately the duration of one fixation; Zhao et al., 2012). To this end, we changed the presentation time of the search display to 300 ms while keeping other properties the same as in Experiment 1. In addition, we provided three test sessions with the presentation time prolonged to 2,500 ms after the 10-epoch learning session, in order to test whether the contextual memory learned from rapidly presented displays could be transferred to normally presented displays (with longer presentation duration) and whether the search difficulty could be reduced when the presentation time was longer. The test sessions were conducted at three time points: right after training, 1 day after training, and 1 week after training.

Method
A new group of 15 participants (13 females; mean age: 20.2 ± 1.9 years; all right-handed) took part in the experiment.
The stimuli, design, and procedure in Experiment 2 were essentially the same as those in Experiment 1 except that the visual search display in the learning session (including 10 epochs of five blocks each) was presented for 300 ms and then the blank screen was presented for 700 ms. In addition, three test sessions with five blocks of 24 trials each were conducted right after training (see Figure 3, Epoch 11: Blocks 51-55), 1 day after training (Epoch 12: Blocks 56-60), and 1 week after training (Epoch 13: Blocks 61-65). Thus, in total, each participant received 1,200 trials in the learning phase and 360 trials in the test phase. The randomly generated novel configurations in the last 15 blocks of the learning session (i.e., Blocks 36-50) were reused in the 15 test blocks. In other words, the configurations of Blocks 36-50 in the learning session were identical to those in the three test sessions, to control for the possible confound that would arise if only the repeated context (but not the novel context) had been seen before the test phase. The duration of the search display in the test sessions was extended to 2,500 ms, and the blank screen lasted 500 ms.

Test Phase
In the three test sessions, participants were given enough search time (i.e., 2.5 s), which greatly decreased the error rates as compared with the learning session (see Figure 3A

Reaction Time
Learning Phase
The mean RTs for repeated and novel contexts as a function of epoch in the learning session are depicted in Figure 3B (Epochs 1-10). The mean RTs were comparable for the 500- and 300-ms presentation time conditions (

Test Phase
The mean RTs in the test session are depicted in Figure 3B  To further examine whether the contextual cueing effect observed in the test phase was due to new learning effect in the test phase or due to the transfer effect from the previous learning phase, a paired sample t-test was applied to compare the difference in RTs between repeated and novel contexts for the first block of each test session, given that the configurations in the first block were all presented once and thus not repeated yet. Moreover, the changes of presentation time (from 300 ms in the training session to 2,500 ms in the test session) would only influence the RT equally for the novel and repeated conditions in the first block of the test session (with comparable properties) but not influence their RT difference (i.e., contextual cueing effect), thus excluding the possible influence of the presentation time on the transfer of context cueing. The results revealed significant contextual cueing effect in the first block of all test sessions, Session 1: t (14)

Discussion
Experiment 2 showed that there were significant differences in error rates and RTs between repeated and novel contexts in the learning session, replicating the results of Experiment 1. Thus, contextual information can be learned under a rapid presentation of 300 ms. Most importantly, the contextual cueing effect was comparable between Experiments 1 and 2, suggesting that shortening the presentation time of the search display from 500 to 300 ms does not significantly impede contextual learning. Moreover, the contextual memory acquired under rapid presentation could last as long as 1 week, replicating previous studies showing that the contextual cueing effect is a long-term memory effect (Chun and Jiang, 2003; Jiang et al., 2005; Van Asselen and Castelo-Branco, 2009). Note that when the response time was limited to 1 s, participants tended to make a speeded response around 550-650 ms (in both Experiments 1 and 2), a duration comparable with the RT in pop-out search, where the search for the salient target is rather efficient (Geyer et al., 2010). In contrast, when the response limit was changed to 2.5 s (in Experiment 2), the RT was correspondingly extended. Unlike the RT, accuracy dropped substantially when the response limit was 1 s compared with 2.5 s. Thus, it appears that participants adopted a speed-accuracy trade-off strategy. It has been shown that searching for a "T" among distractor "Ls" involves serial processing (Treisman and Gelade, 1980; Duncan and Humphreys, 1989) and is a demanding search task that depends strongly on focused spatial attention to the display (Woodman and Luck, 2003). Thus, it is possible that with limited response time, it is more difficult for participants to correctly localize and identify the target.
Despite the increased task difficulty due to the limitation of the response time, we nevertheless found that context information could be learned and extracted to guide the attention more effectively.
Previous eye-tracking studies showed that the first saccade on average already landed closer to the target for repeated configurations than for new configurations (e.g.,  with an average fixation duration of up to 300 ms; Zhao et al., 2012). Given that attention can be guided to the general vicinity of the target within the initial fixations, it is possible that the contextual memory relies only on the local context of the target when viewing time is rather short. There is evidence that local invariances are important for successful contextual learning (Olson and Chun, 2002; Song and Jiang, 2005; Brady and Chun, 2007). For instance, Brady and Chun (2007) showed that when the repeated distractors (e.g., 2 "Ls") were locally positioned near the target, participants were able to acquire the context in the learning phase, suggesting that near-target invariant inter-element relations are important for contextual learning. However, other studies showed that the acquired cueing effects transferred from the learning to the test session only for search displays that maintained the global information, but not for displays that maintained only the local set of objects near the target (Brockmole et al., 2006), supporting the important role of global context in contextual learning and transfer (Kunar et al., 2006; Geyer et al., 2010; Rosenbaum and Jiang, 2013). More recent evidence showed that effective retrieval for search guidance required the availability of peripheral information (Zang et al., 2015). In the next experiment, we set out to address the question of whether the information learned within a 300-ms viewing time is global or local context.

EXPERIMENT 3
Experiment 2 showed that the contextual cueing effect could be observed with the search display presented for 300 ms. However, it is unclear how the contextual information could be learned and extracted when it is available for only a rather short time. Experiment 3 investigated whether the learning and retrieval of invariant display properties require the global structure of the context or whether the availability of the local structure of the context is sufficient for contextual cueing to manifest. To this end, in Experiment 3, we changed the repeated configurations so that only the local layouts (i.e., two distractor items within the target quadrant) were repeated across blocks, whereas the remaining distractor items in the other quadrants were located randomly across trials (see also Brady and Chun, 2007). The novel configurations were generated randomly across trials, as in Experiments 1 and 2. If contextual cueing relies on a global context, we would observe a null finding. However, if the local layout of the configuration is sufficient to guide attention to the target, we would observe a contextual cueing effect.

Method
A new group of 15 participants (13 females; mean age: 21.40 ± 0.46 years; all right-handed) took part in the experiment. The stimuli, design, and procedure in Experiment 3 (see Figure 4A) were essentially the same as those in the training phase of Experiment 2 except that in the repeated configurations, only the local layouts (i.e., two distractors and one target) were repeated across blocks, whereas the spatial locations of distractors in the other three quadrants varied randomly across trials (see Figure 4B). In addition, each of the four quadrants was equally likely to contain the local repeated configuration. In the local layout, the target and two nearby distractors were presented within a view window of 6.27° × 6.27°, whereas the whole stimulus presentation area (12.53° × 12.53° of visual angle) was kept the same as in the previous experiments.

Reaction Time
The mean RTs for repeated and novel contexts as a function of epoch are shown in Figure 5B. When RTs were analyzed from the second epoch onward, a significant main effect of epoch was observed, F(3.735, 52.289) = 3.780, p = 0.010, ηp² = 0.213, with RTs decreasing by 34 ms from Epoch 2 (640 ms) to Epoch 10 (606 ms), indicating a procedural learning effect. This might be due to a different search strategy from Epoch 1 to Epoch 2, as indicated by higher error rates in Epoch 1 (39%) than in Epoch 2 [28%; t(14) = 3.129, p = 0.007, Cohen's d = 0.808]. However, due to the high error rates, participants' RTs in the first epoch may not provide enough statistical power.

Discussion
In Experiment 3, the contextual cueing effect was not observed when only the local layout of the repeated configuration was repeated (with the remaining distractors randomly distributed across trials) with a presentation time of 300 ms. Note that the size of the stimulus display used in the current study was comparable with that in the Zang et al. (2015) study, where a view window of 12° made the peripheral information available and thereby enhanced contextual retrieval. They argued that additional information from the periphery (outside the 8° area) likely contributes to optimizing (online) saccadic path planning, which is important for retrieving the learned spatial inter-element relations from contextual memory. In the present experiment, no contextual cueing effect occurred when repeating the local configuration consisting of just two to three items within a view window of 6.27° while changing the peripheral information in the rest of the presentation area. Thus, even with visible peripheral information beyond the local layout, if the peripheral information was not invariant, there was still no context-based search guidance. These results indicate that global contextual information is required for the contextual cueing effect to manifest with a 300-ms viewing time of the search display.

EXPERIMENT 4
The previous experiments showed that context can be learned under a rapid presentation of 300 ms. However, it is possible that the learning of context also occurs via an internal representation after the display disappears, due to effects of visual persistence (Coltheart, 1980). In Experiment 4, we employed backward masking of the search displays to limit the processing time to 300 ms. Moreover, we introduced a recognition test at the end of the experiment to examine whether participants were aware of the repeated configurations.

Methods
In Experiment 4, a new group of 15 participants (10 females; mean age: 21.07 ± 0.37 years; all right-handed) was tested. Experiment 4 was essentially the same as Experiment 2, except that after the search display disappeared, the blank screen of Experiment 2 was replaced by a mask display presented for 700 ms (see Figure 6). The masking stimuli were composed of 100 white lines with random orientations, one presented in each cell of the stimulus presentation area (the invisible 10 × 10 matrix grid; see Experiment 1). Participants could respond to the target "T" from the onset of the search display until the offset of the mask display. The response time was thus also limited to 1 s (from the onset of the search display until the end of the masking stimuli). After the experiment, a recognition task (with 24 trials) was carried out. Each trial started with a central "+" fixation display lasting for 1,300 to 1,500 ms. Then the visual search display including target and distractors was presented for 2,500 ms. Participants were instructed to press the left arrow key if they felt that they had seen this configuration (i.e., a repeated display) in the earlier visual search blocks or the right arrow key if they judged this configuration to be a novel display. The repeated displays that had been presented in the earlier search blocks (i.e., 12 displays) and 12 newly generated configurations (that had never appeared before) were randomly intermixed across trials.

Reaction Time
The mean RTs for repeated and novel contexts as a function of epoch are shown in Figure 7B. Next, we compared the performance between the experiments with (Experiment 4) and without masking stimuli (Experiment 2). To this end, we conducted a 2 (experiment: Experiment 2 vs. Experiment 4) × 2 (context: repeated vs. novel) × 10 (epoch: 1-10) mixed ANOVA, which showed that there was no significant difference between the two experiments, F(1, 28) = 2.098, p = 0.159, ηp² = 0.070, BF10 = 0.945, indicating that the mean RTs were comparable between the two experiments (M = 626 ms for Experiment 2 and M = 574 ms for Experiment 4). None of the interactions with the factor experiment were significant, all Fs ≤ 1, all ps > 0.370, all ηp² values < 0.040, all BF10s < 0.190. An independent-samples t-test was further carried out on the contextual cueing effect averaged across Epochs 1-10 in Experiment 2 and Experiment 4. The results showed that there was no significant difference in the averaged contextual cueing effect between Experiment 2 (M = 16.18 ms, SE = 5.98 ms) and Experiment 4 (M = 9.17 ms, SE = 6.01 ms), t(28) = 0.827, p = 0.415, Cohen's d = 0.302, BF10 = 0.446. Thus, it appears that the size of the contextual cueing effect was comparable regardless of whether masking stimuli were present.
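The between-experiment comparison of the cueing effect can be checked from the reported summary statistics alone. The sketch below assumes that each reported SE is the standard error of the group mean with n = 15 per experiment; with equal group sizes, the pooled-variance t statistic reduces to the mean difference divided by sqrt(SE1² + SE2²). The function name is ours.

```python
import math


def t_and_d_equal_n(m1, se1, m2, se2, n):
    """Pooled-variance t and Cohen's d for two groups of equal size n.

    ASSUMPTION: se1 and se2 are standard errors of the group means,
    so each group's SD is se * sqrt(n).
    """
    diff = m1 - m2
    t = diff / math.sqrt(se1 ** 2 + se2 ** 2)
    pooled_sd = math.sqrt(n * (se1 ** 2 + se2 ** 2) / 2)
    d = diff / pooled_sd
    df = 2 * n - 2
    return t, d, df


# Cueing effects (ms) reported for Experiment 2 and Experiment 4.
t, d, df = t_and_d_equal_n(16.18, 5.98, 9.17, 6.01, n=15)
print(f"t({df}) = {t:.3f}, d = {d:.3f}")  # prints "t(28) = 0.827, d = 0.302"
```

Under these assumptions the recomputed values agree with the reported t(28) = 0.827 and Cohen's d = 0.302.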

Recognition Task
We examined participants' recognition performance by means of the recognition sensitivity d′ [d′ = Z(hit rate) − Z(false-alarm rate); Green and Swets, 1966]. A hit means that participants correctly judged a "repeated" configuration as "old," while a false alarm means that they incorrectly judged a "novel," random configuration as "old." The hit and false-alarm rates were 53 and 48%, respectively. The mean d′ was 0.13 (SE = 0.19) and not significantly different from zero, t(14) = 0.679, p = 0.508, Cohen's d = 0.175, BF10 = 0.321, indicating that participants did not have explicit memory for repeated context.
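The sensitivity measure can be illustrated with a short sketch. The helper names are hypothetical, and the calculation uses the group-averaged hit and false-alarm rates, whereas the reported mean d′ was presumably computed per participant and then averaged; the two need not coincide exactly. The one-sample test is likewise recomputed from the rounded mean and SE, so small deviations from the reported t value are expected.

```python
import math
from statistics import NormalDist


def d_prime(hit_rate, fa_rate):
    """d' = Z(hit rate) - Z(false-alarm rate), with Z the inverse
    standard normal CDF. Rates of exactly 0 or 1 are not handled."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def one_sample_t(mean, se, n):
    """One-sample t against zero, df, and Cohen's d from mean and SE."""
    t = mean / se
    return t, n - 1, t / math.sqrt(n)


print(f"d' from averaged rates: {d_prime(0.53, 0.48):.3f}")
t, df, d = one_sample_t(0.13, 0.19, 15)
print(f"t({df}) = {t:.3f}, Cohen's d = {d:.3f}")
```

With hit and false-alarm rates of 53 and 48%, the group-level d′ is close to the reported mean of 0.13, which is consistent with recognition performance near chance.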

Discussion
In Experiment 4, under a rapid presentation time of 300 ms, we employed backward masking to block further visual processing after the offset of the search display. The results showed a contextual cueing effect, but it emerged only at a late stage of learning. The mean response time and averaged contextual cueing effect under backward masking were comparable with those in the condition where the internal visual processing was not blocked. Furthermore, the post-experimental recognition test revealed that participants' ability to distinguish repeated (old) from novel (new) displays was only at chance level, suggesting that the acquired contextual memory could be implicit. It is important to note that the recognition session comprised only 24 trials, which may not provide enough statistical power for a firm conclusion; several studies have discussed such power problems in recognition tests of contextual cueing (see, e.g., Smyth and Shanks, 2008; Vadillo et al., 2016).

GENERAL DISCUSSION
The current study investigated whether the contextual cueing effect could be observed when the search context was presented briefly. Specifically, the search stimuli were presented for 500 ms in Experiment 1, for 300 ms in Experiment 2, and for 300 ms followed by a mask in Experiment 4. The results showed that participants were able to learn the spatial context within a short presentation time, leading to faster search responses for repeated than for novel contexts. Moreover, the learning effect acquired under a 300-ms presentation time could last as long as 1 week (as shown in Experiment 2), similar to the contextual cueing effect obtained under unlimited presentation time (Chun and Jiang, 2003). In addition, we showed that such context learning under rapid presentation required the availability of global rather than local context information (Experiment 3). Furthermore, post-experimental recognition tests revealed that participants' ability to distinguish repeated from novel displays was only at chance level, indicating that contextual cueing is mediated by implicit memory representations. Taken together, the results provide initial evidence that context can be learned and used to guide attention effectively within a rather short time.
Previous behavioral studies showed that context memory could be successfully retrieved within a 200-ms presentation time in a test phase that followed an initial learning phase with unlimited presentation time (Chun and Jiang, 1998). The present study showed that the contextual information could also be learned effectively within a limited time as short as 300 ms. This suggests that contextual cueing might occur rather early (before 300 ms) in the search process, which is also supported by the finding that the overall search RTs were greatly reduced when the display presentation time was limited to 300 ms compared with when it was extended to 2,500 ms (in Experiment 2). This finding is in contrast with previous behavioral studies, which suggested a slow time course of contextual cueing (Kunar et al., 2007, 2008). For instance, Kunar et al. (2008) found that search slopes were shallower in the repeated than in the novel condition, but only when the overall search took longer with slowed search RTs; otherwise, there was no difference in the search slopes between the repeated and novel conditions when the number of the search items was varied (Kunar et al., 2007). Therefore, they concluded that the cueing benefits might arise "late" in processing, i.e., at the response selection stage. In contrast, the present study provided evidence of behavioral gains at an early time, which is consistent with neurophysiological indices showing that spatial attention diverges between repeated and novel displays as early as 100-200 ms after display onset (e.g., Johnson et al., 2007; Chaumon et al., 2008; Schankin and Schubo, 2009). Accordingly, it is possible that contextual cueing influences an "early," target selection stage (e.g., Chun and Jiang, 1998; Johnson et al., 2007). That is, contextual cueing arises because observers learn the predictive structure of the search environment by associating the positions of distractors in repeated displays with the location of the target, thus promoting search efficiency (in line with the attentional-guidance account of Chun and Jiang, 1998).
The original contextual cueing paradigm showed that observers implicitly learn the repeated configuration of targets in visual search tasks and that this context can serve to cue the target location and facilitate search performance in subsequent encounters (Chun and Jiang, 1998). Thus, the process of searching through distractors to find the target is crucial for contextual cueing (Olson and Chun, 2002). Interestingly, there is evidence that repeating only the locations of the items in the target's quadrant produces as much contextual cueing as does repeating the entire display (Brady and Chun, 2007). In Brady and Chun's study, only the locations adjacent to the target were attended and incorporated into learning, suggesting that the contextual cueing effect relies on the local context of the target. However, we found that with a 300-ms presentation time, repeating only the distractor locations inside the target quadrant did not produce a contextual cueing effect (in Experiment 3), indicating that the learning of rapidly presented contextual information did not incorporate local configuration information. Combined with the results of Experiments 2 and 4, which revealed a significant contextual cueing effect when the whole display was repeated, this suggests that participants learned the global context and adopted a global search mode.
It should be noted that the stimulus presentation area (with a visual angle of 12.53° × 12.53°) was much smaller in our study than that in Brady and Chun's study (the entire screen). Correspondingly, the size of a single stimulus was also much smaller in the present study (0.85° × 0.85° relative to 1.8° × 1.8° in Brady and Chun's study). The difference in the size of the stimulus presentation area (and of the letters) might be the critical factor that leads to the difference in the search mechanism. That is, with a smaller presentation area and briefly presented stimuli, it is easier for participants to encode the global configuration without the need to frequently shift their attention from one stimulus to another in order to locate the target. Alternatively, given that Brady and Chun (2007) did not limit the presentation time of the search display, it is possible that, with enough viewing time to process the local context, the local context could guide attention to the target location as well. In line with our findings, Zang et al.'s (2015) contextual cueing study presented stimuli within a circular display matrix with a diameter of 16° of visual angle, which was roughly comparable with our study, and found that repeated contexts could not be effectively retrieved on the basis of the learned local context under limited viewing conditions (e.g., when only two distractors near the target could be seen) to aid search guidance. However, once (some) peripheral global information was provided or the whole display configuration was previewed, the contextual cueing effect immediately manifested, suggesting that global information is necessary for contextual retrieval.
The observation of a behavioral gain around 300 ms in the present study is in line with previous neurophysiological evidence showing that the N2pc component of the ERP was greater for repeated than for novel displays (Johnson et al., 2007; Schankin and Schubo, 2009). That is, attention could be allocated effectively to the target's location in a repeated context around 200-300 ms after the search display onset, as concluded in previous studies (e.g., Schankin and Schubo, 2009). However, 200 ms is clearly not sufficient to guarantee that participants identify the target's location and direct attention to the target; otherwise, participants' mean search time would be much shorter than the more than 1 s reported in previous studies that did not limit viewing time (e.g., Chun and Jiang, 1998; Olson and Chun, 2002; Brady and Chun, 2007). Based on our finding that 300 ms was sufficient to learn and extract the global configuration of the contextual information, we propose that participants might first process the search display globally and then direct attention to the local context near the target. Therefore, the N2pc component may indicate attentional guidance to the global search display rather than to the exact target location or its immediate vicinity. This, however, would require further investigation.
To further explain the underlying search mechanism, one possibility is that context learning under rapid presentation requires associative learning between a global context and the target location, with the integrated representation exerting a top-down influence on attentional guidance (Chun, 2000; Chun and Nakayama, 2000). With effective learning, a perceptual unit that integrates the spatial association of the target and distractors of a display might be extracted and formed. Specifically, the formation and reinforcement of this perceptual unit across repetitions might be accompanied by an enhancement of its visual saliency (Geyer et al., 2010), which captures spatial attention in a bottom-up way by using near-peripheral vision (Zang et al., 2015). This process is also constrained by spatial attention and working memory limitations (Gobet et al., 2001). Based on our results, all the visual stimuli could be simultaneously held in the attentional window, and thus grouped together and effectively encoded into working memory, within 300 ms. These configurations are then transferred to long-term memory as learning proceeds, subsequently guiding focal attention to the target location when a learned pattern recurs on later occasions. Alternatively, recent studies suggest that learning the distractor configuration could also facilitate target detection without guidance to the target location (Vadillo et al., 2020a). Vadillo et al. (2020a) observed a significant contextual cueing effect in visual search even when the target location could not be predicted by the distractors in repeated configurations (with the locations of distractors kept constant but the location of the target changed randomly). They suggested that participants learn to ignore the locations usually occupied by distractors, which in turn facilitates the detection of targets. Accordingly, it is possible that with rapid presentation of the contextual information in the current study, learning the global configuration leads to suppression of the distractors, so that the target becomes more salient, which facilitates target detection. In line with this account, Zinchenko et al. (2020) also showed that a broad attentional set facilitates flexible updating of global (relative to local) context representations, making the acquired context memory more adaptive to changes of the target. However, disentangling the two accounts requires further research.
Our study has several limitations. First, although our sample size provided adequate statistical power, a larger sample would increase the generalizability of the effect. Second, it might be interesting for future work to use more ecological stimuli (see, e.g., Santangelo, 2015; Santangelo et al., 2015, for a review) to replicate the current results. Third, we investigated only presentation times of 300 and 500 ms, and 300 ms might not be the minimum presentation time needed to obtain a contextual cueing effect, which also requires further research.
To summarize, the present study showed that a long-term context memory could be acquired under a rapid presentation of the search display, suggesting that contextual cueing might arise at an "early," target selection stage. Moreover, the contextual cueing effect obtained with a short presentation time did not result from the learning of a repeated local configuration of items, thus indicating that a more global context was required. This novel finding sheds light on the temporal attributes of the contextual cueing effect and provides a possible answer as to the underlying learning mechanism when the presentation time is limited.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author. The data files and analysis scripts are available at a public repository: https://github.com/Xie-0130/contextual-cueing.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee of the Institutes of Psychological Sciences in Hangzhou Normal University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
SC and XZ are responsible for experimental design, results interpretation, manuscript revision, and final approval. XZ programmed the code for the experiments. XZ and XX collected and analyzed the data. XX and SC drafted the manuscript. All authors agree to be accountable for the content of the work.