Random reward priming is task-contingent: the robustness of the 1-trial reward priming effect

Ásgeirsson, Árni G.; Kristjánsson, Árni

doi:10.3389/fpsyg.2014.00309

ORIGINAL RESEARCH article

Front. Psychol., 10 April 2014

Sec. Cognition

Volume 5 - 2014 | https://doi.org/10.3389/fpsyg.2014.00309

This article is part of the Research Topic: What you did is what you’ll do: the role of implicit visual memory in search behavior

Árni G. Ásgeirsson¹*

Árni Kristjánsson²

¹Department of Psychology, Center for Visual Cognition, University of Copenhagen, Copenhagen, Denmark
²Laboratory for Visual Perception and Visuomotor Control, Faculty of Psychology, University of Iceland, Reykjavík, Iceland

Consistent financial reward of particular features influences the allocation of visual attention in many ways. More surprising are 1-trial reward priming effects on attention, where reward schedules are random and reward on one trial influences attentional allocation on the next. Those findings are thought to reflect that rewarded features become more salient than unrewarded ones on the subsequent trial. Here we attempt to conceptually replicate this effect, testing its generalizability. In three versions of a paradigm analogous to the additional singleton paradigm, involving singleton search for a Gabor patch of odd spatial frequency, we found no evidence of reward priming, and we only partially replicated reward priming in the exact original paradigm tested by Hickey and colleagues. The results cast doubt on the proposal that random reward enhances salience, suggested in the original papers, and highlight the need for a more nuanced account. In many other paradigms reward effects have been found to progress gradually, becoming stronger as they build up, and we argue that for robust reward priming, reward schedules need to be more consistent than in the original 1-trial reward priming paradigm.

Introduction

Reward, financial or through the possibility of lessened effort, has a strong effect upon attentional function. Della-Libera and Chelazzi (2009) showed that selection or ignoring of stimuli is strongly modulated by whether stimuli are consistently associated with high or low reward. Anderson et al. (2011) found that stimuli associated with high reward are more likely to capture attention on subsequent unrewarded trials and in Kristjánsson et al. (2010) priming of pop-out was stronger for highly rewarded colors than colors receiving low reward. Kristjánsson et al. also showed that observers were flexible, quickly picking up on within-block changes in reward schedules even without awareness of the changes. Kiss et al. (2009) have shown how the N2pc attentional selection EEG component occurs earlier and is larger for visual search for colors consistently rewarded higher than for colors receiving low reward. Observer-by-observer N2pc correlated with effects of reward on search efficiency. Tseng and Lleras (2013) have then shown how rewarded search contexts are more easily learned during implicit contextual cueing (see Chelazzi et al., 2013 for review).

In most of these studies, reward was consistently associated with a particular color or context during a training phase, throughout testing, or at least for long series of adjacent trials. In contrast, Hickey et al. (2010a,b, 2011) reported an effect they called reward priming, in a task in which correct responses were rewarded with randomly determined high or low monetary reward. Their key finding was that a correct response to a target, which coincidentally resulted in high-reward, led to less attentional capture by an irrelevant singleton distractor on a subsequent trial when the color scheme of targets and distractors remained constant. If the colors changed between trials, on the other hand, performance was slowed. In one of their studies (Hickey et al., 2010b), low-magnitude rewards even led to an apparent devaluation of target features and attention was applied more slowly to these features. Participants responded slowly when the target was the same as on the preceding trial, but quickly when the colors swapped following low-magnitude rewards. Hickey et al. argued that rewards could have an inhibitory effect if target color differed from the last trial.

Hickey et al. termed this “reward priming”: the high reward biases selective processes toward a transiently valued stimulus feature, and can lead to inhibited responses if that feature is currently present on a distractor, analogously to priming of visual search (see e.g., Olivers and Humphreys, 2003; Muller et al., 2004; Theeuwes et al., 2006; Kristjánsson et al., 2008; Lamy et al., 2010; Ásgeirsson et al., 2014; see, e.g., Kristjánsson and Campana, 2010; Lamy and Kristjánsson, 2013, for review). What distinguishes this result from many other studies in the literature on reward and attention is that there was no contingency between target and distractor color on the one hand, and actual reward on the other. Hickey et al. seemingly isolated a direct effect of reward reception that was independent of longer-term or motivational factors. In other words, this suggested 1-trial effects from rewarded attention deployments. Note that their effect was dependent upon the presence of an irrelevant color singleton, suggesting that the reward affects selection rather than other processes. Selection is typically much more difficult when a highly salient singleton distractor is presented alongside a less salient target stimulus (Jonides and Yantis, 1988; Theeuwes, 1992; Franconeri et al., 2005). That reward priming affects efficiency of selection is further supported by electrophysiological evidence: high rewards affect the N2pc ERP component differentially, depending on whether the color scheme of a display repeats or the colors swap (Hickey et al., 2010b).

Hickey et al. proposed that this reflected a general transient effect of reward associated with a particular target color that boosts target saliency. Throughout their experiments, they always used the same stimulus set, with set-size, color, and target identity manipulations; never manipulating salience directly or generalizing the effects to other stimulus sets [but see Hickey and van Zoest (2012) for related findings with eye-movement measures]. Here we conduct a conceptual replication of their experiments to generalize their findings to other stimulus sets (Experiments 1 and 2).

Another point that merits attention is that the main dependent variable in the tasks of Hickey et al. is response time. Since reward did not depend on response speed but only on whether the response was correct or not, this means that the most lucrative strategy was not to respond as quickly as possible, but rather to respond correctly. The main dependent measure is, in other words, not rewarded. A second motivation here was therefore to investigate the role of strategy, by adding motivation for speed (Experiment 3), where observers were told that they had 45 min to maximize their earnings. The faster they responded the more trials were presented so that their chances to earn money increased. Finally, in experiment 4 we attempt to replicate the Hickey et al. result using their original task.

Experiment 1—Conceptual Replication of Hickey et al. (2010a)

The first experiment was designed to conceptually replicate the results of Hickey et al. (2010a), testing a task similar to the one used in their experiments while changing the stimuli in the hope of generalizing the results beyond their exact paradigm. If reward priming reflects increased salience of recently rewarded features, the effect should be replicable in any paradigm where there is sufficient selection pressure and the random feature (e.g., color) is sufficiently salient. We therefore expected to find results consistent with 1-trial reward priming in our conceptually identical paradigm. From the Hickey et al. (2010a) result we expected observers to respond faster to a target whose color is repeated from the previous trial if that trial also resulted in a high reward. However, if the target changes color between trials and observers have previously received high reward, they should respond more slowly to the current target. Such a divergence would constitute reward priming following high reward, with no color-association effects following low-reward trials (Figure 1A). But based on Hickey et al. (2010b, 2011) we might also expect trials following low reward to show the opposite of the high-reward pattern: observers would respond slowly to targets sharing color with the previous target and, conversely, faster when the target changes color between trials (Figure 1B). Tentatively, we will consider the emergence of either pattern a successful replication.

Figure 1. (A) The reward priming pattern reported in Hickey et al. (2010a), where low reward did not affect the subsequent trial, but a high reward did so contingent on whether color was repeated or not. (B) The reward priming pattern reported in Hickey et al. (2010b, 2011), where both high and low reward affected response times, contingent on color repetition, but in the opposite ways, as if a target previously associated with high reward was subsequently highly valued, but a target previously associated with low reward was subsequently devalued. The figure does not represent actual data and is only for illustrative purposes. See Hickey et al. (2010a,b, 2011) for details on their results.

Methods

Subjects

Twenty observers participated. They were randomly assigned to either experiment 1 or experiment 3 from a pool of 40 participants (26 female), aged 19–30 years (mean = 23.8 years). Due to privacy restrictions, age and gender information is not available for each experiment separately. The project was approved by the Research Ethics Committee of the Department of Psychology, University of Copenhagen, and the IRB at the University of Iceland.

Stimuli and apparatus

Stimuli. Gabor patches of low or high spatial frequency (approx. 1 and 4 cycles per degree, respectively) appeared on a dark background (lum. = 0.2 cd/m²). The diameter of each Gabor was 4.3°. They were tilted ±45° from vertical (see Figure 2). The target and distractor stimuli were pinkish red (x = 0.412, z = 0.304, lum. = 35.6 cd/m², at maximal luminosity) or light green (x = 0.288, z = 0.384, lum. = 32.1 cd/m², at max. luminosity). On each trial, four Gabor patches were presented on an imaginary circle (radius = 7.4°), centered on a fixation cross. Stimulus configuration varied randomly between trials such that the distance between all 4 stimuli was always equal, but there were 12 potential stimulus positions, resulting in three different configurations. A target was defined as the stimulus with the odd spatial frequency, whereas all other stimuli were of the opposing spatial frequency.
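The geometry described above can be made concrete with a short sketch. It assumes that the 12 potential positions are evenly spaced around the circle (the text states only that the four stimuli are always equidistant); the function name and the coordinate convention are hypothetical and are not taken from the original Matlab code.

```python
import math

RADIUS_DEG = 7.4   # radius of the imaginary circle, in degrees of visual angle (from the text)
N_POSITIONS = 12   # potential stimulus positions (from the text)
SET_SIZE = 4       # Gabor patches per trial (from the text)

def configurations(radius=RADIUS_DEG, n_positions=N_POSITIONS, set_size=SET_SIZE):
    """Return every equally spaced configuration as a list of (x, y) tuples.

    Assumption: the potential positions are evenly spaced on the circle.
    """
    step = 360.0 / n_positions          # angular distance between adjacent slots
    stride = n_positions // set_size    # slots between neighboring stimuli
    configs = []
    for offset in range(stride):        # one configuration per starting slot
        angles = [(offset + k * stride) * step for k in range(set_size)]
        configs.append([(radius * math.cos(math.radians(a)),
                         radius * math.sin(math.radians(a))) for a in angles])
    return configs

all_configs = configurations()
```

Under this assumption, 12 slots and 4 equidistant stimuli yield 12 / 4 = 3 distinct configurations, matching the number given in the text.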

Figure 2. (A) A trial started when a fixation-cross appeared. The stimulus array was presented until response, followed by a feedback display signifying the amount of reward or punishment obtained on the current trial. (B) All possible trial types relative to the example trial type in panel (A). Each type was equally likely and the amount of reward was not related to trial type or any other task attribute. Note that the main hypothesis is tested by calculating the means of trials where the irrelevant singleton was present on the current trial, but absent on the previous (n-1) trial. The target is always shown as a low spatial frequency singleton in the western position. Targets could also be high spatial frequency singletons, among low-frequency ones, and were equally likely to appear at any of the four positions. Stimuli are not drawn to scale. See Stimuli and apparatus for stimulus specifications.

A target, defined by odd spatial frequency, was present on all trials. A salient singleton distractor was present on 50% of trials, while the other half had only non-targets plus the target. We will refer to these as present/absent trials throughout, unless otherwise noted. On present trials, there were always two non-targets on the screen, alongside the target and the singleton distractor. The non-targets always shared the color of the target, but shared spatial frequency with the oddly colored singleton distractor. On absent trials, there were three non-targets while all other features were the same as on present trials.

The percentage of present vs. absent trials differed from Hickey et al. (2010b), where a singleton distractor was present on 80% of trials. The reasoning behind this change is that Hickey et al. (2010a, p. 4) argued that “novelty” should disrupt search to the greatest degree, and therefore based their analysis solely on trials where a singleton distractor was present, but had not been present on the previous trial¹. We therefore reasoned that disruption by novelty would increase with more absent trials. Simultaneously, we increased the statistical power of our design, by keeping 25% of our trials eligible for analysis (half of our trials are present trials, and half of those will follow an absent trial), compared to 16% in Hickey et al. (2010b; 80% of their trials were present trials, but only 20% of those followed an absent trial).
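The proportions quoted above follow from simple arithmetic, sketched here under the assumption that singleton presence is drawn independently on every trial. The function name is hypothetical.

```python
def eligible_fraction(p_present):
    """Probability that a trial is singleton-present AND follows a singleton-absent trial.

    Assumption: distractor presence is sampled independently on each trial.
    """
    return p_present * (1.0 - p_present)

ours = eligible_fraction(0.5)     # present on 50% of trials (this study)
hickey = eligible_fraction(0.8)   # present on 80% of trials (Hickey et al., 2010b)
```

Here `ours` is 0.25 and `hickey` is approximately 0.16, the 25% and 16% given in the text. Note that the product p(1 − p) is maximal at p = 0.5, so an even split yields the largest possible share of eligible trials under this criterion.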

Reward schedule. As in the studies by Hickey et al. (2010a,b, 2011), the reward schedule was not contingent on any display parameters, but selected by a balanced randomization algorithm for each trial (“high” reward = 5 ISK/0.3 DKK; “low” reward = 0.5 ISK/0.03 DKK). Punishment (following errors) had only one level, equal to the negative of “high” reward. Rewards were also signaled by audible feedback. A high-pitched “ka-ching” sound was played following high reward; when the reward was low, a high note was played (C6); and when the response was incorrect and the observer was punished, a medium-pitched note was played (C4). Examples of these feedback sounds were given before the experiment started.
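A schedule of this kind could be generated as in the sketch below. It assumes that "balanced" means equal numbers of high and low rewards shuffled into random order; the text does not specify the algorithm, and all names here are hypothetical. Amounts are given in ISK.

```python
import random

REWARD_ISK = {"high": 5.0, "low": 0.5}   # reward magnitudes from the text

def balanced_schedule(n_trials, seed=None):
    """Return a shuffled list with (near-)equal numbers of 'high' and 'low' entries.

    Assumption: 'balanced randomization' means equal counts in random order.
    """
    rng = random.Random(seed)
    half = n_trials // 2
    schedule = ["high"] * half + ["low"] * (n_trials - half)
    rng.shuffle(schedule)
    return schedule

def payoff(correct, level):
    """Payoff for one trial: the scheduled reward if correct, else minus the high reward."""
    return REWARD_ISK[level] if correct else -REWARD_ISK["high"]

schedule = balanced_schedule(900, seed=1)
```

Because the schedule is built without reference to any display parameter, reward magnitude cannot be predicted from target color, distractor presence, or response.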

Apparatus

The experiment was carried out in two laboratories and, apart from any differences noted, methods were identical. At the University of Iceland, the experiment was run on a 2.8 GHz Dell Optiplex 760 desktop computer connected to a 100 Hz 14″ CRT display. At the University of Copenhagen, it was run on a 2.66 GHz Dell Optiplex 255 connected to a 100 Hz 19″ CRT display. Stimulus presentation was programmed in Matlab® using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). Viewing distance was adjusted so that retinal size was practically identical in the two setups.

Design and procedure

Observers were presented with illustrated task instructions before the experiment started. They were instructed to respond as quickly as possible, without making many errors. The reward scheme was explained, i.e., that when responding correctly observers would receive high or low reward, but when they responded incorrectly they would lose money. Following the instructions they viewed and listened to computerized examples of “dummy” trials and the audio feedback related to each reward/punishment level, before completing 30 practice trials. During practice trials, reward balance was displayed (but not paid out). Observers were informed of this beforehand. Following practice the experiment started, run as a single block of 900 trials. Four observers ran only 811 trials, but were otherwise treated identically to all other observers.

A single trial started with the presentation of a fixation cross at screen center for 700–1300 ms after which the four Gabor patches were presented on an imaginary circle. On half of trials, the Gabor patches were all uniformly colored, red, or green (singleton distractor absent trials). On these trials, there were always three non-targets whose spatial frequencies were the same and one target, defined by odd spatial frequency. On the other half of trials, three Gabor patches, two of which were non-targets and one a target, shared a color, while the fourth patch had the opposing color (the singleton distractor). Note, however, that stimulus colors were completely irrelevant to the task.

Observers located the item of odd spatial frequency (the target-defining variable) and reported whether that Gabor was rotated −45° (“J” key) or +45° (“L” key) from vertical (the response feature). As soon as a key was pressed, the reward amount appeared at screen center and the appropriate feedback sound was played. The next trial started a second later with the presentation of the fixation cross in isolation. This procedure was interrupted every 30 trials by the presentation of the total amount earned. After finishing the experiment, observers were debriefed on the experimental hypothesis and informed of their total earnings.

Results

Before analyses, we applied individual filters to each observer's data where very slow reaction times were discarded (individual mean + 3 SD; 1.8% of trials, between-observer range: 0.2–2.6%). As expected, there were large differences between trials where the irrelevant singleton distractor was present vs. absent [810 vs. 699 ms, respectively; t(19) = 8.409, p < 0.001]. This shows that our modified irrelevant singleton paradigm is analogous to the one in Theeuwes (1992), in that observers have trouble ignoring color singletons, even when always irrelevant. Observers were also more accurate in the distractor absent condition (98.5%) compared to the distractor present condition [97.2%; t(19) = −3.337, p < 0.004].
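The per-observer trimming rule can be sketched as follows. The sketch assumes the sample standard deviation and a cutoff computed over all of an observer's reaction times; the text does not state either detail, and the function name is hypothetical.

```python
import statistics

def trim_slow_trials(rts, n_sd=3.0):
    """Drop reaction times above the observer's own mean + n_sd standard deviations.

    Assumption: sample SD (n - 1 denominator), computed over all of the
    observer's trials. Only slow responses are removed, as in the text.
    """
    cutoff = statistics.mean(rts) + n_sd * statistics.stdev(rts)
    return [rt for rt in rts if rt <= cutoff]

# Illustrative data only: twenty 700 ms responses and one very slow response.
example_rts = [700.0] * 20 + [5000.0]
trimmed = trim_slow_trials(example_rts)
```

Applying the cutoff separately to each observer keeps slow but consistent observers from losing a disproportionate share of their trials.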

We then filtered the dataset further to match that of Hickey et al. (2010a, p. 4). We discarded all incorrect trials, as well as trials preceded by incorrect trials since they were also not preceded by reward. We also discarded singleton distractor absent trials, and trials that were not preceded by absent trials (see also Method). Finally, we limited analyses to trials where response/orientation was repeated from the previous trial. The filtering process left an average of 101 trials per participant.
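The selection criteria listed above can be expressed as one pass over consecutive trial pairs. This is a minimal sketch, not the authors' analysis code: the trial records, their field names, and the function name are hypothetical, and the list is assumed to hold one observer's trials in presentation order.

```python
def eligible_trials(trials):
    """Return the trials that enter the reward priming analysis.

    Each trial is a dict with the hypothetical keys 'correct',
    'distractor_present', 'orientation', 'target_color', 'reward', 'rt'.
    """
    kept = []
    for prev, cur in zip(trials, trials[1:]):
        if not (prev["correct"] and cur["correct"]):
            continue  # drop errors and trials that follow errors
        if not cur["distractor_present"] or prev["distractor_present"]:
            continue  # keep present trials that follow absent trials
        if cur["orientation"] != prev["orientation"]:
            continue  # keep response/orientation repetitions only
        kept.append({
            "prev_reward": prev["reward"],
            "color_repeated": cur["target_color"] == prev["target_color"],
            "rt": cur["rt"],
        })
    return kept

# Illustrative records only, not data from the experiment.
demo = [
    {"correct": True, "distractor_present": False, "orientation": "left",
     "target_color": "red", "reward": "high", "rt": 690.0},
    {"correct": True, "distractor_present": True, "orientation": "left",
     "target_color": "red", "reward": "low", "rt": 800.0},
    {"correct": True, "distractor_present": True, "orientation": "left",
     "target_color": "green", "reward": "low", "rt": 820.0},
    {"correct": False, "distractor_present": False, "orientation": "right",
     "target_color": "green", "reward": None, "rt": 750.0},
    {"correct": True, "distractor_present": True, "orientation": "right",
     "target_color": "green", "reward": "high", "rt": 790.0},
]
selected = eligible_trials(demo)
```

Each retained trial carries the two factors of the analysis, the reward received on the preceding trial and whether target color repeated, so cell means can be computed directly from the returned list.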

Following Hickey et al. (2010a) we contrasted reaction times where target color is constant between subsequent trials with those where it changes and, further, whether the current trial is preceded by randomly determined high or low reward. A repeated measures ANOVA of within-subject effects on reaction times revealed a marginally significant main effect of target color repetition [F(1, 19) = 4.236, p = 0.054, ηp² = 0.182] but no effect of reward value [F(1, 19) = 1.129, p = 0.301, ηp² = 0.056]. Most importantly, there was no interaction between target color repetition and reward value [F(1, 19) = 1.612, p = 0.220, ηp² = 0.078]. Figure 3A shows no hint of the reward priming interaction of reward and color reported in Hickey et al. (2010a). Taking accuracy into account, by calculating inverse efficiencies (e.g., Townsend and Ashby, 1983) for each condition and running a repeated measures ANOVA on those, further pushed the trend away from the reward priming interaction.
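Inverse efficiency is commonly computed as mean reaction time divided by the proportion of correct responses, so that lower accuracy inflates the score. The sketch below assumes that definition; the function name is hypothetical, and the example values are the overall present/absent means reported earlier in the Results, used purely for illustration rather than the per-condition values entered into the ANOVA.

```python
def inverse_efficiency(mean_rt_ms, proportion_correct):
    """Inverse efficiency score: mean RT divided by proportion correct (in ms)."""
    if not 0.0 < proportion_correct <= 1.0:
        raise ValueError("proportion_correct must be in (0, 1]")
    return mean_rt_ms / proportion_correct

# Illustration with the overall means reported above.
ies_present = inverse_efficiency(810.0, 0.972)   # distractor present
ies_absent = inverse_efficiency(699.0, 0.985)    # distractor absent
```

With accuracy this close to ceiling the correction is small (roughly 833 vs. 710 ms here), so the measure mainly guards against speed-accuracy trade-offs between conditions.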

Figure 3. Mean reaction times in experiments 1-4, for all observers by immediately preceding reward and repetition or switch of the color scheme from the preceding trial. Search arrays show stimuli and set-sizes in each experiment (not drawn to scale; see Methods for details). Error bars show within-subject 95% confidence intervals (Cousineau, 2005).
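The within-subject confidence intervals in the figure rest on a normalization in which each observer's scores are re-centered on the grand mean, removing between-observer variability before the interval is computed. The following is a minimal sketch of that idea, assuming one row per observer and one column per condition; the function names are hypothetical, and the critical t value is supplied by the caller rather than computed.

```python
import math
import statistics

def cousineau_normalize(data):
    """Subtract each observer's own mean and add the grand mean.

    `data` is a list of rows (observers), each a list of condition means.
    """
    grand = statistics.mean(x for row in data for x in row)
    return [[x - statistics.mean(row) + grand for x in row] for row in data]

def within_subject_ci(data, t_crit):
    """Half-width of the confidence interval for each condition (column)."""
    norm = cousineau_normalize(data)
    n = len(norm)
    return [t_crit * statistics.stdev(cond) / math.sqrt(n) for cond in zip(*norm)]

# Illustrative values: observers differ in overall speed but share a 20 ms effect.
demo_data = [[500.0, 520.0], [700.0, 720.0], [600.0, 620.0]]
half_widths = within_subject_ci(demo_data, t_crit=4.303)  # t for df = 2, 95%
```

In this contrived example the intervals collapse to zero because every observer shows exactly the same condition difference, which illustrates why such intervals reflect the consistency of the effect rather than differences in overall speed.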

Discussion

There was no evidence of reward priming in experiment 1: responses were neither faster following highly rewarded trials when target color was constant between trials, nor slower following high reward when the colors swapped. In fact, reward barely seemed to affect performance at all. This is surprising, given that Hickey et al. (2011) argued that the salience of the target feature is boosted following a highly rewarded trial, and observed the effect on three separate occasions in a paradigm analogous to ours.

Two features of the respective paradigms should be noted: (i) Our set-size was fixed at 4. Hickey et al. demonstrated reward priming using set-sizes of 6 and 12. Might set-size affect the results? When target identity is unknown in visual search, slopes of set-size vs. response time are negative, decreasing toward an asymptote around a set-size of 10 (Bravo and Nakayama, 1992). This has been attributed to ambiguity (Meeter and Olivers, 2006); with only two non-targets and an unknown target, there is sparse information to determine the target. However, when a target is present among many identical non-targets, there is ample evidence regarding which category each stimulus belongs to. Reward priming may therefore depend on sufficiently unambiguous search conditions. In experiment 1, we used the smallest possible set-size, and consequently the most ambiguous target-distractor relationship, in an irrelevant singleton paradigm, finding no reward priming.

(ii) An important feature of Hickey et al.'s paradigm is that reward value is unrelated to reaction time. If an accurate response is given, reward level is random. The optimal strategy to maximize profit is therefore to emphasize accuracy using a conservative (i.e., slow) response strategy. The adoption of such a strategy would compromise the interpretation of reaction times, the primary measure.

We address these two points in experiments 2 and 3.

Experiment 2—Increased Set-Size

Might set-size explain the discrepancies between our results and those of Hickey et al.? They originally used a set-size of 12 (Hickey et al., 2010a,b) and replicated the effect with a set-size of 6. The reward priming effect may depend on larger set-sizes, perhaps reflecting larger pop-out for the target and irrelevant distractor (see Bravo and Nakayama, 1992; Meeter and Olivers, 2006). We therefore re-ran experiment 1 on 19 naive observers, doubling the number of objects in each search display.