On the Influence of Reward on Action-Effect Binding

Ideomotor theory states that the formation of anticipatory representations about the perceptual consequences of an action [i.e., action-effect (A-E) binding] provides the functional basis of voluntary action control. A host of studies have demonstrated that A-E binding occurs fast and effortlessly, yet little is known about cognitive and affective factors that influence this learning process. In the present study, we sought to test whether the motivational value of an action modulates the acquisition of A-E associations. To this end, we linked specific actions with monetary incentives during the acquisition of novel A-E mappings. In a subsequent test phase, the degree of binding was assessed by presenting the former effect stimuli as task-irrelevant response primes in a forced-choice response task, absent reward. Binding, as indexed by response priming through the former action-effects, was only found for reward-related A-E mappings. Moreover, the degree to which reward associations modulated the binding strength was predicted by individuals’ trait sensitivity to reward. These observations indicate that the association of actions and their immediate outcomes depends on the motivational value of the action during learning, as well as on the motivational disposition of the individual. On a larger scale, these findings also highlight the link between ideomotor theories and reinforcement-learning theories, providing an interesting perspective for future research on anticipatory regulation of behavior.


INTRODUCTION
The vast majority of actions we perform in everyday life are directed at producing a particular outcome in the environment. For instance, we may press a light switch because we want to illuminate the room, or boil water because we want to drink a cup of tea. In doing so, we effortlessly select actions that are appropriate for achieving a desired outcome. Accordingly, the ability to associate actions with their immediate and long-term consequences is a key mechanism for learning, and thus for flexible and adaptive control of behavior.
Ideomotor theory (IMT) constitutes the prevailing theoretical approach toward the role of effect anticipation in action control. The earliest versions of IMT can be traced back to the nineteenth century (Lotze, 1852;Harleß, 1861;James, 1890), and these ideas have undergone a renaissance in experimental psychology over the last decades (for recent reviews see Nattkemper et al., 2010;Shin et al., 2010;Pfister and Janczyk, 2012). In a nutshell, the core assumption of IMT is that actions and their perceptual outcomes are cognitively bound together. Performing an action (A) that produces a particular environmental effect (E) is assumed to lead to the formation of a common representation of the two events ("A-E binding"). Importantly, these bindings are conceived as bidirectional. Thus, internally anticipating a desired environmental effect directly activates the associated motor program, thereby promoting goal-directed behavior.
In the laboratory, this cardinal assumption of IMT is commonly assessed with so-called induction paradigms (Elsner and Hommel, 2001). Typically, participants first complete an acquisition phase to establish a novel association between simple actions and arbitrary sensory effects. For instance, participants may perform left- and right-hand button presses, each of which is contingently followed by a specific stimulus (e.g., left button → low-pitch tone, right button → high-pitch tone). In a subsequent test phase, the same responses are performed in a speeded forced-choice response task while the learned action-effects are presented as primes. Presupposing that participants have acquired bi-directional A-E bindings in the learning phase, the perception of a learned action-effect should directly activate the associated response, causing facilitation when the prime was previously the effect of the required response (compatible primes) and interference when the prime was previously the effect of a different response (incompatible primes). Over the last decade, this prediction has been confirmed in numerous studies employing a variety of response and effect modalities (e.g., Hommel, 1996;Elsner and Hommel, 2001;Beckers et al., 2002;Ziessler and Nattkemper, 2002;Kunde, 2004;Ziessler et al., 2004;Herwig et al., 2007).
Interestingly, once A-E knowledge has been acquired, the priming of a response via the activation of an associated perceptual representation seems to occur highly automatically, without requiring further cognitive mediation. For instance, it also occurs in conditions in which effect primes are entirely task-irrelevant (Hommel, 1996) and even when the primes are presented subliminally so that they cannot be consciously perceived (Kunde, 2004). On the other hand, relatively little is known about the factors that contribute to the acquisition of this kind of knowledge. Elsner and Hommel (2004) have investigated situational determinants of A-E binding, demonstrating that it critically depends on the temporal contiguity and the probabilistic contingency between actions and their effects. In other words, A-E binding diminishes with increasing delays between the two events, as well as with reduced predictability of a unique effect. Other studies have shown that cognitive factors such as the internal selection of an action may influence the strength of A-E binding during the acquisition phase (Ziessler et al., 2004;Herwig et al., 2007;Herwig and Waszak, 2009;Kühn et al., 2009; but see Pfister et al., 2011).
Here, we wanted to examine whether the acquisition of A-E bindings can moreover be modulated by factors related to the motivational value of an action. It is well established that monetary incentives can be used to modulate a wide range of human cognitive functions including visual discrimination, conflict resolution, and long-term memory encoding (Wittmann et al., 2005;Engelmann and Pessoa, 2007;Padmala and Pessoa, 2008;Krebs et al., 2012). In these paradigms, reward is typically associated with specific trial types, stimulus types, or entire task blocks, in such a way that the participant is rewarded for correct and/or fast executions of the required response. As such, these stimulus-reward associations are in most cases compatible with the task goal, which generally results in a facilitation of response execution. However, we recently showed that reward associations can also have detrimental effects upon response execution if they trigger specific response tendencies that are incompatible with the task goal (Krebs et al., 2010, 2011). Another line of research has demonstrated that not only perceptual but also affective features of outcomes are bound to the actions that produce them. Specifically, in a study by Beckers et al. (2002), one of two responses in a free-choice task was always associated with an electrocutaneous stimulation (negative valence), while the other was not (positive valence). In the subsequent test phase, responses to target words were facilitated if their semantic valence was compatible with the effect previously associated with this response (Beckers et al., 2002). Similar effects of "affective compatibility" have been observed in a recent study by Eder et al. (2012). The authors showed that preparing a response to a picture of positive or negative valence interfered with the actual execution of a subsequent response to a word of similar valence. This suggests that action planning involves the activation of associated affective features, making them less accessible to other responses that share this feature.
While these findings highlight that affective codes are a part of the mental representation of an action, we wanted to further investigate whether motivational values of an action would modulate the degree of A-E binding, a notion that has not yet been tested. To this end, we associated two out of four actions with monetary incentives during the acquisition phase of an induction paradigm. In the subsequent test phase, we assessed the influence of compatible and incompatible effect primes, which could be related to former reward or to no reward, in the absence of any further monetary reinforcement. Considering previous evidence that affective feedback stimuli can strengthen sensorimotor integration (Colzato et al., 2007a;Waszak and Pholulamdeth, 2009), and that reward-related stimuli can prime response tendencies even if they are task-irrelevant (Krebs et al., 2010), we predicted that binding would be stronger for rewarded A-E mappings as compared to unrewarded mappings. This should be reflected in increased compatibility effects for primes that were previously related to a rewarded action, and would provide direct evidence that the acquisition of action-effect knowledge can be modulated by changes in the motivational value of an action and its consequence.

PARTICIPANTS AND PROCEDURE
Twenty-six undergraduate students from Ghent University (eight male, four left-handed) participated in the study (mean age = 18.72 years; SD = 1.02). They all had normal or corrected to normal vision, gave written and informed consent to participate, and were naive to the rationale of the experiment. Stimuli were presented on a PC with a 17" monitor and responses were given with both index and middle fingers using the buttons "A," "S," "K," and "L" on a QWERTY computer keyboard. Following the experiment, participants completed the Behavioral-Inhibition and Behavioral-Activation Scales (BIS/BAS; Carver and White, 1994) to assess individual sensitivity to reward. The whole procedure lasted approximately 30 min. All participants received a basic compensation of 4 euro and an average performance-related bonus of 2.5 euro.

EXPERIMENTAL DESIGN
In line with previous research on A-E binding, the experiment consisted of two phases. First, participants completed an acquisition phase to establish learning of novel A-E mappings. For the given purpose, we manipulated the reward value of these mappings by associating half of them with monetary incentives. In the subsequent test phase, in which participants could no longer earn bonuses, the degree of A-E binding was assessed by presenting the previous action-effects as task-irrelevant response primes. Based on our assumption that reward would modulate the binding between actions and their effects during the acquisition phase, we predicted that reward-related primes would induce greater incompatibility effects as compared to reward-unrelated primes in the test phase.

ACQUISITION PHASE
The acquisition phase consisted of a forced-choice reaction time (RT) task with four different responses. Within a given block, each response was consistently mapped onto one specific picture (response cue) taken from a set of line drawings (Snodgrass and Vanderwart, 1980). At the beginning of each block, the four specific response cues were presented on the screen along with their associated responses. In each trial, after a variable intertrial interval (ITI) of 800-1000 ms, one of the cues was centrally presented for a maximum duration of 1500 ms (Figure 1A, left panel). Immediately after a response was given, or the maximum duration was reached, a colored square was displayed for 500 ms in the background of the cue, serving as a visual action-effect (see Wolfensteller and Ruge, 2011 for a similar procedure). In case of correct responses, the background color was response-specific (red, green, blue, or yellow), and in case of incorrect or late responses (>1500 ms) the background square turned gray. Participants were instructed to respond to the cues as quickly and as accurately as possible. Furthermore, they were told that the background color would indicate if their response on a given trial was correct and within the critical time window. Importantly, the picture category of the current cue (living animals vs. non-living objects) indicated whether a correct response (action, A) would be rewarded (reward action, RA) or not (no-reward action, NA).

FIGURE 1 | Illustration of the experimental paradigm in the acquisition phase (A) and test phase (B).
During acquisition, two out of the four actions were associated with reward (RA vs. NA). The unique effects (E1-E4) that were produced by specific actions (A1-A4) were used as response primes in the subsequent test phase. Primes could be either compatible with the required response (cP) or incompatible (shown for one exemplary A-E mapping). Due to the reward manipulation during acquisition, incompatible primes in the test phase could be either related to reward (iRP) or to no-reward (iNP) effects. The primes, however, were entirely irrelevant to the task and no longer predictive of reward in the test phase.
For each correct response that was given within the maximum time window of 1500 ms, 10 points were automatically added to the participants' score, which determined the total gain in Euro cents (0.5 euro per 200 points). The cue-category association with reward was counterbalanced across participants and cue categories were equally assigned to both hands and to index and middle fingers. In each block, a novel set of cue pictures was introduced in order to keep the task at a constant level of difficulty. However, mappings between cue categories and responses, and between responses and effect colors were constant for each participant (counterbalanced across participants). Overall, participants worked through four blocks of 60 trials, resulting in 120 reward trials and 120 no-reward trials performed with two fingers each.
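The point-to-money conversion described above can be written out as a short sketch. It assumes, as the reward manipulation suggests but the text does not state outright, that only correct and timely responses on reward trials earned points; the function name `bonus_euro` is hypothetical and introduced only for illustration.

```python
# Values taken from the text.
POINTS_PER_CORRECT = 10       # points per correct response within 1500 ms
EURO_PER_200_POINTS = 0.5     # conversion rate: 0.5 euro per 200 points
N_REWARD_TRIALS = 120         # reward trials across the four blocks


def bonus_euro(n_rewarded_correct: int) -> float:
    """Convert a number of rewarded correct responses into a bonus in euro.

    Assumption: points accrue only on reward trials (RA), not on NA trials.
    """
    points = n_rewarded_correct * POINTS_PER_CORRECT
    return points * EURO_PER_200_POINTS / 200


# Upper bound under the stated assumption: every reward trial answered correctly.
max_bonus = bonus_euro(N_REWARD_TRIALS)
```

Under this assumption the bonus is bounded by 3 euro; how partial 200-point steps were paid out is not specified in the text and is not modeled here.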

TEST PHASE
In the test phase, participants completed a similar RT task using the same responses as before. They were told that there was no longer anything to win, but that they should continue to respond as quickly and accurately as possible. Importantly, responses were cued by a new set of pictures that were not associated with the previous cue categories (abstract symbols from the creative symbol collection of Matton images). The new cue category was introduced to eliminate a potentially confounding influence of stimulus-effect associations on task performance in the test phase (cf. Wolfensteller and Ruge, 2011). To probe the degree of A-E binding, the previous action-effects were now presented as response primes (i.e., displayed as squares in the background from 100 ms prior to cue onset until the offset of the cue). Participants were instructed that the colors were irrelevant for the task at hand and should thus be ignored. Analogous to the acquisition phase, cues remained on the screen for a maximum duration of 1500 ms. After a response was given or the maximum duration was reached, performance feedback was presented centrally for 500 ms, with a "+" indicating correct and fast responses and a "−" indicating response errors or omissions (Figure 1B). All possible combinations of response cues and primes were presented equally often, resulting in three types of primes: (1) compatible primes (cP, compatible with the previous A-E mapping), (2) incompatible reward-related primes (iRP, effect of a different previously reward-related response), and (3) incompatible no-reward primes (iNP, effect of a different previously reward-unrelated response). Moreover, responses themselves could be distinguished based on whether they had been related to reward in the acquisition phase (former RA) or not (former NA). Altogether, participants completed eight trials of each prime-response combination, resulting in a total of 128 trials (32 cP, 48 iRP, 48 iNP).
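The trial counts per prime type follow directly from fully crossing the four responses with the four effect primes. The sketch below enumerates that crossing. The labels A1-A4 and E1-E4 follow Figure 1, but the choice of which two actions were rewarded is a hypothetical assignment (it was counterbalanced in the experiment), and the function name `prime_type` is illustrative only.

```python
from collections import Counter
from itertools import product

ACTIONS = ["A1", "A2", "A3", "A4"]
# Each action produced one unique effect during acquisition (labels as in Figure 1).
EFFECT_OF = {"A1": "E1", "A2": "E2", "A3": "E3", "A4": "E4"}
# Hypothetical assignment: two of the four actions were reward actions (RA).
REWARDED = {"A1": True, "A2": True, "A3": False, "A4": False}
REPETITIONS = 8  # trials per prime-response combination (from the text)

ACTION_OF = {effect: action for action, effect in EFFECT_OF.items()}


def prime_type(required_action: str, prime_effect: str) -> str:
    """Classify a prime relative to the response required on that trial."""
    source_action = ACTION_OF[prime_effect]
    if source_action == required_action:
        return "cP"   # prime is the former effect of the required response
    # Incompatible prime: split by reward status of the action that produced it.
    return "iRP" if REWARDED[source_action] else "iNP"


counts = Counter()
for action, effect in product(ACTIONS, EFFECT_OF.values()):
    counts[prime_type(action, effect)] += REPETITIONS

print(dict(counts), sum(counts.values()))
```

The enumeration reproduces the reported 32 cP, 48 iRP, and 48 iNP trials (128 in total), and the result does not depend on which two actions are designated as rewarded.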

ACQUISITION PHASE
As expected, participants' responses were faster on trials with RA than on trials with NA (RA < NA; t = 6.58, p < 0.001; Table 1), confirming that cue-reward associations facilitated performance in the respective trials. Overall, participants responded highly accurately with a small numerical but non-significant difference between reward and no-reward trials (96.8 vs. 95.4%; p > 0.1).

TEST PHASE RESPONSE TIMES (RTs)
Mean RTs of correct responses in the test phase were analyzed using a 2 × 3 repeated-measures analysis of variance (rANOVA) with reward-relatedness of the action (RA vs. NA) and prime compatibility (cP vs. iNP vs. iRP) as within-subject factors (Figure 2A; Table 1). The assumption of sphericity for the rANOVAs was tested using Mauchly's method. Since no significant violations were observed (all W-values > 0.8, p > 0.2), uncorrected

TEST PHASE ACCURACY
An identical rANOVA on the response accuracy revealed no main effects of reward-relatedness of the action or prime compatibility, and no interaction of the two factors (all p-values > 0.1). This indicates that the conditions did not differ with regard to the absolute percentages of errors. We conducted an additional analysis of the relative percentages (i.e., ratios) of different error types across conditions to explore whether the ratio of prime-consistent errors would be increased in iRP-trials. This would support the notion that the perception of former reward-related effects indeed induced a specific, albeit false, action in the test phase (see Schmidt and De Houwer, 2011 for a similar analysis of different error types). To this end, we distinguished between prime-consistent errors, defined as erroneous responses that were consistent with the incompatible prime on a given trial, and prime-inconsistent errors, defined as erroneous responses that were not consistent with the incompatible prime, i.e., random errors. Observed ratios for prime-consistent errors were compared with a baseline of 33.3% that would be expected under a random error distribution with only one out of three possible false responses being prime-consistent. It should be noted that this analysis is limited in two ways, and must hence be considered exploratory: first, due to the nature of the paradigm, only incompatible conditions could be included, as no prime-consistent errors could be made on compatible trials. Second, the analysis could only be performed on a subset of participants, i.e., those who committed errors in the respective conditions (former RA trials: N = 13; former NA trials: N = 11). Ratios of prime-consistent errors were significantly increased in only one condition, namely on trials in which former no-reward responses were primed with incompatible reward-related effects [iRP: 62 vs. 33.3%, t(10) = 2.3, p = 0.042].
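The logic of the error-type analysis can be sketched as follows. This is an illustration of the general procedure, not the authors' analysis code: the data layout (pairs of given and primed responses), the function names, and the use of a one-sample t statistic computed with the standard library are assumptions made for this sketch.

```python
from math import sqrt
from statistics import mean, stdev

N_RESPONSES = 4
# With four responses, an error can land on three wrong keys, and only one of
# them matches the incompatible prime, hence a chance level of 1/3.
BASELINE = 1 / (N_RESPONSES - 1)


def prime_consistent_ratio(errors):
    """Share of errors that match the response associated with the prime.

    `errors` is a list of (given_response, primed_response) pairs taken from
    error trials with incompatible primes. Returns None if there are no
    errors, mirroring the exclusion of participants without errors.
    """
    if not errors:
        return None
    hits = sum(1 for given, primed in errors if given == primed)
    return hits / len(errors)


def one_sample_t(ratios, mu=BASELINE):
    """One-sample t statistic and degrees of freedom against baseline mu."""
    n = len(ratios)
    t = (mean(ratios) - mu) / (stdev(ratios) / sqrt(n))
    return t, n - 1
```

With the 11 participants who committed errors on former NA trials, such a test has 10 degrees of freedom, which matches the reported t(10).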

INDIVIDUAL REWARD RESPONSIVENESS
Our final analysis was concerned with the relation of participants' task performance to inter-individual differences in reward responsiveness. If the observed priming effect indeed reflects reward-driven strengthening of A-E bindings, then the size of this effect may be related to participants' dispositional sensitivity to rewarding events. To this end, we correlated individual RT-differences between iRP-trials and iNP-trials with the individual scores on the reward responsiveness subscale of the BIS/BAS (Carver and White, 1994), which is thought to reflect an individual's dispositional responsiveness to rewarding events. In the present sample, individual reward responsiveness scores varied between 14 and 20 (mean score = 17, SD = 1.74). We observed a significant correlation between RT difference values (NA-iRP minus NA-iNP) and the reward responsiveness subscale across all 26 participants [r(24) = 0.42, p = 0.030, two-tailed], indicating that those participants who reported being more responsive to reward in general showed a greater slowing on NA-iRP-trials compared to NA-iNP-trials (Figure 2B).
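For readers who wish to retrace this type of analysis, the following sketch computes a Pearson correlation between per-participant RT difference scores and questionnaire scores, and converts a correlation coefficient into the corresponding t statistic. The function names are hypothetical and no data from the study are reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_from_r(r, df):
    """t statistic for testing r against zero, with df = N - 2."""
    return r * sqrt(df / (1 - r ** 2))


# x would hold each participant's RT difference (NA-iRP minus NA-iNP) and
# y the reward responsiveness score; df = 26 - 2 = 24 for the present sample.
t_reported = t_from_r(0.42, 24)
```

Applied to the reported r(24) = 0.42, the conversion gives t of about 2.27; the exact p-value depends on the unrounded correlation coefficient.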

DISCUSSION
The present study investigated the influence of reward on A-E binding. We hypothesized that the intrinsic tendency to associate actions with their contingent outcomes could be influenced by assigning motivational values to specific actions. Following an acquisition phase in which half of the applied A-E mappings were related to monetary incentives, the strength of A-E binding was assessed in a test phase by presenting the former action-effects as task-irrelevant primes. Altogether, three major findings were evident, all of which confirmed our prediction. First, and most importantly, induction effects were only found for primes that had been associated with reward during acquisition, providing direct evidence that reward strengthens the association between actions and their outcomes. Note that these differential effects occurred although the primes were entirely irrelevant to the task at hand and were no longer predictive of any reward, which highlights the automatic nature of the binding process. Second, besides slower RTs on trials with correct responses, the same incompatible reward-related primes also increased the ratio of prime-consistent errors compared to a random distribution. This finding nicely illustrates the specificity of the interference effect at the response level and thus directly mirrors the concept of bi-directional action-effect representations in the framework of IMT. Third, inter-individual differences in reward responsiveness predicted the size of differential binding effects for reward-related and reward-unrelated primes. This finding further supports the idea that the observed induction effect with reward-related primes is related to incentive value representations of specific A-E bindings, which likely vary across individuals. Such a pattern is highly consistent with previously reported correlations between reward-sensitivity traits and actual behavioral responsiveness to reward (Kambouropoulos and Staiger, 2004), as well as between reward-related performance facilitation and neural activity in brain regions implicated in reward processing (Locke and Braver, 2008).

FIGURE 2 | Influence of reward-related primes in the test phase. (A)
Despite being entirely irrelevant to the task and being no longer predictive of reward, incompatible reward-related primes (iRP) differentially increased RTs to new cues in the test phase. This effect was unique to former NA responses, in which the required action was never associated with actual reward. Error bars depict the standard error of the mean (SE) for within-subject comparisons. (B) The size of the RT-differences on trials with incompatible reward-related primes compared to trials with incompatible reward-unrelated primes (iRP > iNP) on former NA trials correlated with participants' dispositional responsiveness to reward.
It is, however, important to consider to what extent the observed induction effect with reward-related primes indeed reflects a modulation of A-E binding in the acquisition phase. It could be argued that the influence of former reward effects arises from prioritized processing of a salient stimulus. Several outcomes are possible: for instance, stimulus processing could be generally facilitated by the salient effect, similar to effects of reward-related colors in a visual search array (Kiss et al., 2009). Such facilitation should, however, result in faster rather than slower response execution due to the advanced access to stimulus information. The salient effect color could also lead to a general distraction from the main task. Such effects have been demonstrated by using salient stimuli as irrelevant flankers in a target-discrimination task (Serences et al., 2005), as well as for reward-related colors that were presented at irrelevant positions in a visual search task (Hickey et al., 2010).
Finally, participants could have experienced some kind of frustration in trials displaying former reward-related effects in the test phase, as they could no longer earn bonus money. In turn, frustration could cause unspecific attentional distraction. Importantly, however, all these forms of attentional distraction are unlikely to trigger specific erroneous response tendencies, which is suggested by the result of the exploratory error types analysis in the present study.
It is moreover key to exclude the possibility that the observed differential effect in the test phase is an artefact of the individuals' performance during the acquisition phase. As noted above, there was no difference in performance accuracy between reward-related and unrelated trials. Thus, participants experienced a similar number of A-E couplings in both conditions. Furthermore, participants responded faster in reward-related trials in the acquisition phase. This nicely illustrates that participants were indeed motivated by the prospect of reward and optimized their performance accordingly (Krebs et al., 2010;Schmidt et al., 2012). It could thus be argued that the observed binding for reward-related A-E mappings is a mere consequence of participants allocating more attention to the reward-related color effects during acquisition. Although recent evidence indicates that directing the focus of attention toward action outcomes during the acquisition phase does not automatically facilitate A-E binding (Herwig and Waszak, 2009), future research should certainly specify the mechanisms by which reward modulates A-E binding and to what extent it relies on the modulation of attentional mechanisms.
An additional interesting observation was that responses that had been associated with reward during acquisition were unaffected by prime compatibility in the test phase. Considering that reward-predictive stimuli have not only been shown to increase attention but also to strengthen the associated response pathways (e.g., Krebs et al., 2011;Schmidt et al., 2012), it is feasible to assume that former reward-associated responses in the current study are less prone to interfering information, namely incompatible effect primes.
Another noteworthy finding in the present study was the absence of significant compatibility effects with reward-unrelated primes. This non-finding is rather surprising since binding for unrewarded effects has already been demonstrated frequently in the literature (e.g., Hommel et al., 2003 or Hoffmann et al., 2009). However, the absence of compatibility effects for reward-unrelated primes may be associated with methodological aspects of the present experimental design. First, our study employed visual action-effects, which have been shown to be less salient than auditory action-effects, thereby leading to weaker A-E binding (Kunde, 2001;Dutzi and Hommel, 2009). Moreover, the paradigm was designed to minimize the influence of possibly confounding factors that could artificially inflate the size of induction effects. For instance, we excluded an influence of cue-effect associations by introducing a novel set of pictures as cues in the test phase. Furthermore, the present study employed a full combination of primes and responses, i.e., each effect occurred multiple times both as compatible and as incompatible prime. By using this design, the influence of each particular effect is necessarily weakened in comparison with classical paradigms that present effect stimuli as either only compatible or only incompatible primes in the test phase (cf. Elsner and Hommel, 2004;Wolfensteller and Ruge, 2011). A final paradigmatic aspect relates to the timing of prime presentation relative to the onset of the response cues. Recently, Ziessler and Nattkemper (2011) employed a systematic manipulation of the stimulus-onset asynchrony (SOA) between effect primes and response cues. Effects of prime compatibility were only observed when the primes were presented after cue onset. Thus, the absence of priming effects for reward-unrelated effects in the present study could be partly due to the fact that the primes may not have been presented at the time of their maximal effectiveness.
From a more general perspective, it is moreover a common observation that the introduction of reward signals not only modulates performance in those trials that are subject to actual reward, but also modifies the general task context, resulting in altered performance on the no-reward trials, as compared to a "neutral" task context without reward (e.g., Braem et al., 2012). Thus, in the present study, the presence of reward in the acquisition phase may have influenced participants' experience of the unrewarded A-E mappings as well. It could be argued that unrewarded effects in a reward context may be perceived as less significant. Specifically, it has been demonstrated that behavioral and neural influences of high-reward vs. low-reward stimuli critically depend on the overall context, i.e., the differences between trial types become more distinct in a general reward context (Delgado et al., 2004). Such a relative "devaluation" of unrewarded effects may counteract A-E binding in the present paradigm, such that for an action which does not produce an explicitly positive outcome, a bi-directional binding of the two events might be attenuated. Future research could explore this question by explicitly introducing reward as well as punishment signals during the acquisition of A-E associations.
Future research should also specify the precise mechanisms by which reward enhances the association strength of motor representations and representations of the respective sensory outcomes. It is known from numerous studies employing reward-modulated paradigms that reward associations can influence cognitive functions and behavior via diverse mechanisms (Pessoa, 2009;Pessoa and Engelmann, 2010). Among them are the prioritization of perceptual processing and the enforcement of specific response tendencies, as well as the increase of cognitive and physical effort to perform the task and the change of long-term stimulus representations. While conclusive statements about the underlying mechanism may not be warranted based on the present data, it appears likely that reward modulates the behavioral relevance of both an action and its consequence, which may in turn enforce the joint coding of the two events. With regard to the neural level, dopamine has been proposed to underlie the formation of sensorimotor associations (Colzato et al., 2007a). Considering that reward-predicting stimuli are known to trigger dopaminergic activity (Knutson and Gibbs, 2007;Schott et al., 2008), it is likely that the reward-related effect in our own study is mediated by dopamine as well. Future studies will be needed to illuminate this relationship further, e.g., by assessing markers of individual dopamine levels, such as the spontaneous eye-blink rate, as covariates (Colzato et al., 2007b;Aarts et al., 2012), or by employing a similar paradigm in individuals with specific genotypes or clinical conditions promoting differential striatal dopamine levels (Schott et al., 2007;Yacubian et al., 2007).