The Impact of Deliberative Strategy Dissociates ERP Components Related to Conflict Processing vs. Reinforcement Learning

Warren, Christopher Michael; Holroyd, Clay Brian

doi:10.3389/fnins.2012.00043

ORIGINAL RESEARCH article

Front. Neurosci., 03 April 2012

Sec. Decision Neuroscience

Volume 6 - 2012 | https://doi.org/10.3389/fnins.2012.00043

This article is part of the Research Topic “The Neuroscience and Psychophysiology of Experience-Based Decisions”

Christopher M. Warren*

Clay B. Holroyd

Department of Psychology, University of Victoria, Victoria, BC, Canada

We applied the event-related brain potential (ERP) technique to investigate the involvement of two neuromodulatory systems in learning and decision making: The locus coeruleus–norepinephrine system (NE system) and the mesencephalic dopamine system (DA system). We have previously presented evidence that the N2, a negative deflection in the ERP elicited by task-relevant events that begins approximately 200 ms after onset of the eliciting stimulus and that is sensitive to low-probability events, is a manifestation of cortex-wide noradrenergic modulation recruited to facilitate the processing of unexpected stimuli. Further, we hold that the impact of DA reinforcement learning signals on the anterior cingulate cortex (ACC) produces a component of the ERP called the feedback-related negativity (FRN). The N2 and the FRN share a similar time range, a similar topography, and similar antecedent conditions. We varied factors related to the degree of cognitive deliberation across a series of experiments to dissociate these two ERP components. Across four experiments we varied the demand for a deliberative strategy, from passively watching feedback, to more complex/challenging decision tasks. Consistent with our predictions, the FRN was largest in the experiment involving active learning and smallest in the experiment involving passive learning whereas the N2 exhibited the opposite effect. Within each experiment, when subjects attended to color, the N2 was maximal at frontal–central sites, and when they attended to gender it was maximal over lateral-occipital areas, whereas the topology of the FRN was frontal–central in both task conditions. We conclude that both the DA system and the NE system act in concert when learning from rewards that vary in expectedness, but that the DA system is relatively more exercised when subjects are relatively more engaged by the learning task.

Introduction

Adaptive decision making depends on both fast and efficient processing of stimulus events for effective responding (e.g., Servan-Schreiber et al., 1990) and slow trial-to-trial learning of action values for optimizing the selection process (e.g., Schultz et al., 1997). The catecholaminergic neuromodulatory systems that distribute norepinephrine (NE) and dopamine (DA) have been implicated in these two groups of processes, respectively (Servan-Schreiber et al., 1990; Schultz et al., 1997). Further, putative manifestations of these systems have been identified in the human electroencephalogram (EEG; Holroyd and Coles, 2002; Nieuwenhuis et al., 2005a,b; Warren et al., 2011). However, the way these two systems interact has yet to be explored.

The locus coeruleus–norepinephrine system (NE system) is believed to play a key role in facilitating fast and effective processing of task-relevant stimuli (Usher et al., 1999). The locus coeruleus (LC) is a neuromodulatory nucleus in the brainstem that briefly enhances cortical processing in reaction to motivationally salient or conflict-inducing events (Usher et al., 1999; Gilzenrat et al., 2002). The LC is the primary source of NE to the cortex and other regions (Berridge and Waterhouse, 2003), where NE release increases the responsivity of individual neurons and improves the signal-to-noise ratio of associated neural networks (Servan-Schreiber et al., 1990). Single-cell recordings from the LC in monkeys show that the LC releases NE in phasic bursts to motivationally salient events, and periods of greater phasic release of NE are associated with better performance in target discrimination tasks (Usher et al., 1999). The NE system is also auto-inhibitory, such that phasic bursts of NE are followed by a refractory-like¹ period lasting ∼500 ms characterized by reduced or arrested NE supply to the cortex.

In a previous paper (Warren et al., 2011), we proposed that the impact of phasic bursts of NE on cortical processing manifests in the human EEG as an increase in amplitude of the N2, a negative deflection of the human event-related brain potential (ERP) occurring between about 200 and 300 ms after the onset of the eliciting stimulus, the amplitude of which is increased by unexpected or conflict-inducing events (e.g., Nieuwenhuis et al., 2003). This theory is a modification of a previous “LC–P3 theory” that holds that the phasic bursts of NE produce the P3 – a prominent, positive deflection in the ERP that immediately follows the N2 – rather than the N2 itself (Nieuwenhuis et al., 2005a). Thus, our “modified LC–P3 theory” develops this account by proposing that the LC burst impacts cortical activity somewhat earlier than originally proposed, during the time period of the N2 (∼250 ms post-stimulus), whereas the LC refractory period coincides with P3 generation.

A key prediction of our proposal is that any change in the ERP due to noradrenergic modulation should exhibit a variable scalp distribution dependent on relative engagement of the different cortical areas giving rise to the ERP. This position follows from two key characteristics of the NE system. First, the broadly dispersed efferent projection system of the LC distributes NE to all regions of the cortex, so any given phasic release can modulate neural activity (and the associated N2) anywhere in cortex (Berridge and Waterhouse, 2003; Nieuwenhuis et al., 2007). Second, NE-mediated changes in activity should be greatest in cortical areas that are most engaged by the task at hand because increasing the signal-to-noise ratio in the entire cortex will have the greatest impact in those areas (Nieuwenhuis et al., 2005a, 2011). This position contrasts with theories of the N2 which posit that the N2 is produced specifically by the anterior cingulate cortex (ACC) and should therefore exhibit a relatively fixed topology, maximal at frontal–central regions of the scalp (e.g., van Veen and Carter, 2002a,b; Yeung et al., 2004).

In previous work, we supported the modified LC–P3 theory by demonstrating that the scalp distribution of the N2 varies widely according to task changes that differentially engage different cortical areas (Warren et al., 2011). We presented subjects with pictures of male and female faces that were tinted either blue or yellow. Subjects attended to either the gender or the color of the faces and counted targets in an oddball task. The impact of frequency was isolated by subtracting frequent stimulus trials from infrequent stimulus trials, yielding a difference wave representative of the change in neural activity specifically caused by differences in stimulus probability (and putatively due to differences in NE recruitment). When subjects attended to the color of the face, the N2 in the difference wave was maximal over frontal–central regions as is often observed in simple oddball tasks (e.g., Nieuwenhuis et al., 2003; Holroyd et al., 2008; but see Folstein and Van Petten, 2007), consistent with arguments that the N2 is generated in the ACC (van Veen and Carter, 2002a,b; Yeung et al., 2004). By contrast, when subjects attended to the gender of the faces the N2 in the difference wave was maximal over lateral-occipital regions, consistent with a relatively large change in activity within the fusiform face-processing area (FFA). This study demonstrated that identical task stimuli (colored faces) presented with identical task designs (standards and deviants) can nevertheless yield radically different N2 topologies depending on which aspect of the stimuli participants are instructed to attend to.

An interesting special case of the N2 occurs when the eliciting stimulus is a feedback stimulus in a reward/no-reward paradigm. A negative feedback stimulus (e.g., that indicates a potential reward was not received) elicits a frontal–central negative deflection in the same time range as the N2, but positive feedback does not (Miltner et al., 1997). This difference is called the feedback-related negativity (FRN), and is usually measured with a difference wave approach whereby the ERP to reward feedback is subtracted from the ERP to error feedback (Holroyd and Krigolson, 2007). It is important to note that the FRN may be characterized by variance in the ERP associated with both negative and positive feedback. Source localization studies suggest that the FRN is generated in, or very close to, the ACC (Gehring and Willoughby, 2002; Miltner et al., 2003; Hewig et al., 2007). Additionally, a neurocomputational theory of this ERP component is based on the seminal observation that rewarding events elicit phasic bursts of dopamine (DA) activity that are utilized by the targets of the DA system (including the ACC) for the purpose of adaptive decision making (Schultz et al., 1997; Holroyd and Coles, 2002). In particular, single-cell recordings from primates show increased phasic DA activity in response to unexpected rewards or reward predictors, and shallow dips from baseline DA activity in response to punishment or to the absence of expected rewards (e.g., Schultz, 2002). Holroyd and Coles (2002) proposed the reinforcement learning theory of the FRN, which holds that the FRN reflects the impact of these phasic DA signals on the ACC such that motor neurons in the ACC are inhibited and disinhibited by phasic increases and decreases of DA, respectively.

Recent evidence suggests that these phasic DA signals specifically modify the amplitude of the N2. According to this position, the ACC produces a negative deflection to unexpected task-relevant events (the N2), including unexpected negative feedback and unexpected reward feedback. However, unexpected reward feedback also elicits a dopamine-induced positive deflection (“the reward positivity”) that is superimposed over the N2 and cancels it out (Holroyd et al., 2008). In other words, unexpected error and reward feedback elicit the N2, but unexpected reward feedback also elicits a reward positivity that obscures the N2, creating the difference observed between the ERPs to positive and negative feedback (the FRN).

To dissociate the reward positivity from the N2, a recent multi-experiment study presented subjects with complicated reward feedback that indicated not only whether a subject had won or lost money, but also what response was required of them for the subsequent trial (Baker and Holroyd, 2011). In one experiment, a stimulus-induced delay in reward processing caused the reward positivity to appear about 100 ms later than usual (peaking at about 350 ms), thereby exposing the N2 on those trials. When the reward-feedback stimulus was simplified in further experiments, the reward positivity appeared earlier and attenuated the N2. Furthermore, factors related to response conflict impacted N2 amplitude and reduced the reward positivity on high-conflict reward trials.

The ACC has been posited to be the neural generator of both the N2 (van Veen and Carter, 2002a,b; Yeung et al., 2004) and the FRN (Holroyd and Coles, 2002). Furthermore, here we have proposed that noradrenergic modulation enhances activity in the ACC and all across the cortex, amplifying the N2 in target areas. Thus, there are three factors that push the amplitude of the N2 at frontal–central scalp locations up and down: ACC activity, noradrenergic modulation, and dopaminergic modulation. If we have any chance of understanding how the frontal–central N2 provides insight into ACC function, we need to understand how these systems interact – otherwise we will be at a loss to interpret N2 data.

To investigate this issue, we employed the same paradigm used in our previous study (Warren et al., 2011), presenting subjects with male or female faces tinted either blue or yellow, with frequent or infrequent category presentations based on either the gender or the color of the faces. But here the stimuli also indicated reward or no-reward, allowing us to simultaneously examine the N2 and the FRN. We manipulated the amplitudes of the reward positivity and the N2 along two independent dimensions. Along one dimension, we varied (across subjects) the degree of participant engagement in a feedback task, which is known to affect FRN amplitude. For example, Yeung et al. (2005) manipulated the degree to which a deliberative strategy was required of subjects, from passively observing reward/no-reward outcomes, to actively making a decision that would result in either reward or no-reward. The FRN was significantly larger when subjects utilized the feedback to optimize their decisions, as opposed to passively collecting rewards (see also Holroyd et al., 2009; Li et al., 2011; Peterson et al., 2011). We implemented this manipulation across three experiments wherein subjects passively collected rewards in Experiment 1 (Passive Experiment), made a decision based on multiple stimulus feature-response combinations in Experiment 2 (Active Experiment), and intermediate to these, made a decision based on relatively simple response–reward contingencies in Experiment 3 (Moderate Experiment). We predicted that the FRN would be largest in the Active Experiment and reduced or absent in the others. By contrast, we predicted that the N2 would be smaller with increasing task engagement because of component overlap with the reward positivity elicited by infrequent rewards.

Along the second dimension we varied N2 amplitude by manipulating (within subjects) the attended dimension of the feedback: Subjects were required to attend to either the color or the gender of the feedback stimuli (male or female faces tinted either blue or yellow). We predicted that switching from color to gender would move the N2 from frontal–central to lateral-occipital regions of the scalp. By contrast, we predicted that the FRN would remain frontal–central irrespective of the attended dimension of the feedback. Further, we predicted that we would observe maximal interference between the two components in the color condition of the Active Experiment, where both the N2 and the reward positivity are frontal–central. These results would validate our claim that the N2 and FRN are produced by distinct neural mechanisms, one that produces a negativity to infrequent events that has a variable scalp distribution consistent with a noradrenergic origin, and one that produces a positivity to rewards and a negativity to no-rewards that has a frontal–central scalp distribution consistent with genesis in the ACC.

Experiment 1: Passive Learning

In the Passive Experiment we sought to replicate the results of our previous study by engaging the NE system and the N2 in an oddball task with minimal involvement of reinforcement learning systems and therefore minimal interference from the FRN. We employed the exact same paradigm as reported in our previous work (Warren et al., 2011) except that instead of counting stimuli associated with a target category (e.g., male faces), subjects counted earnings accrued with each stimulus presentation (e.g., if subjects were told that they would be given 5 cents for each male face); they were asked to report the sum once during the block and a second time at the end of the block. Importantly, because participants were not required to make an overt response on each trial, we expected this task to elicit only a small FRN, if any (Yeung et al., 2005; Holroyd et al., 2009; Li et al., 2011). Further, as we observed previously, we predicted that relative engagement of the FFA in the attend-gender condition would enhance the N2 over lateral-occipital sites, whereas relative engagement of the ACC in the attend-color condition would enhance the N2 over frontal–central sites. Finally, we predicted that the FRN – to the extent that it was present – would not exhibit any changes in scalp topography.

Method

Methods were identical across all four experiments except where indicated.

Participants

Twenty-one people (three males) completed this experiment. For all experiments reported in this paper, participants signed up through the research participation system at the University of Victoria, Canada, and were compensated with extra credit in an undergraduate psychology course or were paid $20.00 Canadian for their time. This project (Experiments 1 through 4) was approved by the human subjects review board at the University of Victoria and conducted in accordance with the ethical standards prescribed in the 1964 Declaration of Helsinki.

Apparatus and procedure

Participants were seated comfortably, approximately 50 cm in front of a computer screen, in an electromagnetically shielded booth. Stimuli consisted of male or female faces (30 examples of each, lifted from black and white photos, excluding hair and contour of head) and tinted either blue or yellow (∼4.4° visual angle). In a previous experiment (Warren et al., 2011), we used a larger set of the same stimuli (40 males and 40 females), but because the error rates in discriminating between male and female faces were high, here we selected a subset of those stimuli: The 75% that were most accurately discriminated previously. For both stimulus dimensions (color, gender), one stimulus type occurred infrequently (20% of all trials). The order of stimulus presentation was randomized with replacement. At the beginning of each block, subjects were instructed by the computer program to keep track of presentations of a specific target stimulus (blue faces, yellow faces, male faces, or female faces), which when presented would indicate a winning trial. The task consisted of eight blocks of 75 trials each (600 total trials), counterbalanced such that each of the four stimulus types (blue males, yellow males, blue females, yellow females) occurred in two blocks as the target, and of those two blocks, once as a frequent target and once as an infrequent target. Stimuli were presented for 1200 ms and were separated by a fixation cross displayed for 300 ms (see Figure 1, Passive Learning, for a graphic representation of the task).

FIGURE 1

Figure 1. Graphic representation of the four experiments.

Each presentation of the target stimulus category indicated that the subject won $0.05. Subjects were instructed to keep track of the money won and were required to report their count twice per block (at a random trial number about halfway through each block, and at the end of each block). This method yielded 16 reports of the subject’s money count. Subjects reported their count by answering an eight-choice multiple-choice question, choosing from several ranges within which the correct count fell (e.g., between $0.30 and $0.50, or between $0.55 and $0.75, etc.). We assessed accuracy by dividing the number of correct reports by the number of total reports.
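The block design and payoff scheme described above can be sketched in a few lines of Python. This is only an illustrative simulation, not the authors' presentation code: the function names, the random seed, and the shuffled block order are assumptions, and the sketch models only the attended (target) dimension of each block.

```python
import random
from itertools import product

TARGETS = ["blue", "yellow", "male", "female"]  # target categories named in the text
TRIALS_PER_BLOCK = 75
P_INFREQUENT = 0.20        # proportion of trials showing the infrequent stimulus type
REWARD_PER_TARGET = 0.05   # dollars won per target presentation

def build_blocks():
    """Return 8 block designs: each target once as frequent, once as infrequent."""
    return [{"target": target, "target_is_infrequent": infrequent}
            for target, infrequent in product(TARGETS, (False, True))]

def simulate_block(block, rng):
    """Draw 75 trials with replacement; True marks a target (winning) trial."""
    p_target = P_INFREQUENT if block["target_is_infrequent"] else 1 - P_INFREQUENT
    return [rng.random() < p_target for _ in range(TRIALS_PER_BLOCK)]

rng = random.Random(0)   # fixed seed so the illustration is reproducible
blocks = build_blocks()
rng.shuffle(blocks)      # assumption: the text does not specify the block order
for block in blocks:
    wins = sum(simulate_block(block, rng))
    print(block["target"], block["target_is_infrequent"],
          f"${wins * REWARD_PER_TARGET:.2f}")
```

Because trials are sampled with replacement, the number of winning trials (and hence the sum a subject must track) varies from block to block around its expected value.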

Data acquisition

The EEG was recorded from 41 electrode locations arranged in the standard 10–20 layout using Brain Vision Recorder software (Version 1.3, Brain Products, Munich, Germany). During recording, the EEG data were referenced to the average voltage across channels, sampled at 250 Hz, and amplified (Quick Amp, Brain Products) and filtered through a passband of 0.017–67.5 Hz (90 dB octave roll off). Impedances were below 12 kΩ.

EEG data analysis

The EEG data were filtered off-line through a 0.1- to 20-Hz passband phase-shift-free Butterworth filter and re-referenced to linked mastoids. Ocular artifacts were removed using the algorithm described by Gratton et al. (1983). Trials in which the change in voltage at any channel exceeded 35 μV per sampling point were removed. In total, 0.02% of the data were discarded. Epochs of 1000 ms were extracted from the continuous EEG, from 200 ms before stimulus onset to 800 ms after. The data were baseline-corrected according to the average amplitude of the EEG over the 200 ms preceding stimulus presentation, and ERPs were created by averaging the EEG data for each condition, electrode site, and participant.
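The epoching, artifact-rejection, and baseline-correction steps described above can be illustrated with a minimal pure-Python sketch for a single channel. It is not the authors' analysis code: the function names are hypothetical, filtering and ocular correction are omitted, and the synthetic trace is for illustration only.

```python
SAMPLING_RATE_HZ = 250       # recording rate given in the text
PRE_MS, POST_MS = 200, 800   # epoch limits relative to stimulus onset
MAX_STEP_UV = 35.0           # rejection threshold per sampling point

def ms_to_samples(ms):
    """Convert a duration in ms to a number of samples at 250 Hz."""
    return ms * SAMPLING_RATE_HZ // 1000

def extract_epoch(signal, onset_index):
    """Cut the samples from 200 ms before to 800 ms after stimulus onset."""
    pre, post = ms_to_samples(PRE_MS), ms_to_samples(POST_MS)
    return signal[onset_index - pre:onset_index + post]

def has_step_artifact(epoch):
    """True if any sample-to-sample voltage change exceeds the threshold."""
    return any(abs(b - a) > MAX_STEP_UV for a, b in zip(epoch, epoch[1:]))

def baseline_correct(epoch):
    """Subtract the mean of the 200-ms pre-stimulus interval from every sample."""
    pre = ms_to_samples(PRE_MS)
    baseline = sum(epoch[:pre]) / pre
    return [v - baseline for v in epoch]

def average_erp(epochs):
    """Point-by-point mean across the retained epochs."""
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]

signal = [5.0] * 500                        # flat synthetic trace, illustrative only
epoch = extract_epoch(signal, onset_index=100)
if not has_step_artifact(epoch):
    print(len(epoch), baseline_correct(epoch)[0])
```

At 250 Hz each sample spans 4 ms, so the 1000-ms epoch contains 250 samples, of which the first 50 form the baseline.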

To isolate the effect of reward independent of frequency, we subtracted the ERPs associated with reward from the ERPs associated with no-reward, yielding an attend-color FRN and an attend-gender FRN that were equated for the effect of stimulus probability. This method maximized the signal-to-noise ratio in the ERPs, as opposed to averaging the ERPs separately for the infrequent reward trials, frequent reward trials, infrequent no-reward trials, and frequent no-reward trials. Similarly, to isolate the effect of frequency independent of reward feedback, we subtracted the ERP associated with the frequently occurring stimuli from the ERP associated with the infrequently occurring stimuli, collapsed across reward condition, yielding a difference-wave N2 (dN2) for each task condition (attend-color, attend-gender). Thus, each of the infrequent and frequent ERPs contained equal numbers of reward and no-reward trials such that the difference between these ERPs was equated for the effects of reward. Note that because NE system activity causes a change in the relative activation of the underlying cortical systems (i.e., making ERP components larger), the impact of NE on the ERP is most appropriately measured in a difference wave that isolates that change. We distinguish between the dN2 and the “raw” N2 in light of this consideration. The interaction of the raw N2 and the reward positivity to the four individual conditions was examined separately in an across-group comparison (below).
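The two subtractions described above can be sketched as follows. The trial dictionaries, their keys, and the three-sample toy epochs are assumptions made for illustration; real epochs would contain one value per sampling point.

```python
def pooled_erp(trials, **labels):
    """Average all epochs whose labels match, e.g. reward=True."""
    selected = [t["epoch"] for t in trials
                if all(t[key] == value for key, value in labels.items())]
    return [sum(samples) / len(selected) for samples in zip(*selected)]

def difference_wave(minuend, subtrahend):
    """Point-by-point subtraction of two ERPs of equal length."""
    return [a - b for a, b in zip(minuend, subtrahend)]

def frn_wave(trials):
    """No-reward ERP minus reward ERP, each pooled across stimulus frequency."""
    return difference_wave(pooled_erp(trials, reward=False),
                           pooled_erp(trials, reward=True))

def dn2_wave(trials):
    """Infrequent ERP minus frequent ERP, each pooled across reward condition."""
    return difference_wave(pooled_erp(trials, infrequent=True),
                           pooled_erp(trials, infrequent=False))

# Toy data (µV), one trial per cell of the 2 x 2 design; values are invented.
trials = [
    {"reward": True,  "infrequent": False, "epoch": [0.0,  2.0, 0.0]},
    {"reward": True,  "infrequent": True,  "epoch": [0.0,  0.0, 0.0]},
    {"reward": False, "infrequent": False, "epoch": [0.0, -1.0, 0.0]},
    {"reward": False, "infrequent": True,  "epoch": [0.0, -3.0, 0.0]},
]
print(frn_wave(trials))  # [0.0, -3.0, 0.0]
print(dn2_wave(trials))  # [0.0, -2.0, 0.0]
```

Pooling trials before subtracting, rather than averaging the four cells separately, is what gives each ERP in the subtraction a larger trial count.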

The amplitudes of the dN2 and FRN were assessed using a base-to-peak measure as follows: For each subject in each condition, the most negative peak between 200 and 280 ms in the attend-color condition, or between 300 and 380 ms in the attend-gender condition, was identified and recorded as the dN2/FRN peak amplitude. The base amplitude of the dN2/FRN was then taken as the most positive voltage prior to the dN2/FRN, and this value was subtracted from the dN2/FRN peak amplitude, yielding our base-to-peak measures. This procedure controls for overlap with the P2, a positive deflection that typically immediately precedes the dN2 and that can push the dN2 into positive peak values. Note that because the FRN is not typically preceded by any notable deflection in the difference wave, the base measure is approximately 0 μV; for this reason the base-to-peak measure of the FRN is equivalent to a peak amplitude measure. However, we chose to assess FRN base-to-peak for consistency with our method for assessing dN2 amplitude.
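A minimal sketch of the base-to-peak measure is given below. The function name and the toy waveform are hypothetical, and the sketch assumes that the search for the base runs from stimulus onset (0 ms) up to the peak, a range the text does not state explicitly.

```python
# Search windows (ms) for the negative peak, per task condition, as given in the text.
PEAK_WINDOWS_MS = {"attend-color": (200, 280), "attend-gender": (300, 380)}

def base_to_peak(wave, times_ms, window_ms):
    """Return peak amplitude minus base amplitude for one difference wave."""
    start, end = window_ms
    in_window = [i for i, t in enumerate(times_ms) if start <= t <= end]
    peak_index = min(in_window, key=lambda i: wave[i])  # most negative point
    # Assumption: the base is the most positive voltage between onset and the peak.
    before_peak = [i for i, t in enumerate(times_ms) if t >= 0 and i < peak_index]
    base = max(wave[i] for i in before_peak)
    return wave[peak_index] - base

# Toy difference wave (µV) with a P2-like positivity preceding the negativity.
times_ms = [0, 100, 200, 240, 280, 320]
wave = [0.0, 2.0, 1.0, -3.0, -1.0, 0.5]
print(base_to_peak(wave, times_ms, PEAK_WINDOWS_MS["attend-color"]))  # -5.0
```

In the toy example the raw peak is −3.0 µV, but the preceding positivity of 2.0 µV makes the base-to-peak amplitude −5.0 µV, which is the overlap the measure is meant to control for.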

In assessing the change in component topology across task conditions, we focused on two electrode sites representative of frontal–central and lateral-occipital scalp regions as we did in our previous study, specifically at channel locations FCz and P8. Both the FRN and the dN2 are typically maximal at channel FCz (e.g., Holroyd et al., 2008) and the dN2 was maximal at channel P8 in the attend-gender condition of our previous study (Warren et al., 2011). Single-tailed t-tests were applied to assess the amplitudes of these ERP components at these channels because of our a priori hypotheses of the direction of each difference. For example, we predicted that the dN2 would be larger at channel P8 than at channel FCz for the attend-gender condition; a dN2 that was larger at FCz than at P8 would run contrary to our hypothesis.

Results

Behavioral results

Mean accuracy was 79.2% (SD = 14.4%) for the attend-color condition and 68.5% (SD = 21.2%) for the attend-gender condition. The data of one subject were eliminated from further analysis because the accuracy score was more than 2 SD below the mean in the attend-color condition. For the remaining 20 subjects, mean accuracy was 80.6% (SD = 13.1%) for the attend-color condition and 70% (SD = 20.4%) for the attend-gender condition. This difference approached significance using a two-tailed t-test, t(19) = −2.0, p < 0.10.

EEG results

The raw ERPs, difference waves and scalp distributions are shown in Figure 2. Inspection of the scalp distributions suggests that the attend-color dN2 was maximal over frontal–central sites (FCz, −4.5 μV) whereas the attend-gender dN2 was maximal at lateral-occipital regions (PO8, −3.5 μV). This impression was confirmed with a 2 × 2 ANOVA on dN2 amplitude with electrode (FCz vs. P8) and task (attend-color vs. attend-gender) as repeated factors. There was an effect of task such that the dN2 was larger in the attend-color condition (−4.0 μV) than the attend-gender condition (−2.8 μV), F(1, 19) = 10.8, p < 0.01, η² = 0.36. There was also an interaction of electrode and task, F(1, 19) = 6.8, p < 0.05, η² = 0.26, and one-tailed paired-samples t-tests revealed that in the attend-color condition, the dN2 was larger at FCz than P8 (−4.5 vs. −3.4 μV), t(19) = −2.0, p < 0.05, whereas in the attend-gender condition the dN2 was larger at P8 than at FCz (−3.2 vs. −2.5 μV), t(19) = 2.0, p < 0.05.
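As a quick consistency check on the effect sizes above, the following sketch recomputes them from the reported F values. It assumes that the reported η² is partial eta squared, which the text does not state; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values and degrees of freedom as reported for the dN2 ANOVA.
for label, f_value in [("task", 10.8), ("electrode x task", 6.8)]:
    print(label, round(partial_eta_squared(f_value, 1, 19), 2))
# task 0.36
# electrode x task 0.26
```

Under that assumption, both recomputed values match the reported effect sizes.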

FIGURE 2

Figure 2. Grand average ERPs for Experiment 1, recorded from channel FCz and P8 (see labels) and scalp distributions associated with the difference waves. The top row shows the “raw” ERPs for each of the frequency by reward conditions across tasks and electrodes. The middle row shows the dN2 and FRN difference waves across task and electrodes. The bottom row shows the scalp distributions of the dN2 and FRN across tasks. The scalp distributions reflect the base-to-peak measure of each of the dN2 and FRN. The black star on the scalp map denotes channel FCz, and the white star denotes P8. Note that negative is plotted up.

Inspection of the scalp distributions in Figure 2 further indicates that the FRN was distributed over posterior, rather than frontal, regions of the head in both the attend-color (Pz, −5.2 μV) and attend-gender (POz, −4.0 μV, followed by Pz, −4.0 μV) conditions. A 2 × 2 ANOVA on FRN amplitude with electrode and task as repeated factors revealed an effect of electrode such that the FRN was larger at FCz than at P8 (−3.5 vs. −2.9 μV), F(1, 19) = 4.8, p < 0.05, η² = 0.20. There was a trend toward a main effect of task such that the attend-color task yielded a larger FRN than the attend-gender task (−3.5 vs. −2.9 μV), F(1, 19) = 4.0, p < 0.10, η² = 0.18. There was also a trend toward an interaction of electrode and task, F(1, 19) = 3.2, p < 0.10, η² = 0.14, and one-tailed paired-samples t-tests revealed that in the attend-color condition, the FRN was larger at FCz than P8 (−4.1 vs. −3.0 μV), t(19) = −2.3, p < 0.05, whereas there was no significant difference in the attend-gender condition (FCz: −3.0 μV; P8: −2.9 μV, p > 0.05). An additional check indicated that the FRN was larger at Pz than FCz in the attend-gender condition, t(19) = 2.9, p < 0.01, but not in the attend-color condition, t(19) = 1.4, p > 0.05.

Discussion

We proposed that the dN2 is a manifestation of cortex-wide NE neuromodulation, and predicted that the impact of NE modulation on cortex and therefore the topology of the dN2 should vary according to task demands. By contrast, a standard theory of the FRN holds that it is produced by the impact of DA signals on ACC activity, and therefore that the FRN should appear with a consistent frontal–central scalp topology across task conditions. Here, we replicated our previous finding that the dN2 changes from exhibiting a primarily frontal–central scalp distribution when subjects categorize tinted faces based on color to a more lateral-occipital distribution when subjects categorize the same face stimuli based on the gender of the face. Further, although the FRN was numerically larger at FCz than at P8 in both the attend-color and attend-gender conditions, the difference was not significant in the attend-gender condition, the component was relatively small overall (ranging from −2.9 to −4.1 μV), and it exhibited a scalp distribution that was mostly posterior (see Figure 2). These results are inconsistent with the identification of this component with the FRN (Miltner et al., 1997) and indicate that (as predicted) this task did not produce a robust FRN. We conclude that, with minimal interference from the FRN, the dN2 exhibits a prominent yet variable scalp distribution.

Experiment 2: Active Learning

The Active Experiment maximized engagement of the system underlying the FRN by presenting subjects with an apparently complex decision task that encouraged deliberation. Subjects were asked to choose between two elaborate images of tarot cards presented side-by-side on a computer screen by pressing either a left or right key on a keyboard. Six different cards were paired a total of 15 different ways. The subjects were told that with each pairing one card had a better chance of winning than the other, and that they were required to learn which card to pick in any specific pairing (as opposed to finding which of the six cards had the best chance of winning overall). The complexity of the stimulus displays was intended to cultivate a sense that the task was challenging yet learnable (when in fact it was not). In so doing we expected the feedback stimuli to elicit a relatively large FRN with a frontal–central scalp topography for both the attend-gender and attend-color conditions. We further predicted that the FRN would interfere with the production of the dN2 in both the attend-color and attend-gender conditions.
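The count of 15 pairings follows from choosing 2 cards out of 6 without regard to order, which can be verified with a short sketch. The card labels are placeholders, not the actual tarot images used in the experiment.

```python
from itertools import combinations

cards = [f"card_{i}" for i in range(1, 7)]   # six placeholder card labels
pairings = list(combinations(cards, 2))      # unordered pairs of distinct cards
print(len(pairings))  # 15
```

Left/right screen position is ignored here; counting each pair in both spatial arrangements would double the number of displays.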