Larger Error Signals in Major Depression are Associated with Better Avoidance Learning

The medial prefrontal cortex (mPFC) is particularly reactive to signals of error, punishment, and conflict in the service of behavioral adaptation and it is consistently implicated in the etiology of major depressive disorder (MDD). This association makes conceptual sense, given that MDD has been associated with hyper-reactivity in neural systems associated with punishment processing. Yet in practice, depression-related variance in measures of mPFC functioning often fails to relate to performance. For example, neuroelectric reflections of mediofrontal error signals are often found to be larger in MDD, but a deficit in post-error performance suggests that these error signals are not being used to rapidly adapt behavior. Thus, it remains unknown if depression-related variance in error signals reflects a meaningful alteration in the use of error or punishment information. However, larger mediofrontal error signals have also been related to another behavioral tendency: increased accuracy in avoidance learning. The integrity of this error-avoidance system remains untested in MDD. In this study, EEG was recorded as 21 symptomatic, drug-free participants with current or past MDD and 24 control participants performed a probabilistic reinforcement learning task. Depressed participants had larger mid-frontal EEG responses to error feedback than controls. The direct relationship between error signal amplitudes and avoidance learning accuracy was replicated. Crucially, this relationship was stronger in depressed participants for high conflict “lose–lose” situations, demonstrating a selective alteration of avoidance learning. This investigation provided evidence that larger error signal amplitudes in depression are associated with increased avoidance learning, identifying a candidate mechanistic model for hypersensitivity to negative outcomes in depression.


INTRODUCTION
At the interface of emotion and cognition, affective neuroscience has the potential to advance the characterization of disease states away from idiosyncratic symptom-based criteria toward common brain-based nosology (cf. Insel et al., 2010). One promising example is evidenced by the convergence of cognitive, emotional, and neurological accounts of major depressive disorder (MDD). In addition to cardinal features of anhedonia and low mood, cognitive processing in MDD is characterized by a negative emotional distortion of the world, the self, and the future (Beck, 1976). Eshel and Roiser (2010) have suggested that these symptoms of MDD may reflect an impairment in basic reward (hypo-responsive) and punishment (hyper-reactive) processing systems. In this investigation, we propose a mechanism by which the hyper-reactive distortion of punishment information in MDD biases avoidance learning, possibly increasing the salience of "bad" outcomes.
The medial prefrontal cortex (mPFC), particularly the anterior cingulate cortex (ACC), appears to be centrally involved in a self-monitoring network. This system is consistently activated in neuroimaging investigations of reward and punishment (Carter et al., 1998; Ridderinkhof et al., 2004) and it is strongly implicated in the etiology of MDD (Davidson et al., 2002). The ACC has been described as a functional node in complex processes such as adaptive control over behavior and acquisition of reinforcement contingencies, as a dynamic processing hub for attention and action selection, and as a sensitive determinant of motivational functions including emotional reactivity and willful engagement (Devinsky et al., 1995; Vogt, 2005). The combined activities of this particular neural system identify it as a focal node by which emotion may be internalized to affect cognitive functioning.
One reliable measurement proposed to reflect mPFC functioning is the feedback-related negativity (FRN), a scalp-measured electrical voltage deflection occurring after feedback indicating a loss of value, or a performance error. The FRN reflects phase-locked theta band activities and is thought to reflect the functions of an action monitoring system that uses signals of error, conflict, or punishment to adapt future behavior (Holroyd and Coles, 2002; Frank et al., 2005). Larger error signals have been found in MDD participants, both to negatively valenced feedback (Tucker et al., 2003) and to response errors (Chiu and Deldin, 2007; Pizzagalli, 2008, 2010). Yet paradoxically, depressed participants are characterized by deficits in performance adaptations following error, conflict, and punishment (Elliott et al., 1996; Pizzagalli et al., 2006; Holmes and Pizzagalli, 2007; Compton et al., 2008), even in the context of larger error signals (Holmes and Pizzagalli, 2008). Thus, it remains unknown if larger error signals in depression reflect a functional increase in performance-monitoring integrity.
There is another, longer-term consequence associated with larger error-related mid-frontal activities: increased ability in learning to avoid stimuli that have been previously associated with punishment, especially for very difficult (high conflict) choices (Frank et al., 2007a; Cavanagh et al., 2010a,b). This learning is suggested to reflect the involvement of the mPFC with basal ganglia systems during slow probabilistic integration of action values (Frank et al., 2007b). We have previously detailed how emotional reactivity to social stress can instantiate a reinforcement learning bias in this slow integrative system (Cavanagh et al., 2010a). In that study, negative affect altered the processing of punishment information (as indicated by mid-frontal theta), which in turn predicted the efficacy of avoidance learning. Depressed patients have been shown to overreact to punishment information (Elliott et al., 1996, 1997), but the functioning of this error-avoidance system in MDD remains unknown. Our previous findings suggest a novel and testable hypothesis. Since larger error signals lead to better avoidance learning, enhancement of this relationship in MDD might reveal a mechanistic explanation for hypersensitivity to negative outcomes in MDD.

PARTICIPANTS
All participants provided written informed consent that was approved by the University of Arizona. Participants were recruited from introductory psychology classes based on mass survey scores of the Beck Depression Inventory (BDI). Recruitment criteria included: (1) age 18-25, (2) no history of head trauma or seizures, and (3) no current psychoactive medication use. Control participants (N = 24, 14 female) had stable low BDI (<7) between mass survey and preliminary assessment, no self-reported history of MDD, and no self-reported symptoms indicating the possibility of an Axis I disorder as indicated by computerized self-report completion of the Electronic Mini International Neuropsychiatric Interview (eMINI: Medical Outcome Systems, Jacksonville, FL, USA). Depressed participants needed to have a stable high BDI (>13), and needed to meet criteria for current or past MDD during a Structured Clinical Interview for the DSM-IV. A total of N = 21 (14 female; 10 current MDD, 11 past history of MDD) symptomatic participants met these criteria. Participants with current and past MDD history were grouped together in this study to increase power; this decision was additionally motivated by the fact that BDI score reflected a moderate severity of depression and did not differ between the current and past history groups (current: M = 22, SD = 5.54; past: M = 21, SD = 5.54). All subsequent task procedures and EEG processing steps are identical to Cavanagh et al. (2010a) except where otherwise indicated.

TASK
Participants performed a probabilistic learning task twice, with a self-paced break between tasks, using different pseudo-randomly assigned character sets. Each task included a forced choice training phase followed by a subsequent testing phase (Frank et al., 2004), as shown in Figure 1. During the training phase the participants were presented with three stimulus pairs, where each stimulus was a Japanese Hiragana character associated with a different probabilistic chance of receiving "Correct" or "Incorrect" feedback. These stimulus pairs (and their probabilities of reward) were termed A/B (80%/20%), C/D (70%/30%), and E/F (60%/40%). All training trials began with a jittered inter-trial-interval between 300 and 700 ms. The stimuli then appeared for a maximum of 4000 ms, and disappeared immediately after the choice was made. If the participant failed to make a choice within 4000 ms, "No Response Detected" was presented. Following a button press, either "Correct" or "Incorrect" feedback was presented for 500 ms, with onset jittered between 50 and 100 ms after the response.
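The training contingencies described above can be illustrated with a short simulation. This is only a minimal sketch in Python, not the task software used in the study: the function name `training_trial`, the fixed random seed, and the use of a uniform distribution for the inter-trial interval are assumptions made for illustration.

```python
import random

# Probability of "Correct" feedback for each stimulus (from the task description).
P_CORRECT = {"A": 0.80, "B": 0.20, "C": 0.70, "D": 0.30, "E": 0.60, "F": 0.40}
TRAINING_PAIRS = [("A", "B"), ("C", "D"), ("E", "F")]


def training_trial(pair, choice, rng):
    """Simulate one training trial; return (iti_ms, feedback).

    Hypothetical helper: the ITI is drawn uniformly from 300-700 ms,
    which is an assumption about how the jitter was generated.
    """
    if choice not in pair:
        raise ValueError("choice must be one of the two stimuli in the pair")
    iti_ms = rng.uniform(300, 700)
    correct = rng.random() < P_CORRECT[choice]
    return iti_ms, "Correct" if correct else "Incorrect"


rng = random.Random(0)  # arbitrary seed so the sketch is reproducible
results = [training_trial(("A", "B"), "A", rng) for _ in range(10000)]
itis = [iti for iti, _ in results]
rate = sum(fb == "Correct" for _, fb in results) / len(results)
```

Over many simulated trials, the proportion of "Correct" feedback for stimulus A should settle near its nominal 80%, which is the regularity participants must integrate slowly across training.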
FIGURE 1 | Probabilistic learning task. During training, each pair is presented separately. Participants have to select one of the two stimuli, slowly integrating "Correct" and "Incorrect" feedback (each stimulus has a unique probabilistic chance of being correct) in order to maximize their accuracy. The FRN/theta dynamics reported here were taken following these feedbacks. During the testing phase, each stimulus is paired with all other stimuli and participants must choose the best one, without the aid of feedback. Measures of reward and punishment learning are taken from the test phase, hypothesized to reflect the operations of a slow, probabilistic integrative system during training. Note that the letter and percentage are not presented to the participant, nor are the green boxes surrounding the choice.
Frontiers in Psychology | Cognition
During the testing phase all possible stimulus pairs were presented eight times (120 trials total). Trials in the test phase began with an ITI of 500 ms. Stimuli were presented for a maximum of 4000 ms, and disappeared as soon as a choice was made. No feedback was provided in the testing phase. Reward seeking ("Go learning") was defined as the accuracy of choosing A over C, D, E, and F (i.e., seeking A), whereas punishment avoidance or "NoGo learning" was defined as the accuracy of choosing C, D, E, and F over B (i.e., avoiding B). Conflict trials were defined based on the reinforcement value difference between the available choices (with smaller, more subtle differences in reinforcement values associated with increasing conflict). Thus, we analyzed performance separately for high conflict Go (AC, AE, CE), high conflict NoGo (BD, BF, DF), low conflict Go (AD, AF), and low conflict NoGo (BC, BE). We have previously referred to these types of high conflict valenced decisions as "win-win" (Go) and "lose-lose" (NoGo) situations (Frank et al., 2007c; Cavanagh et al., 2010a). To increase sensitivity, data from the two administrations of the task were combined if participants were able to select the most rewarding stimulus (A) over the most punishing stimulus (B) at least 50% of the time during the testing phase on each administration (based on this criterion, five participants in each group had data from only one administration). For this investigation, EEG signals were taken from the training phase (responses to feedback during learning), and behavioral indices of learning were taken from the testing phase. This analytic strategy allows an assessment of how the neural processing of feedback during learning relates to value-based decision making at a later point in time.
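The test-phase scoring scheme can be made concrete with a small sketch. The pair lists and reward probabilities below come from the text; the helper names (`better_option`, `accuracy`) and the example participant are hypothetical, and pairs the text does not assign to a conflict category are simply left uncategorized here.

```python
from itertools import combinations

# Probability of "Correct" feedback per stimulus, from the task description.
P_CORRECT = {"A": 0.80, "B": 0.20, "C": 0.70, "D": 0.30, "E": 0.60, "F": 0.40}

# Conflict categories exactly as listed in the text.
CATEGORIES = {
    "high_conflict_go": {"AC", "AE", "CE"},
    "high_conflict_nogo": {"BD", "BF", "DF"},
    "low_conflict_go": {"AD", "AF"},
    "low_conflict_nogo": {"BC", "BE"},
}
GO_PAIRS = {"AC", "AD", "AE", "AF"}    # seeking A
NOGO_PAIRS = {"BC", "BD", "BE", "BF"}  # avoiding B

# All 15 unordered pairs of the six stimuli; 8 presentations each gives 120 trials.
ALL_PAIRS = ["".join(p) for p in combinations("ABCDEF", 2)]


def better_option(pair):
    """Stimulus in the pair with the higher reward probability."""
    return max(pair, key=P_CORRECT.get)


def accuracy(trials, pairs):
    """Proportion of trials on the given pairs where the better option was chosen."""
    scored = [chosen == better_option(pair) for pair, chosen in trials if pair in pairs]
    return sum(scored) / len(scored) if scored else float("nan")


# Hypothetical participant: always picks the better option, except B over D.
trials = [(pair, "B" if pair == "BD" else better_option(pair))
          for pair in ALL_PAIRS for _ in range(8)]

go_accuracy = accuracy(trials, GO_PAIRS)
nogo_accuracy = accuracy(trials, NOGO_PAIRS)
high_conflict_nogo = accuracy(trials, CATEGORIES["high_conflict_nogo"])
passes_ab_criterion = accuracy(trials, {"AB"}) >= 0.5  # inclusion rule from the text
```

For this made-up participant, the single systematic error on the BD pair lowers overall NoGo accuracy and high conflict NoGo accuracy while leaving Go accuracy untouched, which is the kind of dissociation the separate indices are meant to capture.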

ELECTROPHYSIOLOGICAL RECORDING AND PROCESSING
Scalp voltage was measured with 64 Ag/AgCl electrodes using a Synamps 2 system (bandpass filter 0.5-100 Hz, 500 Hz sampling rate, impedances <10 kΩ), referenced offline to averaged mastoids. Eyeblinks were removed with Independent Components Analysis (Delorme and Makeig, 2004). Because the FRN represents phase-locked theta activity following feedback, data were processed to obtain both time-domain FRN amplitudes and time-frequency theta band activity in this same time range. Event-related EEG was time-locked to correct and incorrect feedback during training and baseline corrected to the average power from 300 to 200 ms before feedback. Baseline-independent amplitudes of the incorrect ERPs (filtered 0.5-15 Hz) were computed as the difference between the mean values in 20 ms windows around the grand average peak (P2 or P3) and the trough (FRN) at FCz (P2: 200 ms, FRN: 276 ms, P3: 376 ms), see Figure 2A, yielding two difference scores: P2-FRN and P3-FRN. Thus, larger values indicate larger amplitude deflections. Note that these peak-to-trough latencies (76 and 100 ms) correspond to half-cycles of a 5-7 Hz (theta) rhythm. This type of peak-to-trough quantification of ERP components has been shown to correlate with between-subjects differences in theta power better than baseline corrected mean amplitude (Cavanagh et al., 2011).
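The peak-to-trough scores can be sketched as follows. This is an illustrative Python reconstruction, not the authors' Matlab code: it assumes the 20 ms windows are centred on the stated latencies, and it uses a synthetic 5 Hz cosine in place of a real ERP. The second helper treats a peak-to-trough latency as half an oscillatory cycle, so the implied frequency is 1 / (2 × latency).

```python
import math

SRATE_HZ = 500                    # sampling rate from the recording description
P2_MS, FRN_MS, P3_MS = 200, 276, 376  # grand-average latencies at FCz (from the text)


def window_mean(erp, times_ms, center_ms, width_ms=20):
    """Mean amplitude in a window of width_ms centred on center_ms (assumed centring)."""
    half = width_ms / 2
    vals = [v for v, t in zip(erp, times_ms) if center_ms - half <= t <= center_ms + half]
    return sum(vals) / len(vals)


def peak_to_trough(erp, times_ms, peak_ms, trough_ms=FRN_MS):
    """Difference score such as P3-FRN: peak window mean minus trough window mean."""
    return window_mean(erp, times_ms, peak_ms) - window_mean(erp, times_ms, trough_ms)


def implied_frequency_hz(peak_ms, trough_ms):
    """A peak-to-trough latency spans half a cycle, so f = 1 / (2 * latency)."""
    return 1000.0 / (2 * abs(peak_ms - trough_ms))


# Synthetic "ERP": a 5 Hz cosine with its peak at 376 ms and trough at 276 ms.
times_ms = list(range(0, 600, 1000 // SRATE_HZ))  # 2 ms steps at 500 Hz
erp = [math.cos(2 * math.pi * 5 * (t - P3_MS) / 1000) for t in times_ms]

p3_frn = peak_to_trough(erp, times_ms, P3_MS)
f_p3 = implied_frequency_hz(P3_MS, FRN_MS)  # 100 ms half-cycle
f_p2 = implied_frequency_hz(P2_MS, FRN_MS)  # 76 ms half-cycle
```

With the latencies reported in the text, the two half-cycle estimates fall at 5 Hz and roughly 6.6 Hz, both within the theta range.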
Time-frequency calculations were computed using custom-written Matlab routines (Cohen et al., 2008; Cavanagh et al., 2009). Time-frequency measures were computed by multiplying the fast Fourier transformed (FFT) power spectrum of single trial EEG data with the FFT power spectrum of a set of complex Morlet wavelets and taking the inverse FFT. Each wavelet was defined as a Gaussian-windowed complex sine wave, $e^{-i2\pi tf} e^{-t^{2}/(2\sigma^{2})}$, where $t$ is time, $f$ is frequency (which increased from 1 to 50 Hz in 50 logarithmically spaced steps), and $\sigma$ defines the width (or "cycles") of each frequency band, set according to $3/(2\pi f)$. The end result of this process is identical to time-domain signal convolution, and it resulted in estimates of instantaneous power, defined as the squared magnitude of the analytic signal $z(t)$ (power time series: $p(t) = \mathrm{real}[z(t)]^{2} + \mathrm{imag}[z(t)]^{2}$). Whereas our previous investigations have favored a Gaussian width ($\sigma$) of $4.5/(2\pi f)$, here we utilize a width of $3/(2\pi f)$ to better resolve the temporally specific theta activities suggested by the ERP analyses.
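Because the text notes that the FFT-based procedure is equivalent to time-domain convolution, the wavelet analysis can be sketched with a direct convolution in plain Python. This is a minimal illustration rather than the authors' Matlab routines. The wavelet formula, the width 3/(2πf), and the 1-50 Hz logarithmic frequency spacing come from the text; truncating the wavelet at ±4σ and scaling it to unit gain are assumptions of this sketch, as are all function names.

```python
import cmath
import math


def log_spaced(start_hz=1.0, stop_hz=50.0, n=50):
    """Frequencies from start_hz to stop_hz in n logarithmically spaced steps."""
    step = (math.log10(stop_hz) - math.log10(start_hz)) / (n - 1)
    return [10 ** (math.log10(start_hz) + i * step) for i in range(n)]


def morlet(freq_hz, srate_hz, width=3.0, n_sigma=4.0):
    """Complex Morlet wavelet exp(-i*2*pi*t*f) * exp(-t^2 / (2*sigma^2)).

    sigma = width / (2*pi*f), as in the text. Truncation at +/- n_sigma and
    unit-gain scaling are choices made for this sketch only.
    """
    sigma = width / (2 * math.pi * freq_hz)
    half = int(round(n_sigma * sigma * srate_hz))
    taps = []
    for k in range(-half, half + 1):
        t = k / srate_hz
        taps.append(cmath.exp(-2j * math.pi * t * freq_hz)
                    * math.exp(-t * t / (2 * sigma ** 2)))
    gain = sum(abs(w) for w in taps)
    return [w / gain for w in taps]


def power_at(signal, wavelet, index):
    """p(t) = real[z(t)]^2 + imag[z(t)]^2 at one sample, via time-domain convolution."""
    half = len(wavelet) // 2
    z = 0j
    for k, w in enumerate(wavelet):
        j = index - (k - half)  # convolution: signal[index - lag] * wavelet[lag]
        if 0 <= j < len(signal):
            z += signal[j] * w
    return z.real ** 2 + z.imag ** 2


srate = 500
freqs = log_spaced()
# Synthetic 3 s signal: a unit-amplitude 6 Hz (theta) sine wave.
signal = [math.sin(2 * math.pi * 6 * n / srate) for n in range(3 * srate)]
mid = len(signal) // 2  # far from the edges, so edge effects do not matter here
theta_power = power_at(signal, morlet(6.0, srate), mid)
beta_power = power_at(signal, morlet(20.0, srate), mid)
```

With unit-gain scaling, a unit-amplitude sine at the wavelet's own frequency yields power close to 0.25 (amplitude 0.5 squared), while a wavelet centred well away from 6 Hz returns much less power. Evaluating only a mid-signal sample sidesteps the edge effects that the full analysis must trim.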
One second of data was removed from each end of the transformed single trial EEG data (to account for edge effects) prior to averaging. Averaged power was normalized by conversion to a decibel (dB) scale (10 * log10[power(t)/power(baseline)]) relative to a baseline of 300-200 ms before feedback, allowing a direct comparison of effects across frequency bands. Whereas the ERPs reflect phase-locked amplitude changes, these time-frequency measures reflect total power (phase-locked and phase-varying). As indicated by the topographic plots, and as in most other studies of these phenomena, values for statistical analysis were averaged over time and frequency at the FCz electrode (276-376 ms post feedback, 5-8 Hz), see Figure 2B. Topographic plots (Figure 2C) show theta power in this same time-frequency window, detailing a mid-frontal distribution peaking at FCz.
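The decibel conversion can be written out in a few lines. This sketch assumes that power(baseline) is the mean power across the 300-200 ms pre-feedback window and that time zero marks feedback onset; the function names and the step-shaped synthetic power trace are illustrative only.

```python
import math


def to_db(power, times_ms, baseline_ms=(-300, -200)):
    """Convert power to dB: 10 * log10(power(t) / mean baseline power)."""
    lo, hi = baseline_ms
    base = [p for p, t in zip(power, times_ms) if lo <= t <= hi]
    baseline = sum(base) / len(base)
    return [10 * math.log10(p / baseline) for p in power]


def window_average(values, times_ms, start_ms=276, stop_ms=376):
    """Average over the post-feedback analysis window used for statistics."""
    sel = [v for v, t in zip(values, times_ms) if start_ms <= t <= stop_ms]
    return sum(sel) / len(sel)


# Synthetic trace: power doubles after feedback onset (time 0).
times_ms = list(range(-500, 500, 2))
power = [1.0 if t < 0 else 2.0 for t in times_ms]

db = to_db(power, times_ms)
theta_db = window_average(db, times_ms)
```

A doubling of power relative to baseline corresponds to about 3 dB, independent of the absolute power level, which is why the conversion permits comparison across frequency bands with very different raw power.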

RESULTS
There were no group differences in any performance measures, including training or test phase accuracies or reaction times, immediate post-punishment adaptation, test phase accuracy for Go or NoGo, nor in high or low conflict variants of each valence (see Table 1). Importantly, Table 1 also demonstrates that there were no group differences in the number of correct or incorrect feedbacks as evidenced by the EEG epoch counts. As shown in Figure 2A, there was a significant difference between groups in the P3-FRN amplitude of the ERP [t(43) = 2.85, p < 0.01], but not for the P2-FRN amplitude (t < 1). Error-related theta power in this P3-FRN time range did not significantly differ between groups [t(43) = 1.3, p = 0.22]. However, both P3-FRN amplitude and theta power predicted individual differences in NoGo accuracy (rs > 0.34, ps < 0.05), replicating previous findings (Frank et al., 2007a; Cavanagh et al., 2010a,b). BDI score did not significantly correlate with brain or behavioral variables within the depressed group with linear or quadratic fits.
TABLE 1 | Columns: Control, mean (SD); Depressed, mean (SD); t; p. First row: Beck Depression Inventory (score).
The moderating effect of depression on this error-NoGo relationship was tested using repeated measures general linear models (GLMs) with NoGo accuracy as the dependent variable, and within-subjects factors for conflict (Low, High) and valence (Go, NoGo), a between-subjects factor for group (depressed, control) and a continuous moderator of theta power to incorrect feedback. Planned comparisons were first split by valence, then by conflict. As expected, group differences in the coupling between error signal theta power and avoidance learning were specific to high conflict NoGo cases [four-way interaction F(1,41) = 8.7, p < 0.01; three-way high conflict interaction F(1,41) = 4.9, p < 0.05; two-way high conflict NoGo interaction F(1,41) = 5.1, p < 0.05; all other interactions Fs < 1.3, ps > 0.25]. Substituting P3-FRN amplitudes for theta power as a continuous moderator produced a similar two-way interaction for high conflict NoGo [F(1,41) = 5.8, p < 0.05; other two-way interactions Fs < 1] but higher-order statistical tests were non-significant. Figure 3 demonstrates how error signal-avoidance learning coupling was specifically enhanced in the MDD group compared to controls in high conflict NoGo conditions. As described by the GLM and indicated in Figure 2, the high conflict NoGo correlations were significantly different between the groups (Fisher's r to z test: z = 2.37, p = 0.018).
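For readers unfamiliar with the Fisher r to z comparison of two independent correlations, a minimal Python sketch follows. It assumes the standard form of the test, z = (atanh r1 − atanh r2) / sqrt(1/(n1 − 3) + 1/(n2 − 3)), with a two-tailed normal p-value. The group sizes are taken from the text, but the correlation values in the example are hypothetical placeholders, since the per-group coefficients are not given in the text.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def fisher_r_to_z(r1, n1, r2, n2):
    """Compare two independent Pearson correlations; returns (z, two-tailed p)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of the difference
    z = (z1 - z2) / se
    return z, two_tailed_p(z)


N_DEPRESSED, N_CONTROL = 21, 24  # group sizes from the text

# Hypothetical correlations, for illustration only (not the reported values).
z_example, p_example = fisher_r_to_z(0.60, N_DEPRESSED, -0.05, N_CONTROL)

# Consistency check on the reported statistic: p implied by z = 2.37.
p_reported = two_tailed_p(2.37)
```

Under this formula, the two-tailed p-value implied by z = 2.37 is approximately 0.018, which agrees with the value reported above.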

DISCUSSION
Numerous investigations have shown that larger error signals predict better avoidance learning, and the present report reveals that this relationship is enhanced among non-medicated depressed participants. This mood-related effect was specific to high conflict lose-lose cases, revealing the specificity of increased error signals in depression on avoidance learning.

RELATIONSHIP TO PREVIOUS INVESTIGATIONS
In the current investigation, the MDD group was characterized by larger feedback-locked error signals and enhanced error signal-avoidance coupling, yet these occurred in the context of similar behavioral performance to controls. A similarly powered study of depressed patients recently reported null results for behavioral measures of punishment adaptation and NoGo learning in this same task (Chase et al., 2010). The lack of behavioral effects is convergent with those reported here, indicating that depression-specific effects on the link between brain error monitoring systems and performance are critical variables for understanding learning-related changes. While larger error signals have previously been found in MDD participants (Tucker et al., 2003; Chiu and Deldin, 2007; Pizzagalli, 2008, 2010), many other studies report complicated patterns and divergent contrasts (Ruchsow et al., 2005; Compton et al., 2008; Schrijvers et al., 2008, 2009; Olvet et al., 2010; Georgiadi et al., 2011) in addition to compromised post-error adaptation (Elliott et al., 1996; Pizzagalli et al., 2006; Holmes and Pizzagalli, 2007; Compton et al., 2008). These complexities suggest that in order to successfully interpret the meaning of altered error-related signals, it may be critical to understand how these signals are (or are not) being used for behavioral adaptation.
FIGURE 3 | Scatterplots demonstrating error signal-avoidance learning relationships, along with correlation test results. The first row shows total NoGo accuracy, which can be split into high and low conflict cases (rows 2 and 3). Only in the high conflict "lose-lose" cases did controls and depressed participants significantly differ from each other, demonstrating the specificity of increased error signal-avoidance learning acuity among those with depression.

DEPRESSION, ERRORS, AND AVERSIVE LEARNING: WHAT DOES IT ALL MEAN?
Avoidance learning acuity is proposed to be reflected by NoGo behavioral accuracy, and high conflict choices reflect two outcomes that are hard to distinguish. Thus, high conflict NoGo trials all consist of "lose-lose" forced choice decisions. Notably, the mood-related effect reported here was specific to these lose-lose cases, revealing the specificity of increased error signals in depression on avoidance learning. A mechanistic explanation of this effect may be that an increased salience of error signals is related to larger or more extended pauses in tonic dopamine release.
The temporal specificity of the enhanced error signals in the MDD group supports this interpretation. While earlier stages of feedback evaluation have been associated with valence-specific differences, MDD-related modulation of later stages may be associated with an enhanced prediction error magnitude (Philiastides et al., 2010). A neural network model of cortico-striatal function in this same task suggests that a larger negative prediction error would cause a larger/longer dopamine dip, which would increase learning for stimulus-action combinations in the D2 receptor mediated indirect cortico-striatal pathway, contributing to a tendency to not make this action again (Frank, 2005). This effect would be behaviorally reflected by particularly increased accuracy in lose-lose choices, which are most sensitive to individual differences in the ability to resolve subtly different probabilities of negative events.
The finding reported here suggests an error-related mechanism by which punishment hypersensitivity may be related to affective and behavioral distress. We previously proposed that an affect-related increase in mid-frontal error signals and avoidance learning reflected a cortical bias on the integration of action values (Cavanagh et al., 2010a). Using the exact same task and methods, our prior study of social threat found that emotional reactivity to stress predicted an increase in mid-frontal theta and a related increase in high conflict NoGo learning amongst highly punishment sensitive participants (Cavanagh et al., 2010a). Note that increased high conflict NoGo learning accuracy in the context of increased mid-frontal theta was paralleled between highly stress-reactive participants in the prior investigation and the depressed participants reported in this investigation. The similarities between the previous and current studies warrant comparison, as they may provide a window into the processes underlying these common effects: affect-modulated mPFC activities may bias mood-congruent learning.

CONCLUSION
An integrative explanation of the findings and possible mechanisms reported here focuses on the fact that the mPFC is involved in cognitive control, affective reactivity, and the adaptation of behavior to reinforcement. It is likely no coincidence that this system is intimately implicated in the etiology of MDD. The combined activities of this particular cortico-striatal system identify it as a focal node by which emotion may be internalized to affect cognitive functioning. In this investigation, we have identified a measure of how, and a possible mechanism by which, negatively valenced information is internalized in the genesis and expression of MDD: error and punishment signals are increasingly coupled with the salience of "bad" outcomes.