The Neural Basis of Predicting the Outcomes of Imagined Actions

A key feature of human intelligence is the ability to predict the outcomes of one’s own actions prior to executing them. Action values are thought to be represented in part in the dorsal and ventral medial prefrontal cortex (mPFC), yet current studies have focused on the value of executed actions rather than the anticipated value of a planned action. Thus, little is known about the neural basis of how individuals think (or fail to think) about their actions and the potential consequences before they act. We scanned individuals with fMRI while they thought about performing actions that they knew would likely be rewarded or unrewarded. Here we show that merely imagining an unrewarded action, as opposed to imagining a rewarded action, increases activity in the dorsal anterior cingulate cortex, independently of subsequent actions. This activity overlaps with regions that respond to actual unrewarded actions. The findings show a distinct network that signals the prospective outcomes of one’s possible actions. A number of clinical disorders such as schizophrenia and drug abuse involve a failure to take the potential consequences of an action into account prior to acting. Our results thus suggest how dysfunctions of the mPFC may contribute to such failures.


INTRODUCTION
A key feature of human intelligence is the ability to predict the outcomes of one's own actions prior to executing them. Much of the literature on decision-making and reinforcement learning focuses on learning the value of various available options. The optimal decision is one that has the highest value in the decision-maker's subjective evaluation (Thorndike, 1911), with perhaps some value on exploring new options (Kaelbling et al., 1996). Environmental cues indicate what options are available, and the cues in turn guide instrumental responding via learned stimulus-response (S-R) associations (Sutton and Barto, 1998). This is the essence of model-free reinforcement learning (Dayan and Niv, 2008). Such an arrangement constitutes an inverse model (Shadmehr and Wise, 2004), in that stimulus cues (S) activate a representation of the desired goal such as a piece of food, and this goal is mapped backward to the response (R) necessary to achieve the goal. The values of stimuli and the goals they represent are likely represented in the orbitofrontal cortex (OFC; Tremblay and Schultz, 1999; Schoenbaum et al., 2003). This arrangement suffices for habit learning.
The situation is more difficult when an animal faces a novel environment in which the S-R association has not been learned, or there is a more complex set of constraints, so that there is no one automatic best response. This is where forward models as in model-based reinforcement learning (Shadmehr and Wise, 2004; Daw et al., 2005; Glascher et al., 2010) are useful. A forward model predicts the outcome of a planned action. This is a learned response-outcome (R-O) association (Colwill and Rescorla, 1990) which affords a "dynamic evaluation lookahead" (Van Der Meer and Redish, 2010). Favorable outcome predictions might further activate the corresponding response plan, while unfavorable or risky outcome predictions might suppress it.
The process of employing a forward model to predict the likely outcomes of planned actions is akin to the popular notion of thinking before acting. Humans can think about or imagine (with varying accuracy) what might be the outcome of a planned action. Nonetheless, relatively little research has been done on the neural basis of thinking ahead, with just a few cognitive (Johnson, 2000; Hassabis et al., 2007), neuroimaging (Newman et al., 2009; Glascher et al., 2010), and rat (Van Der Meer and Redish, 2010) studies. Some results suggest that the anterior cingulate cortex (ACC) may be involved in anticipating adjustments in control (Sohn et al., 2007; Aarts et al., 2008; Aarts and Roelofs, 2011). We previously showed that the medial prefrontal cortex (mPFC), and especially the ACC, may learn to predict the likelihood of an impending error resulting from current actions (Brown and Braver, 2005). Here we use fMRI to ask whether and how the ACC may signal the likelihood that an imagined response would go unrewarded, as distinct from the alternative hypothesis that the ACC is activated only by impending actions. We use a simple task that isolates the outcome prediction by asking subjects to imagine performing an action and experiencing its consequences, while controlling for the subsequent action execution.

PARTICIPANTS
Data from 22 right-handed participants were collected (mean age = 23.42, SD = 2.80). Data from two participants were discarded due to insufficient reward outcomes and data from one participant was excluded due to a scanning artifact, leaving 19 usable participants (11 female). Participants reported no history of psychiatric or neurological disorder, and reported no current use of psychoactive medications. Participants were compensated $25/h for their time. Participants were trained on the task on a computer outside of the scanner until they gave verbal confirmation that they understood the task. The experimenter observed the participant's performance and judged whether they demonstrated sufficient understanding of the task.
Participants were informed that they would receive compensation based on their performance, although they were unaware of how much they would receive for rewarding feedback. They received $0.05 for each reward outcome (described in further detail below in section Experimental Paradigm).
Participants gave informed consent prior to participating in the experiment. The experiment was approved by the Indiana University Institutional Review Board.

EXPERIMENTAL PARADIGM
The task consisted of two phases: an imagine phase and a response phase. During the imagine phase, participants were instructed to imagine making particular responses and experiencing the corresponding consequences. During the response phase, participants were instructed to choose one of two possible responses. The appropriate response was determined by feedback history. When a particular response was rewarded, participants were instructed to make that response again. If a response was not rewarded, participants were instructed to make the alternate response. Hence, prior to each trial, participants had a belief about the response that would most likely result in a reward outcome, and could therefore imagine the consequences of a response that matched that belief (i.e., imagine rewarded action) or violated it (i.e., imagine non-rewarded action).
On each trial, the imagine phase began with the sequential presentation of two white arrow cues on a black background, with one pointing left and the other pointing right (Figure 1). The order of presentation of the arrow cues was counterbalanced.
Participants were instructed to simply imagine themselves pressing the corresponding left or right buttons with the left or right index finger, along with the corresponding outcome they would expect if they were to actually press the button. Both left and right responses were imagined separately on each trial, so the probabilities of imagining each event were equal. After a variable delay, the response phase was signaled by an exclamation mark ("!") which cued them to respond with either a left or right actual button press. Crucially, the responses that they imagined were independent of the actual response that they made. Participants would then be presented with a "$" or a "0" as feedback. A "$" would mean that they had gained a point, while a "0" would mean that they had gained nothing. Participants were informed that if they were rewarded on a trial (i.e., if they received a "$" as feedback) that they should make the same button response on the next trial, but a "0" indicated that they should switch. The appropriate button response (left vs. right) switched across trials with a relatively low probability resulting in a high likelihood of reward outcome. With this design, subjects could predict the outcome of each possible button press with good confidence. This allowed us to examine neural activity related to imagining distinct non-rewarded actions and rewarded actions without confounding the results with a particular effector. The probability of an underlying switch was 0 for the first two trials following a switch, then 0.33 per trial for trials three through seven, and 1.0 after eight trials. This distribution ensured that switches occurred but were unpredictable and less likely than chance. After receiving feedback, participants were presented with a blank screen that lasted either 1, 3, 5, or 7 s, based on an exponential distribution function (Dale, 1999).
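The switching schedule and the stay/switch instruction described above can be illustrated with a short simulation. This is a minimal sketch and not the authors' task code: it assumes the switch is evaluated after each trial, reads "1.0 after eight trials" as a forced switch once eight trials have elapsed since the last switch, and uses hypothetical names (`switch_probability`, `simulate_run`) and a hypothetical trial count.

```python
import random

def switch_probability(trials_since_switch):
    """Per-trial probability that the rewarded side switches (assumed reading of the schedule)."""
    if trials_since_switch <= 2:
        return 0.0      # no switch during the first two trials after a switch
    if trials_since_switch <= 7:
        return 0.33     # trials three through seven
    return 1.0          # forced switch once eight trials have elapsed

def other(side):
    return "left" if side == "right" else "right"

def simulate_run(n_trials, rng):
    """Simulate a participant who repeats after "$" and switches after "0"."""
    correct = rng.choice(["left", "right"])
    choice = correct                    # assumption: the run starts on the rewarded side
    since_switch = 0
    outcomes, run_lengths = [], []
    for _ in range(n_trials):
        since_switch += 1
        rewarded = (choice == correct)
        outcomes.append("$" if rewarded else "0")
        if not rewarded:
            choice = other(choice)      # "0" feedback: switch on the next trial
        if rng.random() < switch_probability(since_switch):
            run_lengths.append(since_switch)
            correct = other(correct)    # the underlying rewarded side changes
            since_switch = 0
    return outcomes, run_lengths

# Hypothetical run of 60 trials with a fixed seed for reproducibility.
outcomes, run_lengths = simulate_run(60, random.Random(1))
```

Under these assumptions every stretch with a constant rewarded side lasts between three and eight trials, and a participant who follows the instruction perfectly receives a "0" only on the trial immediately after an underlying switch.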
On 20% of the trials, a question mark ("?") was presented instead of arrow cues. During this condition, participants were instructed to recall the last response they had made and the corresponding outcome they had received, whether it was an outcome signaling a reward or not gaining a reward. When the exclamation mark cue was presented, participants were to make the same response they had made on the previous trial, whether it was rewarded or not. These trials were included for purposes not relevant here and were modeled separately.
FIGURE 1 | Imagine condition. In the Imagine condition, participants saw a sequence of two arrows, one facing left and the other facing right (order randomized across trials). As each arrow appeared, participants were instructed to imagine performing the corresponding button press response (left or right) and the outcome associated with it. An exclamation mark ("!") cued the subjects to make a response of their choice. One of the two options would be rewarded, and the other would be unrewarded. The response that was rewarded in the preceding trial was more likely to be rewarded in the current trial. Participants received either rewarded ("$") or non-rewarded ("0") feedback as a result of their choice. The response cue and outcome cues were identical to the Imagine condition.

fMRI ACQUISITION AND DATA PREPROCESSING
The experiment was conducted with a 3 T Siemens TIM Trio scanner using a 32-channel head coil. Foam padding was inserted around the sides of the head to increase participant comfort and reduce head motion. Imaging data were acquired at a 30° angle from the anterior commissure-posterior commissure line in order to maximize signal-to-noise ratio in the orbital and ventral regions of the brain (Deichmann et al., 2003). Functional T2*-weighted images were acquired using a gradient echo planar imaging sequence (30 mm × 3.8 mm interleaved slices; TE = 25 ms; TR = 2000 ms; 64 × 64 voxel matrix; 220 mm × 220 mm field of view). Three runs of data were collected with 240 functional scans each. High resolution T1-weighted images for anatomical data (256 × 256 voxel matrix) were collected at the end of each session.
SPM5 (Wellcome Department of Imaging Neuroscience, London, UK; www.fil.ion.ucl.ac.uk/spm) was used for preprocessing and data analysis. The functional data for each run for each participant was slice-time corrected and realigned to each run's mean functional image using a 6 degree-of-freedom rigid body spatial transformation. The resulting images were then coregistered to the participant's structural image. The structural image was normalized to standard Montreal Neurological Institute (MNI) space and the warps were applied to the functional images. The functional images were then spatially smoothed using an 8-mm Gaussian kernel.

fMRI ANALYSIS
Functional neuroimaging data were analyzed using a general linear model (GLM) with random effects. Feedback for rewarded and non-rewarded responses was modeled with a canonical hemodynamic response function (HRF) at the time of feedback. Two regressors modeled each imagine event. A delta regressor locked to the onset of stimulus presentation was included to capture initial perceptual activation. An epoch regressor beginning 1 s after stimulus presentation and spanning the remaining duration of the imagine event was included to capture the act of imagining itself. These epoch regressors are the regressors of interest for present purposes. Separate regressors were included for imagining non-rewarded action and imagining rewarded action events. These regressors were subdivided into imagining rewards associated with left button responses (ImagineLeftReward) and imagining rewards associated with right button responses (ImagineRightReward), as well as non-rewards associated with both button responses (ImagineLeftNon-Reward, ImagineRightNon-Reward).
Additional regressors modeled left vs. right button presses. Contrasts were conducted on imagining a potential non-rewarded action outcome (ImagineLeftNon-Reward, ImagineRightNon-Reward) compared to imagining a potential rewarded action outcome (ImagineLeftReward, ImagineRightReward). This contrast would reveal whether there was significantly more activity for merely imagining a non-rewarded action outcome as opposed to a rewarded action outcome. Separate contrasts were computed for each subject, and results are based on a group-level random effects analysis on these contrasts.
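The two-regressor scheme for each imagine event, and the contrast built from the epoch regressors, can be sketched as follows. This is an illustrative sketch only and does not reproduce the SPM5 implementation: the double-gamma HRF is a common approximation to the canonical form, and the onsets, the imagine duration, the fine time grid, and the function names are all hypothetical. Only the TR of 2 s and the regressor names come from the text.

```python
import math

TR = 2.0   # repetition time in seconds (from the acquisition parameters)
DT = 0.1   # fine time grid used for the convolution (assumption)

def hrf(t):
    """Double-gamma HRF approximation: gamma(shape 6) minus 1/6 gamma(shape 16)."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def build_regressor(onsets, duration, n_scans):
    """Convolve a stick (duration 0) or boxcar with the HRF and sample once per TR."""
    n_fine = int(round(n_scans * TR / DT))
    neural = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / DT))
        stop = start + max(1, int(round(duration / DT)))
        for i in range(start, min(stop, n_fine)):
            neural[i] = 1.0
    kernel = [hrf(k * DT) for k in range(int(round(32.0 / DT)))]
    step = int(round(TR / DT))
    regressor = []
    for scan in range(n_scans):
        i = scan * step
        acc = 0.0
        for k in range(min(i + 1, len(kernel))):
            acc += neural[i - k] * kernel[k]
        regressor.append(acc * DT)
    return regressor

# Hypothetical cue onsets (s) and imagine duration (s); neither is given in the text.
onsets = [10.0, 40.0]
IMAGINE_DURATION = 4.0

# Delta regressor locked to cue onset; epoch regressor starts 1 s later.
delta = build_regressor(onsets, 0.0, 40)
epoch = build_regressor([o + 1.0 for o in onsets], IMAGINE_DURATION - 1.0, 40)

# Contrast weights over the four epoch regressors of interest.
contrast = {
    "ImagineLeftNon-Reward": 1, "ImagineRightNon-Reward": 1,
    "ImagineLeftReward": -1, "ImagineRightReward": -1,
}
```

Separating the stick at cue onset from the delayed boxcar is what lets the epoch regressor absorb sustained activity during imagining while the delta regressor absorbs the transient response to the cue.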
Unless otherwise stated, all whole-brain results were thresholded at p < 0.01 uncorrected at the voxel-level with a 238 voxel cluster extent providing a corrected p < 0.05 threshold according to AlphaSim.

BEHAVIORAL RESULTS
Behavioral data were analyzed in order to confirm that subjects performed the task appropriately. If participants successfully followed instructions on either switching or repeating their response on the next trial, participants would on average receive 17 reward outcomes per run, or 51 reward outcomes over all three runs. However, participants could also commit errors if the instructions were not successfully followed, which resulted in an incorrect switch or an incorrect stay (e.g., switching the button response when the previous trial had yielded a reward outcome). Failure to follow the instructions resulted in fewer reward outcomes and increased the probability of receiving a non-reward outcome. On average, participants performed the task at a high level (mean number of reward outcomes per run = 15.95, SD = 1.02; mean number of errors committed per run = 1.05, SD = 1.11). Participants who received 12 or fewer reward outcomes for two or more runs were excluded from further analysis.
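The exclusion criterion stated above amounts to a simple rule over per-run reward counts. A minimal sketch follows; the function name and the example counts are hypothetical, and only the cutoff of 12 rewards and the "two or more runs" condition come from the text.

```python
def should_exclude(rewards_per_run, cutoff=12, min_low_runs=2):
    """Return True if a participant had `cutoff` or fewer rewards in at least `min_low_runs` runs."""
    low_runs = sum(1 for rewards in rewards_per_run if rewards <= cutoff)
    return low_runs >= min_low_runs

# Hypothetical reward counts for three runs.
excluded_example = should_exclude([12, 11, 17])   # two low runs
retained_example = should_exclude([12, 16, 17])   # only one low run
```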
A subset of participants (N = 10) were given a debriefing survey after scanning asking whether they were able to visualize the motor response associated with each arrow, whether they were able to imagine the outcome associated with each button press, and whether they felt motivated to respond to gain the bonus money. Ratings were made on a Likert scale from 1 to 5, with 1 being the lowest confidence in the given response and 5 being the highest. In general, participants rated that they were able to visualize the motor response (mean rating = 4.3) and able to imagine the outcome associated with each button press (mean rating = 4.7). Participants also appeared to be motivated to perform the task well (mean rating = 4.6). A Wilcoxon Signed-Rank test showed that all ratings were significantly different from an average score of 3, which would represent indifference toward each of the questions (all P's < 0.01). Hence, the behavioral data indicated that subjects understood and performed the task as instructed.
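The comparison of ratings against the indifference score of 3 can be illustrated with an exact Wilcoxon signed-rank test computed by enumerating sign assignments. This is a sketch under stated assumptions: the ratings below are invented for illustration and are not the study's data, the function name is hypothetical, ratings equal to 3 are dropped, and ties receive mid-ranks. The text does not say whether an exact or approximate test was used.

```python
from itertools import product

def wilcoxon_exact(values, center):
    """Two-sided exact Wilcoxon signed-rank test of `values` against `center`."""
    diffs = [v - center for v in values if v != center]   # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mid_rank = (i + j) / 2.0 + 1.0                      # average rank for ties
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null every sign pattern is equally likely; count those at least as extreme.
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= observed:
            hits += 1
    return observed, hits / 2 ** len(ranks)

# Invented ratings for ten respondents (illustration only).
ratings = [4, 5, 4, 5, 4, 4, 5, 4, 4, 4]
statistic, p_value = wilcoxon_exact(ratings, 3)
```

With ten non-zero differences that all share the same sign, only two of the 1024 sign patterns are as extreme as the observed one, so the smallest attainable two-sided p-value is 2/1024, which is below the 0.01 level reported in the text.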

IMAGING RESULTS
We began by confirming that non-reward feedback produced heightened activation in the ACC compared to reward feedback, as would be expected from prior literature (Hohnsbein et al., 1989; Gehring et al., 1990). As expected, the contrast of FeedbackNon-Rewarded − FeedbackReward produced robust activations in the dorsal ACC and pre-SMA, as well as lateral frontal and parietal regions. These results indicate that the paradigm appropriately elicited non-reward signals in the ACC.
Another possibility is that activation in the ACC represents response conflict (Botvinick et al., 2001). By this account, even when subjects imagine a non-rewarded action, they also maintain an active representation of the rewarded action as they subsequently intend to execute it. Thus, even though subjects were informed of the button response which would likely be rewarded (and therefore presumably were prepared for execution of this response), the presentation of an arrow cue for imagining a non-rewarded action would lead to preparation of the rewarded response. This would lead to greater summed motor cortex activity when imagining non-rewarded actions relative to imagining rewarded actions. This idea is consistent with previous research demonstrating that conflict activation in the ACC can precede actual response execution as forthcoming actions are anticipated (Sohn et al., 2007).
To address this possibility, we examined whether greater summed motor activation accompanied ImagineNon-Reward trials relative to ImagineReward trials as predicted by response conflict accounts. To do so, we identified regions in motor cortex (Areas 4 and 6) that showed effects of executing particular responses, i.e., RespondLeft > RespondRight (right motor cortex, MNI 46, −28, 54, k = 2161 voxels; Extent: 14 < x < 63, −39 < y < −6, 29 < z < 74) and RespondRight > RespondLeft (left motor cortex, MNI −34, −32, 46, k = 3437 voxels; Extent: −58 < x < −3, −48 < y < 9, 28 < z < 74) at a cluster corrected threshold of P < 0.001. We then extracted parameter estimates from these two regions for imagining rewarded and non-rewarded outcomes associated with either left or right button presses. This yielded 8 parameter estimates for each subject divided in a 2 (left/right motor cortex) × 2 (imagine left/right response) × 2 (imagine reward/non-reward) factorial design. These parameter estimates were analyzed using a three-way ANOVA, with subjects treated as a random factor. If ACC activation is driven by response conflict, we would expect a main effect of imagined outcome.
In contrast to the conflict monitoring predictions, there was no main effect of imagined outcome [F(1,18) = 0; p = 0.97]. Instead, there was a significant three-way interaction between motor cortex, imagined response, and imagined outcome [F(1,18) = 45.23; p < 0.0001]. As depicted in Figure 3, subjects only exhibited motor cortex activity associated with the rewarded button press even when presented with a cue instructing them to imagine making a non-rewarded button press. This interaction suggests that the observed activity in both motor cortices was not due to summed motor activity for imagining non-rewarded outcomes associated with both left and right button presses. Thus, it appears that the present data cannot be explained by a conflict effect.
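The 2 × 2 × 2 within-subject analysis reported above can be sketched with contrast scores. In a fully within-subject design where every factor has two levels, each main effect and interaction has one degree of freedom, and its F(1, n − 1) statistic equals the squared one-sample t statistic of the corresponding per-subject contrast. The sketch below relies on that equivalence; the data are simulated with a built-in three-way interaction purely for illustration, and the names (`CODES`, `effect_F`) are hypothetical rather than taken from the authors' analysis.

```python
import math
import random
from itertools import product

# Cell coding: (motor cortex, imagined response, imagined outcome), each -1 or +1.
CODES = list(product((-1, 1), repeat=3))

def effect_F(data, weights):
    """F(1, n-1) for a single-df within-subject effect via per-subject contrast scores."""
    scores = [sum(w * x for w, x in zip(weights, subject)) for subject in data]
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    t = mean / math.sqrt(variance / n)
    return t * t, (1, n - 1)

# Simulated parameter estimates for 19 subjects (illustration only, not study data):
# a pure three-way interaction plus Gaussian noise.
rng = random.Random(0)
data = [[c * r * o + rng.gauss(0.0, 0.1) for c, r, o in CODES] for _ in range(19)]

outcome_weights = [o for _, _, o in CODES]            # main effect of imagined outcome
three_way_weights = [c * r * o for c, r, o in CODES]  # cortex x response x outcome

F_outcome, df = effect_F(data, outcome_weights)
F_three_way, _ = effect_F(data, three_way_weights)
```

With 19 subjects the degrees of freedom are (1, 18), matching the values reported in the text; in the simulated data the interaction term dominates while the main effect of outcome stays near zero, which is the qualitative pattern the conflict account does not predict.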

DISCUSSION
The present study sought to explore the neural mechanisms involved in imagining possible actions and predicting their potential consequences, a concept variously referred to as mentation (Goldman-Rakic, 1996) or "dynamic evaluation lookahead" (Van Der Meer and Redish, 2010) based on learned R-O predictions (Colwill and Rescorla, 1990). We identified the mPFC as playing a potential role in action outcome prediction. In prior studies, the mPFC has been implicated in predicting action outcomes (Brown and Braver, 2005; Valentin et al., 2007; Glascher et al., 2009; Krawitz et al., 2011) or similarly learning the value of actions (Kennerley et al., 2006; Rushworth et al., 2007), although previous studies have not isolated R-O prediction from the actual execution of the corresponding responses.
Because of the absence of explicit feedback and motor response during the Imagine condition, our findings in mPFC are unlikely to be accounted for by models assigning a role for error detection (Gehring et al., 1993). Our findings of greater ACC activity for imagining non-rewarded actions, combined with greater motor cortex activity representing the rewarded action response while subjects imagined non-rewarded actions, might initially seem consistent with the response conflict model (Botvinick et al., 2001) as extended to anticipation (Sohn et al., 2007). Nevertheless, multiple responses can lead to ACC activity even without response conflict (Brown, 2009), which suggests that ACC may reflect the anticipated responses and outcomes rather than conflict per se. Furthermore, anticipatory effects in ACC likewise do not necessarily entail response conflict (Aarts et al., 2008). Another possible alternative account of ACC activity is that it correlates with time on task (Grinband et al., 2010). We attempted to control for this by equalizing the duration period for imagining both rewarded action and non-rewarded action outcomes. However, we cannot entirely rule out the possibility that participants spent unequal amounts of time imagining the rewarded action vs. non-rewarded action options.
Frontiers in Neuroscience | Decision Neuroscience
FIGURE 3 | Parameter estimates extracted from left and right motor cortex for Imagining Rewarded and Imagining Non-Rewarded outcomes associated with left and right button presses. ImagineL, ImagineLeft; ImagineR, ImagineRight.
Imagining non-rewarded outcomes also produced activation in the precuneus and superior frontal sulcus. Activations in these regions might reflect increased imagery/working memory demands when imagining non-rewarded outcomes while simultaneously keeping the rewarded outcome in mind. The precuneus is a region that has been shown to be involved in various forms of imagery, such as visuo-spatial imagery (Selemon and Goldman-Rakic, 1988), episodic memory retrieval (Henson et al., 1999), and self-processing (Kircher et al., 2000). More relevant to our current study, experiments have revealed that the precuneus shows greater activation to imagined motor actions as opposed to actual motor executions, specifically in the case of imagined finger movements (Gerardin et al., 2000; Hanakawa et al., 2003). Additionally, the superior frontal sulcus is strongly related to working memory, especially in the spatial domain (Courtney et al., 1998). Taken together, these findings suggest that the observed effect in the ACC may be part of a larger network of brain regions involved in predicting the outcomes of imagined actions more generally.
Given the above, our results are consistent with a comprehensive computational model of mPFC as anticipating and then evaluating the outcome of planned actions. We have recently developed a new model of mPFC, the predicted response-outcome (PRO) model, according to which R-O predictions are generated and subsequently evaluated against actual outcomes in the mPFC (Alexander and Brown, 2011). A key prediction of the model is that mPFC (and especially ACC) signals a prediction of the anticipated outcome of an action, which may be subsequently compared against the actual outcome. In the model, discrepancies between actual and predicted action outcomes form the basis of the error effect in mPFC. These discrepancy signals are not limited to errors; they also signal surprisingly good outcomes (Jessup et al., 2010). There is ample evidence that surprising action outcomes are detected in part by ACC in monkeys (Ito et al., 2003; Hayden et al., 2011) and humans (Nee et al., 2011).
Table 1 | Summary of activations for whole-brain analysis (contrast: ImagineNon-Reward − ImagineReward). Coordinates are reported in MNI space, and cluster size is given in number of contiguous voxels. All reported activations pass a cluster-corrected threshold of p < 0.05.
Our results are consistent with the outcome predictions originating from within the mPFC, although these are likely derived from component information in other regions such as the precuneus and superior frontal sulcus. Second, it was unclear whether the R-O predictions would be represented in the mPFC even when action execution was not imminent. Our results are consistent with the PRO model predictions and indicate that mPFC activity may reflect a subjective prediction of action outcomes. The present results suggest that these action outcome predictions are present even when action execution is merely imagined and not imminent. The region that responds to imagined errors overlaps with the region that responds to actual errors, which is consistent with a partial overlap between regions that predict outcomes and regions that evaluate actual outcomes. These findings, combined with prior evidence that mPFC activity is key to risk avoidance (Brown and Braver, 2005, 2007; Magno et al., 2006), are consistent with proposals that mPFC is a region crucial to the ability to anticipate and avoid adverse consequences even when a risky action is not planned to be executed immediately. Indeed, over-activity of the mPFC and especially ACC appears to be a key ingredient in obsessive-compulsive disorder (Machlin et al., 1991), in which the excessive urge to avoid potential dangers may be experienced even when no action is otherwise imminent. As a whole, the results are consistent with the PRO model account of the mPFC as involved in predicting the potential outcomes of an action. The results suggest that the mPFC evaluates potential outcomes with a view toward guiding decisions among possible actions even when action is not imminent. Our results provide a view of the networks involved in guiding decisions about actions and especially how those networks function when dissociated from action execution.
These networks are central to a number of clinical disorders, and a better understanding of their role is urgent given that the impaired ability to think about and take into account the outcomes or consequences of actions is a hallmark of various clinical disorders such as obsessive-compulsive disorder, schizophrenia, and drug abuse (Petry and Casarella, 1999; Bechara et al., 2002). The identification of the neural mechanisms involved in prospective decision-making has the potential to inform more effective pharmacological and cognitive treatments in patient populations.

ACKNOWLEDGMENTS
Supported by R01 DA026457 (Joshua W. Brown) and the Indiana METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. We thank B. Pruce and C. Chung for assistance with data collection.