Activity in dlPFC and its effective connectivity to vmPFC are associated with temporal discounting

There is widespread interest in identifying computational and neurobiological mechanisms that influence the ability to choose long-term benefits over more proximal and readily available rewards in domains such as dietary and economic choice. We present the results of a human fMRI study that examines how neural activity relates to observed individual differences in the discounting of future rewards during an intertemporal monetary choice task. We found that a region of left dorsolateral prefrontal cortex (dlPFC) BA-46 was more active in trials where subjects chose delayed rewards, after controlling for the subjective value of those rewards. We also found that the connectivity from dlPFC BA-46 to a region of ventromedial prefrontal cortex (vmPFC) widely associated with the computation of stimulus values increased at the time of choice, especially during trials in which subjects chose delayed rewards. Finally, we found that estimates of effective connectivity between these two regions played a critical role in predicting out-of-sample, between-subject differences in discount rates. Together with previous findings in dietary choice, these results suggest that a common set of computational and neurobiological mechanisms facilitates choices in favor of long-term reward in both settings.


INTRODUCTION
Impaired ability to delay gratification is thought to play a critical role in sub-optimal decision-making, and in conditions like addiction and obesity (Chambers et al., 2007; Monterosso and Ainslie, 2007; Peters and Buchel, 2011). As a result, there is a widespread, ongoing effort to characterize the computational and neurobiological mechanisms underlying this form of self-control. Two types of paradigms have been widely used in behavioral neuroscience to examine these mechanisms. The first are tasks involving intertemporal decisions between rewards, often money, in which subjects choose between sooner-smaller amounts and later-larger ones (Rachlin, 2000; Ainslie, 2001; McClure et al., 2004; Kable and Glimcher, 2007; McClure et al., 2007; Ballard and Knutson, 2009; Gregorios-Pippas et al., 2009; Carter et al., 2010; Monterosso and Luo, 2010; Peters and Buchel, 2010, 2011; Luo et al., 2012). The second are tasks involving dietary choices, in which subjects choose between foods that vary in their tastiness and healthiness (Hare et al., 2009, 2011a; Volkow et al., 2011).
In previous work investigating dietary self-control, we found important commonalities and differences between successful and unsuccessful dieters (Hare et al., 2009). Behaviorally, the two groups differed on the relative weight that they placed on the health and taste attributes of foods in making their decisions (with successful dieters weighting both health and taste, and unsuccessful dieters weighting only taste). Neurally, the ventromedial prefrontal cortex (vmPFC) encoded the value of foods at the time of choice equally for both groups. The critical difference had to do with the role of left dorsolateral prefrontal cortex (dlPFC). In successful dieters, dlPFC came on-line and exhibited increased effective connectivity with vmPFC during choices that required self-control (e.g., refusing to eat tasty, but unhealthy candy). In contrast, unsuccessful dieters did not exhibit this pattern of connectivity. Furthermore, in a subsequent study we found that non-dieting participants behaved like successful dieters if they were given an exogenous reminder to pay attention to health information, and that the reminder activated the same dlPFC-vmPFC networks that successful dieters activated on their own (Hare et al., 2011a).
These findings led us to propose the following model of the computational and neurobiological processes at work in self-control (Hare et al., 2009, 2011a; Rangel and Hare, 2010; Harris et al., 2013). In the model, the vmPFC computes the value of options at the time of decision, by first assessing their various attributes, and then integrating them into a net value for the option as a whole. Importantly, "basic" attributes like tastiness might always be represented in the final value. However, more abstract attributes like healthiness are only represented, or are represented more strongly, if the dlPFC comes online and modulates activity in vmPFC so that its value computations incorporate them. This modulation is critical for optimal decision-making because, if some of the attributes are not represented or weighted properly, the vmPFC will assign values to options that are not consistent with the long-term, goal-relevant (e.g., proper nutrition) rewards that they generate.
An important open question is whether this model is also at work in other decision domains, such as those involving intertemporal monetary tradeoffs. This question is important because comparing the mechanisms at work in different decision contexts is a critical step in identifying common mechanisms that facilitate self-control. Theoretically, these circuits should also influence the degree of discounting for delayed rewards in the case of intertemporal choice, as long as dlPFC modulation of vmPFC can lead to an increased (or decreased) weighting for delayed rewards.
Here we address this open question by testing the following three hypotheses. First, we hypothesized that regions of left dlPFC similar to those that are more active during self-control in dietary choice would also be more active in intertemporal choice when the subjects choose the larger-delayed payment over the money available today, after controlling for their relative subjective values. Note that it is crucial to control for the subjective values because, if the subjective value of the delayed reward is large enough, the decision to wait becomes trivial. Second, we hypothesized that effective connectivity from left dlPFC to vmPFC would be stronger during trials in which subjects choose larger-delayed rewards (again controlling for subjective value), which is consistent with the idea that dlPFC can modulate the value signals in vmPFC so that they place more weight on the value of delayed payouts. Third, we hypothesized that the levels of activation in dlPFC, as well as its effective connectivity to vmPFC, would help to explain differences in discount rates across subjects.
These hypotheses are based not only on previous work in dietary choice, but also on findings from the previous literature on goal-directed choice. First, areas of vmPFC have consistently been shown to correlate with stimulus values at the time of choice across a wide variety of decision contexts (Tom et al., 2007; Boorman et al., 2009; Lebreton et al., 2009; Basten et al., 2010; Hare et al., 2010; Plassmann et al., 2010; Shenhav and Greene, 2010; Clithero et al., 2011; Kahnt et al., 2011; Park et al., 2011), including decisions involving intertemporal tradeoffs (Kable and Glimcher, 2007; Ballard and Knutson, 2009; Hare et al., 2009, 2011a; Carter et al., 2010; Peters and Buchel, 2010). Second, previous studies have associated responses in left dlPFC with choosing to wait for delayed monetary rewards using transcranial magnetic stimulation (TMS) and fMRI (McClure et al., 2004; Figner et al., 2010; Luo et al., 2012). In particular, Figner et al. (2010) showed that temporarily reducing activity in left dlPFC via TMS results in subjects making more impatient choices, thus establishing a causal role for this region in temporal discounting. Third, recent studies have found that resting-state connectivity in networks including left dlPFC was correlated with discount rates (Gianotti et al., 2012; Li et al., 2013).
Despite the attractiveness of the theory, and the body of consistent evidence, critical questions remain open. In particular, none of the previous studies have examined the effective connectivity between dlPFC and vmPFC during intertemporal monetary choices, nor have they established that the dlPFC influences discount rates through a mechanism that involves the modulation of the stimulus values computed in vmPFC, or that the effective connectivity runs from dlPFC to vmPFC, and not the other way around. Here we are able to address these questions by estimating Dynamic Causal Models, and using those parameter estimates to explain and predict differences in discount rates across individuals.

PARTICIPANTS
Twenty-seven subjects (18 males; age: mean = 24.1 years; range = 19-40) were included in the study. Two additional subjects were excluded because of excessive head motion during the scanning session (>2 mm in translation or rotation). All participants had normal or corrected-to-normal vision, no history of neurological, psychiatric, or metabolic illness, and were not taking any medications that interfere with the blood oxygenation level-dependent (BOLD) signal at the time of scanning. The Institutional Review Board at California Institute of Technology approved the methods and procedures used in this study.

INTERTEMPORAL CHOICE (ITC) TASK
On every trial, subjects chose between getting $25 at the end of the experiment, or getting an equal or larger amount at a later date. The later offers ranged from $25 to $54, with delays from 7 to 200 days. Subjects made 216 decisions. The unique combinations of amount and delay used are shown in Table 1. All subjects saw the same set of options, although in different random orders. Each option was shown twice. Note that by presenting all subjects with the same options we were able to control for the objective reward levels when testing how neural activity relates to discount rates. Although beneficial for the hypotheses tested in previous studies (Kable and Glimcher, 2007, 2010; Peters and Buchel, 2009, 2010), tailoring the choice sets around the indifference points of each subject would create a confound with objective value when examining how individual differences in neural activity relate to discount rates, because less patient subjects would be shown delayed rewards with higher monetary values.

TABLE 1 | Unique combinations of delay and delayed amount used in the task.

Delay (days)   Delayed amounts ($)
7              25  26  28  30  32  35
10             25  26  27  29  30  32
12             25  26  28  31  33  35
14             25  26  28  32  35  39
21             26  27  29  30  32  38
25             27  29  31  33  35  46
28             26  28  32  35  39  46
30             26  27  29  30  32  38
40             27  33  35  40  47  54
45             26  29  31  35  40  46
50             27  30  35  40  46  54
60             29  33  35  40  47  54
90             26  30  33  40  46  54
95             31  33  35  40  47  54
100            26  31  38  39  46  54

As shown in Figure 1A, each trial began with an offer presented onscreen. Participants were required to press a button within 3 s to indicate whether or not they accepted the delayed reward offered. Only the varying delayed option was presented onscreen. A button press response resulted in the termination of the offer screen, and the appearance of a feedback screen for 250 ms displaying "Yes," if the delayed offer was accepted, or "No," if it was rejected.
The phrase "No decision received" was displayed if the subject failed to respond within 3 s (mean = 2% of trials, standard deviation 5%, median = 0%). Trials were separated by a fixation cross of random duration (uniform: 2-6 s). The assignment of left/right button presses to accept/reject responses was counterbalanced across subjects. At the end of the experiment a single trial was randomly chosen and implemented: subjects received the chosen option in addition to $50 (available immediately) for participating in the study. All payments were made using prepaid debit cards given to the subjects at the end of the experiment. This allowed us to make the delayed payments available on the appropriate date, without requiring subjects to return to the lab.

BEHAVIORAL DATA ANALYSIS
We estimated an individual discount rate (denoted by k) for each subject using maximum likelihood. In particular, we assumed that subjects assigned value to the delayed options using a hyperbolic discounting function, in which the value of $A with a delay of D days is given by

$$dSV = \frac{A}{1 + kD},$$

where dSV denotes the discounted stimulus value. We also assumed that the probability of accepting the delayed option is given by the soft-max function

$$P(\text{accept}) = \frac{1}{1 + e^{-b(dSV - 25)}},$$

where b is a non-negative parameter that modulates the slope of the psychometric choice function. Note that in this formula the value of the constant reference option is $25.

FIGURE 1 | Task design and behavioral data. (A) Example display screens and timing parameters. (B) Choice curve displaying the probability of choosing the larger, delayed reward. The y-axis shows the probability of selecting the future reward and the x-axis displays the stimulus value of the future reward. Error bars represent the standard error of the mean. (C) Bar graph showing the distribution of discounting parameters across subjects. The x-axis represents individual subjects and the y-axis is the magnitude of the discount parameter k from a hyperbolic discounting function.
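As an illustration of the estimation described in this section, the following minimal sketch fits k and b for one subject by maximizing the likelihood of the observed accept/reject choices under hyperbolic discounting and a soft-max choice rule. The grid search, the grid ranges, and all function names are assumptions made for illustration; the text does not state which optimizer was used.

```python
import math

REFERENCE = 25.0  # constant immediate option, in dollars (from the task description)


def discounted_value(amount, delay_days, k):
    """Hyperbolic discounted stimulus value: dSV = A / (1 + k * D)."""
    return amount / (1.0 + k * delay_days)


def p_accept(dsv, b):
    """Soft-max probability of accepting the delayed option."""
    z = b * (dsv - REFERENCE)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for negative z
    return ez / (1.0 + ez)


def _softplus(x):
    """Stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def neg_log_likelihood(trials, k, b):
    """trials: iterable of (amount, delay_days, accepted) with accepted in {0, 1}."""
    total = 0.0
    for amount, delay, accepted in trials:
        z = b * (discounted_value(amount, delay, k) - REFERENCE)
        # -log P(accept) = softplus(-z); -log P(reject) = softplus(z)
        total += _softplus(-z) if accepted else _softplus(z)
    return total


def fit_subject(trials):
    """Grid-search maximum likelihood (assumed optimizer and grid ranges)."""
    k_grid = [10 ** (-4 + 0.01 * i) for i in range(301)]  # 1e-4 .. 1e-1
    b_grid = [0.05 * i for i in range(1, 101)]            # 0.05 .. 5.0
    _, k_hat, b_hat = min(
        (neg_log_likelihood(trials, k, b), k, b) for k in k_grid for b in b_grid
    )
    return k_hat, b_hat
```

Calling `fit_subject` with a list of `(amount, delay_days, accepted)` tuples returns the grid point with the highest likelihood; in practice a continuous optimizer started from that point would refine the estimate.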

IMAGING DATA ACQUISITION
FMRI data were collected in a Siemens (Erlangen, Germany) 3.0 Tesla Trio MRI scanner. Using an eight-channel, phased array head coil, we collected gradient echo T2*-weighted echo-planar (EPI) images with BOLD contrast. In order to optimize BOLD sensitivity, we used a tilted acquisition in an orientation 30° oblique to the anterior-posterior commissure line (Deichmann et al., 2003). The imaging parameters were as follows: TR = 2500 ms; TE = 30 ms; flip angle = 80°; FOV = 192 mm; in-plane resolution = 3 × 3 mm; and 40 3-mm slices (0.3-mm gap) with ascending acquisition. While in the scanner, subjects completed two runs of the ITC task (with 323 volumes acquired per run). They also completed an additional task involving the degustation of liquid rewards that is not relevant to this study (task order was counterbalanced across participants). High-resolution, whole-brain T1-weighted structural images (TR = 1500 ms; TE = 3.05 ms; flip angle = 10°; voxel resolution = 1 mm³; single-shot, ascending acquisition) were also collected for each participant. These images were co-registered with their respective EPI images to assist with the anatomical localization of the functional activations.

FMRI DATA PREPROCESSING
Imaging data were preprocessed using SPM8 (Wellcome Department of Imaging Neuroscience, Institute of Neurology, London, UK). Data were corrected for motion with realignment to the mean image, spatially normalized to the Montreal Neurological Institute EPI template, resampled to 3 mm³ voxels, and spatially smoothed using a Gaussian kernel (full-width-at-half-maximum = 8 mm). Data were also temporally filtered using a filter width of 128 s.

GLMs
We estimated two different mixed-effects models of the BOLD responses, with first-order autocorrelation correction [AR(1)]. The models were designed to localize in our sample the areas of vmPFC that, as discussed in the Introduction, have been repeatedly shown to correlate with stimulus values at the time of choice. The models are identical except for the specification of the value modulators.
The first model, GLM-dSV, had the following regressors of interest: (1) An indicator function beginning at the onset of each decision screen with duration equal to the reaction time for that trial, (2) the indicator function modulated by the subject specific value of each delayed offer (dSV), and (3) the indicator function modulated by the variable Accept (which equals 1 if the subject chooses the delayed outcome, and zero otherwise). The third regressor was orthogonalized with respect to the second one in order to assign any shared variance between them to the dSV regressor. The model also included session dummies, linear time trends, and head movements as regressors of no interest.
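The orthogonalization step can be illustrated with a short sketch: the Accept modulator is mean-centered and its projection onto the mean-centered dSV modulator is removed, so that any shared variance is credited to dSV. This is a simplified illustration on raw trial-wise values; the function name is hypothetical, and the exact internal procedure of SPM8 (for example, when centering and convolution are applied) may differ.

```python
def orthogonalize(target, reference):
    """Return `target` with its mean and its projection onto `reference` removed.

    Both inputs are trial-wise modulator values of equal length.
    """
    n = len(target)
    t_mean = sum(target) / n
    r_mean = sum(reference) / n
    t = [v - t_mean for v in target]
    r = [v - r_mean for v in reference]
    denom = sum(v * v for v in r)
    if denom == 0:  # constant reference: nothing to project out
        return t
    weight = sum(a * b for a, b in zip(t, r)) / denom
    return [a - weight * b for a, b in zip(t, r)]


# Hypothetical trial-wise values for one subject
accept = [1, 1, 0, 1, 0, 0]
dsv = [30.0, 28.0, 22.0, 26.0, 24.0, 20.0]
accept_orth = orthogonalize(accept, dsv)
```

After this step the residual Accept modulator is uncorrelated with dSV, so its coefficient reflects only choice-related variance that dSV cannot explain.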
The second model, GLM-rdSV, was identical except for the specification of the parametric regressor. In particular, for the reasons described in the Results section, we defined a relative discounted subjective value (rdSV) variable, which is equal to dSV − 25 for subjects that chose the delayed option more than 50% of the time (15 subjects), and is equal to 25 − dSV for those that chose the immediate option more frequently (12 subjects).
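A minimal sketch of the rdSV definition follows. The function name is hypothetical, and the handling of a subject who accepts on exactly 50% of trials is an assumption, since the text does not specify that case.

```python
def relative_dsv(dsv_values, accepted, reference=25.0):
    """Relative discounted subjective value for one subject.

    dsv_values: discounted values of the delayed offers, one per trial.
    accepted:   1 if the delayed offer was chosen on that trial, else 0.
    """
    fraction_delayed = sum(accepted) / len(accepted)
    if fraction_delayed > 0.5:
        # Subject mostly waits: value is expressed relative to the immediate $25
        return [v - reference for v in dsv_values]
    # Subject mostly takes the money now (a tie at exactly 50% lands here by assumption)
    return [reference - v for v in dsv_values]
```

For a given subject the two definitions differ only in sign, which is why the variance explained by the value regressor is the same under either coding.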
Both GLMs were estimated in three steps. First we estimated the model at the individual level. Second, we calculated the following first-level single-subject contrasts: regressor 2 (dSV or rdSV) vs. baseline, and regressor 3 (Accept) vs. baseline. Third, we calculated second-level group contrasts using one-sample t-tests on the single-subject contrasts.
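For a single voxel, the third step reduces to a one-sample t-test on the per-subject contrast estimates. The sketch below shows that computation under this simplification; the function name and the input values are hypothetical, and the actual analysis was run voxel-wise in SPM8.

```python
import math
import statistics


def one_sample_t(contrast_values):
    """t statistic and degrees of freedom for H0: mean contrast = 0."""
    n = len(contrast_values)
    mean = statistics.mean(contrast_values)
    sd = statistics.stdev(contrast_values)  # sample SD (n - 1 denominator)
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical first-level contrast estimates at one voxel, one per subject
t_value, dof = one_sample_t([0.4, 0.1, 0.7, -0.2, 0.5])
```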
We controlled for multiple comparisons at the cluster level using an individual voxel threshold of p < 0.005 to achieve a whole-brain corrected (WBC) p-value of less than 0.05 (cluster sizes are listed in each table). We also used small volume corrections (SVCs) in areas of a priori interest to the study of the self-control mechanisms that are at the core of the hypotheses tested here. We carried out an SVC in the vmPFC using an anatomical mask based on the AAL atlas (Tzourio-Mazoyer et al., 2002) that included the rectal gyrus, medial orbitofrontal, and anterior cingulate cortex below z = 5 (1619 3-mm³ voxels). A region in left dorsolateral PFC, in Brodmann Area (BA) 46, has been shown to play a role in various types of self-control tasks (Hare et al., 2009, 2011a; Figner et al., 2010). Because anatomical masks of dlPFC lacked the required specificity to isolate this region, we carried out SVC using a mask composed of a 10-mm radius sphere centered around the target coordinates (x, y, z = −36, 30, 27) used by Figner et al. (2010) to demonstrate a causal role of dlPFC on discounting behavior. The TMS stimulation of left dlPFC in that study was directed to the F3 location of the international 10-20 system for electrode placement, and we used the transformation algorithms in the Münster T2T-Converter software (http://wwwneuro03.uni-muenster.de/ger/t2tconv/) to compute an estimate of the underlying MNI coordinates.
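The spherical dlPFC mask can be sketched as the set of voxel centers lying within 10 mm of the target coordinate. The sketch below assumes a 3-mm isotropic grid that happens to be aligned with the sphere center, which will generally not hold for a real normalized image; the function name is hypothetical.

```python
def sphere_mask(center, radius_mm, voxel_mm):
    """List the voxel-center coordinates (in mm) inside a sphere.

    Assumes the voxel grid is aligned so that `center` is itself a voxel center.
    """
    cx, cy, cz = center
    steps = int(radius_mm // voxel_mm)
    coords = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                dx, dy, dz = i * voxel_mm, j * voxel_mm, k * voxel_mm
                if dx * dx + dy * dy + dz * dz <= radius_mm ** 2:
                    coords.append((cx + dx, cy + dy, cz + dz))
    return coords


# Target coordinate from the text: MNI (x, y, z) = (-36, 30, 27), 10-mm radius
dlpfc_mask = sphere_mask((-36, 30, 27), radius_mm=10, voxel_mm=3)
```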
To assess the impact of the dSV regressor on model fit within the reward valuation network identified in two recent meta-analyses (Bartra et al., 2013; Clithero and Rangel, 2013), we re-estimated GLM-dSV and a reduced form of GLM-dSV using the Variational Bayes routines in SPM8 (Penny et al., 2005). The reduced form of GLM-dSV excluded the parametric regressor for dSV, but included all other regressors described above. Following estimation of both models at the single subject level, we used a random effects Bayesian model selection (BMS) procedure (Rosa et al., 2010) to assess the variance explained by the dSV regressor independent of the sign (+/−) on its coefficient in a fashion similar to an F-test, but additionally accounting for model complexity. This BMS procedure generated exceedance probabilities from the model evidence for each GLM. The exceedance probabilities correspond to the belief that the full version of GLM-dSV is more likely than the reduced version given the data from all subjects (or vice versa). We evaluated the exceedance probability in all voxels within the conjunction of masks generated from recent meta-analyses on reward valuation. Specifically, we formed the conjunction mask from the voxels shown in Clithero and Rangel's (2013) Figure 3 and Bartra et al. (2013) Figure 3A. This mask included voxels in vmPFC, ventral striatum (vStr), and posterior cingulate cortex (PCC) consistently found to positively correlate with the value of reward across reward types and decision contexts in the meta-analyses listed above. This mask can be downloaded from the following website: http://www.rnl.caltech.edu/resources/index.html.

DYNAMIC CAUSAL MODELING (DCM)
We tested the hypothesis that the effective connectivity from left dlPFC-BA46 to vmPFC plays a critical role in self-control using DCM. The analysis proceeded in several steps.
First, for each subject we extracted average activation time courses from vmPFC and left dlPFC-BA46. In particular, for every subject we defined an ROI with a 5-mm radius, and a center given by each subject's most significant voxel within the group ROIs. The group ROI in vmPFC was defined based on the conjunction between voxels showing an effect for the rdSV and Accept regressors from GLM-rdSV at an uncorrected threshold of p < 0.005. The group ROI in dlPFC-BA 46 was defined as all voxels showing an effect for the Accept regressor from GLM-rdSV at an uncorrected threshold of p < 0.005.
Second, we optimized the basic architecture of the DCM, in terms of where experimental inputs entered. To do so, we estimated 64 different DCMs that could be organized into four different families (Figure 3A), based on how the variables rdSV and Accept affect activity in dlPFC and vmPFC. Each family contained 16 models that varied in terms of the combinations of connectivity between vmPFC and dlPFC-BA46 as a function of three events: fixation, all choice periods, and choice periods in which the delayed option is selected.
Third, we compared each model family using BMS (Stephan et al., 2009) to determine the most likely pattern of task related inputs into dlPFC and vmPFC.
Fourth, having optimized the model inputs, we calculated the parameter estimates and posterior probabilities of the full model (i.e., the one containing coupling parameters from dlPFC-BA46 to vmPFC and vice versa for all choice types and inter-trial fixation, as shown in Figure 3). Parameter estimates were computed using Bayesian parameter averaging (BPA) over subjects (Kasess et al., 2010). For completeness, we also tested the effective connectivity parameters from dlPFC-BA46 to vmPFC using two-tailed, one-sample t-tests against zero across individuals.

PREDICTION EXERCISE
We tested the hypothesis that the effective connectivity from left dlPFC-BA46 to vmPFC predicts between-subjects differences in the discount rate using the following out-of-sample prediction exercise. For every subject, we estimated the following linear regression using elastic net regularization (Zou and Hastie, 2005) (alpha parameter = 0.3) on the other N−1 subjects:

$$y = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \varepsilon,$$

where the dependent variable y was the log of the individual discount rates (k) and the explanatory variables x_1, ..., x_p were the complete set of estimated DCM parameters for each functional run (exclusive of the hemodynamic parameters) listed in Table 7. One advantage of the elastic net regularization is that the regression model is more robust to correlated predictor variables such as the DCM parameters for separate runs from the same subject.
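To make the regularized regression concrete, the following is a minimal coordinate-descent sketch of an elastic net fit. It assumes the common parameterization in which the penalty is lam * (alpha * |beta|_1 + (1 - alpha) / 2 * |beta|_2^2), so that alpha = 0.3 mixes lasso and ridge penalties; whether the reported alpha follows this convention, and how the overall penalty strength lam was chosen, are not stated in the text and are treated here as assumptions. The function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the lasso proximal step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=0.3, n_iter=200):
    """Minimize (1/2n)*RSS + lam*(alpha*L1 + (1-alpha)/2*L2) by coordinate descent.

    X: list of rows (one per subject), y: list of targets (e.g., log k).
    Returns (intercept, coefficients). The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]  # centered columns
    resid = [yi - y_mean for yi in y]  # residuals for beta = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            # Correlation of column j with the partial residual (own effect added back)
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + lam * (1.0 - alpha)
            new = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
            beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def predict(intercept, beta, x_row):
    """Predicted target for one held-out subject."""
    return intercept + sum(b * x for b, x in zip(beta, x_row))
```

In a leave-one-out setting, `elastic_net` would be fit on the N−1 training subjects and `predict` applied to the DCM parameters of the held-out subject.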
We then used the fitted coefficients from the elastic net model to predict if the discount parameter for the excluded individual was above or below the mean value for the N−1 subjects. The procedure was repeated for every subject. Finally, we computed the balanced accuracy of the prediction using the confusion matrix, in which the rows represent the true labels and the columns represent the predicted labels, generated by our classification results (Brodersen et al., 2010). Briefly, this method controls for any imbalance in the data classes that may bias the classifier accuracy. The balanced accuracy is computed as

$$\text{balanced accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right),$$

where TP, FN, TN, and FP represent the number of true positives, false negatives, true negatives, and false positives, respectively. In order to further test the specificity of our findings, we carried out different versions of this prediction exercise, in which subsets of the DCM parameters were excluded (see the Results section for details), or other candidate regions replaced left dlPFC-BA46 in the DCM. In all cases, we used the same fully connected DCM model with a fixed input to dlPFC-BA46 (or its replacement, when appropriate) on accepted trials and an input parametrically varying with the subjective value of the delayed reward to vmPFC.
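The labeling and scoring steps can be sketched as follows: each held-out value is labeled by whether it lies above the mean of the training subjects, and the balanced accuracy averages the hit rates of the two classes. The function names are hypothetical, and the example labels are invented for illustration only.

```python
def label_above_training_mean(value, training_values):
    """True if `value` exceeds the mean of the N-1 training subjects' values."""
    return value > sum(training_values) / len(training_values)


def balanced_accuracy(true_labels, predicted_labels):
    """Mean of sensitivity and specificity from paired boolean labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and not pred:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present among the true labels")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Invented example: 4 subjects truly above the mean, 2 below
truth = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(truth, predicted)  # 0.5 * (3/4 + 1/2)
```

Unlike raw accuracy, this score stays at 0.5 for a classifier that always predicts the majority class, which is the property that makes it suitable for unevenly sized groups.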

SPECIFICITY TESTS
These tests were designed to test the specificity of left BA46 activity and connectivity on the results. To do this, we repeated the DCM and prediction exercises described above using the left dlPFC-BA9 region listed in Table 6, or an ROI created by mirroring the 10 mm sphere centered on the estimated coordinates from Figner et al. (2010) to the right hemisphere.

CHOICE BEHAVIOR
We began the analysis by estimating individual discount rates (denoted by k) using maximum likelihood, and under the well-validated assumption that subjects exhibit hyperbolic discounting (Frederick et al., 2002; Green and Myerson, 2004; McKerchar et al., 2009). These estimates also allowed us to compute the discounted stimulus value (dSV) that each subject assigned to each option. As shown in Figure 1B, which depicts the group's psychometric choice curve, the estimated values provided a good description of the choice data. Figure 1C provides an ordered histogram of the individual estimates of the discount parameter k (with larger values denoting more frequent choices for immediate reward), and shows that there were sizable differences across individuals. This is important because one of the goals of the study is to relate individual differences in brain activity to differences in discounting behavior.

REACTION TIMES
Subjects responded well under the time limit of 3 s for both immediate [mean = 1.22 s, SD = 0.24 s, t(26) = −39.2, p < 0.001] and delayed choices [mean = 1.20 s, SD = 0.25 s, t(26) = −37.2, p < 0.001]. An ANOVA on reaction times as a function of choice (accept delayed offer vs. take money now) and group (those who chose to wait for delayed rewards on the majority of trials, the wait group, WG, vs. those who most often took the money available today, the now group, NG) showed no main effects of choice or group [F(1, 48) = 0.10, p = 0.76 and F(1, 48) = 0.49, p = 0.49, respectively]. However, there was an interaction between the tendency to wait and choice [F(1, 48) = 4.79, p < 0.05]. This interaction was driven by the fact that WG subjects showed faster reaction times when choosing the delayed reward (mean = 1133 ± 70 ms) than when choosing the immediate reward [mean = 1277 ± 63 ms; t(14) = −3.99, p < 0.001], whereas NG subjects had slower reaction times when choosing the delayed reward (mean = 1278 ± 62 ms) compared to immediate rewards [mean = 1143 ± 60 ms; t(11) = 2.58, p < 0.05].

GLM ANALYSES LOCALIZING THE dlPFC AND vmPFC ROIs
A central goal of this study was to investigate the role of effective connectivity between the area of vmPFC that has been widely associated with the computation of stimulus values at the time of choice, and an area of left dlPFC, in BA 46, that has been shown to exert a causal influence on discounting behavior in monetary intertemporal choices and implicated in self-control processes in various domains. (See the Introduction and Discussion for more details). In order to carry out the connectivity analyses, we first needed to localize these two brain regions in our sample.
To do this, we first estimated a GLM of the BOLD responses that contained dSV and Accept (defined as 1 if the subject chose to accept the delayed option, and 0 otherwise) as parametric modulators at the time of decision (GLM-dSV). Based on previous studies, we expected that activity in vmPFC, PCC, and vStr would correlate with the dSV regressor, as these areas have been shown consistently to encode subjective values at the time of choice (Bartra et al., 2013; Clithero and Rangel, 2013). Note that in our task the immediate option was invariant ($25), whereas the delayed option changed every trial. Therefore, all trial-wise variation in the value of the delayed option is captured by dSV, even if the brain computes relative value signals (e.g., dSV − 25 or 25 − dSV).
The contrast for the Accept regressor showed that, after controlling for dSV, a region of left dlPFC in BA 46 was more active when subjects chose the larger, delayed option (p < 0.05, SVC; Table 2). No regions were more active when declining the larger delayed reward in favor of the $25 today. Note that this increased activity when accepting delayed options is present after controlling for dSV, indicating that it does not reflect a mere tendency of the subjects to choose larger rewards more frequently (indeed, 12 out of 27 subjects in our sample chose the objectively smaller reward today most often). The contrast for dSV revealed a large cluster (1277 voxels) with a peak in the middle frontal gyrus (Brodmann Area 6), extending into the putamen, that was positively correlated with dSV (p < 0.05 WBC; Table 3), but, to our surprise, no significant activity at the group level in the vmPFC region consistently linked to value computation (Bartra et al., 2013; Clithero and Rangel, 2013). In addition, none of the regions from the meta-analysis of negative correlations with subjective value in Bartra et al. (2013) showed decreasing activity as a function of dSV at whole brain or small volume corrected thresholds. Although there was no significant activity in canonical reward regions at the group level, inspection of individual participant results revealed positive correlations with dSV within the vmPFC for many individuals. However, there was also a large fraction of participants who showed negative correlations with dSV in vmPFC, so that the positive and negative single-subject coefficients largely canceled and the group-level t-test effect was close to zero.
In order to examine the association between dSV and BOLD signals at the group level in a manner independent of the sign (+/−) of this relationship, we re-estimated GLM-dSV, as well as a reduced version of this model excluding the dSV regressor, using Variational Bayes (see Materials and Methods). The logic of comparing the original version of GLM-dSV with the reduced model excluding dSV is that any differences in their fits to the data can be attributed to variance explained by dSV. Critically for our purposes, the amount of variance explained is unchanged by the sign of the regression coefficient, allowing us to compute random effects statistics across individuals showing positive and negative effects of dSV. Following estimation of these models at the individual subject level, we compared their relative probabilities given the data from all subjects using BMS within a mask of reward value sensitive regions including vmPFC, vStr, and PCC created from recent meta-analyses on reward value computation (Bartra et al., 2013; Clithero and Rangel, 2013). Here we compared the models based on their respective exceedance probabilities, which are measures that correspond to the belief that a particular model is more likely than any other in the test set given the data from all participants.

FIGURE 2 | Voxels in meta-analytically defined reward value regions whose pattern of activity is better explained by a GLM including dSV as a regressor than a reduced version of the GLM omitting dSV. Voxels in violet are those within a mask of reward value sensitive regions including vmPFC, vStr, and PCC created from recent meta-analyses on reward value computation (Bartra et al., 2013; Clithero and Rangel, 2013) where the exceedance probability for GLM-dSV compared to the reduced version without dSV is 0.90 or higher. The exceedance probability was 0.90 or higher for the version of GLM-dSV including dSV in 83% of voxels within the meta-analysis conjunction.
Figure 2 shows all voxels in this mask where the exceedance probability for GLM-dSV compared to the reduced version without dSV is 0.90 or higher. Quantitatively, 83% of voxels in the meta-analysis conjunction had an exceedance probability of 0.90 or higher for the version of GLM-dSV including dSV, whereas only 4% had a value of 0.90 or higher for reduced GLM-dSV. This indicates that the dSV regressor explains a significant amount of the variance in the BOLD response in vmPFC, vStr, and PCC.
Given that previous studies have found evidence consistent with the encoding of relative value signals in vmPFC at the time of choice (Boorman et al., 2009;Lim et al., 2011;Hunt et al., 2012), we carried out an additional generalized linear model (GLM-rdSV). We hypothesized that there might be individual differences in the computation of the relative subjective value. In fact, a class of popular models in behavioral economics predicts that subjects will use as their reference item (i.e., the one that is subtracted when computing relative value) the option that they choose most frequently (Koszegi and Rabin, 2006). Based on this, we defined a relative discounted subjective value regressor (rdSV) that is given by dSV − 25 for those that picked the delayed option more than 50% of the time, and by 25 − dSV for those that select the immediate option most frequently.
We estimated a new GLM (GLM-rdSV) with rdSV and Accept as parametric modulators at the time of decision. Consistent with the post-hoc hypothesis that value computations were made relative to the most frequent choice, we found that BOLD responses in vmPFC (p < 0.05 SVC, Figure 3A; Table 4) and the anterior superior temporal gyri (p < 0.05 WBC) were positively correlated with the modified value regressor. In addition, several regions including the anterior insula (AI), dorsomedial prefrontal cortex (dmPFC), inferior parietal cortex, middle frontal gyri, and posterior cingulate showed negative correlations with rdSV (p < 0.05 WBC; Figure 4A; Table 5). Both the vmPFC region that positively correlated with rdSV and the portions of the AI and dmPFC that negatively correlated with rdSV show considerable overlap with results from recent meta-analyses (Bartra et al., 2013; Clithero and Rangel, 2013) of reward value representation (Figures 3B, 4B). This overlap with previous results for both positive and negative correlations with subjective value suggests that there is significant variation between subjects in how discounted subjective values are computed and that this computation may be related to choice frequencies (e.g., most often wait or rarely wait), consistent with our post-hoc hypothesis. We do not presume or infer from these results any causal relationship between choice frequency and the directionality of relative discounted subjective value computations, and it may be that a third, as yet unknown, variable drives choice preference, value computation, or both. However, this analysis provides us with a sample-specific ROI for vmPFC in which to test our main hypothesis about dlPFC modulation and the prediction of individual differences.
Consistent with GLM-dSV, GLM-rdSV showed that regions of left dlPFC in BA 46 and 9 were more active when subjects chose the larger, delayed option (p < 0.05, SVC; Figure 5; Table 6). Just as in GLM-dSV, no regions were more active when subjects declined the larger delayed reward in favor of the $25 today. Note that both GLM-dSV and GLM-rdSV control for the value of delayed rewards in a similar manner. The variance explained is the same in both models because, in the individual subject GLMs, only the sign of the dSV regressor, and therefore the sign of the regression coefficients, changes while the explanatory power of the regressor remains the same.

TESTS OF EFFECTIVE CONNECTIVITY
Next, we used the ROIs in dlPFC-BA46 and vmPFC to test our first hypothesis; namely, that effective connectivity from left dlPFC-BA46 to vmPFC plays a critical role in delaying gratification. This test was carried out on time courses extracted from the vmPFC and dlPFC-BA46 ROIs identified in GLM-rdSV. We focused on the ROI in BA46 because a previous TMS study found a causal role for this region in choosing to wait for larger delayed rewards in monetary intertemporal choices (Figner et al., 2010). Furthermore, our previous effective connectivity analyses of dietary self-control choices suggested other dlPFC regions active during self-control choices (e.g., BA9) might work through BA46 to modulate vmPFC (Hare et al., 2009). As explained in the Materials and Methods section, the test was performed in several steps.
First, we estimated 4 different DCM families that were grouped based on how the experimental variables rdSV and Accept entered into the model as driving inputs (Figure 6A). Each family contained 16 models that varied on how the vmPFC and dlPFC-BA46 affect each other as a function of three task events: fixation, choice periods, and choice periods when the delayed option is selected.
Bartra et al. (2013) and Clithero and Rangel (2013). All voxels shown in violet are significant in all three studies.
Next, we used BMS to identify the most likely family of models. We compared the families based on their respective exceedance probabilities, a measure of the belief that a particular model is more likely than any other in the test set given the data from all participants. The most likely model family (exceedance probability = 0.87), shown in Figure 6A, had two driving inputs (i.e., direct influences): an input to vmPFC given by the rdSV of the delayed option on every trial, and an input to dlPFC-BA46 given by Accept.
Lastly, we examined the effective connectivity parameters between dlPFC-BA46 and vmPFC using BPA on the fully connected model (Figure 6B) with the most likely experimental inputs across subjects. We found increased signaling from dlPFC-BA46 to vmPFC at the time of choice relative to inter-trial fixation times, and further increases when subjects selected the later option (posterior probability > 0.90 and 0.95, respectively; Figure 6C). In contrast, the signaling in the other direction was not significantly different from zero. For completeness, we also compared the estimated DCM coefficients using one-sample t-tests, which led to the same conclusion: effective connectivity parameters from dlPFC-BA46 to vmPFC increased during all choices and further increased when subjects selected the later option [t(26) = 2.65 and t(26) = 3.80, respectively; p < 0.01], but signaling in the opposite direction did not increase significantly during any task events. Together, these results suggest that there is increased connectivity from dlPFC-BA46 to vmPFC during decisions to wait for larger delayed rewards, but not in the other direction.
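The one-sample t-tests on the DCM coefficients follow the standard formula, with 27 subjects giving 26 degrees of freedom. The sketch below shows the computation on made-up values; the parameter list is hypothetical and is not the study's data. The p-value is not computed because the Python standard library has no t-distribution CDF.

```python
import math
import statistics

def one_sample_t(values, mu0=0.0):
    """Return (t, df) for a one-sample t-test of mean(values) against mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)           # sample SD (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))  # mean divided by its standard error
    return t, n - 1

# Hypothetical per-subject modulatory connectivity parameters.
params = [0.12, 0.30, -0.05, 0.22, 0.18, 0.09, 0.27, 0.01]
t_stat, df = one_sample_t(params)
print(f"t({df}) = {t_stat:.2f}")
```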

BETWEEN-SUBJECTS PREDICTION
Next, we used the results of the DCM to test our second hypothesis; namely, that it is possible to use inter-individual differences in the strength of effective connectivity between dlPFC-BA46 and vmPFC, as well as differences in local responses in those regions, to predict differences in discount rates. For each subject we estimated an elastic net regression model using only the data from the N−1 other subjects, with discount rates as the dependent variable, and the estimated DCM parameters as the predictors. The estimated parameters of the model were then used to predict whether the discount rate of the excluded subject was above or below the mean of the group. The procedure was repeated to obtain a prediction for each subject. We found that the mean balanced accuracy (MBA) across all subjects was 71% correct (95% posterior probability interval = 54 − 85%). In a complementary analysis, we tested how well the continuous discount rate rankings (e.g., 1 = highest to 27 = lowest) estimated from the DCM parameters matched those estimated from behavior, and found a significant correlation (Spearman's rho = 0.42, p < 0.02). Thus, the elastic net regression model can accurately predict both the binary high/low labels and the continuous ordinal rank of discount rates using DCM parameters.
Bartra et al. (2013). All voxels shown in violet are significant in both studies.
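The leave-one-subject-out procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the elastic net is a plain coordinate-descent implementation with hypothetical penalty settings (`lam`, `l1_ratio`), the high/low cutoff is taken as the mean of the training subjects, the balanced accuracy is the simple point estimate rather than the posterior estimate reported above, and the data are synthetic and noise-free.

```python
import statistics

def fit_elastic_net(X, y, lam=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate-descent elastic net; returns (intercept, weights)."""
    n, p = len(X), len(X[0])
    x_mean = [statistics.mean(row[j] for row in X) for j in range(p)]
    y_mean = statistics.mean(y)
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    thr = lam * l1_ratio  # L1 (lasso) threshold
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that ignores feature j.
            rho = sum(
                Xc[i][j] * (yc[i] - sum(Xc[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n + lam * (1 - l1_ratio)
            if rho > thr:
                w[j] = (rho - thr) / z
            elif rho < -thr:
                w[j] = (rho + thr) / z
            else:
                w[j] = 0.0
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return intercept, w

def leave_one_out(X, y):
    """Predict each subject from a model fit to the N-1 other subjects."""
    preds, true_high, pred_high = [], [], []
    for i in range(len(y)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        b, w = fit_elastic_net(X_tr, y_tr)
        pred = b + sum(wj * xj for wj, xj in zip(w, X[i]))
        cutoff = statistics.mean(y_tr)  # assumption: training-set mean
        preds.append(pred)
        true_high.append(y[i] > cutoff)
        pred_high.append(pred > cutoff)
    return preds, true_high, pred_high

def balanced_accuracy(true, pred):
    """Mean of sensitivity and specificity for boolean labels."""
    pos = [p for t, p in zip(true, pred) if t]
    neg = [p for t, p in zip(true, pred) if not t]
    return 0.5 * (sum(pos) / len(pos) + sum(1 for p in neg if not p) / len(neg))

def ranks(values):
    """Ranks starting at 1; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = float(rank)
    return out

def spearman_rho(a, b):
    """Pearson correlation of the ranks of a and b."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = statistics.mean(ra), statistics.mean(rb)
    cov = sum((x - ma) * (v - mb) for x, v in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((v - mb) ** 2 for v in rb)
    return cov / (va * vb) ** 0.5

# Synthetic example: 12 "subjects", 2 predictors, noise-free linear outcome.
X = [[float(i), float((7 * i) % 12)] for i in range(12)]
y = [x1 - 0.5 * x2 for x1, x2 in X]
preds, true_high, pred_high = leave_one_out(X, y)
print(balanced_accuracy(true_high, pred_high), spearman_rho(preds, y))
```

Because each model is fit without the held-out subject, both the binary accuracy and the rank correlation are out-of-sample measures.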
Next, we compared the accuracy of several versions of this prediction exercise, to test the specific role of the various components of the DCM in predicting the individual discount rates. Note that all parameters were estimated in the fully connected version of the model (shown in Figure 6) and were simply omitted from the elastic net regressions during these tests. The logic of these tests is as follows: the prediction accuracy of a regression model that excludes a key parameter should drop, whereas excluding a parameter that does not play a role in intertemporal choice should not affect the model's ability to predict the discount rates. The first test excluded the local driving input response parameters in vmPFC (Spearman's rho = 0.14, p = 0.48; MBA = 65%; 95% post. prob. int. = 47 − 80%). The second test excluded the local driving input response parameters for dlPFC-BA46 (Spearman's rho = −0.03, p = 0.90; MBA = 60%; 95% post. prob. int. = 44 − 76%). The third test excluded the effective connectivity parameters from dlPFC-BA46 to vmPFC (Spearman's rho = −0.12, p = 0.54; MBA = 58%; 95% post. prob. int. = 41 − 74%). The fourth test excluded the effective connectivity parameters in the other direction (Spearman's rho = 0.02, p = 0.91; MBA = 54%; 95% post. prob. int. = 37 − 70%). We found that omitting effective connectivity parameters between dlPFC-BA46 and vmPFC in either direction, or parameters measuring local task induced responses within dlPFC-BA46 or vmPFC reduced the accuracy to chance levels. Together, these findings show that the local responses in both areas, as well as both directions of effective connectivity between dlPFC and vmPFC, are critical for explaining the individual differences in discounting. Table 7 lists the relative size and direction of the effects of each DCM parameter on discount rates when estimating the model over all 27 participants. Finally, we tested the specificity of these results with regard to the left dlPFC BA46 region. 
We replaced left dlPFC BA46 in DCMs using the vmPFC with either the more posterior left dlPFC BA9 ROI that was also found to increase its activity when subjects chose the delayed rewards, or with an ROI created by mirroring the 10 mm sphere centered on the estimated coordinates from Figner et al. (2010) to the right hemisphere. This resulted in two new DCMs and elastic net regression models. Neither of these combinations yielded significantly better than chance predictions (best MBA = 55%) or significant correlations with the true discount rates. The results replacing left dlPFC-BA 46 with the analogous region in the right hemisphere are consistent with previous TMS results showing that only stimulation of the left hemisphere impacted choices for the delayed monetary rewards (Figner et al., 2010).

DISCUSSION
The results in this paper, in conjunction with previous reports (Hare et al., 2009, 2011a; Harris et al., 2013), suggest that a similar set of computational and neurobiological mechanisms are at work in tasks involving the delay of gratification in dietary and monetary intertemporal choices. In particular, we found that left dlPFC BA46 becomes more active in trials in which subjects choose the delayed option, which on average requires more self-control. We also found that the connectivity from left dlPFC BA-46 to a region of vmPFC widely associated with the computation of stimulus values (Bartra et al., 2013; Clithero and Rangel, 2013) increased at the time of choice, and especially during trials in which subjects chose to wait for the delayed reward. In addition, we were able to explain between-subject differences in discount rates using the estimated parameters from a DCM including the activity within dlPFC BA-46 and vmPFC, and the coupling between them, but only if the effective connectivity parameters between the two areas were included.
These results parallel previous findings in the domain of dietary choice, in which individuals chose among foods that differed in their tastiness and healthiness (Hare et al., 2009, 2011a; Harris et al., 2013). Although an explicit between-subject prediction exercise was not performed in those previous studies, the data indicate a central role for dlPFC-vmPFC interactions in dietary self-control. This suggests that the mechanisms mediating self-control described in the Introduction are at work in both tasks, and thus helps to advance our understanding of common computational and neurobiological components of various forms of self-control. In this model, vmPFC computes the value of each option by identifying its various attributes, assigning value to them, and then integrating them into a net value for the option. A critical component of the model is that basic attributes (like immediate monetary payoffs, or the tastiness of foods) are preferentially incorporated into the values computed in vmPFC, but that more abstract attributes (like delayed monetary payoffs, or the healthiness of foods) are generally given less weight unless left dlPFC comes online and modulates activity in vmPFC, so that it weights all attributes according to the current goals (e.g., eat healthy or maximize monetary payoff). Note that the types of attributes that need to be represented and evaluated in both types of tasks are different, but that poor self-control could be attributed to the same source in both cases: reduced weighting of abstract attributes in vmPFC in the absence of dlPFC modulation.
Figner et al. (2010) (x, y, z = −36, 30, 27).
Peak voxel coordinates and cluster sizes within the small volume correction masks are listed in plain text below the corresponding clusters identified in the whole brain analysis.
One limitation of the study must be emphasized. Our experiment is not able to differentiate between heterogeneity in the discount rates attributable to patience or self-control abilities (potentially mediated by differences in dlPFC functioning or connectivity), and heterogeneity due to differences in individual circumstances (e.g., immediate budgetary constraints) that are not directly associated with patience or self-control. Differences in individual circumstances, therefore, do not enter our prediction model and may be one reason why the model is less than perfectly accurate.
In other words, our analysis cannot indicate if less patient subjects failed to wait for delayed rewards because they are unable to do so, or because their best option was to take the immediate monetary payout.
These results provide novel interpretations of results in the sizable literature on intertemporal choice paradigms. Consider three important examples.
First, there has been a debate in the literature on whether or not there are multiple and competing value signals at work in self-control. In particular, previous findings (McClure et al., 2004, 2007) have been interpreted as suggesting that vmPFC-VStr and dlPFC compute parallel but distinct value signals, with a vmPFC-VStr valuation system placing more value on immediate, concrete outcomes, and areas such as dlPFC computing the value of long-term, abstract goals. In this view, the quality of decisions depends on competition between the two valuation systems. In contrast, others have proposed that one value system integrates information about all stimulus attributes, both immediate and long-term, to form an overall value for the stimulus (Kable and Glimcher, 2007, 2010). In this view, the quality of decision-making depends solely on the weighting of different stimulus attributes in value computation. The results here, and in previous work (Hare et al., 2009, 2011a), suggest an obvious way of reconciling both views. In this class of tasks, choices seem to be driven by the stimulus value signals encoded in a vmPFC-based valuation system, but the activation of dlPFC is critical for the deployment of self-control, because it appears to promote increased weighting of foresighted stimulus attributes in the vmPFC value signals as evidenced by increased effective connectivity to vmPFC during larger delayed choices.
This table reports the regression coefficients from the elastic net regression using the DCM parameters specified above to predict discount rates (log(k)).
Second, our results provide a mechanistic explanation of the influential study of Figner et al. (2010), which found that applying inhibitory TMS over left (but not right) dlPFC-BA46 resulted in a decrease in subjects' willingness to wait for delayed rewards.
Consistent with the implication of a causal role for left dlPFC in self-control from these previous results, we find that this region is more active when subjects chose larger future rewards over payments on the same day, after controlling for the subjective value of the payments. Furthermore, our data and analyses indicate that the left BA46 region of dlPFC contributes to delaying gratification by influencing the valuation process in vmPFC at the time of choice, rather than intervening after valuation has occurred, as was previously suggested in Figner et al. (2010). The previous suggestion by Figner and colleagues was based on their finding that choices over delayed options, but not the attractiveness ratings of those delayed rewards were affected by TMS to left dlPFC. However, our data on effective connectivity from dlPFC to vmPFC at the time of choice are more consistent with a mechanism in which dlPFC activity directly impacts valuation processes at the time of choice. We note, however, that these results are not contrary to Figner and colleagues' assertion that the role of dlPFC is specific to decisions as opposed to outcome free ratings.
Third, recent EEG and fMRI studies have found that individual measurements of activity and connectivity within networks including left dlPFC taken at rest exhibited a sizable correlation with discount rates taken in separate behavioral tasks (Gianotti et al., 2012;Li et al., 2013). Similarly, a study of alcoholics found that responses in left dlPFC also correlated with behavior during intertemporal choices (Boettiger et al., 2007). Our results also provide a novel mechanistic explanation for these findings as a whole. Furthermore, our prediction exercises show that measures of effective connectivity between dlPFC and vmPFC are a critical aspect of being able to predict individual discount rates.
We investigated the specificity of the dlPFC-vmPFC interactions in self-control by repeating a similar exercise replacing left dlPFC-BA 46 with left dlPFC-BA 9 or right dlPFC-BA 46.
The specificity test using left dlPFC-BA 9 was motivated by the fact that this area was more active when subjects delayed gratification in previous dietary choice experiments (Hare et al., 2009, 2011a) as well as in the current monetary choice dataset, although the activity did not survive whole brain correction in the current sample. However, it did not result in significant correlations with, or better-than-chance predictions of, between-subject discount rates. This is consistent with our previous findings in dietary self-control where dlPFC-BA9 did not directly interact with vmPFC, but rather affected a more anterior region in BA46, near the region we find in the current intertemporal monetary choice task.
The intertemporal choice task utilized here, as well as the dietary choice task that we have used in our previous related work (Hare et al., 2009), examines the deployment of self-control in the context of goal-directed choice. Other types of self-regulation might be better characterized by competition between habitual and goal-directed systems (Dayan et al., 2006;Balleine et al., 2008;Rangel et al., 2008), or by the type of response inhibition associated with action control in paradigms such as the go/no-go, Flanker, or Stroop tasks (Wager et al., 2005;Congdon et al., 2010). A critical question for future work is to systematically investigate the commonalities and differences between these various sources of self-regulation.
Another avenue for further investigation is our finding that subjects appear to compute the discounted subjective value of delayed rewards relative to their most common choice, perhaps viewing this as a default. While not true in every case, the majority of subjects who most often chose the immediate reward appeared to positively encode a relative value signal in vmPFC equal to the difference between the immediate reward and the larger delayed reward (i.e., $25 − dSV). They also showed negative correlations with this relative value signal in a network of regions that includes dmPFC, AI, and parietal regions consistently shown to negatively correlate with SV (Bartra et al., 2013). This network has been implicated in computations related to conflict, error processing, decision difficulty, and evidence accumulation (Carter et al., 1998; Botvinick, 2007; Pochon et al., 2008; Venkatraman et al., 2009; Wunderlich et al., 2009; Hare et al., 2011b). On the other hand, subjects who most often waited for the delayed reward frequently encoded the opposite relative value signal of dSV − $25 in both sets of regions. This suggests that it is important to control for reference point variation across subjects when examining the neural correlates of subjective values at the group level, but further investigation of this issue is clearly warranted.
While our findings may at first seem contradictory to previous reports where all subjects showed positive correlations with a value signal proportional to the later reward minus the immediate reward, this can potentially be explained by important methodological differences. Many previous studies of intertemporal choice have customized the offer sets for each participant to maintain an acceptance rate close to 50% for all subjects (Kable and Glimcher, 2007, 2010; Peters and Buchel, 2009, 2010). In contrast, we purposefully utilized the same offer set for all subjects to examine individual differences in neural responses. By keeping the response rate near 50% for all subjects, these previous studies may have also generated a more homogeneous encoding of relative value in their participants, avoiding the heterogeneity present in our dataset. These previous datasets also highlight that our findings with regard to relative value computations are likely driven by choice or action probabilities and are not a function of discount rates or self-control ability, because they show that when subjects with high discount rates are presented with choices around their indifference points, they also have positive correlations with delayed reward values in vmPFC. Such changes in the directionality of relative value computations as a function of choice or action probability represent an important target for future research.
In summary, our data provide evidence that the dlPFC supports the delay of gratification by modulating activity in a vmPFC region that reflects the stimulus value of available rewards. Our between-subjects prediction results indicate that both local activity levels and connection strengths between these brain regions mediate delay of gratification tendencies in this task. These findings also suggest that examining effective connectivity parameters in pathological populations with self-control deficits may provide useful insights into the biological basis of their dysfunction.