Temporal discounting and inter-temporal choice in rhesus monkeys

Hwang, Jaewon; Kim, Soyoun; Lee, Daeyeol

doi:10.3389/neuro.08.009.2009

ORIGINAL RESEARCH article

Front. Behav. Neurosci., 11 June 2009
Sec. Motivation and Reward
Volume 3 - 2009 | https://doi.org/10.3389/neuro.08.009.2009

Temporal discounting and inter-temporal choice in rhesus monkeys

Jaewon Hwang¹ Soyoun Kim² Daeyeol Lee^2*

Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA

Department of Neurobiology, Yale University School of Medicine, New Haven, CT, USA

Humans and animals are more likely to take an action leading to an immediate reward than actions with delayed rewards of similar magnitudes. Although such devaluation of delayed rewards has been almost universally described by hyperbolic discount functions, the rate of this temporal discounting varies substantially among different animal species. This might be in part due to the differences in how the information about reward is presented to decision makers. In previous animal studies, reward delays or magnitudes were gradually adjusted across trials, so the animals learned the properties of future rewards from the rewards they waited for and consumed previously. In contrast, verbal cues have been used commonly in human studies. In the present study, rhesus monkeys were trained in a novel inter-temporal choice task in which the magnitude and delay of reward were indicated symbolically using visual cues and varied randomly across trials. We found that monkeys could extract the information about reward delays from visual symbols regardless of the number of symbols used to indicate the delay. The rate of temporal discounting observed in the present study was comparable to the previous estimates in other mammals, and the animalÃ¯Â¿Â½s choice behavior was largely consistent with hyperbolic discounting. Our results also suggest that the rate of temporal discounting might be infl uenced by contextual factors, such as the novelty of the task. The fl exibility furnished by this new inter-temporal choice task might be useful for future neurobiological investigations on inter-temporal choice in non-human primates.

Introduction

The rewards that humans and animals seek to obtain are often not delivered immediately after the required actions are completed. In such cases, the subjective desirability or utility of the expected reward decreases with its delay, and this is referred to as temporal discounting. Consequently, during inter-temporal choice in which the decision makers choose between rewards delivered after unequal delays, they might in some cases prefer a small but immediate reward to a larger but more delayed reward. Such impulsive choices can be often parsimoniously accounted for by a discount function, which is defined as the fraction of the subjective value of a delayed reward relative to that of the same reward delivered immediately. The value of a delayed reward multiplied by the discount function is referred to as the temporally discounted value. In addition, denoting the discount function as F(D), in which D refers to the delay of a reward, the ratio F′(D)/F(D) is referred to as the discount rate and indicates how rapidly the discount function decreases with delay. Abnormally high discount rate underlies a number of psychiatric disorders, including substance abuse and pathological gambling (see Reynolds, 2006 ).

Regardless of the absolute value of discount rate, if the discount rate is constant and does not change with the reward delay, the discount function is exponential (Samuelson, 1937 ). This implies that the relative preference for two different rewards available at time t₁ and t₂ would not be affected when their delays are altered by the same amount and become available at time t₁ + Δt and t₂ + Δt, respectively. The fact that the preference between the two delayed rewards does not change with the elapse of time is referred to as time-consistency, but this assumption is commonly violated (Ainslie and Herrnstein, 1981 ; Green et al., 1981 , 1994 ; Rachlin and Green, 1972 ; Strotz, 1955–1956 ). In addition, a large number of empirical studies have found that behaviors of humans and animals during inter-temporal choice are better described by hyperbolic discount functions that violate time consistency (Frederick et al., 2002 ; Green and Myerson, 2004 ; Kalenscher and Pennartz, 2008 ). A decision maker with a hyperbolic discount function might prefer a larger and more delayed reward when both rewards have relatively large delays, but his or her preference might change when their delays are reduced by the same amount.

Although hyperbolic discount functions have successfully described behaviors for many different animal species, including humans, the overall rate of temporal discounting varied tremendously between humans and other animals. The reasons for this discrepancy are not fully understood, but might be related to the differences in the methods to measure the discount functions for humans and animals. In human studies, choices are typically presented using verbal cues, and the subjects are often allowed to engage in other activities while waiting for the delivery of rewards. In contrast, animals are tested in a more controlled environment and consume their chosen rewards after experiencing the corresponding delays. Moreover, in previous animal studies, reward delays and magnitudes are either fixed or adjusted gradually across successive trials so that they must be estimated from the animal’s experience. In the present study, we trained rhesus monkeys in a new inter-temporal choice task in which the information about the magnitude and delay of each reward is delivered symbolically and as a result could be manipulated independently across trials. We found that the animal’s behaviors were largely better accounted for by hyperbolic discount functions, whereas the form and rate of temporal discounting might be influenced by the novelty of the task.

Materials and Methods

Animal Preparation and Apparatus

Two male rhesus monkeys (monkeys D and J; body weight = 9.5 and 9.0 kg) were tested. During an aseptic surgery, a set of four titanium head posts were attached to the animal’s skull for the purpose of fixing the animal’s head during the experiment. The animals were seated in a primate chair and faced a 17-inch computer monitor located 57 cm away. A custom-designed software was used to control the task and coordinate data acquisition. Eye movements were monitored using a video eye tracking system with 225 Hz sampling rate (ET-49, Thomas Recording, Germany). All the procedures used in the present study were in accordance with the guidelines of the National Institutes of Health and were approved by the University of Rochester Committee on Animal Research.

Inter-Temporal Choice Task

General

Each trial began when the animal fixated a white square (0.9° × 0.9°) presented at the center of the monitor (Figure 1 ). After a 1-s fore-period during which the animal was required to maintain its fixation of the central square within a 2°-radius window, two targets (1° disk in diameter) were presented 8° to the left and right of fixation. The animal was required to continue its central fixation until the white square was extinguished 1 s later. At the end of this cue period, the animal was then required to shift its gaze towards one of the two targets. One of the targets (TS) was green and delivered a small reward when it was chosen by the animal, whereas the other target (TL) was red and delivered a large reward. The delay between the fixation of the chosen target and the reward delivery was indicated by a variable number of small disks (0.9° in diameter) presented around each target. When the target was presented without any disks, the animal was rewarded after a 0.5 delay (Experiment I) or immediately (Experiments II and III) upon fixation of its chosen target. Otherwise, disks were extinguished one at a time according to a specific schedule described below, and the animal was rewarded after all the disks were extinguished. Yellow disks were extinguished at the rate of 0.5 s/disk (Experiment I) or 1.0 s/disk (Experiments II and III). In Experiment III, a mixture of yellow (1.0 s/disk) and cyan (4.0 s/disk) disks were used in some trials. The brightness of a yellow disk was fixed until it was extinguished, whereas a cyan disk dimmed gradually during the 4-s period before it was extinguished. The target that was not chosen by the animal and its clock were extinguished immediately after the animal fixated its chosen target. If the animal chose the large reward, the central white square for the next trial was presented following a 2-s inter-trial interval after the reward delivery. If the animal chose the small reward, the inter-trial interval was increased by the difference in the reward delays for the small and large reward targets. Therefore, the onset of the next trial was not affected by the animal’s choice.

[View Larger Version of this Image]

Figure 1. Spatiotemporal sequences of the inter-temporal choice task. Three different types of clocks are referred to as ordered, random, and mixed. For both ordered and random clocks, the reward delay was indicated by the number of yellow disks that disappeared in a fixed or random order, respectively. Each yellow and cyan disk in mixed clocks corresponds to 1 and 4 s added to the reward delay, respectively.

The animal was required to maintain its fixation of the chosen target during the reward delay, but was allowed to re-fixate the target without any penalty if the target was re-fixated within 0.3 s after breaking the fixation. This also allowed the animals to blink without any penalty during the fixation on its chosen target. Throughout the experiment, the proportion of the trials that were aborted due to the animal’s failure to maintain its fixation during the reward delay was relatively low and never exceeded 2% of the trials. This always corresponded to a relatively small proportion fixation breaks during the entire trials, never exceeding 17% of all fixation breaks (mean = 1.6% and 6.9% for monkeys D and J, respectively). Moreover, extensive training was not necessary for fixation during the reward delays, and the animals frequently made saccades among the small disks. Although we could not quantify the additional efforts necessary for the fixation of the chosen target during the reward delays, these observations indicate that such efforts are likely to be relatively minor.

Reward delays and clocks

All the disks in the clock for a given target were presented on the circumference of an imaginary circle (4.0° in diameter) concentric with the target. In the following, the position of a disk in a given clock is described by its clockwise angular deviation from the position directly above the target. Disks were presented only at multiples of 45° (Figure 1 ). In the present study, three different types of clocks were used, and referred to as ordered, random, and mixed, respectively. For ordered and random clocks, only yellow disks were used, whereas mixed clocks included both yellow and cyan disks. In an ordered clock with n yellow disks, disks were presented at the positions corresponding to 0°, 45°,…, (n − 1) × 45°, and were extinguished counter-clockwise during the reward delay so that the disk at 0° position was always extinguished at the end of the reward delay (Figure 1 , top). In random and mixed clocks, the positions of disks were determined randomly, and they were extinguished in a random order during the delay period (Figure 1 middle and bottom).

Preliminary training

Each animal was initially trained to fixate the central white square. Next, it was trained to choose between the green small-reward target and the red large-reward target, while the delay for the small reward was always 0.5 s. Within a few days, both animals were gradually exposed to various reward delays and started to choose the large-reward target less frequently as its reward delay increased. No rewards were omitted during this training period, as long as the animal performed the task correctly. Before the data collection began for Experiment I, monkeys D and J were trained for this inter-temporal choice task for 9 and 12 days, respectively.

Experiment I

During the trials of Experiment I, only the ordered clocks were used and all disks in the clocks were yellow. The reward delay for the clock with n yellow disks was (n + 1)/2 s, where n = 0, 1,…8, corresponding to the delays ranging from 0.5 to 4.5 s. Among the 64 possible combinations of reward delays for the two targets, only those in which the reward delay for the large-reward target was equal to or longer than the delay for the small-reward target were used. This resulted in 45 different combinations of the reward delays. The positions of the large-reward and small-reward targets were counter-balanced across trials, resulting in 90 trials in a block. In Experiment I-A, the animal received 0.2 and 0.4 ml of apple juice for small and large rewards, respectively. The size of the small reward was increased to 0.27 ml in Experiment I-B, in order to encourage the animals to choose the small-reward target more frequently. Each animal performed 10 blocks (900 trials) each day (Table 1 ). Monkey D was tested in Experiment I-A for 5 days and then in Experiment I-B for 5 days, whereas the order of these two experiments was reversed for Monkey J.

Experiment II

In Experiment II, the clock with n yellow disks indicated that the reward delay was n seconds (n = 0, 1,…, 8). Thus, reward delays ranged between 0 and 8 s. During Experiment II, the small and large rewards were 0.27 and 0.4 ml of juice. As in Experiment I, all possible combinations of reward delays were used as long as the delay for the large reward was equal to or larger than the delay for the small reward. Each animal performed 10 blocks (900 trials) daily. Only the random clocks were used in Experiment II-A, whereas for Experiment II-B, only the ordered clocks were used (Table 1 ). After Experiment I, both animals were tested in neurophysiological experiments in which a subset of conditions included in Experiment II-A was used (Kim et al., 2008 ). Accordingly, Experiment II was conducted approximately 6 and 8 months after Experiment I for monkeys D and J, respectively. Both animals were tested for 5 days in Experiment II-A, and then for 5 days in Experiment II-B.

Experiment III

In Experiment III-A, mixed clocks were introduced to test whether the animals could extract the information about the reward delays independently of the number of disks in the clock. During Experiment III, a clock that includes n_Y yellow disks and n_C cyan disks indicated the reward delay of (n_Y + 4 n_C) s. Therefore, clocks did not include any cyan disks (n_C = 0) if the reward delay was less than 4 s. In addition, when the reward delay was 4, 5, 6, or 7 s, a given delay was indicated by one of two different types of clocks (n_C = 0 or 1). For example, the delay of 4 s could be indicated by (n_Y, n_C) = (4, 0) or (0,1), and the delay of 5 s by (5, 0) or (1, 1). Finally, three different types of clocks were used to indicate the 8-s reward delay, namely, (n_Y, n_C) = (8, 0), (4, 1), or (0, 2). Accordingly, 15 different types of clocks were available to indicate the reward delay ranging from 0 to 8 s. To limit the number of different combinations of clocks, the reward delays for the small-reward target were restricted to 0, 2, 4, and 6 s. Excluding the cases in which the delay for the small reward is longer than the delay for the large reward, therefore, a total of 64 different combinations of clocks were used in Experiment III-A. The positions of the large-reward and small-reward targets were counter-balanced, and this resulted in 128 trials in a given block. Both monkeys were tested for 5 days in Experiment III-A and completed six blocks (768 trials) each day. In Experiment III-A, the animal was rewarded by 0.27 and 0.4 ml of juice for choosing the small-reward and large-reward target, respectively.

Prior to Experiment III-A, both animals were trained with mixed clocks for several weeks. This preliminary training began approximately 5 and 3 months after Experiment II for monkeys D and J, respectively. During this preliminary training, each animal was trained for 17 days (monkey D) or 13 days (monkey J) with a subset of reward delays used in Experiment III-A in which the delay for the small reward was either 0 or 2 s. Each animal was then trained for another day (day 18 and day 14 for monkeys D and J, respectively) with all the conditions described above for Experiment III-A before collecting the data described in the Results. After Experiment III-A, one of the monkeys (monkey J) was tested using the mixed clocks in a neurophysiological experiment (Kim et al., 2008 ). During this period, only a subset of reward delays in Experiment III-A was used (0 and 2 s for small reward and 0, 2, 5, and 8 s for large reward). Both animals were then tested in Experiment III-B in order to investigate whether exposure to mixed clocks influenced the animal’s discount function. Experiment III-B was identical to Experiment II-A, except that the magnitude of small reward was reduced to 0.2 ml for monkey J.

Data Analysis

In the following, the symbol Ω is used to denote a set of variables corresponding to the magnitudes and delays of small and large rewards. Namely, Ω = {A_TS, A_TL, D_TS, D_TL}, in which A_TS (A_TL) and D_TS (D_TL) refer to the magnitude and delay of small (large) reward, respectively. To estimate the animal’s discount function from its choices, we assumed that the probability of choosing TS given Ω, P(TS|Ω), was determined by the difference in the temporally discounted values for the two targets. In other words, denoting the temporally discounted value of a given target x as DV(A_x, D_x),

This is also known as softmax transformation, and is equivalent to the Boltzmann distribution given by the following:

where β denotes the inverse temperature controlling the randomness of the animal’s choices. In addition, p(TL|Ω) = 1 − p(TS|Ω). Therefore, p(TS|Ω) = p(TL|Ω) = 0.5, if the temporally discounted values are equal for both targets, and p(TS) approaches 1, as the temporally discounted value of TS increases. The temporally discounted value of the reward with the magnitude A and delay D is determined by the following:

where F(D) refers to a discount function. An exponential discount function corresponds to the following:

where k_E denotes the discount rate (s⁻¹). A hyperbolic discount function can be given by the following:

where the parameter k_H controls the steepness of discounting. We have also tested three additional discount functions. One of them is a variant of hyperbolic discount function in which the more immediate reward is not discounted and the more delayed reward is discounted according to the hyperbolic discount function based on the difference in the delays of the two rewards (Green et al., 2005 ). In addition, the general hyperbolic discount function (Green and Myerson, 2004 ; Takahashi et al., 2008 ), F_G, and the β-δ discount function (Phelps and Pollak, 1968 ), F_β-δ, are given by the following:

It should be noted that the general hyperbolic discount function shown above is mathematically equivalent to the so-called q-exponential discount function (Cajueiro, 2006 ; Takahashi et al., 2008 ), which is given by the following:

The parameters of the general hyperbolic discount function and q-exponential discount function are related by the following;

Denoting the animal’s choice in trial t as c_t (=TS or TL), the likelihood of the animal’s choices was given by,

where Ω_t denotes the magnitudes and delays for the rewards in trial t, and N the number of trials. For each discount function, model parameters were chosen to maximize the log likelihood (Pawitan, 2001 ), using a function minimization procedure in Matlab (Mathworks, Natick, MA, USA). Since the models with exponential and hyperbolic discount functions both include two parameters (β and k), these two models were compared using their log likelihood. This was carried out for the entire data from a given experiment as well as separately for each daily session. The general hyperbolic and β-δ discount functions included an additional parameter. Therefore, the Bayesian information criterion (BIC) was used to compare the performance of models with different numbers of parameters. BIC was computed as follows:

where N is the number of trials and m the number of model parameters (e.g., 2 for the model with exponential or hyperbolic discount function). For the results obtained from monkey D in Experiments I-B and III-A, the process of parameter search failed to converge for the general hyperbolic discount function. In these two cases, the values of the parameters in the general hyperbolic discount functions were computed by estimating the parameters of the q-exponential discount function instead and converting them as described above. Since the general hyperbolic discount function and q-exponential discount function are mathematically equivalent, the log likelihood for the best parameters of these two models should be the same.

During Experiment III-A, the physical reward delay was given by (n_Y + 4 n_C) s, in which n_Y and n_C indicate the numbers of yellow and cyan disks, respectively. Temporally discounted values of rewards associated with mixed clocks were computed without assuming that the animal accurately estimated the value of n_C. This was done by using the subjective delays for cyan disks, which were estimated as a free parameter in the maximum likelihood procedure described above. In other words, the subjective reward delays used to compute temporally discounted values were given by (n_Y + D_C n_C) s, in which D_C refers to the subjective delay for one cyan disk.