Original Research ARTICLE
The role of mediodorsal thalamus in temporal differentiation of reward-guided actions
- Department of Psychology and Neuroscience, Center for Cognitive Neuroscience, Duke University, Durham, NC, USA
The mediodorsal thalamus (MD) is a crucial component of the neural network involved in the learning and generation of goal-directed actions. A series of experiments reported here examined the contributions of MD to the temporal differentiation of reward-guided actions. In Experiment 1, we trained rats on a discrete-trial, fixed-criterion temporal differentiation task, in which only lever presses exceeding a threshold duration value were rewarded. Pre-training MD lesions impaired temporal differentiation of action duration, by increasing the dispersion of the duration distribution. Post-training MD lesions also impaired differentiation, but by reducing the average emitted press durations, thus shifting the distribution without increasing the dispersion. In Experiment 2, we trained rats to space their lever pressing above criterion inter-press-intervals in order to earn rewards. Both pre-training and post-training MD lesions impaired the differentiation of inter-press-intervals. These results show that MD plays an important role in the acquisition and expression of action differentiation.
The mediodorsal nucleus of the thalamus (MD), which participates in both the associative and limbic cortico-basal ganglia networks, has been implicated in learning and memory (Markowitsch, 1982; Aggleton and Mishkin, 1983a; Hunt and Aggleton, 1991, 1998a; Oyoshi et al., 1996; Parker et al., 1997; Gaffan and Parker, 2000; Mitchell and Dalrymple-Alford, 2005; Mitchell et al., 2007a,b). MD receives inputs from the medial substantia nigra pars reticulata and ventral globus pallidus (Haber et al., 1985; Ilinsky et al., 1985) and sends direct projections to the striatum (Cheatwood et al., 2003, 2005); it is also strongly and reciprocally connected with the prefrontal cortex, including anterior cingulate cortex and orbital frontal cortex (Goldman-Rakic and Porrino, 1985; Giguere and Goldman-Rakic, 1988; Ray and Price, 1992, 1993). In agreement with the anatomical connectivity, MD lesions have been reported to disrupt the acquisition of stimulus-reward associations (Gaffan and Murray, 1990; Gaffan et al., 1993) and action-outcome associations (Corbit et al., 2003; Mitchell et al., 2007b; Ostlund and Balleine, 2008), which involve the limbic and associative cortico-basal ganglia networks, respectively (Yin et al., 2008). MD lesions also cause deficits in recognition memory (Aggleton and Mishkin, 1983b; Zola-Morgan and Squire, 1985; Parker and Gaffan, 1997; Parker et al., 1997; Hunt and Aggleton, 1998a), Pavlovian fear conditioning, stress responses (Herry et al., 1999; Chauveau et al., 2005), and limbic motor seizures (Cassidy and Gale, 1998; Popken et al., 2000; Byne et al., 2001; Volk and Lewis, 2003).
As reported by Corbit et al. (2003) , MD plays an important role in the acquisition of goal-directed behavior, which is sensitive to outcome devaluation and instrumental contingency degradation. This study focuses on the role of MD in the timing of actions, using the action differentiation paradigm. Action differentiation is the process by which different actions are generated despite identical stimulus conditions (Platt et al., 1973 ; Kuch, 1974 ; Kuch and Platt, 1976 ; Yin, 2009 ). Here we examined the temporal differentiation of action duration (the period between the pressing and the release of a lever) and inter-response-times (IRT, time between two adjacent lever presses) to further investigate the role of the MD in instrumental learning.
Materials and Methods
We used 65 male Long–Evans rats (∼7 weeks at the beginning of the experiments). All procedures followed Duke University Animal Care and Use Committee guidelines. Surgery was performed under general anesthesia with isoflurane (2%). Anesthetized animals were mounted in a stereotaxic device (Kopf, CA, USA). The scalp was cut to expose the skull surface and small burr holes were drilled bilaterally (AP −3.0; ML ±0.7–0.8; DV −5.5). Lesions were created by infusing 0.4 µl NMDA (0.1 M in PBS) per side over 2 min; to allow the drug to diffuse, the needle was left in the brain for another 5 min after the end of the injection. Sham lesions were created using the same procedures, except that 0.9% saline was injected instead of NMDA.
Training took place in six Med Associates (St. Albans, VT, USA) operant chambers. Rats were food deprived by feeding them 10–12 g of home chow each day after training and testing, to maintain their body weight at about 85% of their normal weight. Water was always available in the home cages. Each chamber was equipped with a food magazine that received pellets from a pellet dispenser (Bio-Serv 45 mg dustless precision pellets, Bio-Serv, NJ, USA). An infrared beam crossed the magazine opening to record head entries into the magazine. Each chamber contained two retractable levers on either side of the magazine and a 3 W 24 V house light mounted on the wall opposite the levers and magazine. A computer with the Med-PC-IV program was used to control the equipment and record behavior. The duration of each lever press was measured at a resolution of 10 ms using custom-written programs.
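The two behavioral measures used throughout (press duration and IRT) can both be derived from press and release timestamps. The sketch below is illustrative only and is not the custom Med-PC code used in the study: the function names and the event format are hypothetical, timestamps are assumed to be in milliseconds, and the IRT is assumed to be measured from one press onset to the next, which the text does not specify.

```python
from typing import List, Tuple

RESOLUTION_MS = 10  # timing resolution reported in the text


def quantize(t_ms: float, step: int = RESOLUTION_MS) -> int:
    """Round a timestamp down to the nearest multiple of the clock step."""
    return int(t_ms // step) * step


def press_durations(events: List[Tuple[float, float]]) -> List[int]:
    """Duration of each press: release time minus press time, in ms."""
    return [quantize(up) - quantize(down) for down, up in events]


def inter_response_times(events: List[Tuple[float, float]]) -> List[int]:
    """IRT between adjacent presses, taken onset to onset (an assumption)."""
    onsets = [quantize(down) for down, _ in events]
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


# Hypothetical (press, release) timestamps in ms for three presses.
events = [(0, 412), (9000, 9850), (21003, 22705)]
print(press_durations(events))       # [410, 850, 1700]
print(inter_response_times(events))  # [9000, 12000]
```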
Experiment 1: Press Duration Differentiation
Pre-training MD lesion
Lever press training and devaluation test. Twenty-three rats were used in this experiment (sham, n = 11; lesion, n = 12). Before the temporal differentiation training, we replicated the previously reported deficits on the outcome devaluation test following MD lesions. Both groups of rats were trained with the same two-action, two-outcome training design from a previous study (Corbit et al., 2003). Briefly, rats were given two 30-min sessions of magazine training before lever press training, in which the food pellets and 20% sucrose were delivered on a random time schedule (on average every 60 s), with no levers extended. On the next day, half of the rats in each group received pellets by pressing the left lever and sucrose by pressing the right lever. The rest were trained with the opposite pairing. Initial lever press training began with 2 days of continuous reinforcement (CRF, each press earns one reward), and then shifted to a random ratio (RR) schedule, which consisted of 2 days of RR5 (0.2 probability of reward for each press), 2 days of RR10 (0.1 reward probability) and 2 days of RR20 (0.05 reward probability). The rats were trained with two sessions each day, one with each pairing. The sessions were separated by at least 1 h. Each session started with illumination of the house light and insertion of the lever, and ended with turning off the house light and retraction of the lever after 90 min or 30 rewards.
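The relation between the ratio value and the per-press reward probability (RR5 = 0.2, RR10 = 0.1, RR20 = 0.05, with CRF as the special case of probability 1) can be illustrated with a short simulation. This is a minimal sketch, not the schedule code used in the study; the function name, the number of simulated presses and the random seed are arbitrary choices, and the 30-reward and 90-min session limits are not modeled.

```python
import random


def random_ratio_session(n_presses: int, ratio: int, rng: random.Random) -> int:
    """Count rewards earned when each press pays off with probability 1/ratio."""
    p_reward = 1.0 / ratio
    return sum(1 for _ in range(n_presses) if rng.random() < p_reward)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
n_presses = 2000        # arbitrary number of simulated presses
for ratio in (1, 5, 10, 20):  # ratio 1 corresponds to CRF
    earned = random_ratio_session(n_presses, ratio, rng)
    print(f"RR{ratio}: p = {1 / ratio:.2f}, "
          f"observed rewards per press = {earned / n_presses:.3f}")
```

With enough presses, the observed proportion of rewarded presses approaches the nominal probability for each schedule.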
After the last day of RR20 training, the rats were given two consecutive days (one session per day) of outcome devaluation test. Before each test, the rats were pre-fed on either pellet or sucrose for at least 1 h. After pre-feeding, a 5-min choice extinction test was conducted. During the test, both levers were inserted, but no reward was delivered. The number of presses on each lever was recorded.
Fixed-criterion, discrete-trial, press duration differentiation. After the devaluation test, the rats were retrained with 1 day of CRF with pellets, during which half the rats pressed the left lever and the other half pressed the right lever. Sucrose was not used in this experiment. Following CRF, the rats were successively shifted to three different temporal response differentiation schedules, in which the rats were trained to produce lever presses with a minimum duration at 400, 800 and 1600 ms (Yin, 2009 ). A discrete-trial program was used. Each trial began with the insertion of a lever, and ended with its retraction as soon as the lever was pressed and released. The trial was repeated, with an inter-trial-interval of 8 s. If the press lasted longer than the criterion duration, following the release of the lever a food pellet was delivered immediately into the food magazine. If not, no pellet was given. The session was terminated after 90 min or 50 earned pellets. The rats were trained for six sessions on each criterion.
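The reinforcement rule of this discrete-trial procedure can be sketched as follows. The code is an illustration under stated assumptions rather than the program used in the study: the function name and the example durations are hypothetical, a press is rewarded only if it is strictly longer than the criterion, and the 90-min time limit and the 8-s inter-trial-interval are not modeled.

```python
from typing import Iterable, Tuple


def run_duration_session(durations_ms: Iterable[int], criterion_ms: int,
                         max_pellets: int = 50) -> Tuple[int, int]:
    """Return (trials, pellets) for one session of duration differentiation.

    Each press is one trial. A pellet is earned when the press lasts longer
    than the criterion; the session stops once max_pellets are earned.
    """
    trials = pellets = 0
    for duration in durations_ms:
        trials += 1
        if duration > criterion_ms:
            pellets += 1
            if pellets == max_pellets:
                break
    return trials, pellets


# Hypothetical press durations (ms), scored against the three criteria.
presses = [350, 420, 1700, 790, 810, 1590, 1610]
for criterion in (400, 800, 1600):
    trials, pellets = run_duration_session(presses, criterion)
    print(f">{criterion} ms: {pellets} of {trials} presses rewarded")
```

The same set of presses earns fewer pellets as the criterion increases (6, 4 and 2 of 7 in this example), which is why the proportion of rewarded presses is a useful index of how well the duration distribution tracks the criterion.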
Post-training MD lesion
Twenty-six rats were used in this experiment. Before surgery, naïve rats were trained with temporal differentiation schedules: >400, >800 and >1600 ms for six sessions on each criterion after four sessions of CRF. Following the last session of 1600-ms duration training, the rats were given free access to food in their home cage for 1 day. At the time of the surgery, rats were divided into two groups based on their baseline instrumental performance during their duration differentiation training. Half the rats from each cage (the rats were paired in home cage) received MD lesion with 0.4 µl NMDA, and the remaining received 0.9% saline. The surgery procedure was the same as described above. After surgery, the rats were allowed to recover for 5 days, and then returned to the food deprivation schedule for 2 days before the test. During the test, the rats were placed under >1600 ms schedule for four sessions.
Experiment 2: IRT Differentiation
Pre-training MD lesion
After duration differentiation training, 16 of the rats from the post-training duration differentiation experiment (sham, n = 8; lesion, n = 8) were retrained with CRF for four sessions, and then used for IRT differentiation schedule at >10 and >20 s successively. Rats were required to press the lever with a minimum delay above a criterion value (10 or 20 s) after the last press. If they pressed earlier than the required delay, the reward was canceled. The rats were trained for five sessions at 10 s, and then shifted to 20 s for five sessions. Each session terminated after 90 min or 30 earned pellets.
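The IRT contingency described above can be sketched in the same way. This is a hedged illustration, not the schedule code used in the study: the function name and the press times are hypothetical, every press (rewarded or not) is assumed to restart the interval, and the interval for the first press is assumed to be timed from session onset, which the text does not specify. The 90-min limit is not modeled.

```python
from typing import List, Sequence


def score_irt_session(press_times_s: Sequence[float], criterion_s: float,
                      max_pellets: int = 30) -> List[bool]:
    """Return, for each press, whether it was rewarded under an IRT criterion."""
    rewarded: List[bool] = []
    last_press = 0.0  # assumption: the first interval is timed from session onset
    pellets = 0
    for t in press_times_s:
        earned = (t - last_press) > criterion_s
        rewarded.append(earned)
        last_press = t  # any press restarts the interval
        if earned:
            pellets += 1
            if pellets == max_pellets:
                break
    return rewarded


# Hypothetical press times in seconds from session onset.
press_times = [12.0, 14.5, 26.0, 30.0, 51.0]
print(score_irt_session(press_times, 10))  # [True, False, True, False, True]
print(score_irt_session(press_times, 20))  # [False, False, False, False, True]
```

Because a premature press both forfeits the reward and restarts the interval, short IRTs are doubly costly under the longer criterion.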
Post-training MD lesion
Sixteen naïve rats were used in post-training MD lesion experiment. Lever press training began with four sessions of CRF. Half of the rats in each group earned pellets by pressing the left lever. The remaining rats were trained with the right lever. They were all shifted to two sessions of RR5, two sessions of RR10 and two sessions of RR20. They then were trained on IRT differentiation schedule at >10 and >20 s for six sessions on each criterion. The session ended with 30 earned pellets or after 90 min.
Before surgery, the rats had free access to food for 1 day. They were then divided into two groups based on their baseline instrumental performance during the pre-surgery training. Half of the rats from each cage received MD lesions, and the remaining rats served as controls. The surgical procedure was the same as described above. As before, the rats were given a recovery period of 5 days and then food deprived before testing. The >20-s schedule was used during the test.
Data analysis was performed using Microsoft Excel, GraphPad Prism and MATLAB.
Histological analysis showed that NMDA infusions caused substantial neuronal damage in the MD, with limited damage to surrounding thalamic nuclei in some subjects (Figure 1 ). Four rats with inaccurate lesions (lesion of deeper and more caudal thalamic nuclei, or unilateral damage of MD) were excluded.
Figure 1. Histological analysis of MD lesions. (A) Photomicrographs of representative sham (left) and MD lesions (right) at low (4×, above) and high (10×, below) magnification. (B) Illustration of the largest (gray shading) and smallest (black outline) extent of lesions. The diagrams are based on a rat brain atlas (Paxinos and Watson, 1998 ). The numbers indicate distance in mm from bregma.
Effect of Pre-Training Lesion on Duration Differentiation
Initial acquisition and devaluation test
To confirm that we were targeting the same area as previous work, we replicated the experiment using two-action, two-outcome training, and performed a 5-min devaluation test before the duration differentiation training. Our data indicated that both groups of rats learned to press the lever for both pellets and sucrose after 8 days of training. The mean press rates increased across the days. A two-way ANOVA with group and days as factors showed no main effects of group (pellet, F1,105 = 2.35, p > 0.05; sucrose, F1,105 = 2.44, p > 0.05), main effects of days (pellet, F7,105 = 90.83, p < 0.0001; sucrose, F7,105 = 67.98, p < 0.0001), and no interactions between them (pellet, F7,105 = 0.24, p > 0.05; sucrose, F7,105 = 1.79, p > 0.05). During the devaluation test, MD lesioned rats displayed reduced sensitivity to outcome devaluation: their mean response rate did not differ between devalued and non-devalued levers (planned comparison, p > 0.05). By contrast, the sham control group exhibited a normal devaluation effect, pressing less frequently on the devalued lever (planned comparison, p < 0.05). We were therefore able to replicate the results from previous work (Corbit et al., 2003) showing that MD lesions impaired acquisition of the action-outcome contingency.
To examine the effect of MD lesion on temporal differentiation, the rats were retrained with 1 day of CRF with pellets as reinforcers, and then began the duration differentiation task. Figure 2 showed that all rats (n = 11 for sham and n = 11 for MD) learned to perform the task and produced duration distributions peaking around criterion durations within six sessions (Figure 2 A).
Figure 2. Temporal differentiation of lever pressing. (A) The distribution of press durations of sham (n = 11) and MD lesioned rats (n = 11) during the last training session for each criterion duration. Dashed lines indicate criterion durations. (B) Median press durations. (C) Interquartile range (IQR) of duration distribution for each criterion duration across six training sessions. Error bars represent SEM.
As the press durations exhibited non-Gaussian distributions, to quantify the performance, the median press duration value of each rat was used as a measure of the timing of the action and interquartile range (IQR) of the duration distribution was used as a measure of dispersion. For all three criterion durations, the median duration of both groups increased within six sessions (Two-way ANOVA, main effects of time: 400 ms, F5,100 = 2.48, p < 0.05; 800 ms, F5,100 = 8.74, p < 0.0001; 1600 ms, F5,100 = 24.74, p < 0.0001), and reached a steady state across last three sessions (no effects of time: Fs < 1.2, ps > 0.05, Figure 2 B). There was no significant difference between sham and MD lesion rats for all three criterion durations (no effects of group: 400 ms, F1,100 = 1.14, p > 0.05; 800 ms, F1,100 = 0.74, p > 0.05; 1600 ms, F1,100 = 0.38, p > 0.05); no interaction between time and group (Fs < 1).
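The two summary measures can be computed per rat and per session as sketched below. This is a minimal illustration, not the analysis code used in the study: the function name is hypothetical, the durations are made-up numbers rather than data, and the "inclusive" quartile convention is an assumption, since several conventions exist and the text does not say which was used.

```python
import statistics
from typing import Sequence, Tuple


def median_and_iqr(durations_ms: Sequence[float]) -> Tuple[float, float]:
    """Median and interquartile range (Q3 - Q1) of a set of press durations."""
    q1, _, q3 = statistics.quantiles(durations_ms, n=4, method="inclusive")
    return statistics.median(durations_ms), q3 - q1


# Made-up durations (ms): same median, different dispersion.
tight = [300, 400, 500, 600, 700, 800, 900, 1000, 1100]
wide = [100, 250, 400, 550, 700, 850, 1000, 1150, 1300]

print(median_and_iqr(tight))  # (700, 400.0)
print(median_and_iqr(wide))   # (700, 600.0)
```

The example shows why both measures are needed: two distributions can share a median while differing in IQR, which is the pattern a pure increase in dispersion would produce.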
However, as shown in Figure 2C, pre-training MD lesions produced higher variability in press durations. This was confirmed by a two-way ANOVA. The IQRs of lesioned rats were higher than those of sham rats at 400 and 1600 ms (main effects of lesion, for 400 ms, F1,100 = 5.20, p < 0.05; for 1600 ms, F1,100 = 9.32, p < 0.01; no effects of time, for 400 ms, F5,100 = 0.74, p > 0.05; for 1600 ms, F5,100 = 0.82, p > 0.05; no interactions, for 400 ms, F5,100 = 1.12, p > 0.05; for 1600 ms, F5,100 = 1.14, p > 0.05). For 800 ms, the IQR was not significantly different (no effect of lesion, F1,100 = 2.74, p > 0.05; effect of time, F5,100 = 2.34, p < 0.05; no interaction, F5,100 = 0.87, p > 0.05), although the IQRs were numerically higher for MD lesions.
We also compared the proportion of presses rewarded and the rate of lever presses (rLP) for each criterion duration (Figure 3A). For both groups, the proportion of rewarded presses and rLP increased over six sessions (Two-way ANOVA, main effects of time, 400 ms, Fs5,100 > 6.70, ps < 0.0001; 800 ms, Fs5,100 > 19.3, ps < 0.0001; 1600 ms, Fs5,100 > 23.6, ps < 0.0001; no interactions between time and group, Fs < 1.62, ps > 0.05) at each criterion. Similar to the median duration measure, the transition to steady state occurred around the 4th session for both sham and lesion groups (no effects of time for the last three sessions: Fs2,40 < 1.38, ps > 0.05 for proportion of rewarded presses; Fs2,40 < 1.18, ps > 0.05 for rLP; except at 1600 ms, F2,40 = 3.68, p < 0.05 for proportion of rewarded presses; no interactions at any criterion, Fs < 1.21, ps > 0.05). As shown in Figure 3A, lesioned rats produced a lower proportion of correct (rewarded) presses and showed a reduced rate of pressing at 1600 ms across six sessions (main effect of group, F1,100 = 5.06, p < 0.05 for proportion of rewarded presses; F1,100 = 5.58, p < 0.05 for rLP). This effect was smaller for shorter duration criteria. At 400 ms, although the proportion of presses rewarded and the press rate of the lesion group were not significantly lower than those of the control group across six sessions (no effect of group, Fs1,100 < 3.70, ps > 0.05), the group difference did reach statistical significance over the last three sessions (main effect of group, F1,40 = 4.72, p < 0.05 for proportion of presses rewarded; F1,40 = 7.01, p < 0.05 for rLP). There was no significant lesion effect for the 800-ms criterion duration, however (no effect of group, Fs1,100 < 0.87, ps > 0.05). Thus pre-training MD lesions impaired the accuracy of action timing during the initial differentiation learning and when the performance became more difficult.
Figure 3. Acquisition of duration differentiation. (A) Proportion of presses rewarded and the rate of lever pressing at each criterion duration. (B) Coefficient of variations of Sham (left) and MD lesion (right) at each criterion duration. Error bars indicate SEM.
Interval timing often exhibits the scalar property, i.e. noise is proportional to the average. A recent study in mice indicated that motor cortical lesions have no impact on the scalar property of press duration (Yin, 2009), which was previously suggested to be a basic property of the psychophysical judgment of temporal duration. To assess the effect of MD lesions on the scalar property of timing, the coefficient of variation (CV, standard deviation/mean) across six sessions was analyzed (Figure 3B). A two-way ANOVA with group and duration as factors showed a main effect of group (F1,40 = 7.50, p < 0.05), a main effect of duration (F2,40 = 13.91, p < 0.05), and no interaction between these two factors (F2,40 = 1.06, p > 0.05). In other words, MD lesions resulted in a general increase in the CV of press durations, and the CV changed depending on the duration criterion. Post hoc tests revealed a difference between 400 and 800 ms sessions (p < 0.05), but no difference between 800 and 1600 ms (p > 0.05). This observation suggests that, in rats, the distribution of lever press durations may exhibit the scalar property only for relatively long durations.
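The logic of using the CV to test the scalar property can be made concrete with a short example. Under the scalar property, the standard deviation grows in proportion to the mean, so the CV stays constant when the whole distribution is stretched to a longer target duration. The sketch below is illustrative only: the function name and the durations are hypothetical, and the use of the sample standard deviation (rather than the population form) is an assumption.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(durations_ms: Sequence[float]) -> float:
    """CV = standard deviation / mean (sample SD is assumed here)."""
    return statistics.stdev(durations_ms) / statistics.mean(durations_ms)


# Made-up durations (ms). 'scaled' is 'base' stretched by a factor of 2,
# which is what the scalar property predicts for a doubled target duration.
base = [600, 800, 1000]
scaled = [2 * d for d in base]

print(coefficient_of_variation(base))    # 0.25
print(coefficient_of_variation(scaled))  # 0.25
```

A CV that differs between two criteria, as reported here for 400 versus 800 ms, therefore indicates that variability did not scale proportionally across that range.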
Effect of Post-Training Lesion on Duration Differentiation
To investigate how post-training MD lesions affect the expression of temporal differentiation, after training with three criterion durations, the rats were separated into two groups (n = 13 for sham and n = 12 for MD lesion) based on their press duration distributions (Figure 4A). Planned comparisons revealed no significant difference in median (p > 0.05), IQR (p > 0.05), rate of lever pressing (p > 0.05), and proportion of rewarded presses (p > 0.05) during the last session. After surgery, rats were re-tested with the 1600-ms duration task. The distribution of the lesioned group immediately shifted to the left (Figure 4B) during the first session of post-lesion tests. By contrast, there was no change in the sham group. Yet the dispersion of the distribution did not change in either group. A two-way ANOVA with group and session (pre-lesion vs. post-lesion) as factors showed an interaction between these two factors for median duration (F1,23 = 11.2, p < 0.05) and proportion of rewarded presses (F1,23 = 31.8, p < 0.05). Furthermore, post hoc analysis showed that the median duration of the lesioned group was significantly reduced (p < 0.01, Figure 4C). Moreover, the proportion of rewarded presses decreased immediately after surgery (p < 0.0001). No differences were found for the sham group (ps > 0.05). In comparison, there was no interaction between session and group, no effect of session and no effect of group for IQR (Fs < 1.86, ps > 0.05) and rate of pressing (Fs < 2.21, ps > 0.05). Interestingly, the deficit in the lesioned group disappeared after three additional training sessions (4th session after surgery, ps > 0.05 for all measures, data not shown). In short, post-training MD lesions caused an immediate deficit in the capacity to time the required action duration, but this deficit could be reduced by additional learning.
Figure 4. Post-training lesion impairs temporal precision of lever pressing. (A) The distribution of press durations for sham (n = 13) and to-be-lesioned (n = 12) rats during the last session of 1600-ms training before surgery. (B) The distribution of durations for MD lesioned rats and sham during the last session of 1600 ms before surgery (pre) and first test session of 1600 ms after surgery (post). (C) Comparisons between pre- and post-surgery sessions for each group. Error bar represents SEM. *indicates p < 0.05.
Effect of Pre-Training Lesion on IRT Differentiation
The IRT distribution is shown in Figure 5A. At the beginning of training, both groups showed peak values that were much lower than the criterion value (data not shown). After five sessions of 10 s IRT training, the sham group learned the task and showed a bimodal distribution with the second mode above the criterion IRT. The MD lesion group still produced IRTs with a single mode below criterion (Figure 5A). After shifting to the new criterion of 20 s, the difference between the two groups became significant. In the first session after shifting, the sham group immediately shifted their second peak above 20 s, whereas the lesioned group showed lower IRTs (data not shown). During the last session, the lesion group slightly increased the proportion of long IRTs (Figure 5A).
Figure 5. Temporal differentiation of inter-response-time (IRT). (A) The distribution of IRTs during the last session for each criterion IRT. Dashed line indicates criterion IRT. (B) Response dynamics of median IRT, interquartile range of IRT, and proportion of presses rewarded across five sessions at each criterion IRTs. Error bar represents SEM.
Acquisition was quantified using three measures: Median, IQR, and proportion of rewarded presses. Here better performance is indicated by higher IRTs. A two-way ANOVA analysis of the >10 s training data (Figure 5 B, left), with time and lesion as factors, revealed a main effect of time across sessions (median, F4,54 = 13.53, p < 0.0001; IQR, F4,54 = 11.71, p < 0.0001; proportion of rewarded press, F4,54 = 46.71, p < 0.0001), but no main effect of lesion (median, F1,54 = 0.76, p > 0.05; IQR, F1,54 = 2.89, p > 0.05; proportion of rewarded press, F1,54 = 2.41, p > 0.05), and no interaction between these factors (Fs < 2.12, ps > 0.05).
When the criterion IRT shifted to 20 s, lesioned rats immediately showed a deficit. This observation was confirmed by a two-way ANOVA performed on the data from first three sessions of 20 s IRT training (Figure 5 B, right). MD lesions reduced IQR and proportion of presses rewarded (main effects of lesion on IQR: F1,26 = 5.95, p < 0.05; on proportion of rewarded presses, F1,26 = 10.37, p < 0.05). There was also a significant effect of time (IQR, F2,26 = 4.79, p < 0.05; proportion of rewarded presses, F2,26 = 3.83, p < 0.05) and no interactions between these factors (IQR, F2,26 = 0.33, p > 0.05; proportion of rewarded presses, F2,26 = 0.76, p > 0.05). There was no difference in median IRT (no main effect of lesion, F1,26 = 1.10, p > 0.05). These findings indicated that MD lesion has an important effect on IRT differentiation, especially when the task became presumably more difficult (>20 s).
Furthermore, the lever press durations were examined. Figure 6 shows that sham and MD lesioned rats exhibited similar distributions of press duration during the IRT differentiation test. There was no significant group difference in median duration and IQR (Fs1,52 < 1.93, ps > 0.05; no effect of time, Fs4,52 < 1.66, ps > 0.05; no interaction between lesion and time, Fs4,52 < 2.04, ps > 0.05). Thus, when the time between presses was differentially reinforced, MD lesions did not affect the press durations themselves. These results suggest that press duration and IRTs are independently controlled.
Figure 6. The distribution of lever press durations during the first and last sessions for each criterion IRT: Duration distribution was not affected by MD lesions when press duration is not the differentially reinforced operant. Thus, rather than having a general effect on lever press duration, MD lesions only impaired the differentiation of press duration. Error bar represents SEM.
Effect of Post-Training Lesion on IRT Differentiation
Similarly, after training with both criterion IRTs, the rats were separated into two groups. No differences existed between two groups for median IRT (t-test, p > 0.05), IQR of IRT (p > 0.05), rate of pressing (p > 0.05) and proportion of rewarded presses (p > 0.05) at the last pre-surgery session of 20 s. After recovery from surgery, rats were retested with 20 s sessions (Figure 7 A). A two-way ANOVA analysis with session and group as factors showed a main effect of session (F1,13 = 6.02, p < 0.05), no effect of group (F1,13 = 0.69, p > 0.05), but an interaction between these factors (F1,13 = 8.49, p < 0.05) for IQR of IRT. Significantly, a post hoc analysis on the pre- and post-lesion training session revealed that the dispersion of lesion group was reduced by the lesion (Figure 7 B, lesion IQR, p < 0.05). There were no effects of session and group, and no interactions for median IRT (Fs < 2.44, ps > 0.05), proportion of presses rewarded (Fs < 3.46, ps > 0.05), and rate of pressing (Fs < 4.64, ps > 0.05). These results indicated that post-training MD lesion impaired expression of IRT differentiation.
Figure 7. Post-training lesion affects the performance of IRT differentiation. (A) The distribution of IRTs for sham (n = 8, left) and MD lesion (n = 7, right) during the last session before surgery (pre) and first test session of 20-s schedule after surgery (post). (B) Comparisons of responses between pre- and post-surgery sessions for each group. Error bar represents SEM. *indicates p < 0.05.
Previous studies have found a range of drug effects on duration and IRT differentiation of behavior (Schulze and Paule, 1990 , 1991 ; Buffalo et al., 1993 , 1994 ; Hudzik and McMillan, 1994a ,b , 1995 ; McMillan et al., 1994 ; McClure and McMillan, 1997 ; McClure et al., 1997 ). Yet the neural substrates important for these two temporal dimensions of action have not been examined in any systematic fashion (Yin, 2009 ).
In this study, we found that MD plays an important role in both acquisition and expression of temporal differentiation in instrumental learning. More specifically, (i) pre-training MD lesion impaired the differentiation of action durations, producing higher variability in press duration; (ii) post-training MD lesion reduced the action duration and accuracy, but did not affect variability; (iii) pre-training MD lesion impaired the acquisition of IRT selection at longer required IRT (20 s), resulting in lower IRTs and probability of rewarded presses; (iv) post-training MD lesion also impaired expression of IRT differentiation. Overall, there is a general impairment in the formation of an operant, i.e. any behavioral parameter that increases the frequency of reward delivery. The effects of MD lesions are specific – limited to the shaping of the appropriate operant, be it press duration or IRT. When press duration was the operant, it was affected by MD lesions; but when IRT was the operant, MD lesions impaired IRT differentiation without having any effect on duration distribution.
Temporal Differentiation of Action
Differentiation is to be distinguished from discrimination. In discrimination, behavior is generated in response to some discriminative stimulus, e.g. green light go, red light stop. In differentiation, the external stimuli do not provide any instruction about what the animal should do. Rather, the animal must use learned criteria to produce appropriate behavior. Most previous lesion studies of reward-guided behaviors use some variant of cue discrimination procedure, but the study of differentiation has largely been neglected.
The current study focuses on temporal differentiation, which is concerned with the questions of ‘when’ and ‘how long.’ In the absence of instructions the organism can select the appropriate behavioral parameter based on experienced consequences. Action duration and spacing are two basic temporal dimensions of behavior, known to be modifiable by learning (Skinner, 1938 ; Kuch, 1974 ; Kuch and Platt, 1976 ). In the former, the duration is the operant, whereas in the latter, the time between presses is the operant. Press duration differentiation restricts the animal’s behavioral repertoire (it is impossible to enter the magazine or groom while holding down the lever), whereas IRT differentiation does not restrict the range of behaviors to fit the required temporal interval. Because IRT can be affected by a variety of uncontrolled variables, it is thought to be a noisy index of temporal differentiation (Platt et al., 1973 ; Kuch, 1974 ; Kuch and Platt, 1976 ). Our results support this assumption. In duration differentiation, the improvement in performance and the impairments produced by MD lesion were relatively consistent. However, in IRT differentiation, the IRTs between two presses showed a bimodal distribution. Obviously, the second peak near the criterion IRTs was due to reinforcement, while the first peak of short IRTs (∼2.5 s) was relatively independent of reinforcement. Nevertheless, historically the more commonly studied temporal dimension of actions has been IRTs, in part for technical reasons. Here we showed that MD lesions impaired performance on both tasks, and that duration differentiation is a more convenient and reliable method for studying temporal differentiation.
In this study the temporal differentiation experiments were conducted with a single action and a single reward. Extended training under these conditions has been shown to result in lever pressing that is not explicitly goal-directed, as indicated by insensitivity to outcome devaluation treatments (Yin and Knowlton, 2006 ). But devaluation is not a convenient test to assess the goal-directedness of differentiation, as the operant in question here is not the rate of action but the form of action. It is certainly possible that devaluation can reduce the rate of lever pressing on this task, which would indicate the goal-directed nature of the differentiated action. This analysis can be difficult to perform, however, as rate-based devaluation relies on the use of extinction tests to probe the remembered action-outcome representation, and the lack of reinforcement in extinction tests may produce other effects on the form of the action after temporal differentiation. A possible solution is the use of partial reinforcement schedules for specific duration criteria, so that the animals are used to performing non-reinforced but nevertheless correct actions. Such a method should be effective when combined with a short extinction test in revealing the goal-directedness of differentiated actions.
Pre-Training vs. Post-Training Lesions
Pre-training and post-training MD lesions produced different effects. Both impaired accuracy of performance; both increased number of errors (presses not long enough to be rewarded), but in different ways. Pre-training lesions increased the dispersion of the lever press duration distribution, i.e. increasing ‘noise’ in performance. Post-training lesions, on the other hand, did not affect dispersion, but simply reduced the median duration. One obvious alternative account of these findings, especially the deficits after post-training lesions, is that MD may be needed for the inhibitory control of instrumental actions, which explains the premature release of the lever after post-training lesions. This account, however, failed to explain the very different results obtained after pre-training lesions, namely increased dispersion of press durations with no significant change in average duration. Thus, whatever role the MD may play in the inhibition of actions cannot easily explain our results.
With pre-training lesions, it is possible that animals were able to make use of alternative systems to acquire the action duration, albeit less precisely, as the duration distribution was more dispersed. In this connection it should be noted that for the 800-ms duration criterion, pre-training lesions did not produce significant deficits, even though the lesioned rats showed a numerically higher interquartile range (IQR). With post-training lesions, there appeared to be a direct effect on the acquired memory of the criterion duration. The variability, however, was not affected. To our knowledge, such observations have never been reported in any previous lesion study for any brain region. But their significance remains unclear, as the neural circuitry underlying duration differentiation is still poorly defined.
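The distinction drawn above between a change in dispersion and a change in central tendency can be made concrete with a small numerical sketch. The code below is only an illustration: the function name `summarize_durations`, the sample values, and the rule that a press shorter than the criterion counts as an error are assumptions made for the example, and the durations are invented rather than taken from the experiments. The quartiles are computed with Python's default method, which is not necessarily the method used in the analyses reported here.

```python
import statistics


def summarize_durations(durations_ms, criterion_ms):
    """Return median, IQR, and error rate for a list of press durations (ms).

    Assumption: a press shorter than the criterion counts as an error.
    """
    q1, _, q3 = statistics.quantiles(durations_ms, n=4)  # quartile cut points
    errors = sum(1 for d in durations_ms if d < criterion_ms)
    return {
        "median": statistics.median(durations_ms),
        "iqr": q3 - q1,
        "error_rate": errors / len(durations_ms),
    }


# Invented durations for illustration only; not data from the experiments.
CRITERION_MS = 1600
baseline = [1500, 1600, 1700, 1800, 1900, 2000, 2100]
dispersed = [900, 1200, 1500, 1800, 2100, 2400, 2700]  # same median, wider spread
shifted = [d - 400 for d in baseline]                  # lower median, same spread

for label, sample in [("baseline", baseline),
                      ("dispersed", dispersed),
                      ("shifted", shifted)]:
    print(label, summarize_durations(sample, CRITERION_MS))
```

In this toy example, both the dispersed and the shifted samples contain more sub-criterion presses than the baseline sample, yet only the first differs from baseline in IQR and only the second differs in median. This mirrors the two ways in which accuracy can be impaired.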
MD and Instrumental Learning
As previously reported, pre-training but not post-training MD lesions significantly reduced the sensitivity of instrumental performance to outcome devaluation and instrumental contingency degradation, suggesting a crucial contribution of MD to the acquisition of action-outcome contingencies (Corbit et al., 2003). Furthermore, others have found that MD is important for new learning, but not for retrieval of previously learned scene discriminations (Mitchell et al., 2007a; Mitchell and Gaffan, 2008). Consistent with previous work, our data revealed that the MD is critical for new learning: pre-training lesions of the MD impaired acquisition. Unlike previous work, however, our data also revealed that post-training lesions impaired the expression of temporal differentiation, for both press durations and IRTs (Figures 4 and 7). Thus, our behavioral measures permit the discovery of new effects of post-training lesions.
Our data also suggest that the largest effects of MD lesions are found when animals have to re-adjust their behavior to new and more difficult contingencies, for example, when the criterion duration shifted to 1600 ms and the criterion IRT shifted to 20 s (Figures 2, 3 and 5). This is consistent with studies suggesting that the MD plays a particular role in certain forms of behavioral flexibility (Hunt and Aggleton, 1998b; Block et al., 2007). In this study, MD is required when animals have to adjust the timing of their actions.
Despite the range of deficits produced by MD lesions, they did not impair general sensorimotor functions or motivation (Figure 6). Rather, the MD may be critical for the acquisition and retention of the operant, an arbitrary set of behavioral parameters that lead to the goal. Thus the present results extend previous work on the role of MD in action-outcome learning. But a major difference lies in the significant effects of post-training lesions reported here using temporal differentiation. Previous work (Ostlund and Balleine, 2008) did not identify any effect on sensitivity to devaluation following post-training lesions. In traditional rate measures of instrumental performance, the operant is less constrained. As such, it could be redundantly represented by numerous brain areas, and once acquired, the expression of the press rate-outcome knowledge does not appear to require the MD. On the other hand, the present temporal differentiation procedure requires a more specific representation of the duration of the required lever press, which may require the MD.
The cortico-basal ganglia networks are functional units for behavioral integration (Yin and Knowlton, 2006; Haber and Calzavara, 2009). How different substrates within the networks contribute to temporal differentiation of action is not yet known. One obvious candidate, in light of our data, is the associative cortico-basal ganglia network, which has access to motor initiation networks in the brainstem. In addition to MD, the medial prefrontal cortex, the major target of MD outputs, is important for learning of new action-outcome contingencies but not for expression of learned associations (Corbit and Balleine, 2003; Ostlund and Balleine, 2005). As suggested by the similar effects of pre-training manipulations of the dorsomedial striatum and the prefrontal cortex (Corbit and Balleine, 2003; Yin et al., 2005), initial acquisition of action-outcome contingencies is mediated by the associative cortico-basal ganglia network, including medial prefrontal cortex, associative striatum, and MD. While our results have uncovered a novel role for MD in action differentiation, additional research will be needed to clarify the specific contributions of thalamic, striatal, and cortical regions to this important adaptive function.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
This work is supported by National Institute on Alcohol Abuse and Alcoholism Grant 016991 to H.H.Y. We would like to thank Oksana Shelest and Alberto Lopez for their help with the experiments.
References
Block, A. E., Dhanji, H., Thompson-Tardif, S. F., and Floresco, S. B. (2007). Thalamic-prefrontal cortical-ventral striatal circuitry mediates dissociable components of strategy set shifting. Cereb. Cortex 17, 1625–1636.
Buffalo, E. A., Gillam, M. P., Allen, R. R., and Paule, M. G. (1994). Acute behavioral effects of MK-801 in rhesus monkeys: assessment using an operant test battery. Pharmacol. Biochem. Behav. 48, 935–940.
Byne, W., Buchsbaum, M. S., Kemether, E., Hazlett, E. A., Shinwari, A., Mitropoulou, V., and Siever, L. J. (2001). Magnetic resonance imaging of the thalamic mediodorsal nucleus and pulvinar in schizophrenia and schizotypal personality disorder. Arch. Gen. Psychiatry 58, 133–140.
Chauveau, F., Celerier, A., Ognard, R., Pierard, C., and Beracochea, D. (2005). Effects of ibotenic acid lesions of the mediodorsal thalamus on memory: relationship with emotional processes in mice. Behav. Brain Res. 156, 215–223.
Corbit, L. H., Muir, J. L., and Balleine, B. W. (2003). Lesions of mediodorsal thalamus and anterior thalamic nuclei produce dissociable effects on instrumental conditioning in rats. Eur. J. Neurosci. 18, 1286–1294.
Gaffan, D., and Murray, E. A. (1990). Amygdalar interaction with the mediodorsal nucleus of the thalamus and the ventromedial prefrontal cortex in stimulus-reward associative learning in the monkey. J. Neurosci. 10, 3479–3493.
Giguere, M., and Goldman-Rakic, P. S. (1988). Mediodorsal nucleus: areal, laminar, and tangential distribution of afferents and efferents in the frontal lobe of rhesus monkeys. J. Comp. Neurol. 277, 195–213.
Hunt, P. R., and Aggleton, J. P. (1998b). Neurotoxic lesions of the dorsomedial thalamus impair the acquisition but not the performance of delayed matching to place by rats: a deficit in shifting response rules. J. Neurosci. 18, 10045–10052.
McClure, G. Y., and McMillan, D. E. (1997). Effects of drugs on response duration differentiation. VI: Differential effects under differential reinforcement of low rates of responding schedules. J. Pharmacol. Exp. Ther. 281, 1368–1380.
McClure, G. Y., Wenger, G. R., and McMillan, D. E. (1997). Effects of drugs on response duration differentiation. V: Differential effects under temporal response differentiation schedules. J. Pharmacol. Exp. Ther. 281, 1357–1367.
McMillan, D. E., Adams, S. L., Wenger, G. R., McClure, G. Y., and Hardwick, W. C. (1994). Effects of drugs on response duration differentiation. III. Acute variation of reinforced duration. Pharmacol. Biochem. Behav. 48, 941–957.
Mitchell, A. S., Baxter, M. G., and Gaffan, D. (2007a). Dissociable performance on scene learning and strategy implementation after lesions to magnocellular mediodorsal thalamic nucleus. J. Neurosci. 27, 11888–11895.
Mitchell, A. S., Browning, P. G., and Baxter, M. G. (2007b). Neurotoxic lesions of the medial mediodorsal nucleus of the thalamus disrupt reinforcer devaluation effects in rhesus monkeys. J. Neurosci. 27, 11289–11295.
Oyoshi, T., Nishijo, H., Asakura, T., Takamura, Y., and Ono, T. (1996). Emotional and behavioral correlates of mediodorsal thalamic neurons during associative learning in rats. J. Neurosci. 16, 5812–5829.
Parker, A., Eacott, M. J., and Gaffan, D. (1997). The recognition memory deficit caused by mediodorsal thalamic lesion in non-human primates: a comparison with rhinal cortex lesion. Eur. J. Neurosci. 9, 2423–2431.
Ray, J. P., and Price, J. L. (1992). The organization of the thalamocortical connections of the mediodorsal thalamic nucleus in the rat, related to the ventral forebrain-prefrontal cortex topography. J. Comp. Neurol. 323, 167–197.
Ray, J. P., and Price, J. L. (1993). The organization of projections from the mediodorsal nucleus of the thalamus to orbital and medial prefrontal cortex in macaque monkeys. J. Comp. Neurol. 337, 1–31.
Yin, H. H., Knowlton, B. J., and Balleine, B. W. (2005). Blockade of NMDA receptors in the dorsomedial striatum prevents action-outcome learning in instrumental conditioning. Eur. J. Neurosci. 22, 505–512.
Yin, H. H., Ostlund, S. B., and Balleine, B. W. (2008). Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks. Eur. J. Neurosci. 28, 1437–1448.
Keywords: mediodorsal thalamus, learning, devaluation, reward, action, lever press duration, differentiation
Citation: Yu C, Gupta J and Yin HH (2010) The role of mediodorsal thalamus in temporal differentiation of reward-guided actions. Front. Integr. Neurosci. 4:14. doi: 10.3389/fnint.2010.00014
Received: 01 April 2010;
Paper pending published: 15 April 2010;
Accepted: 22 April 2010; Published online: 21 May 2010
Edited by: Rui M. Costa, Instituto Gulbenkian de Ciência, Portugal
Reviewed by: Shih-Chieh Lin, National Institutes of Health, USA
Sean B. Ostlund, University of California at Los Angeles, USA
Copyright: © 2010 Yu, Gupta and Yin. This is an open-access article subject to an exclusive license agreement between the authors and the Frontiers Research Foundation, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.
*Correspondence: Henry H. Yin, Department of Psychology and Neuroscience, Center for Cognitive Neuroscience, Duke University, Box 91050, Durham, NC 27708, USA. e-mail: firstname.lastname@example.org
Abbreviations: CRF, continuous reinforcement; CV, coefficient of variation; IRT, inter-response-time; MD, mediodorsal; NMDA, N-methyl-D-aspartic acid; rLP, rate of lever presses; RR, random ratio.