The Role of Mediodorsal Thalamus in Temporal Differentiation of Reward-Guided Actions

The mediodorsal thalamus (MD) is a crucial component of the neural network involved in the learning and generation of goal-directed actions. A series of experiments reported here examined the contributions of MD to the temporal differentiation of reward-guided actions. In Experiment 1, we trained rats on a discrete-trial, fixed-criterion temporal differentiation task, in which only lever presses exceeding a threshold duration value were rewarded. Pre-training MD lesions impaired temporal differentiation of action duration, by increasing the dispersion of the duration distribution. Post-training MD lesions also impaired differentiation, but by reducing the average emitted press durations, thus shifting the distribution without increasing the dispersion. In Experiment 2, we trained rats to space their lever pressing above criterion inter-press-intervals in order to earn rewards. Both pre-training and post-training MD lesions impaired the differentiation of inter-press-intervals. These results show that MD plays an important role in the acquisition and expression of action differentiation.

As previously reported, MD plays an important role in the acquisition of goal-directed behavior, which is sensitive to outcome devaluation and instrumental contingency degradation.

Fixed-criterion, discrete-trial, press duration differentiation.
After the devaluation test, the rats were retrained with 1 day of CRF with pellets, during which half the rats pressed the left lever and the other half pressed the right lever. Sucrose was not used in this experiment. Following CRF, the rats were successively shifted to three different temporal response differentiation schedules, in which the rats were trained to produce lever presses with a minimum duration at 400, 800 and 1600 ms (Yin, 2009). A discrete-trial program was used. Each trial began with the insertion of a lever, and ended with its retraction as soon as the lever was pressed and released. The trial was repeated, with an inter-trial-interval of 8 s. If the press lasted longer than the criterion duration, following the release of the lever a food pellet was delivered immediately into the food magazine. If not, no pellet was given. The session was terminated after 90 min or 50 earned pellets. The rats were trained for six sessions on each criterion.
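The reinforcement rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the control program actually used: the function name `run_duration_session`, the input format (a list of hypothetical latency and duration pairs), and the way elapsed time is accumulated are all assumptions. Only the strict "longer than the criterion" rule, the 8-s inter-trial interval, and the 90-min or 50-pellet stopping rule are taken from the text.

```python
def run_duration_session(presses, criterion_ms, max_pellets=50,
                         max_minutes=90, iti_s=8.0):
    """Score one discrete-trial session of press duration differentiation.

    presses: iterable of (latency_s, duration_ms) pairs, one per trial
             (hypothetical input format; latency is the time from lever
             insertion to press onset).
    Returns (outcomes, pellets), where outcomes[i] is True if trial i
    earned a pellet.
    """
    elapsed_s = 0.0
    pellets = 0
    outcomes = []
    for latency_s, duration_ms in presses:
        # Assumption: session time advances by latency + press duration.
        elapsed_s += latency_s + duration_ms / 1000.0
        if elapsed_s > max_minutes * 60:
            break  # the press would end after the 90-min limit
        # A pellet is delivered only if the press outlasts the criterion.
        rewarded = duration_ms > criterion_ms
        outcomes.append(rewarded)
        pellets += int(rewarded)
        if pellets >= max_pellets:
            break  # session ends once 50 pellets are earned
        elapsed_s += iti_s  # lever retracted for the inter-trial interval
    return outcomes, pellets
```

Calling `run_duration_session(trials, criterion_ms=800)` on a list of recorded trials returns the per-trial reward outcomes and the pellet count, from which the proportion of rewarded presses follows directly.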

Post-training MD lesion
Twenty-six rats were used in this experiment. Before surgery, naïve rats were trained with temporal differentiation schedules: >400, >800 and >1600 ms for six sessions on each criterion after four sessions of CRF. Following the last session of 1600-ms duration training, the rats were given free access to food in their home cage for 1 day. At the time of the surgery, rats were divided into two groups based on their baseline instrumental performance during their duration differentiation training. Half the rats from each cage (the rats were paired in home cage) received MD lesion with 0.4 µl NMDA, and the remaining received 0.9% saline. The surgery procedure was the same as described above. After surgery, the rats were allowed to recover for 5 days, and then returned to the food deprivation schedule for 2 days before the test. During the test, the rats were placed under >1600 ms schedule for four sessions.

Pre-training MD lesion
After duration differentiation training, 16 of the rats from the post-training duration differentiation experiment (sham, n = 8; lesion, n = 8) were retrained with CRF for four sessions, and then trained on the IRT differentiation schedule at >10 and >20 s successively. Rats were required to press the lever with a minimum delay above a criterion value (10 or 20 s) after the last press. If they pressed earlier than the required delay, the reward was canceled. The rats were trained for five sessions at 10 s, and then shifted to 20 s for five sessions. Each session terminated after 90 min or 30 earned pellets.
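The IRT contingency can be illustrated in the same way. The sketch below is a simplified reading of the schedule, not the program actually used: the function name `score_irt_session` is hypothetical, IRTs are measured between press onsets (the text does not say whether onset or release was used), and the first press of a session is treated as having no IRT and earning no reward. The criterion, the 30-pellet limit, and the 90-min limit come from the text.

```python
def score_irt_session(press_times_s, criterion_s, max_pellets=30,
                      max_minutes=90):
    """Score a session in which only sufficiently spaced presses pay off.

    press_times_s: ascending press onset times in seconds from session
                   start (hypothetical input format).
    Returns (irts, rewarded, pellets).
    """
    irts = []
    rewarded = []
    pellets = 0
    previous = None
    for t in press_times_s:
        if t > max_minutes * 60:
            break  # press falls after the 90-min session limit
        if previous is not None:
            irt = t - previous
            irts.append(irt)
            # Reward only if the delay since the last press exceeds criterion.
            hit = irt > criterion_s
            rewarded.append(hit)
            pellets += int(hit)
        # Every press, rewarded or not, restarts the interval.
        previous = t
        if pellets >= max_pellets:
            break  # session ends once 30 pellets are earned
    return irts, rewarded, pellets
```

Because every press resets the interval, a premature press both forfeits the reward and restarts the required delay, which is what makes short IRTs costly under this schedule.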

Post-training MD lesion
Sixteen naïve rats were used in the post-training MD lesion experiment. Lever press training began with four sessions of CRF. Half of the rats in each group earned pellets by pressing the left lever. The remaining rats were trained with the right lever. They were all shifted to two sessions of RR5, two sessions of RR10 and two sessions of RR20. They were then trained on the IRT differentiation schedule at >10 and >20 s for six sessions on each criterion. The session ended with 30 earned pellets or after 90 min.
Before surgery, the rats had free access to food for 1 day. They were then divided into two groups based on their baseline instrumental performance during the pre-surgery training. Half of the rats from each cage received MD lesion, and the remaining were chosen as controls. The surgical procedure is the same as described above. Similarly, the rats were given a recovery period of 5 days and then food deprived before testing. The >20-s schedule was used during the test.

DATA ANALYSIS
Data analysis was performed using Microsoft Excel, Graphpad Prism and Matlab.

RESULTS
Histological analysis showed that NMDA infusions caused substantial neuronal damage in the MD, with limited damage to surrounding thalamic nuclei in some subjects (Figure 1). Four rats with inaccurate lesions (lesion of deeper and more caudal thalamic nuclei, or unilateral damage of MD) were excluded.

Initial acquisition and devaluation test
To make sure we were targeting the same area as previous work, we replicated the earlier experiment using two-action, two-outcome training, and performed a 5-min devaluation test before the duration differentiation training. Our data indicated that both groups of rats learned to press the lever for both pellets and sucrose after 8 days of training. The mean press rates increased across the days. Two-way ANOVAs with group and days as factors showed, for both pellet and sucrose, no main effects of group (pellet, F 1,105 = 2.35, p > 0.05; sucrose, F 1,105 = 2.44, p > 0.05), main effects of days (pellet, F 7,105 = 90.83, p < 0.0001; sucrose, F 7,105 = 67.98, p < 0.0001), and no interactions between them (pellet, F 7,105 = 0.24, p > 0.05; sucrose, F 7,105 = 1.79, p > 0.05). During the devaluation test, MD lesioned rats displayed reduced sensitivity to outcome devaluation: their mean response rate did not differ between devalued and non-devalued levers (planned comparison, p > 0.05). By contrast, the sham control group exhibited a normal devaluation effect, pressing less frequently on the devalued lever (planned comparison, p < 0.05). We were therefore able to replicate the results from previous work showing that MD lesion impaired acquisition of the action-outcome contingency.

Duration differentiation
To examine the effect of MD lesion on temporal differentiation, the rats were retrained with 1 day of CRF with pellets as reinforcers, and then began the duration differentiation task. Figure 2 showed that all rats (n = 11 for sham and n = 11 for MD) learned to perform the task and produced duration distributions peaking around criterion durations within six sessions (Figure 2A).
Because the press durations were not normally distributed, performance was quantified using the median press duration of each rat as a measure of the timing of the action and the interquartile range (IQR) of the duration distribution as a measure of dispersion. For all three criterion durations, the median duration of both groups increased within six sessions (two-way ANOVA, main effects of time: 400 ms, F 5,100 = 2.48, p < 0.05; 800 ms, F 5,100 = 8.74, p < 0.0001; 1600 ms, F 5,100 = 24.74, p < 0.0001), and reached a steady state across the last three sessions (no effects of time: Fs < 1.2, ps > 0.05, Figure 2B). There was no significant difference between sham and MD lesion rats for any of the three criterion durations (no effects of group: 400 ms, F 1,100 = 1.14, p > 0.05; 800 ms, F 1,100 = 0.74, p > 0.05; 1600 ms, F 1,100 = 0.38, p > 0.05), and no interaction between time and group (Fs < 1).
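The two summary measures used here, median and IQR, can be computed per rat and per session as in the following sketch. It uses only the Python standard library. The function name `median_and_iqr` is hypothetical, and the choice of the "inclusive" quartile method is an assumption, since the text does not state which quartile definition was used.

```python
import statistics


def median_and_iqr(durations_ms):
    """Return (median, IQR) of a list of press durations in milliseconds.

    The median summarizes the timing of the action; the IQR (third
    quartile minus first quartile) summarizes dispersion. Both are
    robust to the skew of non-Gaussian duration distributions.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(durations_ms, n=4, method="inclusive")
    return statistics.median(durations_ms), q3 - q1
```

For example, `median_and_iqr([400, 500, 600, 700, 800])` gives a median of 600 and an IQR of 200 under the inclusive quartile method; other quartile conventions can give slightly different IQR values for small samples.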
FIGURE 1 | Histological analysis of MD lesions. (A) Photomicrographs of representative sham (left) and MD lesions (right) at low (4×, above) and high (10×, below) magnification. (B) Illustration of the largest (gray shading) and smallest (black outline) extent of lesions. The diagrams are based on a rat brain atlas (Paxinos and Watson, 1998).
We also compared the proportion of presses rewarded and the rate of lever presses (rLP) for each criterion duration (Figure 3A). For both groups, the proportion of rewarded presses and rLP increased over six sessions at each criterion (two-way ANOVA, main effects of time, 400 ms, Fs 5,100 > 6.70, ps < 0.0001; 800 ms, Fs 5,100 > 19.3, ps < 0.0001; 1600 ms, Fs 5,100 > 23.6, ps < 0.0001; no interactions between time and group, Fs < 1.62, ps > 0.05). Similar to the median duration measure, the transition to steady state occurred around the 4th session for both sham and lesion groups (no effects of time for the last three sessions: Fs 2,40 < 1.38, ps > 0.05 for proportion of rewarded presses; Fs 2,40 < 1.18, ps > 0.05 for rLP; except at 1600 ms, F 2,40 = 3.68, p < 0.05 for proportion of rewarded presses; no interactions at any criterion, Fs < 1.21, ps > 0.05). As shown in Figure 3A, lesioned rats produced a lower proportion of correct (rewarded) presses and showed a reduced rate of pressing at 1600 ms across six sessions (main effect of group, F 1,100 = 5.06, p < 0.05 for proportion of rewarded presses; F 1,100 = 5.58, p < 0.05 for rLP). This effect was smaller for shorter duration criteria. At 400 ms, although the proportion of presses rewarded and the press rate of the lesion group were not significantly lower than those of the control group across six sessions (no effect of group, Fs 1,100 < 3.70, ps > 0.05), they did reach statistical significance over the last three sessions (main effect of group, F 1,40 = 4.72, p < 0.05 for proportion of presses rewarded; F 1,40 = 7.01, p < 0.05 for rLP).
There was no significant lesion effect for the 800-ms criterion duration, however (no effect of group, Fs 1,100 < 0.87, ps > 0.05). Thus pre-training MD lesions impaired the accuracy of action timing during the initial differentiation learning and when the performance became more difficult. Interval timing often exhibits the scalar property, i.e. noise is proportional to the average. A recent study in mice indicated that motor cortical lesions have no impact on the scalar property of press duration (Yin, 2009), which was previously suggested to be a basic property of the psychophysical judgment of temporal duration. To assess the effect of MD lesions on the scalar property of timing, the coefficient of variation (CV, standard deviation/mean) across six sessions was analyzed (Figure 3B). A two-way ANOVA with group and duration as factors showed a main effect of group (F 1,40 = 7.50, p < 0.05), a main effect of duration (F 2,40 = 13.91, p < 0.05), and no interaction between these two factors (F 2,40 = 1.06, p > 0.05). In other words, MD lesions resulted in a general increase in the CV of press durations, and the CV varied with the duration criterion. Post hoc tests revealed a difference between 400 and 800 ms sessions (p < 0.05), but no difference between 800 and 1600 ms (p > 0.05). This observation suggests that, in rats, the distribution of lever press durations may exhibit the scalar property only for relatively long durations.
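The CV measure and its link to the scalar property can be made concrete with a short sketch. The function name `coefficient_of_variation` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, as the text gives only the definition standard deviation/mean.

```python
import statistics


def coefficient_of_variation(durations_ms):
    """Return the CV (standard deviation / mean) of press durations.

    Under the scalar property, the standard deviation grows in
    proportion to the mean, so the CV stays roughly constant across
    criterion durations.
    """
    # Assumption: sample standard deviation, not population.
    return statistics.stdev(durations_ms) / statistics.mean(durations_ms)
```

If a set of durations is simply stretched by a constant factor, as the scalar property predicts when the criterion is doubled, the CV is unchanged: `[400, 500, 600]` and `[800, 1000, 1200]` both have a CV of 0.2. A CV that differs between criteria, as reported here for 400 versus 800 ms, therefore indicates a departure from scalar timing.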

EFFECT OF POST-TRAINING LESION ON DURATION DIFFERENTIATION
To investigate how post-training MD lesions affect the expression of temporal differentiation, after training with three criterion durations, the rats were separated into two groups (n = 13 for sham and n = 12 for MD lesion) based on their press duration distributions (Figure 4A). Planned comparisons revealed no significant difference in median (p > 0.05), IQR (p > 0.05), rate of lever pressing (p > 0.05), and proportion of rewarded presses (p > 0.05) during the last session. After surgery, rats were re-tested with the 1600-ms duration task. The distribution of the lesioned group immediately shifted to the left (Figure 4B) during the first session of post-lesion tests. By contrast, there was no change in the sham group. Yet the dispersion of the distribution did not change in either group. A two-way ANOVA with group and session (pre-lesion vs. post-lesion) as factors showed an interaction between these two factors for median duration (F 1,23 = 11.2, p < 0.05) and proportion of rewarded presses (F 1,23 = 31.8, p < 0.05). Furthermore, post hoc analysis showed that the median duration of the lesioned group was significantly reduced (p < 0.01, Figure 4C). Moreover, the proportion of rewarded presses decreased immediately after surgery (p < 0.0001). No differences were found for the sham group (ps > 0.05). In comparison, there was no interaction between time and group, no effect of session and no effect of group for IQR (Fs < 1.86, ps > 0.05) and rate of pressing (Fs < 2.21, ps > 0.05). Interestingly, the deficit in the lesioned group disappeared after three additional training sessions (4th session after surgery, ps > 0.05 for all measures, data not shown). In short, post-training MD lesions caused an immediate deficit in the capacity to time the required action duration, but this deficit could be reduced by additional learning.

EFFECT OF PRE-TRAINING LESION ON IRT DIFFERENTIATION
The IRT distribution is shown in Figure 5A. At the beginning of training, both groups showed peak values that were much lower than the criterion value (data not shown). After five sessions of 10 s IRT training, the sham group learned the task and showed a bimodal distribution with the second mode above the criterion IRT. The MD lesion group still produced IRTs with a single mode below criterion (Figure 5A). After shifting to the new criterion of 20 s, the difference between the two groups became significant. In the first session after shifting, the sham group immediately shifted their second peak above 20 s, whereas the lesioned group showed lower IRTs (data not shown). During the last session, the lesion group slightly increased the proportion of long IRTs (Figure 5A). Acquisition was quantified using three measures: median, IQR, and proportion of rewarded presses. Here better performance is indicated by higher IRTs. A two-way ANOVA of the >10 s training data (Figure 5B, left), with time and lesion as factors, revealed a main effect of time across sessions (median, F 4,54 = 13.53, p < 0.0001; IQR, F 4,54 = 11.71, p < 0.0001; proportion of rewarded presses, F 4,54 = 46.71, p < 0.0001), but no main effect of lesion (median, F 1,54 = 0.76, p > 0.05; IQR, F 1,54 = 2.89, p > 0.05; proportion of rewarded presses, F 1,54 = 2.41, p > 0.05), and no interaction between these factors (Fs < 2.12, ps > 0.05).
When the criterion IRT shifted to 20 s, lesioned rats immediately showed a deficit. This observation was confirmed by a two-way ANOVA performed on the data from the first three sessions of 20 s IRT training (Figure 5B, right). MD lesions reduced IQR and proportion of presses rewarded (main effects of lesion on IQR: F 1,26 = 5.95, p < 0.05; on proportion of rewarded presses, F 1,26 = 10.37, p < 0.05). There was also a significant effect of time (IQR, F 2,26 = 4.79, p < 0.05; proportion of rewarded presses, F 2,26 = 3.83, p < 0.05) and no interactions between these factors (IQR, F 2,26 = 0.33, p > 0.05; proportion of rewarded presses, F 2,26 = 0.76, p > 0.05). There was no difference in median IRT (no main effect of lesion, F 1,26 = 1.10, p > 0.05). These findings indicated that MD lesions have an important effect on IRT differentiation, especially when the task presumably became more difficult (>20 s). Furthermore, the lever press durations were examined. Figure 6 shows that sham and MD-lesioned rats exhibited similar distributions of press duration during the IRT differentiation test. There was no significant group difference in median duration and IQR (Fs 1,52 < 1.93, p > 0.05; no effect of time, Fs 4,52 < 1.66, ps > 0.05; no interaction between lesion and time, Fs 4,52 < 2.04, p > 0.05). Thus, when the time between presses is differentially reinforced, MD lesions did not affect the press durations themselves. Such results show that press duration and IRTs are independently controlled.

EFFECT OF POST-TRAINING LESION ON IRT DIFFERENTIATION
Similarly, after training with both criterion IRTs, the rats were separated into two groups. No differences existed between the two groups for median IRT (t-test, p > 0.05), IQR of IRT (p > 0.05), rate of pressing (p > 0.05) and proportion of rewarded presses (p > 0.05) at the last pre-surgery session of 20 s. After recovery from surgery, rats were retested with 20 s sessions (Figure 7A). A two-way ANOVA with session and group as factors showed a main effect of session (F 1,13 = 6.02, p < 0.05), no effect of group (F 1,13 = 0.69, p > 0.05), but an interaction between these factors (F 1,13 = 8.49, p < 0.05) for IQR of IRT. Significantly, a post hoc analysis on the pre- and post-lesion training sessions revealed that the dispersion of the lesion group was reduced by the lesion (Figure 7B, lesion IQR, p < 0.05). There were no effects of session and group, and no interactions for median IRT (Fs < 2.44, ps > 0.05), proportion of presses rewarded (Fs < 3.46, ps > 0.05), and rate of pressing (Fs < 4.64, ps > 0.05). These results indicated that post-training MD lesions impaired expression of IRT differentiation.
FIGURE 6 | The distribution of lever press durations during the first and last sessions for each criterion IRT: Duration distribution was not affected by MD lesions when press duration is not the differentially reinforced operant. Thus, rather than having a general effect on lever press duration, MD lesions only impaired the differentiation of press duration. Error bar represents SEM.

DISCUSSION
Previous studies have found a range of drug effects on duration and IRT differentiation of behavior (Paule, 1990, 1991; Buffalo et al., 1993, 1994; McMillan, 1994a,b, 1995; McMillan et al., 1994). Yet the neural substrates important for these two temporal dimensions of action have not been examined in any systematic fashion (Yin, 2009).
In this study, we found that MD plays an important role in both acquisition and expression of temporal differentiation in instrumental learning. More specifically, (i) pre-training MD lesions impaired the differentiation of action durations, producing higher variability in press duration; (ii) post-training MD lesions reduced the action duration and accuracy, but did not affect variability; (iii) pre-training MD lesions impaired the acquisition of IRT selection at the longer required IRT (20 s), resulting in lower IRTs and a lower probability of rewarded presses; (iv) post-training MD lesions also impaired expression of IRT differentiation. Overall, there is a general impairment in the formation of an operant, i.e. any behavioral parameter that increases the frequency of reward delivery. The effects of MD lesions are specific: they are limited to the shaping of the appropriate operant, be it press duration or IRT. When press duration was the operant, it was affected by MD lesions; but when IRT was the operant, MD lesions impaired IRT differentiation without having any effect on duration distribution.

TEMPORAL DIFFERENTIATION OF ACTION
Differentiation is to be distinguished from discrimination. In discrimination, behavior is generated in response to some discriminative stimulus, e.g. green light go, red light stop. In differentiation, the external stimuli do not provide any instruction about what the animal should do. Rather, the animal must use learned criteria to produce appropriate behavior. Most previous lesion studies of reward-guided behaviors use some variant of cue discrimination procedure, but the study of differentiation has largely been neglected.
The current study focuses on temporal differentiation, which is concerned with the questions of 'when' and 'how long.' In the absence of instructions the organism can select the appropriate behavioral parameter based on experienced consequences. Action duration and spacing are two basic temporal dimensions of behavior, known to be modifiable by learning (Skinner, 1938; Kuch, 1974; Kuch and Platt, 1976). In the former, the duration is the operant, whereas in the latter, the time between presses is the operant. Press duration differentiation restricts the animal's behavioral repertoire (it is impossible to enter the magazine or groom while holding down the lever), whereas IRT differentiation does not restrict the range of behaviors to fit the required temporal interval. Because IRT can be affected by a variety of uncontrolled variables, it is thought to be a noisy index of temporal differentiation (Platt et al., 1973; Kuch, 1974; Kuch and Platt, 1976). Our results support this assumption. In duration differentiation, the improvement in performance and the impairments produced by MD lesion were relatively consistent. However, in IRT differentiation, the IRTs between two presses showed a bimodal distribution. Obviously, the second peak near the criterion IRTs was due to reinforcement, while the first peak of short IRTs (∼2.5 s) was relatively independent of reinforcement. Nevertheless, historically the more commonly studied temporal dimension of actions has been IRTs, in part for technical reasons. Here we showed that MD lesions impaired performance on both tasks, and that duration differentiation is a more convenient and reliable method for studying temporal differentiation.
In this study the temporal differentiation experiments were conducted with a single action and a single reward. Extended training under these conditions has been shown to result in lever pressing that is not explicitly goal-directed, as indicated by insensitivity to outcome devaluation treatments (Yin and Knowlton, 2006). But devaluation is not a convenient test to assess the goal-directedness of differentiation, as the operant in question here is not the rate of action but the form of action. It is certainly possible that devaluation can reduce the rate of lever pressing on this task, which would indicate the goal-directed nature of the differentiated action. This analysis can be difficult to perform, however, as rate-based devaluation relies on the use of extinction tests to probe the remembered action-outcome representation, and the lack of reinforcement in extinction tests may produce other effects on the form of the action after temporal differentiation. A possible solution is the use of partial reinforcement schedules for specific duration criteria, so that the animals are used to performing non-reinforced but nevertheless correct actions. Such a method should be effective when combined with a short extinction test in revealing the goal-directedness of differentiated actions.

PRE-TRAINING VS. POST-TRAINING LESIONS
Pre-training and post-training MD lesions produced different effects. Both impaired accuracy of performance; both increased the number of errors (presses not long enough to be rewarded), but in different ways. Pre-training lesions increased the dispersion of the lever press duration distribution, i.e. they increased 'noise' in performance. Post-training lesions, on the other hand, did not affect dispersion, but simply reduced the median duration. One obvious alternative account of these findings, especially the deficits after post-training lesions, is that MD may be needed for the inhibitory control of instrumental actions, which would explain the premature release of the lever after post-training lesions. This account, however, fails to explain the very different results obtained after pre-training lesions, namely increased dispersion of press durations with no significant change in average duration. Thus, whatever role the MD may play in the inhibition of actions cannot easily explain our results.
With pre-training lesions, it is possible that animals were able to make use of alternative systems to acquire the action duration, as the duration distribution is more dispersed. In this connection it should be noted that for the 800-ms duration criterion, pre-training lesions did not produce significant deficits, even though the lesioned rats showed numerically higher IQR. With post-training lesions, there appeared to be a direct effect on the acquired memory of the criterion duration. The variability, however, was not affected. To our knowledge, such observations have never been reported in any previous lesion study for any brain region. But their significance remains unclear, as the neural circuitry underlying duration differentiation is still poorly defined.

MD AND INSTRUMENTAL LEARNING
As previously reported, pre-training but not post-training MD lesions significantly reduced sensitivity of instrumental performance to outcome devaluation and instrumental contingency degradation, suggesting a crucial contribution of MD to the acquisition of action-outcome contingencies. Furthermore, others have found MD is important for new learning, but not for retrieval of previously learned scene discrimination (Mitchell et al., 2007a; Mitchell and Gaffan, 2008). Consistent with previous work, our data revealed that the MD is critical for new learning: Pre-training lesions of the MD impaired acquisition. Unlike previous work, however, our data also revealed that post-training lesions impaired the expression of temporal differentiation, for both press durations and IRTs (Figures 4 and 7). Thus, our behavioral measures permit the discovery of new effects of post-training lesions.
Our data also suggest that the largest effects of MD lesions are found when animals have to re-adjust their behavior to new and more difficult contingencies, for example when the criterion duration shifted to 1600 ms and the criterion IRT shifted to 20 s (Figures 2, 3 and 5). This is consistent with studies suggesting that the MD plays a particular role in certain forms of behavioral flexibility (Hunt and Aggleton, 1998b; Block et al., 2007). In this study, MD is required when animals have to adjust the timing of their actions.
Despite the range of deficits produced by MD lesions, they did not impair general sensorimotor functions or motivation (Figure 6). Rather, the MD may be critical for the acquisition and retention of the operant, an arbitrary set of behavioral parameters that lead to the goal. Thus the present results extend previous work on the role of MD in action-outcome learning. But a major difference lies in the significant effects of post-training lesions reported here using temporal differentiation. Previous work did not identify any effect on sensitivity to devaluation following post-training lesions. In traditional rate measures of instrumental performance, the operant is less constrained. As such it could be redundantly represented by numerous brain areas; and once acquired, the expression of the press rate-outcome knowledge does not appear to require the MD. On the other hand, the present temporal differentiation procedure requires a more specific representation of the duration of the required lever press, which may require the MD.
The cortical-basal ganglia networks are functional units for behavioral integration (Yin and Knowlton, 2006; Haber and Calzavara, 2009). How different substrates within the networks contribute to temporal differentiation of action is not yet known. One obvious candidate, in light of our data, is the associative cortico-basal ganglia network, which has access to motor initiation networks in the brainstem. In addition to MD, the medial prefrontal cortex, the major target of MD outputs, is important for learning of new action-outcome contingencies but not for expression of learned associations (Ostlund and Balleine, 2005). As suggested by the similar pre-training effects of the dorsomedial striatum and the prefrontal cortex (Yin et al., 2005), initial acquisition of action-outcome contingencies is mediated by the associative cortico-basal ganglia network including medial prefrontal cortex, associative striatum, and MD. While our results have uncovered a novel role for MD in action differentiation, additional research will be needed to clarify the specific contributions of thalamic, striatal, and cortical regions to this important adaptive function.

ACKNOWLEDGMENTS
This work is supported by National Institute on Alcohol Abuse and Alcoholism Grants 016991 to H.H.Y. We would like to thank Oksana Shelest and Alberto Lopez for their help with the experiments.