The Role of the Murine Motor Cortex in Action Duration and Order

This study examined the contributions of the primary and secondary motor cortices (M1 and M2) to action differentiation and sequencing in mice. In Experiment 1, mice with excitotoxic lesions of M1 and M2 and sham controls learned to emit lever presses exceeding a criterion duration to earn food rewards. Duration differentiation obeys Weber's law – i.e. the spread of the distribution is proportional to the average duration. M1 or M2 lesions did not affect differentiation of press durations. Experiment 2 studied the effects of the same lesions on the learning of a simple sequence consisting of two lever presses, one distal, and the other proximal, to the reward. M2 lesions impaired the acquisition and reversal of this sequence. M1 lesions, by contrast, had no effect on acquisition but impaired sequence reversal. Moreover, duration of the first press in a sequence was on average twice as long as that of the second press, though this ratio was not affected by motor cortex lesions. Together these results offer a first glimpse into the cortical substrates of instrumental differentiation in mice.


Introduction
All behaviors take place in time. Two fundamental temporal dimensions of behavior are duration and order (Lashley, 1951; Bernstein, 1967). These dimensions can be modified by instrumental learning, in a process known as differentiation (Skinner, 1938).
To differentiate means to generate actions fulfilling some criterion. Differentiation can be contrasted with discrimination, in which animals are trained to perform one response to one discriminative stimulus, and a different response to a different discriminative stimulus (Konorski, 1967), with the stimuli differing in some specified dimension such as direction of motion (Newsome and Pare, 1988). In differentiation, by contrast, no antecedent stimuli are manipulated. Animals learn to generate behaviors that satisfy some arbitrary criterion, relying on an internal program to select the appropriate action (Skinner, 1938; Platt et al., 1973; Kuch, 1974).
This study focuses on the differentiation of action duration and action order. Duration refers to the length of a discrete lever press, i.e. the period between the pressing and the release of a lever. Order refers to the serial order of two presses on two different levers. Studies have demonstrated that rats are capable of duration and order differentiation (Platt et al., 1973; Kuch, 1974; Balleine et al., 1995), but little is known about how the brain coordinates these behavioral parameters. Neuropsychological and electrophysiological data from primates (including humans) point to a critical role of the motor cortices in the fine control of movement (Luria, 1966; Tanji et al., 1996; Shima and Tanji, 2000). The present study assessed the effects of excitotoxic lesions of primary motor cortex (M1) and secondary motor cortex (M2) on duration and order differentiation.

Instrumental training
Training and testing took place in eight Med Associates (St Albans, VT, USA) operant chambers (21.6 cm L × 17.8 cm W × 12.7 cm H) housed within light-resistant and sound-attenuating walls. Each chamber contained a food magazine that received pellets from a dispenser. The food reward used was Bio-Serv 14 mg Dustless Precision Pellets (Bio-Serv, NJ, USA). Each chamber contained two retractable levers, one on either side of the magazine, and a 3-W 24-V house light mounted on the wall opposite the levers and magazine. An infrared beam recorded head entries into the magazine. A computer with the Med-PC-IV program was used to control the equipment and record behavior. The duration of each lever press was measured at a resolution of 10 ms using custom-written programs (available upon request).

Lever press training
Magazine training began with one 30-min session, during which food pellets were delivered on a random time schedule (on average every 60 s), with no levers extended, allowing the mice to learn the location of food delivery. The next day, lever-press training began on the left lever. At the beginning of each session, the house light was illuminated and the lever was inserted. At the end of each session, the house light turned off and the lever retracted. Initial lever-press training consisted of three consecutive days of continuous reinforcement (CRF), during which the animals received a pellet for each lever press. Sessions ended at 90 min or 30 rewards (whichever came first).
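The random time schedule used for magazine training can be sketched as follows. This is a minimal illustration and not the Med-PC program used in the study: the function name, the seed parameter, and the choice of exponentially distributed inter-delivery intervals are all assumptions made for the example.

```python
import random

def random_time_schedule(session_s=30 * 60, mean_interval_s=60.0, seed=None):
    """Return pellet delivery times (s) for one magazine-training session.

    Assumption: inter-delivery intervals are exponentially distributed,
    which gives a constant delivery probability per unit time.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(1.0 / mean_interval_s)
    while t < session_s:
        times.append(t)
        t += rng.expovariate(1.0 / mean_interval_s)
    return times

# One simulated 30-min session with a mean interval of 60 s.
deliveries = random_time_schedule(seed=1)
print(len(deliveries), "pellets; first at", round(deliveries[0], 1), "s")
```

Deliveries here are independent of the animal's behavior, which is the defining feature of a time-based (as opposed to response-based) schedule.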

Fixed-criterion, discrete-trial, press duration differentiation
After CRF training, the mice were successively shifted to three different temporal differentiation schedules: >400 ms, >800 ms, and >1600 ms. A discrete-trial program was used to train mice to produce lever presses with a minimum duration. Each trial began with the insertion of a lever, and ended with its retraction as soon as the lever was pressed and released. The trial was repeated, with an inter-trial-interval (ITI) of 8 s, until 50 earned pellets or 90 min. If the press lasted longer than the criterion duration, a food pellet was delivered immediately into the food magazine. If not, no pellet was given. Mice were trained for six daily sessions on each criterion: 400, 800, 1600 ms.
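The reinforcement rule of the discrete-trial procedure can be sketched as below. The criterion values, the 8-s ITI, and the session limits come from the description above; the function names and the `latency_s` parameter (time from lever insertion to press onset) are hypothetical and serve only to make the example self-contained.

```python
CRITERIA_MS = (400, 800, 1600)   # successive criteria from the text
MAX_PELLETS = 50                 # session ends at 50 earned pellets ...
MAX_SESSION_S = 90 * 60          # ... or 90 min, whichever comes first
ITI_S = 8                        # inter-trial interval

def press_is_reinforced(duration_ms, criterion_ms):
    """A press earns a pellet only if it lasts longer than the criterion."""
    return duration_ms > criterion_ms

def run_session(press_durations_ms, criterion_ms, latency_s=0.0):
    """Replay a list of press durations through one session.

    latency_s is a hypothetical press latency added to every trial.
    Returns (pellets earned, trials completed).
    """
    elapsed_s, pellets, trials = 0.0, 0, 0
    for duration_ms in press_durations_ms:
        if pellets >= MAX_PELLETS or elapsed_s >= MAX_SESSION_S:
            break
        trials += 1
        elapsed_s += latency_s + duration_ms / 1000.0 + ITI_S
        if press_is_reinforced(duration_ms, criterion_ms):
            pellets += 1
    return pellets, trials

# Four presses against the 400-ms criterion: only 420 and 900 ms are rewarded.
print(run_session([350, 420, 900, 400], criterion_ms=CRITERIA_MS[0]))  # (2, 4)
```

Note that a press of exactly the criterion duration is not rewarded in this sketch, following the "longer than the criterion" wording.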

Sequence acquisition and reversal
After the end of duration differentiation training, the same mice were used for sequence training. For sequence training, two levers, approximately 11 cm apart, were inserted at the beginning of each trial, which ended after two presses on any lever. The ITI was 8 s. The only reinforced sequence was left→right. After 14 sessions, the reinforced order was changed to right→left, and animals received an additional 14 sessions on the new sequence.
To ensure that initial lever press training and temporal differentiation training did not bias sequence acquisition, an additional group of naïve mice was also trained on the discrete-trial sequence task. These mice never received any duration differentiation training. Moreover, they first learned to press the right lever (under CRF) instead of the left lever, as the lesioned groups had. After CRF, they learned the same left→right sequence, followed by a reversal of the serial order (right→left). The sequence training was thus identical to that received by the lesioned animals, except that the naïve animals were trained for only 13 days on each sequence.
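The trial logic of the sequence task can be summarized in a short sketch. The function names are hypothetical; the 14-session phase length is taken from the description above, and 13 would be substituted for the naïve group.

```python
def reinforced_sequence(session, sessions_per_phase=14):
    """Target order for a 1-indexed session: left->right first, then reversed."""
    return ("L", "R") if session <= sessions_per_phase else ("R", "L")

def trial_outcome(first_press, second_press, session, sessions_per_phase=14):
    """A trial ends after two presses on any lever; only the target order pays."""
    target = reinforced_sequence(session, sessions_per_phase)
    return (first_press, second_press) == target

print(trial_outcome("L", "R", session=3))    # acquisition phase: True
print(trial_outcome("L", "R", session=15))   # reversal phase: False
```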

Data analysis
The saved data were analyzed using Microsoft Excel, GraphPad Prism, and MATLAB.

Results

Histological analysis of lesions
The extent of motor cortical lesions is shown in Figure 1. NMDA infusions clearly caused substantial damage in the targeted cortical areas. Mice with inaccurate lesions were excluded. Final group sizes were n = 8 for Group Control, n = 8 for Group M1, and n = 6 for Group M2.

Initial lever press training
Figure 2 illustrates the initial acquisition of lever pressing under CRF. Motor cortical lesions had no effect on the rate of pressing or head entries, as confirmed by a planned comparison on the last (third) day of training (ps > 0.05). Thus all groups learned to press the lever for food reward under CRF.
There was a general reduction in the average press duration, and proportionally in the spread of press duration (Figures 2A,B). Median and interquartile range (IQR) were used as measures of average duration and dispersion, respectively. IQR decreased over 3 days of training. A mixed two-way ANOVA with Time and Group as factors revealed a main effect of Time (F(2,38) = 17.9, p < 0.05), but no effect of Group (F < 1) and no interaction between Time and Group (F(4,38) = 1.64, p > 0.05). The same was true of median duration (main effect of Time, F(2,38) = 7.68, p < 0.05; main effect of Lesion, F < 1; Lesion × Time interaction, F < 1). The coefficient of variation (IQR/median) was also reduced during this initial learning phase in all groups (interaction, F < 1; Lesion, F < 1; Time, F(2,38) = 5.9, p < 0.05).
During the initial acquisition of the action-outcome (press-pellet) contingency, therefore, there was an initial burst of variability in the press duration. This is the first report of a significant reduction in duration and IQR of press duration during early acquisition of instrumental actions. Neither M1 lesions nor M2 lesions had any effect on this pattern.

Temporal differentiation of lever presses
After CRF training, the mice were trained on the duration differentiation task. All mice reached asymptotic performance after six sessions for each criterion duration (Figure 3). On each day, the median press duration value of each mouse was used as a measure of the timing of the action (see Figure 4). A mixed two-way ANOVA revealed no main effect of Lesion at any criterion (400 and 800 ms, Fs < 1; 1600 ms, F(2,95) = 2.27, p = 0.12). At the longest criterion, 1600 ms, the median duration was numerically lower for the M2 group, but this difference was not statistically reliable. Nor was there any interaction between Time and Lesion, for any duration criterion (Fs < 1). For all three criterion durations, the median duration increased over six sessions of training (main effects of Time: 400 ms, F(5,95) = 30.6, p < 0.05; 800 ms, F(5,95) = 35.4, p < 0.05; 1600 ms, F(5,95) = 11.92, p < 0.05).
The duration distributions from all three duration criteria are shown in Figure 5. To assess whether the press duration data from all three groups are scalar, i.e. whether the spread is proportional to the average duration, a coefficient of variation was calculated as IQR/median from the last day of training on each criterion duration (a measure that does not assume a Gaussian distribution). Using this value as a measure of relative spread, a two-way mixed ANOVA was conducted with Group and Criterion Duration as factors. There was no main effect of Group (F(2,38) = 2.40, p > 0.05) or of Criterion Duration (F < 1), nor any interaction between them (F(4,38) = 1.31, p > 0.05).
That the coefficient of variation did not change with increasing criterion durations suggests that the distribution of lever press durations exhibits the scalar property, at least for the three criterion durations used in this study. This observation extends the previous literature on interval timing, which has found the scalar property to be a fundamental property of the psychophysical judgment of temporal duration (Gibbon et al., 1984). Motor cortical lesions, however, did not have a significant impact on this measure.
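The nonparametric coefficient of variation and the scalar property can be illustrated with a short sketch. The durations below are hypothetical and are not data from the study, and the quartile convention (Python's "inclusive" method) is an assumption, since the text does not state which convention was used.

```python
from statistics import median, quantiles

def iqr(durations):
    """Interquartile range; 'inclusive' quartiles are an assumption here."""
    q1, _, q3 = quantiles(durations, n=4, method="inclusive")
    return q3 - q1

def coefficient_of_variation(durations):
    """Nonparametric relative spread used in the text: IQR / median."""
    return iqr(durations) / median(durations)

# Hypothetical press durations (s), not data from the study.
short = [0.40, 0.45, 0.50, 0.55, 0.60, 0.70, 0.85]
long_ = [2 * d for d in short]   # same shape, doubled time scale

# Under the scalar property the two values coincide.
print(coefficient_of_variation(short), coefficient_of_variation(long_))
```

If the spread grows in proportion to the median, as Weber's law requires, IQR/median stays constant across criterion durations, which is what the absence of a Criterion Duration effect indicates.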

Figure | Histological analysis of motor cortex lesions. Photomicrographs of the M1 (A) and M2 (B) lesions. Representative lesions as well as illustration of the largest (gray) and smallest (black) extent of lesions are shown. The diagrams are based on a mouse brain atlas (Paxinos and Franklin, 2003). The numbers indicate distance in mm from bregma.

Acquisition of serial order
Two measures were used to quantify acquisition of a left-right (LR) sequence: the proportion of LR sequences (of all possible sequences: LL, RR, LR, RL), and the conditional probability of R given L. As shown in Figure 6, M2 lesions impaired the acquisition of the LR sequence. This observation is confirmed by a two-way mixed ANOVA conducted on the conditional probability of R given L, with Lesion and Time as factors. There was a main effect of Lesion (F(2,247) = 4.7, p < 0.05) and a main effect of Time (F(13,247) = 38, p < 0.05), but no interaction between these factors (F(26,247) = 1.2, p > 0.05). As shown by a planned comparison of performance on the last day of training, the performance of the M2 group was significantly lower than that of either the M1 or the control group (p < 0.05), but there was no difference between the M1 and control groups (p > 0.05).
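The two acquisition measures can be computed from a list of trial outcomes as in the sketch below. The conditional probability of R given L is interpreted here as the probability that the second press is on the right lever given that the first press was on the left lever; that interpretation, the function name, and the example session are assumptions for illustration only.

```python
from collections import Counter

def sequence_measures(trials):
    """trials: iterable of two-character strings such as "LR"
    (first press, second press).

    Returns (proportion of LR trials, P(second = R | first = L)).
    """
    counts = Counter(trials)
    total = sum(counts.values())
    first_left = counts["LL"] + counts["LR"]
    prop_lr = counts["LR"] / total if total else float("nan")
    p_r_given_l = counts["LR"] / first_left if first_left else float("nan")
    return prop_lr, p_r_given_l

# Hypothetical session of 10 trials, not data from the study.
session = ["LR", "LL", "LR", "RL", "LR", "RR", "LR", "LL", "LR", "LR"]
print(sequence_measures(session))  # (0.6, 0.75)
```

The conditional measure ignores trials that begin on the right lever, so it isolates the choice of the second press from any bias in the choice of the first.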
The inter-response time between two actions in a correct sequence decreased significantly during training, but this measure did not differ between groups (Figure 6). A two-way mixed ANOVA conducted on data from the first and last days of training revealed a significant main effect of Time (F(1,19) = 29.2, p < 0.05),

Reversal of serial order
After 14 days of initial sequence training, the correct sequence was reversed to right→left (RL), and the mice were trained for another 14 days. Figure 7 illustrates the acquisition of the reversed serial order: the mice gradually learned to reverse the sequence. Again, M2 lesions produced a significant deficit in sequence reversal; but unlike initial acquisition, M1 lesions also impaired the learning of the opposite sequence. This conclusion

Duration of presses in a sequence
An interesting observation is the difference in duration distribution between the left (distal) and right (proximal) presses. The distal press was generally longer and more variable than the proximal press (Figure 8). A two-way mixed ANOVA was conducted on the duration data from the last day of left-right sequence training. There was no interaction between Order and Group (F(2,19) = 2.4, p > 0.05) and no main effect of Group (F(2,19) = 1.6, p > 0.05), but there was a main effect of Order (F(1,19) = 21.2, p < 0.05). Given the absence of group differences, data from all three groups were combined: the mean of the median durations was 0.26 ± 0.03 s for the left press and 0.13 ± 0.01 s for the right press, a ratio of roughly 2:1. However, as shown in Figure 9, when the serial order was reversed, the relative durations of the two presses in a sequence were only partially reversed.

Sequence acquisition and reversal in naïve mice
The difference in duration distribution between distal and proximal presses in a simple sequence is the first such observation reported in the literature. But two peculiarities of the training procedure may be responsible for the observed results: the mice were trained on duration differentiation before sequence training, and they all learned to press the left lever first. To rule out possible effects of previous training, five naïve mice were first trained to press the right lever, and then, without any specific duration training, they were trained on the same LR sequence followed by RL. These mice showed similar acquisition and reversal (data not shown). Most importantly, as shown in Figure 10, the duration distribution of the lever press depends on its relative position in the sequence. A planned comparison revealed that the median duration of the first press is significantly longer than that of the second press for both left-right (p < 0.05) and right-left sequences (p = 0.05).

Discussion
Introducing two simple techniques for studying action duration and sequencing in mice, this study reports the effects of motor cortical lesions on temporal and sequential differentiation of behavior.
The findings can be summarized as follows: (1) Selective reinforcement of an arbitrary duration or serial order produced rapid learning of these behavioral parameters in mice. (2) All mice, regardless of lesion group, reduced their press duration and variability during initial acquisition (Figure 2). (3) Motor cortical lesions did not affect the acquisition or performance of lever pressing per se. (4) Temporal differentiation of lever press duration in mice obeys Weber's law: it displays the scalar property, i.e. the spread of the distribution is proportional to the median (Figure 5). (5) Motor cortical lesions did not alter the temporal differentiation of actions, though M2 lesions resulted in a small reduction of median duration at the longest criterion duration (1600 ms; Figure 4). (6) The distal and proximal presses in the two-press sequence differ in their duration distribution: whereas the distal action is longer and more variable, the proximal action is shorter and more stereotypical (Figure 10). The first action is on average twice as long as the second, though this ratio was not affected by motor cortical lesions. (7) M2 lesions impaired the learning of serial order, but M1 lesions did not (Figure 6). Both M1 and M2, however, appeared to be critical for the reversal of serial order (Figure 7). M2 appears to be a critical neural substrate for the learning of serial order. M1 is not necessary for the acquisition of a new serial order, but is needed for flexible reversal of an acquired order. Neither area is needed for instrumental learning or performance per se. Neither area is needed for the differentiation of action duration, though M2 lesions resulted in numerically lower median durations at the highest criterion value.
Figure 0 | Sequence acquisition and reversal in naïve mice without previous temporal differentiation training. Lever press duration distributions of the first and second actions on the last day of training for left-right sequence (LR) and for right-left sequence (RL). Also shown is the mean of the median press duration of the two actions for LR and RL sequences.

Yin
Motor cortex and learning

Differentiation of actions
Differentiation is the selection of certain forms of behavior by their consequences, a fundamental learning process by which adaptive behavior is generated. Differentiation is central to what is broadly called operant or instrumental learning. Traditional studies on instrumental learning emphasize the rate of behavior, or 'rate differentiation', by selectively reinforcing higher rates of behavior. In the most extreme case, namely CRF schedules, the differentiation is between action and no action, e.g. pressing is reinforced, not pressing is not. But not all behaviors are effective on account of their rate of occurrence. Nor does rate differentiation suffice to reveal the true representational capacity of the brain. For in addition to rate, behavior has many other properties, such as force, duration, and order, which are germane to the question of 'how to do something', or skills, and therefore critical for the generation of adaptive behavior. Differentiation is a powerful method to study how actions are represented and how instrumental control (the control over consequences) is achieved through the successful selection of self-generated behavioral parameters. For successful selection to occur, it must operate on a substrate of behavioral variability, requiring some representation of the individual variants, or action representations, which can then be linked to the consequent reward. As shown in Figure 2, the initial variability in press duration was high but decreased rapidly as the mice learned to press the lever to earn rewards. This initial burst of variability is not necessary to earn reward. It is an active exploratory process by which the animal gathers information about the world.
There is no a priori reason for a press of a particular duration to be effective. Under such uncertainty, which characterizes many, if not most, of the contingencies in the animal's interaction with its environment, the optimal strategy is to produce a sufficient level of variation at a specific point in time, such as the beginning of the interaction with a lever. When experience establishes that such variability is unnecessary, it is reduced, and a specific policy yielding the most rewarding outcomes is pursued.
The variants (action representations in the broadest sense) are constrained in many ways. One constraint is cost. For example, despite a wide range of possible press durations, only relatively brief durations are actually produced, because the longer the press duration the higher the cost of action. The actually emitted duration therefore represents a compromise between maximizing reinforcement probability by exceeding the criterion and minimizing effort by producing the shortest possible duration (Skinner, 1938; Kuch, 1974). The same is true of sequence acquisition: the discrete-trial design sets a limit of two presses and therefore only four possible sequences (LL, RR, LR, RL), some of which are easier than others (e.g. LL is easier than LR).

Implications for the study of timing
Two procedures have been used to study interval timing in the range of seconds to minutes. In temporal discrimination, animals time the duration of a stimulus (Roberts, 1981;Zeiler, 1985). In temporal differentiation, animals time the duration of self-generated actions. Of the two, temporal discrimination has been much more popular, in part due to better experimental control of the duration to be timed. The present results suggest that within the range studied here (400-1600 ms), differentiated lever presses obey Weber's law, an established property of interval timing data from traditional temporal discrimination studies. Thus a link can be established between the present results and the temporal discrimination literature. An intriguing question is whether neural substrates for discrimination and differentiation are also similar. Although differentiation and discrimination are very different processes, as noted above, the scalar property observed in both may reveal common underlying mechanisms. This possibility remains to be investigated. Of great significance to electrophysiological investigations of interval timing, the differentiation procedure does not require a separate reporter of timing such as rate of lever pressing, as is the case in traditional peak interval procedures used for temporal discrimination, thus avoiding a major confounding variable, and enabling direct investigations of the scalar property of timing.

The effects of motor cortical lesions
No previous study has examined the effects of motor cortical lesions on differentiation of action duration and serial order in rodents. And of the substantial literature on the motor cortices in primates, one can find little on the role of these structures in self-initiated action in the absence of discriminative stimuli. Historically the most dominant tradition in the study of the motor cortices has used cued movements, with experimental designs that do not permit an examination of the functional role of the motor cortices in self-initiated and reward-guided action differentiation. The present study is in fact the first to report a significant functional dissociation of primary and secondary motor cortices in mice.
Motor cortical lesions did not impair the ability to generate the required variants. Mice with M1 or M2 lesions were perfectly capable of producing the required action duration or sequence. Rather the impairment, mostly observed after lesions to the M2, lies in the selection of the appropriate variants. M2 lesions resulted in numerically lower median press duration at the highest criterion duration (1600 ms), but this effect was not statistically significant, suggesting that M2 is not critical for the control of action duration. Whether M2 merely plays a secondary role in representing action duration and, if so, what this secondary role is, remains unclear. A recent study found cells in the monkey supplementary motor cortex, an area roughly comparable to M2 in mice (see below), that show spiking activity correlated with the duration of actions (Mita et al., 2009). Yet it is not known whether lesions of the supplementary motor cortex in monkeys impair control of action duration. Although present results do not indicate a critical role of the M2 in this process, further work is needed to clarify the role of M2 in temporal differentiation.
On the other hand, M2 lesions did produce a significant deficit in sequence learning and reversal (Figures 6 and 7). M2 may be functionally similar to the supplementary motor cortex in primates, lesions of which impair the learning of new sequences without causing paralysis or akinesia (Thaler et al., 1995). Moreover, cells in this area fire before the production of a specific sequence of actions (Tanji et al., 1996). The encoding of serial order, therefore, appears to be a major function of the secondary motor cortices in both primates and rodents, suggesting phylogenetic continuity in the functional role of this neocortical area.
More surprising, perhaps, is the observation that M1 lesions have no effect on acquisition and performance of lever pressing, on the differentiation of action duration, or on the acquisition of action sequences. It is possible that the M1 lesions in this study are incomplete, and that a deficit might emerge with more complete lesions. But comparable damage to the M2 did severely disrupt sequence learning and reversal, clearly dissociating the roles of these two motor cortical areas. Thus, at least in mice, M1 is not necessary for the control of action duration or for instrumental learning per se.
A previous study showed that motor cortical lesions produced only transient effects on grooming sequences, which are more innately organized action patterns in rodents (Berridge and Whishaw, 1992). The only significant effect of M1 lesions observed here was impaired reversal of an acquired sequence, supporting a role for M1 in the flexible control of learned serial order. Such impaired flexibility in reversing the serial order of behavior after damage to the primary motor cortex has not previously been reported in any species.

Differences between the proximal and distal actions in a sequence
An intriguing result is the difference in press duration between the proximal and the distal actions in a simple sequence. Experiment 2 demonstrated that the distal action is longer on average, with a correspondingly broader distribution, whereas the proximal action is shorter in duration, with a narrower distribution (Figures 8 and 9). When the sequence was reversed, so were the relative durations, though not completely, given the amount of reversal training provided here. It is therefore the relative position in a sequence that determines the relative duration distribution. The distal action is roughly twice as long as the proximal action. On a trial-by-trial basis, however, no consistent correlation was detected between the durations of the first press and the second press in a sequence (data not shown).
Long-short patterns of action duration have been reported previously in the swing ratios of human jazz performance, though it is not clear how analogous these phenomena are (Friberg and Sundstrom, 2002).
It is important to emphasize that this natural distribution of press durations in a sequence cannot be explained by trivial factors such as the sound of the pellet dispenser, which is not activated until the proximal lever press is released. A more likely explanation is suggested by previous work demonstrating differences in motivational control between the proximal and distal actions in a sequence (Balleine et al., 1995; Corbit and Balleine, 2003). Distal actions are thought to be more instrumentally controlled (i.e. by the action-reward contingency), whereas proximal responses are more susceptible to control by Pavlovian stimulus-reward contingencies. These distinct motivational systems, which are thought to depend on dissociable neural systems, may show trademark durations. Thus in a behavioral sequence, the more distal and instrumentally controlled component tends to be longer in duration than the more proximal and consummatory component under Pavlovian control. At present, however, no satisfactory explanation can be given for these differences.

Conclusion
Probing the representational capacity of the brain requires the development of precise and sensitive tools for behavioral analysis.
Here two simple operant techniques are used to assess action differentiation, more specifically the control over the duration and order of actions, and reveal for the first time distinct roles of the primary and secondary motor cortices in these behavioral parameters in mice. These analytical tools, combined with a model organism with readily available genetic tools, have considerable advantages in studying voluntary behavior and its neural substrates.

Acknowledgments
This work is supported by Duke University. I would like to thank Warren Meck for his advice and Oksana Shelest, Jay Gupta, and Alberto Lopez for their help with the experiments.