Ventral tegmental area neurons use an ensemble code for representing information about ongoing actions

Affiliations: Department of Psychiatry, University of Pittsburgh, 450 Technology Drive Suite 223, Pittsburgh, PA 15219; Department of Psychology, University of Memphis, Psychology Building, 400 Innovation Drive, Memphis, TN 38152; Department of Statistics, Carnegie Mellon University, 132 Baker Hall, Pittsburgh, PA 15213; Center for the Neural Basis of Cognition, Carnegie Mellon University and the University of Pittsburgh, 4400 Fifth Ave, Pittsburgh, PA 15213; Machine Learning Department, Carnegie Mellon University, Gates Hillman Center 8203, Pittsburgh, PA 15213; Department of Neuroscience, University of Pittsburgh, A210 Langley Hall, Pittsburgh, PA 15260.


INTRODUCTION
Organization of goal-directed behavior requires real-time monitoring of ongoing actions.
Lesions of VTA dopamine projections to the striatum or antagonists of dopamine receptors impair the capacity to perform large numbers of actions without impacting the ability to complete one (or a few) action(s) during reward-motivated behavior (Aberman & Salamone, 1999; Ishiwari et al., 2004; Mingote et al., 2005). Recordings from VTA neurons, however, have primarily focused on homogeneous responses to reward- or aversion-related events that require no or few actions (Schultz, 1998; Matsumoto et al., 2016). Little is known about how VTA neurons encode information when multiple or varying numbers of actions are required to achieve a goal.

Animals learn to execute serial actions for reward
VTA activity was recorded while rats learned to execute multiple actions for random reinforcement with one sugar pellet (Fig. 1A, B). In the first session, each action was rewarded (FR01). In session 2, the reward probability was decreased from 1 to 0.2 across 3 blocks of trials. In sessions 3 and 4, actions were reinforced at a probability of 0.2 (RR05). In sessions 5-7, actions were reinforced at a probability of 0.1 (RR10). In all randomly reinforced trials, unpredictable and varying numbers of actions were required, but each action was equally likely to be reinforced.
Action response rates increased as reinforcement probability decreased (Fig. 1C, D), and the response rate in the final RR10 sessions was significantly higher than in all other sessions (Fig. 1D; F(6,24) = 4.726, p = 0.003). We analyzed serial actions in trials reinforced with a probability of 0.1 (RR10 sessions, sessions 5-7), because this reinforcement schedule produced the greatest average number of actions performed per trial (Fig. 1C). To investigate a potential relationship between behavioral performance and action number, we divided the RR10 data into three bins: actions 1-7 (low), actions 8-14 (medium), and actions 15-21 (high). Within a trial, the inter-action interval significantly increased with increasing action number bin (Fig. 1E; F(2,44) = 11.020, p < 0.001), indicating that behavior was sensitive to action number. The latency to retrieve the reward (r(3468) = -0.017, p = 0.309) or to initiate responding in the next trial (r(3445) = 0.015, p = 0.389), however, was not correlated with binned action number. These data suggest that the number of actions performed in a trial did not affect behavioral responses to reward or cues.

Neurophysiological recordings
VTA units (n = 375, Fig. S1-S3) were recorded over 7 sessions from 10 rats. All neurons were included in these analyses because we were interested in understanding the full diversity of activity patterns in the VTA. Neurons were classified as putative 'dopamine' (n = 155) or 'non-dopamine' (n = 220) neurons based on validated criteria (Grace & Bunney, 1983; Schultz, 1998; Ungless & Grace, 2012). This classification approach permits comparison with previous work, despite potential inaccuracies (Margolis et al., 2006) and with the caveat that some dopamine neurons co-release other neurotransmitters (Tritsch et al., 2012). We did not observe strong clustering in the electrophysiological profiles of these neurons. We underscore that neurons were collected with no attempt to select for particular electrophysiological characteristics. When VTA neurons are recorded in an unbiased manner, as done here, weak clustering may be expected (Kiyatkin & Rebec, 1998). Furthermore, recent studies that optogenetically verified neuron subtypes show that VTA neurons form similar clusters when recorded in this fashion (Cohen et al., 2012). In addition to the above classification, analyses were performed on reward-responsive dopamine neurons, as this subgroup may be a more conservative estimate of dopamine neuron identity (Lak et al., 2014; Eshel et al., 2016).

VTA neurons are diversely tuned to serial actions
In order to understand how VTA neurons encode information during serial actions, we examined how activity was modulated by actions in RR10 sessions. We observed preferential responding to low, medium, or high numbered actions within the population of simultaneously recorded neurons ( Fig. 2A-C). To determine how VTA neurons responded to serial actions, we calculated each neuron's tuning curve as a function of action number within a trial (Fig. 3A).
The diversity of the tuning curves of VTA neurons suggested that network properties may be critical to information processing during serial actions. To gain insight into VTA network-based information processing, we calculated the trial-by-trial correlations in spike counts (noise correlations) and the correlations between tuning curves (signal correlations) for all simultaneously recorded pairs of neurons. Noise correlations reflect functional connectivity between neurons, while signal correlations reflect similarity in tuning curves (Cohen & Kohn, 2011). Correlations were examined between all possible pairings of VTA neurons (dopamine-dopamine, non-dopamine-non-dopamine, and dopamine-non-dopamine).
We found that noise correlations during actions were significantly lower in pairs of non-dopamine neurons than in pairs containing a dopamine neuron (Fig. 4A; F(2,858) = 4.052, p = .018).
The magnitude of these correlations was decreased during low numbered actions compared to medium or high numbered actions (F (2,1716) = 14.625, p < .001). It is unlikely that these differences were due to differences in spike count because the pairwise geometric mean spike count was not associated with noise correlation magnitude in any bin (low: r = 0.008, p = .805; med: r = -0.015, p = .650; high: r = -0.048, p = .158). In contrast, signal correlation strength did not differ between different pairings of VTA neurons ( Fig. 4B; F (2,858) = 0.289, p = .749). There was a strong association between signal correlations and noise correlations (Fig. 4B, Table 1), suggesting that there is a high degree of functional connectivity between pools of similarly tuned neurons. This result raises the possibility that different patterns of functional connectivity contribute to the diversity of tunings to actions.
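The signal and noise correlation computation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes each neuron's activity has been summarized as a trials x action-numbers matrix of spike counts, and it uses Pearson correlation throughout.

```python
import numpy as np

def signal_noise_correlations(counts_a, counts_b):
    """Signal and noise correlations for one neuron pair.

    counts_a, counts_b: arrays of shape (n_trials, n_actions) holding
    spike counts for each neuron, aligned on the same trials/actions.
    """
    # Signal correlation: similarity of tuning curves (trial-averaged
    # response as a function of action number).
    tuning_a = counts_a.mean(axis=0)
    tuning_b = counts_b.mean(axis=0)
    signal_r = np.corrcoef(tuning_a, tuning_b)[0, 1]

    # Noise correlation: trial-by-trial covariation after removing each
    # neuron's mean response to every action number.
    resid_a = (counts_a - tuning_a).ravel()
    resid_b = (counts_b - tuning_b).ravel()
    noise_r = np.corrcoef(resid_a, resid_b)[0, 1]
    return signal_r, noise_r
```

Pairs sharing trial-by-trial variability have high noise correlations even after their mean tuning is removed, which is the sense in which noise correlations index functional connectivity here.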

Actions in a trial are accurately discriminated from VTA neuronal ensemble activity
The combination of tuning curve heterogeneity, the lack of a population average signal for serial actions, and the association between action-evoked signal and noise correlations suggested that information about ongoing sequences of actions may be encoded by VTA ensembles. To quantify the accuracy of this encoding mechanism, we attempted to discriminate low, medium, and high action number bins from either population-averaged activity (Fig. 5A) or ensemble activity: serial actions were accurately discriminated from ensemble activity, but not from population-averaged activity. We next asked whether action-evoked responses instead reflected trial history. Few neurons had responses to the first action in a trial that were significantly correlated with the number of actions performed in the previous trial (Table 2). Similar proportions of these neurons were positively or negatively correlated, indicating that this relationship was inconsistent between neurons (Table 2). Likewise, very few neurons had responses to the last action in each trial that were significantly correlated with the number of actions in that trial; there was no significant difference in the proportion of dopamine and non-dopamine neurons with positive or negative correlations (Table 2).
We further investigated whether reward anticipation modulated neuronal responses by examining the relationship between responses evoked by the cue light turning off, reward delivery, or reward retrieval and the number of actions performed in each trial. We found no significant differences in the proportion of dopamine or non-dopamine neurons with positive or negative correlations between responses evoked by cue onset and the number of actions in the previous trial, or between cue light offset, reward delivery, or reward retrieval responses and the number of actions in the current trial (Table 2). These results indicate that performance of serial actions did not increase or decrease reward anticipation, supporting the notion that VTA neurons encode information about ongoing behaviors rather than about time or reward anticipation.

Reward delivery evoked responses
Reward delivery evoked greater activity in dopamine neurons than in non-dopamine neurons. After the final action in a trial, the cue light was immediately extinguished and reward was delivered 0.5 s later. We hypothesized that cue light offset would evoke VTA responses, similar to a reward prediction error, because this event signaled new information about unpredictable outcome delivery (Montague et al., 1996; Schultz, 1998). As learning progressed, cue offset evoked greater activity levels (Fig. 6E; F(6,361) = 4.776, p < .001) and activated a greater proportion of neurons (Fig. 6F; all neurons, χ²(6) = 45.949, p < .001; dopamine neurons, χ²(6) = 22.093, p = .001; non-dopamine neurons, χ²(6) = 29.744, p < .001). This finding confirms that we observed responses to cues and rewards of the kind attributed to dopamine neurons (Montague et al., 1996; Schultz, 1998).

Cue evoked responses
When animals were required to perform a single action to earn rewards (FR01), the majority of dopamine neurons encoded cue onset (Fig. 7A), consistent with previous observations (Schultz, 1998; Roesch et al., 2007). Non-dopamine neurons followed a similar pattern (Fig. 7B). During random ratio sessions, the population response to the cue decreased significantly as the response requirement increased (Fig. 7C; session, F(6,361) = 2.667, p = .015).
This pattern of responding did not differ between dopamine and non-dopamine neurons.

Cue and reward evoked responses are decoupled when performing serial actions
We assessed the correlation between each neuron's responses to cue onset and reward delivery in order to understand the relationship between encoding of these events. During FR01 sessions, responses evoked by these events were significantly correlated in both dopamine and non-dopamine neurons (Table 3). This correlation decreased in RR05 sessions (Table 3), and was no longer significant by RR10 sessions (Table 3). A similar pattern was observed in dopamine neurons that were activated by reward delivery (Table 3). This suggests that, although cue and reward responses are positively correlated in Pavlovian conditioning tasks (Eshel et al., 2016), cue-evoked responses become decoupled from reward responses during an instrumental random reinforcement schedule.

Modulation of noise correlations by cue onset and reward delivery in serial action trials
We calculated the signal and noise correlations between pairs of VTA neurons during cues and reward delivery in RR10 sessions. We found a significant interaction between pair type (dopamine pairs, non-dopamine pairs, and mixed pairs) and event. Pairwise geometric mean spike counts were not associated with noise correlation magnitude, suggesting that mean activity level did not account for this difference. Noise correlations were strongest during reward delivery overall. Noise correlations during reward delivery did not differ significantly between pairings of neurons (Fig. 8; F(2,832) = 2.794, p = .062). These data confirm previous findings that correlated activity in VTA circuits emerges when rewarding outcomes can be earned (Joshua et al., 2009; Kim et al., 2012; Eshel et al., 2016).

DISCUSSION
We investigated how VTA neurons encode information when animals execute a series of actions to earn a reward. Dopamine and non-dopamine VTA neurons were preferentially tuned to different subsets of binned action number, in a manner that allowed serial actions to be accurately discriminated from the collective activity of VTA ensembles. In addition, pairs of neurons with similar action tuning curves had higher action-evoked noise correlations, suggesting that VTA network properties may be critical to information processing during serial actions. Our analyses further showed that this pattern of tuning is distinct from reward prediction related activity or the passage of time.

Encoding information during an action series
Previous reports indicate dopamine neurons encode errors in the predicted value of rewards and encode action value (Montague et al., 1996; Schultz, 1998; Morris et al., 2006; Roesch et al., 2007). Dopamine neurons also phasically fire during the first and last action in a sequence, which may be critical to initiating and terminating behavior (Jin & Costa, 2010), and dopamine release occurs during the first action in a short sequence (Wassum et al., 2012).
Consistent with previous work (Schultz, 1998), we observed that population responses to cue offset after action completion (which predicts impending rewards) increased during learning.
The reward encoding properties of dopamine neurons notwithstanding, our data indicate that the collective activity of ensembles of VTA neurons has the capacity to encode information about ongoing behavior in real-time.
The ensemble code revealed here may be used to organize and sustain actions, which is a fundamental feature of dopamine's role in cognition (Aberman & Salamone, 1999; Ishiwari et al., 2004; Salamone et al., 2009). In particular, this signal could be critical for supporting the unique motivational and cognitive demands of trials requiring large numbers of actions (Goldman-Rakic, 1998; Aberman & Salamone, 1999; Salamone & Correa, 2002; Seamans & Yang, 2004; Robbins & Roberts, 2007; Salamone et al., 2007; Robbins & Arnsten, 2009). VTA dopamine neurons project to several networks that may utilize VTA signals during serial actions to organize behavior. For instance, dopamine innervation of the ventral striatum is selectively required for completing a lengthy series of instrumental actions, but not for executing small numbers of actions for reward (Aberman & Salamone, 1999; Ishiwari et al., 2004; Mingote et al., 2005). Dopamine projections to the prefrontal cortex are important for working memory and related constructs that require real-time information about ongoing behavior (Goldman-Rakic, 1998; Seamans & Yang, 2004; Robbins & Roberts, 2007; Robbins & Arnsten, 2009). VTA neurons could contribute to these functions by continually encoding information throughout execution of a lengthy series of actions, allowing striatal and prefrontal networks to track goal-directed effort expenditure and progress.
Our analyses suggest that VTA responses do not represent the passage of time in the current task. Although dopamine responses are sensitive to the timing of reward delivery and necessary for the perception of time (Lake & Meck, 2013; Bermudez & Schultz, 2014; Soares et al., 2016), neuronal responses were not consistently modulated by these variables. Instead, activity during a series of actions was most strongly modulated by action execution. This is likely because reward delivery was not contingent upon elapsed time, and VTA neurons are sensitive to the contingency between action execution and outcome delivery. Experimental designs with an explicit contingency between elapsed time and reward delivery, such as a fixed interval reward schedule, may reveal a more complex relationship between action execution and within-trial timing in the responses of VTA neurons.
We also found that the number of actions that animals were required to perform did not modulate VTA responses to the rewarded action or to the first action in the next trial. This indicates that recent trial history does not affect action-evoked responding at the beginning or end of a series of actions, and that the expected value of rewards is not updated on a trial-by-trial basis. Similarly, VTA responses to stimuli associated with reward were not consistently modulated by the number of actions that an animal performed, suggesting that VTA correlates of reward anticipation did not increase or decrease according to action number. This result was expected, as each action was reinforced with equal probability.

VTA ensemble encoding
Neurons with similar action number tuning curves had higher action-evoked noise correlations in spike count, which reflect shared connectivity between neurons (Cohen & Kohn, 2011). This suggests that different action number tunings may arise from unique inputs or differing connection strengths between inputs, and that VTA neurons with similar tunings may share connection properties. The VTA receives inputs from an expansive set of afferent structures (Geisler & Zahm, 2005; Geisler et al., 2007; Watabe-Uchida et al., 2012), and the diversity of inputs that converge in the VTA may contribute to preferential firing during different subsets of serial actions.
The association between noise correlations and action selectivity indicates that VTA network properties are critical to understanding how information about serial behavior is encoded. Networks can encode information redundantly or through diverse activity patterns, and heterogeneous tunings naturally lead to the capacity to encode information as ensembles.
Accordingly, serial actions were accurately discriminated from ensemble activity, but not from population-averaged activity. Though previous studies suggest VTA activity is highly redundant (Joshua et al., 2009; Schultz, 2010; Glimcher, 2011; Kim et al., 2012; Eshel et al., 2016), that work was limited to tasks requiring very few actions. The diversity of VTA tuning curves may increase to match the expanded behavioral state space of serial actions compared with the limited state space of single action trials (Eshel et al., 2016). Thus, ensemble encoding of information may emerge when the complexity of a task increases, such as during ongoing serial actions.
Both dopamine and non-dopamine neurons had comparably mixed tunings to serial actions, suggesting that dopamine neurons can cooperate with non-dopamine VTA neurons to encode information about serial actions. These ensemble signals, carried by multiple neurotransmitters, could allow information to be decoded through multiple signaling mechanisms, which may diversify the spatiotemporal properties of the signal (Seamans & Yang, 2004; Kim et al., 2010; Barker et al., 2016). The activity of different types of VTA neurons must ultimately be coordinated and unified to represent information, and the present work demonstrates how different types of VTA neurons could collectively encode information.

Conclusions
Our data reveal a novel form of information processing by VTA neurons that may explain their expansive role in cognition and motivation. The unique ensemble coding scheme that we observed allows heterogeneous groups of VTA neurons to provide a real-time account of ongoing behavior. This coding scheme may subserve the well-established role of dopamine in goal-directed actions and decision making, as well as the behavioral disorganization and amotivation associated with illnesses such as ADHD and schizophrenia.

Subjects and Apparatus
All procedures were conducted in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals.

Behavior
Each rat was given 7 days to recover from surgery and was food restricted to 90% of its free-feeding body weight. Rats were habituated to handling for 5 minutes per day for 3 consecutive days before being habituated to being handled and connected to a headstage cable in the procedure room for 2 additional days. Following habituation, rats were given a single 30-minute magazine training session in the operant chamber, in which sugar pellets were delivered on a variable time 75 s reinforcement schedule. When each pellet was delivered, the pellet trough was illuminated for 4 s. The animal's behavior had no programmed consequences in the magazine training session.
Following the magazine training session, each animal began instrumental conditioning.
During all instrumental conditioning sessions, each trial began with illumination of the nose poke port (cue light onset). This served as a discriminative stimulus signaling that reinforcing outcomes (sugar pellets) were available (termed the 'response period'), contingent upon the animal executing actions (nose pokes into the lit port). In each trial, actions were reinforced randomly, according to a predetermined probability. When an action was executed, the behavioral system controller randomly drew an outcome state (either reinforcement or no programmed consequence) with replacement, according to the probability of reinforcement. Each action was reinforced randomly and independently of the animal's action history within that trial or session. When an action was reinforced, the cue light was immediately extinguished (cue light offset) and nose pokes had no additional programmed consequences. A 0.5 s delay between the final action and outcome delivery was instituted to temporally separate these events, as done in previous work (Schultz et al., 1993). Following this delay, the outcome was delivered to the animal and the food trough was illuminated. Outcomes were delivered into the food trough from a standard pellet magazine via a stepper motor and dispenser. The food trough remained illuminated, and the task did not progress, until the animal retrieved the outcome. Once the animal retrieved the outcome, a variable length intertrial interval (ITI) of 10-12 s was initiated. In each session, 180 trials were administered.
In the first instrumental conditioning session, actions were reinforced with a probability of 1 (each action was reinforced), equivalent to a fixed ratio 1 (FR01) reinforcement schedule. In the second session, the probability that an action was reinforced was decreased across three blocks of trials. In the first block of 60 trials, actions were reinforced with a probability of 1 (FR01). In the second block of 60 trials, each action had a 1 in 3 chance of being reinforced (random ratio 3, RR03). In the third block of 60 trials, the probability was further decreased to 0.2 (random ratio 5, RR05). In sessions 3 and 4, actions were reinforced with a 0.2 probability for all trials (RR05). In sessions 5-7, actions were reinforced with a probability of 0.1 for all trials (random ratio 10, RR10). In all trials but the FR01 trials, animals were required to execute an unpredictable, varying, and randomly determined number of actions per trial. Random reinforcement was utilized to limit the animal's ability to correctly anticipate reward delivery.
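The reinforcement rule above (an independent Bernoulli draw per action, with replacement) can be sketched as a short simulation; the function name and structure are illustrative, not part of the behavioral control software.

```python
import random

def run_trial(p_reinforce, rng=random.Random()):
    """Simulate one random-ratio trial: each action is reinforced
    independently with probability p_reinforce, regardless of action
    history; the trial ends at the first reinforced action.
    Returns the number of actions performed in the trial."""
    n_actions = 0
    while True:
        n_actions += 1
        if rng.random() < p_reinforce:   # independent draw per action
            return n_actions

# The number of actions per trial is geometrically distributed, so
# RR10 (p = 0.1) requires ~10 actions per trial on average, while
# FR01 (p = 1) always requires exactly one.
```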
Actions differed from each other mainly in terms of their location within the action series in each trial (the action number within a trial, e.g. 1st action, 2nd action, 3rd action, etc.). In each trial, each animal's action rate was calculated as the number of actions divided by the duration of the response period. This served as a measure of behavioral conditioning and performance.
Changes in behavior across sessions were assessed with repeated measures analysis of variance (ANOVA), and repeated measures contrasts were applied as appropriate (Kass et al., 2014). The time interval between each action (inter-action interval) was measured for bins of low, medium, and high numbered actions (actions 1-7, 8-14, and 15-21). Statistical differences between binned inter-action intervals were assessed with repeated measures ANOVA. This measured the animal's behavioral sensitivity to increasing action numbers throughout a trial. To examine the effects of the number of actions performed in a trial on fatigue, motivation, or attention, the number of actions performed in each trial was correlated with the latency to retrieve the reward or to initiate the next trial.
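The inter-action interval binning can be sketched as below. This is a hypothetical reconstruction: the source does not specify whether an interval is assigned to the action that precedes or follows it, so here each interval is assigned to the later action's bin.

```python
import numpy as np

def binned_inter_action_intervals(action_times):
    """Mean inter-action interval for the low (actions 1-7),
    medium (8-14), and high (15-21) action-number bins of one trial.

    action_times: sorted times (s) of each action in the trial.
    Interval k (between actions k and k+1) is assigned to action k+1.
    """
    intervals = np.diff(action_times)
    out = []
    for lo, hi in [(2, 7), (8, 14), (15, 21)]:   # 1-based action numbers
        seg = intervals[lo - 2 : hi - 1]
        out.append(float(seg.mean()) if seg.size else float("nan"))
    return out
```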

Histology
Following the completion of experiments, animals were perfused with saline and brains were extracted. Each brain was stored in a mixture of sucrose and formalin. The brains were then frozen and sliced into 60 µm coronal sections on a cryostat before being stained with cresyl violet. The location of each implant was histologically verified under a light microscope according to Swanson's brain atlas (Swanson, 2004). Animals were excluded if the electrode location could not be confirmed in the VTA.

Electrophysiology
During experiments, animals were attached to a flexible headstage cable and motorized commutator that allowed the animal to move freely about the operant chamber, with minimal disruption of behavior (Plexon, Dallas, TX). Neural data were recorded via the PlexControl software package, operating a 64-channel OmniPlex recording system (Plexon, Dallas, TX).
Neural data were buffered by a unity gain headstage and then a preamplifier. The digitized broadband signal was then band-pass filtered (100 Hz to 7 kHz). High-pass filtering can affect spike waveform shapes and neuronal identification, but with freely moving animals it is necessary to apply these filters to remove artifacts from the neuronal signal (Ungless & Grace, 2012). The filter pass bands utilized in the current manuscript are consistent with those previously used to record from dopamine-containing brain regions (Schultz et al., 1993; Fiorillo et al., 2003; Tobler et al., 2005). Data were digitized at 40 kHz and continuously recorded to hard disk. Voltage thresholds were applied to the digitized spike data offline (Offline Sorter, Plexon, Dallas, TX). Single units were sorted using standard techniques and were utilized only if they had a signal-to-noise ratio in excess of 2:1 and were clearly separated from noise clusters and other single unit clusters.
A VTA neuron was classified as dopaminergic if it had broad action potentials, greater than 1.4 ms in duration, and a mean baseline firing rate of less than 10 Hz. These criteria are similar to those used in previous studies (Hyland et al., 2002; Fiorillo et al., 2003; Anstrom & Woodward, 2005; Pan et al., 2005; Tobler et al., 2005; Anstrom et al., 2007; Totah et al., 2013).
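The two-criterion classification above reduces to a simple rule; a minimal sketch (function name and argument units are ours):

```python
def classify_unit(spike_width_ms, baseline_rate_hz):
    """Putative classification used here: 'dopamine' if the action
    potential is broad (> 1.4 ms) and baseline firing is slow
    (< 10 Hz); otherwise 'non-dopamine'."""
    if spike_width_ms > 1.4 and baseline_rate_hz < 10.0:
        return "dopamine"
    return "non-dopamine"
```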

Neuronal Data Analysis
Each single unit's spike times were binned into spike counts (0.025 s bins) within a trial.
Binned spike counts were aligned to all events (e.g., cue light onset, actions, the time period between cue light offset and outcome delivery, and outcome delivery). A stable, four-second portion of the ITI (5 s to 1 s prior to cue light onset) served as the neuronal activity baseline. Single unit firing rates were Z-score normalized relative to baseline and zero-centered before activity was averaged together. Each unit's normalized activity was examined in 0.250 s windows around events (cue onset: +0.050 to +0.300 s relative to cue onset; action: -0.125 to +0.125 s relative to the time of action execution; time period between cue offset and outcome delivery: +0.150 to +0.400 s relative to execution of the last action; outcome delivery: +0.050 to +0.300 s relative to delivery).
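The normalization can be sketched as below. This is a simplified illustration: for compactness the baseline window is expressed relative to the same event as the response window, whereas the original analysis anchored the baseline to cue light onset.

```python
import numpy as np

def zscore_event_response(spike_counts, bin_s=0.025, event_time=5.0,
                          window=(0.05, 0.30), baseline=(-5.0, -1.0)):
    """Z-score of windowed activity around an event relative to a
    baseline period.

    spike_counts: 1-D array of binned counts for one trial.
    event_time: the event's offset (s) from the start of the array.
    """
    t = np.arange(spike_counts.size) * bin_s - event_time
    base = spike_counts[(t >= baseline[0]) & (t < baseline[1])]
    mu, sd = base.mean(), base.std()
    evoked = spike_counts[(t >= window[0]) & (t < window[1])]
    return (evoked.mean() - mu) / sd if sd > 0 else 0.0
```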
To assess between-session changes in population-level evoked activity, windowed activity was compared with a between-groups two-way ANOVA, with session number and neuron type (dopamine or non-dopamine) as grouping variables. In all cases, protected Fisher's least significant difference tests were applied as appropriate.
A unit was classified as being activated or suppressed by an event if it met two criteria: 1) a significant paired samples t-test comparing raw (non-normalized) baseline firing rates with raw evoked firing rates, and 2) three or more consecutive bins of mean activity within the event window that were in excess of a 95% confidence interval around the baseline mean. With respect to a given task event, the proportions of units classified as activated or suppressed were calculated. Differences in the proportions of activated units between sessions were compared with a Chi-squared test of independence. As expected, insufficient numbers of suppressed neurons (in some sessions zero) were obtained to permit reliable statistical analyses of this class of responses. Suppressed neurons are plotted for clarity, but not analyzed in detail.
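The activation rule can be sketched as follows for the 'activated' case. This is an illustrative reconstruction: `ci_upper` stands in for the upper bound of the 95% CI around the baseline mean, computed elsewhere.

```python
import numpy as np
from scipy import stats

def is_activated(baseline_rates, evoked_rates, window_bins, ci_upper):
    """Two-criterion sketch: (1) a significant paired t-test between
    raw baseline and evoked firing rates (with an increase), and
    (2) at least 3 consecutive event-window bins of mean activity
    above the 95% CI of the baseline mean (ci_upper)."""
    t, p = stats.ttest_rel(evoked_rates, baseline_rates)
    if p >= 0.05 or t <= 0:
        return False
    run = best = 0
    for above in np.asarray(window_bins) > ci_upper:
        run = run + 1 if above else 0   # track the longest supra-CI run
        best = max(best, run)
    return best >= 3
```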
The terminology "action-evoked" neuronal responses refers to activity around the time of action execution, without assuming that the action is solely responsible for evoking this neuronal response. Each unit's activity was examined as a function of action number (a unit's mean response to each nth numbered action within a trial, across all trials). These analyses yielded each unit's action number tuning curve (Fig. 3A). Because evoked firing rates could span a large range of values, each tuning curve was scaled so that the maximum evoked activity was equal to 1 and the minimum evoked activity was equal to 0. Scaled tuning curves were used only for visualization.
To compute the number of maxima, each tuning curve was fit with a cubic smoothing spline (smoothing parameter 0.001) and maxima were detected as points in the spline with derivatives equal to 0.

Population Average Decoder
A decoder classified binned action number according to the trial-averaged spike count, also averaged across neurons, on an action-by-action basis. This is called the 'population average decoder.' Binned action number, b, is defined as 3 bins of 7 consecutive actions (1-7, 8-14, or 15-21). The classifier maximizes the posterior P(b | r̄) ∝ P(r̄ | b) P(b). Here P(r̄ | b) represents the probability of the population-averaged response r̄ occurring, given a particular action bin. The probability of an action belonging to an action bin is denoted by P(b) and is uniform across action bins in the current work; that is, the prior probability for each bin is 1/3. Testing was performed on held-out data: r̄* is the population average of the held-out action number, and b̂* is the resulting estimated action bin. Testing is repeated for each action number.
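With a uniform prior and Gaussian class-conditionals sharing one variance, maximizing the posterior over bins reduces to nearest-class-mean classification of the scalar population average. A minimal sketch under those assumptions (the data layout is hypothetical):

```python
import numpy as np

def decode_population_average(train_x, train_bins, test_x):
    """Population-average decoder sketch.

    train_x: scalar population-averaged spike counts for the training
    actions; train_bins: their action bins. Picks the bin whose
    training mean is closest to test_x (MAP under equal priors and a
    shared Gaussian variance)."""
    bins = np.unique(train_bins)
    mus = np.array([train_x[train_bins == b].mean() for b in bins])
    return int(bins[np.argmin((test_x - mus) ** 2)])
```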

Ensemble Decoder
The population average r̄ may not be a sufficient characterization of the vector of per-neuron responses r = {r_i}, i = 1, ..., n; if so, P(r̄ | b) does not thoroughly represent the information in P(r | b). Therefore, we instead assume relevant information is captured by a higher-dimensional representation: W r | b ~ N(μ_b, Σ). The projection W, estimated by principal components analysis, is necessary to ensure that the covariance Σ can be estimated by the usual pooled sample covariance Σ̂. This maximizes the dimensionality of the space into which we project r, for a given sample size. We then maximize P(b | W r). This maintains the varying firing rates in each unit elicited by actions within different bins. Note that the population average decoder is the special case of this approach wherein W is a row vector of ones. The projection and classifier were learned with each neuron's set of evoked trial-averaged spike counts as a feature. Each action's set of observed trial-averaged spike counts, the new test observations r*, were then projected via the weights estimated from the training data and classified by maximizing the analogous posterior; training on the remaining action numbers and testing on the held-out action was repeated for each action number.
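The ensemble decoder (PCA projection followed by a Gaussian classifier with a pooled covariance and uniform prior) can be sketched as below; shapes and names are illustrative, not the authors' implementation.

```python
import numpy as np

def ensemble_decoder(train_X, train_bins, test_x, n_components=3):
    """Ensemble decoder sketch: project neuron-count vectors onto the
    leading principal components (the projection W), then classify
    with Gaussian class-conditionals N(mu_b, Sigma) sharing one pooled
    covariance and a uniform prior. The population average decoder is
    the special case where W is a row vector of ones."""
    mu = train_X.mean(axis=0)
    _, _, Vt = np.linalg.svd(train_X - mu, full_matrices=False)
    W = Vt[:n_components]                 # PCA projection from training data only
    Z = (train_X - mu) @ W.T
    bins = np.unique(train_bins)
    means = np.array([Z[train_bins == b].mean(axis=0) for b in bins])
    # pooled within-class covariance of the projected training data
    resid = np.vstack([Z[train_bins == b] - means[i]
                       for i, b in enumerate(bins)])
    Sinv = np.linalg.pinv(np.cov(resid.T))
    z = (test_x - mu) @ W.T
    # equal priors + shared covariance: smallest Mahalanobis distance wins
    d = [float((z - m) @ Sinv @ (z - m)) for m in means]
    return int(bins[int(np.argmin(d))])
```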

Statistical testing of the decoders
To test whether the two decoders produce equal or different classification accuracy, we fit a binomial generalized linear model and controlled for action number, session, and unit type (dopaminergic versus non-dopaminergic). For each trial of an action bin tested on a decoder, the log odds of the probability of correct classification was assumed to be linear in its covariates, as given by the logistic regression function logit(P(ĉ = 1 | D, B, S, U)) = β₀ + β_D·D + β_B·B + β_S·S + β_U·U, where ĉ is an indicator signifying correct or incorrect classification, and D, B, S, U are indicators (or sets of indicators) for the decoder, action bin, session, and dopaminergic unit, respectively. Likelihood ratio tests were used to test the effect of each group of factors, corresponding to each feature. Additionally, statistical significance of decoder performance (versus chance levels of correct classification) at each action number was determined via permutation test. Approximations to the exact p-values were determined by assigning classes to each action according to a uniform prior distribution, and calculating the proportion of sets of resulting classifications that perform better than the classification rates observed using the previously described methods.
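The chance-level permutation test can be sketched as follows (the GLM comparison would additionally require a logistic-regression fit, which is omitted here). Under a uniform prior over the 3 bins, the accuracy of a random labelling of n actions is Binomial(n, 1/3)/n:

```python
import numpy as np

def permutation_pvalue(correct, n_classes=3, n_perm=10000, seed=0):
    """Approximate permutation p-value for decoder accuracy versus
    chance: repeatedly assign each action a class drawn uniformly at
    random and count how often the shuffled accuracy meets or exceeds
    the observed accuracy."""
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct)
    observed = correct.mean()
    n = correct.size
    # shuffled-label accuracy is Binomial(n, 1/n_classes) / n
    null = rng.binomial(n, 1.0 / n_classes, size=n_perm) / n
    return float(np.mean(null >= observed))
```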
Table 1.

Pair                          Rho     p value
Dopamine - Dopamine           0.571   < 0.001
Dopamine - Non-Dopamine       0.569   < 0.001
Non-Dopamine - Non-Dopamine   0.528   < 0.001

Actions (nose pokes into a lit port) were reinforced probabilistically with sucrose pellets (rewards). In session 1, each action was reinforced (fixed ratio 1, FR01). In session 2, reinforcement probability decreased to 0.2 across 3 trial blocks (transition, TRANS). In sessions 3-4, the probability of reinforcement was 0.2 (random ratio 5, RR05). In sessions 5-7, the probability of reinforcement was 0.1 (random ratio 10, RR10). At trial start, a cue light was illuminated until the outcome was earned (blue arrow). When an action was reinforced, the cue light was extinguished immediately and the outcome was delivered 0.5 s later. (B) Random reinforcement led to different numbers of serial actions performed per trial. (C) Mean ± SEM number of actions required per trial. Shaded area depicts RR10 sessions. Serial action data were drawn from these sessions because they required the greatest average number of actions per trial and thus had the greatest statistical power. (D) Mean ± SEM action rate in each session. (E) Mean ± SEM inter-action intervals in low, medium, and high bins (actions 1-7, 8-14, and 15-21) during RR10 sessions.

Dashed lines represent decoding of shuffled control data. The inset to the right depicts performance of the ensemble and population-average decoders across consecutive sessions in dopamine and non-dopamine neurons. Note that ensemble activity was decoded significantly more accurately than population-averaged activity and the shuffled control. Inset data are collapsed across high, medium, and low action numbers and depicted separately for each session.

Purple points represent spikes that were assigned to unit A, and gray points represent noise that was not sorted into single-unit spikes.
A raw voltage trace (band-pass filtered between 100 Hz and 7 kHz) corresponding to the same unit is depicted in the middle column. Examples of spikes belonging to unit A are notated in the trace. Unit A represents a typical cue-responsive unit. Raster plot depicts the unit's response aligned to cue onset (right). Each dash represents a single spike, and each row represents a single trial (first trial in the top row). Note the increased spike density just after cue onset (time 0) across all trials. (B) Representative cue-offset-responsive, non-dopaminergic neuron. Data plotted with the same conventions as (A). In this example recording, two units were simultaneously recorded (left). Raster (right) depicts data during the time period between cue offset and reward delivery (-0.5 to 0 s), with spikes aligned to the time of reward delivery. Note the consistent increase in spike density after cue offset and preceding outcome delivery. (C) Data from a representative non-dopaminergic, outcome-delivery-responsive neuron, plotted with similar conventions as (A). Raster (right) depicts neuronal activity aligned to the time of outcome delivery. Note the consistent delivery-evoked response. (D) Data from a typical dopaminergic neuron that preferred low action numbers. Spike sorting (left) plotted with similar conventions as (A). Several units were simultaneously recorded, and the example voltage trace contains spikes from multiple units (middle). Only the spikes corresponding to unit D (yellow) are notated. The raster (right) shows spikes aligned to the time of action execution (time = 0). Each row of the raster represents one action-evoked response, and rows are arranged by action number. Each arrow on the right represents an action number. For each action number, the earlier occurrences of an Nth-numbered action are arranged toward the top.
Thus, the first row of the raster represents the first occurrence of action number 1, the second row represents the second occurrence, and so on. The inset depicts the tuning curve across action numbers 1-20. Note that the neuron most strongly prefers actions 1 and 2, which is reflected in the tuning curve (inset) and the spike density in the raster.