Neural Representation of Motor Output, Context and Behavioral Adaptation in Rat Medial Prefrontal Cortex During Learned Behavior

Selecting behavioral outputs in a dynamic environment is the outcome of integrating multiple information streams and weighing possible action outcomes with their value. Integration depends on the medial prefrontal cortex (mPFC), but how mPFC neurons encode information necessary for appropriate behavioral adaptation is poorly understood. To identify spiking patterns of mPFC during learned behavior, we extracellularly recorded neuronal action potential firing in the mPFC of rats performing a whisker-based “Go”/“No-go” object localization task. First, we identify three functional groups of neurons, which show different degrees of spiking modulation during task performance. One group increased spiking activity during correct “Go” behavior (positively modulated), the second group decreased spiking (negatively modulated) and one group did not change spiking. Second, the relative change in spiking was context-dependent and largest when motor output had contextual value. Third, the negatively modulated population spiked more when rats updated behavior following an error compared to trials without integration of error information. Finally, insufficient spiking in the positively modulated population predicted erroneous behavior under dynamic “No-go” conditions. Thus, mPFC neuronal populations with opposite spike modulation characteristics differentially encode context and behavioral updating and enable flexible integration of error corrections in future actions.

Motor behavior also shows neurophysiological correlates in the mPFC (Horst and Laubach, 2013;Pinto and Dan, 2015;Amarante et al., 2017) and disturbing the mPFC network can lead to reduced attention performance and inappropriate motor output Luchicchi et al., 2016;Kamigaki and Dan, 2017). However, it is currently unknown how the mPFC controls and encodes this motor behavior or whether it encodes just a copy of the motor signal (efference copy). Similarly, we know that inhibitory control and top-down increase of stimulus discrimination depend on mPFC function Pinto and Dan, 2015;Wimmer et al., 2015), yet we do not know how these are encoded. The executive function of the mPFC in adaptive behavior occurs on short timescales implying that the mPFC should integrate trial outcomes and adapt behavioral strategies on a trial-by-trial basis (Narayanan and Laubach, 2008;Euston et al., 2012;Horst and Laubach, 2012;Narayanan et al., 2013;Pinto and Dan, 2015), nevertheless we do not know how these are represented by mPFC neuronal spiking.
To identify neurophysiological correlates of sensoryguided and adaptive motor behavior, we recorded activity of mPFC neurons extracellularly (Csicsvari et al., 2003;Rossant et al., 2016) in adult rats performing a whiskerbased ''Go''/''No-go'' object localization task (O'Connor et al., 2010a;Pammer et al., 2013). We present four findings on populations of putative pyramidal mPFC neurons, which show modulation of their spike rates: (1) during correct performance of the task; (2) during motor output with and without the intent to collect reward; (3) when changing behavior after mistakes; and (4) as a function of increasing task difficulty.

Animal Welfare Statement
This study was carried out in accordance with European and Dutch law and approved by the animal ethical care committee of the VU Amsterdam and VU University Medical Center, Netherlands (protocol INF-14-08).

Surgery
Male Wistar rats (250-350 g, 8-12 weeks) were implanted with a headpost. Preoperatively Baytril (5 mg/kg), Temgesic (buprenorphine 0.05 mg/kg) and carprofen (5 mg/kg) were administered subcutaneously. Animals were anesthetized with 2% isoflurane for induction, placed on a heating mat and fixed into a stereotactic frame, after which the isoflurane concentration was reduced to 1.5% for maintenance. The hair on the head was removed with hair clippers and lidocaine was injected for local analgesia (200 µl, 2% s.c.). An incision was made in the skin, the underlying tissue was removed and the skull cleaned. Next, the skull was etched with Gel Etchant (Kerr Dental, Visé, Belgium) and cleaned. Bonding agent (Optibond, Kerr Dental, Visé, Belgium) was used to enhance adhesion between dental cement (Tetric evoflow, Ivoclar Vivadent, Schaan, Liechtenstein) and the bone. Four screw holes were drilled (1× occipital bone, 1× contralateral parietal bone (above V1), 1× frontal bone (above the olfactory bulb) and 1x lateral of the temporal ridge) and stainless steel head screws were inserted to increase stability of the head cap. The craniotomy for electrophysiological mPFC recordings was drilled at 2.5 mm anterior and 0-1.2 mm lateral of bregma and recordings were targeted to the ventral anterior cingulate (AC), prelimbic (PL) and dorsal infralimbic (IL) cortex (at 1,900-3,500 µm from pia on the dorso-ventral axis).
Behavioral Task: Whisker Based Head-Fixed "Go"/"No-Go" Task To study the neural basis of cognitive behavior in the mPFC, we adapted a whisker-based object localization (''Go''/''No-go'') task for head-fixed rats. The task was adapted for rats from similar tasks in mice (O'Connor et al., 2010a;Pammer et al., 2013) and development during the start-up phase was facilitated by expert input from Dr. Karel Svoboda (HHMI, Janelia farm, Ashburn, VA, USA) and Dr. Cornelius Schwarz (CIN, Tübingen, Germany). To drive motivation for optimal task performance, the rats were maintained on temporary water restriction such that they earned all their water by performing the task (weekly schedule of 5 days training, 2 days ad libitum access). Rats learned to distinguish two locations of a cue-pole and were rewarded when responding with motor output (i.e., licking) to the ''Go'' location. To promote discrimination and avoid simple detection strategies (e.g., keeping whiskers on one of the two locations), both locations were placed on the same azimuth of the resting position of the C1 whisker, but were spaced 4 mm on the proximal-distal axis ( Figure 1A). During behavioral training, rats learn to lick for a reward when the object was in the proximal (''Go'') position and refrain from licking in the distal (''No-go'') position. To monitor the rats' health and water intake, both daily FIGURE 1 | Experimental approach. (A) Schematic of the head-fixed rat. Licking when the object was in the "Go" position (dark blue) was rewarded with water and called a Hit trial (blue), while refraining from licking was called a Miss trial (purple). When the object was in the "No-go" position (brown-red) the rat should refrain from licking to make a Correct Rejection (CR; orange), while licking in this condition was a False Alarm (FA; red) and punished with a tone and time out (TO). (B) Block diagram showing the sequence of actions during trials. The object moved from a position in the middle between "Go" and "No-go", while out of reach of the whiskers. Subsequently, the pole moves up, the rat was allowed to lick during the sampling period (1 s), but licking was neither punished nor rewarded. During the 1 s decision window licking triggers an outcome (rewarded or punished for "Go" or "No-go", respectively). Colors as in (A). (C) The licking behavior of a rat during a single behavioral session. (C1) Licking raster plot of the 180 trials that were performed during this session, sorted by trial outcome and color coded as in (A). Every (Continued) FIGURE 1 | Continued tick is a lick that was counted as a beam break in the lick port. We intermixed 10% Catch trials without the object (green (H2) PETH of spiking of the unit from (H1) around "Go" trial outcomes with baseline subtracted. Horizontal blue bar is the window during which water is available from the lick port. (H3) As (H2) for "No-go" trials. Horizontal red bar is the window in which the tone is broadcast, the horizontal black bar shows the duration of the TO.
water volume consumed and the weight of the rat were closely monitored and behavioral abnormalities were registered. When the rat did not drink enough during training and consequently lost weight, a small volume (typically 2-4 ml) of water was offered to the rat.

Training Regime
Trials started with the pole outside the reach of the whiskers in a neutral position, exactly in the middle of the ''Go'' and ''No-go'' positions to reduce detectability of the pole location by cues other than whisker touch (e.g., timing or sound cues). Trial identity varied randomly, but consecutive trials were limited to four of the same ''Go''/''No-go'' identity. After trials started, the object moved to either the ''Go'' or ''No-go'' position. The pole then moved up into the whisker field by valves driven by air pressure, which produced a soft clicking sound upon action. After the pole was fully up, a 1 s grace period started for sensory acquisition during which licking was allowed, but not rewarded or punished. Rats repeatedly touched the object with their whiskers during both ''Go'' and ''No-go'' trials (Supplementary Figure S1). Following the grace period, the rat had 1 s to lick if it decided the pole was in the ''Go'' position. Licking was recorded by beam breaks of an infrared laser beam in the lick port (Sunx, West Des Moines, IA, USA). When rats licked during ''Go'' trials, the trial was labeled a Hit trial and a 20-40 µl water was given (two drops separated in time by 1 s). When rats licked during ''No-go'' trials, the trial was labeled a False Alarm (FA) trial and the mistake was signaled with a 1 s 12 kHz sound at 70 dB followed by a time out (TO) period (5-7 s depending on previous session '' Nogo'' outcome). In addition, the probability of another ''No-go'' trial was increased after FA errors to discourage ''always-lick'' strategies (fraction ''Go'' trials, 0.25). When the rat did not lick in ''Go'' or ''No-go'' trials, the trial outcome was labeled as Miss or Correct Rejection (CR), respectively. Trials without licking were not rewarded nor punished ( Figure 1B) and all trials ended with the pole moving back to the neutral position. We used a variable inter-trial interval period of 1.5, 2 or 3 s to reduce predictability of task-timing. Rats were trained until they reached performance accuracy above 70% for ''Go'' and ''No-go'' trials collectively ((Hit + CR)/(''Go'' + ''No-go'' trials) * 100%). Whisker object-touches were comparable between ''Go'' and ''No-go'' trials (Supplementary Figure S1). To confirm that the rats use whisker-guided sensory cues for decision making, we intermixed 10% catch trials (no object) in which the valves to control the pole z-position made similar sounds, but the pole did not move within reach of the whiskers. Licks were not rewarded or punished during Catch trials and rats usually did not lick during Catch trials. After reaching threshold for task performance, the rats daily performed three types of sessions in random order. Apart from the regular session type, we changed the task in a second session type such that the ''No-go'' location was randomly selected from an Easy, Normal or Hard position (2, 4 and 8 mm from the ''Go'' position, respectively; Figure 6A). This manipulation allowed us to test mPFC physiological dependence on changes of stimulus contrast. In the third session type we changed the reward or punishment during random trials: in a subset of Hit trials, we doubled the water reward to investigate scaling of reward related spiking with reward magnitude. In a different subset of Hit trials, we delayed the delivery of the reward with 1 s to distinguish between the neurophysiological correlate of simple licking-induced motor activity and actual reward feedback and consumption. In the same session type we used trials in which the punishment signal was delayed with 1 s.

Electrophysiology
We used the Open Ephys data acquisition board with two RHD2132 digital interface chips (Intan Technologies, Los Angeles, CA, USA) and the open ephys GUI (Siegle et al., 2017) to record the electrophysiological activity of the mPFC. We used 4-shank 64-channel silicon probes (Cambridge Neurotech, Cambridge, UK). On these probes, 16 channels were clustered per shank in two parallel columns so that each channel has a distance of 25 µm to its neighbors. The shanks were spaced at 250 µm and were placed on the lateral-medial axis, i.e., throughout the layers of the mPFC. We saved the data as 16-bit integers at 30 kHz. Afterwards, we high-pass filtered the data and combined the 16 channels of each shank for automated clustering using Klustakwik (Rossant et al., 2016). We manually curated the clusters to get stable and well-isolated single units. We used stringent cut-offs; a unit was considered well-isolated if it had an isolation distance (ID) > 40, L-ratio <1, the cluster exceeded 300 spikes and the fraction of interspike intervals below 1.5 ms was <0.1%. This resulted in an average yield of 0.2 unit/recording site. Next, analysis was done on the average waveforms of all well-isolated units to further sub-classify units as regular spiking units (RSUs) vs. fast spiking units (FSUs; Barthó et al., 2004). Since FSUs (AP peak-to-trough time <0.5 ms and AP half-peak time <0.25 ms) represent a vastly heterogeneous population with a broad spectrum of functional and morphological characteristics, units with fast spiking waveforms were excluded from further analyses. Data obtained from three rats passed behavioral and electrophysiological criteria for subsequent data analysis.
Two distinct methods to place the silicon probe were used. In two rats we implanted the silicon probe on a nano-drive (Cambridge Neurotech, Cambridge, UK) with the electrodes in the dorsal mPFC. After the rat was trained we moved the probe down through the dorsal-ventral extent of the mPFC to record at multiple locations while rats were performing the behavioral task. In one rat, we positioned an acute silicon probe (same lay-out as the chronic probes) in the mPFC with a Luigs and Neumann manipulator (Ratingen, Germany).

Perfusion, Slicing and Histology
After data acquisition was completed, rats were deeply anesthetized with 3% isoflurane and urethane (i.p. 10 mL/kg 20%). Next, rats were perfused transcardially with 0.9% NaCl followed by 4% paraformaldehyde (PFA). Brains were extracted and stored overnight in 4% PFA in phosphate buffer (PB) at 4 • C after which they were transferred to 0.05 M PB for further processing.
Brains were sliced into 100 or 50 µm coronal sections with a vibratome, rinsed with PB (0.05 BM) and mounted using a Mowiol solution (Clairant GmbH, Frankfurt am Main, Germany; Narayanan et al., 2014). Probe placement was verified visually using an Olympus BX51 microscope with a 4× air objective or 40× oil objective. For chronic recordings, we used the electrode tract to retrieve probe placement. For acute recordings, we dipped the probe in DiI (Thermo Fisher Scientific, Waltham, MA, USA) and used a X-Cite 120 Q light-source (Excelitas Technologies Corp., Waltham, MA, USA), to visualize the electrode tract.

Behavioral Quantification
Performance was quantified either as accuracy (number of correct over total number of trials of a type; Figure 1D), or as proportion of trials with licking in the response window for both ''Go'' and ''No-go'' trials ( Figures 1E, 5A, 6B). Each lick was defined as the first time the infrared beam in the lick port was broken if it had remained unbroken for at least 10 ms. Licking Peri-event time histograms (PETHs) were made by binning licking time-stamps in 200 ms bins and averaging the detected licks per trial type.
To distinguish between spiking during motivated licking for a reward or during randomly emitted licks, spiking profiles for intra-trial vs. extra-trial licking bouts were computed. A licking bout was defined as the time between the first lick and last lick if all inter-lick-intervals in between were less than 1 s. We then defined a licking bout to be intra-trial if it started between 2 s before to 0.2 s after the outcome of a trial and extra-trial if it was outside of these windows. We then computed PETHs with 200 ms bins and from −3 s to +3 s from lick bout start for both the intra-trial and extra-trial licking bouts (spiking in Figures 3A3-C3, licking in Figure 3D3).
Quantification of spike rates per licking condition (i.e., no lick, intra-trial and extra-trial lick bouts) was done per unit as an average during all lick bouts (or all non-lick periods, Figure 3E). The distinction between intra-and extra-trial licking was made to have a measure for motor control (extra-trial licking) vs. a measure for the mixture of sensory, motivational, reward-driven and motor activity (intra-trial licking). To rule out that differences in licking behavior during intra-vs. extratrial licking bouts would underlie differences in spiking between conditions, we matched individual extra-trial licking bouts based on equal number of licks and a licking frequency (# licks / [time between first and last lick]) within 1 Hz of the extra-trial bout (Figure 4). To test correlations between spike rates and licking behavior of the rat, we took all licking bouts with more than one lick. We quantified the licking frequency between the first and last lick of the bout as well as the spike frequency for each unit. We then performed Pearson's correlations between the spike rate and licking frequency for each unit. Finally, we split the (positively, negatively and unmodulated) populations in a licking-frequency correlated group and an uncorrelated group and made licking-bout triggered PETHs for all six groups (Supplementary Figure S2).

Statistical Analysis
A bootstrapping method was used to determine whether spiking activity of individual units was modulated by a specific segment of correct behavioral performance. First, we aligned the spiking activity of an individual unit to the Hit time stamps (first lick within decision window of a ''Go'' trial) and computed the ∆ spike rate (mean spike rate in the 1 s windows before Hit outcomes minus mean spike rate during the session) of each unit during the 1 s before these Hit outcomes (mock example in Figure 2A). Subsequently, an equal number of random ''Hit'' time stamps were generated throughout the recording session and the mean ∆ spike rate in the 1 s before these randomized ''Hit'' time stamps was determined. This procedure was repeated 1,000 times after which a distribution of ∆ spike rate to the randomized triggers was constructed and compared to the ∆ spike rate upon true Hit time stamps. If the recorded ∆ spike rate of a unit was in the 1st percentile of its (randomized) bootstrap distribution it was grouped in the negatively modulated spike rate group (reduced spiking activity in 1 s window before Hit outcomes) and conversely, when the value was in the 99th percentile the unit was placed in the positively modulated spike rate group (increased spiking activity in 1 s window before Hit outcomes; Figures 2C,D). Units that did not fall in either group were considered unmodulated.
We performed Kolmogorov-Smirnov tests to check whether our data was normally distributed. To determine correlations between spike rates and ∆ spike rate and between licking probability and trial number (Figures 1E, 2E), Pearson's correlation (α < 0.05) was used. To compare medians of two populations, Wilcoxon signed-rank tests were used (α < 0.05). Finally, to compare three or more groups, Kruskal-Wallis tests were used followed by post hoc multiple comparison tests. In case of multiple testing, a Bonferroni correction was used.

RESULTS
To answer how mPFC neurons encode tactile decision making, motivated behavior and learning from mistakes, we trained rats in a head-fixed whisker-based object localization task. In red are the spikes of a unit spiking on average with 0.5 Hz. The horizontal blue bars show the 1 s windows before Hit outcomes, the blue numbers give the number of spikes in each window. In black we show two randomizations of the Hit time stamps and the corresponding ∆ spike rate. Under the dashed line we show in black a histogram of ∆ spike rates (black) and the recorded ∆ spike rate (blue). In this case the recorded ∆ spike rate was in the 99th percentile of the randomized distribution and is considered positively modulated during Hit trials. (B) The PETHs of three example units around Hit outcomes. (C) The histograms of the bootstrapped spike rate distribution and the recorded ∆ spike rate as a thick black line. Same units as in (B). The magenta unit was positively modulated, the gray unit unmodulated and the cyan unit was negatively modulated. (D1) The distribution of percentiles from the bootstrapping method with respect to the mean spike rate during the whole session. (D2) Histogram over the percentile axis of (D1). (E) Correlations between mean spike rates and ∆ spike rate. Units that have a high spike rate will show the largest ∆ spike rate, positive (magenta) or negative (cyan). (F) The positively and negatively modulated population have a significantly higher median spike rate compared to the unmodulated units. * * * p < 0.001.
Data of three rats reached selection criteria for quality of electrophysiological recordings and behavioral performance. In this task, rats learned to report proximal (''Go'') location of an object by licking for a water reward (20-40 µl) while refraining from licking when the object was placed distally (''No-go'') to avoid a TO punishment (Figures 1A,B). ''Go'' trials were called Hit if the rat licks and Miss when the rat does not, conversely, licking during a ''No-go'' trial resulted in a FA, while refraining from licking resulted in a CR. Rats completed on average 147 trials per session (range 77-253 trials). After the object moved into the whisker field, rats started whisking extensively and many touches followed, both during ''Go'' and ''No-go'' trials (Supplementary Figure S1). Rats readily distinguished between ''Go'' and ''No-go'' positions and licked preferentially during ''Go'' trials over ''No-go'' trials (Proportion correct: ''Go'' 0.67 ± 0.13; ''No-go'' 0.82 ± 0.10; All 0.74 ± 0.07, mean ± standard deviation; Figures 1C-E, Table 1). Catch trials were intermixed to test whether rats relied on whisker information to determine task outcome. Rats typically did not lick during Catch trials (Proportion not licked: 0.98 ± 0.04, mean ± standard deviation; Figures 1C-E).
To monitor mPFC neuron spiking activity, 64-channel silicon probes were implanted. Spikes were analyzed and clustered off-line (Figures 1F1,F2), which resulted in 204 well-isolated units distributed over the dorsal-ventral axis and throughout mPFC layers. FSUs (peak-to-trough time <0.5 ms and half-peak time <0.25 ms, gray), were excluded from subsequent analyses (Figures 1G1,G2). RSUs represent putative excitatory pyramidal neurons for which we found strong correlations between specific trial parameters and spike rates in a subset of units (example in Figures 1H1-H3). Thus, spiking in a subpopulation of mPFC RSU units strongly correlated with behavioral performance.

mPFC Units Encode Motor Output and Context
The bootstrap analysis identified units with spike rates that were positively modulated (Figure 3A), unmodulated ( Figure 3B) or negatively modulated (Figure 3C) during correct ''Go'' trials. Units with positively modulated spiking activity (n = 31) were clearly excited before Hit outcomes and remained excited during reward delivery and consumption (blue, Figure 3A1). Similarly, during FA trials the average spike rate of positively modulated units increased before FA outcomes. As soon as rats received feedback on trial outcome however (i.e., learned that it made The negatively modulated population (C) and unmodulated population (D) did not show spiking rate modulations for matched intra-and extra-trial licking bouts, indicating that context is not involved in spike rate modulation for these populations. (E) The spiking rate of the positively modulated population was significantly higher during intra-trial than during extra-trial licking bouts, which is supportive evidence for the hypothesis that context is an important contributor to spike rate modulation for this population of medial prefrontal cortex (mPFC) units. n.s. not significant (p > 0.05), * p < 0.05. a mistake), spike rates returned to baseline (red, Figure 3A2). During Miss and CR trials, spike rates of the positively modulated population remained stable (purple and orange respectively, Figures 3A1,2). Unmodulated units (n = 112) did not show spike rate changes for any trial type ( Figure 3B). The negatively modulated population (n = 48) showed a clear reduction in spike rate before, during and after Hit ( Figure 3C1) and FA outcomes ( Figure 3C2). Similar to the positively modulated units, spike rates of the negatively modulated population returned to baseline quickly after outcome of FA trials. Thus, neurophysiological correlates during ''No-go'' trials closely resembled those during ''Go'' trials, up to the moment that sensory feedback signaled whether behavioral output was appropriate.
Both the positively and negatively modulated populations showed obvious spike rate changes during trials in which rats licked, independent of reward delivery (Hit and FA) and spike rate changes were absent from non-lick trials (Miss and CR). This may suggest that spike rate modulations during trials in which rats were licking represented motor output. To further quantify correlations between modulation of mPFC spiking activity and licking behavior, we analyzed licking behavior for each trial type ( Figure 3D) and found that the time course of licking matched changes in spike rate. Most licks occurred during Hit and FA trials, but rats occasionally licked outside trials. These extra-trial licks presumably represent licking behavior without the expectation to earn water rewards (see e.g., the ticks at 0.5-1.5 s for CR and Miss trials in Figure 1C1). It is thus tempting to assume that these licks were not as strongly driven by expectance to receive rewards compared to intratrial licks and we therefore used this distinction as a control for motor behavior. Thus, to test whether ∆ spike rates were related to motor output, we analyzed PETHs of spike rates There are significantly more/fewer FA errors in "No-go" trials if the previous trial was a Hit/FA respectively. (A2) There was no significant difference in the performance in "Go" trials based on the previous trial outcome. (B1) Schematic representation of the selection of trials for (B2,B3). Only those Hit trials that are followed by a "No-go" trial are used. RW is water reward. (B2) Left: average PETHs of the positively modulated population around Hit outcomes for trials that were followed by a "No-go" trial. Hit trials that were followed by CR (Hit CR ; orange) and those that were followed by a FA (Hit FA ; red) show (Continued) FIGURE 5 | Continued similar time courses of ∆ spike rate (n = 27). Right: there was no significant difference of the median ∆ spike rate in Hit CR and Hit FA trials. (B3) As (B2) for the negatively modulated population (n = 46). (C1) Similar to (B1) for FA trials. TO is time out punishment. (C2) Left: PETHs of the ∆ spike rate of the positively modulated population during FA trials that are followed by a CR trial (FA CR ; orange) and those that are followed by another FA trial (FA FA ; red; n = 12). Right: there was no significant difference of the median ∆ spike rate in FA CR and FA FA trials. (C3) As (C2) for the negatively modulated population (n = 18). The black bar indicates the window for right panel. Right: the median ∆ spike rate during 0.2-3 s after the FA trial outcomes was significantly higher during FA CR trials than FA FA trials. n.s. not significant (p > 0.05), * p < 0.05, * * p < 0.01.
centered around the start of lick bouts for both intra-trial bouts and extra-trial bouts (Figures 3A3-C3). Both positively and negatively modulated populations had stronger spike rate modulation around lick start for intra-trial licks compared to extra-trial licks (compare amplitude of solid and dashed lines for intra-and extra-trial licking respectively, Figures 3A3-C3).
We additionally quantified the contribution of motor output (i.e., licking) to spike rate modulation by comparing spike rates during no licks vs. extra-trial licks. We found that spike rates of the unmodulated and negatively modulated population during extra-trial licks significantly decreased relative to no-lick episodes (neg. modulated p = 5 * 10 −7 ; unmodulated p = 1 * 10 −4 ; Wilcoxon signed-rank test with Bonferroni correction; Figure 3E3). Spike rate modulation was not observed in the positively modulated population upon comparison of no-lick and extra-trial licking episodes, indicating that spiking in this population is insensitive to motor output (pos. modulated p > 0.05).
Since intra-trial licking is associated with reward expectation, intra-trial licks probably represent a combination of motor output and contextual value (i.e., reward delivery). We thus quantified spike rates between intra-and extra-trial licking to determine whether context (i.e., reward delivery) impacted spike rate beyond motor output only (extra-trial licks). All three populations showed spike rate modulation with respect to the cognitive context and have a higher median spike rate during intra-trial compared to extra-trial licking (pos. modulated p = 0.01; unmodulated p = 0.03; neg. modulated p = 0.02; Wilcoxon signed-rank test with Bonferroni correction; Figure 3E3). This indicates that context increased spike rates consistently in all three populations.
Since we cannot rule out that fine-scale differences in behavior during intra-and extra-trial licks may underlie differences in spike rate modulation, we quantified licking behavior after intraand extra-trial licking bout starts ( Figure 3D3). Indeed, intratrial licking frequencies were higher compared to extra-trial licking frequencies. When different licking frequencies underlie spike rate differences between intra-and extra-trial lick bouts, spike rates should correlate with licking frequency. For a subset of units, spike rate indeed significantly correlated to licking rate (Supplementary Figures S2A1,A2). Moreover, differences in spike rates between intra-and extra-trial lick bouts persisted when licking frequency-correlated units were excluded from the analysis (Supplementary Figures S2B2,D2). The median ∆ spike rate was significantly higher for Hard trials than for Easy FA trials. (E) Boxplot for the ∆ spike rate of the positively modulated population −2 to 0 from trial outcome (first lick in the decision window) for the four licking trials (Hit and Hard, Normal and Easy FA trials). The ∆ spike rate was significantly lower during Easy FA trials than Hit and Hard "No-go" trials. n.s. not significant (p > 0.05), * p < 0.05, * * p < 0.01.

Positively Modulated Units Represent Contextual Value
Finally, we matched individual extra-trial lick bouts with intratrial lick bouts based on similar behavioral statistics (i.e., lick frequency, number of licks and bout duration) to rule out behavioral differences and isolate contextual value. We found that spike rate modulation during licking was comparable between intra-trial and extra-trial licking bouts when behavioral statistics were matched for the negatively and unmodulated populations (p > 0.05, Wilcoxon's signed-rank test with Bonferroni correction, Figure 4), which indicates that spike rate modulation during licking is due to motor output and not contextual value. For positively modulated units, spike rates during intra-trial licking bouts continued to be significantly higher compared to extra-trial licking bouts after matching behavioral characteristics (Figure 4E), indicating that context but not motor output impacted spike rates in this population (pos. modulated p = 0.01, Wilcoxon's signed-rank test with Bonferroni correction).
We also changed the cognitive context by doubling the water reward or by delaying reward or punishment but these experimental manipulations did not consistently change the behavior or spike rates and these manipulations were therefore not used for further in-depth analyses (Supplementary Figure S3). Collectively, our data suggest that motor behavior and cognitive context contribute to spike rate modulation during task execution, but modulation depth depends on the neuronal population involved.
A subset of ''Go'' trials is associated by the absence of licking, leading to ''Miss'' categorization. As a consequence, CR trials may be the outcome of a ''No-go'' miss instead of an active decision to identify the trial as ''No-go.'' To test whether withholding licks during ''No-go'' trials is a passive process or is actively encoded by the mPFC, the bootstrap analysis was applied again (Figures 2A-C) to classify units based on ∆ spike rate during correct ''No-go'' performance. We found positively (n = 9), negatively (n = 12) and unmodulated units (n = 170; 12 out of 21 units were also modulated during Hit trials; Supplementary Figure S4). We quantified the median spike rate of the ''No-go'' CR-negatively modulated population during the 1 s before ''Go''-Hit outcomes and found that it was not negatively modulated (data not shown). In contrast, the ''No-go'' CR-positively modulated population showed a reduction in spike rate in the 1 s window before Hit outcomes (p = 0.006 sign test with Bonferroni correction; Supplementary Figure S4A3). Thus, the presence of CR-modulated units supports the hypothesis that ''No-go'' trials are actively encoded in mPFC and rats actively determine the outcome of both ''Go'' and ''No-go'' trials.

Negatively Modulated Units Signal Updating of Task Rules on a Trial-to-Trial Basis
The mPFC is involved in monitoring and updating behavior in highly dynamic environments. During the behavioral task, rats constantly received feedback on performance through water rewards for Hit trials and sound and TOs for FA trials. The general assumption is that these cues are integrated with existing internal rule representations and change behavioral strategies to drive rewarded behavior (Dalley et al., 2004;Rich and Shapiro, 2009;Durstewitz et al., 2010;Horst and Laubach, 2012;Narayanan et al., 2013). In accordance, rats were more likely to lick in ''No-go'' trials following a Hit trial compared to ''No-go'' trials following a FA trial (p = 0.012, Wilcoxon signed-rank test with multiple comparisons, significant difference between ''No-go'' trials after Hit or FA outcome; Proportion licked: Hit 0.20, 0.11/0.29; FA 0, 0.0/0.23, values represent median, 1st/3rd quartile; Figure 5A1). This effect was not observed for ''Go'' trials ( Figure 5A2), since performance during ongoing trials did not depend on the outcome of the previous trial. Thus, rats adapt behavior on a trial-to-trial basis to avoid consecutive FA trials and associated TO punishments.
To address the question whether mPFC spiking correlates with the trial-to-trial behavioral shifting, we tested correlations between mPFC spiking activity during trial outcome (feedback) of Hit and FA trials and subsequent trial performance. Thus, we took the PETHs of the modulated units for Hit trials followed by a ''No-go'' trial (Hit CR and Hit FA ) and similarly for FA trials followed by a ''No-go'' trial (FA CR and FA FA ; Figures 5B,C). We selected units that were recorded in sessions that contained both Hit CR and Hit FA or both FA CR and FA FA trials for the following analyses. We computed the average spike frequency of units in the time window where feedback was received and integrated (i.e., 0.2-3 s after outcome). We hypothesized differential spiking activity when the behavior on the next trial corresponded with feedback signals in the previous trial (i.e., keep licking after reward: Hit FA ; or stop licking after punishment: FA CR ) compared to no correspondence (Hit CR and FA FA ). Neither positively modulated units, nor negatively modulated units showed different spike rates between Hit CR and Hit FA (∆ spike rate: pos. modulated Hit CR 1.06, 0.02/5.10 Hz; Hit FA 0.25, −0.51/1.59 Hz; neg. modulated Hit CR −0.65, −1.61/0.05 Hz; Hit FA −0.64, −1.47/0.01, values are median, 1st/3rd quartile; p > 0.05, Wilcoxon signed-rank test with Bonferroni correction; Figures 5B2,3). Thus, spike rates during correct ''Go'' trials did not predict performance during subsequent ''No-go'' trials. Similarly, positively modulated units showed comparable spike rates during feedback of FA CR and FA FA trials (FA CR 0.07, −0.16/1.39 Hz; FA FA 0.06, −0.60/2.12 Hz, values are median, 1st/3rd quartile; p > 0.05, Wilcoxon signed-rank test with Bonferroni correction; Figure 5C2). Surprisingly, negatively modulated units had lower spike rates during FA FA trial feedback (when the behavior was not changed) compared to FA CR trials, when rats updated their behavior and correctly performed during the next ''No-go'' trial (FA CR −0.08, −0.59/0.36 Hz; FA FA −0.60, −1.74/−0.20 Hz, values are median, 1st/3rd quartile; p = 0.012, Wilcoxon signed-rank test with Bonferroni correction; Figure 5C3). To test whether this effect was robust, the two most active neurons were removed from the population, but this did not affect statistical outcome (p = 0.043, Wilcoxon signed-rank test with Bonferroni correction).
Additional evidence for mPFC encoding of behavioral shifting can be found in spike rates of the CR-modulated units. The positive CR-modulated units did not encode behavioral updating after error feedback (n = 5; Supplementary Figures S5A,B), but the negative CR-modulated population spiked more during feedback integration of FA FA trials compared to FA CR trials (p = 0.03, Wilcoxon signed-rank test with Bonferroni correction, n = 8; Supplementary Figure S5C). We therefore propose that both negative Hit-modulated and negative CR-modulated populations may carry the neurophysiological representation of adaptive behavior in response to unsuccessful trial outcome and in turn may drive trial-to-trial learning.

Spike Rate Increase in Positively Modulated Units Could Predict Inhibitory Control
It has been suggested that mPFC recruitment depends on the level of task difficulty (Euston et al., 2012;Zhang et al., 2014;Kamigaki and Dan, 2017) and that increasingly difficult tasks involve mPFC more strongly (Chudasama and Muir, 2001;White et al., 2018;Wu et al., 2018), while others suggest that mPFC encodes certainty (Hanks et al., 2015). Additionally, correct mPFC spiking activity may be essential for impulse control Laubach, 2006, 2009;Chudasama et al., 2012;Gourley and Taylor, 2016). Therefore, in one session each day, we randomly intermixed two extra ''No-go'' locations, in addition to the Normal ''No-go'' location. These locations were chosen such that the Easy location was twice as far from the ''Go'' location (8 mm) as the Normal location (4 mm) and the Hard location was half as far (2 mm; Figure 6A). The rats performed as expected; they licked mostly for ''Go'' trials, made a relatively high number of (FA) mistakes for the Hard ''No-go'' location and progressively fewer mistakes for the Normal and Easy ''No-go'' locations (Proportion licked ''Go'': 62.1 ± 3.8%; Hard ''No-go'': 31.4 ± 3.8%; Normal ''No-go'': 17.8 ± 3.4%; Easy ''No-go'': 6.8 ± 1.7%; mean ± SEM; Figure 6B). During these sessions both the positively and negatively modulated populations (n = 20 and n = 35 respectively) responded similarly relative to regular sessions (compare Figure 3C with Figures 6C1,D1) with prolonged increase/decrease of spike rates during licking and reward consumption in Hit trials and rapid return to baseline spiking after Normal FAs ( Figure 6C1). ''No-go'' trial type did not determine the spike rate of the negatively modulated units during FA and CR trials (Figure 6C, CR not shown). Spiking of the positively modulated units was also not dependent on the type of CR trials (data not shown). However, spike rates of positively modulated units differed between distinct types of lick trials (∆ spike rate: Hit 1. 61,0.49/2.36 Hz;0.32/2.72 Hz;0.16/2.18 Hz; Easy ''No-go'' −0.09, −0.78/1.12 Hz, values are median, 1st/3rd quartile; p = 0.01, Kruskal-Wallis test; Figure 6E). More specifically, positively modulated units exhibited reduced spiking in the 0-2 s prior to outcomes of Easy FA compared to the same time-window during Hard FA trials (p = 0.004, Wilcoxon signed-rank test with Bonferroni correction; Figure 6D3) and during Normal FA trials (p = 0.035, Wilcoxon signed-rank test with Bonferroni correction). Thus, mPFC spiking activity scales with task difficulty during error trials. Since rats may deploy different cognitive or behavioral strategies to solve Easy vs. Hard ''No-go'' trials (# of licks p = 0.055 and licking peak latency p = 0.063; Wilcoxon signed-rank test; Supplementary  Figure S6), future work should elucidate which factors underlie the observed correlation between task difficulty and mPFC spiking activity.
We performed extracellular electrophysiological recordings in the mPFC of rats in combination with rigorous cluster analysis to reach single-unit resolution (Csicsvari et al., 2003;Barthó et al., 2004;Schmitzer-Torbert et al., 2005;Rossant et al., 2016) and aligned spiking responses to specific epochs of task performance. We found units showing positive spike rate modulation (increased spiking), negative modulation (decreased spiking) or no modulation (no change) when aligned to correct ''Go'' performance, which is comparable to sensory, motor, associative and prefrontal cortices (Narayanan and Laubach, 2009;Totah et al., 2009;O'Connor et al., 2010bO'Connor et al., , 2013Hanks et al., 2015;Zagha et al., 2015;Ebbesen et al., 2017;Le Merre et al., 2018). We assumed that populations with specific modulation characteristics represent segregated (and potentially competing) microcircuits (Halladay and Blair, 2015; and thus analyzed these populations separately in subsequent in-depth analyses. We found that: (1) modulation of spike rates coincides with decision making and motor output and modulation depth correlates to cognitive context; (2) low excitation of positively modulated units during and (3) units negatively modulated during correct ''Go'' behavior encode behavioral state updating by increased spiking after an error was made.

mPFC as a Single Functional Area
Even though there is evidence showing differential functional roles and innervation patterns of distinct mPFC areas (Heidbreder and Groenewegen, 2003;Euston et al., 2012;Gourley and Taylor, 2016), we chose not to subdivide the mPFC. We found significantly modulated units over the entire dorso-ventral extent of the mPFC without different proportions of significantly modulated units along the dorso-ventral axis (Supplementary Figure S7). Classical cytoarchitectonic divisions of mPFC (i.e., AC, PL and IL cortex) or divisions into the dmPFC and the vmPFC (Gabbott et al., 2005;Luchicchi et al., 2016) lead to ambiguous and potentially artificial subdivisions of the mPFC (Heidbreder and Groenewegen, 2003;Euston et al., 2012;Hardung et al., 2017). The absence of a dorso-ventral distinction of unit task-selectivity in our dataset strengthens our confidence that for our task mPFC subdivision is unnecessary and potentially counterproductive.

mPFC Processing of Tactile Behavior
Primary and/or secondary sensory cortices govern stimulus detection after low intensity sensory stimuli (Yang et al., 2016;Le Merre et al., 2018), whereas CA1 and mPFC are necessary for correct performance in detection tasks (Pinto and Dan, 2015;Le Merre et al., 2018). However, it remains unknown how sensory information is integrated into the mPFC during stimulus discrimination and how the mPFC influences sensory processing during tasks that involve sensation. Activity of mPFC axons increases stimulus detection and discrimination by top-down center-surround contrast enhancement in V1  and are necessary to respond to low-contrast stimuli during conditioning (Wu et al., 2018). The mPFC also increases stimulus detection by driving modality-dependent attention through the thalamus (Wimmer et al., 2015). The mPFC does not receive strong direct inputs from S1 (Hoover and Vertes, 2007;DeNardo et al., 2015) or somatosensory thalamic nuclei VPM and PoM (Hoover and Vertes, 2007), although somatosensory field potentials are observed in the mPFC after whisker stimulation in both anesthetized rats (Martin-Cortecero and Nuñez, 2016), awake untrained mice and trained mice (Le Merre et al., 2018). We did not observe robust changes in spiking upon touch in the mPFC (data not shown). Nonetheless, tactile information probably should be present in the mPFC to correctly drive tactile decision making. We did see a fast increase in local field potential amplitude (latency of approximately 50 ms, data not shown) after the first touch for each trial, but we cannot exclude the possibility that this local field potential carries multimodal information. When the mPFC does not encode sensory information directly, the alternative explanation could be that sensory information from primary sensory cortices is summarized by feature extraction and/or categorization (Hanks et al., 2015) in an upstream brain region, e.g., the more dorsal parts of frontal cortex (Sreenivasan et al., 2017) or the orbitofrontal cortex as has been suggested by Euston et al. (2012), since there is strong innervation from both these regions to mPFC (Hoover and Vertes, 2007). During our task we do not identify specific sensory spiking patterns. However, it remains enigmatic whether this is due to absence of sensory signals altogether or because the sensory information is packaged in a way that we cannot yet recognize and decode in separation from the motor and cognitive components of the task.

Recorded Populations and Their Characteristics
In our dataset, we aimed to address coding properties of pyramidal projecting neurons and therefore excluded putative GABAergic interneurons based on their fast-spiking waveform (Barthó et al., 2004). A small population of interneurons with a regular-spiking waveform may nevertheless remain in our dataset (Kim D. et al., 2016). In addition, histological recovery of recording location indicated that the predominant fraction of units was recorded in the deep layers (layer 5 and 6). Interpretations of the output of the mPFC circuitry may thus be influenced by overrepresentation of units from these layers. This is important to consider since encoding of task-relevant information will almost certainly be layer-and cell-type specific and thus correlate to projection target (Gabbott et al., 2005;Dembrow et al., 2010;Pinto and Dan, 2015;Kamigaki and Dan, 2017). For example, layer 5 units can project to the lateral hypothalamus and carry appetite-related signals or send motor signals to striatum, ventral tegmental area and spinal cord, while units from layer 6 can project to the mediodorsal (MD) thalamus (Vertes, 2002(Vertes, , 2004Gabbott et al., 2005;Morishima et al., 2011). Additionally, the spiking activity of layer 2 and layer 3 output projections, most notably to basolateral amygdala (Gabbott et al., 2005), need to be determined to generate a comprehensive view on cell-type and layer-specific activity in mPFC during the whisker-guided objection localization task.
We have not been able to link our three populations of modulated/unmodulated units to projection target or specific morphological types (Dembrow et al., 2010;Morishima et al., 2011;van Aerde and Feldmeyer, 2015;Kawaguchi, 2017). However, this may be essential for future experiments to increase the interpretability of links between the morphological structure, electrophysiological function, and influence on the behavior of the recorded populations.

Activity in mPFC Pyramidal Neurons Is More Than Motor Representation
In our study, we used the first lick during the decision window as a discrete trigger to signal decision making, but this means that it is not straightforward to disentangle spiking involved in decision making (e.g., motivation, cognitive context and sensory inputs) from spiking involved in motor output (Horst and Laubach, 2013;Amarante et al., 2017). This could be solved by temporally separating the response moment from motor output (delayed reward), but this goes at the expense of identification of the exact decision moment. Instead, we compared licks during and outside trials to quantify the impact of context on spike rate and ''no licks'' vs. ''extra-trial licks'' to quantify the influence of motor output. We show that motor output without contextual value (i.e., lick outside trial) leads to a decrease in spike rates, restricted to the populations that showed either no or negative modulation of spike rate during correct ''Go'' performance. In contrast, the population that showed positive modulation of spike rate during correct ''Go'' behavior did not show spike rate modulation by motor output. However, the positively modulated population showed a significant difference in spike rate during licks with and without trial-context (Figures 3E3, 4E), indicating a potential influence of motivation or attention processes on spiking. Collectively, cognitive context and motor output influence spiking in mPFC. Spike rate modulation by motor output and context could ultimately be the outcome of excitability changes by neuromodulatory inputs from e.g., the dopaminergic or cholinergic system (reviewed in Thiele and Bellgrove, 2018) or due to strong interactions with regions involved in appetitive behaviors, such as e.g., the supra-mammillary nucleus and lateral hypothalamus (Ikemoto et al., 2004;Hoover and Vertes, 2007;Reppucci and Petrovich, 2016).

Correlate of Behavioral Updating
It was proposed previously that mPFC may be critical for task learning, but perhaps not as much during task execution (Liu et al., 2014). This hypothesis is at odds with our finding of trial-to-trial changes in spike rates in the mPFC correlating to task performance. In addition, manipulation of the mPFC during a task shows that mPFC is needed to perform many behaviors, from bottom-up stimulus detection (Le Merre et al., 2018) to attention Luchicchi et al., 2016) and top-down attentional selection (Wimmer et al., 2015;Schmitt et al., 2017). The activity of the mPFC contains short-term memories of actions and their consequences (Narayanan and Laubach, 2008;Horst and Laubach, 2012). Additionally, the mPFC contains a physiological representation of current task rules or strategies to solve the task (Durstewitz et al., 2010;Cho et al., 2015;Malagon-Vina et al., 2018). The mPFC, thus, has access to all information needed to compute strategies to optimize action outcome. Behavioral updating in mPFC is signaled by theta frequency oscillations (Narayanan et al., 2013) and it could be that spiking of the negatively modulated population is phase locked to theta (Horst and Laubach, 2013;Amarante et al., 2017) to broadcast relevant updates to the behavioral state to downstream targets (Fries, 2005;Womelsdorf et al., 2007). Furthermore, the mPFC has indirect control over actions, i.e., optogenetically stimulating efferent projections from the mPFC to the nucleus accumbens core (NAc) and the periventricular nucleus of thalamus (PVT) leads to enhancement or suppression of reward seeking behaviors after negative outcomes (Kim et al., 2017;Otis et al., 2017). Behavioral updating also depends on mPFC to MD thalamus output (Marton et al., 2018) and risky decision making after negative outcomes involves VTA to mPFC projection (Verharen et al., 2018). In the current work, we cannot show causality of spiking rates to changes in behavior. It could be that mPFC spike rate changes during behavioral updating are simply caused by the same upstream neuronal processes, or that it is not the spiking rate change, but rather phase locking of spiking to oscillations that result in behavioral updating (Fries, 2005;Narayanan et al., 2013;Amarante et al., 2017). Furthermore, it remains to be determined whether the functional populations of (positively modulated, unmodulated and negatively modulated) units represent anatomically distinct groups or heterogeneous populations.

mPFC Excitation Needed to Prevent Avoidable Mistakes
We show that insufficient spiking in positively modulated units predicts FA mistakes when task conditions are easy. We hypothesize that this may be due to either a lack of top-down attentional filtering (Dalley et al., 2004;Zhang et al., 2014;Wimmer et al., 2015;Luchicchi et al., 2016) or due to insufficient inhibitory control Laubach, 2006, 2009;Chudasama et al., 2012;Luchicchi et al., 2016;Kamigaki and Dan, 2017). The mPFC is an important area for attentional processing and activation of mPFC projections have been shown to increase stimulus detection and contrast in primary visual cortex (V1; Zhang et al., 2014). Similarly, mPFC projections to the pons are selectively necessary to detect low-contrast stimuli during conditioning (Wu et al., 2018). Additionally, mPFC to thalamus projections are needed for proper selection of sensory modality (Wimmer et al., 2015) and top-down input from mPFC to claustrum is needed for correct attention behavior (White et al., 2018). Thus, the mPFC is driving top-down stimulus-response associations. In our experiment, however, we assume that the Easy ''No-go'' position is easy to distinguish from the ''Go'' position and that bottom-up input should therefore be sufficient to select the proper action. Thus, we suggest that insufficient excitation of mPFC neurons through malfunctioning of the mPFC-driven inhibitory control mechanisms leads to FA errors. Under Normal and Hard conditions, the ∆ spike rate during the low number of errors of impulsivity could get lost in the larger number of errors of discrimination. The link between impulsivity and reduced mPFC spiking is further supported by findings that reduction of mPFC activity leads to more impulsive prematurely expressed responses Laubach, 2006, 2009;Luchicchi et al., 2016;Hardung et al., 2017) and FAs (Kamigaki and Dan, 2017).

Final Remarks
In summary, we quantified neurophysiological representations of motivational and cognitive context in the mPFC of rats performing a whisker-based object localization task (Figure 3). We show that a specific fraction of negatively modulated units could underlie behavioral updating on a trial-to-trial basis ( Figure 5) and that a subpopulation of mPFC units needs to be sufficiently activated to maintain task performance under dynamic ''No-go'' task conditions (Figure 6). These results could provide a framework for future studies investigating causality of mPFC neuronal spiking activity to behavioral updating and stimulus detection.

AUTHOR CONTRIBUTIONS
RH, HM and CK designed the study. RH performed the experiments. AP, VN and RH designed and programmed the behavioral hardware and software. JL, SB, RH and CK analyzed the data. RH and CK wrote the manuscript with input from all authors.

ACKNOWLEDGMENTS
We would like to thank Prof. Dr. B. Sakmann for advice and support. In addition, we would like to thank Prof. Dr. K. Svoboda and Prof. Dr. C. Schwarz for advice on setting up a head-fixed whisker-based task for rats. Furthermore, we want to thank R. Elsinga of Fine Mechanics of BetaVU and T.J.F. Brouwer and P.J.M. Caspers from Electronic Engineering of BetaVU for building and helping design our custom behavioral hardware. In addition, we thank Dr. T. Pattij and H. Terra for comments on a previous version of the manuscript.