Orbitofrontal cortex mediates the differential impact of signaled-reward probability on discrimination accuracy

Orbitofrontal cortex (OFC) function is critical to decision making and behavior based on the value of expected outcomes. While some of the roles the OFC plays in value computations and behavior have been identified, the role of the OFC in modulating cognitive resources based on reward expectancy has not been explored. Here we assessed the involvement of OFC in the interaction between motivation and attention. We tested mice in a sustained-attention task in which explicitly signaling the probability of reward differentially modulates discrimination accuracy. Using pharmacogenetic methods, we generated mice in which neuronal activity in the OFC could be transiently and reversibly inhibited during performance of our signaled-probability task. We found that inhibiting OFC neuronal activity abolished the ability of reward-associated cues to differentially impact accuracy of sustained-attention performance. This failure to modulate attention occurred despite evidence that mice still processed the differential value of the reward-associated cues. These data indicate that OFC function is critical for the ability of a reward-related signal to impact other cognitive and decision-making processes and begin to delineate the neural circuitry involved in the interaction between motivation and attention.


Introduction
It is well known that knowledge of the value of a potentially-earned reward can impact performance in cognitive tasks. Explicitly signaling changes in reward value influences discrimination accuracy in monkeys (Leon and Shadlen, 1999;Bendiksby and Platt, 2006), pigeons (Jones et al., 1995;Brown and White, 2005), and humans (Engelmann and Pessoa, 2007;Engelmann et al., 2009). Yet, little is known about the circuits underlying how representations of expected reward impact cognitive performance.
The orbitofrontal cortex (OFC) is involved in the representation of reward as demonstrated in its critical role in value-based decision making (Schoenbaum et al., 2009). Recent research suggests that OFC does not encode value per se, but is involved in adaptive decision making that requires information about the value of specific outcomes, particularly when this information must be dynamically updated and used to guide selection of specific behaviors that lead to those outcomes (Schoenbaum et al., 2011;Takahashi et al., 2013).
The aim of the present research was to determine whether OFC modulates the recruitment of cognitive resources based on reward expectations. Specifically, we asked whether the OFC plays a role in the modulation of discrimination accuracy when explicit cues signal changes in the likelihood of reward. Altered performance under these conditions is thought to reflect differences in the top-down recruitment of attention to trial-specific stimuli (Corbetta and Shulman, 2002;Pessoa et al., 2003;Small et al., 2005;Pessoa and Engelmann, 2010).
To investigate the role of the OFC in modulation of discrimination accuracy in response to reward-associated cues, we generated mice in which neuronal activity in the OFC could be transiently (for the duration of a single behavioral test session) silenced. This was achieved by stereotaxically injecting a virus that drives expression of the Designer Receptor Exclusively Activated by Designer Drug (DREADD) hM4D(Gi) selectively in neurons. Systemic administration of the synthetic drug clozapine-N-oxide (CNO) induces Gi activation, which mediates decreased neuronal activity selectively in neurons in which the hM4D(Gi) receptor is expressed (Armbruster et al., 2007).
These mice were tested in a procedure which explicitly assays the impact of motivation on attention. In our signaled-probability sustained-attention task (modeled after the five-choice serial reaction-time task; Robbins, 2002), the correct response on a given trial is a lever press which is cued by a stimulus light. As with the 5CSRTT, we have previously shown that increasing attentional demand by decreasing the duration of cue presentation worsens discrimination performance (Kahn et al., 2012;Ward et al., 2015). Motivation to attend during the task is manipulated by explicitly signaling the probability of reward for correct choice responses on a trial-by-trial basis. Under control conditions, mice performed with greater accuracy when the signaled-reward probability was high. When OFC neuronal activity was inhibited, discrimination performance was not modulated by cues associated with different reward probabilities. The inhibition did not eliminate the representation of differential-outcome likelihood associated with different cues but specifically interfered with the capacity for this information to influence attention or decision processes.

Methods and Materials
Mice
Mice were male F1 hybrids (3-6 months old at the beginning of the experiment) of the C57BL/6J and 129Svev (Tac) background strain. Mice were housed, bred, and tested in compliance with the New York State Psychiatric Institute and Columbia University Institutional Animal Care and Use Committees.

Apparatus
Operant chambers (Med Associates, St. Albans, VT; model ENV-307w) were used in all behavioral testing. The operant chambers had internal dimensions 22½ × 18½ × 12½ and were located in a light- and sound-attenuating cabinet equipped with an exhaust fan, which provided 72 dB background white noise. Each chamber was equipped with a feeder trough that was centered on one wall of the chamber. A reward of one drop of evaporated milk could be provided by raising a dipper. An infrared photocell detector was used to record head entries into the trough. Two retractable levers were mounted on the same wall as the feeder trough. The chambers were illuminated throughout all sessions with a houselight (Med Associates #1820) located at the top of the chamber. An audio speaker was positioned 8.5 cm from the floor on the wall opposite the feeder trough. The speaker delivered a brief tone (90 dB, 2500 Hz, 200 ms) to signal when the liquid dipper was raised.

Sustained-attention Task
All training and testing sessions occurred once per day, 7 d per week. Animals were first trained to consume evaporated milk from the liquid dipper. The mice were then trained to press the lever to obtain rewards on a continuous reinforcement (CRF) schedule as described previously. Each CRF session ended after 60 rewards or 60 min, whichever occurred first. Subjects that had earned fewer than 30 rewards on the third day of CRF training were given an overnight (14-h) session with no limit on earned rewards. Discrimination training then occurred in several phases. In all phases, each trial began with an intertrial interval (ITI) of unpredictable duration (mean = 45 s, range 2.74-148.13 s).

Single Cue-Single Lever Training
During single cue-single lever training, mice received trials where a cue light above a lever on either the left or right side of the chamber was illuminated for 10 s. One second after the cue's termination, the lever beneath the cued light was presented for 10 s. Pressing the lever beneath the cued light resulted in a dipper reward. The cue light/lever position alternated daily across a total of four sessions, until the mice reliably pressed the lever after each stimulus cue presentation.

Choice Training
During choice training, a percentage of the trials were single cue-single lever trials as described above, while the remaining percentage were choice trials. The position of the cue light (left or right) was randomly determined from trial to trial. During choice trials, both of the levers were inserted 1 s after the cue's termination, and a response to the lever that had been cued at the beginning of the trial was rewarded. Incorrect responses resulted in a correction procedure, where the trial was repeated with the cue light in the same location until a correct response was made. Training consisted of three sessions with 50% choice: 50% single lever-cue trials, three sessions of 80% choice: 20% single lever-cue trials, and nine sessions of 100% choice trials, all with correction. This was followed by 10 sessions of 100% choice trials without correction. During these sessions, incorrect responses resulted in both levers being withdrawn and a new trial being initiated. If no response was made after 10 s (an omission), both levers were retracted and a new trial began.
We have shown previously that accuracy on this task is sensitive to increasing attentional demand (i.e., accuracy decreases with decreasing cue duration; Kahn et al., 2012;Ward et al., 2015). Thus, it is a sensitive assay with which to test manipulations that impact attention.

Signaled-probability Sustained-attention Task
Following acquisition of the sustained-attention task, mice were moved to the signaled-reward probability sustained-attention task, in which the probability of reward for a correct choice response (either 1.0 or 0.1) on each upcoming trial was signaled by either turning the houselight on or off during the trial (counterbalanced across mice). Mice received an equal number of high and low reward-probability trials. High and low reward-probability trials were presented pseudorandomly with the constraint that no more than four consecutive trials of the same reward probability could be presented. Mice received six sessions on this task, after which the cue duration was successively decreased from 10 to 2 s over the course of 15 sessions.
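The trial-ordering constraint described above can be illustrated with a short sketch. This is not the software used in the experiment; it is a minimal Python illustration that assumes a hypothetical session length of 40 trials (the number of trials per session is not stated here) and uses simple rejection sampling, which is only one of several ways to satisfy the constraint. The function names are illustrative.

```python
import random

# Reward probabilities for a correct response, as stated in the text.
REWARD_PROBABILITY = {"high": 1.0, "low": 0.1}


def max_run_length(seq):
    """Length of the longest run of identical consecutive items."""
    longest = run = 1 if seq else 0
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def make_trial_sequence(n_trials=40, max_run=4, rng=None):
    """Equal numbers of high and low trials, no run longer than max_run.

    n_trials=40 is an assumed value for illustration only.
    """
    if n_trials <= 0 or n_trials % 2:
        raise ValueError("n_trials must be a positive even number")
    if max_run < 1:
        raise ValueError("max_run must be at least 1")
    rng = rng or random.Random()
    seq = ["high", "low"] * (n_trials // 2)
    while True:
        # Rejection sampling: reshuffle until the run constraint holds.
        rng.shuffle(seq)
        if max_run_length(seq) <= max_run:
            return list(seq)


def reward_delivered(trial_type, correct, rng):
    """Only correct responses can be rewarded, with the signaled probability."""
    return correct and rng.random() < REWARD_PROBABILITY[trial_type]
```

Rejection sampling keeps every admissible ordering equally likely, at the cost of occasional reshuffles; for the run limit of four used here, a valid sequence is typically found within a few attempts.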
hM4D(Gi) is a modified human muscarinic receptor that does not bind endogenous ligands but responds to a synthetic compound, clozapine-N-oxide (CNO). When CNO binds to hM4D(Gi), it produces a hyperpolarization of the cell through a G-protein mediated activation of inward-rectifying potassium channels (Armbruster et al., 2007). We have successfully used this method previously to silence neurons in vivo in behaving mice (Parnaudeau et al., 2013, 2015). Importantly, this method has also been shown recently to significantly inhibit activity of OFC neurons in behaving mice (Gremel and Costa, 2013).
Mice with bilateral orbitofrontal cortex GFP (N = 12) or hM4D(Gi)-mCitrine (N = 11) viral injections were tested in the signaled-reward probability sustained-attention task. After cue duration was decreased to 2 s, mice received several i.p. injections to accustom them to the injection procedure. Following this, mice received injections of saline and CNO, counterbalanced for order, in a within-subjects design, in which all mice received both types of injections.

Drugs and Injection Protocol
CNO (obtained from NIH) was dissolved in saline to a final concentration of 0.2 mg/ml. Saline or CNO (2 mg/kg) was administered intraperitoneally to the mice 30 min before behavioral testing. This dose was chosen based on our previous work using the DREADD method, in which it significantly reduced neuronal firing in infected neurons in vitro and affected both cognition and task-related synchronous activity with a distal structure during in vivo recordings (Parnaudeau et al., 2013, 2015). After an additional three sessions of drug-free testing, this injection regimen was repeated so that there were two determinations of the drug effect in each subject. For three mice in each group, an equipment malfunction resulted in data not being correctly recorded during one of the saline or CNO sessions during the first injection regimen. In these cases, the obtained value reflects only the data from the second injection regimen. For the mice for which we had data from both drug determinations, we conducted a reward probability (high vs. low) × treatment (saline vs. CNO) × viral injection [GFP vs. hM4D(Gi)] × drug determination ANOVA on the accuracy data, which showed that the overall effect of determination was not significant [effect of drug determination; F (1, 18) = 0.132, p = 0.720], nor were any of the interactions between determination and the other factors (Fs < 0.70). Because there was no statistically significant difference in the data obtained from the two determinations, data reported below represent the average of the two drug-testing regimens.
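The dosing arithmetic and the handling of the two drug determinations can be made concrete with a small sketch. It is an illustration only, not the procedure's actual tooling: the 25 g body weight in the usage note is a hypothetical value, and the function names are assumptions introduced here. The stock concentration (0.2 mg/ml) and dose (2 mg/kg) come from the text and together imply an injection volume of 10 ml/kg.

```python
def injection_volume_ml(body_weight_g, dose_mg_per_kg=2.0, conc_mg_per_ml=0.2):
    """Volume to inject so that the animal receives dose_mg_per_kg."""
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0  # g -> kg
    return dose_mg / conc_mg_per_ml


def combine_determinations(first, second):
    """Average the two drug determinations for one mouse and condition.

    A determination lost to equipment malfunction is passed as None,
    in which case the remaining determination is used on its own.
    """
    values = [v for v in (first, second) if v is not None]
    if not values:
        raise ValueError("at least one determination is required")
    return sum(values) / len(values)
```

For a hypothetical 25 g mouse, `injection_volume_ml(25)` gives approximately 0.25 ml, and `combine_determinations(None, 0.9)` returns the second determination unchanged.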

Data Analysis
The main dependent measure of interest was the proportion of correct responses. We also analyzed latency to make a choice response, latency to retrieve rewards, proportion of trials omitted, and the proportion of total responses made on the previously correct lever, as well as the number of errors made on the previously correct lever (measures of perseverative responding). For statistical comparison, repeated-measures analyses of variance with appropriate factors, followed by Bonferroni post-tests, were used. Individual means were compared using paired-samples t-tests. Latency to retrieve rewards was compared using a between-subjects t-test.
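As an illustration of how the accuracy, omission, and perseveration measures listed above could be computed from a session's trial records, consider the following sketch. The `Trial` record and `summarize` function are hypothetical, and two choices are assumptions rather than details given in the text: accuracy is computed over trials with a response, and the perseveration measures are computed from the second trial onward, since the first trial has no preceding trial.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Trial:
    cued_side: str            # "left" or "right": the correct lever
    response: Optional[str]   # lever pressed, or None for an omission


def summarize(trials: Sequence[Trial]) -> dict:
    """Session-level measures from an ordered list of trials."""
    if not trials:
        raise ValueError("no trials to summarize")
    responded = [t for t in trials if t.response is not None]
    n_correct = sum(t.response == t.cued_side for t in responded)

    # Perseveration: responses on the lever that was correct on the
    # immediately preceding trial (assumed definition of "previous").
    eligible = on_prev_lever = perseverative_errors = 0
    for prev, cur in zip(trials, trials[1:]):
        if cur.response is None:
            continue
        eligible += 1
        if cur.response == prev.cued_side:
            on_prev_lever += 1
            if cur.response != cur.cued_side:
                perseverative_errors += 1

    nan = float("nan")
    return {
        "prop_correct": n_correct / len(responded) if responded else nan,
        "prop_omitted": (len(trials) - len(responded)) / len(trials),
        "prop_on_prev_correct_lever": on_prev_lever / eligible if eligible else nan,
        "perseverative_errors": perseverative_errors,
    }
```

Computing these measures separately for high and low reward-probability trials would yield the per-condition values entered into the ANOVAs.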

Results
Pharmacogenetic Inhibition of OFC Function Abolishes the Impact of Reward-associated Cues on Attention
Bilateral stereotaxic injection of either hM4D(Gi)-mCitrine or GFP expressing adeno-associated viruses resulted in expression of either hM4D(Gi) and mCitrine or GFP selectively in neurons due to the use of the human Synapsin1 promoter (hSyn). Figure 1A shows a representative image of viral expression in OFC. The minimal and maximal extent of intrinsic fluorescence of mCitrine [from the hM4D(Gi) expressing virus] and GFP in either hemisphere are depicted in the left and right hemispheres, respectively, of coronal sections in Figure 1B. Intrinsic fluorescence was largely located in lateral and ventral orbitofrontal cortices. In a few cases in GFP-injected control mice there was some spreading of the virus to M1, M2, and frontal association cortex.
We tested virally injected mice in the signaled-reward probability sustained-attention task after injection of either vehicle (saline), or CNO. A reward probability × viral injection × treatment ANOVA on proportion correct indicated that the effects of viral injection [GFP vs. hM4D(Gi)] and treatment

FIGURE 1 | (A) Representative example of viral expression in the orbitofrontal cortex. (B) Diagrammatic representation of the spread of hM4D(Gi)-mCitrine (red) and GFP (blue) virus. All injections were bilateral. For clarity, the minimal (light colors) and maximal (dark colors) extent of each type of injection is depicted only on one hemisphere. Numbers indicate relative distance from bregma according to Paxinos and Franklin (2001). hM4D(Gi) N = 11, GFP N = 12.

Intact Encoding of Signaled-reward Probability in Mice during OFC Inhibition
In addition to analyzing the effect of signaled-reward probability on response choice, we analyzed the latency to make a choice response during the task (Figure 2B). Trials on which mice failed to respond were not included in these calculations. As above, the overall effect of viral injection [F (1, 21) = 0.021, To further analyze performance, separate ANOVAs were conducted on the latency data from GFP and hM4D(Gi) mice. As shown in Figure 2B, latency to respond was shorter on high reward-probability trials than on low reward-probability trials [effect of reward probability; F (1, 11) = 43.45, p < 0.001 and F (1, 10) = 28.12, p < 0.001] for GFP and hM4D(Gi) mice, respectively. Unlike the effect on discrimination accuracy, however, there was no significant effect of OFC neuronal silencing on the latency to make a choice response. Although latencies became noticeably shorter on low reward-probability trials under CNO treatment, this effect was not statistically significant in either GFP or hM4D(Gi) mice [reward probability × treatment interaction F (1, 11) = 3.64, p = 0.09 and F (1, 10) = 1.51, p = 0.25, respectively]. These results demonstrate that OFC inactivation did not eliminate the ability to associate different reward probabilities with specific cues, nor did it impair an overall ability to use that information in motivated behavior. We also analyzed latency to retrieve rewards and found no difference between GFP and hM4D(Gi) mice treated with either saline or CNO (ps > 0.50).

OFC Inhibition does not Produce Perseverative Responding
Lesions of the OFC have been shown to impair reversal learning performance by producing perseveration on a previously rewarded response (Clarke et al., 2008). To determine whether OFC inactivation altered perseveration in the present study, we calculated the proportion of total responses that were made on the lever that was correct on the previous trial. Figure 3A shows that there was no difference in the proportion of perseverative
Frontiers in Neuroscience | www.frontiersin.org
These results indicate that OFC inactivation did not produce impairments by increasing perseverative responding.

OFC Inhibition does not Impact Motivation to Participate in the Task
We also analyzed the proportion of trials on which mice did not make a choice response to determine whether OFC inactivation impacted motivation to engage in the task. Figure 4 shows that overall, mice completed the majority of trials (>90%). As we have previously reported, mice omitted responses on significantly more low reward-probability trials than high These results indicate that mice were less motivated to respond on low reward-probability trials, but OFC inhibition did not impact the proportion of omitted trials.

Discussion
Transient inhibition of neuronal activity in the OFC attenuated the ability of reward-related cues to differentially modulate discrimination accuracy. Importantly, neither the presence of the hM4D(Gi) receptor nor CNO alone had any impact on accuracy. It was only when the hM4D(Gi) receptor was activated by CNO that the effects were seen. This effect was not the result of an increase in perseverative responding. Furthermore, because overall accuracy was not impaired, this occurred in the absence of general decrements in attention. Additionally, the mice appreciated that different cues signaled different reward probabilities, as evidenced by the fact that both choice-response latencies and overall task engagement were modulated by the signals. Thus, inhibiting the OFC did not impact (1) overall attention; or (2) encoding of the relation between signals and outcome probability per se, but impaired the ability of the mice to use that information to modulate attention or other processes that impact discrimination accuracy.
Recent work parsing the role of the OFC in behavior and decision making has indicated that the OFC may not be necessary in simple value-based behavior, but that it is critical when information about specific outcomes is relevant to ongoing choice behavior and decision making (Schoenbaum et al., 2009;Walton et al., 2010;Takahashi et al., 2013). The signaled-reward probability paradigm employed here involves learning the visual discrimination, learning that the different cues signal different reward probabilities, and then leveraging that knowledge to differentially recruit attentional processes. The dissociation between the lack of effect of signaled-reward probability on discrimination accuracy in mice during OFC inhibition, and the spared differential effect of reward probability on choice-response latencies and omitted trials, is further support for the distinction of the specific psychological processes subserved by OFC. Specifically, our data suggest that OFC is not required for a probability signal to modulate differential behavioral responses per se; rather the OFC is critically involved in the ability of that same probability signal to modulate other cognitive and decision-making processes.
Although we show here that the OFC might be critical for the ability of reward-associated cues to differentially impact attention, the present data do not demonstrate that OFC is sufficient for such modulation. A growing body of work has deepened understanding of the subtle and complex role of the OFC in this type of decision making (Furuyashiki et al., 2008), and has elucidated the critical role of connectivity with other structures in value-based behavior. For example, interactions between OFC and basolateral amygdala have been shown to be critical for using information based on the learned value of outcomes (Schoenbaum et al., 1998, 1999, 2007;Baxter et al., 2000;Blundell et al., 2001;Saddoris et al., 2005).
We have thus far interpreted the results from our signaled-probability task as being indicative of differential attentional recruitment in response to reward-associated cues. Our previous data using this task suggest that differential accuracy on high and low-probability trials is most pronounced at shorter cue durations, suggesting that the ability of the reward-probability signal to recruit attention depends on how taxed attentional resources are by current task requirements. The current results suggest that the OFC may be necessary under these conditions to modulate differential recruitment of attention in response to reward-associated cues. This interpretation may be consistent with recent results which show that the OFC may play a role in attention in addition to its role in value-based decision making (Chase et al., 2012), and that the OFC signals the increased salience of situations in which multiple outcomes (in our case, high and low reward probability) are expected (Ogawa et al., 2013). We should note, however, that while our task requires the interaction of motivation and attention on some level, we cannot unequivocally conclude that differential accuracy on high and low reward-probability trials can only be understood in terms of top-down recruitment of attention, or that OFC inhibition compromises this specific aspect of performance. Such confirmation would require parametric manipulation of cue duration and reward probability combined with OFC inactivation.
Another interpretation of the obtained results is that given the increased latency to respond on low-probability trials, these trials taxed working memory more than high-probability trials, and the differences in accuracy are indicative of deficits in remembering the location of the cue. This interpretation is less plausible, however, given the fact that accuracy of mice in the present sustained-attention task is not impaired when explicit delays within the range of the latency intervals obtained here are inserted between cue presentation and presentation of choice-response levers. Thus, the difference in accuracy is not likely to be due to working memory for correct cue location being unduly taxed on low-probability trials.
Another alternative to the attentional recruitment account of our data is that the transient inhibition of neuronal activity in the OFC resulted in an inability to recruit motivational processes. Based on electrophysiology and human neuroimaging evidence, there is overlap in areas involved in the processing of reward and those involved in recruiting attention to motivationally-relevant stimuli (Pessoa and Engelmann, 2010). Thus, the degree to which motivational processes are distinct, both psychologically and at the functional neural level, from attentional processes in experiments like the one reported here may be difficult to specify. Indeed, some have suggested that motivation may exert its effects on behavior by engaging the same functional circuitry used by the attention system (Pessoa and Engelmann, 2010). However, the fact that reward probability was manipulated on a trial-by-trial basis, as opposed to over blocks of trials or over sessions, suggests that the differential accuracy was not due to general, bottom-up, arousal-mediated motivational effects. Furthermore, OFC inhibition did not change the proportion of trials omitted, indicating that it did not impact overall motivation to engage in the task. The present results suggest, therefore, that if OFC inhibition impacted accuracy through motivation it must be through a process that regulates action on a trial-by-trial basis. There is ample evidence that OFC neurons code for outcomes when a signal indicates a specific outcome (Tremblay and Schultz, 1999;Padoa-Schioppa and Assad, 2006;van Duuren et al., 2007). Thus, it is possible that OFC inactivation interferes with the use of information about differential encoding of reward probability from trial-to-trial.
Recent research on the nature of the interactions between the nucleus accumbens (NAc) and the OFC in value-based decision making may suggest different specific roles of the OFC in the performance seen here. The NAc is widely known to be critical for reward-motivated behavior in a variety of paradigms (Salamone et al., 2007). Given the functional connectivity of the NAc with the basal forebrain and the medial prefrontal cortex, both critical components of the attentional machinery needed for our task, the NAc is particularly well situated to mediate the recruitment of attention via reward-associated cues (Hasselmo and Sarter, 2011). It also receives direct projections from the OFC, and recent work has clarified the nature of the interactions between NAc and OFC in value-based decision making. For example, Stott and Redish (2014) recorded concurrently from NAc and OFC during a spatial delay-discounting task that involved a trade-off between reward delay and magnitude. Importantly, they were able to isolate neural activity that occurred during deliberation, before the choice occurred, from activity which occurred after the choice was made. They found that activity in NAc signaled aspects of the reward before the choice was made (see also van der Meer and Redish, 2009), whereas activity in both NAc and OFC maintained representations of reward during the execution of the chosen response. Based on these results, they suggested that NAc is more directly involved in the planning of action during behavioral choice, while both NAc and OFC process information related to the decision, execution, and receipt of an outcome once the decision is made. Inactivation of OFC in our experiment could alter all of the later steps following the decision point and contribute to the inability to use information about reward to guide response selection.
Thus, it seems likely that the OFC is integrating information about both the cued correct choice location gained from employment of attentional processes (possibly facilitated by the NAc) and the signaled-reward probability to facilitate the accurate use of the information for response selection and execution of the selected response. An impairment in the capacity to maintain a representation of the integrated information would lead to less differential responding on high and low reward-probability trials.
The dissociation reported here between the impact of signaled-reward probability on choice-response latencies and discrimination accuracy is consistent with this interpretation of OFC function. Impaired ability to use information about reward probability to modulate response selection and execution accuracy on a trial-by-trial basis in the present procedure could be based on accurate encoding of reward probability but an inability to use that information to guide and/or sustain response choice. Thus, perhaps mice encoded the differential value of the reward-associated cues and this information impacted their choice-response latencies (they were more motivated to respond on high-probability trials), but they were unable to use this information either during cue presentation to recruit/direct attention appropriately or during the choice phase to adaptively modulate choice behavior. Although similar to the above interpretation, in that the role of the OFC is to allow for association of a particular reward value with a particular response, this interpretation differs from that proposed by Stott and Redish (2014; see also Walton et al., 2010) in that rather than altering an updating process which impacts choice behavior on subsequent trials, we suggest that OFC inhibition impacted the decision process by altering choice behavior on the current trial. In sum, we suggest that OFC may play a critical role in the dynamic recruitment of attention in response to signaled-reward probability, in the selection and execution of a choice response, or in both.
A number of results suggest that there are likely separate and dissociable roles of the lateral and medial OFC in reward-motivated behavior and decision making (Burton et al., 2014;Rudebeck and Murray, 2014). Specifically, lateral OFC is thought to be involved in evaluating differences in expected outcomes, while medial OFC is involved in guiding choices based on the expected value of these outcomes (Rudebeck and Murray, 2014).
Our viral injections included both medial and lateral OFC, but were biased toward medial OFC. The dissociation between the impact of signaled-reward probability on choice-response latencies and discrimination accuracy may reflect inhibition of the choice-modulating function of the medial OFC, but spared valuation by lateral OFC. Further work is needed to delineate the specific roles of distinct anatomical areas of OFC in this task.
These results demonstrate that normal OFC function is necessary for the ability of motivationally-significant cues to impact cognitive performance. Deficits in motivation and cognition are present in diseases such as schizophrenia, and severity of these deficits determines functional outcomes and quality of life (Bowie and Harvey, 2006;Green, 2006). Additionally, cognition and motivation interact to produce dysfunction, at least in part through an inability to adaptively modify behavior in response to motivationally-significant cues (Barch, 2005;Nakagami et al., 2008). Numerous results point to prefrontal dysfunction in the pathophysiology of schizophrenia (Berman et al., 1988;Crespo-Facorro et al., 2000;Wang et al., 2014). Indeed, cognitive deficits described in patients are prototypical of the type of deficits seen when OFC function is compromised, including deficits in reversal learning (Waltz and Gold, 2007) and insensitivity of performance to variation in reward probability (Gold et al., 2012, 2013). Our results lend support to the hypothesis that OFC plays a causal role in the interaction of motivation and cognition and that a disruption of this function is a likely source of cognitive and functional impairment in patients.

Author Contributions
RW, PB, and ES conceptualized the research. RW and VW conducted the research. EK provided lab space, reagents, and other material support and consultation for the research. RW and VW analyzed data. RW, PB, and ES wrote the manuscript.