Contrasting Effects of Lithium Chloride and CB1 Receptor Blockade on Enduring Changes in the Valuation of Reward

When an organism responds for a reward, its learned behavior can be characterized as goal-directed or habitual based on whether or not it is susceptible to reward devaluation. Here, we evaluated whether instrumental responding for brain stimulation reward (BSR) can be devalued using a paradigm traditionally used for natural rewards. Rats were trained to lever press for BSR; afterward, BSR was paired with either lithium chloride (LiCl, 5 mg/kg, i.p.), a pro-emetic, or AM251, a CB1 receptor antagonist (3 mg/kg, i.p.) or the vehicle of these compounds. Pairings of BSR with these compounds and their vehicles were performed in a novel environment so that only unconditional effects of BSR would be affected by the pharmacological manipulations. Subsequently, in a probe test, all rats were returned in the drug-free state to the boxes where they had received training and instrumental responding was reassessed in the absence of BSR delivery. When compared to control, LiCl produced a significant decrease in the number of responses during the test session, whereas AM251 did not. These results show that instrumental responding for BSR is susceptible to devaluation, in accord with the proposal that this behavior is supported at least in part by associations between the response and the rewarding outcome. Further, they suggest that reward modulation observed in studies involving the use of CB1 receptor antagonists arises from changes in the organism’s motivation rather than drug-induced changes in the intrinsic value of reward.


INTRODUCTION
Goal-directed behavior, unlike habits, is adjusted immediately and appropriately to changes in the value of the expected outcome. This reflects the finding that such behavior is based on associations between the response and the outcome or goal of the action, so that organisms may continuously re-evaluate their goal objects and dynamically change their actions in order to effectively produce adaptive behaviors (Dickinson, 1985). A rewarding goal's value can be diminished by selective satiety and by induction of taste aversion (Colwill and Rescorla, 1986; Yin and Knowlton, 2002). Such manipulations do not produce a significant change in habitual behaviors; habits persist even if the reward becomes less attractive or if the action is not necessary to earn the reward (Adams and Dickinson, 1981; Adams, 1982). Thus, once lever pressing for a reward becomes habitual in this sense, induced taste aversion or unlimited exposure to the reward prior to a probe test has very little consequence for subsequent lever pressing behavior.
Since the discovery that organisms will seek and reinitiate electrical stimulation to certain brain areas (Olds and Milner, 1954; Olds, 1962), brain stimulation reward (BSR) has become the paradigm of choice for studying the neural reward circuitry. Some of the reasons for this are that the electrical stimulation can be precisely manipulated and that its parameters have neurophysiological meaning. The current passed through the electrode tip depolarizes nearby neurons, thereby triggering action potentials. If the train and pulse duration are held constant, the number of action potentials elicited in the neurons close to the electrode tip is determined by the pulse frequency, whereas the stimulation current or pulse amplitude determines the radius of effective stimulation, and thus the number of cells excited by the electrode (Gallistel et al., 1981).
The behavior elicited and controlled by the electrical stimulation, unlike the behavior controlled by natural rewards (McSweeney and Roll, 1993), is stable both between and within sessions. The electrical signal is delivered directly into the brain, bypassing sensory inputs and the physiological feedback mechanisms that discount natural rewards over the length of the experimental session. Moreover, it is delivered with a minimal delay after the behavior that procures the reward has occurred; therefore, response-reward delays that degrade natural rewards are avoided. The behavior controlled by the rewarding signal that arises from the delivery of electrical pulses is very sensitive to changes in the stimulation parameters and therefore to changes in rewarding efficacy.
Even though BSR has very peculiar characteristics, the rewarding signal delivered by the electrode and that of natural rewards are evaluated and compared on a similar scale. The rewarding signal produced by the stimulation can compete with, summate with, and substitute for (Green and Rachlin, 1991) natural rewards. Drugs that are used to devalue natural rewards, such as lithium chloride (LiCl), decrease the rewarding effect of electrical brain stimulation. Specifically, when the curve shift paradigm is used, injecting LiCl at relatively high doses (100 or 200 mg/kg, i.p.) has been reported to increase the self-stimulation threshold, meaning that stronger stimulation is required to produce a response similar to that observed under vehicle conditions (Tomasiewicz et al., 2006; Mavrikaki et al., 2009). Thus, the curve that relates operant performance to stimulation frequency shifts rightward without a significant disruption of performance capacity (Miliaressis et al., 1986). A similar increase in reinforcement threshold is observed when the post-reinforcement pause method is used (Cassens and Mills, 1973). In this method the experimental subjects are trained under a concurrent fixed ratio (FR)-continuous reinforcement (CRF) schedule of reinforcement, in which the stimulation for the FR schedule is kept at maximal intensity whereas the stimulation for the CRF schedule is varied between zero and maximal. Increasing and decreasing stimulus intensity on the CRF schedule leads to a switch in schedule control over the behavior and to a gradual disappearance and reappearance of post-reinforcement pauses (PRPs) on the concurrent FR schedule. These PRPs are critical for providing a criterion for changeover in schedule control, and constitute a measure of reinforcement threshold (Buscher et al., 1990).
The threshold obtained through this method, like the one obtained with the curve shift method, is then used as a baseline against which the effects of various experimental manipulations are expressed quantitatively in psychophysical units, thereby avoiding the confounding effects of drugs on response rate (Bozarth, 1987).
These studies suggest that LiCl produces a hypofunction of brain reward systems and immediate effects on reward. One of the goals of the present study was to further characterize reward devaluation of BSR by providing evidence of long-lasting effects of LiCl when non-contingent reward delivery is paired with this drug, using a paradigm commonly used with natural rewards (Holland and Rescorla, 1975; Adams and Dickinson, 1981; Schoenbaum and Setlow, 2005; Nelson and Killcross, 2006). An advantage of this approach is that BSR is paired with the drug in a context different from the one in which the rats are trained and tested (rather than assessing performance under the effects of the drug), thereby minimizing associations between training context and reward that could counteract the effects of LiCl.
Additionally, we evaluated the effects of AM251, a cannabinoid receptor (CB1) antagonist. Behavioral output during the pursuit of reward can be potently modulated by activation of CB1 receptors, which are ubiquitous in brain circuitry associated with reward (Solinas et al., 2008). For example, injection of a CB1 agonist can reinstate drug-seeking behavior (De Vries et al., 2001). Similarly, CB1 receptor agonists can potentiate the rewarding effect of drugs of abuse and natural rewards (Gallate et al., 1999; Valjent et al., 2002), whereas antagonists have the opposite effect (Fattore et al., 2003, 2007; Cippitelli et al., 2005; Economidou et al., 2006). When the role of CB1 receptors is evaluated in the context of BSR, the results are contradictory. Some studies using CB1 receptor agonists show small or no decreases in self-stimulation threshold (Lepore et al., 1996; Arnold et al., 2001), whereas other experiments report pronounced decreases in self-stimulation thresholds (Vlachou et al., 2005, 2006). When CB1 receptor antagonists are used, similarly contradictory results are observed; some studies report no effects (Vlachou et al., 2005; Xi et al., 2008) whereas others show significant increases (Deroche-Gamonet et al., 2001; De Vry et al., 2004). The contrast between the robust effects of CB1 receptor manipulations on the reinforcing effects of natural rewards and drugs of abuse and the inconsistent effects obtained with BSR could be an indirect indication of which factors are affected by CB1 receptor activation. It is possible that these receptors elicit a change in reinforcement by affecting the organism's motivational state and not the reward's intrinsic value. Indeed, it has recently been reported that CB1 receptors produce their effects on BSR by altering factors other than reward sensitivity (Trujillo-Pisanty et al., 2011).
Therefore, we hypothesized that pairing AM251 with non-contingent rewarding stimulation should not produce enduring effects on the valuation of reward.

SUBJECTS
Forty male Sprague-Dawley rats (Charles River, Wilmington, MA, USA) weighing between 350 and 400 g at the time of surgery were used (n = 24 for LiCl experiments and n = 16 for AM251 experiments). The subjects were individually housed on a normal 12-h light/dark cycle (lights on from 0700 to 1900), with ad libitum access to water and food (Purina Rat Chow).

SURGERY
Animals were anesthetized with isoflurane and implanted with a bipolar stimulating electrode (Plastics One, Roanoke, VA, USA) with prongs spaced 0.5 mm apart. The electrode was stereotaxically aimed at the ventral tegmental area (VTA; −0.5 mm ML, −5.4 mm AP, −8.7 mm DV, relative to bregma) and secured with dental acrylic and skull-screw anchors. At the end of the surgery, the rats were injected with carprofen (5 mg/kg; s.c.) to reduce pain and with sterile saline solution (1 ml/kg; s.c.) as post-surgery fluid therapy. The rats were allowed to recuperate for 5-7 days after surgery before any experimental manipulation.

SELF-STIMULATION TRAINING
Each of the rats implanted with stimulating electrodes was shaped to press a lever for 24 biphasic square pulses (2 ms per phase) delivered at 60 Hz. The current varied across animals between 100 and 150 μA and was delivered using a constant current isolator (A-M Systems, Sequim, WA, USA) controlled by a PC running custom-written LabVIEW software (National Instruments, Austin, TX, USA). Shaping took place in an operant conditioning chamber (12.5 L × 13.5 W × 13.5 H; Med Associates, Georgia, VT, USA) located within ventilated sound attenuation chambers. Control of operant boxes and response acquisition was achieved with Med-PC IV software (Med Associates, Georgia, VT, USA).
The operant boxes were equipped with a house light, two cue lights above two retractable levers, a sonalert module (2900 Hz tone delivery), and a white noise amplifier. Rats were shaped to press a lever to obtain electrical stimulation delivery at the VTA. Once they pressed the lever on their own, they were trained under a fixed ratio 1 schedule with an inter-trial interval of 10 s. Both retractable levers were present during the experiment, but only one was associated with an illuminated cue light and reward delivery (active lever). Responses on the other lever (inactive lever) did not have any scheduled consequences. A trial began with the illumination of the cue light above the active lever and of the house light, and with the extension of the active and inactive levers. Once the rat pressed down the active lever, both levers retracted and the electrical stimulation train was delivered, the cue and house lights were turned off, and the 2900-Hz tone started. At the end of the 10-s inter-trial interval, the tone was muted and the house light was turned off for 1 s and a new trial began. White noise and fans were on throughout the experimental session. Animals were considered to be at criterion once they pressed 100 consecutive times for stimulation. Rats that showed motor or aversive effects of the stimulation were removed from the experiment.
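The stimulation parameters given above imply a nominal train duration and a charge per pulse phase, which the following minimal sketch works out. It is an illustration only, not the LabVIEW control code used in the study; the function names are hypothetical, and the train duration assumes one full inter-pulse period is counted per pulse.

```python
# Stimulation parameters as stated in the text.
PULSES_PER_TRAIN = 24
PULSE_FREQUENCY_HZ = 60.0
PHASE_DURATION_S = 2e-3              # 2 ms per phase
CURRENT_RANGE_A = (100e-6, 150e-6)   # 100 to 150 microamps across animals


def train_duration_s(n_pulses: int, frequency_hz: float) -> float:
    """Nominal train duration (assumption: one full period counted per pulse)."""
    return n_pulses / frequency_hz


def charge_per_phase_c(current_a: float, phase_s: float) -> float:
    """Charge carried by one phase of a rectangular constant-current pulse."""
    return current_a * phase_s


duration = train_duration_s(PULSES_PER_TRAIN, PULSE_FREQUENCY_HZ)
charges_uc = [charge_per_phase_c(i, PHASE_DURATION_S) * 1e6 for i in CURRENT_RANGE_A]

print(f"train duration: {duration:.2f} s")
print(f"charge per phase: {charges_uc[0]:.2f} to {charges_uc[1]:.2f} microcoulombs")
```

Under these assumptions each train lasts about 0.4 s and each phase carries roughly 0.2 to 0.3 μC, depending on the current chosen for the animal.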

Experiment 1
Twenty-four hours after training, rats were randomly divided into two groups. The first group (n = 12) was injected with 5 mg/kg i.p. of LiCl (Sigma Aldrich) dissolved in 0.9% saline; the second group (n = 12) was injected with saline. Injections took place in the home cage 30 min prior to the delivery of non-contingent stimulation. The non-contingent stimulation was carried out in operant boxes similar to those in which the rats were trained, but no levers, stimuli, house lights, or white noise were present and the doors of the isolation cubicles were left open. While the rats were inside the boxes, they received the stimulation according to a variable time 80 s schedule of reinforcement (VT 80). The non-contingent stimulation ended when the rats received 50 stimulations in a 60-min period. This procedure was carried out at approximately the same time on three consecutive days. Twenty-four hours after the last non-contingent stimulation session, rats were returned to the operant chambers where training had taken place. For this test session, all stimuli associated with lever presentation and reward delivery were presented as during self-stimulation responding, but the electrical stimulation was withheld. The session ended after an hour had elapsed.

Experiment 2
Twenty-four hours after training, rats were randomly divided into two groups. The first group (n = 8) was injected with 3 mg/kg, i.p. of AM251 dissolved in a solution of (1:1:18) ethanol, emulphor (Rhodia, Cranbury, NJ, USA), and saline (0.9%). The second group (n = 8) was injected with the vehicle. Drug delivery and experimental design were identical to Experiment 1. This dose was chosen in accordance with previous studies (Xi et al., 2008; Trujillo-Pisanty et al., 2011).

HISTOLOGY
After completion of the experiment, a lethal dose of urethane (5 g/kg, i.p.) was administered and a 1-mA anodal current was passed through the stimulating electrode for 15 s to deposit iron ions at the site of the electrode tip. Rats were then perfused intracardially with 0.9% sodium chloride and a solution of potassium ferrocyanide (3%), potassium ferricyanide (3%), and trichloroacetic acid (0.5%) in 10% formalin. The brains were removed from the skulls and fixed with 10% formalin solution for at least 7 days. Coronal sections of 40 μm thickness were cut with a cryostat (Thermo Scientific). The stimulating electrode location was determined microscopically at low magnification with reference to the stereotaxic atlas of Paxinos and Watson (2007). The histological reconstructions of the electrode placement show that the tips of the electrodes were located within the VTA (see Figures 1A,B).

STATISTICS
The number of lever presses and the latency to press during the extinction session were analyzed for each pair of groups using independent-groups t-tests. A level of p < 0.05 for a two-tailed test was the criterion for statistical significance. The analysis was carried out using Statistica (Statsoft, Inc., Tulsa, OK, USA).

RESULTS
During the test session, the group of subjects that received the pretreatment with LiCl pressed the lever an average of 8.41 ± 0.98 times with an average latency of 135 ± 6.17 s, whereas animals that received the pretreatment with saline pressed the lever on average 29 ± 5.86 times with an average latency of 135 ± 9.31 s (Figures 2A,B). The difference in the total number of lever presses between these two groups is statistically reliable [t(22) = 3.45; p = 0.002]. There was no statistically significant difference in the observed latency to press between these two groups [t(22) = −0.04; p = 0.498].
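As a rough consistency check on the lever-press comparison above, the t statistic can be approximated from the reported group summaries. This sketch assumes that the ± values are standard errors of the mean, which the text does not state, and relies on the fact that with equal group sizes the pooled-variance standard error of the difference reduces to the root sum of squares of the two SEMs. The function names are illustrative and this is not the Statistica analysis itself.

```python
import math


def t_from_summary(mean_a: float, sem_a: float, mean_b: float, sem_b: float) -> float:
    """Independent-groups t statistic from group means and SEMs.

    Assumption: equal group sizes, so the pooled (Student) standard error of
    the difference equals sqrt(sem_a**2 + sem_b**2).
    """
    return (mean_a - mean_b) / math.hypot(sem_a, sem_b)


def degrees_of_freedom(n_a: int, n_b: int) -> int:
    """Degrees of freedom for a pooled-variance two-sample t-test."""
    return n_a + n_b - 2


# Lever presses in Experiment 1: saline vs. LiCl, n = 12 per group.
# Assumption: the reported +/- values are SEMs.
t_presses = t_from_summary(29.0, 5.86, 8.41, 0.98)
df = degrees_of_freedom(12, 12)

print(f"t({df}) = {t_presses:.2f}")
```

Under these assumptions the statistic comes out at about 3.47 with 22 degrees of freedom, close to the reported value of 3.45; a gap of this size could plausibly reflect rounding of the reported means and errors.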

FIGURE 1 | Coronal sections at Bregma −5.28, −5.40, and −5.52.
The rats that received the pretreatment with AM251 pressed an average of 24.25 ± 3.22 times whereas the rats that were pretreated with vehicle pressed on average 21.25 ± 2.16 times. The average latency to press for these groups was 101.66 ± 9.07 and 104.25 ± 15.52 s, respectively (Figures 3A,B). There were no statistically significant differences between the groups in either the total number of lever presses [t(14) = 0.58; p = 0.282] or the latency to press [t(14) = −0.14; p = 0.443].

DISCUSSION
The present results show that instrumental responding for BSR is susceptible to reinforcer devaluation effects when devaluation is conducted according to classically established procedures. Specifically, the current study differs from prior attempts in that BSR was devalued independently of the learned instrumental behavior, and the instrumental behavior was assessed without reexposure to the now-devalued BSR. Thus, the demonstrated change in responding in the rats that received the LiCl-BSR pairings must reflect an underlying associative structure in which the instrumental response (or perhaps associated cues) drives responding in part by activating a cognitive representation of BSR and its current value. The finding that responding for BSR is sensitive to LiCl devaluation draws an important parallel between responding for BSR and natural rewards, and adds to evidence supporting the use of BSR as a model to examine the brain circuits mediating reward.

FIGURE 3 | (A) Average number of responses during extinction session after three sessions of non-contingent reward delivery. The group pretreated with AM251 showed fewer responses than the group pretreated with vehicle but this difference was not statistically significant (p > 0.05). (B) The latency to press for both groups was statistically similar.
Other studies have shown that BSR and natural rewards share a common circuitry; BSR can be modulated by factors that have been shown to modulate the behavior controlled by natural rewards. Food restriction can potentiate BSR at certain brain sites (Blundell and Herberg, 1968; Carr and Wolinsky, 1993; Fulton et al., 2002). Furthermore, leptin, a hormone secreted by fat cells that suppresses food intake and promotes weight loss, has modulatory effects on BSR. Intracerebroventricular infusion of leptin attenuates the effectiveness of BSR in those brain sites in which BSR is susceptible to food restriction, whereas this hormone has the opposite effect when the electrode is located in sites that are not sensitive to food restriction manipulations (Fulton et al., 2000). Not only can manipulations that alter natural rewards alter the behavior controlled by BSR, but BSR can also exert effects on behaviors typically elicited by natural rewards. For example, BSR can induce feeding (Valenstein et al., 1970; Berridge and Valenstein, 1991) and hoarding (Blundell and Herberg, 1973). The effect of BSR on these behaviors is probably due to potentiated salience of external stimuli rather than increased hedonic value (Berridge and Valenstein, 1991). At the electrophysiological level, the conduction velocities and refractory periods of the neurons that mediate BSR and of those that mediate stimulation-induced feeding are indistinguishable (Gratton and Wise, 1988).

Frontiers in Behavioral Neuroscience www.frontiersin.org
Our results also have important implications for understanding the role of CB1 receptors in mediating reward-seeking behaviors. CB1 receptors have been identified in reward pathways (Robbe et al., 2002; Cota et al., 2003; Melis et al., 2004; Le Foll and Goldberg, 2005) and play an important role in the behavioral expression of the rewarding effects of drugs of abuse, as well as of natural rewards. CB1 receptor agonists increase operant responses for natural rewards and drugs of abuse (Gallate et al., 1999; Valjent et al., 2002). Conversely, CB1 receptor antagonists blunt operant performance for natural rewards and drugs of abuse (De Vry et al., 2004). The malleability of behavior elicited by these manipulations suggests that these receptors play a crucial role in changing the attractiveness of reward. However, in the present task, pairing AM251 with BSR did not affect subsequent instrumental responding. This suggests that, unlike LiCl, CB1 antagonism does not induce a lasting shift in the value of BSR. It could be argued that the dose of AM251 used in the present study was too low to produce any significant effect. However, this possibility can be discarded, since the same dose given while the experimental subjects are performing for BSR produces significant effects in the mountain model testing paradigm (Trujillo-Pisanty et al., 2011) and produces significant changes in reward-seeking behavior when drugs of abuse (Xi et al., 2006, 2008) or natural rewards (Droste et al., 2010) are used.
The contrast in reward devaluation obtained with LiCl and AM251 could arise because antagonism of CB1 receptors affects not the intrinsic value of reward but the organism's motivational state. This would explain why AM251 administered during instrumental responding decreases progressive ratio breakpoints for a diversity of rewards (Ward and Dykstra, 2005; Droste et al., 2010), whereas AM251 administered separately with BSR does not alter subsequent responding.
Also, the inconsistent effects of cannabinoid antagonists on BSR (Solinas et al., 2008) may be a product of the lack of dimensionality of the traditional curve shift method. When operant performance for BSR is measured as a joint product of its stimulation strength and opportunity cost (Hernandez et al., 2010), AM251 produces consistent leftward shifts of the function that relates operant performance to the opportunity cost of the reward, whereas the function that relates operant performance to stimulation strength is conserved (Trujillo-Pisanty et al., 2011). Such a shift is believed to be a product of factors that could include a decrease in the reward signal gain, or an increase in the subjective reward cost and the value of competing activities such as grooming, resting, and exploring (Herrnstein, 1970, 1974; Killeen, 1972; Heyman, 1988). This result strongly suggests that CB1 receptors play their principal role in parts of the reward circuit that are not involved in the determination of reward sensitivity.
In summary, the present results show that LiCl has long-term effects on the valuation of BSR, which suggests that this compound is effective in reducing its intrinsic value and that the BSR task utilized in this study and others (Cheer et al., 2005, 2007) is indeed goal-directed. In contrast, treatment with the CB1 receptor antagonist AM251 did not produce such a change, suggesting that endocannabinoids preferentially engage the circuitry involved with motivation. The present results clarify that responding for BSR is a goal-directed behavior and reinforce the notion that endocannabinoids are primarily involved with motivational rather than intrinsic aspects of reward.