Habitual Preference for the Nondrug Reward in a Drug Choice Setting

For adaptive and efficient decision making, it must be possible to select between habitual alternative courses of action. However, research in rodents suggests that, even in the context of simple decision-making, choice behavior remains goal-directed. In contrast, we recently found that during discrete trial choice between cocaine and water, water-restricted rats preferred water and this preference was habitual and inflexible (i.e., resistant to water devaluation by satiation). Here we sought to test the reproducibility and generality of this surprising finding by assessing habitual control of preference for saccharin over cocaine in non-restricted rats. Specifically, after the acquisition of preference for saccharin, saccharin was devalued and concurrent responding for both options was measured under extinction. As expected, rats responded more for saccharin than for cocaine during extinction, but this difference was unaffected by saccharin devaluation. Together with our previous research, this result indicates that preference for nondrug alternatives over cocaine is under habitual control, even under conditions that normally support goal-directed control of choice between nondrug options. The possible reasons for this difference are discussed.


INTRODUCTION
Organisms are constantly choosing between alternatives to select appropriate actions based on prior experience or expected outcomes. Evidence indicates that the performance of reward-related actions in both rats and humans reflects the interaction of two learning processes (Dickinson and Balleine, 1994;Dickinson, 1994;Balleine and Dickinson, 1998). The deliberative goal-directed process depends on a representation of the outcome as a goal and requires encoding of both the outcome value and the instrumental contingency between the action and the outcome (Dickinson and Balleine, 1994;Balleine and Dickinson, 1998). In contrast, the habitual learning process dissociates actions from the evaluation of their consequences, such that habitual actions can be spontaneously elicited by particular situations or stimuli (Hart et al., 2014). The balance between goal-directed and habitual processes allows adaptive and efficient decision making. Although one may intuitively think that a habitual course of action can be selected among other alternatives, research in laboratory animals suggests that, even in the context of a simple choice decision, choice performance is dominated by goal-directed actions, rather than habitual responses (Colwill and Triola, 2002;Holland, 2004;Kosaki and Dickinson, 2010;Halbout et al., 2016). For instance, using a concurrent schedule in which two responses yielded different outcomes, a post-training decrease in the incentive value of one outcome has been found to attenuate the rate of performance of the associated action, and to favor the choice of the alternative action (Yin et al., 2005;Corbit et al., 2013;Parkes and Balleine, 2013), indicating that choice behavior is goal-directed.
In a series of experiments, we have repeatedly shown that when facing a choice between pressing a lever to get a nondrug reward (i.e., water sweetened with saccharin) or an alternative lever to receive an intravenous dose of cocaine, most rats prefer the nondrug alternative (Lenoir et al., 2007;Cantin et al., 2010;Augier et al., 2012;Madsen and Ahmed, 2015;Vandaele et al., 2016). Importantly, we have found that choice could be biased in favor of cocaine by systematically varying the cost to obtain saccharin or by decreasing its concentration (Cantin et al., 2010). These results suggest that preference remains sensitive to instrumental and environmental contingencies, and may thus be under goal-directed control. However, we recently showed that this is, in fact, not the case (Vandaele et al., 2019b). Specifically, rats persisted in choosing water, their preferred nondrug option when thirsty, even after devaluation by satiation and even if they consumed little of it upon delivery.
This result contrasts with the studies mentioned above showing that expression of habit is prevented in situations of choice involving multiple response-outcome associations (Colwill and Triola, 2002;Holland, 2004;Kosaki and Dickinson, 2010;Halbout et al., 2016). This discrepancy could be explained by the relative difference between the incentive value of the two outcomes which was large in our procedure (i.e., water under water-restriction vs. cocaine) but relatively small in prior studies (i.e., sucrose solution vs. sucrose pellets and sucrose solution vs. food pellets). In theory, when the options' values are close, the comparison process is difficult and should thus engage goal-directed processes, whereas when outcomes' values are sufficiently distant, a simple stimulus-response policy, relying on prior reward history, should suffice, eventually taking over goal-directed processes. Alternatively, this discrepancy could also be explained by other factors. First, the devalued outcome was water, a non-palatable biological reward that is essential for survival, particularly under conditions of water restriction (Vandaele et al., 2019b). Second, unlike prior studies, in our study, preference sensitivity to devaluation was not tested under extinction and with continuous access to both response options. Finally, in our study, devaluation also involved non-contingent access to the devalued outcome between choice trials, which may have resulted in concurrent degradation of instrumental contingency.
Here, we aimed at assessing habitual control of choice between a drug and a nondrug reward by using more standard devaluation procedures. Specifically, non-restricted rats were trained to choose between saccharin and cocaine. After the acquisition of preference for saccharin, saccharin was devalued and concurrent responding for both options was measured under extinction. Saccharin was devalued using two standard devaluation methods: sensory-specific satiety and conditioned taste aversion (CTA). As expected, rats responded more for saccharin than for cocaine during extinction, but this difference was unaffected by any method of saccharin devaluation. Together with our previous research, this result indicates that preference for nondrug alternatives over cocaine is under habitual control, even under conditions that normally support goal-directed control of choice between nondrug options close in value.

Subjects
Twenty male Sprague-Dawley rats (Charles River, L'Arbresle, France, 249-340 g at the beginning of experiments) were used. Rats were housed in groups of 2-3 and maintained in a temperature-controlled vivarium with a 12-h light-dark cycle. Food and water were freely available in the home cages and rats were neither food- nor water-restricted during behavioral testing. All experiments were carried out following institutional and international standards of care and use of laboratory animals [UK Animals (Scientific Procedures) Act, 1986, and associated guidelines; the European Communities Council Directive (2010/63/UE, 22 September 2010) and the French Directives concerning the use of laboratory animals (décret 2013-118, 1 February 2013)]. The animal studies were reviewed and approved by the Committee of the Veterinary Services Gironde, agreement number B33-063-5.

Apparatus
Twelve identical operant chambers (30 × 40 × 36 cm) were used for all behavioral training and testing (Imetronic, Pessac, France). These chambers have been described in detail elsewhere (Augier et al., 2012). Briefly, each chamber was equipped with two automatically retractable levers (Imetronic), a commercially available lickometer circuit (Imetronic), two syringe pumps, a single-channel liquid swivel (Lomir biomedical Inc., Quebec, Canada) and two pairs of infrared beams to measure horizontal cage crossings.

Surgery
Rats were surgically prepared with chronic Silastic catheters (Dow Corning Corporation, Michigan, MI, USA) in the right jugular vein that exited the skin in the middle of the back about 2 cm below the scapulae as described previously (Lenoir et al., 2013).

Operant Training for Cocaine and Saccharin Self-administration
Animals were first trained on alternate daily sessions to lever press for either water sweetened with saccharin (0.2% for 20 s, delivered in the drinking cup) or intravenous cocaine (0.25 mg delivered over 5 s) under a fixed-ratio 1 (FR1 time-out 20 s) schedule (i.e., one response results in one reward), as described in detail elsewhere (Lenoir et al., 2013). One lever was associated with cocaine reward (lever C), the other with saccharin reward (lever S). Sessions began with the extension of one single lever (C or S). If rats responded on the available lever, they were rewarded by the corresponding reward (cocaine or saccharin). Reward delivery was signaled by a 20-s illumination of the cue-light above the lever during which responses were not rewarded (i.e., time-out period). Sessions ended after rats had earned a maximum of 30 rewards or 3 h had elapsed. The maximum number of saccharin or cocaine rewards was limited to 30 per session to ensure approximately equal exposure to both rewards before choice testing. Importantly, to equate training conditions, rats were also tethered to the infusion line during saccharin training sessions but received no injections. There were a total of 10 saccharin training sessions that alternated with 9 cocaine training sessions (Figure 1A).

Discrete-Trials Choice Procedure
After the acquisition of lever pressing for cocaine and saccharin, rats were allowed to choose during several consecutive daily sessions between the lever associated with cocaine (lever C) and the lever associated with saccharin (lever S) on a discrete-trials choice procedure. Each daily choice session consisted of 12 discrete trials, spaced by 10 min, and divided into two successive phases, sampling (four trials) and choice (eight trials). During sampling, each trial began with the presentation of one single lever in this alternating order: C-S-C-S. Lever C was presented first to prevent an eventual drug-induced taste aversion conditioning or negative affective contrast effects (Lenoir et al., 2007). If rats responded within 5 min on the available lever, they were rewarded by the corresponding reward (i.e., 0.25 mg cocaine delivered intravenously or 20-s access to water sweetened with 0.2% saccharin, as described above). Reward delivery was signaled by retraction of the lever and a 40-s illumination of the cue-light above this lever. If rats failed to respond within 5 min, the lever retracted and no cue-light or reward was delivered. Thus, during sampling, rats were allowed to separately evaluate each reward before making their choice. During choice, each trial began with the simultaneous presentation of both levers S and C. Rats had to select one of the two levers. During choice, reward delivery was signaled by retraction of both levers and a 40-s illumination of the cue-light above the selected lever. If rats failed to respond on either lever within 5 min, both levers retracted and no cue-light or reward was delivered. The response requirement of each reward was set to two consecutive responses to avoid eventual accidental choice. A response on the alternate lever before the satisfaction of the response requirement reset it. Response resetting occurred very rarely, however.
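The response requirement described above (two consecutive responses on the same lever, with a reset when the alternate lever is pressed) can be illustrated with a minimal sketch. The function name `resolve_choice` and the press sequences are hypothetical and are not taken from the control software used in the study. The sketch also assumes that a press on the alternate lever counts as the first response toward that lever, which the text does not specify, and it does not model the 5-min response window.

```python
from typing import Iterable, Optional


def resolve_choice(presses: Iterable[str], requirement: int = 2) -> Optional[str]:
    """Return the lever chosen on one trial ('S' or 'C'), or None.

    `presses` is the ordered sequence of lever presses within the trial.
    None means the requirement was never met (treated here as an omission).
    """
    current = None  # lever currently accumulating consecutive responses
    count = 0       # consecutive responses on `current`
    for lever in presses:
        if lever not in ("S", "C"):
            raise ValueError(f"unknown lever: {lever!r}")
        if lever == current:
            count += 1
        else:
            # Assumption: switching levers resets the count, and the
            # switching press is the first response on the new lever.
            current = lever
            count = 1
        if count >= requirement:
            return current
    return None


# Hypothetical press sequences for three trials
print(resolve_choice(["S", "S"]))       # two consecutive S presses
print(resolve_choice(["S", "C", "C"]))  # reset by C, then C completes
print(resolve_choice(["S", "C", "S"]))  # requirement never met
```

Under these assumptions, a single stray press on one lever cannot by itself produce a choice, which is the stated purpose of the requirement.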
Rats were tested in this discrete-trials choice procedure during at least five daily sessions until stabilization of group-average preference (i.e., no increasing or decreasing trend across three consecutive sessions and between-session variation <10%; Figure 1A).
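The stabilization criterion can be sketched as a simple check on session-by-session group averages. The text does not define "trend" or "variation" operationally, so the sketch below makes two labeled assumptions: a trend is a strictly monotonic change across the last three sessions, and variation is the range (maximum minus minimum) of the last three sessions in percentage points. The function name `is_stable` and the example values are hypothetical.

```python
from typing import Sequence


def is_stable(
    cocaine_choice_pct: Sequence[float],
    window: int = 3,
    max_variation: float = 10.0,
    min_sessions: int = 5,
) -> bool:
    """Check a group-average preference series against the stability rule.

    Assumptions (not stated in the text): 'trend' means strictly increasing
    or strictly decreasing values over the window; 'variation' means the
    range of values over the window, in percentage points.
    """
    if len(cocaine_choice_pct) < min_sessions:
        return False  # at least five daily sessions are required
    last = list(cocaine_choice_pct[-window:])
    pairs = list(zip(last, last[1:]))
    increasing = all(b > a for a, b in pairs)
    decreasing = all(b < a for a, b in pairs)
    variation = max(last) - min(last)
    return not (increasing or decreasing) and variation < max_variation


# Hypothetical group averages (% cocaine choice per session)
print(is_stable([40, 35, 30, 26, 29, 27]))  # no trend, small range
print(is_stable([50, 45, 40, 35, 30, 25]))  # steadily decreasing
```

Other operational definitions (for example, a regression slope or a coefficient of variation) would be equally compatible with the wording of the criterion.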

Satiety-Induced Saccharin Devaluation
Animals were divided into two groups, devalued (D) and non-devalued (ND). Animals were individually placed in feeding cages, brought to an experimental room (satiety room) physically different from the room containing the self-administration chambers (choice room), and allowed to acclimate to this room for 30 min. Rats in the D group (N = 10) were given 30 min of free access to a bottle containing 0.2% saccharin whereas rats in the ND group (N = 10) were given free access to a water bottle. The control solution was water because it has a distinct taste from saccharin but, like saccharin, is non-caloric. Immediately after home-cage pre-feeding, rats were brought to the choice room and tested for lever-press responding during extinction. For each rat, extinction began after a delay of 10 min. During extinction, both levers were presented simultaneously and continuously for 10 min. Responding on either lever had no programmed consequence. To confirm the presence of saccharin satiety, immediately after extinction testing, animals were brought back to the satiety room and were all given free access to a bottle containing 0.2% saccharin for 30 min.

Conditioned Taste Aversion (CTA)-Induced Saccharin Devaluation
Aversion conditioning was conducted in feeding cages in an experimental room (CTA room) physically different from the room containing the self-administration chambers (choice room) to minimize direct aversive conditioning to the operant chambers and to devalue saccharin in similar conditions as the satiety-induced devaluation test. Animals were allowed to acclimate to this room for 30 min to avoid novelty-induced anxiety. Rats in the D group (N = 10) were given 30 min of free access to a bottle containing 0.2% saccharin whereas rats in the ND group (N = 10) were given free access to a water bottle. After this 30-min period, rats returned to the colony room and were injected with lithium chloride (5 ml/kg, i.p., of 0.3 M LiCl) before being returned to their home cages. The entire procedure was repeated three times until >80% suppression of saccharin drinking. Rats were then left in their home cages for at least 48 h after the last LiCl administration and before being tested for lever-press responding under extinction. For each rat, extinction began after a delay of 10 min. During extinction, both levers were presented simultaneously and continuously for 10 min. Responding on either lever had no programmed consequence. To confirm the presence of the CTA, immediately after extinction testing, animals were brought back to the CTA room and were all given free access to a bottle containing 0.2% saccharin for 30 min.
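For readers who prefer doses in mass units, the LiCl injection (5 ml/kg of a 0.3 M solution) can be converted with a short worked calculation. The only value not given in the text is the molar mass of LiCl, taken here as 42.39 g/mol (Li 6.94 + Cl 35.45); the function name is hypothetical.

```python
# Molar mass of LiCl in g/mol (Li 6.94 + Cl 35.45); not stated in the text.
LICL_MOLAR_MASS_G_PER_MOL = 42.39


def licl_dose_mg_per_kg(molarity_mol_per_l: float, volume_ml_per_kg: float) -> float:
    """Convert a LiCl injection (molarity, volume per kg) to mg/kg."""
    # mol/L * g/mol = g/L, which is numerically equal to mg/ml
    concentration_mg_per_ml = molarity_mol_per_l * LICL_MOLAR_MASS_G_PER_MOL
    return concentration_mg_per_ml * volume_ml_per_kg


dose = licl_dose_mg_per_kg(0.3, 5.0)
print(f"{dose:.1f} mg/kg")  # roughly 63.6 mg/kg
```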

Data Analysis
All data were subjected to mixed analyses of variance (ANOVA), followed by post hoc comparisons using Tukey's Honestly Significant Difference (HSD) test. Comparisons with a fixed theoretical level (e.g., 50%) were conducted using one-sample t-tests. Some behavioral variables did not follow a normal distribution and were thus analyzed using non-parametric statistics (i.e., Friedman's test for the main effect followed by Wilcoxon's test for paired comparisons; Mann-Whitney for group comparison).
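As an illustration of the comparison with a fixed theoretical level, the sketch below computes a one-sample t statistic against indifference (50% cocaine choice) using only the Python standard library. The data values are hypothetical, and the function name is an assumption; this is not the authors' analysis script. Obtaining a p-value additionally requires the t distribution, for which a statistics package (for example, `scipy.stats.ttest_1samp`) would normally be used.

```python
import math
import statistics
from typing import Sequence, Tuple


def one_sample_t(values: Sequence[float], reference: float) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for H0: mean == reference."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("zero variance: t statistic is undefined")
    standard_error = sd / math.sqrt(n)
    return (mean - reference) / standard_error, n - 1


# Hypothetical % cocaine choice for three rats in one session
t_stat, df = one_sample_t([20.0, 30.0, 40.0], reference=50.0)
print(f"t({df}) = {t_stat:.2f}")
```

A negative t value here indicates a mean below 50%, that is, a preference for saccharin.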

RESULTS
During acquisition, rats learned to self-administer saccharin and cocaine on alternate daily sessions and rapidly earned the maximum number of rewards possible in both conditions (Figure 1B). However, rats self-administered saccharin at a much higher response rate than cocaine, which resulted in shorter session durations (Figure 1C; main effect of reward: F(1,18) = 75.91, p < 0.0001). Following this result and as expected from previous research, virtually all Sprague-Dawley rats preferred saccharin over cocaine when offered a choice (mean cocaine choice over the last three sessions: 27.1 ± 6.5%; Figure 2A). Their preference significantly deviated from indifference from the second session (t-values > 2.2, p-values < 0.05; Figure 2A). Although preference did not significantly change across sessions (Friedman ANOVA Chi Sqr = 7.2, p > 0.1), rats generally completed every choice trial (i.e., 99.8 ± 0.2%; Figure 2B) with increasing efficiency, as evidenced by the decrease in choice latency reaching about 5 s across the last three sessions (Friedman ANOVA Chi Sqr = 68.5, p < 0.0001; Figure 2C). This decrease in choice latency was accompanied by a decrease in both cocaine and saccharin sampling latency (cocaine: Friedman ANOVA Chi Sqr = 44.0, p < 0.0001; saccharin: Friedman ANOVA Chi Sqr = 46.6, p < 0.0001; Figure 2D), suggesting that rats learned to select options and to choose between them with little hesitation. Following rats' preference, the latency to sample saccharin was shorter than the latency to sample cocaine (F(1,19) = 10.40, p < 0.001; Figure 2D).
We then assessed whether choice behavior in our procedure was sensitive to 30-min free access to either a bottle containing 0.2% saccharin (D group) or a bottle of water (ND group) before testing under extinction. Although animals having free access to saccharin drank a large amount of saccharin (18.1 ± 2.1 ml), this pre-feeding did not affect responding during the extinction test (group: F(1,18) = 0.02, p > 0.5; Figures 3A,B). Animals in both groups responded more on the saccharin lever than on the cocaine lever (lever: F(1,18) = 17.45, p < 0.001; Figures 3A,B), in agreement with their strong preference for saccharin. The greatest difference in responding between saccharin and cocaine occurred during the first minute of the test (time bin 1: z = 2.91, p < 0.01; Figure 3A). However, rats in the D group responded as much on the saccharin and cocaine levers as rats in the ND group (group × lever: F(1,18) = 0.004, p > 0.5; Figures 3A,B). The lack of devaluation effect was not due to a failure of saccharin pre-feeding to induce sensory-specific satiety. Indeed, after the extinction test, animals in the D group significantly decreased their saccharin intake compared to their consumption during saccharin pre-feeding before the extinction test (F(1,9) = 17.72, p < 0.01; Figure 3C). Furthermore, these rats consumed significantly less saccharin after the extinction test than rats in the ND group, previously exposed to water bottles (F(1,18) = 13.84, p < 0.01; Figure 3C). Thus, although saccharin pre-feeding reliably induced sensory-specific satiety, animals were insensitive to reward devaluation, suggesting that their behavior was under habitual control.
To further probe the resistance to devaluation, we assessed the effects of CTA on responding during extinction. Rats were first trained for 8 additional sessions in the discrete-trials choice procedure and maintained a stable preference for saccharin, similar to the preference before the first devaluation test (data not shown; average % of cocaine choice over the last three sessions: 26.8 ± 7.1). An aversion to saccharin was then conditioned by pairing its consumption with illness induced by lithium chloride (LiCl) for 3 days. LiCl injections induced a robust CTA (group × day: F(2,36) = 117.4, p < 0.0001; Figures 4A,B) as animals in the D group decreased their saccharin intake between the first and the last day of LiCl treatment (from 26.9 ± 1.8 ml to 4.3 ± 0.2 ml; Tukey p < 0.001; Figure 4A). However, the LiCl devaluation had no effect on responding during the extinction test (group × lever: F(1,18) = 0.65, p > 0.4; Figures 4C,D). Animals in both groups responded more on the saccharin lever than on the cocaine lever (F(1,18) = 24.84, p < 0.0001), mainly during the first minute of the test (time × lever: F(9,162) = 6.65, p < 0.0001; post hoc time bin 1: p < 0.0001; Figure 4C), in agreement with their preference and with the results described above (Figure 3A). The lack of devaluation effect was not due to a failure to induce CTA since animals in the D group significantly decreased their saccharin intake compared to the ND group during free access to the saccharin bottle after the extinction test (Mann-Whitney, Z = 3.62, p < 0.0001; Figure 4B).
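The reported group means can be checked against the >80% suppression criterion given in the methods with a short calculation. The function name is hypothetical, and the sketch applies the criterion to the group means only; the text does not state whether the criterion was evaluated per animal or on the group average.

```python
def suppression_pct(baseline_ml: float, test_ml: float) -> float:
    """Percent reduction in intake relative to the baseline intake."""
    if baseline_ml <= 0:
        raise ValueError("baseline intake must be positive")
    return 100.0 * (baseline_ml - test_ml) / baseline_ml


# Group means for the D group reported in the text (first vs. last LiCl day)
group_mean_suppression = suppression_pct(26.9, 4.3)
meets_criterion = group_mean_suppression > 80.0
print(f"{group_mean_suppression:.1f}% suppression, criterion met: {meets_criterion}")
```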

DISCUSSION
The present study clearly shows that choice between saccharin and cocaine is insensitive to changes in saccharin value, a hallmark of habitual performance (Balleine and Dickinson, 1998;Balleine and O'Doherty, 2010). As expected, rats responded more for saccharin than for cocaine during extinction, but this difference was unaffected by any method of saccharin devaluation (i.e., sensory-specific satiety or induction of a CTA). Together with our previous research (Vandaele et al., 2019b), this result indicates that preference for nondrug alternatives over cocaine is under habitual control, even under conditions that normally support goal-directed control of choice between nondrug options.
In our previous study, preference sensitivity to water devaluation was tested during reinforced choice trials, with free water accesses before and during the test, in conditions where drinking water constituted a biological need critical for survival (Vandaele et al., 2019b). These factors could have promoted preference insensitivity to devaluation. In the present study, several changes were made to avoid this potential caveat and to test sensitivity to outcome devaluation under more standard conditions (Holland, 2004;Kosaki and Dickinson, 2010;Corbit et al., 2013;Parkes and Balleine, 2013). Here, devaluation tests were conducted under extinction, with 10-min concurrent access to both levers, allowing rats to freely sample them in a self-paced manner. Responding during both devaluation tests reflected rats' preference with a higher rate of responding for saccharin compared to cocaine, specifically during the first minute of the test session. This result is in agreement with previous findings showing a stronger resistance to extinction on the saccharin lever (Cantin et al., 2010). However, we did not observe any effect of devaluation, indicating that choice performance was habitual. This finding is difficult to reconcile with previous empirical and theoretical research on choice behavior showing that training on a schedule offering a choice between responses yielding different outcomes prevents the expression of habits (Colwill and Rescorla, 1985;Holland, 2004;Kosaki and Dickinson, 2010). In two of the latter studies (Colwill and Rescorla, 1985;Kosaki and Dickinson, 2010), the CTA was conducted in the operant chambers whereas it was conducted in separate feeding cages in the present study. This was done on purpose to avoid any aversion conditioning to the choice context itself (Boakes et al., 1997;Blizard, 2016, 2018). One could, therefore, argue that our negative findings may be due to a failure of generalization of the CTA to the choice context. This is unlikely, however, since such generalization has been previously observed in other similar studies (Dickinson et al., 1983;Holland, 2004;Schoenbaum and Setlow, 2005;Vandaele et al., 2017;Keiflin et al., 2019). Also, we have independent evidence that in our conditions, devaluation by satiation is effectively transportable across different contexts (in preparation). Taken together, these considerations strongly suggest that saccharin choice is habitual in our choice procedure.
It is well known that overtraining on a particular response can render it habitual through the development of stimulus-response associations (Adams, 1982;Dickinson, 1985;Dickinson et al., 1995;Coutureau and Killcross, 2003). During initial operant training, animals were exposed to 300 saccharin outcomes, a number of trials sufficient to shift animals' performance from goal-directed to habitual (Dickinson et al., 1995). It could then be argued that repeated testing could account for the insensitivity to outcome devaluation observed in the present study. However, it has been shown that whatever the amount of instrumental training, stimulus-response habits do not overcome goal-directed decision making when two responses associated with different outcomes are concurrently available (Colwill and Rescorla, 1985;Holland, 2004;Kosaki and Dickinson, 2010). Alternatively, numerous studies have shown that cocaine exposure promotes habitual responding, whether for cocaine itself (Dickinson et al., 2002;Miles et al., 2003) or a nondrug reward (Gourley et al., 2013;LeBlanc et al., 2013;Corbit et al., 2014;Schmitzer-Torbert et al., 2015). However, prior cocaine exposure was not sufficient to bias responding toward habit in a choice situation involving two nondrug rewards close in value, and cannot account for the results reported here (Halbout et al., 2016). Rather, the rapid development of habit may have been promoted by prior training in the discrete trial choice procedure. Indeed, it was shown that the insertion and retraction of the lever at the onset and termination of discrete trials constitute salient reward-predictive cues, leading to higher automaticity, behavioral chunking, and the rapid development of stimulus-bound habitual responding (Vandaele et al., 2017, 2019a).
An alternative explanation for the unexpected habitual performance in our choice procedure could reside in the large difference in saccharin and cocaine incentive value (Cantin et al., 2010). Theoretical models and a growing body of evidence suggest that the brain chooses advantageously among competing options by assigning values to the two stimuli, comparing them, and selecting the best course of action (Glimcher and Rustichini, 2004;Rangel et al., 2008;Rushworth et al., 2009;Rangel and Hare, 2010). Therefore, when the available options are difficult to distinguish, decisions are made based on careful evaluation of options values, and therefore, remain under goal-directed control. Consistent with this, choice performance is systematically under goal-directed control when the choice outcomes are close in value (Colwill and Triola, 2002;Holland, 2004;Kosaki and Dickinson, 2010). However, in our situation, the value difference between the two outcomes is such that decision-making does not require effortful representation and comparison of the value of the option and could instead rely on a simpler stimulus-response policy, based on prior reward history. As such, we suggest that the difference in options' values might encourage the transition from goal-directed to habitual performance. Consistent with this hypothesis, Daw et al. (2005) suggested that arbitration between goal-directed and habitual systems relies on the relative uncertainty of predictions from each system with a low task complexity favoring habitual model-free control (Daw et al., 2005). Also, Keramati et al. (2011) proposed a normative model in which the relative incentive value of each outcome critically affects the arbitration between goal-directed and habit processes (Keramati et al., 2011). 
If the arbitration between goal-directed and habitual processes depends on the value difference between options, then one would expect that behavior would be goal-directed during a choice between cocaine and another nondrug reward with a similar reinforcing value, such as a lower concentration of saccharin (Cantin et al., 2010). This hypothesis remains to be tested in future experiments.
At a neurobiological level, the balance between goal-directed and habitual behavior depends upon corticostriatal circuits, with a sensorimotor-dorsolateral striatal network supporting habitual, stimulus-response behaviors and a prefrontal-dorsomedial striatal network mediating flexible, goal-directed behavior (Yin and Knowlton, 2004;Yin et al., 2005;Hitchcott et al., 2007;Ashby et al., 2010;Balleine and O'Doherty, 2010;Corbit et al., 2012;Lingawi and Balleine, 2012). Among regions of the "goal-directed network," the orbitofrontal cortex (OFC) is critical when values must be used to guide responding based on a representation of the expected outcomes (Rushworth et al., 2011;Schoenbaum et al., 2011;Padoa-Schioppa and Conen, 2017). Although this region would be expected to support choice performance, recent studies reported no effect of optogenetic inhibition of OFC on economic choice behavior (Gardner et al., 2017, 2018). To explain this surprising result, the authors suggested that economic choice in their task may not be entirely governed by model-based goal-directed behavior but could instead rely on habits, as is the case in the present study. We have recently found neuronal correlates of preference between cocaine and saccharin in the OFC, with the size of the cocaine-signaling neuronal assembly during sampling trials predicting preference for cocaine during choice trials Ahmed, 2018, 2019). Future experiments will move one step forward to investigate the causal involvement of this region in choice performance during discrete-trial choices between cocaine and saccharin. From the results reported here, we should expect no effect of OFC lesion, pharmacological inactivation, or optogenetic inhibition on choice performance.
Several theories suggest that drugs of abuse may contribute to compulsive drug use by promoting habitual drug-seeking, at the expense of alternative activities (Robbins and Everitt, 1999;Robbins, 2005, 2016). Although the difficulty of devaluing an intravenously self-administered drug precludes any conclusion about the nature of cocaine-seeking (i.e., habitual or not), our results do not seem consistent with these theories. In our study, habitual responding for saccharin may bias preference toward saccharin choice. Indeed, by definition, habitual responding for saccharin is automatically triggered by antecedent stimuli (for instance, the insertion of the lever) with short response latency. In contrast, if responding for cocaine is under goal-directed control, then the selection of this option would require a representation of the outcome value and would be associated with longer response latencies. Preference for saccharin could then be explained by a faster selection of the saccharin option, as previously suggested (Shapiro et al., 2008). Analysis of sampling latencies supports this hypothesis, with shorter sampling latency for saccharin compared to cocaine. Does this mean that preference for saccharin only results from habit? Previous findings suggest that habitual responding for saccharin is not sufficient to explain the preference (Vandaele et al., in press). Indeed, rats still preferred the saccharin option when they were able to exert voluntary goal-directed control over the initiation of choice trials. Furthermore, preference is sensitive to variation in saccharin concentration, delay, and cost (Lenoir et al., 2007;Cantin et al., 2010), suggesting that the value of saccharin is still computed and considered in the decision-making process, although with less deliberation than previously thought.
In conclusion, we report strong evidence that choice behavior can become habitual in a drug choice setting in rats. Prior training in the discrete trial choice procedure combined with the large difference in options value may have contributed to this finding. Clearly, more research is needed to understand why choice behavior between drug and nondrug rewards becomes habitual and inflexible in our conditions in comparison to other choice studies. This question is all the more important because growing evidence in humans suggests that habit formation occurs rarely, if at all, in similar laboratory drug choice settings (Hogarth, 2020). One major difference between these two sets of choice studies, in addition to species-specific differences, is that in human drug choice studies, people preferred the drug option over the nondrug option while in our and other studies, rats showed the opposite preference. Understanding this difference in drug preference may represent a first step toward understanding differential engagement of goal-directed control during drug choice in these two animal species.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The animal studies were reviewed and approved by the Committee of the Veterinary Services Gironde, agreement number B33-063-5.

AUTHOR CONTRIBUTIONS
SA conceived the project. KG and SA designed the experiment. KG carried out the experiment and collected the data. KG and YV analyzed the data and wrote the first version of the manuscript. All authors critically edited, reviewed content and approved the final version for publication.

FUNDING
This work was supported by the French Research Council (Centre National de la Recherche Scientifique, CNRS), the Université de Bordeaux, the French National Agency (Agence Nationale de la Recherche, ANR-2010-BLAN-1404-01), the Ministère de l'Enseignement Supérieur et de la Recherche (MESR) and the Fondation pour la Recherche Médicale (FRM DPA20140629788).