Habit Formation after Random Interval Training Is Associated with Increased Adenosine A2A Receptor and Dopamine D2 Receptor Heterodimers in the Striatum

Striatal adenosine A2A receptors (A2ARs) modulate striatal synaptic plasticity and instrumental learning, possibly through functional interactions with dopamine D2 receptors (D2Rs) and metabotropic glutamate receptors 5 (mGluR5) via receptor-receptor heterodimers, but in vivo evidence for these interactions is lacking. Using an in situ proximity ligation assay (PLA), we studied the subregional distribution of the A2AR-D2R and A2AR-mGluR5 heterodimer complexes in the striatum and their adaptive changes over random interval (RI) and random ratio (RR) training of instrumental learning. After confirming the specificity of the PLA detection of the A2AR-D2R heterodimers with A2AR knockout and D2R knockout mice, we detected a heterogeneous distribution of the A2AR-D2R heterodimer complexes in the striatum, with greater abundance in the dorsolateral than in the dorsomedial striatum. Importantly, habit formation after RI training was associated with increased formation of the A2AR-D2R heterodimer complexes, with a prominent increase in the dorsomedial striatum. Conversely, goal-directed behavior after the RR schedule was not associated with an adaptive change in the A2AR-D2R heterodimer complexes. In contrast to the A2AR-D2R heterodimers, the A2AR-mGluR5 heterodimers showed neither subregional variation in the striatum nor adaptive changes over either RR or RI training. These findings suggest that the development of habit is associated with increased formation of A2AR-D2R heterodimer protein complexes, which may lead to reduced dependence on D2R signaling in the striatum.

INTRODUCTION
The adenosine A2A receptors (A2ARs) are highly enriched in the striatopallidal neurons of the striatum (Svenningsson et al., 1999), where they co-localize and form heterodimers with dopamine D2 receptors (D2Rs) and metabotropic glutamate 5 receptors (mGluR5) (Tebano et al., 2009; Pinna et al., 2014; Taura et al., 2015). Possibly through this receptor-receptor heterodimerization, striatopallidal A2ARs interact antagonistically with D2Rs (Canals et al., 2003; Trifilieff et al., 2011) and synergistically with mGluR5 (Ferré et al., 2002; Kachroo et al., 2005). Through these functional interactions, striatopallidal A2ARs can modulate dopamine and glutamate signaling, striatal synaptic plasticity, and cognition, including instrumental behaviors (Chen, 2014). Indeed, genetic inactivation of striatal A2ARs impaired habit formation (Yu et al., 2009), and pharmacological reduction of A2AR-mediated cAMP-pCREB signaling in the dorsomedial striatum (DMS) enhanced goal-directed ethanol drinking (Nam et al., 2013) and reversed methamphetamine-induced facilitation of habit formation (Furlong et al., 2015). However, the mechanism underlying the A2AR modulation of instrumental behaviors is not known.
Striatal long-term depression (LTD), which is restricted to striatopallidal neurons and requires activation of D2Rs and mGluR5 (Kreitzer and Malenka, 2007; Lovinger, 2010), is the main form of synaptic plasticity in the dorsolateral striatum (DLS; Partridge et al., 2000; Yin and Knowlton, 2006; Lovinger, 2010). The loss of striatopallidal LTD is associated with a shift in behavioral control from goal-directed action to habitual responding (Nazzaro et al., 2012; Furlong et al., 2015). Since activation of striatopallidal A2ARs can convert striatopallidal synaptic plasticity from LTD to long-term potentiation (LTP; Shen et al., 2008), striatopallidal A2AR signaling may interact with D2R-, mGluR5-, and endocannabinoid-mediated striatal LTD in striatopallidal neurons to modify instrumental learning. We therefore postulated that striatopallidal A2ARs exert their effects on D2R- or mGluR5-mediated striatal synaptic plasticity and instrumental learning through the physical association of the A2AR-D2R and A2AR-mGluR5 heterodimers in striatopallidal neurons. Here, using two instrumental learning schedules coupled with an in situ proximity ligation assay (PLA), we investigated the distribution of the A2AR-D2R and A2AR-mGluR5 heterodimers in the DLS and DMS and their adaptive changes after random interval (to promote habit) and random ratio (to promote goal-directed behavior) training schedules.

Animals
All animals were handled in accordance with the protocols approved by the Institutional Ethics Committee for Animal Use in Research and Education at Wenzhou Medical University, China. Eighteen adult C57BL/6J mice (n = 6/experimental group), three A2AR knockout mice (from Chen's laboratory at Boston University School of Medicine), and three D2R knockout mice (from The Jackson Laboratory, USA; Drd2tm1Low, stock No. 003190) were used for the experiments.

Instrumental Behavior Training Schedules
Instrumental training and behavioral testing followed the procedure of Rossi and Yin (2012). In brief, mice first underwent a 5-day food deprivation schedule to reach 80-85% of their free-feeding weight before the instrumental training sessions. Mice were then given one 30-min magazine training session during which a 20 µl drop of 20% sucrose solution was delivered as the reward on a random time 60-s schedule. During continuous reinforcement (CRF) sessions, each lever press resulted in delivery of the sucrose reward. Sessions ended after 60 min or after 50 rewards had been earned, whichever came first. After CRF, mice underwent either a random interval (RI) schedule to promote habit formation or a random ratio (RR) schedule to promote goal-directed behavior. Mice on the RI schedule were trained for 2 days on random interval 30 s (RI30), with a 0.1 probability of reward availability every 3 s contingent upon lever pressing, followed by 4 days on the RI60 schedule. Progressively leaner schedules of reinforcement were used for the RR training procedure: RR5 (each response was rewarded with a probability of 0.2 on average), RR10, and RR20, each for 2 days.
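The difference between the two schedules can be made concrete with a small simulation. The following Python sketch is illustrative only and is not the training software used in this study. It assumes that time is divided into 3-s windows, that an armed reward stays available until the next press, and that the mouse presses at least once in a window with probability `p_press`; the function names and parameters are hypothetical. The text gives the arming probability only for RI30 (0.1 every 3 s), so the sketch does not model RI60.

```python
import random


def ri_mean_interval(check_period_s, p_available):
    """Mean time between reward availabilities under a random interval schedule."""
    return check_period_s / p_available


def simulate_ri(n_windows, p_available, p_press, rng):
    """Count rewards earned over n_windows checks of a random interval schedule.

    In each window an unarmed reward becomes available with probability
    p_available. An armed reward is collected in the first window in which
    the animal presses (probability p_press), after which it is disarmed.
    """
    armed = False
    rewards = 0
    for _ in range(n_windows):
        if not armed and rng.random() < p_available:
            armed = True
        if armed and rng.random() < p_press:
            rewards += 1
            armed = False
    return rewards


def simulate_rr(n_presses, p_reward, rng):
    """Count rewards under a random ratio schedule (each press pays with p_reward)."""
    return sum(1 for _ in range(n_presses) if rng.random() < p_reward)


if __name__ == "__main__":
    rng = random.Random(0)
    windows = 1200  # a 60-min session split into 3-s windows
    print("RI30 mean interval (s):", ri_mean_interval(3, 0.1))
    for p_press in (0.5, 1.0):
        print("RI30 rewards at p_press", p_press, "=",
              simulate_ri(windows, 0.1, p_press, rng))
    for presses in (600, 1200):
        print("RR5 rewards for", presses, "presses =",
              simulate_rr(presses, 0.2, rng))
```

Under these assumptions, pressing more often raises the RI reward count only modestly because availability is gated by time, whereas the RR reward count scales with the number of presses.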
Following the RI and RR training sessions, a 2-day devaluation test was conducted. A specific satiety procedure was applied to alter the current value of a specific reward. On each day, the mice were allowed free access for at least 1 h either to the home chow used to maintain their weight (valued condition, in which the sucrose solution was still valued) or to the sucrose solution they had earned by lever pressing in the training sessions (devalued condition, in which the sucrose solution was devalued), to achieve sensory-specific satiety. Immediately after the unlimited pre-feeding session, mice were given a 5-min extinction test during which the lever was inserted and the number of lever presses was recorded, but no reward was delivered. The order of the valued and devalued condition tests (day 1 or 2) was counterbalanced within each group. Habitual mice, which are insensitive to manipulation of outcome value, would change their lever pressing only mildly in the devalued condition compared to the valued condition, whereas goal-directed mice, which are sensitive to outcome value, would significantly reduce their lever presses in the devalued condition. The control mice underwent the food deprivation schedule and were handled every day in exactly the same way as the RR and RI training groups before instrumental training. During the training sessions, the control mice were also placed in the operant chambers, in which the sucrose reward was delivered at random 30-s or 60-s intervals (corresponding to RI30 and RI60) but without the lever extended. In the devaluation test, the control mice were also exposed to the operant chamber for 5 min with no lever extended and no reward available.
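One common way to summarize a devaluation test is to express the lever presses in each condition as a fraction of the presses across both conditions. The Python sketch below illustrates that idea; the normalization and the function name are assumptions chosen for illustration, not a procedure stated in this study.

```python
def normalized_presses(valued, devalued):
    """Return (valued, devalued) lever presses as fractions of their sum.

    Fractions near (0.5, 0.5) indicate insensitivity to devaluation (habit);
    a devalued fraction well below 0.5 indicates goal-directed responding.
    """
    total = valued + devalued
    if total == 0:
        raise ValueError("no lever presses recorded in either condition")
    return valued / total, devalued / total
```

For example, a mouse that pressed 30 times in the valued test and 10 times in the devalued test would score (0.75, 0.25), whereas 20 presses in each test would score (0.5, 0.5).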
In the present study, three groups (n = 6 for each group) were examined for the A2AR-D2R and A2AR-mGluR5 heterodimers in the striatum after instrumental learning: (a) mice without instrumental training ("control group"), (b) mice that underwent RI training sessions ("RI group"), and (c) mice that underwent RR training sessions ("RR group").

Proximity-Ligation Assay (PLA)
After two additional RI60 or RR20 training sessions following the devaluation test, mice were sacrificed for PLA detection of the A2AR-D2R and A2AR-mGluR5 heterodimers in the striatum. We performed the PLA analysis according to a recently described procedure (Augusto et al., 2013; Pinna et al., 2014). Three sections from each brain (from the anterior, middle, and posterior parts of the striatum, respectively) were rinsed in TBS at room temperature. The sections were incubated with 1% BSA and 0.5% Triton X-100 for 2 h at room temperature for blocking and permeabilization. The sections were then incubated overnight at room temperature with mouse anti-A2AR (1:300; Millipore) and either rabbit anti-D2R (1:300; Millipore) or rabbit anti-mGluR5 (1:300; Millipore). Sections were then rinsed four times (30 min each) in TBS with 0.2% Triton X-100, following the manufacturer's protocol, and incubated at 37 °C with the PLA secondary probes (1:5; Olink Bioscience) for 2 h. After rinsing with Duolink II Wash Buffer A, the sections were incubated with the ligation-ligase solution for 30 min at 37 °C, followed by rinsing with Duolink II Wash Buffer A. The sections then underwent amplification with polymerase (1:40; Olink Bioscience). The sections were then washed in decreasing concentrations of SSC buffer (Olink Bioscience) and mounted on slides. Fluorescent mounting medium (containing DAPI) was applied to the sections. The fluorescence images (three non-overlapping, randomly selected microscopic fields from each of the DMS and DLS in each brain section) were acquired with a confocal microscope (Figure 2A). The quantitative analysis followed the procedure of Bonaventura et al. (2014). Cells surrounded by red puncta were defined as positive cells (white arrows in Figure 1). Cells were counted with ImageJ software. Each microscopic field image was quantified as "positive cell number/total cell number." The quantified value of the experimental mice was normalized to that of A2AR KO mice (as the background).
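The quantification described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis script used in the study: the function names are hypothetical, and the text does not say whether normalization to the A2AR KO background was done by division or by subtraction, so the ratio used here is an assumption.

```python
def positive_fraction(positive_cells, total_cells):
    """Fraction of cells in one microscopic field that are PLA-positive."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return positive_cells / total_cells


def mean_positive_fraction(fields):
    """Average the per-field fractions; fields is a list of (positive, total)."""
    if not fields:
        raise ValueError("at least one field is required")
    fractions = [positive_fraction(p, t) for p, t in fields]
    return sum(fractions) / len(fractions)


def normalize_to_background(sample_value, background_value):
    """Express a sample value relative to the KO background (assumed: a ratio)."""
    if background_value <= 0:
        raise ValueError("background_value must be positive")
    return sample_value / background_value
```

With three hypothetical fields of 20 cells containing 3, 4, and 2 positive cells, `mean_positive_fraction([(3, 20), (4, 20), (2, 20)])` gives a mean positive fraction of about 0.15.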

Immunofluorescence
Mice were deeply anesthetized and then transcardially perfused with 0.01 M PBS (pH = 7.4) followed by ice-cold 4% paraformaldehyde. Brains were post-fixed in 4% paraformaldehyde for 4-6 h at 4 °C and then equilibrated in a graded series of sucrose solutions (10, 20, and 30%). Immunofluorescence was performed on 30 µm free-floating sections. The sections were washed in PBS, incubated for 60 min in 0.3% Triton X-100 and 10% normal donkey serum, and then incubated with mouse anti-A2AR antibody (Millipore, 1:200) and rabbit anti-D2R antibody (Millipore, 1:200) at 4 °C overnight. The sections were then incubated with Alexa 488-conjugated secondary antibodies (Invitrogen, 1:1000), washed, and mounted. Fluorescent mounting medium was applied to the sections. Images were acquired with a fluorescence microscope.

Statistical Analysis
Instrumental behavior training and test data were analyzed using repeated-measures two-way ANOVA, with the training or test sessions as the within-subjects effect and the training procedures as the between-subjects effect. For the PLA data, two-way ANOVA was used with striatal subregion and training procedure as main effects. A paired t-test was used to compare the distribution of the A2AR-D2R and A2AR-mGluR5 heterodimers between the DMS and DLS. One-way ANOVA with LSD post-hoc test was used to compare the distribution of the A2AR-D2R heterodimers along the anterior-posterior axis in both the DMS and DLS after instrumental learning.
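As an illustration of the paired comparison between subregions, the following Python sketch computes a paired t statistic from matched DMS and DLS measurements taken in the same animals. It is a sketch under stated assumptions: the function name is hypothetical, the input values in the usage note are made up for the example, and the p-value (which requires the t distribution from a statistics package) is not computed.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, degrees of freedom) for paired samples, testing second - first."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

For made-up values `paired_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])`, the differences are 1, 2, and 3, so t equals 2 divided by (1/√3), with 2 degrees of freedom.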

Detection of the A2AR-D2R and A2AR-mGluR5 Heterodimers by PLA in the Striatum
To detect the striatal A2AR-D2R and A2AR-mGluR5 heterodimers by PLA, we first confirmed the specificity of the PLA detection of the A2AR-D2R and A2AR-mGluR5 heterodimers using A2AR KO and D2R KO mice. The specificity of the A2AR and D2R antibodies was evident from the highly enriched expression pattern of A2ARs and D2Rs in the striatum by immunofluorescence, which was absent in the A2AR or D2R KO mice (Figure 1A). Moreover, specific labeling of the A2AR-D2R heterodimer signals (at the cellular membrane) was detected in ∼15% of striatal neurons of wild-type mice, in close association with DAPI-stained nuclei (indicated by white arrows; Figures 1B,C). Importantly, these signals for the A2AR-D2R protein complexes were essentially absent in the striatum of the A2AR KO or D2R KO mice (Figures 1B,C), confirming the specificity of the PLA detection of the A2AR-D2R heterodimers. Similarly, the A2AR-mGluR5 heterodimer signals were detected by PLA in the striatum of wild-type mice but not in A2AR KO mice, supporting the specificity of the PLA detection of the A2AR-mGluR5 heterodimers (Figures 1D,E).

Heterogeneous Distribution of the A2AR-D2R Heterodimers (but Not the A2AR-mGluR5 Heterodimers) in the DMS and DLS
Having confirmed the specificity of the PLA detection of the A2AR-D2R and A2AR-mGluR5 heterodimers, we examined whether these heterodimers are heterogeneously distributed between the DMS and DLS in normal mice. PLA analysis showed that the A2AR-D2R complexes were more prominent in the DLS than in the DMS (Figure 2B). Quantitative analysis confirmed the heterogeneous distribution of the A2AR-D2R heterodimers in the striatum (i.e., DLS > DMS; Figure 2B). By contrast, PLA revealed no heterogeneous distribution of the A2AR-mGluR5 heterodimers between the DMS and DLS (Figure 2C).

Random Interval Schedule Promoted Habit Formation and Increased the Formation of the Striatal A2AR-D2R Heterodimers
Over the RI training sessions, mice gradually and steadily increased their lever presses and reached a plateau on the RI60 schedule (Figure 3A). Consistent with previous studies (Dickinson et al., 1983; Yin and Knowlton, 2006), the devaluation test (Figure 3B) showed that mice trained on the RI procedure were insensitive to outcome devaluation, indicating habitual action. In association with habit formation after the RI training sessions, we detected increased formation of the A2AR-D2R heterodimers compared to mice without training ("control"; Figures 3C,D). Moreover, the A2AR-D2R heterodimers increased in both the DMS and DLS, and accordingly, the heterogeneous pattern of these heterodimers in the DMS and DLS persisted after training.

Random Ratio Promoted Goal-Directed Behavior without Affecting the A2AR-D2R Heterodimer Formation in the Striatum
Over the RR training sessions, mice also gradually and steadily increased their lever presses and reached a plateau on the RR20 schedule (Figure 4A). Consistent with previous studies (Dickinson et al., 1983; Yin and Knowlton, 2006), the devaluation test showed that mice trained on the RR schedule markedly reduced their lever presses in the devalued condition, indicating goal-directed behavior (Figure 4B). In association with goal-directed behavior after the RR training sessions, we did not detect any significant change in the A2AR-D2R heterodimers compared to mice without training ("control"; Figures 4C,D). Furthermore, the striatal A2AR-D2R heterodimers in mice that underwent the RI schedule were significantly increased in both the DMS and DLS compared to the RR group [DMS: F(2, 17) = 6.351, p = 0.010; DLS: F(2, 17) = 9.605, p = 0.002; * p < 0.05, ** p < 0.01; one-way ANOVA with LSD post-hoc test; data are presented as the mean ± SEM, n = 6/group]. Thus, the increased association of striatal A2AR-D2R heterodimers is selectively induced by the RI training schedule in association with habit formation, and is not affected by the RR training schedule, which produced goal-directed behavior.

A2AR-D2R Heterodimers in the DMS Showed Prominent Increases after RI Training on Anterior-Posterior Axes
We also performed a detailed analysis of the A2AR-D2R heterodimers in three subregions of the DMS and DLS along the anterior-posterior axis (i.e., anterior, middle, and posterior; Figure 5). In the DMS, the A2AR-D2R heterodimers in both the anterior and posterior parts were increased after RI training compared to the control or RR group (Figure 5A). In the DLS, the change in the A2AR-D2R heterodimers was less pronounced: an increase was observed only in the middle DLS after RI training compared to the control (Figure 5B). There was no difference in the A2AR-D2R heterodimers in the anterior, middle, and posterior parts of the DMS and DLS between the RR and control groups, with the exception of an apparent small increase in the middle part of the DMS in the RR group (Figures 5A,B).

The Striatal A2AR-mGluR5 Heterodimers Display Neither Subregional Distribution Nor Adaptive Changes after the RI and RR Training Schedules
We also examined the subregional distribution and adaptive changes of the A2AR-mGluR5 heterodimers after the RI and RR training schedules (Figure 6). The A2AR-mGluR5 heterodimers displayed neither a heterogeneous distribution between the DMS and DLS nor adaptive changes after the RI training (leading to habit formation) or RR training (leading to goal-directed behavior) sessions (Figures 6A,B). These results indicate that the heterogeneous distribution (DLS vs. DMS) and the adaptive changes over instrumental learning (i.e., increased formation of the heterodimers) were specific to the A2AR-D2R heterodimers.

DISCUSSION
The A2AR-D2R heterodimers have been studied extensively in cultured cells and brain tissues by fluorescence resonance energy transfer (FRET; Torvinen et al., 2005), by receptor binding with biochemical fingerprinting (Ciruela et al., 2006), by blocking peptides targeting the presumed A2AR-D2R interaction site (Azdad et al., 2009), and by co-immunoprecipitation (Ciruela et al., 2006). Since heteromerization of A2ARs with other GPCRs (such as A2AR-D2R and A2AR-mGluR5) has been demonstrated mostly in cultured cell lines overexpressing recombinant receptors, which may create many more heterodimers than naturally exist, it is essential to detect the normal distribution of the A2AR-D2R and A2AR-mGluR5 heterodimers in the intact striatum in order to infer their possible physiological functions. However, direct detection of the A2AR-D2R and A2AR-mGluR5 heterodimers in intact animals, and demonstration of their physiological relevance, has proven difficult. Recently, PLA has been applied to detect the presence of the A2AR-D2R (Trifilieff et al., 2011) and A2AR-CD73 heterodimers in the striatum (Augusto et al., 2013). For example, Bonaventura et al. (2014) showed by PLA that the A2AR-D2R heterodimers were reduced in the dopamine-depleted caudate-putamen after chronic treatment with L-dopa in non-human primates. The specificity of the PLA detection of the A2AR-D2R and A2AR-mGluR5 heterodimers was further validated here by demonstrating that these heterodimers were detected in the striatum of WT mice but not in that of A2AR KO or D2R KO mice.
Using PLA, we demonstrated the heterogeneous distribution of the A2AR-D2R heterodimers in the DMS and DLS as well as their adaptive changes over the instrumental learning procedures. Given the critical role of the DLS in the control of habit formation, the greater abundance of A2AR-D2R heterodimers in the DLS than in the DMS under basal conditions suggests that the A2AR-D2R heterodimers in the DLS may contribute to habit formation. It should be noted that there was no subregional variation in the A2AR-mGluR5 heterodimers between the DMS and DLS. Since D2R-mediated striatal LTD is preferentially found in the DLS striatopallidal neurons (Shen et al., 2008), the prominent DLS distribution of the A2AR-D2R heterodimers may suggest a possible role of the A2AR-D2R (rather than the A2AR-mGluR5) interaction in modulating LTD in the DLS.
Importantly, our analysis reveals that, following instrumental learning, the formation of the A2AR-D2R heterodimers increased (by nearly two-fold) relative to the control level over the RI schedule, which resulted in habitual behavior. The increased association of the A2AR-D2R heterodimers was seen selectively after the RI schedule (to promote habit) and was not seen in the mice trained on the RR schedule (to promote goal-directed behavior). Genetic and pharmacological studies have implicated several neurotransmitter and neuromodulator receptors in the development of goal-directed and habitual behaviors, including D1R, D2R, CB1R (Hilário et al., 2007), NMDAR (Yin et al., 2005), A2AR (Yu et al., 2009; Li et al., 2016), and Gpr6 (Lobo et al., 2007). However, to the best of our knowledge, this is the first report of a molecular marker of habit formation that is selectively induced by the RI (but not the RR) training schedule. This novel molecular correlate of RI training, and possibly of habit formation, would be useful for molecular dissection and monitoring of habit formation. Moreover, our detailed analysis of the A2AR-D2R heterodimer changes along the anterior-posterior axis indicated that the A2AR-D2R heterodimers in the DMS showed more dynamic changes after RI training, in agreement with our recent finding that DMS A2AR signaling plays a major role in the control of instrumental learning (Li et al., 2016).
Given the well-documented antagonistic A2AR-D2R interaction, striatopallidal A2ARs may affect animals' sensitivity to dopamine signaling through the increased A2AR-D2R heterodimers. Dopamine signaling apparently has a more prominent role during the early stage of instrumental learning (goal-directed behavior) than during the late stage (habitual behavior; Choi et al., 2005). As RI training progresses, the formation of the striatal A2AR-D2R heterodimers increases, resulting in increased inhibition of D2R signaling by the A2AR in the striatopallidal neurons and, consequently, reduced dependence on dopamine signaling at the late stage of instrumental learning. On the other hand, the lack of adaptive changes in the A2AR-mGluR5 heterodimers after instrumental learning suggests that striatopallidal A2AR activity may preferentially interact with dopamine signaling to modify the instrumental learning process.
Since LTD in striatopallidal neurons is modulated by D2R (Kreitzer and Malenka, 2007) and A2AR activities (Shen et al., 2008), and its loss is associated with a shift in behavioral control from goal-directed action to habitual responding (Nazzaro et al., 2012), we speculate that the increased formation of the A2AR-D2R heterodimers after RI learning may increase the inhibitory effect of the A2ARs on the D2Rs and consequently reduce D2R-mediated striatal LTD in striatopallidal neurons to modify instrumental learning. The prominent changes in the A2AR-D2R heterodimers in the DMS and their correlation with the development of habitual behavior lend support to our interpretation that the increased formation of the A2AR-D2R heterodimers after RI training augments the inhibitory effect of the A2AR on D2R activity and consequently on goal-directed behavior, manifesting as habitual behavior. Thus, the A2AR-D2R heterodimers may partially account for recent demonstrations that optogenetic activation of striatal A2ARs promotes habit (Li et al., 2016) and that pharmacological blockade of the A2AR promotes goal-directed ethanol intake (Nam et al., 2013) and reverses methamphetamine-induced facilitation of habitual action (Furlong et al., 2015), although the A2AR may also control habit formation through mechanisms other than A2AR-D2R heterodimerization. If the functional significance of the A2AR-D2R heterodimers can be demonstrated by future studies with direct manipulation of these heterodimers in intact animals (such as with a blocking peptide specifically targeting the A2AR-D2R heterodimer interface; Azdad et al., 2009), the A2AR-D2R heterodimers may represent a novel therapeutic target for controlling the abnormal habit formation associated with obsessive-compulsive disorder and relapse of drug addiction.