Edited by: John D. Salamone, University of Connecticut, USA
Reviewed by: Etsuro Ito, Waseda University, Japan; Jee Hyun Kim, University of Melbourne, Australia
*Correspondence: Gianluca Baldassarre
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Devaluation is the key experimental paradigm used to demonstrate the presence of instrumental behaviors guided by goals in mammals. We propose a neural system-level computational model to address the question of which brain mechanisms allow the current value of rewards to control instrumental actions. The model pivots on, and shows the computational soundness of, the hypothesis that the internal representations of instrumental manipulanda (e.g., levers) activate the representations of rewards (or “action-outcomes”, e.g., foods) while attributing to them a value that depends on the current internal state of the animal (e.g., satiation for some but not all foods). The model also proposes an initial hypothesis of the integrated system of key brain components supporting this process and allowing the recalled outcomes to bias action selection: (a) the sub-system formed by the basolateral amygdala and insular cortex acquiring the manipulanda-outcomes associations and attributing the current value to the outcomes; (b) three basal ganglia-cortical loops selecting respectively goals, associative sensory representations, and actions; (c) the cortico-cortical and striato-nigro-striatal neural pathways supporting the selection, and selection learning, of actions based on habits and goals. The model reproduces and explains the results of several devaluation experiments carried out with control rats and rats with pre- and post-training lesions of the basolateral amygdala, the nucleus accumbens core, the prelimbic cortex, and the dorso-medial striatum. The results support the soundness of the hypotheses of the model and show its capacity to integrate, at the system level, the operations of the key brain structures underlying devaluation. Based on its hypotheses and predictions, the model also represents an operational framework to support the design and analysis of new experiments on the motivational aspects of goal-directed behavior.
The capacity to select actions on the basis of desired goals (
Knowledge about the neural substrates of goal-directed behavior has advanced significantly in recent years. Particularly important for this work is evidence on how lesions of specific brain structures affect IDE. Among the most important ones, lesions of the basolateral amygdala (BLA) (Blundell et al.,
Notwithstanding the large number of experiments on IDE, there are still few works proposing comprehensive system-level accounts of the neural basis of IDE and its role in goal-directed behavior (e.g., see Yin et al.,
This work contributes to answering these questions by presenting a computational model that incorporates most of the constraints from the lesion experiments on IDE mentioned above and that accounts for them in terms of the underlying system-level brain mechanisms. The main hypothesis of the model is that during the instrumental and satiation phases the system formed by BLA and IC (henceforth “BLA/IC”) associates the perception of the manipulanda (e.g., the levers) with the motivational value of the outcomes, and then during the devaluation test it transfers this value to goal representations via the BLA/IC-NAc connections (cf. the proposal of Donahoe et al.,
The model also incorporates and operationalizes additional hypotheses related to how the selected goal leads to bias the selection of actions to perform: (a) the brain system underlying IDE and goal-directed behavior is based on three basal ganglia-cortical (BG-Ctx) loops involving ventral basal ganglia-PFC (BGv-PFC; “limbic loop,” called here “goal loop” for the focus on goal-directed behavior), dorsomedial BG-posterior parietal cortex (BGdm-PPC; “associative loop”), and dorsolateral BG-motor cortex (BGdl-MC; “motor loop”) (Yin and Knowlton,
The rest of the paper is organized as follows. Section 2 presents the model structure and functioning and the biological evidence supporting them. In particular, Section 2.1 expands the evidence on lesions involving IDE addressed with the model. Section 2.2 further elaborates the main hypothesis at the core of the model. Section 2.3 explains the other hypotheses incorporated by the model. Section 2.4 explains the model at a detailed computational level. Section 3 shows how the model accounts for the target experiments. In particular, Section 3.1 illustrates the simulated environment, rats, and experiments used to test the model. Section 3.2 addresses the standard devaluation experiment with two manipulanda. Section 3.3 addresses a devaluation experiment using only one manipulandum. Section 3.4 presents some predictions of the model. Finally, Section 4 discusses the results and draws the conclusions. The acronyms used in the paper and the model parameters are indicated in the Appendix.
The results of a large number of lesion experiments furnish strong constraints on the brain system underlying IDE, so they have been used to build the system-level architecture of the model. As argued by some researchers (e.g., Passingham and Wise,
Lesioned structure | Pre-training lesion | Post-training lesion | References
BLA | V | V | Blundell et al.; Balleine et al.
IC | V | V | Balleine and Dickinson; Parkes and Balleine
NAc | V | V | Corbit et al.
NAs | X | X | Corbit et al.
PL | V | X | Corbit and Balleine; Tran-Tu-Yen et al.; Ostlund and Balleine
DMS | V | V | Yin et al.
OFC | X | X | Ostlund and Balleine
Hip | X | X | Corbit and Balleine
BLA-PL | X | X | Coutureau et al.
V, the lesion impairs IDE; X, the lesion leaves IDE intact.
Lesions of the BLA (Blundell et al.,
Recent experiments indicate a complex involvement of BLA and IC in IDE and suggest that they closely interact to form an important sub-system for outcome-related incentive learning. In particular, transient inactivation of BLA during satiation has been shown to prevent IDE, whereas its inactivation after satiation leaves IDE intact (West et al.,
The main hypothesis of the model is that (a) the evaluation processes of the rewards involving IDE are based on the associations between the representations of external stimuli involved in the instrumental conditioning, in particular the manipulanda, and the representations of action outcomes, in particular the rewarding foods, and that (b) the value attributed to such outcomes depends on the current state of the animal. These associations rely on mechanisms pivoting on the BLA/IC subsystem.
The amygdala complex (Amg) is a core part of the appetitive and aversive motivational system in vertebrates (Balleine and Killcross,
Information on external objects and cues reaches the Amg through connections with the terminal areas of the brain ventral visual pathway, such as the temporal cortex (TC) encoding objects through abstract features (Pitkänen et al.,
The functions played by Amg rely on two kinds of associative processes (Hatfield et al.,
The second learning process directly associates the CS to unconditioned responses (UR; these are CS-UR associations). Once formed, when a CS is perceived these associations allow Amg to directly trigger UR without the mediation of the US representation. An experiment revealing the presence of this type of association involves the lesion of the BLA within a PDE experiment. When this is done, the CS still triggers the UR even if the related US has been devalued (Hatfield et al.,
The information processed by Amg and its associative learning processes allow it to trigger various responses and modulations affecting action directed toward the outer world (see Mirolli et al.,
The retrieval of the incentive value of outcomes during instrumental behavior has been shown to involve the gustatory region of the anterior insular cortex (IC; Balleine and Dickinson,
The connections of BLA and IC with other structures, and the evidence of focused lesions of such structures reviewed in Section 2.1, support the idea that BLA and IC are sufficient to store the current motivational values of outcomes in IDE experiments, and to transfer them to NAc for the selection of goals. BLA (Pitkänen et al.,
Feedforward projections from BLA and IC to the NAc thus seem to be the main connections needed to broadcast incentive value information to downstream structures (Zahm,
We can now restate in more detail the core hypothesis of the model based on the empirical evidence illustrated thus far. The hypothesis is sketched in Figure
As we have previously proposed (Baldassarre et al.,
In the instrumental learning phase of the devaluation experiment, when BLA/IC recall the outcome representation in the presence of the CS, and at the same time the NAc-PL loop activates the representation of the possible effects of the selected actions, a third learning process can take place. This links the motivationally salient representations of outcomes in BLA/IC with the goal representations in the NAc (Figure
A further parallel learning process leads to the formation of the action-outcome/ outcome-action associations (contingencies). This learning process is studied in contingency degradation, another experimental paradigm used to operationalize goal-directed behavior alongside devaluation (Balleine and Dickinson,
Previous studies, most notably Donahoe et al. (
To empirically rule out a possible S-R interpretation of IDE experiments, Balleine and Ostlund (
Balleine and Ostlund (
This section introduces five additional hypotheses that we used to structure the system-level architecture of the model within which we embed the key hypothesis presented in the previous section. While doing this, the section overviews the model architecture and functioning, whereas Section 2.4 presents the computational details of the model. The hypotheses capture the system-level organization of key brain structures supporting the behavioral expression of IDE and involve in particular the functioning of: the BLA/IC, the striato-cortical macro-loops, the cortico-cortical connectivity, the striato-nigro-striatal pathway, and the dopamine system (Figure
On the input side, the motor and associative loops and BLA/IC receive input signals from “out-of-loop” sensory cortical areas. In the model, these input cortical areas are not explicitly simulated and encode the absence/presence of the two levers with two units each activated with binary positive/zero values (Figure
On the output side, two neural units of the motor loop cortex encode respectively two actions: “press the lever” and “pull the chain” (or “press lever 1” and “press lever 2”). An action is selected and performed at each time step where the related cortical unit is activated above a certain threshold.
As seen in Section 2.2, BLA is one of the main places in the brain where neutral stimuli from the environment become associated with stimuli having an innate biological appetitive/aversive value depending on the internal state of the animal. In the model, the representations of food outcomes are activated either by the consumption of food or by neutral stimuli previously associated with them through a Pavlovian process. Previous computational system-level models have highlighted the importance of Amg and associative Hebbian learning rules to implement Pavlovian processes (Armony et al.,
In the model (Figure
In the model, learning within BLA/IC takes place at the synaptic level through a time-dependent form of plasticity (Maren,
Basal ganglia form multiple re-entrant
The main architecture of the model is based on three loops: the motor loop, the associative loop, and the goal loop (Haber,
The associative loop (Figure
The goal loop (Figure
In the model, the BG are implemented starting from the model proposed in Gurney et al. (
The selective function of BG is not innate but is acquired through trial-and-error learning processes (Graybiel,
Information on desirable outcomes processed in medial and orbital PFC (part of the goal loop) is transferred to dorsal PFC (associative loop) via connections within PFC (Yeterian et al.,
In the model, these cortical pathways are represented as cortico-cortical connections between the cortical areas of the goal loop and the associative loop and between the cortical areas of the associative loop and the motor loop. The associative loop is hence important to encode the action-outcome associations linking the representations of outcomes/goals within the goal loop with the representations of actions within the motor loop. In the brain, the macro-structure of cortico-cortical pathways has a strong innate basis but it also undergoes cortical plasticity (Buonomano and Merzenich,
Section 2.1 explained how lesions of either NAc or DMS impair IDE whether they are carried out before or after the instrumental training phase, whereas lesions of PL impair IDE only if they are carried out before instrumental learning. This indicates that a brain structure targeted by NAc, and different from PL, has to carry NAc information on the selected goal to the associative and motor loops. Empirical evidence suggests a possible candidate for this function, namely the striato-nigro-striatal “dopaminergic spiral” pathway, whose re-entrant connections successively involve NAc, DMS, and DLS within BG, with VTA and SNpc as dopaminergic structures (Fudge and Haber,
In the model, the striato-nigro-striatal pathway transfers the information on the current incentive value of goals encoded in NAc to the associative and motor loops through the dopaminergic modulation of local selective processes within the BG. The connections forming the dopaminergic spirals are hardwired: the plasticity processes likely involving these connections are not simulated, for the same reasons that learning is not simulated in the cortico-cortical pathways. In the model, the dopaminergic spirals contain neural channels that maintain the topology throughout their stages, thus reflecting the typical segregation of other portions of the BG and the necessity for IDE of the DMS as an intermediate striatal stage (Yin et al.,
In the literature, two main distinct functions are ascribed to the phasic and tonic production of dopamine by VTA and SNpc. Phasic dopamine, intense and short-lasting, is strongly associated with plasticity in several brain structures. Here we focus on the role of phasic dopamine in learning processes taking place within striatum (Reynolds and Wickens,
Tonic—extracellular, slow-changing—dopamine, in particular directed to NAc, has a major role in modulating animals' active coping with challenges (Salamone et al.,
This section describes the model in computational detail, but before doing this it presents some general considerations on its nature and on the methodology used to build it. The integrated account of the wide empirical evidence on IDE required the construction of a
Another problem we faced was that sometimes the reproduction of the IDE experiments required the implementation of some functions relying on brain mechanisms that are still unknown. In this case, we used tentative neural mechanisms suggested by more general neuroscience knowledge and our computational experience. This approach has the advantage of allowing: (a) the formulation of operational hypotheses on IDE integrating behavioral and lesion evidence produced by several different empirical experiments; (b) the identification of current knowledge gaps of theories on IDE, in particular in relation to the neural mechanisms underlying it, and the proposal of computational hypotheses on them; (c) the production of system-level predictions testable in future empirical experiments.
The computational approach used to build the model facilitates the explanation and the reproduction of the model (the approach was initially proposed in Baldassarre et al.,
The detailed architecture of the model is presented in Figure
Three sets of input units are activated and reach different components of the model during simulations. Two binary units encode the absence or presence of the two manipulanda and both reach two striatal units within the motor loop, two striatal units within the associative loop, and two units within BLA/IC representing conditioned stimuli (CS). Two binary units, encoding the non-consumption or consumption of the two foods, reach two units of BLA/IC representing the unconditioned stimuli (US). The two food units also reach the single unit of the PPN, in turn activating SNpc, and the single unit of the LH, in turn activating VTA. These circuits encode the value information (reward) related to the ingestion of the two foods. Two binary units encoding no-satiation/satiation for the two foods reach, through inhibitory one-to-one connections, respectively the two BLA/IC units representing the US. The output of the model is encoded by two cortical units of the MC representing the two actions on either one of the two manipulanda. An action is triggered when the activation of one of the two units overcomes a threshold θ
The model is formed by two types of firing rate units each abstracting the activity of a whole population of neurons encoding relevant information (e.g., a lever, a food, a goal, an action). The first type of units, used in most components of the model, are
where
where
The striatal units are leaky units as those described above but their input (and hence activation) is also enhanced by dopamine as follows:
where τ is a time constant, ι is a parameter weighting the input to the striatum that is independent of dopamine, δ is a parameter weighting the input that is dependent on dopamine, and
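As an illustration only, the dopamine-dependent enhancement of striatal input can be sketched with Euler integration of a leaky unit. The multiplicative gain `(iota + delta * da)`, the `tanh` output function, the function names, and all numerical values are assumptions made for this sketch; they are not the published equations or parameters of the model.

```python
import math

def step_striatal_unit(u, inp, da, tau=0.3, iota=0.2, delta=0.8, dt=0.01):
    """One Euler step of a leaky unit whose input gain grows with dopamine.

    iota weights the dopamine-independent input, delta the dopamine-dependent
    input (assumed multiplicative form; values are illustrative).
    """
    gained_input = (iota + delta * da) * inp
    return u + (dt / tau) * (-u + gained_input)

def activation(u):
    """Positive saturating output of the unit (assumed transfer function)."""
    return max(0.0, math.tanh(u))

def settle(inp, da, steps=2000):
    """Integrate the unit under constant input and dopamine; return its output."""
    u = 0.0
    for _ in range(steps):
        u = step_striatal_unit(u, inp, da)
    return activation(u)

low_dopamine = settle(inp=1.0, da=0.0)
high_dopamine = settle(inp=1.0, da=1.0)
```

With the same cortical input, the unit settles at a higher activation when dopamine is present, which is the qualitative property the text attributes to striatal units.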
The unit of PPN, the unit of LH, and the four units of BLA, are represented with a second different type of units, called here “leaky onset units,” to be able to produce fast transient responses to the input in the case of PPN and LH, or to be able to implement a learning process highly sensitive to the timing of the input signals in the case of BLA/IC. A leaky onset unit is based on two coupled leaky units, one representing an excitatory neural population processing the input signals and returning the whole output of the onset unit, and a second one representing an inhibitory neural population processing the input signals and inhibiting the first population. This complex unit produces an onset response to the input signals, namely a response that first increases and then decreases even if the input signal starts and remains high for a prolonged time. Onset units allow the production of phasic responses to the rewards, in the case of PPN and LH units, or the support of the time-sensitive learning processes of the BLA/IC illustrated below. Formally, the equations of an onset unit are as follows:
where
where
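The onset response produced by two coupled leaky populations can be illustrated with a minimal sketch. Here a fast excitatory unit and a slower inhibitory unit both receive the input, and the inhibitory unit suppresses the excitatory one. The time constants, the subtractive form of inhibition, and the `tanh` output are assumptions chosen for illustration, not the values or equations used in the model.

```python
import math

def simulate_onset_unit(inputs, tau_exc=0.05, tau_inh=0.5, dt=0.01):
    """Return the output of a leaky onset unit for a sequence of inputs.

    exc: fast excitatory population, produces the unit output.
    inh: slower inhibitory population, driven by the same input,
         subtracts from the drive of the excitatory population.
    """
    exc, inh = 0.0, 0.0
    outputs = []
    for inp in inputs:
        inh_out = max(0.0, inh)
        d_exc = (-exc + inp - inh_out) / tau_exc
        d_inh = (-inh + inp) / tau_inh
        exc += dt * d_exc
        inh += dt * d_inh
        outputs.append(max(0.0, math.tanh(exc)))
    return outputs

# A sustained step input (5 s at dt = 0.01) yields a transient response.
response = simulate_onset_unit([1.0] * 500)
peak = max(response)
peak_step = response.index(peak)
```

Although the input stays high for the whole simulation, the output rises quickly, peaks within the first few tenths of a second, and then decays toward zero as the inhibitory population catches up.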
BLA/IC is formed by four leaky onset units that exchange all-to-all lateral connections between them. Each connection between a pre-synaptic and a post-synaptic BLA/IC unit is updated with a Hebbian learning rule depending on the time difference between the onset activations of the two units, and on dopamine. In particular, the learning rule is applied to memory traces of the activation of the units: such traces allow the formation of connection weights on the basis of activations of the pre- and post-synaptic units taking place at different times (e.g., as in Pavlovian “trace conditioning” or “delay conditioning”). Traces represent slow, lasting electrochemical reactions following the activation of neurons. Formally, a trace related to a unit is computed as follows:
where
where
This Hebbian learning rule is closely related to other types of
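To make the role of memory traces concrete, the following sketch shows how a slowly decaying trace lets a CS pulse and a later US pulse be associated across a silent gap. The leaky-integrator trace, the product-of-traces update, the soft weight bound, and every parameter value are assumptions for illustration; the actual time-dependent rule of the model is not reproduced here.

```python
def update_trace(trace, onset_activation, tau_trace=1.0, dt=0.01):
    """Leaky memory trace of a unit's onset activation (assumed form)."""
    return trace + (dt / tau_trace) * (-trace + onset_activation)

def hebbian_step(w, pre_trace, post_trace, da, eta=0.5, w_max=1.0, dt=0.01):
    """Dopamine-gated Hebbian update on traces with a soft upper bound."""
    return w + dt * eta * da * pre_trace * post_trace * (w_max - w)

def run_trial(gap_steps, da=1.0, pulse=20, tail=100):
    """CS pulse, silent gap, US pulse; return the final CS->US weight."""
    cs_trace = us_trace = w = 0.0
    us_start = pulse + gap_steps
    for t in range(us_start + pulse + tail):
        cs = 1.0 if t < pulse else 0.0
        us = 1.0 if us_start <= t < us_start + pulse else 0.0
        cs_trace = update_trace(cs_trace, cs)
        us_trace = update_trace(us_trace, us)
        w = hebbian_step(w, cs_trace, us_trace, da)
    return w

short_gap_weight = run_trial(gap_steps=50)
long_gap_weight = run_trial(gap_steps=300)
no_dopamine_weight = run_trial(gap_steps=50, da=0.0)
```

In this toy setting the weight grows even though CS and US never overlap in time, grows less when the gap is longer, and does not change at all without dopamine.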
The noise process driving the exploration of the system and the trial-and-error learning of striatum takes place within Th. The noise is added as input to the thalamic units and is computed on the basis of a decaying moving average as follows:
where
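A decaying moving average of random samples can be sketched as a leaky integration of fresh noise at every step. The uniform distribution, the amplitude, the time constant, and the function name below are assumptions for illustration only.

```python
import random

def thalamic_noise(n_steps, tau_noise=0.5, amplitude=1.0, dt=0.01, seed=0):
    """Generate temporally smoothed noise as a decaying moving average.

    Each step mixes the previous noise value with a fresh uniform sample,
    so the series changes gradually instead of jumping at every step.
    """
    rng = random.Random(seed)
    noise, series = 0.0, []
    for _ in range(n_steps):
        sample = rng.uniform(-amplitude, amplitude)
        noise += (dt / tau_noise) * (-noise + sample)
        series.append(noise)
    return series

noise_a = thalamic_noise(1000, seed=1)
noise_b = thalamic_noise(1000, seed=2)
```

Smoothed noise of this kind keeps a given channel favored for several consecutive steps, which gives an explored action time to be selected and executed before the bias changes.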
Connections from within-loop cortical units to striatal units are not trained. These connections are however important for training the synaptic weights reaching the striatum from out-of-the-loop units as they carry information to the striatum about the cortical units of the “channels” that win the within-loop competition on the basis of the BG selection processes. For example, if a cortical unit (and the corresponding channel) encoding an action within the motor loop wins the competition and is activated, its activation is projected back to the corresponding striatal unit within the loop and this unit can associate to the unit encoding the presence of a certain lever and belonging to cortical areas outside the loop. The dopamine-dependent Hebbian learning rule used for such training is as follows:
where
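The interplay between noise-driven selection, feedback from the winning cortical channel, and dopamine-gated learning can be sketched as a toy trial-and-error loop. The winner-take-all selection, the dopamine threshold, the soft weight bound, and all names and values are assumptions for illustration and do not reproduce the learning rule of the model.

```python
import random

def train_lever_to_action(rewarded_action=0, n_trials=200, eta=0.1,
                          da_threshold=0.5, seed=1):
    """Train lever->striatum weights with a dopamine-gated Hebbian update."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]  # lever unit -> striatal channel of each action
    for _ in range(n_trials):
        lever = 1.0  # out-of-loop input signalling the lever (pre-synaptic)
        # Noise drives exploration; learned weights bias the competition.
        drive = [w * lever + rng.uniform(0.0, 1.0) for w in weights]
        selected = drive.index(max(drive))
        # The winning cortical channel feeds back to its striatal unit.
        post = [1.0 if a == selected else 0.0 for a in range(2)]
        # Phasic dopamine burst only when the rewarded action is performed.
        da = 1.0 if selected == rewarded_action else 0.0
        gate = max(0.0, da - da_threshold)
        for a in range(2):
            weights[a] += eta * gate * lever * post[a] * (1.0 - weights[a])
    return weights

learned = train_lever_to_action(rewarded_action=0)
```

Only the connection onto the channel that was both selected and followed by reward is strengthened, so the lever comes to bias selection toward the rewarded action.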
The SNpc component in the model is formed by two different modules corresponding to the DMS and DLS. Each module is formed by two couples of units. Within each couple, one unit projects to the corresponding striatal unit whereas the second unit inhibits the first unit. The inhibitory unit receives an afferent inhibitory connection from the corresponding unit of the striatal structures located one level higher in the striato-nigro-striatal hierarchy. This projection can reduce the baseline activation of the inhibitory unit so that the overall output of the couple increases. The time constant of the dopaminergic inhibitory units is set to a large value so that the baseline activation of the excitatory dopaminergic unit changes very slowly, thus mimicking tonic dopamine slow changes. The excitatory dopaminergic units of the SNpc couples receive an afferent connection from the onset unit of the PPN. In this way, when the PPN unit is activated by a reward, it causes a high peak of excitation of the SNpc dopaminergic couples mimicking phasic dopamine bursts. The VTA module is at the vertex of the dopaminergic spirals and receives only an excitatory phasic input from LH.
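The two time scales described above can be sketched with one such couple of units: a slow inhibitory unit that is itself inhibited by the upstream striatal unit, and a fast dopaminergic unit that also receives the PPN burst. The constant `tonic_drive`, the `baseline` value, the time constants, and the subtractive interactions are assumptions introduced for this sketch.

```python
def simulate_snpc_pair(striatal_drive, ppn_burst, baseline=0.5, tonic_drive=1.0,
                       tau_inh=5.0, tau_da=0.05, dt=0.01):
    """Return the dopaminergic output of one excitatory/inhibitory couple."""
    inh, da, out = baseline, 0.0, []
    for s, p in zip(striatal_drive, ppn_burst):
        # Upstream striatal activity lowers the slow inhibitory unit.
        inh += (dt / tau_inh) * (-inh + baseline - s)
        # The fast dopaminergic unit is released from inhibition (tonic)
        # and excited by PPN bursts (phasic).
        da += (dt / tau_da) * (-da + tonic_drive - max(0.0, inh) + p)
        out.append(max(0.0, da))
    return out

n = 3000
rest = simulate_snpc_pair([0.0] * n, [0.0] * n)
tonic = simulate_snpc_pair([0.5] * n, [0.0] * n)
burst = [1.0 if 1000 <= t < 1020 else 0.0 for t in range(n)]
phasic = simulate_snpc_pair([0.0] * n, burst)
```

Sustained upstream striatal activity raises the dopamine level slowly over many seconds, whereas a brief PPN burst produces a sharp peak that returns to the resting level shortly afterwards.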
This section first describes how we simulated the devaluation experiments. Then it presents the performance of the model in the experiment with two levers, and the neural mechanisms underlying it, both when the model is fully functioning and when it undergoes focused lesions like those investigated in the literature. Next, it presents similar analyses for the single-manipulandum experiment. Finally, it presents some predictions of the model.
The model was tested with simulated rats acting in a simulated environment. Although the simulated rats and environment were quite abstract, they nevertheless reproduced the circular interaction of real animals with the environment, involving repeated closed-loop cycles of input, processing, output, and environment reaction (in this respect, the model is “embodied,” Mannella et al.,
All simulations consisted of two instrumental training sessions followed by two devaluation test sessions. The satiation phase, happening between training and test, was simulated by suitably setting the satiation inputs of the model in the test sessions (see below). Each instrumental session lasted 20 min and was formed by multiple trials during which both satiety input units were set to zero. In the simulations with two manipulanda, in the first training session each rat experienced the first manipulandum and related reward (first food), whereas in the second session it experienced the second manipulandum and related reward (second food). In the simulations with one manipulandum, in the first training session only the first action could lead to a reward (first food) whereas in the second session only the second action could lead to a reward (second food).
The two test sessions lasted 2 min each, during which no reward was delivered. In the first test phase, both satiety variables were set to zero. In the second test phase the satiety variable related to the first food was set to zero whereas the one related to the second food was set to one (note that in simulation it was not necessary to test the opposite satiation pattern as the symmetry of conditions was guaranteed by design). In the two-manipulanda experiment, two levers were used in the two test phases, whereas in the one-manipulandum experiment only one manipulandum was used.
The simulations were replicated several times by setting a different seed of the random-number generator so as to have different learning and test histories mimicking different rats. In the model, the lesion of a structure was reproduced by permanently setting the activation of its units to a value of zero whereas the disconnection between structures was performed by permanently setting the connection weights of the neural connections linking them to zero.
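The lesion and replication procedures can be sketched as simple operations on the state of a simulated rat. The dictionary layout, the function names, and the example values are hypothetical and serve only to illustrate the procedure described above.

```python
import random

def lesion_structure(activations, structure):
    """Return a copy of the activations with one structure clamped to zero.

    In a running simulation this clamp would be re-applied at every step,
    since the lesion is permanent.
    """
    lesioned = dict(activations)
    lesioned[structure] = [0.0] * len(activations[structure])
    return lesioned

def disconnect(weights, source, target):
    """Return a copy of the weights with one pathway set to zero."""
    lesioned = dict(weights)
    pathway = weights[(source, target)]
    lesioned[(source, target)] = [[0.0] * len(row) for row in pathway]
    return lesioned

def make_rat(seed):
    """Each simulated rat owns its own seeded random-number generator."""
    return random.Random(seed)

activations = {"BLA/IC": [0.3, 0.7, 0.1, 0.2], "NAc": [0.4, 0.6]}
weights = {("BLA/IC", "NAc"): [[0.5, 0.1], [0.2, 0.9]]}
nac_lesioned = lesion_structure(activations, "NAc")
bla_nac_cut = disconnect(weights, "BLA/IC", "NAc")
```

Keeping one generator per rat makes each replication reproducible from its seed while still giving different learning and test histories across rats.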
The simulation with two manipulanda was performed in nine different conditions. For each condition the simulation was replicated 40 times with different random seeds (rats), each replication including the two training sessions and the two test sessions. The first condition involved the intact version of the model (this condition was called CONTROL). Four further conditions tested the model with the lesion of respectively BLA/IC, NAc, DMS and PL performed before the training sessions (“BLA/IC-pre,” “NAc-pre,” “DMS-pre,” and “PL-pre”). The last four conditions tested the model with the lesion of respectively BLA, NAc, DMS and PL performed after the training phase (“BLA-post,” “NAc-post,” “DMS-post,” and “PL-post”).
During each training session, the learning process was monitored by measuring the number of presses of the available lever in 10 time-bins, averaged over the 40 simulation repetitions and the two manipulanda. For each lesion, the mean number of actions in the different bins was compared with a one-factor ANOVA to detect the presence of learning. Table
Condition | Bin 1 | Bin 2 | Bin 3 | Bin 4 | Bin 5 | Bin 6 | Bin 7 | Bin 8 | Bin 9 | Bin 10
CONTROL | 3.49 | 5.01 | 6.16 | 7.03 | 7.44 | 7.73 | 8.03 | 8.25 | 8.30 | 8.43
BLA/IC-pre | 3.79 | 5.21 | 6.44 | 7.06 | 7.40 | 7.74 | 7.81 | 7.93 | 8.11 | 8.20
NAc-pre | 3.35 | 4.23 | 4.89 | 5.63 | 6.15 | 6.69 | 7.01 | 7.18 | 7.29 | 7.48
DMS-pre | 3.36 | 3.83 | 4.14 | 4.54 | 4.64 | 4.80 | 5.05 | 5.21 | 5.33 | 5.55
PL-pre | 3.36 | 4.16 | 4.98 | 5.51 | 6.23 | 6.65 | 6.91 | 7.15 | 7.21 | 7.36
BLA-post | 3.56 | 4.95 | 6.06 | 6.95 | 7.46 | 7.80 | 8.05 | 8.21 | 8.34 | 8.43
NAc-post | 3.60 | 5.21 | 6.30 | 7.03 | 7.48 | 7.86 | 8.06 | 8.26 | 8.36 | 8.45
DMS-post | 3.48 | 5.00 | 6.11 | 6.99 | 7.50 | 7.71 | 8.01 | 8.24 | 8.34 | 8.53
PL-post | 3.61 | 5.11 | 6.23 | 7.04 | 7.44 | 7.84 | 8.08 | 8.30 | 8.25 | 8.48
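For readers who wish to run a comparable analysis on their own simulation output, a one-factor ANOVA across time bins can be computed as sketched below. The function is a plain between-groups F statistic; whether the original analysis used a repeated-measures design is not stated here, and the per-bin counts in the example are toy numbers, not data from the model.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-factor ANOVA.

    groups: list of lists, one list of observations per factor level
            (here, one list of per-rat press counts per time bin).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Toy example: three bins, three hypothetical rats per bin.
toy_bins = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, df_between, df_within = one_way_anova_f(toy_bins)
```

A large F value relative to the F distribution with the returned degrees of freedom indicates that mean responding differs across bins, which is how the presence of learning is detected.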
Table
The explanation of these effects on learning of lesions with respect to controls might be that the impairment of the goal-directed systems formed by NAc, PL, and DMS deprives the system of a means to “focus” on specific inputs and actions (cf. Fiore et al.,
IDE were measured comparing the number of actions toward one manipulandum vs. those toward the other in the two test sessions, the first with no satiation for either food and the second with satiation for the second food. IDE were considered to be in place if a statistically significant difference between the number of actions toward the two levers, measured with a
Condition | Test 1 (no satiation): lever 1 | Test 1 (no satiation): lever 2 | Test 2 (second food satiated): lever 1 | Test 2 (second food satiated): lever 2
CONTROL | 7.96 | 10.43 | 25.11 | 0.25
BLA/IC-pre | 9.43 | 8.91 | 9.34 | 8.71
NAc-pre | 8.30 | 9.73 | 8.49 | 9.54
DMS-pre | 8.81 | 8.60 | 8.69 | 8.88
PL-pre | 9.38 | 8.80 | 8.95 | 8.91
BLA-post | 9.13 | 9.54 | 9.25 | 9.56
NAc-post | 9.18 | 8.58 | 9.08 | 8.80
DMS-post | 9.56 | 7.54 | 9.16 | 8.08
PL-post | 9.61 | 8.56 | 25.09 | 0.35
The behavior of the model can be explained as follows based on direct inspection of its functioning during training and test. The plastic connection weights linking the lever units and BLA/IC, on one side, to the striatal regions of the motor, associative, and goal loops, on the other side, were set to zero at the beginning of each simulation. The neural noise injected into the three BG-Th-Ctx loops led them to initially perform random selections. During the training phase, four different learning processes take place in the model (see Figure
During the two devaluation test phases, both levers are presented together. Based on the perception of the levers (CS), BLA/IC tend to activate the related food representations within them. In the first test phase, when the model is not satiated for either of the two foods, this tends to lead to the selection of either of the two actions with equal probability. In the second test phase, instead, when the model is satiated for the second food, the BLA/IC neural representation of that food (US) is inhibited, so only one outcome representation (the first, non-devalued food) can actually activate. This is the key process implementing the central hypothesis of the model: a lever, acting as CS, recalls the valued representation of food within BLA/IC, i.e., a US, and this in turn leads to select a specific goal within the goal loop. The goal representation within NAc and PL leads the system to activate, via the cortico-cortical connections and the striato-nigro-striatal dopaminergic spirals, the corresponding neural unit within the associative loop and then the motor loop, thus biasing the preferential selection of the action corresponding to the valued food.
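The logic of this key process can be caricatured in a few lines of static arithmetic: levers recall food representations through learned CS-US weights, satiety inhibits the corresponding US unit, and the surviving outcome value biases the associated action. The identity weight matrices, the subtractive inhibition, and the function names are assumptions for illustration; the model itself implements this with dynamic units, loops, and dopaminergic modulation.

```python
def outcome_values(levers, cs_to_us, satiety):
    """Value of each food recalled by the levers, reduced by satiety."""
    values = []
    for food in range(2):
        recalled = sum(levers[l] * cs_to_us[l][food] for l in range(2))
        values.append(max(0.0, recalled - satiety[food]))
    return values

def action_bias(values, goal_to_action):
    """Bias that each recalled goal passes on to its associated action."""
    return [sum(values[f] * goal_to_action[f][a] for f in range(2))
            for a in range(2)]

# Assumed post-training associations: lever i <-> food i <-> action i.
identity = [[1.0, 0.0], [0.0, 1.0]]
both_levers = [1.0, 1.0]

bias_test1 = action_bias(outcome_values(both_levers, identity, [0.0, 0.0]), identity)
bias_test2 = action_bias(outcome_values(both_levers, identity, [0.0, 1.0]), identity)
```

In the first test both actions receive the same bias, so noise decides between them; in the second test only the action leading to the non-devalued food is biased.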
Table
Table
The experiment with one manipulandum followed the same procedure as the experiment with two levers, except that only one manipulandum was used. Table
Time bin | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Means | 3.73 | 5.10 | 6.29 | 6.98 | 7.38 | 7.46 | 7.89 | 8.05 | 7.94 | 7.98
The presence of IDE was measured as in the experiment with two manipulanda. Table
Means | 8.99 | 10.93 | 24.75 | 0.45 |
The data on lesions and the anatomical and physiological evidence reproduced by the model and illustrated in the previous sections represent a considerable number of constraints satisfied by the system-level architecture and functioning of the model. The model can hence be used to produce predictions that might be tested in future empirical experiments. Here we present in particular one prediction that concerns the effects of a possible lesion of the dopaminergic striato-nigro-striatal spirals that transfer incentive-value information from the goal loop to downstream loops. We have mentioned in Section 2.3 that this hypothesis of the model was formulated in the absence of empirical evidence and that alternative hypotheses exist. For this reason it was interesting to probe the model to furnish a prediction that closely depended on that hypothesis and that would falsify the hypothesis itself if empirically disproved. To this purpose, we used the model to simulate and predict the effects of the lesion of the striato-nigro-striatal projections. In particular, we set to zero the projections from the NAc to the medial region of the SNpc, and from the DMS to the dorsal region of the SNpc. The lesions were performed either before or after the instrumental training in two different simulations.
Table
Condition | Bin 1 | Bin 2 | Bin 3 | Bin 4 | Bin 5 | Bin 6 | Bin 7 | Bin 8 | Bin 9 | Bin 10
CONTROL | 3.49 | 5.01 | 6.16 | 7.03 | 7.44 | 7.73 | 8.03 | 8.25 | 8.30 | 8.43
SNS-pre | 3.21 | 4.06 | 4.55 | 4.85 | 5.40 | 5.88 | 6.28 | 6.71 | 6.74 | 7.01
SNS-post | 3.65 | 5.11 | 6.46 | 7.04 | 7.46 | 7.84 | 8.11 | 8.21 | 8.34 | 8.56
Table
CONTROL | 11.88 | 12.05 | 17.13 | 6.43
SNS-pre | 10.63 | 9.78 | 10.43 | 10.08
SNS-post | 10.93 | 9.68 | 10.30 | 10.40
These results encourage testing the model prediction in real animals. This could be done using the technique based on contralateral lesions, in this case targeting the left SNpc and the right NAc in some rats, and the right SNpc and the left NAc in others. The expectation would be that IDE would be impaired in devaluation tests even though one lateral NAc component and one lateral SNpc component are still intact and can perform their functions.
We close this section on predictions by recalling the predictions reported in Section 3.2 concerning the possible differential effects that different pre-instrumental-training lesions might produce on the effectiveness of the learning process. As illustrated there, the model predicts that a DMS lesion would lower the final performance of rats more than either a NAc lesion or a PL lesion, and that the latter two lesions would lead to quantitatively similar detrimental effects on learning. These predictions might be tested in future empirical experiments that use the same experimental protocol and measures to evaluate the degree to which these different lesions impair instrumental learning. Although not directly related to the devaluation paradigm, these predictions might contribute to tracing how the goal-directed components of the brain exert their control on action selection during the expression of IDE.
The instrumental devaluation experimental paradigm is considered pivotal for demonstrating the presence of goal-directed behavior in mammals (Balleine and Dickinson,
The importance of these brain structures for devaluation, the neuroscientific evidence on the pivotal role of the amygdala in implementing the Pavlovian processes that attribute value to unconditioned stimuli, and the knowledge on the role of the other brain structures in implementing goal and action selection, allowed us to propose a computational model that integrates such information into a single operational framework. The core hypothesis of the model is that the basolateral amygdala and insular cortex re-activate the representations of action-outcomes (e.g., foods) on the basis of environmental stimuli (e.g., levers) and that these representations reflect the current value of the outcomes for the animal given its current internal state (e.g., satiation for one food but not for a second one). The model also operationalizes additional hypotheses on how the incentive value computed by the amygdala and insular cortex can bias action selection by: (a) influencing goal selection performed within the basal ganglia-cortical loop involving nucleus accumbens core and some areas of the prefrontal cortex, in particular the prelimbic cortex; (b) transferring information to the downstream associative and motor basal ganglia-cortical loops via both striato-nigro-striatal dopaminergic spirals and cortico-cortical pathways.
Donahoe et al. (
An important piece of empirical evidence accounted for by the model is that the prelimbic cortex is important for the acquisition of the neural prerequisites, but not for the expression, of devaluation effects (Corbit and Balleine,
Further results (noticed during the paper revision process) corroborate the soundness of the model. In particular, the model correctly predicts that the lesion of basolateral amygdala does not impair instrumental
Related to the last point, the model also produced some predictions that might be tested in future empirical experiments. In particular, the model predicts the possible effects of a lesion of the dopaminergic striato-nigro-striatal spirals that transfer incentive-value information from the goal loop to downstream loops. Specifically, the model predicts that this lesion would lead to (a) a slower learning process and lower performance and (b) the impairment of instrumental devaluation effects. These predictions might be tested in future work by performing a contralateral double lesion involving the right substantia nigra pars compacta and the left nucleus accumbens core in some rats, and the left substantia nigra pars compacta and the right nucleus accumbens core in other rats. The model also predicts that a lesion of the dorsomedial striatum would slow learning more than a lesion of either the prelimbic cortex or the nucleus accumbens, and that a lesion of either the prelimbic cortex or the nucleus accumbens (which form a closely integrated system for goal selection) would slow learning to a similar extent. These represent further predictions that might be tested in future empirical experiments.
The model has some limitations that, together with the opportunity to account for other phenomena related to goal-directed behavior, call for its future development in multiple directions. First, the ventral/orbital prefrontal cortex of the model is now connected to the dorsal prefrontal/associative cortex that in turn is connected to the motor cortex. This was done as currently the model associative loop does not play any specific function while in animals it serves important functions, such as working memory and the control of attention and affordances (e.g., see Baldassarre et al.,
A further issue, moving toward the use of the model to account for different phenomena relevant for goal-directed behavior, involves contingency degradation experiments. Contingency degradation is a second experimental paradigm that, together with devaluation, has been used to establish goal-directed behaviors in mammals (Balleine and Dickinson,
Similarly, the model does not currently account for Pavlovian instrumental transfer (PIT), an important phenomenon closely related to devaluation (Corbit and Balleine,
Notwithstanding the need for these future extensions, the model contributes to the elaboration of an overall theory of the brain mechanisms underlying instrumental devaluation effects, and in particular of how incentive value is assigned to goals in goal-directed behavior. We think this contribution has two main aspects. The first is the integration of data on lesions involving devaluation into a coherent operational model. The second is the proposal of an overall system-level architecture that, although abstract at the level of the single components, connectivity, and learning processes, represents an important “skeleton” usable to build more detailed future theories and models of devaluation and goal-directed behavior.
Idea of work: FM, MM, GB; implementation of model and simulations: FM; analysis of results: FM, MM, GB; writing up of article: FM, MM, GB.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This research has received funds from the European Commission under the 7th Framework Programme (FP7/2007-2013), ICT Challenge 2 “Cognitive Systems and Robotics,” project “IM-CLeVeR - Intrinsically Motivated Cumulative Learning Versatile Robots,” Grant Agreement no. ICT-IP-231722. We thank one of the reviewers for highlighting the differences in learning speed when the model undergoes different lesions before instrumental training.
Table
Amg | Amygdala |
BG | Basal ganglia |
BGdl | Basal ganglia, dorsolateral part |
BGdm | Basal ganglia, dorsomedial part |
BGv | Basal ganglia, ventral part |
BLA | Amygdala, basolateral complex |
CeA | Amygdala, central nucleus |
CS | Conditioned stimulus |
Ctx | Cortex |
DA | Dopamine |
DLS | Dorsolateral striatum |
DM | Dorsomedial thalamus |
DMS | Dorsomedial striatum |
GPe | Globus pallidus, external part |
GPi | Globus pallidus, internal part |
Hip | Hippocampus |
Hyp | Hypothalamus |
IC | Insular cortex, gustatory division |
IDE | Instrumental devaluation effects |
LH | Lateral hypothalamus |
MC | Motor cortex |
MD | Mediodorsal thalamus |
MGV | Thalamus medial geniculate body, ventral division |
NAc | Nucleus accumbens, core part |
NAs | Nucleus accumbens, shell part |
OFC | Orbitofrontal cortex |
P | Pulvinar, part of thalamus |
PDE | Pavlovian devaluation effects |
PFC | Prefrontal cortex |
PFCd | Prefrontal cortex, dorsal division |
PFCm | Prefrontal cortex, medial division |
PL | Prelimbic cortex |
PMC | Premotor cortex |
PPC | Posterior parietal cortex |
PPN | Pedunculopontine nucleus |
SNpc | Substantia nigra pars compacta |
SNpr | Substantia nigra pars reticulata |
S, O, R, A | Stimulus, outcome, response, action |
STN | Subthalamic nucleus |
STNdl | Dorsolateral subthalamic nucleus |
STNdm | Dorsomedial subthalamic nucleus |
STNv | Ventral subthalamic nucleus |
TC | Temporal cortex |
Th | Thalamus |
UR | Unconditioned response |
US | Unconditioned stimulus |
VTA | Ventral tegmental area |
Tables
Component | τ | σ | θ |
DLS | 300 | 1 | 0 |
STNdl | 300 | 1 | 0 |
GPi | 300 | 1 | 0 |
DMS | 300 | 1 | 0 |
STNdm | 300 | 1 | 0 |
GPi/SNpr | 300 | 1 | 0 |
NAc | 300 | 1 | 0 |
STNv | 300 | 1 | 0 |
SNpr | 300 | 1 | 0 |
Component | τ | σ | θ |
MGV | 300 | 1 | 0 |
P | 300 | 1 | 0 |
DM | 300 | 1 | 0 |
Component | τ | σ | θ |
MC | 2000 | 20 | 0.8 |
PFCd/PC | 2000 | 20 | 0.8 |
PL | 2000 | 20 | 0.8 |
Component | τ | σ | θ |
SNpco | 300 | 1 | 1 |
SNpci | 300 | 1 | 1 |
VTA | 300 | 1 | 1 |
Component | σ | θ | τ | τ |
PPN | 1 | 0 | 100 | 500 |
LH | 1 | 0 | 100 | 500 |
BLA | 1 | 0 | 500 | 500 |
Table
Table
Mani | MC | DLS | STNdl | ||
DLS | L | 1 | |||
STNdl | 1.6 | ||||
GPi | −3 | −2 | |||
Mani | PFCd/PC | DMS | STNdm | ||
DMS | L | 1 | |||
STNdm | 1.6 | ||||
GPi/SNpr | −3 | −2 | |||
BLA | PL | NAc | STNv | ||
NAc | L | 1 | |||
STNv | 1.6 | ||||
SNpr | −3 | −2 | |||
MGV | P | DM | GPi/SNpr | ||
MGV | −0.8 | 1.5 | |||
P | −0.8 | 1.5 | |||
DM | −0.8 | 1.5 | |||
MC | PFCd/PC | PL | Th | ||
MC | 1 | 1 | |||
PFCd/PC | 0.2 | 0.2 | 1 | ||
PL | 1 | 1 | |||
SNpci | PPN | LH | DMS | NAc | |
SNpco | 1 | 20 | |||
SNpci | −10 | −6 | |||
VTA | 20 | ||||
Mani | Food | Sat | BLA | ||
BLA | 5 | 5 | 10 | L | |
PPN | 10 | ||||
LH | 10 | 5 |
Table
τ | 80 | Noise decay constant |
ν | 0.25 | Noise coefficient of MGV |
ν | 0.25 | Noise coefficient of P |
ν | 6.0 | Noise coefficient of DM |
τ | 500 | Trace time constant |
α | 1010 | Trace amplification coefficient |
η | 0.08 | BLA/IC learning rate |
θ | 0.7 | BLA/IC DA learning threshold |
| 2 | Maximum connection weight in BLA/IC |
ι | 0.2 | DLS dopamine independent input coefficient |
ι | 0.5 | DMS dopamine independent input coefficient |
ι | 0.8 | NAc dopamine independent input coefficient |
δ | 4.0 | DLS dopamine dependent input coefficient |
δ | 6.5 | DMS dopamine dependent input coefficient |
δ | 1.5 | NAc dopamine dependent input coefficient |
η | 0.02 | DLS learning rate |
η | 0.02 | DMS learning rate |
η | 0.05 | NAc learning rate |
θ | 0.8 | DLS dopamine learning threshold |
θ | 0.8 | DMS dopamine learning threshold |
θ | 0.9 | NAc dopamine learning threshold |
θ | 0.5 | DLS activation learning threshold |
θ | 0.5 | DMS activation learning threshold |
θ | 0.9 | NAc activation learning threshold |
θ | 0.5 | DLS input activation learning threshold |
θ | 0.5 | DMS input activation learning threshold |
θ | 0.9 | NAc input activation learning threshold |
| 1 | Input-DLS maximum connection weight |
| 1 | Input-DMS maximum connection weight |
| 2 | BLA-NAc maximum connection weight |
θ | 0.8 | Threshold for triggering actions |