Front. Syst. Neurosci., 03 February 2010 | https://doi.org/10.3389/neuro.06.002.2010
The hippocampus is critical for fast learning of relational knowledge. Relational knowledge acquired and represented by the hippocampus may include reward associations of arbitrary cues and responses during adaptive decision making. Hippocampal memory representations that are formed in tasks that use reinforcement learning may reflect reward-related signals, but this may not be apparent unless the magnitude of expected reward is controlled. This issue is particularly relevant to a recent study by Kumaran et al. (2009), whose results suggested that the hippocampus may house abstract conceptual knowledge, whereas other brain areas, such as the ventromedial prefrontal cortex, may integrate abstract information received from the hippocampus with stimulus-bound value information. Here, an alternative opinion is presented.
Kumaran et al. (2009) provided compelling evidence that the hippocampus acquires and retains knowledge about conceptual relations among visual patterns during a task where participants learn associations between a set of patterns and a set of choices through reinforcement learning. They showed that, during learning, the hippocampal blood oxygenation level-dependent (BOLD) signal correlates with participants’ knowledge of conceptual relations among the patterns. The acquisition of this knowledge was measured using probe questions, and the answers to these questions were used to construct a variable called the probe_performance vector (Figure 2D in Kumaran et al., 2009), which was used as a regressor for the BOLD signal to identify brain regions that carry abstract conceptual representations. Kumaran et al. (2009) found that the hippocampal BOLD signal correlated with the probe_performance vector during the “Initial” session of the task, where conceptual relations among a set of patterns were acquired for the first time. They also found that the hippocampal BOLD signal observed during the “Initial” session correlated with performance during the “New” session of the task, where the same conceptual relations were present among a new set of patterns.
Although correlates of the probe_performance vector in the hippocampal BOLD signal may represent the acquisition and representation of conceptual knowledge, the question of whether a correlate of the reward value of this knowledge is also represented in hippocampal activity remains unanswered. The probe_performance vector was strongly correlated with the probability of correct response for conceptually defined pattern classes (i.e., spatial or nonspatial patterns). The probability of correct response for spatial and nonspatial patterns is shown in the lower panel of Figure 2B in Kumaran et al. (2009). The correlation coefficient between these probabilities and the probe_performance vectors was about 0.98 for each pattern class (as measured from the figures). This indicates that the probe_performance vector of each pattern class represents virtually the same information as the probability of correct response for that pattern class. Moreover, part of the variability in the participants’ responses to individual pattern presentations significantly correlated with the probe_performance vector, even after an estimated learning curve (the probability_success vector) was partialled out (Kumaran et al., 2009). These lines of evidence suggest that the probe_performance vector represents a component of the participants’ probability of correct response that is due to the knowledge of the conceptual relations among the patterns, and that this component is not captured by each pattern’s independently estimated learning state process.
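The idea of correlating two variables after a third has been partialled out can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the three short vectors are made-up numbers rather than data from Kumaran et al. (2009), and residualizing on a single covariate by least squares is only one common way to compute a partial correlation, not necessarily the procedure used in that study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, z):
    """Residuals of y after least-squares regression on z (with intercept)."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum(
        (a - mz) ** 2 for a in z
    )
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]


def partial_corr(x, y, z):
    """Correlation between x and y after z is partialled out of both."""
    return pearson(residuals(x, z), residuals(y, z))


# Hypothetical, made-up vectors (NOT data from the study), one value per trial.
learning_curve = [0.50, 0.55, 0.62, 0.70, 0.78, 0.84, 0.90, 0.93]  # stand-in for probability_success
probe_performance = [0.0, 0.0, 0.25, 0.25, 0.75, 0.75, 1.0, 1.0]   # stand-in for probe_performance
responses = [0, 1, 0, 1, 1, 1, 1, 1]                                # 1 = correct, 0 = incorrect

raw_r = pearson(probe_performance, responses)
partial_r = partial_corr(probe_performance, responses, learning_curve)
print(f"raw r = {raw_r:.3f}, partial r = {partial_r:.3f}")
```

A partial correlation that stays clearly away from zero would indicate that the probe-based vector explains response variability beyond what the learning curve already accounts for, which is the pattern described in the paragraph above.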
Since the magnitude of reward was constant in Kumaran et al.’s (2009) experiment, correlation with correct response probability implies correlation with the expected value of reward. Thus, spatial and nonspatial probe_performance vectors directly correlate with the expected value of the reward that the participants earn by deploying the knowledge of the associated conceptual pattern classes in making their decisions. It follows that all brain areas that selectively correlated with the probe_performance vector in Kumaran et al.’s (2009) study may represent not only conceptual knowledge, but also a correlate of its associated reward value.
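The step from correct-response probability to expected reward can be written out explicitly. The symbols below are assumptions introduced for illustration and do not appear in Kumaran et al. (2009): $p_t$ is the probability of a correct response on trial $t$, $m_c$ and $m_i$ are the constant rewards for correct and incorrect responses (with $m_c > m_i$), $R_t$ is the reward obtained, and $v$ stands for a probe_performance vector.

```latex
\begin{align*}
\mathbb{E}[R_t] &= m_c\,p_t + m_i\,(1 - p_t) = m_i + (m_c - m_i)\,p_t, \\
\operatorname{corr}\bigl(v,\ \mathbb{E}[R]\bigr)
  &= \operatorname{corr}\bigl(v,\ m_i + (m_c - m_i)\,p\bigr)
   = \operatorname{corr}(v,\ p)
  \qquad \text{since } m_c - m_i > 0 .
\end{align*}
```

Because the Pearson correlation is unchanged by a positive affine transformation of either variable, any regressor that correlates with $p$ correlates equally with the expected reward when reward magnitude is held constant. If reward magnitude varied across trials, the two quantities could dissociate.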
If a brain area represents a correlate of this reward value, its activity during the “Initial” session of Kumaran et al.’s (2009) task should correlate with performance in the “New” session (Figure 6B), because “spatial” and “nonspatial” conceptual knowledge have reward value in both sessions. Moreover, its activity should show significantly stronger correlation with probability of success (i.e., expected reward magnitude) on a given learning trial in the “New” session, as compared to the “Initial” session (Figure 6C), because the reward value of the conceptual knowledge increases monotonically throughout the “Initial” and “New” sessions due to continued reinforcement. Finally, the activity of this area should correlate with the probe_performance vector, above and beyond any correlation with learning curves that are derived as a function only of the learning state processes associated with individual patterns (Figure 4). The results of Kumaran et al. (2009) suggest that the only area that meets all of these conditions is the left hippocampus. Thus, the reward value of conceptual knowledge may be present in its hippocampal representation, and may be transferred to new learning situations through this representation. Alternatively, the activity observed in the left hippocampus may be the signature of a reward-dependent machinery that produces conceptual knowledge representations that do not necessarily have a reward-related semantic content themselves.
These interpretations suggest that the signal detected in the left hippocampus in relation to the learning of conceptual knowledge may covary with the expected value of the reward that the participants earn by deploying that conceptual knowledge during adaptive decision making. This would be compatible with the finding that the medial temporal lobe BOLD signal correlates with the reward value predicted by reinforcement learning models during the performance of a task with a relational structure where time-varying payoffs correlate between stimuli (Wimmer et al., 2008). This would also be compatible with the finding that the BOLD signal in the left hippocampus correlates with the temporal-difference reward-predictive value during the reinforcement learning of a multistate Markov decision task where reward magnitude depends on task conditions (Tanaka et al., 2004).
In view of these observations, the question of whether the reward value of conceptual knowledge is present in its hippocampal representation, and how reward signals mediate this encoding, may be addressed in future experiments where reward magnitude is controlled in concept learning paradigms.