Trial-by-Trial Modulation of Associative Memory Formation by Reward Prediction Error and Reward Anticipation as Revealed by a Biologically Plausible Computational Model

Anticipation and delivery of rewards improves memory formation, but little effort has been made to disentangle their respective contributions to memory enhancement. Moreover, it has been suggested that the effects of reward on memory are mediated by dopaminergic influences on hippocampal plasticity. Yet, evidence linking memory improvements to actual reward computations reflected in the activity of the dopaminergic system, i.e., prediction errors and expected values, is scarce and inconclusive. For example, different previous studies reported that the magnitude of prediction errors during a reinforcement learning task was a positive, negative, or non-significant predictor of successfully encoding simultaneously presented images. Individual sensitivities to reward and punishment have been found to influence the activation of the dopaminergic reward system and could therefore help explain these seemingly discrepant results. Here, we used a novel associative memory task combined with computational modeling and showed independent effects of reward-delivery and reward-anticipation on memory. Strikingly, the computational approach revealed positive influences from both reward delivery, as mediated by prediction error magnitude, and reward anticipation, as mediated by magnitude of expected value, even in the absence of behavioral effects when analyzed using standard methods, i.e., by collapsing memory performance across trials within conditions. We additionally measured trait estimates of reward and punishment sensitivity and found that individuals with increased reward (vs. punishment) sensitivity had better memory for associations encoded during positive (vs. negative) prediction errors when tested after 20 min, but a negative trend when tested after 24 h. In conclusion, modeling trial-by-trial fluctuations in the magnitude of reward, as we did here for prediction errors and expected value computations, provides a comprehensive and biologically plausible description of the dynamic interplay between reward, dopamine, and associative memory formation. Our results also underline the importance of considering individual traits when assessing reward-related influences on memory.


INTRODUCTION
Information entering the brain is varied and not equally well represented in memory. For example, aversive and rewarding events are usually better remembered as compared to neutral events (Kensinger, 2004;Maren and Quirk, 2004;Shohamy and Adcock, 2010). Improved memory for such events has an evolutionary advantage because it increases the chance of avoiding potential dangers as well as fulfilling crucial needs that typically engage reward processes, e.g., finding food and water, shelter, sexual partner.
Although not yet fully understood, the neural mechanisms that enable such memory enhancement therefore promote survival.
Different aspects of reward have been reported to enhance memory. For example, recognition memory is enhanced for images serving as reward-predictive vs. neutral cues (Wittmann et al., 2005(Wittmann et al., , 2011(Wittmann et al., , 2013Spaniol et al., 2014), suggesting a beneficial role of reward anticipation (Adcock et al., 2006). Memory is also promoted by the delivery of rewards, as indicated by enhanced recognition memory for items presented prior to positive vs. negative feedback (Mather and Schoeke, 2011) and high vs. low rewards (Murayama and Kitagami, 2014). Because both reward anticipation and reward delivery engages the dopaminergic reward system which projects to brain regions involved in memory processes, such as the hippocampus (Gabrieli, 1998;Squire et al., 2004), reward-related enhancements of memory may be determined by dopaminergic influences on hippocampal plasticity (Lisman and Grace, 2005;Shohamy and Adcock, 2010;Miendlarzewska et al., 2015). Some attempts have been made to disentangle the contribution of reward anticipation and reward delivery to memory formation. For example, Bialleck et al. (2011) used two categories of neutral objects as reward-predictive cues in a number comparison task, in which positive and negative outcomes were delivered. The authors tested two conditions, one where reward was contingent on task performance and one where it was not. In the rewardcontingent block, i.e., the condition most similar to previous studies, recognition memory was found to be modulated by reward anticipation, but not reward delivery. By contrast, Mather and Schoeke (2011) reported a main effect of reward delivery on memory for pictures, but no effect of reward anticipation besides an interaction with image valence. In this latter study, no rewardlearning occurred because pictures were presented as targets in a reaction time task and were preceded by reward-anticipatory cues with specific numbers (0, −0.25, +0.25$) indicating the value of the trial and followed by reward outcomes indicating hits or misses. The relationship between memory encoding, reward delivery, and reward anticipation is still unresolved.
However, because most previous studies averaged memory performance across trials within conditions of high or low rewards, significant fluctuations in reward computations associated with the dopamine system have been ignored. Specifically, rather than simply differentiating between positive and negative outcomes or between cues predicting rewards or neutral outcomes, dopamine neuron activity scales with the mismatch between actual and predicted outcomes, i.e., prediction errors (Schultz and Dickinson, 2000) and the magnitude of reward-predictive cue values, i.e., expected values (Tobler et al., 2005). These functional properties may have implications for the understanding of the links between reward, dopamine, and memory formation. For example, neutral objects may acquire reward-predictive values through the pairing with positive outcomes (Wittmann et al., 2005;Bialleck et al., 2011). This can be described by a process in which a cue's expected value is incremented following outcomes that are better than predicted, i.e., positive prediction errors, and decremented following outcomes that are worse than predicted, i.e., negative prediction errors (Rescorla and Wagner, 1972). Given initially neutral cues, learning typically develops gradually. In this case, because dopamine neuron activity tracks the magnitude of both expected values and prediction errors, a shift in dopamine activity occurs, whereby activity predominates at the time of reward delivery early during learning due to large prediction errors and small expected values, but is later observed at the time of cue presentation due to small prediction errors and large expected values (Schultz et al., 1997). Ignoring such fluctuations in reward makes it difficult to ascertain whether memory enhancing effects commonly attributed to reward anticipation are due to reward anticipation as a result of high expected values assigned to reward-predictive cues, reward delivery owing to large positive prediction errors, or both.
Two recent studies explicitly addressed the impact of reward computations on item recognition memory but reported seemingly discrepant results. Specifically, during a reinforcement learning task, Davidow et al. (2016) presented task-irrelevant images at reward delivery and found that adolescents, but not adults, showed increased recognition memory for images that had been presented during large prediction errors, when memory was tested immediately after the learning. By contrast, Wimmer et al. (2014) presented task-irrelevant images together with choice alternatives during a similar reinforcement learning task, and reported decreased recognition memory, in adult participants, for images presented in trials with large prediction errors, when memory was tested the next day. Potential explanations for these results may partly relate to the fact that Davidow et al. (2016) and Wimmer et al. (2014) tested memory immediately after learning and on the next day, respectively. Thus, the delay period between memory encoding and testing in Wimmer et al. (2014)'s study likely included a period of sleep, during which reward likely interacts with consolidation processes to promote memory formation (Lansink et al., 2009;Igloi et al., 2015). Moreover, differences between adolescents and adults may be mediated by increased reward-related activations of the dopaminergic reward system in adolescents (Galvan, 2010;Davidow et al., 2016). Thus, individual differences in the activation of the dopaminergic reward system, as determined by reward sensitive traits, may modulate the impact of reward during memory encoding. Yet, the impact of individual traits on reward-related modulations of episodic memory formation remains largely unknown.
One final issue addressed by the present study is the current research bias toward using item recognition paradigms combined with monetary rewards. Recognition memory is just one facet of episodic memory and it seems unlikely that only monetary rewards should determine whether one memory is prioritized over others. For example, memorizing that a spouse prefers roses over tulips depends on the formation of associative memories, and such associations are shaped by aspects of social reward, such as the spouse's differential facial expression when receiving roses or tulips (see Figure 1). Social rewards evoke similar patterns of activity in the dopaminergic reward system as primary or monetary rewards. For example, presenting happy smiley faces increases activity in the ventral striatum, as compared to sad ones, and the mismatch between receiving a happy smiley face and the expectation of receiving it, i.e., the prediction error, is encoded in the dopaminergic midbrain . Moreover, because associative memory formation depends on brain structures in the medial temporal lobe, including the hippocampus (Gold et al., 2006;Mayes et al., 2007), it is plausible to assume that dopaminergic influences on hippocampal plasticity also enhances associative memory formation.
To address these raised issues, the present study combines a novel associative memory task with a computational approach and estimates of trait reward and punishment sensitivity. In brief, during each encoding trial, participants learned characterobject preferences by guessing which of two items a cartoon character preferred. Positive and negative feedback indicated whether selected objects were preferred or not preferred, and was also used to test the impact of reward delivery on memory encoding. Reward anticipation was experimentally manipulated by varying the ratio between the number of positive and negative feedbacks assigned to different characters. For example, characters assigned high reward anticipation received positive feedback in 8 out of 10 trials while those assigned to medium and low reward anticipation received positive feedback in 5/10 and 2/10 trials, respectively. Character-object associations were then tested after either 20 min or after 24 h. A computational approach incorporating computational characteristics of reward, namely trial-by-trial changes of expected value and prediction error, allowed us to decipher the respective contributions of reward anticipation and reward delivery on associative memory formation beyond simple averaging procedures. Individual differences in reward and punishment sensitivity was estimated via a French short version of the Sensitivity to Punishment and Sensitivity to Reward Questionnaire (SPSRQ).

Participants
Twenty-six right-handed participants participated in the study. All participants provided written and informed consent according to the ethical regulations of the Geneva University Hospital and the study was performed in accordance with the Declaration of Helsinki. Data from one participant were excluded due to failing to adhere to task instructions, i.e., data from 25 participants were included in the statistical analyses (20 females; mean age ± SEM: 24.62 ± 1.08).

Associative Memory Task
Encoding Each trial started with the presentation of a fixation cross, after which the face of a cartoon character was presented together with FIGURE 1 | Shaping associative memory via positive and negative feedback. The problem of buying a spouse the right type of flowers becomes trivial by remembering the positive and negative associations between the spouse and each type of flower. In this example, a rose elicits a smile which causes a positive association while a tulip causes sadness, thus creating a negative association. two objects (Figure 2A). Participants were instructed that the character preferred one object in each pair and they had to guess the preferred object by pressing a left or a right button with their right hand. Following selection, a positive feedback (happy smiley face) or negative feedback (sad smiley face) indicated whether the participant had guessed the preferred or the non-preferred object. Participants were instructed that they would later be tested on the preferences. Positive and negative feedbacks were therefore equally valid in terms of learning to discriminate between the preferred and the non-preferred object in each pair.
Each pair of objects was presented once for each character and participants encoded 10 preferences for each character before encoding 10 preferences for the next character, and so forth. In total, participants learned 10 character-object preferences for six different characters, resulting in a total of 60 trials.
In accordance with standard procedures, memory between different conditions of reward-delivery and anticipation was tested by collapsing memory performance across trials within a condition. Specifically, the influence of reward delivery was tested by comparing average memory performance for trials with positive and negative feedbacks (Figure 2B), while the impact of reward anticipation was assessed by comparing average memory FIGURE 2 | Stimuli and Procedure. (A) During encoding trials, a cartoon character was presented together with two objects. Participants guessed which of the two objects the character preferred and positive or negative feedback indicated whether the selected object was preferred or not preferred. During testing, trials were identical to the encoding with the exception that a confidence rating was performed instead of receiving feedback. (B) The impact of reward delivery on memory encoding was assessed through two types of feedback, positive and negative. (C) The impact of reward anticipation on memory encoding was assessed by manipulating the ratio between the number of positive and negative feedback assigned to a character. Specifically, characters were divided into three different categories. performance for characters assigned to different feedback ratios ( Figure 2C). The feedback ratios were manipulated in a predetermined fashion such that the feedback presented in a trial did not depend on the actual selection. For example, for characters assigned to high reward anticipation positive feedback was presented in 8/10 trials, while feedback was positive in 5/10 and 2/10 trials for characters assigned to medium and low reward anticipation ( Figure 2C). In total, two characters were assigned to each of three levels of reward anticipation (high, medium, and low; Figure 2C). Moreover, because the feedback ratios were determined probabilistically, the reward anticipatory value of each character fluctuated from trial to trial. This allowed studying how subtle fluctuations in reward, as estimated by a computational model (see Section Computational approach below), influenced associative memory formation.

Testing
During testing, the procedure was identical to the training with the exception that feedback was no longer provided (Figure 2A). Participants were instructed to select the preferred object in each pair. This could be accomplished by recalling and selecting the preferred object, or by recalling and rejecting the nonpreferred object. Instead of receiving feedback, participants rated the confidence of their selection as "a Guess, " "Quite certain, " or "Certain."

Procedure
To determine whether reward-influences on memory formation were present after short (Bialleck et al., 2011;Mather and Schoeke, 2011;Davidow et al., 2016) and/or long over-night delays between encoding and recall (Wittmann et al., 2005(Wittmann et al., , 2011(Wittmann et al., , 2013Murayama and Kitagami, 2014;Wimmer et al., 2014), each participant performed two encoding-testing sessions. Participants in one group (n = 13, 13 females; mean age ± SEM: 25.44 ± 0.94) first encoded character-object preferences and then performed an unrelated visual discrimination task (lasting approximately 20 min) after which memory for the preferences were tested. After a break new character-object preferences were encoded followed by the unrelated visual discrimination task, but memory for these new preferences was tested 24 h later. Another group of participants (n = 12, 7 females; mean age ± SEM: 23.79 ± 1.27) performed the encoding-testing sessions in reverse order, i.e., they first performed the session with a 24 h delay between encoding and recall, and then the one with a 20 min delay. Of note, neither feedback nor reward was provided during the visual task to avoid any interference with associative memory formation. Moreover, new characters and pairs were presented in each session to prevent memory interference between sessions. Before the experiment, participants performed a training version of the task to get familiarized with the experimental paradigm. Six character-item preferences were encoded (three with positive and three with negative feedback) and tested after a brief break. The characters and the object-pairs used for the training were not included in the experiment proper.

Behavior
We first used standard procedures in the field, i.e., averaging performance across trials within each condition, to assess the impact of reward anticipation and delivery on memory. We thus calculated the proportion of correct selections/rejections for each condition (Character types; Feedback types). Differences between conditions were analyzed using ANOVAs and Monte-Carlo permutation tests (MC-tests). MC-tests are less sensitive to violations of normality and are therefore more suitable for small sample sizes as compared to the traditional t-test (Howell, 2013). We then applied a computational approach in order to account for trial-by-trial fluctuations in memory formation as a function of reward delivery and reward anticipation, as described below.

Computational Approach
Reward-related enhancements of memory formation have been attributed to dopaminergic influences on hippocampal plasticity (Wittmann et al., 2005;Adcock et al., 2006;Shohamy and Adcock, 2010). Two specific aspects of reward are known to scale with dopamine neuron activity, i.e., expected value (the level of anticipated reward) and prediction error (the mismatch between actual and predicted outcomes). Specifically, phasic dopamine neuron activity increases when expected values increase (Tobler et al., 2005) and when outcomes are better (more positive) than predicted (Schultz et al., 1997). Thus, in an attempt to elucidate the relationship between specific reward computations and associative memory formation, expected values and prediction errors were incorporated into a novel computational model of associative memory.
Each character type in the present study is defined through its repeated associations with positive and negative feedback, with the underlying assumption that different feedback ratios will yield different expected values. This learning process can be described by a Q-learning rule (Watkins and Dayan, 1992). In each trial t, the expected value V c of character c is updated based on the prediction error δ, i.e., the mismatch between the actual outcome r (here set to 1 and 0 for positive and negative outcomes, respectively) and the expected value V c , scaled by the learning rate α: These concepts were then incorporated into an associative memory model which assumes that the probability of encoding a character-object association depends on the reward R(t) provided in each trial t, i.e., the magnitude of the expected value V c (t) and the prediction error δ(t), as well as a constant C 0 . The probability p Memory (t) of encoding information presented in a trial t is described through a logistic function: To determine which aspect of reward that contributes to memory formation, different models were confronted: A "Baseline" model without any reward contribution: R (t) = C 0 A "δ" model with a contribution of only signed prediction error: A "V" model with a contribution of only expected value: A "δ+V" model with contributions from signed prediction error and expected value: One reviewer pointed out the involvement of the noradrenergic system in both memory formation and the encoding of "surprise, " i.e., unsigned prediction errors. Potential noradrenergic contributions to memory formation were therefore tested in two additional models: A "|δ|" model with a contribution of only unsigned prediction error: R (t) = C 0 + C |δ| · |δ (t) | A "|δ|+V" model with contributions from unsigned prediction error and expected value: C V , C δ , and C |δ| are scale factors that determine the respective contribution of expected value, signed prediction error, and unsigned prediction error to the memory encoding probability. The free parameters C 0 , C V , C δ , C |δ| , and α were fitted to each participant's data through maximum likelihood estimation, i.e., by minimizing the negative log-likelihood estimation function (LLE) for logistic regression: y(t) is the observed outcome (i.e., hit/miss) in each trial t. Minimization of the log-likelihood was performed in two steps. First, for each model the LLE was calculated for the complete parameter space in steps of 0.01, i.e., −2:0.01:2 for all scale factors C x and 0:0.01:1 for α. A more refined search was then performed via a Nelder-Mead simplex method (Nelder and Mead, 1965), in which the parameter search space was centered on the optimal values obtained from the first step with boundaries set to ±0.01 around these values.
Model fits were compared using the Akaike Information Criterion (AIC; Akaike, 1974) which accounts for different numbers of fitted variables k:

Questionnaires
Individual differences in the sensitivity to reward and punishment modulate the balance between approach and avoidance learning (Smillie et al., 2007;Aberg et al., 2016).
To investigate whether such trait characteristics also influence associative memory formation, we administered a questionnaire estimating reward and punishment sensitivity, i.e., a French version of the Sensitivity to Punishment and Sensitivity to Reward Questionnaire (SPSRQ; Lardi et al., 2008) previously shown to be related to the balance between learning from positive (vs. negative) feedback in a reinforcement learning task (Aberg et al., 2016). Six of the participants were unable to fill out the questionnaires because they were not native French speakers, thus these data were collected on 19 out of the 25 participants. Correlations between memory performance and traits were investigated using the Spearman rank-correlation coefficient ρ. A sensitivity bias score was calculated as the z-scored Sensitivity to Reward minus the z-scored Sensitivity to Punishment, i.e., z-score SR-SP (Aberg et al., 2016).

Memory Performance
An initial ANOVA on memory performance did not reveal any main effect of Group (participants first tested after 20 min vs. first tested after 24 h) or significant two-, three-, four-, or fiveway interactions between the factor Group and factors Feedback type (positive, negative feedback), Character type (low, medium, high positive vs. negative feedback ratio), Confidence level (guess, quite certain, certain), or Delay (20 min, 24 h; all p > 0.1). Thus, the factor Group was removed from all subsequent analyses.

Memory Tested after 20 min
A three-way repeated measures ANOVA with factors Feedback type (positive, negative feedback), Character type (high, medium, low feedback ration), and Confidence level (guess, quite certain, certain) revealed a main effect of Feedback type [F (1, 24) = 21.345, p = 0.004] because character-object preferences encoded during positive feedback (mean ± SEM: 0.777 ± 0.020) were better remembered as compared to those encoded during negative feedback (mean ± SEM: 0.624 ± 0.036, p < 0.001; Figure 3A). Moreover, there was a trend for an effect of Character type [F (2, 48) = 3.975, p = 0.061] because character-object preferences encoded for characters associated with a high feedback ratio (mean ± SEM: 0.756 ± 0.024) were better remembered as compared to those associated with a low feedback ratio (mean ± SEM: 0.632 ± 0.031, p < 0.001) and marginally better as compared to those associated with a medium feedback ratio (mean ± SEM: 0.714 ± 0.029, p = 0.098). Memory performance between characters associated with medium and low feedback ratios was also significant (p = 0.018; Figure 3B). There was also a significant main effect of Confidence level [F (2, 48) = 44.348, p < 0.001] because memory performance was higher for Certain (mean ± SEM: 0.852 ± 0.027) as compared to Quite certain responses (mean ± SEM: 0.653 ± 0.029, p < 0.001) and Guesses (mean ± SEM: 0.524 ± 0.028, p < 0.001). Memory performance for Quite certain responses was also better as compared to Guesses (p = 0.001) ( Figure 3C). No two-or three-way interactions were significant (all p > 0.13), therefore suggesting independent influences of reward anticipation (Character type) and reward delivery (Feedback type) on memory formation. In addition, this means there was no reward-modulation of the relationship between subjective confidence levels and memory performance.
The effect of Character type was confirmed in a subsequent analysis where the linear regression slopes between memory performance and character types were calculated separately for positive and negative feedback. A paired MC-test revealed that the slopes did not differ between positive feedback (mean ± SEM: 0.039 ± 0.026) and negative feedback (mean ± SEM: 0.028 ± 0.022, p = 0.762). However, the average slope across feedback types (mean ± SEM: 0.062 ± 0.016) was significantly different from 0.0 (p < 0.001), indicating a significant overall effect of Character type on memory performance.
As when memory was tested after 20 min, memory performance was graded by confidence levels, i.e., Certain > Quite certain > Guess, also when tested after 24 h. Together these results replicate the robust relationship between confidence levels and memory accuracy (De Zubicaray et al., 2011;Qin et al., 2011;Kuchinke et al., 2013). Because there were no significant FIGURE 3 | Behavioral results (Mean ± SEM). (A) Main effect of Feedback type when tested after 20 min. Memory performance was higher for character-item associations encoded with positive as compared to negative feedback. (B) Main effect of Character type when tested after 20 min. Memory performance for character-object associations scaled with the positive vs. negative feedback ratio. (C) Main effect of Confidence level when tested after 20 min. Performance was highest when the memory for character-item associations was rated as Certain, as compared to those rated as Quite certain, which were better recalled than Guesses. (D) Main effect of Feedback type when tested after 24 h. Memory performance was higher for character-object associations encoded with positive feedback as compared to negative feedback. (E) Memory performance did not differ between Character types when tested after 24 h. (F) Main effect of Confidence level when tested after 24 h. Performance was highest when the memory for character-item associations was rated as Certain, as compared to those rated as Quite certain, which were better recalled than Guesses. • p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001, ns = not significant (p > 0.05).
interactions between Confidence levels and the Feedback or Character types when memory was tested after 20 min or after 24 h, these results are not discussed any further.

Computational Approach
To explain these behavioral results in light of specific reward computations associated with the dopaminergic system, a computational approach was applied. Because previous studies indicate influences of both reward anticipation and delivery of rewards on memory formation, and because dopamine activity tracks the magnitude of reward anticipation, i.e., expected value, and reward delivery, i.e., prediction errors, we predicted that a model combining these two concepts would provide the best fit to behavioral data. Moreover, to confirm whether expected values and/or prediction errors contributed significantly to the memory formation, the values of the fitted scale factors of the best fitting model were compared to the null hypothesis of no influence, i.e., that their respective average value is equal to 0.

Memory Tested after 20 min
Average fitted parameters and AIC values for the different models are displayed in Table 1. Comparing AIC values using paired MC-tests revealed that the "δ+V" model, incorporating contributions from signed prediction errors (δ) and expected values (V), provided the best fit as compared to the other models (δ+V vs. Baseline, p < 0.001; δ+V vs. δ, p = 0.011; δ+V vs. V, p = 0.005; δ+V vs. |δ|, p < 0.016; δ+V vs. |δ|+V, p < 0.020). The fit of the δ+V model to behavioral data is shown in Figure 4A (dashed lines).
To confirm that reward-learning had occurred, the final expected values of each character type, as estimated through the "δ+V" model, are displayed in Figure 4B. Paired MC-tests revealed that characters associated with high reward anticipation (i.e., high feedback ratios) had obtained a higher final expected value (mean ± SEM: 0.816 ± 0.045) as compared to characters with medium (mean ± SEM: 0.575 ± 0.056) and low (mean ± SEM: 0.203 ± 0.046) reward anticipation, who also differed between each other (all p < 0.006).
Next, it was tested whether the average values of the fitted scale factors C V and C δ for the "δ+V" model (see Table 1) were significantly larger than 0. Indeed, MC-tests revealed that both C V (p < 0.001) and C δ (p < 0.001) were significantly larger than 0.0. Because C V and C δ index the contribution of expected value and signed prediction error magnitudes to memory performance, both of these reward computations contributed to subsequent memory formation when tested after 20 min.
The expected values of each character type, as estimated through the "δ+V" model, are displayed in Figure 4D. Paired MC-tests revealed that expected values for characters associated with high reward anticipation had obtained a higher final expected value (mean ± SEM: 0.732 ± 0.064) as compared to characters with medium (mean ± SEM: 0.534 ± 0.067) and low (mean ± SEM: 0.181 ± 0.043) reward anticipation, who also differed between each other (all p < 0.032).
Moreover, the average values for C V (p < 0.013) and C δ (p < 0.017) for the "δ+V" model (see Table 1) were significantly larger than 0.0. Importantly, these results indicate that both expected value and prediction error contributed to memory formation when tested after 24 h, despite no significant effect of reward anticipation when memory performance was averaged across trials (i.e. Character type; Section Memory performance). Thus, unlike standard analyses, which typically average memory performance across trials within conditions, our computational approach, which incorporates trial-by-trial fluctuations in the magnitude of reward during encoding, was able to clarify how reward significantly influenced memory formation also when tested after 24 h. The model is therefore not only more sensitive but, because it integrates existing physiological constraints to study how reward affects behavior, it presumably provides a more accurate description of how reward influences memory formation.

Effects of Individual Sensitivity to Reward/Punishment on Memory
Individual differences in the sensitivity to reward and punishment are reflected in the responsiveness of the dopaminergic reward system (Simon et al., 2010;Kennis et al., 2013), and bias the ability to learn from positive and negative outcomes (Aberg et al., 2016). To test how biases in the expression of individual traits impact on reward-related memory enhancements, individual memory performance was correlated with the differential score between the Sensitivity to Reward and the Sensitivity to Punishment scales of the SPSRQ.

Memory Tested after 20 min
Individuals with a sensitivity bias favoring reward over punishment showed better memory for associations encoded during positive (vs. negative) feedback (ρ = 0.550, p = 0.015; Figure 5A). Looking at positive and negative feedback separately revealed no significant correlations between the sensitivity bias and memory associated with positive (ρ = 0.209, p = 0.391) or negative feedback (ρ = −0.326, p = 0.173). Testing whether individual sensitivity biases predicted to what extent memory performance was influenced by prediction errors, i.e., C δ , and expected values, i.e., C V , showed a significant correlation between the sensitivity bias and C δ (ρ = 0.457, p = 0.049) but not with C V (ρ = 0.008, p = 0.974).

Difference between Memory Tested after 20 min and 24 h
Finally, we tested the influence of the sensitivity bias on the differential memory performance between short and long delays, i.e., memory tested after 20 min (vs. 24 h). An increased expression of reward (vs. punishment) sensitivity was associated with relatively better memory for associations encoded during positive (vs. negative) feedback when tested after 20 min as compared to 24 h (ρ = 0.550, p = 0.015; Figure 5C). This was specifically the case for associations encoded during negative feedback (ρ = −0.476, p = 0.039; Figure 5D), but not for positive feedback (ρ = 0.250, p = 0.303; Figure 5E).

DISCUSSION
The present study addresses the influence of reward delivery and reward anticipation on associative memory formation. Participants performed an associative memory task where character-object associations were learned through positive and negative feedback. To investigate the influence of reward anticipation, different characters were associated with different positive vs. negative feedback ratios, while the influence of reward delivery was tested by comparing memory for associations encoded during positive vs. negative feedback. The memory for character-object associations were tested after a delay of either 20 min or 24 h, and the impact of individual sensitivity biases to reward and punishment was assessed via trait questionnaires. Critically, a computational approach was used to investigate the link between memory formation and subtle fluctuations in expected value and prediction error, two different aspects of reward known to be associated with the dopaminergic system. The results are discussed in detail below.

Prediction Error Magnitude Modulates the Contribution of Reward Delivery to Memory Formation
Associative memory performance was better for characterobject associations encoded during positive as compared to negative feedback, a result which extends previous studies reporting enhanced recognition memory for images associated with positive vs. negative feedback (Mather and Schoeke, 2011), and high vs. low monetary reward outcomes (Bialleck et al., 2011;Murayama and Kitagami, 2014). Because the ability to discriminate between preferred and non-preferred objects in a character-object pair is equally dependent on the ability to encode positive, i.e., that an object is preferred, and negative associations, i.e., that an object is non-preferred, it might seem surprising that memory performance was better for positive as compared to negative associations. However, reward-related enhancements of memory formation have been attributed to the activation of the dopaminergic system and associated influences on hippocampal plasticity (Lisman and Grace, 2005;Shohamy and Adcock, 2010;Miendlarzewska et al., 2015). Dopamine neurons do not respond to an outcome unless it is more positive than predicted, i.e., positive prediction errors, or more negative than predicted, i.e., negative prediction errors, in which case there is a "dip" in activity (Schultz et al., 1997). Therefore, in the present study, learning positive associations may have benefitted from positive prediction errors causing increased dopaminergic influences on the hippocampus, while encoding negative associations may have suffered from reduced dopamine activity caused by negative prediction errors. Supporting this explanation, the computational approach showed significant and positive influences of prediction error magnitudes on memory performance, both when tested after 20 min and after 24 h. Additionally and importantly, we previously showed that the dopaminergic midbrain encodes the mismatch between actually receiving a happy smiley face and the expectation of receiving it, i.e., prediction errors .
The finding that the prediction error magnitude during encoding correlated positively with memory performance when tested after a short delay, i.e., 20 min, is similar to recently obtained results (Davidow et al., 2016). Specifically, Davidow et al. (2016) showed that the prediction error magnitude positively influenced the incidental encoding of images, when memory was tested immediately after learning. Interestingly, this effect was found only in adolescents but not in adults. Because adolescents show a general increase in the activation of FIGURE 5 | Individual differences. (A) When tested after 20 min, memory performance for positive vs. negative feedback correlated positively with the sensitivity bias, i.e., z-scored Sensitivity to Reward (SR) minus z-scored Sensitivity to Punishment (SP). (B) When tested after 24 h, memory performance for positive vs. negative feedback correlated marginally and negatively with the sensitivity bias. (C) For positive vs. negative feedback, memory performance correlated positively with the sensitivity bias, when memory was tested after 20 min as compared to 24 h. (D) For negative feedback, memory performance correlated negatively with the sensitivity bias, when memory was tested after 20 min as compared to 24 h. (E) For positive feedback, memory performance did not correlate significantly with the sensitivity bias, when memory was tested after 20 min as compared to 24 h. For display purposes non-ranked data and linear regression slopes are shown. the dopaminergic reward system (Galvan, 2010;Van Leijenhorst et al., 2010;Davidow et al., 2016), as do adults with an increased expression of reward sensitive traits (Simon et al., 2010;Kennis et al., 2013), we predicted that individuals with an increased sensitivity for reward over punishment should display a stronger positive influence from prediction error magnitudes on subsequent memory performance, a prediction that was confirmed by the present results.
Our result that prediction error magnitudes correlated positively with memory performance when tested after 24 h contrasts with a recent study reporting a negative correlation between prediction error magnitude and memory performance for incidentally encoded images when memory was tested on the next day (Wimmer et al., 2014). Overnight delays between encoding and testing are likely to include sleep, thus offering an opportunity for sleep-related memory consolidation processes to occur (Diekelmann et al., 2009;Rasch and Born, 2013;Igloi et al., 2015). One possible explanation for Wimmer et al. (2014)'s results is that consolidation processes enhanced and decremented memories encoded during negative and positive prediction errors, respectively. Furthermore, the activation of the dopaminergic reward system during sleep is important for memory consolidation , and the degree of this activation is related to the expression of reward-related behaviors and traits during wakefulness (Perogamvros et al., , 2015. Accordingly, consolidation processes dependent on the activation of the dopaminergic reward system could interact with individual traits to modulate memory formation during sleep. Supporting these speculations, we found that an increased sensitivity to reward over punishment was associated with increased memory for associations encoded during negative feedback when tested after 24 h, as compared to 20 min. Because some evidence suggests that sleep-related consolidation processes promote weakly encoded memories (Diekelmann et al., 2009;Oudiette et al., 2013;Rasch and Born, 2013), it is tempting to speculate that the memory encoding strength is directly related to the prediction error magnitude. Unfortunately, the present study was not designed to investigate sleep-related memory effects and further research is therefore needed to elucidate how individual traits influence reward-related memory processes during sleep. At the very least, the present results demonstrate the importance of considering individual traits when assessing reward-related memory enhancements.
Of note, other neurotransmitters such as noradrenalin and acetylcholine have also been implicated in learning and episodic memory formation (Doya, 2002;Harley, 2004;Tully and Bolshakov, 2010;Mather et al., 2016). Particularly relevant in the context of prediction errors is the noradrenergic system, with noradrenergic neurons in the locus coeruleus showing increased activity following events and outcomes that are either better or worse than predicted, thus encoding a notion of "surprise" or unexpected uncertainty (Harley, 2004;Yu and Dayan, 2005;Dayan, 2012;Clewett et al., 2014). Because noradrenergic neurons project to regions involved in memory formation and reward, including the hippocampus, the amygdala, and the dopaminergic midbrain, it could be predicted that the noradrenergic system enhances memory formation for any surprising event, and that the degree of surprise is determined by the magnitude of the unsigned prediction error. However, two computational models incorporating unsigned "noradrenergic" prediction errors provided inferior fits to behavioral data, as compared to a model incorporating signed "dopaminergic" prediction errors. Accordingly, the source of reward-related memory enhancements in the present study is more likely to be dopaminergic than noradrenergic.

Reward Anticipation Determines Memory Formation through Magnitude of Expected Value
Character-object associations for character types assigned to higher feedback ratios, i.e., more frequent positive vs. negative feedback, were better remembered as compared to those assigned to lower feedback ratios when tested 20 min, but not 24 h, after encoding.
Previous reports indicate enhanced memory formation for images signaling upcoming rewards (Wittmann et al., 2005;Bialleck et al., 2011), possibly due to increased reward anticipation (Adcock et al., 2006). While the present study did not test recognition memory for reward-predictive stimuli, i.e., the images of characters, the results are well in line with studies showing enhanced recognition-memory for other information presented in trials with high reward anticipation (Adcock et al., 2006;Spaniol et al., 2014).
The finding that memory performance was enhanced for character types associated with high feedback ratios after short, but not long time-delays, is in accordance with previous studies reporting effects of reward anticipation after short timedelays, i.e., up to 30 min (Bialleck et al., 2011;Mather and Schoeke, 2011), but contrasts with studies showing influences of reward anticipation on memory formation when memory is tested after long delays, i.e., >20 h (Wittmann et al., 2005(Wittmann et al., , 2011(Wittmann et al., , 2013Murayama and Kitagami, 2014). While the reason for these rather discrepant results between studies are unclear, one explanation could be derived from the notion that dopamine plays an important role for memory consolidation processes (Lisman et al., 2011). Specifically, unlike most previous studies that used item recognition paradigms paired with monetary rewards, the present study investigated associative memory and provided happy or sad smiley faces as feedback. Thus, it could be suggested that, in most experimental contexts, monetary rewards may be more motivationally salient causing increased dopaminergic responses, which would then have a stronger impact on dopamine-dependent consolidation processes, ultimately leading to longer-lasting memory-benefits of reward anticipation (Shohamy and Adcock, 2010;Igloi et al., 2015).
Another explanation could be related to the fact that averaging memory performance across trials for specific reward conditions (such as different Character types in the present study) ignores the fine temporal distribution of delivery and anticipation of reward within and across conditions. In particular, the repeated pairing of a neutral cue, i.e., the face of a character type, with positive and rewarding outcomes causes a gradual increase in the reward predictive value of the cue, while conversely causing a gradual reduction of prediction errors at reward delivery. Indeed, using a computational approach that provides estimates of trial-wise fluctuations in these reward-related parameters, we demonstrate a significant influence of expected value on memory formation, both after 20 min as well as after 24 h, therefore indicating contributions of reward anticipation over both short and long delays. Further evidence supporting this explanation comes from robust findings showing that dopamine neuron activity does not reflect the absolute magnitude of reward, but instead tracks the magnitude of expected values and prediction errors. Specifically, reward-learning causes a shift in the dopaminergic response from the time of reward delivery to the presentation of a reward-predictive cue (Schultz et al., 1997). Thus, given that dopamine activity promotes memory formation (Lisman and Grace, 2005;Shohamy and Adcock, 2010;Lisman et al., 2011), a plausible speculation is that changes in memory encoding as a function of fluctuations in the magnitude of prediction error and expected value relate to similar fluctuations in the activity of the dopaminergic system. Given that the contributions of these very specific and important reward computations to memory formation have been largely ignored in the literature, the present results significantly extend previous research by providing a more accurate description of the link between reward, the dopamine system, and memory.
Notably, two recent studies used computational approaches to study the impact of reward anticipation on the incidental encoding of images during reinforcement learning tasks, but report no impact of reward anticipation, i.e., the expected value of a chosen alternative (Wimmer et al., 2014) or a rewardpredictive cue (Davidow et al., 2016), on subsequent image recognition memory. One main difference between these studies and ours is that those previous studies looked at recognition memory while we used an associative memory paradigm. Moreover, we presented the associated objects together with the reward-predictive cue, i.e., the character face, during both the encoding and the test phase. Accordingly, it cannot be excluded that, besides its impact on memory encoding, the presence of a reward-predictive cue enhanced memory retrieval as well. Surprisingly few studies looked at the impact of reward and the dopamine system on memory retrieval. However, while memory retrieval is not enhanced by performance-based incentives alone (Han et al., 2010;Elward et al., 2015), the presentation of previously rewarded objects may contribute to the reinstatement of "non-strategic, reward-related processes" during memory retrieval (Kuhl et al., 2010;Halsband et al., 2012). Moreover, pattern completion, i.e., memory retrieval based on partial inputs, depends on intact dopamine function (Li et al., 2010). Additionally, presenting items associated with high (vs. low) rewards was found to engage the hippocampus and the dopaminergic midbrain during both encoding and retrieval (Wolosin et al., 2012). Together, these results suggest that previously rewarded objects may reinstate the reward context that was present during memory encoding, thus providing additional input which could enhance the memory retrieval process.
The present results may provide some insights concerning recent interesting findings by Mather and Schoeke (2011) showing that positive outcomes enhanced memory encoding also in subsequent trials. Specifically, a better than expected reward outcome (e.g., a positive feedback) may enable rewards to temporally extend their influence on memory formation by increasing expectations of rewards in subsequent trials. Notably, Mather and Schoeke (2011) found that this lingering effect of reward was significant for recognition memory but not when tested via free recall. It could be speculated that the absence of reward-related cues during free recall impeded memory retrieval processes by preventing the reinstatement of reward contexts. However, these results should be interpreted with caution due to the very low number of recalled items. More research is needed to further validate the lingering effects of reward on memory formation, and to disentangle the contribution of reward to memory encoding from its influence on retrieval processes.
Finally, most previous studies addressing the impact of reward on memory formation looked at item recognition memory or source memory. Here we show that anticipation and delivery of reward also enhances associative memory formation, thus extending previous results obtained via recognition memory paradigms. Altogether, these findings are in accordance with lesion studies suggesting that item recognition memory and associative memory depend on the same hippocampal neural circuitry (Stark and Squire, 2001;Stark et al., 2002), but see (Brown and Aggleton, 2001;Yonelinas et al., 2001).

Limitations
The present behavioral results provide some first insights about the involvement of specific neurotransmitters in reward's impact on memory formation. As mentioned previously, dopamine, acetylcholine, and norepinephrine have all been linked to enhanced memory encoding (Doya, 2002;Harley, 2004;Lisman and Grace, 2005;Tully and Bolshakov, 2010;Lisman et al., 2011;Mather et al., 2016). Thus, to determine the involvement of specific neurotransmitter systems in reward-related memory formation, the computational approach outlined in the present study needs to be combined with pharmacology and/or neuroimaging.
Another limitation of the present study can be derived from recent evidence indicating significant gender differences in learning and memory (Herlitz et al., 1997;Piefke et al., 2005;Pauls et al., 2013), reward sensitivity (Chian-Shang et al., 2007;Byrne and Worthy, 2015), reward-related decision making (Mather and Lighthall, 2012), and dopamine function (Cosgrove et al., 2007;Riccardi et al., 2011). Because participants in the present study were mostly females, and information regarding their menstrual cycle was not obtained, the impact of gender or sex hormones cannot be accounted for here.

CONCLUSION
Here we first use standard procedures, i.e., averaging memory performance across trials within conditions, to describe independent contributions to associative memory formation from reward delivery at short (20 min) and longer term (24 h), as well as from reward anticipation selectively at short term (20 min, but not when tested after 24 h). By contrast, a novel computational approach, which considers trial-bytrial fluctuations in the magnitude of prediction errors and expected values, revealed that reward delivery (as mediated by prediction error magnitude) and reward anticipation (as mediated by expected value magnitude) contribute to memory formation, both when tested after 20 min and after 24 h. Because this approach incorporates fundamental aspects of reward computations well-known to be reflected in dopamine neuron activity, it not only provides a more sensitive, but also a more accurate description of the link between reward, dopamine, and memory formation, than behavioral measures averaged across trials alone.
By clarifying how memory formation may be significantly modulated by distinct reward processing mechanisms and individual trait-dispositions, the present study also provides valuable insights for fields beyond basic neuroscience, such as for education or rehabilitation strategies. In particular, our findings highlight the importance of tailoring learning contexts on an individual basis, as well as the potential drawbacks when not doing so (as is the case in most of today's educational settings).

ETHICS STATEMENT
All participants provided written consent according to the ethical regulations of the Geneva University Hospital and the study was performed in accordance with the Declaration of Helsinki.

AUTHOR CONTRIBUTIONS
KA, JM, and SS were involved in all aspects of the study from initial idea to submission of paper.