Differential Effects of Valence and Encoding Strategy on Internal Source Memory and Judgments of Source: Exploring the Production and the Self-Reference Effect

Item memory studies show that emotional stimuli are associated with improved memory performance compared to neutral ones. However, emotion-related effects on source memory are less consistent. The current study probed how emotional valence and specific encoding conditions influence internal source memory performance and judgments of source (JOSs). In two independent experiments, participants were required to read silently/aloud (Experiment 1) or to perform self-reference/common judgments (Experiment 2) on a list of negative/neutral/positive words. They also performed immediate JOSs ratings for each word. The study phase was followed by a test phase in which participants performed old-new judgments. In Experiment 1, the production effect was replicated for item memory, but the effects of valence on item and source memory were not significant. In Experiment 2, self-referential processing effects on item and source memory differed as a function of valence. In both experiments, JOSs ratings were sensitive to valence and encoding conditions, although they were not predictive of objective memory performance. These findings demonstrate that the effects of valence on internal source memory and JOSs are modulated by encoding strategy. Thus, the way information is encoded can shed light on how emotion might enhance, impair or exert no influence on source memory.


Method

Data analysis
Multinomial models.
The first step was to aggregate all participants' responses in a 3 × 4 table, where rows correspond to the study conditions ('read in silence'; 'read aloud'; 'new') and columns correspond to participants' test responses ('read in silence'; 'read aloud'; 'read, but do not know if silently/aloud'; 'new'). This table was computed for each valence condition (see Tables S2 and S3). The model adopted here followed Leshikar and colleagues (2015), which relied on the proposal of Batchelder and Riefer (1990). Accordingly, the following parameters were considered: 'D', the probability of correctly recognizing studied stimuli irrespective of the encoding source; 'd', the probability of correctly recognizing the stimulus source given that the item was accurately identified as studied; 'b', the probability of correctly guessing that a previously studied item was old or of erroneously considering a new stimulus as old; 'g', the probability of guessing the stimulus source given that the item was already judged as old; and 'a', the probability of guessing the source given that the stimulus was correctly detected as a studied item. By imposing the constraint 'a' = 'g' (Batchelder & Riefer, 1990; Dodson, Holland, & Shimamura, 1998; Leshikar et al., 2015), eight parameters were estimated for each valence condition (two 'D', two 'd', three 'a', one 'b'), giving rise to a 24-parameter full model. Of note, two additional restrictions were imposed: all parameter values could only vary between 0.00000001 and 0.999999999 (Dodson, Prinzmetal, & Shimamura, 1998), and, as there were three 'a'/'g' parameters to estimate, their sum was constrained to one. Both parameter estimation and model fit were computed using the Excel Solver function following Dodson, Prinzmetal et al. (1998), which relies on maximum likelihood estimation and the likelihood-ratio statistic (G²).
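As an alternative to the Excel Solver routine, the fit statistic can be sketched in Python. The snippet below is a minimal illustration of the likelihood-ratio statistic G² = 2 Σ O ln(O/E) computed from observed and model-predicted frequencies; the function name and the frequencies shown are ours and purely hypothetical, not the data of this study.

```python
import math

def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)).

    `observed` are response frequencies from a source-monitoring table;
    `expected` are the frequencies predicted by the multinomial model.
    Cells with an observed count of zero contribute nothing to the sum.
    """
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

# Hypothetical example: a model that reproduces the data exactly yields G^2 = 0
counts = [40.0, 12.0, 8.0, 20.0]
print(g_squared(counts, counts))  # 0.0
```

Any discrepancy between the model's predictions and the data pushes G² above zero, which is then compared against the critical chi-square value.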
Additionally, the goodness of fit of different models was assessed by comparing G² statistics against a chi-square distribution (alpha = .05). After an initial parameter estimation for the full 24-parameter model, the goodness of fit of the general model was tested by adjusting the parameters until a satisfactory solution was found. Then, we compared the goodness of fit of different nested models in a pairwise fashion, contemplating item/source memory accuracy and item/source memory response bias. As stated by Dodson, Holland et al. (1998), the idea is to compare models in which the parameters can vary without restrictions with models in which specific parameters are constrained to be equal. If the model fit does not differ significantly between models, it might be the case that the parameters do not differ; if the model with free parameters fits better than the restricted one, it may suggest that the parameters are different.
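The nested-model comparison just described can be sketched as follows. The helper name `compare_nested` is ours, and the critical value must be supplied from a standard chi-square table; for illustration, the example reuses the restricted- and full-model G² values reported below for the Experiment 1 item-memory (positive vs. negative) comparison.

```python
def compare_nested(g2_restricted, g2_full, critical_value):
    """Likelihood-ratio test of a restricted model against the full model.

    The difference in G^2 between nested models is itself distributed as
    chi-square, with degrees of freedom equal to the number of parameters
    constrained to be equal. Returns the G^2 difference and whether the
    restriction is rejected (i.e., the free parameters likely differ).
    """
    delta_g2 = g2_restricted - g2_full
    return delta_g2, delta_g2 > critical_value

# Restricting two parameters (df = 2, critical chi-square = 5.99 at alpha = .05)
delta, rejected = compare_nested(20.06, 12.42, 5.99)
print(round(delta, 2), rejected)  # 7.64 True
```

A rejected restriction means the constrained model fits significantly worse, so the parameters are better treated as different.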

Goodman-Kruskal gamma correlation.
To compute gamma, the following elements were considered: (a) the number of correct "remember" predictions, by adding the cases with ratings between 4 and 6 that were later remembered; (b) the number of incorrect "remember" predictions, by adding the cases with ratings between 4 and 6 that were later forgotten; (c) the number of incorrect "forget" predictions, by adding the cases with ratings between 1 and 3 that were later remembered; (d) the number of correct "forget" predictions, by adding the cases with ratings between 1 and 3 that were actually forgotten. These frequencies were then inserted into the following formula: G = (ad − bc)/(ad + bc). However, it was not always possible to compute the formula, as some of the elements were equal to zero. To overcome this issue, an adjustment was employed as recommended by Snodgrass and Corwin (1988) and as adopted by previous studies (e.g., Bastin et al., 2012; Grainger, Williams, & Lind, 2016). More specifically, the value of 0.5 was added to each prediction frequency (a, b, c, d), and each result was then divided by the total number of judgments plus one (N + 1).
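The adjusted gamma computation can be sketched as follows; the function name is ours and the frequencies in the example are hypothetical.

```python
def adjusted_gamma(a, b, c, d):
    """Goodman-Kruskal gamma with the Snodgrass and Corwin (1988) adjustment.

    a: correct "remember" predictions, b: incorrect "remember" predictions,
    c: incorrect "forget" predictions, d: correct "forget" predictions.
    0.5 is added to each frequency and each is divided by N + 1, so gamma
    remains defined even when one of the raw frequencies is zero.
    """
    n = a + b + c + d
    a, b, c, d = ((x + 0.5) / (n + 1) for x in (a, b, c, d))
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical frequencies: a strong positive prediction-performance association
print(round(adjusted_gamma(10, 2, 3, 5), 3))  # 0.737
```

Note that the division by N + 1 cancels in the ratio, so the adjustment that actually changes gamma is the 0.5 added to each cell.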
Gamma correlations provide a measure of association between the predictions about which words will be later remembered or forgotten and the participant's actual performance. Large and positive gamma values are indicative of good metamnemonic resolution, whereas values equal to or below zero do not support an accurate relation between prediction and performance.

Results
The multinomial model results and specific statistical analyses performed on the response times and confidence ratings are presented for both Experiment 1 and Experiment 2.
The descriptive statistics are shown in Table S5.

Experiment 1
Multinomial model results. After running the Solver function on the 24-parameter model, it was possible to verify that the solution was not a good fit for the data, because the obtained G² value of 25.35 was above the critical chi-square value of 12.59 (considering six degrees of freedom). To obtain a G² value below 12.59, we kept most of the parameters yielded by the initial solution and changed the three 'a'/'g' parameters of only one valence condition. This alteration resulted in a G² of 12.42, which is below the critical chi-square value. All the parameter values are presented in Table S4. With this model, the ANOVA results were then tested.
In the case of item memory, when the positive and negative parameters were set to be equal, the G² was 20.06, which suggests that the parameters might be different after all, G²(2) = 7.64, p < .05. When the aloud and silence conditions were equated, the G² was 38.63, which supports the ANOVA result that words read aloud are better recognized than words read silently, G²(3) = 26.21, p < .05. In the case of source memory, and considering that no statistically significant differences emerged from the repeated-measures ANOVA, we set all the source 'd' parameters to be equal regardless of the experimental condition; the G² value obtained was 17.80. This result indicates that the parameters are not different, G²(5) = 5.38, p > .05, in good agreement with the ANOVA outcome. Overall, the multinomial-based results are consistent with the ANOVA results.

Incorrect source responses.
The 3 × 2 Friedman's ANOVA yielded a statistically significant result, χ²(5) = 17.25, p = .004. Specifically, Wilcoxon tests with Bonferroni corrections (p < .006) showed that the proportion of incorrect responses was higher for negative words read aloud than for negative words read silently. No other comparison reached the threshold of statistical significance (see Table S6).
Misses.
Only two comparisons survived the Wilcoxon tests with Bonferroni corrections (p < .006): the proportion of misses was higher for negative and positive words read silently during the study phase than for negative and positive words read aloud, respectively (see Table S6).

Correct rejections and corrected false alarm rates.
Regarding the proportion of correct rejections, no statistically significant difference emerged, χ²(2) = 5.93, p = .052.
Considering the proportion of corrected false alarm rates, the Friedman's ANOVA showed a statistically significant effect, χ²(2) = 6.17, p = .046. Nonetheless, when Wilcoxon tests with Bonferroni corrections (p < .017) were applied, none of the comparisons survived the correction. Taken together, no significant differences were observed between the responses to negative/neutral/positive new words during the recognition test.
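The Bonferroni-corrected thresholds used throughout (e.g., p < .017, p < .006) follow from dividing the nominal alpha by the number of pairwise comparisons; a minimal sketch, assuming the .017 threshold corresponds to three comparisons (.05/3) and the .006 threshold to nine (.05/9):

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons

# Three pairwise Wilcoxon tests after a one-factor (valence) Friedman's ANOVA
print(round(bonferroni_alpha(0.05, 3), 4))  # 0.0167
# Nine pairwise comparisons following a 3 x 2 Friedman's ANOVA
print(round(bonferroni_alpha(0.05, 9), 4))  # 0.0056
```

A comparison is retained only when its uncorrected p-value falls below the per-comparison threshold.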

Correct rejections.
In the case of confidence ratings for accurately rejected new words, a repeated-measures ANOVA run on the factor valence yielded a significant effect, F(2, 54) = 12.57, p < .001, η²p = .32, ɛ = .81, which showed that participants' confidence varied according to word valence.

Experiment 2

In general, the mean proportion of "yes" responses was higher in the case of common judgments compared to self-referential judgments (all p < .001), and it also differed according to word valence: negative < neutral < positive (all p < .01).

Multinomial model results. When modelling the results of this experiment based on the 24-parameter model, we came across the same problem reported in Experiment 1; that is, the solution found with Solver was not a good fit for the data, as the G² value of 44.39 was above the critical chi-square value of 12.59 (considering six degrees of freedom). In these circumstances, we applied the same strategy reported in Experiment 1 to achieve a better solution, ending up with a G² of 11.15, which is below the critical chi-square value.
All the parameter values are presented in Table S4. In the case of item memory, we started by testing whether the parameters of neutral and negative stimuli differed. For this, the negative and neutral parameters were set to be equal regardless of the source condition. The obtained G² value was 43.46, which supports the difference between neutral and negative words, G²(3) = 32.31, p < .05. To confirm the main effect of source (self-reference vs. common), the item parameters were set to be equal for each valence condition, which revealed a G² value of 71.23. Thus, the model that posits the source conditions as different is a better fit for the data, G²(3) = 60.08, p < .05, which is in accordance with the ANOVA results. Concerning the source memory accuracy results, specifically in the context of the self-referential condition, the 'd' parameter was set to be equal for both neutral and positive stimuli. The G² value was 11.85, which suggests that these parameters might not differ.

Incorrect source responses.
The proportion of incorrect source responses for positive words encoded in the common condition was higher than for neutral words in the common condition (see Table S6).
Do not know responses. The 3 × 2 Friedman's ANOVA was statistically significant, χ²(5) = 36.98, p < .001. Wilcoxon tests with Bonferroni corrections (p < .006) showed that neutral and positive words studied in the common condition received more 'do not know' responses than neutral and positive words in the self-referential condition, respectively (see Table S6).

Misses.
A statistically significant result was obtained with the Friedman's ANOVA, χ²(5) = 45.08, p < .001. In the case of words studied self-referentially, the proportion of misses was higher for negative words than for positive words.
Moreover, irrespective of valence, words encoded in the common condition presented a higher proportion of misses than words studied in the self-referential condition (see Table S6).
Correct rejections and corrected false alarm rates. In the case of correct rejections, the Friedman's ANOVA yielded a statistically significant result, χ²(2) = 9.60, p = .008.
During the test phase, participants correctly identified more new neutral words as 'new' when