Questioning the Role of Forward Associative Strength in False Memories: Evidence From Deese-Roediger-McDermott Lists With Three Critical Lures

We report an experiment examining the factors that produce false recognition in the Deese-Roediger-McDermott (DRM) paradigm. We selectively manipulated the probability that critical lures produce study items in free association, known as forward associative strength (FAS), while controlling the probability that study items produce critical lures in free association, known as backward associative strength (BAS). Results showed that false recognition of critical lures failed to differ between strong and weak FAS conditions. Follow-up correlational analyses further supported this outcome, showing that FAS was not correlated with false recognition, despite substantial variability in both variables across our stimulus sets. However, these correlational analyses did produce a significant and strong relationship between BAS and false recognition. These results support views that propose false memory is produced by activation spreading from study items to critical lures during encoding, which leads critical lures to be confused with episodically-experienced events.


INTRODUCTION
False memory has been studied extensively using the Deese/Roediger-McDermott paradigm (DRM; Deese, 1959; Roediger and McDermott, 1995). In this paradigm, people study a list of words that are all related to the same non-studied word, known as the critical lure. On a subsequent memory test, critical lures are often mistakenly recalled or recognized (e.g., Arndt and Beato, 2017; Pitarque et al., 2018; Huff et al., 2020; Beato and Arndt, 2021). While this paradigm produces robust false memory, there is substantial variability in the false recognition that occurs across DRM lists (e.g., Gallo and Roediger, 2002; Beato and Díez, 2011; Cadavid and Beato, 2017; Coane et al., 2021).
In order to understand this variability, many studies have focused on the roles played by the probability that list items produce the critical lure in free association (referred to as backward associative strength or BAS) and the probability that the critical lure produces list items in free association (referred to as forward associative strength or FAS). This work has shown that BAS influences false memory reliably (e.g., McEvoy et al., 1999; Roediger et al., 2001; Gallo and Roediger, 2002; Arndt, 2006; Arndt and Gould, 2006; Howe et al., 2009a; Knott et al., 2012). In contrast, prior work examining the effect of FAS on false memory has produced inconsistent results. Some studies did not find significant correlations between FAS and false recall/recognition (e.g., Roediger et al., 2001; Gallo and Roediger, 2002; Beato and Arndt, 2014), while other studies found correlations between FAS and false memory (e.g., Brainerd and Wright, 2005; Howe et al., 2009b; Arndt, 2012b, 2015). Brainerd and Wright (2005) argued that the lack of correlation found between FAS and false memory by, for example, Roediger et al. (2001) was due to the restricted range of FAS values used in their lists. Indeed, this same criticism applies to most studies showing that FAS fails to predict false memory (see Beato and Arndt, 2014 for an exception).
Beyond the empirical considerations highlighted above, the question of whether FAS impacts false memory is important theoretically. One class of theories, associative activation theories (Roediger et al., 2001; Howe et al., 2009b), posits that false memory in the DRM paradigm is caused by activation spreading from study items' representations in semantic memory to critical lures' representations. As a result, these theories propose that BAS, but not FAS, should impact false memory. In contrast, other theories suggest that featural similarity between study items and critical lures (Arndt and Hirshman, 1998; Brainerd et al., 2008; Arndt, 2012a; Brainerd et al., 2020) increases false memory. In the view of these theories, both BAS and FAS should impact false memory, because both variables can be interpreted to index the extent to which study items and critical lures share features. Thus, investigating whether FAS influences false memory will help to distinguish between theoretical views of false memory's genesis.
In the present study, we sought to determine whether FAS was related to false memory using lists that had substantial variability in FAS. In order to understand the unique role that FAS plays in producing false memories, it is important to separate the contributions of FAS and BAS to false memory, given that they are correlated (Brainerd et al., 2008). For this purpose, we built DRM lists that varied widely in FAS while controlling BAS, and we employed DRM lists that were related to multiple critical lures (e.g., Beato et al., 2012; Cadavid et al., 2012; Beato and Arndt, 2014). Constructing lists that were related to multiple critical lures allowed us to evaluate correlations between FAS and false recognition using both the variability due to list-wide characteristics (the approach used in prior work, where lists are related to a single critical lure) and the variability due to individual critical lures' characteristics as the basis for analyses. To illustrate, for the general theme "School," we constructed one list of associates (homework, work, school, book, class, and test) that had strong FAS with three critical lures (assignment, lesson, and study), and a second list of associates (school, book, work, boring, long, and test) that had weak FAS with three critical lures (essay, homework, and study).¹ Within a list that had a strong FAS-based relationship with its critical lures, there was variation in the summed FAS values of the three critical lures (e.g., assignment = 0.755, lesson = 0.294, and study = 0.341). Importantly, lists that had a weak FAS-based relationship with their critical lures also varied in the summed FAS values of the three critical lures (e.g., essay = 0.158, homework = 0.265, and study = 0.321).
The benefit of constructing lists this way is that it enabled us to examine the effects of study item characteristics and critical lure characteristics on false recognition separately. In contrast, the standard DRM paradigm, where each list of words is related to a single critical lure, only allows evaluation of these characteristics simultaneously, making it impossible to understand their unique impact on false memory. Thus, if the results of correlational analyses using study item and critical lure characteristics converge, it is appropriate to infer the variables correlated with false recognition reflect a general property of DRM lists, as well as the factors that drive false memory. On the other hand, if the results of correlational analyses produce different results using study item and critical lure characteristics as the unit of analysis, one may have less confidence that they reflect a general property of the factors that drive false memory, and instead may reflect specific item characteristics. As a consequence, factors that show the same effects for both sets of analyses are more likely to be key drivers of false memory effects, making them carry greater theoretical importance.

Participants
The sample comprised 40 native English speakers (70% female) who participated as part of a course research appreciation requirement. Participants' ages ranged from 18 to 20 years (M age = 18.60; SD = 0.78).

Materials
A total of 28 six-word DRM lists were constructed as stimuli (see Table 1). Lists were built to ensure that three critical lures produced the same six study items in free association (i.e., they were related via FAS) based upon the University of South Florida free-association norms (Nelson et al., 1998). This list length was chosen because it allowed us the best opportunity to construct lists with multiple critical lures that manipulate FAS while controlling BAS across levels of FAS. While there is a tendency for critical lure false alarms to be lower with shorter lists, studying six associates of critical lures produces robust false memory (e.g., Robinson and Roediger, 1997).
¹ The astute reader will notice that some words are repeated across the two sets of critical lures and study items. As is clarified in the method section, participants studied either the strong or weak FAS lists for a given general theme. Thus, the overlap in specific stimulus items is not problematic for assessing the effects of FAS on false recognition.
The FAS values for each critical lure word (lure word FAS hereafter) were determined by the sum of the probabilities that each critical lure produced its six associated words in free association. Similarly, the FAS values for each list (FAS list strength hereafter) were calculated as the sum of the FAS values for the three critical lures (similar to Robinson and Roediger, 1997; Beato and Díez, 2011; Beato and Arndt, 2014). BAS values, measured as the probability that study items produced critical lures in free association, were similarly calculated for both critical lures and lists.
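The summation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `norms` dictionary, its probabilities, the placeholder words, and all function names are hypothetical and are not taken from the Nelson et al. (1998) norms. Summing BAS in the same way as FAS is also an assumption, based on the statement that BAS was "similarly calculated."

```python
# Hypothetical free-association probabilities: norms[cue][target] is the
# probability that `cue` produces `target` in free association.
# All words and numbers are invented for illustration only.
norms = {
    "lure_a": {"w1": 0.20, "w2": 0.10, "w3": 0.05},
    "lure_b": {"w1": 0.02, "w2": 0.30},
    "w1": {"lure_a": 0.15},
    "w2": {"lure_a": 0.05, "lure_b": 0.10},
}


def assoc(cue, target):
    """Association probability from cue to target (0.0 if not in the norms)."""
    return norms.get(cue, {}).get(target, 0.0)


def lure_fas(lure, study_items):
    """Lure word FAS: summed probability that the lure produces each study item."""
    return sum(assoc(lure, item) for item in study_items)


def lure_bas(lure, study_items):
    """Lure-level BAS (assumed to be a sum): study items producing the lure."""
    return sum(assoc(item, lure) for item in study_items)


def list_strength(lures, study_items, lure_measure):
    """List-level strength: sum of a lure-level measure over the list's lures."""
    return sum(lure_measure(lure, study_items) for lure in lures)


items = ["w1", "w2", "w3"]
print(lure_fas("lure_a", items))
print(list_strength(["lure_a", "lure_b"], items, lure_fas))
```

Passing the lure-level function as an argument keeps the list-level aggregation identical for FAS and BAS, which mirrors the parallel description of the two measures in the text.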
There were 14 "general themes" in the lists (e.g., music). For each theme, we built two different six-word study lists. One of the two study lists per general theme included six associates that had relatively stronger FAS relations to critical lures (high-FAS lists hereafter) and the other included six associates that had relatively weaker FAS relations to critical lures (low-FAS lists hereafter). For example, for the general theme "music" the high-FAS critical lures were CLARINET (FAS = 0. Further, there was wide variability in FAS, such that FAS list strength ranged from 0.45 to 2.30 and lure word FAS ranged from 0.127 to 0.865. Finally, we constructed lists in a way that controlled mean levels of BAS across the stimulus sets that varied in FAS (M = 0.34 and M = 0.29, for the high- and low-FAS lists, respectively), t(26) = 0.37; p > 0.05, although BAS still varied substantially across stimulus sets and for the critical lures within a stimulus set. Table 2 reports FAS and BAS values per critical lure.
Finally, we built five additional six-word lists, each with three critical lures (see Table 3), to be used as unrelated distractors and unrelated critical-lure distractors on the memory tests. Distractor lists were constructed to ensure that they were associatively unrelated to study items (Nelson et al., 1998) following a procedure similar to that used to construct study lists, such that FAS list strength ranged from 0.93 to 1.46. The recognition memory test included 168 words randomly intermixed: the 84 studied words, the 42 critical lures, and 42 distractors (15 unrelated critical-lure distractors, 27 unrelated distractors).

Procedure
First, participants were informed about the nature and procedure of the study and signed a consent form. Participants were tested individually and were instructed that their task was to remember the words as best they could, because they would be given a memory test later in the experiment. Each participant was presented with the study items from 14 lists, one list per general theme (seven high-FAS lists and seven low-FAS lists). General themes' high-FAS lists and low-FAS lists were presented equally often across participants. Further, we confirmed that no associates or critical lures were repeated within the stimuli experienced by a participant. Study items were presented individually on a computer screen for 2,000 ms with a 500-ms ISI blocked by DRM list. The associates within each list were arranged in decreasing order of FAS. The order of list presentation was randomized for each participant. At the conclusion of the study phase, participants completed a self-paced recognition memory test, where they were asked to determine whether each word was previously studied by pressing the "O" key to indicate it was OLD, and the "N" key to indicate it was NEW.
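One way to realize the counterbalancing and randomization described above is sketched below. The text does not specify how themes were assigned to FAS levels, so the two-version scheme used here (each version swaps which half of the themes appears as high-FAS) is an assumption, and `build_study_order` is a hypothetical helper name.

```python
import random

N_THEMES = 14  # one study list per general theme


def build_study_order(version, rng):
    """Return (theme, fas_level) pairs in a randomized list order.

    Assumed scheme: in version 0, themes 0-6 are high-FAS and themes 7-13
    are low-FAS; version 1 reverses the assignment, so each theme appears
    equally often at each FAS level across the two versions.
    """
    half = N_THEMES // 2
    lists = []
    for theme in range(N_THEMES):
        in_first_half = theme < half
        is_high = in_first_half if version == 0 else not in_first_half
        lists.append((theme, "high" if is_high else "low"))
    rng.shuffle(lists)  # list order is randomized for each participant
    return lists


# Example: alternate versions across participants, with a fresh random order.
for participant in range(4):
    rng = random.Random(participant)
    print(participant, build_study_order(participant % 2, rng))
```

Within each list, the six associates would then be presented in decreasing order of FAS, as stated in the procedure; that step is omitted here because it depends on the item-level norms.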

Power Analysis
We evaluated the power to detect the effects of FAS on false recognition using the three strategies highlighted above using G*Power (Faul et al., 2007, 2009). When participants were used as the unit of analysis, our sample size of 40 was sufficient to detect a large effect (dz = 0.5), with power = 0.869. However, the power to detect a medium-sized effect (dz = 0.3) was considerably smaller, 0.457. When study lists (N = 28) were used as the unit of analysis for correlations, the power to detect a large effect (ρ = 0.5) was 0.799, which is near the conventionally-preferred level of 0.80 (Cohen, 1988). However, the power to detect a medium-sized effect (ρ = 0.3) was considerably lower, 0.348. Finally, when critical lures (N = 84) were used as the unit of analysis for correlations, the power to detect a large effect (ρ = 0.5) and a medium-sized effect (ρ = 0.3) were both sufficient by conventional standards, with power of 0.999 and 0.800, respectively.
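The correlation power values above can be approximated with the Fisher z transformation, as in the sketch below. This is not the G*Power routine: it is a normal approximation, it assumes a two-tailed test with α = 0.05 (neither is stated in the text), and `correlation_power` is a hypothetical helper. Its output should therefore land close to, but not exactly on, the reported values.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(rho, n, alpha=0.05):
    """Approximate two-tailed power for testing H0: rho = 0.

    Uses Fisher's z: atanh(r) is roughly normal with mean atanh(rho)
    and standard error 1 / sqrt(n - 3).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = atanh(rho) * sqrt(n - 3)  # expected test statistic under H1
    upper = 1 - nd.cdf(z_crit - shift)  # rejections in the predicted direction
    lower = nd.cdf(-z_crit - shift)  # rejections in the opposite direction
    return upper + lower


for n in (28, 84):  # study lists, critical lures
    for rho in (0.3, 0.5):
        print(f"N = {n}, rho = {rho}: power ~ {correlation_power(rho, n):.3f}")
```

The approximation makes the pattern in the text easy to see: with N = 28, only large correlations are detected reliably, whereas N = 84 gives adequate power for medium-sized correlations as well.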

Data Analysis
Given the relatively modest levels of power to detect medium-sized effects in most of our analyses, we chose to analyze our data using both standard null-hypothesis tests and Bayesian analysis (Kass and Raftery, 1995), which allows quantification of the strength of the evidence for the null and alternative hypotheses. We conducted standard analyses that treat FAS as a categorical variable (e.g., t-tests) and as a continuous variable (correlation) to assess its relationship to false recognition. We also conducted correlational analyses on a variety of other characteristics of our stimuli (Nelson et al., 1998) to evaluate how well semantic memory variables other than FAS predicted false recognition (see, e.g., Brainerd et al., 2008). As highlighted above, these analyses were conducted using study lists' characteristics as the unit of analysis and using critical lures' characteristics as the unit of analysis. Finally, we conducted Bayesian analyses using JASP (Version 0.14.1; JASP Team, 2020) to quantify the strength of the evidence for the observed statistical outcomes using BF10. A BF10 > 1 supports the alternative hypothesis, and a BF10 < 1 supports the null hypothesis. Importantly, a BF10 between 3 and 20 signifies positive evidence for the alternative hypothesis, a BF10 between 20 and 150 signifies strong evidence for the alternative hypothesis, and a BF10 greater than 150 signifies very strong evidence for the alternative hypothesis (Kass and Raftery, 1995). Similarly, a BF10 between 0.33 and 0.05 signifies positive evidence for the null hypothesis, a BF10 between 0.05 and 0.0067 signifies strong evidence for the null hypothesis, and a BF10 below 0.0067 signifies very strong evidence for the null hypothesis.
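The evidence categories just described can be written as a small lookup, sketched below. The thresholds are the ones given in the text. The function name, the "inconclusive" label for values between 0.33 and 3, and the handling of values that fall exactly on a boundary are assumptions made for illustration.

```python
def interpret_bf10(bf10):
    """Map a Bayes factor (BF10) onto the evidence categories in the text."""
    if bf10 <= 0:
        raise ValueError("BF10 must be positive")
    if bf10 > 150:
        return "very strong evidence for the alternative"
    if bf10 > 20:
        return "strong evidence for the alternative"
    if bf10 > 3:
        return "positive evidence for the alternative"
    if bf10 >= 0.33:
        return "inconclusive"  # assumed label; not named in the text
    if bf10 >= 0.05:
        return "positive evidence for the null"
    if bf10 >= 0.0067:
        return "strong evidence for the null"
    return "very strong evidence for the null"


# Example inputs spanning the categories
for value in (0.206, 0.507, 95.27, 152.01):
    print(value, "->", interpret_bf10(value))
```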
Table 2 reports the mean percentage of true recognition per list and false recognition per critical lure and list, while Table 4 reports the mean percentage of true and false recognition as a function of FAS and whether items were studied, critical lures, unrelated critical-lure distractors, or unrelated distractors.

False Memory Effect
A one-way repeated-measures analysis of variance (ANOVA) was used to compare the percentage of old judgments to studied words, critical lures, unrelated critical-lure distractors, and unrelated distractors. This analysis revealed a significant difference, F(3, 117) = 521.535; p < 0.001, partial η² = 0.930. Bonferroni post-hoc tests showed that hits to studied words (true recognition; M = 73.48, SD = 13.06) were higher than false alarms to critical lures (false recognition; M = 19.35, SD = 10.64), unrelated critical-lure distractors (M = 5.67, SD = 7.78) and unrelated distractors (M = 5.92, SD = 7.45; p < 0.001 for all comparisons). There were also significant differences between false alarms to critical lures and both unrelated critical-lure distractor and unrelated distractor items (p < 0.001). There was not a reliable difference between the two types of unrelated distractors (p > 0.05). Thus, the stimuli we constructed for this study produced the typical DRM false memory effect.

True Recognition, False Recognition, and FAS
The percentage of hits (true recognition) and false alarms to critical lures (false recognition) as a function of FAS is presented in Table 4. FAS did not impact hits, t(39) = 0.630; p = 0.532, d = 0.09, BF10 = 0.206, or false alarms to critical lures, t(39) = 0.868; p = 0.391, d = 0.13, BF10 = 0.242. The BF10 values for hits and false alarms to critical lures indicate positive support for the conclusion that FAS failed to impact true and false recognition.
We also examined the relationship between FAS and false recognition using correlation. Our DRM lists included three critical lures per list, which allowed us to correlate FAS list strength and lure word FAS with false recognition separately. Neither of these analyses produced a significant correlation [r(26) = −0.026, p = 0.895, BF10 = 0.237 for FAS list strength; r(82) = 0.047, p = 0.668, BF10 = 0.149 for lure word FAS]. The BF10 values for these correlations again indicate positive evidence that FAS was unrelated to false recognition. It is important to note that these null correlations occurred despite there being substantial variability in false recognition across lists and critical lures. For example, some high-FAS lists yielded high levels of false recognition, such as the list with the critical lures BURGLAR, THEFT, and THIEF (45%), whereas other high-FAS lists produced very low levels of false recognition (e.g., SOCCER, SOFTBALL, and VOLLEYBALL list, 3%). In low-FAS lists, we also found wide differences in false recognition, ranging between 37% (DEAD, DEATH, and DIE list) and 5% (COMMENT, REMARK, and SUGGEST list).
Although FAS was unrelated to false recognition, we sought to explore whether other stimulus characteristics were related to false recognition. Thus, we correlated the characteristics of the study words included in the lists with the overall level of false recognition produced by that list (i.e., averaged across the three critical lures). The variables examined in these analyses were BAS, interconnectivity among the associates included in the lists (sum of the FAS and BAS of all possible pairings of study items and critical lures), associates' set size, associates' concreteness, the mean connectivity among lists' associates, the probability of a resonant connection, and resonant connection strength (Nelson et al., 1998). The only reliable correlation found in these analyses was between false recognition and BAS, r(26) = 0.643, p < 0.001, BF10 = 152.01, indicating very strong evidence that false recognition and BAS were related.
We also conducted correlational analyses between false recognition and critical lure characteristics. We computed each critical lure's average BAS with study items, frequency of occurrence, concreteness, set size (i.e., number of different words produced by a critical lure), density (i.e., mean connectivity among all critical lure associates), accessibility index (i.e., number of words that produced the critical lure as a response), resonant connections (i.e., number of the critical lure's associates that produced it as an associate), and resonant strength (i.e., associative strength from all the words produced by the critical lure to the critical lure; Nelson et al., 1998).
This analysis showed there were significant correlations between false recognition and critical lures' BAS, r(82) = 0.662, p < 0.001, BF10 > 1,422,000,000, frequency, r(82) = 0.388, p < 0.001, BF10 = 95.27, resonant connections, r(82) = 0.463, p < 0.001, BF10 = 2099.64, resonant strength, r(82) = 0.470, p < 0.001, BF10 = 2982.68, and accessibility, r(82) = 0.479, p < 0.001, BF10 = 4735.24. For each correlation, BF10 indicated strong or very strong evidence that each variable was positively related to false recognition.

DISCUSSION
The empirical and theoretical aim of this research was to analyze the effect of FAS on false recognition. In order to do this, we constructed stimulus sets that varied widely in FAS. The results of this study showed that false recognition was robust. Moreover, there was wide variability in false recognition rates per list, ranging from 3 to 45%. Thus, there was substantial variability in both false recognition and FAS, which is critical for assessing if there is a relationship between FAS and false recognition.
Despite empirical conditions that were conducive to observing a relationship between FAS and false recognition, no such relationship was found. This finding replicates previous research that has failed to find a relationship between FAS and false recognition (e.g., Roediger et al., 2001; Gallo and Roediger, 2002; Beato and Arndt, 2014), but stands in contrast to research that has found a reliable relationship between FAS and false recognition (e.g., Brainerd and Wright, 2005; Arndt, 2012b, 2015). Importantly, interpretation of the present results is not complicated by restricted range in FAS, a concern that has been advanced to explain the finding of Roediger et al. (2001) that FAS was not correlated with false recognition (see Brainerd and Wright, 2005). Finally, the present results extend prior findings of a null correlation between FAS and false recognition to DRM lists related to multiple critical lures.
Although FAS failed to predict false memory, our correlational analyses produced several notable results. Most importantly, BAS was associated with false recognition. This association is particularly notable because we sought to control the mean levels of this variable across the high- and low-FAS conditions. Despite this constraint, BAS was strongly correlated with false recognition, replicating extensive evidence that BAS is a reliable predictor of false memory (e.g., Roediger et al., 2001; Gallo and Roediger, 2002; Arndt, 2012b, 2015; Beato and Arndt, 2017). Beyond BAS, our correlational analyses found that the factors that were correlated with greater false recognition generally measured the extent to which a critical lure is activated by the study of its associates, such as resonant connections and resonant strength. Thus, these measures may reflect, like BAS, how active a critical lure's representation is following study of its associates (Roediger et al., 2001).
At a theoretical level, the present results fit most naturally with associative activation views of false memory (Roediger et al., 2001; Howe et al., 2009b). These views posit that spreading activation from study item representations to critical lure representations plays a key role in producing false memory. Two cardinal predictions from these theories are upheld by the present data. First, the primary driver of false memory is the extent to which study items activate lure representations in semantic memory, and thus the extent to which lure items can be confused with episodically-experienced items. This activation is most directly measured by BAS in word association norms. Second, associative variables that are unrelated to how much study items activate critical lures' representations, such as FAS, will not affect false memory. Both of these predictions were supported in the present study, despite the fact that we implemented a strong manipulation of FAS between lists and sought to control BAS across levels of that manipulation.
In addition to favoring associative-activation theories of false memory, the present results are puzzling from the perspective of theories highlighting the role that the similarity between study items and critical lures in semantic memory plays in producing false memory (Arndt and Hirshman, 1998; Brainerd et al., 2008; Arndt, 2012a; Brainerd et al., 2020). In particular, these views suggest that false memory increases with the similarity between study items and lure items, as well as the extent to which study lists' gist is encoded during study (Brainerd et al., 2020). Thus, FAS, BAS, and other measures of semantic memory activation should increase critical lure false memory. In contrast to this expectation, FAS failed to produce differences in false memory in this study, despite our intentional and substantial manipulation of this variable. In addition, FAS failed to correlate with false recognition, both when measured based upon study list characteristics and when measured based upon critical lure characteristics. Finally, in our analysis of list-wide semantic memory variables with false recognition as well as critical lures' semantic memory characteristics, the only correlation we found in both sets of analyses was between BAS and false recognition.
One set of outcomes from the present study may be taken as partial evidence favoring the view that similarity among study items enhances gist encoding, which is hypothesized to play a role in false memory production (Brainerd et al., 2008, 2020). In our analyses of semantic memory variables associated with false memory, several semantic memory variables, such as critical lures' word frequency, resonant connections, resonant strength, and accessibility, were all correlated with false recognition. While this broader set of semantic memory variables associating with false recognition is consistent with general semantic memory activation underlying false recognition (Brainerd et al., 2008), it is critically important that other key semantic variables, such as connectivity, failed to correlate with false recognition (Brainerd et al., 2020).⁵ Indeed, it has been suggested that connectivity can serve as a proxy measure for a study list's gist, since it assesses interrelationships among studied items, which can be viewed as assessing, in part, the semantic relationships among studied items that are thought to underlie a study list's overall gist (Brainerd et al., 2020). Thus, while views proposing that non-associative semantic memory activation underlies false memory are consistent with some aspects of the present data, the correlations observed in our results were (1) not as wide-ranging as would be expected if semantic memory activation is the primary basis for false memory and (2) not reliable for key variables thought to be proxy measures of gist processing during encoding.
In closing, we wish to emphasize four key points. First, we failed to observe a correlation between FAS and false recognition, despite using conditions that provide an excellent opportunity for such a relationship to be found. Second, we observed a positive relationship between BAS and false recognition, despite not directly attempting to manipulate BAS in this study. Third, both of these results occurred when we assessed the relationship between list-wide associative strength and false recognition as well as when we assessed the relationship at the level of individual critical lures. Importantly, because the FAS and BAS results occurred regardless of the method we used to calculate FAS, BAS, and false recognition, the relationships we observed in this study appear to be products of the nature of the associations between study lists and critical lures. Fourth, and finally, these results favor activation-based explanations of false memory (Roediger et al., 2001; Howe et al., 2009b) over similarity-based explanations (Arndt and Hirshman, 1998; Brainerd et al., 2008; Arndt, 2012a; Brainerd et al., 2020).
Thus, these results best support the view that study items in the DRM paradigm activate critical lures' representations during encoding, which leads critical lures to be falsely recognized on a subsequent memory test.
⁵ Connectivity was not correlated with false recognition when study lists were used as the unit of analysis [r(26) = 0.025, p = 0.901, BF10 = 0.236] and was negatively correlated with false recognition when critical lures were used as the unit of analysis, albeit not significantly so [r(82) = −0.180, p = 0.102, BF10 = 0.507]. While BF10 for this latter correlation falls in the range where it fails to provide support for the null hypothesis, evaluating the statistical hypothesis that connectivity and false recognition were positively correlated, as gist-based perspectives predict, produces a p value of 0.949 and a BF10 of 0.054, which falls in the range of positive evidence that the two variables are unrelated, and is close to the range where the evidence is considered "strong."

DATA AVAILABILITY STATEMENT
The data collected for this study will be made available by the authors on request.

ETHICS STATEMENT
The research reported in this paper was reviewed and approved by the Middlebury College Institutional Review Board. The participants provided their written informed consent to participate in this study.