Richer concepts are better remembered: number of features effects in free recall

Many models of memory build in a term for encoding variability, the observation that there can be variability in the richness or extensiveness of processing at encoding, and that this variability has consequences for retrieval. In four experiments, we tested the expectation that encoding variability could be driven by the properties of the to-be-remembered item. Specifically, we tested whether concepts associated with more semantic features would be better remembered than concepts associated with fewer semantic features. Using feature listing norms, we selected sets of items for which people tend to list higher numbers of features (high NoF) and items for which people tend to list lower numbers of features (low NoF). Results showed more accurate free recall for high NoF concepts than for low NoF concepts in expected memory tasks (Experiments 1–3) and also in an unexpected memory task (Experiment 4). This effect was not the result of associative chaining between study items (Experiment 3), and can be attributed to the amount of item-specific processing that occurs at study (Experiment 4). These results provide evidence that stimulus-specific differences in processing at encoding have consequences for explicit memory retrieval.

Words vary on a large number of lexical dimensions that characterize factors such as their frequency of usage, or that refer to structural characteristics such as shape (orthography) and sound (phonology). Words, rather helpfully, also vary in meaning, and this variability can be captured by numerous semantic dimensions that influence the speed with which words can be recognized or categorized (Pexman et al., 2008). A vast word recognition literature has sought to characterize how orthographic, phonologic, and semantic dimensions interactively contribute to our ability to read. Consistently, researchers have shown that the variability of a given word along any or all of these dimensions is an important determinant in how that word is processed, manifesting in differences in reading times and accuracy (Yap and Balota, 2009). Words are also convenient stimuli for experiments, and are often utilized in memory research as they offer a well-defined minimal unit that can easily serve in recognition and free recall memory paradigms. This raises an interesting question: we know that there are many characteristics of individual words that shape how those words are processed, but do these item-specific differences influence subsequent memory when words are used as stimuli?
One approach to characterizing these effects is also one of the most influential frameworks in human memory research. The levels of processing framework proposed by Craik and Lockhart (1972) provided a number of important ideas, including the assertion that deeper processing at encoding leads to more accurate recollection at retrieval. In later work the framework was refined in a number of ways, and depth of processing was distinguished from another important type of encoding: elaboration. While depth of processing refers to the fact that some domains of processing typically involve richer or more extensive processing than others, elaboration has been characterized as "richness or extensiveness of processing within each qualitative type (of processing)" (Lockhart and Craik, 1990, p. 100). That is, within a particular type or domain of processing (e.g., semantic processing) there is variability in processing richness and this variability has consequences for memory. Numerous studies of semantic elaboration showed that free recall could be influenced by manipulating the encoding conditions applied to the to-be-remembered items (e.g., Craik and Tulving, 1975; Klein and Saltz, 1976; Ritchey and Beal, 1980; Ross, 1981; Hashtroudi, 1983) and, importantly for the present discussion, by the variability in semantic elaboration prompted by the characteristics of the to-be-remembered items themselves (Seamon and Murray, 1976). This revised emphasis on elaboration helped to shift the levels of processing framework away from a focus on the depth of processing per se and toward a focus on how qualitatively distinct encoding operations influence memory. This shift was important, as the levels of processing framework was criticized for being underspecified (Morris et al., 1977) or worse, inherently circular (Nelson, 1977). However, despite this advancing construal of levels of processing, researchers continued to struggle with implementing the framework within a computational model (Eich, 1985; cf. Craik and Lockhart, 1986).
Researchers still show great interest in characterizing how variability in processing during encoding can influence subsequent memory. Indeed, the primary assertion of semantic elaboration (that the relative amount of processing within a single domain should predict subsequent memory) finds a more clearly specified counterpart in the construct of encoding variability.¹ Similar to elaboration, encoding variability captures the idea that variability in how items are processed will lead to differences in memory strength across items. This intuitive assumption has been implemented in models of recognition memory in order to account for the observation that studied items vary more in memory strength than new items (Hintzman, 1986; cf. Koen and Yonelinas, 2010). It has also been used to interpret the observation that brain-based changes at encoding predict the subsequent recall of items; for example, item-wise variability in hippocampal gamma oscillations predicts the likelihood of successful free recall (Sederberg et al., 2007). Encoding variability can also be implemented in models of free recall (Sederberg et al., 2008), offering a level of specification that the elaboration account lacks.
Both semantic elaboration and encoding variability literatures make the prediction that processing differences at encoding will lead to subsequent effects in free recall. However, neither has given much attention to potential differences in processing that are spontaneously elicited by the lexico-semantic characteristics of to-be-studied items. This is an important point; words are known to vary on a large number of lexico-semantic dimensions, and to the extent that this variability automatically shapes the processing of these items, both semantic elaboration and encoding variability accounts would predict subsequent effects in free recall.
In related research, Nelson and colleagues have investigated how the associative relationships between words can influence memory performance for individual words. In natural language usage words are produced in structured sentences that lead them to become entangled with one another. Nelson and colleagues captured these associative relationships by asking a large number of participants to list the first word that comes to mind in relation to a presented target word (Nelson et al., 1998). Using this database, Nelson and colleagues documented effects of words' Number of Associates (NoA; also known as associative set size) in a variety of memory tasks. Compared to words with many associates, words with fewer associates are more likely to be successfully retrieved during cued recall; however, manipulating NoA did not influence free recall performance (Nelson and Schreiber, 1992). That NoA influences cued but not free recall suggests that the influence of lexico-semantic variables on memory performance is likely task-specific. The concreteness variable shows a different pattern across tasks: relative to abstract words (e.g., VIRTUE), concrete words (e.g., CAT) show more accurate performance in cued recall, free recall and recognition memory tasks (Paivio and Csapo, 1973; Nelson and Schreiber, 1992; Hamilton and Rajaram, 2001). In visual word recognition, there have been repeated demonstrations that the effects of item-specific relative semantic richness are multidimensional, leading variables like NoA and concreteness to dissociate across different visual word recognition tasks (Pexman et al., 2008; Yap et al., 2011). While it is not surprising to observe similar dissociations in a task as unconstrained as free recall, the potential for lexico-semantic variables to selectively influence different memory tasks highlights the importance of properly balanced stimulus sets.
Seamon and Murray (1976) manipulated subjectively rated meaningfulness, which uses a Likert-type scale to measure the extent to which participants feel that a word arouses other associated words (with more words leading to higher values; Toglia and Battig, 1978). Unfortunately, it is unclear what information participants use when placing words on a dimension of meaningfulness, and this variable shows significant correlations with other subjectively rated variables such as familiarity, imageability, and concreteness. Indeed, in the Seamon and Murray study the high meaningful words were also high on ratings of imagery and concreteness. Because of the difficulty in operationalizing meaningfulness, it is unknown whether this manipulation is fine-grained enough to test theories of elaboration, since high meaningful words may differ on any number of dimensions from low meaningful words.
¹ The term encoding variability also refers to a class of phenomena in the spacing effect literature in which encountering an item in numerous (or variable) contexts confers a memory advantage relative to items encountered in a single context (e.g., Waters and McAlaster, 1983). Here, we use the term solely to refer to variability in memory strength in the sense that some items are encoded very well, and this influences subsequent memory performance. Our items are balanced with respect to their normative distribution across textual contexts (Brysbaert and New, 2009).
The goal of the current study was to investigate item-specific encoding variability in a more precise fashion than in previous studies, by investigating number-of-features (NoF) effects in free recall.
NoF refers to the number of semantic features that participants list for different concepts in a feature-listing task (Pexman et al., 2002). The features listed for different concepts are considered "verbal proxies for packets of knowledge" (McRae, 2005, p. 42), rather than veridical descriptions of semantic memory. As they generate features, participants access representations derived from their experience with the concepts. McRae and colleagues (2005) published feature norms for 541 concrete concepts. For instance, for the concept cow the normative features include perceptual characteristics like has four legs, has an udder, and is smelly. Other features describe behaviors, like eats grass, and moos. Some of the features describe the concept's function, like produces milk, or its context, as in lives on farms. There is variability in the number of features listed for different concepts (e.g., 20 for couch, 23 for cougar, 11 for table, 9 for leopard) and this variability is related to responding in word recognition tasks (lexical decision, semantic categorization), such that responses are faster and more accurate for words with many features than for words with few features, even when other variables, like word length, frequency, typicality, and concreteness, are controlled (Pexman et al., 2002, 2003, 2008; Grondin et al., 2009; Yap et al., 2011). The processing advantage observed for high NoF words has been attributed to greater semantic activation for high NoF concepts (Pexman et al., 2003).
To date, NoF effects have been examined only in visual word recognition tasks. In the present study we investigated whether NoF effects can be observed in free recall. Compared to past investigations that manipulated meaningfulness and concreteness, the relative transparency with which the NoF variable is defined allowed us to test for fine-grained effects of item-specific encoding variability in memory performance. Given the nature of these effects as outlined above, one would expect that the enriched encoding afforded by high NoF words would lead to more accurate recall. Of course, given the narrow definition of semantic richness captured by NoF, it was also possible that the difference between high and low NoF words would be too subtle to influence memory accuracy. To investigate these possibilities we chose free recall because an extensive literature shows that this task produces effects of another stimulus-specific property: concreteness (Dukes and Bastian, 1966; Paivio and Csapo, 1973; Nelson and Schreiber, 1992; Paivio et al., 1994; Ruiz-Vargas et al., 1996; Hamilton and Rajaram, 2001; ter Doest and Semin, 2005), and we modeled our procedure after the most recent of these studies. To be clear, however, we investigated NoF effects for sets of items for which concreteness, word frequency, familiarity, and contextual diversity were controlled, so any memory effects observed for NoF could be interpreted as incremental to those of each of these other factors. In Experiments 1 and 2 we tested for fine-grained effects of item-specific encoding variability by investigating whether NoF effects can be observed in free recall. In Experiments 3 and 4 we further explored the mechanisms for those effects by investigating whether NoF effects are the result of associative chaining among items rather than superior recall for individual items (Experiment 3) and by investigating whether NoF effects emerge during the incidental encoding of to-be-remembered items in a lexical decision task (LDT; Experiment 4).

Frontiers in Human Neuroscience www.frontiersin.org

EXPERIMENT 1

Participants
Participants in Experiment 1 were 30 undergraduate students at the University of Calgary. In all of the experiments reported in this paper, participants reported that English was their first language, had normal or corrected-to-normal vision, and received course credit for participation.

Materials
The stimuli for Experiment 1 were 30 low NoF words and 30 high NoF words selected from the McRae et al. (2005) norms (Table A1). The selected word sets differed significantly in NoF (p < 0.001) but were matched for printed frequency, contextual diversity (Brysbaert and New, 2009), familiarity, printed length, orthographic neighborhood size (Coltheart et al., 1977), and concreteness (see Table 1). As a result of this matching, differences between the low NoF and high NoF words on each of these dimensions were non-significant at p > 0.10.² We obtained concreteness values for 55 of the items from the MRC database (Wilson, 1988), and collected concreteness ratings for the five remaining items from a separate group of 31 participants.

Procedure
There were three components in a testing session: (1) a study phase, (2) a distraction phase, and (3) a recall phase. On each trial in the study phase, a word was presented in the center of a 17" monitor controlled by a Macintosh G3 computer using PsyScope (Cohen et al., 1993). Each word was presented for 2 s, followed by 3 s of blank screen before presentation of the next word (ter Doest and Semin, 2005). A total of 60 words were presented for study, in a different random order to each participant. Participants were asked to memorize the words for a later recall test. In the distraction phase, participants were asked to complete two unrelated tasks on the computer: a semantic categorization task and a ratings task, both with word stimuli. The time taken to complete the distraction tasks was 9 min. In the recall phase, participants were presented with a blank computer screen and were asked to try to remember the studied words, typing in each word they recalled. Participants were given 4 min to complete the recall phase but could request more time (Hamilton and Rajaram, 2001). None of the participants requested additional time.
Coding procedures for recall responses were adopted from those used in previous studies (e.g., ter Doest and Semin, 2005). Responses were judged correct if they were identical to, or were inflectional or misspelled variants of words on the study list (e.g., we accepted shelf for shelves, and plyers for pliers). Responses were judged incorrect if they did not appear on the study list or were synonyms of a studied word (e.g., we did not accept cabinet for cupboard).
² In retrospect, we investigated possible issues with our stimulus sets. While the study item sets were matched on numerous lexical dimensions, subsequent examination of the High and Low NoF items revealed significant differences in the Number of Associates for the items used in Experiments 1 and 3, t(58) = −2.69, p < 0.05, SE = 1.82. In addition, we collected additional concreteness and new age of acquisition (AoA) ratings from separate groups of participants at the University of Calgary. AoA values were collected from a group of 144 undergraduate students. Each of these students provided AoA ratings for one-quarter of a larger set of 514 words, such that 36 students provided ratings for each word. The instructions for the AoA ratings task originated from Carroll and White (1973), but we used the modified 7-point scale of Gilhooly and Hay (1977). Concreteness ratings were collected for all 110 items used in Experiments 1 and 2 from a new set of 20 participants. For these new ratings data we detected small but statistically significant differences in concreteness and in AoA between the High and Low NoF items used in Experiments 1 and 3, t(58) = 2.80, p < 0.05, SE = 0.09, and t(58) = 2.02, p = 0.048, SE = 0.27, respectively. NoA, concreteness and AoA were all balanced in the item set used in Experiments 2 and 4 (all non-significant, p > 0.10). While studies have shown that NoA is an important determinant of cued recall, manipulating NoA does not influence free recall (Nelson and Schreiber, 1992). In addition, while the observed differences in rated concreteness and AoA between High and Low NoF items were significant, both sets of items are very concrete, and are perceived to be learned early on, between kindergarten and the first grade. This, together with the observation that a NoF effect of similar size was observed across all four studies, provides prima facie evidence that the NoF effect observed in Experiments 1 and 3 was not driven by these differences in AoA or concreteness.

RESULTS AND DISCUSSION
The mean proportions of low NoF and high NoF words recalled are presented in Table 2. In addition to the studied items, participants recalled an average of 2.80 words (SD = 3.08) that were not in the studied list. We conducted t-tests with subjects (t1) and, separately, items (t2) as random factors to compare correct recall for low and high NoF words. Results showed a significant NoF effect (t1(29) = 3.65, p < 0.001, SE = 0.02; t2(58) = 2.91, p < 0.01, SE = 0.03): recall was better for high NoF words than for low NoF words. This was, to our knowledge, the first report of a NoF effect in memory and we sought to replicate it with a different set of items in Experiment 2.
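The by-subjects and by-items analyses described above can be illustrated with a minimal sketch. It assumes that the by-subjects test is a paired-samples t-test on each participant's proportion of high and low NoF words recalled, and that the by-items test is a pooled-variance independent-samples t-test on per-item recall proportions (an assumption that is consistent with the reported degrees of freedom, 30 - 1 = 29 and 30 + 30 - 2 = 58). The recall matrix and all variable names are hypothetical and are not the study data.

```python
import math
from statistics import mean, stdev, variance


def paired_t(xs, ys):
    """Paired-samples t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


def independent_t(xs, ys):
    """Pooled-variance independent-samples t statistic and its df."""
    n1, n2 = len(xs), len(ys)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(xs) + (n2 - 1) * variance(ys)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(xs) - mean(ys)) / se, df


def prop(values):
    """Proportion of 1s in a list of 0/1 recall scores."""
    return sum(values) / len(values)


# HYPOTHETICAL data: rows are participants, columns are items (1 = recalled).
item_nof = ["high", "high", "high", "low", "low", "low"]
recalled = [
    [1, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
]

high_cols = [i for i, kind in enumerate(item_nof) if kind == "high"]
low_cols = [i for i, kind in enumerate(item_nof) if kind == "low"]

# By subjects (t1): each participant contributes a high and a low proportion.
subj_high = [prop([row[i] for i in high_cols]) for row in recalled]
subj_low = [prop([row[i] for i in low_cols]) for row in recalled]
t1, df1 = paired_t(subj_high, subj_low)

# By items (t2): each item contributes one proportion; the high and low
# sets contain different items, so the two samples are independent.
item_props = [prop([row[i] for row in recalled]) for i in range(len(item_nof))]
t2, df2 = independent_t([item_props[i] for i in high_cols],
                        [item_props[i] for i in low_cols])
```

Treating items as a random factor in a second test guards against an effect that is carried by only a handful of words, which is why both statistics are reported for each experiment.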

EXPERIMENT 2

Participants
Participants in Experiment 2 were 37 undergraduate students at the University of Calgary.

Materials
The stimuli for Experiment 2 were the 25 high NoF words and 25 low NoF words used in Pexman et al. (2002) (Table A2). The selected word sets differed significantly in NoF (p < 0.001) but were matched for printed frequency, contextual diversity (Brysbaert and New, 2009), familiarity, printed length, orthographic neighborhood size (Coltheart et al., 1977), and concreteness (see Table 1). Differences between the two sets on each of these matched dimensions were non-significant at p > 0.10. We obtained concreteness values for 26 of the present items from the MRC database (Wilson, 1988), and collected concreteness ratings for the 24 remaining items from a separate group of 31 participants. No participants requested additional time for free recall.

RESULTS AND DISCUSSION
The mean proportions of low NoF and high NoF words correctly recalled are presented in Table 2. In addition to the studied items, participants recalled an average of 2.49 words (SD = 3.08) that were not in the studied list. Results showed a significant NoF effect (t1(36) = 3.23, p < 0.005, SE = 0.02; t2(48) = 2.01, p < 0.05, SE = 0.04): the proportion of correctly recalled items was higher for high NoF words than for low NoF words. With the existence of the effect established and replicated, we next sought to investigate the source of the effect.
The free recall task has a long history in memory research (Kirkpatrick, 1894). Participants must engage in a selective search of memory in order to produce items studied at an earlier time. The unconstrained nature of this process means that multiple informational dimensions are free to interact with this search, yielding a long list of factors that influence recall dynamics. Factors such as the relative decay of items from memory (e.g., recency effects; Glanzer and Cunitz, 1966), additional rehearsal at study (e.g., primacy effects), and any factor that might influence the order in which information comes to mind, such as the order of presentation at study or semantic proximity to other items on the study list (e.g., contiguity effects; Kahana, 1996), all dynamically contribute to recall performance.
Recent work by Kahana and colleagues has produced models of immediate free recall that successfully incorporate many of these factors (Sederberg et al., 2008). Importantly, they also outline a mechanism for effects of item-specific encoding variability, and have the potential to account for the observation of a NoF effect in free recall. For example, the temporal context model (TCM-A) of Sederberg et al. (2008) frames free recall as the result of a series of stages. At study, the presentation of the to-be-studied items drives the evolution of a context layer which is stored in memory. Since item presentation drives the evolution of context, temporal information about the successive order of items, previous contexts associated with that item (i.e., semantic information), and information about the current context combine to create a context representation that can then guide later memory search. It is through this mechanism that the overall study context forms associations with the representations of the individual studied items, which enables subsequent retrieval of those items during recall. Free recall of items using these contextual states is modeled as a competitive process among a set of leaky accumulators (Usher and McClelland, 2001). Items that leave a stronger trace in the context layer at study will be more active during this process, and will be more likely to be produced during free recall. While TCM-A is a model of immediate recall, this framework provides a potential mechanism for effects of item-specific encoding variability such as those observed when manipulating NoF in delayed recall. By increasing item-specific activity at encoding we vary the relative ability of an item to bind itself to the prevailing context at study (Sederberg et al., 2008), and thus increase the likelihood of that item being active during subsequent recall.
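To make the idea of competition among leaky accumulators concrete, the following is a simplified sketch in the spirit of Usher and McClelland (2001). It is not an implementation of TCM-A: it omits the context layer entirely and simply treats each item's input strength as a stand-in for the strength of its trace at study. The function names, parameter values, and input strengths are all hypothetical choices made for illustration.

```python
import math
import random


def lca_race(inputs, leak=0.2, inhibition=0.2, noise_sd=0.1,
             threshold=1.0, dt=0.1, max_steps=10_000, rng=None):
    """Run one leaky competing accumulator race.

    Returns the index of the first accumulator to reach threshold,
    or None if no accumulator does so within max_steps.
    All parameter values are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    x = [0.0] * len(inputs)
    for _ in range(max_steps):
        total = sum(x)
        # Each unit gains its input, loses activation through leak, and is
        # inhibited by the summed activation of the other units.
        x = [
            max(0.0, xi
                + (inp - leak * xi - inhibition * (total - xi)) * dt
                + rng.gauss(0.0, noise_sd) * math.sqrt(dt))
            for xi, inp in zip(x, inputs)
        ]
        best = max(range(len(x)), key=lambda i: x[i])
        if x[best] >= threshold:
            return best
    return None


def win_counts(inputs, n_trials, seed):
    """Count how often each accumulator wins across repeated races."""
    rng = random.Random(seed)
    counts = [0] * len(inputs)
    for _ in range(n_trials):
        winner = lca_race(inputs, rng=rng)
        if winner is not None:
            counts[winner] += 1
    return counts


# HYPOTHETICAL inputs: item 0 has a slightly stronger trace than item 1.
counts = win_counts([0.6, 0.5], n_trials=500, seed=0)
```

In this sketch the item with the stronger input wins the race more often but not always, which mirrors the claim that a stronger trace raises the likelihood of recall rather than guaranteeing it.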
If, following TCM-A, NoF effects in free recall result from variability in encoding at study, then we can make two predictions. First, the relative increase in an item's ability to bind itself to context should be specific to that item. Any subsequent benefit in free recall for that item should be driven by its improved ability to compete during free recall, not by any form of contiguity effect in which the temporal or semantic relationships between items at study influence retrieval, thereby creating associative chains between items that are recapitulated in free recall (Polyn et al., 2011). Thus, while we should very likely observe contiguity effects in recall, these contiguity effects should not be stronger for high NoF words than for low NoF words. Second, the locus of the NoF effect should be at encoding, and thus the quantifiable amount of semantic processing for an item that occurs at study should predict the likelihood of that item being recalled. These predictions were investigated in Experiment 3 and Experiment 4.

EXPERIMENT 3
In Experiment 3 we investigated whether NoF effects arise due to associative chaining between studied items as a function of NoF. As such we recorded the sequential ordering of item presentation at study (something we did not do in Experiments 1 or 2). It is worthwhile to note that Experiment 3 was not designed as a strong test of contiguity effects in free recall; there is substantial evidence that such effects are genuine (Kahana, 1996;Polyn et al., 2011). Rather, Experiment 3 was designed to test whether the NoF effect observed in Experiment 1 was the result of associative chaining between items due to contiguity, or whether it resulted from enhanced item-specific encoding.

Participants
Participants in Experiment 3 were 42 undergraduate students at the University of Calgary.

Materials
The stimuli used in Experiment 3 were the same 60 items used in Experiment 1.

Procedure
The procedure for Experiment 3 was largely the same as that described for Experiment 1, but here we used a single distracter task during the distracter phase. In the distracter phase participants made semantic categorization decisions to single words presented on the monitor, again for 9 min. Stimuli were presented using E-Prime presentation software (Psychological Software Tools, Pittsburgh, PA) on a 19-inch Dell monitor. We used the same coding procedures outlined in Experiment 1. No participants requested additional time.

RESULTS AND DISCUSSION
The mean proportions of low NoF and high NoF words recalled are presented in Table 2. In addition to the studied items, participants recalled an average of 4.14 words (SD = 3.41) that were not in the studied list. Results showed a NoF effect that was significant by subjects but not by items (t1(42) = 2.09, p < 0.05, SE = 0.01; t2(58) = 0.27, p = 0.78, SE = 0.02): recall was again better for high NoF words than for low NoF words.
Following Kahana (1996) and Ozubko and Joordens (2007), we constructed conditional response probability plots in order to reveal any association by contiguity. We plotted the probability of recalling an item that was between one and five positions ahead of or behind the just-recalled item. A within-subjects ANOVA using these positional conditional probability plots revealed a significant effect of position using Greenhouse-Geisser corrections, F(4.41, 176.60) = 4.73, p = 0.001, MSE = 0.01. This significant effect of contiguity indicates that participant recall was influenced by the sequential ordering of items at study. As Figure 1 demonstrates, there was a large probability that a just-recalled item was one study position ahead of a previously recalled item. This pattern, along with a general bias to recall items in the forward direction, is typical of contiguity effects in free recall (Kahana, 1996; Ozubko and Joordens, 2007).
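A conditional response probability of this kind can be computed along the following lines. This is a minimal sketch of the standard lag-based calculation for a single participant, assuming that intrusions and repetitions have already been removed from the recall sequence; whether the analysis reported here handled those cases in the same way is not stated. The function name and the example sequence are hypothetical.

```python
from collections import Counter


def lag_crp(list_length, recall_positions, max_lag=5):
    """Conditional response probability as a function of lag.

    recall_positions holds the study serial positions (1-based) of the
    recalled items, in output order, with no intrusions or repetitions.
    Returns {lag: probability}, with None where a lag was never possible.
    """
    actual, possible = Counter(), Counter()
    recalled = set()
    for cur, nxt in zip(recall_positions, recall_positions[1:]):
        recalled.add(cur)
        # Every not-yet-recalled study position is a possible next response.
        for pos in range(1, list_length + 1):
            if pos not in recalled:
                possible[pos - cur] += 1
        actual[nxt - cur] += 1
    crp = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag == 0:
            continue
        crp[lag] = actual[lag] / possible[lag] if possible[lag] else None
    return crp


# HYPOTHETICAL participant: a 6-item list recalled in the order 2, 3, 4, 1.
crp = lag_crp(list_length=6, recall_positions=[2, 3, 4, 1])
```

Dividing by the number of times each lag was available, rather than by the total number of transitions, keeps lags near the ends of the list from being penalized simply because they were rarely possible. Per-participant values would then be averaged across participants before plotting.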
FIGURE 1 | Mean conditional probability plotted against distance from last item recalled.

Since overall participant recall was influenced by contiguity we next turned to the question of whether the observed NoF effect was also a result of associative mechanisms that operate across items. Following Ozubko and Joordens (2007), for each participant we calculated the conditional probabilities of recalling a high NoF or low NoF item next, given that participants had just recalled a high or low NoF item. Averaging across participants yielded four conditional probabilities (presented in Table 3) that are sensitive to associative chaining among items as a function of NoF: P(highNoF|highNoF), P(lowNoF|highNoF), P(highNoF|lowNoF), and P(lowNoF|lowNoF). Using this notation, P(highNoF|lowNoF) would reflect the probability of recalling a high NoF item, having just recalled a low NoF item. If associations among items only form as a result of temporal proximity at study (i.e., the demonstrated contiguity effect) then these conditional probabilities defined in reference to NoF should be approximately equivalent. However, as demonstrated with word frequency effects (Ozubko and Joordens, 2007), varying NoF may lead to associative chaining among items. In this case, having just recalled a high or low NoF item would alter the probability of recalling either a high or low NoF item next, and significant differences among the four conditional probabilities should be observed. In order to account for the observation of a high NoF advantage in free recall, associative chaining as a function of NoF would have to take one of two forms. The first would be associative chaining among high NoF items, leading to an increased probability of recalling a high NoF item when having just recalled a high NoF item. In this situation, P(highNoF|highNoF) should be significantly greater than P(lowNoF|highNoF). Alternatively, the high NoF advantage could be explained via a decrease in associative chaining among low NoF items, thereby increasing the overall likelihood of producing high NoF words during free recall. In this situation, P(lowNoF|lowNoF) should be significantly less than P(highNoF|lowNoF). To provide a test for this associative chaining as a function of NoF we used paired-samples t-tests to contrast the conditional probabilities P(highNoF|highNoF), P(lowNoF|highNoF), P(highNoF|lowNoF), and P(lowNoF|lowNoF) listed in Table 3.
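The four conditional probabilities can be computed for one participant roughly as follows. This sketch normalizes by the number of transitions out of each item type, so that the two probabilities conditioned on the same preceding type sum to one; that normalization is an assumption made for illustration, and the denominator used in the reported analysis may differ (for example, if it also counted recalls followed by an intrusion or by the end of recall). The function name and the example sequence are hypothetical.

```python
from collections import Counter


def nof_transition_probs(recall_types):
    """Transition probabilities between NoF types in output order.

    recall_types is the sequence of 'high' / 'low' labels of the
    correctly recalled items. Returns a dict keyed by (next, previous),
    mirroring the notation P(next | previous); values are None when the
    previous type was never followed by another recalled item.
    """
    pairs = Counter(zip(recall_types, recall_types[1:]))
    probs = {}
    for prev in ("high", "low"):
        # ASSUMPTION: denominator = all transitions leaving `prev`.
        total = pairs[(prev, "high")] + pairs[(prev, "low")]
        for nxt in ("high", "low"):
            probs[(nxt, prev)] = pairs[(prev, nxt)] / total if total else None
    return probs


# HYPOTHETICAL participant output order.
probs = nof_transition_probs(["high", "high", "low", "high", "low", "low"])
```

Each participant contributes one value per cell, so a contrast such as P(highNoF|highNoF) versus P(lowNoF|highNoF) can then be tested with a paired-samples t-test across participants.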
The results revealed that the conditional probabilities did not differ as a function of NoF. Specifically, there was no evidence for differential associative chaining among high NoF items: having just recalled a high NoF word, participants were just as likely to recall a high NoF word (38%) as they were a low NoF word (32%; t(41) = 1.33, p = 0.19, SE = 0.05). Similarly, there was no evidence for reduced associative chaining among low NoF items: having just recalled a low NoF word, participants were just as likely to recall a low NoF word (33%) as they were to recall a high NoF word (32%; t(41) = 0.31, p = 0.76, SE = 0.05). Indeed, plots of associative chaining by contiguity for both high and low NoF items (Figure 2) resemble the plots for the overall data (Figure 1). A within-subjects ANOVA using within-NoF positional conditional probability plots revealed a significant effect of position using Greenhouse-Geisser corrections for high NoF items, F(3.95, 142.27) = 2.77, p = 0.03, MSE = 0.03. The same analysis for low NoF items (which are fewer in number, since fewer low NoF items were correctly recalled) revealed marginally significant results, F(2.46, 83.63) = 2.61, p = 0.06, MSE = 0.03. Clearly, both high and low NoF items are capable of showing some degree of association by contiguity, including the classic asymmetrical bias in favor of recalling items from study list positions that are nearer to the just-recalled item. Given that we observed no significant evidence for associative chaining as a function of NoF, one could argue that our tests simply lacked power. We conducted a post-hoc power analysis based on the effect size for associative chaining in the low-frequency advantage in free recall reported by Ozubko and Joordens (2007). Like NoF, word frequency is a stimulus-specific variable that has been shown to influence free recall.
These calculations suggested that only 32 participants would be required in order for our paired-sample comparisons to reach statistical significance. Since we tested 42 participants this indicates that our contrasts were sensitive enough to detect association by contiguity that varied as a function of NoF. Again, it is important to note that our goal for these analyses was to explore whether differential association by contiguity provides an explanation for the observation of a NoF advantage in Experiments 1, 2, and 3, where items were presented randomly, and later freely recalled. Under these specific conditions, the bulk of the evidence suggests that differential association by contiguity does not account for the NoF effect.

Frontiers in Human Neuroscience www.frontiersin.org
These results suggest that NoF effects in free recall do not arise from the associative chaining of high or low NoF items. Rather, the lack of differential associative contiguity among items as a function of NoF provides evidence that NoF effects arise from item-specific encoding variability. A stronger test of this conclusion would require a demonstration that the extensiveness of item-specific processing at encoding predicts the likelihood of recall. Experiment 4 was designed to test this prediction. In Experiment 4 we also investigated whether the NoF advantage in recall generalizes beyond the intentional learning paradigm used in the three experiments reported thus far. On the one hand, the NoF advantage may arise because participants are able to engage in more elaborative encoding for high NoF words during intentional learning of those items in the study phase. On the other hand, the NoF advantage may arise due to more extensive activation of the semantic system that occurs when high NoF words are processed. In the former case, NoF effects should arise only in an expected memory test (intentional memory). In the latter case, NoF effects should also arise in an unexpected memory test (incidental memory). This possibility was tested in Experiment 4.

EXPERIMENT 4
The goal of Experiment 4 was to investigate NoF effects in unexpected recall. In an initial version of this experiment we copied the procedure of Experiment 1, but changed the study phase task to a lexical decision task (LDT) and did not tell participants that they would need to recall the LDT word items later. The distraction tasks and timing were the same as in Experiment 1; that is, a 9 min distraction phase involving word judgment tasks. Results for this version of the experiment showed very poor recall performance (<3% of items correctly recalled) and high rates of intrusion (participants recalled many items from the distraction tasks). To make the experiment somewhat easier and to reduce intrusions, in Experiment 4 we used a shorter distraction task consisting of math problems.

Participants
Participants in Experiment 4 were 32 undergraduate students at the University of Calgary.

Materials
The stimuli for Experiment 4 were the same as in Experiment 2. There were also 50 non-words in the LDT.

Procedure
There were three components in a testing session: (1) an LDT, (2) a distraction phase, and (3) a recall phase. For the LDT, participants first completed eight practice trials. Participants were told to decide whether each letter string presented in this task was a real word or a non-word, and to make their decision as quickly and as accurately as possible. Participants were not told that they would need to remember the LDT stimuli for a later phase of the session. In the distraction phase, participants completed a set of math problems. The time taken to complete the distraction task was 6 min. In the recall phase, participants were asked to try to remember as many of the LDT words as possible. Participants were given 4 min to complete the recall phase but could request more time. No participants requested additional time for free recall.

LDT
LDT responses that were incorrect (3.2% of trials), or that were faster than 250 ms or slower than 2500 ms (less than 1% of trials) were excluded from the RT analysis. Mean RTs and errors are presented in Table 4. Results included a significant NoF effect in RT (t1(31) = 4.14, p < 0.001, SE = 12.55; t2(48) = 2.16, p < 0.05, SE = 29.14) but not in errors (both t < 1). This was the typical NoF effect in LDT; responses were faster for high NoF words than for low NoF words.
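The trimming rule and by-subjects comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names and the trial format (pairs of RT in ms and a correctness flag) are assumptions, and the by-items test (t2) would apply the same paired-difference logic to item means rather than participant means only if items were matched in pairs.

```python
import math
from statistics import mean, stdev


def trimmed_mean_rt(trials, low=250, high=2500):
    """Mean RT (ms) over correct trials with low <= RT <= high.

    `trials` is a list of (rt_ms, correct) pairs; incorrect responses and
    responses faster than `low` or slower than `high` are excluded.
    """
    kept = [rt for rt, correct in trials if correct and low <= rt <= high]
    return mean(kept)


def paired_t(xs, ys):
    """Paired-samples t statistic and degrees of freedom for xs - ys."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Hypothetical trials for one participant in one condition (invented values).
trials = [(200, True), (600, True), (800, False), (700, True), (3000, True)]
print(trimmed_mean_rt(trials))
```

In a by-subjects analysis, `paired_t` would be called with each participant's trimmed mean RT for low NoF words and for high NoF words, so a positive t indicates faster responses to high NoF words.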

Recall
The mean proportions of low NoF and high NoF words recalled are presented in Table 2. In addition to the studied items, participants recalled an average of 1.72 words (SD = 1.68) that were not in the studied list. Results showed a significant NoF effect (t1(31) = 3.88, p < 0.001, SE = 0.02; t2(48) = 2.72, p < 0.01, SE = 0.02): recall was again better for high NoF words than for low NoF words.
NoF effects in visual word recognition are thought to capture the contributions of semantic processing to performance (Pexman et al., 2002). Therefore, we also examined the relationships between LDT performance and recall performance (Table 4), on the assumption that the magnitude of the NoF effect in LDT provides an index of the extensiveness of semantic processing at study. This analysis was designed to test whether the magnitude of the NoF effect shown by an individual participant predicted the subsequent recall of items. Notably, larger NoF effects in LDT RTs were related to better recall for both low NoF words and high NoF words.
It is worth commenting on the nonsignificant relationship between the size of the NoF effect in LDT RTs and the size of the NoF effect in free recall. Across the four experiments, although we consistently found a NoF effect in free recall, the effect was consistently small (between 4% and 7%). This limited variability likely restricts our ability to detect a significant correlation between RT and the magnitude of the NoF effect in free recall. While there is no evidence that the extent of semantic encoding during the LDT study phase is directly related to the size of the NoF effect in free recall, there is evidence linking the extent of semantic processing at study to recall accuracy for both low and high NoF words. This is a critical point, as it suggests that variability in the extensiveness of semantic processing undertaken by our participants during study is related to how much information they will subsequently recall. Given the careful balancing of the items in Experiment 4, we can reasonably attribute this variability in processing during encoding to the relative stimulus-specific differences in NoF. Although strictly correlational, this provides a tenable explanation for the NoF effect in free recall that is consistent with the literature on encoding variability. That the extensiveness of semantic processing was related to recall performance supports the inference that more extensive semantic processing at encoding leads to more accurate retrieval.
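The individual-differences analysis described above can be illustrated with a short sketch: each participant's NoF effect in LDT is taken as the difference between mean RTs for low and high NoF words, and that difference is then correlated with recall accuracy. The function names and all data values below are hypothetical; the actual correlations are those reported in Table 4.

```python
import math


def nof_effect(mean_rt_low, mean_rt_high):
    """Per-participant NoF effect in ms (positive = faster for high NoF words)."""
    return mean_rt_low - mean_rt_high


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-participant values (invented for illustration only).
low_rts = [640, 700, 655, 720, 610]           # mean RT, low NoF words (ms)
high_rts = [600, 650, 640, 660, 605]          # mean RT, high NoF words (ms)
recall_high = [0.20, 0.26, 0.14, 0.30, 0.12]  # proportion of high NoF words recalled

effects = [nof_effect(lo, hi) for lo, hi in zip(low_rts, high_rts)]
r = pearson_r(effects, recall_high)
print(f"r = {r:.2f}")
```

A positive r in such an analysis would correspond to the pattern described in the text, in which participants showing larger NoF effects in LDT also recall more items.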

GENERAL DISCUSSION
The purpose of the present study was to investigate whether memory accuracy was modulated by stimulus-specific differences in encoding variability. We chose to manipulate the number of features (NoF) in order to elicit a shift in the relative processing among items at encoding. Results of four experiments showed that recall was more accurate for high NoF words than for low NoF words. An investigation into the mechanism of these NoF effects suggested that the observed benefit was due to stimulus-specific differences in encoding at study and did not result from associative chaining by NoF word type across studied items. Further, correlational results revealed that memory accuracy was related to the extent of semantic processing undertaken in the encoding task (as captured by the NoF effect in LDT), but was not related to the time spent processing the items at encoding. These results serve to constrain alternative hypotheses about the locus of NoF effects in free recall, and provide additional evidence that NoF effects are effects of item-specific encoding variability.
Prior to this study the NoF effect had only been observed in word recognition tasks. The fact that the NoF effect generalizes to memory tasks suggests that the NoF dimension captures substantial variability in semantic processing. We believe that the observed NoF effects in free recall provide a novel demonstration of semantic elaboration as proposed by the levels of processing framework; as a framework, however, levels of processing does not itself provide a mechanism to account for these effects. Recent computational models of free recall such as TCM-A can, however, be modified to provide a mechanism for NoF effects by including a term that captures encoding variability at study. Across numerous tasks in which participants read individual words, the relative NoF of an item has been shown to influence the relative lexical processing of that item (Pexman et al., 2008; Yap et al., 2011). By modulating item-specific activity at encoding, NoF may vary the relative ability of an item to bind itself to the prevailing context at study (Sederberg et al., 2008). In TCM-A, free recall is modeled as a competitive process between items (Usher and McClelland, 2001); therefore, items that leave a stronger trace in memory will be more active and more likely to be produced during free recall. It is through this mechanism that variability in NoF would drive the encoding variability that ultimately leads to the observation of a NoF effect in free recall. In considering ingredients for a more complete model of free recall, Sederberg et al. (2008) outline such a mechanism, suggesting that if one assumes that items vary in the weighting of newly learned experimental item-to-context associations, one can model effects of encoding variability such as elaboration. The results of the current study provide behavioral evidence that the inclusion of a mechanism to account for encoding variability makes an important contribution to a model's characterization of free recall performance.
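The idea that a stronger item-to-context weighting makes an item more competitive at retrieval can be illustrated with a deliberately simplified toy. The sketch below is not TCM-A: it replaces that model's competitive retrieval dynamics with a simple ratio (Luce choice) rule, and the function name, item labels, and weight values are all hypothetical.

```python
def recall_probabilities(encoding_weights):
    """Toy competition: P(item) = weight / sum of all weights (Luce choice rule).

    `encoding_weights` maps each studied item to an assumed item-to-context
    weight; larger weights stand in for richer encoding at study.
    """
    total = sum(encoding_weights.values())
    return {item: w / total for item, w in encoding_weights.items()}


# Hypothetical weights: the high NoF item is assumed to bind more strongly
# to the study context than the low NoF item (values invented).
weights = {"high_nof_item": 1.2, "low_nof_item": 1.0}
probs = recall_probabilities(weights)
for item, p in probs.items():
    print(f"{item}: {p:.3f}")
```

Even in this reduced form, any item whose weight is larger receives a larger share of the competition, which is the qualitative pattern an encoding-variability term would be expected to contribute in a full model.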
As reviewed earlier, the unconstrained nature of free recall allows for a number of factors to contribute to memory performance. In the current study, we examined the hypothesis that NoF effects are a form of item-specific encoding variability by examining whether there was a relationship between semantic processing at study and subsequent recall, and whether manipulating NoF led to the formation of associations across items as a function of their NoF status. The balance of the evidence suggests that NoF effects are based in elaboration, not association. However, while we controlled for a number of lexico-semantic factors, the correlational structure of the English language virtually guarantees that our manipulation also encompasses some other undefined semantic relationship. For the present purpose, our interest was only whether a manipulation of NoF was sufficiently fine-grained to elicit a shift in the relative encoding of a set of to-be-remembered items. The results of Experiment 4 demonstrate that the amount of semantic processing at study directly predicts accuracy in free recall, and thus provide evidence in favor of this hypothesis, but alternative explanations are also possible. Howard and Kahana (2002) demonstrated that semantic similarity (calculated from LSA; Landauer and Dumais, 1997) led to significant semantic clustering in free recall, with items that had similar LSA vectors more likely to show association in free recall. In future research it would be useful to investigate whether other dimensions of semantic richness which have been shown to influence word recognition performance, such as number of semantic neighbors and contextual dispersion (see Pexman et al., 2008, for a review), are also related to memory task performance.

Gallo et al. (2008) argued that the degree of richness or elaboration achieved within a given level of processing will have consequences for memory performance because of distinctiveness. That is, they argued that when more features are encoded for a given stimulus, memory for that stimulus is more distinctive. Gallo et al. did not directly test the effects of "more features" on memory performance but we did so here. Our results confirm that, even when items are equated in all other ways, items that activate more semantic features are better remembered.