Recognition memory across the lifespan: the impact of word frequency and study-test interval on estimates of familiarity and recollection

The goal of this study was to investigate recognition memory performance across the lifespan and to determine how estimates of recollection and familiarity contribute to performance. In each of three experiments, participants from five groups from 14 up to 85 years of age (children, young adults, middle-aged adults, young-old adults, and old-old adults) were presented with high- and low-frequency words in a study phase and were tested immediately afterwards and/or after a one day retention interval. The results showed that word frequency and retention interval affected recognition memory performance as well as estimates of recollection and familiarity. Across the lifespan, the trajectory of recognition memory followed an inverse u-shape function that was neither affected by word frequency nor by retention interval. The trajectory of estimates of recollection also followed an inverse u-shape function, and was especially pronounced for low-frequency words. In contrast, estimates of familiarity did not differ across the lifespan. The results indicate that age differences in recognition memory are mainly due to differences in processes related to recollection while the contribution of familiarity-based processes seems to be age-invariant.


INTRODUCTION
Evidence from developmental and cognitive aging studies suggests that episodic memory follows an inverted U-shape function with an increase in performance from childhood to adolescence, a peak in young adulthood, and a steady decline in later life. Episodic memory refers to the memory of one's past, the "when," "where," and "what" of particular events one has experienced (e.g., Tulving, 2002). It is an important ability, which permits us to keep track of our life history and to distinguish new from old information. It is likely that different mechanisms are responsible for the performance rise and fall across the lifespan, but the development of overarching theoretical models has just begun (for overviews, see Graf and Ohta, 2002; Fingerman et al., 2011). Nevertheless, there is general agreement that episodic memory has an active, subjectively controlled component, which is engaged, for example, in recollecting what was said on a particular occasion, as well as a more passive, autonomous component, which seems to be involved in deciding whether a particular stimulus has been encountered before, that is, whether it is "old" or "new." Both of these components seem to influence performance on recognition memory tests, which are the focus of this article.
In a typical recognition memory experiment, words are presented in a study phase. Later, in a test phase, these words are presented again, intermixed with new words that have not been exposed previously, and participants are required to indicate for each word whether it is "old" or "new." A correct decision can occur as a consequence of recollecting the memory of that word as it was presented in the study phase (i.e., recollection) or due to more fluent processing which is attributed to the oldness of an item (i.e., familiarity). Recollection is assumed to engage processes which are more resource demanding and slower than familiarity (e.g., Hintzman and Caulton, 1997;Joordens and Hockley, 2000;Yonelinas, 2002).
In this article, we first present a brief overview of contemporary models of lifespan recognition memory development. We use these models to introduce the rationale and the hypotheses for the present study which investigated the trajectory of recognition memory performance across the lifespan. To our knowledge, this is the first study which used the same recognition task to examine performance across such a large segment of the lifespan. In three separate experiments, we investigated the impact of word-frequency and of different retention intervals in five cohorts ranging from 14 up to 85 years of age, allowing us to analyze the trajectory of performance on recognition memory hit and false alarm responses, as well as to obtain estimates of recollection and familiarity.

LIFESPAN APPROACHES
Only a few theoretical accounts are available for explaining lifelong changes in cognitive abilities (e.g., Baltes, 1987; Salthouse, 1996; Craik and Bialystok, 2006; Li and Baltes, 2006; Shing and Lindenberger, 2011). Perhaps the most prominent of these is Salthouse's (1985, 1996) account, which focuses on the speed of cognitive operations. According to his account, the speed with which many cognitive processes can be executed increases from infancy to young adulthood and then declines from the twenties to old age, and it is assumed that the general or global slowing which occurs in late adulthood is the primary cause of age-related declines in cognition. Salthouse suggested that slowed processing could lead to cognitive deficits by two distinct routes, which he called a limited time mechanism and a simultaneity mechanism. The limited time mechanism captures the idea that insufficient time might be available for some operations when more or most of the available time is required for completing initial or precursor operations. In the context of a conversation, for example, there may not be sufficient time for reflecting on the full implications of a message because most of the available time is spent on the initial sensory and perceptual encoding of the message. Salthouse used the simultaneity mechanism to discuss the fact that slowing might reduce the pool of information that is available simultaneously (i.e., at the same time), because the products of earlier processing operations may have been lost before later processing operations are completed. Craik and Bialystok (2006) proposed a view of lifespan differences in cognitive processing which focuses on the development of knowledge representations and of cognitive control.
By this view, representations are defined as crystallized knowledge structures or schemas, which serve as the scaffolding for encoding and retrieving episodic memories and as the knowledge base for making predictions about common events. By contrast, cognitive control refers to the processes which operate on the knowledge structures, for example, by choosing the best schema for making predictions about a particular event or by using a schema to guide recollection of critical components of a recently experienced event. According to Craik and Bialystok, representations and control, and their manner of interacting with each other, evolve across the lifespan. Representational knowledge is assumed to increase rapidly during childhood, to continue accumulating throughout adulthood and to remain relatively stable in old age. Cognitive control is also assumed to increase from childhood to young adulthood but, in contrast, to decline steadily thereafter. Craik and Bialystok noted that these developmental patterns are largely compatible with the developmental maturation of different brain areas, in particular the medial temporal lobe (MTL) and the prefrontal cortex (PFC). It is assumed that the development of MTL areas underlies the build-up and functionality of knowledge representations, while the developmentally later maturation and earlier attrition of prefrontal areas mediates the functionality of cognitive control.
Building on the framework of lifespan development introduced by Baltes and colleagues (Baltes, 1987; Baltes et al., 2006), Shing and Lindenberger (2011) have made a distinction between associative and strategic components in order to account for episodic memory performance across the lifespan. Associative components are assumed to be involved in binding different aspects of an event during encoding, while strategic components are used to elaborate relational aspects on the basis of existing semantic knowledge during encoding and retrieval (see Shing et al., 2008). It is assumed that the associative component mostly relies on structures of the medial temporal lobes (MTL) while the strategic component depends primarily on the prefrontal cortex (PFC). Given that the development of the MTL and that of the PFC follow different trajectories, differential hypotheses can be derived regarding the contribution of the associative and strategic components to memory performance at different phases of the lifespan (cf. Craik and Bialystok, 2006). For example, in agreement with the stronger decline of PFC functions compared to MTL functions in old age, older adults engage less in strategic encoding and retrieval operations, and this is assumed to account for the decline in episodic memory performance.

RECOLLECTION AND FAMILIARITY ACROSS THE LIFESPAN
None of the foregoing accounts of lifespan changes in cognition has focused on the distinction between recollection and familiarity, which is a fundamental building block of many recent accounts of age-related declines in episodic memory performance. Recollection, assumed to depend on controlled processing and strategic elaboration, refers to the mental reinstatement of previously experienced events that gives rise to memories that are vivid and rich in contextual details. By contrast, familiarity refers to the subjective impression that an event has been experienced before, and more particularly, to a feeling of "I know this" or "I have experienced this before" which typically arises in the absence of any recollection of contextual information about the same event. Although the distinction between recollection and familiarity was not included in the lifespan models described in the preceding section, this distinction seems compatible especially with the view proposed by Craik and Bialystok (2006) as well as with the proposal of Shing and Lindenberger (2011). In addition, the distinction between recollection and familiarity also seems to map closely onto the distinction between controlled and automatic processing which has been offered as a framework for understanding adult age-related changes in episodic memory (Light et al., 2000; Prull et al., 2006). In an in-depth review article, Yonelinas (2002) summarized several other models of recognition memory which postulate recollection as an all-or-none process and familiarity as a continuous process, while stipulating that these two make independent contributions to recognition memory performance.
Although a variety of different methods have been used for estimating the relative contributions of recollection and familiarity to recognition memory performance, they tend to lead to similar results and conclusions. For the present study, we relied primarily on the remember/know procedure which was originally introduced by Tulving (1985). This procedure requires participants in a recognition memory test to augment their "old" decisions with an additional judgment, indicating whether the test item was recognized as old based on the recollection of contextual details from the study phase or based on the familiarity of the item in the absence of recollection. The first of these judgments, called "remember" responses, are used to estimate the influence of recollection on performance while the second type, called "know" responses, are treated as revealing the contribution due to familiarity.
Developmental studies have revealed that recognition memory performance increases with age, and it appears that this increase is due primarily to recollection (i.e., remember responses), in the absence of age differences due to familiarity. This pattern of results was reported by Billingsley et al. (2002) in a study which included participants between 8 and 19 years of age. Similarly, in a sample of 6 and 24 year old participants, Ofen et al. (2007) found a significant correlation between age and remember responses, but not between age and know responses. Likewise, in a sample of 6-18 year olds, Ghetti and Angelini (2008) found a similar pattern of results, using confidence judgments combined with the analysis of receiver operating characteristics (ROC; cf. Yonelinas, 2002) to estimate recollection and familiarity.
However, in a study that investigated the lifespan trajectory of the event-related potentials (ERPs) underlying recollection and familiarity, Friedman et al. (2010) found that 9-10 year old children recruited fewer familiarity-based processes compared to 13-14 year old children, young adults, and older adults. In contrast, older adults recruited fewer recollection-based processes compared to the younger groups. Moreover, Mecklinger et al. (2010) found that the ERP correlate of recollection can be reliably recorded in 8 year old children, and they suggested that recollection is already fully developed in these children, but that their recognition memory network is still weaker and less mature.
Changes in recollection in combination with stability in familiarity are the typical pattern that has emerged from studies with older adults. Episodic memory performance is correlated negatively with age, and age-related declines are typically larger on recall than recognition memory tests (Craik and McDowd, 1987; Meier et al., 2002). In recognition memory, the age-related declines that have been observed are mostly due to lower levels of recollection, although some studies have also revealed an age-related decline in familiarity estimates. In all cases, the latter is less pronounced than the age-related decline in recollection (Parkin and Walter, 1992; Mäntylä, 1993; Perfect et al., 1995; Java, 1996; Norman and Schacter, 1997; Perfect and Dasgupta, 1997; Schacter et al., 1997; Mark and Rugg, 1998; Friedman and Trott, 2000; Clarys et al., 2002; Lövdén et al., 2002; Bastin and Van der Linden, 2003; Bunce, 2003; Comblain et al., 2004; Bunce and Macready, 2005; Duarte et al., 2006, 2008; Prull et al., 2006; Bugaiska et al., 2007; Skinner and Fernandes, 2009; Friedman et al., 2010). Different interpretations have been invoked to explain these somewhat discrepant results. For example, it has been argued that the differences are due to the particular measurement method that was used (i.e., the process dissociation procedure seems to be less prone to show deficits in familiarity estimates than the remember/know procedure; Light et al., 2000; Prull et al., 2006). Differences might also depend on the overall levels of performance, with studies reporting high estimates of recollection producing age effects on estimates of familiarity. However, it has been argued that this effect is spurious and rather related to potential ceiling effects (Yonelinas, 2002).

THE PRESENT STUDY
The present study was designed to investigate the lifespan trajectory of recognition memory test performance, and especially the distinct influences due to recollection and familiarity. The study included two critical variables (word frequency and study-test interval) that are known to have different effects on recollection and familiarity. Specifically, the goal of this study was to test the hypothesis that these variables may have a differential effect on the lifespan trajectory of recognition memory performance and its underlying processes.
Previous research shows that on recall tests, high-frequency words are remembered better than low-frequency words, suggesting an advantage for recollection (Gregg et al., 1980). In contrast, on recognition tests, the reverse pattern emerges (Glanzer and Adams, 1985). Because high-frequency words have higher baseline familiarity compared to low-frequency words, new high-frequency words are more likely to produce familiarity-based false alarms than new low-frequency words. Low-frequency words are assumed to have fewer contextual associations than high-frequency words and thus their situation-specific activation during the study phase of an experiment is assumed to result in the formation of more distinctive memory traces. Therefore, compared to high-frequency words, the recognition of low-frequency words is typically advantaged and results in a higher hit rate. For this reason, the hit rate for low-frequency words is typically used to estimate recollection, while the false alarm rate for high-frequency words is assumed to provide an estimate of familiarity (Reder et al., 2000). Because hit rates (i.e., correct recognition of old, previously presented items) are increased and false alarm rates (i.e., incorrect "recognition" of new, not previously presented items) are decreased when performance for low-frequency words is compared to performance for high-frequency words, this effect is also referred to as the word-frequency mirror effect (Glanzer and Adams, 1990).
Both recollection and familiarity are typically greater for low- compared to high-frequency words, but the effect due to frequency tends to be larger on recollection than on familiarity (Gardiner and Java, 1990; Kinoshita, 1995; Gardiner et al., 1997; Guttentag and Carroll, 1997; Joordens and Hockley, 2000; Reder et al., 2000; Hirshman et al., 2002). This pattern of results is typically found with young adult participants. However, in a cognitive aging study, Balota et al. (2002) found an age-related decline in the hit rate for low-frequency words, but not for high-frequency words, while there was a slight age-related increase in false alarms for both low- and high-frequency words. This result is consistent with the proposal that there is an age-related decline in recollection while familiarity remains stable. However, as Balota et al. did not include remember/know judgments to test whether word frequency and age have an interactive effect, their specific contributions to familiarity and recollection remain to be determined. One goal of the present study was to fill this gap.
A further goal of the present study was to test the impact of a study-test delay (i.e., the retention interval) on recognition memory performance as well as on estimates of familiarity and recollection. As reviewed by Yonelinas (2002), different theoretical predictions exist. According to models that assume that familiarity reflects a temporary activation of an item in the knowledge representation system, familiarity would be expected to decrease rapidly (e.g., Mandler, 1979). In fact, over the very short term (i.e., seconds to a few minutes) the empirical results agree with this notion (e.g., Hockley, 1991, 1992). However, across longer time intervals (i.e., weeks to several months) familiarity seems to remain rather stable while there is a substantial decrease in recollection (e.g., Gardiner and Java, 1991; Hockley and Consoli, 1999). Wilson et al. (1983) investigated the word frequency effect either immediately or after an interval of 1 week to examine whether, after the delay, healthy controls would show a similar lack of a frequency effect as evidenced on immediate tests by individuals with dementia. However, this was not the case. There was still a word-frequency effect after the 1 week interval. As they did not include remember/know judgments, however, it is not clear whether the relative contribution of recollection and familiarity changed across test sessions. Joordens and Hockley (2000) compared an immediate condition, in which each of ten 24-item study lists was tested immediately and remember/know responses were collected, with a delayed condition that involved testing at the end of the session. Their results showed main effects of word frequency and study-test delay, but no interaction for hits and false alarms. They concluded that the impact of word frequency and test delay on remember/know judgments is additive.
Consistent with these results, we hypothesized that test delay would affect both recollection and familiarity, that test delay and word frequency would both interact with age, particularly for recollection, and that no triple interaction between age, word frequency and retention interval would emerge, either for familiarity or for recollection. In the present study we manipulated word frequency and study-test interval across five age groups in order to test their influence on recognition memory performance and on the estimates of recollection and familiarity. In general, we expected to find an inverse u-shape lifespan trajectory. Moreover, we expected a more pronounced inverse u-shape function for low-frequency words than for high-frequency words for both recognition memory performance and estimates of recollection, but not for estimates of familiarity. While we expected a performance decline across the retention interval, we were particularly interested in testing whether this pattern would flatten out more for high- than for low-frequency words and whether this potential change in performance would differ across the lifespan.

General method
All three experiments used the same general method which is described here, while features unique to each experiment are reported in connection with each experiment.

Participants.
The experiments included participants from five age groups: children (aged 14-16), young adults (aged 25-30), middle-aged adults (aged 45-50), young-old adults (aged 65-70), and old-old adults (aged above 75). The experiments were conducted in the context of a research methods class at the Swiss Distance University. Each student had to recruit and test participants from each age group. Inclusion criteria were German as first language, normal or corrected-to-normal vision, and self-rated good health. In order to prevent the accidental inclusion of individuals afflicted with age-related pathologies, participants from both older groups (young-old and old-old) had to achieve a score of at least 27 on the Mini Mental State Examination (Folstein et al., 1975).

Materials.
The study and test lists were composed according to the method of Balota et al. (2002; cf. Weiermann et al., 2010). High- and low-frequency words were selected from the word database of the University of Leipzig (http://wortschatz.uni-leipzig.de). The words ranged from 3 to 9 letters and were matched in letter length across frequency categories. For the study and test phase of each experiment, words were displayed in the center of a computer screen, in white 60-point Times New Roman font against a black background.
Design. The basic design of each experiment had three factors. For all experiments, age group was a between-subjects factor while word frequency was manipulated within-subject. In Experiment 1, retention interval was manipulated between-subjects, while in Experiments 2 and 3, it was manipulated within-subject.

Procedure.
Each experiment consisted of a study phase and a test phase, and participants performed both phases individually, under the direct supervision of the experimenter. In the study phase, participants were informed that they would see words presented on a computer screen one at a time. They were instructed to read each word aloud and to remember it for a later memory test. To verify comprehension of the instructions, participants were asked to summarize them for the experimenter. For the study phase, each word was displayed for 2 s. Then, the experimenter pressed a key to advance to the next word. Immediately after the study phase, there was a 20-min filled delay during which participants completed a set of unrelated questionnaires, including the Mehrfachwahl-Wortschatztest, a German equivalent to the National Adult Reading Test (Lehrl et al., 1991), to assess verbal intelligence.
For the test phase, participants were informed that they would see more words, some of them from the study phase (i.e., old words) and some which had not been displayed before (i.e., new words). They were instructed to indicate for each word whether it was old or new. After a "new" decision, the next word appeared immediately. However, after entering an "old" decision, participants were asked to make a Remember or Know judgment in Experiments 1 and 2, or a Remember, Know, or Guess judgment in Experiment 3, following the method of Weiermann et al. (2010). Specifically, the participants were instructed to make a Remember response when they were able to recollect the word from the study episode, to give a Know response when they were not able to recollect the word, but nevertheless believed that they had studied it before, and to give a Guess response when they only guessed that they had studied it before. To collect the Remember/Know and Remember/Know/Guess responses, the experimenter pressed the appropriate key on the keyboard, and upon doing so, the next word was displayed. At the end of the test session, the Mini Mental State Examination (Folstein et al., 1975) was completed by each participant from the young-old and old-old groups.
Data analysis. For each participant, hit rates, false alarm rates and the discrimination score Pr (i.e., hits minus false alarms; Snodgrass and Corwin, 1988) were computed for each frequency category and condition. Based on remember/know judgments, estimates of recollection and familiarity were computed with the

Participants
A total of 186 individuals (37 children, 36 young adults, 36 middle-aged adults, 37 young-old adults, and 40 old-old adults) participated in the study. Two of the young-old adults and three of the old-old adults had to be excluded because they obtained an MMS score below 27 points. Demographic characteristics of the final sample are presented in Table 1.

Materials
Materials were 48 high-frequency and 48 low-frequency words. According to the vocabulary database of the University of Leipzig (http://wortschatz.uni-leipzig.de), their mean frequency class was 15 (SD = 1.5) for low-frequency words and 8.9 (SD = 1.5) for high-frequency words, t (94) = 20.01, p < 0.001. A random half of the items from each frequency condition was chosen to form the study list, with the remainder used as new items on the recognition test. The assignment of words to the study list was counterbalanced across participants such that each word occurred both as an old and a new item on the recognition test. The assignment of words to the recognition test was also counterbalanced across participants such that each word occurred in the immediate as well as in the delayed tests.
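
The old/new counterbalancing described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names, the fixed random seed, the placeholder word labels, and the even/odd participant rule are hypothetical and are not taken from the study, which does not report how the counterbalancing was implemented.

```python
import random


def split_into_halves(words, seed=0):
    """Randomly split a word list into two equal halves (A and B).

    The seed is a hypothetical choice that only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(words)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def assign_old_new(half_a, half_b, participant_index):
    """Counterbalance old/new status across participants.

    Assumed rule: even-numbered participants study half A and see half B
    as new items; odd-numbered participants get the reverse assignment.
    Every word therefore serves as both an old and a new item.
    """
    if participant_index % 2 == 0:
        return {"study": half_a, "new": half_b}
    return {"study": half_b, "new": half_a}


# Placeholder labels standing in for the 48 words of one frequency category.
low_frequency_words = [f"low_{i:02d}" for i in range(48)]
half_a, half_b = split_into_halves(low_frequency_words)
lists_p0 = assign_old_new(half_a, half_b, participant_index=0)
lists_p1 = assign_old_new(half_a, half_b, participant_index=1)
```

With 48 words per frequency category, each participant in this sketch studies 24 words and sees the other 24 as new items on the recognition test.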

Procedure
The procedure was as described in the General Method section. For half of the participants, the test phase started immediately after the filled delay (immediate test); for the other half, it started 24 h later (delayed test).

RESULTS
The main focus of the analyses is on recognition memory performance (Pr) and on the estimates of recollection and familiarity. These results are depicted in Figure 1, and in order to facilitate comparison with other investigations, hit and false alarm rates are also presented in Table 2. The results show a consistent and prominent word-frequency effect, which was borne out by the statistical analyses presented in the Supplementary Materials.

Recognition memory performance
For the Pr scores, depicted in Figure 1A, a three-way analysis of variance (ANOVA) with age group (children, young adults, middle-aged adults, young-old adults, old-old adults) and retention interval (immediate test, delayed test) as between-subjects factors and word frequency (high, low) as a within-subject factor showed significant main effects of age, F(4, 171) = 6.09, p < 0.001, η² = 0.12, retention interval, F(1, 171) = 35.76, p < 0.001, η² = 0.17, and word frequency, F(1, 171) = 208.7, p < 0.001, η² = 0.55. No interaction reached significance, all Fs < 1.93, ps > 0.11. Overall, recognition memory performance was higher for the immediate than for the delayed test (M = 0.54 and M = 0.39, respectively) and for low- than for high-frequency words (M = 0.57 and M = 0.37, respectively). Post-hoc Tukey HSD tests revealed that the age effect was due to the lower performance of the old-old adults compared to both the young adults and the middle-aged adults (both ps < 0.01); moreover, young-old adults performed lower than young adults (p < 0.05). No other group difference reached significance (all ps > 0.11).
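
As a brief illustration of the discrimination score Pr analyzed above (hits minus false alarms; Snodgrass and Corwin, 1988), the following minimal Python sketch computes the hit rate, the false alarm rate, and Pr for one participant and one frequency category. The function name and the response counts are hypothetical and are not taken from the study's data; the 24 old and 24 new items follow from the Experiment 1 materials (48 words per frequency category, half of them studied).

```python
def discrimination_scores(n_hits, n_old, n_false_alarms, n_new):
    """Return hit rate, false alarm rate, and Pr = hit rate - false alarm rate."""
    if n_old <= 0 or n_new <= 0:
        raise ValueError("n_old and n_new must be positive")
    hit_rate = n_hits / n_old                # proportion of old items called "old"
    fa_rate = n_false_alarms / n_new         # proportion of new items called "old"
    return {"hit_rate": hit_rate, "fa_rate": fa_rate, "pr": hit_rate - fa_rate}


# Hypothetical participant: 18 of 24 old words and 6 of 24 new words called "old".
scores = discrimination_scores(n_hits=18, n_old=24, n_false_alarms=6, n_new=24)
```

A Pr of 0 corresponds to no discrimination between old and new items, and a Pr of 1 to perfect discrimination.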

DISCUSSION
In Experiment 1, recognition memory performance showed the expected inverted U-shaped function due to age, the expected decline from the immediate to the delayed test, as well as the performance advantage for low- vs. high-frequency words. The absence of any interaction effects among these variables suggests that they exert an additive influence on performance. Collectively, these findings are consistent with previous research on age-group differences in recognition memory, as well as with research concerned with word frequency and retention interval effects. The estimates of recollection showed a different pattern; in addition to the expected effects due to age, retention interval and word frequency, they also revealed a significant interaction between age and word frequency, which occurred because the age variable affected recollection of low- but not high-frequency words. As recollection is assumed to depend on elaborative processing, and low-frequency words are more likely to involve this type of processing, one interpretation is that aging is accompanied by a reduction in elaborative processing. In contrast to the estimates of recollection, the estimates of familiarity showed no effect of age and no interaction effects with age, as expected. However, they did reveal effects of word frequency and of the retention interval manipulation. Overall, Experiment 1 revealed several thought-provoking results. First, for high-frequency words, neither estimates of recollection nor estimates of familiarity varied with age. Second, despite the fact that the word-frequency and retention interval manipulations consistently affected recognition memory performance, estimates of recollection and estimates of familiarity, there was a surprising lack of interaction effects, in particular regarding the retention interval manipulation.

Frontiers in Psychology | Cognition
October 2013 | Volume 4 | Article 787 | 6

In order to increase the statistical power of the latter manipulation, we varied the retention interval within-subject in Experiment 2. This variation also allowed us to investigate the stability of the estimates of recollection and familiarity across the retention interval. It is noteworthy that differences in the stability of individual differences (i.e., higher stability for recollection compared to familiarity) may be a trivial cause of the differential influence of age on recollection and familiarity. Given that the relationship between a certain measure and any other variable is limited by its reliability, the extent to which relationships with other variables can be established is restricted by the stability of that measure itself. It has been shown previously that implicit memory measures, which rely on automatic processes, tend to be less reliable than explicit memory measures, which rely on controlled processes (cf. Meier and Perrig, 2000; Meier et al., 2009). Thus, it is possible that a similar pattern may be present for estimates of recollection and familiarity. In Experiment 2 this question was addressed directly.

Participants
A total of 149 participants (30 children, 30 young adults, 30 middle-aged adults, 29 young-old adults, and 30 old-old adults) participated in the study. One of the young-old adults and six of the old-old adults had to be excluded because they failed to achieve the cut-off score of 27 on the MMS. Demographic characteristics of the final sample are presented in Table 1.

Materials
Materials were 96 high-frequency and 96 low-frequency words. According to the vocabulary database of the University of Leipzig (http://wortschatz.uni-leipzig.de), the mean frequency class was 17.1 (SD = 2.8) for low-frequency words and 8.3 (SD = 1.3) for high-frequency words, t (190) = 27.93, p < 0.001. For each frequency, four different lists were created and administered in a counterbalanced manner across the conditions of the experiment.
Half of the items within each frequency condition were used as "old" items (i.e., two lists of each frequency condition) which were presented in the study phase and the others (i.e., two lists of each frequency condition) were used as "new" items. Half of the study words were used as old words for the immediate test and the other half were used for the delayed test. Similarly, half of the new words were used for the immediate test and the other half were used for the delayed test. The assignment of words to the study and test lists was counterbalanced across participants such that each word occurred equally often in each counterbalancing condition.

Procedure
The procedure was as described in the General Method. After the 20-min filled interval, the immediate test was administered. The participants returned the next day (i.e., after a 24-h retention interval) for the delayed test.
Table 3 and the respective analyses are presented in the Supplementary Materials. They showed a consistent word-frequency effect that was slightly smaller in older adults. As in Experiment 1, we focus on recognition memory performance and on the estimates of recollection and familiarity.
No interaction involving age reached significance, all Fs < 1.65, ps > 0.17. However, the interaction between word frequency and retention interval reached significance, F(1, 137) = 3.95, p < 0.05, η² = 0.03. Recognition memory performance was higher for low- than for high-frequency words, and this difference was larger for the immediate than for the delayed test (immediate: M = 0.65 and M = 0.38, respectively; delayed: M = 0.42 and M = 0.19, respectively). Post-hoc Tukey HSD tests revealed that the age effect was due to the lower performance of the old-old adults compared to the children, the young adults, and the middle-aged adults (ps < 0.09 and 0.05, respectively); moreover, the young-old adults performed lower than the young adults (p < 0.05). No other group difference reached significance (all ps > 0.46).

Correlation analysis
Because the retention interval was varied within subjects, it was possible to investigate the stability of the estimates of recollection and familiarity. Previous findings have shown that the test-retest reliability of explicit memory tests, assumed to be based on recollection, is higher than the test-retest reliability of implicit memory tests, assumed to reflect processes such as familiarity (Meier and Perrig, 2000). Accordingly, we expected that the estimates of recollection would be more stable across test conditions than those of familiarity. To test this assumption, we computed the correlation between the immediate and delayed tests for each estimate (recollection and familiarity), averaged across word frequencies and age groups. The results revealed significant correlation coefficients for both estimates (recollection: r = 0.53; familiarity: r = 0.36; both ps < 0.001). More critically, however, a test for correlated but non-overlapping correlations (Raghunathan et al., 1996) showed that these correlations were significantly different, ZPF = 1.76, p < 0.05. Thus, the estimate of recollection was more stable than the estimate of familiarity, and this difference may also have contributed to the presence of a strong age effect for estimates of recollection and a much weaker effect for the estimates of familiarity.
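For readers who want to reproduce this kind of comparison, the ZPF statistic can be sketched as follows. The formula is given here as it is commonly stated for the Raghunathan et al. (1996) test, which additionally requires the four cross-correlations between the recollection and familiarity estimates. These cross-correlations are not reported in the text, so the zeros in the example call are placeholders only; the function name and variable numbering are likewise assumptions.

```python
import math


def zpf(r12, r34, r13, r14, r23, r24, n):
    """ZPF test statistic for two correlated but non-overlapping correlations.

    Variable numbering assumed here:
      1 = recollection (immediate), 2 = recollection (delayed),
      3 = familiarity (immediate),  4 = familiarity (delayed).
    r12 and r34 are the two stability coefficients being compared;
    the remaining arguments are the cross-correlations.
    """
    z12 = math.atanh(r12)  # Fisher z-transform
    z34 = math.atanh(r34)
    k = ((r13 - r23 * r12) * (r24 - r23 * r34)
         + (r14 - r13 * r34) * (r23 - r13 * r12)
         + (r13 - r14 * r34) * (r24 - r14 * r12)
         + (r14 - r12 * r24) * (r23 - r24 * r34))
    denominator = math.sqrt(1 - k / (2 * (1 - r12 ** 2) * (1 - r34 ** 2)))
    return math.sqrt((n - 3) / 2) * (z12 - z34) / denominator


# Stability coefficients from the text; n = 142 is the sample remaining
# after the reported exclusions. Cross-correlations are PLACEHOLDERS.
z_example = zpf(r12=0.53, r34=0.36, r13=0.0, r14=0.0, r23=0.0, r24=0.0, n=142)
```

A positive value indicates higher stability for the first pair (here recollection); the statistic is evaluated against the standard normal distribution.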

DISCUSSION
In general, Experiment 2 replicated the results of Experiment 1. As expected, recognition memory performance formed an inverted U-shape function, with higher memory performance in the immediate compared to the delayed test, and also higher performance for low- compared to high-frequency words. Overall, performance appeared to be somewhat lower than in Experiment 1. This is probably due to doubling the number of stimuli, which was a consequence of varying retention interval within-subject. The within-subject manipulation increased the power to detect effects of retention interval, which appeared in the form of an interaction with word frequency. However, there was no interaction between retention interval and age. Thus, our hypothesis that an interaction between the retention interval and age may have been disguised by the between-subject manipulation of this variable in Experiment 1 was not confirmed. For estimates of recollection, there was a marginally significant interaction between age group and word frequency, and the pattern of the trajectory was similar to Experiment 1; that is, an age effect occurred only for low-frequency words, based on the lower estimates of older adults, but no age effect occurred for high-frequency words. As expected, for estimates of familiarity, there was no significant age effect. However, familiarity was again affected by the retention interval and word-frequency manipulations, with lower estimates for delayed testing and for high-frequency words, thus replicating Experiment 1.
Overall, Experiment 2 corroborates the results of Experiment 1. First, it showed again that for high-frequency words, estimates of recollection did not vary with age. Second, there was still a lack of interaction effects, in particular with age, revealing the substantial stability of recognition memory and its basic processes. Moreover, correlational analysis revealed that the stability of estimates of familiarity was significantly lower than the stability of estimates of recollection and thus may have contributed to the different pattern of age-related trajectories. In Experiment 3, we followed up on this question by introducing a slight variation into the remember/know procedure to enhance the measurement properties of the estimates of familiarity. Specifically, we provided an additional "guess" response option (cf. Gardiner et al., 2002). Previous research has shown that when a "guess" response option is available, the discrimination between old and new items for "know" responses is increased, and thus the estimates of familiarity may also be improved (Eldridge et al., 2002; cf. Bruno and Rutherford, 2010). Moreover, by eliminating the error variance specific to guess responses, the stability of the familiarity estimates may be further enhanced.
Experiment 3 was similar to Experiment 2, except that we included a "guess" response option in the remember/know procedure.

Participants
A total of 192 participants (39 children, 38 young adults, 40 middle-aged adults, 39 young-old adults, and 36 old-old adults) participated in the study. Three of the young-old adults and one of the old-old adults had to be excluded due to an MMS value below 27 points. The data of two participants (one young-old adult and one old-old adult) were lost due to a technical error; thus, the final sample consisted of 186 participants. Demographic characteristics of the final sample are presented in Table 1.

Materials
Materials were the same as in Experiment 2.

Procedure
The procedure was as described in the General Method section. After the 20-min retention interval, the immediate test was administered. After a 24-h retention interval, the delayed test was administered in a second session. The only difference from Experiment 2 was that after entering an "old" decision, participants were asked to make a Remember, Know, or Guess judgment rather than only a Remember/Know judgment. Specifically, the participants were instructed to make a Remember response when they were able to recollect the word from the study episode, to give a Know response when they were not able to recollect the word but nevertheless believed that they had studied it before, and to give a Guess response when they only guessed that they had studied it before. Table 4 and the respective analyses are presented in the Supplementary Materials. They showed a consistent word-frequency effect that was slightly smaller in older adults. As in the previous experiments, we focus on recognition memory performance (Pr) and on the estimates of recollection and familiarity, which are depicted in Figure 3.
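To illustrate how the three dependent measures could be derived from Remember/Know/Guess response counts, the following sketch assumes the widely used independence remember/know scoring: recollection is the Remember rate for old items minus that for new items, and familiarity is Know / (1 − Remember), again corrected by the corresponding value for new items. Guess responses are left out of both estimates. Whether Guess responses count toward Pr is also an assumption here, and all names and counts are hypothetical; the scoring actually used is the one defined in the General Method.

```python
from dataclasses import dataclass


@dataclass
class ResponseCounts:
    """Counts of 'old' decisions by judgment type for one item class."""
    n_items: int
    remember: int
    know: int
    guess: int = 0


def _familiarity(counts: ResponseCounts) -> float:
    """Know rate conditional on the item not being recollected: K / (1 - R)."""
    remember_rate = counts.remember / counts.n_items
    if remember_rate >= 1.0:
        raise ValueError("familiarity is undefined when every item is 'remembered'")
    return (counts.know / counts.n_items) / (1.0 - remember_rate)


def score(old: ResponseCounts, new: ResponseCounts) -> dict:
    """Pr, recollection, and familiarity under the stated assumptions."""
    hit_rate = (old.remember + old.know + old.guess) / old.n_items
    false_alarm_rate = (new.remember + new.know + new.guess) / new.n_items
    return {
        "Pr": hit_rate - false_alarm_rate,
        "recollection": old.remember / old.n_items - new.remember / new.n_items,
        "familiarity": _familiarity(old) - _familiarity(new),
    }


# Hypothetical counts for one participant in one test session.
example = score(
    old=ResponseCounts(n_items=48, remember=24, know=12, guess=4),
    new=ResponseCounts(n_items=48, remember=2, know=6, guess=4),
)
```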

Frontiers in Psychology | Cognition | October 2013 | Volume 4 | Article 787

[...] between retention interval and word frequency, F(1, 181) = 2.96, p = 0.089, η² = 0.02, revealing that although recognition memory performance was higher for low- than for high-frequency words, this difference was larger for the immediate than for the delayed test (immediate: M = 0.61 and M = 0.36, respectively; delayed: M = 0.42 and M = 0.20, respectively). More importantly, the interaction between age and word frequency was significant, F(4, 181) = 4.26, p < 0.01, η² = 0.09. No other interaction approached significance, all Fs < 1.22, ps > 0.30. Separate ANOVAs following up on the interaction between age and word frequency showed a significant age effect for high-frequency words, F(4, 181) = 4.51, p < 0.01, η² = 0.09, as well as for low-frequency words, F(4, 181) = 13.03, p < 0.001, η² = 0.22. For high-frequency words, post-hoc Tukey HSD tests revealed that the age effect was due to the lower performance of the young-old and old-old adults compared to the children and the young adults (ps < 0.05 and 0.09, respectively). For low-frequency words, post-hoc Tukey HSD tests revealed that the age effect was due to the lower performance of the young-old and old-old adults compared to the children, the young adults, and the middle-aged adults (all ps < 0.05). No other group difference reached significance (all ps > 0.34).
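The effect sizes in this paragraph are consistent with partial eta squared, which can be recovered from an F value and its degrees of freedom. The sketch below rests on that assumption (the text does not state which variant of η² is reported); the function name is hypothetical, and the tuples simply restate the F values given above.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported eta squared) as given in the text.
reported_effects = [
    (2.96, 1, 181, 0.02),   # retention interval x word frequency
    (4.26, 4, 181, 0.09),   # age x word frequency
    (4.51, 4, 181, 0.09),   # age effect, high-frequency words
    (13.03, 4, 181, 0.22),  # age effect, low-frequency words
]

recomputed = [
    round(partial_eta_squared(f, df1, df2), 2)
    for f, df1, df2, _ in reported_effects
]
```

Each recomputed value rounds to the effect size reported for the corresponding test.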

Estimates of recollection
The estimates of recollection, depicted in Figure 3B, were also examined by a three-way ANOVA, which showed significant [...]
Figure 3 (caption, partial): Error bars represent standard errors. LF, low-frequency words; HF, high-frequency words.
[...] age effect was due to the lower recollection of the young-old and old-old adults compared to the young adults (ps < 0.05 and 0.06, respectively). No other group difference reached significance (all ps > 0.22). Thus, the lack of significance of the interaction between age and word frequency seems to reflect a lack of statistical power rather than an absence of the data pattern.

Correlation analysis
As in Experiment 2, we investigated the stability of the estimates of recollection and familiarity. Specifically, we were interested in whether excluding guess responses from the calculation of the familiarity estimates would reduce the error-related variance and thus abolish the reliability difference observed in Experiment 2. Again, the correlations between the immediate and delayed tests were significantly different from zero, with r = 0.56, p < 0.01, for recollection and r = 0.57, p < 0.01, for familiarity. More critically, the test for correlated but non-overlapping correlations by Raghunathan et al. (1996) showed no significant difference, ZPF = −0.08, p > 0.05. Therefore, the modified remember/know/guess procedure yielded a higher stability of the familiarity estimate across the retention interval than in Experiment 2, and it eliminated the difference in stability that was present in that experiment.

DISCUSSION
In general, Experiment 3 replicated the main findings from the previous experiments. That is, for low-frequency words, there was a clear inverse u-shape pattern for the age trajectory and a substantial performance decline in old age. However, for high-frequency words, this was less pronounced. For estimates of recollection, there was again a somewhat different pattern for high- and low-frequency words, with significant age-related differences for low-, but not for high-frequency words. In contrast, estimates of familiarity were largely invariable across age, but were again affected by word frequency, and marginally by the retention interval manipulation. It is noteworthy that the latter results were based on estimates of familiarity that showed stability across the retention interval comparable to that of the estimate of recollection. Thus, we can reject the hypothesis that the absence of a significant age difference in estimates of familiarity is simply a methodological artifact.

GENERAL DISCUSSION
The purpose of this study was to investigate the lifespan trajectory of recognition memory test performance, and especially the distinct influences due to recollection and familiarity. We used the same recognition task to examine performance across a large segment of the lifespan, that is, five groups ranging from 14 up to 85 years of age. In three separate experiments, we investigated the impact of word frequency and of different retention intervals, two critical variables known to have different effects on recollection and familiarity. Here, we first summarize the results of the three experiments and then discuss an additional "meta-analysis" in which we combined the findings from the three experiments. We then connect our findings with theoretical approaches of lifespan development and conclude with a more general note.
For recognition memory performance, we found consistent age effects across experiments, with a performance increase from children to young adults and a decline thereafter, thus confirming our hypothesis of an inverse u-shape lifespan trajectory. As expected, performance was higher when tested on the same day compared to after a 24-h retention interval and for low- compared to high-frequency words. Importantly, we found no significant interaction between age group and any experimental variable, thus suggesting a very robust recognition memory performance trajectory across the lifespan. For estimates of recollection, the overall pattern was very similar. However, in Experiments 1 and 2, the inverse u-shape lifespan trajectory was only significant for low-frequency words; that is, this pattern flattened out for high-frequency words. As hypothesized, there was no triple interaction between age, delay, and word frequency. For estimates of familiarity, we did not find a significant age effect. However, estimates of familiarity were higher for the same day compared to after a 24-h retention interval and for low- compared to high-frequency words.
Thus, the manipulations of word-frequency and retention interval showed a similar effect for recognition memory performance and the estimates of recollection and familiarity. In contrast, age affected both recognition memory performance and the estimate of recollection but the estimate of familiarity was mainly age-invariant. These results support the notion that recollection and familiarity recruit different processes which are differently affected by development and aging across the lifespan.
Because the method and design were similar across experiments, it was possible to calculate the same statistics based on the combined data from all three experiments. This "meta-analysis" was used to boost statistical power and to reveal effects that might have been missed in the individual experiments. Details about these analyses are presented in the Supplementary Materials. They revealed two additional significant effects. First, for overall recognition memory performance, the age effect was more pronounced for low- than for high-frequency words. Second, and more important, for estimates of familiarity a main effect of age group emerged, with both groups of older adults showing lower familiarity estimates than the young adults. Thus, when a large enough sample is tested, the results show that familiarity is not completely age-invariant. Rather, they suggest that familiarity is also somewhat affected in old age, but much less than recollection.
The latter consideration is consistent with conclusions from aging studies. For example, Prull et al. (2006) stated that aging is associated with a stronger impairment in recollection than in familiarity, in contrast to the claim that aging is accompanied by a decline in recollection and invariance in familiarity. However, the absence of a developmental effect (i.e., between children and young adults) would support the pattern of invariance that was found in previous developmental studies (e.g., Ghetti and Angelini, 2008).
Moreover, the absence of a significant developmental effect is in line with the lifespan approach advocated by Shing and Lindenberger (2011). Specifically, according to this approach, the associative component of episodic memory matures earlier in childhood than the strategic component, which is reflected by an absence of an age effect in familiarity and the presence of an effect in recollection. In contrast, in old age, the associative component is also decreased, resulting in a decline in both recollection and familiarity.
To conclude, we would like to emphasize that our study demonstrates that the pattern of age changes across the lifespan is rather robust across different experimental conditions. This suggests that age-related differences in recognition memory are rather stable across different situations, not only for performance measures but also for estimates of the underlying processes.

ACKNOWLEDGMENTS
This work was supported by Grant IZK0Z1_139985 from the Swiss National Science Foundation to B. Meier. N. Rothen is also supported by the Swiss National Science Foundation (Grant PA00P1_145370). Data collection occurred in the course of a lectureship of B. Meier at the Swiss University of Distance Education, Brig, Switzerland. We thank the students of the University of Distance Education for conducting the experiments.