The Rapid Forgetting of Faces

Krill, Dana; Avidan, Galia; Pertzov, Yoni

doi:10.3389/fpsyg.2018.01319

ORIGINAL RESEARCH article

Front. Psychol., 27 July 2018

Sec. Perception Science

Volume 9 - 2018 | https://doi.org/10.3389/fpsyg.2018.01319

Dana Krill^1*

Galia Avidan^2,3

Yoni Pertzov^1*

¹Department of Psychology, Hebrew University of Jerusalem, Jerusalem, Israel
²Department of Psychology, Ben-Gurion University of the Negev, Beersheba, Israel
³Department of Cognitive and Brain Sciences, Ben-Gurion University of the Negev, Beersheba, Israel

How are faces forgotten? Studies examining forgetting in visual working memory (VWM) typically use simple visual features; however, in ecological scenarios, VWM typically contains complex objects. Given their significance in everyday functioning and their visual complexity, here we investigated how upright and inverted faces are forgotten within a few seconds, focusing on the raw errors that accompany such forgetting and examining their characteristics. In three experiments we found that longer retention intervals increased the size of errors. This effect was mainly accounted for by a larger proportion of random errors, suggesting that forgetting of faces reflects decreased accessibility of the memory representations over time. On the other hand, longer retention intervals did not modulate the precision of recall, suggesting that forgetting does not affect the precision of accessible memory representations. Thus, when upright and inverted faces are forgotten there is a complete failure to access them or a complete collapse of their memory representation. In contrast to the effect of retention interval (i.e., forgetting), face inversion led to larger errors that were mainly associated with decreased precision of recall. This effect was not modulated by the duration of the retention interval, and was observed even when memory was not required in the task. Therefore, upright faces are remembered more precisely compared to inverted ones due to perceptual, rather than mnemonic processes.

Introduction

Working memory refers to the short-term storage and manipulation of sensory information (Baddeley and Hitch, 1974). It is considered to be a core cognitive process underpinning a range of behaviors from perception to problem solving and action control (Hitch and Baddeley, 1976; Baddeley et al., 1985). Visual working memory (VWM) is involved in many perceptual and cognitive processes such as planning visually guided actions (Hayhoe et al., 2003; Hollingworth et al., 2008); however, it is highly limited. Its capacity limitations have been amply investigated and debated (Luck and Vogel, 1997, 2013; Cowan, 2001; Bays and Husain, 2008; Ma et al., 2014) but its temporal limitations have attracted much less attention. The most pertinent literature related to this issue has focused on the temporal robustness of VWM (e.g., Regan and Beverley, 1985; Magnussen et al., 1996), but has not examined in detail the impact of extending the retention interval on memory performance (i.e., forgetting). The decline in performance after longer, as compared to shorter delays, reflects the loss of information due to imperfect maintenance processes, rather than imprecise encoding into memory or retrieval processes that are identical in all delay conditions. Thus, comparing performance following two different intervals enables us to isolate the effect of forgetting and maintenance processes from effects related to encoding and retrieval processes.

One of the first studies to address short-term visual forgetting following various delay intervals was conducted by Phillips (1974). Participants were asked to detect a change in two consecutive sets of checkerboard stimuli that either differed by one cell or were identical. Drops in performance during the first 600 ms were shown to reflect a loss of low-level sensory information (i.e., they were sensitive to small position translations of the whole stimulus), whereas the loss of information following longer delays was not sensitive to small position translation and therefore was considered to be driven by forgetting processes in more abstract short-term visual memory. The current study addresses the latter form of forgetting that takes place following retention intervals of a few seconds but not less than 1 s.

Not all studies have documented forgetting at this time scale. Recent studies by Ricker and Cowan (2010, 2014) found that only characters that were unfamiliar to participants (i.e., Hebrew letters shown to participants who were not Hebrew speakers) were forgotten, whereas familiar letters (i.e., English letters for participants who were literate in English) were not forgotten. The authors concluded that forgetting is typically counteracted by a rehearsal process that is more effective for familiar stimuli that can be easily named. Thus, in the current study we used unfamiliar stimuli that are hard to name. In addition, Ricker and Cowan (2014) found forgetting mainly in conditions involving simultaneous presentation of memory items compared to sequential displays, and that most time-based forgetting occurred between 1 and 6 s and not at longer delays. Therefore, we used simultaneous displays of the memory array and retention intervals of a few seconds.

The above studies shed some light on rapid visual forgetting, but they do not address the mechanism involved in forgetting and failing to report the features of the previously displayed item correctly. One possibility is that following extended retention intervals, individuals are not able to access some of the memory representations that were accessible following shorter intervals. Alternatively, people may be able to maintain and access the object in memory but its representation becomes noisier and less precise with time.

To address this question, studies typically implement a delayed estimation task (Prinzmetal et al., 1998; Wilken and Ma, 2004; Zhang and Luck, 2008; Bays et al., 2009). In this paradigm, participants are required to reproduce a previously observed stimulus from an analog cyclic scale, such as the color of the corresponding item, by clicking on a color wheel. These tasks encourage participants to remember the fine details of an item, rather than its verbal tag and enable the documentation of the distribution of errors, thus providing data on the type of errors committed by participants. For example, a complete failure to access a memory representation should be manifested as a uniform distribution of errors across the scale, whereas a degradation in the fidelity of a representation should lead to a broader distribution of errors around the correct value.

Results obtained on these delayed estimation tasks suggest that extending the delay interval influences both types of errors: it increases the number of errors distributed randomly on the reporting scale, as well as broadens the distribution of errors around the correct target. In one study (Zhang and Luck, 2009), participants were shown displays with three simple objects (e.g., patches of color and shapes) and were required to reproduce one of the objects after a delay of 1, 4, or 10 s. Extending the retention interval led to a significant increase in random errors and to a modest, insignificant effect on the width of the distribution of errors around the correct feature of the item. A more recent study that used a larger variety of memory loads showed that more items in memory lead to steeper forgetting, which was reflected in both random errors as well as less precise reports (Pertzov et al., 2016).

Overall, these studies imply that when multiple, hard to verbalize visual objects are maintained in memory over extended time intervals, performance declines, as manifested in a greater number of random errors and more variable responses. Critically, all these studies addressed memory for simple features such as orientation, color, and simple shape elements. These stimuli, however, do not reflect the demands placed on the visual system in real life situations. Under such conditions, we hardly ever need to remember simple shapes and colors, but rather, we are required to remember complex objects such as the identity of the person in front of us in the line to a ticket booth. This distinction raises a critical question that has not been addressed to date; namely, how do people forget ecological (or more complex) objects? The present study focuses on memory for faces, given their unique ecological significance.

The way ecological, complex objects are maintained in memory may differ from the way basic features are maintained. Basic features are processed (Tong, 2003) and maintained (Harrison and Tong, 2009) in low-level visual cortex, whereas complex objects are processed and maintained in cortical regions higher up in the processing hierarchy (Grill-Spector and Malach, 2004). Indeed, the ability to remember complex objects in WM was shown to be more limited than the ability to remember simple features (Jiang et al., 2008). However, to the best of our knowledge, no study has reported how complex objects are forgotten across extended retention intervals. Two recent studies have used a delayed estimation task with face stimuli, but the employment of a fixed retention interval precluded the assessment of maintenance processes and forgetting. In one study (Lorenc et al., 2014), participants were asked to report the identity of a previously displayed face out of a set of 80 possible computer generated faces that varied continuously in terms of age and gender. The study showed that simply turning a set of faces upside-down led to an increase in the width of the distribution of errors but did not modulate the fraction of random errors. Thus, this study suggests that memory representations of inverted faces are less precise (Lorenc et al., 2014). Another study (Zhou et al., 2018) investigated memory of faces of people from the same vs. other race with respect to the observer. They found that following long encoding time, the other race effect (ORE) was reflected in more random errors in the other race condition. When encoding time was more limited, other race faces were reported less precisely. The authors concluded that the ORE is driven by an inefficient encoding of other-race faces due to lack of visual experience with such faces.

As noted above, the usage of a fixed retention interval in these two studies did not enable them to isolate the effect of forgetting since reporting errors could be attributed also to visual perception, memory encoding and retrieval. Moreover, the usage of a limited face dataset in these experiments might have encouraged participants to attach verbal tags to the stimuli (e.g., the young man) and therefore confounded any direct assessment of VWM, a point which we further elaborate on in the section “Discussion”.

The present study investigated how faces are forgotten by using a delayed estimation task. One possibility is that faces are forgotten similarly to simple objects – hence their memory becomes less precise with time and sometimes becomes completely inaccessible. Alternatively, it could be that the precision of memory is stable and complex memory representations become inaccessible with time, or vice versa, that precision degrades with time but all representations stay accessible. To validate that the process we investigate is immediate forgetting of active representations we have used a large set of natural faces (with comparable age) that were displayed simultaneously. The use of a large set of stimuli, as opposed to a single set in all trials, is expected to hamper the usage of verbal and long-term memory strategies. This procedure, along with the incorporation of a delayed estimation task with various delay intervals enabled us to directly explore the mechanism behind immediate forgetting of complex objects from VWM for the first time.

Experiment 1

Methods

Participants

Twelve university students from the Hebrew University of Jerusalem, with normal or corrected-to-normal vision and normal color vision according to self-reports (mean age: 24.3 ± 1.8, eight female) participated in Experiment 1, which consisted of three 1-h experimental sessions. The study was approved by the Hebrew University ethics committee. All participants provided informed consent and received course credit or monetary compensation (∼$10.00 per hour).

Stimuli

One hundred and ninety realistic, color pictures of faces (78 female and 112 male) were taken from the following databases: Productive Aging Lab Face Database (Minear and Park, 2004), The IMM Face DB (Nordstrøm et al., 2004), and the Glasgow Unfamiliar Face Database (GUFD) (Burton et al., 2010). All faces had a neutral expression. The photos were cropped in a fixed round form, without hair (using Adobe Photoshop CS6). To study VWM and prevent verbal tagging and using long-term memory strategies, all faces were displayed only once in a given block and all faces in a trial had similar age, gender, skin tone and facial shape (e.g., cheekbones, jaw line). Each trial consisted of a circle of 18 faces (Figure 2) composed of three original faces from the pool and five morphed faces (Abrosoft FantaMorph deluxe V5) between each pair (83%A/17%B, 67%A/33%B, 50%A/50%B, 33%A/67%B, 17%A/83%B). Stimuli were presented on a 24-inch Dell U2412M monitor (resolution 1920 × 1080) and participants were positioned at a viewing distance of 60 cm from the screen.
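The composition of the report circle can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' MATLAB code; the function name `build_report_circle` and the labels A, B, and C are hypothetical. It lays out three original faces with the five morph levels listed above between each neighbouring pair, which gives the 18 positions of the circle.

```python
# Morph weights between neighbouring originals, as listed in the Stimuli section
# (percentage of the first face; the remainder comes from the second face).
MORPH_STEPS = [83, 67, 50, 33, 17]

def build_report_circle(originals=("A", "B", "C")):
    """Return 18 (face_1, pct_1, face_2, pct_2) entries in circular order.

    The labels are placeholders (an assumption for illustration); in the
    experiment each label would stand for one original face image.
    """
    circle = []
    n = len(originals)
    for i, first in enumerate(originals):
        second = originals[(i + 1) % n]  # wrap around so C morphs back into A
        circle.append((first, 100, second, 0))  # the un-morphed original
        for pct in MORPH_STEPS:
            circle.append((first, pct, second, 100 - pct))
    return circle

circle = build_report_circle()
```

With face A as the target at position 0, the other two originals fall six steps away on either side, and seven of the 18 positions contain no fraction of the target.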

Procedure and Experimental Design

The experiment was programmed in MATLAB and Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). The segment of the experimental design common to all the experiments is illustrated in Figure 1. Each trial began with the presentation of a central fixation cross (white, 3 pixels, 0.08° of visual angle) for 1,000 ms. This was followed by a stimulus array consisting of one or three faces (each face picture was displayed in 200 × 200 pixels, 5.17° × 5.17°). In trials with three faces in the memory array, faces were placed on the circumference of a circle with a radius of 150 pixels (3.88°) just above fixation and 120° clockwise and anticlockwise of the vertical meridian. In trials with one face in the memory array, the face was displayed randomly in one of those locations. The memory array was displayed on a black background for 1,500 and 4,500 ms for the 1 and 3 face conditions, respectively. Participants were instructed to remember the faces and, after a variable delay (1 or 6 s), to report the identity of one of the faces (the specific target face was cued by an empty circle at its original location). Participants reported the face identity by selecting a face from a circle of 18 faces [with a radius of 500 pixels (12.88°) around fixation; see illustration in Figure 2]. The circle was randomly rotated on every trial in order to hamper learning of the position of the original faces.
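The layout of the memory array follows from simple trigonometry. The sketch below is a hypothetical Python illustration (the experiment itself ran in MATLAB); the function name and the sign convention (offsets relative to fixation, angles measured clockwise from the vertical meridian, positive `dy` meaning upward) are assumptions made for the example.

```python
import math

def memory_array_positions(radius_px=150.0, angles_deg=(0.0, 120.0, -120.0)):
    """Offsets (dx, dy) in pixels from fixation for each face location.

    Assumed convention: angles are clockwise from the vertical meridian,
    dx is positive to the right and dy is positive upward.
    """
    positions = []
    for angle in angles_deg:
        theta = math.radians(angle)
        positions.append((radius_px * math.sin(theta),
                          radius_px * math.cos(theta)))
    return positions

positions = memory_array_positions()
```

The first location sits 150 pixels directly above fixation; the other two lie 75 pixels below fixation and roughly 130 pixels to either side.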

FIGURE 1. (A) Experimental design of one working memory trial. One or three faces (upright or inverted) are presented, followed by 1 or 6 s of a blank delay. Then, a spatial cue indicates which face from the memory array should be reported by selecting a face from the 18 face report circle. During the reporting stage the selected face appeared at the cued location. (B) The memory display of Experiment 1 consisted of one or three upright faces. (C) The memory display of Experiment 2 consisted of three upright or inverted faces. (D) The memory display of Experiment 3 consisted of one upright or inverted face.

FIGURE 2. Analysis technique. (A) The report circle comprised 18 faces. The three original (un-morphed) faces are marked by red rectangular frames. In this example, face ‘0’ is the target face on which the participant is required to report, as this is the correct answer. Faces 6 and –6 are the other two original faces, and all the faces between them are linear morphs between the two original faces. When the subject selected a face that includes any resemblance to the target face (–5 to 5), this was treated as a precision error. Selection of a face that had no resemblance to the target face (errors above 5 and below –5) was treated as a random error. (B) An example of an error distribution of one participant in one condition. Precision errors are marked in green and random errors in pink.

There were two types of trials, distributed randomly within the three-face condition. In one, the 18 face report circle was composed of the target face and two novel faces. In the second type, the three faces from the memory array composed the report circle. The latter type of trial was harder because participants could misremember the exact location of the target face and erroneously report one of the other faces that appeared in the memory array (i.e., source error). In the former type, the two other faces that composed the report circle were not related to the faces in the memory array, so participants were not likely to report the wrong face simply because of a confusion with a face from the memory array. We used the two types of trials in order to control for the existence of such source errors in the experiment (i.e., reporting the wrong face due to misremembering of its location in the memory array). Previous studies have shown that such errors contribute significantly to forgetting of simple features such as orientation (Pertzov et al., 2016) and position (Pertzov et al., 2012). The chi-square analysis described below validated that such confusion errors were observed in the condition in which the report circle consisted of the three faces in the memory array but not when it consisted of only one displayed face.

One block of the task included 30 trials consisting of all conditions in equal proportions: 10 trials with one face, and 20 with a three-face array (10 of each trial type described above). Half of the trials had a 1-s delay, and half had a 6-s delay. None of the faces was repeated within a block and trials were randomly ordered within a block. Participants completed as many blocks as possible in an hour (between 4 and 6). To encourage participants to engage in the task, feedback was presented every 10 trials, depicting the average error rate on the last 10 trials (the error rate calculation is described in the “Data Analysis” section). A score of 100 was given if the participant’s average magnitude of error was less than 1, and the score decreased with an increased error rate to a score of 60. The data of all experiments are available via the OSF at https://osf.io/t59p6/?view_only=37c9d58899774d8ba5615a9137716194.

Data Analysis

First, we analyzed the averaged size of absolute error when each report was considered with respect to the correct answer: error size was calculated as the distance between the participant’s answer and the correct target face. For example, participants were assigned an error score of zero if they selected the target face they were presented with. For a face adjacent to the target face (most similar morph), the error was 1 or −1 for clockwise or anticlockwise errors respectively. The errors on all trials of each participant and in each condition yielded a frequency distribution of errors (Figure 2B). To detach the analysis from any assumptions regarding the distribution of errors (Ma, 2018), we first analyzed the results using the mean absolute raw errors of each participant in each condition, similarly to earlier studies using a similar experimental procedure (Pertzov and Husain, 2013; Pertzov et al., 2013, 2017; Liang et al., 2016). Next we divided the distribution of errors into precision and random errors as illustrated in Figure 2.
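The error measure described above amounts to a signed circular distance on an 18-position wheel. The following Python sketch shows one way to compute it; it is not the authors' code, and the function names and the assumption that wheel positions are indexed 0 to 17 in clockwise order are introduced here for illustration only.

```python
N_FACES = 18  # positions on the report circle

def signed_error(response_pos, target_pos, n=N_FACES):
    """Signed circular distance in wheel steps (range -8..9 for n = 18).

    Assumes positions are indexed clockwise, so positive values are
    clockwise errors and negative values are anticlockwise errors.
    """
    diff = (response_pos - target_pos) % n
    return diff - n if diff > n // 2 else diff

def mean_absolute_error(responses, targets):
    """Mean absolute raw error over the trials of one participant and condition."""
    errors = [abs(signed_error(r, t)) for r, t in zip(responses, targets)]
    return sum(errors) / len(errors)
```

For example, a response one step anticlockwise of a target at position 0 (that is, position 17) yields an error of −1, and the face directly opposite the target yields the maximal error of 9.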

Many of the studies that used delayed estimation tasks have used a data fitting procedure to dissociate the distribution into two or three underlying components (Zhang and Luck, 2009; Lorenc et al., 2014; Pertzov et al., 2016). However, this approach was less appropriate here because the distribution of errors is not likely to be smooth and cyclic as in the feature domain. While in other stimulus domains the similarity of the items in the reporting scale gradually changes across the scale, in this experiment only part of the stimuli were similar to the target object (all morphs that included the target face) and other stimuli were completely different from the target item (the two non-target original faces and the morphs between them). Hence, we explored the different types of errors directly by extracting two summary statistics from each distribution of errors: (1) Proportion of random errors: trials in which a participant chose a face from the circle that did not have any resemblance to the target face [the morph did not include any fraction of the target face (errors 6,7,8,9 in absolute value)]. In such cases we assumed that the participant did not remember the target face, and therefore just guessed. To obtain a proportion value, the number of such errors was divided by the overall number of trials. (2) Precision of recall: calculated based on trials in which a participant reported a face that had some resemblance to the target face (i.e., was a morph of the target face). In such cases, we assumed that participants had some recollection of the target face. To quantify the degree of recall precision, we averaged the magnitude of the absolute errors from 0 to 5. Note that when participants did not have any recollection of the target face, they were likely to guess a random face and therefore sometimes report a face with some resemblance to the target face. Therefore, the number of random errors per bin (average number of errors of 6,7,8,9 in absolute value) was subtracted from the precision errors, and added to the proportion of random errors (see uniform distribution in Figure 2).
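The two summary statistics and the guess correction can be sketched as follows. This Python example is one possible reading of the procedure, not the authors' implementation: it assumes signed errors in the range −8 to 9, treats each of the seven wheel positions without target content as one bin, and floors corrected counts at zero. The function name `summarize` and these choices are assumptions.

```python
from collections import Counter

PRECISION_MAX = 5                       # |error| <= 5: response contains some of the target
RANDOM_BINS = [-8, -7, -6, 6, 7, 8, 9]  # signed positions with no target content (assumed binning)
PRECISION_BINS = list(range(-PRECISION_MAX, PRECISION_MAX + 1))

def summarize(errors):
    """Return (proportion_random, precision) after a uniform-guess correction."""
    counts = Counter(errors)
    n_trials = len(errors)
    # Estimated number of guesses landing on any single wheel position.
    guess_per_bin = sum(counts[b] for b in RANDOM_BINS) / len(RANDOM_BINS)
    # Remove the estimated guesses from every precision bin
    # (flooring at zero is an assumption of this sketch).
    corrected = {b: max(counts[b] - guess_per_bin, 0.0) for b in PRECISION_BINS}
    n_remembered = sum(corrected.values())
    # Everything not attributed to memory is counted as a random error.
    proportion_random = (n_trials - n_remembered) / n_trials
    if n_remembered > 0:
        precision = sum(abs(b) * c for b, c in corrected.items()) / n_remembered
    else:
        precision = float("nan")
    return proportion_random, precision

example_errors = [0] * 10 + [1] * 4 + [-1] * 4 + [6, -7, 8, 9, -6, 7, -8]
example_result = summarize(example_errors)
```

In this made-up example, seven large errors spread over seven positions imply one guess per position, so one trial is removed from each occupied precision bin before the mean absolute precision error is computed.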

Note that the two summary statistics: (1) Proportion of random errors and (2) Precision of recall, are somewhat independent of each other. The proportion of random errors is sensitive only to the proportion of trials defined as random while precision of recall is sensitive to the magnitude of errors which is related to the shape of the error distribution rather than to the proportion of trials in it. Thus, it is reasonable that an experimental manipulation would modulate the proportion of random errors but not the precision of recall, and vice versa.

For statistical analysis, we applied a repeated measures ANOVA with number of faces (1 or 3) and delay duration (1 or 6 s) as factors. The mean absolute errors, the proportion of random errors and the average precision errors were the dependent variables (three separate ANOVAs). We also subjected the data to a JZS Bayes factor ANOVA (Rouder et al., 2012; Love et al., 2015; Morey and Rouder, 2015). Whereas a typical analysis of p-values does not enable the interpretation of null effects, this Bayesian technique allowed the evaluation of the strength of the evidence in favor of the null effect. In the main text we report the two models with the highest posterior probability and the probability ratio between them. A table with the full set of Bayes Factors (BFs) is provided in the Supplementary Materials – 1.
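In a 2 × 2 fully within-subject design, each main effect and the interaction has one degree of freedom, so its F statistic equals the squared one-sample t statistic of the corresponding per-participant contrast. The Python sketch below illustrates this equivalence; it is a minimal illustration with hypothetical function names, not the software used in the study. It also includes the standard partial eta squared formula, with which the effect sizes reported in the Results appear to agree (for instance, F(1,11) = 19.103 gives 0.635).

```python
import math

def within_effect_F(contrast_scores):
    """F(1, n-1) for a one-degree-of-freedom within-subject effect.

    `contrast_scores` holds one value per participant, e.g. the mean at
    6 s minus the mean at 1 s (averaged over set size) for the delay effect.
    """
    n = len(contrast_scores)
    mean = sum(contrast_scores) / n
    var = sum((x - mean) ** 2 for x in contrast_scores) / (n - 1)
    t = mean / math.sqrt(var / n)
    return t * t, n - 1  # F statistic and error degrees of freedom

def partial_eta_squared(F, df_error, df_effect=1):
    """Partial eta squared recovered from an F statistic."""
    return (F * df_effect) / (F * df_effect + df_error)
```

The interaction can be tested in the same way by using the difference between the two delay effects (three faces minus one face) as the contrast score.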

We used chi-square analysis to test whether two distributions were significantly different with no assumption on the type of distribution of errors. Because chi-square analysis requires a large number of samples per bin (Armitage et al., 2002), we collapsed the distributions of all participants into a single distribution.

First, we used chi-square to complement our comparisons between the average precision errors by comparing the entire distributions of precision errors (we used errors of 5 and below) rather than comparing a single summary statistic.
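A comparison of two binned error distributions can be sketched as a Pearson chi-square test of homogeneity. The Python below is an illustrative sketch under the assumption that precision errors are binned by signed error from −5 to 5 (11 bins, hence 10 degrees of freedom); the function name is hypothetical, and the constant is the standard critical value for 10 degrees of freedom at α = 0.05.

```python
CHI2_CRIT_DF10 = 18.307  # chi-square critical value, df = 10, alpha = 0.05

def chi_square_two_samples(counts_a, counts_b):
    """Pearson chi-square statistic and df for two binned count vectors."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    stat = 0.0
    used_bins = 0
    for a, b in zip(counts_a, counts_b):
        col = a + b
        if col == 0:
            continue  # bins that are empty in both samples carry no information
        used_bins += 1
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col / grand
            stat += (observed - expected) ** 2 / expected
    return stat, used_bins - 1
```

A statistic above the critical value indicates that the two pooled distributions differ at the 0.05 level.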

We also used chi-square analysis to validate our assumption about the uniform distribution of random errors. As mentioned above, we assumed that if a participant did not remember the target face, this constituted a guess and therefore the distribution of guesses should not be different from a uniform distribution. For the purpose of validation, we ran a chi-square to compare the distribution of errors above 5 to a theoretical uniform distribution. Note that unlike the chi-square analysis of precision errors in which two empirical distributions were compared, all the random distributions were expected to be uniform. Therefore, in the random error distributions, we did not compare two empirical distributions (both were assumed to be uniform) but rather each distribution to a theoretical uniform distribution to validate our assumption. In most of the conditions the distributions of random errors (>5) seem to spread uniformly [all χ²(3) < 6.943, all p > 0.05]. However, the distribution of errors in the condition in which the three faces from the memory array composed the report circle was significantly different from a uniform distribution [χ²(3) = 15.89, p ≤ 0.001]. This is expected since these trials were harder and participants might have misremembered the exact location of the target face and reported one of the other faces from the memory array (i.e., source error). In fact, in this condition the distribution of random errors deviated from uniformity because participants tended to select the other two original faces in the circle (error of 6 and -6) more than the other errors defined as random (>6) [paired one-tailed t-test (10) = 1.825, p = 0.049]. Because the focus of this study was not on source errors but rather on the rate of forgetting and its relationship to random and precision errors, we excluded this condition from the remainder of the analyses reported in the main text. Further analyses and a discussion of source errors in this condition are described in the Supplementary Materials – 2. Note that including this condition in the analysis did not lead to qualitative changes in the results.
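The uniformity check can be sketched as a chi-square goodness-of-fit test. In the Python below, the function name and the optional `weights` argument are hypothetical, and the binning is an assumption: four bins for absolute errors of 6, 7, 8, and 9 match the reported three degrees of freedom, but the text does not state whether the single position at distance 9 was given half the expected count of the other bins. The constant is the standard critical value for 3 degrees of freedom at α = 0.05.

```python
CHI2_CRIT_DF3 = 7.815  # chi-square critical value, df = 3, alpha = 0.05

def chi_square_gof(observed, weights=None):
    """Pearson goodness-of-fit statistic against expected proportions.

    `weights` gives the relative expected frequency of each bin; when it
    is omitted, every bin is expected to be equally likely.
    """
    if weights is None:
        weights = [1.0] * len(observed)
    total = sum(observed)
    weight_sum = sum(weights)
    stat = 0.0
    for obs, w in zip(observed, weights):
        expected = total * w / weight_sum
        stat += (obs - expected) ** 2 / expected
    return stat, len(observed) - 1
```

Passing `weights=[2, 2, 2, 1]` would encode the alternative assumption that absolute errors of 6, 7, and 8 each pool two wheel positions while an error of 9 corresponds to a single position.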

Results

We first calculated the mean raw error: the absolute values of the errors of each participant and each condition were averaged. A repeated measures ANOVA confirmed that the general mean (absolute) error rate increased with the increase in set size [Set size main effect: F(1,11) = 19.103, p ≤ 0.001, η² = 0.635]. There was a descriptive increase in error rate with an increase in delay duration [Delay main effect: F(1,11) = 4.231, p = 0.064, η² = 0.278], and no significant interaction [interaction: F(1,11) = 0.463, p = 0.51, η² = 0.04].

A JZS Bayes factor ANOVA with a default prior supported these results. The model, including the two main effects of delay and set size, yielded the highest Bayes factor and was more probable than all the other models (BF 2 main effects without interaction = 3.458 vs. BF Set size main effect = 2.092, BF-Ratio: 1.652).

Next, out of each error distribution of a specific condition and subject, we extracted two summary statistics: (1) Random errors as a proportion, and (2) Precision of recall: the average magnitude of the precision errors (error-size ≤ 5).

Random Errors

First, we verified that the large errors (>5) we refer to as random were indeed uniformly distributed. Chi-square analysis confirmed that all the distributions (two delays and two set sizes) were not significantly different from the expected uniform distribution (p > 0.05 for all conditions).

A repeated measures ANOVA (Figure 3A) confirmed that the proportion of random errors increased with delay duration [Delay main effect: F(1,11) = 6.286, p = 0.029, η² = 0.364], as well as with the increase in set size [Set size main effect: F(1,11) = 19.437, p ≤ 0.001, η² = 0.639]; but with no significant interaction [interaction: F(1,11) = 0.524, p = 0.484, η² = 0.045]. This analysis indicates that longer delays as well as increased memory load increase random errors but the two do not interact.

FIGURE 3. Results of Experiment 1. The memory display consisted of one or three upright faces. A repeated measures ANOVA showed that (A) Random errors were modulated by delay duration (x-axis) and set size (light and dark pink). (B) Precision errors were not affected by either delay duration (x-axis) or set size (light and dark green). *p < 0.05.

A JZS Bayes factor ANOVA (Rouder et al., 2012; Love et al., 2015; Morey and Rouder, 2015) with a default prior supported these results. The model including the two main effects of delay and set size yielded the highest Bayes factor and it was more probable than all the other models (BF 2 main effects without interaction = 5.98 vs. BF 2 main effects + interaction = 1.274, BF-Ratio: 4.7).

Precision Errors

A repeated measures ANOVA (Figure 3B) showed no effect of delay duration or set size, or interaction [Set size main effect: F(1,11) = 1.414, p = 0.259, η² = 0.114. Delay main effect: F(1,11) = 0.417, p = 0.532, η² = 0.037. Interaction: F(1,11) = 0.015, p = 0.904, η² = 0.001].

Consistent with the results above, a JZS Bayes factor ANOVA on precision errors revealed that the null effect model was more probable than all the other possible models (BF Null effects = 4.117 vs. BF Set size main effect = 1.117, BF-Ratio: 3.7).

The chi-square statistic supported these results: the distributions underlying precision errors were not significantly different for the 1 and 6 s delays [χ²(10) = 13.386, p = 0.203]. Unlike the ANOVA analysis on the average magnitude of the precision errors (summary statistics of the distribution), the distributions of errors seemed to be different when one or three faces were displayed [χ²(10) = 20.558, p = 0.024]. We elaborate on this in the section “Discussion”.

Discussion

To study forgetting of complex objects we used a delayed estimation task with images of natural color faces, all with comparable age and gender. We found that longer delays increased the averaged size of errors, which was reflected mainly in random errors but not in the precision of recall. Moreover, increased memory load led to larger errors accompanied by a larger proportion of random errors, and a slight change in the distribution of precision errors (as captured by chi-square analysis) but did not change the averaged size of precision error (as captured by an ANOVA on the averaged precision error). Despite the large difference between simple features and complex objects, such as faces, these findings are somewhat consistent with the results obtained with tasks employing simple features (Zhang and Luck, 2009; Pertzov et al., 2016). Thus, when complex objects are forgotten their representation is rendered inaccessible and reflected in random errors. However, in contrast to the case of simple features (Pertzov et al., 2016), the precision of recall does not seem to degrade in longer retention intervals.

Experiment 2

Experiment 1 explored immediate forgetting of faces, under low and high memory loads (one or three faces). A wealth of behavioral literature posits that faces are processed in a qualitatively different fashion compared to other visual categories (Rossion, 2013). Hence, a critical question is whether the pattern of forgetting we observed in Experiment 1 is unique to faces and their form of processing, or alternatively, could be generalized to other complex objects. One hallmark of face specific processing mechanisms is the face inversion effect (Yin, 1969), that is, the disruption of face processing due to inversion compared to the effect of this manipulation on other objects. Inverted faces provide an ideal stimulus to employ in our experiment as their low level image properties are identical to those of upright faces, yet their processing is markedly different compared to upright faces (Yin, 1969; Rossion and Gauthier, 2002; Richler and Gauthier, 2014). Thus, in Experiment 2 we directly compared the forgetting of upright and inverted faces.