The Facial Expressive Action Stimulus Test. A test battery for the assessment of face memory, face and object perception, configuration processing, and facial expression recognition

There are many ways to assess face perception skills. In this study, we describe a novel task battery, the FEAST (Facial Expressive Action Stimulus Test), developed to test recognition of identity and expressions of human faces as well as of control stimulus categories. The FEAST consists of a neutral and emotional face memory task, a face and shoe identity matching task, a face and house part-to-whole matching task, and a human and animal facial expression matching task. The identity and part-to-whole matching tasks contain both upright and inverted conditions. The results provide reference data from a healthy sample of controls in two age groups for future users of the FEAST.


INTRODUCTION
Face recognition is one of the most ubiquitous human skills, yet the neural underpinnings of face perception are still a matter of debate. This is not surprising when one realizes that a face carries a broad range of attributes. Identity is but one of these, and it is not yet clearly understood how a deficit in identity recognition affects other aspects of face perception. Prosopagnosia, the absence of normal face identity recognition, is one of the most peculiar neuropsychological symptoms, and it has shed some light on the nature of face perception (de Gelder and . The term originally referred to loss of face recognition ability in adulthood following brain damage (Bodamer, 1947). Prosopagnosia can have a profound impact on social life, as in extreme cases patients have difficulty recognizing the face of their spouse or child. More recently, it has also been associated with neurodegenerative syndromes such as frontotemporal lobar degeneration (FTLD) (Snowden et al., 1989) and neurodevelopmental syndromes such as cerebellar hypoplasia (Van den Stock et al., 2012b). In addition to the acquired variant, there is now general consensus on the existence of a developmental form, i.e., developmental prosopagnosia (DP). A recent prevalence study reported an estimate of 2.5% (Kennerknecht et al., 2006) and indicated that DP typically shows a hereditary profile with an autosomal dominant pattern.
In view of the rich information carried by the face, an assessment of specific face processing skills is crucial. Two questions are central: first, which specific dimension of facial information is being assessed, and second, is its loss specific to faces? To date, there is no consensus or gold standard regarding the best tool and the performance level that allows diagnosing individuals with face recognition complaints as "prosopagnosic." Several tests and tasks have been developed, such as the Cambridge Face Memory Test (Duchaine and Nakayama, 2006), the Benton Facial Recognition Test (Benton et al., 1983), the Cambridge Face Perception Task (Dingle et al., 2005), the Warrington Recognition Memory Test (Warrington, 1984), and various tests using famous faces (such as adaptations of the Bielefelder famous faces test, Fast et al., 2008). Each of these provides a measure or a set of measures relating to particular face processing abilities: they either require matching facial identities or rely on memory for facial identities, which is exactly what is problematic in people with face recognition disorders. More generally, beyond the distinction between perception and memory, there is not yet a clear understanding of how the different aspects of normal face perception are related, so the testing of face skills should cast the net rather wide. A test battery suitable for the assessment of prosopagnosia should also take some additional important factors into account. First, to assess the face specificity of the complaints, the battery should include not only tasks with faces but also equally demanding conditions with visually complex control stimuli. Second, a finding classically advanced to argue for a specialization for faces concerns the configural way in which we seem to process faces, so the battery should enable the measurement of configural processing of both faces and objects.
The matter of configuration perception has also been tackled in several different ways, such as with the composite face task (Young et al., 1987), the whole-part face superiority effect (Tanaka and Farah, 1993) or, more recently, gaze-contingency (Van Belle et al., 2011). We chose to focus on the classical face inversion effect (Yin, 1969; Farah et al., 1995), whose simple method lends itself very well to studying object inversion effects. Besides the inversion effect, configuration- vs. feature-based processing can also be investigated more directly with part-to-whole matching tasks (de Gelder et al., 2003). Furthermore, previous studies have found positive relationships between the ability to process faces configurally and face memory (Richler et al., 2011; Huis in 't Veld et al., 2012; Wang et al., 2012; DeGutis et al., 2013), indicating that configural processing might facilitate memory for faces.
Additionally, there is accumulating evidence in support of an interaction between face identity and face emotion processing (Van den Stock et al., 2008; Chen et al., 2011; de Gelder, 2012, 2014), and there is increasing evidence that configuration processing is positively related to emotion recognition ability (Bartlett and Searcy, 1993; McKelvie, 1995; Calder et al., 2000; White, 2000; Calder and Jansen, 2005; Durand et al., 2007; Palermo et al., 2011; Tanaka et al., 2012; Calvo and Beltrán, 2014). We therefore extended our test battery with tasks targeting emotion recognition and emotion effects on face memory by adding an emotional face memory task and a facial expression matching task. In keeping with the rationale of our battery that each skill tested with faces must also be tested with a selected category of control objects, we used canine facial expressions as the control category.
Taking all these aspects into account, we constructed a face perception test battery labeled the Facial Expressive Action Stimulus Test (FEAST). The FEAST is designed to provide a detailed assessment of multiple aspects of face recognition ability. Most of the subtests have been extensively described and validated in prosopagnosia case reports and small group studies (de Gelder et al., 1998, 2000, 2003; de Gelder and Rouw, 2000a,b,c, 2001; Hadjikhani and de Gelder, 2002; de Gelder and Stekelenburg, 2005; Righart and de Gelder, 2007; Van den Stock et al., 2008, 2012a; Huis in 't Veld et al., 2012). However, the battery has not yet been presented systematically, because it had not been tested on a large sample of participants receiving the full set of subtests. Here, we report a new set of normative data for the finalized version of the FEAST, analyze the relationships between its tasks, and freely provide the data and stimulus set to the research community for scientific purposes.

Subjects
The participants were recruited between 2012 and 2015 from acquaintances of lab members and research students. Participation was voluntary and no monetary reward was offered. The following inclusion criteria were applied: right-handed, at least 18 years old, normal or corrected-to-normal vision, and normal basic visual functions as assessed by the Birmingham Object Recognition Battery (line length, size, orientation, gap, minimal feature match, foreshortened view, and object decision) (Riddoch and Humphreys, 1992). Exclusion criteria were a history of psychiatric or neurological problems, a history of concussion, and any other medical condition or medication use that would affect performance. This study was carried out in accordance with the recommendations and guidelines of the Maastricht University ethics committee, the "Ethische Commissie Psychologie" (ECP). The protocol was approved by the Maastricht University ethics committee (ECP number: ECP-128 12_05_2013).
In total, 61 people participated in the study. Three participants were 80, 81, and 82 years old. Although they met all inclusion criteria, they were excluded from the analyses as age outliers (more than 2 standard deviations from the mean). The final sample thus consisted of 58 participants between 18 and 62 years old (M = 38, SD = 15). Of those, 26 were male, between 19 and 60 years old (M = 38, SD = 15), and 32 were female, between 18 and 62 years old (M = 39, SD = 16). There was no difference in age between the genders [t(56) = −0.474, p = 0.638].
However, an age distribution plot (see Figure 1) reveals a gap, with only 6 participants between 35 and 49 years old. Therefore, the sample was split in two: a "young adult" group of participants younger than 42, and a "middle aged" group of participants between 47 and
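The age-outlier rule and the degrees of freedom of the gender comparison can be illustrated with a minimal Python sketch. It assumes a single pass of the mean ± 2 SD criterion using the sample SD (the text does not say which SD definition was used); the function names and the toy ages are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def split_age_outliers(ages, n_sd=2.0):
    """Split ages into (kept, excluded) with a single mean +/- n_sd * SD pass.

    Assumption: the sample SD (n - 1 denominator) is used.
    """
    m = mean(ages)
    limit = n_sd * stdev(ages)
    kept = [a for a in ages if abs(a - m) <= limit]
    excluded = [a for a in ages if abs(a - m) > limit]
    return kept, excluded


def pooled_t_df(n1, n2):
    """Degrees of freedom of a pooled-variance independent-samples t-test."""
    return n1 + n2 - 2


# Invented toy ages for illustration only.
toy_ages = [20, 21, 22, 23, 24, 25, 26, 27, 28, 80]
kept, excluded = split_age_outliers(toy_ages)
```

With 26 men and 32 women, a pooled-variance t-test has 26 + 32 − 2 = 56 degrees of freedom. Note that a single pass is used here; an iterative rule that recomputes the mean and SD after each exclusion could remove a different set of participants.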

Experimental Stimuli and Design
The face and shoe identity matching task, the face and house part-to-whole matching task, and the Neutral and Emotional Face Memory tasks (FaMe-N and FaMe-E) have been described previously, including figures with stimulus examples (Huis in 't Veld et al., 2012).

Face and Shoe Identity Matching Task and the Inversion Effect
The face and shoe identity matching task (de Gelder et al., 1998; de Gelder and Bertelson, 2009) was used to assess identity recognition and the inversion effect for faces and objects. The test contained 4 conditions in a 2 category (faces, shoes) × 2 orientation (upright, inverted) factorial design. The materials consisted of greyscale photographs of shoes (8 unique shoes) and faces (4 male, 4 female; neutral facial expression) in frontal view and 3/4 profile view. Each stimulus contained three pictures: one frontal view picture on top and two 3/4 profile view pictures underneath. One of the two bottom pictures (target) showed the same identity as the one on top (sample), and the other was a distracter. The target and distracter face pictures were matched for gender and hairstyle. Each stimulus was presented for 750 ms, and participants were instructed to indicate by a button press which of the two bottom pictures represented the same exemplar as the one on top. Participants were instructed to answer as quickly but also as accurately as possible, and responses made during stimulus presentation were collected. Following the response, a black screen with a fixation cross was shown for a variable duration (800-1300 ms). The experiment consisted of four blocks (one block per condition). In each block, 16 stimuli were presented 4 times in randomized order, for a total of 64 trials per block. Each block was preceded by 4 practice trials, during which participants received feedback about their performance (see Figure 2).
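The block structure above (16 stimuli × 4 repetitions, shuffled, with a jittered inter-trial interval) can be sketched in Python. This is an illustrative reconstruction, not the original experiment script: the function name is hypothetical, and drawing the interval uniformly in whole milliseconds is an assumption, since the text only gives the 800-1300 ms range.

```python
import random


def build_identity_block(stimulus_ids, repetitions=4, iti_range_ms=(800, 1300), seed=None):
    """Return one block as a shuffled list of (stimulus_id, iti_ms) trials.

    Each stimulus appears `repetitions` times. The inter-trial interval is
    drawn uniformly in whole milliseconds (an assumption; only the range is
    given in the task description).
    """
    rng = random.Random(seed)
    trials = [s for s in stimulus_ids for _ in range(repetitions)]
    rng.shuffle(trials)
    low, high = iti_range_ms
    return [(s, rng.randint(low, high)) for s in trials]


# One block per condition: 2 categories x 2 orientations.
conditions = [(c, o) for c in ("face", "shoe") for o in ("upright", "inverted")]
blocks = {cond: build_identity_block(range(16), seed=i) for i, cond in enumerate(conditions)}
```

Seeding each block separately keeps the orders reproducible while still differing between conditions.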

Face and House Part-to-whole Matching Task
This task was developed to assess holistic processing. The test also consisted of 4 conditions in a 2 category (faces, houses) × 2 orientation (upright, inverted) factorial design. Materials consisted of grayscale pictures of eight faces (four male; neutral facial expression, photographed in front view and with direct gaze) and eight houses. From each face, part stimuli were constructed by extracting the rectangle containing the eyes and the rectangle containing the mouth. House part stimuli were created using a similar procedure, but the parts consisted of the door or window. The trial procedure was similar to that of the face and shoe identity matching task: a whole face or house was presented on top (sample), with a target part picture and a distractor part picture presented underneath. Each trial was presented for 750 ms, and participants were instructed to indicate by a button press which of the two bottom pictures represented the same exemplar as the one on top. Participants were instructed to answer as quickly but also as accurately as possible, and responses made during stimulus presentation were collected. Following the response, a black screen with a fixation cross was shown for a variable duration (800-1300 ms). The experiment consisted of eight blocks (two blocks per condition). In each block, 16 stimuli were presented 2 times in randomized order, for a total of 32 trials per block and 64 trials per condition. Within blocks, the presented part (eyes or mouth, window or door) was randomized to prevent participants from attending to only one specific feature. The first block of each condition was preceded by 4 practice trials, during which participants received feedback about their performance (see Figure 3).
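A minimal sketch of how one part-to-whole condition could be assembled is given below. It assumes, without confirmation from the text, that the 16 stimuli per block are the 8 items crossed with their 2 parts; the names `PARTS`, `build_part_whole_block`, and `build_condition` are hypothetical.

```python
import random

# Assumed part labels per category, following the task description.
PARTS = {"face": ("eyes", "mouth"), "house": ("door", "window")}


def build_part_whole_block(category, rng, n_items=8, repetitions=2):
    """One block: every (item, part) stimulus shown `repetitions` times in random order.

    Assumption: 16 stimuli = 8 items x 2 parts. Shuffling the full list
    interleaves the two part types, so neither part is predictable.
    """
    stimuli = [(item, part) for item in range(n_items) for part in PARTS[category]]
    trials = stimuli * repetitions
    rng.shuffle(trials)
    return trials


def build_condition(category, orientation, seed=None, n_blocks=2):
    """Two blocks per condition (2 x 32 = 64 trials), each trial tagged with its condition."""
    rng = random.Random(seed)
    return [
        [(category, orientation, item, part) for item, part in build_part_whole_block(category, rng)]
        for _ in range(n_blocks)
    ]


face_upright = build_condition("face", "upright", seed=0)
```

Randomizing over the full (item, part) list, rather than alternating parts, is one simple way to meet the stated goal of preventing participants from attending to a single feature.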

Facial Expression Matching Task (FEM-H and FEM-C)
The FEM is a match-to-sample task used to measure emotion recognition ability for both human and canine faces. The experiment was divided into two parts. The first part consisted of human facial expressions (anger, fear, happiness, sadness, surprise, disgust). The materials consisted of grayscale photographs of facial expressions of 34 female and 35 male identities taken from the Karolinska Directed Emotional Faces (KDEF) (Lundqvist et al., 1998). This task has been used previously by Van den Stock et al. (2015). Each stimulus consisted of three pictures: one picture on top (sample) and two pictures underneath. One of the two bottom pictures showed a face expressing the same emotion as the sample; the other was a distracter. For the human stimuli, target and distracter pictures were matched for gender. Each trial was presented until a response was given, but participants were instructed to answer as quickly and accurately as possible. Following the response, a black screen with a fixation cross was shown for a variable duration (800-1300 ms). Each emotion condition contained 10 trials (5 male), in which the target emotion was paired once per gender with a distracter from each of the other five emotions, resulting in 60 trials in total. The first part was preceded by 4 practice trials, during which participants received feedback about their performance.
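The trial count of the human FEM part follows directly from its pairing scheme: 6 target emotions × 5 distracter emotions × 2 genders = 60 trials, 10 per emotion of which 5 male. The Python sketch below only enumerates that design; the function name is hypothetical and it says nothing about how individual KDEF photographs were assigned to trials.

```python
from itertools import product

EMOTIONS = ("anger", "fear", "happiness", "sadness", "surprise", "disgust")
GENDERS = ("female", "male")


def build_fem_h_design():
    """List (target, distracter, gender) triples for the human FEM part.

    Every target emotion is paired once per gender with each of the other
    five emotions: 6 targets x 5 distracters x 2 genders = 60 trials.
    """
    return [
        (target, distracter, gender)
        for target, distracter, gender in product(EMOTIONS, EMOTIONS, GENDERS)
        if target != distracter
    ]


design = build_fem_h_design()
```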
The second part consisted of canine facial expressions. In total, 114 pictures of dogs that could be perceived as angry (17), fearful (27), happy (17), neutral (29), or sad (24) were collected from the internet by EH. These pictures were validated in a pilot study with 28 students of Tilburg University, who participated in exchange for course credit. For each photo, the participants indicated whether they thought the dog was expressing anger, fear, happiness, sadness, or no emotion in particular (neutral), and rated the intensity of the emotional expression on a scale from one to five. Twelve angry, twelve fearful, and twelve happy canine expressions were correctly recognized by more than 80% of the participants and were used in the experiment. The canine part consisted of 72 trials in total, 24 per emotion condition, in which each target emotion was paired with each of the two distracter emotions 12 times. The canine part was preceded by 2 practice trials, during which participants received feedback about their performance (see Figure 4).
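The pilot selection rule (recognition by more than 80% of raters) and the 72-trial canine design (3 targets × 2 distracter emotions × 12 repetitions) can be sketched as follows. The data structure for the ratings and all function names are hypothetical, and the toy ratings are invented; note that a picture recognized by exactly 80% of raters does not pass a strict "more than" threshold.

```python
def recognition_rate(labels, intended):
    """Proportion of pilot raters whose label matches the intended emotion."""
    return sum(label == intended for label in labels) / len(labels)


def select_pictures(ratings, threshold=0.8):
    """Keep pictures recognized by more than `threshold` of raters.

    `ratings` maps a picture id to (intended emotion, list of rater labels).
    """
    return {
        pic: intended
        for pic, (intended, labels) in ratings.items()
        if recognition_rate(labels, intended) > threshold
    }


def build_canine_design(emotions=("anger", "fear", "happiness"), reps=12):
    """Pair each target with each other emotion `reps` times: 3 x 2 x 12 = 72."""
    return [(t, d) for t in emotions for d in emotions if t != d for _ in range(reps)]


# Hypothetical pilot ratings from ten raters per picture.
toy_ratings = {
    "dog1": ("fear", ["fear"] * 9 + ["sadness"]),        # 90% -> kept
    "dog2": ("anger", ["anger"] * 8 + ["neutral"] * 2),  # exactly 80% -> dropped
}
selected = select_pictures(toy_ratings)
canine_design = build_canine_design()
```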

Neutral Face Memory Task (FaMe-N)
Based on the Recognition Memory Test (Warrington, 1984), the FaMe-N consists of an encoding and a recognition phase. The stimuli consist of 100 grayscale Caucasian faces (50 male) with a neutral facial expression, in front view, with frontal eye gaze. The stimuli were taken from a database created at Tilburg University. Trials in the encoding phase consisted of the presentation of a single stimulus for 3000 ms, followed by a black screen with a white fixation cross for 1000 ms. Participants were instructed to encode each face carefully and were told that their memory for the faces would be tested afterwards. The encoding block consisted of 50 trials. The recognition phase immediately followed the encoding phase. A trial in the recognition phase consisted of the simultaneous presentation of two adjacent faces: the target face, which had been presented in the encoding phase, and a distracter face, which had not. Fifty trials were presented in random order, and the target appeared equally often on the left and on the right. Participants were instructed to indicate as quickly but also as accurately as possible which face had been presented in the encoding phase. The stimulus pairs were matched for gender and hairstyle (see Figure 5).
FIGURE 8 | Means and standard errors of the mean of the accuracy and reaction times on the face and house part-to-whole matching task split by age group.
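The FaMe-N recognition-phase ordering (50 target/distracter pairs in random order, with the target side balanced 25/25) can be sketched in Python. The function name and the face IDs are hypothetical; shuffling a pre-balanced list of sides is one common way to counterbalance, not necessarily the procedure used in the original software.

```python
import random


def build_recognition_phase(pairs, seed=None):
    """Shuffle target/distracter pairs and counterbalance the target side.

    `pairs` holds (target_id, distracter_id) tuples. Half of the trials
    (rounded down) show the target on the left, the rest on the right.
    """
    rng = random.Random(seed)
    n = len(pairs)
    sides = ["left"] * (n // 2) + ["right"] * (n - n // 2)
    rng.shuffle(sides)
    order = list(pairs)
    rng.shuffle(order)
    trials = []
    for (target, distracter), side in zip(order, sides):
        left, right = (target, distracter) if side == "left" else (distracter, target)
        trials.append({"left": left, "right": right, "target_side": side, "target": target})
    return trials


# Hypothetical IDs: "old" faces were shown at encoding, "new" faces were not.
pairs = [(f"old{i}", f"new{i}") for i in range(50)]
trials = build_recognition_phase(pairs, seed=7)
```

The same routine would cover the FaMe-E recognition phase with 48 pairs.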

Emotional Face Memory Task (FaMe-E)
This task was adapted from the FaMe-N by using emotional instead of neutral faces. Images were taken from the NimStim database (Tottenham et al., 2009) and from stimuli created at Tilburg University. The stimuli consisted of 96 photographs (53 female) with direct eye gaze and frontal view, in which the individuals expressed fear, sadness, or happiness. There was no overlap in identities with the FaMe-N. The procedure was similar to that of the FaMe-N, but with 48 trials (16 per emotion) in both phases. The two pictures in each stimulus pair were matched for emotion and hairstyle and, in most trials, also for gender (see Figure 6).

Analyses
Accuracy was calculated as the proportion of correct responses, both for the total score of each task and for each condition separately. Average response times from stimulus onset were calculated for correct responses only. For all tasks, reaction times faster than 150 ms were excluded from analyses. In addition, reaction times longer than 3000 ms were excluded for the identity matching and part-to-whole matching tasks, and reaction times longer than 5000 ms for the other tasks. The number of outliers is reported in the results. One control participant did not complete the face and house part-to-whole matching task. The SPSS dataset can be downloaded through the supplementary materials. In addition, internal consistency was assessed with the Kuder-Richardson coefficient of reliability (KR-20), reported as ρ_KR20, which is analogous to Cronbach's alpha but suitable for dichotomous measures (Kuder and Richardson, 1937).
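The scoring rules above and the KR-20 coefficient, ρ_KR20 = k/(k − 1) × (1 − Σ p_j q_j / σ²_X), where k is the number of items, p_j the proportion correct on item j, q_j = 1 − p_j, and σ²_X the variance of total scores, can be sketched in Python. This is not the authors' SPSS procedure. Two assumptions are flagged: trimmed trials are removed from the RT mean but still count toward accuracy, and σ²_X is the population variance (conventions differ). Function names and toy data are hypothetical.

```python
from statistics import mean, pvariance

# Trimming bounds (ms) from the Analyses section: identity and part-to-whole
# matching ("matching") versus all other tasks ("other").
RT_BOUNDS_MS = {"matching": (150, 3000), "other": (150, 5000)}


def summarize(responses, task_type):
    """Return (accuracy, mean correct RT, number of trimmed RTs).

    `responses` is a list of (correct, rt_ms) tuples. Assumption: trimmed
    trials are removed from the RT mean only; accuracy uses all trials.
    """
    low, high = RT_BOUNDS_MS[task_type]
    accuracy = sum(correct for correct, _ in responses) / len(responses)
    kept = [rt for correct, rt in responses if correct and low <= rt <= high]
    n_trimmed = sum(1 for _, rt in responses if not low <= rt <= high)
    mean_rt = mean(kept) if kept else float("nan")
    return accuracy, mean_rt, n_trimmed


def kr20(item_matrix):
    """Kuder-Richardson 20 for a participants x items matrix of 0/1 scores."""
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    sum_pq = 0.0
    for j in range(k):
        p = mean(row[j] for row in item_matrix)  # proportion correct on item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


# Invented toy data for illustration only.
responses = [(True, 500), (True, 700), (False, 600), (True, 100), (True, 3500)]
items = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
```

For the toy item matrix, Σ p_j q_j = 0.625 and σ²_X = 1.25, giving ρ_KR20 = 1.5 × 0.5 = 0.75.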
The results were analyzed using repeated measures GLMs, with the experimental factors as within-subject variables and age group and gender as between-subject variables. Interaction effects were further explored using post-hoc paired samples t-tests. The assumption of equality of error variances was checked with Levene's test. The assumption of normality was not formally tested, as the sample is larger than 30 and repeated measures GLMs are quite robust against violations of normality.
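The post-hoc comparisons are paired-samples t-tests, whose degrees of freedom are n − 1 (e.g., a group of 32 participants gives t(31)). A minimal pure-Python sketch follows; the function name and the toy scores are hypothetical, and a statistics package would also supply the p-value, which is omitted here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom (n - 1)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # mean difference / its standard error
    return t, n - 1


# Invented toy scores, e.g. upright vs. inverted for four participants.
x = [10, 12, 11, 13]
y = [8, 11, 9, 10]
t_stat, df = paired_t(x, y)
```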
Inversion scores were calculated by subtracting the accuracy and reaction time scores in the inverted condition from those in the upright condition. A positive score indicates higher accuracy, or longer reaction times, in the upright condition; a negative score indicates higher accuracy, or longer reaction times, in the inverted condition. To assess whether stronger configuration processing, as measured by a larger accuracy inversion effect, is related to better face memory and emotion recognition, multiple linear regression analyses were performed with accuracy scores on the FaMe-N, FaMe-E, and both FEM tasks as dependent variables and age, gender, and four inversion scores (face identity, shoe identity, face part, and house part) as predictors.
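The inversion score and the regression step can be illustrated with NumPy. The sketch uses only two predictors (age and one inversion score) and invented toy data whose outcome is built from known coefficients, so the fit can be checked; the actual analysis had six predictors (age, gender, and four inversion scores), and gender would need a numeric coding such as 0/1, which is an assumption here. Function names are hypothetical.

```python
import numpy as np


def inversion_score(upright, inverted):
    """Upright minus inverted; positive = more accurate (or slower) when upright."""
    return np.asarray(upright, dtype=float) - np.asarray(inverted, dtype=float)


def fit_ols(predictors, outcome):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    predictors = np.asarray(predictors, dtype=float)
    X = np.column_stack([np.ones(len(outcome)), predictors])
    coef, *_ = np.linalg.lstsq(X, np.asarray(outcome, dtype=float), rcond=None)
    return coef


# Invented toy data: accuracy proportions for six participants.
upright_acc = [0.95, 0.90, 0.85, 0.92, 0.88, 0.80]
inverted_acc = [0.80, 0.78, 0.82, 0.75, 0.85, 0.79]
face_inv = inversion_score(upright_acc, inverted_acc)
age = np.array([20, 30, 40, 50, 60, 25], dtype=float)
# Outcome built exactly from known coefficients so the fit can be checked.
memory = 0.5 - 0.002 * age + 0.4 * face_inv
coef = fit_ols(np.column_stack([age, face_inv]), memory)
```

Significance tests for the coefficients would require standard errors, which a dedicated statistics package provides; this sketch only shows how the point estimates relate to the predictors.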
Lastly, percentile ranks for all tasks and correlations between all tasks were calculated and reported for both the accuracy scores and the reaction times (see Tables 8-11).
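A percentile rank locates an individual score within the normative distribution. The article does not state which definition was used, so the Python sketch below assumes the common mid-rank convention (percentage of normative scores below the score, counting ties as half) and reverses the scale for reaction times so that faster responses receive higher ranks. The function name and toy norms are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(norms, score, higher_is_better=True):
    """Percentage of normative scores below `score`, counting ties as half.

    For reaction times, pass higher_is_better=False so that faster (lower)
    RTs receive higher ranks.
    """
    data = sorted(norms)
    below = bisect_left(data, score)
    equal = bisect_right(data, score) - below
    rank = 100.0 * (below + 0.5 * equal) / len(data)
    return rank if higher_is_better else 100.0 - rank


# Invented toy norms: accuracy proportions and RTs (ms).
acc_rank = percentile_rank([0.6, 0.7, 0.8, 0.9], 0.8)
rt_rank = percentile_rank([500, 600, 700, 800], 550, higher_is_better=False)
```

Here 0.8 sits above two of four normative scores and ties one, giving 62.5, and an RT of 550 ms is faster than three of four normative RTs, giving 75.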
A repeated measures GLM on accuracy scores with category (faces, shoes) and orientation (upright, inverted) as within-subject factors and gender and age group as between-subject factors revealed a category by orientation interaction effect […] shoes are matched slower than inverted ones [F(1, 54) = 7.560, p = 0.008, ηp² = 0.12] and the middle aged group responded slower [F(1, 54) = 15.174, p < 0.001, ηp² = 0.22; see Figure 7 and Table 1].
A repeated measures GLM on accuracy scores with category (faces, houses) and orientation (upright, inverted) as within-subject factors and gender and age group as between-subject factors revealed a three-way age group by category by orientation interaction effect [F(1, 53) = 5.413, p = 0.024, ηp² = 0.09]. Overall, both age groups were more accurate at part-to-whole matching of houses than of faces [F(1, 53) = 153.660, p < 0.001, ηp² = 0.75]. However, the young adult group matched parts to upright faces more accurately than to inverted faces [t(31) = 5.369, p < 0.001], whereas the middle aged group did not [t(24) = 0.952, p = 0.351]; no such difference was found for house inversion in either group [young adult group: t(31) = −0.958, p = 0.345; middle aged group: t(24) = −0.490, p = 0.628]. The same repeated measures GLM on reaction times revealed a three-way gender by age group by category interaction effect [F(1, 53) = 5.539, p = 0.022, ηp² = 0.10]. To explore this effect, the repeated measures GLM with category (faces, houses) and orientation (upright, inverted) as within-subject factors and age group as between-subject factor was run separately for males and females. For the female group, a category by age group interaction effect was found [F(1, 29) = 7.022, p = 0.013, ηp² = 0.20], whereas no significant effects were found for men (see Figure 8 and Table 2).
A repeated measures GLM on accuracy scores and reaction times with emotion (fear, happy, sad) as within-subject factor and gender and age group as between-subject factors revealed no significant main effects. However, a three-way gender by age group by emotion interaction effect was found for reaction times [F(2, 53) = 3.197, p = 0.049, ηp² = 0.11]. Figure 11 shows that the pattern of results for men and women is reversed when the age groups are compared: young adult women seem quicker than middle aged women to recognize previously seen sad faces. Indeed, when the repeated measures GLM was run separately for men and women, with emotion as within-subject factor and age group as between-subject factor, no effects of emotion or age group were found for men. For women, however, an emotion by age group interaction trend was found [F(2, 29) = 2.987, p = 0.066, ηp² = 0.17; see Figure 11 and Table 5].
In addition, we directly compared the FaMe-N and FaMe-E using a repeated measures GLM on accuracy scores and reaction times, with condition (neutral, fearful, happy, sad) as within-subject factor and gender and age group as between-subject factors, but no significant effects were found.

Relationships between Tasks
In the current sample, no significant predictive relationship was found between configuration processing, as measured by the inversion effect, and face memory scores (see Table 6).
Similarly, no significant relationship was found between configuration processing and emotion recognition scores, aside from a negative effect of age on accuracy on the FEM-H and FEM-C (see Table 7).
Furthermore, percentile ranks for accuracy scores (percentage correct) and reaction times are reported in Tables 8, 9, and the correlations between all tasks and subtasks of the FEAST are reported in Tables 10, 11.

DISCUSSION
In this study, we provide normative data from a large group of healthy controls on several face and object recognition tasks, face memory tasks, and emotion recognition tasks, and we report the effects of gender and age. All tasks have good internal consistency and an acceptable number of outliers. First, face and object processing and configuration processing were assessed. As expected, upright face recognition was more accurate than inverted face recognition, in line with the face inversion effect literature (Yin, 1969; Farah et al., 1995). Interestingly, even though the middle aged group was less accurate than the young adult group, their response patterns regarding face and object inversion were comparable. Configural processing, as measured by (upright minus inverted) inversion scores, was not influenced by gender or age, and the absence of any interaction effects with age group or gender indicates that category-specific configuration effects are stable across genders and from young adulthood to middle age. This suggests that the inversion effect is a suitable index for prosopagnosia assessment. Second, the face and house part-to-whole matching task seems to be harder than the whole face and shoe matching task, as indicated by overall lower accuracies, and young adults were more sensitive to face inversion in this task.
Third, we found that fear and sadness recognition on the FEM-H task was quite poor, whereas anger, disgust, surprise, and happiness were recognized with above 80% accuracy. Canine emotions were also recognized very well, although fear was again the worst recognized emotion and the middle aged group scored slightly lower and responded more slowly on this task, confirming that this subtest provides a good control.
Lastly, no effects of gender or age were found on neutral face memory, and participants scored quite well on this task, with an average of almost 80% correct. Similarly, no clear effects of age, gender, or emotion were found on face memory as measured with the FaMe-E, except that middle aged women seemed slower to recognize previously seen identities when these expressed sadness. Interestingly, this is in line with the "age-related positivity effect," a relative preference in older adults for positive over negative information (Samanez-Larkin and Carstensen, 2011; Reed and Carstensen, 2012). In general, the results corroborate those of other studies on the effect of emotion on memory (Johansson et al., 2004), although a wide variety of results has been reported in the literature (Dobel et al., 2008; Langeslag et al., 2009; Bate et al., 2010; D'Argembeau and Van der Linden, 2011; Righi et al., 2012; Liu et al., 2014). In addition, we did not find any relationship between configuration perception and face memory. This may be because, unlike samples that include both DPs and controls, a healthy control sample shows less variability in inversion scores and memory scores (i.e., most participants will not have configuration processing deficits similar to those of DPs, and, in contrast to DPs, most controls are not severely limited in face memory).
The results indicate that age is most likely a modulating factor when studying face and object processing, as the responses of the middle aged group were often slower. Besides a general age-related cognitive decline, one explanation can be found in the literature on the effect of age on face recognition, where an "own-age bias" is often found (Lamont et al., 2005; Firestone et al., 2007; He et al., 2011; Wiese, 2012). The own-age bias refers to the finding that individuals are more accurate at recognizing faces of people belonging to their own age category. For instance, children are better at recognizing child faces and adults are better at recognizing adult faces. Future users of the FEAST should compare the results of their participants with the appropriate age group, control for the effects of age or, ideally, test age-matched controls. Gender, on the other hand, does not seem to be as influential, but this article provides guidelines and data for both genders and age groups regardless.
Some limitations of the FEAST should be noted. One is the lack of a non-face memory control condition using stimuli of comparable complexity. However, a recent study with a group of 16 DPs showed that only memory for faces was impaired, in contrast to memory for hands, butterflies, and chairs (Shah et al., 2014), so for this group such a control condition might not be necessary. Also, the specific effects of all emotions, as well as of valence and arousal, may be taken into account in future research. The face memory test could be complemented with test images that show the face from a different angle in the test phase than in the training phase, as is done in the matching tests. In addition, the low performance on fear recognition should be investigated further. In short, the FEAST provides researchers with an extensive battery for assessing neutral and emotional face memory, whole and part-to-whole face and object matching, configural processing, and emotion recognition abilities.