Misaligned and Polarity-Reversed Faces Determine Face-specific Capacity Limits

Thoma, Volker; Ward, Neil; de Fockert, Jan W.

doi:10.3389/fpsyg.2016.01470

ORIGINAL RESEARCH article

Front. Psychol., 27 September 2016

Sec. Cognition

Volume 7 - 2016 | https://doi.org/10.3389/fpsyg.2016.01470


Volker Thoma¹

Neil Ward¹

Jan W. de Fockert²*

¹School of Psychology, University of East London, London, UK
²Department of Psychology, Goldsmiths, University of London, London, UK

Previous research using flanker paradigms suggests that peripheral distracter faces are automatically processed when participants have to classify a single central familiar target face. These distracter interference effects disappear when the central task contains additional anonymous (non-target) faces that load the search for the face target, but not when the central task contains additional non-face stimuli, suggesting there are face-specific capacity limits in visual processing. Here we tested whether manipulating the format of non-target faces in the search task affected face-specific capacity limits. Experiment 1 replicated earlier findings that a distracter face is processed even in high load conditions when participants looked for a target name of a famous person among additional names (non-targets) in a central search array. Two further experiments show that when targets and non-targets were faces (instead of names), however, distracter interference was eliminated under high load—adding non-target faces to the search array exhausted processing capacity for peripheral faces. The novel finding was that replacing non-target faces with images that consisted of two horizontally misaligned face-parts reduced distracter processing. Similar results were found when the polarity of a non-target face image was reversed. These results indicate that face-specific capacity limits are not determined by the configural properties of face processing, but by face parts.

Introduction

In modern daily life, people see many human faces, and increasingly this happens by looking at images (e.g., in photographs and social media). Although faces share the same basic parts (eyes, nose, mouth), recognition of individual faces appears to be fast and almost effortless in normal circumstances. One reason for the apparent ease of face recognition is the ability of the visual system to recognize a face as a whole, rather than process its individual features in a piecemeal fashion (Young et al., 1987; Tanaka and Farah, 1993; Laguesse and Rossion, 2013). However, there is recent evidence that only a limited number of faces can be recognized in parallel (Thoma and Lavie, 2013), indicating that face recognition has a limited capacity. The current study investigates whether face-specific capacity limits are associated with mental representations that rely on part-processing or processing of the whole face.

For some time, experimental evidence has suggested that face recognition is based on “automatic” processes that are deemed to be fast (Young et al., 1986), difficult to suppress intentionally (Wojciulik et al., 1998), and require only minimal attentional resources (Schneider and Chien, 2003, see Palermo and Rhodes, 2007, for a review). Human faces are also processed faster than any other visual category, including ape faces (Itier et al., 2011). Accordingly, one would expect face recognition to be relatively unhindered by limits in processing capacity, and only minimally affected if demand for visual attention (and therefore processing capacity) was allocated elsewhere. This was indeed observed in a number of behavioral (Jenkins et al., 2002; Reddy et al., 2004) and neuro-physiological studies (Neumann and Schweinberger, 2008).

One account that explicitly predicts capacity-limits in visual processing is perceptual load theory (PLT; Lavie and Tsal, 1994; Lavie, 1995; Lavie et al., 2004). The theory holds that, in tasks with low perceptual load (e.g., when the search for a visual target is undemanding because non-targets are few or easy to distinguish from the target), spare attentional capacity remains available for processing irrelevant distracters. However, at higher levels of perceptual load an irrelevant distracter is hardly or not at all processed (Lavie, 1995, 2005; Lavie and Cox, 1997) because the main task does not leave any spare capacity. In a typical experimental paradigm using binary categorization, Lavie et al. (2003; Experiment 2) asked subjects to search the center of a computer screen for the name of an object, among one, two, four, or six non-word letter strings, and categorize it as either belonging to the category of fruits or musical instruments, whilst ignoring a distracter image in the periphery. The distracter was either a photograph of the target (congruent condition) or a photograph from the opposite category (incongruent condition). The experiment showed faster response times in the congruent compared to the incongruent condition, indicating that the distracter image was processed, and—as predicted by load theory—this congruency effect was eliminated when the set size of non-targets in the center was increased.

But whereas perceptual load theory seems to adequately account for the fate of processing peripheral letters (Lavie and Cox, 1997) and objects (Lavie et al., 2009), the experimental evidence is different for faces as distracter stimuli. In a target search for letters (Jenkins et al., 2002) or names (Lavie et al., 2003), interference from task-irrelevant faces was not eliminated under high levels of task load. It was thus proposed that the apparent special status of faces may involve “automatic” processing at an early perceptual stage, which would be consistent with the theory that face processing is mediated by a specialized visual module in the brain (Fodor, 1983), triggered automatically in the presence of faces (Kanwisher et al., 1997; Farah et al., 1998). Indeed, there is evidence that recognition of faces is subject to rapid processing in comparison to non-face objects (Young et al., 1986) and that processing appears to be mandatory, meaning that it cannot be prevented at will (Wojciulik et al., 1998; Boutet et al., 2002; Palermo and Rhodes, 2007).

Despite these findings of preserved processing of peripheral faces under attentional load, recent research indicates that there are conditions when the processing of peripheral faces is reduced by capacity limits. Bindemann et al. (2005) showed that when participants categorized centrally shown names of famous people or national flags (as belonging to either the UK or US), famous distracter faces produced response competition effects, but these were eliminated when a face had to be categorized as a central target. A similar finding using priming measures was reported by Bindemann et al. (2007). Thus, it appears that processing of face distracters is capacity-free as long as a central task is not involved with face recognition as well.

To investigate whether these presumed category-specific capacity limits are apparent when the perceptual load of relevant processing is systematically varied, Thoma and Lavie (2013) conducted a series of experiments in which participants searched for the face of a famous politician or pop star and made speeded classification responses. Perceptual load was manipulated through changes in the relevant search set size by adding non-famous faces appearing with the target in the center of the screen. A task-irrelevant face that was the same as the target, or from a different category, was shown in the periphery. As in traditional perceptual load studies, faster and more accurate responses to a target face were observed when the distracter face was the same as the target, rather than from a different category, and this congruency effect was only observed when a single face was presented in the search array. Under high load, when additional non-target faces were added to the search set, the congruency effect was eliminated, indicating a maximum capacity of two to three faces. In a further experiment, Thoma and Lavie replicated the results of Lavie et al. (2003; Experiment 2), which demonstrated that, in a central name search task, response competition effects from incongruent peripheral face images are not affected by increases in perceptual load, removing the possibility that the face-specific perceptual load effects were due to inequity in the load manipulations between the face and name search tasks.

The results of Thoma and Lavie (2013) therefore showed that the processing of face distracters only depends on perceptual load when the load manipulation involves face stimuli. Recently, Thoma (2014) confirmed the face-specific aspect of load capacity in similar experiments. Importantly, that study also showed that when the central task was loaded with inverted non-target faces (while searching for an upright famous target face) the congruency effects were still reduced, just as observed with upright non-target faces. This was a surprising finding, as traditionally face recognition research makes a distinction between holistic processing of a whole face and “featural” processing, in which parts of the face are processed separately (Tanaka and Farah, 1993), in a way similar to that observed for processing non-face objects (Maurer et al., 2002). Holistic processing involves rapid classification through integration of facial features (eyes, nose, mouth), which show an established, first-order spatial relationship¹. Second-order relations, such as the metric distance between facial features, may then be processed to discriminate between faces (sometimes distinguished as “configural” processing, see Richler and Gauthier, 2014). Holistic processing was originally assumed to occur only when faces are in the upright orientation (Farah et al., 1998), and face recognition can be disrupted by introducing changes in spatial information, for example by presenting a face in an inverted orientation (Nederhouser et al., 2007). Inversion of faces is commonly believed to lead to more part-based processing, whilst having little disruptive effect on processing of the facial features themselves (Searcy and Bartlett, 1996).
This so-called face inversion effect (FIE; Yin, 1969) is regularly cited as important evidence that faces have a special status, since it demonstrates that inversion has a greater effect on recognition of a face than on recognition of other objects (but see Richler et al., 2011, for the view that upside-down faces may still be processed “holistically”). Yet, Thoma's (2014) finding that increasing perceptual load with upside-down faces also reduces distracter processing is strong evidence that the observed face-specific capacity limits are not, or not solely, determined by holistic face representations, at least in the sense of so-called first-order relations between parts. This raises the question of which other properties of face processing can explain category-specific load effects. One possibility is that the unique range of distinctive spatial frequencies (inherent in images of faces) is responsible for the observed capacity limits. The spatial frequencies present in a face image are the same for upright and upside-down faces, but different from those of non-face objects or letters (De Valois and De Valois, 1980; Costen et al., 1996), which would account for the findings of both Lavie et al. (2003) and Thoma (2014). However, previous experiments show that scrambled versions (which also retain the spatial frequencies of the original face) of distracter (peripheral) faces did not reduce congruency effects compared to the presence of an intact anonymous face. Thoma and Lavie (2013) also ruled out that spatial frequency determined face capacity limits (see Thoma, 2014, and Discussion Section for details).

The observation that there are no capacity effects from non-target faces with scrambled spatial frequency components, while at the same time face capacity effects persist with inverted faces, therefore suggests that face recognition limits are determined by the processing of specific face parts or local features rather than holistic face representations. Indeed, this concurs with recent evidence that face perception relies more on local facial characteristics than previously thought (Gaspar et al., 2008; Schwaninger et al., 2009; Gold et al., 2012). However, inversion of a face may affect face processing in a variety of ways: it may impede the computation of distances between parts such as the nose and eyes (which is thought to underlie face identification; Kemp et al., 1990; Bruce et al., 1991), or it may affect the way information about face parts is sampled (Gaspar et al., 2008; Gold et al., 2012). Recently, Hayward et al. (2016) showed that holistic processing captures both configuration-based and component-based information. Therefore, Thoma's (2014) finding that even inverted non-target faces eliminate target-distracter congruency effects, just as upright faces do, could be explained by face processing capacity relying on processing of parts rather than the first-order relations between them.

Another transformation that impairs the recognition of a face, whilst preserving identifiable features, is based on the Composite Face Effect (CFE; Taubert and Alais, 2009; Laguesse and Rossion, 2013). This is derived from the Composite Face Illusion (CFI) in which the top and bottom halves of two different individual faces are combined into a single composite, or chimeric image, making it more difficult to name the target top half of a familiar face, compared to when it is presented shifted sideways along the horizontal axis (Young et al., 1987). Even if two identical top halves are shown side by side, they are not perceived as from the same face if combined with bottom halves from two different individuals. This striking visual illusion (see Rossion, 2013) shows that aligned half faces cannot be perceived as independent from each other, and is strong evidence that faces are normally perceived as integrated wholes rather than perceived as a collection of features. This integration of the facial features into a Gestalt (a global picture) is reminiscent of the idea of “configural” (Sergent, 1984; Young et al., 1987) or “holistic” (Tanaka and Farah, 1993; Farah et al., 1998) processing—similar to the arguments regarding the inversion effect.

Several mechanisms may underlie the CFE. The misalignment between the two half faces increases the relative distance between the parts in the two halves, which may make individuation of each face easier (Diamond and Carey, 1986; Mondloch et al., 2003). If this were the case, then one would expect a linear relationship between degree of misalignment and the magnitude of the CFE. However, Taubert and Alais (2009) report that the degree of CFE did not differ between two levels of misalignment (25% vs. 50%). More recently, Laguesse and Rossion (2013) have shown that holistic processing is reduced when the half-faces are displaced horizontally by as little as 8.3% of the width of the face. Thus, there seems to be a qualitative breakdown of the perceptual whole (i.e., the first-order configuration of the features; Maurer et al., 2002; McKone et al., 2007) when face halves are even slightly misaligned. This would then lead to more featural processing, similar to the assumed effect of face inversion. We therefore predict that using misaligned faces as non-targets in a visual search set will result in similar effects on target-distracter congruency as were observed when inverted faces were used (Thoma, 2014).

A third type of image manipulation that has repeatedly been shown to disrupt the processing of faces is to create a negative of the original photo image (Galper, 1970; Phillips, 1972; Johnston et al., 1992). Reversing the contrast polarities of an image (also termed polarity reversal or negation) makes black areas white, light gray areas dark gray, and so forth. Like those of face inversion, the disruptive effects of polarity reversal on face recognition have been observed consistently across a number of experimental paradigms (Vuong and Tarr, 2004; Nederhouser et al., 2007), although there are differences in interpreting the mode of disruption. Some researchers have proposed that polarity reversal alters shading cues in a face, which impairs interpretation of its three-dimensional properties (Kemp et al., 1990; Johnston et al., 1992). It has also been suggested that polarity reversal disrupts the perception of second-order relations, such as the distance between facial features, which are widely accepted to play an important role in the perceptual representation of faces (Diamond and Carey, 1986; Hole et al., 1999; White, 2001). However, more recent evidence supports the hypothesis that the disruptive effect of polarity-reversed faces is driven by the resulting changes in surface pigmentation, i.e., their variation in reflectance (Bruce and Langton, 1994; Vuong and Tarr, 2004; Nederhouser et al., 2007). Notably, Liu et al. (2000) found that recognition was poor for faces missing surface pigmentation (but with intact 3D information). In other studies, employing faces with a similar pigmentation pattern but differing shape (Russell et al., 2006) or non-pigmented faces (Bruce and Langton, 1994), there was little or no effect of polarity reversal on face matching (but see Gilad et al., 2009, for evidence that polarity-reversal effects may be limited to some face parts).
Whatever the reasons, neurophysiological evidence suggests different mechanisms for inversion and polarity reversal: Itier and Taylor (2002) reported that electro-encephalogram (EEG) recordings showed different neural sources of early (P1) effects resulting from inversion compared to polarity reversal (see also Itier et al., 2006, for similar results with MEG). The research literature therefore suggests that the CFE and polarity reversal, like face inversion, specifically affect face recognition, but not, or only to a limited degree, recognition of non-face objects (Subramaniam and Biederman, 1997; Nederhouser et al., 2007).

We tested two predictions. If processing of misaligned half faces (presumed to be non-holistic in the sense of changed second-order relationships between parts) and/or polarity reversed faces (either affecting second-order relationships or face-part recognition itself) relies on the same processing capacity as does the processing of intact faces, then we expect that the presence of misaligned and polarity reversed faces, respectively, will reduce the processing of peripheral distracter faces (as upright and inverted faces do; Thoma, 2014). If, however, the nature of processing misaligned and polarity reversed faces means that they do not share processing resources with intact faces, then the misaligned and polarity reversed faces will impose fewer capacity demands, and peripheral distracter faces should receive processing (similar to the low load conditions in Thoma and Lavie, 2013). Furthermore, if face-specific capacity limits are determined by face parts or features (Gold et al., 2012) rather than configural properties (Maurer et al., 2002; Laguesse and Rossion, 2013), then we would expect only the misaligned face manipulation, but not contrast reversal, to load face-specific capacity.

The current investigation includes three experiments. Experiment 1 aimed to confirm that interference from distracter faces occurs irrespective of task load for non-face targets (as first reported by Lavie et al., 2003), and two further experiments examined the effects of disrupting configural face processing on face-specific load capacity using the CFE (Young et al., 1987) and polarity reversal (Galper, 1970).

Experiment 1

Experiment 1 employed a visual search and binary classification task similar to that first used by Lavie et al. (2003) and which was replicated in Thoma and Lavie (2013; Experiment 2). In each trial, participants classified the name of a famous male politician or film star in displays of either low (target name plus two non-target name-like letter strings) or high (target name plus five non-target name-like letter strings) perceptual load. In all conditions, the face of a famous politician or film star was presented in the periphery (see Figure 1). The key measure of interest was the effect of the congruency between the target name and the distracter face on response latencies and accuracy, as a function of perceptual load.


Figure 1. Examples of displays in Experiment 1. Shown is a congruent display with a relevant set size of three items (left panel) or six items (right panel; see caption of Figure 3 for copyright information on the face images).

Materials and Methods

Participants

Participants were recruited from the student body at the University of East London and all reported normal or corrected-to-normal vision. Potential participants were asked to name eight famous faces from the images used in the experiment, which included four male politicians (David Cameron, Tony Blair, George Bush and Bill Clinton) and four male film stars (Hugh Grant, Robert DeNiro, Daniel Craig, and George Clooney). Sixteen people (mean age 21.3, SD = 2.5; 5 males) who could name all eight faces participated without compensation. Written consent was obtained and the study was approved by the Ethics committee of the University of East London.

Stimuli and Procedure

Participants were seated in front of a 15” CRT monitor at a distance of approximately 60 cm. They were asked to attend to the center of the display and classify a target name as that of a famous politician or a film star through a key press, whilst ignoring a peripheral distracter face. In the low load condition, there were two additional non-target letter strings in the search area. The famous name was displayed in one of six vertical positions (rows), with two of the other rows (adjacent, or both above or below) filled by name-like nonsense letter strings. In the high load condition, the famous name was displayed in one row and all five remaining rows were filled by nonsense letter strings. All non-targets were nonsense letter strings in a first name-last name format, e.g., “Cgerth Jnfedgsa.” The distracter face either matched the target name (congruent condition) or was selected from the faces in the other category (incongruent condition).

The relevant search display was presented in a vertical column in the center of the display. Target and non-target letter stimuli were shown in Arial 12 bold, and the horizontal expanse of the letter strings was between 3.5 cm (3.34 degrees) and 4.9 cm (4.68 degrees). The vertical expanse from the top edge to the bottom edge was 3 cm (2.86 degrees) in the low load condition and 6 cm (5.73 degrees) in the high load condition. Distracter face images were presented in grayscale with a standardized vertical size of 3.4 cm (3.24 degrees) and positioned at the periphery of the screen 4 cm (3.82 degrees) to the left or right of fixation.
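The conversion from on-screen size to visual angle used in this paragraph can be illustrated with a short sketch. It assumes the standard formula, angle = 2·atan(size / (2·distance)), and the approximate viewing distance of 60 cm stated above; the function name `visual_angle_deg` is hypothetical and not taken from the study materials. Values recomputed this way may differ from the reported ones in the second decimal place, depending on the exact formula and rounding that were used.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by a stimulus of the given size.

    Uses the standard formula 2 * atan(size / (2 * distance)); this is an
    assumption about how the reported angles were obtained.
    """
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


VIEWING_DISTANCE_CM = 60  # approximate viewing distance given in the text

for label, size_cm in [
    ("shortest letter string", 3.5),
    ("longest letter string", 4.9),
    ("low-load column height", 3.0),
]:
    angle = visual_angle_deg(size_cm, VIEWING_DISTANCE_CM)
    print(f"{label}: {angle:.2f} degrees")
```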

E-prime 1.1 was used to run the experiment and counterbalancing was applied regarding the target category (politician vs. film star), identity, and positions of the target (six positions) and distracter (left or right). Participants ran through a practice block of 96 trials followed by 4 experimental blocks of 96 trials each, with conditions randomly intermixed in each block. Displays remained visible for 3 s unless the participant responded sooner. Response times and error rates were analyzed using parametric tests, with two exceptions: when the assumption of normality was violated (as happened for error rates), non-parametric tests were used, and when the assumption of sphericity was violated (as happened for RTs), Greenhouse-Geisser corrections were applied.

Results

Only correct responses with response times (RTs) greater than 150 ms were analyzed; faster responses were excluded (1.5% of trials). A two-way within-subjects analysis of variance (ANOVA) was carried out on correct RTs, with two levels of load (set size three, low load, vs. set size six, high load) and two levels of congruency (congruent vs. incongruent) of the distracter face relative to the target name.
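The trimming rule and the 2 × 2 design described here can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the trial record layout (keys `load`, `congruency`, `rt_ms`, `correct`) and the function name `cell_means` are hypothetical, and the example trials are made up purely to show the filtering step.

```python
from statistics import mean

MIN_RT_MS = 150  # anticipation cutoff stated in the text


def cell_means(trials):
    """Mean correct RT per (load, congruency) cell for one participant.

    Error trials and responses at or below the cutoff are dropped first.
    """
    cells = {}
    for trial in trials:
        if trial["correct"] and trial["rt_ms"] > MIN_RT_MS:
            key = (trial["load"], trial["congruency"])
            cells.setdefault(key, []).append(trial["rt_ms"])
    return {key: mean(rts) for key, rts in cells.items()}


# Made-up trials for illustration only (not data from the study).
example_trials = [
    {"load": "low", "congruency": "congruent", "rt_ms": 1100, "correct": True},
    {"load": "low", "congruency": "congruent", "rt_ms": 1200, "correct": True},
    {"load": "low", "congruency": "congruent", "rt_ms": 120, "correct": True},  # too fast
    {"load": "high", "congruency": "incongruent", "rt_ms": 1500, "correct": False},  # error
    {"load": "high", "congruency": "incongruent", "rt_ms": 1450, "correct": True},
]

print(cell_means(example_trials))
```

The per-participant cell means produced this way would then be the input to the within-subjects ANOVA.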

In the RTs there was a significant main effect of load, F(1, 15) = 336.3, p < 0.001, partial η² = 0.95. RTs were faster under low load (M = 1197, SD = 140) compared to high load (M = 1488, SD = 158). The main effect of congruency was also significant, F(1, 15) = 8.42, p = 0.011, partial η² = 0.36. RTs (see Figure 2) were faster on congruent trials (M = 1318, SD = 140) compared to incongruent trials (M = 1366, SD = 158). Importantly, there was no interaction between load and congruency, F(1, 15) = 0.5, p = 0.48, indicating that the congruency effect produced by the distracter faces remained unchanged as a function of load.
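For readers who want to relate the reported F ratios to the effect sizes, partial η² can be recovered from an F value and its degrees of freedom with the standard identity η²p = (F · df_effect) / (F · df_effect + df_error). The sketch below applies it to the congruency effect; the function name `partial_eta_squared` is an illustrative choice, and the calculation is not taken from the authors' analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of congruency as reported in the text: F(1, 15) = 8.42
print(round(partial_eta_squared(8.42, 1, 15), 2))
```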


Figure 2. Mean reaction times in the name classification task of Experiment 1 as a function of set size and congruency. Error bars represent standard error of the mean.

The congruency effect was significant for set size 3 [t(15) = 2.49, p = 0.025] and set size 6 [t(15) = 2.26, p = 0.039]. An analogous analysis of the error rates in each condition (overall M = 8%, SD = 7%) did not reveal any significant main effects or an interaction (all Fs < 1.19; see Table 1).
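The follow-up comparisons at each set size are paired tests on per-participant congruent and incongruent mean RTs. A minimal sketch of the paired t statistic is shown below, assuming one mean RT per participant and condition; the function name `paired_t` and the four example values are hypothetical and are not data from the study. The p value is not computed here because the Python standard library has no t distribution.

```python
import math
from statistics import mean, stdev


def paired_t(congruent, incongruent):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [inc - con for con, inc in zip(congruent, incongruent)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Made-up per-participant mean RTs (ms), for illustration only.
congruent_rts = [1000, 1100, 1200, 1300]
incongruent_rts = [1040, 1120, 1260, 1320]

t_value, df = paired_t(congruent_rts, incongruent_rts)
print(f"t({df}) = {t_value:.2f}")
```

With 16 participants, as in Experiment 1, the same calculation has 15 degrees of freedom.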


Table 1. Mean error rates (in percent) and standard deviations for conditions in Experiment 1.

The main effect of load in the RT analysis confirmed that load was successfully manipulated. Nonetheless, the congruency effect was unaffected by increasing load with non-face stimuli, suggesting that the processing of distracter faces was independent of the attention required for processing the central non-face stimuli. This result therefore replicates findings with almost identical paradigms in Lavie et al. (2003) and Thoma and Lavie (2013).

Experiment 2

In Experiment 1, task-irrelevant faces were processed irrespective of the attentional demands of the relevant task, which could suggest (i) that face recognition is capacity free, or (ii) that face processing has capacity limitations, but that it does not compete for resources with processing non-face information (the relevant names in this case). The previous finding that increasing the attentional demands of the relevant task by adding face stimuli to the relevant set does modulate the processing of peripheral distracter faces (Thoma and Lavie, 2013; Thoma, 2014) suggests that face processing is subject to capacity limitations, but that these are face-specific. The question remains which aspects of face processing drive the face-specific capacity limitation. Since processing of inverted faces was found to consume capacity (Thoma, 2014), holistic face processing appears not to be a necessary condition to exhaust face-specific capacity. Experiment 2 was designed to further test this assertion, by presenting a to-be-recognized target face together with either intact or chimeric non-target faces. In line with Thoma and Lavie (2013), we predicted that intact non-target faces would eliminate the congruency effect produced by peripheral distracter faces. The key effect of interest was the congruency effect for displays containing misaligned non-target faces. If such faces are able to consume capacity despite not being processed as a face-like configural whole, we predicted a reduction in the distracter congruency effect, similar to the previous finding using inverted faces (Thoma, 2014). Such a finding would suggest that face-capacity limits are determined by non-configural representations of faces.