Horizontal Information Drives the Behavioral Signatures of Face Processing

Goffaux, Valerie; Dakin, Steven

doi:10.3389/fpsyg.2010.00143

ORIGINAL RESEARCH article

Front. Psychol., 28 September 2010

Sec. Perception Science

Volume 1 - 2010 | https://doi.org/10.3389/fpsyg.2010.00143

Horizontal information drives the behavioral signatures of face processing

Valérie Goffaux^1,2*

Steven C. Dakin³

¹ Department of Neurocognition, Maastricht University, Maastricht, Netherlands
² Educational Measurement and Applied Cognitive Science Unit, Department of Psychology and Educational Sciences, University of Luxembourg, Luxembourg
³ Institute of Ophthalmology, University College London, London, UK

Recent psychophysical evidence indicates that the vertical arrangement of horizontal information is particularly important for encoding facial identity. In this paper we extend this notion to examine the role that information at different (particularly cardinal) orientations might play in a number of established phenomena each a behavioral “signature” of face processing. In particular we consider (a) the face inversion effect (FIE), (b) the facial identity after-effect, (c) face-matching across viewpoint, and (d) interactive, so-called holistic, processing of face parts. We report that filtering faces to remove all but the horizontal information largely preserves these effects but conversely, retaining vertical information generally diminishes or abolishes them. We conclude that preferential processing of horizontal information is a central feature of human face processing that supports many of the behavioral signatures of this critical visual operation.

Introduction

Facial information is of paramount significance to social primates such as humans. Consequently, we have developed visual mechanisms which support the recognition of thousands of individuals based only on facial information, while resisting the sometimes drastic changes in appearance that arise from changes in distance, lighting, or viewpoint. Despite a large amount of research, the visual information supporting human face recognition remains unclear.

From the point of view of engineering, a number of automated face recognition algorithms make use of principal component analysis (or similar techniques for reducing data-dimensionality) deriving basis images from a set of sample faces, such that any face can be decomposed into a weighted-sum of eigenfaces (Sirovich and Kirby, 1987; Turk and Pentland, 1991). Such approaches enjoy varying levels of success but are limited by the fact that they operate in a space that is determined by the representation of the raw data (i.e., lists of pixel values). This makes them vulnerable to simple changes in the image that have little impact on human performance. For example it has recently been shown that the structure of many of the most significant eigenfaces (i.e., those that can capture most of the variation between individual faces) serves to capture gross image structure due to variation in lighting (Sirovich and Meytlis, 2009).

The eigenface approach relies on a multi-dimensional representation of faces, a notion that has been extended into the domain of human face recognition (Valentine, 1991). Under this view, each individual is represented as a vector within a multi-dimensional face space containing a series of measurements made along some (presently unknown) dimensions (e.g. eye separation). At the origin of this space sits the average face. Psychophysical support for this theory comes from the observation that prolonged exposure (adaptation) to a single face, elicits facial identity after-effects, shifting the perceived identity of subsequently viewed faces away from the adapting facial identity (Leopold et al., 2001) along a vector known as the identity axis (i.e., one running from the adapting face to the average). Face-space encoding requires access to attributes of a series of “common” facial features (e.g. (x,y) location of the pupil centers), i.e., features that can always be extracted from a given face. In this way one can encode two faces on a common set of feature dimensions (although the psychological validity of such dimensions remains to be established). What then are the low-level visual mechanisms that might support a visual code that is appropriate for faces?

A logical starting point for understanding what visual information is used for face coding, is to consider what information is made explicit through known mechanisms in the human primary visual cortex (V1). V1 neurons are well-characterized by Gabor filters; i.e., they primarily decompose regions of the scene falling within their receptive fields along the dimensions of spatial frequency (SF) and orientation (Hawken and Parker, 1987). The idea of decomposing faces using such a local analysis of orientation and SF has been proposed as a way to characterize facial information for automated recognition (Kruger and Sommer, 2002). Here we briefly review evidence from studies of human face recognition concerning the role of orientation and SF structure of faces.

Face perception seems to be more sensitive to manipulation of SF, than does human visual processing of other categories of objects (Biederman and Kalocsai, 1997; Collin et al., 2004 but see Williams et al., 2009). Perception of different categories of facial information is driven by different ranges of SFs (Sowden and Schyns, 2006). For example, the perception of facial identity is tuned to a narrow band of intermediate SFs (7–16 cycles per face, cpf; Costen et al., 1994, 1996; Gold et al., 1999; Näsänen, 1999; Willenbockel et al., 2010). In contrast, the coarse structure provided by low SFs contributes to both the processing of holistic facial information (interactive feature processing) and the perception of fearful expressions (Collishaw and Hole, 2000; Goffaux et al., 2003, 2005; Goffaux and Rossion, 2006; Goffaux, 2009; Vlamings et al., 2009). Interactive processing is strongly attenuated when faces are filtered to retain only high SFs, a finding that seems to support the notion that high SF channels encode only fine facial details.

With respect to the orientation structure of faces, most work has focused on the drastic impairment in recognition that occurs when faces are inverted within the picture plane. Since human recognition of objects in other visual categories is less prone to planar inversion effects (e.g., Robbins and McKone, 2007), the face inversion effect (FIE) is thought to be a signature of the particular mechanisms engaged for this visual category. Considerable evidence indicates that inversion disrupts feature interactive, so-called holistic, processing. The notion of interactive processing arises from the observation that when presented with an upright face observers find it difficult to process a given feature (e.g., top half or just the eyes) without being influenced by the surrounding features within the face (Sergent, 1984; Young et al., 1987; Rhodes et al., 1993; Tanaka and Farah, 1993; Farah et al., 1995; Freire et al., 2000). Inversion disrupts interactive processing of features, making observers better at processing features independently of each other. The fact that inversion disrupts interactive face processing suggests the latter is a core aspect of human ability to discriminate and recognize faces (though see Konar et al., 2010).

In contrast to the impairment caused by planar inversion, humans readily recognize others despite changes in viewpoint and illumination. Viewpoint generalization is presumably achieved through the combination of 2D and 3D cues in face representations (O’Toole et al., 1999; Jiang et al., 2009; Wallis et al., 2009).

Recently, it has been suggested that what is special about face processing is its dependence on a particular orientation structure. Dakin and Watt (2009) showed that recognition of familiar faces benefits most from the presence of their horizontal information, as compared to other orientation bands. Specifically, when face information is limited to a narrow band of orientations, recognition performance peaks when that band spans horizontal angles, and declines steadily as it shifts towards vertical angles. These authors also reported that it is the horizontal structure within faces that drives observers’ poor recognition of contrast–polarity–inverted faces, and their inability to detect spatially inverted features within inverted faces (Thompson, 1980). Finally they showed that, in contrast to objects and scenes, the horizontal structure of faces tends to fall into vertically co-aligned clusters of horizontal stripes, structures they termed bar codes (see also Keil, 2009). Scenes and objects fail to show such structural regularity. Dakin and Watt (2009) suggested that the presence and vertical alignment of horizontal structure is what makes faces special. This notion is also supported by Goffaux and Rossion (2007) who showed that face inversion prevents the processing of feature spatial arrangement along the vertical axis. These structural aspects may convey resistance to ecologically valid transformations of faces due to, e.g., changes in pose or lighting.

The goal of this paper is to address whether a disproportionate reliance on horizontal information is what makes face perception a special case of object processing. We investigated four key behavioral markers of face processing: the effect of inversion, identity after-effects, viewpoint invariance, and interactive feature processing. If the processing of horizontal information lies at the core of face processing specificity, and is the main carrier of facial identity, we can make and test four hypotheses. (1) The advantage for processing horizontal information should be lost with inversion as this simple manipulation disrupts face recognizability and processing specificity. (2) Identity after-effects should arise only when adapting and test faces contain horizontal information. (3) Face recognition based on horizontal structure should be more resistant to pose (here, viewpoint) variation than face recognition limited to other orientations. (4) Interactive feature processing – i.e., the inability to process the features of an upright face independently from one another – should be primarily driven by the horizontal structure within face images.

Experiment 1. Horizontal and Vertical Processing in Faces, Objects and Natural Scenes

A behavioral signature of face perception is its dramatic vulnerability to inversion. Inversion disrupts the interactive processing of face parts and face recognizability in general. However, what visual information is processed in upright, but not inverted, faces is the subject of current axiomatic debate driven by considerable divergence of findings within the empirical literature (e.g., Sekuler et al., 2004; Yovel and Kanwisher, 2004; Goffaux and Rossion, 2007; Rossion, 2008). Here, we investigated whether the FIE arises from the disruption of visual processing within a particular orientation band.

Materials and Methods

Subjects

Eighteen Psychology students (Maastricht University, age range: 18–25) participated in face and car experiments. Thirteen additional students consented to perform the experiment with scenes. All subjects provided their written informed consent prior to participation. They were naïve to the purpose of the experiments and earned course credits for their participation. They reported either normal, or corrected-to-normal vision. The experimental protocol was approved by the faculty ethics committee.

Stimuli

Twenty grayscale 256 × 256 pixel pictures of unfamiliar faces (half males, neutral expression, full-front), cars (in front view), and natural scenes (van Hateren and van der Schaaf, 1998) were used (Figure 1). Face and car pictures were edited in Adobe Photoshop to remove background image structure. To eliminate external cues to facial identity (e.g., hair, ears, and neck), the inner features of each individual face were extracted and pasted on a generic head (one for female and one for male faces; as in Goffaux and Rossion, 2007). The mean luminance value was subtracted from every image. Filtered stimuli were generated by Fast Fourier transforming the original image using Matlab 7.0.1 and multiplying the Fourier energy with orientation filters: these allowed all SFs to pass but had a wrapped Gaussian energy profiles in the orientation domain, centered on one orientation with a particular bandwidth specified by the standard deviation parameter (cf Dakin and Watt, 2009). We used a standard deviation of 14° selected to broadly match the orientation properties of neurons in the primary visual cortex (e.g., Blakemore and Campbell, 1969; Ringach et al., 2002). Note that this filtering procedure leaves the phase structure of the image untouched, and only alters the distribution of Fourier energy across orientation. There were three filtering conditions: horizontal (H), vertical (V) and horizontal plus vertical (H + V) constructed by summing the H and V filtered images. After inverse-Fourier transformation, the luminance and root-mean square (RMS) contrast of all resulting images were adjusted to match mean luminance and contrast values of the original image set (i.e., prior to filtering). Note that this normalization was applied in all following experiments.

FIGURE 1

Figure 1. Face, object and scene pictures (A–C) were filtered to preserve either horizontal (D–F), vertical (G–I) or both horizontal and vertical orientations (J–L).

Inverted stimuli were generated by vertically flipping each image. Stimuli were displayed on a LCD screen using E-prime 1.1 (screen resolution: 1024 × 768, refresh rate: 60 Hz), viewed at 60 cm. Stimuli subtended a visual angle of 8.9° × 8.9°.

Procedure

The procedure was identical for face, car, and scene experiments. A trial commenced with the presentation of a central fixation cross (duration range: 1250–1750 ms). The first stimulus then appeared for 700 ms, immediately followed by a 200-ms mask (256 × 256-pixel Gaussian noise mask; square size of 16 × 16 pixels). Both the first stimulus and the mask appeared at randomly selected screen locations across trials (by ± 20 pixels in x and y dimensions). After a 400-ms blank interval, where only the fixation marker was visible, the second stimulus appeared and remained visible until subjects made a response (maximum duration: 3000 ms). Subjects were instructed to respond as quickly and as accurately as possible, using the computer keyboard, whether the pair of stimuli were the same or different. On 50% of trials stimuli differed, on 50% they were identical.

In a given trial, faces were presented either both upright, or both inverted. Faces in a trial also always belonged to the same filter-orientation condition (i.e., both H, both V, or both H + V). Upright and inverted trials were clustered in 20-trial mini-blocks; the order of the other conditions (filter-orientation and similarity) was random. There was a 10-s resting pause every ten trials. Feedback (a running estimate of accuracy) was provided every 40 trials. Prior to the experiment, subjects practiced the task over 50 trials.

There were 12 conditions per experiment in a 2 × 2 × 3 within-subject design. The three conditions were: similarity (same versus different), planar-orientation (upright versus inverted), and filter-orientation (H, V or H + V). We ran 20 trials per condition, to give a total of 240 experimental trials.

Data analyses

Using hits and correct-rejections in every planar-orientation and filter-orientation condition, we computed estimates of sensitivity (d′) for each subject following the loglinear approach (Stanislaw and Todorov, 1999). Sensitivity measures were submitted to repeated-measure 2 × 2 × 3 ANOVA. ANOVAs were computed for each experiment separately. Conditions were compared two-by-two using post hoc Bonferroni tests.

Results

Faces. Main effects of planar-orientation and filter-orientation were significant (F(1,17) = 63.76, p < 0.0001, F(2,34) = 33, p < 0.0001). These factors interacted significantly (F(2,34) = 7.17, p < 0.003) stressing that inversion disrupted the processing of H (p < 0.0002), H + V (p < 0.0001), but not V (p = 1) face information. When faces were upright, discrimination was better for stimuli containing H or H + V information, compared to V information (H/V comparison: p < 0.0001; H + V/V comparison: p < 0.0001; Figure 2A). When faces were inverted, sensitivity level was comparably low across H and V conditions (p = 1). The only significant difference across inverted conditions was that sensitivity was higher for H + V than V filter-orientation (p < 0.005).

FIGURE 2

Figure 2. Results from Experiment 1. Bar graphs plot mean discrimination sensitivity (d′) as a function of filter-orientation and planar-orientation (error bars indicate mean square error, MSE). Data are shown for the three classes of stimuli: (A) faces, (B) cars, and (C) scenes. The scatter plots below each bar graph show the individual data it was derived from.

Cars. In stark contrast to faces, sensitivity at discriminating cars was not affected by inversion (see Figure 2B; F(1,17) = .06, p = .8), in any of the filter-orientation conditions (planar-orientation by filter-orientation interaction: F(2,34) = .99, p = .38). Yet, the main effect of filter-orientation was significant (F(2.34) = 39.37, p gnificant (F(2.34) = 39.37 0.0001), highlighting overall lower discrimination sensitivity based on V information than on H or H + V information (H/V comparison: p gnificant (F(2.34) = 39.37 0.0001; H + V/V comparison: p gnificant (F(2.34) = 39.37 0.0001).

Natural scenes. There was no inversion effect when discriminating scenes (F(1,12) = 0.001, p = 0.97; Figure 2C), in any filter-orientation (planar-orientation by filter-orientation interaction: F(2,24) = 1.45, p = 0.25). Only the main effect of filter-orientation was significant (F(2,24) = 8.3, p < 0.002). Sensitivity to scene differences was modestly worse at H orientation as compared to V and H + V orientations (ps < 0.025). There was no difference between the latter two conditions (p = 0.9) but we cannot exclude a ceiling effect.

Discussion

Similar to previous report for the naming of famous faces (Dakin and Watt, 2009), the present findings indicate that the ability to discriminate between unfamiliar faces is best supported by horizontal information. In contrast to Dakin and Watt (2009), all external face cues were removed in the present experiment and subjects had to rely on the spatial arrangement of inner facial features to discriminate faces. Our results therefore show that the horizontal advantage stems from the improved processing of inner facial cues. Critically, we report that the advantage for processing horizontal face structure is lost when faces are inverted in the picture plane, further suggesting that the horizontal advantage for upright stimuli marks the engagement of face-specific perceptual mechanisms. Furthermore, the loss of horizontal processing advantage with face inversion indicates that the better efficiency for processing horizontal information in upright faces does not merely derive from stimulus orientation structure, but arises as a consequence of the interactions between stimulus structure and observer. We also report that upright face discrimination was better when horizontal and vertical information were combined than when only horizontal information was provided to the participants. Both horizontal and vertical information are thus clearly useful for face recognition, but it is clear that, for images restricted in orientation content, recognition is easier when horizontal structure is provided.

Object discrimination was also better when images were restricted to horizontal than vertical information. Unlike faces, this advantage was observed independent of the planar-orientation of the stimuli. In agreement with previous evidence (Hansen and Essock, 2004), scene discrimination was worse at horizontal than at vertical orientation. In contrast to faces, orientation combination did not improve discrimination sensitivity for objects and scenes.

Experiment 2. Dependence of the Facial Identity After-Effect on Orientation Structure

Key psychophysical support for the psychological validity of the face space model (described in section “Introduction”) comes from the observation that identity after-effects are strongest between faces that belong to the same identity axis (Leopold et al., 2001; Rhodes and Jeffery, 2006). That perception of upright faces seems to disproportionately rely on horizontal information suggests that the horizontal orientation band is largely responsible for carrying the relevant cues to facial identity. If so, identity after-effects should be preferentially driven by horizontal and not by vertical image structure. Experiment 2 addressed this issue directly by measuring identity after-effects when adapting faces contained either broadband, horizontal, or vertical information.