Original Research ARTICLE
Front. Hum. Neurosci., 29 December 2008 | https://doi.org/10.3389/neuro.09.018.2008
Category specificity in early perception: face and word N170 responses differ in both lateralization and habituation properties
Sackler Institute for Developmental Psychobiology, Weill Medical College of Cornell University, New York, NY, USA
Department of Child and Adolescent Psychiatry, University of Zurich, Zurich, Switzerland
Unité de Neurosciences Cognitives and Laboratoire de Neurophysiologie, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, USA
N170 event-related potential (ERP) responses to both faces and visual words raise questions about category specific processing mechanisms during early perception and their neural basis. Topographic differences across word and face N170s suggest a form of category specific processing in early perception – the word N170 is consistently left-lateralized, while less consistent evidence supports a right-lateralization for the face N170. Additionally, the face N170 shows a reduction in amplitude across consecutive individual faces, a form of habituation that might differ across studies, thereby helping to explain inconsistencies in lateralization. This effect remains unexplored for visual words. The current study directly contrasts N170 responses to words and faces within the same subjects, examining both category-level habituation and lateralization effects. ERP responses to a series of different faces and words were collected under two contexts: blocks that alternated faces and words vs. pure blocks of a single category designed to induce category-level habituation. Global and occipito-temporal measures of N170 amplitude demonstrated an interaction between category (words, faces) and block context (alternating categories, same category). N170 amplitude demonstrated class-level habituation for faces but not words. Furthermore, the pure block context diminished the right-lateralization of the face N170, pointing to class-level habituation as a factor that might drive inconsistencies in findings of right-lateralization across different paradigms. No analogous effect for the word N170 was found, suggesting category specificity for this form of habituation. Taken together, topographic and habituation effects suggest distinct forms of perceptual processing drive the face N170 and the visual word form N170.
Electrophysiological studies have shown that visual processes are specialized for certain stimulus categories within the first 200 ms of stimulus presentation. Such fast and specialized brain processes have most consistently been indexed by an increased occipito-temporal N170 component of the event-related potential (ERP) for faces (e.g., Bentin et al., 1996; Botzel et al., 1995; Jeffreys, 1989; for a recent review see Rossion and Jacques, 2008) and visual words (e.g., Bentin et al., 1999; Maurer et al., 2005a,b) compared to control stimuli (e.g., words compared to non-orthographic strings of symbols or forms; faces compared to nonface objects). Enhancements in N170 responses have been linked to perceptual expertise via experiments demonstrating enhanced N170 responses to pictures of birds and dogs in bird and dog experts, respectively (Tanaka and Curran, 2001), or via the reduction of N170 amplitude in response to faces when objects of expertise are processed concurrently (e.g., cars, Rossion et al., 2007).
Similar evidence for the role of experience in N170 enhancement has come from experiments in the reading domain. The N170 response to words written in Japanese scripts was increased in Japanese readers compared to native English speakers without experience with the script (Maurer et al., 2008 ; see also Wong et al., 2005 for a similar effect in Chinese), and the N170 response to visual words increased relative to control stimuli, only after children had experienced reading training and mastered basic reading skills (Maurer et al., 2006 ).
In addition to amplitude differences, the lateralization of the N170 can differ by category. For example, the topography of N170 responses to faces is more right-lateralized, whereas the N170 to words is more left-lateralized (for a direct comparison, see Rossion et al., 2003). This pattern of differences over left and right hemispheres for N170 responses to faces and words is reported consistently enough in the literature to rule out this effect being spurious, yet enough counterexamples exist to suggest that experimental factors influence the lateralization of the N170. These factors have not been adequately controlled across studies. For example, a right-lateralized N170 in response to face stimuli was generally reported in studies that presented faces randomly among other object classes (e.g., Bentin et al., 1996; Botzel et al., 1995; Eimer, 1998; Itier and Taylor, 2004; although see Rossion et al., 2000). In some studies that presented faces in a series of other faces, however, the face N170 was bilateral or even showed left-lateralization in the group mean (Deffke et al., 2007; Schweinberger et al., 2002a,b).
Such findings suggest that the context in which a face is presented, i.e., whether it occurs among other faces or among stimuli of other object classes, may systematically influence the degree to which a face is processed in the right hemisphere. On the other hand, left-lateralization of N170 responses to visual words appears rather consistently in skilled readers, and does not show any evidence of being modulated by context. For example, left-lateralization has been found both when words are presented among other words (Hauk and Pulvermuller, 2004; Hauk et al., 2006; Maurer et al., 2005a,b; Simon et al., 2007) and when they appear randomly among control stimuli (Bentin et al., 1996, 1999; Rossion et al., 2003; Simon et al., 2004). Other studies have suggested that the degree of left-lateralization may depend on linguistic rather than contextual factors (Hauk et al., 2006; Maurer et al., 2005a).
This pattern of findings across studies provides some initial support for the hypothesis that lateralization of the N170 is category-related, reflecting different perceptual encoding mechanisms applied to words versus faces (Farah, 1991 ). Furthermore, this lends support to the notion that a category-level habituation effect may operate differently for faces than for words, specifically leading to reduced right-lateralization for face N170 when this category is repeated within pure blocks of face stimuli. Thus the degree to which stimulus categories are repeated in rapid succession may help explain some differences reported in the topography of the face N170 and at the same time may provide additional evidence for category specific processes of perceptual encoding that underlie N170 responses.
To test these hypotheses directly, we examined whether the sequential context in which face and word stimuli are presented affects the topography of the N170 component. We presented streams of unique face and word stimuli either within blocks that consisted of only one stimulus category or within blocks that alternated between these two stimulus categories. We expected that within the “blocked” condition the N170 response to faces would demonstrate category-level habituation effects – especially over the right hemisphere, relative to N170 responses to faces presented in the “alternating” condition. Furthermore, we anticipated that contrasting such context manipulations across the categories of faces versus words should reveal important differences in habituation effects across these two categories, given that reading explicitly involves viewing visual words in rapid succession, whereas naturalistic viewing of faces rarely involves a rapid succession of face stimuli.
Eighteen volunteers (7 males, 11 females) who were between 19 and 30 years of age (23 years on average) participated in the study. They were all right-handed and spoke English as their first language, as determined by self-report. Participants provided informed consent according to a protocol approved by the Institutional Review Board of the Weill Medical College of Cornell University.
Visual word and face stimuli were either presented in blocks with just faces or just words (blocked condition), or in blocks in which face and word stimuli alternated in order (alternating condition; see Figure 1 ). A stimulus was shown for 250 ms followed by an average interstimulus interval of 1500 ms (jittered between 1250 and 1750 ms). The stimuli consisted of 36 faces and 36 words that were presented twice in each condition. Half of the words and half of the faces appeared first in the blocked condition, the other half first in the alternating condition. Block position and sequence were counterbalanced across subjects. Although the majority of stimuli required no response, subjects were asked to press a single button in response to an occasional (11.1%) face or word that appeared upside-down. Accuracy and reaction time data were collected for responses to these rare target stimuli.
Figure 1. Experimental design. Face and word stimuli were either presented in pure category blocks or in blocks alternating with stimuli from the other category. The task required subjects to press a button whenever a stimulus was presented upside-down (11.1%).
Faces were presented at a visual angle of 1.65° × 2.25°, words at 1.45° × 0.55°. The face stimuli were color photographs adopted from a previous study (see Jacques and Rossion, 2004 for details). The word stimuli were frequent [frequency per million based on HAL corpus (Balota et al., 2007 ; Lund and Burgess, 1996 ), computed as base-10 logarithm: mean 2.2 ± SD 0.6], concrete nouns, four letters in length presented in lower case.
EEG Recording and Analysis
We recorded 128-channel EEG data (Hydrocel net from Electrical Geodesics) initially with a Cz reference. Data were sampled at 500 Hz with 0.1–200 Hz filter settings and impedances below 50 kΩ. Using BESA software, data from channels with excessive artifacts were excluded and replaced by spherical spline interpolation, and eye blinks were corrected by multiple source eye correction to minimize topographic distortions (Berg and Scherg, 1994). The data were then digitally bandpass filtered (0.3–30 Hz), segmented (−150 to 750 ms), artifact rejected (±80 µV) and averaged for non-target stimuli in each condition separately. Using Brain Vision Analyzer software, the averaged data were re-referenced to the average reference (Lehmann and Skrandies, 1980) and baseline-corrected (100 ms prior to stimulus presentation). After computing global field power (GFP; Lehmann and Skrandies, 1980), grand mean values were computed for all conditions.
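For readers unfamiliar with the two map-level quantities used here, the following minimal sketch illustrates average re-referencing and GFP for a single time point. It assumes the common definition of GFP as the root mean square of the average-referenced potentials across channels; the function names and the four-channel toy map are hypothetical illustrations, not the processing code or data of this study.

```python
import math

def average_reference(potentials):
    """Re-reference one map by subtracting the mean across channels."""
    mean = sum(potentials) / len(potentials)
    return [v - mean for v in potentials]

def global_field_power(potentials):
    """GFP: root mean square across channels of the average-referenced map."""
    referenced = average_reference(potentials)
    return math.sqrt(sum(v * v for v in referenced) / len(referenced))

# Toy 4-channel map in microvolts (illustrative values only, not study data).
toy_map = [2.0, -1.0, 0.5, -3.5]
gfp_value = global_field_power(toy_map)

# Adding a constant to every channel (i.e., changing the reference)
# leaves GFP unchanged, which is why it serves as a reference-free
# measure of map strength.
shifted_gfp = global_field_power([v + 10.0 for v in toy_map])
```

Because GFP collapses all channels into one value per time point, it quantifies overall response strength without requiring a choice of electrode.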
The resulting ERP data were analyzed in a two-step strategy. First, time windows that demonstrated habituation effects (i.e., significant differences between one-category versus two-category contexts for either faces or words) were identified for further analysis.
This first step involved computing topographic analyses of variance (TANOVA, part of the LORETA-KEY software package) on non-normalized (raw) maps contrasting the one-category versus two-category context conditions at each time point. Separate analyses were carried out for faces and words to identify time windows that would be used to explore habituation effects that might impact either category. These time point-wise TANOVAs were conducted on raw maps to detect any systematic amplitude differences between the two conditions (i.e., blocked and alternating contexts) by running a nonparametric randomization test (Holmes et al., 1996 ) on the GFP of difference maps between two conditions (Lehmann and Skrandies, 1980 ; Strik et al., 1998 ). An alpha level of p < 0.01 in at least three consecutive time frames was adopted to control for multiple comparisons across these two analyses. The joint probability of p < 0.01 in three time frames (0.01 × 0.01 × 0.01) surpasses the Bonferroni corrected threshold of p < 0.05 across the 450 time frames tested in these analyses (0.05/450).
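The logic of such a randomization test at one time point can be sketched as follows. This is a simplified paired version written for illustration, assuming that condition labels are exchanged within subjects (equivalent to flipping the sign of each subject's difference map); it is not the LORETA-KEY implementation, and all function names and parameters are hypothetical.

```python
import math
import random

def gfp(map_):
    """Root mean square across channels after removing the channel mean."""
    mean = sum(map_) / len(map_)
    return math.sqrt(sum((v - mean) ** 2 for v in map_) / len(map_))

def mean_difference_map(maps_a, maps_b, signs):
    """Group-mean difference map, with one sign (+1 or -1) per subject."""
    n_subj = len(maps_a)
    n_chan = len(maps_a[0])
    return [
        sum(s * (a[c] - b[c]) for s, a, b in zip(signs, maps_a, maps_b)) / n_subj
        for c in range(n_chan)
    ]

def paired_map_randomization_test(maps_a, maps_b, n_perm=2000, seed=0):
    """Return (observed GFP of the difference map, randomization p-value).

    maps_a, maps_b: one raw map (list of channel values) per subject for
    each of the two conditions, at a single time point.
    """
    rng = random.Random(seed)
    n_subj = len(maps_a)
    observed = gfp(mean_difference_map(maps_a, maps_b, [1] * n_subj))
    count = 0
    for _ in range(n_perm):
        # Randomly exchanging the two conditions within a subject
        # flips the sign of that subject's difference map.
        signs = [rng.choice((1, -1)) for _ in range(n_subj)]
        if gfp(mean_difference_map(maps_a, maps_b, signs)) >= observed:
            count += 1
    # The +1 terms count the observed labeling as one of the randomizations.
    return observed, (count + 1) / (n_perm + 1)
```

Repeating this test at every time frame yields the p-value time course on which the consecutive-frame criterion is then applied.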
Time windows related to early perceptual processes (i.e., <300 ms) that demonstrated any effect of the context manipulation (regardless of stimulus category) were extracted in a form that was applied equally across all four conditions (faces in one-category blocks, faces in two-category blocks, words in one-category blocks, words in two-category blocks) and subjected to two further analyses.
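The consecutive-frame criterion and its comparison with the Bonferroni threshold can be made concrete with a short sketch. The arithmetic follows the values given above (450 frames, p < 0.01 in three consecutive frames); treating the joint probability as a simple product implicitly assumes independent frames. The helper names, and the assumption that frame 0 corresponds to −150 ms at 2 ms per frame, are illustrative rather than taken from the analysis software.

```python
def significant_windows(p_values, alpha=0.01, min_frames=3):
    """Return (first, last) frame indices of runs with p < alpha
    that last at least min_frames consecutive frames."""
    windows = []
    start = None
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                windows.append((start, i - 1))
            start = None
    # Close a run that extends to the final frame.
    if start is not None and len(p_values) - start >= min_frames:
        windows.append((start, len(p_values) - 1))
    return windows

def frame_to_ms(frame, epoch_start_ms=-150.0, sampling_rate_hz=500.0):
    """Convert a frame index to latency, assuming frame 0 is the epoch start."""
    return epoch_start_ms + frame * 1000.0 / sampling_rate_hz

# Threshold arithmetic from the text.
bonferroni_threshold = 0.05 / 450   # about 1.1e-4
joint_probability = 0.01 ** 3       # 1e-6 under the independence assumption

# Toy p-value series: only the runs of length >= 3 qualify.
toy_p = [0.5] * 5 + [0.005] * 3 + [0.2] + [0.001] * 2 + [0.3] + [0.002] * 4
toy_windows = significant_windows(toy_p)
```

Under these assumptions, a run spanning frames 159 to 169 would correspond to a window of 168–188 ms.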
Since these TANOVA effects reflect habituation effects within categories, but do not inform whether the habituation effects differ between categories, additional analyses were carried out to test interactions between Context and Category. For the resulting time window we ran a MANOVA on GFP with the factors Context (blocked vs. alternating presentation) and Category (faces vs. words). To be consistent with the previously reported literature on laterality effects for N170 responses ( Bentin et al., 1996 , 1999 ; Maurer et al., 2005a ,b , 2008 ; Rossion et al., 2003 ), an additional analysis focused on direct comparisons between left and right occipito-temporal channels at the negative topographic peak. The topographic peak channels were defined based on the average across all four condition grand mean values in the corresponding time segment.
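Because both factors have only two levels, the Context × Category interaction reduces to a test on a single difference-of-differences score per subject: the interaction F with (1, n − 1) degrees of freedom equals the squared one-sample t statistic on that score. The sketch below illustrates this equivalence with hypothetical function and variable names and toy values; it is not the statistical software used in the study.

```python
import math
import statistics

def interaction_f(faces_blocked, faces_alt, words_blocked, words_alt):
    """Context x Category interaction for a 2 x 2 within-subject design.

    Each argument holds one value per subject (e.g., GFP in the time
    window). Returns (F, (df1, df2)).
    """
    # Per-subject context effect for faces minus context effect for words.
    contrast = [
        (fa - fb) - (wa - wb)
        for fb, fa, wb, wa in zip(faces_blocked, faces_alt,
                                  words_blocked, words_alt)
    ]
    n = len(contrast)
    t = statistics.mean(contrast) / (statistics.stdev(contrast) / math.sqrt(n))
    return t * t, (1, n - 1)

# Toy data for four subjects (illustrative values only).
f_value, dfs = interaction_f(
    faces_blocked=[1.0, 2.0, 3.0, 4.0],
    faces_alt=[2.0, 3.0, 5.0, 6.0],
    words_blocked=[1.0, 1.0, 1.0, 1.0],
    words_alt=[1.0, 1.0, 1.0, 1.0],
)
```

With the 18 participants of the present study, the same computation yields the (1, 17) degrees of freedom reported for the interaction tests.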
Accuracy and reaction time of responses to targets were analyzed within MANOVAs with the factors Context (one-category vs. alternating two-category blocks) and Category (faces vs. words). Mean RT was computed after excluding trials with inaccurate responses, anticipations (i.e., RT < 200 ms), or unusually long delay (i.e., >3 SD slower than the mean). The range of individual subject performance (accuracy range 86–100%; RT 469–617 ms) indicated that all subjects understood and complied with instructions, and thus no subjects were excluded from the analyses.
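The exclusion rules for reaction times can be sketched as below. The text does not state over which trials the mean and SD for the 3 SD criterion were computed; this sketch assumes they are computed per subject over correct, non-anticipatory trials. The function name and trial format are hypothetical.

```python
import statistics

def trimmed_mean_rt(trials, min_rt_ms=200.0, sd_cutoff=3.0):
    """Mean RT after applying the three exclusion rules.

    trials: list of (rt_ms, correct) tuples for one subject and condition.
    Returns None when no valid trial remains.
    """
    # Drop inaccurate responses and anticipations (RT < 200 ms).
    valid = [rt for rt, correct in trials if correct and rt >= min_rt_ms]
    if len(valid) < 2:
        return statistics.mean(valid) if valid else None
    mean = statistics.mean(valid)
    sd = statistics.stdev(valid)
    # Drop responses more than sd_cutoff SDs slower than the mean
    # (assumption: mean and SD are taken over the valid trials above).
    kept = [rt for rt in valid if rt <= mean + sd_cutoff * sd]
    return statistics.mean(kept)

# Toy data: 20 regular trials, one very slow trial, one anticipation,
# and one incorrect response (illustrative values only).
toy_trials = [(500.0, True)] * 20 + [(5000.0, True), (150.0, True), (520.0, False)]
toy_mean_rt = trimmed_mean_rt(toy_trials)
```

Note that the cutoff is one-sided: only unusually slow responses are removed, in line with the description above.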
Overall behavioral results for the inverted stimuli, which served as rare targets, demonstrated that subjects were highly accurate and attentive to the primary task (>95% correct in all conditions). While there were no significant main effects or interactions for reaction time (i.e., all F-values <1), accuracy of responses revealed a subtle yet significant effect of context that was different for words and faces [Context × Category interaction, F(1,17) = 4.6, p < 0.05]. Inverted words were detected more accurately in pure blocks than in mixed blocks (98.3% vs. 95.2%), whereas inverted faces were detected more accurately in mixed blocks than in pure blocks (99.3% vs. 97.9%). No other effects were significant.
Time Point-Wise TANOVA
Consecutive comparisons between ERP maps elicited by faces in the blocked versus alternating conditions revealed processing differences (p < 0.01) within the N170 time range (168–188 ms), as indicated in Figure 2 . The corresponding comparison between words presented in blocked versus alternating conditions revealed no processing differences within the N170 time range. Processing differences for words were only found after 300 ms, at a sub-threshold significance level.
Figure 2. Time point-wise habituation effects. A point-to-point comparison (TANOVA) between ERP maps from the same-category “blocked” condition and ERP maps from the “alternating” categories condition revealed effects of habituation (p < 0.01) in the N170 time range for faces (top) but not for words (bottom). Overlaid are the GFP curves indicating that the N170 response was stronger for faces than words and that the habituation effect for faces started at the N170 peak.
The time segment in the N170 in which habituation effects were found for faces was further analyzed regarding global map strength and lateralization to test whether habituation effects were specific for faces, as would be indicated by an interaction between Category (faces, words) and Context (alternating, blocked) factors.
Global Field Power
The GFP analysis demonstrated significant effects for the Category × Context interaction [F(1,17) = 12.1, p < 0.01], and also for the two associated main effects of Category [F(1,17) = 17.7, p < 0.001] and Context [F(1,17) = 7.0, p < 0.05]. Post hoc t-tests revealed that faces elicited a larger N170 than words within each presentation context examined separately (blocked: p < 0.05, alternating: p < 0.001). Furthermore, post hoc t-tests revealed a significant habituation effect for the amplitude of the N170 for pure blocks of faces relative to faces presented in alternating blocks (p < 0.01), but no such effect was present for words (p = 0.58; see also Figure 3A).
Figure 3. Topographic ERP maps and occipito-temporal waveforms. (A) The segment identified by the TANOVA (168–188 ms) shows stronger N170 ERP maps for faces (top left two columns) than for words (bottom left two columns). As indicated by the t-maps, a habituation effect occurred only for the faces (top right), but not for the words (bottom right). The habituation effect was right-lateralized for the negative pole and left-lateralized for the positive central counterpart of the face N170 (the vertex positive potential, VPP, see Jeffreys, 1989 ). (B) The occipito-temporal waveforms illustrate the larger N170 for faces (black lines) than for words (gray lines). While the N170 for words was left-lateralized when presented after other words or after faces, the N170 for faces was right-lateralized only for the faces that followed words. Faces that were presented after other faces showed a bilateral N170, suggesting a right-lateralized effect of habituation.
To test the possibility that the differential habituation effect between the two stimulus categories was driven by the overall larger N170 response for faces compared to words, we computed the same analyses after GFP normalization for Category (division by GFP mean of blocked and alternating conditions, separately for each Category). Accordingly, the Category main effect was eliminated, and both the Context main effect [F(1,17) = 5.1, p < 0.05] and the Context × Category interaction remained significant [F(1,17) = 8.7, p < 0.01], indicating that differences in habituation between the categories were not due to overall GFP differences between face and word conditions.
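The normalization step can be illustrated with a minimal sketch, assuming it is applied per subject and per category; the function name and toy values are hypothetical. Dividing both context conditions by their mean forces the two normalized values of each category to average to 1, which removes the Category main effect by construction while preserving the relative size of the context effect.

```python
def normalize_by_category(gfp_blocked, gfp_alternating):
    """Divide both context conditions of one category by their mean GFP."""
    mean = (gfp_blocked + gfp_alternating) / 2.0
    return gfp_blocked / mean, gfp_alternating / mean

# Toy GFP values in microvolts for one subject (illustrative only).
faces_norm = normalize_by_category(3.0, 5.0)
words_norm = normalize_by_category(1.9, 2.1)

# Each category now averages to 1 across contexts, so only the
# proportional context effect differs between faces and words.
faces_mean = sum(faces_norm) / 2.0
words_mean = sum(words_norm) / 2.0
```

In this toy example the proportional difference between contexts is larger for faces than for words, which is the kind of pattern a surviving Context × Category interaction reflects.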
The occipito-temporal channel analysis at the topographic peak (see also Figures 3 A,B) corroborated the GFP analysis by revealing main effects for Category [F(1,17) = 48.0, p < 0.001] and Context [F(1,17) = 14.6, p < 0.01] as well as a Category × Context interaction [F(1,17) = 5.5, p < 0.05]. An additional Category × Hemisphere interaction indicated a more left-lateralized N170 for words compared to a more right-lateralized N170 for faces [F(1,17) = 5.0, p < 0.05]. Although the three-way interaction between Category, Context and Hemisphere resulted in only a non-significant trend [F(1,17) = 2.6, p = 0.12], a separate analysis conducted only on face stimuli showed that the face N170 was right-lateralized with alternating presentation, but bilateral with blocked presentation [Context × Hemisphere, F(1,17) = 5.7, p < 0.05]. Finally, to test whether the lateralization difference of the habituation effect for faces would be purely topographic we did the same analysis for the faces after normalizing for GFP. Indeed, the Context × Hemisphere interaction remained significant [F(1,17) = 4.7, p < 0.05].
The present study replicates previous findings reviewed above that visual word N170 exhibits left-lateralization and the face N170 right-lateralization, indicating different hemispheric involvement of the neural networks involved in processing these two classes of stimuli under equivalent task conditions. Most importantly, the present study demonstrated that in addition to these topographic main effects of stimulus category, the N170 response to faces and words responds differently to the sequential context in which the stimuli are presented.
The amplitude of the face N170 was strongest when each face was immediately preceded by a visual word, and the amplitude was significantly reduced (i.e., habituated) for faces preceded by other faces. This evidence of class-level habituation of the face N170 effect is in line with studies showing a reduction of the face N170 as a result of adaptation (Harris and Nakayama, 2007 ; Kovacs et al., 2006 ) and sensory competition (Jacques and Rossion, 2004 ). It suggests that presenting stimuli from the face category in a repetitive manner leads to similar neurophysiological effects on face processing as when a face is presented after viewing another face for an extended time period (adaptation) or when a face is presented in close spatial proximity of another face (spatial competition). The fact that such habituation effects occur across unique faces provides additional evidence that neural processes within the first 200 ms after presentation are sensitive to the general category of faces. Such habituation effects, however, may not only occur at the category-level, but may additionally be sensitive to information at the individual face level. This is suggested by a small N170 reduction to a face preceded by the same individual face compared to a face preceded by a different face (e.g., Itier and Taylor, 2002 ; Jemel et al., 2003 ), or by more substantial effects observed when using a long face adapter duration and a short ISI between the adapter and target faces (Jacques et al., 2007 ).
In addition to amplitude modulation, context effects also impacted the lateralization of the face N170. The habituation effect of the face N170 was stronger over the right hemisphere, such that faces presented among other faces were significantly less right-lateralized than faces interleaved with visual words. This novel result may help explain some of the variability across earlier studies regarding whether the face N170 was right-lateralized (Bentin et al., 1996 ; Itier and Taylor, 2004 ) or not (Deffke et al., 2007 ; Schweinberger et al., 2002a ,b ). This suggests that the same aspect of face processing that leads to the typical right-lateralized N170 topography is also prone to habituation. Habituation paradigms thus may serve as a tool to further understand the nature of right-lateralization in face processing.
A second aim of the study was to investigate whether such habituation effects are a general property of all N170 eliciting stimuli, or whether the processes of habituation that influence the face N170 may play out differently across other categories of stimuli known to induce N170 responses. In stark contrast to the face N170 results, the visual word N170 was unaffected by alternating versus blocked contexts. This null effect, in the context of the current study, drives a significant interaction between category and context, thereby demonstrating that N170 processes associated with faces and words are differentiated not just by patterns of lateralization, but also by patterns of habituation. This suggests habituation of the N170 reflects a functional difference between fast, specialized processing of words and faces rather than some more general process that applies equivalently across these classes of stimuli.
Future research will be required to further probe the nature of these functional differences, which amount to a form of category specific processing. Although the current study was not designed to differentiate the nature of such differences, there are multiple dimensions that distinguish early processing of words and faces in the current experiment which might be investigated. First, it is possible that top-down, block level factors were not fully equated across the three central conditions of this experiment. For example, detecting inverted words within entire blocks of words may require a different degree of attention than detecting inverted faces within entire blocks of faces. The accuracy and reaction time data in this study, however, do not support a task difficulty effect between entire blocks of words and faces, although it is possible that more sensitive measures might uncover such differences.
A second more general factor that could differentiate words and faces might involve differences in processing novel items versus frequently encountered items. Note that in the current experiment familiar categories of facial features formed novel faces, yet familiar categories of letters formed words of relatively high frequency and familiarity. However, it is unlikely that the novelty/familiarity of word forms is what accounts for the observed N170 habituation differences between faces and words. Indeed, the current finding of block context impacting faces but not words has recently been replicated in a study that extended our design to include contrasts between faces and novel non-words (Mercure et al., personal communication) thereby ruling out word level familiarity as a critical factor.
Another avenue to explore might examine lower level visual factors that could be thought to interact with block context. Face information typically consists of relatively low spatial frequency information compared to visual words – a factor which is thought to contribute to the right-lateralization of faces (Sergent, 1985 ) but also potentially results in greater spatial overlap between successive presentations of face stimuli. This explanation, however, may be less likely, as a reduction of the face N170 can also be obtained with line-drawings of faces which consist of little low-frequency information (Harris and Nakayama, 2007 ).
Alternatively, these effects may be driven by factors that are more intricately related to differences in the way these two classes of stimuli are learned. Because reading typically involves rapid sequential processing of many stimuli, each of which falls within the category of visual words, habituation within blocks of visual word stimulation would be at odds with the functional goals of fluent reading. Naturalistic face processing, on the other hand, may rarely require processing of multiple faces presented in rapid succession. Thus, within this category of stimuli, category-level habituation may not interfere with overall processing goals. At the same time, it is possible that category-level habituation of structural features across faces may serve to enhance function within this domain. If naturalistic face processing typically involves looking at a face for an extended period of time, habituation to information important for classification of a stimulus as a face may facilitate the detection of more subtle changes within a face, as in the case of tracking facial expressions over time. Thus, both the presence of N170 habituation for faces and its absence for words may help meet specific processing demands unique to the processes applied to each category of stimuli.
In conclusion, it appears that in addition to lateralization differences often reported between face N170 and visual word N170 topographies, N170 responses for each of these categories reflect early perceptual processes that can be differentiated by their habituation profiles. Whatever parallels may be drawn between face and visual word N170 phenomena (for review, see McCandliss et al., 2003 ), clear differences between these categories must be accounted for during the early phases of perceptual encoding.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This research was supported by the Swiss National Science Foundation (Fellowship for Prospective Researchers: UM), and the US National Science Foundation (NSF 529112: BDM). We thank Nicholas Hindy for assistance in data collection.
Rossion, B., Gauthier, I., Tarr, M. J., Despland, P., Bruyer, R., Linotte, S., and Crommelinck, M. (2000). The N170 occipito-temporal component is delayed and enhanced to inverted faces but not to inverted objects: an electrophysiological account of face-specific processes in the human brain. Neuroreport 11, 69–74.