Social attention with real versus reel stimuli: toward an empirical approach to concerns about ecological validity
- 1Social and Behavioral Sciences, Cognition and Natural Behavior Laboratory, Arizona State University, Glendale, AZ, USA
- 2Department of Psychology, University of British Columbia, Vancouver, BC, Canada
- 3Department of Psychology, University of Sheffield, Sheffield, UK
- 4Department of Psychology, University of Essex, Colchester, UK
Cognitive neuroscientists often study social cognition by using simple but socially relevant stimuli, such as schematic faces or images of other people. Whilst this research is valuable, important aspects of genuine social encounters are absent from these studies, a fact that has recently drawn criticism. In the present review we argue for an empirical approach to the determination of the equivalence of different social stimuli. This approach involves the systematic comparison of different types of social stimuli ranging in their approximation to a real social interaction. In garnering support for this cognitive ethological approach, we focus on recent research in social attention that has involved stimuli ranging from simple schematic faces to real social interactions. We highlight both meaningful similarities and differences in various social attentional phenomena across these different types of social stimuli thus validating the utility of the research initiative. Furthermore, we argue that exploring these similarities and differences will provide new insights into social cognition and social neuroscience.
Imagine the following scenario. You are walking down a busy city street and you see a life-size mural of two people sitting down eating a meal. You approach the mural and inspect it. This inspection will likely involve a characteristic pattern of eye movements and brain activity. Now imagine walking down that same busy city street and seeing two real people sitting down eating a meal. You approach and inspect them. This inspection will also involve a characteristic pattern of eye movements and brain activity. The question motivating the present review is the extent to which researchers should expect the patterns across these situations to be qualitatively and quantitatively equivalent and, more fundamentally, how to approach such a question. This issue has recently surfaced in the context of research on social neuroscience given its reliance on stimuli more akin to the first scenario (e.g., simple, static representations of socially relevant stimuli) than the second scenario (e.g., an actual live social interaction) in attempting to map the social brain. One of the critical assumptions driving social neuroscience is that the knowledge gained about the social brain using the former class of stimuli will generalize to the richer scenarios associated with everyday social cognition. However, as others have remarked, this could prove to be a dangerous assumption (Neisser, 1978; Ochsner, 2004; Schilbach et al., 2006; Kingstone et al., 2008; Kingstone, 2009; Zaki and Ochsner, 2009). That said, it is important that this concern not turn into a presumption of non-equivalence (see Mook, 1983). Rather, we argue for an empirical approach to the determination of the equivalence of different social stimuli.
Specifically, we argue for the systematic comparison of different types of social stimuli ranging in their approximation to a real social interaction as a means to address issues about the equivalence of social stimuli and as a means to provide new insights into social cognition and social neuroscience.
In the review that follows, we describe a number of studies in the context of social attention research that assess putatively social phenomena in different environments ranging in their approximation to a real social interaction. While it is difficult to operationalize the extent to which a stimulus approximates a real social interaction, we have tried to sample stimuli that would span the implied continuum. Toward this end, we discuss social attention research using static schematic faces, dynamic schematic faces, static photographs of faces, static photographs of people in complex social scenes (e.g., people having lunch), dynamic images of people in complex social scenes (e.g., a movie), situations with the potential for real social interaction (e.g., walking down a street), and real social interactions (e.g., in conversation). Focusing our review on the social attention literature allows us to engage the discussion about the equivalence of social stimuli within a common framework, though the issues are by no means restricted to social attention. This review is not meant to be exhaustive; instead, it focuses on research that highlights both similarities and differences in how we attend to social stimuli that vary in their approximation to a real social interaction. Thus, the purpose is not simply to advocate for the use of more naturalistic stimuli (as others have done) but to provide examples that testify to the utility (and necessity) of such an approach. In this respect, the modulation of various social phenomena by the nature of the stimulus (i.e., looking at an image of a face versus looking at a real face), which can only be revealed through comparison between stimuli, provides a central piece of the puzzle. Thus, we hope to support this special issue's call to “go social” by describing some of the work that has “gone social” and what it has revealed about social cognition and social cognitive neuroscience.
Furthermore, while we highlight relevant neuroscientific research where appropriate, it is important to note that many of the examples provided are behavioral. Nonetheless, given that human behavior is the bedrock of social neuroscience, the implications for social neuroscience are no less clear.
Advocating for the use of stimuli that vary in their approximation to a real social interaction in the context of studying social cognitive neuroscience is not in and of itself a condemnation of research using social stimuli that are far removed from such a situation. These “unnaturalistic” stimuli have clear benefits (e.g., control) and abandoning their use is fraught with as many challenges as neglecting to use stimuli that more closely approximate a real social interaction. For example, eschewing stimuli because they are not “naturalistic” would severely limit a researcher's ability to isolate the mechanisms that make social cognition possible. Take for example the point light walkers used in studies of biological motion. This research has made important contributions to our understanding of social cognition and social cognitive neuroscience (e.g., Pavlova, 2012) arguably as a direct result of stripping away characteristics of the stimuli that might make them more “naturalistic” on some level. The approach advocated here embraces the entire range of available social stimuli and specifically highlights the utility of directly comparing between them. That being said, in the present context history demands that an emphasis be put on highlighting the positive aspects of using stimuli that better approximate a real social interaction as opposed to highlighting, for example, the positive aspects of the status quo, though this should not be taken as indicating that the latter is devoid of such aspects. Extensive discussions of the benefits of “external invalidity” are available elsewhere (see Mook, 1983; Banaji and Crowder, 1989).
Before beginning the review it is important to note that using stimuli that better approximate a real social interaction comes with methodological challenges. For example, while monitoring behavioral and/or neural responses to a picture of two people engaging in a social interaction is straightforward, it would be difficult to monitor behavioral and/or neural responses (particularly the latter) as individuals actually engage in a real social interaction. Despite these difficulties, we do not see the challenge as insurmountable, and in fact we will highlight research that has begun to overcome some of these challenges. Furthermore, taking on the methodological challenge will likely require innovations (e.g., technological) and new paradigms for exploring social cognition (e.g., Wilms et al., 2010) both of which would likely be viewed as welcome. Lastly, even if some aspect of real social interactions were beyond the scope of current (and future) methods, this would not negate the benefits of exploring the comparisons that are technologically feasible (e.g., comparing a static schematic face to a real dynamic face). The following review aims to provide support for these claims.
Folk knowledge suggests that people are very interested in where other humans are directing their attention. Driven by this intuition, researchers have proposed that eye gaze represents a special social attentional cue (Baron-Cohen, 1995), and that this cue is associated with specific neural mechanisms (such as that revealed by activity in the superior temporal sulcus; Campbell et al., 1990; Itier and Batty, 2009). Gaze direction can give the observer an indication of a person's mental state, their focus of attention, and their goals (Baron-Cohen, 1995; Shimojo et al., 2003; Tipper et al., 2003; Ristic et al., 2005; Frischen and Tipper, 2006). This notion leads to the expectation that where someone looks should have a profound impact on where we allocate our attention (i.e., we should attend to where others are looking). This idea dovetails with work suggesting that the morphology of the eyes has evolved for social communication (Kobayashi and Kohshima, 1997) and that we are skilled at detecting the direction of gaze (e.g., Anderson et al., 2011).
To examine gaze following in the laboratory, researchers have modified a cueing task popularized by Posner (1980) and used it to investigate whether people are biased to attend to where someone else is looking. Typically observers are presented with a schematic face that looks to the left or right. This is then followed by the presentation of a target to the left or right of the face. Results from such experiments indicate that people are faster to respond to the target when it appears at the location the face is looking at (Friesen and Kingstone, 1998; Driver et al., 1999; Langton and Bruce, 1999). This gaze cueing effect occurs rapidly (i.e., less than 100 ms after the appearance of the cue; Friesen and Kingstone, 1998; Frischen et al., 2007) and is thought to be largely obligatory (e.g., orienting in response to gaze occurs even if the gaze-cue is counter-predictive; Friesen et al., 2004). Gaze cueing has become a signature of not only the tendency for people to reorient attention in the direction of another's eyes but of social attention in general. The latter point is well supported by the proportion of social cognitive neuroscience papers that focus on gaze following (as opposed to other potential social behaviors; see Itier and Batty, 2009).
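The gaze cueing effect described above is, at its simplest, a difference between mean response times. The following is a minimal sketch of that arithmetic, assuming trial records with hypothetical field names (`gaze_side`, `target_side`, `rt_ms`) and made-up response times; it is not the analysis pipeline of any of the cited studies, which typically also exclude errors and outliers and analyze the effect per participant and per cue-target interval.

```python
from statistics import mean


def gaze_cueing_effect(trials):
    """Mean RT on uncued trials minus mean RT on cued trials, in ms.

    A trial counts as "cued" when the target appears on the side the
    face is looking at. A positive value means faster responses at the
    gazed-at location.
    """
    cued = [t["rt_ms"] for t in trials if t["gaze_side"] == t["target_side"]]
    uncued = [t["rt_ms"] for t in trials if t["gaze_side"] != t["target_side"]]
    if not cued or not uncued:
        raise ValueError("need at least one cued and one uncued trial")
    return mean(uncued) - mean(cued)


# Hypothetical data for illustration only (not taken from any study).
example_trials = [
    {"gaze_side": "left", "target_side": "left", "rt_ms": 310},
    {"gaze_side": "right", "target_side": "right", "rt_ms": 330},
    {"gaze_side": "left", "target_side": "right", "rt_ms": 345},
    {"gaze_side": "right", "target_side": "left", "rt_ms": 335},
]
effect = gaze_cueing_effect(example_trials)
```

With these illustrative numbers the cued mean is 320 ms and the uncued mean is 340 ms, so the sketch yields a 20 ms cueing effect. Comparing this quantity across stimulus types (e.g., schematic versus photographic faces) is the kind of comparison the review advocates.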
While the simple elegance of the original gaze cueing paradigm is laudable, a cursory glance at the simple schematic faces typically used raises just the kind of question discussed in the introduction. While the schematic faces and eyes are recognizable as such, they are clearly not real faces and real eyes (i.e., the ones we presumably follow and have followed over our lifetime). This leads to the concern that schematic faces could elicit different behavioral and neural responses than real faces. Consistent with this concern, Sagiv and Bentin (2001) demonstrated important differences in how faces are processed when those faces are schematic versus real images of faces. While schematic faces and real images of faces generated an equivalent N170, an ERP component thought to index face processing in the right hemisphere (Bentin et al., 1996), when the researchers inverted the faces the neural response was qualitatively different across the different stimulus types. Specifically, inversion of schematic faces led to a reduction in the amplitude of the N170 whereas inversion of real images of faces led to an enhancement of the N170. The authors attributed the difference to the relative abilities of the two stimulus types to engage holistic and part-based face processing mechanisms. Thus, even a simple difference in the stimulus (i.e., schematic face versus an image of a real face) can lead to a qualitative difference in brain activity in response to that stimulus. Results such as these underline the potential that gaze processing might be influenced by inherent differences in stimuli that covary with changes in the extent to which they represent naturalistic social stimuli.
Gaze Cueing with Images of Real Faces
Researchers have discovered important differences in gaze cueing when using stimuli that vary in their approximation to a real social interaction. For example, Hietanen and Leppanen (2003) compared gaze cueing using schematic and real images of faces. While they found that both types of stimuli produced a significant gaze cueing effect, schematic faces actually produced a larger gaze cueing effect than real images of faces. This particular form of non-equivalence can be interpreted in a number of theoretically useful ways. For example, on the argument that gaze cueing with schematic faces is social, one might expect that a change in the stimulus that made it more similar to the gaze cues we typically encounter in social interactions would increase the magnitude of the gaze cueing effect. That it did not might suggest that orienting in response to schematic faces is at least partially mediated by non-social mechanisms (e.g., motion cues; Farroni et al., 2000). Alternatively, as Hietanen and Leppanen (2003) suggest, the use of a schematic face could enhance the gaze cueing effect by reducing the noise introduced by the presence of other facial features (e.g., skin texture) that are typically present while individuals follow the gaze of conspecifics.
Gaze Cueing with Dynamic Stimuli
Aside from schematization, the stimuli typically used in gaze cueing studies also differ from real faces in that the former are static rather than dynamic. Motion is an important aspect of face processing (e.g., Curio et al., 2010) and gaze following at least early in development (Farroni et al., 2000). For example, Farroni et al. (2000) demonstrated that early in development individuals would only orient to gaze if a motion cue was present (i.e., the eyes actually moved). While adults do not require such a cue in order to follow gaze (i.e., static gaze cues yield gaze cueing effects; Friesen and Kingstone, 1998), research using complex dynamic gaze cues has revealed interactions between gaze and emotion (Putman et al., 2006) that are absent (or much less pronounced) using simple static or simple dynamic gaze cues (Hietanen and Leppanen, 2003). Hietanen and Leppanen (2003) compared a static gaze-cue and a simple dynamic gaze cue. In the dynamic condition, a face was presented initially with straight gaze and after a delay a face was presented with averted gaze, thus giving the appearance of the eyes moving. In the static condition, only the latter image was presented. Results demonstrated a significant cueing effect in both conditions and no difference in the magnitude of the gaze cueing effect across conditions. Furthermore, Hietanen and Leppanen (2003) failed to find any evidence for an effect of facial emotion (e.g., happy, sad, fearful) on the magnitude of the gaze cueing effect using either type of stimulus (i.e., static or dynamic). Thus, across a static and dynamic gaze cue, the pattern of results appeared similar such that the gaze cueing effects were equivalent and showed a similar lack of interaction with the emotion of the face.
In contrast to the Hietanen and Leppanen (2003) research, Putman et al. (2006) did find an interaction between gaze cueing and emotion (i.e., greater gaze cueing effect for fearful expressions) when they employed a more complex dynamic representation of emotion and gaze. Putman et al. (2006) used stimuli wherein both the emotion and the gaze changed simultaneously across frames of a video (rather than a two-frame gaze-only change). Thus, the emotion-based modulation of gaze cueing was revealed when emotion and gaze changes occurred dynamically (i.e., a stimulus that better approximates a natural social stimulus; see also Bayless et al., 2011). One potential explanation for this pattern of results is based on the relative ability of static and dynamic faces to engage areas of the brain responsible for social cognition (e.g., Kilts et al., 2003; Sato et al., 2004; Schultz and Pilz, 2009; Vuilleumier and Righart, 2011). Specifically, given that the majority of our experience with faces, and faces displaying emotion, is with dynamic faces, the neural regions dedicated to processing this type of information may show a stronger response when presented with dynamic relative to static faces (Schultz and Pilz, 2009). In a similar vein, it may be, as others have argued (e.g., O'Toole et al., 2002), that motion aids in facial recognition either because it facilitates perception of the 3D structure of the face or because we actually retain motion information about a face when storing its representation. Consistent with these ideas, Schultz and Pilz (2009) found that dynamic faces elicited stronger responses than static faces in face processing areas of the brain and Vuilleumier and Righart (2011) note that of the limited fMRI studies that used dynamic faces as stimuli, responses were increased compared to those elicited by static stimuli in face-sensitive areas. 
These stronger responses are also related to improved learning (i.e., there is a dynamic advantage for learning faces; Pilz et al., 2006, 2009). With respect to emotion specifically, in a series of studies Sato and colleagues (2004, 2007a,b) have demonstrated that dynamic stimuli are better able to engage the mechanisms that support the processing of emotion. For example, dynamic expressions of fear activated the amygdala more strongly than static expressions of fear (Sato et al., 2004), and dynamic expressions were more likely to lead to facial mimicry (Sato and Yoshikawa, 2007b). If the interaction between gaze and emotion is linked to the effectiveness of the stimulus in engaging the mechanisms responsible for understanding emotion in others (as seems likely) or social cognition in general, then this could explain the increasing likelihood of observing gaze and emotion interactions with stimuli that better approximate a real social stimulus. It is interesting to note that even the complex dynamic stimuli used by Putman et al. (2006) and others (Bayless et al., 2011) are subtly different from viewing, for example, a video of a real face or a real face (Schultz and Pilz, 2009), suggesting the need for further research. Thus, while the gaze cueing effect is present across a wide range of stimuli varying in their approximation to a real (live) gazing face, it is clear that important differences also exist. In the next section we consider another salient social attentional phenomenon: the bias to attend to others.
Attending to Others
In addition to people's tendency to follow gaze, researchers interested in social attention have also focused on people's tendency to orient attention to other people, their faces and in particular, their eyes. Attending to others represents an important pre-requisite to normal social functioning. In the following, we review research investigating overt attention to others using stimuli that vary in their approximation to a real social interaction.
The Eye Bias
One of the most investigated areas in social attention research concerns the bias of individuals to attend to the eyes of others. This research has typically employed measures of overt attention (i.e., eye tracking) while individuals view still photos of faces. For example, individuals will spend the majority of their fixations on the eyes of the faces in the photos (Walker-Smith et al., 1977; Barton et al., 2001; Henderson et al., 2005). As with gaze cueing, attending to the eyes of others seems to be at least partially automatic (Itier et al., 2007; Laidlaw et al., in press). Here again, the eyes are viewed as a kind of “special” cue for social attention. Indeed, some have suggested that there exists a neural mechanism devoted exclusively to the detection and processing of gaze information (e.g., the Eye Direction Detector; Baron-Cohen, 1995) though neural evidence for such a module is mixed (see Itier and Batty, 2009).
The Eye Bias in Static Complex Social Scenes
One potentially important difference between the types of stimuli typically used in studies demonstrating an eye bias (e.g., still photos of faces) and a real social interaction is that in the latter, the eyes are embedded within a complex visual array consisting of other objects (animate and inanimate) that could compete for attention. From research on attention to the eyes during face perception, it is unclear whether biases toward the eyes reflect true interest in the eyes or a less social phenomenon, such as a center of gravity effect initially pulling gaze to the eyes of forward facing images (e.g., Bindemann et al., 2009). To examine this question, Birmingham et al. (2008) investigated the eye bias in complex static social scenes containing one or several people in a variety of poses either doing something (e.g., reading a book; active scenes) or doing nothing (e.g., sitting on their own; inactive scenes). In addition, participants were given one of three possible task instructions: to view freely, to describe the scene, or to describe where people in the scene were directing their attention. Results demonstrated that even in these complex static scenes with multiple potential objects competing for attention, participants committed the highest proportion of their fixations to the eyes of others in the scene (controlling for the size of the stimulus). The magnitude of the eye bias, however, was not invariant across conditions. Birmingham et al. (2008) demonstrated that the eye bias was stronger in the more social scenes (i.e., scenes containing multiple people doing something together) and in the task requiring social cognition (i.e., describe where people were attending). Thus, the bias to attend to the eyes of others extends to complex static scenes and is modulated by “social” factors such as the number of individuals in the image. Importantly, the latter finding would have been impossible to uncover had only single isolated faces been used as stimuli.
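To make concrete what controlling for the size of the stimulus can mean when comparing fixations across regions, the sketch below divides each region's fixation count by the region's area before converting to proportions. This is one common normalization, assumed here for illustration; it is not necessarily the exact procedure used by Birmingham et al. (2008). The region names, counts, and areas are hypothetical.

```python
def area_normalized_proportions(fixation_counts, region_areas):
    """Return each region's share of fixations after dividing by its area.

    fixation_counts: mapping of region name -> number of fixations
    region_areas:    mapping of region name -> area (any consistent unit)
    """
    densities = {}
    for region, count in fixation_counts.items():
        area = region_areas[region]
        if area <= 0:
            raise ValueError(f"region {region!r} must have a positive area")
        # Fixations per unit area, so small regions are not penalized.
        densities[region] = count / area
    total = sum(densities.values())
    if total == 0:
        raise ValueError("no fixations recorded in any region")
    return {region: d / total for region, d in densities.items()}


# Hypothetical scene: areas are percentages of the image (illustration only).
counts = {"eyes": 30, "heads": 20, "bodies": 30, "background": 20}
areas = {"eyes": 1, "heads": 4, "bodies": 15, "background": 80}
proportions = area_normalized_proportions(counts, areas)
most_attended = max(proportions, key=proportions.get)
```

In this made-up example the eyes and the bodies receive the same raw number of fixations, but once area is taken into account the eyes dominate, because they occupy a far smaller part of the image. Without such a correction, a small region like the eyes would be systematically disadvantaged in complex scenes.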
The Eye Bias in Dynamic Social Scenes
A complex static scene, like those used in Birmingham et al. (2008), might provide a better approximation to a real social interaction than an isolated face, but it nevertheless falls short in at least one important respect: natural social interactions are dynamic not static. To address this important difference, Foulsham et al. (2010) explored attention while individuals watched a dynamic social interaction. Specifically, participants viewed a video recording of people taking part in a group decision-making task while their eye movements were monitored. Important for the current discussion, Foulsham et al. (2010) demonstrated that, as with isolated faces and complex social scenes, most of the fixations on people were targeted at an individual's eye region. Thus, the eye bias was present in a complex dynamic scene consisting of individuals gesturing, taking turns speaking, and against a complex background where the eyes were relatively small. These data demonstrate that the general bias to look at the eyes is present in static isolated faces, complex static scenes and complex dynamic scenes (i.e., videos).
While the studies above have identified clear similarities in attention to the eyes across a range of social stimuli (i.e., isolated images of faces, complex static scenes and complex dynamic scenes), a recent series of studies investigating the eye bias in individuals with autism spectrum disorder (ASD) has also revealed important differences (Klin et al., 2002; Pelphrey et al., 2002; Speer et al., 2007). Individuals with autism are believed to have impairments in social attention. Indeed, marked impairment in eye contact and responding to gaze during infancy, childhood and adulthood is a diagnostic feature of the disorder (Lord et al., 2000). Consequently, autism has figured prominently in investigations of social attention with numerous studies attempting to assess which aspects of social attention are deviant in those with ASD (e.g., Klin et al., 2002; Dawson et al., 2004; Fletcher-Watson et al., 2009; Freeth et al., 2011a,b). Pelphrey et al. (2002) reported that when (high-functioning) individuals with ASD looked at static faces they showed less of a bias to attend to the eyes than did individuals without ASD. This finding is consistent with the general notion that individuals with ASD have a social attentional deficit such that they fail to pay attention to salient social cues like eyes. However, van der Geest et al. (2002a,b) failed to replicate the behavioral patterns found by Pelphrey et al. (2002) both within a similar task (Experiment 1; van der Geest et al., 2002a) and using static complex social scenes (van der Geest et al., 2002b). Freeth et al. (2010) also found that non-developmentally delayed adolescents with ASD spent a similar proportion of overall viewing time fixating on the eye and mouth region of people when presented with static complex scenes.
The research failing to detect an overall differential attentional bias toward the eyes of others in autism relied on static scenes. As with the interaction between gaze and emotion in gaze cueing, the pattern of findings appears to be different when dynamic social stimuli are considered. Specifically, Klin et al. (2002) found a robust difference in eye bias across an autistic and non-autistic sample (i.e., a marked reduction in attention to the eyes in individuals with autism) using dynamic social scenes (i.e., a movie). Furthermore, they found that attention to the eye region was the best predictor of group membership (i.e., autistic group versus non-autistic group). In a recent attempt to reconcile these disparate findings across static and dynamic stimuli, Speer et al. (2007) compared gaze patterns in an autistic and non-autistic sample using four types of stimuli: (1) social dynamic (i.e., social encounter in a movie), (2) isolated dynamic (i.e., a single person in a movie), (3) social static (i.e., two or more people in static scene), and (4) isolated static (i.e., one person in a static scene). Critically, all of the stimuli were from the same movie used by Klin et al. (2002). Speer et al. (2007) demonstrated that, in the dynamic social condition, individuals with autism were less likely to look at the eyes than individuals without autism (replicating Klin et al., 2002; see also Riby and Hancock, 2009); however, the two groups did not differ in any of the remaining conditions (i.e., isolated dynamic, social static, isolated static). As with the emotion and gaze cueing studies, one interpretation of these results is that the amount of overlap between the stimulus and a typical social situation (which is strongest in the dynamic social condition) determines the extent to which a stimulus engages areas of the brain responsible for social cognition.
Thus, with greater similarity between stimuli and real social situations, the differences between a typically developing group and a group with social attentional deficits (i.e., individuals with autism) might better reveal themselves, assuming these differences are based in the relative function of the neural mechanisms supporting social cognition. In other words, more contrived and less socially realistic stimuli may serve to mask underlying deficits and equate performance across two groups who, in actuality, perform very differently in everyday social situations. Whatever the mechanism responsible for the disparate findings, it seems clear that the nature of the social stimuli may be particularly important in investigating social cognition in special populations. In the following section we move from attention to the eyes of others to attention to others in general and also shift to recent studies that have been conducted in situations that involve either the potential for, or the involvement in, a real social interaction.
Social Attention in the Wild
The studies reviewed above, and social cognitive neuroscience in general, have focused predominantly on individual minds and brains observing representations of other people (e.g., static image or dynamic set of images). This approach, however, seems to overlook a defining attribute of social cognition, namely, social interaction (De Jaegher et al., 2010; Schilbach, 2010). While it may be difficult to identify the attributes that constitute a real social interaction (see De Jaegher et al., 2010 for a recent attempt), the notion of reciprocity or at least the potential for reciprocity seems to be central. The individuals depicted in images or movies can neither look back at the observer nor alter their behavior in response to the observer's actions. In addition, the observer's actions cannot influence the individuals in the static images or movies. These missing elements are potentially important given the view that the neural mechanisms that realize social attention likely evolved to facilitate this two-way interaction (e.g., Emery, 2000). In the following we discuss research aimed at understanding how individuals attend to others in situations that have the potential for real social interaction or actually involve real social interaction (i.e., in the “wild”).
Foulsham et al. (2011) asked whether the allocation of gaze in a live situation was the same as that observed while individuals watched a video of a similar situation. Participants wore a mobile eye tracker while walking to buy a coffee, a trip that required a short walk outdoors through a university campus. These same participants subsequently watched, in the laboratory, first-person videos of their own walk or the same walk by another participant. Critically, by presenting video of the same events to people in the laboratory condition, the contents of central vision were kept as similar as possible across the real and movie conditions. This permitted a comparison of individuals' attention to others while embedded in the actual “buying coffee” situation versus simply watching a video of someone participating in the “buying coffee” situation (from a first-person perspective). While there are a number of informative comparisons to be made in this study (see Foulsham et al., 2011), we focus here on individuals attending to other people.
In Foulsham et al. (2011), other people were frequently fixated in both the live and video conditions, and the amount of time spent looking at people was equivalent across conditions. Interestingly, while the amount of time fixating people was similar across the conditions, there was a subtle difference in when people were looked at. Specifically, people in the scene who were far away from the observer were looked at equivalently in both conditions (i.e., live and video); however, when people in the scene were close to the observer (e.g., were approaching to pass by) they were more likely to be gazed at in the video condition than in the live condition. This result suggests that when there exists the potential for social interaction (e.g., for the walkers in the live condition), participants adjusted their attentional focus, perhaps as a means to deter such interaction.
In a related study, Laidlaw et al. (2011) compared an individual's tendency to look at other people in a live and video condition. The Laidlaw et al. (2011) experiment took place in a more intimate setting than the Foulsham et al. (2011) study and focused exclusively on social looking behavior. Participants were told that they were taking part in a “real world search” task that involved wearing a mobile eye tracker. Participants were fitted with the eye tracker, calibrated, and then told to wait in a room for the experimenter to return. Participants were unaware that this waiting period was part of the experiment. Critically, for half of the participants there was a confederate sitting in the waiting room and for the other half there was a videotape of the same confederate filmed from an earlier session. The live confederate did not interact with the participant but, from the participant's perspective, the potential for interaction certainly existed. Thus, the experiment compared looking behavior in a waiting room where the potential for social interaction existed (in the case of the live confederate) or was absent (in the case of a recording of the confederate). The results were consistent with those from Foulsham et al. (2011). Specifically, Laidlaw et al. (2011) demonstrated that participants looked at the videotaped confederate more often and for a longer duration than the live confederate. In addition, when Laidlaw et al. (2011) compared gaze to the confederates versus a baseline non-social object in the room, the frequency and duration of looks to the live confederate were actually lower than those to the baseline object, whereas the frequency and duration of looks to the videotaped confederate were significantly greater than those to the baseline object.
The Laidlaw et al. (2011) results provide a compelling counter-point to the idea (generated from research using social stimuli with no potential for social interaction) that individuals are biased to attend to other people, and they hint at the influence of complicated social norms and practices that may govern social attention within real social situations. Interestingly, this result also implies that measuring “social” attention in scenarios that do not allow for social interaction (i.e., eye movements in social scenes) may exaggerate the extent to which we attend to others in everyday situations. Together with Foulsham et al. (2011), these results suggest that a live situation fundamentally alters how people attend to people. Specifically, when attentional objects represent real social agents for whom the actions of the observer (e.g., gazing) would have meaning, gaze patterns change.
Further support for the notion that the potential for social interaction alters attention has recently been provided in the context of gaze following. Gallup et al. (2012a,b) assessed the tendency for individuals to follow the gaze of others using naturalistic observation. The researchers placed an attractive object in a busy hallway and monitored individuals' gaze behavior in response to the gaze of other pedestrians. Gaze toward the attractive object increased when other nearby pedestrians looked toward the object, consistent with the gaze following research reviewed above. Interestingly, Gallup et al. (2012a) demonstrated that this gaze following behavior was modulated by whether the nearby pedestrian was walking toward or away from the “participant” (i.e., the individual who did or did not follow gaze). Specifically, when the participant was behind the individual who looked at the attractive stimulus (i.e., that individual could not see the participant), gaze following was frequent. However, when the participant was facing the individual who looked at the attractive stimulus, they were actually less likely to look at it than if no one had looked at the attractive stimulus (i.e., a baseline condition). Thus, individuals were less likely to follow the gaze of someone who could see them. Note that in Gallup et al. (2012a) individuals did not merely fail to exhibit gaze following when the nearby pedestrian was facing them; rather, their gaze was inhibited when the oncoming pedestrian gazed toward the attractive object. As with the Laidlaw et al. (2011) results, this research provides a salient counter-point to the power of gaze following in more traditional laboratory set ups. Similar results were reported by Gallup et al. (2012b) using a paradigm based on early work of Milgram et al. (1969). In this experiment, again using naturalistic observation of gaze following, Gallup et al. (2012b) placed confederates at a heavily trafficked location and had them stand and look upward. Important for the present discussion, pedestrians were more likely to follow the gaze of the confederate (i.e., look up) when they passed behind them rather than in front of them. Thus, again, gaze following was dependent on the relation between the gazer and the gaze follower.
The majority of studies reviewed in this section have involved potential social interactions between the observer and the observed. Participants inhabited the same environment as the other people and were able to interact with them, but situations were controlled so that no verbal or physical interaction actually took place. In a recent study, Freeth et al. (under review) again compared a live condition to a video condition (as in Foulsham et al., 2011; Laidlaw et al., 2011), but this time in the context of a genuine social interaction. In both conditions (manipulated across experiments) a female interviewer sat across a desk from the participant who was wearing an eye tracker and asked the participant a series of questions. In the live condition the interviewer was physically in the same room as the participant, thus replicating a real social interaction with its associated reciprocity. In the video condition the same social interaction was completed but the “interviewer” was a pre-recorded video of the interviewer.
Freeth et al. (under review) demonstrated a number of common gaze patterns across these conditions. For example, in both conditions, participants spent most of the time looking at the interviewer's face. In addition, participants were more likely to look at the interviewer, especially their face, when they were being asked a question versus when they were answering a question. Freeth et al. (under review) also found a number of interesting differences in gaze patterns across conditions. For example, in the live interview there was an eye contact effect that was not present in the video interview. Specifically, participants in the live condition were more likely to look at the interviewer's face than her body when the interviewer made eye contact than when eye contact was not made. This was not true in the video condition. Thus, interviewer eye contact was more effective at capturing participants' attention in the live interviews.
One interpretation of all of these results is that the meaning of attending to another (i.e., looking at another person) or attending to what another is attending to (i.e., gaze following) is altered by the nature of the situation. Previous research (including that reviewed above) has consistently shown a strong bias to attend to people, their faces, and their eyes, but studies of interpersonal behavior have suggested that in a natural context people will sometimes avoid looking at others, a phenomenon known as civil inattention (Goffman, 1963; Zuckerman et al., 1983). Again, a key consideration is the imminent potential for interaction in “real” versus “reel” social situations. Looking at someone is a potent social signal; however, this is only true (for the most part) when the individual at whom we are gazing is real. Returning to our two situations in the introduction, you can stare at a mural of two people sitting down to a meal as much as you like, but the equivalent behavior when those two people are real social agents could have very different consequences. Risko and Kingstone (2011) provided further evidence for the importance of social context for individuals' tendency to look at people by demonstrating that monitoring an individual's looking behavior with an eye tracker will reduce their tendency to look toward a provocative stimulus. Thus, the knowledge that one's eyes are being watched alters looking behavior, a result consistent with the impact of social presence on behavior (Bond and Titus, 1983; Risko et al., 2006; Crosby et al., 2008).
The idea that gaze takes on different meaning as the stimuli better approximate a real social interaction was recently investigated by Pönkänen et al. (2011) using event related potentials (ERPs) and a design conceptually analogous to those reviewed here (e.g., Foulsham et al., 2011; Laidlaw et al., 2011). These researchers assessed differences in face related brain activation for averted versus direct gaze by a real person or a static image of a person. Pönkänen et al. (2011) focused on the N170, a component demonstrated to be sensitive to averted versus direct gaze. Critically, when the gazer was a live person, the difference in the N170 between direct and averted gaze was larger than when the gazer was a static image. In other words, the neural response to gaze is modulated by the extent to which the stimulus approximates a real social interaction. Hietanen et al. (2008) reported similar results. These researchers demonstrated that direct gaze more strongly activated the approach-avoidance system than averted gaze, as indexed by electroencephalography and skin conductance measures. However, this was only true when the gazer was a live actor and was not true when the gazer was a static image of another person. Thus, as we have seen at various points in this review, a putatively social phenomenon (i.e., the difference between direct and averted gaze) is modulated by the extent to which the stimuli are real versus reel.
Further neuroscientific evidence for a difference between live and video interaction has been provided by Redcay et al. (2010). They report a study in which participants either took part in an interaction with the experimenter via video feed (while the participants were in a functional magnetic resonance imaging scanner) or watched a taped version of the same interaction. Thus, in one condition a live social interaction took place, while in the other participants merely watched an interaction. Redcay et al. (2010) found increased activity in the live condition across a number of areas associated with social cognition, including the right posterior superior temporal sulcus and the right temporoparietal junction. There was also increased activity in the live versus recorded condition in regions associated with attention (e.g., dorsal anterior cingulate cortex) and reward (e.g., regions within the ventral striatum). Redcay et al. (2010) also compared activity across a joint attention condition, wherein the participant followed the experimenter's gaze to find a target, and a solo attention condition, wherein the participant did not follow the experimenter's gaze but the experimenter was nonetheless present. Critically, differences in brain activity between the joint and solo attention conditions were specific to the social cognitive brain regions that had previously been demonstrated to exhibit increased activity in the live condition relative to the recorded condition. Thus, a live social interaction was better able to engage the neural mechanisms thought to be intimately involved in social cognition. Taken together, these findings converge on the conclusion that a “live” situation fundamentally alters how individuals attend to others and, accordingly, how their brains respond to social stimuli.
Beyond Social Attention—the Mirror Neuron System
The general notion that some stimuli would be better at engaging the social brain than others (an idea touched on throughout this review) has received support from research on the mirror neuron system. The mirror neuron system “transforms sensory information describing actions of others into a motor format similar to that the observers internally generate when they imagine themselves doing that action or when they actually perform it” (Rizzolatti and Fabbri-Destro, 2008, p. 179). This system is hypothesized to play a fundamental role in social cognition (Frith, 2007; Rizzolatti and Fabbri-Destro, 2008) as it provides a basis for understanding the minds of others (e.g., their emotions). Important for the present discussion are recent findings demonstrating modulations of the response of the mirror neuron mechanism based on the extent to which the visual stimulus is a socially relevant stimulus (Jarvelainen et al., 2001; Shimada and Hiraki, 2006). For example, Shimada and Hiraki (2006) compared activity in the sensorimotor cortex of adults and infants using near infrared spectroscopy in an action observation condition (i.e., an actor performed a series of simple actions with an object), an object observation condition (i.e., an invisible actor performed a series of simple actions with an object) and a spontaneous object motion condition (i.e., control). Critically, each condition was also presented live or via video. Shimada and Hiraki's (2006) results demonstrated that only in the live condition was activity in the sensorimotor cortex significantly greater than in the control condition. When presented via video, the equivalent condition did not activate sensorimotor cortex any more than it was activated by spontaneous object motion. Jarvelainen et al. (2001) also demonstrated that responses within the human primary motor cortex were greater when viewing live compared to pre-recorded human movements. 
Thus, the human brain's mirroring of others (a critical neural correlate of social cognition) can be altered by the medium in which the other appears (i.e., live versus video).
The reduced response of the mirror neuron system to “reel” stimuli versus “real” stimuli has also been observed in single neuron recording studies of the macaque brain. Ferrari et al. (2003), in the context of exploring mirror neuron responses to mouth actions, reported: “Mirror neurons that, during naturalistic testing, showed good responses to a hand action made by the experimenter, showed weak or no response when the same action, previously recorded, was shown on the screen” (p. 1705). Thus, similar to the results reviewed above, the mirror neuron system was less responsive to a video representation than to a live demonstration of an action. Interestingly, in a recent study of hand actions, Caggiano et al. (2011) reported that video and live presentation of actions actually activated the mirror neuron system of the macaque in a similar manner. According to the researchers, the critical difference between the two studies was that in the case where the video stimuli failed to elicit a strong mirror neuron response, there had been no initial training task that encouraged the animals to attend to the location of the video in the first place. In conjunction, these studies make an important point in the present context. Namely, the comparison of stimuli that ranged in their approximation to a real action (i.e., live action versus filmed action) initially produced a pattern of results suggesting some form of non-equivalence (Ferrari et al., 2003). Subsequent work, making a similar comparison, then identified the potential source of that non-equivalence (i.e., attending to the video stimulus; Caggiano et al., 2011). This latter step thus provides a potential mechanism through which to explain (some) differences observed between “reel” and “real” stimuli: specifically, the relative ability of those stimuli to capture and hold an individual's attention. 
It is important to note that this latter insight would not have been uncovered had the researchers not engaged in the systematic comparison of stimuli ranging in their approximation to a real action. In addition, these researchers actually began with “real action” or what they called “naturalistic action” and only (cautiously) moved toward less “naturalistic” stimuli. This direction is the opposite of that typically employed (i.e., moving from less to more naturalistic stimuli), an issue that we will discuss briefly below and has been discussed at length in other work (e.g., Kingstone et al., 2008; Kingstone, 2009).
This review has focused on one approach to addressing concerns about the nature of social stimuli commonly used in social neuroscience research. This work has typically relied on simple stimuli (e.g., schematic faces) lacking, at least on their face, many of the potentially important characteristics of a real social interaction. This is a critical limitation if the neural mechanisms uncovered in the former “reel” instance differ quantitatively and/or qualitatively from those engaged in the latter “real” case. We have suggested here that a useful approach to addressing these types of concerns is to explicitly compare different types of social stimuli ranging in their approximation to a real social interaction. We have highlighted recent research that has done just that. This approach allows researchers the opportunity to identify similarities and differences in brain and behavior as the stimuli become more like the natural social stimuli that our systems have evolved and developed to deal with.
The current review suggests that the promise of the approach described here has already started to be realized. The studies considered suggest important similarities and differences in social attention across different social stimuli ranging from a schematic face to a face-to-face interaction. For example, individuals will follow the gaze of a static and a dynamic schematic face and a static and dynamic image of a real face. In addition, the bias to look at another individual's eyes is present when the stimulus is an isolated face (Henderson et al., 2005; Laidlaw et al., in press), a complex social scene (Birmingham et al., 2008), and a dynamic social scene (i.e., a movie; Klin et al., 2002; Foulsham et al., 2010). Despite these and other similarities, there also appear to be important differences. For example, dynamic faces reveal effects of emotion on gaze following not observed for static faces (e.g., Putman et al., 2006). In addition, dynamic social scenes, relative to static ones, appear better able to reveal differences between individuals with and without a typical social attention system (Klin et al., 2002; Speer et al., 2007). Lastly, the propensity to look at other people (Foulsham et al., 2011; Laidlaw et al., 2011) and follow their gaze (Gallup et al., 2012a,b) seems to be profoundly altered when there is the potential for an actual social interaction. The presence of both similarities and differences seems to falsify any simple notion of equivalence or non-equivalence of social stimuli and, through attempts to understand these similarities and differences, researchers will better understand the variables that influence social cognition in general and how the brain responds to social stimuli in particular.
The methodological approach advocated here is based on a more general framework for cognition and cognitive neuroscience referred to as cognitive ethology (Smilek et al., 2006; Kingstone et al., 2008; Kingstone, 2009). Briefly, the basic idea behind the framework is to begin one's research approach at the level of the phenomenon of interest (e.g., real social interaction) and to systematically move toward the more simplified and abstracted level (e.g., looking at schematic faces). While much of the research reviewed here can be seen as going in the opposite direction, such that researchers have started with simplified and abstracted stimuli and have moved toward more ecological stimuli, both approaches have merit and are based fundamentally on the same notion: to systematically compare brain and behavior at various levels of abstraction. One caveat should be noted. As Kingstone (2009) suggests, by beginning at the level of the phenomenon of interest, researchers' subsequent work can be benchmarked against the original phenomenon and conclusions can be related back to what is experienced there. However, when researchers begin with a possibly distant approximation to the phenomenon of interest, they run the risk of spending a great deal of time, effort, and resources studying “phenomena” that are peculiar to (or worse, products of) that distant approximation. That said, the purpose of the present review is not to espouse a particular direction (i.e., from artificial to naturalistic versus naturalistic to artificial) but rather to champion the act of moving along that continuum in either direction.
Implications for Social Neuroscience
The promise of social neuroscience is that we can understand the neural basis of social phenomena. Given the uniquely social environment of humans, other primates and their ancestors, such an understanding will have widespread ramifications for our knowledge of how and why the brain evolved in the way that it did. Far from being a special circumstance, it is likely that social context colors the majority of our cognitive and behavioral repertoire. However, the challenge of social neuroscience, and the impetus for the current issue, is to bring the social environment under the microscope of current neuroscientific methods.
We have argued that one useful approach toward this general goal will be to compare social phenomena using stimuli ranging in their approximation to a real social interaction. This approach has both methodological and theoretical advantages for social neuroscience. Methodologically the approach provides researchers with an empirical assessment of the equivalence of different social stimuli. The knowledge gained from such an approach allows researchers to make an informed decision about the stimuli they use while mapping the social brain. For example, the review has suggested that in some cases the use of more contrived stimuli can lead to difficulties in detecting effects (e.g., the series of studies investigating the modulation of gaze cueing by emotion and the importance of using dynamic stimuli to detect it). Thus, the power to observe and measure effects might be strongest when the social stimuli closely match those that make up our social environment. For example, Schultz and Pilz (2009; see also Fox et al., 2009) have argued that dynamic images of faces should be used in place of static images of faces as localizers of face processing regions in the brain. The importance of such knowledge should not be underestimated given the cost (e.g., time, money, effort) of conducting research in social neuroscience.
Theoretically, the explicit comparison between stimuli varying in their approximation to a real social interaction can yield new insights into the neural underpinnings of social cognition. This is true both when similarities emerge from the comparison and when differences do. For example, Sagiv and Benton's (2001) demonstration that the N170 was comparable across upright schematic faces and upright images of real faces suggests that the neural processes generating it are sensitive to some common feature of the stimuli (i.e., the basic structural configuration of a face). The same authors' demonstration that the N170 was qualitatively different for inverted schematic faces and inverted images of real faces, however, suggests that the neural processes generating the N170 are also sensitive to some unshared feature (e.g., experience, familiarity) between schematic and real faces. Both pieces of information can inform theorizing about the neural basis of social cognition.
It is important to reiterate that the approach advocated here does not seek to minimize the contribution of using stimuli that are not “naturalistic.” These types of stimuli have numerous benefits for researchers in social neuroscience as evidenced by the progress made using such stimuli. The approach advocated here calls for the addition of more naturalistic stimuli and, more specifically, the systematic comparison between stimuli that range in their approximation to a real social interaction. Lastly, in some facets of social neuroscience (e.g., studies involving fMRI or EEG), the approach we have suggested will present methodological challenges. Rather than see this as a reason to abandon such an effort, we see it as a reason to innovate—a challenge researchers are already beginning to meet and overcome (e.g., Redcay et al., 2010; Wilms et al., 2010). For example, we have reviewed numerous neuroscientific investigations that have successfully compared social phenomena using stimuli ranging in their approximation to a real social interaction (Sato et al., 2004; Schultz and Pilz, 2009; Redcay et al., 2010; Pönkänen et al., 2011). We are confident this effort will continue and continue to succeed.
Understanding the social brain represents one of the fundamental aims of neuroscience. This pursuit faces daunting challenges given the complex nature of social phenomena. This review presents one viable way to meet the challenge. Future research employing an approach derived from cognitive ethology promises to provide further insight into the nature of the social brain.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This work was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) operating grant to Alan Kingstone and an Early Career Fellowship from the Leverhulme Trust to Megan Freeth.
Bayless, S. J., Glover, M., Taylor, M. J., and Itier, R. J. (2011). Is it in the eyes? Dissociating the role of emotion and perceptual features of emotionally expressive faces in modulating orienting to eye gaze. Vis. Cogn. 19, 483–510.
Caggiano, V., Fogassi, L., Rizzolatti, G., Pomper, J. K., Thier, P., Giese, M. A., and Casile, A. (2011). View based encoding of actions in the mirror neurons of the area F5 in macaque premotor cortex. Curr. Biol. 21, 144–148.
Campbell, R., Heywood, C. A., Cowey, A., Regard, M., and Landis, T. (1990). Sensitivity to eye gaze in prosopagnosic patients and monkeys with superior temporal sulcus ablation. Neuropsychologia 28, 1123–1142.
Dawson, G., Toth, K., Abbott, R., Osterling, J., Munson, J., Estes, A., and Liaw, J. (2004). Early social attention impairments in autism: social orienting, joint attention, and attention to distress. Dev. Psychol. 40, 271–283.
Ferrari, P. F., Gallese, V., Rizzolatti, G., and Fogassi, L. (2003). Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. Eur. J. Neurosci. 17, 1703–1714.
Fletcher-Watson, S., Leekam, S. R., Benson, V., Frank, M. C., and Findlay, J. M. (2009). Eye-movements reveal attention to social information in autism spectrum disorder. Neuropsychologia 47, 248–257.
Freeth, M., Chapman, P., Ropar, D., and Mitchell, P. (2010). Do gaze cues in complex scenes capture and direct the attention of high functioning adolescents with ASD? Evidence from eye-tracking. J. Autism Dev. Disord. 40, 534–547.
Freeth, M., Ropar, D., Mitchell, P., Chapman, P., and Loher, S. (2011a). Brief report: how adolescents with ASD process social information in complex scenes. combining evidence from eye movements and verbal descriptions. J. Autism Dev. Disord. 41, 364–371.
Gallup, A. C., Hale, J. J., Sumpter, D. J. T., Garnier, S., Kacelnik, A., Krebs, J. R., and Couzin, I. D. (2012b). Visual attention and the acquisition of information in human crowds. Proc. Natl. Acad. Sci. U.S.A. doi: 10.1073/pnas.1116141109. [Epub ahead of print].
Hietanen, J. K., Leppänen, J. M., Peltola, M. J., Linna-aho, K., and Ruuhiala, H. J. (2008). Seeing direct and averted gaze activates the approach–avoidance motivational brain systems. Neuropsychologia 46, 2423–2430.
Jarvelainen, J., Schurmann, M., Avikainen, S., and Hari, R. (2001). Stronger reactivity of the human primary motor cortex during observation of live rather than video motor acts. Neuroreport 12, 3493–3495.
Kilts, C. D., Egan, G., Gideon, D. A., Ely, T. D., and Hoffman, J. M. (2003). Dissociable neural pathways are involved in the recognition of emotion in static and dynamic facial expressions. Neuroimage 18, 156–168.
Klin, A., Jones, W., Schultz, R., Volkmar, F., and Cohen, D. (2002). Visual fixation patterns during viewing of naturalistic social situations as predictors of social competence in individuals with autism. Arch. Gen. Psychiatry 59, 809–816.
Laidlaw, K. E. W., Foulsham, T., Kuhn, G., and Kingstone, A. (2011). Social attention to a live person is critically different than looking at a videotaped person. Proc. Natl. Acad. Sci. U.S.A. 108, 5548–5553.
Laidlaw, K. E. W., Risko, E. F., and Kingstone, A. (in press). A new look at social attention: orienting to the eyes is not (entirely) under volitional control. J. Exp. Psychol. Hum. Percept. Perform.
Lord, C., Risi, S., Lambrecht, L., Cook, E. H., Leventhal, B. L., DiLavore, P. C., Pickles, A., and Rutter, M. (2000). The autism diagnostic observation schedule—generic: a standard measure of social and communication deficits associated with the spectrum of autism. J. Autism Dev. Disord. 30, 205–223.
Pönkänen, L. M., Alhoniemi, A., Leppänen, J. M., and Hietanen, J. K. (2011). Does it make a difference if I have an eye contact with you or with your picture? An ERP study. Soc. Cogn. Affect. Neurosci. 6, 486–494.
Redcay, E., Dodell-Feder, D., Pearrow, M., Mavros, P., Kleiner, M., Gabrieli, J., and Saxe, R. (2010). Live face-to-face interaction during fMRI: a new tool for social cognitive neuroscience. Neuroimage 50, 1639–1647.
Sato, W., Kochiyama, T., Yoshikawa, S., Naito, E., and Matsumura, M. (2004). Enhanced neural activity in response to dynamical facial expressions of emotion: an fMRI study. Brain Res. Cogn. Brain Res. 20, 81–91.
van der Geest, J. N., Kemner, C., Verbaten, M. N., and van Engeland, H. (2002a). Gaze behavior of children with pervasive developmental disorder toward human faces: a fixation time study. J. Child Psychol. Psychiatry 43, 669–678.
van der Geest, J. N., Kemner, C., Camfferman, G., Verbaten, M. N., and van Engeland, H. (2002b). Looking at images with human figures: comparison between autistic and normal children. J. Autism Dev. Disord. 32, 69–75.
Vuilleumier, P., and Righart, R. (2011). “Attention and automaticity in processing facial expressions,” in Oxford Handbook of Face Perception, eds A. J. Calder, G. Rhodes, M. Johnson, and J. V. Haxby (Oxford, UK: Oxford University Press), 449–478.
Wilms, M., Schilbach, L., Pfeiffer, U., Bente, G., Fink, G. R., and Vogeley, K. (2010). It's in your eyes—using gaze-contingent stimuli to create truly interactive paradigms for social cognitive and affective neuroscience. Soc. Cogn. Affect. Neurosci. 5, 98–107.
Keywords: social attention, social neuroscience, ecological methods, ethology
Citation: Risko EF, Laidlaw KEW, Freeth M, Foulsham T and Kingstone A (2012) Social attention with real versus reel stimuli: toward an empirical approach to concerns about ecological validity. Front. Hum. Neurosci. 6:143. doi: 10.3389/fnhum.2012.00143
Received: 10 January 2012; Accepted: 07 May 2012;
Published online: 25 May 2012.
Edited by: Chris Frith, Wellcome Trust Centre for Neuroimaging at University College London, UK
Reviewed by: Thierry Chaminade, Centre National de la Recherche Scientifique, France
Ayse P. Saygin, University of California, USA
Copyright: © 2012 Risko, Laidlaw, Freeth, Foulsham and Kingstone. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.
*Correspondence: Evan F. Risko, Social and Behavioral Sciences, Cognition and Natural Behavior Laboratory, Arizona State University, Glendale, P.O. Box 37100 Phoenix, AZ 85069-7100, USA. e-mail: firstname.lastname@example.org