Can We Distinguish Emotions from Faces? Investigation of Implicit and Explicit Processes of Peak Facial Expressions

Xiao, Ruiqi; Li, Xianchun; Li, Lin; Wang, Yanmei

doi:10.3389/fpsyg.2016.01330

ORIGINAL RESEARCH article

Front. Psychol., 31 August 2016

Sec. Emotion Science

Volume 7 - 2016 | https://doi.org/10.3389/fpsyg.2016.01330

Ruiqi Xiao

Xianchun Li

Lin Li

Yanmei Wang^*

School of Psychology and Cognitive Science, East China Normal University, Shanghai, China

Most previous studies on facial expression recognition have focused on moderate emotions; to date, few studies have investigated the explicit and implicit processing of peak emotions. In the current study, we used images of the transient, peak-intensity expressions of athletes at the moment of winning or losing a point in competition, and investigated the diagnosability of peak facial expressions at both the implicit and explicit levels. In Experiment 1, participants were instructed to evaluate isolated faces, isolated bodies, and face-body compounds while their eye movements were recorded. The results revealed that isolated body and face-body congruent images were better recognized than isolated face and face-body incongruent images, indicating that the emotional information conveyed by facial cues was ambiguous and that body cues influenced facial emotion recognition. Furthermore, the eye movement records showed that participants displayed distinct gaze patterns for the congruent and incongruent compounds. In Experiment 2A, a subliminal affective priming task was used, with faces as primes and bodies as targets, to investigate the unconscious perception of peak facial expressions. The results showed that a winning face prime facilitated reactions to a winning body target, whereas a losing face prime inhibited reactions to a winning body target, suggesting that peak facial expressions could be perceived at the implicit level. In Experiment 2B, a revised subliminal affective priming task and a strict awareness test were used to examine the validity of the unconscious perception of peak facial expressions found in Experiment 2A. The results of Experiment 2B showed that reaction times to both winning and losing body targets were influenced by the invisible peak facial expression primes, which indicated unconscious perception of peak facial expressions. In general, the results indicate that peak facial expressions cannot be consciously recognized but can be perceived at the unconscious level.

Introduction

Facial expression, which conveys affective and motivational states, serves as one of the most important nonverbal social cues in daily interpersonal communication. Thus, the ability to extract emotional information from facial expressions is crucial for efficient social functioning and interpersonal relationships (Hinojosa et al., 2015). There are two important processes in facial expression recognition: explicit recognition and implicit perception. Implicit facial expression perception occurs relatively quickly and can proceed with limited information input and without consciousness. Conversely, explicit facial expression recognition requires comparison between the currently obtained features and related prior knowledge (Landis, 1924; Adolphs, 2002). Various lines of evidence support the notion that implicit and explicit processes are distinct and independent. For example, the adult neuroimaging literature suggests different underlying neural structures for these two processes: subcortical limbic activity for the implicit process and prefrontal cortex responses for the explicit process (Nakamura et al., 1990; Joynt, 1995; Winkielman et al., 1997; Adolphs, 2002; Lange et al., 2003). Moreover, other studies have demonstrated that the strength of amygdala activation differs between implicit perception and explicit recognition, although there is no consensus on the direction of the difference (for studies revealing an enhanced amygdala response in implicit facial emotion perception, see Williams et al., 2005; Habel et al., 2007; for studies revealing a reduced response in implicit facial expression perception, see Gorno-Tempini and Price, 2001; Gur et al., 2002).

Most studies of facial emotion recognition have focused on basic emotions of moderate intensity (Ekman and O'Sullivan, 1988; Ekman, 1993; Young et al., 1997; Smith et al., 2005). Although debate continues (Gendron et al., 2014), most studies using facial expressions of moderate intensity have revealed that six basic emotions (happiness, sadness, disgust, fear, anger, and surprise) are universal, and that people can automatically and accurately recognize or perceive them from face cues both explicitly and implicitly (Boucher and Carlson, 1980; Haidt and Keltner, 1999; Sauter et al., 2010; Ekman and Cordaro, 2011). However, apart from the well-recognized moderate emotions, many facial expressions in daily life are ambiguous, such as those of peak emotion. Peak emotion is one of the under-explored kinds of emotion. It was defined by Aviezer et al. (2012b) as “the apex of a highly intense emotional experience and focused on the immediate peak expressions in response to real-life situations, such as undergoing a nipple piercing, receiving an extravagant prize, winning a point in a professional sports match, and so forth.” Some studies have investigated intensity as an important factor influencing expression recognition, finding that recognition accuracy improved as expression intensity increased. However, these studies did not actually take peak emotion into account, because the intensity of the stimuli they adopted was far below that of peak emotions, even for the most intense stimuli (Orgeta and Phillips, 2007; Hoffmann et al., 2010; Leime et al., 2013; Rosenberg et al., 2015).

The current state of research on peak emotion is inadequate. According to the limited number of studies, peak emotions are unable to convey emotion information. To our knowledge, the work of Aviezer et al. (2012b) pioneered the investigation of recognition of peak expressions in real life situations. Their study employed expression images of athletes at the moment of winning or losing a point; the participants were asked to deduce the valence of isolated faces, isolated bodies, the congruent face-body compounds, and incongruent ones. The results showed that participants could judge the valence of isolated body images, though it was difficult for them to distinguish the isolated faces. Furthermore, the valence of incongruent face-body compounds was judged by body gestures, rather than facial expressions. They concluded that faces in intense situations were not capable of conveying emotion information. However, we need to be cautious about their conclusion considering the following perspectives.

First, we consider the communicatory function of facial expressions and discuss the diagnosability of peak facial expressions from the functional aspect. Facial expressions convey specific information to observers, while simultaneously acting as reinforcers that modulate further action (Blair, 2003). Although peak facial expressions are distorted, facial expressions in peak emotional situations should still keep their communicatory characteristics. In many high-stakes sporting competitions, athletes are expected not to exhibit intense expressions frequently, since such expressions are not necessarily functional in achieving goals (Friesen, 2015). For example, players rarely perform at their best when feeling sad. Nonetheless, athletes do display some intense expressions. One possible reason for their conscious choice to express their feelings in an extreme way is to exaggerate their confidence and cheer themselves and their supporters up (for winners), or to exhibit extreme anger to scare competitors (for losers).

Second, participants in the study of Aviezer et al. (2012b) were asked to rate the valence of the presented faces, which required consciously matching the obtained information with existing experience. Thus, we cannot conclude that peak facial expressions are incapable of conveying emotion information, given that the implicit process was not tested. The results merely indicated that peak facial expressions cannot be recognized explicitly.

Third, the finding that the valence ratings of peak face-body compounds (congruent and incongruent) were mainly determined by body gestures is insufficient to support the notion that peak facial expressions are not diagnostic, because body cues also influence emotion recognition in other situations where the faces convey strong and clear emotion information (Kret et al., 2013). For example, using facial expressions taken from the Ekman and Friesen (1976) set, App et al. (2012) demonstrated that angry faces on fearful bodies were recognized as less angry than angry faces on angry bodies. The influence of context, including bodily gestures, words, cultural context, and voice (Barrett et al., 2011), on the perception of facial expressions is thought to be automatic (Aviezer et al., 2011), outside consciousness (Aviezer et al., 2007, 2009, 2012a), and culturally unspecific (Ito et al., 2011).

Some expressions (such as anger and disgust) bear strong similarities in facial configuration, and the high degree of similarity could foster the influence of the body on expression recognition. Aviezer et al. (2008) highlighted the “similarity” between the facial configuration of different emotions, and suggested that the influence of the body on emotion recognition depends on the degree of similarity. In their study, they sought to find the influence of an angry body on emotional facial expressions. They found that when participants are presented with two images—one of an angry fist accompanied by a disgusted facial expression, the other of an angry fist accompanied by a fearful facial expression—they were more likely to choose the former image as anger, because of the high similarity of facial expressions between anger and disgust. In a similar vein, regarding peak facial expressions, the facial muscles tense to the greatest extent, making the faces of different peak emotions look alike so that the bodies can strongly influence emotion recognition.

Peak emotion is special, considering its anatomical structures and distorted appearance. Each specific expression activates a certain combination of facial muscles. For example, in smiling, the orbicularis oculi and zygomaticus major muscles combine to raise the cheeks and the corners of the mouth, whereas in anger the corrugator supercilii lowers the brows and draws them together while the upper eyelids are raised and the eyelids are tightened by the orbicularis oculi. Theoretically, the specific facial muscle combination for each emotion and the way the muscles work do not change as intensity increases. However, in peak emotion the facial muscles are extremely tense, and the distortion of the facial configuration caused by high intensity makes peak facial expressions hard to distinguish. Therefore, we ask whether observers can detect emotion information from the different facial muscle actions “hidden” under the distorted facial configuration, in terms of the facial muscle combination and the way the muscles work.

Studies in this field have principally focused on explicit recognition, and few have investigated the implicit processing of peak emotion. Evidence for implicit emotion perception comes mostly from studies using the continuous flash suppression (CFS) technique and the backward masking (BM) technique. These two paradigms differ in the strength of suppression and in the underlying neural mechanisms activated. In CFS, the prime and the noise are presented simultaneously, one to each eye. The sequence of a CFS trial is as follows: two fixation crosses appear on the screen, followed by the prime picture accompanied by the first random-noise pattern, followed by the same prime picture accompanied by the second random-noise pattern, followed by the targets. The noise images are usually presented to the dominant eye, and the facial expressions to the other eye (Adams et al., 2010). During binocular presentation, the dominant noise images suppress the information of the other image further up the visual system, leaving subcortical processing relatively unaffected (Tong and Engel, 2001). The subliminal affective priming task is an example of the backward masking paradigm. In this task, positive or negative primes are presented too briefly (17 or 30 ms) to be consciously detected, followed by a positive or negative target (Hermans et al., 2001). Participants have been found to respond faster and more accurately to targets preceded by primes of congruent valence than to targets preceded by primes of incongruent valence. The two methods differ in the loci and degree of cortical activation. Almeida et al. (2013) reported that CFS was more sensitive to negatively valenced stimuli. By contrast, the major advantage of the subliminal affective priming task is that its relatively “loose” masking procedure has been shown to generalize activation across many cortical regions, making it sensitive to both positive and negative prime stimuli and eliminating the threat-specific effect of CFS.

Given previous findings, we have reasons to hypothesize that peak facial expressions could be implicitly perceived, even though the differences between peak facial expressions are too subtle to be explicitly recognized. Our study aimed to investigate the diagnosability of peak facial expressions at both conscious and unconscious levels. In Experiment 1, we investigated the explicit process of peak emotion. Participants rated the valence of isolated bodies, faces, congruent face-body compounds, and incongruent face-body compounds whilst their eye movement pattern was simultaneously recorded. Experiment 2A and 2B adopted the subliminal affective priming task to investigate the implicit process of peak facial expressions, with isolated bodies as the target and isolated faces as the prime. We hypothesized that the participants would fail to judge the peak facial expressions (Experiment 1), but would be able to implicitly perceive them, by showing the influence on reaction to other emotional body targets (Experiment 2A and 2B).

Experiment 1

Materials and Methods

Participants

Thirty-two college students from East China Normal University (11 males and 21 females; M = 20.41 years, SD = 1.30, range: 19–24 years) participated in the experiment. All the participants were right-handed, had normal or corrected-to-normal vision, and had no neurological or psychiatric history. They gave written informed consent and received small gifts for their participation. All the participants were included in the behavioral analyses. Nine of them (two males and seven females) were excluded from eye movement data collection because of the possible influence of spectacles. Of the remaining 23 participants, five participants in the face-body compound blocks were rejected due to technical problems. The study was approved by the Institutional Ethics Committee of East China Normal University.

Materials

Images of tennis athletes, depicting transient peak-intensity reactions to winning or losing a point in high-stakes competitions, were selected from Google using the same key words as Aviezer et al. (2012b). Every image was digitally manipulated using photo-editing software to create four image categories:

(1) Isolated-face: nine images in total, five losing faces (three females and two males) and four winning faces (two females and two males) (see Figure 1D);

(2) Isolated-body: eight images in total, four winning bodies (two females and two males) and four losing bodies (two females and two males) (see Figure 1C);

(3) Face-body congruent images: three images in total (two losing and one winning) (see Figure 1A);

(4) Face-body incongruent images: seven images in total (four losing-face-winning-body and three winning-face-losing body) (see Figure 1B).

Figure 1. (A) Examples of congruent pictures: (1) losing-face-losing-body; (2) winning-face-winning-body. (B) Examples of incongruent pictures: (1) winning-face-losing-body; (2) losing-face-winning-body. (C) Examples of pictures: (1) isolated winning body; (2) isolated losing body. (D) Examples of pictures: (1) isolated losing face; (2) isolated winning face.

All the participants were unaware of the manipulation. Pictures were presented in grayscale, with a gray background. The size of the vertical stimuli was 350 × 533 pixels and the size of horizontal stimuli was 533 × 350 pixels.

Procedure

The participants were directed into the laboratory. After signing the informed consent forms, they were seated in front of an eye-tracking device positioned 64 cm in front of them, with their head placed on a chin-rest. A nine-point calibration was then performed, during which the participants were required to follow the calibration point as it moved over the screen to ensure that eye gaze data were adjusted for movement. Calibration was repeated before each block. Eye movements were recorded with Tobii T120, at the sample rate of 120 Hz. All of the instructions for the study were given by computer.

The study comprised three blocks: an isolated face block, an isolated body block, and a face-body compound block (with both face-body congruent and incongruent images). The images were presented in random order within each block, and the order of the three blocks was counterbalanced across participants. The participants were instructed to look carefully at each image and to evaluate the emotional state of the athlete after the image disappeared. Each image was presented for 5000 ms, followed by an evaluative scale. Since all the pictures showed high arousal, participants were only asked to evaluate valence, that is, how pleasant or unpleasant the state was, with 1 for extremely unpleasant, 5 for neutral, and 9 for extremely pleasant.

Results

The participants rated isolated facial and bodily expressions and face-body compounds; their fixation patterns (fixation duration and fixation count) were recorded simultaneously. The original valence ratings ranged from 1 to 9. We transformed them to a scale from −4 to 4 by subtracting 5 from the original ratings. Thus, ratings below 0 stood for negative, those above 0 stood for positive, and 0 represented neutral.

Accuracy

If the response (positive/negative/neutral) was consistent with the emotion shown in the image, it was recorded as correct; if not, it was recorded as incorrect. Since the face-body incongruent images displayed two different emotions simultaneously, their accuracy cannot be calculated. We will, therefore, present the results for incongruent images separately.
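The recoding and scoring rule described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names are hypothetical, and it assumes that winning images count as positive, losing images as negative, and that a neutral rating is therefore scored as incorrect.

```python
# Assumption: winning images express positive valence, losing images negative.
EXPECTED_VALENCE = {"winning": "positive", "losing": "negative"}


def center_rating(raw_rating):
    """Map a 1-9 valence rating onto the -4..4 scale by subtracting 5."""
    if not 1 <= raw_rating <= 9:
        raise ValueError("rating must lie between 1 and 9")
    return raw_rating - 5


def valence_label(centered_rating):
    """Classify a centered rating as positive, negative, or neutral."""
    if centered_rating > 0:
        return "positive"
    if centered_rating < 0:
        return "negative"
    return "neutral"


def is_correct(raw_rating, image_type):
    """A response is correct when its valence matches the image's emotion."""
    return valence_label(center_rating(raw_rating)) == EXPECTED_VALENCE[image_type]


def accuracy(trials):
    """Proportion of correct responses; trials is a list of (rating, image_type)."""
    scored = [is_correct(rating, image_type) for rating, image_type in trials]
    return sum(scored) / len(scored)
```

Under these assumptions, a rating of 7 for a winning image is scored as correct, whereas a rating of 5 (neutral) or 6 (positive) for a losing image is scored as incorrect.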

Excluding the face-body incongruent images, the overall accuracy was 81%, which was significantly above chance performance in a one-sample t-test, $t_{(31)} = 13.88$, p < 0.01. Broken down by image type, accuracy was 66% for isolated faces, 89% for isolated bodies, and 88% for face-body congruent images. Paired-sample t-tests showed that accuracy for isolated faces was significantly lower than for isolated body images, $t_{(31)} = -7.79$, p < 0.01, and for face-body congruent images, $t_{(31)} = -5.74$, p < 0.01, but significantly above chance, $t_{(31)} = 6.28$, p < 0.01. There were 192 face-body incongruent trials in total, of which only 18 (9.4%) were evaluated in accordance with the face.

However, it should be noted that the overall accuracy for isolated face images (66%) was driven by the high accuracy for losing faces. We calculated the accuracy for winning faces and losing faces separately: the mean accuracy for losing faces was 92%, which was significantly higher than chance level (50%), $t_{(31)} = 20.00$, p < 0.01, whereas the mean accuracy for winning faces was 39%, which was marginally below chance level, $t_{(31)} = -1.88$, p = 0.07. Thus, the 66% accuracy for peak facial expressions may not indicate diagnosability. Instead, it indicates that participants tended to judge both winning and losing peak facial expressions as losing.

Emotional Ratings

Isolated face and isolated body image

Participants were able to correctly evaluate the valence of isolated body images: they succeeded in rating winning bodies as positive and losing bodies as negative. However, they failed to judge the emotional valence when faces were shown alone. Specifically, the participants evaluated both losing faces and winning faces as negative when measuring the emotional ratings (see Figure 2A).

Figure 2. (A) Results of mean valence ratings for images of isolated face and isolated body. (B) Results of mean valence ratings for images of face and body compounds. **p < 0.01.

Face-and-body images

A 2 (Body: losing/winning) × 2 (Face: losing/winning) repeated-measures ANOVA on valence ratings revealed a main effect of the body, $F_{(1, 31)} = 114.21$, p < 0.01, $\eta_{p}^{2} = 0.80$, suggesting that judgments of peak emotions mostly followed the body regardless of congruency. The perceived affective valence of face-body compounds shifted mainly depending on the body. Furthermore, in accordance with the previous study of Aviezer et al. (2012b), the interaction between the two factors reached significance, $F_{(1, 31)} = 20.53$, p < 0.01, $\eta_{p}^{2} = 0.39$. Subsequent paired t-tests showed that images with winning faces were rated as more extreme. Congruent winning images were rated as significantly more positive than losing-face-winning-body images, $t_{(31)} = 3.23$, p < 0.01, and winning-face-losing-body images were rated as significantly more negative than congruent losing images, $t_{(31)} = -3.37$, p < 0.01 (see Figure 2B).

Eye Movement

Eye movements were recorded from 23 participants. To better reveal the observation processes, we removed the blocks in which the recorded sample percentage was below 70%. The sample percentage is an index of recording quality, calculated from the number of correctly identified eye movement samples: 100% means that both eyes were found throughout the recording, and 50% means that either only one eye was found throughout the recording or both eyes were found for half of its duration. Seventeen face-body compound blocks remained. Since we were interested in the relative contributions of the body and the face to emotion recognition, the face-body compound images were analyzed in terms of the number of fixations and fixation duration.
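The quality index and the 70% exclusion rule described above can be illustrated with a short sketch. It is not the eye tracker's own computation: it assumes that the percentage is the share of valid per-eye samples across both eyes, and the function names and block identifiers are hypothetical.

```python
def sample_percentage(left_eye_found, right_eye_found):
    """Percentage of valid eye samples across both eyes.

    Each argument is a list of booleans, one per recorded sample,
    indicating whether that eye was found in the sample.
    """
    if not left_eye_found or len(left_eye_found) != len(right_eye_found):
        raise ValueError("both eyes need the same, non-zero number of samples")
    found = sum(left_eye_found) + sum(right_eye_found)
    return 100.0 * found / (2 * len(left_eye_found))


def keep_blocks(block_percentages, threshold=70.0):
    """Keep only the blocks whose sample percentage reaches the threshold."""
    return {
        block: percentage
        for block, percentage in block_percentages.items()
        if percentage >= threshold
    }
```

With this definition, a recording in which only one eye is found throughout, or in which both eyes are found for half of the samples, yields 50%, matching the two cases given in the text.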

We defined two regions of interest (ROI): the face and the body in body-and-face compound images. The average number of fixations per ROI for each image type was calculated. A 2 (ROI: Body/Face) × 2 (Congruency: Congruent/Incongruent) repeated-measures ANOVA was conducted. There were no significant main effects of ROI or congruency. The interaction between ROI and congruency was significant, $F_{(1, 16)} = 9.87$, p < 0.01, $\eta_{p}^{2} = 0.38$. A further paired t-test revealed that there was no significant difference between body and face in congruent situations, $t_{(16)} = -0.33$, p > 0.05, while the number of fixations on the face was significantly higher than on the body in incongruent images, $t_{(16)} = -2.39$, p < 0.05 (see Figure 3A). The same analysis was conducted on fixation duration, revealing significant main effects of ROI, $F_{(1, 16)} = 44.38$, p < 0.01, $\eta_{p}^{2} = 0.74$, and congruency, $F_{(1, 16)} = 8.01$, p < 0.05, $\eta_{p}^{2} = 0.33$, and a significant interaction between them, $F_{(1, 16)} = 29.25$, p < 0.01, $\eta_{p}^{2} = 0.65$. A further paired t-test revealed significant differences between body and face in both congruent, $t_{(16)} = -0.49$, p < 0.01, and incongruent situations, $t_{(16)} = -2.39$, p < 0.05 (see Figure 3B). The results for both fixation count and fixation duration suggested that the participants displayed distinct gaze patterns for congruent and incongruent images.
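The two dependent measures per ROI (number of fixations and fixation duration) could be derived from raw fixation records along the following lines. This is only a sketch under stated assumptions: the rectangular ROI representation, the `Rect` class, and the `summarize_fixations` function are hypothetical, and the article does not specify how the face and body regions were delineated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned region of interest in image pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def summarize_fixations(fixations, rois):
    """Count fixations and sum their durations for each ROI.

    fixations: iterable of (x, y, duration_ms) tuples.
    rois: dict mapping an ROI name (e.g. "face", "body") to a Rect.
    Fixations that fall outside every ROI are ignored.
    """
    summary = {name: {"count": 0, "duration_ms": 0.0} for name in rois}
    for x, y, duration_ms in fixations:
        for name, rect in rois.items():
            if rect.contains(x, y):
                summary[name]["count"] += 1
                summary[name]["duration_ms"] += duration_ms
                break  # assumes ROIs do not overlap
    return summary
```

Per-participant summaries of this kind, averaged over the images of each type, would then form the cells of the 2 (ROI) × 2 (Congruency) design.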

Figure 3. (A) Number of fixations for the regions of interest (ROI) of face and body in face-body compounds. (B) Fixation duration for the regions of interest (ROI) of face and body in face-body compounds.

The emotional ratings of the 17 remaining participants included in the eye-tracking analysis were also analyzed. They produced similar results: a significant main effect of the body, $F_{(1, 16)} = 98.08$, p < 0.01, $\eta_{p}^{2} = 0.87$. The interaction between the two factors was also significant, $F_{(1, 16)} = 19.92$, p < 0.01, $\eta_{p}^{2} = 0.57$. A subsequent paired t-test showed that congruent winning images were rated as significantly more positive than losing-face-winning-body images, $t_{(16)} = 3.48$, p < 0.01, and winning-face-losing-body images were rated as significantly more negative than congruent losing images, $t_{(16)} = -2.57$, p < 0.05.

Moreover, to address the possible problem caused by the unequal number of materials, we randomly selected three images from each group (isolated-face, isolated-body, face-body congruent, and face-body incongruent) and repeated the same analysis on fixation duration and number of fixations. The results were almost the same. For fixation duration, the main effects of ROI [$F_{(1, 16)} = 35.73$, p < 0.01, $\eta_{p}^{2} = 0.69$] and congruency [$F_{(1, 16)} = 11.29$, p < 0.01, $\eta_{p}^{2} = 0.41$] and the interaction between ROI and congruency [$F_{(1, 16)} = 10.22$, p < 0.01, $\eta_{p}^{2} = 0.39$] were all significant. A further paired t-test revealed a significant difference between body and face in both congruent, $t_{(16)} = -6.29$, p < 0.01, and incongruent situations, $t_{(16)} = -4.90$, p < 0.01. For the number of fixations, neither the main effect of ROI nor that of congruency was significant, but the interaction between the two factors showed a trend toward significance, $F_{(1, 16)} = 3.23$, p = 0.09, $\eta_{p}^{2} = 0.17$.
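The partial eta squared values reported alongside each F statistic can be recovered from F and its degrees of freedom through the standard identity $\eta_{p}^{2} = F \cdot df_{effect} / (F \cdot df_{effect} + df_{error})$. The sketch below applies it to the four F values of the reduced-set analysis above; the function name is illustrative and the code is not part of the authors' analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


if __name__ == "__main__":
    # F values from the reduced-set analysis, each with df = (1, 16).
    for label, f_value in [
        ("ROI (duration)", 35.73),
        ("Congruency (duration)", 11.29),
        ("ROI x Congruency (duration)", 10.22),
        ("ROI x Congruency (count)", 3.23),
    ]:
        print(f"{label}: {partial_eta_squared(f_value, 1, 16):.2f}")
```

Rounded to two decimals, the four results agree with the reported effect sizes of 0.69, 0.41, 0.39, and 0.17.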

Discussion

Experiment 1 aimed to investigate the explicit recognition of peak facial and bodily expressions and the relative contribution of body and face during the emotion recognition process. The emotion rating results were consistent with the principal previous study (Aviezer et al., 2012b), revealing that faces were not able to provide sufficient valence information in peak emotion situations.

One of the most interesting findings in Experiment 1 was that the participants showed different gaze patterns to face-body congruent and incongruent images. This was reflected by the significant interaction between ROI and congruency, with larger distinctions between ROIs in incongruent images than in congruent images, in terms of both fixation duration and number of fixations.

Why did the distinct gaze patterns appear? It could be assumed that if the participants were unable to discriminate valence (either explicitly or implicitly) from intense facial expressions, there would be no “congruent” or “incongruent” compounds for them, since ambiguous peak facial expressions could not provide any valid emotional information to match or mismatch with the bodily gestures. In fact, however, participants did display distinct eye-gaze patterns for the congruent and incongruent groups. One possible explanation for the different gaze patterns is that people could perceive specific emotional information from the intense facial expressions, perhaps in an unconscious way. To further investigate the unconscious perception of peak facial expressions, Experiments 2A and 2B were conducted.

Experiment 2A

According to Murphy and Zajonc's (1993) Affective Primacy Theory, the emotional reaction to a stimulus can be activated with minimal stimulus input and few cognitive resources. Consistent with this, previous studies have shown that people can process faces of different valence in the absence of consciousness. Neurons in the superior colliculus are capable of responding to rapid visual input and producing distinct responses to facial expressions without any conscious experience (Blair, 2003). In essence, emotion perception is highly automatic, occurs outside consciousness, and precedes other cognition and perception (Massar and Buunk, 2009). In Experiment 2A, we tested the implicit perception of peak facial expressions.