An Android for Emotional Interaction: Spatiotemporal Validation of Its Facial Expressions

Sato, Wataru; Namba, Shushi; Yang, Dongsheng; Nishida, Shin’ya; Ishi, Carlos; Minato, Takashi

doi:10.3389/fpsyg.2021.800657

METHODS article

Front. Psychol., 04 February 2022

Sec. Emotion Science

Volume 12 - 2021 | https://doi.org/10.3389/fpsyg.2021.800657

Wataru Sato ¹,²*

Shushi Namba ¹

Dongsheng Yang ³

Shin’ya Nishida ³,⁴

Carlos Ishi ⁵

Takashi Minato ⁵

1. Psychological Process Research Team, Guardian Robot Project, RIKEN, Kyoto, Japan
2. Field Science Education and Research Center, Kyoto University, Kyoto, Japan
3. Graduate School of Informatics, Kyoto University, Kyoto, Japan
4. NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, Japan
5. Interactive Robot Research Team, Guardian Robot Project, RIKEN, Kyoto, Japan

Abstract

Android robots capable of emotional interactions with humans have considerable potential for application to research. While several studies developed androids that can exhibit human-like emotional facial expressions, few have empirically validated androids’ facial expressions. To investigate this issue, we developed an android head called Nikola based on human psychology and conducted three studies to test the validity of its facial expressions. In Study 1, Nikola produced single facial actions, which were evaluated in accordance with the Facial Action Coding System. The results showed that 17 action units were appropriately produced. In Study 2, Nikola produced the prototypical facial expressions for six basic emotions (anger, disgust, fear, happiness, sadness, and surprise), and naïve participants labeled photographs of the expressions. The recognition accuracy of all emotions was higher than chance level. In Study 3, Nikola produced dynamic facial expressions for six basic emotions at four different speeds, and naïve participants evaluated the naturalness of the speed of each expression. The effect of speed differed across emotions, as in previous studies of human expressions. These data validate the spatial and temporal patterns of Nikola’s emotional facial expressions, and suggest that it may be useful for future psychological studies and real-life applications.

Introduction

Emotional interactions with other people are important for wellbeing (Keltner and Kring, 1998) but difficult to investigate in controlled laboratory experiments. While numerous psychological studies have presented pre-recorded photographs or videos of emotional expressions to participants and reported interesting findings regarding the psychological processes underlying emotional interactions (e.g., Dimberg, 1982), this method may lack the liveliness of real interactions, thus reducing ecological validity (Shamay-Tsoory and Mendelsohn, 2019; Hsu et al., 2020). Other studies used confederates as interaction partners and tested live emotional interactions (e.g., Vaughan and Lanzetta, 1980), but this strategy can lack rigorous control of confederates’ behaviors (Bavelas and Healing, 2013; Kuhlen and Brennan, 2013). Androids—that is, humanoid robots that exhibit appearances and behaviors that closely resemble those of humans (Ishiguro and Nishio, 2007)—could become an important tool for testing live face-to-face emotional interactions with rigorous control.

To implement emotional interaction in androids, the androids’ facial expressions must be carefully developed. Psychological studies have verified that facial expressions play a key role in transmitting information about emotional states in humans (Mehrabian, 1971). Studies of facial expressions developed methods for objectively evaluating facial actions (for a review, see Ekman, 1982), and the Facial Action Coding System (FACS; Ekman and Friesen, 1978; Ekman et al., 2002) is among the most refined of these methods. Based on observations of thousands of facial expressions in natural settings, together with a series of controlled psychological experiments, researchers identified the sets of facial action units (AUs) in the FACS corresponding to prototypical expressions of six basic emotions (Ekman and Friesen, 1975; Friesen and Ekman, 1983). For example, happy expressions involve an AU set consisting of the cheek raiser (AU 6) and lip corner puller (AU 12); surprised expressions involve the inner and outer brow raisers (AUs 1 and 2, respectively), the upper lid raiser (AU 5), and the jaw drop (AU 26). Numerous studies testing the recognition of photographs of facial expressions created based on this system verified that the expressions were recognized as the target emotional expressions above chance level across various cultures (e.g., Ekman and Friesen, 1971; for a review, see Ekman, 1993). Furthermore, the researchers described how the temporal aspects of dynamic emotional facial expressions are informative (Ekman and Friesen, 1975), which was supported by several subsequent experimental studies (for reviews, see Krumhuber et al., 2016; Dobs et al., 2018; Sato et al., 2019a). For example, Sato and Yoshikawa (2004) tested the naturalness ratings of dynamic changes in facial expressions and found that expressions that changed too slowly were generally rated as unnatural.
Additionally, the effects of changing speeds differed across emotions, where fast and slow changes were regarded as relatively natural for surprised and sad expressions, respectively. Collectively, these psychological findings specify the spatial and temporal patterns of facial actions associated with facial expressions of emotions. Based on such findings, researchers have developed and validated novel research tools, including emotional facial expressions of virtual agents (Roesch et al., 2011; Krumhuber et al., 2012; Ochs et al., 2015). Virtual agents are promising tools to investigate emotional interactions with high ecological validity and control (Parsons, 2015; Pan and Hamilton, 2018). Androids may be comparably useful in this respect, and also have the unique advantage of being physically present (Li, 2015). If androids’ facial expressions can be developed and validated based on psychological evidence, they will constitute an important research tool for investigating emotional interactions.

However, although numerous studies have developed androids for emotional interactions (Kobayashi and Hara, 1993; Kobayashi et al., 2000; Minato et al., 2004, 2006, 2007; Weiguo et al., 2004; Ishihara et al., 2005; Matsui et al., 2005; Berns and Hirth, 2006; Blow et al., 2006; Hashimoto et al., 2006, 2008; Oh et al., 2006; Sakamoto et al., 2007; Lee et al., 2008; Takeno et al., 2008; Allison et al., 2009; Lin et al., 2009, 2016; Kaneko et al., 2010; Becker-Asano and Ishiguro, 2011; Ahn et al., 2012; Mazzei et al., 2012; Tadesse and Priya, 2012; Cheng et al., 2013; Habib et al., 2014; Yu et al., 2014; Asheber et al., 2016; Glas et al., 2016; Marcos et al., 2016; Faraj et al., 2021; Nakata et al., 2021; Table 1), few have empirically validated the androids that were developed. First, no study validated androids’ AUs coded using FACS (Ekman and Friesen, 1978; Ekman et al., 2002). Second, no study sufficiently demonstrated recognition of the six basic emotions conveyed by androids’ facial expressions. Many androids’ facial expressions were reportedly insufficiently developed to exhibit all six basic emotions (e.g., Minato et al., 2004). While several studies developed androids capable of exhibiting the six basic emotions, and recruited naïve participants to label the facial expressions, most did not statistically evaluate the accuracy (e.g., Kobayashi and Hara, 1993). One study conducted a statistical analysis that did not reveal a significantly high level of recognition of disgust and fear (Berns and Hirth, 2006). Another study testing five basic emotions failed to observe better-than-chance recognition of fear (Becker-Asano and Ishiguro, 2011). Finally, no study systematically validated whether androids can show dynamic changes in facial expressions like humans.
Only a few studies reported that incorporating the dynamic patterns of human facial expressions into an android’s facial expressions led to high naturalness ratings of facial expressions during laughter (Ishi et al., 2019) and vocalized surprise (Ishi et al., 2017).

TABLE 1

Study	Robot name	Emotional expression	Head DOF	Validation
Kobayashi and Hara, 1993	Face robot	6 basic emotions	24	Emotion recognition (no statistical test)
Kobayashi et al., 2000	–	Some (not specified)	21	–
Minato et al., 2004	Repliee R1	–	9	–
Weiguo et al., 2004	F&H robot	4 basic emotions	12	–
Ishihara et al., 2005	Affetto	Some (not specified)	12	–
Matsui et al., 2005	Repliee Q2	Some (not specified)	16	–
Berns and Hirth, 2006	ROMAN	6 basic emotions	21	Emotion recognition
Blow et al., 2006	KASPAR	Some (not specified)	8	–
Hashimoto et al., 2006	Saya	6 basic emotions	23	Emotion recognition (no statistical test)
Minato et al., 2006	CB²	Some (not specified)	14	–
Oh et al., 2006	Albert HUBO	Full range (not specified)	31	–
Sakamoto et al., 2007	Geminoid HI-1	–	13	–
Hashimoto et al., 2008	–	6 basic emotions	39	–
Lee et al., 2008	EveR-2	6 basic emotions	22	–
Takeno et al., 2008	Kansei	6 basic emotions	19	–
Allison et al., 2009	Brian	6 basic emotions	11	Emotion recognition (no statistical test)
Lin et al., 2009	Janet; Thomas	Various (not specified)	23	–
Kaneko et al., 2010	HRP-4C	Some (not specified)	11	–
Becker-Asano and Ishiguro, 2011	Geminoid F	5 basic emotions	12	Emotion recognition
Ahn et al., 2012	EveR-4 H33	13 basic emotions	33	Emotion recognition (no statistical test)
Mazzei et al., 2012	FACE	6 basic emotions	32	Emotion recognition (no statistical test)
Tadesse and Priya, 2012	–	Various (not specified)	–	–
Cheng et al., 2013	EVA	4 basic emotions	–	Emotion recognition (no statistical test)
Habib et al., 2014	PKD	Various (not specified)	24	–
Yu et al., 2014	–	6 basic emotions	13	Motion similarity; emotion recognition (no statistical test)
Lin et al., 2016	–	6 basic emotions	4	Emotion recognition (no statistical test)
Marcos et al., 2016	–	6 basic emotions	22	Emotion recognition (no statistical test)
Glas et al., 2016	ERICA	Wide range (not specified)	13	–
Asheber et al., 2016	–	6 basic emotions	8	–
Faraj et al., 2021	Eva	6 basic emotions	25	–
Nakata et al., 2021	Ibuki	7 basic emotions	18	–
This study	Nikola	6 basic emotions	35	FACS; emotion recognition; speed rating

Summary of studies on androids’ emotional facial expressions.

We included only androids that were human-like in appearance, and for which data were reported at conferences or in papers. DOF = degrees of freedom; FACS = Facial Action Coding System.

To resolve the issues described above, we developed an android head, called Nikola, and validated its facial actions and emotional expressions. Nikola has 35 actuators, designed to implement AUs relevant to prototypical facial expressions based on psychological evidence (Ekman and Friesen, 1975, 1978; Friesen and Ekman, 1983; Ekman et al., 2002). The temporal patterns of the actions can be programmed at a resolution of milliseconds. We conducted a series of psychological studies to validate Nikola’s emotional facial expressions. In Study 1, we applied FACS coding to Nikola’s single AUs, which underlie appropriate emotional facial expressions. In Study 2, we evaluated emotional recognition accuracy based on the spatial patterns of Nikola’s emotional facial expressions through an emotion labeling task. In Study 3, we evaluated the temporal patterns of Nikola’s dynamic facial expressions through a naturalness rating task.

Study 1

Here, we applied FACS coding to Nikola’s single facial actions. We expected the AUs specifically associated with the facial expressions of the six basic emotions to be produced.

Materials and Methods

Development of the Android

Nikola was developed for the purpose of studying emotional interaction with humans. Currently, only the head and neck are complete; the body parts are under construction. It is human-like in appearance, similar to a male human child; it resembles a child to promote natural interactions with both adults and children. It is about 28.5 cm high and weighs about 4.6 kg. It has 35 actuators: 29 for facial muscle actions, 3 for head movement (roll, pitch, and yaw rotation), and 3 for eyeball control (pan movements of the individual eyeballs and tilt movements of both eyeballs). The facial and head movements are driven by pneumatic (air) actuators, which create safe, silent, and human-like motions (Ishiguro and Nishio, 2007; Minato et al., 2007). The pneumatic actuators are controlled by an air pressure control valve. The entire surface, except for the back of the head, is covered in a soft silicone skin. Video cameras are mounted inside the left and right eyeballs. Nikola is not a stand-alone system; the control valves, air compressor, and computer for controlling the actuators and sensor information processing are external.

The facial muscle actuators’ locations were selected to produce as many AUs as possible, specifically those associated with emotional facial expressions (Ekman and Friesen, 1975, 1978; Friesen and Ekman, 1983; Ekman et al., 2002), together with the information provided by previously constructed androids (Minato et al., 2004, 2006, 2007; Matsui et al., 2005; Glas et al., 2016). Specifically, we designed Nikola to produce the following AUs corresponding to the emotional expressions associated with six basic emotions: 1 (inner brow raiser), 2 (outer brow raiser), 4 (brow lowerer), 5 (upper lid raiser), 6 (cheek raiser), 7 (lid tightener), 10 (upper lip raiser), 12 (lip corner puller), 15 (lip corner depressor), 20 (lip stretcher), 25 (lips part), and 26 (jaw drop). Although AUs 9 (nose wrinkler), 17 (chin raiser), and 23 (lip tightener) are reportedly relevant to prototypical facial expressions (Ekman and Friesen, 1975; Friesen and Ekman, 1983), these AUs were not implemented owing to the technical limitations of the silicone skin. AUs 14 (dimpler), 16 (lower lip depressor), 18 (lip pucker), 22 (lip funneler), and 43 (eyes closed) were also designed to implement other communication-related facial actions (e.g., speech and blinking).
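The AU inventory described above can be written down as a small lookup structure. The following is a minimal sketch, not part of the authors' control software: the AU numbers and names are taken from the paragraph above, while the variable names and the `missing_aus` helper are hypothetical and introduced only for illustration.

```python
# AUs designed for the six basic emotions (numbers and names as listed above).
EMOTION_AUS = {
    1: "inner brow raiser", 2: "outer brow raiser", 4: "brow lowerer",
    5: "upper lid raiser", 6: "cheek raiser", 7: "lid tightener",
    10: "upper lip raiser", 12: "lip corner puller",
    15: "lip corner depressor", 20: "lip stretcher",
    25: "lips part", 26: "jaw drop",
}

# AUs added for other communication-related actions (e.g., speech, blinking).
COMMUNICATION_AUS = {
    14: "dimpler", 16: "lower lip depressor", 18: "lip pucker",
    22: "lip funneler", 43: "eyes closed",
}

# AUs relevant to prototypical expressions but not implemented in Nikola.
NOT_IMPLEMENTED_AUS = {9: "nose wrinkler", 17: "chin raiser", 23: "lip tightener"}

IMPLEMENTED_AUS = {**EMOTION_AUS, **COMMUNICATION_AUS}


def missing_aus(au_set):
    """Return the sorted AUs in `au_set` that are not implemented (hypothetical helper)."""
    return sorted(set(au_set) - set(IMPLEMENTED_AUS))


# AU sets for happy and surprised expressions as described in the Introduction.
HAPPY = (6, 12)
SURPRISE = (1, 2, 5, 26)

print(len(IMPLEMENTED_AUS))   # number of implemented AUs
print(missing_aus(HAPPY))     # empty list: both AUs are available
print(missing_aus((9, 10)))   # an illustrative set that includes the unimplemented AU 9
```

A check of this kind makes it easy to see which AU combinations can be composed from the implemented actions and which would need a substitute for AUs 9, 17, or 23.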

Procedure

We programmed Nikola to exhibit AUs on an individual basis. A certified FACS coder scored the AUs from the neutral status to the action apex using FACS (Ekman et al., 2002). When an AU was detected, the coder rated its intensity on the five discrete levels defined in the FACS guidelines (A: trace, B: slight, C: marked/pronounced, D: severe, and E: extreme/maximum; Ekman et al., 2002). The coder could view the sequence repeatedly by adjusting the program settings. The Supplementary Material provides video clips of these AUs.

Results

The AUs produced by Nikola are illustrated in Figure 1, and the results of the FACS coding are presented in Table 2. Figure 1 demonstrates that Nikola is capable of performing each AU. It was difficult to distinguish between AUs 6 (cheek raiser) and 7 (lid tightener), but the eyes’ outer corners were slightly lowered in AU 6. The maximum intensity of the AUs ranged from A (e.g., AU 12) to E (e.g., AU 26).

FIGURE 1

Illustrations of the facial action units (AUs) produced by the android Nikola. For AU 25, AU 25 + 26 is shown.

TABLE 2

AU	AU description	Maximum intensity
1	Inner brow raiser	C
2	Outer brow raiser	B
4	Brow lowerer	C
5	Upper lid raiser	C
6	Cheek raiser	B
7	Lid tightener	D
10	Upper lip raiser	D
12	Lip corner puller	A
14	Dimpler	B
15	Lip corner depressor	B
16	Lower lip depressor	B
18	Lip pucker	A
20	Lip stretcher	B
22	Lip funneler	A
25	Lips part	E
26	Jaw drop	E
43	Eyes closed	E

Results of the Facial Action Coding System (FACS) coding of Nikola’s facial actions.

AU = action unit.
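The coding results in Table 2 can be summarized programmatically. The sketch below transcribes the maximum intensities from Table 2 and treats the FACS letters A to E as an ordinal scale; the numeric ranks, the variable names, and the `aus_at_least` helper are assumptions made for illustration and are not part of FACS or of the authors' analysis.

```python
from collections import Counter

# Ordinal ranks for the five FACS intensity letters (numeric values are an assumption).
INTENSITY_RANK = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

# Maximum intensity per AU, transcribed from Table 2.
MAX_INTENSITY = {
    1: "C", 2: "B", 4: "C", 5: "C", 6: "B", 7: "D", 10: "D", 12: "A",
    14: "B", 15: "B", 16: "B", 18: "A", 20: "B", 22: "A",
    25: "E", 26: "E", 43: "E",
}


def aus_at_least(level):
    """Return the sorted AUs whose maximum intensity is `level` or higher."""
    threshold = INTENSITY_RANK[level]
    return sorted(au for au, grade in MAX_INTENSITY.items()
                  if INTENSITY_RANK[grade] >= threshold)


# How many AUs reached each maximum intensity level.
counts = Counter(MAX_INTENSITY.values())
for level in "ABCDE":
    print(level, counts[level])

print(aus_at_least("D"))  # AUs coded at severe or above
```

Summarized this way, only the lid tightener, upper lip raiser, lips part, jaw drop, and eyes closed reached level D or above, while most AUs peaked at B or lower.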

Discussion

Our results demonstrated that Nikola was capable of producing each AU based on manual FACS coding performed by a certified FACS coder. The results are consistent with several earlier studies’ findings that androids could exhibit AUs designed based on FACS (e.g., Kobayashi and Hara, 1993), but none of these studies involved evaluation by certified FACS coders. The coder found it difficult to differentiate AUs 6 (cheek raiser) and 7 (lid tightener). This is in line with earlier findings that androids struggled to replicate z-vector movements, including wrinkles and tension, compared with human expressions (Ishihara et al., 2021), owing to the physical constraints of artificial skin materials. The results of our intensity evaluation revealed that some AUs’ maximum intensities were not realized. This resulted from technical limitations, such as the limited number of actuators and the properties of the skin materials. Collectively, the data suggest that Nikola can produce AUs associated with prototypical facial expressions, albeit with limited intensity.

Study 2

Next, we devised prototypical facial expressions for Nikola reflecting six basic emotions and asked naïve participants to label photographs of these expressions, as in earlier psychological studies using photographs of human facial expressions as stimuli (Sato et al., 2002, 2009; Kubota et al., 2003; Uono et al., 2011; Okada et al., 2015). Because earlier studies of human expression stimuli consistently demonstrated emotion recognition above the level of chance, as well as differences across emotions (such as lower recognition rates for angry, disgusted, and fearful expressions than happy, sad, and surprised expressions), we expected such patterns to be seen with respect to emotion recognition of Nikola’s facial expressions.
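What recognition "above the level of chance" means can be made concrete with a short calculation. The following is an illustrative sketch only and is not necessarily the statistical test used in this study: it assumes a forced choice among six equally likely emotion labels (chance = 1/6) and uses an exact one-sided binomial tail. The function names and the trial count in the usage line are hypothetical and are not data from the study.

```python
from math import comb

# Assumption: six response labels, each equally likely under guessing.
CHANCE = 1 / 6


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def min_correct_for_significance(n, p=CHANCE, alpha=0.05):
    """Smallest number of correct responses out of n whose one-sided
    binomial p-value falls below alpha, or None if no count qualifies."""
    for k in range(n + 1):
        if binomial_upper_tail(k, n, p) < alpha:
            return k
    return None


# Hypothetical usage: 30 labeling trials for one expression (not the study's design).
n_trials = 30
k_needed = min_correct_for_significance(n_trials)
print(k_needed, binomial_upper_tail(k_needed, n_trials, CHANCE))
```

Under these assumptions, any count of correct labels at or above the returned threshold would be judged better than guessing at the chosen alpha level.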