Pre-exposure to Ambiguous Faces Modulates Top-Down Control of Attentional Orienting to Counterpredictive Gaze Cues

Abubshait, Abdulaziz; Momen, Ali; Wiese, Eva

doi:10.3389/fpsyg.2020.02234

ORIGINAL RESEARCH article

Front. Psychol., 09 September 2020

Sec. Cognitive Science

Volume 11 - 2020 | https://doi.org/10.3389/fpsyg.2020.02234

Abdulaziz Abubshait¹*

Ali Momen²

Eva Wiese²

¹Robotics Domain, Italian Institute of Technology, Genoa, Italy
²Department of Psychology, George Mason University, Fairfax, VA, United States

Understanding and reacting to others’ nonverbal social signals, such as changes in gaze direction (i.e., gaze cue), are essential for social interactions, as they are important for processes such as joint attention and mentalizing. Although attentional orienting in response to gaze cues has a strong reflexive component, accumulating evidence shows that it can be top-down controlled by context information regarding the signals’ social relevance. For example, when a gazer is believed to be an entity “with a mind” (i.e., mind perception), people exert more top-down control on attention orienting. Although increasing an agent’s physical human-likeness can enhance mind perception, it could have negative consequences on top-down control of social attention when a gazer’s physical appearance is categorically ambiguous (i.e., difficult to categorize as human or nonhuman), as resolving this ambiguity would require using cognitive resources that otherwise could be used to top-down control attention orienting. To examine this question, we used mouse-tracking to explore if categorically ambiguous agents are associated with increased processing costs (Experiment 1), whether categorically ambiguous stimuli negatively impact top-down control of social attention (Experiment 2), and if resolving the conflict related to the agent’s categorical ambiguity (using exposure) would restore top-down control to orient attention (Experiment 3). The findings suggest that categorically ambiguous stimuli are associated with cognitive conflict, which negatively impacts the ability to exert top-down control on attentional orienting in a counterpredictive gaze-cueing paradigm; this negative impact, however, is attenuated when participants are pre-exposed to the stimuli prior to the gaze-cueing task.
Taken together, these findings suggest that manipulating physical human-likeness is a powerful way to affect mind perception in human-robot interaction (HRI) but has a diminishing returns effect on social attention when it is categorically ambiguous due to drainage of cognitive resources and impairment of top-down control.

Introduction

In social interactions, we use information from social cues like gestures, facial expressions, and/or gaze direction to make inferences about what others think, feel, or intend (Adolphs, 1999; Emery, 2000; Gallagher and Frith, 2003). Joint attention, or the ability to follow an interaction partner’s gaze in order to conjointly attend to an object of potential interest, is a fundamental social-cognitive mechanism that develops very early in life and is a precursor for higher-order social-cognitive processes, such as mentalizing or action understanding (for a review, see Frischen et al., 2007). In empirical research, joint attention can be investigated using the gaze-cueing paradigm (Friesen and Kingstone, 1998), where an abstract face stimulus is presented in the center of a screen that first looks straight at the participant and then changes its gaze direction to the left or right side of the screen (i.e., gaze cue), which is followed by a target that is presented either at the gazed-at location (i.e., valid trial) or opposite of the gazed-at location (i.e., invalid trial). This typically results in faster reaction times (RTs) to targets presented at valid than invalid locations (i.e., gaze cueing effect). Attentional orienting to gaze cues has traditionally been thought of as reflexive (i.e., a bottom-up process) as it is observable in infants as young as 3 months of age (Hood et al., 1998), is triggered by any kind of stimulus with eye-like configurations (Quadflieg et al., 2004), cannot be suppressed even when gaze direction is unlikely to predict the location of a target (i.e., counterpredictive cueing; Friesen et al., 2004; Vecera and Rizzo, 2006), and is not affected by a resource-demanding secondary task (Law et al., 2010).
The few modulatory effects of gaze cueing that were originally reported were strongly dependent on participants’ age (i.e., stronger gaze cueing in children; Hori et al., 2005) and/or other individual traits (i.e., stronger gaze cueing in highly anxious individuals; Tipples, 2006; Fox et al., 2007).

More recently, however, studies have shown that attentional orienting to gaze cues can be top-down modulated when gaze cues are embedded in a richer context (the original experiments used abstract face stimuli; Friesen and Kingstone, 1998) that enhances the social relevance of the interaction for the observer (Tipples, 2006; Fox et al., 2007; Bonifacci et al., 2008; Graham et al., 2010; Kawai, 2011; Hungr and Hunt, 2012; Süßenbach and Schönbrodt, 2014; Wiese et al., 2014; Wykowska et al., 2014; Cazzato et al., 2015; Dalmaso et al., 2016, 2020; Abubshait and Wiese, 2017; Abubshait et al., 2020). Using such “social” versions of the original gaze-cueing paradigm, researchers were able to show that when social relevance is increased based on modulations of similarity-to-self (Hungr and Hunt, 2012; Porciello et al., 2014), physical humanness (Admoni et al., 2011; Martini et al., 2015), facial expression [Bonifacci et al., 2008; Graham et al., 2010; at long stimulus onset asynchrony (SOA) only], social status (Jones et al., 2010; Dalmaso et al., 2012, 2014, 2015; Ohlsen et al., 2013), social group membership (Dodd et al., 2011, 2016; Liuzza et al., 2011; Pavan et al., 2011; Ciardo et al., 2014; Cazzato et al., 2015; Dalmaso et al., 2015), and familiarity (Frischen and Tipper, 2006; Deaner et al., 2007) larger gaze cueing effects were observed (Wiese et al., 2013). Taken together, these findings suggest that engagement in joint attention may strongly depend on social context information, as well as a link between higher-level mechanisms of social cognition related to mentalizing, empathizing, or group membership and lower-level mechanisms of social cognition, such as joint attention (see Capozzi and Ristic, 2018 and Dalmaso et al., 2020, for comprehensive reviews on social factors that influence social attention).

With regard to human-robot interaction (HRI), potentially one of the most powerful contextual factors is the degree to which a robot is perceived to have a mind, with the ability to experience internal states like emotions and intentions and to execute goal-directed actions (i.e., mind perception; Gray et al., 2007). Seeing minds in others is not exclusive to humans, but “mind” can also be ascribed to agents that by definition do not have minds (e.g., robots) or whose mind status is ambiguous (e.g., animals; Gray et al., 2007). Mind perception is an automatic process that can be triggered implicitly when agents possess human-like facial features (Balas and Tonsager, 2014; Deska et al., 2016) or behaviors (Castelli et al., 2000). Decisions as to whether an agent “has a mind” are made within a few 100 ms (Wheatley et al., 2011; Looser et al., 2013), and just passively viewing stimuli that trigger mind perception is sufficient to activate social-cognitive brain networks (Wagner et al., 2011), even if their mind status is irrelevant to the task at hand (Wykowska et al., 2014; Caruana et al., 2015, 2017a). Mind status can also be explicitly ascribed to nonhuman agents when the presence of a human is needed in the current situation or when an entity has become so important to an individual that a “machine” status is no longer sufficient. For instance, agents of ambiguous physical human-likeness are more likely treated as a “human” when individuals are in an increased need of social contact due to chronic loneliness (Hackel et al., 2014) or when participants have to collaborate with them on a joint task (Hertz and Wiese, 2017). Likewise, soldiers who work with search-and-rescue robots on a regular basis are reported to be reluctant to agree to install updates on their robot “companions” because they fear this would change their “personality” (Singer et al., 2008; Carpenter et al., 2016).

Despite being an important question, studies investigating the effect of mind perception on social attention are surprisingly rare and have yielded mixed results depending on how mind perception was manipulated (Teufel et al., 2010; Wiese et al., 2012, 2014; Wykowska et al., 2014; Martini et al., 2015; Abubshait and Wiese, 2017). When mind perception was manipulated via belief (e.g., participants are instructed that changes in a robot’s gaze direction are pre-programmed vs. human-controlled), attentional orienting to gaze cues was enhanced when observed gaze behavior was believed to be caused by a human agent as opposed to a pre-programmed algorithm (Wiese et al., 2012; Wykowska et al., 2014; Caruana et al., 2015). Belief manipulations can also impact participants’ perceptions of the space around them (Müller et al., 2014; Fini et al., 2015), their performance in a joint Simon task (Müller et al., 2011), and their neural responses, as measured by activation in social regions of the brain (Kühn et al., 2014). A similar positive effect was found when mind perception was manipulated via behavior (e.g., predictive vs. random gaze cues), such that larger gaze cueing effects were observed when gaze cues predicted the target location with high likelihood as opposed to being non-predictive (80 vs. 50% predictive; Abubshait and Wiese, 2017). However, when mind perception was manipulated via physical appearance (e.g., gazers of varying degrees of human-likeness), results were more mixed: on the one hand, general differences in gaze cueing mechanisms were found between human and robot agents when using non-predictive cues (i.e., 50% predictive of target location; Admoni et al., 2011; Wiese et al., 2012), such that robots tended to induce smaller gaze cueing effects than humans; this effect, however, was not further modulated by the robot’s physical human-likeness (Admoni et al., 2011; Martini et al., 2015; Abubshait and Wiese, 2017).
On the other hand, a gazing stimulus that possesses very human-like but not perfectly human physical appearance (i.e., morphed images consisting of 70% of a human image and 30% of a robot image) disrupted top-down control of attentional orienting in counterpredictive gaze-cueing paradigms (i.e., targets appear with a higher chance at the uncued location), such that participants were less capable of shifting their attention away from the cued (i.e., not very likely target location) to the predicted (i.e., very likely target location) location when the gazer displayed ambiguous levels of human-likeness, as opposed to an unequivocally “human” or “robot” gazer (Martini et al., 2015).

The assumption that observing stimuli of ambiguous physical human-likeness negatively impacts resource-demanding cognitive processes, such as top-down control of attention, is in line with established biased-competition models of visual processing (Desimone and Duncan, 1995), showing that possible interpretations of ambiguous stimuli compete for representation in visual networks causing cognitive conflict and that cognitive resources are needed to direct selective attention to stimuli features that favor one explanation over the others (via inhibition of alternative category representations) to resolve the cognitive conflict (Meng and Tong, 2004; Sterzer et al., 2009; Ferrey et al., 2015). It is also in line with empirical examinations of the uncanny valley (UV; Mori, 1970) theory that links negative evaluations and long categorization times for ambiguously human-like face stimuli to categorical uncertainty (Cheetham et al., 2011; Hackel et al., 2014; Martini et al., 2016) and consumption of additional cognitive resources compared to categorically unambiguous stimuli (Wiese et al., 2019). Specifically, it was shown that when mind perception was manipulated via physical parameters, for instance, by morphing human images into robot images along a spectrum ranging from 0 to 100% of physical humanness, changes in mind ratings attributed to the resulting images show a categorical pattern, with significant changes in ratings at the human-nonhuman category boundary located at around 63% physical humanness, but only marginal changes in mind ratings for stimuli that unequivocally fall into either the “human” or “nonhuman” category (Cheetham et al., 2011, 2014; Hackel et al., 2014; Martini et al., 2016).

Follow-up studies showed that this qualitative change in mind ratings for stimuli located at the category boundary is associated with increased categorization times, indicating that being exposed to categorically ambiguous stimuli might be associated with increased cognitive processing costs compared to categorically unambiguous stimuli (Cheetham et al., 2011, 2014).

In support of this notion, a follow-up study used mouse tracking (Freeman and Ambady, 2010) to show that this increase in categorization time for stimuli of ambiguous human-likeness is associated with an increase in cognitive conflict, as indicated by larger mouse curvatures for stimuli of ambiguous human-likeness than unequivocally “human” or “robot” stimuli (Weis and Wiese, 2017; Wiese and Weis, 2020). Yet, another follow-up study showed that processing categorically ambiguous stimuli is also associated with an increase in cognitive costs and draining of cognitive resources over time even when the stimuli were irrelevant to the immediate task (Wiese et al., 2019). Specifically, the authors embedded face stimuli of differing levels of human-likeness (0% human, 30% human, 70% human, and 100% human) into a vigilance task, known to be sensitive to the drainage of cognitive resources (Parasuraman et al., 2009), and examined whether a categorically ambiguous stimulus of 70% physical humanness would be associated with a stronger decrease in performance over time (i.e., vigilance decrement) than a categorically unambiguous stimulus of 0, 30, or 100% physical humanness. In line with this assumption, the researchers showed that the 70% human stimuli caused a significantly larger decrement than the 0, 30, and 100% human stimuli, indicating that categorically ambiguous stimuli may drain more cognitive resources over time than categorically unambiguous stimuli, even when being irrelevant to the task (Wiese et al., 2019).

Interestingly, the negative effect on cognitive performance vanished for ambiguous stimuli when participants were perceptually pre-exposed to the stimuli before the task (i.e., both the ambiguous and unambiguous stimuli) by being asked to evaluate the stimuli regarding their capacity of having internal states (i.e., explicit mind perception; e.g., “Can the stimulus feel pain?”) or their perceptual features (i.e., implicit mind perception; e.g., “Does the stimulus have the shape of an avocado?”; Wiese et al., 2019). This suggests that cognitive conflict, when assessing the mind status of stimuli, is triggered by bottom-up mechanisms related to ambiguous perceptual stimulus features (Gao et al., 2010; Wheatley et al., 2011) and the automatic coactivation of competing categories (Ferrey et al., 2015), which can only be resolved by focusing selective attention on a subset of perceptual features that support one category over another – for instance, by pre-exposing participants to the stimuli and directing attention to their perceptual features. Regarding attentional orienting to gaze signals, this means that manipulating the degree to which a gazer is perceived to “have a mind” via physical features can have negative consequences on the effectiveness of a gaze cue when the gazer is of ambiguous physical human-likeness, which could drain cognitive resources and negatively impact top-down control of spatial attention.

Aim of Study

The goal of the current study is to investigate whether a perceptually ambiguous agent induces cognitive conflict due to an increased difficulty in categorizing a face stimulus as “human” or “nonhuman” (Experiment 1), whether the categorically ambiguous face can potentially interfere with top-down control of attentional orienting to gaze cues due to cognitive conflict (Experiment 2), and whether resolving perceptual ambiguity via pre-exposing participants to the stimuli prior to the gaze-cueing task would restore top-down control abilities (Experiment 3). To investigate these questions, we created stimuli of varying degrees of physical humanness, ranging from 0 to 100% human image contained in the morphed image in steps of 20%, and embedded them into a mouse-tracking task (Experiment 1) and a gaze-cueing task (Experiments 2 and 3). The mouse-tracking task is a forced-choice categorization task that is designed to investigate the coactivation of two competing categories, with larger overlap in coactivation correlating with larger cognitive conflict, as measured by mouse-movement curvatures (Freeman and Ambady, 2010). In the gaze-cueing task, we used a counterpredictive paradigm where participants responded to targets that appeared more often at the uncued location (80% of trials) than the cued location (20% of trials), to disentangle bottom-up from top-down mechanisms (Friesen et al., 2004). In order to optimize task performance, participants have to suppress bottom-up attentional orienting to the cued (but unlikely) target location and, instead, shift their attentional focus to the uncued (but likely) target location via top-down control.
If attentional orienting to gaze cues follows a bottom-up pattern, reaction times will be shorter for valid than invalid trials (i.e., a standard gaze cueing effect: invalid minus valid trials reaction time difference is positive); if attentional orienting follows a top-down pattern, reaction times should be shorter for invalid than valid trials (i.e., a reversed gaze cueing effect: invalid minus valid trials reaction time difference is negative). Thus, in counterpredictive gaze-cueing paradigms, the difference in reaction times between invalid and valid trials can be used as a measure for the extent to which gaze cueing is top-down controlled: the more positive (negative) the difference in reaction times, the more pronounced the bottom-up (top-down) component is.
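The sign logic of this measure can be illustrated with a minimal Python sketch. The trial format, the function names, and the reaction times below are hypothetical and serve only as an illustration; they are not taken from the study's data or analysis scripts.

```python
from statistics import mean


def gaze_cueing_effect(trials):
    """Return mean invalid RT minus mean valid RT (in ms)."""
    valid = [t["rt"] for t in trials if t["validity"] == "valid"]
    invalid = [t["rt"] for t in trials if t["validity"] == "invalid"]
    return mean(invalid) - mean(valid)


def orienting_pattern(effect):
    """Map the sign of the cueing effect onto the pattern described in the text."""
    if effect > 0:
        return "bottom-up"  # standard gaze cueing effect
    if effect < 0:
        return "top-down"   # reversed gaze cueing effect
    return "none"


# Made-up reaction times (ms) for a single hypothetical participant.
trials = [
    {"validity": "valid", "rt": 420.0},
    {"validity": "valid", "rt": 440.0},
    {"validity": "invalid", "rt": 400.0},
    {"validity": "invalid", "rt": 410.0},
]

effect = gaze_cueing_effect(trials)
print(effect, orienting_pattern(effect))  # prints: -25.0 top-down
```

In this made-up example, responses are faster on invalid trials, so the difference is negative and the pattern would be read as top-down controlled.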

If mind perception caused cognitive conflict for stimuli located at the category boundary between “human” and “nonhuman” (located at around 60–70% physical humanness, as indicated by previous work; Cheetham et al., 2011, 2014; Wiese et al., 2019; Wiese and Weis, 2020), we would expect that the 60% human morph¹ would induce the most cognitive conflict that is due to categorization compared to faces that are easily distinguished as “human” or “nonhuman” (Experiment 1). This cognitive conflict should also significantly disrupt top-down control of attentional orienting for the morphed face that showed the most cognitive conflict (Experiment 2). Furthermore, top-down control should be restored when participants are pre-exposed to the stimuli’s perceptual features prior to the gaze-cueing task (Experiment 3).

Experiment 1

The aim of Experiment 1 was to use mouse tracking to measure if perceptually ambiguous faces caused cognitive conflict that is due to categorizing the faces as “human” or “nonhuman” via measures of mouse curvatures and to identify which of the faces was closest to the category boundary. If, indeed, categorically ambiguous faces induced conflict due to categorization, we would expect that mouse curvatures should be largest for the 60% morphed human face as previous literature suggests that the category boundary exists around the 60% physical humanness level (Cheetham et al., 2011, 2014; Hackel et al., 2014; Martini et al., 2016).

Materials and Methods

Participants

Thirty-eight participants were recruited from the George Mason University undergraduate pool (25 females, M age = 20.68, SD = 4.07, range = 18–35). Students were given course credit for completion of the study. All participants reported normal or corrected-to-normal vision and provided written consent prior to participating. All research procedures were approved by George Mason University’s Internal Review Board. All data and analysis scripts can be found on https://osf.io/73pr6/.

Stimuli

The face stimuli were created using FantaMorph, software that allows two images to be morphed into one another incrementally, resulting in a spectrum ranging from 0% of image A (i.e., 100% of image B) to 100% of image A (i.e., 0% of image B). On the “nonhuman” end of the spectrum, the S2 humanoid robot head developed by Meka Robotics was used. On the “human” end of the spectrum, a male face stimulus from the Karolinska Directed Emotional Faces (KDEF) database was used (Lundqvist et al., 1998). The spectrum comprised six morph levels, resulting in stimuli of 0, 20, 40, 60, 80, and 100% physical human-likeness; see Figure 1.

Figure 1. Spectrum of physical humanness ranging from 0 (left) to 100% (right) of physical humanness. Morphed face stimuli were created by morphing the face image of a humanoid robot into the image of a male human face from the Karolinska Directed Emotional Faces (KDEF) database (Lundqvist et al., 1998). The morphed images increase in physical humanness from the left side of the spectrum (i.e., robot) to the other (i.e., human) in increments of 20%.

Task and Procedure

Stimuli were presented (one at a time) at the bottom center of the screen, and participants were asked to categorize them as either “human” or “nonhuman” by moving their mouse cursor to the respective labels presented in the top left or top right corner of the screen as soon as the image appeared on the screen. The location of the labels was counterbalanced across participants. At the beginning of each trial, participants were asked to move the mouse cursor to a designated starting position, which was located centrally at the bottom of the computer screen, and to click the left mouse button to initiate the trial. Immediately afterward, one of the morphed stimuli was presented centrally at the bottom of the screen, and participants had to move the mouse cursor from the starting position to one of the two category labels located in the top two corners of the screen and click the label. For each morphed image, mouse cursor movement trajectories were measured. After each trial, participants were presented with a blank black screen for an inter-trial interval (ITI) of 1,000 ms to signify the end of the trial; see Figure 2.

Figure 2. Sequence of events during a trial of mouse tracking. On each given trial, participants moved their mouse cursor to the start position at the bottom of the screen and clicked the start button. After the click, a face was presented centrally at the bottom of the screen, right above the start button. Immediately after the face was presented, the category labels appeared. Next, the participant moved the mouse to pick one of the categories. The inter-trial interval (ITI) was set to 1,000 ms.

Participants were instructed to complete the task as quickly as possible to maximize the likelihood of the mouse moving from the starting position and to limit the number of trials where participants would keep the mouse stationary and only move it once they had categorized the face. We did not want participants to notice that the faces formed a spectrum that progressed systematically in degree of human-likeness, to avoid any bias that could arise from noticing the pattern. Therefore, faces were presented to participants in a randomized fashion. Additionally, to further conceal the pattern among the faces, eight decoy human-robot spectra (that were created with the same procedures) were included among the stimuli. Each experimental session started with a practice block in which participants completed three trials with three different morphed agents that were not included in the main task. After completing the practice block, participants moved to the experimental condition in which they saw all 54 faces in a randomized fashion. Each face was presented once for a total of 54 trials. The study took approximately 20 min to complete.

Analysis

Data were analyzed using R (version 3.6.1). The mouse-tracking software developed by Freeman and Ambady (2010) was used to collect and process mouse tracking data. The software allows researchers to record time-standardized trajectories of the mouse’s movements for a given trial. This allows users to compute the area under the curve (AUC), which is the geometric area between the mouse trajectory from the mouse’s starting point to the end point and a straight-line trajectory between those points (Freeman and Ambady, 2010). When participants are conflicted between two choices regarding a stimulus, an overlap in activation between the two categories would cause participants to make a choice in a geometrically wide mouse movement, which would result in a large AUC; a stimulus that does not coactivate two categories should induce less conflict and result in a geometrically narrow movement and a small AUC. The general idea underlying mouse tracking is depicted in Figure 3. No mouse-tracking trial deviated more than 3 SD from the respective participant’s mean, so all trials were kept in the analysis.
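The geometric idea behind the AUC can be sketched in a few lines of Python. This is an illustrative approximation only, not the implementation used by the mouse-tracking software: it assumes the trajectory is given as a list of (x, y) points, closes the trajectory with the straight line from end point back to start point, and takes the absolute net enclosed area via the shoelace formula. The function name and the coordinates are hypothetical.

```python
def area_under_curve(points):
    """Absolute net area enclosed between a trajectory and the straight
    line joining its first and last point (shoelace formula).

    Assumption: `points` is an ordered list of (x, y) cursor positions.
    """
    # Closing the polygon from the end point back to the start point
    # traces the idealized straight-line trajectory.
    closed = list(points) + [points[0]]
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(closed, closed[1:]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# A direct movement from the start (0, 0) to a label at (1, 1): no curvature.
straight = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
# A movement that first drifts toward the opposite side before turning.
curved = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(area_under_curve(straight))  # prints: 0.0
print(area_under_curve(curved))    # prints: 0.5
```

Because the sum is a net area, portions of a trajectory that cross to the other side of the straight line offset each other in this sketch; how the original software treats such crossings is not specified here.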

Figure 3. Mouse tracking recording and analysis (adapted from Freeman and Ambady, 2010). The shaded region visualizes how mouse curvature is used to calculate the area under the curve (AUC): the curved line represents the participant’s actual mouse trajectory while picking a category; the straight line represents the theoretical mouse trajectory if no cognitive conflict between the indicated categories occurs for a given stimulus. A comparison is, then, drawn between the maximal deviation of the actual mouse movement and the theoretical straight line to calculate the AUC (i.e., solid black line).

Figure 4. AUC during mouse tracking as a function of physical human-likeness. The 60% human morph is associated with a significantly stronger cognitive conflict than all the other morphs combined.

To analyze the mouse tracking data, a univariate ANOVA with AUC as a dependent variable and Agent Type as a within-participants factor (0% human vs. 20% human vs. 40% human vs. 60% human vs. 80% human vs. 100% human) was conducted. Follow-up t-tests were corrected using the false discovery rate (FDR) procedure.
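The text does not name the specific FDR procedure. As a sketch under the assumption that the commonly used Benjamini-Hochberg step-up procedure is meant (the method that R's `p.adjust` applies for `method = "fdr"`), adjusted p-values can be computed as follows. The function name and the p-values are made up for illustration.

```python
def fdr_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        # Scale by m / rank and enforce monotonicity with a running minimum.
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four follow-up comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(fdr_adjust(raw))
```

A comparison would then be treated as significant when its adjusted p-value falls below the chosen alpha level.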

Results

Greenhouse–Geisser corrections were done (ε = 0.65) due to violation of the sphericity assumption according to the Mauchly test [χ²(6) = 0.22, p < 0.001]. Results revealed a significant main effect of Agent Type [F(1,37) = 3.83, p = 0.009, η_G² = 0.08], with mouse curvatures varying as a function of physical human-likeness. To examine whether mouse curvatures were more pronounced for the 60% morph than the other stimuli, we used contrast coding comparing the average mouse curvatures for the 60% morph to the grand mean of all other face stimuli. This analysis revealed a significant difference between the 60% face and the grand mean of all the other morph stimuli [t(37) = 4.03, p < 0.001, d = 0.57], such that the 60% morph had a significantly higher AUC compared to the average AUC of all the other faces (AUC: M_60% face = 1.29 vs. M_Grand = 0.45; see Figure 4). This suggests that the 60% morph was perceived as categorically more ambiguous and that it potentially triggered larger cognitive conflict compared to the other face stimuli.
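One way to read the contrast described above is as a within-participant difference score (the 60% morph minus the mean of the five other morph levels) tested against zero. The Python sketch below follows that reading, which is an assumption rather than the authors' exact contrast coding. The data layout, function name, and AUC values are hypothetical; the p-value and effect size are deliberately not computed.

```python
from math import sqrt
from statistics import mean, stdev


def contrast_vs_rest(auc_by_participant, target="60"):
    """t statistic and degrees of freedom for target level vs. mean of the rest.

    Assumption: each participant is a dict mapping morph level -> mean AUC.
    """
    diffs = []
    for aucs in auc_by_participant:
        others = [value for level, value in aucs.items() if level != target]
        diffs.append(aucs[target] - mean(others))
    n = len(diffs)
    # One-sample t-test of the difference scores against zero.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Made-up AUC values for three hypothetical participants.
data = [
    {"0": 0.5, "20": 0.5, "40": 0.5, "60": 1.5, "80": 0.5, "100": 0.5},
    {"0": 0.5, "20": 0.5, "40": 0.5, "60": 2.5, "80": 0.5, "100": 0.5},
    {"0": 0.5, "20": 0.5, "40": 0.5, "60": 3.5, "80": 0.5, "100": 0.5},
]

t_value, df = contrast_vs_rest(data)
print(round(t_value, 3), df)
```

With 38 participants, the same computation would yield 37 degrees of freedom, matching the df of the reported t-test.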

Discussion

Experiment 1 aimed to use an established technique for measuring cognitive conflict processing, namely mouse tracking, to examine if categorically ambiguous faces induced cognitive conflict that is due to categorizing them as “human” or “nonhuman” and if the faces that have been previously shown to be close to the category boundary (i.e., 60% humanness) induced the most cognitive conflict. As such, we expected to find that the 60% human face would elicit the most cognitive conflict. Results of Experiment 1 showed that, indeed, the level of morphing had an overall effect on cognitive conflict and that the supposedly categorically ambiguous 60% morph induced significantly more cognitive conflict than all of the other morphed images combined (as measured in AUC) when subjects were categorizing the faces as human or nonhuman.

Experiment 2

Experiment 2 aimed at examining whether the category boundary face (i.e., the face that elicited the most cognitive conflict in Experiment 1) has the ability to disrupt top-down modulation of attention orienting compared to faces that are more easily distinguishable as either a human or a nonhuman. If perceptual ambiguity drained cognitive resources, fewer cognitive resources would remain for an observer to exert top-down modulation of attentional orienting (i.e., attending to the predicted location as opposed to the cued location). As such, we would expect to find significant differences in gaze cueing between categorically ambiguous and non-ambiguous faces. Specifically, we expect the categorically ambiguous face (i.e., the 60% human face) to elicit stronger reflexive orienting of attentional resources (i.e., standard gaze cueing effect with shorter reaction times on valid trials) than categorically unambiguous faces as cognitive resources that would facilitate top-down modulation should be more depleted for categorically ambiguous face stimuli compared to unambiguous faces.