
OPINION article

Front. Virtual Real., 16 September 2022
Sec. Virtual Reality and Human Behaviour
This article is part of the Research Topic “Do we really interact with artificial agents as if they are human?”

Is honesty the best policy for mismatched partners? Aligning multi-modal affordances of a social robot: An opinion paper

Guanyu Huang* and Roger K. Moore

  • Department of Computer Science, University of Sheffield, Sheffield, United Kingdom

Introduction

Spoken interactions between a human user and an artificial device (such as a social robot) have attracted much attention in recent decades (Lison and Meena, 2014; Oracle, 2020). In a shift away from automation robots in the industrial domain, social robots are expected to be used in social settings such as the service industry, education, healthcare, and entertainment (Bartneck et al., 2020, p.163).

According to Darling’s (2016) definition, a social robot is “a physically embodied, autonomous agent that communicates and interacts with humans on an emotional level”. Many features shape interactions with a social robot, such as people’s experience with technology products, their expectations of social robots, the interactional environment, and the robot’s appearance, voice and behaviours. In this last regard, affordance design affects how people perceive a social robot, and such perceptions in turn affect their behaviours and experiences. The term “affordance” was coined by the ecological psychologist Gibson (1977), who proposed that our perception of what it is possible to do with objects is shaped by their form. Affordance indicates what users see and can do with an object in a given situation; it is about perceptual action possibilities in an environment (Matei, 2020).

A strong tendency in social robot affordance design is to make human-robot interaction (HRI) resemble human-human interaction (HHI). Many studies express the hope that robots designed with anthropomorphic appearances and human-like cognitive behaviours will enable humans to interact with them much as they would with other humans, and even to develop social bonds (Leite et al., 2013; Kahn et al., 2015; Koyama et al., 2017; Ligthart et al., 2018). However, there are concerns about this approach. In practice, conversational interaction between speech-based artificial agents and human users is far from natural, and the language used tends to be formulaic (Moore et al., 2016).

One reason behind this is the significant change in the applications of spoken human-agent interaction (HAI) over the evolution of spoken language technology (Moore, 2017a). Compared with the “command and control systems” of the 1970s and contemporary smartphone-based “personal assistants”, social robots are expected to be used in more dynamic and open environments. This implies that users’ expectations, demands and ways of interacting with spoken agents differ depending on the use case. What has succeeded before in real-time spoken HAI (e.g., voice commands for specific uses) may not work well for social robots in some contexts. Additionally, a social robot’s human-like affordances could be seen as “dishonest”, because such signals hide the fact that the robot has limited interactive capabilities and is a “mismatched” conversational partner (Moore, 2015; 2017b). Moreover, the approach of constructing a robot by integrating off-the-shelf human-like technologies lacks an appreciation of the function and behaviour of speech within a broader theoretical framework (Moore, 2015).

This paper takes a step back to consider what human users look for when speaking to a social robot. It starts by looking at the nature and the process of spoken interactions. It then discusses why honesty is the best policy for a social robot in HRI. Furthermore, the arguments presented here support the hypothesis that aligning a social robot’s external affordances coherently with internal capabilities can shape its usability and improve human users’ experience in HRI.

Broader theoretical background of spoken interaction

What happens when we talk with each other?

Spoken interaction is a joint activity grounded in social needs to cooperate on a moment-by-moment basis (Holtgraves, 2013). In this joint activity, interlocutors need to solve the so-called “dilemma of cooperation” (Smith, 2010). The dilemma comprises two problems. One is the commitment problem of ensuring that other individuals’ collaborative motivation is genuine. The other is the collaboration problem of coordinating each individual’s efforts to complete the collective task. This is where language plays a vital role (Smith, 2010). Language helps to solve the commitment problem by enabling participants to express, recognise and act on each other’s intentions, ultimately leading to shared intentionality (Bratman, 1992; Searle, 1995; Tomasello and Carpenter, 2007). Language also helps to solve the collaboration problem by building up common ground, which is “the knowledge that the communicating parties both share and know they share” (Krauss and Fussell, 1990, p.112).

It is worth noting that this process is not linear. In Pierce and Corey’s (2009) review, the most dynamic model of conversation is the transactional model, in which interlocutors send and receive signals simultaneously. Additionally, their shared field of knowledge and experiences1 may allow them to use less speech, or even a single sound, to achieve a successful interaction (Hawkins, 2003).

Why might a social robot be a mismatched conversational partner?

Based on the above literature, it is safe to say that “sociality” in spoken interactions means collaboration. Effective collaboration rests on a pre-conversation shared field of knowledge and experiences, communicative competencies for aligning information during the conversation, and post-conversation long-term memories, which update the shared field of knowledge and experiences. Thus, interlocutors with a similar shared field of knowledge and experiences, similar communicative competencies and similar memory capabilities can be considered “matched partners”. Otherwise, interlocutors can be seen as “mismatched partners”, like first and second language speakers, parents and babies, or humans and animals.

Hence, looking back at the case of social robots and other spoken artificial agents, it becomes clear why they are mismatched partners in HRI. To start with, robot designers and engineers must build the hardware and software that equip robots with the desired abilities. This raises a series of questions. What shared fields of knowledge and experiences does the robot need to have? What sensory data about users and the interactive environment does it need to collect, and how can it act on that information? What should it say, and when? How should it use multi-modal cues to deliver the same message?

Without appropriate answers to these questions, it would be difficult for ordinary users to know how to coordinate their efforts to achieve a successful interaction with a social robot. For example, when talking with a social robot, how do we know what to say or how to speak? How do we know whether and how to adjust our behaviours to suit the agent? How do we know whether or when to give up?

Affordance design and its consequences

Human-like: To be or not to be

Bearing the above questions in mind, it is proposed that human-like design of a social robot could be problematic, for the following three reasons.

First, human-like design can be deceptive because it deploys anthropomorphic signals that violate the expectations those signals create, and it does so for ulterior purposes (Danaher, 2020). When interacting with a social robot with human-like affordances, people may instantly perceive it as a matched conversational partner. However, this is not the case: the authenticity of a social robot’s expressions, for example, can be doubted (Bartneck et al., 2020, p.195). Human-like cues therefore risk overstating a social robot’s capabilities, leaving people with negative impressions of the robot and deflating their motivation to interact with it. One example is Ham and Midden’s (2014) finding that people react negatively to a robot’s deceptive praise. Hence, a social robot with such a misleading design can be seen as “dishonest”, which is a primary ethical concern (Elder, 2016; Leong and Selinger, 2019; Hildt, 2021).

Second, human-like cues are not necessary for human users’ tendency to anthropomorphise. Anthropomorphisation is a natural outgrowth of human social interaction and cognition (Bartneck et al., 2020, p.48). Evidence long predating social robots shows that people humanise nonhuman entities regardless of their form: humans attribute mental states to animated geometrical shapes (Heider and Simmel, 1944), and people treat computers and other media as social actors (Reeves and Nass, 1996). Hence, human users’ tendency to anthropomorphise exists whether or not a social robot is human-like.

Third, current technologies are not advanced enough to deliver human-like perceptual cues concordantly. Social robots combine many components, yet the underlying technologies have not developed at an even pace. According to Moore (2012), Meah and Moore (2014) and MacDorman and Chattopadhyay (2016), inconsistency between perceptual cues causes perceptual conflict, leads to uncertainty in HRI and contributes to the Uncanny Valley Effect (Mori, 1970). It can also push human users into the habitability gap, where usability drops sharply as flexibility increases (Moore, 2017b). Human-like social robots therefore carry a high risk of failure.
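To make the misalignment argument concrete, the toy sketch below (in Python, with entirely hypothetical scores and threshold; it is not a model taken from any of the cited studies) rates each modality’s apparent human-likeness and flags a large spread between modalities as a crude indicator of the kind of cue conflict discussed above.

```python
# Toy illustration of "misaligned cues": each modality is rated for
# human-likeness on a 0-1 scale (hypothetical scores, not measured data).
# A large spread between modalities is used here as a crude proxy for the
# perceptual conflict discussed by Moore (2012) and Meah and Moore (2014).
from statistics import pstdev

cues = {
    "appearance": 0.9,   # highly human-like face and body
    "voice": 0.8,        # near-natural text-to-speech
    "dialogue": 0.3,     # formulaic, limited conversational ability
    "motion": 0.5,       # somewhat fluid gestures
}

spread = max(cues.values()) - min(cues.values())
dispersion = pstdev(cues.values())

print(f"cue spread = {spread:.2f}, dispersion = {dispersion:.2f}")
if spread > 0.4:  # hypothetical threshold
    print("Warning: modalities signal very different levels of capability.")
```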

Use of honest signals in HRI design: A hypothetical approach

Given the above analysis of why human-like design should not, and currently cannot, be pursued, what is an appropriate way forward? One possibility is to explore a more appropriate affordance design for mismatched social agents. Here, it is hypothesised that the effectiveness of HRI could be improved by ensuring that a social robot’s affordances are designed as “honest signals”.

The concept of honest signals originated in evolutionary biology and has since been developed in the social sciences (Pentland, 2010; Vinciarelli et al., 2011). Such signals are biological signals that cannot be faked and thus convey reliable, useful information to the receiver; they are adopted as an evolutionarily stable strategy. In social communication, honest signals are unconsciously generated signals that indicate a signaller’s genuine intentions or thoughts. They are reliable social signals because they are rooted in human brain structure and biology (Pentland, 2010, p.3–4). By creating a direct or indirect link between perception and action, honest signals help action possibilities to be perceived appropriately, so that signal receivers form reasonable expectations and act accordingly. A successful example is the voice design of the biomimetic robot MiRo, which aligns the robot’s vocal affordances with its physical and behavioural affordances (Moore and Mitchinson, 2017).

Hence, it is suggested that a social robot’s affordances should be designed as honest signals. Such signals can reflect the robot’s inner capabilities and help users form dependable affordance predictions, reducing communicative uncertainty and improving communicative effectiveness.
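As a purely illustrative sketch of what “designing affordances as honest signals” might mean in engineering terms, the following Python fragment selects only those affordance cues whose implied “promise” is covered by the robot’s actual capabilities. The cue names, capability levels and mappings are invented for this example and are not an established API or design method.

```python
# Hypothetical sketch: expose only the affordance cues that do not overstate
# the robot's internal capabilities. All names and levels are illustrative.
CAPABILITIES = {
    "speech_understanding": "keyword",   # keyword | phrase | open_dialogue
    "emotion_recognition": "none",       # none | coarse | fine
}

# Ordinal ranking of capability levels, lowest to highest.
LEVELS = {"none": 0, "keyword": 1, "coarse": 1, "phrase": 2, "fine": 2, "open_dialogue": 3}

# Each cue "promises" a minimum capability level; exposing a cue whose promise
# exceeds the actual capability would be a dishonest signal.
CUE_PROMISES = {
    "humanlike_voice": ("speech_understanding", "open_dialogue"),
    "simple_beeps": ("speech_understanding", "keyword"),
    "facial_expressions": ("emotion_recognition", "coarse"),
    "neutral_face": ("emotion_recognition", "none"),
}

def honest_cues(capabilities):
    """Return only the cues whose implied promise is covered by the capabilities."""
    selected = []
    for cue, (capability, promised_level) in CUE_PROMISES.items():
        if LEVELS[capabilities[capability]] >= LEVELS[promised_level]:
            selected.append(cue)
    return selected

print(honest_cues(CAPABILITIES))  # ['simple_beeps', 'neutral_face']
```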

How honest is good enough?

Given that the principles for honest affordance design lie with the robot’s internal capabilities, these capabilities must be quantifiable. It is worth noting that limited capabilities do not mean limited usability (Marge et al., 2020). Campa (2016) emphasised two concepts that are essential for a robot’s usability: “scenario” and “persona”. Both are linked with user needs: one concerns what users need in given situations; the other concerns the expected user-robot relation and what users want from it. Hence, it is hypothesised that 1) similar user needs can be found across different domains; and 2) a social robot’s capabilities must, at a minimum, meet the given user needs. A better scientific understanding of various use cases is therefore needed (Marge et al., 2020) to help explore user needs, provide design guidance for a robot’s capabilities and the correlated affordance design, and develop a unified evaluation framework. Furthermore, it is proposed that honesty is also situated in context: Bonial et al.’s (2021) study of situated dialogue showed that “the surrounding physical context, as well as the dialogue history and some assumptions relevant to the robot’s own embodied form and capabilities” matter. A further factor is the role the spoken agent is supposed to play: is it maintaining the authenticity of a real person, such as a Holocaust survivor (Traum et al., 2015), or of fictional characters (Gustafson et al., 1999; Leuski et al., 2006; Clark and Fischer, 2022)?
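One hedged way to operationalise “scenario”, “persona” and the capability bottom line is as a simple requirements check, sketched below in Python. The fields, capability names and numeric levels are illustrative assumptions rather than an established framework.

```python
# Hypothetical sketch of Campa's (2016) "scenario" and "persona" notions as a
# requirements check: does a robot's capability profile meet the user needs of
# a given use case? All fields, names and values are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class UseCase:
    scenario: str                                  # the situation users are in
    persona: str                                   # the expected user-robot relation
    required: dict = field(default_factory=dict)   # minimum capability levels

robot = {"speech_understanding": 2, "memory_span_turns": 5, "empathy_display": 0}

hotel_checkin = UseCase(
    scenario="hotel reception, short task-oriented exchanges",
    persona="polite service assistant",
    required={"speech_understanding": 2, "memory_span_turns": 3},
)

def unmet_needs(robot_caps, use_case):
    """List the user needs that the robot's capabilities fail to meet."""
    return [need for need, minimum in use_case.required.items()
            if robot_caps.get(need, 0) < minimum]

gaps = unmet_needs(robot, hotel_checkin)
print("usable for this use case" if not gaps else f"missing: {gaps}")
```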

Finally, it should be added that the “honesty level” of each component in a social robot may be uneven, owing to technological or practical constraints. However, honest signals still help to reduce uncertainty if they are correct “on average” (Johnstone, 1999). A key question, then, is how to measure a social robot’s overall honesty and the honesty of its individual components.
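As a speculative starting point for such a measurement, the sketch below scores each component’s honesty as the degree to which its signalled capability avoids overstating its actual capability, and then aggregates the scores with an unweighted average. Both the scoring function and the numbers are hypothetical, not a validated metric.

```python
# Hypothetical sketch: honesty as agreement between what an affordance leads
# users to expect and what the component actually delivers, aggregated
# "on average" (cf. Johnstone, 1999). Scores are invented for illustration.
components = {
    # component: (capability implied by its affordances, actual capability), both 0-1
    "voice": (0.9, 0.4),
    "face": (0.7, 0.6),
    "gesture": (0.5, 0.5),
}

def honesty(implied, actual):
    """1.0 when signals never overstate capability; lower as they overstate more."""
    return 1.0 - max(0.0, implied - actual)

per_component = {name: honesty(i, a) for name, (i, a) in components.items()}
overall = sum(per_component.values()) / len(per_component)

for name, score in per_component.items():
    print(f"{name}: honesty = {score:.2f}")
print(f"overall honesty (unweighted average) = {overall:.2f}")
```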

Conclusion

What a social robot should look like, sound like and behave like is not only a practical concern; it also raises ethical issues in HRI. It is essential to understand what affects a social robot’s affordance design and how affordance design impacts the effectiveness of HRI. This paper attempts to facilitate theoretical understanding by looking at the nature and process of spoken interaction and arguing that 1) a social robot is a mismatched partner when talking with human users; 2) a more appropriate affordance design for a social robot should be honest and aligned with its inner capabilities and states; and 3) such honest design should be based on user needs in given use cases. It is argued that a social robot with honest affordances can explicitly represent its capabilities and states, shape users’ expectations and reduce uncertainties during the interaction. While this highlights the potential benefits of honesty in design, it also raises a number of challenges, for example:

• How could user needs and use cases be categorised more effectively?

• How do user needs affect people’s expectations of a social robot’s capabilities?

• How can a robot’s capabilities be adjusted?

• How do a social robot’s affordances reflect its capabilities?

• How would people perceive a social robot with honest affordance design in reality?

• Is it possible to develop an evaluation framework for social robot’s affordances, capabilities and usability?

Author contributions

GH and RM: article conceptualisation. GH: writing. RM: review. Both authors contributed to the article and approved the submitted version.

Funding

This work is supported by the Centre for Doctoral Training in Speech and Language Technologies (SLT) and their Applications funded by UK Research and Innovation [grant number EP/S023062/1].

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1The shared field of knowledge and experiences is not necessarily known to be shared by communicators (Bangerter and Mayor, 2013), which makes it different from the concept of “common ground”.

References

Bangerter, A., and Mayor, E. (2013). “14 interactional theories of communication,” in Theories and models of communication. Handbook of communication science, 257–271.

Bartneck, C., Belpaeme, T., Eyssel, F., Kanda, T., Keijsers, M., and Šabanović, S. (2020). Human-robot interaction: An introduction. Cambridge: Cambridge University Press.

Bonial, C., Abrams, M., Baker, A. L., Hudson, T., Lukin, S. M., Traum, D., et al. (2021). “Context is key: Annotating situated dialogue relations in multi-floor dialogue,” in Proceedings of the 25th Workshop on the Semantics and Pragmatics of Dialogue.

Bratman, M. E. (1992). Shared cooperative activity. Philosophical Rev. 101, 327–341. doi:10.2307/2185537

Campa, R. (2016). The rise of social robots: A review of the recent literature. J. Eth. Emerg. Tech. 26, 106–113. doi:10.55613/jeet.v26i1.55

Clark, H. H., and Fischer, K. (2022). Social robots as depictions of social agents. Behav. Brain Sci. 2022, 1–33. doi:10.1017/s0140525x22000668

Danaher, J. (2020). Robot betrayal: A guide to the ethics of robotic deception. Ethics Inf. Technol. 22, 117–128. doi:10.1007/s10676-019-09520-3

Darling, K. (2016). “Extending legal protection to social robots: The effects of anthropomorphism, empathy, and violent behavior towards robotic objects,” in Robot law (Cheltenham, UK: Edward Elgar Publishing).

Elder, A. (2016). False friends and false coinage: A tool for navigating the ethics of sociable robots. SIGCAS Comput. Soc. 45, 248–254. doi:10.1145/2874239.2874274

Gibson, J. J. (1977). “The theory of affordances,” in Perceiving, acting, and knowing: Toward an ecological psychology, eds. R. Shaw and J. Bransford (Hillsdale, NJ: Lawrence Erlbaum Associates), 67–82.

Gustafson, J., Lindberg, N., and Lundeberg, M. (1999). “The August spoken dialogue system,” in Sixth European Conference on Speech Communication and Technology.

Ham, J., and Midden, C. J. (2014). A persuasive robot to stimulate energy conservation: The influence of positive and negative social feedback and task similarity on energy-consumption behavior. Int. J. Soc. Robot. 6, 163–171. doi:10.1007/s12369-013-0205-z

Hawkins, S. (2003). Roles and representations of systematic fine phonetic detail in speech understanding. J. Phonetics 31, 373–405. doi:10.1016/j.wocn.2003.09.006

Heider, F., and Simmel, M. (1944). An experimental study of apparent behavior. Am. J. Psychol. 57, 243–259. doi:10.2307/1416950

Hildt, E. (2021). What sort of robots do we want to interact with? Reflecting on the human side of human-artificial intelligence interaction. Front. Comput. Sci. 3. doi:10.3389/fcomp.2021.671012

Holtgraves, T. M. (2013). Language as social action: Social psychology and language use. London, UK: Psychology Press.

Johnstone, R. A. (1999). Signaling of need, sibling competition, and the cost of honesty. Proc. Natl. Acad. Sci. U. S. A. 96, 12644–12649. doi:10.1073/pnas.96.22.12644

Kahn, P. H., Kanda, T., Ishiguro, H., Gill, B. T., Shen, S., Gary, H. E., et al. (2015). “Will people keep the secret of a humanoid robot? Psychological intimacy in HRI,” in Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction, 173–180.

Koyama, N., Tanaka, K., Ogawa, K., and Ishiguro, H. (2017). “Emotional or social? How to enhance human-robot social bonding,” in Proceedings of the 5th International Conference on Human Agent Interaction, 203–211.

Krauss, R. M., and Fussell, S. R. (1990). “Mutual knowledge and communicative effectiveness,” in Intellectual teamwork: Social and technological foundations of cooperative work, 111–146.

Leite, I., Pereira, A., Mascarenhas, S., Martinho, C., Prada, R., and Paiva, A. (2013). The influence of empathy in human–robot relations. Int. J. Human-Computer Stud. 71, 250–260. doi:10.1016/j.ijhcs.2012.09.005

Leong, B., and Selinger, E. (2019). “Robot eyes wide shut: Understanding dishonest anthropomorphism,” in Proceedings of the Conference on Fairness, Accountability, and Transparency, 299–308.

Leuski, A., Pair, J., Traum, D., McNerney, P. J., Georgiou, P., and Patel, R. (2006). “How to talk to a hologram,” in IUI ’06: Proceedings of the 11th International Conference on Intelligent User Interfaces (New York, NY, USA: Association for Computing Machinery), 360–362. doi:10.1145/1111449.1111537

Ligthart, M., Hindriks, K., and Neerincx, M. A. (2018). “Reducing stress by bonding with a social robot: Towards autonomous long-term child-robot interaction,” in Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, 305–306.

Lison, P., and Meena, R. (2014). Spoken dialogue systems: The new frontier in human-computer interaction. XRDS Crossroads, ACM Mag. Students 21, 46–51. doi:10.1145/2659891

MacDorman, K. F., and Chattopadhyay, D. (2016). Reducing consistency in human realism increases the uncanny valley effect; increasing category uncertainty does not. Cognition 146, 190–205. doi:10.1016/j.cognition.2015.09.019

Marge, M., Espy-Wilson, C., and Ward, N. (2020). Spoken language interaction with robots: Research issues and recommendations, report from the NSF Future Directions Workshop. arXiv preprint arXiv:2011.05533.

Matei, S. A. (2020). What is affordance theory and how can it be used in communication research? arXiv preprint arXiv:2003.02307.

Meah, L. F., and Moore, R. K. (2014). “The Uncanny Valley: A focus on misaligned cues,” in International Conference on Social Robotics (Berlin, Germany: Springer), 256–265.

Moore, R. K. (2012). A Bayesian explanation of the ‘uncanny valley’ effect and related psychological phenomena. Sci. Rep. 2, 864. doi:10.1038/srep00864

Moore, R. K. (2017a). “Appropriate voices for artefacts: Some key insights,” in 1st International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots.

Moore, R. K. (2015). “From talking and listening robots to intelligent communicative machines,” in Robots that talk and listen, 317–335.

Moore, R. K. (2017b). “Is spoken language all-or-nothing? Implications for future speech-based human-machine interaction,” in Dialogues with social robots (Berlin, Germany: Springer), 281–291.

Moore, R. K., Li, H., and Liao, S. H. (2016). “Progress and prospects for spoken language technology: What ordinary people think,” in INTERSPEECH, San Francisco, CA, 3007–3011.

Moore, R. K., and Mitchinson, B. (2017). “A biomimetic vocalisation system for MiRo,” in Conference on Biomimetic and Biohybrid Systems (Berlin, Germany: Springer), 363–374.

Mori, M. (1970). Bukimi no tani [The uncanny valley]. Energy 7, 33–35.

Oracle, W. (2020). As uncertainty remains, anxiety and stress reach a tipping point at work [Dataset].

Pentland, A. (2010). Honest signals: How they shape our world. Cambridge, MA, USA: MIT Press.

Pierce, T., and Corey, A. M. (2009). The evolution of human communication: From theory to practice. St. Louis, MO, USA: EtrePress.

Reeves, B., and Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people. Cambridge, UK: Cambridge University Press.

Searle, J. R. (1995). The construction of social reality. New York, NY: Free Press.

Smith, E. A. (2010). Communication and collective action: Language and the evolution of human cooperation. Evol. Hum. Behav. 31, 231–245. doi:10.1016/j.evolhumbehav.2010.03.001

Tomasello, M., and Carpenter, M. (2007). Shared intentionality. Dev. Sci. 10, 121–125. doi:10.1111/j.1467-7687.2007.00573.x

Traum, D., Jones, A., Hays, K., Maio, H., Alexander, O., Artstein, R., et al. (2015). “New dimensions in testimony: Digitally preserving a Holocaust survivor’s interactive storytelling,” in International Conference on Interactive Digital Storytelling (Berlin, Germany: Springer), 269–281.

Vinciarelli, A., Pantic, M., Heylen, D., Pelachaud, C., Poggi, I., D’Errico, F., et al. (2011). Bridging the gap between social animal and unsocial machine: A survey of social signal processing. IEEE Trans. Affect. Comput. 3, 69–87. doi:10.1109/t-affc.2011.27

Keywords: social robot, affordance design, honest signals, use cases, internal capabilities

Citation: Huang G and Moore RK (2022) Is honesty the best policy for mismatched partners? Aligning multi-modal affordances of a social robot: An opinion paper. Front. Virtual Real. 3:1020169. doi: 10.3389/frvir.2022.1020169

Received: 15 August 2022; Accepted: 01 September 2022;
Published: 16 September 2022.

Edited by:

Evelien Heyselaar, Radboud University, Netherlands

Reviewed by:

Emmanuele Tidoni, University of Hull, United Kingdom

Copyright © 2022 Huang and Moore. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Guanyu Huang, ghuang10@sheffield.ac.uk
