OPINION article

Front. Artif. Intell., 26 April 2022
Sec. Language and Computation
Volume 5 - 2022 | https://doi.org/10.3389/frai.2022.886349

Emerging Grounded Shared Vocabularies Between Human and Machine, Inspired by Human Language Evolution

  • 1Creative Intelligence Lab, Leiden Institute for Advanced Computer Science, Leiden University, Leiden, Netherlands
  • 2Cognitive Psychology Unit, Institute of Psychology, Leiden University, Leiden, Netherlands
  • 3TNO, Information and Communication Technology, Delft, Netherlands
  • 4Leiden University Centre for Linguistics, Leiden University, Leiden, Netherlands

1. Introduction

The goal of building conversational AI systems is to teach machines to understand human language and to respond naturally. The most common way to train agents to produce and interpret natural language is currently to expose them to large quantities of data. Although this has led to advances in many areas, these systems typically have little understanding of how language relates to the real world (Mordatch and Abbeel, 2018), a limitation known as the grounding problem. Moreover, most conversational agents are trained in isolation, while humans are social animals, deeply embedded in culture and surrounded by others. Complex human behaviors, like language, evolved in socio–cultural contexts and could not exist without a variety of minds using and transmitting these behaviors.

To overcome this problem, researchers in Computational Linguistics have started modeling emerging communication setups, in which novel signals are created by interacting agents (Lazaridou et al., 2018; Mordatch and Abbeel, 2018; Chaabouni et al., 2019; ter Hoeve et al., 2021). However, the findings in such models do not always match what is found in similar experiments with humans, and features found in human language often do not emerge (Lazaridou et al., 2020).

The mechanisms that influence the emergence of communication and linguistic structure have been studied in the field of Language Evolution. Although the precise origins of human language are widely debated, computer simulations (Boer, 2006; Steels, 2012a; Kirby, 2017) and experiments in which humans use novel communication signals (Galantucci and Garrod, 2010; Scott-Phillips and Kirby, 2010; Kirby et al., 2014) have revealed some key mechanisms that drive the initial emergence of a novel language and the gradual appearance of more complex linguistic structure. We review relevant findings and propose applying these methods, which confirm the importance of micro–societies of interacting minds, to the emergence of novel human–machine communication systems.

A major insight from these studies is that language adapts to human biases and to how it is learned and used (Kirby et al., 2014, 2015). Current language models exhibit biases of their own: free-order case-marking languages, for example, are more challenging to model than fixed-order languages (Bisazza et al., 2021). As such, we suggest that the language used in human–machine communication should also evolve naturally, resulting in a grounded communication system adapted to the biases and constraints of both human and machine learning. We moreover emphasize the importance of co–development of shared vocabularies by conversational partners (human or AI–based). Doing so might result in a dynamic communication system that is natural to both humans and artificial conversational agents. We propose to follow a process of several steps, displayed in Figure 1. Starting from random behaviors, a signal–meaning mapping emerges from shared interactions (section 2), becomes more structured through horizontal and vertical transmission (section 3), and eventually evolves into an adaptive communication system (sections 4, 5).

Figure 1. The proposed road toward evolving a natural human–machine communication system. Initially random behaviors acquire meanings and become more structured through recurrent horizontal and vertical transmission. Everyday usage then facilitates the continuous evolution of communication systems that adapt to the biases of humans and machines.

2. Emergence of communication

Successful communication happens when the coordinated actions of all participants adhere to the grounding criterion: interlocutors agree that they have understood what was meant for the current purposes (Clark and Brennan, 1991). This requires a vocabulary that is (partially) aligned between the interlocutors of a conversation (Pickering and Garrod, 2004), the emergence of which starts with agreeing on which (initially random) behaviors should be interpreted as communicative and what they refer to (boxes 1 and 2 in Figure 1).

Experiments with human participants have been conducted to study the emergence of novel communication forms (Galantucci, 2005; Steels, 2006; Scott-Phillips et al., 2009; Galantucci and Garrod, 2010). Here, participants need to invent and negotiate novel signals to solve a communicative or cooperative task. Although outcomes are often bound to the starting conditions of the experiment, actions may gradually become communicative even when no conventional signaling device is given (Scott-Phillips et al., 2009). Typically, humans quickly establish conventions and settle on a shared set of signals. Sufficient common ground, repeated interactions, and social coordination have been identified as crucial for facilitating the emergence of communication systems.

With computational agents, Quinn (2001) investigated the emergence of signals and cooperation without dedicated communication channels, in a way comparable to the work of Scott-Phillips et al. (2009). Here, robots equipped only with sensors to observe a shared environment were tasked with moving away from a starting point while maintaining proximity to each other. Initially random behaviors gradually evolved into an iconic signaling system that could establish the allocation of leader–follower roles (Quinn, 2001; Quinn et al., 2003).

A large body of work in evolutionary language games, as reviewed in Steels (2012b), has shown that agents without a pre–programmed language can develop a communication system from scratch. This happens in a self–organizing fashion, as alignment between agents arises from repeated interactions between individuals without central control. In the context of those experiments, Steels already proposed that robots can participate in the ongoing evolution of language and learn from human language users if there are sufficient situated interactions (Steels, 2012a). We think that this is key to developing natural communication between humans and machines.
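To make this concrete, here is a minimal sketch in the spirit of the naming games reviewed in Steels (2012b): a population of agents converges on a shared vocabulary purely through repeated pairwise interactions. The objects, word shapes, score increments, and lateral-inhibition rule are illustrative choices for this sketch, not the parameters of any published model.

```python
import random

OBJECTS = ["red_ball", "blue_cube", "green_cone"]  # illustrative meaning space

def invent_word():
    return "".join(random.choice("bdgklmnprstvw") for _ in range(4))

class Agent:
    def __init__(self):
        # Each agent privately maps objects to scored candidate words.
        self.lexicon = {obj: {} for obj in OBJECTS}

    def best_word(self, obj):
        scores = self.lexicon[obj]
        return max(scores, key=scores.get) if scores else None

    def speak(self, obj):
        # Use the highest-scoring word for the object, inventing one if needed.
        if not self.lexicon[obj]:
            self.lexicon[obj][invent_word()] = 0.5
        return self.best_word(obj)

    def update(self, obj, word, success):
        scores = self.lexicon[obj]
        scores[word] = scores.get(word, 0.0) + (0.1 if success else -0.1)
        if success:
            # Lateral inhibition: dampen competing synonyms for this object.
            for other in scores:
                if other != word:
                    scores[other] -= 0.1

def play_round(speaker, hearer):
    obj = random.choice(OBJECTS)
    word = speaker.speak(obj)
    success = word == hearer.best_word(obj)  # do they share the convention?
    speaker.update(obj, word, success)
    hearer.update(obj, word, success)

# Repeated random pairwise interactions, with no central control.
population = [Agent() for _ in range(10)]
for _ in range(20000):
    speaker, hearer = random.sample(population, 2)
    play_round(speaker, hearer)

for obj in OBJECTS:
    print(obj, {agent.best_word(obj) for agent in population})
```

After enough rounds the printed sets typically shrink to a single word per object: alignment self-organizes from local interactions alone, and the population size (here 10) can be varied to study how community structure affects convergence.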

Although building an initially shared vocabulary is well–explored between humans and in agent–based models, to the best of our knowledge it is rarely applied in human–machine settings. One exception is a large–scale exhibition of Steels' Talking Heads experiment (Steels, 1999), in which both agents and human visitors proposed new words that could become part of an evolving shared vocabulary. We propose to revisit this idea in the context of conversational AI: to include this initial step of co–developing a shared vocabulary, rather than bypassing it with (random) symbols or pre–trained language models, and to trust the process of self–organization to facilitate the emergence of conventional signal–meaning mappings. Once established, this shared set of signals may be far removed from the natural language humans know today, but, just as human languages have done, it will adapt to its users and usage and become more complex and systematic.

3. Emergence of structure in language systems

Human language is uniquely structured and exhibits systematicity at multiple levels (Kirby, 2017). For example, words are combined into sentences such that their meaning is a function of the meanings of the parts and the way they are combined (compositional structure). The origins of this and other types of structure have been studied using computer models and artificial language learning experiments with humans (Kirby, 2017).

Among others, two processes have been found to contribute to the emergence of structure in language (boxes 2 and 3 in Figure 1). The first is cumulative cultural evolution, in which (cultural) information, such as ideas or linguistic signals, is transmitted vertically along generations. An influential experiment was conducted by Kirby et al. (2008). Here, the first participant was asked to learn an artificial language and describe images with the acquired words. Subsequent participants then learned the language as produced by the previous participant. Through this process, the words gradually changed and became more compositional and learnable. Such results consistently show that increases in learnability and structure arise because languages adapt to human inductive biases in order to be transmitted faithfully (Griffiths and Kalish, 2007). Words and patterns that are not easily learned or interpreted will not be reproduced by the next generation, and since structured languages are more easily compressible (Kirby et al., 2015; Tamariz and Kirby, 2015), this eventually results in more learnable and structured languages.
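As a toy illustration of this mechanism, the sketch below implements an iterated-learning chain under a transmission bottleneck. The learner's hard-coded compositional bias (splitting each signal into a shape part and a color part) stands in for human inductive biases; the meaning space, syllable inventory, and bottleneck size are arbitrary choices for illustration and do not reproduce the design of Kirby et al. (2008).

```python
import random

# Meanings are (shape, color) pairs; a language maps each meaning to a signal.
SHAPES = ["circle", "square"]
COLORS = ["red", "blue", "green"]
MEANINGS = [(s, c) for s in SHAPES for c in COLORS]
SYLLABLES = ["ka", "mo", "ti", "lu", "ne", "po"]

def random_language():
    # Generation 0: holistic two-syllable signals with no internal structure.
    return {m: "".join(random.sample(SYLLABLES, 2)) for m in MEANINGS}

def learn(observed):
    # A biased learner: it assumes the first syllable encodes shape and the
    # second encodes color (keeping the first association it sees), then
    # generalizes to unseen meanings by recombining these parts.
    shape_part, color_part = {}, {}
    for (shape, color), signal in observed.items():
        shape_part.setdefault(shape, signal[:2])
        color_part.setdefault(color, signal[2:])
    return {(s, c): shape_part.get(s, random.choice(SYLLABLES))
                    + color_part.get(c, random.choice(SYLLABLES))
            for (s, c) in MEANINGS}

language = random_language()
for generation in range(10):
    # Bottleneck: each learner observes only part of the previous language.
    shown = dict(random.sample(list(language.items()), 4))
    language = learn(shown)
    print(generation, language)
```

Because only languages that fit the learner's bias survive transmission intact, the initially holistic mapping collapses within a few generations into a compositional one that can be reconstructed from a handful of examples.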

A second process contributing to the emergence of structure in human language is horizontal transmission. Here, linguistic structure originates and evolves from social coordination through repeated interactions between individuals in micro-societies. While interactions between dyads can lead to shared vocabularies and initial regularities (Theisen-White et al., 2011; Verhoef et al., 2016), a community of users seems to be necessary for the emergence of system-wide compositional structure and efficient coding (Fay et al., 2008; Raviv et al., 2019). In these cases, pressures such as the number of interaction partners and expanding meaning spaces cause initially random languages to become more structured over time.

The effects of horizontal and vertical transmission have also been demonstrated with agent–based computer simulations (Steels and Loetzsch, 2012; Kirby, 2017). Altogether, there is overwhelming evidence suggesting that transmission (vertical or horizontal) of signals within communities contributes to the emergence of structure in language. In fact, it has been argued that both types of transmission are necessary to get a language that is learnable and usable (Kirby et al., 2015). These processes should therefore be projected onto the human–machine language evolution scenario to evolve a vocabulary that shares features with human language and is equally adapted to be learned and used by machines.

4. Human–machine language evolution and reinforcement learning

Recent work in Computational Linguistics has started to train machines to understand human language through the emergence of communication systems (Lazaridou et al., 2017; Clark et al., 2019; Manning et al., 2020). A range of work has shown that (multi–agent) reinforcement learning (RL; Sutton and Barto, 2018) can converge on communication protocols in various game scenarios (Lazaridou et al., 2016; Havrylov and Titov, 2017; Chaabouni et al., 2020). While communicative systems emerge, they often suffer from interpretability issues for humans (e.g., Mordatch and Abbeel, 2018), making their applicability to human–machine communication less obvious. The emergent protocols also often lack core properties of natural language (Kottur et al., 2017; Chaabouni et al., 2019). As such, Lazaridou et al. (2020) used a pre–trained language model in combination with self–play to teach RL agents to communicate in natural language. However, without human intervention, this approach suffers from language drift, ultimately causing misunderstandings. While too much drift is problematic, we argue that some language drift is welcome, since it can result in language that is optimized for human–machine communication.
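To show the flavor of these setups, the following sketch implements a tabular Lewis-style signaling game trained with simple Roth-Erev reinforcement instead of the neural policies used in the cited work; the sizes and update rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES = N_SIGNALS = 3

# Tabular "policies": accumulated weights over signals (sender)
# and over interpretations (receiver).
sender_w = np.ones((N_STATES, N_SIGNALS))
receiver_w = np.ones((N_SIGNALS, N_STATES))

def sample(weights):
    return rng.choice(len(weights), p=weights / weights.sum())

for step in range(20000):
    state = rng.integers(N_STATES)       # what the sender observes
    signal = sample(sender_w[state])     # the emergent "word" for that state
    action = sample(receiver_w[signal])  # the receiver's interpretation
    if action == state:
        # Roth-Erev reinforcement: strengthen the links that just paid off.
        sender_w[state, signal] += 1.0
        receiver_w[signal, action] += 1.0

# A mutually consistent code usually emerges, but runs can also get stuck in
# partial pooling (one signal covering several states), echoing the issues above.
print(sender_w.argmax(axis=1), receiver_w.argmax(axis=1))
```

Nothing in this loop pushes the emergent code toward human conventions, which is precisely why interpretability and drift become concerns once humans enter the loop.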

A growing trend in RL advocates including human feedback in the learning loop to improve learning (Arzate Cruz and Igarashi, 2020; ter Hoeve et al., 2021). Bignold et al. (2021) propose assisted reinforcement learning, in which information external to the environment is used to improve the performance of a learner agent and to scale to more complex scenarios. Human feedback can, for example, be incorporated directly into the behavior of an agent instead of being learned from the ground up, potentially preventing excessive language drift. This parallels human interactions, which offer a means to ground signs through recurrent and reciprocal usage (Garrod et al., 2007), provide feedback on the success of a conversational contribution, and alleviate miscommunications resulting from partially aligned vocabularies due to variation or dialects.
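A hedged sketch of how such feedback could enter the reward signal: blend task success with occasional human judgments of the emitted signal. The feedback rate, blending weight, and the human's reference lexicon below are hypothetical stand-ins, not an interface defined by Bignold et al. (2021).

```python
import random

def human_feedback(state, signal, human_lexicon, rate=0.05):
    """Occasionally a human partner checks the agent's signal against their own
    convention; returns +1 (approve), -1 (reject), or None (no human this turn).
    `human_lexicon` is a hypothetical stand-in for the human's expected mapping."""
    if random.random() > rate:
        return None
    return 1.0 if human_lexicon.get(state) == signal else -1.0

def shaped_reward(env_reward, feedback, beta=0.5):
    # Assisted-RL style shaping: self-play optimizes task success, while sparse
    # human feedback anchors the emergent code against drifting away from
    # human-interpretable conventions.
    if feedback is None:
        return env_reward
    return (1.0 - beta) * env_reward + beta * feedback
```

Plugged into a loop like the signaling game above, `shaped_reward` would replace the raw task reward, with `beta` trading off self-play efficiency against staying close to human conventions.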

To establish mutual understanding, we propose to use assisted reinforcement learning and to revisit signaling games (Scott-Phillips et al., 2009), referential games (Steels and Loetzsch, 2012; Chaabouni et al., 2020), and navigation games (Mordatch and Abbeel, 2018; Dubova and Moskvichev, 2020) to evolve shared vocabularies between humans and machines. The next step is not only to use more complex problems (e.g., increasing the number of objects, interacting partners, or vocabulary size) that necessitate more complex syntax and vocabularies (Mordatch and Abbeel, 2018), but also to continue interacting with conversational agents frequently (boxes 3 and 4 in Figure 1). While communities of self–playing RL agents interact and consolidate the learned behaviors, frequent human–agent interactions prevent excessive language drift. The evolved communication systems will initially not take the same form as human language, but through iterations they may come closer to it and evolve into a form that makes human–machine interactions more natural, with communication systems adapted to the biases of both human and machine learning.

We are aware that this proposition poses challenges, for example the tutoring role and the many interactions that humans must supply. However, we assume that humans are willing to take on this role, since we perceive robots as social, communicative partners (Guzman and Lewis, 2020) and linguistically align to computers in several ways (Branigan et al., 2010), an effect that is even stronger when computers themselves also exhibit alignment behavior (Spillner and Wenig, 2021).

5. Conclusion

This article proposed to combine insights from human language evolution, specifically concerning the influence of vertical and horizontal transmission, with assisted reinforcement learning. We have shown how signals and structure emerge in socio–cultural contexts and that language adapts to how it is learned and used. We therefore suggest that language used in human–machine communication should also evolve naturally, emphasizing the importance of co–development of shared conventions during communication. A first step would be to revisit communicative games and evolve successful systems between humans and machines. Doing so allows communication systems to adapt to the biases of both parties while mutual understanding is maintained, ultimately benefiting the communicative capacity of conversational agents.

Author Contributions

This opinion piece was written by TK with the help of TV. TK, TV, and RK conducted the literature research. TK, TV, RK, and SR reviewed and edited the manuscript. TV and SR supervised the process. All authors contributed to the article and approved the submitted version.

Conflict of Interest

SR is employed by TNO.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Arzate Cruz, C., and Igarashi, T. (2020). “A survey on interactive reinforcement learning: design principles and open challenges,” in Proceedings of the 2020 ACM Designing Interactive Systems Conference, 1195–1209.

Bignold, A., Cruz, F., Taylor, M. E., Brys, T., Dazeley, R., Vamplew, P., et al. (2021). A conceptual framework for externally-influenced agents: an assisted reinforcement learning review. J. Ambient Intell. Humanized Comput. 1–24. doi: 10.1007/s12652-021-03489-y

Bisazza, A., Üstün, A., and Sportel, S. (2021). On the difficulty of translating free-order case-marking languages. Trans. Assoc. Comput. Linguist. 9, 1233–1248. doi: 10.1162/tacl_a_00424

Boer, B. d. (2006). “Computer modelling as a tool for understanding language evolution,” in Evolutionary Epistemology, Language and Culture (Springer), 381–406.

Branigan, H. P., Pickering, M. J., Pearson, J., and McLean, J. F. (2010). Linguistic alignment between people and computers. J. Pragmat. 42, 2355–2368. doi: 10.1016/j.pragma.2009.12.012

Chaabouni, R., Kharitonov, E., Bouchacourt, D., Dupoux, E., and Baroni, M. (2020). Compositionality and generalization in emergent languages. arXiv preprint arXiv:2004.09124. doi: 10.18653/v1/2020.acl-main.407

Chaabouni, R., Kharitonov, E., Dupoux, E., and Baroni, M. (2019). Anti-efficient encoding in emergent communication. Adv. Neural Inf. Process. Syst. 32.

Clark, H. H., and Brennan, S. E. (1991). "Grounding in communication," in Perspectives on Socially Shared Cognition, eds L. B. Resnick, J. M. Levine, and S. D. Teasley (Washington, DC: American Psychological Association), 127–149.

Clark, K., Khandelwal, U., Levy, O., and Manning, C. D. (2019). What does BERT look at? An analysis of BERT's attention. arXiv preprint arXiv:1906.04341. doi: 10.18653/v1/W19-4828

Dubova, M., and Moskvichev, A. (2020). “Effects of supervision, population size, and self-play on multi-agent reinforcement learning to communicate,” in ALIFE 2020: The 2020 Conference on Artificial Life (Cambridge, MA: MIT Press), 678–686.

Fay, N., Garrod, S., and Roberts, L. (2008). The fitness and functionality of culturally evolved communication systems. Philos. Trans. R. Soc. B Biol. Sci. 363, 3553–3561. doi: 10.1098/rstb.2008.0130

Galantucci, B. (2005). An experimental study of the emergence of human communication systems. Cogn. Sci. 29, 737–767. doi: 10.1207/s15516709cog0000_34

Galantucci, B., and Garrod, S. (2010). Experimental semiotics: a new approach for studying the emergence and the evolution of human communication. Interact. Stud. 11, 1–13. doi: 10.1075/is.11.1.01gal

Garrod, S., Fay, N., Lee, J., Oberlander, J., and MacLeod, T. (2007). Foundations of representation: where might graphical symbol systems come from? Cogn. Sci. 31, 961–987. doi: 10.1080/03640210701703659

Griffiths, T. L., and Kalish, M. L. (2007). Language evolution by iterated learning with Bayesian agents. Cogn. Sci. 31, 441–480. doi: 10.1080/15326900701326576

Guzman, A. L., and Lewis, S. C. (2020). Artificial intelligence and communication: a human-machine communication research agenda. New Media Soc. 22, 70–86. doi: 10.1177/1461444819858691

Havrylov, S., and Titov, I. (2017). Emergence of language with multi-agent games: learning to communicate with sequences of symbols. Adv. Neural Inf. Process. Syst. 30.

Kirby, S. (2017). Culture and biology in the origins of linguistic structure. Psychon. Bull. Rev. 24, 118–137. doi: 10.3758/s13423-016-1166-7

Kirby, S., Cornish, H., and Smith, K. (2008). Cumulative cultural evolution in the laboratory: an experimental approach to the origins of structure in human language. Proc. Natl. Acad. Sci. U.S.A. 105, 10681–10686. doi: 10.1073/pnas.0707835105

Kirby, S., Griffiths, T., and Smith, K. (2014). Iterated learning and the evolution of language. Curr. Opin. Neurobiol. 28, 108–114. doi: 10.1016/j.conb.2014.07.014

Kirby, S., Tamariz, M., Cornish, H., and Smith, K. (2015). Compression and communication in the cultural evolution of linguistic structure. Cognition 141, 87–102. doi: 10.1016/j.cognition.2015.03.016

Kottur, S., Moura, J. M., Lee, S., and Batra, D. (2017). Natural language does not emerge 'naturally' in multi-agent dialog. arXiv preprint arXiv:1706.08502. doi: 10.18653/v1/D17-1321

Lazaridou, A., Hermann, K. M., Tuyls, K., and Clark, S. (2018). Emergence of linguistic communication from referential games with symbolic and pixel input. arXiv preprint arXiv:1804.03984.

Lazaridou, A., Peysakhovich, A., and Baroni, M. (2017). Multi-agent cooperation and the emergence of (natural) language. arXiv preprint arXiv:1612.07182.

Lazaridou, A., Pham, N. T., and Baroni, M. (2016). Towards multi-agent communication-based language learning. arXiv preprint arXiv:1605.07133.

Lazaridou, A., Potapenko, A., and Tieleman, O. (2020). Multi-agent communication meets natural language: synergies between functional and structural language learning. arXiv preprint arXiv:2005.07064. doi: 10.18653/v1/2020.acl-main.685

Manning, C. D., Clark, K., Hewitt, J., Khandelwal, U., and Levy, O. (2020). Emergent linguistic structure in artificial neural networks trained by self-supervision. Proc. Natl. Acad. Sci. U.S.A. 117, 30046–30054. doi: 10.1073/pnas.1907367117

Mordatch, I., and Abbeel, P. (2018). “Emergence of grounded compositional language in multi-agent populations,” in Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.

Pickering, M. J., and Garrod, S. (2004). Toward a mechanistic psychology of dialogue. Behav. Brain Sci. 27, 169–190. doi: 10.1017/S0140525X04000056

Quinn, M. (2001). “Evolving communication without dedicated communication channels,” in European Conference on Artificial Life (Springer), 357–366.

Quinn, M., Smith, L., Mayley, G., and Husbands, P. (2003). Evolving controllers for a homogeneous system of physical robots: Structured cooperation with minimal sensors. Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 361, 2321–2343. doi: 10.1098/rsta.2003.1258

Raviv, L., Meyer, A., and Lev-Ari, S. (2019). Compositional structure can emerge without generational transmission. Cognition 182, 151–164. doi: 10.1016/j.cognition.2018.09.010

Scott-Phillips, T. C., and Kirby, S. (2010). Language evolution in the laboratory. Trends Cogn. Sci. 14, 411–417. doi: 10.1016/j.tics.2010.06.006

Scott-Phillips, T. C., Kirby, S., and Ritchie, G. R. (2009). Signalling signalhood and the emergence of communication. Cognition 113, 226–233. doi: 10.1016/j.cognition.2009.08.009

Spillner, L., and Wenig, N. (2021). "Talk to me on my level: linguistic alignment for chatbots," in Proceedings of the 23rd International Conference on Mobile Human-Computer Interaction, 1–12.

Steels, L. (1999). The Talking Heads Experiment. Vol. 1. Words and Meanings. Special pre-print for the LABORATORIUM Antwerpen.

Steels, L. (2006). Experiments on the emergence of human communication. Trends Cogn. Sci. 10, 347–349. doi: 10.1016/j.tics.2006.06.002

Steels, L. (2012a). “Grounding language through evolutionary language games,” in Language Grounding in Robots (Springer), 1–22.

Steels, L. (2012b). “Self-organization and selection in cultural language evolution,” in Experiments in Cultural Language Evolution (John Benjamins), 1–37.

Steels, L., and Loetzsch, M. (2012). The grounded naming game. Exp. Cult. Lang. Evolut. 3, 41–59. doi: 10.1075/ais.3.04ste

Sutton, R. S., and Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.

Tamariz, M., and Kirby, S. (2015). Culture: copying, compression, and conventionality. Cogn. Sci. 39, 171–183. doi: 10.1111/cogs.12144

ter Hoeve, M., Kharitonov, E., Hupkes, D., and Dupoux, E. (2021). Towards interactive language modeling. arXiv preprint arXiv:2112.11911.

Theisen-White, C., Kirby, S., and Oberlander, J. (2011). “Integrating the horizontal and vertical cultural transmission of novel communication systems,” in Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 33, 956–961.

Verhoef, T., Walker, E., and Marghetis, T. (2016). “Cognitive biases and social coordination in the emergence of temporal language,” in Proceedings of the 38th Annual Conference of the Cognitive Science Society (Austin, TX: Cognitive Science Society), 2615–2620.

Keywords: conversational AI, language evolution, human-machine communication, grounded vocabularies, human-machine cooperation

Citation: Kouwenhoven T, Verhoef T, de Kleijn R and Raaijmakers S (2022) Emerging Grounded Shared Vocabularies Between Human and Machine, Inspired by Human Language Evolution. Front. Artif. Intell. 5:886349. doi: 10.3389/frai.2022.886349

Received: 28 February 2022; Accepted: 05 April 2022;
Published: 26 April 2022.

Edited by:

Arkaitz Zubiaga, Queen Mary University of London, United Kingdom

Reviewed by:

Limor Raviv, Max Planck Institute for Psycholinguistics, Netherlands

Copyright © 2022 Kouwenhoven, Verhoef, de Kleijn and Raaijmakers. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Tom Kouwenhoven, t.kouwenhoven@liacs.leidenuniv.nl
