Recursive Metropolis-Hastings naming game: symbol emergence in a multi-agent system based on probabilistic generative models

In the studies on symbol emergence and emergent communication in a population of agents, a computational model was employed in which agents participate in various language games. Among these, the Metropolis-Hastings naming game (MHNG) possesses a notable mathematical property: symbol emergence through MHNG is proven to be a decentralized Bayesian inference of representations shared by the agents. However, the previously proposed MHNG is limited to a two-agent scenario. This paper extends MHNG to an N-agent scenario. The main contributions of this paper are twofold: (1) we propose the recursive Metropolis-Hastings naming game (RMHNG) as an N-agent version of MHNG and demonstrate that RMHNG is an approximate Bayesian inference method for the posterior distribution over a latent variable shared by agents, similar to MHNG; and (2) we empirically evaluate the performance of RMHNG on synthetic and real image data, i.e., YCB object dataset, enabling multiple agents to develop and share a symbol system. Furthermore, we introduce two types of approximations—one-sample and limited-length—to reduce computational complexity while maintaining the ability to explain communication in a population of agents. The experimental findings showcased the efficacy of RMHNG as a decentralized Bayesian inference for approximating the posterior distribution concerning latent variables, which are jointly shared among agents, akin to MHNG, although the improvement in ARI and κ coefficient is smaller in the real image dataset condition. Moreover, the utilization of RMHNG elucidated the agents' capacity to exchange symbols. Furthermore, the study discovered that even the computationally simplified version of RMHNG could enable symbols to emerge among the agents.


INTRODUCTION
The origin of language remains one of the most intriguing mysteries of human evolution (Deacon, 1997; Christiansen and Chater, 2022; Steels, 2015). Humans utilize various symbol systems, including gestures and traffic lights. Although language is considered a type of symbol (or sign) system, it boasts the richest structure and the strongest ability to describe events among all symbol systems (Chandler, 2002). The adaptive, dynamic, and emergent nature of symbol systems is a common feature in human society (Taniguchi et al., 2018, 2021). This paper focuses on the emergent nature of general symbols and their meanings, rather than the structural complexity of language. The meaning of signs is determined within society in bottom-up and top-down manners, owing to the nature of symbol systems (Taniguchi et al., 2016). More specifically, the self-organized (or emergent) symbol system enables each agent to communicate semiotically with others, while being subject to the top-down constraints of the emergent symbol system. Agent-invented symbols can hold meaning within a society of multiple agents, even though no agent can directly observe the intention in the brain of a speaker. Peirce, the founder of semiotics, defines a symbol as a triadic relationship between sign, object, and interpretant (Chandler, 2007). The interpretant serves as a mediator between the sign and the object. The relationship between sign and object is inherently arbitrary. This implies that human society, a multi-agent system using a symbol system, must form and maintain these relationships in a decentralized manner. Research on symbol emergence, language evolution, and emergent communication has been addressing this issue using a constructive approach for decades.
Studies on emergent communication take many forms. Numerous studies have explored emergent communication by engaging agents in Lewis-style signaling games, such as referential games. Lazaridou et al. (2017, 2018) and Havrylov and Titov (2017) demonstrated that agents can communicate using their own language by playing referential games. Furthermore, Choi et al. (2018) and Mu and Goodman (2021) suggested that the compositionality of emergent language can be improved by modifying the real image data used in referential games. However, Bouchacourt and Baroni (2018) highlighted the issue of agents being able to communicate even when using uninterpretable images in referential games. Noukhovitch et al. (2021) demonstrated the necessity of referential games for agent communication. Numerous studies have also attempted emergent communication with multiple agents. Gupta et al. (2021) explored extending to multiple agents using meta-learning, while Lin et al. (2021) employed an autoencoder, a standard representation learning algorithm. Chaabouni et al. (2022) investigated the effects of varying the number of agents in referential games on agent communication. These studies successfully achieved communication through games that provided rewards.
In contrast, Taniguchi et al. (2023) proposed an alternative formulation of emergent communication based on probabilistic generative models and the assumption of joint attention. The Metropolis-Hastings naming game (MHNG) was introduced to explain the process by which two agents share the meaning of signs in a bottom-up manner from a Bayesian perspective. It was demonstrated that symbol emergence can be considered decentralized Bayesian inference. MHNG assumes joint attention between two agents, which is widely observed in human infants learning vocabularies, instead of reward feedback from a listener to a speaker. This idea is rooted in the concept of a symbol emergence system (Taniguchi et al., 2016, 2018), rather than the view of an emergent communication channel often assumed in emergent communication studies based on Lewis-style signaling games. The notion of a symbol emergence system was proposed to capture the overall dynamics of symbol emergence from the perspective of emergent systems, i.e., complex systems exhibiting emergent properties. This approach aims to further investigate the fundamental cognitive mechanisms enabling humans to organize symbol systems within a society in a bottom-up manner.
In this paper, we use the term symbol system in a restricted sense. Here, a symbol system simply refers to a set of signs and their (probabilistic) relationship to objects. In the context of studies on symbol emergence and emergent communication, we cannot assume a ground-truth relationship between signs and objects, unlike many studies in artificial intelligence, e.g., a standard pattern recognition task that assumes a ground-truth label given to each object. Ideally, a multi-agent system should form a symbol system with which agents can appropriately categorize (or differentiate) objects and associate signs with objects. The definition of appropriate categorization and sign sharing is crucial to the formulation of symbol emergence. Different approaches assume different goals of symbol emergence and criteria based on various hypothetical principles. Iterated learning assumes that the goal of symbol emergence is for each agent to use the same sign for each object. In contrast, emergent communication based on referential games assumes that organizing signs allows a speaker to provide information that enables a listener to choose an object intended by the speaker. Taniguchi et al. proposed a collective predictive coding (CPC) hypothesis in the discussion of Taniguchi et al. (2023). The CPC hypothesis posits that the goal of symbol emergence is the formation of global representations created by agents in a decentralized manner. This can also be called social representation learning, i.e., symbol emergence is conducted as a representation learning process by a group of individuals in a decentralized manner. From a Bayesian perspective, this can be regarded as decentralized Bayesian inference.

Figure 1. Left: an overview of the symbol emergence system (Taniguchi et al., 2016). Right: an overview of the recursive Metropolis-Hastings naming game played among multiple agents.
The MHNG was shown to enable two agents (agents 1 and 2) to form a symbol system, and its process can be regarded mathematically as a Bayesian inference of $p(w \mid o^1, o^2)$, where $o^1$ and $o^2$ represent the observations of agents 1 and 2, and $w$ represents the shared representations, i.e., signs. Furthermore, MHNG does not assume the existence of explicit feedback from the listener to the speaker in the game, unlike Lewis-style signaling games widely employed in emergent communication studies. Instead, MHNG assumes joint attention, considered foundational to language acquisition during early developmental stages (Cangelosi and Schlesinger, 2015). The CPC hypothesis and MHNG are based on generative models rather than discriminative models, which are prevalent in the dominant approach to emergent communication in the deep learning community. The MHNG and results of constructive studies substantiate the CPC hypothesis in a tangible manner (Taniguchi et al., 2023). However, existing studies on MHNG only demonstrate that the game can become a decentralized approximate Bayesian inference procedure in a two-agent scenario. No theoretical research or evidence exists to show that the CPC hypothesis can hold in more general cases, i.e., in $N$-agent settings where $N \geq 3$. In other words, it is crucial to determine whether a language game can perform decentralized approximate Bayesian inference of $p(w \mid o^{1:N})$, where $o^{1:N} = \{o^1, o^2, \ldots, o^n, \ldots, o^N\}$, and $o^n$ represents the observations of the $n$-th agent.
The fundamental reason why the MHNG can act as an approximate Bayesian inference of $p(w \mid o^1, o^2)$ is that the utterance of a sign $w \sim p(w \mid o^{Sp})$ by the speaker agent $Sp$ can be sampled based on agent $Sp$'s observations alone, and the acceptance ratio of the sign (i.e., the message) can be solely determined by the listener $Li$ based on its own observations and internal state. These properties are derived from the theory of the Metropolis-Hastings algorithm. MHNG has a solid theoretical basis in Markov chain Monte Carlo (MCMC) (Hastings, 1970). However, the proof provided by Taniguchi et al. (2023) assumed that the naming game is played between only two agents. This assumption was based on the need for individual agents to make the proposal sampling of a sign and the acceptance/rejection decision, respectively, without direct observation of the internal states of the other agent. Because of this difficulty, a naming game having the same theoretical property as MHNG for the $N$-agent ($N \geq 3$) case has not been proposed.
The goal of this paper is to extend the MHNG to the $N$-agent ($N \geq 3$) scenario and show that the extended naming game can act as an approximate Bayesian inference algorithm for $p(w \mid o^{1:N})$. The main idea of the proposed method is the introduction of a recursive structure into the MHNG. Let us consider a 3-agent case. If $w \sim p(w \mid o^1, o^2)$ can be sampled in the MH algorithm, the acceptance ratio for the third agent can be calculated based on the third agent's internal states, and the communication can be regarded as a sampling process of $p(w \mid o^1, o^2, o^3)$. Notably, $w \sim p(w \mid o^1, o^2)$ can be sampled using the original two-agent MHNG. By extending this idea in a recursive manner, we can develop a recursive MHNG (RMHNG). The details will be described in Section 2.
The main contributions of this paper are twofold.
• We propose the RMHNG played between N agents and provide a mathematical proof that the RMHNG acts as an approximate Bayesian inference method for the posterior distribution over a latent variable shared by the agents given the observations of all the agents.
• The performance of the RMHNG is empirically demonstrated on synthetic data and real image data.
The experiment shows that the RMHNG enables more than two agents to form and share a symbol system. The inferred distributions of signs are shown empirically to approximate the posterior distribution $p(w \mid o^{1:N})$. To reduce computational complexity and maintain applicability for the explanation of communication in human society, two types of approximations, i.e., (1) one-sample (OS) approximation and (2) limited-length (LL) approximation, are proposed and both are validated through experiment.
The remainder of this paper is structured as follows. In Section 2, we describe RMHNG, explaining its assumed generative model, algorithms, and theoretical results. Additionally, a practical approximation is provided. Section 3 presents an experiment using synthetic data and demonstrates the RMHNG empirically. Section 4 presents an experiment using the YCB object dataset (Calli et al., 2015), which contains real images of everyday objects. In Section 5, we engage in a comprehensive discussion. Finally, we conclude the paper in Section 6.

Overview
The RMHNG is a language game played between multiple agents ($N \geq 2$). It is an extension of the original MHNG. When $N = 2$, the RMHNG is equivalent to the original MHNG. Notably, the game does not allow agents to give any feedback to other agents during the game, unlike Lewis-style signaling games (Lewis, 2008), which have been used in studies of emergent communication. Instead, the game assumes joint attention. Generally, when we ignore the representation learning parts, the original MHNG is played as follows:
[...]
3. A listener updates its internal parameter $\theta^{Li}$.
4. They alternate their roles, i.e., take turns, and go back to 2.
The RMHNG extends the original MHNG to allow for communication between multiple agents ($N \geq 3$) and forms a shared symbol system among them. The key idea of RMHNG is as follows:
1. In an RMHNG played by $M$ agents, we recursively use an RMHNG played by $M - 1$ agents as a proposal distribution of $w_d$, which corresponds to a speaker in the original MHNG. Note that an RMHNG played by $M - 1$ agents $(1, \ldots, M - 1)$ is a sampler of an approximate distribution of $p(w_d \mid x_d^{1:M-1})$.
2. An RMHNG played by two agents ($N = 2$) is equivalent to the original MHNG.
Consequently, when played by $N$ agents, the RMHNG approximates the distribution of $p(w_d \mid x_d^{1:N})$ through mathematical induction.

The SERKET framework (Nakamura et al., 2018; Taniguchi et al., 2020) allows us to decompose the main part of the naming game (exchanging signs $w_d^n$ between agents) and the representation learning part (inferring $x_d^n$ and $\theta^n$). For simplicity and to focus on the extension of the MHNG, we assume that $x_d^n$ is observable throughout the paper and concentrate on inferring $\theta^n$ and $w_d$ through the language game.

Inference as a naming game
The RMHNG, like the MHNG, acts as a decentralized approximate Bayesian inference based on the MH algorithm. A standard inference scheme for $p(w_d \mid x_d^{1:N})$ in Figure 2 (C) requires the information about $x_d^{1:N}$, e.g., the posterior distribution $p(x_d \mid o_d^{1:N})$. However, $x_d^{1:N}$ are internal representations of each agent, and the agents cannot access each other's internal state, which is a fundamental principle of human semiotic communication. If the agents' brains were connected, the shared variable $w_d$ would be a representation of the connected brain and could be inferred by referencing $x_d^{1:N}$. But this is not the case in real-world communication. The challenge is to infer the shared variable $w_d$ without connecting the agents' brains and without simultaneously referencing $x_d^{1:N}$. The solution is to play the RMHNG. The decomposition of the generative model inspired by SERKET, as shown in Figure 3 (right), allows for a more manageable and systematic approach to the inference of hidden variables. The SERKET framework enables the decomposition of a PGM into multiple modules, which simplifies the overall inference process by breaking it down into inter-module communication and intra-module inference (Taniguchi et al., 2020; Nakamura et al., 2018). In the context of the RMHNG, the semiotic communication between agents is analogous to the inter-module communication in the SERKET framework.

MH receiving
Algorithm 1 presents the MH-receiving algorithm. When a listener agent $A_{Li} \in A$, where $A$ is the set of agents, receives a sign $w^\star$ for the $d$-th object, the agent evaluates whether to accept the sign and update $A_{Li}.w_d$. Here, $A_i.w_d$ represents the $w_d$ that agent $A_i$ possesses. Similarly, $A_{Li}.x_d$ denotes the $x_d$ held by agent $A_{Li}$.

Algorithm 3 presents the recursive Metropolis-Hastings (RMH) communication. Given $n + 1$ ($n < N$) agents, each with parameter $w_d$, this algorithm is used to compute $w_d$ for interactions among $n$ agents. If $n > 1$, the RMH-communication function is recursively called for agents $A_{1:n-1} \subset A$ to compute interactions among them. Then, $A_{n+1}$ updates its own parameter $w_d$ using the received information $s$ by calling the MH-receiving function. If $n = 1$, the MH-communication function is called. After the internal loop (from line 2 to line 9) is completed, the algorithm returns the $w_d$ of a randomly selected agent $j$ from $A_{1:n+1}$. This algorithm can recursively calculate interactions among $N$ agents.

Algorithm 3. Recursive Metropolis-Hastings communication.
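The recursion described above can be illustrated with a minimal sketch. It is not the authors' Algorithms 1 to 3, which did not survive in this text: the `Agent` class, the function names, and the use of a fixed likelihood table per agent (standing in for $P(x_d^n \mid w_d, \theta^n)$ of a single object $d$) are all assumptions made for illustration. As a simplification, the sketch uses a single agent sampling from its own posterior as the base case and returns the last listener's sign rather than that of a randomly selected agent.

```python
import random


class Agent:
    """Hypothetical agent for one object d: a likelihood table and a current sign."""

    def __init__(self, likelihood, rng):
        # likelihood[w] stands in for P(x_d^n | w_d = w, theta^n); assumed given.
        self.likelihood = likelihood
        self.rng = rng
        self.w = rng.randrange(len(likelihood))  # arbitrary initial sign


def mh_receiving(listener, w_star):
    """Accept the proposed sign with probability min(1, L(w_star) / L(current))."""
    r = min(1.0, listener.likelihood[w_star] / listener.likelihood[listener.w])
    if listener.rng.random() < r:
        listener.w = w_star
    return listener.w


def rmh_communication(agents, prior, T):
    """Approximately sample a sign given the likelihoods of all agents in the chain."""
    if len(agents) == 1:
        # Base case (assumption): the sole agent samples from its own posterior.
        a = agents[0]
        weights = [p * l for p, l in zip(prior, a.likelihood)]
        a.w = a.rng.choices(range(len(prior)), weights=weights, k=1)[0]
        return a.w
    listener = agents[-1]
    for _ in range(T):
        # The sub-chain of the first n agents plays the role of the speaker.
        w_star = rmh_communication(agents[:-1], prior, T)
        mh_receiving(listener, w_star)
    return listener.w
```

In this sketch the listener only ever evaluates its own likelihood table, so no agent reads another agent's internal state; the only thing passed along the chain is the proposed sign.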

Theory and proof
For the main theoretical result, we use the following corollary.
COROLLARY 2.1. The MH communication is a Metropolis-Hastings sampler of $P(w_d \mid x_d^{Sp}, x_d^{Li}, \theta^{Sp}, \theta^{Li})$. The acceptance probability $r$ in MH-receiving is equivalent to that in the MH algorithm for $P(w_d \mid x_d^{Sp}, x_d^{Li}, \theta^{Sp}, \theta^{Li})$ in the case that $P(w \mid x^{Sp}, \theta^{Sp})$ is a proposal distribution. This result is a generalization of (Hagiwara et al., 2019, 2022) and a special case of (Taniguchi et al., 2023). For the details of the proof, please refer to the original papers.
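The statement about the acceptance probability can be checked with a short derivation. The sketch below assumes, in line with the structure of the Inter-PGM but not spelled out in the surviving text, that the two agents' representations are conditionally independent given the sign and that both agents share the prior $P(w_d)$; the symbol $Q$ for the proposal is introduced here for illustration only.

```latex
\begin{align}
P(w_d \mid x_d^{Sp}, x_d^{Li}, \theta^{Sp}, \theta^{Li})
  &\propto P(x_d^{Sp} \mid w_d, \theta^{Sp})\,
           P(x_d^{Li} \mid w_d, \theta^{Li})\, P(w_d), \\
Q(w_d^\star) = P(w_d^\star \mid x_d^{Sp}, \theta^{Sp})
  &\propto P(x_d^{Sp} \mid w_d^\star, \theta^{Sp})\, P(w_d^\star), \\
r = \min\!\left(1,
      \frac{P(w_d^\star \mid x_d^{Sp}, x_d^{Li}, \theta^{Sp}, \theta^{Li})\, Q(w_d)}
           {P(w_d \mid x_d^{Sp}, x_d^{Li}, \theta^{Sp}, \theta^{Li})\, Q(w_d^\star)}
    \right)
  &= \min\!\left(1,
      \frac{P(x_d^{Li} \mid w_d^\star, \theta^{Li})}
           {P(x_d^{Li} \mid w_d, \theta^{Li})}
    \right).
\end{align}
```

Under these assumptions the speaker-side terms and the prior cancel, so the ratio depends only on quantities the listener holds.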
The first theoretical result is as follows.
THEOREM 1. The RMH communication converges to an MCMC sampler of $P(w_d \mid x_d^{1:N}, \theta^{1:N})$ [...]

PROOF. When $n = 2$, the RMH communication is reduced to the execution of MH communication $T$ times. The MH communication is proven to be an MH sampler in Corollary 2.1. Therefore, RMH communication is an MCMC sampler, and the sample distribution converges to $P(w_d \mid x_d^{1:2}, \theta^{1:2})$ [...]

[...]

PROOF. The RMHNG samples the local parameters $w_d$ for all $d$ using the RMH communication, and the global parameters $\theta^{1:n}$ from $P(\theta$ [...] As a result, the RMHNG converges to a Gibbs sampler of $p(\{w_d\}_{d \in D}, \{\theta^n\}_{n \in \{1, \ldots, N\}} \mid \{x_d^n\}_{d \in D})$.

As a result, the RMHNG is proved to be a decentralized approximate Bayesian inference procedure for $p(\{w_d\}_{d \in D}, \{\theta^n\}_{n \in \{1, \ldots, N\}} \mid \{x_d^n\}_{d \in D})$.

Approximations
Though the RMHNG is guaranteed to be a decentralized approximate Bayesian inference procedure for $p(\{w_d\}_{d \in D}, \{\theta^n\}_{n \in \{1, \ldots, N\}} \mid \{x_d^n\}_{d \in D})$, the computational cost increases exponentially with respect to the number of agents $N$. The computational cost is $O(I D T^{N-1})$. This indicates that the computational cost of RMH communication, i.e., $O(T^{N-1})$, has a significant impact on the overall computational cost. Therefore, we introduce a lazy version of RMHNG, which employs two approximations to reduce the computational cost.

One-sample (OS) approximation
The number of internal iterations $T$ corresponds to the iterations of MCMC for sampling $w_d$ given variables of a (sub)group of agents. Theoretically, $T$ should be large. However, practically, even $T = 1$ can work in an approximate manner. We refer to the RMHNG with $T = 1$ as the OS approximation (OS), a special case. With the OS, the computational cost of RMH communication is significantly reduced from $O(T^{N-1})$ to $O(N)$.
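The two cost figures can be reproduced by counting MH-receiving calls. The sketch below assumes a recursion in which a chain of $n$ agents runs $T$ iterations, each consisting of one RMH communication over the first $n - 1$ agents followed by one MH-receiving call, and in which a single agent incurs no MH-receiving call; the function name is hypothetical.

```python
def mh_receiving_calls(n_agents, T):
    """Count MH-receiving calls in one RMH communication over a chain of n_agents.

    Assumed recursion: C(1) = 0 and C(n) = T * (C(n - 1) + 1).
    """
    if n_agents <= 1:
        return 0
    return T * (mh_receiving_calls(n_agents - 1, T) + 1)


if __name__ == "__main__":
    for T in (1, 2, 3, 4):
        # For N = 4 the count is T**3 + T**2 + T, dominated by T**(N - 1).
        print(T, mh_receiving_calls(4, T))
```

Under this recursion the count for $N$ agents is $T^{N-1} + T^{N-2} + \cdots + T$, which is dominated by $T^{N-1}$ for $T > 1$ and collapses to $N - 1$ when $T = 1$.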

Limited-length (LL) approximation
RMH communication is a process of information propagation through a chain connecting $N$ agents (as shown in Figure 5). Limited-length approximation (LL) truncates the chain to $M$ agents. By shuffling the order of the agents according to the data points, it is expected that sufficient information will be statistically propagated among all the agents. LL reduces the computational cost of RMH communication from $O(T^{N-1})$ to $O(T^{M-1})$, where $M \leq N$ is the length of the truncated chain, i.e., the number of agents participating in an RMH communication. To reduce computational complexity while maintaining applicability for explaining communication in human society, two types of approximations are proposed: (1) OS approximation and (2) LL approximation. Both types were validated through experimentation.

Example: multi-agent Inter-GMM
To evaluate the RMHNG, we developed a computational model of symbol emergence called multi-agent Inter-GMM. This is based on the Gaussian mixture model (GMM) and is a special case of the multi-agent Inter-PGM. Hagiwara et al. (2019, 2022) [...] represented as a categorical distribution as a part of GMM and a VAE, respectively. Inter-GMM is defined as a part of Inter-GMM+VAE and combines two GMMs via a shared latent variable. We generalized the two-agent Inter-GMM and obtained the multi-agent Inter-GMM, which has $N$ Gaussian emission distributions corresponding to $N$ agents. The probabilistic generative process of the multi-agent Inter-GMM is as follows:

[...]

where $\mu_k^n$ and $\Lambda_k^n$ are the mean vector and the precision matrix of the $k$-th Gaussian distribution of the $n$-th agent. $\mathrm{Cat}(\ast)$ is the categorical distribution, $\mathcal{N}(\ast)$ is the Gaussian distribution, and $\mathcal{W}(\ast)$ is the Wishart distribution. The Inter-GMM is a probabilistic generative model represented by the PGM shown in Figure 2 (C). In other words, the multi-agent Inter-GMM is an instance of the multi-agent Inter-PGM. Therefore, the RMHNG can be directly applied to the multi-agent Inter-GMM.
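For orientation, one plausible form of such a generative process is sketched below. It is an assumption, not the paper's own equations: it uses the standard Gaussian-Wishart conjugate prior, assigns the hyperparameters $\beta$, $m$, $W$, and $\nu$ listed in the experimental conditions to their usual roles, and introduces $\pi$ as a hypothetical symbol for the parameter of the categorical prior over signs.

```latex
\begin{align}
w_d &\sim \mathrm{Cat}(\pi), & d &= 1, \ldots, D, \\
\Lambda_k^n &\sim \mathcal{W}(W, \nu), & k &= 1, \ldots, K,\ n = 1, \ldots, N, \\
\mu_k^n &\sim \mathcal{N}\!\left(m, (\beta \Lambda_k^n)^{-1}\right), & k &= 1, \ldots, K,\ n = 1, \ldots, N, \\
x_d^n &\sim \mathcal{N}\!\left(\mu_{w_d}^n, (\Lambda_{w_d}^n)^{-1}\right), & d &= 1, \ldots, D,\ n = 1, \ldots, N.
\end{align}
```

In this form the sign $w_d$ is the only variable shared across agents, while each agent $n$ keeps its own component parameters $\theta^n = \{\mu_k^n, \Lambda_k^n\}_k$.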

Conditions
We evaluated the RMHNG using the multi-agent Inter-GMM with four agents ($N = 4$) on synthetic data. For all experiments (excluding the measurement of computation time), the number of iterations ($I$) was set to 100, and each experiment was conducted five times.
Dataset: We created synthetic data to serve as observations for the four agents. A dataset was generated from five 4-dimensional Gaussian distributions with mean vectors of (0, 1, 2, 3), (0, 5, 6, 7), (8, 5, 10, 11), (12, 13, 10, 15), and (16, 17, 18, 15), respectively. The covariance matrix of each Gaussian distribution was set to the identity matrix. The values obtained for each dimension were taken as observations for each agent. In other words, the value of the $n$-th dimension of data sampled from the GMM was considered as the observation for the $n$-th agent. Notably, for the $n$-th agent, the $n$-th and $(n+1)$-th Gaussian distributions have the same mean and variance. Therefore, the $n$-th agent cannot differentiate the $n$-th and $(n+1)$-th Gaussian distributions without communication.
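The construction of the synthetic data can be sketched as follows. The mean vectors and the identity covariance are taken from the description above; the number of samples per component is not stated in the text, so `n_per_component` is a hypothetical parameter, and the function names are illustrative.

```python
import random

# Mean vectors of the five 4-dimensional Gaussian components (from the text).
MEANS = [
    (0, 1, 2, 3),
    (0, 5, 6, 7),
    (8, 5, 10, 11),
    (12, 13, 10, 15),
    (16, 17, 18, 15),
]


def make_dataset(n_per_component, seed=0):
    """Return (labels, observations); observations[n][d] is agent n's view of datum d."""
    rng = random.Random(seed)
    labels = []
    observations = [[] for _ in range(4)]
    for k, mean in enumerate(MEANS):
        for _ in range(n_per_component):
            labels.append(k)
            for n in range(4):
                # Identity covariance: independent unit-variance dimensions.
                observations[n].append(rng.gauss(mean[n], 1.0))
    return labels, observations


def confusable_pairs(agent):
    """Pairs of components whose means coincide in the given agent's dimension."""
    return [
        (i, j)
        for i in range(len(MEANS))
        for j in range(i + 1, len(MEANS))
        if MEANS[i][agent] == MEANS[j][agent]
    ]


if __name__ == "__main__":
    for n in range(4):
        print("agent", n, "cannot separate components", confusable_pairs(n))
```

With zero-based indices, each agent `n` has exactly one pair of components, `(n, n + 1)`, that it cannot tell apart from its own dimension alone, which is what makes communication necessary in this setting.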

Compared methods:
We assessed the proposed model, RMHNG (proposal), by comparing it with two baseline models and a topline model. In No communication (baseline 1), the agents independently infer a sign $w$, i.e., perform clustering of the data. No communication occurs between the four agents. In other words, the No communication model assumes that the agents independently infer signs $w_d^n$ ($n \in \{1, 2, 3, 4\}$), respectively, using four GMMs. All acceptance (baseline 2) is the same as the RMHNG, except that the acceptance ratio is always set to $r = 1$ in MH receiving (see Algorithm 1). Each agent always believes that the sign of the other is correct. In Gibbs sampling (topline), the sign $w_d$ is sampled using the Gibbs sampler. This process directly uses $x_d^{1:4}$, although no one can simultaneously examine the internal (i.e., brain) states of humans during communication. This is a centralized inference procedure and acts as a topline in this experiment.
We also evaluated two approximation methods introduced in Section 2.5. OS and LL refer to the OS and LL approximations, respectively. In the LL approximation, $M = 2$, i.e., the chain length is one. In OS&LL, both OS and LL approximations were applied simultaneously.
Hyperparameters: In all methods, the hyperparameters of the agents were set to be the same. The hyperparameters were $\beta = 1$, $m = 0$, $W = 0.01$, and $\nu = 1$.

Evaluation criteria:
• Clustering: We used Adjusted Rand Index (ARI) (Hubert and Arabie, 1985) to evaluate the unsupervised categorization performance of each agent in the MH naming game. A high ARI value indicates excellent categorization performance, while a low ARI value indicates poor performance. ARI is advantageous over precision since it accounts for label-switching effects in clustering by comparing the estimated labels and ground-truth labels. Appendix 2 provides more details.
• Sharing sign: We assessed the degree to which the two agents shared signs using the κ coefficient (κ) (Cohen, 1960). Appendix 2 provides more details.
• Computation time: We conducted experiments to measure the processing time of the program when running it at $I = 10$ by varying the values of $T$ in Algorithm 3 and $M$ in Algorithm 4. We conducted experiments with $T = 1, 2, 3, 4$ and $M = 1, 2, 3$. The program was run three times in each experiment (30 iterations in total, initialized every 10 iterations), and we calculated the average processing time per iteration (10 iterations).
• Decentralized posterior inference: To investigate whether RMHNG is an approximate Bayesian estimator of the posterior distribution $p(w \mid x^1, x^2, \ldots, x^N, \theta^1, \theta^2, \ldots, \theta^N)$, we need to compare it with the true posterior distribution. However, computing the true posterior distribution $p(w \mid x^1, x^2, \ldots, x^N, \theta^1, \theta^2, \ldots, \theta^N)$ directly is difficult. Therefore, we evaluate how well the distribution of signs obtained by RMHNG matches that of Gibbs sampling. Appendix 1 provides more details.
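The two agreement measures in the list above can be computed with the standard library alone. The sketch below implements the usual definitions of ARI (Hubert and Arabie, 1985) and Cohen's κ (Cohen, 1960); the function names are illustrative, and how the paper aggregates κ over agents is not specified in this text, so the example only compares one pair of label sequences.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two labelings; invariant to permutations of the label names."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_ij - expected) / (max_index - expected)


def cohen_kappa(signs_a, signs_b):
    """Cohen's kappa between the signs two agents assign to the same objects."""
    n = len(signs_a)
    p_observed = sum(a == b for a, b in zip(signs_a, signs_b)) / n
    count_a, count_b = Counter(signs_a), Counter(signs_b)
    p_chance = sum(count_a[k] * count_b[k] for k in count_a) / n**2
    if p_chance == 1.0:
        return 1.0  # both agents always use the same single sign
    return (p_observed - p_chance) / (1.0 - p_chance)


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    relabeled = [1, 1, 2, 2, 0, 0]  # same partition, different sign names
    print(adjusted_rand_index(truth, relabeled))
    print(cohen_kappa(truth, relabeled))
```

The example illustrates why the two criteria are used for different purposes: ARI ignores which name is attached to each cluster, whereas κ rewards two agents only when they use the same sign for the same object.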
Machine Specifications: The experiment was conducted on a desktop PC with an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz, 32 GB of RAM, and an NVIDIA GeForce RTX 2080 SUPER GPU.

Results
Categorization and sign sharing: Table 2 shows the ARI and κ for each method used on the artificial data. As shown in Table 2, the ARI values for RMHNG were consistently close to those of Gibbs sampling, with a maximum difference of only 0.1. This indicates that RMHNG had a category classification accuracy similar to that of Gibbs sampling. In this setting, OS performed even better than RMHNG, achieving the highest values for both ARI and κ. This might be because OS facilitated the mixing process by introducing [...]

Change in ARI and κ for each iteration: Figure 6 shows the ARI (left) and κ (right) for each iteration ($i$ in Algorithm 4). From the left graph in Figure 6, we can see that RMHNG, OS, and LL converge faster in terms of ARI, in that order, among the RMHNG and its approximation methods. OS&LL shows an upward trend in ARI even at the 100th iteration, indicating that it has not converged. No communication has the fastest convergence in ARI among all the methods. As for All acceptance, the ARI does not show an upward trend even as the iteration count increases, in contrast to the other methods. From the right graph in Figure 6, we can see that RMHNG, OS, and LL converge faster in terms of κ, in that order, among the RMHNG and its approximation methods. OS&LL shows an upward trend in κ even at the 100th iteration, indicating that it has not converged. No communication and All acceptance do not show an upward trend in κ even as the iteration count increases, in contrast to the other methods.
Computation time: Figure 7 shows the average computation time for varying values of $M$ and $T$ in RMHNG. On the plot, the curves grow logarithmically as $T$ increases. Because the vertical axis is logarithmic, this corresponds to polynomial growth in $T$ and confirms that the computation time follows the computational complexity of $O(T^{M-1})$. Additionally, significant reductions in computation time can be achieved by approximating RMHNG with OS ($T = 1$, $M = 3$), LL [...]

Feature extraction was performed using SimSiam (Chen and He, 2021), a representation learning method based on self-supervised learning, pre-trained on the collected cropped YCB-object dataset. The feature extractor outputted 512-dimensional vectors. To address the issue of high feature dimensionality compared to the small amount of data available, principal component analysis (PCA) was used to reduce the features to 10 dimensions. Figure 10 shows a visualization of the features of all data and the features observed by each agent using PCA. From this figure, it can be expected that some degree of categorization is possible.
Hyperparameters: All agents were assigned the same hyperparameters, with values set as follows: $\beta = 1$, $m = 0$, $W = 100 \times I$, and $\nu = 1$, where $I$ is the 10-dimensional identity matrix.
The compared methods and evaluation criteria are the same as those in Experiment 1.

Result
Categorization and sharing signs: Table 3 shows the ARI and κ for each method on the YCB object dataset. It is observed that RMHNG and Gibbs sampling have similar category classification accuracy, with a maximum difference of only 0.04. Among the RMHNG approximations, OS had the highest ARI and κ values. Interestingly, it showed a value close to that of RMHNG for κ. OS&LL had the lowest values for both ARI and κ. However, the difference in ARI between LL, OS, and OS&LL was at most 0.02, indicating similar performance. In the YCB object dataset experiments, although OS showed higher ARI [...]

Change in ARI and κ for each iteration: Figure 11 shows the ARI (left) and κ (right) for each iteration ($i$ in Algorithm 4) for the various methods, while Figure 6 shows the corresponding results for the synthetic data. From the left graph in Figure 11, we can see that the RMHNG method has the fastest convergence of ARI, followed by OS, OS&LL, and LL. Regarding OS&LL, ARI did not converge when using synthetic data, but it did converge when using the YCB object dataset. No communication had the fastest convergence of ARI among all the methods. As for All acceptance, ARI did not show an increasing trend with iteration on synthetic data, but it did show an increasing trend when using the YCB object dataset. From the right graph in Figure 11, we can see that the RMHNG method had the fastest convergence of κ, followed by OS, LL, and OS&LL. No communication did not show any increasing trend, in contrast to the other methods. As for All acceptance, κ did not show an increasing trend with iteration when using synthetic data, but it did show an increasing trend when using the YCB object dataset.
Decentralized posterior inference: Figure 12 shows the results of calculating the degree of similarity between the distribution of the sign obtained by each method and that obtained by Gibbs sampling in the last 10 iterations (iterations 91 to 100) for each method. RMHNG showed a value of 0.76, indicating that the distribution of the sign obtained by RMHNG matched that obtained by Gibbs sampling by 76%. Among the methods that approximated RMHNG, OS showed the highest value, both in the synthetic data experiment and the YCB object dataset experiment. Additionally, all approximation methods showed higher values than No communication.
When comparing Table 2 and Table 3, we observed that the ARI values for RMHNG, OS, LL, and OS&LL were higher for the synthetic dataset, while the κ values were higher for the YCB object dataset. This may be because the YCB object dataset is easier to categorize when observed partially by an agent, but some objects are so similar that agents naturally regard them as instances of a single category. For example, in Figure 10, the mustard bottle, bleach cleanser, and Windex bottle have similar feature distributions, making it difficult to cluster them according to the ground truth. Therefore, using the YCB object dataset leads to a decrease in ARI. As a result, when using the YCB object dataset, there was little difference in ARI between No communication and the methods based on RMHNG, compared to when using the synthetic dataset.

CONCLUSION
In this study, we extended the MHNG to the $N$-agent scenario by introducing the RMHNG, which serves as an approximate decentralized Bayesian inference method for the posterior distribution shared by agents, similar to the MHNG. We demonstrated the effectiveness of RMHNG in enabling multiple agents to form and share a symbol system using synthetic and real image data. To address computational complexity, we proposed two types of approximations: OS and LL approximations. Evaluation metrics, such as the ARI and κ, were used to assess the performance of communication in each iteration of the naming game. Results showed that the 4-agent naming game successfully facilitated the formation of categories and effective sign-sharing among agents. Moreover, the approximated RMHNG exhibited higher ARI and κ compared to the No communication condition, showing that the approximate version of RMHNG could perform symbol emergence in a population. Additionally, we assessed the agreement between the sign distributions obtained by RMHNG and Gibbs sampling, confirming that RMHNG approximates the posterior distribution with a degree of agreement exceeding 87% for the synthetic data and 71% for the YCB object data. This result demonstrates that RMHNG could successfully approximate the posterior distribution over signs given every agent's observations. Several future perspectives emerge from this study. Firstly, we plan to analyze the behavior of the RMHNG in populations with a larger number of agents. Although we focused on the 4-agent scenario due to the computational cost of the original RMHNG ($O(I D T^{N-1})$), we empirically observed that the OS approximation performed well in many cases. Unlike the original RMHNG, the OS-approximated version exhibits scalability in terms of the number of agents ($O(N)$), enabling simulations with larger populations. This scalability opens up possibilities for providing valuable insights into language evolution through the MHNG framework. Additionally, extending the categorical signs to more complex signs, such as sequences of words, represents a natural progression for our research. Investigating the dynamics of communication with more intricate sign systems will shed light on the evolution and complexity of language.

EVALUATION OF THE DECENTRALIZED POSTERIOR INFERENCE ARCHITECTURE
In order to evaluate the efficacy of RMHNG as an approximate Bayesian estimator for the posterior distribution p(w | x^1, x^2, ..., x^N, θ^1, θ^2, ..., θ^N), a comparison was made between RMHNG and the actual posterior distribution. However, direct computation of the true posterior distribution p(w | x^1, x^2, ..., x^N, θ^1, θ^2, ..., θ^N) presented significant challenges. Therefore, the evaluation focused on the degree of concurrence between the sign distribution generated by RMHNG and that produced by Gibbs sampling. The evaluation was conducted for the last 10 iterations (i.e., iterations 91 to 100) of RMHNG and Gibbs sampling. Let f^R_{d,w} and f^G_{d,w} be the number of times the word w was sampled using RMHNG and Gibbs sampling, respectively, for the d-th dataset. The similarity between the two methods was calculated as ). However, due to the singularity of the GMM, label switching (i.e., swapping of signs) between different inference results needed to be addressed. To solve this problem, bipartite graph matching was performed to put one clustering result in correspondence with the other. To perform bipartite graph matching, the signs obtained by RMHNG were considered as the point set V^R = {v^R_0, v^R_1, ..., v^R_K}, and the signs obtained by Gibbs sampling were considered as the point set V^G = {v^G_0, v^G_1, ..., v^G_K}. The edge connecting v^R_i and v^G_j was denoted by e_{i,j}, and the set of all edges was denoted by E = {e_{0,0}, e_{0,1}, ..., e_{i,j}, ..., e_{K,K}}. The graph G = (V^G ∪ V^R, E) was a complete bipartite graph. If the gain of each pair ), then the sign replacement problem could be reduced to a weighted maximum bipartite matching problem. To simplify the problem further, the gain of each pair was multiplied by (−1) ). This reduced the weighted maximum bipartite matching problem to a minimum cost flow problem, which could be solved using the Hungarian method. Finally, the similarity was calculated by 1
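The label-alignment step described above can be illustrated with a minimal sketch. The gain and similarity formulas are not recoverable from this text, so the sketch assumes that the gain of a pair (i, j) is the number of objects labelled i by RMHNG and j by Gibbs sampling, and that agreement is the fraction of objects matched after relabelling. The function name `align_signs` is hypothetical. For brevity, an exhaustive search over permutations stands in for the Hungarian method; it returns the same optimum but is practical only for a small number of signs.

```python
from itertools import permutations


def align_signs(signs_r, signs_g, num_signs):
    """Match RMHNG sign labels to Gibbs sign labels (illustrative sketch).

    Returns (mapping, agreement), where mapping[i] is the Gibbs label
    matched to RMHNG label i.
    """
    # Assumed gain: gain[i][j] counts objects labelled i by RMHNG and j by Gibbs.
    gain = [[0] * num_signs for _ in range(num_signs)]
    for r, g in zip(signs_r, signs_g):
        gain[r][g] += 1

    # Multiplying the gain by -1 turns maximum-weight matching
    # into a minimum-cost assignment problem.
    cost = [[-v for v in row] for row in gain]

    # Exhaustive search replaces the Hungarian method here (small K only).
    best = min(
        permutations(range(num_signs)),
        key=lambda p: sum(cost[i][p[i]] for i in range(num_signs)),
    )
    matched = sum(gain[i][best[i]] for i in range(num_signs))
    return list(best), matched / len(signs_r)


signs_rmhng = [0, 0, 1, 1, 2, 2, 2]
signs_gibbs = [2, 2, 0, 0, 1, 1, 0]
mapping, agreement = align_signs(signs_rmhng, signs_gibbs, 3)
print(mapping, round(agreement, 3))
```

In this toy input the two results differ only by a relabelling plus one disagreeing object, so the aligned agreement is 6/7.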

ARI AND κ
ARI is a widely used measure for evaluating clustering performance by comparing the clustering results with the ground-truth labels. Unlike precision, which is calculated by directly comparing estimated labels to ground-truth labels and often used in the evaluation of classification systems trained using supervised learning, ARI considers label-switching effects in clustering. The formula for ARI is given by Equation (6), where RI represents the Rand Index. Further details can be found in Hubert and Arabie (1985).
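As an illustration, the standard Hubert and Arabie form, ARI = (RI − E[RI]) / (max(RI) − E[RI]), can be computed from pair counts as in the following sketch. It is assumed here that this standard form corresponds to the equation referenced above. The function name `adjusted_rand_index` is hypothetical, and at least two data points are assumed.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI (Hubert and Arabie, 1985); assumes len >= 2."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))  # contingency table
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level index
    max_index = (sum_rows + sum_cols) / 2        # upper bound of the index
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Identical partitions with swapped label names give ARI = 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because only co-membership of pairs is counted, swapping label names does not change the score, which is the label-switching invariance mentioned above.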
The kappa coefficient (κ) is defined by Equation (7). Here, C_o represents the degree of agreement of signs among agents, and C_e denotes the expected value of coincidental sign agreement. The interpretation of κ is as follows (Landis and Koch, 1977):
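A minimal sketch of the two-agent case follows, using the standard form of Cohen's κ, (C_o − C_e) / (1 − C_e), which is assumed here to match the definition referenced above. How the coefficient is aggregated over more than two agents is not stated in this text, so only the pairwise form is shown. The function names are hypothetical; the interpretation bands are those of Landis and Koch (1977).

```python
from collections import Counter


def cohen_kappa(signs_a, signs_b):
    """Cohen's kappa between the signs of two agents.

    Undefined (division by zero) when C_e = 1, i.e. when both agents
    use one and the same sign for every object.
    """
    n = len(signs_a)
    # C_o: observed fraction of objects on which the two agents agree.
    c_o = sum(a == b for a, b in zip(signs_a, signs_b)) / n
    # C_e: agreement expected by chance from each agent's sign frequencies.
    freq_a, freq_b = Counter(signs_a), Counter(signs_b)
    c_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    return (c_o - c_e) / (1 - c_e)


def interpret_kappa(kappa):
    """Landis and Koch (1977) agreement bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


kappa = cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0])
print(kappa, interpret_kappa(kappa))
```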

FEATURE EXTRACTION BY SIMSIAM
As a feature extractor, we utilized SimSiam (Chen and He, 2021), a self-supervised representation learning technique, which was pre-trained on the YCB object dataset. We followed the same network architecture and hyperparameters as outlined in the original paper (Chen and He, 2021), with a few minor adjustments. For data augmentation, we used the following parameters, given in PyTorch notation: RandomResizedCrop with a scale in the range of [0.1, 0.6], and RandomGrayscale with a probability of 0.2. We normalized tensor images using the Normalize function with a mean of (0.485, 0.456, 0.406) and a standard deviation of (0.229, 0.224, 0.225). We used ResNet-18 as the backbone (He et al., 2016) and set the dimension of the output feature vector to 512. A two-layer fully connected network was employed as the projector, with an intermediate layer of dimension 512. The predictor was also a two-layer fully connected network, with an intermediate layer of dimension 128. During training, we set the learning rate to 0.1 and used stochastic gradient descent as the optimizer. The batch size was set to 64, and we trained the network for 100 epochs.
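The augmentation and training settings listed above can be collected as in the following sketch. It covers only the transforms named in the text; any further augmentations inherited from the original SimSiam recipe are omitted. The crop size of 224 pixels and the names `build_augmentation` and `TRAINING` are assumptions made for illustration. torchvision is imported lazily so that the constants can be inspected without it installed.

```python
# Values quoted from the text.
CROP_SCALE = (0.1, 0.6)
GRAYSCALE_P = 0.2
NORM_MEAN = (0.485, 0.456, 0.406)
NORM_STD = (0.229, 0.224, 0.225)
TRAINING = {
    "backbone": "ResNet-18",
    "feature_dim": 512,
    "projector_hidden_dim": 512,
    "predictor_hidden_dim": 128,
    "optimizer": "SGD",
    "lr": 0.1,
    "batch_size": 64,
    "epochs": 100,
}


def build_augmentation(image_size=224):
    """Build the augmentation pipeline.

    image_size is an assumption; the text does not state the crop size.
    """
    from torchvision import transforms  # lazy import, needs torchvision

    return transforms.Compose([
        transforms.RandomResizedCrop(image_size, scale=CROP_SCALE),
        transforms.RandomGrayscale(p=GRAYSCALE_P),
        transforms.ToTensor(),
        transforms.Normalize(mean=NORM_MEAN, std=NORM_STD),
    ])
```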

Figure 2 presents three probabilistic graphical models (PGMs) representing the interactions between multiple agents sharing a latent variable w_d. (A) The left panel shows a PGM that integrates two PGMs representing two agents with a shared latent variable w_d. This model is referred to as the two-agent Inter-PGM. (B) The center panel generalizes the PGM in (A) to integrate PGMs representing N agents. This model can be considered a multimodal PGM in which a shared latent variable integrates multimodal observations. We refer to this model as the multi-agent Inter-PGM. (C) The right panel provides a concise representation of (B) using plate representations, meaning (B) and (C) represent the same probabilistic generative process. When agent n observes the d-th object, it receives observations o^n_d and infers its internal representation x^n_d. A latent variable representing a word, w_d, is shared among the agents. The inference of θ^n and x^n_d corresponds to a general representation learning problem. As studied in Taniguchi et al. (2023), introducing the Symbol Emergence in Robotics Toolkit (SERKET) framework (Nakamura et al., 2018; Taniguchi et al., 2020) allows us to decompose the main part of the naming game (exchanging signs w^n_d between agents) and the representation learning part (inferring x^n_d and θ^n). For simplicity

Figure 2. Probabilistic graphical models considered for MHNG and RMHNG. (A) PGM for MHNG, i.e., a two-agent scenario, called the two-agent Inter-PGM. (B) Generalization of the PGM in (A) to a multi-agent scenario (N ≥ 2), called the multi-agent Inter-PGM. The n-th agent has variables for observations o^n_d and internal representations x^n_d for the d-th object (1 ≤ d ≤ D). The n-th agent also has global parameters ϕ^n and θ^n and hyperparameters. Variable w_d is a shared latent variable, and concrete samples drawn from the posterior distribution over w_d are regarded as an utterance, i.e., a sign. (C) Concise representation of (B) using plate representations (i.e., (B) and (C) represent the same probabilistic generative process).
The function MH-receiving returns the sign for the d-th object that agent Li holds after receiving a new name for the d-th object from another agent.

Algorithm 1 MH Receiving
  function MH-RECEIVING(w⋆, A_Li, d)
    r = min(1, P(A_Li.x_d | A_Li.θ, w⋆) / P(A_Li.x_d | A_Li.θ, A_Li.s_d))

the MH-communication algorithm. The function MH-communication describes the elementary communication in both the MHNG and the RMHNG. A sign s for the d-th object is sampled (i.e., uttered) by agent Sp and received by agent Li, where Li, Sp ∈ N.

Algorithm 2 MH Communication
  function MH-COMMUNICATION(A_Sp, A_Li, d)
    w⋆ ∼ P(A_Sp.w_d | A_Sp.x_d, A_Sp.θ)
    return MH-receiving(w⋆, A_Li, d)
  end function

2.3.3 Recursive MH communication
Algorithm 3 presents the recursive MH communication algorithm. This algorithm represents the recursive MH communication process, as shown in Figure 4. The recursive MH communication is one of the MH sampling procedures for p(w_d | o^{1:N}_d).
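The two elementary functions above can be sketched as follows. The accept/reject step of MH-receiving does not appear in this text, so the sketch assumes the standard Metropolis-Hastings rule: the proposed sign is accepted with probability r, otherwise the current sign is kept. The `Agent` class, the likelihood table `lik`, and the uniform prior over signs used by the speaker are illustrative assumptions, not the authors' implementation.

```python
import random


class Agent:
    """Toy agent with a fixed likelihood table (illustrative only)."""

    def __init__(self, lik, signs):
        self.lik = lik        # lik[d][w]: stand-in for P(x_d | theta, w)
        self.s = list(signs)  # sign currently held for each object d

    def sample_sign(self, d, rng):
        # Posterior over signs for object d, assuming a uniform prior,
        # so that P(w | x_d, theta) is proportional to lik[d][w].
        return rng.choices(range(len(self.lik[d])), weights=self.lik[d])[0]


def mh_receiving(w_star, listener, d, rng):
    """Listener accepts or rejects the proposed sign w_star for object d."""
    p_new = listener.lik[d][w_star]
    p_cur = listener.lik[d][listener.s[d]]
    r = 1.0 if p_cur == 0 else min(1.0, p_new / p_cur)
    if rng.random() < r:  # assumed standard MH acceptance step
        listener.s[d] = w_star
    return listener.s[d]


def mh_communication(speaker, listener, d, rng):
    """Speaker utters a sampled sign; listener judges it."""
    w_star = speaker.sample_sign(d, rng)
    return mh_receiving(w_star, listener, d, rng)


rng = random.Random(0)
speaker = Agent(lik=[[0.0, 1.0]], signs=[1])   # always utters sign 1
listener = Agent(lik=[[0.2, 0.8]], signs=[0])  # finds sign 1 more likely
result = mh_communication(speaker, listener, 0, rng)
print(result)
```

In this toy setting the acceptance ratio is min(1, 0.8 / 0.2) = 1, so the listener always adopts the uttered sign. Note that the listener evaluates the proposal using only its own likelihood, which is what keeps the procedure decentralized.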

Figure 4. Figure 5.
Figure 4. The upper figure is a schematic explanation of RMH communication and RMHNG. The recursive MH communication is one of the MH sampling procedures for p(w_d | o^{1:N}_d). Given n + 1 (n < N) agents, each with parameter w_d, this algorithm is used to compute w_d for interactions among n agents.
proposed the Inter-Dirichlet mixture (Inter-DM), which combines two Dirichlet mixtures (DMs), p(x^n_d | w_d) and p(o^n_d | x^n_d), represented as categorical distributions in Figure 2 (A). Taniguchi et al. (2023) proposed Inter-GMM+VAE, which combines two GMM+VAEs, i.e., p(x^n_d | w_d) and p(o^n_d | x^n_d

Figure 6. ARI (left) and κ (right) for each iteration when using artificial data.

Figure 8. Distribution of signs obtained by various methods and their degree of agreement with the distribution of signs obtained by Gibbs sampling.

Figure 9. (a): Types of YCB objects used in the experimental analysis. (b): Partition diagram of the YCB object dataset. We divided the images of each object into four sets and assigned each set to one of the four agents. Each set consisted of 30 images. Specifically, images ranging from 0° to 87° were assigned to agent 1, those from 90° to 177° were assigned to agent 2, those from 180° to 267° were assigned to agent 3, and those from 270° to 357° were assigned to agent 4.
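The partition by viewing angle can be checked with a short sketch. The 3° step between consecutive images is an assumption inferred from the caption (30 images per 90° range), and the function name `agent_for_angle` is hypothetical.

```python
STEP_DEG = 3  # assumed angular step: 30 images per 90-degree range


def agent_for_angle(angle_deg, num_agents=4):
    """Return the 1-based index of the agent that receives this view."""
    span = 360 // num_agents  # 90 degrees per agent for four agents
    return angle_deg // span + 1


partition = {}
for angle in range(0, 360, STEP_DEG):
    partition.setdefault(agent_for_angle(angle), []).append(angle)

for agent, angles in sorted(partition.items()):
    print(agent, angles[0], angles[-1], len(angles))
```

Under this assumption each agent receives exactly 30 views, and the fourth agent's range runs from 270° to 357°.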

Figure 10. Features of the entire dataset and features of individual agents' observations, visualized by 2D-PCA. (a): Features of all data visualized by 2D-PCA. (b): PCA visualization of Agent 2's observations. (c): PCA visualization of Agent 3's observations. (d): PCA visualization of Agent 4's observations.

Table 2. Experimental results for synthetic data: Each method was tested five times, and for each agent, the ARI and κ were calculated when I was between 91 and 100. The mean ± standard deviation of the obtained 50 (5 × 10) ARI and κ values are shown. The highest scores are shown in bold, and the second-highest scores are underlined.

Table 3. Experimental results for YCB object dataset: Each method was tested five times, and for each agent, the ARI and κ were calculated when I was between 91 and 100. The mean ± standard deviation of the obtained 50 (5 × 10) ARI and κ values are shown. The highest scores are shown in bold, and the second-highest scores are underlined.
ARI for Agent 1, lower ARI for Agent 2, and higher ARI for other agents. However, the κ was the lowest among all methods for No communication. All acceptance had the lowest ARI among all methods and the highest κ among all methods.