Metropolis-Hastings algorithm in joint-attention naming game: experimental semiotics study

We explore the emergence of symbols during interactions between individuals through an experimental semiotic study. Previous studies have investigated how humans organize symbol systems through communication using artificially designed subjective experiments. In this study, we focused on a joint-attention-naming game (JA-NG) in which participants independently categorized objects and assigned names while assuming their joint attention. In the Metropolis-Hastings naming game (MHNG) theory, listeners accept provided names according to the acceptance probability computed using the Metropolis-Hastings (MH) algorithm. The MHNG theory suggests that symbols emerge as an approximate decentralized Bayesian inference of signs, which is represented as a shared prior variable if the conditions of the MHNG are satisfied. This study examines whether human participants exhibit behavior consistent with the MHNG theory when playing the JA-NG. By comparing human acceptance decisions of a partner's naming with acceptance probabilities computed in the MHNG, we tested whether human behavior is consistent with the MHNG theory. The main contributions of this study are twofold. First, we reject the null hypothesis that humans make acceptance judgments with a constant probability, regardless of the acceptance probability calculated by the MH algorithm. The results of this study show that the model with acceptance probability computed by the MH algorithm predicts human behavior significantly better than the model with a constant probability of acceptance. Second, the MH-based model predicted human acceptance/rejection behavior more accurately than four other models (i.e., Constant, Numerator, Subtraction, Binary). Among the models compared, the model using the MH algorithm, which is the only model with the mathematical support of decentralized Bayesian inference, predicted human behavior most accurately, suggesting that symbol emergence in the JA-NG can be explained by the MHNG.


Introduction
Humans have the ability to create and communicate through symbol systems that involve assigning meanings to signs. This semiotic process does not rely on predetermined definitions of the symbols' meanings but rather emerges gradually through semiotic communication and perceptual experiences. This phenomenon is known as symbol emergence [1,2]. Understanding the cognitive capabilities and the social and cognitive dynamics that support symbol emergence is crucial to comprehending the dynamic properties of language.
Numerous experimental semiotic studies have been conducted to investigate how humans organize symbol systems through communication [3][4][5]. These studies demonstrated that humans can build communication systems from scratch [3][4][5][6][7]. Additionally, computational model-based studies in experimental semiotics, such as those by Kirby et al. and Cornish et al., validate the effectiveness of iterated learning models. Iterated learning is a process in which an individual acquires a behavior by observing a similar behavior in another individual who acquired it in the same way [8]. However, iterated learning is not an explanatory principle that answers the question of whether the emergence of a symbol system improves the environmental adaptation of a group of agents. Nor does iterated learning have a theoretical connection to explanatory theories about human perceptual systems. In contrast, symbol emergence based on the Metropolis-Hastings naming game (MHNG), which is the focus of this study, is closely related to predictive coding and the free-energy principle [11][12][13], which are often referred to as general principles of cognition. In this context, Taniguchi et al. hypothesized that symbol emergence could be viewed as collective predictive coding by a group of agents [14].
Many studies have been conducted on computational models that represent symbol emergence systems. Pioneering studies have been conducted using naming games, in which remote robots share symbols to represent objects, and variants of referential games [15][16][17][18]. More recently, deep learning-based referential games have been extensively used to study emergent communication [19][20][21][22]. Referential and naming games, often referred to as variants of the Lewis-style signaling game, have also been used to achieve compositionality in languages [23][24][25][26]. Generally, in these games, a speaker sends a message to a listener who indicates the object intended by the speaker. After the communication, reward feedback is provided to the agents, and they update their parameters. The reward feedback precedes joint attention in this approach.
However, in the developmental process of human infants, joint attention, which is acquired at around nine months of age, is well known to precede tremendous progress in lexical acquisition and language development. Another notable idea is the naming game based on joint attention and the associated theoretical basis, called MHNG, in which each agent independently forms categories and shares signs associated with those categories through communication in the joint attention naming game (JA-NG) [27]. This theory suggests that symbol emergence can be viewed as the approximate decentralized Bayesian inference of a posterior distribution over a shared latent variable conditioned on the observations of all agents participating in the communication. However, previous studies on experimental semiotics [8][9][10] have not employed computational models that incorporate decentralized Bayesian inference over the entire system, including multiple agents.
In this study, our objective is to investigate whether the MHNG, which models symbol emergence as a decentralized Bayesian inference [14,27], can serve as a valid explanatory principle of symbol emergence between human individuals. The MHNG involves computational agents playing a JA-NG, where agents independently form categories of objects and name them while assuming joint attention. Unlike the widely used Lewis signaling games [28], JA-NG does not involve any explicit reward feedback from the opponent after the naming process. In the MHNG, each agent decides whether to accept another agent's naming based on a probabilistic criterion calculated using the Metropolis-Hastings (MH) algorithm [29]. Consequently, symbol emergence occurs through a decentralized Bayesian inference.
Suppose people in JA-NG follow an acceptance probability similar to that used in the MHNG. In this case, it can be inferred that they perform decentralized Bayesian inference as a whole system that includes the multiple individuals involved in the emergence of symbols. The MHNG is a computational model in which agents play joint-attention naming games, and it uses the acceptance probability based on the MH algorithm to determine whether a listener agent accepts an incoming name proposed by another agent. Testing the hypothesis that humans use MH-based criteria to determine the acceptance of new names in JA-NG is crucial for demonstrating the validity of the MHNG as an explanatory principle. If humans exhibit behavior similar to that of the MHNG, their acceptance rate of incoming names should be correlated with the probability calculated using the MH algorithm. In that case, it can be concluded that humans make acceptance or rejection judgments in communication following the principles of the MHNG to some extent. However, whether humans employ the same acceptance/rejection assessments in similar settings remains unclear.
In this study, we aim to verify whether humans engage in decentralized Bayesian inference by conducting subject experiments similar to JA-NG. To achieve this, we conducted a communication experiment with human participants. The communication structure in the experiment resembled that of the JA-NG in the simulation experiment conducted by Hagiwara et al. [27]. We observed the acceptance or rejection assessments of participants and tested whether they utilized the acceptance probability calculated by the MHNG theory to a certain extent. Additionally, we evaluated whether the computational model using the MH algorithm predicted human behavior more accurately than four other comparative models, i.e., Constant, Numerator, Subtraction, and Binary.
The main contributions of this study are as follows:
• We verify whether human participants playing JA-NG utilize the acceptance probability computed in the model based on the MH algorithm to a certain extent.
• We demonstrate that the model based on the MH algorithm outperforms the other four comparative computational models in predicting participants' acceptance behavior in JA-NG.
Statistical tests were conducted to examine our hypotheses. The results showed that the acceptance behavior of the human participants in JA-NG can be modeled using the MH algorithm.
The remainder of this paper is organized as follows: The next section provides an overview of the computational theory underlying this study. We then describe the setup of the communication experiment as well as the analysis and statistical test procedures in the Materials and Methods section. The Results and Discussion section presents our findings and corresponding interpretations. The final section concludes the paper.

Preliminaries
In this section, we describe the JA-NG performed in the subject experiments and the interpersonal Gaussian mixture (Inter-GM), which is the assumed probabilistic model for analyzing the results of the subject experiments. Additionally, we describe the general interpersonal probabilistic generative model (Inter-PGM), whose concrete instance is the Inter-GM, and the MHNG, in which agents play JA-NG using a specific acceptance probability based on the MH algorithm.
Fig. 2 illustrates the correspondence between the computational model (i.e., Inter-GM) and the communication experiment.

Table 1. Variables of Inter-PGM and their explanations. Superscript $* \in \{A, B\}$ refers to a specific agent.

1. Perception: Both the speaker and the listener observe an object and update their perceptual state, e.g., a categorization result, corresponding to the object based on their respective observations, assuming joint attention where two agents are looking at the same object.
2. Communication: The speaker gives the name to the object based on its perceptual state, e.g., the categorization result, and its own knowledge. The listener decides whether to accept the name.
3. Learning: After communication, the categorization results and knowledge are updated based on the results of the communication.
4. Turn-taking: The speaker and listener alternate their roles and repeat the above steps for all objects.
The JA-NG is a procedural description of the interaction between two agents and their learning process through the sharing of semiotic knowledge between them based on joint attention.

Inter-PGM and MH naming game (MHNG)
We first define the variables related to JA-NG and assume conditional dependencies between the variables by defining the Inter-PGM (Fig. 1). Table 1 explains the variables in the Inter-PGM. The Inter-PGM is a general form of the PGMs that model symbol emergence using JA-NG. The random variables related to JA-NG can be described using a probabilistic graphical model, as shown in Fig. 1.
The generative process of the Inter-PGM is as follows:
$$s_n \sim P(s_n), \qquad c^{*}_{n} \sim P(c^{*}_{n} \mid s_n, \Theta^{*}), \qquad x^{*}_{n} \sim P(x^{*}_{n} \mid c^{*}_{n}, \Phi^{*}),$$
where $x^{*}_{n}$ represents the observed information, $c^{*}_{n}$ represents the category to which $x^{*}_{n}$ is classified, i.e., the perceptual state, $s_n$ represents the sign of $x^{*}_{n}$, and $* \in \{A, B\}$. The PGM can be decomposed into two parts corresponding to the two agents using the SERKET framework [30] in the inference process. Hagiwara et al. found that a certain type of language game can be regarded as a decentralized inference process for an Inter-PGM [27], and Taniguchi et al. formulated this idea as the MHNG [14].
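The MH naming game is a special case of the JA-NG [14]. JA-NG becomes the MHNG when the following conditions are satisfied:
1. The speaker (Sp) selects the name $s^{\star}_{n}$ by sampling from the posterior distribution $P(s_n \mid \Theta^{Sp}, c^{Sp}_{n})$.
2. The listener (Li) determines acceptance of the sign $s^{\star}_{n}$ using the probability $r^{MH}_{n} = \min\left(1, \frac{P(c^{Li}_{n} \mid \Theta^{Li}, s^{\star}_{n})}{P(c^{Li}_{n} \mid \Theta^{Li}, s^{Li}_{n})}\right)$, where $s^{Li}_{n}$ is the listener's own sign for the object.
3. Each agent updates its internal variables $c^{*}_{n}$, $\Theta^{*}$, and $\Phi^{*}$ using Bayesian inference appropriately.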
It is theoretically guaranteed that the MHNG is an approximate decentralized Bayesian inference of shared representations, i.e., $P(\{s_n\}_{n=1,\ldots,N} \mid \{x^{A}_{n}, x^{B}_{n}\}_{n=1,\ldots,N})$, and each agent's internal representations and knowledge. For more details, please refer to the original paper [14].
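The listener's acceptance rule in condition 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: it assumes that the listener's knowledge is a table `theta[sign][category]` holding $P(c \mid \Theta^{Li}, s)$, and the function names, the example values, and the handling of a zero denominator are all hypothetical choices made for the example.

```python
import random


def mh_acceptance(theta_li, c_li, s_proposed, s_current):
    """r = min(1, P(c | theta, s_proposed) / P(c | theta, s_current))."""
    numerator = theta_li[s_proposed][c_li]
    denominator = theta_li[s_current][c_li]
    if denominator == 0.0:
        # Assumption for this sketch: if the listener's own sign cannot
        # explain the category at all, any proposal is accepted.
        return 1.0
    return min(1.0, numerator / denominator)


def listener_decides(theta_li, c_li, s_proposed, s_current, rng=random):
    """Accept the proposed sign with probability r; return (accepted, r)."""
    r = mh_acceptance(theta_li, c_li, s_proposed, s_current)
    return rng.random() < r, r


# Hypothetical listener knowledge: theta[sign][category] = P(category | sign).
theta = {
    "A": [0.8, 0.1, 0.1],
    "B": [0.1, 0.8, 0.1],
}

# The proposal explains category 0 better than the listener's own sign.
r_up = mh_acceptance(theta, c_li=0, s_proposed="A", s_current="B")
# The proposal explains category 0 worse than the listener's own sign.
r_down = mh_acceptance(theta, c_li=0, s_proposed="B", s_current="A")
```

In this toy setting, a proposal that explains the listener's category better than the listener's own sign is always accepted, whereas a worse proposal is accepted only with the probability given by the likelihood ratio.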

Interpersonal Gaussian Mixture (Inter-GM)
In this study, we used the Inter-GM, which was tailored to fit the observations, that is, the color information used in our experiment. Hagiwara et al. proposed the inter-DM and inter-MDM models, in which agents observe bag-of-features representations, i.e., histograms [27,31]. In these models, agents form individual categories using a Dirichlet mixture and share signs linked to the formed categories through communication. The Inter-GM is a modified version of the inter-DM in which the part that forms categories using a Dirichlet mixture is replaced by a Gaussian mixture for categorizing multidimensional continuous real-valued vectors.

In the MHNG, after observing (or sampling) $s^{*}_{n}$, the probabilistic variables for each agent become independent, and the parameters for each agent can be inferred using ordinary approximate Bayesian inference schemes. We applied Gibbs sampling, a widely used Markov chain Monte Carlo approximate Bayesian inference procedure [32], to sample the parameters $\mu^{*}_{k}$, $\Lambda^{*}_{k}$, $c^{*}_{n}$, and $\Theta^{*}$. In the MHNG, the sign $s_n$ is inferred by agents A and B, who alternately sample the sign $s_n$ and accept or reject the other agent's sign based on the acceptance probability of the MH algorithm,
$$r^{MH}_{n} = \min\left(1, \frac{P(c^{Li}_{n} \mid \Theta^{Li}, s^{\star}_{n})}{P(c^{Li}_{n} \mid \Theta^{Li}, s^{Li}_{n})}\right),$$
where $\Theta^{Li} = \{\theta^{Li}_{l}\}_{l=1,\ldots,L}$ is inferred using $c^{Li}_{n}$ and $s^{\star}_{n}$. The acceptance probability estimated from the categorization results (see Fig. 3) and the actual acceptance/rejection decisions were recorded to investigate whether humans accept their opponents' proposals based on the MH acceptance probability. The parameters $\Theta^{*}$ and $\Phi^{*}$ are inferred through Gibbs sampling using the categorization $\{c^{*}_{n}\}_{n=1,\ldots,N}$ provided by the participants, along with their names $s^{*}_{n}$.
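To make the link between a participant's recorded categorizations and the inferred acceptance probability more concrete, the following sketch estimates $\Theta^{Li}$ from category and sign counts and then evaluates the acceptance ratio for one trial. Note that it deliberately swaps in a simpler technique: it uses the posterior mean under a symmetric Dirichlet prior instead of the Gibbs sampling described above, and the prior value `alpha`, the integer encoding of signs and categories, and the example data are assumptions made only for illustration.

```python
from collections import Counter


def estimate_theta(categories, signs, n_signs, n_categories, alpha=0.1):
    """Posterior-mean estimate of P(category | sign).

    Uses a symmetric Dirichlet(alpha) prior; this is a stand-in for
    Gibbs sampling, chosen to keep the sketch short.
    """
    counts = Counter(zip(signs, categories))
    theta = []
    for s in range(n_signs):
        row_total = sum(counts[(s, c)] for c in range(n_categories))
        row_total += alpha * n_categories
        theta.append(
            [(counts[(s, c)] + alpha) / row_total for c in range(n_categories)]
        )
    return theta


def acceptance_probability(theta, c_li, s_proposed, s_current):
    """MH acceptance ratio for one trial, clipped at 1."""
    return min(1.0, theta[s_proposed][c_li] / theta[s_current][c_li])


# Hypothetical record of one listener: category and sign per object.
categories = [0, 0, 1, 1, 2, 2]
signs = [0, 0, 1, 1, 2, 1]

theta_li = estimate_theta(categories, signs, n_signs=3, n_categories=3)
# The listener holds sign 2 for an object in category 2; the speaker proposes sign 1.
r = acceptance_probability(theta_li, c_li=2, s_proposed=1, s_current=2)
```

Because `alpha` is positive, every entry of the estimated table is nonzero, so the ratio is always defined in this sketch.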

Communication experiment
To investigate whether a listener's acceptance of the speaker's proposals aligns with the acceptance probability $r^{MH}_{n}$ calculated by the MH algorithm, we conducted a communication experiment with human participants. In place of the computational experiment described in [27], this experiment followed a methodology similar to that of experimental semiotics.
The experiment was conducted in pairs, with each pair comprising two participants, referred to as participants A and B. Each pair followed the procedure outlined in Fig. 3 and used separate personal computers (PCs).Participants were in different rooms and were not permitted to communicate directly using any alternative communication media.
Fig. 4 shows the user interface of the experimental application. In Fig. 4, (1) shows the category classification screen that the participants first encountered, (2) shows the naming screen for the speaker, and (3) shows the screen for the listener. The procedure is detailed below.
Before starting the communication, each participant was instructed to classify the 15 images into categories labeled A-E (initialization).
1. Perception: An image used in the initialization step is displayed to the speaker. In the experiment, the participants were asked to exhibit their perceptual state as a categorization result ((1) Categorization in Fig. 3).
2. Communication: The speaker names the image by selecting any name from A to E. Participant B, the listener, decides whether to accept or reject the proposed name by pressing a button.
3. Learning (update categories and sign allocation): Participant B, as the listener, can modify his/her classification result after the acceptance/rejection decision.
4. Turn-taking: Steps 1 to 3 correspond to (2) Naming game shown in Fig. 3, and this game is repeated with the participants switching their roles.
During the experiment, the participants repeated steps 2 to 4 fifteen times for each data sample and then repeated the process three times. Therefore, each participant made 45 acceptance or rejection decisions per dataset.
The communication process involves proposing and accepting/rejecting names in steps 1 and 2. Each communication was completed when step 2 ended, and the results were recorded each time. Participants could modify their classification results whenever desired; however, a prompt appeared if they attempted to alter the result after accepting/rejecting their partner's proposal when playing the listener's role. The two participants were housed in separate rooms, and the classification and communication were performed on PCs using a Python application that communicated with the other PC. The PCs used were 13-inch MacBooks. The brightness of the PCs was automatically adjusted to account for the possibility of different ambient lighting in each room. The images were presented in random order because the same images were used even after switching roles in step 3.

Computational model for analysis
We used the Inter-GM described in the Preliminaries section to analyze the behavioral data and predict the acceptance rate of the participants. The hyperparameters used for the Inter-GM were α = (0.1, 0.

Materials
For the experiment, 20 participants were recruited, forming 10 pairs. The female-to-male ratio was 6:14, and the minimum and maximum ages were 21 and 59, respectively. As the experiment used colors, the participants were verbally asked whether they were colorblind to ensure that colorblind participants were not included in the experiment. This study was approved by the Research Ethics Committee of Ritsumeikan University under approval number BKC-LSMH-2022-012. All the participants provided informed consent prior to participation.
To generate color images as stimuli, the CIE-L*U*V* color space, which accurately represents the psychological distance perceived by humans, was used [33]. In the CIE-L*U*V* color space, $L^{*}$ represents brightness, and $U^{*}$ and $V^{*}$ represent hue. The details of the color images are as follows: (1) Pillow (PIL), a Python image processing library, was used to create images of colored circles. (2) $L^{*}$, $U^{*}$, and $V^{*}$ were sampled from three three-dimensional Gaussian distributions. (3) Two datasets, hard and easy, were prepared to observe the differences in communication according to difficulty levels: Dataset 1 was difficult to classify, and Dataset 2 was easy to classify. (4) The same images were shown to both participants, and each dataset contained 15 images. (5) The Gaussian distribution to sample from was determined using a uniform distribution.
Table 2 lists the parameters for each Gaussian distribution. Each data point in the three-dimensional CIE-L*U*V* color space was generated from a three-dimensional Gaussian distribution.
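The sampling scheme for the stimuli can be illustrated with the short sketch below. It is not the generation script used in the study: the mean vectors and standard deviations are placeholder values (the values in Table 2 are not reproduced here), the shared covariance is assumed to be diagonal for simplicity, and the conversion to RGB and the drawing of circles with Pillow are omitted.

```python
import random

# Placeholder parameters for three Gaussian components in (L*, U*, V*).
# These are hypothetical values, not those listed in Table 2.
MEANS = [
    (50.0, 60.0, 20.0),
    (50.0, -40.0, 40.0),
    (50.0, 0.0, -60.0),
]
# Assumption: a diagonal covariance shared by all components,
# given here as per-dimension standard deviations.
STD = (5.0, 15.0, 15.0)


def sample_stimuli(n_images=15, seed=0):
    """Sample (component index, (L*, U*, V*)) pairs for one dataset."""
    rng = random.Random(seed)
    stimuli = []
    for _ in range(n_images):
        # The component to sample from is chosen uniformly at random.
        k = rng.randrange(len(MEANS))
        luv = tuple(rng.gauss(m, s) for m, s in zip(MEANS[k], STD))
        stimuli.append((k, luv))
    return stimuli


dataset = sample_stimuli()
```

Under these assumptions, a harder dataset would correspond to component means that lie closer together relative to the spread, so that the three color categories overlap more.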

Hypothesis testing 1
Table 2. Parameters of the three Gaussian distributions generating the color patches used in the experiment. $\mu_k$ is the mean vector of the $k$-th three-dimensional Gaussian distribution. $\Sigma = \Lambda^{-1}$ is the covariance matrix that is shared among the three Gaussian distributions.

We investigated whether people's decisions are affected by the acceptance probability based on the MH algorithm, even if the decisions do not completely comply with the theory. To investigate whether humans use the MH-based acceptance probability to a certain extent, i.e., whether the actual acceptance probability correlates with the MH-based acceptance probability, we define a biased Bernoulli distribution, $\mathrm{Bern}(z_n \mid r_n = a r^{MH}_{n} + b)$. Specifically, the variable $z_n$ represents whether the participant accepted the given name, taking the value of 1 if accepted and 0 if rejected. The acceptance probability of a participant is denoted by $r_n$.
We tested the estimated parameters $a$ and $b$, which model the relationship between the actual acceptance probability and the MH-based acceptance probability $r^{MH}_{n}$. For acceptance and rejection, we assumed 1 and 0, respectively. Instead of calculating the correlation between the acceptance/rejection decision and $r_n$, we used a conditional Bernoulli distribution.
Parameters $a$ and $b$ were determined by maximum likelihood estimation, which was performed on the log-likelihood using gradient descent. The original likelihood function is defined as
$$L = \prod_{n=1}^{N} \mathrm{Bern}(z_n \mid r_n = a r^{MH}_{n} + b).$$
To keep the parameter of the Bernoulli distribution within its domain, $a$ and $b$ were constrained to $0 \le b$ and $a + b \le 1$.
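The estimation of $a$ and $b$ can be sketched as follows. This is one possible realization under stated assumptions, not the authors' code: it runs gradient ascent on the log-likelihood (equivalent to gradient descent on its negative), enforces the constraints by simple clamping after each step, and additionally keeps $a + b \ge 0$ and clips the Bernoulli parameter away from 0 and 1 for numerical safety. The learning rate, iteration count, starting point, and synthetic data are hypothetical.

```python
import math
import random


def log_likelihood(a, b, r_mh, z, eps=1e-9):
    """Sum of log Bern(z_n | a * r_n + b), with the parameter clipped."""
    total = 0.0
    for r, zn in zip(r_mh, z):
        p = min(max(a * r + b, eps), 1.0 - eps)
        total += math.log(p) if zn == 1 else math.log(1.0 - p)
    return total


def fit_bounded_bernoulli(r_mh, z, lr=0.01, n_iter=2000, eps=1e-9):
    """Estimate (a, b) by gradient ascent with clamping to the constraints."""
    a, b = 0.0, 0.5  # start from a constant acceptance probability
    n = len(z)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for r, zn in zip(r_mh, z):
            p = min(max(a * r + b, eps), 1.0 - eps)
            g = zn / p - (1 - zn) / (1.0 - p)  # d log Bern / d p
            grad_a += g * r
            grad_b += g
        a += lr * grad_a / n
        b += lr * grad_b / n
        # Clamp so that 0 <= b <= 1 and 0 <= a + b <= 1.
        b = min(max(b, 0.0), 1.0)
        a = min(max(a, -b), 1.0 - b)
    return a, b


# Synthetic decisions generated with a = 0.5 and b = 0.4 (illustration only).
rng = random.Random(0)
r_mh = [rng.random() for _ in range(500)]
z = [1 if rng.random() < 0.5 * r + 0.4 else 0 for r in r_mh]
a_hat, b_hat = fit_bounded_bernoulli(r_mh, z)
```

Because the log-likelihood is concave in $(a, b)$, a small fixed step size is sufficient for this sketch; a line search or an off-the-shelf constrained optimizer would be a reasonable alternative.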
A hypothesis test was performed to test the statistical significance of the association between $r^{MH}_{n}$ and the acceptance decisions made by actual human participants.
The null hypothesis $H_0$ and alternative hypothesis $H_1$ are as follows:
• $H_0$: There is no association between the acceptance decision and $r^{MH}_{n}$, the MH-based acceptance probability. In other words, the human acceptance probability remains constant with respect to $r^{MH}_{n}$ ($a = 0$).
• $H_1$: The acceptance probability is not constant, indicating that humans utilize the MH-based acceptance probability $r^{MH}_{n}$ to some extent ($a \neq 0$).
The test statistic is the coefficient $a$ of the (bounded) linear function that parameterizes the Bernoulli distribution, i.e., the function whose output is the acceptance probability. Its observed value $\hat{a}$ was obtained as the coefficient of the regression fitted to the observed data.
June 1, 2023 11/19
To estimate the sampling distribution of the test statistic, we used a randomization approach in which we randomly generated Bernoulli random variables with a fixed parameter and then fitted a linear model to obtain the coefficient $a$ (i.e., the test statistic) under the null hypothesis. The acceptance and rejection decisions were randomly sampled from the distribution assuming $H_0$, i.e., $z_n \sim \mathrm{Bern}(z \mid b)$. By repeating this 1000 times, we obtained an estimate of the null distribution of the test statistic as a histogram, from which we computed the p-value empirically as the tail probability. The parameter $b$ was determined from the behavior of all subjects using maximum likelihood estimation.
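By assuming that the acceptance event occurs with probability $r_n$, we can compute the likelihood by evaluating the Bernoulli distribution for each decision and taking the product over the total number of given names $N$; that is,
$$L = \prod_{n=1}^{N} \mathrm{Bern}(z_n \mid r_n = a r^{MH}_{n} + b).$$
In the empirical cumulative distribution functions $P'_a$ and $P'_b$ used for the test, $f(x, y)$ represents a function that returns 1 if $x$ exceeds or is equal to $y$, and 0 if $x$ is below $y$.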
Because $r^{MH}$ can be regarded as being used only if $a$ is significantly greater than 0, a one-sided test was performed for $a$. The bias parameter $b$ underwent a two-sided test. The significance level was set at p < 0.001. Specifically, the decision rules were as follows. If $P'_a(\hat{a}) \le 0.001$, then the p-value $P_a(\hat{a}) < 0.001$, that is, the null hypothesis $H_0$ is rejected. In addition, if $P'_b(\hat{b}) \ge 0.9995$ or $P'_b(\hat{b}) \le 0.0005$, then the p-value $P_b(\hat{b}) < 0.0005$, that is, the null hypothesis $H_0$ is rejected.

Hypothesis testing 2
In Test 2, we tested whether the model that used the MH algorithm, i.e., the acceptance decision using $\mathrm{Bern}(z_n \mid r^{MH}_{n})$, was closer to the participants' behavior than several heuristic comparative models. We performed the test using the acceptance or rejection decisions obtained in the communication experiment and the inferred acceptance probability $r^{MH}_{n}$. We created a set of data consisting of the distances between the participants' behaviors and the samples generated from the probabilities calculated by the five comparison models. These models were used to evaluate the acceptance and rejection. Subsequently, U-tests were conducted for each model. Table 3 lists the comparative models used in this study.

Table 3. Details of each model.
Model name: Acceptance probability formula
Constant: $b$ (the participant's actual acceptance rate)
MH: $r^{MH}_{n}$
Numerator: the numerator of $r^{MH}_{n}$
Subtraction: the numerator of $r^{MH}_{n}$ minus its denominator, transformed into the range 0.0-1.0
Binary: 0.1 ($r \le 0.5$), 0.9 ($r > 0.5$)

Constant accepts with a probability $b$ calculated from the actual acceptance rate of the subject in the experimental results, which corresponds to the null hypothesis of hypothesis testing 1. MH accepts with the MH-based acceptance probability $r^{MH}_{n}$ inferred from the experimental results. Numerator accepts with a probability equal to the numerator of $r^{MH}_{n}$, which represents the likelihood of the opponent's sign under the listener's own parameters. Subtraction uses the difference, instead of the ratio in $r^{MH}_{n}$, between the numerator, which represents the likelihood of the opponent's sign under the listener's parameters, and the denominator, which represents the likelihood of the listener's own sign; this difference is then transformed into the range 0.0-1.0. Binary accepts with a probability of 0.1 if the inferred acceptance probability $r^{MH}_{n}$ is less than or equal to 0.5 and 0.9 if it exceeds 0.5. To test whether the difference between two models $m$ and $m'$ in predicting acceptance and rejection decisions is statistically significant, hypothesis tests were performed with the following null and alternative hypotheses:
• $H_0$: $\mathrm{Prec}_m = \mathrm{Prec}_{m'}$. The models $m$ and $m'$ predict the participants' behavior at the same level.
• $H_1$: $\mathrm{Prec}_m > \mathrm{Prec}_{m'}$. The model $m$ predicts the participants' behavior more accurately than the model $m'$.
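The five acceptance rules compared here can be written compactly as below. This is a sketch under explicit assumptions: `num` and `den` stand for the numerator and denominator of $r^{MH}_{n}$ and are taken as given, `den` is assumed to be positive, and the rescaling used for Subtraction, $(\text{num} - \text{den} + 1)/2$, is only one plausible way of mapping the difference into the range 0.0-1.0, since the exact transformation is not specified in the text. The function name and example values are hypothetical.

```python
def model_probabilities(num, den, constant_rate):
    """Acceptance probability of each comparison model for one trial.

    num: likelihood of the speaker's sign under the listener's parameters.
    den: likelihood of the listener's own sign (assumed > 0).
    constant_rate: the participant's overall acceptance rate.
    """
    r_mh = min(1.0, num / den)
    return {
        "Constant": constant_rate,
        "MH": r_mh,
        "Numerator": num,
        # Assumed rescaling of the difference from [-1, 1] to [0, 1].
        "Subtraction": (num - den + 1.0) / 2.0,
        "Binary": 0.9 if r_mh > 0.5 else 0.1,
    }


# Hypothetical trial: the proposed sign is less likely than the listener's own.
probs = model_probabilities(num=0.2, den=0.8, constant_rate=0.7)
```

For this trial the MH rule gives 0.25, whereas the Binary rule collapses the same evidence to 0.1, which illustrates how the heuristics discard or distort information contained in the likelihood ratio.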
Here, $\mathrm{Prec}_m$ is the rate at which the model $m$ could predict the participants' acceptance or rejection decisions, i.e., precision. We sampled 100 data points for the pseudo-experimental results of each comparison model using computer simulations. The pseudo-experimental results for each comparison model were sampled from the Bernoulli distribution whose parameter is the acceptance probability $r^{m}_{(j,i)}$ of model $m$ for subject $j$ in the $i$-th communication trial, and were labeled 1 for acceptance and 0 for rejection:
$$z^{m}_{(j,i)} \sim \mathrm{Bern}(z \mid r^{m}_{(j,i)}).$$
The p-values were calculated using a U-test. The significance level was set at 0.001. Precision was calculated as follows. First, we store the $j$-th participant's acceptance/rejection evaluation at the $i$-th trial in the experiment in $z^{h}_{(j,i)}$, where $j = 1, \ldots, 20$. Second, we store the acceptance/rejection evaluation results of model $m$ in the $i$-th trial of the pseudo-experiment for subject $j$ in $z^{m}_{(j,i)}$, where $i = 1, \ldots, 45$ ($i = 1, \ldots, 90$ for both datasets). Third, we calculate the precision of model $m$ in predicting the $j$-th participant's behavior. The precision $\mathrm{Prec}_m$ is calculated by counting the number of matches between the participant's and model's decisions. One-sided tests were conducted for all model combinations.
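The precision computation for a single participant and a single model can be sketched as follows. The sketch assumes that each of the 100 sampled data points corresponds to one simulated pseudo-experiment over all trials of that participant, and that precision is the fraction of trials in which the sampled model decision matches the human decision; both readings, as well as the function names and the toy data, are assumptions made for illustration.

```python
import random


def precision(human, model_probs, rng):
    """Fraction of trials where a sampled model decision matches the human one."""
    sampled = [1 if rng.random() < r else 0 for r in model_probs]
    matches = sum(1 for zh, zm in zip(human, sampled) if zh == zm)
    return matches / len(human)


def pseudo_experiments(human, model_probs, n_runs=100, seed=0):
    """Repeat the pseudo-experiment n_runs times and collect the precisions."""
    rng = random.Random(seed)
    return [precision(human, model_probs, rng) for _ in range(n_runs)]


# Toy data: human decisions (1 = accept, 0 = reject) and two hypothetical models.
human = [1, 1, 0, 1, 0]
perfect_model = [1.0, 1.0, 0.0, 1.0, 0.0]   # always reproduces the human
coin_flip_model = [0.5, 0.5, 0.5, 0.5, 0.5]

prec_perfect = pseudo_experiments(human, perfect_model)
prec_coin = pseudo_experiments(human, coin_flip_model)
```

The two resulting lists of precisions could then be compared with a one-sided Mann-Whitney U-test, for example with `scipy.stats.mannwhitneyu(prec_perfect, prec_coin, alternative="greater")` if SciPy is available; that call is omitted here to keep the sketch dependency-free.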

Results and Discussion
Hypothesis testing 1
Fig. 7 illustrates an example of the actual acceptance/rejection behavior of a participant and the inferred acceptance probabilities $r^{MH}$. This suggests that there is a certain coherence between $r^{MH}_{n}$ and the participants' behavior. This association was evaluated quantitatively and statistically.
Fig. 8 shows a histogram of the number of accepted stimuli for each inferred acceptance rate (left) and the actual acceptance rate for each inferred acceptance rate, together with a graph of $y = ar + b$ using the estimated weight $a$ and bias $b$ (right), for all the participants, where $a = 0.5105$ and $b = 0.4842$. When the inferred acceptance rate was high, the actual acceptance rate by humans was also high. However, the actual probability of acceptance was higher than $r^{MH}$ when $r^{MH}$ was low. It was rare for the inferred acceptance rate $r^{MH}_{n}$ to take intermediate values between 0.2 and 0.8. Subsequently, we describe the results of the hypothesis tests. First, we examine the results of Test 1. The estimated parameters for Datasets 1 and 2 are shown in Table 4. The p-values $P_a(a)$ and $P_b(b)$ obtained for each subject and each dataset in Table 4 show that the null hypothesis is rejected at the 0.001 significance level in all cases, except for some results for participants 6, 8, and 12. $P_a(a)$ in Dataset 2 for participants 8 and 12 is 0.009 and 0.008, respectively; in these cases, the null hypothesis could not be rejected at the 0.001 significance level but can be rejected at the 0.01 significance level. For the data of both datasets and all subjects combined, the null hypothesis was rejected at a significance level of 0.001. Therefore, the null hypothesis is rejected, suggesting that humans use the inferred acceptance probabilities $r^{MH}$ to a certain extent.

Hypothesis testing 2
Subsequently, we examine the results of Test 2. Table 5 shows the p-values obtained from the U-tests conducted for each combination of models. The row for MH (i.e., m = 2) in Table 5 shows that the null hypothesis is rejected for all the models. The model using the MH algorithm was the closest to the participants' behaviors among the models compared in this study. We also individually performed tests on data from each participant. Table 6 presents the results. For each participant, MH outperformed the other models in predicting behavior in all cases, except for six participants for whom MH did not significantly outperform Constant and one participant for whom MH did not significantly outperform Subtraction. We also tested the data for each participant separately for each dataset. Tables 7 and 8 list the results. Looking at the MH (i.e., m = 2) row in Table 7, MH outperformed the other models in all cases, except seven for Constant and one for Numerator. Looking at the MH (i.e., m = 2) row in Table 8, MH outperformed the other models in all cases, except five for Constant. Based on these test results, we suggest that humans use the acceptance probability $r^{MH}$ derived from the MH algorithm during communication.

Fig 8. Relationship between the acceptance status of all participants and the inferred value of the acceptance probability. Left: the number of accepted names for each inferred acceptance probability for all participants. Right: the actual acceptance rate for each inferred acceptance probability for all participants, with the graph of $y = ar + b$ using the weight $a$ and bias $b$ estimated by linear regression.
The experimental results supported our hypothesis that human behavior in JA-NG follows the MH algorithm. Consequently, this result suggests that symbol emergence through JA-NG between people performs decentralized Bayesian inference, i.e., collective predictive coding.

Conclusion and Discussion
In this study, we conducted a communication experiment on symbol emergence, in which participants played a JA-NG in pairs. We compared the acceptance decisions of human participants with those of the computational models and confirmed that the acceptance probability of the model based on the MH algorithm was used to a certain extent by the participants. Additionally, the MH-based model outperformed the other four comparative computational models in terms of predicting the participants' behavior. These findings were obtained through two statistical tests. Consequently, the model using the MH algorithm was found to be suitable for explaining human acceptance behavior in JA-NG.
This suggests that the MHNG, which was studied computationally as a constructive approach to human symbol emergence, is a reasonable model for explaining symbol emergence in computational agents and human groups. This finding also supports the collective predictive coding hypothesis, which argues that symbol emergence in human society can be regarded as a decentralized Bayesian inference of a prior variable shared among people [14]. To advance our understanding of the human acceptance evaluation in JA-NG and the dynamics of symbol emergence among people, future studies should aim to gather more evidence by conducting experiments in diverse scenarios to test whether humans follow the MH algorithm.

Estimated parameters and p-values for all participants: $a = 0.5105$, $b = 0.4842$, $P_a(a) < 0.001$, $P_b(b) < 0.0005$.
Exploring symbol emergence in a human-agent mixed system is a future challenge worth pursuing. Because we obtained evidence supporting the prediction of human participants' behavior using the MH algorithm, we could approximate human behavior as a computational agent following the MH algorithm. Based on this approximation, we can theoretically model and analyze a mixed system involving a human participant and a computer agent.
Fig 2. Illustration of the relationship between the communication game in the experiment and the probabilistic graphical model of the Inter-GM. The color images observed by the participants are labeled as $x^A_n$ and $x^B_n$, with the corresponding color classification results represented by $c^A_n$ and $c^B_n$. The subjects' images are named by sampling a shared sign $s^\star_n$, with signs sampled for the $n$-th object by A and B, which are labeled as $s^A_n$ and $s^B_n$, respectively. The red balloon is A's sampled sign and the blue one is B's sampled sign. The transmission of the sign through naming is depicted by the dashed red and blue lines. $\Theta^* = \{\theta_l\}_{l=1,\dots,L}$ and $\Phi^* = \{(\mu^*_k, \Lambda^*_k)\}_{k=1,\dots,K}$

Fig 3. Flow of the subject experiment. In (1) categorization, participants categorize the given image. In (2) the naming game, the speaker names the image by selecting any name from A to E, and the listener decides whether to accept or reject the proposed name by pressing a button. Participants repeat the process, switching between the roles of speaker and listener.

Fig 4. Screenshots of the experimental application in operation during the actual experiment: (1) a view of the initial categorization phase, (2) a speaker's view in the naming phase, and (3) a listener's view when the listener receives a name for a color patch.

Fig 5. State of the actual experiment. The two participants sat in separate rooms and used different PCs, which exchanged data via socket communication.
indicating the degree to which acceptance occurs unconditionally, were used and these parameters were estimated. If $a = 1$ and $b = 0$, the distribution becomes the original MH-based acceptance probability distribution, $\mathrm{Bern}(z_n \mid r_n = r^{MH}_n)$.

$$\prod_{n=1}^{N} \mathrm{Bern}(z_n \mid r_n = a r^{MH}_n + b)$$

We performed sampling using $\mathrm{Bern}(z \mid b)$ to obtain lists of test statistics $a$ and $b$ and created their cumulative distribution functions to conduct a statistical test. The following steps describe the process of obtaining the list of test statistics $a$ and $b$:

1. From the experimental results, we calculated the acceptance rate $b = \frac{1}{N} \sum_{n=1}^{N} z_n$ for all participants or target participants across all trials.
2. We sampled the acceptance or rejection of each round from the Bernoulli distribution $\mathrm{Bern}(z \mid b)$ with the parameter $b$ determined in the previous step, that is, $z_n \sim \mathrm{Bern}(z \mid b)$ $(n \in \{1, \dots, N\})$.
3. Parameters $a$ and $b$ were estimated using the maximum likelihood estimator for each sampling result and were added to the list of statistical quantities.
4. This procedure was repeated 1000 times and the sample distributions of $a$ and $b$ were obtained.

We computed the cumulative distribution function $P'_a(a) = \frac{1}{L} \sum_{l=1}^{L} f(a_l, a)$ from a list of obtained statistical values $a$ represented as $a_1, a_2, \dots, a_L$, where $L = 1000$. Similarly, we compute the cumulative distribution function $P'_b(b) = \frac{1}{L} \sum_{l=1}^{L} f(b_l, b)$ from a list of statistical values $b$, represented by $b_1, b_2, \dots, b_L$. Here,

$$f(x, y) = \begin{cases} 1, & x \ge y \\ 0, & x < y \end{cases}$$
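The sampling procedure above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the maximum likelihood estimator is replaced by a coarse grid search over $a, b \in [0, 1]$, probabilities are clipped to keep the logarithm finite, and all function and variable names are hypothetical.

```python
import math
import random

EPS = 1e-9  # keeps log() finite when a probability reaches 0 or 1


def log_likelihood(a, b, r_mh, z):
    """Sum over n of log Bern(z_n | a * r_mh_n + b), with clipped probabilities."""
    total = 0.0
    for r, accepted in zip(r_mh, z):
        p = min(max(a * r + b, EPS), 1.0 - EPS)
        total += math.log(p) if accepted else math.log(1.0 - p)
    return total


def fit_a_b(r_mh, z, steps=20):
    """Grid-search stand-in for the maximum likelihood estimator of (a, b).

    Assumption: a and b are searched on an evenly spaced grid over [0, 1].
    """
    grid = [i / steps for i in range(steps + 1)]
    candidates = ((a, b) for a in grid for b in grid)
    return max(candidates, key=lambda ab: log_likelihood(ab[0], ab[1], r_mh, z))


def null_test(r_mh, z, repeats=1000, steps=20, seed=0):
    """Compare (a, b) fitted to observed decisions with constant-rate surrogates."""
    rng = random.Random(seed)
    n = len(z)
    rate = sum(z) / n  # step 1: overall acceptance rate
    a_obs, b_obs = fit_a_b(r_mh, z, steps)
    a_list, b_list = [], []
    for _ in range(repeats):  # step 4: repeat the sampling
        # Step 2: surrogate decisions that ignore r_mh entirely.
        z_sim = [1 if rng.random() < rate else 0 for _ in range(n)]
        # Step 3: estimate (a, b) on the surrogate decisions.
        a_l, b_l = fit_a_b(r_mh, z_sim, steps)
        a_list.append(a_l)
        b_list.append(b_l)
    # P'(x) = (1 / L) * sum_l f(x_l, x), where f(x_l, x) = 1 if x_l >= x else 0.
    p_a = sum(a_l >= a_obs for a_l in a_list) / repeats
    p_b = sum(b_l >= b_obs for b_l in b_list) / repeats
    return a_obs, b_obs, p_a, p_b
```

Calling `null_test(r_mh, z)` with the per-trial MH acceptance probabilities and the observed 0/1 decisions returns the fitted parameters and the two values $P'_a(a)$ and $P'_b(b)$. The default `repeats=1000` matches $L = 1000$ in the text; the pure-Python grid search is slow at that setting, so a numerical optimizer could be substituted for `fit_a_b`.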

Fig 7. Example of the actual acceptance made by a participant and the inferred acceptance probabilities $r^{MH}_n$. Dataset 1 (left), Dataset 2 (right).

Table 4. Estimated parameters a and b, with p-values, for each subject's data and for the data aggregated over all participants in each dataset.

Table 5. P-values of the U-test for each model combination, for all participants.

Table 6. Number of participants whose behavior resulted in the rejection of the null hypothesis for each pair of models.

Table 7. Number of participants whose behavior resulted in the rejection of the null hypothesis for each pair of models in Dataset 1.

Table 8. Number of participants whose behavior resulted in the rejection of the null hypothesis for each pair of models in Dataset 2.