A Bayesian Approach to the Naming Game Model

We present a novel Bayesian approach to semiotic dynamics, which is a cognitive analogue of the naming game model restricted to two conventions. The one-shot learning that characterizes the agent dynamics in the basic naming game is replaced by a word-learning process, in which agents learn a new word by generalizing from the evidence garnered through pairwise-interactions with other agents. The principle underlying the model is that agents, like humans, can learn from a few positive examples and that such a process is modeled in a Bayesian probabilistic framework. We show that the model presents some analogies but also crucial differences with respect to the dynamics of the basic two-convention naming game model. The model introduced aims at providing a starting point for the construction of a general framework for studying the combined effects of cognitive and social dynamics.


I. INTRODUCTION
A basic question in complexity theory is how the interactions between the units of the system lead to the emergence of ordered states from initially disordered configurations [1,2]. This general question concerns phenomena ranging from phase transitions in condensed matter systems and self-organization in living matter to the appearance of norm conventions and cultural paradigms in social systems. Various models were used in order to study social interactions and cooperation, e.g. models of condensed matter systems (such as spin systems), statistical mechanical models (e.g. based on the master equation), ecological competition models [1], many-agents game-theoretical models [3][4][5]. Opinion dynamics and cultural spreading models represent suitable theoretical frameworks for a quantitative description of the emergence of social consensus [2].
In this respect, the emergence of human language remains a challenging, multi-fold question, related in turn to biological, ecological, social, logical, and cognitive aspects [6][7][8][9][10]. Language dynamics [11,12] has provided models describing phenomena of language competition and change that focus on the mutual interactions of linguistic traits (sounds, phonemes, grammatical rules, or languages understood as fixed entities) under the influence of ecological and social factors, modeling such interactions in analogy to biological competition and evolution.
However, the basic process of learning a word has complex dynamics due to its cognitive dimension. In fact, learning a word means learning a concept (understood as a pointer to a subset of objects, see Refs. [10,13,14]) and a linguistic label (for example, the name of the object) used for communicating the concept. The double concept↔name nature of words has been studied through semiotic dynamics models, such as the models of Hurford [15] and Nowak [16] (see also [17,18]) and the naming game (NG) model [19,20].
In the basic version of the model of Nowak [16], the language spoken by each agent i (i = 1, . . . , N) is defined by two personal matrices, representing the links of a bipartite network joining Q names and R concepts: (1) an active matrix $U^{(i)}$ representing the concept → name links, where the element $U^{(i)}_{q,r}$ ($q = 1, \dots, Q$; $r = 1, \dots, R$) gives the probability that agent i will utter the qth name to communicate the rth concept; (2) a passive matrix $H^{(i)}$ representing the name → concept links, in which the element $H^{(i)}_{q,r}$ gives the probability that agent i interprets the qth name as referring to the rth concept. In the models of Hurford and Nowak, the language of each individual evolves with time according to a game-theoretical dynamics, with agents gaining a reproductive advantage if their matrices have a higher communication efficiency. These studies have achieved interesting results, such as the emergence of non-ambiguous one-to-one links between objects and sounds, and explain why homonyms are more frequent than synonyms [15][16][17][18].
In the NG model [19,20] there is only one concept (R = 1) that can be linked to a set of Q > 1 different names. The model can be reformulated through the agents' lists L i of the name↔concept connections known to each agent i. In the case of two-conventions models, where the conventions are the names A and B, the list of the ith agent can be L i = ∅ (no connection), L i = (A) or (B) (one name is known), or L i = (A, B) (both name↔concept connections are known).
Extending semiotic dynamics models is not trivial: even two-opinion variants of the NG model that take into account committed groups show a remarkable phase diagram [21], and trying to describe actual cognitive effects requires entirely new features [22]. This paper presents a minimal model to study the interplay of the cognitive and social dynamical dimensions, assuming for simplicity the two-conventions NG model as a semiotic framework [20,23] and making a cognitive generalization within the experimentally validated Bayesian framework of [10] (see also Refs. [13,14,24-27]). In that framework, an individual can learn a concept from a small number of examples, a most remarkable feature of human learning [10,28,29], to be contrasted with machine learning algorithms, which require a large number of examples for generalizing successfully [30][31][32].
The paper is organized as follows. The new model is introduced in Sec. II. In Sec. III, we present and discuss the features of the semiotic dynamics emerging from the numerical simulations and quantitatively compare them with those of the two-conventions NG model. The stability properties of the mean-field dynamics are investigated in Sec. IV. Future directions in the study of the interplay of the cognitive and the social dynamics are outlined in Sec. V.

II. A BAYESIAN LEARNING APPROACH TO THE NAMING GAME
A. The two-conventions naming game model
Before introducing the new model, we recall the basic 2-conventions NG model [33], in which there is a single concept C, corresponding to an external object, and two possible names (synonyms) A and B for referring to C. Thus, the possibility of homonymy is excluded [23]. Each agent i is equipped with the list L i of the names known to the agent. We assume that at t = 0 each agent i knows either A or B and has therefore a list L i = (A) or L i = (B), respectively.
During a pair-wise interaction, an agent can act as a speaker, when conveying a word to another agent, or as a hearer, when receiving a word from a speaker. One can think of an agent conveying a word as uttering a name, e.g. A, while pointing at an external object, corresponding to concept C: thus, the hearer records not only the name A but also the name↔concept association between A and C. At a later time t > 0, the list L i of the ith agent can contain one or both names, i.e., L i = (A), (B), or (A, B).
The system evolves according to the following update rules [23]:
1. Two agents i and j, the speaker and the hearer, respectively, are randomly selected.
2. The speaker i randomly extracts a name (here either A or B) from the list L i and conveys it to the hearer j. Depending on the state of agent j, the communication is usually described as:
(a) Success: the conveyed name is also present in the hearer's list L j , i.e., agent j also knows its meaning; then the two agents erase the other name from their lists, if present.
(b) Failure: the conveyed name is not present in the hearer's list L j ; then agent j records it and adds it to the list L j .
3. Time is increased by one step, t → t + 1, and the simulation is reiterated from the first point above.
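The update rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `naming_game_step` and `run_naming_game`, the use of Python sets for the lists L i , and the stopping criterion are assumptions made here, not part of the original model description.

```python
import random

def naming_game_step(lists, rng):
    """One speaker-hearer interaction of the two-convention naming game."""
    i, j = rng.sample(range(len(lists)), 2)   # rule 1: speaker i, hearer j
    name = rng.choice(sorted(lists[i]))       # rule 2: speaker extracts a name
    if name in lists[j]:
        # (a) Success: both agents erase the other name, if present.
        lists[i] = {name}
        lists[j] = {name}
        return True
    # (b) Failure: the hearer adds the conveyed name to its list.
    lists[j].add(name)
    return False

def run_naming_game(n_agents=100, frac_a=0.5, seed=0, max_steps=10**6):
    """Iterate until all agents share one single-name list (consensus)."""
    rng = random.Random(seed)
    n_a = int(round(frac_a * n_agents))
    lists = [{"A"} for _ in range(n_a)] + [{"B"} for _ in range(n_agents - n_a)]
    for t in range(1, max_steps + 1):         # rule 3: t -> t + 1
        naming_game_step(lists, rng)
        if all(len(l) == 1 and l == lists[0] for l in lists):
            return t, next(iter(lists[0]))
    return None, None                         # no consensus within max_steps
```

Calling `run_naming_game(n_agents=100, frac_a=0.5)` returns the number of interactions needed to reach consensus and the name selected in that run.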
An example of unsuccessful and one of successful communication are schematized in the left panel (A) of Fig. 1; see Ref. [19] for more examples. Despite its simple structure, the basic NG model describes the emergence of consensus about which name to use, which is reached for any (disordered) initial configuration [34]. From a cognitive perspective, a "communication failure" of the NG model can be understood as a learning process, in which the hearer learns a new word. It is a "one-shot learning process", because it takes place instantaneously (in a single time step) and independently of the agent's history (i.e., of the previous knowledge of the agent).
However, modeling an actual learning process should take into account the agents' experience, based on the previous observations (the data already acquired) as well as the uncertain/incomplete character naturally accompanying any learning process.
Here, the one-shot learning is replaced by a process that can describe basic but realistic situations, such as the prototypical "linguistic games" [35]. For example, consider a "lecture game", in which a lecturer (speaker) utters the name A of an object and shows a real example "+" of the object to a student (hearer), repeating this process a few times. Then, the teacher can e.g. (a) show another example and ask the student to name the object; (b) utter the same name and ask the student to show an example of that object; or (c) do both things (uttering the name and showing the object) and ask the student whether the name↔object correspondence is correct. The student will be able to answer correctly only after having received some examples, which enable the student to generalize the concept C corresponding to the object in association with name A. To model these and similar learning processes, we need a criterion enabling the hearer to assess the degree of equivalence between the new example and the examples recorded previously.
The starting point for the replacement of the one-shot learning is Bayes' theorem. According to Bayes' theorem, the posterior probability p (h|X) that the generic hypothesis h is the true hypothesis, after observing new evidence X, reads [36,37]

$$p(h|X) = \frac{p(X|h)\, p(h)}{p(X)}. \tag{1}$$

Here, the prior probability p (h) gives the probability of occurrence of the hypothesis h before observing the data and p (X|h) gives the probability of observing X if h is given. Finally, p (X) provides the normalization; in applications it can be evaluated as $p(X) = \sum_{h'} p(X|h')\, p(h')$, where the sum runs over the set of hypotheses {h'} in the hypothesis space H.
The next step is to find a way to compute explicitly the posterior probability p (h|X), through a representation of the concepts and their relative examples in a suitable hypothesis space H of the possible extensions of a given concept C, constituted by the mutually exclusive and exhaustive hypotheses h. Following the experimentally verified Bayesian statistical framework of Refs. [10,28], we adopt the paradigmatic representation of a concept as a geometrical shape. For example, the concept of "healthy level" of an individual in terms of the levels of cholesterol x and insulin y, defined by the ranges x a ≤ x ≤ x b and y a ≤ y ≤ y b , where x i and y i (i = a, b) are suitable values, represents a rectangle in the Euclidean x-y plane R 2 . Examples of healthy levels of specific individuals 1, 2, . . . correspond to points (x 1 , y 1 ), (x 2 , y 2 ), · · · ∈ R 2 . In the following, we assume that a hypothesis h is represented by a rectangular region in R 2 . Figure 2 shows four positive examples, denoted by the symbol "+", associated with four different points of the plane, consistent with (i.e., contained in) three different hypotheses, shown as rectangles.
The problem of learning a word is now recast into an equivalent problem, consisting in acquiring the ability to infer whether a newly recorded example z, corresponding to a new point "+" in R 2 , corresponds to the concept C, after having seen a small set of positive examples "+" of C. More precisely, let X = {(x 1 , y 1 ) , . . . , (x n , y n )} be a sequence of n examples of the true concept C, already observed by the hearer, and z = (z 1 , z 2 ) the new example. The learner does not know the true concept C, i.e., the exact shape of the rectangle associated with C, but can compute the generalization function p (z ∈ C|X) by integrating the predictions of all hypotheses h, weighted by their posterior probabilities p (h|X):

$$p(z \in C|X) = \int_{h \in H} p(z \in C|h)\, p(h|X)\, dh. \tag{2}$$

Clearly, p (z ∈ C|h) = 1 if z ∈ h and 0 otherwise. By means of Bayes' theorem (1), one can obtain the right Bayesian probability for the problem at hand. A successful generalization is then defined quantitatively by introducing a threshold p * , representing an acceptance probability: an agent will generalize if the Bayesian probability p (z ∈ C|X) ≥ p * . The value p * = 1/2 is assumed, as in Ref. [28].
We assume that an Erlang prior characterizes the agents' background knowledge. For a rectangle in R 2 defined by the tuple (l 1 , l 2 , s 1 , s 2 ), where l 1 , l 2 are the Cartesian coordinates of its lower-left corner and s i its sides along dimension i = 1, 2, the Erlang prior density is [10,28]

$$p(h) \propto s_1 s_2 \exp\left[-\left(\frac{s_1}{\sigma_1} + \frac{s_2}{\sigma_2}\right)\right], \tag{3}$$

where the parameters σ i represent the actual sizes of the concept, i.e., they are the sides of the concept rectangle C along dimension i. The choice of a specific informative prior, such as the Erlang prior, is well motivated by the fact that in the real world individuals always have some prior knowledge or expectation. In fact, a Bayesian learning framework with an Erlang prior of the form (3) describes well the experimental observations of learning processes of human beings [28].
The final expression used below for computing the Bayesian probability p that, given the set of previous examples X, the new example z falls in the same category of concept C, reads [28]

$$p(z \in C|X) \approx \frac{\exp\left[-\left(\tilde d_1/\sigma_1 + \tilde d_2/\sigma_2\right)\right]}{\left[\left(1 + \tilde d_1/r_1\right)\left(1 + \tilde d_2/r_2\right)\right]^{n-1}}. \tag{4}$$

Here r i (i = 1, 2) is the maximum distance between the examples of X along dimension i, while $\tilde d_i$ = 0 if z i falls inside the range of values spanned by the examples along dimension i and otherwise equals the distance of z i from the nearest example along dimension i. Equation (4) is actually a "quick-and-dirty" approximation that is reasonably good, except for n ≤ 3 and r i ≤ σ i /10, estimating the actual generalization function within a 10% error, see Refs. [10,28] for details. Despite these approximations, Eq. (4) will ensure that our computational model, described in the next section, retains the main features of the Bayesian learning framework. Notice that for the validity of the Bayesian framework it is crucial that the examples are drawn randomly from the concept (strong sampling assumption), i.e., they are extracted from a probability density that is uniform in the rectangle corresponding to the true concept [28]. This definition of generalization is applied below to word-learning.
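As an illustration of how the generalization function can be evaluated in practice, the following Python sketch implements the Erlang-prior approximation of Ref. [28] in the form p ≈ exp[−(d 1 /σ 1 + d 2 /σ 2 )] / [(1 + d 1 /r 1 )(1 + d 2 /r 2 )]^(n−1). That Eq. (4) has exactly this form is an assumption made here; the function name `generalization_probability` and the handling of the degenerate case r i = 0 are likewise choices of this sketch.

```python
import math

def generalization_probability(examples, z, sigma):
    """Approximate p(z in C | X) for axis-aligned rectangular concepts.

    examples: list of (x, y) positive examples X (at least two points)
    z:        new example (z1, z2)
    sigma:    expected sides (sigma1, sigma2) entering the Erlang prior
    """
    n = len(examples)
    if n < 2:
        raise ValueError("at least two examples are needed")
    log_p = 0.0
    for dim in range(2):
        coords = [x[dim] for x in examples]
        lo, hi = min(coords), max(coords)
        r = hi - lo                              # range spanned by X along dim
        d = max(lo - z[dim], z[dim] - hi, 0.0)   # 0 if z lies inside the range
        if d > 0.0:
            if r == 0.0:
                return 0.0                       # assumed limit for a degenerate range
            log_p -= d / sigma[dim] + (n - 1) * math.log1p(d / r)
    return math.exp(log_p)
```

With four examples at the corners of the unit square and σ = (3, 1), a new point inside the square gives p = 1, whereas the point (2, 0.5) gives p = exp(−1/3)/2^3, which is below the acceptance threshold p * = 1/2.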

C. The Bayesian word-learning model
Based on the Bayesian learning framework discussed above, in this section we introduce a minimal Bayesian individual-based model of word-learning. For the sake of clarity, in analogy with the basic NG model, we study the emergence of consensus in the simple situation, in which two names A and B can be used for referring to the same concept C in pair-wise interactions among N agents.
At variance with the NG model, here in each basic pair-wise interaction an agent i, acting as a speaker, conveys an example "+" of concept C, in association with either name A or B, to another agent j, who acts as hearer (i, j = 1, . . . , N ). In order to be able to communicate concept C uttering a name, e.g. name A, the speaker i must have already generalized concept C in association with name A. This is signalled by the presence of name A in the list L i . On the other hand, the hearer j always records the example received in the respective inventory, in the example the inventory [+ + + . . . ] A .
The state of a generic agent i at time t is defined by
• the list L i , to which a name is added whenever agent i generalizes concept C in association with that name; agent i can use any name in L i to communicate C;
• two inventories [+ + + . . . ] A and [+ + + . . . ] B , containing the examples "+" of concept C received from the other agents in association with name A and B, respectively.
It is assumed that initially each agent knows one word: a fraction n A (0) of the agents know concept C in association with name A and the remaining fraction n B (0) = 1 − n A (0) in association with name B; no agent knows both words, n AB (0) = 0. We will examine three different initial conditions:
1. symmetrical initial conditions (SIC), n A (0) = n B (0) = 0.5;
2. asymmetrical initial conditions (AIC), n A (0) = 0.3 and n B (0) = 0.7;
3. reversed asymmetrical initial conditions (AICr), n A (0) = 0.7 and n B (0) = 0.3.
In addition, each agent initially holds four examples "+" in the inventory associated with the name known. This choice, somewhat arbitrary, is dictated by the condition that Eq. (4) becomes a good approximation for n > 3 [10].
Examples are points uniformly generated inside the fixed rectangle corresponding to the true concept C, here assumed to be a rectangle with lower left corner coordinates (0, 0) and sizes σ 1 = 3 and σ 2 = 1 along the x and y axis, respectively. Results are independent of the assumed numerical values; in particular, no appreciable variation in the convergence times t conv is observed as the rectangle area is varied, which is consistent with the strong sampling assumption, on which the Bayesian learning framework rests; see Ref. [10] and Sec. III.
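The strong sampling assumption can be illustrated by a short Python sketch that draws examples uniformly inside the true concept rectangle with lower-left corner (0, 0) and sides σ 1 = 3 and σ 2 = 1. The function name `sample_example` and its default arguments are introduced here for illustration only.

```python
import random

def sample_example(rng, corner=(0.0, 0.0), sides=(3.0, 1.0)):
    """Draw one example '+' uniformly inside the true concept rectangle."""
    x = corner[0] + sides[0] * rng.random()
    y = corner[1] + sides[1] * rng.random()
    return (x, y)

rng = random.Random(0)
# Four initial examples, so that n > 3 holds from the start.
initial_inventory = [sample_example(rng) for _ in range(4)]
```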
Furthermore, we introduce an element of asymmetry between the names A and B, related to the word-learning process: different minimum numbers of examples n * ex,A = 5 and n * ex,B = 6 will be used, which are needed by agents to generalize concept C in association with A and B, respectively. This is equivalent to assuming that concept C is slightly easier to learn in association with name A than with name B. Such an asymmetry plays a relevant role in the model dynamics in differentiating the Bayesian generalization functions p A and p B from each other, see Sec. IV.
The dynamics of the model can be summarized by the following update rules:
1. A pair of agents i and j, acting as speaker and hearer, respectively, are randomly chosen among the agents.
2. The speaker randomly selects: (a) a name from the list L i (or takes the only name present if L i contains a single name), for example A (analogous steps follow if the name B is selected); (b) an example z among those contained in the corresponding inventory [+ + + . . . ] A . Then the speaker i conveys the extracted example z in association with (e.g., uttering) the selected name A to the hearer j.
3. The hearer adds the new example z (in association with A) to the inventory [+ + + . . . ] A . This reinforcement process of the hearer's knowledge always takes place.
4. The next step depends on the state of the hearer:
(a) Generalization. If the selected name, A in the example, is not present in the hearer's list L j , then the hearer j computes the relative Bayesian probability p A = p (z ∈ C|X A ) from Eq. (4), where X A is the set of examples stored in the inventory [+ + + . . . ] A , provided that the inventory contains at least n * ex,A examples. If p A ≥ p * = 1/2, the hearer generalizes concept C in association with A and adds A to the list L j ; otherwise only the reinforcement of step 3 takes place.
(b) Agreement. If the name uttered by the speaker, A in the example, is present in the hearer's list L j , agent j has already generalized concept C in association with name A and has connected the corresponding inventory [+ + + . . . ] A to A. In this case, the hearer and the speaker proceed to make an agreement, analogous to that of the NG model, leaving A in their lists L i and L j and removing B if present. No examples contained in any inventory are removed.

[Fig. 1, caption fragment: ". . . A, B) and knows the meaning of both names. The outcome can be: (1) a reinforcement (only); (2) generalization of concept C, if the Bayes probability is p > 1/2; (3) an agreement between hearer and speaker, if both agents know the meaning of the conveyed name. Even if not indicated, reinforcement takes place also in cases (2) and (3)."]

[Table I: possible pair-wise interactions in the Bayesian naming game. Columns: S-List (before), Name conveyed, H-List (before), Branching probability, Process, Condition, S-List (after), H-List (after).]

In the latter case, the inventory [+ + + . . . ] A may be "ready" for a generalization process, since it contains a sufficient number of examples, i.e., agent i will probably be able to generalize as soon as another example is conveyed by an agent. This situation is not as peculiar as it may look at first sight. In fact, there is a linguistic analogue in the case where a speaker who loses the habit of using a certain word (or a language) A can regain it promptly, if exposed to A again.
Notice also that without the agreement dynamics scheme introduced in the model, borrowed from the basic NG model, the population fraction n AB of individuals who know both A and B (n A + n B + n AB = 1) would be growing, until eventually n AB = 1.
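The pair-wise interaction of the Bayesian word-learning model can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements of the model: the form of the generalization function (taken as the Erlang-prior approximation of Ref. [28]), the choice of computing it on the examples stored before the new one, the counting of the new example toward the minimum number n * ex , and all names (`make_agent`, `bayesian_step`, `N_MIN`, etc.).

```python
import math
import random

P_STAR = 0.5                   # acceptance threshold p*
N_MIN = {"A": 5, "B": 6}       # minimum numbers of examples n*_ex,A and n*_ex,B
SIGMA = (3.0, 1.0)             # sides of the true concept rectangle

def generalization(examples, z, sigma=SIGMA):
    """Assumed form of the approximate generalization function of Ref. [28]."""
    n = len(examples)
    log_p = 0.0
    for dim in range(2):
        coords = [x[dim] for x in examples]
        lo, hi = min(coords), max(coords)
        d = max(lo - z[dim], z[dim] - hi, 0.0)   # distance outside the range
        if d > 0.0:
            if hi == lo:
                return 0.0
            log_p -= d / sigma[dim] + (n - 1) * math.log1p(d / (hi - lo))
    return math.exp(log_p)

def make_agent(name, rng, n_init=4):
    """Agent knowing one name, with n_init uniformly sampled examples."""
    agent = {"list": {name}, "inv": {"A": [], "B": []}}
    agent["inv"][name] = [(SIGMA[0] * rng.random(), SIGMA[1] * rng.random())
                          for _ in range(n_init)]
    return agent

def bayesian_step(agents, rng):
    """One interaction; returns the kind of process that took place."""
    i, j = rng.sample(range(len(agents)), 2)      # rule 1
    speaker, hearer = agents[i], agents[j]
    name = rng.choice(sorted(speaker["list"]))    # rule 2(a)
    z = rng.choice(speaker["inv"][name])          # rule 2(b)
    previous = list(hearer["inv"][name])          # examples seen before z
    hearer["inv"][name].append(z)                 # rule 3: reinforcement
    if name in hearer["list"]:                    # rule 4(b): agreement
        speaker["list"] = {name}
        hearer["list"] = {name}
        return "agreement"
    if len(hearer["inv"][name]) >= N_MIN[name]:   # rule 4(a): generalization
        if generalization(previous, z) >= P_STAR:
            hearer["list"].add(name)
            return "generalization"
    return "reinforcement"
```

Since inventories are never emptied, an agent that drops a name through agreement keeps the associated examples and can generalize again after a single further example.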

III. RESULTS
In this section we study numerically the Bayesian NG model introduced above and discuss its main features. We limit ourselves to study the model dynamics on a fully-connected network.
In the new learning scheme, which replaces the one-shot learning of the two-conventions NG model, an individual generalizes concept C on a suitable time scale ∆t > 1, rather than during a single interaction. However, a few examples are sufficient for an agent to generalize concept C, as in a realistic concept-learning process. This is visible from the Bayesian probabilities p A and p B computed by agents in the role of hearer, according to Eq. (4), once at least n * ex,A = 5 and n * ex,B = 6 examples "+", respectively, have been stored in the inventories associated with the names A and B. Figure 3 shows the histograms of the values of p A and p B computed from the initial time until consensus for a single run with N = 2000 agents starting with SIC. The low frequencies at small values of p A and p B and the highest frequencies at values close to unity are due to the fact that the Bayesian probabilities reach values p A ≈ p B ≈ 1 very fast, after a few learning attempts, consistently with the size principle, on which the Bayesian learning paradigm, and in turn Eq. (4), are based [10].
In order to visualize how the system approaches consensus, it is useful to consider some global observables, such as the fractions n A (t), n B (t), and n AB (t) of agents that have generalized concept C in association with name A only, name B only, or both names A and B, respectively, or the success rate S(t). The dynamics of a population of N = 1000 agents (panels (A) and (B)) using different initial conditions, SIC, AIC, and AICr, and that of a population of N = 100 agents starting with SIC (panels (C) and (D)) are shown in Fig. 4.
Panel (A) of Fig. 4 shows only the population fractions corresponding to the name found at consensus, for the sake of clarity (the remaining population fractions eventually go to zero). For asymmetrical initial conditions (AIC or AICr), it is the initial majority that determines the convention found at consensus (that is, B for AIC and A for AICr). If the system starts from SIC, the convention A, for which agents can generalize earlier (n * ex,A = 5 < n * ex,B = 6), is always found at consensus; in this case it is the asymmetry in the thresholds n * ex,A and n * ex,B , characterizing the Bayesian learning process, that determines consensus.
Panel (B) of Fig. 4 shows the success rate S(t = t k ), representing the average over different runs of the instantaneous success rate S k of the kth interaction at time t k , defined as follows: S k = 1 in case of agreement between the two agents or when a successful learning of the hearer takes place, following a Bayes probability p > 1/2; S k = 0 in case of unsuccessful generalization, when p < 1/2 and only reinforcement takes place. The success rate S(t) varies from S(0) ≈ (n A (0))^2 + (n B (0))^2 , determined by the respective fractions of agents that initially know the two conventions A and B, to S ≈ 1 at consensus, following the S-shaped curve typical of learning processes [38]. In the case of SIC, the initial value is S(0) ≈ 0.5^2 + 0.5^2 = 0.5, while for AIC or AICr the initial value is S(0) ≈ 0.3^2 + 0.7^2 = 0.58.
We now investigate how the modified Bayesian dynamics affects the convergence times to consensus. The study of the size-dependence of the convergence to consensus shows that there is a critical value N * ≈ 500 in the case of SIC, such that for N ≤ N * there is a non-negligible probability that the final absorbing state is B. Panels (C) and (D) of Fig. 4 refer to a population of N = 100 agents starting with SIC. System size can play an important role in the dynamics of social systems, as an actual thermodynamic limit is only allowed for simulations of macroscopic physical systems [39].
The convergence time t conv follows a simple scaling rule with the system size N, related to the average numbers of examples n̄ ex,A and n̄ ex,B relative to A and B, respectively, stored in the agents' inventories at consensus. These values depend on the number of learning and reinforcement processes, and hence are related to the system size N. The average number of interactions undergone by the agents until the system reaches consensus is given by the sum n̄ int = n̄ ex,A + n̄ ex,B [40]. One expects that

$$t_{\mathrm{conv}} \sim \bar{n}_{\mathrm{int}}\, N, \tag{5}$$

which suggests a linear scaling law (t conv ∼ N) of the convergence time with the system size N for all the possible initial conditions. A linear behavior is indeed confirmed by the numerical simulations with population sizes N = 50, 100, 500, 1000, 1500, 2000 starting from SIC, AIC, and AICr. The corresponding numerical results are reported in Table II. In Eq. (5) the size-dependence of n̄ int is ignored, as it shows a weak dependence upon N, see panel (B) in Fig. 5. Moreover, the average number of examples relative to the absorbing state always increases monotonically with the system size, while a size-independent behavior is observed in the opposite case, see the right panel (B) of Fig. 5.
Finally, we compare the convergence time of the Bayesian word-learning model, t conv , with that of the two-conventions NG model, $t^{\mathrm{NG}}_{\mathrm{conv}}$ [33], by studying the corresponding ratio $R = t_{\mathrm{conv}} / t^{\mathrm{NG}}_{\mathrm{conv}}$ for common initial conditions and population sizes. When starting with SIC, the convergence times obtained from the two models become of the same order as N increases: R decreases with N, reaching unity for N = 10000, see Fig. 6. In other words, the time scales of the two models become equivalent for relatively large system sizes, i.e., the learning processes of the two models perform equivalently and the Bayesian approach roughly reproduces the one-shot learning that characterizes the two-conventions NG model. In the next section we discuss how the Bayesian model becomes asymptotically equivalent to the minimal NG model. The inset of Fig. 6 represents R versus N, for N < 2000, for different starting configurations (SIC, AIC, and AICr) and different population sizes. In the following, we focus on the case of SIC.
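The initial values of the success rate can be checked with a few lines of Python. At t = 0 no agent can generalize yet, so an interaction is successful only if speaker and hearer know the same name, which for large N happens with probability (n A (0))^2 + (n B (0))^2 . The function name `initial_success_rate` is introduced here for illustration.

```python
def initial_success_rate(n_a0):
    """S(0) = n_A(0)^2 + n_B(0)^2 with n_B(0) = 1 - n_A(0) and n_AB(0) = 0."""
    n_b0 = 1.0 - n_a0
    return n_a0 ** 2 + n_b0 ** 2

s_sic = initial_success_rate(0.5)    # symmetric case: 0.25 + 0.25
s_aic = initial_success_rate(0.3)    # asymmetric case: 0.09 + 0.49
```

The value is the same for n A (0) = 0.3 and n A (0) = 0.7, since the expression is symmetric under the exchange of the two names.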

IV. STABILITY ANALYSIS
In this section we investigate the stability properties of the mean-field dynamics of the Bayesian NG model, in which statistical fluctuations and correlations are neglected. In the Bayesian NG model, as in the basic NG, agents can use two non-excluding options A and B to refer to the same concept C. The main difference between the Bayesian model and the basic NG model is in the learning process: a one-shot learning process in the basic NG and a Bayesian process in the Bayesian NG model. In the latter case the presence of a name in the word list indicates that the agent has generalized the corresponding concept from a set of positive recorded examples.
The NG model belongs to the wide class of models with two non-excluding options A and B, such as many models of bilingualism [41], in which transitions between state (A) and state (B) are allowed only through an intermediate ("bilingual") state (A, B), as schematized in Fig. 7. The mean-field equations for the fractions n A (t) and n B (t) can be obtained considering the gain and loss contributions of the transitions depicted in Fig. 7, Hereṅ a (t) = dn a (t)/dt and the quantities p a→b represent the respective transition rates per individual, corresponding to the arrows in Fig. 7 (a, b = A, B, AB). The equation for n AB (t) was omitted, since it is determined by the condition that the total number of agents is constant, n A (t) + n B (t) + n AB (t) = 1. The details of the possible pair-wise interactions in the Bayesian naming game are listed in Table I. From the various contributions, one obtains the master equatioṅ which can be rewritten in the form (6) with transition rates per individual given by Equations (8) provide the transition rates of learning processes, while Eqs. (9) give the transition rates of agreement processes. Setting x ≡ n A , y ≡ n B , and z = n AB ≡ 1 − x − y, the autonomous system (7) becomeṡ where v = (f x (x, y), f y (x, y)) is the velocity field in the phase plane. For the following analysis, it is convenient to write the Bayesian probabilities p A and p B appearing in these equations as time-dependent parameters of the model, but they are actually highly non-linear functions of the variables. In fact, they can be thought as averages of the microscopic Bayesian probability in Eq. (4) over the possible dynamical realizations. For this reason, they have also a complex non-local time-dependence on the previous history of the interactions between agents. For the moment, we assume p A (t) = p B (t) = p(t), returning later to the general case. 
From the conditions defining the critical points, f x (x, y) = f y (x, y) = 0, one obtains (x − y) z = 0. Setting z = 0, one obtains two solutions that correspond to consensus in A or B, given by (x 1 , y 1 , z 1 ) = (1, 0, 0) and (x 2 , y 2 , z 2 ) = (0, 1, 0). Instead, setting (x − y) = 0 leads to the equation that has the solutions For p ∈ (0, 1], the corresponding solutions (x ± , x ± , 1 − 2x ± ) are not suitable solutions, because z ± = 1 − 2x ± < 0. This analysis is valid for p > 0. In fact, p = p(t) is a function of time and for a finite interval of time after the initial time one has that p = 0, which defines a different dynamical system. In the initial conditions used, z(0) = 0, which implies z(t) = 0, x(t) = x(0), and y(t) = y(0) at any later time t as long as p(t) = 0, sinceẋ(t) =ẏ(t) =ż(t) = 0 (see Eq. (7); in fact, the whole line x + y = 1 (for 0 < x, y < 1) represents a continuous set of equilibrium points. The reason why in this model p(0) = 0 at t = 0 and also during a subsequent finite interval of time is twofold. First, agents do not have any examples associated to the name not known and they have to receive at least n * ex,A or n * ex,B examples, before being able to compute the corresponding Bayesian probability p A (t) or p B (t) -thus it is to be expected that p(t) = 0 meanwhile. Furthermore, even when agents can compute the Bayesian probabilities, the effective probability to generalize is actually zero, due to the threshold p * = 0.5 for a generalization to take place. The existence of the (temporary) equilibrium points on the line x + y = 1 ends as soon as the parameter p(t) > p * and, according to Eqs. (7), the two A-and B-consensus states become the only stable equilibrium points. The representative point in the x-y-plane is deemed to leave the initial conditions on the z = 1 − x − y = 0 line, due to the stochastic nature of the dynamics, which is not invariant under time reversal [42].
As long as the general case p A = p B , it can be shown that the trajectory of the system can point toward and eventually reach the consensus state with A or B, depending on whether p A (t * ) > p B (t * ) or p A (t * ) < p B (t * ), where t * > 0 is the critical time at which the representative point leaves the initial position. The convention A or B is selected randomly, depending on various factors related to the specific realization of the system evolution, such as the numbers of examplesn ex,A (t) andn ex,B (t) recorded by the agents until time t, their quality from the point of view of the generalization, and the initial asymmetry of the thresholds for generalizing, n * ex,A = n * ex,B . The asymmetrical thresholds n * ex,A = 4 < n * ex,B = 5 produce a bias toward consensus in A and play a crucial role in the subsequent Bayesian semiotic dynamics; in fact, swapping the threshold values (setting n * ex,A = 5 > n * ex,B = 4), the approach to consensus occurs with the outcomes A, B swapped. We observed that for N N * ≈ 500, the chances that the system converges to (B) become negligible. This can be seen in panels (C) and (D) of Fig. 8, showingn ex,A (t) andn ex,B (t) versus time (averaged over the agents of the system) for single runs, a population of N = 100 agents, and SIC, for different runs that relax toward consensus A and B, respectively. After an initial transient, in whichn ex,A (t) ≈n ex,B (t), they differ more and more from each other at times t > t * . In turn, also p A and p B begin to differ significantly from each other, thus affecting the rate of depletion of the populations during the subsequent dynamics. For instance, if p A > p B , then p B→AB > p A→AB , see Eqs. (8), which means that the depletion of n B occurs faster then that of n A . In turn, this favours the decay of the mixed states (A, B) into the state (A), see Eqs. (9), being n A > n B .
The asymmetry discussed above also affects the convergence times $t^A_{conv}$ and $t^B_{conv}$, and we find $t^B_{conv} > t^A_{conv}$ in all the numerical simulations. Despite the noise, such a trend is already appreciable in a single run, as shown in panels (A) and (B) of Fig. 8. The mean fractions $n_A(t)$, $n_B(t)$, and $n_{AB}(t)$ versus time, obtained by averaging over many runs, are less noisy and provide a clearer picture of the difference, which is visible in Fig. 9, obtained using 600 runs starting with SIC for $N = 100$ agents (panels (A) and (B)) and $N = 200$ agents (panels (C) and (D)). In addition, the convergence times depend on the system size, increasing with the number of agents $N$: compare the left panels (A) and (B), where $N = 100$, with the right panels (C) and (D), where $N = 200$.
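As a rough illustration of how the convergence times of many runs could be compared by outcome, the following sketch groups runs by the convention reached and averages their convergence times. The function name, the input format (pairs of outcome label and convergence time), and the numbers in the usage example are assumptions made for illustration, not data or code from this work.

```python
from collections import defaultdict

def summarize_runs(runs):
    """Group runs by final convention and average their convergence times.

    runs: iterable of (outcome, t_conv) pairs, with outcome "A" or "B".
    Returns {outcome: {"count": int, "mean_t_conv": float}}.
    """
    times = defaultdict(list)
    for outcome, t_conv in runs:
        times[outcome].append(t_conv)
    return {
        outcome: {"count": len(v), "mean_t_conv": sum(v) / len(v)}
        for outcome, v in times.items()
    }

# Made-up example values, only to show the call.
example = [("A", 100), ("A", 120), ("B", 150), ("A", 110)]
summary = summarize_runs(example)
```

The per-outcome counts give a simple estimate of the branching between the two consensus states, while the per-outcome means can be compared directly to check whether $t^B_{conv} > t^A_{conv}$.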
The possibility that the same system, starting with the same initial conditions and evolving with the same dynamical parameters, can reach either A or B is a consequence of the stochastic nature of the dynamics. This does not happen for $N$ above $N^*$, when both $\bar{n}_{ex,A}$ and $\bar{n}_{ex,B}$ reach threshold values close to those observed at $t_{conv}$, which are clearly sufficient for the agents to generalize concept C. In fact, the scaling law of $t_{conv}$ with $N$ shows that the sum of $\bar{n}_{ex,A}$ and $\bar{n}_{ex,B}$ becomes nearly constant for $N$ above $N^*$, implying that the dynamics is uniquely determined, that is, starting from SIC the consensus always occurs at A once the agents have stored a threshold number of examples $\bar{n}_{ex,A}$, $\bar{n}_{ex,B}$. It is found that these threshold values correspond to $\bar{n}^*_{ex,A} = 21$ and $\bar{n}^*_{ex,B} = 12$. Note that $\bar{n}^*_{ex,A}$ and $\bar{n}^*_{ex,B}$ include the four initial examples stored in the agents' inventories at the beginning, because the outputs of the generalization function $p(t)$ effectively depend on all of them. Therefore, at these threshold values, it is very unlikely that $p_B > p_A$, and hence that consensus is reached at B.
Now we consider the Bayesian probabilities $p_A(t)$ and $p_B(t)$ computed by the agents and the corresponding numbers of learning attempts $no_A(t)$ and $no_B(t)$ made by the agents at time $t$ to learn concept C in association with word A or B, respectively, i.e., the number of times that the agents compute $p_A$ or $p_B$ (only the case of a system starting with SIC is considered). We consider a single run of a system with $N = 5000$ agents and study the average values $\bar{p}_A(t)$ and $\bar{p}_B(t)$, obtained by averaging $p_A(t)$ and $p_B(t)$ over the agents of the system. We also adopt a coarse-grained view, consisting of an additional average of $\bar{p}_A(t)$, $\bar{p}_B(t)$, $no_A$, and $no_B$ over a temporal bin $\Delta t = 16 \times 10^3$, in order to reduce random fluctuations.
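The coarse-graining step can be sketched as a plain average over consecutive time bins. This is a minimal illustration under the assumption that the data are available as parallel sequences of times and values; the function name and the choice of reporting each bin by its center are hypothetical.

```python
def coarse_grain(times, values, bin_width):
    """Average `values` over consecutive time bins of width `bin_width`.

    Returns a list of (bin_center, bin_mean) pairs, skipping empty bins.
    """
    sums, counts = {}, {}
    for t, v in zip(times, values):
        k = int(t // bin_width)          # index of the bin containing t
        sums[k] = sums.get(k, 0.0) + v
        counts[k] = counts.get(k, 0) + 1
    return [((k + 0.5) * bin_width, sums[k] / counts[k]) for k in sorted(sums)]

# Toy example with bin width 2; the analysis in the text uses 16e3.
binned = coarse_grain([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0], 2)
```

A wider bin suppresses more of the random fluctuations at the cost of temporal resolution, which is the trade-off behind the choice of $\Delta t$.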
Figure 10 shows the time evolution of the average probabilities $\bar{p}_A(t)$ and $\bar{p}_B(t)$ in the time range where the data allow good statistics. The probabilities grow monotonically and eventually reach the value one. While this points to an equivalence between the mean-field regime of the Bayesian naming game and that of the two-convention NG model, in which agents learn at the first attempt (one-shot learning), such an equivalence is suggested but not fully reproduced by the coarse-grained analysis. The time evolution of the numbers of learning attempts $no_A(t)$ and $no_B(t)$ shows that they are negligible both at the beginning and at the end of the dynamics; see the inset in Fig. 10. This is due to the fact that at the beginning the most likely events are either interactions between agents with the same convention (starting with SIC, each agent has a probability of 50% to interact with an agent having the same convention) or interactions between agents with different conventions but with inventories still too small to generalize concept C, which lead to reinforcement processes only. When approaching consensus, agents with one of the conventions constitute the large majority of the population, and thus they are again most likely to interact through reinforcements only. Thus, the largest numbers of attempts to learn concept C in association with A and B are expected to occur at the intermediate stage of the dynamics. In fact, $no_A(t)$ and $no_B(t)$ are observed to reach a maximum at $t \approx t_{conv}/2$ for any given system size $N$, as visible in the inset of Fig. 10. Notice that the fraction of agents $n_{AB}$, who know both conventions and can communicate using both name A and name B, possibly allowing other agents to generalize in association with name A or B, also reaches its maximum roughly at the same time.
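The 50% figure quoted for SIC, and the dominance of same-convention encounters near consensus, follow from a simple estimate. The sketch below assumes a large population in which the two interacting agents are drawn independently and only single-convention agents are counted; the function name is hypothetical.

```python
def same_convention_probability(n_a, n_b):
    """Probability that two independently drawn agents share a convention.

    n_a, n_b: fractions of agents knowing only name A or only name B
    (agents knowing both names are ignored in this estimate).
    """
    return n_a ** 2 + n_b ** 2

p_start = same_convention_probability(0.5, 0.5)    # symmetric initial conditions
p_late = same_convention_probability(0.95, 0.05)   # illustrative late-stage fractions
```

Under these assumptions the estimate equals one half at the start and approaches one as a single convention takes over, consistent with learning attempts being rare at both ends of the dynamics.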

V. CONCLUSION
We introduced a novel agent-based model that describes the appearance of linguistic consensus through a word-learning process. The work presented is exploratory in nature, concerning the minimal problem of a single concept that can be associated with two different possible names A or B, but it is aimed at providing a prototype of a general framework for describing the interaction between the social and the cognitive dimensions. To this aim, the model is constructed on the basis of the semiotic dynamics of the NG model and is then extended by adding a Bayesian cognitive process that mimics human learning processes.
The model describes in a natural way (1) the uncertainty accompanying the first phase of a learning process, (2) the gradual reduction of the uncertainty as more and more examples are provided, and (3) the ability to learn from a few examples. The semiotic dynamics of the synonyms is different from that of the basic NG, in that it depends on parameters of a strictly cognitive nature, such as the thresholds $n^*_{ex}$ of the number of examples necessary before an agent can try to generalize and the acceptance threshold $p^*$ for carrying out the generalization of a concept. The interplay between the asymmetry of the conventions A and B, the system size, and the stochastic character of the time evolution has dramatic consequences on the consensus dynamics: there is a critical time $t^* > 0$ at which the system begins to move in the phase plane, eventually converging toward a consensus state; there is a critical system size $N^*$, such that for $N < N^*$ the system can end up in either of the two consensus states, and the convergence times depend on $N$; there is an asymmetry in the branching probabilities that the system converges toward one of the two possible conventions and in the corresponding convergence times; the scaling laws of the convergence times versus $N$ differ from those observed in the basic NG model, because they depend on the learning experience of the agents.
The cognitive dimension offers additional modelling possibilities: problems that are out of the reach of traditional social dynamics models can be formulated in terms of specific cognitive parameters. The model illustrated in this work represents a step toward a generalized Bayesian approach to social interactions leading to cultural conventions.
Future work can address specific problems of current interest from the point of view of cognitive processes, or features relevant from the general standpoint of complexity theory. In the first case, it is possible to study in the cognitive dimension the semiotic dynamics of homonyms, synonyms, and innovation, e.g., the cognitive conditions leading to a name $A_1$, associated with a concept $C_1$, splitting into two names $A_1$ and $A_2$, associated with two related but distinct concepts $C_1$ and $C_2$, as more examples become available that make the two concepts eventually distinguishable from each other. This is a type of problem that cannot be tackled within models of cultural competition. In the second case, one can mention the classical problem of the interplay between a central information source (bias) and the local influences of individuals, this time in a cognitive framework.
Another question to be investigated within a cognitive framework is the role of heterogeneity. Heterogeneity is known to characterize most known complex systems at various levels; here the diversity could affect the dynamical parameters of, e.g., the different competing names as well as those of the agents. Heterogeneity of individuals can lead to counter-intuitive effects, such as resonant behaviors [44,45]. Furthermore, the complex, heterogeneous nature of a local underlying social network can drastically change the co-evolution and the time scales of the conventions in competition with each other [46].