Conditional Neutral Reward Promotes Cooperation in the Spatial Prisoner’s Dilemma Game

Reward is an effective mechanism for promoting cooperation. However, an individual usually rewards her opponents only in certain cases. Inspired by this, we introduce a conditional neutral reward mechanism: an individual rewards neighbors who hold the same strategy when the focal individual's payoff is higher than that of his or her neighbors. Simulations are conducted to investigate the impact of this mechanism on the evolution of cooperation. Interestingly, cooperation can survive and dominate the system. Nominal antisocial reward, in which defectors reward each other, is rare because of the greed of defectors. By contrast, cooperators inside the cooperative clusters share their payoff with cooperators on the boundary, so that the latter can form shields that protect the cluster.


INTRODUCTION
How cooperation among selfish individuals can emerge and be maintained has been an attractive question in biology, sociology, and many other fields [1][2][3][4][5]. For example, worker ants give up their reproductive capacity to build nests and collect food, and human beings play different roles in the social division of labor. To explain the widespread phenomenon of cooperation, evolutionary game theory has been proposed and provides a powerful mathematical framework [6][7][8][9][10][11]. Among the many game models, the prisoner's dilemma game (PDG) is regarded as a paradigm because it captures the essence of the cooperation problem. In the PDG, two players simultaneously choose cooperation (C) or defection (D) without knowing the opponent's choice. If both cooperate, each receives the reward (R); if both defect, each receives the punishment (P). However, if one chooses cooperation and the other chooses defection, the defector gets the temptation (T) while the cooperator gets the sucker's payoff (S). For the PDG, the payoff ranking is T > R > P > S and 2R > T + S. Obviously, defection is always the better choice no matter which strategy the other player chooses. But if both individuals defect, each receives a lower payoff than if both had cooperated. This is the dilemma.
In the landmark work of Nowak, spatial topology, widely known as spatial reciprocity, was shown to be an effective mechanism for promoting the evolution of cooperation. Inspired by this, many kinds of spatial topologies have been applied to study cooperative dynamics, such as the square lattice, the ER random network, the small-world network, the BA scale-free network, and so on [39][40][41][42][43][44][45][46][47]. In addition, to explain cooperation on spatial topologies, different mechanisms have been proposed, such as reputation, asymmetric interaction, different update rules, co-evolution of dynamical rules, reward, and punishment [48][49][50][51][52][53].
Recent research has shown that rewarding is an effective way to promote cooperation. Rewards are often given to those who perform well, which is very common in real society. In this paper, we consider a reward mechanism in which an individual can pay a cost to reward the neighbors who have the same strategy. Meanwhile, the individual can be rewarded by his neighbors. We find that rewards have a positive effect on the maintenance of cooperation: weakened cooperators are supported against the invasion of defectors by others of their own kind, in the form of rewards, while the opposite holds for defectors. This creates a unique boundary structure. The remainder of this paper is organized as follows. First, we describe our model in detail. Then, we present the simulation results with figures and offer an explanation. Finally, we summarize and discuss the conclusions.

METHODS
We introduce social reward into the PDG (prisoner's dilemma game) on an $L \times L$ square lattice, where each player occupies one site and has four nearest neighbors. Each player is initialized as either a cooperator (C) or a defector (D) with equal probability. We use the standard PDG parametrization, setting $T = b$ ($1 < b < 2$), $R = 1$, $P = 0$, and $S = 0$. The value of $b$ specifies the strength of the dilemma [54][55][56]. Hence, the payoff matrix of the PDG (row player's payoff) is:
$$\begin{array}{c|cc} & C & D \\ \hline C & 1 & 0 \\ D & b & 0 \end{array} \qquad (1)$$
A player $x$ is chosen randomly at the beginning of each time step, and his payoff $p_x$ is calculated as
$$p_x = \sum_{y \in \Omega_x} p_{xy},$$
where $\Omega_x$ denotes the set of the four neighbors of player $x$, and $p_{xy}$ is the payoff that player $x$ obtains from neighbor $y$, as defined by the payoff matrix. The four neighbors of player $x$ get their payoffs in the same way. Thus, the average payoff of player $x$'s neighbors, $p_{avN}$, is calculated as
$$p_{avN} = \frac{1}{4} \sum_{y \in \Omega_x} p_y.$$
If the payoff of player $x$ is higher than or equal to $p_{avN}$, he pays a cost $c$ to reward each of his neighbors who has the same strategy. Otherwise, his payoff remains the same. Meanwhile, his four neighbors follow the same procedure. The accumulated payoff of player $x$ at the current time step, written here as $\Pi_x$, is
$$\Pi_x = p_x + N_r r - N_c c,$$
where $N_r$ ($0 \le N_r \le 4$) is the number of neighbors rewarding player $x$, $N_c$ ($0 \le N_c \le 4$) is the number of neighbors whom player $x$ rewards, and $r$ and $c$ are the values of the reward and the cost, respectively.
Finally, player $x$ updates his strategy. A neighbor $y$ is chosen at random, and player $x$ adopts the strategy of $y$ with the probability given by the Fermi function
$$W(s_x \leftarrow s_y) = \frac{1}{1 + \exp\left[(\Pi_x - \Pi_y)/K\right]},$$
where $K$ denotes the amplitude of noise, whose inverse is often called the intensity of selection [57][58][59][60][61][62]. Without loss of generality, we set $K = 0.1$.
The Monte Carlo simulation is carried out with $L = 200$, and the total number of steps is set to $5 \times 10^4$. We use the data of the last $5 \times 10^3$ steps to calculate the final average fraction of cooperation. To reduce random errors, each final result is the average of 10 independent runs for each set of parameter values.
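The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the use of NumPy, and the periodic boundary conditions are assumptions, and the accumulated payoff is taken to be the base payoff plus $N_r r$ minus $N_c c$, as the text describes. The rewarding condition follows the wording "higher than or equal" used in this section.

```python
import numpy as np

# Nearest-neighbor shifts as (shift, axis) pairs; periodic boundaries are an assumption.
SHIFTS = [(1, 0), (-1, 0), (1, 1), (-1, 1)]

def base_payoffs(s, b):
    """Payoff p_x of every site; s is an L x L array with 1 = C and 0 = D."""
    n_coop = sum(np.roll(s, sh, ax) for sh, ax in SHIFTS)
    # With R = 1, T = b, P = S = 0, only cooperating neighbors contribute.
    return np.where(s == 1, 1.0, b) * n_coop

def accumulated_payoffs(s, b, r, c):
    """Accumulated payoff p_x + N_r * r - N_c * c for every site."""
    p = base_payoffs(s, b)
    p_av = sum(np.roll(p, sh, ax) for sh, ax in SHIFTS) / 4.0
    # "Higher than or equal" as stated in the text; use > for a strict variant.
    rewarder = p >= p_av
    acc = p.copy()
    for sh, ax in SHIFTS:
        same = np.roll(s, sh, ax) == s            # neighbor shares x's strategy
        nb_rewarder = np.roll(rewarder, sh, ax)   # neighbor is willing to reward
        acc += r * (same & nb_rewarder)           # N_r term: x is rewarded
        acc -= c * (same & rewarder)              # N_c term: x pays to reward
    return acc

def imitation_probability(acc_x, acc_y, K=0.1):
    """Fermi probability that x adopts the strategy of y."""
    return 1.0 / (1.0 + np.exp((acc_x - acc_y) / K))

def elementary_step(s, b, r, c, K, rng):
    """Pick a random player and a random neighbor, then update in place.

    Recomputing the whole lattice each step is wasteful but keeps the sketch short.
    """
    L = s.shape[0]
    i, j = rng.integers(L, size=2)
    di, dj = [(1, 0), (-1, 0), (0, 1), (0, -1)][rng.integers(4)]
    ni, nj = (i + di) % L, (j + dj) % L
    acc = accumulated_payoffs(s, b, r, c)
    if rng.random() < imitation_probability(acc[i, j], acc[ni, nj], K):
        s[i, j] = s[ni, nj]
```

For example, with `rng = np.random.default_rng()` and `s = rng.integers(0, 2, size=(L, L))`, calling `elementary_step` $L^2$ times would correspond to one full Monte Carlo step under the usual convention, which is itself an assumption here.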

RESULTS
To verify the impact of our reward mechanism on cooperation, Figure 1 gives a contour plot of the fraction of cooperation $\rho_c$ as a function of the reward $r$ and the temptation to defect $b$, with the cost fixed at $c = 0.01$. First, when $b$ is relatively large, the fraction of cooperation $\rho_c$ remains at a high level owing to the reward mechanism ($r > 0$). In sharp contrast, in the traditional case, cooperation disappears when $b > 1.04$. Second, the contour plot has an obvious dividing line. The area to the upper left of this line, which corresponds to smaller $b$ and larger $r$, is where cooperation survives. In other words, once the reward mechanism is introduced, for the same $b$ a higher reward $r$ leads to a higher fraction of cooperation $\rho_c$. All in all, our reward mechanism strongly promotes cooperation. Figure 2 shows the impact of both cost $c$ and reward $r$ on the evolution of cooperation. Interestingly, increases in both $c$ and $r$ make conditions more favorable for the survival of cooperation. Moreover, the boundaries among the cooperation phase, the mixed phase, and the defection phase are close to straight lines. This raises the question of under what conditions complete cooperation forms, which we explain as follows. The key point is whether cooperators can resist the invasion of defectors on the boundary between a cooperator cluster and a defector cluster, which depends on the payoffs of the two types of players. Let us consider a common situation on the boundary, as shown in Figure 3.
In Figure 3, we show the payoffs of a cooperator and a defector on the boundary, which are marked with dotted boxes. The defector first gets a payoff of $2b$ from his two cooperative neighbors and 0 from his two defective neighbors. Obviously, his payoff ($2b$) is higher than the average payoff of his four neighbors (1) because of his interaction with cooperators on the boundary. According to the reward mechanism we introduced, he must pay a cost of $2c$ to reward his two defective neighbors, but he does not get any reward from his neighbors with the same strategy because of their low payoffs. Therefore, his updated payoff is $2b - 2c$. The situation is different for the cooperator, who first gets 2 from his two cooperative neighbors. Interacting with defectors makes his payoff (2) lower than the average payoff of his four neighbors ($2 + b$). Thus, he is exempt from paying the cost but receives rewards from his two cooperative neighbors, which increases his payoff to $2 + 2r$. Obviously, the critical condition is
$$2 + 2r = 2b - 2c, \quad \text{i.e.,} \quad r + c = b - 1.$$
For this situation, if $r + c < b - 1$, the cooperator is at a disadvantage in the game, and vice versa. This explains why increases in both $c$ and $r$ make conditions more favorable for the survival of cooperation. Moreover, the critical condition is a linear function of $r$ and $c$, which explains why the boundaries among different phases are close to straight lines, as shown in Figure 2.
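The boundary argument above can be checked numerically. The following is a minimal sketch that only encodes the two payoffs quoted in the text for the Figure 3 configuration ($2b - 2c$ for the boundary defector and $2 + 2r$ for the boundary cooperator); the function names are hypothetical and chosen for illustration.

```python
def boundary_payoffs(b, r, c):
    """Updated payoffs of the boundary cooperator and defector in the Figure 3 setting."""
    cooperator = 2 + 2 * r    # two cooperative neighbors, each rewarding him
    defector = 2 * b - 2 * c  # exploits two cooperators, pays to reward two defectors
    return cooperator, defector

def cooperator_favored(b, r, c):
    """True when r + c > b - 1, i.e. the boundary cooperator earns more."""
    cooperator, defector = boundary_payoffs(b, r, c)
    return cooperator > defector

def critical_b(r, c):
    """Temptation at which both boundary payoffs are equal: b = 1 + r + c."""
    return 1 + r + c
```

With $c = 0.1$ and $r = 0.3$, for instance, the critical temptation in this local picture is $b = 1.4$: `cooperator_favored(1.3, 0.3, 0.1)` holds, while `cooperator_favored(1.5, 0.3, 0.1)` does not. This is a local estimate for one boundary configuration, not a prediction of the full phase boundary.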
As mentioned before, the support that cooperation clusters give to cooperators on their boundaries, and the absence of such support for defectors on the boundaries of defection clusters, may be the underlying reason why cooperation is promoted. To confirm this, we show characteristic snapshots in Figure 4, where the different types of nodes, namely cooperator (C), cooperative rewarder (CR), defector (D), and defective rewarder (DR), are marked in four different colors. Here the cost is fixed at 0.1 and, from top to bottom, the reward is set to 0.1, 0.3, and 0.6, respectively. Obviously, the evolution of the game differs greatly under different values of reward when the cost is fixed. It is worth mentioning that the CR-C-DR-D arrangement shown in Figure 3 indeed appears on the boundaries between clusters of cooperators and clusters of defectors at any value of reward. However, different values of $r$ lead to different evolutionary tendencies, which in turn lead to different results. When $r$ is relatively small (0.1), cooperation clusters cannot provide adequate support to the cooperators on their boundaries. Hence the cooperators cannot resist the invasion of the defectors. As $r$ increases, cooperators begin to gain the advantage on the boundaries. When $r = 0.5$, it can be observed that while the cooperation cluster invades the defection cluster, it is also invaded by the defectors. Our explanation for this phenomenon is that when the cluster expands, its boundary structure changes. In particular, the expansion of clusters produces ragged boundaries, which weakens the support that the cluster gives to its boundary players. When $r = 1$, cooperation has an absolute advantage over defection, so it expands rapidly and soon occupies the entire region. Figure 5 shows how the size of the largest cluster of all kinds of cooperators (including cooperators and cooperative rewarders), $S_{cr}$, and the size of the largest cluster of cooperators who are not rewarders, $S_c$, at the last MC step vary with $b$ when the cost is 0.1 and the reward is 0.3.
When $b \le 1.1$, there is only one cluster of cooperators, and its size equals the size of the network because cooperation dominates the system. In this case there are no rewarders in the population of cooperators. As $b$ increases, the cluster of cooperators is invaded and its size decreases. It should be noted that $S_c$ is smaller than $S_{cr}$, which suggests that cooperative clusters are surrounded by cooperative rewarders that keep them free from invasion. When $b$ increases further, the huge cooperation cluster disintegrates rapidly and breaks up into many small cooperation clusters until it disappears completely.
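The largest-cluster sizes discussed above can be measured with a standard breadth-first flood fill over nearest neighbors. The sketch below is an assumption about how such a measurement could be done, not the authors' code: the function name, the boolean-mask input, and the periodic boundaries are all illustrative choices.

```python
from collections import deque

import numpy as np

def largest_cluster(mask, periodic=True):
    """Size of the largest nearest-neighbor connected cluster of True sites."""
    rows, cols = mask.shape
    seen = np.zeros((rows, cols), dtype=bool)
    best = 0
    for i in range(rows):
        for j in range(cols):
            if not mask[i, j] or seen[i, j]:
                continue
            # Breadth-first search over the cluster that contains (i, j).
            size = 0
            queue = deque([(i, j)])
            seen[i, j] = True
            while queue:
                x, y = queue.popleft()
                size += 1
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if periodic:
                        nx %= rows
                        ny %= cols
                    elif not (0 <= nx < rows and 0 <= ny < cols):
                        continue
                    if mask[nx, ny] and not seen[nx, ny]:
                        seen[nx, ny] = True
                        queue.append((nx, ny))
            best = max(best, size)
    return best
```

Given a strategy array `s` (1 for cooperators) and a boolean array `rewarder`, both hypothetical names, $S_{cr}$ would correspond to `largest_cluster(s == 1)` and $S_c$ to `largest_cluster((s == 1) & ~rewarder)`.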

CONCLUSION
In the real world, individuals are more willing to reward other participants under certain conditions than to reward them unconditionally. Hence, we explore the effects of neutral and conditional rewards in structured populations. By numerical simulation, we find that cooperation can be greatly promoted, while conditional antisocial reward does not prevent the evolution of cooperation. From the microscopic perspective, we provide evidence that our mechanism enhances spatial reciprocity and is conducive to the formation of cooperation clusters. In our model, the individuals inside a cooperative cluster reward individuals of the same kind on the boundary, so that the latter can form a shield to protect the former. By contrast, defectors on the boundary gradually weaken themselves by rewarding similar individuals inside their cluster. By and large, social reward rather than antisocial reward shapes the direction of collective behavior when an individual rewards others under the condition that her payoff is higher. We hope our work is helpful in resolving social dilemmas in real society.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.