Social Performance Cues Induce Behavioral Flexibility in Humans

Behavioral flexibility allows individuals to react to environmental changes, but changing established behavior carries costs, with unknown benefits. Individuals may thus modify their behavioral flexibility according to the prevailing circumstances. Social information provided by the performance level of others provides one possible cue to assess the potential benefits of changing behavior, since out-performance in similar circumstances indicates that novel behaviors (innovations) are potentially useful. We demonstrate that social performance cues, in the form of previous players’ scores in a problem-solving computer game, influence behavioral flexibility. Participants viewed only performance indicators, not the innovative behavior of others. While performance cues (high, low, or no scores) had little effect on innovation discovery rates, participants that viewed high scores increased their utilization of innovations, allowing them to exploit the virtual environment more effectively than players viewing low or no scores. Perceived conspecific performance can thus shape human decisions to adopt novel traits, even when the traits employed cannot be copied. This simple mechanism, social performance feedback, could be a driver of both the facultative adoption of innovations and cumulative cultural evolution, processes critical to human success.

("low cued" versus "high cued"), or to no cues ("without-cues"). These cues took the form of high-scores, ostensibly from previous participants. Participants could then explore a novel three-dimensional computer environment with the overall goal of amassing as many points as possible within a limited time. Participants knew that more points resulted in a higher payment at the end of the game. Points could be gained by making and repeatedly using novel discoveries. To draw parallels to research on cultural transmission, we term these discoveries "innovations" and their repeated use "adoption" or "exploitation" of an innovation. Innovations that individuals adopt and use repeatedly will be those most likely to spread through cultural transmission (Reader and Laland, 2003b).
In the computer environment, participants were faced with trade-offs between exploration and exploitation, since exploitation of an already discovered innovation provided points, but precluded further exploration that might have led to a still more profitable discovery. Many fields investigate exploration-exploitation trade-offs, including work encompassing underlying neural and genetic mechanisms (March, 1991; Daw et al., 2006; Cohen et al., 2007; Frank et al., 2009). As Cohen et al. (2007) note, social cues are a potentially critical but understudied factor in determining the balance struck between exploration and exploitation.
Innovations were initially unknown to the participants and had to be discovered via costly trial and error, since actions cost time and points, and only particular combinations or sequences of actions resulted in rewards. For example, participants could discover that using two tools sequentially with a certain object resulted in a points gain, but if participants tried tools in the wrong sequence or with the wrong object they lost time and points. By allowing participants to explore a computer environment with all actions recorded, we could identify the first time an action sequence was performed (discovery of an innovation) and whether these sequences were repeated (exploitation of an innovation). There were three classes of innovation to be discovered that differed in complexity, and, within each class, four sub-innovations that used related techniques. Thus discovery of one sub-innovation could facilitate discovery of the other sub-innovations within the same class, since similar techniques could be applied. This could represent, for example, discovery of a novel food processing technique, which could then be applied to different foods. To be successful players had to both explore their virtual world and exploit the innovations they discovered.

Participants
Participants were 64 adult undergraduate students (30 females, M = 23.6, SD = 4.7 years) of Utrecht University and Hogeschool Utrecht, The Netherlands. Participants were recruited by posters and were paid on average 8.1 Euros (ca. $11.6 US, range 6-10 Euros). We assigned participants randomly to the experimental treatments. Before beginning, written instructions were given and read to the participants, participants could ask questions aloud, and written informed consent was obtained from all participants. Players could not ask questions during the actual game, but could ask questions regarding the post-game questionnaire in private. The procedures and questionnaires were approved by the Medical Ethics Review Committee (METC) of the Universitair Medisch Centrum (protocol number 06-672) and comply with the ethical guidelines of the APA and the principles expressed in the Declaration of Helsinki.

Experimental task
Participants played a computer game involving exploration of a virtual three-dimensional double T-maze. Two to six players played simultaneously in the same room, in private booths visually isolated from one another and wearing headphones so that other players' key strokes were inaudible. Players were instructed that they had to collect as many points as possible in the upcoming 20-30 min and that remuneration would depend on their score. Points could be scored by selecting one of four available tools and using them somewhere in the maze. A red box appeared on the screen in areas where objects were present on which tools could be used and potentially give points. There were four such areas, one located at the end of each of the four maze arms and marked by colored cone-shaped objects. Different tools carried different costs, just as "real world" tools might differ in the effort needed to use them. Each tool cost a certain number of points per use (3, 6, 9, or 12), regardless of where or how it was used. Thus incorrect use of a tool cost points. Tools were pictured as arbitrary symbols, with the four symbols randomly allocated to each tool for each player to eliminate any influence of consistent preferences for a particular symbol. Players could discover three ways to gain points, here termed "innovations," which varied in their difficulty of discovery: (a) tool-object combinations: a specific tool used at a specific cone gave points (for example tool C had to be used at cone 1); (b) tool-tool-object combinations: the tool-object innovation tool used on the appropriate cone followed by another specific tool used at the same cone gave points [innovation (b) thus builds on innovation (a); for example tool C then tool A had to be used at cone 1]; (c) object-object-object combinations; visits to three cones in a certain order, using any tool at each cone, gave points (for example cone 1 then 2 then 3 had to be visited in sequence, using any tool at each cone). 
To discourage players from using tools in rapid succession the recently activated cone became inactive for 2 s after a tool was used there, visualized by the cone becoming gray. It took approximately 3-5 s for players to move directly from cone to cone. Players were informed how to select tools and move around the maze, that they began with 1000 points, that tools had costs, that tools could result in gains when the red box appeared, that gray indicated inactivity, and that the game would end if their points dropped to zero. They were otherwise unaware of how to gain points or the relative costs of the tools.
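As an illustration of how the three innovation types could be recognized in a recorded action log, the following Python sketch scans a list of (tool, cone) events. The rule sets reuse the examples given in the text (tool C at cone 1; tool C then tool A at cone 1; cones 1, 2, 3 in sequence). The function and variable names are hypothetical, and the requirement that a sequence consist of strictly consecutive actions is an assumption, since the text does not state whether intervening actions break a sequence.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, int]  # (tool label, cone number)

# Example rules taken from the text; the real game had four per class.
TOOL_OBJECT = {("C", 1)}
TOOL_TOOL_OBJECT = {("C", "A", 1)}
CONE_SEQUENCES = {(1, 2, 3)}


def detect_innovations(events: List[Event]) -> List[Tuple[int, str]]:
    """Return (event index, innovation class) for every completed sequence.

    Assumption: a sequence counts only if its actions are consecutive.
    """
    found = []
    for i, (tool, cone) in enumerate(events):
        # (a) a specific tool used at a specific cone
        if (tool, cone) in TOOL_OBJECT:
            found.append((i, "tool-object"))
        # (b) two specific tools used in order at the same cone
        if i >= 1:
            prev_tool, prev_cone = events[i - 1]
            if prev_cone == cone and (prev_tool, tool, cone) in TOOL_TOOL_OBJECT:
                found.append((i, "tool-tool-object"))
        # (c) three cones visited in order, any tool at each cone
        if i >= 2:
            cones = (events[i - 2][1], events[i - 1][1], cone)
            if cones in CONE_SEQUENCES:
                found.append((i, "object-object-object"))
    return found


def summarize(found: List[Tuple[int, str]]) -> Counter:
    """Count uses per class; a count of one or more means 'discovered'."""
    return Counter(label for _, label in found)


log = [("C", 1), ("A", 1), ("B", 2), ("D", 3)]
hits = detect_innovations(log)
usage = summarize(hits)
```

In this sketch the first hit of each class corresponds to discovery, and the per-class count corresponds to exploitation.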
For each innovation, four "sub-innovations" existed. That is, for the tool-object combination there was one correct combination for each cone (e.g., the four sub-innovations could be tool C with cone 1, B with cone 2, A with 3, and D with 4). For the tool-tool-object combination there was one correct combination for each cone (e.g., the four sub-innovations could be tools C then A at cone 1, B then D at cone 2, A then B at 3, and D then C at 4). For the object-object-object combinations, each sub-innovation included three different cone locations that had to be visited, and all cone locations were included three times in this innovation (e.g., the four sub-innovations could be to visit cone 1 then 2 then 3, 4 then 2 then 1, 3 then 1 then 4, and 2 then 3 then 4). We could thus examine how many innovations players discovered, and to what extent they explored and exploited a particular innovation's "parameter space" (i.e., the sub-innovations).

(2007). We included gender, previous experience with computer games (self-reported on a 7-point Likert scale), and experimental treatment as independent variables, together with their two-way interactions. Gender and previous computer experience, however, had no significant influence in any of our models, and we thus report models that include only treatment as an independent variable. Significance tests of GLMs were conducted via F and χ² statistics. For significant differences between treatments we report Cohen's d for GLMs as effect size (equation 10, Nakagawa and Cuthill, 2007).

Results
Social cues of high-performance increase performance, aspirations, and activity
Players exposed to high-performance cues scored significantly more points than those exposed to low performance or no cues [points, high cued: Mdn = 2093, median absolute deviation (MAD) = 1623; low cued: Mdn = 1337, MAD = 431; without-cues: Mdn = 1183, MAD = 753; GLM (quasipoisson errors, log link function): F(2, 61) = 5.7; p = 0.005; see Table 1 for treatment contrasts].
Participants were asked to estimate their final scores prior to the game, and these estimates were significantly higher for high-cued and without-cue participants than low-cued participants [high cued: Mdn = 2700, MAD = 1038, range 600-10000; low cued: Mdn = 1168, MAD = 101, range 600-1500; without-cues: Mdn = 2000, MAD = 1482, range 250-12000; GLM (quasipoisson errors, log link function): F(2, 61) = 9.6; p < 0.001; Table 1]. Only one player in the low-cued and one in the high-cued treatment estimated their score would be higher than the maximum high-score they observed on the posters. In fact, six low-cued and four high-cued players beat the maximum high-score they had observed. Activity, measured as the number of tool-activating keystrokes during the game, was significantly higher when high cued than low cued, while the activity of
The points received from a sub-innovation depended on how many innovations a player had previously discovered and how often the player had already exploited that particular sub-innovation. The first innovation discovered yielded 10 points, the second 100 points, and the third 1000 points. For example, an object-object-object innovation would give 10 points if it was discovered before the tool-object and tool-tool-object innovations, 100 points if discovered after the tool-object innovation, and 1000 points if discovered last. Thus the third innovation discovered gave the most points, regardless of which innovation it was. Repeated use of a sub-innovation gave fewer and fewer points [calculated by multiplying the points gained (10, 100, or 1000) by $0.85^{s}$, where $s$ is the number of uses of that sub-innovation], until the gains equaled the tool cost and zero points were received. The decline in reward with repeated use applied independently to each sub-innovation. For example, the second tool-object sub-innovation discovered would initially score 10 points. Points gained were displayed on the screen directly after using a tool, colored green for gains and red with a minus symbol for losses. The game lasted exactly 20 min, after which participants completed a questionnaire, were paid in private, and debriefed.
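The reward schedule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the game's actual code: the names are hypothetical, the exponent is taken to be the number of previous uses (so that the first use pays the full base amount), and the point at which a sub-innovation stops paying is modeled as the first use whose decayed gain no longer exceeds the tool cost.

```python
BASE_REWARDS = (10, 100, 1000)  # first, second, third innovation class discovered
DECAY = 0.85                    # multiplicative decline per repeated use


def gross_reward(discovery_rank: int, prior_uses: int) -> float:
    """Points paid for using a sub-innovation.

    discovery_rank: 0, 1, or 2, the order in which its class was discovered.
    prior_uses: assumed to count earlier uses of this same sub-innovation.
    """
    return BASE_REWARDS[discovery_rank] * DECAY ** prior_uses


def uses_until_exhausted(discovery_rank: int, tool_cost: int) -> int:
    """Number of uses before the decayed gain no longer exceeds the tool cost."""
    uses = 0
    while gross_reward(discovery_rank, uses) > tool_cost:
        uses += 1
    return uses


first_use = gross_reward(2, 0)            # full 1000-point reward
second_use = gross_reward(2, 1)           # 1000 * 0.85
cheap_tool_limit = uses_until_exhausted(1, 3)
```

Because the decline applies independently to each sub-innovation, a player in this sketch would gain more by rotating among the four sub-innovations of a class than by repeating a single one.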

Experimental manipulation
After instruction, we asked participants to guess how many points they would score. In two of the three experimental treatments we pointed out two identical handwritten posters on the wall, and informed players that these showed the top 10 previous scores of inexperienced players, to provide an orientation of how many points others scored in earlier sessions. We constructed the scores to manipulate the available social information on past performance. In the low-performance cues experimental treatment (n = 21), the presented scores ranged from 958-1453 points, compared with 2711-4112 points in the high-performance cues treatment (n = 21). The scores were distributed evenly with small random deviations. To achieve the top score in the low-performance treatment players needed to discover at least two innovations, whereas in the high-performance treatment all three innovations had to be discovered. In the control treatment players saw no posters (n = 22). Since the instructions mentioned 1000 starting points, players in the control treatment had an indication of the scale on which points were measured. The maximum possible score was approximately 40000.

Analysis
For each player we recorded the points scored, number of actions, tools used, innovations discovered, how often innovations were employed, and which sub-innovations were used. An individual was assessed as having discovered an innovation if it was performed once or more. The number of discoveries of innovations provides a measure of how exploratory players were. To measure exploitation of innovations, we examined the number of times a given innovation and its sub-innovations were used. High usage of a specific innovation indicates that a behavior has been adopted, i.e., incorporated in the repertoire of the participant. Data were analyzed using generalized linear models (GLMs) in R 2.9.2, with link functions and underlying error distributions chosen according to Crawley

To measure innovation exploitation rates, we totaled the number of times the sub-innovations of each innovation class were performed. All participants exploited the tool-object innovation, typically exploiting all four tool-object sub-innovations. High-cued participants exploited the tool-tool-object and object-object-object innovation classes more often than participants exposed to low or no cues [high cued: Mdn = 13.0, MAD = 19.3; low cued: Mdn = 3.0, MAD = 4.4; without-cues: Mdn = 5.5, MAD = 6.7, Figure 1B, GLM (quasipoisson errors, log link function): F(2, 61) = 9.3; p < 0.001; Table 1]. An identical pattern was observed when restricting the analysis of exploitation rate to those players that discovered at least two innovation classes, removing players that discovered only the tool-object innovation [F(2, 49) = 9.1; p < 0.001]. In summary, high-cued participants exploited innovations more than did low-cued participants or participants that saw no cues, although their innovation discovery rates were similar.
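For readers unfamiliar with the summary statistics used here, the median (Mdn) and median absolute deviation (MAD) can be computed as in the Python sketch below. Whether the reported MADs include the consistency constant 1.4826 (the default in R's mad function) is not stated in the text, so the constant is exposed as a parameter. The sample counts are made up purely for illustration and are not data from the study.

```python
from statistics import median
from typing import Sequence


def mad(values: Sequence[float], constant: float = 1.4826) -> float:
    """Median absolute deviation around the median.

    constant=1.4826 mirrors the default of R's mad(); pass 1.0 for the raw MAD.
    """
    center = median(values)
    return constant * median(abs(v - center) for v in values)


# Hypothetical exploitation counts, not taken from the experiment.
toy_counts = [1, 2, 3, 4, 100]
toy_median = median(toy_counts)
toy_raw_mad = mad(toy_counts, constant=1.0)
toy_scaled_mad = mad(toy_counts)
```

The single extreme value in the toy data barely affects either statistic, which is one reason medians and MADs suit skewed count data such as scores and usage frequencies.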

Social influences on discovery versus exploitation of innovations
We distinguished between the initial performance of an innovation (discovery) and its repeated use (exploitation). There were three classes of innovation that could be discovered and exploited (tool-object, tool-tool-object, and object-object-object: see above).
All players discovered the tool-object innovation, and always before any other innovation; thus social performance cues had no detectable effect on this first innovation. Similar numbers of participants discovered a second innovation (either tool-tool-object or object-object-object) after exposure to high, low, or no performance cues (high cued: 18 of 21; low cued: 15 of 21; without-cues: 19 of 22), and there were no significant differences in discovery rates [GLM (binomial errors, logit link function): difference in deviance (χ² distributed on 2 df) = 1.9, p = 0.38; Table 1]. Only one participant (of 64) discovered all three innovations, thus precluding separate analysis of the third innovation discovered.

perception of high performance in other individuals appears to facilitate the exploitation of innovations, i.e., the adoption of a novel trait, rather than perception of low-performance depressing the exploitation of innovations.
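The comparison of second-innovation discovery rates can be checked by hand with a likelihood-ratio (deviance) test of equal proportions, which is what a binomial GLM with treatment as the only factor reduces to. The Python sketch below uses only the counts given in the text (18 of 21, 15 of 21, 19 of 22). The function name is ours, and the closed form exp(-G/2) for the χ² tail probability holds for 2 df only.

```python
import math
from typing import Sequence


def deviance_equal_proportions(successes: Sequence[int], totals: Sequence[int]) -> float:
    """G statistic: drop in binomial deviance from a common-rate model
    to a model with a separate rate for each group."""
    pooled = sum(successes) / sum(totals)
    g = 0.0
    for k, n in zip(successes, totals):
        cells = ((k, n * pooled), (n - k, n * (1.0 - pooled)))
        for observed, expected in cells:
            if observed > 0:  # a zero cell contributes nothing
                g += observed * math.log(observed / expected)
    return 2.0 * g


discovered = [18, 15, 19]   # high cued, low cued, without-cues
group_sizes = [21, 21, 22]

g_stat = deviance_equal_proportions(discovered, group_sizes)
p_value = math.exp(-g_stat / 2.0)  # chi-square upper tail, valid for 2 df only
```

With these counts the deviance comes to about 1.9, in line with the reported value, and the p-value is approximately 0.39, close to the reported 0.38.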
Social cues of high performance ("social performance feedback") can drive the adoption of innovations, at least in our experimental setting. Social comparison theory has inspired extensive research addressing how people compare themselves with others and the outcomes of such self-comparison (Festinger, 1954; Buunk and Gibbons, 2007; Johnson and Stapel, 2007). Social comparison is widely observed in humans, with individual and cultural differences, and has been linked to underlying neurocognitive substrates (Buunk and Gibbons, 2007; Fliessbach et al., 2007). Social comparison can elicit increased performance (Gibbons et al., 2000; Huguet et al., 2001; Johnson and Stapel, 2007), even over and above an individual's goal to improve personal performance (Gollwitzer, 1999). However, we did not examine the typical research foci of social comparison studies, such as influences of the similarity or expertise of other individuals (Wheeler et al., 1997; Suls et al., 2002). Moreover, we did not set out to investigate whether the processes we describe are unique to the social domain. It is possible that asocial indicators of achievable performance would have produced identical results. For example, we could have informed players that the high-scores they observed were produced by computer-controlled agents making random choices. Such asocial cues may be rare. In real-life situations performance indicators will typically only be available through individuals interacting with the world, meaning that performance indicators must come from individual exploration and exploitation or from social cues. Many individual performance indicators are available, allowing individuals to assess performance without social cues. For example, deviation from past rates of personal success ("historical performance feedback") has received considerable empirical attention (e.g., Greve, 1998).
However, social cues are also readily available and provide an efficient means to assess achievable performance, particularly when success could be achieved in many domains and there are multiple measures of performance. For example, an author could be achieving 100% success in articles being accepted, but remain unaware of whether this was maximal performance. Social information could inform the author that others are publishing more papers, in better journals, expending less effort per article, or achieving in some other domain simultaneously. We demonstrate here that social performance indicators influence rates of individual exploitation. We suspect that social cues will be more salient and widely used than many asocial cues, but this is an issue for further empirical test.
A possible objection to our interpretation of this study is that the innovations we describe are insufficiently complex to provide data applicable to innovation in general. An area of controversy is how novel an act must be to qualify as innovation, and how this novelty can be objectively measured (Reader and Laland, 2003b). The observed behavioral changes are not highly novel once-in-a-lifetime inventions, and would presumably be repeatedly discovered in populations, thus failing some definitions of true innovation (see discussion in Reader and Laland, 2003a; Ramsey et al., 2007). Moreover, we do not attempt to delimit the many neural and cognitive processes potentially underlying the production and adoption of innovations. For example, we cannot distinguish between

function): F(2, 49) = 2.7; p = 0.08; players that did not discover a second innovation were excluded from the analysis]. Note that the (non-significant) pattern is opposite to that predicted by exploitation frequency: high-cued participants discovered the second innovation later than other participants.

Discussion
Individuals who were exposed to social indicators of high performance were more active, exploited innovations more, and scored more points than did individuals exposed to low performance or no cues. High-cued individuals also expected that they would do better than individuals exposed to low-performance cues. However, social performance cues had no significant effects on rates of innovation discovery. Thus cues of how others have performed had dramatic effects on the adoption of novel behavior patterns into an individual's repertoire, even when the innovations themselves cannot be observed or copied and it is not known that innovation was the reason for another's success. This suggests that innovations will be more likely to spread through groups when individuals see indicators that others are doing well.
Almost all individuals in our study discovered a second innovation, but only those exposed to high-performance cues continued to use and exploit it. Thus individuals exposed to low-performance cues and to no cues were apparently less inclined to attend to new discoveries or to change their behavior, despite the fact that such discoveries resulted in a points yield 10-fold that previously experienced. This finding is all the more striking given that participants knew that more points led to increased compensation at the end of the game. The exact sequence of acts leading to the increased reward was likely unclear to the participants upon first discovery, so further exploration was needed to ascertain how to achieve this yield again, requiring further investment of points. Players exposed to low-performance cues avoided this investment. They appeared less willing to spend points, slightly preferring low-cost tools, for example. Furthermore, their reported expectations were lower, arguably reflecting lower aspiration levels (goals) than high-cued players. A 100 point windfall would bring them near their mean expectation (1168 points) or the highest viewed score (1453 points). Moreover, they could beat the lowest viewed high-score by simply avoiding excessive loss of points. They may thus have been satisfied with a low rate of reward, and pursued a risk-averse strategy to avoid losing accumulated points. Compared to low-cued players, players that saw no performance cues had higher expectations (i.e., they predicted they would score more points) and did not avoid high-cost tools. However, like low-cued players, they did not exploit already discovered innovations as readily as high-cued players. Thus

probability that they will be maintained, and increasing within-population variance in the traits displayed, all of which could influence cultural evolution (Galef and Whiskin, 1997; Strimling et al., 2009).
Spear designs, for example, are socially transmitted but have to be adjusted to individual strength and weight in order to be used efficiently (Hughes, 1998). Thus, social performance feedback could result in the accelerated diffusion and modification of innovations.
A visible lack of high performers could decrease innovation rates, particularly if acquisition of a trait is costly or benefits are delayed. Moreover, social performance feedback could hamper cumulative cultural evolution by accelerating the uptake of innovations, if high-performance cues result in "over-innovation," i.e., suboptimally high levels of innovation which cause the loss of established beneficial traditions or prevent the exploitation of currently available resources. Social performance cues may themselves form a positive feedback cycle driving cumulative cultural evolution where only information on performance, and not on the acts or traits, is available. That is, social performance feedback results in higher performance, resulting in still higher performance for the next cultural generation. Such a feedback cycle could result from observed performance, or from other feedback on performance, such as verbal prompting or disapproval or approval of others' performance levels (for analysis of the influence of approval of an act, see Castro and Toro, 2004). Thus relatively simple social cues based on performance comparisons could have widespread influences on human cultural evolution.

Conclusion
Social information on the performance level of other individuals influenced individual decisions to exploit novel behavior patterns in an unfamiliar computer environment. Thus social cues can bias the way in which novel behaviors are adopted and expressed, raising the possibility that social cues will influence the extent of cumulative cultural evolution. Moreover, since individuals exposed to social cues of high-performance outperformed other individuals and incorporated more innovations in their repertoire, social performance feedback provides a mechanism by which differences in behavioral flexibility can themselves be transmitted.

Acknowledgments
We thank N. J. Boogert, S. M. Hrotic, G. J. M. Lucas, V. A. Sarkol and the reviewers for helpful discussion, comments or assistance and the Netherlands Organisation for Scientific Research (NWO) "Evolution and Behaviour" and "Cognition" Programmes for funding.

discoveries made accidentally versus those involving some foresight. However, our aim was to investigate social influences on behavioral flexibility, and our findings now open the possibility of studying the mechanisms and generality of these social influences. Our procedure allowed us to study processes related to innovation in experimental settings where reasonable numbers of discoveries differing in complexity were made in the allotted time.
Social cues provided information on potential gains from further exploration and investment in tool use, informing a decision about behavioral change. In other words, social information revealed an opportunity space that promoted change, probably by reducing participants' sensitivity to exploration costs and stimulating effective exploitation. In our experiment, those exploiting innovations outperformed others, and thus those exposed to high-performance cues performed best. However, in other situations innovators may do poorly, meaning that sensitivity to high-performance cues would produce suboptimal behavior. For example, high performance could be the result of luck (Offerman and Schotter, 2009), in which case focusing on the highest performance levels may present a false image of the possible benefits of innovation. Humans in experimental settings have strong tendencies to copy the best even when this is not advantageous (Offerman and Schotter, 2009), meaning that the use of performance cues could on occasion result in suboptimal behavior. Our results add weight to a growing literature showing that prevailing conditions and recent experience can influence which social and individual information gathering strategies are employed (Laland, 2004; Kendal et al., 2005; Efferson et al., 2008; Galef, 2009; Leadbeater and Chittka, 2009; Toelch et al., 2009). Social performance feedback could influence both the rate and course of cultural evolution. Cumulative cultural evolution (Tomasello, 2000) demands (i) adoption of novel behavior by the inventor, (ii) adoption of this novel solution by other individuals, and (iii) further innovation based on the acquired behavior patterns. Adoption of a novel behavior frequently requires abandonment of an old, inferior behavior, a process chimpanzees (Pan troglodytes) find challenging (Marshall-Pescini and Whiten, 2008; Hrubesch et al., 2009).
This process is key to human cultural evolution because for many problems a solution already exists, such as hammers replacing blunt stones (Basalla, 1988). Our demonstration that social performance cues increase a discoverer's rate of abandonment of old acts and adoption of new ones raises the possibility that such cues could also increase social learning of novel acts, because adoption is more likely. Moreover, our results suggest social performance feedback will increase experimentation with acquired traits, potentially improving their utility, affecting the