Self-confidence, Overconfidence and Prenatal Testosterone Exposure: Evidence from the Lab

This paper examines whether foetal testosterone exposure predicts the extent of confidence and overconfidence in one's own absolute ability in adulthood. To study this question, we elicited incentive-compatible measures of confidence and overconfidence in the lab and correlated them with measures of right-hand 2D:4D, used as a marker for the strength of prenatal testosterone exposure. We provide evidence that men with higher prenatal testosterone exposure (i.e., low 2D:4D ratio) are less likely to set unrealistically high expectations about their own performance. This in turn helps them to gain higher monetary rewards. Men exposed to low prenatal testosterone levels, by contrast, set unrealistically high expectations, which results in self-defeating behavior.


Introduction
Self-confidence and overconfidence play a crucial role in people's decisions and welfare.
While positive thinking can enhance motivation and improve performance, being overly confident (i.e., believing one is better than one actually is) can be self-defeating (Benabou and Tirole, 2002). Indeed, overconfidence bias has been used to explain phenomena such as business failures (Camerer and Lovallo, 1999), stock market bubbles and excessively frequent trading (Barber and Odean, 2001; Grinblatt and Keloharju, 2009).
An important question that arises is what determines the level of self-confidence and overconfidence. It is known that nurture plays a role: one's own mastery experiences and the observed successes of similar others can influence people's confidence (Bandura, 1997). Does nature play a role too?
We address this question by examining whether prenatal testosterone exposure determines people's confidence and overconfidence about their own ability to perform a rather unfamiliar and challenging task. 1 We found that, ceteris paribus, male subjects exposed to low prenatal testosterone levels were more likely to overestimate their actual performance.
Such overestimation, rather than being a rational strategy to increase motivation and hence performance, proved to be self-defeating. Overconfident participants earned significantly less than participants who were rather conservative in their expectations.
As a marker for the strength of prenatal testosterone exposure we used the ratio of the length of the index finger to the length of the ring finger (2D:4D). We followed the vast literature started by Manning et al. (1998), which shows that individuals with conditions associated with very high prenatal testosterone levels exhibit significantly smaller 2D:4D (Brown et al., 2002). 2
To measure confidence and overconfidence, we implemented an incentive-compatible scheme. We introduced participants to an unfamiliar task, and we asked them to report the number of tasks they expected to solve during the experiment. Their total final earnings depended on the precision of their estimate, so subjects had incentives to truthfully report their expected performance (i.e. their confidence). 3 Our experimental design also allowed us to measure the degree of overestimation of actual performance (i.e. overconfidence) in an incentive-compatible way. We paid the subjects a piece rate during the performance task, so, when performing, subjects had sufficient material incentives to perform up to their maximal potential. The difference between these two incentive-compatible measures (i.e. expected minus actual performance) constituted our incentive-compatible measure of overconfidence.
1 Prenatal testosterone exposure has been shown to have important organizing effects on brain development, several psychological traits and behavior (see Tobet and Baum, 1987).
2 The most direct evidence for the link between 2D:4D and prenatal testosterone exposure comes from Lutchmaya et al. (2004), who measure foetal oestrogen and testosterone levels before birth and record digit lengths at age two. They find that the right-hand digit ratio is significantly correlated with prenatal testosterone levels and the ratio of testosterone to oestrogen levels.
3 The incentive-compatible scheme of payments we used was also implemented by Mobius and Rosenblat (2006) to measure self-confidence in a lab setting. The next section describes the mechanism in detail.
This paper contributes to the literature in several ways. First, overconfidence is "perhaps the most robust finding in the psychology of judgment" (De Bondt and Thaler, 1995, p. 389). Here we provide evidence that it is, at least partially, biologically determined.
Second, our results unify two well-known empirical findings in the economics and finance literature. On the one hand, Barber and Odean (2001) find that overconfident traders earn lower returns than more conservative traders. On the other hand, Coates et al. (2009) show that male traders who earn higher long-term returns and remain in business longer have been exposed to high prenatal testosterone levels (i.e. lower 2D:4D). Hence, our results reconcile these two pieces of independent evidence, providing a plausible explanation (Sapienza et al., 2008). However, to our knowledge, this is the first paper investigating the link between 2D:4D and confidence and overconfidence. 4
The rest of the paper is organized as follows. Section 2 introduces the experimental method. Section 3 describes the data and Section 4 introduces the results. Section 5 concludes.

Method
We designed an experiment to measure the three variables of interest: (ex-ante) self-confidence, ex-post overconfidence and the second-to-fourth digit ratio (2D:4D). Through emails and leaflets, we recruited two hundred and fifty-five undergraduate and graduate students from the University of Warwick. We conducted twelve sessions with approximately twenty students each. Each session lasted sixty minutes. The average payment was £14, including a show-up fee of £5.
In each session, the sequence of the experiment was as follows. Once each subject had read and signed the consent form, the experimenter read the experimental instructions aloud; these included a description of the task and the monetary payments. 5 Participants were informed that they had twenty minutes to complete the same task repeatedly and that they would be paid 100 points (equivalent to £1) per completed task. Subjects were given one minute of practice time to become familiar with the task, after which we elicited their self-confidence in the following way. 6 We asked them to predict the number of tasks they expected to successfully complete in the twenty minutes of performance time. The answer to that question constituted our measure of self-confidence. In Section 2.1 below we describe the incentive-compatible mechanism of self-confidence elicitation. Once the subjects had reported their prediction, they performed the task for twenty minutes. When they finished, they were asked to fill in a questionnaire, they were paid and their right hands were scanned. Below we describe in more detail the manner in which self-confidence, overconfidence and the 2D:4D were measured.

Confidence, Overconfidence and Incentives Scheme
Self-confidence is broadly defined as a feeling of trust in one's ability, quality and judgment.
The social psychology literature has operationalized this broad concept using two related constructs: "perceived self-efficacy" and "outcome expectations". Perceived self-efficacy is a judgment of capability to execute given types of performances; outcome expectations are judgments about the anticipated outcomes that would arise from such performances (Bandura, 1977, 1986). 7 Both psychological concepts are usually measured with surveys composed of several rather broad statements with which respondents agree or disagree on a Likert scale. For example, perceived self-efficacy scales include items such as "I can solve most problems if I invest the necessary effort" or "I can usually handle whatever comes my way". Outcome expectancy scales contain statements of the type "If I quit smoking I will save money" or "If I quit smoking I will gain weight." Although these scales have proved useful in many settings, they were not appropriate for the purpose of this paper for the following reasons. First, we required a unidimensional and easily interpretable measure of how confident the person was about his/her capacity to perform an unfamiliar task in the lab. These scales are rather multidimensional and general. Second, this paper also aimed at measuring overconfidence, so we needed to be able to evaluate how far expectations were from actual performance. The existing psychological scales are simply not developed to measure this construct. Finally, we needed to capture the true expectations of own performance and, at the same time, we wanted to ensure that subjects performed up to their maximum capacity during performance time.
To achieve that, subjects needed to be provided with the right material incentives. In the absence of incentives, they might have answered so as to conform to the experimenter's expectations, they might not have thought carefully enough about their answer, or they might not have exerted enough effort when performing.
5 See Appendix A for the instructions and Appendices B and C for a snapshot of the screen the subjects saw.
6 One minute was only enough to understand what the task was about, but was not enough to understand how to fully solve it, except for someone who had previous expertise with a similar task. Out of the 257 subjects, only 5 subjects managed to solve the task during the practice time and we excluded them from our analysis. We explain this in more detail in Section 3.
7 Perceived self-efficacy is a very different concept from self-esteem. While perceived self-efficacy is a judgment of capability, self-esteem is a judgment of worth (Bandura, 1977, pg. 309).
In light of the above, we applied the following incentive scheme. Subjects were asked to solve a practice task for one minute. Once the practice period was over, their self-confidence C was measured by asking them to report how many tasks they expected to solve during the 20-minute period. The subject received a piece rate of 100 points per solved task (with P denoting the number of tasks solved), minus 40 points for each task that he mispredicted when estimating future performance:
$$\text{Earnings} = 100\,P - 40\,\lvert C - P \rvert$$
The misprediction penalty provided the subjects with an incentive to truthfully report the median of their perceived performance distribution. Note that this scheme implies that the effective piece rate of performance was 140 points for each successfully completed task as long as subjects stayed below their estimate, and 60 points for each successfully completed task thereafter. Hence, truthful elicitation of self-confidence was bound to somewhat distort incentives during the performance period. For this reason, we chose a generous exchange rate from points to money (£0.01 per point) to ensure that even 60 points represented a salient reward and that the subject had strong enough incentives to keep exerting effort.
Moreover, once the subject reached his estimate, he had evidently figured out how to solve the task, so the effort exerted thereafter was bound to be less costly.
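The properties of this payment rule can be checked numerically. The sketch below is only an illustration: the function names and the five-point belief distribution are hypothetical, and it assumes that performance does not depend on the reported prediction. Under those assumptions it shows that the earnings-maximizing report is the median of the beliefs, and that the marginal reward per solved task is 140 points below the prediction and 60 points above it.

```python
from statistics import median

PIECE_RATE = 100  # points per solved task (from the text)
PENALTY = 40      # points per mispredicted task (from the text)


def earnings(predicted: int, solved: int) -> int:
    """Points earned: piece rate minus the misprediction penalty."""
    return PIECE_RATE * solved - PENALTY * abs(predicted - solved)


def expected_earnings(predicted: int, beliefs: list[int]) -> float:
    """Average earnings when each believed performance level is equally likely."""
    return sum(earnings(predicted, s) for s in beliefs) / len(beliefs)


# Hypothetical beliefs (not data from the experiment): five equally
# likely numbers of solved tasks.
beliefs = [4, 6, 8, 12, 20]

# Search all reports from 0 to 30 for the one with the highest expected payoff.
best_report = max(range(0, 31), key=lambda c: expected_earnings(c, beliefs))

# Marginal points for one more solved task, below and above a prediction of 10.
marginal_below = earnings(10, 6) - earnings(10, 5)
marginal_above = earnings(10, 12) - earnings(10, 11)

print(best_report, median(beliefs), marginal_below, marginal_above)
```

Because the penalty is linear in the absolute prediction error, any report other than the median raises the expected penalty without changing the piece-rate component.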
Recall that, above and beyond confidence, we were interested in measuring the degree of overconfidence. Moore and Healy (2008) define overconfidence as the overestimation of one's actual performance, and we apply this definition in this paper. 8 Like self-confidence, the degree of overconfidence is usually measured through answers to surveys or experimental questionnaires in a non-incentivised way. For the same reasons set out above, we used an incentive-compatible measure of overconfidence. A person was considered to be overconfident when he/she expected to perform better than his/her actual performance. This measure pins down overconfidence in an incentive-compatible way because subjects had material incentives both to announce their expectations as accurately as possible and to perform as well as possible.
8 Overconfidence has also been defined in the literature as the overplacement of one's performance relative to others and as the overestimation of the precision of one's knowledge (Moore and Healy, 2008).
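The overconfidence measure described above reduces to a simple difference. The following minimal sketch uses hypothetical function names, and the label "accurate" for an exact prediction is our own shorthand rather than a term used in the paper.

```python
def overconfidence(predicted: int, solved: int) -> int:
    """Expected minus actual performance; positive values mean overestimation."""
    return predicted - solved


def classify(predicted: int, solved: int) -> str:
    """Classify a subject by the sign of the overconfidence measure."""
    gap = overconfidence(predicted, solved)
    if gap > 0:
        return "overconfident"
    if gap < 0:
        return "underconfident"
    return "accurate"  # hypothetical label for predicted == solved


# Illustrative values only, not data from the experiment.
print(classify(predicted=12, solved=9))   # overconfident
print(classify(predicted=5, solved=9))    # underconfident
```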

2D:4D and other Measures
At the end of the experiment, we scanned the right hand of each subject, measured the length of the second and fourth fingers, and calculated their ratio (2D:4D ratio). 9 Finger length was measured by two independent research assistants using a digital caliper. All data analysis was done using the average of the two independent measures of the ratio. 10
In addition to the variables of interest, we collected independent data in a post-experiment questionnaire to construct variables that were used as controls in our regressions. In particular, we elicited risk preferences using the Eckel and Grossman (2002) method. This method involves a single choice among six hypothetical gambles. The gambles differ in expected return and variance. Each gamble has two possible outcomes, each occurring with fifty percent probability. Higher-numbered gambles offer a higher expected payoff but also involve higher risk.
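The digit-ratio computation described above can be sketched as follows. The function names and the finger lengths are hypothetical; the only elements taken from the text are that the ratio is index-finger length over ring-finger length and that the two raters' ratios are averaged.

```python
def digit_ratio(index_mm: float, ring_mm: float) -> float:
    """2D:4D ratio: index-finger length divided by ring-finger length."""
    if ring_mm <= 0:
        raise ValueError("ring-finger length must be positive")
    return index_mm / ring_mm


def averaged_ratio(rater_a: tuple[float, float], rater_b: tuple[float, float]) -> float:
    """Average of the two raters' ratios, each given as (index_mm, ring_mm)."""
    return (digit_ratio(*rater_a) + digit_ratio(*rater_b)) / 2


# Hypothetical caliper readings in millimetres, not data from the study.
ratio = averaged_ratio((72.0, 75.0), (72.6, 75.0))
print(round(ratio, 3))
```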
We also used the General Self-Efficacy Scale (Schwarzer and Jerusalem, 1995) to measure generalized perceived self-efficacy (see Appendix D). This Likert-type scale consists of 10 statements. Subjects are asked to indicate how true they think each statement is for them.
The scale has been validated in several studies and is widely used internationally (Schwarzer and Born, 1997). It captures, in a general way, the belief that one can perform novel or difficult tasks.

The Task
For our experiment, we chose a computerized puzzle which consisted of a modified version of the so-called "Tower of Hanoi" (ToH) puzzle. The standard ToH consists of three rods and a number of disks of different sizes which can slide onto any rod. 11 The puzzle starts with the disks in a neat stack in ascending order of size on one rod, the smallest at the top, thus making a conical shape. The objective of the puzzle is to move the entire stack to another rod, obeying the following rules: (a) only one disk can be moved at a time, (b) each move consists of taking the upper disk from one of the rods and sliding it onto another rod, on top of the other disks that may already be present on that rod, and (c) no disk may be placed on top of a smaller disk. We used a slightly modified version of the original ToH to increase difficulty. In our case, instead of having disks of different sizes, there were disks of different colours. The rule was to always preserve the original order of colours of the disks (pink, green, blue, turquoise, brown). For example, brown could be moved on top of any other disk, but green could only be moved on top of pink, etc. 12
9 2D:4D was determined from right-hand measurements only, because right-hand digit ratios have been shown previously to display more robust sex differences and are thus thought to be more sensitive to prenatal androgens.
10 Both independent measures displayed a high repeatability (intraclass correlation 0.875). The results are qualitatively the same if the two measurements are used separately.
11 The standard ToH has been extensively studied by cognitive psychologists but very rarely used in economics (McDaniel and Rutström, 2001).
12 A screenshot of the computerized puzzle can be seen in Appendix C.
We chose this puzzle for several reasons. First, the rules of the task are easy to understand, which reduces the possibility of noise. Second, the task has a unique solution (involving thirty-one moves) which is computed by backward induction. Third, it is quite unfamiliar to subjects and it constitutes a Eureka-type problem (Cooper and Kagel, 2005): it appears to be challenging at first glance, but is simple to solve once the algorithm is figured out. This is a desirable property for a self-confidence and overconfidence measure, since it allowed us to elicit expectations in a setting in which people had imperfect knowledge of their own abilities. 13 In fact, in our experiment, only five subjects managed to solve the task in the practice time, but all subjects eventually solved it during the performance time.
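The thirty-one-move solution can be reproduced with the textbook recursive algorithm for the standard, size-ordered puzzle. The sketch below assumes that the colour order used in the experiment maps one-to-one onto a size order (pink as the largest disk, brown as the smallest); the function names are hypothetical and this is not the software used in the lab.

```python
def hanoi_moves(n: int, source: str = "A", target: str = "C", spare: str = "B") -> list[tuple[str, str]]:
    """Return the shortest move sequence for n disks as (from_rod, to_rod) pairs."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # clear the n-1 smaller disks
        + [(source, target)]                         # move the largest disk
        + hanoi_moves(n - 1, spare, target, source)  # restack the smaller disks
    )


def replay(moves: list[tuple[str, str]], n_disks: int) -> dict[str, list[int]]:
    """Replay the moves, raising an error if a disk lands on a smaller one."""
    rods = {"A": list(range(n_disks, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        disk = rods[src].pop()
        if rods[dst] and rods[dst][-1] < disk:
            raise ValueError("larger disk placed on a smaller one")
        rods[dst].append(disk)
    return rods


moves = hanoi_moves(5)        # five disks, as in the five-colour puzzle
final_state = replay(moves, 5)
print(len(moves))             # 2**5 - 1 moves
```

The recursion makes the Eureka property concrete: once the pattern is understood, every further repetition of the puzzle is a mechanical replay of the same sequence.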

Data
Two hundred and fifty-five students from Warwick University participated in the study. The sample was proportionally balanced by gender. Five subjects who solved the task in the practice time were excluded from all analyses. We decided to exclude them because their prediction of expected performance would not involve any uncertainty about their capacity to perform. Further, we excluded one outlier with an overconfidence level forty times higher than the mean and two subjects who did not report their gender. Therefore, the final sample we analyze consisted of two hundred and forty-nine subjects. Table 1 shows the summary statistics of our experimental measure of self-confidence.
On average, subjects expected to solve about ten ToHs in twenty minutes, with a standard deviation of about six. As Figure 1 shows, the frequency distribution of confidence in our data is quite dispersed and rather skewed to the right, with a median of eight, a mode of five, a minimum of zero and a maximum of thirty. Finally, although this paper is not about gender differences, it is worth noting that on average men expected to perform forty percent better than women (P < 0.01). 14
We also looked at other variables that we expected to be positively correlated with our measure of self-confidence (see Table 2). As expected, we observed a significant positive correlation with Schwarzer and Jerusalem's (1995) general measure of perceived self-efficacy (P < 0.01). Likewise, self-confidence was positively correlated with some proxies of the ability to solve the task, such as being enrolled in a mathematically oriented degree (P < 0.01) and being familiar with the task (P < 0.10). We also looked at its correlation with risk aversion, since one could expect risk-averse subjects to set lower expectations. However, we did not find evidence of a link between these two variables.
Table 3 and Figure 2 describe the data on overconfidence. Recall that those subjects whose expectations were higher (respectively lower) than their actual performance are classified as overconfident (respectively underconfident). As can be seen in Table 3, the sample is equally divided between these two groups of subjects, with only 7 percent of the subjects performing exactly as they expected to perform. Interestingly, the number of overconfident (and hence underconfident) subjects is equal for men and women.
Finally, Table 4 summarizes the data on the 2D:4D ratio. The average of 0.96 as well as the gender differences are in accordance with standard findings in the literature: male ratios are typically lower than female ratios.

Self-confidence and Prenatal Testosterone Exposure
In Table 5 we report the results of a linear regression analysis examining the relation between our measure of self-confidence and the digit ratio. 15 Self-confidence was significantly positively correlated with the digit ratio, suggesting that high self-confidence was associated with low prenatal testosterone exposure. When data were analyzed separately for men and women, we found that the effect was entirely driven by men. Also, as expected, men exhibited significantly higher self-confidence than women (P < 0.01).
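The overdispersion argument that footnote 15 gives for preferring negative binomial over Poisson regressions can be illustrated with the rounded summary statistics reported in the Data section (a mean of about ten and a standard deviation of about six). The sketch below is a rough heuristic on the raw distribution only; a formal check would look at dispersion conditional on the regressors, and the variable names are hypothetical.

```python
# Rounded summary statistics quoted in the Data section.
mean_confidence = 10.0  # "about ten" expected ToHs
sd_confidence = 6.0     # "about six"

variance_confidence = sd_confidence ** 2

# A Poisson variable has variance equal to its mean, so a ratio well
# above 1 points to overdispersion.
dispersion_ratio = variance_confidence / mean_confidence
overdispersed = dispersion_ratio > 1

print(variance_confidence, dispersion_ratio, overdispersed)
```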
The correlation between prenatal testosterone exposure and self-confidence may not reflect a causal relation between these variables but rather be due to a third variable, independently correlated with testosterone and self-confidence. For example, it may be that subjects enrolled in a mathematics-oriented degree, or who are familiar with the ToH, are also those who have been exposed to lower prenatal testosterone (i.e. high 2D:4D), and that because of their better knowledge (and not directly because of the prenatal testosterone exposure) they expected to perform better than those with a low 2D:4D. However, when we control for these two factors, the estimated coefficient on 2D:4D in the self-confidence regression remains substantially the same (Table 5, column II). The same happens with risk aversion and self-efficacy. When we include these variables in the regression, the association between prenatal testosterone exposure and self-confidence remains virtually unchanged (Table 5, columns III and IV). Interestingly, the degree of previous expertise with the task (measured with proxies such as being enrolled in a maths degree or familiarity with the task) has a significant positive correlation with male (rather than female) self-confidence, whereas perceived self-efficacy is significantly positively correlated with female (rather than male) self-confidence.
Table 6 reports results on the relation between our measure of overconfidence and the digit ratio. Recall that overconfidence is defined as expectations minus actual performance, so this variable takes positive values when the person is overconfident and is increasing in the degree of overconfidence. When we regressed this measure on the digit ratio, we found that they were significantly positively correlated, suggesting that high overconfidence was associated with low prenatal testosterone exposure (Table 6). After controlling for possible confounding variables, such as previous experience with the task, risk aversion and self-efficacy, the association between prenatal testosterone exposure and overconfidence became even stronger (Table 6, columns III and IV). Again, we found this effect only in men. Also, as expected, we found that the higher the degree of previous expertise with the task and the higher the self-efficacy, the lower the overconfidence. 16
15 Given that self-confidence is a count variable, we replicated our analysis using Negative Binomial Regressions and our results do not change. We chose Negative Binomial instead of Poisson regressions due to overdispersion in our data (variance greater than mean).

Overconfidence and Experimental Earnings
So far we have shown that men who were exposed to higher prenatal testosterone in the womb were less likely to be overconfident. An important question that remains unanswered concerns the welfare effects of overconfidence. Was being overconfident good or bad for the subjects? Did overconfident subjects earn more money in the experiment than non-overconfident subjects?
As pointed out by Benabou and Tirole (2002), the answer is not straightforward. On the one hand, setting high expectations can improve earnings by motivating higher effort and hence improving performance. On the other hand, setting excessively high expectations can only increase the cost of not reaching them. Thus, whether overconfidence is in the end a good or a bad strategy is an empirical question. We examined this question by regressing final experimental earnings on an overconfidence dummy (see Table 8). Our regressions confirm that being overconfident was on average a bad strategy in our experiment.
Non-overconfident subjects who set their expectations below their actual potential ended up earning on average eight to nine British pounds more than overconfident subjects. 17 These results hold for both men and women, and after controlling for a series of possible confounders. The magnitude of the cost of overconfidence on earnings was very high: it was more than double the cost of not having previous experience with the task. Interestingly, the 2D:4D ratio did not affect earnings directly, but through its effect on self-confidence.
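As an aid to reading the coefficient on the overconfidence dummy: in a regression with a single 0/1 regressor and no other controls, the OLS slope equals the difference in mean earnings between the two groups. The sketch below demonstrates this with invented numbers. It is a simplification under stated assumptions: the earnings values are hypothetical and are not data from the experiment, the function name is our own, and the regressions in the paper additionally include session fixed effects and controls.

```python
from statistics import mean


def ols_slope_intercept(x: list[float], y: list[float]) -> tuple[float, float]:
    """Slope and intercept of a one-regressor OLS fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical illustration only: 1 = overconfident, 0 = not overconfident.
overconfident = [1, 1, 1, 0, 0, 0]
earnings_gbp = [10.0, 12.0, 14.0, 15.0, 16.0, 17.0]

slope, intercept = ols_slope_intercept(overconfident, earnings_gbp)

# Difference in group means, computed directly for comparison.
mean_over = mean(e for d, e in zip(overconfident, earnings_gbp) if d == 1)
mean_rest = mean(e for d, e in zip(overconfident, earnings_gbp) if d == 0)

print(slope, intercept, mean_over - mean_rest)
```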
The subjects who performed better in the lab seem to have pursued a strategy that psychologists know as "defensive pessimism": setting low expectations in uncertain situations to harness anxiety and thus perform better. This strategy was also discussed in the economic model of Benabou and Tirole (2002). In their theory, "defensive pessimism" arises from assuming that ability is a substitute for, rather than a complement of, effort in generating future pay-offs. This gives the person an incentive to discount or repress signals of high ability, as these would increase the temptation to "coast" or "slack off." In other words, considering the possibility of failure may motivate higher effort to avoid that possibility, and it is a rational strategy to follow inasmuch as it increases performance. This is, indeed, what we observe in our experimental data: overconfident subjects earned substantially less than subjects who set their expectations more modestly.

Conclusion
This is the first paper examining the biological determinants of self-confidence and overconfidence. We provide evidence that men with higher prenatal testosterone exposure (i.e. low 2D:4D ratio) are less likely to set unrealistically high expectations about their own performance. Importantly, we also show that such bias has normative implications: overconfidence was detrimental to individuals' earnings.
The evidence in this paper can be understood as a plausible explanation of why male financial traders with higher prenatal testosterone exposure remain longer in business or have higher long-term profits (Coates et al., 2009). According to our findings, these traders may be less likely to suffer from overconfidence bias, and this helps them to be more successful in the long run. This interpretation is consistent with the empirical findings of Barber and Odean (2001), who show that overconfidence is negatively correlated with traders' financial returns. 18
Our paper also provides an alternative plausible channel through which prenatal testosterone exposure may affect behavior and outcomes in other settings. For instance, prenatal testosterone has been shown to be positively correlated with performance in a range of sports. The main explanation put forward is that it promotes the development of male fighting and competitiveness, which are useful traits to succeed in sports (Manning and Taylor, 2001). The evidence presented here suggests an alternative explanation: men with high prenatal testosterone exposure may succeed in sports because they may use "defensive pessimism" strategies. That is, they may set low expectations to harness anxiety and hence perform better.
18 The other alternative explanations of Coates et al.'s (2009) findings rely on risk preferences or preferences for competition. However, there is no unambiguous empirical evidence on the link between these two preferences and 2D:4D. Moreover, we found no significant relationship between 2D:4D and risk aversion in our data (neither for men nor for women), which is also the finding of Sapienza et al. (2008) and Apicella et al. (2008). Likewise, Pearson and Schipper (2011) found no correlation between 2D:4D and competitive behavior in markets.
Notes: *** significant at 1%, ** significant at 5%, * significant at 10%.
Notes: This table shows OLS regressions of number of repetitions of tasks expected to solve in 20 minutes after one minute of practice time on the 2D:4D digit ratio. All regressions include session fixed effects and robust standard errors clustered by session are reported in brackets. *** significant at 1%, ** significant at 5%, * significant at 10%.
Notes: This table shows OLS regressions of a measure of expectations minus actual performance on the 2D:4D digit ratio. All regressions include session fixed effects and robust standard errors are reported in brackets. *** significant at 1%, ** significant at 5%, * significant at 10%.
[...] 2 if Predicted > Actual Performance on the 2D:4D digit ratio. All regressions include session fixed effects and robust standard errors clustered by session are reported in brackets. *** significant at 1%, ** significant at 5%, * significant at 10%.
[...] The dependent variable is final experimental earnings measured in GBP. All regressions include session fixed effects and robust standard errors clustered by session are reported in brackets. *** significant at 1%, ** significant at 5%, * significant at 10%.