An Experimental Examination of Demand-Side Preferences for Female and Male National Leaders

Females constitute a far smaller proportion of political leaders than their proportion in the general population. Leading demand- and supply-side explanations for this phenomenon account for some of the variance but leave a great deal unexplained. In an effort to account for additional variance, this research evaluates the issue informed by the biological theory of evolution by natural selection, a foundational explanation for the diversity and function of living organisms. It experimentally assesses how varying types of inter- and intragroup threat, a recurring ancestral problem, affect demand for female and male national leaders. This work analyzes data collected from individuals (N = 826) in the U.S. during the 2012 Cooperative Congressional Election Study. The results suggest the predominant preference for male over female leaders in some contexts may be the non-adaptive and non-functional but lingering outcome of an adaptive preference for physically formidable allies that was shaped by natural selection in ancestral environments.


INTRODUCTION
In the last several decades, women have attained unprecedented success in the electoral arena (Geiger and Kent, 2017). However, the proportion of women attaining political leadership still falls far short of their proportion of the world population. According to a recent review conducted for Pew Research Center (Geiger and Kent, 2017), less than 10 percent of United Nations member states are currently led by a female, and less than 40 percent of the 146 countries examined by the World Economic Forum have had a female head of state or government in the last 50 years. Leading explanations focus on demand-side factors such as role congruity (e.g., Eagly and Karau, 2002) and gender stereotyping (e.g., Huddy and Terkildsen, 1993) as well as supply-side factors such as differential political ambition (e.g., Fox and Lawless, 2004). Although these explanations account for some of the imbalance in female-male leadership attainment, they, like most social science models, leave a great deal to explain. This research suggests natural selection, a biologically informed approach widely used outside the social sciences but frequently overlooked within them, may offer additional explanatory leverage on this phenomenon by accounting for differences between females and males in physical formidability.
A cross-disciplinary review of the literature suggests the imbalance in leadership attainment is not surprising. Diverse fields such as non-human animal behavior, anthropology, economics, and psychology document a direct relationship between gender and leadership over millennia and across cultures and species (Murray and Murray, 2011). For instance, with few exceptions (e.g., killer whales, lions, spotted hyenas, bonobos, lemurs, and elephants), male group dominance is nearly universal in primate and mammalian animal groups (Smith et al., 2018; Kappeler et al., 2019). This, of course, includes humans. For instance, Brown (1991, 137) states that "In the public political sphere men form the dominant element among [every people or people in general]" (see also Ludwig, 2002; von Rueden et al., 2018). Archeological and anthropological data suggest that human males have been vastly over-represented in the public sphere dating back thousands of years to Egyptian pharaohs; Chinese, Japanese, and Roman emperors; Catholic Popes; and European monarchs (Murray and Murray, 2011). This includes not just large-scale states or empires but also other forms of society including small-scale, egalitarian, and preindustrial and nonindustrial societies (Whyte, 1978; Collier and Rosaldo, 1981; Low, 1992; von Rueden et al., 2018). Ludwig concludes "this blanket prejudice against female rulers goes back to antiquity" (2002, 29). Moreover, in modern times, the proportion of female political and business executive leaders tops out at about 10 percent (Murray and Murray, 2011; Geiger and Kent, 2017).
Leading explanations for this phenomenon suggest that it results from culturally transmitted, learned stereotypes. Nonhuman animal, small-scale, egalitarian, nonindustrial, ancient Egyptian, medieval European, and modern corporate cultures are vastly different, though. These cross-temporal, cross-cultural, and, in particular, cross-species results suggest that these unrepresentative leadership outcomes are not due solely to learning stereotypes through cultural transmission. Substantial evidence shows that human behavior is not only affected by environment. It is also affected by biological transmission such as genetic inheritance (Alford et al., 2005; Benjamin et al., 2012; Smith et al., 2012) along with many other biological forces, through time, culture, and species.
The research presented here suggests explanations informed by Darwin's (1859) evolutionary theory of natural selection may shed additional light on the relationship between gender and leadership preferences. In particular, this argument suggests the imbalance in female-male leadership attainment may be the non-adaptive and non-functional but lingering outcome, or incidental by-product, of an adapted psychological mechanism that favors a preference for physically formidable leaders in certain situations. In short and consistent with an emerging literature on adaptive followership theory (e.g., Little et al., 2007; Van Vugt et al., 2008a; Spisak et al., 2012; Van Vugt and Grabo, 2015; Laustsen and Petersen, 2017), physically formidable leaders may have helped allies acquire resources that were vital for survival and reproduction, and ward off competitors for those resources, in violent ancestral times (Lukaszewski et al., 2016). Because males have been physically larger than females throughout human history (Geary, 1998) and because humans sometimes rely on "mismatched" cues more suitable for ancestral than modern society (Tooby and Cosmides, 1992; Li et al., 2017), males aspiring to leadership positions in certain situations may have a greater probability of success than females as the result of a by-product of an adaptive preference for physically formidable allies.
To test for effects of evolution-related forces on the demand of followers for female versus male leadership, this article first presents a review of the pertinent literature regarding the role of gender heuristics in leadership preferences and then identifies potential evolutionary factors that may influence leadership preferences within that literature. Next, it describes an experiment (N = 826) embedded in the 2012 Cooperative Congressional Election Study (CCES), a nationally representative, population-based online survey that offers the advantage of the external validity of a nationally representative sample and the internal validity of an experimental design. It then presents the results, which suggest that increased intergroup threat increases preferences for physically formidable leaders, which, in turn, increase preferences for male over female leaders. Together, these findings are consistent with the assertion that the preference for male over female leaders may be the incidental by-product of an evolved psychological mechanism. Finally, it ends with a discussion of the implications of the findings, including a possible but controversial policy solution that has been implemented in a number of countries to attempt to overcome gender imbalances in leadership that may be a result of very long-term biological influences.
(McGarty et al., 2002). Scholars have argued that gender stereotyping occurs in a variety of situations encompassing such diverse areas as coaching decisions (Kalin and Waldron, 2015), student evaluations of teaching (Centra and Gaubatz, 2000), law firm hiring (Gorman, 2005), career choice evaluation (Correll, 2004), and, most pertinently here, assessments of political leaders (e.g., Sapiro, 1981, 1983; Rosenwasser and Seale, 1988; Alexander and Andersen, 1993; Huddy and Terkildsen, 1993; Matland, 1994; Kahn, 1994; Koch, 2002; Sanbonmatsu, 2002; King and Matland, 2003; Dolan, 2010, 2014; Carey and Lizotte, 2019).
A significant body of literature indicates that voters possess particularized expectations or stereotypes about female and male leaders (e.g., Huddy and Terkildsen, 1993; McDermott, 1998; Eagly and Karau, 2002). Females are generally stereotyped as more communion oriented (Eagly et al., 2019) such as being more compassionate, compromising, emotional, and sensitive. Males are generally stereotyped as more agency oriented (Eagly et al., 2019) such as being more aggressive, assertive, self-confident, and tough (Sapiro, 1981, 1983; Rosenwasser and Seale, 1988; Leeper, 1991; Alexander and Andersen, 1993; Huddy and Terkildsen, 1993; Sanbonmatsu, 2002; Lawless, 2004). In terms of government policy, females are believed to be advantaged on issues such as education, health, and care for children and the elderly, while males are thought to hold an advantage on issues such as crime, the military, and economics (Dolan, 2014).
It is important to note, though, that the effects of gender can be complicated. Scholars have found that gender stereotypes can transcend political party (Plutzer and Zipp, 1996), but in several instances the effect of gender is rendered insignificant when controlling for partisanship, incumbency, or both (Dolan, 2014;Stadelmann et al., 2014). In other situations, gender heuristics have been found to interact with female candidates' ideological orientation, which, for instance, can hinder conservative female candidates (Sanbonmatsu and Dolan, 2009) where stereotypes regarding women, such as positions on abortion, may handicap them compared to their liberal counterparts.

Evolutionarily Informed Demand Factors
The literature provides substantial evidence in support of gender effects associated with stereotyping and other demand-related factors, but there is a great deal of variance left to be explained regarding how gender-based considerations affect leadership preferences. Research shows that biological factors also contribute to human behavior (e.g., Chagnon and Irons, 1979; Scarr and McCartney, 1983; Bouchard, 2004; Faulkner et al., 2004; Iofrida et al., 2014; Manuck and McCaffery, 2014), including political behavior (e.g., Alford and Hibbing, 2004; Alford et al., 2005; Kanai et al., 2011; Arceneaux et al., 2012; Hatemi and McDermott, 2012; Aarøe and Petersen, 2013; Merolla et al., 2013; Adams et al., 2014; Dawes et al., 2014, 2015; French et al., 2014; Shah et al., 2015; Stewart et al., 2015; Klofstad, 2016; Murray, 2017; Weinschenk and Dawes, 2017). The research presented here suggests that another key effect, evolution by natural selection, a foundational explanation for the diversity and function of living organisms, may explain additional variance. The application of its general principles may account for how leadership preferences vary by leadership situations (e.g., Van Vugt and Spisak, 2008; Post, 2015) that are evolutionarily relevant. It is appropriate to note, as stated by a reviewer of this article, "the theory of evolution by natural selection makes no predictions whatsoever about leadership preferences. . . [it] only provides some very general principles that can be applied in many specific ways to understand the evolution of particular traits." Put otherwise, while the theory of evolution holds substantial explanatory powers for matters involving biological systems, it is important to be clear that the following leadership-specific hypotheses are informed by but not derived directly from evolutionary theory.
Evolution by natural selection suggests that physical, cognitive, emotional, and motivational mechanisms emerged because they resulted in a greater likelihood of an individual's survival and ability to reproduce (e.g., Mayr, 2001; Crawford, 2008). Given the typically slow speed of evolution, the human brain, like other parts of the human body, still reflects the hominids living in the environment of evolutionary adaptedness (Tooby and Cosmides, 1992; Foley, 1995). Instincts acquired through natural selection in human ancestral times manifest themselves in modern life, even when seemingly irrational in and mismatched to the context of modern living (e.g., Li et al., 2017; Giphart and van Vugt, 2018). For instance, the widespread fear of snakes (LoBue and DeLoache, 2008; Hoehl et al., 2017), which are rarely encountered in contemporary society, and the overconsumption of fatty and sweet foods (Nesse and Williams, 1994), which promoted survival in times when adequate nutritional intake was uncertain, continue today despite their mismatches to modern society.
Similarly, the probability that a modern national leader will physically lead troops into battle is extremely small. But prior research suggests that individuals prefer physically formidable political leaders, a preference some scholars suggest is due to evolutionarily shaped preferences regarding physically formidable allies helping others acquire and protect vital resources in evolutionary environments (Murray, 2014; von Rueden et al., 2014; Lukaszewski et al., 2016). This suggests that individuals discount the aspects of modern society that render characteristics like size irrelevant and can make leadership decisions using cues that were suitable to older, small-scale societies (Little et al., 2007).
The human species has lived roughly 99 percent of its existence in small hunter-gatherer communities of roughly five to 150 people (Diamond, 1999; Foley, 1995; Van Vugt et al., 2008b). Intra- and intergroup conflict were common (e.g., Chagnon, 1997; Keeley, 1996; Van Vugt et al., 2008b) as individuals and groups competed over resources and status related to survival and reproduction. In terms of applying an evolutionary analytical framework (Lewis et al., 2017), this competition created a frequent and impactful adaptive problem regarding threats to individual survival and growth in the form of aggression and conflict initiated by competitive and/or dangerous conspecifics over vital resources such as food, shelter, and social status (Petersen et al., 2008).
One potential solution to such dangerous ancestral environments was physically formidable leaders who, in the pursuit of prestige and the related benefits (e.g., Henrich and Gil-White, 2001), helped allies offensively acquire and defensively protect vital resources due to their significant resource holding potential [regarding offensive versus defensive leadership see Laustsen and Petersen (2017) and Lukaszewski et al. (2016)]. Some scholars note that leadership in ancestral times was gained through capabilities that included fighting skills and strength (Diamond, 1999; Van Vugt et al., 2008a). For instance, leaders were called on by followers to quell intragroup fights and to lead raids against adversary groups (Van Vugt et al., 2008a). As succinctly summarized by Lukaszewski et al. (2016, 385): "Ancestrally, physically formidable males would have been differentially equipped to generate benefits for groups by providing leadership services of within-group enforcement. . . and between-group representation..." Intragroup enforcement might include punishing free riders and rule breakers, intervening in fights and other conflict between group members to reduce group-threatening disputes, and enforcing group coordination to keep members on task. On the other hand, between-group coordination might include engaging in face-to-face negotiations with other groups and serving in war in times of extreme conflict (Lukaszewski et al., 2016).
Modern leadership preferences reflect these ancestral forces through a number of characteristics. First, there is substantial evidence of a biological component, particularly genetic inheritance, to political behavior (e.g., Alford et al., 2005) and leadership attainment (e.g., De Neve et al., 2013; Oskarsson et al., 2017). This evidence lays a solid foundation for related biological factors like evolutionary forces to play a role in leadership preferences. Second, in social interactions, individuals establish hierarchies quickly based on perceived authority, even using first impressions that can occur before any verbal interaction (Kalma, 1991). Importantly, humans have the ability to evaluate visually a person's physical formidability (Sell et al., 2008). Third, in leadership preferences, the context matters as different leadership situations require different leader responses (McCleskey, 2014). For example, research suggests female leaders more effectively coordinate large teams and cultivate team cohesion and communication (Post, 2015). This is consistent with findings that female leaders are strongly preferred and more successfully raise group investment than male leaders during intragroup competition (Van Vugt and Spisak, 2008). On the other hand, male leaders more successfully raise group investment during intergroup competition and, more broadly, people tend to prefer more dominant leaders when the chance of danger increases (McCann, 2001; Little et al., 2007; Merolla et al., 2007; Petersen and Laustsen, 2020).
Fourth, this is consistent with prior findings that individuals with greater physical stature, as indicated by relative height, are more likely to be perceived as capable and competent (Hensley, 1993) and to be respected and feared by potential opponents (Gregor, 1979). To extend this, research also suggests people are less likely to aggress against opponents who are physically formidable (e.g., Fessler et al., 2014). Broadly speaking, formidability is defined as the ability to hold resources by imposing costs on challengers (Sell et al., 2008). Physical size is an effective indicator of formidability related to fighting ability. Larger animals, both human and non-human, are more likely to prevail in physical contests (e.g., Huntingford and Turner, 1987; Sell et al., 2012; Szamado, 2008), and, therefore, individuals frequently use physical size as an indicator of resource holding potential (Huntingford and Turner, 1987).
It is theoretically important not to conflate dominance with physical formidability. They are different concepts, and this research specifically addresses physical formidability. Dominance has been defined as, for instance, "the induction of fear, through intimidation and coercion" (Petersen and Laustsen, 2020, 136). Physical formidability as used here is indicated by physical characteristics (e.g., height, weight, body mass index) that can be used "to hold resources by imposing costs on challengers." A physically formidable person may or may not induce fear (i.e., be dominant); the person may merely cue that in the case of a physical altercation he or she will have an advantage over an opponent. But when a physically formidable individual does induce fear, it is because the opponent believes it is likely he or she will be physically harmed or "beat up." On the other hand, a dominant person is by definition inducing fear. Importantly, though, that person may induce fear because of physical formidability or myriad other reasons. The person may be brandishing a lethal weapon or holding a position of social advantage such as a powerful role in an organization (e.g., a supervisor of other people) or possessing resources that could damage personal, social, or professional reputations (e.g., social or news media). Put otherwise, although the two sometimes go together, one can be dominant without being physically formidable, and one can be physically formidable without being dominant.
This argument is in line with evidence that war stimulates a preference for leaders with greater weight and body mass (Murray, 2014). This is also consistent with emerging research on adaptive followership theory, which suggests that modern followership preferences are influenced via factors related to natural selection by the outcomes of leadership in ancestral situations of social conflict (Little et al., 2007; Van Vugt et al., 2008b; Spisak et al., 2012; Van Vugt and Grabo, 2015; Laustsen and Petersen, 2017). This review suggests:
Hypothesis 1: Situations of increased intergroup threat will lead to an increased preference for a physically formidable leader.
The above argument and supporting evidence suggest there are adaptive psychological tendencies unrelated to modern gender stereotypes that affect individual preferences in terms of both intra- and intergroup competition and physically formidable leaders. But the connection to preferences regarding gendered leadership requires further evidence and the role of evolution by natural selection requires further specification. Evolution produces three outcomes: adaptations, incidental by-products, and random effects (e.g., Buss et al., 1998; Lewis et al., 2017). Adaptations emerged because they helped solve a recurring problem related to survival and reproduction in ancestral environments. For example, umbilical cords carry nutrition from mothers to their developing fetuses. Based on the argument presented above, this research asserts that the preference for physically formidable allies and leaders is, like umbilical cords, an adaptation. Such a preference promoted survival and reproduction as formidable allies helped individuals acquire and protect vital resources. On the other hand, by-products emerged as an outcome of an adaptation. They promote neither survival nor reproduction but accompany adaptations, which do. That is, they are non-adaptive and non-functional, such as navels being the results of umbilical cords. This research asserts that the preference for male over female leaders in threatening situations is, like navels, a by-product of evolution. This phenomenon accompanies the preference for physically formidable leaders and does not promote survival or reproduction. Finally, evolution can also produce random effects, which emerged as the result of random or sudden changes in the environment and which are not linked to features of an adaptation. For example, the shape of an individual's navel is a random effect of evolution that neither helps nor hinders the adaptive function of umbilical cords. This research asserts the psychological mechanism presented here is a by-product of evolution and not a random effect of evolution.
With the mechanism for the evolutionary link specified, the link to gendered leadership preferences can also be specified. Continuing the enumeration from above, fifth, archeological evidence suggests that males have been physically larger than females in all human hominid groups dating back three to four million years (Geary, 1998). This translates in current times to men having on average 61 percent more muscle mass (Lassek and Gaulin, 2009) and roughly 50 to 100 percent more upper-body strength than women (Pheasant, 1983), with female and male distributions in upper-body strength and muscle mass overlapping by less than 10 percent (Lassek and Gaulin, 2009). This sexual dimorphism suggests that when physical formidability is a desirable trait, males are greatly advantaged over females. Sixth, evidence suggests that throughout history males have been more likely to serve as combatants in wars and other intergroup conflict than females (Keegan, 1993; Goldstein, 2003; Glowacki et al., 2017). This is consistent with research that indicates male leaders are strongly preferred over female leaders and more successfully raise group investment than female leaders during intergroup competition (Van Vugt and Spisak, 2008). It is also consistent with research that shows groups with greater numbers of males are more likely to win intergroup contests (Glowacki et al., 2017). This review suggests:
Hypothesis 2: A preference for a physically formidable leader will lead to an increased preference for a male leader compared to a female leader.
If the results support Hypotheses 1 and 2, the next step is to provide evidence that demonstrates the potential role of evolutionary forces in gendered leadership preferences by establishing a link from intergroup threat through preferences for leader physical formidability to preferences regarding the biological sex of a preferred leader. Evidence suggests that part of the male advantage in leadership attainment is related to males' greater body size and physical strength (e.g., Handwerker and Crosbie, 1982; Glowacki and von Rueden, 2015; von Rueden et al., 2018; but see Low, 1992). Overall, the analyses presented here suggest:
Hypothesis 3: The preference for a male versus female leader will be at least partially attributable to a sense of external threat that is conveyed through a preference for a physically formidable leader.

Plan of Analysis
We assert that differential preferences for female versus male leaders are motivated at least partially by situational threat that may be related to evolutionary forces. Increased intergroup threat, an evolutionarily salient situation, increases preferences for physically formidable leaders, and, in turn, a preference for physically formidable leaders increases preferences for male compared to female leaders. To assess this argument and process, we use simple mediation analysis (e.g., Baron and Kenny, 1986; Preacher and Hayes, 2004) to see if a preference for a physically formidable leader contributes to (i.e., mediates) the relationship between the experimental treatments and the preference for a female versus male leader. We first present results of the underlying experiment; that is, did the experimental conditions affect the outcome variable, the preference for a female versus male leader? In terms of mediation analysis this represents the total effect of the relationship between the treatments and leadership preferences; that is, the relationship between the treatments and leadership preferences without controlling for the effect of the mediating variable. Then we turn to the main argument and evaluate evidence regarding intergroup threat and whether it increases preferences for physically formidable leaders (the mediator), testing Hypothesis 1 (H1). Next, we assess evidence regarding preferences for physically formidable leaders and whether they increase preferences for male relative to female leaders, testing Hypothesis 2 (H2). Finally, we assess evidence regarding whether a differential preference for a female versus male leader is linked to perceived intergroup threat through the preference for a physically formidable leader, the mediator representing the indirect effect (Preacher and Hayes, 2004), testing Hypothesis 3 (H3).
We assert that supporting evidence for the three hypotheses would provide nontrivial evidence that differential preferences for male versus female leaders are motivated at least in part by situational threat related to evolutionary forces. As depicted in Figure 1, specifically we are testing for the presence of an indirect effect (ab) from the experimental treatments (T) through the mediating variable, preference for physical formidability (M), to the outcome variable, leader preference (Y).
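The decomposition depicted in Figure 1 can be illustrated with a minimal sketch. It assumes linear (OLS) models for every path purely for illustration; the analyses reported here use probit for the binary outcome, in which case the identity c = c' + ab holds only approximately. The data are simulated, and the helper name `ols` and all effect sizes are hypothetical rather than taken from the study.

```python
import random

def ols(y, columns):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [vr - f * vi for vr, vi in zip(A[r], A[i])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data (hypothetical effect sizes, not the CCES data)
rng = random.Random(2012)
n = 826
T = [1.0 if rng.random() < 0.25 else 0.0 for _ in range(n)]   # 1 = war treatment
M = [3.5 + 0.4 * t + rng.gauss(0, 1.5) for t in T]            # formidability rating
Y = [1.0 if rng.random() < min(1.0, max(0.0, 0.6 + 0.02 * t + 0.05 * m)) else 0.0
     for t, m in zip(T, M)]                                   # 1 = male leader

c = ols(Y, [T])[1]               # total effect of T on Y
a = ols(M, [T])[1]               # path a: T -> M
_, c_prime, b = ols(Y, [T, M])   # direct effect c' and path b: M -> Y
indirect = a * b                 # indirect effect ab

print(f"total c = {c:.4f}; direct c' = {c_prime:.4f}; indirect ab = {indirect:.4f}")
```

In the linear case the total effect equals the direct effect plus the indirect effect exactly, which is why a small or non-significant total effect does not rule out a meaningful indirect path.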

DATA AND METHODS
The data were collected before the 2012 presidential election as part of that year's Cooperative Congressional Election Study (CCES), an ongoing series of nationally representative, population-based online surveys administered by YouGov/Polimetrix. One thousand subjects participated in the survey experiment with completed responses obtained from N = 826 subjects. Compared to population means reported by the United States Census Bureau, the experimental subject pool is slightly more female, racially diverse, and educated; slightly less Hispanic; and similar in terms of wealth. Overall, this research takes advantage of the internal validity offered by an experimental design and the external validity offered by a nationally representative sample. All p-values are based on two-tailed tests.
Regarding the experiment, the CCES survey used simple random assignment to assign subjects to one of four treatment groups. The treatments were vignettes directing subjects to "[c]reate in your mind the national leader of your country, such as a president or prime minister, whom you would want to lead the country" during times of varying threat conditions: war, peace, natural disaster requiring cooperation, and a non-specific control condition. This vignette approach (Schoenberg and Ravda, 2000) was used to lead subjects to fix their leader's characteristics in their minds before answering the follow-up questions regarding specific characteristics that may have led them to change their answers (e.g., some subjects may not have imagined a female leader until a question led them to do so). The war vignette served as the threat condition, while the peace, cooperation, and control vignettes served as reduced-threat conditions. See Supplementary Appendix A for the treatment vignettes. After treatment, the instrument directed subjects to describe in their own words the leader they imagined and then to answer a series of open- and closed-ended questions related to leader preferences stemming from the treatments followed by a series of political and demographic questions.
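The assignment and coding scheme can be sketched as follows. This is an illustration only: the function names are hypothetical, equal assignment probabilities are assumed, and the sketch assumes that each war-versus-comparison contrast drops subjects in the two remaining conditions, which the text does not state explicitly.

```python
import random

CONDITIONS = ["war", "peace", "cooperation", "control"]

def simple_random_assignment(n_subjects, seed=0):
    """Assign each subject independently to one of the four vignettes."""
    rng = random.Random(seed)
    return [rng.choice(CONDITIONS) for _ in range(n_subjects)]

def war_contrast(assignments, comparison):
    """Return (subject index, dummy) pairs: 1 = war, 0 = comparison condition.

    Subjects in the other two conditions are dropped (an assumption).
    """
    return [(i, 1 if cond == "war" else 0)
            for i, cond in enumerate(assignments)
            if cond in ("war", comparison)]

assignments = simple_random_assignment(826, seed=2012)
contrasts = {c: war_contrast(assignments, c) for c in CONDITIONS if c != "war"}

for name, rows in contrasts.items():
    n_war = sum(dummy for _, dummy in rows)
    print(f"war vs. {name}: n = {len(rows)}, war = {n_war}")
```

Each of the three contrasts yields one treatment dummy, so every outcome measure is analyzed three times, once per comparison condition.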
A multinomial probit test of random assignment to the experimental groups indicates the randomization process generated statistically equivalent experimental groups (χ²[69] = 47.38, p = 0.98). In this test, group assignment was regressed on subject gender stereotyping (discussed below), political ideology, income, education, race, gender, age, religiosity, and political interest. See Supplementary Appendix B for details. Manipulation checks indicate the treatments successfully influenced subjects' assessments of the differences in threat presented by the treatments.

The Underlying Experiment: The Total Effect
This preliminary analysis tests for a relationship between each treatment and the preference for a female versus male leader without controlling for the proposed mediating effect. More formally, this is the total effect (c) (Baron and Kenny, 1986; Preacher and Hayes, 2004). To measure the outcome, subjects responded to a closed-ended question about the gender of their imagined leader, with "male" responses coded 1 and "female" coded 0. This dichotomous measure served as the dependent variable for this and later analyses related to H2 and H3. For these analyses, leader gender was separately regressed on three different independent variables representing the treatment conditions such that subjects in the "war" group were coded 1 and each of the others coded 0. Due to the dichotomous nature of the dependent variable, the effects were estimated using probit regression. Further, they were also estimated using robust standard errors due to evidence of heteroscedasticity in some of the models. Because random assignment was successful, the models do not specify covariates. The probit estimates and statistics for the three models appear in Supplementary Appendix C. Overall, the probability of a preference for a male leader ranges from 0.77 in the cooperation condition (Pr(male | coop)) to 0.81 in the control (Pr(male | control)) and peace (Pr(male | peace)) conditions. For ease of interpretation, Figure 2 presents the average treatment effects in the form of average marginal effects, which are derived from the probit estimates. The figure suggests that moving from the control and peace treatments to the war treatment increases the preference for a male leader but not in statistically discernible ways. On the other hand, the war treatment relative to the cooperation treatment statistically significantly increases the preference for a male leader by 7.9 percentage points or about 10 percent.
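The arithmetic behind the war-versus-cooperation comparison can be checked with a short sketch. It relies on the fact that a probit model with a single binary regressor and no covariates reproduces the group proportions, so its average marginal effect equals the difference in predicted probabilities. The war-condition probability used below is implied by adding the reported 7.9 points to the reported 0.77; it is not reported directly in the text, and the probit coefficients are back-calculated for illustration rather than taken from Supplementary Appendix C.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

p_coop = 0.77                    # Pr(male | coop), reported in the text
ame_reported = 0.079             # war vs. cooperation effect, reported in the text
p_war = p_coop + ame_reported    # implied Pr(male | war); an inference, not reported

# Back-calculated probit coefficients for a model with one treatment dummy
beta0 = std_normal.inv_cdf(p_coop)           # intercept (cooperation condition)
beta1 = std_normal.inv_cdf(p_war) - beta0    # coefficient on the war dummy

# Average marginal effect of a binary regressor: difference in predicted probabilities
ame = std_normal.cdf(beta0 + beta1) - std_normal.cdf(beta0)
relative_change = ame / p_coop

print(f"AME = {ame:.3f}; relative change = {relative_change:.1%}")
```

The relative change of roughly 10 percent is simply the 7.9-point effect expressed as a share of the 0.77 baseline in the cooperation condition.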
Because a total effect is not essential for finding indirect effects in mediation analysis (Preacher and Hayes, 2004;Hayes, 2017), the next steps are to continue working through the mediation analysis process to test the hypothesized indirect effects.

Testing H1: Intergroup Threat and Preferences for Physically Formidable Leaders
The argument presented here indicates intergroup threat leads to a mismatched preference for a physically formidable leader as a result of lingering evolutionary effects on people's behavior. Specifically, Hypothesis 1 states that increased intergroup threat will lead to an increased preference for a physically formidable leader. If the evolutionary argument is correct, experimentally stimulated intergroup threat should increase subjects' preference for a physically formidable leader despite the fact that modern national leaders are extremely unlikely to lead troops into battle.
For the analyses, the subjects assessed the physical formidability of their imagined leaders using a 1-7 scale indicating how well the 10 words or phrases presented in Table 1 described their leader, where 1 indicated the word or phrase described their leader "not well at all" and 7 indicated "extremely well." For these analyses, the leader descriptions are regressed on the treatments, where 1 indicates the war condition and 0 indicates either the control, peace, or cooperation conditions, using OLS estimation. The estimates are based on robust standard errors due to evidence of heteroscedasticity in some of the models. Because random assignment was successful, the models do not specify covariates. Supplementary Appendix D reports the 30 models. Table 1 presents the OLS coefficients or average treatment effects estimated by the models (Figure 1, (a)). Of the 10 characteristics, only three generated statistically meaningful effects: physically imposing/intimidating, dominant, and physically strong. The positive and statistically significant effects on the physically imposing/intimidating dependent variable indicate that the war treatment stimulated a meaningfully greater preference for a physically imposing/intimidating leader compared to each of the other treatments. The magnitude of these effects is not trivial. The threatening war treatment increased the preference for a physically imposing leader relative to the control and peace treatments by about 0.4 point or 10 percent and relative to the cooperation treatment by about 0.5 point or 13 percent. The war treatment also increased the preference for a dominant leader relative to the peace and cooperation treatments by about 0.4 point or 8.5 percent. Further, it had a positive effect on the preference for a dominant leader relative to the control treatment, but this effect reached only a marginal level of statistical significance. Finally, the war treatment also had a positive effect relative to the cooperation treatment on the preference for a physically strong leader, but this effect also reached only a marginal level of statistical significance. The measures "physically imposing or intimidating" and "physically strong" were intended to represent physical formidability.
Table 1 note: Dependent variables (DV) coded 1 = "not well at all" to 7 = "extremely well"; +p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001 (two tailed). Bold indicates measures with multiple statistically significant treatment effects.
The preference for a physically strong leader is not as clear as it is for a physically imposing/intimidating leader, but the results are mostly consistent with arguments that increased intergroup threat stimulates a preference for physically formidable leaders. Overall, these results provide reasonable evidence in support of H1.
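Because each of these models regresses a rating on a single 0/1 treatment indicator, the OLS coefficient is simply the difference in mean ratings between the war group and the comparison group, which is why it can be read as an average treatment effect. The sketch below illustrates this with made-up ratings; the data and the helper name `ols_slope` are hypothetical and are not taken from the study.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# HYPOTHETICAL 1-7 ratings for illustration only, not the study's data.
war_ratings = [5, 4, 6, 3, 5]
coop_ratings = [4, 3, 5, 3, 4]

# Treatment dummy: 1 = war condition, 0 = comparison condition.
treatment = [1] * len(war_ratings) + [0] * len(coop_ratings)
ratings = war_ratings + coop_ratings

effect = ols_slope(treatment, ratings)   # equals the difference in group means
baseline = mean(coop_ratings)            # mean rating in the comparison group
percent_change = 100 * effect / baseline # effect expressed relative to baseline
```

The percentage figures reported alongside the point estimates appear to follow the same logic, dividing the coefficient by the comparison-group mean, although the text does not state the baseline means directly.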
Interestingly, the war treatment did not stimulate a discernible effect on any of the classically preferred leadership characteristics (i.e., competent, dependable, and intelligent; e.g., Miller et al., 1986; Zaccaro et al., 2004). This is likely a reassuring result given that it indicates individuals value these traits regardless of the context, which is also indicated by their high mean scores (minimum of 6.4 out of maximum possible of 7) reported in Table 1 for these characteristics. Together these three measures of classic characteristics create an internally consistent scale in these data with Cronbach's alpha = 0.91. Table 1 indicates the war treatment also had no statistically discernible effect on this scale of classic leadership traits compared to the non-threat treatments.
To specifically assess physical formidability, the two measures physically imposing/intimidating and physically strong are combined into a measure of physical formidability that constitutes a reasonable scale, particularly for only two items, with Cronbach's alpha = 0.59 and a moderate bivariate correlation r = 0.43 (p < 0.001). For this scale, Table 1 indicates the war treatment meaningfully increased the preference for a physically formidable leader relative to the control and cooperation treatments. It increased the preference for a physically formidable leader by 0.3 point or 7 percent relative to the control treatment and 0.4 point or 9 percent relative to the cooperation treatment. Overall, these results again mostly support H1. They are consistent with arguments that increased intergroup threat triggers a preference for physically formidable leaders.
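The reliability figure for a two-item scale can be related to the inter-item correlation with the standardized form of Cronbach's alpha. The sketch below applies that formula to the reported r = 0.43. It is an illustration, not the authors' computation: the reported alpha of 0.59 is presumably the raw, covariance-based version, which is an assumption here and which differs slightly from the standardized value when the two items have unequal variances.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized Cronbach's alpha from the mean inter-item correlation."""
    k, r = n_items, mean_inter_item_r
    return k * r / (1 + (k - 1) * r)


# Two items (physically imposing/intimidating, physically strong), r = 0.43.
alpha_two_items = standardized_alpha(2, 0.43)
print(round(alpha_two_items, 2))
```

The formula also shows why a two-item scale rarely reaches high alpha values: with k = 2, even a correlation of 0.5 yields an alpha of only about 0.67.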

Testing H2: Preferences for Physically Formidable Leaders and Leader Sex
Having provided evidence in support of H1 that intergroup threat stimulates a preference for physically formidable leaders, the second set of analyses is designed to establish a relationship between a preference for a physically formidable leader and the sex of the leader. In particular, H2 states that a greater preference for a physically formidable leader will lead to an increased preference for a male leader compared to a female leader. If this argument is correct, then subjects' leader preferences should account for the biological condition and everyday experience of sexual dimorphism in which human males tend to be larger and more physically formidable than human females.
To test this hypothesis, leader gender was regressed on the physical formidability scale created for H1 as well as a measure of gender stereotyping and a number of covariates found in previous research to affect differential preferences for female versus male leaders: subject's sex, age, education, religiosity, and political ideology. Education, religiosity, and ideology were specified as series of indicator variables as noted in Table 2. Physical formidability was recoded to a 0-1 scale to facilitate comparison with other measures. The measure of gender stereotyping is a "multidimensional aversion to women who work scale," which estimates skepticism of female employment and traditional role preferences (Valentine, 2001). The analysis includes this measure to control for leadership preferences motivated by attitudes toward gender equality at work. Stereotyping is the primary alternative explanation to the evolutionary argument presented here. This 10-item scale (Cronbach's alpha = 0.91) represents learned or environmental effects on this leadership preference, and the expectation is that it will exert an independent effect on the gender of the imagined leader such that individuals with a greater "aversion" to women at work will be more likely to prefer a male leader. This scale was recoded to range between 0 and 1 to facilitate comparisons with the measure of preferences for physical formidability. In particular, if effects of the physical formidability measure on the sex of respondents' preferred leader disappear when this measure is included in the model, then we can conclude that it is an effect of stereotyping and that evolution-related forces, at least as construed here, do not affect this leadership preference. See Supplementary Appendix E for details on pertinent variables. Supplementary Appendix F presents the full regression models, which use probit estimation due to the dichotomous nature of the dependent variable and robust standard errors due to evidence of heteroscedasticity. For ease of interpretation, Table 2 presents the average marginal effects for the four probit models in order to demonstrate the effect of subjects' preferences for leader physical formidability on their preferences for a female versus male leader (Figure 1, (b)). The first column of results indicates that moving from the minimum to maximum value of leader physical formidability increases the probability of subjects preferring a male leader by 32.5 percentage points (p < 0.001, 95% CI [20.7, 44.3]).
Table 2 note: +p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001 (two tailed).
The second column shows that including the control for gender stereotyping in the model only trivially reduces the marginal effect of physical formidability from 32.5 percentage points to 32.0 percentage points (p < 0.001, 95% CI [20.1, 43.8]). It is also worth noting that moving from a subject who stereotypes the least to one who stereotypes the most increases the probability of preferring a male leader by 31.0 percentage points (p < 0.001, 95% CI [15.3, 46.6]).
Columns 3 and 4 show that including pertinent sociodemographic and political covariates only trivially attenuates the effect of leader physical formidability on preferences for a male leader. Adding the sociodemographic covariates decreases the effect by slightly more than one percentage point, to 30.7 percentage points (p < 0.001, 95% CI [19.3, 42.0]), and adding political ideology decreases it by less than one further percentage point, to 30.0 percentage points (p < 0.001, 95% CI [19.0, 41.1]). It is worth noting that the effect of gender stereotyping decreases substantially across the models, declining to 21.7 percentage points (p = 0.01, 95% CI [6.6, 36.9]) when the sociodemographic covariates are included and to a statistically insignificant effect (12.1, p = 0.12, 95% CI [−2.9, 27.1]) when political ideology is included as well.
The effect of preferences for leader physical formidability on preferences for female versus male leaders persists across a number of models with pertinent controls, including gender stereotyping, the primary alternative explanation. Further, the effect is only trivially attenuated as the controls are added, dropping from a 32.5 percentage-point effect in the bivariate model to a 30.0 percentage-point effect in the fully specified model. These results support H2. They are consistent with arguments that the preference for a physically formidable leader is associated with a decreased preference for a female leader.

Testing H3: Threat Affects Leader Preferences Through Physical Formidability
Having provided evidence that intergroup threat stimulates a greater preference for physically formidable leaders (H1) and demonstrated that a greater preference for a physically formidable leader stimulates a greater preference for a male leader (H2), the third and final set of analyses is designed to establish a link from intergroup threat through preferences for leader physical formidability to preferences regarding the biological sex of a preferred leader (Figure 1, (ab)). Specifically, H3 states that the preference for a male versus female leader will be at least partially attributable to a sense of external threat that is conveyed through a preference for a physically formidable leader. This analysis is intended to test the key link between the evolutionarily salient treatments and preferences for male versus female leaders.
This study uses causal mediation analysis to test this hypothesis. Causal mediation analysis is designed to "quantify the effect of a treatment that operates through a particular mechanism ... the key quantity of interest is the calculation of how much of the treatment variable is transmitted by the mediating variable" (Hicks and Tingley, 2011, 606). In this study, causal mediation models link each threat stimulus with preferences regarding sex of the preferred leader through preferences for a physically formidable leader. Figure 3 presents the formal causal mediation models. Supplementary Appendix G presents the full models. The effect of the threat stimulus on the preference for a physically formidable leader (path a, the quantitative estimate of Figure 1, (a)) and the effect of the preference for a physically formidable leader on the preference for a male leader (path b, the quantitative estimate of Figure 1, (b)) constitute the indirect effect from the threat stimulus to leadership preference (path ab, the quantitative estimate of Figure 1, (ab)). Path ab, the indirect effect, is the effect of primary interest. Path a was estimated with OLS linear regression specifying a bivariate model regressing the preference for a physically formidable leader on the specified treatment. This is the relationship established in tests of H1. Path b was estimated with probit regression specifying a multivariate model regressing leader sex preference on leader physical formidability preference, the specified treatment, gender stereotyping, and the sociodemographic and political covariates included in the test of H2 (i.e., respondent biological sex, age, education, religiosity, and political ideology). This is the relationship established in tests of H2. For completeness, the models report the direct effect (path c') of the treatments on the preference for a male leader after controlling for the indirect effect (path ab).
As a reminder, this is not the same as the total effect, which does not control for the mediated effect, discussed above. Path c' was estimated in the same model as path b. All paths were estimated using the Mediation package in Stata (Hicks and Tingley, 2011) and robust standard errors.
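The logic of combining an OLS mediator model (path a) with a probit outcome model (paths b and c') can be sketched as a small simulation. This is a simplified illustration in the spirit of the simulation-based approach to causal mediation, not the Stata package's implementation: it omits covariates and parameter uncertainty, and every coefficient value below is hypothetical rather than an estimate from this study.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, the probit link


def acme(a0, a, sigma_m, c0, c_prime, b, t=1, n_draws=5000, seed=0):
    """Average causal mediation effect with treatment status held at t.

    Mediator model (OLS):   M = a0 + a*T + e,  e ~ N(0, sigma_m^2)
    Outcome model (probit): Pr(Y = 1) = PHI(c0 + c_prime*T + b*M)
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        e = rng.gauss(0.0, sigma_m)   # same error draw for both potential mediators
        m_treated = a0 + a * 1 + e    # mediator value under the war treatment
        m_control = a0 + a * 0 + e    # mediator value under the comparison treatment
        total += (PHI(c0 + c_prime * t + b * m_treated)
                  - PHI(c0 + c_prime * t + b * m_control))
    return total / n_draws


# HYPOTHETICAL coefficients for illustration; formidability on a 0-1 scale.
indirect = acme(a0=0.55, a=0.06, sigma_m=0.2, c0=0.3, c_prime=0.1, b=1.0)
no_path_a = acme(a0=0.55, a=0.0, sigma_m=0.2, c0=0.3, c_prime=0.1, b=1.0)
no_path_b = acme(a0=0.55, a=0.06, sigma_m=0.2, c0=0.3, c_prime=0.1, b=0.0)
```

The indirect effect is positive only when both paths are nonzero: setting either path a or path b to zero makes it vanish, which mirrors the requirement that threat must shift the formidability preference and that preference must in turn shift the preferred leader's sex.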
Figure 3 presents the primary effects of interest: the effects of threat on preferences for a male versus female leader via preferences for a physically formidable leader (i.e., path ab or the indirect effects). These results demonstrate the expected effects in two of the three cases. They indicate that respondents who received the threat treatment of war were statistically more likely to prefer a male leader than those who received the control (Figure 3 panel A) or cooperation (Figure 3 panel C) treatments, and this preference was partially attributable to a greater preference for a physically formidable leader. The results also hint at an indirect effect of war compared to peace (p = 0.104, two tailed).
These results mostly support H3. They indicate that intergroup threat tends to stimulate a greater preference for a male versus female leader, and that greater preference is partially transmitted through the preference for a physically formidable leader.

DISCUSSION
Research in a variety of contexts finds that individuals often use gender-based heuristics to evaluate females and males. Learning or environment-related explanations such as gender stereotyping (e.g., Huddy and Terkildsen, 1993; Kahn, 1996; Sanbonmatsu, 2002; Bauer, 2015) successfully account for some of the variance in this behavior, but, like most social science models, they also leave a substantial amount of variance to explain. Because human behavior is the result of both environment and biology and the interaction between the two, this research attempts to account for additional variance by broadly looking at the issue employing the biological theory of evolution by natural selection, a foundational explanation for the diversity and function of living organisms, and by specifically framing leadership preferences in terms of an evolutionary consideration, varying levels of group threat. The results for tests of H1 generally suggest that increased group threat increases preferences for physically formidable leaders. In particular, increased threat increases the preference for a physically imposing/intimidating leader and to a lesser degree a physically strong leader. The results clearly support H2 and indicate that the preference for a physically formidable leader is associated with an increased preference for a male leader. Finally, two of three tests of H3 indicate there is a causal link between increased intergroup threat and the preference for a male over female leader that is at least partially attributable to subjects' preference for a physically formidable leader.
Overall, the results support the argument that the advantage males have over females in regard to national executive leadership may be the result of long-term evolutionary forces. In terms of applying an evolutionary analytical framework (Lewis et al., 2017), the ancestral environment posed a frequent and impactful adaptive problem of threats to individual survival and growth in the form of aggression and conflict over vital resources such as food, shelter, and social status (Petersen et al., 2008). One potential solution to such dangerous ancestral environments was physically formidable leaders, who helped allies acquire and maintain vital resources due to their significant resource holding potential. Given an adaptive preference for physically formidable leaders, sexual dimorphism, or persistent advantages of males over females in terms of size and strength, created a non-adaptive and non-functional but lingering outcome (i.e., incidental by-product) that advantages males over females in national leadership attainment.
It is important to note that the results presented here do not fully explain or even attempt to explain "why men in all human societies have tended to wield more political leadership than women" (von Rueden et al., 2018, 403). They do, though, shed light on a high-profile and important situation in which males have had a vastly disproportionate presence: national executive leadership. Members of the polity view these national leaders as the head of the military. For instance, of the limited constitutional powers specifically given to U.S. Presidents, Article II Section 2 of the US Constitution states, "The President shall be Commander in Chief of the Army and Navy of the United States, and of the Militia of the several States" (U.S. Const. art. II, § II, 1992). When political leaders take their countries to war, the result (win, lose, or draw) affects the leader's likelihood of remaining in office (Croco, 2011). Considerations such as culpability and vulnerability for the involvement in war affect the impact, but this does not change the fact that leaders are still held responsible for taking their country to war or coming into power during a war (Croco and Weeks, 2016). On the other hand, subnational leaders often do not have "war making" duties. As such, there may not be a link between the effects of physical formidability and gendered leadership preferences in other situations. For example, although there have been no female U.S. Presidents, there have been 44 female governors of U.S. states (Center for American Women in Politics, 2020) and in 2019 nearly 17 percent of cities with populations over 30,000 had female mayors (United States Conference of Mayors, 2020).
Methodologically, some may wonder if a demand effect is at play. That is, subjects may be motivated to describe a male or physically formidable leader in the war condition because that is the most socially appropriate response irrespective of their true preferences. While this is possible in survey research, it seems unlikely here. There was little to no incentive to "respond appropriately." Answers were not scored or tied to rewards for respondents, and the instrument was fielded online and anonymously. Further, there is little to no evidence of a demand effect for other measures. For instance, the peace and cooperation treatments did not stimulate a preference for a friendly leader relative to the war treatment (see Table 1), which is what a demand effect would be expected to produce. Further, and more directly, in the test of the total effect, the war treatment did not stimulate a preference for a male leader relative to the peace and control treatments (see Figure 2). Those effects are only detected as indirect effects (see Figure 3).
Future research needs to confirm these results through conceptual replication with different measures and varied samples, in particular samples outside Western, educated, industrialized, rich, and democratic societies, which some researchers claim are outliers on a number of characteristics and not suitable for generalizing broadly about humans (Henrich et al., 2010). It would also be appropriate to attempt to reproduce these results using other estimation methods. For instance, Spencer et al. (2005) propose an alternative research design to assess mediated effects that they call a "measurement-of-mediation" design. This design uses a series of experiments that Spencer and colleagues suggest provides a superior approach to estimating mediation effects under certain conditions. Further, future research could advance this argument by probing the assertion that the phenomenon is an incidental by-product of the evolutionary process and not a random effect or even an adaptation. Evidence that it is a by-product or even a random effect, which would imply there are no implications for humans' survival and reproduction, would suggest much different theoretical and policy considerations than evidence that it is an adaptation, which would imply the implications are vital.
Despite recent strides in leadership attainment by females (Geiger and Kent, 2017), the slow progress disappoints and surprises many who recognize the leadership skills women often bring to bear on society's pressing issues. The sluggish progress suggests that conventional explanations may be overlooking additional factors. These results along with other evidence spanning time, cultures, and species suggest these outcomes may be related to very long-term factors related to evolution that are extraordinarily difficult to overcome (Tooby and Cosmides, 1992; Li et al., 2017). If this is the case, and if some societies seek to expand the pool of leadership talent, then those societies may deem it necessary to intervene directly in democratic decision making to accelerate the expansion of their leadership pools by, for instance, implementing or increasing gender-based quotas among elected officials. Although researchers have not reached a consensus on the effects of electoral gender quotas (Dahlerup, 2012), as of 2013, 57 countries had some type of legislated gender quota for national-level legislative bodies and 37 countries had political parties with voluntary quotas (Dahlerup et al., 2013).
Regardless of what emerges on the policy agenda, this research offers a more complete explanation of the imbalance in leadership attainment between men and women. It suggests that biological factors also matter in leadership-followership behavior.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Cooperative Congressional Election Study Human Subjects Review, Harvard University. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
Leading explanations for the advantage males have over females in national leadership attainment account for some of the variance but leave a great deal unexplained. This research evaluates the issue using the biological theory of evolution by natural selection and finds that this approach accounts for additional variance in this phenomenon. The results suggest the predominant preference for male over female leaders in some contexts may be the non-adaptive and non-functional but lingering outcome of an adaptive preference for physically formidable allies that was shaped by natural selection in ancestral environments. Both authors contributed to the article and approved the submitted version.

FUNDING
This study was supported by Texas Tech University, which provided the funding for the data collection.