# PERCEPTIONS OF PEOPLE: CUES TO UNDERLYING PHYSIOLOGY AND PSYCHOLOGY

EDITED BY: Kok Wei Tan, Lisa L. M. Welling, Ian D. Stephen, Alex L. Jones and Danielle Sulikowski

PUBLISHED IN: Frontiers in Psychology

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-748-5 DOI 10.3389/978-2-88963-748-5

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journal Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# PERCEPTIONS OF PEOPLE: CUES TO UNDERLYING PHYSIOLOGY AND PSYCHOLOGY

Topic Editors:

Kok Wei Tan, University of Reading Malaysia, Malaysia

Lisa L. M. Welling, Oakland University, United States

Ian D. Stephen, Macquarie University, Australia

Alex L. Jones, Swansea University, United Kingdom

Danielle Sulikowski, Charles Sturt University, Australia

Citation: Tan, K. W., Welling, L. L. M., Stephen, I. D., Jones, A. L., Sulikowski, D., eds. (2020). Perceptions of People: Cues to Underlying Physiology and Psychology. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-748-5

# Table of Contents

*Editorial: Perceptions of People: Cues to Underlying Physiology and Psychology*

Danielle Sulikowski, Kok Wei Tan, Alex L. Jones, Lisa L. M. Welling and Ian D. Stephen

*Evidence That the Hormonal Contraceptive Pill Is Associated With Cosmetic Habits*

Carlota Batres, Aurélie Porcheron, Gwenaël Kaminski, Sandra Courrèges, Frédérique Morizot and Richard Russell

*The Relationship Between Observers' Self-Attractiveness and Preference for Physical Dimorphism: A Meta-Analysis*

Lijun Chen, Xiaoliu Jiang, Huiyong Fan, Ying Yang and Zhihong Ren

*Facial Adiposity, Attractiveness, and Health: A Review*

Stefan de Jager, Nicoleen Coetzee and Vinet Coetzee

*The Influence of Body Composition Effects on Male Facial Masculinity and Attractiveness*

Xue Lei, Iris J. Holzleitner and David I. Perrett

Amany Gouda-Vossos, Robert C. Brooks and Barnaby J. W. Dixson

*Roar of a Champion: Loudness and Voice Pitch Predict Perceived Fighting Ability but Not Success in MMA Fighters*

Pavel Šebesta, Vít Třebický, Jitka Fialová and Jan Havlíček


# Editorial: Perceptions of People: Cues to Underlying Physiology and Psychology

Danielle Sulikowski<sup>1</sup>\*, Kok Wei Tan<sup>2</sup>, Alex L. Jones<sup>3</sup>, Lisa L. M. Welling<sup>4</sup> and Ian D. Stephen<sup>5,6,7</sup>

*<sup>1</sup> Perception and Performance Research Group, School of Psychology, Charles Sturt University, Bathurst, NSW, Australia, <sup>2</sup> School of Psychology and Clinical Language Sciences, University of Reading Malaysia, Gelang Patah, Malaysia, <sup>3</sup> Department of Psychology, College of Human and Health Sciences, Swansea University, Swansea, United Kingdom, <sup>4</sup> Psychology Department, Oakland University, Rochester, MI, United States, <sup>5</sup> Department of Psychology, Macquarie University, Sydney, NSW, Australia, <sup>6</sup> Perception in Action Research Centre, Macquarie University, Sydney, NSW, Australia, <sup>7</sup> Body Image and Ingestive Behaviour Group, Macquarie University, Sydney, NSW, Australia*

Keywords: mate quality signals, health, fertility, formidability, person perception

#### **Editorial on the Research Topic**

#### **Perceptions of People: Cues to Underlying Physiology and Psychology**

Our perceptual sensitivity to cues of socially and sexually relevant physiological and psychological traits in others is remarkable. For such sensitivity to evolve, the directly perceptible qualities of others (which include intrinsic physical traits, such as height, weight, body odor, facial morphology, and body shape; as well as behaviorally modified appearance cues, such as those produced by clothing and makeup; and vocal parameters) must afford at least somewhat accurate judgements of others' health, fertility, formidability, personality, or other fitness-relevant capacities. The current issue examines whether a variety of perceptible qualities present potentially valid cues of underlying physiology and psychology, and/or the extent to which such cues support adaptive judgements of others.

The human face is a highly complex signaling system, and not surprisingly, it features heavily in this Research Topic. Sexual dimorphism (masculinity in male faces and femininity in female faces) is one of the most studied aspects of facial morphology (for review, see, e.g., Little et al., 2011), and its importance in mate-relevant choices is again highlighted in the current issue. Facial morphological correlates of bodily muscle mass in male faces predict women's perceptions of facial masculinity, and the same women perceive such correlates as more attractive for short-term relationships and less attractive for long-term relationships (Lei et al.). Chen et al. present a meta-analysis of preferences for (primarily) facial sexual dimorphism, confirming well-reported associations between sexual dimorphism and perceived attractiveness. Preferences for highly sexually dimorphic partners are thought to be condition-dependent, reflecting a trade-off made by higher-quality individuals, who forgo kind, caring personalities in a partner (traits attributed especially to less masculine men) in favor of more sexually dimorphic, genetically robust partners. Consistent with this theory, Chen et al. also report a reliable (though small) positive association between own attractiveness and preferences for sexual dimorphism (especially in women rating men as long-term partners, and men rating women as short-term partners). Šterbová et al. reported that women's long-term partner preferences are generally highly stable from one relationship to the next, with one exception: facial masculinity. The facial masculinity of participants' long-term partners varied more than expected by random coupling, the only one of 21 measured traits to do so. In addition, long-term partners with whom the participants had children also had more masculine faces than non-fathers.

Edited and reviewed by: *Peter Karl Jonason, University of Padova, Italy*

\*Correspondence: *Danielle Sulikowski danielle.sulikowski@ymail.com*

#### Specialty section:

*This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology*

Received: *29 February 2020* Accepted: *17 March 2020* Published: *08 April 2020*

#### Citation:

*Sulikowski D, Tan KW, Jones AL, Welling LLM and Stephen ID (2020) Editorial: Perceptions of People: Cues to Underlying Physiology and Psychology. Front. Psychol. 11:643. doi: 10.3389/fpsyg.2020.00643*

Beyond sexual dimorphism, facial adiposity and skin color are also important cues to physical and physiological health. Facial adiposity is a highly reliable indicator of body mass index and numerous health outcomes (most notably those associated with being overweight or obese; de Jager et al.). Across cultures and ethnicities, skin yellowness (which indicates the presence of antioxidant carotenoids), and to a lesser extent skin redness (an indicator of blood flow associated with physical fitness and levels of sex hormones), are subjectively perceived by observers as indicators of physical health (Tan and Stephen). Cues to physiological health, especially anaerobic capacity, may also be present in human faces. Judgements of the fighting ability of mixed martial arts fighters, based on 360° headshots, were jointly predicted by the fighters' weight and anaerobic capacity (Třebický et al.). Although this study took advantage of the additional information available in a 360° headshot compared to a single front-on portrait, Třebický et al. also demonstrated that attractiveness and formidability judgements of front-on, profile, and 360° portraits differ very little in their means and are strongly inter-correlated. This observation bolsters many conclusions in the evolutionary psychology of face perception, which are often based on judgements of single front-on portraits.

The judgements we make of human faces may also influence how we perceive non-human primate faces. Primate faces that are least like human faces are perceived as the most beautiful (Rádlová et al.), an effect attributed to the uncanny valley (i.e., the hypothesized relationship between resemblance to humans and emotional response; e.g., Mori et al., 2012). However, the more closely primate faces resembled human faces, the more the predictors of human facial attractiveness (in terms of the exact arrangement of the internal features) also predicted beauty judgements of the primate faces.

New data-driven methods for analyzing facial morphology are also presented in this Research Topic. Mogilski and Welling present a novel application of data-driven conjoint analysis showing that eyebrow thickness, jaw prominence, and facial height have greater signaling capacity for judgements of sexual dimorphism and attractiveness than other regions of the face. Kleisner et al. proposed a new method for measuring cultural typicality and cross-cultural distinctiveness of faces. By calculating each face's exact position on the vector connecting the facial morphological averages of the two cultures being compared, Kleisner et al. were able to predict significant variance in the faces' perceived distinctiveness. They also demonstrated that in-group/out-group judgements (in a two-alternative forced-choice design) were easier when the out-group face was furthest from the in-group mean. Both of these novel methodologies open the door to more nuanced analyses of the signaling capacity of human facial morphology.
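The core geometric idea, locating a face along the axis that joins two cultural averages, can be illustrated with a scalar projection. This is a minimal sketch, not Kleisner et al.'s pipeline: it assumes each face is already a flattened vector of aligned landmark coordinates (e.g., after Procrustes superimposition), and the function name `position_on_axis` is hypothetical.

```python
from typing import Sequence


def position_on_axis(face: Sequence[float],
                     mean_a: Sequence[float],
                     mean_b: Sequence[float]) -> float:
    """Project a face onto the line through two group means.

    Returns 0.0 at the culture-A average and 1.0 at the culture-B average;
    values outside [0, 1] lie beyond either average. Assumes all inputs are
    aligned landmark vectors of equal length (hypothetical setup).
    """
    if not (len(face) == len(mean_a) == len(mean_b)):
        raise ValueError("all shape vectors must have the same length")
    # Axis pointing from the culture-A average to the culture-B average.
    axis = [b - a for a, b in zip(mean_a, mean_b)]
    axis_sq = sum(v * v for v in axis)
    if axis_sq == 0:
        raise ValueError("group means coincide; the axis is undefined")
    # Offset of the face from the culture-A average.
    offset = [x - a for x, a in zip(face, mean_a)]
    # Scalar projection of the offset onto the axis, scaled so the axis spans 0..1.
    return sum(o * v for o, v in zip(offset, axis)) / axis_sq


# Toy 2-D "shapes": a face halfway along the axis maps to 0.5,
# however far it deviates perpendicular to the axis.
print(position_on_axis([1.0, 5.0], [0.0, 0.0], [2.0, 0.0]))
```

In this sketch the perpendicular deviation is discarded, so two faces at the same axis position but different distances from the axis receive the same score; whether the original method treats that component the same way is an assumption.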

Non-morphological, behavioral cues, including vocalizations and body odor, also provide information about their bearer. Single men, compared to partnered men, exhibit stronger body odor and, based on their body odor alone, are judged by women to be more masculine (Mahmut and Stevenson). Listeners also adjust their perceptions of others based on the pitch of their vocalizations. Mixed martial arts fighters are judged to be more formidable if their roars (spontaneous vocalizations made when victorious) are higher pitched, and also if their speaking voice is lower pitched (Šebesta et al.). Similarly contrasting effects of pitch are observed when comparing singing and speaking voices. The perceived attractiveness of singing and of speaking voices is strongly correlated across individuals, and both predict physical size in men (Valentova et al.). Also in men, lower-pitched speech, but higher-pitched singing, predict higher sociosexuality. In women, shorter apparent vocal tract length (indicated by wider spacing between the first four formant frequencies) in speech, but longer apparent vocal tract length during singing, predicted higher sociosexuality (Valentova et al.). It therefore appears that both the pitch and range of vocalizations provide important cues to the quality of potential mates and competitors.
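The link between formant spacing and apparent vocal tract length comes from modelling the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at F_i = (2i − 1)c / (4L). Adjacent formants are then ΔF = c / (2L) apart, so wider spacing implies a shorter tract. The sketch below fits ΔF by least-squares regression of the measured formants on (2i − 1)/2, one common approach, and converts it to a length. The speed of sound (350 m/s) and the function name are assumptions, and Valentova et al. may have used a different formant measure.

```python
from typing import Sequence

SPEED_OF_SOUND_M_S = 350.0  # assumed value commonly used for the vocal tract


def apparent_vtl(formants_hz: Sequence[float],
                 speed_of_sound: float = SPEED_OF_SOUND_M_S) -> float:
    """Estimate apparent vocal tract length (metres) from formant frequencies.

    Uniform closed-open tube model: F_i = (2i - 1) * dF / 2, with dF = c / (2L).
    dF is fitted by least squares through the origin, then L = c / (2 * dF).
    """
    if not formants_hz:
        raise ValueError("need at least one formant")
    # Regressors (2i - 1) / 2 for formant index i = 1, 2, ..., n.
    ks = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    # Least-squares slope through the origin: sum(k * F) / sum(k^2).
    spacing = sum(k * f for k, f in zip(ks, formants_hz)) / sum(k * k for k in ks)
    return speed_of_sound / (2 * spacing)


# Idealised 17.5 cm tract: dF = 350 / 0.35 = 1000 Hz, so F1..F4 = 500, 1500, 2500, 3500 Hz.
print(apparent_vtl([500.0, 1500.0, 2500.0, 3500.0]))
# Wider spacing (1200 Hz) yields a shorter estimated tract.
print(apparent_vtl([600.0, 1800.0, 3000.0, 4200.0]))
```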

Extended phenotypic traits, manifesting beyond the bearer's physical body, may also constitute socially meaningful cues. Two such adornments are clothing and make-up. Gouda-Vossos et al. investigated how wearing business vs. casual attire influenced perceptions of men's and women's socio-economic status. Business attire increases the perceived economic status of men more than that of women, while the perceived economic status of women is increased if they are depicted in business attire alongside a group of men (whereas the perceived economic status of men is not increased by being depicted among a group of women). The consequences of such judgements for interpersonal interactions are not known. Women use cosmetics to alter their physical appearance, and this can affect their perceived attractiveness. Batres et al. observed that women using the contraceptive pill spend less time putting on make-up for an outing than do naturally cycling women, and that naturally cycling women are indeed perceived as wearing more make-up. This finding adds to the breadth of behaviors in which circulating sex hormones are implicated (see Welling and Shackelford, 2019), although the complexities of female cosmetics use, what it signals, and how it is perceived remain to be thoroughly investigated.

The methods used by evolutionary psychologists to understand how the multitude of physical and behavioral cues we possess are signaled and received were also a focus of critique within this special issue (see Bovet; Kleisner et al.; Třebický et al.). Bovet reviewed the literature concerning women's waist-to-hip ratios and criticized the weak theoretical foundations of the area. Highlighting the large number of potential signaling functions of the waist-to-hip ratio and the small number of studies designed specifically to differentiate between the competing theories, Bovet cautions against rushing into empirical work without first establishing a clear theoretical basis to guide robust and consistent empirical research. Rushing ahead in this way can lead to imprecise, untestable predictions and seemingly contradictory results, and in turn to the premature rejection of potentially legitimate theories based on findings of objectively low evidentiary value.

Cues and signals of physiology, behavior, and personality exist in faces, bodies, voices, and our extended phenotypes. For many such potential cues, we have demonstrated objective links between the putative cue and the underlying quality it may indicate; observed receiver sensitivity to such cues; or put forward coherent adaptive and mechanistic theories accounting for how such cues may come to signal these qualities in the first place (in either a proximate or ultimate sense). For this area of person perception to continue to move forward, concerted efforts are needed to address all three of the above outcomes for each individual cue-quality-receiver system. By triangulating objective relationships between cues and qualities, the impact of such cues on receiver psychology, and sophisticated mechanistic, functional, and adaptive theories to explain the evolution and maintenance of these systems, we will amass a body of work with great potential impact for understanding modern human social, romantic, and sexual interactions.

## AUTHOR CONTRIBUTIONS

DS was the primary author of the first draft. All authors contributed to revisions and corrections of the final manuscript.

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Sulikowski, Tan, Jones, Welling and Stephen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Evidence That the Hormonal Contraceptive Pill Is Associated With Cosmetic Habits

Carlota Batres<sup>1,2</sup>\*, Aurélie Porcheron<sup>3,4</sup>, Gwenaël Kaminski<sup>5,6</sup>, Sandra Courrèges<sup>3</sup>, Frédérique Morizot<sup>3</sup> and Richard Russell<sup>1</sup>

<sup>1</sup> Department of Psychology, Gettysburg College, Gettysburg, PA, United States, <sup>2</sup> Department of Psychology, Franklin and Marshall College, Lancaster, PA, United States, <sup>3</sup> CHANEL Fragrance & Beauty Research & Innovation, Pantin, France, <sup>4</sup> Laboratoire de Psychologie et NeuroCognition, Université Pierre Mendès-France, Grenoble, France, <sup>5</sup> CNRS (UMR 5263), Cognition, Langues, Langage, Ergonomie, Université de Toulouse, Toulouse, France, <sup>6</sup> Institut Universitaire de France, Paris, France

Hormonal contraception is known to cause subtle but widespread behavioral changes. Here, we investigated whether changes in cosmetic habits are associated with use of the hormonal contraceptive pill. We photographed a sample of women (N = 36) who self-reported whether or not they use the contraceptive pill, as well as their cosmetic habits. A separate sample of participants (N = 143) rated how much makeup these target women appeared to be wearing. We found that women not using the contraceptive pill (i.e., naturally cycling women) reported spending more time applying cosmetics for an outing than did women who use the contraceptive pill. We also found that the faces of these naturally cycling women were rated as wearing more cosmetics than the faces of the women using the contraceptive pill. Thus, we found clear associations between contraceptive pill use and makeup use. This provides evidence consistent with the possibility that cosmetic habits, and grooming behaviors more generally, are affected by hormonal contraception.

Keywords: cosmetics, makeup, contraception, birth control, grooming behaviors

## INTRODUCTION

The hormonal contraceptive pill is used by approximately 100 million women worldwide (Christin-Maitre, 2013). Its main function is to alter the hormonal state of the menstrual cycle so as to mimic the hormonal profile of pregnancy and thereby prevent conception (Alvergne and Lummaa, 2010). While the majority of women take the contraceptive pill to prevent pregnancy, approximately 14% of women use it for other reasons, such as lessening menstrual pain and migraines (Cooper and Adigun, 2017).

In addition to its medical side effects, the contraceptive pill has also been linked to several behavioral effects (Welling et al., 2012). For example, one study found that married women not using the contraceptive pill (i.e., naturally cycling women) showed an increase in female-initiated sexual behavior at the time of ovulation, whereas women using the contraceptive pill did not show such a rise (Adams et al., 1978). Other studies have found that the hormonal state induced by the contraceptive pill changes women's partner preferences. For instance, one study found that, unlike naturally cycling women, those using the contraceptive pill prefer less masculine male faces (Little et al., 2002, but see Jones et al., 2018).

Research has also found that the contraceptive pill affects women's ability to attract mates. For example, one study found that naturally cycling women's voices became more attractive as their risk of conception increased, but no such effect was found for women using the contraceptive pill (Pipitone and Gallup, 2008). One field study examined the tip earnings of professional lap dancers and found that those who were naturally cycling made more money per shift than those who were using the contraceptive pill (Miller et al., 2007).

#### Edited by:

Alex L. Jones, Swansea University, United Kingdom

#### Reviewed by:

Benedict C. Jones, University of Aberdeen, United Kingdom; Danielle Leigh Wagstaff, Federation University, Australia

\*Correspondence:

Carlota Batres cbatres@fandm.edu

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 22 May 2018 Accepted: 24 July 2018 Published: 23 August 2018

#### Citation:

Batres C, Porcheron A, Kaminski G, Courrèges S, Morizot F and Russell R (2018) Evidence That the Hormonal Contraceptive Pill Is Associated With Cosmetic Habits. Front. Psychol. 9:1459. doi: 10.3389/fpsyg.2018.01459

In this study, we aimed to examine whether the use of the contraceptive pill also influences grooming behaviors, particularly cosmetic habits. Previous research has found that among naturally cycling women, the time spent putting on cosmetics and the rated level of cosmetics used seem to be higher near ovulation (Guéguen, 2012). No study, however, has examined whether there is a difference in cosmetic habits between naturally cycling women and women using the contraceptive pill. Previous research has shown that, unlike women using the contraceptive pill, naturally cycling women change their appearance near ovulation to look more attractive (Haselton et al., 2007; Durante et al., 2011). Given that makeup is one way that women can alter their attractiveness (Graham and Jouhar, 1981; Jones et al., 2015; Jones and Kramer, 2016; Batres et al., 2018), we predicted that naturally cycling women would report spending more time applying cosmetics, and would be rated as wearing more cosmetics, than women using the contraceptive pill.

## STUDY 1

## Methods

#### Participants and Procedure

Thirty-six women (M age = 19.58 years, SD = 2.14) completed the study. The participants were recruited at the University of Grenoble through advertisements and received financial compensation for participating. The research was performed in accordance with the Declaration of Helsinki: it was conducted with the understanding and written consent of each participant (all of whom were informed that their photographs would be taken) and was approved by local ethics boards (CNRS and the University of Grenoble).

After participants arrived at the laboratory, 20 min were allowed to pass so that their skin could acclimatize to the indoor temperature. Photographs were then taken of each participant facing forward, under constant camera and lighting conditions, with a neutral expression and closed mouth. Each participant also completed a questionnaire in which she reported whether she was taking hormonal contraceptives and, if so, what type (e.g., the contraceptive pill, an intrauterine device), whether she had regular menstrual cycles, and whether she was in a relationship. All 36 women reported having regular menstrual cycles. Twenty women reported not using any hormonal contraceptives (i.e., naturally cycling) and 16 reported using hormonal contraceptives, all of whom reported using the contraceptive pill.

Each participant also answered two questions about her cosmetic habits: how much time she spent on her daily makeup in the morning and how much time she spent making up for an outing. These two questions defined two cosmetic-habit variables: daily cosmetics and outing cosmetics. To examine the association between contraceptive pill use and cosmetic habits, we performed independent-samples t-tests on both variables.

#### Results

Naturally cycling women reported spending more time applying cosmetics (both daily and for an outing) than women using the contraceptive pill (see **Figure 1**). The difference between naturally cycling women and women using the contraceptive pill was not statistically significant for the amount of time they spent on their daily cosmetics, t(33) = 1.59, p = 0.121, Cohen's d = 0.51, but it was statistically significant for the amount of time they spent on their cosmetics for an outing, t(19.6) = 2.30, p = 0.033, Cohen's d = 0.80.
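The two comparisons above rest on standard formulas. The fractional degrees of freedom reported for the outing comparison, t(19.6), are characteristic of Welch's unequal-variance t-test, although the paper does not name the variant used. The sketch below computes Welch's t, its Welch-Satterthwaite degrees of freedom, and Cohen's d with a pooled standard deviation (one common definition; the authors' exact choice is not stated). The function names are illustrative and the numbers are invented, not the study's data. For p-values, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    # Squared standard errors of each group mean (sample variance uses n - 1).
    se1, se2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of both groups."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Invented minutes-spent values, for illustration only.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
print(f"t({dof:.1f}) = {t_stat:.2f}, d = {cohens_d(group_a, group_b):.2f}")
```

Welch's form is a reasonable default when group sizes and variances differ, since it does not assume equal variances; with equal sizes and variances its degrees of freedom reduce to n1 + n2 − 2.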

## STUDY 2

## Methods

#### Participants and Procedure

One hundred and forty-three Gettysburg College students (M age = 18.53 years, SD = 0.87; 56 male, 87 female) participated in Study 2 as part of a course requirement. Ethical approval was received from the Gettysburg College Institutional Review Board. Participants were told that they would be viewing and rating face images on a computer. Participants were asked to rate each face on the question: "How much makeup does this face have?," where 1 = Very little makeup and 7 = A lot of makeup. The faces presented were those of the women from Study 1. To examine the association between pill use and perceived amount of cosmetics, we fitted a linear mixed model.

#### Results

Due to repeated measurements, we included the target faces and the participants as random effects. Naturally cycling women were rated as having higher levels of cosmetics [3.65 (3.24–4.06)] than women using the contraceptive pill [2.79 (2.43–3.17)] (see **Figure 2**). The difference in amount of perceived cosmetics between naturally cycling women and women using the contraceptive pill was statistically significant (χ²(1) = 8.86, p = 0.003, β = 0.85 [0.31–1.39]).

FIGURE 1 | Reported time (in minutes) spent applying daily cosmetics in the morning and making up for an outing, for naturally cycling women (i.e., those not using the contraceptive pill) and women using the hormonal contraceptive pill. The asterisk indicates a significant difference (∗p < 0.05). Error bars indicate the standard error of the mean.
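One way to write the model described above is shown below, assuming crossed random intercepts only (the paper does not say whether random slopes were included). Here $y_{rt}$ is rater $r$'s makeup rating of target face $t$, and $P_t$ is an indicator coded 1 for naturally cycling targets and 0 for pill users; this coding is an assumption.

$$
y_{rt} = \beta_0 + \beta_1 P_t + u_r + v_t + \varepsilon_{rt}, \qquad
u_r \sim \mathcal{N}(0, \sigma_u^2), \quad
v_t \sim \mathcal{N}(0, \sigma_v^2), \quad
\varepsilon_{rt} \sim \mathcal{N}(0, \sigma^2)
$$

Under this coding, $\beta_1$ is the estimated rating difference between the groups, which fits the reported β = 0.85 being close to the gap between the group estimates (3.65 − 2.79 = 0.86). The χ²(1) statistic is consistent with a likelihood-ratio test against a model omitting $\beta_1 P_t$, which differs by one parameter, though the paper does not state this explicitly.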

## DISCUSSION

The results from Studies 1 and 2 provide the first evidence that the use of the hormonal contraceptive pill is associated with cosmetic habits. In Study 1, we found that naturally cycling women spent more time applying cosmetics for an outing than women using the contraceptive pill. On average, naturally cycling women reported spending 13 min longer applying cosmetics for an outing than women using the contraceptive pill. Naturally cycling women also reported spending more time applying their daily cosmetics than women using the contraceptive pill, but this difference was not statistically significant. In Study 2, we found that the faces of the naturally cycling women (photographed during the day) were rated by participants as having higher amounts of cosmetics than the faces of the women using the contraceptive pill. This suggests that while naturally cycling women may not spend more time applying their daily makeup, they use more visible cosmetics. These findings thus provide evidence that contraceptive pill use is associated with cosmetics use.

This is consistent with previous research suggesting that naturally cycling women are found more attractive than women using the hormonal contraceptive pill (Miller et al., 2007). Our findings, however, suggest that part of this difference in attractiveness may be due to cosmetics. In other words, naturally cycling women may, in part, be found more attractive than women using the contraceptive pill because they wear more cosmetics, which greatly increase attractiveness (Graham and Jouhar, 1981; Batres et al., 2018). For instance, Miller et al. (2007) found that naturally cycling lap dancers had higher tip earnings than those using the contraceptive pill, but part of that difference may be explained by a difference in the amount of cosmetics worn by the dancers. Indeed, research has observed that waitresses receive higher tips when they are wearing cosmetics than when they are not (Jacob et al., 2009). This suggests that differences in makeup use likely explain part of the differences in attractiveness found between naturally cycling women and women using hormonal contraception.

Our results may also point to a larger issue: that the contraceptive pill may be associated with behavioral changes that affect women's grooming practices more generally, of which cosmetics are just one part. In other words, the contraceptive pill may suppress how women adorn themselves, otherwise referred to as the extended phenotype (Etcoff et al., 2011). Some studies have examined how the extended phenotype changes throughout the menstrual cycle. For instance, one study found that self-grooming and ornamentation through attractive choice of dress increased during high-fertility periods (Haselton et al., 2007). However, a recent longitudinal study did not find evidence for fertility-linked changes in women's clothing choices (Arslan et al., 2017). Similar studies are needed to compare the extended phenotype of women who are using the contraceptive pill and those who are not. Such research would shed light on whether cosmetic habits are the only form of ornamentation influenced by the contraceptive pill or whether they are just one facet of a greater behavioral shift in how women present themselves (e.g., grooming, jewelry, clothing).

Our study neither controlled for the phase of the menstrual cycle in which women were photographed nor asked how long the women had been using or not using the contraceptive pill; future research should control for both. We also did not define what was meant by an outing, and the interpretation of this term could have affected self-reported time spent applying cosmetics. For example, women may have reported less time if an outing was interpreted as going out for dinner than if it was interpreted as going out to a club. Future studies would therefore benefit from being more specific when asking about time spent applying cosmetics for an outing. Similarly, while we asked the women how much time they spend on their daily makeup in the morning, it would have been helpful to ask how much time they spent making up that exact morning in order to better link cosmetics application time with perceptual differences. Moreover, adding other measures of cosmetics use would be beneficial in future studies (e.g., "How natural does the makeup on this face look?"). Women who are skilled at perfecting the "natural" look are likely to spend more time applying cosmetics but may appear to be wearing less makeup.

Further research is still needed to better understand links between the contraceptive pill and cosmetic habits, as our study used a non-randomized between-subjects design, and naturally cycling women and women using the contraceptive pill have been found to differ in other ways (Alexander et al., 1990). For example, Little et al. (2002) found that women using the contraceptive pill reported having had more lifetime sexual partners than naturally cycling women. A related concern is that in our study only 19% of naturally cycling women reported being in a relationship, compared with 85% of women using the contraceptive pill. Our sample was too small to include relationship status as a factor in our analyses, so future research with a larger sample is needed. No study has examined differences in cosmetic habits between single women and women in a relationship; however, such a difference is quite possible. To address these two concerns and confirm the link between the contraceptive pill and cosmetic habits, an experimental design, rather than the correlational one used here, would be needed. Specifically, women would need to be randomly assigned to use or not use the contraceptive pill to establish a causal effect of the contraceptive pill on cosmetic habits. However, such an experiment is unlikely for ethical reasons, which is why the current literature has relied on non-randomized between-subjects designs (e.g., Adams et al., 1978; Little et al., 2002; Miller et al., 2007; Pipitone and Gallup, 2008; Welling et al., 2012).

#### CONCLUSION

In conclusion, we found that, compared to women using the contraceptive pill, naturally cycling women self-reported spending more time applying cosmetics for an outing, and their faces were rated as having higher levels of cosmetics. These results provide initial evidence that the contraceptive pill is associated with cosmetic habits. Moreover, this association may be part of a broader relationship between contraceptive pill use and other grooming behaviors, stemming from hormone-mediated changes in motivations.

## ETHICS STATEMENT

The experiments were undertaken with the understanding and written consent of each subject, with the approval of the appropriate local ethics committees, and in compliance with national legislation and the Code of Ethical Principles for Medical Research Involving Human Subjects of the World Medical Association (Declaration of Helsinki).

### AUTHOR CONTRIBUTIONS

CB, AP, GK, FM, and RR conceived and designed the research. CB and GK acquired the data. CB, GK, and RR analyzed the data. CB, AP, GK, SC, and RR interpreted the results. CB drafted the work. CB, AP, GK, SC, and RR critically revised the paper. CB, AP, GK, SC, FM, and RR approved the final version to be published and agreed to be accountable for the content of the work.

## FUNDING

The research was funded in part by CHANEL Fragrance & Beauty Research & Innovation.

## ACKNOWLEDGMENTS

We thank Lucille Brunero for help with collecting data for this manuscript.

#### REFERENCES

Evol. Hum. Behav. 28, 375–381. doi: 10.1016/j.evolhumbehav.2007.06.002


**Conflict of Interest Statement:** AP, FM, and SC work at CHANEL Fragrance & Beauty Research & Innovation, a cosmetics company, and CB and RR receive funding from CHANEL Fragrance & Beauty Research & Innovation.

The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Batres, Porcheron, Kaminski, Courrèges, Morizot and Russell. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Relationship Between Observers' Self-Attractiveness and Preference for Physical Dimorphism: A Meta-Analysis

Lijun Chen<sup>1,2</sup>, Xiaoliu Jiang<sup>1</sup>, Huiyong Fan<sup>3</sup>, Ying Yang<sup>1</sup> and Zhihong Ren<sup>4,5</sup>\*

*<sup>1</sup> School of Humanities and Social Sciences, Fuzhou University, Fuzhou, China, <sup>2</sup> Institute of Psychological and Cognitive Sciences, Fuzhou University, Fuzhou, China, <sup>3</sup> College of Teacher, Bohai University, Jinzhou, China, <sup>4</sup> Key Laboratory for Adolescent Cyberpsychology and Behavior of the Education Ministry, Wuhan, China, <sup>5</sup> Laboratory of Human Development and Mental Health, Institute of Psychology, Central China Normal University, Wuhan, China*

Background: Many studies have reported an association between observers' self-attractiveness and their preference for sexual dimorphism across different physical domains, including the face, voice, and body. However, the results of these studies are inconsistent. Here, a meta-analysis was conducted to estimate the association between observers' own attractiveness and their dimorphic preference.

## Edited by:

*Lisa L. M. Welling, Oakland University, United States*

#### Reviewed by:

*Anthony Little, University of Bath, United Kingdom*

*Nicole Barbaro, Oakland University, United States*

*Lei Chang, University of Macau, China*

> \*Correspondence: *Zhihong Ren psyren@qq.com*

#### Specialty section:

*This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology*

Received: *15 June 2018* Accepted: *19 November 2018* Published: *05 December 2018*

#### Citation:

*Chen L, Jiang X, Fan H, Yang Y and Ren Z (2018) The Relationship Between Observers' Self-Attractiveness and Preference for Physical Dimorphism: A Meta-Analysis. Front. Psychol. 9:2431. doi: 10.3389/fpsyg.2018.02431*

Methods: Major electronic databases, including PsycINFO, Web of Science, PubMed, ProQuest, and Google Scholar, were searched in April 2017 and again in April 2018. Effect sizes were computed, and moderator analyses were conducted, separately for masculinity and femininity preferences.

Results: We identified 5,359 references, from which 25 studies (*x* = 55, where *x* is the number of effect sizes) with 6,853 participants were included in the meta-analysis. Across these studies, the correlation between observers' own attractiveness and their sexual dimorphic preference was 0.095 (*x* = 55), and the correlations for masculinity preference (*x* = 39) and femininity preference (*x* = 16) were 0.102 and 0.076, respectively. The funnel plot, Egger's regression test, and fail-safe N suggested no obvious publication bias. The relationship depended on relationship context (short- or long-term), whether the stimuli were of the opposite or same sex as the observer, the measure of observers' self-attractiveness (subjective or objective), and the preference task (e.g., attractiveness rating, forced choice, and face sequence test). Furthermore, for female participants, hormonal contraceptive use also moderated the association with masculinity preference. The effect size for preference for masculine bodies and voices was larger than that for facial masculinity.

Conclusion: We found a small but significant correlation between self-attractiveness and physical dimorphic preference; the relationship was moderated by relationship context, same/opposite-sex stimuli, and contraceptive use. These three moderating effects reflect observers' trade-offs among good genes, good provider, and good father (the 3Gs), consistent with life history strategies. In addition, the measure of observers' attractiveness, the type of preference task, and the type of stimuli may also affect the relationship.

Keywords: femininity, masculinity, meta-analysis, self-attractiveness, sexual dimorphism

## INTRODUCTION

Secondary sexual characteristics in adult humans reflect the masculinization or feminization that occurs during puberty (Perrett et al., 1998; Rhodes, 2006). Physical sexual dimorphism is a broad concept that can include sexual dimorphism in multiple domains (e.g., face, body, voice). Sexually dimorphic physical traits are important for mate choice and mate preference in many species, including humans. Several previous studies have observed that humans' preferences for physical cues of extreme secondary sexual characteristics (more feminine for women, more masculine for men) are correlated across domains (e.g., visual, vocal, and bodily) (Little et al., 2007; Fraccaro et al., 2010). These correlations indicate systematic, rather than arbitrary, variation in humans' preferences for sexual dimorphism, which is consistent with the proposal that sexually dimorphic cues in different domains reflect a common underlying aspect of quality. From an evolutionary perspective, femininity in women and masculinity in men are proposed to be attractive because they advertise an individual's good genes (Rhodes, 2006). In humans, physical characteristics typical of the owner's sex are correlated with indices of long-term health (Rhodes et al., 2003; Thornhill and Gangestad, 2006), reproductive potential (Puts, 2005; Rhodes et al., 2005), and low parasite load and high immune competence (Thornhill and Gangestad, 1993, 1996), but negatively correlated with prosociality (Haselton, 2005; Haselton and Gangestad, 2006). Masculine traits in men are associated with untrustworthiness and poor parental qualities (Boothroyd et al., 2007; Smith et al., 2009b), and feminine women are considered more likely to be unfaithful, to pursue short-term relationships, and to pose a higher risk of cuckoldry (Little et al., 2014).

According to the life history (LH) trade-off model of mating strategies (Gangestad and Simpson, 2000; Del Giudice and Belsky, 2011), where LH refers to how organisms capture energy from the environment and use it to produce more organisms, women focus on two types of characteristics when choosing a mate: those indicating a "good provider" (socioeconomic characteristics, such as wealth, education, and career) and those indicating "good genes" (physical characteristics) (Gangestad and Buss, 1993; Gangestad and Simpson, 2000). Other researchers have argued that the framework of women's mate preferences should involve three Gs: besides good genes and good providers, women also prioritize personality traits that make a man a good father, for example, being kind, loving, and staying at home (Buss and Shackelford, 2008; Lu et al., 2015). Good-provider traits indicate that a man has resources to invest in parenting, and good-father traits reflect his willingness to help raise offspring; both represent reproductive effort after mating (parental investment), whereas good-genes characteristics are mating attributes that affect premating decisions. Women are thought to have evolved to trade off investment qualities against indicators of good genes depending on environmental conditions: because males with good genes tend to have more mates at the same time and invest less in each female than males of lower phenotypic quality do, women's emphasis on good genes may come at the cost of men's parental investment (Gangestad and Simpson, 2000). Men also face trade-offs: although feminine women are highly attractive and possess good genes, they are also associated with negative personality characteristics such as infidelity (Haselton, 2005; Haselton and Gangestad, 2006), and men also have expectations about maternal investment.

For both men and women (but more for men than women), parental warmth and care (being a good father or mother) correlated negatively with good-genes and good-provider mate values (Chang et al., 2017). Thus, both men and women face the trade-offs described by the 3Gs mating framework.

Therefore, preferences for dimorphism reflect the result of trading off good genes against parental investment, and differences in how individuals resolve this trade-off can lead to individual differences in sexual dimorphic preference. For example, attractive women demonstrate stronger preferences for masculine men than relatively unattractive women do (Little et al., 2001; Little and Mannion, 2006). These are called condition-dependent preferences. Condition-dependent preferences have been observed in many species, in which individuals in good physical condition tend to show stronger preferences for high-quality mates (Bakker et al., 1999). Condition-dependent preferences in humans and non-humans may share a common function and may occur because individuals in good physical condition (i.e., attractive individuals) are better able to compete for and/or retain high-quality mates (Little et al., 2001). Such individuals can also offset the costs of choosing a partner with good genes (e.g., by being able to replace a partner more quickly) and can therefore raise their standards for mate selection. Conversely, individuals in poor condition may lower their standards for mate selection to secure parental investment, preferring mates who are more likely to invest heavily as parents. Several findings in humans appear analogous to such condition-dependent preferences. Women's ratings of their own physical attractiveness correlated positively with the strength of their preferences for masculine characteristics in men's faces (Little et al., 2001; Penton-Voak et al., 2003; Smith et al., 2009b). Similar correlations have been found for men's voices (Vukovic et al., 2008, 2010) and bodies (Little et al., 2007).

Further, condition-dependent preferences have been conceptualized as "market value dependent preferences": after exposure to attractive same-sex images, women perceived themselves as less attractive and showed weaker preferences for male facial masculinity, whereas after exposure to unattractive same-sex images, they perceived themselves as more attractive and showed stronger preferences for masculinity (Little and Mannion, 2006).

There is inconsistent evidence on the relationship between observers' own attractiveness and their sexual dimorphic preference. While some experimental results have confirmed that attractive females prefer masculine faces (Smith et al., 2009b; Welling et al., 2009; Kandrik and DeBruine, 2012), others did not find such a relationship (Zietsch et al., 2015; Carrito et al., 2016). However, several studies have found that observers' own attractiveness can interact with other variables and impact their preferences (Little et al., 2001; Smith et al., 2009a; Burriss et al., 2011; Chen et al., 2017).

## Influence of Relationship Context on Opposite-Sex Preferences

The context in which judgments are made can also contribute to differences in the relationship between dimorphic preference and the observer's self-attractiveness (Little et al., 2001, 2002; Penton-Voak et al., 2003; Carrito et al., 2016). There are two types of relationship context: short-term and long-term. The former refers to a brief sexual relationship, such as a one-night stand; the latter refers to a lasting relationship, such as marriage. Trade-off theory proposes that contextual factors affect the strength of people's preferences for a masculine or feminine partner (Gangestad and Simpson, 2000). Attractive women prefer more masculine male faces than less attractive women do, and this difference is seen in the context of a long-term but not a short-term relationship (Penton-Voak et al., 2003). This result has been supported by several studies using voice and body stimuli (Little et al., 2007; Feinberg et al., 2012). By contrast, attractive men exhibit stronger preferences for feminine women only in the short-term context (Burriss et al., 2011; Little et al., 2014). Because feminine women are seen as more likely to be unfaithful and to pursue short-term relationships (Boothroyd et al., 2008), Little et al. (2014) proposed that the risk of cuckoldry limits men's preferences for femininity in women and could additionally lead to preferences for femininity in short-term mates. Thus, we hypothesized that, for women, the influence of attractiveness on preference for male masculinity is more pronounced in the long-term than in the short-term context, whereas for men, the effect of attractiveness on preference for female femininity is more prominent in the short-term context.

## Observer's Age and Contraceptive Use

Reproductively active women have been found to have the strongest preference for male masculinity (Little et al., 2010; Jones et al., 2011), and another study observed this age effect only in women not using contraception (Little et al., 2002). Some studies have found that hormonal contraceptive use may modulate individual differences in women's masculinity preference (Little et al., 2002; Feinberg et al., 2008; Smith et al., 2009a). Furthermore, Vukovic et al. (2008) found that self-rated attractiveness was positively related to the strength of women's preference for masculinized men's voices in women reporting no use of hormonal contraceptives, but not in those using them. The effects of both age and contraceptive use in female observers are thought to reflect their physiological hormone levels: women of different ages are at different reproductive stages with different hormone levels, and women using hormonal contraceptives are in a hormonal state similar to pregnancy, so they are unable to realize the benefits thought to be associated with choosing a masculine mate (i.e., increased offspring health) (Smith et al., 2009a). In light of these studies, we speculated that the attractiveness-contingent preference is stronger in women who do not use hormonal contraceptives than in those who do.

## Measures of Observer's Own Attractiveness

Initially, researchers used self-rated items to determine observers' own attractiveness. Little et al. (2001) used self-reported attractiveness as an indicator of observers' own attractiveness and found that attractive women preferred masculine male faces; Little and Mannion (2006) found that women's subjective impressions of their own market value (i.e., their self-rated attractiveness) are particularly important for the effects of attractiveness on women's masculinity preferences. However, other researchers failed to confirm these findings (Cornwell et al., 2006). Some researchers have therefore questioned the accuracy of subjective assessments and begun to adopt objective measures of self-attractiveness (Penton-Voak et al., 2003). These objective measures include other-rated facial attractiveness, waist-to-hip ratio (WHR), and body mass index (BMI) (Penton-Voak et al., 2003; Smith et al., 2009b). One study found that women with a high WHR (an indicator of lower attractiveness) and/or relatively low other-rated facial attractiveness preferred more feminine male faces when choosing men for a long-term relationship, a result confirmed by another study (O'Connor et al., 2012). Which kind of measure (subjective or objective) is more sensitive to self-attractiveness? In the current study, we examined the moderating role of the measure of observers' own attractiveness (subjective vs. objective) and compared the effect sizes obtained with each.

Thus, evidence on the relationship between observers' self-attractiveness and preference for sexual dimorphism is equivocal, potentially because the preference is condition-dependent. Observers' self-attractiveness therefore appears to be an important variable that interacts with other variables, such as measurement method, relationship context, same/opposite-sex stimuli, and contraceptive use (yes or no), to influence observers' dimorphic preferences. Drawing on the LH trade-off model and condition-dependent preferences, we used meta-analysis to investigate whether observers' sexual dimorphic preferences vary with their self-attractiveness and to interpret possible reasons for the divergent findings. We focused on two core questions: What is the overall correlation between the two variables? Which factors significantly moderate their relationship?

## METHODS

## Information Sources and Search

The following search terms were used in combination: sexual dimorphism, masculin\*, feminin\*, fac\*, bod\*, vocal, voice, and attractiveness. Search terms for observers' own attractiveness included self-rated attractiveness, self-perceived attractiveness, self-reported attractiveness, self-perceptions of attractiveness, self-ratings of attractiveness, other-rated attractiveness, and third-party attractiveness ratings. Major electronic databases, including PsycINFO, Web of Science, PubMed, ProQuest, and Google Scholar, were searched in April 2017 and again in April 2018. The reference lists of the included studies were also searched to identify additional studies.
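The text does not report the exact Boolean syntax used in each database. As a hedged sketch, one common way to combine two term groups is to OR the terms within each group and AND the two groups together; the function names below are hypothetical, and real databases each have their own field tags and wildcard rules.

```python
def or_block(terms):
    """OR the terms together, quoting multi-word phrases so they are matched as phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(preference_terms, self_attractiveness_terms):
    """Hypothetical combination: (any preference term) AND (any self-attractiveness term)."""
    return or_block(preference_terms) + " AND " + or_block(self_attractiveness_terms)


# Term lists copied from the text; a trailing * is a truncation wildcard.
PREFERENCE_TERMS = [
    "sexual dimorphism", "masculin*", "feminin*", "fac*", "bod*",
    "vocal", "voice", "attractiveness",
]
SELF_ATTRACTIVENESS_TERMS = [
    "self-rated attractiveness", "self-perceived attractiveness",
    "self-reported attractiveness", "self-perceptions of attractiveness",
    "self-ratings of attractiveness", "other-rated attractiveness",
    "third-party attractiveness ratings",
]

if __name__ == "__main__":
    print(build_query(PREFERENCE_TERMS, SELF_ATTRACTIVENESS_TERMS))
```

Quoting multi-word phrases keeps "self-rated attractiveness" from being split into independent words by databases that default to implicit AND between words.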

## Eligibility Criteria and Study Selection

Only studies that met the following three criteria were included:

1. The study investigated the relationship between sexual dimorphic preference and the observer's self-attractiveness.
2. The study accurately reported specific data on this relationship (such as correlation coefficients r; means; standard deviations; sample sizes; or the corresponding F, t, or χ<sup>2</sup>) that allowed the effect size to be calculated. Data from structural equation models, path analyses, and multiple regression analyses were excluded. To avoid missing important studies, we wrote to the authors (first or corresponding author) to obtain the correlation coefficient if it was not reported in the article.
3. Where there were multiple reports of the same study, we used the first published report.

Two authors independently screened the titles and abstracts of the identified articles to exclude ineligible studies. Disagreements were resolved by discussion. We retrieved the full texts of potentially eligible studies and examined them for further evaluation. The PRISMA flow diagram (Moher et al., 2010) shows all steps of the literature search (see **Figure 1**).

## Summary Measures

In psychological research, the standardized mean difference (d) and the correlation coefficient (r) are frequently used as effect sizes. To integrate the relationship between sexual dimorphic preference and observers' own attractiveness, the correlation coefficient (r) was used in the present meta-analysis. For primary studies that reported t, F, or χ<sup>2</sup> statistics rather than correlations, r was derived with the following formulas (Card, 2012):

$$\begin{aligned} r &= \sqrt{\frac{t^2}{t^2 + df}}, \quad df = n_1 + n_2 - 2; \qquad r = \sqrt{\frac{F(1, df_{\text{error}})}{F(1, df_{\text{error}}) + df_{\text{error}}}}; \\ r &= \sqrt{\frac{\chi^2}{\chi^2 + N}}. \end{aligned}$$
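A minimal sketch of the t and F conversions above, for illustration only (the helper names are hypothetical). Because an F test with one numerator degree of freedom equals t<sup>2</sup>, both routes give the same r for the same comparison.

```python
import math


def r_from_t(t, df):
    """Convert an independent-samples t statistic to |r| = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


def r_from_f(f, df_error):
    """Convert an F(1, df_error) statistic to |r| = sqrt(F / (F + df_error))."""
    return math.sqrt(f / (f + df_error))


# Hypothetical study: t = 2.0 with n1 + n2 = 38, so df = 38 - 2 = 36.
r = r_from_t(2.0, 36)   # sqrt(4 / 40), about 0.316
# The sign cannot be recovered from t^2 or F; it has to be taken from the
# direction of the effect reported in the primary study.
```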

After extraction, all correlations were transformed using Fisher's Z-transformation (Lipsey and Wilson, 2001), because the sampling distribution of Z<sub>r</sub> is approximately normal (Hittner and Swickert, 2006; Borenstein et al., 2009). The transformation is

$$Z_r = 0.5 \times \ln\left(\frac{1 + r}{1 - r}\right).$$

The overall Z<sub>r</sub> was computed as a weighted mean of the individual Z<sub>r</sub> values, and the overall r was then obtained by the inverse of Fisher's Z-transformation (Lipsey and Wilson, 2001; Borenstein et al., 2009). These computations for the overall effect size were conducted under the random-effects model (Borenstein et al., 2009). All computations and analyses were conducted with the specialized statistical software Comprehensive Meta-Analysis (CMA, Version 2.2).
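The pooling steps above can be sketched as follows. This is an illustrative reimplementation, not the CMA code: it assumes the usual sampling variance 1/(n − 3) for Z<sub>r</sub> and a DerSimonian-Laird (method-of-moments) estimate of the between-study variance τ<sup>2</sup>; the text does not state which τ<sup>2</sup> estimator was selected in CMA 2.2, and the function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's r-to-Z transformation: Z_r = 0.5 * ln((1 + r) / (1 - r)), for |r| < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))


def random_effects_r(rs, ns, z_crit=1.96):
    """Pool correlations on the Z_r scale with DerSimonian-Laird random-effects weights."""
    if len(rs) < 2:
        raise ValueError("need at least two effect sizes")
    zs = [fisher_z(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]              # sampling variance of each Z_r
    w = [1.0 / v for v in vs]                      # inverse-variance (fixed-effect) weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(zs) - 1)) / c)       # between-study variance, truncated at 0
    w_re = [1.0 / (v + tau2) for v in vs]          # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    # math.tanh is the inverse of Fisher's transformation.
    return {
        "r": math.tanh(z_re),
        "ci": (math.tanh(z_re - z_crit * se), math.tanh(z_re + z_crit * se)),
        "Z": z_re / se,
        "tau2": tau2,
    }
```

Averaging on the Z<sub>r</sub> scale and back-transforming only the final estimate and confidence limits is what keeps the interval asymmetric around r, as correlations near ±1 require.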

## Data Extraction

Two authors extracted the following information: first author's name, year of publication, sample size of observers, observers' age, gender of the stimuli (male or female), observers' gender (male or female), facial attractiveness task (the forced-choice test with an attractiveness rating, the forced-choice test alone, or the face sequence test), sexual dimorphic preference (masculinity or femininity), measure of observers' own attractiveness (objective or subjective), type of stimuli (face, voice, body), and contraceptive use (yes or no; female observers only).

Because sexual dimorphic preference comprises masculine and feminine preferences, and the correlation of observers' own attractiveness with masculine preference runs in the opposite direction to its correlation with feminine preference, this meta-analysis calculated effect sizes and conducted moderator analyses separately for masculine and feminine preferences. The data are available in **Supplementary Table 1**.

## Heterogeneity Test

The heterogeneity test was used to examine whether the effect sizes were heterogeneous. Each observed effect size comprises a true effect and sampling error, so part of the observed variation among effect sizes may be spurious. Heterogeneity is usually assessed with the Q statistic (Borenstein et al., 2009) and I<sup>2</sup> (Card, 2012); in this meta-analysis, both Q and I<sup>2</sup> were used to detect heterogeneity among the included effect sizes.
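I<sup>2</sup> expresses the percentage of total variation across effect sizes that exceeds what sampling error alone would produce: I<sup>2</sup> = (Q − df)/Q × 100 with df = k − 1, truncated at 0, where k is the number of effect sizes (the paper's x). The minimal sketch below uses illustrative function names; plugging in the Q values and numbers of effect sizes reported in the Results reproduces the reported I<sup>2</sup> values.

```python
def cochran_q(effects, variances):
    """Cochran's Q: inverse-variance weighted sum of squared deviations from the pooled mean."""
    w = [1.0 / v for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, effects))


def i_squared(q, k):
    """Higgins' I^2 in percent from Cochran's Q and the number of effect sizes k."""
    df = k - 1
    return 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100.0


# Q values and numbers of effect sizes (k) as reported in the Results section.
for label, q, k in [("overall", 145.567, 55), ("masculine", 72.93, 39), ("feminine", 71.77, 16)]:
    print(f"{label}: I^2 = {i_squared(q, k):.1f}%")
```

This prints 62.9%, 47.9%, and 79.1%, matching the reported 62.904, 47.90, and 79.10.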

## Publication Bias

Publication bias is a concern for any meta-analytic review because it can produce a larger combined effect than actually exists. It refers to the phenomenon whereby published studies are more likely to report larger effects: because studies left unpublished due to negative or null findings are harder to retrieve and therefore less likely to be included in a meta-analysis, the combined effect may be biased upward. Furthermore, English-language publications are more likely to be found in searches, which leads to oversampling of statistically significant studies. In this meta-analysis, we used a variety of methods to minimize and test for publication bias. When searching the literature, we also searched the most popular and diverse Chinese database, CNKI (China National Knowledge Infrastructure), as well as conference and dissertation databases, and wrote to leading researchers in this field to ask whether they had any unpublished research reports. When analyzing the data, we used the funnel plot, Egger's regression test, and Rosenthal's fail-safe N to evaluate publication bias and the degree of its impact (Borenstein et al., 2009).
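Egger's test regresses each study's standardized effect (effect / SE) on its precision (1 / SE); an intercept far from zero signals funnel-plot asymmetry. The pure-Python sketch below is illustrative only (`egger_test` is a hypothetical name) and returns the intercept, its t statistic, and df = k − 2. A p-value can then be obtained from a t distribution with those df, for example with `scipy.stats.t.sf`, which is left out here to keep the block dependency-free.

```python
import math


def egger_test(effects, standard_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (effect / SE) on precision (1 / SE) by
    ordinary least squares and returns (intercept, t statistic, df).
    """
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's test needs at least three effect sizes")
    x = [1.0 / se for se in standard_errors]                   # precision
    y = [e / se for e, se in zip(effects, standard_errors)]    # standard normal deviate
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    df = k - 2
    residual_var = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / df
    se_intercept = math.sqrt(residual_var * (1.0 / k + mx ** 2 / sxx))
    return intercept, intercept / se_intercept, df
```

In a meta-analysis of correlations, `effects` would be the Z<sub>r</sub> values and `standard_errors` their square-root sampling variances.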

## RESULTS

## Description of Studies and Overall Association

Because the topic involves two variables, self-attractiveness and dimorphic preference, we searched combinations of 7 terms describing preference (face, voice, and body) and 6 terms describing observers' attractiveness. The initial searches identified 5,359 potential articles from databases, most of which were unrelated. In addition, many related studies examined the relationship between dimorphic preference and self-attractiveness without treating it as a main topic or mentioning it in their titles, abstracts, or keywords; we therefore also searched the references and citing articles of related studies, which produced many duplicates. In total, the current meta-analysis included 25 studies with 6,853 participants (see **Table 1**); the flow chart is presented in **Figure 1**. The 25 eligible studies produced 55 effect sizes because 12 studies contained multiple datasets. We examined the relationship between preference for sexual dimorphism and observers' own attractiveness. The correlation coefficient r (x = 55) between these two variables was 0.095 (95% CI: 0.059, 0.130; Z = 5.173, p < 0.001). The correlations between observers' own attractiveness and preference for masculinity (x = 39) and femininity (x = 16) were 0.103 (95% CI: 0.060, 0.146; Z = 4.691, p < 0.001) and 0.076 (95% CI: 0.007, 0.145; Z = 2.162, p = 0.031), respectively. According to Lipsey and Wilson (2001), an effect size r lower than 0.10 indicates a weak correlation. To check the stability of the mean effect size estimates, we conducted a sensitivity analysis, which showed that none of the included data needed to be eliminated. The data are available in **Supplementary Table 2**.

### Heterogeneity Test

The overall heterogeneity test (x = 55) showed Q = 145.567 (p < 0.001) and I<sup>2</sup> = 62.904%, indicating moderate heterogeneity (Higgins et al., 2003). For masculine preferences (x = 39), Q = 72.93 (p < 0.001) and I<sup>2</sup> = 47.90% (moderate heterogeneity). For feminine preferences (x = 16), Q = 71.77 (p < 0.001) and I<sup>2</sup> = 79.10% (high heterogeneity).

Previous studies have shown heterogeneity in sexual dimorphic preferences (Wood et al., 2014), and according to Borenstein et al. (2009), if the true effect varies across studies using different samples, a random-effects model is more appropriate. Many studies have shown that sexual dimorphic preferences are influenced by observers' own attractiveness, age, gender, and sexual orientation (Zheng and Zheng, 2016). Therefore, the random-effects model was better suited to the present meta-analysis.

## Publication Bias

We found no relevant Chinese-language publications in the CNKI database, and none of the researchers we contacted reported unpublished research on this topic. We used the funnel plot, Egger's regression test, and Rosenthal's fail-safe N to evaluate publication bias in the studies included in this meta-analysis. In the absence of publication bias, studies will be distributed symmetrically about the mean effect size, because sampling error is random; if the funnel plot is asymmetrical at the bottom, there may be publication bias. The funnel plots in the current analysis are slightly asymmetrical at the bottom (see **Figures 2**–**4**). Because the interpretation of a funnel plot is largely subjective, Egger's method has been proposed to quantify and test publication bias (Egger et al., 1997). Egger's test gave t(47) = 1.39, p = 0.17, indicating no significant bias.

#### TABLE 1 | Characteristics of the 25 studies included in the present meta-analysis.

Our meta-analysis reported a significant p-value based on 25 studies. Following Rosenthal's suggestion, we computed how many missing studies would need to be retrieved and incorporated in the analysis before the p-value became non-significant; this number is called Rosenthal's fail-safe N. The larger the number of studies needed to nullify the effect, the more confident we can be that the effect is real (Rosenthal, 1979). The fail-safe N showed that at least 973 studies with null results would be required to overturn the findings of this meta-analysis.
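Rosenthal's fail-safe N follows from Stouffer's combined test: with k studies, the combined Z is ΣZ/√(k + N) after adding N unretrieved studies averaging Z = 0, and setting this equal to the critical value z<sub>α</sub> gives N = (ΣZ)<sup>2</sup>/z<sub>α</sub><sup>2</sup> − k. The sketch below is illustrative (the function name is hypothetical); the text does not state whether a one- or two-tailed α was used, so `tails` is left as a parameter, and the per-study Z values are assumed to be already computed.

```python
from statistics import NormalDist


def rosenthal_fail_safe_n(z_values, alpha=0.05, tails=1):
    """Rosenthal's fail-safe N: number of unretrieved studies averaging Z = 0
    needed to raise the combined (Stouffer) p-value above alpha."""
    k = len(z_values)
    z_alpha = NormalDist().inv_cdf(1 - alpha / tails)   # about 1.645 one-tailed, 1.960 two-tailed
    return (sum(z_values) ** 2) / z_alpha ** 2 - k


# Hypothetical input: ten studies, each with Z = 2.0.
n_one_tailed = rosenthal_fail_safe_n([2.0] * 10)            # about 137.8
n_two_tailed = rosenthal_fail_safe_n([2.0] * 10, tails=2)   # about 94.1
```

Because the missing studies are assumed to average a null effect, a large fail-safe N means the significance of the combined result is robust to a file drawer of null findings, not to studies with opposite effects.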

### Subgroup Analysis

Given the moderate heterogeneity of effect sizes and the influencing factors discussed in the introduction, we conducted subgroup analyses to examine whether effect sizes varied according to the measure of observers' self-attractiveness, type of stimulus, preference task, and same- versus opposite-sex stimuli. The present meta-analysis included only one effect size for same-sex feminine preference in male stimuli and none for masculine preference in female stimuli. Thus, for the same- versus opposite-sex analysis, we examined only feminine preferences for female stimuli and masculine preferences for male stimuli. Moreover, for masculine preference, all effect sizes for relationship context and contraceptive use came from opposite-sex preferences (women's preferences for male stimuli); for feminine preference, all data on relationship context likewise came from opposite-sex preferences (men's preferences for female stimuli). The results of the subgroup analyses are presented in **Table 2**. The moderator analysis showed that, for masculine preference (x = 39), the preference task, the measure of observers' self-attractiveness, and the type of stimulus affected the association between observers' self-attractiveness and their masculine preferences; in addition, the association for women's preference for male masculinity varied with contraceptive use (users vs. non-users) and relationship context (short- vs. long-term). For feminine preference (x = 16), the association also depended on the preference task, relationship context, and same- versus opposite-sex stimuli.
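A common way to test whether effect sizes differ across moderator levels (e.g., contraceptive users vs. non-users) is the between-subgroup Q statistic. The sketch below assumes that each subgroup has already been pooled, giving an estimate on Fisher's z scale and its variance; under the null hypothesis of a single common effect, Q_between follows a chi-square distribution with G - 1 degrees of freedom, where G is the number of subgroups. The source does not report which test underlies Table 2, and the function names and input values are hypothetical.

```python
import math

# Sketch only: function names and inputs are illustrative assumptions.

def q_between(subgroup_z, subgroup_var):
    """Between-subgroup heterogeneity test.

    subgroup_z:   pooled Fisher-z effect for each subgroup
    subgroup_var: variance of each pooled estimate
    Returns (Q_between, df), with df = number of subgroups - 1.
    """
    w = [1.0 / v for v in subgroup_var]
    z_bar = sum(wi * zi for wi, zi in zip(w, subgroup_z)) / sum(w)
    q = sum(wi * (zi - z_bar) ** 2 for wi, zi in zip(w, subgroup_z))
    return q, len(subgroup_z) - 1

def chi2_sf_df1(q):
    """Upper-tail p-value of a chi-square variable with 1 df (two subgroups)."""
    return math.erfc(math.sqrt(q / 2.0))
```

For two subgroups the p-value has a closed form, because a chi-square variable with 1 df is the square of a standard normal variable; with more subgroups, a general chi-square survival function (for example, `scipy.stats.chi2.sf`) is needed.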

#### Meta-Regression Analysis

To assess the influence of observers' age, publication year, and observer sample size on the effect size (r), all three of which are continuous variables, we carried out meta-regression analyses separately for masculine and feminine preferences. First, we treated the effect size for the relationship between self-attractiveness and masculine preference as the dependent variable. The effect size decreased with observers' age, publication year, and observer sample size (see **Table 3**). Specifically, the older the observers, the more negative the relationship between self-attractiveness and masculinity preference; in other words, among younger observers, those in better physical condition preferred masculinity more strongly. Further, the larger the study's sample and the later its publication year, the weaker the relationship. Next, we used the effect size for the relationship between self-attractiveness and feminine preference as the dependent variable; only publication year had a significant (positive) effect on this relationship.
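A single-moderator meta-regression of this kind can be sketched as weighted least squares in which each study is weighted by 1 / (v_i + τ²). The sketch assumes Fisher-z effect sizes with known sampling variances v_i and a residual between-study variance τ² supplied by the caller (τ² = 0 gives a fixed-effect meta-regression). It fits one moderator at a time; the source does not say whether age, publication year, and sample size were entered separately or jointly, and the function name is hypothetical.

```python
import math

# Sketch only: function name and inputs are illustrative assumptions.

def meta_regression(y, v, x, tau2=0.0):
    """Single-moderator meta-regression by weighted least squares.

    y:    study effect sizes (e.g., Fisher z)
    v:    their sampling variances
    x:    moderator values (e.g., mean observer age)
    tau2: residual between-study variance (0.0 = fixed-effect model)
    Returns (intercept, slope, SE of slope, z statistic for slope).
    """
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw    # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)                      # treats weights as known
    return intercept, slope, se_slope, slope / se_slope
```

A negative slope for observer age corresponds to the pattern reported for masculine preference. Dedicated software (for example, the R package metafor) offers refinements such as the Knapp-Hartung adjustment that are not shown here.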

## DISCUSSION

## The Overall Association

Human sexual dimorphic preference is condition-dependent; that is, it is an attractive-contingent preference. However, the empirical evidence has been inconsistent. The present meta-analysis provides a quantitative synthesis of the available evidence on attractive-contingent preference and reveals a significant overall relationship (r = 0.095, x = 55). Self-attractiveness is significantly positively, but weakly, related to both masculine and feminine preference (masculinity: r = 0.102, x = 39; femininity: r = 0.076, x = 16). These findings are consistent with the concept of condition-dependent preference. Condition dependence lies at the heart of the trade-off between costly sexual traits and other major fitness components such as survival and growth, and variability in individuals' physical condition can influence the form, direction, and intensity of sexual selection in the population as a whole (Widemo and Sæther, 1999). Previous studies have confirmed that high-quality females are more attracted to markers of quality in males (i.e., masculine men) across different domains (face, voice, body, and smell) (Little et al., 2011b), presumably because their own high attractiveness makes lower parental investment from a partner less detrimental. In practice, the relationship between self-condition and mate choice is more complex, because self-attractiveness does not operate in isolation (Little et al., 2014). Whether self-attractiveness is measured objectively or subjectively (Smith et al., 2009b), the relationship context of preferences for opposite-sex stimuli (Little et al., 2007; Kandrik and DeBruine, 2012), and the type of stimuli may also play an important role in observers' sexual dimorphic preferences.

## The Influence of Same/Opposite-Sex, Relationship Context, and Contraceptive Use: An Account Based on Life History Trade-Off Strategies

As shown in **Table 2**, the strength of the attractive-contingent preference did change across same- and opposite-sex conditions, which is inconsistent with our hypothesis. For masculine preference for male stimuli, the correlation with self-attractiveness was stronger for women than for men (opposite-sex = 0.107, x = 35; same-sex = 0.066, x = 3). Women's preferences for men's masculine features follow more complex patterns (Burke and Sulikowski, 2010; Holzleitner and Perrett, 2017). From the perspective of life history (LH) strategies, the male attributes that signal good genes and good provisioning evolved mainly under polygyny: muscularity in men indicates higher immune competence, physical attractiveness, dominance, competitiveness, and high status, but such men also tend to attract more female mates, to have several at the same time, and to invest less in each of them. In spite of this, attractive women are more confident and believe that they can offset the costs of choosing a partner with good genes and good provisioning (Little et al., 2001). Although attractive men are more likely to prefer masculine men as social allies, self-rated sex typicality is a stronger predictor of preference for sex-typical physical cues in same-sex faces (Kandrik and DeBruine, 2012) and appears to be a more important influence on men's preferences for same-sex physical cues. Moreover, market value (good physical condition) is potentially a more important resource for women's mate choice than for men's choice of cooperative partners (Kandrik, 2017). By contrast, for preference for female femininity, the association was stronger for same-sex than for opposite-sex observers (same-sex = 0.219, x = 3; opposite-sex = 0.083, x = 7); in other words, attractive women showed a more pronounced trend than attractive men did. This is possibly because more attractive women perceive less threat from other feminine women and are more likely to regard them as social allies (Fisher, 2004). Attractive men, in contrast, tend to choose attractive women but must also weigh the benefits against the risk of cuckoldry. Therefore, men's choices also depend on relationship context (long- or short-term) (Burriss et al., 2011; Little et al., 2014).

In line with our expectations, for women's preferences for male masculinity, the long-term context yielded a larger effect size than the short-term context (short-term: 0.001, x = 9; long-term: 0.228, x = 9). Attractive women express preferences for all three clusters of men's mate characteristics (the 3Gs: good genes, good provider, and good father) (Buss and Shackelford, 2008), and because of the conflict among these attributes noted above, they have to trade them off. Women place more value on men's physical attractiveness, muscularity, and immediate resource displays (Haselton and Gangestad, 2006) when pursuing short-term mating; in contrast, they place greater importance on resource acquisition potential and good-dad indicators when pursuing long-term mating (Buss and Shackelford, 2008; Lu et al., 2015). In short-term sexual relationships, the cues to high parental investment perceived in feminine men are of little value to women. Moreover, women in both better and poorer physical condition can extract potential benefits from masculine men by conceiving within a short-term relationship, so both attractive and unattractive women prioritize good genes (masculinity) in a short-term context. In long-term relationships, on the contrary, better parenting and increased cooperation may outweigh the benefits of genetic fitness, thereby enhancing the attractiveness of more feminine men (Little and Mannion, 2006). Highly attractive women believe that they are desirable enough to find another partner soon; therefore, they do not need to be constrained by their circumstances or to change their preferences according to relationship context. Women with a poorer appearance, however, have to trade off good genes against parental investment in the long-term context. As a result, the attractive-contingent masculine preference was more apparent in the long-term than in the short-term context. A similar trend has been observed in the voice domain (Feinberg et al., 2012).

For men's preferences for female femininity, the effect size in the short-term context was larger than that in the long-term context (short-term = 0.202, x = 2; long-term = 0.081, x = 3). This indicates that in short-term relationships attractive men were more strongly attracted to feminine women than unattractive men were, whereas in a long-term context men's self-attractiveness was not closely associated with their preference for femininity. First, from the perspective of LH trade-off strategies, preferences represent different LH strategies: fertility-related attributes (good genes) reflect a fast LH strategy, whereas attributes of good parenting serve a slow LH strategy (Lu et al., 2017). Attractive men tend to adopt fast LH strategies with feminine women in short-term relationships and report more short-term partners than less attractive men do (Rhodes et al., 2005); setting aside the costs of raising young, their attractiveness gives them the opportunity to produce a greater number of offspring. Second, unlike women, who trade off a man's good genes against his resources and willingness to invest, a man risks raising a child who is not his own, so in long-term relationships men mainly weigh a partner's physical attractiveness and personality traits (Little et al., 2014). Feminine women are perceived as more likely to be unfaithful, that is, to have an affair or a brief sexual encounter. Both attractive and unattractive men therefore weigh the risk of cuckoldry carefully in long-term contexts, such that self-attractiveness is less related to femininity preferences in this context (Little et al., 2014).

#### TABLE 2 | Summary of the results of the subgroup meta-analysis.

*x, number of effect sizes; MOOA, Measures of Observers' Own Attractiveness; RC, Relationship Context; FCT and AR, Forced-choice Test and Attractiveness Rating; FCT, Forced-choice Test; ST, Sequence Test;* \*\*\**p* < *0.001,* \*\**p* < *0.01,* \**p* < *0.05.*

The moderator analysis showed that the correlation between self-reported attractiveness and masculinity preference was significantly higher among non-users of contraceptives than among users (0.183 vs. 0.093). A study combining cross-sectional and longitudinal data confirmed that hormonal contraceptive users had a weaker preference for masculinity than non-users did (Little et al., 2013), consistent with other previous findings (Roberts et al., 2014). In addition, hormonal contraceptives tend to decrease women's overall physical attractiveness (Puts and Pope, 2013; Welling, 2013; Roberts et al., 2014), and, as discussed above, women's attractiveness is their "market value" and an important factor in "intra-sexual competition." Therefore, by diminishing women's attractiveness, hormonal contraceptives might make it more difficult for women to compete for romantic partners (Smith et al., 2009a; Puts and Pope, 2013). Relatedly, for masculine preference, the effect size was negatively correlated with observers' age (see **Table 3**). Because the observers in the masculine-preference analyses were predominantly female, this means that younger attractive women preferred masculine faces more strongly. As people age, especially women, they recalibrate their subjective impressions of their own attractiveness (i.e., of their own "market value"), which in turn leads to a recalibration of their mate preferences (Little et al., 2011a). As their market value declines with age, most women pay more attention to parental characteristics (Jones et al., 2011).

## Type of Stimuli

Because the present meta-analysis included only one feminine-preference study using vocal stimuli and none using body stimuli, we examined the moderating role of stimulus type for masculine preference only. This analysis revealed significantly different effect sizes for body, voice, and face stimuli. Specifically, the effect sizes for voice and body were larger than that for face stimuli (body = 0.185, x = 2; voice = 0.184, x = 8; face = 0.078, x = 29); the effect sizes for body and voice approach 0.2, a moderate amount. However, the correlation between self-attractiveness and facial masculine preference was only 0.078, a weak correlation, which was contrary to our expectation. Previous studies have interpreted covariation across these domains as evidence that different domains of masculinity all advertise a common underlying factor. We likewise suggest that men's and women's masculinity, as signaled by multiple traits, conveys some common information about the underlying quality of the observed individual. Of course, this does not mean that the signals overlap perfectly (Little et al., 2011a); indeed, our data suggest that although masculine preference in all three domains was related to observers' self-attractiveness, the strength of this relationship differed across domains. Facial masculine preference was less closely related to self-attractiveness, potentially because a large number of studies have focused on facial masculine preference, whereas little attention has been paid to preferences for masculinity in the voice and body. Furthermore, most studies of facial preference treated self-attractiveness as a covariate and explored its interaction with other main variables (e.g., relationship context or menstrual cycle). Because the correlation varied across these conditions, averaging a larger number of facial effect sizes (x = 29) may have produced a smaller pooled effect.

## The Task of Preference and Measures of Self-Attractiveness

Stronger effects were found when the sequence test (ST) was used than when the forced-choice test (FCT) with an attractiveness rating was used (masculinity: ST = 0.183, x = 11; FCT = 0.081, x = 12; femininity: ST = 0.125, x = 6; FCT = 0.076, x = 6). The sequence test provides observers with a program in which they can adjust sexual dimorphism themselves until it reaches the level they consider most attractive (Carrito et al., 2016). The forced-choice test with an attractiveness rating merely presents two versions of a face (masculinized and feminized); observers choose the one they consider more attractive and then rate its attractiveness (Little and Mannion, 2006). Thus, the sequence test is more ecologically valid and reflects the observer's preference for dimorphism more clearly.

Most studies included in the present meta-analysis focused on masculine preferences, and none of the feminine-preference studies used an objective index of observers' attractiveness. Thus, we conducted this subgroup analysis only for masculine preferences. The moderator analysis revealed a significant difference in the effect sizes for masculine preference according to how self-attractiveness was measured, consistent with the findings of Penton-Voak et al. (2003), who argued that objective measures are independent of the menstrual cycle and more stable than self-assessments. Some researchers have emphasized that BMI and WHR are better indicators of female attractiveness (Swami et al., 2007; Smith et al., 2009b). Similarly, in the current meta-analysis, objective attractiveness was more strongly related to masculine preference (effect size = 0.133, x = 13) than self-rated attractiveness was (effect size = 0.069, x = 26). Most participants in the included studies were female, and subjective measures may be influenced by their physiology; for example, women perceive themselves as more attractive when ovulating (Singh et al., 2001), so self-rated attractiveness may fluctuate over a short period of time. Self-report measures are also subjective and may not reflect how an individual is perceived by others. Therefore, objective measures such as WHR, BMI, and other-rated attractiveness may be more sensitive to the relationship between self-attractiveness and sexual dimorphic preferences.

## LIMITATIONS

We note some limitations of the present meta-analysis. The first concerns the number of studies analyzed. We included 25 studies, which were divided into sets based on feminine and masculine preferences before being submitted to separate meta-analyses. This meant that we had a limited number of studies for each moderator level (see **Table 2**), so some moderator levels were under-represented. The second limitation is that the age range of observers was relatively narrow: most were teenagers or young adults in their 20s, and only two studies recruited older participants, with mean ages of 33 years (Zietsch et al., 2015) and 48 years (Jones et al., 2011). Therefore, although the present results revealed a significant moderating effect of age, the robustness of this association is difficult to establish.

## CONCLUSIONS

This meta-analysis suggests that there is a real relationship between sexual dimorphic preferences and observers' self-attractiveness. Therefore, future studies should control for self-attractiveness to ensure the reliability of experimental results. According to the present subgroup analyses, this relationship depends on relationship context, same- versus opposite-sex stimuli, and contraceptive use. These three moderating effects reflect observers' trade-offs among the 3Gs and are consistent with life history strategies. In addition, the measure of observers' attractiveness, the type of preference task, and the type of stimuli may also moderate the relationship. Future studies should therefore consider these factors and their interactions.

## REFERENCES

attraction to facial characteristics. Philos. Trans. R. Soc. Lond B Biol. Sci. 361, 2143–2154. doi: 10.1098/rstb.2006.1936

to sexual self-labels and attitudes toward masculinity. Arch. Sex. Behav. 45, 725–733. doi: 10.1007/s10508-015-0543-z

Zietsch, B. P., Lee, A. J., Sherlock, J. M., and Jern, P. (2015). Variation in women's preferences regarding male facial masculinity is better explained by genetic differences than by previously identified context-dependent effects. Psychol. Sci. 26, 1440–1448. doi: 10.1177/0956797615591770

## AUTHOR CONTRIBUTIONS

LC and ZR co-designed the study. LC and YY conducted the literature searches and study selection. LC and XJ took charge. LC, XJ, and HF analyzed the data. LC and ZR wrote the manuscript. XJ and LC revised the manuscript in response to the reviews.

## FUNDING

This work was supported by the National Social Science Foundation of China (Grant No. CEA150173).

## ACKNOWLEDGMENTS

We would like to acknowledge Dr. Robert P. Burriss (Department of Psychology, Basel University, Basel, Switzerland) for his comments on the manuscript.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02431/full#supplementary-material

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Chen, Jiang, Fan, Yang and Ren. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Relative Contribution of Jawbone and Cheekbone Prominence, Eyebrow Thickness, Eye Size, and Face Length to Evaluations of Facial Masculinity and Attractiveness: A Conjoint Data-Driven Approach

#### Justin K. Mogilski<sup>1</sup> \* and Lisa L. M. Welling<sup>2</sup>

#### Edited by:

Achim Schuetzwohl, Brunel University London, United Kingdom

#### Reviewed by:

R. Nathan Pipitone, Florida Gulf Coast University, United States

Darren Burke, University of Newcastle, Australia

> \*Correspondence: Justin K. Mogilski jmogilsk@mailbox.sc.edu; justin.mogilski@gmail.com

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 10 September 2018

Accepted: 19 November 2018

Published: 05 December 2018

#### Citation:

Mogilski JK and Welling LLM (2018) The Relative Contribution of Jawbone and Cheekbone Prominence, Eyebrow Thickness, Eye Size, and Face Length to Evaluations of Facial Masculinity and Attractiveness: A Conjoint Data-Driven Approach. Front. Psychol. 9:2428. doi: 10.3389/fpsyg.2018.02428

<sup>1</sup> Department of Psychology, University of South Carolina Salkehatchie, Walterboro, SC, United States

<sup>2</sup> Department of Psychology, Oakland University, Rochester, MI, United States

Recent work demonstrates the methodological rigor of a type of data-driven analysis (i.e., conjoint analysis; CA), which accounts for the relative contribution of different facial morphological cues to interpersonal perceptions of romantic partner quality. This study extends this literature by using a conjoint face ranking task to predict the relative contribution of five sexually dimorphic facial shape features (jawbone and cheekbone prominence, eyebrow thickness, eye size, face length) to participants' (N = 922) perceptions of facial attractiveness and sex-typicality (i.e., masculinity/femininity). For overall partner attractiveness, eyebrow thickness and jawbone prominence were relatively more salient than cheekbone prominence and eye size. Interestingly, masculinized (i.e., thicker) eyebrows were marginally more attractive for female than male faces, particularly within a long-term mating context. Masculinized jawbone prominence was more attractive for male than female faces, and feminized jawbone prominence was more attractive for female than male faces. For perceptions of masculinity, eyebrow thickness, jawbone prominence, and facial height were relatively more salient than cheekbone prominence and eye size, although facial height was more important for female than male faces, and jawbone prominence was marginally more important for male than female faces. These findings highlight the prominence of eyebrows, the jawline, and facial height during perception of facial attractiveness and masculinity – though it should be noted that many of these differences were small to moderate in effect size. Findings are interpreted in the context of prior research, and future directions for studying why these facial traits exhibit superior signaling capacity are discussed.

Keywords: face preference, face shape, masculinity, attractiveness, conjoint analysis, data-driven, digital manipulation

## INTRODUCTION


Facial morphological cues (e.g., shape, color, and texture) are indicators of underlying physiology (Stephen et al., 2009; Little et al., 2011b; Jones et al., 2012). From these cues, humans can accurately predict certain physical and psychological qualities (e.g., an individual's health, physical attractiveness, trustworthiness) that are significant to partner selection and social judgment (e.g., Zebrowitz, 2011; Todorov et al., 2015). These qualities can be assessed by observers from facial photographs at first acquaintance, and significantly impact employment decisions, mate selection, friendship, and other key aspects of social interaction (e.g., Petrican et al., 2014; Walker and Vetter, 2016; Funk et al., 2017). Face perception researchers have studied how these cues are processed during interpersonal evaluation by digitally manipulating photographic facial cues and presenting these images to third-party raters. These manipulations predictably alter perceptions of attractiveness, dominance, sex-typicality (i.e., masculinity/femininity), health, trustworthiness, and other social attributes (for a review, see Todorov et al., 2008; Little et al., 2011a). However, this research has tended to focus on individual facial cues in isolation, and debate is now turning to the relative contributions of these cues to social perception (Scott et al., 2010; Stephen et al., 2012; Mogilski and Welling, 2017).

Data-driven models are increasingly valued in face perception research because, compared with traditional methods alone, they can account for a broader array of structural and configural facial features and for the contributions of those specific features to social perception (Todorov et al., 2011; Zhang et al., 2018). Early studies that used digital face stimuli to alter and assess person perception (e.g., Perrett et al., 1999; Penton-Voak et al., 2001) measured the influence of perceptually distinct facial cues (e.g., symmetry and sexual dimorphism) by digitally altering one feature while experimentally or statistically controlling for variation in other features. Although this work made compelling contributions to the literature by reducing confounds, it tells us little about how collections of traits are evaluated in combination. Recent data-driven techniques have overcome this limitation by permitting distinct clusters of features to be altered simultaneously (e.g., Mogilski and Welling, 2017; Stephen et al., 2017; Jones, 2018). For example, Stephen et al. (2017) recorded participants' physiological health (i.e., blood pressure, BMI, percent body fat) and regressed these measures onto variation in facial morphology. These measures were then subjected to factor analysis to identify which clusters of facial features predict variation in health. Participants were asked to change the appearance of potential romantic partners' facial photographs to fit their preference using sliders that incrementally altered faces along each health dimension, thereby digitally manipulating the constellation of facial features naturally associated with variation in these health indices. Similarly, Jones (2018) recorded participants' self-reported health and implemented a Brunswik lens model to assess which facial cues observers use to judge health and which cues are valid indicators of health. Photographs were then manipulated according to the cues that were most used and most valid. Compared to prior techniques, these methods identify facial cues and manipulate them via digital transformations based on a broad collection of naturalistic features, rather than on artificially restricted parameters derived from theory alone.
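The two quantities at the core of a Brunswik lens model can be illustrated with a short sketch: a cue's ecological validity is its correlation with the criterion (here, measured health), and its utilization is its correlation with observers' judgments (perceived health); achievement is the correlation between judgments and the criterion. This simplified, correlation-only version is an assumption for illustration; Jones (2018) may have used a different (for example, regression-based) formulation, and the cue names, data layout, and function names are hypothetical.

```python
import math

# Sketch only: a correlational lens model with hypothetical names and inputs.

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def lens_model(cues, criterion, judgments):
    """Simplified Brunswik lens model over a set of faces.

    cues:      dict mapping cue name -> per-face cue values
    criterion: per-face measured health (the distal variable)
    judgments: per-face mean perceived health from observers
    Returns (cue validities, cue utilizations, achievement).
    """
    validity = {name: pearson(vals, criterion) for name, vals in cues.items()}
    utilization = {name: pearson(vals, judgments) for name, vals in cues.items()}
    achievement = pearson(judgments, criterion)
    return validity, utilization, achievement
```

Cues that score high on both validity and utilization are the natural candidates for the kind of image manipulation described above.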

Although these models have been particularly useful for initially exploring, identifying, and simulating the facial cues that contribute to person perception and social decision-making, they are limited in their capacity to assess the relative contribution of several concurrently altered facial cues to holistic perception of those faces. In the studies noted above (Stephen et al., 2017; Jones, 2018), digital transformations were applied to facial images and rated sequentially rather than concurrently. For example, Stephen et al. (2017) asked participants to manipulate a series of faces to appear as healthy as possible by adjusting apparent BMI, blood pressure, and body fat, but each of these dimensions was manipulated independently and rated in separate trials. Jones (2018) manipulated two features concurrently (i.e., averageness and color) but asked participants to assess stimuli with different combinations (e.g., high averageness, low color) in separate line-ups of faces. These methods allow researchers to examine preferences for feature combinations, but they limit how many combinations can be examined at the same time without splitting them into separate trials or experimental conditions.

## Conjoint Analysis

Conjoint analysis (CA) provides a convenient way to overcome this design challenge. CA is a multivariate, data-driven analysis used in marketing research (e.g., Gustafsson et al., 2007) that has recently been adapted to study human mate preferences (Mogilski et al., 2014; Mogilski and Welling, 2017). Generally, CA is used to assess how individuals make trade-offs among multiple attributes when evaluating "whole" units that comprise those attributes. For example, CA is often used to evaluate which attributes of a product are most important during consumer purchasing decisions by having consumers rank several versions of the product, where each version is composed of a unique combination of product attributes. Mogilski and Welling (2017) first used this technique to examine the relative salience of three facial cues (i.e., sexual dimorphism, color, and symmetry) during romantic partner perception. Compared to other methods, this technique allows researchers to present sets of faces wherein each face is altered on several different features at once. Participants rank these sets on some metric (e.g., their attractiveness as a romantic partner), and CA provides measures of the relative contribution of each feature to participants' overall ranking decisions. Using this technique, Mogilski and Welling (2017) found that facial shape masculinity/femininity was relatively more important than both symmetry and color cues to health during participants' evaluations of potential romantic partners' facial photographs. However, presenting individuals with multiple versions of mates who vary across several different traits is but one potential use of CA. This technique can also be used to explore preferences for feature variations that relate to a single construct. Specifically, CA can investigate how several traits that contribute to a single construct affect raters' perceptions of that construct.
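How CA turns a set of rankings into part-worths and relative importances can be sketched as follows. The sketch assumes two-level attributes coded masculinized = +1 and feminized = -1 (effect coding), one participant's ranking of a set of face profiles (1 = most preferred), part-worths estimated by ordinary least squares on reversed ranks, and the common convention that an attribute's relative importance is the range of its part-worths divided by the sum of ranges across attributes. The attribute names are hypothetical, the software used by Mogilski and Welling (2017) may estimate part-worths differently, and the profile design must allow every coefficient to be estimated (for example, a full factorial or an orthogonal fraction).

```python
# Sketch only: hypothetical attribute names; a generic rank-based conjoint model.

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    a = [xtx[i][:] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

def conjoint_importance(profiles, ranks, attributes):
    """Part-worths and relative importance (%) from one participant's ranking.

    profiles:   list of dicts mapping attribute -> +1 (masculinized) or -1 (feminized)
    ranks:      rank of each profile, 1 = most preferred
    attributes: attribute names to include in the model
    """
    n = len(ranks)
    scores = [n + 1 - r for r in ranks]        # reverse ranks so higher = preferred
    X = [[1.0] + [float(p[a]) for a in attributes] for p in profiles]
    coefs = ols(X, scores)                     # coefs[0] is the intercept
    # Part-worths for the (+1, -1) levels of each attribute under effect coding.
    part_worths = {a: (coefs[i + 1], -coefs[i + 1]) for i, a in enumerate(attributes)}
    ranges = {a: 2 * abs(coefs[i + 1]) for i, a in enumerate(attributes)}
    total = sum(ranges.values())
    importance = {a: 100.0 * ranges[a] / total for a in attributes}
    return part_worths, importance
```

As a usage illustration, with three hypothetical attributes ("jaw", "brow", "eye") fully crossed into eight profiles and a participant whose ranking is driven mostly by jawbone, then eyebrows, then eyes, the relative importances come out at about 57%, 29%, and 14%, respectively.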

### Current Study

The present study contributes to current face perception literature by using CA to assess the relative contributions of several facial shape cues to perceptions of romantic partner attractiveness and masculinity. Previous research (e.g., Scheib et al., 1999; Penton-Voak et al., 2001; Koehler et al., 2004; Thornhill and Gangestad, 2006; Apicella et al., 2008) has identified several prominent shape cues that contribute to perceptions of sexual dimorphism (i.e., features that differ statistically between male and female faces): eyebrow prominence, cheekbone prominence, eye size, facial height, and jawbone prominence. Among these features, eyebrow prominence, facial height, and jawbone prominence are all reliably larger in men (i.e., are positively related to a masculine face shape and negatively related to a feminine face shape), whereas cheekbone prominence and eye size are reliably larger in women (i.e., are positively related to a feminine face shape and negatively related to a masculine face shape). Identifying the specific features that contribute to an individual's overall measured sexual dimorphism is important for perceptual research, but no research to date has investigated how these individual features are prioritized relative to one another with respect to mate choice or to the overall evaluation of a person's masculinity/femininity.

This study assessed whether individuals prioritize certain sexually dimorphic facial cues of partner quality (i.e., eyebrow prominence, cheekbone prominence, eye size, facial height, and jawbone prominence) when evaluating the attractiveness of same- and opposite-sex individuals' facial photographs as long- and short-term romantic partners. Furthermore, it investigated whether individuals prioritize any of these specific features when ranking different versions of the same individual by perceived masculinity. Given that face perception relies in part on part-based information processing (Schwaninger et al., 2004; McKone and Yovel, 2009), it is possible that the part-worth value of some facial features is weighted more heavily than that of others during partner perception. Similarly, some features may be more salient than others within different mating contexts (e.g., long- versus short-term; Buss and Schmitt, 1993; Gangestad and Simpson, 2000). Examining how these features are prioritized within long- (i.e., committed) versus short-term (i.e., purely sexual) mating contexts may explain the specific signal value of the distinct facial shape cues that contribute to person perception, and thereby reveal which features are most important to perceptions of attractive facial cues (e.g., Thornhill and Gangestad, 1999; Gangestad and Scheyd, 2005; Puts, 2010). Indeed, developing techniques that improve the accuracy of models that estimate psychological and physiological qualities from facial information (see Hu et al., 2017; Todorov, 2017) is a critical future direction in face perception research (see Jack and Schyns, 2017, for a review).
Moreover, because this is the first study to manipulate the masculinity and femininity of individual features rather than of whole faces, the contribution of each manipulation to perceptions of facial masculinity will reveal how this trait is processed, as well as further verify the computer graphics manipulations. Finally, this study expands prior findings that used CA to study face perception (Mogilski and Welling, 2017). Because that study found that sexually dimorphic shape cues were more important than color and symmetry cues (Cohen's d ∼0.60), the present study sought to examine which shape features were driving this effect. That is, this study examines whether some shape features signal relatively more information about partner quality (i.e., attractiveness; masculinity/femininity) than others.

## MATERIALS AND METHODS

### Participants

Participants (N = 922, 250 male; age: M = 20.22 years, SD = 3.53; range = 18–51) were recruited from a university in the midwestern United States and various social media outlets (e.g., Facebook, Reddit, Twitter). The majority of participants were White (78.4%; Black 9.5%, Asian 5.5%, Hispanic/Latino 2.3%, "Other" 4.3%), roughly half reported currently being single (49% versus 51% reported being in a romantic relationship), and the majority reported being exclusively heterosexual (91.3%; 6.5% bisexual, 2.2% exclusively homosexual).

### Stimuli

Using well-established methods (e.g., Jones et al., 2005; Little et al., 2007; Welling et al., 2008), composite male and female faces were generated by averaging the shape, color, and texture of a group of 60 Caucasian adult male faces and a group of 60 Caucasian adult female faces. Each composite served as the base image for a set of 19 photographs that varied exclusively by a series of objective, composite-based image transformations (detailed below). Up to five distinct facial characteristics were transformed per photograph variation: eyebrow prominence, cheekbone prominence, eye size, facial height, and jawbone prominence. These features are sexually dimorphic and vary with perceptions of facial attractiveness (Keating, 1985; Scheib et al., 1999; Penton-Voak et al., 2001; Baudouin and Tiberghien, 2004). To permit CA of participants' photograph rankings, the photograph variations were planned using an orthogonal array generated with IBM SPSS 21 (such arrays are constructed according to standard formulas drawn from statistical reference material). A fractional-factorial design was used to minimize the number of photograph variations that participants were required to rank (Hair et al., 1995). This design generates the fewest profiles needed to estimate the contribution of each of the five facial characteristics to overall face evaluation. Each of the five facial features was assigned three possible levels (i.e., feature masculinization, unaltered, or feature feminization), indicating which transformations would be applied to each photograph. This produced an orthogonal array of 16 photograph variations, whereby each variation possessed a unique combination of the five facial characteristics. For example, a face might have masculinized eyebrow prominence, feminized cheekbone prominence, unaltered eye size, feminized facial height, and masculinized jawbone prominence. Three additional holdout images (for a total of 19 photographs) were included to test the validity of the CA utility estimates (see Hair et al., 1995; utility estimates and holdout profiles are defined in more detail in the Results section below). All participants ranked the same 19 images constructed based on the orthogonal array.
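The array used in the study was generated by SPSS and is summarized in Table 1. As a hedged illustration of how a 16-run main-effects plan for five three-level factors can be built, the Python sketch below starts from a textbook four-level L16 array constructed over the Galois field GF(4), merges one pair of levels in each column to obtain three levels, and checks the proportional-frequency condition that keeps main-effect estimates uncorrelated. The construction, the choice of which levels to merge, and the mapping of level codes to labels (`LEVELS`, `FEATURES`) are assumptions made for illustration; this is not the array SPSS produced.

```python
from collections import Counter
from itertools import combinations

# GF(4) = {0, 1, x, x+1}, encoded as 0, 1, 2, 3 with x*x = x + 1.
# Addition in GF(4) is bitwise XOR on this encoding; multiplication uses this table.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]

def l16_four_level():
    """16-run, 5-column, strength-2 orthogonal array with four levels per column."""
    rows = []
    for a in range(4):
        for b in range(4):
            # Coefficient vectors (1,0), (0,1), (1,1), (1,x), (1,x+1) are pairwise
            # linearly independent, so any two columns show each level pair exactly once.
            rows.append([a, b, a ^ b, a ^ GF4_MUL[2][b], a ^ GF4_MUL[3][b]])
    return rows

def collapse_to_three_levels(rows):
    """Merge one pair of levels per column so every column uses levels 0, 1, 2.

    Columns 1-4 merge level 3 into level 0; column 5 merges level 3 into level 2.
    Merging a different pair in the last column keeps all 16 rows distinct.
    """
    collapsed = []
    for row in rows:
        new_row = [0 if v == 3 else v for v in row[:4]]
        new_row.append(2 if row[4] == 3 else row[4])
        collapsed.append(new_row)
    return collapsed

def has_proportional_frequencies(rows):
    """True if, for every pair of columns, n_ij * N == n_i * n_j for all level pairs."""
    n = len(rows)
    for i, j in combinations(range(len(rows[0])), 2):
        ci = Counter(r[i] for r in rows)
        cj = Counter(r[j] for r in rows)
        cij = Counter((r[i], r[j]) for r in rows)
        for li in ci:
            for lj in cj:
                if cij[(li, lj)] * n != ci[li] * cj[lj]:
                    return False
    return True

LEVELS = ("masculinized", "unaltered", "feminized")  # hypothetical code-to-label mapping
FEATURES = ("eyebrow", "cheekbone", "eye_size", "face_height", "jawbone")

design = collapse_to_three_levels(l16_four_level())
profiles = [dict(zip(FEATURES, (LEVELS[v] for v in row))) for row in design]

print(len(design), len({tuple(r) for r in design}), has_proportional_frequencies(design))
# → 16 16 True
```

A main-effects model for five three-level factors has 1 + 5 × (3 − 1) = 11 parameters, so any such plan needs at least 11 profiles; the 16-run array trades a few extra profiles for orthogonality.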

To alter specific facial features, image transformation methods used in prior work (e.g., DeBruine et al., 2006; Welling et al., 2007, Welling et al., 2008) were adapted to target individual features rather than whole faces. Specifically, rather than applying composite-based transformations holistically to base images (i.e., to the whole face), 5 male and 5 female composite images were first created, whereby each composite image possessed one feature (i.e., eyebrow prominence, cheekbone prominence, eye size, facial height, or jawbone prominence) of the opposite sex but was otherwise sex-typical. Thus, for each composite image, the points that correspond to individual features (e.g., eye size) were altered such that they matched the position of those same points on the opposite-sex composite (see **Figure 1** for an example). These composite images were then applied to same-sex base images by taking 50% of the linear differences in 2D shape between the applicable altered composites (e.g., **Figure 1**, image C) and the original same-sex composites (i.e., **Figure 1**, image A for women, image B for men) and adding them to or subtracting them from corresponding points on the base image. **Figures 2** and **3** demonstrate the complete array of masculine and feminine manipulations individually applied to the base female (**Figure 2**) and male (**Figure 3**) composite faces. These transformations were then applied to base images (i.e., the original, unaltered composite images) according to the orthogonal array (see **Table 1**). In other words, facial features were manipulated as in previous research (e.g., Welling et al., 2007, Welling et al., 2008), except that individual features were independently manipulated and then concurrently applied to the same face (see examples in **Figure 4**).
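The landmark arithmetic described above can be sketched in a few lines. The sketch below is a hypothetical, simplified version operating on 2D landmark lists only (real face-morphing software also warps the image texture): a feature-swapped composite copies opposite-sex positions for that feature's landmarks, and each feature's transform adds ±50% of the difference between the swapped and original same-sex composites to the base. The function names, the 4-point toy "faces," and the landmark indices are all assumptions for illustration.

```python
Point = tuple[float, float]

def feature_swapped_composite(same_sex: list[Point], opposite_sex: list[Point],
                              feature_idx: set[int]) -> list[Point]:
    """Same-sex composite whose landmarks in feature_idx take the opposite-sex positions."""
    return [opposite_sex[i] if i in feature_idx else p for i, p in enumerate(same_sex)]

def apply_feature_transforms(base: list[Point], same_sex: list[Point],
                             swapped: list[list[Point]], weights: list[float]) -> list[Point]:
    """Add weight * (swapped - same_sex) to each base landmark, summed over features.

    A weight of +0.5 shifts a feature 50% toward the opposite sex, -0.5 shifts it 50%
    away (exaggerating the sex-typical shape), and 0 leaves it unaltered.
    """
    out = []
    for i, (bx, by) in enumerate(base):
        dx = dy = 0.0
        for composite, w in zip(swapped, weights):
            dx += w * (composite[i][0] - same_sex[i][0])
            dy += w * (composite[i][1] - same_sex[i][1])
        out.append((bx + dx, by + dy))
    return out

# Hypothetical 4-landmark "faces": indices 0-1 are eye points, 2-3 are jaw points.
male = [(0.0, 0.0), (10.0, 0.0), (0.0, 20.0), (10.0, 20.0)]
female = [(0.0, 1.0), (10.0, 1.0), (1.0, 18.0), (9.0, 18.0)]
eyes_swapped = feature_swapped_composite(male, female, {0, 1})
jaw_swapped = feature_swapped_composite(male, female, {2, 3})

# Feminize eye size and masculinize the jaw of the male base composite in one face.
face = apply_feature_transforms(male, male, [eyes_swapped, jaw_swapped], [0.5, -0.5])
print(face)  # → [(0.0, 0.5), (10.0, 0.5), (-0.5, 21.0), (10.5, 21.0)]
```

Because each swapped composite differs from the same-sex composite only at its own feature's landmarks, the per-feature shifts do not interfere, which is what allows several features to be manipulated independently within one face.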
Although no study has manipulated individual facial features in this way, holistic facial manipulations using these techniques have been shown to influence perceptions of masculinity and femininity in the predicted directions (DeBruine et al., 2006; Welling et al., 2007).

### Procedure

All experimental materials were presented using Qualtrics. After indicating their consent, participants provided demographic information (i.e., age, sex, ethnicity, relationship status, and sexual orientation) and then completed a series of four face ranking tasks. Across these tasks, participants were presented with each of two sets (one male, one female) of 19 digital facial photographs. Participants were asked to rank the images within each set relative to one another twice: once according to their preference for a long-term relationship and once according to their preference for a short-term relationship. Long- and short-term relationships were defined for participants as follows:

Long-term relationship: You are looking for the type of person who would be attractive in a long-term relationship. Examples of this type of relationship would include someone you may want to move in with, someone you may consider leaving a current partner to be with, and someone you may, at some point, wish to marry (or enter into a relationship on similar grounds as marriage).

Short-term relationship: You are looking for the type of person who would be attractive in a short-term relationship. This implies that the relationship may not last a long time. Examples of this type of relationship would include a single date accepted on the spur of the moment, an affair within a long-term relationship, and the possibility of a one-night stand.

Participants were instructed to rank same-sex photographs according to how they believed a heterosexual person of the opposite sex would rank them. The order in which the face ranking tasks and the photographs within sets were presented was randomized.
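The randomization itself was handled by Qualtrics. Purely as a hypothetical offline equivalent, the sketch below builds one participant's schedule for the four ranking tasks (two face sets crossed with two relationship contexts), shuffling the task order and the photo order within each task. The photo IDs 1 to 19, the seeding scheme, and the function name are assumptions for illustration.

```python
import random

SETS = ("male", "female")
CONTEXTS = ("long-term", "short-term")

def presentation_schedule(seed, n_photos=19):
    """Randomized task order, and photo order within each task, for one participant."""
    rng = random.Random(seed)  # per-participant seed makes the schedule reproducible
    tasks = [(face_set, context) for face_set in SETS for context in CONTEXTS]
    rng.shuffle(tasks)
    schedule = []
    for face_set, context in tasks:
        photos = list(range(1, n_photos + 1))  # hypothetical photo IDs 1..19
        rng.shuffle(photos)
        schedule.append((face_set, context, photos))
    return schedule

for face_set, context, photos in presentation_schedule(seed=42):
    print(face_set, context, photos[:5])
```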

## RESULTS

CA was performed to assess the relative importance of each of the five facial features in participants' ranking decisions. CA produces importance values, which indicate a feature's overall contribution to how profiles are ranked (e.g., the overall importance of cheekbone prominence, eye size, etc.), and part-worth utility estimates, which indicate the relative importance of each level within each trait (i.e., masculinization, feminization, unaltered). In other words, importance values reveal which features are weighted most heavily relative to others during ranking decisions, but not the direction of preference within any given feature (e.g., whether eyebrow prominence is more important for ratings of attractiveness than cheekbone prominence, but not whether masculine, unaltered, or feminine eyebrow prominence is preferred). On the other hand, utility estimates reveal the importance of the manipulation within a trait (e.g., preference for masculine, unaltered, versus feminine eyebrow prominence). Importance values and part-worth utility estimates were calculated for each set of faces.
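To make the two quantities concrete, the sketch below computes part-worth utilities and importance values from a single participant's ranking. It assumes rank 1 is most preferred (ranks are reversed into preference scores) and uses a hypothetical, balanced two-feature full factorial; in such a balanced design, the effect-coded least-squares part-worth of a level equals the mean score of profiles containing that level minus the grand mean. The study's 16-profile array is not balanced in this way, so the shortcut only illustrates the idea; software such as SPSS fits the full regression and averages importance values across participants.

```python
from itertools import product
from statistics import mean

LEVELS = ("masculinized", "unaltered", "feminized")

def part_worths(profiles, ranks):
    """Part-worth utilities from one participant's ranking of a balanced design.

    Ranks are converted to preference scores (rank 1 = most preferred = highest score).
    In a balanced orthogonal design, the effect-coded OLS part-worth of a level equals
    the mean score of profiles containing that level minus the grand mean.
    """
    n = len(ranks)
    scores = [n + 1 - r for r in ranks]
    grand = mean(scores)
    return {
        feature: {
            level: mean(s for p, s in zip(profiles, scores) if p[feature] == level) - grand
            for level in LEVELS
        }
        for feature in profiles[0]
    }

def importance_values(utilities):
    """Each feature's utility range as a percentage of the summed ranges."""
    ranges = {f: max(u.values()) - min(u.values()) for f, u in utilities.items()}
    total = sum(ranges.values())
    return {f: 100 * r / total for f, r in ranges.items()}

# Hypothetical 2-feature, 3-level full factorial (9 profiles) and one hypothetical ranking.
profiles = [{"jaw": j, "eye": e} for j, e in product(LEVELS, repeat=2)]
ranks = [2, 1, 3, 8, 7, 9, 5, 4, 6]
utilities = part_worths(profiles, ranks)
print(utilities["jaw"])              # → {'masculinized': 3, 'unaltered': -3, 'feminized': 0}
print(importance_values(utilities))  # → {'jaw': 75.0, 'eye': 25.0}
```

The output shows the distinction drawn above: the importance values say the jaw mattered three times as much as the eyes, while only the utilities reveal that masculinized jaws were preferred and unaltered jaws were not.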

Participants' rankings of holdout profiles were accurately predicted by the utility estimates (all τ = 1.00) for both attractiveness and masculinity assessments. Holdout profiles are image variations with unique facial characteristic combinations that are ranked alongside the original 16 profiles but are not used to generate importance values or utility estimates. The attractiveness or masculinity of a holdout profile can be calculated using the model generated from participants' rankings of the other 16 images. That is, the utility estimates for each characteristic (e.g., masculinized face height, feminized eye size, etc.) can be summed within each image variation to give an overall estimated attractiveness or masculinity score for that image. The tau coefficient represents how well these estimated scores predict participants' rankings of the holdout profiles relative to the other 16 profiles.

FIGURE 1 | … composite image. To create a male composite with feminine eye size, all points within the red square in (B) were altered to match corresponding points in (A). (C) is a male composite with feminine eye size [i.e., all points within the red squares match (A) (female composite), whereas all points outside the red squares match (B) (male composite)].

FIGURE 2 | … Feminized features are presented in the top row and masculinized features are presented in the bottom row.

FIGURE 3 | Examples of each independent feature manipulation applied to the male base composite image. Feature manipulations are organized into columns. Feminized features are presented in the top row and masculinized features are presented in the bottom row.

TABLE 1 | Orthogonal array of facial transformations and the respective image variations to which they were applied.

The number listed in the "Image variation" column corresponds to each image's numerical label in Figure 4.
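The holdout check described above can be sketched as follows: sum the part-worths of each holdout's feature levels to get a predicted score, then compare the predicted order with the observed order using Kendall's tau. In this sketch the utilities, the three holdout profiles, and the observed ranks are all hypothetical, tau is computed across the holdout profiles only, and the tau-a variant used assumes there are no ties.

```python
from itertools import combinations

def predicted_score(profile, utilities, constant=0.0):
    """Constant plus the part-worth utility of each feature level in the profile."""
    return constant + sum(utilities[feature][level] for feature, level in profile.items())

def kendall_tau_a(x, y):
    """Kendall's tau-a between two equal-length sequences (assumes no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical part-worths and three hypothetical holdout profiles.
utilities = {
    "jaw": {"masculinized": 3.0, "unaltered": -3.0, "feminized": 0.0},
    "eye": {"masculinized": 0.0, "unaltered": 1.0, "feminized": -1.0},
}
holdouts = [
    {"jaw": "masculinized", "eye": "feminized"},
    {"jaw": "unaltered", "eye": "unaltered"},
    {"jaw": "feminized", "eye": "masculinized"},
]
observed_ranks = [1, 3, 2]  # 1 = most preferred

predicted = [predicted_score(h, utilities) for h in holdouts]
# Negate ranks so that larger values mean "more preferred" on both sides.
tau = kendall_tau_a(predicted, [-r for r in observed_ranks])
print(predicted, tau)  # → [2.0, -2.0, 0.0] 1.0
```

A tau of 1.00, as reported here, means every pair of holdout profiles was ordered by the model exactly as participants ordered them.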

For each of the following analyses, participant gender was included as an additional variable. All interactions with gender were nonsignificant even before Bonferroni correction (adjusted critical p = 0.01; all ps > 0.08). Therefore, participant gender is omitted from the analyses reported below.

### Attractiveness Ratings

#### Importance Values

A 2(sex of face [male, female]) × 2(relationship context [long-term, short-term]) × 5(facial attribute [eyebrow thickness, cheekbone prominence, eye size, face height, jawbone prominence]) repeated-measures ANOVA was used to examine differences in importance values for each facial attribute in male and female faces ranked for desirability as long- and short-term mates. All post hoc analyses and pairwise comparisons were adjusted using Bonferroni correction (critical p = 0.01). There was a main effect for facial attribute, F(4, 3684) = 23.41, p < 0.001, η² = 0.03. Importance values for eyebrow thickness (M = 21.28, SD = 6.62) were not significantly different from those for face height (M = 20.31, SD = 7.61, p = 0.114) or jawbone prominence (M = 20.48, SD = 5.66, p = 0.155), but both eyebrow thickness and jawbone prominence had greater importance values than cheekbone prominence (M = 18.43, SD = 4.42, p < 0.001, d = 0.31 and d = 0.29, respectively) and eye size (M = 19.49, SD = 5.36, p < 0.001, d = 0.19 and d = 0.12, respectively) when ranking faces for attractiveness. Likewise, importance values were greater for face height than for cheekbone prominence (p < 0.001, d = 0.19).

There was also a significant interaction between sex of face and facial attribute, F(4, 3684) = 6.97, p < 0.001, η² = 0.01. Importance values for eye size were greater for male faces (M = 20.09, SD = 7.28) than for female faces (M = 18.90, SD = 7.12), t(921) = 3.76, p < 0.001, d = 0.12, whereas importance values for face height were greater for female faces (M = 20.90, SD = 9.98) than for male faces (M = 19.73, SD = 8.36), t(921) = 3.45, p = 0.001, d = 0.12. There was also a significant interaction between relationship context and facial attribute, F(4, 3684) = 2.39, p = 0.049, η² = 0.003. Importance values for eyebrow thickness were greater in a long-term (M = 21.71, SD = 8.26) than in a short-term (M = 20.95, SD = 8.17) relationship context; however, this difference was not significant after Bonferroni correction, t(921) = 2.04, p = 0.042, d = 0.08.

#### Utility Estimates

Five 2(sex of face [male, female]) × 2(relationship context [long-term, short-term]) × 3(attribute level [masculinized, unaltered, feminized]) repeated-measures ANOVAs were used to examine differences in utility estimates for each level of each facial attribute in male and female faces ranked for desirability as long- and short-term mates. All post hoc analyses and pairwise comparisons were adjusted using Bonferroni correction (critical p = 0.017).
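The two critical values used in these analyses match a family-wise α of 0.05 divided by 5 (0.01) and by 3 (≈ 0.0167, reported as 0.017). The small sketch below, assuming α = 0.05 and a strict "p below the adjusted threshold" rule, shows why results reported at p = 0.017 sit just above the three-comparison threshold and are described as marginal.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison critical p under a Bonferroni correction."""
    return alpha / n_comparisons

def survives_bonferroni(p, alpha=0.05, n_comparisons=3):
    """True if p falls strictly below the Bonferroni-adjusted critical value."""
    return p < bonferroni_alpha(alpha, n_comparisons)

print(round(bonferroni_alpha(0.05, 5), 3), round(bonferroni_alpha(0.05, 3), 3))  # → 0.01 0.017
print(survives_bonferroni(0.004), survives_bonferroni(0.017))  # → True False
```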

FIGURE 4 | Examples of composite male (top) and female (bottom) images to which one or more of the five digital transformations were applied according to an orthogonal array (see Table 1). Numerical labels correspond to the "Image variation" number listed in Table 1. Base composite images were borrowed from previous work (e.g., Perrett et al., 1998; Penton-Voak et al., 1999; Jones et al., 2005; Welling et al., 2007).

#### **Eyebrow thickness**

There was a main effect for attribute level, F(2, 1842) = 27.37, p < 0.001, η² = 0.03. Utility estimates were greater for masculinized (M = 0.25, SD = 0.95) than for unaltered (M = −0.12, SD = 1.03, p < 0.001, d = 0.23) and feminized (M = −0.13, SD = 1.13, p < 0.001, d = 0.21) eyebrow thickness. This was moderated by a significant interaction with sex of face, F(2, 1842) = 3.81, p = 0.022, η² = 0.004. Utility estimates for masculinized eyebrow thickness were greater for female faces (M = 0.33, SD = 1.35) than for male faces (M = 0.17, SD = 1.19), t(921) = 2.85, p = 0.004, d = 0.10. This was further moderated by a three-way interaction, F(2, 1842) = 3.14, p = 0.044, η² = 0.003. For female faces, utility estimates for masculinized eyebrows were higher in a long-term (M = 0.42, SD = 1.75) than in a short-term (M = 0.25, SD = 1.68) context, t(921) = 2.39, p = 0.017, d = 0.11, though this was only marginally significant after Bonferroni correction. There were no other significant main effects or interactions (all F < 1.96, all p > 0.141).

#### **Cheekbone prominence**

There was a main effect for attribute level, F(2, 1842) = 30.63, p < 0.001, η² = 0.03. Utility estimates were greater for masculinized (M = 0.21, SD = 0.79) than for unaltered (M = −0.11, SD = 0.78, p < 0.001, d = 0.25) and feminized (M = −0.11, SD = 0.90, p < 0.001, d = 0.21) cheekbones. There were no other significant main effects or interactions (all F < 2.09, all p > 0.124).

#### **Eye size**

There was a main effect for attribute level, F(2, 1842) = 58.21, p < 0.001, η² = 0.06. Utility estimates were higher for unaltered (M = 0.23, SD = 0.78) than for masculinized (M = 0.09, SD = 0.91, p = 0.006, d = 0.11) and feminized (M = −0.32, SD = 1.07, p < 0.001, d = 0.30) eye size. Estimates were also higher for masculinized than for feminized eye size (p < 0.001, d = 0.25). There was also a significant interaction between relationship context and attribute level, F(2, 1842) = 3.37, p = 0.035, η² = 0.004. Utility estimates for masculinized eye size were greater in a short-term (M = 0.15, SD = 1.04) than in a long-term (M = 0.04, SD = 1.05) context, t(921) = −2.37, p = 0.018, d = 0.08, whereas unaltered eye size was preferred more in a long-term (M = 0.29, SD = 1.21) than in a short-term (M = 0.16, SD = 1.24) context, t(921) = 2.33, p = 0.020, d = 0.01; however, both comparisons were nonsignificant after Bonferroni correction. There were no other significant main effects or interactions (all F < 2.18, all p > 0.113).
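Two quantities in these results can be checked by hand. For a within-subjects effect with k levels and N participants, the uncorrected error degrees of freedom are (k − 1)(N − 1), which with N = 922 gives the 3684 and 1842 reported above. If the reported η² values are partial η² (the text does not specify the variant), each can be recovered from its F ratio as ηp² = F·df_effect / (F·df_effect + df_error). The sketch below assumes no sphericity correction was applied; the function names are illustrative.

```python
def rm_error_df(n_participants, n_levels):
    """Error df for a within-subjects effect with n_levels levels (no sphericity correction)."""
    return (n_levels - 1) * (n_participants - 1)

def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)

N = 922
print(rm_error_df(N, 5), rm_error_df(N, 3))  # → 3684 1842
# Eye-size main effect for attractiveness: F(2, 1842) = 58.21.
print(round(partial_eta_squared(58.21, 2, rm_error_df(N, 3)), 2))  # → 0.06
```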

#### **Face height**

There was a main effect for attribute level, F(2, 1842) = 26.88, p < 0.001, η² = 0.03. Utility estimates were greater for unaltered (M = 0.24, SD = 1.07) than for masculinized (M = −0.10, SD = 0.84, p < 0.001, d = 0.21) and feminized (M = −0.14, SD = 1.06, p < 0.001, d = 0.24) face height. This was moderated by a significant interaction with sex of face, F(2, 1842) = 68.42, p < 0.001, η² = 0.07. Utility estimates for masculinized face height were greater for male (M = 0.25, SD = 1.65) than for female (M = −0.45, SD = 1.65) faces, t(921) = 9.71, p < 0.001, d = 0.30. Likewise, estimates for feminized face height were lower for male (M = −0.43, SD = 1.45) than for female (M = 0.15, SD = 1.50) faces, t(921) = −8.60, p < 0.001, d = 0.29. There were no other significant main effects or interactions (all F < 1.24, all p > 0.290).

#### **Jawbone prominence**

There was a main effect for attribute level, F(2, 1842) = 101.85, p < 0.001, η² = 0.10. Utility estimates were greater for masculinized (M = 0.26, SD = 0.88, p < 0.001, d = 0.40) and feminized (M = 0.17, SD = 0.84, p < 0.001, d = 0.36) than for unaltered (M = −0.43, SD = 1.02) jawbone prominence. This was moderated by a significant interaction between sex of face and attribute level, F(2, 1842) = 10.34, p < 0.001, η² = 0.01. Utility estimates for masculinized jawbone prominence were greater for male (M = 0.36, SD = 1.15) than for female (M = 0.16, SD = 1.12) faces, t(921) = 4.19, p < 0.001, d = 0.14, whereas estimates for feminized jawbone prominence were greater for female (M = 0.28, SD = 1.19) than for male (M = 0.06, SD = 1.21) faces, t(921) = −3.88, p < 0.001, d = 0.13. There were no other significant main effects or interactions (all F < 1.30, all p > 0.273).

### Masculinity Ratings

fpsyg-09-02428 December 3, 2018 Time: 11:4 # 8

#### Importance Values

A 2(sex of face [male, female]) × 5(facial attribute [eyebrow thickness, cheekbone prominence, eye size, face height, jawbone prominence]) repeated-measures ANOVA was used to examine differences in importance values for each facial attribute in male and female faces ranked for masculinity. There was a main effect for facial attribute, F(4, 3684) = 17.61, p < 0.001, η² = 0.02. Importance values for eyebrow thickness (M = 21.09, SD = 8.65), face height (M = 21.13, SD = 10.23), and jawbone prominence (M = 20.38, SD = 7.49) were not significantly different from one another, but each was significantly greater than the importance values for cheekbone prominence (M = 18.34, SD = 6.47; d = 0.23, d = 0.21, and d = 0.21, respectively) and eye size (M = 19.07, SD = 6.98; d = 0.16, d = 0.15, and d = 0.11, respectively; all ps < 0.001). This was moderated by a significant interaction with sex of face, F(4, 3684) = 5.11, p < 0.001, η² = 0.01. Importance values for face height were greater for female (M = 22.09, SD = 13.18) than for male (M = 20.17, SD = 11.76) faces, t(921) = 4.06, p < 0.001, d = 0.14. By contrast, importance values for jawbone prominence were greater for male (M = 20.90, SD = 9.74) than for female (M = 19.86, SD = 10.21) faces, though this difference was only marginally significant after Bonferroni correction, t(921) = 2.39, p = 0.017, d = 0.08.

#### Utility Estimates

Five 2(sex of face [male, female]) × 3(attribute level [masculinized, unaltered, feminized]) repeated-measures ANOVAs were used to examine differences in utility estimates for each level of each facial attribute in male and female faces ranked for perceived masculinity.

#### **Eyebrow thickness**

There was a main effect for attribute level, F(2, 1842) = 77.97, p < 0.001, η² = 0.08. Utility estimates were greater for masculinized (M = 0.43, SD = 1.29) than for unaltered (M = 0.08, SD = 1.26, p < 0.001, d = 0.17) and feminized (M = −0.51, SD = 1.46, p < 0.001, d = 0.39) eyebrow thickness. There were no other significant main effects or interactions (all F < 0.88, all p > 0.417).

#### **Cheekbone prominence**

There was a main effect for attribute level, F(2, 1842) = 10.73, p < 0.001, η² = 0.01. Utility estimates were greater for masculinized (M = 0.17, SD = 1.09) than for feminized (M = −0.07, SD = 1.19, p = 0.001, d = 0.09) and unaltered (M = −0.11, SD = 1.10, p < 0.001, d = 0.11) cheekbone prominence. There was also a significant interaction between sex of face and attribute level, F(2, 1842) = 3.17, p = 0.042, η² = 0.003. Utility estimates for feminized cheekbone prominence were greater for female (M = 0.02, SD = 1.64) than for male (M = −0.15, SD = 1.64) faces, though this was only marginally significant after Bonferroni correction, t(921) = 2.34, p = 0.019, d = 0.08.

#### **Eye size**

There was a main effect for attribute level, F(2, 1842) = 29.53, p < 0.001, η² = 0.03. Utility estimates were greater for masculinized (M = 0.26, SD = 1.02) than for unaltered (M = 0.00, SD = 1.23, p < 0.001, d = 0.15) and feminized (M = −0.26, SD = 1.32, p < 0.001, d = 0.25) eye size. Estimates were also greater for unaltered than for feminized eye size (p = 0.002, d = 0.11). There were no other significant main effects or interactions (all F < 0.91, all p > 0.401).

#### **Face height**

There was a main effect for attribute level, F(2, 1842) = 147.61, p < 0.001, η² = 0.14. Utility estimates were greater for masculinized (M = 0.72, SD = 1.57) than for unaltered (M = −0.05, SD = 1.12, p < 0.001, d = 0.34) and feminized (M = −0.66, SD = 1.49, p < 0.001, d = 0.49) face height. Similarly, estimates were greater for unaltered than for feminized face height (p < 0.001, d = 0.28). There was also a significant interaction between sex of face and attribute level, F(2, 1842) = 3.10, p = 0.045, η² = 0.003. Utility estimates for masculinized face height were greater for female (M = 0.81, SD = 2.14) than for male (M = 0.63, SD = 1.78) faces, t(921) = −2.30, p = 0.022, d = 0.08, whereas estimates for unaltered face height were greater for male (M = 0.02, SD = 1.49) than for female (M = −0.13, SD = 1.55) faces, t(921) = 2.09, p = 0.037, d = 0.07. However, both of these pairwise comparisons were nonsignificant after Bonferroni correction.

#### **Jawbone prominence**

There was a main effect for attribute level, F(2, 1842) = 44.30, p < 0.001, η² = 0.05. Utility estimates were greater for masculinized (M = 0.35, SD = 1.15) than for unaltered (M = −0.31, SD = 1.32, p < 0.001, d = 0.30) and feminized (M = −0.04, SD = 1.23, p < 0.001, d = 0.20) jawbone prominence. Estimates were also higher for feminized than for unaltered jawbone prominence (p < 0.001, d = 0.12). There were no other significant main effects or interactions (all F < 1.14, all p > 0.321).

## DISCUSSION

The relative importance of five facial features (i.e., eyebrow thickness, cheekbone prominence, eye size, face height, and jawbone prominence) to perceptions of physical attractiveness and masculinity was assessed during participants' evaluations of potential romantic partners' facial photographs. CA was used to calculate the importance value of each facial feature in overall rankings of attractiveness and masculinity, as well as utility estimates for each level of each feature. Importance values for perceived masculinity were not significantly different for eyebrow thickness, face height, or jawbone prominence, but each of these traits had significantly greater importance values than cheekbone prominence and eye size, suggesting that perceptions of physical masculinity are more strongly influenced by eyebrow thickness, face height, and jawbone prominence than by cheekbone prominence and eye size. Interestingly, this interacted with the sex of the face being ranked, whereby importance values for face height were greater for female faces than for male faces. This indicates that manipulating face height has a greater impact on the perceived masculinity of women's faces than of men's faces. The opposite was true of jawbone prominence, which was perceptually more important in attributing masculinity to male faces than to female faces. This latter finding should be interpreted with caution, however, as the effect fell short of significance after Bonferroni correction. Finally, utility estimates for masculinity rankings indicated that the masculinized version of every feature was ranked as more masculine than its unaltered and feminized versions. Importantly, this indicates that the transformations were perceived as intended, further validating the use of computer graphics methods in objectively manipulating perceptions of sexual dimorphism (see also, e.g., Welling et al., 2007).

For physical attractiveness, these estimates were compared across (1) long- and short-term relationship contexts and (2) sex of the face. With respect to importance values for physical attractiveness, eyebrow thickness was not significantly more important than face height or jawbone prominence, but both eyebrow thickness and jawbone prominence were more important than cheekbone prominence and eye size when ranking faces for attractiveness. Likewise, importance values were greater for face height than for cheekbone prominence. In other words, participants in the current sample weighted eyebrow thickness, face height, and jawbone prominence as most important in determining overall attractiveness, and eyebrow thickness and jawbone prominence as more important than eye size. Each of these traits appears to be important during zero-acquaintance assessments of physical attractiveness (Cunningham, 1986; Baudouin and Tiberghien, 2004), social dominance and maturity (Keating, 1985; Cunningham et al., 1990), and personality (Paunonen et al., 1999); however, this was the first study to examine which features are relatively more salient when digitally altered and presented alongside other digitally altered facial features within the same facial identity. Furthermore, no study has isolated the precise informational value of each particular facial feature to overall assessments of potential partners. It is therefore difficult to conclude precisely why certain features were prioritized over others, although the findings open promising avenues for future investigation. For now, the informational value of each trait to perceptions of attractiveness may be best interpreted with respect to how preference for these traits varies across relationship context and sex of the face.

Importance values revealed that eye size was relatively more important for male versus female faces. Specifically, utility estimates showed that unaltered eye size was preferred more than masculinized or feminized eye size, and masculinized eye size was preferred more than feminized eye size, when collapsing across sex of face. However, masculinized (i.e., smaller) eye size was preferred more within short- versus long-term relationship contexts. One possibility is that eye size is an important attractiveness indicator in men insofar as individuals with smaller (i.e., masculinized) eyes are perceived as more mature or more socially dominant (Keating, 1985), which may make them more desirable as potential sexual (but not necessarily long-term) partners. Interestingly, unaltered eye size was preferred more in a long- compared to short-term context, perhaps suggesting that only a moderate level of masculinity/femininity is preferred in a long-term mate. However, these latter two findings fell short of significance after Bonferroni correction and should be interpreted with caution.

Importance values for face height were greater for female faces than for male faces, suggesting that the overall signal value of face height is more valuable for attributions of female attractiveness than for attributions of male attractiveness. Utility estimates revealed that masculinized face height was considered more attractive for male versus female faces, and, correspondingly, that feminized face height was preferred less for male versus female faces. Thus, sex-typical face height is preferred in both sexes, but the value of this trait in general is more salient in women's faces compared to men's, likely reflecting greater preferences for sex-typical women versus men (see Little et al., 2014). Similarly, utility estimates for masculinized jawbone prominence were greater for male than for female faces, and, correspondingly, estimates for feminized jawbone prominence were greater for female than for male faces, again suggesting that sex-typical traits are preferred more than sex-atypical traits. However, utility estimates were greater for masculinized compared to unaltered and feminized cheekbone prominence, regardless of the sex of the faces being ranked. Although sensible for male faces, this finding is unexpected for female faces, among whom one would expect a higher preference for feminized (i.e., more prominent) cheekbones. It is possible that participants simply preferred more masculine cheekbone prominence in both sexes, but this interpretation should be taken with caution because the highly subtle nature of this particular manipulation may have also made the differences more difficult to discern for the participants. That said, participants did accurately perceive masculinized cheekbone prominence as more masculine than unaltered or feminized cheekbone prominence, suggesting the manipulation was not imperceptibly subtle. This relationship should be investigated more thoroughly in future work.

Finally, although importance values for eyebrow thickness were greater in a long-term than in a short-term relationship context, this difference was not significant after Bonferroni correction. Surprisingly, utility estimates revealed that thicker eyebrows were more attractive for female faces than for male faces, particularly within a long-term mating context. Previous research shows that thicker eyebrows are typically perceived as more masculine and dominant (Windhager et al., 2011) and are more attractive in male than in female faces (Keating, 1985), making this finding unexpected. It is possible that unmeasured characteristics of the raters (e.g., sociosexuality, self-rated attractiveness) moderate these findings. Indeed, recent work suggests that eyebrows may signal personality qualities (e.g., narcissism; Giacomin and Rule, 2018) that influence perceptions of attractiveness. Additionally, effect sizes in the current study were relatively small, so investigating the moderating effects of individual difference variables may further explain this relationship. Alternatively, current and/or temporary cosmetic and style trends popular among the tested cohort may be influencing this relationship, in which case it may not generalize to other populations. Certainly, this requires further investigation.
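A Bonferroni correction, mentioned above for the eyebrow-thickness comparison, controls the family-wise error rate by multiplying each raw p-value by the number of tests m (capped at 1), which is equivalent to comparing each raw p-value with α/m. The sketch below uses five hypothetical p-values, one per manipulated trait; the actual number of tests in the analysis is not stated here.

```python
# Minimal sketch of a Bonferroni correction: each raw p-value is multiplied by
# the number of tests m (capped at 1), equivalent to testing against alpha / m.
# The p-values below are HYPOTHETICAL.

def bonferroni(p_values: list[float]) -> list[float]:
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


raw = [0.03, 0.004, 0.20, 0.008, 0.60]  # e.g., one test per facial trait
adjusted = bonferroni(raw)
alpha = 0.05
significant = [p_adj < alpha for p_adj in adjusted]
print(adjusted, significant)
```

In this example the raw p = 0.03 becomes 0.15 after adjustment and is no longer significant at α = 0.05, whereas p = 0.004 and p = 0.008 survive the correction.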

## CONCLUSION

This study is the first to demonstrate the utility of CA in investigating the importance of specific aspects or traits of a larger construct (e.g., facial masculinity) to the overall evaluation of that construct. Using this technique, we identified three traits (i.e., eyebrow thickness, jawbone prominence, and face height) whose digital manipulation appears to exert a relatively greater influence on perceptions of romantic partner attractiveness and masculinity than cheekbone prominence and eye size. We also showed how the relative salience of these traits shifts across the sex of the face and the relationship context (i.e., a long-term committed versus purely sexual relationship) for which they are evaluated. This contributes to a burgeoning literature of data-driven face analyses and promises to enrich the development of methodologies that allow researchers to study the relative contribution of distinct facial features to perceptually holistic facial representations.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Oakland University Institutional Review Board with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Oakland University Institutional Review Board.

### AUTHOR CONTRIBUTIONS

JM was responsible for devising the study design, data collection and analysis, and manuscript preparation. LW provided advisory support and guidance throughout each step of this project.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Mogilski and Welling. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# 360 Degrees of Facial Perception: Congruence in Perception of Frontal Portrait, Profile, and Rotation Photographs

Vít Třebický<sup>1,2</sup>\*, Jitka Fialová<sup>1,2</sup>, David Stella<sup>1,2</sup>, Zuzana Štěrbová<sup>1,2</sup>, Karel Kleisner<sup>1,2</sup> and Jan Havlíček<sup>1,2</sup>

<sup>1</sup> National Institute of Mental Health, Klecany, Czechia, <sup>2</sup> Faculty of Science, Charles University, Prague, Czechia

Studies in social perception traditionally use frontal portrait photographs as stimuli. It turns out, however, that a 2D frontal depiction may not fully capture the entire morphological diversity of facial features. Recently, 3D images have become increasingly popular, but whether their perception differs from the perception of 2D images has not yet been systematically studied. Here we investigated congruence in the perception of portrait, left profile, and 360° rotation photographs. The photographs were obtained from 45 male athletes under standardized conditions. In two separate studies, each set of images was rated for formidability (portraits by 62, profiles by 60, and 360° rotations by 94 raters) and attractiveness (portraits by 195, profiles by 176, and 360° rotations by 150 raters) on a 7-point scale. The ratings of the stimulus types were highly intercorrelated (for formidability all rs > 0.8, for attractiveness all rs > 0.7). Moreover, we found no differences in the mean ratings between the three types of stimuli, either in formidability or in attractiveness. Overall, our results clearly suggest that different facial views convey highly overlapping information about the structural facial elements of an individual. They lead to congruent assessments of formidability and attractiveness, and a single-angle view seems sufficient for face perception research.

#### Edited by:

Ian Stephen, Macquarie University, Australia

#### Reviewed by:

Barnaby James Wyld Dixson, The University of Queensland, Australia Danielle Leigh Wagstaff, Federation University, Australia

#### \*Correspondence:

Vít Třebický

vit.trebicky@natur.cuni.cz

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 12 September 2018 Accepted: 15 November 2018 Published: 07 December 2018

#### Citation:

Třebický V, Fialová J, Stella D, Štěrbová Z, Kleisner K and Havlíček J (2018) 360 Degrees of Facial Perception: Congruence in Perception of Frontal Portrait, Profile, and Rotation Photographs. Front. Psychol. 9:2405. doi: 10.3389/fpsyg.2018.02405

Keywords: 2D, 3D, head, standardized photography, assessment, morphology, attractiveness, formidability

## INTRODUCTION

When artists create portraits, they rarely depict a full frontal view of the sitter's face. Instead, they tend to portray people in some degree of profile, emphasizing one cheek and the dimensionality of the face (Murphy, 1994). Interestingly, the vast majority of studies on facial perception use frontal portraits (unaltered or altered photographs, morphs, or line drawings) as stimuli (e.g., Thornhill and Gangestad, 1999; Rhodes, 2006; Kościński, 2009; Calder et al., 2011; Little et al., 2011; Valentová et al., 2013; Little, 2014). Given, however, that in our daily lives we experience faces from multiple angles, it is far from certain that a frontal view is the optimal depiction, and several studies have even suggested that an individual's appearance can vary significantly depending on the viewing angle (Rule et al., 2009; Jenkins et al., 2011; Tigue et al., 2012; Kościński and Zalewska, 2017; Sutherland et al., 2017). Faces are, after all, complex and highly variable morphological structures (Enlow et al., 1996), and some facial features are apparent only from certain viewing angles (Danel et al., 2018). For example, Danel et al. (2018) reported only a moderate correlation in sexually dimorphic features between lateral and frontal facial configurations in both men and women. When frontal and lateral facial configurations were compared in terms of their averageness, a significant association was found only in women. It is therefore plausible that different viewing angles provide complementary information. A single frontal view could potentially obscure relevant visual cues used in assessing certain dimensions (e.g., determinants of facial masculinity, such as protrusion of the brow ridge and angularity of the jaw), thus reducing judgment accuracy.

In research on body perception, the use of views other than the frontal one is becoming increasingly common (Tovée and Cornelissen, 2001; Perilloux et al., 2012; Sell et al., 2017; Cornelissen et al., 2018). Viewing bodies from multiple angles allows for assessments of multivariate trait interactions (Brooks et al., 2015). Varying the viewing angle allows raters to assess the shapes and sizes of various morphological characteristics, such as body fat, lean mass distribution, or breast morphology (Dixson et al., 2011, 2015), all of which contribute to the resulting attractiveness rating.

Research on facial perception that employs views other than the frontal one remains, however, unsystematic at best (Kościński, 2009), and the mutual relations between the frontal and lateral dimensions of facial features have so far received very little attention (Danel et al., 2018). Profile views have been used primarily in orthodontics and aesthetic medicine because they are known to affect facial attractiveness judgments (Spyropoulos and Halazonetis, 2001; Johnston et al., 2005; Maple et al., 2005; Soh et al., 2007; Shafiee et al., 2008; Nomura et al., 2009). Results from several studies that investigated the averageness of facial profiles show patterns analogous to those for frontal images (Spyropoulos and Halazonetis, 2001; Minear and Park, 2004; Valentine et al., 2004; Valenzano et al., 2006). Some researchers, meanwhile, have tried to overcome the limitations of single-view stimuli by presenting raters with both frontal and lateral views of targets on a single screen (e.g., Dixson and Rantala, 2015; Dixson et al., 2016; Valentova et al., 2017), while other studies found medium to high correlations between attractiveness ratings of frontal and lateral depictions (ranging from r = 0.52 to 0.83) (Diener et al., 1995; Valenzano et al., 2006; Davidenko, 2007; Shafiee et al., 2008; Kościński and Zalewska, 2017).

Until recently, most studies used static, two-dimensional images (photographs) as stimuli. Thanks to technological progress, including a considerable increase in computing power, 3D scanning and 3D reconstruction technology is now becoming more accessible to facial perception research (Toole et al., 1999; Caharel et al., 2009; Chelnokova and Laeng, 2011; Meyer-Marcotty et al., 2011; Jones et al., 2012; Lefevre et al., 2012; Tigue et al., 2012; Berssenbrügge et al., 2014; Mydlová et al., 2015; Holzleitner and Perrett, 2016; Hu et al., 2017; Kordsmeyer et al., 2018). Potential bias associated with a single 2D image (e.g., a profile) might be minimized by using 3D images, which represent various viewing angles. To the best of our knowledge, however, only one study has directly compared ratings based on 2D and 3D facial images (Tigue et al., 2012). The authors found a high correlation between ratings of 2D and 3D stimuli (r = 0.71), with mean ratings significantly higher for 3D images. It should be noted, however, that in this study only opposite-sex ratings were collected (female faces were rated by male participants), on a single scale (attractiveness), and the only 2D depictions used were frontal portraits.

Current evidence suggests a rather high level of congruence in judged characteristics (especially attractiveness) between frontal and lateral, or frontal and 3D, views of faces. It should, however, be taken into account that morphological features in the frontal and lateral views do not always correlate (Danel et al., 2018), and one could thus expect some socially relevant traits to be easier to assess from views other than the frontal one (Tigue et al., 2012).

In two studies, we estimated the congruence in perception of three different views of male heads (frontal portrait, left profile, and 360° rotation photographs). We employed two characteristics relevant in the context of intra- and inter-sexual selection, namely formidability and attractiveness. We also explored whether the type of device used (mobile phone, laptop, or desktop computer) influences the ratings.

## MATERIALS AND METHODS

All procedures employed in this study conform to the ethical standards of the relevant committee on human experimentation and to the Declaration of Helsinki. The study was approved by the Institutional Review Board of the National Institute of Mental Health, Czech Republic (Ref. num. 28/15). All participants were informed about the goals of the study and gave their informed consent. The present study is part of a larger project investigating multimodal perception of traits associated with sexual selection and of characteristics related to competition outcome.

## Targets

We collected photographs of 45 male Mixed Martial Arts (MMA) athletes (mean age = 26.6, SD = 5.86, range = 18–38). All athletes were from the Czech Republic. They were recruited via social media advertisements, via leaflets distributed at domestic MMA tournaments and gyms, and with the assistance of the Mixed Martial Arts Association Czech Republic (MMAA). All targets were provided with a brief description of the project and confirmed their participation by signing an informed consent form. As compensation for their participation, they received 400 CZK (approx. €15).

## Acquisition and Settings of the Photographs

To capture images of the targets' heads from all 360°, we built a rotating plywood platform (120 cm in diameter) using flat ball bearings. The platform had 36 steps around its perimeter, i.e., one step for every 10°, making it effectively a large turntable. To achieve standardization (all photographs were acquired on site), the platform was placed inside a purpose-built portable photographic booth to control for changes in ambient illumination and for color reflections (see e.g., Rowland and Burriss, 2017; Thorstenson, 2018). The booth measured 140 × 140 × 255 cm, and its frame was made of sectioned aluminum profiles. The outside of the booth and the inside of its roof were covered with black duvetyn cloth (a dense fabric), while the inner side of the walls, the seamless backdrop, and the surface of the turning platform were covered with bright white velvet (a medium-density fabric).

To achieve standardized lighting conditions, we used one 800 W studio strobe (Photon Europe MSN HSS-800) aimed into a white reflective umbrella used as a light modifier (Photon Europe, 109 cm diameter), mounted on a 175 cm high light stand and tilted 10° downwards toward the booth. The light was positioned 125 cm from the target. This lighting setup ensured even exposure across the whole scene, which was further verified before each session with a digital light meter (Sekonic L-308S).

Images were acquired using a 24-megapixel full-frame (35.9 × 24 mm CMOS sensor, equivalent to 35 mm film) Nikon D610 digital SLR camera equipped with a fixed-focal-length Nikon AF-S NIKKOR 85 mm f/1.8 G lens (Třebický et al., 2016). Exposure values were set to ISO 100, shutter speed 1/200 s, and aperture f/11. Photographs were shot as 14-bit uncompressed raw files (NEF) in the AdobeRGB color space. Color calibration was performed using X-Rite Color Checker Passport color targets and a white balance patch photographed at the beginning of each session. The camera was mounted in portrait orientation directly onto the light stand, which also carried the strobe, and was positioned 125 cm from the target in order to approximate social interpersonal distance (Hall, 1966; Baldassare and Feller, 1975; Sorokowska et al., 2017), to maintain constant perspective distortion (Třebický et al., 2016; Erkelens, 2018), and to avoid potential perception bias based on interpersonal distance (Bryan et al., 2012). The camera's distance from each target was verified with a digital laser rangefinder (Bosch PLR 15) as the distance from the sensor plane (marked φ on the camera body) to the middle of the target's forehead. The camera's height was adjusted for each target so as to position the center of his head in the middle of the frame. The focus point was set on the target's right eye, and the focus distance was locked for further images of that target. This combination of camera distance, focal length, and sensor size yielded a 35 × 53 cm field of view (23.85° angle of view), and the aperture setting resulted in a 9 cm depth of field (4 cm in front of and 5 cm behind the focal plane).
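The field-of-view and angle-of-view figures above can be approximately reproduced with the simple pinhole (similar-triangles) model: field = distance × sensor dimension / focal length, and angle = 2·atan(sensor dimension / (2 × focal length)). This is a sketch under that simplifying assumption; it ignores lens focusing behaviour and is not necessarily how the authors derived their values. The reported 23.85° corresponds to the sensor's long (35.9 mm) side, which is vertical in portrait orientation.

```python
import math

# Minimal sketch: approximate the reported field of view (~35 x 53 cm) and angle
# of view (~23.85 deg) with a pinhole model. Assumptions: subject distance is
# measured from the sensor plane and lens focusing behaviour is ignored.

SENSOR_SHORT_MM = 24.0   # short side, horizontal in portrait orientation
SENSOR_LONG_MM = 35.9    # long side, vertical in portrait orientation
FOCAL_LENGTH_MM = 85.0
DISTANCE_MM = 1250.0     # camera-to-target distance


def field_of_view_cm(sensor_mm: float, focal_mm: float, distance_mm: float) -> float:
    """Linear extent covered at the subject plane, in centimetres."""
    return distance_mm * sensor_mm / focal_mm / 10.0


def angle_of_view_deg(sensor_mm: float, focal_mm: float) -> float:
    """Full angle of view along one sensor dimension, in degrees."""
    return math.degrees(2.0 * math.atan(sensor_mm / (2.0 * focal_mm)))


width_cm = field_of_view_cm(SENSOR_SHORT_MM, FOCAL_LENGTH_MM, DISTANCE_MM)
height_cm = field_of_view_cm(SENSOR_LONG_MM, FOCAL_LENGTH_MM, DISTANCE_MM)
vertical_aov = angle_of_view_deg(SENSOR_LONG_MM, FOCAL_LENGTH_MM)
print(f"{width_cm:.1f} x {height_cm:.1f} cm, {vertical_aov:.2f} deg")
```

The sketch yields about 35.3 × 52.8 cm and 23.85°, in line with the reported 35 × 53 cm field of view and 23.85° angle of view.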

Targets were seated on a 63 cm high bar stool (Ikea Franklin) positioned in the middle (on the rotation axis) of the turning platform. We asked them not to lean against the stool's back support and to sit with their back straight and their hands hanging freely alongside their body. They were asked to adopt a neutral facial expression (with no smile or frown), to look directly into the camera, and to remain in this position for all subsequent photographs. When necessary, targets were instructed to adjust their posture and head position so that they faced the camera straight on, without any head pitch, yaw, or roll. In addition, they were instructed to wear only the black underwear shorts we provided (i.e., no T-shirt) and to remove any adornments (glasses, earrings, piercings, or other jewelry).

One full 360° rotation yielded 36 photographs. After each photograph, a research assistant manually turned the platform by one step (10°) clockwise. We captured two full rotations so that a complete set of images free of movements, blinks, and similar artifacts could be assembled for each target. Capturing both full rotations took approx. 10 min.
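The two-rotation procedure can be illustrated with a small, hypothetical sketch: each platform angle (0°, 10°, ..., 350°) is taken from the first rotation unless that frame is flawed, in which case the frame from the second rotation is used. In the study, frames were judged and combined manually in Lightroom (see Stimuli Processing); the `Frame` class and its `usable` flag are illustrative assumptions standing in for that visual inspection.

```python
# Hypothetical sketch of assembling one 36-frame set from two captured rotations.
# A per-frame `usable` flag stands in for the manual visual inspection.

from dataclasses import dataclass

STEP_DEG = 10
FRAMES_PER_ROTATION = 360 // STEP_DEG  # 36 frames per full rotation


@dataclass
class Frame:
    rotation: int   # 1 or 2
    angle_deg: int  # platform angle: 0, 10, ..., 350
    usable: bool    # e.g., eyes open, mouth closed, correct head position


def merge_rotations(first: list[Frame], second: list[Frame]) -> list[Frame]:
    """Take each angle from the first rotation, falling back to the second."""
    by_angle_2 = {f.angle_deg: f for f in second}
    merged = []
    for frame in first:
        if frame.usable:
            merged.append(frame)
        else:
            backup = by_angle_2[frame.angle_deg]
            if not backup.usable:
                raise ValueError(f"No usable frame at {frame.angle_deg} deg")
            merged.append(backup)
    return merged


angles = [i * STEP_DEG for i in range(FRAMES_PER_ROTATION)]
rot1 = [Frame(1, a, usable=(a != 90)) for a in angles]  # pretend a blink at 90 deg
rot2 = [Frame(2, a, usable=True) for a in angles]
final_set = merge_rotations(rot1, rot2)
print(len(final_set), [f.rotation for f in final_set if f.angle_deg == 90])
```

Raising an error when neither rotation has a usable frame makes gaps explicit rather than silently producing an incomplete spin.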

## Stimuli Processing

All image processing was carried out in Adobe Lightroom Classic CC (Version 2017). First, all images were converted into DNG raw files, and DNG color calibration profiles were assembled (using the X-Rite Color Checker Passport LR plugin) and applied to all photographs. For each target, a final set of 36 images covering the full 360° head rotation was selected and post-processed by combining suitable images (correct head position, open eyes, closed mouth, etc.) from the two captured rotations. To ensure consistent exposure across all selected photographs, the percentages of Red, Green, and Blue channel values were checked across three background areas (above, left, right), and any small differences in exposure were manually adjusted to the same level. Next, the calibrated images were exported as lossless 16-bit AdobeRGB TIFF files at their real size of 35 × 53 cm and a resolution of 168 pixels per inch (ppi), the native pixel density of the 4K screens used for the rating sessions (see Rating Sessions, Formidability Rating). This resulted in life-sized images of the targets' heads. The horizontal and vertical positions of the images were adjusted using the Lightroom (LR) Transform tool to place each target's head in the center of the frame with the eyes on a horizontal line. Final images were batch-cropped to a 1:1.1 side ratio (2,095 × 2,305 px) to fit the head rotations of all targets. Images were then converted into the sRGB color space and exported as 8-bit JPEG files (2,095 × 2,305 px @ 168 ppi).
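The size and crop figures above follow from the conversion pixels = centimetres / 2.54 × ppi. The sketch below simply re-derives them at the stated 168 ppi: the full 35 × 53 cm frame corresponds to about 2,315 × 3,506 px, and the final 2,095 × 2,305 px crop has a height-to-width ratio of about 1.10 and covers roughly 31.7 × 34.8 cm at life size.

```python
# Minimal sketch of the size arithmetic behind the exported stimuli:
# pixels = centimetres / 2.54 * ppi, and vice versa.

CM_PER_INCH = 2.54
PPI = 168  # pixel density stated in the text


def cm_to_px(cm: float, ppi: float = PPI) -> float:
    return cm / CM_PER_INCH * ppi


def px_to_cm(px: float, ppi: float = PPI) -> float:
    return px / ppi * CM_PER_INCH


# Full life-size frame (35 x 53 cm) before cropping.
full_w_px, full_h_px = cm_to_px(35), cm_to_px(53)

# Final crop reported in the text.
crop_w_px, crop_h_px = 2095, 2305
crop_ratio = crop_h_px / crop_w_px  # ~1.1, i.e., the 1:1.1 side ratio
crop_w_cm, crop_h_cm = px_to_cm(crop_w_px), px_to_cm(crop_h_px)
print(round(full_w_px), round(full_h_px), round(crop_ratio, 3),
      round(crop_w_cm, 1), round(crop_h_cm, 1))
```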

### Building 360° Head Rotations

We used Sirv (www.sirv.com), an online suite for creating and managing image spins, to build the 360° head rotations. With all of Sirv's automatic image adjustments and image size and quality optimizations turned off, we uploaded the images of all targets and created the individual spins. See **Supplementary Materials** for a sample 360° head rotation (360 rotation video.MP4).

#### Portraits and Profiles

Analogously to previous research investigating morphological differences between portraits and profiles (Danel et al., 2018), we selected a frontal and a left profile image from each target's set of 36 images. See **Supplementary Materials** for a sample frontal portrait (Frontal portrait.JPEG) and left profile (Left profile.JPEG).

## Raters

#### Formidability Rating

Portraits were evaluated by 62 raters (30 men), mean age = 23.1 (SD = 3.45, range = 18–39); profiles by 60 raters (30 men), mean age = 22.8 (SD = 3.55, range = 18–36); and 360° rotations by 94 raters (46 men), mean age = 22.1 (SD = 3.09, range = 18–38) (see **Table 2**). Raters were mainly Charles University (Prague, Czech Republic) students, recruited via social media advertisements or a mailing list of participants assembled in previous studies, or invited on site. All raters were provided with a brief description of the project and confirmed their participation by signing an informed consent form. Rating took place in a lab (see Rating Sessions, Formidability Rating), and on completion raters received 100 CZK (approx. €4) and a debriefing leaflet. A two-way ANOVA showed no age differences between sexes, stimulus type ratings, or their interaction [Sex: F(1, 210) = 0.371, p = 0.543; Stimuli type: F(2, 210) = 1.777, p = 0.172; Sex × Stimuli type: F(2, 210) = 0.006, p = 0.994].

#### Attractiveness Rating

Portraits were evaluated by 195 raters (30 men), mean age = 29.6 (SD = 6.05, range = 18–48); profiles by 176 raters (32 men), mean age = 29.2 (SD = 6.26, range = 18–53); and 360° rotations by 150 raters (35 men), mean age = 29 (SD = 6.27, range = 18–46) (see **Table 2**). Raters were recruited mainly via advertisements among followers of the National Institute of Mental Health (facebook.com/nudzcz) and Human Ethology group (facebook.com/etologie) Facebook pages. Ratings were carried out online. All raters gave their informed consent by clicking the "I agree" button and were not financially reimbursed. A two-way ANOVA showed no age differences between sexes, stimulus type ratings, or their interaction [Sex: F(1, 515) = 2.553, p = 0.111; Stimuli type: F(2, 515) = 0.162, p = 0.85; Sex × Stimuli type: F(2, 515) = 0.084, p = 0.864]. **Table 2** provides detailed descriptive statistics.

## Rating Sessions

#### Formidability Rating

Formidability ratings were collected in two separate sessions. In the first session, we collected the ratings of 360° rotations. In the second session, raters were randomly assigned to rate either the portrait or the profile set. Each rater thus judged a full set of only one type of stimulus.

Ratings took place in a quiet perception lab under standardized conditions for all raters (with artificial lighting and closed window blinds to eliminate changes in ambient lighting). Raters were seated with their eyes level with the eyes of the stimuli, 125 cm from the screen (i.e., at the same distance as the camera had been from the targets, in order to simulate approximate social interpersonal distance; Hall, 1966; Baldassare and Feller, 1975), and centered on the displayed photograph (Cooper et al., 2012). This was implemented to increase the ecological validity of the ratings.

Images were presented to raters on 27″ Dell U2718Q UltraSharp IPS screens (3,840 × 2,160, 99% sRGB color space coverage) rotated to a vertical position to accommodate life-size images. The screens were connected to Asus ROG G20 PCs running Microsoft Windows 10 with display scaling set to 100%. Screens were color- and luminance-calibrated with X-Rite i1 Display Pro probes, which remained connected during the whole rating session to adjust the screens for ambient light. The Qualtrics survey suite (Qualtrics, Provo, UT) with the Blank theme, run in Google Chrome (in full screen mode and at 100% scaling), was used for data collection.

All raters received a set of brief demographic questions (e.g., sex, age, and education status) followed by a block containing the stimuli. Images were presented in a randomized order. Raters were asked to rate the formidability ("Jak moc by byl tento muž úspěšný, kdyby se dostal do fyzického souboje?"/"If this man were involved in a physical confrontation, how successful would he be?") of each target on a 7-point verbally anchored scale (from "1 – velice neúspěšný"/"very unsuccessful," to "7 – velice úspěšný"/"very successful"). The 360° rotations spun automatically clockwise once (the automatic rotation took approx. 2 s), and raters were instructed to turn the heads around for further inspection by dragging the mouse left or right before rating. Portrait and profile photographs were simply displayed on the screen. Rating time was not limited.

#### Attractiveness Rating

Ratings were collected online via the Qualtrics survey suite (Qualtrics, Provo, UT). All raters were first presented with a brief study description and an informed consent form. They then completed a set of demographic questions, followed by one randomly selected block of stimuli (portraits, profiles, or 360° rotations). Each rater thus assessed a full set of only one type of stimulus. Images were presented in a randomized order. Raters were asked to rate the attractiveness ("Jak atraktivní je muž na fotografii?"/"How attractive is the man in the photograph?") of each target on a 7-point verbally anchored scale (from "1 – velice neatraktivní"/"very unattractive", to "7 – velice atraktivní"/"very attractive"). The 360° rotations spun automatically once clockwise (the automatic rotation took approx. 2 s), and raters were instructed to turn the heads around for further inspection by dragging the mouse left or right before rating, while portrait and profile photographs were simply displayed on the screen. Rating time was not limited.

We used the Qualtrics Blank theme and custom CSS code to set the image size to 800 px width with centered margin alignment (`.Skin #SkinContent .QuestionBody {width: 800px; display: block; margin-left: auto; margin-right: auto;} .Skin #SkinContent .QuestionText {width: 800px; display: block; margin-left: auto; margin-right: auto;}`) to standardize stimulus size and position across all devices used. First, raters were asked to switch their web browsers to Full Screen mode and adjust page scaling to achieve the largest possible image size while still seeing the rating scale without having to scroll down. For example, if a Full HD 16:9 screen (1,920 × 1,080) was used for rating in Full Screen mode, browser scaling would remain at the native 100%.

#### Devices Used for Attractiveness Rating

When raters had finished rating the images, they were asked to specify the type of device they used (mobile phone, tablet device, laptop computer, desktop computer, or other) and its screen size or brand and model name (to later identify screen size and resolution). These data were used to test for a possible effect of the device used on the ratings.

TABLE 1 | Correlations between stimuli types.

All correlations are significant at p < 0.001.

In total, attractiveness was rated by 521 raters: 233 used laptop computers, 135 desktop computers, 116 mobile phones, 19 tablet devices, 1 other device, and 17 did not specify device type. See **Tables S1** and **S2** for data on screen sizes and resolutions.

In subsequent analyses, we used only data from the three most frequently represented device categories: mobile phones, laptops, and desktop computers. This resulted in a sample of 484 raters.
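The sample sizes above can be checked with a short calculation. Assuming one mean rating per rater as the unit of analysis (an assumption, though it is consistent with the F(·, 475) statistics in the Results), a 3 × 3 between-subjects ANOVA (stimulus type × device type) on the 484 retained raters has 484 − 9 = 475 error degrees of freedom.

```python
# Minimal sketch: restrict the attractiveness sample to the three most frequent
# device categories and derive the error degrees of freedom of a 3 x 3
# (stimulus type x device type) between-subjects ANOVA.

device_counts = {
    "laptop": 233,
    "desktop": 135,
    "mobile phone": 116,
    "tablet": 19,
    "other": 1,
    "unspecified": 17,
}
kept_devices = ("laptop", "desktop", "mobile phone")

n_total = sum(device_counts.values())
n_kept = sum(device_counts[d] for d in kept_devices)

n_cells = 3 * len(kept_devices)  # 3 stimulus types x 3 device types
df_error = n_kept - n_cells      # assumes one observation per rater
print(n_total, n_kept, df_error)
```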

#### Statistical Analysis

All statistical tests were performed in JASP 0.9.0.1 (JASP Team, 2018) and jamovi 0.9.1.7 (jamovi project, 2018). McDonald's ω was used to estimate inter-rater agreement (Dunn et al., 2014). To test for potential age differences between rater groups, a two-way ANOVA was carried out, with raters' sex and stimulus type entered as independent variables and age as the dependent variable. Two-way ANOVA was further used to compare sex differences in mean formidability and attractiveness ratings, with raters' sex and stimulus type entered as independent variables and the rating of formidability or attractiveness as the dependent variable. Effect sizes for two-way ANOVAs are reported as η². Associations between the ratings of different stimulus types were tested by bivariate correlations using Pearson's r coefficient with 95% CIs [lower limit, upper limit]. For exploratory purposes, we also tested the influence of device on attractiveness ratings using a two-way ANOVA with stimulus type and device type entered as independent variables and mean attractiveness rating as the dependent variable. Holm's post-hoc tests were performed, and effect sizes for the comparisons are reported as Cohen's d.

## Data Availability

Datasets generated and analyzed during the current study are available as **Supplementary Material** of this article (Dataset formidability.XLSX, Dataset attractiveness.XLSX).

## RESULTS

### Formidability Rating

McDonald's ω scores of male and female ratings showed high inter-rater agreement across all three stimulus types (ranging from 0.732 to 0.876). In subsequent analyses, we have therefore used mean formidability ratings given to the individual stimuli separately by male and female raters. Further, we found high correlations between ratings assigned by men and women for portraits (r = 0.941, 95% CI [0.895, 0.967], p < 0.001), profiles (r = 0.962, 95% CI [0.931, 0.979], p < 0.001), and 360° rotations (r = 0.972, 95% CI [0.95, 0.985], p < 0.001).
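Confidence intervals for Pearson's r are commonly obtained via the Fisher z-transformation: z = atanh(r) has an approximate standard error of 1/√(n − 3). The sketch below assumes this method (the text does not say how JASP or jamovi computed the intervals) and n = 45, the number of targets.

```python
import math

# Minimal sketch of a 95% CI for Pearson's r via the Fisher z-transformation.
# Whether the statistical software used exactly this method is an assumption.

def pearson_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    z = math.atanh(r)              # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


low, high = pearson_ci(0.941, 45)  # men vs. women, portraits
print(f"[{low:.3f}, {high:.3f}]")
```

For r = 0.941 and n = 45 it returns approximately [0.895, 0.967], matching the interval reported above for portraits.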

Ratings of all three stimulus types were highly correlated (**Table 1**, **Figure 1**). Two-way ANOVA showed no main effect of rater's sex [F(1, 264) = 0.00014, p = 0.991, η² < 0.001] or stimulus type [F(2, 264) = 0.473, p = 0.624, η² = 0.004], and no rater's sex × stimulus type interaction [F(2, 264) = 0.01, p = 0.99, η² < 0.001] on formidability ratings (**Figure 2**). For descriptive statistics, see **Table 2**.
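The degrees of freedom in these ANOVAs can be traced back to the design. Assuming, as the text describes, that each observation is a target's mean rating computed separately for male and female raters within each stimulus type, there are 45 × 2 × 3 = 270 observations in 6 cells, giving 270 − 6 = 264 error degrees of freedom; the sketch below makes that arithmetic explicit.

```python
# Minimal sketch: where F(1, 264) and F(2, 264) come from, assuming one
# observation per target x rater sex x stimulus type (a mean rating).

N_TARGETS = 45
N_RATER_SEXES = 2
N_STIMULUS_TYPES = 3

n_obs = N_TARGETS * N_RATER_SEXES * N_STIMULUS_TYPES  # 270 mean ratings
n_cells = N_RATER_SEXES * N_STIMULUS_TYPES             # 6 sex x stimulus cells
df_error = n_obs - n_cells                            # residual df of a two-way ANOVA

df_sex = N_RATER_SEXES - 1
df_stimulus = N_STIMULUS_TYPES - 1
df_interaction = df_sex * df_stimulus
print(df_sex, df_stimulus, df_interaction, df_error)
```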

FIGURE 2 | Differences in mean ratings of formidability (Left) and attractiveness (Right) between stimulus types (portraits, profiles, and 360° rotations). Violin plots show rating distributions, and box plots their 25th and 75th percentiles. Dark gray violin plots represent female and white violin plots male ratings. Mean formidability ratings did not differ between sexes, while males rated all stimulus types as more attractive than females did.



## Attractiveness Rating

McDonald's ω scores of male and female ratings showed high inter-rater agreement in all three stimulus types (ranging from 0.831 to 0.966), which is why in subsequent analyses we used the mean attractiveness ratings given to a particular stimulus separately by male and female raters. Ratings by women and men were highly correlated: r = 0.952, 95% CI [0.915, 0.974], p < 0.001; r = 0.969, 95% CI [0.944, 0.983], p < 0.001; and r = 0.962, 95% CI [0.932, 0.979], p < 0.001 for portraits, profiles, and 360° rotations, respectively.

Attractiveness ratings of all three stimulus types were highly correlated (**Table 1**, **Figure 1**). Two-way ANOVA showed a main effect of rater's sex [F(1, 264) = 3.87, p = 0.05, η² = 0.014], with men rating attractiveness higher than women did, but the effects of stimulus type [F(2, 264) = 1.516, p = 0.222, η² = 0.011] and of the rater's sex × stimulus type interaction [F(2, 264) = 1.118, p = 0.329, η² = 0.008] on attractiveness ratings were not significant (**Figure 2**). For descriptive statistics, see **Table 2**.

#### Influence of Device Type on Attractiveness Rating

To explore whether the type of device used for viewing and rating influences attractiveness ratings, we performed a two-way ANOVA with stimulus type and device type as independent factors. The results showed main effects of both device type [F(2, 475) = 7.429, p < 0.001, η² = 0.030] and stimulus type [F(2, 475) = 4.27, p = 0.015, η² = 0.017], but no significant interaction between them [F(4, 475) = 1.065, p = 0.373, η² = 0.008]. Holm's post-hoc comparisons showed that raters using mobile phones rated the images as significantly more attractive than desktop [t(475) = 3.817, pHolm < 0.001, Cohen's d = 0.557] and laptop users [t(475) = 3.023, pHolm = 0.005, Cohen's d = 0.392] did, whereas the ratings assigned by laptop and desktop users did not differ [t(475) = 1.357, pHolm = 0.175, Cohen's d = 0.145]. 360° rotations were rated significantly higher than portraits [t(475) = 2.912, pHolm = 0.011, Cohen's d = 0.418], but there was no significant difference between 360° rotations and profiles [t(475) = 1.753, pHolm = 0.16, Cohen's d = 0.212] or between portraits and profiles [t(475) = 1.366, pHolm = 0.173, Cohen's d = 0.159] (**Figure 3**). For descriptive statistics, see **Table 3**. Further, attractiveness ratings were highly correlated between all three types of devices: r = 0.883, 95% CI [0.839, 0.915], p < 0.001; r = 0.885, 95% CI [0.842, 0.917], p < 0.001; and r = 0.949, 95% CI [0.93, 0.964], p < 0.001 for mobile phones–laptops, mobile phones–desktops, and laptops–desktops, respectively (**Figure 4**).
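Holm's procedure, used for the post-hoc comparisons above, is a step-down refinement of the Bonferroni correction: the smallest of m raw p-values is multiplied by m, the next smallest by m − 1, and so on, with adjusted values forced to be non-decreasing and capped at 1. The minimal sketch below uses hypothetical raw p-values; the pHolm values in the text were produced by the statistical software.

```python
# Minimal sketch of Holm's step-down p-value adjustment.
# The raw p-values below are HYPOTHETICAL.

def holm(p_values: list[float]) -> list[float]:
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending raw p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted p non-decreasing
        adjusted[i] = running_max
    return adjusted


raw = [0.004, 0.08, 0.17]  # e.g., three pairwise comparisons
print(holm(raw))
```

Because each adjusted value is at least as large as the one before it in the sorted order, Holm's method never rejects a hypothesis with a larger raw p-value while retaining one with a smaller raw p-value, yet it is uniformly less conservative than Bonferroni.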

## DISCUSSION

The main aim of this study was to examine whether the perception of formidability and attractiveness varies depending on the angle from which a face is viewed. For this purpose, we used standardized sets of frontal portraits, left profiles, and 360° head rotations of male facial images. We found strong correlations between the three types of stimuli and no significant differences in the mean ratings of formidability and attractiveness. Our results thus showed that ratings based on the three different face views were highly congruent, and both perceived formidability and attractiveness appear to be view-invariant. As a subsidiary aim, we also tested the effect of the device used on attractiveness ratings. While there were no differences between ratings performed on desktop and laptop computers, ratings performed on mobile phones were higher (targets were perceived as more attractive).

FIGURE 3 | … percentiles. Dark gray violin plot represents mobile phones, light gray violin plot laptop computers, and white violin plot desktop computers, respectively. Asterisks indicate the level of significance, \*\*p = 0.005, \*\*\*p < 0.001.

TABLE 3 | Devices and rating descriptive statistics.

Most face perception research uses frontal images as stimuli (Kościński, 2009), which is in striking contrast with our everyday experience. Moreover, there is a long-standing debate on how the human visual system recognizes objects viewed from different angles (Hayward, 2003), and on whether object recognition is view-specific, i.e., linked to a specific viewing orientation (Tarr and Bülthoff, 1995), or view-invariant (Biederman and Gerhardstein, 1993). Some evidence suggests that the human visual system may be view-specific and process objects differently depending on the viewing angle (Jeffery et al., 2007; but see Jiang et al., 2006). If this were the case, results from perceptual studies that rely solely on frontal portraits could not be generalized to other viewpoints. Our results, however, at least when it comes to social perception, do not support this hypothesis.

Our data show patterns analogous in direction and magnitude to those reported in previous studies that compared assessments based on different views of both faces and bodies (e.g., frontal × profile, frontal × 3D, or oblique poses), which likewise showed strong correlations between ratings (Diener et al., 1995; Tovée and Cornelissen, 2001; Valenzano et al., 2006; Davidenko, 2007; Shafiee et al., 2008; Perilloux et al., 2012; Tigue et al., 2012; Dixson et al., 2015; Kościński and Zalewska, 2017). A related study by Tigue et al. (2012) reported that the attractiveness of frontal and 3D depictions of women's faces, as rated by men, was highly correlated (r = 0.71), but 3D stimuli received significantly higher mean ratings. The authors suggested that this finding may reflect the novelty of 3D visualization. Our study, on the other hand, found no differences between the mean ratings of 2D (frontal and profile) and 3D images. It should be noted, however, that we opted for an alternative to standard 3D visualization. By combining several individual photographs presented in sequence as a spin (360° rotation), we avoided possible bias arising simply from differences in capture technology (such as noticeable differences in lighting and color between 2D and 3D stimuli).

The 360° rotation photographs allowed us to present raters with an all-around view of the targets' heads without incurring the cost of 3D capture technology. Although the resulting visualizations are indeed photorealistic, this procedure has a notable drawback. Capturing and subsequently processing the images is rather time-consuming and physically demanding, especially for the photographed targets: one spin takes approx. 5 min, during which the targets have to sit completely still with a fixed gaze, which makes controlling for head tilt, yaw, and roll even more critical (Penton-Voak et al., 2001; Hehman et al., 2013; Sulikowski et al., 2015). Stimuli captured with actual 3D scanning and 3D reconstruction technology would allow for a variety of applications, including stimulus capture and presentation. For instance, the resulting models could be rotated to arbitrary angles relative to their position during capture, rather than simply displayed in an identical head position in all photographs. Such 3D stimuli would produce more realistic face reconstructions; the main obstacle is the relatively high initial investment in a 3D scanner. Moreover, although 3D facial models are remarkably human-like, they are certainly distinguishable from, and less familiar than, photographs, which could potentially reduce their validity as realistic depictions of humans (Crookes et al., 2015). Future studies should investigate whether the perception of 3D models differs from that of 360° rotation photographs.
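The head-orientation terms used above (tilt, yaw, and roll) and the idea of rotating a 3D model to an arbitrary angle can be made concrete with a small sketch that rotates 3D vertex coordinates. The axis convention (x to the right, y up, z toward the camera), the order in which the three rotations are composed, and the function names are assumptions made for illustration; 3D scanning software may use different conventions.

```python
import numpy as np


def rotation_matrix(yaw_deg, pitch_deg, roll_deg):
    """3x3 rotation composed from yaw, pitch, and roll angles in degrees.

    Assumed convention: x points right, y points up, z points toward the camera;
    yaw turns about y, pitch (tilt) about x, roll about z.
    """
    yaw, pitch, roll = np.radians([yaw_deg, pitch_deg, roll_deg])
    r_yaw = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(yaw), 0.0, np.cos(yaw)]])
    r_pitch = np.array([[1.0, 0.0, 0.0],
                        [0.0, np.cos(pitch), -np.sin(pitch)],
                        [0.0, np.sin(pitch), np.cos(pitch)]])
    r_roll = np.array([[np.cos(roll), -np.sin(roll), 0.0],
                       [np.sin(roll), np.cos(roll), 0.0],
                       [0.0, 0.0, 1.0]])
    # Roll is applied first, then pitch, then yaw (an assumed ordering)
    return r_yaw @ r_pitch @ r_roll


def rotate_points(points, yaw_deg=0.0, pitch_deg=0.0, roll_deg=0.0):
    """Rotate an (n, 3) array of hypothetical head-model vertices about the origin."""
    pts = np.asarray(points, dtype=float)
    # Row vectors: (R @ p)^T == p^T @ R^T
    return pts @ rotation_matrix(yaw_deg, pitch_deg, roll_deg).T
```

For example, the views of a hypothetical 24-frame spin could be generated with `[rotate_points(vertices, yaw_deg=a) for a in range(0, 360, 15)]`, where `vertices` is a model's vertex array.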

Interestingly, we found that the device used for viewing the stimuli had a significant influence on the ratings. Raters using mobile phones gave, on average, higher attractiveness ratings than users of laptop or desktop computers. To the best of our knowledge, no previous study has investigated the influence of the viewing device on ratings. Although the screen size and resolution of mobile phones are increasing, the screen size of handheld devices limits the size at which images can be viewed. This reduces the amount of detailed visual information available to the observer, potentially limiting the visibility of cues that may affect some aspects of social perception (such as attractiveness). This may result in higher ratings than those based on images viewed on larger screens, which show more detail. For instance, several studies have reported that more homogeneous skin is perceived as more attractive (Jones et al., 2004; Fink et al., 2006; Tsankova and Kappas, 2016; Jaeger et al., 2018; Tan et al., 2018). It is thus possible that the lower visibility of such skin imperfections on mobile phones leads to higher scores. This is a potentially important issue, since more and more researchers opt for online data collection. Researchers ought to take into consideration the kind of device raters choose for viewing and rating, because if a specific subgroup of raters systematically chooses a particular kind of device, it could bias the results. In our case, mobile phones were used by nearly one quarter of raters. The results we report are correlational, and we merely assume that the differences in ratings were influenced by the kind of device raters used. In theory, a particular group of raters could have simultaneously tended to give higher ratings and used mobile phones for viewing and rating. It is likely, however, that the two phenomena are independent of one another, because we found no differences between rater groups in other characteristics.

A potential limitation of our study is that we used a rather specific sample of targets, namely MMA athletes, which might limit the generalizability of our results. One could expect MMA fighters to be perceived as highly formidable opponents and as rather specific in appearance ("cauliflower" ears, broken noses, eyebrow scars, etc.), which is why their ratings of formidability and attractiveness might be less variable and/or skewed. In our study, however, raters were not explicitly told that the targets presented to them were MMA fighters. We found that the mean formidability rating for all three stimulus types was ≈ 4 on a 7-point scale (range 2 to 6.2), the skewness for all three stimulus types was between 0.097 and 0.189 (**Table 2**), and the data followed a normal distribution. For attractiveness, mean ratings for all three stimulus types were between 2.8 and 2.93 (range 1.7 to 4.8) and skewness was between 0.111 and 0.578 (**Table 2**), hence comparable to average ratings of male facial attractiveness in other studies (e.g., Saribay et al., 2018). It thus seems that the specific nature of our sample does not impede the generalization of our findings. Nevertheless, future studies based on less specific samples should investigate this issue further.
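Skewness values like those cited above can be computed from per-target mean ratings. Which estimator was used is not stated; the sketch below assumes the adjusted Fisher-Pearson coefficient (G1), a common default in statistics packages, and it requires at least three values that are not all identical. The example ratings are hypothetical.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (G1); assumed estimator, needs len(values) >= 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # population (biased) skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1   # small-sample correction


# Hypothetical per-target mean ratings on a 7-point scale
print(sample_skewness([3.1, 3.8, 4.0, 4.2, 4.4, 5.0, 6.2]))
```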

To conclude, the findings presented here, along with other recent studies, provide converging evidence that single- and multiple-view facial images convey highly overlapping information, and that a single view contains enough information about the spatial structure of a face to assess formidability and attractiveness congruently, at least in the case of male faces. These results also suggest that studies using different types of stimulus depiction are, generally speaking, comparable; this ought to simplify the interpretation of individual studies.

#### REFERENCES

### AUTHOR CONTRIBUTIONS

VT, JF, and JH developed the study concept. KK contributed to the study design. Data collection was performed by VT, JF, DS, and ZŠ. VT performed data analysis and interpretation, VT and JF drafted the manuscript, and DS, ZŠ, KK, and JH provided critical revisions. All authors approved the final version of the manuscript for submission.

## FUNDING

This research was supported by the Czech Science Foundation (GAČR P407/16/03899S), Charles University Research Centre UNCE 204056, and the Sustainability for the National Institute of Mental Health project, grant number LO1611, with financial support from the Ministry of Education, Youth, and Sports of the Czech Republic under the NPU I program.

## ACKNOWLEDGMENTS

We thank Klára Coufalová, Radim Pavelka, Tereza Nevolová, Žaneta Slámová, Pavel Šebesta, Dagmar Schwambergová, and other members of the Human Ethology group (www.etologiecloveka.cz) for their help with data collection, the Mixed Martial Arts Association Czech Republic (MMAA), Anna Pilátová for English proofreading, and all volunteers for their participation.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02405/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Třebický, Fialová, Stella, Štěrbová, Kleisner and Havlíček. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Judging Others by Your Own Standards: Attractiveness of Primate Faces as Seen by Human Respondents

Silvie Rádlová, Eva Landová and Daniel Frynta\*

RP3 Applied Neurosciences and Brain Imaging, National Institute of Mental Health, Klecany, Czechia

Various aspects of facial attractiveness have been widely studied, especially within evolutionary psychology, which proposes that aesthetic judgments of human faces are shaped by biologically based standards of beauty reflecting mate quality. However, the faces of primates, which are very similar to ours yet still considered nonhuman, remain neglected. In this paper, we studied the facial attractiveness of non-human primates as judged by human respondents. We asked 286 Czech respondents to score photos of 107 primate species according to their perceived "beauty." We then analyzed factors affecting the scores, including morphology, colors, and human-likeness. We found that each of the three main primate groups was scored using different cues. The proportions of inner facial features and distinctiveness are cues widely reported to affect human facial attractiveness. Interestingly, these factors also affected the attractiveness scores of primate faces, but only within the Catarrhines, i.e., the primate group most similar to humans. Within this group, human-likeness positively affected the attractiveness scores, and facial extremities such as a prolonged nose or exaggerated cheeks were considered the least attractive. By contrast, the least human-like prosimians were scored as the most attractive group. The results are discussed in the context of the "uncanny valley," a widely discussed empirical rule.

#### Edited by:

Ian Stephen, Macquarie University, Australia

#### Reviewed by:

Justin Kyle Mogilski, University of South Carolina Salkehatchie, United States

Ferenc Kocsor, University of Pécs, Hungary

> \*Correspondence: Daniel Frynta frynta@centrum.cz

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 07 August 2018 Accepted: 19 November 2018 Published: 11 December 2018

#### Citation:

Rádlová S, Landová E and Frynta D (2018) Judging Others by Your Own Standards: Attractiveness of Primate Faces as Seen by Human Respondents. Front. Psychol. 9:2439. doi: 10.3389/fpsyg.2018.02439

Keywords: primates, facial attractiveness, visual perception, human preferences, uncanny valley, colors, visual cues

## INTRODUCTION

Faces play a key role in the identification of other individuals, which is one of the most important skills needed in the social communication of primates (Pascalis and Bachevalier, 1998; Santana et al., 2012). Humans can read emotional expressions from faces and gain quick insight into the immediate mood of others (Ekman and Friesen, 1986; Fridlund, 1994; Russell, 1994; Calvo and Nummenmaa, 2016). Facial cues also carry information about an individual's social role (age, sex, and race; for a review, see Yovel, 2016) or personality, such as dominance (Jones et al., 2010), extraversion (Borkenau et al., 2009), trustworthiness (Stirrat and Perrett, 2010), intelligence (Zebrowitz et al., 2002), or emotional stability (Penton-Voak et al., 2006).

Recognition of individual faces is so important that during evolution we acquired a complex neural system specialized for this very function (Haxby et al., 2000; Zhao et al., 2018). As a result, we are able to holistically distinguish faces that differ only in subtle shifts in the position of inner facial features, i.e., the eyes, nose, and mouth (Maurer et al., 2002). We also use this recognition ability when evaluating facial attractiveness: the evaluation is very sensitive, as a minimal deviation from averageness or a subtle distinctiveness can make a face appear unattractive or attractive. However, the more a face differs from our own race and species, the weaker this ability becomes. Using configural processing, humans process same-race, conspecific faces more successfully than faces of other races and species (Tanaka et al., 2004; Michel et al., 2006; Ge et al., 2009; Taubert, 2009), and the same applies to non-human primates: according to Gothard et al. (2009), macaques use configural processing when identifying the faces of conspecifics, but switch to a feature-based mode of analysis when processing pictures of human faces. How humans judge the attractiveness of the faces of other species is thus a very interesting question. In this respect, primates are an ideal group to study: they include the species phylogenetically closest to humans, with very human-like faces, but also less similar species such as the prosimians. Is it possible that human respondents see some primates as human caricatures and evaluate their facial attractiveness using the same facial cues they use when evaluating the facial beauty of conspecifics?

The majority of papers study facial beauty in relation to sexual attractiveness (for reviews, see Thornhill and Gangestad, 1999; Fink and Penton-Voak, 2002; Kościński, 2007). In this context, the best predictors of facial attractiveness are averageness (Jones and Hill, 1993; Rhodes and Tremewan, 1996; Komori et al., 2009; Zhang et al., 2011), symmetry (Grammer and Thornhill, 1994; Perrett et al., 1999; Scheib et al., 1999), sexual dimorphism (Perrett et al., 1998; Valenzano et al., 2006), smoothness of skin texture and color (Fink et al., 2001, 2006, 2012; Jones et al., 2004; Matts et al., 2007), and the absence of visible defects such as scars (Rankin and Borah, 2003) or congenital facial clefts (Tobiasen, 1987).

In studies of human facial attractiveness, many of the preferred facial traits vary with domain specificity, i.e., different features may be preferred when considering the facial attractiveness of short-term sexual partners and long-term romantic partners (DeBruine, 2005), competitors (Fisher, 2004), etc. Although the true nature of the domain specificity behind the ranking of primate facial "beauty" is unknown (primates may be seen as rivals or cooperators, or may induce care-taking motivation), respondents are unlikely to evaluate primates as potential romantic partners. However, recognizing attractive and unattractive human facial features is strongly tied to the identification of healthy and fertile mates and to increasing one's fitness (e.g., Thornhill and Gangestad, 1999; Fink and Penton-Voak, 2002; Little et al., 2011). One can thus imagine that selection pressure has refined a fast and precise ability to assess the attractiveness of conspecific faces. The question of interest is whether the cues used to recognize attractive human faces remain the same when assessing the attractiveness of non-human but similar faces. To answer this question, we examined various factors often described as important cues in the evaluation of human faces, and analyzed their effect on the human-perceived beauty of primate faces.

Instead of experimenting with subtle facial details using computer manipulations, we chose to examine the true facial variability of extant primate species, which is enormous. Some primates are more similar to humans than others, and their facial features (the same features that matter for human facial attractiveness, i.e., eyes, nose, mouth, etc.) are often exaggerated to extremes that may be perceived as human caricatures. And while caricatures (i.e., faces with high distinctiveness; Deffenbacher et al., 1998) may help in recognizing individuals (Rhodes et al., 1987; Mauro and Kubovy, 1992), it is average human faces that are considered attractive (e.g., Rhodes et al., 2001; Trujillo et al., 2014). Distinctiveness is only seen as attractive when it is composed of highly attractive features, such as higher cheekbones, a thinner jaw, larger eyes, and shorter distances between mouth and chin and between nose and mouth (Perrett et al., 1994).

Sexual dimorphism, i.e., masculinity and femininity, also plays an important role in the perception of human facial attractiveness. For example, Little et al. (2002) found that women showed a stronger preference for masculinity in male faces when judging for short-term relationships than when judging for long-term relationships. Studying the effect of sexual dimorphism in a non-sexual context, Little et al. (2008) found that both women and men preferred more feminine female faces and more masculine male faces, though the preferences were stronger in women than in men. Other papers (Perrett et al., 1998; Rhodes et al., 2000) report that both women and men preferred more feminine faces, regardless of the sex of the face. Most papers agree that male and female respondents do not differ in the direction of the preferred sexual dimorphism, only in its degree. However, masculine male and female faces are perceived as dominant by respondents of both sexes (Perrett et al., 1998). The features that make faces look masculine or feminine are very specific. For example, larger jawbones, more prominent cheekbones, and thinner cheeks are all features of human male faces that differentiate them from female faces (Little and Hancock, 2002). However, the particular features may differ from species to species; for example, the masculine features of a Mandrill include an elongated jaw and bright colors (Dixson, 2012). Thus, this variable is not fully comparable when studying facial attractiveness across all primates.

Beyond human facial attractiveness, a great deal is known about the human-rated attractiveness of animals (e.g., Frynta et al., 2009; Marešová et al., 2009; Landová et al., 2012, 2018; Frynta et al., 2013, 2014; Lišková and Frynta, 2013; Lišková et al., 2015). Specific features, such as overall shape, body size, achromatic components including pattern, surface (skin/feather/fur) texture and coloration, taxonomic classification, and human-likeness, determine whether an animal will be preferred or neglected. As the full range of primate faces includes faces that are more and less similar to humans, a mix of factors known to affect the attractiveness of both human faces and animals may play a role. Thus, many of these factors were included in the analysis.

In short, the purpose of this study is to examine the human-perceived attractiveness (i.e., positive affinity toward an object) of primate faces. To our knowledge, this is the first study to focus on the full variability of primate faces across all taxonomic groups. Previous studies have focused on the facial attractiveness of humans or of animals that are not closely related to humans (e.g., dogs and cats, Archer and Monton, 2011; Hecht and Horowitz, 2015; foxes, Elia, 2013). With the broad focus of this study, we aim to gain insight into the human perception of primates, including the subjective recognition of the human-animal boundary.

In this mainly exploratory study, we focused on two questions: (1) Which factors determine primate facial attractiveness (or beauty) as rated by human respondents? (2) Do these determining factors differ among primate groups? There are three main groups of primates: the prosimians, which are phylogenetically least related to us, the New World monkeys (Platyrrhini), and their sister taxon Catarrhini, which includes Old World monkeys, gibbons, great apes, and humans. Is the beauty of each group rated using different cues? To answer these questions, we analyzed the effects of morphology, sexual size dimorphism (SSD), pattern, human-likeness, and colors, and we discuss the findings in terms of what is known about both human facial attractiveness and the beauty of animals.

## MATERIALS AND METHODS

#### Selection of Species

There are about 376 known extant primate species (Wilson and Reeder, 2005), covering a wide range of morphological variability. For the purpose of this study, we aimed to choose stimuli that would cover as much of this variability as possible. Thus, we included at least one species from each genus, except for Phaner (Fork-marked Lemur), Procolobus (Olive Colobus), Pseudopotto (False Potto), and Simias (Pig-tailed Langur), for which no acceptable photographs were available at the time of stimulus preparation. We also purposely excluded humans, as we did not want to prompt respondents to rank the primate faces in the context of human facial attractiveness. Within genera comprising similar species, particular species were selected based on the availability of acceptable photographs. Where morphological variability within a genus was high, we included more species (two to eight; e.g., Cercopithecus, Eulemur, Macaca, Saguinus, etc.). The East Javan Langur (Trachypithecus auratus) was included in both its black and orange forms. For sexually dimorphic species, only males were included. There is a trade-off between including both sexes and taxonomic coverage, as the number of stimuli needs to be limited so that respondents stay interested and give reliable rankings. Ten species were represented by two different individuals as a control. The random factors were set in a nested hierarchy. The variance of beauty rankings among individuals of the same species was negligible compared to that between species (VarCorr function in R; variance nested in Group (infraorder/superfamily/family): 0.06898805; Genus: 0.10402177; Species: 0.24383308; Residual = individual: 0.05087559). We then assessed correlations between conspecific individuals. Spearman's correlations for all factors (colors, facial measurements, rankings) were high and significant at the p < 0.05 level, except for mouth width and chin (beauty: r = 0.73; human-likeness: r = 0.95).
The dataset contained 117 pictures in total (107 after the removal of the control species/individuals, which were excluded from the analyses to avoid pseudo-replication). For the full list of included species, see **Supplementary Appendix 1**.
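The claim that within-species variance is negligible can be checked with simple arithmetic on the variance components reported above, by expressing each as a share of their sum. This is only a descriptive re-expression of the reported numbers, not a re-analysis; the level labels are copied from the text.

```python
# Variance components as reported above (VarCorr output, nested random effects)
components = {
    "group (infraorder/superfamily/family)": 0.06898805,
    "genus": 0.10402177,
    "species": 0.24383308,
    "residual (individual)": 0.05087559,
}

total = sum(components.values())
# Share of the total variance attributable to each level of the hierarchy
shares = {level: value / total for level, value in components.items()}

for level, share in shares.items():
    print(f"{level:40s} {share:6.1%}")
```

On these figures, the residual (individual) component accounts for roughly 11% of the total, compared with about 52% for species.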

#### Preparation of the Stimuli

We collected good-quality photographs of primates facing the camera. The main sources were Flickr<sup>1</sup> and Wikimedia Commons<sup>2</sup>, licensed under Creative Commons licenses. Supplementary sources were our own photos, photos provided by authors we contacted, and books (Rowe, 1996; Mittermeier et al., 2010). For the full list of picture sources, see **Supplementary Appendix 1**.

Each photograph was modified to show the primate face in a standardized position: the background was cut off and set to white, and the face (in the form of a bust, see **Figure 1**) was rotated so that the eyes lay on a straight, notional horizontal line. The primate faces were size-adjusted to cover approximately the same area in each image. When a primate showed an emotional expression (e.g., a smile or a frown) or looked sideways, the photo was retouched so that the face showed a neutral expression with the eyes looking straight at the camera (see **Figure 2**). Because the photos could not be standardized to exactly the same angle, the primates in the pictures differed slightly in the degree of rotation about the vertical and horizontal axes. Thus, we could not test the effect of symmetry on human rankings of primate facial "beauty," as measured symmetry clearly corresponded to the rotation of the faces. This rotation had no effect on any of the explained variables (none of the Spearman's correlations were significant at the p < 0.05 level) and was thus excluded from further analyses.
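The rotation step described above (turning each image so that the eyes lie on a horizontal line) amounts to measuring the angle of the line through the two eye centers and rotating by its negative. A minimal sketch, assuming the eye centers are available as pixel coordinates; the function names and coordinates are hypothetical, and applying the same angle to the whole image would require an image library, which is not shown.

```python
import math


def eye_alignment_angle(eye_a, eye_b):
    """Rotation angle in degrees that makes the line from eye_a to eye_b horizontal."""
    dx = eye_b[0] - eye_a[0]
    dy = eye_b[1] - eye_a[1]
    return -math.degrees(math.atan2(dy, dx))


def rotate_point(point, center, angle_deg):
    """Rotate a 2D point about `center` (counter-clockwise when the y axis points up)."""
    a = math.radians(angle_deg)
    x, y = point[0] - center[0], point[1] - center[1]
    return (center[0] + x * math.cos(a) - y * math.sin(a),
            center[1] + x * math.sin(a) + y * math.cos(a))


def level_eyes(eye_a, eye_b):
    """Rotate both eye centers about their midpoint so that they share one y coordinate.

    Also valid in image coordinates (y pointing down), because the angle is
    measured and applied in the same frame.
    """
    angle = eye_alignment_angle(eye_a, eye_b)
    mid = ((eye_a[0] + eye_b[0]) / 2, (eye_a[1] + eye_b[1]) / 2)
    return rotate_point(eye_a, mid, angle), rotate_point(eye_b, mid, angle), angle


# Hypothetical eye centers (pixels) on a slightly tilted photo
left, right, angle = level_eyes((210.0, 300.0), (330.0, 318.0))
print(round(angle, 2), left, right)
```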

#### Definition of the Groups

The recognized taxonomy of Primates consists of seven distinct groups: Lorisoidea (African and Asian prosimians), Lemuroidea (Madagascar prosimians), the Tarsiers, Platyrrhini (New World primates), Cercopithecoidea (Old World monkeys), hylobatids (gibbons), great apes, and humans (Purvis, 1995; Yoder and Irwin, 1999; Pastorini et al., 2001; Geissmann et al., 2004; Mayor et al., 2004; Lei, 2008; Finstermeier et al., 2013). To identify groups of reasonable morphological variability suitable for analyzing the human-rated facial attractiveness of primates, we performed a canonical variate analysis (CVA) using the geometric morphometric data (see below, Section "Shape"). The CVA separated the primates into three distinct groups (see **Figure 3**), referred to in this paper as prosimians, Platyrrhini, and Catarrhini. The analysis also confirmed the morphological distinctiveness of humans compared with other primates.

<sup>1</sup>https://www.flickr.com/

<sup>2</sup>https://commons.wikimedia.org

FIGURE 1 | Examples of the stimuli rated by the respondents. The depicted primates are, from upper left to upper right: Pygmy Marmoset (Cebuella pygmaea), Philippine Tarsier (Tarsius syrichta), and Goeldi's Monkey (Callimico goeldii); from lower left to lower right: Red Slender Loris (Loris tardigradus), Black Lemur (Eulemur macaco), and Northern Talapoin Monkey (Miopithecus ogouensis).

FIGURE 2 | Example of the standardization/modification of the stimuli. (a) The original, unaltered picture of a Gelada (Theropithecus gelada). (b) The photo modified for use as a stimulus: the background was cut out, the head was rotated to a straight vertical position, and the mouth was closed. Photo © Alan Hill, used with permission.

#### Testing Human Preferences

Preferences for each of the primate faces were assessed using an online survey following Frynta et al. (2010) and Lišková and Frynta (2013). The respondents (n = 286, 91 men and 199 women) were Czech citizens aged 15–69 years (mean age 22.7 years). Their task was to rate each face according to its perceived "beauty" on a 7-point Likert scale (1 = the most "beautiful," 7 = the least "beautiful" or "ugly"). The photographs, resized to 360 × 540 pixels, were presented one by one on a computer screen in random order. Before the stimuli were presented, respondents could see the whole range of stimuli as thumbnail-sized preview pictures (160 × 240 pixels). After that, respondents started scoring the pictures. The whole set was divided into groups of 39 photos, and respondents were allowed to rest after each group, although the majority finished the scoring without needing a break. In total, all 286 respondents rated all 117 pictures.

#### Explanatory Variables

#### Shape

In our study, we aimed to cover the whole range of facial variability, including the length of facial hair and the size of the forehead of animal (primate) faces. However, the landmarks usually used in human facial studies either include only the shape and position of the eyes, nose, mouth, and chin (e.g., Mitteroecker et al., 2013), or include landmarks that are not applicable to a frontal view of primate faces (e.g., Sforza et al., 2007). Thus, we adopted the landmarks of Borgi et al. (2014), who had already defined landmarks for animal faces that, with a few modifications, could easily fit our experimental stimuli (see **Figure 4**): (A) top of the head, (B) right side of the face, (C) left side of the face, (D) end of chin, (E1, G1) outer sides of right and left eyes, respectively, (E2, G2) inner sides of right and left eyes, respectively, (F) middle point of the reference cross, (H) right side of the nose, (I) left side of the nose, (J) tip of the nose, (K) left end of the mouth, (L) middle point of the mouth crossing the reference line, (M) right end of the mouth, (N) top point of head hair, (O1, O2) right and left tips of side hair, (P) tip of the chin hair (beard). Five human facial measurements were added (photos were selected randomly from the FEI Face Database; Thomaz, 2012), and these data were then used to perform the CVA (see above in Section "Definition of the Groups").

The landmarks were then converted to traditional morphometric variables: face height (the A–D distance measured in pixels) and width (B–C), forehead height (A–F), eye size (averaged E1–E2 and G1–G2 distances), nose length (F–I) and width (H–J), mouth width (K–M), side-hair (averaged O1–B and C–O2), top-hair (N–A), beard (D–P), interocular distance (E2–G2), eyes-to-mouth distance (F–L), philtrum (nose-to-mouth distance, I–L), and chin (L–D). All analyses and data transformations involving the landmarks were done using the IMP software series (Zelditch et al., 2012).
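The conversion from landmarks to traditional morphometric variables is a set of Euclidean distances between landmark pairs. The sketch below computes a subset of the measures listed above from hypothetical pixel coordinates; the coordinate values and function name are invented for illustration, and the original measurements were made with the IMP software rather than with this code.

```python
import math

# Hypothetical landmark coordinates (x, y) in pixels for one stimulus
landmarks = {
    "A": (250, 40), "B": (120, 260), "C": (380, 260), "D": (250, 470),
    "E1": (150, 200), "E2": (210, 200), "F": (250, 200),
    "G1": (350, 200), "G2": (290, 200),
    "K": (200, 380), "L": (250, 380), "M": (300, 380),
}


def facial_measurements(lm):
    """A subset of the distance-based variables listed in the text (pixel units)."""

    def d(p, q):
        # Euclidean distance between two named landmarks
        return math.dist(lm[p], lm[q])

    return {
        "face_height": d("A", "D"),
        "face_width": d("B", "C"),
        "forehead_height": d("A", "F"),
        "eye_size": (d("E1", "E2") + d("G1", "G2")) / 2,  # averaged over both eyes
        "interocular": d("E2", "G2"),
        "mouth_width": d("K", "M"),
        "eyes_to_mouth": d("F", "L"),
        "chin": d("L", "D"),
    }


print(facial_measurements(landmarks))
```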

We then extracted maximum likelihood factors from these traits (varimax normalized) to reduce the number of morphological variables for the GLM/GLS analyses and, especially, to eliminate mutually correlated variables. The first extracted factor, accounting for 20.5% of the variation, was interpreted as "outer facial features" (the height of the face and forehead on one side of the axis, and the length of the beard and top-hair and the width of the side-hair on the other), while the second (22.5%) corresponded to "inner facial features" (mainly the distance of the eyes and nose from the mouth on one side of the axis, and the size of the eyes and the distance between them on the other; for factor loadings, see **Figure 5**).
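The varimax rotation step can be sketched with the widely used SVD-based iteration. We read "varimax normalized" as varimax with Kaiser normalization (rows scaled to unit communality before rotation and rescaled afterwards), which is an assumption; the maximum likelihood extraction that would produce the unrotated loadings is not shown, and the loading matrix in the example is random.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6, normalize=True):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix.

    With normalize=True, rows are scaled to unit length (Kaiser normalization)
    before rotation and rescaled afterwards; rows must not be all zero.
    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    A = np.asarray(loadings, dtype=float)
    p, k = A.shape
    h = np.sqrt((A ** 2).sum(axis=1, keepdims=True)) if normalize else np.ones((p, 1))
    X = A / h
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        Lam = X @ R
        # Matrix whose orthogonal (polar) factor gives the next rotation; gamma = 1 is varimax
        target = X.T @ (Lam ** 3 - (gamma / p) * Lam @ np.diag((Lam ** 2).sum(axis=0)))
        u, s, vt = np.linalg.svd(target)
        R = u @ vt
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return (X @ R) * h, R


# Random loadings for 14 hypothetical morphometric variables on 2 factors
rng = np.random.default_rng(0)
unrotated = rng.uniform(-0.9, 0.9, size=(14, 2))
rotated, R = varimax(unrotated)
```

Because the rotation is orthogonal, each variable's communality is unchanged; only the distribution of loadings across the factors changes.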

#### Colors

To examine the effect of colors on the respondents' ranking, we used the software Barvocuc (Rádlová et al., 2016) to extract specific information about the hue, lightness, and saturation of each stimulus picture converted to the HSL color space. For a detailed description of the Barvocuc software, see Lišková and Frynta (2013) and Lišková et al. (2015). The variation in color is much smaller among primates than among other animals, such as birds. Thus, the picture set included only the following colors, which we predefined in the software to describe the primate faces as accurately as possible: red (corresponding to the reddish brown in the primate photos) [350°, 18°), orange [18°, 45°), yellow (corresponding to yellowish brown) [45°, 75°), and bluish tint [170°, 270°). The variability of blue color was too low in the dataset: only two primates possessed truly blue facial parts, the Mandrill (Mandrillus sphinx) and the Golden Snub-nosed Monkey (Rhinopithecus roxellana). However, blue was present in small amounts on several photographs in the form of a bluish tint. Because blue plays a crucial role in determining human preferences toward many groups of animals (Frynta et al., 2010; Lišková and Frynta, 2013; Lišková et al., 2015; Ptáčková et al., 2017), we decided to include the "bluish tint" (blue hue minus the facial parts of M. sphinx and R. roxellana) as an explanatory variable in the analysis.
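The hue bins above can be expressed as half-open intervals on the hue circle, with red wrapping around 0°. The sketch below classifies a single 8-bit RGB pixel using Python's standard `colorsys` module. It is not Barvocuc. The achromatic thresholds (black, white, gray) come from the next paragraph, and checking them before the hue bins is our assumption about precedence.

```python
import colorsys
from typing import Optional

# Hue intervals in degrees, [start, end); "red" wraps around 0/360
HUE_BINS = {
    "red": (350.0, 18.0),
    "orange": (18.0, 45.0),
    "yellow": (45.0, 75.0),
    "bluish_tint": (170.0, 270.0),
}


def in_interval(h: float, start: float, end: float) -> bool:
    """True if hue h (degrees) lies in [start, end), allowing wrap-around at 360."""
    if start <= end:
        return start <= h < end
    return h >= start or h < end


def classify_pixel(r: int, g: int, b: int) -> Optional[str]:
    """Return a colour label for an 8-bit RGB pixel, or None if no category applies."""
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)  # note: HLS order
    # achromatic categories are checked first (assumed precedence)
    if l < 0.20:
        return "black"
    if l > 0.71:
        return "white"
    if s < 0.15:
        return "gray"
    hue = h * 360.0
    for name, (start, end) in HUE_BINS.items():
        if in_interval(hue, start, end):
            return name
    return None
```

In this sketch, hues outside all bins (e.g., green) stay unassigned rather than being forced into the nearest category.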

The values for saturation (S) and lightness (L) covered the interval 0–1. We defined three additional colors: black (L < 0.20), white (L > 0.71), and gray (S < 0.15). The white background of the stimuli was set to transparent and thus excluded from the calculation. To improve normality, the proportion of colored pixels in the tested pictures was arcsine square-root transformed prior to the analyses. We also included "pattern," computed using an edge detection method (Sobel, 1978), as an explanatory variable in the analyses. The highest values of the pattern variable corresponded to the agouti coloration of some of the primates.
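Two steps in this paragraph are easy to state in code: the arcsine square-root transform of a pixel proportion and an edge-based pattern score. In the sketch below, the pattern score is assumed to be the mean Sobel gradient magnitude over the interior pixels of a lightness grid. How Barvocuc actually summarizes the edge map is not described here. This simplified version also does not mask transparent background pixels.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def arcsine_sqrt(p: float) -> float:
    """Angular transform of a proportion p in [0, 1]: asin(sqrt(p)), in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))


def sobel_pattern(img) -> float:
    """Mean Sobel gradient magnitude over interior pixels of a 2-D lightness grid."""
    rows, cols = len(img), len(img[0])
    total, n = 0.0, 0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            gx = gy = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    v = img[i + di][j + dj]
                    gx += SOBEL_X[di + 1][dj + 1] * v
                    gy += SOBEL_Y[di + 1][dj + 1] * v
            total += math.hypot(gx, gy)
            n += 1
    return total / n if n else 0.0
```

A uniform patch scores 0. Fine-grained unevenness such as agouti fur produces many small edges, which can raise the mean score even without large contrasting patches.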

#### Sexual Size Dimorphism

Sexual dimorphism, as studied in the context of human facial attractiveness, usually refers to sexual shape dimorphism. This variable, however, is hardly comparable to primate facial sexual shape dimorphism, because each face represents a different species, and the particular features distinguishing males and females may differ for each species. These features may include various characteristics such as conspicuous cheeks, enlarged noses, colorful elongated snouts, etc. These masculine features are not directly comparable to those of human males, which are defined by, e.g., subtle changes in jawbone size or more prominent cheekbones, as mentioned above (Little and Hancock, 2002). However, it is possible to use a related species characteristic that is available from published sources: sexual dimorphism in body size. Sexual selection, alongside the increase in male body size, promotes the emergence of novel conspicuous traits, including those visible on primate faces. Thus, the larger the size difference between the sexes, the more pronounced the distinctive facial features. For example, dimorphism in canine size increases with SSD in primates (Leutenegger and Kelly, 1977; Kay et al., 1988) and thus modifies the shape of the primate mouth, as bigger canines require more elongated jaws (Weston et al., 2004). Adult males of sexually dimorphic (male-larger) species display red and blue sexual skin (e.g., the Mandrill), capes of hair, and facial adornments (e.g., the Bald Uacari, Proboscis Monkey, Golden Snub-nosed Monkey, or the orangutans; Dixson, 2012). In this paper, we used this variable to indirectly examine the effect of conspicuous traits on the human evaluation of primate facial attractiveness.

Sexual size dimorphism was expressed using the Lovich and Gibbons ratio (LG ratio; Lovich and Gibbons, 1992), which yields a measure of sexual dimorphism that is continuous around 0. The values were computed as (body weight of the larger sex / body weight of the smaller sex) − 1. By convention, the value is negative when males are the larger sex and positive when females are larger than males. LG ratios in the primate set ranged from −1.371 in the Western Gorilla (Gorilla gorilla) to 0.313 in the Lemurine Night Monkey (Aotus lemurinus). The body weights were taken from Lindenfors and Tullberg (1998), Gordon (2006), and Mitteroecker et al. (2013).
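The LG ratio described above is a one-line computation. The sketch below implements it with the stated sign convention (negative when males are larger). The function name `lg_ratio` is ours.

```python
def lg_ratio(male_mass: float, female_mass: float) -> float:
    """Lovich-Gibbons sexual size dimorphism index.

    (larger / smaller) - 1, made negative by convention when males are larger.
    """
    if male_mass <= 0 or female_mass <= 0:
        raise ValueError("body masses must be positive")
    if male_mass >= female_mass:
        return -(male_mass / female_mass - 1.0)
    return female_mass / male_mass - 1.0
```

Under this definition, the Western Gorilla's LG of −1.371 corresponds to males being roughly 2.37 times heavier than females.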

#### Human-Likeness

Sixty respondents (different from those evaluating attractiveness) repeated the procedure described above (Section "Testing Human Preferences") to rate the primates' human-likeness (1–7 Likert scale; 1 = most human-like, 7 = not human-like at all). Agreement among the respondents on the human-likeness of the primates was exceptionally high. The intra-class correlation (ICC, see later in the text), assessed using a two-way consistency measure, was in the excellent range: ICC = 0.986 for average measures and 0.553 for single measures (Hallgren, 2012). The great apes are known to be the closest phylogenetic relatives of humans. To ensure that this knowledge did not distort the overall agreement, we also computed the ICC for the data excluding the Hominoidea (great apes and gibbons): ICC was 0.983 for average measures and 0.5 for single measures. These analyses show that the respondents agreed well on the human-likeness of the particular primate groups/species and that their ratings were not driven only by the most human-like apes. Multivariate analysis of variance revealed no effect of gender, age, or their interaction. Thus, we pooled the dataset and used the mean values in the analyses as a reliable estimate of the human-likeness of the rated primate species.
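The single- and average-measure ICCs reported above can be computed from two-way ANOVA mean squares. The sketch below follows the McGraw and Wong (1996) consistency formulas for a matrix with species as rows and respondents as columns. It is an illustration, not the `irr` package code used in the study.

```python
import numpy as np


def icc_consistency(ratings):
    """Two-way consistency ICC for an (n subjects x k raters) matrix.

    Returns (single-measure ICC(C,1), average-measure ICC(C,k)).
    """
    X = np.asarray(ratings, dtype=float)
    n, k = X.shape
    grand = X.mean()
    ss_rows = k * ((X.mean(axis=1) - grand) ** 2).sum()    # between subjects
    ss_cols = n * ((X.mean(axis=0) - grand) ** 2).sum()    # between raters
    ss_err = ((X - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    single = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    average = (ms_rows - ms_err) / ms_rows
    return single, average
```

The two values are linked by the Spearman–Brown formula, ICC(C,k) = k·ICC(C,1) / (1 + (k − 1)·ICC(C,1)). With k = 60 and a single-measure ICC of 0.553, this gives about 0.987, which matches the reported 0.986 once rounding of the single-measure value is taken into account.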

#### Statistical Analyses

To quantify and test congruence in the species rankings provided by different respondents, we adopted a two-way, consistency, average-measures intra-class correlation (ICC; McGraw and Wong, 1996; Hallgren, 2012) computed in R (irr package). Principal component analysis (PCA) was performed to visualize the multivariate structure of the data sets and to extract uncorrelated axes for further analyses. MANOVA and general linear models (LMs) were applied to test the effects of independent explanatory variables. Full LMs were reduced according to the Akaike criterion until likelihood-ratio tests revealed a significant difference between the full and reduced models. The Mann–Whitney test was used as a non-parametric alternative for variables deviating from normality (raw scores). The contribution of the explanatory variables (constraints) to the attractiveness ratings of the primate faces was examined by redundancy analysis (RDA) as implemented in the R package vegan (Oksanen et al., 2017). RDA is a multivariate direct gradient method: it extracts and summarizes the variation in a set of response variables (subjective evaluations of primate beauty) that can be explained by a set of explanatory variables (see Section "Explanatory Variables"). This analysis allows both response and explanatory variables to be plotted in a space defined by the extracted gradients, and it enables detection of redundancy (i.e., shared variability) between the sets of response and explanatory variables. Statistical significance of the gradients was confirmed by permutation tests. Most of the calculations were performed in R (R Development Core Team, 2010) and Statistica 9.1 (StatSoft, 2010).
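The core of RDA, as described above, is a multivariate regression of the responses on the explanatory variables, followed by a PCA of the fitted values. The NumPy sketch below shows only that core. It is not vegan's implementation, and its site scores are unscaled, so they differ from vegan's default scaling. In this sketch, the number of constrained axes is at most the number of explanatory variables.

```python
import numpy as np


def rda(Y, X):
    """Minimal redundancy analysis (RDA) sketch.

    Y: (n, p) responses, e.g. species x respondents' scores.
    X: (n, q) explanatory variables.
    Returns constrained eigenvalues, the proportion of total variance they
    explain, and unscaled site scores on the constrained axes.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)  # centre responses
    Xc = X - X.mean(axis=0)  # centre predictors
    # step 1: multivariate multiple regression of Y on X
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ B
    # step 2: PCA (via SVD) of the fitted values gives the constrained axes
    U, s, _ = np.linalg.svd(fitted, full_matrices=False)
    rank = np.linalg.matrix_rank(fitted)
    eig = s[:rank] ** 2 / (n - 1)
    total = (Yc ** 2).sum() / (n - 1)
    scores = U[:, :rank] * s[:rank]
    return eig, eig.sum() / total, scores
```

The returned proportion corresponds to the "explained variability" of the constrained axes. For example, with six retained explanatory variables, the sketch yields at most six axes.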

## Ethics Statement

This study was carried out in accordance with the recommendations of the Institutional Review Board (IRB), Faculty of Sciences, Charles University in Prague, approval no. 2013/7, with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the IRB.

## RESULTS

## Agreement Among the Respondents

Results of the ranking procedure revealed considerable congruence among the respondents. Although the reliability of the individual rankings was quite low (ICC = 0.147, 0.204, and 0.182 for men, women, and pooled data, respectively, all p < 0.001), the average-measures ICC was in the excellent range: ICC = 0.940 for men, 0.981 for women, and 0.985 for the pooled data (Shrout and Fleiss, 1979; Cicchetti, 1994). These results indicate a high degree of agreement among the respondents, i.e., primate faces were rated similarly. The correlation between the ranks provided by male and female respondents was also very high: r<sup>2</sup> = 81.6%, p < 0.001. Multivariate analysis of variance revealed no effect of age (Wilks' λ = 0.6191, F<sub>171,109</sub> = 0.97, p = 0.575) or of the age × gender interaction (Wilks' λ = 0.5823, F<sub>171,109</sub> = 1.13, p = 0.2436). Nevertheless, a small but significant effect of gender was found (Wilks' λ = 0.5384, F<sub>171,109</sub> = 1.35, p = 0.041). To identify the species that contributed substantially to the gender differences, we performed Mann–Whitney U tests comparing the raw ranks of each species between male and female respondents; significance levels were Bonferroni-corrected. Men differed significantly from women in only five cases, all of which were preferred more by men than by women: de Brazza's Monkey (Cercopithecus neglectus), Patas Monkey (Erythrocebus patas), Humboldt's Woolly Monkey (Lagothrix lagotricha), Drill (Mandrillus leucophaeus), and the Sumatran Orangutan (Pongo abelii). Because the gender differences were small and involved only 5 of the 107 examined primate species, we pooled the genders in further analyses concerning the means or multivariate axes (RDA) computed from the preference ranks. Both of these methods extract the agreement among respondents and thus further dilute the minor effect of gender.
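The per-species gender comparison described above uses Mann–Whitney U tests with a Bonferroni correction across species. The sketch below computes U with mid-ranks for ties and a two-sided normal-approximation p-value. As a simplifying assumption, it applies no tie or continuity correction. Its p-values can therefore differ slightly from those of statistical packages that do apply these corrections.

```python
import math


def midranks(values):
    """1-based ranks, with tied values receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that remain significant at alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

With 107 species, the Bonferroni threshold at α = 0.05 is 0.05 / 107 ≈ 0.00047 per species.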

## The Attractiveness


The primates whose faces were rated as the most "beautiful" were mostly prosimians: the top winners were the Black-and-white Ruffed Lemur (Varecia variegata), Ring-tailed Lemur (Lemur catta), and Southern Lesser Galago (Galago moholi). Moreover, small monkeys such as the marmosets (Callitrichinae) were favorites among the respondents, together with apes such as the Agile Gibbon (Hylobates agilis) and Bonobo (Pan paniscus). In contrast, the Proboscis Monkey (Nasalis larvatus) and the Bald Uacari (Cacajao calvus) were rated as the "least beautiful" (or "ugly").

When viewed within the particular groups, all three groups included both attractive and unattractive species (see **Figure 6**). However, the prosimians were rated as significantly more attractive than the other groups (post-hoc Tukey test, p < 0.01). The particular cues affecting the respondents' decisions and their relation to the uncanny valley theory are discussed below in the respective sections.

#### RDA Analysis of the Factors Affecting Attractiveness

We employed RDA to examine the contribution of various explanatory variables to the ratings of primate facial attractiveness. We used the automatic model-building feature based both on the Akaike criterion (but with permutation tests) and on permutation p-values. Both methods agreed on including the following variables in the reduced model, which were then confirmed as significant by the sequential "Type I" test (n permutations = 10,000): Factor2 (i.e., inner facial parts; F<sub>1,100</sub> = 22.4244, p < 0.0001), human-likeness (F<sub>1,100</sub> = 6.0119, p < 0.0001), blue color (F<sub>1,100</sub> = 3.4310, p = 0.0011), Factor1 (i.e., outer facial parts; F<sub>1,100</sub> = 2.8690, p = 0.0050), LG (F<sub>1,100</sub> = 2.3811, p = 0.0125), and pattern (F<sub>1,100</sub> = 1.9238, p = 0.0400). The RDA model generated six constrained axes, which explained 28.08% of the total variability.
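The significance tests above are permutation-based. The sketch below is a simplified illustration that tests only the overall model. It shuffles the rows of the response matrix, recomputes a pseudo-F (explained vs. residual variance per degree of freedom), and counts how often the permuted value reaches the observed one. It is not vegan's sequential "Type I" per-term test, and the function names are ours.

```python
import numpy as np


def rda_pseudo_f(Y, X):
    """Pseudo-F for an RDA: explained vs residual variance per degree of freedom."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ B
    q = np.linalg.matrix_rank(Xc)
    n = Y.shape[0]
    ss_fit = (fitted ** 2).sum()
    ss_res = ((Yc - fitted) ** 2).sum()
    return (ss_fit / q) / (ss_res / (n - q - 1))


def permutation_test(Y, X, n_perm=999, seed=0):
    """Global permutation test: shuffle the rows of Y and recompute pseudo-F."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    f_obs = rda_pseudo_f(Y, X)
    exceed = sum(
        rda_pseudo_f(Y[rng.permutation(Y.shape[0])], X) >= f_obs
        for _ in range(n_perm)
    )
    # the observed statistic counts as one of the permutations
    p_value = (exceed + 1) / (n_perm + 1)
    return f_obs, p_value
```

Because the observed statistic is counted among the permutations, the smallest attainable p-value is 1 / (n_perm + 1). With 10,000 permutations, this floor is just under 0.0001.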

The visualization of the RDA results (see **Figure 7**; for better clarity, we multiplied human-likeness and LG by −1 so that higher values correspond to greater human-likeness and greater exaggeration of male facial parts) showed that Factor2, i.e., the inner facial parts, dominated the first multivariate axis (RDA1; correlation of RDA1 site scores with Factor2: r<sup>2</sup> = 72.2%, p < 0.0001). The most attractive species are located at the top of the graph and the least attractive at the bottom (second axis), so we can conclude that the RDA2 axis corresponds to the actual attractiveness of the species. The correlation of the mean attractiveness scores with the RDA2 site scores supports this: r<sup>2</sup> = 69.6%, p < 0.0001. The only factors associated with this attractiveness irrespective of the second axis (and thus of the primate grouping, which corresponds to this axis) are blue color (positive effect) and pattern (negative effect). The graph clearly shows that the grouping of the primates (based on real morphology) reflects the respondents' ratings of the species' beauty. In other words, the respondents' classification of primate facial beauty differs among the groups and is mainly based on the inner facial properties of the species. Both the extent of human-likeness and the extent of male sexual dimorphism (−LG) feed this second morphological axis.

#### GLM of the Factors Affecting Attractiveness

To examine which factors contribute to the variability of preference rankings, we fitted LMs (see **Table 1**). The initial full model of all the primates together (n = 107) included group, outer (Factor1) and inner (Factor2) facial features, LG, human-likeness, mean lightness, pattern, mean saturation, reddish brown, orange, yellowish brown, and bluish tint. After reduction using the Akaike Information Criterion (AIC; Akaike, 1998), the reduced model explained 34.4% of the variation in preference ranks (p < 0.0001). It included group (F<sub>2,100</sub> = 11.0290, p < 0.001), inner facial features (F<sub>1,100</sub> = 16.2674, p = 0.0001), LG (F<sub>1,100</sub> = 5.7186, p = 0.0187), human-likeness (F<sub>1,100</sub> = 8.0299, p = 0.0056), and bluish tint (F<sub>1,94</sub> = 9.4351, p = 0.0027).
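The AIC-based model reduction described above can be sketched as backward elimination. At each step, the predictor whose removal lowers AIC the most is dropped, and the procedure stops when no removal helps. The sketch below uses a Gaussian OLS AIC up to an additive constant, which is sufficient for comparing models. It handles numeric predictors only. A categorical factor such as primate group would have to be dropped as a block of dummy columns, which this simplified version does not do.

```python
import math

import numpy as np


def ols_aic(y, X):
    """Gaussian AIC of an OLS fit with intercept, up to an additive constant."""
    n = len(y)
    A = np.column_stack([np.ones(n), X]) if X.shape[1] else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(((y - A @ beta) ** 2).sum())
    return n * math.log(rss / n) + 2 * A.shape[1]


def backward_aic(y, X, names):
    """Drop the predictor whose removal lowers AIC most; stop when none does."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    best = ols_aic(y, X[:, keep])
    while keep:
        trials = [(ols_aic(y, X[:, [c for c in keep if c != j]]), j) for j in keep]
        aic, drop = min(trials)
        if aic >= best:
            break
        best = aic
        keep.remove(drop)
    return [names[j] for j in keep], best
```

As the Platyrrhini results show, AIC can retain predictors that are not individually significant. This is because AIC weighs fit against complexity rather than testing each term at a fixed α.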

We then conducted the same analysis separately for each of the groups Catarrhini, Platyrrhini, and prosimians (see **Table 1b,c**). Catarrhini (n = 50): the reduced model (r<sup>2</sup> = 54.6%, p < 0.0001) included inner facial features (F<sub>1,44</sub> = 27.1710, p < 0.0001), LG (F<sub>1,44</sub> = 5.5832, p = 0.0226), human-likeness (F<sub>1,44</sub> = 16.6525, p = 0.0002), reddish brown, and bluish tint (F<sub>1,44</sub> = 12.4338, p = 0.0010). Platyrrhini (n = 24): LG, human-likeness, mean lightness, pattern, orange, and yellowish brown remained in the model (r<sup>2</sup> = 40.1%, p < 0.0001), but only mean lightness (F<sub>1,17</sub> = 6.8828, p = 0.0178) and yellowish brown (F<sub>1,17</sub> = 6.4430, p = 0.0212) retained significance. The model for prosimians (n = 33) failed to explain any variability and was not significant.

## DISCUSSION

## The Effect of Shape

The inner facial parts represent one of the strongest factors determining the beauty of species within the group of primates most similar to us, i.e., the Catarrhini. Thus, the size of the eyes, interocular length, mouth width, and the nose-to-mouth (or eyes-to-mouth) distance are strong cues that our respondents used as a guide when ranking the "beauty" of Catarrhine faces. The respondents were not instructed to categorize the primates; the scoring procedure instructed them to assign scores to the pictured faces according to their subjectively perceived beauty. Nevertheless, the RDA2 axis shows that the respondents still categorized the ranked subjects, and this categorization was mainly based on the inner facial parts of the primate faces. This phenomenon is often reported in studies of human perception of animal beauty (e.g., snakes: Marešová et al., 2009; Landová et al., 2012; birds: Lišková et al., 2015) and resembles the task known as unsupervised human categorization (Pothos and Chater, 2002; Pothos and Close, 2008).

In the literature, the roles of inner and outer facial features remain unclear. Some authors claim that young children mostly use outer facial features as cues for facial recognition and that this pattern then switches to the "adult version," in which faces are recognized using the inner features (Campbell et al., 1995; Turati et al., 2006). Other authors argue that both children and adults use inner facial features to recognize familiar faces but outer facial features to recognize unfamiliar ones (Ellis et al., 1979; Want et al., 2003; Bonner and Burton, 2004; Ge et al., 2008). Our results show that inner facial features are not only used for categorizing the primates but also play a very important role in the assessment of the facial beauty of Catarrhine primates. Outer facial features are used to a much lesser extent but also appear to contribute to the assessment of primate beauty (see **Figure 7**).

TABLE 1 | The final reduced models (GLM) describing the effect of the morphology, colors, and human-likeness on the attractiveness scoring of each of the main primate groups.

## Colors and Pattern in Primate Facial Attractiveness

Our results show that two colors affect the attractiveness of primate faces: the bluish tint (in Catarrhines and the full picture set) and yellowish brown (in Platyrrhines). In the literature, colors do play a role in the assessment of attractiveness, especially red, which is important for both humans and non-human primates. Human faces exhibiting brighter red are perceived as healthier and more attractive (Re et al., 2011). Female Rhesus Macaques prefer males with redder faces (Waitt et al., 2003; but see Waitt et al., 2006, where this preference applied only to red hindquarters). Moreover, red clothing or even extraneous red (for example, a red background of a presented picture stimulus) is perceived as more attractive by both human respondents (Elliot et al., 2010) and non-human primates (Hughes et al., 2015).

We examined the full variability of colors present in the primate picture set, i.e., not only red, but also orange, yellow, and the bluish tint. Within our picture set, only three primates possessed bright red facial coloration (the Bald Uacari, Silvery Marmoset, and Japanese Macaque). Thus, we instead tested the effect of the overall presence of red, mostly expressed as darker red or reddish brown fur. However, we found no effect of this color on human preferences. The only color that positively affected human ratings of all primates (regardless of group) was blue. In the context of human facial attractiveness, this same color is usually perceived negatively, as bluish, pale faces indicate low oxygenation and poor health (Stadie, 1919). However, blue is very often reported as the most preferred color in other studies of human ratings of animal beauty, e.g., parrots (Frynta et al., 2010), birds (Lišková and Frynta, 2013; Lišková et al., 2015), and even snakes (Ptáčková et al., 2017). In the latter case, the color was present only in the form of a bluish tint. Similarly, in this paper, the bluish tint affected the overall evaluation of beauty, and this was an effect of the photographs rather than of the primate species themselves.


When analyzing the particular groups, the GLM revealed that the attractiveness of the New World monkeys was positively affected by yellowish brown color. This agrees with findings that yellower human faces are perceived as more attractive (Stephen et al., 2012) and healthier (and thus more attractive). Stephen et al. (2009) showed that respondents increased yellowness, together with redness and overall lightness, when aiming to create a healthy-looking human face. Even papers dealing with animal attractiveness report a preference for yellow in some cases, especially when rating the beauty of birds (Lišková and Frynta, 2013; Lišková et al., 2015). However, animal attractiveness is usually determined mainly by pattern and achromatic contrasts (Lišková and Frynta, 2013; Lišková et al., 2015). Similarly, the only variable other than yellow that explained the attractiveness ratings of the New World primates was mean lightness: the respondents rated darker monkeys as more beautiful. This agrees with the animal studies but contrasts with human facial studies. In those studies, either lighter faces (Van den Berghe and Frost, 1986; Stephen et al., 2009; Coetzee et al., 2012) or medium-toned (but not pale or black) faces are rated as more attractive (Frisby, 2006; Stephen et al., 2012; but see Fink et al., 2001).

Although the pattern variable was dropped from the final LMs, the multivariate analysis shows that, at least to some extent, it negatively affects the evaluation of overall primate facial attractiveness. This seems to contradict some papers that report a positive effect of pattern on the evaluation of animal (Lišková et al., 2015; Ptáčková et al., 2017) and mammalian (Landová et al., in prep.) beauty. However, it is the highly contrasting pattern of large spots, stripes, and other marks that positively affects perceived animal beauty. The fine unevenness of fur color (i.e., agouti-type fur coloration) does not have this effect; this unevenness is what the pattern variable in this study corresponded to, and it was perceived negatively. The primates possessing contrasting patches of fur coloration (e.g., the Black-and-white Ruffed Lemur and the Ring-tailed Lemur) were still considered the most beautiful.

Homogeneous skin color distribution and surface topography (wrinkles), which are signs of health and age, also affect human preferences for attractive conspecific faces (Samson et al., 2010) and could possibly affect preferences for primate faces as well. However, our set of stimuli was not controlled for the age of the depicted individuals (they were all adults of unspecified age), so we could not test the effect of features affecting the perception of age. Moreover, the majority of the species included in the study had faces fully covered by fur. A carefully designed experiment with more uniform stimuli varying only in facial surface topography (e.g., faces of chimpanzees of varying age, or a manipulated picture set) would be needed to examine this interesting question.

## SSD, Averageness, and Facial Extremities

There is not much variability in SSD within the prosimians, as in most species the sexes are of similar size (Dixson, 2012). Platyrrhine primates differ more (LG ranges from −0.72 in the Brown Capuchin to 0.31 in the Lemurine Night Monkey). However, they still lack the most prominent facial extremities typical of many male-larger Catarrhine species, such as the Western Gorilla, Mandrill, Drill, orangutans, Golden Snub-nosed Monkey, Proboscis Monkey, Patas Monkey, Gelada, or the Lion-tailed Macaque. Thus, it is not surprising that the degree of sexual dimorphism affects beauty only within the Catarrhines, where the male-larger species are perceived as less beautiful. In other words, the respondents rate conspicuous facial features ("extremities") negatively. Similarly, many researchers agree that distinctive features (caricatures) usually aid the recognition of individual faces (Rhodes et al., 1987; Mauro and Kubovy, 1992; Lee et al., 2000; but see Hancock and Little, 2011) but are rated negatively when evaluated for attractiveness (Deffenbacher et al., 1998). Conspicuous features are thus usually rated as unattractive, as opposed to average features (e.g., Rhodes and Tremewan, 1996; O'Toole et al., 1999). However, the preference for facial averageness may be based on a more general principle, as other objects such as dogs, birds, fish, or cars were also preferred when they were of an average shape (Halberstadt and Rhodes, 2000, 2003).

## Human-Likeness and the Uncanny Valley Theory

The uncanny valley theory describes an empirical rule first proposed in an essay by the robotics professor Masahiro Mori in 1970 (later republished in English for a broader audience; Mori et al., 2012). Mori hypothesized that the more a robot resembled humans in appearance, the more affinity people would feel toward it. This would hold up to the point where the robot became almost indistinguishable from humans. At that point, people would experience a negative, eerie sensation; he called this drop in affinity the "uncanny valley." This relationship between human-likeness and attractiveness (in the sense of positive affinity toward an object; for the relationship between emotional ratings of eeriness and attractiveness, see Burleigh et al., 2013) was later tested in a number of papers, which found evidence supporting the uncanny valley theory (e.g., Seyama and Nagayama, 2007; MacDorman et al., 2009; Mitchell et al., 2011). Steckenfinger and Ghazanfar (2009) reported support for this phenomenon even in macaque monkeys, which preferred realistic and stylized macaque faces over faces very close to realism. The reason behind the uncanny valley is unclear; frequently discussed mechanisms include the atypical feature hypothesis and the category conflict hypothesis (Burleigh et al., 2013). In the first case, the uncanny valley effect is present when evaluating pictures that include an abnormal feature, such as bizarre eyes (Seyama and Nagayama, 2007). In the second case, the uncanny valley negatively affects stimuli containing features belonging to multiple categories, which elicit discomfort because they are evaluated as ambiguous and confusing (Saygin et al., 2011). Once the respondents cease to recognize the conflicting object as "human," ratings of attractiveness (or eeriness) return to their linear trend.

In this respect, our results may seem contradictory, as overall the most preferred primates were the prosimians, which were rated as the least human-like. However, within the group of Catarrhine primates, i.e., the group phylogenetically closest to humans (which also includes the species most similar to humans, see **Figure 6B**), human-likeness positively affected the rated facial attractiveness. Thus, it is reasonable to examine whether the uncanny valley effect applies to the results within this group. The graph clearly shows that some of the most human-like species do "fall into the valley," but on closer inspection there are exceptions to this uncanny valley rule. Some primates with similar human-likeness ratings fall into the notional valley, while others remain attractive. The unattractive yet human-like primates are represented by species such as the orangutans, Proboscis Monkey, and the Drill. The reason may be the presence of abnormal (distinctive) features, as discussed above in Section "SSD, Averageness, and Facial Extremities." Interestingly, at least some of these features, such as the elongated nose of the Proboscis Monkey, are not perceived as unattractive per se. For example, elephants, elephant shrews, coatis, and tapirs all possess an elongated nose, but they were all rated as very or fairly attractive in a previous study (Frynta et al., 2013). Thus, these results cannot be interpreted simply as a preference for average feature size, but rather as a preference for average feature size when present on a human or human-like animal. Possibly, our complex neural system for facial recognition makes judgements of "beauty" far stricter when judging "humans" (including human-like objects and animals) than when judging other objects (Hanson, 2005).
The uncanny valley phenomenon may in fact be linked to expertise with human faces. On this account, the uncanny effect is so widespread because every human is an expert in recognizing humans. However, the same effect might also be observed in other domains of expertise, as these display similar behavioral and neuropsychological patterns (Diamond and Carey, 1986; Xu, 2005; Dufour and Petit, 2010).

Previously, some researchers showed that even distinctive, extreme features (up to a point) can be perceived as more attractive than average if the exaggeration is based on attractive features. Such a feature may then represent a super-stimulus. This concept, derived from ethological studies, describes an object whose features are more accentuated than those of the natural stimulus, and which elicits a stronger response than the stimulus for which the response evolved (Tinbergen, 1951). For example, Perrett et al. (1994) report that enhanced female features, including higher cheekbones, thinner jaws, and larger eyes, were rated as more attractive than average (see also Perrett et al., 1998). Similarly, Jones and Hill (1993) found that an enlarged ratio of eye width to face height (i.e., a feminine/neotenous feature) was preferred over average proportions. Could the presence of enhanced attractive features be the reason why, in our data, some human-like primates are more attractive than the linear relationship between human-likeness and attractiveness would predict (see **Figure 8**)? For example, the Agile Gibbon (H. agilis) has a contrasting black-and-white face, a feature found to be very attractive in human preferences for animals (Frynta et al., 2013; Lišková et al., 2015). It would be interesting to examine this phenomenon in another study with more controlled, manipulated stimuli.
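One simple way to make "attractive above the linear relationship" operational is to fit an ordinary least-squares line of attractiveness on human-likeness and flag species whose residuals exceed some multiple of the residual standard deviation. The sketch below is purely illustrative. The one-SD threshold is an arbitrary assumption, and it is not the procedure used to produce Figure 8.

```python
def above_line(x, y, names, k=1.0):
    """Names whose residual from the OLS line of y on x exceeds k residual SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sd = (sum(r * r for r in resid) / (n - 2)) ** 0.5  # residual standard error
    return [nm for nm, r in zip(names, resid) if r > k * sd]
```

With hypothetical human-likeness scores `[1, 2, 3, 4, 5]` and attractiveness scores `[1, 2, 6, 4, 5]`, only the third item would be flagged as lying above the line.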

The support for the uncanny valley theory is ambiguous in our study. Some of the most human-like primates do fall into the notional "valley," while others overcome it. Thus, our results rather show that it is the "atypical feature present on a human-like object" that causes some of the primates to be rated very negatively. However, this does not necessarily refute the uncanny valley theory. Rather, it supports the atypical feature hypothesis of the uncanny valley (Seyama and Nagayama, 2007; MacDorman et al., 2009).

## CONCLUSION

In our study, we focused on human evaluation of primate facial attractiveness. We found differences in the evaluation of the three main primate groups. The attractiveness of the Catarrhine primates, i.e., the Old World monkeys, gibbons, and great apes, was explained by human-likeness. It was also explained by factors similar to those usually used when evaluating human facial attractiveness: the inner facial features and SSD (i.e., the lack of extreme, conspicuous features). Interestingly, the proportions of inner facial features were used only when evaluating the most human-like primates. In the other groups, this factor had no effect, so its importance cannot be attributed to the evaluation of faces in general, but only of faces resembling humans.

The Platyrrhine primates, i.e., the New World monkeys, are phylogenetically more distant to humans. Regarding similarity to humans, they are somewhere between the Catarrhines and the prosimians (see **Figure 6**), and the results explaining their attractiveness scores reflect this. Their attractiveness is

#### REFERENCES

Akaike, H. (1998). "Information theory and an extension of the maximum likelihood principle," in Selected Papers of Hirotugu Akaike, eds E. Parzen, K. Tanabe, and G. Kitagawa (New York, NY: Springer), 199–213. doi: 10.1007/978- 1-4612-1694-0\_15

determined by human-likeness, yellowish brown color, and the mean lightness. However, the respondents liked more the monkeys that were scored as less-human like. The orange color, pattern, and SSD are all factors that could not be excluded from the final model but showed to be insignificant. The number of Platyrrhine primates included in the analysis was small though and it is thus possible that a larger sample could reveal significance of these factors. One way or another, the attractiveness of the Old World monkeys seem to be affected by factors that are otherwise reported as affecting evaluation of both human and animal attractiveness. On the contrary, the prosimians were rated as the most beautiful, but our analysis failed to reveal the particular cues responsible for their high scores.

## DATA AVAILABILITY STATEMENT

The datasets generated and/or analyzed during the current study are available in the Mendeley repository, http://dx.doi.org/10.17632/ssv9m953mb.1.

## AUTHOR CONTRIBUTIONS

DF, SR, and EL conceived and designed the research and interpreted the results. SR performed the research and wrote the paper. DF and SR analyzed the data.

## FUNDING

This study resulted from research funded by project no. LO1611, with financial support from the MEYS under the NPU I program.

## ACKNOWLEDGMENTS

We thank Jan Havlíček for a constructive discussion about the topic and Jakub Polák for critical reading of the text and linguistic revision. We also thank all of the authors who provided photographs for use in the study, and all our respondents for their kind participation in the project.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02439/full#supplementary-material

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Rádlová, Landová and Frynta. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Facial Adiposity, Attractiveness, and Health: A Review

Stefan de Jager<sup>1,2</sup>\*, Nicoleen Coetzee<sup>2</sup> and Vinet Coetzee<sup>3</sup>

*<sup>1</sup> Department of Psychology, Sefako Makgatho Health Sciences University, Ga-Rankuwa, South Africa, <sup>2</sup> Department of Psychology, University of Pretoria, Pretoria, South Africa, <sup>3</sup> Department of Genetics, Biochemistry and Microbiology, University of Pretoria, Pretoria, South Africa*

The relationship between facial cues and perceptions of health and attractiveness in others plays an influential role in our social interactions and mating behaviors. Several facial cues have historically been investigated in this regard, with facial adiposity being the newest addition. Evidence is mounting that a robust link exists between facial adiposity and attractiveness, as well as perceived health. Facial adiposity has also been linked to various health outcomes such as cardiovascular disease, respiratory disease, blood pressure, immune function, diabetes, arthritis, oxidative stress, hormones, and mental health. Though recent advances in the analysis of facial morphology have led to significant strides in the description and quantification of facial cues, it is becoming increasingly clear that there is a great deal of nuance in the way that humans use and integrate facial cues to form coherent social or health judgments of others. This paper serves as a review of the current literature on the relationship between facial adiposity, attractiveness, and health. A key requirement for using facial adiposity as a cue to health and attractiveness is that people must be able to estimate body mass from facial cues. To estimate the strength of the relationship between perceived facial adiposity and body mass, a meta-analysis was conducted on studies that quantified the relationship between perceived facial adiposity and BMI/percentage body fat. Summary effect size estimates indicate that participants could reliably estimate BMI from facial cues alone (*r* = 0.71, *n* = 458).

Keywords: facial adiposity, attractiveness, perceived health, health outcomes, BMI, percentage body fat, meta-analysis
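The abstract reports a summary correlation pooled across studies relating perceived facial adiposity to BMI or percentage body fat. A common way to pool correlations is to convert each *r* to Fisher's *z*, average the *z* values weighted by *n* − 3 (the inverse of their sampling variance), and back-transform. The sketch below assumes a fixed-effect model and uses hypothetical study values; the review's own meta-analysis may have used a different model (for example, random effects) and different software, so this illustrates the general technique rather than reproducing its result.

```python
import math

def pool_correlations(studies):
    """Fixed-effect pooled correlation via Fisher's z transform.

    studies: list of (r, n) pairs; every study needs n > 3.
    Returns (pooled_r, total_n, ci_low, ci_high) with a 95% CI.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for r, n in studies:
        if n <= 3:
            raise ValueError("each study needs n > 3")
        z = math.atanh(r)      # Fisher's z transform of the correlation
        w = n - 3              # weight = 1 / sampling variance of z
        weighted_sum += w * z
        weight_total += w
    z_bar = weighted_sum / weight_total
    se = 1.0 / math.sqrt(weight_total)
    z_low, z_high = z_bar - 1.96 * se, z_bar + 1.96 * se
    total_n = sum(n for _, n in studies)
    # Back-transform the pooled z and its interval to the r scale.
    return math.tanh(z_bar), total_n, math.tanh(z_low), math.tanh(z_high)

# Hypothetical inputs, not the studies included in the review.
pooled_r, total_n, low, high = pool_correlations([(0.60, 50), (0.80, 50)])
print(f"pooled r = {pooled_r:.3f} (95% CI {low:.3f} to {high:.3f}), n = {total_n}")
```

With two equally sized hypothetical studies at *r* = 0.60 and *r* = 0.80, the pooled estimate is 5/7 ≈ 0.714 rather than the arithmetic mean of 0.70, because averaging happens on the *z* scale.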

## INTRODUCTION

Facial appearance in humans conveys a substantial amount of non-verbal information that shapes our interactions with others. This information feeds into judgments of health (Rhodes et al., 2007; Coetzee et al., 2009; Stephen et al., 2009; Phalane et al., 2017), attractiveness (Rhodes, 2006; Coetzee et al., 2012; Foo et al., 2017), leadership ability (Little et al., 2007; Re and Perrett, 2014) and even academic ability (Zebrowitz et al., 2002; Talamas et al., 2016), to name but a few. Naturally, the evolutionary, social and behavioral implications of the way in which we perceive and react to other people's faces have generated significant interest in the scientific community. Several facial cues have been identified as integral aspects of our judgments of others, including sexual dimorphism (Perrett et al., 1998; Thornhill and Gangestad, 2006), symmetry (Perrett et al., 1999; Scheib et al., 1999; Jones et al., 2001), averageness (Grammer and Thornhill, 1994; Rhodes et al., 2001), skin condition (Jones et al., 2004; Stephen et al., 2011), and facial adiposity, or perceived weight in the face (Coetzee et al., 2009; Tinlin et al., 2013). Of particular interest to scientists is the relationship between facial cues and perceptions of health and attractiveness in others, as these play a particularly influential role in our social interactions and mating behaviors. Since mate choice plays a central role in evolutionary psychology, researchers are eager to understand the relative contribution that various facial cues make in shaping our perceptions and judgments of health and attractiveness (Rhodes, 2006).

#### Edited by:

*Kok Wei Tan, University of Reading Malaysia, Malaysia*

#### Reviewed by:

*Barnaby James Wyld Dixson, The University of Queensland, Australia*

*Danielle Leigh Wagstaff, Federation University, Australia*

*Shen Liu, University of Science and Technology of China, China*

#### \*Correspondence:

*Stefan de Jager gsdejager@gmail.com*

#### Specialty section:

*This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology*

Received: *12 September 2018* Accepted: *29 November 2018* Published: *21 December 2018*

#### Citation:

*de Jager S, Coetzee N and Coetzee V (2018) Facial Adiposity, Attractiveness, and Health: A Review. Front. Psychol. 9:2562. doi: 10.3389/fpsyg.2018.02562*

Attractiveness plays a prominent role in our everyday interactions with others, and research has consistently shown that attractive people are judged more positively in general than unattractive people (see Rhodes, 2006; Little et al., 2011). Attractive people are also typically judged to be healthier than unattractive people (Kalick et al., 1998; Shackelford and Larsen, 1999; Boothroyd et al., 2013). One potential reason for this is that facial cues associated with attractiveness could also be linked to various health markers. On this view, humans evolved to be particularly attentive to facial cues associated with attractiveness, as these cues allow us to make inferences about other people's health, genetic fitness, or even behavioral tendencies (Rhodes et al., 2001). Fitness-related theories of human behavior, for example, suggest that key phenotypic cues influence our judgments of others because they evolved as cues to general health and mate quality (Langlois et al., 2000). One such theory is the "good genes" hypothesis, which postulates that female mate choice is heavily influenced by phenotypic cues, since they help females make snap judgments about men's current health, genetic quality and fertility, and thus their suitability for mating (Hamilton and Zuk, 1982; Thornhill and Gangestad, 1993, 1999).

Choosing the right mate could provide substantial direct and indirect benefits to females and their offspring. For example, according to the immunocompetence handicap hypothesis (Folstad and Karter, 1992), males who display strong secondary sexual traits are perceived as more attractive by females, as these traits may serve as good indicators of immune function. The reasoning is that testosterone, which is responsible for the development of these secondary sexual traits, also has an immunosuppressive effect, so only good-quality males can afford to display them. By choosing a male with a strong immune system as a mating partner, a female can simultaneously lessen the risk of pathogen exposure and maximize the robustness of her offspring's immune function. It should be mentioned, however, that some studies have found little to no evidence for a relationship between testosterone and immune suppression in mammals and humans, respectively, casting at least some doubt on the viability of the immunocompetence handicap hypothesis (Roberts et al., 2004; Nowak et al., 2018).

From an evolutionary point of view, the hypothesis that certain facial cues are preferred because they are reliable indicators of health and genetic quality is coherent only if two conditions are met: (i) people should agree on the facial features that they find attractive (Langlois et al., 2000; Rhodes, 2006) and (ii) perceptions of attractiveness and health should be related to actual health outcomes or genetic quality (Coetzee et al., 2009; Rantala et al., 2013a). At present there is at least some empirical support for the notion that people can reliably detect and agree on facial cues that contribute to facial attractiveness and perceptions of health (Perrett et al., 1998; Langlois et al., 2000; Rhodes et al., 2001; Re and Rule, 2016a). This statement comes with a caveat, though, as numerous studies have demonstrated that preferences for certain facial cues, such as facial adiposity, are likely mediated by environmental and cultural factors, including resource scarcity (Batres and Perrett, 2014, 2017), exposure to media beauty ideals (Batres and Perrett, 2014), and familiarity with one's own ethnicity (Coetzee et al., 2014; Batres et al., 2017).

While there is strong support for the link between various facial cues and attractiveness, evidence for a link between facial attractiveness and actual health outcomes has been mixed (Weeden and Sabini, 2005; Rhodes et al., 2007; Foo et al., 2017; Cai et al., in press). For example, Kalick et al. (1998) found no reliable relationship between rated facial attractiveness and general health as rated by physicians for people in adolescence, middle adulthood, or late adulthood. The researchers did, however, find a correlation between rated facial attractiveness and perceived health for both males and females. After controlling for the potential mediating role of attractiveness in the relationship between perceived health and actual health, they found a statistically significant correlation between perceived health and actual health. The researchers concluded that, paradoxically, attractiveness can create a halo effect that suppresses, rather than enhances, accurate judgments of health. In partial contrast to Kalick et al. (1998), Henderson and Anglin (2003) found a significant correlation between rated attractiveness and longevity for 50 facial photographs taken from a high school yearbook, although it is not entirely clear how comparable longevity (Henderson and Anglin, 2003) and general health measures (Kalick et al., 1998) are in this context. For a review of the relationship between attractiveness and health, see Weeden and Sabini (2005) and Rhodes (2006).

Studies of the relationship between individual facial cues associated with attractiveness and actual health outcomes have also produced mixed results. For example, Rhodes et al. (2001), using a dataset very similar to that of Kalick et al. (1998), found a relationship between facial averageness and perceived health, as well as between facial symmetry and perceived health, for both males and females. However, when the researchers investigated the relationship between actual health outcomes and facial averageness, a correlation was found only for male childhood health (r = 0.28), female adolescent health (r = 0.14) and current health in females (r = 0.25). For symmetry, no link was found between health outcomes and the rated symmetry of participants' faces. One potential explanation for these inconsistent results is that general health is a very broad concept that probably reflects a wide variety of health markers, which may or may not be linked to particular facial cues or to general facial attractiveness.

## Facial Adiposity as a Cue to Perceived Health and Attractiveness

One facial cue that has consistently been associated with attractiveness (Coetzee, 2011; Coetzee et al., 2012; Rantala et al., 2013a; Foo et al., 2017; Phalane et al., 2017), perceived health (Coetzee et al., 2009; Fisher et al., 2014b; Han et al., 2016; Windhager et al., 2018) and actual health outcomes (Coetzee et al., 2009; Reither et al., 2009; Rantala et al., 2013a; Tinlin et al., 2013; Martinson and Vasunilashorn, 2016) is facial adiposity. Facial adiposity, or perceived weight in the face, was shown to be a reliable cue to health by Coetzee et al. (2009), who linked facial adiposity ratings to perceived health and attractiveness, and higher ratings to increased risk of infection and cardiovascular illness. Studies also consistently report that people with lower perceived facial adiposity are rated as more attractive and healthier than people with higher perceived facial adiposity (Klaczynski et al., 2009; Rantala et al., 2013a; Han et al., 2016; Foo et al., 2017). Foo et al. (2017) showed that facial adiposity was a better predictor of attractiveness than sexual dimorphism, averageness, and symmetry for male faces. For female faces, the researchers found that facial adiposity squared and sexual dimorphism were the best predictors of facial attractiveness. Facial adiposity was also the strongest predictor of perceived health for male faces, whereas sexual dimorphism was the strongest predictor of perceived health for female faces, with facial adiposity failing to reach statistical significance.

As mentioned previously, for a facial cue to serve as a valid cue to health, people need to detect the cue reliably and agree on its relationship to health or attractiveness. Some studies have reported that participants can reliably estimate a person's body fat percentage or body mass index (BMI) from facial cues alone (Fisher et al., 2013; Tinlin et al., 2013; Han et al., 2016; Phalane et al., 2017). This ability likely stems from the fact that changes in BMI have been linked to a predictable set of morphological characteristics such as width-to-height ratio, perimeter-to-area ratio, cheek-to-jaw-width ratio (Coetzee et al., 2010), shape of the lower face outline, nose width, and eyebrow position (Mayer et al., 2017). A study by Re et al. (2013) reported that people can detect changes in BMI as small as 1.3 kg/m<sup>2</sup> in male faces and 1.6 kg/m<sup>2</sup> in female faces from facial cues alone. Another study by Re and Rule (2016b) corroborated this finding, reporting that an average change in BMI of 1.33 kg/m<sup>2</sup> was sufficient for participants to report a noticeable change in the appearance of faces. In recent years, researchers have also made a concerted effort to develop computer vision methods (Wen and Guo, 2013; Kocabey et al., 2017; Barr et al., 2018) and statistical models (Wolffhechel et al., 2015; Stephen et al., 2017) to predict BMI from facial images.
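The morphological correlates of BMI listed above are simple geometric ratios, so a short sketch may make them concrete. The landmark names below (`cheek_left`, `jaw_left`, `brow_mid`, `upper_lip`, and so on) and the exact points used for each width and height are hypothetical simplifications; Coetzee et al. (2010) define these measurements from specific facial landmarks that are not reproduced here. The code only shows how width-to-height ratio (WHR), perimeter-to-area ratio (PAR), and cheek-to-jaw-width ratio (CJWR) can be computed once 2-D landmark coordinates are available.

```python
import math

Point = tuple[float, float]

def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def polygon_perimeter(outline: list[Point]) -> float:
    """Sum of edge lengths, closing the polygon back to the first point."""
    return sum(distance(outline[i], outline[(i + 1) % len(outline)])
               for i in range(len(outline)))

def polygon_area(outline: list[Point]) -> float:
    """Shoelace formula; abs() makes it independent of point order."""
    s = 0.0
    for i in range(len(outline)):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % len(outline)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def adiposity_ratios(lm: dict[str, Point], outline: list[Point]) -> dict[str, float]:
    """Hypothetical landmark scheme; real studies use defined facial landmarks."""
    cheek_width = distance(lm["cheek_left"], lm["cheek_right"])
    jaw_width = distance(lm["jaw_left"], lm["jaw_right"])
    upper_face_height = distance(lm["brow_mid"], lm["upper_lip"])
    return {
        "WHR": cheek_width / upper_face_height,
        "PAR": polygon_perimeter(outline) / polygon_area(outline),
        "CJWR": cheek_width / jaw_width,
    }
```

Because PAR has units of inverse length, it depends on image scale, so images would need to be size-standardized before faces are compared; WHR and CJWR are ratios of lengths and are scale-free.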

Being able to draw inferences about a person's weight from facial cues of adiposity provides a potentially robust perceptual link between facial cues and actual health. Body fat percentage and BMI have consistently been linked to various negative health outcomes. Overweight individuals are subject to unfavorable health outcomes including diabetes, mental health problems, impaired immune function, high and low blood pressure and heart disease (Kopelman, 2007; Dixon, 2010; Zaccardi et al., 2017), while being severely underweight could indicate that a person is vulnerable to communicable diseases (Fisher et al., 2014b), since being underweight is linked to malnutrition and compromised immune function (Ritz and Gardner, 2006; Dobner and Kaser, 2018). Accurate estimation of overall body weight from facial cues can thus serve as an effective cue to health, as lower or higher than average body weight could be indicative of a person's past, current and future health problems (Coetzee et al., 2009; Henderson et al., 2016). In addition to the strong relationship between BMI and health outcomes, Levine et al. (1998) established a strong link between cheek adipose tissue and visceral abdominal fat, and Lee and Kim (2014) established a correlation between the distance between the inferior ear lobes and visceral fat. Visceral fat inferred from facial cues is a potentially important predictor of health outcomes, as visceral fat has been connected to negative health outcomes due to its high metabolic activity (Després and Lemieux, 2006). BMI is also highly heritable: twin studies have estimated the heritability of BMI to be as high as 0.57 in people between 20 and 29 years of age, even after environmental variation and cultural-geographic regions were accounted for (Silventoinen et al., 2017).

**Tables 1**, **2** present summaries of studies investigating the relationship between facial adiposity, attractiveness, and perceived health.

While research into the link between facial adiposity, health, and attractiveness has been gaining momentum over the last decade, investigating these relationships poses a series of methodological challenges. As both underweight and overweight individuals are likely to be judged less healthy and attractive, Coetzee et al. (2009) proposed that the relationship between perceived health, attractiveness and adiposity is best represented by a quadratic function. Because both overweight and significantly underweight individuals are subject to negative health outcomes, any large deviation from average levels of adiposity is likely to lower health and attractiveness ratings. Subsequent studies confirmed that curvilinear models often fit these data best (Phalane et al., 2017; Windhager et al., 2018), though this is not always the case (see, for example, Coetzee et al., 2012; Foo et al., 2017). In studies where the sampled BMI range is restricted, the restriction may also obscure the quadratic pattern. One notable study by Windhager et al. (2013) created a set of female adolescent geometric morphs that allowed the researchers to produce a set of 5 images ranging from −5 SD (19% body fat) to +5 SD (64% body fat). These facial morphs were then used in a recent study in which 274 male and female participants rated each of the 5 images on maturity, dominance, masculinity, perceived health and attractiveness (Windhager et al., 2018). Analysis revealed that both health and attractiveness ratings consistently followed distinct curvilinear shapes across all rater categories and genders.
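A quadratic (inverted-U) model of the kind proposed by Coetzee et al. (2009) can be fitted by ordinary least squares, and the adiposity level at which the fitted curve peaks gives a natural estimate of the "optimal" level. The sketch below is a minimal pure-Python version under that assumption, using hypothetical data; the published studies may have used different or more elaborate models, so this illustrates the idea rather than any specific study's procedure. Centring the predictor before fitting is a design choice that keeps the normal equations well conditioned.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*u + c*u**2, where u = x - x_mean.

    Returns (a, b, c, x_mean). Centring x avoids ill-conditioned sums.
    """
    if len(xs) < 3 or len(xs) != len(ys):
        raise ValueError("need at least 3 paired observations")
    x_mean = sum(xs) / len(xs)
    us = [x - x_mean for x in xs]
    s = [sum(u ** k for u in us) for k in range(5)]              # sums of u^k
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    # Augmented normal equations (X^T X) beta = X^T y.
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        beta[r] = (m[r][3] - sum(m[r][c] * beta[c] for c in range(r + 1, 3))) / m[r][r]
    return beta[0], beta[1], beta[2], x_mean

def optimum(a, b, c, x_mean):
    """x at the peak of the fitted curve; only defined for an inverted U (c < 0)."""
    if c >= 0:
        raise ValueError("no interior maximum: fitted curve is not an inverted U")
    return x_mean - b / (2 * c)

# Hypothetical ratings that peak at a BMI of 20.
bmis = list(range(17, 26))
ratings = [5 - 0.1 * (x - 20) ** 2 for x in bmis]
print(optimum(*fit_quadratic(bmis, ratings)))
```

If the fitted quadratic coefficient is not negative there is no interior peak, which corresponds to the cases noted above where a curvilinear model does not describe the data best.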

In addition to the complex statistical relationship between facial adiposity and preference ratings, there is also evidence that humans integrate information from multiple facial cues to evaluate faces. One facial cue that has been identified as playing a particularly important role in mediating the relationship between facial adiposity and health ratings is skin condition. According to Fisher et al. (2014b), judging health and attractiveness from facial adiposity in isolation poses a conundrum for raters, since it is very difficult to differentiate between individuals who have low facial adiposity because they are healthy and those who have low facial adiposity because of ill-health. One way around this problem is to integrate information on skin condition with facial adiposity, which provides more information on the potential risk associated with the perceived adiposity level. An experiment by Fisher et al. (2014b) provided evidence that participants likely integrate both skin color and facial adiposity cues when judging faces for health and attractiveness. Redness and yellowness in the face have been associated with perceptions of health (Stephen et al., 2009, 2011), and in the study by Fisher et al. (2014b) there was evidence that participants relied more on color cues to decide whether a face looked healthy or attractive when facial adiposity levels were relatively low.

#### TABLE 1 | Summary of studies that investigated the relationship between facial adiposity and attractiveness.

\**Used the same facial stimuli panels*

#### TABLE 2 | Summary of studies that investigated the relationship between facial adiposity and perceived health.


According to the World Health Organization (WHO) classification, a healthy BMI for an average adult is between 18.5 and 24.9 kg/m<sup>2</sup> (World Health Organization, 2018). Numerous studies have shown that people generally prefer facial adiposity levels associated with the normal range of the WHO classification for BMI, and that lower levels of facial adiposity are typically rated as more attractive, even across cultures (Coetzee et al., 2011, 2012, 2014; Re et al., 2011; Re and Rule, 2016b). For example, a study by Coetzee et al. (2011) found that female participants judging other female faces indicated that facial adiposity levels associated with a BMI of 19.76 kg/m<sup>2</sup> are optimally attractive, while male raters indicated that adiposity levels associated with a BMI of 20.01 kg/m<sup>2</sup> are optimally attractive for female faces. A similar study by Re and Rule (2016b) found that participants preferred facial adiposity levels associated with a BMI of 19.11 kg/m<sup>2</sup> for female faces and 23.79 kg/m<sup>2</sup> for male faces. Coetzee (2011) compared facial adiposity preferences between a cohort of British and African participants and found that Caucasian males, African females and African males all preferred similar levels of facial adiposity for attractiveness and health. Despite the evidence for cross-cultural agreement on the facial adiposity levels associated with health and attractiveness, BMI preferences judged from faces and BMI preferences judged from bodies appear to diverge to some degree across cultures. For example, a study by Tovée et al. (2006) found that ethnic South African Zulus preferred a BMI above the upper limit of the normal range for both health and attractiveness when judging female bodies. Since most studies on facial adiposity preferences use university students, Coetzee et al. (2012) argue that this discrepancy could potentially be attributed to repeated exposure to Western media ideals and the improved access to resources that African university students enjoy. This line of reasoning is also supported by Tovée et al. (2006), who found that South African-born Zulus who had immigrated to the United Kingdom preferred a lower BMI for both attractiveness and health than Zulus still residing in South Africa (also see Tovée et al., 2007). At present, more research is needed to elucidate the cultural and environmental factors associated with facial adiposity preferences across cultures.
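BMI is weight in kilograms divided by the square of height in metres, and the WHO adult cut-offs quoted above can be encoded directly. The sketch below uses only the principal WHO adult categories (underweight, normal, overweight, obese) and deliberately leaves out the finer sub-classes and any population-specific cut-offs. It treats values from 18.5 up to but not including 25.0 as normal, which matches the 18.5 to 24.9 kg/m<sup>2</sup> range when BMI is reported to one decimal place.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2

def who_category(bmi_value: float) -> str:
    """Principal WHO adult BMI category (sub-classes omitted)."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"

# Preferred BMIs reported by Coetzee et al. (2011) and Re and Rule (2016b).
for preferred in (19.11, 19.76, 20.01, 23.79):
    print(preferred, who_category(preferred))
```

For example, `bmi(70, 1.75)` is about 22.9 kg/m<sup>2</sup>, which falls in the normal range, as do all four preferred values from the face-based studies cited above.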

There also appear to be differences between males and females in judgments of the facial adiposity levels required for a healthy appearance in female faces. Coetzee et al. (2011) report that females preferred a slightly lower BMI for attractiveness (19.76 kg/m<sup>2</sup>) than for optimum health (20.84 kg/m<sup>2</sup>) when rating other females, while no such difference was found for males. One hypothesis put forward to account for this discrepancy is that socio-cultural factors related to judgments of body fat influence and shape female body weight ideals, especially via media exposure to a thin ideal (Groesz et al., 2002; Harrison and Hefner, 2006; Coetzee et al., 2011). Though the link between facial adiposity and perceived health and attractiveness appears to be remarkably stable, facial adiposity likely interacts in complex ways with various biological, environmental and socio-cultural factors, as well as with other facial cues such as symmetry, skin color, skin texture, sexual dimorphism and averageness, in shaping social and health perceptions derived from faces.

#### Facial Adiposity and Health Outcomes

One of the primary reasons why researchers are interested in facial adiposity is its potential to act as a robust cue to health outcomes. Given the strong relationship between facial adiposity and BMI, negative health outcomes related to BMI should also be linked to facial adiposity. To date, numerous studies have found a relationship between facial adiposity and actual health outcomes, although the results are not always consistent.

#### Cardiovascular Health

A study by Coetzee et al. (2009) found that facial adiposity was associated with systolic blood pressure (r = 0.28) and diastolic blood pressure (r = 0.43), as well as with a cardiovascular-illness component (r = 0.41). Reither et al. (2009) also found a link between perceived facial adiposity and cardiovascular health. They used 3,027 yearbook photographs of teenagers from the Wisconsin Longitudinal Study (WLS), which also captured basic health data from 1957 to 2004. The researchers found that higher facial adiposity ratings in adolescent faces were associated with high blood pressure (OR = 1.25), heart trouble (OR = 1.20) and heart disease (OR = 1.69) later in life. More recently, Stephen et al. (2017) reported that a geometric morphometric model capturing aspects of facial shape from images could predict 21% of the variance in blood pressure in their sample. These results indicate a robust link between cardiovascular health and facial adiposity.
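The odds ratios reported by Reither et al. (2009) are easier to read once converted into a percentage change in odds. Assuming they come from logistic regression, each OR equals exp(β) for the corresponding coefficient, so an OR of 1.69 means the odds of the outcome are about 69% higher per unit increase in the predictor. The unit of the facial adiposity rating is not given in this review, so the sketch below relies only on the standard logistic-regression relationship and says nothing about the size of that unit. Note also that a change in odds is not the same as a change in probability.

```python
import math

def log_odds_coefficient(odds_ratio: float) -> float:
    """Logistic-regression coefficient beta such that OR = exp(beta)."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio)

def percent_change_in_odds(odds_ratio: float, units: float = 1.0) -> float:
    """Percent change in odds for a change of `units` in the predictor.

    Assumes the model is linear on the log-odds scale, so the OR for
    k units is OR ** k.
    """
    return (odds_ratio ** units - 1.0) * 100.0

# Odds ratios reported by Reither et al. (2009) for later-life outcomes.
reported = {"high blood pressure": 1.25, "heart trouble": 1.20, "heart disease": 1.69}
for outcome, or_value in reported.items():
    print(f"{outcome}: {percent_change_in_odds(or_value):+.0f}% odds per unit")
```

Under the log-linear assumption, a two-unit increase compounds the odds ratio: an OR of 1.20 per unit becomes 1.44, or 44% higher odds.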

#### Immunity

Coetzee et al. (2009) reported that higher levels of rated facial adiposity were associated with an increase in the number of colds and flu episodes participants reported (r = 0.209), the length of those episodes (r = 0.24), the frequency of antibiotic use reported by participants (χ<sup>2</sup> = 11.706) and a respiratory-illness component (r = 0.29). These relationships remained significant even after the researchers controlled for parental income and age. A study on male participants by Rantala et al. (2013a) found an association between perceived facial adiposity and immunity (β = −0.38) using a direct measure of immune function (antibody response to a Hepatitis B vaccination). The researchers also reported that perceived facial adiposity mediated the relationship between antibody response and facial attractiveness. However, a similar study using female participants found no link between the antibody response produced by a Hepatitis B vaccination and percentage body fat, though it should be mentioned that no direct measures of facial adiposity were included in that study (Rantala et al., 2013b). Although the studies reported above found indications that immune function may be linked to facial adiposity, especially in males, other studies using different measures of immune function have not reproduced this pattern. For example, a study by Phalane et al. (2017) found no significant relationship between adiposity and immune responsiveness, as measured by functional cytokine profile and C-reactive protein, in African men. It is worth mentioning, however, that the sample size for the cytokine profile was relatively small (n = 41), so this null result could reflect a type II error due to a lack of statistical power. A recent study by Foo et al. (2017) also found no relationship between facial adiposity and immune function (bacterial killing capacity, overall bacterial immunity, bacterial suppression capacity, and lysozyme activity) in either males or females. To date, there is some evidence that facial adiposity and immune function are linked, especially in men, although more research is needed to clarify the relationship between different aspects of immune function and facial adiposity.

#### Mental Health

Relatively few studies have investigated the link between facial adiposity and mental health. Martinson and Vasunilashorn (2016), applying the same basic methodology to the WLS data used by Reither et al. (2009), found that females who were rated as overweight (M = 8.79) in 1957 scored significantly higher on the Center for Epidemiological Studies Depression Scale (CES-D) later in life than normal-weight females (M = 6.94). No statistically significant difference in CES-D scores was found between overweight (M = 6.40) and normal-weight (M = 5.97) male students. Additionally, a study by Tinlin et al. (2013) using female participants found that rated facial adiposity was negatively correlated with a psychological condition factor (r = 0.29). These findings suggest that facial adiposity may also serve as a valid cue to mental health in females, although more research is required before drawing any conclusions about the link between mental health and facial adiposity in males.

#### Hormones

Research on the relationship between facial adiposity and hormone levels has also produced largely mixed results. For example, Tinlin et al. (2013), using female participants, reported a significant negative correlation between salivary progesterone levels (r<sub>s</sub> = −0.30) and facial adiposity, but found no relationship between facial adiposity and estradiol levels, though these results should be interpreted with caution due to the small sample size (n = 49). Rantala et al. (2013a) found a significant relationship between circulating testosterone and facial adiposity in men (r = −0.52). Han et al. (2016) found no relationship between facial adiposity and cortisol levels in females, while Rantala et al. (2013b) also found no relationship between cortisol levels and percentage body fat in females. To date, most of the research on the relationship between hormones and facial adiposity has used female participants.

#### Other Health Outcomes

In addition to cardiovascular health, immune function, mental health and hormone levels, some studies have also investigated potential links between facial adiposity and other health outcomes, such as diabetes and arthritis, or indicators of health, such as oxidative stress and sperm health. For example, Reither et al. (2009) found that adolescents rated as having higher levels of facial adiposity were also more likely to experience muscle aches (OR = 1.13), shortness of breath (OR = 1.10) and chest pains (OR = 1.21) during the course of their lives, and showed an increased risk of developing arthritis (OR = 1.19) and diabetes (OR = 1.44) and of dying from non-accidental causes (OR = 1.32). Foo et al. (2017) investigated the relationship between facial adiposity and oxidative DNA damage, as well as the level of lipid peroxidation. The researchers found a significant relationship (r = −0.22) between facial adiposity and urinary 8-hydroxy-2′-deoxyguanosine (8-OHdG), a biomarker for oxidative stress and carcinogenesis (Valavanidis et al., 2009), for males but not for females. However, Foo et al. (2017) also reported no relationship between facial adiposity and a measure of lipid peroxidation (isoprostane levels) in either males or females. Lastly, Foo et al. (2017) found no relationship between facial adiposity and sperm health (rapid progressive motility, linearity of sperm movement, sperm concentration and percentage of motile sperm) in males.

A study by Tinlin et al. (2013) investigated the relationship between facial adiposity and general health markers in female participants. The researchers found that rated facial adiposity was not correlated with a physical condition factor, although they did note evidence of high collinearity between the physical condition and psychological condition factors in their study. A combination of these factors (labeled the general condition factor) displayed a moderate association with facial adiposity (r = 0.41).

When combined, the results from these studies indicate that facial adiposity can serve as a reliable indicator of actual health outcomes including immune function, cardiovascular health, respiratory health, mental health and oxidative stress, although more research is needed to elucidate the link between these health outcomes and facial adiposity. The next section briefly touches upon factors that can potentially influence or mediate the relationship between facial adiposity and judgments of attractiveness and health.

**Table 3** presents a summary of studies investigating the relationship between facial adiposity and health outcomes.

TABLE 3 | Summary of studies that investigated the relationship between facial adiposity and health outcomes.


## Factors That Potentially Contribute to Adiposity Preferences

While numerous studies have found a robust link between facial adiposity and perceptions of health and attractiveness, a number of potential mediating factors have been identified that could modify our facial adiposity preferences. A study by Re et al. (2011) demonstrated that our preferences for particular adiposity levels can be altered, at least in the short term. The researchers first used a pre-test to establish a baseline for participants' facial adiposity preferences. By then exposing participants to either "plus-size" female bodies or regular female bodies, the researchers were able to induce an after-effect: participants who viewed the "plus-size" bodies increased their preferred adiposity by an average of 0.5 kg/m<sup>2</sup>. Participants thus preferred faces higher in adiposity after viewing the "plus-size" bodies. This finding illustrates an adaptation effect, whereby exposure to certain stimuli can shift preferences toward that particular type of stimulus. According to the authors, this after-effect could indicate that overall attractiveness judgments require the integration of multiple cues, even though these cues are often investigated independently.

The idea of an adaptation effect is also related to repeated exposure to facial cues found in our own ethnicity, or in ethnicities to which we are regularly exposed. In a study by Schneider et al. (2013), Japanese and German participants were asked to estimate the body weight of both German and Japanese individuals from facial photographs alone. The researchers found that German observers tended to slightly overestimate the body weight of the Japanese faces, while Japanese participants significantly underestimated the body weight of the German faces. The researchers argue that one potential explanation for this gross underestimation is that the Japanese participants used an inappropriate reference point for the link between facial shape and body weight in German faces. Since Japanese faces tend to display more rounded and broad (brachycephalic) head proportions, a German face associated with an average body weight would have appeared more slender than a Japanese face associated with an average body weight, which the Japanese participants would have been more familiar with. These findings imply that accurate weight perception from facial cues is also contingent upon reasonable exposure to, and familiarity with, the association between facial phenotypic variation and body weight in a particular ethnic group.

Another factor that can play a significant role in influencing people's adiposity preferences is potential pathogen exposure and how it relates to our behavioral immune system. Studies have shown that pathogen disgust sensitivity can have a dramatic influence on individuals' perceptions of facial cues associated with attractiveness or perceived health (Park et al., 2012; Jones et al., 2013). These results suggest that a pathogen disgust reaction may indeed play a role in our relative tolerance for facial cues that may signal ill-health. A study by Fisher et al. (2013) found that men with higher pathogen disgust responses preferred facial cues associated with lower weight, indicating that individual differences in pathogen disgust also play a role in the levels of facial adiposity that people find attractive or perceive to be healthy. These results are also supported by Rantala et al. (2013a), who found that facial adiposity mediated the association between facial attractiveness and immune response in men. However, a study by Dixson et al. (2017) found no relationship between facial adiposity preferences and malarial prevalence for people living in urban or rural areas of Vanuatu, a small island nation in the South Pacific. These results suggest that the proposed relationship between facial adiposity preferences and pathogen exposure or disgust, while theoretically elegant, has to be interpreted with caution.

Environmental pressures also appear to play a role in shaping our preferences for facial or body adiposity. For example, it has been demonstrated that resource scarcity could play a role in men's preferences for breast size, with men from lower socioeconomic regions preferring larger breasts, because breast size is a good indicator of adipose tissue reserves (Dixson et al., 2011; Swami and Tovée, 2013). A study by Batres and Perrett (2014) found that people in rural areas of El Salvador tend to prefer female faces with higher levels of adiposity than people from urban areas. One hypothesis proposed to account for this difference is that rural areas, especially in developing nations, tend to be poorer, and people from these areas tend to have less access to resources such as food or medicine. A similar study by Swami and Tovée (2007) also provides evidence that people from resource-poor areas of Thailand preferred female bodies with a higher BMI than people from more industrialized areas of Thailand. A more recent study by Batres et al. (2017) replicated the results of Batres and Perrett (2014) in another cohort of people from El Salvador, as well as in Malaysia. What is especially interesting about these results is that a difference in preference for facial adiposity between more developed and poorer areas was found only for female faces. One possible reason for this is a trade-off in poorer areas, where males have to weigh the potential negative long-term health outcomes associated with higher levels of adiposity against more immediate concerns regarding the survival and reproductive fitness of females (Batres et al., 2017). Another possible contributor to this phenomenon could be increased media exposure to thin ideals in more developed areas, which may alter people's baseline expectations of attractiveness or health via an adaptation effect, as described by Re et al. (2011) and Batres and Perrett (2014).

It should be pointed out, however, that some studies problematize the view that people in resource-poor areas prefer bodies or faces associated with a slightly higher BMI. For example, Dixson et al. (2017) found no evidence to support the hypothesis that males from rural areas would prefer female faces with higher levels of facial adiposity. A series of studies conducted in China, Papua New Guinea, Cameroon, Indonesia, Samoa and New Zealand also found a high degree of cross-cultural consensus when participants were asked to judge female body attractiveness, with both males and females preferring a lower waist-to-hip ratio (WHR) regardless of BMI (Dixson et al., 2010a,b; Singh et al., 2010). These results suggest that sexually dimorphic fat distribution could in fact be more important than overall BMI when judging female body attractiveness, even across ethnicities or socio-economic regions.

A study by Weston et al. (2015) revealed that judgments of weight in faces can even be influenced by the facial expression of the stimuli. The researchers observed that participants tended to rate male faces with sad expressions as more overweight than male faces with a neutral expression. A study by Henderson et al. (2016) also found that downward mouth curvature was negatively correlated with perceived health (r = −0.20) and upward mouth curvature was positively correlated with perceived health (r = 0.51), presumably because a frown is associated with a sad facial expression that reflects a negative mood or mental state. It is thus likely that facial cues are also evaluated within the context of emotional attributions, adding yet another layer of complexity to the way in which we judge physical and emotional health from facial cues in others.

## Methodological Considerations in the Study of Facial Adiposity

In the laboratory, several factors related to the facial stimuli can influence the outcome of a study. For example, a study by Jones et al. (2012) found that perceptions and the derived judgments of facial stimuli can be altered by changing the viewing angle of the face. According to the authors, this effect is likely attributable to directional asymmetries in the perception of facial cues related to shape. A study by Russell et al. (2016) found that facial contrast can also influence perceptions of health, with faces low in contrast being judged as less healthy. These two studies highlight the importance of controlling for subtle factors such as viewing angle and facial contrast when generating facial stimuli, as these factors could potentially confound the results of a study.

In addition to the way in which stimuli are generated, statistical and research design considerations also need to be taken into account when conducting facial morphology research. For example, Windhager et al. (2018) propose that a major drawback of the traditional reliance on p-values in the analysis of data derived from facial stimuli is that facial morphology research often involves complex nested designs. Most designs in the field use human raters to rate multiple facial stimuli on one or more dimensions. Researchers then often calculate a mean rating for each face, and although inter-rater reliability is mostly high (α > 0.80), the variance associated with individual rater responses is lost when those responses are collapsed into a single mean. Many studies also use suboptimal sample sizes (n < 100) because of the complexity of the designs, which can leave them underpowered to detect smaller effect sizes (see for example Tinlin et al., 2013; Phalane et al., 2017). In reality, collinearity is likely to remain a potential problem that researchers have to deal with, because of the complex integration of facial cues that occurs within an organic judgment of faces. However, recent advances in the creation of facial morphs can potentially allow researchers to manipulate facial cue dimensions with increasing precision, while also theoretically isolating specific facial cues within a panel of facial stimuli (Windhager et al., 2013, 2018; Re and Perrett, 2014). Bottom-up or data-driven methodologies can also offer researchers a way to increase the precision with which facial cues are specified, by reducing shape coordinates to a set of principal components that can be used in further analysis (Wolffhechel et al., 2015; Henderson et al., 2016).
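The point about collapsing ratings to a mean can be illustrated with a small, hypothetical example. The sketch below computes Cronbach's alpha for a raters × faces matrix (treating raters as "items" and faces as "cases", one common way of expressing inter-rater consistency) and then shows what averaging throws away: the between-rater variance for each face. The ratings and the function names `cronbach_alpha` and `collapse_to_means` are illustrative assumptions, not data or code from any study cited here.

```python
from statistics import mean, variance

def cronbach_alpha(ratings):
    """Cronbach's alpha, treating raters as 'items' and faces as 'cases'.

    ratings[i][j] is rater i's score for face j (every rater rates every face).
    """
    k = len(ratings)                                     # number of raters
    rater_vars = [variance(row) for row in ratings]      # each rater's variance across faces
    face_totals = [sum(col) for col in zip(*ratings)]    # summed score per face
    return k / (k - 1) * (1 - sum(rater_vars) / variance(face_totals))

def collapse_to_means(ratings):
    """Per-face mean rating, plus the between-rater variance that the mean discards."""
    per_face = list(zip(*ratings))
    means = [mean(col) for col in per_face]
    discarded = [variance(col) for col in per_face]
    return means, discarded

# Hypothetical ratings: 3 raters x 5 faces on a 7-point adiposity scale
ratings = [
    [2, 4, 5, 3, 6],
    [3, 4, 6, 3, 7],
    [2, 5, 5, 4, 6],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")
means, discarded = collapse_to_means(ratings)
for j, (m, d) in enumerate(zip(means, discarded), start=1):
    print(f"face {j}: mean = {m:.2f}, between-rater variance lost = {d:.2f}")
```

For these made-up numbers alpha is 0.96, well above the 0.80 benchmark, yet every face still carries a between-rater variance of 1/3 that disappears once only the means are analysed. Multilevel models that keep individual ratings avoid this loss.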

## Estimating BMI From Facial Adiposity

One of the key requirements for using weight-related facial cues to make inferences about people's health is that people should be able to accurately judge body mass from facial cues alone. To our knowledge, no study has provided a quantitative synthesis of the link between facial adiposity and judgments of body weight across the studies published in the field. A meta-analysis was therefore conducted to evaluate the strength of the relationship between judgments of facial adiposity and BMI/percentage body fat.

## METHODS

#### Inclusion Criteria

All peer-reviewed journal articles that provided an effect size quantifying the relationship between perceived facial adiposity and measures of BMI or percentage body fat were included in the meta-analysis. Effect sizes based on the same facial stimuli as another included effect size were excluded from the analysis because of potential biases in heterogeneity estimates. For example, Coetzee et al. (2012) calculated correlation coefficients for the relationship between perceived facial adiposity and both BMI and percentage body fat for the same participants. Although two effect sizes can be extracted from this study, they would not be statistically independent because they were obtained from the same group. Similarly, Fisher et al. (2014a) used the same facial stimuli as Fisher et al. (2013), so the effect sizes obtained from these two studies would also not be statistically independent. The effect size for the relationship between facial adiposity and percentage body fat from Coetzee et al. (2012) and the study by Fisher et al. (2014a) were thus excluded from the analysis. No restrictions were placed on the age, gender or ethnicity of either the facial stimuli or the raters in the studies included in the meta-analysis.

#### Search Strategy

A systematic review protocol was developed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009). Because PRISMA protocols can easily be reproduced by other researchers, they help ensure the transparency and reliability of meta-analyses (Gurevitch et al., 2018). The use of PRISMA protocols is becoming increasingly common within evolutionary psychology (see for example Geniole et al., 2015; Gouda-Vossos et al., 2018). In line with the protocol developed for this study, all studies that reported an effect size for the relationship between perceived facial adiposity and BMI or percentage body fat were identified by searching the following databases for peer-reviewed articles containing the phrase "facial adiposity" (number of records in parentheses): PubMed (9), EBSCOhost (27), ScienceDirect (66), ProQuest (66), and Web of Science (21). After the databases had been searched, the reference lists of prominent published articles were also searched and any additional relevant abstracts were added. Results were entered into the open-source reference management software Zotero (version 5.0). Duplicate entries were removed, after which the authors screened the abstracts for relevance against the inclusion criteria. After this initial screening, the full-text articles of the remaining records were downloaded and re-screened for relevance. See **Figure 1** for the PRISMA flow diagram.

TABLE 4 | Summary of studies that investigated the relationship between facial adiposity and BMI/percentage body fat.

#### Data Extraction

A data extraction template was used to capture all the relevant data from the articles selected for inclusion in the meta-analysis. The data extraction sheets included detailed information on the stimuli, including their gender, age, BMI/percentage body fat and ethnicity, as well as the rating scale used and whether the facial stimuli were captured in 2D or 3D. Demographic information on the raters, including gender, age and ethnicity, was recorded as well. Finally, effect sizes were captured for all relevant studies.

#### Data Synthesis

All extracted effect sizes were converted to Pearson's r before any analyses were conducted. As discussed previously, studies often use the same facial stimuli or raters multiple times, creating a complex nested design. Because of this nested structure, a single study will often produce two or more outcomes derived from highly correlated or identical sources. If this statistical dependency is not adequately accounted for in the analysis, it can bias the variance estimates (Hunter and Schmidt, 2004). To account for within-study dependence, we aggregated all effect sizes within each study into a single estimated effect size using the "agg" function of the "MAc" package for R (Del Re and Hoyt, 2012; R Core Team, 2013). As information on the true correlations between within-study effect sizes was unavailable, all within-study effect size correlations were fixed at 0.50 (Wampold et al., 1997). This aggregation procedure produced seven effect size estimates, one for each study. Because of key differences between studies in the selection and presentation of facial stimuli, as well as in sample composition, meta-analyses were conducted using random-effects models (Borenstein et al., 2010; Quintana, 2015). The aggregated Pearson's r for each study was transformed to Fisher's z, because the sampling distribution of r is not normal; these estimates were back-transformed to r where appropriate when presenting the results. See **Table 4** for a summary of the studies included in the meta-analysis.
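The aggregation step can be illustrated with the standard composite formula for the mean of correlated effect sizes: for m effect sizes that each have sampling variance v and a common assumed correlation ρ, the mean has variance v[1 + (m − 1)ρ]/m. The sketch below applies this on the Fisher z scale, assuming a single sample size per study and ρ = 0.50 as in the text. It is an illustration only and not necessarily what the MAc `agg` function computes internally; the function name and example values are hypothetical.

```python
import math

def aggregate_dependent(rs, n, rho=0.5):
    """Combine m correlated effect sizes from one study into one estimate.

    Assumes all effects come from the same sample of size n (so each Fisher z
    has sampling variance 1 / (n - 3)) and that every pair of within-study
    effects correlates at `rho`. Returns the mean Fisher z and its variance.
    """
    zs = [math.atanh(r) for r in rs]       # Fisher's z for each effect size
    m = len(zs)
    v = 1.0 / (n - 3)                      # sampling variance of each z
    z_bar = sum(zs) / m
    # Var(mean) = (1/m^2) * [m*v + m*(m-1)*rho*v] = v * (1 + (m-1)*rho) / m
    var_bar = v * (1 + (m - 1) * rho) / m
    return z_bar, var_bar

# Hypothetical study reporting two correlations from the same 50 faces
z_bar, var_bar = aggregate_dependent([0.70, 0.60], n=50)
print(f"aggregated r = {math.tanh(z_bar):.3f}, variance of z = {var_bar:.5f}")
```

The choice of ρ matters: with ρ = 1 the composite is no more precise than a single effect size, and with ρ = 0 it is treated as if the effects were independent, so fixing ρ = 0.50 is a middle-ground assumption when the true value is unknown.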

## RESULTS

A random-effects model weighted by sample size was fitted using the "metafor" package in R (Viechtbauer, 2010). The analysis revealed a strong positive overall correlation between perceived facial adiposity ratings and BMI/percentage body fat [r = 0.71; 95% CI (0.66, 0.76), p < 0.001]. Heterogeneity statistics revealed no statistically significant between-study heterogeneity [Q = 7.12 (df = 6), p = 0.31; I<sup>2</sup> = 19.36%; τ<sup>2</sup> = 0.004; τ = 0.06]. According to guidelines published by Higgins et al. (2003), an I<sup>2</sup> below 25% indicates low between-study variability, and I<sup>2</sup> can serve as a more robust indicator of between-study variance than Q when the number of studies is small.
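The pooling and heterogeneity statistics reported above can be sketched in plain Python. This is a minimal illustration, not a re-analysis: it uses the DerSimonian-Laird estimator of τ<sup>2</sup>, whereas metafor's `rma()` defaults to REML, so the numbers would not match exactly, and the inputs are hypothetical rather than the Table 4 data. I<sup>2</sup> here follows Higgins' Q-based definition, (Q − df)/Q; applied to the reported Q = 7.12 with df = 6 it gives about 15.7%, so the reported 19.36% presumably reflects metafor's τ<sup>2</sup>-based definition of I<sup>2</sup>.

```python
import math

def random_effects_dl(rs, ns):
    """DerSimonian-Laird random-effects meta-analysis of correlations.

    Works on Fisher's z scale with within-study variance 1 / (n - 3) and
    returns the pooled r, its 95% CI and heterogeneity statistics.
    """
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]
    w = [1.0 / v for v in vs]                       # inverse-variance weights
    z_fe = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fe) ** 2 for wi, zi in zip(w, zs))  # Cochran's Q
    df = len(zs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                   # DL between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0   # Higgins' I^2 (proportion)
    w_re = [1.0 / (v + tau2) for v in vs]           # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    ci = (math.tanh(z_re - 1.96 * se_re), math.tanh(z_re + 1.96 * se_re))
    return {"r": math.tanh(z_re), "ci": ci, "Q": q, "df": df,
            "tau2": tau2, "tau": math.sqrt(tau2), "I2": i2}

# Hypothetical correlations and sample sizes for seven studies (illustration only)
rs = [0.65, 0.72, 0.78, 0.68, 0.74, 0.60, 0.70]
ns = [40, 60, 80, 50, 70, 90, 68]
res = random_effects_dl(rs, ns)
print(f"r = {res['r']:.2f}, 95% CI ({res['ci'][0]:.2f}, {res['ci'][1]:.2f})")
print(f"Q = {res['Q']:.2f} (df = {res['df']}), I2 = {100 * res['I2']:.1f}%, tau2 = {res['tau2']:.4f}")
```

Because the Fisher z variance is 1/(n − 3), inverse-variance weighting on the z scale is effectively weighting by sample size, which matches the description of the model in the text.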

A sensitivity analysis was conducted to investigate whether any of the studies included in the analysis contributed disproportionately to heterogeneity. Outlier and influential case diagnostics indicated that only Coetzee et al. (2012) (Study 6 in **Figure 2**) showed signs of being a potential outlier or influential case (Viechtbauer and Cheung, 2010). See **Figure 3** for the influential case diagnostics. To test whether removing Coetzee et al. (2012) would influence the overall model fit, the study was excluded and the model re-fitted. However, the re-fitted model did not change the interpretation of the overall results [r = 0.70; 95% CI (0.65, 0.76), p < 0.001]. A moderator analysis also revealed that the ethnicity of the facial stimuli (African or Caucasian) did not moderate the relationship between perceived facial adiposity ratings and BMI/percentage body fat [Q(1) = 0.50, p = 0.48].
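The panels of Figure 3 appear to correspond to the case diagnostics that metafor's `influence()` reports. A much simpler relative of those diagnostics is a leave-one-out analysis, which re-estimates the pooled correlation k times with one study omitted each time (metafor offers this as `leave1out()`). The sketch below assumes a DerSimonian-Laird random-effects model and hypothetical inputs; it is not the procedure behind Figure 3.

```python
import math

def pooled_r(rs, ns):
    """DerSimonian-Laird random-effects pooled correlation.

    Fisher z scale; the within-study variance of z is 1 / (n - 3), so the
    inverse-variance (fixed-effect) weight is simply n - 3.
    """
    zs = [math.atanh(r) for r in rs]
    w = [n - 3 for n in ns]
    z_fe = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fe) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(zs) - 1)) / c)
    w_re = [1.0 / (1.0 / wi + tau2) for wi in w]    # random-effects weights
    return math.tanh(sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re))

def leave_one_out(rs, ns):
    """Pooled r re-estimated k times, omitting one study each time (needs k >= 3)."""
    return [pooled_r(rs[:i] + rs[i + 1:], ns[:i] + ns[i + 1:])
            for i in range(len(rs))]

# Hypothetical data: the sixth study reports a noticeably larger correlation
rs = [0.70, 0.72, 0.68, 0.74, 0.69, 0.90, 0.71]
ns = [60] * 7
full = pooled_r(rs, ns)
for i, r_minus_i in enumerate(leave_one_out(rs, ns)):
    print(f"without study {i + 1}: r = {r_minus_i:.3f} (all studies: {full:.3f})")
```

A study whose omission shifts the pooled estimate much more than the others is a candidate influential case; as in the text, the practical question is whether the conclusion changes when it is removed.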

#### Risk of Bias

**Figure 4** shows a funnel plot of correlation coefficients plotted against standard errors for each study. Egger's regression test showed no evidence of asymmetry in the funnel plot. Because of the small number of studies (n = 7), funnel plot symmetry was difficult to assess visually, and trim-and-fill methods were therefore not used.
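Egger's test can be sketched in its classic form: regress each study's standardized effect (effect / SE) on its precision (1 / SE) by ordinary least squares; an intercept far from zero suggests funnel-plot asymmetry. This is a minimal illustration with hypothetical inputs. metafor's `regtest()`, which may be what the authors used, by default fits a related but different model (a meta-regression with the standard error as predictor), so results need not coincide. The sketch returns only the intercept's t statistic; a p-value would come from a t distribution with k − 2 degrees of freedom, omitted here to stay within the standard library.

```python
import math

def egger_test(zs, ses):
    """Classic Egger regression test for funnel-plot asymmetry.

    Regresses the standardized effect (z / se) on precision (1 / se) by
    ordinary least squares. An intercept far from zero suggests asymmetry.
    Returns (intercept, t statistic of the intercept, degrees of freedom).
    """
    y = [z / s for z, s in zip(zs, ses)]   # standardized effects
    x = [1.0 / s for s in ses]             # precisions
    k = len(x)
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in resid) / (k - 2)          # residual variance
    se_int = math.sqrt(sigma2 * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, intercept / se_int, k - 2

# Hypothetical Fisher z values in which smaller (less precise) studies report
# larger effects, i.e. the pattern a funnel plot would show as asymmetry
ses = [0.10, 0.12, 0.15, 0.20, 0.25, 0.30, 0.35]
zs = [0.85, 0.86, 0.88, 0.92, 0.95, 1.00, 1.05]
b0, t, df = egger_test(zs, ses)
print(f"intercept = {b0:.2f}, t({df}) = {t:.2f}")
```

With only seven studies the test has little power, which is consistent with the caution in the text about interpreting funnel-plot symmetry.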

## DISCUSSION

Facial adiposity has consistently been linked to perceptions of attractiveness and health, with heavier faces being judged as less attractive and less healthy. To date, facial adiposity has also been linked to a number of actual health outcomes, including the number and duration of colds and flu, frequency of antibiotic use, respiratory illness, blood pressure, cardiovascular illness, salivary progesterone, psychological wellbeing, arthritis, diabetes, circulating testosterone, immune function, and oxidative stress. While a strong relationship between facial adiposity, attractiveness, perceived health and actual health outcomes has been reported, the current evidence for facial adiposity as an important contributor to health and attractiveness judgments has a few limitations. Notably, the majority of published studies on facial adiposity used Caucasian students in their early to mid-twenties as both stimuli and raters, with only isolated studies on Asian, Hispanic and African faces. Batres and Perrett (2014) also found that people without internet access preferred female faces with higher levels of adiposity than people with internet access. This finding highlights a key shortcoming in the field: most published studies may provide a biased or incomplete picture of how attractiveness and health judgments are made from facial cues related to body weight. It is therefore important for researchers to expand the diversity of sampled populations in future studies, as environmental pressures and media exposure to Western weight ideals may differ substantially between regions and cultures. It is also recommended that studies take the socio-economic status of participants into account where possible.

FIGURE 3 | Plot of the (A) studentized deleted residuals; (B) DFFITS values; (C) Cook's distances; (D) covariance ratios; (E) estimates of τ<sup>2</sup>; (F) test statistics for (residual) heterogeneity; (G) hat values; and (H) weights for the 7 studies included in the analysis.

There also appear to be important differences in the judgments made by males and females regarding adiposity as a cue to health and attractiveness. For example, a link has been found between adiposity and immune responsiveness in male faces (Rantala et al., 2013a) but not in female faces (Rantala et al., 2013b). There is also at least some evidence that Western body ideals may affect males and females differently, with ideal attractiveness in females being associated with a lower BMI than ideal weight (Coetzee et al., 2011). Because research in the field relies heavily on cross-gender judgments of health and attractiveness, most studies would be well-advised to be sensitive to the nuanced differences in how people judge members of the opposite gender based on facial cues related to adiposity, especially across socio-economic or cultural boundaries.

In addition to socio-economic, cultural and gender factors, researchers also need to pay close attention to stimulus generation and study design when conducting research. The recent development of 3D facial imaging technology allows researchers to produce 3D facial images that more closely approximate the way people view another person's face in reality. This is especially important when participants are required to make judgments about facial shape, as 2D facial images contain less facial shape information than 3D images. Researchers also need to be mindful of the potential drawbacks of using traditional statistical procedures and inadequate sample sizes when research designs involve nested structures in which multiple participants judge multiple facial stimuli along multiple dimensions.

Despite the limitations presented in this paper, facial morphometrics is a burgeoning field that holds the potential for the development of reliable, inexpensive and non-invasive methods for disease detection and monitoring. For example, the Wize Mirror (Andreu et al., 2016) is a personal health monitor that integrates a 3D optical scanner, multispectral cameras and a gas detection sensor to collect data from individuals who stand in front of the mirror. One of the key features of the Wize Mirror is the use of facial morphometric analysis to predict cardio-metabolic risk. Studies by Kocabey et al. (2017) and Barr et al. (2018) also reported on the development of Face-to-BMI systems that predict BMI from facial images, for example those found on social media platforms. The results reported by these studies indicate that these computer models could predict actual BMI from facial cues alone with an accuracy very similar to that of human observers. With the rapid advancement of these types of technologies, exciting possibilities are opening up for the continuous monitoring of health risks associated with BMI.

As mentioned throughout this paper, one of the key conditions that needs to be met for facial adiposity to serve as a valid cue to attractiveness and health is that people must be able to accurately judge body mass from facial cues. The meta-analysis presented in this review found that participants can reliably estimate BMI from facial cues alone (r = 0.71, n = 458) and that the ethnicity of the rated faces did not moderate the relationship between perceived facial adiposity and estimates of body mass. This is an important finding, as it demonstrates that cues related to facial adiposity can be reliably detected by participants and used to make inferences about another person's body weight. Given the strong relationship between body weight and negative health outcomes, accurate judgment of facial cues related to body weight is a key factor in allowing us to make inferences about another person's health.

## AUTHOR CONTRIBUTIONS

SdJ and VC contributed conception and design of the study; SdJ and VC organized the database; SdJ performed the statistical analysis; SdJ and VC wrote the first draft of the manuscript. All authors contributed to manuscript revision, read and approved the submitted version.

## FUNDING

The preparation of this paper was financially supported by a Department of Higher Education and Training (DHET) Research Development Grant and a National Research Foundation (NRF) Competitive Programme for Rated Researchers (99078).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 de Jager, Coetzee and Coetzee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Influence of Body Composition Effects on Male Facial Masculinity and Attractiveness

#### Xue Lei<sup>1</sup> \*, Iris J. Holzleitner<sup>2</sup> and David I. Perrett<sup>1</sup>

<sup>1</sup> School of Psychology & Neuroscience, University of St Andrews, St Andrews, United Kingdom, <sup>2</sup> Institute of Neuroscience & Psychology, University of Glasgow, Glasgow, United Kingdom

Body mass index (BMI) and its facial correlates influence a range of perceptions including masculinity and attractiveness. BMI conflates body fat and muscle, which are sexually dimorphic because men typically have more muscle but less fat than women. We therefore investigated the influence of facial correlates of body composition (fat mass and muscle mass) on the perception of masculinity in male faces. Women have been found to prefer more masculine-looking men when considering short-term relationships compared with long-term relationships. We therefore conducted a second study of heterosexual women's preferences for facial correlates of fat and muscle mass in long-term and short-term relationship contexts. We digitally transformed face shape simulating the effects of raised and lowered levels of body fat or muscle, controlling for each other, height and age. In Study 1, participants rated masculinity of shape-transformed male faces. The face shape correlates of muscle mass profoundly enhanced perceived masculinity, but the face shape correlates of fat mass only affected the perception of masculinity in underweight to low normal weight men. In Study 2, we asked two groups of women to optimize male face images (by adjusting the shape correlates of fat and muscle) to most resemble someone they would prefer, either for a short-term sexual relationship or for a long-term relationship. The results were consistent across the two participant groups: women preferred the appearance of male faces associated with a higher muscle mass for short-term compared with long-term relationships. No difference was found in women's preference for the face shape correlates of fat mass between the two relationship contexts. These findings suggest that the facial correlates of body fat and muscle have distinct impacts on the perception of male masculinity and on women's preferences. 
The findings indicate that body composition needs to be taken into consideration in psychological studies involving body weight.

Keywords: body composition, fat, muscle, masculinity, face preference, short-term relationship, long-term relationship, relationship context

## INTRODUCTION

Research on women's preference for male facial masculinity over the past two decades is marked by inconsistent findings. Some studies found that masculine faces were preferred by women (e.g., Rhodes et al., 2003; DeBruine et al., 2006; Feinberg et al., 2008; Little et al., 2008; Saxton et al., 2009; Jones et al., 2018), whereas other studies have reported a preference for femininity in men (e.g., Perrett et al., 1998; Penton-Voak et al., 1999; Little et al., 2002; Scott et al., 2010), and yet other studies report no overall preference for sexual dimorphism (e.g., Swaddle and Reierson, 2002; Cornwell et al., 2004).

#### Edited by:

Alex L. Jones, Swansea University, United Kingdom

#### Reviewed by:

Karel Kleisner, Charles University, Czechia

Thomas Richardson, University of Manchester, United Kingdom

> \*Correspondence: Xue Lei xl55@st-andrews.ac.uk

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 12 October 2018 Accepted: 10 December 2018 Published: 04 January 2019

#### Citation:

Lei X, Holzleitner IJ and Perrett DI (2019) The Influence of Body Composition Effects on Male Facial Masculinity and Attractiveness. Front. Psychol. 9:2658. doi: 10.3389/fpsyg.2018.02658

Variability in methods has been proposed to account for the differences in results (Rhodes, 2006), yet by directly comparing commonly used methods to measure women's preferences for male facial masculinity, DeBruine et al. (2006) found that different methods can produce similar results. Alternatively, individual differences in self-rated attractiveness, relationship status, own health condition, exposure to violence, pathogen disgust sensitivity and resource availability might contribute to the variation in results (Holzleitner and Perrett, 2017). One factor that has been found to have a consistent effect on women's preference for male masculinity is relationship context. Studies that manipulate masculinity in male facial shape using computer graphics techniques find that women show a stronger preference for facial masculinity when choosing short-term partners than when choosing long-term partners (Little et al., 2002; Penton-Voak et al., 2003; Jones et al., 2018). In addition, this relationship context effect was more pronounced in women with partners and was not found in women taking hormonal contraceptive pills (Little et al., 2002). This preference for masculinity in short-term partners has been found across a range of stimuli and modalities, including face, body, voice, and odor (Little et al., 2011a).

Sexual Strategies Theory proposes that females have evolved distinct strategies to solve the different problems they may encounter when pursuing a short-term or long-term relationship (Buss and Schmitt, 1993). As women's reproductive success is restricted by the resources and protection they can obtain from men, women should prefer long-term partners who are more likely to provide paternal care, reliable resources and protection. Masculinity is perceptually associated with some negative personality traits, which might explain why women prefer less masculine men as long-term partners. Indeed, greater facial masculinity was found to increase perceived dominance and lower perceived paternal investment (Boothroyd et al., 2007), and to decrease perceived trustworthiness (Perrett et al., 1998). Complementing these findings, several studies have found that high testosterone (an androgen contributing to male sexual dimorphism) is associated with a lower likelihood of marriage, higher divorce rates and more frequent domestic disputes (Julian and McKenry, 1989; Booth and Dabbs, 1993; Booth et al., 2000). Hence, less masculine men may be advantageous as long-term partners.

In short-term relationships, women need not be constrained by considerations of paternal investment. Therefore, partner selection may be guided by cues to long-term health and to 'good genes' for immunity against currently prevalent pathogens that can be passed on to offspring (Gangestad et al., 2005). Masculinity is argued to be one such cue under the immunocompetence handicap hypothesis (Folstad and Karter, 1992). This hypothesis states that testosterone has an immunosuppressive effect, so masculine men need a strong immune system to withstand it. Masculinity may therefore signal a strong immune system in men. Although studies examining the relationship between testosterone and immune function have produced mixed results, a recent cross-species meta-analysis found a medium-sized effect in experimental studies that elevated testosterone artificially and observed a concomitant decline in immune function (Foo et al., 2017).

While a considerable number of studies have focused on the role of testosterone in suppressing immune function, testosterone also plays a key role in maintaining men's cardiovascular health. A deficiency in testosterone is associated with increased central adiposity, reduced insulin sensitivity, impaired glucose tolerance and increased cholesterol, all of which are found in metabolic syndrome and type 2 diabetes and are detrimental to cardiovascular health (Kelly and Jones, 2013). Although it is debated whether low testosterone causes cardiovascular disease directly or whether decreased testosterone is a by-product of poor health, clinical studies have found that testosterone replacement therapy is effective in improving health in men with metabolic syndrome (Elagizi et al., 2018). If masculinity is heritable, it may be a cue both to current health and to genes for good health.

Despite the prolific research on the effect of masculine traits (e.g., faces, voices, odors) on attractiveness, few studies have explored the role that muscle plays. This is surprising given that a high ratio of muscle mass to fat mass is a typical masculine feature in humans (Wells, 2007), because testosterone promotes both muscle and bone growth (Mooradian et al., 1987). Thus, measures of muscle might be strong cues to masculinity. It follows that men with high muscle mass may be expected to be preferred by women, especially for short-term relationships, as women prefer more masculine-looking men for short-term relationships. Indeed, muscular men were found to be preferred by women and to have greater mating success (Frederick and Haselton, 2007).

Besides the close relationship between testosterone and muscle mass, muscularity may influence the perception of masculinity through its association with body size, which is also sexually dimorphic: men are on average heavier than women. Indeed, the faces of men with higher body mass index (BMI; weight divided by the square of height) are perceived as more masculine than those of men with low BMI (Holzleitner et al., 2014). Therefore, muscular men may be perceived as masculine partly because they are heavier. Since body weight is mainly composed of fat and muscle, this raises the question of whether fat mass has a similar effect to muscle mass on male masculinity and attractiveness.
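The BMI definition in parentheses can be written out directly. This is a minimal sketch, assuming weight in kilograms and height in metres (the usual units for BMI, which the text does not restate here); the example values are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by the square of height (kg/m^2)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical example: two men of the same height but different weight.
print(round(bmi(70.0, 1.80), 1))  # 21.6
print(round(bmi(80.0, 1.80), 1))  # 24.7
```

The heavier man has the higher BMI whether his extra 10 kg is fat or muscle, which is why BMI alone cannot separate the two components discussed above.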

To our knowledge, only one study has explored the role of body composition in the perception of attractiveness in male bodies (Brierley et al., 2016). Its results suggest that women most prefer men whose levels of body fat and muscle mass fall within the healthy BMI range. That study did not consider the relationship context in which attractiveness was judged. More importantly, no study has tested the effects of facial correlates of body composition (fat and muscle) on the perception of masculinity and facial attractiveness. Humans rely more heavily on facial attractiveness than on bodily attractiveness when choosing mates (Currie and Little, 2009). In fact, when given the choice, women gave priority to men's faces over their bodies when judging dating partners for both short- and long-term relationships (Confer et al., 2010). These findings highlight the importance of investigating the effect of facial cues to body composition on attractiveness.


In the current studies, we examine (a) the impact of facial correlates of body composition (fat and muscle) on perceived male facial masculinity, and (b) how the facial correlates of body composition influence women's preference for male faces under short-term and long-term relationship contexts.

Considering that testosterone encourages the growth of muscle, we predict that the facial correlate of muscle mass will be positively correlated with perceived facial masculinity (Hypothesis 1). Since men are heavier than women, a heavier body, whether the extra weight is due to fat mass or muscle mass, may lead to higher perceived masculinity. We thus predict that the facial correlate of fat mass will also contribute positively to the perception of male facial masculinity (Hypothesis 2). Nevertheless, we expect the face shape correlate of muscle to have a larger effect on perceived facial masculinity than the face shape correlate of fat, because muscle is more strongly associated with testosterone than fat is (Hypothesis 3).

Regarding facial preferences, we predict that women will show a stronger preference for facial cues to increased muscle mass in a short-term relationship context than in a long-term relationship context (Hypothesis 4). Similarly, we predict a stronger preference for facial cues to increased fat mass in short-term than in long-term relationships (Hypothesis 5). We also predict that the relationship context effect on preferences will be more apparent for the facial correlates of muscle than for the facial correlates of fat (Hypothesis 6). These hypotheses about preferences follow from Hypotheses 1–3, since higher weight, particularly from muscle, is expected to increase masculinity.

#### MATERIALS AND METHODS

#### Stimuli

To examine the generalizability of findings, we included three sets of faces. The first set consisted of three-dimensional (3D) face scans of 50 Caucasian men (Mage ± SD = 21.2 ± 2.5 years; see Holzleitner and Perrett, 2016), collected using a 3D camera and delineated with 49 landmarks using MorphAnalyser software. A second set of two-dimensional (2D) images, matched to the 3D scans, was also available for the same 50 men (hereafter referred to as the 2D version of the 3D face set). These 2D images were captured under constant lighting using a Fujifilm FinePix S5Pro digital SLR camera (60 mm fixed focal length lens) in a booth painted with standard white paint. Facial images were captured in full color with participants' hair pulled back. Participants were seated at a set distance from the camera, with their eyes at the same height relative to the camera, and were asked to maintain a neutral expression. Faces were delineated in PsychoMorph<sup>1</sup> with 189 landmarks and aligned on the left and right pupils (Tiddeman et al., 2001).

A further independent set of 2D face images was collected from 101 Caucasian male participants (Mage ± SD = 21.44 ± 3.33 years) who were recruited from the University of St Andrews. The participants contributing to the 3D face set and matched 2D face set did not contribute to the independent 2D face set.

#### Anthropometric Measurements

Anthropometric data were acquired after removing excess clothing and footwear. Each individual's height was measured with a tape measure (stadiometer), and body composition was measured barefoot using an electrical impedance scale (Tanita SC-330 body composition analyzer), which estimates weight, BMI, fat mass and muscle mass (lean fat-free mass). These estimations take into account information about athletic training (>10 h/week) and norms for each gender. The indicator 'muscle mass' refers to an estimate of the weight of fat-free mass excluding bone mass, and includes contributions from skeletal muscles, smooth muscles and cardiac muscles.
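The definition of 'muscle mass' above amounts to simple bookkeeping: fat-free mass is body weight minus fat mass, and the analyzer's muscle mass is fat-free mass minus bone mass. The sketch below shows only that arithmetic; how the analyzer estimates each component from impedance (including its athlete and gender norms) is not described in the text and is not reproduced here, and the readings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BodyComposition:
    weight_kg: float
    fat_mass_kg: float
    bone_mass_kg: float

    @property
    def fat_free_mass_kg(self) -> float:
        # Everything in body weight that is not fat.
        return self.weight_kg - self.fat_mass_kg

    @property
    def muscle_mass_kg(self) -> float:
        # 'Muscle mass' as defined in the text: fat-free mass excluding bone.
        return self.fat_free_mass_kg - self.bone_mass_kg


# Hypothetical reading for one participant.
reading = BodyComposition(weight_kg=75.0, fat_mass_kg=12.0, bone_mass_kg=3.2)
print(reading.fat_free_mass_kg)          # 63.0
print(round(reading.muscle_mass_kg, 1))  # 59.8
```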

#### Face Transformation

The method used to transform the face shape involves defining the difference in face shape between two groups of faces differentiated along one dimension (e.g., high/low BMI, see Holzleitner and Perrett, 2016; Batres and Perrett, 2017). The difference is then applied to individual face images.
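With landmark coordinates stored as arrays, the shape difference between two prototypes is a displacement for every landmark, and a proportion of it is added to (or subtracted from) an individual face. A minimal NumPy sketch, assuming the faces are already aligned and share the same landmark ordering; the function name `apply_shape_transform` and the toy coordinates are hypothetical, and the image warping performed by the face-processing software is not shown.

```python
import numpy as np


def apply_shape_transform(face: np.ndarray, low_proto: np.ndarray,
                          high_proto: np.ndarray, weight: float) -> np.ndarray:
    """Shift a face's landmarks along the low-to-high prototype difference.

    face, low_proto, high_proto: arrays of shape (n_landmarks, n_dims),
    e.g. (189, 2) for the 2D images or (49, 3) for the 3D scans.
    weight: proportion of the difference to add (negative values subtract it).
    """
    if not (face.shape == low_proto.shape == high_proto.shape):
        raise ValueError("all shapes must have identical landmark layouts")
    difference = high_proto - low_proto
    return face + weight * difference


# Toy example with three 2D landmarks.
low = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
high = np.array([[-1.0, 0.0], [11.0, 0.0], [5.0, 9.0]])
face = np.array([[0.5, 0.2], [9.5, 0.1], [5.0, 8.5]])
print(apply_shape_transform(face, low, high, 0.5))   # toward the high prototype
print(apply_shape_transform(face, low, high, -0.5))  # toward the low prototype
```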

Prototypes associated with high or low fat mass or muscle mass were first created separately for 2D and 3D faces. Prototypes were made by averaging the nine faces from the 3D face set (and the matched 2D version of the 3D face set) ranked highest and, separately, the nine ranked lowest on the fat mass or muscle mass dimension. This allows a direct comparison between 2D and 3D faces. Since larger individuals usually have higher absolute fat mass and muscle mass than smaller individuals, fat prototypes were created with age, height and muscle mass controlled. Similarly, muscle prototypes were created with age, height and fat mass controlled. Therefore, each pair of prototypes differed in either fat mass or muscle mass, but not in both (see **Supplementary Material Table S1** for details). For the independent 2D face set, prototypes were created in the same way from the 10 faces ranked highest and lowest on fat or muscle mass.
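The text does not say exactly how age, height and the other tissue mass were controlled when ranking faces. One common approach, assumed here purely for illustration, is to regress the target (e.g., fat mass) on the covariates, rank faces by the residuals, and average the landmark sets of the k highest- and k lowest-ranked faces (k = 9 for the 3D set and its 2D version, k = 10 for the independent 2D set). The function name and the random data are hypothetical.

```python
import numpy as np


def extreme_prototypes(shapes, target, covariates, k):
    """Average the k lowest and k highest faces on a covariate-adjusted target.

    shapes: (n_faces, n_landmarks, n_dims) aligned landmark coordinates.
    target: (n_faces,) e.g. fat mass in kg.
    covariates: (n_faces, n_covariates) e.g. age, height, muscle mass.
    Returns (low_prototype, high_prototype).
    """
    # Design matrix with an intercept column; least-squares fit of target on covariates.
    X = np.column_stack([np.ones(len(target)), covariates])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef   # variation in the target not explained by covariates
    order = np.argsort(residuals)   # ascending residuals
    low = shapes[order[:k]].mean(axis=0)
    high = shapes[order[-k:]].mean(axis=0)
    return low, high


rng = np.random.default_rng(0)
n = 50
shapes = rng.normal(size=(n, 49, 3))   # stand-in for 50 aligned 3D scans
covariates = rng.normal(size=(n, 3))   # stand-in for age, height, muscle mass
fat = covariates @ np.array([0.2, 0.5, 0.3]) + rng.normal(size=n)
low_proto, high_proto = extreme_prototypes(shapes, fat, covariates, k=9)
print(low_proto.shape, high_proto.shape)  # (49, 3) (49, 3)
```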

The fat and muscle prototypes were then used to create shape transforms of five Caucasian male faces. Face shapes were transformed to visualize body composition (fat/muscle mass) differences by adding or subtracting a proportion of the facial shape differences between low and high fat/muscle prototypes. To make the fat- and muscle- transformed images comparable, facial shapes were transformed to the same magnitude in terms of BMI (±4 BMI units) in 15 steps. This process created three sets of transformed images (using 3D prototypes, 2D version of 3D prototypes and an independent set of 2D prototypes). Each set of transformed images consisted of five identities transformed to lose/gain fat/muscle mass (**Figures 1**–**3**). For 3D images, both the front view and the half-profile view were created in the transformation process. These two views were combined in one image (**Figure 1**).
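If the 15 steps spanning ±4 BMI units are evenly spaced (an assumption, though it matches the ±2.3 BMI unit levels used in Study 1, since −4 + 3 × 8/14 ≈ −2.29), the levels can be generated as below. The sketch also shows one way to convert a change in BMI units into a proportion of the prototype difference, assuming for illustration only that shape change is linear in BMI and that the BMI gap between the two prototypes is known; the gap of 5 BMI units is hypothetical.

```python
import numpy as np

# 15 evenly spaced transform levels from -4 to +4 BMI units (assumed spacing).
levels = np.linspace(-4.0, 4.0, 15)
print(np.round(levels, 2))
# The levels used in Study 1 correspond to indices 0, 3, 7, 11 and 14.
print(np.round(levels[[0, 3, 7, 11, 14]], 2))


def weight_for_bmi_change(delta_bmi: float, prototype_bmi_gap: float) -> float:
    """Proportion of the low-to-high prototype difference yielding delta_bmi.

    Assumes shape change is linear in BMI (illustrative assumption).
    """
    return delta_bmi / prototype_bmi_gap


# Hypothetical: the high and low prototypes differ by 5 BMI units.
print(weight_for_bmi_change(4.0, 5.0))  # 0.8
```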

All images were masked against a black background to display only the face and neck and to remove confounds arising from hair (DeBruine et al., 2010). 2D images were aligned to have the same pupil positions and resized to 500 × 500 pixels.
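Aligning images on the pupils amounts to a two-point similarity transform (translation, rotation and uniform scaling) that maps each face's two pupils onto fixed positions. A minimal sketch using complex numbers for 2D points; the target pupil coordinates in the 500 × 500 frame are hypothetical, since the text does not give them, and only landmark coordinates (not pixels) are transformed here.

```python
def pupil_alignment(left_src, right_src, left_dst, right_dst):
    """Return a function mapping (x, y) points so both pupils land on targets.

    A 2D similarity transform z -> a*z + b (complex a encodes rotation and
    scale, complex b encodes translation) is fully determined by two point pairs.
    """
    p1, p2 = complex(*left_src), complex(*right_src)
    q1, q2 = complex(*left_dst), complex(*right_dst)
    if p1 == p2:
        raise ValueError("pupil positions must differ")
    a = (q2 - q1) / (p2 - p1)
    b = q1 - a * p1

    def transform(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return transform


# Hypothetical targets: pupils 100 px apart at mid-height of a 500 x 500 frame.
align = pupil_alignment((210.0, 260.0), (330.0, 250.0), (200.0, 250.0), (300.0, 250.0))
print(align((210.0, 260.0)))  # left pupil maps (up to rounding) onto its target
print(align((330.0, 250.0)))  # right pupil maps (up to rounding) onto its target
```

Every other landmark goes through the same transform, so the face keeps its shape while its position, size and tilt are standardized.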

<sup>1</sup>http://users.aber.ac.uk/bpt/jpsychomorph/

FIGURE 1 | 3D Male face shape associated with fat mass (A) and muscle mass (B). Individual faces (middle) were transformed to reflect face shapes associated with less fat/muscle mass (–4 BMI units, top) or more fat/muscle mass (+4 BMI units, bottom) based on the difference in the face shape between low and high fat/muscle prototypes for the 3D face set. Front and half-profile views of the same face are displayed. The participant gave written informed consent for the publication of his image and use in the experiments.

## STUDY 1: FACIAL CORRELATES OF BODY COMPOSITION AND PERCEIVED MASCULINITY

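This study aimed to test whether facial correlates of body composition (fat mass and muscle mass) influence perceived facial masculinity in men. We tested Hypotheses 1–3: that the facial correlate of muscle mass is positively associated with perceived facial masculinity (Hypothesis 1), that the facial correlate of fat mass is also positively associated with perceived facial masculinity (Hypothesis 2), and that the facial correlate of muscle mass has a larger effect on perceived facial masculinity than the facial correlate of fat mass (Hypothesis 3).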

## Methods

Ethical approval was received from University of St Andrews Ethics Committee (PS13092). Participants gave written informed consent to perform the experimental tasks.

#### Participants

Sixty-seven students from the University of St Andrews (Mage ± SD = 19.37 ± 3.84 years, range 18–45), including 56 females and 9 males (two participants did not report demographics; 51 were Caucasian), completed this study.

#### Materials

Stimuli consisted of three face identities transformed to four levels (−4 BMI units, −2.3 BMI units, +2.3 BMI units, +4 BMI units) plus the untransformed image (+0 BMI units). Therefore, there was a total of 81 stimuli: 3 (face identities) × 3 (face sets: 3D face set, matched 2D version of 3D face set, independent 2D face set) × 9 [4 BMI levels × 2 dimensions (fat and muscle) + original face].
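The stimulus count follows from crossing the design factors. In the sketch below the identity labels are hypothetical placeholders; the factor levels are those listed above.

```python
from itertools import product

identities = ["id1", "id2", "id3"]  # hypothetical labels for the three identities
face_sets = ["3D", "2D version of 3D", "independent 2D"]

# 4 transformed levels on each of 2 dimensions, plus the untransformed face.
transformed = list(product(["fat", "muscle"], [-4.0, -2.3, 2.3, 4.0]))
conditions = transformed + [("original", 0.0)]

stimuli = list(product(identities, face_sets, conditions))
print(len(conditions), len(stimuli))  # 9 81
```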

#### Procedure

Participants were asked to complete a demographic questionnaire (age, sex, ethnicity, and sexual orientation). Faces were then presented one at a time in three blocks (each block consisted of one face set with both muscle and fat transforms). The order of trials within blocks and the order of the three blocks were both randomized. Participants rated the masculinity of each stimulus face ("Please indicate how masculine you perceive this man to be") by dragging the cursor on a sliding bar with anchors (1 = least masculine and 7 = most masculine). The starting point of the cursor along the bar was randomized. There was no time limit to make judgments. The next face was shown only after the participant had adjusted the slider and clicked for the next trial.

#### Statistical Analysis

For each stimulus type, the mean rating across face identities was calculated for each participant. These consolidated data were analyzed in SPSS 24.0 using a three-way repeated-measures analysis of variance (ANOVA), with transform dimension (fat/muscle) and transform level (five levels: −4 BMI units, −2.3 BMI units, no change, +2.3 BMI units, +4 BMI units) as independent variables. Face set (three sets) was included as an additional independent variable to determine whether results were consistent across the different samples of faces.
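The per-participant averaging step (collapsing ratings across face identities before the ANOVA) can be sketched with the standard library. The record layout and the ratings below are hypothetical; the text specifies only that means were taken across identities for each participant and stimulus type.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial-level ratings:
# (participant, face_set, dimension, level, identity, rating on the 1-7 scale)
trials = [
    ("p01", "3D", "muscle", 4.0, "id1", 6.0),
    ("p01", "3D", "muscle", 4.0, "id2", 5.0),
    ("p01", "3D", "muscle", 4.0, "id3", 5.5),
    ("p01", "3D", "fat", 4.0, "id1", 4.0),
    ("p01", "3D", "fat", 4.0, "id2", 4.5),
    ("p01", "3D", "fat", 4.0, "id3", 3.5),
]

# Group ratings by design cell, ignoring identity.
cells = defaultdict(list)
for participant, face_set, dimension, level, _identity, rating in trials:
    cells[(participant, face_set, dimension, level)].append(rating)

# One mean per participant per cell: the input to the repeated-measures ANOVA.
cell_means = {key: mean(ratings) for key, ratings in cells.items()}
for key, value in sorted(cell_means.items()):
    print(key, round(value, 2))
```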

#### Results


A three-way ANOVA was run to test the effects of the fat and muscle transforms on masculinity ratings across the three face sets. The results showed non-significant main effects of transform dimension [F(1,66) = 0.44, p = 0.507, η <sup>2</sup> = 0.007] and face set [F(2,132) = 0.94, p = 0.392, η <sup>2</sup> = 0.014] on masculinity ratings, but a significant main effect of transform level [F(4,264) = 74.80, p < 0.001, η <sup>2</sup> = 0.531] (see **Table 1**). As face shape simulated heavier individuals (higher BMI), masculinity ratings increased. The interaction between transform dimension and face set was not significant [F(2,132) = 0.41, p = 0.665, η <sup>2</sup> = 0.006], but a significant interaction was found between transform dimension and transform level [F(4,264) = 24.75, p < 0.001, η <sup>2</sup> = 0.273], reflecting a greater impact of the muscle transform than the fat transform on masculinity.
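The effect sizes reported with each F statistic can be recovered from F and its degrees of freedom using the formula for partial eta squared, η²p = (F × df_effect) / (F × df_effect + df_error). Checking several values from Study 1 against this formula suggests the reported η² values are partial eta squared; treat this as an inference from the numbers rather than a statement by the authors.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Values reported in Study 1: (F, df_effect, df_error, reported eta^2)
reported = [
    (0.44, 1, 66, 0.007),    # transform dimension
    (74.80, 4, 264, 0.531),  # transform level
    (24.75, 4, 264, 0.273),  # dimension x level
    (2.61, 8, 528, 0.038),   # face set x level
]
for f, df1, df2, eta in reported:
    computed = partial_eta_squared(f, df1, df2)
    print(f"F({df1},{df2}) = {f}: computed {computed:.3f}, reported {eta}")
```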

There was a significant interaction between face set and transform level [F(8,528) = 2.61, p = 0.008, η <sup>2</sup> = 0.038]. Further, the three-way interaction among transform dimension, transform level, and face set was significant [F(8,528) = 2.17, p = 0.028, η <sup>2</sup> = 0.032]. To understand the three-way interaction, we conducted two-way ANOVAs separately for each face set.

#### 3D Face Set

For 3D faces, the main effect of transform dimension was non-significant [F(1,66) = 1.36, p = 0.252, η <sup>2</sup> = 0.020]. There was a significant main effect of transform level [F(4,264) = 31.17, p < 0.001, η <sup>2</sup> = 0.321], which was qualified by an interaction between transform dimension and transform level [F(4,264) = 4.40, p = 0.002, η <sup>2</sup> = 0.062, see **Figure 4**]. Paired-samples t-tests showed significant increases in masculinity ratings between all levels of the muscle transform (p ≤ 0.004 each comparison) except between 0 and +2.3 BMI units (p = 0.186). By contrast, masculinity ratings did not increase significantly across fat transform levels at or above normal weight (0, +2.3, and +4 BMI units, p ≥ 0.337 each comparison), although faces associated with decreased fat mass were rated significantly less masculine than faces associated with increased fat mass (p ≤ 0.005 each comparison). These findings provide further support for Hypothesis 3, that the facial correlate of muscle mass increases perceived facial masculinity more than the facial correlate of fat mass.
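A paired-samples t-test compares two ratings from the same participants, so it reduces to a one-sample test on each participant's difference score: t = mean(d) / (sd(d)/√n), with n − 1 degrees of freedom. A minimal standard-library sketch; the ratings are hypothetical, and the p-value (which requires the t distribution, e.g. from SciPy) is omitted.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t statistic and degrees of freedom (after minus before)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(before, after)]
    se = stdev(diffs) / sqrt(len(diffs))  # stdev uses the n - 1 denominator
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical masculinity ratings (1-7 scale) from five participants at two muscle levels.
level_0 = [2.0, 3.0, 2.5, 2.0, 1.5]
level_plus4 = [3.0, 5.0, 5.5, 6.0, 6.5]
t, df = paired_t(level_0, level_plus4)
print(round(t, 3), df)  # 4.243 4
```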

#### 2D Version of 3D Face Set

For the 2D version of the 3D face set, there was no main effect of transform dimension [F(1,66) = 0.05, p = 0.833, η <sup>2</sup> = 0.001]. The main effect of transform level was significant [F(4,264) = 50.85, p < 0.001, η <sup>2</sup> = 0.435] but was qualified by a significant interaction between transform dimension and transform level [F(4,264) = 8.63, p < 0.001, η <sup>2</sup> = 0.116, see **Figure 5**]. Paired-samples t-tests showed that each increase of ∼2 BMI units in muscle mass significantly increased masculinity ratings throughout the range (−4 to +4 BMI units, p ≤ 0.002 each comparison). Significant increases in masculinity ratings with the fat mass transform were seen in most comparisons (p ≤ 0.014 each comparison), but not in comparisons between faces associated with increased fat mass [0 vs. +2.3 BMI units (p = 0.170) and +2.3 vs. +4 BMI units (p = 0.070)]. These findings are again in line with our prediction that facial correlates of both fat mass and muscle mass positively influence perceived facial masculinity, but that the facial correlate of muscle mass has a larger impact on masculinity.

#### Independent 2D Face Set

For face transforms based on the independent 2D face set, the main effect of the transform dimension was non-significant [F(1,66) = 0.02, p = 0.888, η <sup>2</sup> = 0.000]. A significant main effect of transform level [F(4,264) = 34.89, p < 0.001, η <sup>2</sup> = 0.346] reflected faces associated with increased mass (fat or muscle) being considered more masculine.

The interaction between transform dimension and transform level was significant [F(4,264) = 15.82, p < 0.001, η <sup>2</sup> = 0.193, see **Figure 6**], reflecting a greater impact of muscle than fat on masculinity ratings. Paired-samples t-tests showed that participants rated faces with higher muscle mass as significantly more masculine in comparisons between all five levels (p ≤ 0.017 each comparison). In contrast, faces associated with higher fat mass were rated significantly more masculine only in comparisons between faces with decreased fat mass (−4 BMI units, −2.3 BMI units) and the other levels (p ≤ 0.046 each comparison). There were no significant differences in masculinity ratings among fat transforms of 0, +2.3, and +4 BMI units (p ≥ 0.270 each comparison). Thus, as fat mass increased from low to normal weight, masculinity increased, but gains in fat above normal weight produced no significant change in masculinity ratings. These findings support our hypothesis that the facial correlate of muscle mass enhances perceived facial masculinity more than the facial correlate of fat mass.

The three-way interaction between face set, transform dimension and transform level arises from differences in the relative size of the muscle and fat transform effects across the three face sets: the fat and muscle differences were most subtle in the 3D face set, although the pattern was similar for each face set.

#### Discussion

As expected, facial correlates of fat mass and muscle mass both positively affected perceived facial masculinity in men. The results are consistent with the finding of Holzleitner et al. (2014) that heavier men are perceived as more masculine. As hypothesized, the facial correlate of muscle mass enhanced the perception of masculinity more than the facial correlate of fat mass. Specifically, increasing the face shape correlate of muscle mass resulted in higher ratings of facial masculinity across the full weight range (BMI range 18–26). By contrast, increasing the face shape correlate of fat mass only raised masculinity ratings from low to normal weight (BMI = 18–22). Further increases in fat mass above normal weight (BMI = 22) had little or no impact on the perception of masculinity. These results imply that the effect of fat on masculinity is more pronounced in men with underweight to normal weight bodies.

## STUDY 2: ATTRACTION TO THE FACIAL CORRELATES OF BODY COMPOSITION

Study 1 found that facial correlates of both fat mass and muscle mass contribute to perceived facial masculinity, which has been found to affect the perception of attractiveness. In this part of the study, we tested the relationship between facial correlates of body composition and facial attractiveness.

As discussed above, women prefer higher levels of masculinity more for short-term relationships than for long-term relationships. Hence, we measured heterosexual women's preferences for facial correlates of body composition in male faces under short-term and long-term relationship contexts. Given the finding in Study 1 that the facial correlate of muscle mass increases perceived facial masculinity, we predicted that women would show a stronger preference for the facial correlate of muscle mass in a short-term than in a long-term relationship context (Hypothesis 4). Regarding fat mass, in the Introduction we hypothesized that women would show a stronger preference for higher fat mass in short-term than in long-term relationships. In light of the masculinity ratings in Study 1, this hypothesis should be modified, because the facial correlate of fat mass increased masculinity only up to a BMI of about 22. Specifically, if women show an overall preference for men with a BMI below 22, we predict that they will prefer a face shape associated with more fat mass for a short-term relationship than for a long-term relationship (Hypothesis 5a). Conversely, if they prefer men with a BMI above 22, we predict that women will not shift their preference for the facial correlate of fat mass between short-term and long-term relationships (Hypothesis 5b). Nevertheless, we predict that the preference shift between short-term and long-term contexts will be more apparent for the facial correlate of muscle mass than for the facial correlate of fat mass (Hypothesis 6).

This study was initially administered together with Study 1 to university students as a single experiment consisting of two tasks (preference and masculinity rating), with the preference task completed before the masculinity task. Because students form a highly homogeneous group in terms of age and educational background, the study was repeated with a more heterogeneous group to test the generalizability of the findings. Hence, we recruited a second group of participants through the online recruitment platform Amazon Mechanical Turk (MTurk).

#### Methods

Ethical approval was received from University of St Andrews Ethics Committee (PS13176 and PS13092). Participants gave written informed consent to perform the experimental tasks.

#### Participants

For the student group, 63 heterosexual female participants (Mage ± SD = 18.94 ± 2.17, range 18–35 years; 48 Caucasian) completed this study after exclusion of participants who did not provide demographic information (age, sex, ethnicity, and sexual orientation) or who reported being homosexual or male.

For the MTurk group, 58 heterosexual women (Mage ± SD = 32.09 ± 6.68, range 22–45 years; 43 Caucasian) completed this study after exclusion using the same criteria as for the student group, plus an additional age criterion. Ten women over 45 years of age were excluded because our prediction was based on the assumption that the key benefit women gain from short-term relationships concerns potential reproductive success. MTurk participants were paid \$3 for their time.

#### Materials

The stimuli consisted of face images transformed as described above. For each face identity, 15 images were produced spanning ±4 BMI units on the fat mass and muscle mass dimensions, and these 15 images were presented as an interactive continuum. For MTurk workers, a total of 30 face continua [5 face identities × 2 dimensions (fat/muscle) × 3 face sets (3D face set, 2D version of 3D face set, independent 2D face set)] were presented twice, in separate trial blocks asking about preferences for a short-term sexual relationship and a long-term relationship. For the student group, only three face identities were used, so 18 face continua [3 identities × 2 dimensions (fat/muscle) × 3 face sets] were presented in each of the two trial blocks.
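The number of continua per block, and of trials across the two relationship-context blocks, follows from the design factors above. A small sketch; the function name is hypothetical.

```python
from itertools import product

dimensions = ["fat", "muscle"]
face_sets = ["3D", "2D version of 3D", "independent 2D"]
contexts = ["short-term", "long-term"]  # one trial block per context


def n_trials(n_identities: int) -> tuple[int, int]:
    """Return (continua per block, total trials across both context blocks)."""
    continua = list(product(range(n_identities), dimensions, face_sets))
    return len(continua), len(continua) * len(contexts)


print(n_trials(5))  # MTurk: (30, 60)
print(n_trials(3))  # students: (18, 36)
```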

#### Procedure

At the beginning of this study, participants were asked, "Please indicate the sex of face that you would like to see (as a sexual partner)." (Note: female faces were also offered as an option for heterosexual male, homosexual, and bisexual female participants to view, but data from these faces are not analyzed here.) Participants' demographic information (age, sex, ethnicity, and sexual orientation) was collected in an initial questionnaire. Participants were then presented with the stimuli twice, in two blocks. They were asked to adjust the slider underneath each stimulus to make the face most resemble someone they would find attractive as a short-term (sexual) partner in one block and as a long-term partner in the other. The order of the tasks was counterbalanced. Trials with 2D and 3D face stimuli were also grouped in two separate sub-blocks. The order of sub-blocks and the presentation order within each sub-block were randomized. The scroll direction to change the face shape was randomized across trials. The next image was shown only when participants had adjusted the slider and clicked the submit button. For each trial, the BMI level chosen by each participant was saved.

FIGURE 8 | The interaction between relationship context (short-term vs. long-term) and preferred facial correlates of body composition (fat mass and muscle mass) in student participants. The vertical axis represents the associated BMI of the most preferred faces. Error bars represent standard errors.

Instructions were given prior to the tasks as follows: (a) Short-term (sexual) relationship: "Please change the face to most resemble someone you would find attractive for a SHORT-TERM (sexual) relationship." (b) Long-term relationship: "Please change the face to most resemble someone you would find attractive for a LONG-TERM relationship."
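Each interactive trial maps a slider position onto one of the 15 transform levels, with the scroll direction randomized so that moving the slider one way does not always mean heavier. A minimal sketch of that bookkeeping, assuming a slider with 15 discrete positions, evenly spaced levels, and that the saved value is the transform level in BMI units; the function names and the seed are hypothetical, and the conversion to an absolute BMI is not shown.

```python
import random

N_STEPS = 15
# Evenly spaced transform levels from -4 to +4 BMI units (assumed spacing).
LEVELS = [-4.0 + i * 8.0 / (N_STEPS - 1) for i in range(N_STEPS)]


def make_trial(rng: random.Random) -> dict:
    """Set up one trial with a randomly chosen scroll direction."""
    return {"reversed": rng.random() < 0.5}


def chosen_level(trial: dict, slider_position: int) -> float:
    """Convert the final slider position (0..14) into a transform level."""
    if not 0 <= slider_position < N_STEPS:
        raise ValueError("slider position out of range")
    index = N_STEPS - 1 - slider_position if trial["reversed"] else slider_position
    return LEVELS[index]


rng = random.Random(42)
trial = make_trial(rng)
print(trial, chosen_level(trial, 10))
```

Storing the level rather than the raw slider position means the randomized direction does not leak into the analyzed data.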

#### Statistical Analysis

The dependent variable was the most preferred transform level (expressed as a BMI equivalent). Data for the student group and the MTurk group were analyzed separately in SPSS 24.0.

#### Results

#### Student Group

A three-way ANOVA was run to test women's preferences for facial correlates of fat mass and muscle mass in different relationship contexts and across the three face sets. The results showed a non-significant main effect of fat/muscle transform dimension [F(1,62) = 3.18, p = 0.079, η <sup>2</sup> = 0.049]. As expected, there was a significant main effect of context [F(1,62) = 9.26, p = 0.003, η <sup>2</sup> = 0.130], with participants preferring faces of heavier men (with greater fat mass or muscle mass) for a short-term relationship (M = 21.42, SD = 1.15) rather than a long-term relationship (M = 20.98, SD = 0.90). In addition, there was a significant main effect of face set [F(2,124) = 107.37, p < 0.001, η <sup>2</sup> = 0.634, see **Figure 7**]. Although we did not expect a main effect of face set, it appears to reflect participants choosing heavier faces in the 3D face set than in the two 2D face sets. Paired-samples t-tests showed that participants chose heavier faces for the 3D face set (M = 22.14, SD = 1.07) than for the 2D version of the 3D face set (M = 20.67, SD = 0.99) [t(62) = 12.02, p < 0.001] and the independent 2D face set (M = 20.80, SD = 0.93) [t(62) = 10.88, p < 0.001].

In line with Hypothesis 6, a significant interaction was found between transform dimension and context [F(1,62) = 4.73, p = 0.034, η <sup>2</sup> = 0.071, see **Figure 8**]. This result indicates that relationship context had a greater effect on preferences for the muscle transform than for the fat transform. As expected, paired-samples t-tests showed that a higher level of the facial correlate of muscle mass was preferred in a short-term (M = 21.43, SD = 1.22) rather than a long-term (M = 20.83, SD = 1.07) relationship [t(62) = 3.49, p = 0.001]. By contrast, there was only a non-significant trend toward a difference in preference for the facial correlate of fat mass between short-term (M = 21.42, SD = 1.23) and long-term (M = 21.13, SD = 0.96) relationships [t(62) = 1.86, p = 0.068], which provides limited support for Hypothesis 5a.

The three-way interaction (transform dimension × relationship context × face set) was non-significant [F(2,124) = 0.33, p = 0.719, η <sup>2</sup> = 0.005]. Since the interaction between fat and muscle transform and relationship context was found to be significant and independent of the face set, it was not necessary to analyze the data further for each face set separately. Thus, our main prediction was borne out across the three face sets.

Finally, one-sample t-tests compared the preferred BMI (averaged across the three face sets) with a BMI of 22 (the average starting BMI of the original face stimuli) to test whether women show a general preference for lower or higher than normal weight. Preferred BMI was significantly below 22.0, reflecting a preference for reduced fat mass and muscle mass in both short-term [fat mass: M = 21.42, t(62) = −3.78, p < 0.001; muscle mass: M = 21.43, t(62) = −3.70, p < 0.001] and long-term [fat mass: M = 21.13, t(62) = −7.18, p < 0.001; muscle mass: M = 20.83, t(62) = −8.72, p < 0.001] relationships.
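A one-sample t statistic can be recomputed from the summary statistics as t = (M − 22) / (SD/√n). Using the student-group means and SDs reported in this section (n = 63) gives values consistent with the reported t statistics to within the rounding of the published means and SDs; this is a consistency check, not a reanalysis of the raw data.

```python
from math import sqrt


def one_sample_t(sample_mean: float, sd: float, n: int, mu0: float) -> float:
    """One-sample t statistic from summary statistics."""
    return (sample_mean - mu0) / (sd / sqrt(n))


# Student group (n = 63): preferred BMI compared with a reference BMI of 22.
checks = [
    ("short-term fat", 21.42, 1.23, -3.78),
    ("short-term muscle", 21.43, 1.22, -3.70),
    ("long-term fat", 21.13, 0.96, -7.18),
    ("long-term muscle", 20.83, 1.07, -8.72),
]
for label, m, sd, reported in checks:
    computed = one_sample_t(m, sd, 63, 22.0)
    print(f"{label}: computed t(62) = {computed:.2f}, reported {reported}")
```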

#### MTurk Workers

Similarly, a three-way ANOVA was run to test MTurk women's preference for men's facial correlates of fat and muscle mass across relationship contexts. The results showed non-significant main effects of transform dimension [F(1,57) = 0.06, p = 0.808, η <sup>2</sup> = 0.001] and context [F(1,57) = 1.31, p = 0.258, η <sup>2</sup> = 0.022]. A significant main effect of face set was found [F(2,114) = 71.58, p < 0.001, η <sup>2</sup> = 0.557, see **Figure 9**]. Similar to the student group, paired-samples t-tests showed that participants chose heavier faces (with higher fat mass or muscle mass) with the 3D face set (M = 22.07, SD = 0.89) compared with the 2D version of 3D face set (M = 20.79, SD = 1.10) [t(57) = 8.89, p < 0.001] and the independent 2D face set (M = 20.95, SD = 0.93) [t(57) = 8.68, p < 0.001]. Unlike the results from the student group, MTurk participants preferred slightly heavier faces for the independent 2D face set compared to the 2D version of 3D face set [t(57) = −2.65, p = 0.010].

In line with Hypothesis 6, a significant interaction was found between fat/muscle transform dimension and relationship context [F(1,57) = 7.36, p = 0.009, η <sup>2</sup> = 0.114, see **Figure 10**]. Paired-samples t-tests suggest that MTurk women showed a stronger preference for the facial correlate of muscle mass in short-term relationships (M = 21.42, SD = 1.12) than in long-term relationships (M = 21.10, SD = 0.95) [t(57) = 2.33, p = 0.024], but did not differ in their preference for the facial correlate of fat mass between short-term (M = 21.24, SD = 0.99) and long-term relationships (M = 21.32, SD = 0.96) [t(57) = 0.70, p = 0.488]. Further, the three-way interaction (transform dimension × relationship context × face set) was non-significant [F(2,114) = 1.52, p = 0.224, η <sup>2</sup> = 0.026], indicating that the interaction between fat/muscle transform and relationship context was consistent across the three face sets.

FIGURE 10 | The interaction between relationship context (short-term vs. long-term relationship) and preferred facial correlates of body composition (fat mass and muscle mass) in MTurk participants. The vertical axis represents the associated BMI of the most preferred faces. Error bars represent standard errors.

One-sample t-tests compared the preferred BMI transform level (averaged across the three face sets) with a BMI of 22 (the average starting BMI of the face stimuli). MTurk participants preferred a BMI significantly lower than 22 for both fat mass and muscle mass in short-term [fat mass: M = 21.24, t(57) = −5.82, p < 0.001; muscle mass: M = 21.42, t(57) = −3.96, p < 0.001] and long-term [fat mass: M = 21.32, t(57) = −5.37, p < 0.001; muscle mass: M = 21.10, t(57) = −7.17, p < 0.001] relationships.
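As a rough consistency check, a one-sample t statistic can be computed from summary statistics as t = (M − μ<sub>0</sub>) / (SD / √n), with μ<sub>0</sub> = 22 and n = 58 (since df = 57). This sketch assumes that the SDs reported alongside the paired-samples tests refer to the same averaged BMI scores used in the one-sample tests (the means are identical). Because M and SD are rounded to two decimals, the results only approximate the reported t values.

```python
import math


def one_sample_t(mean: float, sd: float, n: int, mu0: float) -> float:
    """t statistic for H0: population mean == mu0, from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error


N = 58          # df = 57, so n = df + 1
START_BMI = 22  # average starting BMI of the face stimuli

# (mean, SD) pairs as reported for the MTurk sample; values are rounded.
conditions = {
    "fat, short-term": (21.24, 0.99),
    "muscle, short-term": (21.42, 1.12),
    "fat, long-term": (21.32, 0.96),
    "muscle, long-term": (21.10, 0.95),
}

for label, (m, sd) in conditions.items():
    print(f"{label}: t({N - 1}) = {one_sample_t(m, sd, N, START_BMI):.2f}")
```

Under this assumption, every recomputed t lies within about 0.05 of the reported value, which is what rounding of M and SD alone would produce.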

#### Discussion

This study investigated heterosexual women's preferences for men's facial correlates of body composition under different relationship contexts. In line with our Hypothesis 4, women showed a stronger preference for faces associated with higher muscle mass in a short-term relationship than in a long-term relationship. In contrast, women did not shift their preference for the facial correlate of fat mass between short-term and long-term relationships, although their overall preference lay at the low end of the normal weight range (BMI ≈ 21 kg/m<sup>2</sup>).

#### GENERAL DISCUSSION

The present study had two aims: first, to investigate the effect of facial correlates of body composition (fat mass and muscle mass) on the perceived facial masculinity of men, and second, to investigate the effect of facial correlates of body composition on women's preferences in different relationship contexts. Ratings of masculinity supported our hypotheses that both facial correlates of fat mass and muscle mass positively affect perceived facial masculinity. While the facial correlate of muscle mass had a pronounced effect on perceived masculinity, the facial correlate of fat mass increased masculinity only in underweight to lower normal weight men. In interactive preference tests where women optimized the shape of a male face, we found a context shift in preferences, with women preferring facial correlates of higher muscle mass for a short-term relationship compared to a long-term relationship. By contrast, women did not shift their preference for the facial correlate of fat mass between short-term and long-term relationships.

## Attribution to Perceived Facial Masculinity


The results from Study 1 supported our predictions that facial correlates of body composition influence perceived facial masculinity. In line with the findings of Holzleitner et al. (2014), facial cues to higher body weight (BMI) increase the perceived masculinity of male faces. The results extend previous findings that 'facial adiposity' (weight perceived from facial appearance) is positively associated with perceived masculinity in underweight to normal weight men but not in overweight or obese men (Phalane et al., 2017). It should be noted that facial adiposity in Phalane et al.'s (2017) study was defined as the weight perceived from the face. Hence, their perceived adiposity measure includes two components, namely weight from fat and weight from muscle. Phalane et al.'s (2017) results indicate a quadratic relationship between perceived facial adiposity and masculinity. By distinguishing the facial correlates of fat and muscle, we find a quadratic relationship between fat and masculinity, but a linear relationship between muscle and masculinity. Hence, our study shows that the findings of Phalane et al. (2017) are likely to reflect the facial correlate of fat. Our findings indicate that the muscle and fat components should be treated separately in future work on facial perception.

Importantly, our results were consistent across the three face sets employed. Although the relationship between the facial correlate of fat mass and masculinity was slightly different between the 2D version of 3D face set and the other two face sets, the facial correlate of muscle mass was found to have a larger impact on perceived masculinity across all three sets of faces.

The distinct effects of fat mass and muscle mass on perceived facial masculinity might reflect sex differences in physique, because men are generally heavier and have more muscle mass than women (Wells, 2007). Indeed, fat-free muscle mass is even more sexually dimorphic than body weight (Lassek and Gaulin, 2009). Hence, heavier men with higher muscle mass have attributes associated with higher sexual dimorphism and should be seen as more masculine, which is what we found in the first part of our study. Although men on average weigh more than women, the weight difference is mainly due to the higher muscle mass that men possess. Hence, excess fat mass does not make male faces more masculine, but decreased weight, whether due to loss of fat mass or loss of muscle mass, decreases men's perceived masculinity.

It is also possible that the facial correlates of muscle serve as a cue to testosterone levels and thus enhance masculinity perception more than the facial correlate of fat mass. Indeed, increased testosterone levels during puberty cause growth of the jaw, brow, chin, and nose (Marečková et al., 2011). As a result, adult male faces have a relatively longer and broader lower jaw, higher brow ridges, thinner cheeks, and more prominent cheekbones than adult female faces (Little et al., 2011b). The perceptual studies here provide further evidence that the face shape correlates of fat mass and muscle mass are distinct in men. Holzleitner and Perrett (2016) found that observers were able to distinguish the face shape correlates of fat mass and muscle mass using 3D facial stimuli. Here, we find further distinctions between the fat and muscle aspects of body composition for both 2D and 3D facial stimuli. A visual adaptation study also suggested that body fat and muscle are processed independently in the brain (Sturman et al., 2017). The face shape correlates of muscle may not only provide cues to body composition and physique but may also provide a cue to testosterone levels, and hence influence masculinity perception.

Taken together, we have shown that the perception of male facial masculinity is not based only on cues to body weight. More importantly, muscularity is the aspect of body composition that has the greatest influence on facial masculinity perception.

## Context Shifts in Preferences for Facial Masculinity

Study 2 indicates that women's preference for male face shape is dependent on context: we found that women preferred faces associated with a higher muscle mass for short-term relationships rather than long-term relationships but that women do not show different preferences for facial cues to fat mass between short- and long-term relationships.

Our findings appear to be in line with the good genes hypothesis, which argues that women are attracted to indicators signaling heritable aspects of immunity and health when seeking short-term partners (Gangestad and Simpson, 2000; Gangestad et al., 2005). We note that the contextual differences in preferences are also consistent with an alternative interpretation: the preference difference might reflect avoidance of negative characteristics associated with higher muscularity in long-term relationships. Previous studies have revealed that men with high testosterone levels and more fat-free mass (greater muscle mass) report having a larger number of sex partners, indicating that these men might devote more effort to mating relative to parenting (Peters et al., 2008; Lassek and Gaulin, 2009). Further, other studies show that men with high testosterone levels are less likely to get married and more likely to get divorced (Julian and McKenry, 1989; Booth and Dabbs, 1993; Booth et al., 2000). Hence, male faces that reflect high levels of androgen-mediated traits may be less preferred by women in a long-term relationship because of the associated behavioral traits that are inconsistent with paternal investment.

This interpretation may also account for why women do not show different preferences for the facial correlate of fat mass between the two relationship contexts. We predicted that facial cues to higher fat mass would be preferred for short-term relationships, because higher fat mass contributes to facial masculinity (at least in low-weight men); however, the masculinity contributed by the facial correlate of fat mass is not testosterone-dependent. Therefore, although faces associated with higher fat mass are perceived to be more masculine, the same facial cues to fat mass are not necessarily associated with the undesirable testosterone-mediated traits. Consequently, women do not need to shift their preference between short- and long-term relationships, since there are no (or fewer) costs associated with preferring masculinity that derives from slightly higher fat mass. Thus, the relationship context differences in preference that we find may reflect women's reluctance to choose very muscular men, who appear unsuitable as long-term partners. Future studies investigating the perception of personality traits from facial cues to fat mass and muscle mass may provide a better understanding of these context shifts.

It is worth mentioning that women generally preferred faces reflecting low fat mass and muscle mass in both contexts. The BMI associated with the most preferred face was significantly lower than the original starting BMI of the facial stimuli (22.0 kg/m<sup>2</sup>). This suggests that men with low-normal, but not underweight, body weight are most preferred by women as partners. This finding is in line with previous studies on men's attractiveness and BMI, which found that the most preferred male bodies correspond to a BMI of around 21 kg/m<sup>2</sup> (Swami and Tovée, 2005, 2008). The findings are also consistent with a prior study that found an inverted-U-shaped relationship between men's body attractiveness and muscularity (Frederick and Haselton, 2007). Men with medium levels of muscle mass were rated as more sexually desirable than men with very low or very high levels of muscularity (Frederick and Haselton, 2007).

By contrast, our findings are less consistent with recent findings that stronger men are seen as more attractive (Sell et al., 2017; Foo et al., 2018), with a linear increase in attractiveness reported across the range of men's strength sampled. There are two possible reasons for this inconsistency. First, those studies mainly focused on the attractiveness of men's bodies rather than men's faces. There might be a discrepancy between the attractiveness of men's bodies and faces: women might find a stronger body attractive but not necessarily the face shape accompanying such a body. Future studies could test whether women show consistent preferences for men's body muscularity and the facial correlates of muscle.

Second, the studies that found a positive relationship between strength and attractiveness have adopted a correlational method comparing strength to ratings of natural bodies (Sell et al., 2017; Foo et al., 2018), while we employed an interactive method to let participants optimize the most attractive face shape from stimuli synthesized with computer graphics. Support for the divergence of results reflecting different methods comes from the study of Brierley et al. (2016) who used a similar interactive method to test the attractiveness of men's bodies. Brierley et al. (2016) found that a slight decrease of body fat and slight increase of body muscle was optimal for men with normal starting BMI and body composition. In both the experiment of Brierley et al. (2016) and the experiment here, men with a high muscular body composition were not the most attractive. Studies comparing ratings of real and computer-manipulated images may help resolve the difference in attraction of strong and muscular men.

Although our hypotheses are supported with the use of both 2D and 3D facial stimuli, we note that a higher BMI (in both fat and muscle dimensions) was preferred in 3D faces than in 2D faces. This effect of dimensionality might be due to the fact that our 3D stimuli combined both the front and the profile views, whereas our 2D stimuli used the front view alone. The combination of front and profile views may provide more information relating to weight. Alternatively, the profile view may provide information that is distinct from that evident in the front view. Indeed, a prior study has shown that women make different choices for attractiveness and dominance when viewing front and profile views of male faces (Swaddle and Reierson, 2002). Furthermore, Danel et al. (2018) showed that measured sexually dimorphic facial features show only a moderate correlation across front and profile views (r = 0.20). These findings imply that further experiments are required to understand the processing of frontal and lateral views of the face.

## CONCLUSION

In summary, we have shown the distinct effects that facial correlates of fat mass and muscle mass have on perceptions of masculinity and attractiveness in men. Our findings show that the facial correlate of muscle mass has a profound impact on perceived facial masculinity in men of all weights. By contrast, the facial correlate of fat mass affects masculinity only in underweight to lower normal weight men. Further, we find a contextual shift in women's attraction to the facial correlate of muscle mass but not fat mass, with a stronger preference for male face shapes associated with high muscle mass under a short-term relationship context compared to a long-term relationship context.

Body size has an impact on a variety of social judgments including attractiveness, strength, dominance, leadership and employment (Windhager et al., 2011; Re and Perrett, 2014; Holzleitner and Perrett, 2016; Nickson et al., 2016; Phalane et al., 2017). Our findings highlight the importance of differentiating size-related effects separately for body fat and body muscle.

Despite consistent results across the three face sets and two samples of participants, we note that the current studies used a limited number of face identities, restricted to Caucasian ethnicity. A larger and more diverse sample of faces should be employed in future studies.

## AUTHOR CONTRIBUTIONS

XL and DP conceived and designed the studies, analyzed the data, and wrote the manuscript. XL contributed to 2D stimuli production and data collection. IH and XL contributed to 3D stimuli production.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02658/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Lei, Holzleitner and Perrett. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Predictors of Fighting Ability Inferences Based on Faces

Vít Třebický<sup>1,2\*†</sup>, Jitka Fialová<sup>1,2†</sup>, David Stella<sup>1,2</sup>, Klára Coufalová<sup>3</sup>, Radim Pavelka<sup>3</sup>, Karel Kleisner<sup>1,2</sup>, Radim Kuba<sup>1</sup>, Zuzana Štěrbová<sup>1,2</sup> and Jan Havlíček<sup>1,2</sup>

<sup>1</sup> Faculty of Science, Charles University, Prague, Czechia, <sup>2</sup> Applied Neurosciences and Brain Imaging, National Institute of Mental Health, Klecany, Czechia, <sup>3</sup> Faculty of Physical Education and Sport, Charles University, Prague, Czechia

Facial perception plays a key role in various social interactions, including formidability assessments. People make relatively accurate inferences about men's physical strength, aggressiveness, and success in physical confrontations based on facial cues. The physical factors related to the perception of fighting ability and their relative contributions have not yet been investigated, since most existing studies employed only a limited number of threat potential measures or proxies. In the present study, we collected data from Czech Mixed Martial Arts (MMA) fighters on their fighting success and physical performance in order to test physical predictors of fighting ability as perceived from high-fidelity facial photographs. We also explored the relationship between perceived and actual fighting ability. We created standardized 360° photographs of 44 MMA fighters, which were rated for perceived fighting ability by 94 raters (46 males). Further, we obtained data on the fighters' physical characteristics (e.g., age, height, body composition) and performance (MMA score, isometric strength, anaerobic performance, lung capacity). In contrast to previous studies, we did not find any significant links between actual and perceived fighting ability. The results of a multiple regression analysis have, however, shown that heavier fighters and those with higher anaerobic performance were judged as more successful. Our results suggest that certain physical performance-related characteristics are mirrored in individuals' faces, but that assessments of fighting success based on facial cues are not congruent with actual fighting performance.

Keywords: perception, formidability, aggressiveness, strength, anaerobic performance, vital capacity, body composition, beardedness

## INTRODUCTION

Male intra-sexual competition is considered an important source of selective pressure (Puts, 2010; Třebický et al., 2012; Hill et al., 2013; Sell et al., 2017), because it is associated with access to resources via a rise in the social hierarchy and, consequently, with broader mating opportunities. Evidence from various cultures (e.g., von Rueden et al., 2008) and ancestral societies (Walker, 2001) suggests that the incidence of physical confrontations in humans is comparable to that in non-human species (Ellis, 1995). The benefits that can be gained in such confrontations must, however, always be weighed against potential costs, which may include injury or even death. The decision whether to flee or fight is therefore frequently made before an actual physical confrontation takes place, which means that one of the opponents often surrenders without a fight (Sell et al., 2012). Individuals who are good at assessing their chances of winning are likely to gain a selective advantage. We may thus expect that perceptual and/or cognitive adaptations for assessing one's own and others' fighting ability have evolved.

#### Edited by:

Danielle Sulikowski, Charles Sturt University, Australia

#### Reviewed by:

Justin Kyle Mogilski, University of South Carolina Salkehatchie, United States

Barnaby James Wyld Dixson, The University of Queensland, Australia

#### \*Correspondence:

Vít Třebický vit.trebicky@natur.cuni.cz

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 12 October 2018 Accepted: 19 December 2018 Published: 15 January 2019

#### Citation:

Třebický V, Fialová J, Stella D, Coufalová K, Pavelka R, Kleisner K, Kuba R, Štěrbová Z and Havlíček J (2019) Predictors of Fighting Ability Inferences Based on Faces. Front. Psychol. 9:2740. doi: 10.3389/fpsyg.2018.02740

Recent research shows that humans are capable of inferring fighting ability from facial, body, and vocal cues (Sell et al., 2009, 2010; Puts et al., 2012; Třebický et al., 2013; Little et al., 2015; Raine et al., 2018). Current studies tend to focus on the relationship between facial perception and individual components of threat potential, such as body size, upper-body strength, or fighting success (Sell, 2016). One cross-cultural study demonstrated that people can assess the upper-body strength and fighting ability of males from facial photographs alone (Sell et al., 2009). Several other studies have investigated the association between hand-grip strength (a frequently used proxy for upper-body strength; Wind et al., 2010) and various characteristics perceived from faces. It has been repeatedly shown that physically stronger men receive higher ratings of dominance, masculinity, and attractiveness (Fink et al., 2007; Windhager et al., 2011; Geniole and McCormick, 2015; Gallup and Fink, 2018). Using 3D facial stimuli, Holzleitner and Perrett (2016) found an association between actual and perceived strength, although it was weaker than in earlier investigations. The results of this study also suggest that perceived strength was independently predicted by the amount of muscle and fat, which mediated the effect of actual strength on perceived strength (Holzleitner and Perrett, 2016). A recent study revealed that men's perceived "facial threat potential" (derived from dominance, strength, and weight ratings) is related to their "actual threat potential," a composite measure based on hand-grip strength, weight, and height (Han et al., 2017).

Another line of research investigates the association between actual fighting ability and facial perception using Mixed Martial Arts (MMA) fighters and their fighting success scores. When raters judged the faces of pairs of MMA fighters with known fight outcomes, the actual winners were selected as more likely to win a fight and were rated as more aggressive, stronger, and more masculine than the losers (Little et al., 2015). A rating study by Třebický et al. (2013) showed that the perceived aggressiveness of MMA fighters is associated with their fighting success. Moreover, actual fighting ability was also linked to perceived fighting ability, but only in heavyweight fighters. However, the factors responsible for the perception of fighting ability, and their relative contributions, have not yet been investigated. This is partly because most existing studies employ only a limited number of threat potential measures and/or proxies.

To explore these issues, we collected detailed data on Czech MMA fighters regarding their actual fighting ability and the physical performance measures considered important in previous studies of MMA fighters' performance (e.g., Lenetsky and Harris, 2012; Alm and Yu, 2013). We chose MMA as an analog of real-life physical confrontations because it combines various fighting styles used in other combat sports and blends them into a unique multi-element martial art. It employs a wide variety of techniques: opponents fight in a standing position, where they rely on punches and kicks (much like in boxing, kick-boxing, and Muay Thai), but also on the ground, where they wrestle and grapple (using techniques from, e.g., Brazilian Jiu-Jitsu, Judo, Greco-Roman wrestling, and freestyle wrestling). The extremely dynamic nature of MMA fights involves both repeated explosive movements and submaximal dynamic work, that is, a combination of high anaerobic and aerobic demands (Lenetsky and Harris, 2012). For these reasons, body composition (Boileau and Lohman, 1977; Braswell et al., 2010), aerobic endurance (Yoon, 2002; Radovanovic et al., 2011; Durmic et al., 2017), maximum strength, and anaerobic capacity (AC) (La Bounty et al., 2011) all play an important role in maintaining performance throughout the fight.

To cover a broad range of physical factors which might affect perceived fighting ability, we collected data on overall body strength (measured as the maximal isometric strength of hands, arms, legs, trunk, and neck), endurance (using lung capacity measurements), AC (using the Wingate test), and body composition (data on body weight, body fat mass, muscle mass, and bone mass).

Further, it has been shown that men's beardedness, while having no effect on fighting outcomes in competitions (Dixson et al., 2018), is linked to judgements of higher levels of masculinity (Dixson et al., 2017), dominance (Muscarella and Cunningham, 1996; Neave and Shields, 2008; Dixson and Vasey, 2012; Saxton et al., 2016; Sherlock et al., 2017), and aggressiveness (Muscarella and Cunningham, 1996; Neave and Shields, 2008; Dixson and Vasey, 2012; Geniole and McCormick, 2015). For this reason, we have also explored the effect of facial hair on the perception of fighting ability.

Most existing studies tended to rely on static frontal facial photographs of varying quality and standardization, which convey a limited amount of visual information regarding overall facial morphology (Danel et al., 2018). To overcome these issues, we collected highly standardized 360◦ view photographs of heads, which provide more visual information. These were then used to investigate the relationship between the perception of fighting ability and various measures of athletes' physical performance.

## MATERIALS AND METHODS

All procedures followed were in accordance with the ethical standards of the relevant committee on human experimentation and with the Helsinki Declaration. The study was approved by the Institutional Review Board of the National Institute of Mental Health, Czech Republic (Ref. num. 28/15). All participants were informed about the goals of the study and gave their consent by signing an informed consent form.

### Participants

#### Targets

In total, we obtained photographs and physical performance data from 44 MMA athletes (mean age = 26.7, SD = 5.91, range = 18–38), all residents of the Czech Republic. They were recruited via social media advertisements, leaflets distributed at local MMA tournaments and gyms, and with the help of the Mixed Martial Arts Association Czech Republic (MMAA). They were reimbursed for their participation with 400 CZK (∼15 EUR). To obtain information about their fighting success rate, we computed their actual fighting ability as the proportion of wins relative to the total number of fights. Hereafter, the term "actual fighting ability" refers to their actual success in competition.
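The success score described above is a simple proportion. A minimal sketch of the computation follows; how draws or no-contests were handled is not specified in this section, so the sketch assumes every non-win simply counts toward the total number of fights, and the input validation is our own addition.

```python
def actual_fighting_ability(wins: int, total_fights: int) -> float:
    """Proportion of wins relative to the total number of fights."""
    if total_fights <= 0:
        # Assumption: a fighter with no recorded fights has no defined score.
        raise ValueError("total_fights must be positive")
    if not 0 <= wins <= total_fights:
        raise ValueError("wins must lie between 0 and total_fights")
    return wins / total_fights


# Hypothetical record: 7 wins out of 10 fights.
print(actual_fighting_ability(7, 10))  # 0.7
```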

#### Raters

Photographs were judged by 46 male (mean age = 21.96 years, SD = 2.56, range = 19–29) and 48 female raters (mean age = 22.29 years, SD = 3.56, range = 18–38), mainly Charles University students, who were recruited via social media advertisements and a mailing list of participants from previous studies. The participants received 100 CZK (∼4 EUR) as compensation for their participation and a debriefing leaflet explaining the purpose of the study.

## Stimuli Collection

#### Photographs Acquisition and Setting

Photographs were captured with a 24-megapixel full-frame digital SLR camera (Nikon D610; 35.9 × 24 mm CMOS sensor, equivalent to 35 mm film) equipped with a fixed focal length lens (Nikon AF-S NIKKOR 85 mm f/1.8 G; Třebický et al., 2016). Exposure values were set to ISO 100, a shutter speed of 1/200 s, and an aperture of f/11 in all photographs. Photographs were shot as 14-bit uncompressed raw files (NEF) in AdobeRGB color space. Color calibration was performed using X-Rite Color Checker Passport color targets and a white balance patch photographed at the beginning of each session. The camera was mounted in portrait orientation directly on the light stand, which also carried a strobe light and was positioned 125 cm from the participant. The aim of this setup was to achieve a perception close to social interpersonal distance (Hall, 1966; Baldassare and Feller, 1975; Sorokowska et al., 2017), to maintain a constant perspective distortion (Třebický et al., 2016; Erkelens, 2018), and to avoid potential perception bias based on interpersonal distance (Bryan et al., 2012). The camera's distance from each participant was checked with a digital laser rangefinder (Bosch PLR 15). The camera's height was adjusted individually for each participant (target) so as to center his head in the frame, and the focus point was set on the participant's eye. This combination of camera distance, focal length, and sensor size yielded a 35 × 53 cm field of view (23.85° angle of view).
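The reported field of view and angle of view follow from the sensor size, focal length, and camera distance. The sketch below uses the simple similar-triangles (pinhole) approximation, which ignores the small change in magnification when focusing at 125 cm; the constant and function names are ours.

```python
import math

SENSOR_LONG_MM = 35.9   # long side of the full-frame sensor
SENSOR_SHORT_MM = 24.0  # short side of the full-frame sensor
FOCAL_LENGTH_MM = 85.0
DISTANCE_MM = 1250.0    # camera-to-participant distance (125 cm)


def field_of_view_mm(sensor_mm: float, focal_mm: float, distance_mm: float) -> float:
    """Subject-plane coverage under the simple pinhole (similar-triangles) model."""
    return sensor_mm * distance_mm / focal_mm


def angle_of_view_deg(sensor_mm: float, focal_mm: float) -> float:
    """Angle of view along one sensor dimension, lens focused at infinity."""
    return math.degrees(2 * math.atan(sensor_mm / (2 * focal_mm)))


# Portrait orientation: the short sensor side spans the width of the frame.
width_cm = field_of_view_mm(SENSOR_SHORT_MM, FOCAL_LENGTH_MM, DISTANCE_MM) / 10
height_cm = field_of_view_mm(SENSOR_LONG_MM, FOCAL_LENGTH_MM, DISTANCE_MM) / 10
vertical_aov = angle_of_view_deg(SENSOR_LONG_MM, FOCAL_LENGTH_MM)

print(f"field of view: {width_cm:.1f} x {height_cm:.1f} cm")
print(f"angle of view (long side): {vertical_aov:.2f} deg")
```

This prints a field of view of 35.3 × 52.8 cm, which rounds to the reported 35 × 53 cm, and an angle of view of 23.85° along the sensor's long (vertical) side.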

Participants were seated on a bar stool and asked to sit up straight with their hands hanging freely alongside their body. They were photographed wearing black underwear shorts that we provided (i.e., they wore no T-shirts) and without any adornments, such as glasses or jewelry. They were instructed to look directly into the camera, adopt a neutral expression, and retain this position throughout the whole photographing session. To capture 360° images of the targets' heads, the stool was placed on a turning platform that could be manually rotated by 10° in 36 steps (see **Figure 1** for illustration). This resulted in 36 photographs for each participant. The platform was placed in a purpose-built portable photographic booth to control for any changes in ambient light and for color reflections (Rowland and Burriss, 2017; Thorstenson, 2018). We took two full rotations of each participant to obtain one full set of high-quality photographs (to eliminate possible movements between shots, blinks, etc.).

FIGURE 1 | Illustrative image of photograph acquisition setup. Photograph by Jitka Fialová, published with informed and written consent of depicted participant and co-authors.
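The text states that two full rotations of 36 positions (10° apart) were taken to obtain one complete set, but it does not specify how frames were chosen. The following is a hypothetical sketch of one way to assemble a set: it assumes a per-angle usability flag (e.g., no blink or movement) for each shot and prefers the first rotation whenever both shots are usable. The function name and flag representation are illustrative only.

```python
STEP_DEG = 10
N_POSITIONS = 360 // STEP_DEG  # 36 platform positions per full rotation


def select_frames(rotation_a: list[bool], rotation_b: list[bool]) -> list[tuple[int, str]]:
    """Pick one usable frame per platform angle from two full rotations.

    Each list holds a hypothetical per-angle flag: True if the shot is usable
    (no blink, no movement). The first rotation is preferred when both are usable.
    """
    if len(rotation_a) != N_POSITIONS or len(rotation_b) != N_POSITIONS:
        raise ValueError(f"each rotation must contain {N_POSITIONS} shots")
    selected = []
    for step in range(N_POSITIONS):
        angle = step * STEP_DEG
        if rotation_a[step]:
            selected.append((angle, "rotation A"))
        elif rotation_b[step]:
            selected.append((angle, "rotation B"))
        else:
            raise ValueError(f"no usable shot at {angle} degrees")
    return selected


# Example: a blink spoils the 50-degree shot of the first rotation.
rotation_a = [True] * N_POSITIONS
rotation_a[5] = False
rotation_b = [True] * N_POSITIONS
frames = select_frames(rotation_a, rotation_b)
print(len(frames), frames[5])  # 36 (50, 'rotation B')
```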

Standardized lighting conditions and uniform exposure across the whole scene were ensured by using one 800 W studio strobe (Photon Europe MSN-800) with a white reflective umbrella (Photon Europe, 109 cm diameter) as a light diffuser, mounted on a 175 cm high light stand and tilted 10° downward toward the booth. Correct lighting exposure was checked before each session with a digital light meter (Sekonic L-308S). For further details on the photograph acquisition procedure, see Třebický et al. (2018).

## Stimuli Processing and Building 360° Head Rotations

Final sets of 36 photographs of the full 360° head rotation for each participant were selected and post-processed in Adobe Lightroom Classic CC (Version 2017, Adobe Systems Inc.). First, we converted the photographs into DNG raw files, then built DNG color calibration profiles and applied them to all photographs. Exposure across all selected photographs was verified in three background areas around the head (above, left, right), and any slight differences in exposure were manually adjusted to the same level. Subsequently, the calibrated photographs were exported as lossless 16-bit AdobeRGB TIFF files at a real size of 35 × 53 cm and 168 PPI. This allowed us to present images of participants' heads at real-life size.

Photographs were aligned so that each participant's head was positioned in the center of each frame, with the eyes on the same horizontal line in all pictures. Final photographs were batch-cropped to 2,095 × 2,305 px to fit the head rotations of all participants. All photographs were subsequently converted into sRGB color space and exported as 8-bit JPEG files (2,095 × 2,305 px @ 168 ppi).
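Life-size presentation relies on a fixed relation between physical length and pixel count at the chosen density. This is a minimal sketch of the conversion at the stated 168 PPI, assuming that each image pixel is shown as one screen pixel at that density (the display mapping is not described in this section); the helper names are ours.

```python
CM_PER_INCH = 2.54


def cm_to_px(length_cm: float, ppi: float) -> int:
    """Pixels needed to render a physical length at a given pixel density."""
    return round(length_cm / CM_PER_INCH * ppi)


def px_to_cm(length_px: int, ppi: float) -> float:
    """Physical length of a pixel span at a given pixel density."""
    return length_px / ppi * CM_PER_INCH


PPI = 168
# Full exported frame (35 x 53 cm) in pixels.
print(cm_to_px(35, PPI), cm_to_px(53, PPI))  # 2315 3506
# Physical size of the cropped 2,095 x 2,305 px frame at the same density.
print(f"{px_to_cm(2095, PPI):.1f} x {px_to_cm(2305, PPI):.1f} cm")  # 31.7 x 34.8 cm
```

Under this assumption, the cropped frame covers roughly 31.7 × 34.8 cm of the original 35 × 53 cm field of view.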

We built the 360° head rotations with Sirv (www.sirv.com, Magic Toolbox Limited), an online suite for creating and managing image spins. Photographs of all target participants were uploaded, and individual spins were created.

#### Rating Sessions

Rating sessions took place in a quiet perception lab under standardized conditions. Raters were seated 125 cm from the screen, i.e., at the distance at which the photographs were captured, so as to approximate a social interpersonal distance (Sorokowska et al., 2017) and thereby increase the ecological validity of the rating session.

Ratings were carried out on a 27″ Dell U2718Q UltraSharp IPS color-calibrated screen (3,840 × 2,160 px, 99% sRGB color space coverage) rotated to portrait orientation to accommodate the life-sized images. Data were collected via the Qualtrics survey suite (Qualtrics, Provo, UT, United States).
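The monitor's native pixel density follows from its resolution and diagonal (PPI = √(w² + h²) / diagonal). The sketch below compares it with the 168 ppi of the exported images and checks that the 2,095 × 2,305 px frames fit the portrait-oriented screen. It assumes, hypothetically, that images were displayed at a 1:1 pixel mapping; the source does not state how the survey page scaled them.

```python
import math


def pixel_density(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Native pixels per inch of a display, from resolution and diagonal size."""
    return math.hypot(width_px, height_px) / diagonal_in


ppi = pixel_density(3840, 2160, 27)  # Dell U2718Q, 27-inch diagonal

# Rotated to portrait, the usable area is 2,160 px wide and 3,840 px tall.
portrait_w, portrait_h = 2160, 3840
image_w, image_h = 2095, 2305
fits = image_w <= portrait_w and image_h <= portrait_h

# Physical size of the image at a 1:1 pixel mapping (assumption, see above).
shown_w_cm = image_w / ppi * 2.54
shown_h_cm = image_h / ppi * 2.54
print(round(ppi, 1), fits, round(shown_w_cm, 1), round(shown_h_cm, 1))  # 163.2 True 32.6 35.9
```

Under that assumption the head would appear about 3% larger than at the nominal 168 ppi (168 / 163.2 ≈ 1.03), i.e., close to life size.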

Raters were asked to rate the fighting ability of each target ("Jak moc by byl tento muž úspěšný, kdyby se dostal do fyzického souboje?"/"If this man were involved in a physical confrontation, how successful would he be?") on a 7-point verbally anchored scale ranging from 1 ("velice neúspěšný"/"very unsuccessful") to 7 ("velice úspěšný"/"very successful"). Each 360° rotation spun automatically once, after which raters could freely rotate the head for further inspection by dragging the mouse left or right before rating it. Photographs were presented in randomized order, and rating time was not restricted. Finally, all raters completed a brief questionnaire on their age, height, weight, and self-rated formidability.

## Physical Performance and Body Composition Measurements

To determine the physical performance and body parameters of participating athletes, we employed a comprehensive set of measurements relevant to martial arts performance, including body composition, maximal isometric strength, lung capacity, and anaerobic capacity (AC) measurements (Schick et al., 2010; Vidal Andreato et al., 2011; Lenetsky and Harris, 2012; Alm and Yu, 2013; Coufalová et al., 2014; Marinho et al., 2016). **Table 1** provides descriptive statistics. All measurements were performed at the Biomedicine Laboratory of the Faculty of Physical Education and Sport, Charles University (see **Figure 2**).

#### Body Composition Measurements

To acquire detailed measures of body composition, we performed a bioelectrical impedance analysis, which is based on measuring the body's electrical resistance to an imperceptible electric current. Electrical resistance is a function of both body shape and the volume of conductive tissues in the body (Goran, 1998). Participating athletes were asked to avoid activities that may bias the measurement, such as drinking alcoholic beverages, using a sauna, and demanding physical activity, for 24 h prior to the test, and to refrain from eating and drinking for 2 h before the measurement (Brodie et al., 1991; Fogelholm et al., 1993). Body weight, body fat mass, muscle mass, and bone mass were measured (Vaara et al., 2012) using a Tanita MC-980 bioimpedance scale (Athlete setting). Testing was performed in a standing position, with participants standing on the foot electrodes and holding the hand electrodes, arms hanging freely alongside the body. Participants were tested wearing only the underwear we provided (Pinilla et al., 1992).

TABLE 1 | Descriptive statistics of the target sample.

FIGURE 2 | Physical performance measurements. Top left: maximal isometric strength (arm extension dynamometry) measurement; top right: lung capacity (spirometry) measurement; bottom: anaerobic capacity measurement (Wingate test). Photographs by Jitka Fialová, with informed and written consent of the depicted participant.
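The link between resistance and the volume of conductive tissue is usually explained with an idealized model that treats the body (or a body segment) as a uniform cylindrical conductor. The derivation below is that textbook simplification, offered for orientation only; the proprietary prediction equations of the Tanita MC-980 are not described in the source and are not reproduced here.

```latex
R = \rho\,\frac{L}{A}
\quad\Longrightarrow\quad
V = A\,L = \rho\,\frac{L^{2}}{R}
```

Here R is resistance (Ω), ρ the specific resistivity of the conducting tissue, L the conductor length, A its cross-sectional area, and V its volume. Because L is in practice approximated by body height H, bioimpedance prediction equations are typically built around the impedance index H²/R: at a given height, lower resistance implies a larger volume of conductive (water-rich, lean) tissue.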

#### Maximal Isometric Strength Measurements

Isometric strength in flexion and extension of the arms, legs, trunk, and neck was measured as the peak force produced by maximal voluntary isometric contraction of each muscle group while the athlete was seated on a specially designed dynamometric station with a low-profile aluminum load cell (model 1042, measurement error ± 0.05%) (Coufalová et al., 2014). We evaluated the isometric strength of the hands with a Takei TKK 5401 digital hand-grip dynamometer (Vidal Andreato et al., 2011; Bonitch-Góngora et al., 2013). During the hand-grip measurements, athletes were instructed to stand straight with their arms alongside their body.

Three attempts were performed for each type of measurement, alternating sides between attempts. We used the "best test" method, meaning that only the highest performance was recorded and included in subsequent analyses.
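The "best test" rule reduces each set of attempts to its maximum. The text does not say whether the maximum was taken per side or across both sides; the sketch below assumes per side, and both the function name and the attempt values are hypothetical.

```python
from collections import defaultdict


def best_per_side(attempts):
    """Keep only the highest value per side ("best test" method).

    attempts: iterable of (side, value) pairs in the order they were performed.
    """
    best = defaultdict(lambda: float("-inf"))
    for side, value in attempts:
        best[side] = max(best[side], value)
    return dict(best)


# Hypothetical hand-grip values (kg): three attempts per hand, alternating sides.
attempts = [("right", 51.2), ("left", 48.9), ("right", 53.0),
            ("left", 49.5), ("right", 52.4), ("left", 50.1)]
print(best_per_side(attempts))  # {'right': 53.0, 'left': 50.1}
```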

#### Lung Capacity Measurements

To assess lung capacity, we used spirometry, a physiological test that measures the volume of air an individual inhales and exhales as a function of time, recording either total volume or flow. Lung capacity measures were acquired with a MicroLab ML3500 MK8 spirometer. Participants performed three standing forced vital capacity (FVC) maneuvers, from which we recorded FVC, forced expiratory volume in the first second (FEV1), and peak expiratory flow (PEF), again applying the "best test" method, i.e., recording the highest of the three values. FVC is the maximal volume of air exhaled with maximally forced effort from a maximal inspiration, i.e., vital capacity performed with maximally forced expiratory effort. FEV1 is the maximal volume of air exhaled in the first second of forced expiration from a position of full inspiration, and PEF is the maximum expiratory flow achieved during a maximally forced expiration starting from the point of maximal lung inflation (Miller et al., 2005).
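All three spirometric indices can be read off a single forced-expiration volume-time curve: FVC is the largest exhaled volume, FEV1 the volume at t = 1 s, and PEF the steepest slope (highest flow). The sketch below uses a synthetic exponential curve rather than real data and ignores refinements that clinical spirometers apply (e.g., back-extrapolation to define time zero); the function name is hypothetical. Applying the "best test" method would mean computing these indices for each of the three maneuvers and keeping the highest value of each.

```python
import math


def spirometry_indices(times, volumes):
    """Return (FVC, FEV1, PEF) from a forced-expiration volume-time curve.

    times:   seconds since the start of forced expiration, strictly increasing
    volumes: cumulative exhaled volume (L) at each time point
    """
    fvc = max(volumes)
    # FEV1: exhaled volume at t = 1 s, linearly interpolated between samples.
    fev1 = None
    for t0, t1, v0, v1 in zip(times, times[1:], volumes, volumes[1:]):
        if t0 <= 1.0 <= t1:
            fev1 = v0 + (v1 - v0) * (1.0 - t0) / (t1 - t0)
            break
    # PEF: highest flow (L/s), estimated by finite differences between samples.
    pef = max((v1 - v0) / (t1 - t0)
              for t0, t1, v0, v1 in zip(times, times[1:], volumes, volumes[1:]))
    return fvc, fev1, pef


# Synthetic maneuver: 5 L exhaled with a 0.5 s time constant, sampled every 10 ms.
times = [i * 0.01 for i in range(601)]  # 0 to 6 s
volumes = [5.0 * (1 - math.exp(-t / 0.5)) for t in times]
fvc, fev1, pef = spirometry_indices(times, volumes)
print(round(fvc, 2), round(fev1, 2), round(pef, 1))  # 5.0 4.32 9.9
```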

#### Anaerobic Capacity Measurements

We measured anaerobic performance using the Wingate test, which consists of 30 s of supramaximal arm-cranking exercise at maximal speed against a frictional resistance set relative to the participant's body weight (Bar-Or, 1987). Four indices are measured: (1) anaerobic power (AP), the highest mechanical power elicited during the test; (2) mean power, the average power sustained throughout the 30 s period; (3) AC, the total work performed during the entire 30 s period; and (4) power decrease (PD), the degree of power drop-off during the test (Collomp et al., 1991). The Wingate test was performed on a Monark arm ergometer (model Rump-Rokos 4.00/C01) with a load of 4 W per kilogram of body weight. Participants were instructed to remain seated and were verbally encouraged to crank as fast as possible from the very start and to maintain maximal cranking rates throughout the 30 s period. The test was preceded by a short warm-up, during which the participant exercised until reaching a heart rate of 120 bpm, followed by activation of the load (Franchini et al., 2003).
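Given power output recorded at regular intervals during the 30 s test, the four indices reduce to a maximum, a mean, a time integral, and a relative drop. The sketch below assumes power sampled once per second and defines PD as the percentage drop from peak to lowest power; this is one common fatigue-index convention, and the exact formula used by the ergometer software is not given in the source. Data and the function name are hypothetical.

```python
def wingate_indices(power_w, dt_s=1.0):
    """Compute Wingate indices from power samples (W) taken every `dt_s` seconds.

    Returns a dict with:
      AP - anaerobic (peak) power, W
      MP - mean power, W
      AC - anaerobic capacity as total work, J (sum of power x time)
      PD - power decrease, % drop from peak to the lowest sample
    """
    if not power_w:
        raise ValueError("no power samples")
    peak = max(power_w)
    return {
        "AP": peak,
        "MP": sum(power_w) / len(power_w),
        "AC": sum(p * dt_s for p in power_w),
        "PD": (peak - min(power_w)) / peak * 100,
    }


# Hypothetical 30 one-second samples declining linearly from 700 W to 410 W.
samples = [700 - 10 * i for i in range(30)]
for name, value in wingate_indices(samples).items():
    print(name, round(value, 1))
```

With evenly spaced samples over a fixed duration, AC equals mean power multiplied by test duration (here 555 W × 30 s = 16,650 J), which helps explain why these Wingate indices were so highly intercorrelated.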

## Level of Beardedness

Two authors coded each target's image. To assess the level of facial hair, we used three beardedness categories defined in earlier research (Dixson et al., 2018). Agreement between the two authors was above 95% (42 of 44 cases); the remaining two cases were discussed and then categorized. The procedure resulted in three categories: (1) "Shaved," athletes with no facial hair of any kind (N = 15; 34%); (2) "Some beard," athletes with any kind of facial hair other than a full beard (N = 20; 45.5%); and (3) "Full beard," athletes with trimmed or bushy full beards (N = 9; 20.5%).
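Percent agreement is the share of targets that both coders placed in the same category, and the category percentages are counts divided by N = 44. The sketch reproduces the reported figures (42/44 ≈ 95.5%; 34.1%, 45.5%, and 20.5%, the first of which the text rounds to 34%). Only totals are reported, so the individual codings and the helper names below are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Percentage of targets that two coders assigned to the same category."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same targets")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)


def category_shares(labels):
    """Map each category to (count, percentage of all targets)."""
    n = len(labels)
    return {cat: (k, round(100 * k / n, 1)) for cat, k in Counter(labels).items()}


# Final categories reported in the text (N = 44).
final = ["Shaved"] * 15 + ["Some beard"] * 20 + ["Full beard"] * 9
print(category_shares(final))
# {'Shaved': (15, 34.1), 'Some beard': (20, 45.5), 'Full beard': (9, 20.5)}

# Hypothetical first-pass codings that disagree on two targets.
coder_a = list(final)
coder_b = list(final)
coder_b[0], coder_b[20] = "Some beard", "Full beard"
print(round(percent_agreement(coder_a, coder_b), 1))  # 95.5
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected index such as Cohen's kappa would be a stricter alternative.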

## Statistical Analysis

All statistical tests were performed in SPSS 23 (IBM Corp., 2015) and JASP 0.9.0.1 (JASP Team, 2018). McDonald's ω was used to estimate inter-rater agreement. Differences in fighting ability ratings were analyzed with an independent samples t-test, and associations between ratings were assessed with bivariate Pearson correlations. Component scores of physical performance measures, used in subsequent analyses, were calculated by principal component analysis (PCA) without rotation. To assess which factors predict perceived fighting ability and to estimate their relative contributions, we ran a linear regression analysis in which all predictors were entered simultaneously (enter method). Similarly, we used regression analysis to investigate the relationship between actual and perceived fighting ability. To test the influence of beardedness on fighting ability ratings, we carried out a one-way ANOVA with fighting ability ratings as the dependent variable and level of beardedness as the independent variable. The effect size for the one-way ANOVA is reported as partial eta squared (η<sup>2</sup><sub>p</sub>). Holm's post hoc test was performed, and effect sizes for the pairwise comparisons are reported as Cohen's d.
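Holm's procedure controls the family-wise error rate across several comparisons: it sorts the p-values, multiplies the smallest by m, the next by m − 1, and so on, and then forces the adjusted values never to decrease. A minimal standard-library sketch with hypothetical raw p-values follows; it implements the standard Holm step-down adjustment and is not taken from JASP's code.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # adjusted p-values must not decrease
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons.
raw = [0.012, 0.048, 0.030]
print([round(p, 3) for p in holm_adjust(raw)])  # [0.036, 0.06, 0.06]
```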

#### Component Scores of Physical Performance Measures

To reduce the number of variables produced by the physical performance and body composition tests, and to obtain robust and representative component scores for subsequent analyses, we used PCA. We checked its assumptions by screening bivariate correlations between variables for multicollinearity (>0.9) or singularity (=0.0). For body composition measures, we found high correlations between body weight, body fat, muscle mass, bone mass, and total body water (rs > 0.817). We therefore retained body weight for the later regression analysis as the most representative variable, since it subsumes all body composition measures; it is also a frequently used measure of body size, which allows comparison with previous research. Anaerobic capacity data from the Wingate test showed high correlations between maximum performance, average performance, AC, and decrease of performance (rs > 0.9). In view of these results, and because maximal performance was also used for the other measurements, we entered maximum performance as the Wingate variable in the PCA. After these adjustments, the assumptions of the analysis were met.
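The screening step amounts to computing every pairwise Pearson correlation and flagging pairs whose absolute value exceeds 0.9. The sketch below uses hypothetical values for five athletes and a hand-written Pearson function so that it needs only the standard library; variable and function names are illustrative.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear(variables, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(variables.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical body composition values and ages for five athletes.
variables = {
    "weight": [70, 77, 84, 93, 66],
    "muscle_mass": [31.5, 34.6, 37.9, 41.8, 29.7],
    "age": [22, 35, 28, 24, 31],
}
print(flag_collinear(variables))
```

Only the weight/muscle-mass pair is flagged here, mirroring the situation in which one representative variable (body weight) was kept.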

We subjected the maximal isometric strength measurements to PCA, which produced a single component that we labeled "Isometric strength." Next, we entered the spirometry measurements into a PCA, which yielded a component we labeled "Lung capacity." The anaerobic capacity measurements also yielded a single component, "Anaerobic capacity." The resulting components and their loadings are listed in the **Supplementary Materials**, **Table S1**.
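Extracting a single unrotated component from correlated measures means finding the leading eigenvector of their correlation matrix; the loadings are that eigenvector scaled by the square root of its eigenvalue, and component scores weight the standardized variables accordingly. The sketch below uses power iteration so that it needs only the standard library (SPSS and JASP use their own numerical routines), returns unit-variance scores as one common convention, and runs on hypothetical strength values.

```python
import math


def standardize(values):
    """z-scores using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def first_component(columns, iterations=500):
    """Eigenvalue, loadings and scores of the first unrotated principal component.

    columns: one list of observations per variable (all of equal length).
    The leading eigenvector of the correlation matrix is found by power iteration.
    """
    z = [standardize(col) for col in columns]
    k, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]
    vec = [1.0] * k
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(p * p for p in product))
        vec = [p / norm for p in product]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(vec[i] * corr[i][j] * vec[j] for i in range(k) for j in range(k))
    loadings = [v * math.sqrt(eigenvalue) for v in vec]
    # Unit-variance component scores: standardized data weighted by vec / sqrt(eigenvalue).
    scores = [sum(vec[j] * z[j][r] for j in range(k)) / math.sqrt(eigenvalue)
              for r in range(n)]
    return eigenvalue, loadings, scores


# Hypothetical isometric strength values for five athletes.
arm = [40, 45, 50, 55, 60]
leg = [80, 92, 98, 110, 118]
trunk = [60, 66, 71, 80, 83]
eigenvalue, loadings, scores = first_component([arm, leg, trunk])
print(round(eigenvalue, 2), [round(l, 2) for l in loadings])
```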

## Data Availability

Datasets generated and analyzed during the current study are available in the **Supplementary Material** of this article (Dataset athletes.XLSX, Dataset rating.XLSX).

## RESULTS

McDonald's ω values for ratings by males (ω = 0.851), females (ω = 0.738), and all raters (ω = 0.795) indicated high inter-rater agreement; we therefore used mean fighting ability ratings in subsequent analyses. Fighting ability ratings assigned by men and women were highly correlated (r = 0.972, 95% CI [0.95, 0.985], p < 0.001), and an independent samples t-test showed no significant sex difference in ratings [t(86) = 0.041, p = 0.968, d = 0.009]. We therefore analyzed the ratings of both sexes together.
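For a one-factor model with standardized loadings λᵢ, McDonald's ω compares the variance attributable to the common factor with the total variance: ω = (Σλᵢ)² / [(Σλᵢ)² + Σ(1 − λᵢ²)]. In an inter-rater setting the "items" are the raters, and the loadings would come from a factor analysis of the rating matrix, which JASP performs internally. The sketch takes loadings as given; the values are hypothetical.

```python
def mcdonald_omega(loadings):
    """McDonald's omega for a single-factor model with standardized loadings.

    Each item's unique (error) variance is taken as 1 - loading**2.
    """
    common = sum(loadings) ** 2
    unique = sum(1 - l ** 2 for l in loadings)
    return common / (common + unique)


# Hypothetical standardized loadings of four raters on a common factor.
print(round(mcdonald_omega([0.6, 0.7, 0.5, 0.8]), 3))  # 0.749
```

Higher loadings, i.e., raters sharing more variance with the consensus, push ω toward 1.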

## Predictors of Perceived Fighting Ability

A multiple linear regression analysis was run to predict perceived fighting ability from age, weight, and the Isometric strength, Lung capacity, and AC components, all entered as independent predictors. The overall model was significant [F(5, 38) = 2.79, p = 0.031, R<sup>2</sup> = 0.269], but none of the individual predictors significantly predicted perceived fighting ability (all ps > 0.05; see **Table 2**).

## Actual Fighting Ability as a Predictor of Perceived Fighting Ability

Exploratory correlation analysis showed that perceived fighting ability correlated positively with fighters' age (r = 0.35, p = 0.018) and weight (r = 0.341, p = 0.022); we therefore added these measures to the linear regression model together with actual fighting ability. The overall model significantly predicted perceived fighting ability [F(3, 40) = 3.579, p = 0.022, R<sup>2</sup> = 0.212]. Among the predictors, body weight significantly predicted perceived fighting ability (β = 0.31, t = 2.033, p = 0.049), but neither actual fighting ability (β = −0.175, t = −1.205, p = 0.235) nor age (β = 0.247, t = 1.669, p = 0.103) was significantly related to it (see **Table 3**).
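For an ordinary least-squares model with k predictors and n cases, the overall F statistic can be recovered from R² as F = (R²/k) / [(1 − R²)/(n − k − 1)]. With n = 44 athletes, the reported R² values reproduce the reported F statistics and degrees of freedom to within rounding, which serves as a quick consistency check. The function name is illustrative.

```python
def f_from_r2(r2: float, k: int, n: int):
    """Overall regression F statistic and its degrees of freedom from R^2."""
    df1, df2 = k, n - k - 1
    return (r2 / df1) / ((1 - r2) / df2), df1, df2


# Model with age, weight and three performance components (Table 2).
f1, a1, b1 = f_from_r2(0.269, k=5, n=44)
print(f"F({a1}, {b1}) = {f1:.2f}")  # F(5, 38) = 2.80 (reported 2.79)
# Model with actual fighting ability, age and weight (Table 3).
f2, a2, b2 = f_from_r2(0.212, k=3, n=44)
print(f"F({a2}, {b2}) = {f2:.2f}")  # F(3, 40) = 3.59 (reported 3.579)
```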

## The Effect of Beardedness on Perceived Fighting Ability

Beardedness had a moderate effect on fighting ability ratings that narrowly missed conventional significance (Shaved: M = 3.55, SD = 1.09; Some beard: M = 4.3, SD = 0.8; Full beard: M = 4.34, SD = 1.07) [F(2, 41) = 3.099, p = 0.056, η<sup>2</sup><sub>p</sub> = 0.131]. For exploratory purposes, we ran Holm's post hoc comparisons. The Shaved category received the lowest ratings, but it did not differ significantly from either the Some beard category (t = 2.279, pHolm = 0.084, Cohen's d = 0.801) or the Full beard category (t = 1.943, pHolm = 0.118, Cohen's d = 0.729). The Some beard and Full beard categories did not differ (t = 0.102, pHolm = 0.919, Cohen's d = 0.044) (**Figure 3**).
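Both effect sizes can be recomputed from the summary statistics: for a one-way ANOVA, η²p = F·df_effect / (F·df_effect + df_error), and Cohen's d for two groups is the mean difference divided by the pooled standard deviation. Using the group sizes given in the Methods (15 shaved, 20 some beard, 9 full beard), this pooled-SD formula approximately recovers the reported values; small discrepancies are expected because the reported means and SDs are themselves rounded. Function names are illustrative.

```python
import math


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d (group 2 minus group 1) using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)


print(round(partial_eta_squared(3.099, 2, 41), 3))         # 0.131
print(round(cohens_d(3.55, 1.09, 15, 4.30, 0.80, 20), 2))  # 0.8  (reported 0.801)
print(round(cohens_d(3.55, 1.09, 15, 4.34, 1.07, 9), 2))   # 0.73 (reported 0.729)
```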

## DISCUSSION

In the present study, we tested which aspects of physical performance affect the perception of fighting ability. To that end, we used 360° head rotation photographs of male MMA athletes. We gathered detailed physical measures relevant to physical confrontations, and although overall physical performance predicted fighting ability ratings, statistical analysis did not show that any particular predictor contributed significantly to the perception of fighting ability. Body weight and AC (Anaerobic capacity component) did, however, explain most of the variability, while isometric strength (Isometric strength component) was negatively related to perceived fighting ability. We did not find any significant association between perceived fighting ability and actual fighting ability in physical confrontations; perceived fighting ability was predicted solely by athletes' body weight. Further, we explored a possible effect of beardedness on perceived fighting ability and found a moderate but non-significant effect, with shaved targets receiving the lowest ratings.

TABLE 2 | A summary of regression analysis for variables predicting the perceived fighting ability (fighters' age, body weight, Isometric strength, Lung capacity, and Anaerobic capacity component).

TABLE 3 | Summary of regression analysis for the relationship between perceived and actual fighting ability, fighters' age, and body weight.

FIGURE 3 | Differences in mean ratings of fighting ability between levels of beardedness (Shaved, Some beard, Full beard). The graph represents the means, their 95% CIs, and data distribution for the three beardedness levels. Mean perceived fighting ability did not differ significantly between beardedness levels.

Compared to previous investigations, which used either a single threat potential measure or only a limited number of them, we collected detailed data on various aspects of physical performance. Although other studies (e.g., Sell et al., 2009; Han et al., 2017) have reported that handgrip strength, height, and weight affect the perception of overall fighting ability, our aim was to investigate other relevant factors that could contribute to perceptual inferences, such as overall isometric strength, anaerobic capacity, and lung capacity. Our regression model significantly predicted perceived fighting ability, but none of the individual predictors contributed significantly. Body weight and AC were the variables with the greatest, though individually non-significant, impact on perceived fighting ability. The general probability of being perceived as a winner seems to be related to body size and weight (Deaner et al., 2012), with heavier athletes seen as better fighters than lighter ones.

Earlier studies have suggested that body size (here assessed as body weight) plays a key role during the initial phase of formidability assessments (Třebický et al., 2015). We could thus speculate that our findings are compatible with a model in which the assessment of a potential opponent takes place on multiple levels (Třebický and Havlíček, 2017). The first step, the "fight or flight" decision, seems to depend mainly on the opponent's overall size. If, however, the rivals are of comparable size, another level of assessment may be deployed, which could be linked to the perception of other potentially significant characteristics.

Another predictor of perceived fighting ability was AC as measured by the Wingate test. Anaerobic capacity has been reported as a key characteristic of successful martial arts athletes (James et al., 2016). MMA is physiologically complex, and during contests fighters deploy a wide range of mechanical and metabolic qualities. Intense striking exchanges are common, but twice as many fights end during highly demanding ground-fighting sequences, that is, when fighters use their wrestling and grappling techniques (Del Vecchio et al., 2011). High-intensity and relatively long engagements are therefore a significant part of overall performance, and they can be approximated by the Wingate test. Moreover, this measure seems to be related to general physical fitness and performance, as is apparent from its correlations with body composition, isometric strength, and lung capacity (see exploratory correlations, **Table S2** in Supplementary Materials). Earlier research has revealed that cues to strength are present in human faces (Hugill et al., 2009) and that individuals' strength is associated with masculinity and dominance ratings (Fink et al., 2007; Windhager et al., 2011), but we found no evidence of a similar relationship for assessments of fighting ability. In fact, our data suggest a rather surprising opposite pattern, with the Isometric strength component negatively related to perceived fighting ability. Although physical performance is undoubtedly the cornerstone of success, psychological characteristics such as personality and the ability to cope with stress, as well as the successful execution of techniques and other skills, may also significantly affect fighting ability and success (Gould et al., 1981; Filaire et al., 2001; Radochonski et al., 2011; Ruiz and Hanin, 2011; Chen and Cheesman, 2013; Bernacka et al., 2016). These factors, however, exceed the scope of the current study and should be investigated in future research.

Our results also contrast with earlier studies showing that people can accurately assess actual fighting ability from facial photographs of MMA fighters when asked to rate their aggressiveness, fighting ability, or likelihood of winning (Třebický et al., 2013; Little et al., 2015). Třebický et al. (2013), however, found a link between perceived and actual fighting ability only in heavyweight fighters. Limited sample size and uneven distribution across weight categories did not allow us to test the effect of weight category directly, but heavier athletes in our sample were perceived as more formidable competitors, which suggests a similar pattern (see **Table S2**).

Numerous studies have shown that beards enhance ratings of traits related to intrasexual competition, such as men's perceived age, masculinity, social dominance, and aggressiveness; in short, bearded men tend to score higher on these measures than clean-shaven men (Neave and Shields, 2008; Dixson and Vasey, 2012; Geniole and McCormick, 2015; Saxton et al., 2016). Our results are broadly in line with these findings: athletes with facial hair received higher perceived fighting ability ratings than shaved athletes, although this difference did not reach statistical significance.

The main goal of the present study was to investigate the visual perception of threat potential. We therefore employed a more holistic concept of fighting ability rather than focusing solely on the perception of particular characteristics that may contribute to fighting success. In other studies, participants were asked to assess strength, dominance, masculinity, or aggressiveness, which are all relatively simple characteristics. In the current study, participants rated fighting ability, an arguably more abstract and comprehensive quality, which may have made the assessments more difficult. One could speculate that a different rating scale, e.g., one focused on aggressiveness, might yield significant findings, because earlier studies (e.g., Třebický et al., 2013) have reported a close association between this characteristic and perceived fighting ability.

Future studies should also address non-European populations of raters (Třebický et al., 2018). Other aspects of the male physique may be associated with perceived fighting ability in, for instance, Asian cultures, where agility, flexibility, and movement complexity may play a more important role. It is also possible that originally African fighting styles were more dynamic than the rather static and force-oriented European styles.

Our sample differs substantially from those of previous studies in several respects. Athletes in the present study varied in performance level, ranging from beginner amateurs to seasoned professionals, whereas earlier studies used photographs of high-profile professional (UFC) fighters as stimuli. Professional fighters have considerably longer fight records, which allows a more accurate estimate of their actual fighting ability. We included fighters with at least two fights, but so few fights may give an inaccurate picture of an athlete's true fighting potential, especially for fighters at the start of their careers. Moreover, the reliability of the score could be affected by the way fighters are paired for matches, which is not random: organizing committees pair fighters based on their previous experience and fighting record. This may limit the use of the success score as a measure of actual fighting ability. One could devise a more complex measure of fighting ability that takes the formidability of the opponent into account (e.g., winning a fight against an experienced fighter would earn a bonus, i.e., a higher score than winning against a beginner). However, data on the formidability or experience of the individual fighters' opponents were not obtained in our sample, so the ratio of wins to all fights remains the most objective measure of fighting ability currently available.
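The difference between the wins-to-all-fights ratio used in the study and the opponent-adjusted measure proposed above can be illustrated with a short sketch. The weighting (each fight worth 1 plus the opponent's prior win ratio) is purely hypothetical; it is not a measure used or validated in this study, and the fight records are invented.

```python
def win_ratio(fights):
    """Wins divided by all fights; fights is a list of (won, opponent_win_ratio)."""
    return sum(1 for won, _ in fights if won) / len(fights)


def opponent_weighted_score(fights):
    """Hypothetical bonus scheme: every fight is worth 1 + the opponent's prior
    win ratio, so wins over stronger opponents count more. The result is the
    share of the available weight actually won, so it stays between 0 and 1."""
    earned = sum(1 + opp for won, opp in fights if won)
    available = sum(1 + opp for _, opp in fights)
    return earned / available


# Two hypothetical fighters with identical 2-1 records.
beat_strong = [(True, 0.9), (True, 0.8), (False, 0.3)]
beat_weak = [(True, 0.1), (True, 0.2), (False, 0.9)]
for fights in (beat_strong, beat_weak):
    print(round(win_ratio(fights), 2), round(opponent_weighted_score(fights), 2))
```

Both fighters have a win ratio of 0.67, but the one who beat stronger opponents scores about 0.74 under this scheme, versus about 0.55 for the one who beat weaker opponents.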

The present study is also based on a relatively small sample, and one could therefore argue that it had limited statistical power to detect the expected associations. Nevertheless, our sample size is comparable to those of earlier studies such as Han et al. (2017) (N = 44), Sell et al. (2009) (N = 59), and Windhager et al. (2011) (N = 26). Despite our best efforts, we could not recruit more fighters who met our criteria (age 18–40, at least two MMA fights) and were willing to participate, because the popularity of MMA in the Czech Republic is increasing only slowly.

To limit potential systematic bias in the photographs and to give our raters maximum visual information, we took highly standardized 360° rotating photographs. We also asked participants to maintain a neutral expression and an upright posture throughout the photo session, which could help eliminate cues to fighting ability arising from slight, unintentional facial expressions. This is an important point because several earlier studies used downloaded photographs of professional fighters, which varied in lighting conditions, head tilt, etc. Such variability in stimulus quality could be viewed as "noise" that decreased the likelihood of finding a systematic effect. But can we really take this assumption for granted? Could a more formidable depiction, e.g., one with a head tilt and a frown, be a cue to better fighters? The effect of image standardization vs. self-expression on the accuracy of inferences should be addressed in future studies.

One may also argue that results based solely on an MMA fighter population have limited generalizability because of its specific characteristics, such as an overall high level of formidability and a rather specific appearance (broken noses, facial scars, etc.). We tried to decrease potential bias by informing the raters about the target selection criteria only upon completion of the study. Interestingly, the mean formidability rating on the 7-point scale was ≈ 4 (range 2.22 to 6), and the data followed a normal distribution. Although the physical performance of MMA fighters may be considerably higher than that of the general population in industrialized countries, it may be less impressive compared with age-matched individuals from small-scale societies. It is thus possible that a high level of athletic performance mirrors ancestral human conditions better than the commonly used student samples.

In conclusion, we found no significant associations between the measured predictors of physical performance and the perception of fighting ability from facial photographs. Based on the observed effect sizes, we tentatively conclude that inferences of fighting ability are mostly linked to body size (especially weight) and AC, both of which affect the outcome of physical confrontations. Our results therefore indicate that the perception of fighting ability may be more complex than previously thought.

## REFERENCES


## AUTHOR CONTRIBUTIONS

VT, JF, and JH developed the study concept. KK and DS contributed to the study design. Data collection was performed by VT, JF, DS, KC, RK, ZŠ, and RP. VT and JF performed the data analysis and interpretation. VT, JF, and JH drafted the manuscript. DS, KC, KK, RK, ZŠ, and RP provided critical revisions. All authors approved the final version of the manuscript for submission.

### FUNDING

This research was supported by Czech Science Foundation GACR P407/16/03899S, by the Charles University Research Centres UNCE/HUM/032, UNCE 204056, and by the Ministry of Education, Youth, and Sports NPU I program (No. LO1611).

### ACKNOWLEDGMENTS

We thank Tereza Nevolová, Žaneta Slámová, Dagmar Schwambergová, Pavel Šebesta, and other members of the Human Ethology group (www.etologiecloveka.cz) for their help with data collection and ratings, Zsófia Csajbók for valuable advice on statistical analysis, the Mixed Martial Arts Association Czech Republic (MMAA), Anna Pilátová, Ph.D. for proofreading, and all volunteers for their participation.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02740/full#supplementary-material

Video 1 | Sample of 360° head rotation.

Table S1 | Factor analysis; Exploratory correlation matrix.

Table S2 | Dataset for formidability rating.

Table S3 | Dataset for Athletes measurements.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Třebický, Fialová, Stella, Coufalová, Pavelka, Kleisner, Kuba, Šterbová and Havlíček. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Toward a New Approach to Cross-Cultural Distinctiveness and Typicality of Human Faces: The Cross-Group Typicality/ Distinctiveness Metric

#### Karel Kleisner<sup>1</sup> \*, Šimon Pokorný<sup>1</sup> and S. Adil Saribay<sup>2</sup>

<sup>1</sup> Department of Philosophy and History of Science, Charles University, Prague, Czechia, <sup>2</sup> Department of Psychology, Boğaziçi University, Istanbul, Turkey

#### Edited by:

Ian Stephen, Macquarie University, Australia

#### Reviewed by:

Ai-Suan Lee, Universiti Tunku Abdul Rahman, Malaysia; Justin Kyle Mogilski, University of South Carolina Salkehatchie, United States; Anthony James Lee, University of Glasgow, United Kingdom; Iris J. Holzleitner, University of Glasgow, United Kingdom

> \*Correspondence: Karel Kleisner karel.kleisner@natur.cuni.cz

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 11 September 2018 Accepted: 14 January 2019 Published: 31 January 2019

#### Citation:

Kleisner K, Pokorný Š and Saribay SA (2019) Toward a New Approach to Cross-Cultural Distinctiveness and Typicality of Human Faces: The Cross-Group Typicality/ Distinctiveness Metric. Front. Psychol. 10:124. doi: 10.3389/fpsyg.2019.00124

In the present research, we used geometric morphometrics to propose a data-driven method for estimating an individual's degree of facial typicality/distinctiveness for cross-cultural (and other cross-group) comparisons. Looking like a stranger in one's home culture may be somewhat stressful. The same facial appearance, however, might become advantageous within an outgroup population. To address this fit between facial appearance and cultural setting, we propose a simple measure of distinctiveness/typicality based on an individual's position along the axis connecting the facial averages of the two populations under comparison. The farther a face lies from its ingroup population mean toward the outgroup mean, the more distinct it is (vis-à-vis the ingroup) and the more it resembles outgroup standards. We compared this new measure with an alternative measure based on distance from the outgroup mean. The new measure showed a stronger association with rated facial distinctiveness than distance from the outgroup mean did. Subsequently, we manipulated facial stimuli to reflect different levels of ingroup-outgroup distinctiveness and tested them in one of the target cultures. Perceivers successfully distinguished outgroup from ingroup faces in a two-alternative forced-choice task. There was also some evidence that this task was harder when the two faces were closer along the axis connecting the facial averages of the two cultures. Future directions and potential applications of our proposed approach are discussed.

Keywords: typicality, distinctiveness, geometric morphometrics, cross-culture, face space, morphology

## INTRODUCTION

Travelers to a foreign country are sometimes mistaken for locals. One of the authors, a Czech, has discovered on his many trips to Turkey that he can easily lead people to believe he is from the Black Sea region of Turkey. Another author, a Turk, was frequently spoken to in Czech during his sabbatical in Prague by locals who did not realize he was a foreigner. Such experiences may partly result from some degree of resemblance of our respective faces to the typical outgroup face.

Typicality and the concept of type have traditionally played an important role in all life sciences, including psychology, comparative anatomy, and morphology (Galton, 1879, 1883; Russell, 1916; Goethe, 1999; Kleisner, 2007). A typical object, or the abstraction of a typical object, results from comparing many particular instances. Such typical objects are usually treated as a reference against which all other things in the environment are evaluated. Things perceived as most distant from a type are experienced as distinct and less familiar.

Despite extensive interest in facial distinctiveness, measuring it is somewhat complicated, because numerous facial aspects and their interrelations determine whether a face is perceived as distinctive (Wickham et al., 2000). In his influential work, Valentine (1991) defined facial distinctiveness as a function of the Euclidean distance from the population mean face. In Valentine's Face Space, typical and distinctive faces are not treated separately; both are covered by the same multidimensional framework. In this similarity space, faces are represented as single points in a high-dimensional space defined by visual properties (or facial measurements); faces are normally distributed, with a higher density of faces closer to the origin (the mean face); and typical faces lie near the mean, whereas atypical faces lie on the periphery (Valentine, 1991; Valentine et al., 2016). Valentine's model is, however, limited to intra-population comparisons, since outgroup faces will necessarily form a cluster far away from the ingroup mean.
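In Valentine's framework, a face's distinctiveness is its Euclidean distance from the population mean face. The sketch below shows this in a toy two-dimensional face space; in practice the coordinates might be, for example, Procrustes-aligned landmark coordinates or their principal component scores, which is an assumption for illustration rather than a description of a specific study. Function names are illustrative.

```python
import math


def mean_face(faces):
    """Coordinate-wise average of equally long face vectors."""
    dims = len(faces[0])
    return [sum(face[d] for face in faces) / len(faces) for d in range(dims)]


def distinctiveness(face, population):
    """Euclidean distance of a face from the population mean face."""
    return math.dist(face, mean_face(population))


# Toy 2-D face space with four population members; their mean is (1, 1).
population = [[0, 0], [2, 0], [0, 2], [2, 2]]
print(distinctiveness([1, 1], population))  # 0.0 (the average face)
print(distinctiveness([4, 5], population))  # 5.0 (a distinctive face)
```

A single distance of this kind says how far a face is from the mean but not in which direction, so it cannot tell whether a face deviates toward a particular outgroup.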

Previous research has shown that typicality affects the social perception of faces, focusing mostly on the relationship between face typicality and attractiveness (see, e.g., Langlois and Roggman, 1990; Perrett et al., 1994; Rhodes and Tremewan, 1996; Fink and Penton-Voak, 2002; Rhodes, 2006; DeBruine et al., 2007; Said and Todorov, 2011; Danel et al., 2012; Trujillo et al., 2014). Recently, Sofer et al. (2015) demonstrated that face typicality plays an important role in trustworthiness judgments, showing that perceived trustworthiness, but not attractiveness, changes along the cline of facial typicality. Moreover, a cross-cultural study of Japanese and Israeli populations revealed that ingroup-typical faces were perceived as more trustworthy than outgroup-typical faces, suggesting that people from different cultures use their own culture-specific typicality cues when judging trustworthiness (Sofer et al., 2017). Furthermore, facial distinctiveness and typicality have repeatedly been shown to be important for face recognition (Bartlett et al., 1984; Valentine, 1991; Valentine and Ferrara, 1991; Vokey and Read, 1992; O'Toole et al., 1994; Burton et al., 2005; Valentine et al., 2016). Outgroup perception of typicality has been suggested to be the core mechanism of racial stereotypes: members of a minority who are perceived as more typical of their own group face a higher degree of racial prejudice and discrimination (Maddox, 2004; Kahn and Davies, 2011; Hebl et al., 2012). Our work, however, focuses neither on how typicality/distinctiveness affects the recognition of individual faces nor on the stereotypicality of local minorities. Here we ask to what extent an individual face resembles the standards of the ingroup and outgroup populations. Our proposed approach is not a refinement of intra-cultural research on facial typicality/distinctiveness; rather, it extends that research into cross-cultural comparisons.

In this research, we compare faces from two populations, Czech and Turkish, which are not closely related but also not extremely distant, either geographically or in similarity space. See **Figure 1** for an illustration of the differences between Czech and Turkish facial morphology. An individual's face may resemble the standards of facial appearance typical of a foreign population while at the same time being perceived as somewhat distinct within its own population. Therefore, we do not ask how distinct a face is from the facial average of its own population. Rather, the question is how to measure deviations from the morphological standards of one's own population toward the standards of a foreign population. This perspective is crucial if we want to capture the local dynamics of cross-cultural social perception. When visitors arrive in a foreign country and encounter locals, their faces are compared not with the standards of their home culture but with those of the local culture, i.e., in terms of how distinct they are from the local majority type. In our case, this corresponds to how Turkish-like a Czech face looks, and vice versa.

In general, there are four theoretical options for assessing typicality within a face space: (1) to measure the Procrustes distance of all faces from the global mean (the average of all faces from both cultures under comparison); (2) to calculate the distance from each face to the local mean of its own population; (3) to calculate the distance of each face from the mean of the outgroup (foreign) population; and (4) to project individual faces onto the axis connecting the two local means. Moreover, objective measures of facial similarity based on Euclidean distances in principal component space have been shown to correspond to human perception of facial similarity (Tredoux, 2002).

Options 1 and 2 do not allow us to determine which Czech faces are the most similar or dissimilar to Turkish standards, and vice versa. The first option reveals only the similarity to a facial average that combines intermediate Czech and Turkish features. The second option informs only about similarity to local standards and tells nothing about the closeness of a face to the outgroup mean. For instance, the Czech face that is most similar to Turks can have the same distance from its local average as the Czech face that is most dissimilar, because faces are distributed radially in all directions around their local means. Distance from the local average remains the best measure of within-culture typicality but does not allow cross-cultural comparison. The third and fourth options are more suitable for cross-cultural comparison because they provide some information about the similarity of each face to the outgroup culture: the mean of one population is used as a reference to assess the level of distinctiveness of faces from the second population, and vice versa.

Inspired by research on human sexual dimorphism, we employed the computational strategies represented by the third and fourth options, which were previously used to measure the individual degree of male sex-typicality (Valenzano et al., 2006; Sanchez-Pages and Turiegano, 2010; Komori et al., 2011; Sánchez-Pagés and Turiegano, 2013; Sanchez-Pages et al., 2014; Mitteroecker et al., 2015). Sanchez-Pages et al. (2014) proposed using the distance between the symmetrized facial configuration of each male individual and the average female face as an objective measure of masculinity. In contrast, Mitteroecker et al. (2015) proposed the position of individual male faces along the axis between the male and female average shapes (maleness shape scores) as a measure of the individual degree of sexual dimorphism. This method, i.e., using two group averages to define an axis of morphological difference, was previously also applied by Komori et al. (2011) and Valenzano et al. (2006). The first approach is computationally identical to option 3, i.e., distinctiveness measured as the Distance from Outgroup Mean (DfOM), whereas the latter is identical to option 4, a measure we call here the Cross-Group Typicality/Distinctiveness Metric (CTDM).

The aim of the present research is twofold. Study 1 compares a measure of facial distinctiveness/typicality based on the position of an individual face along the vector between the ingroup and outgroup means (CTDM) with a measure based on the individual distance from the outgroup mean (DfOM). As the criterion measure, we gathered ratings, from raters in two cultures, of how foreign or local various ingroup and outgroup faces look. We expect CTDM to be more tightly correlated with these ratings than DfOM, because DfOM carries no directional information about a face's position in morphospace. We further expect the difference between the correlations (CTDM – typicality/distinctiveness ratings vs. DfOM – typicality/distinctiveness ratings) to be statistically significant.

In Study 2, we used manipulated composites based on different levels of CTDM to estimate the accuracy with which participants categorize outgroup vs. ingroup faces when the two are paired. We expect that observers will generally recognize outgroup faces with above-chance accuracy. Further, we expect accuracy in this task to be lower for composite face pairs with a smaller CTDM distance (i.e., when both faces in the pair are closer to their respective outgroup means).

In sum, the main goal of this article is to propose a simple measure of distinctiveness and typicality that can be easily computed and, thanks to its one-dimensional (univariate) continuous nature, used as a universal input variable reflecting the individual degree of typicality/distinctiveness in any cross-group comparison, within all kinds of subsequent statistical modeling. In two studies, we aim to provide preliminary evidence that the proposed measure behaves in line with theoretical expectations.

## STUDY 1

## Materials and Methods

In both studies, all procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation and with the Helsinki Declaration. The research (ref. number 06/2017) was approved by the Institutional Review Board of Charles University, Faculty of Science. The datasets generated for this study can be found on OSF<sup>1</sup>.

#### Acquisition of Portrait Photographs

fpsyg-10-00124 January 29, 2019 Time: 17:5 # 4

We obtained standardized portrait photographs of 100 men (50 Czech, Mean Age ± SD = 23.89 ± 4.0; and 50 Turkish, Mean Age ± SD = 21.54 ± 1.93) and 100 women (50 Czech, Mean Age ± SD = 23.77 ± 4.32; and 50 Turkish, Mean Age ± SD = 21.28 ± 1.34). The participants were instructed to avoid facial cosmetics and jewelry, were seated in front of a white background, and were asked to pose for the camera with a neutral facial expression. The photographs of Czech targets were taken with a Canon 6D camera using an 85 mm lens, studio flash, and a reflection screen. The distance from the lens to the participant's face was 1.5 m. A similar setup was used to photograph the Turkish targets: for the majority of photographs, a Nikon D90 with a 105 mm lens was used and targets were seated 2.82 m from the camera (for details see Saribay et al., 2018). The photographs were cropped so that the eyes were at the same absolute height and the same length of neck was visible. Because the original image files were too large for subsequent online ratings, the final image resolution was reduced to 600 × 745 pixels (width × height).

#### Portrait Ratings

Participants were sent an email inviting them to participate in an online study. A group of 315 Turkish raters (134 men, Mean Age ± SD = 21.13 ± 2.09 and 181 women, Mean Age ± SD = 20.9 ± 1.55) and 123 Czech raters (45 men, Mean Age ± SD = 28.88 ± 12.83 and 78 women, Mean Age ± SD = 27.45 ± 12.56) agreed to participate. Turkish raters were undergraduate students who participated in return for course credit, and Czech raters were volunteers. Each rater was shown 100 faces, a random subset of the 200 target faces (100 male and 100 female; 50 Turkish and 50 Czech within each gender), in randomized order, one face at a time. Raters were asked whether each face "looks more like a foreigner or more like a local person?" using a five-point response scale ranging from 1 (certainly a foreigner) to 5 (certainly local). No other information regarding the stimuli was given. Because higher ratings indicate greater certainty that the rated face belongs to the cultural ingroup, we refer to these ratings as "typicality/distinctiveness" and, more specifically, as "Turkishness" and "Czechness" when the assessment was done by Turkish and Czech raters, respectively. Inter-rater agreement estimated by Cronbach's alpha was 0.968 for Turkishness and 0.882 for Czechness. Ratings by male and female raters were highly correlated for both Czechness (r = 0.926, p < 0.001) and Turkishness (r = 0.987, p < 0.001).
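Cronbach's alpha, used above as the index of inter-rater agreement, treats raters as "items" and faces as "cases." The following is a minimal plain-Python sketch for a complete faces × raters matrix; the function name is hypothetical, and because each rater here saw only a random subset of faces, how missing cells were handled for the reported values is not described in the text and is not reproduced.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha with raters as 'items' and faces as 'cases'.

    ratings: list of rows, one row per face and one column per rater.
    Assumes a complete matrix (every rater rated every face).
    """
    k = len(ratings[0])  # number of raters
    if k < 2:
        raise ValueError("need at least two raters")
    # Sample variance of each rater's scores across faces
    rater_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of the summed score that each face received
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)


# Two raters who rank three faces almost identically
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # prints 0.667
```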

#### Landmark Digitization and Procrustes Fit

Using TpsDig2 software (Rohlf, 2015), we digitized 72 landmarks on each face. To capture shape information along curves, 36 of the 72 landmarks were designated as semilandmarks. For definitions and positions of landmarks, see **Figure 2** (the same configuration we used in our previous work, e.g., Kleisner et al., 2010; Danel et al., 2016).

FIGURE 2 | Positions of landmarks and semilandmarks on a face. Landmarks are marked as white filled circles and semilandmarks as empty circles.

All shape data were symmetrized and subsequently subjected to Generalized Procrustes Analysis (GPA) using the 'gpagen' function implemented in the geomorph package in R (Adams and Otárola-Castillo, 2013). We pooled the shape coordinates of Czech and Turkish facial configurations and ran GPA on this joint dataset. GPAs were run separately for men and women to avoid shape variation due to sexual dimorphism. This procedure centered, scaled, and rotated all landmark configurations, yielding aligned shape coordinates (Procrustes residuals). Semilandmarks were slid along the curves by minimizing bending energy between each specimen and the Procrustes mean shape (Bookstein, 1997). The Procrustes-aligned data were projected to tangent space and used in subsequent multivariate analyses. For intrasexual comparisons of the two alternative measures of typicality/distinctiveness (DfOM vs. CTDM), the mean Czech and Turkish facial configurations (consensus) were computed separately for male and female faces. The average shape differences between Czech and Turkish facial configurations were visualized using thin plate spline (TPS) deformation grids (Bookstein, 1989; Rohlf and Marcus, 1993; Klingenberg, 2013).

<sup>1</sup>https://osf.io/wh7mf/
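To show what the centering, scaling, and rotation steps of GPA accomplish, a simplified two-dimensional version can be sketched in plain Python. This is an illustration only, assuming 2-D landmarks stored as lists of (x, y) tuples: it omits symmetrization, semilandmark sliding, and tangent-space projection, and it is not the geomorph `gpagen` implementation. All function names are hypothetical.

```python
import math


def center(shape):
    """Translate a configuration so that its centroid is at the origin."""
    n = len(shape)
    cx = sum(x for x, _ in shape) / n
    cy = sum(y for _, y in shape) / n
    return [(x - cx, y - cy) for x, y in shape]


def centroid_size(shape):
    """Square root of summed squared landmark distances from the centroid (shape must be centred)."""
    return math.sqrt(sum(x * x + y * y for x, y in shape))


def scale_to_unit(shape):
    s = centroid_size(shape)
    return [(x / s, y / s) for x, y in shape]


def rotate_onto(shape, target):
    """Rotate a centred shape to minimise its squared distance to a centred target.

    In 2-D the optimal angle has the closed form atan2(sum of cross terms, sum of dot terms).
    """
    cross = sum(x * ty - y * tx for (x, y), (tx, ty) in zip(shape, target))
    dot = sum(x * tx + y * ty for (x, y), (tx, ty) in zip(shape, target))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in shape]


def mean_shape(shapes):
    n = len(shapes)
    return [(sum(sh[i][0] for sh in shapes) / n, sum(sh[i][1] for sh in shapes) / n)
            for i in range(len(shapes[0]))]


def procrustes_distance(a, b):
    """Euclidean distance between two aligned configurations."""
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b)))


def gpa(shapes, tol=1e-10, max_iter=100):
    """Simplified GPA: centre, scale to unit centroid size, then iteratively rotate onto the mean."""
    aligned = [scale_to_unit(center(sh)) for sh in shapes]
    consensus = aligned[0]  # start with the first specimen as the reference
    for _ in range(max_iter):
        aligned = [rotate_onto(sh, consensus) for sh in aligned]
        new_consensus = scale_to_unit(center(mean_shape(aligned)))
        converged = procrustes_distance(new_consensus, consensus) < tol
        consensus = new_consensus
        if converged:
            break
    return aligned, consensus
```

A copy of a configuration that has only been translated, rescaled, and rotated should end up at (numerically) zero Procrustes distance from the original after alignment.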

#### Distance From Outgroup Mean (DfOM)

Distance from outgroup mean (DfOM) was computed as the Procrustes distance between the outgroup average facial configuration (consensus) and each face in the set (**Figure 3A**). The population that is the outgroup for foreign faces is, at the same time, the ingroup for native faces; thus, the outgroup is understood from the perspective of a visitor to a foreign country.

If Turkish faces are compared relative to Czech standards, DfOM is calculated as the distance of all faces (Turkish and Czech) from the Czech average and represents a measure that can be compared with ratings of "Czechness." If Czech faces are compared relative to Turkish standards, DfOM is calculated as the distance of all faces (Turkish and Czech) from the Turkish average and represents a measure that can be compared with ratings of "Turkishness." The shorter the distance of a face from the outgroup consensus, the more distinct the face is from its own population and the more typical it is of the foreign population.
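Once faces are Procrustes-aligned, DfOM reduces to a distance from a consensus. The sketch below assumes each aligned face has been flattened to a coordinate vector (x1, y1, x2, y2, ...) and approximates Procrustes distance by the Euclidean distance between aligned coordinates; the function names and toy values are hypothetical.

```python
import math


def mean_vector(vectors):
    """Coordinate-wise mean (consensus) of aligned shape vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dfom(faces, reference_faces):
    """Distance of each face from the consensus of the reference population.

    faces: every aligned face to be scored (both populations).
    reference_faces: aligned faces of the population whose standards apply,
    e.g. Czech faces when DfOM is compared with "Czechness" ratings.
    """
    consensus = mean_vector(reference_faces)
    return [euclidean(face, consensus) for face in faces]


czech = [[0.0, 0.0], [2.0, 0.0]]  # toy 2-coordinate "faces"; consensus is (1, 0)
print(dfom([[1.0, 0.0], [4.0, 4.0]], czech))  # prints [0.0, 5.0]
```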

#### Cross-Group Typicality/Distinctiveness Metric (CTDM)

To determine the position of an individual facial shape along the axis between the ingroup and outgroup mean faces, we projected the individual facial configurations in facial morphospace onto the vector between the ingroup and outgroup means (see **Figure 3B**; see also Valenzano et al., 2006; Komori et al., 2011; Mitteroecker et al., 2015). This 'between-group PCA' is a safer alternative to linear discriminant analysis when the number of individuals does not exceed the number of variables several times over (Mitteroecker and Bookstein, 2011). The position of an individual face along the axis connecting the Czech and Turkish mean shapes corresponds to the relative distance of each facial configuration from the average Czech and Turkish facial shapes. This position can be expressed numerically by scores corresponding to the projection of each face onto the principal components of the group averages. These scores vary with changes in facial morphology along the vector connecting the Czech and Turkish means and thus represent a one-dimensional proxy for overall multidimensional facial morphology. More negative scores indicate more Czech-like morphology, whereas more positive scores indicate more Turkish-like facial shape.
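With only two groups, a between-group PCA has a single axis, the normalized difference between the two group means, so CTDM scores can be sketched as a projection onto that vector. This sketch assumes aligned faces flattened to coordinate vectors, centers the scores on the midpoint of the two means (which equals the grand mean when both groups have the same size, as here), and orients the axis from the Czech toward the Turkish mean; the sign of an axis from an actual PCA is arbitrary, and the names are hypothetical.

```python
import math


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ctdm(faces, czech_faces, turkish_faces):
    """Scores along the Czech-to-Turkish mean axis: negative = Czech-like, positive = Turkish-like."""
    cz = mean_vector(czech_faces)
    tr = mean_vector(turkish_faces)
    axis = [t - c for c, t in zip(cz, tr)]
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]             # unit vector from the Czech to the Turkish mean
    mid = [(c + t) / 2 for c, t in zip(cz, tr)]  # assumed centre of the score scale
    return [sum((f - m) * u for f, m, u in zip(face, mid, unit)) for face in faces]


if __name__ == "__main__":
    cz = [[-1.0, 1.0], [-1.0, -1.0]]  # toy Czech consensus: (-1, 0)
    tr = [[1.0, 1.0], [1.0, -1.0]]    # toy Turkish consensus: (1, 0)
    # Both faces lie 2 units from the Czech mean (equal DfOM) but on opposite sides of it
    print(ctdm([[1.0, 0.0], [-3.0, 0.0]], cz, tr))  # prints [1.0, -3.0]
```

The usage lines illustrate the directional information that a pure distance lacks: two faces with the same distance from the Czech consensus receive CTDM scores of opposite sign.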

#### Statistical Analysis

All statistical procedures were performed in R 3.5.0. Kendall's rank correlation coefficient was computed to measure the strength of relationships between variables. We compared Kendall's correlations of CTDM and DfOM with ratings of Turkishness/Czechness (the shared variable). The significance of the difference between the correlation coefficients, together with their confidence intervals, was bootstrapped: 10,000 bootstrap samples were drawn from the original data, and the expected distributions of the coefficients and of the difference between them were calculated. For CI estimation, the links between ratings, CTDM, and DfOM values within individuals were maintained while individuals were sampled with replacement. This procedure is equivalent to the CI estimation in the bootstrap version of the "kendall.ci" function in the NSM3 package in R (Schneider et al., 2018), which, however, does not take the relationships among more than two variables into consideration. When the distribution of expected differences between the correlation coefficients was calculated, the CTDM and DfOM vectors were kept unchanged to preserve their correlation within individuals, and the ratings of Turkishness/Czechness were sampled with replacement.
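The two resampling schemes described above can be sketched in plain Python. Several details are assumptions rather than the authors' code: tau-b is used to handle ties; "stronger association" is read as a larger absolute tau, because CTDM and DfOM may correlate with the ratings in opposite directions; the CI is a simple percentile interval; and the p-value is two-sided with the usual +1 correction. This is not the NSM3 implementation, and the function names are hypothetical.

```python
import math
import random


def kendall_tau_b(x, y):
    """Kendall's tau-b by O(n^2) pair counting; pairs tied in a variable drop out of its denominator term."""
    s = untied_x = untied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            sx = (x[i] > x[j]) - (x[i] < x[j])
            sy = (y[i] > y[j]) - (y[i] < y[j])
            untied_x += sx != 0
            untied_y += sy != 0
            s += sx * sy  # +1 concordant, -1 discordant, 0 tied
    return s / math.sqrt(untied_x * untied_y)


def tau_difference(ratings, ctdm, dfom):
    # Assumption: compare absolute taus, since the measures may correlate with ratings in opposite directions
    return abs(kendall_tau_b(ctdm, ratings)) - abs(kendall_tau_b(dfom, ratings))


def bootstrap_ci(ratings, ctdm, dfom, n_boot=10_000, alpha=0.05, seed=1):
    """Percentile CI for the tau difference; faces are resampled with their (rating, CTDM, DfOM) triplets intact."""
    rng = random.Random(seed)
    n = len(ratings)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(tau_difference([ratings[i] for i in idx],
                                    [ctdm[i] for i in idx],
                                    [dfom[i] for i in idx]))
    diffs.sort()
    return diffs[int(alpha / 2 * n_boot)], diffs[int((1 - alpha / 2) * n_boot) - 1]


def null_p_value(ratings, ctdm, dfom, n_boot=10_000, seed=2):
    """Two-sided p-value: CTDM and DfOM stay fixed, ratings are resampled with replacement."""
    rng = random.Random(seed)
    n = len(ratings)
    observed = abs(tau_difference(ratings, ctdm, dfom))
    exceed = 0
    for _ in range(n_boot):
        resampled = [ratings[rng.randrange(n)] for _ in range(n)]
        if abs(tau_difference(resampled, ctdm, dfom)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

Pure-Python pair counting is slow; with 100 faces and 10,000 resamples it runs for minutes, so a smaller `n_boot` is convenient for trying the sketch out.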

To test for shape differences between Czech and Turkish faces, we employed a permutation test comparing the distance between the Czech and Turkish means with the distances obtained by randomly assigning observations to these groups. This was done separately for each gender using the "permudist" function implemented in the Morpho package in R (Schlager, 2017).
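The logic of this test can be sketched as a label-permutation test on the distance between group means. The sketch follows the description above but is not the Morpho `permudist` implementation; faces are assumed to be aligned coordinate vectors, and the names are hypothetical.

```python
import math
import random


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance_between_means(group_a, group_b):
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ma, mb)))


def permutation_test_means(group_a, group_b, n_perm=9_999, seed=1):
    """Compare the observed distance between group means with distances after random relabelling."""
    observed = distance_between_means(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of observations to the two groups
        if distance_between_means(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```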

Differences in morphological variation between the compared populations may affect the cross-cultural discrimination of faces, because target faces from a less variable population might be easier to identify as belonging to the ingroup. Therefore, we tested for differences in morphological disparity between Czech and Turkish facial configurations using the function "morphol.disparity" in R's geomorph package (Adams and Otárola-Castillo, 2013), with a significance test based on 9,999 permutations. Using the same routine, we also compared the morphological disparity of male and female faces, because the facial traits responsible for population identity might be easier to detect in the less variable gender.
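Morphological disparity is often summarized as Procrustes variance, taken here to be the mean squared distance of specimens from their group mean. The sketch below assumes that definition and tests the absolute difference between two groups by shuffling within-group residuals between the groups; `morphol.disparity` may differ in its exact definition and permutation scheme, so treat this only as an illustration. The names are hypothetical.

```python
import random


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def procrustes_variance(group):
    """Mean squared distance of aligned shapes from their group mean (assumed definition)."""
    m = mean_vector(group)
    return sum(sum((x - mi) ** 2 for x, mi in zip(v, m)) for v in group) / len(group)


def residuals(group):
    """Deviations of each shape from its own group mean."""
    m = mean_vector(group)
    return [[x - mi for x, mi in zip(v, m)] for v in group]


def disparity_test(group_a, group_b, n_perm=9_999, seed=1):
    """Permutation test of |PV_a - PV_b|, shuffling within-group residuals between groups."""
    observed = abs(procrustes_variance(group_a) - procrustes_variance(group_b))
    pooled = residuals(group_a) + residuals(group_b)  # removes the mean difference between groups
    n_a = len(group_a)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(procrustes_variance(pooled[:n_a]) - procrustes_variance(pooled[n_a:]))
        if diff >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Centering each group on its own mean before pooling keeps a difference in group means from inflating the variance of the permuted groups.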

## Results and Discussion

#### Comparison of CTDM and DfOM

As expected, CTDM showed systematically tighter correlations with typicality/distinctiveness ratings (Czechness/Turkishness) than DfOM. The results of Kendall rank correlations (with bootstrapped CIs) are summarized in **Table 1**. See also **Supplementary Table 1** comparing the results of cross-group metrics (CTDM and DfOM) with distance calculated from ingroup means (DfIM). The bootstrap test of difference between Kendall's correlations revealed that the association of typicality/distinctiveness ratings with CTDM is significantly stronger than with DfOM; this was true for faces of both men (p = 0.013) and women (p = 0.029).

TABLE 1 | Kendall's rank correlations (with bootstrapped CIs) between Cross-Group Typicality/Distinctiveness Metric (CTDM), Distance from Outgroup Mean (DfOM), and ratings of typicality/distinctiveness by Turkish (Turkishness) and Czech (Czechness) raters.


See Methods for details on the calculation of CTDM and DfOM. Note that CTDM may take negative values; more negative scores indicate more Czech-like morphology, whereas more positive scores indicate more Turkish-like facial shape. ∗p < 0.01; ∗∗p < 0.001.

#### Morphological Differences Between Czech and Turkish Faces

A two-group permutation test showed that Czech faces differ significantly in facial shape from Turkish faces, for faces of both men (Procrustes distance between means, PDM = 0.01995, p < 0.001) and women (PDM = 0.01634, p < 0.001). Thus, the members of our target populations differ on average in facial structure, which makes the cross-group comparison morphologically meaningful.

The analysis of morphological disparity (MD) showed that Czech men (MD = 0.00199) are more variable in facial shape than Turkish men (MD = 0.00154), and Czech women (MD = 0.00182) are more variable in facial shape than Turkish women (MD = 0.00142). These differences were significant for both male (p = 0.015) and female (p = 0.01) faces. When sex differences were tested, men and women (including both Czech and Turkish faces) did not differ significantly in facial shape variation (p = 0.149). These results suggest that Czechs are generally more variable in facial morphology than Turks. Raters might thus be more effective in classifying Turkish faces, which are (at least in our sample) morphologically more homogeneous. The stronger effects reported for Turkish faces seem to support this expectation.

## STUDY 2

## Materials and Methods

#### Production of Manipulated Stimuli

We generated manipulated composites to estimate the accuracy with which participants categorize a face across different types of trials, each pairing two of the following types of Turkish (TR) and Czech (CZ) face composites: 1: CZ farthest away from TR; 2: Average CZ; 3: CZ closest to TR; 4: TR closest to CZ; 5: Average TR; 6: TR farthest away from CZ. Only different types of composites were paired (e.g., 1 vs. 4).

The composite faces were generated using TpsSuper 2.05 software (Rohlf, 2015). Each composite was created from the six facial images of individuals closest to a selected position determined by CTDM value (i.e., CZ farthest away from the TR mean; Average CZ; CZ closest to the TR mean; TR closest to the CZ mean; Average TR; TR farthest away from the CZ mean). See **Figure 4** for the manipulated composites and **Supplementary Figures 1**, **2** (men and women, respectively) for the positions of the composites along the CTDM axis.
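The selection step behind each composite can be sketched as choosing the six faces whose CTDM scores lie closest to a target position and averaging their shape vectors. This covers shape only (TpsSuper also unwarps and averages the images themselves), and the choice of targets shown here, such as the most negative Czech score for "CZ farthest away from TR", is our assumption; all names are hypothetical.

```python
def nearest_by_ctdm(faces, target, k=6):
    """Return the k (score, shape) entries whose CTDM score is closest to target."""
    return sorted(faces, key=lambda face: abs(face[0] - target))[:k]


def composite_shape(faces, target, k=6):
    """Average the shape vectors of the k faces closest to target along the CTDM axis."""
    chosen = [shape for _, shape in nearest_by_ctdm(faces, target, k)]
    return [sum(col) / len(chosen) for col in zip(*chosen)]


def czech_targets(czech_scores):
    """Hypothetical target positions for composites 1-3 (negative = Czech-like)."""
    return {
        "CZ farthest from TR": min(czech_scores),
        "Average CZ": sum(czech_scores) / len(czech_scores),
        "CZ closest to TR": max(czech_scores),
    }
```

Targets for the Turkish composites (4-6) would mirror this, with the lowest Turkish score as "TR closest to CZ" and the highest as "TR farthest away from CZ."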

#### Two-Alternative Forced-Choice (2AFC) Discrimination of Manipulated Stimuli

Participants were sent an email inviting them to participate in an online study. 327 Turkish university undergraduates (153 men, Mean Age ± SD = 20.63 ± 1.46; 171 women, Mean Age ± SD = 20.6 ± 1.8; 3 others, Mean Age ± SD = 23 ± 3.6) participated in return for course credit. Participants were asked to view composite face pairs (whose typicality/distinctiveness was manipulated based on CTDM) and to select the face in each pair that "looks more foreign." We focused only on cross-group pairs; that is, in each pair, one face objectively belonged to the CZ group and the other to the TR group (with left-right position on the computer screen counterbalanced). This resulted in the following combinations (in terms of the above categories; see the Production of Manipulated Stimuli section and **Figure 4**): 1 vs. 4; 1 vs. 5; 1 vs. 6; 2 vs. 4; 2 vs. 5; 2 vs. 6; 3 vs. 4; 3 vs. 5; 3 vs. 6. Each participant saw one trial for each possible combination, in random order. Accuracy was defined as whether the face selected as looking more foreign was objectively an outgroup (i.e., Czech) face.

#### Statistical Analysis

All statistical procedures were performed in R 3.5.0. The average difference in the proportion of correct responses between trials in which the Czech face was presented on the left vs. the right of the screen was −0.0157 (range = −0.059 to 0.036), suggesting that face location did not have a systematic effect on responses. Thus, we analyzed the data collapsed across this counterbalancing factor.

#### **Overall strategy**

We were interested in (1) whether participants were generally able to accurately distinguish outgroup from ingroup faces and (2) whether the level of this ability differed across trials with different face pairs. We tested the former using one-proportion z-tests against chance-level performance (i.e., 50% accuracy). For the latter, we first conducted Cochran's Q tests (with the RVAideMemoire package in R; Hervé, 2018) to examine whether face pairs differed in the proportion of correct responses that participants produced in those trials. To follow up the significant Cochran's Q tests, we conducted a set of selective pairwise comparisons. For each face gender, 9 pairs had been presented to participants, resulting in 36 possible pairwise comparisons in total, most of which are theoretically uninteresting or uninterpretable. For ease of interpretation, we focused on comparisons in which we could hold one face constant (e.g., 1\_4 vs. 3\_4). We selected specific pairs of face pairs to explore the idea that Turkish raters would have more difficulty (i.e., produce a lower proportion of correct responses) in trials involving the Czech face closest to the Turkish average (#3) than in trials involving the Czech faces further away from the Turkish average (#1 and #2). That is, holding constant the Turkish face in the pair, the Czech face closest to the Turkish average should more often be mistakenly judged as not foreign than Czech faces further away from that average. We expected this to be especially true relative to the Czech face furthest away from the Turkish average (#1) and, to a lesser extent, relative to the average Czech face (#2).
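Both tests can be sketched with the Python standard library. The z-test below uses the uncorrected normal approximation and a one-sided alternative (accuracy above 0.5), both assumptions; R's `prop.test`, for instance, applies a continuity correction by default. The Cochran's Q p-value uses the closed-form chi-square upper tail, which holds only for even degrees of freedom (9 face pairs give df = 8). Function names are hypothetical.

```python
import math
from statistics import NormalDist


def one_proportion_z(successes, n, p0=0.5):
    """One-sided z-test of H1: proportion > p0 (normal approximation, no continuity correction)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, NormalDist().cdf(-z)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def cochran_q(table):
    """Cochran's Q for a participants x face-pairs table of 0/1 (correct) responses."""
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    grand = sum(row_totals)
    q = (k - 1) * (k * sum(c * c for c in col_totals) - grand ** 2) / (
        k * grand - sum(r * r for r in row_totals))
    return q, k - 1, chi2_sf_even_df(q, k - 1)
```

Rows in which a participant answered all pairs correctly (or all incorrectly) contribute nothing to Q, which is why the formula depends only on column totals and squared row totals.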

This highlights the following 6 comparisons (per face gender), with the expectation of a lower proportion of correct responses for the second pair in each comparison: 1\_X vs. 3\_X and 2\_X vs. 3\_X, where X takes the values 4, 5, and 6 in different trials. We conducted these pairwise comparisons using McNemar's test (asymptotic; rcompanion R package; Mangiafico, 2018), which tests the null hypothesis that the proportion of correct responses is equal for the two face pairs compared. To control the familywise error rate, p-values were adjusted using the Hochberg method. We also fitted binomial mixed-effects models (the lme4 "glmer" function, loaded via the lmerTest package; Kuznetsova et al., 2017) with "responses to CTDM trials" (coded as 1 = correct answer, 0 = incorrect answer) as the response variable, "type of CTDM trial" as the independent variable, and "rater identity" as a random effect. We built separate models for male and female facial stimuli. We specified relevant contrasts between trials using the "glht" function of the multcomp package (Hothorn et al., 2008).

FIGURE 4 | Visualization of manipulated composite stimuli of men (above) and women (below) based on different levels of CTDM along the mean difference vector.
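McNemar's test and the Hochberg adjustment are short enough to write out. The McNemar statistic below follows the formula of R's `mcnemar.test`, including its default continuity correction (whether the reported analysis used the correction is not stated), and the Hochberg step-up follows the logic of R's `p.adjust(method = "hochberg")`. Function names are hypothetical.

```python
import math


def mcnemar(first, second, correct=True):
    """Asymptotic McNemar test for two paired 0/1 outcome lists (1 = correct response).

    b counts participants correct only on the first face pair, c those correct only on the second.
    """
    b = sum(1 for x, y in zip(first, second) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(first, second) if x == 0 and y == 1)
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs (R would return NaN here)
    stat = (abs(b - c) - (1 if correct else 0)) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p


def hochberg(p_values):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * len(p_values)
    running_min = 1.0
    for factor, i in enumerate(order, start=1):  # largest p gets factor 1, smallest gets factor m
        running_min = min(running_min, factor * p_values[i])
        adjusted[i] = running_min
    return adjusted
```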

#### **Data exclusions**

Because the findings could be affected by the presence of raters whose cultural ingroup is not Turkish, we sought to exclude such raters from the dataset. Participants were asked to report their nationality, but this did not capture cultural ingroup-outgroup status or possible differences in the amount of exposure to Turkish individuals (e.g., some participants who indicated a nationality other than Turkish were dual citizens). Instead, we approximated how many months of their lifetime each participant had spent in their homeland as the difference between their age in months and the number of months they reported having spent abroad. Six participants had spent a month or less in Turkey. We excluded these participants from further analyses because their exposure to Turkish faces appears to have been very limited (one participant with missing data on this measure was retained).
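The exclusion rule reduces to simple arithmetic. A minimal sketch, assuming age is already expressed in months and that `None` marks missing data (retained, as in the text); the function name is hypothetical.

```python
def keep_participant(age_months, months_abroad):
    """Hypothetical exclusion rule: keep raters who spent more than one month in their homeland.

    Missing values (None) are retained, as described in the text.
    """
    if age_months is None or months_abroad is None:
        return True
    months_at_home = age_months - months_abroad
    return months_at_home > 1  # "a month or less" leads to exclusion
```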

In the remaining sample, each participant's average proportion of correct responses across the 18 trials of the 2AFC task ranged from 0.16 to 1. Only 6 participants had an average proportion of correct responses lower than 0.5. Given the morphological closeness of the face pairs presented in the 2AFC trials, and hence the difficulty of responding accurately, these low performers were not necessarily careless or responding randomly. Thus, we retained them. The final sample consisted of 321 participants. These decisions were inconsequential: the findings were not affected by inclusion vs. exclusion of participants on the basis of limited exposure to Turkish faces or low performance in the 2AFC (e.g., the decisions to reject or retain the null hypothesis in **Table 3** were unchanged).

## Results and Discussion

#### Ability to Distinguish Foreign Faces

We checked whether participants were generally able to distinguish foreign faces from local ones. **Table 2** presents the percentage of correct responses in the 2AFC trials across different face pairs, separately for male and female target faces. Overall, participants appeared to have no difficulty distinguishing foreign faces. One-proportion z-tests indicated that the proportions for all face pairs in **Table 2** were significantly greater than chance level (0.5; ps < 0.0001).

TABLE 2 | Proportion of correct responses in the 2AFC for each face pair by face gender.


See Figure 4 or the text for the correspondence between values in the face pair column and composite images.



Hochberg-adjusted p-values: ∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001.

#### Differences in Ability to Distinguish Foreign Faces Across Face Pairs

Next, we examined whether participants' ability to produce the correct response (i.e., choosing the outgroup face as "foreign") differed across face pairs. For this purpose, we conducted Cochran's Q tests as omnibus tests of whether the proportion of correct responses differed among the 9 pairs of male faces and, separately, among the 9 pairs of female faces. The proportion of correct responses differed in at least one comparison for both male, χ²(8, N = 321) = 52.2, p < 0.0001, and female face pairs, χ²(8, N = 321) = 235, p < 0.0001.

To follow up, we compared selected pairs of face pairs (see Statistical Analysis section above) using McNemar's test. These pairwise comparisons are listed in **Table 3**. Together with the proportions listed in **Table 2**, they show that the differences in all significant comparisons are in the expected direction. For male faces, the comparisons that we expected to differ most are all significant, whereas those for which we had weaker expectations are not. For female faces, all comparisons were significant. Binomial mixed-effects models corroborated the results of the McNemar tests (see **Supplementary Tables 2**, **3**).

The design may have had limitations that future research could address in a better, confirmatory test of the idea. In sum, although responses in the 2AFC apparently have other sources of variance, these results provide at least preliminary evidence that outgroup faces closer to the ingroup average are more difficult to categorize correctly as "foreign" and/or that outgroup faces further away from the ingroup average are easier to categorize correctly as "foreign."

## GENERAL DISCUSSION

In the present research, we took advantage of geometric morphometrics to propose a sample-dependent, and thus data-driven, method for estimating the individual degree of facial distinctiveness/typicality for cross-cultural comparisons. We attempted to provide preliminary evidence in support of a novel cross-cultural metric of typicality/distinctiveness (CTDM) in two studies. Study 1 revealed a significantly tighter association between typicality/distinctiveness ratings and CTDM than between the same ratings and an alternative measure (DfOM). The same pattern was found for faces of both men and women and for both Turkish and Czech raters. A possible weakness of DfOM is that a pure distance measure does not carry information about the position of a face in facial morphospace. CTDM showed stronger correlations with ratings than DfOM because DfOM is based on Procrustes distance and can take only positive values; that is, it provides no (or very limited) information about the relative positions of targets (faces) and the reference (outgroup mean). The straight lines connecting individual faces with the outgroup mean may form a variety of angles with the between-group axis, including right angles. DfOM thus does not account for the fact that faces may theoretically vary in all directions from the outgroup mean (see **Figure 3A**). For instance, two Turkish faces at the same distance from the Czech mean may look dissimilar because one of them could be even less similar to Turkish standards than the Czech mean is. In contrast, CTDM aligns the positional information of individual faces along the cross-group axis (see **Figure 3B**). The problematic case of two Turkish faces having the same DfOM despite their apparent dissimilarity is thus resolved within the CTDM framework, where these two faces acquire different CTDM scores.

In Study 2, we assessed how accurately participants discriminated outgroup from ingroup composite faces varying along CTDM in a 2AFC task. Performance was generally consistent with our expectations. For instance, when face pairs were compared, pairs contrasting faces closer in CTDM (e.g., the Czech face closest to the Turkish average and the Turkish face closest to the Czech average) were harder to discriminate accurately than pairs in which the two faces were further apart in CTDM. Results for female (vs. male) faces were more consistent with expectations. This may indicate that female faces carry more information about cultural identity, a possibility that future research could examine.

In hindsight, even though overall accuracy was high, the 2AFC task may have been difficult for participants because Czech and Turkish faces differ only slightly in morphology (see **Figure 4**). Other features of the task design may not have been optimal for the current purposes either. For instance, because the same faces were used repeatedly across trials, participants may have anchored their responses to a given face on their response in the first trial with that face (e.g., consistently judging face 1 to be foreign across the trials in which it was paired with faces 4, 5, and 6). This could have masked variability in responses that would otherwise have occurred. Future tests could better control such irrelevant task features for a purer test of CTDM effects. Despite these limitations, we view the Study 2 findings as encouraging preliminary evidence that CTDM is useful for predicting performance in ingroup-outgroup face discrimination.

## Future Directions, Limitations, and Potential Applications

We used only shape information to calculate CTDM, but faces carry more than shape. The face space could therefore be augmented with further, non-shape variables, and CTDM could be computed from this richer face space. Future studies should also take into account facial size, texture, skin color, eye color, hairstyle and hair color, the contrast between the mouth and the surrounding skin, and so on.

The Cross-Group Typicality/Distinctiveness Metric could have a broad range of social applications. Previous research showed a positive association between judgments of trustworthiness and facial typicality (Sofer et al., 2015, 2017). In cross-cultural interactions, such as bilateral business dealings, student exchanges, or even armed conflicts, the outgroup individuals closest to ingroup standards would gain a substantial advantage because their faces would look more trustworthy to potential business partners, tutors, or invaders, respectively. In fact, a face that is typical of the ingroup should engender feelings of familiarity, which is known to underlie positive responses to that face (Zebrowitz et al., 2008). Conversely, a lack of such typicality and familiarity in an encountered face, and/or the face's resemblance to an outgroup prototype, is known to make prejudiced reactions toward the face-bearer more likely, sometimes even turning the situation into a matter of life and death (Blair et al., 2004; Eberhardt et al., 2006). Importantly, these typicality/familiarity-contingent responses may go beyond mere categorization of faces as ingroup vs. outgroup: perceivers' responses are often driven by facial cues, not necessarily by the perceived social category of faces (Livingston and Brewer, 2002). The influence of group typicality on social perception had long been ignored within social psychology but has been discussed more explicitly in recent years (Maddox, 2004). However, typicality is often coded categorically (i.e., the main distinction being between individuals who are more vs. less typical of a group; e.g., Hebl et al., 2012; Study 2) and/or relies on human judges (e.g., Hebl et al., 2012; Studies 1 and 3). Furthermore, such studies usually focus on only one side of the coin: how typical of the ingroup individuals appear.
In research on prejudice against African Americans in the United States, for example, researchers may implicitly have in mind "smaller distance from the White American average" when they refer to "low racial stereotypicality" of African-American faces, but this is not made explicit (an African-American face can also be less typical of the ingroup by being closer to a different ethnic group, such as Hispanic). CTDM may offer a way to distinguish these cue- vs. category-based responses, and to capture the influence of individual differences in targets' appearance, in a way that is conceptually clearer, more quantitative and/or objective, and more finely (vs. coarsely) related to face morphology than in previous research. A Turkish visitor in the Czech Republic should thus receive better treatment from Czech individuals when his/her CTDM is closer to (vs. farther from) the Czech population average, beyond the effect of being recognized as Turkish (or foreign). In short, CTDM could be used to predict prejudice in intergroup contexts beyond the effect of categorization and could have wide application in social psychology.

Another possible application links CTDM with face recognition. According to the attractor field model, an object's similarity to other objects and the spatial density of objects in multidimensional space are counterintuitively interrelated. Face morphs were judged to be more similar to the atypical than to the typical parent image (Bartlett and Tanaka, 1998; Tanaka et al., 1998). It is thus more difficult to detect differences between atypical morphs than between typical morphs. This held for morphs of various inanimate and animate objects, such as birds, cars, and faces (Tanaka and Corneille, 2007). There seems to be a trade-off between the spatial density (and higher similarity) of faces around local means and the larger attractor fields around distinctive faces. Owing to their larger attractor fields, distinctive faces may thus be perceived as more similar to one another than they 'objectively' are. From this follows one prediction for future research that can be tested with CTDM: the individual identity of outgroup faces closest to observers' ingroup mean should be recognized more accurately than the identity of faces most distant from observers' ingroup standards. A Turkish visitor in the Czech Republic should thus better recognize the identity of Czech individuals whose CTDM values are closer to the Turkish population average.

As our research was intended as an initial test of CTDM, it has limitations beyond those already discussed. Some differences between images from the two cultural groups need to be better controlled. Even minor differences in the technical equipment used to produce images (e.g., the focal length of the camera lens) can result in different images (Třebický et al., 2016). Other stylistic differences, such as facial hair, may also influence ratings. Future research should seek to remedy these problems.

## Further Theoretical Considerations: A Note on the Limitations of Composite Images in Face Research

Portrait photograph blending is an old procedure, first used by Francis Galton to reveal features typical of certain categories of people (Galton, 1879, 1883). Long after Galton, average composites are still used, now with much technical improvement. Many researchers use manipulated composite stimuli to investigate the causal effects of various facial traits on social impressions. We see this research agenda as at least partially problematic (see also Schaefer et al., 2009; Jones, 2018; Windhager et al., 2018). Unlike individual images, composites yield clear results even when sample sizes are low. For instance, Rhodes (2006) reported a strong positive relationship between symmetry and attractiveness in composite faces but only moderate effects in non-manipulated faces. Moreover, recent cross-cultural evidence showed only a moderate relationship, or none, between attractiveness ratings of non-manipulated facial photos and averageness computed as each face's distance from the sample Procrustes mean (Kleisner et al., 2017). The use of experimentally manipulated stimuli thus has various practical consequences, such as greater effect sizes and a higher probability of positive results. Moreover, stimuli experimentally manipulated for a particular research purpose become a reification of researchers' theoretical needs. This is a problem when such stimuli are not used as research tools but are substituted for natural objects (faces), that is, when individual faces with their natural variation are replaced by manipulated stimuli whose variation is constrained in a way that a priori corresponds to the expectations of a theory. By constraining the variation of stimuli, we also limit the variation of possible responses to these stimuli. One might claim that this is how experimental science works, which may well be so, but the properties of the experimental toolkit must be included in the interpretation of the results. During the twentieth century, this requirement was widely discussed in the philosophy of science and became an indispensable part of some fields of experimental physics. Yet it remains largely neglected in evolutionary psychology and biology.

What is the alternative? First, to use non-manipulated stimuli. Second, to use stimuli manipulated so as to correspond to the observed range of natural variation. Third, to use both manipulated and non-manipulated stimuli; the difference in results could then serve as a background for the overall interpretation of results. This can easily be accomplished with geometric morphometrics and related multivariate techniques, such as the CTDM approach presented here, which provide direct statistical control over both the analysis and the manipulation of stimuli.
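The second option can be sketched in a few lines. The sketch assumes, for illustration only, that CTDM is the projection of a flattened, Procrustes-aligned shape vector onto the axis joining the ingroup and outgroup means (0 = ingroup mean, 1 = outgroup mean). It then places synthetic shapes on that axis only at CTDM values inside the range observed in the sample, so no stimulus is extrapolated beyond natural variation. The names `ctdm` and `stimuli_within_observed_range` and the random toy data are hypothetical; a real stimulus set would also need to handle variation off the axis, which this simplification ignores.

```python
import numpy as np


def ctdm(shapes, ingroup_mean, outgroup_mean):
    """Assumed CTDM: position along the ingroup->outgroup axis (0 = ingroup, 1 = outgroup mean)."""
    axis = outgroup_mean - ingroup_mean
    return (shapes - ingroup_mean) @ axis / (axis @ axis)


def stimuli_within_observed_range(shapes, ingroup_mean, outgroup_mean, n_steps=5):
    """Place n_steps synthetic shapes on the cross-group axis, spanning only
    the CTDM range actually observed in `shapes` (no extrapolation)."""
    scores = ctdm(shapes, ingroup_mean, outgroup_mean)
    targets = np.linspace(scores.min(), scores.max(), n_steps)
    axis = outgroup_mean - ingroup_mean
    return targets, ingroup_mean + targets[:, None] * axis


# Hypothetical toy data standing in for flattened Procrustes coordinates.
rng = np.random.default_rng(0)
ingroup = rng.normal(0.0, 1.0, size=(30, 10))
outgroup = rng.normal(0.5, 1.0, size=(30, 10))
m_in, m_out = ingroup.mean(axis=0), outgroup.mean(axis=0)
sample = np.vstack([ingroup, outgroup])

targets, stimuli = stimuli_within_observed_range(sample, m_in, m_out)
```

Each row of `stimuli` is a shape whose CTDM equals the corresponding entry of `targets`, and the extreme targets coincide with the most extreme faces actually present in the sample.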

## CONCLUSION

To conclude, distinctiveness and typicality are two sides of the same coin, so whether a face is described as more/less distinctive or more/less typical depends on which populations are taken as ingroup and outgroup. CTDM allows one to estimate the degree to which an individual from a given (ingroup/local) population resembles the facial standards of another (outgroup/foreign) population, and vice versa. Expressed mathematically, such knowledge is potentially useful for studying relationships between individuals' degree of cultural distinctiveness/typicality as perceived by others and attributions of attractiveness and personality traits across cultures. Further, CTDM allows us to generate manipulated stimuli that respect the natural variation of human faces within a particular population. Finally, CTDM is not restricted to human faces and can be applied to any shape, such as parts of the human body and cultural artifacts. We hope that future research will provide further evidence of CTDM's utility and realize its potential for application in face research and beyond.


## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Institutional Review Board of Charles University, Faculty of Science, with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Institutional Review Board of Charles University, Faculty of Science (protocol ref. number: 06/2017).

## AUTHOR CONTRIBUTIONS

KK and SAS developed the study concept and drafted the manuscript. Data collection was performed by SAS in Turkey and by ŠP and KK in Czechia. All authors performed the data analysis and result interpretation, provided critical revisions, and approved the final version of the manuscript for submission.

## FUNDING

This study was supported by Czech Science Foundation grant number GA18-10298S.

## ACKNOWLEDGMENTS

We thank Petr Tureček for his development of the bootstrapped test of Kendall's tau differences used in Study 1, Salvatore Mangiafico for sending us a modified version of his rcompanion package that outputs exact test values for the McNemar's tests used in Study 2, and Jakub Kreisinger for helpful advice on binomial mixed-effects modeling.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00124/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Kleisner, Pokorný and Saribay. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# She Always Steps in the Same River: Similarity Among Long-Term Partners in Their Demographic, Physical, and Personality Characteristics

#### Zuzana Šterbová<sup>1,2</sup>\*, Petr Tureček<sup>2,3</sup> and Karel Kleisner<sup>2,3</sup>

<sup>1</sup> Department of Zoology, Faculty of Science, Charles University, Prague, Czechia, <sup>2</sup> National Institute of Mental Health, Klecany, Czechia, <sup>3</sup> Department of Philosophy and History of Science, Faculty of Science, Charles University, Prague, Czechia

#### Edited by:

Alex L. Jones, Swansea University, United Kingdom

#### Reviewed by:

Christopher Watkins, Abertay University, United Kingdom

Tamsin Saxton, Northumbria University, United Kingdom

\*Correspondence:

Zuzana Šterbová
zuzana.sterbova@natur.cuni.cz

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 12 October 2018 Accepted: 09 January 2019 Published: 05 February 2019

#### Citation:

Šterbová Z, Tureček P and Kleisner K (2019) She Always Steps in the Same River: Similarity Among Long-Term Partners in Their Demographic, Physical, and Personality Characteristics. Front. Psychol. 10:52. doi: 10.3389/fpsyg.2019.00052

In mate choice, individuals consider a wide pool of potential partners. People have been found to hold certain preferences, but the intraindividual stability of mate choice over time remains little explored. We tested individual consistency of mate choice with respect to a number of demographic, physical, and personality characteristics. Only mothers were recruited for this study because we wanted to find out not only whether women choose long-term partners with certain characteristics but also whether the father of their child(ren) differs from their other long-term (ex-)partners. Women (N = 537) aged 19–45 years indicated the demographic, physical (using image stimuli), and personality characteristics of all their long-term partners (partners per respondent: mean = 2.98, SD = 1.32). We then compared the average difference between an individual's long-term partners with the expected average difference using a permutation test. We also evaluated differences between partners who had children with the participants (fathers) and other long-term partners (non-fathers) using permutation tests and mixed-effect models. Our results revealed that women choose long-term partners consistently with respect to all types of characteristics. Although effect sizes for the individual characteristics were rather weak, the maximal cumulative effect size for all characteristics together was high, which suggests that the relatively low effect sizes were caused by high variability with low correlations between characteristics, and not by inconsistent mate choice. Furthermore, we found that despite some differences between partners, the fathers of participants' child(ren) do fit their 'type'.
These results suggest that mate choice may be guided by relatively stable but to some degree flexible preferences, which makes mate choice cognitively less demanding and less time-consuming. Further longitudinal studies are needed to confirm this conclusion.

Keywords: repeatability, intraindividual variability, motherhood, stability of preferences, sexual selection, mating behavior, female preferences

## INTRODUCTION


Human mate choice is influenced by various sociodemographic, physical, and psychological characteristics of a prospective partner. The majority of research on absolute partner preferences focuses on what is considered attractive across individuals (see e.g., Buss, 1989; Regan et al., 2000). This line of research has yielded evidence of high agreement in attractiveness judgments both within and across cultures (r > 0.90) (see a meta-analysis, Langlois et al., 2000). Agreement in attractiveness assessment between individual raters is, however, much lower (r > 0.50) (see a meta-analysis, Feingold, 1992). It therefore seems that, despite a strong overall consensus in attractiveness assessments, there is substantial variability between individual preferences (Hönekopp, 2006). This interindividual variability may be due to relative partner preferences (e.g., based on one's own characteristics and experiences; Figueredo et al., 2006; for a review, see Šterbová and Valentová, 2012). Moreover, an individual's partner preferences may also change over time (Kościński, 2010).

In non-human animals, individual consistency of female mate preferences has been found to be rather low and context-dependent, varying with females' age, environment, and conditions (Cotton et al., 2006; Jennions and Petrie, 2007; Bell et al., 2009). In humans, the ontogeny of mate preferences has been studied mostly cross-sectionally (Brumbaugh and Wood, 2013; Boothroyd and Vukovic, 2018). This approach revealed differences among age groups that were due to changes in hormonal levels, personal development, and the like (Kościński, 2011), but it did not track intraindividual variation in preferences longitudinally. Kościński (2010) tested individual consistency of facial attractiveness assessment and found that the self-correlation of women's assessments was approximately 0.78, which means that about 40% (1 − 0.78<sup>2</sup>) of individual variation in attractiveness ratings changes over time. To sum up, existing evidence suggests that preferences can change over time with age and reproductive stage of life, and that they can change in reaction to current circumstances (Rosenthal, 2017).
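The arithmetic behind the 40% figure, assuming (as the text implies) that the squared self-correlation is the share of rating variance stable across occasions, is:

$$1 - r^2 = 1 - 0.78^2 = 1 - 0.6084 = 0.3916 \approx 0.40$$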

In short, mate preferences are known to be plastic to some degree over time, but research on intraindividual stability in actual human mate choice is sparse. To the best of our knowledge, only three studies have so far tested individual consistency of mate choice (Eastwick et al., 2017; Newman et al., 2018; Šterbová et al., 2018). They found consistency in preferences for eye color (Šterbová et al., 2018; but cf. Newman et al., 2018), hair color (Šterbová et al., 2018), and attractiveness, masculinity, vitality, depression, delinquency, religiosity, educational aspirations, self-esteem, and intelligence (Eastwick et al., 2017). It is important to note, however, that the effect sizes were rather low. Nevertheless, it can be assumed that mate choice is affected by a vast array of demographic, physical, and personality traits.

In the present study, we tested individual consistency of mate choice with respect to traits that play an important role in the human mating context and could therefore have a substantial impact on reproduction (Little, 2015). In particular, different characteristics may be valued in non-reproductive as opposed to reproductive relationships, that is, different characteristics may yield direct versus indirect benefits to offspring (Boothroyd and Vukovic, 2018).

We examined the stability of women's individual mate choice with respect to the following characteristics:

#### Tallness

Existing research suggests that women tend to choose partners who are tall (Hensley, 1994) and, in particular, taller than themselves (see meta-analysis, Pierce, 1996). This may be due to a link between body height and health (Christensen et al., 2007), and/or height and career success (Judge and Cable, 2004).

#### Body Form

Preferences for body shape and weight may reflect environmental variation in food availability (Anderson et al., 1992) but can also serve as cues to an individual's social and economic status (Sarlio-Lahteenkorva and Lahelma, 1999). In general, an optimal body mass index is perceived as attractive (Tovée et al., 1999). In addition, metabolically expensive physical features such as muscularity are thought to be attractive to females because they advertise that energy gathered from the environment can be converted into reproduction-related activities (Kaplan and Gangestad, 2004). Some studies found that women prefer muscular, but not too muscular, men (Dixson et al., 2003; Frederick and Haselton, 2007). In general, research supports the inverted-U hypothesis of masculine traits (Frederick and Haselton, 2007). These ambiguous results could be explained by personality characteristics associated with masculinity, such as higher dominance but also lower honesty, cooperativeness, emotionality, and parental qualities (Perrett et al., 1998; Boothroyd et al., 2008). Accordingly, some studies found a female preference for masculinity (Cunningham et al., 1990), whereas other research found a preference for femininity in males (Perrett et al., 1998). Similarly, both hirsuteness and beardedness are sexually dimorphic traits. As with masculinity, evidence regarding female preferences is mixed (for a review, see Dixson and Rantala, 2015), which could be due to the association between beards and body hair on the one hand and perceived dominance and aggressiveness on the other (Puts, 2010).

#### Eye and Hair Color

Research shows that eye and hair color play an important role in mate choice in some human populations (Frost, 2006; White and Rabago-Smith, 2011) because they can affect perceived trustworthiness (Kleisner et al., 2013), dominance (Kleisner et al., 2010), attractiveness (Laeng et al., 2006), and health status (Frost et al., 2017).

#### Personality

Last but not least, it has been established that, cross-culturally, some personality traits also play an important role in mate choice (Buss and Barnes, 1986). Both men and women have been shown to desire partners who score high on Agreeableness, Openness (Botwin et al., 1997), and Emotional Stability (Conroy-Beam et al., 2015). These characteristics contribute to cooperation and altruism (Jensen-Campbell and Graziano, 2001) and thereby have a positive impact on the couple's reproductive success (Buss, 1991).

The main aim of our study was to examine individual consistency of mate choice in women. Specifically, we tested whether women repeatedly choose long-term partners with particular demographic, physical, and personality characteristics; in short, we tested the intraindividual variability of female mate choice. Consistency of mate choice was measured by several methods (a consistency index, the percentage of variance in partners' trait values accounted for by the respondent, and correlations). Effect sizes were estimated by stepwise randomization effect size assessment and stepwise estimation of shared effect size. Only mothers were recruited for the study because, from an evolutionary perspective, the most important partner is the father of a woman's child or children. We therefore tested mutual similarity among all of each woman's long-term (ex-)partners and examined whether the partner with whom she had a child or children differs from the partners with whom she did not reproduce.

## MATERIALS AND METHODS

All procedures were in accordance with the ethical standards of the relevant committee on human experimentation and with the Helsinki Declaration. The study was approved by the Institutional Review Board of Charles University, Faculty of Sciences (Approval number 2016/23). All participants were informed about the goals of the study and approved their participation by clicking the 'I agree' button below the informed consent form. Written informed consent was not obtained because the study was conducted online.

## Participants

#### Respondents

Respondents were recruited via social media sites such as Instagram and Facebook, via websites aimed at mothers (e.g., babyweb.cz and emimino.cz), via flyers distributed to gynecology offices and dormitories, and by emails sent to respondents from our earlier studies. The initial sample consisted of 1,331 individuals. We analyzed only data from women who met the following criteria: (i) age between 18 and 45 years, (ii) heterosexual orientation (Kinsey scale < 3), (iii) at least two long-term partners (a long-term partnership being defined as a committed relationship believed to have future prospects), and (iv) having shared a household with their biological father until at least age 12 (this study is part of a larger research project on the imprinting-like effect).

The final sample consisted of 537 respondents (mean age = 29.14, SD = 6.281, median = 29, min = 19, max = 45). Information provided by each of these respondents was used in at least one analysis. All respondents together had a total of 1,599 partners (partners per respondent: mean = 2.98, SD = 1.32, median = 3, min = 2, max = 10). The mean length of relationship was 5.07 years (SD = 4.99, median = 3.17, min = 0, max = 27). When fathers and non-fathers were analyzed separately, mean length of relationship between these two categories of partners differed (mean length of relationship with fathers in years = 8.44, SD = 5.65, median = 7.416, min = 0.25, max = 27, mean length of relationship with non-fathers in years = 2.61, SD = 2.32, median = 2, min = 0, max = 20).

In many cases, respondents did not report all 21 characteristics for all partners, which prevented us from including all respondents and all partners in every test of mate choice consistency. The respective sample sizes did not differ substantially (number of respondents: mean = 482.8, SD = 21.9, median = 481, min = 435, max = 516; number of partners with known information: mean = 1,388, SD = 64.3, median = 1,376, min = 1,236, max = 1,491) and are all reported in the **Appendix**.

#### Measurements

Respondents reported a total of 21 characteristics (3 demographic and 13 physical characteristics, and 5 personality traits). Since some of these characteristics can change within a short period of time (e.g., beardedness), respondents were asked to indicate characteristics as they remember them from the time when the relationship was established.

For demographic characteristics, respondents reported the size of their partners' and fathers' place of residence (1 – metropolis, 2 – city, 3 – town, 4 – village), education level (1 – elementary school, 2 – high school, 3 – college, 4 – university), and the age difference between themselves and each long-term partner (in months; negative numbers indicate that the woman is older than her partner).

Physical characteristics were reported by selecting the relevant image stimuli for eye color (gray, blue, green, brown, and black), hair color (9 shades ranging from light blond to black), facial masculinity (five images varying from low to high masculinity) (Johnston, 2006), beardedness (four images varying from clean-shaven to fully bearded) (Dixson and Brooks, 2013), muscularity (six images varying from low to high muscularity) (Frederick and Haselton, 2007), relative height (six images varying from a man-taller to a woman-taller pattern) (Pawlowski, 2003), body mass index (six images varying from low to high BMI) (Allen et al., 2003), hirsuteness (five images varying from a low to a high level of hirsuteness) (Dixson et al., 2010), and leg-to-body ratio (LBR) (five images varying from relatively short to long legs) (Swami et al., 2006). Further, respondents indicated their partners' body weight (in kilograms), body height (in centimeters), and overall masculinity and attractiveness (on a 7-point verbally anchored Likert scale ranging from 'under average' to 'above average').

To assess personality characteristics, we used the Ten-Item Personality Inventory (TIPI) (Gosling et al., 2003), which maps five domains: Emotional Stability, Extraversion, Openness, Agreeableness, and Conscientiousness. Responses to each item were given on a 7-point Likert scale ranging from 'strongly disagree' to 'strongly agree.' The TIPI was translated into Czech using translation and back-translation.

#### Procedure

The test was administered online using the Qualtrics platform. At the outset, respondents were asked to confirm their informed consent. To examine consistency of mate choice, respondents described all their (ex-)partners, using image stimuli to assess their physical traits, indicating their demographic characteristics, and answering questions in a personality assessment questionnaire. They indicated only those characteristics they clearly remembered and were otherwise asked to skip the question. A total of 206 (38.36%) of the 537 individuals included in the analysis did not indicate at least one of their partners' characteristics. To make the process easier for participants, they entered names or nicknames for all (ex-)partners at the beginning of the questionnaire. These (nick)names were then displayed with each question, so respondents did not have to remember the order of their (ex-)partners, and we could be reasonably certain that, for each partner, we knew all the characteristics the participant remembered. Because this study was part of a wider research project, respondents also answered questions about their fathers. The whole procedure took approximately 80 min.

## ANALYSES

All analyses described below were conducted in R 3.5.1. The code is available at: https://github.com/costlysignalling/Mate-choice-consistency-2.

### Consistency Evaluation

The average difference between a respondent's partners ($\bar{\Delta}$) served as a measure of mate choice consistency. Larger differences between a respondent's partners indicate a more diverse set of partners and lower mate choice consistency.

To assess the consistency of mate choice, we used a procedure similar to the consistency index described in an earlier study (Šterbová et al., 2018). Since the original consistency index treats qualitative character states only in terms of identity (1) or difference (0) between pairs of a respondent's partners, we used the average difference between respondents' partners ($\bar{\Delta}$) as a parametric equivalent of the consistency index.

First, we assessed the average difference between pairs of partners separately for each respondent. For example, if only two long-term partners were reported, one with an extraversion value of 11 and the other of 13, the average difference between them was 2 (i.e., 13 – 11). When four long-term partners were reported, their extraversion values could be as diverse as 5, 10, 7, and 14. In such a case, we calculated the differences for every possible pair of partners (10 – 5 = 5, 7 – 5 = 2, 14 – 5 = 9, 10 – 7 = 3, 14 – 10 = 4, 14 – 7 = 7) and then computed their average, i.e., (5 + 2 + 9 + 3 + 4 + 7)/6 = 5. This average value characterizes a woman's mate choice consistency with respect to a particular trait. These individual values were then averaged across all respondents in the sample to evaluate overall mate choice consistency [in this short example, (2 + 5)/2 = 3.5]. In this way, every woman contributed equally to population consistency, regardless of the number of her long-term partners.

The population average difference between the partners of an individual can thus be expressed as:

$$\bar{\Delta} = \frac{1}{n} \sum_{i=1}^{n} \frac{2}{p_i \left(p_i - 1\right)} \sum_{j=1}^{p_i - 1} \sum_{k=j+1}^{p_i} \left| t_{ij} - t_{ik} \right|$$

where $\bar{\Delta}$ is the population average difference between the partners of an individual, $p_i$ the number of partners of the $i$-th individual, $t_{ix}$ the trait value of the $x$-th partner of the $i$-th individual, and $n$ the number of respondents.

### Permutation Test of Mate Choice Consistency

Subsequently, we compared the observed average difference between an individual's partners ($\bar{\Delta}$) with the distribution of $\bar{\Delta}$ expected in a population with random pairing. A permutation test was used to obtain the equivalent of a one-tailed p-value.

We assigned partners to respondents randomly while keeping the number of partners each respondent actually reported. This was done for each trait separately. We generated 10,000 such random populations and calculated $\bar{\Delta}$ for each one, which yielded the distribution of $\bar{\Delta}$ under random pairing.

We computed the proportion of random permutations whose $\bar{\Delta}$ was smaller than the observed $\bar{\Delta}$. This proportion is the equivalent of a one-tailed p-value and indicates whether people were significantly more consistent in their mate choice than one would expect if the choice were random.
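A minimal Python sketch of this permutation test, assuming trait values are stored as one list per respondent. Random pairing is simulated by pooling all partners, shuffling them, and dealing them back out in the original partner counts. The p-value counts random populations whose $\bar{\Delta}$ is at or below the observed value; treating ties as at least as consistent is our conservative assumption. Function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def population_avg_difference(partner_sets):
    """Average over respondents of the mean pairwise absolute difference."""
    per_person = [mean(abs(a - b) for a, b in combinations(t, 2))
                  for t in partner_sets if len(t) >= 2]
    return mean(per_person)


def random_pairing(partner_sets, rng):
    """Shuffle all partners among respondents, keeping each respondent's partner count."""
    pool = [x for traits in partner_sets for x in traits]
    rng.shuffle(pool)
    out, start = [], 0
    for traits in partner_sets:
        out.append(pool[start:start + len(traits)])
        start += len(traits)
    return out


def permutation_test(partner_sets, n_perm=10_000, seed=1):
    """Return (observed, mean expected, one-tailed p) for one trait."""
    rng = random.Random(seed)
    observed = population_avg_difference(partner_sets)
    expected = [population_avg_difference(random_pairing(partner_sets, rng))
                for _ in range(n_perm)]
    # How often does random pairing look at least as consistent as the data?
    p = sum(e <= observed for e in expected) / n_perm
    return observed, mean(expected), p
```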

### Stepwise Randomization Effect Size Assessment

This procedure allowed us to estimate the proportion of partners that have to be switched between respondents in order to lower the mate choice consistency to the expected level. This measure can range between 0% (observed consistency is lower than or equal to the expected consistency and no partners need to be switched) and 50% (see the example below).

We calculated the effect size attributable to consistency of mate choice using a stepwise randomization test. In this test, the observed $\bar{\Delta}$ is gradually increased by randomly relocating one partner at a time until the expected value of $\bar{\Delta}$ is reached (the procedure is described in detail in Šterbová et al., 2018). The resulting percentage indicates the proportion of partners that must be switched among participants to arrive at the $\bar{\Delta}$ expected under random pairing. This was done 10,000 times for each trait. The mean value is reported as the estimated effect size, together with a 95% confidence interval given by the 2.5% and 97.5% quantiles of these 10,000 permutation-derived percentages. Note that although it is expressed as a percentage, this effect size is not the proportion of explained variance. The maximal effect size in terms of the percentage of partners that need to be relocated is about 50%, because once half of all partners have been relocated, further relocations necessarily start approaching the initial configuration again.

For instance, imagine a hypothetical dataset in which every respondent reports two partners who share the same eye color, either brown or blue. The observed $\bar{\Delta}$ in such a population is 0. The maximal $\bar{\Delta}$ is reached when the data are permuted so that each individual has one brown-eyed and one blue-eyed partner, which happens when 50% of partners switch places. Relocating one more blue-eyed or brown-eyed partner would necessarily pair him with another partner of the same eye color, which would lower the overall $\bar{\Delta}$.
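The stepwise procedure is only summarized here (details are in Šterbová et al., 2018), so the following Python sketch fills in one plausible relocation rule as an explicit assumption: at each step, two randomly chosen partners of different respondents exchange places, and the effect size is the share of partners no longer with their original respondent once $\bar{\Delta}$ reaches its expected value. The expected value is estimated by shuffling, and the interval uses the 2.5% and 97.5% quantiles of the runs. All names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, quantiles


def avg_difference(partner_sets):
    """Average over respondents of the mean pairwise absolute difference."""
    per_person = [mean(abs(a - b) for a, b in combinations(t, 2))
                  for t in partner_sets if len(t) >= 2]
    return mean(per_person)


def expected_avg_difference(partner_sets, rng, n_perm=1000):
    """Mean average difference under random pairing with partner counts kept fixed."""
    sizes = [len(t) for t in partner_sets]
    pool = [x for t in partner_sets for x in t]
    values = []
    for _ in range(n_perm):
        rng.shuffle(pool)
        it = iter(pool)
        values.append(avg_difference([[next(it) for _k in range(s)] for s in sizes]))
    return mean(values)


def relocated_fraction(partner_sets, expected, rng, max_steps=100_000):
    """One run: swap partners until the average difference reaches `expected`.

    Assumed relocation rule (hypothetical): each step exchanges two randomly
    chosen partners who currently belong to different respondents.
    """
    # Each partner is stored as (original_respondent, trait_value).
    sets = [[(i, x) for x in t] for i, t in enumerate(partner_sets)]
    slots = [(i, j) for i, t in enumerate(sets) for j in range(len(t))]
    for _ in range(max_steps):
        if avg_difference([[x for _, x in t] for t in sets]) >= expected:
            break
        (i1, j1), (i2, j2) = rng.sample(slots, 2)
        if i1 != i2:
            sets[i1][j1], sets[i2][j2] = sets[i2][j2], sets[i1][j1]
    # Count partners that ended up away from their original respondent.
    moved = sum(owner != i for i, t in enumerate(sets) for owner, _ in t)
    return moved / len(slots)


def stepwise_effect_size(partner_sets, runs=1000, seed=1):
    """Mean relocated fraction with its 2.5% and 97.5% quantiles over many runs."""
    rng = random.Random(seed)
    expected = expected_avg_difference(partner_sets, rng)
    fractions = [relocated_fraction(partner_sets, expected, rng) for _ in range(runs)]
    cuts = quantiles(fractions, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    return mean(fractions), cuts[0], cuts[-1]
```

Applied to the eye-color example coded as 0 (brown) and 1 (blue), a run starting from perfectly consistent pairs must move several partners before $\bar{\Delta}$ can climb to its expected level, so the estimate is clearly above zero.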

### Other Measures of Mate Choice Consistency

Several non-permutation methods based on a correlational approach have been proposed to tackle the problem of mate choice consistency. These methods are to some extent equivalent to the proportion of partners to be switched between respondents (see section "Stepwise Randomization Effect Size Assessment"), but they are expressed either as a correlation coefficient or as the proportion of explained variance (values between 0 and 1, or between 0 and 100%, respectively).

As suggested earlier (Eastwick et al., 2017), the percentage of variance in partners' trait values accounted for by the respondent (i.e., a metric conceptually identical to the intraclass correlation coefficient) can be used as a measure of mate choice consistency in parametric variables. We calculated this measure as well to enable comparison with the stepwise randomization effect size. We treated respondent identity (ID) as a random factor in a mixed-effect model (using the lmer function from the lmerTest package). The statistic of interest was the random variance estimate for respondent ID divided by the total variance. Reasonable benchmarks for this variance estimate have been proposed at 10% for a meaningful, 20% for a medium-sized, and 30% for a large effect (Kenny et al., 2006).
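For readers without R, a Python sketch of the respondent-attributable share of variance is given below. It uses the classical one-way ANOVA (method-of-moments) estimator for unbalanced groups rather than the REML fit that lmer performs, so it is a stand-in that can differ somewhat from the values reported in the study. Truncating negative variance estimates at zero is our assumption, and the function name is hypothetical.

```python
from statistics import mean


def icc_oneway(partner_sets):
    """Share of trait variance attributable to respondent (one-way random-effects ICC).

    ANOVA estimator for unbalanced groups; a stand-in for the REML/lmer estimate.
    """
    k = len(partner_sets)
    N = sum(len(g) for g in partner_sets)
    grand = mean(x for g in partner_sets for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in partner_sets)
    ss_within = sum((x - mean(g)) ** 2 for g in partner_sets for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (N - k)
    # Effective group size for unequal numbers of partners per respondent
    n0 = (N - sum(len(g) ** 2 for g in partner_sets) / N) / (k - 1)
    var_between = max((ms_between - ms_within) / n0, 0.0)  # assumption: truncate at 0
    return var_between / (var_between + ms_within)
```

Multiplying the result by 100 puts it on the same scale as the 10/20/30% benchmarks cited above.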

We also calculated a simple Pearson correlation coefficient between two vectors of partners' trait values, treating every possible pair of partners of the same individual as a unit of analysis. Individuals with more partners therefore contributed disproportionately to the overall coefficient. However, comparing this measure with the more rigorously estimated effect sizes described above may indicate that this is not necessarily a problem.
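A minimal Python sketch of this pairwise correlation follows. The text does not say how the two members of each pair were assigned to the two vectors; entering every pair in both orders ("double entry") is our assumption, chosen so that the coefficient does not depend on the order in which partners were listed. As noted above, respondents with more partners contribute more pairs. The function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_partner_correlation(partner_sets):
    """Pearson r between trait values of all pairs of partners of the same respondent."""
    xs, ys = [], []
    for traits in partner_sets:
        for a, b in combinations(traits, 2):
            # Assumption: double entry, so the result is symmetric in pair order.
            xs += [a, b]
            ys += [b, a]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```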

Pearson correlation coefficients between these three effect size measures were calculated to demonstrate their equivalence. Additionally, we fitted a linear model of the dependence between the explained variance and the proportion of partners that need to be switched between respondents. This provides a useful tool for future comparisons with studies of mate choice consistency that report effect sizes in different ways.

### Stepwise Estimation of Shared Effect Size

Building on the permutational effect size estimation (see section "Stepwise Randomization Effect Size Assessment"), we can assess the shared effect size of mate choice consistency along two non-independent variables. Permutational effect size is expressed as the proportion of partners that need to be switched between respondents. Shared effect size is the proportion of partners counted in both of two seemingly independent estimates of consistency effect size once the correlation between the variables is taken into account. If one switches 16% of partners to reach the expected consistency in body weight, and then another 6% to also remove consistency in body height, one could claim that 22% of all partners need to be switched to avoid nonrandom consistency in both height and weight. Going in the opposite direction, we relocate 11% to avoid consistency in height and then another 11% to avoid consistency in weight. Since the sum of residual effect sizes (6% + 11% = 17%) is lower than the sum of simple effect sizes (16% + 11% = 27%), one can assume non-independence between these variables and calculate a shared effect size. This 'overlap' is missing from the sum of residual effect sizes (22% − 17%) and counted twice in the sum of simple effect sizes (27% − 22%); in both cases, the resulting proportion is 5%. These shared effect sizes can be used to calculate the maximal cumulative effect size, i.e., the proportion of partners that need to be switched between individuals to avoid mate choice consistency in all characteristics.

The link between every pair of partners' qualities was assessed in two ways: by the Pearson correlation coefficient, with a single partner as the unit of analysis, and by the shared effect size, which is the equivalent of the abovementioned stepwise randomization effect size for a pair of variables (A and B). Here, the $\bar{\Delta}$ of variable A in the empirical population is increased by stepwise reassignment of partners until the mean expected consistency with respect to A is reached. This rearranged population is then taken as the starting point, and the stepwise randomization effect size assessment is executed for variable B. The residual proportion of partners that need to be switched to avoid consistency in B is thus estimated after the effect of consistent mate choice with respect to A has been eliminated. This is done 1,000 times to get the average residual effect size of B, and 1,000 times in the opposite direction to get the equivalent measure for A. The shared effect size then follows from A + B = (A ∩ ¬B) + (B ∩ ¬A) + 2 × (A ∩ B), where ∩ denotes intersection, ∪ union, and ¬ the set complement. This value was calculated for every pair of assessed variables.
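The set identity above can be solved for the shared part: A ∩ B = (A + B − (A ∩ ¬B) − (B ∩ ¬A)) / 2. A short Python check with the weight (A) and height (B) figures from the worked example (simple effect sizes of 16% and 11%, residuals of 11% and 6%) reproduces the 5% overlap. The function name is hypothetical.

```python
def shared_effect_size(a, b, a_without_b, b_without_a):
    """Solve A + B = (A ∩ ¬B) + (B ∩ ¬A) + 2 × (A ∩ B) for A ∩ B.

    a, b         -- simple effect sizes (% of partners to switch)
    a_without_b  -- residual effect size of A after consistency in B was removed
    b_without_a  -- residual effect size of B after consistency in A was removed
    """
    return (a + b - a_without_b - b_without_a) / 2


# Weight (A) and height (B) example from the text, in % of partners
weight, height = 16, 11
weight_after_height, height_after_weight = 11, 6
print(shared_effect_size(weight, height, weight_after_height, height_after_weight))  # 5.0
print(weight + height_after_weight, height + weight_after_height)  # 22 22, i.e. A ∪ B
```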

The maximal cumulative effect size was then derived from the pairwise shared effect sizes. Higher-order intersections were not estimated with the permutation approach; this would have been possible, but extremely demanding in computation time. Instead, we assumed that these intersections are proportional to the ratio of pairwise intersections. For example, if variables A, B, and C have effect sizes of 20, 15, and 10% of partners to switch, and their shared effects are 10% for A ∩ B, 4% for A ∩ C, and 3% for B ∩ C, it is assumed that the segments (A ∩ C) ∩ ¬B, (B ∩ C) ∩ ¬A, and A ∩ B ∩ C are in the same proportion as A ∩ ¬B, B ∩ ¬A, and A ∩ B (i.e., 10:5:10). Given that segment A ∩ B ∩ C appears twice in the sum of A ∩ C and B ∩ C (4 + 3 = 7), the final proportions are 2% for (A ∩ C) ∩ ¬B, 1% for (B ∩ C) ∩ ¬A, and 2% for A ∩ B ∩ C. The unique contribution of C [C ∩ ¬(A ∪ B)] must then equal 5% of partners, and the total cumulative effect size (A ∪ B ∪ C) must be 30%. To minimize errors stemming from inaccuracy of the proportionality assumption, variables were added to the total cumulative effect size one at a time according to the criterion of maximal unique contribution to the total effect size. First, we included variable A, which had the largest unique effect size; then we calculated the unique contribution of every other variable to the total effect size, selected the one that contributed the most, labeled it B, and included it in the calculation. For the next variable, we calculated its contribution to the union of A and B, added the one with the largest unique contribution, and so on.
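The proportionality assumption for a third variable can be written out explicitly. In this Python sketch (hypothetical function name, inputs in % of partners), the three segments of C that overlap A or B are scaled from the ratio (A ∩ ¬B) : (B ∩ ¬A) : (A ∩ B), using the fact that A ∩ B ∩ C is counted twice in (A ∩ C) + (B ∩ C). With the numbers from the example it returns 2, 1, and 2%, a unique contribution of C of 5%, and a union of 30%.

```python
def add_third_variable(a, b, c, ab, ac, bc):
    """Split C's overlaps with A and B assuming the ratio A∩¬B : B∩¬A : A∩B.

    Returns ((A∩C)∩¬B, (B∩C)∩¬A, A∩B∩C, C∩¬(A∪B), A∪B∪C).
    """
    a_only, b_only = a - ab, b - ab  # A ∩ ¬B and B ∩ ¬A
    # In ac + bc the triple intersection appears twice, so it gets weight 2.
    scale = (ac + bc) / (a_only + b_only + 2 * ab)
    ac_not_b = a_only * scale        # (A ∩ C) ∩ ¬B
    bc_not_a = b_only * scale        # (B ∩ C) ∩ ¬A
    abc = ab * scale                 # A ∩ B ∩ C
    c_unique = c - ac_not_b - bc_not_a - abc  # C ∩ ¬(A ∪ B)
    union = a + b - ab + c_unique    # A ∪ B ∪ C
    return ac_not_b, bc_not_a, abc, c_unique, union


print(add_third_variable(20, 15, 10, ab=10, ac=4, bc=3))
```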

Growth of the unique contribution relative to the previous step of variable inclusion was a sign of accumulated error caused by inaccuracy of the assumption of intersection proportionality (i.e., in this step, the variable was rearranged back to high mate choice consistency). In each step, therefore, higher unique contributions were replaced by minimum values from contributions calculated in previous steps. This number represents the minimal possible contribution without allowing for a negative relationship between consistent mate choice along different variables. As a result, consistent mate choice in one variable or a union of variables could lead to inconsistent mate choice in another variable or variables. Maximal cumulative effect size was calculated as the total sum after the stepwise addition of all variables. The fact that a negative relationship between consistency on different variables was neglected is not problematic because the individual contributions still add up to the same total. Sacrifice of a consistent mate choice on one variable is compensated by an equivalent increase in the consistency along other variables. Therefore, although the order of unique contributions to overall consistency and their magnitude may be burdened by an error, the estimate of maximal cumulative effect size is sound and reliable.

Since a high number of assessed mate choice consistencies and their shared effects naturally leads to a substantial maximal cumulative effect size, the empirical level is contextualized by the maximal cumulative effect size expected in a population with random pairing. In this resampled population, the identity of partners and the links between their qualities remained identical to the empirical data, but partners were scrambled among respondents so that the number of partners each respondent had remained unchanged.

### Differences Between Fathers and Non-fathers

Differences between partners with whom the respondents had children ('fathers') and other former long-term partners ('non-fathers') were investigated for all 21 romantic partner qualities our study followed. Mean values and variances of fathers and non-fathers were compared to reveal possible differences between these groups. Changes in $\bar{\Delta}$ after the exclusion of fathers were compared to the changes in $\bar{\Delta}$ expected after the exclusion of random individuals, to assess whether fathers were especially typical of the given partner sets (and thus elevated overall mate choice consistency) or exceptional within these sets (and thus lowered overall mate choice consistency).

$\bar{\Delta}$ was calculated first for the full partner set and then for a restricted sample from which fathers were excluded. The observed difference between these two samples was compared with the distribution of expected differences yielded by a permutation test. In each reference permutation, 'fathers' were selected randomly from the sets of partners provided by respondents. For instance, if a participant reported four long-term partners and two of them fathered at least one of her children, two random individuals from this set were labeled as fathers and excluded in each permutation run (as expected, however, most respondents had children with only one partner). A two-tailed p-value was calculated as a measure of the significance of the difference between the measured and expected changes in mate choice consistency after the exclusion of fathers. For each variable, 10,000 permutation runs were executed. Where the non-father $\bar{\Delta}$ was significantly higher than expected, fathers were highly typical (or intermediate) representatives of a woman's partners. Where it was lower than expected, fathers were rather exceptional individuals within the sets of partners, and the measured consistency of mate choice was higher without them.
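The father-exclusion comparison can be sketched in Python as follows, assuming each respondent's partners come with a parallel list of father flags. In every run, the same number of partners as the respondent's actual fathers is dropped at random. Two choices are our assumptions rather than details given in the text: the two-tailed p-value is computed as twice the smaller tail proportion (capped at 1), and respondents left with fewer than two partners drop out of $\bar{\Delta}$. Function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def avg_diff(partner_sets):
    """Average over respondents of the mean pairwise absolute difference."""
    per_person = [mean(abs(a - b) for a, b in combinations(t, 2))
                  for t in partner_sets if len(t) >= 2]
    return mean(per_person)


def father_exclusion_test(partners, is_father, n_perm=10_000, seed=1):
    """partners[i]: trait values of respondent i's partners;
    is_father[i]: matching booleans marking fathers."""
    rng = random.Random(seed)
    full = avg_diff(partners)
    without = [[t for t, f in zip(ts, fs) if not f]
               for ts, fs in zip(partners, is_father)]
    observed_change = avg_diff(without) - full
    expected_changes = []
    for _ in range(n_perm):
        sample = []
        for ts, fs in zip(partners, is_father):
            # Drop as many random partners as this respondent has fathers.
            drop = set(rng.sample(range(len(ts)), sum(fs)))
            sample.append([t for k, t in enumerate(ts) if k not in drop])
        expected_changes.append(avg_diff(sample) - full)
    below = sum(c <= observed_change for c in expected_changes) / n_perm
    above = sum(c >= observed_change for c in expected_changes) / n_perm
    p_two_tailed = min(1.0, 2 * min(below, above))  # assumption: doubled smaller tail
    return observed_change, mean(expected_changes), p_two_tailed
```

A negative observed change that falls below the expected changes corresponds to the "exceptional fathers" case described above.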

Yet even if fathers fitted into partner sets without being either exceptionally typical or highly atypical, there could still be differences between fathers and non-fathers in the assessed qualities. Mixed effect models were employed to test the equality of group means. Mixed effect equivalents of Levene's test, in which the distance from the group mean is used as the response variable, were then used to investigate the equality of variances between the father and non-father groups, since even if the two groups do not differ in their means, the extreme or intermediate individuals may not be the right 'father material.' We treated respondent ID as a random factor in all mixed effect models and used the lmer function from the lmerTest package.
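The response variable for the Levene-type models is simple to construct. The Python sketch below computes each partner's absolute distance from the mean of his group (fathers or non-fathers). Pooling the group mean across all respondents is our reading of "the group mean", not something the text specifies; the resulting values would then be modelled with respondent ID as a random factor, as in the lmer models described above. The function name is hypothetical.

```python
from statistics import mean


def levene_response(rows):
    """rows: (respondent_id, is_father, trait_value) tuples.

    Returns (respondent_id, is_father, |trait - group mean|) tuples.
    Assumption: the group mean is pooled over all respondents.
    """
    group_mean = {
        flag: mean(x for _, f, x in rows if f == flag)
        for flag in {f for _, f, _ in rows}
    }
    return [(rid, f, abs(x - group_mean[f])) for rid, f, x in rows]
```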

All independent sets of p-values reported in the Results section were adjusted for multiple comparisons using the Benjamini–Hochberg procedure. Vectors of p-values calculated for the sets of 21 qualities were adjusted separately, while the p-values of correlations between qualities were adjusted together.
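For reference, the Benjamini–Hochberg adjustment is sketched below in Python. It follows the step-up rule used by R's `p.adjust(method = "BH")`: multiply the i-th smallest of m p-values by m/i, then enforce monotonicity from the largest p-value downward and cap at 1. Under the scheme described above, each vector of 21 p-values would be passed in separately and the correlation p-values together. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down to the smallest.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```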

## RESULTS

Mate choice consistency was higher than expected in all assessed qualities except facial masculinity and beardedness. The difference between observed and expected consistency was statistically significant in most qualities, but effect sizes differed substantially: while the consistency of mate choice in residence or weight was substantial, it was only medium-sized or small with respect to hair or eye color. Complete results are summarized in **Table 1** and **Figure 1**.

#### TABLE 1 | Mate choice consistency: complete results.

FIGURE 1 | Visualization of permutation tests of mate choice consistency, centered on the observed $\bar{\Delta}$ and normalized by the SD of the expected $\bar{\Delta}$ distribution. The difference between the observed and expected value is expressed in standard deviations of the expected value distribution. The higher the bell curve lies above the Observed $\bar{\Delta}$ value, the higher the actual mate choice consistency. A bell curve below the Observed $\bar{\Delta}$ value indicates a trait in which the observed mate choice was less consistent than expected.

TABLE 2 | Relations between investigated qualities of romantic partners expressed in shared effect sizes and Pearson correlations.

The average effect size was highest in demographic variables, but none of the pairwise comparisons between groups of variables (demographic, physical, and psychological) was statistically significant (p > 0.1). Permutation test results are visualized in **Figure 1**. All sample sizes and descriptive statistics of all variables are listed in the **Appendix**. The different estimates of effect size were highly correlated. The proportion of males who had to be relocated between respondents correlated with the variance accounted for by the respondent at 0.93, and a linear model of the relationship between these two measures supports the idea that the latter is approximately double the former. The slope in the model where respondent-attributable variance was regressed on the proportion of partners to relocate was 2.08 (95% CI = 1.72 to 2.45), with an intercept of −0.18 (95% CI = −3.19 to 2.83) that was not significantly different from 0. Results yielded by the simple Pearson correlation correlated at 0.91 with the percentage of partners to relocate and at 0.98 with respondent-attributable variance. All of these measures can thus be treated as functionally equivalent.
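The fitted line can be used to translate between the two effect size scales, for example when comparing with studies that report only respondent-attributable variance. The Python helpers below simply apply the reported slope (2.08) and intercept (−0.18), both in percentage units as we read them; this is an approximation that is only as good as the linear fit within the range of effect sizes observed here. The function names are hypothetical.

```python
# Coefficients reported in the Results: respondent-attributable variance (%)
# regressed on the proportion of partners to relocate (%).
SLOPE = 2.08
INTERCEPT = -0.18


def variance_from_relocation(pct_relocated):
    """Approximate respondent-attributable variance (%) from % of partners to relocate."""
    return INTERCEPT + SLOPE * pct_relocated


def relocation_from_variance(pct_variance):
    """Inverse of the fitted line: % of partners to relocate from variance (%)."""
    return (pct_variance - INTERCEPT) / SLOPE


# Kenny et al. (2006) benchmarks translated to the relocation scale
for benchmark in (10, 20, 30):
    print(benchmark, round(relocation_from_variance(benchmark), 1))
```

With these coefficients, the benchmarks of 10, 20, and 30% explained variance correspond to roughly 4.9, 9.7, and 14.5% of partners to relocate.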

Links between pairs of partners' qualities are summarized in **Table 2**. In total, 103 of 210 correlations were significant even after Benjamini–Hochberg correction for multiple comparisons. The maximal cumulative effect size was 50.95% (expressed as the proportion of partners to switch between individuals). The first 10 variables, ordered by their unique contribution starting with the highest (residence, weight, relative height, age difference, attractiveness, hair color, openness, BMI, height, and agreeableness, in this order), explained 48.30% of partner assignment. The other 11 variables contributed little (unique contributions below 1%) or nothing at all (after the inclusion of all other variables, facial masculinity and beardedness failed to show any positive contribution). Full results are visualized in **Figure 2**.

Reaching the maximal possible effect size suggests that adding further variables to a similar model of cumulative consistency would add little to the current sum. On the other hand, it is conceivable that one might select precisely those variables that are not intercorrelated and explain most of mate choice consistency in just a handful of independent dimensions. In theory, complex interaction patterns may lead to an even higher cumulative effect size, since the 50% limit on partners to relocate applies to a single variable with two levels and represents the difference between maximal and minimal consistency (i.e., not between maximal and expected consistency). The high proportion of significantly correlated pairs of variables (49%) does, however, fit well with the impression of substantial redundancy in our model.

The permutation test of changes in mate choice consistency revealed that fathers are significantly exceptional among participants' long-term partners in beardedness, muscularity, hirsuteness, extraversion, and openness: the average $\bar{\Delta}$ without these individuals was lower than the $\bar{\Delta}$ in permutation runs where an equivalent proportion of random partners (i.e., fathers and non-fathers) was excluded. Fathers were not significantly typical long-term partners in any of the assessed qualities. Complete results of these tests are summarized in **Table 3** and visualized in **Figure 3**.

In the qualities where fathers were identified as exceptional individuals (except for extraversion), mean trait values differed between fathers and non-fathers, while variances differed in beardedness, muscularity, and hirsuteness. Fathers were more bearded, hairier, and more muscular, and showed higher openness to experience. These differences might explain the overall exceptionality of fathers in all these qualities except extraversion. It seems that fathers can be outliers within partner sets even where the group means and variances of fathers and non-fathers do not differ. Moreover, fathers lived in larger cities, had higher education, were heavier and taller (although their height was relatively closer to that of the respondents), were more attractive and masculine, had lighter eyes, darker hair, and more masculine faces, and were more agreeable, conscientious, and emotionally stable than non-fathers.

Group variances differed in several qualities. Fathers were significantly more variable than non-fathers with respect to age difference from the respondent and less variable in attractiveness, masculinity (general and facial), BMI, conscientiousness, and agreeableness. It seems that, along these variables, one or both extremes are not the right 'father material.' A graphic overview comparing densities, which shows differences between group means and variances, is presented in **Figure 4**. Complete results in textual form are listed in **Table 4**.

## DISCUSSION

The aim of this study was to examine the consistency of mate choice with respect to a variety of demographic, physical, and personality characteristics. We found that women choose long-term partners consistently across all types of characteristics (demographic, physical, and personality), although consistency was not observed in all tested traits. We also investigated potential differences in the tested characteristics between long-term ex-partners and the partner(s) with whom women had child(ren). Results revealed that fathers in general fit the women's 'type,' although the differences between them and other (ex-)partners are not large. Our findings are in line with earlier research (Eastwick et al., 2017; Šterbová et al., 2018; but cf. Newman et al., 2018), which found that people consistently choose partners with certain traits, although the reported effect sizes were rather small.

Is there any potential advantage to having a 'type'? We could assume that a preference for a particular 'type' may facilitate mate choice decisions. In theory, the pool of potential partners is immense and, in the most extreme case, covers almost one half of the adult human population on Earth. This theoretical pool is, of course, unrealistic, but even so, people do have many potential partners to choose from. To navigate this vast range of opportunities, it may be useful to follow a certain direction in the multidimensional trait space of human characteristics. A preference for a certain 'type' would constrain the spectrum of potential choices and reduce the dimensionality of the trait space. A systematic, 'type-directed' exploration of this multidimensional trait space would facilitate better orientation on the 'mating market.' In short, having a 'type' means that women need not form their preferences anew each time on the basis of random choices, i.e., it prevents them from jumping unsystematically across the vast trait space. A 'type' should not be viewed as a rigid attractor but rather as a polarizing filter that canalizes the selection of an optimal partner. A 'type' is thus not a target in itself but rather a means by which a goal can be reached (e.g., an appropriate partner for reproduction, a father, can be found). This is why fathers do not fully correspond to a typical partner and show some, albeit small, deviation from the type.

This setup of optimal partner preferences may be beneficial. Mate choice not guided by such relatively stable but to some degree flexible preferences would be much more cognitively demanding and time-consuming. What remains unclear, however, is when and how these preferences are established. One such mechanism could be the imprinting-like effect (parent–partner similarity) or homogamy (self-similarity) (see Šterbová et al., 2018). Moreover, parent–partner similarity can be promoted by emotional closeness with a parent during childhood (Saxton, 2016). Some plasticity of preferences may also be adaptive because it helps individuals adjust their preferences to the current situation (e.g., ecological circumstances, inner state, their own characteristics, which vary over time, and experiences). From an evolutionary perspective, variation in mate preferences is important for speciation and diversification (Rodríguez et al., 2013): species can adapt to changing circumstances by adjusting their mate choice. One might assume that learning would decrease the consistency of mate choice, for instance, that one would choose a partner with characteristics different from those of an earlier partner because of negative experiences. On the other hand, mate choice is a mostly non-conscious process, which implies that partner preferences are not easily modulated by experience. Our findings support the latter assumption, because we found that women have a 'type' and choose partners who fit it.

One can only speculate whether mate choice when reproduction is in question differs from earlier preferences, i.e., preferences in a non-reproductive context. From an evolutionary perspective, the most important partner is the one with whom a woman will reproduce. This is why we tested whether fathers fit the women's 'type,' or rather whether fathers' characteristics differ from characteristics of the non-fathers.

Our results show that although consistency is found across all of a woman's long-term partners, there are some notable differences between non-fathers and fathers. In particular, fathers disrupted consistency in beardedness, hirsuteness, muscularity, extraversion, and openness. The means and variances also differ significantly between fathers and non-fathers in many other characteristics. This could be due to several reasons. First, it is possible that men with whom women reproduce actually differ from those with whom they do not. It should be noted, however, that most characteristics vary over time. This finding may thus be a side effect of the higher age of fathers compared to non-fathers, especially in those characteristics where fathers disrupt mate choice consistency. Second, differences between fathers and non-fathers might be due to time-dependent cultural shifts. For instance, fashions concerning beardedness vary considerably over time, which may cause higher mutual similarity among former partners (non-fathers) than between them and fathers (the most recent partners). Moreover, from an evolutionary perspective, these slight differences among partners could be due to the fact that each partner could be a potential father of a woman's children. It may therefore be beneficial for a woman not to experiment too much in her mate choice.

TABLE 4 | Results of mixed effect models comparing father/non-father means and variances.

Respondent ID is treated as a random factor.

Nevertheless, some differences between fathers and non-fathers were observed. They could be due to individual relationship experience. In other words, it is possible that women adjust their mate choice depending on experiences gathered over their lifetime and reproduce with a partner who fits their preferences better than earlier partners did. Differences between fathers and non-fathers could also be due to memory bias or cognitive dissonance influenced by positive or negative experiences with particular partners. If so, the level of negative experiences with former partners should correlate positively with fathers' non-typicality. Moreover, women might tend to ascribe more positive characteristics to a current partner (usually the father of her child or children) than to their ex-partners. In other words, partnership status itself may affect the assessment. Alternatively, former partners could be regarded more positively on average simply because women's detailed memories of problems encountered in earlier relationships fade with time. There might therefore be some trade-off between the principles of 'my baby's father is always better' and 'sweet recollections of past loves.' We cannot address such possibilities in our analysis.

The fact that fathers lower the measured mate choice consistency, even though there is no meaningful systematic difference between fathers and non-fathers, could be accounted for by either of two explanations. First, it is possible that women reproduce with men whose characteristics differ from those of their ex-partners. This pattern was, however, found only for extraversion (women who date extraverted men reproduced with more introverted individuals, while other women date introverts but reproduce with men who are more extraverted). Moreover, overall consistency of mate choice with respect to extraversion was high even when fathers were included in the partner sample, and even when fathers and non-fathers were excluded at random. It is therefore fair to assume that fathers do, after all, fit within women's general type in extraversion as well, although they tend to lie in one of the extreme tails of this intrapersonal distribution. The second possible explanation is that the variances of the father and non-father groups differ, with fathers being the more variable group. We did not, however, encounter such a case in our dataset. If fathers had a higher variance than non-fathers, they would also have to have a higher average trait value. Where this was not the case (age difference), we found that father exclusion did significantly elevate mate choice consistency.

These findings are limited by the inclusion of only women of reproductive age, because preferences, and potentially also actual choices, can change with changes in hormonal levels (e.g., menopause) during women's lives (Boothroyd and Vukovic, 2018). Female preferences are underpinned by a set of evolutionary adaptations (Kokko et al., 2003; Geary et al., 2004) and can change with age so as to reflect women's different interests (Kościński, 2011). Similarly, the importance of particular physical and personality characteristics can vary during one's life. To test intraindividual variability in mate choice, future studies should therefore investigate women's preferences and actual choices over their lives, from childhood to menopause. Another limitation of our study is that only respondents, and not their (ex-)partners, participated in the study. Although it would be nearly impossible to also recruit all (ex-)partners, it ought to be taken into account that when a woman reports on all of her partners during one session, this may bias the reports toward mutual similarity. Alternatively, the assessment of partners' characteristics could be biased by subsequent experiences, memories, or the circumstances of a break-up.

Our results support the hypothesis of consistent mate choice with respect to a variety of characteristics, but further research is needed to confirm this effect with a longitudinal design. Second, consistency in mate choice should also be investigated in men and in a short-term mating context, where consistency of mate choice may be lower than in a long-term context (Šterbová et al., 2018). Furthermore, future research should investigate interindividual differences in individual consistency (to find out which characteristics predict consistent mate choice, the role of family members, etc.). In light of all of the above, it would also be highly relevant to investigate to what degree preferences are inherited or learned. A twin study (Germine et al., 2015) has reported that face preferences seem to be explained mainly by environmental variation, but more research in this field is needed. Finally, research should focus not only on actual choices but also on partner preferences.

## REFERENCES

## AUTHOR CONTRIBUTIONS

ZŠ developed the study concept and collected data. PT performed the data analysis. ZŠ, PT, and KK interpreted results. ZŠ, PT, and KK drafted the manuscript. All authors approved the final version of the manuscript for submission.

## FUNDING

This research was supported by the Czech Science Foundation project reg. no. 18-15168S, and by the Charles University Grant Agency project no. 1436317. This work was supported by Charles University Research Centre program no. 204056. ZŠ and PT were supported by the 'Sustainability for the National Institute of Mental Health' project, grant number LO1611, with financial support from the Ministry of Education, Youth, and Sports of the Czech Republic under the NPU I program.

## ACKNOWLEDGMENTS

The authors wish to thank Kristýna Taskovská for her help with data collection, Michaela Salák for assisting with participant recruitment, and Anna Pilátová for English proofreading.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00052/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Šterbová, Tureček and Kleisner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Do Single Men Smell and Look Different to Partnered Men?

#### Mehmet K. Mahmut\* and Richard J. Stevenson

Food, Flavor and Fragrance Lab, Department of Psychology, Macquarie University, Sydney, NSW, Australia

Previous research indicates human body odor (BO) can signal kinship, sickness and genetic compatibility. Based on research indicating single males have higher testosterone levels than partnered males and that higher testosterone levels are associated with stronger smelling BO, the current study aimed to determine if, by extension of previous findings, single males' BO smells stronger than partnered males' BO. Eighty-two heterosexual women aged 18–35 years rated the BO and faces of six different males also aged 18–35 years. Consistent with the hypothesis, single men's BO smelled stronger than partnered men's BO and single men's faces were rated as more masculine than partnered men's faces. The possible advantages of females being able to identify single males are addressed in the Discussion.

#### Edited by:

Kok Wei Tan, University of Reading Malaysia, Malaysia

#### Reviewed by:

Caroline Allen, Newcastle University, United Kingdom

Jan Havlicek, Charles University, Czechia

Shen Liu, University of Science and Technology of China, China

> \*Correspondence: Mehmet K. Mahmut mem.mahmut@mq.edu.au

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 12 September 2018 Accepted: 28 January 2019 Published: 13 February 2019

#### Citation:

Mahmut MK and Stevenson RJ (2019) Do Single Men Smell and Look Different to Partnered Men? Front. Psychol. 10:261. doi: 10.3389/fpsyg.2019.00261

Keywords: mate preferences, mate attraction, masculinity, body odor, face attractiveness

## INTRODUCTION

Humans rely heavily on visual cues to make mate preference judgements. From an evolutionary perspective, mate preferences based on facial attractiveness are advantageous for identifying and selecting a high-quality partner (Buss and Schmidt, 1993). For example, research findings have demonstrated that facial attractiveness (Coetzee et al., 2009) and color (Stephen et al., 2011) are associated with physiological health. However, although the vast majority of research has focused on signals detected by the visual sense, humans do not rely solely on visual cues to assess the suitability of a potential partner; they also make judgements using their sense of smell (Stevenson, 2009). Specifically, the body odor (BO) of a potential partner is assessed by our sense of smell (Lübke and Pause, 2015), and given that BOs can signal physical health and genetic compatibility with a potential partner, the role of BOs in mate attraction and preference is not surprising.

In terms of our health, some infections (e.g., gangrene) and diseases (e.g., diabetic ketoacidosis) cause our bodies to emit odors that physicians can reliably recognize and use for diagnostic confirmation (Bijland et al., 2013). In terms of the genetic compatibility of a couple, a set of genes encoding the major histocompatibility complex (MHC) – cell-surface proteins involved in pathogen resistance (Milinski, 2006) that influence our BO (Milinski et al., 2013) – may also contribute to mate preference based on BO preference. For example, women have shown a preference for the BO of men with dissimilar MHC (Wedekind et al., 1996; Wedekind and Füri, 1997; Sorokowska et al., 2018), and offspring of MHC-dissimilar (vs. similar) parents are potentially healthier. However, a recent meta-analysis (Winternitz et al., 2017) of the role of MHC in mate preference across various studies (not just those on BO-based preferences) concluded that mate choice was not driven by MHC differences.

Human BOs are not static and can change due to many factors, such as diet and menstrual cycle. For example, a study that experimentally controlled the amount of red meat consumed over a two-week period found that a diet higher in meat was associated with less pleasant-smelling BO than a non-meat diet (Havlíček and Lenochova, 2006). However, Zuniga et al. (2017) found that higher meat consumption was associated with more pleasant-smelling BO, although meat consumption frequency in that study was based on self-report data, which may account for the discrepancy with Havlíček and Lenochova (2006). Moreover, men's preferences for female BO vary across the stages of a woman's menstrual cycle, which are associated with the most dramatic changes in hormone levels: men give higher preference ratings to women's BO in the fertile phase of their cycle than in the non-fertile phase (Gildersleeve et al., 2012).

While research investigating changes in hormone levels predominantly focuses on the menstrual cycle, numerous studies have found differences in men's hormone levels based on their relationship status. Specifically, research findings have shown that heterosexual men with higher levels of testosterone were less likely to be married (Booth and Dabbs, 1993; Mazur and Michalek, 1998; van Anders and Watson, 2007; van Anders and Goldey, 2010) or in long-term relationships (Gray et al., 2004), whereas lower levels of testosterone were associated with being in a romantic relationship. Further, various hormones (e.g., cortisol and testosterone) may affect the quality of a man's BO (Rantala et al., 2006) and how attractive he is perceived to be. For example, Thornhill et al. (2013) found that women's preference for the BO of high-testosterone men was significantly correlated (r = 0.32) with their conception risk, presumably because higher testosterone may confer some form of evolutionary fitness (see Folstad and Karter, 1992). Similarly, Butovskaya et al. (2013) reported that women in the most fertile phase of their menstrual cycle prefer the BO of men with masculine qualities (e.g., social dominance), and numerous studies have shown that women prefer the BO of men with symmetrical faces (Gangestad and Thornhill, 1998; Thornhill and Gangestad, 1999; Thornhill et al., 2003).

In their social neuroendocrinology theoretical framework, van Anders and Watson (2006) presented evidence detailing the important role testosterone plays in behaviors that predict evolutionary fitness, namely competition for resources, establishing a pair bond (securing a relationship), sexual activity, and parenting and pregnancy. A prediction arising from this framework is that higher testosterone levels are associated with competitive behaviors (such as acquiring resources), whereas lower testosterone levels are associated with pair-bond maintenance behaviors (such as intimate contact; van Anders and Watson, 2006). Given the evidence that men's hormone levels may differ based on their relationship status, and that hormone levels may in turn change the perceptual quality of men's BO, the aim of the current study was to investigate empirically, for the first time, whether single and partnered men's BO was perceptually different. Moreover, to assess the role that both visual and olfactory perception may play in mate preference (two modalities that are predominantly researched independently), the current study also tested whether the faces of single and partnered men differed based on visual ratings.

To determine whether single men's BO smelled different to the BO of partnered men, heterosexual female participants rated men's BO on five characteristics (e.g., sexiness, liking). Based on previous research suggesting that male testosterone levels were positively (but not significantly) associated with stronger-smelling BO ratings (Rantala et al., 2006) and that single males have higher levels of testosterone (e.g., Booth and Dabbs, 1993), we hypothesized that single men's BO would smell stronger than partnered men's BO. Moreover, because stronger-smelling BO ratings are associated with lower BO liking ratings (Havlíček and Lenochova, 2006), we predicted that single men's BO would be liked less and rated less sexy than partnered men's BO. To determine whether BO attractiveness predicted facial attractiveness, participants also rated the faces of the BO donors. Although the findings from three previous studies (Rikowski and Grammer, 1999; Thornhill and Gangestad, 1999; Foster, 2008) indicated that the correlation between male BO and face attractiveness ratings made by fertile women is low (e.g., r = 0.28, p = 0.030; Thornhill and Gangestad, 1999), we hypothesized that favorable BO ratings (i.e., higher liking and sexiness) would be associated with favorable face ratings (e.g., attractive, masculine). We made no a priori predictions about differences between single and partnered men's face attractiveness ratings. Finally, to allow the BO and face ratings of single and partnered men to be compared, participants rated the stimuli of three single and three partnered unknown men.

## MATERIALS AND METHODS

## Participants

Eighty-two (42 single, 40 partnered) heterosexual females (M = 20.2 years, SD = 2.9) completed the study at Macquarie University for credit towards an introductory psychology course. A single participant was someone who was not in a committed romantic relationship, whereas a partnered participant was someone who was in a monogamous, romantic relationship. Given that single and partnered women may perceive a man's BO or face differently (e.g., Little et al., 2002), we included both partnered and single women in this study. Participants were asked about their medical history and to indicate whether their sense of smell functioned normally. Only heterosexual females aged between 18 and 35 years who indicated they had a normal sense of smell, with no history of a condition, injury or surgery that compromised their sense of smell prior to or on the day of the study, qualified for the study. Clearance to conduct the study was granted by the Human Research Ethics Committee at Macquarie University, and all participants and donors gave written informed consent.

## Donors of Body Odor and Face Pictures

The BOs and face pictures of 91 males formed the stimulus pool for the current study. The donors had no other involvement in the study aside from supplying their BO and face picture. The majority of donors were selected by participants: for partnered participants, the donor was their current partner, and for single participants, the donor was their friend or brother. However, the Experimenters also recruited 10 donors to ensure there was a sufficiently large stimulus pool to draw from. All donors had to be aged between 18 and 35 years to qualify for the study. All donors were heterosexual, except for one who identified as homosexual and whose BO was included in the stimulus pool. Overall, 46 of the BO donors were single and 45 were partnered. There was no significant difference between single and partnered donors in terms of their Body Mass Index (BMI; 24.8 vs. 24.3) or age (21 vs. 22.5 years).

## Donor Data, Stimuli Collection and Preparation

#### Body Odor Collection and Preparation

Approximately one week before testing, each participant collected a donor pack from the Experimenter. The donor pack included a new, white, 100% cotton T-shirt in a resealable plastic bag, an instruction sheet, and a short survey containing demographic questions, which participants delivered to their known donor. Odor donors were instructed to avoid eating odorous foods (e.g., garlic, onion; Fialová et al., 2016) for 24 h before and while wearing the T-shirt, to wash using non-perfumed products before wearing the T-shirt, and not to use perfumed products while wearing the T-shirt (Allen et al., 2016). The donor was instructed to wear the T-shirt for one day (i.e., no more than 24 h) and not to remove the shirt until a significant amount of sweat had been absorbed into the underarm of the T-shirt. The instruction sheet included a photograph of a model wearing a white T-shirt depicting an unacceptable amount of underarm sweat (i.e., approximately 25% of the underarm patch appeared wet with sweat) and the minimum acceptable amount of underarm sweat (i.e., approximately 75% of the underarm patch appeared wet with sweat). The type of physical activity donors engaged in to produce the sweat was not prescribed, but it was suggested that brisk walking or sporting activities may facilitate sweating.

After removing the T-shirt, donors were asked to return it to the resealable plastic bag provided and immediately store it in a freezer. Participants collected the sweated-in T-shirt from donors and brought it in on the day of testing; they were informed of the importance of keeping the shirt in a freezer until bringing it into the lab. Upon receiving the T-shirt, the Experimenter cut out both underarms and placed each in a new, separate, opaque plastic condiment bottle approximately 14 cm tall with a 250 mL capacity. Each bottle had a screw-on lid with an elongated nozzle, a removable cap, and a 5 mm opening through which the odorant was delivered. When not in use, the bottles were stored in a freezer, a method validated in previous studies (e.g., Lenochova et al., 2009).

## Face Pictures

Donors also supplied a current, digital, color, passport-style photo (i.e., neutral face, no hat or glasses) which was digitally adjusted using a computer to a height of 8 cm before being printed (in color) on white, A4-sized paper.

## Donor Demographics

Each donor's height, weight, age, relationship status (i.e., single or partnered) and relationship to the participant (i.e., partner, friend, relative) were collected via a short self-report survey included in the donor pack.

## Measures

#### Excluded Participants and Variables

Two partnered participants' data were excluded from analyses because one's partner was not within the accepted age range (18 to 35 years) and the other returned a T-shirt smelling of perfume. Other measures were administered as part of a larger project, namely self-report measures relating to the nature of the donor-target relationship. The results from these measures were unrelated to the aims and hypotheses of the current study and are therefore not reported here. Finally, to remove any bias associated with a preference participants may have had for their own donor's BO and/or face, the results presented do not include the ratings participants made of their donor.

## Body Odor and Face Stimuli Selection

The Experimenter selected six different donors' BOs and the six corresponding face pictures, which would be presented to each participant in a random order. The first BO and face picture selected belonged to the participant's donor. The BOs and faces of the next six donors (three single, three partnered) were randomly selected from two separate pools of donors unknown to the participant: one consisting of single donors and the other of partnered donors.

## Body Odor Characteristics Ratings Task

The six BOs were presented to participants in random order, and participants made five ratings of each BO based on the following questions (variable label in brackets): (1) How much do you like/dislike this smell? ("Like"); (2) How sexy does this odor smell? ("Sexy"); (3) How familiar are you with this smell? ("Familiarity"); (4) How strong does this smell? ("Strong"); (5) How much does this odor smell like your odor donor? ("Similarity"). Each rating was made on a 7-point scale from zero (not at all) to six (very). The Experimenter squeezed the bottle containing the BO three times approximately 2.5 cm from participants' nostrils while participants inhaled through their nose. The minimum inter-stimulus interval was 30 s. For each of the five BO characteristics, two variables were computed: the average rating given by the participant to the BOs of partnered donors and the average rating given by the participant to the BOs of single donors. Therefore, a total of 10 variables were computed. For example, for the BO "Like" ratings, one variable was the "Like" rating averaged across all single donors rated, and the other was the "Like" rating averaged across all partnered donors rated.
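The averaging step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the function name `average_by_donor_status`, the tuple layout of the input, and the example ratings are all hypothetical. Each participant's raw ratings are grouped by characteristic and donor status and then averaged, so five characteristics × two donor statuses yield the 10 BO variables.

```python
from statistics import mean

def average_by_donor_status(ratings):
    """Average one participant's ratings per (characteristic, donor status).

    `ratings` is a list of (donor_status, characteristic, rating) tuples,
    with donor_status in {"single", "partnered"} and rating on the 0-6 scale.
    """
    grouped = {}
    for status, characteristic, rating in ratings:
        grouped.setdefault((characteristic, status), []).append(rating)
    # One averaged score per characteristic x donor-status cell.
    return {cell: mean(values) for cell, values in grouped.items()}

# Hypothetical "Like" ratings from one participant for three single and
# three partnered donors (illustrative values, not study data).
example = [
    ("single", "Like", 2), ("single", "Like", 4), ("single", "Like", 3),
    ("partnered", "Like", 5), ("partnered", "Like", 4), ("partnered", "Like", 3),
]
scores = average_by_donor_status(example)
print(scores[("Like", "single")], scores[("Like", "partnered")])
```

The same helper applies unchanged to the face ratings, where eight characteristics × two donor statuses give 16 variables.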

## Face Characteristics Ratings Task

Participants were randomly presented with the six faces corresponding to the six BOs selected and asked to rate each face on eight characteristics that have been found to be universally desired (Buss, 1989, 1994; i.e., Masculine, Good Partner, Sexy, Intelligent, Loyal, Attractive, Kind and Trustworthy) on a scale ranging from zero (not at all) to six (very). For each of the eight face characteristics ratings, two variables were computed: the first was the average rating given by the participant to the faces of partnered donors and the second was the average rating given

by the participant to the faces of single donors. Therefore, a total of 16 variables were computed. For example, for the face Masculine ratings, there were two variables created: one was the face Masculine rating averaged across all single donors that were rated and the other was the face Masculine rating averaged across all partnered donors that were rated.

#### Procedure

The study was administered by three different female Experimenters, each conducting a similar number of sessions.

#### Preliminary Data Analysis

Note that we tested whether having a beard influenced face masculinity ratings by comparing the face masculinity scores of donors with beards (9% of the sample) and without (91% of the sample); independent-samples t-tests revealed no significant differences between these groups (all ps > 0.05). We also found no significant differences between partnered and single female participants, or between females using and not using contraception, in their ratings of single and partnered men's BO and faces.

## RESULTS

## Were Single and Partnered Men's BO Rated Differently by Single and Partnered Women?

To determine whether single and partnered female participants rated single and partnered men's BO differently on five characteristics (i.e., Strong, Like, Sexy, Familiarity and Similarity), five 2 × 2 mixed-design analyses of variance (ANOVAs) were run (see **Table 1** for descriptive statistics). The between-subjects variable in each ANOVA was Participant Relationship Status (i.e., partnered or single) and the within-subjects variable was Donor Relationship Status, which had two levels (i.e., partnered or single). To control the family-wise error rate across the five comparisons, the alpha-level was set at 0.01 (i.e., 0.05/5).
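With only two levels of the within-subjects factor, a 2 × 2 mixed ANOVA can, to our understanding, be reproduced from two per-participant scores: the difference between the single-donor and partnered-donor ratings (which carries the Donor main effect and the interaction) and the mean of the two ratings (which carries the Participant main effect). The sketch below assumes that equivalence and uses hypothetical names (`mixed_2x2_f`, `pooled_var`) and made-up data. For unequal group sizes, the Donor effect is tested on the unweighted average of the two groups' mean differences, as in the usual Type III approach. It returns F statistics only; p-values could be obtained from an F distribution (e.g., `scipy.stats.f.sf`), omitted here to keep the example dependency-free.

```python
from statistics import mean, variance

def pooled_var(a, b):
    """Pooled within-group variance of two samples (df = n1 + n2 - 2)."""
    n1, n2 = len(a), len(b)
    return ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)

def mixed_2x2_f(group1, group2):
    """F(1, n1 + n2 - 2) statistics for a 2 x 2 mixed ANOVA.

    Each group is a list of (single_donor_mean, partnered_donor_mean)
    tuples, one per participant (e.g., partnered vs. single women).
    """
    n1, n2 = len(group1), len(group2)
    # Difference scores carry the within-subjects (Donor) effects.
    d1 = [s - p for s, p in group1]
    d2 = [s - p for s, p in group2]
    # Per-participant means carry the between-subjects (Participant) effect.
    m1 = [(s + p) / 2 for s, p in group1]
    m2 = [(s + p) / 2 for s, p in group2]

    scale = 1 / n1 + 1 / n2
    mse_d = pooled_var(d1, d2)
    mse_m = pooled_var(m1, m2)

    # Donor main effect: unweighted mean difference tested against zero.
    donor = ((mean(d1) + mean(d2)) / 2) ** 2 / (mse_d * scale / 4)
    # Interaction: do the groups differ in their single-minus-partnered gap?
    interaction = (mean(d1) - mean(d2)) ** 2 / (mse_d * scale)
    # Participant main effect: do the groups differ in overall rating level?
    participant = (mean(m1) - mean(m2)) ** 2 / (mse_m * scale)
    return {"donor": donor, "participant": participant,
            "interaction": interaction, "df": (1, n1 + n2 - 2)}

# Hypothetical averaged ratings for three participants per group.
partnered_women = [(4, 2), (5, 3), (3, 2)]
single_women = [(3, 3), (4, 3), (2, 2)]
print(mixed_2x2_f(partnered_women, single_women))
```

With these made-up data the sketch gives F(1, 4) = 18 for Donor, 8 for the interaction and about 0.29 for Participant, which is what a conventional sums-of-squares mixed ANOVA gives for the same numbers.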

The first ANOVA, with the BO Strong ratings as the dependent variable (DV), revealed a significant main effect of Donor Relationship Status, F(1,77) = 9.51, p = 0.003, η<sub>p</sub><sup>2</sup> = 0.11, indicating that, averaged across participants, single men's BO was rated as smelling stronger than partnered men's BO. There was no significant main effect of Participant Relationship Status, F(1,77) = 2.16, p = 0.15, η<sub>p</sub><sup>2</sup> = 0.03, and no Participant Relationship Status × Donor Relationship Status interaction (F < 1).
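The partial eta squared values reported alongside each F can be recovered from the F statistic and its degrees of freedom as η<sub>p</sub><sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The helper below is a hypothetical illustration of that identity, applied to the two F values reported in the preceding paragraph.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F(1, 77) values reported for the Donor and Participant main effects on BO Strong.
for label, f_value in [("Donor", 9.51), ("Participant", 2.16)]:
    print(label, round(partial_eta_squared(f_value, 1, 77), 2))
```

Rounded to two decimals, these give 0.11 and 0.03, matching the reported values.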

The next four ANOVAs revealed no significant main or interaction effects (with 11 of 12 F-values < 1), indicating that partnered and single women did not rate partnered and single men's BO differently on Like, Sexy, Familiarity and Similarity.

## Were Single and Partnered Men's Faces Rated Differently by Single and Partnered Women?

To determine whether single and partnered female participants rated single and partnered men's faces differently on eight characteristics (i.e., Masculine, Good Partner, Sexy, Intelligent, Loyal, Attractive, Kind, and Trustworthy), eight separate 2 × 2 mixed-design analyses of variance (ANOVAs) were conducted (see **Table 2** for descriptive statistics). The between-subjects variable in each ANOVA was Participant Relationship Status (i.e., partnered or single) and the within-subjects variable was Donor Relationship Status (i.e., partnered or single). To control the family-wise error rate across the eight comparisons, the alpha-level was set at 0.006 (i.e., 0.05/8).

The first ANOVA, with the face Masculine ratings as the DV, revealed a significant main effect of Donor Relationship Status, F(1,77) = 18.76, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.20, and a significant Donor Relationship Status × Participant Relationship Status interaction, F(1,77) = 9.70, p = 0.003, η<sub>p</sub><sup>2</sup> = 0.11 (see **Figure 1**). The main effect of Participant Relationship Status was not significant (F < 1). Follow-up contrasts revealed the nature of the interaction: partnered female participants rated single men's faces as more masculine than partnered men's faces, t(39) = 5.72, p < 0.001, d' = 0.93, whereas single women did not rate partnered and single men's faces differently on Masculine, t < 1.

TABLE 1 | Single and partnered women's ratings of single and partnered men's body odor.

The second ANOVA, with the face Kind ratings as the DV, revealed a significant main effect of Donor Relationship Status, F(1,77) = 14.95, p < 0.003, η<sub>p</sub><sup>2</sup> = 0.16, and a significant Donor Relationship Status × Participant Relationship Status interaction, F(1,77) = 9.53, p = 0.003, η<sub>p</sub><sup>2</sup> = 0.11 (see **Figure 2**). The main effect of Participant Relationship Status was not significant (F < 1). Follow-up contrasts revealed the nature of the interaction: partnered female participants rated partnered men's faces as appearing kinder than single men's faces, t(39) = 4.94, p < 0.001, d' = 0.95, whereas single women did not rate partnered and single men's faces differently on Kind, t < 1.

The third ANOVA, with the face Trustworthy ratings as the DV, revealed that neither the main effect of Donor Relationship Status, F(1,77) = 3.68, p = 0.059, η<sub>p</sub><sup>2</sup> = 0.05, nor the main effect of Participant Relationship Status (F < 1) was significant. The Donor Relationship Status × Participant Relationship Status interaction reached p < 0.05 [F(1,77) = 5.45, p = 0.022, η<sub>p</sub><sup>2</sup> = 0.07] but was not significant at the adjusted alpha-level.

The fourth ANOVA, with the face Loyalty ratings as the DV, revealed that neither the main effect of Donor Relationship Status, F(1,77) = 3.87, p = 0.053, η<sub>p</sub><sup>2</sup> = 0.05, nor the main effect of Participant Relationship Status, F(1,77) = 1.30, p = 0.26, η<sub>p</sub><sup>2</sup> = 0.02, was significant. The Donor Relationship Status × Participant Relationship Status interaction reached p < 0.05 [F(1,77) = 4.59, p = 0.035, η<sub>p</sub><sup>2</sup> = 0.05] but was not significant at the adjusted alpha-level.
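The Bonferroni logic applied to the Trustworthy and Loyalty interactions can be made explicit with a short sketch. The helper name and the dictionary layout are illustrative; the p-values are the Donor × Participant interaction p-values reported above for the face ANOVAs.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha that holds the family-wise error rate at family_alpha."""
    return family_alpha / n_tests

bo_alpha = bonferroni_alpha(0.05, 5)    # five BO ANOVAs -> 0.01
face_alpha = bonferroni_alpha(0.05, 8)  # eight face ANOVAs -> 0.00625 (reported as 0.006)

# Donor x Participant interaction p-values reported for the face ANOVAs.
interaction_p = {"Masculine": 0.003, "Kind": 0.003,
                 "Trustworthy": 0.022, "Loyalty": 0.035}
for label, p in interaction_p.items():
    verdict = "significant" if p < face_alpha else "not significant"
    print(f"{label}: p = {p} -> {verdict} at alpha = {face_alpha:.5f}")
```

The Masculine and Kind interactions fall below the adjusted threshold, whereas the Trustworthy and Loyalty interactions, although below 0.05, do not.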

The remaining four ANOVAs revealed no significant main or interaction effects (with 8 of 12 F-values < 1), indicating that partnered and single women did not rate partnered and single men's faces differently on Good Partner, Sexy, Intelligent, and Attractive.

## Exploratory Analyses: Do BO Ratings Predict Face Ratings?

To determine whether BO characteristics ratings predicted face characteristics ratings, a Spearman's rho correlation analysis was conducted. Overall, it revealed that favorable BO ratings (i.e., Sexy and Like) were associated with favorable face ratings (e.g., Attractive, Intelligent; see **Table 3**). For example, higher BO Like ratings were significantly correlated with rating faces as more Attractive (r = 0.29, p = 0.008), Masculine (r = 0.30, p = 0.007), and Sexy (r = 0.26, p = 0.019), and as belonging to someone who would make a Good Partner (r = 0.33, p = 0.003). The inter-correlations among the face ratings were positive and, with four exceptions, statistically significant; the lowest was between Intelligent and Good Partner (r = 0.12, p = 0.30) and the highest was between Sexy and Attractive (r = 0.89, p < 0.001). The inter-correlations among the BO ratings were mostly positive and statistically significant, except for those involving the Strong ratings. The lowest significant correlation was between Familiarity and Sexy ratings (r = 0.48, p < 0.001) and the highest was between Sexy and Like (r = 0.78, p < 0.001; see **Table 3**).
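Spearman's rho is Pearson's correlation computed on ranks, with tied values given the average of the ranks they span; ties are common with 0 to 6 rating scales. The dependency-free sketch below is illustrative only: the function names are hypothetical, the inputs are assumed to be paired lists of ratings (e.g., one BO rating and one face rating per observation), and it does not compute p-values.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r on the average ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical paired ratings (BO Like, face Attractive) for five observations.
bo_like = [2, 4, 3, 5, 4]
face_attractive = [1, 3, 3, 6, 5]
print(round(spearman_rho(bo_like, face_attractive), 3))
```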

The ANOVA results reported above demonstrated that partnered and single participants rated partnered and single donors differently, specifically on the BO Strong ratings and a subset of the face ratings (i.e., Masculine, Loyal, Kind, and Trustworthy). Therefore, we explored the correlations among these ratings to determine the nature of the differences between partnered and single females' ratings. This exploration revealed that the largest discrepancies all involved ratings of partnered donors' BO and faces. The largest discrepancy was in the correlation between BO Strong and Face Trustworthy ratings: for the ratings given by partnered women, this correlation was negative (r = −0.35, p = 0.025), whereas for the ratings given by single women, it was positive (r = 0.11, p = 0.51). A Fisher's r-to-z transformation test indicated that these two correlations were significantly different (Z = 2.07), indicating that higher BO Strong ratings were associated with lower Face Trustworthy ratings for partnered women, whereas no such relationship was evident for single women. While there were other similarly large discrepancies between partnered and single participants' ratings, none were significantly different.
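The Fisher r-to-z comparison of two independent correlations transforms each r with z = atanh(r) and divides the difference by its standard error, sqrt(1/(n1 − 3) + 1/(n2 − 3)). The sketch below is a hypothetical helper, not the authors' code. The group sizes are an assumption: using 42 single and 40 partnered women (the sample sizes given under Participants) approximately reproduces the reported Z = 2.07; the two-tailed p-value it prints is derived here and was not reported in the article.

```python
from math import atanh, erfc, sqrt

def fisher_r_to_z_test(r1, n1, r2, n2):
    """Compare two independent correlations; returns (Z, two-tailed p)."""
    z1, z2 = atanh(r1), atanh(r2)            # Fisher r-to-z transformation
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z1 - z2
    z = (z1 - z2) / se
    p_two_tailed = erfc(abs(z) / sqrt(2))    # equals 2 * (1 - Phi(|z|))
    return z, p_two_tailed

# BO Strong x Face Trustworthy: single women (r = 0.11) vs. partnered women (r = -0.35).
# Group sizes (42 single, 40 partnered) are an assumption taken from Participants.
z, p = fisher_r_to_z_test(0.11, 42, -0.35, 40)
print(f"Z = {z:.2f}, two-tailed p = {p:.3f}")
```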

## DISCUSSION

Consistent with our hypothesis, single men's BO was rated as smelling stronger than the BO of partnered men. We also found that single men's faces were rated as more masculine than partnered men's faces, but only among partnered women. Moreover, partnered women rated partnered men's faces as kinder, more trustworthy and more loyal than single men's faces, but single women rated partnered and single men's faces similarly on these characteristics. Finally, the results showed that favorable BO ratings were correlated with favorable ratings of the corresponding faces. Although testosterone levels were not directly measured here, the current study's findings are congruent with previous research showing that single and partnered males can be differentiated based on their testosterone levels (e.g., van Anders and Goldey, 2010), that higher testosterone levels are associated with a stronger-smelling BO (Rantala et al., 2006) and that more intense BOs are rated as more masculine smelling (Havlíček and Lenochova, 2006).

TABLE 3 | Results based on Spearman's rho correlations. ∗p < 0.05, ∗∗p < 0.01. BO = body odor.

An obvious question is why a single man's BO would smell different from that of a partnered man. The social neuroendocrinology theoretical framework (van Anders and Watson, 2006) helps frame a possible answer. Specifically, BOs are a manifestation of our current endocrinology (e.g., low or high testosterone levels), which signal the fitness, viability, and/or availability of a potential mate. Based on their results, van Anders and Goldey (2010) concluded that single males have higher levels of testosterone than partnered males because of the sexual competition associated with being single, and that low testosterone levels are associated with bond maintenance. From an evolutionary perspective, it may be advantageous for women to be able to detect the chemosignals that connote coupledom and thus avoid courting partnered males (especially those with offspring), owing to the relatively reduced resources they can offer.

An alternative explanation is that single men's BO may smell more intense than partnered men's BO because of poorer health and/or hygiene. Support for this possibility comes from research showing that single men have poorer physical and mental health outcomes than partnered men (Hu and Goldman, 1990), which may manifest in poorer hygiene and, in turn, BO. Further support comes from research showing that married men are significantly more likely than unmarried men to seek health care, owing to their wives' influence (Norcross et al., 1996). Although single and partnered donors did not differ significantly in BMI, so we found no evidence that single men were less healthy than partnered men, the positive health impact of having a partner may still explain our findings.

The current study's finding that single men's faces were rated significantly more masculine than partnered men's faces (among partnered women only) is consistent with previous research showing that higher testosterone levels are associated with more intense-smelling BO (Rantala et al., 2006), especially when considered in conjunction with the finding that single men have higher levels of testosterone than partnered men (e.g., van Anders and Goldey, 2010). Given that higher testosterone levels are associated with more masculine facial features (Penton-Voak and Chen, 2004), it is possible that single males in the current sample had higher levels of testosterone. However, unlike relationship status, a man's facial features are unlikely to change overnight, so alternative explanations for the differences in Masculine ratings of partnered and single men's faces must be considered. While facial features do change with age, partnered males were not older than single males, so age can be ruled out as an explanation for group differences in facial masculinity. Having a beard was also excluded as an explanation for the higher Masculine ratings of single men's faces, but it remains possible that individual differences in what constitutes a "masculine" face may, to some extent, account for the finding.

While it is curious that only partnered women rated single men's faces as more Masculine than partnered men's faces, previous research indicates that partnered women in the fertile phase of their menstrual cycle (compared to those in their non-fertile phase) find single men's faces more attractive than partnered men's faces, especially if they are masculine- versus feminine-looking (Bressan and Stranieri, 2008). A limitation of the current study was that participants' menstrual cycles were not assessed, so we cannot determine whether menstrual cycle phase influenced their face masculinity ratings. Further limitations include not supplying donors with non-perfumed body cleansing products and not prescribing a specific duration of exercise, which may have contributed to variability in the quality and nature of the stimuli collected.

The correlations between BO and face ratings revealed a consistent pattern: favorable BO ratings were associated with favorable face ratings. While the current findings are congruent with previous findings, the positive relationship between BO and face ratings has largely been demonstrated with female participants in the fertile phase of their menstrual cycle (Rikowski and Grammer, 1999; Thornhill and Gangestad, 1999). However, in both of these studies (i.e., Rikowski and Grammer, 1999; Thornhill and Gangestad, 1999), the correlations between BO liking ratings and face attractiveness ratings did not differ significantly between low-fertility and high-fertility women, suggesting no reliable group differences. Moreover, Allen et al. (2016) found that women's masculinity ratings of men's BO were positively and significantly correlated with face masculinity ratings, although, as in the current study, women's menstrual phase was not recorded. As we could not compare the BO and face rating correlations by participants' menstrual phase, it remains possible that differences exist between the low- and high-fertility phases of the menstrual cycle.

While the current results show that a single man's BO smells more intense and his face appears more masculine than a partnered man's, the findings are preliminary and require replication. A specific aim of future research would be to determine whether testosterone levels are responsible for the differences in BO and face ratings between single and partnered men found in the current study. This could be achieved in a single study using the same participants, with the aims to (a) replicate the finding that single men's BO smells more intense than partnered men's BO; (b) replicate the finding that single men's faces are rated more masculine than partnered men's faces; (c) confirm that single men have higher levels of testosterone than partnered men; (d) assess women's menstrual cycle phase; and (e) compare an individual's BO while single and while in a relationship. Future studies would also benefit from ruling out alternative explanations for BO differences between single and partnered men, such as those associated with poor physical and mental health.

## AUTHOR CONTRIBUTIONS

MM and RS were involved in the study design, data and analyses, and the production and editing of the final document.

### ACKNOWLEDGMENTS

We thank Macquarie University for providing funding to conduct the current study (grant no. 9201400673). Macquarie University had no involvement in the study design, data collection, data analysis, data interpretation, writing of this manuscript, or selection of the journal to which the manuscript was submitted. We also thank the Reviewers for their meticulous, detailed, and constructive feedback. Finally, we would like to thank the Research Assistants who ran the study: Dr. Samantha Adams, Dervisen Komuksu, and Madeleine Fraser.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a past co-authorship with one of the authors, RS.

Copyright © 2019 Mahmut and Stevenson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Interplay Between Economic Status and Attractiveness, and the Importance of Attire in Mate Choice Judgments

#### Amany Gouda-Vossos<sup>1</sup> , Robert C. Brooks<sup>1</sup> and Barnaby J. W. Dixson<sup>2</sup> \*

<sup>1</sup> Evolution and Ecology Research Centre, School of Biological, Earth and Environmental Sciences, The University of New South Wales, Sydney, NSW, Australia, <sup>2</sup> School of Psychology, The University of Queensland, Brisbane, QLD, Australia

Desirable characteristics of "opposite sex others," such as physical attractiveness and economic status, can influence how individuals are judged, and this influence differs for men and women. However, sex differences in judgments related to mate choice have not been fully explored in social contexts that suggest higher or lower economic status. In two studies, ratings of economic status and attractiveness were quantified for male and female targets presented in various social contexts. Study 1 assessed judgments (n = 1,359) of images of nine male and nine female targets in groups of different sizes containing only opposite-sex others (i.e., group size). While we found no significant effects of group size on male and female attractiveness, target female economic status increased when the target was surrounded by two or more men. An ad hoc analysis controlling for the attire of the targets (business or casual) found that the association between target female economic status and group size occurred when females were in business attire. Study 2 investigated this effect further by presenting images of 12 males and 12 females in higher and lower status attire (i.e., business and casual clothing) and measuring women's and men's judgments of attractiveness and economic status (n = 1,038). Consistent with the results of Study 1, female economic status was affected only when women were in business attire. However, female economic status decreased in the presence of other men in business attire. There were no sex differences in judgments of economic status when judging stimuli in casual attire. Additionally, negative associations between attractiveness and economic status were found for males presented in casual attire. We discuss these results in the light of evolutionary sexual conflict theory, demonstrating how the asymmetrical importance of status for men and women can influence mate choice judgments.

Keywords: sex difference, status, attire, attractiveness, mate choice copying, economics

Edited by: Alex L. Jones, Swansea University, United Kingdom

Reviewed by: Viktoria Mileva, University of Stirling, United Kingdom; Andrew Thomas, Swansea University, United Kingdom

\*Correspondence: Barnaby J. W. Dixson b.dixson@uq.edu.au

Specialty section: This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 05 September 2018 Accepted: 15 February 2019 Published: 21 March 2019

#### Citation:

Gouda-Vossos A, Brooks RC and Dixson BJW (2019) The Interplay Between Economic Status and Attractiveness, and the Importance of Attire in Mate Choice Judgments. Front. Psychol. 10:462. doi: 10.3389/fpsyg.2019.00462

## GENERAL INTRODUCTION

Evolved mate preferences often target attributes that signal dimensions of reproductive health (Buss and Schmitt, 1993; Puts, 2010). In women, age-related physical cues such as feminine facial shape, breast morphology, and an hourglass distribution of body fat are attractive to men (Jasieńska et al., 2004; Singh et al., 2010; Dixson et al., 2011, 2015; Marcinkowska et al., 2014), ostensibly because they signal fecundability. In men, muscularity, vocal pitch, and facial masculinity provide information regarding health, age, social status, dominance, and formidability, traits that enhance mating success (Archer, 2009; Puts, 2010; Hill et al., 2013; Dixson et al., 2014).

Judgments of physical attractiveness are also shaped by factors other than physical attributes (Dixson, 2019; Luoto, 2019). For instance, men are rated as more physically attractive when they are presented with high status cues such as an expensive car (Dunn and Searle, 2010; Shuler and McCord, 2010) or an upscale apartment (Dunn and Hill, 2014). While these cues may not influence ratings of women's physical attractiveness, high status may drive intrasexual competition between women (Wang and Griskevicius, 2013). Judgments of physical attractiveness also increase with the addition of other people, an effect known as "mate choice copying" (Waynforth, 2007). Men are rated as more attractive and as having higher economic status when in the presence of women, whereas mate choice copying effects are negligible when women are in the presence of men (Gouda-Vossos et al., 2016, 2018).

The associations between sex, economic status, and physical attractiveness may reflect evolved sex differences in mate choice (Gouda-Vossos et al., 2018). Throughout human evolution, resource acquisition positively influenced male reproductive success (Low, 1990, 2000; Smith, 2004), so that sexual selection may have favored status seeking behavior in men (Betzig, 1986; Dixson, 2016; Von Rueden and Jaeggi, 2016). An association between men's status and reproductive success has been reported across many small-scale (Von Rueden and Jaeggi, 2016) and several industrialized societies (Li et al., 2002; Hopcroft, 2006), which may have implications regarding the formation of social perceptions of women and men. For instance, participants judged female economic status comparatively lower than that of the males they were presented alongside (Gouda-Vossos et al., 2016). Unlike men, women's success can be judged negatively, as high status or successful women are more frequently derogated (Heilman et al., 2004), particularly when dressed in short skirts or shirts displaying cleavage (Glick et al., 2005; Howlett et al., 2015). Conversely, physical attractiveness is beneficial to women as attractive individuals receive favorable treatment (Rosenblat, 2008) and are more likely to find jobs and get promoted (Hamermesh and Biddle, 1993; Pfann et al., 2000), which benefits women more than men within employment scenarios (Busetta et al., 2013). However, the interplay between physical attractiveness, economic status, and attire in mixed social contexts that are more comparable to real-world scenarios has yet to be explored.

The current research assesses how modifiable cues of economic status influence judgments of an individual's physical attractiveness and economic status. Previous studies reported that high status men were more likely than women in high status roles to receive respect and praise (Forsythe, 1990; Brase and Guy, 2004). Studies have also shown that high status women within mixed sex groups were just as likely as men to attain leadership positions (Goktepe and Schneier, 1989). However, high status women were judged to be less attractive and approachable than high status men (Howlett et al., 2013). Additionally, women's economic status is rated lower relative to that of the men they are presented alongside (Gouda-Vossos et al., 2016). Whether this association persists when women are presented as higher in economic status than men, or whether physical attractiveness influences ratings of economic status, remains unknown.

Based on evolutionary theories regarding the importance of status in male reproductive success (Hopcroft, 2006; Von Rueden et al., 2010) and mate choice copying theory (Waynforth, 2007; Gouda-Vossos et al., 2018), we predicted that cues of higher social status would have a stronger positive effect on ratings of economic status and physical attractiveness in men than in women. We also predicted that women's economic status would be rated lower than that of the men they were presented alongside, even if the women appeared to be higher in economic status than the men (Gouda-Vossos et al., 2016). We conducted two studies, both of which manipulated economic status via clothing in male and female stimuli and measured participants' ratings of the stimuli's attractiveness and monetary earnings. We first tested the effects of the presence of "opposite sex others" by manipulating the size of mixed sex groups (Study 1). Based on the results of Study 1, we designed Study 2, in which attire was used to manipulate social status and targets were presented either alone or alongside "opposite sex others" in different forms of attire.

## STUDY 1: JUDGEMENTS OF ATTRACTIVENESS AND ECONOMIC STATUS WITHIN MIXED-SEX GROUPS

Dynamics within groups can vary depending on the size of social groups and the distributions of gender therein. All-male groups tend to be more aggressive and competitive toward other group members than all-female groups (Schopler et al., 2001). Additionally, all-male groups form more stable hierarchies faster than all-female groups (Anderson et al., 2001) and are more likely to collaborate intra-sexually when an outside threat is present (Vugt et al., 2007). Sharing cooperatively produced resources is also an important factor within collective groups (Mangel, 1990; Melis and Semmann, 2010) and the dynamics within groups can vary if some members are more likely to obtain larger portions of resources (relative to other members) and subsequently gain direct benefits (Williams, 2002). Among men, resource acquisition and holding potential enhance mating opportunities and mating success (Betzig, 1986; Von Rueden et al., 2010). As a result, expectations and opportunities vary between men and women within mixed-sex groups (Eagly and Johnson, 1990). Mate choice copying studies have also found that men are judged to be more physically attractive when presented within a group of women, while women presented alongside men are not (Milonoff et al., 2007; Hill and Buss, 2008; Dunn and Doria, 2010).

Behaviors within groups may be driven by mechanisms similar to those underlying sex differences in mate choice, especially regarding associations between status (social or economic) and physical attractiveness. In men, cues of social status, dominance, and formidability that enhance male physical attractiveness (Hill and Buss, 2008; Archer, 2009; Puts, 2010; Dixson et al., 2017, 2019) and mating success (Hill et al., 2013; Kordsmeyer et al., 2018) may also predict assertiveness and group leadership (Anderson et al., 2001; Geniole et al., 2015). While cues of status may also predict the emergence of female leaders in groups (Anderson et al., 2001), they may not augment women's physical attractiveness and mating success (Puts, 2010). To test the effects of social group size on ratings of male and female attractiveness and economic status, we presented images of women and men in social groups varying in the number of opposite sex others. Thus, each male and female target was rated alone and alongside 1, 2, or 4 additional opposite sex individuals.

## Materials and Methods

#### Participants

Participants were recruited via Facebook, Twitter, and internal student email lists within the research institution, resulting in 1,359 participants in total. All participants were over 18 years old and were not aware of the purpose of the study. Each participant provided details of their biological sex, age, and sexual orientation using a Kinsey Scale (Kinsey et al., 1948, 1953). As sexual orientation affects judgments of the attractiveness of opposite sex targets (Petterson et al., 2015, 2016, 2018; Valentova et al., 2017), only participants who were heterosexual or bisexual (i.e., Kinsey scale 0–3) were retained in the analyses. The final analysis included 569 participants (women = 494; men = 75) who completed surveys including target males and 598 (women = 357; men = 241) who completed surveys including target females. Participants' mean age was 29.04 ± 9.4 years. The largest proportion of participants listed their country of origin as Australia (38.7%), followed by the USA (20%) and the UK (17.7%). The largest group identified as North Western European, British, or Irish (51.2%), followed by European Mixed Race (13.8%) and Southern European (3.8%), with 8% stating that they "did not wish to report ethnicity."
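The exclusion rule above (retain Kinsey scores 0–3) can be illustrated with a minimal Python sketch. The record layout and field names (`Participant`, `pid`, `kinsey`) are hypothetical and are not taken from the authors' dataset; the sketch only shows the filtering criterion as described.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative only.
    pid: str
    sex: str     # self-reported biological sex, e.g. "female" or "male"
    age: int
    kinsey: int  # Kinsey rating: 0 = exclusively heterosexual ... 6 = exclusively homosexual


def retain_for_analysis(participants):
    """Keep heterosexual or bisexual participants (Kinsey 0-3), as described in the text."""
    return [p for p in participants if 0 <= p.kinsey <= 3]


sample = [
    Participant("p1", "female", 24, 0),
    Participant("p2", "male", 31, 3),
    Participant("p3", "female", 45, 5),
]
kept = retain_for_analysis(sample)
print([p.pid for p in kept])  # ['p1', 'p2']
```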

#### Stimuli

Images of nine male and nine female targets, each surrounded by four members of the opposite sex (females and males, respectively), were chosen from a stock photo website<sup>1</sup>. This resulted in a total of 18 original images. Each target was presented in four group size conditions [alone, one opposite sex other, two opposite sex others, and four opposite sex others; for examples, see **Electronic Supplementary Material S1** (ESM 1)]. Overall, 72 images were constructed and used in this study, with the target's pose and facial expression identical across treatments. The targets' ages ranged from 22 to 56 years (males: mean = 40, SD ±12.3; females: mean = 38, SD ±12.7). All images were professionally taken under standardized lighting and filters. Photographs were taken in workplaces and casual settings, with the positions of targets and opposite sex others varying from image to image.

#### Procedure

Experiments were conducted online via www.socialsci.com. Each participant entering the study was randomly assigned to one of four experiments in which they rated either male or female targets for either attractiveness or monetary earnings (i.e., economic status). The number of participants in each experiment was as follows: Attractiveness/target females: 202 women, 150 men; Earnings/target females: 155 women, 91 men; Attractiveness/target males: 259 women, 39 men; Earnings/target males: 235 women, 36 men. The study employed a "Within Target – Between Treatment" design in which participants saw all nine targets in random order, with the treatment (target alone, one opposite sex other, two opposite sex others, or four opposite sex others) for each target drawn at random. Similar designs have been used in past research on physical attractiveness (Janif et al., 2014, 2015; Brooks et al., 2015; Dixson et al., 2016). This research was approved by the University of New South Wales Human Research Ethics Advisory Board (Psychology) (HREAP 1880).
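The allocation described above can be sketched in Python to make the "Within Target – Between Treatment" logic concrete: every participant rates every target once, but which treatment a given target appears in varies between participants. This is a minimal illustration, not the software used on www.socialsci.com. It assumes equal-probability assignment to the four experiments and an independent, uniform draw of treatment for each target; the target labels and function names are hypothetical.

```python
import random

TREATMENTS = ("alone", "+1 opposite sex", "+2 opposite sex", "+4 opposite sex")
# The four experiments: (rating dimension, target sex)
CONDITIONS = [
    ("attractiveness", "female"),
    ("earnings", "female"),
    ("attractiveness", "male"),
    ("earnings", "male"),
]


def build_trials(target_ids, rng):
    """Show every target once, in random order, each in a randomly drawn treatment.

    Assumption: treatments are drawn independently for each target.
    """
    order = list(target_ids)
    rng.shuffle(order)
    return [(target, rng.choice(TREATMENTS)) for target in order]


def assign_participant(rng, n_targets=9):
    """Assign one participant to an experiment and build their trial list."""
    rating, target_sex = rng.choice(CONDITIONS)
    targets = [f"{target_sex}_{i}" for i in range(1, n_targets + 1)]  # hypothetical labels
    return rating, target_sex, build_trials(targets, rng)


rng = random.Random(2019)
rating, target_sex, trials = assign_participant(rng)
for target, treatment in trials:
    print(rating, target, treatment)
```

Drawing the treatment per target, rather than fixing one treatment per participant, spreads every treatment across all targets over the sample while keeping each participant blind to the manipulation.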

Participants were informed that they would be shown a range of images of people. In each image, the target was indicated with an arrow. If assigned to rate physical attractiveness, participants were asked to rate each target using a percentile scale from 0 to 100 where "50" indicated that the individual is more physically attractive than 50% of other individuals of the same sex (i.e., of median attractiveness). If rating economic status, participants were asked to rate each target using a percentile scale from 0 to 100 where "50" indicated the individual earns more than 50% of other same sex individuals in full time work (i.e., median income in full-time work).

#### Analysis

Multilevel modeling was used, with the data organized so that each row represented one participant's rating of one target in one treatment. Using the statistical software SPSS, we fitted separate general linear mixed models (MLMs) for the two dependent variables (physical attractiveness and economic status). In each model, Model ID was a repeated-measures factor and Participant ID was a random factor. Participant Sex and Group Size (alone, +1 opposite sex individual, +2 opposite sex individuals, and +4 opposite sex individuals) were included as fixed factors. SPSS does not calculate effect sizes for mixed models. We therefore calculated approximate effect sizes as partial eta-squared from the F-tests and degrees of freedom, although this practice has not been formally validated for multilevel models. When interpreting effect sizes, by convention, effects of 0.2, 0.5, and 0.8 are interpreted as small, medium, and large, respectively.
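A common way to approximate partial eta-squared from an F statistic is η²p = (F × df_effect) / (F × df_effect + df_error). The sketch below assumes this is the conversion meant above; the text does not state the exact formula the authors used. As a check, applying it to the Attire effect on target male economic status reported below, F(1, 1877) = 533.83, gives about 0.221, close to the reported 0.220.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Approximate partial eta-squared from an F statistic and its degrees of freedom.

    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Attire effect on target male economic status (Study 1 ad hoc model)
print(round(partial_eta_squared(533.83, 1, 1877), 3))  # 0.221
```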

#### Results

#### The Effect of Group Size on Male and Female Attractiveness and Economic Status

There were no significant main effects of, or interactions involving, Group Size, suggesting no differences in ratings of attractiveness across treatments (**Figures 1A,B** and **Table 1a**). The significant main effect of participant sex on ratings of target females was due to women rating target females as more attractive (mean = 59.85, SE ±0.391) than men did (mean = 58.66, SE ±0.456).

Like the results for attractiveness, ratings of male economic status were not affected by Group Size, as there was neither a significant main effect nor an interaction with participant sex (**Figure 1C** and **Table 1b**). However, ratings for target females revealed a significant main effect of Group Size (**Table 1b**), as female economic status increased incrementally with the addition of two and four males (**Figure 1D**). There was no significant Participant Sex × Group Size interaction, suggesting that men and women rated target females similarly (**Table 1b**).

<sup>1</sup>www.peopleimages.com

<sup>∗</sup>P < 0.05, ∗∗P < 0.01, determined by post hoc least significant difference tests.

#### The Effect of Attire and Group Size on Target Attractiveness and Economic Status; an ad hoc Analysis

The images included targets in either casual or business attire, which may have affected ratings. To test this, targets were classified as wearing either business or casual attire following the methods of Forsythe (1990). Business attire referred to dark, angular, traditional business suits, whereas casual attire referred to light, informal, everyday wear. There were four business-attired and five casual-attired target males, and five business-attired and four casual-attired target females. The full analyses can be found in **Electronic Supplementary Materials S2**, **S3** (ESM 2: Male and Female Attractiveness and ESM 3: Male and Female Economic Status).

Another series of MLMs was conducted, with Attire included as a fixed factor. We found no main effects of Attire (F(1, 2555) = 0.851, P = 0.356, η<sup>2</sup><sub>p</sub> = 0.00033), Group Size (F(3, 2555) = 0.091, P = 0.965, η<sup>2</sup><sub>p</sub> < 0.001), or Participant Sex (F(1, 2555) = 0.961, P = 0.339, η<sup>2</sup><sub>p</sub> < 0.001) on target male attractiveness. The effect sizes for all main effects on male attractiveness were small (i.e., less than 0.2), suggesting that neither Attire nor Group Size strongly affects male attractiveness.

There was a main effect of Attire on target male economic status (F(1, 1877) = 533.83, P < 0.001, η<sup>2</sup><sub>p</sub> = 0.220), but not of Group Size (F(3, 1853) = 1.408, P = 0.239, η<sup>2</sup><sub>p</sub> = 0.002) or Participant Sex (F(1, 1877) = 2.156, P = 0.142, η<sup>2</sup><sub>p</sub> = 0.001). Both men and women rated male economic status higher when males were in business attire (mean = 65.32, SE ±0.713) than when they were in casual attire (mean = 39.43, SE ±0.865). We did not find mate choice copying effects, which suggests that the type of attire men wore influenced ratings of target males more than the presence of other females did.

There was a main effect of Attire on attractiveness ratings of target females (F(1, 2963) = 68.13, P < 0.001, η<sup>2</sup><sub>p</sub> = 0.023) but no main effect of Group Size (F(3, 2952) = 1.937, P = 0.121, η<sup>2</sup><sub>p</sub> = 0.002) or Participant Sex (F(1, 2963) = 2.982, P = 0.084, η<sup>2</sup><sub>p</sub> = 0.010). Target females were rated as more attractive in business attire (mean = 61.46, SE ±0.404) than in casual attire (mean = 56.52, SE ±0.442), although the effect size was small (i.e., below 0.2) and comparable to that of Group Size, suggesting that the impact of Attire on female attractiveness is small.

Ratings of female earnings were also significantly affected by Attire (F(1, 2075) = 356.54, P < 0.001, η<sup>2</sup><sub>p</sub> = 0.150) and Group Size (F(3, 2057) = 3.181, P = 0.023, η<sup>2</sup><sub>p</sub> = 0.005), although both effect sizes were small (i.e., below 0.2). A significant Attire × Participant Sex interaction (F(1, 2057) = 18.68, P < 0.001, η<sup>2</sup><sub>p</sub> = 0.009) occurred because women rated females in business attire higher in economic status than men did (women: mean = 63.57, SE ±0.566; men: mean = 61.6, SE ±0.740; P = 0.043) and rated females in casual attire lower than men did (women: mean = 47.58, SE ±0.622; men: mean = 51.89, SE ±0.785; P < 0.001). There was also a significant Attire × Group Size interaction (F(3, 2057) = 6.369, P < 0.001, η<sup>2</sup><sub>p</sub> = 0.009), reflecting that ratings of female economic status increased incrementally with the addition of two and four males when females were presented in business attire but not in casual attire (**Figures 2A,B**). This suggests that the original results for female economic status were likely driven by responses to target females in business attire rather than in casual attire.

To determine the best-fitting model, we used the Akaike Information Criterion (AIC), comparing the AIC of the original models with that of the ad hoc models. If the AIC decreases by more than 2 units, the model is considered significantly better (Bozdogan, 1987). Using both the significance of interactions and the AIC allows us to test the validity of different restrictions of a model and to choose the model with the smallest probability of rejection as the best-fitting model, rather than choosing on a priori grounds (Bozdogan, 1987). Across all four ad hoc models, the AIC was lower by more than 2 units than in the corresponding original models (Female Attractiveness, original AIC: 26,858.21, ad hoc AIC: 26,768.36; Male Attractiveness, original AIC: 23,691.96, ad hoc AIC: 23,657.82; Female Earnings, original AIC: 18,833.71, ad hoc AIC: 18,405.63; Male Earnings, original AIC: 22,008.02, ad hoc AIC: 21,434.76). This suggests that entering Attire as a fixed factor improves all of the models.
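The model comparison above reduces to simple arithmetic: the improvement is ΔAIC = AIC(original) − AIC(ad hoc), and a decrease of more than 2 units is treated as a better fit. The following Python sketch applies that rule to the AIC values reported in the text; the dictionary layout and function name are illustrative only.

```python
# AIC values as reported in the text: (original model, ad hoc model with Attire)
AIC = {
    "Female Attractiveness": (26858.21, 26768.36),
    "Male Attractiveness": (23691.96, 23657.82),
    "Female Earnings": (18833.71, 18405.63),
    "Male Earnings": (22008.02, 21434.76),
}
THRESHOLD = 2.0  # AIC decrease treated as a meaningful improvement in the text


def aic_improvement(original: float, ad_hoc: float) -> float:
    """Positive values mean the ad hoc model has the lower (better) AIC."""
    return original - ad_hoc


for name, (original, ad_hoc) in AIC.items():
    delta = aic_improvement(original, ad_hoc)
    print(f"{name}: delta AIC = {delta:.2f}, better fit: {delta > THRESHOLD}")
```

Because AIC already penalizes the extra parameters introduced by adding Attire, a positive ΔAIC of this size indicates that the improvement in fit outweighs the added model complexity.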

#### Discussion

Contrary to our predictions, ratings of attractiveness for target males and females were not influenced by social group size. Further, the economic status of males was unaffected by the presence of opposite sex others (i.e., females), whereas the addition of opposite sex others positively influenced the economic status of target females. Previous studies have found that women's task proficiency (Balkwell and Berger, 1996), economic status (Gouda-Vossos et al., 2016), and social status (Eagly and Karau, 2002) were rated lower than those of the men they were compared with, suggesting that the mere presence of a man can lower perceptions of a woman's status within an economic hierarchy. However, the type of attire women wear may reduce these effects, as women dressed in more masculine attire (i.e., traditional business attire) are seen as having better managerial characteristics (Forsythe et al., 1984, 1985), are more likely to be hired for leadership positions (Forsythe, 1990), and are just as likely as men to emerge as leaders within a mixed sex group (Goktepe and Schneier, 1989). Taken together with the results of the current study, women's perceived economic status appears to be heavily influenced by high status and masculine cues such as business attire and the number of men in their immediate presence.

However, we are limited in drawing these conclusions regarding attire, as the numbers of targets were small after separation for the ad hoc analysis (i.e., four business-attired and five casual-attired target males, and five business-attired and four casual-attired target females). Further, the position of the target in each photo was not randomized or controlled, so we could not determine whether participants perceived targets as leaders, which could have influenced ratings of attractiveness and economic status. It is also possible that, even though female economic status increased with the addition of "male others," ratings of female status were still made relative to men. Unfortunately, we did not obtain ratings of the earnings and attractiveness of the male "opposite sex others" to confirm this. We therefore designed a second study focused on the impact of attire signaling different social status (business/casual), measuring effects on male and female attractiveness and economic status in targets presented individually and in pairs (Study 2).

## STUDY 2: THE EFFECTS OF ATTIRE ON MEN AND WOMEN

Attire communicates information relating to identity, social status, and position within a hierarchy (Roach-Higgins and Eicher, 1992). Molloy and Potter (1975) suggested that attire may play a pivotal role in judgments of an individual's credibility, likeability, interpersonal attractiveness, and dominance. Bassett et al. (1979) found that high status clothing positively influenced judgments of credibility. However, females were rated lower than males across all four measurements that composed credibility (i.e., potency, character, composure, and competence). Unlike men, women's success can be judged negatively, as high status or successful women are more frequently derogated (Heilman et al., 2004) and are judged more negatively when dressed in short skirts (Glick et al., 2005; Howlett et al., 2015). However, physically attractive women receive better treatment (Rosenblat, 2008), are more likely to find employment, and are more likely to get promoted (Hamermesh and Biddle, 1993; Pfann et al., 2000), which may not be the case among men (Busetta et al., 2013).

In Study 2, we measured associations between rated physical attractiveness and economic status in male and female targets in different attire. We also assessed whether the presence of opposite sex others in various attire influenced judgments of male and female targets. Based on mate choice copying theories (Waynforth, 2007; Gouda-Vossos et al., 2018), we hypothesized that males would receive high ratings of physical attractiveness and economic status when presented in high status attire (i.e., business attire), regardless of the attire of the opposite sex other. As economic status may not have had strong effects on female reproductive success during human evolution (Betzig, 1986), we did not predict a positive association between attractiveness and economic status in target females. Based on the results of Study 1, we predicted that female economic status would be limited by the ratings of males when females were in casual, but not business, attire.

## Materials and Methods

#### Participants

A total of 1,035 participants were recruited. All participants were over 18 years old and were not aware of the purpose of the study. Each participant provided details of their biological sex, age, and sexual orientation using the Kinsey Scale (Kinsey et al., 1948, 1953). Participants received US\$1. As in Study 1, only participants who were heterosexual or bisexual (i.e., Kinsey scale 0–3) were retained in the analyses. A total of 459 females and 578 males (age 32 ± 10.5 years) were included in the final analysis. Most participants listed their country of origin as the United States (82.4%), followed by Southern Europe (4.4%) and Australia (3.4%). The largest group identified as ethnically North Western European, British, or Irish (40.4%), followed by European Mixed Race (17.2%) and Southern European (7.2%), and 10% elected not to state their ethnicity.

#### Stimuli

Full body, color photographs of 12 male and 12 female targets were obtained from a stock photo website<sup>2</sup>. All photographs were taken using standardized lighting and filters on a white background. Two sets of images were obtained for each male and female target (i.e., in casual attire and in business attire), comprising 24 male and 24 female target images. We then created composite images, in which each target (in both business and casual attire) was paired with opposite sex others (six in business and six in casual attire), resulting in a total of 144 composite images for male targets and 144 composite images for female targets (**Figure 3**). Business attire included suits, collared shirts, and pencil skirts (for females not wearing pant suits). Casual attire included t-shirts, jeans, shorts, or skirts (**Figure 3**). PeopleImages.com collects information on the targets it recruits, including biological sex, age, and ethnicity. The targets' ages ranged from 20 to 30 years (male targets: mean = 25, SD ±4.8 years; female targets: mean = 24, SD ±4.3 years), and the majority were Caucasian (66%), followed by multi-ethnic (16%), then African and Latino (9% each).

<sup>2</sup>www.peopleimages.com

#### Procedure

Studies were conducted online using the SocSci platform<sup>3</sup>, and participants were recruited via MTurk. Participants each rated two batches of 12 images (24 in total). One batch of 12 images included either male or female targets alone; the other batch included targets of the other sex presented with opposite-sex others. The order of the batches and the target sex were fully randomized, so that participants were presented with one of four possible combinations (i.e., female alone/male with other; female with other/male alone; male alone/female with other; and male with other/female alone). Within each batch, each target was shown once, in random order, in either casual or business attire (drawn at random with equal probability; see **Figures 3A,B** for examples). Thus, each target was presented to the participant either alone or with an opposite-sex other, in either business or casual attire. This design ensured that participants saw all targets (male and female) in only one type of attire (i.e., a Within Target – Between Treatment design). Because participants did not see the same target more than once in different scenarios, the design minimized possible carry-over effects and prevented participants from deciphering the true nature of the study, something previous studies have shown can influence ratings of targets (Chen, 2008). Experimental designs like this have previously been employed to test preferences for physical appearance (Janif et al., 2014, 2015; Brooks et al., 2015; Dixson et al., 2016). Participants rated targets for physical attractiveness and economic status using a sliding scale (from 0 to 100) provided below each image, with the same scales as in Study 1. This research was approved by the University of New South Wales Human Research Ethics Advisory Board (HREA 155047).

<sup>3</sup>www.socialsci.com
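The randomization described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the SocSci platform's actual implementation: `assign_trials` and the target labels are hypothetical, and the choice of which opposite-sex other appears in each composite is not modelled.

```python
import random

ATTIRES = ("casual", "business")

def assign_trials(male_ids, female_ids, rng):
    """Build one participant's two batches of trials (hypothetical sketch).

    One target sex is shown alone and the other with an opposite-sex other;
    shuffling the batch order yields the four combinations in the Procedure.
    """
    alone_sex = rng.choice(("female", "male"))
    paired_sex = "male" if alone_sex == "female" else "female"
    batches = [("alone", alone_sex), ("with_other", paired_sex)]
    rng.shuffle(batches)  # randomize which batch is rated first
    trials = []
    for condition, sex in batches:
        ids = list(female_ids if sex == "female" else male_ids)
        rng.shuffle(ids)  # every target appears once, in random order
        for target in ids:
            trials.append({
                "target": target,
                "sex": sex,
                "condition": condition,
                "attire": rng.choice(ATTIRES),  # equal probability per target
            })
    return trials

males = [f"M{i:02d}" for i in range(1, 13)]
females = [f"F{i:02d}" for i in range(1, 13)]
trials = assign_trials(males, females, random.Random(2019))
```

Each participant therefore sees all 24 targets exactly once, and each target in a single attire, which is what prevents within-participant comparisons of the same person across conditions.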

#### Analysis

Using SPSS, we first tested the influence of target sex and attire on ratings of attractiveness and economic status by focusing on ratings of targets presented alone. This allowed us to determine how target attire (business/casual) and target sex (female/male) influence these ratings. This 2 × 2 between-subject design (Sex of Target – Male/Female) × (Attire of Target – Business/Casual) was analyzed with MANOVAs, with rated attractiveness and economic status as dependent variables.
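To make the MANOVA logic concrete, the sketch below computes Wilks' lambda for the Target Sex × Target Attire interaction with two dependent variables (attractiveness and economic status). It is a hedged, dependency-free illustration rather than the SPSS procedure used in the study: it assumes a balanced design (equal numbers of ratings per cell), returns only Λ = det(E)/det(E + H) without an F approximation or p-value, and `wilks_interaction` and the toy data are hypothetical.

```python
def mean_vec(rows):
    """Component-wise mean of a list of (attractiveness, status) pairs."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def add_outer(acc, v, weight=1.0):
    """Add weight * v v^T to the 2x2 matrix acc (list of lists), in place."""
    for i in range(2):
        for j in range(2):
            acc[i][j] += weight * v[i] * v[j]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def wilks_interaction(cells):
    """Wilks' lambda, det(E) / det(E + H), for the sex x attire interaction.

    `cells` maps (target_sex, attire) -> list of (attractiveness, status)
    ratings. Assumes a balanced design: every cell has the same number of ratings.
    """
    sexes = sorted({sex for sex, _ in cells})
    attires = sorted({attire for _, attire in cells})
    n = len(next(iter(cells.values())))
    means = {key: mean_vec(rows) for key, rows in cells.items()}
    grand = mean_vec(list(means.values()))
    sex_means = {s: mean_vec([means[(s, a)] for a in attires]) for s in sexes}
    attire_means = {a: mean_vec([means[(s, a)] for s in sexes]) for a in attires}
    E = [[0.0, 0.0], [0.0, 0.0]]  # within-cell (error) SSCP matrix
    H = [[0.0, 0.0], [0.0, 0.0]]  # interaction SSCP matrix
    for (s, a), rows in cells.items():
        m = means[(s, a)]
        for r in rows:
            add_outer(E, (r[0] - m[0], r[1] - m[1]))
        # Interaction effect: cell mean minus both marginal means plus grand mean.
        d = tuple(m[i] - sex_means[s][i] - attire_means[a][i] + grand[i] for i in range(2))
        add_outer(H, d, weight=n)
    e_plus_h = [[E[i][j] + H[i][j] for j in range(2)] for i in range(2)]
    return det2(E) / det2(e_plus_h)

# Invented toy data: four ratings scattered symmetrically around each cell mean.
spread = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def cell(mean):
    return [(mean[0] + dx, mean[1] + dy) for dx, dy in spread]

additive = {("F", "casual"): cell((12, -4)), ("F", "business"): cell((10, 2)),
            ("M", "casual"): cell((2, -2)), ("M", "business"): cell((0, 4))}
print(wilks_interaction(additive))  # 1.0: sex and attire effects simply add up

crossed = dict(additive)
crossed[("F", "casual")] = cell((16, -4))  # attire matters more for one sex
print(wilks_interaction(crossed))  # about 0.333
```

A value near 1 means the interaction explains little of the multivariate variance; smaller values indicate a stronger interaction.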

We then assessed whether the attire of the opposite-sex other influenced male and female attractiveness and economic status by focusing on ratings of targets presented with opposite-sex others. This was a 2 × 2 × 2 × 2 between-subject design (Sex of Target – Male/Female) × (Attire of TARGET – Business/Casual) × (Attire of OTHER – Business/Casual) × (Participant Sex – Men/Women). We analyzed ratings of targets presented with an opposite-sex other of varying attire (business/casual) using separate general linear mixed models (MLMs) for ratings of physical attractiveness and economic status for each study. Target ID, Subject ID, and Other ID were included as random factors to specify the covariance structure for the residuals. Target Attire, Target Sex, Other Attire, and Participant Sex were fixed factors. All main effects and interactions were assessed.
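As a quick check on the size of the fixed-effects structure, the sketch below enumerates the 2 × 2 × 2 × 2 design cells and every main effect and interaction among the four fixed factors. It only illustrates the combinatorics of the design (the factor labels are our own); it does not fit the mixed model or include the random factors (Target ID, Subject ID, Other ID).

```python
from itertools import combinations, product

# Fixed factors of the Study 2 mixed models, each with two levels.
FACTORS = {
    "TargetSex": ("female", "male"),
    "TargetAttire": ("business", "casual"),
    "OtherAttire": ("business", "casual"),
    "ParticipantSex": ("men", "women"),
}

def fixed_effect_terms(names):
    """Every main effect and interaction among the named factors."""
    return [" x ".join(combo)
            for k in range(1, len(names) + 1)
            for combo in combinations(names, k)]

cells = list(product(*FACTORS.values()))
terms = fixed_effect_terms(list(FACTORS))
print(len(cells), len(terms))  # 16 15
```

"All main effects and interactions" thus means 4 main effects, 6 two-way, 4 three-way, and 1 four-way term for each outcome.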

### Results

#### Effects of Individual Attire and Sex on Attractiveness and Economic Status Ratings

The multivariate analysis revealed significant main effects of Target Sex and Target Attire and a significant interaction between them (**Table 2**). For female targets, attractiveness and economic status ratings were positively associated in both casual and business attire (**Figure 4**). In contrast, for male targets in casual attire, attractiveness and economic status ratings were negatively associated (**Figure 4**). The main effect of Target Sex (**Table 2**) reflects that female targets were rated lower for economic status (female targets: mean = 54.03, SD ±11.27; male targets: mean = 55.33, SD ±12.04) but higher for attractiveness (female targets: mean = 67.178, SD ±5.33; male targets: mean = 56.32, SD ±5.27) than male targets. The significant Target Sex × Target Attire interaction (**Table 2**) reflects that female targets were rated as more attractive in casual attire (mean = 69.28, SD ±4.72) than in business attire (mean = 65.08, SD ±5.91).
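The associations between attractiveness and economic status described above (**Figure 4**) could be summarized with a Pearson correlation for each combination of target sex and attire. The paper does not state whether these associations were computed over individual ratings or per-target means, so the sketch below is an assumption-laden illustration; `associations`, the dictionary keys, and the toy ratings are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists (assumes non-zero variance)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def associations(ratings):
    """Group ratings by (target sex, attire) and correlate attractiveness with status."""
    groups = {}
    for r in ratings:
        attract, status = groups.setdefault((r["sex"], r["attire"]), ([], []))
        attract.append(r["attractiveness"])
        status.append(r["status"])
    return {key: pearson_r(a, s) for key, (a, s) in groups.items()}

# Invented toy ratings: positive association for one group, negative for the other.
toy = (
    [{"sex": "female", "attire": "casual", "attractiveness": a, "status": a / 2}
     for a in (60, 70, 80)]
    + [{"sex": "male", "attire": "casual", "attractiveness": a, "status": 100 - a}
       for a in (50, 55, 60)]
)
result = associations(toy)
```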

#### The Influence of Target Attire and the Attire of Opposite Sex Other on Target Attractiveness and Economic Status

There was a significant main effect of Target Sex on attractiveness ratings (**Table 3a**), such that female targets were rated as more attractive than male targets (female targets: mean = 67.32, SE ±1.20; male targets: mean = 56.32, SE ±1.16; p < 0.001). The Target Sex × Participant Sex interaction was not statistically significant (**Table 3a**). There was a significant Target Attire × Target Sex × Participant Sex interaction for attractiveness ratings, driven by ratings from men, who rated female attractiveness lower in business than in casual attire and male attractiveness higher in business than in casual attire (**Table 3a** and **Figure 5**). There were no significant effects in women's ratings (**Figure 5**).

There were no significant main effects or interactions involving Other Attire on rated attractiveness (**Table 3a** and **Figure 6A**). However, there was a significant Other Attire × Target Attire interaction for ratings of economic status (**Table 3b** and **Figure 6B**): targets in business attire presented alongside "others" in business attire were rated higher in economic status than when presented alongside "others" in casual attire (**Figure 6B**), and this did not vary with Target Sex (**Table 3b**). A significant Target Sex × Target Attire interaction reflected that men in business attire were rated significantly higher for economic status than women in business attire (**Table 3b** and **Figure 7**). Ratings of males and females in casual attire did not differ significantly (**Figure 7**). There were no significant main effects or interactions involving Participant Sex (**Table 3b**). For additional analyses, see **Supplementary Tables S4**, **S5**.

### Discussion

As predicted, economic status ratings were higher when male and female targets were presented in business than in casual attire. Additionally, male and female economic status ratings increased when targets were presented alongside others in business attire. In contrast to Study 1, female targets received lower attractiveness ratings in business than in casual attire, an effect driven by men's ratings. The reduction in female attractiveness ratings in business attire is consistent with previous research reporting that high-status women were judged more negatively, and as less attractive and less approachable, than lower-status women (Bassett et al., 1979; Forsythe, 1990; Heilman et al., 2004; Lavin A. et al., 2009). We also report that female economic status was rated lower than that of the men they were presented alongside, indicating that perceptions of women's economic status are influenced by the men they are presented with (Gouda-Vossos et al., 2016). Interestingly, the sex difference in economic status ratings disappeared when male and female targets were presented in casual attire, indicating that sex differences in perceptions of status were specific to status-related social cues.

TABLE 2 | MANOVA for rated economic status and attractiveness for targets presented alone.

TABLE 3 | MLMs for target male and female rated attractiveness and economic status.

AIC, Akaike Information Criterion.

Our prediction of a positive association between attractiveness and economic status in male targets was supported, but only for males presented in business attire and not in casual attire. Unexpectedly, we found positive associations between attractiveness and economic status ratings for women in both types of attire, whereas previous research reported strong associations between economic status and attractiveness in males, with mixed results in women (Townsend and Levy, 1990; Hanson et al., 1991; Shuler and McCord, 2010). Our findings suggest that the influence of status-related clothing on judgments of attractiveness among women and men may be less robust than previously reported.

## GENERAL DISCUSSION

A "Wall Street" article discussing attire and business practices stated that "traditional business dress is seen as a uniform… it simplifies decision making and makes hierarchies easy to read" (Binkley, 2008). Our findings reinforce this sentiment, as participants made clear distinctions in physical attractiveness and economic status judgments based on clothing. We also report that women's economic status is judged relative to, and lower than, that of the men they are presented alongside. These sex differences in judgments of economic status disappeared when male and female targets were presented in casual attire, demonstrating that judgments of women's and men's economic status are most influenced by traditionally masculine clothing.

Status seeking is positively associated with men's mating and reproductive success (Betzig, 1986; Hopcroft, 2006), with high economic status associated more with ideals surrounding maleness and masculinity than with femininity (Akerlof and Kranton, 2000). Past studies have found that, regardless of sex, group members who express masculine gender roles or dress in masculine attire are more likely to emerge as leaders and are judged as more forceful and aggressive than those expressing outwardly feminine characteristics (Goktepe and Schneier, 1989; Forsythe, 1990). A limitation of the current study is that we did not compare the effects within same-sex groups. In our previous study (Gouda-Vossos et al., 2016), we reported that the economic status of male targets was judged to be highest when they were presented with another man. High-status men form same-sex alliances and partnerships (Von Rueden et al., 2010; Von Rueden, 2011), and the presence of two men not obviously in conflict may give the appearance that the targets were forming a same-sex partnership. This was not the case when female targets were presented alongside another woman, as their economic status was rated lower than when they were presented alone (Gouda-Vossos et al., 2016). To fully understand how men's and women's economic status is perceived within various group dynamics, and whether economic status is truly judged within a masculine hierarchy, comparisons within same-sex groups would be a worthwhile extension of the current research.

In male-dominated social environments characterized by defined hierarchies (Anderson et al., 2001; Schmid Mast, 2004), business attire is associated with more masculine and socially dominant attributes (Forsythe, 1990). Without clothing that clearly communicates economic status, it may be difficult for people to judge where others fall within a hierarchy, which may be why economic status ratings were more neutral when targets were presented in casual attire. It was also unsurprising that female attractiveness was rated lower in business attire. However, this directly contradicts studies reporting no negative influence on female physical attractiveness when women are presented with high-status cues such as cars (Brase and Richmond, 2004) or luxury apartments (Dunn and Hill, 2014). This suggests that judgments of female attractiveness are more likely to vary when women are presented as being of higher status themselves than when they are presented alongside high-status cues. It could be argued that participants did not believe that the women in those studies actually owned the high-status cues (i.e., cars, luxury apartments, etc.), with attire being a more convincing indicator of earned status. Presenting female targets as high-status individuals may communicate economic independence and decrease their attractiveness to men.

The current study also found positive associations between attractiveness and economic status among male and female targets, except for males presented in casual attire. Judgments of men's economic status and physical attractiveness are strongly positively correlated (Townsend and Levy, 1990; Hanson et al., 1991; Shuler and McCord, 2010), with competence, financial worth, and credibility more consistently associated with men in business than in casual attire (Bassett et al., 1979; Morris et al., 1996; Lavin A. M. et al., 2009). However, even subtle differences in attire within male-dominated business environments can lead to negative criticism of, and attitudes toward, men. For instance, men are perceived as less confident and less successful, and as having lower salaries, when presented in "off the peg" suits than in "tailored" suits (Howlett et al., 2013). Further, men experience greater verbal harassment when presented in non-traditional business attire (i.e., business casual) than in business attire (Kwantes et al., 2011). It could be argued that the economic status of more attractive male targets in casual attire was penalized in the current study, demonstrating how culturally malleable cues of status interplay with male attractiveness, possibly influencing women's mate preferences.

## CONCLUSION

Maestripieri et al. (2017) conducted a comprehensive review of financial and prosocial biases and concluded that attractive individuals, especially women, were more likely to attain financial benefits and better treatment than their less attractive counterparts. The results of the current studies demonstrate that high-status individuals, especially men, receive more favorable judgments relating to mate choice (i.e., attractiveness). However, whether this leads to better treatment or increased financial gain remains to be fully explored. Ostensibly, both men and women benefit from being highly attractive or high status; however, this benefit is not distributed equally. Although this is consistent with ideals surrounding the asymmetrical importance of status in males and physical attractiveness in females within mating contexts, the results of the current studies show how this may lead to unfair judgments and, possibly, unfair treatment of both men and women.

## AUTHOR CONTRIBUTIONS

AG-V, RB, and BD contributed to the conception and design of the study. AG-V carried out the studies and organized the database. AG-V, RB, and BD performed the statistical analysis. AG-V wrote the first draft of the manuscript. AG-V, RB, and BD wrote sections of the manuscript. All authors contributed to manuscript revision, and read and approved the submitted version.

## FUNDING

This work was supported by the Australian Research Council awarded to RB and BD and a University of Queensland Post-Doctoral Fellowship to BD.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00462/full#supplementary-material

MATERIAL S1 | Example images used for targets presented alone and with opposite sex others.

MATERIAL S2 | Ad Hoc MLMs of rated attractiveness of male and female models by men and women, factoring in 'target attire'.

MATERIAL S3 | Ad Hoc MLMs of rated economic status of male and female models by men and women, factoring in 'target attire'.

MATERIAL S4 | Data for groups study 1 separated by target sex and measure (attractiveness, earnings).

MATERIAL S5 | Data for attire study 2 separated by measure (attractiveness, earnings).

## REFERENCES

fpsyg-10-00462 March 19, 2019 Time: 17:59 # 13



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Gouda-Vossos, Brooks and Dixson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Roar of a Champion: Loudness and Voice Pitch Predict Perceived Fighting Ability but Not Success in MMA Fighters

#### Pavel Šebesta<sup>1,2</sup>\*, Vít Třebický<sup>1,3</sup>, Jitka Fialová<sup>1,3</sup> and Jan Havlíček<sup>1,3</sup>\*

<sup>1</sup> National Institute of Mental Health, Klecany, Czechia, <sup>2</sup> Faculty of Humanities, Charles University, Prague, Czechia, <sup>3</sup> Faculty of Science, Charles University, Prague, Czechia

Historically, antagonistic interactions have been a crucial determinant of access to various fitness-affecting resources. In many vertebrate species, information about relative fighting ability is conveyed, among other things, by vocalization. Previous research found that men's upper-body strength can be assessed from voice. In the present study, we tested perceived formidability from intimidating vocalizations (roars) and a short utterance produced by amateur male MMA fighters attending the amateur European Championships, in relation to their physical fitness indicators and fighting success. We also tested acoustic predictors of perceived formidability. We found that body height, weight, and physical fitness failed to predict perceived formidability from either speech or roars. Similarly, there was no significant association between the perceived formidability of roars and utterances and actual fighting success. Perceived formidability was predicted mainly by the intensity of roars and utterances and by the harmonics-to-noise ratio and duration of roars. Interestingly, fundamental frequency (F0) predicted formidability ratings for both roars and utterances, but in opposite directions: low-F0 utterances but high-F0 roars were rated as more formidable. Our results suggest that formidability perception is primarily driven by the intensity and duration of vocalizations.

Keywords: speech, roar, vocalization, handgrip, competition, perception, human

## INTRODUCTION

Historical and ethnographic evidence shows that physical encounters were a frequent way of resolving conflicts (Manson et al., 1991; Keeley, 1997). Cross-culturally, men's fighting ability is a powerful determinant of access to resources (Daly and Wilson, 1988). These findings are complemented by psychological studies showing that stronger men are more prone to anger (Archer and Thanzami, 2007; Sell et al., 2009b). One may therefore expect that cognitive processes have evolved for assessing the threat potential of a prospective opponent (Sell et al., 2009a; Puts, 2010). Earlier research tended to focus on visual cues to threat potential. It has been demonstrated, for instance, that people can assess physical strength relatively accurately from images of the body and face (Sell et al., 2009a; Holzleitner and Perrett, 2016; Kordsmeyer et al., 2018). Moreover, it seems that raters can predict the winners of mixed martial arts (MMA) fights from facial images (Třebický et al., 2013; Little et al., 2015; but see Třebický et al., 2019).

The cues to threat potential are not restricted to the visual modality, but evidence regarding vocal indicators of threat potential is rather mixed. On one hand, it was reported that both men and women can accurately assess men's physical strength from voice irrespective of the language used (Sell et al., 2010). On the other hand, fighting ability assessed by acquaintances did not correlate with ratings of fighting ability based on vocal stimuli (Doll et al., 2014). Han et al. (2017) likewise reported no association between a composite measure of threat potential, consisting of handgrip strength, body height and weight, and perceived vocal threat potential.

#### Edited by:

Alex L. Jones, Swansea University, United Kingdom

#### Reviewed by:

Benedict C. Jones, University of Glasgow, United Kingdom
Phil McAleer, University of Glasgow, United Kingdom

#### \*Correspondence:

Jan Havlíček jhavlicek@natur.cuni.cz
Pavel Šebesta pavelsebest@email.cz

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 12 October 2018 Accepted: 01 April 2019 Published: 30 April 2019

#### Citation:

Šebesta P, Třebický V, Fialová J and Havlíček J (2019) Roar of a Champion: Loudness and Voice Pitch Predict Perceived Fighting Ability but Not Success in MMA Fighters. Front. Psychol. 10:859. doi: 10.3389/fpsyg.2019.00859

Importantly, all of the above-mentioned studies used speech as their acoustic stimuli. Humans, however, also produce various other vocalizations, such as laughter, roars, screams, and grunts, and these have so far received only limited attention. This contrasts with evidence from a number of vertebrate species, including primates, showing that vocal displays are frequently part of male intrasexual competition (Bradbury and Vehrencamp, 2011) and can indicate fighting ability (for evidence in red deer, see Clutton-Brock and Albon, 1979; for baboons, see Kitchen et al., 2003). In humans, it has recently been shown that tennis players who produce grunts with a lower fundamental frequency (F0) are more likely to win, and that listeners can to some extent predict match outcome from the grunts (Raine et al., 2017). Similarly, Raine et al. (2018a) reported that listeners accurately assess relative strength and body height from aggressive roars in both men and women.

In our complementary study, we tested predictors of perceived formidability using acoustic cues. It should be noted, however, that Raine et al. (2018a) and the current study differ in several important respects. First, Raine et al. focused on two important components of threat potential (height and strength), but threat potential and/or perceived formidability undoubtedly include other components as well. These may include morphological characteristics, such as body weight and lean muscle mass, as well as physical abilities other than isometric strength, for instance respiratory fitness. Second, while threat potential can be expected to predict the outcomes of real-life fights, it cannot be entirely equated with fighting success.

To address these questions, we recorded both verbal and non-verbal vocalizations (utterances and roars) of amateur male MMA athletes along with (i) measurements of their body composition, isometric strength, and spirometry, and collected data regarding their (ii) fighting success.

We hypothesized that formidability perceived from vocalizations would correlate with height, weight, and muscle mass, as well as with physical fitness indicators such as strength and lung capacity. We also predicted that perceived formidability would be positively associated with fighting success. Further, we performed an acoustic analysis to identify which parameters predict the perception of formidability from both roars and utterances. We hypothesized that perceived formidability is related to F0 and intensity in both verbal and non-verbal vocalizations.

## MATERIALS AND METHODS

All procedures applied in this study were in accordance with the ethical standards of the responsible committee on human experimentation and with the Declaration of Helsinki. The study was approved by the Institutional Review Board of the National Institute of Mental Health, Czech Republic (Ref. num. 28/15). All target participants were given a brief description of the study and confirmed their participation by signing an informed consent form. The present study is part of a larger project investigating multi-modal perception of traits associated with sexual selection and characteristics related to competition outcome.

## Targets

Data collection took place during the 2016 IMMAF European Open Championships of Amateur MMA held in Prague (Czech Republic), which hosted a total of 155 contestants (including 20 women) from 30 countries (based on data from MyNextMatch.com). Contestants were approached by researchers during on-site registration, 1 day before the start of the tournament. We focused on male athletes because championship attendance was highly biased toward male athletes, and we managed to collect data from only three female athletes.

Forty male amateur MMA fighters (mean age = 24, SD = 4.4, range = 19–33 years), naïve to the aims of our project, participated in the study. To assess a possible effect of weight category, we merged the weight categories used by the competition organizers (Flyweight, N = 1; Bantamweight, N = 7; Featherweight, N = 4; Lightweight, N = 4; Welterweight, N = 7; Middleweight, N = 5; Light Heavyweight, N = 5; Heavyweight, N = 4; and Super Heavyweight, N = 3) into three categories: Lightweight (N = 12; consisting of the Flyweight, Bantamweight, and Featherweight categories), Middleweight (N = 16; consisting of the Lightweight, Welterweight, and Middleweight categories), and Heavyweight (N = 12; consisting of the Light Heavyweight, Heavyweight, and Super Heavyweight categories), following the procedure in Třebický et al. (2013). All targets reported their basic demographics, age, and total fighting record, from which we computed their fighting success as the number of wins divided by the total number of fights. Fighting success was calculated only for fighters whose record included more than two fights; analyses involving fighting success are therefore based on 29 individuals. For technical reasons, we obtained lung capacity measures from only 34 individuals. For descriptive statistics, see **Table 1**. All other analyses are based on the complete dataset of 40 individuals. Participants were reimbursed with 400 CZK (approx. €15) and verbally debriefed upon completing their participation.
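The category merging and the fighting-success rule can be expressed as a small lookup table and a guarded ratio. This is a hypothetical sketch (the names are ours) that reuses the category counts reported above; note that the organizers' "Lightweight" class falls into the merged Middleweight group, so the same label means different things before and after merging.

```python
# Organizer weight classes and their Ns, as reported above.
ORGANIZER_COUNTS = {
    "Flyweight": 1, "Bantamweight": 7, "Featherweight": 4,
    "Lightweight": 4, "Welterweight": 7, "Middleweight": 5,
    "Light Heavyweight": 5, "Heavyweight": 4, "Super Heavyweight": 3,
}

# Organizer class -> merged class. The organizers' "Lightweight"
# maps to the merged "Middleweight" group.
MERGED_CATEGORY = {
    "Flyweight": "Lightweight",
    "Bantamweight": "Lightweight",
    "Featherweight": "Lightweight",
    "Lightweight": "Middleweight",
    "Welterweight": "Middleweight",
    "Middleweight": "Middleweight",
    "Light Heavyweight": "Heavyweight",
    "Heavyweight": "Heavyweight",
    "Super Heavyweight": "Heavyweight",
}

def merged_counts(counts):
    """Sum organizer-class counts into the three merged categories."""
    totals = {}
    for category, n in counts.items():
        merged = MERGED_CATEGORY[category]
        totals[merged] = totals.get(merged, 0) + n
    return totals

def fighting_success(wins, total_fights):
    """Share of fights won; None if the record has two or fewer fights."""
    if total_fights <= 2:
        return None
    return wins / total_fights

print(merged_counts(ORGANIZER_COUNTS))
# {'Lightweight': 12, 'Middleweight': 16, 'Heavyweight': 12}
```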

## Body Measurements

Body height was measured by Vít Třebický using a Trystom A-213 anthropometer. Participants stood with their backs against a wall, looking directly ahead, and body height was measured from the vertex to the ground to the nearest millimeter (Hall et al., 2007).

Body weight, body fat, and muscle mass were measured by Jitka Fialová (JF) using a Tanita MC-980 bioimpedance scale (Athlete setting; Vaara et al., 2012). Testing was performed in a standing position, with participants standing on the foot electrodes and holding the hand electrodes, arms hanging freely along the body. Participants wore underwear only (Pinilla et al., 1992).

#### TABLE 1 | Target descriptive statistics.

\* Intensity values are reported as analyzed from the original recordings (i.e., before being post-processed).

#### Physical Fitness Measurements

Handgrip strength was measured by JF using a Takei TKK 5401 digital hand dynamometer (Vidal Andreato et al., 2011; Bonitch-Góngora et al., 2013). During the handgrip test, the athletes were instructed to stand straight with their arms alongside their body. They had three attempts with each hand, alternating hands between attempts, and we used the "best test" method, meaning that the attempt with the highest handgrip strength for each hand was recorded. Maximal handgrip strength of the left and right hands was closely correlated (r = 0.808, 95% CI [0.664, 0.894], p < 0.001, N = 40), and a paired-sample t-test showed no statistically significant difference between the maximal strength of the left and right hands [t(39) = 0.618, p = 0.54, mean difference = 0.6 kp]. In all further analyses involving handgrip strength, we therefore represent handgrip strength by the mean of both hands' "best test" scores.
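The handgrip score and the reported confidence interval for the left/right correlation can be sketched as follows. The paper does not say how its interval was computed; the Fisher z-transformation used here is an assumption, and with r = 0.808 and N = 40 it gives approximately [0.66, 0.89], close to the reported [0.664, 0.894]. `handgrip_score` and `pearson_ci` are hypothetical helpers, and the example grip values are invented.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def handgrip_score(left_attempts, right_attempts):
    """Mean of each hand's best ("best test") attempt, e.g. in kp."""
    return (max(left_attempts) + max(right_attempts)) / 2

def pearson_ci(r, n, level=0.95):
    """Approximate CI for Pearson's r via the Fisher z-transformation."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return tanh(z - crit * se), tanh(z + crit * se)

# Invented attempts for one athlete (three per hand).
score = handgrip_score([40.0, 42.5, 41.0], [39.0, 43.5, 40.0])  # 43.0
ci_low, ci_high = pearson_ci(0.808, 40)
```

The interval is asymmetric around r because it is computed on the z scale and then transformed back.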

Lung capacity was measured by JF using a MicroLab ML3500 MK8 spirometer. Three standing forced vital capacity (FVC) maneuvers were performed, the "best test" method was applied, and we recorded the maneuver with the highest FVC value along with its forced expiratory volume in the first second (FEV1) and peak expiratory flow (PEF). The "best test" method is a widely used and recommended approach in research employing spirometry (Crapo et al., 1981; Havryk et al., 2002). FVC is the maximal volume of air exhaled, with maximally forced effort, from a maximal inspiration; in other words, it is vital capacity performed with a maximally forced expiratory effort. FEV1 is the maximal volume of air exhaled in the first second of forced expiration from a position of full inspiration, and PEF is the maximum expiratory flow achieved by a maximally forced expiration from the point of maximal lung inflation (Miller et al., 2005).
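As described, the "best test" rule keeps FEV1 and PEF from the same maneuver as the highest FVC rather than taking the maximum of each measure separately. A minimal sketch, assuming litres for volumes and litres per second for flow (the units reported by the device are not stated here) and using invented values:

```python
from typing import NamedTuple

class Maneuver(NamedTuple):
    fvc: float   # forced vital capacity (litres, assumed unit)
    fev1: float  # forced expiratory volume in 1 s (litres, assumed unit)
    pef: float   # peak expiratory flow (litres per second, assumed unit)

def best_test(maneuvers):
    """Return the whole maneuver with the highest FVC, keeping its FEV1 and PEF."""
    return max(maneuvers, key=lambda m: m.fvc)

# Invented values for three standing FVC maneuvers.
maneuvers = [Maneuver(5.1, 4.3, 9.8), Maneuver(5.4, 4.1, 9.5), Maneuver(5.2, 4.4, 10.2)]
best = best_test(maneuvers)
# best is Maneuver(fvc=5.4, fev1=4.1, pef=9.5), not the per-measure maxima.
```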

#### Vocal Stimuli Recordings and Processing

Acoustic stimuli were recorded by Pavel Šebesta (PŠ) using a Sony PMC-D90 portable audio recorder (in-built microphone sensitivity 20–40 kHz). The recorder was equipped with a windscreen (AD-PCM1), mounted on a tripod with an acoustic reflection shield, and placed in a portable, acoustically treated booth to reduce potential echoes and ambient noise. Recordings were captured at 24 bit/96 kHz in WAV format. Participants stood 1.5 meters from the recorder, and the recording level setting was kept constant across all recordings to standardize recording intensity and to prevent clipping.

Participants were instructed to count from 1 to 10 in their native language and then perform three intimidating roars (their instruction was: "Roar three times, as much as you can, to intimidate a potential opponent"). For ratings and analyses, we use only the second roar because the first might be affected by the novelty of the task and the third by a potential decrease of effort (for differences between the three roars, see Supplementary Material **Tables S1**–**S11**). For examples of roars see **Audios S1**, **S2** and for utterances see **Audios S3**, **S4**.

Subsequent processing and acoustic analysis were performed by PŠ in Audacity 2.1.3 (Audacity Team, 2018) and Praat 5.4.09 (Boersma and Weenink, 2015). Roar and utterance levels were increased by +20 dB and +35 dB, respectively, while interindividual variation in vocalization intensity remained unmodified. This intensity adjustment was necessary because most utterance recordings were not sufficiently loud even at the highest volume settings. The adjustment applied to the roars was the highest possible that did not introduce clipping in any of the recordings. We measured the mean intensity and duration of the volume-adjusted utterances and roars. Mean F0 was measured by the autocorrelation method. Preset parameters for F0 extraction were used, with a 75 Hz pitch floor in accordance with the Praat developers' recommendations and a 300 Hz pitch ceiling based on visual inspection of spectrograms (for a similar approach, see also Šebesta et al., 2017). The 300 Hz pitch ceiling used for utterances was not suitable for the roars. We visually inspected Praat's pitch contours in the Editing window. Most roar recordings showed erroneous F0 measurements (see **Figure S1** for an example), which rendered Praat's standard F0 extraction method unreliable for this type of acoustic stimulus (for similar issues with F0 extraction, see Raine et al., 2017). F0 tracking frequently failed in the middle of a recording or unexpectedly "jumped down," possibly due to the chaotic and subharmonic phenomena found in roars (Fitch et al., 2002). For this reason, we used as an F0 analog the long-term averaged fast Fourier transform (FFT) spectral peak frequency (see **Figure S2** for an example), corresponding to the first harmonic (verified by visual inspection of the harmonic structure).
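The long-term averaged FFT spectral peak used as an F0 analog can be sketched as below: the signal is cut into frames, each frame is Hann-windowed and transformed, the magnitude spectra are averaged, and the largest bin within a search band is returned. This is a dependency-free illustration with a naive DFT (slow, but transparent), not the Audacity or Praat implementation; the frame length, search band, and test tone are assumptions. With `fmin=2000` and `fmax=3000`, the same function would mirror the 2–3 kHz peak search described for the roars' third formant.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def ltas_peak_frequency(signal, fs, frame_len=1024, fmin=50.0, fmax=1000.0):
    """Frequency (Hz) of the largest bin of the long-term averaged magnitude
    spectrum between fmin and fmax. Uses a naive O(N^2) DFT for clarity."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    ltas = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        for k in range(n_bins):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / frame_len)
                        for i, x in enumerate(frame))
            ltas[k] += abs(coeff)
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    ltas = [v / n_frames for v in ltas]
    resolution = fs / frame_len  # Hz per bin
    band = [k for k in range(n_bins) if fmin <= k * resolution <= fmax]
    peak_bin = max(band, key=lambda k: ltas[k])
    return peak_bin * resolution

# Synthetic test tone: 250 Hz fundamental plus a weaker 500 Hz harmonic.
fs = 8000
tone = [math.sin(2 * math.pi * 250 * n / fs) + 0.3 * math.sin(2 * math.pi * 500 * n / fs)
        for n in range(1024)]
print(ltas_peak_frequency(tone, fs, frame_len=256))  # 250.0
```

Averaging over frames makes the peak robust to momentary tracking failures, which is the motivation given above for preferring it over frame-wise pitch tracking in roars; the price is a frequency resolution limited to `fs / frame_len`.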
We also used standard Praat methods for harmonics-to-noise ratio (HNR; autocorrelation method, preset parameters) measurements on whole utterance recordings and on one-second snippets from the initial part of the roars, close to the spectrogram plateau, where Praat's autocorrelation algorithm was able to track F0. Mean formant frequencies in speech (F1–F4) were measured by the Burg method. In roars, however, only a peak around 2–3 kHz (within the expected range of the third formant) was apparent on visual inspection of the long-term average spectra (LTAS) and clearly distinguishable from other harmonics. Audacity's "Plot Spectrum" feature (Spectrum, 1,024 window size, Hanning window) was used to measure this 2–3 kHz peak. Because we were able to reliably extract only the third formant (F3) from roars, and because the first and second formants in speech are strongly affected by speech content, we used only the third formant of both utterances and roars in subsequent analyses to enable comparison.
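For intuition about the autocorrelation-based HNR, the sketch below converts the highest normalized autocorrelation peak r within a pitch range into decibels as HNR = 10·log10(r / (1 − r)). It is a simplified, hypothetical version of the idea behind Praat's method (Praat additionally works frame by frame, corrects for the window's own autocorrelation, and interpolates peaks), so its values will not match Praat's output; the pitch range and test signals are assumptions.

```python
import math
import random

def normalized_autocorr(x, lag):
    """Normalized correlation between x and x shifted by `lag` samples."""
    a, b = x[:-lag], x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den

def hnr_db(x, fs, f0_min=75.0, f0_max=600.0):
    """Simplified HNR (dB): 10*log10(r / (1 - r)) for the highest normalized
    autocorrelation peak r at lags corresponding to f0_min..f0_max."""
    lags = range(int(fs / f0_max), int(fs / f0_min) + 1)
    r = max(normalized_autocorr(x, lag) for lag in lags)
    r = min(max(r, 1e-12), 1 - 1e-12)  # keep the logarithm finite
    return 10 * math.log10(r / (1 - r))

# Synthetic 200 Hz tone (0.1 s at 8 kHz), clean and with added Gaussian noise.
fs = 8000
clean = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
rng = random.Random(1)
noisy = [s + rng.gauss(0, 0.3) for s in clean]
```

A perfectly periodic signal drives r toward 1 and the HNR toward very large values, whereas added noise lowers the autocorrelation peak; for example, r = 0.99 corresponds to about 20 dB.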

## Rating Sessions

In total, 31 men (mean age = 27.1, SD = 5.2, range = 20–36 years) and 32 women (mean age = 24.4, SD = 4.3, range = 18–33 years), mainly students at Charles University, Prague, Czech Republic, took part in the rating sessions.

Raters were recruited via social media advertisements and mailing lists of participants from previous studies. After participating, they received 100 CZK (∼€4) and a small snack as reimbursement, together with a debriefing leaflet explaining the purpose of the study.

Raters were asked to assess the formidability of each recording ("Jak moc by byl tento muž úspěšný, kdyby se dostal do fyzického souboje?"/"How successful would this man be if he was involved in a physical confrontation?") on a 7-point verbally anchored scale (from "1–velice neúspěšný"/"not successful at all," to "7–velice úspěšný"/"highly successful"). Each participant rated all roar and utterance stimuli. To reduce participant fatigue, the rating was divided into two sessions held 1 week apart. In the first session, participants rated half of all roar and utterance stimuli in a randomized order, with the individual stimuli within the set also randomized. In the second session, participants rated the remaining half of the stimuli in the same fashion.

Ratings took place in a quiet perception laboratory with negligible ambient sound. A Focusrite Scarlett Solo Gen 2 audio I/O interface (22 Hz–22 kHz RCA output) and two Yamaha HS-7 active reference studio monitor speakers (43 Hz–30 kHz, 95 W; LF 60 W, HF 35 W output) were used to present the stimuli in WAV format. Raters were seated 2.8 m in front of the speakers, at their focal point. We opted for speakers instead of the commonly used headphones because they present the sonic characteristics of roaring in a more ecologically valid way. Playback volume was kept constant throughout the presentation, with the loudest roar registering 87 dB (measured with a OnePlus One smartphone and the Smart Tools® Sound Meter 1.6.12 app). All authors agreed that this level was very naturalistic but not overwhelmingly loud.

## Statistical Analyses

All statistical tests were performed in JASP 0.9.0.1 (JASP Team, 2018) and jamovi 0.9.1.7 (jamovi project, 2018). McDonald's ω was used to estimate interrater agreement (Dunn et al., 2014). To test for potential sex differences in ratings, a paired-samples t-test was carried out. Associations between ratings by men and women were tested by bivariate correlations using Pearson's r with 95% CIs [lower limit, upper limit]. Potential differences between the maximal strength of the left and right hand were tested with a paired-samples t-test, and associations between left- and right-hand strength were tested by bivariate Pearson correlations with 95% CIs. Cohen's d was used as the effect size measure for mean comparisons. To assess the relative contribution of performance-related and acoustic measures to the perceived formidability of roars and utterances, we fitted linear mixed-effects models (REML estimation) with rater ID and stimulus ID as crossed random intercepts. This approach accounted for variation at the level of individual raters and at the level of individual stimuli, and it avoided potential bias due to data aggregation. To assess acoustic predictors of fighting success, we ran a linear regression analysis (Enter method). As measures of the variability explained by the regressions, we report model R<sup>2</sup> values, while standardized βs and their 95% CIs are reported for entered coefficients.
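
For a one-factor model, McDonald's ω is the share of total score variance attributable to the common factor: ω = (Σλ)² / ((Σλ)² + Σψ), where λ are the factor loadings and ψ the residual (unique) variances. When applied to interrater agreement, each rater is treated as an "item" and each stimulus as an observation. The sketch below assumes the loadings have already been estimated (the statistics software fits the factor model); the function name and the example loadings are hypothetical and not taken from the study.

```python
def mcdonald_omega(loadings, uniquenesses=None):
    """McDonald's omega (total) for a one-factor model.

    loadings     -- standardized factor loadings, one per item (here: one per rater)
    uniquenesses -- residual variances; defaults to 1 - loading**2, which is valid
                    only for standardized loadings
    """
    lam = [float(v) for v in loadings]
    if uniquenesses is None:
        uniquenesses = [1.0 - v ** 2 for v in lam]
    common = sum(lam) ** 2                 # variance due to the common factor
    return common / (common + sum(uniquenesses))


# Hypothetical loadings for four raters (not data from the study):
print(round(mcdonald_omega([0.8, 0.8, 0.8, 0.8]), 3))  # 0.877
```

Because (Σλ)² grows with the number of raters, ω values above 0.9 such as those reported below are easier to reach with panels of roughly 30 raters per sex than with a handful.
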

## Data Availability

Datasets generated and analyzed during the current study are available in the **Supplementary Material** of this article (**Tables S20**, **S21**).

## RESULTS

## Sex Differences in Perceived Formidability

#### Utterances

McDonald's ω scores of male (ω = 0.954) and female (ω = 0.933) ratings of utterance formidability showed high interrater agreement. We therefore used the mean formidability rating of each utterance, computed separately for male and female raters. Perceived formidability of utterances was also highly correlated between men and women (r = 0.93, 95% CI [0.871, 0.963], p < 0.001, N = 40). A paired-samples t-test showed a statistically significant sex difference in formidability ratings, with men giving higher ratings [t(39) = 9.165, p < 0.001, Cohen's d = 1.449, mean difference = 0.368] (for descriptive statistics, see **Table 2**). Although mean ratings of utterance formidability differed between the sexes, all further analyses are reported with ratings combined, because the results were virtually the same when analyzed separately. For results based on female and male ratings separately, see Supplementary Material **Tables S12**–**S19**.

#### Roars

McDonald's ω scores of male (ω = 0.953) and female (ω = 0.924) ratings of roar formidability showed high interrater agreement. In subsequent analyses, we therefore used the mean formidability rating of each roar, computed separately for male and female raters. Roar formidability ratings assigned by men and by women were also highly correlated (r = 0.973, 95% CI [0.95, 0.986], p < 0.001, N = 40). A paired-samples t-test showed a statistically significant sex difference in roar formidability ratings, with women giving higher ratings [t(39) = 2.695, p = 0.010, Cohen's d = 0.426, mean difference = 0.132]. For descriptive statistics, see **Table 2**.
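
The reported effect sizes are consistent with Cohen's d computed for paired data as d_z, the mean of the within-stimulus differences divided by their standard deviation, which equals t/√N. The sketch below is a minimal illustration under that assumption; the function name is hypothetical, and the software's exact definition of d should be checked in its documentation.

```python
import math
from statistics import mean, stdev


def paired_t_dz(x, y):
    """Paired-samples t statistic, its degrees of freedom, and Cohen's d_z."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be matched samples of equal length >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    m, sd = mean(diffs), stdev(diffs)      # sample SD of the differences (n - 1)
    t = m / (sd / math.sqrt(n))
    dz = m / sd                            # identical to t / sqrt(n)
    return t, n - 1, dz


# Consistency check with the reported statistics (N = 40 stimuli):
print(round(9.165 / math.sqrt(40), 3), round(2.695 / math.sqrt(40), 3))  # 1.449 0.426
```

Both reported values of d (1.449 for utterances and 0.426 for roars) follow from t/√40, which supports reading them as d_z rather than as d standardized by the pooled SD of the ratings.
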

## Formidability of Utterances and Roars as a Predictor of Fighting Success

To test whether the perceived formidability of roars and utterances predicts fighting success, we ran bivariate Pearson correlations. Perceived formidability was not associated with actual fighting success in either utterances (r = −0.045, 95% CI [−0.405, 0.327], p = 0.817, N = 29) or roars (r = −0.115, 95% CI [−0.462, 0.263], p = 0.554, N = 29). To explore whether the effect is modulated by weight category, we grouped the fighters into three weight categories (lightweight, middleweight, and heavyweight) and entered this variable into a linear regression. Even after this modification, however, the overall model was not statistically significant in either utterances [F(3, 25) = 1.841, p = 0.166, R<sup>2</sup> = 0.181] or roars [F(3, 25) = 0.683, p = 0.571, R<sup>2</sup> = 0.076].
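
The 95% CIs reported for Pearson's r are reproduced by the Fisher z transformation: z = atanh(r) is approximately normal with standard error 1/√(N − 3), and the interval limits are transformed back with tanh. The sketch below assumes this standard method; the function name is hypothetical.

```python
import math


def pearson_r_ci(r, n, z_crit=1.959963984540054):
    """Two-sided CI for Pearson's r via the Fisher z transformation (default 95%)."""
    if not -1.0 < r < 1.0 or n <= 3:
        raise ValueError("need |r| < 1 and n > 3")
    z = math.atanh(r)                 # Fisher z
    se = 1.0 / math.sqrt(n - 3)       # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lo, hi = pearson_r_ci(-0.045, 29)
print(f"[{lo:.3f}, {hi:.3f}]")  # [-0.405, 0.327]
```

With only 29 fighters, the interval spans roughly ±0.37 around zero, so moderate associations between perceived formidability and fighting success cannot be ruled out by these data.
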

## Physical Fitness Predictors of Perceived Formidability

First, we ran exploratory correlational analyses to assess relationships among the physical fitness variables (see Supplementary Material **Table S22**). Body weight, muscle mass, and fat mass were all highly positively intercorrelated (rs > 0.757, ps < 0.001, N = 40). To avoid collinearity and to facilitate interpretation of the findings, we used only body weight in the subsequent analyses. The FVC and FEV<sub>1</sub> spirometry measures were likewise highly positively correlated (r = 0.935, 95% CI [0.872, 0.967], p < 0.001, N = 34), which is why we omitted FEV<sub>1</sub> from subsequent analyses.

TABLE 2 | Formidability rating descriptive statistics.

TABLE 3 | Summary of linear mixed effects model analysis for physical fitness predictors of perceived fighting ability based on utterances and roars.

Linear mixed model analyses were run with age, height, weight, FVC, PEF, and handgrip strength as fixed-effect predictors to assess whether physical fitness parameters predict the perceived formidability of utterances and roars. For utterances, the overall model explained 44.9% of the variance (R<sup>2</sup> conditional) and the fixed factors explained 5.4% (R<sup>2</sup> marginal). None of the physical fitness variables significantly predicted the formidability of utterances. For roars, the overall model explained 60.1% of the variance (R<sup>2</sup> conditional), while the fixed factors explained 8.2% (R<sup>2</sup> marginal). Similarly, none of the predictors of perceived formidability in roars were significant. For an overview of the results, see **Table 3**.
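
The distinction between marginal and conditional R<sup>2</sup> can be made concrete with the usual variance-partitioning definitions for Gaussian mixed models (Nakagawa and Schielzeth): marginal R<sup>2</sup> divides the variance of the fixed-effects predictions by the total variance, while conditional R<sup>2</sup> adds the random-intercept variances to the numerator. The sketch assumes the software reports these standard definitions; the function name and the variance components in the example are hypothetical.

```python
def lmm_r2(var_fixed, var_random, var_residual):
    """Marginal and conditional R^2 for a Gaussian linear mixed model.

    var_fixed    -- variance of the fixed-effects linear predictor
    var_random   -- iterable of random-intercept variances (e.g. rater, stimulus)
    var_residual -- residual variance
    """
    var_re = sum(var_random)
    total = var_fixed + var_re + var_residual
    r2_marginal = var_fixed / total                 # fixed effects only
    r2_conditional = (var_fixed + var_re) / total   # fixed + random effects
    return r2_marginal, r2_conditional


# Hypothetical variance components (not taken from the study):
m, c = lmm_r2(var_fixed=0.1, var_random=(0.3, 0.2), var_residual=0.4)
print(round(m, 3), round(c, 3))  # 0.1 0.6
```

Read this way, the utterance model (marginal 5.4%, conditional 44.9%) attributes most of its explained variance to the rater and stimulus intercepts rather than to the physical fitness predictors.
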

## Acoustic Predictors of Perceived Formidability

Linear mixed model analyses were run to predict the perceived formidability of utterances and roars, with F0, F3, HNR, intensity, and duration entered as independent predictors. For utterances, the overall model explained 44.1% of the variance (R<sup>2</sup> conditional), while the fixed factors explained 9.6% (R<sup>2</sup> marginal); F0 and intensity were significant predictors of perceived formidability. For roars, the overall model explained 57% of the variance (R<sup>2</sup> conditional) and the fixed factors explained 37.5% (R<sup>2</sup> marginal); perceived formidability was predicted by F0, HNR, intensity, and duration. For full details, see **Table 4**.

TABLE 4 | Summary of linear mixed effects model analysis for acoustic predictors of perceived formidability based on utterances and roars.

TABLE 5 | Summary of multiple linear regression analysis for acoustic predictors of fighting success based on utterances and roars.

#### Acoustic Predictors of Fighting Success

To explore whether any acoustic parameters predict actual MMA fighting success, we ran a multiple linear regression analysis for both utterances and roars. Overall models were not statistically significant in either utterances or roars [Utterances: F(5, 23) = 0.774, p = 0.578, R<sup>2</sup> = 0.144; Roars: F(5, 23) = 1.107, p = 0.384, R<sup>2</sup> = 0.194]. For full results, see **Table 5**.
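
The overall F statistics of these regressions follow directly from R<sup>2</sup>, the number of predictors k, and the number of fighters N: F = (R<sup>2</sup>/k) / ((1 − R<sup>2</sup>)/(N − k − 1)), with df = (k, N − k − 1). The sketch below, with a hypothetical function name, recovers the reported values from the rounded R<sup>2</sup>.

```python
def overall_f_from_r2(r2, n, k):
    """Overall F test statistic of an OLS model with k predictors plus an intercept."""
    df1, df2 = k, n - k - 1
    f = (r2 / df1) / ((1.0 - r2) / df2)
    return f, df1, df2


# Acoustic models reported above: N = 29 fighters, 5 acoustic predictors.
for label, r2 in [("utterances", 0.144), ("roars", 0.194)]:
    f, df1, df2 = overall_f_from_r2(r2, n=29, k=5)
    print(f"{label}: F({df1}, {df2}) = {f:.3f}")
# utterances: F(5, 23) = 0.774
# roars: F(5, 23) = 1.107
```

The same relation reproduces the weight-category model for utterances (R<sup>2</sup> = 0.181, k = 3, giving F(3, 25) ≈ 1.84), and it shows why R<sup>2</sup> values near 0.2 are not significant with only 23 residual degrees of freedom.
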


## DISCUSSION

The main goal of this study was to test whether perceived formidability based on intimidating roars and non-intimidating utterances is related to body parameters such as height and weight, and to relevant aspects of physical fitness such as strength and lung capacity. We also tested whether perceived formidability is related to actual fighting success. Finally, we performed an acoustic analysis to investigate which parameters predict perceived formidability and fighting success. Contrary to our predictions, neither body height, weight, nor muscle mass predicted perceived formidability from either speech or roars. We also found no significant association between the perceived formidability of roars or utterances and actual fighting success. Our acoustic analysis showed that intensity (the acoustic analog of loudness) was the strongest predictor of perceived formidability in both speech and roars. In roars, but not in utterances, lower HNR and longer duration predicted higher perceived formidability. Moreover, while lower voices (lower F0) were perceived as more formidable in utterances, the opposite held for roars.

Our failure to find an association between the perceived formidability of roars and body height or strength contrasts with the results of a recent paper by Raine et al. (2018a), who found that listeners could predict relative body height and handgrip strength from both speech and roars. These results are further supported by another study, which showed a positive association between handgrip strength and perceived strength based on speech (Sell et al., 2010). On the other hand, two other studies found no association between threat potential and perceived fighting ability/dominance from speech (Doll et al., 2014; Han et al., 2017).

There are several possible explanations for such striking differences between our results and those reported by Raine et al. (2018a). First of all, both Raine et al. (2018a) and Sell et al. (2010) asked participants specifically to assess strength, whereas our participants rated formidability. Although strength certainly contributes to overall formidability, other important factors, such as agility or endurance, also influence it. Moreover, differences in the perceptual attribute being rated can also affect the association with measures of formidability. Since our main goal was to investigate how people perceive threat potential based on acoustic cues, we used the broader concept of formidability instead of focusing narrowly on the perception of strength. To resolve this issue, future studies should compare ratings of strength and formidability based on acoustic cues, and their correlates, using the same set of stimuli (for results based on the perception of faces and bodies, see Sell et al., 2010).

Secondly, Raine et al. (2018a) used an ego-centered rating approach, i.e., their participants assessed the targets' strength relative to their own. We agree that perceivers may be particularly sensitive when estimating their own chances of winning a confrontation. Nonetheless, several other studies used absolute ratings, including ratings of strength from speech (e.g., Sell et al., 2010), and found positive results. It is possible that even under these conditions, people tend to use the scale relative to their own prospects. It could also be argued that because our targets were experienced fighters, there should be no difference between relative and absolute ratings, because the vast majority of student listeners would rate their own formidability as lower than that of MMA fighters in either case. This is supported by a comparison of mean handgrip strength between our study (**Table 1**) and that of Raine et al. (2018a; **Supplementary Information**, p. 4), although this is only a very approximate estimate because the two studies used different types of dynamometer and the resulting values therefore cannot be directly compared. Alternatively, people might be able to assess formidability irrespective of ego involvement. This is supported by a study that explicitly used the bystander paradigm (Little et al., 2015), in which raters were asked to judge from facial photographs who would win a fight and performed above chance level. Once again, to obtain more fine-grained insights into how ego-related context affects the cognitive processes underlying formidability assessments, future investigations should compare the two approaches directly.

Thirdly, we used vocal stimuli from MMA fighters, who have extensive experience with physical confrontations, and some of whom produce roars when winning a fight. It would seem advantageous to employ such a group rather than, for instance, students, who are likely to have limited experience with both fighting and roaring. A potential drawback of our sample of fighters is that, because of intense training, they might display little variability in handgrip strength. However, inspection of variation estimates, such as SD, shows that this was not the case (see **Table 1**). The number of stimuli was moderate (N = 40), but a related study by Raine et al. (2018a) reported a positive effect based on a smaller sample of male stimuli.

Finally, one could argue that the perceived formidability of roars is related to vocal effort. This is supported by our acoustic analysis, which showed that intensity and duration were the strongest predictors of formidability judgements. It is thus possible that in our sample, motivation, and consequently the effort invested in the roars, varied among the fighters and may have obscured some of the associations with physical characteristics. Alternatively, and perhaps most importantly, the full expression of intimidating roars is not under complete volitional control, and roars may therefore be fully expressed only in the appropriate context (e.g., when conflict is imminent). Using on-demand roars might not be a problem for judgements of strength but could be a key factor in formidability inferences. Although we acknowledge that this might be logistically challenging, real-life nonverbal vocal stimuli that vary little in motivation and/or effort should thus be preferred. An excellent example in this context is the study by Raine et al. (2017), which used the grunts of professional tennis players as stimuli.

The acoustic analysis showed that intensity and duration are the most salient predictors of formidability judgements. This is in agreement with studies on various vertebrate species. For example, male green frogs (Rana clamitans) react differently to calls produced by large males as opposed to small ones (Bee et al., 1999; but see Bee, 2002). Similarly, more dominant male baboons produce longer and louder calls (so-called "wahoos") during contest vocalizations (Fischer et al., 2002; Kitchen et al., 2003). Interestingly, many studies on speech perception standardize their vocal stimuli for intensity (because reliable measures of acoustic intensity are logistically difficult to acquire) and therefore cannot assess the contribution of intensity to the respective perceptual attribute. However, our results, as well as studies on the perception of affective states and intentions (Scherer, 1986, 2003; Siegman et al., 1990; Banse and Scherer, 1996), show that loudness (i.e., the perceptual analog of voice intensity) is an integral and significant part of voice perception. Indeed, the same verbal content expressed in a soft, moderate, or loud voice often has a very different impact on perceivers (Patel et al., 2011). We further found that a low HNR in roars, but not in utterances, is associated with high formidability ratings. Previous studies have also shown more noise in threatening than in non-threatening calls, and higher perturbation (lower HNR) in human anger vocalizations (Patel et al., 2011).

Finally, fundamental frequency was negatively associated with the formidability of speech but positively associated with the formidability of roars. The results for formidability judgements from speech agree with other studies, which consistently show that male voices with a lower pitch (the perceptual analog of fundamental frequency) are perceived as more dominant and attractive (Puts et al., 2006). This could be a consequence of sexual dimorphism in voice pitch (Rendall et al., 2005; Markova et al., 2016). In contrast, the positive association between fundamental frequency and formidability judgements of roars initially came as a surprise. However, high-pitched voices might, like intensity, provide cues about the effort and affective state of the producer, whereby individuals in a state of high arousal would produce roars with a higher F0. This speculation is supported by studies showing that arousal leads to an increase in voice pitch, perhaps as a consequence of tension in the glottal area (Ekman et al., 1976; van Mersbergen et al., 2017). Moreover, high pitch is associated with threat vocalizations in some species (Stirling, 1971; Portfors, 2007) and with anger vocalizations in humans (Scherer and Oshinsky, 1977; Frick, 1986). Fitch et al. (2002) proposed that subharmonics (components at integer fractions of F0, such as F0/2) are more prevalent in loud calls, and one hypothesized effect of this phenomenon is that they perceptually lower the pitch. In other words, a loud vocalization could sound lower-pitched than a moderately loud vocalization with the same F0 produced by the same individual. Although we detected subharmonic phenomena in a number of high-intensity roars in our sample (see Supplementary Material **Table S10**), this effect should be systematically investigated in future studies.

To summarize, we found no significant association between the perceived formidability of intimidating roars produced by MMA fighters and their body height, weight, or physical fitness indicators such as handgrip strength or lung capacity. Nor did we find a correlation between the perceived formidability of their roars and their actual fighting success. This might be because accurate judgements of formidability can be made only on the basis of real-life roars and cannot be reliably performed on demand. It may also be relevant that while roars might be interpreted primarily as expressions of intention (e.g., an affective state of anger), utterances might be interpreted primarily as characteristics of the individual (e.g., a level of dominance). Alternatively, the association between some acoustic parameters and perceived formidability might be the result of sensory exploitation and have only limited predictive value for actual formidability (Feinberg et al., 2018). We also found that the main acoustic predictors of formidability in roars are intensity, HNR, duration, and to some extent fundamental frequency. In a broader context, our study points to the need for further investigation of non-verbal vocalizations in humans. Scholars seem to be so blinded by humans' exceptional gift of speech that they tend to almost completely overlook the fact that speech is not our only vocalization. Non-verbal vocalizations are cross-culturally prevalent in the human social milieu. This applies not only to preverbal infants (see, for instance, Lindová et al., 2015) but also to adults, who produce a wide variety of non-verbal vocalizations in diverse contexts, such as co-laughter, painful injuries, aggressive confrontations, and sexual encounters, to name just a few (for some pioneering studies, see Bryant et al., 2016; Raine et al., 2018a,b).
We are confident that research into these non-verbal vocal displays will greatly contribute to our understanding of the complexity of human vocal expression and perhaps also of the evolutionary history of verbal communication in general (Hauser et al., 2002).

## AUTHOR CONTRIBUTIONS

PŠ, VT, JF, and JH developed the study concept. Data collection was performed by PŠ, VT, and JF. PŠ performed acoustic analysis of vocal stimuli. VT and PŠ performed data analysis and interpretation jointly with JF and JH. JH, PŠ, and VT drafted the manuscript and JF provided critical revisions. All authors approved the final version of the manuscript for submission.

#### FUNDING

This research was supported by Czech Science Foundation GAČR P407/16/03899S and by the Ministry of Education, Youth, and Sports (MEYS) NPU I program (No. LO1611) and PROGRES program Q22 at the Faculty of Humanities, Charles University within the Institutional Support for Long-Term Development of Research Organizations from MEYS.

## REFERENCES


#### ACKNOWLEDGMENTS

We would like to thank the International Mixed Martial Arts Federation (IMMAF) and the Mixed Martial Arts Association Czech Republic (MMAA) for giving us the opportunity to collect data during the 2016 IMMAF European Open Championships, held in Prague, Czechia. We are indebted to all the volunteer contestants of the championship and to the raters for their participation. We wish to thank Tereza Nevolová, David Stella, and other members of the Human Ethology group (www.etologiecloveka.cz) for their help with data collection and ratings, Petr Tureček for help with stimulus randomization, Klára Coufalová, Ph.D., for providing us with physical performance measurement tools, and Anna Pilátová, Ph.D., for English proofreading.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00859/full#supplementary-material

Audio S1 | Sample of highly formidable roar.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Šebesta, Třebický, Fialová and Havlíček. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Evolutionary Theories and Men's Preferences for Women's Waist-to-Hip Ratio: Which Hypotheses Remain? A Systematic Review

#### Jeanne Bovet\*

Stony Brook University, Stony Brook, NY, United States

#### Edited by:

Alex L. Jones, Swansea University, United Kingdom

#### Reviewed by:

Gayle Brewer, University of Liverpool, United Kingdom

Lynda Boothroyd, Durham University, United Kingdom

Robert C. Brooks, University of New South Wales, Australia

> \*Correspondence: Jeanne Bovet jeanne.bovet@gmail.com

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 08 November 2018 Accepted: 08 May 2019 Published: 04 June 2019

#### Citation:

Bovet J (2019) Evolutionary Theories and Men's Preferences for Women's Waist-to-Hip Ratio: Which Hypotheses Remain? A Systematic Review. Front. Psychol. 10:1221. doi: 10.3389/fpsyg.2019.01221

Over the last 25 years, a large amount of research has been dedicated to identifying men's preferences for women's physical features and the evolutionary benefits associated with such preferences. Today, this area of research generates substantial controversy and criticism. I argue that part of the crisis is due to inaccuracies in the evolutionary hypotheses used in the field. For this review, I focus on the extensive literature regarding men's adaptive preferences for women's waist-to-hip ratio (WHR), which has become a classic example of the just-so storytelling that contributes to the general mistrust of evolutionary explanations of human behavior. The issues in this literature originate in vague and incomplete theorizing about the evolutionary mechanisms leading to mate preferences. Authors seem to have rushed into testing and debating the effects of WHR on women's attractiveness under various conditions and with different stimuli, without first establishing (a) clear definitions of the central evolutionary concepts (e.g., female mate value is often reduced to an imprecise concept of "health-and-fertility"), and (b) a complete overview of the distinct evolutionary paths potentially at work (e.g., focusing on fecundability while omitting descendants' quality). Unsound theoretical foundations lead to imprecise predictions that cannot be properly tested, ultimately resulting in the premature rejection of evolutionary explanations of human mate preferences. This paper provides the first comprehensive review of the existing hypotheses on why men's preferences for a certain WHR in women might be adaptive, together with an analysis of the theoretical credibility of these hypotheses.
By dissecting the evolutionary reasoning behind each hypothesis, I show which hypotheses are plausible and which are unfit to account for men's preferences for female WHR. Moreover, the most cited hypotheses (e.g., WHR as a cue of health or fecundity) are not necessarily the ones with the strongest theoretical support, and some promising hypotheses (e.g., WHR as a cue of parity or current pregnancy) seem to have been largely overlooked. Finally, I suggest some directions for future studies on human mate choice, to move this evolutionary psychology literature toward a stronger theoretical foundation.

Keywords: mate choice, attractiveness, evolutionary hypotheses, WHR, mate value, reproductive success, fertility

## INTRODUCTION

The ratio of waist to hip circumference (waist-to-hip ratio, or WHR) is a physical characteristic often used as an example to show that evolution shaped human mate preferences. It is also an example of just-so storytelling in evolutionary psychology. In 1993, Devendra Singh suggested that WHR is a strong predictor of women's physical attractiveness (Singh, 1993a). He also argued that men's preference for a mate with a low WHR is adaptive because a low WHR reflects a woman's high mate value. But what exactly is this "mate value"? During the past 25 years, the evolutionary literature on WHR and women's attractiveness has flourished, but the definition of "mate value" is rarely made explicit. In evolutionary biology, mate value is tied to the concept of reproductive success: a woman with a high mate value will increase the reproductive success of her mate(s). An increase in reproductive success means more descendants in subsequent generations and can be achieved in various ways. First, survival until reproduction is indispensable. Second, the number of children born during an individual's lifespan is also crucial. But the survival and quality of these children directly affect their own reproductive success, and hence the number of grandchildren in the next generation, thus ultimately influencing the reproductive success of the grandparents. In short, a woman has higher value as a potential mate if she increases the number and quality of descendants a man will have (including those he has with other women). The question, then, is which of these components of reproductive success are actually linked to a mate's WHR. To answer this, I assemble the numerous hypotheses proposed since the idea of WHR as an indicator of women's mate value was first suggested in 1993.
These hypotheses are examined to determine which of the characteristics linked to WHR are most likely, in theory, to be translated into an increase in the reproductive success of the woman's mate.

The objective of this review is twofold. The first goal is to gather and pool all existing evolutionary hypotheses regarding men's preferences for a certain (low, high, or average) WHR. There are many reviews of men's preferences for women's WHR, but this is the first exhaustive review of the hypotheses mentioned in these studies. The second purpose of this paper is an in-depth theoretical examination of these hypotheses, which are often only briefly justified and, in some cases, have never been properly developed.

Most of the debate around WHR and attractiveness has centered on two other questions: "Is the preference for a low WHR universal?" and "Is WHR the best predictor of the attractiveness of women's bodies?" I will not address these two questions extensively here (they are beyond the scope of this paper), but a brief commentary seems necessary at this point. A preference for a relatively low WHR (i.e., low relative to men's WHR, or low relative to the average female WHR) has been observed in a large number of studies, covering a wide range of populations and methods. That said, the exact value of the ideal WHR varies to some extent [reviewed in Brooks et al. (2015) and Cashdan (2008)]. The second debate concerns WHR as the "best" predictor of attractiveness. Authors have debated whether WHR or BMI is the better predictor of attractiveness and mate value (Tassinary and Hansen, 1998; Tovée et al., 1999; Furnham et al., 2005; Cornelissen et al., 2009a,b). As could be expected, the results vary according to the population and stimuli used. Other measurements have also been proposed to replace WHR (for example, hip or waist size alone, abdominal depth, or waist-to-stature ratio: Brooks et al., 2010, 2015; Lassek and Gaulin, 2016). The objective of this paper is not to decide whether WHR is the best measure of physical attractiveness or whether the ideal WHR is universal. For our purposes, it is sufficient to note that the effect of WHR on attractiveness is widespread (even if the preferred WHR value varies) and large enough to warrant questions about its possible adaptive basis.

## MATERIALS AND METHODS

The dataset encompasses all articles and book chapters addressing men's preferences for women's WHR from an evolutionary perspective (see the **Supplementary Material** for details). The final dataset consists of 104 papers from 58 different first authors, published from 1993 to 2017 and including 13 review papers and chapters. All hypotheses concerning men's adaptive preferences for women's WHR, waist size, or hip size were collected (see **Figure 1** and **Supplementary Table S1**).

See the **Supplementary Material** for the details about the methodology used in the collection and the selection of hypotheses.

## RESULTS

In the following sections, I review each hypothesis found in the literature to see whether it could, in theory, support an adaptive role for the preference for a low WHR. For a hypothesis to be plausible, two steps are required: 1) Correlation with WHR: first, WHR needs to be correlated with the biological trait of interest (Is WHR associated with the nominated characteristic in the population?). This correlation needs to be strong enough that a detectable variation in WHR attractiveness translates into a substantial variation in the hypothesized trait; 2) Effect on the man's reproductive success: second, the nominated characteristic should be associated with a potential increase in reproductive success (meaning more descendants, and higher-quality descendants) for the individual who chooses a mate carrying this characteristic. A third, dispensable step can be added: 3) Perception of the characteristic using WHR: do people use WHR to assess the nominated characteristic (Are people aware of the link between WHR and the characteristic)? Importantly, this third step is not mandatory for the hypothesis to be valid, as people do not need to be aware of the biological consequences of their preferences for those preferences to have an effect. In other words, a preference for a trait can evolve perfectly well even when individuals do not suspect that the trait is a cue of something else. For example, a preference for sweet taste evolved in our ancestors without them knowing that sweetness was a cue of readily available energy. However, this third step can represent additional support in favor of the hypothesis and help us understand the mechanisms behind mate preferences.

FIGURE 1 | Two theoretical frameworks explaining men's adaptive preferences for women's WHR. (A) Example of a vague theoretical explanation often found in the literature. WHR is assumed to be a cue of women's health and fertility, which supposedly translate into women's "mate value." Some descriptive information about WHR is sometimes added to the theory (e.g., WHR is sexually dimorphic), but without any explicit link to mate value. (B) A more complex but more accurate theoretical explanation, including all the different hypotheses found in the literature. Each box represents a characteristic linked to women's WHR. The diagram illustrates their potential links to the reproductive success of the man (the woman's mate). The boxes with a gray background directly concern the man. All the other boxes relate to the woman's characteristics. Characteristics related to women's fertility (as usually defined in this literature) are represented in blue. In green: characteristics related to women's health. The lines connecting the boxes represent correlations, without implying any causality. Dotted lines indicate that empirical evidence for the correlation is scarce. Dotted frames indicate that evidence linking the characteristic to women's WHR is scarce. The links among the characteristics themselves are not represented on the diagram. For example, parity is obviously correlated with a woman's age, but this correlation is not illustrated here (each characteristic is assumed to be correlated with WHR, controlling for the other characteristics).

## Cue of Biological Sex

According to this hypothesis, WHR would be a way to detect the biological sex of a potential mate. The first mention of this straightforward hypothesis appears relatively late, almost 10 years after the first paper about adaptive preferences for WHR (Tovée et al., 2002, see **Figure 2**), and is present in only 14% of the papers (see **Supplementary Material**).

#### Correlation With WHR

WHR is sexually dimorphic in the human species. A significant difference between men's and women's WHR has been found in all the populations where it has been investigated (Leibel et al., 1989; Björntorp, 1991, 1993; Marti et al., 1991; Beall and Goldstein, 1992; Ley et al., 1992). The size of the difference between sexes varies between populations, but no culture has been found where men have a lower average WHR than women. Thus, WHR is a reliable cue of biological sex.

FIGURE 2 | … number of papers citing each hypothesis in a given year. See the Supplementary Material for the details of this figure.

However, there are many other sexually dimorphic traits in humans that can be used to assess biological sex: height, shoulder-to-hip ratio, hair, facial traits, breasts, genitals, voice, and so on (Wenzlaff et al., 2018). Consequently, people can identify biological sex without using WHR. And because WHR is not indispensable for assessing sex, the selective advantage of a preference for a low WHR as a way to assess biological sex is reduced (Iwasa and Pomiankowski, 1994; Bro-Jørgensen, 2010). On the other hand, the "redundant signaling" hypothesis (or "back-up signal" hypothesis) claims that multiple cues conveying similar information (here, biological sex) compensate for errors in information coding (Moller and Pomiankowski, 1993; Johnstone, 1996). In other words, multiple cues serve as back-up signals that keep the rate of mate choice errors low (Bro-Jørgensen, 2010; Abend et al., 2015). Moreover, when the ability to detect different cues varies with environmental conditions or distance, individuals may attend to different cues under different conditions (Candolin, 2003). More experiments are required to measure how much, and under which conditions, WHR improves the detection of biological sex beyond other sexually dimorphic features.
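The back-up signal argument can be made concrete with a small signal-detection sketch. It assumes, purely for illustration, that each cue is normally distributed within each sex with equal variance, that the cues are independent, and that the observer combines them optimally and without bias; under those assumptions the separations (d′) of independent cues combine as the square root of the sum of their squares. The d′ values and the names `D_OTHER_CUES` and `D_WHR` are hypothetical, not estimates from the literature.

```python
import math


def accuracy_from_dprime(d_prime: float) -> float:
    """Proportion correct for an unbiased observer sorting two equally likely,
    equal-variance Gaussian categories separated by d_prime SDs: Phi(d'/2)."""
    return 0.5 * (1.0 + math.erf((d_prime / 2.0) / math.sqrt(2.0)))


def combined_dprime(*d_primes: float) -> float:
    """Separation obtained by optimally combining independent, equal-variance cues."""
    return math.sqrt(sum(d * d for d in d_primes))


# Hypothetical male/female separations in SD units (illustrative, not estimates).
D_OTHER_CUES = 3.0  # height, shoulder-to-hip ratio, face, etc., pooled
D_WHR = 2.0         # waist-to-hip ratio on its own

without_whr = accuracy_from_dprime(D_OTHER_CUES)
with_whr = accuracy_from_dprime(combined_dprime(D_OTHER_CUES, D_WHR))

print(f"accuracy without WHR: {without_whr:.3f}")
print(f"accuracy with WHR:    {with_whr:.3f}")
print(f"errors avoided per 1,000 judgements: {1000 * (with_whr - without_whr):.0f}")
```

With these made-up values, adding WHR lowers the error rate from about 6.7% to about 3.6%. Because accuracy is already high without WHR, the absolute gain is modest, which is why the size of this benefit has to be measured rather than assumed.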

#### Effect on the Man's Reproductive Success

Obviously, it is necessary to copulate with the opposite sex to increase the number of descendants. The problem in this case is not the effect of the characteristic (biological sex) on the individual's reproductive success, but the non-uniqueness of the cue. Thus, unless we discover evidence supporting the "redundant signaling" hypothesis, the sexual dimorphism of WHR may have contributed to the selection of men's preference for a low WHR but was probably not the only selective force involved.

#### Perception of the Characteristic Using WHR

People use WHR to assess individuals' biological sex, and a lower WHR is strongly associated with figures being perceived as female (Johnson and Tassinary, 2005; Johnson et al., 2010; Saunders et al., 2010; Pazhoohi and Liddle, 2012). People are able to detect sex using WHR when other cues of sex are unavailable (Pazhoohi and Liddle, 2012) but, as stated earlier, we need to measure the accuracy of this detection with and without the use of WHR. This would give us an indication of the strength of the selection on the use of WHR as a cue of biological sex.

## Cue of (Reproductive) Age

This hypothesis, already referred to in the first paper on the topic (Singh, 1993a, see **Figure 2**), is found in 43% of the papers. According to this hypothesis, WHR would be an indicator of chronological or reproductive age.

#### Correlation With WHR

WHR is high in early childhood (and similar between boys and girls) and drops in girls around the onset of puberty. Women's WHR then increases from the peak of fertility (in their twenties) onward. This general age pattern is observed in many countries, including non-western and non-industrial populations (Rimm et al., 1988; Leibel et al., 1989; Seidell et al., 1990; Beall and Goldstein, 1992; Ley et al., 1992; Björntorp, 1993; Casey et al., 1994; Sugiyama, 2004; Bohler et al., 2010; Brooks et al., 2010; Bacopoulou et al., 2015; Butovskaya et al., 2017). Finally, WHR might also be a cue of menopause (the end of reproductive age, independently of chronological age; Kirschner and Samojlik, 1991; Bjorkelund et al., 1996; Tchernof and Poehlman, 1998), but several other studies find no effect of menopausal status on WHR (Lanska et al., 1985; Tonkelaar et al., 1989; Seidell et al., 1990; Troisi et al., 1995; Tchernof and Poehlman, 1998; Sugiyama, 2004).

To sum up, WHR is a reliable cue of the start of women's reproductive capacity (menarche and puberty). WHR is also a reliable indicator of women's age after puberty, and perhaps of menopause. However, as with biological sex (although to a lesser extent), individuals can rely on other cues to assess age (the face, for example; menarche is also linked to an increase in breast size). This redundancy decreases the selective advantage of a preference for a low WHR as a cue of age (Iwasa and Pomiankowski, 1994; Bro-Jørgensen, 2010). Nevertheless, according to the "redundant signaling" hypothesis, using several cues simultaneously, including WHR, may increase the precision of age estimation (Johnstone, 1996; Bro-Jørgensen, 2010; Abend et al., 2015). Alternatively, the use of redundant cues may reduce the time and energy spent inspecting mates, make mate assessment possible under different conditions (Rowe, 1999; Candolin, 2003), or make it more difficult for women to misrepresent their actual age (Candolin, 2003).
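The precision argument can be written out under simple, purely illustrative assumptions. Suppose that, within the post-pubertal range where WHR rises with age, the face and WHR each yield an unbiased age estimate, $\hat{a}_F$ and $\hat{a}_W$, with independent errors of variance $\sigma_F^2$ and $\sigma_W^2$. The standard precision-weighted (inverse-variance) combination is then never less precise than the better single cue:

$$
\hat{a} = \frac{\hat{a}_F/\sigma_F^2 + \hat{a}_W/\sigma_W^2}{1/\sigma_F^2 + 1/\sigma_W^2},
\qquad
\operatorname{Var}(\hat{a}) = \left(\frac{1}{\sigma_F^2} + \frac{1}{\sigma_W^2}\right)^{-1} \le \min\left(\sigma_F^2, \sigma_W^2\right).
$$

For example, with hypothetical errors of $\sigma_F = 3$ years and $\sigma_W = 6$ years, $\operatorname{Var}(\hat{a}) = (1/9 + 1/36)^{-1} = 7.2$, i.e., a standard deviation of about 2.7 years instead of 3. These figures are not estimates from the literature; whether a gain of this kind is large enough to matter for selection is the empirical question this section raises.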

#### Effect on the Man's Reproductive Success

Fecundability (the ability to become pregnant) is age-dependent in women. Fecundability is null before menarche, increases from puberty, peaks in the mid-twenties on average, and then decreases until menopause, the end of the reproductive window (Menken and Larsen, 1986; Weinstein et al., 1990; Dunson et al., 2002; Wallace and Kelsey, 2010). In addition, the risks of complications during pregnancy and childbirth are also related to age (more complications at very young ages and after 30, even when controlling for parity; Naeye, 1983; Fretts et al., 1995; Amarin and Akasheh, 2001). Thus, choosing a mate around the peak of fertility (for short-term relationships), or before it (for long-term relationships), will increase the number of potential descendants a man can sire with this mate, and thus his reproductive success.

However, because WHR is not the only cue of women's age, the correlation between age and WHR may have contributed to the selection and/or maintenance of men's preferences for a low WHR, but it is unlikely to be sufficient on its own, unless the use of WHR in addition to other redundant cues increases the precision of the age estimation in a way which would confer a selective advantage to the men making this estimation. More investigation is necessary to explore the role of the back-up signal hypothesis in this case.

#### Perception of the Characteristic Using WHR

People seem to use WHR to assess women's age, and a low WHR tends to be associated with perceptions of youthfulness (Singh, 1993b, 1994, 1995; Singh and Luis, 1995; Furnham et al., 2004, 2005; Sugiyama, 2004; Andrews et al., 2017). However, many other results on perceived age are inconclusive (Singh, 1993a,b; Henss, 1995, 2000; Singh and Young, 1995; Furnham et al., 1997, 2002, 2005; Sorokowski et al., 2014; Wang et al., 2015). I suggest two reasons for this: first, WHR does not have a linear relationship with age (it is U-shaped), and second, people simultaneously use other cues to infer age. As a consequence, depending on the other cues depicted in the stimuli (face, breasts or hair, for example), results can reveal a negative, positive, or null relationship between perceived age and WHR. Further studies investigating the interaction between WHR and other physical cues on perceived age are needed. Moreover, the effect of WHR on perceived youthfulness should be explored in different populations, as most studies have been conducted in WEIRD countries (but see Furnham et al., 2002; Sugiyama, 2004; Sorokowski et al., 2014 for notable exceptions).
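The first of these two reasons can be illustrated with a toy example. The sketch below assumes a made-up U-shaped curve with its minimum at age 20 (the function `hypothetical_whr` and its coefficients are invented for illustration, not fitted to any data) and computes the ordinary linear (Pearson) correlation between age and WHR within different sampled age ranges.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out to stay dependency-free."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def hypothetical_whr(age: float) -> float:
    """Toy U-shaped curve with its minimum at age 20 (not fitted to any data)."""
    return 0.70 + 0.0004 * (age - 20) ** 2


def correlation_over(ages) -> float:
    """Linear age-WHR correlation within one sampled age range."""
    ages = list(ages)
    return pearson_r(ages, [hypothetical_whr(a) for a in ages])


for label, ages in [("ages 10-20", range(10, 21)),
                    ("ages 20-30", range(20, 31)),
                    ("ages 10-30", range(10, 31))]:
    print(f"{label}: r = {correlation_over(ages):+.2f}")
```

The same underlying curve produces a negative correlation in the younger range, a positive one in the older range, and an essentially null one when the range is symmetric around the minimum, so a linear test alone cannot settle whether WHR is used as an age cue.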

## Cue of Current Pregnancy

According to this hypothesis, a woman's WHR could be used to detect if she is currently pregnant or not. This hypothesis is found in 31% of the papers.

#### Correlation With WHR

Women experience a drastic increase in WHR during pregnancy, mainly due to an increase in waist circumference. A decrease in hip circumference may also occur, as fat from this region is mobilized during late pregnancy to meet the needs of the growing fetus (Rebuffé-Scrive et al., 1985; Lassek and Gaulin, 2006). Moreover, unlike for sex and age, WHR is the only reliable visual cue of pregnancy. How steeply the attractiveness of WHR declines over the course of pregnancy, and how this differs between populations, remains to be specified. The earlier in pregnancy WHR starts to become significantly less attractive, the more plausible this hypothesis, as men would be able to use this cue over a longer period.

#### Effect on the Man's Reproductive Success

Because women are infertile while pregnant, pregnancy directly reduces current fecundity to zero. As such, choosing a pregnant woman as a short-term mate will not enhance a man's reproductive success. However, pregnancy is a transient stage. For a young woman, pregnancy is positively associated with her future expected reproductive success (it is a sign that she is able to become pregnant and to carry a child). For older women, however, being currently pregnant is negatively correlated with the expected number of additional children. Some authors argue that the relationship between current pregnancy and future fertility depends on the fertility rate of the population: if the total fertility rate of the population is low (e.g., two children), it would be costly to be attracted to a woman who is already pregnant, because there is a high risk that she may conceive only one more child. In traditional societies, where total fertility rates sometimes exceed 6, a woman who is pregnant may nevertheless conceive at least a few more children (Marlowe and Wetsman, 2001). To sum up, choosing a mate who is already pregnant will, most of the time, decrease the number of potential descendants of an individual, but the size of the effect depends on the woman's age, the total fertility rate in the population and the type of relationship (short or long term).

Moreover, choosing a long-term mate who is pregnant with the child of another man entails additional evolutionary costs, because investing in a non-biological child decreases the investment an individual can devote to his own descendants. Lastly, Singh suggests that choosing a mate carrying the baby of another man could increase the risk of violence from her jealous current mate (which could affect the survival or future reproductive success of the individual suffering the attack).

Altogether, mating with a woman whose high WHR indicates current pregnancy will, on average, have a negative effect on an individual's reproductive success. Combined with the fact that WHR is a reliable and distinctive cue of current pregnancy, this gives solid theoretical support to the present hypothesis.

#### Perception of the Characteristic Using WHR

To my knowledge, only two studies have investigated the role of WHR in the perception of current pregnancy (Furnham et al., 2001; Schützwohl, 2006). The results confirm that a low WHR is associated with a lower perceived probability of current pregnancy. The perception of pregnancy using WHR seems obvious in the last stages of pregnancy, based on profile views or in 3D. However, even if conscious awareness of a pregnancy is not necessary for men's preference to evolve, it would be interesting to explore when exactly people start to detect pregnancy, using 2D images including frontal, back and profile views of women at different stages of pregnancy.

## Cue of Parity (Number of Previous Pregnancies)

This hypothesis, first mentioned in 1998 (Yu and Shepard, 1998, see **Figure 2**), is present in 11% of the papers in this literature. It stipulates that WHR is a way to estimate the number of children (or number of pregnancies) that a woman has previously had in her life.

#### Correlation With WHR

There is evidence that WHR increases with the number of previous pregnancies (independently of age and BMI), due to an increase in waist circumference and/or a relative decrease in hip circumference (Kaye et al., 1990; Smith et al., 1994; Troisi et al., 1995; Bjorkelund et al., 1996; Rodrigues and Costa, 2001; Lassek and Gaulin, 2006; Wells et al., 2010, 2011; Butovskaya et al., 2017). This change in body shape (sometimes referred to as (covert) maternal depletion) is due to the mobilization of fat from the lower parts of the body to meet the needs of the developing child (as well as to looser abdominal muscles). This may be interpreted as a life history strategy for allocating energy between competing fat depots: gluteofemoral depots for reproduction, and central depots for maintenance and survival (Cashdan, 2008; Wells et al., 2010, 2011). This phenomenon has been observed in various countries: Brazil (Rodrigues and Costa, 2001), Sweden (Bjorkelund et al., 1996), Thailand (Wells et al., 2011), UK (Wells et al., 2010), USA (Kaye et al., 1990; Troisi et al., 1995; Lassek and Gaulin, 2006), and non-industrial societies including tribes from Sub-Saharan Africa, Western Siberia, South America and South Asia (Butovskaya et al., 2017). However, a few other studies find that parity has a negligible or null effect on WHR (Lanska et al., 1985; Tonkelaar et al., 1989; Seidell et al., 1990; Nenko and Jasienska, 2009), but these null results can be explained by the higher average age of the women sampled in those studies. Indeed, the parity effect seems to dissipate over time (Wells et al., 2010). Note that this does not affect the plausibility of the present hypothesis, as the effect of parity on WHR should be visible at the relatively young ages at which mate choice occurs.

#### Effect on the Man's Reproductive Success

Women's limited reproductive potential and resources mean that, even controlling for age, each child already born reduces the future number of children a man can sire with the woman if he mates with her long-term (Symons, 1981; Sugiyama, 2005). Parity status influences the survival and quality of future descendants. For example, both high parity and nulliparity are associated with increased risks during childbirth and lower birthweights (Kiely et al., 1986; Fretts et al., 1995; Hinkle et al., 2014; Merklinger-Gruchala et al., 2015, 2017), and IQ is negatively correlated with birth order (Downey, 2001). A recent pregnancy also increases the probability of current infertility because of lactational amenorrhea. Finally, as with current pregnancy, higher parity increases the costs linked to investment in genetically unrelated children.

In conclusion, even when the risks associated with first births are taken into account, choosing a mate with a low parity should have an overall positive impact on individuals' reproductive success (especially for long term relationships), and WHR as a cue of parity is likely to play a significant role in the selection of men's preferences for a low WHR.

#### Perception of the Characteristic Using WHR

To my knowledge, only one study investigates the effect of WHR on perceived parity, and its results show that women with a higher WHR are perceived as having more children (Andrews et al., 2017). This study needs to be replicated in populations other than undergraduate students from the USA, but the results suggest that people use WHR as a cue of parity.

## Cue of Fecundity

One of the most cited arguments for an adaptive preference for a low WHR is that WHR is a cue of fecundity (cited in 54% of the papers). Healthy women of similar age and reproductive history vary in their ability to become pregnant and achieve a live birth, and WHR would be an indicator of this ability.

#### Correlation With WHR

The most direct evidence for this hypothesis comes from a few clinical studies showing that women with a lower WHR have a higher probability of conception after in vitro fertilization and artificial insemination (Zaadstra et al., 1993; Wass et al., 1997). However, more recent studies find no relationship between women's WHR and their likelihood of conceiving after induction of ovulation (Imani et al., 2002; Eijkemans et al., 2003). These studies are informative because they directly concern fecundity, but women seeking medical assistance to conceive are not the ideal population in which to investigate factors of natural fecundity.

A few studies find that high WHRs are correlated with a later age at first live birth (Kaye et al., 1990) or a longer time-to-pregnancy (Wise et al., 2013; McKinnon et al., 2016; but see Wise et al., 2010).

An indirect way to detect the link between fecundity and WHR is to look at menstrual cycles or at the physiological factors linked to both WHR and fecundity. A few studies indicate that WHR is linked to menstrual abnormalities (Hartz et al., 1984; Moran et al., 1999) and to hormonal levels linked to fecundity (Björntorp, 1991; Jasienska et al., 2004). Similarly, one study finds that women with low WHRs have lower endocervical pH (Jenkins et al., 1995), which helps sperm penetration (Zavos and Cohen, 1980). However, these results seem not to hold for non-obese young women (see Lassek and Gaulin, 2018b for a richer discussion of this topic).

Finally, one study finds that WHR decreases around ovulation (Kirchengast and Gartner, 2002), suggesting that WHR might also reveal whether a woman is at peak cycle fertility. However, these results should be interpreted with caution, as others fail to replicate this effect (Bleske-Rechek et al., 2011).

To conclude, there are some indirect lines of evidence that WHR could be linked to fecundity, but this effect is mostly found when high WHR is associated with other factors (such as obesity or older age) and might thus be negligible in populations of young, non-obese women (Lassek and Gaulin, 2018b). Moreover, these studies focus almost exclusively on WEIRD populations, further limiting the generalizability of these results.

#### Effect on the Man's Reproductive Success

Choosing highly fecund mates will increase the reproductive success of a man both for long-term and short-term relationships. In the case of a short-term relationship, it will simply increase the probability of a pregnancy. In the case of a long-term relationship, it will increase the number of potential descendants by reducing both interbirth intervals and the period before the first child (thus increasing the reproductive window).
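Both mechanisms in this paragraph can be sketched with a standard geometric model of time-to-pregnancy, under simplifying assumptions: a constant per-cycle probability of conception (fecundability, `p`), independent cycles, and no other sources of delay such as lactational amenorrhea. The fecundability values below are hypothetical. A short-term relationship corresponds roughly to a single cycle, where the chance of pregnancy is simply `p`; in a long-term relationship, a higher `p` shortens each wait before conception.

```python
def expected_cycles_to_conception(p: float) -> float:
    """Mean of a geometric distribution: cycles up to and including conception,
    assuming a constant, independent per-cycle probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("per-cycle fecundability must lie in (0, 1]")
    return 1.0 / p


def prob_conceived_within(p: float, cycles: int) -> float:
    """Probability of at least one conception in the given number of cycles."""
    return 1.0 - (1.0 - p) ** cycles


# Two hypothetical women differing only in per-cycle fecundability.
for p in (0.25, 0.15):
    print(f"p = {p:.2f}: mean wait = {expected_cycles_to_conception(p):.1f} cycles, "
          f"P(conceived within 12 cycles) = {prob_conceived_within(p, 12):.2f}")
```

Under these assumptions the mean wait falls from about 6.7 to 4 cycles as `p` rises from 0.15 to 0.25; real time-to-pregnancy data are more heterogeneous, so the sketch only shows the direction of the effect.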

However, given the lack of evidence for a link between WHR and fecundity in young, non-obese women, this hypothesis lacks strong empirical support.

#### Perception of the Characteristic Using WHR

A few studies find that a low WHR is associated with higher perceived fecundity (Singh, 1993b; Furnham et al., 2004; Sugiyama, 2004), but the results are unclear in the vast majority of cases (Singh, 1993b, 1994, 1995; Singh and Luis, 1995; Furnham et al., 1997, 2001, 2003, 2004, 2005, 2006; Tassinary and Hansen, 1998). I suggest that this lack of clarity is mainly due to the ambiguity of the questions asked of participants. The main issue is the absence of any indication of the time frame. For example, high parity (linked to a high WHR) is positively associated with past fecundity, but negatively associated with future fecundity [see section Cue of Parity (Number of Previous Pregnancies)]. Thus, in the absence of additional information, it is impossible to know whether participants are rating past, current or future fecundity. The answer probably depends on other cues provided in the survey, or varies from one participant to another, which could explain the inconclusive results. Future tests of perceived fecundity should specify the time frame.

## Cue of Quantity and Availability of "Reproductive Fat"

The idea that fat located around women's hips is qualitatively different from fat found in other body regions, and is used specifically for reproductive functions, has been present in the literature since 1993 (Singh, 1993b, see **Figure 2**). This hypothesis has been progressively enriched with the claim that a mother's WHR is linked to the development of her fetus and infant. It is present in 34% of the papers.

#### Correlation With WHR

WHR is, by construction, positively correlated with the quantity of fat situated at the waist level (abdominal fat), and negatively correlated with fat quantity located around the hip (gluteofemoral fat). There is evidence that gluteofemoral fat in women is specific to reproduction: the storing of gluteofemoral fat is high (compared to males and to other body parts) during human female development (Fredriks et al., 2005). Moreover, even with restricted food intake, gluteofemoral fat is metabolically protected from use until late pregnancy and lactation, when it is selectively mobilized (Rebuffé-Scrive et al., 1985; Lassek and Gaulin, 2008). The hypothesis derived from these observations is that the quantity of gluteofemoral fat would have an effect on the development of the fetus during pregnancy and of the infant through lactation.
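The "by construction" part of this argument follows directly from the definition of the ratio. Writing $W$ for waist and $H$ for hip circumference, and assuming for illustration that adding fat at a site enlarges the corresponding circumference while the other stays fixed:

$$
\mathrm{WHR} = \frac{W}{H}, \qquad
\frac{\partial\,\mathrm{WHR}}{\partial W} = \frac{1}{H} > 0, \qquad
\frac{\partial\,\mathrm{WHR}}{\partial H} = -\frac{W}{H^{2}} < 0.
$$

With hypothetical values $W = 70$ cm and $H = 100$ cm (WHR = 0.70), a 4 cm gain at the waist raises the ratio to 0.74, whereas a 4 cm loss at the hips raises it to about 0.73 (70/96). This only links WHR to circumferences; the claim that hip fat is functionally specialized for reproduction rests on the physiological evidence summarized in the paragraph above, not on the ratio itself.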

This reproductive fat appears to be of particular importance for brain development, as gluteofemoral fat is the main source of the long-chain polyunsaturated fatty acids that are critical for fetal and infant neural development. Additionally, abdominal fat seems to reduce the availability of these neurodevelopmental resources (abdominal fat decreases the amount of the enzyme Δ-5 desaturase, which is rate-limiting for the synthesis of long-chain polyunsaturated fatty acids; Lassek and Gaulin, 2008). Consequently, WHR is an indicator of the quantity and availability of the fatty acids needed for fetal and infant brain development. In support of this hypothesis, a study shows that women with lower WHRs and their children have significantly higher cognitive test scores (Lassek and Gaulin, 2008).

Moreover, one study finds that a low WHR correlates with higher birth weight (Pawłowski and Dunbar, 2005), but other studies found the opposite (Brown et al., 1996; Salem et al., 2012).

To conclude, a woman's WHR seems to be a promising indicator of future fetus and infant neural development (although further data from different countries are needed), and additional evidence is required to confirm the link between pre-pregnancy WHR and fetal growth.

#### Effect on the Man's Reproductive Success

Mating with a woman able to provide enough resources during the development of the fetus and infant increases the survival and quality of the descendants. Offspring with higher cognitive abilities are likely to have a better rate of survival and reproductive success than individuals who suffer from worse conditions during their brain development.

A low birthweight is associated with higher infant mortality (Chase, 1969; Behrman et al., 1982; McCormick, 1985) and negative outcomes later in life (Hackman et al., 1983; Baker et al., 2008). However, a low birthweight is also associated with variables which may have no effect on the father's reproductive success (e.g., because occurring late in life), and could even have a positive effect in some environments (Bateson et al., 2004), as a low birthweight seems to be associated with a faster life history strategy (Nettle, 2010).

In conclusion, choosing a mate with a lower WHR, if it is linked to greater resources for fetal and infant brain development (and perhaps general growth), will have a generally positive impact on a man's reproductive success. However, how the size of this effect depends on environmental conditions should be explored. For example, how does this trait affect the number of descendants in the next generation when conditions favor faster life history strategies?

#### Perception of the Characteristic Using WHR

To my knowledge, only one study explores the effect of WHR on the perceived quality of descendants (Andrews et al., 2017). Andrews et al. (2017) asked participants to rate female bodies on the following statements: "If this woman were to have a child, it would be healthy;" "If this woman were to have a child, it would make friends easily;" "If this woman were to have a child, it would be popular." They found a negative relationship between WHR and projected offspring quality, supporting the idea that women with low WHRs are expected to have higher-quality children than women with high WHRs (but, as is often the case with this type of question, it is difficult to tell whether we are measuring anything other than a halo effect).

## Cue of Health

One of the most cited hypotheses states that a low WHR is an indicator of women's good health (hypothesis present in 87% of the papers). The health conditions referred to in the literature on WHR and attractiveness are: cardiovascular diseases, hypertension, strokes, myocardial infarction, diabetes, gallbladder disease, kidney diseases, pancreatitis, lung function impairment, cretinism, psychiatric disorders, various cancers and preeclampsia.

#### Correlation With WHR

A high WHR is correlated with many health issues, a claim supported by abundant evidence (for reviews see Björntorp, 1987a,b, 1993; Manolopoulos et al., 2010). However, these findings are based on relatively old women or men (often aged 60 or older, almost never younger than 30), mostly suffering from some degree of obesity, raising the possibility that this relationship does not hold in evolutionarily relevant, reproductive-age populations (Lassek and Gaulin, 2018a).

#### Effect on the Man's Reproductive Success

The consequences for reproductive success of mating with a woman with a low WHR, because it is a cue of her health, are not straightforward. First, the cited health conditions are not contagious, so the survival of the woman's mate cannot be directly affected. Second, most of the chronic diseases associated with WHR are recent from an evolutionary point of view, and they are associated with present-day environments, lifestyles and diets (Eaton and Eaton, 1997; Groop, 2000). Third, even if we assume that these health issues were common in our evolutionary past, most of them appear late in life, after the end of women's reproductive life. Thus, most of the health issues linked to high WHRs are unlikely to affect the number of descendants of a woman's mate (Lassek and Gaulin, 2018a).

A few exceptions to this list of WHR-related health issues can be made, however. First, a high WHR early in pregnancy seems to be correlated with higher risks of preeclampsia (a condition which can be fatal to both the fetus and the mother; Yamamoto et al., 2001; Taebi et al., 2015). However, evidence is needed to see whether preeclampsia is predicted by WHR before pregnancy (when mate choice occurs). One paper indicates that a high WHR can be an indicator of cretinism (a syndrome often linked to infertility; Streeter and McBurney, 2003). However, WHR is probably not a very good cue for detecting cretinism, as this condition generates other physical modifications that are more easily noticeable than WHR (Chen and Hetzel, 2010). Another exception is polycystic ovarian syndrome. This condition can affect the fertility of young women, but only when the syndrome is associated with obesity (Pall et al., 2006; Pasquali et al., 2006). And again, the prevalence of this condition in our evolutionary past is unclear. Lastly, the term "health" can include malnutrition and parasites (although these are almost never referred to in the literature), which can affect fertility at any age and are not restricted to contemporary societies. These last two factors are discussed in the next sections of this paper (Cue of Parasite Load & Cue of Diet).

Health later in life could influence the survival and quality of descendants in another way, through maternal investment: long-term health and longevity increase the probability of having a living and healthy mother able to provide care for children and grandchildren (Sear et al., 2000). Thus, theoretically, WHR as a cue of health could have played a role in the selection of preferences for a low WHR. However, this hypothesis holds only if WHR at a younger age (at the time of mate choice) is a reliable predictor of health later in life, excluding diseases which are evolutionary novel. Longitudinal studies in non-WEIRD populations are needed to explore this possibility.

Alternatively, good health in old age could be related to genetic quality. Descendants of individuals with higher longevity could have better health, even at younger ages. In this case, men's preferences for a low WHR as a cue of health could evolve through indirect selection. Cross-generational studies are needed to test this good-genes hypothesis.

To conclude, in light of the present evidence, the "WHR as a cue of health" hypothesis is unlikely to be at the evolutionary origins of preferences for a low WHR in young women. However, this hypothesis could receive new theoretical support through the maternal and grandmaternal investment or the genetic quality hypotheses, but only if some of the above predictions (links between women's WHR at a young age and health at an old age, or health of the descendants, excluding evolutionarily novel diseases) are supported by evidence.

#### Perception of the Characteristic Using WHR

Participants are asked to rate the health of the stimuli in many studies (Singh, 1993a,b, 1994, 1995; Singh and Luis, 1995; Singh and Young, 1995; Furnham et al., 1997, 1998, 2001, 2002, 2003, 2004, 2005, 2006; Yu and Shepard, 1998; Wetsman and Marlowe, 1999; Henss, 2000; Marlowe and Wetsman, 2001; Sugiyama, 2004; Marlowe et al., 2005; Schützwohl, 2006; Tovée et al., 2007; Swami et al., 2009; Sorokowski et al., 2014). In general, a low WHR is associated with better perceived health. Interestingly, however, a few studies investigating non-WEIRD populations find a null or positive effect of WHR on perceived health (Yu and Shepard, 1998; Wetsman and Marlowe, 1999; Tovée et al., 2007; Sorokowski et al., 2014). This supports the idea that the association between high WHR and poor health might be valid in contemporary western countries only. Even if, as explained earlier, the perception of health using WHR is not a mandatory step to validate the hypothesis, more research (with different stimuli and questions) is needed to clarify this point.

It would also be interesting to see if young women's WHR is linked to their perceived future health and longevity. One could also explore if individuals have any idea of the kind of diseases associated with WHR.

To my knowledge, only one study explores the effect of WHR on the perceived quality of the descendants (Andrews et al., 2017, see section Cue of Quantity and Availability of "Reproductive Fat" above). They find a negative relationship between women's WHR and the projected offspring quality, in accordance with the hypothesis of WHR as a cue of genetic quality.

## Cue of Parasite Load

The idea that WHR could be a sign of parasitic infection is not recent (e.g., Furnham et al., 1998; see **Figure 2**) but it is rare in the literature (5% of the papers).

#### Correlation With WHR

Some parasites, including intestinal worms, can increase waist size through oedema while causing weight loss, which will result in a higher WHR (Cross, 1992; Kucik et al., 2004).
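Because WHR is waist circumference (W) divided by hip circumference (H), oedema that enlarges the waist and weight loss that shrinks the hips push the ratio upward from both sides. The circumferences below are hypothetical round numbers chosen only to illustrate the arithmetic, not measurements from the cited studies:

```latex
\mathrm{WHR} = \frac{W}{H}, \qquad
\frac{70\ \mathrm{cm}}{100\ \mathrm{cm}} = 0.70
\;\longrightarrow\;
\frac{75\ \mathrm{cm}}{97\ \mathrm{cm}} \approx 0.77
```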

#### Effect on the Man's Reproductive Success

Parasite load can affect survival and fertility. Moreover, most parasites are contagious, and mating with a woman carrying parasites increases the probability of being infected. As such, WHR as a cue of parasite load can have an effect on a man's health and survival, as well as an effect on the number, survival and quality of descendants he can sire with the infected woman. This effect remains to be quantified and will certainly vary according to the frequencies and types of parasites present in the environment.

WHR as a cue of parasite load is an interesting hypothesis, but it has been largely overlooked and evidence is consequently lacking.

#### Perception of the Characteristic Using WHR

There is no specific research on the perception of parasite load based on WHRs. However, many studies explore the effect of WHR on perceived general health (see section Cue of Health).

## Cue of Diet or Malnutrition

The hypothesis that WHR could be a cue of women's diet or malnutrition is found in 5% of the papers.

#### Correlation With WHR

One paper mentions that a high WHR could be a sign of kwashiorkor, a form of malnutrition (Streeter and McBurney, 2003). Indeed, WHR can increase in some cases of malnutrition because oedema enlarges the waist (Golden, 1982; Waterlow, 1984).

A diet rich in fibrous food can also increase waist size and thus WHR. For example, Marlowe et al. (2005) state that Hadza women may have a high WHR because "a larger gut is required to hold the amount of bulky, fibrous tubers in the Hadza diet."

#### Effect on the Man's Reproductive Success

Malnutrition increases the morbidity and mortality of a woman and her children, and might also decrease her fecundity (Mosley, 1977; Osteria, 1982; Hernández-Julián et al., 2014). Choosing a mate suffering from malnutrition will thus decrease one's reproductive success. The prevalence of malnutrition involving a high WHR during our evolutionary past should be explored, to establish whether it could have represented an evolutionary force for the preferences toward low WHRs.

Concerning diet, it is not clear whether a large waist reveals a good ability to digest fibrous food or a poor ability to assimilate it. If the latter is true, a higher WHR would be associated with fewer resources available for pregnancy and lactation, leading to lower survival and quality of descendants. The opposite would be true if a large waist is associated with a better ability to digest fibrous food.

The hypotheses of WHR as a cue of malnutrition or diet (or ability to digest some type of food) have been mainly ignored, and evidence is thus missing.

#### Perception of the Characteristic Using WHR

There is no specific research on the perception of diet or malnutrition based on WHR.

## Cue of Fetal Conditions

This hypothesis is mentioned only once in the literature (Singh, 1995). It proposes that the WHR of an adult woman could be an indication of the developmental conditions she experienced before birth.

#### Correlation With WHR

An association between a higher adult WHR and a lower birth weight, or a higher placental weight to birth weight ratio (an indicator of retarded fetal growth), has been found, but the sample in this study consisted only of men over 50 years old (Law et al., 1992). To my knowledge, there is no empirical evidence showing that young women's WHR is a reliable cue of their fetal development.

#### Effect on the Man's Reproductive Success

A low birthweight is associated with higher infant mortality (Chase, 1969; Behrman et al., 1982), but this cannot affect a mate's reproductive success, as mating occurs only after the woman has survived infancy. However, a low birthweight also has some negative outcomes later in life (Bateson et al., 2004), including reduced fertility (Hackman et al., 1983) and reduced longevity (Baker et al., 2008). The latter decreases the likelihood of having a living and healthy mother caring for her mate's descendants (see section Cue of Health).

On the other hand, as explained in section Cue of Quantity and Availability of "Reproductive Fat," a low birthweight is also associated with some advantages in harsh environments (Bateson et al., 2004), as well as a relatively early sexual maturation and reproduction (Nettle, 2010), which might increase the number of descendants for the potential mate.

To conclude, WHR as a cue of a woman's fetal condition could have, in theory, a negative, positive or null effect on her mate's reproductive success. Combined with the fact that the link between WHR and fetal conditions has been shown for older men only, this hypothesis lacks both empirical and theoretical support.

#### Perception of the Characteristic Using WHR

No study has tested the effect of WHR on perceived fetal conditions.

## Cue of Pelvis Size

This hypothesis, found in 16% of the papers in this literature, already appears in one of Singh's first papers (Singh, 1993b; see **Figure 2**), and states that WHR is a cue of the size (or shape) of a woman's pelvis.

#### Correlation With WHR

WHR is, by definition, linked to hip size, which is indicative of underlying pelvic skeletal morphology. It is unclear, however, how much of the variation in WHR is explained by pelvic size (it seems that most of the variance in WHR is due to fat storage on the hip and waist regions).

#### Effect on the Man's Reproductive Success

The size of the pelvis determines the size of the bony pelvic canal through which the fetus passes during delivery. As such, a wider pelvis reduces the risk of obstructed labor (Caldwell and Moloy, 1933; Stålberg et al., 2006). In the absence of healthcare, women who are unable to deliver their babies often die, along with their babies. Moreover, obstructed labor can lead to many long-term health issues for both mother and child, which can influence future survival and fertility. Thus, a woman's small pelvis will decrease the number of descendants a man can sire with her.

However, a large pelvis can be an obstacle to efficient locomotion (Leutenegger, 1974; Lovejoy, 1988; Ruff, 2017 but see Warrener et al., 2015). A woman with a lower ability to walk will have greater difficulty securing resources for her children, which will decrease their survival or quality. Altogether, stabilizing selection is expected to be operating on female hip size, as well as on men's preferences for this trait.

To conclude, the evolutionary costs and benefits of a wide pelvis seem more appropriate to explain the origin of the sexually dimorphic hip size via natural selection, than to explain men's preferences for a specific WHR. Female pelvic size and shape are the result of two conflicting evolutionary pressures: bipedal locomotion and parturition of a highly encephalized fetus (Leutenegger, 1974; Lovejoy, 1988; Rosenberg and Trevathan, 1995 but see Leong, 2006; Betti and Manica, 2018). It is possible that the link between pelvic size and childbirth and locomotion contributed to the selection of men's preference for an average hip size, but more research is needed to confirm its effect on men's reproductive success.

#### Perception of the Characteristic Using WHR

To my knowledge, nobody has tested the effect of WHR on perceived difficulties during childbirth, or on perceived locomotion.

## Cue of Center of Body Mass

This hypothesis, suggested by Pawlowski and Dunbar (2001) and Pawłowski and Grabarczyk (2003) and found in 6% of the papers in the literature, stipulates that WHR is linked to the position of the body's center of gravity, which influences bipedal stability.

#### Correlation With WHR

Everything else being equal, a lower WHR will lower the center of mass of the body. One study uses body measurements of young women to empirically estimate the correlation between WHR and the center of body mass (Pawłowski and Grabarczyk, 2003). However, the correlation is not very strong in their sample of students, and more data are required.
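The "everything else being equal" reasoning can be illustrated by treating the body as a few point masses and taking the mass-weighted mean height. This is a minimal sketch under loudly stated assumptions: every segment mass and height below is invented for illustration and is not taken from Pawłowski and Grabarczyk (2003). It only shows that, with total mass held fixed, moving fat from the waist to the hips and thighs lowers the center of mass.

```python
# Illustrative sketch only: all masses (kg) and heights (m) are hypothetical.

def center_of_mass_height(segments):
    """Return the mass-weighted mean height of a list of (mass, height) pairs."""
    total_mass = sum(mass for mass, _ in segments)
    return sum(mass * height for mass, height in segments) / total_mass

# Hypothetical base body: legs and trunk-plus-head as two point masses.
base = [(20.0, 0.45), (25.0, 1.25)]

# The same 6 kg of fat stored either around the waist or on the hips/thighs.
waist_fat = base + [(6.0, 1.05)]  # higher WHR scenario
hip_fat = base + [(6.0, 0.85)]    # lower WHR scenario

print(f"fat at waist: {center_of_mass_height(waist_fat):.3f} m")
print(f"fat at hips:  {center_of_mass_height(hip_fat):.3f} m")
```

Real bodies are continuous and the actual effect size depends on anthropometric data, which is precisely what the text says is still lacking; the point-mass model is chosen here only because it makes the direction of the effect transparent.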

#### Effect on the Man's Reproductive Success

In advanced pregnancy and during lactation, when the infant is being carried, a bipedal female has to contend with a substantial increase in the anterior load above the center of gravity (Pawłowski, 2001). Fat deposits in the buttocks and thighs may prevent the center of gravity from moving upwards and forwards, and facilitate walking and foraging during pregnancy and lactation. Choosing a mate with a lower center of gravity could increase the survival of the fetus and infants a man would sire with this woman, as she would be less likely to fall and injure the fetus, the infant or herself, and she would be more successful in foraging or escaping predation during these critical periods. A lower center of gravity would also mean a lower energetic cost to maintain balance, and thus an increase in resources available to be directed toward the descendants. Thus, a woman's center of gravity could have an effect on her mate's reproductive success (Pawlowski and Dunbar, 2001; Pawlowski, 2003).

However, as with the pelvic size argument, this hypothesis seems more suitable to explain the origin of dimorphic body shapes in the human species than to explain men's preferences.

#### Perception of the Characteristic Using WHR

To my knowledge, there has been no research concerning WHR and perceived center of body mass, or perceived walking abilities during pregnancy and lactation.

## Cue of Ability to Cope With Stress

The link between stress and women's WHR has been discussed in the literature since 1995 (Singh, 1995; see **Figure 2**), but it appears in only 5% of the papers. Depending on the author, a high WHR could be a sign of exogenous stress, a cue of a poor ability to cope with stress, or a cue of an effective response to stress.

#### Correlation With WHR

Compared to women with low WHRs, women with high WHRs report more chronic stress and have more psychological and psychiatric issues (Björntorp, 1987b, 1993). According to Björntorp, a high WHR might be interpreted as a sign of an inability to cope with environmental stress. One experiment shows that women with high WHRs evaluated laboratory challenges as more threatening, performed more poorly on them, and reported more chronic stress (Epel et al., 2000).

However, Cashdan draws an opposite conclusion from the same observations (Cashdan, 2008). Cortisol (the levels of which are associated with WHR) enables the mind and body to respond effectively to stress, by shifting energy substrates from storage sites to the bloodstream and by increasing blood pressure and cardiac output. As part of this response, cortisol increases WHR by increasing visceral fat. Conversely, stress-induced cortisol secretion is greater among women with more central fat (Epel et al., 2000).

To conclude, WHR seems to be related to stress responses, but it is not clear whether a low WHR is a cue of a good or a poor ability to cope with environmental stress. The stress responses in women with high WHRs may be maladaptive in most WEIRD populations, yet they could be adaptive where conditions are extreme or where stress is episodic rather than constant (Cashdan, 2008).

#### Effect on the Man's Reproductive Success

If a high WHR is a sign of inadequate coping with stress, women with a high WHR may bear descendants of lower quality because they may be less able to secure resources or provide care for them. However, the opposite would be true if a high WHR is a sign of a better ability to respond to stress.

Maternal stress during fetal growth can lead to a lower birthweight. Stress also has epigenetic effects on offspring's life history trajectories and health (Worthman and Kuzara, 2005). However, according to the adaptationist life history perspective, these effects could be associated with a phenotype adapted to the environment (Bateson et al., 2004; Worthman and Kuzara, 2005; Nettle, 2010, see section Cue of Quantity and Availability of "Reproductive Fat").

To conclude, it is unclear if choosing a woman with a lower WHR, as a cue of stress responses, would have a positive, neutral or negative impact on a man's reproductive success. The answer will probably differ according to the environment, and could lead to a preference for a relatively high WHR in some cases (Cashdan, 2008).

Overall, this hypothesis lacks clarity. Nevertheless, the link between stress and WHR is a valuable explanation of the variability of women's WHRs (Cashdan, 2008).

#### Perception of the Characteristic Using WHR

To my knowledge, the effect of WHR on perceived stress, or ability to cope with stress, has not been investigated.

## Cue of Ability to Acquire Resources

It has been suggested that a preference for a relatively high WHR could be adaptive in some environments because the hormonal profile associated with high WHRs (high androgen and cortisol, low estrogen) may favor success in resource competition, particularly under stressful and difficult circumstances (Cashdan, 2008). This hypothesis is mentioned in 10% of the papers.

#### Correlation With WHR

High androgen levels in women are associated both with higher WHR (Evans et al., 1983; Elbers et al., 1997; Santoro et al., 2005; van Anders and Hampson, 2005) and with greater assertiveness, competitiveness and aggressiveness in women (Purifoy and Koopmans, 1979; Baucom et al., 1985; Dabbs et al., 1988; Cashdan, 1995; Udry et al., 1995; Harris et al., 1996; Dabbs and Hargrove, 1997; Grant and France, 2001; von der Pahlen et al., 2002). Androgens also increase muscle mass and physical strength (Bhasin et al., 1996). Unfortunately, these studies have been conducted in western countries only, limiting the generalization of the results to other populations.

Androgens also shape features other than WHR (including facial traits, body features and voice; Abitbol et al., 1999; Rickenlund et al., 2003; Lefevre et al., 2013; Whitehouse et al., 2015), and individuals can rely on other cues to assess androgen levels. More importantly, men could use more direct cues of the ability to access resources (e.g., behavior, physical accomplishments or quantity of resources) and may not need indirect cues.

#### Effect on the Man's Reproductive Success

According to this hypothesis, having a relatively high WHR can increase a woman's survival and reproductive success, because she will be more able to work hard to support herself and her children, compete directly for resources for them, and cope with resource scarcity. Most of these effects will translate into positive effects on her mate's reproductive success.

In this case, the optimal female WHR (for herself and her mate) is likely to vary with the circumstances. In societies where women are expected to provide most of the food, through hard physical work and competition, the balance should be tipped toward a hormonal profile consistent with a higher WHR. In more benign conditions, where women get most of their resources from investing men, a hormonal profile consistent with a low WHR might be more adaptive (Cashdan, 2008).

Overall, as proposed by Cashdan herself, this hypothesis is more likely to explain the variations in women's WHRs (between environments and within lifetime) than to account for men's preferences (Cashdan, 2008). However, we cannot exclude that the link between WHR and women's ability to acquire resources might play a role in the variations observed in the exact value of the preferred WHR between populations.

#### Perception of the Characteristic Using WHR

One study investigating perceived aggressiveness finds no effect of WHR (Singh, 1994). Another study finds no effect of WHR on factors linked to perceived ambition, independence, self-confidence and success (Henss, 1995). Two studies find that figures with low WHRs are rated as more dominant than figures with high WHRs (Henss, 2000; Buunk and Dijkstra, 2005), which goes in the opposite direction of what is expected according to the present hypothesis. However, these studies are designed to investigate the competition for a mate, not the competition for resources. Studies exploring the effect of WHR on the perceived ability to acquire resources (and not mates) are needed.

## Cue of Sex Ratio and Level of Testosterone in Descendants

This hypothesis includes two different sub-hypotheses. The first one, suggested by Manning et al. (1996), stipulates that women with a high WHR have more sons than women with a low WHR, controlling for the total number of children. The second hypothesis states that women with a high WHR have children exhibiting higher levels of testosterone. Pooled together, these two hypotheses are found in 4% of the literature (see **Figure 2**).

#### Correlation With WHR

A few studies show that a woman's WHR is positively correlated with her number of sons (Manning et al., 1996, 1999; Singh and Zambarano, 1997). However, these studies measure women who already have children and correlate WHR with the proportion of existing sons, and it is possible that having sons results in a greater increase in WHR than having daughters does. A more recent study looking at pre-conception WHR and offspring sex finds no significant correlation (Tovée et al., 2001). Thus, there is not enough evidence to support the claim that a high WHR predicts having more sons in the future.

Manning also found that women with high WHRs tended to have children with low 2D:4D ratios (Manning et al., 1999). A low 2D:4D ratio is supposed to be correlated with high testosterone levels, and the authors conclude that women with high WHRs have more masculine children. However, more recent evidence suggests that the 2D:4D ratio is not a reliable indicator of testosterone levels (Hollier et al., 2015; Whitehouse et al., 2015; Apicella et al., 2016).

In conclusion, the idea that a woman with a high WHR will produce more sons or more masculine children is not supported by empirical data.

#### Effect on the Man's Reproductive Success

Several theories postulate that the sex of the descendants can influence an individual's reproductive success (Hiraiwa-Hasegawa, 1993; Hiraishi et al., 2016). The advantage of sons over daughters depends on various characteristics of both the parents (condition or rank) and the population (including dispersal patterns, inheritance of rank or resources, and degrees of local resource competition). In some cases, one sex has a greater chance of survival and a higher potential reproductive success than the other.

In the hypothetical case where high-WHR women would have children with high testosterone levels, choosing a mate with a relatively high WHR could represent an advantage in some environments. High testosterone is related to various characteristics (from muscular strength to competitive behavior; Bhasin et al., 1996; Apicella et al., 2011, 2015; Schipper, 2014), which could lead to a higher survival and a higher reproductive success.

To conclude, choosing a mate likely to produce more sons or more masculine children could increase the reproductive success of an individual, but this will depend on the environment and on the individuals' condition. More importantly, there is no solid evidence that WHR is an indicator of the sex ratio or masculinity of future descendants. This hypothesis is therefore not supported by empirical evidence.

#### Perception of the Characteristic Using WHR

The effect of women's WHR on the perception of their children's sex ratio or masculinity has never been investigated.

## Cue of Sexual or Maternal Behavior

Interestingly, the idea that a woman's WHR is linked to her behavior and personality, as perceived by others, is found in many of the pioneering papers of this literature (Singh, 1993a,b, 1994; Henss, 1995; Singh and Luis, 1995; Singh and Young, 1995; Furnham et al., 1998, 2004, 2005; Sugiyama, 2004). However, clear mentions of the hypothesis that WHR could be used as a predictor of past and future behavior by men to choose a mate are rare (2% of the papers) and recent (see **Figure 2**).

#### Correlation With WHR

Compared to women with a high WHR, women with a low WHR tend to have a less restricted sociosexuality, sexual intercourse at an earlier age, more sexual partners, and more extrapair copulations (Mikach and Bailey, 1999; Hughes and Gallup, 2003; Fisher et al., 2016). The question remains whether this correlation is due to different preferences and behaviors expressed by women (with hormonal levels as a potential proximal mechanism), or if it only reflects the different opportunities linked to different levels of physical attractiveness. In the latter case, this correlation cannot explain the origin of male preferences for a certain WHR [but it could potentially explain its maintenance, see section Cue of Sexy Daughters (Fisherian Runaway Model)].

Estrogen, testosterone and cortisol levels, all influencing WHR, are linked to maternal investment in many species, including humans (Fleming et al., 1997; Bardi et al., 2001; Dwyer et al., 2004). Thus, WHR could be a cue of women's maternal tendencies. However, there is no direct evidence of a correlation between a woman's WHR before pregnancy and her future maternal investment. Only a few studies provide some indirect evidence for this hypothesis, by showing a correlation between hormonal levels and reported maternal tendencies (Deady and Law Smith, 2006; Deady et al., 2006; Law Smith et al., 2012).

To conclude, more direct evidence is needed to validate the links and mechanisms between women's WHR and their behavior.

#### Effect on the Man's Reproductive Success

The effect of women's sexual behavior on their mates' reproductive success is double-edged. Women with unrestricted sociosexual orientations, relative to those with more restricted orientations, are more likely to engage in sex at an earlier point in their relationships and have more sexual partners (Simpson and Gangestad, 1991). Thus, being attracted to women with a less restricted sociosexuality might increase the man's chances of mating. On the other hand, women with unrestricted sociosexuality are also more willing to engage in and report higher levels of extradyadic activity (Seal et al., 1994; Barta and Kiene, 2005; Rodrigues et al., 2017; Weiser et al., 2018), therefore increasing the risk of extra-pair copulation costs for their mate (see section Cue of Current Pregnancy). However, these results need to be replicated in non-WEIRD populations before drawing any strong conclusions.

Mating with a woman with a less restricted sociosexuality also increases the risks of being contaminated by sexually transmitted diseases (Hall, 2012). Women with unrestricted sociosexual orientations report more casual sex encounters and multiple and concurrent sexual partners, factors known to increase the risk for exposure to sexually transmitted diseases (Seal and Agostinelli, 1994; Hoyle et al., 2000).

In sum, the effects of a less restricted sociosexuality on the mate's reproductive success are potentially positive for a short-term relationship if the occurrence of sexually transmitted diseases in the population is low, and probably null or negative for a long-term relationship.

Higher maternal investment can increase the survival and quality of the descendants. However, as stated earlier, to this date there is no direct empirical evidence supporting pre-pregnancy WHR as a cue of future maternal investment.

Overall, this hypothesis has not been explored in many papers, and lacks empirical and theoretical support.

#### Perception of the Characteristic Using WHR

Several of the early studies investigate the effect of WHR on perceived behavioral and personality traits, but these papers do not include any theoretical background regarding WHR as a potential cue of behavior or personality (Singh, 1993a,b, 1994; Henss, 1995; Singh and Luis, 1995; Singh and Young, 1995; Furnham et al., 1998, 2004, 2005; Sugiyama, 2004). The absence of predictions in these papers is problematic, as the questions asked to the participants are sometimes unclear, and the authors often pooled together items which are linked to different hypotheses, making it impossible to properly test any single hypothesis.

Some authors explore the effect of WHR on perceived traits like "desire for children," "likes children," "good parent," or "nurturing" (Singh, 1993a,b, 1994; Henss, 1995; Singh and Luis, 1995; Furnham et al., 2005), but the results are inconsistent. Thus, there is no good evidence that WHR is perceived as a cue of maternal behavior, but more appropriate tests with clear predictions are needed.

In a few studies, participants rated figures with high WHRs as more "faithful" (Singh, 1994; Singh and Young, 1995). Other studies find that figures with a low WHR are perceived as more "flirtatious" (Furnham et al., 2005). These results are in accordance with the hypothesis that WHR serves as a cue of sexual behavior.

## Cue of Sexy Daughters (Fisherian Runaway Model)

Fisher famously described a process whereby a small initial preference ultimately leads to extreme traits and preferences through "runaway" selection (Fisher, 1930). If a particular trait in one sex is preferred in mates, then genes predisposing a stronger preference for the trait could spread as they become linked with genes predisposing the preferred trait.

This hypothesis is not specific to WHR. In fact, the runaway process is almost never applied to men's preferences for WHR. Yet, in one paper, Singh explains that WHR is heritable and "offspring of women with lower, more feminine, WHR would have inherited good health and would have been physically attractive to potential mates" (Singh and Randall, 2007). Tassinary also refers to the runaway model, especially to explain why very small WHRs could theoretically be attractive to men (Tassinary and Hansen, 1998).

#### Correlation With WHR

For this hypothesis to be valid, WHR needs to be genetically heritable, and there is some evidence that this is the case (Donahue et al., 1992; Bouchard et al., 1996; Schousboe et al., 2004). According to this hypothesis, daughters of women with a low WHR will have a lower WHR and thus will be more attractive. The hypothesis also requires some heritability of preferences for a low WHR. However, this heritability may cease to be observed once the preference invades the population (since there will not be enough variance in the preferences left). Importantly, this hypothesis does not require any link between WHR and any physiological quality.

#### Effect on the Man's Reproductive Success

According to this hypothesis, a man mating with a woman with a low WHR will have more attractive daughters than if he mates with a woman with a high WHR. These attractive daughters will have a higher mating and thus reproductive success in the next generation in a population with men attracted by low WHRs, which will have a positive impact on their father's reproductive success. The size of the effect of women's WHR on their daughters' reproductive success remains to be identified. Indirect evidence can be found in studies showing that a low WHR is linked to a higher number of sexual partners, as a proxy for mating success (Mikach and Bailey, 1999; Hughes and Gallup, 2003).

It is important to point out that this hypothesis differs slightly from the others in this review because it only involves indirect selection on men's preferences. A man's mating preference is favored by direct selection if it increases his own lifetime reproductive output, and by indirect selection if his preference increases the reproductive output of his offspring. Some authors have shown that indirect selection on mate choice via the sexual attractiveness of offspring is a weak evolutionary force relative to direct selection (Kirkpatrick and Barton, 1997). However, such statements of relative strength should not be taken to imply that indirect selection is of little evolutionary importance (Kokko et al., 2003). This would be true only if direct and indirect selection acted in opposite directions, which does not seem to be the case for men's preference for WHR (most of the hypotheses point toward a preference for a low WHR). This hypothesis can then be seen as an additional force reinforcing direct selection on men's preferences.

Another possible limitation of this hypothesis is the indirect cost of sexual antagonism. If WHR is genetically heritable in both sexes, men will have to trade off higher sexiness in daughters against lower-quality sons when choosing a mate, as optimal WHR values differ between men and women (Rice and Chippindale, 2001). Measures of the heritability of WHR in both sexes are necessary to determine whether this indirect cost exists.

#### Perception of the Characteristic Using WHR

To my knowledge, there is no study investigating the effect of WHR on perceived attractiveness of a woman's future descendants. The only questions somehow related to this issue are asked by Andrews et al. (2017): "If this woman were to have a child, it would make friends easily;" "If this woman were to have a child, it would be popular." They find that the ratings for these items are higher for women with low WHRs. However, these questions were not specifically designed to explore this particular hypothesis.

TABLE 1 | Proposition of classification of the hypotheses found in the literature, according to their theoretical plausibility.


Some hypotheses are unlikely to explain men's preferences toward a certain WHR (column 1). Some hypotheses are likely to explain the emergence of men's preferences during our evolutionary history (column 3), while others better explain the maintenance of these preferences once they had emerged in the population (column 4). Some hypotheses are good explanations for the selection of a certain WHR in human populations, but do not necessarily lead to a selection of men's preferences (column 5). Finally, some hypotheses need further investigation before their plausibility can be properly estimated (column 2). Neither the categories nor the hypotheses are mutually exclusive.

#### Summary of Hypotheses Plausibility

The conclusions of the theoretical analyses of each hypothesis presented in this paper are summarized in **Table 1**. This classification is not definitive and is expected to change as new evidence emerges.

#### DISCUSSION

In this paper, I review the hypotheses explaining why men's preferences for a certain WHR in women may have been selected in the human species. These hypotheses are numerous, and overall, there is some solid theoretical and empirical support in favor of a selection of men's preferences for a mate with a relatively low WHR (with some variations on the exact value according to the population and the environment). However, many of the papers on this topic do not properly develop the theoretical framework, and some interesting hypotheses have been overlooked, while some of the most popular hypotheses require stronger theoretical or empirical support.

To show that men's preference for a certain WHR is an adaptation, it is necessary to demonstrate that a man choosing a mate with a certain WHR will benefit from an increase in reproductive success. Thus, it is crucial to describe the consequences of the preference and show that it can have an impact on the quantity or quality of men's descendants. Importantly, the ultimate focus here is the reproductive success of the individual who is expressing the preference, not of the woman displaying a certain WHR.

WHR as a cue of women's health is one of the most cited hypotheses, appearing in 87% of the papers examined in this review, although health issues linked to WHR have a very limited impact on the reproductive success of women's mates. WHR as a cue of women's fecundity is a well-known hypothesis but is not supported by empirical evidence among young, non-obese women (the population of interest for the hypothesis). On the other hand, two hypotheses that are particularly good candidates (WHR as a cue of current pregnancy and of parity) are too often forgotten in the literature. Some promising hypotheses have been largely overlooked (e.g., WHR as a cue of parasite load, of diet, or of "sexy daughters"). The hypothesis of WHR as a cue of the quantity and availability of "reproductive fat" has received decent empirical and theoretical support and is now generally accepted in the field. WHR as a cue of sex ratio and of testosterone levels in descendants is not supported by empirical evidence and has therefore never taken hold in the field. Two other interesting hypotheses, WHR as a cue of pelvis size and of center of body mass, are better suited to explaining the presence of a sexually dimorphic WHR in our species through natural selection than to explaining men's preferences. The preference for slightly higher WHRs in some populations can be explained by WHR as a cue of the ability to acquire resources, although this hypothesis primarily provides an excellent account of the variability of women's WHRs. Crucially, the numerous hypotheses reviewed in this paper are not mutually exclusive. The most likely scenario incorporates several of them, operating at different periods of our evolutionary history.

To summarize, WHR is a powerful measure (as shown by the numerous physical and physiological characteristics correlated with it), but it may not be as "magical" as often assumed, and not all the features correlated with WHR are linked to mate value. Most of the mate value-related information provided by WHR is relatively basic (sex, age, number of children, current pregnancy). Nevertheless, WHR is a useful and practical visual trait that aggregates information which a potential mate might not even know is associated with an increase in his own reproductive success.

Non-adaptive explanations for men's preferences toward a certain WHR are not the focus of this paper but they are not necessarily refuted. For example, some authors argue that low WHR preferences may be the result of a generic psychological mechanism of enhanced responding to exaggerated features, or "supernormal" stimuli (Gray et al., 2003). According to this hypothesis, if men view a low WHR as "typical" of female bodies, this could lead men to prefer female WHRs that are even lower than normally attainable (Gray et al., 2003). However, this hypothesis still requires that men use WHR as a cue of biological sex (a hypothesis reviewed in this paper). Men's preferences for a certain WHR can also be explained by sociocultural theories. For example, it is argued that cross-cultural variations in men's preferences for women's WHR could be based on the gender roles occupied by men and women in different cultural settings (Swami et al., 2006a,b). But this hypothesis still requires an explanation regarding the origin of the association between WHR and a certain gender role. Finally, as mentioned earlier, some authors have argued that WHR might not be the best cue of a woman's mate value and that its correlation with attractiveness might be an artifact of men's preferences for another physical characteristic (Tassinary and Hansen, 1998; Tovée et al., 1999; Furnham et al., 2005; Cornelissen et al., 2009a,b; Brooks et al., 2010, 2015; Lassek and Gaulin, 2016). A similar systematic review focusing on a different measure instead of WHR might thus reveal a different picture than the one depicted here (although a few hypotheses concerning men's preferences for features correlated with WHR are incidentally already included in the present review).

The picture presented by this review calls for more theoretical rigor and precision (and, to be clear, I include myself in this criticism). Confusion about the theoretical framework can lead to inadequate predictions and suboptimal experimental designs. For example, the stimuli created to test the "WHR as a cue of current pregnancy" hypothesis should differ from those used to test the "WHR as a cue of (reproductive) age" hypothesis in terms of WHR range, WHR manipulation (hip or waist changes), and associations with other visual cues (e.g., age of the face). The questions asked of participants to explore the perception of characteristics induced by WHR are often too vague or inadequate, perhaps because of ambiguity in the underlying predictions. The imprecision of previously tested predictions may have contributed to the increasing number of studies that find null results when testing evolutionary hypotheses for human mating preferences. Null results are not an issue per se, but the repeated failure to validate unsound predictions may incorrectly lead to the rejection of evolutionary explanations of human mate preferences, undermining well-founded hypotheses by discrediting the general research paradigm. Finally, the theoretical framework adopted will inherently drive the search for the empirical evidence needed to support a hypothesis. Thus, some of the hypotheses presented here might have received more empirical support if the theory had been clearer. For example, most of the evidence used to support the "WHR as a cue of health" hypothesis is not theoretically relevant (health issues in old age or evolutionarily recent diseases), perhaps partly because the theory is underspecified. With new and more precise predictions, as outlined in this review, additional evidence could be uncovered through a deeper exploration of the relevant literatures.

This review has several limitations and should be regarded as a first step toward a deeper understanding of this research question, and as a source of ideas for further testing the evolutionary origins of mate preferences. I focus only on published research, as the aim is to inventory the hypotheses accepted by the academic community, as well as their recognized justifications. However, an examination of unpublished data would be an important next step, in particular to provide additional empirical support for the (im)plausibility of the different hypotheses. Moreover, the tentative classification of the hypotheses presented in **Table 1** is based on verbal theorizing, but formal models might provide a more objective way to assess the likelihood of the different scenarios. Quantitative data on the correlations between WHR and the hypothesized traits need to be gathered, as they would help specify the parameters of such models. Additional layers could be explored to further scrutinize the plausibility of the hypotheses. For example, scenarios in which a positive link between women's WHR and their offspring's reproductive success is a necessary condition would require stronger selection than scenarios based on a higher number and survival of offspring.

In this review, I focus on the literature regarding men's adaptive preferences for women's WHR, but the criticism presented here could be applied to other research topics in evolutionary psychology. It is crucial to establish the evolutionary plausibility of existing hypotheses. Otherwise, we risk clinging too long to implausible (although often parsimonious) explanations, which can harm the credibility of our field in the long run. Since the replication crisis, much effort has gone into improving our methodological practices, which is extremely encouraging. I hope that this aspiration toward more rigor will also be reflected in how we approach the theoretical foundations of our research.

## AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and has approved it for publication.

## ACKNOWLEDGMENTS

I thank Vittorio Merola, as well as the three reviewers, for their meticulous and helpful comments.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.01221/full#supplementary-material

## REFERENCES


is associated with salivary testosterone levels. Biol. Psychol. 71, 29–32. doi: 10.1016/j.biopsycho.2005.01.009


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Bovet. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Skin Color Preferences in a Malaysian Chinese Population

*Kok Wei Tan<sup>1,2</sup>\* and Ian D. Stephen<sup>3,4,5</sup>*

*<sup>1</sup> School of Psychology and Clinical Language Sciences, University of Reading Malaysia, Iskandar Puteri, Malaysia, <sup>2</sup> School of Psychology, University of Nottingham Malaysia, Semenyih, Malaysia, <sup>3</sup> Department of Psychology, Macquarie University, North Ryde, NSW, Australia, <sup>4</sup> ARC Centre of Excellence in Cognition and its Disorders, Macquarie University, North Ryde, NSW, Australia, <sup>5</sup> Perception in Action Research Centre, Macquarie University, North Ryde, NSW, Australia*

Facial skin color influences the perceived health and attractiveness of Caucasian faces, and has been proposed as a valid cue to aspects of physiological health. Similar skin color preferences have previously been found in African participants, while different preferences have been found among mainland Chinese participants. Here, we asked Malaysian Chinese participants (ethnic Chinese living in an Asian country with high levels of exposure to Western culture) to manipulate the skin color of Malaysian Chinese, Caucasian, and African faces to make them "look as healthy as possible." To optimize healthy appearance, participants increased skin yellowness to a greater extent than skin redness. The slight reduction in skin lightness they chose was not statistically significant after correction for multiple comparisons. While broadly in line with the preferences of Caucasian and African participants in previous studies, this pattern differs from that of mainland Chinese participants. Culture may play a role in skin color preferences, though methodological differences between studies mean that further research is needed to identify the cause of these differences.

Keywords: face perception, skin color, perceived health, Asian, culture difference

## INTRODUCTION

Since the 1990s, evolutionary psychologists have theorized that attractiveness and health judgments serve as a mechanism for identifying a healthy, fertile mate (for a review, see Stephen and Tan, 2015). While some facial cues, such as symmetry and averageness, may be perceived as attractive universally across populations (for reviews, see Rhodes, 2006; Little et al., 2011), other cues, such as body size, appear to vary cross-culturally, due to either cultural or ecological differences between populations (Tovée et al., 2006, 2007).

Facial skin color has been shown to influence perceived attractiveness and health (Stephen et al., 2009b, 2012; Whitehead et al., 2012b; Pezdirc et al., 2017), with increased lightness (represented by the L\* dimension in CIELab color space), redness (a\*), and yellowness (b\*) perceived as healthier. The preference for skin redness may be related to increased blood oxygenation (Stephen et al., 2009a), which serves as an indicator of aerobic fitness and fertility (Armstrong and Welsman, 2001; Charkoudian, 2003; Barelli et al., 2007). Skin luminance is determined by the concentration of melanin (Edwards and Duntley, 1939), which is associated with health benefits such as photoprotection and the synthesis of vitamin D (Jablonski and Chaplin, 2000; Kourosh et al., 2010). Skin yellowness is influenced by levels of yellow-red carotenoid pigments in the skin. These carotenoid pigments are obtained from fruit and vegetables in the diet and then deposited in the top layer of the skin, the *stratum corneum* (Alaluf et al., 2002), and research in Caucasian, Asian, and African populations has linked fruit and vegetable consumption with increased skin yellowness (Stephen et al., 2011; Whitehead et al., 2012a; Tan et al., 2015). Carotenoids are thought to be beneficial for human immunity, visual acuity, and photoprotection of the skin (Hughes, 1999; Samimi, 2005; Rao and Rao, 2007; Darvin et al., 2011).

#### *Edited by:*

*T. Joel Wade, Bucknell University, United States*

#### *Reviewed by:*

*David Perrett, University of St Andrews, United Kingdom; Xue Lei, University of St Andrews, United Kingdom, in collaboration with reviewer DP; Justin Kyle Mogilski, University of South Carolina Salkehatchie, United States*

> *\*Correspondence: Kok Wei Tan t.kokwei@reading.edu.my*

#### *Specialty section:*

*This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology*

*Received: 30 November 2018 Accepted: 24 May 2019 Published: 19 June 2019*

#### *Citation:*

*Tan KW and Stephen ID (2019) Skin Color Preferences in a Malaysian Chinese Population. Front. Psychol. 10:1352. doi: 10.3389/fpsyg.2019.01352*

Studies in Caucasian and Asian samples have shown that skin coloration associated with increased intake of fruit and vegetables is perceived as healthy and attractive (Stephen et al., 2011; Whitehead et al., 2012b,c; Lefevre et al., 2013; Lefevre and Perrett, 2015; Pezdirc et al., 2017; Tan et al., 2017), though some studies that did not control for facial expression (Appleton et al., 2018) or color calibration of images and monitors (Jones, 2018) have failed to replicate these results. While similar patterns of preferences for skin lightness, redness, and yellowness coloration have been reported for Caucasian and African samples (Stephen et al., 2009a, 2011; Coetzee et al., 2012), a recent study suggested that mainland Chinese participants show a weaker preference for increased redness and a stronger preference for increased lightness than Caucasian participants and, in contrast to Caucasian and African samples, prefer decreased yellowness (Han et al., 2018).

The discrepancy between mainland Chinese and other samples may be explained by methodological differences: Han et al. (2018) used a forced-choice paradigm that did not allow them to determine the amount of color change perceived as healthiest. Alternatively, the discrepancy may be attributable to cultural differences in skin color preference. Such cross-cultural differences have been found for other aspects of attractiveness preferences, including preferences for female body size (Swami and Tovée, 2005; Tovée et al., 2006) and male facial masculinity (DeBruine et al., 2010; Brooks et al., 2011), and may relate to cultural differences in cognitive processes (Blais et al., 2008; Tan et al., 2012).

Malaysian Chinese, while ethnically Han Chinese, live in a Southeast Asian country that is strongly multicultural (61.7% of the population are ethnic Malay or indigenous, 20.8% Chinese, 6.2% Indian, 0.9% other, and 10.4% noncitizens; CIA, 2018) and influenced by Western culture (86% of movies shown in Malaysian cinemas are Western, 14% local, compared with 56% local movies in China; Epstein, 2011). Previous studies of face perception have found Malaysian Chinese participants to show patterns intermediate between Western and mainland Chinese samples (Tan et al., 2012), and exposure to Western culture has been shown to change individuals' face recognition strategies (Sangrigoli et al., 2005; Hancock and Rhodes, 2008) and attractiveness preferences (Tovée et al., 2006; Boothroyd et al., 2016). Malaysian Chinese participants have been found to show reduced (though still positive) preference for carotenoid coloration, which contains a large b\* component, compared to Western participants in an experimental study (Tan et al., 2017), and to show preferences for lighter and yellower, but not redder, skin in a correlational design (Tan et al., 2018). However, it is not yet known whether Malaysian Chinese show preferences for redness (a\*), yellowness (b\*), and lightness (L\*) in line with Western participants.

Here, we examine Malaysian Chinese participants' preferences for facial skin color by allowing them to manipulate facial skin lightness (L\*), redness (a\*), and yellowness (b\*) separately to optimize the healthy appearance of Asian, Caucasian, and African faces. In line with previous studies (Stephen et al., 2009a,b), we predicted that participants would increase skin redness, yellowness, and luminance to enhance the healthy appearance of faces (Lefevre and Perrett, 2015; Pezdirc et al., 2017; Tan et al., 2018). Previous studies have shown reduced preferences for yellowness and redness, and an increased preference for lightness, in Asian faces (Tan et al., 2017; Han et al., 2018) but not in African faces (Stephen et al., 2011), compared to Caucasian faces, as perceived by own-race observers. However, the influence of color on perceptions of attractiveness has also been shown to be reduced in other-race faces, possibly because of unfamiliarity effects (Stephen et al., 2012). We predicted a similar pattern of preferences for Malaysian Chinese participants observing Asian and Caucasian faces, and reduced effects for the less familiar African faces.

## MATERIALS AND METHODS

#### Stimuli

Twelve facial photographs of three different ethnicities (four Caucasian, four African, and four East Asian) were obtained from Stephen et al. (2017). These photographs were taken under controlled conditions and color calibrated using Psychomorph (Tiddeman et al., 2001). Hair was held back from the face with a black head band, and participants were asked to pose with a neutral expression while holding a Munsell N5 painted board over their shoulders to obscure clothing.

Matlab was used to produce masks with even coloration representing the skin areas of the faces, with a Gaussian blur at the edges. One mask represented the average face color +8 units of a\* (increased redness) and another the average face color −8 units of a\* (decreased redness). Color changes are described using the CIE L\*a\*b\* color space, in which colors are described along L\* [values between 0 (darkest) and 100 (lightest)], a\* [values between −110 (greenest) and 110 (reddest)], and b\* [values between −110 (bluest) and 110 (yellowest)]. This space is designed to reflect the way in which the human visual system processes color information and to be perceptually uniform, so that a change of 1 unit in one dimension is perceptually equivalent in magnitude to a change of 1 unit in another dimension (Martinkauppi, 2002). The Euclidean distance (ΔE) between two points in CIE L\*a\*b\* space mirrors color differences as perceived by human vision (Wyszecki and Stiles, 1982). The facial redness of each of the 12 faces was transformed by the difference in color between the pair of masks, in a series of 13 steps. This produced 13 frames, numbered 0 to 12: frame 0 had skin redness reduced by 8 units of a\*, redness then increased incrementally so that frame 6 was the original image, and frame 12 had skin redness increased by 8 units of a\*. Hair, eyes, clothing, and the background were not manipulated. This procedure was repeated for the L\* (lightness) and b\* (yellowness) color axes (**Figure 1**).
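The frame series described above amounts to a linear mapping from frame index to a color offset along a single CIELab axis, with 16/12 ≈ 1.33 units between neighboring frames. The Python sketch below is a minimal illustration under stated assumptions, not the authors' Matlab code. It treats the offset as uniform over the skin area, ignoring the Gaussian-blurred mask edges, and it measures ΔE with the CIE76 formula (plain Euclidean distance). The function names and the example skin color are hypothetical.

```python
import math

N_FRAMES = 13      # frames numbered 0..12
MAX_SHIFT = 8.0    # +/- 8 CIELab units at frames 0 and 12; frame 6 is the original


def frame_shift(frame: int, max_shift: float = MAX_SHIFT, n_frames: int = N_FRAMES) -> float:
    """Offset (CIELab units along one axis) applied at a given frame, linear from -8 to +8."""
    if not 0 <= frame < n_frames:
        raise ValueError("frame out of range")
    return -max_shift + (2 * max_shift * frame) / (n_frames - 1)


def shift_lab(lab, axis: str, frame: int):
    """Apply a frame's offset to one axis ('L', 'a' or 'b') of an (L*, a*, b*) triple."""
    idx = {"L": 0, "a": 1, "b": 2}[axis]
    shifted = list(lab)
    shifted[idx] += frame_shift(frame)
    return tuple(shifted)


def delta_e(lab1, lab2) -> float:
    """CIE76 color difference: Euclidean distance between two L*a*b* triples."""
    return math.dist(lab1, lab2)


skin = (60.0, 12.0, 18.0)  # hypothetical average skin color, for illustration only
print(shift_lab(skin, "a", 12), delta_e(skin, shift_lab(skin, "a", 12)))
```

Only one axis is shifted per transform, so the ΔE between the original image (frame 6) and frame k reduces to the absolute value of `frame_shift(k)`.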

FIGURE 1 | CIELab color transformation of a face with decreased (top) and increased (bottom) lightness (L\*, left); redness (a\*, middle); and yellowness (b\*, right). Presented face is a composite for illustration purposes, but photographs of real individuals were used as the stimuli.

#### Participants and Procedure

Forty-four Malaysian Chinese participants (18 males, 26 females; mean age = 22.05, SD = 1.23) were recruited for this study, giving 95% power to detect small to medium effect sizes in the hypothesized main effects and interactions. All participants were students at Universiti Tunku Abdul Rahman.

Stimuli were presented on computers attached to 15" TFT monitors that were color calibrated with a DataColor Spyder3 Pro. Participants were shown facial images one at a time and were asked to adjust the color of the skin portions of each image to "make the face look as healthy as possible." By moving the mouse horizontally, participants cycled through the 13 frames of the transform (same face, different levels of color intensity). Participants clicked the mouse when they felt that the face looked healthiest.

Each facial image was presented once for each of the three color dimensions (lightness, redness, yellowness), making a total of 36 trials (12 faces × 3 color dimensions). To obscure the location of the original facial color, the position of the midpoint was randomized and the transform looped. The order of the trials was also randomized within a single block.
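The trial procedure described above has three parts: a looped transform, a randomized starting point, and 36 shuffled trials. It could be implemented along the following lines. This is a hypothetical Python sketch, because the original experiment software is not described. The names `pixels_per_frame`, `random_offset`, `frame_at`, `chosen_change`, and `trial_list` are illustrative assumptions, and so is the idea of reading cumulative horizontal mouse movement.

```python
import math
import random

N_FRAMES = 13    # frames 0..12; frame 6 holds the original skin color
MAX_SHIFT = 8.0  # +/- 8 CIELab units at frames 0 and 12


def random_offset(rng: random.Random, n_frames: int = N_FRAMES) -> int:
    """Random starting frame, so the original color has no fixed mouse position."""
    return rng.randrange(n_frames)


def frame_at(dx_pixels: float, pixels_per_frame: float, offset: int,
             n_frames: int = N_FRAMES) -> int:
    """Map cumulative horizontal mouse movement to a frame; the modulo makes the transform loop."""
    steps = math.floor(dx_pixels / pixels_per_frame)
    return (offset + steps) % n_frames


def chosen_change(frame: int, max_shift: float = MAX_SHIFT, n_frames: int = N_FRAMES) -> float:
    """Color change (CIELab units) of the clicked frame relative to the original frame 6."""
    return -max_shift + (2 * max_shift * frame) / (n_frames - 1)


def trial_list(faces, axes, rng: random.Random):
    """All face x color-axis combinations (12 x 3 = 36 trials), shuffled into one block."""
    trials = [(face, axis) for face in faces for axis in axes]
    rng.shuffle(trials)
    return trials
```

As an example, take an offset of 5 and 40 pixels per frame. Moving the mouse 320 pixels to the right wraps around to frame 0, which is the −8-unit extreme. If the participant clicks there, `chosen_change(0)` records −8.0.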

## RESULTS

Mean color changes applied to the 12 faces along each color axis were calculated. One-sample *t*-tests showed that, to optimize healthy appearance, participants increased facial yellowness by 1.32 units (SD = 1.28), *t*(43) = 6.85, *p* < 0.001, increased facial redness by 0.78 units (SD = 1.09), *t*(43) = 4.72, *p* < 0.001, and decreased facial lightness by 0.37 units (SD = 1.06), *t*(43) = 2.29, *p* = 0.027 (**Figure 2**). Relative to typical values of facial lightness (L\*), redness (a\*), and yellowness (b\*) in the studied populations, obtained from previous datasets (Stephen et al., 2012; Tan et al., 2018), these changes correspond to 0.62 SD for b\*, 0.44 SD for a\*, and −0.07 SD for luminance. The result for skin lightness is no longer significant after Bonferroni correction for multiple comparisons.
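The one-sample *t* statistics can be recomputed from the reported means and SDs with *t* = mean / (SD / √n). Under a Bonferroni correction, each *p* value is compared with α divided by the number of tests. The Python sketch below is illustrative and rests on two assumptions: a conventional α of 0.05 across the three color axes, and entry of *p* values reported as "< 0.001" at that bound. Because the means and SDs are rounded, the recomputed *t* values come out at about 6.84, 4.75, and −2.32 rather than exactly the reported ones.

```python
import math

N = 44        # participants
N_TESTS = 3   # one test per color axis
ALPHA = 0.05  # assumed family-wise alpha

# (mean change in CIELab units, SD, reported p), transcribed from the text
RESULTS = {
    "b* (yellowness)": (1.32, 1.28, 0.001),
    "a* (redness)": (0.78, 1.09, 0.001),
    "L* (lightness)": (-0.37, 1.06, 0.027),
}


def one_sample_t(mean: float, sd: float, n: int) -> float:
    """t statistic for H0: mean change = 0, computed from summary statistics."""
    return mean / (sd / math.sqrt(n))


def bonferroni_significant(p: float, alpha: float = ALPHA, m: int = N_TESTS) -> bool:
    """Bonferroni correction: compare each p value with alpha / m."""
    return p < alpha / m


for axis, (mean, sd, p) in RESULTS.items():
    t = one_sample_t(mean, sd, N)
    print(f"{axis}: t({N - 1}) = {t:.2f}, significant after Bonferroni: {bonferroni_significant(p)}")
```

With three tests the adjusted threshold is 0.05 / 3 ≈ 0.017. The lightness result (*p* = 0.027) therefore falls short of it, consistent with the text.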

A four-way mixed ANOVA was run to examine differences in the amount of color change applied to faces, with color axis (lightness, redness, yellowness), face ethnicity, and face sex as within-subject factors and participant sex as a between-subject factor.

There was a significant main effect of color axis, *F*(2, 84) = 25.91, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.38. Bonferroni-corrected pairwise comparisons showed a significantly greater increment in facial yellowness than in redness (mean difference = 0.55, *p* = 0.037) or luminance (mean difference = 1.81, *p* < 0.001), and a greater increment in redness than in luminance (mean difference = 1.25, *p* < 0.001).

A significant main effect of face ethnicity was found, *F*(2, 84) = 41.59, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.50. Caucasian faces received significantly more positive color adjustment than African faces (mean difference = 1.05, *p* < 0.001) and Asian faces (mean difference = 0.36, *p* = 0.004). Asian faces also received more positive adjustment in facial coloration than African faces (mean difference = 0.69, *p* < 0.001).

No significant main effect was found for face sex, *F*(1, 42) = 0.015, *p* = 0.902, η<sub>p</sub><sup>2</sup> = 0.000, or for participant sex, *F*(1, 42) = 2.97, *p* = 0.092, η<sub>p</sub><sup>2</sup> = 0.066.

There was a significant color axis × participant sex interaction, *F*(2, 84) = 4.357, *p* = 0.016, η<sub>p</sub><sup>2</sup> = 0.094. Follow-up one-way repeated-measures ANOVAs were run separately for male and female participants. For male participants, pairwise comparisons showed significant differences between their adjustments of skin luminance and skin redness (mean difference = −1.88, SE = 0.51, *p* = 0.006) and between their adjustments of skin luminance and skin yellowness (mean difference = −2.50, SE = 0.59, *p* = 0.002), but no significant difference between their adjustments of skin redness and skin yellowness (*p* = 0.35). For female participants, only the adjustments of skin luminance and skin yellowness differed significantly (mean difference = −1.12, SE = 0.26, *p* = 0.001).

There was also a significant face ethnicity × face sex × participant sex interaction, *F*(2, 84) = 4.106, *p* = 0.02, η<sub>p</sub><sup>2</sup> = 0.089, which is outside our main research focus. None of the other interactions was significant (*p* > 0.05).
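Partial eta squared can be recovered from each reported *F* ratio and its degrees of freedom with the standard identity η<sub>p</sub><sup>2</sup> = F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>). This gives a quick consistency check on the effect sizes above. The Python sketch below transcribes the unambiguous *F* values from the text. Its degrees-of-freedom helper assumes the design implied by the reported dfs: color axis, face ethnicity, and face sex within subjects, participant sex between subjects, and 44 participants in two groups. The helper names are illustrative.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: F*df_effect / (F*df_effect + df_error)."""
    return f * df_effect / (f * df_effect + df_error)


def within_error_df(levels: int, n_participants: int, n_groups: int) -> int:
    """Error df of a single within-subject factor in a mixed design: (levels - 1) * (N - groups)."""
    return (levels - 1) * (n_participants - n_groups)


# (effect, F, df_effect, df_error, reported partial eta squared), transcribed from the text
REPORTED = [
    ("color axis", 25.91, 2, 84, 0.38),
    ("face sex", 0.015, 1, 42, 0.000),
    ("participant sex", 2.97, 1, 42, 0.066),
    ("color axis x participant sex", 4.357, 2, 84, 0.094),
    ("face ethnicity x face sex x participant sex", 4.106, 2, 84, 0.089),
]

for name, f, df1, df2, reported in REPORTED:
    print(f"{name}: recomputed {partial_eta_squared(f, df1, df2):.3f}, reported {reported}")
```

All recomputed values match the reported ones to rounding. The error dfs of 84 for a three-level factor and 42 for a two-level factor are what `within_error_df` predicts for 44 participants split into two groups.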

#### DISCUSSION

The current study examined Malaysian Chinese participants' perceptions of healthy facial skin color. To make faces of three ethnicities look as healthy as possible, participants increased skin yellowness and, to a lesser extent, skin redness. The change in skin lightness was not significant after Bonferroni correction. A similar pattern of preferences for skin redness and skin yellowness was observed in previous studies, in which Caucasian participants significantly increased facial skin yellowness and redness to optimize perceived facial health (Stephen et al., 2009b, 2011; Coetzee et al., 2012; Han et al., 2018). The observed preference for redder facial skin may be attributable to the appearance of skin perfused with oxygenated blood, which is associated with physical fitness and increased levels of sex hormones (Armstrong and Welsman, 2001; Charkoudian, 2001; Stephen et al., 2009a). Similarly, increased skin yellowness has been associated with greater deposition of antioxidant carotenoids in the skin, which in turn is associated with a diet rich in fruits and vegetables (Stephen et al., 2011; Lefevre et al., 2013; Pezdirc et al., 2014, 2017; Tan et al., 2015, 2017). Previous studies also found that the preference for skin yellowness was stronger than that for skin redness and skin luminance (Lefevre and Perrett, 2015; Tan et al., 2018). This has been suggested to relate to the antioxidant properties of carotenoids (Paiva and Russell, 1999; Stahl and Sies, 2003) and to their protective value for human physical health (Hughes, 2001; Tapiero et al., 2004; Krinsky and Johnson, 2005; Samimi, 2005; Rao and Rao, 2007).

However, it should be noted that Han et al. (2018) failed to find preferences for yellowness in mainland Chinese faces using a two-alternative forced-choice (2AFC) paradigm. Preferences for skin redness in mainland Chinese faces were also not as strong as those observed in the Caucasian sample. The manipulations used by Han et al. (2018), however, were large relative to the preferences measured elsewhere:

- more than double the yellowness increment chosen by participants in the current study;
- more than double the carotenoid-induced color change preferred by Malaysian Chinese participants in a previous study (Tan et al., 2017);
- more than 1.5 SD of the yellowness in an Asian population (Tan et al., 2018);
- more than triple the amount of color change preferred by participants in the current study.

The high-redness and high-yellowness images used by Han et al. (2018) may therefore have been more extreme than the healthiest-looking level, obscuring participants' actual color preferences.

Previous studies have found that Caucasian, African, and mainland Chinese participants increase the lightness of facial skin to optimize healthy appearance (Stephen et al., 2009b; Coetzee et al., 2012; Han et al., 2018). In this study, however, Malaysian Chinese participants decreased skin lightness, though this effect was no longer significant after Bonferroni correction. Chinese diaspora culture typically values lighter skin, particularly in women (a common Chinese saying is "a fair skin can hide three facial flaws"; Mak, 2007). This preference may be offset in the Malaysian context, where ultraviolet radiation from the sun is frequently intense (Kuala Lumpur is less than 400 km from the equator) and higher levels of melanin provide greater protection from sunburn and skin cancer (Jablonski and Chaplin, 2000; Jablonski, 2004).

Previous studies have suggested that skin color changes are more easily detectable in lighter- than in darker-skinned populations (Coetzee and Perrett, 2014). In the current study, the amount of color adjustment made to optimize the apparent health of faces was greatest for Caucasian faces, followed by Asian and then African faces. This suggests that skin color may play a greater role in the perception of health in faces from lighter-skinned populations.

#### Limitations

It should be noted that participants in the current study manipulated the faces along each color axis separately. However, the color axes may interact, such that changes along one axis affect preferences along another. Studies in which all color axes are manipulated simultaneously are required to address this question.

#### REFERENCES


While some discrepancies in skin color preference have been observed across studies conducted at different geographical locations, it cannot be confidently concluded from these data that cultural differences account for the differences between the preferences shown here by Malaysian Chinese participants and participants from Western, African, and mainland Chinese populations. Studies in which methodology is standardized across multiple locations and in which measures of culture are deployed should be conducted to confirm the role of culture, as opposed to methodological differences or ecological differences, in driving the different preferences across populations.

In conclusion, Malaysian Chinese participants show a pattern of facial skin color preference intermediate between those reported in mainland Chinese (Han et al., 2018) and Western (Stephen et al., 2009b) populations, though more similar to that of Westerners. Although exposure to Western culture might explain this pattern of results, future studies should standardize methodology across multiple geographical locations and include measures of culture to test this hypothesis.

#### ETHICS STATEMENT

All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Ethics Committee at the University of Nottingham Malaysia Campus.

#### AUTHOR CONTRIBUTIONS

KWT and IS contributed to the conception and design of the study. KWT collected the research data. Both authors performed the statistical analysis, wrote the manuscript, and read and approved the submitted version.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Tan and Stephen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Vocal Parameters of Speech and Singing Covary and Are Related to Vocal Attractiveness, Body Measures, and Sociosexuality: A Cross-Cultural Study

Jaroslava Varella Valentova<sup>1</sup>\*, Petr Tureček<sup>2</sup>, Marco Antonio Corrêa Varella<sup>1</sup>, Pavel Šebesta<sup>3</sup>, Francisco Dyonisio C. Mendes<sup>4</sup>, Kamila Janaina Pereira<sup>1</sup>, Lydie Kubicová<sup>3</sup>, Petra Stolařová<sup>3</sup> and Jan Havlíček<sup>2</sup>

<sup>1</sup> Department of Experimental Psychology, Institute of Psychology, University of São Paulo, São Paulo, Brazil, <sup>2</sup> Faculty of Science, Charles University, Prague, Czechia, <sup>3</sup> Faculty of Humanities, Charles University, Prague, Czechia, <sup>4</sup> Department of Basic Psychological Processes, University of Brasília, Brasília, Brazil

#### Edited by:

Ian Stephen, Macquarie University, Australia

#### Reviewed by:

Justin Kyle Mogilski, University of South Carolina Salkehatchie, United States

Katarzyna Pisanski, University of Sussex, United Kingdom

> \*Correspondence: Jaroslava Varella Valentova jaroslava@usp.br

#### Specialty section:

This article was submitted to Evolutionary Psychology, a section of the journal Frontiers in Psychology

Received: 31 October 2018 Accepted: 20 August 2019 Published: 22 October 2019

#### Citation:

Valentova JV, Tureček P, Varella MAC, Šebesta P, Mendes FDC, Pereira KJ, Kubicová L, Stolařová P and Havlíček J (2019) Vocal Parameters of Speech and Singing Covary and Are Related to Vocal Attractiveness, Body Measures, and Sociosexuality: A Cross-Cultural Study. Front. Psychol. 10:2029. doi: 10.3389/fpsyg.2019.02029

Perceived vocal attractiveness and measured sexually dimorphic vocal parameters are both associated with underlying individual qualities. Research tends to focus on speech, but singing is another highly evolved communication system that has distinct and universal features with analogs in other species, and it is relevant in mating. Both the speaking and the singing voice provide relevant information about the producer. We tested whether speech and singing function as "backup signals" that indicate similar underlying qualities. Using a sample of 81 men and 86 women from Brazil and the Czech Republic, we investigated vocal attractiveness rated from speech and singing and its association with fundamental frequency (F0), apparent vocal tract length (VTL), body characteristics, and sociosexuality. F0, VTL, and rated attractiveness of the singing and speaking voice were strongly correlated within individuals. Lower-pitched speech in men, higher-pitched speech and singing in women, individuals who like to sing more, and singing of individuals with higher pitch modulation were perceived as more attractive. In men, physical size positively predicted speech and singing attractiveness. Male speech attractiveness, but not singing attractiveness, was associated with higher sociosexuality. Lower-pitched male speech was related to higher sociosexuality, while lower-pitched male singing was linked to lower sociosexuality. Similarly, shorter speech VTL and longer singing VTL predicted higher sociosexuality in women. Different vocal displays function as "backup signals" cueing attractiveness and body size, but their relation to sexual strategies differs between men and women. Both singing and speech may indicate evolutionarily relevant individual qualities shaped by sexual selection.

Keywords: human voice, song, vocal attractiveness, fundamental frequency, sociosexuality, fitness indicators, music, voice modulation

## INTRODUCTION

Speech and singing are among the most common vocal productions in adult humans, and their presence appears to be universal across modern human populations (Brown, 1991). They are assumed to share a common ancestor (Brown, 2001, 2017; Mithen, 2005) that evolved into two specialized systems of structured vocal communication (Lehmann et al., 2009). It also seems that prosody, the musical part of speech, which conveys mainly emotional information, was already rooted in the origins of both spoken and sung vocal production (Filippi, 2016; Brown, 2017). It has recently been shown that speech and singing may have diverged from a protolanguage and split into two systems based on their communicative function. In particular, when referential and emotional functions are introduced into an artificial communication system, the system diverges into speech-like and music-like vocalizations, respectively (Ma et al., 2019). Moreover, despite vast variability across cultures, the function of specific kinds of songs (e.g., a love song) is cross-culturally comprehensible from their structural form (Mehr et al., 2018). Interestingly, both human and bird songs tend to employ similar descending or arched melodic contours despite substantial differences in absolute pitch and duration, which indicates similar underlying motor constraints across cultures and species (Savage et al., 2017).

Singing and speech differ in the use of vocal anatomy (Sundberg, 1977, 2018) and require different patterns of breathing (Leanderson et al., 1987), and the neuroanatomy of their production and appreciation is likewise specific to each domain (Zatorre and Baum, 2012). Cognitive processing of speech and singing is also domain-specific, as shown by patients with amusia who have intact speech processing and patients with aphasia whose musical capacities are unimpaired (Peretz and Coltheart, 2003). Despite different design features, such as the arbitrariness of speech and the regular beat and discrete set of pitches in singing, the two domains share further features, such as hierarchical structure and complexity (Fitch, 2006). Moreover, both the speaking and the singing voice provide relevant information about the producer's gender, identity, location, emotional state, and behavioral tendencies (Weninger et al., 2011), and individuals can identify others from their speech and singing (Trehub et al., 2009).

While spoken language is largely specific to humans, and language-like forms of vocalization exist in only a few other animals (e.g., prairie dogs and dolphins; Slobodchikoff et al., 1991; Janik, 2013), singing has parallels in many other species. The capacity to learn complex songs, new sequences, and new sounds has arisen independently in birds (songbirds, hummingbirds, and parrots) and mammals (whales, seals, and humans) (Fitch, 2005). Since Darwin's (1871) groundbreaking work, sexual selection has been viewed as one of the most important forces driving the evolution of singing as a way of attracting the opposite sex and advertising individual qualities. A large body of research shows the importance of singing for mating success across various avian and mammalian species (e.g., Searcy and Andersson, 1986). In some species, singing seems to function as an honest signal of underlying individual qualities; for example, lower-pitched songs advertise a larger body size (Hall et al., 2013). In humans, irrespective of their original adaptive value, speaking and singing can likewise be considered honest signals that meet the four requisite criteria (Smith and Bird, 2000). Both require a long time for maturation, practice, and learning (Welch, 2006); their production is energetically costly because they fade rapidly (Fitch, 2006); they can suffer from noise interference; and they require intense breathing (Leanderson et al., 1987). Both speech and singing are easily perceptible by most people, are used in mating-relevant contexts such as courtship (White et al., 2018), can increase individual mating success, and can serve as cues to the genetic quality of the producer (Miller, 2000). There are also significant differences between the two: singing requires greater vocal control (Zarate, 2013) and is more demanding than speech because singers need to tailor subglottal pressure to both pitch and loudness (Sundberg, 1977, 2003). 
Singing can also be louder than speech, involving more muscle activity (Åkerlund and Gramming, 1994; Leanderson et al., 1987), and it includes a performative context (Fitch, 2006) that attracts more attention and is thus socially riskier. People even tend to cut their singing performance short in front of a supposedly expert audience (Garland and Brown, 1972). It is thus quite possible that singing is even harder to fake than speech as an honest signal of underlying individual qualities, thus serving as an ornament that can affect the quantity or quality of sexual partners.

The human voice plays an important role in mate preferences and intrasexual competition (Puts, 2010; Pisanski and Feinberg, 2019), but so far most research on human vocal attractiveness and its indicators has focused on speech. Some vocal parameters, especially fundamental frequency (F0), differ between males and females of many species, with humans exhibiting even greater sexual dimorphism than other primates (Puts et al., 2016). F0 is produced by vibration of the vocal folds within the larynx and, together with the corresponding harmonics, is perceived as voice pitch (Pisanski et al., 2016). On average, men produce lower-pitched voices than women because testosterone during puberty thickens and lengthens the male vocal folds and thereby lowers F0 (Pisanski and Feinberg, 2019). From a more general perspective, vocal sexual dimorphism is thought to result at least in part from intrasexual competition, especially male-male competition (e.g., Puts, 2010). Indeed, men with lower-pitched voices are perceived as older, taller, heavier, more masculine, and more dominant than men with higher-pitched voices (Collins, 2000; Feinberg et al., 2005; Puts et al., 2006, 2007; Pereira et al., 2019). Similarly, women with lower-pitched voices are perceived as more dominant (Borkowska and Pawlowski, 2011), and both men and women with lower-pitched voices are perceived as having greater leadership capacity (Klofstad et al., 2012).

Aside from intrasexual competition, intersexual selection may also have played a role in shaping sex differences in the voice. There is robust evidence that women prefer relatively low-pitched male speaking voices, while men prefer relatively high-pitched female voices (for a review, see Pisanski and Feinberg, 2019). Nevertheless, the relationship between F0 and attractiveness is non-linear: the most attractive male voices are around 96 Hz, and the most attractive female voices reach up to 280 Hz (Borkowska and Pawlowski, 2011; Saxton et al., 2015). Importantly, preferences for lower-pitched male and higher-pitched female voices can be specific to certain contexts and individuals, such as short-term relationships (Little et al., 2002), women in relationships (Valentová et al., 2013), and nulliparous women (Apicella and Feinberg, 2009), and in some populations they can even be reversed (Shirazi et al., 2018). Moreover, recent evidence suggests that lower-pitched female voices are perceived as attractive (Babel et al., 2014), and women actively lower their voices when speaking to attractive men or when trying to sound attractive (Hughes et al., 2014; Pisanski et al., 2018; but see Fraccaro et al., 2011). Lower-pitched voices in women can thus signal immediate interest and/or sexual appetence.

In line with the fitness indicator hypothesis of sexual selection theory, vocal characteristics can convey information about the underlying qualities of voice producers, e.g., their health and reproductive potential. For example, men with relatively low-pitched voices exhibit low cortisol and high testosterone levels, which are related to immunoreactivity (Evans et al., 2008; Hodges-Simeon et al., 2015; Puts et al., 2016). Moreover, among North American men, a lower-pitched voice is associated with more female sexual partners (Puts, 2005), and lower-pitched male Hadza hunter-gatherers have on average more offspring (Apicella et al., 2007). Furthermore, both men and women with more attractive voices reported more sexual partners, more extra-pair copulations, and an earlier age at first intercourse (Hughes et al., 2004), all of which are considered proxies of potentially higher reproductive success.

Moreover, vocal attractiveness is associated with several body measures that develop under the influence of sex-specific hormones and are thus viewed as indicators of genetic and developmental quality and, by extension, of the reproductive fitness of the individual. For example, vocal attractiveness is positively associated with the shoulder-to-hip ratio in men and negatively associated with the waist-to-hip ratio in women (Hughes et al., 2004). Low-pitched male voices are linked to larger body size, especially weight and height, to a particular body shape (shoulder and chest circumference, shoulder-to-hip ratio) (Evans et al., 2006), and to arm strength (Puts et al., 2011). Nevertheless, a recent meta-analysis has shown that, compared with other vocal parameters, voice pitch is not a reliable predictor of height among adults of the same sex (Pisanski et al., 2014), and it is a poor predictor of body weight, shape, and strength (Collins, 2000; Collins and Missing, 2003; Bruckert et al., 2006; Evans et al., 2006; Sell et al., 2010; Vukovic et al., 2010; Pisanski et al., 2016; Raine et al., 2019).

Formants, the resonant frequencies of the vocal tract, are by contrast more constrained by anatomical structures related to body size. Formants are anatomically and functionally dissociated from fundamental frequency and are therefore a more reliable indicator of body size and shape, both in humans and in numerous other mammalian species (Pisanski et al., 2014). Formants are also sexually dimorphic, with men showing lower formant frequencies than women (Pisanski et al., 2016). Individuals who produce lower formant frequencies are perceived as more physically dominant (Puts et al., 2007), and women who produce higher formant dispersion are perceived as flirtatious and attractive by both men and women (Puts et al., 2011). Individual vocal characteristics may thus provide cues to different bodily traits and sexual behaviors linked to an individual's potential reproductive success.

Importantly, the voice is a dynamic behavioral display that can be modulated both intentionally and involuntarily in specific situations so as to express or exaggerate ecologically relevant traits, including emotions (Pisanski et al., 2016). For example, both men and women change their voice when speaking to infants (Foulkes et al., 2005; Broesch and Bryant, 2015), and this infant-directed speech affects children's attention and communicative outcomes (Rowe, 2012; Spinelli et al., 2017). Similarly, women modulate voice pitch when speaking to attractive men (Fraccaro et al., 2011; Hughes et al., 2014; Pisanski et al., 2018), and the voices of both men and women who speak to an attractive individual are perceived as more attractive by others (Leongómez et al., 2014). People can also volitionally increase their apparent vocal tract length (as estimated from formant frequencies) and decrease fundamental frequency to imitate a larger body size, and vice versa (Pisanski et al., 2016). The overall prosody of speech can be effectively modulated when expressing different emotions, such as high, loud, and fast prosody when feeling happy and the opposite pattern when sad (for review, see Brown, 2017). Interestingly, the same vocal modulation appears when emotions are expressed through music, which suggests that both displays may convey similar information (Juslin and Laukka, 2003; Zatorre and Baum, 2012).

Although singing production and perception are a scientific research field in their own right (Sundberg, 2003), singing accuracy is associated with several loci on chromosome 4 and shows 40% heritability (Park et al., 2012), and singing frequently features in mating contexts (e.g., serenades and love songs; see Dukes et al., 2003; Levitin, 2008), singing tends to be overlooked by psychological research on vocal attractiveness. As an exception, one study found that women who were judged as good singers based solely on audio recordings were also independently rated as more attractive based on soundless video recordings (Wapnick et al., 1997). This is in line with research showing that in women, attractiveness and masculinity-femininity ratings based on different modalities are correlated (e.g., Valentova et al., 2017c; Pereira et al., 2019). Nevertheless, further research is needed to test to what extent the perceptual characteristics of the speaking and singing voice are intercorrelated, and whether both vocal displays function as backup signals, i.e., signals that indicate similar underlying qualities, rather than multiple messages, i.e., signals that indicate different qualities of individuals (see Johnstone, 1996; Bro-Jørgensen, 2010). To the best of our knowledge, only one study has tested the attractiveness of speech and singing in women; it concluded that attractiveness rated from the two vocal displays is correlated and in both cases increases with voice pitch (Isenstein, 2016). This can be viewed as an indication that different vocal displays may serve as backup signals.

#### Aims of the Current Study

In the current study, we tested whether certain perceptual characteristics of singing and speech (perceived attractiveness, voice pitch, and formant frequencies) serve as cues to specific physical and behavioral qualities of individuals. Since singing is more costly to produce than speech, one could predict that the perceived attractiveness of singing would be a stronger indicator of individual quality than the attractiveness of speech. We therefore tested the association between the attractiveness of singing and speech and selected body fitness indicators (body size and shape). We also tested the relation between attractiveness ratings of both vocal displays and sociosexuality, which we used as a proxy for a short-term sexual strategy that may, especially in men, lead to increased reproductive success. We further investigated how selected vocal parameters (voice pitch and vocal tract length as estimated from formant frequencies) mediate possible associations between vocal attractiveness, body cues, and sociosexuality.

Further, we tested whether the capacity to modulate the voice and singing experience may influence the rated vocal attractiveness. We hypothesized that both singing experience and a higher ability to modulate voice would lead to a more attractive vocal production.

Additionally, we tested for possible sex differences in vocal parameters in two distinct populations, one Brazilian and one Czech. So far, very little cross-cultural research has been conducted on evolutionarily relevant aspects of voice characteristics and perception, and the majority of this research has been conducted in the United States and Western and Central Europe (for review, see Pisanski and Feinberg, 2019). Studies comparing multiple populations with different physical, cultural, and linguistic compositions are thus needed to improve the generalizability of results. For example, although most North American and European studies have concluded that women prefer lower-pitched male voices, Filipino women seem to follow the opposite pattern (Shirazi et al., 2018). In our study, we sampled participants from one South American and one Central European population (Brazil and the Czech Republic, respectively), which differ widely in history, culture, ethnicity, and demography, and which both also differ from Western European and North American societies. Moreover, these populations differ in several body measures, such as height and weight (e.g., Varella et al., 2014; Valentova et al., 2016), and facial and body hair in men (Valentova et al., 2017b), while self-rated breast size, buttock size, and waist-to-hip ratio (WHR) in women do not differ between them (Valentova et al., 2017a). Furthermore, the Brazilian population reports significantly higher sociosexuality than the Czech population (Varella et al., 2014). The two populations are also linguistically different: Brazilian Portuguese is a Romance language, while Czech is a Slavic language. Previous studies have reported that several vocal parameters differ between linguistic groups (Mennen et al., 2012). The two populations thus offer an interesting opportunity to analyze vocal production and perception and their relation to body measures and sociosexuality.

## METHODS

#### Target Participants

The final sample comprised 40 Brazilian men (M = 23.70 years; SD = 3.67, range 19–34) and 44 Brazilian women (M = 23.91 years; SD = 4.99, range 18–35) recruited at the University of São Paulo, in São Paulo city, and 33 Czech men (M = 22.45 years; SD = 2.35, range 18–28) and 35 Czech women (M = 22.37 years; SD = 2.57, range 19–29) recruited at Charles University, Prague. We selected predominantly heterosexual participants (0–2 on the Kinsey scale) because individuals of different sexual orientations can differ in several vocal parameters (Kachel et al., 2018), and these differences can be detected even by naïve listeners (Valentova and Havlíček, 2013).

#### Procedure

In both countries, each participant consented to take part in a broader study (see Varella et al., 2014; Valentova et al., 2017c). Participants completed questionnaires, and we took body measurements and standardized facial and body photographs and made video recordings of both speech and singing. Only data relevant to this study are described below. Brazilian participants are not allowed to receive financial rewards, whereas Czech participants received 300 CZK (approximately 13 USD). The project was approved by the Charles University IRB (2011/07).

#### Questionnaires

Participants completed a sociodemographic questionnaire and the Revised Sociosexual Orientation Inventory (SOI-R; Penke and Asendorpf, 2008). The SOI-R measures an individual's willingness to engage in uncommitted sex. It consists of nine items (e.g., "With how many different partners did you have sexual intercourse on one and only one occasion?") loading onto three subscales: sociosexual behavior, attitudes, and desire. Participants also indicated on a 10-point scale how much they liked to sing (1 = not at all, 10 = very much). We used this as a motivational factor that may influence singing frequency, singing training, and thus singing experience, as shown in Busch (2013).
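As a rough illustration of how the nine SOI-R items roll up into subscale and total scores, the sketch below assumes the standard item order (behavior items 1–3, attitude items 4–6, desire items 7–9), responses already coded 1–9, and any reverse-keyed items already recoded. Summing rather than averaging is also an assumption, and `score_soi_r` is a hypothetical helper, not the authors' code.

```python
from typing import Dict, Sequence

# Assumed SOI-R item order (0-based indices): behavior 1-3, attitudes 4-6, desire 7-9.
SUBSCALES = {
    "behavior": (0, 1, 2),
    "attitudes": (3, 4, 5),
    "desire": (6, 7, 8),
}


def score_soi_r(items: Sequence[int]) -> Dict[str, int]:
    """Subscale sums and total from nine items already coded 1-9 (reverse-keyed items recoded)."""
    if len(items) != 9:
        raise ValueError("the SOI-R has exactly nine items")
    if any(not 1 <= value <= 9 for value in items):
        raise ValueError("item codes are assumed to lie in 1..9")
    scores = {name: sum(items[i] for i in idx) for name, idx in SUBSCALES.items()}
    scores["total"] = sum(items)  # total sociosexuality score
    return scores


# Hypothetical respondent.
print(score_soi_r([2, 1, 1, 5, 4, 6, 3, 3, 2]))
```

Some studies report the mean of the items instead of the sum; the two differ only by a constant factor of 9 for the total, so correlations are unaffected.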

#### Vocal Recordings

Vocal samples were recorded under standardized conditions in a closed, quiet room, all by the same researcher. For all recordings, we used a professional digital stereo Olympus LS-100 Multi-Track Linear PCM recorder, with the participants' lips approximately 10 cm from the microphone. Participants were seated on a chair while performing the vocal tasks. First, participants received printed information about the whole recording procedure. After a short vocal exercise to warm up the voice and get used to being recorded, participants read a short sentence with names standardized across participants. In Brazil, men and women, respectively, said "Oi, meu nome é Pedro/Ana, e eu sou de Belo Horizonte," while Czech men and women, respectively, said "Jmenuji se Petr/Petra a pocházím z Havlíčkova Brodu" (Hi, my name is Petr/Pedro/Petra/Ana and I come from Belo Horizonte/Havlíčkův Brod). Subsequently, they sang the first part of "Happy Birthday" (in the Brazilian Portuguese version "Parabéns para você, nesta data querida, muitas felicidades, muitos anos de vida," in the Czech version "Hodně štěstí zdraví, hodně štěstí zdraví, hodně štěstí, milý Honzo, hodně štěstí zdraví"). Finally, they first read and then sang the first stanza of their national anthem (the verbal content of speech and singing was thus matched).

To reduce the load on raters, we extracted parts of the national anthem recordings using SoundForge 8.0 software. In the Brazilian sample, we extracted the first two lines of the national anthem ("Ouviram do Ipiranga as margens plácidas, de um povo heróico o brado retumbante"; "The placid banks of the Ipiranga heard the resounding cry of a heroic people"), while for the Czech participants we extracted the third and fourth lines, which, unlike the first two lines, are not repetitions of each other ("Voda hučí po lučinách, bory šumí po skalinách"; "Water roars across the meadows, pine forests rustle among the rocks"). Only these recordings were subsequently rated by independent participants and analyzed for vocal parameters. All participants spoke their native language, i.e., either Brazilian Portuguese or Czech. None of the participants reported any serious vocal or respiratory problems at the time of data collection.

"Happy Birthday" was selected because it is known across cultures and commonly sung in intimate and emotionally loaded social situations, usually with family, friends, and romantic partners, and it has been used in previous research on singing (e.g., Christiner and Reiterer, 2013). The national anthem is also widely known within each country and is relatively unconnected to mating contexts, and it is thus more neutral.

Recordings were analyzed using Praat software (Boersma and Weenink, 2013) for mean, minimum, and maximum fundamental frequency (F0) and the first four formants (F1–F4). F0 is the rate of vocal fold vibration, perceived as overall voice pitch. We used an autocorrelation algorithm with a pitch floor of 75 Hz and a pitch ceiling of 300 Hz for men, and a pitch floor of 100 Hz and a pitch ceiling of 500 Hz for women, because these are the boundaries recommended by the software developers for analyzing adult voices (Boersma and Weenink, 2013). All other parameters were left at their defaults. Average speech F0 per recording ranged between 92.47 Hz (corresponding to the musical note F#2, where # indicates that the note F is raised by a semitone) and 177.70 Hz (F3) in men, and between 164.10 Hz (E3) and 253.10 Hz (B3) in women. For singing, F0 ranged between 103.60 Hz (G#2) and 208.50 Hz (G#3) in men, and between 168.50 Hz (E3) and 348.20 Hz (F4) in women. All F0 values were transformed to perceptual pitch, expressed as the semitone difference between F0 and A4 (440 Hz), using the standard formula $12\log_2(F_0/440)$. This scale is based on standard music notation and reflects the logarithmic nature of human pitch perception: A3 (−12; 220 Hz) and A5 (+12; 880 Hz) lie at an equal octave distance (12 semitones) from A4. We subtracted the minimum from the maximum F0 (in semitones) of each recording to obtain its perceptual range. The speech F0 range per recording varied between 4.61 and 21.07 semitones in men and between 5.34 and 27.61 semitones in women, while the singing range varied between 6.76 and 23.74 semitones in men and between 8.76 and 27.84 semitones in women. F0 and range values were averaged across recordings for each participant, separately for speech and singing.
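The semitone transformation and range computation described above can be sketched in Python as follows. The helper names are hypothetical, and the note-naming function (which assumes equal temperament with A4 = MIDI note 69) is included only to show how the reported note labels relate to the Hz values; it is not part of the authors' pipeline.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_semitones(f0_hz: float, ref_hz: float = 440.0) -> float:
    """Semitone distance of F0 from A4: 12 * log2(F0 / 440)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def semitone_range(f0_min_hz: float, f0_max_hz: float) -> float:
    """Perceptual F0 range: semitones(max) - semitones(min)."""
    return hz_to_semitones(f0_max_hz) - hz_to_semitones(f0_min_hz)


def nearest_note(f0_hz: float) -> str:
    """Nearest equal-tempered note name, counting A4 as MIDI note 69."""
    midi = 69 + round(hz_to_semitones(f0_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


for f0 in (92.47, 177.70, 348.20):
    print(f0, round(hz_to_semitones(f0), 2), nearest_note(f0))
```

The nearest-note labels for these three values match those given in the text (F#2, F3, F4). Because the range is a difference of logarithms, it equals $12\log_2(F_{\max}/F_{\min})$ and does not depend on the 440 Hz reference.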

Apparent vocal tract length (VTL) was calculated from the first four formants (F1–F4) following the formula described in Pisanski et al. (2014). F1–F4 were measured in Praat using a semi-automated approach. First, recordings were preprocessed with the Vocal Toolkit "Extract voiced and unvoiced" script (Corretge, 2019), and only the voiced parts were used for further formant analysis. Second, formants were analyzed with the Burg method using the recommended preset values and a maximum formant of 5000 Hz for men and 5500 Hz for women. For each recording, readings indicating silence and erroneous readings were removed from the list of results, and F1–F4 values were taken as the median of the remaining readings.

Subsequently, formant spacing ($\Delta F$) was estimated as the slope of a linear regression with the intercept fixed at 0, based on the relationship

$$F_i = \frac{2i - 1}{2}\,\Delta F$$

where $i$ is the formant number. Apparent vocal tract length was derived from formant spacing using

$$VTL(\Delta F) = \frac{c}{2\,\Delta F}$$

where c = 33,500 cm/s is the speed of sound, and the vocal tract is modeled as a uniform tube closed at one end.
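A minimal sketch of the median, formant-spacing, and VTL steps, assuming per-frame formant readings are already available (e.g., exported from Praat) and that frames rejected as silent or erroneous are marked with `None`; the actual rejection rule is not specified in the text, and the function names are hypothetical. The spacing is the least-squares slope through the origin, $\Delta F = \sum_i x_i F_i / \sum_i x_i^2$ with $x_i = (2i-1)/2$.

```python
from statistics import median
from typing import List, Optional, Sequence

SPEED_OF_SOUND_CM_S = 33_500.0  # c from the text, in cm/s


def median_formants(tracks: Sequence[Sequence[Optional[float]]]) -> List[float]:
    """Median of each formant track (F1..F4); None marks a rejected frame (assumed convention)."""
    return [median(v for v in track if v is not None) for track in tracks]


def formant_spacing(formants: Sequence[float]) -> float:
    """Least-squares slope through the origin of F_i against x_i = (2i - 1) / 2."""
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants) + 1)]
    return sum(x * f for x, f in zip(xs, formants)) / sum(x * x for x in xs)


def apparent_vtl(formants: Sequence[float], c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Apparent vocal tract length (cm) from VTL = c / (2 * delta_F)."""
    return c / (2.0 * formant_spacing(formants))


# Hypothetical per-frame readings (Hz) for F1..F4 from one recording.
tracks = [
    [480.0, None, 500.0, 520.0],
    [1450.0, 1500.0, 1550.0, None],
    [2480.0, 2500.0, 2520.0],
    [3450.0, 3500.0, 3550.0],
]
f1_to_f4 = median_formants(tracks)
print(f1_to_f4, round(apparent_vtl(f1_to_f4), 2))
```

With these idealized medians of 500, 1500, 2500, and 3500 Hz, the slope is exactly 1000 Hz and the apparent VTL is 33,500 / 2000 = 16.75 cm. Fixing the intercept at 0 follows directly from the tube model, in which every formant is an odd multiple of $\Delta F / 2$.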

#### Anthropometry

We measured participants' body height in centimeters, weight in kilograms, and body characteristics previously found to be associated with vocal attractiveness, namely the circumference of the shoulders, waist, and hips (Dixson et al., 2003; Stulp et al., 2013; Valentova et al., 2014, 2016, 2017a). We then computed the waist-to-shoulder ratio (WSR) in men and the waist-to-hip ratio (WHR) in women (for details of the procedure, see Varella et al., 2014).

#### Vocal Ratings

An independent sample of heterosexual raters anonymously judged the vocal attractiveness of all recordings of opposite-sex individuals on a 7-point scale (1 = not at all attractive, 7 = very attractive) using Rater software (facelab.org). All raters reported being predominantly heterosexual (0–2 on the Kinsey scale). Brazilian raters (51 men: M = 22 years, SD = 3.4; 59 women: M = 22.1 years, SD = 3.4) were recruited among students of the University of Brasília, while Czech raters (46 men: M = 21.7 years, SD = 1.9; 47 women: M = 20.6 years, SD = 1.1) were recruited at Charles University, Prague. The rating took place in an empty classroom; each recording containing the relevant phrase was presented only once over headphones at unmanipulated volume. Each rater evaluated either all Brazilian or all Czech recordings; for instance, one Brazilian rater rated all Czech recordings, while another Brazilian rater rated all Brazilian recordings. The recordings were divided into eight blocks (two speech and two singing recordings for each of the Brazilian and Czech samples) and randomized within each block. Interrater agreement (Cronbach's α) was high in all recording × rater set combinations (min α = 0.79; for a full overview of Cronbach's α values, see **Supplementary Material**). Pearson correlations between the average attractiveness ratings of Czech and Brazilian raters were high for both speech [r = 0.694, 95% CI (0.602, 0.768), p < 0.001] and singing [r = 0.788, 95% CI (0.719, 0.841), p < 0.001]. We therefore used the mean attractiveness rating of each target across all raters as the unit of analysis.
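For readers who want to reproduce these agreement and correlation statistics outside R, here is a standard-library Python sketch. It treats raters as the "items" of Cronbach's α (rows are rated targets, columns are raters) and computes a 95% confidence interval for Pearson's r with the Fisher z transformation, the approach R's `cor.test()` uses for Pearson correlations. The text does not state how the reported intervals were computed, and the ratings below are invented for illustration.

```python
import math
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def cronbach_alpha(ratings: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are rated targets, columns are raters (the 'items')."""
    k = len(ratings[0])
    if k < 2:
        raise ValueError("alpha needs at least two raters")
    # Variance of each rater's scores across targets.
    rater_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the summed score per target.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Confidence interval for r via z = atanh(r) with standard error 1 / sqrt(n - 3)."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Hypothetical 7-point ratings: 5 voices (rows) x 3 raters (columns).
ratings = [[2, 3, 2], [4, 4, 5], [6, 5, 6], [3, 2, 3], [5, 6, 5]]
print(round(cronbach_alpha(ratings), 3))

# Hypothetical mean ratings of the same voices by two rater groups.
group_a = [2.1, 4.3, 5.8, 2.9, 5.2]
group_b = [2.5, 4.0, 5.5, 3.4, 5.6]
r = pearson_r(group_a, group_b)
print(round(r, 3), fisher_ci(r, n=len(group_a)))
```

The interval is computed on the z scale and transformed back, which is why it is asymmetric around r for values far from 0.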

#### Statistical Analyses

All analyses were conducted using R 3.5.1 and SPSS version 21 (IBM Corp., Armonk, NY, United States). To explore associations between the measured and rated voice parameters in speech and song, we computed Pearson correlations, and we used paired t-tests to test for possible differences between the two vocal displays.

Relationships between the four exogenous variables (waist-to-hip or waist-to-shoulder ratio, height, weight, and age), the mediating acoustic qualities (speech and singing F0 and F0 range), speech and singing attractiveness, and the total sociosexuality score were investigated using path analysis. The structural model contained 6 correlations and 38 regression coefficients. The analysis was conducted using the sem() function from the lavaan package. Because the ratio of observations to estimated parameters was small (as low as 1.66 in the male sample), robust p values were obtained using Monte Carlo simulation: the distribution of expected correlation and regression coefficients was derived from 10,000 simulation runs, in each of which the full model was estimated on a randomized dataset. The influence of individual data points was addressed by jackknife resampling: removing one observation at a time, we extracted all measures, including standardized model estimates and p values. Coefficients that remained significant regardless of which data point was removed are emphasized in the main article, while full results are reported in the **Supplementary Material**. Path invariance was tested using the χ<sup>2</sup> difference between the configural invariant model, in which the structure is restricted to be equal between the groups, and the path invariant model, in which all coefficients are restricted to be equal between the groups, with degrees of freedom corresponding to the number of estimated parameters. Path invariance was evaluated between men and women and subsequently between Czech and Brazilian participants within each sex. Interrater agreement was evaluated using Cronbach's α, calculated with the alpha() function from the psych package (the code is available at https://github.com/costlysignalling/Speech-and-singingattractiveness).

Further, to test for a possible effect of singing experience on rated voice attractiveness, we computed non-parametric correlations (Kendall rank correlation, coefficient τ) between the rated attractiveness of both spoken and sung recordings and how much the participants liked to sing. To test the voice modulation hypothesis, we computed the absolute differences between singing and speaking F0, F0 range, and VTL, which gave us an index of the (dis)similarity of these vocal parameters between the two vocal displays. The larger the absolute difference between speech and singing, the greater the vocal modulation. We then correlated these absolute differences with attractiveness ratings, separately for men and women. In these analyses, we did not control for multiple comparisons across tests, because the samples were independent.
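The modulation index and the rank correlation described above can be sketched as follows, assuming one speech and one singing value per target (e.g., F0 in semitones) and an ordinal "liking to sing" score. Kendall's τ is implemented here in its tau-b form, which corrects for ties, as are common with ordinal scales; the article does not specify which variant was used, so this is an assumption. The function names are hypothetical.

```python
import math


def modulation_index(speech_values, singing_values):
    """Absolute singing-minus-speech difference per target (e.g., F0 in semitones)."""
    return [abs(sing - speak) for speak, sing in zip(speech_values, singing_values)]


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b), O(n^2) pairwise version with tie correction."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1  # pair tied on x (including joint ties)
            if dy == 0:
                tied_y += 1  # pair tied on y (including joint ties)
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n_pairs = n * (n - 1) / 2
    return (concordant - discordant) / math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
```

Because τ depends only on rank order, it does not assume that the steps of the liking-to-sing scale are equally spaced.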

Additionally, we used General Linear Models (GLM) to test for possible effects of sex, age, and country on voice attractiveness ratings. Similarly, to test whether mean F0, F0 range, and VTL of speech and singing differ between men and women or between Brazilian and Czech participants, we performed a multivariate GLM with these vocal parameters as dependent variables and the sex and country of targets as factors. Due to the limited sample size, we evaluated only simple models. Effect sizes are reported as partial eta-squared (η<sub>p</sub><sup>2</sup>).
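Partial eta-squared expresses the share of variance attributable to an effect after excluding variance explained by other effects in the model. The standard definitions below (function names are hypothetical) show that it can be obtained either from sums of squares or from an F ratio and its degrees of freedom. The second form requires the error degrees of freedom of the specific test, which depend on the full model that was fitted.

```python
def partial_eta_squared_from_ss(ss_effect: float, ss_error: float) -> float:
    """eta_p^2 = SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value: float, df_effect: int, df_error: int) -> float:
    """Equivalent form: eta_p^2 = F * df_effect / (F * df_effect + df_error)."""
    return f_value * df_effect / (f_value * df_effect + df_error)
```

The two forms agree because F = (SS_effect / df_effect) / (SS_error / df_error).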

## RESULTS

## The Effect of Targets' Sex and Country on Spoken and Sung F0, F0 Range, and VTL

We found large effects of targets' sex on all vocal parameters: mean speech F0 (F = 1074.30, df = 1, 153, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.878), mean speech F0 range (F = 14.12, df = 1, 153, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.086), VTL measured from speech (F = 2114.02, df = 1, 153, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.934), mean singing F0 (F = 736.84, df = 1, 153, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.831), mean singing F0 range (F = 7.00, df = 1, 153, p = 0.009, η<sub>p</sub><sup>2</sup> = 0.045), and VTL measured from singing (F = 1537.91, df = 1, 153, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.911). Estimated marginal means revealed that women had a higher F0, a wider F0 range, and a shorter VTL than men (for mean values, see **Table 1**). There was also a significant effect of target country on speech F0 range (F = 4.31, df = 1, 153, p = 0.040, η<sub>p</sub><sup>2</sup> = 0.028), VTL measured from speech (F = 10.49, df = 1, 153, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.065), and VTL measured from singing (F = 6.59, df = 1, 153, p = 0.011, η<sub>p</sub><sup>2</sup> = 0.042). Estimated marginal means show that Czech participants had a narrower speech F0 range and a longer VTL than Brazilian participants (see **Table 1** for details).

It is worth noting that the average VTL values for men and women (**Table 1**) are comparable to population-level averages (Pisanski et al., 2014).

## Comparisons Between Speaking and Singing Voice

F0 measured from speech was strongly positively correlated with F0 measured from singing in both men (r = 0.800, N = 73, p < 0.001) and women (r = 0.607, N = 79, p < 0.001). F0 range measured from speech was correlated with F0 range measured from singing in men (r = 0.408, N = 73, p < 0.001) but not in women (r = 0.160, N = 79, p = 0.159). Vocal tract length (VTL) as estimated from formant frequencies was strongly positively correlated between speech and singing in both men (r = 0.808, N = 81, p < 0.001) and women (r = 0.764, N = 85, p < 0.001). Vocal attractiveness rated from speech and from singing was also strongly positively correlated in both men (r = 0.720, N = 73, p < 0.001) and women (r = 0.674, N = 79, p < 0.001). Paired t-tests revealed that voices were rated as significantly more attractive from speech than from singing in both men (t = 6.66, df = 72, p < 0.001) and women (t = 3.85, df = 78, p < 0.001).

#### Structural Models

The model that analyzes fundamental frequency is not path-invariant with respect to the sex of individuals (χ<sup>2</sup> = 117.03, df = 44, p < 0.001) but is path-invariant with respect to participants' nationality (χ<sup>2</sup> = 49.58, df = 44, p = 0.26 in men; χ<sup>2</sup> = 60.68, df = 44, p = 0.05 in women). Results are therefore reported separately for men and women but jointly for Czech and Brazilian participants.

TABLE 1 | Mean fundamental frequency (F0) and range of fundamental frequency (F0 range) in semitones, and VTL (in centimeters), in men and women.

Using path analysis (see **Supplementary Tables S6**, **S7** for full models), we found that in men, lower-pitched speech was rated as more attractive (**Figure 1**). The same held for singing, but this relationship did not reach statistical significance. In men, a broader speech F0 range, but not a broader singing F0 range, was rated as more attractive. Attractive speech was positively associated with the total SOI, but this relationship was not stable under jackknife resampling. The total SOI was directly connected to a lower F0 in speech and a higher F0 in singing. Body weight had a strong positive direct effect on perceived speech and singing attractiveness. Age had a negative effect on speech attractiveness, but this effect did not remain stable under jackknifing (see **Supplementary Table S8**).

Higher-pitched female voices (in both speech and singing) were rated as more attractive. Apart from the correlation between height and weight, no other relationship was significant (see **Supplementary Tables S7**, **S9**).

The additional model that analyzed vocal tract length (VTL) was not path-invariant with respect to the sex of individuals (χ<sup>2</sup> = 109.44, df = 44, p < 0.001) but was path-invariant with respect to participants' nationality, at least in women (χ<sup>2</sup> = 66.99, df = 44, p = 0.01 in men; χ<sup>2</sup> = 59.18, df = 44, p = 0.06 in women). Results are reported separately for men and women but jointly for Czech and Brazilian participants, to allow better comparison with the original model that employs F0.
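The invariance tests above compare a more constrained model with a less constrained, nested one. The difference in χ<sup>2</sup> between nested models is approximately χ<sup>2</sup>-distributed, with degrees of freedom equal to the difference in the number of free parameters. The standard-library sketch below computes the upper-tail p value for such a difference. The helper names are hypothetical; in R, lavaan provides lavTestLRT() for this comparison, and robust estimators may call for a scaled difference test instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses the series expansion of the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_difference_test(chi2_constrained, df_constrained, chi2_free, df_free):
    """Chi-square difference (likelihood-ratio) test between nested models."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)
```

For example, chi2_sf(117.03, 44) returns a value far below 0.001, consistent with the non-invariance across sex reported above.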

Many relationships in the structural model remained similar when we replaced average F0 with apparent VTL (**Figure 2**). Nevertheless, VTL failed to predict speech or singing attractiveness reliably. In women, speech VTL and singing VTL were related to the total SOI in opposite directions. In this model, however, these relationships were stronger because the potentially mediating path between VTL and attractiveness was weaker. This was possibly because in the first model, which relied on average fundamental frequency together with F0 range, both measures of vocal quality were based on the same characteristic (F0, either as an average or as the difference between minimum and maximum), which in effect allowed us to better partition out their respective contributions to speech and singing attractiveness. In the model with VTL, which correlates tightly with average F0, the partial correlations fell below the threshold of statistical significance. All relationships were, however, in the direction that would be expected given the strong negative correlation between VTL and mean F0 (see **Supplementary Tables S10**–**S12**).
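The argument above turns on partial correlations: when two variables share much of their variance with a third, controlling for that third variable can shrink their association considerably. The standard first-order partial correlation formula below makes the mechanism concrete. The function name and the example correlations are purely illustrative and are not values from this study.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after removing the linear effect of z from both."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom
```

With hypothetical correlations r_xy = 0.5, r_xz = 0.8, and r_yz = 0.6, the partial correlation drops to about 0.04, because almost all of the x–y association is shared with z.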

## The Effect of Singing Experience and Voice Modulation on Voice Attractiveness

Non-parametric correlations showed a positive association between how much men liked to sing and attractiveness rated from both speech (τ = 0.253, N = 87, p < 0.001) and singing (τ = 0.277, N = 87, p < 0.001). In women, this association was rather weak and significant only for singing attractiveness (τ = 0.171, N = 90, p = 0.024), not for speech attractiveness (τ = 0.101, N = 91, p = 0.183). Furthermore, the absolute difference in F0 between speech and singing was positively correlated with how much men and women liked to sing (τ = 0.255, N = 90, p = 0.001; τ = 0.281, N = 93, p < 0.001, respectively). Moreover, the absolute difference in F0 was positively associated with rated singing attractiveness in both men (τ = 0.177, N = 87, p = 0.015) and women (τ = 0.294, N = 90, p < 0.001) but was not significantly associated with speech attractiveness in either men (τ = 0.123, N = 87, p = 0.092) or women (τ = 0.118, N = 90, p = 0.101). Finally, the absolute difference in F0 showed a weak, non-significant positive association with sociosexuality in men (τ = 0.139, N = 80, p = 0.069) but not in women (τ = 0.036, N = 84, p = 0.632). The absolute differences between spoken and sung F0 range and VTL were not significantly correlated with either rated attractiveness or sociosexuality.

## The Effect of Targets' Sex and Country on Voice Attractiveness Ratings From Speech and Singing

Tests of between-subjects effects in the GLM showed a significant main effect of target sex on attractiveness rated from both speech (F = 13.84, df = 1, 157, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.082) and singing (F = 36.48, df = 1, 157, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.192). Estimated marginal means revealed that female participants' voices were rated as more attractive in both speech (mean rating = 3.89, SD = 0.65) and singing (mean rating = 3.82, SD = 0.73) than male participants' voices (mean ratings = 3.48, SD = 0.66, and 3.11, SD = 0.72, respectively). There was no effect of country.

## DISCUSSION

Using a cross-cultural sample of men and women, we have shown that speech and singing attractiveness are strongly correlated. We also found a strong correlation between the fundamental frequency (F0), F0 range, and vocal tract length (VTL) in both vocal displays. In men, low-pitched speech was rated as attractive and a similar trend was observed in singing. Furthermore, both vocal displays were invariably associated with body size (but not shape) and differently associated with sociosexuality.

*Figure caption (partially recovered):* … significance stability criteria are represented with a dashed line. VTL = apparent vocal tract length; WSR = waist-to-shoulder ratio; WHR = waist-to-hip ratio.

In women, both high-pitched singing and speaking voice predicted vocal attractiveness, and similarly to men, VTL as measured from singing and speech was differently associated with sociosexuality. Most results were invariant with respect to participants' nationality, which indicates a degree of universality.

Our results partly support the hypothesis that speech and singing work as backup signals. They share many vocal parameters, such as fundamental frequency, its range and formant frequencies, which could lead to similar attractiveness ratings in both vocal displays (for similar results, see Isenstein, 2016). Both studied vocal displays thus covary in their production and perception and can transmit similar information to listeners. This is in line with previous studies which show that women's cross-modal attractiveness or masculinity as rated from faces and spoken voices are intercorrelated, although no such correlation was found in men (Valentova et al., 2016; Pereira et al., 2019).

Nevertheless, we also found some features specific to the singing or the speaking voice. For example, male speech attractiveness, but not singing attractiveness, is associated with higher sociosexuality (for similar results, see Hughes et al., 2004). The absence of an association between singing attractiveness and male sociosexuality may suggest that the singing voice is not part of the repertoire of short-term sexual strategies, at least in the two studied populations; this does not, however, exclude the possibility that singing is used to foster long-term relationships. Further, in line with previous studies, lower F0 in speech was directly connected to higher sociosexuality in men (e.g., Puts, 2005), while lower F0 in singing was connected to lower sociosexuality. Again, this could point to the use of singing as a vocal display within a committed, long-term sexual strategy, which needs to be tested in future studies.

Further, although a high F0 in both speech and singing predicted vocal attractiveness in women, only low speech F0 was rated as attractive in men, with a similar but non-significant trend in singing. This is in line with a study that found no difference in the attractiveness ratings of high- and low-pitched performances by famous singers (Neumann et al., 2008). Nevertheless, when analyzing relative vocal parameters (the difference in voice pitch between the spoken and sung voice of the same person), we found that the singing voice of individuals capable of greater pitch modulation is perceived as more attractive. In accordance with handicap theory, individuals who can produce a larger difference between their spoken baseline and their singing performance can thus benefit from higher attractiveness and, consequently, potentially higher fitness. In line with this, men who modulated their voice pitch more tended toward higher sociosexuality, and men who liked to sing more had more attractive voices. Both singing experience and a greater capacity for voice modulation are thus linked to male attractiveness and sexuality.

Interestingly, in our study speech was on average rated as more attractive than singing. This may indicate that standards of evaluation are higher in the singing domain; singing abilities (e.g., singing in tune), which are 40% heritable (Park et al., 2012) and were not tested in this study, may have contributed to this difference. However, another study found higher attractiveness ratings for singing than for speech in women, and no association between attractiveness ratings and singing quality (Isenstein, 2016). More studies are clearly needed to determine the overall pattern.

We found that body weight was a strong positive predictor of both speech and singing attractiveness in men and a weak negative predictor of singing attractiveness in women (for similar results, see, e.g., Sell et al., 2010; Xu et al., 2013; Šebesta et al., 2017). Weight also positively predicted VTL as estimated from speech in men, which is likewise in line with previous studies (for a review, see Pisanski et al., 2014). Some studies found differences in several vocal parameters (F0, voice pressure, perceptual voice quality) as a function of body weight, whereby heavier individuals have lower-pitched voices of more attractive perceptual quality (Barsties et al., 2013; Jost et al., 2018). The link between lower F0 and higher body weight could be driven by hormonal factors since, in men for example, a greater amount of fat tissue is related to lower testosterone levels (Zumoff et al., 1990; Tchernof et al., 1995). On the other hand, body weight reflects not only body fat but also muscularity, both of which are correlated with body size. Since the male body is composed of relatively more muscle than fat tissue, one could speculate that vocal attractiveness provides a reliable cue specifically to muscularity, but future studies should assess the contribution of individual body components to vocal attractiveness. We also predicted a stronger association between body size and singing attractiveness, but our results did not confirm this hypothesis. In humans, as in some songbirds (Hall et al., 2013), different vocal manifestations can thus serve as a cue to body size but not to body shape. This is in line with the finding that a lower-pitched voice affects the perception of physical dominance (Puts et al., 2007).

Although women report that they like to sing more than men do (Varella et al., 2010), and both women and men prefer sexual partners who demonstrate some musical abilities (Kaufman et al., 2016), we found no association between singing or speaking voice attractiveness and sociosexuality or body indicators in women. This is contrary to previous studies (e.g., Hughes et al., 2004), which reported that women with attractive speaking voices had a lower waist-to-hip ratio, a lower age at first sex, and a higher total number of sexual partners. Nevertheless, we found that shorter VTL measured from speech and longer VTL measured from singing predicted higher sociosexuality in women (for similar results in men, see Hodges-Simeon et al., 2011). This is comparable to our finding for men when we analyzed fundamental frequency. Generally speaking, individuals with sex-typical speech parameters and sex-atypical singing parameters have higher sexual success (see Bártová et al., 2019, for similar results on higher sociosexuality and gender non-conformity), which further supports the handicap hypothesis. Interestingly, there was no effect of VTL on voice attractiveness and no effect of voice attractiveness on sociosexuality in women. Women's tendency toward sexual variety thus does not seem to be determined by how attractive they appear to the opposite sex. Access to sexual partners in individuals who display honest signals can be influenced by other mechanisms, such as intrasexual competition (Varella et al., 2017; Ostrander et al., 2018).

This is the first study to test the potential involvement of intersexual selection in different vocal displays in a cross-cultural sample of men and women (for intrasexual selection, see Raine et al., 2018; Šebesta et al., 2019). Although we used four different vocal recordings (a standardized self-presentation, singing of "Happy Birthday," and reading and singing of the national anthem), they do not represent the full range of human speech or singing. Standardized songs, such as "Happy Birthday," are likely to limit pitch dynamics and range and thereby obscure or dampen individual differences in pitch and voice modulation that might otherwise provide important cues to fitness.

This might explain why some of our predictions were not supported, and studies using different vocal recordings, such as spontaneous speech and singing, more mating-relevant songs, or wordless singing, should be undertaken. It is possible, for instance, that a link between quality indicators and singing attractiveness becomes apparent in more demanding singing that involves complex rhythms, melody, or range (Charlton, 2014). The production of such demanding songs could be viewed as costly signaling and might therefore serve as a more reliable indicator than the relatively undemanding songs employed in this study. Moreover, future studies should perform more fine-grained vocal analyses to compare singing and speech (Šebesta et al., 2019).

It should also be taken into account that our samples in both countries were recruited from middle-class university student populations in the largest cities of both countries. The samples were thus not representative of the local populations, and only two countries were compared. More cross-cultural comparisons are needed to test the generalizability of our current findings (see Moshontz et al., 2018, on multi-lab psychological studies). Finally, as correlations between Czech and Brazilian raters were high, we pooled the ratings and did not analyze potential in-group and out-group effects, which might be addressed in future studies.

To conclude, we expected that singing would be a stronger indicator of individual body characteristics and sexuality than speech, but our results show that, cross-culturally, speech and singing seem rather to work in concert, i.e., as backup signals. The attractiveness of both the singing and the speaking voice is perceived in a similar way and is connected to a higher pitch in women and a lower pitch in men. Moreover, in men, speaking and singing serve as similar cues to body indicators. On the other hand, the speaking and singing voice relate to sociosexuality in opposite ways in both men and women. Developmental pathways leading to a sex-typical or sex-atypical speaking and singing voice and sexuality should be addressed in future studies. In general, singing, together with other vocalizations, should be taken into account in the evolutionary literature on voice production and perception.

## REFERENCES


## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Charles University IRB with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Charles University IRB (2011/07).

## AUTHOR CONTRIBUTIONS

JV and JH developed the study concept and MV expanded it. JV, MV, FM, KP, LK, and PS collected the data. JV performed the analysis of F0 and F0 range of the vocal stimuli. PŠ performed the formant analyses during revisions of the manuscript. JV and PT performed the data analysis and interpretation jointly with MV and JH. JV and MV drafted the manuscript. PT and JH provided the critical revisions. JV, MV, JH, PŠ, and PT worked on the revised version of the manuscript. All authors approved the final version of the manuscript for submission.

## FUNDING

JH was supported by the Charles University Research Centre program UNCE 204056. MV was supported by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), number PNPD 33002010037P0 – MEC/CAPES.

## ACKNOWLEDGMENTS

We are indebted to all volunteers for their participation and to Anna Pilátová, Ph.D., for English proofreading. We are grateful to Tiago Leal Dutra de Andrade for helping with data collection during the rating phase in Brasília. We further thank Prof. Dr. Vera S. R. Bussab for enabling the initial data collection phase at the University of São Paulo. We also thank the reviewers who offered valuable and critical suggestions for improvement.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02029/full#supplementary-material




aggressive and distressed speech. PLoS One 14:e0213034. doi: 10.1371/journal.pone.0213034


shape, and WHR. Pers. Indiv. Differ. 104, 313–319. doi: 10.1016/j.paid.2016.08.005


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Valentova, Tureček, Varella, Šebesta, Mendes, Pereira, Kubicová, Stolařová and Havlíček. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.