

Front. Psychol., 30 April 2019

Roar of a Champion: Loudness and Voice Pitch Predict Perceived Fighting Ability but Not Success in MMA Fighters

Pavel Šebesta1,2*, Vít Třebický1,3, Jitka Fialová1,3 and Jan Havlíček1,3*
  • 1National Institute of Mental Health, Klecany, Czechia
  • 2Faculty of Humanities, Charles University, Prague, Czechia
  • 3Faculty of Science, Charles University, Prague, Czechia

Historically, antagonistic interactions have been a crucial determinant of access to various fitness-affecting resources. In many vertebrate species, information about relative fighting ability is conveyed, among other things, by vocalization. Previous research found that men's upper-body strength can be assessed from voice. In the present study, we tested formidability perception of intimidating vocalization (roars) and a short utterance produced by amateur male MMA fighters attending the amateur European Championships in relation to their physical fitness indicators and fighting success. We also tested acoustic predictors of the perceived formidability. We found that body height, weight, and physical fitness failed to predict perceived formidability either from speech or from the roars. Similarly, there was no significant association between formidability of the roars and utterances and actual fighting success. Perceived formidability was predicted mainly by roars' and utterances' intensity and roars' harmonics-to-noise ratio and duration. Interestingly, fundamental frequency (F0) predicted formidability ratings in both roars and utterances but in an opposite manner, so that low F0 utterances but high F0 roars were rated as more formidable. Our results suggest that formidability perception is primarily driven by intensity and duration of the vocalizations.


Historical and ethnographic evidence shows that physical encounters were a frequent way of resolving conflicts (Manson et al., 1991; Keeley, 1997). Cross-culturally, man's fighting ability is a powerful determinant of access to resources (Daly and Wilson, 1988). These findings are complemented by psychological studies which show that stronger men are more prone to anger (Archer and Thanzami, 2007; Sell et al., 2009b). One may therefore expect that cognitive processes evolved for assessing the threat potential of a prospective opponent (Sell et al., 2009a; Puts, 2010). Earlier research tended to focus on visual cues to the threat potential. It has been demonstrated, for instance, that people can relatively accurately assess physical strength from images of body and face (Sell et al., 2009a; Holzleitner and Perrett, 2016; Kordsmeyer et al., 2018). Moreover, it seems that based on facial images raters can predict winners of mixed martial arts (MMA) fights (Třebický et al., 2013; Little et al., 2015; but see Třebický et al., 2019).

The cues to threat potential are not restricted to the visual modality, but evidence regarding vocal indicators of threat potential is rather mixed. On the one hand, it was reported that both men and women can accurately assess men's physical strength from voice irrespective of the language used (Sell et al., 2010). On the other hand, fighting ability assessed by acquaintances did not correlate with ratings of fighting ability based on vocal stimuli (Doll et al., 2014). Han et al. (2017) likewise reported no association between a composite measure of threat potential, consisting of handgrip strength, body height and weight, and the perceived vocal threat potential.

Importantly, all of the abovementioned studies used speech as their acoustic stimuli. Humans, however, also produce various other vocalizations, such as laughter, roars, screams and grunts, and these have so far received only limited attention. This contrasts with evidence from a number of vertebrate species, including primates, which shows that vocal displays are frequently part of male intrasexual competition (Bradbury and Vehrencamp, 2011) and can indicate fighting ability (for evidence in red deer, see Clutton-Brock and Albon, 1979; for baboons, see Kitchen et al., 2003). In humans, it has recently been shown that tennis players who produce grunts with a lower fundamental frequency (F0) are more likely to win and that listeners can to some extent predict match outcome from the grunts (Raine et al., 2017). Similarly, Raine et al. (2018a) reported that listeners accurately assess relative strength and body height from aggressive roars in both men and women.

In our complementary study, we tested predictors of perceived formidability using acoustic cues. It ought to be noted, however, that Raine et al. (2018a) and the current study differ in several important respects. First of all, Raine et al. focused on two important components of threat potential (height and strength), but threat potential and/or perceived formidability undoubtedly include other components as well. These may include morphological characteristics, such as body weight and lean muscle mass, as well as physical abilities other than isometric strength, for instance respiratory fitness. Secondly, while one can expect that threat potential is a predictor of outcomes of real-life fights, it cannot be entirely equated with fighting success.

To address these questions, we recorded both verbal and non-verbal vocalizations (utterances and roars) of amateur male MMA athletes and collected (i) measurements of their body composition, isometric strength, and lung capacity (spirometry), and (ii) data on their fighting success.

We hypothesized that formidability perceived from vocalization should correlate with height, weight, and muscle mass as well as physical fitness indicators, such as strength and lung capacity. We also predicted that perceived formidability is positively associated with fighting success. Further, we performed an acoustic analysis to identify which parameters predict the perception of formidability from both roars and utterances. We hypothesized that perceived formidability is related to the F0 and intensity in both verbal and non-verbal vocalizations.

Materials and Methods

All procedures applied in this study were in accordance with ethical standards of the responsible committee on human experimentation and with the Helsinki Declaration. The study was approved by the Institutional Review Board of the National Institute of Mental Health, Czech Republic (Ref. num. 28/15). All target participants were provided with a brief description of the study and approved their participation by signing informed consent. The present study is part of a larger project investigating multi-modal perception of traits associated with sexual selection and characteristics related to competition outcome.


Data collection took place during the 2016 IMMAF European Open Championships of Amateur MMA held in Prague (Czech Republic), which hosted a total of 155 contestants (incl. 20 women) from 30 countries (based on data from …). Contestants were approached by researchers during registration on site, 1 day before the start of the tournament. We focused on male athletes because championship attendance was highly biased toward male athletes and we thus managed to collect data from only three female athletes.

Forty male amateur MMA fighters (mean age = 24, SD = 4.4, range = 19–33 years), naïve to our project's aims, participated in the study. To assess a possible effect of weight category, we merged the weight categories used by competition organizers (Flyweight, N = 1; Bantamweight, N = 7; Featherweight, N = 4; Lightweight, N = 4; Welterweight, N = 7; Middleweight, N = 5; Light Heavyweight, N = 5; Heavyweight, N = 4; and Super Heavyweight, N = 3) into just three categories: Lightweight (N = 12; consisting of Flyweight, Bantamweight, and Featherweight categories), Middleweight (N = 16; consisting of Lightweight, Welterweight, and Middleweight categories) and Heavyweight (N = 12; consisting of Light Heavyweight, Heavyweight, and Super Heavyweight), following the procedure in Třebický et al. (2013). All targets reported their basic demographics, age, and total fighting record, from which we computed their fighting success as the proportion of wins relative to the total number of fights. Fighting success was calculated only for fighters whose record included more than two fights. Analyses involving fighting success are therefore based on 29 individuals. For technical reasons, we managed to obtain lung capacity measures from 34 individuals. For descriptive statistics, see Table 1. All other analyses are based on the complete dataset of 40 individuals. Participants in the study were financially reimbursed with 400 CZK (approx. €15) and verbally debriefed upon completing their participation.
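The bookkeeping described above can be illustrated with a minimal Python sketch. It merges the organizers' weight categories into the three analysis classes and computes fighting success as the proportion of wins. The names `MERGED_CLASS`, `ORIGINAL_COUNTS`, `merged_class_counts`, and `fighting_success` are illustrative assumptions and not part of the study's materials; the category counts are those reported in the text.

```python
from collections import Counter

# Illustrative mapping (names are assumptions): organizer category -> merged class.
MERGED_CLASS = {
    "Flyweight": "Lightweight",
    "Bantamweight": "Lightweight",
    "Featherweight": "Lightweight",
    "Lightweight": "Middleweight",
    "Welterweight": "Middleweight",
    "Middleweight": "Middleweight",
    "Light Heavyweight": "Heavyweight",
    "Heavyweight": "Heavyweight",
    "Super Heavyweight": "Heavyweight",
}

# Number of targets per organizer category, as reported in the text.
ORIGINAL_COUNTS = {
    "Flyweight": 1, "Bantamweight": 7, "Featherweight": 4,
    "Lightweight": 4, "Welterweight": 7, "Middleweight": 5,
    "Light Heavyweight": 5, "Heavyweight": 4, "Super Heavyweight": 3,
}


def merged_class_counts(counts):
    """Sum the per-category counts within each merged weight class."""
    merged = Counter()
    for category, n in counts.items():
        merged[MERGED_CLASS[category]] += n
    return dict(merged)


def fighting_success(wins, total_fights):
    """Proportion of wins; None when the record has two or fewer fights."""
    if total_fights <= 2:
        return None
    return wins / total_fights
```

Returning `None` for short records mirrors the exclusion rule in the text, so that such fighters drop out of the fighting-success analyses rather than contributing unstable proportions.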


Table 1. Target descriptive statistics.

Body Measurements

Body height was measured by Vít Třebický using a Trystom A-213 anthropometer. Participants stood with their back against a wall, looking directly ahead, and body height was measured from the vertex to the ground to the nearest millimeter (Hall et al., 2007).

Body weight, amount of body fat, and muscle mass were measured by Jitka Fialová (JF) using a Tanita MC-980 bioimpedance scale (Athlete setting; Vaara et al., 2012). Participants stood on the foot electrodes and held the hand electrodes, with their arms hanging freely along the body. Participants were wearing underwear only (Pinilla et al., 1992).

Physical Fitness Measurements

Handgrip strength was measured by JF using a Takei TKK 5401 digital hand dynamometer (Vidal Andreato et al., 2011; Bonitch-Góngora et al., 2013). While undergoing the handgrip test, the athletes were instructed to stand straight with arms alongside their body. They had 3 attempts with each hand, alternating hands between attempts, and we used the “best test” method, meaning that the attempt with the highest value of handgrip strength for each hand was recorded. Maximal handgrip strength of the left and right hand was closely correlated (r = 0.808, 95% CI [0.664, 0.894], p < 0.001, N = 40) and a paired sample t-test showed no statistical difference between the maximal strength of the left and right hand [t(39) = 0.618, p = 0.54, mean difference = 0.6 kp]. In all further analyses involving handgrip strength, we therefore represent handgrip strength by the mean of the two hands' “best test” scores.
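As a small illustration of the “best test” scoring described above, the following Python sketch takes the best attempt per hand and averages the two. The function names and the example values are hypothetical and serve only to show the arithmetic.

```python
def best_test(attempts):
    """Return the highest value among repeated attempts ("best test" method)."""
    return max(attempts)


def handgrip_score(left_attempts, right_attempts):
    """Mean of the best left-hand and best right-hand attempts."""
    return (best_test(left_attempts) + best_test(right_attempts)) / 2


# Hypothetical attempts in kp for one athlete (not data from the study).
example_score = handgrip_score([48.0, 51.5, 50.0], [52.0, 49.5, 53.5])
```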

Measures of lung capacity were taken by JF using a MicroLab ML3500 MK8 spirometer. Three standing forced vital capacity (FVC) maneuvers were performed, the “best test” method was applied, and we recorded the maneuver with the highest FVC value along with forced expiratory volume in the first second (FEV1) and peak expiratory flow (PEF). The “best test” method is a widely used and recommended approach in research employing spirometry (Crapo et al., 1981; Havryk et al., 2002). FVC is the maximal volume of air exhaled with maximally forced effort from a maximal inspiration; in other words, it is vital capacity performed with a maximally forced expiratory effort. FEV1 is the maximal volume of air exhaled in the first second of forced expiration from a position of full inspiration, and PEF represents the maximum expiratory flow achieved by maximum forced expiration from the point of maximal lung inflation (Miller et al., 2005).
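One reading of the spirometry “best test” rule is that the maneuver with the highest FVC is selected and its accompanying FEV1 and PEF values are kept, even if another maneuver showed a higher FEV1 or PEF. The Python sketch below follows that reading; the dictionary keys, the function name, and the example values are assumptions made for illustration.

```python
def best_spirometry_maneuver(maneuvers):
    """Select the maneuver with the highest FVC.

    Each maneuver is a dict with the (hypothetical) keys 'fvc', 'fev1', 'pef'.
    FEV1 and PEF are taken from the selected maneuver, not maximized separately.
    """
    return max(maneuvers, key=lambda m: m["fvc"])


# Hypothetical values for one athlete (FVC and FEV1 in liters, PEF in liters/second).
example_maneuvers = [
    {"fvc": 5.1, "fev1": 4.2, "pef": 9.8},
    {"fvc": 5.4, "fev1": 4.1, "pef": 9.5},
    {"fvc": 5.2, "fev1": 4.4, "pef": 10.1},
]
example_best = best_spirometry_maneuver(example_maneuvers)
```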

Vocal Stimuli Recordings and Processing

Acoustic stimuli were recorded by Pavel Šebesta (PŠ) using a Sony PMC-D90 portable audio recorder (in-built microphone sensitivity 20–40 kHz). The recorder was equipped with a windscreen (AD-PCM1), mounted on a tripod with an acoustic reflection shield, and placed in a portable, acoustically treated booth to reduce any potential echoes and ambient noises. Recordings were captured at 24 bit/96 kHz in WAV format. Participants stood 1.5 meters from the recorder, and the recording level setting was kept constant in the course of all recordings to standardize recording intensity and to prevent clipping.

Participants were instructed to count from 1 to 10 in their native language and then perform three intimidating roars (their instruction was: “Roar three times, as much as you can, to intimidate a potential opponent”). For ratings and analyses, we use only the second roar because the first might be affected by the novelty of the task and the third by a potential decrease of effort (for differences between the three roars, see Supplementary Material Tables S1–S11). For examples of roars see Audios S1, S2 and for utterances see Audios S3, S4.

Subsequent processing and acoustical analysis were performed by PŠ in Audacity 2.1.3 (Audacity Team, 2018) and Praat 5.4.09 (Boersma and Weenink, 2015). Roar and utterance levels were increased by +20 dB and +35 dB, respectively, while interindividual variation in vocalization intensity remained unmodified. This intensity adjustment was necessary because most utterance recordings were not sufficiently loud even at the highest volume settings. The adjustment applied to the roars was the highest possible that did not introduce clipping in any of the recordings. We measured the mean intensity and duration of volume-adjusted utterances and roars. Mean F0 was measured by the autocorrelation method. Preset parameters for F0 extraction were used, with a 75 Hz pitch floor in accordance with Praat programmers' recommendations and a 300 Hz pitch ceiling based on a visual inspection of spectrograms (for a similar approach, see also Šebesta et al., 2017). The 300 Hz pitch ceiling recommended for utterances was not suitable for the roars. We visually inspected Praat's pitch contours in the Editing window. Most roar recordings showed erroneous F0 measurements (see Figure S1 for an example), which rendered the standard Praat F0 extraction method unreliable for this type of acoustic stimuli (for similar issues with F0 extraction, see Raine et al., 2017). F0 tracking frequently failed in the middle of a recording or even unexpectedly “jumped down.” This is possibly due to chaotic and subharmonic phenomena found in roars (Fitch et al., 2002). For this reason, we decided to use, as an F0 analog, the long-term averaged Fast Fourier transformed (FFT) spectral peak frequency (see Figure S2 for an example), corresponding to the first harmonic (verified by a visual inspection of harmonic structure).
Further, we used standard Praat methods for harmonics-to-noise ratio (HNR; autocorrelation method, preset parameters) measurements for whole utterance recordings and for one-second-long snippets from the initial part of the roars, close to the spectrogram plateau, where Praat's autocorrelation algorithm was able to track F0. Mean formant levels in speech (F1–F4) were measured by the Burg method. In roars, however, only a peak around 2–3 kHz (which is in the expected range for the third formant) was apparent by a visual inspection of long-term average spectra (LTAS) and clearly distinguishable from other harmonics. Audacity's “Plot spectrum” feature (Spectrum, 1,024 window size, Hanning window) was used for the 2–3 kHz peak measurement. Because we were able to reliably extract only the third formant (F3) from roars, and because the first and second formants in speech are highly affected by speech content, we decided to use in subsequent analyses only the third formant of both utterances and roars to enable comparison.
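The idea of using a long-term averaged spectral peak as an F0 analog can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the procedure used in Audacity or Praat: the function name, the frame size, and the 75–1,000 Hz search band are hypothetical choices, and the sketch simply assumes that the strongest averaged peak within the band is the first harmonic, whereas in the study this was verified by visual inspection. A plain discrete Fourier transform is used to keep the example self-contained; an FFT library would normally be preferred.

```python
import cmath
import math


def ltas_peak_frequency(samples, sample_rate, frame_size=512, fmin=75.0, fmax=1000.0):
    """Frequency (Hz) of the largest peak in the long-term averaged magnitude
    spectrum, searched between fmin and fmax (both hypothetical defaults)."""
    # Hann window to limit spectral leakage between neighboring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    bin_hz = sample_rate / frame_size
    k_lo = max(1, math.ceil(fmin / bin_hz))
    k_hi = min(frame_size // 2, math.floor(fmax / bin_hz))
    summed = [0.0] * (k_hi - k_lo + 1)
    n_frames = 0
    # Non-overlapping frames; magnitudes are accumulated and then averaged.
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        for i, k in enumerate(range(k_lo, k_hi + 1)):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            summed[i] += abs(coeff)
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one analysis frame")
    averaged = [value / n_frames for value in summed]
    peak_index = max(range(len(averaged)), key=averaged.__getitem__)
    return (k_lo + peak_index) * bin_hz


# Synthetic check signal (not real data): 250 Hz fundamental plus weaker harmonics.
RATE = 8000
signal = [math.sin(2 * math.pi * 250 * n / RATE)
          + 0.5 * math.sin(2 * math.pi * 500 * n / RATE)
          + 0.25 * math.sin(2 * math.pi * 750 * n / RATE)
          for n in range(2048)]
peak_hz = ltas_peak_frequency(signal, RATE)
```

Because the result is quantized to the bin spacing (sample rate divided by frame size), longer frames give a finer frequency estimate at the cost of more computation.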

Rating Sessions

In total, 31 men (mean age = 27.1, SD = 5.2, range = 20–36 years) and 32 women (mean age = 24.4, SD = 4.3, range = 18–33 years), mainly students at the Charles University, Prague, Czech Republic, took part in rating sessions.

Raters were recruited via social media advertisements and mailing lists of participants from previous studies. After completing participation, they received 100 CZK (~€4), a small snack, and a debriefing leaflet about the purpose of the study.

Raters were asked to assess the formidability (“Jak moc by byl tento muž úspěšný, kdyby se dostal do fyzického souboje?”/“How successful would this man be if he was involved in a physical confrontation?”) of a given recording on a 7-point verbally anchored scale (from “1–velice neúspěšný”/“not successful at all,” to “7–velice úspěšný”/“highly successful”). Each participant rated all roar and utterance stimuli. To reduce participant fatigue, the rating was divided into two sessions 1 week apart. In the first session, participants rated half of the set of all roars and utterances in a randomized order. Individual stimuli within the set were randomized as well. In the second session, participants rated the remaining half of the stimuli in the same fashion.

Ratings took place in a quiet perception lab room with negligible ambient sounds. Focusrite Scarlett Solo Gen 2 audio I/O interface (22 Hz−22 kHz RCA output) and two Yamaha HS-7 active reference studio monitor speakers (43 Hz−30 kHz @ 95W, LF 60 W, HF 35 W output) were used to present stimuli in WAV format. Raters were seated 2.8 meters in front of and in focus of the speakers. We opted for speakers, instead of commonly used headphones, because it is a more ecologically valid approach to presenting stimuli in terms of sonic characteristics of roaring. Loudness of the playback was kept standard during the presentation, with the loudest roar registering 87 dB (measured with OnePlus One smartphone and Smart Tools® Sound meter 1.6.12 app). This is a level which, all authors agreed, was very naturalistic but not overwhelmingly loud.

Statistical Analyses

All statistical tests were performed in JASP (JASP Team, 2018) and jamovi (jamovi project, 2018). McDonald's ω statistic was used to estimate interrater agreement (Dunn et al., 2014). To test for potential sex differences in ratings, a paired samples t-test was carried out. Associations between ratings by men and women were tested by bivariate correlations using Pearson's r coefficient with 95% CIs [lower limit, upper limit]. Potential differences between the maximal strength of the left and right hand were tested with a paired samples t-test, and associations between left and right hand strength were tested by bivariate correlations using Pearson's r coefficient with 95% CIs. Cohen's d, as an effect size measure, was used for comparisons of means. To assess the relative contribution of performance-related and acoustic measures to the perceived roar and utterance formidability, we fitted linear mixed effects models (using REML fit) with individual rater ID and target stimulus ID as random intercepts. This approach accounted for variation on the level of individual raters and for variation on the level of individual stimuli. It also accounted for potential bias due to data aggregation. To assess acoustic predictors of fighting success, we ran a linear regression analysis (Enter method). As measures of variability explained by regression, we list model R2 values, while standardized βs and their 95% CIs are reported for entered coefficients.
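For orientation, a linear mixed effects model with crossed random intercepts for raters and stimuli, as described above, can be written as follows. The notation is ours and is given only as a sketch: y_{ij} is the formidability rating given by rater i to stimulus j, x_{kj} is the k-th predictor measured on target j, and the distributional assumptions are the standard ones for such models rather than details stated in the text.

```latex
y_{ij} = \beta_0 + \sum_{k=1}^{p} \beta_k x_{kj} + u_i + v_j + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
v_j \sim \mathcal{N}(0, \sigma_v^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Here u_i is the random intercept of rater i, v_j the random intercept of stimulus j, and \varepsilon_{ij} the residual.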

Data Availability

Datasets generated and analyzed during the current study are available in the Supplementary Material of this article (Tables S20, S21).


Sex Differences in Perceived Formidability


McDonald's ω scores of male (ω = 0.954) and female (ω = 0.933) ratings of formidability of utterances showed a high interrater agreement. We have therefore used mean formidability ratings given to the individual utterances separately by male and female raters. Perceived formidability of utterances was likewise highly correlated between men and women (r = 0.93, 95% CI [0.871, 0.963], p < 0.001, N = 40). Paired sample t-test showed a statistically significant sex difference in formidability ratings with men giving higher ratings [t(39) = 9.165, p < 0.001, Cohen's d = 1.449, mean difference = 0.368] (for descriptive statistics, see Table 2). Although mean ratings of utterance formidability differed between sexes, all further analyses are reported with ratings combined because the results are virtually the same when analyzed separately. For results based on female and male ratings separately, see Supplementary Material Tables S12–S19.


Table 2. Formidability rating descriptive statistics.


McDonald's ω scores of male (ω = 0.953) and female (ω = 0.924) ratings of roar formidability showed a high interrater agreement. In subsequent analyses, we have therefore used mean formidability ratings given to the individual roars separately by male and female raters. Further, we found a high correlation between roar formidability ratings assigned by men and by women (r = 0.973, 95% CI [0.95, 0.986], p < 0.001, N = 40). Paired sample t-test showed a statistically significant difference between the sexes in roar formidability ratings with women giving higher ratings [t(39) = 2.695, p = 0.010, Cohen's d = 0.426, mean difference = 0.132]. For descriptive statistics, see Table 2.
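The reported effect sizes can be related to the test statistics with a short Python sketch. It assumes that Cohen's d for the paired comparison was computed as the mean difference divided by the standard deviation of the differences, in which case d equals t divided by the square root of the number of pairs. The function name is hypothetical, and this is a consistency check on the reported values rather than a reanalysis of the data.

```python
import math


def paired_cohens_d_from_t(t_value, n_pairs):
    """Cohen's d for paired samples, assuming d = t / sqrt(n)."""
    return t_value / math.sqrt(n_pairs)


# Reported t values with N = 40 stimuli (df = 39).
d_utterances = paired_cohens_d_from_t(9.165, 40)
d_roars = paired_cohens_d_from_t(2.695, 40)
```

Under this assumption, both values agree to three decimals with the Cohen's d values given in the text (1.449 and 0.426).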

Formidability of Utterances and Roars as a Predictor of Fighting Success

To test whether formidability perception from roars and utterances predicts fighting success, we ran bivariate Pearson's correlations. We found that neither in utterances (r = −0.045 95% CI [−0.405, 0.327], p = 0.817, N = 29) nor in roars (r = −0.115 95% CI [−0.462, 0.263], p = 0.554, N = 29) was formidability perception associated with actual fighting success. To explore whether the effect is modulated by the weight categories, we grouped the fighters in three weight categories (lightweight, middleweight, and heavyweight) and entered this variable into the linear regression. Even after this modification, however, the overall model was not formally significant either in utterances [F(3, 25) = 1.841, p = 0.166, R2 = 0.181] or in roars [F(3, 25) = 0.683, p = 0.571, R2 = 0.076].
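The confidence intervals reported for the correlations are consistent with the usual Fisher z-transformation. The Python sketch below shows that computation; it is an assumption that the statistical software used this method, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Confidence interval for Pearson's r via the Fisher z-transformation."""
    z = math.atanh(r)                    # Fisher z of the sample correlation
    se = 1.0 / math.sqrt(n - 3)          # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Reported correlations between perceived formidability and fighting success (N = 29).
ci_utterances = pearson_ci(-0.045, 29)
ci_roars = pearson_ci(-0.115, 29)
```

With only 29 fighters, both intervals are wide and include zero as well as moderate correlations in either direction, which is worth keeping in mind when interpreting the null result.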

Physical Fitness Predictors of Perceived Formidability

First, we ran exploratory correlational analyses to assess relationships between the physical fitness variables (see Supplementary Material Table S22). Body weight, muscle mass, and fat mass were all highly positively intercorrelated (rs > 0.757, ps < 0.001, N = 40). To avoid collinearity and to facilitate interpretation of the findings, we used only body weight in the subsequent analyses. FVC and FEV1 spirometry measures were likewise highly positively correlated (r = 0.935, 95% CI [0.872, 0.967], p < 0.001, N = 34), which is why we decided to omit FEV1 from subsequent analyses.

Linear mixed model analyses were run with age, height, weight, FVC, PEF, and handgrip strength as fixed effect predictors to assess whether physical fitness parameters predict the perceived formidability of utterances and roars. The overall model for utterances explained 44.9% of variance (R2 conditional) and fixed factors explained 5.4% of variance (R2 marginal). None of the physical fitness predictors for the formidability of utterances was formally significant. The overall model for roars explained 60.1% of variance (R2 conditional), while fixed factors explained 8.2% of variance (R2 marginal). Similarly, none of the predictors of perceived formidability in roars were significant. For an overview of the results, see Table 3.
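The text does not state how the marginal and conditional R2 values were computed. Under one commonly used variance-partitioning definition, which we give here only as an assumed sketch, they are:

```latex
R^2_{\text{marginal}} =
  \frac{\sigma_f^2}{\sigma_f^2 + \sigma_u^2 + \sigma_v^2 + \sigma_\varepsilon^2},
\qquad
R^2_{\text{conditional}} =
  \frac{\sigma_f^2 + \sigma_u^2 + \sigma_v^2}{\sigma_f^2 + \sigma_u^2 + \sigma_v^2 + \sigma_\varepsilon^2}
```

Here \sigma_f^2 is the variance of the fixed-effect predictions, \sigma_u^2 and \sigma_v^2 are the random-intercept variances for raters and stimuli, and \sigma_\varepsilon^2 is the residual variance (all symbols are our notation). On this reading, the difference between the two values, for example 44.9% − 5.4% = 39.5% for utterances, is the share of variance attributable to the rater and stimulus intercepts.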


Table 3. Summary of linear mixed effects model analysis for physical fitness predictors of perceived fighting ability based on utterances and roars.

Acoustic Predictors of Perceived Formidability

Linear mixed model analyses were run to predict perceived formidability from utterances and roars with F0, F3, HNR, intensity, and duration entered as independent predictors. For utterances, the overall model explained 44.1% of variance (R2 conditional), while fixed factors explained 9.6% of variance (R2 marginal). We found that F0 and intensity are significant predictors of perceived formidability. In the case of roars, the overall model explained 57% of variance (R2 conditional) and fixed factors explained 37.5% of variance (R2 marginal). We further found that perceived formidability was predicted by the F0, HNR, intensity, and duration. For full detail, see Table 4.


Table 4. Summary of linear mixed effects model analysis for acoustic predictors of perceived formidability based on utterances and roars.

Acoustic Predictors of Fighting Success

To explore whether any acoustic parameters predict actual MMA fighting success, we ran a multiple linear regression analysis for both utterances and roars. Overall models were not statistically significant in either utterances or roars [Utterances: F(5, 23) = 0.774, p = 0.578, R2 = 0.144; Roars: F(5, 23) = 1.107, p = 0.384, R2 = 0.194]. For full results, see Table 5.
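The reported F statistics follow from the R2 values and the degrees of freedom through the standard relation for the overall test of a linear regression. The Python sketch below reproduces them as a consistency check; the function name is hypothetical, and small deviations are expected because the R2 values in the text are rounded.

```python
def f_from_r2(r2, n_predictors, n_obs):
    """Overall F statistic and degrees of freedom for a linear regression,
    computed from R^2, the number of predictors, and the sample size."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    f_value = (r2 / df_model) / ((1 - r2) / df_resid)
    return f_value, df_model, df_resid


# Five acoustic predictors (F0, F3, HNR, intensity, duration), N = 29 fighters.
f_utterances = f_from_r2(0.144, 5, 29)
f_roars = f_from_r2(0.194, 5, 29)
```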


Table 5. Summary of multiple linear regression analysis for acoustic predictors of fighting success based on utterances and roars.


The main goal of this study was to test whether a perception of formidability based on intimidating roars and non-intimidating utterances is related to body parameters, such as body height and weight, and to some relevant aspects of physical fitness, such as strength and lung capacity. We have also tested whether perceived formidability is related to actual fighting success. Finally, we performed an acoustic analysis to investigate which parameters predict perceived formidability and fighting success. In contrast to our predictions, we found that body height, weight, and muscle mass did not predict perceived formidability from either speech or roars. We also found no significant association between formidability of the roars and utterances and actual fighting success. Finally, our acoustic analysis showed that the intensity (the acoustic analog of loudness) of both speech and roars is the strongest predictor of perceived formidability. In roars, but not in utterances, lower HNR and longer duration predicted perceived formidability. Moreover, while lower voices (lower F0) were perceived as more formidable in utterances, the opposite held for the roars.

Our negative findings concerning the association of body height and strength with the perceived formidability of the roars contrast with results reported in a recent paper by Raine et al. (2018a), where the authors found that listeners could predict relative body height and handgrip strength from both speech and roars. Such results are further supported by another study which showed a positive association between handgrip strength and perceived strength based on speech (Sell et al., 2010). On the other hand, two other studies found no association between threat potential and perceived fighting ability/dominance from speech (Doll et al., 2014; Han et al., 2017).

There are several possible explanations for such striking differences between our study and the results reported in Raine et al. First of all, both Raine et al. (2018a) and Sell et al. (2010) asked participants specifically to assess strength, while our participants rated formidability. Although strength certainly contributes to overall formidability, there are other important factors which influence it, such as agility or endurance. Moreover, differences in the perceptual attributes that raters are asked to judge can also affect the association with measures of formidability. Since our main goal was to investigate how people perceive threat potential based on acoustic cues, we used a broader concept of formidability instead of focusing narrowly on the perception of strength. To resolve this issue, future studies should compare ratings of strength and formidability based on acoustic cues and their correlates while employing the same set of stimuli (for results based on the perception of faces and bodies, see Sell et al., 2010).

Secondly, Raine et al. (2018a) used an ego-centered approach in their ratings, i.e., their participants assessed strength relative to their own strength. We agree that perceivers may be particularly sensitive when it comes to estimating their own chances of winning a confrontation. Nonetheless, several other studies did use absolute ratings, including ratings of strength from speech (e.g., Sell et al., 2010), and found positive results. It is possible that even under these conditions, people tend to use the scale relative to their own prospects. It could also be argued that because our targets were experienced fighters, there should be no difference between the relative and absolute ratings because the vast majority of student listeners would rate their formidability as lower than that of MMA fighters in either case. This is supported by a comparison of mean values of handgrip strength between our study (Table 1) and that of Raine et al. (2018a; Supplementary Information, p. 4), although this is only a very approximate estimate because the two studies used different types of dynamometers and the resulting values therefore cannot be directly compared. Alternatively, people might be able to assess formidability irrespective of ego involvement. This is supported by a study which explicitly used the bystander paradigm (Little et al., 2015). In particular, raters were asked to judge from facial photographs who would win a fight, and they were successful above the chance level. Once again, to obtain more fine-grained insights into how ego-related context affects the cognitive processes of formidability assessments, future investigations should compare these approaches directly.

Thirdly, in our study we used vocal stimuli from MMA fighters who have extensive experience with physical encounters, and some fighters produce roars when winning a fight. It would seem advantageous to employ such a group of participants rather than, for instance, students who are likely to have limited experience with both fighting and roaring. A potential drawback of our sample of fighters may be that because of intense training, they display little variability in their handgrip strength. Inspection of variation estimates, such as SD, shows that this was not the case (see Table 1). The sample size of our stimuli was rather moderate (N = 40), but a related study by Raine et al. (2018a) reported a positive effect based on a smaller sample of male stimuli.

Finally, one could argue that formidability perception of the roars is related to effort. This is supported by our acoustic analysis, which showed that intensity and duration were the strongest predictors of formidability judgements. It is thus possible that in our sample, motivation and consequently also effort invested in the roars varied among our participants and as a result may have obscured some of the associations with physical characteristics. Alternatively, and perhaps most importantly, the full expression of intimidating roars is not under complete volitional control, which is why it is possible that it can be expressed only in the appropriate context (e.g., when conflict is imminent). Using on-demand roars might not be a problem for judgements of strength but could be a key factor in formidability inferences. Although we acknowledge that this might be a logistically challenging task, the use of real-life non-verbal vocal stimuli which vary little in their motivation and/or effort should thus be preferred. An excellent example in this context is the study by Raine et al. (2017), who used the grunts of professional tennis players as their stimuli.

The acoustic analysis showed that for formidability judgements, intensity and duration are the most salient predictors. This is in agreement with studies on various vertebrate species. For example, male green frogs (Rana clamitans) react differently to calls produced by large males as opposed to small ones (Bee et al., 1999; but see Bee, 2002). Similarly, more dominant male baboons produce longer and louder calls (so called “wahoos”) during contest vocalizations (Fischer et al., 2002; Kitchen et al., 2003). Interestingly, many studies on speech perception standardize their vocal stimuli for intensity (because reliable measures of acoustic intensity are logistically difficult to acquire) and therefore cannot assess intensity's contribution to the respective perceptual attribute. However, our results, as well as studies on perception of affective states and intentions (Scherer, 1986, 2003; Siegman et al., 1990; Banse and Scherer, 1996), show that loudness (i.e., the perceptual analog of voice intensity) is an integral and significant part of voice perception. Indeed, the same verbal content expressed in a soft, moderate, or loud voice often has a very different impact on perceivers (Patel et al., 2011). We further found that a low HNR of roars, but not of utterances, is associated with high formidability ratings. Previous studies also show more noise in threatening calls than in non-threatening vocalizations and a higher perturbation (lower HNR) in anger vocalization in humans (Patel et al., 2011).

Finally, fundamental frequency was negatively associated with formidability of speech, while associating positively with the formidability of roars. The results of formidability judgements from speech are in agreement with other studies which consistently show that male voices with a lower voice pitch (the perceptual analog of fundamental frequency) are perceived as more dominant and attractive (Puts et al., 2006). This could be a consequence of sexual dimorphism in voice pitch (Rendall et al., 2005; Markova et al., 2016). In contrast, our finding of a positive association between fundamental frequency and formidability judgements of roars came at first as a surprise. On the other hand, one could take into account that high-pitched voices might, similarly to intensity, provide cues about the effort and affective state of the producers, whereby those in a state of high arousal would produce higher F0 roars. This speculation is supported by studies showing that arousal leads to an increase in voice pitch, perhaps as a consequence of tension in the glottal area (Ekman et al., 1976; van Mersbergen et al., 2017). Moreover, high pitch is in some species associated with threat vocalizations (Stirling, 1971; Portfors, 2007) and in humans, it is associated with anger vocalizations (Scherer and Oshinsky, 1977; Frick, 1986). Fitch et al. (2002) have proposed that subharmonics (integer fractions of F0) are more prevalent in loud calls and that one of the hypothesized effects of this phenomenon is a perceptually lower pitch. In other words, a loud vocalization could sound lower-pitched than a vocalization produced by the same individual with the same F0 at moderate loudness. Although we were able to detect subharmonic phenomena in a number of high intensity roars in our sample (see Supplementary Material Table S10), this effect should be systematically investigated in future studies.
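The subharmonic argument can be illustrated with a small synthetic example. This is a sketch under stated assumptions, not the procedure used in this study: the 200 Hz tone, the subharmonic amplitude of 0.5, and the simple autocorrelation estimator are all hypothetical choices made for the illustration. Adding a component at F0/2 leaves the F0 component unchanged but doubles the period over which the waveform repeats, which is one reason a listener or a pitch tracker may register a lower pitch.

```python
import math

RATE_HZ = 8000  # sampling rate of the synthetic signal (assumption)

def synth(f0_hz, subharmonic_amp, n_samples):
    """Sine at f0_hz plus an optional component at f0_hz / 2."""
    out = []
    for i in range(n_samples):
        t = i / RATE_HZ
        out.append(math.sin(2 * math.pi * f0_hz * t)
                   + subharmonic_amp * math.sin(2 * math.pi * (f0_hz / 2) * t))
    return out

def repetition_rate_hz(samples, min_lag=20, max_lag=100, window=800):
    """Estimate the waveform repetition rate from the autocorrelation.

    Returns the rate for the shortest lag whose autocorrelation is
    within 0.1% of the maximum, so exact multiples of the true period
    are not mistaken for the period itself.
    """
    scores = {}
    for lag in range(min_lag, max_lag + 1):
        scores[lag] = sum(samples[i] * samples[i + lag] for i in range(window))
    best = max(scores.values())
    shortest = min(lag for lag, score in scores.items() if score >= 0.999 * best)
    return RATE_HZ / shortest

plain = synth(200.0, 0.0, 1000)          # F0 only
with_subharmonic = synth(200.0, 0.5, 1000)  # same F0 plus F0/2

print(repetition_rate_hz(plain))             # 200.0
print(repetition_rate_hz(with_subharmonic))  # 100.0
```

Real roars are far noisier than these tones, so the sketch only shows the direction of the effect that Fitch et al. (2002) describe.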

To summarize, we found no significant association between formidability perception of the intimidating roars produced by the MMA fighters and their body height, weight, and physical fitness indicators such as handgrip strength or lung capacity. Neither did we find a correlation between the perceived formidability of their roars and their actual fighting success. This might be because accurate judgements of formidability can be made only on the basis of real-life roars, which cannot be reliably produced on demand. It may also be relevant that while roars might be primarily interpreted as intentions (e.g., as an affective state of anger), utterances might be interpreted primarily as a characteristic of the individual (e.g., as a level of dominance). Alternatively, the association between some acoustic parameters and perceived formidability might be the result of sensory exploitation and have only limited predictive value for actual formidability (Feinberg et al., 2018). We also found that the main acoustic predictors of formidability in roars are intensity, HNR, duration, and to some extent also fundamental frequency. In a broader context, our study points to a need for further investigation of non-verbal vocalizations in humans. Scholars seem to be so blinded by humans' exceptional gift of speech that they tend to almost completely overlook the fact that this is not our only vocalization. Non-verbal vocalizations are cross-culturally prevalent in the human social milieu. This applies not only to preverbal infants (see for instance Lindová et al., 2015) but also to adult humans who produce a wide variety of non-verbal vocalizations in diverse contexts, such as co-laughter, painful injuries, aggressive confrontations, and sexual encounters, to name just a few (for some pioneering studies, see Bryant et al., 2016; Raine et al., 2018a,b). We are confident that research into these non-verbal vocal displays will greatly contribute to our understanding of the complexity of human vocal expressions and perhaps also to the evolutionary history of verbal communication in general (Hauser et al., 2002).

Author Contributions

PŠ, VT, JF, and JH developed the study concept. Data collection was performed by PŠ, VT, and JF. PŠ performed acoustic analysis of vocal stimuli. VT and PŠ performed data analysis and interpretation jointly with JF and JH. JH, PŠ, and VT drafted the manuscript and JF provided critical revisions. All authors approved the final version of the manuscript for submission.


Funding

This research was supported by the Czech Science Foundation GAČR P407/16/03899S and by the Ministry of Education, Youth, and Sports (MEYS) NPU I program (No. LO1611) and PROGRES program Q22 at the Faculty of Humanities, Charles University within the Institutional Support for Long-Term Development of Research Organizations from MEYS.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


Acknowledgments

We would like to thank the International Mixed Martial Arts Federation (IMMAF) and Mixed Martial Arts Association Czech Republic (MMAA) for giving us the opportunity to collect data during the 2016 IMMAF European Open Championships, which were held in Prague, Czechia. We are indebted to all the volunteer contestants of the championship and raters for their participation. We wish to thank Tereza Nevolová, David Stella, and other members of the Human Ethology group for their help with data collection and ratings, Petr Tureček for help with stimuli randomization, Klára Coufalová, Ph.D. for providing us with physical performance measurement tools, and Anna Pilátová, Ph.D. for English proofreading.

Supplementary Material

The Supplementary Material for this article can be found online at:

Audio S1. Sample of highly formidable roar.

Audio S2. Sample of low formidable roar.

Audio S3. Sample of highly formidable utterance.

Audio S4. Sample of low formidable utterance.

Figure S1. Sample of failed roar F0 measurement spectrogram.

Figure S2. Sample of successful roar FFT spectral peak frequency.

Table S1–S19. Supplementary results.

Table S20. Dataset ratings.

Table S21. Dataset targets.

Table S22. Exploratory correlation table.


References

Archer, J., and Thanzami, V. (2007). The relation between physical aggression, size and strength, among a sample of young Indian men. Pers. Individ. Dif. 43, 627–633. doi: 10.1016/j.paid.2007.01.005

Audacity Team (2018). Audacity(R): Free Audio Editor and Recorder (Version 2.3.0). Available online at: (accessed March 8, 2017).

Banse, R., and Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol. 70, 614–636. doi: 10.1037/0022-3514.70.3.614

Bee, M. A. (2002). Territorial male bullfrogs (Rana catesbeiana) do not assess fighting ability based on size-related variation in acoustic signals. Behav. Ecol. 13, 109–124. doi: 10.1093/beheco/13.1.109

Bee, M. A., Perrill, S. A., and Owen, P. C. (1999). Size assessment in simulated territorial encounters between male green frogs (Rana clamitans). Behav. Ecol. Sociobiol. 45, 177–184. doi: 10.1007/s002650050551

Boersma, P., and Weenink, D. (2015). Praat: Doing phonetics by Computer (Version 5.4.09). Available at: (accessed January 20, 2015).

Bonitch-Góngora, J., Almeida, F., Padial, P., Bonitch-Domínguez, J., and Feriche, B. (2013). Maximal isometric handgrip strength and endurance differences between elite and non-elite young judo athletes. Arch. Budo. 9, 239–248.

Bradbury, J. W., and Vehrencamp, S. L. (2011). Principles of Animal Communication, 2nd ed. Sunderland: Sinauer Associates Inc.

Bryant, G. A., Fessler, D. M. T., Fusaroli, R., Clint, E., Aarøe, L., Apicella, C. L., et al. (2016). Detecting affiliation in colaughter across 24 societies. Proc. Natl. Acad. Sci. U S A. 113, 4682–4687. doi: 10.1073/pnas.1524993113

Clutton-Brock, T. H., and Albon, S. D. (1979). The roaring of red deer and the evolution of honest advertisement. Behaviour 69, 145–170.

Crapo, R. O., Morris, A. T., and Gardner, R. M. (1981). Reference spirometric values using techniques and equipment that meet ATS recommendations. Am. Rev. Respir. Dis. 123, 659–664. doi: 10.1164/arrd.1981.123.6.659

Daly, M., and Wilson, M. I. (1988). Homicide. Hawthorne, NY: Aldine.

Doll, L. M., Hill, A. K., Rotella, M. A., Cárdenas, R. A., Welling, L. L., Wheatley, J. R., et al. (2014). How well do men's faces and voices index mate quality and dominance? Hum. Nat. 25, 200–212. doi: 10.1007/s12110-014-9194-3

Dunn, T. J., Baguley, T., and Brunsden, V. (2014). From alpha to omega: a practical solution to the pervasive problem of internal consistency estimation. Br. J. Psychol. 105, 399–412. doi: 10.1111/bjop.12046

Ekman, P., Friesen, W. V., and Scherer, K. R. (1976). Body movement and voice pitch in deceptive interaction. Semiotica 16, 23–27. doi: 10.1515/semi.1976.16.1.23

Feinberg, D. R., Jones, B. C., and Armstrong, M. M. (2018). Sensory exploitation, sexual dimorphism, and human voice pitch. Trends Ecol. Evol. 33, 901–903. doi: 10.1016/j.tree.2018.09.007

Fischer, J., Hammerschmidt, K., Cheney, D. L., and Seyfarth, R. M. (2002). Acoustic features of male baboon loud calls: Influences of context, age, and individuality. J. Acoust. Soc. Am. 111, 1465–1474. doi: 10.1121/1.1433807

Fitch, W. T., Neubauer, J., and Herzel, H. (2002). Calls out of chaos: the adaptive significance of nonlinear phenomena in mammalian vocal production. Anim. Behav. 63, 407–418. doi: 10.1006/anbe.2001.1912

Frick, R. W. (1986). The prosodic expression of anger: differentiating threat and frustration. Aggressive Behav. 12, 121–128. doi: 10.1002/1098-2337(1986)12:2<121::AID-AB2480120206>3.0.CO;2-F

Hall, J. G., Allanson, J. E., Gripp, K. W., and Slavotinek, A. M. (2007). Handbook of Physical Measurements. New York, NY: Oxford University Press.

Han, C., Kandrik, M., Hahn, A. C., Fisher, C. I., Feinberg, D. R., Holzleitner, I. J., et al. (2017). Interrelationships among men's threat potential, facial dominance, and vocal dominance. Evol. Psychol. 15:147470491769733. doi: 10.1177/1474704917697332

Hauser, M. D., Chomsky, N., and Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science 298, 1569–1579. doi: 10.1126/science.298.5598.1569

Havryk, A. P., Gilbert, M., and Burgess, K. R. (2002). Spirometry values in Himalayan high altitude residents (Sherpas). Respir. Physiol. Neurobiol. 132, 223–232. doi: 10.1016/S1569-9048(02)00072-1

Holzleitner, I. J., and Perrett, D. I. (2016). Perception of strength from 3D faces is linked to facial cues of physique. Evol. Hum. Behav. 37, 217–229. doi: 10.1016/j.evolhumbehav.2015.11.004

jamovi project (2018). jamovi (Version 0.9). Available online at: (accessed July 15, 2018).

JASP Team (2018). JASP (Version Available online at: (accessed July 1, 2018).

Keeley, L. H. (1997). War Before Civilization. New York, NY: Oxford University Press.

Kitchen, D. M., Seyfarth, R. M., Fischer, J., and Cheney, D. L. (2003). Loud calls as indicators of dominance in male baboons (Papio cynocephalus ursinus). Behav. Ecol. Sociobiol. 53, 374–384. doi: 10.1007/s00265-003-0588-1

Kordsmeyer, T. L., Hunt, J., Puts, D. A., Ostner, J., and Penke, L. (2018). The relative importance of intra- and intersexual selection on human male sexually dimorphic traits. Evol. Hum. Behav. 39, 424–436. doi: 10.1016/j.evolhumbehav.2018.03.008

Lindová, J., Špinka, M., and Nováková, L. (2015). Decoding of baby calls: can adult humans identify the eliciting situation from emotional vocalizations of preverbal infants? PLoS ONE 10:e0124317. doi: 10.1371/journal.pone.0124317

Little, A. C., Třebický, V., Havlíček, J., Roberts, S. C., and Kleisner, K. (2015). Human perception of fighting ability: facial cues predict winners and losers in mixed martial arts fights. Behav. Ecol. 26, 1470–1475. doi: 10.1093/beheco/arv089

Manson, J. H., Wrangham, R. W., Boone, J. L., Chapais, B., Dunbar, R. I. M., Ember, C. R., et al. (1991). Intergroup aggression in chimpanzees and humans. Curr. Anthropol. 32, 369–390.

Markova, D., Richer, L., Pangelinan, M., Schwartz, D. H., Leonard, G., Perron, M., et al. (2016). Age- and sex-related variations in vocal-tract morphology and voice acoustics during adolescence. Horm. Behav. 81, 84–96. doi: 10.1016/j.yhbeh.2016.03.001

Miller, M. R., Hankinson, J., Brusasco, V., Burgos, F., Casaburi, R., Coates, A., et al. (2005). Standardisation of spirometry. Eur. Respir. J. 26, 319–338. doi: 10.1183/09031936.05.00034805

Patel, S., Scherer, K. R., Björkner, E., and Sundberg, J. (2011). Mapping emotions into acoustic space: the role of voice production. Biol. Psychol. 87, 93–98. doi: 10.1016/j.biopsycho.2011.02.010

Pinilla, J. C., Webster, B., Baetz, M., Reeder, B., Hattori, S., and Liu, L. (1992). Effect of body positions and splints in bioelectrical impedance analysis. JPEN-Parenter. Enter. 16, 408–412. doi: 10.1177/0148607192016005408

Portfors, C. V. (2007). Types and functions of ultrasonic vocalizations in laboratory rats and mice. J. Am. Assoc. Lab. Anim. 46, 28–34.

Puts, D. A. (2010). Beauty and the beast: mechanisms of sexual selection in humans. Evol. Hum. Behav. 31, 157–175. doi: 10.1016/j.evolhumbehav.2010.02.005

Puts, D. A., Gaulin, S. J. C., and Verdolini, K. (2006). Dominance and the evolution of sexual dimorphism in human voice pitch. Evol. Hum. Behav. 27, 283–296. doi: 10.1016/j.evolhumbehav.2005.11.003

Raine, J., Pisanski, K., Oleszkiewicz, A., Simner, J., and Reby, D. (2018a). Human listeners can accurately judge strength and height relative to self from aggressive roars and speech. iScience 4, 273–280. doi: 10.1016/j.isci.2018.05.002

Raine, J., Pisanski, K., and Reby, D. (2017). Tennis grunts communicate acoustic cues to sex and contest outcome. Anim. Behav. 130, 47–55. doi: 10.1016/j.anbehav.2017.06.022

Raine, J., Pisanski, K., Simner, J., and Reby, D. (2018b). Vocal communication of simulated pain. Bioacoustics 1–23. doi: 10.1080/09524622.2018.1463295

Rendall, D., Kollias, S., Ney, C., and Lloyd, P. (2005). Pitch (F0) and formant profiles of human vowels and vowel-like baboon grunts: the role of vocalizer body size and voice-acoustic allometry. J. Acoust. Soc. Am. 117, 944–955. doi: 10.1121/1.1848011

Scherer, K. R. (1986). Vocal affect expression: a review and a model for future research. Psychol. Bull. 99, 143–165. doi: 10.1037/0033-2909.99.2.143

Scherer, K. R. (2003). Vocal communication of emotion: a review of research paradigms. Speech Commun. 40, 227–256. doi: 10.1016/S0167-6393(02)00084-5

Scherer, K. R., and Oshinsky, J. S. (1977). Cue utilization in emotion attribution from auditory stimuli. Motiv. Emotion 1, 331–346. doi: 10.1007/BF00992539

Šebesta, P., Kleisner, K., Tureček, P., Kočnar, T., Akoko, R. M., Třebický, V., et al. (2017). Voices of Africa: acoustic predictors of human male vocal attractiveness. Anim. Behav. 127, 205–211. doi: 10.1016/j.anbehav.2017.03.014

Sell, A., Bryant, G. A., Cosmides, L., Tooby, J., Sznycer, D., von Rueden, C., et al. (2010). Adaptations in humans for assessing physical strength from the voice. Proc. R. Soc. B Biol. Sci. 277, 3509–3518. doi: 10.1098/rspb.2010.0769

Sell, A., Cosmides, L., Tooby, J., Sznycer, D., von Rueden, C., and Gurven, M. (2009a). Human adaptations for the visual assessment of strength and fighting ability from the body and face. Proc. R. Soc. B Biol. Sci. 276, 575–584. doi: 10.1098/rspb.2008.1177

Sell, A., Tooby, J., and Cosmides, L. (2009b). Formidability and the logic of human anger. Proc. Natl. Acad. Sci. U. S. A. 106, 15073–15078. doi: 10.1073/pnas.0904312106

Siegman, A. W., Anderson, R. A., and Berger, T. (1990). The angry voice: Its effects on the experience of anger and cardiovascular reactivity. Psychosom. Med. 52, 631–643. doi: 10.1097/00006842-199011000-00005

Stirling, I. (1971). Studies on the behaviour of the South Australian fur seal, Arctocephalus forsteri. Aust. J. Zool. 19, 243–266. doi: 10.1071/ZO9710243

Třebický, V., Fialová, J., Stella, D., Coufalová, K., Pavelka, R., Kleisner, K., et al. (2019). Predictors of fighting ability inferences based on faces. Front. Psychol. 9:2740. doi: 10.3389/fpsyg.2018.02740

Třebický, V., Havlíček, J., Roberts, S. C., Little, A. C., and Kleisner, K. (2013). Perceived aggressiveness predicts fighting performance in mixed-martial-arts fighters. Psychol. Sci. 24, 1664–1672. doi: 10.1177/0956797613477117

Vaara, J. P., Kyröläinen, H., Niemi, J., Ohrankämmen, O., Häkkinen, A., Kocay, S., et al. (2012). Associations of maximal strength and muscular endurance test scores with cardiorespiratory fitness and body composition. J. Strength Cond. Res. 26, 2078–2086. doi: 10.1519/JSC.0b013e31823b06ff

van Mersbergen, M., Lyons, P., and Riegler, D. (2017). Vocal responses in heighted states of arousal. J. Voice 31, 127.e13–127.e19. doi: 10.1016/j.jvoice.2015.12.011

Vidal Andreato, L., Franzói de Moraes, S. M., Lopes de Moraes Gomes, T., Del Conti Esteves, J. V., Vidal Andreato, T., and Franchini, E. (2011). Estimated aerobic power, muscular strength and flexibility in elite Brazilian Jiu-Jitsu athletes. Sci. Sports 26, 329–337. doi: 10.1016/j.scispo.2010.12.015

Keywords: speech, roar, vocalization, handgrip, competition, perception, human

Citation: Šebesta P, Třebický V, Fialová J and Havlíček J (2019) Roar of a Champion: Loudness and Voice Pitch Predict Perceived Fighting Ability but Not Success in MMA Fighters. Front. Psychol. 10:859. doi: 10.3389/fpsyg.2019.00859

Received: 12 October 2018; Accepted: 01 April 2019;
Published: 30 April 2019.

Edited by:

Alex L. Jones, Swansea University, United Kingdom

Reviewed by:

Benedict C. Jones, University of Glasgow, United Kingdom
Phil McAleer, University of Glasgow, United Kingdom

Copyright © 2019 Šebesta, Třebický, Fialová and Havlíček. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jan Havlíček,
Pavel Šebesta,