Vocal Parameters of Speech and Singing Covary and Are Related to Vocal Attractiveness, Body Measures, and Sociosexuality: A Cross-Cultural Study

Valentova, Jaroslava Varella; Tureček, Petr; Varella, Marco Antonio Corrêa; Šebesta, Pavel; Mendes, Francisco Dyonisio C.; Pereira, Kamila Janaina; Kubicová, Lydie; Stolařová, Petra; Havlíček, Jan

doi:10.3389/fpsyg.2019.02029

ORIGINAL RESEARCH article

Front. Psychol., 22 October 2019

Sec. Evolutionary Psychology

Volume 10 - 2019 | https://doi.org/10.3389/fpsyg.2019.02029

This article is part of the Research Topic: Perceptions of People: Cues to underlying Physiology and Psychology

Jaroslava Varella Valentova¹*

Petr Tureček²

Marco Antonio Corrêa Varella¹

Pavel Šebesta³

Francisco Dyonisio C. Mendes⁴

Kamila Janaina Pereira¹

Lydie Kubicová³

Petra Stolařová³

Jan Havlíček²

¹Department of Experimental Psychology, Institute of Psychology, University of São Paulo, São Paulo, Brazil
²Faculty of Science, Charles University, Prague, Czechia
³Faculty of Humanities, Charles University, Prague, Czechia
⁴Department of Basic Psychological Processes, University of Brasília, Brasília, Brazil

Perceived vocal attractiveness and measured sex-dimorphic vocal parameters are both associated with underlying individual qualities. Research tends to focus on speech, but singing is another highly evolved communication system that has distinct and universal features with analogs in other species, and it is relevant in mating. Both speaking and singing voice provide relevant information about the producer. We tested whether speech and singing function as “backup signals” that indicate similar underlying qualities. Using a sample of 81 men and 86 women from Brazil and the Czech Republic, we investigated vocal attractiveness rated from speech and singing and its association with fundamental frequency (F0), apparent vocal tract length (VTL), body characteristics, and sociosexuality. F0, VTL, and rated attractiveness of singing and speaking voice strongly correlated within the same individual. Lower-pitched speech in men, higher-pitched speech and singing in women, individuals who like to sing more, and singing of individuals with a higher pitch modulation were perceived as more attractive. In men, physical size positively predicted speech and singing attractiveness. Male speech but not singing attractiveness was associated with higher sociosexuality. Lower-pitched male speech was related to higher sociosexuality, while lower-pitched male singing was linked to lower sociosexuality. Similarly, shorter speech VTL and longer singing VTL predicted higher sociosexuality in women. Different vocal displays function as “backup signals” cueing to attractiveness and body size, but their relation to sexual strategies in men and women differs. Both singing and speech may indicate evolutionarily relevant individual qualities shaped by sexual selection.

Introduction

Speech and singing are among the most common vocal productions in adult humans and their presence seems to be universally shared across modern human populations (Brown, 1991). It is assumed that they have a common ancestor (Brown, 2001, 2017; Mithen, 2005) which evolved into two specialized systems of structured vocal communication (Lehmann et al., 2009). It also seems that prosody, the musical part of speech which conveys mainly emotional information, is rooted already in the origins of both spoken and sung vocal production (Filippi, 2016; Brown, 2017). It has recently been shown that speech and singing may have diverged from a protolanguage and split in two systems based on their communicative function. In particular, when referential and emotional functions are introduced into an artificial communication system, the system diverges into speech- and music-like vocalizations, respectively (Ma et al., 2019). Moreover, despite a vast variability across cultures, the function of specific kinds of songs (e.g., a love song) is cross-culturally comprehensible based on their structural form (Mehr et al., 2018). Interestingly, both human and bird songs tend to employ similar descending/arched melodic contour despite substantial differences in absolute pitch and duration, which indicates similar underlying motor constraints across cultures and species (Savage et al., 2017).

Singing and speech differ in the use of vocal anatomy (Sundberg, 1977, 2018), require different patterns of breathing (Leanderson et al., 1987), and neuroanatomy of production and appreciation is likewise specific to each of the two domains (Zatorre and Baum, 2012). Cognitive processing of speech and singing is also specific for each domain, as shown in patients with amusia who have intact speech processing and patients with aphasia who have no impairment of musical capacities (Peretz and Coltheart, 2003). Despite the different design features, such as the arbitrariness of speaking and regular beat and discrete set of pitches in singing, the two domains share some further features, such as hierarchical structure and complexity (Fitch, 2006). Moreover, both speaking and singing voice provide relevant information about the producer’s gender, identity, location, emotional state, and behavioral tendencies (Weninger et al., 2011) and individuals can identify others based on their speech and singing (Trehub et al., 2009).

While spoken language is mostly specific to humans and language-like forms of vocalization exist in a few other animals (prairie dogs, dolphins, etc.) (Slobodchikoff et al., 1991; Janik, 2013), singing has its parallels in many other species. The capacity for learning complex songs, new sequences and sounds has arisen independently in birds (songbirds, hummingbirds, and parrots) and mammals (whales, seals, and humans) (Fitch, 2005). Since Darwin’s (1871) groundbreaking works, sexual selection has been viewed as one of the most important factors that drove the evolution of singing as a way of attracting the opposite sex and advertising individual qualities. There is a large body of research showing the importance of singing in mating success across various avian and mammalian species (e.g., Searcy and Andersson, 1986). In some species, singing seems to function as an honest signal of underlying individual qualities, so that e.g., lower-pitched songs advertise a larger body size (Hall et al., 2013). In humans, irrespective of their original adaptive value, speaking and singing can likewise be considered honest signals that meet the four requisite criteria (Smith and Bird, 2000). They both require a long time for maturation, practice, and learning (Welch, 2006), their production is energetically costly because they rapidly fade (Fitch, 2006), they can suffer from noise interference, and require intense breathing (Leanderson et al., 1987). Both speech and singing are easily perceptible by most people, are used in mating-relevant contexts, such as courtship (White et al., 2018), can increase individual mating success, and both can serve as cues to genetic qualities of the producer (Miller, 2000). There are also some significant differences between the two: singing requires higher vocal control (Zarate, 2013) and is more demanding than speech because singers need to tailor the subglottal pressure to both pitch and loudness (Sundberg, 1977, 2003). 
Singing can also be louder than speech, involving more muscle activity (Leanderson et al., 1987; Åkerlund and Gramming, 1994), and it includes a performative context (Fitch, 2006) which attracts more attention and is thus socially riskier. People even tend to abbreviate their singing performance in front of a supposedly expert audience (Garland and Brown, 1972). It is thus quite possible that singing is even harder to fake as an honest signal of underlying individual qualities than speech is, and that it thereby serves as an ornament that can affect the quantity or quality of sexual partners.

Human voice plays an important role in mate preferences and intrasexual competition (Puts, 2010; Pisanski and Feinberg, 2019), but so far, most research on human voice attractiveness and its indicators focused on speech. Some vocal parameters, especially the fundamental frequency (F0), differ between males and females of many species, with humans exhibiting an even greater sexual dimorphism than other primates (Puts et al., 2016). F0 is produced by vibrations of the vocal folds within the larynx and together with the corresponding harmonics is perceived as voice pitch (Pisanski et al., 2016). On average, men produce lower-pitched voices than women: this is due to the effects of testosterone during puberty which thickens and lengthens male vocal folds and thereby lowers the F0 (Pisanski and Feinberg, 2019). From a more general perspective, vocal sexual dimorphism is supposed to be at least in part the result of intrasexual competition, especially in the context of male-male competition (e.g., Puts, 2010). Indeed, men with lower-pitched voices are perceived as older, taller, heavier, more masculine, and more dominant than men with higher-pitched voices (Collins, 2000; Feinberg et al., 2005; Puts et al., 2006, 2007; Pereira et al., 2019). And similarly, women with lower-pitched voices are perceived as more dominant (Borkowska and Pawlowski, 2011), and both men and women with lower-pitched voices reported higher leadership capacities (Klofstad et al., 2012).

Aside from intrasexual competition, intersexual selection may have also played a role in shaping sex differences in voice. There is robust evidence that women prefer relatively low-pitched male speaking voices, while men prefer relatively high-pitched female voices (for a review, see Pisanski and Feinberg, 2019). Nevertheless, the relationship between male and female F0 and attractiveness is non-linear: the most attractive male voices are around 96 Hz and the most attractive female ones up to 280 Hz (Borkowska and Pawlowski, 2011; Saxton et al., 2015). Importantly, preferences for lower- and higher-pitched voices in men and women, respectively can be specific to certain contexts and individuals, such as short-term relationships (Little et al., 2002), coupled women (Valentová et al., 2013), and nulliparous women (Apicella and Feinberg, 2009), and in some populations that can even be inverted (Shirazi et al., 2018). Moreover, recent evidence suggests that lower-pitched female voices are perceived as attractive (Babel et al., 2014), and women actively lower their voices when speaking to attractive men or when willing to sound attractive (Hughes et al., 2014; Pisanski et al., 2018; but see Fraccaro et al., 2011). Lower pitched voices in women can thus signal their immediate interest and/or sexual appetence.

In line with the fitness indicator hypothesis within the sexual selection theory, vocal characteristics can convey information about the underlying qualities of voice producers, e.g., information about their health and reproductive potential. For example, men with relatively low-pitched voices exhibit low cortisol and high testosterone levels, which are related to immunoreactivity (Evans et al., 2008; Hodges-Simeon et al., 2015; Puts et al., 2016). Moreover, among North American men, a lower-pitched voice is associated with more female sexual partners (Puts, 2005), and lower-pitched male Hadza hunter-gatherers have on average a higher number of offspring (Apicella et al., 2007). Furthermore, both men and women with more attractive voices reported more sexual partners, extra-pair copulations, and earlier age of the first sex (Hughes et al., 2004), which are all considered proxies of potentially higher reproductive success.

Moreover, voice attractiveness is associated with several body measures that develop under the influence of sex-specific hormones and are thus viewed as indicators of genetic and developmental quality, and subsequently also the reproductive fitness of the individual. For example, voice attractiveness is positively associated with the shoulder-to-hip ratio in men and negatively associated with the waist-to-hip ratio in women (Hughes et al., 2004). Low-pitched male voices are linked to larger body size, especially weight and height, to a particular body shape (shoulder and chest circumference, shoulder-to-hip ratio) (Evans et al., 2006), and to arm strength (Puts et al., 2011). Nevertheless, a recent meta-analysis has shown that, compared to other vocal parameters, voice pitch is not a reliable predictor of height in adults of the same sex (Pisanski et al., 2014), and it is a poor predictor of body weight, shape, and strength (Collins, 2000; Collins and Missing, 2003; Bruckert et al., 2006; Evans et al., 2006; Sell et al., 2010; Vukovic et al., 2010; Pisanski et al., 2016; Raine et al., 2019).

Formants, on the other hand, which are the resonant frequencies of the vocal tract, are more constrained by the anatomical structures related to body size. Formants are anatomically and functionally dissociated from fundamental frequency and are therefore a more reliable indicator of body size and shape both in humans and in numerous other mammalian species (Pisanski et al., 2014). Formants are also sexually dimorphic, whereby men show lower formant frequencies than women (Pisanski et al., 2016). Individuals who produce lower formant frequencies are perceived as more physically dominant (Puts et al., 2007) and women who produce higher formant dispersion are perceived as flirtatious and attractive by both men and women (Puts et al., 2011). Individual vocal characteristics thus may provide cues to different bodily traits and sexual behaviors linked to individual’s potential reproductive success.

Importantly, voice is a dynamic behavioral display which can be both intentionally and involuntarily modulated under specific situations so as to express or exaggerate ecologically relevant traits, including emotions (Pisanski et al., 2016). For example, both men and women change their voice when speaking to infants (Foulkes et al., 2005; Broesch and Bryant, 2015) and this specific infant-directed speech affects attention and communicative outcomes of the children (Rowe, 2012; Spinelli et al., 2017). Similarly, women modulate voice pitch when speaking to attractive men (Fraccaro et al., 2011; Hughes et al., 2014; Pisanski et al., 2018) and voices of both men and women who speak to an attractive individual are perceived as more attractive by others (Leongómez et al., 2014). Also, people can volitionally increase their vocal tract length (as estimated from formant frequencies) and decrease fundamental frequency to imitate a larger body size, and vice versa (Pisanski et al., 2016). The overall prosody of speech can be effectively modulated when expressing different emotions, such as high, loud, and fast prosody while feeling happy, and the opposite pattern while being sad (for review, see Brown, 2017). Interestingly, the same vocal modulation appears when expressing emotions by music, which suggests that both displays may convey similar information (Juslin and Laukka, 2003; Zatorre and Baum, 2012).

Although singing production and perception constitute a scientific research field in their own right (Sundberg, 2003), singing accuracy is related to several loci on chromosome 4 and exhibits 40% heritability (Park et al., 2012), and singing frequently features in mating contexts (e.g., as serenades and love songs, see Dukes et al., 2003; Levitin, 2008), singing tends to be overlooked by psychological research on voice attractiveness. As an exception, one study found that women who were judged as good singers based solely on audio recordings were also independently rated as more attractive based on soundless video recordings (Wapnick et al., 1997). This is in line with research which shows that in women, attractiveness and masculinity-femininity ratings based on different modalities are correlated (e.g., Valentova et al., 2017c; Pereira et al., 2019). Nevertheless, further research is needed to test to what extent the perceptual characteristics of speech and singing voice are intercorrelated and whether both vocal displays function as backup signals, i.e., as signals that indicate similar underlying qualities, rather than multiple messages, i.e., signals that indicate different qualities of individuals (see Johnstone, 1996; Bro-Jørgensen, 2010). To the best of our knowledge, only one study tested the attractiveness of speech and singing in women, and it concluded that attractiveness rated from both vocal displays is correlated and in both cases increases with voice pitch (Isenstein, 2016). This can be viewed as indicating that different vocal displays may serve as backup signals.

Aims of the Current Study

In the current study, we tested whether certain perceptual singing and speaking characteristics (perceived attractiveness, voice pitch, and formant frequencies) serve as cues to specific individual physical and behavioral qualities. Since singing production is more costly than speech, one could predict that the perceived attractiveness of singing would be a stronger indicator of individual quality than the attractiveness of speech. We have therefore tested the association between the attractiveness of singing and speech and selected body fitness indicators (body size and shape). We have also tested the relation between attractiveness ratings of both vocal displays and sociosexuality, which we used as a proxy of a short-term sexual strategy that may, especially in men, lead to increased reproductive success. We have further investigated how selected vocal parameters (voice pitch and vocal tract length as estimated from formant frequencies) mediate the possible associations between the vocal attractiveness, body cues, and sociosexuality.

Further, we tested whether the capacity to modulate the voice and singing experience may influence the rated vocal attractiveness. We hypothesized that both singing experience and a higher ability to modulate voice would lead to a more attractive vocal production.

Additionally, we tested for possible differences in vocal parameters between the sexes in two distinct populations, a Brazilian and a Czech one. So far, very little cross-cultural research has been conducted on evolutionarily relevant aspects of voice characteristics and perceptions. The majority of that research was conducted in the United States and in Western and Central Europe (for review, see Pisanski and Feinberg, 2019). Studies comparing more populations with different physical, cultural, and linguistic compositions are thus needed to increase the generalizability of results. For example, although most North American and European studies concluded that women prefer lower-pitched male voices, Filipino women seem to follow the opposite pattern (Shirazi et al., 2018). In our study, we sampled two sets of participants, one from a South American and one from a Central European population (Brazil and the Czech Republic, respectively). The two populations differ widely in their history, culture, ethnicity, and demographic data, and both also differ from Western European and North American societies. Moreover, these populations also differ in several body measures, such as height and weight (e.g., Varella et al., 2014; Valentova et al., 2016) and facial and body hair in men (Valentova et al., 2017b), while self-rated breast size, buttock size, and WHR in women are the same in both (Valentova et al., 2017a). Furthermore, the Brazilian population reports significantly higher sociosexuality than the Czech population (Varella et al., 2014). The two populations are also linguistically different: Brazilian Portuguese is a Romance language, while Czech belongs to the Slavic languages. Previous studies reported that several vocal parameters differ between linguistic groups (Mennen et al., 2012). The two populations thus offer an interesting opportunity to analyze vocal production and perception and their relation to body measures and sociosexuality.

Methods

Target Participants

The final sample was composed of 40 Brazilian men (M = 23.70 years; SD = 3.67, range 19–34) and 44 women (M = 23.91 years; SD = 4.99, range 18–35) recruited at the University of São Paulo, in São Paulo city, and 33 Czech men (M = 22.45 years; SD = 2.35, range 18–28) and 35 women (M = 22.37 years; SD = 2.57, range 19–29), recruited at the Charles University, Prague. We selected predominantly heterosexual participants (0–2 on a Kinsey scale) because individuals with different sexual orientations can show variation in several vocal parameters (Kachel et al., 2018) which can be detected even by naïve listeners (Valentova and Havlíček, 2013).

Procedure

In both countries, each participant consented to take part in a broader study (see Varella et al., 2014; Valentova et al., 2017c). Participants completed questionnaires, and we took body measurements and standardized facial and body photographs and recorded videos of both speech and singing. Only data relevant for this study are described below. Brazilian participants are not allowed to receive a financial reward, but Czech participants received remuneration of 300 CZK (approximately 13 USD). The project was approved by the Charles University IRB (2011/07).

Questionnaires

Participants completed a sociodemographic questionnaire and the Revised Sociosexual Orientation Inventory (SOI-R; Penke and Asendorpf, 2008). The SOI-R measures an individual’s willingness to engage in uncommitted sex. It consists of nine items (e.g., “With how many different partners did you have sexual intercourse on one and only one occasion?”), loading into three subscales of sociosexual behavior, attitudes, and desire. They also answered, on a 10-point scale, how much they liked to sing (1 = not at all, 10 = very much). We used this information as a motivational factor that may influence singing frequency, singing training, and thus singing experience, as shown in Busch (2013).

Vocal Recordings

Vocal samples were recorded under standardized conditions, in a closed and quiet room, and all by one researcher. For all recordings, we used a professional digital stereo Olympus LS-100 Multi-Track Linear PCM recorder, with the participants’ lips approximately 10 cm from the microphone. All participants performed the vocal tasks seated on a chair. First, participants were informed about the whole recording procedure; this information was also given to them in print. After a short vocal exercise to warm up the voice and get used to being recorded, participants read a short sentence using names standardized across all participants. In Brazil, men and women, respectively, pronounced “Oi, meu nome é Pedro/Ana, e eu sou de Belo Horizonte,” while Czech men and women, respectively, said “Jmenuji se Petr/Petra a pocházím z Havlíčkova Brodu” (Hi, my name is Petr/Pedro/Petra/Ana and I come from Belo Horizonte/Havlíčkův Brod). Subsequently, they sang the first part of “Happy Birthday” (in the Brazilian Portuguese version “Parabéns para você, nesta data querida, muitas felicidades, muitos anos de vida,” in the Czech version “Hodně štěstí zdraví, hodně štěstí zdraví, hodně štěstí, milý Honzo, hodně štěstí zdraví”). Finally, they first read and then sang the first stanza of their national anthem (the verbal content of speech and singing was thus matched).

To minimize raters’ overload, we extracted parts of the national anthem using SoundForge 8.0 software. In the Brazilian sample, we extracted the first two lines of the national anthem (“Ouviram do Ipiranga as margens plácidas, de um povo heróico o brado retumbante”), while for the Czech participants, we extracted the third and fourth lines, which, unlike the first two lines, do not repeat each other (“Voda hučí po lučinách, bory šumí po skalinách”). Only these recordings were subsequently rated by independent participants and analyzed for vocal parameters. All participants spoke their native language, i.e., either Brazilian Portuguese or Czech. None of the participants reported any serious vocal or respiratory problem at the time of data collection.

Happy Birthday was selected because it is cross-culturally known and commonly sung in intimate and emotionally loaded social situations, usually with family, friends, and romantic partners, and because it has been used in previous research on singing (e.g., Christiner and Reiterer, 2013). The national anthem is also widely known within each country; it is relatively unconnected to the mating context and is thus more neutral.

Recordings were analyzed using Praat software (Boersma and Weenink, 2013) for mean, minimal, and maximal fundamental frequency (F0), and the first four formants (F1–F4). F0 is the rate of vocal fold vibration, perceived as overall voice pitch. We used an autocorrelation algorithm with parameters set to a pitch floor of 75 Hz and pitch ceiling of 300 Hz for men, and a pitch floor of 100 Hz and pitch ceiling of 500 Hz for women, because these are the appropriate boundaries for analyzing adult voices recommended by the software developers (Boersma and Weenink, 2013). All other values were set to default. Average speech F0 per recording ranged between 92.47 Hz (corresponding to the musical note F#₂, where # indicates that the note F is raised by a semitone) and 177.70 Hz (F₃) in men, and between 164.10 Hz (E₃) and 253.10 Hz (B₃) in women. For singing, F0 ranged between 103.60 Hz (G#₂) and 208.50 Hz (G#₃) in men, and between 168.50 Hz (E₃) and 348.20 Hz (F₄) in women. All F0 values were transformed to perceptual pitch, expressed as the semitone difference between A₄ (440 Hz) and F0 using the standard formula 12 log₂(F0/440). This scale is based on standard music notation and reflects the logarithmic nature of human pitch perception, where both A₃ (−12, 220 Hz) and A₅ (12, 880 Hz) are at an equal octave distance (12 semitones) from A₄. We subtracted the minimal F0 from the maximal F0 of each recording to obtain its perceptual range in semitones. Average speech range per recording varied between 4.61 and 21.07 semitones in men and between 5.34 and 27.61 semitones in women, while the singing range varied between 6.76 and 23.74 semitones in men, and between 8.76 and 27.84 semitones in women. F0 and ranges were averaged for each participant across recordings for speech and singing separately.
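The semitone transformation and the perceptual range described above can be illustrated with a minimal Python sketch. The function names are hypothetical and are not taken from the authors' analysis code; only the formula 12 log₂(F0/440) and the example frequencies come from the text.

```python
import math

A4_HZ = 440.0  # reference pitch named in the text


def hz_to_semitones(f0_hz: float, ref_hz: float = A4_HZ) -> float:
    """Semitone distance of f0_hz from the reference: 12 * log2(f0 / ref)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def f0_range_semitones(f0_min_hz: float, f0_max_hz: float) -> float:
    """Perceptual range: semitone value of the maximum minus that of the minimum."""
    return hz_to_semitones(f0_max_hz) - hz_to_semitones(f0_min_hz)


if __name__ == "__main__":
    # A3 and A5 lie one octave (12 semitones) below and above A4.
    for label, hz in [("A3", 220.0), ("A5", 880.0), ("lowest male speech F0", 92.47)]:
        print(f"{label}: {hz_to_semitones(hz):+.2f} semitones relative to A4")
```

Because the scale is logarithmic, the range depends only on the ratio of the maximal to the minimal F0, so a doubling of frequency always yields 12 semitones regardless of the absolute pitch.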

Apparent vocal tract length (VTL) was calculated from the first four formants (F1–F4) according to the formula described in Pisanski et al. (2014). F1 to F4 were measured in Praat using a semiautomated approach. First, recordings were preprocessed with the Vocal Toolkit’s “Extract voiced and unvoiced” script (Corretge, 2019), and only the voiced parts were used for the formant analysis. Second, formants were analyzed by the Burg method with the recommended preset values and maximum formant levels of 5000 and 5500 Hz for men and women, respectively. In each recording, readings suggesting the presence of silence and erroneous readings were omitted from the list of results. F1 to F4 levels are represented by the median of the remaining formant readings.

Subsequently, formant spacing (ΔF) was estimated as the slope of a linear regression line with the intercept set to 0, fitted to the relationship

F_{i} = \frac{(2i - 1)}{2} \Delta F

where “i” refers to the formant number. Apparent vocal tract length was derived from formant spacing using

VTL(\Delta F) = \frac{c}{2 \Delta F}

where c = 33,500 cm/s is the speed of sound in a uniform tube with one end closed.
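As a worked illustration of the two formulas above, the following Python sketch estimates ΔF as the least-squares slope of a regression through the origin and converts it to apparent VTL. It assumes c = 33,500 cm/s (335 m/s); the function names and the idealized formant values are hypothetical and serve only as an example.

```python
SPEED_OF_SOUND_CM_S = 33500.0  # assumed value of c, i.e., 335 m/s


def formant_spacing(formants_hz):
    """Slope of the zero-intercept regression of F_i on (2i - 1) / 2.

    For a line through the origin, the least-squares slope is
    sum(x * F) / sum(x * x).
    """
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(xs, formants_hz))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def apparent_vtl_cm(formants_hz, c=SPEED_OF_SOUND_CM_S):
    """Apparent vocal tract length in cm: VTL = c / (2 * delta_F)."""
    return c / (2.0 * formant_spacing(formants_hz))


if __name__ == "__main__":
    # Idealized formants of a uniform tube (made-up values, not study data).
    idealized = [500.0, 1500.0, 2500.0, 3500.0]
    print(formant_spacing(idealized))   # formant spacing in Hz
    print(apparent_vtl_cm(idealized))   # apparent VTL in cm
```

With these idealized values the formants fall exactly on the regression line, so ΔF is 1000 Hz and the apparent VTL is 16.75 cm; real formant medians scatter around the line, which is why a fitted slope is used rather than any single formant.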

Anthropometry

We measured participants’ body height in centimeters, weight in kilograms, and body characteristics previously found to be associated with vocal attractiveness, namely the circumference of the shoulders, waist, and hips (Dixson et al., 2003; Stulp et al., 2013; Valentova et al., 2014, 2016, 2017a). Then we computed the waist-to-shoulder ratio (WSR) in men and waist-to-hip ratio (WHR) in women (for details on the procedure, see Varella et al., 2014).

Vocal Ratings

An independent sample of heterosexual raters anonymously judged the voice attractiveness of all vocal recordings of individuals of the opposite sex on a 7-point scale (1 = not at all attractive, 7 = very attractive) using Rater software (facelab.org). All raters reported being predominantly heterosexual (0–2 on a Kinsey scale). Brazilian raters (51 men: M = 22 years, SD = 3.4; 59 women: M = 22.1 years, SD = 3.4) were recruited among the students of the University of Brasília, while the Czech raters (46 men: M = 21.7 years, SD = 1.9; 47 women: M = 20.6 years, SD = 1.1) were recruited at Charles University, Prague. The rating took place in an empty classroom; each voice recording containing the relevant phrase was presented only once, using headphones and with unmanipulated volume. Each rater evaluated either all Brazilian or all Czech recordings. For instance, one Brazilian rater rated all Czech recordings, while another Brazilian rater rated all Brazilian recordings. The recordings were divided into eight blocks (two speech and two singing recordings, Brazilian and Czech sample) and randomized within each block. Interrater agreement (Cronbach’s α) was high in all recording × rater set combinations (min α = 0.79; for a full overview of Cronbach’s α, see Supplementary Material). Pearson correlations between average attractiveness ratings of Czech and Brazilian raters were high for both speech [r = 0.694, 95% CI (0.602, 0.768), p < 0.001] and singing [r = 0.788, 95% CI (0.719, 0.841), p < 0.001]. We have therefore used the mean attractiveness rating for each target across all raters as the unit of analysis.
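The interrater agreement and the averaging step described above can be sketched as follows. This is a minimal illustration of raw Cronbach’s α with raters treated as items, not the authors’ code (the study reports using the alpha() function from the R psych package); the function names and the toy ratings are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(ratings):
    """Raw Cronbach's alpha for a targets x raters table.

    ratings: list of rows, one row per rated target, one score per rater.
    alpha = k / (k - 1) * (1 - sum of rater variances / variance of row sums)
    """
    k = len(ratings[0])
    if k < 2:
        raise ValueError("at least two raters are required")
    rater_variances = [variance([row[j] for row in ratings]) for j in range(k)]
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(rater_variances) / total_variance)


def mean_rating_per_target(ratings):
    """Unit of analysis in the text: mean rating of each target across raters."""
    return [mean(row) for row in ratings]


if __name__ == "__main__":
    # Toy 7-point ratings (made-up): 5 targets rated by 3 raters.
    toy = [[2, 3, 2], [5, 6, 5], [4, 4, 3], [6, 7, 6], [3, 3, 2]]
    print(round(cronbach_alpha(toy), 3))
    print(mean_rating_per_target(toy))
```

High α indicates that raters order the targets similarly, which is the usual justification for collapsing individual ratings into one mean score per target.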

Statistical Analyses

All analyses were conducted using R 3.5.1 software, and SPSS version 21 (IBM Corp., Armonk, NY, United States). To explore associations between the measured and rated voice parameters in speech and song, we ran parametric correlations (Pearson correlation) and paired t-tests to test for possible differences between the two vocal displays.

Relationships between the four exogenous variables (waist-to-hip or waist-to-shoulder ratio, height, weight, and age), mediating acoustic qualities (speech and singing F0 and range), speech and singing attractiveness, and the total sociosexuality score were investigated using path analysis. The structural model contained 6 correlations and 38 regression coefficients. The analysis was conducted using the sem() function from the lavaan package. Because of the small observations-to-parameters ratio (as low as 1.66 in the male sample), robust p values were obtained using Monte Carlo simulation. The distribution of expected correlation/regression coefficients was derived from 10,000 simulation runs, where the full model was estimated on a randomized dataset. The issue of influential points was addressed by jackknife resampling. Removing one observation at a time, we extracted sets of all measures including standardized model estimates and p values. Coefficients which remained significant regardless of the removed data points are emphasized in the main article, while full results are reported in the Supplementary Material. Path invariance was tested from the χ² difference between the configural invariant model, where the structure is restricted to be equal between the groups, and the path invariant model, where all coefficients are restricted to be equal between the groups, with degrees of freedom corresponding to the number of estimated parameters. Path invariance was evaluated between men and women and subsequently between Czech and Brazilian participants within each sex. Interrater agreement was evaluated using Cronbach’s α calculated using the alpha() function from the psych package (the code is available at https://github.com/costlysignalling/Speech-and-singing-attractiveness).
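The logic of the randomization-based p values and the jackknife check can be sketched on a much simpler statistic. The Python example below is a simplified stand-in, not the authors’ procedure: it uses a single Pearson correlation instead of the full lavaan path model, shuffles one variable to build the null distribution, and adds 1 to numerator and denominator of the p value, which is an assumption not stated in the text. All names and data are hypothetical.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def monte_carlo_p(x, y, n_runs=10_000, seed=0):
    """Two-sided p value: share of randomized datasets whose |r| reaches
    the observed |r| (with the +1 correction, an assumption of this sketch)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_runs):
        rng.shuffle(shuffled)  # break the pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_runs + 1)


def jackknife_r(x, y):
    """Leave-one-out estimates: r recomputed with each observation removed.
    x and y must be lists so that slices can be concatenated."""
    return [pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]


if __name__ == "__main__":
    x = list(range(12))                  # made-up predictor
    y = [2 * v + (-1) ** v for v in x]   # made-up outcome with small noise
    print(monte_carlo_p(x, y, n_runs=2000))
    print(min(jackknife_r(x, y)), max(jackknife_r(x, y)))
```

In the study the same two ideas are applied to every coefficient of the path model: a coefficient is emphasized only if it stays significant in each leave-one-out dataset, so no single participant can drive the reported effect.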

Further, to test for the possible effect of voice experience on rated voice attractiveness, we assessed non-parametric correlations (Kendall rank correlation indicated by coefficient τ) between the rated attractiveness of both spoken and sung recordings and how much the participants liked to sing. To test the voice modulation hypothesis, we computed the absolute difference between singing and speaking F0, singing and speaking F0 range, and the absolute difference between singing and speaking VTL, which gave us an index of (dis)similarity of these vocal parameters between the two vocal displays. The higher the absolute difference, the larger the difference between speech and singing, and thus the higher vocal modulation. We further correlated these absolute differences with attractiveness ratings, separately for men and women. In these analyses, we did not control for multiple comparisons across tests, because the samples were independent.
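The modulation index and the rank correlation described above can be sketched in a few lines of Python. The text does not state which variant of Kendall’s τ was used; this sketch assumes τ-b, which corrects for ties (likely on a 10-point liking scale). Function names and example values are hypothetical.

```python
def modulation_index(speech_value, singing_value):
    """Absolute difference of a vocal parameter between singing and speech."""
    return abs(singing_value - speech_value)


def kendall_tau_b(x, y):
    """Kendall's tau-b by direct pair counting (O(n^2), fine for small samples)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied on both variables: ignored
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = ((untied + tied_x_only) * (untied + tied_y_only)) ** 0.5
    return (concordant - discordant) / denominator


if __name__ == "__main__":
    # Made-up values: liking to sing (1-10) and mean rated attractiveness (1-7).
    liking = [2, 5, 5, 8, 9, 3]
    attractiveness = [2.9, 3.4, 3.1, 4.2, 4.0, 3.0]
    print(kendall_tau_b(liking, attractiveness))
    # Made-up speech and singing F0 in semitones relative to A4.
    print(modulation_index(-24.0, -20.5))
```

A larger modulation index means that a participant’s singing differs more from his or her speech on that parameter; the index is unsigned, so it does not distinguish raising from lowering the voice.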

Additionally, we used General Linear Models (GLM) to test for possible effects of sex, age, and country on voice attractiveness ratings. Similarly, to test whether mean F0, F0 range, and VTL of speech and singing differ between men and women or between Brazilian and Czech participants, we performed a multivariate GLM with mean F0 and F0 range as dependent variables and sex and country of targets as factors. Due to the limited sample size, we evaluated only simple models. Effect sizes are reported as partial eta-squared (η_p²).
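For readers who wish to relate the reported F statistics to η_p², the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error) can be applied. The sketch below is only an illustration of that identity; the function name is hypothetical and the statistical software used in the study may have computed the effect sizes directly from sums of squares.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic and its
    degrees of freedom: SS_effect / (SS_effect + SS_error)."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# Example with the effect of sex on mean speech F0 reported in the Results
# (F = 1074.30, df = 1, 153).
example = partial_eta_squared(1074.30, 1, 153)
```

For this example the identity gives approximately 0.875, close to the reported value of 0.878; small discrepancies of this kind can arise from rounding or from the error degrees of freedom of the fitted model.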

Results

The Effect of Targets’ Sex and Country on Spoken and Sung F0, F0 Range, and VTL

We found large effects of targets’ sex on all vocal parameters: mean speech F0 (F = 1074.30, df = 1, 153, p < 0.001, η_p² = 0.878), mean speech F0 range (F = 14.12, df = 1, 153, p < 0.001, η_p² = 0.086), VTL as measured from speech (F = 2114.02, df = 1, 153, p < 0.001, η_p² = 0.934), mean singing F0 (F = 736.84, df = 1, 153, p < 0.001, η_p² = 0.831), mean singing F0 range (F = 7.00, df = 1, 153, p = 0.009, η_p² = 0.045), and VTL as measured from singing (F = 1537.91, df = 1, 153, p < 0.001, η_p² = 0.911). Estimated marginal means revealed that women had a higher F0 and F0 range and shorter VTL than men (for mean values, see Table 1). There was also a significant effect of the target country on speech F0 range (F = 4.31, df = 1, 153, p = 0.040, η_p² = 0.028), VTL as measured from speech (F = 10.49, df = 1, 153, p = 0.001, η_p² = 0.065), and VTL as measured from singing (F = 6.59, df = 1, 153, p = 0.011, η_p² = 0.042). Estimated marginal means show that Czech participants had a lower speech F0 range and longer VTL than the Brazilian participants (see Table 1 for details).

Table 1. Mean fundamental frequency (F0) and the range of fundamental frequency (F0 range) in semitones, and VTL (in centimeters) in men and women.

It is worth noting that the average VTL measures for men and women (Table 1) are comparable to population-level averages (Pisanski et al., 2014).

Comparisons Between Speaking and Singing Voice

F0 measured from speech was strongly positively correlated with F0 measured from singing in both men (r = 0.800, N = 73, p < 0.001) and women (r = 0.607, N = 79, p < 0.001). F0 range measured from speech was correlated with F0 range measured from singing in men (r = 0.408, N = 73, p < 0.001) but not in women (r = 0.160, N = 79, p = 0.159). Vocal tract length (VTL) as estimated from formant frequencies was strongly positively correlated between speech and singing in both men (r = 0.808, N = 81, p < 0.001) and women (r = 0.764, N = 85, p < 0.001). Vocal attractiveness rated from speech and singing was also strongly positively correlated in both men (r = 0.720, N = 73, p < 0.001) and women (r = 0.674, N = 79, p < 0.001). Paired t-tests revealed that voices rated from speech were judged significantly higher on attractiveness than voices rated from singing in both men (t = 6.66, df = 72, p < 0.001) and women (t = 3.85, df = 78, p ≤ 0.001).
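The significance of a Pearson correlation follows from the sample size through the standard conversion t = r √((N − 2) / (1 − r²)) with N − 2 degrees of freedom. The sketch below illustrates this conversion with the non-significant F0 range correlation in women reported above; the function name is hypothetical and the snippet is not part of the original analysis code.

```python
import math


def t_from_r(r, n):
    """Convert a Pearson correlation and sample size into the
    equivalent t statistic and its degrees of freedom."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df


# F0 range, speech vs. singing, in women: r = 0.160, N = 79.
t_value, df_value = t_from_r(0.160, 79)
```

For r = 0.160 and N = 79, this gives t ≈ 1.42 with 77 degrees of freedom, which corresponds to a two-tailed p value of roughly 0.16.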

Structural Models

The model which analyzes the fundamental frequency is not path-invariant with respect to the sex of individuals (χ² = 117.03, df = 44, p < 0.001) but is path-invariant with respect to participants’ nationality (χ² = 49.58, df = 44, p = 0.26 in men, χ² = 60.68, df = 44, p = 0.05 in women). Results are therefore reported separately for men and women but jointly for Czech and Brazilian participants.

Using path analysis (see Supplementary Tables S6, S7 for full models), we found that in men, lower-pitched speech was rated as more attractive (Figure 1). The same held for singing, but this relationship did not reach statistical significance. In men, a broader speech range, but not singing range, was rated as more attractive. Attractive speech was positively associated with the total SOI, but this relationship failed to maintain its stability in jackknife resampling. The total SOI was directly connected to a lower F0 in speech and higher F0 in singing. Body weight had a strong and positive direct effect on perceived speech and singing attractiveness. Age had a negative effect on speech attractiveness but the effect failed to remain stable under jackknifing (see Supplementary Table S8).

Figure 1. Path analysis results for F0. Arrows represent estimated parameters. Relationships significantly different from 0 (indicated by robust permutation yielded p values) are colored (positive relationships in green, negative in red) and labeled with standardized model estimates. Relationships that failed to meet the jackknife significance stability criteria are represented with a dashed line. F0 = average fundamental frequency; WSR = waist-to-shoulder ratio; and WHR = waist-to-hip ratio.

Higher-pitched female voices (both in speech and singing) were rated as more attractive. No other relationship except for correlation between height and weight was significant (see Supplementary Tables S7, S9).

The additional model that analyzed vocal tract length (VTL) was not path-invariant with respect to the sex of individuals (χ² = 109.44, df = 44, p < 0.001) but was path-invariant with respect to participants’ nationality at least in women (χ² = 66.99, df = 44, p = 0.01 in men, χ² = 59.18, df = 44, p = 0.06 in women). Results are reported separately for men and women but jointly for Czech and Brazilian participants for a better comparison with the original model that employs the F0.

Many relationships in the structural model remained similar when we replaced average F0 with apparent VTL (Figure 2). Nevertheless, the VTL failed to predict speech or singing attractiveness reliably. In women, we observed a reverse relationship between speech and singing VTL and the total SOI. In this model, however, these relationships were stronger because the potentially mediating path between VTL and attractiveness was weaker. This was possibly due to the fact that in the first model, which relied on average fundamental frequency together with the F0 range, both measurements of vocal quality were based on the same characteristic (F0 – either as average or as a difference between minimum and maximum), which in effect allowed us to partition out their respective contributions to speech and singing attractiveness better. The model with VTL, which tightly correlated with average F0, lowered the partial correlations below the threshold of statistical significance. All the relationships were, however, in the direction that would be expected based on the strong negative correlation between VTL and mean F0 (see Supplementary Tables S10–S12).

Figure 2. Path analysis results for VTL. Arrows represent estimated parameters. Relationships significantly different from 0 (indicated by robust permutation yielded p values) are colored (positive relationships in green, negative in red) and labeled with standardized model estimates. Relationships that failed to fulfill the jackknife significance stability criteria are represented with a dashed line. VTL = apparent vocal tract length; WSR = waist-to-shoulder ratio; and WHR = waist-to-hip ratio.

The Effect of Singing Experience and Voice Modulation on Voice Attractiveness

Non-parametric correlations showed a positive association between how much men liked to sing and attractiveness as rated from both speech (τ = 0.253, N = 87, p < 0.001) and singing (τ = 0.277, N = 87, p < 0.001). In women, this association was rather weak and significant only in singing attractiveness (τ = 0.171, N = 90, p = 0.024) but not in speech attractiveness (τ = 0.101, N = 91, p = 0.183). Furthermore, the absolute difference of F0 between speech and singing was positively correlated with how much men and women liked to sing (τ = 0.255, N = 90, p = 0.001; τ = 0.281, N = 93, p < 0.001, respectively). Moreover, the absolute difference of F0 was positively associated with rated singing attractiveness in both men (τ = 0.177, N = 87, p = 0.015) and women (τ = 0.294, N = 90, p < 0.001) but not significantly associated with speech attractiveness in either men (τ = 0.123, N = 87, p = 0.092) or women (τ = 0.118, N = 90, p = 0.101). Finally, the absolute difference of F0 was weakly positively associated with sociosexuality in men (τ = 0.139, N = 80, p = 0.069) but not in women (τ = 0.036, N = 84, p = 0.632). The absolute differences between spoken and sung F0 range and between spoken and sung VTL were not significantly correlated with rated attractiveness or sociosexuality.

The Effect of Targets’ Sex and Country on Voice Attractiveness Ratings From Speech and Singing

Tests of between-subjects effects in the GLM showed a significant main effect of sex of targets on attractiveness rated both from speech (F = 13.84, df = 1, 157, p < 0.001, η_p² = 0.082) and singing (F = 36.48, df = 1, 157, p < 0.001, η_p² = 0.192). Estimated marginal means revealed that the voices of female participants were rated as more attractive based on both speech (mean rating = 3.89, SD = 0.65) and singing (mean rating = 3.82, SD = 0.73) than the voices of male participants (mean ratings = 3.48, SD = 0.66; and 3.11, SD = 0.72, respectively). There was no effect of country.

Discussion

Using a cross-cultural sample of men and women, we have shown that speech and singing attractiveness are strongly correlated. We also found a strong correlation between the fundamental frequency (F0), F0 range, and vocal tract length (VTL) in both vocal displays. In men, low-pitched speech was rated as attractive and a similar trend was observed in singing. Furthermore, both vocal displays were invariably associated with body size (but not shape) and differently associated with sociosexuality. In women, both high-pitched singing and speaking voice predicted vocal attractiveness, and similarly to men, VTL as measured from singing and speech was differently associated with sociosexuality. Most results were invariant with respect to participants’ nationality, which indicates a degree of universality.

Our results partly support the hypothesis that speech and singing work as backup signals. They share many vocal parameters, such as fundamental frequency, its range and formant frequencies, which could lead to similar attractiveness ratings in both vocal displays (for similar results, see Isenstein, 2016). Both studied vocal displays thus covary in their production and perception and can transmit similar information to listeners. This is in line with previous studies which show that women’s cross-modal attractiveness or masculinity as rated from faces and spoken voices are intercorrelated, although no such correlation was found in men (Valentova et al., 2016; Pereira et al., 2019).

Nevertheless, we also found some features which are specific to the singing and speaking voice. For example, male speech attractiveness, but not singing attractiveness, is associated with higher sociosexuality (for similar results, see Hughes et al., 2004). The observed absence of association between singing attractiveness and male sociosexuality may suggest that singing voice is not part of the repertoire of short-term sexual strategy, at least in the two studied populations, which does not, however, exclude the possibility that it may be used to foster long-term relationships. Further, in line with previous studies, lower F0 in speech was directly connected to higher sociosexuality in men (e.g., Puts, 2005), while lower F0 in singing was connected to lower sociosexuality. Again, this could point to a possible use of the singing vocal display in a committed long-term sexual strategy rather than a short-term one, which needs to be tested in future studies.

Further, whereas a high F0 in both speech and singing predicted vocal attractiveness in women, only low speech F0 was rated as attractive in men, although a similar non-significant trend also appeared in singing. This is in line with a study that found no difference in the attractiveness ratings of high- and low-pitched performances of famous singers (Neumann et al., 2008). Nevertheless, when analyzing the relative vocal parameters (difference in voice pitch between spoken and sung voice of the same person), we found that the singing voice of individuals who are capable of a higher pitch modulation is perceived as more attractive. In accordance with the handicap theory, individuals who can produce a larger difference between their spoken baseline and singing performance can thus benefit in terms of higher attractiveness and consequently potentially higher fitness. In line with this, men who modulated their voice pitch more had a tendency for higher sociosexuality, and men who like to sing more had more attractive voices. Both singing experience and higher capacity of voice modulation are thus linked to male attractiveness and sexuality.

Interestingly, in our study speech was on average rated as more attractive than singing. This can indicate that the standards for evaluation are higher in the singing domain, whereby singing abilities (e.g., singing in-tune), which are 40% heritable (Park et al., 2012), and were not tested in this study, may have influenced this difference. Nevertheless, another study found higher attractiveness ratings for singing than for speech in women and found no association between attractiveness ratings and singing quality (Isenstein, 2016). More studies are clearly needed to discern and determine the overall pattern.

We found that body weight was a strong positive predictor of both speech and singing attractiveness in men and a weak negative predictor of singing attractiveness in women (for similar results, see e.g., Sell et al., 2010; Xu et al., 2013; Šebesta et al., 2017). Weight also positively predicted VTL as estimated from speech in men, which is likewise in line with previous studies (for a review, see Pisanski et al., 2014). Some studies found differences in several vocal parameters (F0, voice pressure, perceptual voice quality) as a function of body weight, whereby heavier individuals have lower-pitched voices of more attractive perceptual quality (Barsties et al., 2013; Jost et al., 2018). The link between a decrease in F0 and an increase in body weight could be driven by hormonal factors, since for example in men, an increased amount of fat tissue relates to lower testosterone levels (Zumoff et al., 1990; Tchernof et al., 1995). On the other hand, body weight may reflect not only body fat but also muscularity, both of which are correlated with body size. Since the male body is composed relatively more of muscle than of fat tissue, one could speculate that vocal attractiveness provides a reliable cue specifically to muscularity, but future studies should assess the contribution of individual body components to vocal attractiveness. We also predicted a stronger association between body size and singing attractiveness but our results did not confirm this hypothesis. In humans, as in some songbirds (Hall et al., 2013), different vocal manifestations can thus serve as a cue to body size but not to body shape. This is in line with the finding that lower-pitched voice affects the perception of physical dominance (Puts et al., 2007).

Although women report that they like to sing more than men (Varella et al., 2010), and women and men both prefer sexual partners who demonstrate some music abilities (Kaufman et al., 2016), we found no association between singing or speaking voice attractiveness and sociosexuality or body indicators in women. This is contrary to previous studies (e.g., Hughes et al., 2004) which reported that women with attractive speaking voices had a lower waist-to-hip ratio, a lower age of first sex, and a higher total number of sexual partners. Nevertheless, we found that shorter VTL measured from speech and longer VTL measured from singing predicted higher sociosexuality in women (for similar results in men, see Hodges-Simeon et al., 2011). This is comparable to our finding obtained for men when we analyzed the fundamental frequency. Generally speaking, individuals with sex-typical speech parameters and sex-atypical singing parameters have higher sexual success (see Bártová et al., 2019, for similar results on higher sociosexuality and gender non-conformity), which further supports the handicap hypothesis. Interestingly, there was no effect of the VTL on voice attractiveness and no effect of voice attractiveness on sociosexuality in women. Women’s tendency for sexual variety thus does not seem to be defined by how attractive they appear to the opposite sex. Access to sexual partners in individuals who display honest signals can be influenced by other mechanisms, such as intra-sexual competition (Varella et al., 2017; Ostrander et al., 2018).

This is the first study whose aim was to test the potential involvement of intersexual selection on different vocal displays on a cross-cultural sample of men and women (for intrasexual selection, see Raine et al., 2018; Šebesta et al., 2019). Although we used four different vocal recordings (standardized self-presentation, singing of “Happy Birthday,” and reading and singing of the national anthem), they do not represent the full range of human speech or singing. Standardized songs, such as “Happy Birthday,” are likely to limit pitch dynamics and range and thereby obscure or dampen the individual differences in pitch and voice modulation which might otherwise provide important cues to fitness.

Studies using different vocal recordings, such as spontaneous speech and singing, singing of more mating-relevant songs, or wordless singing, should be undertaken. The limited demands of the standardized recordings used here might be why some of our predictions were not supported. It is for instance possible that a link between quality indicators and singing attractiveness becomes apparent in more demanding singing that involves complex rhythms, melody, or range (Charlton, 2014). The production of such demanding songs could be viewed as costly signaling and therefore serve as a more reliable indicator than the relatively undemanding songs employed in this study. Moreover, future studies should also perform more fine-tuned vocal analyses to compare both singing and speech (Šebesta et al., 2019).

It also ought to be taken into account that our samples were recruited from middle-class university student populations in the largest cities of both countries. They were thus not representative of the local populations, and moreover, we compared only two countries. More cross-cultural comparisons are needed to test the generalizability of our current findings (see Moshontz et al., 2018, for multi-lab psychological studies). Finally, as correlations between Czech and Brazilian raters were high, we pooled the ratings together, and did not analyze potential in-group and out-group effects, which might be addressed in future studies.

To conclude, we expected that singing would be a stronger indicator of individual body characteristics and sexuality than speech but our results show that cross-culturally, speech and singing seem to work rather in concert, i.e., as backup signals. Attractiveness of both singing and speaking voice is perceived in a similar way and is connected to a higher pitch in women and a lower pitch in men. Moreover, in men, speaking and singing both serve as similar cues to body indicators. On the other hand, the relation between speaking and singing voice and sociosexuality works in opposite ways in both men and women. Developmental pathways leading to sex-typical or atypical speaking and singing voice and sexuality should be addressed in future studies. In general, singing, together with other vocalizations, should be taken into account in evolutionary literature on voice production and perception.

Ethics Statement

This study was carried out in accordance with the recommendations of the Charles University IRB with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Charles University IRB (2011/07).

Author Contributions

JV and JH developed the study concept and MV expanded it. JV, MV, FM, KP, LK, and PS collected the data. JV performed the analysis of F0 and F0 range of the vocal stimuli. PŠ performed the formant analyses during revisions of the manuscript. JV and PT performed the data analysis and interpretation jointly with MV and JH. JV and MV drafted the manuscript. PT and JH provided the critical revisions. JV, MV, JH, PŠ, and PT worked on the revised version of the manuscript. All authors approved the final version of the manuscript for submission.

Funding

JH was supported by the Charles University Research Centre program UNCE 204056. MV was supported by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), number PNPD 33002010037P0 – MEC/CAPES.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We are indebted to all volunteers for their participation and Anna Pilátová, Ph.D. for English proofreading. We are grateful to Tiago Leal Dutra de Andrade for helping with data collection during the ratings phase in Brasília. We further thank Prof. Dr. Vera S. R. Bussab for enabling the initial data collection phase at the University of São Paulo. We also thank the reviewers who offered valuable and critical suggestions for improvement.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02029/full#supplementary-material

References

Åkerlund, L., and Gramming, P. (1994). Average loudness level, mean fundamental frequency, and subglottal pressure: comparison between female singers and nonsingers. J. Voice 8, 263–270. doi: 10.1016/s0892-1997(05)80298-x

Apicella, C. L., and Feinberg, D. R. (2009). Voice pitch alters mate-choice-relevant perception in hunter-gatherers. Proc. R. Soc. B 276, 1077–1082. doi: 10.1098/rspb.2008.1542

Apicella, C. L., Feinberg, D. R., and Marlowe, F. W. (2007). Voice pitch predicts reproductive success in male hunter-gatherers. Biol. Lett. 3, 682–684. doi: 10.1098/rsbl.2007.0410

Babel, M., McGuire, G., and King, J. (2014). Towards a more nuanced view of vocal attractiveness. PLoS One 9:e88616. doi: 10.1371/journal.pone.0088616

Barsties, B., Verfaillie, R., Roy, N., and Maryn, Y. (2013). Do body mass index and fat volume influence vocal quality, phonatory range, and aerodynamics in females? Codas 25, 310–318.

Bártová, K., Štěrbová, Z., Varella, M. A. C., and Valentova, J. V. (2019). Femininity in men and masculinity in women is positively related to sociosexuality. Pers. Individ. Differ. 152:109575. doi: 10.1016/j.paid.2019.109575