What's next for size-sound symbolism?

Ekström, Axel G.

doi:10.3389/flang.2022.1046637

MINI REVIEW article

Front. Lang. Sci., 03 November 2022

Sec. Psycholinguistics

Volume 1 - 2022 | https://doi.org/10.3389/flang.2022.1046637

Axel G. Ekström^*

Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden

This text reviews recent research in phonetic size-sound symbolism – non-arbitrary attributions of size properties to speech acoustic properties. Evidence from a wide range of research works is surveyed, and recent findings from research on the relationships between fundamental frequency, vowel articulation, consonant articulation, phonation type, mora count, and phonemic position, are discussed. It is argued that a satisfactory explanatory model of phonetic size-sound symbolism should meet two criteria: it should be able to explain both (1) the relationship between size and speech acoustics (the Association criterion), and (2) the inconsistent findings observed across languages in the relevant literature (the Inconsistency criterion). Five theories are briefly discussed: The frequency code, Embodied cognition, Sound-meaning bootstrapping, Sapir-Whorf hypotheses, and Stochastic drift. It is contended that no currently available explanatory model of size-sound symbolism adequately meets both criteria (1) and (2), but that a combination of perspectives may provide much of the necessary depth. Future directions are also discussed.

Introduction

In phonetics, size-sound symbolism – the non-arbitrary association of speech acoustic properties with estimates of speaker physical size – has been the subject of almost a century of research efforts (Sapir, 1929; Jespersen, 1933; Peterfalvi, 1965; Ultan, 1978; Ohala, 1984, 1994; Diffloth, 1994; Tsur, 2006). The present brief review centers on recent research findings on size-sound symbolism in experimental settings. The text summarizes findings from various aspects of speech acoustics, including fundamental frequency, and vowel and consonant articulation¹. In a later section, explanatory models with bearing on the phenomenon are evaluated. Future directions are discussed in the final section.

Cues to size in frequencies, words, and elsewhere

Symbolism in fundamental frequency

Various researchers have investigated whether fundamental frequency of phonation (f₀) is cognitively associated with size (Ohala, 1984; McComb, 1991; Masataka, 1994). f₀ reflects the rate of vocal fold oscillation, corresponding to pitch in perception. Because longer vocal folds oscillate more slowly, vocal fold length exhibits a negative correlation with f₀. Thus, on average, larger species do indeed tend to produce lower-frequency f₀ (Hauser, 1993), though the variable size of various laryngeal properties likely plays a greater role than body size per se (Titze, 1994; Garcia et al., 2017; Grawunder et al., 2018). However, within species, various studies, including research on both humans (Gunter and Manning, 1982; Wermke and Robb, 2010) and nonhuman animals (e.g., Masataka, 1994) have failed to observe any such relationship.
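The inverse relationship between vocal fold length and f₀ can be illustrated with the ideal-string approximation, f₀ = (1 / 2L) · sqrt(σ / ρ), where L is the vibrating length, σ the tissue stress, and ρ the tissue density. The sketch below is illustrative only: the function name, the stress and density defaults, and the two fold lengths are assumptions chosen for the example, not measurements taken from the works cited.

```python
import math


def string_model_f0(length_m, stress_pa=15_000.0, density_kg_m3=1040.0):
    """Ideal-string estimate of fundamental frequency in Hz.

    f0 = (1 / (2 * L)) * sqrt(stress / density)
    The default stress and density are assumed, illustrative values.
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return math.sqrt(stress_pa / density_kg_m3) / (2.0 * length_m)


# Hypothetical vibrating lengths in metres, with tissue stress held constant.
short_fold_f0 = string_model_f0(0.010)
long_fold_f0 = string_model_f0(0.016)

print(f"short folds: {short_fold_f0:.0f} Hz, long folds: {long_fold_f0:.0f} Hz")
```

Under these assumptions the longer folds yield the lower f₀. Because real speakers also vary tissue stress, the approximation is compatible with the weak within-species relationships reported above.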

Because the larynx is sensitive to sex hormones (Newman et al., 2000), f₀ represents a sexually selected trait in humans (Delgado, 2006; Puts et al., 2016). Thus, when size-f₀ correlations are observed, relationships are often conditional, and typically more prominent in male speakers (see overviews by Ey et al., 2007; Riede and Brown, 2013). As testosterone thickens the laryngeal folds, taller men and men with higher testosterone levels typically exhibit lower-frequency f₀ (González, 2004; Bruckert et al., 2006). Considered overall, however, the contribution of f₀ to size estimates is limited, with one meta-analysis finding that f₀ explained only 2% of the variance in men's height and 0.5% of the variance in women's height (Pisanski et al., 2014), though f₀ may still convey semantic information. For example, pitch has been shown to correspond perceptually to spatial elevation in research on music perception (Shintel et al., 2006; Küssner et al., 2014), prosody (Ekström et al., 2022), and speech more generally (Shinohara et al., 2020). Due to the spurious and complicated nature of the evidence, however, much work has instead concentrated on other aspects of speech signals as suggestive of size.
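To give a sense of scale, the percentages from the meta-analysis can be converted into correlation magnitudes. The sketch assumes that the reported figures are proportions of variance explained (r squared) from simple bivariate relationships; that reading, and the function name, are assumptions made here for illustration.

```python
import math


def correlation_magnitude(variance_explained):
    """Return |r| for a proportion of variance explained (r squared)."""
    if not 0.0 <= variance_explained <= 1.0:
        raise ValueError("variance_explained must lie between 0 and 1")
    return math.sqrt(variance_explained)


men_r = correlation_magnitude(0.02)     # 2% of the variance in height
women_r = correlation_magnitude(0.005)  # 0.5% of the variance in height

print(f"men: |r| = {men_r:.2f}, women: |r| = {women_r:.2f}")
# prints "men: |r| = 0.14, women: |r| = 0.07"
```

Correlations of this magnitude are weak by most conventions, which is consistent with the limited contribution of f₀ described above.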

Symbolism in vowels

Vowel articulation is reliably modeled as a series of alternately compressed tubes through which pulmonary air flow is expelled, resulting in variable vowel quality (Fant, 1960). For example, [a] is reducible to an initial compressed tube (where compression corresponds to constrictions on vocal tract air flow, e.g., by tongue position or lip rounding), and a subsequent open tube, resulting in a “back” vowel; for the “front” [i], the relationship between tubes is reversed. The acoustic results are formants – high-energy peaks in the frequency spectrum, corresponding to resonances in the vocal tract. Resulting vowel quality has been argued as suggestive of physical size, such that higher vowels and front vowels are suggestive of smaller physical size, and back vowels and lower vowels are suggestive of larger size (Tarte, 1974; Ultan, 1978; Fitch, 1997; Owren et al., 1997; Knoeferle et al., 2017).
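The link between tube dimensions and formants can be sketched with the simplest case of such models: a single uniform tube, closed at the glottis and open at the lips, whose resonances are F_n = (2n − 1) · c / 4L. This is a deliberate simplification of the multi-tube models described above. The function name, the speed of sound, and the two tract lengths are assumptions chosen for the example.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air


def uniform_tube_formants(tract_length_m, n_formants=3, c=SPEED_OF_SOUND_M_S):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L)
    """
    if tract_length_m <= 0:
        raise ValueError("tract_length_m must be positive")
    return [(2 * n - 1) * c / (4.0 * tract_length_m)
            for n in range(1, n_formants + 1)]


# Hypothetical vocal tract lengths in metres.
longer_tract = uniform_tube_formants(0.175)
shorter_tract = uniform_tube_formants(0.125)

for label, formants in (("17.5 cm", longer_tract), ("12.5 cm", shorter_tract)):
    print(label, [round(f) for f in formants])
```

In this idealization every resonance scales inversely with tube length, so the shorter tube produces uniformly higher formants. This is one way to see why formant patterns could, in principle, carry information about size.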

Empirically, back vowels have indeed been shown to be positively associated with size in German and Hungarian (Elsen et al., 2021), and in Chinese, English, Japanese, and Korean (Shinohara and Kawahara, 2010). Researching ethnozoological nomenclature for birds and fish across four Central and South American languages, Berlin (1994) found that names exhibited significant size-sound symbolism, such that names for smaller creatures contained disproportionate numbers of high-frequency vowels, i.e., [i], and names for larger creatures contained disproportionate numbers of low-frequency vowel sounds, i.e., [a] and [u]. Similarly, a recent lexicographic approach by Winter and Perlman (2021) found that sound structure was indeed indicative of semantic size in English size adjectives (but not for general words), in particular for vowels [i] (small), [ɪ] (small), and [a] (large).

Notably, however, while effects of linguistic background are sometimes negligible (Hoshi et al., 2019; Elsen et al., 2021), investigations often present puzzling contradictory results, challenging any supposed universality. In an important potential counterexample, Diffloth (1994) examined iconicity in the Bahnar language of Vietnam and found iconic values opposite those purported universal (Ohala, 1984): high vowels corresponded to larger expressives, and low vowels to smaller ones. Observations running counter to typical findings are rare, however (Lockwood and Dingemanse, 2015). More common is an apparent lack of findings when some experimental or observational method is applied to multiple languages. For example, while Berlin's (1994) study of ethnozoological nomenclature found evidence for size-symbolic naming schemes in birds across four languages, the same naming scheme was only observed in one language when applied to fish.

Symbolism in consonants

While there is a strong tradition focusing on the role of vowels in potential size-sound symbolism, the role of consonants has received comparatively little attention. Recently, however, a number of works have investigated the relationship. For instance, Klink's (2000) investigations of size-sound symbolism in company brand names suggested that names non-arbitrarily communicate meaningful information about the product (including size), with fricatives and voiceless obstruents being more readily associated with smaller products and stops and voiced obstruents more readily associated with larger ones. Indeed, in their study of size-symbolic name attributions to Pokémon characters, Kawahara and Kumagai (2019) observed an effect of voiced obstruents, such that voicing predicted size in both native Japanese speakers and native English speakers – although less so in the latter case.

In an attempted replication of the study, Godoy et al. (2020) explored the role of voiced obstruents in size-symbolic name attributions by native speakers of Brazilian Portuguese but did not observe the exact effect. In native speakers of Brazilian Portuguese, the effect of voiced obstruents (ready name attribution to post-evolution Pokémon characters) was more likely to occur when two voiced obstruents were present in a word (unlike the single voiced obstruent required for the same effect to be observed in native Japanese speakers; Kawahara and Kumagai, 2019), again suggesting learned language-specific differences in symbolism. Research on non-voiced consonants is much rarer, though results by Winter and Perlman (2021) suggested that [t], a voiceless stop, was indicative of smaller size. Finally, Levickij (2013) observed extensive correlations between phonemes symbolizing size in languages from 12 language families, such that close articulations were associated with smaller size and voiceless articulations were associated with smaller size and lighter weight.

Further, there are again several examples of interlinguistic inconsistencies across the size-symbolic literature, as applied to consonants specifically. For example, while Duduciuc and Ivan (2015) replicated Klink's findings on iconicity in brand name vowels, they failed to observe the same effects of fricatives (as predictive of smaller size, vs. voiceless stops) in a Romanian sample. The same finding was also not corroborated by Winter and Perlman (2021). Moreover, voiced obstruents – low-frequency consonantal phonemes such as [dʒ] (i.e., the /j/ in the English “job”), assumed symbolic of greater size (Ohala, 1984) – have been shown to be similarly related to largeness in Chinese, English and Japanese – but not in Korean (Shinohara and Kawahara, 2016). Finally, Saji et al. (2019) found that native Japanese speakers tended to invent words involving voiced consonants for “big” and “heavy” moving objects, and words involving voiceless consonants for “small” and “light” ones – but the same relationship was not observed in native English speakers.

Human speech is, however, not made up solely of f₀, vowels, and consonants, and a variety of voice cues including voice quality, and phonemic length and position, have recently been shown to facilitate size attributions.

Symbolism in other sources

Phonation type

Akita (2021) investigated iconicity in four phonation types – modal voice, creaky voice, falsetto, and whispering – in a sample of native Japanese speakers and found that voice creakiness was associated with larger images and whispering with smaller images. Creakiness reflects the voluntary recruitment and pulling together of laryngeal arytenoid cartilages, relaxing the vocal folds and slowing glottal air flow; creakiness is also typically characterized by comparatively low f₀ (Gordon and Ladefoged, 2001). In comparison, voiceless sounds (i.e., whispering) are characteristically high-frequency signals. To the knowledge of the author, however, neither finding has yet been replicated in other languages.

Mora count

Recent work has also demonstrated a “longer-is-stronger” effect, such that larger Pokémon characters were attributed names with greater mora counts (i.e., long vowels; Kawahara et al., 2018, 2020; Kawahara, 2020). Following up on these findings, Kawahara and Kumagai (2019) showed how both native Japanese and English speakers attributed names containing greater mora counts to evolved (i.e., larger) characters, compared to their un-evolved (smaller) forms (see also Kawahara and Breiss, 2021). It has been suggested this effect is comparable to quantitative iconicity in grammar (Haiman, 1980; see discussion in Kawahara et al., 2018; Kawahara and Breiss, 2021). If so, perception of extended vowel production (with greater mora count) more readily affords iconicity, compared to shorter productions.

Phoneme position

Finally, initial evidence of a phonemic positional effect was provided by Kawahara et al. (2008), who pointed to word-initial syllables as determinants of iconicity. Additionally, in a larger-scale study, Haynie et al. (2014) examined 120 Australian languages, observing significant positional effects. The authors found a higher occurrence of palatal consonants (e.g., [j], the /y/ in the English “yes”) in medial positions for “small” items, while low sounds were frequently associated with “large” items, regardless of position. Word-initial syllables are indeed meaningful for word recognition (Marslen-Wilson, 1980), but – unlike voice quality and length of phonation – positional effects are not obviously afforded by existing theories of size-sound symbolism. Explicating this issue should be the goal of future theoretical work on the topic.

Discussion

What explains size-sound symbolism?

While extensive research has focused on the phenomenon of size symbolism in speech, explanatory models have largely been neglected in comparison (Lockwood and Dingemanse, 2015; Sidhu and Pexman, 2018). In this section, a set of relevant theories is briefly reviewed and discussed, with a focus on whether a given theory can explain (1) phonetic size-sound symbolic associations as a cognitive phenomenon (the Association criterion); and (2) inconsistent effects of iconicity across studies of different languages (the Inconsistency criterion).

The frequency code

No discussion of phonetic size symbolism can be complete without explicit coverage of Ohala's (1984) frequency code hypothesis (FC). Indeed, evidence from bioacoustics has been provided based on research on vocalizations by a range of mammalian species (for a review, see Charlton and Reby, 2016), including humans (Pisanski et al., 2014). Human cross-species size attribution has also been explored (Taylor et al., 2008). The relative contextual success of the theory stems from its reflecting a fundamental principle of physics, with larger laryngeal components vibrating at slower rates (Titze, 1994). Nevertheless, theories purely based on bioacoustics have no ready explanations for the apparent interlinguistic differences commonly observed across studies: if associations were universal, we should expect equivalent findings across languages. Thus, FC constitutes a strong candidate for meeting the Association criterion but does not meet the Inconsistency criterion.

Sound symbolism bootstrapping

The “Sound symbolism bootstrapping” hypothesis was presented by Imai and Kita (2014), who argued that symbolism helps infants associate speech sounds with referents. Cross-modal sound-symbolism correspondence does indeed appear early in life (Ozturk et al., 2013; Imai et al., 2015), facilitating word learning (Lockwood et al., 2016) and verb learning (Yoshida, 2012). Further, semantics of an image (as opposed to de-facto size) appear to drive symbolism (Auracher, 2017), suggesting the connection is, at least partially, learned. The bootstrapping hypothesis thus appears to complement FC, providing a potential developmental component that may facilitate subsequent mapping, partially meeting the Association criterion; however, it does not contribute toward meeting the Inconsistency criterion.

Theories of embodiment

Embodied cognition (EC) accounts assume that aspects of cognition and information processing are contingent on bodily aspects, including the motor and perceptual systems (Lakoff and Johnson, 1980; Johnson, 1987). In speech research, such theories corroborate the motor theory of speech perception proposed by Liberman and colleagues (Liberman et al., 1967; Liberman and Mattingly, 1985), which argues that perception of speech is contingent on speech gestures. Intriguingly, a meta-analysis on the emergence of sound symbolism by Fort et al. (2018) suggests that a “bouba” effect (round sound-shape symbolism) emerges prior to a “kiki” effect (sharp sound-shape symbolism) in human infants². At birth, an infant does have fully developed lips (i.e., for suckling) but cannot articulate plosives such as /k/, which require significant lingual development (Lieberman, 2012). Nevertheless, EC perspectives cannot explain why some languages appear to map iconic-phonetic features in contradictory ways (Diffloth, 1994; Inconsistency criterion).

Sapir-Whorf hypotheses

Phonetic symbolism shares a common origin with Sapir-Whorf hypotheses (SW) – assumptions that aspects of languages shape speakers' patterns of cognition (Sapir, 1929). While controversial in the literature (Deutscher, 2010; McWhorter, 2014), SW holds apparent value as potentially helping explore differences in symbolism across languages. For example, a probabilistic inference model of color perception by Cibelli et al. (2016) suggests that language-specific categories most significantly influence perception in conditions of uncertainty. There also appear to be direct exposure effects on pitch perception (Eitan, 2013), and possibly interacting effects of exposure on perception of auditory-spatial relationships (Ekström et al., 2022). However, while SW may provide an ostensible mechanistic explanation of interlinguistic inconsistencies (that is, why they exist at all), it does not present a reasonable explanation as to how inconsistencies arise (Inconsistency criterion); nor does it purport to explain the relationship between phonetic-symbolic features of spoken language (Association criterion).

Stochastic drift

Levickij (2013, p. 88) observed: “In a particular language […] the potential synaesthetic effect will be weakened or come to nothing by phonetic and semantic laws of this language. The set of subjective symbolism should be therefore theoretically always larger than that of objective symbolism universals.” Work comparing apparent phonetic symbolism across a great number of languages (Levickij, 2013; Haynie et al., 2014) may indeed be indicative of an explanatory role for randomness in interlinguistic differences in sound-symbolic sensitivity.

In population genetics, stochastic drift is the change of the average value of a random process (Fisher, 1922, 1930; Haldane, 1927). Drift is fundamental to evolutionary biology, explaining the appearance of novel traits and species as fixations of changes in sets of genotypes, resulting from changes in frequency across generations. Because linguistic forms are comparably copied between speakers and groups across generations, Newberry et al. (2017) argued, based on quantification of selection pressures relative to stochastic drift in grammatical changes in English, that drift is likely stronger for irregular forms of (English) past-tense verbs and rare words. While such an approach may explain the evolution and spread of interlinguistic differences in size-symbolic cognitions (Inconsistency criterion; indeed, it is the only explanation discussed here that seemingly may do so), it does not lend itself to explaining the source of those cognitions themselves (Association criterion).
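How drift alone could produce differences between initially identical speech communities can be sketched with a neutral Wright-Fisher-style simulation. This is not the method used by Newberry et al. (2017). The function name, the population size, the number of generations, and the reading of the “variant” as a hypothetical sound-size mapping are all assumptions made for illustration.

```python
import random


def simulate_drift(pop_size, start_freq, generations, seed=0):
    """Neutral drift of one variant's frequency across generations.

    In each generation, every one of pop_size learners adopts the variant
    with probability equal to its current frequency. No selection is applied.
    """
    rng = random.Random(seed)
    freq = start_freq
    trajectory = [freq]
    for _generation in range(generations):
        adopters = sum(1 for _learner in range(pop_size) if rng.random() < freq)
        freq = adopters / pop_size
        trajectory.append(freq)
    return trajectory


# Hypothetical communities that start out identical and differ only by chance.
communities = [
    simulate_drift(pop_size=50, start_freq=0.5, generations=200, seed=s)
    for s in range(5)
]
final_freqs = [trajectory[-1] for trajectory in communities]
print(final_freqs)
```

Each seed stands in for a separate community. In small populations the frequency typically wanders until the variant is either lost or fixed, so communities can diverge without any difference in the underlying associations.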

Concluding comments

In summary, while there have been extensive empirical research efforts targeting various aspects of size-sound symbolism, no cohesive theoretical framework yet exists that lends itself to explaining the nuanced findings observed. Here, two criteria were posited, but no theory discussed readily meets both (see Table 1).

Table 1. Evaluation of five candidate theories and frameworks.

The phenomenon of phonetic size-sound symbolism likely results from a combination of factors, and while findings may well be suggestive of some particular theory, theorists must seek independent motivation and validation. The challenge for the future phonetic symbologist, then, is independently motivating a theory that both accommodates evolutionary-biological principles of bioacoustics and accounts for interlinguistic differences. Reflecting the need for such a theory, future directions are briefly discussed in the final section.

Future directions

While a long-running tradition has utilized synthetic words, exaggerating effects of iconicity (Köhler, 1929; Ramachandran and Hubbard, 2001; Westbury, 2005; Aveyard, 2012), more recent developments in the field have moved toward more realistic and integrative experiments. When stimulus words are derived from natural languages, iconicity effects are smaller compared to synthetic ones (Lockwood and Tuomainen, 2015; Styles and Gawne, 2017), but provide higher-validity data (e.g., Kanero et al., 2014). New research increasingly points to a diversity of cues as responsible for sound-symbolic interpretations (Thompson and Estes, 2011; Knoeferle et al., 2017; Westbury et al., 2018), suggesting a more complex phenomenon than previously believed (e.g., Sapir, 1929). Importantly, there are observable individual differences in sound-symbolic sensitivity (Lockwood et al., 2016). To date, however, while a range of studies have observed group differences across native speakers of various languages, there has been no attempt, known to the author, at isolating potential population-level differences in sound-symbolic sensitivity. Were such effects to be observed, they should provide valuable information toward proper understanding of inconsistencies observed in size-sound symbolism research, as well as contribute toward a cohesive theoretical framework that explicates them.

Author contributions

AE: conceptualization, investigation, writing—original draft, and writing—review and editing.

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. The text as such is concerned with psychological attributions based on acoustic properties of speech sounds. Topics more typical of traditional linguistics (e.g., processing of sentences, syntax, etc.), are not included (see e.g., Altmann, 1998), nor are they of immediate interest to the topic of phonetic symbolism more generally.

2. Note, however, that judgements of size and shape appear to correspond to different acoustic cues, such that size is inferred from vowel F₁, F₂, and shape from F₂ and F₃ (Knoeferle et al., 2017; Saji et al., 2019; but see Elsen et al., 2021).

References

Akita, K. (2021). Phonation types matter in sound symbolism. Cognit. Sci. 45, e12982. doi: 10.1111/cogs.12982

Altmann, G. T. (1998). Ambiguity in sentence processing. Trends Cognit. Sci. 2, 146–152. doi: 10.1016/S1364-6613(98)01153-X

Auracher, J. (2017). Sound iconicity of abstract concepts: Place of articulation is implicitly associated with abstract concepts of size and social dominance. PLoS One 12, e0187196. doi: 10.1371/journal.pone.0187196

Aveyard, M. E. (2012). Some consonants sound curvy: Effects of sound symbolism on object recognition. Memory Cognit. 40, 83–92. doi: 10.3758/s13421-011-0139-3

Berlin, B. (1994). Evidence for pervasive synesthetic sound symbolism in ethnozoological nomenclature. Sound Symbolism 76–93. doi: 10.1017/CBO9780511751806.006