The Development of Idiom Knowledge Across the Lifespan

Sprenger, Simone A.; la Roi, Amélie; van Rij, Jacolien

doi:10.3389/fcomm.2019.00029

ORIGINAL RESEARCH article

Front. Commun., 05 July 2019
Sec. Psychology of Language
Volume 4 - 2019 | https://doi.org/10.3389/fcomm.2019.00029

Simone A. Sprenger¹*

Amélie la Roi¹

Jacolien van Rij²*

¹Center for Language and Cognition, Faculty of Arts, University of Groningen, Groningen, Netherlands
²Department of Artificial Intelligence, Faculty of Science and Engineering, University of Groningen, Groningen, Netherlands

Knowledge of multi-word expressions, such as break the ice, is an important aspect of language proficiency, yet so far we know surprisingly little about it. For example, it is largely unknown how much speakers vary in the number of different items they know, or which factors contribute to their acquisition. This lack of knowledge seriously limits the generalizability of experimental studies on the production and comprehension of multi-word expressions (usually idioms) and suggests that a sizable territory of language knowledge remains unexplored. Here, we present the results of two familiarity rating studies for a large sample of Dutch idioms and a large number of participants aged between 12 and 86 years. The data show considerable variation between participants and between idioms. Non-linear mixed-effects regression analyses revealed that the age of participants, but not their education, as well as the frequency and decomposability of the idioms influenced the familiarity scores. Our findings suggest that knowledge of multi-word expressions develops across the lifespan, is acquired through exposure, and, in participants younger than about 40 years of age, varies with item decomposability.

Introduction

In everyday language use, many concepts are expressed by multi-word expressions, such as hit the road (depart), break the ice (relieve social tension by means of a remark) or how are you (a formula exchanged when people meet). These expressions must be learned alongside the words and rules that enable us to generate new sentences, and they represent an important aspect of what Pawley and Syder (1983) referred to as nativelike language proficiency. Based on analyses of conversational data, Pawley and Syder estimated the number of such expressions in English at hundreds of thousands and suggested that access to these prespecified expressions in long-term memory is a prerequisite for fluent speech. Yet, even though the importance of multi-word expressions has been recognized in psycholinguistics (as evidenced by numerous experimental studies on the acquisition, processing and production of idioms, which we briefly discuss below), our knowledge about these processing units is still very limited. That is, in contrast to single words, we do not know which factors constrain the multi-word vocabulary or how it varies between speakers. Here, we therefore explore how speaker characteristics (age, education) and item characteristics (frequency, decomposability) jointly affect the acquisition of the Dutch idiom vocabulary across the lifespan.

In an exploration of what he called the boundaries of the lexicon (and thus the theoretical scope of grammatical theories), Jackendoff (1995) argued that the large number of multi-word expressions that speakers of a language know and recognize, which he estimated to be about as large as the number of single words, must in fact be considered entries in the mental lexicon. He illustrated his position with the Wheel of Fortune corpus, which included about 600 compounds, idioms, names and clichés, all considered sufficiently familiar to native speakers to be included in a popular TV game show in which contestants had to guess these phrases from a few hints. Examples include I cried my eyes out, a breath of fresh air and May the Force be with you.

While the nature of the underlying representations of such well-known phrases is still a matter of debate in linguistics and psycholinguistics (e.g., Cacciari and Tabossi, 1988; Fillmore et al., 1988; Cutting and Bock, 1997; Jackendoff, 1997; Titone and Connine, 1999; Sprenger et al., 2006; Libben and Titone, 2008), most idiom researchers agree that they need to be included in the mental lexicon. However, our knowledge about this part of the lexicon is still limited: we do not know how many multi-word expressions a speaker can be expected to be familiar with, or what this knowledge depends on. Estimates in the literature (such as the hundreds of thousands proposed by Pawley and Syder, 1983) are often extrapolations from small samples of conversation. At the same time, collections of multi-word expressions in dictionaries or analyses of large corpora (e.g., Moon, 1998) can only provide upper bounds for the knowledge that a native speaker might acquire. Neither method can provide a reliable estimate of the size of the multi-word vocabulary or of the conditions that affect it.

Psycholinguistic approaches to multiword expressions typically focus on idioms. Apart from the fact that they form relatively fixed combinations of words, their meanings are not a direct function of their constituent words, making them an interesting test case for theories of language comprehension and production. For example, depending on the context, the English phrase to break the ice either refers to relieving the tension in a social situation or to the actual process of crushing frozen water. However, given a context that fits better with the figurative interpretation, native speakers can easily retrieve the correct form and meaning from memory (in production and comprehension, respectively).

Experimental work that tries to uncover the representations and processes responsible for the fast and efficient production and comprehension of idioms depends on high-quality stimulus materials. Two main criteria play a role in this context: first, the idioms must be representative of a larger collection of items (e.g., with respect to the relationship between form and meaning), and second, they must reflect the subjects' knowledge. This second criterion is especially difficult to fulfill. Does every speaker of English know the idiom to kick the bucket, or is that knowledge mostly restricted to idiom researchers? What other well-known expressions are there, and where are these items located in the frequency distribution? In idiom studies, questions about specific items are often answered on an ad-hoc basis, with stimulus materials being rated for familiarity in the context of a specific study. The number of items in these studies is typically small, often around twenty (e.g., Bobrow and Bell, 1973: 5 items; Swinney and Cutler, 1979: 22 items; Cacciari and Tabossi, 1988: 20 items; Cutting and Bock, 1997: 36 items; Gibbs, 1991: 20 items; Sprenger et al., 2006: 16 items), and it is unclear to what extent these items are representative of the category of idioms as a whole. This lack of knowledge is a fundamental problem for psycholinguistic research on idiom production and comprehension, as it limits the potential generalizability of our data.

For various languages, such as English (Titone and Connine, 1994; Libben and Titone, 2008; Bulkes and Tanner, 2017; Nordmann and Jambazova, 2017), French (Caillies, 2009; Bonin et al., 2013, 2017), German (Citron et al., 2016), Italian (Tabossi et al., 2011) and Chinese (Li et al., 2016), norms have been published with the aim of increasing the reliability of stimulus materials in psycholinguistic studies on idioms. These norms provide a number of interesting measures, such as familiarity, decomposability, predictability or emotional valence, for several hundred items per language. That is, for the average speaker of the language in question, these norms provide a best guess about how a specific item scores on the various dimensions, allowing researchers to select items from the corresponding distributions.

However, while norms clearly increase the reliability and validity of idiom tasks, their use is not without problems either. It is important to realize that there is no such thing as an average native speaker: native speakers differ with respect to socio-economic background, education, personality, and age. Given the effect of such variables on the size of our vocabularies at large (Brysbaert et al., 2016), it is conceivable that there are considerable individual differences in the idiom vocabulary as well. For example, Brysbaert et al. (2016) showed that the single-word vocabulary expands rapidly during adolescence but keeps growing steadily until old age, with an average increase of one word every two days. In other words, age affects vocabulary size well beyond the initial stages of language acquisition and cognitive maturation. Yet, idiom norming studies traditionally do not take this factor into account. They usually average across age, often sample from a student population only (e.g., Li et al., 2016), and sometimes do not mention their participants' age at all (e.g., Bulkes and Tanner, 2017). Whether age affects the idiom vocabulary in the same way as the single-word vocabulary is therefore an open question.

Here, we explore the contribution of age to the development of the idiom vocabulary in more detail. If age indeed plays an important role in idiom acquisition, this would have important consequences for the design of experiments that aim to reveal the psycholinguistic processes and representations involved in the production and comprehension of idioms. Apart from the need to calibrate idiom norms for age, an age effect on idiom knowledge would underline the role of individual differences in online idiom comprehension. Reports on such effects have so far been few, but fairly consistent. Cain et al. (2005), for example, studied the relationship between reading comprehension and idiom interpretation in 9-year-olds and found that poor comprehenders were less able to use context when interpreting opaque, but not transparent (or rather, decomposable), idioms. Cacciari et al. (2007) compared slow and fast participants in a comprehension task and found that slow participants needed more perceptual input to identify an idiom and to activate its meaning. Columbus et al. (2015) found effects of executive control capacity on reading times for metaphors, but not for idioms. In contrast, Cacciari et al. (2018) found that idiom comprehension was affected by individual differences in working memory capacity, inhibitory control, and crystallized verbal intelligence, as well as personality-related variables (State Anxiety and Openness to Experience). Taken together, these studies indicate that individual differences affect online idiom comprehension processes and are therefore likely to affect acquisition as well. However, none of these studies considered age as a separate factor.

How would we expect age to affect the idiom vocabulary? First, compared with the pattern that Brysbaert et al. (2016) observed for the single-word vocabulary, the development of the idiom vocabulary may be further delayed by the late development of figurative competence (i.e., the ability to understand an idiom's figurative interpretation, which children acquire at about 9 years of age; Levorato and Cacciari, 1992), as well as by the relatively abstract concepts that many idioms express. So far, there are few empirical data to back up this assertion, as developmental research on idioms has mostly focused on figurative competence rather than on the age at which children acquire specific tokens (e.g., Nippold and Martin, 1989; Levorato and Cacciari, 1992; Nippold and Rudzinski, 1993; Nippold and Taylor, 1995; Nippold and Duthie, 2003; Hung and Nippold, 2014). For example, Nippold and Martin (1989) report an increase in the ability to interpret idioms from the age of 14–17. As their observations are based on only twenty items per subject, we cannot draw conclusions about the size of the subjects' idiom vocabularies.

Beyond adolescence, there are likewise only a few data points to sketch the acquisition curve. A study by Kuiper et al. (2009) shows a rise in idiom knowledge until the age of 50–60 years, followed by a slight drop in the 65+ cohort (ten subjects per cohort). A drawback of this study is that the observations (based on 20 items) are not backed up by inferential statistics, making it difficult to judge their reliability. However, the pattern has been partly confirmed by Escaip (2015). Replicating Kuiper et al.'s (2009) study in Spanish, English, and French, she found a significant positive correlation between age and idiom knowledge in all three languages: the older the participants (aged between 15 and 83), the more idioms they knew. For English, but not for the other two languages, Escaip also found a significant decrease in knowledge for speakers aged 65 and older.

The second important factor that we explore here is idiom frequency. In contrast to age, which is a characteristic of the subjects, frequency is a characteristic of the item itself. As in single-word acquisition, frequency may explain a large part of the variance between idioms. This possibility is supported by a considerable number of studies published in the past decade that demonstrate an important role for frequency in the acquisition of multi-word sequences. For example, Bannard and Matthews (2008) showed that children as young as 2 years old are sensitive to the frequency with which specific word combinations occur in child-directed speech: when asked to repeat sequences of words such as a drink of tea, they make fewer errors on, and by the age of 3 also respond faster to, high-frequency word combinations than matched low-frequency combinations. Likewise, Arnon and Snider (2010) demonstrated that adults are sensitive to the frequency of compositional multi-word phrases like don't have to worry: their subjects responded faster in a phrasal decision task when the phrases were more frequent. Similar facilitatory effects for high-frequency items have been observed in language production by adult speakers, for both literal and more idiomatic sequences (e.g., Tremblay and Tucker, 2011; Janssen and Barber, 2012; Arnon and Cohen Priva, 2013; Sprenger and van Rijn, 2013).

The third factor that we include here is idiom decomposability, which we define as the extent to which the meanings of an idiom's words are related to the figurative meaning of the expression (similar to, for example, Rommers et al., 2013). Like frequency, decomposability is a feature of the individual idiom that may affect the ease with which a specific item can be acquired. If an idiom is highly decomposable, knowledge about its individual words may help the learner to deduce the idiom's meaning and/or to remember the item more easily on later encounters, since the words themselves may act as memory cues. This may explain why the poor comprehenders in the study by Cain et al. (2005) had no difficulty interpreting decomposable idioms, in contrast to opaque idioms. Studies on online idiom processing show that decomposability is a relevant factor. Processing advantages have been reported for decomposable over non-decomposable idioms, for example in sentence verification latencies (Gibbs et al., 1989) and in a lexical decision task that used idioms as primes for target words related to the item's figurative meaning (Caillies and Butcher, 2007). However, the exact way in which decomposability modulates online processing is still disputed, as its effect is not always facilitatory. Titone and Libben (2014) found late inhibitory effects of decomposability in a cross-modal semantic priming task, and Titone et al. (2019) observed late inhibitory effects of decomposability during idiom reading. Interestingly, Westbury and Titone (2011) found an interaction of decomposability with age: in a literality judgment task, older adults were relatively slower than younger adults to accept non-decomposable idioms with a literal meaning and made more errors.

In the present article, we study the effect of age, an easy-to-assess speaker characteristic, on idiom familiarity and compare it to the effects of idiom frequency and decomposability. If idioms indeed have their own entries in the mental lexicon, the idiom familiarity curve (across speakers and items) should be highly similar to that for the single-word vocabulary. That is, it should be modulated by age and education, with an early phase of rapid expansion followed by a long phase of moderate but steady increase, and possibly a decrease (as in Kuiper et al., 2009). Per item, this effect should be modulated by frequency, as we expect the probability of acquisition to be a function of exposure. It may also be affected by idiom decomposability, which is supposed to reflect the ease with which an item can be analyzed, encoded, and retrieved (Caillies and Butcher, 2007). To test these predictions, we collected familiarity ratings for 194 items (189 Dutch idioms and five Dutch translations of German idioms) in two online rating studies and obtained the corresponding corpus frequencies. In addition to the ratings, respondents provided information about their gender, age, and level of education.

The Idiom Database

For the exploration of the effect of age on idiom familiarity (Studies 1 and 2, presented below), we compiled a small database of Dutch idioms. The database is available in the supplementary materials¹ and contains 189 Dutch idioms with their meanings and associated frequency counts. For all idioms (and for the control items, as explained below), we additionally collected decomposability ratings in an online questionnaire.

Materials and Methods

Materials

Ninety-nine Dutch idioms with two nouns were collected for Study 1. They were not controlled for syntactic structure or position of the nouns, but often contained prepositional phrases. The number of nouns was held constant so that the items could also be used in an unrelated behavioral experiment. In addition to the experimental items, five German idioms were literally translated into Dutch and included as control items. All items were presented in the past tense and preceded by the temporal adverb “Toen” (then, at a time in the past), for example “Toen kwam de aap uit de mouw.” (Then the monkey came out of the sleeve, meaning that the true nature of a situation, the true character of a person, or a hidden motive was revealed).

Ninety Dutch idioms with one noun were collected for Study 2. Again, syntactic structure and noun position were not controlled for. All items were presented in the past tense and preceded by the temporal adverb “Toen” (then, at a time in the past), for example “Toen zette hij hem op straat.” (Then he put him on the street, meaning that he laid him off). In contrast with Study 1, no control items were included; all items in Study 2 were existing Dutch idioms.

Frequencies for the idioms and the translated German idioms were obtained from the Lassy Large corpus (Van Noord et al., 2013), a 700-million-word corpus of Dutch texts with automatically assigned syntactic annotations that comprises both spoken and written sub-corpora (including the Dutch Wikipedia). By searching for lemmas rather than exact word forms, we detected most idioms: the counts ranged between 0 (4 items) and 4,688. Surprisingly, three of the five control items were also found in the corpus, probably due to their similarity to other Dutch idioms (for example, the German idiom Then he shot sparrows with cannons is very similar to the Dutch idiom Then he shot mosquitos with cannons). Before analysis, the frequency counts were log-transformed. Figure 1 shows the distribution of the log-transformed frequencies.
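Because four items have a corpus count of 0 and log(0) is undefined, some offset is needed before the log transformation. The text does not say how zero counts were handled; the minimal sketch below assumes the common log(count + 1) convention, which is our assumption for illustration and not necessarily the authors' choice. The example counts are hypothetical apart from the reported range of 0 to 4,688.

```python
import math

def log_frequency(count: int) -> float:
    """Log-transform a corpus count as log(count + 1), so that zero counts stay defined.

    The +1 offset is an assumption for illustration; the article does not state
    how zero counts were handled.
    """
    if count < 0:
        raise ValueError("corpus counts cannot be negative")
    return math.log1p(count)  # log1p(c) == log(1 + c), accurate for small c

# Hypothetical counts within the reported range (0 to 4,688).
for count in (0, 3, 57, 4688):
    print(count, round(log_frequency(count), 2))
```

With this convention a count of 0 maps to 0.0 and the highest reported count maps to roughly 8.45, so the whole range stays on a finite, order-preserving scale.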

FIGURE 1

Figure 1. Left: Frequency (log-transformed) of the 194 items (Dutch idioms and translated German idioms), sorted by frequency. Right: Comparison of the frequency distributions in Study 1 and Study 2.

Participants

The decomposability questionnaire was advertised among students of the University of Groningen. We restricted the age range to 18–25 years to keep the decomposability ratings consistent with earlier studies (Rommers et al., 2013). The data consisted of 57 entries, from which we excluded one participant who was not monolingual Dutch (a Frisian-Dutch bilingual), 21 participants who contributed fewer than ten ratings, and one participant whose age did not match the target age range. The clean data consisted of 34 participants aged 21–26 years (mean 24.3 years; 8 men), who each contributed 15–98 ratings (mean 89.9). Participants did not receive compensation for their participation.

Procedure

The questionnaire was implemented using the survey software Qualtrics (Qualtrics, Provo, UT). Participants could access the questionnaire anonymously via a link. At the start of the questionnaire, participants were informed about the goal of the survey and gave their consent to participate voluntarily. Participants were asked to read idioms and to judge to what extent the meaning of the individual words was related to the figurative meaning of the expression as a whole (cf. Rommers et al., 2013). They had to click on one of five radio buttons, labeled from left to right as “1 (geen relatie tussen individuele woorden en figuurlijke betekenis)” (no relation between the individual words and the figurative meaning), “2,” “3,” “4,” and “5 (sterke relatie tussen individuele woorden en figuurlijke betekenis)” (strong relation between the individual words and the figurative meaning), or on a sixth radio button labeled “ik ben niet bekend met deze uitdrukking” (I am not familiar with this idiom). Three idioms were presented individually at the start of the questionnaire to serve as anchors for the range of the decomposability scale; the remaining idioms were presented in random order. The idioms were divided into two lists of 100 items each (including the anchors). Each participant saw only one of the two lists.
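The presentation scheme (three anchors first, then the remaining items of the assigned list in random order) can be sketched as follows. The item labels and the seed are hypothetical, the text does not specify how Qualtrics randomized the order or how items were assigned to lists, and we assume that the same three anchors opened both lists.

```python
import random

def build_presentation_order(anchors, list_items, seed=None):
    """Return the order for one participant: the anchors first, in fixed order,
    followed by the remaining items of the assigned list in random order."""
    rng = random.Random(seed)
    rest = [item for item in list_items if item not in anchors]
    rng.shuffle(rest)  # random order for all non-anchor items
    return list(anchors) + rest

# Hypothetical labels: three anchors plus 97 list-specific idioms = 100 items.
anchors = ["anchor_1", "anchor_2", "anchor_3"]
list_a = anchors + [f"list_a_idiom_{i:02d}" for i in range(1, 98)]
order = build_presentation_order(anchors, list_a, seed=42)
print(order[:5])
```

Passing a per-participant seed makes an order reproducible; omitting it gives a fresh random order on every call.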

Analyses

The data were analyzed using Generalized Additive Mixed Models (GAMMs; Hastie and Tibshirani, 1990; Wood, 2017), a non-linear mixed-effects regression method. GAMMs do not assume a linear relationship between the dependent variable and a covariate; instead, the relationship is estimated using penalized regression splines. The user therefore does not need to specify the shape of the regression line beforehand, as it is estimated from the data. We also chose this method because it allows tensor product interactions for estimating interactions between multiple non-linear covariates, and because it allows non-linear random effects (for introductions, see Wieling, 2018; van Rij et al., in press). The statistical analyses were performed in R version 3.4.4 (2018-03-15; R Core Team, 2018), using the package mgcv version 1.8-24 (Wood, 2017), which implements GAMMs, and the package itsadug 2.3 (van Rij et al., 2017) for evaluation and visualization of the statistical models.
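The core idea of a penalized regression spline can be illustrated with a minimal sketch under simplifying assumptions: a truncated-line (hinge) basis, a ridge penalty on the knot coefficients only, and a fixed smoothing parameter lam. This is not what mgcv does by default (mgcv uses thin plate regression splines and selects the smoothing parameter from the data, e.g., by GCV or REML), and the sketch has no random effects; the knots, lam values and simulated data are hypothetical.

```python
import numpy as np

def hinge_basis(x, knots):
    """Design matrix: intercept, linear term, and one hinge (x - k)_+ per knot."""
    x = np.asarray(x, dtype=float)
    hinges = np.maximum(x[:, None] - knots[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, hinges])

def fit_penalized_spline(x, y, knots, lam):
    """Minimise ||y - X b||^2 + lam * sum(b_knot^2) and return (fitted values, edf).

    Intercept and slope are unpenalized, so a very large lam shrinks the fit
    toward a straight line.
    """
    X = hinge_basis(x, knots)
    penalty = np.diag([0.0, 0.0] + [1.0] * len(knots))
    A = X.T @ X + lam * penalty
    beta = np.linalg.solve(A, X.T @ y)
    # Effective degrees of freedom = trace of the hat matrix X A^{-1} X^T.
    edf = float(np.trace(X @ np.linalg.solve(A, X.T)))
    return X @ beta, edf

# Simulated non-linear data (hypothetical).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
knots = np.linspace(0.05, 0.95, 19)

edfs = {}
for lam in (1e-3, 1.0, 1e6):
    _, edfs[lam] = fit_penalized_spline(x, y, knots, lam)
    print(f"lambda = {lam:g}: edf = {edfs[lam]:.2f}")
```

As lam grows, the knot coefficients shrink to zero and the fit approaches a straight line with an edf near 2 (intercept plus slope). mgcv reports the edf of a smooth term without the intercept, so a smooth that is essentially linear shows an edf close to 1.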

Decomposability Ratings

Of the 3,056 responses, 504 (16.5%) were of the category I am not familiar with this idiom (henceforth “unfamiliar” responses). These responses were excluded from the analysis. A logistic mixed-effects regression analysis revealed that the proportion of “unfamiliar” responses was significantly influenced by the idioms' frequencies [χ²(2) = 24.02, p < 0.001]: the proportion of “unfamiliar” responses is larger for low-frequency idioms than for high-frequency idioms (see Supplementary Materials for the complete analysis).
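As a rough illustration of how the probability of an “unfamiliar” response can be related to log frequency, the sketch below fits a plain logistic regression by Newton-Raphson on simulated data and compares it with an intercept-only model in a likelihood-ratio (χ²) test. This is a simplification, not the reported analysis: it has no random effects for participants or idioms and a linear rather than non-linear frequency term (so its test has 1 degree of freedom), and the data and effect sizes are invented.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson (equivalently, IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))        # predicted P(unfamiliar)
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def log_likelihood(X, y, beta):
    """Bernoulli log-likelihood: sum(y * eta - log(1 + exp(eta)))."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

# Simulated data (hypothetical): one row per response, 1 = "unfamiliar".
rng = np.random.default_rng(7)
log_freq = rng.uniform(0.0, 8.5, 3000)
true_beta = np.array([1.0, -0.6])
X = np.column_stack([np.ones_like(log_freq), log_freq])
unfamiliar = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logistic(X, unfamiliar)
beta_null = fit_logistic(X[:, :1], unfamiliar)    # intercept-only model
lr_stat = 2.0 * (log_likelihood(X, unfamiliar, beta_hat)
                 - log_likelihood(X[:, :1], unfamiliar, beta_null))
print("slope:", round(beta_hat[1], 3), "LR chi-square (1 df):", round(lr_stat, 1))
```

A negative slope corresponds to the reported pattern: the lower an idiom's frequency, the higher the probability of an “unfamiliar” response.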

All idioms were seen by at least thirteen participants. However, the number of actual decomposability ratings (i.e., responses other than “unfamiliar”) varied strongly between idioms, ranging from 2 to 34 (mean 13.1). Figure 2 shows this variation in the number of ratings collected for each idiom: at the right end of the x-axis, there is one idiom that received a decomposability score from all 34 participants because it was included as an anchor. At the left end of the x-axis, we find one of the translated German idioms, which received no decomposability ratings: all 13 participants who were presented with this idiom indicated that they were not familiar with it. We therefore excluded this item from further analysis.
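The counting and exclusion step can be sketched as below. The record layout (participant, idiom, rating), the idiom labels and the use of None for the “unfamiliar” option are hypothetical choices made for illustration.

```python
from collections import Counter

UNFAMILIAR = None  # hypothetical code for the "ik ben niet bekend met deze uitdrukking" option

def ratings_per_idiom(responses):
    """Count actual decomposability ratings (1-5) per idiom, skipping 'unfamiliar' responses."""
    seen = Counter(idiom for _, idiom, _ in responses)
    rated = Counter(idiom for _, idiom, rating in responses if rating is not UNFAMILIAR)
    return {idiom: rated.get(idiom, 0) for idiom in seen}

# Hypothetical records: (participant, idiom, rating or UNFAMILIAR).
responses = [
    ("p01", "aap_uit_de_mouw", 4), ("p02", "aap_uit_de_mouw", 5),
    ("p01", "german_control_x", UNFAMILIAR), ("p02", "german_control_x", UNFAMILIAR),
]
counts = ratings_per_idiom(responses)
kept = sorted(idiom for idiom, n in counts.items() if n > 0)  # drop idioms with no ratings
print(counts, kept)
```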

FIGURE 2

Figure 2. Histogram of the number of decomposability ratings per idiom. The x-axis shows the number of decomposability ratings, and the y-axis the number of idioms that received that number of ratings.

We did not use the average rating per idiom as the decomposability score, because subject biases could distort the scores of idioms with only a few ratings. Instead, we fitted a GAMM with random effects for participants and idioms to account for the participants' response biases and the variation between idioms. Random effects allow for partial pooling: the estimates for idioms with only a few observations are pulled toward the average (shrinkage), and the idiom estimates can be corrected for subject biases because each subject's mean is taken into account. From this statistical model, we extracted an estimated decomposability score for each idiom (the script is available in the Supplementary Materials). To account for the ordered categorical nature of the decomposability ratings (5-point scale), we used the GAM ordered categorical family (Wood et al., 2016). Figure 3 (left panel) visualizes the difference between the mean rating scores (x-axis) and the estimated decomposability scores (y-axis). Figure 3 (right panel) compares the estimated decomposability scores for Study 1 and Study 2.
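The partial-pooling idea can be illustrated with the simplest case: a one-way random-intercept model with known variances, in which each idiom's estimate is a weighted average of its own mean rating and the grand mean. This is a deliberate simplification, not the authors' model: it treats the 1–5 ratings as continuous, ignores the subject random effects, and fixes the variance components (tau2 for between-idiom variance, sigma2 for residual variance) at hypothetical values, whereas the reported ordered categorical GAMM estimates such quantities from the data.

```python
def shrunken_mean(ratings, grand_mean, tau2, sigma2):
    """Partial-pooling estimate of an idiom's mean rating.

    weight = n * tau2 / (n * tau2 + sigma2): with few ratings the weight is small
    and the estimate stays near the grand mean; with many ratings it approaches
    the idiom's own mean.
    """
    n = len(ratings)
    if n == 0:
        return grand_mean  # no data: fall back on the population average
    weight = n * tau2 / (n * tau2 + sigma2)
    own_mean = sum(ratings) / n
    return grand_mean + weight * (own_mean - grand_mean)

# Hypothetical values: grand mean 3.0, between-idiom variance 0.5, residual variance 1.5.
few = shrunken_mean([5, 5], grand_mean=3.0, tau2=0.5, sigma2=1.5)
many = shrunken_mean([5] * 30, grand_mean=3.0, tau2=0.5, sigma2=1.5)
print(round(few, 2), round(many, 2))
```

Both hypothetical idioms have a raw mean of 5, but the one with only two ratings is pulled much further toward the grand mean than the one with thirty.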

FIGURE 3

Figure 3. Decomposability scores. Left: Mean rating scores (x-axis) vs. estimated rating scores with partial pooling (y-axis). The dashed lines mark the center of the scale. Right: Comparison of the estimated rating scores in Study 1 and Study 2.

Finally, we analyzed the effect of idiom frequency on the decomposability scores. We used the GAM ordered categorical family (Wood et al., 2016) to fit the decomposability ratings (5-point scale). The log-transformed frequencies were included as a non-linear main effect. In addition, by-Subject non-linear random smooths for Frequency and random intercepts for Idiom were included. The effect of Frequency was not significant [F(1.001, 2381.614) = 2.69, p = 0.1].
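The ordered categorical family in mgcv treats the linear predictor as the expected value of a latent variable with a logistic distribution, and a rating falls into category k when that latent value lies between two cut points. The sketch below only shows how a latent score maps onto probabilities for the five rating categories under this kind of cumulative-logit structure; it does not fit a model, and the cut points and latent values are hypothetical.

```python
import math

def logistic_cdf(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(mu, cut_points):
    """P(rating = k) for k = 1..K under a latent logistic model.

    A rating falls in category k when mu + noise lies between cut point k-1 and
    cut point k; the outer bounds are -inf and +inf. Cut points must increase.
    """
    cdf = [0.0] + [logistic_cdf(c - mu) for c in cut_points] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cut_points) + 1)]

def expected_rating(mu, cut_points):
    """Mean rating on the 1..K scale implied by the category probabilities."""
    probs = category_probabilities(mu, cut_points)
    return sum((k + 1) * p for k, p in enumerate(probs))

cuts = [-1.0, 0.0, 1.0, 2.0]  # hypothetical cut points for a 5-point scale
for mu in (-1.0, 0.5, 2.5):
    probs = category_probabilities(mu, cuts)
    print(mu, [round(p, 3) for p in probs], round(expected_rating(mu, cuts), 2))
```

A higher latent score shifts probability mass toward the upper categories, which is why a single latent value per idiom can summarize its position on the 5-point decomposability scale.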

In the following sections we will use item frequencies and decomposability ratings as predictors for the familiarity ratings of Study 1 and Study 2.