From incoherence to mirth: neuro-cognitive processing of garden-path jokes

Mayerhofer, Bastian; Schacht, Annekathrin

doi:10.3389/fpsyg.2015.00550

ORIGINAL RESEARCH article

Front. Psychol., 12 May 2015

Sec. Psychology of Language

Volume 6 - 2015 | https://doi.org/10.3389/fpsyg.2015.00550

Bastian Mayerhofer^*

Annekathrin Schacht

Courant Research Centre “Text Structures,” University of Göttingen, Göttingen, Germany

In so-called garden-path jokes, an initial semantic representation is violated, and semantic revision reestablishes a coherent representation. Forty-eight jokes were manipulated in three conditions: (i) a coherent ending, (ii) a joke ending, and (iii) a discourse-incoherent ending. A reading times study (N = 24) and three studies with recordings of ERP and pupil changes (N = 21, 24, and 24, respectively) supported the hypothesized cognitive processes. Jokes showed increased reading times of the final word compared to coherent endings. ERP data mainly indicated semantic integration difficulties (N400). Larger pupil diameters to joke endings presumably reflect emotional responses. However, ERP evidence for increased discourse processing efforts and emotional responses, assumed to be reflected in modulations of the late left anterior negativity (LLAN) and in an enhanced late frontal positivity (fP600), respectively, remains incomplete. Processing of incoherent endings was also accompanied by increased reading times, a stronger and sustained N400, and context-sensitive P600 effects. Together, these findings provide evidence for a sequential, non-monotonic, and incremental discourse comprehension of garden-path jokes.

Introduction

A comedian enters the stage and announces to the audience: “I want to die peacefully in my sleep like my grandfather. Not screaming in terror like his passengers¹.” One does not need to be a comedian to know this specific moment between the delivery of a joke and the mirthful reaction that hopefully follows it. There is this confused look in the faces that instantly changes to smile and laughter, once the joke is successfully comprehended. But what exactly is happening in the recipient's mind in this very moment between confusion and laughter? Investigating the underlying neuro-cognitive and emotional processes of this very moment can reveal important insights for at least two overlapping research fields: psychology of humor (Martin, 2007), and discourse comprehension (e.g., Kintsch, 1998).

The most important theories of humor in Cognitive Sciences are variations of incongruity(-resolution) theories (Suls, 1972; Nerhardt, 1977; McGhee, 1979; Giora, 1991; Forabosco, 1992). Incongruity is defined as the violation of expectations during the perception and interpretation of a specific situation or communication (e.g., McGhee, 1979, p. 6/7). Much of linguistic and psycho-linguistic research on verbally expressed humor is focused on a specific kind of canned joke which forms a large and fairly homogeneous subclass of verbal humor, even though it rarely is explicitly mentioned as a specific subclass. The joke in the first paragraph serves as an example. This subclass of verbal humor can be called garden path (GP) jokes (Dynel, 2009) or forced re-interpretation jokes (Ritchie, 2004). GP jokes can be described in agreement with incongruity(-resolution) theories. GP jokes are usually rather short humorous texts. They include an ambiguous initial set-up, which allows (at least) two contrastive interpretations. This ambiguity can appear at several linguistic levels (lexical, syntactic, semantic, pragmatic, contextual; see Dynel, 2009, for a detailed analysis of different types of ambiguity in GP jokes); however, it should initially remain undetected by the recipient. The punch-line, usually the ending, violates the automatic and initially dominant or salient (Giora, 2003) interpretation of the ambiguous set-up. This violation triggers incoherence, leading to the perception of incongruity². In order to resolve this incongruity, the recipient needs to re-analyze the meaning of the text and to find an alternative interpretation which is consistent with the new linguistic input provided by the punch-line. The alternative interpretation then re-establishes a coherent meaning of the text.

Contrary to so-called GP sentences (Osterhout and Holcomb, 1992; Ferreira et al., 2001), the violation and the re-analysis of GP jokes are localized at the semantic rather than the syntactic level, although it is difficult (and a matter of theoretical conceptualization) to clearly disentangle these two levels. Semantic garden-path mechanisms have also been studied in the context of lexical ambiguity (resolution), polysemy and homonymy (e.g., Meyer and Federmeier, 2008). However, in GP jokes, crucially, the mental representation of the discourse, theoretically depicted as mental model (Johnson-Laird, 1983) or situation model (Kintsch, 1988, 1998), is violated at the punch-line. It is commonly assumed that discourse comprehension is an active process of cognitive construction that involves the integration of explicit linguistic input with other linguistic and non-linguistic context information, including new semantic and pragmatic inferences and knowledge from long-term memory. Most importantly, a committed false belief concerning the interpretation of the text has to be substituted. We propose that this “(belief) revision” of the semantic representation (of the set-up) is the crucial mechanism during the comprehension of GP jokes (cf. Mayerhofer and Schacht, 2013). We will briefly illustrate the processes involved with the following example of a GP joke (1):

(1) –“Mummy, I just turned 14 years. May I please, finally, be allowed to wear a bra and make up.” –“No, you are not. And eat up your soup, my son!”

Given the linguistic information and the recipient's world knowledge, the child being a girl is the most plausible interpretation of the set-up phase. This interpretation gets violated when one hears the mother calling the child “son,” thus leading to incongruity. Belief revision occurs, and the recipient represents a boy who would love to wear a bra and make up. This incongruity resolution, in combination with the activation of the alternative, hidden interpretation and with its “inappropriateness” (Ritchie, 2004, p. 61), is typically accompanied by the experience of laughter and mirth in the recipient.

Many researchers agree upon the outlined sequential comprehension process, supported by empirical evidence. Vaid et al. (2003) demonstrated priming effects due to the dominant semantic networks specifically activated at different stages of joke comprehension over time. Coulson and Kutas (1998) found longer reading times for joke endings compared to straight (coherent) endings. These longer reading times were also accompanied by regressive eye movements after reading the punch-line (Coulson et al., 2006). Evidence for the enhanced costs of semantic revision also comes from non-joke texts (Carreiras, 1996; Sturt, 2007). Recently, several studies using event-related brain potentials (ERPs) have investigated the processing of jokes and verbal humor. Three (groups of) ERP components were especially fruitful for the study of verbal humor: the N400, late positivities, and the late left anterior negativity.

The N400 component (Kutas and Hillyard, 1980) is an enhanced negative-going deflection at centro-parietal electrodes starting around 200–250 ms after stimulus onset and lasting until 500–550 ms after stimulus onset with a peak around 400 ms, hence the name. It reliably occurs with semantic violations during sentence or discourse comprehension (Berkum et al., 1999). Other important factors that influence the amplitude of the N400 component are the predictability of a word in a given context, as for example reflected by the Cloze-probability (Kutas and Hillyard, 1984), and the semantic relatedness between the context and the expected word. The N400 effect functionally reflects semantic integration difficulties at the interface of word/stimulus recognition, linguistic, and non-linguistic context, and conceptual binding with the long-term-memory during an active comprehension process (Kutas and Federmeier, 2011). Previous ERP studies on joke comprehension have led to heterogeneous evidence regarding N400 effects. Derks et al. (1997) found augmented N400 amplitudes for jokes that also elicited a higher activation of the zygomatic muscle, indicating the elicitation of positive emotions. Coulson and Kutas (2001) found an N400 effect for joke endings involving frame shifting compared to straight endings. The effect was restricted to jokes with high semantic constraint on the ending. This finding was replicated in follow-up studies, shown to be only present for participants with a low verbal intelligence score (Coulson and Lovett, 2004), and to be related to the visual field of the stimulus presentation (Coulson and Williams, 2005).

Several ERP studies on language comprehension demonstrated syntactic violations to elicit an augmented positivity at posterior scalp sites. This so-called P600 component usually starts around 600 ms after stimulus onset and lasts until around 1200 ms. Since these late positivities are triggered by syntactic anomalies, such as in GP sentences (Bever, 1970; Osterhout and Holcomb, 1992), they are commonly considered to reflect syntactic repair processes which occur after the detection of a syntactic violation. However, Van Herten et al. (2005) found posterior P600 effects for semantic anomalies, and experimentally ruled out the possibility of a hidden syntactic anomaly being responsible for the component. This finding led the authors to argue that the P600 is a form of monitoring component “that checks upon the veridicality of one's analysis” (Van Herten et al., 2005, p. 254). In line with this assumption, the P600 has been suggested to reflect a combinatorial process, integrating both syntactic and semantic features of a sentence (e.g., Wicha et al., 2004; Martín-Loeches et al., 2009), and has also been reported for increased discourse complexity (Burkhardt, 2007). Moreover, a late positivity effect—distinguishable from the typical P600 effect by its frontal distribution—has been reported (Schacht et al., 2010) and related to the complexity and the ambiguity of a text (Kaan and Swaab, 2003). In many regards, GP jokes might be assumed as a semantic equivalent of GP sentences. Thus, the question is whether a semantic repair process in jokes—such as the belief revision—triggers similar brain response patterns as the mainly syntactic repair processes (P600 at posterior sites). Previous evidence has partly indicated such similarity, but remains incomplete (Coulson and Lovett, 2004; Marinkovic et al., 2011).

Apart from the P600-like findings, there is strong evidence that joke endings elicit a left-lateralized sustained anterior negativity (late left anterior negativity; LLAN), between 500 and 900 ms after stimulus onset. This component has been shown mainly for good comprehenders (Coulson and Kutas, 2001; Coulson and Williams, 2005). It was reduced for left-handed participants (Coulson and Lovett, 2004). Coulson and co-workers suggested that the component reflects the successful comprehension of jokes and called this effect “frame-shifting component” according to their conceptual framework. The LLAN has also been considered to reflect working memory activity necessary for the computation of a new mental representation of the discourse (Münte et al., 1998; Baggio et al., 2008; Meltzer and Braun, 2013). Similar activation patterns, although most commonly earlier in time, have usually been reported in studies investigating syntactic violations (see Steinhauer and Drury, 2012). It remains unclear whether these activation patterns, which at first glance appear to be similar, actually reflect a comparable underlying cognitive mechanism. But if they do, on a functional level, these processes are likely to be very broad and general, such as increased working memory activity.

GP jokes also reliably lead to the subjective experience of mirth. Therefore, one might expect other ERP components elicited by jokes, reflecting the emotional processes. Emotion-related ERP responses to humorous visual stimuli have been reported as Posterior Positivities between 300 and 600 ms after the onset (Gierych et al., 2005; Korb et al., 2012). These components show strong similarities to the late positive complex (LPC), which has repeatedly been shown in response to emotional stimuli, such as affective pictures (e.g., Cuthbert et al., 2000; Schupp et al., 2000), and to facial emotional expressions and emotional words (e.g., Schacht and Sommer, 2009a,b). This effect has been related to sustained, elaborative processing of emotional relevance of a given stimulus. At longer latencies, Du et al. (2012) reported an enhanced positivity to Chinese jokes compared with neutral Chinese texts between 1250 and 1400 ms after the stimulus onset, which the authors related to an affective stage of the joke processing.

The main aim of the present study is to disentangle the different sub-processes, or processing stages, involved in the comprehension of GP jokes, which should be reflected in distinguishable ERP components over time. At least three different processing stages are hypothesized to be involved: (a) the violation of the pre-dominant initial semantic representation, (b) the revision of this semantic representation, and (c) the occurrence of an emotional reaction. To this aim, we constructed parallel versions of selected jokes in such a way that all comprehension processes should remain constant apart from the processes of interest outlined above. This manipulation was realized by exchanging only the final word of the original jokes as in the following examples [compared to (1)]:

(2) –“Mummy, I just turned 14 years. May I please, finally, be allowed to wear a bra and make up.” –“No, you are not. And eat up your soup, my girl!”

(3) –“Mummy, I just turned 14 years. May I please, finally, be allowed to wear a bra and make up.” –“No, you are not. And eat up your soup, my father!”

In example (2), the interpretation of the whole discourse is straightforward and coherent. Thus, no belief revision is necessary. In example (3), the initial interpretation gets violated. The final sentence is a grammatically and semantically correct sentence, but its final word is discourse incoherent, thus triggering revision processes. In contrast to the joke ending of example (1), no hidden interpretation (or at least no plausible one) can be activated and no alternative meaningful coherent representation of the text can be constructed. The joke endings share the discourse incoherence with (3) at the occurrence of the final word, but share the comprehensibility of a meaningful discourse with (2), once the belief revision has been successfully carried out. In a series of experiments, we investigated the neuro-cognitive processes specific to GP jokes, using 48 GP jokes and their coherent and incoherent manipulations as stimuli. Experiment 1 focused on behavioral measures using a self-paced reading time paradigm. Here, we expected reading times to increase from coherent through incoherent to joke endings. In Experiments 2–4, ERPs were of main interest in order to localize the GP-specific sub-processes. Hypotheses were as follows: Joke endings and incoherent endings both represent the violation of the initially dominant semantic representation and should thus elicit an augmented N400 component. Successful belief-revision processes in GP joke comprehension, which require enhanced inferential and working-memory related processes, should be reflected in the occurrence of LLAN components. Only joke endings should elicit an emotional response. Therefore, we expected emotion-related ERP components at subsequent, late stages of joke processing, namely following the violation and the revision processes.

Another potent indicator of both cognitive and emotional processes during the comprehension of jokes could be provided by pupillary responses, which we also measured in Experiment 2. First, changes of pupil diameter have been shown to be a sensitive measure of the cognitive load during a task: Higher cognitive load leads to larger pupil diameters (Kahneman and Beatty, 1966; van der Meer et al., 2010). Second, larger pupil diameters have also been reported in association with higher emotional involvement, related to the arousal (Bradley et al., 2008) or to the intensity (Partala and Surakka, 2003) of an emotional reaction. Both factors, cognitive load and emotional processing, have also been shown to affect pupil diameters in the processing of verbal stimuli, such as single word processing and recognition (Võ et al., 2008; Bayer et al., 2011). Since the successful comprehension of GP jokes is hypothesized to involve both increased cognitive processing effort and an emotional response, we expected larger pupil diameters after joke endings compared to coherent endings. Changes of pupil diameter to incoherent endings should be intermediate due to enhanced cognitive demands (violation detection) on the one hand but the absence of both revision processes and emotional response on the other hand.

Experiment 1: Reading Times

The comprehension process of GP jokes is considered to contain two important stages: the detection of the violation of the semantic representation and the belief revision process. Both factors should lead to enhanced cognitive load, which should be reflected in an increase of the reading times at the final word compared to coherent endings, as previously shown for English material (Coulson and Kutas, 1998). In the present experiment, we expected similar results for our German stimuli. Reading times for incoherent endings should be intermediate between coherent and joke endings since they also contain violations of the initial semantic representation that cannot be overcome by successful revision processes. However, given the equal proportion of all three stimulus categories, recipients might nevertheless attempt to search for potential hidden interpretations within their semantic system. Depending on how quickly this search is abandoned, reading times might, alternatively, be even longer for incoherent compared to joke endings.

Method

Participants

Twenty-four participants (16 females), ranging in age between 18 and 29 years (M = 22.48, SD = 2.93), were tested. All of them were native speakers of German and students at the University of Göttingen, coming from a wide range of disciplines. They were rewarded with 8 €/h for their participation.

Material

A total number of 144 stimuli was constructed. Forty-eight jokes were selected from different sources according to the following criteria: (i) They had to exploit the GP mechanism. Additionally, they were selected to be (ii) ethically acceptable, (iii) subjectively amusing, (iv) translatable into German, unless they were originally German, without losing the amusement potential and without destroying the underlying GP structure, and (v) rewritable in such a way that the very final word of the last sentence could serve as the crucial punch-line element.

Based on these 48 jokes, two additional versions were constructed by exchanging only the final word of the text. In the Coherent condition, the final word of the joke was replaced by a word which was coherent according to the initial first interpretation of the text. In the Incoherent condition, the final word was replaced by a word which is incoherent according to the first interpretation and which does not offer a hidden interpretation of the set-up. Importantly, this final word violated neither the syntactic nor the semantic structure of the last sentence but it did not fit into the whole discourse of the text. This led to a total number of 144 stimuli with 48 text fragments identical in all three conditions but varying final words between conditions. Final words were matched between conditions according to word category, word frequency (Leipziger Worthäufigkeitsklasse; http://wortschatz.informatik.uni-leipzig.de/), and word length (number of letters). Descriptive statistics of the material is reported in Table 1. Stimulus material is provided as Supplementary Material.

Table 1. Descriptive data of the stimulus features.

In pre-experimental ratings, 68 participants (46 females) between 18 and 36 years (M = 23.19, SD = 3.38) evaluated the stimuli on 5-point scales from 1 (do not agree at all/ trifft überhaupt nicht zu) to 5 (totally agree/ trifft völlig zu). Items were constructed according to three theoretically derived dimensions: humorous potential (Humor), predictability of the ending (Predictability), and comprehensibility of the whole text (Comprehensibility). For each dimension, three items were constructed in order to capture (i) a behavioral component, (ii) a cognitive appraisal, and (iii) an emotional response.

(it.1) The text was familiar, even though not necessarily literally. Der Text war mir zumindest sinngemäß bekannt. (Familiarity).

(it.2) I did understand the text. Ich habe den Text verstanden. (Comprehension).

(it.3) The text made me laugh/smile. Der Text hat mich zum Lachen/Schmunzeln gebracht. (Humor Behavioral).

(it.4) The text amused me. Der Text hat mich erheitert. (Humor Emotional).

(it.5) The text is funny. Der Text ist witzig. (Humor Cognitive).

(it.6) The text tricked me into the wrong way. Der Text hat mich in die Irre geleitet. (Predictability Behavioral).

(it.7) The ending of the text did surprise me. Das Ende des Textes hat mich überrascht. (Predictability Emotional).

(it.8) The ending of the text is predictable. Das Ende des Textes ist vorhersehbar. (Predictability Cognitive).

(it.9) The text is understandable. Der Text ist verständlich. (Comprehensibility Behavioral).

(it.10) The text confused me. Der Text hat mich verwirrt. (Comprehensibility Emotional).

(it.11) The text is nonsense. Der Text ist Unsinn. (Comprehensibility Cognitive).

After reading the stimulus, participants indicated whether they knew the text, and then rated the nine items in randomized order. The three items of each scale were summed to yield the three total scale scores. The results of the ratings are depicted in Figure 1.
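The scoring rule described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function name `scale_score` and the example ratings are hypothetical, and since the text does not state whether any items were reverse-keyed before summing, no items are reversed here.

```python
def scale_score(item_ratings):
    """Sum three 1-5 ratings into one scale score (possible range 3-15)."""
    if len(item_ratings) != 3:
        raise ValueError("each scale consists of exactly three items")
    if any(r not in (1, 2, 3, 4, 5) for r in item_ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(item_ratings)

# Hypothetical ratings for the three Humor items
# (behavioral, emotional, cognitive) of one participant and one text
humor = scale_score([4, 3, 5])
```

Under this rule each scale score can range from 3 to 15, which is consistent with the condition means reported for the three scales.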

Figure 1. Box plots of the three scales of the ratings, (A) Humor potential, (B) Predictability, and (C) Comprehensibility. Every data point represents one observation for one participant and one stimulus. The thick line is the median, the box represents the 25% and 75% quantiles, and the whiskers are the minimum and maximum values, while points represent statistical outliers.

ANOVAs and Bonferroni-corrected post-hoc t-tests were carried out for the three scales. Only texts that were indicated as unfamiliar to the participants were included in the analysis. There was a significant effect of Condition on all three scales: Humor, F(2, 141) = 135.31, p < 0.001, Predictability, F(2, 141) = 77.48, p < 0.001, and Comprehensibility, F(2, 141) = 115.45, p < 0.001. The Joke condition (M = 8.83, SD = 1.04) was rated as more humorous than both the Coherent (M = 5.39, SD = 1.42), t(94) = 14.52, p < 0.001, and the Incoherent condition (M = 5.5, SD = 1.06), t(94) = 13.29, p < 0.001, while there was no significant difference between Coherent and Incoherent. The Joke condition (M = 8.83, SD = 1.04) was rated less predictable than Coherent (M = 9.92, SD = 1.24), t(94) = −4.66, p < 0.001, but more predictable than Incoherent (M = 7.07, SD = 1.11), t(94) = 8.04, p < 0.001. The Incoherent condition (M = 8.33, SD = 1.68) was rated less comprehensible than the Joke condition (M = 12.25, SD = 1.12), t(94) = −13.41, p < 0.001, and than the Coherent condition (M = 11.89, SD = 1.32), t(94) = −11.51, p < 0.001, while there was no significant difference between Joke and Coherent conditions. Together, ratings confirmed the validity and the suitability of the stimulus material.

The 144 stimuli (48 Joke, 48 Coherent, 48 Incoherent) were used for Experiment 1. In addition, 144 filler items were constructed to be as similar as possible to the original stimuli in terms of linguistic style, e.g., syntactic structure, topic, lexical level, dialogs, etc. As with the experimental stimuli, 48 identical text fragments were completed with three different endings: two different coherent endings and a discourse-incoherent ending. The filler items reduced the proportion of jokes in order to make the purpose of the study less obvious. They also reduced the number of repetitions of the text fragments and should, therefore, have distracted the participants from keeping all the text fragments in memory. Note that responses to fillers were not analyzed. The total of 288 texts was distributed across three different sets (every set containing 96 different text fragments). The order of the texts within a set was randomized for every participant, and the six possible permutations of the set order were equally distributed over all participants. Each participant thus read 288 short texts in six conditions, and possible influences of the repetition of the text fragments were counterbalanced across participants.
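The distribution of texts and set orders described above could be sketched along the following lines. This is only an illustration under stated assumptions: the text does not say how endings were assigned to sets, so the Latin-square rotation in `build_sets` is an assumption, the cycling rule in `block_order` is one of several ways to distribute the six orders equally, and the per-participant shuffling of texts within a set is omitted. All names are hypothetical.

```python
from itertools import permutations

N_FRAGMENTS = 96  # 48 experimental + 48 filler text fragments
N_ENDINGS = 3     # three endings per fragment

def build_sets(n_fragments=N_FRAGMENTS, n_endings=N_ENDINGS):
    """Return three sets of (fragment, ending) pairs.

    Assumption: endings are rotated across sets in a Latin-square fashion,
    so each fragment occurs exactly once per set, each time with a
    different ending.
    """
    return [
        [(frag, (frag + s) % n_endings) for frag in range(n_fragments)]
        for s in range(n_endings)
    ]

def block_order(participant_index, n_sets=N_ENDINGS):
    """Cycle through the six possible orders of the three sets."""
    orders = list(permutations(range(n_sets)))
    return orders[participant_index % len(orders)]

sets = build_sets()
orders = [block_order(p) for p in range(24)]  # 24 participants
```

With 24 participants, cycling through the six permutations assigns each set order to exactly four participants.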

Procedure

The experiment was carried out in a group lab on a computer, with four participants per session. After providing their demographic data, participants received instructions on the computer screen informing them that they were taking part in an experiment on text comprehension. They were familiarized with the presentation of the stimuli and were told to read the texts carefully. They were explicitly told that some of the texts were hard to understand, and that some of them did not make sense at all. They were also explicitly instructed to continue with the next stimulus once they thought that they had understood the text or were sure that the text did not make sense.

The texts were presented on a computer screen with an adapted version of the Moving Windows Paradigm (Just et al., 1982), implemented with Pygame, a graphical interface for Python. First, the whole text was presented to the participant with the final sentence of the text masked by blanks. The last sentence of the text then appeared word by word as participants pressed the return key on a standard keyboard. Only the current word appeared unmasked, and the words that had already been read were masked again. Most importantly, the reading time for the final word (the crucial manipulation of the experiment) was measured as the time between the onset of the final word and the moment a participant pressed the return key in order to proceed to the next text.
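The masking logic of such a moving-window display can be illustrated without any graphics code. The sketch below is not the authors' Pygame implementation; the function name `moving_window` and the use of underscores as the mask character are assumptions made for illustration.

```python
def moving_window(words, current, mask_char="_"):
    """Render a sentence with only the word at index `current` visible.

    All other words are replaced by a mask of the same length, so the
    layout of the sentence stays constant from one key press to the next.
    """
    return " ".join(
        w if i == current else mask_char * len(w)
        for i, w in enumerate(words)
    )

# One display frame per key press for the final sentence of example (1)
sentence = "And eat up your soup, my son!".split()
frames = [moving_window(sentence, i) for i in range(len(sentence))]
```

In an actual experiment, the reading time of the final word would be the difference between two timestamps (for example from `time.perf_counter()`), one taken when the last frame is drawn and one at the following key press.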

After a pseudo-randomly chosen number of trials (normal distribution with M = 10, SD = 4), participants were presented with a statement concerning the previously presented text and had to indicate whether the statement was true. The correct answer to each comprehension question was randomly chosen to be “true” or “false” in order to prevent participants from clicking through the task without properly processing the stimuli.
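One way to schedule comprehension questions at normally distributed intervals is sketched below. The text only specifies the distribution (M = 10, SD = 4); rounding the draws to whole trials, clipping them to a minimum of one trial, and the fixed seed are assumptions, and the function name `question_trials` is hypothetical.

```python
import random

def question_trials(n_trials, mean=10.0, sd=4.0, seed=0):
    """Return 1-based trial numbers after which a question is shown.

    Gaps between questions are drawn from a normal distribution, rounded
    to whole trials, and clipped to a minimum of 1 (the rounding and the
    clipping are assumptions).
    """
    rng = random.Random(seed)
    positions, trial = [], 0
    while True:
        trial += max(1, round(rng.gauss(mean, sd)))
        if trial > n_trials:
            return positions
        positions.append(trial)

# Question positions for one participant reading 288 texts
positions = question_trials(288)
```

A different seed per participant would give each participant a different, unpredictable sequence of question positions.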

Results

Reading times below 200 ms or more than 3 standard deviations above the participant's mean were excluded from the analysis. Each participant's mean reading time for the final word was calculated per condition and log-transformed. Cohen's d_z is reported as effect size, computed as the mean of the per-participant difference scores between two conditions divided by the standard deviation of these differences (Lakens, 2013). A one-way ANOVA revealed a significant main effect of Condition, F(2, 46) = 8.51, p < 0.001, η²_G = 0.27, with significantly shorter reading times for Coherent (M = 1018, SD = 329) as compared to both Joke (M = 1162, SD = 446), t(23) = −3.97, p < 0.001, d_z = −0.81, and Incoherent (M = 1111, SD = 403), t(23) = −2.62, p = 0.015, d_z = −0.53. The Joke and Incoherent conditions did not differ significantly, t(23) = −1.49, p = 0.149, d_z = −0.30.
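The preprocessing and effect-size steps can be sketched as follows. This is a minimal illustration, not the authors' analysis script: function names are hypothetical, the natural logarithm is assumed, and the mean and standard deviation used for the exclusion cutoff are assumed to be computed over all of one participant's trials. The last helper uses the identity d_z = t / √n for paired t-tests, which allows a consistency check on the reported values.

```python
import math
import statistics

def trim_reading_times(rts_ms, floor_ms=200.0, n_sd=3.0):
    """Drop times below the floor or more than n_sd SDs above the mean.

    Assumption: mean and SD are computed over all trials passed in.
    """
    cutoff = statistics.mean(rts_ms) + n_sd * statistics.stdev(rts_ms)
    return [rt for rt in rts_ms if floor_ms <= rt <= cutoff]

def log_mean_rt(rts_ms):
    """Mean of the trimmed reading times, then log-transformed."""
    return math.log(statistics.mean(trim_reading_times(rts_ms)))

def cohens_dz(cond_a, cond_b):
    """Mean of the paired differences divided by their standard deviation."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    return statistics.mean(diffs) / statistics.stdev(diffs)

def dz_from_t(t_value, n):
    """For a paired t-test with n participants, d_z = t / sqrt(n)."""
    return t_value / math.sqrt(n)

# Consistency check against the reported statistics (N = 24)
dz_joke = dz_from_t(-3.97, 24)        # Coherent vs. Joke
dz_incoherent = dz_from_t(-2.62, 24)  # Coherent vs. Incoherent
```

Rounded to two decimals, both derived values agree with the effect sizes reported above (−0.81 and −0.53).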

Discussion

The hypothesis of longer reading times for joke endings compared to coherent endings was clearly supported by the data. Further, the reading times of the joke endings tended to be prolonged in comparison to incoherent endings, but this difference did not reach significance. Reading incoherent endings also took significantly longer than reading coherent endings. Together, these findings indicate that either the detection of the semantic incoherence itself is characterized by higher processing demands, or the participants made the same attempt to find an alternative interpretation as for the joke endings, possibly triggered by the mere occurrence of jokes during the experiment.

Experiment 2: Evidence from ERPs and Pupil Diameters

Reading times, as measured in Experiment 1, reflect only the sum of several sub-processes, thus not allowing any specific assumptions regarding specific processing stages. ERPs provide the advantage of high temporal resolution in the range of milliseconds. Therefore, distinguishable ERP components can be related more precisely to the hypothesized underlying cognitive or emotional processing stages involved. Here, we recorded ERPs and changes of the pupil diameter in relation to the different endings of the stimulus material in order to investigate the hypothesized comprehension processes as outlined in the introduction.