Dialect Variation Influences the Phonological and Lexical-Semantic Word Processing in Sentences. Electrophysiological Evidence from a Cross-Dialectal Comprehension Study

Lanwermeyer, Manuela; Henrich, Karen; Rocholl, Marie J.; Schnell, Hanni T.; Werth, Alexander; Herrgen, Joachim; Schmidt, Jürgen E.

doi:10.3389/fpsyg.2016.00739

ORIGINAL RESEARCH article

Front. Psychol., 27 May 2016

Sec. Auditory Cognitive Neuroscience

Volume 7 - 2016 | https://doi.org/10.3389/fpsyg.2016.00739

Manuela Lanwermeyer¹*

Karen Henrich¹,²

Marie J. Rocholl¹

Hanni T. Schnell¹

Alexander Werth¹

Joachim Herrgen¹

Jürgen E. Schmidt¹

¹Forschungszentrum Deutscher Sprachatlas, Philipps-Universität Marburg, Marburg, Germany
²Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany

This event-related potential (ERP) study examines the influence of dialectal competence differences (merged vs. unmerged dialect group) on cross-dialectal comprehension between Southern German dialects. It focuses on the question as to whether certain dialect phonemes (/ $\overset{⌢}{oa}$ /, / $\overset{⌢}{o Ʊ}$ /), which are attributed to different lexemes in two dialect areas (Central Bavarian, Bavarian-Alemannic transition zone) evoke increased neural costs during sentence processing. In this context, the phonological and semantic processing of lexemes is compared in three types of potentially problematic communication settings (misunderstanding, incomprehension, allophonic variation = potential comprehension). For this purpose, an oddball design including whole sentences was combined with a semantic rating task. Listeners from the unmerged Central Bavarian dialect area heard sentences including either native or non-native lexemes from the merged neighboring dialect. These had to be evaluated with regard to their context acceptability. The main difference between the lexemes can be attributed to the fact that they have different meanings in the respective dialect areas or are non-existent in the linguistic competence of the Central Bavarians. The results provide evidence for the fact that non-native lexemes containing the / $\overset{⌢}{oa}$ /-diphthong lead to enhanced neural costs during sentence processing. The ERP results show a biphasic pattern (N2b/N400, LPC) for non-existent lexemes (incomprehension) as well as for semantically incongruous lexemes (misunderstanding), reflecting an early error detection mechanism and enhanced costs for semantic integration and evaluation. In contrast, allophonic / $\overset{⌢}{o Ʊ}$ / deviations show reduced negativities and no LPC, indexing an unproblematic categorization and evaluation process. 
In the light of these results, an observed change of / $\overset{⌢}{oa}$ / to / $\overset{⌢}{o Ʊ}$ / in the Bavarian-Alemannic transition zone can be interpreted as a facilitation strategy of cross-dialectal comprehension to reduce both misunderstandings as well as neural costs in processing, which might be interpreted as the initial trigger for this particular phoneme change.

Introduction

Phoneme Change As a Result of Dialect Contact

Despite the intensive preoccupation with the phenomenon of linguistic change, the question as to why linguistic units change and which factors influence this process is still a matter of debate. Several studies show that lexically irregular changes are primarily the result of dialect contact (cf. for example Trudgill, 1986; Wang and Lien, 1993; Schmidt and Herrgen, 2011)¹. These changes result from the interference between systems and the interaction of speakers with different phonological competences.

In this context, one interesting phenomenon is the merger of a phonemic contrast in one dialect, which is still maintained in a related dialect. The expansion of unconditioned mergers can be explained by the close contact between merged and unmerged speech communities. For instance, the actuation of the low back merger in Pennsylvania is explained by a massive influx of foreign-born immigrants who had difficulties in acquiring the distinction between long and short /o/ in words like cot and caught due to their own reduced vowel systems. Thus, the expansion of the merger is the result of repeated misunderstandings of productions of one-phoneme-speakers by two-phoneme-speakers in face-to-face communication (cf. Herold, 1997; Labov, 2010). Overall, misunderstandings between groups of regional speakers are often motivated by differences in their linguistic competences (cf. Labov, 2010; Schmidt, 2010; Schmidt and Herrgen, 2011).

In order to investigate the relationship between cross-dialectal comprehension and phoneme change, it is useful to study similar varieties, which differ only in few phoneme contrasts so that the general understanding between the speaker groups is ensured. This particular setting can be found in two Southern German dialects. Therefore, the current study investigates the dialect / $\overset{⌢}{oa}$ /-/ $\overset{⌢}{o Ʊ}$ / contrast, which has developed differently in these areas. In the Central Bavarian dialect (henceforth CB), it is a stable contrast, while in the neighboring Bavarian-Alemannic transition zone (henceforth BA) only / $\overset{⌢}{oa}$ / occurs before obstruents, brought forth by a merger of Middle High German (MHG) ô and ei (see Figure 1 for the geographic location of the dialect areas). Interestingly, a phoneme change to either / $\overset{⌢}{o Ʊ}$ / or /oː/ can be observed in certain lexemes in BA, which is possibly due to the dialect contact with CB (Schmidt and Herrgen, 2011). Even if this development has been documented by production data, no perception study has tested this assumption so far. To investigate this gap in research, a study employing event-related potentials (ERPs) was conducted focusing on cross-dialectal comprehension between both of these dialect groups.

Figure 1. The Bavarian-Alemannic transition zone (BA) and the Central Bavarian dialect area (CB) with × displaying the recording location (Merching) and experimentation location (Isen).

Our study focuses on the question of whether the usage of dialect phonemes (/ $\overset{⌢}{oa}$ /, / $\overset{⌢}{o Ʊ}$ /) that are attributed to different lexemes in the two contiguous dialect areas, leading to minimal pairs between these areas, evokes increased neural costs during sentence processing. If so, this would indicate difficulties in cross-dialectal comprehension. We are especially interested in semantic processing differences elicited by minimal phonological differences between different phoneme contact settings (misunderstanding, incomprehension, allophonic variation) in the form in which they can appear in everyday communication situations.

The Impact of Regional Variation on Phoneme Perception in Online Observation

Cross-dialectal comprehension is highly related to the capacity of listeners to deal with acoustic variability in pronunciation resulting from dialectal variation. A listener's ability to perceive different speech sounds as phonemes mainly depends on the phoneme inventory of the listener's native language (cf. e.g., Buchwald et al., 1994 concerning the discrimination between /r/ and /l/ in Japanese). Studies using the electroencephalography (EEG) technique provide evidence that this assumption can be extended to non-native regional phonemic contrasts within a language as well. For instance, Brunellière et al. (2011) compared the /e/-/ε/ contrast in word-final open syllables (e.g., /epe/ ‘sword’ vs. /epε/ ‘thick’) in merged and unmerged French speaker groups. They found processing differences concerning the cortical topographies, indicating that in contrast to unmerged speakers, merged speakers associate the two forms with only one semantic representation (homophones). These clear differences between the groups support the assumption that the access to lexical meaning in spoken word recognition heavily depends on the listeners' native regional accent. The influence of the native phoneme inventory on phoneme perception was also investigated by Conrey et al. (2005), who focused on mechanisms of semantic integration and phonological decision processes using the example of the /ɪ/ and /ε/ merger before nasal consonants in American English (the so-called pin-pen merger). The results show that, in contrast to the unmerged group, neither behavioral nor neural differences could be detected in the merged group. In contrast to previous studies, these results suggest that the different groups process the stimuli differently at a conscious, decisional level.

Previous studies dealing with dialect contrasts mainly used the Mismatch Negativity (MMN) component to examine vowel discrimination (cf. Brunellière et al., 2009, 2011; Scharinger et al., 2011). The MMN is a fronto-central negative component, usually peaking at 150–250 ms from change onset, when infrequent deviations (deviant stimuli) occur among frequently repeated sound patterns (standard stimuli) in a passive oddball design. The MMN is elicited regardless of the participant's direction of attention and thus reflects an automatic, pre-attentive response to any change in auditory stimulation. A basic prerequisite for MMN elicitation is the creation of a short-term memory trace in the auditory cortex, i.e., a representation of the repetitive standard stimulus. The MMN is the reflection of a discrimination process as the representation is violated by an infrequent deviant, indicating that the deviant is found to be incongruent with the memory representation of the preceding series of standard stimuli (cf. Näätänen et al., 2007). Using cross-linguistic oddball designs, several studies have established language-specific memory traces for phonemes. The MMN deflection is increased when the deviant is a vowel category in the subject's native language in contrast to non-native vowel categories (Näätänen et al., 1997). Moreover, Kazanina et al. (2006) investigated the [t]-[d] contrast, which is mapped onto distinct phoneme categories in Russian, while it is an allophonic contrast in Korean. An MMNm was only elicited for the Russian listeners, indicating a rapid separation of these sounds into two categories, while the Korean listeners did not show any immediate sensitivity to the contrast. Furthermore, the MMN is also modulated by dialectal categories. Miglietta et al. (2013) compared the phonemic contrast [e]-[i] to the allophonic variation [ε]-[e] present in the Tricase dialect located in Southern Italy and found, in contrast to Kazanina et al. (2006), an MMN response for both conditions. However, the latency of the phonemic condition was significantly earlier, pointing to a facilitated short-term memory trace formation in contrast to the allophonic condition.

In sum, these studies show that the electrophysiological investigation of allophonic and phonemic dialect contrasts is quite promising, since phonological and lexical processing stages are highly influenced by the listeners' regional accent.

The Present ERP Study

In Southern Germany, the Bavarian (including CB) and the Alemannic dialect area adjoin each other. Between both areas, a transition zone (BA) is located, in which phonological forms of both dialects interact with each other (cf. Wiesinger, 1983).

The investigated / $\overset{⌢}{oa}$ /-/ $\overset{⌢}{o Ʊ}$ / contrast distinguishes BA and CB since it is a stable contrast in CB, while in BA / $\overset{⌢}{oa}$ / occurs exclusively. Thus, the crucial point is how the respective phonemes have been assigned to lexemes in the dialect areas (see Supplementary Material for a precise diachronic description). For instance, /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / means ‘roses’ and ‘journeys’ in BA, while in CB it only corresponds to ‘journeys’. Moreover, /ʃtr $\overset{⌢}{oa}$ / means ‘straw’ in BA, while in CB it is pronounced as /ʃtr $\overset{⌢}{o Ʊ}$ /.

Thus, the differences in the speakers' competences lead to different form-meaning-associations in the dialect areas. As a result, they might evoke different kinds of dialect-related communication difficulties when lexemes containing the / $\overset{⌢}{oa}$ /-phoneme traced back to MHG ô are used by speakers of BA. Three types of potential communication difficulties are reflected in our experimental conditions:

(1) On the one hand, there might occur misunderstandings when a lexeme has different meanings in the two dialect areas. The usage of such lexemes in cross-dialectal communication leads to faulty decoding by the listeners. For example, the BA lexeme /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / for ‘roses’ means ‘journeys’ in CB.

(2) On the other hand, there might arise incomprehension, since many BA words containing the / $\overset{⌢}{oa}$ /-diphthong do not exist in CB. In these cases, the listeners do not have a lexical entry for the lexeme and cannot decode it. For example, the BA lexeme /ʃtr $\overset{⌢}{oa}$ / ‘straw’ is not part of the CB speakers' competence.

(3) A further difference between the dialects relevant for the present study concerns the pronunciation of MHG ô before nasals (e.g., Lohn ‘wage’, Bohne ‘bean’). Among other variants, one common pronunciation is / $\overset{⌢}{o Ʊ}$ / in BA and /oː/ in CB. In cross-dialectal communication, it seems likely that articulatorily very similar contrasts that are not involved in the change of meaning between the dialects do not lead to erroneous decoding (cf. the BA variant /l $\overset{⌢}{õ \tilde{Ʊ}}$ / versus the CB /lõː/ for ‘wage’).

An ERP study using an oddball design was conducted in order to study the effect of the different phoneme-to-lexeme assignments on cross-dialectal comprehension caused by the merger of MHG ô and MHG ei in BA (conditions 1 and 2) in contrast to a probable purely allophonic contrast (condition 3).

In a typical oddball design, isolated phonemes, syllables or meaningful words are presented to participants while they watch a silent film. However, since in everyday communication listeners are faced with words embedded in complex sentences, we were interested in vowel perception during auditory sentence processing in order to investigate a more natural setting. This way, the influence of phonological differences on lexical-semantic processes should be made ascertainable.

It is however doubtful whether sentences can constitute an equally invariant acoustic context as isolated syllables or words against which deviants are normally compared. So far, only few studies have adapted the oddball design to questions of phonemic and semantic processes during sentence comprehension (e.g., Menning et al., 2005; Boulenger et al., 2011; Bendixen et al., 2014). Their results indeed demonstrate the sensitivity of the MMN to complex linguistic material and suggest that during natural speech processing, the brain rapidly extracts phonetic information from the continuous signal and forms memory traces in the auditory cortex. Thus, it seems that memory traces can also develop for complex sentences, which include large-scale details about phonetic features of the speech signal.

Following Bendixen et al. (2014), an experimental design combining a classic oddball paradigm and a semantic rating task was developed for the current study. In contrast to Bendixen et al. (2014) as well as Boulenger et al. (2011), the material used in this study also included semantic violations that might involve higher levels of processing such as semantic contextual integration. Effects of semantic integration are indicated by a rather late negativity peaking around 400 ms after stimulus onset in the EEG. The N400 is distributed primarily over centro-parietal sites and is typically elicited by sentence-final words that are semantically anomalous or of low cloze probability (cf. Kutas and Hillyard, 1980, 1984; Connolly and Phillips, 1994; Lau et al., 2008; Kutas and Federmeier, 2011). Thus, it displays the violation of predictions and expectations built up by the preceding sentence context since predictable words are easier to access from memory and it requires more resources to process an implausible or infrequent continuation (cf. Lau et al., 2008). In the design of Boulenger et al. (2011), late ERP effects in the N400 time window were found indicating the involvement of semantic integration mechanisms. Thus, an oddball design including a task seems very promising to address the interaction between early acoustic processes and late mechanisms of semantic integration.

In the current study, a member of the N200 family (MMN, N2b) might be evoked as in the comparable study of Bendixen et al. (2014) for the conditions misunderstanding and incomprehension. However, due to the embedded semantic violations, which form an important difference to that study, we expect to also find an N400 in the misunderstanding condition. In contrast, we expect the fewest costs in semantic and lexical processing for the potential allophonic variation (condition 3) since the deviant might be categorized as a potential, allophonic form of the standard.

Materials and Methods

In the current study, listeners' perception of four different lexemes is investigated during sentence processing. The following three conditions reflect the special phoneme contact between the dialects of BA and CB (see Table 1). The experiment was conducted in CB; accordingly, the terms ‘standard’ and ‘deviant’ are used with regard to the Central Bavarian dialect. The Central Bavarian lexemes served as the standards (2/3 of the presented sentences), while the Bavarian-Alemannic lexemes formed the deviants (1/3).

(1) Condition 1: Misunderstanding

Whereas in BA the (former) homophonic lexeme /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / signifies both ‘roses’ and ‘journeys’, in CB it only bears the meaning ‘journeys’. When speakers from both areas communicate with one another, the usage of /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / ‘roses’ could lead to misunderstandings because it is always understood as ‘journeys’.

(a) All sentences presented prime for the meaning ‘roses’. The Central Bavarian variant /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / ‘roses’ is infrequently interrupted by the Bavarian-Alemannic variant /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ /, which means ‘journeys’ in Central Bavarian.

e.g., Was im Garten viel Pflege braucht, sind Rosen. ‘That which needs much care in the garden, are roses’

(b) All sentences presented prime for the meaning ‘journeys’. The Central Bavarian variant /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / ‘journeys’ is infrequently interrupted by the lexeme /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / ‘roses’.

e.g., Wofür er seinen Koffer packt, sind Reisen. ‘That for which he is packing his suitcase, are journeys’

(2) Condition 2: Incomprehension

Whereas in BA the MHG lexeme lôs ‘sow’ is pronounced as /l $\overset{⌢}{oa}$ s/, in CB the dialectal lexeme is /l $\overset{⌢}{o Ʊ}$ s/. In regional communication, the usage of /l $\overset{⌢}{oa}$ s/ may lead to incomprehension because this form is not part of the phonological competence of speakers from CB. Therefore, lexical access fails.

Half of the sentences prime for the meaning ‘sow’, half are neutral sentences. The Central Bavarian variant /l $\overset{⌢}{o Ʊ}$ s/ is infrequently interrupted by the Bavarian-Alemannic variant /l $\overset{⌢}{oa}$ s/.

priming: e.g., Was die kleinen Ferkel säugt, ist die Lous. ‘That which is nursing the little piglets, is the sow’

neutral: e.g., Was er ihr genau beschreibt, ist die Lous. ‘What he is describing to her exactly, is the sow’

(3) Condition 3: Potential Comprehension

Whereas in BA ‘wage’ is pronounced as /l $\overset{⌢}{õ \tilde{Ʊ}}$ /, in CB the variant is /lõː/. Although the BA form is non-native to the listeners of CB, it is likely that it does not lead to erroneous decoding.

Half of the sentences prime for the meaning ‘wage’, half are neutral sentences. The Bavarian variant /lõː/ is sporadically interrupted by the Bavarian-Alemannic variant /l $\overset{⌢}{õ \tilde{Ʊ}}$ /.

priming: e.g., Was man am Monatsende bekommt, ist der Lohn. ‘That which you receive at the end of the month, is the wage’

neutral: e.g., Worüber sie beim Treffen reden, ist der Lohn. ‘That which they are talking about at the meeting, is the wage’

Table 1. Experimental conditions.

Pretests

A pool of German sentences ending in Rosen ‘roses’ (77), Reisen ‘journeys’ (104), Lohn ‘wage’ (85) or Muttersau ‘sow (female pig)’ (75) was developed using standard language vocabulary and syntax. All sentences followed the same global structure of topicalized relative clauses with the critical item in sentence-final position (e.g., Was wild in der Hecke blüht, sind Rosen. ‘That which is blossoming wildly in the hedge, are roses’). The sentences were designed to create a semantic expectation of the critical items. In addition, neutral sentences were created, which were integrated into conditions 2 and 3, in which the lexical meaning of the sentence-final lexemes does not differ, in order to keep the participants' attention. Using exclusively well-matched priming sentences would have been too invariant for the semantic rating task, so this was counterbalanced by adding neutral sentence contexts.

To guarantee that the sentences were classified as either priming or neutral context sentences, the high and low cloze probability of the sentences was surveyed within an online rating procedure. The full sentences were presented visually on a computer screen and participants had to evaluate on a scale from 1 to 7 whether the sentence-final word fit the context, with 1 indicating that the respective word did not fit at all and 7 indicating that the word fit very well (task 1). In a second task, participants had to decide whether the critical item fit better, worse or equally well into the sentence in comparison to other possible options (task 2). If the participants stated that other words fit better into the context, they were asked to write them down (task 3). All sentences from all conditions were randomly mixed and divided into 8 groups of equal size, so that each participant had to judge approximately 60 sentences. Altogether, 78 speakers of the standard variety participated in the rating task (54 women, mean age 32.03, SD 12.51). In the subsequent steps, only sentences whose sentence-final items were evaluated as very suitable were selected for the priming conditions, reflected by a mean of >6 (>5.5 for Muttersau). In contrast, for the neutral context conditions, moderately well-judged sentences were selected, reflected by a mean of 1.8–4.9 (Muttersau) and 2.9–5.5 (Lohn). This rating procedure resulted in 35 sentences for each condition, 210 sentences in total.

Stimuli

All 210 sentences were recorded several times by a male native speaker (year of birth 1963) who was born and raised in the Bavarian-Alemannic transition zone (Merching). He adapted the sentences to the dialect lexically and phonologically and produced the critical items in two variants each—his own and the Central Bavarian one. During the recordings, a natural pronunciation and a normal speech rate was ensured, as well as a comparable realization of the native and non-native lexemes with regard to their intensity and pitch. All stimuli were digitally recorded with a sampling rate of 44.1 kHz and a 16 bit (mono) sample size, using an electret microphone (Sony ECM-MS957).

For each condition, the best 30 sentences were selected from the recorded auditory material, leading to 180 sentences in total. Furthermore, 10 critical tokens of each item were chosen such that their F1 and F2 values were as similar as possible (see Table 2 for mean values). This acoustic variability was chosen in order to create a more natural speech perception and memory trace for the standard condition. This point is essential as it has been demonstrated that a higher and hence more natural variability in standard items allows for a more reliable abstraction or trace formation from the different acoustic stimuli (cf. Phillips et al., 2000; Scharinger et al., 2011). Finally, four speakers from CB, who did not participate in the experiment, were asked to listen to the sentences to ensure that they were generally acceptable and comprehensible.

Table 2. Phonetic values of the critical items (means).

The selected sentences were cross-spliced in order to use the same carrier sentence for standard and deviant conditions. To avoid different context inferences, a defined pause of 100 ms was inserted in front of the verb preceding the critical stimulus. In total, the sentences have an average length of 2.4 s. The dynamic range was manipulated in order to create a consistent sound because of the acoustic variation between and within sentences as a result of splicing. The pitch, duration or formants were not manipulated. Finally, all of the chosen items were controlled for and normalized in intensity. All of the adjustments were carried out using the software Adobe Audition CS6 (version 5.0.2).

Procedure

During the experiment, participants were seated comfortably in a dimly lit and quiet room. A computer screen was placed in front of them. Participants were instructed to listen to the auditorily presented sentences and to evaluate on a four-point-scale how well the sentence-final word fit the sentence context after the offset of each sentence. For each word pair of the misunderstanding condition, 180 sentences (30 prime sentences, presented four times = 120 primes; 30 deviant sentences presented twice = 60 deviants) were presented, distributed over 2 blocks containing 90 sentences each. For the word pairs of the incomprehension and potential comprehension conditions, 360 sentences (30 prime sentences, presented four times = 120 primes; 30 deviant sentences, presented twice = 60 deviants / 30 neutral sentences, presented four times = 120 neutral sentences; 30 deviant sentences, presented twice = 60 neutral deviants) in total, were distributed over 4 blocks containing 90 sentences each. All blocks of one word pair were presented directly following each other with only short breaks in between. The two word pairs of the misunderstanding condition did not directly follow each other in the presentation. In total, 1080 sentences were presented in 12 blocks consisting of 90 sentences each (60 standards / primes, 30 deviants), with each block of approximately 7 min duration. Between separate blocks, participants were offered a short break to rest their eyes. In order to avoid sequence effects, the block order was varied across participants.
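The presentation counts given above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original experiment scripts; the constant and function names are hypothetical, and the grouping into two standard/deviant sets per condition is read off the description above (two word pairs for misunderstanding, a priming and a neutral set for each of the other two conditions).

```python
# Presentation counts as described in the Procedure section.
# All names below are illustrative and not taken from the original study.
STANDARD_SENTENCES = 30    # distinct standard (prime or neutral) sentences per set
STANDARD_REPETITIONS = 4   # each standard sentence is presented four times
DEVIANT_SENTENCES = 30     # distinct deviant sentences per set
DEVIANT_REPETITIONS = 2    # each deviant sentence is presented twice
BLOCK_SIZE = 90            # sentences per block

# Number of standard/deviant sets per condition (assumption read off the text):
# misunderstanding: one priming set for each of its two word pairs;
# the other conditions: one priming set and one neutral set each.
SETS_PER_CONDITION = {
    "misunderstanding": 2,
    "incomprehension": 2,
    "potential_comprehension": 2,
}


def design_summary():
    """Return the trial counts implied by the design description."""
    standards = STANDARD_SENTENCES * STANDARD_REPETITIONS
    deviants = DEVIANT_SENTENCES * DEVIANT_REPETITIONS
    per_set = standards + deviants
    total = sum(n * per_set for n in SETS_PER_CONDITION.values())
    return {
        "standards_per_set": standards,
        "deviants_per_set": deviants,
        "trials_per_set": per_set,
        "blocks_per_set": per_set // BLOCK_SIZE,
        "total_trials": total,
        "total_blocks": total // BLOCK_SIZE,
        "deviant_proportion": deviants / per_set,
    }


summary = design_summary()
print(summary)
```

Under these assumptions the sketch reproduces the totals stated in the text (1080 sentences in 12 blocks) and the one-third deviant proportion.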

Before the experiment started, the participants completed a short practice phase to ensure that the given task and further instructions regarding eye blink phases were understood. Thereafter, the first experimental block started with a request to the participant to press any key to begin the experiment, which ensured the participants' full attention at the start of each block. Each trial began with the presentation of a fixation cross in the center of the computer screen for 500 ms. A stimulus embedded in a carrier sentence was played via two loudspeakers while the fixation cross remained displayed on the screen to minimize eye movements. After the offset of each sentence, the fixation cross was replaced by a question mark, which signaled to the participants to rate how well the sentence-final word fit the sentence context as accurately and as quickly as possible by pressing one of four buttons within a maximum of 2000 ms. The assignment of buttons to the four possible answers (very well, rather well, rather badly, very badly) was counterbalanced across participants. During the question mark phase, participants were allowed to blink and rest their eyes. 1000 ms after each response or time-out, the next trial started with a new fixation cross. All procedures were performed in compliance with relevant laws and institutional guidelines.

Participants

Twenty (13 women; mean age 44.5 (SD 4.87), age range 34–53) right-handed monolingual native speakers of German with normal or corrected-to-normal vision participated in the experiment. None of the participants had hearing deficits. All participants were born and raised in Isen (located in CB) and still live there. Their dialect competence was tested via a dialect pre-test. All participants gave their informed consent to this study and privacy rights were strictly observed. Each participant received monetary compensation for taking part in the study.

ERP Recording and Data Processing

An electroencephalogram (EEG) was recorded from 26 Ag/AgCl electrodes, mounted on an elastic cap (EasyCap), according to the 10–20 system (F7, F3, Fz, F4, F8, FC5, FC1, FCz, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CPz, CP2, CP6, P7, P3, Pz, P4, P8, POz) with a BrainVision (Brain Products GmbH) amplifier. The C2 electrode served as ground electrode; the reference electrode was placed at the tip of the nose. Two further electrodes were placed at the left and right mastoid sites. To measure the electrooculogram (EOG), two electrodes were placed below and above the left eye and two further electrodes laterally to the outer canthi of both eyes in order to control for horizontal and vertical eye movements. All electrode impedances were kept below 5 kΩ. EEG and EOG were recorded continuously with a sampling rate of 500 Hz and filtered offline with a 0.3–20 Hz bandpass filter. The high-pass setting was chosen in order to remove slow drifts from the signal, and the 20 Hz low-pass setting was chosen following previous ERP studies investigating natural speech processing (cf. Menning et al., 2005; Partanen et al., 2011). Then, EEG recordings were re-referenced offline to the linked mastoids.

Prior to data analysis, all individual EEG recordings were scanned for artifacts from body movements or eye blinks. All artifacts exceeding an amplitude of 40 µV were automatically removed from the data set. In a subsequent manual inspection, all single-trial waveforms were scanned for further artifacts in all EEG channels. Data sets with more than 25% artifacts within one condition were excluded from further analyses. As a result of these inspections, the data set of one participant (male) had to be excluded from the misunderstanding condition and the potential comprehension condition; two data sets (one male) had to be excluded from analysis for the incomprehension condition. These data sets were also excluded from the respective behavioral data analyses (see Supplementary Material for exact numbers of rejections). From the overall data set of the remaining participants, 5.6% of the misunderstanding condition stimuli, 6.0% of the incomprehension condition stimuli and 5.0% of the potential comprehension condition stimuli were excluded from analysis.
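The two automatic criteria described above (an amplitude threshold per trial and a 25% ceiling per data set and condition) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes that the 40 µV criterion refers to absolute amplitude after baseline correction (the text does not specify whether peak-to-peak amplitude was meant), and all function names are hypothetical.

```python
# Hypothetical sketch of threshold-based artifact rejection.
# An epoch is a list of channels; each channel is a list of samples in microvolts.
AMPLITUDE_THRESHOLD_UV = 40.0   # from the text; interpretation as |amplitude| is an assumption
MAX_REJECTION_RATE = 0.25       # data sets with more than 25% artifacts are excluded


def is_artifact(epoch_uv, threshold_uv=AMPLITUDE_THRESHOLD_UV):
    """Return True if any sample in any channel exceeds the threshold in absolute value."""
    return any(abs(sample) > threshold_uv
               for channel in epoch_uv
               for sample in channel)


def rejection_rate(epochs_uv, threshold_uv=AMPLITUDE_THRESHOLD_UV):
    """Proportion of epochs of one condition flagged as artifacts."""
    if not epochs_uv:
        return 0.0
    rejected = sum(1 for epoch in epochs_uv if is_artifact(epoch, threshold_uv))
    return rejected / len(epochs_uv)


def keep_dataset(epochs_uv, max_rate=MAX_REJECTION_RATE):
    """A data set is kept for a condition unless MORE than max_rate of its epochs are artifacts."""
    return rejection_rate(epochs_uv) <= max_rate


# Toy data: four two-channel epochs, one of which contains a 55 µV sample.
clean = [[1.0, -3.0, 2.5], [0.5, 4.0, -1.0]]
noisy = [[1.0, 55.0, 2.5], [0.5, 4.0, -1.0]]
toy_epochs = [clean, clean, clean, noisy]
print(rejection_rate(toy_epochs), keep_dataset(toy_epochs))
```

The manual inspection step mentioned in the text is not modeled here; the sketch covers only the automatic part.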

Data Analyses

The arithmetical mean of all responses for each condition was calculated by allocating a numerical value to each of the four possible response levels: 1 ≙ very well, 2 ≙ rather well, 3 ≙ rather badly, and 4 ≙ very badly. The arithmetical means were analyzed with an ANOVA separately for each condition pair; the respective factors are therefore presented in the results section of each condition (see Sections Behavioral data Condition 1: Misunderstanding–Behavioral data Condition 3: Potential comprehension). Further pairwise comparisons within each word pair were conducted using the Wilcoxon signed-rank test with a Bonferroni correction for the p-values.
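The coding of the four response levels and the Bonferroni adjustment can be illustrated with a short sketch. It is an illustration under stated assumptions rather than the analysis code used in the study: the function names are hypothetical, the example responses and p-values are invented for demonstration, and the ANOVA and Wilcoxon tests themselves are not reimplemented here.

```python
from statistics import mean

# Numerical coding of the four response levels, as given in the text.
RATING_CODES = {
    "very well": 1,
    "rather well": 2,
    "rather badly": 3,
    "very badly": 4,
}


def mean_rating(responses):
    """Arithmetic mean of the coded responses for one participant and condition."""
    return mean(RATING_CODES[response] for response in responses)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each p is multiplied by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Invented example data, for illustration only.
example_responses = ["very well", "very well", "rather well", "very badly"]
example_p_values = [0.01, 0.04, 0.5]

print(mean_rating(example_responses))
print(bonferroni(example_p_values))
```

Because lower codes correspond to better fit, a higher mean in this coding indicates that the sentence-final word was judged as less acceptable.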

In order to prevent movement artifacts, the evaluation response was given with a delay after the offset of the sentence. Due to this temporal distance between the perception of each critical item and the response, the measured reaction times for each evaluation response are not meaningful and not reported here.

For the EEG data, a multifactorial repeated-measures ANOVA was calculated with the factors CONDITION (standard vs. deviant) and REGION [frontal (F3, FZ, F4), central (C3, CZ, C4), and parietal (P3, PZ, P4)]. Averages were calculated from onset of each sentence-final word up to 1000 ms thereafter for the misunderstanding condition and up to 900 ms thereafter for the two other conditions, with a baseline of 100 ms preceding the onset. The analysis was conducted with consecutive epochs of 50 ms from 0 ms up to 900 ms and 1000 ms respectively. Moreover, time windows for each paired comparison were also chosen based on hypotheses taken from the literature with similar experimental set-ups (Domahs et al., 2009; Boulenger et al., 2011; Bendixen et al., 2014) and were adjusted on the basis of visual inspection of the grand average curves. For effects with more than one degree of freedom, Huynh and Feldt (1976) corrections were applied to the p-values.
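The windowing scheme above (a 100 ms pre-onset baseline and consecutive 50 ms epochs up to 900 ms or 1000 ms) can be made concrete with a short sketch. It assumes a single-channel averaged waveform with one time stamp per sample. The function names, the 10 ms sampling step, and the toy amplitudes are hypothetical and do not reflect the recording parameters of the study.

```python
def consecutive_windows(end_ms, step_ms=50, start_ms=0):
    """Consecutive analysis windows [start, end) covering start_ms to end_ms."""
    return [(t, t + step_ms) for t in range(start_ms, end_ms, step_ms)]


def window_mean(samples_uv, times_ms, start_ms, end_ms):
    """Mean amplitude of all samples with start_ms <= time < end_ms."""
    values = [v for v, t in zip(samples_uv, times_ms) if start_ms <= t < end_ms]
    return sum(values) / len(values)


def baseline_correct(samples_uv, times_ms, baseline_start_ms=-100, baseline_end_ms=0):
    """Subtract the mean of the pre-onset baseline interval from every sample."""
    baseline = window_mean(samples_uv, times_ms, baseline_start_ms, baseline_end_ms)
    return [v - baseline for v in samples_uv]


# Toy waveform (hypothetical): one sample every 10 ms from -100 ms to 990 ms,
# 2 microvolts before word onset and 5 microvolts afterwards.
times = list(range(-100, 1000, 10))
samples = [2.0 if t < 0 else 5.0 for t in times]

corrected = baseline_correct(samples, times)
windows_900 = consecutive_windows(900)    # incomprehension / potential comprehension
windows_1000 = consecutive_windows(1000)  # misunderstanding condition

# Mean baseline-corrected amplitude per 50 ms window, as input for the ANOVA.
window_means = [window_mean(corrected, times, a, b) for a, b in windows_1000]
print(len(windows_900), len(windows_1000), window_means[0])
```

Each window mean would then enter the repeated-measures ANOVA as one value per participant, condition, and region.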

Results

Condition 1: Misunderstanding

Behavioral Data

The ANOVA for the misunderstanding condition with the factors DIPHTHONG (/r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / ‘roses’ vs. /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / ‘journeys’) and EXPECTANCY (priming fulfilled vs. deviant) revealed a main effect only for the factor EXPECTANCY [F_{(1, 18)} = 605.67, p < 0.001], but not for the factor DIPHTHONG [F_{(1, 18)} = 0.56, p > 0.05]; i.e., the participants' evaluation was not significantly influenced by the diphthong itself but only by whether the expectancy built up by the priming sentence context was fulfilled. Further analyses support this finding: both correct priming conditions primed equally well regardless of the diphthong [/r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / ‘roses’: mean 1.12 (SD 0.20) vs. /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / ‘journeys’: mean 1.15 (SD 0.17); Z₍₁₈₎ = −1.01, p > 0.05], while the non-compliant deviants were evaluated as significantly less acceptable than the correct sentence-final words in the priming condition sentences [on a scale from 1 = acceptable to 4 = unacceptable; prime /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / ‘roses’: mean 1.12 (SD 0.20) vs. deviant /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / ‘journeys’: mean 3.72 (SD 0.79); Z₍₁₈₎ = −3.79, p < 0.001; prime /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / ‘journeys’: mean 1.15 (SD 0.17) vs. deviant /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / ‘roses’: mean 3.90 (SD 0.08); Z₍₁₈₎ = −3.82, p < 0.001].

ERP Data

In the comparison of the standard /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / ‘roses’ and the deviant /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / ‘journeys’, the deviant elicited an early negativity effect in the time window from 100 to 200 ms, followed by a late positivity effect from 400 to 900 ms (cf. Figures 2, 4). In the early time window, a repeated-measures ANOVA revealed a significant main effect for the factor EXPECTANCY (priming fulfilled vs. deviant) [F_{(1, 18)} = 11.52, p = 0.003, η²p = 0.05] but no significant interaction between the two factors EXPECTANCY and REGION [F_{(2, 36)} < 1, p > 0.05, η²p = 0.00]. In the later time window, the statistical analysis showed main effects for both factors [EXPECTANCY: F_{(1, 18)} = 26.35, p < 0.001, η²p = 0.07; REGION: F_{(2, 36)} = 28.08, p < 0.001, η²p = 0.23] as well as a significant interaction between them [F_{(2, 36)} = 15.49, p < 0.001, η²p = 0.01]. The post-hoc analysis of this interaction by REGION revealed a stronger occurrence of the late positive component in the centro-parietal regions [frontal: F_{(1, 18)} = 7.37, p < 0.05, η²p = 0.01; central: F_{(1, 18)} = 26.55, p < 0.001, η²p = 0.12; parietal: F_{(1, 18)} = 44.45, p < 0.001, η²p = 0.28].

FIGURE 2

Figure 2. Condition 1a: Misunderstanding: Grand averages of event-related potentials obtained for the deviant condition /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / and priming standard condition /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / measured from 100 ms prior to word onset up to 1000 ms.

For the comparison of the standard /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / ‘journeys’ and the deviant /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / ‘roses’ (see Figures 3, 4), the ANOVA showed no significant effects in the early negativity time window (100–200 ms), but the deviant condition elicited a significant negativity effect in a later time window from 300 to 500 ms [EXPECTANCY: F_{(1, 18)} = 9.19, p = 0.007, η²p = 0.04; REGION: F_{(2, 36)} = 7.47, p = 0.012, η²p = 0.07]. The significant interaction between the factors EXPECTANCY and REGION [F_{(2, 36)} = 5.37, p = 0.028, η²p = 0.01], resolved by REGION, revealed a stronger occurrence of this negativity in the centro-parietal region [frontal: F_{(1, 18)} = 1.33, p > 0.05, η²p = 0.00; central: F_{(1, 18)} = 10.53, p < 0.01, η²p = 0.08; parietal: F_{(1, 18)} = 14.96, p < 0.01, η²p = 0.13]. In the time window from 550 to 1000 ms, a positivity effect was elicited by the deviant condition [EXPECTANCY: F_{(1, 18)} = 22.22, p < 0.001, η²p = 0.04; REGION: F_{(2, 36)} = 20.19, p < 0.001, η²p = 0.21], but there was no significant interaction between the two factors [F_{(2, 36)} = 2.66, p > 0.05, η²p = 0.00].

FIGURE 3

Figure 3. Condition 1b: Misunderstanding: Grand averages of event-related potentials obtained for the deviant condition /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / and priming standard condition /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / measured from 100 ms prior to word onset up to 1000 ms.

FIGURE 4

Figure 4. (Top) row: Topographic difference maps for the deviant condition /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / and priming standard condition /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / for the two significant time windows 100–200 ms and 400–900 ms. (Lower) row: Topographic difference maps for the deviant condition /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / and priming standard condition /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / for the two significant time windows 300–500 ms and 550–1000 ms.

Discussion

In the misunderstanding condition, the deviant r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ _{/r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$} elicited a negativity effect in the latency range between 300 and 500 ms. Due to its latency as well as its centro-parietal scalp distribution, which is typical for the context-dependent N400 effect (cf. Lau et al., 2008; Kutas and Federmeier, 2011), we interpret this negativity as an N400 reflecting the semantic mismatch between the expected continuation of the sentence and the perceived input. The semantic priming of ‘journeys’ builds up an expectation for the correct word form /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ /. The semantically incongruous word /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / leads to a strong semantic mismatch between the context-based information held in working memory and the unfitting item. The N400 effect indexes an increased integration difficulty of the (incongruous) critical item with the prior sentence context (cf. Kutas and Hillyard, 1980; Brown and Hagoort, 1993; Kutas and Federmeier, 2000; Lau et al., 2009). Moreover, the N400 amplitude is modulated by the ease of accessing information from long-term memory (cf. Kutas and Federmeier, 2000; Lau et al., 2009). Due to the priming context, a congruous ending is pre-activated, and this pre-activation is then disrupted by the incongruous sentence-final word. This disruption leads to higher processing costs, displayed by the pronounced N400 amplitude. The semantic mismatch between expectancy and perceived input is also reflected by the behavioral data, since sentences with the unexpected final lexeme /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / were evaluated significantly worse than the sentences ending with the predictable word /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ /.

Interestingly, in the reversed condition our results show an earlier negativity effect between 100 and 200 ms for the deviant r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ _{/r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$}. In previous studies, early negativities in similar time windows have been interpreted as detections of sudden changes in acoustic features of speech sounds embedded in sentences and have been classified as members of the N200 family (cf. Boulenger et al., 2011 for reversed speech; Bendixen et al., 2014 for omissions). The N200 component is divided into three sub-components (N2a, N2b, N2c) and is typically evoked between 180 and 325 ms, or between 100 and 200 ms in the case of the N2a (cf. Patel and Azzam, 2005). Besides differences in latency and topography, early negativity effects are primarily separated on the basis of their sensitivity to varying task conditions. While the MMN (N2a) reflects pre-attentive passive change detections, the N2b is elicited by task-relevant physical mismatches and thus requires attention to the stimuli. Thus, the two components index different stages of mismatch detection (cf. Ritter et al., 1984; Pritchard et al., 1991; Folstein and Van Petten, 2008). Bendixen et al. (2014) likewise support the component's dependency on active listening and its reflection of conscious error and mismatch detection. In line with this perspective, we interpret the early negativity effect found for r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ _{/r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$} as an instance of the N2b, a reflection of a general rule-governed error detection mechanism. Since participants' attention was directed explicitly to the critical items due to the rating task, the N200 effect reflects an active discrimination and classification process, elicited by the deviation from a mentally-stored expectation of the standard stimulus (cf. Patel and Azzam, 2005), which more generally reflects the recognition of deviations and violations in regular structures (cf. Bohn et al., 2013; Henrich et al., 2014 for rhythmic irregularities). Thus, the semantic priming of ‘roses’ builds up an expectation for the correct word form /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ /. The deviation /r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ / is perceived as being different from the activated standard stimulus /r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ / and thus as not fitting the activated memory trace.

Furthermore, late positive components (LPC) were elicited for r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$ _{/r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$} (550–1000 ms) and r $\overset{⌢}{oa}$ s $\underset{ˈ}{n}$ _{/r $\overset{⌢}{o Ʊ}$ s $\underset{ˈ}{n}$} (400–900 ms). We interpret these positivity effects as members of the P300 family, reflecting the evaluation process related to the given task requirements (Bentin et al., 1999; Knaus et al., 2007; Roehm et al., 2007; Domahs et al., 2008, 2009; Bohn et al., 2013; Henrich et al., 2014). Recall that participants had to perform a semantic rating task, i.e., their attention was directed consciously toward the linguistic material. Thus, the elicited positivity reflects the match or mismatch between the expected and encountered word form, i.e., the comparison of the critical stimulus with the expectation built up by the memory trace of the standard form and, in priming contexts, the semantic information from the sentence context. The P300 is a positive deflection evoked by meaningful, task-relevant stimuli only when the subjects' attention is required for the task (cf. Picton, 1992). The P300 can be situated in processes of categorization, decision making, and context updating (cf. Coulson et al., 1998). Furthermore, the LPC may reflect a reanalysis process with regard to the semantic correctness of the presented sentences (cf. Domahs et al., 2008; Henrich et al., 2014). A deviating word form in sentence-final position requires a reanalysis and reevaluation of the previously built-up structure because the repetitively presented standard and the priming sentence context build up high expectations for a certain form. In this respect, the amplitude of the LPC is additionally modulated by the degree of the required reanalysis process, i.e., by the degree of deviation between what was expected and what was encountered.