Use what you can: storage, abstraction processes, and perceptual adjustments help listeners recognize reduced forms

Poellmann, Katja; Mitterer, Holger; McQueen, James M.

doi:10.3389/fpsyg.2014.00437

ORIGINAL RESEARCH article

Front. Psychol., 30 May 2014

Sec. Psychology of Language

Volume 5 - 2014 | https://doi.org/10.3389/fpsyg.2014.00437

This article is part of the Research Topic: Phonological and Phonetic Competence: Between Grammar, Signal Processing, and Neural Activity

Katja Poellmann¹,²*†

Holger Mitterer¹†

James M. McQueen¹,³

¹Language Comprehension Department, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
²International Max Planck Research School for Language Sciences, Nijmegen, Netherlands
³Behavioural Science Institute and Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands

Three eye-tracking experiments tested whether native listeners recognized reduced Dutch words better after having heard the same reduced words, or different reduced words of the same reduction type and whether familiarization with one reduction type helps listeners to deal with another reduction type. In the exposure phase, a segmental reduction group was exposed to /b/-reductions (e.g., minderij instead of binderij, “book binder”) and a syllabic reduction group was exposed to full-vowel deletions (e.g., p'raat instead of paraat, “ready”), while a control group did not hear any reductions. In the test phase, all three groups heard the same speaker producing reduced-/b/ and deleted-vowel words that were either repeated (Experiments 1 and 2) or new (Experiment 3), but that now appeared as targets in semantically neutral sentences. Word-specific learning effects were found for vowel-deletions but not for /b/-reductions. Generalization of learning to new words of the same reduction type occurred only if the exposure words showed a phonologically consistent reduction pattern (/b/-reductions). In contrast, generalization of learning to words of another reduction type occurred only if the exposure words showed a phonologically inconsistent reduction pattern (the vowel deletions; learning about them generalized to recognition of the /b/-reductions). In order to deal with reductions, listeners thus use various means. They store reduced variants (e.g., for the inconsistent vowel-deleted words) and they abstract over incoming information to build up and apply mapping rules (e.g., for the consistent /b/-reductions). Experience with inconsistent pronunciations leads to greater perceptual flexibility in dealing with other forms of reduction uttered by the same speaker than experience with consistent pronunciations.

Introduction

In casual speech, speakers tend to articulate in a sloppy way. They frequently reduce words by slurring and even omitting segments or syllables (Ernestus, 2000; Patterson et al., 2003; Johnson, 2004; Mitterer and McQueen, 2009). A given native Dutch speaker may for example reduce the /b/ in bandiet “bandit” to [m] or leave out the first vowel in kanaal “canal” (Schuppler et al., 2011). Listeners might get used to such pronunciation habits; they may recognize a reduced word better the second time and they may be able to adjust rapidly to new forms of reduction produced by the same speaker. The present study investigates whether listeners adapt to a given reduction type (/b/-reductions or full-vowel-deletions) and, if so, how they adapt by asking if they can apply their knowledge to previously unheard reduced words of the same reduction type and/or of the other reduction type. Put another way, the present study tests word-specific learning effects as well as generalization of learning within and across reduction types.

Listeners are usually not aware that they encounter numerous reduced word forms every day (Kemps et al., 2004; Ernestus and Warner, 2011). They use the information provided by the sentence context or also the wider discourse context to predict and, if necessary, restore the upcoming word (Ernestus et al., 2002; Brouwer et al., 2013). On a lower level, listeners are also able to exploit the fine phonetic detail present in reduced forms to distinguish for instance between a reduced form [spɔːt] of support and the unreduced form [spɔːt] sport (Manuel, 1992).

Another mechanism which listeners may use to recognize reduced forms better is adaptation, as perceptual learning may be especially important when the conditions for spoken-word recognition become challenging.

Adaptation, for instance, has been found to play a crucial role in recognizing regional and foreign-accented speech (Clarke and Garrett, 2004; Floccia et al., 2006; Mitterer and McQueen, 2009). Listeners are able to adapt rapidly to these deviant pronunciations and can apply their acquired knowledge to the way they process other words (Witteman et al., 2013).

The present study tests whether a similar adaptation process also takes place when listeners encounter reduced words in their native language. Like regional and foreign-accented words, reduced words are also variants of canonical pronunciations, but the reduction types chosen for investigation in the present study (/b/-reductions and full-vowel-deletions) were not regionally marked. In contrast to regional and foreign accents, reductions affect predominantly unstressed segments and syllables. They are therefore probably less salient. This might make it harder for listeners to adapt to reduced speech than to regional or foreign-accented speech.

The present study investigates potential adaptation processes and their possible constraints. Consider a Dutch listener hearing the word paraat “ready” pronounced as p'raat. Different patterns of adaptation are possible that vary in how general they are. First, no adaptation whatsoever may be found. Second, the listener may find it easier to recognize a second instance of the same word with the same reduction pattern. This would be similar to the recognition benefits for words repeated in the same voice that provide some of the evidence for episodic models of word recognition (Nygaard et al., 1994; Goldinger, 1996, 1998; Nygaard and Pisoni, 1998). Third, listeners may learn that this speaker generally deletes vowels in unstressed syllables. This abstractionist learning may be quite specific, so that only very similar reductions to p'raat benefit (e.g., Parijs “Paris” produced as P'rijs; note that the Dutch rendition is stressed on the second syllable) or it may include reductions of unstressed vowels in other contexts (e.g., kanaal “canal” produced as k'naal). The strongest possible generalization would be that the listener assumes that this speaker reduces a great deal and hence finds it easier to recognize any kind of reduction uttered by the speaker.

Finding a word-specific learning effect, that is, better recognition of a reduced word on hearing it for the second time compared to the first time, would be evidence for episodic storage of reduced forms. In contrast, observing generalization of learning to new words of the same reduction type (e.g., generalization from p'raat to P'rijs or k'naal) would indicate that an abstraction process is taking place and that it occurs at a prelexical level. Storing reduced forms alone cannot account for easier recognition of previously unheard reduced words (McQueen et al., 2006; Cutler et al., 2010). In a purely episodic account of lexical access, there is no way to adjust weights of sublexical units like segments and syllables to build up rules that capture regular reduction processes (e.g., “Potentially restore a bilabial nasal in an unstressed syllable to a bilabial voiced stop if followed closely by another nasal”). Finding generalization of learning to new words of the same reduction type would thus support the claim that there is abstraction in lexical access. Observing generalization of learning from one reduction type to another may also be evidence for abstraction, provided that there is enough similarity between the reduction types to abstract over the respective mapping rules. Consider, for example, two types of prefix reductions, such as ge- /gə/ → /g/ and be- /bə/ → /b/ in German. An abstraction rule may be: “Potentially insert a schwa after an initial voiced stop” (instead of “… after an initial voiced velar/bilabial stop”). However, should generalization of learning across reduction types be found for very different reduction types, such as the /b/-reductions and full-vowel-deletions examined here, this would more likely indicate a non-specific adjustment and be evidence for the flexibility of the perceptual system. That is, instead of specific adaptation processes (storage of reduced forms and/or abstraction of reduction rules), listeners could make a more general adjustment to the current talker's speaking style.

To test these possible adaptation effects, the printed-word eye-tracking paradigm (McQueen and Viebahn, 2007) was used. In the exposure phase, one group of participants was exposed to segmental reductions, another group was exposed to syllabic reductions and a third group was exposed only to canonical pronunciations. The first group, the segmental reduction group, heard /b/-reductions, where the word-initial /b/ was reduced to a bilabial nasal (e.g., minderij instead of binderij “book binder”). The second group, the syllabic reduction group, heard words in which the first, unstressed full vowel was deleted (e.g., p'raat instead of paraat “ready”). The third group, the control group, heard the same words as the two experimental groups during the exposure phase but all in unreduced form (e.g., binderij and paraat).

In order to assess the frequency with which our chosen reduction types (/b/-reductions and full-vowel-deletions) occur in spontaneous speech, we conducted a corpus study following the principles of Pluymaekers et al. (2005). First, all sound files containing a /b/-initial word with a nasal in third position and an unstressed first syllable were extracted from the Corpus of Spoken Dutch (Oostdijk, 2000). Per word type (this notion here not only describes words belonging to different lemmas but also different word forms of one lemma, e.g., an inflected verb form or the plural of a noun) only one token was randomly chosen to determine its phonetic realization. Out of 65 word types, six showed a /b/ → [m] reduction in the first segment (i.e., 9.2% of the considered cases). A similar analysis was conducted to assess the frequency of full-vowel-deletions in initially unstressed words. The vowel was deleted in eight out of 66 word types (i.e., in 12.1%) containing either a voiceless plosive (/p/, /k/) or a voiceless velar fricative (/x/) in first position and an alveolar nasal or liquid in third position. This was also the segmental structure used in the syllabic reduction condition. The chosen reduction types were thus indeed real-world phenomena and comparable in terms of frequency.
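The two percentages follow directly from the reported counts. As a quick check, here is a minimal Python sketch; the dictionary keys and the helper name `percentage` are illustrative and not part of the original analysis.

```python
# Counts reported for the corpus study (one randomly chosen token per word type).
counts = {
    "b_reduction": (6, 65),     # /b/ -> [m] in 6 of 65 word types
    "vowel_deletion": (8, 66),  # first full vowel deleted in 8 of 66 word types
}

def percentage(hits, total):
    """Return hits/total as a percentage rounded to one decimal place."""
    return round(100 * hits / total, 1)

rates = {name: percentage(hits, total) for name, (hits, total) in counts.items()}
print(rates)
```

Both rates come out at roughly one in ten word types, which is the sense in which the two reduction types are described as comparable in frequency.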

These two reduction types were chosen to examine adaptation to two different-sized linguistic units, the phoneme and the syllable, and the possible interaction of the adaptation effects. An earlier study showed that listeners adapt to syllabic reductions involving a morpheme: After exposure to words containing the reduced prefix ver- (realized as [fː]), Dutch listeners recognized previously unheard reduced ver-words better than a control group (Poellmann et al., under revision). In the present study, we test whether this is also the case for non-morphemic syllables. The deletion of the unstressed, full vowel in CVC-initial words like paraat always leads to a reduction in the number of syllables, which is why this reduction type is called “syllabic.” A pure comparison of morphemic and non-morphemic reductions, however, turned out to be impossible in Dutch. Ideally, one would like to compare a morphemic reduction type (that only affects one specific morpheme, i.e., the same strings of segments, such as Dutch ge-) to a non-morphemic reduction type that also only affects one specific string of segments (e.g., pa-). The Dutch lexicon, however, does not contain enough words starting with one specific unstressed non-morphemic syllable to conduct such an experiment. This constraint on the (non-)morphemic status hence leads inevitably to higher variability in the segmental structure of the CVC-targets compared to the ver-targets examined in Poellmann et al. (under revision). This difference in the degree of consistency with which words are reduced in the two conditions allowed us to ask whether phonological consistency determines which adaptation processes (e.g., storage, abstraction rules, general flexibility) listeners are able to use.

In the test phase, all three groups of participants heard /b/-reductions and vowel-deletions. The reduced words were either the same as in the exposure phase (in Experiments 1 and 2) or different (in Experiment 3). If listeners adapt to a given reduction type and if they can transfer this knowledge to new words (Experiment 3) and/or to other reduction types (Experiments 1–3), participants in the experimental groups should recognize reduced words better than participants in the control group.

Regardless of the specifics concerning the reduction (such as size of the reduced unit or input consistency), it seems plausible that a reduced word can be recognized more easily if it is encountered a second time. We therefore expect to find word-specific learning effects for both /b/-reductions and vowel-deletions.

Moreover, we predict that learning about /b/-reductions generalizes to new words that are reduced in the same way. Such generalization effects have been observed for a similar kind of /b/-reduction where the word-initial voiced stop was reduced to a labio-dental approximant [ʋ] (Poellmann et al., under revision) and for learning about segmental idiosyncrasies (McQueen et al., 2006). In the McQueen et al. (2006) study, listeners adapted to an ambiguous sound (between /s/ and /f/) and transferred their knowledge to previously unheard minimal pairs that only differed in containing either /s/ or /f/.

The predictions concerning within-reduction-type generalizations for full-vowel-deletions are less clear. The constraint on the (non-)morphemic status of the syllable leads to higher variability in the segmental structure of the CVC-targets compared to the /b/-targets. If the input has to be highly consistent for the creation of abstract mapping rules, we might not observe generalization of learning.

The two reduction types under investigation differ in several respects, such as the degree of reduction (weakening of the [b] vs. deletion of the vowel), in the segment that is reduced (bilabial voiced stop vs. full vowel) and in the position the reduced segment occurs (first position for /b/-reductions vs. second position for vowel-deletions). In order to observe generalization of learning across reduction types, listeners would hence have to adapt on a fairly global level. However, such global adjustments to challenging listening conditions have been observed before (Brouwer et al., 2012; McQueen and Huettig, 2012).

Experiment 1

The aim of Experiment 1 was to test whether listeners are able to recognize segmental and syllabic reductions better when they have already encountered the same words in reduced form before. Experiment 1 also asked whether learning about reductions might generalize from one reduction type to another (i.e., from /b/-reductions to full-vowel deletions and/or vice versa). In the exposure phase, one group was exposed to /b/-reductions (segmental reduction group), a second group was exposed to full-vowel deletions (syllabic reduction group), while a third group was exposed to canonical forms only (control group). In the test phase, all three groups were tested on reduced-/b/ words and vowel-deleted words. Importantly, these reduced words had already occurred in reduced or canonical form (depending on the group) in the exposure phase. If listeners can adapt to reduced words, the segmental reduction group should recognize the reduced-/b/ words better than the syllabic reduction group and the control group because of their previous exposure to these words in reduced form. The same holds for participants in the syllabic reduction group: If they can adapt to vowel-deleted words, they should perform better on these words than the segmental reduction group and the control group. If listeners can additionally transfer their knowledge about one reduction type to another, the segmental reduction group should outperform the control group on the vowel-deleted words and the syllabic reduction group should outperform the control group on the reduced-/b/ words.

Methods

Participants

Seventy-five participants of the Max Planck Institute's subject pool, all native speakers of Dutch, were paid to take part. All reported normal hearing and normal or corrected-to-normal vision.

Design

Participants were randomly assigned to one of three groups: a segmental reduction group, a syllabic reduction group and a control group. They listened to sentences, saw four printed words on a computer screen and were asked to click on the word that occurred in the sentence. Improved word recognition in a visual-world eye-tracking experiment can be reflected by faster and more accurate mouse clicks on the target word as well as higher fixation proportions toward the target and away from the similar sounding competitor. We thus measured Reaction Times (RTs) and accuracy of mouse clicks and fixation behavior.

In the exposure phase, participants were exposed to words that were potentially reduced (see the experimental exposure trials in Table 1) but which did not appear on the screen. Instead, they saw (and had to click on) target words that occurred later in the sentences. All three groups were also exposed to unreduced /m/- and unreduced consonant-cluster-words (e.g., /mɑtros/ matroos “sailor” and /knɔflok/ knoflook “garlic”); they also had to click on these filler stimuli.

Table 1. Experimental design and types of stimuli in Experiments 1 and 2.

In the test phase, all three groups heard reduced /b/-words and vowel-deleted words in the experimental trials. These were the same words as had appeared in the exposure phase (e.g., [mɪndərεɪ] instead of [bɪndərεɪ] binderij “book binder” and [prat] instead of [parat] paraat “ready”). All groups also heard new canonical /m/- and new canonical consonant-cluster words. The reduced /b/-words, the vowel-deleted words, the unreduced /m/-word fillers and the consonant-cluster filler words were all targets and were therefore displayed on the computer screen in (canonical) orthographic form.

Materials

The target words (i.e., the words participants had to click on) appeared toward the end of spoken sentences. Each target word occurred in a different sentence context not containing any further /b/s in unstressed syllables or any further unstressed CVC-sequences which would result in legal consonant clusters when omitting the vowel. The potentially reduced item occurred before the target word in the experimental trials (e.g., Pas in een [b]/[m]inderij wordt een boek of tijdschrift afgemaakt “Only at a book binder, a book or magazine gets finished,” where bold font indicates the target word and underlining marks the potentially reduced critical item). This was done to prevent participants from clicking on the same words twice, once in the exposure phase and once in the test phase. In the test phase, the semantic contexts preceding the target words were kept uninformative (e.g., Het tekstverwerkingprogramma kende het woordje [m]inderij niet “The word processor did not know the word book binder”). During each sentence, there were always four printed words on the screen. In the test trials, these were a /b/-word, a /m/-word, a CVC-word and a consonant-cluster word (see Figure 1 for an example display).

Figure 1. Example display of a test trial in Experiments 1–3.

The test phase consisted of 48 experimental trials containing either /b/-targets or CVC-targets and 48 filler trials containing either /m/-targets or CC-targets. For each type of target word (/b/-target, /m/-target, CVC-target, and consonant-cluster-target), 24 target-competitor pairs were selected (see Table S1 for the /b/-targets, the CVC-targets, and their respective competitors). If a /b/-word was the target, a /m/-word was the competitor and vice versa. The same holds for CVC- and consonant-cluster-targets. All /b/- and /m/-initial words contained an unstressed first syllable. In second position, any vowel including schwa could occur followed by a nasal in third position. The latter condition was necessary for all /b/-targets to motivate nasalization at the beginning of the word. However, there are not sufficient /m/-initial words in Dutch containing a nasal in third position to create perfectly matched pairs of /b/-targets and /m/-competitors. Ideally, /b/-words and /m/-words should be as similar as possible with as much overlap in the reduced forms as possible (e.g., binderij “book binder” pronounced as [mɪndərεɪ] overlaps in the first two syllables with [mɪndərjarəx] minderjarig “underage”). Due to the infrequent occurrence of a nasal in third position following an /m/ in first position, the /m/-targets contained a random consonant in third position (and so did the corresponding /b/-competitors; e.g., moeras “swamp” and boerin “farmer's wife”). Target-competitor pairs were further matched in terms of number of syllables, stress pattern and word frequency [taken from SUBTLEX-NL (Keuleers et al., 2010)] as much as possible (see Table S1).

The principles of as much overlap and similarity as possible between targets and competitors also applied to the (reduced) CVC- and (unreduced) consonant-cluster-words. CVC-words started with an open syllable, consisting of a voiceless consonant (either /p/, /k/, or /x/) and a full vowel, followed by a liquid or /n/ in third position (e.g., paraat “ready”), so that the sequence resulting from vowel deletion would be a phonotactically legal consonant cluster in Dutch. The consonant-cluster words started with the same voiceless consonants directly followed by a liquid or [n] (e.g., praat “talk”). While the stress of the CVC-words was on the second syllable, the consonant-cluster-words were stressed on the first syllable, so that both word types were matched on stress pattern when the full vowel of the CVC-words was deleted (e.g., p'RAAT for paRAAT “ready” and PRAAT “talk”). Again, target-competitor pairs were matched on number of syllables (in the reduced form) and word frequency (see Table S1).

The exposure phase consisted of 96 trials in total. Half of them were filler trials containing /m/-targets or CC-targets. The 48 experimental trials contained potentially reduced /b/-words or CVC-words that did not appear on the screen. The only constraint for the target-“competitor” pairs on the screen was that they did not overlap.
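The trial counts given for the two phases can be cross-checked with a short sketch. The variable names below are illustrative. The split of the exposure-phase trials into halves follows the description above, and the per-type breakdown within the exposure phase is not assumed.

```python
# Test phase: 24 target-competitor pairs per target type.
PAIRS_PER_TYPE = 24
experimental_types = ["/b/-target", "CVC-target"]
filler_types = ["/m/-target", "CC-target"]

test_experimental = PAIRS_PER_TYPE * len(experimental_types)
test_filler = PAIRS_PER_TYPE * len(filler_types)
test_total = test_experimental + test_filler

# Exposure phase: 96 trials in total, half of them fillers.
exposure_total = 96
exposure_filler = exposure_total // 2
exposure_experimental = exposure_total - exposure_filler

print(test_experimental, test_filler, test_total)
print(exposure_experimental, exposure_filler, exposure_total)
```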

Stimulus construction

Digital recordings of the stimuli were made by a female native speaker of Dutch in a sound-proof booth, sampling at 44.1 kHz. She was instructed to produce the sentences in a casual way, not just reading them aloud. For sentences containing canonically pronounced /b/-targets, an additional set containing reduced forms was created by replacing the /b/ with an /m/ from a word with the same vowel context. The spliced parts were adjusted in pitch (with PSOLA in PRAAT, Boersma and Weenink, 2010) and intensity to their new context. The transitions in amplitude preceding and following the spliced-in [m]s were smoothed where necessary in order to reduce splicing artifacts. The set of sentences containing reduced CVC-words was created by cutting out the first (unstressed) vowel of the recorded versions of these words with intact vowels. Sentence contexts were thus identical across the reduced and unreduced forms of each target word. Filler sentences containing /m/- and consonant-cluster-targets were not manipulated.

Procedure

Participants were seated in a sound-attenuated booth at a comfortable viewing distance from the computer screen. Eye movements were monitored using an SR Research EyeLink 1000 set-up, sampling at 1 kHz. The auditory stimuli were presented to the participants over headphones. Prior to the experiment, participants received written instructions that informed them that they would see four printed words on the screen and asked them to click on the word that occurred in the sentence.

At the beginning of each trial, a fixation cross appeared in the center of the screen for 500 ms. Four printed words (in a 25-point Arial font) were then presented. After 1500 ms, the auditory stimulus was played. As soon as participants had listened to the entire sentence and had clicked with the mouse on the screen, the following trial was initiated. Every 10 trials, a drift correction was carried out. Participants had the opportunity to take a break after every 50th stimulus. The experiment started with six practice trials. The 96 exposure trials in random order were followed by 96 test trials in random order. Randomization was different for each participant. An experimental session took approximately 25 min.

Results

Exclusion criteria

Mouse click responses (reaction time and accuracy data) and eye movements served as dependent variables. For the eye-tracking data, we analyzed the data from each participant's right eye. A total of 2.9% of the trials were excluded from the eye-tracking analysis, because participants either appeared to have looked away from the screen (2.0%) or failed to click on the target or the potentially confusable competitor (0.9%). Clicks on the competitor were not excluded from every analysis, as the reduced auditory input sometimes fitted the competitors better than the targets. For instance, reduced p'raat better fitted the canonical form of the competitor praat than the canonical form of the target paraat. Furthermore, the semantics of the test sentences did not make clear which word was the target. In the case of minimal pairs such as paraat and praat, participants thus never received disambiguating information about which of the two words they should click on. Therefore, clicks on competitors were not regarded as errors in the analyses of the eye-tracking and the reaction time data. Note also that excluding trials in which participants clicked on the competitor from the eye-tracking analysis would distort any learning effects. Presumably, participants look more at the competitor when they click on it. Excluding these trials would result in a greater preference for the target over the competitor and would thus misleadingly indicate a greater learning effect than was actually present. Moreover, the focus in the RT analyses is on the comparisons across the three exposure groups; these comparisons are thus orthogonal to any differences between targets and competitors. Click responses to competitors, however, were regarded as incorrect in the analysis of the accuracy scores.

The upper part of Table 2 displays descriptive statistics on RTs for trials in which participants clicked either on the target or on the phonological competitor in the test phase of Experiment 1. Participants in the syllabic reduction group took longer to respond than participants in the segmental reduction group or the control group. Participants, however, were not asked to respond as fast as possible. Some chose to do so; others waited for the sentence to finish before giving a response. The high standard deviation (SD) values reflect these different strategies. Extreme cases, that is, trials in which participants responded either too fast or too slowly, were also excluded. To identify them, a linear mixed-effects model containing only participants and items as random effects and Trial Number as fixed effect was run, and the residuals of this atheoretical model were computed. Based on visual inspection of a residual plot, 19 trials (0.5%) in the test phase (with residuals either below −1300 ms or above 3200 ms) were excluded.
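The trimming step can be illustrated with a minimal sketch. It assumes the residuals of the baseline model have already been computed elsewhere, since fitting the mixed-effects model itself is outside the scope of this example. The function name `trim_by_residual` and the toy numbers are hypothetical; only the two cut-off values come from the text.

```python
import numpy as np

# Cut-offs reported in the text (ms); residuals outside this range are dropped.
LOWER_MS, UPPER_MS = -1300.0, 3200.0

def trim_by_residual(rts, residuals, lower=LOWER_MS, upper=UPPER_MS):
    """Keep trials whose model residual lies within [lower, upper]."""
    rts = np.asarray(rts, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    keep = (residuals >= lower) & (residuals <= upper)
    return rts[keep], keep

# Hypothetical toy data: five trials with their RTs and model residuals.
rts = [1800, 2100, 5600, 400, 2500]
residuals = [-150, 120, 3400, -1450, 300]

kept_rts, keep_mask = trim_by_residual(rts, residuals)
print(kept_rts, keep_mask)
```

Trimming on residuals rather than on raw RTs means that a slow response from a generally slow participant is less likely to be discarded than the same response from a fast one.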

Table 2. RTs in ms in the test phases of Experiments 1 and 2 for clicks on targets and competitors.

Statistical testing

Linear mixed-effects models were used to analyze the click responses (accuracy¹ and RT²) and the eye movement³ data on the experimental trials (the /b/-targets and the CVC-targets). To account for the categorical nature of the accuracy data, we used a logistic regression model for these data (cf. Dixon, 2008; Jaeger, 2008). The eye-tracking data were transformed into fixation proportions using the empirical logit function. Participants and Items were entered in the model as random factors including random slopes for Items. Group served as fixed effect. The segmental reduction condition (/b/-words) and the syllabic reduction condition (CVC-words) were analyzed independently. This is because a comparison between these two word sets is difficult: Both had to conform to different phonological constraints and could hence not be balanced on other variables (such as word length, lexical frequency, etc.). We therefore focus on the comparison of how the different groups recognize each word set independently (a one-factorial design with three levels: exposed to /b/-reductions, exposed to vowel-deletions, and not exposed to reductions). Trial Number was entered as another fixed effect with values centered around zero in the models for the accuracy and RT data. This variable was added to account for additional variance, as task performance often improves over the course of an experiment. The results for Trial Number, however, will not be reported below. Thus, we tested whether RTs, accuracy scores and target preference (as determined by the difference between proportion of target and competitor fixations) for the reduced words were influenced by the fixed effect of Group. That is, we examine whether the groups differ in how fast and accurately they recognize the reduced /b/-words and the vowel-deleted words and whether they show different target-competitor preferences when they process reduced words. The control group was always mapped on the intercept, so that the analysis gives two regression weights for the factor Group, one for the difference between the control group and the segmental reduction group and one for the difference between the control group and the syllabic reduction group. For the eye-tracking analyses, we had no a priori expectations about when effects would occur. We therefore analyzed the fixation data at all time points, using sliding 200 ms time windows from 200 to 1500 ms after target onset starting at every 100 ms.
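The windowing scheme and the transformation can be sketched as follows. Two assumptions are made here that the text does not spell out: the last window is taken to end at 1500 ms (so it starts at 1300 ms), and the empirical logit is taken in its common form, with 0.5 added to both counts so that proportions of 0 or 1 stay finite. The function names are illustrative.

```python
import math

def sliding_windows(start=200, end=1500, width=200, step=100):
    """Return (onset, offset) pairs in ms; assumes the last window ends at `end`."""
    return [(t, t + width) for t in range(start, end - width + 1, step)]

def empirical_logit(k, n):
    """Empirical logit for k fixation samples on a word out of n samples.

    Common definition (an assumption here): log((k + 0.5) / (n - k + 0.5)).
    """
    return math.log((k + 0.5) / (n - k + 0.5))

windows = sliding_windows()
print(len(windows), windows[0], windows[-1])

# With 1 kHz sampling, a 200 ms window holds 200 samples per trial.
print(empirical_logit(100, 200))  # equal looks to and away from the word
print(empirical_logit(0, 200))    # finite even when the word is never fixated
```

Because adjacent windows overlap by 100 ms, tests on neighbouring windows are not independent, which is worth keeping in mind when reading effects that appear in only one window.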

Test phase

Reaction time data. Figure 2A displays the mean RTs of all three groups for the reduced /b/-words (visual /b/-targets) and the vowel-deleted words (visual CVC-targets) in the test phase of Experiment 1. In the segmental reduction condition (/b/-targets), all three groups responded about equally fast and no significant differences between the groups emerged (b_{Segmental reduction group} = −17.9, SE = 87.5, t = −0.2, p = 0.84; b_{Syllabic reduction group} = 117.3, SE = 87.4, t = 1.3, p = 0.21). In the syllabic reduction condition (CVC-targets), there was also no main effect of Group (b_{Segmental reduction group} = −32.7, SE = 98.9, t = −0.3, p = 0.77; b_{Syllabic reduction group} = 1.7, SE = 97.4, t = 0.02, p = 0.98). That is, neither of the experimental groups responded faster than the control group to the reduced words. We thus did not observe any adaptation effects in the RT data.

Figure 2. Mean RTs and SEs in the test phases of Experiment 1 (A), Experiment 2 (B), and Experiment 3 (D) and in the exposure phase of Experiment 3 (C). In the test phases, the /b/-words and CVC-words were reduced for all groups. In the exposure phases, the /b/-words were reduced only for the /b/-reduction group (segmental reduction group) and the CVC-words were reduced only for the V-deletion group (syllabic reduction group).

Accuracy data. The accuracy data in the test phase of Experiment 1 are displayed in Figure 3A in terms of percentage of correct click responses and SEs. In the segmental reduction condition (visual /b/-targets), the main effect of Group was significant. Both the segmental reduction group (b_{Segmental reduction group} = 3.4, SE = 0.7, p < 0.001) and the syllabic reduction group (b_{Syllabic reduction group} = 2.3, SE = 0.5, p < 0.001) gave more correct responses to /b/-targets than the control group. We thus observed an adaptation effect for both experimental groups in the accuracy data for the segmental reductions.
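
Because the accuracy model is a logistic regression, its weights are on the log-odds scale. As a worked illustration (arithmetic on the rounded weights reported above, not a result reported in the study), exponentiating a weight gives the multiplicative change in the odds of a correct click relative to the control group. The function name is hypothetical.

```python
import math


def odds_ratio(log_odds_weight: float) -> float:
    """Convert a logistic regression weight (log-odds) into an odds ratio."""
    return math.exp(log_odds_weight)


# Rounded weights for the /b/-targets as reported in the text.
reported = {
    "segmental reduction group": 3.4,
    "syllabic reduction group": 2.3,
}

for label, b in reported.items():
    print(f"{label}: odds ratio of about {odds_ratio(b):.0f}")
```

On these rounded values, the odds of a correct response are roughly 30 times (segmental reduction group) and roughly 10 times (syllabic reduction group) those of the control group. Because the inputs are rounded, these figures are only approximate.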

Figure 3. Accuracy in % correct click responses and SEs for the reduced /b/-words (visual /b/-targets) and the vowel-deleted words (visual CVC-targets) in the test phases of Experiment 1 (A), Experiment 2 (B), and Experiment 3 (C).

For the syllabic reductions (visual CVC-targets), the main effect of Group was not significant (b_{Segmental reduction group} = 0.2, SE = 0.3, p = 0.52; b_{Syllabic reduction group} = 0.3, SE = 0.3, p = 0.26). That is, neither of the experimental groups differed from the control group. We thus did not observe a significant adaptation effect for either group.

Eye movement data. The eye movement patterns for the segmental reduction condition (visual /b/-targets) of the two experimental groups compared to the no-reduction control group are displayed in Figures 4A,B. Early on, in a descriptive time window from 200 to 500 ms after target onset, the control group (represented by black lines) looks at the competitors (dashed lines) more often when hearing a reduced /b/-word than do the segmental reduction group (in red, Figure 4A) and the syllabic reduction group (in green, Figure 4B). From around 500 ms onwards, all three groups show a similar preference for the /b/-targets (solid lines).

Figure 4. Proportion of fixations in the segmental reduction condition [reduced /b/-words in auditory input, visual /b/-targets on screen; (A,B)] and in the syllabic reduction condition [vowel-deleted words in auditory input, visual CVC-targets on screen; (C,D)] in the test phase of Experiment 1.

Statistical analyses considered time windows of 200 ms length, which started at 200 ms after target onset and were then shifted in steps of 100 ms (i.e., the following time windows were analyzed: 200–400, 300–500, 400–600, …, 1300–1500 ms). For this and both subsequent experiments, only time windows showing significant effects are reported. If several consecutive 200 ms time windows were significant (e.g., the time windows 200–400 and 300–500 ms), the values reported are those for the accumulated time window.
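
The windowing scheme just described can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the analysis code used in the study: the function names are hypothetical, and two windows are treated as consecutive when their onsets differ by exactly one step.

```python
def sliding_windows(start=200, end=1500, width=200, step=100):
    """Return (onset, offset) pairs in ms for the overlapping analysis windows."""
    windows = []
    onset = start
    while onset + width <= end:
        windows.append((onset, onset + width))
        onset += step
    return windows


def accumulate(significant, step=100):
    """Merge runs of consecutive significant windows into accumulated windows."""
    merged = []  # each entry: [first onset, last offset]
    previous_onset = None
    for onset, offset in sorted(significant):
        if previous_onset is not None and onset - previous_onset == step:
            merged[-1][1] = offset  # extend the current run
        else:
            merged.append([onset, offset])  # start a new run
        previous_onset = onset
    return [tuple(window) for window in merged]


windows = sliding_windows()
print(len(windows), windows[0], windows[-1])

# Invented example: two consecutive windows flagged as significant.
print(accumulate([(1100, 1300), (1200, 1400)]))
```

Under this scheme, two adjacent significant windows such as 1100–1300 and 1200–1400 ms would be reported as a single accumulated window from 1100 to 1400 ms.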

The difference in target-competitor preference between the segmental reduction group and the control group did not reach significance. The main effect of Group, however, was marginally significant for the syllabic reduction group in the time window from 300 to 500 ms after target onset (b_{Syllabic reduction group} = 0.6, SE = 0.3, t = 1.9, p = 0.06). That is, we observed a weak adaptation effect for the syllabic reduction group in the segmental reduction condition, hence a weak generalization of learning across reduction types.

Figures 4C,D display the corresponding eye movement data for the syllabic reduction condition (visual CVC-targets). In the first 900 ms after target onset, all three groups show a very similar pattern for the vowel-deleted words. Only later do the two experimental groups show a descriptively greater preference for the CVC-targets than the control group.

Statistical analyses did not reveal a significant difference between the control group and the segmental reduction group, but revealed that the main effect of Group was significant in the time window from 1100 to 1400 ms for the syllabic reduction group (b_{Syllabic reduction group} = 0.9, SE = 0.4, t = 2.2, p < 0.05). In this time window, the syllabic reduction group had a greater target-competitor preference for the CVC-words than the control group. For the syllabic reduction group, we thus found an adaptation effect.

Discussion

In Experiment 1, we found adaptation effects for both the segmental and the syllabic reductions. Learning about segmental reductions was evident in the accuracy data but not in the eye-tracking data. For the syllabic reductions, this pattern was reversed: A learning effect was found in the eye-tracking data but not in the accuracy data. Moreover, there was also evidence of generalization of learning across reduction types. Generalization across reduction types, however, was only found in one direction: learning about vowel deletions generalized to /b/-reductions, as shown by the accuracy data and the eye movement data for the segmental reductions. In contrast, learning about /b/-reductions did not generalize. That is, the segmental reduction group could not apply their experience with reductions to the vowel-deleted words.

The learning effects found in Experiment 1 seem somewhat weak. An explanation for this may be that the potentially reduced words in the exposure phase were not highly predictable. Participants did not see the potentially reduced words on the computer screen during the exposure phase, and these words appeared early in the sentences, which were in fact designed to predict the targets (e.g., in Pas in een [b]/[m]inderij wordt een boek of tijdschrift afgemaakt, the target tijdschrift is predictable and the potentially reduced word [b]/[m]inderij is not). Participants may therefore not have been able to predict potentially reduced words. Having information about the upcoming reduced words in advance could, however, facilitate learning. Jesse and McQueen (2011) found that adaptation to ambiguous fricatives did not take place if those fricatives occurred at the onset of a word presented in isolation. They concluded that lexical information likely has to be available when the ambiguous sound is initially being processed. The present study investigates adaptation to another form of deviation, which also occurs at the beginning of the words. Predictable sentence contexts may provide sufficient cues about the upcoming words so that adaptation may be possible. Experiment 2 was run to test this hypothesis.

Experiment 2

Experiment 2 tested whether providing additional information about the reduced words in the exposure phase might strengthen the learning effects found in Experiment 1. To this end, we changed the exposure sentences for the experimental words, leaving the filler sentences for the /m/-words and the consonant-cluster words intact. The sentence contexts now predicted the potentially reduced words. To avoid the orthographic versions of the reduced words appearing twice on the screen, the clicking task was not used in the exposure phase. Instead, participants simply listened to the exposure sentences and were asked to answer questions about the content of some of the filler sentences (those containing /m/- or CC-words).

The test phase was kept the same as in Experiment 1, apart from minor changes in three sentences (see Methods section). Further purposes of Experiment 2 were to replicate the generalization effect from vowel-deleted words to reduced /b/-words found in Experiment 1 and to test whether, with predictable sentences, a generalization effect in the other direction (from reduced /b/-words to vowel deletions) might occur.