An analysis of post-vocalic /s-ʃ/ neutralization in Augsburg German: evidence for a gradient sound change

Bukmaier, Véronique; Harrington, Jonathan; Kleber, Felicitas

doi:10.3389/fpsyg.2014.00828

ORIGINAL RESEARCH article

Front. Psychol., 31 July 2014

Sec. Psychology of Language

Volume 5 - 2014 | https://doi.org/10.3389/fpsyg.2014.00828

Véronique Bukmaier ^*
Jonathan Harrington
Felicitas Kleber

Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität München Munich, Germany

Abstract

The study is concerned with a sound change in progress by which a post-vocalic, pre-consonantal /s-ʃ/ contrast in the standard variety of German (SG) in words such as west/wäscht (/vɛst/~/vɛʃt/, west/washes) is influencing the Augsburg German (AG) variety in which they have been hitherto neutralized as /vɛʃt/. Two of the main issues to be considered are whether the change is necessarily categorical, and the extent to which the change affects both speech production and perception equally. For the production experiment, younger and older AG and SG speakers merged syllables of hypothetical town names to create a blend at the potential neutralization site. These results showed a trend for a progressively greater /s-ʃ/ differentiation in the order older AG, younger AG, and SG speakers. For the perception experiment, forced-choice responses were obtained from the same subjects who had participated in the production experiment to a 16-step /s-ʃ/ continuum that was embedded into two contexts: /mIst-mIʃt/ in which /s-ʃ/ are neutralized in AG and /və'mIsə/-/və'mIʃə/ in which they are not. The results from both experiments are indicative of a sound change in progress such that the neutralization is being undone under the influence of SG, but in such a way that there is a gradual shift between categories. The closer approximation of the groups on perception suggests that the sound change may be more advanced in this modality than in production. Overall, the findings are consistent with the idea that phonological contrasts are experience-based, i.e., a continuous function of the extent to which a subject is exposed to, and makes use of, the distinction and are thus compatible with exemplar models of speech.

Introduction

The present study forms part of a series of investigations (e.g., Kleber, 2011; Müller et al., 2011; Harrington et al., 2012) into dialect leveling in High German varieties under the influence of Standard German (SG). Our particular concern is not just with phonological categorical changes in the direction of SG but more specifically with how such categorical changes are related to the continuously gradient variation in speech production and perception across generations of speakers. The present investigation deals with the association between the post-vocalic /s-ʃ/ contrast before /t/ in SG (e.g., West/wäscht; /vɛst/~/vɛʃt/, engl. west/washes) and the Augsburg variety of German (AG) in which, at least for older, but possibly not for younger speakers, the distinction is collapsed such that these minimal pairs are neutralized as a post-alveolar fricative (i.e., /vɛʃt/ for both West and wäscht). By Augsburg variety we mean a regional variety of Standard German, which is mainly influenced by the Swabian dialect.

In Standard German, the contemporary /s-ʃ/-contrast emerged as a consequence of various sound changes. Old High German (OHG) did not distinguish between those two places of articulation for fricatives, but only had alveolar sibilants, which were realized either voiceless (fortis, /s/) or voiced (lenis, /z/). The OHG /z/ later changed into the contemporary Standard German /ʃ/ (Renn and König, 2009). In addition, /s/ shifted to /ʃ/ in some /s+consonant/-clusters (/sC/ hereafter) from Middle High German (MHG) to SG. The shift from MHG /s/ to SG /ʃ/ took place only in syllable initial clusters (e.g., MHG slagen /slagən/ > SG schlagen /ʃlagən/, to beat), while in Southern German varieties this change also occurred in post-vocalic clusters (e.g., fast, engl. almost, which is /fast/ in SG but /faʃt/ in the south-west German variety of Swabian). However, while Bavarian (spoken in south-east Germany) nowadays contrasts /s/ and /ʃ/ before consonants just like SG, Swabian retains the pronunciation of /sC/-clusters as /ʃC/—not just in the deep dialect but also in the Swabian-colored, regional variety of Standard German. Thus, the Standard German phonemic contrast between post-vocalic, pre-consonantal /s/ and /ʃ/ is neutralized in favor of the post-alveolar pronunciation in Swabian, i.e., the minimal pair West (/vɛst/, west) and wäscht (/vɛʃt/, washes) are homophones when produced by a Swabian speaker. Nonetheless, in the Swabian variety the contrast between /s/ and /ʃ/ is maintained in intervocalic position (e.g., Tasse /tasə/, cup—Tasche /taʃə/, bag).

The data for the present study is taken from Augsburg—a city in Bavaria around 80 km north-west from Munich. Augsburg is situated in a transitional zone between the Bavarian and Swabian dialect areas and as a consequence, this variety has both Bavarian as well as Swabian dialect features (Nübling, 1988). In an investigation that forms the background to the present study, Bukmaier (2010) carried out an auditory analysis to determine whether the Augsburg variety should be classified as a Swabian or a Bavarian dialect based on the proportion of Bavarian and Swabian dialect features in Augsburg speakers' productions; in order to do so, she investigated the usage of dialectal features by younger (aged 20–30 years) and older (aged 40–70 years) Augsburg speakers. Her analysis showed that AG was predominantly Swabian but that there was nevertheless a tendency for younger speakers to make greater usage of SG features. It is this latter finding that is the primary motivation for the present study that focuses on the neutralization of pre-consonantal, post-vocalic /s-ʃ/ in Augsburg German.

The phonological process of neutralization is traditionally conceived as involving a categorical change from one category to another. Nevertheless, acoustic analyses have repeatedly shown that neutralization is incomplete (Port and O'Dell, 1985; Kleber et al., 2010). Similarly, the outcome of historical sound changes is usually categorical, although there is increasing evidence that a diachronic change comes about through a gradual change from one category to another across generations (e.g., Harrington et al., 2012). Since Labov's (1963) pioneering work in sociolinguistics, so-called sound changes in progress are inferred by comparing phonetic differences across two generations of the same speech community and most often within sounds that differ in continuous acoustic parameters (as the many studies on vocalic change show, e.g., Hawkins and Midgley, 2005) since the gradual changes are perceptible and thus more obvious. There are, however, categorical sound changes such as metathesis that are typically considered to involve no such gradual change. Similarly, the auditory analysis of the data in Bukmaier (2010) points to a categorical change amongst younger speakers from AG /ʃ/ in clusters toward SG /s/.

On the other hand, research on assimilatory processes, in particular in /s#ʃ/ or /ʃ#s/ across word boundaries, has shown that sibilants vary gradually between the two places of articulation depending on the degree of assimilation (Niebuhr et al., 2008; Pouplier et al., 2011), although these fine phonetic differences may not be perceptible (Niebuhr and Meunier, 2011). Similarly, physiological studies of speech errors present evidence for gradual shifts between categories that may be perceived as clear instances of one category and may even result in auditory transcription errors (e.g., Pouplier and Hardcastle, 2005; Goldstein et al., 2007). In the light of this synchronic evidence, it seems quite possible that even these supposedly categorical diachronic changes may in fact be continuous. Thus, one of the main issues we address in this paper is whether the unmerging of /ʃt/ toward /st/ or /ʃt/ is a categorical or continuous process. A categorical change might occur lexically such that there is a discrete change for younger but not older AG speakers from /ʃt/ to /st/ in words such as West (SG /vɛst/). In a continuous change, speakers might gradually shift their production in such words between post-alveolar and alveolar productions with a greater shift toward /s/ in younger speakers.

Another major concern in this paper is whether the change affects the modalities of speech perception and production in equal measure. The arguments for parity between speech production and perception have been made across different kinds of models including at the level of gestures (e.g., Fowler et al., 2003) and also in terms of exemplar theory (Pierrehumbert, 2002) in which speech production draws upon the same sets of exemplars that have been stored in the acoustic/auditory space of the listener's mental lexicon as a result of speech perception. With respect to some sound changes, such parity can be observed within but not between generations. An example for such a sound change in progress in which there is parity between the two modalities within a generation is the age-graded neutralization of the voicing contrast of intervocalic consonants toward the lenis variant of East Franconian speakers (Müller et al., 2011). Older East Franconians neutralize the voicing contrast of Standard German plosives in perception as well as in production, while younger East Franconians neutralize this contrast equally in production as well as in perception to a lesser extent. Nevertheless, younger East Franconians do not yet maintain the voicing contrast to the same extent as Standard German speakers. The exemplar theory not only accounts for this parity but also for the shift toward the Standard German contrast¹: the more a speaker is exposed to Standard German, the more standard forms (with all the fine phonetic detail inherent to them) are added to the edge of an exemplar cloud (i.e., the density distribution of a set of exemplars across the acoustic/auditory space that constitute a phonological category) which eventually shifts in the acoustic/auditory space and then in turn causes the speakers to select more standard-like variants from the cloud for production. 
On the assumption that the contact with the standard variety increases with each generation of German dialect speakers, we therefore predict with respect to the present study that younger Augsburg speakers produce sibilants before /t/ in a more standard-like way than do older speakers.

At a particular point in time during the period of change, on the other hand, sound change may also present an exceptional case in which the two modalities are out of alignment with each other (Kleber et al., 2012). According to Ohala (1981, 1993), sound change is initiated by listeners' misperceptions of speakers' production. Given the vast amount of synchronic variation in speech signals (Hawkins, 2003), misperceptions may occur under certain conditions, although these misperceptions only rarely turn into a diachronic change. A similar line of argument is found in Browman and Goldstein (1991) who present evidence for articulatory gestures that overlap to such an extent that only one gesture is decoded correctly by the listener. These forms of overlap cause at first perceptual synchronic elision, which can under certain conditions result in diachronic elision. In both models it is the mismatch between production and perception that leads to sound changes on the listener's side. Applied to the present data, AG subjects might initially unmerge /ʃt/ as /st, ʃt/ in perception with production showing a greater degree of neutralization (cf. also Labov et al., 1991).

Sound changes triggered by misperceptions of or undercompensating for synchronic variation (Harrington et al., 2008; Kleber et al., 2012) are thus driven by internal or phonetic factors. External or sociolinguistic factors such as social status or the prestige of a dialect (Kerswill, 2003; Labov, 2007) may, however, also play a role in diachronic changes, in particular those that are due to dialect leveling, which refers to the reduction of dialectal forms, as for example the increasing monophthongization of regional /Iə/ as /e:/ in British English with the latter having a wider geographical distribution (Kerswill, 2003). The question arises whether sound changes that are triggered by sociolinguistic factors occur passively as a result of accommodation (e.g., Trudgill, 2004) or whether the speaker takes up a more active part. The model of sound change described in Lindblom et al. (1995) emphasizes the role of the speaker to a greater extent than the above-mentioned models, as it is the speakers who adapt to listeners' needs when producing speech along a continuum from hypo- to hyper-articulated speech. Sound changes may then evolve when listeners' attention is in such circumstances exceptionally directed to a word's form (i.e., its pronunciation) instead of its meaning. Perhaps speakers of regional varieties have a propensity to evaluate the word's form when they are in contact with speakers from other varieties.

The aim of the present study was to investigate whether or not Augsburg speakers completely neutralize the /s-ʃ/-contrast in the production and perception of /sC/-clusters and whether the degree of neutralization is age-related in this variety, with younger Augsburg speakers tending to a more standard-like pronunciation. The analysis in this paper draws upon the classic technique of an apparent time investigation in which sound change is inferred by comparing phonetic differences across two generations. However, in contrast to almost all sociolinguistic investigations, the present study is based both on production and on the same speakers' responses to perceptual stimuli (see also Harrington et al., 2012, 2013; Kleber et al., 2012). The hypotheses for the two experiments can be formulated as follows:

H1: Augsburg speakers differentiate the /s-ʃ/-contrast in /st/-clusters to a lesser extent in production than Standard German speakers.

H2: Older Augsburg speakers show a greater tendency toward neutralization of the /s-ʃ/-contrast in the production of /st/-clusters than younger Augsburg speakers.

H3: Augsburg listeners differentiate the /s-ʃ/-contrast in /st/-clusters to a lesser extent in perception than Standard German speakers.

H4: Older Augsburg listeners show a greater tendency toward neutralization of the /s-ʃ/-contrast in the perception of /st/-clusters than younger Augsburg speakers.

Production experiment

Methods

Participants

The production experiment was conducted with three different subject groups: older Augsburg speakers, younger Augsburg speakers and Standard speakers. The first group (the experimental group) contained 26 speakers of Swabian from the city of Augsburg. Eleven of these subjects were aged between 40 and 70 years (3 male and 8 female) and assigned to the older age group. Fifteen participants were aged between 20 and 30 years (8 male and 7 female) and assigned to the younger age group. All participants were born and/or have spent most of their lives in Augsburg. At the time of participation in this experiment all Augsburg subjects were living in Augsburg.

The second group served as a control group and included 16 Standard German-speaking subjects (2 male and 14 female) aged between 20 and 30 years. The participants in this group were all either from Northern Germany or from Munich². None of the 42 subjects reported any hearing, eye-sight, or reading problems.

Prior to the experiment, the Augsburg participants were asked to fill out a questionnaire about their education, the length of time that they had been living in Augsburg, and a self-assessment of how much and how often they speak dialect. The AG participants were chosen according to the time they had been living in Augsburg: all the young AG subjects had lived in Augsburg all of their lives, and the older AG participants had lived in Augsburg for most of their lives (30 years or more).

The subjects of the older and the younger experimental group were tested in a quiet room at their homes. The subjects of the control group were tested in a quiet room at the university. It is possible that the difference of whether the speakers were recorded at home or not could have had an influence on the results such that those recorded at home hypoarticulated more than those in the laboratory due to the slightly more informal recording setting at home. However, we found no evidence for this from our auditory impressions of the data.

Materials

In order to elicit productions of /st/-clusters, we designed a blending task (see also Kleber et al., 2010) in which the subjects had to combine the first syllable of one nonword with the second syllable of another nonword (see Table 1) in order to produce a real German word, e.g., the speaker's task was to produce the blend Kiste (/kIstə/, box) from the two nonwords Kissingen and Würte.

Table 1

Word 1	Word 2	Blend
Küssingen (/kYsIŋən/)	Wirte (/vIrtə/)	Küste (/kYstə/, coast)
Kissingen (/kIsIŋən/)	Würte (/vYrtə/)	Kiste (/kIstə/, box)
Lüssingen (/lYsIŋən/)	Kirte (/kIrtə/)	Lüste (/lYstə/, pl. desire)
Lissingen (/lIsIŋən/)	Kürte (/kYrtə/)	Liste (/lIstə/, list)
Schussingen (/ʃʊsIŋən/)	Kirter (/kIrtɐ/)	Schuster (/ʃuːstɐ/, cobbler)
Schwessingen (/ʃvɛsIŋən/)	Kürter (/kYrtɐ/)	Schwester (/ʃvɛstɐ/, sister)

Nonwords and resulting blends.

The syllables that were blended are underlined.

With the exception of /uː/ in Schuster, the vowels /I/, /ɛ/, and /Y/ in the initial syllables of the resulting blends were always phonologically short, which was triggered by a word medial orthographic double consonant in the first word, e.g., <ss> in Lüssingen (this orthographic representation corresponds to the Standard German norm indicating phonemic short vowels). While the onset consonant varied, the coda consonant of the first syllable was always /s/. The final syllable of the second word was either /tə/ or /tɐ/ (see Table 1). The 16 filler words were disyllabic German words which did not contain any sibilants and which varied in the vowel as well as in the coda consonant of the first syllable (while the second syllable was always −te /tə/), e.g., Wirte, Worte, Bunte, Kalte.

In addition to the cluster blends, we obtained prototypical /s/ and /ʃ/ in intervocalic or post-vocalic position, i.e., in a non-neutralizing context in both varieties. For this purpose, subjects read aloud the following four German real words: Biss (/bIs/, bite), wisse (/vIsə/, to know), Busch (/bʊʃ/, bush), and Tusche (/tʊʃə/, India ink). In order to minimize any coarticulatory effects, /s/ and /ʃ/ were combined with /I/ and /ʊ/, respectively.

Experimental set-up, digitization, labeling

The recordings were made with the SpeechRecorder software (version 2.6.14; see Draxler and Jänsch, 2004), an audio interface (M-Audio Fast Track) and a stereo headset (Beyerdynamic). Each of the six target blends and each of the eleven distractor blends was repeated ten times, and the items were presented in randomized order on a MacBook Pro computer screen (170 tokens in total). Following the blending task, but within the same session and experimental set-up, the subjects were presented with three repetitions of each of the German real words (12 tokens in total). In both tasks, the subjects had to produce each word within a time slot of 1 s, which was then followed by an automatic pause of 0.8 ms before the next item was presented. In total, each subject produced 182 words.

The words were digitized at 44.1 kHz. All of the data were segmented and labeled automatically into phonetic segments using the Munich Automatic Segmentation System (MAuS, Schiel, 2004); manual readjustments were made subsequently whenever necessary to the target word in PRAAT (Boersma and Weenink, 2012). All words that were mispronounced were excluded from the analysis. For the present study a total of 2996 words were analyzed, including 2494 /st/-clusters, 252 prototypical /s/ and 250 prototypical /ʃ/ (cf. Table 2).

Table 2

	Older AG	Younger AG	SG
/s/	66	90	96
/ʃ/	65	89	96
/st/	655	883	956

Distribution of the 2996 /s/-/ʃ/-/st/-sequences by age group.
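
The token counts in Table 2 can be cross-checked against the recording design with a few lines of arithmetic. The following is only a sketch: the per-speaker expectations (6 blends × 10 repetitions, and 2 real words per sibilant × 3 repetitions) are derived from the description of the task, and the interpretation of the shortfall as excluded mispronunciations is an assumption rather than something the table itself states.

```python
# Speakers per group, as reported in the Participants section.
speakers = {"Older AG": 11, "Younger AG": 15, "SG": 16}

# Tokens expected per speaker under the design (derived, not reported):
# 6 target blends x 10 repetitions; 2 real words per sibilant x 3 repetitions.
expected_per_speaker = {"/s/": 6, "/ʃ/": 6, "/st/": 60}

# Analysed tokens, copied from Table 2.
analysed = {
    "/s/": {"Older AG": 66, "Younger AG": 90, "SG": 96},
    "/ʃ/": {"Older AG": 65, "Younger AG": 89, "SG": 96},
    "/st/": {"Older AG": 655, "Younger AG": 883, "SG": 956},
}

# Row totals and grand total of the analysed tokens.
row_totals = {cat: sum(by_group.values()) for cat, by_group in analysed.items()}
grand_total = sum(row_totals.values())

# Shortfall relative to the full design (presumably excluded tokens).
missing = {
    cat: {g: expected_per_speaker[cat] * speakers[g] - n
          for g, n in by_group.items()}
    for cat, by_group in analysed.items()
}

print(row_totals)
print(grand_total)
print(missing)
```

The row totals reproduce the figures given in the text (252, 250, and 2494, summing to 2996), and the shortfall relative to the full design is small in every group.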

Acoustic parameterization

Spectra were extracted at the temporal midpoint between each fricative's acoustic onset and offset after applying a 256 point discrete Fourier transform with a 40 Hz frequency resolution, 5 ms Blackman window, and a frame shift of 5 ms to the target words using the Emu Speech Database system (Harrington, 2010).
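
As a rough illustration of this step, the sketch below extracts a dB spectrum from a single Blackman-windowed frame centred on the fricative's temporal midpoint. It is not the Emu implementation: the function names are hypothetical, the frame is assumed to be zero-padded to 256 points, and a naive DFT is used so that the example needs only the standard library.

```python
import cmath
import math


def blackman(length):
    """Blackman window of the given length (classic 0.42/0.5/0.08 form)."""
    return [
        0.42
        - 0.5 * math.cos(2 * math.pi * n / (length - 1))
        + 0.08 * math.cos(4 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def midpoint_spectrum(samples, sample_rate, onset_s, offset_s,
                      window_s=0.005, n_fft=256):
    """dB spectrum of one windowed frame centred between onset and offset."""
    mid = int(round((onset_s + offset_s) / 2 * sample_rate))
    length = int(window_s * sample_rate)
    if length > n_fft:
        raise ValueError("window is longer than the DFT size")
    start = mid - length // 2
    if start < 0 or start + length > len(samples):
        raise ValueError("frame falls outside the signal")
    frame = samples[start:start + length]
    windowed = [s * w for s, w in zip(frame, blackman(length))]
    # Assumption: the frame is zero-padded up to the DFT size.
    padded = windowed + [0.0] * (n_fft - length)
    spectrum = []
    for k in range(n_fft // 2 + 1):  # bins from 0 Hz up to Nyquist
        value = sum(
            x * cmath.exp(-2j * math.pi * k * n / n_fft)
            for n, x in enumerate(padded)
        )
        # Small constant avoids log10(0) for empty bins.
        spectrum.append(20 * math.log10(abs(value) + 1e-12))
    return spectrum
```

In practice a library FFT would replace the inner loop; the naive form is kept here only to make each step explicit.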

The subsequent parameterization of these data involved the data reduction of each spectrum (at the sibilant's acoustic temporal midpoint in all cases) to a set of mel-scaled coefficients using the discrete cosine transformation. More specifically, for an N-point mel-scaled spectrum, x(n), extending in frequency from n = 0 to N − 1 points over the frequency range of 500–3500 Hz, the mth DCT-coefficient C_m (m = 0, 1, 2) was calculated with the formula in (1):

$$C_m = \frac{2k_m}{N}\sum_{n=0}^{N-1} x(n)\cos\left(\frac{(2n+1)\,m\,\pi}{2N}\right) \qquad (1)$$

where k_m = 1/√2 for m = 0 and k_m = 1 otherwise.

These three coefficients C_m (m = 0, 1, 2) encode the mean, the slope, and curvature respectively of the signal (in this case of a given sibilant's mel-scaled spectrum extracted at its temporal midpoint) to which the DCT transformation was applied (Harrington, 2010). Since C₀, which is proportional to the dB-mean across the entire spectrum, is largely irrelevant for the /s-ʃ/-distinction, only C₁ and C₂ (the spectral slope and curvature) were used for further quantification.
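
The relation between the first three DCT coefficients and the mean, slope, and curvature of a spectrum can be sketched in a few lines. The function below is an illustrative assumption, not the Emu routine: it uses the normalisation commonly given with this method (a scaling factor 2k_m/N, with k_m = 1/√2 for m = 0 and 1 otherwise), and the function name and test spectra are hypothetical.

```python
import math


def dct_coefficients(spectrum, n_coeffs=3):
    """Return C_0 .. C_{n_coeffs-1} for a spectrum given as a list of dB values."""
    n_points = len(spectrum)
    coeffs = []
    for m in range(n_coeffs):
        # Assumed normalisation: k_m = 1/sqrt(2) for m = 0, otherwise 1.
        k_m = 1 / math.sqrt(2) if m == 0 else 1.0
        total = sum(
            x * math.cos((2 * n + 1) * m * math.pi / (2 * n_points))
            for n, x in enumerate(spectrum)
        )
        coeffs.append(2 * k_m / n_points * total)
    return coeffs


# Hypothetical spectra: a flat one and one that falls linearly with frequency.
flat = [10.0] * 32
falling = [float(31 - n) for n in range(32)]

print(dct_coefficients(flat))     # only C_0 differs from zero
print(dct_coefficients(falling))  # C_1 is positive, C_2 stays near zero
```

A flat spectrum loads only on C₀, while a linearly falling spectrum adds a positive C₁ and leaves C₂ at approximately zero, which is why C₁ and C₂ can be read as slope and curvature.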

We quantified the degree of neutralization of the /s-ʃ/-distinction by calculating the Euclidean distances, E_s and E_ʃ, in the C₁ × C₂ space separately for each sibilant in the database to the Standard German speakers' /s/-centroid and to the Standard German speakers' /ʃ/-centroid, respectively. These two centroids are the positions in the C₁ × C₂ space averaged across all Standard German speakers' /s/-tokens and all Standard German speakers' /ʃ/-tokens respectively that occurred in the words from the reading condition. We then calculated for each sibilant its log-Euclidean distance ratio d_sib, from (2):

$$d_{sib} = \log\left(\frac{E_s}{E_{ʃ}}\right) = \log(E_s) - \log(E_{ʃ}) \qquad (2)$$

Thus, there is one d_sib value per sibilant which is a relative measure: greater positive values denote a closer distance of a given sibilant to the /ʃ/-centroid; greater negative values are associated with distances closer to the /s/-centroid; and a value of zero on d_sib denotes that a given sibilant is equidistant in the C₁ × C₂ space between the /s/ and /ʃ/-centroids (e.g., Harrington et al., 2008; Kleber et al., 2012, for a similar methodology).
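
The sign convention of d_sib can be made concrete with a small sketch. It assumes that d_sib is the logarithm of the ratio E_s/E_ʃ, which is the form consistent with the description above (positive values closer to /ʃ/, negative values closer to /s/, zero when equidistant). The centroid coordinates and tokens are hypothetical numbers chosen for illustration, not values from the study.

```python
import math


def centroid(points):
    """Mean position of a list of (C1, C2) points."""
    c1, c2 = zip(*points)
    return (sum(c1) / len(c1), sum(c2) / len(c2))


def d_sib(token, s_centroid, sh_centroid):
    """Log-Euclidean distance ratio of one (C1, C2) token.

    Undefined for a token lying exactly on either centroid (distance 0).
    """
    e_s = math.dist(token, s_centroid)    # distance to the /s/ centroid
    e_sh = math.dist(token, sh_centroid)  # distance to the /ʃ/ centroid
    return math.log(e_s / e_sh)


# Hypothetical reference tokens for the two centroids.
s_ref = centroid([(-1.0, 0.0), (1.0, 0.0)])   # -> (0.0, 0.0)
sh_ref = centroid([(3.0, 0.0), (5.0, 0.0)])   # -> (4.0, 0.0)

print(d_sib((1.0, 0.0), s_ref, sh_ref))  # negative: closer to /s/
print(d_sib((3.0, 0.0), s_ref, sh_ref))  # positive: closer to /ʃ/
print(d_sib((2.0, 5.0), s_ref, sh_ref))  # zero: equidistant
```

Because the measure is a ratio, it is relative to both reference categories at once: a token far from both centroids but equally far from each still receives a value of zero.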

Results

Figure 1 shows for each speaker group the log-Euclidean distance ratio, d_sib, for their singleton and cluster sibilants to the /s/ and /ʃ/-centroids. Negative/positive values are productions of a given sibilant closer to the /s/ and /ʃ/-centroids respectively. As Figure 1 shows, all speaker groups produced cluster sibilants as more /s/-like, although those of older and younger AG speakers tended to be closer to the /ʃ/-centroid than those of the SG speakers: this is evident in the medians (the dots in Figure 1) which are higher (closer to zero) in /st/ for AG than for SG speakers.

Figure 1

Figure 2 shows separately for each speaker group and vowel context (/ε I Y ʊ u:/) d_sib for the sibilants in /st/-clusters to the /s/ and /ʃ/-centroids. In these data, older AG speakers have values closest to zero: this shows that their productions were slightly more /ʃ/-like than for the other two groups. At the same time, the SG speakers always had the lowest median values such that their /st/ was closest to /s/ compared with the AG speakers. Figure 2 also shows that the younger AG speakers' medians were between those of the other two groups. A mixed model with d_sib (the data in Figure 2) as the dependent variable and with vowel context (/ε I Y ʊ u:/) and speaker group coded for increasing order (three ordered levels: older Augsburg > younger Augsburg > Standard) and with speaker as the random factor showed a significant effect for vowel [χ²₍₁₎ = 30.4, p < 0.001], a significant effect for group [χ²₍₁₎ = 4.7, p < 0.05], and no interaction between these factors. The significant effect for group is a confirmation of the evidence in Figure 2 that there is a trend from older AG to younger AG to SG speakers for /st/ to be progressively closer to /s/.

Figure 2