Perception of Lexical Neutral Tone Among Adults and Infants

Fan, Shanshan; Li, Aijun; Chen, Ao

doi:10.3389/fpsyg.2018.00322

ORIGINAL RESEARCH article

Front. Psychol., 23 March 2018

Sec. Psychology of Language

Volume 9 - 2018 | https://doi.org/10.3389/fpsyg.2018.00322

This article is part of the Research Topic Lexical Tone Perception in Infants and Young Children: Empirical studies and theoretical perspectives


Shanshan Fan¹,²*

Aijun Li²

Ao Chen³,⁴

¹School of Preparatory Education, Beijing Language and Culture University, Beijing, China
²Institute of Linguistics, Chinese Academy of Social Sciences, Beijing, China
³School of Communication Science, Beijing Language and Culture University, Beijing, China
⁴Utrecht Institute of Linguistics OTS, Utrecht University, Utrecht, Netherlands

Neutral tone (T0) is a special tone form in Mandarin that carries both tonal and stress information. Compared with canonical tones, T0 has a much shorter duration and a reduced pitch contour, and its tonal contour is determined by the preceding canonical tone. However, little is known about how the tonal and stress information in T0 is perceived. In the current study, we investigate (1) whether T0 is perceived as lexically unstressed by stress-language listeners; and (2) how infants learning Mandarin (a tone language) and Dutch (a stress language) perceive T0. Three experiments were conducted. In Experiment 1, Dutch adults identified T0 as unstressed when presented with disyllabic sequences ending in T0. In Experiment 2, we used the visual fixation paradigm to test 4- to 6-month-old and 10- to 12-month-old Dutch and Mandarin infants on pseudoword discrimination (/pan1san4/ [high-level + high-falling] vs. /pan1san0/ [high-level + mid-falling]); here T4 and T0 both have a falling contour. The results show that (1) after being habituated to the neutral tone sequence (/pan1san0/), Dutch infants discriminated the T1T0–T1T4 contrast; and (2) neither age group of Mandarin infants discriminated the tone contrast. Hypothesizing that the Mandarin infants' lack of discrimination might be due to the similar F0 contours, in Experiment 3 we tested Mandarin infants on a more salient contrast, /pan1san2/ (high-level + mid-rising) vs. /pan1san0/. While no overall discrimination was observed, infants habituated to /pan1san0/ demonstrated discrimination. The discrimination shown by Dutch infants in both age groups suggests that they might process the neutral–canonical tone contrast as lexical stress rather than as tonal information. Overall, the Mandarin infants' failure implies that the representation of T0 is not yet complete during the first year of life; the acquisition of tonal categories may therefore take longer than expected.

Introduction

Lexical tones are pitch variations that distinguish lexical meanings. Mandarin is the most widely studied tone language; it uses four canonical tones to distinguish word meanings: T1 (high-level; 55 in Chao tone letters), T2 (mid-rising; 35), T3 (low-dipping; 21/214), and T4 (high-falling; 51). For example, the following words differ in meaning only in their canonical tones: /ma1/ (妈, mother), /ma2/ (麻, numb), /ma3/ (马, horse), and /ma4/ (骂, to scold). Besides the four canonical tones, Mandarin has a neutral tone (T0), which never occurs independently or at the beginning of a word; it is always preceded by a canonical tone. Neutral tone can distinguish word meanings and appears in a range of lexical and syntactic contexts, including reduplication, affixation, lexeme type, directional complements, and complement particles. In the lexeme type, words are distinguished solely by the presence of neutral tone, without any other morphological or grammatical marker, as in /dong1xi1/ (东西, east and west) vs. /dong1xi0/ (东西, things) (Luo and Wang, 2002; Lin, 2012). In the present study, we focus on the lexeme type.

Neutral tone is acoustically light, with a shorter duration and a reduced pitch contour. Previous studies have described it as unstressed or weakly stressed (Chao, 1979; Zhu, 2002; Lu and Wang, 2005; Wei, 2005; Duanmu, 2007; Cao, 2008; Deng, 2010; Jia, 2011; Bao and Lin, 2014). The tonal contour of T0 is determined by the preceding canonical tone: when preceded by T1, T2, or T4, T0 has a falling contour; when preceded by T3, it is mid-level (Chao, 1979; Wu, 1992; Kong and Lü, 1998; Luo and Wang, 2002; Lin and Wang, 2013; Zhang and Li, 2016). Neutral tone also has a lower pitch register and a narrower pitch range. Pitch patterns are shown in Figure 1, where the dashed lines denote sequences ending with a neutral tone. The duration of neutral tone is about 50% of that of its corresponding canonical tone (Lin and Yan, 1980; Lin, 1983; Lee, 2003) or about 60% of that of the preceding canonical tone (Cao, 1986; Li, 2017). In summary, neutral tone contrasts lexically with canonical tones because it is unstressed and has a distinct pitch pattern; it thus possesses properties of both lexical stress and lexical tone.
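The contour rule and duration proportion described above can be restated as a small lookup. This is a minimal sketch: the function names are hypothetical, and the 0.6 ratio is only the approximate proportion relative to the preceding canonical tone reported by Cao (1986) and Li (2017), not a production model.

```python
# Hypothetical helpers restating the T0 description in the text.
CANONICAL = {1: "high-level (55)", 2: "mid-rising (35)", 3: "low-dipping (21/214)", 4: "high-falling (51)"}


def neutral_tone_contour(preceding_tone: int) -> str:
    """Return the contour shape of T0 given the preceding canonical tone."""
    if preceding_tone not in CANONICAL:
        # T0 never occurs on its own or word-initially; it must follow T1-T4.
        raise ValueError("T0 must be preceded by a canonical tone (T1-T4)")
    # After T3 the neutral tone is mid-level; after T1, T2, or T4 it falls.
    return "mid-level" if preceding_tone == 3 else "falling"


def expected_t0_duration_ms(preceding_syllable_ms: float, ratio: float = 0.6) -> float:
    """Rough T0 duration as a fraction of the preceding syllable (assumed ratio)."""
    return preceding_syllable_ms * ratio


for tone in CANONICAL:
    print(f"T{tone}T0: T0 is {neutral_tone_contour(tone)}")
```

With a 300 ms preceding syllable, for instance, `expected_t0_duration_ms(300)` gives roughly 180 ms; the ratio is exposed as a parameter because the literature reports different reference points (about 50% of the corresponding canonical tone vs. about 60% of the preceding one).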


FIGURE 1. F0 contour patterns of all possible disyllabic tonal combinations (dashed lines represent neutral tone combinations). The vertical axis is the normalized z-score; the horizontal axis is the normalized duration (Li and Gao, 2017).

The acoustic correlates of neutral tone are duration, F0, intensity, and spectral features (i.e., vowel reduction, initial consonant voicing, and spectral tilt steepening). The main acoustic correlates are F0 and duration (Lin and Yan, 1980; Lin, 1983; Cao, 1986; Yang, 1989; Wang, 2004; Chen and Xu, 2006; Li and Fan, 2015), with F0 being more important than duration (Cao, 1986; Wang, 2004; Li and Fan, 2015; Li, 2017). Spectral tilt is a reliable cue but is less important than duration (Zhong et al., 2001), whereas intensity is not reliable (Lin and Yan, 1980; Lin, 1983). The same acoustic correlates are found for lexical stress in stress languages; in Dutch, duration is the most reliable cue for lexical stress (Sluijter and van Heuven, 1995, 1996; van Heuven and de Jonge, 2011).

Previous research has revealed inconsistencies in how infants perceive lexical tones and lexical stress early in life. Some studies found evidence for a perceptual reorganization of lexical tone perception around 9 months of age. For example, before 6 months, both tone- and non-tone-language infants can discriminate lexical tones; by around 9 months, non-tone-language infants' sensitivity to lexical tones declines, whereas no such decline is observed among tone-language infants (Mattock and Burnham, 2006; Mattock et al., 2008). Other studies, however, reported different results. For instance, in Liu and Kager (2014), 5- to 18-month-old Dutch infants continued to discriminate the Mandarin T1–T4 contrast, but when the phonetic distance between T1 and T4 was reduced, they no longer demonstrated discrimination. In Chen and Kager (2016), 4-month-old Dutch infants failed to discriminate a non-salient Mandarin tonal contrast (T2–T3), yet 6- and 12-month-old infants succeeded. Infants may not be born with the ability to discriminate all native contrasts and may in particular need time to learn phonetically non-salient ones (Sundara et al., 2006; Narayan et al., 2010). For lexical tones, Shi (2010) found that Mandarin infants only gradually became able to categorize phonetically variable lexical tones after 8 months. In Tsao (2008), 12-month-old Mandarin infants discriminated the T1–T3 contrast better than the T2–T3 and T2–T4 contrasts. Taken together, early discrimination of lexical tones appears to follow a complex developmental pattern in which successful discrimination may depend on the phonetic salience of particular tonal contrasts.

In terms of lexical stress, studies supporting perceptual reorganization suggest that infants' stress perception shifts from universal discrimination to a native-language-specific pattern by 9 months of age (Sansavini et al., 1997; van Ooijen et al., 1997; Höhle et al., 2009; Skoruppa et al., 2009, 2013). For example, newborn French infants could discriminate stress-initial and stress-final words (Sansavini et al., 1997), whereas 9-month-old French infants failed to discriminate the stress contrast at a phonological level; French infants thus appear to have adapted their stress perception to their native language by 9 months. Nine-month-old Spanish infants, whose native language has contrastive lexical stress, demonstrated discrimination (Skoruppa et al., 2009). Other studies, however, suggest that discriminating contrastive lexical stress requires sufficient exposure to ambient input (Weber et al., 2004; Keij and Kager, 2013; Butler et al., 2015). For instance, 5-month-old German infants could discriminate between stress-initial and stress-final pseudowords, yet 4-month-old German infants could not (Weber et al., 2004). In summary, early attunement appears to be flexible and may be modulated by ambient language input for both lexical tone and lexical stress. For lexical tone, discrimination may additionally depend on the acoustic salience of the particular stimuli.

Besides acoustic salience, the order in which stimuli are presented may also influence discrimination. Perceptual asymmetry has been found in previous studies on the discrimination of both segments (Polka and Werker, 1994; Polka and Bohn, 1996, 2003) and suprasegmentals (Weber et al., 2004, 2005; Tsao, 2008; Chen, 2013; Segal et al., 2016). In Segal et al. (2016), Hebrew infants discriminating between initial and final lexical stress performed better when first presented with the less common initial stress pattern. German-learning infants showed a similar asymmetry for lexical stress: change detection was easier when trochees, the predominant stress pattern, were embedded among iambs than the other way around (Weber et al., 2004). For early perception of lexical tones, Mandarin infants discriminated the T1–T3 contrast better when presented with T1 first than the other way around (Tsao, 2008). The mechanism underlying such asymmetry is not fully understood, but it may be related to the statistical distribution of the input. When habituated to a pattern that is atypical in the ambient input, infants may consolidate that pattern in their representation and subsequently discriminate the frequent pattern from the infrequent one. If instead infants are habituated to the frequent pattern, they may perceive the infrequent pattern as a non-prototypical realization of the frequent one.

The statistical distribution of particular phonological features in the input influences infants' perception of those features. There is broad agreement that infants are sensitive to statistical distributions in speech input (e.g., Saffran et al., 1996; Maye et al., 2002). Infants prefer the predominant patterns of their native language, and such preferences are established with accumulating exposure (Jusczyk et al., 1993). In the current study, we compared stress-language (Dutch) and tone-language (Mandarin) infants on their discrimination of canonical and neutral tones. Because neutral tone carries both lexical stress and tonal information, it offers a feasible means of investigating early attunement to lexical tone or lexical stress as a result of ambient input. We posed the following questions: (1) whether Mandarin infants can discriminate between neutral and canonical tones, and whether such discrimination is influenced by the acoustic salience of the tones; and (2) whether Dutch listeners perceive neutral tone as tonal or as lexical stress, and whether perceptual reorganization can be observed for neutral tone. We began by testing whether tone- and stress-language-speaking adults perceived neutral tone as unstressed, which served as a baseline for the subsequent infant experiments. Next, we tested 4- to 6-month-old and 10- to 12-month-old Dutch and Mandarin infants on their discrimination of a Mandarin canonical–neutral tone contrast. If Dutch infants perceived the canonical–neutral tone contrast as lexical stress, we would expect successful discrimination at both ages; if instead they perceived it as tonal, discrimination might succeed only in the younger group. For Mandarin infants, we expected discrimination of the contrasts at both ages. However, because sequences with neutral tone occur less frequently than those involving canonical tones, Mandarin infants may need time to learn these contrasts; in that case, we would expect only the 10- to 12-month-old Mandarin infants to discriminate them.

Experiment 1: Adults’ Perceptions of Neutral Tone

To determine whether Dutch adult listeners perceive neutral tone as unstressed, we conducted a discrimination task and an identification task in Experiment 1. In the discrimination task, participants discriminated disyllabic sequences ending in a neutral tone from those ending in a canonical tone. If Dutch adult listeners perceived neutral tone as unstressed, they should discriminate the canonical–neutral tone contrast successfully. In the identification task, participants identified the position of stress in the disyllabic sequences. Because duration is the most reliable cue for lexical stress in Dutch, and neutral tone is shorter than canonical tones, we predicted that Dutch adult listeners would identify the neutral tone as unstressed. Because T0 is a category in Mandarin listeners' native phonology, we expected them to succeed in the discrimination task and to identify the neutral tone as unstressed.

The Discrimination Task

Stimuli

The pseudoword /pansan/, which is phonotactically well formed in both Mandarin and Dutch, was selected as the tone-bearing sequence. All possible tone combinations were included except T3T3, which is always produced as T2T3 because of the Mandarin tone sandhi process. In total, 19 target pseudowords were obtained: 15 disyllables ending with a canonical tone (4 × 4 − 1 = 15) and 4 disyllables ending with a neutral tone (TnT0; n = 1, 2, 3, or 4). Another 20 pairs of Mandarin real words were added as fillers; the two words in each pair shared the same segments but carried different canonical tones, such as /shi2chang2/ (时长, duration) vs. /shi4chang3/ (市场, market).
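The stimulus count can be checked by enumerating the combinations. The sketch below (the function name is hypothetical) generates the canonical-final and neutral-final forms of /pansan/, excluding T3T3 as described above, and confirms the 15 + 4 = 19 target pseudowords.

```python
from itertools import product

TONES = (1, 2, 3, 4)


def build_targets(base=("pan", "san")):
    """Enumerate the disyllabic target pseudowords (tone digits, 0 = neutral)."""
    first, second = base
    # All canonical TmTn combinations except T3T3, which surfaces as T2T3 via sandhi.
    canonical = [f"/{first}{a}{second}{b}/" for a, b in product(TONES, TONES) if (a, b) != (3, 3)]
    # One neutral-tone-final form per first tone: TnT0.
    neutral = [f"/{first}{a}{second}0/" for a in TONES]
    return canonical, neutral


canonical, neutral = build_targets()
print(len(canonical), len(neutral), len(canonical) + len(neutral))  # 15 4 19
```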

All stimuli were produced by a 35-year-old male native Mandarin speaker who was born and raised in Beijing and reported no reading, speaking, or hearing disorders. The 19 pseudowords were recorded along with the 40 filler words in the soundproof room of the phonetics lab at the Chinese Academy of Social Sciences (CASS) using Cool Edit Pro 2.0 at a sampling rate of 44,100 Hz.

Participants

Eighteen Mandarin adult listeners were tested, 10 males and 8 females, with an average age of 20.8 years (SD = 1.9). Another participant took part in the test but was excluded due to equipment failure. All participants were born and raised in Beijing, without reported hearing or speech disorders.

Eighteen Dutch adult listeners were tested, 6 males and 12 females, with an average age of 23.7 years (SD = 4.7). They were born and raised in the Netherlands. None of the participants had been exposed to any tone language, and no hearing or speech disorders were reported.

Procedures

The AX paradigm was adopted: participants were presented with pairs of stimuli and indicated whether the two stimuli were the same or different. The series consisted of 30 different pairs (AX or XA) and 19 identical pairs (AA or XX). Each different pair compared a sequence ending in a canonical tone with its corresponding neutral tone form. Taking /pan1san1/ as an example, its neutral tone form was /pan1san0/; the different pairs were “/pan1san0/ vs. /pan1san1/” and “/pan1san1/ vs. /pan1san0/”, and the identical pairs were “/pan1san1/ vs. /pan1san1/” and “/pan1san0/ vs. /pan1san0/”. An additional 80 filler pairs included different pairs such as “/shi2chang2/ (时长, duration) vs. /shi4chang3/ (市场, market)” and identical pairs such as “/shi2chang2/ (时长, duration) vs. /shi2chang2/ (时长, duration).”
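The pairing logic for the target items can be sketched as follows. This is an illustration under stated assumptions: only the 19 target pseudowords are covered (filler pairs are omitted), the helper names are hypothetical, and the shuffle merely stands in for the randomization performed by the experiment software; no ZEP functionality is used or implied.

```python
import random
from itertools import product

TONES = (1, 2, 3, 4)


def pseudoword(first_tone, second_tone):
    """Spell a /pansan/ token with tone digits (0 = neutral tone)."""
    return f"/pan{first_tone}san{second_tone}/"


def build_ax_trials(seed=None):
    """Return shuffled (pair, expected_answer) tuples for the target items."""
    canonical = [(a, b) for a, b in product(TONES, TONES) if (a, b) != (3, 3)]
    neutral = [(a, 0) for a in TONES]

    trials = []
    for a, b in canonical:
        x = pseudoword(a, b)  # canonical-final form
        y = pseudoword(a, 0)  # its neutral-tone counterpart (same first tone)
        trials.append(((x, y), "different"))
        trials.append(((y, x), "different"))  # both presentation orders
    for a, b in canonical + neutral:
        w = pseudoword(a, b)
        trials.append(((w, w), "same"))  # each target paired with itself

    random.Random(seed).shuffle(trials)  # stand-in for the software's randomization
    return trials


trials = build_ax_trials(seed=1)
```

Pairing each canonical-final item only with the neutral form that shares its first tone yields 15 × 2 = 30 different pairs, and pairing each of the 19 targets with itself yields the 19 identical pairs.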

A practice phase preceded the experiment. Seven pairs of stimuli were used to familiarize participants with the procedure. Each trial started with a fixation cross, followed by two audio stimuli with an inter-stimulus interval of 200 ms. When the audio stimuli concluded, two buttons were shown on the screen, labeled as “Same (F)” and “Different (J).” Participants provided their response by pressing either “F (Same)” or “J (Different)” on the keyboard. The next trial started automatically after the participant had responded. The inter-trial interval was 500 ms. ZEP was used to control the procedures, randomize stimuli, and collect participants’ responses (Veenker, 2013).

Results

The accuracy rate was calculated for each participant by dividing the number of correct responses by the total number of trials. For identical pairs, the accuracy rate was 93% (SD = 1.24) for Mandarin listeners and 97.1% (SD = 0.55) for Dutch listeners. For different pairs, the accuracy rate was 91.7% (SD = 0.74) for Mandarin listeners and 90.5% (SD = 0.99) for Dutch listeners. Figure 2 illustrates the accuracy rates of Mandarin and Dutch adult listeners. To better assess participants' sensitivity to the canonical–neutral contrast, d-prime (d′) was calculated. An independent-samples t-test on d′ with language group as the independent variable revealed no difference between Dutch and Mandarin adult listeners [t(34) = −0.57, p > 0.05].
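The d′ computation is not spelled out in detail, so the following is a hedged sketch rather than the authors' procedure. It assumes that "different" responses to different pairs count as hits and "different" responses to identical pairs as false alarms, applies the simple yes/no formula d′ = z(H) − z(FA) with a log-linear correction (both assumptions; a dedicated same-different model would give other values), and computes a pooled-variance Student's t. With two groups of 18 listeners, the degrees of freedom are 18 + 18 − 2 = 34, matching the reported t(34).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def accuracy(n_correct, n_total):
    """Proportion of correct responses out of all trials."""
    return n_correct / n_total


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Yes/no d' = z(H) - z(FA) with a log-linear correction (assumed)."""
    # Adding 0.5 to each count keeps both rates strictly between 0 and 1,
    # so the inverse normal CDF is always defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def independent_t(group_a, group_b):
    """Student's independent-samples t with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Illustrative counts for one hypothetical listener (not data from the study):
# 30 different trials and 19 identical trials.
example_d = d_prime(hits=27, misses=3, false_alarms=1, correct_rejections=18)
```

Per-listener d′ values from each language group would then be passed to `independent_t`; a p value would additionally require the t distribution's CDF (e.g., from SciPy), which is left out to keep the sketch dependency-free.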


FIGURE 2. Accuracy rate in the AX discrimination task by Mandarin and Dutch adult listeners.

Both Dutch and Mandarin adult listeners could discriminate neutral and canonical tones. To further investigate whether they perceived the neutral tone as unstressed, we conducted the following identification task.