Statistical learning of a tonal language: the influence of bilingualism and previous linguistic experience

Wang, Tianlin; Saffran, Jenny R.

doi:10.3389/fpsyg.2014.00953

ORIGINAL RESEARCH article

Front. Psychol., 03 September 2014

Sec. Psychology of Language

Volume 5 - 2014 | https://doi.org/10.3389/fpsyg.2014.00953

This article is part of the Research Topic Context Specific Nature of Bilingual Cognitive Advantages.


Tianlin Wang*

Jenny R. Saffran

Department of Psychology, University of Wisconsin–Madison, Madison, WI, USA

While research shows that adults attend to both segmental and suprasegmental regularities in speech, including syllabic transitional probabilities as well as stress and intonational patterns, little is known about how statistical learning operates given input from tonal languages. In the current study, we designed an artificial tone language to address several questions: can adults track regularities in a tonal language? Is learning enhanced by previous exposure to tone-marking languages? Does bilingualism affect learning in this task? To address these questions, we contrasted the performance of English monolingual adults (Experiment 1), Mandarin monolingual and Mandarin–English bilingual adults (Experiment 2), and non-tonal bilingual adults (Experiment 3) in a statistical learning task using an artificial tone language. The pattern of results suggests that while prior exposure to tonal languages did not lead to significant improvements in performance, bilingual experience did enhance learning outcomes. This study represents the first demonstration of statistical learning of an artificial tone language and suggests a complex interplay between prior language experience and subsequent language learning.

Introduction

An important component of learning a new language is segmenting words from the speech stream. Such initial learning can be accomplished by using the statistical regularities in fluent speech to determine the boundaries of novel word forms, along with other types of diagnostic cues such as pauses and reliable stress patterns. In speech, sounds that co-occur often are likely to comprise part of a single word, whereas rare sound sequences are likely to mark transitions between words. Both infants and adults are able to track this statistical information and use it to identify novel word forms in an unfamiliar language (e.g., Saffran et al., 1996a,b, 1999; Ludden and Gupta, 2000; Thiessen and Saffran, 2003; Newport and Aslin, 2004). Moreover, sensitivity to the statistical regularity between syllables (transitional probability, TP) is not only manifested at the segmental level (i.e., vowels and consonants), but is also evident at the suprasegmental level for both the linguistic and musical domains (Saffran et al., 1999; Saffran, 2003b; Creel et al., 2006; Thiessen and Saffran, 2007; Schön et al., 2008; Hay and Saffran, 2012).
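For adjacent units X and Y, the transitional probability is TP(Y | X) = frequency of XY / frequency of X. The sketch below computes it over a toy syllable stream built from two hypothetical words ("pabiku", "golatu"). These words are illustrative only and are not stimuli from this study. Within-word transitions come out at 1.0. Transitions out of a word-final syllable are split across whichever words can follow it.

```python
from collections import Counter


def transitional_probabilities(units):
    """Return {(X, Y): TP(Y | X)} where TP = freq(XY) / freq(X)."""
    pair_counts = Counter(zip(units, units[1:]))
    # Count X only where it has a successor, so the TPs out of each X sum to 1.
    first_counts = Counter(units[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


# Hypothetical toy stream: pabiku golatu pabiku pabiku golatu golatu pabiku
stream = "pa bi ku go la tu pa bi ku pa bi ku go la tu go la tu pa bi ku".split()
tps = transitional_probabilities(stream)
```

In this stream, TP(bi | pa) is 1.0, while TP(go | ku) is 2/3 and TP(pa | ku) is 1/3. Learners are thought to exploit this dip in TP at word boundaries.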

Though research has shown that people can track statistical regularities of syllabic contrasts, prior studies have not investigated languages that rely on tones as integral aspects of lexical representations. Tonal languages are estimated to comprise 60–70% of the world’s languages (Yip, 2002). In syllable–tone languages, pitch variations function phonemically to distinguish lexical meanings at the syllabic level. These languages therefore use lexical tones to denote different meanings at the suprasegmental level (e.g., Yip, 2002; Burnham and Mattock, 2007). These pitch contrasts occur regardless of syntactic or morphological status. In Mandarin Chinese, for example, the four citation tones can be categorized by pitch contour as high-level, low-rising, low-dipping, or high-falling. The syllable /pa/ can mean “eight,” “to pull,” “to hold,” or “dad” when carrying these respective tones. Pitch variations are therefore linguistically meaningful in syllable–tone languages because they determine the meaning of a syllable. However, it is currently unclear how cues to word boundaries are weighted in languages that use both segmental and suprasegmental information.

There is a general sense that the spoken sound of a tonal language is markedly different from that of non-tonal languages. Even if a sentence in a non-tonal language is sung, it is not likely to approximate the variations of pitch and tonal contours over individual syllables that are typical of tonal languages. A possible explanation for this phenomenon is that a sung syllable typically occurs on a single pitch rather than a continuous pitch contour. A tonal syllable, by contrast, includes information about both pitch height (fundamental frequency, or F0) and pitch contour, which can take on level, rising, falling, or dipping shapes. Adult learners can track regularities between pure tones (Saffran et al., 1999; Saffran, 2003a) as well as sung sequences where a pure tone is superimposed on a syllable (Schön et al., 2008). In tonal languages like Mandarin, however, pitch variations are used contrastively for lexical meaning and are truly foreign acoustic cues to the ears of non-tonal speakers (Peabody and Seneff, 2009).

To date, the stimuli used in statistical language learning studies have been based on the phonotactics of Indo-European languages and have not incorporated the linguistic properties of lexical tones. In the current set of experiments, we designed an artificial tone language that resembles syllable–tone languages such as Mandarin and Cantonese in order to examine the process of word segmentation in a tonal context. By using a linguistic cue that differs significantly from Indo-European phonological structure, the artificial tone language simulates a tonal language by providing linguistic regularities at both the suprasegmental and the segmental level. This design also provides an informative test case for assessing adults’ statistical learning ability in processing languages that are typologically different from Indo-European languages. Researchers have intentionally manipulated suprasegmental information in word segmentation tasks, including stress, intonation, and musical tones (Johnson and Jusczyk, 2001; Thiessen and Saffran, 2003, 2007; Schön et al., 2008). Moreover, learners employ language-specific segmentation strategies that depend upon the prosodic organization of a particular language (e.g., Nazzi et al., 2006; Shukla et al., 2007; see Nazzi et al., 2014 for a discussion), such that their prior language knowledge impacts subsequent statistical learning (Finn and Hudson Kam, 2008; Shukla et al., 2011; Lew-Williams and Saffran, 2012). Though such results suggest that there will be differences between tonal and non-tonal speakers in a word segmentation task, prior studies have not investigated the role of lexical tones, a key property of an estimated 60–70% of the world’s languages.

While English speakers are able to successfully track statistical properties of languages that range from syllabic artificial languages to natural Italian (Saffran et al., 1996a; Pelucchi et al., 2009), a significant amount of language experience may be necessary before they are able to segment tonal speech. Therefore, the current studies were designed to determine whether prior experience with lexical tones is necessary for adults to segment a tonal language, or whether other non-tonal linguistic experience could also facilitate tonal statistical learning. To begin to address these questions, Experiment 1 examined monolingual English-speaking participants’ performance in a tone–language statistical learning task. Crucially, the materials were created such that the syllable-level statistics and the tonal-level statistics both provided strong and redundant cues to word boundaries. Previous studies used stimuli bearing limited similarity to tonal cues. Even so, they have shown that the extraction of linguistic information is enhanced when stress patterns coincide with word boundaries (Myers et al., 1996) and when speech is sung (Schön et al., 2008), suggesting a facilitating role for redundant suprasegmental information. Lexical tones are a key characteristic of tonal languages. Incorporating them into experimental stimuli also promises further insight into the influence of redundant segmentation cues in statistical learning.

If English monolinguals can make use of the syllable-level statistics and/or the tonal-level statistics, they should succeed at the task. However, if the presence of the unfamiliar tonal structure distracts learners from detecting the syllable-level structure, these materials may be more difficult for English speakers to acquire than the materials used in prior statistical language learning tasks.

Experiment 1

In Experiment 1, monolingual English adults participated in a statistical learning task, where they were exposed to an artificial tonal language followed by a forced-choice test (e.g., Saffran et al., 1996a). Compared to prior studies, the artificial language was relatively simple, containing only three trisyllabic words, with two redundant cues to word boundaries (syllable-level statistics and tonal-level statistics). Participants were then tested using a forced-choice design contrasting words versus non-words – sequences of syllable/tone pairs that were reordered relative to the exposure language. Importantly, both types of test items – words and non-words – maintained the trained correspondences between individual syllables and tones. The differences between the test items lay in their sequential statistics at both the syllable and tonal level. The question of interest was whether participants would learn enough about the structure of the artificial tonal language to successfully distinguish between test words and non-words. The speech stream provided identical regularities at both the syllabic and the tonal tiers. Thus, learners could track the syllables alone, the tonal regularities alone, or the two together, as in tonal languages. Given that syllable regularities in the absence of tones are readily acquired by English-learning adults (e.g., Saffran et al., 1996a), we expected that our participants would successfully acquire the artificial tonal language.

Method

Participants

Twenty-four English monolingual students at a Midwestern university with self-reported normal hearing participated in the experiment. All participants in Experiment 1 and the subsequent experiments provided informed consent in accordance with the University IRB. Participants received extra credit in a psychology course in which they were enrolled. Data from two additional participants were excluded from the analysis due to experimenter error (1) or failure to follow directions (1).

Materials

The artificial language consisted of two tiers: syllables and tones. From the materials used in the language created by Saffran et al. (1996a), we chose nine syllables (ta, tu, ti, da, du, di, ba, bi, gu) to incorporate into our design. For the tonal tier, three tonal contours (rising, level, falling) were paired with three F0 starting points (register; high, middle, low), resulting in nine tones in total (e.g., high rising, middle falling, low level). To construct these nine tones, we surveyed tonal languages that use tones at the syllabic level to form contrastive lexical meanings. In these languages, the F0 span of female speakers has been reported as 87–308 Hz (Connell, 2002; Keating and Kuo, 2012), with an individual tone spanning 10–100 Hz (Lee et al., 2006). Based on these ranges, we specified the starting and ending F0 of each of the nine tones (see Table 1). We then synthesized nine corresponding pure tones using the Mbrola speech synthesizer (http://tcts.fpms.ac.be/synthesis/mbrola.html). The three tonal contours and the three tonal registers used in our stimuli all occur in natural tone languages. The stimuli were recorded by a female native English speaker who does not speak a tonal language. She has perfect pitch and musical performance training. She was asked to listen to the synthesized pure tones and “sing” the same tones on the nine syllables. The recording was conducted one tonal syllable at a time. All tonal syllables were then edited in Adobe Audition to be matched in length (500 ms) and amplitude, while preserving their original pitches.

TABLE 1. Range of F0 (Hz) for all nine tones.
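The nine tones form a 3 × 3 design of register by contour. The sketch below enumerates them. As an illustration of what a contour tone is acoustically, it also synthesizes a pure tone whose F0 glides linearly from a start frequency to an end frequency. The 250 to 300 Hz values are hypothetical placeholders, not the values in Table 1. The linear glide is an assumption, and this is not the Mbrola pipeline the authors used.

```python
import itertools
import math

REGISTERS = ("high", "middle", "low")      # F0 starting point
CONTOURS = ("rising", "level", "falling")  # pitch movement over the syllable

# Nine tone categories: every register crossed with every contour.
TONES = [f"{r} {c}" for r, c in itertools.product(REGISTERS, CONTOURS)]


def glide(f0_start, f0_end, dur_s=0.5, sr=16000):
    """Pure tone whose F0 moves linearly from f0_start to f0_end (Hz).

    Phase is accumulated sample by sample, so the frequency change is smooth.
    A level tone is simply f0_start == f0_end.
    """
    n = int(dur_s * sr)
    samples, phase = [], 0.0
    for i in range(n):
        f = f0_start + (f0_end - f0_start) * i / max(n - 1, 1)
        phase += 2 * math.pi * f / sr
        samples.append(math.sin(phase))
    return samples


# Hypothetical F0 values for illustration only; NOT the Table 1 values.
high_rising = glide(250.0, 300.0)  # 500 ms, matching the stimulus length
```

Accumulating phase, rather than computing sin(2πft) with a time-varying f, avoids the distortion that occurs when frequency changes inside a closed-form sine.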

To control for arbitrary listening preferences during testing, two counterbalanced conditions of the language were constructed; the non-words in condition A were the words in condition B and vice versa. For condition A, the syllables and tones described above were uniquely paired with one another to create three trisyllabic words: tadugu, bidatu, tibadi. The speech stream thus consisted of three trisyllabic tonal words, each of which can be uniquely identified by either its syllables or its tones (see Table 2). Each word occurred 30 times and never occurred twice in succession. Transitional probabilities on the syllabic and tonal tiers thus offered identical and redundant cues to word boundaries: 0.5 between words and 1.0 within each word.

TABLE 2. Words and non-words in conditions A and B.
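The condition A familiarization stream can be sketched as 30 tokens of each word in random order with no immediate repetition. TPs are then computed on both tiers. The syllable-to-tone labels (T1 to T9) are hypothetical, since the actual tone assignments in Table 2 are not reproduced here. The rejection-sampling construction is one simple way to satisfy the no-repeat constraint and is not necessarily the authors' procedure.

```python
import random
from collections import Counter

# Condition A words from the text, split into syllables.
WORDS = [("ta", "du", "gu"), ("bi", "da", "tu"), ("ti", "ba", "di")]
# Hypothetical one-to-one syllable -> tone labels (not the Table 2 pairing).
TONE_OF = {syl: f"T{i + 1}" for i, syl in enumerate(s for w in WORDS for s in w)}


def make_stream(words, reps=30, seed=0, max_tries=1000):
    """Random word order, each word `reps` times, never twice in a row."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        remaining = {w: reps for w in words}
        order, prev = [], None
        while any(remaining.values()):
            options = [w for w in words if remaining[w] > 0 and w != prev]
            if not options:
                break  # dead end: only the previous word is left; retry
            w = rng.choices(options, weights=[remaining[o] for o in options])[0]
            remaining[w] -= 1
            order.append(w)
            prev = w
        else:
            return order
    raise RuntimeError("could not build a stream without immediate repeats")


def transitional_probabilities(units):
    """{(X, Y): freq(XY) / freq(X)} over adjacent units."""
    pairs = Counter(zip(units, units[1:]))
    firsts = Counter(units[:-1])
    return {p: n / firsts[p[0]] for p, n in pairs.items()}


order = make_stream(WORDS)
syllables = [s for w in order for s in w]
tones = [TONE_OF[s] for s in syllables]
syl_tp = transitional_probabilities(syllables)
tone_tp = transitional_probabilities(tones)
```

Each syllable maps to exactly one tone, so the tonal tier reproduces the syllabic TP values. Within words, TP is 1.0. Each word-final unit has two between-word transitions that sum to 1 and sit near 0.5.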

For the test items, the non-words were constructed by reversing the order of syllables in each word (e.g., a word whose syllable order was “ti–ba–di” would be rearranged to produce the non-word “di–ba–ti”). As a result, the syllabic and tonal TPs within each non-word were 0. The three non-words in condition A were therefore guduta, tudabi, and dibati. The tone/syllable pairings presented during training were maintained in the test materials. For instance, the syllable ta was always paired with the high rising tone in both words and non-words. Each word was paired exhaustively with each non-word, resulting in 18 test trials.
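The non-word construction can be checked mechanically. The sketch below reverses each condition A word. It then confirms that no adjacent pair inside a non-word ever occurs during familiarization, either within a word or across a boundary between two different words. This is why the within-non-word TPs are 0. Pairing three words with three non-words gives nine combinations. The sketch assumes, hypothetically, that each combination appears in both presentation orders, which is one arrangement consistent with 18 trials.

```python
import itertools

WORDS = ["ta-du-gu", "bi-da-tu", "ti-ba-di"]  # condition A, hyphenated by syllable


def reverse_word(word):
    """Reverse syllable order: ti-ba-di -> di-ba-ti."""
    return "-".join(reversed(word.split("-")))


def internal_pairs(item):
    """Adjacent syllable pairs inside one test item."""
    s = item.split("-")
    return set(zip(s, s[1:]))


def attested_pairs(words):
    """Every adjacent pair that can occur in the familiarization stream."""
    sylls = [w.split("-") for w in words]
    within = {p for w in words for p in internal_pairs(w)}
    # No immediate repeats, so boundaries only join two *different* words.
    between = {(a[-1], b[0]) for a, b in itertools.permutations(sylls, 2)}
    return within | between


NONWORDS = [reverse_word(w) for w in WORDS]
ATTESTED = attested_pairs(WORDS)
PAIRINGS = list(itertools.product(WORDS, NONWORDS))  # 3 x 3 = 9
# Hypothetical: each pairing presented in both orders (word first / non-word first).
TRIALS = PAIRINGS + [(nw, w) for w, nw in PAIRINGS]
```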

The syllables were concatenated into a stream in Adobe Audition, with 10 ms of silence between syllables. Because each syllable was recorded separately, there was no coarticulation between syllables, unlike previous studies of word segmentation. No additional acoustic cues were inserted at word boundaries. The stream was presented 13 times during familiarization, with 390 presentations of each word for a total duration of 9 min.

Procedure

Participants were told that they would be listening to a nonsense language. They were informed that there were patterns in this language and that their task was to pay as much attention to the language as possible. We included these instructions because prior results suggest that adult performance in statistical learning tasks is enhanced by explicit instructions to attend to the stimuli (e.g., Saffran et al., 1997; Turk-Browne et al., 2010).

Participants were assigned to one of the two counterbalanced language conditions: condition A or condition B. After 9 min of listening during the familiarization phase, participants were tested using a forced-choice task between words from the language and non-words. In each test trial, participants heard two trisyllabic strings (one word and one non-word) separated by 500 ms of silence. At the end of each trial, participants were asked to indicate which of the two strings sounded more familiar. The order of the 18 test trials was randomized for each participant. After the test phase, participants filled out questionnaires concerning their language and musical background.

Results and Discussion

We first compared the two counterbalanced familiarization conditions. A t-test (all t-tests reported are two-tailed) comparing the accuracy rates from the two counterbalanced languages revealed no significant differences [t(22) = 1.61, p = 0.122], suggesting that there were no a priori listening preferences for any of the test words. The two conditions were therefore combined in the subsequent analyses. A one-sample t-test showed that the English monolinguals did not perform better than chance (50%) on the forced-choice test [t(23) = 1.42, p = 0.169], with an average accuracy of 0.55 (SE = 0.03). These participants failed to learn the sequential statistical structure of the tonal artificial language (see Figure 1). There was no correlation between participants’ performance and self-reported musical background in Experiment 1 [r(22) = 0.11, p = 0.601].

FIGURE 1. Average accuracy rate of all four groups. This figure illustrates the average percentage correct of the four language groups: in Experiment 1, English monolinguals; in Experiment 2, Mandarin monolinguals and Mandarin–English bilinguals; in Experiment 3, non-tonal bilinguals.
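The tests reported for Experiment 1 can be outlined in code. The sketch below computes three things:

- A one-sample t against 0.5, which is chance in a two-alternative forced choice.
- A pooled-variance (Student's) two-sample t. Its df of n1 + n2 − 2 is consistent with the reported t(22) for 24 participants across two conditions.
- A two-tailed p-value, obtained by numerically integrating Student's t density with Simpson's rule.

It also includes the standard conversion of a correlation r to t = r√df / √(1 − r²). The accuracy values are hypothetical. In practice a statistics library (e.g., SciPy) would replace the hand-rolled integration.

```python
import math
import statistics


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p for Student's t via Simpson's rule on [0, |t|] (steps even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = pdf(a) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(a + i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def one_sample_t(scores, mu=0.5):
    """t and df for H0: mean accuracy == mu (needs non-constant scores)."""
    n = len(scores)
    se = statistics.stdev(scores) / math.sqrt(n)
    return (statistics.mean(scores) - mu) / se, n - 1


def pooled_two_sample_t(x, y):
    """Student's t assuming equal variances; df = len(x) + len(y) - 2."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def r_to_p(r, df):
    """Two-tailed p for a Pearson r with df = n - 2."""
    return t_two_tailed_p(r * math.sqrt(df) / math.sqrt(1 - r * r), df)


# Hypothetical accuracies (correct trials out of 18); not the study's data.
scores = [k / 18 for k in (8, 9, 10, 10, 11, 12, 9, 13)]
t, df = one_sample_t(scores)
p = t_two_tailed_p(t, df)
print(f"t({df}) = {t:.2f}, p = {p:.3f}")
```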

Given that there is ample research showing that adults can regularly track either segmental or suprasegmental cues in statistical learning tasks (Saffran et al., 1996a; Saffran, 2003b; Schön et al., 2008), we had expected this tonal artificial language to be relatively easy to learn – it offers two sets of redundant and equally informative cues (i.e., tonal and syllabic). The failure of the English monolinguals to discern the statistical properties of this language was surprising, and suggests that they were unable to utilize either of the redundant cues to word boundaries available in these materials.

The results of Experiment 1 suggest that there may be attributes of these materials that made the task markedly more challenging than previous statistical learning tasks. One possible source of our participants’ difficulty is the acoustically prominent nature of lexical tones. Lexical tones are carried by the fundamental frequency in speech. Vowel quality and coarticulation may slightly condition the realization of tones. Lexical tones nevertheless exist at the suprasegmental level: a tone’s pitch contour and pitch height can be consistently realized irrespective of the segmental characteristics of the syllable that carries it (Liberman and Pierrehumbert, 1984; Shen, 1992; Cao, 2002; Liu et al., 2007). Lexical stress can also change a word’s meaning (e.g., REcord versus reCORD). Lexical tone, however, alters meaning far more dramatically and usually yields semantically unrelated lexical items (e.g., /pa/ can mean “eight,” “to pull,” “target,” or “father” depending on the four tones in Mandarin). Such lexical contrasts cannot be produced by the suprasegmental properties of most Indo-European languages. Adult second language learners with non-tonal linguistic backgrounds frequently report that tones are the hardest aspect of a tonal language to acquire (Peabody and Seneff, 2009), suggesting an inherent complexity of tones as perceived by non-tonal language speakers. Furthermore, in a study of tonal discrimination abilities across languages, native Mandarin speakers outperformed non-tonal participants in a task requiring them to discriminate between Thai tones (Wayland and Guion, 2004). That result suggests that experience with lexical tones in one language facilitates the discrimination of tones in another language, demonstrating the important role of prior linguistic experience in processing lexical tones.

Therefore, we hypothesized that previous exposure to a tonal language might facilitate learning in our tonal statistical learning task. To address this hypothesis, we next examined the performance of native Mandarin speakers from the same university community as the participants from Experiment 1. This also let us examine the potential effect of a more variable language experience on word segmentation in a new language. These participants were bilingual in Mandarin and English. Although they attended the same university as the monolingual English-speaking participants from Experiment 1, they differed from that group in two ways: exposure to Mandarin and bilingual status. To control for the latter factor, we also collected data from a group of monolingual college students in Mainland China. This group came from a different country and university setting than the participants in Experiment 1, but shared their monolingual status. We thus included two groups of Mandarin-speaking participants in Experiment 2: bilingual international students in the USA and monolingual students in Mainland China.

Experiment 2

As native speakers of Mandarin, a lexical tone language, both the Mandarin monolinguals and Mandarin–English bilinguals tested in Experiment 2 are intimately familiar with the properties of tones. If the failure of monolingual English speakers in Experiment 1 was due to the aural interference produced by lexical tones, both groups of Mandarin speakers should be able to succeed in the segmentation task. This experiment also afforded the opportunity to examine the effects of more variable language experiences (bilingual versus monolingual) on a challenging statistical learning task.