Raspberry, not a car: context predictability and a phonological advantage in early and late learners’ processing of speech in noise

Gor, Kira

doi:10.3389/fpsyg.2014.01449

ORIGINAL RESEARCH article

Front. Psychol., 19 December 2014

Sec. Psychology of Language

Volume 5 - 2014 | https://doi.org/10.3389/fpsyg.2014.01449

This article is part of the Research Topic “Learning a non-native language in a naturalistic environment: Insights from behavioural and neuroimaging research.”

Kira Gor*

Graduate Program in Second Language Acquisition, School of Languages, Literatures, and Cultures, University of Maryland, College Park, MD, USA

Second language learners perform worse than native speakers under adverse listening conditions, such as speech in noise (SPIN). No data are available on heritage language speakers’ (early naturalistic interrupted learners’) ability to perceive SPIN. The current study fills this gap and investigates the perception of Russian speech in multi-talker babble noise by matched groups of high- and low-proficiency heritage speakers (HSs) and late second language learners of Russian who were native speakers of English. The study includes a control group of Russian native speakers. It manipulates the noise level (high and low) and context cloze probability (high and low). The results of the SPIN task are compared with those of tasks testing the control of phonology (AXB discrimination and picture-word discrimination) and lexical knowledge (a word translation task) in the same participants. The increased phonological sensitivity of HSs interacted with their ability to rely on top–down processing in sentence integration, use contextual cues, and build expectancies in the high-noise/high-context condition in a bootstrapping fashion. HSs outperformed oral proficiency-matched late second language learners on the SPIN task and two tests of phonological sensitivity. The outcomes of the SPIN experiment support both the early naturalistic advantage and the role of proficiency in HSs. HSs’ ability to take advantage of the high-predictability context in the high-noise condition was moderated by their level of proficiency. Only high-proficiency HSs, but not any other non-native group, took advantage of the high-predictability context that became available with better phonological processing skills in high noise. The study thus confirms high-proficiency (but not low-proficiency) HSs’ nativelike ability to combine bottom–up and top–down cues in processing SPIN.

Introduction

Who are Heritage Speakers?

More people in the world are raised bilingual or multilingual than monolingual (Bhatia and Ritchie, 2013, XXI). Among the millions of bilingual speakers across the world, there is a group that has been called heritage speakers (HSs). HSs are early interrupted learners, who acquire their first language naturalistically as infants at home from their caregivers, but who switch to the language spoken in the community in their childhood (Valdés, 2005; Polinsky, 2008). As a result, second language (L2) becomes the dominant language of HSs, and their first (L1), heritage, language is reduced to non-native levels of proficiency due to incomplete acquisition and/or attrition (Montrul, 2008; Bylund, 2009; Bylund et al., 2010; Schmid, 2010; Polinsky, 2011). The heritage language may also be influenced by L2, the dominant language (Cook, 2003; Polinsky, 2014). HSs rely predominantly on auditory input, and often do not go through formal schooling in their first language. Due to this auditory bias, they typically prefer the listening and speaking modalities, have poor reading and writing skills, and are sometimes illiterate. HSs, early starters with non-native proficiency in their first language, have recently attracted the attention of researchers. Indeed, understanding the role of early start (from birth) in shaping the linguistic profile and the underlying processing mechanisms of HSs as opposed to late L2 starters makes it possible to address the critical period hypothesis (Abrahamsson and Hyltenstam, 2009; Bylund et al., 2012; DeKeyser, 2013). At the same time, HSs are compared to native speakers since both populations acquire language naturalistically from birth. This allows researchers to identify native and non-native aspects of heritage language (Montrul, 2012), and to establish the role of incomplete acquisition as opposed to attrition (Bylund et al., 2010).

Late L2 learners, unlike heritage language speakers, start learning their second language as adults, after puberty. The type of L2 exposure, naturalistic or formal classroom, depends on biographic trajectories of individual L2 learners, and on global migration patterns for larger populations of learners. Demographic trends, including the patterns of migration, often determine which populations of L2 learners will study L2 in a foreign language classroom, and which will actually move to the country where L2 is spoken. Formal late L2 learners, and university students in particular, often rely heavily on visual input (Psaltou-Joycey and Kantaridou, 2011). While there exists a range of methodologies for teaching a foreign language to late learners in a classroom setting outside the target language community, university-level academic programs in the U.S. typically introduce reading in Russian from the outset (Gor, 2000). A perusal of the major Russian language textbooks for beginners currently used in American universities shows that they rely on reading from day 1 (Lubensky et al., 2002; Lekic et al., 2008; Robin et al., 2014). In this study, the late L2 learners of Russian, all native speakers of American English, were predominantly shaped by in-class experience, which could be complemented by immersion. No late L2 learner in the sample was a naturalistic learner. Conversely, HSs acquire their heritage language from birth in a uniquely auditory modality. Research on HSs in comparison with adult native speakers and late L2 learners makes it possible to gauge the role of early naturalistic exposure in shaping the mechanisms underlying auditory speech processing. The uniqueness of HSs lies in the fact that they have received early naturalistic input in the same way as native speakers, yet have reduced, non-native proficiency in their L1, and thus can be compared to late L2 learners at the same proficiency level to single out the influence of early naturalistic exposure and input.

To summarize, heritage language is a native language acquired naturalistically from birth from caregivers that does not reach native proficiency levels due to a switch to another language spoken in the community, which becomes the dominant language. Heritage languages are often spoken languages due to the reduced amount of schooling that heritage language speakers receive. While there is a growing number of studies addressing the domains of heritage language phonology (Oh et al., 2003; Chang et al., 2011; Lukyanchenko and Gor, 2011), morphology (Gor et al., 2009; Gor and Cook, 2010), morphosyntax (Montrul et al., 2008, 2013, 2014; Montrul, 2009, 2011), and syntax (Keating et al., 2011; Lee-Ellis, 2011; Polinsky, 2011), there have been no studies, to the best of our knowledge, exploring the robustness of heritage auditory sentence processing, and in particular, HSs’ ability to rely on context predictability in adverse conditions, such as speech in noise (SPIN).

Speech in Noise and Top–Down and Bottom–Up Processing

Given that SPIN, as one of the adverse conditions, has been used to study the properties of the human speech recognizer (Mattys et al., 2012), it can become a powerful diagnostic tool for the robustness of non-native speech perception. Moreover, recent renewed interest in speech processing in adverse conditions, including different kinds of noise, stems from the understanding that (1) adverse conditions are ecologically more valid than unrealistic idealized listening conditions, e.g., clear speech (see Mattys et al., 2012), and (2) by manipulating the properties of noise and the listening materials, one gains insights into the complex interaction of top–down and bottom–up processing in different groups of listeners. Was it raspberry or car (‘malina’ or ‘mashina,’ correspondingly, in Russian) that was mentioned in the sentence? In noisy conditions, these two feminine nouns can be confused easily. However, the context in which they were heard usually disambiguates the word in question. The high cloze probability context, if recovered from noise, will disambiguate car and raspberry in Russian sentences 1a and 1b.

(1a) Okolo doma stojala staraja mashina.

Near house stood old car.NOM.SG.

‘An old car stood near the house.’

(1b) V sadu rosla spelaja malina.

In garden grew ripe raspberry.NOM.SG.

‘Ripe raspberries grew in the garden.’

Critically, the whole sentence is masked by noise, and not just the last word, and the listener therefore needs to recover sentence cues from the acoustically degraded signal. This means that the mechanisms of prediction and sentence integration need to rely on acoustic cues that are less than robust, starting from the beginning of the sentence and building up expectations by the last word. Note that Russian allows scrambling, but crucially, the word order with the sentence-final noun-subject is canonical for this particular sentence structure, with the adverbial phrase fronted. Context predictability was manipulated in the original SPIN test developed for native speakers of English (Kalikow et al., 1977) and later adapted for Spanish (Cervera and González-Alvarez, 2011). The role of prediction and its interaction with heritage and late L2 learner profiles and high/low-proficiency levels is the main focus of the present study.

Noise Types and Informational and Energetic Masking

Before we address non-native processing of SPIN, let us revisit the understanding of the impact of different types of environmental degradation, including noise, on speech processing in native speakers. This will assist us in situating the present study and later in interpreting the findings with regard to the type of noise that it used. There are two types of environmental degradation that are used in psycholinguistic experiments: energetic masking and informational masking (Van Engen and Bradlow, 2007; see Mattys et al., 2012 for a review). Energetic masking is created by the use of white noise or filtering and requires signal separation and lower-level acoustic encoding and activation of lexical-semantic information. Conversely, informational masking such as babble noise or speech compression interferes with higher-order selection and integration (Aydelott and Bates, 2004). The study by Aydelott and Bates (2004) used two types of distortion, low-pass filtering and 50% speech compression, and three types of priming sentence context: congruent, incongruent, and neutral. The format of the experiment was a lexical decision task with priming, where the priming context was manipulated, and the target final word (or non-word) was presented without distortions. The study recorded reduced facilitation in congruent low-pass filtered sentences, and reduced inhibition in incongruent compressed sentences compared to the neutral context. It concluded that energetic masking induced by low-pass filtering interfered with early low-level acoustic encoding and the activation of lexical entries, while sentence compression affected central language processing and sentence integration. While there are no data at present on the impact of different adverse conditions on HSs’ speech recognition, it is reasonable to assume that the involvement of different levels of speech processing depending on the type of distortion will be the same as for native speakers.

The present study used a multi-talker babble noise, which sounds like the noise of many people talking at the same time in the background. This type of noise is ecologically valid given its pervasive presence in everyday life. Note that listening to speech in adverse conditions is considered to be part of a listener’s daily auditory experience rather than an extraordinary situation, and consequently, Mattys et al. (2012, p. 963) maintain that speech recognition in adverse conditions is synonymous with speech recognition per se. Thus, SPIN tests the robustness of non-native listeners’ speech recognition under ecologically valid conditions.

Multi-talker babble noise combines both energetic and informational masking and thereby has a double effect on speech intelligibility. The superposition of several speech recordings on the target sentence produced a white noise component that is associated with energetic masking (Mattys et al., 2012). Energetic masking, as well as low-pass filtering, primarily affects the acoustic-phonetic properties of speech, and decreases its intelligibility by interfering with low-level processing. The more talkers, the more energetic masking takes place. At the same time, once the energetic masking effect is partialled out, babble noise also produces informational masking that has different implications for speech intelligibility. Informational masking has higher-level consequences, as it leads to attentional capture, semantic interference, and eventually, increases the cognitive load. In the present study, the multi-talker babble had a high component of steady noise, but it also had an informational masking component, with a more limited competition between the informational streams than in a two-talker babble.

Speech in Noise in Non-Native Perception

There exists a large body of evidence that L2 speakers’ perception of L2 speech in noisy conditions deteriorates to a greater extent than does the perception of native speakers (Kalikow et al., 1977; Mayo et al., 1997; Munro, 1998; van Wijngaarden et al., 2002). This effect has possible explanations involving redundancy reduction or fuzziness in L2 perception at different levels, from phonetic (e.g., uncertainty about phonetic contrasts) to semantic. Apparently L2 speakers do not make efficient use of the probabilities that context provides. “The levels of noise at which the speech was intelligible were significantly higher and the benefit from context was significantly greater for monolinguals … than for late bilinguals” (Mayo et al., 1997, p. 686).

While there is ample evidence that non-native speech perception is affected by noisy conditions to a greater extent than native perception, there is no agreement regarding the relative role of several factors implicated in L2 learners’ perceptual problems when processing SPIN. Reduced speech discriminability in SPIN has been demonstrated in L2 listeners for non-word syllables (Cutler et al., 2004, 2008; Rogers et al., 2006; Broersma and Scharenborg, 2010), isolated words presented in lists (Rogers et al., 2006), words embedded in a sentence (Mayo et al., 1997; Bradlow and Alexander, 2007; Oliver et al., 2012), and whole sentences (Meador et al., 2000; Bradlow and Bent, 2002; Pinet et al., 2011). Studies focusing on the role of different aspects of non-native speech processing affected by noise fall mainly into three categories. The first category focuses on sublexical processing of isolated phonemes, e.g., individual phonemic confusions for English intervocalic consonants (Garcia Lecumberri and Cooke, 2006; Cutler et al., 2008; Broersma and Scharenborg, 2010). The second category is concerned with the phonological/lexical interface and phonemic confusions associated with word recognition (Oyama, 1982; Meador et al., 2000; Cooke et al., 2008). And finally, the third explores the reliance on sentence context and the use of cloze probabilities (van Wijngaarden et al., 2002; Bradlow and Alexander, 2007). The priming role of the context presented in noise in native and non-native populations has been explored for word priming (Golestani et al., 2009, 2013; Hervais-Adelman et al., 2014), and sentence priming (Aydelott and Bates, 2004). Crucially, two studies exploring the behavioral and neural bases of semantic context use in word and sentence priming showed a consistent semantic context advantage for native speakers, but not second language learners (Golestani et al., 2009, 2013; Hervais-Adelman et al., 2014).

Studies explore the use of sentence context and cloze probabilities in various ways. The SPIN test (Kalikow et al., 1977) compared recognition of the sentence-final word, with the preceding context either making the word highly probable or impossible to predict. Thus, if at least part of the sentence can be auditorily recovered from noise in ‘The mouse was caught in the trap,’ the listener is unlikely to hear ‘tram’ instead of ‘trap.’ At the same time, when the context does not support the choice of one word over the other, confusion is more likely to occur. In ‘They hope he heard about the rent,’ the low cloze probability context does not support either the actual or the alternative word, for example, ‘tent.’ A more radical approach to cloze probabilities was adopted by Meador et al. (2000), who created sentences with low transitional probabilities between each word in the sentence and the following one, as in: ‘The blonde dentist ate the heavy bread.’ There, the participants’ task was to repeat the sentence verbatim, and the accuracy score referred to the number of words that were correctly recovered from the sentence. The present study uses the approach of Kalikow et al. (1977), with two types of sentences differing by the probability of the last word only, which makes it possible to control for the properties of sentence-final words recognized in noise.

A study by Meador et al. (2000) directly addressed the relative role of non-native phonology in non-native word recognition in sentences. The study hypothesized that the native Italian participants’ accuracy in perceiving English vowels and consonants would be related to their recognition of English words in sentences with low transitional probabilities between words, as in the example above. To verify this hypothesis, the authors regressed the segmental perception scores obtained for the native Italian participants in two other studies onto the word recognition scores, i.e., the number of repeated words in the sentence. The results support the role of phonological deficits (non-native consonant perception in that specific case) in SPIN recognition. However, the findings of the study are not sufficient to evaluate the role of non-native phonological perception as opposed to top–down use of context predictability, since the sentences used in the study had the lowest cloze probabilities possible.

No data are yet available on heritage processing of SPIN. Is SPIN perception in HSs on a par with native speakers because they have the advantage of early starters, or is it degraded as in L2 learners because their proficiency is comparable to late L2 learners? While there is robust evidence that non-native speech perception is affected by noisy conditions to a greater extent than native perception, there is no agreement regarding the relative role of several factors implicated in L2 learners’ perceptual problems when processing SPIN. These factors include phonological deficits, reduced lexical knowledge, and a reduced ability to rely on top–down processing and to use contextual cues for sentence integration. The current study fills the gap and compares the perception of Russian speech in multi-talker babble noise in HSs of Russian and late L2 learners at the same proficiency levels to that of native Russian speakers. HSs of Russian in the study are early interrupted learners whose first language spoken at home was Russian, but who later switched to English, currently their dominant language. Given that heritage language is shaped by early naturalistic exposure from birth that relies exclusively on the aural modality, at least in the first years of life, one can hypothesize that HSs would have a processing advantage for SPIN over late L2 learners. Indeed, late learners, college-level students, mainly acquire Russian in a formal classroom and rely heavily on visual input, i.e., reading. While the goal of a modern foreign language classroom is to develop all four skills (two receptive, reading and listening, and two productive, speaking and writing; Rogers, 2014), an objective assessment of the listening skills in late learners of Russian as a foreign language produced disappointing results (Thompson, 2000, p. 276). If a heritage SPIN advantage were to be found, the question would arise as to the factors underlying this advantage.

The Current Study

This study investigates the role of sentence context predictability and uses two levels of multi-talker babble noise, high and low, to determine whether the efficiency of processing SPIN depends on bottom–up acoustic-phonetic and/or top–down semantic-syntactic sentence integration. It goes on to compare the outcomes of the SPIN test with three additional tests of phonological and lexical knowledge in the same groups of participants¹. To control for the role of possible phonological deficits leading to problems with efficient processing of acoustically degraded speech, the study uses two independent measures of phonological perception. Both measures target the phonological contrast that causes most difficulties for speakers of English, the hard/soft consonant contrast. The AXB discrimination task measures sensitivity to the contrast in nonsense syllables, while the picture-word discrimination task looks at the sensitivity to the same contrasts in minimal pairs of lexical items and thus investigates the robustness of phonolexical representations differentiated by the same hard/soft contrast. In order to explore the possibility that the advantage on the SPIN task may stem from superior knowledge of vocabulary, the study compares the accuracy scores on a multiple-choice task measuring vocabulary in different frequency ranges.

The study addresses the following questions:

Are HSs as efficient as L1 speakers in listening to SPIN or do they experience the same deficits as late L2 learners at the same proficiency levels?
Which factors are responsible for the problems experienced by HSs and L2 learners when processing SPIN: phonological deficits, lack of vocabulary knowledge, and/or a reduced ability to rely on top–down processing and use sentence cues?
What is the role of proficiency and learning background (early versus late start) in the ability to rely on top–down processing?

Experiment 1: Speech in Noise

Material and Methods

The present study uses the design of the original SPIN test (Kalikow et al., 1977), with high- and low-probability sentences presented in two levels of noise, high and low; the participant’s task was to repeat the last word of the sentence. It used balanced lists of words created on the basis of a comprehensive study of Russian speech recognition in white noise, which identified numerous factors that influence speech comprehensibility in both native and non-native speakers (Shtern, 1992). These factors form a hierarchical structure and depend on the type of stimuli: syllables, words, sentences, and extended text. Since the task in the current experiment elicits the responses at the word level, only the findings about this level are provided below. Shtern obtained the following hierarchy of factors at the word level (words presented in isolation) in native speakers that are relevant to the present study:

Length of the word in phonemes: the longer the word, the better it is perceived.
Part of speech: nouns are best, and verbs worst, in intelligibility.
Stressed vowel: /a-o-e-i/ have better intelligibility than /u-ɨ/².
Consonantal load: the more consonants in a word, the better its perception.
Place of stress: disyllabic words with stress on the first syllable are perceived better than those with stress on the second syllable.

The same study emphasized that the level of predictability, defined and measured by the presence and number of key words suggesting the use of the target word, plays an important role in speech intelligibility at the sentence level and above and interacts with the level of noise and purely phonetic factors described above at the word level. Shtern (2001) created balanced word lists in such a way that each list of 10 nouns in the Nominative case (the citation form in Russian) has the same parameters that have been demonstrated to be critical for recognition of SPIN by native Russian speakers. The lists of nouns created by Shtern and used in this study are balanced in frequency (with four gradations; only relatively high-frequency words are used), length in syllables (two monosyllabic, four disyllabic, and four trisyllabic words), stress placement, stressed vowel (two of each vowel: /a/, /u/, /e/, and /o/, and one of each: /ɨ/ and /i/), and the percentage of voiceless consonants (40–50% per list). We used eight lists with 10 nouns each to create 80 sentences.

Materials

The critical design of the SPIN used in this study crosses two factors: noise level and predictability of the final word based on the sentence context. In general, it is expected that higher noise levels will produce more errors. However, as proficiency increases, learners’ perception should be more robust in the face of noise, because of a greater internalization of syntactic structure, semantic properties, collocational tendencies, phonological information, etc. Therefore, sentence context was manipulated to be either highly predictive of the final word (e.g., ‘I don’t have a sister, but I have a brother’), or not at all predictive (e.g., ‘The man in the park has a brother’). It is expected that under very noisy conditions, advanced and near-native learners will show a large effect of context, where the words in highly predictive sentences are easier than the words in poorly predictive sentences. It is expected that this advantage of context will correlate with proficiency.

The task uses four conditions, with two levels of noise and two levels of context cloze probability. The high-noise level is combined with 20 high and 20 low cloze probability sentences. Likewise, the low-noise level is combined with 20 high and 20 low cloze probability sentences. Thus, the task includes eight blocks of 10 sentences each: four high-probability (40 sentences) and four low-probability (40 sentences). The target word is a sentence-final noun. For the sentence-final word, the task uses phonetically balanced lists of nouns (Shtern, 2001). The carrier sentences, both high- and low-probability, were balanced for number of words (average 4.8 to 5.4 words depending on the block), and number of syllables (10.03 to 10.12 syllables). A total of 80 sentences were used. All participants listened to the same set of sentences, which made it possible to reduce the number of participants in the study and to ensure that participants with varying proficiency were not unevenly distributed across different presentation lists. This was imperative given that heritage and L2 participants were in the same proficiency range based on the standardized test of oral proficiency (see Participants). Sample items (2a,b) are provided below:

(2a) High cloze probability context

U menja net sestry, no est’ brat.

At me no sister but (there) is brother.

‘I don’t have a sister, but I have a brother.’

(2b) Low cloze probability context

Rebjonok ne znal, chto eto otvet.

Child not knew that this (was) answer.

‘The child did not know that this was the answer.’
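The crossing of noise level and cloze probability described above can be laid out as a small design table. The following Python sketch is only an illustration of the reported counts (two noise levels, two cloze levels, 20 sentences per cell); the names `NOISE_LEVELS`, `CLOZE_LEVELS`, `build_design`, and the `slot` index are hypothetical and do not come from the study materials.

```python
from itertools import product

# Factor levels and cell size as reported in the text.
NOISE_LEVELS = ("low", "high")
CLOZE_LEVELS = ("low", "high")
SENTENCES_PER_CELL = 20


def build_design():
    """Return one record per sentence slot in the 2 x 2 design.

    'slot' is a hypothetical placeholder index; the actual assignment of
    sentences to conditions is not reproduced here.
    """
    design = []
    for noise, cloze in product(NOISE_LEVELS, CLOZE_LEVELS):
        for slot in range(SENTENCES_PER_CELL):
            design.append({"noise": noise, "cloze": cloze, "slot": slot})
    return design


design = build_design()

# Count sentences per (noise, cloze) cell.
cell_counts = {}
for row in design:
    key = (row["noise"], row["cloze"])
    cell_counts[key] = cell_counts.get(key, 0) + 1

print(len(design), cell_counts)
```

Each of the four cells holds 20 sentences, which gives the 80 sentences reported, 40 per cloze level and 40 per noise level.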

Two voices, male and female, were used to record the stimulus sentences. Half of the sentences (40) were presented in the male voice, and the other half in the female voice. Voices were not alternated, but presented in two blocks, first the male and then the female. The recordings were rescaled so that they had similar energy values. The multi-talker babble noise was produced by forward-superimposing multiple stimulus sentences from the same task so that the noise had a speech-shaped quality and the same frequency spectrum as the stimulus sentences. The level of the resulting noise was manipulated to create two noise conditions: low-noise and high-noise. The sentences were then combined with each of the two masker noise levels such that the noise signal started on average 1.5 s before the onset of the sentence and continued for about 1.5 s after the sentence offset. The speech-to-noise ratio (SNR) for the low-noise condition was on average 4 dB, and the SNR for the high-noise condition was on average 1.5 dB. To determine the appropriate SNR for each sentence in the high- and low-noise conditions, a subjective piloting was used with four native speakers of Russian who did not take part in the experiment. Only sentences with the low-predictability context were used to establish the target noise level. In the high-noise condition, half of the native listeners identified the last word in the sentence, while in the low-noise condition, three out of four did. Thus, the choice of the SNR for both noise conditions reflected average discriminability by native speakers of Russian established prior to the main experiment.
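To make the SNR manipulation concrete, the sketch below shows one common way of scaling a masker so that a sentence is presented at a target SNR, with the noise leading and trailing the sentence. It is not the procedure used to build the actual stimuli: the RMS-based definition of SNR, the 8 kHz sampling rate, the sine-wave stand-ins for speech and babble, and all function names are assumptions made for illustration.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, noise):
    """SNR in dB, assumed here to be 20*log10 of the RMS ratio."""
    return 20 * math.log10(rms(speech) / rms(noise))


def noise_gain_for_snr(speech, noise, target_snr_db):
    """Gain for the noise so that snr_db(speech, gain*noise) equals the target."""
    return rms(speech) / (rms(noise) * 10 ** (target_snr_db / 20))


def mix(speech, noise, target_snr_db, lead_samples):
    """Add speech to scaled noise; the noise starts lead_samples earlier."""
    if len(noise) < lead_samples + len(speech):
        raise ValueError("noise is too short for the requested lead time")
    gain = noise_gain_for_snr(speech, noise, target_snr_db)
    mixed = [gain * n for n in noise]
    for i, s in enumerate(speech):
        mixed[lead_samples + i] += s
    return mixed


# Toy signals (assumed): 1 s of "speech" and 4 s of "noise" at 8 kHz.
RATE = 8000
LEAD = int(1.5 * RATE)  # noise leads and trails the sentence by 1.5 s
speech = [math.sin(2 * math.pi * 220 * t / RATE) for t in range(RATE)]
noise = [
    math.sin(2 * math.pi * 97 * t / RATE) + 0.5 * math.sin(2 * math.pi * 311 * t / RATE)
    for t in range(RATE + 2 * LEAD)
]

low_noise_mix = mix(speech, noise, 4.0, LEAD)    # low-noise condition, 4 dB
high_noise_mix = mix(speech, noise, 1.5, LEAD)   # high-noise condition, 1.5 dB

gain_low = noise_gain_for_snr(speech, noise, 4.0)
achieved_low = snr_db(speech, [gain_low * n for n in noise])
```

In this sketch the noise RMS is computed over the whole masker, including the lead-in and tail; computing it only over the portion that overlaps the sentence would be an equally defensible choice.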

Participants

Sixty-eight people participated in the SPIN experiment and were paid for their participation. Specifically, the data were collected from 11 native speakers of Russian, 23 HSs, and 34 late L2 learners of Russian. The sample contained 31 males and 37 females. As seen in Table 1, the average age of the L2-high group is higher than that of the other participants, and L2 learners tend to be older on average. This tendency is understandable given that it takes several years to reach the low-level Russian proficiency threshold established in this study, and even longer to achieve very high proficiency. Given that the experiment did not collect reaction time data, these age differences are not expected to bias the results. The SPIN test was part of a larger 4-h long test battery (Gor and Cook, 2010; Long et al., 2012), and the results of the SPIN test are compared below to the tests gauging phonological discrimination and vocabulary control in the same heritage and L2 participant groups. HSs who participated in this experiment had Russian-speaking parents, were exposed to Russian from birth and heard it spoken at home on a daily basis. However, they had lived in the U.S. since the age of 7 on average (range: 0–14), and considered English to be their dominant language, and Russian, the language of the test, their weaker language. HSs did not live in Russia or a Russian-speaking country after puberty, and had little or no formal elementary schooling in the Russian language, although all of them could read in Russian. Late L2 learners were all native speakers of American English and started learning Russian after puberty in a formal classroom, most of them as young adults in college. The average age of onset of Russian was 18.4 years (range: 13–27), and the average length of formal study was 10 years (range: 0–39). While all but five L2 learners had a study abroad experience in Russia or a Russian-speaking country, they did not learn Russian in a naturalistic setting, merely by virtue of living in a Russian-speaking country or community.

TABLE 1. Background information of the participants in the study.

Heritage speakers and L2 learners of Russian in this experiment were divided into two groups, high- and low-proficiency, using the Interagency Language Roundtable (ILR) testing format, which made possible direct comparisons of the high- and low-level proficiency heritage and L2 participants (Long et al., 2012)³. The ILR score is established based on an audio-recorded oral proficiency interview conducted with a certified tester. The interview lasts 20–30 min and takes the form of a rigidly structured conversation, although the topics of the conversation vary depending on the testee’s background. The ILR oral proficiency score is a standard global language proficiency score widely accepted in the U.S. In addition to the base levels, the ILR scale has “plus” sublevels that refer to the proficiency exceeding the requirements of the level. In our participant groups, both heritage and L2, the oral proficiency scores ranged from 1 (Intermediate) to 2 (Advanced), 3 (Superior), and 4 (Distinguished). Both the heritage and L2 samples also included “plus” sublevels, e.g., 1+ (Intermediate High). The participants were divided into low-proficiency groups containing participants with the ILR scores ranging from 1 to 2 (16 L2 and 11 HSs), and high-proficiency groups containing participants with ILR scores ranging from 2+ to 4 (18 L2 and 12 HSs). A detailed breakdown by age, gender, and proficiency level is provided in Table 1.
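The grouping rule just described (ILR 1 to 2 as low proficiency, 2+ to 4 as high proficiency) can be written down as a short Python sketch. The ordered list of ILR labels, the function names, and the group label strings are illustrative assumptions; in particular, the list includes 3+ as a possible score, although the text does not say whether that sublevel occurred in the sample.

```python
# Ordered ILR scores in the range reported for the sample (1 to 4),
# including "plus" sublevels. Whether every sublevel was attested is
# not stated in the text; the list is an assumption for illustration.
ILR_ORDER = ["1", "1+", "2", "2+", "3", "3+", "4"]
LOW_MAX = "2"  # highest score still assigned to the low-proficiency group


def proficiency_band(ilr_score):
    """Return 'low' for ILR 1 to 2 and 'high' for ILR 2+ to 4."""
    rank = ILR_ORDER.index(ilr_score)  # raises ValueError for other scores
    return "low" if rank <= ILR_ORDER.index(LOW_MAX) else "high"


def group_label(background, ilr_score=None):
    """Combine learning background ('HS', 'L2', 'Native') with the band."""
    if background == "Native":
        return "Native"
    if background not in ("HS", "L2"):
        raise ValueError(f"unknown background: {background!r}")
    return f"{background}-{proficiency_band(ilr_score)}"


print(group_label("HS", "2+"), group_label("L2", "1+"), group_label("Native"))
```

Placing the cut between 2 and 2+ follows the description above; moving `LOW_MAX` would redefine the groups and is shown as a single constant only to make that dependence explicit.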

Procedure

The listening materials in the SPIN task were presented in two blocks of 40 sentences, the first recorded in a male voice and the second in a female voice, with a short pause between the blocks. Each set of 40 sentences included all four critical conditions: high-noise/high-predictability context, high-noise/low-predictability context, low-noise/high-predictability context, and low-noise/low-predictability context. The order of the sentences in these four conditions was randomized within each block (male-voice and female-voice), and was the same for all participants. Participants were tested individually, and were seated in a quiet room in front of Dell® Latitude D820 computers with Plantronics .Audio 750 headsets with mounted microphones and Logitech® Precision USB game pads. They were presented with instructions on the computer screen in English, and used buttons on their game pad to initiate the next trial. Participants listened to the entire sentence in noise and were then asked to repeat the sentence-final word into the microphone. The experiment was self-paced and took ~20 min. Participants were encouraged to take a break in the middle. All four experiments reported in the present publication were part of a larger test battery and were completed on the same day. Ample rest time was provided to participants to reduce possible fatigue. In addition, the type of activity varied from one task to the next, which lessened the effect of monotony. The experiment was programmed in DMDX (Forster and Forster, 2003). Responses were recorded and then manually transcribed by trained linguists who were native speakers of Russian. No substitutions were accepted when scoring the responses for accuracy. Only correct responses were scored as 1; all other responses, e.g., responses with a phonological neighbor, were coded as 0. The accuracy scores were subjected to statistical analyses.
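The strict scoring rule can be made concrete with a minimal Python sketch. The function name `score_response` is hypothetical, and the normalization of case and surrounding whitespace is an assumption about how transcriptions might be compared, not a documented step of the study. The word pair in the example is illustrative only and is not taken from the item list.

```python
def score_response(transcribed: str, target: str) -> int:
    """Strict scoring: 1 only for an exact match with the target word, else 0."""

    def normalize(word: str) -> str:
        # Assumption: case and stray whitespace in a transcription are ignored.
        return word.strip().lower()

    return int(normalize(transcribed) == normalize(target))


# Illustrative pair: a similar-sounding substitute receives no partial credit.
exact_match = score_response("малина", "малина")
near_miss = score_response("машина", "малина")
```

Under this rule a response that is a phonological neighbor of the target counts as an error, which is why the accuracy scores reflect full recovery of the sentence-final word rather than partial phonological overlap.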

Results

The accuracy scores for each participant group broken down by the level of noise and context predictability are presented in Table 2 and Figure 1. Participants’ responses were analyzed with a repeated measures ANOVA in by-subject and by-item analyses. The study had a 2 × 2 × 5 factorial design, with the following predictor variables: context predictability (two levels: low and high), noise level (two levels: low and high), and language proficiency group (five levels: L2-low, L2-high, HS-low, HS-high, Native). The dependent variable was the accuracy of sentence-final word identification. The R statistical package was used for the analyses (R Core Team, 2013, R version 3.0.1). The results are presented in Table 3 (by-subject) and Table 4 (by-item).
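The analyses themselves were run in R. Purely as an illustration of what by-subject and by-item analyses usually aggregate over, the Python sketch below averages trial-level accuracy first per subject and then per item. The record layout, the helper `cell_means`, and all values are hypothetical and are not data from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial-level records: one row per subject and sentence.
trials = [
    {"subject": "s1", "group": "HS-high", "item": "i1", "context": "high", "noise": "high", "acc": 1},
    {"subject": "s1", "group": "HS-high", "item": "i2", "context": "low", "noise": "high", "acc": 0},
    {"subject": "s2", "group": "L2-low", "item": "i1", "context": "high", "noise": "high", "acc": 0},
    {"subject": "s2", "group": "L2-low", "item": "i2", "context": "low", "noise": "high", "acc": 0},
]


def cell_means(records, key):
    """Average accuracy over all records that share the same key."""
    cells = defaultdict(list)
    for record in records:
        cells[key(record)].append(record["acc"])
    return {cell: mean(values) for cell, values in cells.items()}


# By-subject: context and noise vary within a subject, so average over items.
by_subject = cell_means(trials, key=lambda r: (r["subject"], r["context"], r["noise"]))

# By-item: group varies within an item, so average over the subjects in each group.
by_item = cell_means(trials, key=lambda r: (r["item"], r["group"]))
```

The two sets of cell means would then feed the by-subject (F₁) and by-item (F₂) ANOVAs, respectively.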

TABLE 2. Participants’ mean accuracy scores across all conditions.

FIGURE 1. Accuracy scores on SPIN task in heritage, L2, and native participants. Heritage and L2 participants are divided into high- and low-proficiency groups. The left panel represents the high-noise and the right the low-noise conditions. L2-low – low-proficiency L2 learners, L2-high – high-proficiency L2 learners, HS-low – low-proficiency heritage speakers, HS-high – high-proficiency heritage speakers, and Native – native speakers of Russian.

TABLE 3. Repeated measures ANOVA results for Experiment 1: speech in noise, by-subject analyses.

TABLE 4. Repeated measures ANOVA results for Experiment 1: speech in noise, by-item analyses.

The analysis revealed a significant context effect, indicating that participants on average performed better in the high-predictability context condition. A significant noise effect shows that word identification was more accurate in the low-noise condition, and a significant language group effect confirms differences among the participant groups. There were also significant context by noise, context by group, and noise by group two-way interactions. Finally, the three-way interaction between context, noise, and language group was also significant, suggesting that the interaction between noise and context changed across the levels of the language group variable. Separate ANOVAs for each group showed that the two-way interaction between context and noise was significant in the Native [F₁(1,252) = 10.93, p < 0.01; F₂(1,2215) = 3.96, p = 0.05] and the HS-high [F₁(1,252) = 10.64, p < 0.01; F₂(1,2215) = 9.85, p = 0.01] groups, while it was not statistically significant in the L2-low, L2-high, and HS-low groups.

These data are represented visually in Figure 2 where the difference between the accuracy score in the high-predictability and low-predictability conditions is provided as a percentage. This difference accounts for the context effects on response accuracy under the same noise levels.
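The context effect is a simple difference score computed separately at each noise level. A minimal Python sketch of this calculation follows; the function name `context_effects` and the accuracy values are hypothetical and do not reproduce the values in Table 2.

```python
def context_effects(cell_accuracy):
    """Context effect per noise level: high-predictability minus low-predictability accuracy.

    cell_accuracy maps (noise, context) pairs to mean accuracy as a proportion.
    """
    return {
        noise: cell_accuracy[(noise, "high")] - cell_accuracy[(noise, "low")]
        for noise in ("high", "low")
    }


# Hypothetical group means, chosen only to illustrate the arithmetic.
example_cells = {
    ("high", "high"): 0.75,   # high noise, high-predictability context
    ("high", "low"): 0.25,    # high noise, low-predictability context
    ("low", "high"): 1.0,     # low noise, high-predictability context
    ("low", "low"): 0.875,    # low noise, low-predictability context
}

effects = context_effects(example_cells)
```

With these made-up values the effect is 0.5 in high noise and 0.125 in low noise, the kind of asymmetry that would indicate greater reliance on context when the signal is degraded.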

FIGURE 2. Context effects in SPIN task for heritage, L2, and native participants. The context effect is calculated as the difference between the scores in the high-predictability and low-predictability context conditions. A value of 0.5 corresponds to a 50% increase in accuracy in the high-predictability condition.

Figure 2 demonstrates that while the L2-low, L2-high, and HS-low groups benefited from high context predictability to a similar extent regardless of the noise condition (low or high), the Native and HS-high groups appear to rely on context to a greater extent (almost 40% more) when they listen to sentences in high noise than in low noise. In other words, they take advantage of the context when it is both needed and available. Tukey HSD post hoc tests showed that the increasing group differences (from the L2-low to the Native group) in the accuracy scores in the high-noise/high-context condition were significant across all group comparisons (p < 0.05) except between L2-low and L2-high [t(63) = –1.58, p = 0.13] and between L2-high and HS-low [t(63) = 1.29, p = 0.2]. In the high-noise/low-context condition, the differences were significant between the Native group and each of the other language groups [L2-low: t(63) = –5.72, p < 0.001; L2-high: t(63) = –4.84, p < 0.001; HS-low: t(63) = –3.17, p < 0.01; HS-high: t(63) = –2.9, p < 0.01].

To summarize, all groups predictably benefited from low noise compared to high noise. However, the role of context predictability depended on the participant group and interacted with the level of noise. In the low-noise condition, the context was not needed to recover the sentence-final word, while in the high-noise condition, the ability to efficiently process the context and to generate predictions that would help to recover the acoustically degraded sentence-final word was crucial for performance. According to the obtained results, only two groups were able to take advantage of the high-predictability context in the high-noise condition: native speakers and HSs in the high-proficiency group. These two groups relied on the context significantly more in the high-noise than in the low-noise condition. All the other groups (low-proficiency L2 and heritage, and high-proficiency L2) improved their SPIN recognition due to the high-predictability context at both noise levels to a similar, limited extent. The high-noise/high-context condition was thus critical for exploring the differences among the groups, because the context was available, but the high level of noise simultaneously made the use of the context difficult. Group comparisons of accuracy scores in the critical high-noise/high-context condition reveal that native speakers are more accurate in processing SPIN than all of the other groups. At each proficiency level, high and low, HSs outperformed L2 learners, with the L2 high-proficiency group performing similarly to the heritage low-proficiency group. Thus, HSs showed an advantage over late L2 learners, but a disadvantage compared to native speakers.

A question arises as to what deficits underlie the non-native disadvantage in late L2 learners and what aspect of SPIN processing creates advantages for HSs compared to L2 learners. In the next sections, we will briefly report the results of three experiments targeting phonological and lexical control in the same groups of participants. We will then discuss the patterns observed in the non-native populations in relation to their language learning background and setting. Two experiments tested the heritage and L2 participants’ sensitivity to the phonological hardness/softness contrast that is very prominent in Russian, as it differentiates 12 pairs of Russian consonants and is widely used contrastively in building the sound shape of words and morphemes. For example, the infinitive and the third person singular non-past tense forms of many Russian verbs are contrasted by the hardness/softness of the final consonant, e.g., /pomn’it’/⁴ means ‘remember’ while /pomn’it/ means ‘he/she remembers,’ with the last consonant, soft /t’/ or hard /t/, providing the phonological shape for this morphosyntactically loaded contrast (Chrabaszcz and Gor, 2014). The first experiment, AXB discrimination⁵, targeted lower-level perceptual sensitivity to the phonological hard/soft contrast, while the second, Picture-Word Discrimination, tested phonolexical representations, or representations of words as phonemic sequences.