Naming in Noise: The Contribution of Orthographic Knowledge to Speech Repetition

Pattamadilok, Chotiga; Morais, José Junça De; Kolinsky, Régine

doi:10.3389/fpsyg.2011.00361

ORIGINAL RESEARCH article

Front. Psychol., 05 December 2011

Sec. Psychology of Language

volume 2 - 2011 | https://doi.org/10.3389/fpsyg.2011.00361

This article is part of the Research Topic: The cognitive and brain reorganization of language by literacy in typical and atypical populations

Chotiga Pattamadilok¹*

José Junça De Morais¹

Régine Kolinsky¹,²

¹ Unité de Recherche en Neurosciences Cognitives, Université Libre de Bruxelles, Brussels, Belgium
² Fonds de la Recherche Scientifique, Brussels, Belgium

While the influence of orthographic knowledge on lexical and postlexical speech processing tasks has been consistently observed, this is not the case in tasks that can be performed at the prelexical level. The present study re-examined the orthographic consistency effect in such a task, namely shadowing. Comparing the situation where the acoustic signal was clearly presented with the situation where it was embedded in noise, we observed that the orthographic effect was restricted to the latter situation and only to high-frequency words. This finding supports the lexical account of the orthographic effects in speech recognition tasks and illustrates the ability of the cognitive system to adjust itself as a function of task difficulty by resorting to the appropriate processing mechanism and information in order to maintain a good level of performance.

Introduction

The most influential spoken word recognition models typically assumed that speech is processed without reference to its written code. For instance, the Cohort model (Marslen-Wilson and Welsh, 1978; Marslen-Wilson and Tyler, 1980) claims that the word-initial cohort is constructed from the phonological information contained in a spoken word. The cohort thus consists of all spoken words that share the same initial segment(s) as the input word. Although high-level information such as semantic, syntactic, or contextual information also plays a role by eliminating competitors from the cohort, the contribution of word spelling, which is a form of lexical knowledge, is totally ignored. The absence of interaction between the phonological information contained in the speech input and the corresponding orthographic information is inherent to autonomous models such as RACE or MERGE (Cutler and Norris, 1979; Norris et al., 2000). But the role of orthography is not mentioned either in a highly interactive model such as TRACE (McClelland and Elman, 1986), even though this model could easily accommodate the impact of orthography on spoken word recognition by means of interactions between the representations activated at different levels (i.e., features, phonemes, and words).

Nowadays, all these models lack an account of the orthographic effects reported in many speech recognition tasks. One of the most robust pieces of evidence was reported by Ziegler and Ferrand (1998), who showed that participants’ knowledge of word spelling influences their performance in an auditory lexical decision task: making a lexical decision on words whose rimes can have different spellings (orthographically inconsistent words) takes longer than making a lexical decision on words whose rimes have only one spelling (orthographically consistent words). Since this observation, the orthographic consistency effect has been replicated in many languages (French: Pattamadilok et al., 2007b, 2009a; Portuguese: Ventura et al., 2004, 2007, 2008; English: Ziegler et al., 2008) and in other speech recognition tasks involving semantic and gender decision (Pattamadilok et al., 2009b; Peereman et al., 2009). The orthographic influence on spoken word processing has also been examined for a different type of effect, namely the orthographic (in)congruency effect (i.e., whether or not the same phonological unit shares the same spelling), but the evidence is somewhat mixed (e.g., Jakimik et al., 1985; Slowiaczek et al., 2003; Pattamadilok et al., 2007a; Taft et al., 2007).

Despite the robustness of the orthographic effects reported in the studies using classic speech recognition tasks like lexical or semantic decision (e.g., Ziegler and Ferrand, 1998; Ventura et al., 2004; Pattamadilok et al., 2007b, 2009b; Perre and Ziegler, 2008; Peereman et al., 2009; Perre et al., 2009b), the issue concerning the locus of the effects in the speech processing route remains controversial. The debate focuses on whether the effects take place at the lexical or prelexical processing level. To shed light on this issue, a number of recent event-related potential (ERP) studies have been performed. In both auditory lexical and semantic decision tasks, the orthographic effect was clearly observed (Perre and Ziegler, 2008; Pattamadilok et al., 2009b; Perre et al., 2009a,b). A direct comparison of the time course of the orthographic effect and the word frequency effect, which is a marker of lexical access, showed that the orthographic consistency effect took place in a restricted time window around 350 ms post-stimulus onset. This was earlier than the frequency effect, which was observed in a late and large time window (450–750 ms). The finding suggests that the orthographic influence takes place early enough to constrain lexical access and could thus be considered prelexical. Yet, it is important to highlight that such an effect was found in speech processing tasks that explicitly require lexical processing.

The prelexical account of the orthographic effect seems to be in contradiction with the quasi-absence of an orthographic effect in shadowing or auditory naming tasks (Ziegler and Ferrand, 1998; Ventura et al., 2004; Pattamadilok et al., 2007b). Contrary to tasks like lexical decision or semantic judgment, the shadowing response does not explicitly require lexical processing or rely on any binary choice decision. It requires only a precise analysis of the phonetic properties of the stimulus (word or pseudoword) in order to build an articulatory plan. Therefore, the task provides a way to investigate the orthographic influence at the prelexical speech processing level.

So far, the few studies that investigated the orthographic consistency effect in this non-lexical task have led to inconclusive findings. Ventura et al. (2004) were the first to explore the influence of orthographic consistency in shadowing. The absence of the orthographic consistency effect in the standard version of this task, combined with the fact that the effect was found in lexical decision and in lexically contingent shadowing, where participants had to repeat the stimulus only when it was a word, led the authors to conclude that only lexical processes of spoken word recognition are affected by orthographic consistency. This finding was replicated by Pattamadilok et al. (2007b), who showed that the lexical interpretation of the orthographic consistency effect is also valid in a language (French) whose written code is orthographically more inconsistent than that of the language tested in Ventura et al.’s study (Portuguese).

To our knowledge, Ziegler et al.’s (2004) study is the only one that suggests that the orthographic effect might not be totally absent in the shadowing task, although the effect reported in their study was significant only in the analyses by subjects. Nevertheless, their finding could be explained by the fact that only word stimuli were presented, which may have induced the participants to perform the task at the lexical level. Moreover, the effect reported by the authors was restricted to the comparison between consistent words and inconsistent words with subdominant rime spellings (i.e., the rime spellings that occur in few or no monosyllabic words of a particular rime family, e.g., the “ign” spelling of the rime /-aIn/), and was not observed when consistent words were compared to inconsistent words with dominant rime spellings (i.e., the rime spellings that occur in most monosyllabic words of a particular rime family, e.g., the “ine” spelling of the rime /-aIn/). In their material, the consistent and inconsistent words with subdominant spellings also differed in the number of orthographic neighbors. A previous study from the same research group (Ziegler et al., 2003) showed a facilitatory effect of orthographic neighborhood in shadowing. Despite this finding, the authors argued that the crucial factor in explaining effects of orthography in this task is the consistency of the phonology–orthography mapping rather than the sheer number of orthographic neighbors. In any case, these findings and the argument that orthographic neighborhood per se was not the origin of the reported orthographic consistency effect (Ziegler et al., 2003, 2004) still stand in contradiction to the absence of an orthographic consistency effect in shadowing when the number of orthographic neighbors was well controlled across consistent and inconsistent conditions (Pattamadilok et al., 2007b).

Whereas the findings obtained in the shadowing task performed by adults are contradictory, those obtained in young readers provide a more coherent picture. Using Portuguese material, Ventura et al. (2007, 2008) showed that only children from Grade 6 onward showed the adult pattern of orthographic effect on spoken word recognition, i.e., the orthographic consistency effect being present in the lexical decision task but absent in shadowing. In young readers before that age, the effect was found in both lexical decision and shadowing regardless of the lexicality of the stimuli. This generalized effect of orthography was also replicated in young French readers (Pattamadilok et al., 2009a), although the developmental shift from lexical to generalized effect occurred earlier (at Grade 3) than in Portuguese.

Ventura et al. (2007, 2008) interpreted the occurrence of the orthographic consistency effect in the shadowing task performed by young readers as reflecting strong online connections between phonology and orthography at the sub-lexical level. They argued that such sub-lexical connections would become less functional in expert readers, who have already abandoned the grapho-phonological transcoding procedures during reading and rely on fast access to lexical orthographic representations. This change in reading strategy, which results in a reduction of the strength of the sub-lexical connections between phonology and orthography, provides an explanation for the near absence of the orthographic effect in the shadowing task performed by adults.

Given that shadowing can be, but is not necessarily, performed at the prelexical level, we used this task to test the lexical account of the orthographic influence on speech processing by exploring the conditions under which adults could display a consistency effect. As illustrated in Radeau et al.’s (2000) study, a lexical variable such as the uniqueness point influences shadowing only if the words are presented at a slow speaking rate (2.2 syll/s) but not with a somewhat faster rate (3.6 syll/s). This suggests that within the same speech processing situation, lexical information takes more time to emerge than prelexical information. Under the hypothesis that, in adults, the consistency effect originates from lexical processing, an experimental manipulation that would render prelexical processing difficult, thus offering lexical (including word spelling) knowledge an opportunity to influence perception of the items to be shadowed, would most likely lead to the observation of an orthographic consistency effect.

In other words, the presence of an orthographic consistency effect only when shadowing was performed at the lexical level would confirm our previous findings that the interaction between the spoken and the written codes was restricted to the lexical processing level (Ventura et al., 2004; Pattamadilok et al., 2007b). This would question the existence of feedback connections from the lexical (including orthographic knowledge) to the prelexical processing level or, at least, the existence of direct connections between phonology and orthography at this early stage. On the contrary, the presence of an orthographic consistency effect in this task even when it was performed at the prelexical level would provide further evidence that the interaction between the spoken and written codes takes place at all speech processing levels.

The experimental manipulation we resorted to is the presentation of the stimuli in clear or in noise. We thus compared shadowing of words and pseudowords presented either against a silent or a noisy background (henceforth, silent vs. noise condition, respectively). Obviously, shadowing in noise is a more difficult task than shadowing a clear input. To better interpret the observation, which we predicted, of an orthographic effect only in noisy shadowing, we also manipulated word frequency, i.e., a marker of lexical processing. The occurrence of the orthographic consistency effect in this noisy situation without evidence of lexical processing would suggest that orthographic knowledge also affects prelexical processing, even in adults.

On the contrary, the co-occurrence of the orthographic and frequency effects or the interaction between these two factors would, at this stage of research, reinforce the lexical account of the orthographic consistency effect. When the task is made more difficult, the cognitive system may indeed resort to other kinds of information or processing mechanisms (in this case, lexical knowledge, including orthographic knowledge) that otherwise would not be necessary to achieve good performance.

Materials and Methods

Participants

A total of 42 undergraduate students of the Université Libre de Bruxelles (8 men and 34 women, aged 17–45 years; average: 21.6 years) participated in the experiment as part of a psycholinguistics course. All were native French speakers. None reported any hearing or language disorder. Twenty participants were tested in the silent condition and the remaining 22 in the noise condition.

Stimuli

All the stimuli were recorded by a male native French speaker in a soundproof room on a MiniDisk recorder. They were digitized at a sampling rate of 32 kHz with 16-bit analog-to-digital conversion, using the Sound Tools/DigiDesign editor software on a Macintosh SI Computer. The complete set of stimuli consisted of 160 monosyllabic French stimuli. Half of them were words and the other half were pseudowords (Tables A1 and A2 in Appendix). The word list included 20 sets of four words sharing at least their initial phoneme. Initial phoneme matching was crucial since differences in the articulatory requirements for producing different sounds, as well as difficulties in detecting different sounds with the voice key, might affect the RT data (Kessler et al., 2002; Rastle and Davis, 2002). Each stimulus set consisted of one consistent/high-frequency word, one consistent/low-frequency word, one inconsistent/high-frequency word, and one inconsistent/low-frequency word. Consistent words (i.e., those with phonological rimes that are spelled in only one way) and inconsistent words (i.e., those with phonological rimes that can be spelled in more than one way) were selected on the basis of Ziegler et al.’s (1996) statistical analysis of bi-directional consistency of spelling and sound in French. This database provides the phonological rimes of monosyllabic words with their corresponding spelling and consistency ratio. The consistency ratio is the summed frequency of the words with the same rime and body relative to the summed frequency of words with the same rime (Ziegler and Ferrand, 1998). It thus reflects the degree of rime/body consistency and varies between 0 and 1, being, by definition, 1 for consistent words.
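The definition of the consistency ratio given above can be illustrated with a minimal sketch. The lexicon entries, rime and body labels, and frequencies below are hypothetical placeholders and are not taken from Ziegler et al.’s (1996) database; the function name is likewise an assumption made for illustration.

```python
from collections import defaultdict


def consistency_ratios(lexicon):
    """Return {word: consistency ratio} for a list of (word, rime, body, frequency).

    Ratio = summed frequency of words sharing both rime and body,
    divided by the summed frequency of words sharing the rime.
    """
    rime_total = defaultdict(float)
    rime_body_total = defaultdict(float)
    for _word, rime, body, freq in lexicon:
        rime_total[rime] += freq
        rime_body_total[(rime, body)] += freq
    return {
        word: rime_body_total[(rime, body)] / rime_total[rime]
        for word, rime, body, _freq in lexicon
    }


# Hypothetical toy lexicon: rime R1 has two spellings (b1, b2), rime R2 has one.
toy_lexicon = [
    ("w1", "R1", "b1", 30.0),
    ("w2", "R1", "b2", 10.0),
    ("w3", "R2", "b3", 5.0),
]
ratios = consistency_ratios(toy_lexicon)
```

In this toy case the word with the single-spelling rime receives a ratio of 1, and the two spellings of the other rime share the total in proportion to their summed frequencies.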
The word stimuli were matched for the following variables across conditions: mean duration, number of phonological and orthographic neighbors (i.e., words that can be obtained by replacing one phoneme or one letter by another phoneme or another letter, respectively), number of phonemes, number of letters, and phonological and orthographic uniqueness points (see Table A3 in Appendix). Consistent and inconsistent words were also matched for their frequency within either the ensemble of high-frequency words or the ensemble of low-frequency words (all ps > 0.10; New et al., 2004). The pseudoword list included 40 pairs of pseudowords that shared at least their initial phoneme. Each pair contained one consistent pseudoword (that ended with a consistent rime) and one inconsistent pseudoword (that ended with an inconsistent rime). They were matched for the following variables across conditions: mean duration, number of phonological neighbors, deviation point, and number of phonemes (all ps > 0.20, see Table A4 in Appendix). In the noise condition, zero-mean Gaussian white noise was added independently to each speech signal sample. The signal-to-noise ratio was 12.5 dB.
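The noise manipulation can be sketched as follows. The text states only that zero-mean Gaussian white noise was added sample by sample at a signal-to-noise ratio of 12.5 dB; the sketch assumes that the ratio is defined as mean signal power over noise variance, computed over the whole stimulus, which the text does not specify. The function names and the sine-wave test signal are illustrative assumptions.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Add independent zero-mean Gaussian noise to each sample at a target SNR (dB).

    Assumption: SNR = 10 * log10(mean signal power / noise variance).
    """
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR (dB) recovered from a clean signal and its noisy version."""
    noise = [n - c for c, n in zip(clean, noisy)]
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum(e * e for e in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)


# Illustrative 1-s, 440-Hz tone sampled at 32 kHz (the sampling rate of the stimuli).
rng = random.Random(1)
clean = [math.sin(2 * math.pi * 440 * n / 32000) for n in range(32000)]
noisy = add_white_noise(clean, 12.5, rng)
```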

Procedure

Participants were tested individually in a quiet room. The stimuli were presented to the listener through headphones. The average intensity level of the stimuli was 70 dB and was kept constant across participants.

During the task, participants were instructed to listen carefully to each stimulus and then to repeat it as rapidly and accurately as possible. The vocal response triggered a voice key connected to a button box. Naming latencies were measured from the onset of the stimulus to the onset of the participants’ vocal response. Responses were recorded, which enabled identification of naming errors. In both tasks, presentation, timing, and RT data collection were controlled by E-prime 1.1 software (Schneider et al., 2002).

The stimuli were divided into four blocks of 40 stimuli each. Order of stimulus presentation was pseudo-randomized, with the constraints that words or pseudowords and consistent or inconsistent stimuli never occurred more than three times in a row and that the same phonological rime never occurred consecutively. Each block started with two fillers. There was a 1.5-s interval between each response and the beginning of the next trial. After each block, participants decided when they were ready to continue. The experimental session was preceded by one practice block of 10 trials consisting of five words and five pseudowords randomly presented. The rimes of these practice stimuli were different from those of the critical stimuli. The experiment lasted about 20 min. The same lists of stimuli and procedure were used in the silent and noise condition.
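The ordering constraints described above can be sketched with simple rejection sampling: shuffle a block and keep the first order that satisfies every constraint. The text does not say how the orders were actually generated, so this is only one possible approach. The sketch reads the constraints as "no more than three consecutive stimuli of the same lexicality", "no more than three consecutive stimuli of the same consistency", and "no two consecutive stimuli with the same rime"; the dictionary keys are hypothetical.

```python
import random


def max_run(values):
    """Length of the longest run of identical consecutive values."""
    longest = run = 0
    previous = object()  # sentinel that equals nothing in the list
    for value in values:
        run = run + 1 if value == previous else 1
        previous = value
        longest = max(longest, run)
    return longest


def is_valid(order, max_repeat=3):
    """Check the three ordering constraints on a list of stimulus dicts."""
    return (
        max_run([s["lexicality"] for s in order]) <= max_repeat
        and max_run([s["consistency"] for s in order]) <= max_repeat
        and max_run([s["rime"] for s in order]) <= 1
    )


def pseudo_randomize(stimuli, rng, max_tries=100_000):
    """Shuffle until a valid order is found (rejection sampling)."""
    order = list(stimuli)
    for _ in range(max_tries):
        rng.shuffle(order)
        if is_valid(order):
            return list(order)
    raise RuntimeError("no valid order found within max_tries")
```

Rejection sampling is easy to verify but can need many shuffles for long lists; a constructive procedure may be preferable when the constraints are tighter.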

To control for orthographic knowledge, the word stimuli were presented through headphones once more at the end of the session (always without noise), one at a time with a 4-s interval between stimuli. The participants were asked to write down each critical stimulus as they heard it.

Results

Preliminary Data Inspection

Before performing the analyses, we looked at each subject’s performance in the spelling task. The RT and the accuracy data of the critical word trials that were not spelled correctly were discarded (corresponding to 1.8% of all critical word trials, on average). Inspection of the accuracy scores of the remaining data led us to discard one subject from the silent condition and two subjects from the noise condition whose scores were lower than the mean accuracy of the group minus 3 SD. For the remaining subjects, RTs more than 3 SD above or below the mean RT were also discarded from further analyses. This was done by subject separately for each stimulus type (as defined by frequency and consistency), leading us to eliminate 1.1 and 2.1% of the RT data on words and pseudowords, respectively.
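The RT trimming rule can be sketched as follows. The use of the sample SD (n − 1 denominator) is an assumption, since the text does not say which estimator was used; the function names and the trial tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_rts(rts, n_sd=3.0):
    """Keep only RTs within mean ± n_sd * SD of the given list (sample SD assumed)."""
    if len(rts) < 2:
        return list(rts)
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * s]


def trim_by_cell(trials, n_sd=3.0):
    """Apply the trimming separately per subject and stimulus type.

    trials: iterable of (subject, stimulus_type, rt) tuples, where stimulus_type
    is a frequency-by-consistency label such as "HF/consistent".
    """
    cells = defaultdict(list)
    for subject, stimulus_type, rt in trials:
        cells[(subject, stimulus_type)].append(rt)
    return {key: trim_rts(rts, n_sd) for key, rts in cells.items()}
```

With few trials per cell, a single extreme value inflates the SD substantially, so a 3 SD criterion removes only very deviant latencies.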

Finally, further inspection of each item’s error rate showed that among the word stimuli presented in the noise condition, the following words led to more than 40% error: prince (HF/consistent); tronche (HF/inconsistent); bribe, meute (LF/consistent); bouse, couque, latte (LF/inconsistent). Including their data in the analyses might bias the overall result. Therefore, the RT and error rate of each of these items were replaced by the mean RT and mean error rate of the condition to which it belonged. This was done separately for each subject. In order to make the data observed in the silent and the noise conditions comparable, the same procedure was also applied to the same words presented in the silent condition, although their error rates were far below 40%. As regards the pseudowords presented in the noise condition, only 21 out of 40 consistent pseudowords and 22 out of 40 inconsistent pseudowords led to lower than 40% error. Given this overall high error rate, further analyses were not performed on pseudoword data (although their data are provided in Tables 1 and 2 for information).
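The item replacement procedure can be sketched for a single subject as follows. The sketch assumes that the condition mean is computed over the non-flagged items only, which the text does not state explicitly. Item labels and RT values are hypothetical.

```python
from statistics import mean


def replace_flagged_items(trials, flagged):
    """Replace the value of each flagged item by its condition mean.

    trials: {item: (condition, value)} for ONE subject.
    flagged: set of items to replace.
    Assumption: the condition mean excludes the flagged items themselves;
    a condition in which every item is flagged would raise KeyError.
    """
    by_condition = {}
    for item, (condition, value) in trials.items():
        if item not in flagged:
            by_condition.setdefault(condition, []).append(value)
    cleaned = {}
    for item, (condition, value) in trials.items():
        if item in flagged:
            value = mean(by_condition[condition])
        cleaned[item] = (condition, value)
    return cleaned


# Hypothetical RTs (ms) for one subject.
subject_trials = {
    "item1": ("HF/consistent", 600.0),
    "item2": ("HF/consistent", 700.0),
    "item3": ("HF/consistent", 1200.0),  # flagged item
    "item4": ("LF/consistent", 800.0),
}
cleaned = replace_flagged_items(subject_trials, {"item3"})
```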

Table 1. Mean raw RTs for correct responses and error rates (SD in brackets) observed in the silent condition.

Table 2. Mean raw RTs for correct responses and error rates (SD in brackets) observed in the noise condition.

Raaijmakers et al. (1999) and Raaijmakers (2003) called the routine use of analyses by subjects and by items into question and proposed taking the details of the experimental design into account before deciding on the particular analyses of variance (ANOVA) to be performed. According to the authors, the traditional F1 is the correct test statistic when item variability is experimentally controlled by matching items across conditions. Based on the simulation results of Wickens and Keppel (1983), the authors concluded that taking both subjects and items as random factors might considerably reduce the power of the analysis, especially when the matching of the items is not taken into account in the item analysis.

Given that in the present study the stimuli presented in the different experimental conditions were not chosen randomly, but were matched as closely as possible on potentially relevant psycholinguistic variables, the results presented below are based only on analyses by subjects (F1). In order to ascertain the reliability of the reported effect, the effect size and the confidence interval around it (Dunlap et al., 1996; Johnson and Eagly, 2000) were also estimated. Finally, despite being overly conservative as regards the experimental design used here, a mixed-effects model was also used to perform the main analyses on both the RT and accuracy data.

Reaction Time Analyses

The ANOVA run on the RT data of correct responses included consistency (consistent vs. inconsistent) and frequency (high vs. low frequency) as within-subject factors. Listening condition (silent vs. noise) was treated as a between-subjects factor. The analysis showed a significant effect of listening condition [F(1,37) = 8.0, p < 0.01]. The three-way interaction almost reached significance [F(1,37) = 3.8, p = 0.058]. As illustrated in Tables 1 and 2, further analyses performed separately on the RT data obtained in the silent and the noise conditions showed no significant effect or interaction in the silent condition (Fs ≤ 1). Interestingly, in the noise condition, while neither the main effect of frequency (F < 1) nor that of consistency was significant [F1(1,19) = 3.1, p = 0.10], there was a significant interaction between these two factors [F(1,19) = 5.9, p = 0.025]. The interaction reflected longer RTs for inconsistent compared to consistent high-frequency words [F(1,19) = 34.6, p < 0.00005]. No consistency effect was found on low-frequency word repetition (F < 1). A more detailed analysis of the consistency effect also showed that while consistent low-frequency and high-frequency words did not differ in their RTs [t < 1], orthographic inconsistency slowed down the RTs on high-frequency words [t1(19) = 2.1, p = 0.05].

Accuracy Analyses

The same ANOVA run on the error rates showed a significant effect of listening condition [F(1,37) = 67.2, p < 0.00001] as well as a frequency × consistency interaction [F(1,37) = 5.7, p < 0.025]. The frequency effect [F(1,37) = 3.8, p = 0.06] and the three-way interaction [F(1,37) = 3.5, p = 0.07] almost reached significance. As shown in Tables 1 and 2, the silent condition showed a marginal frequency effect [F(1,18) = 3.4, p = 0.08] with slightly better performance observed on high-frequency words (0.6 vs. 1.4% ERR for high- and low-frequency words, respectively). As regards the noise condition, there was a significant frequency × consistency interaction [F(1,19) = 5.0, p < 0.05]. Consistent with the RT data, the orthographic influence was restricted to high-frequency words [F(1,19) = 5.0, p < 0.05], with better performance obtained on consistent words. As regards low-frequency words, an apparently lower error rate in the inconsistent condition was not statistically significant [F(1,19) = 1.9, p > 0.10], probably due to the high variability of the scores. A direct comparison of the error rates obtained on high-frequency and low-frequency words showed that while orthographic inconsistency did not affect the performance [t < 1], orthographic consistency improved the performance obtained on high-frequency words [t(19) = 2.3, p < 0.05].

Additional Analyses

Does the orthographic effect observed in the noise condition reflect the specific involvement of lexical processing or is it merely associated with the longer reaction times and lower accuracy of this condition?

In the analyses presented above, the orthographic consistency effect was found only in the noise condition, where the participants took more time to repeat words and committed more errors. The presence of the orthographic effect could thus be explained either by the mere difference in overall performance between the two listening conditions (i.e., longer RTs and/or higher error rates would have allowed the effect to emerge) or by the different cognitive processes that had been recruited (i.e., the task being performed at the lexical level). Although these two factors cannot be completely dissociated, given that accessing lexical information takes more time than accessing prelexical information (Radeau et al., 2000), it would be interesting to ascertain that longer RTs and/or higher error rates per se were not the sole cause of the emergence of the orthographic effect. To do so, we standardized the raw RTs and error rates of each listening condition. More precisely, within each listening condition, the mean RT (and mean error rate) of the group was subtracted from the raw RTs (and error rates), and the result was divided by the SD of the group. As a result, in both listening conditions, the mean and the SD of these standardized values were 0 and 1, respectively. If the emergence of the orthographic effect were due to different cognitive processes engaged in the silent and the noise conditions rather than to a simple increase in RTs and/or error rates in the noise condition, we should observe the same result pattern as the one previously obtained in the main analyses.
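The standardization step amounts to a z-transformation within each listening condition, which can be sketched as follows. The RT values are hypothetical, and the use of the sample SD is an assumption. The second list is a shifted and stretched copy of the first, mimicking a condition with overall longer and more variable RTs.

```python
from statistics import mean, stdev


def standardize(values):
    """z-transform: subtract the group mean, divide by the group SD (sample SD assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical RTs (ms): the "noise" values are longer and more spread out.
silent_rts = [400.0, 450.0, 500.0, 550.0]
noise_rts = [2 * rt + 300 for rt in silent_rts]

z_silent = standardize(silent_rts)
z_noise = standardize(noise_rts)
```

After the transformation both conditions have mean 0 and SD 1, so any remaining difference in the pattern of effects cannot be attributed to the overall level or spread of performance.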

The same ANOVA as in the main analyses was performed on the standardized RT data. We observed again an almost significant three-way interaction [F(1,37) = 3.5, p = 0.069]. As illustrated in Table 3, while no significant effect or interaction was found in the silent condition (all Fs < 1), in the noise condition there was a significant interaction between consistency and frequency [F(1,19) = 5.9, p = 0.025], reflecting a deleterious effect of orthographic inconsistency on high-frequency word repetition [F(1,19) = 34.6, p < 0.00005]. This confirmed the results obtained in the main analyses. A similar result pattern was obtained on the standardized error rates (cf. Table 4). Although the three-way interaction was no longer significant (F < 1), further analyses performed separately on the data from the two listening conditions provided results that were consistent with the RT data. No significant effect or interaction was found in the silent condition [Consistency: F = 1; Frequency: F(1,18) = 3.35, p > 0.05; Consistency × frequency: F = 1]. In the noise condition, the consistency × frequency interaction was significant [F(1,19) = 5.02, p < 0.05], with better performance for consistent compared to inconsistent high-frequency words [F(1,19) = 4.61, p < 0.05].

Table 3. Mean standardized RTs for correct responses and error rates (SD in brackets) observed on word stimuli in the silent condition.

Table 4. Mean standardized RTs for correct responses and error rates (SD in brackets) observed on word stimuli in the noise condition.

Reliability of the orthographic effect: Computation of the effect size and the confidence interval around the observed effect size

To further ascertain that there was a genuine influence of noise on the occurrence of the orthographic effect, we directly compared the effect sizes obtained in the two listening conditions. This method of comparison has an advantage over the ANOVA that enters listening condition as a between-subjects factor (as we did in the main analysis) because it is independent of the variability and the distribution of the data inherent to each condition.

Accordingly, the performance differences observed between consistent and inconsistent high-frequency words in the silent and the noise conditions were compared through the estimation of the RT consistency effect sizes, using the necessary adjustment procedure for repeated measures (Dunlap et al., 1996; Johnson and Eagly, 2000). The effect size obtained in the noise condition was d = 1.3 while it was d = −0.008 in the silent condition. The lower limit obtained from the estimation of the 5% one-sided confidence interval for the effect size obtained in the noise condition was d = 0.29 (Cohen, 1994; Schmidt, 1996). Thus, the difference found in the silent condition was clearly out of the range of the confidence interval for the effect size of the noise condition. As for the error rates, the effect size obtained in the noise condition was d = 0.48; while this is small, the effect was completely absent (d = 0) in the silent condition.
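A repeated-measures effect size of this kind can be sketched as follows. The sketch uses one commonly cited form of the adjustment attributed to Dunlap et al. (1996), d = t_c · sqrt(2(1 − r)/n), where t_c is the paired t statistic and r the correlation between the two measures; whether this exact form was the one applied here is an assumption. The data are hypothetical.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def repeated_measures_d(cond_a, cond_b):
    """Effect size for paired data: d = t_c * sqrt(2 * (1 - r) / n)."""
    n = len(cond_a)
    diffs = [b - a for a, b in zip(cond_a, cond_b)]
    t_c = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # paired t statistic
    r = pearson_r(cond_a, cond_b)
    return t_c * math.sqrt(2 * (1 - r) / n)


# Hypothetical per-subject scores in two within-subject conditions.
consistent = [1.0, 2.0, 3.0, 4.0]
inconsistent = [3.0, 2.0, 5.0, 4.0]
d = repeated_measures_d(consistent, inconsistent)
```

When the two conditions have equal variances, this expression reduces to the mean difference divided by the common SD, so the correlation between measures does not inflate the effect size as it would if d were derived from the paired t alone.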

Replication of the main analysis using the mixed-effects model (Baayen et al., 2008)

Although we considered that treating items as a random factor is not appropriate in the current experimental design, where the items in the different conditions were carefully matched across several psycholinguistic variables, we also reanalyzed the RT and the accuracy data separately using a mixed-effects model. The analysis on error rates concerned 2996 observations. The one performed on RTs concerned the 2808 correct responses. Visual inspection of the shadowing latencies showed that the distribution was normal. No transformation was applied to the raw data. Item and subject were treated as random factors. Listening condition, frequency, consistency, as well as their interactions were treated as fixed factors. The details of the results are in Table 5. In sum, we obtained a result pattern similar to that of the main ANOVAs presented above, with a significant effect of listening condition and a significant interaction between listening condition, frequency, and consistency on both the RT and accuracy data.

Table 5. Summary of the mixed-effects model on the RTs and the error rates.

Discussion

Today, the influence of orthographic knowledge on speech processing is generally accepted. Nevertheless, there is still a controversy on whether this conclusion can be applied to all speech processing tasks. Addressing this issue requires one to investigate the orthographic influence in different speech processing situations that vary in the processing stages and nature of the representations they involve.

Although the influence of orthographic knowledge in tasks that require lexical processing has been consistently observed (e.g., Ziegler and Ferrand, 1998; Ventura et al., 2004; Pattamadilok et al., 2007b; Ziegler et al., 2008), this is not the case in tasks that can be performed at the prelexical processing level, like shadowing (Ventura et al., 2004; Ziegler et al., 2004; Pattamadilok et al., 2007b). The present study aimed at identifying the processing level at which the orthographic effect emerges, which would also shed light on the inconsistent results previously obtained in this specific task.

The present experimental design allowed us to compare the occurrence of the orthographic consistency effect in two versions of the shadowing task, namely, when the speech signal was clearly presented (the silent condition) and when its quality was degraded by added noise (the noise condition). The result obtained in the silent condition replicated the findings previously obtained by Ventura et al. (2004) and Pattamadilok et al. (2007b), i.e., no hint of an orthographic consistency effect. In contrast, when background white noise was added to the speech signal, the overall performance level decreased and orthographic consistency affected both RTs and error rates, with better performance in the consistent condition. Interestingly, this effect was restricted to high-frequency words.

The Sources of the Orthographic Effect in Shadowing in Noise

Understanding the impact of noise on the way speech is processed is important for revealing the mechanism underlying the occurrence of the orthographic effect in the current study. Although unlikely, it is still possible that the longer RTs and higher error rates obtained in the noise condition are the critical factor that allowed the orthographic effect to become observable, regardless of the cognitive processes that came into play. However, this explanation in terms of a ceiling effect is not supported by the results of the additional analyses performed on the standardized data. In fact, the orthographic effect was still present in, and restricted to, the noise condition even when we took the overall performance differences between listening conditions into account.
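The logic of controlling for overall performance differences can be illustrated with a minimal sketch. It assumes that standardization means converting RTs to z-scores within each listening condition; the exact procedure is not restated in this section, so this is an assumption, as are the function names and the toy values, which are invented for illustration and are not data from the study.

```python
from statistics import mean, stdev


def zscore_within_condition(trials):
    """Add a z-scored RT to each trial, standardizing within listening condition."""
    rts_by_condition = {}
    for trial in trials:
        rts_by_condition.setdefault(trial["condition"], []).append(trial["rt"])
    # Mean and sample standard deviation of the RTs in each listening condition.
    stats = {c: (mean(rts), stdev(rts)) for c, rts in rts_by_condition.items()}
    standardized = []
    for trial in trials:
        m, sd = stats[trial["condition"]]
        standardized.append({**trial, "z": (trial["rt"] - m) / sd})
    return standardized


def consistency_effect(trials, condition):
    """Mean z for inconsistent items minus mean z for consistent items."""
    def cell_mean(consistency):
        return mean(
            t["z"] for t in trials
            if t["condition"] == condition and t["consistency"] == consistency
        )
    return cell_mean("inconsistent") - cell_mean("consistent")


# Toy RTs in ms, invented for illustration only (hypothetical, not study data).
toy_trials = [
    {"condition": "silent", "consistency": "consistent", "rt": 700},
    {"condition": "silent", "consistency": "inconsistent", "rt": 710},
    {"condition": "silent", "consistency": "consistent", "rt": 690},
    {"condition": "silent", "consistency": "inconsistent", "rt": 700},
    {"condition": "noise", "consistency": "consistent", "rt": 900},
    {"condition": "noise", "consistency": "inconsistent", "rt": 960},
    {"condition": "noise", "consistency": "consistent", "rt": 880},
    {"condition": "noise", "consistency": "inconsistent", "rt": 940},
]

z_trials = zscore_within_condition(toy_trials)
for condition in ("silent", "noise"):
    print(condition, consistency_effect(z_trials, condition))
```

After standardization, both listening conditions have a mean of 0 and a standard deviation of 1, so a consistency effect computed on the z-scores cannot be attributed merely to the overall slowing produced by noise.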

Another plausible explanation relies on the assumption that increasing task difficulty may, under some circumstances, induce changes in the processes involved in performing a task. This was illustrated in Obleser et al.’s (2007) study, in which participants were required to listen to noise-vocoded sentences (Shannon et al., 1995) that varied in level of intelligibility (high, intermediate, low) and semantic predictability (low and high). At the behavioral level, the authors showed that semantic predictability improved performance most effectively at an intermediate signal quality, but not when intelligibility was high (e.g., in normal speech) or extremely low. In accordance with this result, their brain imaging data showed that this increase in comprehension, observed when speech of intermediate intelligibility was presented in a highly predictive semantic context, was associated with an increase in the functional connections between areas in the temporal, inferior parietal, and prefrontal cortices. The activity in these areas returned to baseline in an easy perceptual situation, which further suggested that the integration of the semantic context became relevant only when the signal was degraded but still intelligible. According to the authors, the widespread activations are likely to represent a number of supporting cognitive mechanisms that come into play in difficult speech processing situations. Mechanisms such as attention, monitoring, selection, and working memory would be necessary for processing the bottom-up acoustic signal, extracting meaning, and selecting among the many word candidates activated by the ambiguous signals.
This neuronal account also provides a possible explanation for phenomena like “spectral restoration,” “phonemic restoration,” or “auditory induction,” where higher-order information is used to compensate for a poor acoustic signal (e.g., Warren, 1970; Warren et al., 1972; Samuel, 1981; Assmann and Summerfield, 2004; McClelland et al., 2006). However, whether these phenomena occur at a conscious or unconscious level and result from automatic or strategic processes remains a matter of debate (McQueen et al., 2006).

How could this argument be applied to the orthographic consistency effect in shadowing? When the acoustic signal is clear, speech can be accurately perceived and reproduced merely on the basis of the acoustic information. Although it is impossible to completely rule out the contribution of higher-order information such as lexical, semantic, or syntactic information, several studies have shown that the effects of these factors are extremely reduced in shadowing in comparison to more demanding speech processing tasks (e.g., Balota and Chumbley, 1985; Marslen-Wilson, 1985; Connine et al., 1990; Radeau and Morais, 1990; Radeau et al., 1995). But in situations where the acoustic signal is degraded, the speech processing system resorts to the redundancy that is present at several levels, namely, acoustic, phonetic, phonological, semantic, syntactic, or pragmatic, as mentioned above. From this perspective, orthography, which is a form of lexical knowledge, could also act as an additional source of redundancy that may help or hurt speech processing, depending upon the relation between the two codes. As already demonstrated in many studies, while orthographic consistency or congruency between phonology and orthography improves speech processing, orthographic inconsistency or incongruency between the two codes appears to be deleterious (e.g., Seidenberg and Tanenhaus, 1979; Donnenwerth-Nolan et al., 1981; Zecker, 1991; Dijkstra et al., 1995; Ziegler and Ferrand, 1998).

The Interaction between Orthographic and Word Frequency Effects

Our analyses showed that only high-frequency words were affected by orthographic knowledge. This pattern contrasts with the resonance model which, based on findings obtained in visual word recognition, predicts a stronger consistency effect on low-frequency than on high-frequency words (Seidenberg et al., 1984; Van Orden et al., 1990; Stone and Van Orden, 1994; Stone et al., 1997; Ziegler and Ferrand, 1998). The model assumes that the greater amount of learning for high-frequency words reinforces spelling-to-sound mappings at the largest grain size (i.e., the word level), where inconsistency is smaller than at smaller grain sizes. Seidenberg et al. (1984) explained the occurrence of the spelling-to-sound consistency/regularity effect only on low-frequency words in a reading task by the assumption that high-frequency words are rapidly recognized on the basis of familiar visual information, with pronunciation subsequently read out of memory storage. A similar explanation is offered by dual-route reading models, which assume that the addressed process is faster for high-frequency words. Thus, only low-frequency words would suffer from irregular pronunciations generated by the non-lexical route (Coltheart and Rastle, 1994).

However, more empirical data are needed to decide whether this widely accepted assumption, as well as its underlying mechanism, can be applied to the auditory domain, especially in tasks that could be performed within a single modality (which is not the case in the reading aloud task, where participants are required to pronounce written stimuli). To our knowledge, none of the studies investigating orthographic effects on speech recognition reported a stronger orthographic consistency effect on low- than on high-frequency words (Pattamadilok et al., 2007b, 2009b). The opposite pattern observed here might represent the final outcome of different sub-processes. We argue that adding noise to the speech signal decreased the performance level equally for high- and low-frequency words. This is consistent with the absence of a frequency effect in both the silent and noise conditions. To maintain a good level of performance in this situation, the cognitive system resorted to other kinds of information, including orthographic information. The observation that only high-frequency words were influenced by orthographic knowledge could be explained by the assumption that, although the acoustic input was degraded, the abstract orthographic representations associated with high-frequency spoken words would remain more stable and could be accessed faster than those associated with low-frequency spoken words. This fast access would enable the orthographic representations of high-frequency words to influence shadowing performance before the response had been given.
Although the literature provides no direct evidence on the time course or strength of activation of the orthographic representations associated with high- and low-frequency spoken words, our assumption is plausible given that high-frequency words are encountered more often (in both auditory and visual modalities) and their spellings are generally acquired earlier in reading acquisition (e.g., Backman et al., 1984; Gerhand and Berry, 1998; Sprenger-Charolles et al., 1998; but see Morrison and Ellis, 1995).

The Underlying Mechanism of the Orthographic Effect

The interpretation that orthography provides an additional source of redundancy in difficult speech perception situations implies that the orthographic representations were activated online, that is, that the phonological input contained in the speech signal subsequently activated its corresponding orthographic representation (for more detailed descriptions of this account see Grainger and Ferrand, 1994, 1996; Ziegler and Ferrand, 1998; Grainger et al., 2003). Some previous studies nevertheless provided evidence in favor of an alternative, offline account suggesting that the orthographic consistency effect reflects a modification of the nature of the phonological representations that occurs during reading acquisition (Perre et al., 2009b; Dehaene et al., 2010; Pattamadilok et al., 2010; for more detailed descriptions of this account see Taft and Hambly, 1985; Harm and Seidenberg, 1999, 2004; Muneaux and Ziegler, 2004; Taft, 2006). Although the latter mechanism can nicely account for the orthographic effects previously reported in lexico-semantic tasks where low- and high-frequency words were equally affected by orthography (Pattamadilok et al., 2007b, 2009b; Perre et al., 2009b), it fails to explain the absence of an orthographic effect on low-frequency words observed here. One way for the offline account to accommodate the current finding is to further assume that orthographic knowledge does not modify the nature of the phonological representations of high- and low-frequency words to the same extent and that high-frequency words are more affected by reading acquisition. For instance, high-frequency words would benefit from a more consolidated past reshaping of their phonological representations by spelling. Although this possibility cannot be excluded, there is so far no empirical evidence supporting it.

Implications for Speech Processing Models

As mentioned in the Introduction, both autonomous and interactive speech recognition models could be enriched to accommodate the influence of orthographic representations on spoken word processing. Nevertheless, based on the current finding, neither the occurrence of the orthographic effect nor the mechanism subserving its emergence should be considered a general phenomenon. First, when the effect occurred, it was restricted to the situation that required higher-order cognitive processes rather than the purely perceptual processes that allow shadowing at the prelexical level. Second, the effect was also restricted to the auditory stimuli that allow fast access to their corresponding orthographic representations, i.e., high-frequency words. Although we acknowledge that these observations could be specific to the current speech processing context, together with the other findings reported in the literature, they make it quite clear that neither the models that claim a systematic interaction between phonology and orthography by means of a single mechanism (e.g., the bimodal interactive activation model: Grainger and Ferrand, 1994, 1996) nor those that implicitly assume the absence of such an effect (e.g., the RACE or MERGE models: Cutler and Norris, 1979; Norris et al., 2000) are adequate to account for the existing findings. What is needed is a model flexible enough to accommodate changes in the type of knowledge representations and cognitive processes as a function of task demands and current outcome.

Conclusion

By investigating the occurrence of the orthographic consistency effect in the standard version of the shadowing task and in a situation where the acoustic signal was degraded by additional noise, we observed the orthographic effect only in the latter situation. These data suggest that the cognitive system is able to adjust itself and resort to different levels of processing and representations as a function of task difficulty. In an easy and shallow speech processing situation where a good level of performance can be reached on the basis of elementary cognitive processes and information alone, there is no need for the system to resort to more complex mechanisms. However, in a situation where speech intelligibility is compromised, the cognitive system attempts to maintain a good level of performance by making use of various kinds of available information, including orthographic knowledge. Nevertheless, the fact that, in the current study, orthographic knowledge exerted its influence only in the difficult speech perception situation does not necessarily imply that the orthographic influence is completely absent in tasks that can be performed at lower processing levels. At this stage of research, one cannot fully exclude the possibility that the behavioral measures used so far are not sensitive enough to reveal orthographic influences at low processing levels. An objective of our future research is to use more fine-grained measures such as ERPs to investigate this issue.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This research was funded by a grant from the Region de Bruxelles-Capitale (Institut d’Encouragement de la Recherche Scientifique et de l’Innovation de Bruxelles, Brains Back to Brussels program), thanks to which Chotiga Pattamadilok benefited from a post-doctoral position, as well as by an Action de Recherche Concertée grant (06/11-342) of the Belgian French community. Régine Kolinsky is Research Director of the FRS-FNRS. We thank Matthieu Dubois and Kevin Diependaele for their help in material preparation and statistical analyses, respectively.

References

Assmann, P., and Summerfield, Q. (2004). “The perception of speech under adverse conditions,” in Speech Processing in the Auditory System, eds S. Greenberg, and W. Ainsworth (Berlin: Springer), 231–308.

Baayen, R. H., Davidson, D. J., and Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. J. Mem. Lang. 59, 390–412.