Syllable frequency and word frequency effects in spoken and written word production in a non-alphabetic script

Zhang, Qingfang; Wang, Cheng

doi:10.3389/fpsyg.2014.00120

ORIGINAL RESEARCH article

Front. Psychol., 18 February 2014

Sec. Cognitive Science

Volume 5 - 2014 | https://doi.org/10.3389/fpsyg.2014.00120

This article is part of the Research Topic: Writing words: From brain to hand(s)

Qingfang Zhang¹,²*

Cheng Wang²

¹Department of Psychology, Renmin University of China, Beijing, China
²Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China

The effects of word frequency (WF) and syllable frequency (SF) are well-established phenomena in domains such as spoken production in alphabetic languages. Chinese, as a non-alphabetic language, presents unique lexical and phonological properties in speech production. For example, the proximate unit of phonological encoding is the syllable in Chinese but the segment in Dutch, French, or English. The present study investigated the effects of WF and SF, and their interaction, in Chinese written and spoken production. Significant facilitatory WF and SF effects were observed in spoken as well as in written production. The SF effect in writing indicated that phonological properties (i.e., syllabic frequency) constrain orthographic output via a lexical route, at least in Chinese written production. However, the SF effect over repetitions diverged between the two modalities: it was significant in the first two repetitions in spoken production, whereas it was significant only in the second repetition in written production. Given the fragility of the SF effect in writing, we suggest that the phonological influence in handwritten production is neither mandatory nor universal, and that it is modulated by experimental manipulations. This provides evidence for the orthographic autonomy hypothesis rather than the phonological mediation hypothesis. The absence of an interaction between WF and SF showed that the SF effect is independent of the WF effect in both spoken and written output modalities. The implications of these results for written production models are discussed.

Introduction

Although effects of word frequency (WF) and syllable frequency (SF) have been investigated systematically in the speech production domain, only a few studies have addressed similar issues in the written production of normal subjects (e.g., Bonin et al., 1998a; Bonin and Fayol, 2002). As a result, the study of WF and SF is far more advanced in speaking than in writing. The current view of speech production provides a general theoretical framework from which hypotheses specific to writing can be derived. Hence, it could be argued that the generation of written words should be investigated in close parallel to spoken production. In the work reported here, we investigated WF and SF effects, and their interaction, in spoken and written production in Chinese, and addressed the general question of whether written production is independent of or dependent on spoken production.

A central theoretical issue in the field concerns the extent to which written production is autonomous from, or dependent on, spoken production. Early theoretical accounts claimed that the retrieval of an orthographic representation was entirely dependent on the prior retrieval of phonological codes, a view called the obligatory phonological mediation hypothesis. Evidence supporting this view comes from the common introspective experience of how written codes are generated (Hotopf, 1980), and from phonologically mediated spelling errors such as homophone substitutions (e.g., there for their) or quasi-homophone substitutions (e.g., dirth for dearth) (Aitchison and Todd, 1982). In addition, neuropsychological patients with writing disorders present comparable impairments in spoken and written language production (Luria, 1970; Basso et al., 1978).

However, other neuropsychological studies have demonstrated dissociations between spoken and written production. For example, Rapp et al. (1997) presented the case of a neurologically impaired individual who was often able to write the names of pictures correctly while being unable to provide the correct spoken names. Miceli et al. (1997) reported a patient who, when presented with a picture, sometimes generated different spoken and written responses (e.g., for a picture of pliers, he would say pincers but write saw) (see Alario et al., 2003 for a similar case study). Agraphic patients have also produced errors with phonologically illegal spellings (e.g., Caramazza and Miceli, 1990). These findings motivated the “orthographic autonomy hypothesis,” which assumes that individuals can gain access to orthographic representations directly from meaning without phonological mediation (Rapp and Caramazza, 1997).

This account, however, does not necessarily imply that intact writing is unaffected by phonological codes in normal individuals. Relatively few empirical studies have addressed the relationship between phonological and orthographic codes with chronometric tasks, and the results have not been consistent. A few studies have demonstrated that phonological codes indeed influence writing (e.g., Bonin et al., 2001; Zhang and Damian, 2010; Afonso and Álvarez, 2011; Damian et al., 2011). Bonin et al. (2001) manipulated the consistency of phonology-orthography mappings in picture names to identify the potential effects of phonological codes in written picture naming. Word-initial inconsistencies at the sublexical level were found to affect writing latencies: picture names with inconsistent phono-orthographic mapping were written more slowly than those with consistent ones, whereas no difference was found when consistency was manipulated at the lexical level. This finding further suggests that phonology affects orthographic encoding mainly via the sublexical route. In contrast, Bonin et al. (1998b) did not obtain evidence supporting the role of phonology in a picture writing task.

Overall, although some tentative evidence exists suggesting that phonological codes constrain orthographic output tasks such as handwriting, more evidence is needed to resolve this controversial issue. The experiments reported in this article contribute to this debate by comparing the WF effect and the SF effect in spoken and written production. In the following, we sketch a provisional framework that accommodates the effects of WF and SF, and review previous studies on these effects.

The Word Frequency and Syllable Frequency Effect in Speech Production

The Word Encoding by Activation and VERification model (WEAVER++) is the most detailed theory about word-form encoding in speech production (Roelofs, 1992, 1997a,b; Levelt et al., 1999). WEAVER++ assumes two steps for word-form encoding in speech production. The first step is to select the word's form information in the mental lexicon. There are two kinds of word form information: a word's segmental form and its metrical form. A morpheme initially activates all its corresponding phonological segments and their order. In parallel to this segmental spell-out, metrical codes containing an abstract grouping of syllables into phonological words are retrieved. Segments and metrical structure (frame) are subsequently merged in a syllabification process in a strictly sequential fashion (segment-to-frame association). The second step is to compute or access, from the mental syllabary, the gestural scores for a phonological word's syllables; this process is sometimes called phonetic encoding. The mental syllabary is a store of abstract motor routines of syllabic size and can provide pre-compiled gestural scores for phonetic encoding. These two steps are successive and independent in the WEAVER++ model.

It has been demonstrated that the retrieval of word form is sensitive to WF, as high-frequency words are named faster than low-frequency words in picture naming (Oldfield and Wingfield, 1965; Wingfield, 1968). In Jescheniak and Levelt's (1994) study, frequency effects were not obtained in either an object recognition task or a delayed word production task. The object recognition task taps into the conceptual representations of speech production, while the delayed naming task taps into the articulation process; therefore, the absence of frequency effects in both tasks indicates that the WF effect is lexical in origin. They further demonstrated that the WF effect in word production is due mainly to accessing the phonological forms of words. Other evidence supporting the attribution of frequency effects to phonological forms comes from studies of homophone production (e.g., Stemberger and MacWhinney, 1986; Dell, 1990; La Heij et al., 1999; Jescheniak et al., 2003). However, other researchers (Caramazza et al., 2001; Bonin and Fayol, 2002; Shatzman and Schiller, 2004; Cuetos et al., 2010) failed to find a homophone frequency effect. These findings question the conclusion that the WF effect arises at the stage of retrieving a word's phonological form and support a lexical origin of the WF effect, although they do not deny its influence at the phonological level.

Analogous to the findings for the WF effect, if the mental syllabary consists of retrievable representations corresponding to syllables, then the stored syllables should exhibit a frequency effect; that is, a syllable used frequently in the language should be retrieved faster than one used less frequently. Levelt and Wheeldon (1994) tested this storage hypothesis by comparing retrieval latencies for high- vs. low-frequency syllables. They found that words with high-frequency syllables were named faster than words with low-frequency syllables when WF was matched. The mental syllabary hypothesis assumes that pre-compiled gestural scores for the articulators are retrieved, rather than motor programs being constructed for each syllable on-line. If a syllable is retrieved from the mental syllabary rather than computed on-line, then the retrieval process should be sensitive to frequency differences. Therefore, Levelt and Wheeldon interpreted this finding as support for the notion of a mental syllabary.

The hypothesis that content syllables are explicitly represented as chunks and retrieved from a mental syllabary has been tested by investigating whether or not speech production performance is sensitive to SF. SF effects were obtained in a number of studies in different alphabetic languages with words and pseudowords (German: Aichert and Ziegler, 2004; Dutch: Levelt and Wheeldon, 1994; Cholin et al., 2006; Spanish: Perea and Carreiras, 1998; Carreiras and Perea, 2004; French: Laganaro and Alario, 2006; English: Macizo and Van Petten, 2007; Cholin et al., 2011). The effect of SF in spoken output is usually facilitatory, which might be due to faster access to articulatory-phonetic syllable programs for high-frequency syllables (Levelt, 1989; Levelt and Wheeldon, 1994; Levelt et al., 1999). Laganaro and Alario (2006) manipulated syllable frequencies in immediate and delayed picture naming and pseudoword naming tasks, with or without articulatory suppression (i.e., repetition of the syllable /ba/), to investigate the assumption that stored syllables are retrieved during phonetic encoding. An SF effect was found in immediate pseudoword naming, in picture naming, and in a delayed naming task with articulatory suppression, but was not observed in standard delayed naming. Because articulatory suppression disrupts phonetic processing but not phonological encoding, this pattern of results is interpreted as evidence that SF affects the phonetic encoding stage.

In sum, available evidence suggests that WF plays a role at the lexical level while SF plays a role during accessing stored syllabic units (a later stage of word-form encoding) in speech production.

The Word Frequency and Syllable Frequency Effect in Written Production

Bonin et al. (1998a,b) have shown that frequency effects in writing are genuinely lexical, because they did not find a significant frequency effect either in an object recognition task or in a delayed written picture naming task. Bonin and Fayol (2002) further investigated WF effects in written and spoken production of homophonic picture names, and found homophone frequency effects in spoken as well as in written production: high-frequency heterographic homophonic picture names were produced faster than low-frequency picture names. They excluded the possibility that the effects arise at the conceptual level in a picture categorization task, and suggested that the WF effect in writing is lexical in origin. According to the obligatory phonological mediation hypothesis, the locus of WF effects should be the same in both spoken and written production, namely at the phonological lexeme level. In contrast, according to the orthographic autonomy hypothesis, which claims that orthographic representations can be accessed directly from semantic representations, the most likely locus of WF effects is at the orthographic lexeme level.

We are not aware of any studies examining the SF effect in written production. However, a few studies have demonstrated that syllables modulate processes of written production. Kandel et al. (2006a,b) observed that French 1st–5th graders write words and pseudowords syllable by syllable, suggesting that the children used the syllable as a unit for chunking letter strings in a coherent way. The syllable effect in handwriting has been demonstrated in other developmental studies (Kandel et al., 2006b) as well as in adults (Kandel et al., 2006a; Lambert et al., 2007). Kandel and her colleagues also examined the nature of syllabic processing in children (Kandel et al., 2009). They manipulated whether orthographic and phonological syllables matched or mismatched in a French word writing task. Third, 4th, and 5th graders were asked to write words that were monosyllabic phonologically (e.g., barque is [baRk]) but bisyllabic orthographically (e.g., barque = bar.que), which were matched to words that were bisyllabic both phonologically and orthographically (e.g., balcon = [bal.kõ] and bal.con). Letter stroke duration and fluency showed significant peaks at the syllable boundary for both types of words, suggesting that children use orthographic syllables rather than phonological syllables in handwriting production.

So far, word and syllable frequency effects have, to our knowledge, not been investigated in Mandarin Chinese, a non-alphabetic language. In alphabetic languages such as Spanish, syllables are predictable from orthography; in Dutch, French, or English, syllables are also predictable, although less so than in Spanish. By contrast, syllables are totally unpredictable from orthography in Chinese. Previous studies indicate that a syllabic effect in written production may be an orthographic rather than a phonological effect (see also Kandel et al., 2009). Due to the unique characteristics of Chinese, a study of the SF effect in Chinese would clarify the role of phonology in written production.

A few studies have addressed the role of the syllable in Chinese spoken production. O'Seaghdha et al. (2010) proposed a model of sequential steps in word form encoding in Mandarin Chinese monosyllabic word production. Similar to WEAVER++, content and structure are separated in the model. Activation from the corresponding abstract word flows to phonological content and structure. Phonological content is activated as syllables, while their syllabic frames are retrieved. Syllabic content and frame are linked sequentially, and metrical tone is also specified at this point. In contrast to the WEAVER++ model for alphabetic languages, syllables are chunks in Mandarin Chinese.

According to O'Seaghdha et al. (2010), syllables are proximate units and are retrieved from the mental lexicon at an early stage of phonological encoding. This has been supported by several studies. For example, there are many syllable-sized phonological speech errors in Chinese, whereas segmental errors are quite rare (Chen, 1993, 2000). Chen et al. (2003) investigated the role of the syllable using a masked priming task, as Ferrand and his colleagues had done in French (Ferrand et al., 1996, 1997). Disyllabic Mandarin Chinese words were used as targets and single Chinese characters were used as primes. In Chen et al.'s third experiment, syllable overlap between the prime and the first syllable of a disyllabic target was manipulated. They found that the CV targets were named faster when preceded by CV primes than when preceded by CVG primes (G represents a glide), whereas the opposite pattern was obtained for the CVG targets. The critical crossover interaction between prime type and target type was significant, and thus provides evidence for the notion that the syllable is a functional unit in speech production. You et al. (2012) obtained syllable priming effects across different stimuli and different tasks (word and picture naming), providing more conclusive data regarding the role of the syllable in Chinese spoken production. In addition, studies using other production tasks, such as the implicit priming task (Chen et al., 2002; O'Seaghdha et al., 2010) and the picture-word interference task in spoken production (Zhang, 2008; Zhang and Weekes, 2009; Zhang et al., 2009) and in written production (Qu et al., 2011), also attest to the important role of the syllable in Chinese. In contrast, most studies in Dutch, French, and English have demonstrated that the proximate unit is the segment in alphabetic languages (Schiller, 1998, 1999, 2000; Brand et al., 2003). These contrasting findings suggest that the role of the syllable in Chinese differs from its role in Dutch, English, or French.

On the other hand, to the best of our knowledge, only a few experimental studies involving normal participants have investigated the extent to which the processes and the representations involved in speech production resemble those involved in written production (Bonin et al., 1997, 1998a,b; Bonin and Fayol, 2000) and then only in French.

In the present study, we aim to investigate the effects of WF, SF and their interaction, on spoken and written production latencies. Although spoken and written language production systems obviously share some processing levels, they also both have some specific processing components (Bonin et al., 1998a). Picture naming and writing are thought to differ beyond the conceptual-semantic level: a phonological lexeme level in naming and an orthographic lexeme level in writing (Ellis, 1982, 1988; Caramazza and Hillis, 1990). Phonological information can serve as input for articulation in spoken production and orthographic information can serve as input for orthographic output in written production. Writing also involves the retrieval of an orthographic plan as well as the execution of a motor program, which is different from speaking execution.

Given that the WF and SF effects are well-established phenomena in domains such as spoken production, it is plausible to predict similar effects in orthographic output tasks within the framework of the obligatory phonological mediation hypothesis, because the retrieval of orthographic codes depends on the retrieval of phonological codes. Our basic assumption was that similar effects would indicate that similar processes are involved in both forms of language production: an SF effect in Chinese written production would provide support for the phonological mediation hypothesis, given the deep orthography-to-phonology mapping in Chinese. In contrast, within the framework of the orthographic autonomy hypothesis, the retrieval of orthographic codes does not require access to phonological codes, and orthographic and phonological representations can be accessed independently. Under this hypothesis, we predict that WF effects would differ between spoken and written output and that no SF effect would appear in written output.

A second purpose of the experiment was to assess the robustness of the effects over repetitions. Are the WF and SF effects ephemeral, that is, do they dissipate with repeated use of the word? Or are they structural, that is, insensitive to repeated processing of an individual item (see also Jescheniak and Levelt, 1994)?

Experiment 1: Word Frequency and Syllable Frequency in Speaking

Methods

Participants

Twenty-four students (12 males; mean age 23.2 years, range 20–25 years) participated and were paid approximately $3. They were randomly recruited from Beijing Forest University and China Agricultural University. All were native Mandarin Chinese speakers with normal or corrected-to-normal vision.

Materials

Sixty target pictures with monosyllabic names were selected from Zhang and Yang's (2003) picture database. Word frequencies were taken from the Modern Chinese Frequency Dictionary (Beijing Language Institute, 1986). The mean number of strokes of the target names was 9.95. A Chinese character's pronunciation (pinyin) corresponds to one syllable, and thus SF was calculated by summing the frequencies of all words sharing a syllable (not counting tone). Of the 60 monosyllabic words, half were high frequency (all ≥130 per million) and half were low frequency (≤47 per million). Among both the high- and the low-frequency words, half had high SF (≥2558 per million) and half had low SF (≤1479 per million). Note that the low-frequency syllables still had an above-average frequency of occurrence in the language. This is important because the WEAVER++ model claims that very low-frequency syllables will be formed on-line rather than retrieved from the mental syllabary. Thus, we used relatively low-frequency syllables in the experiment. Table 1 shows the properties of the picture names and pictures used in the experiments. Statistical analyses showed a significant difference between low and high WF [t₍₅₈₎ = 9.42, p < 0.0001] and a significant difference between low and high SF [t₍₅₈₎ = 9.57, p < 0.0001]. There was no significant difference between low and high WF in naming consistency, familiarity, image consistency, or complexity. All items are reported in the Appendix.
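
The SF computation described above can be illustrated with a minimal Python sketch. It assumes a lexicon supplied as (word, pinyin with a trailing tone digit, frequency per million) triples. The function names and the toy entries, including their frequency values, are hypothetical and are not taken from the Modern Chinese Frequency Dictionary.

```python
import re
from collections import defaultdict


def strip_tone(pinyin):
    """Remove a trailing tone digit, e.g. 'ma3' -> 'ma' (assumed input format)."""
    return re.sub(r"[0-5]$", "", pinyin.strip().lower())


def syllable_frequencies(lexicon):
    """Sum word frequencies over all words that share a toneless syllable."""
    totals = defaultdict(float)
    for _word, pinyin, freq_per_million in lexicon:
        totals[strip_tone(pinyin)] += freq_per_million
    return dict(totals)


# Hypothetical toy lexicon; the frequency values are invented for illustration.
toy_lexicon = [
    ("妈", "ma1", 120.0),
    ("马", "ma3", 250.0),
    ("骂", "ma4", 30.0),
    ("书", "shu1", 400.0),
]

sf = syllable_frequencies(toy_lexicon)
print(sf)
```

With these toy values, the three words pronounced ma in different tones pool into a single syllable count, which is what "not counting tone" amounts to.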

TABLE 1

Table 1. Mean word frequency (per million), syllable frequency (per million), number of neighbors, number of strokes, naming consistency, familiarity, image consistency, and image complexity of the stimuli.

Design

The experimental design included WF (low vs. high), SF (low vs. high), and Repetition (1st, 2nd, and 3rd) as within-participants factors. Each participant named the 60 targets three times, resulting in 180 trials in total. Each repetition formed one block, and thus there were three blocks in total. The order of the targets within a block was pseudo-randomized to prevent targets with the same onset from repeating across five trials. A new sequence was generated for each participant and each block.
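
One way to generate such a sequence is sketched below in Python. The sketch reads the constraint as "no two targets sharing an onset within any run of five consecutive trials", which is an assumption. It uses a greedy draw with restarts rather than the procedure actually implemented in E-Prime, which is not described here. The item labels and onsets are hypothetical.

```python
import random


def has_close_repeat(order, onset_of, window=5):
    """True if two items with the same onset are fewer than `window` positions apart."""
    last_pos = {}
    for pos, item in enumerate(order):
        onset = onset_of[item]
        if onset in last_pos and pos - last_pos[onset] < window:
            return True
        last_pos[onset] = pos
    return False


def pseudo_randomize(items, onset_of, window=5, rng=None, max_restarts=1000):
    """Draw items one by one, skipping onsets used in the last window - 1 trials."""
    rng = rng or random.Random()
    for _ in range(max_restarts):
        remaining = list(items)
        order = []
        while remaining:
            start = max(0, len(order) - (window - 1))
            recent = {onset_of[i] for i in order[start:]}
            candidates = [i for i in remaining if onset_of[i] not in recent]
            if not candidates:
                break  # dead end: discard this attempt and restart
            pick = rng.choice(candidates)
            order.append(pick)
            remaining.remove(pick)
        else:
            return order  # every item was placed without violating the constraint
    raise RuntimeError("no valid order found; the constraint may be too strict")


# Hypothetical items: 12 syllables with 6 distinct onsets.
onset_of = {
    "ma": "m", "mi": "m", "ba": "b", "bi": "b", "da": "d", "du": "d",
    "ta": "t", "tu": "t", "la": "l", "li": "l", "ga": "g", "gu": "g",
}
order = pseudo_randomize(list(onset_of), onset_of, window=5, rng=random.Random(2014))
print(order)
```

The greedy draw does not sample uniformly from all valid sequences; it is only meant to show how a fresh constrained order can be produced for each participant and block.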

Apparatus

The experiment was performed using E-Prime Professional Software (Version 1.1; Psychology Software Tools). Participants were seated in a quiet room approximately 70 cm from a 19 inch LED computer screen. Naming latencies were measured from target onset using a voice-key, connected to the computer via a PST Serial Response Box.

Procedure

Participants were tested individually. They sat in a dimly lit room at a comfortable viewing distance in front of the computer. Before the experiment, participants were instructed that their task was to name pictures. Participants first were asked to familiarize themselves with the experimental stimuli by viewing each target for 3000 ms with the correct name printed below. Then, 4 warm-up trials and 60 trials for each repetition were administered.

Participants were asked to name pictures as quickly and accurately as possible. Each trial involved the following sequence: a fixation point (+) was presented in the center of the screen for 500 ms, followed by a blank screen for 500 ms. The target picture then appeared, and an inter-trial interval of 1500 ms concluded each trial. The experiment took about 30 min in total.

Results

Data from incorrect responses (0.39%), naming latencies longer than 1500 ms or shorter than 200 ms (0.58%), and latencies deviating by more than 2.5 standard deviations from the cell mean (2.27%) were removed from all analyses. The remaining data were used in the subsequent statistical analyses. Figure 1 presents the mean latencies by WF, SF, and Repetition in spoken picture naming.
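
The exclusion steps can be written out as a short Python sketch. Two points are assumptions, since the text does not specify them: the steps are applied in the order listed, and a "cell" is whatever grouping label the caller supplies (for example, a participant by condition combination). The field names and the toy trials are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_latencies(trials, low=200, high=1500, sd_cutoff=2.5):
    """Drop errors, absolute outliers, then latencies beyond sd_cutoff SDs of the cell mean."""
    # Steps 1 and 2: keep correct responses within the absolute latency limits.
    kept = [t for t in trials if t["correct"] and low <= t["rt"] <= high]

    # Step 3: compute per-cell bounds from the remaining latencies.
    by_cell = defaultdict(list)
    for t in kept:
        by_cell[t["cell"]].append(t["rt"])
    bounds = {}
    for cell, rts in by_cell.items():
        if len(rts) > 1:
            m, s = mean(rts), stdev(rts)
            bounds[cell] = (m - sd_cutoff * s, m + sd_cutoff * s)
        else:
            bounds[cell] = (float("-inf"), float("inf"))  # a single trial cannot be judged
    return [t for t in kept if bounds[t["cell"]][0] <= t["rt"] <= bounds[t["cell"]][1]]


# Hypothetical toy data for one cell (latencies in ms).
toy = [{"cell": "highWF-highSF", "rt": rt, "correct": True}
       for rt in (590, 595, 600, 605, 610, 590, 595, 600, 605, 610, 1400)]
toy += [
    {"cell": "highWF-highSF", "rt": 620, "correct": False},  # naming error
    {"cell": "highWF-highSF", "rt": 150, "correct": True},   # faster than 200 ms
    {"cell": "highWF-highSF", "rt": 1600, "correct": True},  # slower than 1500 ms
]
clean = trim_latencies(toy)
print(len(clean))
```

In the toy data, the 1400 ms trial passes the absolute limits but lies more than 2.5 standard deviations above the cell mean, so it is removed in the third step.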

FIGURE 1

Figure 1. Mean naming latencies in spoken responses by WF, SF, and repetitions (L, Low; H, High; WF, Word Frequency; SF, Syllable Frequency).

We used the lmer function of the lme4 package to estimate the fixed effects and parameters of the linear mixed-effects model (LMM) (Bates, 2005; Baayen et al., 2008; Bates et al., 2009) in the free software R (R Development Core Team, 2009). The data were analyzed using an LMM that included fixed effects of WF, SF, and Repetition, and by-participant and by-item random intercepts. Models were fit to the data using restricted maximum likelihood estimation, which seeks to find those parameter values that make the model's predicted values most similar to the observed values. Model fitting was carried out by initially specifying a model that included only the random factors (participants and items), which was then enriched by adding the fixed factors WF, SF, and Repetition one by one, followed by the interaction between WF and Repetition, the interaction between WF and SF, the interaction between SF and Repetition, and the three-way interaction among WF, SF, and Repetition, again one by one. The best fitting model was defined as the most complex model that significantly improved the fit over the previous model. If adding a fixed factor or an interaction did not significantly improve the fit, this indicates that it does not have a significant influence on the dependent variable (i.e., naming latencies).

For speaking latencies, the best fitting model included WF, SF, Repetition, and the interaction between WF and Repetition (see Table 2). Adding the interaction between WF and SF, χ²(1, N = 4180) = 0.61, p = 0.43, the interaction between SF and Repetition, χ²(2, N = 4180) = 4.67, p = 0.10, or the three-way interaction among WF, SF, and Repetition, χ²(2, N = 4180) = 3.41, p = 0.18, did not significantly improve the fit. Data analysis indicated a significant WF effect and a marginally significant SF effect across repetitions. In order to examine the effects of WF and SF, and their interaction, speaking data were analyzed separately for each repetition.
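
The p values attached to these model comparisons follow from the chi-square distribution of the likelihood-ratio statistic. The Python sketch below is illustrative only: the model log-likelihoods are not reported here, so it takes the chi-square values from the text as given and uses the closed-form upper-tail probabilities that exist for 1 and 2 degrees of freedom. The function names are hypothetical.

```python
import math


def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic for two nested models: 2 * (ll_full - ll_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch covers df = 1 and df = 2 only")


# Chi-square values and degrees of freedom as reported for the speaking latencies.
reported = {
    "WF x SF": (0.61, 1),
    "SF x Repetition": (4.67, 2),
    "WF x SF x Repetition": (3.41, 2),
}
for term, (chi2, df) in reported.items():
    print(term, round(chi2_upper_tail(chi2, df), 2))
```

Run on the three reported statistics, the sketch yields p values of 0.43, 0.10, and 0.18 after rounding, in agreement with the values given in the text.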

TABLE 2

Table 2. LMM estimates of fixed effects for latencies in speaking.

For the first and the second repetitions, the best fitting model included WF and SF; adding the interaction between WF and SF did not significantly improve the fit in the first repetition, χ²(1, N = 1396) = 0.86, p = 0.35, or in the second repetition, χ²(1, N = 1397) = 1.26, p = 0.26. For the third repetition, the best fitting model included WF only; adding SF, χ²(1, N = 1387) = 0.60, p = 0.44, or the interaction between WF and SF, χ²(1, N = 1387) = 0.00, p = 1, did not significantly improve the fit. Table 3 displays parameter estimates for fixed effects in each repetition.

TABLE 3

Table 3. LMM estimates of fixed effects for latencies for each repetition in speaking.

A parallel analysis was conducted on the errors, using a binomial family because of the binary nature of the responses. For each repetition, no model including WF, SF, or their interaction significantly improved the fit, χ²₍₁₎s ≤ 1.29, ps ≥ 0.26. Planned comparisons showed no significant effects of WF, zs ≤ 0.28, ps ≥ 0.75, and no significant effects of SF, zs ≤ 0.74, ps ≥ 0.46.

Discussion

The experimental results are clear-cut. First, a highly reliable WF effect (average: 49 ms) was obtained: pictures with high-frequency names were produced faster than those with low-frequency names. Although the WF effect decreased from the first repetition (58 ms) to the third repetition (35 ms), it remained substantial in the third repetition. This finding is consistent with other studies showing that the WF effect decreases with repeated presentations of the same set of pictures (e.g., Bartram, 1973; Monsell et al., 1992; Wheeldon and Monsell, 1992; Griffin and Bock, 1998).

Second, the experiment showed an SF effect in the first and the second repetitions: pictures with high-frequency syllables were produced faster than those with low-frequency syllables. This is in accordance with previous findings (Levelt and Wheeldon, 1994). According to the WEAVER++ model, syllables are retrieved from the mental syllabary, and an SF effect is therefore expected. However, this effect dissipated in the third repetition. What could be the cause of the ephemeral SF effect? We suggest that it is a recency effect [see also Jescheniak and Levelt (1994) for a similar pattern in a gender decision task]. After two repetitions, high- and low-frequency syllables became equated on recency, and thus the SF effect decreased or disappeared in the third repetition.

Third, there was no interaction between WF and SF in spoken picture naming, indicating that the SF effect is independent of the WF effect (see also Levelt and Wheeldon, 1994 for a similar conclusion).