Investigating the flow of information during speaking: the impact of morpho-phonological, associative, and categorical picture distractors on picture naming

Bölte, Jens; Böhl, Andrea; Dobel, Christian; Zwitserlood, Pienie

doi:10.3389/fpsyg.2015.01540

ORIGINAL RESEARCH article

Front. Psychol., 12 October 2015

Sec. Psychology of Language

Volume 6 - 2015 | https://doi.org/10.3389/fpsyg.2015.01540


1. Institut für Psychologie, Westfälische Wilhelms-Universität Münster, Münster, Germany
2. Institut für Lernsysteme GmbH, Hamburg, Germany
3. HNO Klinik, Universitätsklinikum Jena, Jena, Germany

Abstract

In three experiments, participants named target pictures by means of German compound words (e.g., Gartenstuhl–garden chair), each accompanied by two different distractor pictures (e.g., lawn mower and swimming pool). Targets and distractor pictures were semantically related either associatively (garden chair and lawn mower) or by a shared semantic category (garden chair and wardrobe). Within each type of semantic relation, target and distractor pictures either shared morpho-phonological (word-form) information (Gartenstuhl with Gartenzwerg, garden gnome, and Gartenschlauch, garden hose) or not. A condition with two completely unrelated pictures served as baseline. Target naming was facilitated when distractor and target pictures were morpho-phonologically related. This is clear evidence for the activation of word-form information of distractor pictures. Effects were larger for associatively than for categorically related distractors and targets, which constitutes evidence for lexical competition. Mere categorical relatedness, in the absence of morpho-phonological overlap, resulted in null effects (Experiments 1 and 2), and only speeded target naming when effects reflect only conceptual, but not lexical processing (Experiment 3). Given that distractor pictures activate their word forms, the data cannot be easily reconciled with discrete serial models. The results fit well with models that allow information to cascade forward from conceptual to word-form levels.

Spoken Word Production

The production of a simple greeting such as “Hi” is the result of a series of cognitive processes that precede articulation. Processes such as conceptualization, message generation, lexical selection, morpho-phonological processing, phonetic encoding, and monitoring all take place prior to articulation (Dell, 1986; Butterworth, 1989; Levelt, 1989; Levelt et al., 1999). How information flows between these processing levels, and how many levels there are, is a much-debated topic that distinguishes serial-discrete (“two-step”) models, fully cascading models, and fully interactive models (see Levelt, 1989). Interactive models allow for bidirectional information flow (from conceptual to phonological information, and vice versa). The major difference between discrete and fully cascading models concerns the information that is activated at certain processing stages, which are detailed below.

In the current study, we tested predictions derived from discrete and fully cascading models. We assessed the flow of information during speaking by investigating how distractor pictures that are not targets for speech production influence the speed with which a target picture is named. We varied the relationship between the distractor and target pictures to assess how “deeply” distractor pictures are processed. Target and distractor pictures could be semantically related (target “sunbed”, distractors “beach ball”, and “flippers”), and in addition, their names could share a morpheme (target “sheepdog”, distractors “sheep pen”, and “sheep wool”). An impact of these types of relatedness on picture naming is informative about the flow of information in speech production. To elucidate different predictions by the models that are put to test here, we briefly sketch these models.

Models of speech production agree that speaking makes demands on the following types of information. The first, conceptual/semantic information of the to-be-expressed concepts is often considered not to be lexical but part of semantic memory. Lexical information consists of grammatical aspects (e.g., word class, gender) and information about the form of words, including their morphological make up (cf. “collie” and “sheepdog”) and phonological specification (e.g., /d/ /o/ /g/). But models disagree with respect to the processing flow from conceptual to phonological information. In the serial models (Garrett, 1980; Levelt et al., 1999), speaking proceeds serially, in ordered steps, from conceptual processing to articulation. Critically, there are two distinct steps; the first step allows cascading of information, such that many representations can be active at adjacent levels of processing. The second step is only initiated when a selection process has delivered a single, complete output (cf. Levelt, 1989; Roelofs, 1997; Levelt et al., 1999; Bloem and La Heij, 2003). In discrete, two-step models, concepts activate multiple lexical entries at an initial level, labeled “lemma level”. Lemmas code the grammatical features (word class, gender, and so on), but not the morpho-phonological make up of lexical entries. Many related concepts (dog, cat, collie) can be active during speech production, and the activation cascades to their corresponding lemmas. Which lexical entry will be uttered is decided at the lemma level, by means of a competitive selection process (Roelofs, 1992). Selection is more difficult/takes more time when co-activated lemmas come from the same semantic category as the target (e.g., lemon–orange), because they compete more for selection than unrelated entries, or than related entries that have less semantic overlap (e.g., lemon–sour). 
Selection of one lemma as the target for production implies that only one lexical entry will activate its morpho-phonological word-form, and this is where cascading comes to a halt¹.

In contrast, processing stages in fully cascading models, although temporally ordered, deliver multiple, even partial, outputs to consecutive stages, allowing for the simultaneous activation of many word forms (Dell, 1986). Some of these models do not adopt a separate lemma level (Stemberger, 1985; Humphreys et al., 1988; Caramazza, 1997; Peterson and Savoy, 1998). The selection as to which word will be uttered is non-competitive; to cite Mahon et al. (2007, p. 203) “the level of activation of a non-target does not affect the selection of the target”. Thus, there are two crucial differences between these models: (1) discrete, two-step models predict interference, reflecting competition during selection due to the presence of same-category stimuli, but fully cascading models do not, and (2) cascading models allow and predict that word-form (morphological and phonological) information is simultaneously available for more than one lexical entry, but discrete two-step models do not. Jescheniak and Schriefers (1998), Peterson and Savoy (1998), Rapp and Goldrick (2000), as well as Goldrick (2006) offer overviews of the discrete/cascading controversy.

Cascaded or Discrete Processing, Paradigms, and Evidence

In the following, we summarize the evidence in favor of fully cascaded, and against discrete, processing in speech production, and introduce the paradigms used together with their basic findings. Next, we present the manipulations and predictions for the three experiments of our study.

So far, evidence for cascaded processing comes from (1) speech errors, (2) picture naming experiments with word distractors, and (3) picture-naming experiments with picture distractors – the paradigm that we also used here. Speech-error data from patients and simulations of speech-error data argue against discrete models (Rapp and Goldrick, 2000). The relevant error type concerns mixed errors. A mixed error is a word that is semantically and phonologically related to the intended word (e.g., saying cat instead of calf). Taking error distributions into account, such errors are more likely to occur than pure semantic errors (e.g., saying dog instead of cat; Dell and Reich, 1981; Martin et al., 1996). Rapp and Goldrick (2000) argue that mixed errors can only occur in fully cascading models and/or interactive models, but not in discrete serial models. Roelofs (2004), however, argues that mixed errors result from erroneously selecting two lemmas instead of one. In his view, erroneous selection of multiple lemmas is not restricted to mixed errors but is also the basis for blend errors (e.g., close + near →clear, cf. Roelofs, 1992) and for activating multiple word forms of near synonyms.

The next source of evidence comes from picture–word interference (PWI) studies. In paradigms with word distractors, a picture that has to be named is accompanied by (written or spoken) words that can be ignored. Such PWI studies consistently show that picture naming is faster when distractor words are related in form (picture of a calf, distractor “cart”) than when not (picture of a calf, distractor word “bowl”; Meyer and Schriefers, 1991; Levelt et al., 1999, for an overview). This also holds for cases of large form overlap, when target and distractor word share a morpheme (picture of a sheepdog, distractor “sheep wool”), even when there is no obvious semantic relation between the concepts specified by picture and distractor word (e.g., picture of a hummingbird, distractor “jailbird”; see Lüttmann et al., 2011a). Semantically related distractors that do not share the target’s semantic category (picture of a cow, distractor “milk”) tend to speed target naming. This is often interpreted as stemming from the non-lexical, conceptual level (see La Heij et al., 1990; Alario et al., 2000). However, picture naming is slowed when the distractor comes from the same semantic category as the target (picture of a calf, distractor “sheep”). This is interpreted either as evidence for competitive lexical selection (Schriefers et al., 1990; Roelofs, 1992; Levelt et al., 1999), or as originating from post-lexical problems, occurring when a semantically related distractor word occupies a prominent place in the serial output buffer, thus hindering the timely output of the picture name (Mahon et al., 2007).

With respect to the issue of full or partial cascading, experiments with word distractors that are related in both meaning and form to the target picture (e.g., target picture calf, distractor word “cat”) revealed interactive effects: form relatedness counteracts the negative consequences of a shared semantic category between target and distractor (Starreveld and La Heij, 1995, 1996; Damian and Martin, 1999). Moreover, near synonyms or cognates (for bilinguals) activate multiple word forms (Jescheniak and Schriefers, 1998; Peterson and Savoy, 1998; Costa et al., 2000), also supporting the notion of full cascading.

Finally, some studies using multiple pictures instead of pictures and words also argue for a continuous cascade of information. In picture–picture paradigms, a target picture for naming is accompanied by one or more distractor pictures that should not be named (Glaser and Glaser, 1989; Morsella and Miozzo, 2002; Damian and Bowers, 2003; Navarrete and Costa, 2005; Meyer and Damian, 2007; Oppermann et al., 2008, 2014; Roelofs, 2008a). Morsella and Miozzo (2002) asked their participants to name one of two differently colored, superimposed line drawings, and to ignore the other. Faster picture-naming latencies were obtained for phonologically related (bed-bell) than for unrelated pictures (hat-bell; see also Damian and Bowers, 2003; Navarrete and Costa, 2005; Meyer and Damian, 2007; Roelofs, 2008a), suggesting that the distractor picture activates its phonological representation, which then (because of phonological overlap) speeds up target naming. Jescheniak et al. (2009), who failed to replicate this data pattern, suggest that differences in amount of phonological overlap, the inclusion of the distractor pictures in the response set, and/or subtle differences in name agreement might be responsible for the divergent results. Importantly, and despite the absence of semantic effects in Morsella and Miozzo (2002), the presence of phonological effects argues for the full cascading of activation.

The absence of semantic effects (e.g., table–bed) in Morsella and Miozzo (2002) is rather startling, given that language production proceeds from semantic to phonological representations. In general, studies using picture–picture paradigms showed diverging results for categorically related distractor pictures: facilitation (Bloem and La Heij, 2003; Roelofs, 2008a), interference (Glaser and Glaser, 1989), or no effects (Humphreys et al., 1995; Morsella and Miozzo, 2002; Damian and Bowers, 2003; Navarrete and Costa, 2005). It is not yet fully understood what causes the different result patterns. With picture distractors, it does not seem mandatory that all available conceptual information is automatically encoded lexically, and the task, target set, attention to the distractor picture, and material manipulations might play an important role.

One important factor concerns the availability of distractor pictures as (potential) targets – sometimes manipulated by including all pictures in the target set. This fits with data from Aristei et al. (2012), who presented two pictures simultaneously that both had to be named to produce a novel compound (e.g., lion dog). Participants were slower in producing such novel noun–noun compounds when the two pictures were categorically related (lion dog) than when not (chair dog). Aristei et al. (2012) argue that this provides evidence for lexical competition.

Similar conclusions can be drawn from studies by Oppermann et al. (2008, 2014), who presented a target and a distractor picture simultaneously, while spoken words that were semantically related, phonologically related or unrelated to the distractor picture served as additional distractors. When target and distractor objects were similar in shape, semantically related distractor words slowed down target picture naming relative to unrelated distractor words. This suggests that the concepts of the target and distractor pictures enter the lexicalization process provided that distractor pictures capture sufficient activation, because they are similar in shape to the target and are “boosted” by related distractor words.

Thus, whether semantic effects can be registered in picture–picture paradigms seems to depend on the amount of attention to the distractor picture (Jescheniak et al., 2014), on how to signal the target picture and/or on the particular task implemented (Glaser and Glaser, 1989; Bloem and La Heij, 2003; Damian and Bowers, 2003).

Note that evidence for cascading semantic information per se does not distinguish between fully cascading and discrete, two-step models, but the direction of semantic effects (facilitation, interference) does. Interference, due to same-category membership of distractors and targets, is predicted by two-step models but not by fully cascading models. It plays an important role in the discussion about lexical-competition (discrete models), and fully cascading models provide an explanation of such interference effects in terms of a post-lexical response-buffer. We will discuss this further below.

The Picture–Picture Paradigm, Conditions, and Predictions

To further test the predictions of discrete and fully cascading models, we opted for the picture–picture paradigm, because of its suitability to test for activation of lexical form (morphology, phonology) of non-target pictures. We presented three different pictures, one of which was the target for naming. Which picture had to be named was either signaled by a cue that appeared with varying delays (Experiments 1 and 2), or was unequivocally signaled by presenting the target picture with some delay after the non-target (distractor) pictures (Experiment 3). We used multiple distractors (1) because effects can be larger with two than with one distractor (Melinger and Abdel Rahman, 2004) and (2) to create more uncertainty as to which picture has to be named eventually.

A first manipulation concerned the nature of semantic overlap between distractor and target pictures, which was either associative or categorical. Note that both models allow for the activation of multiple concepts (of all three pictures). To our knowledge, associatively related distractors (e.g., sailor and ship) or distractors representing semantic features of the target object (e.g., porthole and ship) have not been investigated so far within the picture–picture paradigm. It is well established that associatively and categorically related distractors have different effects in the PWI paradigm (Bölte et al., 2003, 2005; Costa et al., 2005; Mahon et al., 2007). Why words that are semantically associated or that represent semantic features of the target picture facilitate, whereas words that specify a same category member inhibit picture naming, is still a matter of intense debate (see Costa et al., 2005; Mahon et al., 2007; Abdel Rahman and Melinger, 2009; Janssen, 2013; Roelofs et al., 2013; Mahon and Navarrete, 2014). Whereas both associative and categorical similarity should induce priming at the level of conceptual representations, they seem to differ at lexical or post-lexical levels. According to discrete models, the activated lemmas of same-category concepts cause havoc during the selection of the lexical entry that is the target for speaking (Roelofs, 1992; Levelt et al., 1999), because they are confusable with the target and seem such valid responses (saying “dog” to a picture of a cat is more likely than saying “purr”). If we obtain categorical competition effects in a picture–picture paradigm, this is clear evidence for the existence of a competitive lexical selection process, and argues against prominent cascading models (Caramazza, 1997). Note that categorical interference from pictures also speaks against the response-exclusion hypothesis (Finkbeiner and Caramazza, 2006; Mahon et al., 2007). 
According to this hypothesis, the interference by categorically related distractor words observed in PWI is due to the fact that these distractors, because they are words, enter the articulatory response buffer that channels verbal responses for output. Words that are semantically related to the correct response (the picture name) are harder to remove from this buffer than unrelated words, hence, the interference. Most importantly, this holds for verbal stimuli only, not for pictures (see Jescheniak et al., 2014).

As stated above, discrete and fully cascading models also make different predictions concerning the impact of morpho-phonologically related distractor pictures on the speed of target-picture naming. We used German compound words, as distractors (garden hose, garden gnome) and targets (garden chair), because such stimuli have the advantage of sharing both semantic and form information. We crossed the type of semantic relation (associative vs. categorical) with form overlap, in terms of shared morphemes (initial or final morphemes of compound names). To our knowledge, combining semantic and form overlap has not been done before with the picture–picture paradigm (not even with partial overlap, as in “cart” and “calf”). The critical evidence for full cascading is when distractor pictures also activate their word-form information. This should not be the case according to discrete, two-step models.

As stated earlier, form-relatedness has been reliably demonstrated with the PWI paradigm, when a target picture (e.g., of a football) is accompanied by a distractor word that shares phonemes or morphemes with the target (e.g., “foodstuff” or “footstool”; cf. Meyer and Schriefers, 1991; Zwitserlood et al., 2002; Lüttmann et al., 2011a). In picture–word paradigms, distractor words automatically activate lexical information. Their processing proceeds from phonemes or graphemes via word-form and syntactic information to concepts. Word distractors can thus influence picture naming at all (lexical) levels. This is different for picture distractors that can only influence the lexical processing of the target if the distractors themselves activate their lexical information. Thus, if naming a “football” is easier when the distractor pictures show a “footprint” and a “footstool”, this provides clear evidence for the activation of morpho-phonological information belonging to the distractor pictures, and for full cascading of information during speech production. In contrast, the lack of activation of the distractor pictures’ word forms supports discrete, only partially cascaded models.

We thus included the following target-distractor conditions in our study. The relation between a target picture (e.g., a garden chair)² and its two different distractor pictures was either (1) associative with morpho-phonological³ overlap (+A+M) in the first constituent (e.g., garden hose, garden gnome), (2) same-category combined with morpho-phonological overlap (+C+M) in the second constituent (e.g., rocking chair, office chair), (3) merely associative (+A–M; e.g., a swimming pool, lawn mower) or (4) merely categorically related (+C–M; e.g., office desk, shoe rack), thus without morpho-phonological overlap, or (5) completely unrelated (e.g., billiard ball, sock suspender).
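The five conditions can be summarized as a small lookup structure. The sketch below is purely illustrative: the `Condition` class and its field names are hypothetical, and the example distractors are the English glosses listed above for the target garden chair.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Condition:
    """Hypothetical record describing one target-distractor condition."""
    label: str                          # ASCII hyphen stands in for the minus sign
    semantic_relation: Optional[str]    # "associative", "categorical", or None
    shared_constituent: Optional[str]   # "first", "second", or None (no form overlap)
    example_distractors: Tuple[str, str]


CONDITIONS = (
    Condition("+A+M", "associative", "first", ("garden hose", "garden gnome")),
    Condition("+C+M", "categorical", "second", ("rocking chair", "office chair")),
    Condition("+A-M", "associative", None, ("swimming pool", "lawn mower")),
    Condition("+C-M", "categorical", None, ("office desk", "shoe rack")),
    Condition("unrelated", None, None, ("billiard ball", "sock suspender")),
)


def has_form_overlap(cond: Condition) -> bool:
    """True if distractor names share a morpheme with the target name."""
    return cond.shared_constituent is not None


# Conditions in which facilitation from shared morphemes is possible
form_related = [c.label for c in CONDITIONS if has_form_overlap(c)]
print(form_related)
```

Crossing the semantic relation with form overlap in this way yields four related conditions plus the unrelated baseline.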

Our rationale to use both types of semantic relation is as follows: if effects in the picture–picture paradigm solely originate at a conceptual level, effects should be similar for categorically and associatively related distractors. If interference – or reduced facilitation, relative to associatively related pictures – is observed for categorical distractors, this is evidence for their lexical coding. Such effects provide clear evidence for competitive lexical selection (cf. Levelt et al., 1999), and against fully cascading models as well as against the response-exclusion hypothesis that only applies to words, not to pictures (Mahon et al., 2007). Note that reliable interference due to categorically related context pictures has rarely been observed in picture–picture studies reported so far, which either suggests that distractor pictures are not lexically coded automatically (cf. Damian and Bowers, 2003; Jescheniak et al., 2014), or that conceptual facilitation and lexical competition cancel each other out.

We also implemented the distinction between same category and association with pictures whose names are morpho-phonologically related to the target picture’s name. Morphological relatedness is not specified at the conceptual level (Caramazza et al., 1988; Levelt et al., 1999; Janssen et al., 2008). If all effects are conceptual, without any lexical involvement, these should behave in the same way as associatively or categorically related pictures whose name is morpho-phonologically unrelated to the target. If distractor pictures are lexically processed, but at the lemma level only (in discrete models), the same predictions hold as formulated above for morphologically unrelated distractors. But if distractor pictures are processed all the way down to their word-form level, where morphology is specified, we expect facilitation due to morpho-phonological relatedness. In PWI studies, where form effects are obvious because the distractors are words, facilitation was observed with distractors and targets overlapping at word onset and offset, both with monomorphemic words (e.g., power and towel with the picture of a tower) and with morphologically related (e.g., tea rose and rosebush with the picture of a rose) distractors (Meyer and Schriefers, 1991; Zwitserlood, 1994; Zwitserlood et al., 2002; Belke, 2005; Lüttmann et al., 2011b).

When distractor pictures are encoded at the level of word form, we expect additional facilitation due to shared morphemes, relative to an unrelated baseline, in both morpho-phonological conditions (+A+M and +C+M). The size of effects might differ because of lexical competition in the +C+M condition. The purely associatively related distractor condition (+A–M) that does not induce much lexical competition should also reveal facilitation, but the categorically related distractors (+C–M) should show no effect or even interference. This is because they are conceptually related to the target (resulting in facilitation) but also lead to interference due to lexical competition with the target. Keep in mind that the presence of interference, or reduced facilitation, in the +C conditions speaks for competitive lexical selection (Levelt et al., 1999), but is incompatible with full cascading models (Caramazza, 1997) and with response-exclusion (Mahon et al., 2007).

Finally, we manipulated the signaling of the target picture, either by a cue (an arrow, Experiments 1 and 2) or by a time delay (Experiment 3). We varied the onset of the target cue (Experiments 1 and 2) relative to the onset of the stimulus display (SOA). This had two functions. First, given that it is unclear whether multiple pictures automatically activate their lexical information, longer uncertainty as to which picture has to be named (implemented by a larger SOA) might invite lexical activation of all pictures. A large SOA might invite the lexical coding of more than one picture, but a small SOA should not.

The next issue concerns the time course of lexical activation. In the PWI paradigm, the impact of semantic and phonological distractors on picture naming depends on the temporal relation between word distractor and target. Categorical and associative effects are largest if the distractor precedes the target, while phonological effects arise when the distractor follows the target or is presented simultaneously with the target (Glaser and Düngelhoff, 1984; Schriefers et al., 1990; Meyer and Schriefers, 1991; Alario et al., 2000; Jescheniak et al., 2005). Similarly, providing more or less time before it becomes clear which picture is to be named might lead to the involvement of different processing levels. An SOA of 200 ms between the onset of the pictures and the cue may well be too short for the activation of word-form information, but an SOA of 600 ms should suffice. So, the SOA manipulation was used to invite or discourage the (strategic) lexical coding of all (or some) pictures before the target was signaled. In Experiment 3, it was clear to the participants that the two objects that appeared first were never to be named, because the target was signaled by means of an onset delay. In this case, lexical activation of distractor pictures might be completely absent.

We also monitored eye-movements, in addition to voice-key latencies. The reason was to investigate whether targets had to be fixated for correct naming, and whether distractors had to be attended overtly to affect target naming. Previous eye-movement studies required participants to name all displayed objects (cf. Meyer et al., 1998). In such tasks, participants look at the object until its phonological form is planned. On the other hand, Dobel et al. (2007) showed that fixations of scene elements are not necessary to identify (and name) agents, actions and patients of action scenes. Unlike in the study by Meyer et al. (1998), participants were not asked to give speeded responses, and sometimes were even prevented from making eye-movements into the scene, because of very short scene presentation durations. So, speakers can name visual stimuli without overt attention, but they may well look at objects to facilitate object recognition and name retrieval (Meyer et al., 2012). It is still unknown whether distractor pictures have to be fixated at all to affect target naming.

Experiment 1: Cue Onset 600 ms

Method

Participants

Forty participants from the Westfälische Wilhelms-University of Münster took part in the experiment. They were either paid 4 € or received course-credit for their participation. All had normal or corrected-to-normal vision and were native speakers of German.

Material

We used pictures that are named with noun–noun compounds to implement the morpho-phonological similarity, concurrent with semantic similarity, between target and distractor pictures. Material selection was a multi-phased procedure. First, we selected noun–noun compounds from the Celex lexical database, discarding all compounds that were not depictable (Baayen et al., 1993). Next, distractors were constructed for each target (Gartenstuhl, lawn chair) such that there were three to five distractors per Distractor Type: (1) +A+M, associatively and morpho-phonologically related (e.g., Gartenzwerg, garden gnome; Gartenschlauch, garden hose), (2) +C+M, categorically and morpho-phonologically related (e.g., Schaukelstuhl, rocking chair; Bürostuhl, office chair), (3) +A–M, associatively but not morpho-phonologically related (e.g., Rasenmäher, lawn mower; Schwimmbecken, swimming pool) and (4) +C–M, categorically but not morpho-phonologically related (e.g., Schreibtisch, desk; Schuhregal, shoe rack), and finally, control distractors that were neither categorically, associatively nor morpho-phonologically related to the target (e.g., Zahnbürste, tooth brush; Billardkugel, billiard ball). This resulted in a set of 377 compounds (22 targets, 355 potential distractors). Colored pictures for these compounds were taken from the Hemera Photo Objects (n.d.) database, or from the internet.

The material was tested in two pretests: (1) an offline name-agreement test in combination with a semantic rating task and (2) an online name-agreement test. Twenty participants took part in the offline tests, another 15 served in the online test. All participants came from the same population as mentioned above and received a similar compensation. In the offline name agreement test, each distractor picture was presented alongside its target picture, resulting in 355 trials. Participants were asked to write the word that best described the depicted objects and to rate their semantic relatedness, using a 5-point scale (1 = unrelated, 5 = related). The online name agreement test served to assess the preferred naming of the picture under conditions similar to the actual experiment (see Table 1 for relevant means and SDs). Trials in this test were structured as follows: a fixation cross appeared on a computer screen for 250 ms, followed by the picture that remained on the screen for 600 ms. Time-out was set to 1500 ms. Participants were asked to name the picture as quickly as possible.

Table 1

Distractor condition	Semantic relatedness rating	Name agreement, offline task (%)	Name agreement, online task (%)
+A+M	3.9 (0.9)	92.0 (11.6)	85.5 (13.6)
+C+M	3.3 (1.0)	87.2 (14.5)	82.7 (15.1)
+A–M	3.4 (0.7)	90.7 (10.6)	78.9 (14.1)
+C–M	3.1 (0.5)	88.4 (13.2)	80.4 (14.4)
Unrelated	1.21 (0.2)	97.5 (5.2)	88.4 (10.9)

Semantic relatedness rating and name agreement data from off- and online tasks, as a function of distractor condition (SD in parentheses).

We selected all pictures that were predominantly named with a morphologically complex word in the offline (targets: mean: 79%, SD: 6, range: 70–85%; distractors: mean: 91%, SD: 12, range: 55–100%) as well as in the online naming test (targets: mean: 81%, SD: 14, range: 60–100%; distractors: mean: 84%, SD: 14, range: 53–100%). This resulted in 15 target pictures, each with two different distractor pictures in each of the five distractor conditions. Mean ratings of all pretests for the selected items are provided in Table 1. The semantic relatedness judgments were evaluated with the help of a one-way univariate repeated measures ANOVA over items, using semantic relatedness judgments as dependent variable and Condition (+A+M, +C+M, +A–M, +C–M) as factor. The main effect of Condition was not significant [F(3,42) = 2.172, MSE = 0.758, p = 0.105, η² = 0.117].
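The degrees of freedom of this test follow from the design: 15 items rated in four related conditions give 3 and 42 degrees of freedom. The sketch below shows one way to compute the F ratio of a one-way repeated-measures ANOVA over items. The function name `rm_anova_oneway` and the toy ratings are invented for illustration; they are not the study's data, and the software used for the original analysis is not specified here.

```python
def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA.

    scores: list of rows, one row per item, one column per condition.
    Returns (F, df_condition, df_error).
    """
    n = len(scores)       # number of items (the repeated-measures units)
    k = len(scores[0])    # number of conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    item_means = [sum(row) / k for row in scores]

    # Partition the total sum of squares into condition, item, and residual parts
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_item = k * sum((m - grand) ** 2 for m in item_means)
    ss_error = ss_total - ss_cond - ss_item

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Toy ratings (NOT the study's data): 15 items x 4 conditions
toy = [
    [3.0 + 0.1 * j + 0.2 * (i % 3) + 0.3 * ((i * j) % 2) for j in range(4)]
    for i in range(15)
]
f_value, df1, df2 = rm_anova_oneway(toy)
print(df1, df2)
```

With 15 items and four conditions, the function returns the same degrees of freedom as in the reported test; the F value itself depends entirely on the toy numbers.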

Targets and distractors were distributed over five lists, with list order counter-balanced across participants. Participants were presented with all lists. An additional 24 filler trials, each with pictures of three morphologically complex but unrelated words, were included in each list, to increase the number of unrelated trials (e.g., Schlittschuh, ice skate; Bohrmaschine, drilling machine; Sonnenblume, sunflower). Each block consisted of 39 trials plus six warm-up trials.

Apparatus

Pictures (ranging from 22 × 245 pixels for “toothbrush” to 241 × 207 pixels for “oil lamp”) were presented on a 21-inch Samsung SyncMaster 1100p plus CRT monitor (1024 × 768 pixels, frame rate: 85 Hz), controlled by a Dell-Dimension 4200 IBM-compatible PC. Participants were seated approximately 60 cm in front of the monitor. Eye movements were recorded with an Eyelink II (2004) eye-tracker, with a sampling rate of 500 Hz and an eye position resolution of less than 0.5°. The eye-tracker was controlled by a Dell-OptiPlex 280. Onset naming latencies were recorded with a voice key.

Procedure

Participants were tested individually in a quiet room. They received written instructions. They were informed that three pictures would appear on the screen and that shortly after picture onset an arrow would signal the picture that they had to name. Participants were asked to name the target picture as quickly and accurately as possible such that the experimenter could correctly identify the target among the other objects on the display (see Bölte et al., 2009). Before the experiment proper, the following steps were taken. First, to minimize target name variation, participants received a booklet with target pictures and names. Second, after participants had read the booklet, each target picture was presented again for naming on the computer screen. Third, the eye-tracker was calibrated and validated using a nine-point calibration type (HV9). Upon successful validation, the experiment started. A drift correction was applied before each trial using the fixation point.

Trial structure was as follows: a fixation point in the center of the screen indicated the beginning of a new trial. After successful fixation, the trial began and three pictures appeared in one of four possible configurations. Either one picture appeared to the left and one to the right of the fixation point (each 160 pixels from screen center) together with one above or below it (150 pixels from screen center), or one picture appeared above and one below the fixation point together with one to its left or right (6.9° apart). An arrow appeared 600 ms after picture onset, signaling the target object. Target position within a list was (nearly) equally distributed (10 top, 10 left, 10 right, 9 bottom). Pictures disappeared at the participant’s voice onset or after 5000 ms. Stimuli were presented as colored photographs on a white background. The experimenter wrote down the participants’ answers.
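On a CRT monitor the image can only change at a screen refresh, so a nominal delay is realized exactly only if it spans a whole number of refresh cycles. The sketch below checks this for the 600 ms cue delay at the reported frame rate of 85 Hz. It assumes that stimulus presentation was synchronized to the refresh cycle, which is not stated here; the function names are hypothetical.

```python
REFRESH_HZ = 85  # frame rate reported in the Apparatus section

def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Number of refresh cycles spanned by a duration in milliseconds."""
    return duration_ms * refresh_hz / 1000

def is_frame_aligned(duration_ms, refresh_hz=REFRESH_HZ):
    """True if the duration is a whole number of refresh cycles."""
    return (duration_ms * refresh_hz) % 1000 == 0

cue_delay_frames = ms_to_frames(600)
print(cue_delay_frames, is_frame_aligned(600))  # 51.0 True
```

Under this assumption the 600 ms delay corresponds to exactly 51 refresh cycles, so no rounding to the nearest frame is needed.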

Results

Responses different from expected names (1.6%), disfluencies (0.8%), voice-key failures (0.1%), and time-outs (1.0%) were excluded from the analyses. Responses given before cue onset were also excluded (2.4%). No item set had to be excluded, but two participants were excluded from the analyses due to missing data. Mean voice-key latencies measured from cue onset served as dependent variable⁴ (see Table 2 for mean reaction times (RT) and standard errors; Figure 1 displays the effects (RT control condition – RT experimental condition) per experiment). The repeated-measures factors in an initial two-way repeated measures ANOVA were Presentation (1–5) and Distractor Type (+A+M, +C+M, +A–M, +C–M, unrelated). Participants named pictures faster toward the end of the experiment, as indicated by a significant linear trend for the factor Presentation [F(1,37) = 96.469, MSE = 9739, p < 0.001, η² = 0.723]. There was no significant interaction of Distractor Type and Presentation, F < 1. Therefore, Presentation was dropped from further analyses. Most importantly, this analysis also yielded a significant main effect of Distractor Type [F(4,148) = 5.983, MSE = 17894, p < 0.001, η² = 0.021]⁵.

Table 2

Experiment	Distractor Type
	+A+M	+C+M	+A–M	+C–M	Unrelated
1	462 (25)*	490 (26)	485 (26)*	529 (27)	511 (30)
2	699 (21)*	745 (20)*	772 (23)	784 (20)	780 (21)
3	730 (24)*	754 (24)*	738 (24)*	755 (26)*	783 (28)

Mean picture naming latencies and standard error (in parentheses) as a function of Distractor Type and Experiment.

The asterisk (*) denotes a significant difference to the unrelated condition.

FIGURE 1

A two-way repeated measures ANOVA with the factors Morphological Relatedness (related, unrelated) and Semantic Relatedness (associated, categorically related), using the effect (control condition – experimental condition) as dependent variable, yielded two significant main effects and a non-significant interaction (Morphological Relatedness: F(1,37) = 8.024, MSE = 3835, p = 0.007, η² = 0.029; Semantic Relatedness: F(1,37) = 13.810, MSE = 2966, p = 0.001, η² = 0.038; interaction: F < 1).

Mean voice-key latencies in the +A+M condition were shorter than those in the unrelated condition [one-sided t-tests: t(37) = –3.442, p = 0.001] and those in the associative condition without morpho-phonological overlap, +A–M [t(37) = –2.517, p = 0.016]. There was a trend toward significance when comparing the +A+M mean voice-key latencies with those of the +C+M condition [t(37) = –1.585, p = 0.061]. Note that neither these nor any of the following post hoc tests were corrected for multiple comparisons. Mean picture naming latencies in the category distractor condition +C–M were numerically longer than, but did not differ significantly from, those in the unrelated condition [two-sided t-test: t(37) = 1.045, p = 0.303]. Thus, as in previous research, same-category members showed no facilitation, but also did not reliably interfere with picture naming in a picture–picture setting (cf. Glaser and Glaser, 1989; La Heij et al., 2003). Note that the main effect of semantic relatedness was significant, showing that an associative relation between distractors and target induced facilitation (37 ms) but a categorical relation did not (2 ms).
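The facilitation effects entering this analysis can be approximated from the Experiment 1 means in Table 2. The sketch below computes each effect as the unrelated mean minus the condition mean and then averages over the two levels of the other factor. The variable and function names are illustrative only, and the effects in the actual analysis were presumably computed per participant rather than from table means.

```python
# Mean naming latencies (ms) for Experiment 1, copied from Table 2
rt_exp1 = {"+A+M": 462, "+C+M": 490, "+A-M": 485, "+C-M": 529, "Unrelated": 511}

# Effect = unrelated (control) minus experimental condition;
# positive values indicate facilitation, negative values interference
effects = {cond: rt_exp1["Unrelated"] - rt
           for cond, rt in rt_exp1.items() if cond != "Unrelated"}

def marginal(conditions):
    """Average effect over the listed conditions."""
    return sum(effects[c] for c in conditions) / len(conditions)

associative = marginal(["+A+M", "+A-M"])      # 37.5
categorical = marginal(["+C+M", "+C-M"])      # 1.5
morph_related = marginal(["+A+M", "+C+M"])    # 35.0
morph_unrelated = marginal(["+A-M", "+C-M"])  # 4.0
```

Because Table 2 lists rounded means, values obtained this way can differ by a few milliseconds from effects computed on the unrounded data.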

Fixations and dwell time were measured from the onset of the pictures, with the help of the EyeLink Data Viewer program. Dwell time was defined as the sum of the durations of all fixations on an interest area. The fixation measure reflects whether a specific item was fixated at all between picture onset and the response or the end of the trial.
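The two measures can be illustrated with a minimal sketch. The `Fixation` record, the interest-area labels, and the example trial are hypothetical and do not reproduce the EyeLink Data Viewer output format.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    area: str         # interest area label (hypothetical), e.g. "target"
    duration_ms: int  # duration of this fixation in milliseconds

def dwell_time(fixations, area):
    """Sum of the durations of all fixations on one interest area."""
    return sum(f.duration_ms for f in fixations if f.area == area)

def was_fixated(fixations, area):
    """True if the interest area received at least one fixation."""
    return any(f.area == area for f in fixations)

# Invented trial: two fixations on the target, one on a distractor
trial = [
    Fixation("target", 180),
    Fixation("distractor_1", 120),
    Fixation("target", 240),
]

target_dwell = dwell_time(trial, "target")            # 420
second_distractor_seen = was_fixated(trial, "distractor_2")  # False
```

An interest area that was never fixated has a dwell time of zero, so the binary fixation measure can be derived from dwell time, but not the other way around.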

The eye-tracking data showed that participants fixated only one of the displayed objects in 36.6% of the trials (target: 29.1%, one distractor: 7.5%). Two objects were fixated in 33.6% of the trials (target and one distractor: 31.9%, both distractors: 1.7%). All three objects were looked at in 10.9% of the trials. In the remaining 19.0% of the trials, fixations fell outside the objects (see Table 3 for an overview). The number of gazes shows that participants looked at the target object most often, which does not come as a surprise. As has been known for a long time, fixations, as a measure of overt attention, are not needed for the correct perception of objects or scenes (Fei-Fei et al., 2005). Evidently, targets can be and were named correctly without overt attention, and it is thus very likely that distractors can also influence target naming without overt attention. Thus, overlapping stimulus configurations, as in the variant of Morsella and Miozzo (2002), are not mandatory for obtaining voice-onset latency effects of distractors. However, the visual angle and presentation time used here allow covert attention shifts. Two ANOVAs with Distractor Type as factor, one with first fixation onset on the target and the other with dwell time on the target as dependent variable, showed no significant effects (Fs < 1).

Table 3

		Distractor condition
Experiment	Fixated object	+A+M	+C+M	+A–M	+C–M	Unrelated
1
	Target	6.9	5.1	5.7	5.8	5.6
	Distractor(s)	1.5	1.9	2.1	1.9	1.8
	Target and distractor(s)	7.8	9.1	8.7	8.4	8.7
	Nothing	4.0	3.9	3.4	3.7	4.0
2
	Target	11.0	9.7	9.9	9.5	8.7
	Distractor(s)	0.6	0.7	0.6	0.5	0.5
	Target and distractor(s)	6.1	7.6	8.1	8.3	8.7
	Nothing	2.4	1.9	1.4	1.7	2.1
3
	Target	12.2	11.2	10.3	11.4	10.6
	Distractor(s)	0.2	0.1	0.1	0.1	0.3
	Target and distractor(s)	5.6	6.6	7.1	6.3	6.6
	Nothing	1.3	1.1	1.6	0.8	1.3

Percentage of gazes broken down by condition and fixated object for Experiments 1–3.
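As a reading aid for Table 3, the sketch below sums the Experiment 1 cells over the five distractor conditions. The row totals match the percentages given in the text for target-only gazes (29.1%) and for gazes outside the objects (19.0%), and all cells together add up to 100%. This suggests that, at least for Experiment 1, each cell is expressed relative to all trials of the experiment rather than within a condition; this reading is an inference from the numbers, not something stated in the table note.

```python
# Experiment 1 rows of Table 3; columns: +A+M, +C+M, +A-M, +C-M, Unrelated
table3_exp1 = {
    "Target": [6.9, 5.1, 5.7, 5.8, 5.6],
    "Distractor(s)": [1.5, 1.9, 2.1, 1.9, 1.8],
    "Target and distractor(s)": [7.8, 9.1, 8.7, 8.4, 8.7],
    "Nothing": [4.0, 3.9, 3.4, 3.7, 4.0],
}

# Sum each row over conditions; round to one decimal to absorb float noise
row_totals = {label: round(sum(cells), 1) for label, cells in table3_exp1.items()}
grand_total = round(sum(row_totals.values()), 1)

print(row_totals)
print(grand_total)  # 100.0
```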

Discussion

To summarize, Experiment 1, with a 600 ms interval between picture onset and the target cue, revealed both semantic effects (positive and null) and facilitation by morpho-phonological information shared with distractor pictures. Distractor pictures that were associatively related to the target picture clearly speeded target naming. Overall, categorically related distractor pictures showed no effect (2 ms). The large and reliable difference between the two semantic conditions, evident in the main effect of semantic relation, with 37 ms facilitation due to associatively related distractors but no effect for categorical distractors (2 ms), is in fact evidence for an impact of lexical competition on conceptually induced facilitation.

This modulation of conceptual/semantic facilitation by lexical competition fits with discrete models, but not with fully cascading models (Caramazza, 1997), nor with the response-buffer explanation of interference caused by word distractors (Mahon et al., 2007). The main effect of morpho-phonological relatedness, with 33 ms facilitation when morphological relatedness is present but no effect (4 ms) without such overlap, clearly indicates the presence of word-form information of distractor pictures. This replicates the word-form effects with overlapping, colored picture presentation (Morsella and Miozzo, 2002), and provides evidence for full cascading within the language production system.

To our knowledge, there are no picture–picture studies with associative relations between distractors and target. Our participants named target pictures faster in the presence of associatively related distractors. This replicates results from PWI studies (e.g., Bölte et al., 2003, 2005; Costa et al., 2005). Whereas semantic facilitation can be explained by activation at the non-lexical, conceptual level (see also La Heij et al., 2003), the fact that such semantic effects disappear when distractors and targets are from the same semantic category clearly indicates lexical involvement. Unlike others, we obtained no reliable interference relative to the unrelated condition. The closest comparison is a study by Humphreys et al. (1995), who also used a post-cue picture–picture procedure and observed semantic interference for categorically related pairs (e.g., horse–tiger). One difference between our study and that of Humphreys et al. (1995) is that their naming responses were very slow, nearly twice as slow as ours. This suggests that interference might develop over time, but visual inspection of our data does not support this (see Figure 2), as there is no indication of interference at longer RTs.

FIGURE 2

Let us turn now to the interpretation of the “null effects” for categorical distractors. One argument could be that the distractor pictures never entered the lexical system to start with. But if distractors were not lexicalized, no effects of morpho-phonological relatedness should have been observed. In the absence of associative distractors, it would have been difficult to interpret the null effect, but compared to the clear facilitation for associative stimuli, the null effect seems to indicate that interference occurred but was canceled out by facilitation due to semantic similarity. Note that according to the pretest, associative and categorical stimuli were equally related to their targets. The combination of facilitatory conceptual effects, both for categorical and associative distractors, with an inhibitory lexical effect for categorical distractors only fits well with the idea of lexical competition implemented in the model proposed by Levelt et al. (1999). Semantic competition due to picture distractors is not predicted by the cascading model by Caramazza (1997), nor is it compatible with the post-lexical explanation of semantic interference that was devised for effects of word distractors (Mahon et al., 2007).

The type of semantic relation and the position of morphological overlap between distractors and target are naturally confounded. Associatively related distractors (e.g., garden gnome) overlap with the target name (e.g., garden chair) in their onset, sharing their modifier, while categorically related distractors overlap with the target in head position (e.g., rocking chair). There are no left-headed compounds in German that would allow separating overlap and semantic relatedness. Given that all three picture names started the same (e.g., garden gnome, garden chair, garden fence), participants could have prepared at least the modifier, in trials with associated stimuli, before even knowing which one was the target. Note, however, that this was not possible for the +A–M condition, which also showed semantic facilitation. Nevertheless, some additional processing advantage in the +A+M condition might result from phonological preparation, which still constitutes a downstream lexical effect of word-form access and phonological encoding.

Given the SOA of 600 ms, it is quite possible that our participants started the lexical encoding of one or more pictures before the cue appeared. Although discrete models predict that parallel phonological encoding should not happen even in those situations, Experiment 2 was designed to minimize such preparation effects by reducing the cue onset time to 200 ms.

Experiment 2: Cue Onset 200 ms

We reduced the SOA between the onset of the three pictures and the cue from 600 to 200 ms. A shorter cue-onset asynchrony provides less time for lexical activation of all pictures, and thus less time for an impact of lexical competition and of word-form similarity. Hence, a phonological preparation effect that might help target naming in cases of onset overlap (as with the associatively related stimuli) could be reduced. As a consequence, overall positive semantic (associative and categorical) effects, if present, might become more pronounced.