Stroop effects from newly learned color words: effects of memory consolidation and episodic context

Geukes, Sebastian; Gaskell, M. Gareth; Zwitserlood, Pienie

doi:10.3389/fpsyg.2015.00278

ORIGINAL RESEARCH article

Front. Psychol., 12 March 2015

Sec. Psychology of Language

Volume 6 - 2015 | https://doi.org/10.3389/fpsyg.2015.00278

Sebastian Geukes¹*

M. Gareth Gaskell²

Pienie Zwitserlood¹

¹Institut für Psychologie, Westfälische Wilhelms-Universität Münster, Münster, Germany
²Department of Psychology, University of York, York, UK

The Stroop task is an excellent tool to test whether reading a word automatically activates its associated meaning, and it has been widely used in mono- and bilingual contexts. Despite its ubiquity, the task has not yet been employed to test the automaticity of recently established word-concept links in novel-word-learning studies, under strict experimental control of learning and testing conditions. In three experiments, we thus paired novel words with native language (German) color words via lexical association and subsequently tested these words in a manual version of the Stroop task. Two crucial findings emerged: When novel-word Stroop trials appeared intermixed among native-word trials, the novel-word Stroop effect was observed immediately after the learning phase. If no native color words were present in a Stroop block, the novel-word Stroop effect only emerged 24 h later. These results suggest that the automatic availability of a novel word's meaning depends on supportive context from the learning episode and/or on sufficient time for memory consolidation. We discuss how these results can be reconciled with the complementary learning systems account of word learning.

Introduction

Learning a foreign language after childhood entails the acquisition of the rules of grammar of the novel language, knowledge that may arguably become explicit with practice, as well as learning a tremendous number of new words, as labels for concepts that have been acquired and mapped onto native words during first-language acquisition. Words as labels for concepts constitute explicit knowledge, and in the course of learning a new language, the human mental lexicon, which stores word knowledge, may double in size. There are intriguing questions as to when and how newly learned words are connected to the conceptual-semantic knowledge they refer to. Quite a few proposals have been offered for this aspect of second-language acquisition (e.g., Kroll and Stewart, 1994; Dijkstra and van Heuven, 2002). One problem that hampers the study of foreign-language vocabulary acquisition is that it mainly takes place in situations that do not provide adequate control over the input, the learning context, and many other potentially confounding variables that influence learning success.

For these reasons, researchers are increasingly turning to studying foreign-language learning with what are called novel-word learning paradigms. Common to these paradigms is that the learning input and method, stimulus materials, and external influences can all be kept under much stricter experimental control than in natural learning or in classroom situations. For example, entirely novel words are used instead of existing foreign words, to make sure that there is no overlap between the novel and native language word-forms. While learning with this approach may be ecologically less valid, many confounding influences can be excluded, allowing for clearer conclusions. The experimental manipulation of the learning process further makes it easier to relate the observed effects to the actual learning experience.

In recent novel-word learning studies, words and meanings were associated with rather different methods, such as presenting novel words together with definitions (e.g., Clay et al., 2007; Tamminen and Gaskell, 2013), associating novel words and their concepts by means of pictures (e.g., Yu and Smith, 2007; Dobel et al., 2010), and presenting novel words at the end of meaning-constraining sentences (Mestres-Missé et al., 2007; Borovsky et al., 2010, 2012). Common to these methods is that the word-concept links are established within a rich semantic context and with a salient focus on word meanings. Likewise, tests of these novel links also take place in contexts in which semantic processing is a major element of the task.

To test whether effective word-to-concept links have been established, different speeded and non-speeded tasks have been employed, such as object naming (Breitenstein et al., 2005), translation matching (Dobel et al., 2010), or semantic priming (Dobel et al., 2010; Tamminen and Gaskell, 2013). Results from these studies have shown that such links are indeed established and that these links are also evident when interacting with stimuli that were not presented during learning. However, both the more explicit, non-speeded tasks and the semantic priming paradigm are known to be susceptible to strategic manipulations (e.g., Neely, 1991), leaving it unclear how automatic the activation of the novel word's meaning actually is. Results from the Stroop task (Stroop, 1935; MacLeod, 1991), in contrast, are known to be much more robust against such manipulations. This makes the task a good one to test for at least some components of automaticity in the access process for word meanings¹ (Moors and De Houwer, 2006). The Stroop task thus promises to be an excellent extension to previous studies, because it allows us to assess whether reading a novel word will automatically activate its meaning. Surprisingly, there seems to be only one word-learning study (Altarriba and Mathis, 1997) that made use of the Stroop task to test newly learned links between unfamiliar words and color concepts, and even the results from this study offer only limited conclusions with regard to the automaticity of semantic activation in novel words (see below).

The main focus of the present study is thus (1) to link novel words with familiar concepts within a semantically poor learning context, without an explicit focus on semantic processing and (2) to assess whether this learning nevertheless results in stable links between novel words and their meaning, to the extent that this meaning is automatically activated when merely reading the novel word. Because consolidation effects have been observed in several recent word-learning studies (e.g., Dumay and Gaskell, 2007, 2012; Davis et al., 2008), a further aim of this study is to test whether the establishment and availability of such semantic links in any way depends on an opportunity for memory consolidation.

During learning, novel words were directly paired with L1 (German) color words in a statistical association procedure adapted from Breitenstein and Knecht (2002). In our version of the paradigm, pairs of novel words and native color words are presented, some pairs representing correct matches and some not, such that correct word-word links can only be derived over time, on the basis of co-occurrence frequencies. Importantly, participants are merely instructed to decide whether the novel and native word of the current pair match or not (by pressing one of two buttons); semantic processing of the word stimuli is not required, although it is not explicitly prevented either. This simple instruction and the fact that no explicit feedback is given make it a procedure of low cognitive demand (e.g., see Clay et al., 2007, for a more explicit procedure, and Kachergis et al., 2013, for an interactive approach). Given that the words are not paired directly with a perceptual representation of their to-be-learned color concepts (e.g., a color patch, a color-related object), any connection between the novel word and the color concept can only be drawn indirectly, via the native color word. The amount of exposure can be easily quantified and manipulated, because novel and native words are associated in a systematic fashion. Likewise, learning progress can be continuously monitored, because a matching judgment is required in every trial. This paradigm has been successfully employed in a number of studies to associate novel words with pictures of existing concepts, using both spoken novel words (Breitenstein et al., 2005, 2007; Yu and Smith, 2007; Dobel et al., 2010; Liuzzi et al., 2010; Freundlieb et al., 2012) and written novel words (Laeger et al., 2014). To our knowledge, our implementation is the first to associate word-word pairs instead of word-picture pairs based on this statistical procedure.

In the typical modern version of the Stroop task, participants name (or indicate by button press) the ink color of a presented word. This response is slowed down if the word's meaning is incompatible with the ink color (e.g., ink color is red, but word is BLUE). Thus, the word's meaning interferes with task performance, although the task does not require any processing of the word's meaning. Apparently, reading the word activates the conceptual representation associated with that word. This ability of the Stroop task to reveal automatic semantic activation in such an indirect way promises to be an excellent test for whether, how fast, and how strongly, novel words are linked to their assigned color concepts.

Many studies have investigated how color words from a second language (L2) compare to L1 color words in the Stroop task, with the typical result that a substantial, but smaller interference effect is found in L2 compared to L1 color words (e.g., Preston and Lambert, 1969; Chen and Ho, 1986; Sumiya and Healy, 2004; earlier work reviewed in MacLeod, 1991). Even if the L2 is well-established and participants report equal levels of competence in both languages, the effect is larger in color words from the language that is dominant in everyday use (Altarriba and Mathis, 1997). Stroop effects of comparable size in the speaker's two languages are only found when both usage and competence are equally high (e.g., Mägiste, 1984).

As mentioned earlier, we are aware of only one published experiment in which a group of participants learned the set of novel color words immediately before the Stroop test (Experiment 2 in Altarriba and Mathis, 1997). In this experiment, monolingual English-speaking participants were trained with a set of four Spanish color words and subsequently further familiarized with these words in a series of quizzes. The quizzes involved rehearsing the new lexical link (e.g., matching Spanish to English color words) as well as the new semantic link (e.g., matching the Spanish words to color patches or to compatible objects: amarillo [yellow] goes with the school bus). These Spanish words, along with the English translations, were then entered into a Stroop task, in which the ink color had to be named using English color terms. In the English trials, naming latencies between congruently and incongruently colored words differed by 112 ms. Importantly, there was a similar but smaller difference in the Spanish trials (52 ms), indicating that the incompatibility between the word meaning and the verbal response slowed down color naming even with these newly learned words as distractors (Altarriba and Mathis, 1997).

These results are remarkable as they show that, even after a short learning session, the newly learned L2 words have already been learned well enough to interfere with a task that does not explicitly require processing of word meaning. However, some features of this study hinder a full assessment of the power of the underlying semantic learning mechanisms. First of all, the color words of both languages have some phonological and orthographic overlap (e.g., red/rojo, yellow/amarillo) that may have artificially increased the L2 effect (cf. Sumiya and Healy, 2004). Second, given that the experiment was performed in the United States, it is also likely that participants, even though monolingual speakers of English, had some familiarity with the Spanish color words. Finally, the experiment required English color words as responses, which were the same color words that were repeatedly presented with the Spanish words during learning. One could argue that the observed interference was not between the novel words and their meanings, but between the novel words and the required English responses, as these links had been intensely rehearsed during learning (cf. the analogous discussion in the semantic priming literature on the differentiation between genuine semantic priming and priming by association, e.g., Lucas, 2000; Tamminen and Gaskell, 2013).

Hence, in our experiment, the stimuli and parameters of the Stroop task are chosen in such a way that these alternative explanations can be excluded. First, pseudowords instead of existing words are used to serve as to-be-learned color words. This is done to avoid any phonological/orthographic overlap between the L1 and the new color words, and to exclude that participants are familiar with any of the new words. Furthermore, the response format during the Stroop task is changed from verbal responses (color naming) to manual responses (color-matching): Participants indicate the ink color of the presented color-word stimulus by pressing one of four colored buttons. As the buttons are only present during the Stroop task, participants cannot learn any word-response associations beforehand. Consequently, a congruency effect in the Stroop task cannot be explained by a word-response association stemming from the learning phase.

The manual response format offers a further advantage over color naming. Although covert naming cannot be excluded (see e.g., Lupyan, 2012), lexical access is not even necessary to perform the Stroop task. Participants can simply rely on matching the presented ink color to the color of the corresponding button for correct responses. Consequently, this task should make it easier to ignore the presented word and its meaning. Indeed, in the native-language Stroop literature, the manual Stroop effect is usually substantially reduced relative to the verbal Stroop effect (about half the size, MacLeod, 2005). Moreover, Sharma and McKenna (1998) showed that, in contrast to verbal responses, there is no interference component in the manual response format that can be attributed to the word status of control items (that is, they found that a manual color-matching response to an XXXX letter string is as fast as a manual response to a color-unrelated existing word such as CHIEF). This in turn suggests that the manual response format more clearly captures the semantic component of the Stroop effect. In sum, the manual response format offers a stronger test for semantic learning of the novel color words.

Taken together, the adaptations we introduced to the original learning and testing paradigm provide a strong test of the power of semantic novel-word learning and of the automaticity of the resulting memory traces.

A further aim of our study was to test whether the establishment and availability of such semantic links depends on an opportunity for memory consolidation. In most studies that used our variant of a statistical learning procedure, learning took place over a number of consecutive days, and the crucial test of semantic integration was performed after the learning phase had been completed (e.g., Breitenstein et al., 2007; Dobel et al., 2010; Liuzzi et al., 2010; Freundlieb et al., 2012). With such designs, there is ample opportunity for consolidation, and it is not known whether effects obtained after 4 or 5 days of learning would also be present immediately after learning. However, in several other word-learning studies that used more targeted paradigms, clear effects of memory consolidation on word learning were found (e.g., Gaskell and Dumay, 2003; Bowers et al., 2005; Clay et al., 2007; Dumay and Gaskell, 2007; Tamminen et al., 2010; Tamminen and Gaskell, 2013; Bakker et al., 2014; but see Coutanche and Thompson-Schill, 2014; Kapnoula et al., 2015). It was further shown that, while consolidation may also happen during time awake (Walker, 2005; Lindsay and Gaskell, 2013), consolidation of novel words is strongest during sleep (Dumay and Gaskell, 2007; Henderson et al., 2012). There is also evidence that these consolidation effects are directly related to electrophysiological patterns of brain activity during sleep, such as sleep spindles and slow-wave activity (Tamminen et al., 2010, 2013).

Davis and Gaskell (2009) offered an explanation of the word-learning data based on the more general theory of Complementary Learning Systems (CLS; McClelland et al., 1995). According to their account, word learning is based on two separate neural systems, namely a fast-learning but temporary memory system involving the medial temporal lobe (particularly the hippocampus), and a slower-learning but longer-lasting neocortical memory system. Novel lexical entries are thought to rely initially on hippocampal mediation, with this reliance diminishing only some time after initial encoding, by means of interleaving novel and existing memories (possibly via hippocampal memory replay: Rasch and Born, 2008). Thus, novel lexical entries are thought to fully interact with existing neocortical memories only after they have been consolidated, avoiding the danger of catastrophic interference (McCloskey and Cohen, 1989).

Many of the studies that focus on consolidation included learning of novel word-forms and tested whether and when the novel words showed lexical competition effects with existing neighbors. Only a few looked at how acquiring the meaning of a novel word might be influenced by consolidation (e.g., Clay et al., 2007; Tamminen and Gaskell, 2013). The latter studies also showed evidence for consolidation, but the results were less clear than in the lexical-competition studies. Thus, further research is certainly warranted to identify necessary conditions for consolidation effects in semantic word learning.

Here, as the exact mechanism of any consolidation effects was not a focus of our study, a simple method for testing consolidation was selected: the set of novel words was split in half, and the two resulting sets of words were tested at different delays after learning. With this design, we are able to capture basic effects of consolidation, but not specific effects of sleep.

In the following, results from three experiments are reported. In all experiments, participants could associate novel words with L1 color words, by means of the above-described word-word pairing procedure. These words were then entered into a Stroop task, during which participants were instructed to press the button that corresponded in color to the ink color of the presented word. Novel words were presented either in their congruent (“learned”) or in an incongruent ink color. To capture the potential influence of memory consolidation, different subsets of the learned words were tested either immediately after learning and/or a day later.

Experiment 1 assessed whether newly learned color words would show any Stroop effects at all, immediately or a day after learning. To obtain a direct quantitative comparison of the effect sizes in the native and in the novel words, the novel words were intermixed with (L1) German color words. In Experiment 2, novel words were again tested alongside their German counterparts, but after a much shorter learning phase, and control trials were added to assess facilitation and inhibition components of the Stroop effect. Experiment 3 returned to the design of Experiment 1 and tested whether removing the German color word trials from the Stroop blocks affected the basic novel-word Stroop effect. To assess consolidation effects in more detail, this third experiment also included a second group of participants who received their first Stroop block only on the second day, 24 h after learning.

Experiment 1

Experiment 1 was designed to test whether novel color words are sufficiently integrated into lexico-semantic memory to produce Stroop congruency effects within 24 h of learning. In a brief learning session, novel words were associated with native color words and subsequently tested in a manual Stroop task. To assess potential effects of memory consolidation, half of the novel color words were tested immediately after learning, the other half 24 h later.

Materials and Methods

Outline

Experiment 1 was divided into two sessions, spaced approximately 24 h apart (see Figure 1C for an overview). Session 1 consisted of two parts: (a) Statistical learning of 10 novel words each paired with a German color term, with both novel and German words printed black (learning phase); (b) manual Stroop task with a subset of four novel color words and their German translations as stimuli (Stroop 1). Session 2: manual Stroop task, with a different subset of four novel color words and their German translations as stimuli (Stroop 2). In the manual Stroop tasks, participants had to press one of four colored buttons that matched the ink color of the novel or German word on the screen. To minimize effects of verbal short-term memory, a crossword puzzle separated the learning and test phases on Day 1.

Figure 1. Overview of Experiment 1. (A) Statistical learning principle: While match and mismatch trials appear equally often, some novel words are paired frequently with a particular native language color word (illustrated here for the pair of alep and blau [blue]). (B) Stroop task: Example stimuli for the four conditions. (C) The order of tasks.

Participants

Twenty-four native speakers of German, most of them students, took part in the experiment (21 female; age range: 19 to 28 years, M = 21.25, SD = 2.33). Participants reported having no color vision deficiency and had normal or corrected-to-normal visual acuity. They gave their written consent and received course credit or 9 €. All experiments reported here complied with the ethical standards formulated by the Ethics Committee of the Psychology department, University of Münster.

Materials

Four focal colors (red, green, blue, yellow) and four subordinate colors (violet, orange, pink, brown) were selected, as well as black and white. Except for the latter two, which were included merely to increase the size of the learning set, all colors were used as “ink” colors in the Stroop task. Two different subsets of four colors were used for the Stroop tasks on Day 1 and on Day 2. The two subsets were composed so as to keep the four colors within a set sufficiently discriminable (Set 1: red-yellow-violet-brown, Set 2: green-blue-pink-orange). The two subsets were identical for all participants, but the assignment of the subsets to the two Stroop sessions was counterbalanced between participants.

The 10 corresponding German color words were: rot (red), gelb (yellow), blau (blue), grün (green), lila (violet), orange (orange), pink (pink), braun (brown), schwarz (black), and weiß (white). These were used for novel word to color word associations during the learning phase and, except for the latter two, as word stimuli during the Stroop blocks.

Twenty-five nonwords (e.g., alep, fupo, lopek) from an existing corpus (Breitenstein and Knecht, 2002) served as novel words in the learning and the Stroop tasks. They are 4–5 letters long and do not elicit any particular lexical associations, as rated by an independent sample (see Breitenstein and Knecht, 2002, for details on word generation and selection criteria). The nonwords are easily pronounceable for native German speakers. Because of their common bi-syllabic structure and simple vowel-consonant alternations, they can be classified as stemming from a common vocabulary of an unknown language. Ten of these nonwords were selected to serve as to-be-learned color names, from which three different sets of novel word to color word assignments were constructed (see Supplementary Materials). We made sure that there was no phonological or graphemic onset or offset overlap between selected nonwords and their corresponding German color names within each list. The remaining 15 of the 25 nonwords served as fillers during statistical learning. For practical purposes, we will henceforth use the generic term Language to differentiate the sets of German and novel words.

Experimental Procedure

The experiment was conducted using DMDX software (Forster and Forster, 2003) running on a Windows PC. Stimuli were presented at an eye-to-screen distance of about 60 cm on a 17″ LCD monitor running at 120 Hz. Stimuli appeared on a gray background (RGB values: 210-210-210). Words appeared in lower case Arial Bold font, subtending a maximal visual angle of about 3.5° horizontally and 1° vertically. Responses were recorded using a standard Windows keyboard connected via a USB port.

Learning procedure

The learning paradigm was adapted from the statistical learning procedure described by Breitenstein and Knecht (2002). During the learning phase, pairs of words were presented on a computer screen. Each pair consisted of a novel word and a German color word. On each trial, a fixation cross appeared centrally for 200 ms, followed by one of the novel words in black font, just above the center. 250 ms later, a German color word was added to the display, just below the center, and also in black. The two-word display remained on the screen for 1500 ms. From the onset of the second word, participants had a 1800 ms time window to decide whether the two words belonged together or not, pressing the right shift-key to indicate that the words belong together, or the left shift-key to indicate that they do not. Within the learning block, matching and mismatching word pairs appeared equally often (cf. Figure 1A).

Participants were informed beforehand that it was initially impossible to tell whether a pair matched or not, but that during the course of the learning phase, the more frequent co-occurrence of some word-word pairs would help discriminate matching from mismatching pairs. No trial feedback was given unless the participant failed to respond in time, in which case the words “Zu langsam!” (= too slow) were presented at the bottom of the screen for 600 ms. After the button press or the time-out feedback, the next trial started after a random delay between 100 and 400 ms.

The statistical learning principle was implemented in the following manner (see also Table 1): During the learning phase, each German color word was presented 24 times with its to-be-associated novel word (match trials), and once with each of the remaining 24 novel words (mismatch trials). Of the 24 novel words from the mismatch trials, nine were from the other novel words of the learning set (i.e., novel words to take on the meaning of a different color). The remaining mismatch words were novel words that appeared in mismatch trials only and were not systematically associated with any particular meaning. Thus, over the course of the learning phase, participants could find out the matching word-word pairs only by exploiting the frequency of couplings.

Table 1. Frequencies of word pairings during statistical learning of Experiment 1.

The learning phase consisted of 480 trials and lasted about 22 min. It was subdivided into 4 blocks of 120 trials each, separated by three 30-s breaks. Trials were presented in different random order for each participant, with the constraint that each 120-trial block contained 6 match and 6 non-match trials for each of the German color words. After the learning phase on Day 1, participants filled out the crossword puzzle (duration approx. 5 min.), after which the Stroop task of Day 1 followed.
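The pairing frequencies and the per-block constraint described above can be checked with a short sketch. It is only an illustration under stated assumptions: the novel-word labels are hypothetical placeholders rather than items from the nonword corpus, and spreading the 24 mismatch pairings of each color word evenly over the four blocks is one way to satisfy the constraint, not necessarily the randomization that was actually used.

```python
import random
from collections import Counter

# The ten German color words are from the text; the novel-word labels below
# are hypothetical placeholders, not the actual items from the nonword corpus.
COLOR_WORDS = ["rot", "gelb", "blau", "grün", "lila",
               "orange", "pink", "braun", "schwarz", "weiß"]
TARGET_WORDS = [f"target{i:02d}" for i in range(1, 11)]   # to-be-learned words
FILLER_WORDS = [f"filler{i:02d}" for i in range(1, 16)]   # mismatch-only words


def build_learning_blocks(color_words, target_words, filler_words,
                          n_blocks=4, n_match=24, seed=0):
    """Return n_blocks lists of (novel_word, color_word, is_match) trials."""
    rng = random.Random(seed)
    novel_words = target_words + filler_words
    per_block = n_match // n_blocks  # 6 match and 6 mismatch trials per color
    blocks = [[] for _ in range(n_blocks)]
    for color, target in zip(color_words, target_words):
        # One mismatch trial with each of the remaining 24 novel words.
        mismatches = [(w, color, False) for w in novel_words if w != target]
        rng.shuffle(mismatches)
        for b in range(n_blocks):
            blocks[b] += [(target, color, True)] * per_block
            blocks[b] += mismatches[b * per_block:(b + 1) * per_block]
    for block in blocks:
        rng.shuffle(block)  # random trial order within each block
    return blocks


blocks = build_learning_blocks(COLOR_WORDS, TARGET_WORDS, FILLER_WORDS)
all_trials = [trial for block in blocks for trial in block]
pair_counts = Counter((novel, color) for novel, color, _ in all_trials)
print(len(all_trials), [len(block) for block in blocks])
print(pair_counts[("target01", "rot")], pair_counts[("target01", "gelb")])
```

Under these assumptions, a to-be-learned word co-occurs 24 times with its own color word but only once with any other color word, which is the only cue to the correct pairing.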

Stroop task

Immediately after the crossword puzzle and again at the beginning of the second day's experimental session, participants took part in a Stroop block. The Stroop tasks of Day 1 and Day 2 were identical except that different sets of four colors were used on each day, along with the corresponding German and the learned novel color words.

In the Stroop task, words were presented one at a time: either a German color word or a novel color word. These words were printed in one of the four ink colors assigned to that session, yielding congruent and incongruent combinations of ink color and word meaning (see Figure 1B). Each trial began with the presentation of a fixation cross that stayed on the screen for 200 ms and was followed by a word presented centrally for 150 ms. Participants were to indicate the ink color of the word by pressing the correspondingly colored response button as quickly as possible. Four buttons of the PC keyboard were used (“y” “x” “,” and “.” on the German layout), marked by correspondingly colored stickers. Participants were to use their left and right middle and index fingers to indicate the ink color the word had been presented in, ignoring the word's meaning. Color-to-button assignments were switched between participants. Participants were given 1800 ms to respond. Feedback was given on the screen for all responses (Richtig! = correct, Falsch! = incorrect, Zu langsam! = too slow). A blank screen (random duration between 850 and 1150 ms) concluded each trial.

For the Stroop task, we selected only one incongruent ink color for each German or novel color word: e.g., we presented gike either in red (congruent) or in yellow (incongruent), not in the ink colors violet and brown that also appeared during the same block (see Table 2). The reason for this deviation from the classic Stroop design is the following: In a typical native-language four-colors Stroop task (e.g., with colors red, green, blue, yellow), each color word is presented three times as often in the congruent version (red printed in red) as in each of the three possible incongruent versions (red printed in green, blue, or yellow), such that congruent and incongruent trials occur equally often. However, if we had presented the novel-word Stroop trials according to this scheme, participants would have had an additional opportunity to learn the correct novel-word-to-color couplings (because, e.g., gike, meaning red, is more often presented in red than in any of the other colors). Moreover, such a presentation scheme would also have provided the opportunity for direct word-response association (e.g., gike = second button from left), which would be a severe confound in a manual Stroop task. Schmidt et al. (2007) present evidence for such associative learning within the Stroop task. By presenting the color words in just one incongruent version, we eliminated any opportunity to learn the correct word-color or word-response pairs within the Stroop task. Crucially, this excludes the possibility that subsequent performance differences between congruent and incongruent Stroop trials might be due to or influenced by learning effects during the Stroop task itself. The German color words were presented in the same incongruent color as the corresponding novel color words.

Table 2. Overview of color-word stimuli in the Stroop task.

Each of the session's four German and four novel color words was shown 30 times in its congruent and 30 times in its incongruent ink color, yielding 480 trials, which were presented randomly in 4 blocks of 120 trials, separated by breaks of 30 s. The Stroop task lasted about 26 min.
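The trial structure of one Stroop session can be sketched as follows. Color Set 1, the German words, and the example of gike (red) shown in red or yellow come from the text; the other novel-word labels and the remaining incongruent ink assignments are hypothetical placeholders, and the unconstrained shuffle is an assumption about how trials were randomized.

```python
import random
from collections import Counter

# gike = red (shown in red or yellow) is from the text. The other novel labels
# and the remaining incongruent ink assignments are hypothetical placeholders.
WORDS = {
    "red":    {"German": "rot",   "Novel": "gike"},
    "yellow": {"German": "gelb",  "Novel": "novel_yellow"},
    "violet": {"German": "lila",  "Novel": "novel_violet"},
    "brown":  {"German": "braun", "Novel": "novel_brown"},
}
INCONGRUENT_INK = {"red": "yellow", "yellow": "red",
                   "violet": "brown", "brown": "violet"}


def build_stroop_blocks(words, incongruent_ink, reps=30, n_blocks=4, seed=0):
    """Return n_blocks lists of (word, language, ink, congruency) trials."""
    rng = random.Random(seed)
    trials = []
    for meaning, by_language in words.items():
        for language, word in by_language.items():
            # Each word appears equally often in exactly two ink colors.
            trials += [(word, language, meaning, "congruent")] * reps
            trials += [(word, language, incongruent_ink[meaning],
                        "incongruent")] * reps
    rng.shuffle(trials)
    size = len(trials) // n_blocks
    return [trials[i * size:(i + 1) * size] for i in range(n_blocks)]


stroop_blocks = build_stroop_blocks(WORDS, INCONGRUENT_INK)
ink_counts = Counter((word, ink) for block in stroop_blocks
                     for word, _, ink, _ in block)
print(ink_counts[("gike", "red")], ink_counts[("gike", "yellow")])
```

Because every word is paired equally often with its congruent and its single incongruent ink color, word identity carries no information about which of the two inks (or buttons) is correct on a given trial.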

On the second day, 24 ± 2 h after the first session, participants returned to the laboratory to repeat the Stroop task. This second Stroop task included the remaining set of four colors and their corresponding German and novel color words. All other details were identical to the Stroop task on Day 1.

Results

Learning Phase

To assess learning success, the percentage of correct responses was calculated for the final block from the learning phase (last 60 trials). Participants reached an average level of 95.1% (SD = 4.4) correct decisions (chance level = 50%; see Supplementary Material for learning curves for all three experiments).

Stroop Task

For reaction time (RT) analysis of the Stroop data, the first 40 trials of each day's Stroop block, error trials, as well as the slowest and fastest 5% of each condition's remaining responses were discarded before calculating mean RTs. On both days and in both stimulus languages, responses to incongruent trials were slower than those to congruent trials, but the effect was larger in the German trials (Figure 2).
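The exclusion steps can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the number of trials trimmed at each end is rounded down, the trial records are dictionaries with hypothetical keys, and trimming is applied per condition after practice and error trials have been removed.

```python
def trimmed_mean_rt(rts, percent=5):
    """Mean RT after dropping the fastest and slowest `percent` of responses.

    The number of trials dropped at each end is rounded down, which is an
    assumption; the text does not state the rounding rule.
    """
    ordered = sorted(rts)
    k = len(ordered) * percent // 100
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def condition_means(trials, n_practice=40, percent=5):
    """trials: list of dicts with keys 'condition', 'rt', 'correct', in
    presentation order for one participant and one day."""
    by_condition = {}
    for trial in trials[n_practice:]:       # discard the first trials of the day
        if trial["correct"]:                 # discard error trials
            by_condition.setdefault(trial["condition"], []).append(trial["rt"])
    return {c: trimmed_mean_rt(r, percent) for c, r in by_condition.items()}
```

With 20 remaining responses in a condition, for example, this rule removes the single fastest and the single slowest response before averaging.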

Figure 2. Mean response times in the Stroop task of Experiment 1. Error bars here and in the following graphs indicate within-participant standard errors of the mean (Loftus and Masson, 1994; Cousineau, 2005; Morey, 2008).
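One common way to obtain such within-participant standard errors is the normalization described by Cousineau (2005) combined with the correction proposed by Morey (2008). The sketch below assumes this combination; the caption cites three sources and does not specify which exact procedure was applied, and the function name and data layout are hypothetical.

```python
import math
import statistics


def within_participant_sem(data):
    """data: one row per participant, one column per condition (mean RTs).

    Returns one SEM per condition after removing between-participant
    variability (Cousineau, 2005) and applying Morey's (2008) correction.
    """
    n_conditions = len(data[0])
    n_participants = len(data)
    grand_mean = statistics.mean(x for row in data for x in row)
    # Replace each participant's own mean with the grand mean.
    normalized = [
        [x - statistics.mean(row) + grand_mean for x in row] for row in data
    ]
    # Morey's correction scales the variance by M / (M - 1), M = conditions.
    correction = math.sqrt(n_conditions / (n_conditions - 1))
    sems = []
    for j in range(n_conditions):
        column = [row[j] for row in normalized]
        sem = statistics.stdev(column) / math.sqrt(n_participants)
        sems.append(sem * correction)
    return sems
```

If all participants show the same condition differences and differ only in overall speed, the resulting error bars shrink to zero, which is the intended property of a within-participant measure.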

A repeated-measures analysis of variance (ANOVA) with the factors Language (German/Novel), Congruency (Congruent/Incongruent), and Day (Day 1/Day 2) was conducted to confirm these observations. There were main effects of Language (responses to German words were slower than those to novel words), F_{(1, 23)} = 28.49, p < 0.001, η²_p = 0.55, and of Congruency (responses to congruent stimuli were faster than those to incongruent stimuli), F_{(1, 23)} = 95.80, p < 0.001, η²_p = 0.81. The main effect of Day narrowly missed significance, F_{(1, 23)} = 3.87, p = 0.061, η²_p = 0.14. As indicated by a significant Congruency by Language interaction, F_{(1, 23)} = 65.12, p < 0.001, η²_p = 0.74, the congruency effect was larger for German color words (mean congruency effect over both days: 73 ms) than for novel color words (mean effect 20 ms). The remaining two-way interactions did not reach significance (Fs ≤ 1.31, ps ≥ 0.264).
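The reported effect sizes can be checked against the F values, because partial eta squared follows from F and its degrees of freedom as η²_p = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this relation to the values above; the function and variable names are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F value, reported partial eta squared) for effects with df = (1, 23).
reported = {
    "Language": (28.49, 0.55),
    "Congruency": (95.80, 0.81),
    "Day": (3.87, 0.14),
    "Congruency x Language": (65.12, 0.74),
}
for effect, (f_value, eta) in reported.items():
    print(effect, round(partial_eta_squared(f_value, 1, 23), 2), eta)
```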

To add statistical backing to the visual impression that congruency effects were present at both time points in both stimulus languages, we calculated separate repeated-measures ANOVAs for the German and novel word mean RTs, each including Congruency and Day as factors. The resulting pattern of effects was identical for both languages. The only significant effect in both cases was the main effect of Congruency: German words, F_{(1, 23)} = 129.40, p < 0.001, η²_p = 0.85; novel words, F_{(1, 23)} = 14.97, p < 0.001, η²_p = 0.39. The main effect of Day was marginally significant in both languages: German words, F_{(1, 23)} = 4.01, p = 0.057, η²_p = 0.15; novel words, F_{(1, 23)} = 3.17, p = 0.09, η²_p = 0.12. The interaction effect was not significant in either of the languages: German words, F_{(1, 23)} = 2.01, p = 0.170; novel words, F_{(1, 23)} = 0.56, p = 0.46. Thus, in both stimulus languages, the congruency effect was present on both days and did not change significantly between days.

Although Congruency did not reliably interact with Day in either stimulus language, there was a three-way interaction of Language by Congruency by Day in the overall ANOVA, F_{(1, 23)} = 5.69, p = 0.026, η²_p = 0.20. This is explained by the fact that the congruency effect changed in opposite directions from Day 1 to Day 2 in the two languages: It decreased for the German words (from 82 to 65 ms) and increased for the novel words (from 15 to 24 ms). Although these changes themselves are not significant (see interaction effects in within-language ANOVAs), the three-way interaction is.
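The logic of this interaction can be illustrated with the reported congruency effects. In a fully within-participant 2 × 2 × 2 design, the three-way interaction is equivalent to testing whether the difference between the two languages' overnight changes, computed per participant, deviates from zero. The sketch below only performs this arithmetic on the rounded group means given in the text; the names are hypothetical and no inferential test is run.

```python
# Congruency effects (incongruent minus congruent, in ms) as reported above.
effects = {
    "German": {"Day 1": 82, "Day 2": 65},
    "Novel": {"Day 1": 15, "Day 2": 24},
}


def overnight_change(language):
    """Change in the congruency effect from Day 1 to Day 2."""
    return effects[language]["Day 2"] - effects[language]["Day 1"]


def three_way_contrast():
    """Difference between the two languages' overnight changes."""
    return overnight_change("German") - overnight_change("Novel")


print(overnight_change("German"))  # -17
print(overnight_change("Novel"))   # 9
print(three_way_contrast())        # -26
```

Neither change alone is reliable, but the two changes differ from each other by 26 ms, which is the quantity the three-way interaction evaluates.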

Errors showed a pattern similar to that of the RTs. A repeated-measures ANOVA with the factors Language, Congruency, and Day on the arcsine-transformed percent error rates revealed significant main effects of Language, F_{(1, 23)} = 15.48, p < 0.001, η²_p = 0.40, and Congruency, F_{(1, 23)} = 12.55, p = 0.002, η²_p = 0.35. Neither the main effect of Day nor any of the interactions reached significance (all Fs < 2.29, all ps > 0.143). Averaged over the two sessions, the mean percent error rates were (SDs in brackets): German congruent, 5.71 [3.80], incongruent, 7.84 [5.04], novel congruent, 4.91 [3.76], incongruent, 6.48 [4.03].
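The arcsine transformation is commonly applied to proportions to stabilize their variance before an ANOVA. The sketch below assumes the widespread arcsine square-root variant; the text does not specify which variant was used (some authors multiply the result by two), and the function name is illustrative.

```python
import math


def arcsine_transform(percent_error):
    """Arcsine square-root transform of an error rate given in percent."""
    proportion = percent_error / 100.0
    return math.asin(math.sqrt(proportion))


# Condition means reported above, transformed for illustration only; an
# analysis would transform each participant's rates before the ANOVA.
for label, rate in [("German congruent", 5.71), ("German incongruent", 7.84),
                    ("novel congruent", 4.91), ("novel incongruent", 6.48)]:
    print(label, round(arcsine_transform(rate), 3))
```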

Discussion

Experiment 1 was designed to test whether novel words that have recently been associated with native color words via lexical association are already able to produce a congruency effect in the Stroop paradigm. The response-time findings show that this is indeed the case: Immediately after learning as well as 24 h later, novel color words generated sizable congruency effects. Given that learning in this experiment consisted of a word-word-association procedure that neither required nor encouraged deep semantic processing of the novel words, the presence of a Stroop effect seems notable. The fact that we see the effect immediately after learning suggests that, under these conditions, consolidation is not necessary for the effect to emerge.

We further found that the change of the congruency effect between the two sessions was not identical in the two stimulus languages: The congruency effect in the German words decreased by 17 ms on the second day compared to the first day's Stroop session, while in the novel words the effect increased by 9 ms. Thus, in both languages, congruency effects are present on both days, but the significantly contrasting pattern of overnight changes in the Stroop effects, signaled by the three-way interaction, points to the possibility that, during the 24 h interval, the two classes of words are processed in a qualitatively different way. Experiment 3 will address the question of time and consolidation effects more directly.

The learning run in this first experiment, although based on a relatively shallow learning task, contained a large number of trials per word and thus resulted in a classification performance that approached ceiling levels. It is therefore unclear whether the novel-word congruency effect crucially depends on such a large number of learning trials or whether a substantial reduction in the number of trials will lead to a qualitatively similar result.

Furthermore, so far the Stroop sessions only contained congruent and incongruent trials but no neutral control stimuli, rendering it impossible to clearly identify the effect as facilitatory, inhibitory, or a mix of both. In native-language Stroop, these two main components (facilitation and inhibition) can indeed be distinguished (e.g., Redding and Gerjets, 1977). They are respectively defined as the difference in response times between neutral control stimuli and congruent stimuli (facilitation) or between neutral control stimuli and incongruent stimuli (inhibition). While the relative proportions of the components may vary depending on the properties of the neutral stimuli (e.g., Sharma and McKenna, 1998), the inhibition (interference) component is typically substantially larger than the facilitation component (MacLeod, 1991). If the novel-word effect were closely linked to the native-word effect, then it should at least be similarly divisible into an inhibitory and a facilitatory component.
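The two definitions can be written out as a short sketch. The sign convention (positive values indicate an effect in the expected direction) and the function name are assumptions made for illustration, and the RT values are hypothetical rather than data from the study.

```python
def decompose_stroop(congruent_rt, neutral_rt, incongruent_rt):
    """Split a congruency effect into facilitation and inhibition (in ms)."""
    facilitation = neutral_rt - congruent_rt
    inhibition = incongruent_rt - neutral_rt
    return {
        "facilitation": facilitation,
        "inhibition": inhibition,
        "congruency_effect": facilitation + inhibition,
    }


# Hypothetical mean RTs, not data from the study.
print(decompose_stroop(congruent_rt=600, neutral_rt=610, incongruent_rt=650))
```

By construction, the two components sum to the congruency effect, so a design without neutral trials can measure the total but cannot apportion it.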

In Experiment 2, we addressed both the question of learning intensity and the question of whether the novel word congruency effect is composed of facilitation, inhibition, or both.

Experiment 2

The design of Experiment 2 closely followed that of Experiment 1, but it contained two changes. First, we lowered the number of learning trials per novel word to one third of that from the previous experiment, to test whether the congruency effect in the novel words is obtained even if the classification performance at the end of learning is significantly reduced. Second, to isolate facilitation and inhibition effects, we introduced neutral control stimuli into the experiment, namely names of non-color-related objects.

Because these control stimuli were supposed to serve as a baseline for the respective stimulus language's color words, we introduced control items for both languages: for German color words, a set of non-color-related German object names (e.g., Mappe [folder]); for novel color words, a further set of novel words that were to become translations of the German object names. The latter were learned in the same manner as the novel color words. Thus, German and novel color words had their own corresponding lexical baselines (the respective object names). The experiment also included a set of non-lexical control stimuli (strings consisting of upper- and lowercase X letters), but because responses to these stimuli did not differ from those to the other (lexical) control items, we will only briefly report the results from this condition.