The Use of Electroencephalography in Language Production Research: A Review

Speech production research long avoided electrophysiological experiments because of the suspicion that artifacts caused by the muscle activity of overt speech would lead to a poor signal-to-noise ratio in the measurements. Researchers have therefore sought to assess speech production with indirect tasks, such as tacit or implicit naming, delayed naming, or meta-linguistic tasks such as phoneme-monitoring. Covert speech may, however, involve different processes than overt speech production. Recently, overt speech has been investigated using electroencephalography (EEG). The steadily rising number of published papers indicates an increasing interest in, and demand for, overt speech research within the cognitive neuroscience of language. Our main goal here is to review all currently available results of overt speech production involving EEG measurements, obtained with tasks such as picture naming, Stroop naming, and reading aloud. We conclude that overt speech production can be successfully studied using electrophysiological measures, for instance, event-related brain potentials (ERPs). We discuss possible relevant components in the ERP waveform of speech production and address the issues of how to interpret the results of ERP research using overt speech, and whether the ERP components in language production are comparable to results from other fields.

Talking is a daily routine in our lives. However, to date there are only a few language production studies, in particular on sentence processing, using event-related potential (ERP) measures. This is because lip, head, and eye movements, for instance, accompany overt speech (e.g., Grözinger et al., 1975; Brooker and Donald, 1980; Wohlert, 1993). It was feared that such muscle activation would distort the electroencephalography (EEG) signal and therefore make it impossible to investigate language production using EEG. To avoid this problem, language production research focused on meta-linguistic tasks (e.g., phoneme-monitoring), covert naming, and delayed naming (e.g., Van Turennout et al., 1997; Schmitt et al., 2000, 2001; Abdel Rahman et al., 2003). These tasks succeed in avoiding potential speech movement related artifacts; however, they are not without disadvantages. For instance, in the case of covert naming, one cannot be sure whether participants follow task instructions. Moreover, the need to actually produce speech may be important to earlier processing and may qualitatively influence the speech production process. For instance, intracranial recordings and an fMRI study showed a different pattern of brain activity for covert versus overt naming (Christoffels et al., 2007b; Pei et al., 2011). In the case of button-presses, it is unlikely that only language processes contribute to the response. For instance, in the case of error processing, it cannot be completely excluded that some of the observed errors were due to action slips (e.g., responding with the wrong hand) and were not verbal errors per se (e.g., responding "yes" to the phoneme /n/ in lamp).
The recent increase in published papers measuring overt speech responses using EEG clearly indicates that there is an interest and a great demand for research in language production combining both overt speech responses and EEG recordings. In this paper, we will give an overview of all presently published studies that used tasks requiring immediate overt responses (e.g., picture naming).
The paper is organized as follows: first, we review studies that focused on stimulus-locked analyses, i.e., analyses time-locked to stimulus onset and covering the interval until a response was given. Within these studies, a division is made between studies investigating native language production and studies investigating bilingual language production. Second, we review studies that investigated response-locked ERPs, i.e., processes occurring shortly before or after an overt response was given.
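The difference between the two analysis types can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed studies: the function names (extract_epochs, average_erp), the single-channel list layout, and the toy sample indices are all assumptions made for illustration. The sketch cuts the same recording once around stimulus onsets and once around vocal response onsets and averages the resulting epochs.

```python
from statistics import fmean

def extract_epochs(signal, event_samples, pre, post):
    """Cut fixed-length epochs around each event sample.

    signal: list of voltages for one channel (hypothetical data layout).
    event_samples: sample indices of the time-locking events.
    pre, post: number of samples kept before and after each event.
    Epochs that would run past either end of the recording are skipped.
    """
    epochs = []
    for ev in event_samples:
        start, stop = ev - pre, ev + post
        if start >= 0 and stop <= len(signal):
            epochs.append(signal[start:stop])
    return epochs

def average_erp(epochs):
    """Point-by-point average across epochs."""
    return [fmean(samples) for samples in zip(*epochs)]

# Toy recording: a flat signal with a deflection of 1.0 at each stimulus
# onset and a deflection of -2.0 at each vocal response onset.
signal = [0.0] * 100
stim_onsets = [10, 40, 70]
resp_onsets = [22, 55, 81]  # naming latencies differ from trial to trial
for s in stim_onsets:
    signal[s] = 1.0
for r in resp_onsets:
    signal[r] = -2.0

# Stimulus-locked: time zero (index 5 of each epoch) is stimulus onset.
stim_locked = average_erp(extract_epochs(signal, stim_onsets, pre=5, post=20))
# Response-locked: time zero (index 5 of each epoch) is response onset.
resp_locked = average_erp(extract_epochs(signal, resp_onsets, pre=5, post=5))
```

In this toy example the response-related deflection keeps its full size in the response-locked average, but is spread over several time points and reduced in the stimulus-locked average, because naming latencies vary across trials. This is one reason why the two analysis types are treated separately.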

STIMULUS-LOCKED STUDIES

NATIVE LANGUAGE PRODUCTION
To our knowledge, the first published work that combined overt speech with EEG recordings was conducted by Duncan-Johnson and Kopell (1981) and closely replicated much later by Liotti et al. (2000). In both of these studies, a Stroop task was used, in which participants were instructed to overtly name the color a word was printed in while ignoring the word itself. However, these earlier studies are limited by sample size (i.e., 12 and 8, respectively) and by the number of analyzed electrodes (e.g., only three midline electrodes). Recently, the interest in combining language production with EEG has been revived. The majority of these recent studies investigate the time course of word selection during language production. Most of what we know about the time course of the stages of spoken word production comes from chronometric experiments (e.g., voice-key onset latencies; Levelt et al., 1999) and meta-analytic temporal estimates (Indefrey and Levelt, 2004). The high temporal resolution of EEG can provide more information about the time course of spoken word production when combined with tasks that require overt speech production.
According to Levelt et al. (1999), production of a spoken word consists of lexical selection, lemma retrieval, morphological and phonological code retrieval, and finally articulation. Most of the recent ERP studies focused on the lexical access aspect of word production (Hirschfeld et al., 2008; Costa et al., 2009; Dell'Acqua et al., 2010; Strijkers et al., 2010; Aristei et al., 2011).
In a picture-word interference (PWI) paradigm, Hirschfeld et al. (2008) combined each picture with four different distractors: a non-linguistic distractor (e.g., a row of Xs), an unrelated distractor word (e.g., flower - DOG), and two types of semantic distractors: words that reflected surface features of a target (e.g., fur - DOG) and words that belonged to the same semantic category as a target (e.g., cat - DOG). In the 120-220 ms post-stimulus time interval, the feature-related condition resulted in a more negative deflection of the ERP waveform than the unrelated condition. This effect was interpreted as reflecting facilitation of early stages of visual object processing. During the same time interval, there was a significant difference between all linguistic distractors and the non-linguistic ones. This effect was explained as a result of general conflict-monitoring processes, which are stronger for words than for a row of Xs, since only the words have to be suppressed before naming a target picture. However, the 120-220 ms time window approximately corresponds to the time window of 150-250 ms estimated for lexical selection (Indefrey and Levelt, 2004). Thus, it is possible that the observed difference between linguistic and non-linguistic distractors was driven by lexical access, since that is what distinguishes word distractors from a row of Xs. This explanation is in line with the findings of more recent studies (Costa et al., 2009; Sahin et al., 2009; Dell'Acqua et al., 2010; Strijkers et al., 2010; Aristei et al., 2011).
For instance, Aristei et al. (2011) combined PWI with a blocking paradigm [i.e., naming pictures in a semantic context (e.g., cat, dog, horse) and in an unrelated context (e.g., cat, table, flute)]. Aristei et al. (2011) report similar timing for distractor and blocking effects (200 and 250 ms post-stimulus presentation, respectively), possibly suggesting that both effects have similar underlying mechanisms and occur within the time frame of lexical access (Indefrey and Levelt, 2004). In another recent study, Costa et al. (2009) used a so-called cumulative semantic interference paradigm. In this paradigm, participants were asked to name pictures presented in intermixed semantic categories (e.g., turtle, hammer, tree, crocodile, bus, axe, snake, etc.). The typical finding for this paradigm is that the naming latency of a given picture depends on the ordinal position of the picture, i.e., on how many items from the same category preceded it (Howard et al., 2006; Costa et al., 2009). Costa et al. (2009) showed that pictures elicited a typical P1/N1/P2 ERP complex in all conditions. In addition, Costa et al. (2009) demonstrated a modulation of the P2, N2, and P300 components. In the N400 window, there was a significant effect of ordinal position; however, it did not correspond to the cumulative pattern seen in the other components. Furthermore, similar to Aristei et al. (2011), they showed that lexical access occurred around 200 ms after the onset of the picture. This finding is in line with their previous picture naming study, in which Strijkers et al. (2010) showed that the P2 was sensitive to the lexical frequency of the items, with low-frequency items eliciting more positive amplitudes than high-frequency items.
Further evidence for the time course of lexical access comes from an anomic patient study. Anomic patients have difficulties in word production that could arise at different levels of word production: semantic, lexical, or phonological. Laganaro et al. (2009) recorded ERPs while anomic patients overtly named a series of pictures. They found that patients with lexical-semantic impairment exhibited ERP abnormalities starting at 110 ms after the picture onset. Interestingly, it has also been shown that during object naming, in-depth semantic knowledge about an object causes variation in EEG response 120 ms after object presentation (Abdel Rahman and Sommer, 2008).
Next to lexical access, the time course of morphological encoding in overt language production was investigated (Koester and Schiller, 2008). Koester and Schiller (2008) used a long lag-priming paradigm. Participants were presented with words and pictures, and were instructed to read aloud the words and to name the pictures aloud. The words were compounds that were morphologically related to a picture name (e.g., jaszak "coat pocket" - JAS "coat") or form-related monomorphemic words (e.g., jasmijn "jasmine" - JAS "coat"). The N400 amplitudes, starting 350 ms after the picture onset, were reduced for morphologically related compounds but not for form-related words. This corresponds to the language comprehension literature, where there is evidence that N400 amplitudes are sensitive to morphological processing (e.g., McKinnon et al., 2003). Further evidence comes from a study using intracranial recordings within Broca's area. Sahin et al. (2009) cued participants to inflect nouns (singular/plural) and verbs (past/present). The signal was modulated by the demand of inflection at 320 ms after the target word onset. The neuronal changes were independent of word class. The timing of this effect is also in accordance with meta-analytic temporal estimates of morphological encoding (Indefrey and Levelt, 2004).
Eulitz et al. (2000) mapped the time course of phonological encoding during overt picture naming and forming nominal phrases (e.g., using the name and the color of the picture). Eulitz et al. (2000) compared overt production with passive viewing of the same pictures and found ERP markers of phonological encoding between 275 and 400 ms after picture onset. This effect was more pronounced in middle and posterior temporal regions in the left than the right hemisphere, possibly suggesting the involvement of Wernicke's area during phonological encoding.
In a PWI paradigm, an effect of phonological distractors occurred in a similar time frame, at about 300 ms after picture onset (Dell'Acqua et al., 2010). Laganaro et al. (2009) showed that anomic patients who had impaired phonological encoding demonstrated normal electro-cortical activity (i.e., similar to healthy control participants) before 300 ms, but abnormal patterns between 300 and 450 ms. This timing was also corroborated by intracranial recordings that showed sensitivity to phonological processes at about 450 ms after the target word onset (Sahin et al., 2009). This time window corresponds to the estimated time course of phonological encoding (Indefrey and Levelt, 2004).
The papers discussed above have focused on single word production. However, in our everyday communication more complex utterances are produced. To our knowledge, there is only one published paper that investigated conceptual planning in complex utterances in overt language production (Habets et al., 2008). More specifically, Habets et al. (2008) addressed the so-called linearization problem, i.e., the ordering of events in a sentence (e.g., "before X did A, Y did B" or "after Y did B, X did A"). Participants saw a sequence of two pictures. Each picture consisted of an object that has a strong association with a particular action (e.g., book and reading). Participants were instructed to describe the sequence of two actions associated with the objects in chronological or reverse order. A color cue indicated the to-be-produced order. ERPs for the "after" condition were more negative than for the "before" condition. This difference emerged between 180 and 230 ms after the vocalization cue, and had a fronto-central distribution. The timing of this effect corresponds closely with comprehension studies investigating the temporal order of events in sentences (Münte et al., 1998) and is associated with the engagement of working memory processes in understanding non-chronological sentences. From 300 ms onward, a parietal distribution was observed. This effect reflects the conceptualization complexity of "before" sentences (Habets et al., 2008).

BILINGUAL LANGUAGE PRODUCTION
To investigate lexical access during production of words in a second language, researchers focused on cognate words (Christoffels et al., 2007a; Verhoef et al., 2009; Strijkers et al., 2010). Cognates are words that are phonologically similar in different languages (e.g., the German - Dutch pair: Apfel - appel). Cognates are typically named faster than non-cognates (e.g., Costa et al., 2000, 2005; Christoffels et al., 2003, 2006). Christoffels et al. (2007a) found more negative amplitudes for cognates compared to non-cognates at about 300 ms after the picture onset, which corresponds with the phonological encoding of words. Strijkers et al. (2010) found a somewhat earlier effect of cognates starting around 200 ms after picture onset, with cognates having more negative amplitudes than non-cognates. The pattern was remarkably similar during both first and second language naming. Note, however, that Figure 5 of Christoffels et al. (2007a) shows a difference between cognates and non-cognates already at around 170 ms after the picture onset. Verhoef et al. (2009) also manipulated the cognate status of picture names; however, they do not report any main effect of cognates. Therefore, it is impossible to say whether and when the effects were present.
Next to cognate effects, Christoffels et al. (2007a) and Verhoef et al. (2009) investigated the role of cognitive control and inhibition during language switching. To investigate this issue, a switching paradigm was used, in which participants were cued to name a picture in their first (L1) or second language (L2). Christoffels et al. (2007a) found that naming in L1 was slower and the ERPs were modulated between 275 and 375 ms (the time window of the N2) compared to naming pictures in L2. Verhoef et al. (2009) manipulated the time between cue and picture onset (i.e., long versus short stimulus onset intervals). They found that preparation time influenced the degree to which inhibitory control biased language competition as indexed by the N2. Chauncey et al. (2009) also found modulation of the N2 amplitudes. Participants were instructed to overtly name pictures in their L1 (English) and their L2 (French). Pictures were preceded by a word prime, presented for 70 ms. Primes were either the (English or French) name of the to-be-named picture or were unrelated to the picture. The language of the prime word affected the ERPs at about 200 ms after picture onset, but only when pictures were named in L2 and not in L1. The authors argued that the L1 prime interfered with suppression of the L1 lexical activation, which is needed for L2 but not L1 production, thereby creating a conflict reflected in the N2 amplitudes (Chauncey et al., 2009).
First steps have also been taken to investigate the processes involved in translation from one language to another. Christoffels et al. (2009) asked participants to translate interlingual homographs, i.e., words that share their orthographic form but have a different meaning in two languages (e.g., "room" refers to cream in Dutch), and control words. Participants had to translate targets from and to their first and second language. The authors showed that the brain starts to distinguish between translation directions as early as 200 ms. The results of the study are in line with the idea that language information in the input, a "language cue," rather than an output lexicon, helps to reduce competition between languages when selecting the proper target response (Kroll et al., 2010).

CONCLUSION
The studies discussed above demonstrate that the combination of EEG recording and language production can be successfully employed. The studies provide converging evidence about the time course of word production in both native and second languages. Specifically, the brain engages in lexical selection around 200 ms after picture onset (e.g., Hirschfeld et al., 2008; Costa et al., 2009; Strijkers et al., 2010; Aristei et al., 2011), phonological encoding between 275 and 400 ms (Eulitz et al., 2000), and morphological processes starting around 350 ms after the picture onset (Koester and Schiller, 2008). The ERP research indicates that this time course is in accordance with the estimated timings reported by Indefrey and Levelt (2004). It also demonstrates that EEG recording may be a very sensitive tool to investigate temporal and qualitative differences between first and second language production. However, most of the paradigms used in speech production research require not only production of an utterance, but also comprehension (e.g., reading distractors) and domain-general processes (e.g., suppressing distractor activation). A potentially more "pure" production paradigm could be a verbal fluency task, in which participants are required to name members of a given category within a given time. However, even within production tasks it is difficult to manipulate different stages, e.g., lexical, morphological, phonological, and speech planning, independently of each other. Thus, ERPs could reflect multiple components associated with various comprehension, production, and domain-general processes. Future studies are needed to disentangle these various aspects during speech production.

RESPONSE-LOCKED
During speech production, we continuously monitor what we say and what we are about to say. In investigating the working of the speech production monitor, researchers have focused on error monitoring. An electrophysiological measure related to error processing is the error-related negativity (ERN; Falkenstein et al., 1991; Gehring et al., 1993), a component of the ERP that has a fronto-central scalp distribution and peaks about 80 ms after an overt incorrect response (Bernstein et al., 1995; Scheffers et al., 1996; Holroyd and Yeung, 2003). The ERN originates in the anterior cingulate cortex (ACC) and/or the supplementary motor area (SMA; e.g., Dehaene et al., 1994; Debener et al., 2005). Recently, studies demonstrated an ERN after errors in meta-linguistic tasks (e.g., Ganushchak and Schiller, 2006, 2008a; Sebastián-Gallés et al., 2006) and in tasks that require an overt response (e.g., Masaki et al., 2001; Möller et al., 2007; Ganushchak and Schiller, 2008b; Riés et al., 2011). We will review the latter studies below.
Masaki et al. (2001) were the first to investigate whether an ERN occurs following speech errors in the Stroop color-word task. Participants were instructed to overtly name the color of each stimulus. Masaki et al. (2001) found an ERN-like component after speech errors, e.g., when participants named the wrong color. Masaki et al. (2001) used loud pink noise to suppress a so-called vocalization-related cortical potential (VRCP). The VRCP is related to a movement-related potential preceding vocalization and an auditory-evoked potential that follows vocalization (Gunji et al., 2000). The VRCP has a similar time course to the ERN but is independent of the correctness of the response. However, using a masking procedure might not be ideal to study verbal self-monitoring. Speakers use their output as feedback to monitor their own speech (e.g., Levelt et al., 1999). Removing such feedback might interfere with the normal working of the monitoring process (e.g., Christoffels et al., 2007b; Christoffels et al., 2011).
In a more recent study on verbal self-monitoring, no masking procedure was used. Möller et al. (2007) used a so-called SLIP paradigm to induce errors. In this task, participants have to read inductor word pairs such as "ball doze," "bash door," and "bean deck," which are followed by a target word pair such as "darn bore" (see Motley et al., 1982). The reversal of initial phonemes in the target pair compared to the inductor pairs may lead to onset exchange errors such as "barn door." Möller et al. (2007) asked their participants to covertly read the inductor word pairs and vocalize the target word pair preceding a response cue. They found an enlarged negativity on error trials, both preceding and following the response cue. The first negativity reflects conflict at a phonological/phonetic encoding stage. The second negativity indexes conflict at an articulatory motor stage. Interestingly, Severens et al. (2011) found a similar negativity following the response cue in the absence of errors on taboo-eliciting trials (e.g., katten nut → natte k * t; cats sense → wet c * t) compared to neutral trials. The authors concluded that taboo errors were elicited and corrected internally prior to articulation, and suggested that the negativity reflects resolution of conflict rather than detection of conflict.
Ganushchak and Schiller (2008b) employed a semantic blocking picture naming task to study error monitoring in speech production. In addition to semantic context, participants' motivation was manipulated. In the high-motivation condition, participants were told that they would be financially punished for speech errors. In the low-motivation condition, neither financial punishment nor reward was administered. The authors obtained an ERN on error trials. The amplitude of the ERN was modulated by semantic context, with larger amplitudes for semantic blocks than unrelated blocks, indicating that semantic relatedness resulted in higher conflict between potential verbal responses. Furthermore, the ERN was larger and peaked later in the high-motivation condition compared to the low-motivation condition, indicating higher monitoring activity.
Another component that is associated with error processing is the error positivity (Pe), which is thought to reflect a more thorough evaluation of the error response (Falkenstein et al., 1991). The Pe has a centro-parietal distribution and peaks about 300 ms after the overt error. Contrary to the ERN, the Pe is specific to overt and detected errors (for a review see Overbeek et al., 2005). The Pe after overt vocal responses is inconsistently reported in the literature. For instance, Masaki et al. (2001) report a Pe after incorrect trials. However, Riés et al. (2011) showed a Pe following errors that required a manual response, but not after overt speech errors. It is possible that during overt speech production some of the errors are left undetected and therefore no Pe is elicited (for discussion of this issue see Riés et al., 2011). More research is needed to determine whether the Pe can be reliably observed following overt vocal responses and what the possible underlying mechanisms are.
The studies reviewed above suggest that verbal monitoring might be a special case of general performance monitoring rather than a completely different process. If so, the ERN should also be observed on correct trials. However, in the studies described above, no ERN was reported on correct trials. In contrast, in nonverbal tasks, the ERN was shown on both correct and incorrect trials (e.g., Vidal et al., 2000, 2003; Bartholow et al., 2005). The ERN-like amplitude on correct trials is smaller than on incorrect trials. During overt speech tasks, this negativity could have been masked by motor artifacts and therefore remained undetected on correct trials (Riés et al., 2011). To analyze overt picture naming data, Riés et al. (2011) used a blind source separation algorithm on the basis of canonical correlation analysis (BSS-CCA; De Clercq et al., 2006). This method reliably reduces the EMG artifacts induced by articulation (see De Vos et al., 2010). This analysis method allowed Riés et al. (2011) to reliably observe the ERN on both correct and incorrect trials, supporting the hypothesis that the verbal monitoring involved in speech production is part of a general-purpose mechanism. This electrophysiological evidence is supported by imaging studies showing ACC and SMA activation during overt naming (e.g., Christoffels et al., 2007b). Interestingly, McArdle et al. (2009) showed that the Bereitschaftspotential (BP), an electrophysiological index of voluntary movement, was modulated by linguistic processes such as lexical access independently from articulation. This suggests that the premotor system plays a role in lexical access and provides further evidence of a functional interaction between cortical motor and language networks (McArdle et al., 2009).
Taken together, the studies reviewed in this section suggest that the ERN obtained in overt speech production tasks is comparable to the ERN found in action monitoring studies and can be used as an electrophysiological marker in psycholinguistic research. More generally, the reliable investigation of language processes using overt responses in combination with EEG recordings is possible even in response-locked analyses.

METHODOLOGICAL RECOMMENDATIONS AND CONCLUSION
The above-reviewed studies show that artifact-free brain responses can be measured up to at least 400 ms post-stimulus presentation (e.g., Eulitz et al., 2000; Christoffels et al., 2007a; Aristei et al., 2011). In a stimulus-locked analysis, care needs to be taken to exclude trials that are contaminated by the earliest responses. A recent study using Independent Component Analysis, however, showed that the early ERP components might not necessarily be artifact-free (Porcaro et al., 2010). Thus, the results should be interpreted with caution, and different methods, e.g., Independent Component Analysis, should potentially be used to remove movement-related artifacts. For the response-locked analysis, researchers interested in the ERN could use the standard procedures also used in action monitoring studies. However, this is true only for error trials. The ERN on correct trials is significantly smaller than the one on error trials; it is more likely to be masked by motion artifacts (which are larger in overt speech than in button-presses) and is also strongly affected by the severe filtering (up to 12 Hz) that is commonly applied in ERN analyses on error trials. Researchers interested in the later processes, such as self-monitoring and response evaluation on correct rather than error trials, should preferably use different methods of analysis to remove motion-related artifacts (e.g., BSS-CCA; De Vos et al., 2010; Riés et al., 2011).
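The recommendation to exclude trials contaminated by the earliest responses could be implemented along the following lines. This is a hedged sketch only: the function name keep_trials, the margin_ms parameter, and the example latencies are hypothetical, and the 400 ms default simply mirrors the window mentioned above rather than a validated cutoff. Trials without a registered voice onset are also dropped here, on the assumption that their articulation onset cannot be verified.

```python
def keep_trials(latencies_ms, window_end_ms=400, margin_ms=0):
    """Return the indices of trials that can enter a stimulus-locked average.

    latencies_ms: voice onset latency per trial in ms after stimulus onset,
        or None when no voice onset was registered (hypothetical convention).
    window_end_ms: end of the analyzed post-stimulus window.
    margin_ms: optional safety margin, since articulatory muscle activity
        can start before the acoustic voice onset.
    """
    cutoff = window_end_ms + margin_ms
    return [
        i for i, latency in enumerate(latencies_ms)
        if latency is not None and latency > cutoff
    ]

# Illustrative naming latencies for five trials.
latencies = [350, 620, 410, None, 780]

kept = keep_trials(latencies)                       # [1, 2, 4]
kept_strict = keep_trials(latencies, margin_ms=50)  # [1, 4]
```

A safety margin is included because articulation-related muscle activity may precede the acoustic onset detected by a voice key; how large such a margin should be is an empirical question that this sketch does not answer.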
In terms of design, a simple and important consideration is to make sure that conditions are comparable in terms of overt output. It is known that the morphology of the speech artifacts in the ERPs varies systematically with the phonetic properties of the utterance. Therefore, it is advisable to compare conditions in which identical words are produced (Aristei et al., 2011) or, when this is impossible, to take care to match the to-be-produced words not only on the usual measures, such as frequency of occurrence, but also on their phonetic properties.
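Whether two conditions require identical overt output can be checked before data collection. The following Python sketch is an illustration under stated assumptions, not a procedure from the reviewed studies: the function names and the word lists are hypothetical, and the check only covers the identical-words case, not the matching of phonetic properties.

```python
from collections import Counter

def unmatched_words(cond_a, cond_b):
    """Return the words (with repetitions) produced in only one condition."""
    a, b = Counter(cond_a), Counter(cond_b)
    only_a = sorted((a - b).elements())
    only_b = sorted((b - a).elements())
    return only_a, only_b

def same_overt_output(cond_a, cond_b):
    """True if both conditions contain the same words equally often."""
    only_a, only_b = unmatched_words(cond_a, cond_b)
    return not only_a and not only_b

# Hypothetical target names: the same pictures are named in both contexts,
# only the order (and hence the context) differs.
related_block = ["cat", "dog", "horse", "cat"]
unrelated_block = ["dog", "cat", "cat", "horse"]
matched = same_overt_output(related_block, unrelated_block)  # True

# A design in which one condition uses a different word is flagged.
leftover = unmatched_words(["cat", "dog"], ["cat", "table"])
# leftover == (["dog"], ["table"])
```

When the leftover lists are not empty, the remaining words are the ones that would need to be matched on phonetic properties such as onset phoneme.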
The ERP studies reviewed here demonstrate that classical ERP components, among them the P2, N400, and ERN, can be observed in paradigms that require an overt speech response. Thus, this review suggests that combining ERP with overt articulation is not only possible but necessary to provide more insights into language production processes, allowing investigation of the temporal flow and scalp distributions of well-established behavioral effects (e.g., semantic interference) as well as investigation of the various stages of word and sentence production.