On the Proper Treatment of the N400 and P600 in Language Comprehension

Event-Related Potentials (ERPs), that is, stimulus-locked, scalp-recorded voltage fluctuations caused by post-synaptic neural activity, have proven invaluable to the study of language comprehension. Of interest in the ERP signal are systematic, recurring voltage fluctuations called components, which are taken to reflect the neural activity underlying specific computational operations carried out in given neuroanatomical networks (cf. Näätänen and Picton, 1987). For language processing, the N400 component and the P600 component are of particular salience (see Kutas et al., 2006, for a review). The typical approach to determining whether a target word in a sentence leads to differential modulation of these components, relative to a control word, is to look for effects on mean amplitude in predetermined time-windows on the respective ERP waveforms, e.g., 350–550 ms for the N400 component and 600–900 ms for the P600 component. The common mode of operation in psycholinguistics, then, is to tabulate the presence/absence of N400- and/or P600-effects across studies, and to use these categorical data to inform neurocognitive models that attribute specific functional roles to the N400 and P600 components (see Kuperberg, 2007; Bornkessel-Schlesewsky and Schlesewsky, 2008; Brouwer et al., 2012, for reviews).
Here, we assert that this Waveform-based Component Structure (WCS) approach to ERPs leads to inconsistent data patterns, and hence, misinforms neurocognitive models of the electrophysiology of language processing. The reason for this is that the WCS approach ignores the latent component structure underlying ERP waveforms (cf. Luck, 2005), thereby leading to conclusions about component structure that do not factor in spatiotemporal component overlap of the N400 and the P600. This becomes particularly problematic when spatiotemporal component overlap interacts with differential P600 modulations due to task demands (cf. Kolk et al., 2003). While the problem of spatiotemporal component overlap is generally acknowledged, and occasionally invoked to account for within-study inconsistencies in the data, its implications are often overlooked in psycholinguistic theorizing that aims to integrate findings across studies. We believe WCS-centric theorizing to be the single largest reason for the lack of convergence regarding the processes underlying the N400 and the P600, and thus a serious hindrance to the advancement of neurocognitive theories and models of language processing.

WHY THE DATA ARE INCONSISTENT
ERP studies examining the processing of semantic incongruity sometimes report contradictory results. To shed light on these contradictions, Van Petten and Luka (2012) (henceforth VP&L) conducted a systematic review of semantic incongruity effects. VP&L selected studies comparing incongruent to congruent sentence-final words (e.g., "He spread the warm bread with socks/butter"; Kutas and Hillyard, 1980) in healthy adults, using sentences that were otherwise syntactically felicitous, and procedures that did not have an explicit by-item acceptability or judgment task. As these studies were mostly targeted at the N400 component, statistics for the P600 time-window were not always available; if they were not, a P600-effect was judged to be present if the difference in this time-window was at least half as large as the effect in the preceding N400 time-window. VP&L's literature search yielded 45 studies with a total of 64 incongruent/congruent contrasts. They observed that 21 (≈33%) contrasts produced a biphasic N400/P600-effect (cf. Figure 1A), whereas 43 (≈67%) contrasts produced an N400-effect only (cf. Figure 1B).
Hence, on the WCS approach, one third of a set of contrasts, selected to be as homogeneous as possible, produces a biphasic N400/P600-pattern, whereas the other two thirds produce an N400-effect only. Crucially, as VP&L pointed out, this set of results is internally inconsistent. At most, one would predict quantitative differences between contrasts, not qualitative differences. Any viable theory or model of the N400 and the P600 in language comprehension must address this apparent inconsistency in the elicitation pattern of the P600, regardless of whether it predicts a P600-effect to be present or absent for semantic incongruities, in order to successfully capture the full data spectrum.
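The window-based measure and the fallback criterion described above can be made concrete with a minimal sketch. The function names are hypothetical, the inclusive window boundaries are an assumption, and so is the requirement that the late difference be positive; the text only states that it should be at least half as large as the N400-effect.

```python
import numpy as np

def mean_amplitude(times_ms, waveform_uv, window_ms):
    """Mean voltage inside an inclusive time-window (assumed convention)."""
    start, stop = window_ms
    mask = (times_ms >= start) & (times_ms <= stop)
    return float(waveform_uv[mask].mean())

def window_effect(times_ms, target_uv, control_uv, window_ms):
    """Target-minus-control difference in mean amplitude for one window."""
    return (mean_amplitude(times_ms, target_uv, window_ms)
            - mean_amplitude(times_ms, control_uv, window_ms))

def p600_judged_present(n400_effect_uv, p600_window_effect_uv):
    """Fallback rule when no P600 statistics are reported: the late
    difference must be at least half the size of the N400-effect.
    Requiring a positive-going difference is an assumption made here."""
    return (p600_window_effect_uv > 0
            and p600_window_effect_uv >= 0.5 * abs(n400_effect_uv))
```

For example, an N400-effect of -4 µV followed by a late difference of +2 µV would count as biphasic under this rule, whereas a late difference of +1.9 µV would be tabulated as an N400-effect only. This illustrates how a continuous quantity is turned into a categorical outcome.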

WAVEFORM-BASED VS. LATENT COMPONENT STRUCTURE
The WCS approach to ERPs derives component structure (e.g., the modulation pattern of the N400 and P600 components) from the observable waveform by looking at effects on mean amplitude in predetermined time-windows (but see Groppe et al., 2011a,b, for an alternative approach). The (implicit) logic behind this is that peak amplitudes (maxima or minima in voltages) in these time-windows are indicative of the components of interest. However, this logic violates an important principle of ERP interpretation: A peak is not the same thing as a component, and the point at which an ERP waveform peaks carries no significance in itself (cf. Luck, 2005). At any given point, a waveform merely shows the summation of the latent components contributing to the ERP signal at that time. Indeed, the processes underlying different components may temporally overlap, and the ERP signal at any given point may be composed of multiple components. Moreover, at different electrode sites, the precise composition of these components may vary, depending on the location of their generators. Hence, at any electrode, peaks in the observed waveform are nothing more than epiphenomena of the underlying Latent Component Structure (LCS), the set of components contributing to the scalp-recorded ERP signal. Due to spatiotemporal component overlap, this LCS may look very different from the WCS.
We can now distinguish between WCS-derived N400- and P600-effects, and the N400 and the P600 as latent components. Consider the idealized waveforms depicted in Figure 1. The top row shows the WCS for two contrasts between a target (red line) and a control condition (blue line), one contrast producing a biphasic N400/P600-effect (Figure 1A), and one producing an N400-effect only (Figure 1B). The bottom row, in turn, decomposes the target condition (red line) of each of these contrasts into a hypothesized LCS underlying its waveform.
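The summation principle can be written compactly. The notation below is introduced here purely for illustration and is not taken from the cited sources: $V_e(t)$ is the voltage observed at electrode $e$ and time $t$, $c_k(t)$ is the time course of latent component $k$, $w_{e,k}$ is an assumed weight with which that component projects to electrode $e$, and $\varepsilon_e(t)$ is residual noise.

```latex
V_e(t) \;=\; \sum_{k=1}^{K} w_{e,k}\, c_k(t) \;+\; \varepsilon_e(t)
```

Under this linear sketch, a peak in $V_e(t)$ need not coincide with the peak of any single $c_k(t)$, and the same latent components can yield differently shaped waveforms at different electrodes because the weights $w_{e,k}$ differ.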
On this LCS perspective, both contrasts do lead to an increase in the amplitude of the latent P600 component (dashed yellow line), and modulate it to an equal degree. Crucially, the contrasts differ in the degree of modulation of the latent N400 component (dashed green line). Spatiotemporal overlap between the N400 and the P600 causes the larger N400 (contrast 2) to obscure the P600 in the WCS, whereas the smaller N400 (contrast 1) does not. Hence, when the latent N400 and P600 are both modulated to a strong enough extent, LCS and WCS may be in qualitative agreement on component structure (contrast 1). If, on the other hand, one latent component is modulated to a greater degree than the other, WCS and LCS may qualitatively disagree (contrast 2).
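The scenario above can be reproduced with a minimal simulation. Everything in it is an assumption chosen for illustration: the latent components are modeled as Gaussians, and the peak latencies, widths, and amplitudes are hypothetical values, not estimates from any study.

```python
import numpy as np

TIMES_MS = np.arange(0, 1201)      # 0-1200 ms at 1 ms resolution
N400_WINDOW = (350, 550)           # time-windows as given in the text
P600_WINDOW = (600, 900)

def gaussian_component(peak_uv, center_ms, width_ms):
    """Idealized latent component (Gaussian shape is an assumption)."""
    return peak_uv * np.exp(-0.5 * ((TIMES_MS - center_ms) / width_ms) ** 2)

def waveform(n400_peak_uv, p600_peak_uv):
    """Observed waveform = sum of the overlapping latent components."""
    return (gaussian_component(n400_peak_uv, 450, 150)
            + gaussian_component(p600_peak_uv, 750, 150))

def window_mean(signal_uv, window_ms):
    mask = (TIMES_MS >= window_ms[0]) & (TIMES_MS <= window_ms[1])
    return float(signal_uv[mask].mean())

def wcs_effects(target_uv, control_uv):
    """Target-minus-control mean amplitude in the N400 and P600 windows."""
    diff = target_uv - control_uv
    return window_mean(diff, N400_WINDOW), window_mean(diff, P600_WINDOW)

control = waveform(-1.0, 1.0)
# Both targets add the SAME +2 uV to the latent P600;
# they differ only in how strongly the latent N400 is modulated.
contrast_1 = waveform(-1.0 - 2.0, 1.0 + 2.0)   # smaller N400 modulation
contrast_2 = waveform(-1.0 - 8.0, 1.0 + 2.0)   # larger N400 modulation

n400_eff_1, p600_eff_1 = wcs_effects(contrast_1, control)
n400_eff_2, p600_eff_2 = wcs_effects(contrast_2, control)
print(f"contrast 1: N400 window {n400_eff_1:+.2f} uV, P600 window {p600_eff_1:+.2f} uV")
print(f"contrast 2: N400 window {n400_eff_2:+.2f} uV, P600 window {p600_eff_2:+.2f} uV")
```

With these hypothetical settings, contrast 1 shows a clear positivity in the P600 window, whereas in contrast 2 the tail of the larger N400 cancels most of it, even though the latent P600 modulation is identical by construction.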
We now consider possible LCS-derived explanations for the apparent WCS-derived inconsistencies in the data. More specifically, given the robustness of the N400-effect across the studies reviewed by VP&L, there are effectively two LCS accounts to consider. The first entails an explanation in which spatiotemporal overlap between the latent N400 and the latent P600 in the control condition leads to the spurious presence of a P600-effect for the target condition relative to control, that is, a constellation in which the latent P600 for the control condition is less positive than the latent P600 in the target condition. However, as the control condition is always less negative in the N400 time-window, there is no plausible way for the latent P600 to be manifest as less positive than the target in the WCS. The second account, in turn, entails an explanation in which the absence of a P600-effect is due to spatiotemporal overlap between the latent N400 and the latent P600. For this account, there are two interacting factors that could explain why the P600 does not survive overlap with the N400. Firstly, target items may simply vary in degree of N400 modulation, and hence in the degree of P600 attenuation. This predicts that if the incongruent sentence-final words get replaced by congruent, but unexpected (non-zero Cloze) completions, N400 amplitudes for target items should be attenuated, rendering the P600 more likely to survive spatiotemporal overlap and be visible in the WCS. Consistent with this prediction, VP&L report 13 studies with a total of 27 unexpected/expected contrasts, 18 (≈67%) of which indeed produce a P600-effect. Secondly, as we will argue below, the absence of a by-item task in the studies reviewed by VP&L induces an overall bias toward attenuated P600 modulations.

TASK-DEPENDENCE OF THE P600
It is well-established that the processes underlying the P600 are strongly task-dependent; that is, if an experiment does not involve a by-item acceptability or plausibility judgment task, this typically leads to attenuation of P600 amplitude (e.g., Kolk et al., 2003 and Schacht et al., 2014; see Kuperberg, 2007 and Brouwer et al., 2012 for discussion). Crucially, VP&L specifically selected the studies included in their review to not have an overt by-item task, thereby skewing the distribution of findings toward attenuation of the P600. The significance of this becomes particularly apparent when one considers the interaction between task-dependence and spatiotemporal component overlap: Task-driven attenuation of the P600 is predicted to have a more pronounced effect on WCS when a target condition increases both N400 and P600 amplitude relative to control, compared to when it only increases P600 amplitude.
Two experiments by Kolk et al. (2003) support this prediction. In a first experiment with an explicit acceptability judgment task, they found that a semantically incongruous contrast produced a biphasic N400/P600-effect relative to control, and that a semantic reversal anomaly (e.g., a sentence describing a fox hunting poachers) produced a P600-effect only. In a second experiment, the judgment task was removed, and whereas the P600-effect to the reversal anomaly persisted, the semantic incongruity now produced an N400-effect only. This logic can be applied in reverse to the studies reviewed by VP&L: If these studies were repeated with a by-item acceptability judgment task, the P600 is predicted to survive spatiotemporal overlap with the N400, and be apparent for all semantic incongruities and unexpectancies in the WCS.
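The interaction between task-dependence and component overlap can be sketched in the same idealized way. The model below is an assumption for illustration only: latent components are Gaussians, the task is represented as a single hypothetical gain on the latent P600 modulation, and all amplitudes are invented values rather than fits to the Kolk et al. (2003) data.

```python
import numpy as np

TIMES_MS = np.arange(0, 1201)      # 0-1200 ms at 1 ms resolution
P600_WINDOW = (600, 900)

def gaussian(peak_uv, center_ms, width_ms):
    return peak_uv * np.exp(-0.5 * ((TIMES_MS - center_ms) / width_ms) ** 2)

def p600_window_effect(n400_mod_uv, p600_mod_uv, task_gain):
    """Mean target-minus-control difference in the P600 window.
    task_gain scales only the latent P600 modulation (assumption)."""
    diff = (gaussian(n400_mod_uv, 450, 150)
            + gaussian(task_gain * p600_mod_uv, 750, 150))
    mask = (TIMES_MS >= P600_WINDOW[0]) & (TIMES_MS <= P600_WINDOW[1])
    return float(diff[mask].mean())

WITH_TASK, WITHOUT_TASK = 1.0, 0.3   # hypothetical gains

# (latent N400 modulation, latent P600 modulation) in microvolts
conditions = {
    "semantic incongruity": (-4.0, 3.0),   # modulates both components
    "reversal anomaly": (0.0, 3.0),        # modulates the P600 only
}

effects = {
    (name, gain): p600_window_effect(n400, p600, gain)
    for name, (n400, p600) in conditions.items()
    for gain in (WITH_TASK, WITHOUT_TASK)
}
for (name, gain), effect in effects.items():
    print(f"{name:22s} task_gain={gain:.1f}  P600-window effect {effect:+.2f} uV")
```

In this linear sketch the absolute reduction caused by removing the task is the same for both conditions, but the qualitative outcome differs: the reversal anomaly retains a positivity in the P600 window, while for the incongruity the residual positivity is cancelled by the overlapping N400, so that only an N400-effect would be tabulated.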

INVESTIGATING LATENT COMPONENT STRUCTURE
To the extent that the goal of ERP-based investigations of language is to inform our understanding of the underlying computational operations involved, researchers should be concerned with LCS rather than WCS. While the importance of LCS is generally acknowledged, psycholinguistic theorizing is predominantly WCS-centric (but see Hagoort, 2003, for a notable exception), thereby lagging behind other fields in which the implications of LCS have long been incorporated in interpreting ERP findings (see Squires et al., 1975; Näätänen, 1982, among others). Hence, psycholinguistics should go beyond just acknowledging the problem, and start factoring in the implications of LCS in integrating findings within and across studies. Ideally, this should also lead to a shift from studying WCS to studying LCS. However, as the scalp-recorded ERP signal is a summation of latent components, it obscures their independent contributions, and hence investigating the LCS underlying this signal is extremely challenging. Nonetheless, experimental design considerations as well as the (complementary) use of different analysis techniques may help to mitigate this challenge.
First of all, experimental designs could incorporate contrasts that focus on isolating a single component of the signal by keeping effects of spatiotemporal overlap constant between conditions. For instance, if the component of interest is the N400, spatiotemporal overlap with the P600 could be reduced as much as possible by running the experiment without any explicit task. If, on the other hand, the P600 is the component of interest, a relevant task should be used, as well as designs that keep the degree of N400 modulation constant, for instance by using context manipulation designs in which the target word is primed to an equal degree across conditions (cf. reversal anomalies; see Brouwer et al., 2012, for discussion).
Secondly, analysis techniques could be explored that go beyond identifying ERP components by applying ANOVAs or LMEMs on pre-determined time-windows. Such techniques include methods that decompose the signal into principal (PCA; Donchin and Heffley, 1978) or independent components (ICA; Makeig et al., 1997), those that aim to identify processing stages from the signal (using HSMMs; Borst and Anderson, 2015; Anderson et al., 2016), mass univariate analysis (Groppe et al., 2011a,b), regression-based estimation of the waveform (rERPs; Smith and Kutas, 2015a,b), and time-frequency analysis (TFA; Pfurtscheller and Da Silva, 1999; Roach and Mathalon, 2008). These methods cannot solve the problem of the obscured signal itself, but may offer complementary insights that help unravel the underlying LCS. For instance, Regel et al. (2014) show that P600 activity yields a specific frequency profile (power increase/decrease) as determined by TFA. Hence, for a given signal, the presence of such a frequency profile (in the average time-frequency representation) might be indicative of the presence of a P600, even though no P600-effect is present in the WCS.
Finally, LCS could be investigated through bottom-up modeling of the ERP signal. Data from complementary neuroimaging methods (e.g., fMRI and PET) and lesion studies, as well as neuroanatomical models of language electrophysiology (e.g., Brouwer and Hoeks, 2013) could be used to constrain and guide source modeling (see Elting et al., 2003, for a decomposition of the P300 into the P3a and P3b). Moreover, temporal overlap of the N400 and P600 could be modeled within explicit neurocomputational models of ERPs (Alday et al., 2014; Brouwer et al., 2017). Brouwer et al. (2017), for instance, show that their neurocomputational model of the N400 and the P600 best accounts for ERP data on semantic processing if spatiotemporal overlap between these components is taken into account.
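As one illustration of the decomposition idea, the following is a minimal sketch of a temporal PCA on simulated data, computed with a plain singular value decomposition. It is not the procedure of any of the cited papers; the Gaussian component shapes, amplitude ranges, noise level, and number of observations are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
times_ms = np.arange(0, 1201, 4)          # 0-1200 ms, sampled every 4 ms

def gaussian(center_ms, width_ms):
    return np.exp(-0.5 * ((times_ms - center_ms) / width_ms) ** 2)

# Two overlapping latent components (idealized shapes, assumed)
n400_shape = -gaussian(450, 150)
p600_shape = gaussian(750, 150)

# Each simulated observation mixes the two components with its own amplitudes
n_obs = 200
n400_amp = rng.uniform(0.0, 6.0, n_obs)
p600_amp = rng.uniform(0.0, 3.0, n_obs)
noise = rng.normal(0.0, 0.1, (n_obs, times_ms.size))
data = np.outer(n400_amp, n400_shape) + np.outer(p600_amp, p600_shape) + noise

# Temporal PCA: center each time point across observations, then apply SVD
centered = data - data.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values ** 2 / np.sum(singular_values ** 2)

print("variance explained by first two components:", explained[:2].sum())
```

The first two principal components capture nearly all of the variance here because the data were generated from two latent sources. Note, however, that principal components are orthogonal by construction, whereas the simulated N400 and P600 overlap in time, so the recovered components need not coincide with the latent ones. This is in line with the caveat above that such methods offer complementary insights rather than a solution.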

CONCLUSION
Event-Related Potentials (ERPs) are invaluable to the study of language comprehension, but psycholinguistics often gets them wrong. The standard mode of operation is to tabulate the presence/absence of effects on mean amplitude in predetermined time-windows, which are then taken to be indicative of language-sensitive ERP components, such as the N400 and the P600. The VP&L review on the processing of semantic incongruity shows that this WCS approach leads to inconsistent data patterns. We have argued that when faced with such apparent inconsistencies in patterns of WCS-derived N400- and/or P600-effects across studies with similar manipulations, the LCS underlying the observed waveforms may offer a principled explanation for the observed variance. The observed WCS may (minimally) be decomposed into the contribution of latent N400 and P600 components, which may overlap spatiotemporally in the ERP signal. Importantly, these latent components are known to be attenuated and/or amplified in systematic ways: For instance, N400 amplitude is sensitive to the degree of semantic expectancy, while P600 amplitude is modulated by the nature of the comprehension task. It is the interplay between the systematic modulation of these latent components, due to their spatiotemporal overlap in the observed ERP signal, which explains the variance in WCS-derived effects, such as those observed for the studies reviewed by VP&L. We believe that in order to arrive at a viable neurocognitive model of language processing, it is essential to incorporate the implications of LCS into psycholinguistic theorizing.

AUTHOR CONTRIBUTIONS
All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.