Modeling Single-Trial ERP Reveals Modulation of Bottom-Up Face Visual Processing by Top-Down Task Constraints (in Some Subjects)

We studied how task constraints modulate the relationship between single-trial event-related potentials (ERPs) and image noise. Thirteen subjects performed two interleaved tasks: on different blocks, they saw the same stimuli, but they discriminated either between two faces or between two colors. Stimuli were two pictures of red or green faces whose global phase coherence ranged from 10 to 80%, in 10% increments. Behavioral accuracy followed a noise-dependent sigmoid in the identity task but was high and independent of noise level in the color task. EEG data recorded concurrently were analyzed using a single-trial ANCOVA: we assessed how changes in task constraints modulated ERP noise sensitivity while regressing out the main ERP differences due to identity, color, and task. Single-trial ERP sensitivity to image phase noise started at about 95–110 ms post-stimulus onset. Group analyses showed a significant reduction in noise sensitivity in the color task compared to the identity task from about 140 to 300 ms post-stimulus onset. However, statistical analyses in every subject revealed different results: significant task modulation occurred in 8/13 subjects, one showing an increase and seven showing a decrease in noise sensitivity in the color task. Onsets and durations of effects also differed between group and single-trial analyses: at any time point, at most four subjects (31%) showed results consistent with group analyses. We provide detailed results for all 13 subjects, including a shift function analysis that revealed asymmetric task modulations of single-trial ERP distributions. We conclude that, during face processing, bottom-up sensitivity to phase noise can be modulated by top-down task constraints, in a broad window around the P2, at least in some subjects.

Introduction
Following the first reports of larger scalp responses to faces compared to objects (Bötzel and Grüsser, 1989; Jeffreys, 1989; Seeck and Grüsser, 1992), there have been hundreds of studies on the early event-related potentials (ERPs) to faces and objects. The vast majority of these studies used (i) averaged ERPs, (ii) group statistics, and (iii) categorical designs. Their findings can be summarized briefly: sometime between 100 and 200 ms after stimulus onset, ERPs to different object categories tend to differ from each other, and faces are usually associated with larger N170 peaks than other object categories.
Recently, several research groups have started to study these early preferential responses to faces in individual subjects (Rousselet et al., 2007a, 2008a,b, 2009, 2010; Smith et al., 2007; Liu et al., 2009; Ratcliff et al., 2009; van Rijsbergen and Schyns, 2009). Individual subjects' ERPs show, not surprisingly, systematic differences between faces and objects consistent with the group effects reported so far (Rousselet et al., 2008a). These studies have also revealed inter-subject differences: despite coarse agreement between group and individual subject statistical analyses, individual subjects show reliable ERPs to faces and noise, which differ reliably across subjects, for reasons yet to be discovered (Gaspar et al., in press, "Reliability of ERP and single-trial analyses").
Individual differences in early visual processing have been largely ignored in the face literature, and implicitly treated as measurement errors that can be filtered out by averaging data across subjects. Although understanding the average brain is a worthy goal, only the single-trial approach, in conjunction with parametric designs, will allow us to understand brain mechanisms and the information content of brain states (Schyns, 2010). In the single-trial framework, timing is essential. Indeed, how fast the visual system can discriminate among object categories provides strong constraints on possible computational implementations (Thorpe and Fabre-Thorpe, 2001; Rousselet et al., 2004; Thorpe, 2009). In particular, the timing of task modulations might help us tease apart periods of mostly bottom-up, stimulus-driven activity from time windows engaging flexible neuronal populations that might be tuned to certain tasks. Thus, task modulations are key to understanding brain mechanisms (Schyns, 1998; Pernet et al., 2007; Schyns et al., 2009).
For over 10 years, the ERP face literature has been debating the existence of task modulations of the N170 face preferential response. Several studies used target vs. non-target manipulations, in which faces at fixation are attended or ignored, and found no evidence of task modulations of the N170 (Séverac-Cauquil et al., 2000; Carmel and Bentin, 2002; Rousselet et al., 2007b) or of its magnetic analog, the M170 (Lueschow et al., 2004; Furey et al., 2006; Okazaki et al., 2008). Similarly, intracranial recordings failed to reveal top-down modulation of the N200 to faces (Puce et al., 1999). One exception is a recent study that reported an effect of category expectation on the N170 (Aranda et al., 2010). However, the effect seems to be weak and in the direction opposite to the one expected, so it might be a type I error. In contrast to target vs. non-target task manipulations, the N170 can be modulated by spatial attention (Crist et al., 2008) or by directing attention away from faces, in conditions in which letters superimposed on faces have to be discriminated (Eimer, 2000; Mohamed et al., 2009). Effects of language interference (Landau et al., 2010) and working memory (Sreenivasan et al., 2007) have also been suggested.
Hence, at least in some conditions, early face processing seems to be modulated by spatial attention and other factors. However, the modulations of face ERPs reported so far tend to be ill-defined, because it is unclear what aspect of face processing is modulated by the task. It also remains unclear how and when task demands affect the processing of a face presented at fixation. Very few studies have tackled this fundamental question by using a design in which the same stimulus is presented but processed differently, because task requirements change the diagnosticity of input information (Schyns, 1998; Pernet et al., 2007). Schyns and his colleagues used reverse correlation techniques and large numbers of trials to reveal changes in single-trial information sensitivity (Smith et al., 2007; van Rijsbergen and Schyns, 2009). However, although these studies show that ERPs are sensitive to different information from the same stimuli in different tasks, they do not quantify how task requirements affect brain sensitivity to the same information.
One earlier study addressed this question and reported larger N170 amplitude in a gender task compared to an identification task, but only for coarse, not fine, scale information (Goffaux et al., 2003). This result suggests that certain face spatial scales are used when they are relevant for the task at hand, an interaction between task demands and available information that is essential to reveal the information content of brain activity (Pernet et al., 2007; Schyns et al., 2009). However, the effects reported by Goffaux et al. (2003) seem very small, and there was no report of task effects in individual subjects. It is also unclear whether the effects could be due instead to task modulations of ERP sensitivity to the structured noise added to the filtered images.
A more recent study reported one of the most striking and interpretable task effects on ERP face sensitivity. In experiment 2 of that study, a cue indicated on each trial how subjects were to process a subsequent stimulus: either discriminate its color (red vs. green) or its category (face vs. car). Colored pictures of faces and cars had two noise levels, created by altering the Fourier phase spectrum, which contains most of the information about object identity. In the categorization task, single-trial activity discriminated between face and car trials. In the same task, ERPs to faces and cars were sensitive to the level of image phase noise roughly in the time period 100–300 ms after stimulus onset. Among several important results, the authors showed that in the color task, in which the noise dimension becomes task irrelevant, noise sensitivity was strongly reduced shortly after 200 ms. This is an important result because it suggests a timing for task effects in situations in which subjects discriminate stimuli presented at fixation: the first 200 ms of brain activity are mostly bottom-up and not modulated by task constraints, followed by a second period of brain activity that is modulated by top-down, task-related influences.
In previous studies, we described phase noise sensitivity of face ERPs in the same 100–300 ms time window (Rousselet et al., 2008b). Although that study used only two noise levels (30 and 45% phase coherence) in its second experiment, its results suggest that with a larger range of noise levels, as in our previous experiments, noise sensitivity should be strongly reduced after 200 ms when noise is made task irrelevant. We tested this hypothesis by asking subjects to perform two tasks: the same face identity discrimination task (face 1 vs. face 2) that we used in previous studies (Rousselet et al., 2008b), and the same color discrimination task (red vs. green) used in that study. We performed both group level and single-subject analyses to reveal the detailed time course of the task effects.

Materials and Methods
Square brackets indicate the boundaries of 95% confidence intervals (CIs) constructed using a percentile bootstrap with 1000 samples (Wilcox, 2005).

Subjects
We recruited 13 subjects: the second and third authors, and 11 subjects from the Glasgow Psychology subject pool. Subjects' mean age was 24 years (min = 20, max = 32); eight were female and 11 were right-handed. Their mean high-contrast 63-cm decimal acuity was 104 (min = 99, max = 110); their low-contrast 63-cm decimal acuity was 96 (min = 89, max = 103). All subjects had a Pelli-Robson contrast sensitivity of 1.95 and successfully passed the Ishihara color blindness test for red-green color deficiencies. On average, subjects had 19 years of education (min = 15, max = 23). All subjects except the two authors received £6/hour for their participation, and all subjects gave written informed consent. The research ethics board of the University of Glasgow approved the research protocol.

Stimuli
We used two front-view male face photographs cropped within a common 4.3° × 6.3° oval frame and pasted on a uniform 9° × 9° gray background (Figure 1). These faces were selected from a set of 10 faces, which are described in detail in previous publications (Gold et al., 1999; Husk et al., 2007). All stimuli had the same global amplitude spectrum. We added noise to their phase spectra so that their percentage of global phase coherence ranged from 10 to 80%, in 10% increments. Noise was random on each trial, which means that subjects never saw the exact same image twice. We also colorized the faces with red and green tones by manipulating the hue (H), saturation (S), and value (V) of the original images (red: H = 0.04, S = 0.17, V = unchanged; green: H = 0.34, S = 0.23, V = unchanged). The value (V) was normalized so that, on average, each face, regardless of color or phase coherence, had the same average luminance (about 33 cd/m²) and RMS contrast (0.1). We colorized only the face itself and not the background, and used relatively small images to ensure that subjects paid attention to the face in the two tasks. This manipulation increases the likelihood that task effects are due to a change in the task relevance of one stimulus dimension while subjects attempt to discriminate the same stimulus. A Dell Precision 390 workstation with an Nvidia Quadro FX 3450/4000 graphics card and the MATLAB Psychophysics Toolbox controlled the stimulus display. Images were displayed on a SAMSUNG SyncMaster 1100MB CRT monitor with a resolution of 800 × 600 pixels and an 85-Hz refresh rate. The screen subtended 28° × 21° of visual angle.

Figure 1 | Stimuli. (A) All observers saw the same two faces presented in red or green at eight levels of global phase coherence. Rows 1 and 2: face identity 1; rows 3 and 4: face identity 2; rows 1 and 3: red tones; rows 2 and 4: green tones. (B) Gray lines show edges identified using Kovesi's local phase coherence algorithm in eight face examples ranging from 10% phase coherence (left) to 80% phase coherence (right). Superimposed on each edge map, local phase coherence is color coded at the 10 pixels with the highest local phase coherence. These 10 pixels were identified in the two original faces at 100% global phase coherence. Local phase coherence was maximal (red) at 80% global phase coherence, and approached zero (blue) at 10%. (C) Boxplots of local phase coherence at each level of global phase coherence for all the images seen by one subject. There is a non-linear relationship between the two variables.
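The text does not detail how phase noise was added to the images. As a hedged illustration only, the NumPy sketch below assumes a weighted-mean-phase scheme: the original Fourier phase and a fresh random phase are averaged as unit vectors, weighted by the coherence level and its complement, while the amplitude spectrum is kept. The function name and the random 64 × 64 stand-in image are hypothetical, and the color, luminance and contrast normalizations described above are omitted.

```python
import numpy as np


def phase_coherence_image(image, coherence, rng):
    """Return a copy of `image` whose Fourier phase is partly randomized.

    Assumption (weighted-mean-phase scheme): the original phase and a random
    phase are averaged as unit vectors with weights `coherence` and
    1 - `coherence`; the amplitude spectrum is left unchanged.
    """
    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    original_phase = np.angle(spectrum)
    # The phase of a real white-noise image is Hermitian-symmetric, so the
    # mixed spectrum below still corresponds to a real image.
    random_phase = np.angle(np.fft.fft2(rng.random(image.shape)))
    mixed = (coherence * np.exp(1j * original_phase)
             + (1.0 - coherence) * np.exp(1j * random_phase))
    new_spectrum = amplitude * np.exp(1j * np.angle(mixed))
    # Discard the numerically negligible imaginary part.
    return np.real(np.fft.ifft2(new_spectrum))


rng = np.random.default_rng(0)
face = rng.random((64, 64))            # hypothetical stand-in for a face image
levels = np.linspace(0.1, 0.8, 8)      # 10% to 80% in 10% steps
# A fresh random phase is drawn on every call, as noise was random on each trial.
stimuli = [phase_coherence_image(face, c, rng) for c in levels]
```

Because only the phase is altered, every output keeps the amplitude spectrum of its input; giving both faces one common amplitude spectrum, as in the study, would require passing a shared (e.g. averaged) amplitude, which this sketch does not do.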

Experimental design
Testing was conducted in a sound-attenuated booth in which the monitor was the only source of light. An 80-cm viewing distance was maintained with a chinrest. We tested subjects in two experimental sessions: the first day was a practice behavioral session; the second day consisted of both behavioral tasks and simultaneous EEG recordings. Each day, subjects performed two interleaved tasks. On half of the blocks, they performed a one-interval, two-alternative forced choice task discriminating between two faces; on the other half, they discriminated between two colors. Identity and color tasks were blocked so that subjects could focus on one task for an entire block of trials, without having to prepare to switch task on each trial (Johnson and Olshausen, 2003), in an attempt to increase the likelihood of finding strong task effects. The same stimuli were presented in the two tasks. In both tasks, on each trial, one face appeared briefly (36 ms), and subjects had to indicate which of two possible faces or two possible colors was presented by pressing 1 or 2 on the numerical pad of the keyboard. The association between button and identity/color was assigned randomly for each subject. Subjects were given unlimited time to respond and were told to emphasize response accuracy, not speed. All subjects performed the task with the same single pair of male faces throughout the experiment. Subjects saw eight conditions along a noise-signal continuum, from 10 to 80% phase coherence, in increments of 10% (Figure 1).
There were 10 blocks of 96 trials: 960 trials in total, with 120 trials per level of phase coherence. Within each block, there were equal repetitions of each face, each color and each phase coherence level.
Each block was preceded by 10 practice trials that allowed subjects to learn the stimulus-key association. Practice trials were used to ensure a high level of performance in older subjects, whose data are not reported here. In a regular trial, a small fixation cross (a 0.3° "+" in the middle of the screen) appeared for 500 ms, after which a blank screen was presented for a random duration ranging from 500 to 700 ms (Figure 2A). Then a test stimulus was presented for 36 ms, followed by a blank screen that stayed on until subjects provided their response. Practice trials were very similar, except that immediately after stimulus presentation, a choice screen appeared that showed either each face in grayscale (identity task) or red and green noise textures (color task) simultaneously, one above the other, with a corresponding label below each item. Auditory feedback was provided after the subject pressed a response key, with low- and high-pitched tones indicating incorrect and correct responses, respectively. Feedback was provided only during practice trials.

Figure 2 | Tasks and design matrix. (A) Organization of practice trials and regular trials in the two tasks. A trial started with the presentation of a fixation point for 500 ms. Then, after a random delay ranging from 500 to 700 ms, a stimulus was presented for about 36 ms. During practice trials (top row), a choice screen appeared immediately after the stimulus, showing the two targets of the task and their associated response keys. The screen stayed on until the subject's response, which was followed by auditory feedback, before the trial sequence resumed. During regular trials (bottom row), a blank screen appeared immediately after the stimulus and remained on until the subject's response. No feedback was provided during regular trials. Stimuli are not drawn to scale. (B) Example of a design matrix in one subject (color scale: green = 0; red = 1). The first eight predictors were categorical: they indicate the stimulus type (i.e., red or green, face 1 or 2) and the task. The next four predictors were continuous: global phase coherence (GPC) and local phase coherence (LPC) in the identity and color tasks. Each continuous predictor was z-scored independently before insertion in the design matrix. The last column was a constant term (cst, a column of ones).

EEG recording and preprocessing
We acquired EEG data with a 128-channel Biosemi Active Two EEG system (BioSemi, Amsterdam, Netherlands). We recorded from four additional electrodes (UltraFlat Active BioSemi electrodes) below and at the outer canthi of both eyes. The analog signal was digitized at 512 Hz and band-pass filtered online between 0.1 and 200 Hz. Electrode offsets were kept between ±20 μV.
Offline, data were average-referenced. Then, we removed bad channels without interpolation, applied a 40-Hz low-pass filter, and epoched the data between −300 and 1,200 ms. An ICA (Makeig et al., 2004), as implemented in the runica EEGLAB function (Delorme et al., 2007), was then computed, and we removed components corresponding to blink activity, identified by visual inspection of their scalp topographies, time courses and activity spectra. Subsequently, we re-epoched the data between −300 and 500 ms and subtracted the average baseline activity from each time point. Trials with abnormal activity were excluded based on a ±100-μV threshold for extreme values. An epoch was rejected for abnormal trend if it had a slope larger than 75 μV/epoch and a regression R² larger than 0.3. All remaining trials were included in the analyses, whether they were associated with correct or incorrect behavioral responses. After epoch rejection, the average number of trials per subject was 904 (min = 849, max = 958).

General linear modeling of EEG data
Subjects' epoched data were modeled using LIMO EEG, an open source Matlab toolbox for hierarchical GLM, compatible with EEGLAB (https://gforge.dcn.ed.ac.uk/gf/project/limo_eeg/). The general linear model was used to express single-trial ERP amplitudes, in microvolts, independently at each time point and each electrode, using the model:
$$\mathrm{ERP}_{t,e} = \beta_0 + \sum_{i=1}^{8} \beta_i \, \mathrm{Cat}_i + \beta_9 \, \phi_{G\text{-}ID} + \beta_{10} \, \phi_{G\text{-}CO} + \beta_{11} \, \phi_{L\text{-}ID} + \beta_{12} \, \phi_{L\text{-}CO} + \varepsilon \tag{1}$$
In Eq. 1, all trials for each time frame t and electrode e ($\mathrm{ERP}_{t,e}$, dimension 1 × n) were modeled as the sum of a constant term $\beta_0$; the eight experimental conditions (each combination of stimulus identity, color and task, 2 × 2 × 2 = 8, $\mathrm{Cat}_{1}$ to $\mathrm{Cat}_{8}$, corresponding to the first eight columns of the design matrix); the global phase coherence in the identity and the color tasks ($\phi_{G\text{-}ID}$, $\phi_{G\text{-}CO}$); the local phase coherence in the identity and the color tasks ($\phi_{L\text{-}ID}$, $\phi_{L\text{-}CO}$); and an error term $\varepsilon$. All predictors formed the design matrix X of dimension n × p (Figure 2B, p = 13). The beta parameters (dimension p × 1) were estimated using an ordinary least squares solution.
Global phase coherence was our image noise manipulation. Kovesi's (1999, 2003) local phase coherence is a measure of wavelet phase alignment across spatial frequencies, which is independent of image contrast and luminance. Local phase coherence may predict subjects' behavior in a natural scene classification task, and seems to provide a good representation of the non-linear changes in local image structure imposed by the linear global phase coherence manipulation (Rousselet et al., 2008b). In our stimuli, pixels with high local phase coherence corresponded to local edges around the eyes, nose, and mouth (Figure 1).
The design matrix represents a typical ANCOVA model with categorical and continuous predictors. However, whereas in ANCOVA one is usually interested in the categorical effects whilst controlling for covariates, here we were interested in the covariate effects: we looked at the relationship between image phase coherence and single-trial ERP amplitude in the identity and the color tasks whilst accounting for the main effects of identity, color, and task. For instance, the average ERP in one condition (e.g., identity discrimination of green face 1) could differ from the average ERP in another condition (e.g., color discrimination of green face 1) because subjects might be less attentive in the easier color task than in the more challenging identity task. These confounding mean ERP differences were accounted for in the design matrix, thus allowing us to measure how single-trial ERPs were modulated by image noise in the two tasks. We used linear contrasts to combine the beta weights associated with the global and local phase coherence predictors in the identity task (column 9 + column 11 in Figure 2B) and in the color task (column 10 + column 12) to study the time course of the overall ERP noise sensitivity:
$$d_{\mathrm{identity}} = \beta_9 + \beta_{11}, \qquad d_{\mathrm{color}} = \beta_{10} + \beta_{12}$$
We did not look at task modulations separately for global and local phase coherence because these two predictors were strongly correlated. High correlation between regressors may lead to unstable beta parameter estimates, whereas their linear combination remains stable; hence our analysis of the combination of global and local phase coherence. We refer to this summary statistic as noise sensitivity in the rest of the paper, and we explored task effects by contrasting $d_{\mathrm{identity}}$ and $d_{\mathrm{color}}$.

Group level analyses
Group analyses of noise sensitivity task modulations were computed using a bootstrap-t technique for paired samples with 1000 resamples (Wilcox, 2005). Although full scalp analyses are possible in LIMO EEG, we performed the analyses at only one electrode for two reasons: first, because we observed in previous studies that noise sensitivity is localized at a few posterior electrodes that display redundant information (Rousselet et al., 2008b); second, because we wanted to compare different group analyses to single-subject analyses. We analyzed group results using four different ways to pool data together.
A popular way to do group analyses is to average the data across subjects, find the best electrode in this group average, and make a measurement at that same electrode in all subjects (group defined best electrode). Here, we averaged across subjects the R² maps of the ANCOVA model fit to the data, and selected the electrode showing the largest mean R². Bootstrap paired t-tests were then computed between noise sensitivity contrasts in the identity and color tasks at all time points at this electrode.
A potentially more fruitful way to do group statistics is to optimize the electrode by selecting the best electrode independently in each subject (Foxe and Simpson, 2002; Liu et al., 2002; Rousselet et al., 2010). We thus took the electrode at which the model provided the best fit independently in each subject, i.e., where R² was the largest for each subject (R² optimized electrode). Then we computed paired t-tests between noise sensitivity contrasts from these potentially spatially different electrodes. The signal at R² optimized electrodes was the most sensitive to image and task parameters as described by the design matrix, and therefore constitutes the most likely candidate for reflecting the activity of cortical sources sensitive to image information. Hence, this kind of optimized averaging tends to average signals that reflect common processing across subjects, whereas using the same spatial electrode may lead to averaging signals reflecting different processes.
Yet another way to optimize electrodes across subjects consists in selecting for each subject the electrode with the largest noise sensitivity task difference (task effect optimized electrode). In this case, instead of taking the electrode where the ANCOVA model provided the best fit, we selected for each subject the electrode showing the strongest noise sensitivity task effect. The paired t-test was then computed between noise sensitivity contrasts from these potentially spatially different electrodes.
Finally, we used the maximum absolute beta coefficients across electrodes computed at each time point (the envelope), to ensure that our analyses did not miss local maxima at electrodes other than the one showing the largest R². We computed these envelopes for every subject, and then a paired t-test between the noise sensitivity contrasts derived from them.
For both group and single-subject analyses, task modulations at one electrode were quantified by normalizing the maximum absolute task difference in noise sensitivity by the maximum absolute noise sensitivity in the identity task, the maxima being defined across time frames:
$$\text{task modulation (\%)} = 100 \times \frac{\max_t \left| d_{\mathrm{identity}}(t) - d_{\mathrm{color}}(t) \right|}{\max_t \left| d_{\mathrm{identity}}(t) \right|}$$
We controlled for multiple comparisons using bootstrap and the clustering technique as implemented in the Matlab Fieldtrip toolbox, with a minimum of two neighboring channels per cluster (Maris and Oostenveld, 2007). The clustering technique in LIMO EEG works for analyses both at single electrodes (temporal clustering) and at multiple electrodes (spatial-temporal clustering). For group analyses, because only one electrode or equivalent electrode was considered, we employed temporal clustering to control for multiple comparisons. For single-subject analyses, because the whole scalp was analyzed, we employed spatial-temporal clustering (familywise error rate = 0.05). For t-tests and ANOVAs, the validated bootstrap technique includes centering the empirical distributions of each between-subject and within-subject level so that the null hypothesis of no difference in means is true (Berkovits et al., 2000; Wilcox, 2005; Seco et al., 2006). Thus, for the group paired t-tests, noise sensitivity contrasts across subjects were centered for each condition separately, and paired t-tests were computed 1000 times by sampling subjects with replacement (Wilcox, 2005). However, this technique is not appropriate for our single-subject ANCOVA analyses, because the continuous covariates can potentially have as many levels as trials. We therefore used a different strategy to estimate the sampling distribution of our F statistics under the null hypothesis. For each subject, epoched single trials were sampled with replacement and fitted to the original design matrix, thus breaking the link between the data and the design and providing an estimate of the data distribution under the null hypothesis H0.
For both the bootstrap paired t-test (group analyses) and the bootstrap ANCOVA (single-subject analyses), in each bootstrap loop we first computed the sum of each temporal or spatial-temporal cluster of contiguous significant F values (univariate p < 0.05), separately for each predictor, each linear contrast and, in the case of the ANCOVA, for the global fit of the entire model (R²). Second, we saved the maximum across these cluster sums, one maximum for each F test (familywise correction). After performing these steps 1000 times for group statistics and 600 times for single-subject analyses (as recommended for various linear models, Wilcox, 2005), we used the 95th percentiles of the bootstrapped maximum F cluster sums to threshold the original F cluster sums. For each test, the significant original F values (univariate p < 0.05) were clustered, and a cluster was significant if its sum was larger than the corresponding bootstrapped maximum cluster sum threshold.

Noise sensitivity cluster statistics
For each subject, we used a percentile bootstrap rather than an F test to assess noise sensitivity (the sum of beta coefficients for global and local phase coherence). Bootstrap distributions were used to compute 95% CIs under H0, during the same simulation that was used to estimate the F distributions of the ANCOVA parameters. These thresholds were then applied to each bootstrap sample to mark significant noise sensitivity. Significant effects were then clustered, and the maximum sum of absolute noise sensitivity was saved for each bootstrap sample. The bootstrap distributions of maximum sums of absolute noise sensitivity computed under H0 were then used to threshold the clusters of observed noise sensitivity in each task.

Shift function analyses of the deciles of single-trial ERP distributions
We also used the shift function to measure how single-trial ERP distributions changed from the identity task to the color task. The shift function compares entire distributions instead of relying exclusively on one point estimate such as the mean or the median. In our application of the shift function, the x-axis is the Harrell-Davis (hd) estimator of deciles one to nine of the single-trial ERPs in the identity task (see Wilcox, 2005, pp. 71–73 and 139–141). The y-axis is the difference, Delta, between the hd estimators of the deciles of the identity and color ERP distributions. Hence, the shift function represents how much the data from one task must be shifted to be comparable to the data from the other task at each decile. Task differences were estimated by a bootstrap procedure and corrected for multiple comparisons such that the simultaneous probability coverage of the nine CIs remained close to 0.95, i.e., a nominal familywise alpha level of 0.05 (see Wilcox, 2005, pp. 151–155). The analyses were performed on modeled single-trial ERP data at the max R² electrode (i.e., the electrode at which the model best explained the data); they included all the significant time points of the cluster containing the maximum noise sensitivity task difference. Modeled ERPs are more meaningful to analyze because they are reconstructed without the error term, the part of the variance that the model cannot explain.

Results
We consider first the group analyses, second the single-trial analyses, third the comparison of group and single-trial analyses, and fourth the shift function analyses of the deciles of the single-trial ERP distributions.
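Every result below is derived from the single-trial model of Eq. 1, fitted independently at each electrode and time frame. As an illustration only, and not a description of LIMO EEG's code, the NumPy sketch below builds a 13-column design in the spirit of Figure 2B for one simulated subject and fits it by ordinary least squares. The ordering of the eight categorical columns, the z-scoring of each continuous predictor within its own task (zero elsewhere), the toy local phase coherence values and all function names are assumptions.

```python
import numpy as np


def zscore_within(values, mask):
    """z-score `values` over the trials in `mask` and set other trials to 0.

    Assumption: the text only states that each continuous predictor was
    z-scored independently; zeroing the other task's trials is our choice.
    """
    out = np.zeros(values.shape, dtype=float)
    selected = values[mask]
    out[mask] = (selected - selected.mean()) / selected.std()
    return out


def build_design(identity, color, task, gpc, lpc):
    """Hypothetical n x 13 design matrix in the spirit of Figure 2B.

    identity, color, task: 0/1 integer codes per trial (task 0 = identity).
    gpc, lpc: global and local phase coherence of each trial.
    Column numbers in comments are 1-based, as in the paper; NumPy indices
    are one less. The order of the eight categorical columns is assumed.
    """
    cell = identity * 4 + color * 2 + task      # condition index, 0..7
    categorical = np.eye(8)[cell]                # columns 1-8 (one-hot)
    id_task, co_task = task == 0, task == 1
    continuous = np.column_stack([
        zscore_within(gpc, id_task),   # column 9:  GPC, identity task
        zscore_within(gpc, co_task),   # column 10: GPC, color task
        zscore_within(lpc, id_task),   # column 11: LPC, identity task
        zscore_within(lpc, co_task),   # column 12: LPC, color task
    ])
    constant = np.ones((len(cell), 1))           # column 13
    return np.hstack([categorical, continuous, constant])


def fit_glm(erp, X):
    """OLS fit of single-trial amplitudes at one electrode and time frame."""
    # The 8 dummies and the constant are collinear; lstsq returns the
    # minimum-norm solution, and the continuous betas are still unique.
    beta = np.linalg.lstsq(X, erp, rcond=None)[0]
    residuals = erp - X @ beta
    r2 = 1.0 - residuals @ residuals / np.sum((erp - erp.mean()) ** 2)
    d_identity = beta[8] + beta[10]   # GPC + LPC betas, identity task
    d_color = beta[9] + beta[11]      # GPC + LPC betas, color task
    return beta, r2, d_identity, d_color


# Simulated subject: 960 trials, stronger noise sensitivity in the identity task.
rng = np.random.default_rng(0)
n = 960
identity, color, task = (rng.integers(0, 2, n) for _ in range(3))
gpc = rng.choice(np.linspace(0.1, 0.8, 8), n)
lpc = gpc ** 2 + 0.01 * rng.standard_normal(n)   # toy non-linear LPC
X = build_design(identity, color, task, gpc, lpc)
true_beta = np.zeros(13)
true_beta[[8, 10]] = [1.0, 0.5]   # identity task: d_identity = 1.5
true_beta[[9, 11]] = [0.3, 0.1]   # color task: d_color = 0.4
erp = X @ true_beta + 0.1 * rng.standard_normal(n)
beta, r2, d_id, d_co = fit_glm(erp, X)
```

Because the toy GPC and LPC predictors are strongly correlated, their individual betas are less stable than their sum, which is the reason given above for analyzing the combined contrasts $d_{\mathrm{identity}}$ and $d_{\mathrm{color}}$.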

Group defined best electrode
If the best electrode is defined as the electrode showing the largest mean R² across subjects, we obtain the results in Figure 3A. This best electrode was right posterior-lateral (B8 in the Biosemi system, between PO8 and PO10) and had a maximum mean R² of 0.23 [0.17, 0.32] that peaked at 141 ms. The mean ERPs at the eight global phase noise levels started to diverge shortly after 100 ms in both the identity and the color tasks. The parametric ERP modulation by image noise is better appreciated by looking at the time course of the group-averaged noise sensitivity, which peaked at the same electrode and time point as R² did. Noise sensitivity was reduced in the color task compared to the identity task in a single cluster, between 139 and 277 ms after stimulus onset (Table 1). At the latency of the maximum task effect, 242 ms, noise sensitivity was reduced by 20.7% relative to the maximum sensitivity in the identity task.
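The group task effects rely on a bootstrap-t paired test across subjects in which each condition is first centered, so that the null hypothesis of equal means holds in the resampled data. The following NumPy sketch shows that resampling logic at one electrode; the array shapes, seed, function names and simulated contrasts are assumptions, and the temporal cluster correction used for the reported clusters is deliberately left out.

```python
import numpy as np


def paired_t(x, y):
    """Paired t statistic across subjects (axis 0), one value per time point."""
    d = x - y
    n = d.shape[0]
    return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))


def bootstrap_t_paired(x, y, n_boot=1000, seed=0):
    """Bootstrap-t paired test with H0 imposed by centering each condition.

    x, y: arrays (n_subjects, n_times), e.g. d_identity and d_color contrasts.
    Returns observed t values and two-sided bootstrap p-values per time point.
    """
    rng = np.random.default_rng(seed)
    t_obs = paired_t(x, y)
    # Centering each condition makes the null hypothesis of equal means true.
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    n = x.shape[0]
    t_boot = np.empty((n_boot, x.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)        # resample subjects with replacement
        t_boot[b] = paired_t(xc[idx], yc[idx])
    p = np.mean(np.abs(t_boot) >= np.abs(t_obs), axis=0)
    return t_obs, p


# Hypothetical data: 13 subjects, 3 time points, a color-task reduction at t = 1.
rng = np.random.default_rng(1)
d_identity = rng.normal(1.0, 0.3, size=(13, 3))
d_color = d_identity + rng.normal(0.0, 0.1, size=(13, 3))
d_color[:, 1] -= 0.5
t_obs, p = bootstrap_t_paired(d_identity, d_color)
```

Resampling subjects, not trials, is what makes this a group-level test; the single-subject ANCOVA statistics instead resample trials against a fixed design, as described in the Methods.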

R² optimized electrode
The electrode with the largest R² was either also the electrode with maximum noise sensitivity, or part of the same cluster as that electrode and behaved similarly to it. Across subjects, max R² electrodes were all located in a cluster of lateral posterior electrodes, as reported in previous experiments and as expected from the face literature. R² averaged across subjects was stronger over the right hemisphere. This pattern was also found in individual subjects (Figures 5–8): eight subjects had a maximum R² at right hemisphere electrodes, two subjects at left hemisphere electrodes, and three subjects at midline electrodes. The right hemisphere electrodes of maximum model fit were B8 (one subject) or one of its neighbors (seven subjects). The maximum mean R² at the optimized electrode was 0.27 [0.2, 0.35] and peaked at 139 ms (Figure 3B; Table 1). As expected if R² results were sufficiently consistent across subjects, this optimized maximum average R² was larger than at the group defined best electrode. There was a significant task effect in a single cluster between 172 and 275 ms post-stimulus onset, with an 18.7% noise sensitivity reduction at the latency of the maximum task difference, 213 ms.
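The percentages reported in these sections divide the largest absolute task difference in noise sensitivity by the largest absolute noise sensitivity in the identity task, both maxima taken across time frames. A small NumPy sketch of that normalization, with a hypothetical function name and made-up time courses:

```python
import numpy as np


def task_modulation(d_identity, d_color, times):
    """Normalized task modulation (%) and latency of the maximum difference.

    d_identity, d_color: noise-sensitivity time courses at one electrode.
    times: latencies (ms) of the time frames.
    """
    difference = np.abs(d_identity - d_color)
    peak = np.argmax(difference)                  # frame of max task difference
    percent = 100.0 * difference[peak] / np.max(np.abs(d_identity))
    return percent, times[peak]


times = np.array([100, 150, 200, 250])
d_id = np.array([0.0, 2.0, 1.5, 0.5])   # made-up identity-task sensitivity
d_co = np.array([0.0, 1.9, 1.1, 0.5])   # made-up color-task sensitivity
percent, latency = task_modulation(d_id, d_co, times)
# percent is about 20 (a 0.4 difference over a 2.0 peak); latency is 200 ms
```

As in the reported results, the latency of the maximum task difference need not coincide with the peak of noise sensitivity itself (150 ms in this toy example).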

Task effect optimized electrode
Results of this analysis are presented in Figure 3C. Although we selected for each subject the electrode showing the largest task difference in noise sensitivity, no differences could be observed at the group level. Indeed, taking the largest effect can be misleading because certain predictors can be significant at electrodes and time frames at which the overall model does not explain the data significantly. This is what we found: except for 3 subjects for whom maximum task effects occurred at electrodes that were part of the cluster of electrodes with the maximum R², for the other 10 subjects R² was lower and early noise sensitivity (<200 ms) was weak at the electrodes of maximum task differences (Figure 3C). Hence, across subjects, noise sensitivity and task effects were not significant at the electrode optimized based on maximum task differences.

Maximum absolute betas
Analyses on the beta coefficient envelopes gave results similar to those obtained at the group defined best electrode and the R² optimized electrode (Figure 3D; Table 1). Two significant clusters were observed. A first task effect occurred between 154 and 254 ms post-stimulus onset, with an 18.2% noise sensitivity reduction at the latency of the maximum task difference, 197 ms. A second effect of similar size occurred after 400 ms. This analysis suggests that we did not miss the big picture by defining the electrode to analyze based on the group-averaged R² or the single-subject max R². The statistical tests might suggest that the group defined best electrode did a better job, because it showed a significant task effect earlier than the R² optimized test. However, picking the best group electrode to show group effects is circular, because what the result ought to be is unknown. By contrast, selecting the best electrode separately in each subject takes inter-subject variability into account and leads to group results more sensitive to individual differences.
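The envelope analysis summarizes each subject and task with one time-course: the largest absolute noise sensitivity across electrodes at each time frame. The sketch below computes that envelope and the identity minus color difference. The tiny arrays are hypothetical.

```python
import numpy as np

def sensitivity_envelope(sens):
    """sens: (n_electrodes, n_times) noise sensitivity betas for one subject and task.
    Returns the maximum absolute beta across electrodes at each time frame."""
    return np.abs(sens).max(axis=0)

def envelope_task_difference(sens_identity, sens_color):
    """Identity minus color envelope; positive = stronger sensitivity in the identity task."""
    return sensitivity_envelope(sens_identity) - sensitivity_envelope(sens_color)

# Hypothetical 2 electrodes x 2 time frames.
sens_id = np.array([[-4.0, 1.0], [2.0, -0.5]])
sens_co = np.array([[-2.8, 0.2], [1.0, -0.4]])
diff = envelope_task_difference(sens_id, sens_co)
```

Taking absolute values pools negative and positive sensitivities from different electrodes. This is why the envelope values in Figure 3D are positive, whereas the single-electrode values in Figure 3A,B are negative.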
In addition to group analyses at all time points, we compared noise sensitivity across subjects at the latency of the P1, N170, and P2 peaks, for comparison with previous studies (Figure 4). For each subject, the latency of the peaks was measured at the max R² electrode in the 80% coherence condition (face ERP). Measurements at electrode B8 gave similar results. The mean sensitivity was then measured in time-windows encompassing the peak and five time points on either side of it, i.e., 11 time points spanning about 21.5 ms. Across subjects, this analysis revealed weak to no noise sensitivity around the P1 (∼96 ms), and stronger noise sensitivity around the N170 (∼146 ms) and the P2 (∼207 ms). Importantly, only the P2 showed task modulations of noise sensitivity. Similar P2 results were obtained at the latency of the P2 defined at the group level.
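The peak-window measure averages the peak sample and five samples on either side of it. The sketch below also includes the normalization from the Figure 4 caption, which divides both tasks by the maximum absolute identity-task sensitivity. The 512 Hz sampling rate is an assumption: it is consistent with 11 samples spanning about 21.5 ms, but this section does not state it.

```python
import numpy as np

FS = 512.0  # assumed sampling rate (Hz): 11 samples span 11 * 1000 / 512 ≈ 21.5 ms

def normalize_by_identity(sens_identity, sens_color):
    """Divide both tasks by the maximum absolute identity-task sensitivity (Figure 4)."""
    scale = np.max(np.abs(sens_identity))
    return sens_identity / scale, sens_color / scale

def peak_window_mean(sens, times_ms, peak_ms, half_width=5):
    """Mean over the sample nearest peak_ms plus half_width samples on each side."""
    centre = int(np.argmin(np.abs(times_ms - peak_ms)))
    lo = max(centre - half_width, 0)
    hi = min(centre + half_width + 1, sens.size)
    return float(sens[lo:hi].mean())

times = np.arange(-300.0, 500.0, 1000.0 / FS)  # hypothetical epoch, in ms
```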

Single-subject analyses
Figures 5-8 provide, for each subject, a detailed description of their R², noise sensitivity, task effects and behavioral results. The time-courses of the R² functions and of the beta coefficients for noise sensitivity are similar to those reported previously in young subjects (Rousselet et al., 2008b). Because the main purpose of our study was to quantify task modulations of early ERP noise sensitivity, we focus the report of the single-trial analyses on the electrode showing the maximum R² for each subject. All these electrodes were found at posterior-lateral locations. No comparable fits were observed at frontal electrodes. Thus, our analyses seem to capture task modulations of evoked noise sensitivity from the visual system, rather than electrophysiological correlates of the top-down modulation signal itself. Figures at the electrodes showing the maximum noise sensitivity in the identity task or the color task were almost identical to those presented here (max R²), because these electrodes were either the same electrodes or part of the same cluster of electrodes.
Single-trial analyses revealed inter-subject variability hidden behind the seemingly simple group averages and statistics. Individual subjects differed widely in the shape of their ERPs, R² functions and scalp topographies, and in the nature and time-course of the task effects (Figures 5-8). The mean of the maximum R² measured in each subject was 0.31 [0.24, 0.38], min = 0.17, max = 0.57; it peaked at 141 ms [136, 149], min = 133, max = 186. Image noise sensitivity started at about 100 ms in both tasks. In the identity task, the median onset was 100 ms [98, 105], min = 92, max = 115; in the color task it was 100 ms [96, 111], min = 86, max = 133. The median difference between the two tasks was 0 ms [−7.8, 4], min = −21, max = 16. Results using the mean were similar, and a shift function analysis failed to reveal significant differences at any decile of the distribution of onset differences.
Only eight subjects showed a significant task modulation of noise sensitivity. Noise sensitivity decreased in the color task compared to the identity task in seven subjects: S1 = 29%, S3 = 46%, S4 = 35%, S6 = 19%, S7 = 39%, S10 = 29%, S11 = 39%, mean = 33.7%. Subject S2 showed an effect in the opposite direction (Figure 5), with significantly stronger noise sensitivity in the color task than in the identity task (40.7% sensitivity increase). All subjects but S11 (Figure 7) had a single cluster of significant task differences.
There was no significant link between task effects and behavioral thresholds. The mean behavioral 75% correct threshold was 0.37 global phase coherence for the five subjects who did not show a significant ERP task modulation. It was 0.39 global phase coherence for the subjects who did show a reduction in the color task (difference = −0.02 [−0.07, 0.04]), and 0.37 global phase coherence for S2, who showed the opposite ERP result. Analyses with the median gave similar results.
The max R² electrodes almost never showed significant differences between the two faces (identity task) or the two colors (color task), in keeping with previous reports on faces, cars, and words (Nobre et al., 1998; Rousselet et al., 2008b, 2010). One subject showed both significant identity sensitivity in the identity task and a significant task modulation of identity sensitivity at a few time points around 450 ms after stimulus onset. Five subjects showed either identity or color sensitivity in one or the other task but without significant task modulation, or significant task modulation without significant identity or color sensitivity.
FIGURE 3 | Event-related potential group results. The left and middle columns contain the results for the identity and the color tasks. The right column shows the R² and the noise sensitivity task differences. (A) Group defined best electrode. Mean ERPs are shown color coded at each level of global phase coherence in the two tasks. In the R² plot, the inset shows the topographic map of the interpolated R² values at the latency of the maximum R². Noise sensitivity is the sum of the global and local phase coherence beta weights in μV/std of the predictor. Thick lines represent averaged data, surrounded by thin lines for the 95% percentile bootstrap CI. The red horizontal bar under the zero line indicates time points of significant effects, based on a bootstrap t-test with temporal cluster correction for multiple comparisons. The task difference in red is identity (black continuous line) minus color (green dashed line). At the latency of the maximum task difference observed within the first 300 ms after stimulus onset, noise sensitivity in the identity task was −4 μV/std, color = −2.8 μV/std, difference = −1.2 μV/std. (B) R² optimized electrode. The topographic map was obtained by averaging the maps from individual subjects. At the latency of the maximum task difference, noise sensitivity in the identity task was −2.8 μV/std, color = −1.8 μV/std, difference = −1 μV/std. (C) Task effect optimized electrode. The R² bump between 100 and 200 ms was mostly due to three subjects (S9-S11) who had maximum task effects at electrodes that were part of the cluster of electrodes with the maximum R². (D) Noise sensitivity envelope. The maximum across electrodes of the absolute noise sensitivity was used for each subject. At the latency of the maximum task difference, noise sensitivity in the identity task was 3.9 μV/std, color = 2.8 μV/std, difference = 1 μV/std.
FIGURE 4 | Task effects around peak time-windows. Noise sensitivity in the two tasks was normalized by dividing by the maximum absolute noise sensitivity in the identity task. The first three ERP peaks were measured at these latencies across subjects: P1 (median = 96, min = 82, max = 119), N170 (median = 146, min = 131, max = 184), P2 (median = 207, min = 180, max = 256). The P2 defined from the group-averaged data had a latency of 207 ms.

Group vs. single-trial analyses
Group analyses suggest a decrease in noise sensitivity in the color task compared to the identity task around 140-300 ms post-stimulus onset. This task modulation was observed at the group defined best electrode and at the R² optimized electrode (Figure 9). In the spatial-temporal clusters containing these electrodes, single-trial analyses revealed a different picture. Eight subjects out of 13 had a significant task modulation of noise sensitivity: one had increased sensitivity in the color task and seven had decreased sensitivity. In the time-window of the group effect, only six subjects showed a significant effect, and a maximum of five subjects showed an effect simultaneously, including the subject who had an effect in the opposite direction. Thus, in our sample, at any time point showing a significant group effect, at most 5 subjects out of 13 showed a significant effect, and only 4 of them (31%) in the same direction as the group effect.
Onsets and durations of task effects also revealed discrepancies between group and single-trial analyses (Table 1). At the R² optimized electrode, the group task effects started at 172 ms and lasted 103 ms. In contrast, for the eight subjects who showed significant effects, the average task effect onset was 214 ms [155, 271], min = 86, max = 332, and the average task effect duration was 186 ms [142, 234], min = 92, max = 320.
Peak latency is the latency of the maximum absolute task effect. Effect size is defined in Section "Group Level Analyses," Eq. 4.

FIGURE 5 | Individual results for subjects S1 to S4. (A) Statistically significant model R² at all the electrodes and time frames from −300 to 500 ms after stimulus onset. Electrodes are stacked up along the y-axis. The tick on the y-axis marks the electrode at which the maximum R² was recorded. This electrode is plotted as a continuous black line in (B). R² ranges from near zero in blue to the maximum for that subject in red. (B) Model R² at all the electrodes and time frames. The electrode at which the maximum R² was recorded is plotted in black. The other electrodes are plotted in gray. The inset shows the topographic distribution of the R² at the latency of the maximum, indicated by a vertical black dotted line. This vertical line is also plotted in all the other panels for comparison. For subject S1, R² had a bilateral occipital-lateral distribution, with a maximum over the left hemisphere (left bottom red cluster). The red vertical dashed line indicates the time frame of the earliest significant R² across all electrodes. Near the top of the panel, the upper horizontal line (red) marks significant time frames at the maximum R² electrode. The lower horizontal line (green) marks significant time frames of the spatial-temporal cluster to which the maximum R² electrode belonged. For subject S1, this horizontal line starts at the latency of the earliest significant model fit (red vertical dashed line), indicating that the maximum R² electrode is part of a spatial-temporal cluster that captures the earliest effects. The horizontal dashed line is the univariate one-sided 95% CI of the R² under H0, at the maximum R² electrode.
Although this is for illustration only, because the actual statistical test was based on spatial-temporal clusters, it gives a good indication of the R² values expected by chance. (C) Mean ERPs in the identity task. The red vertical dashed line indicates the time frame of the earliest significant noise sensitivity in the identity task, across all electrodes. This line is also plotted in (E). The red continuous vertical line indicates the latency of the maximum task difference and is also plotted in (D,G). (D) Mean ERPs in the color task. The red vertical dashed line indicates the time frame of the earliest significant noise sensitivity in the color task, across all electrodes. This line is also plotted in (F). (E) Noise sensitivity beta coefficients in the identity task. Noise sensitivity at the maximum R² electrode is plotted in black, the other electrodes in gray. Units are μV/std of the predictor. Near the bottom of the panel, the upper horizontal line (red) marks significant noise sensitivity time frames at the maximum R² electrode. The lower horizontal line (green) marks significant noise sensitivity time frames of the spatial-temporal cluster to which the maximum R² electrode belonged. The black horizontal dashed lines show the univariate two-sided 95% confidence interval of noise sensitivity under H0, at the maximum R² electrode. (F) Noise sensitivity beta coefficients in the color task. Noise sensitivity at the maximum R² electrode is plotted as a green dashed line, the other electrodes in gray. (G) Noise sensitivity beta coefficient task differences. Noise sensitivity differences at the maximum R² electrode are plotted as a thick red line, the other electrodes in gray. The black continuous line and the green dashed line are the same as those in (E,F). The red continuous vertical line indicates the latency of the maximum task difference.
At that latency, the title indicates the amplitude of the noise sensitivity in the identity task (ID), in the color task (CO), and the difference between the two tasks (diff). (H) Proportion correct as a function of global phase coherence, in red circles for the identity task and in green squares for the color task. Data from the identity task were fitted with a cumulative Weibull function. The vertical arrow points to the 75% correct threshold in the identity task. The threshold appears in brackets in the title. The red horizontal dashed line marks the maximum proportion correct in the identity task obtained from the fit.
In addition to the analyses performed independently at each time point (Figures 5-8), we also provide a continuous measure of integration time in the two tasks. This was achieved by measuring the time it takes to integrate 50% of noise sensitivity during the first half-second following stimulus onset. Noise sensitivity in the two tasks was normalized by the maximum absolute noise sensitivity in the identity task, defined across time frames. The absolute noise sensitivity in each task was then integrated over time (Figure 10). At the group level, noise sensitivity integration increased sharply after 100 ms and started to differ significantly between the two tasks at 227 ms after stimulus onset. The 50% integration threshold was reached 16 ms earlier in the identity task than in the color task. About 14% less noise sensitivity was integrated in the color task than in the identity task. Analyses performed in each subject individually provided a somewhat different picture. In keeping with group analyses, cumulated noise sensitivity started to rise at about 100 ms in most subjects. However, the onset of task effects, 50% integration times and total cumulated noise sensitivity differed markedly across subjects and from the group analyses (Figure 10).
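The integration measure can be sketched as follows:

1. Restrict the data to the first 500 ms.
2. Normalize both tasks by the maximum absolute identity-task sensitivity.
3. Cumulate the absolute values over time.
4. Find when each task reaches 50% of the identity task's total.

The outputs correspond to the "50% ITD" (color minus identity time) and "TCD" (identity minus color total, as a proportion of the identity total) described in the Figure 10 caption. The function names and the toy input are hypothetical. Normalizing within the analysis window is an assumption. It does not change either output, because both are ratios or times.

```python
import numpy as np

def first_time_reaching(cum, times_ms, target):
    """First time at which a cumulated curve reaches target (NaN if it never does)."""
    hits = np.nonzero(cum >= target)[0]
    return float(times_ms[hits[0]]) if hits.size else float("nan")

def integration_summary(sens_identity, sens_color, times_ms, window=(0.0, 500.0)):
    """50% integration times, their difference (ITD) and the cumulated difference (TCD)."""
    keep = (times_ms >= window[0]) & (times_ms < window[1])
    t = times_ms[keep]
    scale = np.max(np.abs(sens_identity[keep]))            # identity-task maximum
    cum_id = np.cumsum(np.abs(sens_identity[keep]) / scale)
    cum_co = np.cumsum(np.abs(sens_color[keep]) / scale)
    target = 0.5 * cum_id[-1]                               # same 50% line for both tasks
    t50_id = first_time_reaching(cum_id, t, target)
    t50_co = first_time_reaching(cum_co, t, target)
    return {"t50_identity": t50_id, "t50_color": t50_co,
            "itd_ms": t50_co - t50_id,
            "tcd": (cum_id[-1] - cum_co[-1]) / cum_id[-1]}

# Toy example: color-task sensitivity only appears in the second half.
toy_times = np.arange(10) * 10.0
summary = integration_summary(np.ones(10), np.r_[np.zeros(5), np.ones(5)], toy_times)
```

If the color task never reaches the identity task's 50% line, the sketch returns NaN rather than a misleading time.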
Given the discrepancy between group and individual subject analyses, it is important to consider weak power as a potential explanation for the absence of task effects in some subjects. Indeed, a lack of significant effects might be due to a real absence of effects, or to the presence of relatively weak effects that our statistical test might miss. Although lack of power cannot be completely ruled out, subjects with significant task effects at the R² optimized electrode had substantial effect sizes, with maximum F cluster sums at least 1.6 times larger than the largest bootstrap F cluster sums obtained by chance (Figure 11: S1 = 2.7, S2 = 2.6, S3 = 3.2, S6 = 1.9, S7 = 2.1, S10 = 2.3, S11 = 2.4). One subject had a lower effect size than the other subjects, with a cluster sum 0.8 larger than that obtained by chance (Figure 11: S4). Subjects with no significant task effects had no significant cluster whatsoever (S5), relatively low cluster sums (S9 and S12), or cluster sums so low that they fell at the bottom of the bootstrap cluster sum distributions (S8 and S13).
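The onsets, durations and cluster sums discussed above all come from clusters of contiguous supra-threshold F values. The sketch below is deliberately simplified: it finds temporal clusters in a single F time-course, reports their onset, duration and summed F, and compares the largest sum with a percentile of an H0 distribution of maximum cluster sums. The paper's clusters are spatial-temporal, and their threshold and bootstrap are described in Methods, so the threshold, the 95th-percentile reference (taken from the Figure 11 caption) and the toy data here are assumptions.

```python
import numpy as np

def temporal_clusters(f_values, threshold, times_ms):
    """Contiguous runs of f_values above threshold in a 1-D time series.

    Returns a list of (onset_ms, duration_ms, f_sum). Duration is last minus
    first supra-threshold time frame, so 172 -> 275 ms gives 103 ms.
    """
    above = np.append(np.asarray(f_values) > threshold, False)  # sentinel ends last run
    clusters, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            clusters.append((float(times_ms[start]),
                             float(times_ms[i - 1] - times_ms[start]),
                             float(np.sum(f_values[start:i]))))
            start = None
    return clusters

def max_cluster_sum(f_values, threshold, times_ms):
    """Largest cluster mass (sum of F values); 0 when no cluster exists."""
    return max((c[2] for c in temporal_clusters(f_values, threshold, times_ms)), default=0.0)

def ratio_to_h0(observed_max_sum, h0_max_sums, q=95):
    """Observed maximum cluster sum divided by the q-th percentile of the H0 maxima."""
    return observed_max_sum / np.percentile(h0_max_sums, q)

f_toy = np.array([0, 5, 6, 0, 0, 7, 8, 9, 0], dtype=float)
t_toy = np.arange(9) * 2.0
clusters = temporal_clusters(f_toy, 4.0, t_toy)
```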

Shift function analyses of the deciles of the single-trial ERP distributions
Changes in task constraints could affect noise sensitivity by preferentially modulating single-trial ERPs to noise textures or to faces. Alternatively, these changes could be a uniform compression or expansion of the distribution. In our design, the eight noise levels are an artificial discretization of a continuum. Therefore, we studied the nature of the task effects using the shift function, a technique that treats the data as a continuum by comparing the deciles of the single-trial ERP distributions in the two tasks. The shift function analyses revealed that the modulation of noise sensitivity in the color task could be attributed to a modulation of responses to a particular type of stimulus. In five subjects (Figure 12: S1, S3, S7, S10, S11), the noise sensitivity reduction in the color task was due primarily to increased amplitudes of face ERPs, which became closer to those of noise trials. In two subjects (Figure 12: S4 and S6), the noise sensitivity reduction was due mostly to an increase in amplitude of the noise trials. Finally, in the only subject who showed increased noise sensitivity in the color task (S2), the effect was also due to an amplitude increase of ERPs to noise trials. In addition, in S2, S4, S6, S7, and S10, there was an overall increase in ERP amplitude in the color task compared to the identity task. Thus, task constraints had non-uniform effects on ERP distributions, with most modulations being an increase in amplitude of face trials.
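A shift function plots, for each decile of one distribution, how far the matching decile of the other distribution lies from it. A uniform shift gives a flat line. Differences concentrated in the lower or upper deciles indicate that only part of the distribution (for example face trials or noise trials) moved. The sketch below is a simplification: the shift function as usually implemented (Wilcox) uses Harrell-Davis decile estimates with bootstrap standard errors, whereas here plain `numpy.quantile` deciles and percentile bootstrap intervals are swapped in.

```python
import numpy as np

def shift_function(identity, color, n_boot=1000, alpha=0.05, seed=0):
    """Identity-task deciles and identity minus color decile differences,
    with percentile bootstrap CIs (trials resampled independently per task)."""
    rng = np.random.default_rng(seed)
    probs = np.linspace(0.1, 0.9, 9)  # the nine deciles
    identity = np.asarray(identity, dtype=float)
    color = np.asarray(color, dtype=float)
    dec_id = np.quantile(identity, probs)
    diff = dec_id - np.quantile(color, probs)
    boot = np.empty((n_boot, probs.size))
    for b in range(n_boot):
        bi = rng.choice(identity, identity.size, replace=True)
        bc = rng.choice(color, color.size, replace=True)
        boot[b] = np.quantile(bi, probs) - np.quantile(bc, probs)
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    return dec_id, diff, lo, hi

# Hypothetical single-trial amplitudes: the color task is a uniform +1 shift.
rng = np.random.default_rng(3)
amp_identity = rng.normal(0.0, 1.0, 500)
dec, diff, lo, hi = shift_function(amp_identity, amp_identity + 1.0, n_boot=200)
```

With a uniform shift, every decile difference is -1. A task effect carried mainly by one trial type would instead make the differences change across deciles.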

Discussion
Using identical colored face stimuli in two tasks, and a parametric noise manipulation, we observed a significant reduction in ERP noise sensitivity when noise level was task irrelevant. Overall, following  we conclude that task effects on noise sensitivity are weak before 200 ms, in the window of the N170, and mainly present around the P2. However, task effects were highly variable across subjects, with individual differences in onsets, durations and effect sizes. These idiosyncrasies will need to be addressed in future studies.
Based on the work of , we tested the hypothesis that there is a clear boundary, at about 200 ms after stimulus onset, between bottom-up face processing and brain activity that depends on task demand. Our group results were qualitatively similar to those of  with weak task effects before 200 ms and stronger differences beyond 200 ms. Changing the task requirements did not abolish noise sensitivity altogether, but reduced it by about 19-46% in individual subjects. Results were also inconsistent across subjects: only a minority of subjects showed effects consistent with group analyses, and several subjects showed no significant effects whatsoever, despite similar behavioral performance. The nature of the task effects also differed among subjects, as revealed by analyses of single-trial ERP distributions. These results point to the existence of idiosyncratic modulations of brain activity depending on task requirements.
It remains unclear whether the task dependent noise sensitivity we observed is related to differences in task difficulty between the color and the identity tasks, or whether it is due to changes in the diagnosticity of stimulus phase information (Banko et al., 2011; Philiastides and Sajda, 2007). More generally, noise sensitivity between 100 and 300 ms after stimulus onset probably reflects the activity of object and face processing areas that are sensitive to stimulus evidence (Philiastides and Sajda, 2007; Rousselet et al., 2008b; Tjan et al., 2006). Noise sensitivity, however, does not reflect activity from a general discrimination mechanism, because comparable sensitivity was not present for color and identity. Similarly,  found that early single-trial visual ERPs did not discriminate between red and green or between two motion directions. However, these other stimulus dimensions can be studied using different techniques, such as adaptation (Vizioli et al., 2010), ICA and filtering (Snyder and Foxe, 2010), and frequency tagging (Quigley et al., 2010).
Contrary to several ERP studies described in the introduction, some of our subjects did show moderate task modulations in the time-window of the N170. The absence of task effects in previous face ERP studies is difficult to interpret because of the use of group statistics. One thing that most studies have in common is the use of relatively high-contrast stimuli. Because the effect of attention is contrast dependent (Reynolds and Heeger, 2009), attention effects on face ERPs might be more likely to be observed at low contrast. A systematic study of attention modulations as a function of face contrast remains to be carried out. In our experiment, contrast and luminance were quite low, suggesting even weaker task modulations in more realistic circumstances. Another problem with previous reports of null effects in group statistics is the absence of statistical analyses in individual subjects, as well as poor data description. Group statistics pull out effects that are consistent across subjects, even when these effects are not significant in individual subjects. This might seem like a good property. However, given the number of face ERP experiments carried out each year and the common belief that it is satisfactory to test more subjects to achieve significance, group statistics might be responsible for many false positives in the literature (Wagenmakers, 2007). By definition, group statistics are also insensitive to significant single-subject effects that are inconsistent across subjects, for instance if timing and topographies differ. Thus, group statistics can be misleading because of increased chances of false positives and false negatives, at least in theory.
FIGURE 9 | Significant task effects at the group level and in individual subjects. In the top graphs, significant group effects appear at the top of each column in black, above significant results in individual subjects in gray. Dark gray indicates significant effects at the group defined best electrode (left column) and at the R² optimized electrode (right column). Light gray indicates significant effects at the spatial-temporal cluster to which an electrode belonged. The middle graphs show the number of subjects with a significant task modulation at each time point at one electrode (light gray) or in the cluster to which it belonged (dark gray). Note that S2 was the only subject with stronger noise sensitivity in the color task than in the identity task. The bottom graphs show the percentage of subjects showing a significant task modulation at one electrode (thick black line), with a 95% binomial confidence interval around it (thin black lines). This percentage provides an indication of the probability of observing a group difference given the single-subject results. These binomial confidence intervals place an upper limit of about 61% on the probability of observing a group effect given the single-subject data. In comparison, the group results presented at the top of Figure 9 depend on the probability of the data under the null hypothesis.
Yet, it is difficult at the moment to evaluate whether our results, which show a large discrepancy between group and single-trial analyses, constitute a unique curiosity or reveal a pervasive problem in the ERP literature. Indeed, typical face ERP studies are mostly concerned with group statistics of peak measurements, with little concern for reliability and quantification of the effects. In fact, most studies are content with discussing any effect with p < 0.05. Current practice in the ERP literature tends to hide the rich inter-subject variability that we ought to explain: we perform perceptual tasks as individuals, not as a collective brain. Moreover, many studies report weak effect sizes and unexpected results, and do not properly control for multiple comparisons. One is left wondering what proportion of ERP results will ever be replicated (Miller, 2009). In many studies, beyond the recurrent fundamental flaws of null-hypothesis significance testing (Wagenmakers, 2007), the lack of robustness of t-tests and ANOVAs, and the lack of proper control for multiple comparisons (Wilcox, 2005), readers are too often left with so little evidence that it is impossible to judge the importance of the results. Here we've tried to provide a richer set of descriptions than is usually available in face ERP papers.
Of course, the lack of significant task effects in some subjects, and the lack of consistency across subjects who did show significant effects, might be attributable to different sources of variance, including differences in scalp thickness and electrode application, rather than to individual differences in visual processing. These differences could lead to differences in statistical power across subjects. Although subjects who did show task effects had relatively large effect sizes, more trials, better regression analyses, or both might be necessary to reveal significant effects at different time points and in subjects showing null results. To increase statistical power, we are exploring smooth variance estimators and weighted models, as well as statistical thresholds adjusted on the basis of empirical distributions. However, our data driven estimates of effects expected by chance suggest that some subjects indeed had no task modulation of noise sensitivity whatsoever (Figure 11).
Finally, null or inconsistent effects might not reveal the absence of an effect, but rather our failure to quantify changes in a multidimensional space. For instance, we reported most of our single-trial analyses at one electrode only. Based on extensive inspection of the current dataset and previous datasets, the electrode at which the model provides the best fit seems to capture most of the effects (Rousselet et al., 2008b). Because of spatial blurring, neighboring electrodes contain redundant information, so pooling results across electrodes, as is often done in group analyses, would be of no benefit for univariate single-subject analyses. Of course, there might be extra information available in a multivariate space containing a large number of electrodes. Hence, it will be worth extending our univariate model to measure multivariate relationships between single-trial ERP amplitude, stimulus evidence, and task demand. In addition, as discussed by Liu et al. (2009), a single-trial linear classifier has the advantage over a GLM approach of providing a measure of information. However, it is not clear how linear classifiers can be applied to more complicated designs such as our ANCOVA.
To conclude, all these considerations about group and single-subject analyses are rather circular, because it is not clear what ought to be found. The ERP community relies mostly on group analyses, and therefore most readers might be biased to conclude that discrepancies between group and single-subject analyses reflect problems in single-subject analyses. This point of view is misguided, because a significant group effect does not guarantee that even 50% of the subjects will show the group effect (Figure 9). The inter-individual differences we describe in detail in this paper might well be differences in visual processing. If this is true, these differences must be explained, and should not simply be smoothed out by using group statistics.
FIGURE 10 | Cumulated task effects. The subject number is indicated by S#. Each cell shows the cumulated normalized sensitivity in the identity task (black) and in the color task (green). The difference between the two tasks is shown with thick red lines, with a 95% confidence interval around it (thin red lines). Red dots along the zero horizontal line mark time points of significant task differences, with no correction for multiple comparisons. The vertical red dashed line that crosses the entire cell marks the onset of significant task effects. The horizontal black dashed line marks the value corresponding to 50% of the total cumulated sensitivity in the identity task. The two vertical lines that originate from the 50% line and terminate on the x-axis mark the time to reach that 50% value in the two tasks. The title of each cell contains the onset of the task effects; the 50% integration time difference (50% ITD) between the color and the identity tasks; and the task cumulated difference (TCD) between the identity and the color tasks, expressed as a proportion of the maximum cumulated sensitivity in the identity task.
FIGURE 11 | Histograms of the bootstrap distributions of maximum F cluster sums for task effects on single-trial ERP noise sensitivity. The subject number is indicated by S# in bold font. These bootstrap distributions were calculated under the null hypothesis H0, as described in Methods; hence they reflect the size of spatial-temporal task effects that can be expected by chance, due to random sampling, across the entire search space. For each subject, the vertical black dashed line marks the 95th percentile of the H0 bootstrap distribution. The vertical red continuous line indicates the maximum sum of F values across the spatial-temporal clusters that contained the maximum R² electrode. For subject S5, the cluster sum is equal to zero because no cluster passed the two-electrode threshold: effects were present at this electrode only.
FIGURE 12 | Event-related potential results from the eight subjects showing significant task effects. The subject number is indicated by S# in bold font. The first two columns show the modeled ERPs in the identity and the color tasks. The vertical dashed line marks the latency of the largest task difference. The vertical gray shaded area marks the continuous run of time frames with a significant effect that contained the time frame of maximum effect. The third column shows boxplots of the single-trial modeled ERP (t,e) amplitudes in the identity and color tasks, summed across the time frames marked by gray areas in columns 1 and 2. The fourth column shows the shift function between the distributions in column three. The x-axis shows the estimated deciles in the identity task. The y-axis shows the estimated difference deciles between the identity and the color tasks, marked as nine dots, with the ends of the confidence intervals marked by plus signs.

Conclusion
At the group level, ERP sensitivity to phase noise was reduced between about 140 and 300 ms when stimulus phase information was task irrelevant. This result suggests that sensitivity to image structure can be modulated by the task at hand, but only after an initial period of essentially bottom-up visual processing, from about 90 to 140 ms. In contrast to what the group results might suggest, we observed a significant task effect in only 8 of 13 subjects (about 60%), and at any time point only 31% of subjects showed results consistent with group analyses. Interestingly, the origin of the reduction in ERP sensitivity to phase information differed among subjects: it was due to an increase in ERP amplitude to the noisiest stimuli (noise textures) in some subjects, and to an increase in ERP amplitude to the least noisy stimuli (faces) in others. Overall, our analyses demonstrate the usefulness of single-trial analyses and parametric designs for studying information processing. Our results also suggest that, in some situations, group statistics can be so misleading that their use without complementary individual subject analyses is questionable.

Acknowledgments
ESRC grant RES-000-22-3209 supported Guillaume A. Rousselet. Cyril R. Pernet is funded by the SINAPSE collaboration (http://www.sinapse.ac.uk), a pooling initiative funded by the Scottish Funding Council and the Chief Scientist Office of the Scottish Executive. A Biomedical Vacation Scholarship from the Wellcome Trust and a grant from the University of Glasgow Settlement charity supported Kacper P. Wieczorek. We thank Sarah Driscoll for her help collecting data.