# THE LOCUS OF THE STROOP EFFECT

EDITED BY: Benjamin Andrew Parris, Maria Augustinova and Ludovic Ferrand

PUBLISHED IN: Frontiers in Psychology

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-445-3 DOI 10.3389/978-2-88963-445-3

# THE LOCUS OF THE STROOP EFFECT

Topic Editors:

Benjamin Andrew Parris, Bournemouth University, United Kingdom

Maria Augustinova, Université de Rouen, France

Ludovic Ferrand, Centre National de la Recherche Scientifique (CNRS), France

Citation: Parris, B. A., Augustinova, M., Ferrand, L., eds. (2020). The Locus of the Stroop Effect. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-445-3

## Editorial: The Locus of the Stroop Effect

#### Benjamin A. Parris <sup>1</sup> \*, Maria Augustinova<sup>2</sup> and Ludovic Ferrand<sup>3</sup>

<sup>1</sup> Department of Psychology, Bournemouth University, Poole, United Kingdom, <sup>2</sup> Normandie Université, UNIROUEN, CRFDP, Rouen, France, <sup>3</sup> Université Clermont Auvergne, CNRS, LAPSCO, Clermont-Ferrand, France

Keywords: Stroop, selective attention, cognitive control, task conflicts, semantic conflict, response conflict

#### **Editorial on the Research Topic**

#### **The Locus of the Stroop Effect**

One of the famous scenes in Monty Python and the Holy Grail pictures the Knights of the Round Table attempting to cross the Bridge of Death. After seeing one of his fellow knights fail to answer a challenging question posed by the Bridge keeper and be cast into the Gorge of Eternal Peril, Sir Galahad nervously approaches the Bridge keeper, who asks his name, his quest and... his favorite color. Relieved, he answers with ease before being struck by a sense of dread after saying the color "blue." The problem, he realizes as he plummets into the gorge, is that his favorite color is in fact yellow...

Even though individuals failing the requirements of the Stroop task (Stroop, 1935) are spared the dread Sir Galahad experienced, they are often heard self-correcting with some sense of consternation: "blue... no, yellow! Arghh!". The Stroop task requires participants to respond quickly to the color a word is printed in whilst at the same time ignoring the meaning of the word itself. The cost of failing to ignore the word is not a plunge into the Gorge of Eternal Peril, but is instead an incorrect response or, more commonly, a longer response time compared to naming the print color of a word that is not color-related (e.g., club in yellow). However, almost 30 years after the publication of MacLeod's (1991) seminal review paper, the locus of this so-called Stroop effect remains unclear. The aim of the present Research Topic was to address this still outstanding question.

When aiming to respond to the color, a participant must focus on that task; once the color is perceived, a semantic representation of it needs to be activated before the associated word form is retrieved. The Stroop effect suggests that this process can be interrupted by the processing of the to-be-ignored word since the task of word reading is seemingly automatically activated, as is the semantic representation of the word and its word form. Potentially then, amongst other loci, the process of color naming could be interrupted at the level of task set activation, semantic processing, and the word form response. Much of the theory and research, however, has assumed that the interruption, the locus of the Stroop effect, is at the level of responses (Cohen et al., 1990; Roelofs, 2003), and that it is this type of conflict that control mechanisms monitor (Botvinick et al., 2001). In line with a recent and burgeoning literature (e.g., Parris, 2014; Levin and Tzelgov, 2016; Augustinova et al., 2018; Entel and Tzelgov, 2018; Kalanthroff et al., 2018; Ferrand et al., 2019; Hasshim et al., 2019; Hershman and Henik, 2019; Parris et al., 2019), the contributions to this Research Topic report findings that indicate that there is more than one locus to the Stroop effect.

Littman et al. review the literature on the physiological and behavioral signatures of task conflict and task control in the Stroop task whilst Hsieh and Sharma invoke task conflict and its (proactive) control to account for a general slowdown in color naming of studied non-color neutral and negative emotional words in the Stroop task. Continuing with the notion that emotion can modify Stroop effects, Berger et al. show that age does not affect the use of proactive control over emotion-related Stroop stimuli, but that the nature of the influence of emotion on Stroop effects depends on whether faces or words were the relevant dimension.

Edited and reviewed by: Bernhard Hommel, Leiden University, Netherlands

\*Correspondence: Benjamin A. Parris bparris@bournemouth.ac.uk

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 29 November 2019 Accepted: 03 December 2019 Published: 17 December 2019

#### Citation:

Parris BA, Augustinova M and Ferrand L (2019) Editorial: The Locus of the Stroop Effect. Front. Psychol. 10:2860. doi: 10.3389/fpsyg.2019.02860

The modulating effects of response mode were examined in two contributions. In two experiments, Augustinova et al. report that the locus of interference and facilitation effects might depend on response mode, with more types of interference (e.g., task, semantic, and response) and facilitation contributing to the vocal, compared to the manual, response. Highlighting another difference, Mills et al. report a negative priming effect in the vocal (Experiment 1) but not in the manual (Experiment 2) Stroop task. The authors argue that this is because it is the actual naming response of the previously ignored stimulus that is suppressed and not the conflict that it generates.

In an fMRI study that controlled for variables that are often confounded, Parris et al. report regions of similar and dissociable neural activity to response and semantic conflict in the Stroop task, whilst Banich summarizes and updates the Cascade-of-Control neural model that argues that there is no single locus to the Stroop effect, and more importantly that the locus might move depending on how well each brain system deals with interference.

In their article, Algom and Chajut argue that the popular "conflict monitoring and control" view of the Stroop effect (Botvinick et al., 2001) fails to account for major Stroop results. Instead, they defend a "data-driven selective attention" view that, they argue, best accounts for most Stroop results and does not involve higher-order cognitive control processes.

Much of the work on the locus of the Stroop effect focusses on the conflict that occurs between the two dimensions of the stimulus on the current trial. However, two studies in this Research Topic build on work showing that interference is smaller on the current trial if the previous trial was incongruent; an effect known as the congruency sequence effect (CSE). Ménétré and Laganaro investigate subprocesses involved in the CSE with participants aged from 10 to 80 years old in order to analyze how interference, CSE, and the decomposition of attention and inhibition change across the lifespan. Aschenbrenner and Balota in contrast examine the relationship of the CSE with another measure of control referred to as the item-specific proportion congruency effect (ISPC).

In sum, these studies, reporting differences at behavioral and neurophysiological levels, highlight the loci of the Stroop effect at the level of task set, semantic, and response selection with the modulating effects of emotion and congruency-sequence.

#### AUTHOR CONTRIBUTIONS

Each author summarized the articles in the Research Topic that they edited. BP wrote the first draft of the editorial. MA and LF provided comments and recommended amendments.

#### FUNDING

This work was part funded by grant ANR-19-CE28-0013-01.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Parris, Augustinova and Ferrand. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Additive Effects of Item-Specific and Congruency Sequence Effects in the Vocal Stroop Task

Andrew J. Aschenbrenner<sup>1</sup> \* and David A. Balota<sup>2</sup>

<sup>1</sup> Department of Neurology, Washington University in St. Louis, St. Louis, MO, United States, <sup>2</sup> Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, MO, United States

There is a growing interest in assessing how cognitive processes fluidly adjust across trials within a task. Dynamic adjustments of control are typically measured using the congruency sequence effect (CSE), which refers to the reduction in interference following an incongruent trial, relative to a congruent trial. However, it is unclear if this effect stems from a general control mechanism or a distinct process tied to cross-trial reengagement of the task set. We examine the relationship of the CSE with another measure of control referred to as the item-specific proportion congruency effect (ISPC), the finding that frequently occurring congruent items exhibit greater interference than items that are often incongruent. If the two effects reflect the same control mechanism, one should find interactive effects of CSE and ISPC. We report results from three experiments utilizing a vocal Stroop task that manipulated these two effects while controlling for variables that are often confounded in the literature. Across three experiments, we observed large CSE and ISPC effects. Importantly, these effects were robustly additive with one another (Bayes Factor for the null approaching 9). This finding indicates that the CSE and ISPC arise from independent mechanisms and suggests the CSE in Stroop may reflect a more general response adjustment process that is not directly tied to trial-by-trial changes in attentional control.

Edited by:

Benjamin Andrew Parris, Bournemouth University, United Kingdom

#### Reviewed by:

James R. Schmidt, Université de Bourgogne, France Miriam Gade, MSB Medical School Berlin, Germany

#### \*Correspondence:

Andrew J. Aschenbrenner a.aschenbrenner@wustl.edu

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 08 February 2019 Accepted: 01 April 2019 Published: 24 April 2019

#### Citation:

Aschenbrenner AJ and Balota DA (2019) Additive Effects of Item-Specific and Congruency Sequence Effects in the Vocal Stroop Task. Front. Psychol. 10:860. doi: 10.3389/fpsyg.2019.00860

Keywords: attentional control, congruency sequence effect, item-specific proportion congruency effect, attention, cognitive control

## INTRODUCTION

Attentional control is the ability to select relevant attributes from the environment for additional processing while ignoring competing and possibly more salient attributes. The Stroop color naming task (Stroop, 1935) is a classic test of attentional selection. In this paradigm, individuals are presented with color words printed in colored ink (e.g., the word RED in blue ink) and are instructed to name the ink color and ignore the word. The degree to which responses to incongruent stimuli (where the color and word are different) are slower than responses to congruent stimuli (where the color and word are the same) reflects the efficiency of attentional control.

A key theoretical issue is how control is recruited and/or adjusted across trials within a task. Extant models have been informed by the robust finding that interference on Trial N is consistently smaller when the stimulus on Trial N-1 was incongruent relative to when that item was congruent (Gratton et al., 1992). This phenomenon is known as the congruency sequence effect (CSE).

Importantly, the CSE indicates that some aspect of the stimulus from the prior trial induces a change in the processing system that influences performance on the subsequent trial(s). This suggests that attentional control is not a static process but rather is fluid and dynamic. A large body of research has since aimed to identify the specific mechanisms that produce these trial by trial adjustments in attentional control (see Duthoo et al., 2014b, for a review).
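The CSE described above can be sketched as a difference of two interference scores. The snippet below is a minimal illustration only: the function name, the dictionary layout and the RT values are hypothetical and are not taken from any study cited here.

```python
def congruency_sequence_effect(cell_means):
    """Return (post-congruent interference, post-incongruent interference, CSE).

    cell_means maps (previous, current) congruency labels to a mean RT,
    where "C" is congruent and "I" is incongruent.
    """
    # Interference on Trial N when Trial N-1 was congruent.
    post_congruent = cell_means[("C", "I")] - cell_means[("C", "C")]
    # Interference on Trial N when Trial N-1 was incongruent.
    post_incongruent = cell_means[("I", "I")] - cell_means[("I", "C")]
    # A negative value means interference shrank after an incongruent trial.
    cse = post_incongruent - post_congruent
    return post_congruent, post_incongruent, cse


# Made-up mean RTs in milliseconds, for illustration only.
example = {
    ("C", "C"): 600.0, ("C", "I"): 720.0,
    ("I", "C"): 620.0, ("I", "I"): 700.0,
}
print(congruency_sequence_effect(example))
```

With these invented values, interference is 120 ms after a congruent trial and 80 ms after an incongruent trial, so the CSE (post-incongruent minus post-congruent interference) is negative.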

Many accounts of the CSE have been proposed and one of the most prominent is the conflict monitoring hypothesis which suggests the conflict produced by the stimulus on the preceding trial signals the system to upregulate control for the following trial (Botvinick et al., 2001). This theory has been able to account for a wide array of behavioral and neural data (Botvinick et al., 2004). Importantly, the conflict monitoring account suggests the CSE is fundamentally a modulation of control processes and has inspired a flurry of research that has aimed to determine whether the CSE truly reflects an adjustment in control. Some of the earliest alternative explanations suggested the CSE is actually produced by low-level feature characteristics such as item repetition (Mayr et al., 2003; Hommel et al., 2004) or response contingency (Schmidt and De Houwer, 2011). Although such confounds certainly do contribute to the observed effects, careful experimentation that has controlled for these confounds has generally still produced the expected finding, albeit reduced (Duthoo et al., 2014a; Kim and Cho, 2014; Schmidt and Weissman, 2014). Together these findings suggest that abstract properties (possibly conflict) of the prior stimulus are at least partially responsible for cross-trial changes and hence the CSE can be used as a marker of attentional control adjustment.

However, a number of studies have continued to challenge whether the CSE is a control phenomenon or rather arises from a more general trial-by-trial response adjustment mechanism. For example, Schmidt and Weissman (2016) conducted detailed analyses of prior trial response times and determined that the CSE is consistent with a simple temporal learning model. That is, participants tend to respond quickly after a relatively fast response (which tend to be congruent trials) on Trial N-1 and relatively slowly after a slow response (which tend to be incongruent trials) on Trial N-1. These expectations are implemented via momentary drops in response thresholds such that following a fast (congruent) trial, response thresholds are dropped relatively early and following a slow (incongruent) trial, thresholds are dropped relatively late. An early drop in threshold would benefit a congruent stimulus on Trial N whereas a later drop would benefit incongruent stimuli on Trial N, producing the CSE pattern (see Schmidt and Weissman, 2016, for computational details). It is important to point out, however, that while the statistical models revealed a robust current trial congruency by previous trial congruency by previous trial RT interaction (which indicates the CSE is modulated by the prior trial RT), the two-way interaction between current and previous congruency still remained. Thus, we can conclude that temporal learning may contribute to the magnitude of the CSE, but it is not the entire story.

Aschenbrenner and Balota (2015) took an individual differences approach and compared the magnitude of the CSE as a function of age and working memory in the Stroop task. They argued that because older adults and low-working memory individuals have impaired attentional control, one should expect these individuals to produce smaller CSEs. Instead, they found the opposite pattern, namely that the CSE increased with older age and lower working memory estimates. Furthermore, this increase was driven primarily by differences on post-congruent rather than post-incongruent trials.

The disproportionate influence of prior congruent responses (Lamers and Roelofs, 2011) led Aschenbrenner and Balota (2015) to propose a pathway priming account of the CSE. Specifically, they assumed a two-pathway model of Stroop performance (e.g., color and word pathway) in which activity accumulates along each pathway until a response is made. When Trial N-1 is incongruent (i.e., a trial on which only the color dimension is relevant), the color pathway is primed for use on the subsequent trial. If Trial N is also incongruent, responses will be facilitated due to the greater activity along the color pathway. However, when Trial N-1 is congruent, the word pathway holds relative utility in reaching the correct response and is therefore primed for use on the next trial. If Trial N is congruent, responses will again be facilitated due to increased activity along the word pathway; if Trial N is incongruent, however, responses are slowed as the additional activity along the word pathway now needs to be controlled. Hence, the pathway priming model embodies the assumption that individuals are constantly adjusting the specific procedures they utilize to achieve task goals based on the success of those procedures (e.g., use of color vs. word pathway) on the immediately preceding trial.

Of course, if this model is correct, then one should find cross trial effects in other tasks such as lexical decision and recognition memory, which are not tasks that place a heavy load on attentional control systems, certainly not to the same degree as the Stroop task. Indeed, there has been a recent flurry of research which suggests that non-attentional tasks also produce CSE-like patterns that can be interpreted within the pathway priming framework (Malmberg and Annis, 2012; Balota et al., 2018; Aschenbrenner et al., 2017; Hubbard et al., 2017).

As noted, most recent research has tried to address whether the CSE reflects control by eliminating all possible confounds (e.g., feature level characteristics) to ensure that some CSE is still obtained. We take an alternative approach here. Specifically, we examine these issues through the lens of the additive factors framework (Sternberg, 1969) which suggests that additive effects of two variables (i.e., reliable main effects but no interaction) indicate each variable influences a separate or independent processing stage whereas variables that interact influence a shared stage. For example, in the classic short-term memory scanning study where participants are shown a series of digits and asked to determine if a target probe is or is not contained in the presented array, it has been shown that the perceptual quality of the probe is additive with regards to the size of the memory set to be searched (Sternberg, 1967). Sternberg concluded that stimulus degradation and memory set size must each influence a separate processing stage. Of course, such an account is not the only way to interpret additive effects (e.g., McClelland, 1979), however the independent stages model has been shown to best accommodate the relationship among mean reaction times and the associated variances (Roberts and Sternberg, 1993, see Balota et al., 2013 for similar interpretation of the additivity of degradation and word frequency in the lexical decision task).
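As a minimal sketch of the additive factors logic, the interaction in a 2 x 2 design can be expressed as a difference of differences: it is zero when the two factors combine additively and non-zero when they interact. The function name and all mean RTs below are hypothetical and serve only to illustrate the arithmetic.

```python
def interaction_contrast(means):
    """Difference of differences for a 2 x 2 design.

    means[a][b] is the mean RT at level a of factor A and level b of factor B.
    """
    # Effect of factor B at the first level of factor A.
    effect_of_b_at_a0 = means[0][1] - means[0][0]
    # Effect of factor B at the second level of factor A.
    effect_of_b_at_a1 = means[1][1] - means[1][0]
    # Zero when the effect of B is the same at both levels of A (additivity).
    return effect_of_b_at_a1 - effect_of_b_at_a0


# Invented cell means (ms): B adds 50 ms at both levels of A.
additive = [[400.0, 450.0], [500.0, 550.0]]
# Invented cell means (ms): B adds 50 ms at one level of A and 100 ms at the other.
interactive = [[400.0, 450.0], [500.0, 600.0]]

print(interaction_contrast(additive))     # additive pattern
print(interaction_contrast(interactive))  # interactive pattern
```

In practice the contrast is estimated with uncertainty, so additivity is evaluated statistically rather than by checking for an exact zero.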

In the present study, we used additive factors logic to examine whether the CSE involves attentional control adjustments by exploring the relationship between the CSE and an established marker of attentional control adjustment, the item-specific proportion congruency effect (ISPC: Jacoby et al., 2003). Specifically, it has been repeatedly shown that the magnitude of interference on any given trial depends on the overall frequency with which that particular item is congruent or incongruent. That is, items which are mostly congruent (MC items) exhibit greater interference than items that are mostly incongruent (MI items). This finding has been interpreted as evidence for a rapid retrieval or adjustment of control settings that occurs post-stimulus onset (Blais et al., 2007). For example, if the word GREEN is typically incongruent, control over the word pathway would be increased when GREEN is encountered in the list. Using additive factors logic, if the CSE is due to an adjustment in control processes, then it should interact with the ISPC. In contrast, if the CSE is the result of some other, non-control based mechanism (such as pathway priming), one would expect additivity to prevail.

We conducted a modified vocal Stroop task in which the CSE was examined following biased ISPC items (i.e., mostly congruent or mostly incongruent) or unbiased (50% congruent) items. As already indicated, exact repetition of stimuli can artificially magnify the CSE and hence repetition of stimuli or responses should be precluded from the design. This is typically done by expanding the size of the stimulus set (e.g., by using at least four colors in the Stroop task). However, this standard manipulation produces another confound, specifically a contingency bias such that the word dimension predicts the correct response more often than would be expected by chance alone which can also influence the observed CSE (Schmidt and De Houwer, 2011).

Therefore, in order to provide a confound-minimized test of CSE processes in the current study, the following procedure was implemented (Kim and Cho, 2014; Aschenbrenner and Balota, 2017). First, we created a set of Stroop stimuli that consisted of eight colors and eight color words which were placed into pairs. Incongruent items were always shown in the color of the opposite item of the pair. For example, if RED and BLUE form one pair, an incongruent BLUE stimulus would always be shown in the color RED and never in any other color. Such a procedure eliminates the contingency confound, and as long as different pairs are sampled across adjacent trials exact repetitions of items and responses are also precluded.
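The pairing procedure can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' experiment code: the function names are hypothetical, and only the RED/BLUE pair comes from the text (the GREEN/YELLOW pair is an invented placeholder).

```python
def build_stimuli(pairs):
    """Return a list of (word, ink_color, congruency) tuples.

    Each word appears in its own color (congruent) and in the color of its
    pair partner (incongruent), and in no other color.
    """
    stimuli = []
    for first, second in pairs:
        for word, partner in ((first, second), (second, first)):
            stimuli.append((word, word, "congruent"))
            stimuli.append((word, partner, "incongruent"))
    return stimuli


def from_different_pairs(previous, current, pairs):
    """True if two adjacent stimuli were drawn from different pairs.

    Because word and ink color always belong to the same pair, this rules
    out repeating a word or a color across the two trials.
    """
    pair_of = {name: index for index, pair in enumerate(pairs) for name in pair}
    return pair_of[previous[0]] != pair_of[current[0]]


# RED/BLUE is the pair named in the text; GREEN/YELLOW is a hypothetical pair.
example_pairs = [("RED", "BLUE"), ("GREEN", "YELLOW")]
example_stimuli = build_stimuli(example_pairs)
```

Restricting each word to two ink colors is what ties the incongruent version of an item to a single response, and sampling adjacent trials from different pairs is what excludes exact repetitions.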

As an overview of the experiments, Experiment 1 examined the relationship between the CSE and the ISPC in young adults using a vocal Stroop paradigm that eliminates all confounds that have been previously identified in the literature. Experiment 2 examined the same effects in a sample of older adult participants, a group of people who have been shown to have difficulties in attentional control and therefore should produce larger overall effects and may increase our power to detect interactive influences. Finally, Experiment 3 eliminated a potential alternative account of the ISPC (associative learning) to ensure that the present ISPC is indeed a reflection of attentional control in this paradigm.

## EXPERIMENT 1

## Methods

#### Participants

Thirty-two young adults (78% female; mean age = 19.7 years, SD = 1.4) were recruited from the Washington University Psychology undergraduate research pool. All had normal or corrected-to-normal vision and participated for research credit or monetary compensation. A power analysis using the Bayes factor design analysis (BFDA) package (Schönbrodt, 2018) in R indicated that a sample size of 32 would give approximately 70% power to obtain an interpretable Bayes factor (i.e., greater than three) in favor of a difference in the CSE as a function of the ISPC using a paired t-test, assuming a moderate effect size (Cohen's d ranging from 0.45 to 0.65). Similarly, we had approximately 72% power to obtain a Bayes factor larger than three in favor of the null, assuming a true effect size of 0.

#### Stimuli

The stimulus set and the frequency of presentation of each item are shown in **Table 1**. The four pairs of items were presented with differing frequencies such that items from one pair were congruent 75% of the time (thus forming a mostly congruent: MC item set) and items from a different pair were only congruent 25% of the time (mostly incongruent: MI items). The final two pairs were 50% congruent, one of which was designated as the "neutral" items and the other as the "critical" items. The neutral items were intended to serve as a control condition to assess the CSE when the prior trial did not contain a frequency manipulation (consistent with prior examinations of the CSE). The critical items were used to assess the magnitude of the CSE. Importantly, while both the neutral and critical items are 50% congruent, only the critical items were experimentally controlled such that they followed each item type (MI, MC, and neutral) with equal probability. This ensures that an equal number of trials occurred in each of the four cells that make up the CSE. The item pairs (e.g., RED always with BLUE) were kept the same but were rotated through the conditions such that each set of items was a MI, MC, neutral, or critical item across participants.

#### Procedure

The experiment began with a demonstration block in which each of the eight colors was shown as a colored square and the participant was asked to name it aloud. This was followed by a 23-item practice block which mimicked the structure of the test (i.e., mostly congruent items were more frequently presented in their matching color and so forth). During practice, corrective feedback was given as necessary (e.g., "speak more loudly," "remember to name the color not the word," etc.). After the practice block, the test itself began, illustrated in **Figure 1**. The test phase consisted of 1152 trials with 12 rest breaks programmed throughout. In both the practice and test blocks, the Stroop stimulus was displayed in the center of the screen for 5000 ms or until a verbal response by the participant triggered the microphone. The participant's response initiated a blank screen while the experimenter coded the response as correct, incorrect or microphone error (e.g., stutters, speaking too softly etc.). Once the response was coded, a 1000 ms blank screen intertrial interval was initiated prior to the presentation of the next stimulus. The Washington University Institutional Review Board approved all procedures.

#### TABLE 1 | Stimuli frequencies in Experiments 1 and 2.

Critical items: 50% congruent, used to examine the CSE; MI items: mostly incongruent; MC items: mostly congruent; Neutral items: 50% congruent.

#### Analysis

To avoid the influence of outlier RTs, each individual's data were trimmed using the following method. First, microphone errors were removed, followed by any valid response trial that was faster than 200 ms (presumed to be a fast guess or an undetected microphone error). Next, RTs that were more than three standard deviations above or below the participant's mean were removed. Finally, we also eliminated the first trial after each break, trials that occurred after an error, and any trial immediately following one on which the experimenter took longer than 5 s to code the response. This trimming strategy eliminated 7.4% of the total responses.
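The trimming steps can be sketched for a single participant as below. This is a hedged illustration, not the authors' analysis script: the trial fields, function name and parameter names are hypothetical. It assumes that the mean and standard deviation are computed after the first two exclusions, and that "after an error" refers to trials following a response coded as incorrect. Whether error trials themselves were excluded is not stated in the passage, so they are left in here.

```python
from statistics import mean, stdev


def trim_trials(trials, min_rt=200.0, sd_cutoff=3.0, max_coding_s=5.0):
    """Return the trials that survive the trimming steps, in original order.

    Each trial is a dict with hypothetical keys: 'rt' (ms), 'mic_error' (bool),
    'correct' (bool), 'first_after_break' (bool) and 'coding_s' (seconds the
    experimenter took to code the response).
    """
    # Flag each trial by what happened on the preceding trial, using the
    # original trial order before anything is removed.
    flagged = []
    for index, trial in enumerate(trials):
        previous = trials[index - 1] if index > 0 else None
        after_error = previous is not None and not previous["correct"]
        after_slow_coding = previous is not None and previous["coding_s"] > max_coding_s
        flagged.append((trial, after_error or after_slow_coding))

    # Step 1: drop microphone errors. Step 2: drop responses faster than min_rt.
    kept = [(t, f) for t, f in flagged if not t["mic_error"] and t["rt"] >= min_rt]

    # Step 3: drop RTs beyond sd_cutoff standard deviations of the participant mean.
    if len(kept) >= 2:
        rts = [t["rt"] for t, _ in kept]
        centre, spread = mean(rts), stdev(rts)
        kept = [(t, f) for t, f in kept if abs(t["rt"] - centre) <= sd_cutoff * spread]

    # Step 4: drop the first trial after a break and trials flagged above.
    return [t for t, f in kept if not t["first_after_break"] and not f]
```

The order of the steps matters: computing the standard deviation before removing microphone errors or fast guesses would widen the cutoff and retain more extreme RTs.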

The data were then split into critical items (used for the CSE analysis) and "biased" items (MC, MI, or neutral) for an analysis of the ISPC. RTs were z-scored to each individual's mean and standard deviation within each set of items to control for individual differences in overall speed and ability (Faust et al., 1999). Raw mean RTs are provided in the **Supplementary Materials**. Mean z-scored RTs were calculated for each of the critical cells for analysis. The condition means were analyzed using a Bayesian linear mixed effects model using the rjags package (Plummer, 2016). For the ISPC analysis, the condition means included congruency (congruent vs. incongruent items) and item type (MC vs. MI vs. Neutral). For the CSE analysis, the condition means reflected the three-way crossing of congruency (congruent vs. incongruent items), previous trial congruency (congruent vs. incongruent) and the previous item type (MC, MI, or neutral). In order to generate representative and stable estimates, we ran three chains of 100,000 samples from the posterior distribution and excluded the first 1,000 as burn-in for each analysis. After checking that the chains converged using the Gelman and Rubin $\hat{R}$ statistic (Gelman and Rubin, 1992), we collapsed across the chains to analyze the posteriors. Mean z-scored RTs were analyzed as a combination of the conditions (defined above) and a random effect of subject. Each beta weight was given a broad (uninformative), normally distributed prior. Results are presented as a point estimate together with the 95% highest density interval (HDI; e.g., effect = X, HDI = Y:Z). An effect can be called "significant" if the HDI does not include zero. Finally, we provide Bayes Factors for the critical effects using the Savage-Dickey density ratio (Wagenmakers et al., 2010) as a quantification of the evidence for a given hypothesis. Conventionally, a Bayes Factor between 3.2 and 10 represents a "substantial" amount of evidence (Kass and Raftery, 1995).
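Three of the quantities used in this analysis (z-scored RTs, the HDI and the Savage-Dickey density ratio) can be sketched in a few lines. The sketch below is illustrative only and is not the rjags model reported here. The function names are hypothetical, the prior is assumed to be Normal(0, `prior_sd`) with an arbitrary `prior_sd` (the passage states only that the prior was broad and normal), and the posterior density at zero is approximated by a normal distribution fitted to the samples, which is an assumption rather than the authors' stated method.

```python
import math
from statistics import mean, stdev


def z_score(rts):
    """Standardize RTs to the mean and SD of the list they come from."""
    centre, spread = mean(rts), stdev(rts)
    return [(rt - centre) / spread for rt in rts]


def hdi(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the posterior samples."""
    ordered = sorted(samples)
    n = len(ordered)
    width = max(1, math.ceil(mass * n))  # number of samples inside the interval
    start = min(range(n - width + 1),
                key=lambda i: ordered[i + width - 1] - ordered[i])
    return ordered[start], ordered[start + width - 1]


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def savage_dickey_bf01(samples, prior_sd):
    """Bayes factor for the null (effect = 0): posterior density at zero
    divided by prior density at zero.

    Assumes a Normal(0, prior_sd) prior and a normal approximation to the
    posterior; both are simplifying assumptions of this sketch.
    """
    posterior_at_zero = normal_pdf(0.0, mean(samples), stdev(samples))
    prior_at_zero = normal_pdf(0.0, 0.0, prior_sd)
    return posterior_at_zero / prior_at_zero
```

Under this ratio, values above 1 indicate that the data made an effect of zero more plausible than it was under the prior, which is the sense in which a Bayes Factor can support the null.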

#### Results

#### z-Scored RTs

#### **ISPC analysis**

The first and necessary step in our analysis is to demonstrate that an ISPC effect was obtained in our modified design. Condition means are displayed in **Figure 2**. The main effect of Stroop congruency was large and significant (Mean effect = 0.794, HDI = 0.718:0.871) indicating responses were 0.794 standard deviations slower to incongruent relative to congruent stimuli. More importantly, this effect interacted with the type of item (i.e., there was an ISPC). Specifically, relative to the neutral condition, interference was greater for the MC items (Mean effect = 0.256, HDI = 0.069:0.441) and was smaller for the MI items (Mean effect = −0.269, HDI = −0.454:−0.082). Thus, the ISPC effect is readily apparent even under these highly controlled conditions.
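For readers unfamiliar with the HDI, the sketch below shows one common way to obtain it from a set of posterior samples: the narrowest interval that contains the requested share of the samples. This is an illustrative implementation under our own assumptions, not the routine used to produce the intervals reported here, and the function names are hypothetical.

```python
def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # Number of samples that must fall inside the interval
    k = min(n, max(1, round(mass * n)))
    best_lo, best_hi = xs[0], xs[k - 1]
    # Slide a window of k consecutive sorted samples and keep the narrowest
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


def excludes_zero(interval):
    """Decision rule from the text: 'significant' if zero lies outside the HDI."""
    lo, hi = interval
    return lo > 0 or hi < 0
```

Applied to the intervals reported above, `excludes_zero((0.069, 0.441))` is true for the MC contrast, whereas an interval such as `(-0.248, 0.151)` would not meet the criterion.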

#### **CSE analysis**

**Figure 3** plots the CSE (post-incongruent interference minus post-congruent interference) as a function of each item type and the cell means are shown in **Table 2**. It is important to remember that "item type" refers to the prior trial in this analysis as the current trial was always unbiased. As before, the Stroop effect averaged across all conditions was significant (Mean effect = 0.693, HDI = 0.652:0.734). Importantly, the magnitude of interference varied as a function of prior trial congruency, producing the CSE (Mean effect = −0.126, HDI = −0.208:−0.044). However, there was no evidence of an interaction between the CSE and the prior item type indicating that the CSE was of comparable magnitude regardless of whether it followed an MC, MI, or neutral item. Specifically, the HDI of the beta weight comparing the CSE following MC items to the CSE following neutral items was wide and encompassed zero (Mean effect = −0.046, HDI = −0.248:0.151, Bayes Factor = 8.88) as did the comparison between MI and neutral (Mean effect = −0.034, HDI = −0.236:0.165, Bayes Factor = 9.21). These results indicate that although there is a clear effect of the congruency of the previous trial (the significant CSE) and the probability of the previous item being mostly congruent, incongruent or neutral (ISPC effect), these two effects did not interact.

TABLE 2 | Mean z-scored RTs (and HDIs) for each condition in the CSE analysis of Experiment 1.
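The CSE contrast used in this analysis (post-incongruent interference minus post-congruent interference) and its comparison across prior item types can be written out explicitly. The sketch below is only an arithmetic illustration; the dictionary layout, keyed by (previous congruency, current congruency), and the function names are assumptions of the example rather than part of the reported model.

```python
def interference(incongruent, congruent):
    """Stroop interference: incongruent minus congruent mean RT."""
    return incongruent - congruent


def cse(cells):
    """Congruency sequence effect from four cell means.

    `cells` is keyed by (previous, current) congruency, each 'C' or 'I'.
    Negative values mean less interference after an incongruent trial.
    """
    post_incongruent = interference(cells[("I", "I")], cells[("I", "C")])
    post_congruent = interference(cells[("C", "I")], cells[("C", "C")])
    return post_incongruent - post_congruent


def cse_relative_to_baseline(cells_by_item_type, baseline="neutral"):
    """Difference between each prior item type's CSE and the baseline CSE.

    Values near zero for every item type correspond to the additive
    (non-interacting) pattern described in the text.
    """
    effects = {item: cse(c) for item, c in cells_by_item_type.items()}
    return {item: value - effects[baseline]
            for item, value in effects.items() if item != baseline}
```

Under this definition, an interaction between the CSE and the prior item type would appear as a nonzero value returned by `cse_relative_to_baseline` for the MC or MI items.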

#### Accuracy

#### **ISPC analysis**

For the ISPC items, the Stroop effect was significant (Mean effect = −0.036, HDI = −0.044:−0.027) indicating more errors to incongruent items relative to congruent items. Furthermore, interference was larger for MC items relative to neutral (Mean effect = −0.024, HDI = −0.044:−0.002) and also relative to MI items (Mean effect = −0.036, HDI = −0.057:−0.015). However, the MI and Neutral items did not differ from one another (Mean effect = 0.013, HDI = −0.008:0.034).

#### **CSE analysis**

Looking at the critical items to assess the CSE, the Stroop effect was significant (Mean effect = −0.032, HDI = −0.039:−0.025) as was the CSE (Mean effect = 0.023, HDI = 0.010:0.036). However, none of the interactions with prior item type were significant. Specifically, the HDI of the beta weight comparing the CSE following MC items relative to neutral items was large and encompassed zero (Mean effect = 0.01, HDI = −0.019:0.046, Bayes Factor = 43.49) as was the CSE following MI items relative to neutral (Mean effect = 0.01, HDI = −0.028:0.038, Bayes Factor = 56.76).

#### Interim Discussion

The primary result from this experiment is that the CSE and ISPC both produce highly reliable effects but are additive with one another. This provides initial evidence that the ISPC and CSE reflect separate and independent mechanisms. The evidence for the independence of these two factors was quite large (∼9 times in favor of the null when testing the three-way interaction, as reflected by the Bayes Factor). However, there are a number of additional reasons that might account for the null interaction we obtained. We report two additional experiments that address these possibilities. First, it is possible that we did not have a sufficiently strong CSE to detect the hypothesized interaction. Although highly reliable, the CSE is relatively small, at least when compared to the size of the overall Stroop effect. Thus, in order to both replicate our original finding and address the effect size issue, we conducted the same experiment again with an older adult sample. Older adults typically produce a larger CSE in the Stroop task relative to younger adults (Aschenbrenner and Balota, 2017) and therefore, if the null is simply due to the relatively small magnitude of the CSE, we may be more likely to detect the interaction in this population.
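To make the Bayes Factor in favor of the null more concrete, the following is a minimal sketch of the Savage-Dickey density ratio: the posterior density at the null value divided by the prior density at the same value. The sketch assumes a normally distributed prior centered on zero and approximates the posterior by a normal distribution fitted to the samples. Both the approximation and the prior standard deviation are assumptions made for illustration; the method section does not specify how the posterior density at zero was estimated.

```python
import math
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def savage_dickey_bf01(post_mean, post_sd, prior_sd,
                       prior_mean=0.0, null_value=0.0):
    """Bayes Factor for the null over the alternative.

    Ratio of posterior to prior density at `null_value`; values above 1
    favor the null. Uses a normal approximation to the posterior (assumption).
    """
    posterior_height = normal_pdf(null_value, post_mean, post_sd)
    prior_height = normal_pdf(null_value, prior_mean, prior_sd)
    return posterior_height / prior_height


def bf01_from_samples(samples, prior_sd):
    """Convenience wrapper: summarize posterior samples, then apply the ratio."""
    return savage_dickey_bf01(mean(samples), stdev(samples), prior_sd)
```

Intuitively, when the data concentrate the posterior around zero more tightly than the prior did, the density at zero rises and the ratio exceeds 1; when the posterior moves away from zero, the ratio falls below 1.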

#### EXPERIMENT 2

#### Participants

A group of 32 healthy older adults (59% female; mean age = 72.7, SD = 4.3) were recruited from the St. Louis community. Participants were given \$25 for their time and effort.

#### Stimuli and Procedure

The stimuli, procedure and analysis were identical to Experiment 1. Our trimming method eliminated 6.7% of the total RTs.

#### Results

#### z-Scored RTs

#### **ISPC analysis**

Condition means for the ISPC effect are shown in **Figure 4**. As expected, there was a significant Stroop effect (Mean effect = 0.847, HDI = 0.766:0.930) indicating responses were slower to incongruent relative to congruent items. Furthermore, the interference effect was larger for MC items relative to neutral (Mean effect = 0.267, HDI = 0.067:0.465) and smaller for MI items relative to neutral (Mean effect = −0.260, HDI = −0.457:−0.06), reflecting the ISPC effect.

#### **CSE analysis**

The CSE as a function of prior item type is shown in **Figure 5** and the individual cell means are shown in **Table 3**. Once again, we observed a significant interference effect (Mean effect = 0.817, HDI = 0.774:0.860) as well as a significant CSE (Mean effect = −0.149, HDI = −0.234:−0.062) indicating smaller interference effects following an incongruent stimulus. Critically, the CSE did not interact with the prior item type. Specifically, the HDI for the difference between prior MI and prior neutral trials was wide and included zero (Mean effect = −0.058, HDI = −0.268:0.149, Bayes Factor = 8.09) as was the HDI of the difference between prior MC and prior neutral items (Mean effect = −0.047, HDI = −0.256:0.159, Bayes Factor = 8.51).

TABLE 3 | Mean z-scored RTs (and HDIs) for each condition in the CSE analysis of Experiment 2.

#### Accuracy

#### **ISPC analysis**

In the analysis of accuracy rates, the Stroop effect was significant (Mean effect = −0.022, HDI = −0.028:−0.015). Interference was larger for the MC items relative to neutral (Mean effect = −0.020, HDI = −0.035:−0.004) but the MI and neutral items did not differ from one another (Mean effect = 0.005, HDI = −0.010:0.021).

#### **CSE analysis**

In the analysis of the CSE items, the Stroop effect was reliable (Mean effect = −0.017, HDI = −0.021:−0.013) but there was no CSE (Mean effect = 0.002, HDI = −0.01:0.006). Furthermore, the CSE following MC items did not differ from neutral items (Mean effect = −0.002, HDI = −0.02:0.018, Bayes Factor = 101.88) nor did the MI items differ from neutral (Mean effect = −0.004, HDI = −0.023:0.015, Bayes Factor = 93.03).

#### Discussion

We replicated our initial findings of additive effects of the CSE and ISPC in an older adult cohort. The ISPC itself was large and significant which suggests that control settings are being modulated on those trials. Furthermore, the CSE itself was also significant indicating responses are being adjusted based on the congruency of the prior trial regardless of whether it was an MI, MC or neutral item. Importantly, a simple ANOVA confirmed that the cross-experiment Age by CSE interaction was reliable, F(3,186) = 3.13, p = 0.03, indicating that older adults produced larger CSEs compared to younger adults, collapsed across ISPC conditions, replicating the recent Age × CSE interaction that was reported by Aschenbrenner and Balota (2017). Moreover, the present replication and extension of Experiment 1 to an older adult sample again suggests that the CSE and ISPC reflect distinct mechanisms.

Before reaching such a conclusion, there is one final possibility regarding these additive effects that remains to be evaluated. Specifically, although we motivated the current experiments under the notion that the ISPC reflects an adjustment in control processes (i.e., when an MI item is encountered control is rapidly increased), an important alternative account of the ISPC is one of associative stimulus-response learning. For example, if BLUE is most frequently presented in the color red (hence is a mostly incongruent item), participants can learn that when the stimulus is the word BLUE they should respond with "red" (Schmidt and Besner, 2008). Indeed, a number of studies have suggested that once this contingency bias is experimentally controlled for, ISPC effects disappear (Schmidt and Besner, 2008; Schmidt, 2013; Hazeltine and Mordkoff, 2014). Thus, under this scenario the ISPC may not be an issue of control but rather a reflection of associative learning, and therefore one may not expect to observe an interaction between the ISPC and CSE.

Of course, it is important to note that we included "neutral" items in our ISPC design, that is, items that were always 50% congruent. Therefore, if the MI or MC items invoked an associative learning mechanism, one would still have expected to obtain an interaction whereby the neutral items (which must be resolved via attentional control) interact with the CSE but not the biased items (which may reflect associative learning). This presents some initial evidence that associative learning processes may not be the entire story in the first two experiments. However, to further address this important concern, we conducted a final experiment in which we attempted to minimize the contribution of an associative learning mechanism. We did this by drawing on the Associations as Antagonists to Top-Down Control (AATC) hypothesis proposed by Bugg (2014). Specifically, Bugg argued that contingency biases typically produce the ISPC under most circumstances but when contingencies are accounted for, conflict adaptation processes then take over. For example, in an experiment in which associative learning processes would be expected to be quite strong (e.g., when MI items only occur in one other color, red always in BLUE), no evidence of conflict adaptation was observed (there was no list-wide proportion congruency effect). However, when associative learning was lessened by simply increasing the number of response options (e.g., when the word blue could occur in RED or GREEN), conflict adaptation was again observed. Thus, when reliable S-R associations can form (see blue respond RED), modulations of control are minimal whereas when the associations are not reliable (see blue respond either RED or GREEN) control adjustments are more likely to prevail. Therefore, as a final attempt to address the concern that associative learning processes are producing the ISPC in our studies, we followed Bugg (2014) by increasing the stimulus-response set such that each word is paired with two possible colors rather than just one.
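The logic of weakening stimulus-response contingencies by adding a second incongruent color can be illustrated with a short calculation of the probability of each response given the word. The sketch below uses hypothetical presentation frequencies chosen only to show the principle; they are not the frequencies used in the present experiments, and the function names are our own.

```python
from collections import Counter


def response_contingencies(pairs):
    """P(color | word) for every (word, color) pair in a trial list."""
    pairs = list(pairs)
    word_totals = Counter(word for word, _ in pairs)
    pair_counts = Counter(pairs)
    return {(word, color): count / word_totals[word]
            for (word, color), count in pair_counts.items()}


def strongest_contingency(pairs, word):
    """Probability of the single most likely response for a given word."""
    probs = response_contingencies(pairs)
    return max(p for (w, _), p in probs.items() if w == word)


# Hypothetical mostly incongruent item with ONE incongruent color:
# the word blue appears 8 times in RED and 4 times in BLUE.
one_color = [("blue", "RED")] * 8 + [("blue", "BLUE")] * 4

# Hypothetical mostly incongruent item with TWO incongruent colors:
# the same 8 incongruent trials are split between RED and GREEN.
two_colors = ([("blue", "RED")] * 4 + [("blue", "GREEN")] * 4
              + [("blue", "BLUE")] * 4)
```

With these illustrative frequencies the item remains 67% incongruent in both sets, but the most predictable response falls from a probability of 2/3 to 1/3, so the word becomes a far less reliable cue for any single response.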

## EXPERIMENT 3

## Participants

Sixty-six participants were recruited from the Psychology Department undergraduate research pool (67% female; mean age = 19.5, SD = 1.2). Our power analysis showed that this sample size gave 95% power for a meaningful (greater than three) Bayes factor in favor of the alternative hypothesis (assuming a moderate effect size) and 82% power to obtain a meaningful Bayes factor in favor of the null (assuming effect size of 0).

#### Stimuli

The stimuli were the same as those used in Experiments 1 and 2. However, as shown in **Table 4**, the frequency of presentation of each item was changed. Specifically, we eliminated the neutral items and now presented 3 MC items and 3 MI items which were counterbalanced and rotated across participants. In this way, we reduced the ability to rely on associative learning to resolve the interference on the biased items.

Critical items: 50% congruent, used to examine the CSE; MI items: mostly incongruent; MC items: mostly congruent.

#### Procedure

The procedure was very similar to Experiments 1 and 2 with the exception of the stimulus configurations detailed above and that only 528 trials were presented with 36 practice items and 7 pre-programmed breaks. These changes were implemented to reduce the length of the experiment. We increased our sample size to compensate for these lower trial counts and also to increase our overall power. The analysis and trimming procedures were otherwise identical to the previous two experiments and 9.2% of RTs were identified as outliers and removed prior to analysis.

## Results

#### z-Scored RTs

#### **ISPC analysis**

The condition means for the ISPC analysis are shown in **Figure 6**. There was a large and significant Stroop interference effect (Mean effect = 0.844, HDI = 0.800:0.888) which interacted with item type. Specifically, interference was larger for MC items relative to MI items (Mean effect = 0.404, HDI = 0.316:0.492). Thus, even though the associative learning confound was minimized in this design, we are still able to detect a large ISPC effect.

#### **CSE analysis**

The CSE means are displayed in **Figure 7** and the cell means are shown in **Table 5**. The interference effect was reliable (Mean effect = 0.872, HDI = 0.825:0.919) and interacted with prior trial congruency (Mean effect = 0.149, HDI = 0.056:0.242) reflecting the standard CSE. However, there was still no evidence of an interaction with the prior item type (Mean effect = −0.038, HDI = −0.221:0.150, Bayes Factor = 9.73).

TABLE 5 | Mean z-scored RTs (and HDIs) in each condition for the CSE analysis of Experiment 3.

#### Accuracy

#### **ISPC analysis**

For the ISPC items, the average Stroop effect was significant (Mean effect = −0.038, HDI = −0.048:−0.028) and this effect interacted with the prior item type (Mean effect = 0.031, HDI = 0.011:0.053) such that interference was larger for MC items relative to MI items, producing the ISPC.

#### **CSE analysis**

For the CSE items, the average Stroop effect was significant (Mean effect = −0.034, HDI = −0.044:−0.025) but the effect was not modulated by prior trial congruency (congruent vs. incongruent, Mean effect = −0.009, HDI = −0.028:0.011). Furthermore, the CSE did not interact with the prior item type (MC items vs. MI items, Mean effect = 0.013, HDI = −0.026:0.051, Bayes Factor = 41.25).

## Discussion

The results of Experiment 3 once again clearly demonstrated the presence of both a robust ISPC effect and a CSE but no hint of an interaction between these two factors. This replicates our prior experiments under conditions that minimize associative learning as a possible mechanism for the ISPC. Thus, the control settings engaged on Trial N-1 to produce the ISPC do not appear to differentially influence the interference effect on the subsequent trial (the CSE).

## GENERAL DISCUSSION

The primary aim of this work was to examine the relationship between two purported markers of dynamic adjustments in attentional control, the ISPC and the CSE. The main finding, replicated across three experiments, was that although there was both a robust ISPC and a CSE, these two manipulations did not interact. In other words, the CSE examined on Trial N was of a comparable magnitude regardless of the congruency bias of the stimulus on Trial N-1. Indeed, the Bayes Factor was quite large (∼9) in support of this null interaction, within each experiment. Additive factors logic therefore suggests that the mechanisms responsible for producing the change in interference reflected in the ISPC are not the same as the mechanisms producing the CSE, at least in the present experiments.

These results are consistent with a recent study that indirectly tested a similar idea. Specifically, Crump et al. (2018) used an attention capture paradigm that included an ISPC manipulation. In supplementary analyses, it was shown that sequential effects (i.e., the CSE) did not interact with the ISPC. We critically build on this work by (a) including a set of well-controlled, contingency minimized "critical" items on which to assess the CSE in order to avoid the various confounds that hinder analysis of the CSE (e.g., Duthoo et al., 2014a) and (b) using a standard, vocal-response Stroop task, the quintessential measure of attentional control, in which most studies have explored both CSE and ISPC effects.

As already mentioned, both the ISPC and the CSE have been thought to reflect rapid and dynamic adjustments in attentional control processes. To the extent that these manipulations influence the same mechanism, one would expect a design that manipulates both would produce an interaction. Specifically, consider a congruent, MI item. Typically, the MI manipulation would produce an increase in control, due to the frequency manipulation (i.e., the ISPC) but the item would also be expected to reduce control due to the fact that it is congruent (producing the CSE). A priori, one would expect the CSE to be canceled out or at least minimized in this scenario, producing a statistical interaction. The robust additive pattern between the ISPC and CSE obtained in the current series of experiments would appear to call into question any mechanistic explanation of the CSE that relies on singular dynamic adjustments in control processes. Indeed, these results seem to suggest that the CSE is not a control modulation phenomenon at all, but rather may result from a more general mechanism that induces trial by trial changes in the recruitment of the specific operations that are employed to achieve a given task based on recent experience. In other words, the specific operations that are engaged on Trial N (whatever they may be) are informed by which operations were employed on the prior trial.

This idea is embodied in the pathway priming account of Stroop performance noted earlier (Aschenbrenner and Balota, 2015). That is, the use of a particular pathway, either color or word processing, is primed for use depending on the extent to which that pathway could be used on the prior trial. When a congruent trial was just processed, the word reading pathway is relied upon to a greater extent on the following trial, since it was a useful pathway to facilitate processing. Of course, in the context of conflict tasks such as the Stroop task, "reliance" on a given pathway is also a reflection of control processes. That is, attentional control dictates the degree of activation that propagates along any given pathway. While we are suggesting that pathway priming is independent from control processes per se, consistent with the additive effects obtained in the present study, we acknowledge that the overlap in mechanisms makes totally disentangling these processes rather difficult. Therefore, the extent to which local cross-trial changes in the Stroop task match those from other domains (e.g., visual word recognition or short-term memory scanning) provides a useful avenue to understand general mechanisms of dynamic (due to previous trials) adjustment of stimulus response configurations to accomplish task goals.

As noted earlier, our interpretation of the CSE is consistent with an established literature on cross-trial effects in other cognitive domains. For example, it has been repeatedly shown that in the lexical decision task, the speed to identify a stimulus as a word or nonword depends on the perceptual and response characteristics of both the current and previous trial (Balota et al., 2013; Masson and Kliegl, 2013). Specifically, if two adjacent trials are perceptually degraded, RTs are faster compared to when the perceptual clarity changes across trials. Moreover, if the lexical status of the previous trial is the same as the current trial (e.g., two "nonword" targets in a row), there are large effects of response congruency. We have proposed that this finding reflects the system adjusting to prepare to process the same, salient characteristics across trials (Balota et al., 2018). Importantly, however, large manipulations of variables known to influence lexical processing (e.g., word frequency) on Trial N are not influenced by previous trial characteristics (degradation or lexicality), which is similar to the current experiments where the CSE on Trial N is not influenced by the ISPC on Trial N-1. Similar findings have been recently demonstrated in a diverse array of tasks including noun/verb judgments and short term memory scanning (Aschenbrenner et al., 2017) and speeded word naming (Zevin and Balota, 2000; Reynolds and Besner, 2005), suggesting that cross-trial influences reflect a rather general mechanism and are not tied to tasks that presumably tap attentional control, such as the Stroop task.

The present study has many strengths, including the replication of a theoretically important null effect across multiple experiments and samples; however, a few limitations are worth mentioning. First, we focused only on the influence of the immediately preceding trial. While it is fair to say this is the standard approach in the field, this approach does minimize the cumulative influence of multiple serial trials and may not accurately reflect the time course of control. For example, Jiménez and Méndez (2013) examined the CSE as a function of runs of 1, 2, or 3 sequential trials of the same congruency and they showed the congruency effect increased as the number of presented congruent trials increased but the effect decreased when numerous incongruent trials were presented. However, because the CSE is greatest from trial N-1 to trial N, the current study afforded the strongest test of a single trial dynamic adjustment in control. Second, we began these investigations under the assumption that the ISPC effect is due to modulations in attentional control that occur post-stimulus onset (Jacoby et al., 2003). However, such an interpretation is still under fierce debate in the literature (Bugg and Crump, 2012; Schmidt, 2018). As the contingencies of the items in our experiments still varied across the ISPC manipulations (even in Experiment 3), whether our results successfully precluded the contributions of S-R learning processes cannot be fully determined. At a minimum, however, these results can serve as a starting point for additional experimentation that can more cleanly separate these component processes.

## CONCLUSION

In summary, the ISPC and CSE were robustly additive across three distinct experiments. This pattern suggests that the CSE reflects an independent, response adjustment system and may not be related to adjustments in attentional control per se, at least as reflected by the ISPC effect. Hence, these results provide evidence of multiple distinct forms of response dynamics in the premier measure of attentional control, the Stroop task. The similarity of cross-trial effects in other standard cognitive tasks that do not demand high levels of control further questions the standard interpretation of the CSE as primarily reflecting dynamic changes of attentional control.

## REFERENCES


## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Institutional Review Board at Washington University in St. Louis with informed consent from all subjects. All subjects gave verbal informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Washington University in St. Louis Institutional Review Board.

## AUTHOR'S NOTE

The ability to rapidly direct attention to important aspects of the environment while ignoring distracting information is a critical cognitive skill. This paper investigates the relationship between different variables that are thought to engage these attentional processes. The results show there are multiple, independent factors that can aid or hinder the ability to focus attention on important, but possibly less salient, information in the face of distractors.

## AUTHOR CONTRIBUTIONS

AA collected and analyzed the data, and drafted the manuscript. DB edited the manuscript for intellectual content.

## FUNDING

This research was supported by an NIA Aging and Development Training Grant (T32 AG000030) awarded to DB and a dissertation award from the American Psychological Association awarded to AA. Portions of this research were presented at the Psychonomics Annual Meeting 2016.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00860/full#supplementary-material

Balota, D. A., Aschenbrenner, A. J., and Yap, M. J. (2013). Additive effects of word frequency and stimulus quality: the influence of trial history and data transformations. J. Exp. Psychol. Learn. Mem. Cogn. 39, 1563–1571. doi: 10.1037/a0032186



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Aschenbrenner and Balota. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Task Conflict and Task Control: A Mini-Review

*Ran Littman1 \*† , Eldad Keha1,2† and Eyal Kalanthroff1*

*1 The Clinical Neuropsychology Laboratory, Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel, 2 Department of Psychology, Achva Academic College, Arugot, Israel*

#### *Edited by:*

*Benjamin Andrew Parris, Bournemouth University, United Kingdom*

#### *Reviewed by:*

*Nabil Hasshim, University College Dublin, Ireland; Juan J. Ortells, University of Almería, Spain*

*\*Correspondence:* 

*Ran Littman ran.littman@mail.huji.ac.il*

*† These authors have contributed equally to this work*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 15 May 2019 Accepted: 25 June 2019 Published: 17 July 2019*

#### *Citation:*

*Littman R, Keha E and Kalanthroff E (2019) Task Conflict and Task Control: A Mini-Review. Front. Psychol. 10:1598. doi: 10.3389/fpsyg.2019.01598*

Stimulus-driven behaviors are triggered by the specific stimuli with which they are associated. For example, words elicit automatic reading behavior. When stimulus-driven behaviors are incongruent with one's current goals, task conflict can emerge, requiring the activation of a task control mechanism. The Stroop task induces task conflict by asking participants to focus on color naming and ignore the automatic, stimulus-driven, irrelevant word reading task. Thus, task conflict manifests in Stroop incongruent as well as in congruent trials. Previous studies demonstrated that when task control fails, reaction times in congruent trials slow down, leading to a reversed facilitation effect. In the present mini-review, we review the literature on the manifestation of task conflict and the recruitment of task control in the Stroop task and present the physiological and behavioral signatures of task control and task conflict. We then suggest that the notion of task conflict is strongly related to the concept of stimulus-driven behaviors and present examples for the manifestation of stimulus-driven task conflict in the Stroop task and additional tasks, including object-interference and affordances tasks. The reviewed literature supports the illustration of task conflict as a specific type of conflict, which is different from other conflict types and may manifest in different tasks and under diverse modalities of response.

Keywords: Stroop task, cognitive control, executive functions, task conflict, task control, stimulus-driven behavior

The concept of cognitive control refers to a set of abilities which allow for the effortful application and maintenance of goal-directed behaviors (Banich, 2009; Diamond, 2013). For several decades, the Stroop task has been serving as a principal tool for investigating cognitive control in the lab (MacLeod, 1991). In the present mini-review, we focus on a unique feature of cognitive control, *task control*, and its recruitment for the resolution of a specific type of conflict – *task conflict*. We first review the literature of Stroop task conflict, illustrate task conflict's physiological and behavioral signature and then move to describe task conflict in the context of stimulus-driven behaviors, refer to its manifestation in other tasks and under diverse modalities of response, and suggest that impaired task control may be related to certain pathological behaviors.

#### TASK CONFLICT IN THE STROOP TASK

In various situations, individuals must decide between two alternative task demands. Such circumstances often result in the emergence of task conflict. Task conflict has been studied mainly by using the Stroop task (Stroop, 1935) in which participants are instructed to name the ink-color of congruent (e.g., RED written in red), incongruent (e.g., RED written in blue), and non-word neutral (e.g., XXXX written in red) stimuli while ignoring the word's meaning (MacLeod, 1991). The typical Stroop reaction time (RT) data show a robust Stroop interference effect (incongruent RT > neutral RT) and a smaller and less robust Stroop facilitation effect (congruent RT < neutral RT). Goldfarb and Henik (2007) suggested that the Stroop task consists of two separate conflicts – an *information conflict* between the incongruent word and ink color, which manifests in incongruent trials because of the incongruency between task-relevant and task-irrelevant *information* (e.g., blue and red); and a *task conflict* between the relevant color-naming *task* and the irrelevant, stimulus-driven word-reading task, which manifests in incongruent as well as in congruent trials because words trigger an automatic tendency to read (also see Rogers and Monsell, 1995; MacLeod and MacDonald, 2000; Levin and Tzelgov, 2016b; Kalanthroff et al., 2018a). Thus, while Stroop incongruent trials consist of both information conflict and task conflict, Stroop congruent trials consist of task conflict and not information conflict. Accordingly, the RT difference between non-word neutrals (which serve as a conflict-free baseline of general performance) and congruent conditions commonly serves as a measure of task conflict (Goldfarb and Henik, 2007; Kalanthroff et al., 2018a).
Dissociation between the two conflicts was demonstrated by their diverse patterns of brain activation (Aarts et al., 2009; Desmet et al., 2011; Elchlepp et al., 2013) and their reflection in different components of an ex-Gaussian distribution (Steinhauser and Hübner, 2009; also see Aarts et al., 2009; Moutsopoulou and Waszak, 2012; Shahar and Meiran, 2015). These findings support the existence of task conflict as a specific type of conflict that is dissociated from other conflict types.
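The behavioral measures described in this section can be summarized in a few lines of arithmetic. The following sketch computes interference and facilitation against the non-word neutral baseline and flags a reversed facilitation effect (congruent trials slower than neutrals), which is taken as the behavioral marker of task conflict. The dictionary keys and function names are assumptions introduced for the illustration.

```python
def stroop_effects(mean_rt):
    """Interference and facilitation relative to the neutral baseline.

    `mean_rt` maps 'congruent', 'incongruent' and 'neutral' to mean RTs (ms).
    Positive interference: incongruent slower than neutral.
    Positive facilitation: congruent faster than neutral.
    """
    return {
        "interference": mean_rt["incongruent"] - mean_rt["neutral"],
        "facilitation": mean_rt["neutral"] - mean_rt["congruent"],
    }


def shows_reversed_facilitation(mean_rt):
    """True when congruent trials are slower than non-word neutrals,
    the pattern described in the text as a marker of task conflict."""
    return stroop_effects(mean_rt)["facilitation"] < 0
```

For instance, hypothetical means of 620 ms (congruent), 650 ms (neutral), and 740 ms (incongruent) yield 90 ms of interference and 30 ms of facilitation, whereas a congruent mean of 670 ms with the same neutral baseline would yield a reversed facilitation of 20 ms.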

#### PHYSIOLOGICAL SIGNATURE OF TASK CONFLICT AND TASK CONTROL

The resolution of task conflict is managed by the activation of a task control mechanism (Entel et al., 2015; Kalanthroff et al., 2018a; Schuch et al., 2019). Neuroimaging studies have shown that the anterior cingulate cortex (ACC), a brain area that is involved in conflict monitoring (Carter et al., 1998, 1999; Botvinick et al., 1999, 2004; Bush et al., 2000; Braver et al., 2001; Kerns et al., 2004), is more active not only when contrasting incongruent Stroop trials to non-word neutrals but also when contrasting congruent trials to non-word neutrals (Bench et al., 1993; Carter et al., 1995; Milham et al., 2002; Aarts et al., 2009).

Recent neuroimaging studies have provided evidence for the locus of task control in the brain. These studies have manipulated task conflict by using a word-arrow version of the Stroop task (Aarts et al., 2009) or by manipulating the proportion of congruent, incongruent, and neutral trials within Stroop blocks (Grandjean et al., 2012, 2013), a manipulation that reduces or enhances task control (see below). The data from these studies (Aarts et al., 2009; Grandjean et al., 2012, 2013) support the idea that task conflict results in activation of the ACC, the medial superior frontal gyrus (MFC), and ventral areas of the lateral prefrontal cortex (L-PFC). Subsequently, the resolution of task conflict is reflected by an involvement of the dorsal part of the L-PFC (DL-PFC), which marks the top-down monitoring processes of favoring the relevant task and the implementation of task demands (MacDonald et al., 2000; Egner and Hirsch, 2005; Carter and Van Veen, 2007; Brosnan and Wiegand, 2017). Additional findings marked the differences in brain activation in the face of task conflict and information conflict. While both conflicts activated the ACC and the MFC, information conflict was associated with activity in ventral L-PFC, whereas task conflict activated both ventral and dorsal regions (Aarts et al., 2009).

Other studies have employed Stroop tasks while scrutinizing changes in pupil dilation, which has been used as a measure of effort exertion and the employment of cognitive control (Kahneman and Beatty, 1966; for reviews see Beatty and Lucero-Wagoner, 2000; Laeng et al., 2012; Sirois and Brisson, 2014; van der Wel and van Steenbergen, 2018). These studies provided evidence for interference and facilitation effects, measured by pupil dilation (Brown et al., 1999; Siegle et al., 2004, 2008; Laeng et al., 2011; Hasshim and Parris, 2015). Recently, Hershman and Henik (in press) reported a dissociation between task conflict and information conflict by measures of pupil dilation. Specifically, participants' pupils became dilated when observing both congruent and incongruent trials in comparison to non-word neutrals at about 500 ms after the stimulus onset. A second dilation became evident only for incongruent trials, at about 900 ms after the stimulus onset. These data show that the emergence of task conflict (and the recruitment of task control) precedes the emergence of information conflict, and they support previous suggestions according to which the presentation of two task sets leads to the emergence of task conflict even before information regarding the identity of the stimulus dimensions begins to be computed (MacLeod and MacDonald, 2000; Monsell et al., 2001; Goldfarb and Henik, 2007; Steinhauser and Hübner, 2009; Braverman et al., 2014).

## BEHAVIORAL SIGNATURE OF TASK CONFLICT AND TASK CONTROL

The physiological evidence for the emergence of task conflict in Stroop congruent trials appears to stand in contradiction with behavioral findings, which indicate that responses to congruent trials are often faster than to neutral trials. It has been suggested (Goldfarb and Henik, 2007; Kalanthroff et al., 2018a) that in healthy adults, task control is highly efficient and leads to a rapid resolution of task conflict. Hence, task conflict is not behaviorally observable under standard conditions but can be seen under specific conditions, yielding Stroop reverse facilitation (RF; faster responses to neutral stimuli than to congruent stimuli), which serves as the behavioral signature of task conflict (Kalanthroff et al., 2018a). For example, to demonstrate Stroop RF, several studies have manipulated the proportion of congruent, incongruent, and neutral trials, creating blocks that consist of a majority or a minority of non-word neutrals, a manipulation that reduces or enhances task control, respectively, as participants mostly encounter non-conflictual or conflictual trials (Tzelgov et al., 1992; Goldfarb and Henik, 2007; Kalanthroff et al., 2013c; Entel et al., 2015; Shichel and Tzelgov, 2018). Other studies have presented a cue that indicated whether the following trial would be conflictual or not (Goldfarb and Henik, 2007), manipulated the length of the response-stimulus interval (RSI; Parris, 2014), or combined the Stroop task with additional measures of working memory (Kalanthroff et al., 2015), inhibitory control (Kalanthroff and Henik, 2013; Kalanthroff et al., 2013b), and task switching (Kalanthroff and Henik, 2014). The accumulating evidence from these studies shows that, when task control is overloaded, or, alternatively, when task control is reduced and "put to sleep," Stroop RF, the behavioral marker of task conflict, becomes evident (however see Augustinova et al., 2018, for different results when using an RSI procedure). 
Recently, Kalanthroff et al. (2018a) have presented a computational model of the Stroop task, the proactive control/task conflict (PC-TC) model, which illustrates the resolution of task conflict and its modulation by task control (**Figure 1**). This model extends a previous model of the Stroop task (Botvinick et al., 2001) by accounting for the effects of task conflict and predicting RF. Behavioral evidence of task conflict was also demonstrated in task-switching paradigms (Braverman and Meiran, 2010; Schneider, 2015; Bugg and Braver, 2016), where a cue indicates which of two pre-determined tasks the participant needs to execute during a given trial. Unlike the Stroop task, in task-switching paradigms both tasks are relevant to some extent and the controlled process of favoring the relevant task cannot be prepared in advance.

The evidence discussed above illustrates task control as a specific type of cognitive control mechanism, which is recruited to resolve a specific type of conflict, task conflict. In the following section, we suggest that the emergence of task conflict and the recruitment of task control are strongly related to the concept of stimulus-driven behaviors.

## TASK CONFLICT IN THE CONTEXT OF STIMULUS-DRIVEN BEHAVIORS

Stimulus-driven behaviors are triggered by the specific stimuli with which they are associated (Monsell, 2003; Waszak et al., 2003; Koch and Allport, 2006; Reuss et al., 2011; Ganor-Moscovitz et al., 2018; Hochman et al., 2018). This concept has been widely investigated outside the scope of the task-control framework, and it echoes the findings of instrumental conditioning in animal studies: After an association between a stimulus and an action was established, animals were shown to keep responding to the stimulus even when it no longer predicted a reward and demonstrated spontaneous recovery of the stimulus-response (S-R) association even after undergoing extinction (Graham and Gagné, 1940; Guttman, 1953; Skinner, 1953; Rescorla, 1993; Bouton, 2004). In humans, several studies have demonstrated the automatic triggering of response activation processes when facing stimuli which were associated with certain responses, even when these responses were not eventually executed (Osman et al., 1992; De Jong et al., 1994; Eimer, 1995; Valle-Inclán, 1996; Gibbons and Stahl, 2008; also see Rothermund et al., 2005).

FIGURE 1 | Architecture of the proactive control/task conflict (PC-TC) model of the Stroop task. From Kalanthroff et al., 2018a, p. 2. Copyright 2018 by American Psychological Association. Reprinted with permission from American Psychological Association. In this model, task control is considered a proactive, effortful process that deploys control in advance of the stimulus for the resolution of conflict (De Pisapia and Braver, 2006; Braver et al., 2007; Barch and Ceaser, 2012; Braver, 2012). Pointy-headed arrows represent excitatory connections, whereas the round-headed arrows represent inhibitory connections. A stimulus activates its color and lexical representations in the input (features) layers. The activations from the input layers propagate to the response layer and to the task demand layer, which feeds back to the input layers. Congruent and incongruent color words, but not (non-word) neutral stimuli, activate both task demand units, which leads to task conflict. This task conflict inhibits the response layer, thereby slowing down responses to color words and resulting in a Stroop reverse facilitation effect. When proactive control is high, attention is sufficiently biased in a top-down manner to the color-naming task demand unit, thus preventing (or rapidly resolving) task conflict and resulting in a Stroop facilitation effect. However, manipulations that reduce proactive control lead to a stronger capture of attention by the irrelevant task dimension (word meaning), resulting in a reverse facilitation effect. This process takes place in both congruent and incongruent trials. In incongruent trials, an additional information conflict takes place when both input layers provide contradictory information (e.g., blue in the color features and green in the lexical features), leading to the activation of the two (mutually inhibitory) response units in the response layer, which slows down reaction time and results in a (robust) Stroop interference effect.

The concept of S-R binding is relevant to the processes taking place in the Stroop task (Mordkoff, 1996; Schmidt et al., 2007; Schmidt and Besner, 2008), where words elicit automatic reading behavior, even without an explicit intention to read (MacLeod and MacDonald, 2000; Monsell et al., 2001; Perlman and Tzelgov, 2006; Augustinova and Ferrand, 2014). Consequently, when the stimulus-driven reading behavior is incongruent with one's current goals, task conflict between stimulus-driven and goal-directed behaviors emerges, requiring the activation of a task control mechanism for the resolution of conflict (Kalanthroff et al., 2018a). Hence, in both congruent and incongruent Stroop conditions, stimulus-driven task-irrelevant word reading is incongruent with the relevant task of color naming, leading to the emergence of task conflict. Importantly, interference due to task conflict can manifest as long as the stimulus can be read, regardless of whether it is color related or not (Levin and Tzelgov, 2014, 2016a). Hence, non-color word neutrals (e.g., CHAIR in red) and pseudo words (e.g., HIX) also trigger the stimulus-driven reading behavior and result in the emergence of task conflict (Monsell et al., 2001; Goldfarb and Henik, 2007; Kinoshita et al., 2017; Kalanthroff et al., 2018a). The following examples illustrate the manifestation of stimulus-driven task conflict in different tasks and under diverse modalities of response in addition to the Stroop task.

Following the notion that form-based object-naming and classification is habitual and automatic in children (Kagan and Lemkin, 1961; Siegel and Vance, 1970; Bloom, 2002; Diesendruck and Bloom, 2003), Prevor and Diamond (2005) have used a color-object Stroop task, asking young children to name the colors of abstract shapes and familiar objects, which were presented in their congruent (e.g., a yellow banana), incongruent (e.g., a blue banana), or neutral (e.g., a purple scissors) colors. Because of their stimulus-driven tendency to name the objects, children were slower and less accurate in naming the color of namable objects in comparison to abstract forms, even when the objects appeared in their congruent colors. In a series of studies, La Heij and colleagues have replicated and elaborated these findings (La Heij et al., 2010; La Heij and Boelens, 2011, 2013; also see Starreveld and La Heij, 2017). Specifically, the authors demonstrated that the "object-interference effect" manifests due to the competition between the task set of color naming and the children's stimulus-driven prepotent tendency to name the object and not by other types of conflicts, such as lexical-based response conflict (La Heij et al., 2010; La Heij and Boelens, 2011). These findings implicate a stimulus-driven task conflict, which resembles the task conflict taking place in the Stroop task, manifesting in children who are unable to read.

Recently, we have investigated the emergence of task conflict in an affordance task. According to Gibson's (1979) theory of affordances, a common manipulatable object may trigger a response that has acquired a strong association with it (Rogers and Monsell, 1995; Allport and Wylie, 2000). Thus, simply viewing a manipulatable object triggers automatic and specific motor plans for interacting with it, even in the absence of an explicit intention for interaction (Vainio et al., 2008; Makris et al., 2013), as is evident from the automatic activation of the pre-motor cortex (Martin et al., 1996; Creem-Regehr and Lee, 2005; Beauchamp and Martin, 2007; Proverbio et al., 2011, 2013; Righi et al., 2014). In affordance tasks, participants are asked to classify objects (e.g., natural vs. manufactured) by responding with their left or right hand. The objects are presented so as to trigger an automatic grabbing response in one hand (e.g., a cup with the handle turning rightwards), and the participants must suppress their automatic tendency to grab the object by its extended handle. Participants typically respond faster and more accurately when the relevant response (classifying the object) and the automatic, task-irrelevant response (grabbing the object) result in the activation of the same hand rather than different hands (Tucker and Ellis, 1998, 2004; Ellis and Tucker, 2000; Phillips and Ward, 2002; Tipper et al., 2006; Vainio et al., 2007; Pellicano et al., 2010). Recent data from our lab show that the resolution of task conflict in the Stroop task strongly predicted the resolution of conflict at the task level of the affordance task (grab the object vs. classify the object), but not at its response level (responding with the right hand vs. left hand; Littman & Kalanthroff, manuscript in preparation). These findings link the emergence of stimulus-driven task conflict in both tasks, indicating the operation of a shared task control mechanism. 
As the Stroop task is based on linguistic skills and the affordance task calls for the activation of visuomotor abilities, these findings also illustrate the emergence of task conflict (and the recruitment of task control) in different tasks and under diverse modalities of response.

Recently, the conceptualization of task conflict as the result of stimulus-driven behaviors has proven to be an efficient framework for the understanding of several pathologies (Kalanthroff et al., 2018a). For example, it has been proposed that compulsivity in obsessive-compulsive disorder (OCD) may be strongly connected to excessive stimulus-response habit formation, impairing patients' capability of following elaborated environmental models in a manner that supports goal-directed behavior (Robbins et al., 2012; Kalanthroff et al., 2013a, 2018b; Gillan et al., 2014, 2015). In line with the task conflict framework, failure to suppress irrelevant stimulus-driven behaviors as a result of reduced task control functioning was suggested to be a pathological trait that also constitutes a core characteristic of the inability to suppress compulsive behaviors (Kalanthroff et al., 2017, 2018b). Following this line of study, interventions for the amelioration of task control abilities may prove useful for the enhancement of OCD patients' capability to suppress their urges to engage in compulsive behaviors.

## CONCLUSION

In the present work, we have reviewed the literature on task conflict, which manifests when several contradictory task sets are activated simultaneously. The accumulating evidence helps portray task conflict as a unique feature of cognitive control, which is distinct from other conflict types and results in specific neuronal and behavioral signatures. Task conflict has been shown to manifest in the Stroop task and in additional tasks including task switching, object interference, and affordance tasks, and to be strongly related to the concept of stimulus-driven behaviors.

One final note should be mentioned. Despite the ample evidence for the manifestation of different conflict types in the Stroop and Stroop-like tasks (Kornblum, 1992, 1994; Kornblum and Lee, 1995), some researchers who are interested in Stroop interference seem to neglect that it goes beyond response competition or ignore the (non-word) neutral condition and use the RT difference between congruent and incongruent conditions as a sole measure. These practices may lead to overlooking some important aspects of cognitive control and result in misinterpretations of certain results (Augustinova et al., 2018; Hershman and Henik, in press). To avoid such errors, the contribution of task conflict to the general Stroop conflict should be regularly considered.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### FUNDING

The authors are supported by the Israel Science Foundation (grant no. 31/3431) and the National Institute for Psychobiology, Israel (21517-18b).

#### ACKNOWLEDGMENTS

We thank Hadar Naftalovich for her useful input on this article.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Littman, Keha and Kalanthroff. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## No Negative Priming Effect in the Manual Stroop Task

Luke Mills<sup>1</sup>\*, Sachiko Kinoshita<sup>1,2</sup> and Dennis Norris<sup>3</sup>

<sup>1</sup> Department of Cognitive Science, Macquarie University, Sydney, NSW, Australia, <sup>2</sup> Department of Psychology, Macquarie University, Sydney, NSW, Australia, <sup>3</sup> MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom

The negative priming effect is an increase in interference when the response to the target on the current trial corresponds to the response to the distractor word on a preceding trial. Contrary to the commonly held belief that the negative priming effect is ubiquitous in the Stroop task, in the original study by Neill (1977), negative priming was found only in the oral, and not the manual Stroop task. The present paper makes three empirical observations. First, we replicate the discrepancy in the finding of the negative priming effect in the oral versus manual Stroop tasks tested under identical conditions, where response mode could be the only causal factor. Second, we point out that previous manual Stroop experiments reporting the negative priming effect confounded the effect of response repetition. Third, we report the analysis of the negative priming effect at the level of the whole RT distribution, which revealed that the effect was absent throughout the RT distribution in the manual task, and it was of constant size across the RT distribution in the oral task. Implications of the results for conflict control in the Stroop task are discussed.

#### Edited by:

Maria Augustinova, Université de Rouen, France

#### Reviewed by:

Derek Besner, University of Waterloo, Canada

Hagit Magen, The Hebrew University of Jerusalem, Israel

#### \*Correspondence:

Luke Mills luke.mills@hdr.mq.edu.au

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 15 May 2019 Accepted: 15 July 2019 Published: 02 August 2019

#### Citation:

Mills L, Kinoshita S and Norris D (2019) No Negative Priming Effect in the Manual Stroop Task. Front. Psychol. 10:1764. doi: 10.3389/fpsyg.2019.01764

Keywords: negative priming effect, Stroop task, response mode, RT distribution analysis, conflict control

## INTRODUCTION

To stay on task while ignoring prepotent conflicting distractors is important in everyday life. A major research tool used to investigate this conflict control process is the Stroop task (Stroop, 1935), in which the participant is presented with a word in color and instructed to name the color, ignoring the word. The finding of an interference effect when the word is incongruent with the response color (e.g., the word GREEN presented in red) relative to a neutral non-readable stimulus (e.g., a row of #s) is highly robust, and is taken as evidence that the word was read, despite the instruction to ignore the word. As noted by Besner (2001), the Stroop interference effect is therefore widely regarded as demonstrating the automaticity of word reading; at the same time, however, the size of Stroop interference effect can be modulated, which is taken to indicate attentional control.

In a recent review making a case for the automaticity of reading in the Stroop task, Augustinova and Ferrand (2014) wrote that "if any intervention is found to indisputably prevent or control word reading, then this finding should be mirrored in complementary analyses, such as those involving negative priming (i.e., an additional indicator of the fact that the word dimension of a Stroop word has been read)" (p. 347). The negative priming effect is the slowdown in response to a stimulus that had to be ignored previously. The effect is well-established in a picture naming paradigm involving two overlapping line drawings presented in different colors (e.g., picture of a sparrow in green superimposed on a picture of a rabbit in red) one of which (e.g., red) designates the to-be-named item ("rabbit" in the present example) (Tipper, 1985). Compared to a control condition in which the preceding trial contains two items that are unrelated to the two pictures in the current trial (e.g., the preceding trial contains a picture of a car in green superimposed on a picture of a tree in red), naming is slowed down when the to-be-named picture in the current trial was the to-be-ignored picture in the preceding trial (e.g., the preceding trial contained a picture of rabbit in green). This effect was originally explained in terms of distractor inhibition in the service of conflict control: "one means by which a response can be directed toward a target stimulus in the presence of a distractor that competes for the control of action, is for inhibition mechanisms to suppress the activation levels of the distractor's internal representations" (Tipper, 2001, p. 322). While other accounts that do not assume inhibition of the distractor representation have been proposed (see reviews by e.g., Tipper, 2001; Mayr and Buchner, 2007; and also Tipper and Cranston, 1985), in the present context, what is relevant is that the negative priming effect is assumed to be an index of a mechanism of conflict control.

It is widely believed that the negative priming effect is present in the Stroop task. In MacLeod's (1991) comprehensive review of the Stroop literature, the negative priming effect is listed as one of the "Eighteen Major Empirical Results That Must be Explained by Any Successful Account of the Stroop Effect." In a more recent review of the Stroop phenomena extending the reach to brain imaging data (MacLeod and MacDonald, 2000) negative priming is described as a well-established phenomenon in the Stroop task. It was a surprise to us, therefore, to read in the classic paper that established the negative priming effect in the Stroop task that the finding was limited to certain task conditions, and it is instructive to describe this study in detail.

Neill (1977) was the first to report finding a negative priming effect in the Stroop task, using the now standard, discrete trial version of the task. Although Dalrymple-Alford and Budayr (1966) had reported the effect earlier using a list version of the (oral) Stroop task, Neill (1977) noted that this finding may have been due to "a tendency to look ahead to the subsequent item while trying to respond to the current one" (p. 445). In Experiment 1, Neill used an oral (color naming) Stroop task, with all trials being incongruent<sup>1</sup> (with four response colors – red, green, blue, and yellow – thus comprising twelve color-word combinations). Eight participants were presented with 1,000 trials (20 blocks of 50 trials), in one 1-hour session.

Neill (1977) classified the trials into seven categories on the basis of whether the current target color or distractor matched the distractor or color on the preceding trial (see **Table 1** – although Neill (1977) did not use these labels). There were two critical categories: (1) the NONE condition, where there is no overlap between the distractor or target color on the current trial and the preceding trial, e.g., the word YELLOW presented in blue followed by RED presented in green; (2) the WORD-COLOR condition where the distractor word on the preceding trial is the target color on the current trial e.g., the word YELLOW in blue followed by RED in yellow. Neill (1977) defined the negative priming effect as the difference between NONE, which he referred to as the "unrelated" condition, and WORD-COLOR, which he referred to as the "related" condition. (This definition is also standard in the negative priming experiments.) In his Experiment 1, these two conditions yielded mean RTs of 823 ms and 855 ms, respectively (i.e., a 32 ms negative priming effect), a significant difference, with all eight participants showing the effect in the same direction.
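
The classification just described can be sketched in Python. This is an illustrative reconstruction and not the analysis code of Neill (1977) or of the present study. The function names, the tuple layout, and the example RTs are hypothetical, and the reading of COLOR-WORD as "the preceding response color becomes the current distractor" is our assumption by analogy with the WORD-COLOR definition.

```python
from collections import Counter
from itertools import permutations
from statistics import mean

COLORS = ("red", "green", "blue", "yellow")


def classify(prev, cur):
    """Classify the current trial relative to the preceding one.

    Each trial is a (distractor_word, response_color) pair of lowercase
    color names; both trials are assumed to be incongruent.
    """
    prev_word, prev_color = prev
    cur_word, cur_color = cur
    word_color = cur_color == prev_word    # previous distractor is now the target
    color_word = cur_word == prev_color    # previous target is now the distractor (assumed label)
    word_word = cur_word == prev_word      # distractor repeats
    color_color = cur_color == prev_color  # response color repeats
    if word_word and color_color:
        return "COLOR-COLOR-WORD-WORD"     # complete repetition
    if word_color and color_word:
        return "COLOR-WORD-WORD-COLOR"     # complete reversal
    if word_color:
        return "WORD-COLOR"
    if color_word:
        return "COLOR-WORD"
    if word_word:
        return "WORD-WORD"
    if color_color:
        return "COLOR-COLOR"
    return "NONE"


def negative_priming_effect(trials):
    """Mean RT on WORD-COLOR trials minus mean RT on NONE trials.

    `trials` is a list of (distractor_word, response_color, rt_ms) tuples in
    presentation order; the first trial has no predecessor and is skipped.
    """
    rts = {"WORD-COLOR": [], "NONE": []}
    for prev, cur in zip(trials, trials[1:]):
        label = classify(prev[:2], cur[:2])
        if label in rts:
            rts[label].append(cur[2])
    return mean(rts["WORD-COLOR"]) - mean(rts["NONE"])


# All twelve incongruent word/color combinations of four colors.
INCONGRUENT = list(permutations(COLORS, 2))

# Category frequencies after YELLOW in blue, if the next trial is drawn at random.
counts = Counter(classify(("yellow", "blue"), cur) for cur in INCONGRUENT)

# Hypothetical three-trial sequence with made-up RTs (ms).
sequence = [("yellow", "blue", 700), ("red", "yellow", 760), ("green", "blue", 720)]
effect = negative_priming_effect(sequence)
```

Enumerating the twelve possible successors of a given trial yields two instances of each of the five unambiguous categories and one instance each of complete repetition and complete reversal.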

In Neill's (1977) Experiment 2, participants responded manually. Six participants were tested over 6 days, with each day containing 20 blocks of 100 trials. On Days 1 and 6, congruent (e.g., the word RED presented in red) and neutral (four 0s presented in color) conditions were included to test if the standard Stroop congruence effects are found (they were: On Day 1, the incongruent, congruent, and neutral conditions yielded mean RT of 727, 665, and 670 ms, respectively; on Day 6, 572, 552, and 557 ms, respectively). In addition, on Days 2–5, the critical 10 color-response blocks alternated with 10 blocks in which participants were instructed to respond to the word. Unlike Experiment 1, this manual Stroop experiment (based on the data from the critical color-response blocks) did not show a negative priming effect: The related (WORD-COLOR) condition was in fact faster than the "unrelated" (NONE) condition, 706 ms and 715 ms, respectively.

In a later study, Neill and Westberry (1987) investigated the reason(s) for the discrepancy between the two experiments. They proposed that a likely explanation for the contradictory results of Neill (1977) lies not in the response mode (oral vs. manual), but in the different demands for speed versus accuracy in the two experiments. To test the latter, Neill and Westberry (1987, Experiment 1) manipulated the instructional emphasis on speed vs. accuracy in a manual Stroop task. The negative priming effect was found when accuracy was emphasized but not when speed was emphasized, which led the authors to conclude that the emphasis on accuracy may have encouraged the use of inhibitory processes. Two points may be noted about this experiment, however. One is that in the "accuracy emphasis" condition under which the negative priming effect was found, the RTs were unusually slow (well over 800 ms). The fact that the negative priming effect was absent under the "speed emphasis" condition where the RTs were more representative of manual Stroop experiments could mean that the negative priming effect is generally absent in the manual Stroop task conducted under typical experimental conditions. A second point is that this experiment did not compare the oral and manual Stroop tasks and hence the possibility that response mode is a factor responsible for the discrepancy has not been ruled out.

TABLE 1 | Seven categories of trial type with examples based on the relationship between the distractor and response color in the preceding trials and the current trial.

The color name in CAPS denotes the distractor and the color name in lowercase denotes the response color (e.g., YELLOWblue denotes the word YELLOW presented in blue). The two last categories are ambiguous: COLOR-COLOR-WORD-WORD is a complete repetition and may be considered to be an instance of COLOR-COLOR or WORD-WORD; COLOR-WORD-WORD-COLOR is a complete reversal and may be considered to be an instance of WORD-COLOR or COLOR-WORD. Each of the five categories other than the last two ambiguous categories is expected to occur on approximately one sixth of the trials by chance.

<sup>1</sup> It is worth noting that in the picture naming task standardly used to investigate the negative priming effect, the overlapping pictures are always different hence the trials are all incongruent.

In summary, in the classic study oft-cited as the first report of the negative priming effect, contrary to the popular belief that the negative priming effect is ubiquitous, the original Neill (1977) study did not find an inhibitory negative priming effect in the manual Stroop task. In a later study employing the manual Stroop task, Neill and Westberry (1987) found that negative priming was only present in the manual task when accuracy was emphasized. A recent study using the manual Stroop task (Hazeltine and Mordkoff, 2014) also found no negative priming effect, finding little difference between the NONE condition (684 ms) and the WORD-COLOR condition (690 ms).

In contrast to these null findings, other studies used the manual Stroop task and reported finding a sizable negative priming effect: Besner (2001) reported a 52 ms negative priming effect; Raz and Campbell (2011) reported finding a 20 ms effect (see also Juvina and Taatgen, 2009).<sup>2</sup> However, negative priming was calculated differently in these studies than in the Neill and Westberry (1987) study. In particular, negative priming was not measured in terms of the difference between the WORD-COLOR condition and the NONE condition, but instead was referenced against a wider range of conditions. We will return to these studies in the Discussion. In the present study, our aim was twofold. The first aim was to see if Neill's (1977) original finding of an inhibitory negative priming effect in the oral Stroop task, but not the manual Stroop task, can be replicated, and the second was to analyze the negative priming effect at the level of whole RT distributions. There were two reasons for conducting this replication study. First, Neill (1977) tested a small number of highly trained participants (8 participants over 1,000 trials in the oral Stroop task and 6 participants in 2,000 trials × 6 sessions = 12,000 trials in the manual Stroop task). It is not known whether the findings can be replicated under more typical experimental conditions. Second, in addition to response mode, Neill's (1977) two experiments differed in other important ways: The manual experiment was conducted over 6 days, bookended by blocks containing congruent and neutral trials, and further, the color-response blocks alternated with word-reading blocks. Here, the oral and manual Stroop tasks were tested under identical conditions containing incongruent trials only; hence, were the patterns of negative priming effects to differ between experiments, response mode would have to be the only causal factor. 
Such a result would have important implications for interpreting the data obtained with the manual Stroop task: If the negative priming effect is absent where it is expected, the effect cannot serve as an index of a mechanism of conflict control.

#### RT Distribution Analysis

The second aim of our study was to analyze the negative priming effect at the level of whole RT distribution. Previous studies examining the negative priming effect in the Stroop task have analyzed only the mean RT. As pointed out by Balota and Yap (2011), an analysis of RT distributions can provide richer information than the analysis of mean RT, because the distribution of RTs in speeded response tasks is almost always positively skewed, and hence the effect of manipulation may not be captured accurately by the mean. RT distribution analysis could also provide insights into the cognitive mechanism underlying the effect.

The method of RT distribution analysis used in the present study is quantile analysis. Quantile analysis is a non-parametric method of RT distribution analysis that involves rank ordering the RTs for each participant in each condition from fastest to slowest and then dividing them into equal-sized bins (e.g., the first bin contains the fastest 25% of RTs, the second bin contains the next fastest 25% of RTs, and so on). The quantiles for each subject in each condition are estimated by taking the mean of the fastest trial of the slower bin and the slowest trial of the faster bin. The quantile estimates are then averaged across subjects in each condition to form the quantile estimates for each condition. The quantile estimates for each condition can then be depicted graphically using a quantile plot, and the size of the experimental effect as a function of quantiles can be depicted using a delta plot.
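The quantile procedure described above can be sketched in a few lines. This is a minimal illustration only, not the QMPE implementation used in the study: the function names and the RT values are invented, and the sketch returns just the internal bin boundaries (three for four bins), which may differ from how a dedicated package defines its estimates.

```python
from statistics import mean


def quantile_estimates(rts, n_bins=4):
    """Boundary-based quantile estimates for one participant in one condition."""
    ordered = sorted(rts)  # rank order from fastest to slowest
    if len(ordered) % n_bins != 0:
        raise ValueError("sketch assumes the number of RTs divides evenly into bins")
    size = len(ordered) // n_bins
    # Each estimate averages the slowest RT of the faster bin and the
    # fastest RT of the adjacent slower bin.
    return [(ordered[k * size - 1] + ordered[k * size]) / 2 for k in range(1, n_bins)]


def condition_quantiles(rts_by_subject, n_bins=4):
    """Average each quantile estimate across participants."""
    per_subject = [quantile_estimates(rts, n_bins) for rts in rts_by_subject]
    return [mean(column) for column in zip(*per_subject)]


def delta_values(baseline_by_subject, treatment_by_subject, n_bins=4):
    """Effect (treatment minus baseline) at each quantile: the delta plot's y-values."""
    baseline = condition_quantiles(baseline_by_subject, n_bins)
    treatment = condition_quantiles(treatment_by_subject, n_bins)
    return [t - b for b, t in zip(baseline, treatment)]


# Invented RTs (ms) for two participants; the second condition adds a constant 30 ms.
none_rts = [
    [500, 520, 540, 560, 580, 600, 640, 700],
    [450, 470, 500, 510, 530, 560, 600, 650],
]
word_color_rts = [[rt + 30 for rt in subject] for subject in none_rts]

print(condition_quantiles(none_rts))           # [507.5, 545.0, 600.0]
print(delta_values(none_rts, word_color_rts))  # flat delta: 30 ms at every quantile
```

Because the invented second condition is a pure 30 ms shift, the delta values are identical at every quantile, which is what a flat delta slope looks like numerically.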

Conflict tasks, such as the Stroop task, have been found to produce three general delta plot patterns (Pratte et al., 2010): In one pattern, the delta slope shows a positive increase across the quantiles, indicating that the size of the effect increases as responses slow. In another pattern, the delta slope is flat, indicating that the size of the effect remains constant across the quantiles. In the third pattern, the delta slope is negative, indicating the size of the effect decreases as responses slow.<sup>3</sup>

<sup>2</sup> Juvina and Taatgen (2009) used an unusual response procedure. In their study, there were four response colors, but only two response keys. On each trial, two response options indicating color names (presented in black), one of which corresponded to the correct response color, were shown, one on the left and one on the right of the screen. This response procedure differs substantially from the standard response procedure in the manual Stroop task, in which each response color is assigned a key, and the key assignment remains constant throughout the experiment. As pointed out by Juvina and Taatgen themselves, their response procedure places a greater requirement to read the word than the standard procedure, and it is unclear whether the sizable negative priming effect (45 ms) they reported was due to the non-standard response procedure or to the way the negative priming effect was calculated (see section "General Discussion"). This study will therefore not be discussed further.

<sup>3</sup>The negative delta slope pattern is unusual in conflict tasks, and has been found only with some versions of the Simon and Flanker tasks. Readers are referred to De Jong et al. (1994), Pratte et al. (2010) and Burle et al. (2014) for details.

Pratte et al. (2010) proposed that the positive and flat delta slope patterns are concordant with evidence accumulation models, such as the diffusion model (Ratcliff, 1978), which view decision making in speeded tasks as a process of accumulating evidence from the stimulus until enough evidence has been accumulated for response selection. In this framework, a positive delta slope is concordant with the manipulation affecting the rate of evidence accumulation ("drift rate" in the diffusion model), while a flat slope is concordant with a change in decision threshold or "non-decision time" (which subsumes the encoding of stimulus before the evidence accumulation process begins, and the preparation of motor response). It is well-established that both the classic Stroop interference effect as indexed by the difference between the incongruent condition (e.g., GREEN presented in red) and the neutral condition (e.g., a row of #s presented in red) and the Stroop congruence effect as indexed by the difference between the incongruent condition and the congruent condition (e.g., GREEN presented in green) increase as responses slow, i.e., they show a positive delta slope (e.g., Steinhauser and Hübner, 2009; Pratte et al., 2010), and this is the case for both the oral and manual Stroop task (e.g., Kinoshita et al., 2017). This positive delta slope may be interpreted as reflecting that the evidence needed for response selection is accumulated from the word distractor at the same time as the color target, and the two are integrated during the evidence accumulation process. On the assumption that the classic Stroop interference effect (and the Stroop congruence effect) and the negative priming effect have the same origin, in inhibitory control, it would be expected that the negative priming effect also shows a positive delta slope (i.e., an increase in the effect as responses slow).
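The contrast between a drift-rate change and a non-decision-time shift can be illustrated with a small simulation. This is a deliberately simplified sketch, not the Ratcliff (1978) diffusion model: it assumes a single response boundary, and all names and parameter values (drift rates of 3 and 2, a threshold of 1, non-decision times of 0.30 and 0.35 s) are arbitrary choices made for illustration.

```python
import random


def first_passage_time(drift, threshold, rng, dt=0.002, noise=1.0):
    """Time (s) for a noisy accumulator starting at 0 to first reach `threshold`."""
    evidence, elapsed = 0.0, 0.0
    step_sd = noise * dt ** 0.5
    while evidence < threshold:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
    return elapsed


def simulate_rts(n, drift, threshold=1.0, non_decision=0.30, seed=1):
    """Simulated RTs: non-decision time plus the accumulator's first-passage time."""
    rng = random.Random(seed)
    return [non_decision + first_passage_time(drift, threshold, rng) for _ in range(n)]


def quartile_boundaries(rts):
    """Boundary-based estimates between four equal-sized bins."""
    ordered = sorted(rts)
    size = len(ordered) // 4
    return [(ordered[k * size - 1] + ordered[k * size]) / 2 for k in (1, 2, 3)]


baseline = quartile_boundaries(simulate_rts(1000, drift=3.0))
# Manipulation A: slower evidence accumulation (lower drift rate), independent sample.
slow_drift = quartile_boundaries(simulate_rts(1000, drift=2.0, seed=2))
# Manipulation B: same decision process (same seed), 50 ms longer non-decision time.
shifted = quartile_boundaries(simulate_rts(1000, drift=3.0, non_decision=0.35))

drift_delta = [s - b for b, s in zip(baseline, slow_drift)]
shift_delta = [s - b for b, s in zip(baseline, shifted)]
print("drift-rate effect by quantile:", [round(d, 3) for d in drift_delta])
print("non-decision effect by quantile:", [round(d, 3) for d in shift_delta])
```

Under these assumptions the drift-rate effect grows from the fastest to the slowest quantile (a positive delta slope), whereas the non-decision-time effect is the same 50 ms at every quantile (a flat delta slope). The sketch does not attempt to reproduce the threshold case.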

#### EXPERIMENT 1 (ORAL)

#### METHOD

#### Participants

Twenty students from Macquarie University participated in the experiment in return for course credit. Both experiments reported here were approved by the Macquarie University Human Research Ethics Committee.

#### Design

Experiment 1 used an oral Stroop color naming task. The dependent variables were color response latency and error rate, examined as a function of the five types of relationship between the distractor and response color on the preceding and current trials described in the Introduction (see **Table 1**).

#### Materials

The stimuli were four color names, RED, YELLOW, GREEN and BLUE, presented in one of four colors, red (RGB 255, 000, 000), yellow (RGB 255, 255, 000), green (RGB 000, 128, 000) or blue (RGB 000, 000, 255), against a gray background (RGB 200, 200, 200). Each word was presented only in an incongruent color (e.g., RED was presented in yellow, green and blue, but not in red); thus, there were twelve color-word combinations in total.

Each color-word combination was presented 32 times, resulting in 384 trials. The 384 trials were divided into eight sublists of 48 trials, with each sublist containing an equal number of the 12 color-word combinations. A different random order of trials was generated for each sublist.
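The construction of the trial lists described above can be sketched as follows. This is an illustrative reconstruction, not the actual DMDX item files: the function name, the seed, and the data layout are assumptions, and "an equal number" is taken to mean four presentations of each combination per sublist (32 presentations divided by 8 sublists).

```python
import random
from collections import Counter
from itertools import permutations

COLORS = ["red", "yellow", "green", "blue"]


def build_sublists(n_sublists=8, repeats_per_sublist=4, seed=0):
    """Return sublists of (WORD, color) trials, each independently shuffled."""
    rng = random.Random(seed)
    # Ordered pairs of distinct colors: 4 * 3 = 12 incongruent combinations.
    combos = [(word.upper(), color) for word, color in permutations(COLORS, 2)]
    sublists = []
    for _ in range(n_sublists):
        trials = combos * repeats_per_sublist  # 12 combinations * 4 = 48 trials
        rng.shuffle(trials)                    # a different random order per sublist
        sublists.append(trials)
    return sublists


sublists = build_sublists()
total_trials = sum(len(trials) for trials in sublists)
print(total_trials)                  # 384
print(Counter(sublists[0]).most_common(1))
```

Shuffling within each sublist, rather than across all 384 trials, keeps every combination equally frequent in every block.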

#### Apparatus and Procedure

Participants were tested individually, seated approximately 60 cm in front of a flat screen monitor, upon which stimuli were presented. Each participant completed 384 color identification trials, presented in eight blocks (with each block containing 48 trials) with a self-paced break between the blocks. A practice block of 48 trials containing an equal number of color-word combinations preceded the test blocks.

Participants were instructed at the outset of the experiment that on each trial they would see a word presented in an incongruent color, one of red, yellow, green or blue. The participants were instructed to make their responses as quickly as possible, while still maintaining accuracy. Before the experiment, participants were given eight color naming practice trials with five hash signs (#####) presented in each of the four response colors.

Stimulus presentation and data collection were achieved using the DMDX display system developed by KI Forster and JC Forster at the University of Arizona (Forster and Forster, 2003). Stimulus display was synchronized to the screen refresh rate (10.01 ms).

Each trial started with the presentation of a fixation signal (a plus sign) for 250 ms, in the center of the screen. It was replaced by a blank screen for 50 ms, then by a word presented in one of four colors (red, yellow, green or blue) for a maximum of 2,000 ms, or until the participant made a response. In the oral Stroop task, the participant spoke the color name into the microphone, which triggered the voice key. After the participant's response, the screen went blank for 816 ms, after which the next trial started. All stimuli were presented in Lucida Console 12 point font. Participants were given no feedback during the experiment. The experimenter sat next to the participant and recorded errors during the experiment.

#### Results

Two sets of analyses are reported below. The first analysis is of individual trial RTs, using a linear mixed effects model (Baayen, 2008). Next, we analyzed the negative priming effect at the level of the RT distribution using quantile analysis and delta plots.

## Mean RT

In this and the subsequent experiment, correct RTs and error rates were analyzed according to the following procedure. In the analysis of RTs, we first examined the shape of the RT distribution for correct trials, and excluded those faster than 250 ms as outliers (most of the fast outliers were voice key trigger errors). In Experiment 1, 282 data points (out of 7501 trials, 3.7%) were identified as outliers.

We analyzed the RT data using a linear mixed effects model with Trialtype (NONE, COLOR-COLOR, COLOR-WORD, WORD-WORD, WORD-COLOR) as a fixed factor and subjects and stimuli as crossed random factors (Baayen, 2008). (The additional ambiguous categories were included, but were not considered, in the analysis.) RT was log-transformed to reduce the positive skew as recommended by Baayen (2008). LogRT was analyzed using the lme4 package (Version 1.1-5; Bates et al., 2014), implemented in R Version 3.4.3 (R Core Team, 2017). Degrees of freedom (Satterthwaite's approximation) and p-values were estimated using the lmerTest package (Version 2.0-33; Kuznetsova et al., 2016). The initial model included only the random intercepts on participants and stimuli, and if the model comparison indicated a significantly better fit, the more complex model including random slopes (of the Trialtype factor) was preferred.

Error rates were analyzed with a generalized linear mixed effects model with subjects and stimuli as crossed random factors, using the logit link function appropriate for categorical variables (Jaeger, 2008). In both experiments, the model tested was: Error rate ∼ Trialtype + (1 | subject) + (1 | stimulus), with the Trialtype factor referenced to the NONE condition.

The mean correct RT and error rates are shown in **Table 1**.

In Experiment 1, the final model we report is: logrt ∼ Trialtype + (1| stimuli) + (1 + Trialtype| subject), with the Trialtype factor referenced to the NONE condition. The model showed that the WORD-COLOR condition was significantly slower than the NONE condition, B = 0.057, SE = 0.010, t = 5.509, p < 0.001, i.e., a negative priming effect (32 ms). The COLOR-COLOR condition was significantly faster than the NONE condition, B = −0.097, SE = 0.016, t = −5.951, p < 0.001, i.e., a response repetition benefit (66 ms). The WORD-WORD condition was marginally slower than the NONE condition (13 ms), B = 0.026, SE = 0.012, t = 2.133, p < 0.05. The COLOR-WORD condition did not differ significantly from the NONE condition (10 ms), B = 0.022, SE = 0.012, t = 1.803, p = 0.085. We also computed Bayes Factors (BF) using the BayesFactor R package (Morey and Rouder, 2015). A BF indexes the relative strength of evidence for one hypothesis over another. The typical value considered to be reliable evidence for a hypothesis is a BF > 3 (Jeffreys, 1961). The BF for the negative priming effect was 38,917,899, indicating exceedingly strong evidence for its presence.

## Error Rate

Error rate was not analyzed as it was too low to warrant an analysis.

## RT Distribution Analysis

The quantiles for the negative priming conditions (NONE vs. WORD-COLOR) were estimated using QMPE version 2.0 (Heathcote et al., 2004). This involved rank ordering the correct RTs for each subject in both the NONE and WORD-COLOR conditions from fastest to slowest and then dividing them into four equal sized bins (i.e., the first bin contains the fastest 25% of RTs, the second bin contains the next fastest 25% of RTs, etc.). The average of the slowest trial of the faster bin and the fastest trial of the slower bin made the 4 quantile estimates. The quantiles were analyzed using a 2 (negative priming: NONE vs. WORD-COLOR) X 4 (quantiles) ANOVA in JASP version 0.9 (JASP Team, 2018). Averaged across quantiles the negative priming effect was significant, F(1,19) = 15.6, p < 0.001, with RTs for WORD-COLOR trials being slower than for NONE trials. The negative priming effect did not interact with quantiles, F(3,57) = 0.5, p = 0.693, indicating that the magnitude of the negative priming effect was constant across the quantiles of the RT distribution, resulting in a flat delta slope (see **Figure 1**).

## EXPERIMENT 2 (MANUAL)

## Participants

Twenty-one students from Macquarie University, additional to those in Experiment 1, participated in the experiment in return for course credit.

## Design and Materials

The design and materials were identical to Experiment 1.

### Apparatus and Procedure

The apparatus and the general procedure were identical to those of Experiment 1, except that the response mode was manual. Participants were instructed that they would be presented with stimuli consisting of color names presented in an incongruent color, and their task was to identify the color of the stimulus, as quickly and accurately as possible, by pressing one of four keys. The participants were instructed to make their responses as quickly as possible, while still maintaining accuracy. They were instructed to press the key Z for red, X for yellow, N for green, and M for blue (the four keys are in the bottom row of the QWERTY keyboard), the Z and X keys with their left middle and index fingers, and the N and M keys with their right index and middle fingers. During the practice block a card showing the spatial arrangement of the response keys colored in the corresponding color was displayed to facilitate learning the key assignment; the card was removed at the beginning of the test trials.

Participants were given feedback following each trial (the word "CORRECT" or "WRONG" presented during the intertrial interval).

## Results

The same procedure for the preliminary treatment of RT data as Experiment 1 was applied to Experiment 2. In this experiment, out of a total of 7446 data points, no data point was identified as an outlier (faster than 250 ms).

#### Mean RT

The final model we report is: logrt ∼ Trialtype + (1| stimuli) + (1 + Trialtype| subject), with the Trialtype factor referenced to the NONE condition. In this experiment, there was little difference between the WORD-COLOR condition and the NONE condition (−2 ms), B = −0.011, SE = 0.015, t = −0.731, p = 0.47, i.e., no negative priming effect. As in Experiment 1, the COLOR-COLOR condition was significantly faster than the NONE condition, B = −0.32, SE = 0.027, t = −12.353, p < 0.001, i.e., a response repetition benefit (208 ms). The WORD-WORD condition did not differ from the NONE condition, B = −0.027, SE = 0.014, t = −1.908, p = 0.068. The COLOR-WORD condition did not differ from the NONE condition, B = −0.022, SE = 0.016, t = −1.395, p = 0.17. As in Experiment 1, we calculated the BF for the negative priming effect. Here, it was 0.08 for the presence of the effect (i.e., 13 for the null effect), indicating strong evidence for the absence of a negative priming effect.

FIGURE 1 | The delta plot depicts the size of the negative priming effect in Experiment 1 oral Stroop task. The error bars are 95% confidence intervals.

#### Error Rate

The only condition to differ from the NONE condition was the COLOR-COLOR condition, B = 0.8403, SE = 0.1605, Z = −5.236, p < 0.001.

#### RT Distribution Analysis

The quantile analysis in Experiment 2 used the same procedure as in Experiment 1. Averaged across quantiles there was no negative priming effect, F(1,20) = 0.020, p = 0.888, with no significant difference between RTs for WORD-COLOR trials and NONE trials. The negative priming effect did not interact with quantiles, F(3,60) = 0.209, p = 0.890. The delta slope for the negative priming effect was flat and not significantly different from 0 (see **Figure 2**).

#### GENERAL DISCUSSION

The results of Experiments 1 and 2 are straightforward: While an inhibitory negative priming effect (32 ms) was found in the oral Stroop task, there was no hint of the effect in the manual Stroop task. This pattern replicates the original pattern reported by Neill (1977) under a more typical experimental condition. The absence of an (inhibitory) negative priming effect in the manual task is also consistent with the recent result reported by Hazeltine and Mordkoff (2014).

As noted in the Introduction, in contrast to these null results, a couple of studies used the manual Stroop task and reported finding sizeable negative priming effects (Besner, 2001; Raz and Campbell, 2011). Besner (2001, Experiment 3) presented Stroop stimuli in which only a single letter was colored, and in an experiment in which 80% of the trials were incongruent and 20% were congruent, the Stroop congruence effect was minimal (1 ms). Despite the absence of the Stroop congruence effect, Besner reported finding a large (52 ms) negative priming effect. Raz and Campbell (2011) used an equal proportion of congruent, incongruent and neutral trials, and reported that in high-hypnotizable participants a post-hypnotic suggestion for word blindness (that the words would appear "gibberish") reduced the Stroop congruence effect, but it did not impact on the size of negative priming effect (20 ms with the word blindness suggestion present and absent). Augustinova and Ferrand's (2014) call to investigate the negative priming effect was made with these studies in the background, with the dissociative effects of single letter coloring manipulation and/or word blindness suggestion on the Stroop congruence effect and the negative priming effect as a theoretical puzzle that needs to be solved.

However, a closer look at these studies suggests that the negative priming effect was not calculated in the same way as in Neill (1977). Specifically, Besner (2001) wrote "All incongruent trials in which a stimulus was preceded by an incongruent trial on which the response was correct were classified either as related, in which case the irrelevant word on the previous trial was the same as the color on the current trial, or as unrelated, in which case the irrelevant word on the previous trial was different from the color on the current trial." (p. 327). From this description, it appears that while the definition of the "related" condition is the same as that described by Neill (1977), corresponding to the WORD-COLOR condition here, Besner's "unrelated" condition seems to have included all other conditions (i.e., COLOR-COLOR, WORD-WORD, etc.), not just the NONE condition. This is also the case with Raz and Campbell (2011). They defined as the "NP (negative priming)" trial those in which two consecutive trials were incongruent and the distractor word on the preceding trial matched the target color of the current trial, and as CTRL (control) trials "an incongruent trial pair wherein the word ignored in the first trial was different from the ink color of the immediately following trial" (p. 313). It is apparent from the examples shown in their Figure 1 (p. 314) that the CTRL condition included not only the NONE condition, but also the WORD-WORD condition (and though not shown in the example, their definition could also include the other conditions like COLOR-COLOR and COLOR-WORD). The fact that Raz and Campbell noted that there were almost three times as many CTRL trials (2691) as NP trials (1021) suggests that the CTRL trials were not the same as Neill's "unrelated" ("NONE") trials, because the frequency of "related" and "unrelated" trials, expected by chance, should be roughly equal.
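The chance argument above can be checked by enumeration. The sketch below is an illustration under stated assumptions, not a reanalysis of any dataset: it uses the twelve incongruent stimuli of the present experiments, treats every ordered pair of consecutive trials as equally likely, and assumes that WORD-WORD means the distractor word repeats and COLOR-WORD means the previous color becomes the current word (the function and variable names are invented). Raz and Campbell's design also contained congruent and neutral trials, so the comparison with their counts is indicative only.

```python
from collections import Counter
from itertools import permutations, product

COLORS = ("red", "yellow", "green", "blue")
STIMULI = list(permutations(COLORS, 2))  # (word, color) pairs, incongruent only


def relations(prev, cur):
    """List every way the current trial repeats a feature of the previous trial."""
    (prev_word, prev_color), (cur_word, cur_color) = prev, cur
    found = []
    if cur_color == prev_color:
        found.append("COLOR-COLOR")  # response repetition
    if cur_word == prev_word:
        found.append("WORD-WORD")    # assumed: distractor repetition
    if cur_color == prev_word:
        found.append("WORD-COLOR")   # previous distractor becomes current target
    if cur_word == prev_color:
        found.append("COLOR-WORD")   # assumed: previous target becomes current distractor
    return found


def strict_label(prev, cur):
    """One label per pair; pairs with more than one relation are set aside."""
    found = relations(prev, cur)
    if not found:
        return "NONE"
    return found[0] if len(found) == 1 else "AMBIGUOUS"


pairs = list(product(STIMULI, repeat=2))  # 12 * 12 = 144 ordered trial pairs
strict = Counter(strict_label(prev, cur) for prev, cur in pairs)

# Inclusive scheme: "related" if the previous word equals the current color,
# and every other pair counts as control.
related = sum(1 for prev, cur in pairs if cur[1] == prev[0])
inclusive_control = len(pairs) - related

print(strict["WORD-COLOR"], strict["NONE"])            # 24 24
print(related, inclusive_control)                      # 36 108
print(inclusive_control / related, round(2691 / 1021, 2))
```

Under these assumptions, pure WORD-COLOR and NONE pairs are equally frequent (24 of 144 each), whereas an inclusive control condition yields three control pairs for every related pair, which is close to the reported ratio of 2691 to 1021 (about 2.6).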

If it is the case that Besner (2001) and Raz and Campbell (2011) defined the "unrelated" (or control) condition as all conditions other than the "related" (WORD-COLOR) condition, this would explain why these studies reported finding a "negative priming effect" in a manual Stroop task. In the present experiments, for both oral and manual, the COLOR-COLOR condition was substantially faster than all other conditions. Thus, including all conditions other than the WORD-COLOR ("related") condition as the comparison condition would result in a large difference (see **Table 2**, the last row). The "negative priming effect" reported by Besner (2001) and Raz and Campbell (2011) may have reflected this benefit due to repeating the response to the target. This is orthogonal to the target's relationship to the distractor, and consequently has little to say about the resolution of conflict.

TABLE 2 | Mean Color Response Latencies (RT, in ms) and Percent Error Rates (%E) in Experiment 1 (Oral) and Experiment 2 (Manual).

<sup>a</sup>The color name in CAPS denotes the distractor and the color name in lowercase denotes the response color (e.g., YELLOWblue denotes the word YELLOW presented in blue).

The present study also analyzed the negative priming effect in terms of the whole RT distribution. For the manual Stroop task, this analysis corroborated the analysis of mean RT and showed that the negative priming effect was absent throughout the whole RT distribution. For the oral Stroop task, the negative priming effect showed a flat delta slope i.e., the effect remained constant throughout the RT distribution. It is of interest to note that this pattern is different from the classic Stroop interference effect and the Stroop congruence effect which have consistently been shown to increase as responses slow, i.e., a positively sloped delta plot (e.g., Pratte et al., 2010), which is interpreted in terms of the rate of evidence accumulation ("drift rate" in the diffusion model terms). More specifically, the information that determines response selection (what color is it?) is accumulated from the word distractor as well as the color target and integrated during the evidence accumulation process, with the conflicting (incongruent) information reducing the rate of evidence accumulation. The fact that the negative priming effect, in contrast, showed a flat delta slope suggests that unlike the Stroop interference effect or the Stroop congruence effect, the origin of the negative priming effect is not in the evidence accumulation process. It is relevant to note in this regard that Neill and Westberry (1987) reported that (under the accuracy emphasis in the manual Stroop task) the negative priming effect (which they referred to as the "distractor suppression effect") was found also with neutral trials (consisting of a series of 0s) as well as the incongruent trials. That is, response on the current trial was slowed when it matched the response that would have been required to the distractor in the previous trial, even when the stimulus in the current trial contained no conflicting information. 
This suggests that the negative priming effect does not reflect a mechanism of control that attempts to reduce informational conflict, consistent with our interpretation that the negative priming effect does not reflect the conflict in the evidence (information) accumulation process. Further, it is contrary to the suggestion by Tipper (2001) cited in the introduction to our paper, that negative priming reflects the inhibition of the distractor's internal representation. Our view is consistent with Neill and Westberry's (1987) own interpretation that the negative priming effect does not reflect the inhibition of the activated representations themselves, but instead reflects the suppression of "access to overt responses", which is an idea first proposed by Tipper and Cranston (1985). In other words, it is not the informational conflict from the distractor word that is suppressed, it is the naming response that is suppressed.

An important question is why negative priming is a robust finding in the oral Stroop task but is found only under a narrow range of conditions in the manual Stroop task. A point of difference between the two tasks is that only the oral task requires a speech response. As noted by Roelofs (2003), in alphabetic systems, written words are intrinsically linked with their sounds. In contrast, words are not linked with a specific key on the keyboard. Perhaps the arbitrary nature of the color-key mapping in the manual task reduces the strength of response conflict caused by the distractor, which in turn reduces the need for the response to the distractor to be suppressed. We hope that this paper leads to further investigation of this possibility.

## CONCLUSION

The present study made three empirical contributions. First, we replicated the absence of the negative priming effect in the manual Stroop task when the effect was found in the oral Stroop task tested under identical conditions. Second, we pointed out that previous manual Stroop experiments reporting the negative priming effect confounded the effect with response repetition. Third, we reported the analysis of the negative priming effect at the level of the whole RT distribution, which revealed that the effect was absent throughout the RT distribution in the manual task, and that it was of constant size across the RT distribution in the oral task. This pattern contrasts sharply with the pattern of the Stroop interference effect and the Stroop congruence effect. We take these findings to argue that the negative priming effect does not serve as an index of control of informational conflict in the Stroop task.

## DATA AVAILABILITY

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

The studies involving human participants were reviewed and approved by Faculty of Human Sciences Human Research Ethics Sub-Committee. The patients/participants provided their written informed consent to participate in this study.

## AUTHOR CONTRIBUTIONS

LM, SK, and DN made substantial contributions to the conception and design of the work; the acquisition, analysis, and interpretation of data; and the drafting and editing of the manuscript. All authors approved the work for publication and agreed to be accountable for all aspects of the work.

## FUNDING

SK and DN are supported by an ARC Discovery Project Grant (DP140101199), and LM by the Macquarie University Research Training Program (RTP) Scholarship.

## ACKNOWLEDGMENTS

The authors thank Christine Inkley for research assistance.

## REFERENCES



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Mills, Kinoshita and Norris. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Reclaiming the Stroop Effect Back From Control to Input-Driven Attention and Perception

#### *Daniel Algom<sup>1</sup>\* and Eran Chajut<sup>2</sup>*

*<sup>1</sup> School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel, <sup>2</sup> Department of Education and Psychology, Open University of Israel, Ra'anana, Israel*

According to a growing consensus, the Stroop effect is understood as a phenomenon of conflict and cognitive control. A tidal wave of recent research alleges that incongruent Stroop stimuli generate conflict, which is then managed and resolved by top-down cognitive control. We argue otherwise: control studies fail to account for major Stroop results obtained over a century-long history of research. We list some of the most compelling developments and show that no control account can serve as a viable explanation for major Stroop phenomena and that there exist more parsimonious explanations for other Stroop related phenomena. Against a wealth of studies and emerging consensus, we posit that *data-driven selective attention* best accounts for the gamut of existing Stroop results. The case for data-driven attention is not new: a mere twenty-five years ago, the Stroop effect was considered "the gold standard" of *attention* (MacLeod, 1992). We identify four pitfalls plaguing conflict monitoring and control studies of the Stroop effect and show that the notion of top-down control is gratuitous. Looking at the Stroop effect from a historical perspective, we argue that the recent paradigm change from stimulus-driven selective attention to control is unwarranted. Applying Occam's razor, the effects marshaled in support of the control view are better explained by a selectivity of attention account. Moreover, many Stroop results, ignored in the control literature, are inconsistent with any control account of the effect.

#### *Edited by:*

*Ludovic Ferrand, Centre National de la Recherche Scientifique (CNRS), France*

#### *Reviewed by:*

*James R. Schmidt, Université de Bourgogne, France Derek Besner, University of Waterloo, Canada*

> *\*Correspondence: Daniel Algom algomd@tauex.tau.ac.il*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 08 May 2019 Accepted: 03 July 2019 Published: 02 August 2019*

#### *Citation:*

*Algom D and Chajut E (2019) Reclaiming the Stroop Effect Back From Control to Input-Driven Attention and Perception. Front. Psychol. 10:1683. doi: 10.3389/fpsyg.2019.01683*

Keywords: Stroop, control, conflict, salience, congruity, contingency

Everyday functioning requires a modicum of ability to attend selectively to the relevant feature of objects, excluding irrelevant or distracting features. In the absence of this ability, one cannot concentrate on texting a friend in the cafeteria, listening to a presentation in class, or negotiating the traffic when driving or walking. Facility at isolating the task-relevant attribute is indispensable for adaptation and survival. The Stroop effect (Stroop, 1935) assays this vital mental faculty. In fact, the Stroop effect is psychology's oldest and still most popular tool for assessing the ability to focus exclusively on the attribute of interest in the object (Eidels et al., 2010). In Stroop's (1935) original setup, the objects were color words printed in color, and the relevant attribute for responding was the color (while ignoring the carrier word). To gauge the influence of the task-irrelevant words, the Stroop effect is defined as the difference in color-naming performance between congruent (the word naming its color such as RED in red, with the former indicating the word and the latter the color) and incongruent (word and color conflict, such as RED in green) stimuli. Better performance with congruent than with incongruent stimuli shows that people paid attention to the task-irrelevant words, thereby compromising exclusive focus on the print colors. Had people focused exclusively on the target color, no word-dependent difference in color naming (=Stroop effect) would have emerged. A century after Stroop's landmark study, the effect bearing his name continues to fascinate researchers, sustaining an ever-growing number of studies. Despite the vast literature, the effect has eluded a consensual theoretical resolution.

## A BIT OF HISTORY

The Stroop effect boasts a convoluted history. In the first period, between 1935 and 1964, the effect attracted little interest and was discussed as a learning phenomenon (MacLeod, 1992). In Stevens' (1951) celebrated handbook, there is but a single passing reference to Stroop in a chapter on learning and retention. After 1964, the theoretical interpretation of the effect changed dramatically to one of attention (Klein, 1964; Jensen and Rohwer, 1966). The number of publications rose quickly, and the pace shows no signs of abating to date. The new construal of the Stroop effect occurred contemporaneously with the advent of the cognitive paradigm in psychology. The trend of accommodating attention peaked in the last decade of the twentieth century. Colin M. MacLeod, author of the definitive review (MacLeod, 1991), called the Stroop effect "one of the benchmark measures of *attention*" (MacLeod, 1992, p. 12).

However, the dominant conceptual framing of the Stroop effect changed yet again at around the turn of the twenty-first century. The new approach centered on the notions of "conflict" and "control." It was actually the latter term that was first popularized by Posner and his associates (e.g., Posner and Petersen, 1990; Posner and Raichle, 1994; see also, Petersen and Posner, 2012). These authors conceived performance in the Stroop task to be under "executive control" (Fan et al., 2002, p. 341) or simply as an "executive function" (Petersen and Posner, 2012, p. 73) under the control of well localized brain loci (in particular, the anterior cingulate system). Of course, it would be absurd to deny brain control of whatever we do, but assuming minute monitoring and very-small-scale response adjustments *via* central command ignores the influence of input-driven bottom-up processes. An all-engulfing central control view would still need to explain the ways and means of top-down penetration of Stroop performance on such a fine-grain scale. For all his efforts at identification of brain loci for cognitive functions, Posner was aware of the fact that these associations did not amount to a (Stroop) theory, to wit, "much needs to be learned about the *mechanisms*" used by the "executive system" (Posner and Raichle, 1994, p. 174, emphasis added). Subsequent development of the control view claimed to identify such a specific top-down mechanism – conflict monitoring and management – which governs Stroop performance. This novel theory of the Stroop effect rests on the original observation by Posner and Raichle (1994) that "the anterior cingulate system is more active during trials of the Stroop task in which conflict exists than during trials in which it does not" (p. 171). However, more recent research increasingly questions an exclusive connection between enhanced activity of the anterior cingulate system and conflict (e.g., Steinhauser and Hübner, 2009; Grinband et al., 2011a; Levin and Tzelgov, 2016; see also again, Posner and Raichle, 1994).

Conflict monitoring theory (Botvinick et al., 2001) proposes that performance in the Stroop task is governed by central control, which adjusts the attention allocated to the target color on a trial-to-trial basis. In particular, Stroop-incongruent stimuli generate a large amount of conflict (due to the mismatch between the color and the word). This conflict, in turn, invites increased control, which subsequently reduces the attention allocated to the task-irrelevant word. It is difficult to overstate the grip of the control account on current research. The fad of conflict monitoring and control is unprecedented within the Stroop milieu; following Schmidt's (2019) observation, the first few articles published between 1998 and 2004 now combine for over 30,000 citations in the literature (e.g., Carter et al., 1998; Botvinick et al., 1999, 2001, 2004; MacDonald et al., 2000; Miller and Cohen, 2001; Kerns et al., 2004; see Schmidt, 2019, for an extensive bibliography). The upshot is that the Stroop effect has been appropriated from being an index of input-driven selective attention to a tool for generating conflict and measuring control.

## GOAL OF THE PRESENT REVIEW

We believe that the recent paradigm shift in the construal of the Stroop effect is unwarranted. Our goal in this review is to show, against a wealth of recent studies and emerging consensus, that there is in fact no compelling evidence for control or top-down influence in the Stroop effect. Certainly, the term "top-down" is used in a variety of ways in different domains of cognitive psychology (see Firestone and Scholl, 2016). Within the Stroop milieu, "top-down" influence is currently conceived as an overall strategy, which is typically determined in advance. It is exercised through control and results in adaptation to conflict. It is this meaning of "top-down" influence that we challenge as a valid theory of the Stroop effect.

We are not alone in challenging the conflict monitoring account. In the face of an overwhelming literature, James Schmidt has mounted a powerful attack on the psychological reality of conflict monitoring and control, dubbing them repeatedly "an illusion" (e.g., Schmidt et al., 2015, 2018). In two comprehensive reviews, Schmidt concluded that data-driven explanations (e.g., biased learning and memory) provide a sufficient account of the findings subsumed under conflict monitoring and control (Schmidt, 2013, 2019; see also, Schmidt and Besner, 2008; Schmidt, 2016a,b). Notably, Schmidt's alternative explanation does not appeal to the notions of conflict and control. Schmidt addresses in admirable detail the various biases lurking in major control studies and concludes that those biases compromise their validity as well as the attendant explanation in terms of conflict and control. Given Schmidt's contribution and the availability of further comprehensive reviews of the control literature (e.g., Egner, 2008, 2014; Bugg and Chanani, 2011; Bugg and Crump, 2012; Bugg and Hutchison, 2013; Bugg, 2014; Abrahamse et al., 2016; Cohen-Shikora et al., in press), we eschew another general review. Instead, the present article is a theoretical critique of the control account, one rooted in bona fide Stroop literature.

The present review takes the neglect of basic Stroop results in control studies as a point of departure and expands the analysis to show that conflict monitoring and control cannot serve as a viable theory of the Stroop effect. As we recounted, the Stroop effect boasts a long and rich history (rapidly approaching the century mark), but large chunks of this research are ignored in the control literature. We show that factoring in basic findings of proper Stroop research challenges the validity of any theory of conflict monitoring and control.

## THE STRUCTURE OF THE REVIEW

To anticipate the development, we first state in a concise fashion our main argument. Four pitfalls plaguing control studies of the Stroop effect are then pinpointed. We follow by discussing each point in detail. These discussions, informed by basic Stroop literature, form the backbone of the paper. The understanding that conflict or control accounts do not comprise a viable candidate explanation of the Stroop effect is stated in the section "Conclusion."

### THE MAIN ARGUMENT: WHAT IS AND WHAT IS NOT EXPLAINED BY CONFLICT AND CONTROL?

Very succinctly, the conflict monitoring account proposes that attention is dynamically allocated to either the target (color) or the distractor (color word) *via* central control. Each time high conflict is met (by a Stroop-incongruent stimulus), control is engaged to enhance focus on the target. This amplified control is relaxed when high conflict is not experienced (by a Stroop-congruent stimulus). Of the wide range of Stroop-related phenomena (see, e.g., MacLeod, 1991; Melara and Algom, 2003, or Sabri et al., 2001, for reviews), the evidence for the conflict monitoring account is based almost exclusively on two effects: the proportion congruent (PC) effect and the sequential effect known as the Gratton effect (Gratton et al., 1992).

## WHAT IS EXPLAINED BY CONFLICT MONITORING AND CONTROL?

The PC effect is the observation that the Stroop effect is smaller when there are a disproportionately large number of incongruent stimuli in the set. For example, the Stroop effect is smaller when the stimulus ensemble includes 80% incongruent stimuli (hence 20% congruent stimuli) than when the ensemble includes 20% incongruent stimuli (hence 80% congruent stimuli). The conflict monitoring account provides a ready explanation for this modulation of the Stroop effect: Participants experience a great deal of conflict in the mostly incongruent set, a condition that is bound to summon strong central control. The enhanced control, in turn, results in focused attention to the target attribute. The task-irrelevant word is less attended, and the net result is a small Stroop effect. Therefore, the greater the number of incongruent stimuli, the smaller the Stroop effect.

The Gratton effect is the observation that the (color) response to an incongruent stimulus that follows an incongruent stimulus is faster than the response to an incongruent stimulus that does not follow an incongruent stimulus (i.e., it is preceded by a congruent stimulus). The same explanation is offered by the conflict account, now on a smaller scale. After experiencing conflict on trial *n*−1, control is invited to exert its influence, so that its salutary effect is observed on trial *n*. In other words, due to enhanced control, the participant adapts to conflict and maximizes the ability to ignore the task-irrelevant word.
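The sequential comparison just described can be made concrete with a minimal sketch. It assumes an ordered list of trials coded "C" (congruent) or "I" (incongruent); the function names and the RT values are hypothetical and serve only to illustrate the arithmetic. The computation is purely descriptive and says nothing about what causes the difference.

```python
from statistics import mean

def sequential_means(trials):
    """Mean RT for each (previous, current) congruency pair.

    `trials` is an ordered list of (congruency, rt_ms) tuples, where
    congruency is "C" (congruent) or "I" (incongruent). The first trial
    has no predecessor and therefore contributes no RT of its own.
    """
    cells = {}
    for (prev, _), (curr, rt) in zip(trials, trials[1:]):
        cells.setdefault(prev + curr, []).append(rt)
    return {pair: mean(rts) for pair, rts in cells.items()}

def gratton_effect(trials):
    """Mean RT on incongruent trials after a congruent trial (CI)
    minus mean RT on incongruent trials after an incongruent trial (II)."""
    m = sequential_means(trials)
    return m["CI"] - m["II"]

# Hypothetical RTs in ms, invented purely for illustration.
trials = [("C", 600), ("I", 720), ("I", 680), ("C", 610),
          ("I", 730), ("I", 690), ("C", 605), ("C", 590)]

print(sequential_means(trials))
print(gratton_effect(trials))  # positive when II responses are faster than CI
```

A positive value corresponds to the pattern described in the text: incongruent trials are answered faster when the preceding trial was also incongruent.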

In summary, this new account provides reasonably straightforward explanations for these two effects in terms of conflict, control, and conflict adaptation. There is a pitfall, though: Much simpler explanations are available based on properties of the data at hand. We discuss these stimulus-driven explanations and show that they are to be favored over control on grounds of both parsimony and general applicability.

## WHAT IS NOT EXPLAINED BY CONFLICT MONITORING AND CONTROL?

Whereas alternative explanations exist for the PC and the Gratton effects (Schmidt, 2019), conflict monitoring and control theory has real difficulty explaining the following Stroop finding. Presenting the *same* number of incongruent stimuli can result in a large Stroop effect, a zero Stroop effect, or a reverse Stroop effect (where colors intrude on word naming more than vice versa). The trifling stimulus manipulation that produces these diverse outcomes consists of slight changes in the relative salience of the color and the word components of the stimulus. It is important to note that the changes of salience are so slight that the words remain eminently legible and the colors similarly remain eminently identifiable under all the conditions. These findings are devastating for the control account (e.g., Garner and Felfoldy, 1970; Garner, 1974; Pomerantz, 1983; Melara and Mounts, 1993; Algom et al., 1996; Melara and Algom, 2003; Algom and Fitousi, 2016). Presumably the same amount of conflict is experienced, yet performance changes dramatically regardless of "conflict."

Quite apart from these observations, portions of the Stroop literature contain studies in which presentation of Stroop stimuli – i.e., conflict generating stimuli – does not yield a Stroop effect (e.g., Flowers et al., 1979; McClain, 1983a,b; Glaser and Glaser, 1989). Again, no control explanation is able to account for such results. In general, control theory is unable to explain variation in Stroop results when the amount of conflict is held constant.

A further observation is arguably fatal for control theory: congruent stimuli produce Stroop facilitation (faster color naming to congruent than to neutral stimuli) just as incongruent stimuli produce Stroop interference (faster color naming to neutral than to incongruent stimuli), and the Stroop effect entails *both*, i.e., the effect is not solely interference. Thus, participants respond "red" faster to the word RED in red than to the word TABLE in red, a result called facilitation, and the Stroop effect is sometimes generated wholly or mostly by facilitation rather than by interference (Brown, 2011; Eidels, 2012). The faster RTs to congruent than to neutral stimuli – Stroop facilitation – is not a transient or ephemeral result; it is a systematic effect (as much as Stroop interference), and conflict monitoring theory seems unable to account for a Stroop effect produced by facilitation. Finally, control theory faces difficulty in accounting for Stroop's original results (Stroop, 1935). In Stroop's experimental condition, all of the stimuli were incongruent, so that control was presumably very strong. Conflict monitoring theory predicts a small Stroop effect (interference). In sharp contrast to this prediction, Stroop recorded what is arguably the largest Stroop effect in the literature.

In the remainder of the review, we expand on all the above points. We show that effects attributed to central top-down control are actually changes in the stimulus input; the effects are well captured by input-driven attention or its failure. Next, we identify four pitfalls lurking in studies performed under the control approach.

#### FOUR PITFALLS IN CONTROL STUDIES OF THE STROOP EFFECT

**First**, arguably the most severe pitfall is that the key term of "conflict" in the "conflict-generated-control" approach is vague and imprecise. The problem is already apparent in the widely cited study of Botvinick et al. (2001), a pioneering undertaking in the field. The notions of "conflict monitoring" and "control" are thoroughly discussed, but what is missing from the text is a clear, unambiguous *theoretical* definition of the key term of "conflict." Monitoring is rightly showcased as the new development (the added component to the computational model of Cohen et al., 1990, or that of Cohen and Huston, 1994), but what is being monitored is underdefined. In lieu of a theoretical definition, Botvinick et al. (2001) ponder how "conflict might be *measured*" or "*operationally* defined" (p. 630; emphases added). For a tool, the authors elected to use Hopfield's (1982) measure of "energy" in a recurrent neural network to indicate the level of conflict; in words, "conflict" is conceived as "the simultaneous activation of incompatible representations … e.g., representations of alternate responses" (Botvinick et al., 2001, p. 630). This definition is imprecise as is. In particular, the notion of "incompatible representations" is left hopelessly ambiguous.

To understand the cost of the ambiguity, consider the following critical question. Do "conflict" and "incompatible representations" apply only to *logically* contradictory responses (hence, to truly incompatible responses) or to all possible responses to multidimensional stimuli? To render the question more concrete: Is a circle in green and the word RED in green *both* conflict stimuli? With the first stimulus, there is no logical or semantic conflict (or agreement) between color and shape. There cannot be congruent and incongruent cases with stimuli composed of color and shape – a green circle is neither more nor less congruent or incongruent than, say, a blue rectangle. The Stroop effect cannot be calculated for such stimuli simply because the Stroop effect is defined by the difference between congruent and incongruent cases. A certain shape and a certain color cannot be in conflict because neither excludes the other; the responses to the shape and the color of a green apple are never incompatible. By contrast, the second stimulus *is* a Stroop stimulus: The word and the color can match (=congruent stimulus) or conflict (=incongruent stimulus). An incongruent Stroop stimulus is a genuine conflict stimulus because the response to the word excludes the response to the color. The responses to the word and to the color are inescapably incompatible. Conversely, for the congruent Stroop stimulus, RED in red, the responses to the word and the color do not compete with one another as they are the very same single response. Because the responses are compatible (not incompatible), congruent Stroop stimuli are free of conflict. Considering the Botvinick et al. (2001) model, the approach called "conflict monitoring and control" does not appreciate or recognize the qualitative difference between Stroop or conflict stimuli, on the one hand, and non-Stroop or non-conflict stimuli, on the other hand. Adverse consequences ensue for theory and research alike.

In the computational model of Botvinick et al. (2001), virtually all multidimensional stimuli are conflict stimuli, i.e., Stroop-congruent stimuli such as RED in red and non-Stroop stimuli such as a green apple all are conflict stimuli. This feature alone defies common sense and violates fundamental laws of logic. For common sense, to maintain the absurd thesis that RED in red produces conflict – when both components agree, support, and converge on the same single response – is tantamount to leaving the notion of conflict void of meaning. For logic, to discount the structural difference between the Stroop-incongruent stimulus, RED in green, and the non-Stroop stimulus, green apple, means ignoring the basic law of non-contradiction. For RED in green, the possible responses (red, green) cannot both be true (for that ink color), so that the responses are mutually exclusive. By contrast, for a green apple, the possible responses (green for color and apple for shape) can both be true at the same time, so that the responses are not mutually exclusive. In logic, the truth-functionally compound statements (e.g., Copi, 2015) that are (or that can be) associated with RED in green and with a green apple are fundamentally different. Again, this difference is ignored in the model. Thus, Botvinick et al. (2001) affirm in their text that on "*incongruent* trials … the intersection of … two pathways … causes *conflict*" (p. 631, emphasis added), but this tells only part of the story; in their model, congruent trials also generate (less) conflict.

To recap, the Botvinick et al. model holds that Stroop-congruent stimuli, Stroop-incongruent stimuli, and non-Stroop stimuli all produce conflict, albeit to different degrees. The difference is merely quantitative. By contrast, common sense, logic, and insights based on a century of Stroop research hold that (1) incongruent stimuli entail conflict, (2) non-Stroop and neutral stimuli lack the quality of conflict (conflict is orthogonal to such stimuli), and (3) congruent stimuli are free of conflict. Although computationally elegant and manageable (and parsimonious), the idea that Stroop-congruent (and non-Stroop) stimuli cause conflict is conceptually untenable.
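The "merely quantitative" character of an energy-based conflict measure can be seen in a minimal sketch. It assumes the standard Hopfield form, E = −½ Σ w_ij a_i a_j over pairs of distinct units, applied to two mutually inhibiting response units. The inhibitory weight of −1.0 and all activation values are hypothetical, and the exact scaling and unit layout of the published model may differ. The point is only that such a measure is graded: it exceeds zero whenever both units carry any activation at all.

```python
def hopfield_energy(activations, weights):
    """Hopfield-style energy: E = -1/2 * sum over i != j of w[i][j] * a[i] * a[j]."""
    n = len(activations)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += weights[i][j] * activations[i] * activations[j]
    return -0.5 * total

# Two response units ("red", "green") with mutual inhibition.
# The weight of -1.0 and every activation value below are assumptions.
W = [[0.0, -1.0],
     [-1.0, 0.0]]

incongruent = hopfield_energy([0.6, 0.5], W)  # both responses strongly active
congruent = hopfield_energy([0.9, 0.1], W)    # competing response weakly active
single = hopfield_energy([1.0, 0.0], W)       # only one response active

print(incongruent, congruent, single)
```

Under these assumed values the measure is larger for the incongruent pattern than for the congruent one, and it reaches zero only when the competing unit is completely silent. Nothing in the measure itself marks a qualitative boundary between the cases.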

The tenuous relation in the model between Stroop-congruity and conflict came to the fore in subsequent extensions of the model, which also included errors (Yeung et al., 2004, 2011; Yeung and Nieuwenhuis, 2009). The extended versions each used a different implementation of the model, which, in turn, affected the Congruity-Conflict predictions to the extent that it was questioned "whether a single unified model of conflict monitoring exists" (Grinband et al., 2011b, p. 321). In the more recent version of Yeung et al. (2011), "conflict" is conceived as enhanced anterior cingulate activity that can result from a large variety of sources, including sensory noise, attention fluctuation, and response bias – all of which can and often do "dwarf" congruity-related conflict. Maintaining that "conflict" corresponds to *any* unrelated sensorimotor activity (that affects RT) leads to the absurd idea that "conflict" exists even when detecting a simple one-dimensional signal with a *single* response option. This "diffuse definition" of conflict (if it is a definition in the first place) "trivializes" the concept of conflict, making it practically useless (Grinband et al., 2011b, pp. 321–322). In the final analysis, "conflict" in the Yeung et al. (2011) model is basically independent of congruity and is independent of response compatibility (see again, Grinband et al., 2011b); the notions of congruity and (in)compatibility that first motivated the Botvinick et al. (2001) effort are trivialized in later implementations of the model. As a result, the model is an ill-suited candidate theory of the Stroop effect.

We identify three fundamental problems with the Botvinick et al. (2001) approach (and its various offspring). First, as noted in Grinband et al. (2011b), conflict monitoring was never tested against the natural null hypothesis that enhanced anterior cingulate activity is associated with task-general processes of perception, attention, and memory, rather than with conflict. When tested against this null hypothesis (Grinband et al., 2011a), no evidence for involvement of conflict (monitoring) was found beyond the generic effect of task engagement. The second fundamental problem is that the model couples a highly specific and richly developed concept from cognitive psychology to electrophysiological activity in a certain brain region – ignoring throughout the loaded ramifications of the concept within cognitive science and philosophy. Instead, the model (especially in recent implementations) stretches the notion of conflict beyond reasonable limits (the model might well have used "energy" or any other term to replace the increasingly debilitated "conflict"). The third fundamental problem concerns methodology, namely the scientific value and usefulness of the concepts of "conflict" and "control." In the model, virtually any act of perception and cognition is marked by conflict. Conflict is lurking beneath such quotidian actions as reading familiar words, deciding between independent non-opposing alternatives, or just responding to any stimulus in an unspecified manner. However, if everything is conflict, then conflict becomes an empty, useless concept. A useful scientific definition should specify not only what is included, but also what is excluded.

Finally, inconsistent with the computational model discussed, the majority of Stroop studies subsumed under the control idea do place conflict quite naturally in Stroop-incongruent stimuli. As a rule, Stroop-incongruent trials are defined as "conflict stimuli," implying that Stroop-congruent stimuli are free of conflict. This binary conception is the dominant and accepted view in large portions of the control literature. The terms "incongruent stimuli" and "conflict stimuli" are used interchangeably in the control literature (e.g., see the titles of Bugg and Smallwood, 2016, or of Mayr et al., 2003). We reiterate, the term "conflicting stimuli" implies non-conflicting stimuli (i.e., congruent or neutral stimuli), and this distinction actually informs much discussion of the Stroop effect in the control literature. Nevertheless, we return to discuss the implications of basic Stroop findings for the continuum conception entailed in the computational model and show that "conflict" and "control" are superfluous to an explanation of the varieties of Stroop effects.

**Second**, in the "conflict-generated-control" approach, parallel processing or cross-talk is typically tailored to result in interference. However, cross-talk can also result in facilitation and in a gain to performance (MacLeod, 1991; MacLeod and MacDonald, 2000; Roelofs, 2010). Again, the prime example in the control literature of cross-talk-produced *interference* is the Stroop effect. However, the Stroop effect is not solely interference; it is also facilitation. Stroop effects attributed to interference may well be those of facilitation. In the absence of partitioning the effect into interference and facilitation, a partition that is rarely done in control studies, one cannot decide the source. Without appropriate measurement, the Stroop effect cannot serve as arbiter of conflict.

Arguably, too, the notion of a Stroop effect produced by facilitation is anathema to the conflict-control approach (e.g., Lindsay and Jacoby, 1994; Brown, 2011; Eidels, 2012). After all, conflict is supposed to generate interference. However, if the same Stroop presentation systematically generates facilitation (rather than conflict and interference), the notion of enhanced control summoned by conflict is called into question.

**Third**, it is not completely clear where the conflict resides (e.g., Levin and Tzelgov, 2016). Does the conflict reside in the stimulus, i.e., impacting early input-driven processing, or does it mainly reside in the response? In the face of a certain level of ambiguity, most discussions and modeling efforts focus on late processing, close to the response. However, this conception can be challenged. Following Garner (Garner, 1962, 1970, 1974; Garner and Felfoldy, 1970; see also Melara and Algom, 2003; Algom and Fitousi, 2016), it is eminently possible that the conflict (mainly) resides in the stimulus. The problem is that authors within the control approach ignore the makeup of the stimulus. The perceptual properties of the Stroop stimulus – the physical features of the colors and the fonts used – are neglected. However, these basic perceptual properties can predict whether there will be a Stroop effect to begin with, as well as its direction (standard or reverse). For example, the relative perceptual salience of the presented color and word can determine if there is a Stroop effect, and, if there is, its magnitude (Garner, 1974; Melara and Mounts, 1993; Melara and Algom, 2003). Presenting Stroop stimuli does not *ipso facto* guarantee that there is a Stroop effect! Depending on the perceptual properties of the stimuli, the *same* Stroop presentation can generate a Stroop effect, a zero Stroop effect, or a reverse Stroop effect (by which colors intrude on word reading more than vice versa; e.g., Pomerantz, 1983; Pomerantz and Pristach, 1989; Algom et al., 1996; Dishon-Berkovits and Algom, 2000). The upshot is that stimulus properties can determine the Stroop effect without any need to engage a central control mechanism.

**Fourth**, the makeup of the stimulus is not the only data-driven mechanism governing the Stroop effect. Another data-driven influence on the Stroop effect is the correlation introduced over the experimental trials between the target colors and the task-irrelevant words. Because the Stroop *task* entails naming the color and because the Stroop *effect* measures the ability to attend selectively to the color, any color-word correlation introduced compromises exclusive attention to the color. A fair number of control experiments jeopardize the Stroop task by introducing just such a correlation between the relevant ink colors and the irrelevant words. The correlation makes the nominally irrelevant words predictive of the target color, so that attending to the word helps maximize color performance. Inevitably, exclusive attention to the target colors is compromised. The original Stroop task as a measure of the selectivity of attention is disabled.

In several studies within the control approach (e.g., Bugg and Smallwood, 2016; Hutchison et al., 2016), the correlation between word and color over the experimental trials was created by the lopsided makeup of the block (for example, of a block of 10 trials, eight were congruent). In this case, the nominally irrelevant word largely predicts the target color. The situation is exacerbated by instructions that augment the actual correlation. For example, the participants are told that the majority (say, 80%) of the next block (of, say, 10 trials) will be congruent. The problem again is that this instruction and the attendant design already create a correlation between the nominally irrelevant words and the relevant colors, which is fatal for the selective attention tested (Dishon-Berkovits and Algom, 2000; Melara and Algom, 2003; Schmidt and Besner, 2008). Apart from the instructions, virtually all control studies entailed a word-color correlation by presenting (grossly) unequal numbers of congruent and incongruent stimuli. One must realize that imbalanced presentation of congruent and incongruent stimuli necessarily creates a correlation between the color and word components. Because (1) the Stroop effect measures (the failure of) selective attention to the color and (2) a color-word correlation diverts attention to the irrelevant word, a large Stroop effect is thereby created. Most important, this factor of correlation is stimulus dependent, i.e., it does not invite a central control mechanism to account for the Stroop results. All that is involved is simply the perception of correlation (Kareev, 1995a,b, 2000; Kareev et al., 1997).
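How strongly a given block composition makes the word predictive of the color can be quantified with a minimal sketch. It uses mutual information between word and ink color as one possible index of the correlation. This index is an illustrative choice, not a measure taken from the studies cited. The sketch assumes that every word and every color appears equally often and that incongruent trials are spread evenly over the mismatching colors. The function name is hypothetical.

```python
import math

def word_color_information(n_colors, p_congruent):
    """Mutual information (in bits) between word and ink color.

    Assumes each word and each color appears equally often, and that
    incongruent trials are spread evenly over the n_colors - 1
    mismatching colors (both are simplifying assumptions).
    """
    p_independent = 1.0 / (n_colors * n_colors)  # cell probability if word and color were unrelated

    def term(p_total, n_cells):
        # Contribution of n_cells equiprobable cells that jointly carry p_total.
        if p_total == 0:
            return 0.0
        p_cell = p_total / n_cells
        return p_total * math.log2(p_cell / p_independent)

    congruent_cells = n_colors
    incongruent_cells = n_colors * (n_colors - 1)
    return (term(p_congruent, congruent_cells)
            + term(1.0 - p_congruent, incongruent_cells))

# Four colors and four color words, with three block compositions.
for p in (0.25, 0.5, 0.8):
    print(p, round(word_color_information(4, p), 3))
```

Under these assumptions the word carries no information about the color only when congruent trials occur at the chance rate of 1/n (25% with four colors). Any other proportion, including a nominally balanced 50%, makes the word predictive of the color to some degree.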

We note that, in the control approach, providing advance information or biasing the probability of congruent and incongruent stimuli (by grossly imbalanced presentation) is legitimate. In this approach, these procedures are merely a means for generating conflict. What is not recognized though is that this way of generating conflict comes at the expense of compromising the meaning and the serviceability of the original Stroop test (as a tool of measuring selective attention). The manipulation is still called "Stroop," but, in truth, it has almost nothing to do with the Stroop effect. It is thus hardly surprising that the Stroop effect itself is not calculated or is rendered marginal in a fair number of studies within the control approach (e.g., Hutchison et al., 2016; Kleiman et al., 2016; see also, Wegner and Erber, 1992; Wegner et al., 1993, on the use of the Stroop *task* without the calculation of the Stroop *effect* in "mental control").

## RESOLVING THE PITFALLS WITHIN BONA FIDE STROOP RESEARCH

We proceed by elucidating the problems mentioned, benefiting from the results and insights obtained within Stroop research proper. To anticipate, resolution within genuine Stroop research shows that the notion of control is simply gratuitous as a means for explaining the Stroop phenomenon.

## PITFALL 1: GENERAL DEFINITION OF CONFLICT AND NON-CONFLICT STIMULI

In the absence of a definition for the basic term, "conflict," the control approach considers the Stroop stimulus as representative of all multidimensional stimuli. However, not all multidimensional stimuli are conflict or Stroop stimuli. As we recounted, badly missing is the distinction between Stroop and non-Stroop stimuli. The missing distinction is conducive to the absurd notion that the ink-color response "green" to the word RED in *green* is comparable to the ink-color response "green" to a triangle in *green*. The missing distinction similarly leads to the notion that these ink-color responses are on the same footing as categorization responses to the word TABLE. Control theory holds that whenever there are multiple alternative responses to the (multidimensional) stimulus, there is conflict (in need of control). This idea, however, ignores the nature of the relations between the alternatives. The alternatives can be conflicting or *matching* as they are in Stroop-congruent stimuli (e.g., RED in red) *or* non-conflicting and non-opposing or simply logically unrelated. Stroop stimuli belong in the first class, but other multidimensional stimuli belong in the second class. Control studies blur the all-important dividing line between Stroop and non-Stroop stimuli.

What is the one property telling Stroop and non-Stroop stimuli apart? The defining feature of all Stroop stimuli is the existence of a *logical* relationship, compatibility or incompatibility, between their components. Each and every Stroop stimulus falls into one of the mutually exclusive and exhaustive classes of congruent or incongruent combinations. For example, all conceivable combinations of a color word and a print color *must* result in either a congruent (the word naming its color) or an incongruent (word and color mismatch) stimulus. Precluded is any other type of combination. By contrast, there is no logical conflict between the shape and the color of a green triangle. Again, an adequate theory of the Stroop effect must entail the uniqueness of Stroop stimuli as well as their distinct processing.

A ready example highlighting the last point is the so-called "emotional Stroop effect" (e.g., Algom et al., 2004, 2009). The emotional Stroop effect is the difference in color-naming performance between emotional (e.g., the word DEATH printed in red) and neutral (e.g., the word DOOR printed in red) stimuli. Because the words are not color words, these stimuli lack the logical relationship of conflict or correspondence between their attributes. The word DISEASE printed in blue is neither more nor less congruent than the word LECTURE presented in pink. The stimuli in the emotional Stroop task do not divide into congruent and incongruent combinations. Consequently, the Stroop effect cannot be calculated in studies of the emotional Stroop effect. Given a color-naming task, as in the classic Stroop task, the word BLUE printed in yellow (or in blue) is a Stroop stimulus, but the word CANCER printed in yellow (or in any other color) is not a Stroop stimulus. Conflict resides in the first type of stimuli but not in the second type of stimuli. Note that color naming may nonetheless be slower to CANCER than to TABLE, but that slowdown is not a Stroop effect. Clearly, not all differences in performance derive from conflict.

## PITFALL 2: THE STROOP EFFECT: CONFLICT AND FACILITATION

The control approach (as a Stroop theory) fails to account for Stroop facilitation. The standard Stroop experiment includes three types of stimuli: congruent stimuli (e.g., the word RED in red), incongruent stimuli (RED in green), and neutral stimuli (e.g., TABLE in red). The following equation defines the Stroop effect in all experimental designs:

$$\text{Stroop effect} = M_{RT}(\text{incongruent}) - M_{RT}(\text{congruent}),$$

where MRT is the mean reaction time (RT) to name the ink color. The Stroop effect can be partitioned into Stroop interference (SI), so that SI = MRT (incongruent) – MRT (neutral), and Stroop facilitation (SF), so that SF = MRT (neutral) – MRT (congruent). Therefore, the Stroop effect equals the simple algebraic sum of interference and facilitation,

$$\text{Stroop effect} = SI + SF.$$

Note that the congruent stimulus "RED in red" does not entail any conflict, yet it is often a major contributor to the Stroop effect. People usually respond "red" to "RED in red" faster than they respond "red" to "TABLE in red" (= SF), and this facilitation enhances the observed Stroop effect. The Stroop effect is therefore not equivalent to interference and conflict. It is also possible that the entire Stroop effect is produced by facilitation (e.g., Eidels et al., 2010; Eidels, 2012). A recognized theory of the Stroop effect, Tectonic theory (Melara and Algom, 2003), ascribes a major part of the Stroop effect to facilitation (rather than to interference).
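The partition of the Stroop effect into interference and facilitation can be illustrated numerically. The following is a minimal sketch only: the function name `partition_stroop` and the reaction times are hypothetical, invented for illustration, and are not data from any study.

```python
from statistics import mean

def partition_stroop(rt_congruent, rt_neutral, rt_incongruent):
    """Return (Stroop effect, interference SI, facilitation SF) in ms."""
    m_con = mean(rt_congruent)
    m_neu = mean(rt_neutral)
    m_inc = mean(rt_incongruent)
    interference = m_inc - m_neu     # SI = MRT(incongruent) - MRT(neutral)
    facilitation = m_neu - m_con     # SF = MRT(neutral) - MRT(congruent)
    stroop_effect = m_inc - m_con    # algebraically equal to SI + SF
    return stroop_effect, interference, facilitation

# Hypothetical color-naming RTs (ms); not real data
congruent = [600, 620, 610]
neutral = [650, 660, 640]
incongruent = [700, 720, 710]

effect, si, sf = partition_stroop(congruent, neutral, incongruent)
print(effect, si, sf)
```

Without the neutral condition, only `effect` is available, and the separate contributions of `si` and `sf` cannot be recovered.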

It is worth pausing for a moment on the extreme theoretical version developed by Eidels (2012; see also Eidels et al., 2010). Eidels shows that a behavioral Stroop effect can derive from *independent* processing of the word and the color (i.e., there is an independent horse race between the processing channels). In Eidels' theory, the color horse does not know the position, speed, or, indeed, the very existence of the word horse. Eidels (2012) uses stochastic modeling based on the following simple idea: For congruent stimuli, both processing channels (word, color) count for the same (correct) response, whereas for incongruent stimuli, only the color channel does. For example, for the congruent stimulus, RED in red, the faster channel wins the race, producing the correct response for the experimenter regardless of whether it comes from the color (correctly) or from the word (incorrectly, but undetectably). Again, processing is completely independent. If so, there cannot be interference (or facilitation) simply because there does not exist any crosstalk between the processing channels. The notion of control and conflict is gratuitous in Eidels' theory.
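The logic of an independent race can be sketched with a toy simulation. This is not Eidels' (2012) actual stochastic model: the exponential finishing times, the channel means, and the function name `simulate_independent_race` are all assumptions chosen for illustration, and errors on incongruent trials are ignored. The sketch shows only that a congruent-incongruent RT difference can arise with no crosstalk between channels.

```python
import random
from statistics import mean

def simulate_independent_race(n_trials=20000, color_mean_ms=600.0,
                              word_mean_ms=500.0, seed=1):
    """Toy independent race: returns (mean congruent RT, mean incongruent RT)."""
    rng = random.Random(seed)
    congruent_rts, incongruent_rts = [], []
    for _ in range(n_trials):
        # Congruent trial: either channel yields the correct response,
        # so the faster of two independent finishing times sets the RT.
        color_t = rng.expovariate(1.0 / color_mean_ms)
        word_t = rng.expovariate(1.0 / word_mean_ms)
        congruent_rts.append(min(color_t, word_t))
        # Incongruent trial: only the color channel yields the correct
        # response; the word channel is never consulted (no crosstalk).
        incongruent_rts.append(rng.expovariate(1.0 / color_mean_ms))
    return mean(congruent_rts), mean(incongruent_rts)

m_con, m_inc = simulate_independent_race()
stroop_effect = m_inc - m_con
print(round(m_con), round(m_inc), round(stroop_effect))
```

The color channel has the same distribution in both conditions; the difference comes entirely from the statistical advantage of taking the minimum of two independent finishing times on congruent trials.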

Ignoring theory, our main point is that merely observing a Stroop effect does not reveal the ingredients of interference and facilitation. Partitioning the effect by including the baseline condition of neutral stimuli is essential for arguing the case of conflict. In this respect, the majority of control studies of the Stroop effect did not include a baseline. Consequently, the Stroop effect cannot serve as a pure assay of conflict and control because the effect entails a significant non-conflict (i.e., facilitation) component. As a result, control cannot serve as a (parsimonious) theory of the Stroop effect.

## PITFALL 3: PHYSICAL DETERMINANTS OF THE STROOP EFFECT: THE RELATIVE DISCRIMINABILITY OF THE WORDS AND THE COLORS

A major determinant of the Stroop effect is the relative salience or discriminability of the different words and ink colors used. When dimensional discriminability is matched, the time and accuracy needed to tell apart the words from one another is the same as the time and accuracy needed to tell apart the ink colors from one another. However, mismatched discriminability favoring words was present in virtually all control studies of the Stroop effect. Without dedicated preparation of the stimulus (not implemented in control studies), it takes participants longer to tell apart the ink colors from one another (e.g., red from green) than the words from one another (e.g., RED from GREEN). The presence of this asymmetry is critical because the more discriminable dimension disrupts performance on the less discriminable dimension (Sabri et al., 2001). Consequently, the task-irrelevant words affect performance with the ink colors (=Stroop effect) not because word reading is the habitual response (which generates conflict), but simply because the words differ perceptually from one another more than do the colors from one another. This factor of relative dimensional salience has been ignored in the control literature with serious consequences for Stroop theory.

To recap, when the words are more salient than the colors (the default Stroop setup in the control literature), the usual Stroop effect appears. However, when the dimensions are made equally discriminable (by presenting appropriately matched values), the Stroop effect collapses. And, when the ink colors are made purposely more salient than the carrier words, a reverse Stroop effect emerges by which the ink colors intrude on word reading. We hasten to add that manipulations of salience entail nothing more than slight adjustment of the fonts (e.g., size, shape) and the colors (intensity, focality); they do not affect legibility or identification. Experimenters were able to produce a Stroop effect and a reverse Stroop effect or to eliminate the effect altogether at will (Garner and Felfoldy, 1970; Pomerantz, 1983; Melara and Mounts, 1993; Algom et al., 1996; Pansky and Algom, 1999, 2002; Sabri et al., 2001; Fitousi and Algom, 2006; Fitousi et al., 2009). A schematic summary of these results is provided in **Figure 1**.

The vital role of relative salience was discovered in a seminal work by Garner and Felfoldy (1970). More recently, Melara and Algom (2003) culled a sample of 35 published results from the Stroop literature and examined the relation between the *Stroop effect*, on the one hand, and the difference in *baseline salience* between word and color, on the other hand. The color Baseline task measures pure color performance: neutral words (e.g., TABLE, STREET, and CLOCK) in different colors are presented for color identification. The word Baseline task measures pure word-reading performance: Color words in uniform black are presented for word identification. Performance in these Baseline tasks can be compared to assess the ease or difficulty of classification along each dimension. Note that the Baseline tasks are *non-conflict* tasks in which the stimuli are one-dimensional. The Pearson correlation found between the word-color difference at baseline and the Stroop effect amounted to 0.78. This means that well over half of the variance in published values of the Stroop effect derives from mismatched salience between word and color. This relation is illustrated in **Figure 2**.
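The step from the correlation to the proportion of variance is the usual squaring of Pearson's $r$ (the coefficient of determination):

$$r^2 = 0.78^2 \approx 0.61,$$

that is, roughly 61% of the variance in the sampled Stroop effects is shared with the baseline word-color difference.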

The effect of relative dimensional salience is evident already in Stroop's classic study (Stroop, 1935). Stroop's participants named the colors of 100 squares (pure color condition) in 63.3 s, on average, but read 100 words in black (pure word condition) in 41 s, on average – a staggering 22 s mismatch in task difficulty favoring words. When Stroop combined the two dimensions to produce color-word stimuli, *word* reading remained almost the same as in the pure word condition (mean of 43 s), but *color* naming was worse in the combined condition than in the pure color condition (mean of 110 s). The literature focused on this asymmetry in interference rather than on the prior asymmetry in baseline performance. However, given the summary of **Figure 1**, it is the latter that produced the former. Stroop's results thus form a special case of the law by which the more salient dimension intrudes on the less salient dimension more than vice versa.

FIGURE 1 | … interference from the ink colors more than vice versa (= reverse Stroop effect).

FIGURE 2 | The influence of stimulus makeup on the Stroop effect: the larger the baseline word-color difference in salience (favoring word), the larger the Stroop effect.

## IMPLICATIONS FOR THE CONTROL APPROACH

The results obtained with respect to the factor of relative salience are devastating for a control-based explanation of the Stroop effect. Conflict and control are said to depend on the number of conflict stimuli presented, those that produce the Stroop effect. In contrast to this notion, the literature shows that the Stroop effect can differ dramatically even when the number of conflict stimuli is kept constant. The Stroop result depends critically on the input-driven feature of word-color salience – with the same number of conflict stimuli presented. The condition entailing equal discriminability of word and color (**Figure 1**, middle panel) is particularly notable. In this condition, word and color are of equal salience, so that the typical perceptual advantage favoring the word dimension is removed. Despite the presence of a large number of conflict stimuli, the Stroop effect evaporates. In summary, the overall Stroop results mandate a stimulus-driven explanation. When the nominally irrelevant dimension (word) is more salient than the target dimension (color), attention to the color is compromised and expressed as the Stroop effect. However, this result is neither robust nor inevitable (Dishon-Berkovits and Algom, 2000; Melara and Algom, 2003). The upshot is that control cannot serve as a viable explanation of the Stroop effect.

## PITFALL 4: COLOR-WORD CORRELATION AND WORD-RESPONSE CONTINGENCY RENDER CENTRAL CONTROL GRATUITOUS

Another major factor affecting the Stroop effect is the number of congruent and incongruent stimuli included in the set. Any imbalance in the respective frequencies introduces a color-word correlation over the experimental presentations. This contextual effect has been attributed to conflict and control. By contrast, we show that the effect is data driven. Let us note that virtually all Stroop studies in the literature entail a biased design in the sense that there is a difference in the frequency of congruent and incongruent stimuli, so that the study entails a color-word correlation. The presence of this correlation renders the nominally irrelevant word predictive of the target ink color. On a trial, first noticing the word provides the participant a greater than chance probability of guessing the to-be-reported color. By attending to the irrelevant word, the participant thus maximizes color performance. Because the Stroop effect gauges the influence of the irrelevant word (if there is no such influence, the Stroop effect is zero), a large color-word correlation encourages attention to the word, thereby producing a large Stroop effect. Notably, this large Stroop effect is generated by data-driven correlation, not by central control.

It might come as a surprise to realize that biased designs are used in the vast majority of published Stroop studies. Consider the standard and most popular Stroop design in the literature. Four color words are combined with the corresponding four colors in a factorial design to yield the basic matrix of 16 color-word stimuli (see **Figure 3**). Of these 16 stimuli, four are congruent (in the diagonal of the matrix) and 12 are incongruent (off diagonal). In the face of this asymmetry, investigators typically present an equal number of congruent and incongruent stimuli in the experimental block. The typical block thus includes 36 congruent and 36 incongruent stimuli. Note that this parity is *only* possible by presenting each congruent stimulus more often than each incongruent stimulus. In the popular design, each congruent stimulus is presented nine times, whereas each incongruent stimulus is presented three times to create the matched frequency of 36 presentations. The *a priori* probability of a color given a word is not equal across all colors, so that the word becomes predictive of the target color. A color-word correlation thus is created in this standard Stroop design.
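The contingency hidden in the "balanced" 36/36 design can be made explicit with a short calculation. The sketch below is illustrative only: the word and color labels are placeholders, and Cramér's V is used here merely as one convenient index of association for a frequency table; it is an assumption of this sketch, not necessarily the measure used in the studies cited.

```python
import math

WORDS = ["RED", "GREEN", "BLUE", "YELLOW"]      # illustrative word set
COLORS = ["red", "green", "blue", "yellow"]     # illustrative ink colors

def design_matrix(per_congruent, per_incongruent):
    """Presentation frequency of every word-color combination."""
    return {(w, c): per_congruent if w.lower() == c else per_incongruent
            for w in WORDS for c in COLORS}

def p_color_given_word(freq, word):
    """Conditional probability of each ink color once the word is known."""
    total = sum(freq[(word, c)] for c in COLORS)
    return {c: freq[(word, c)] / total for c in COLORS}

def cramers_v(freq):
    """Cramér's V for the word-by-color frequency table."""
    n = sum(freq.values())
    row = {w: sum(freq[(w, c)] for c in COLORS) for w in WORDS}
    col = {c: sum(freq[(w, c)] for w in WORDS) for c in COLORS}
    chi2 = 0.0
    for w in WORDS:
        for c in COLORS:
            expected = row[w] * col[c] / n
            chi2 += (freq[(w, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(WORDS), len(COLORS)) - 1)))

balanced = design_matrix(per_congruent=9, per_incongruent=3)   # 36 vs. 36 trials
uniform = design_matrix(per_congruent=3, per_incongruent=3)    # every cell equally often

print(p_color_given_word(balanced, "RED"))
print(cramers_v(balanced), cramers_v(uniform))
```

In the 9-versus-3 design, seeing the word RED raises the probability of red ink to 9/18 = 0.5, against a chance level of 0.25, whereas presenting every cell equally often leaves the word uninformative.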

In point of fact, biased Stroop designs started with Stroop himself (Stroop, 1935). In his experimental block, Stroop used only incongruent stimuli. None of the color words appeared in its own color. Unwittingly, Stroop introduced a correlation between words and colors in his list. Noticing first that the word was RED, the participant could safely infer that the ink color is not red. A sizable correlation was thus created, which, in turn, generated the large Stroop effect observed (see **Figure 4**).

FIGURE 3 | Anatomy of the standard Stroop experiment: Four color words are combined factorially with four ink colors to produce 16 color-word combinations. The entries are frequencies of presentations in 72 trials in the typical "balanced" experiment where trials in the congruent and incongruent conditions occur with equal frequency (36 congruent stimuli and 36 incongruent stimuli). The four combinations on the minor diagonal are congruent stimuli, whereas the 12 off-diagonal combinations are incongruent stimuli. The only way to equate the frequency of congruent and incongruent stimuli in the experimental block (the popular practice) is to present each congruent stimulus more often than each incongruent stimulus (in this case, three times as often). This design creates a correlation over the experimental trials between the nominally irrelevant words and the target ink colors.

In an effort to estimate the influence on the Stroop effect of word-color correlation, Melara and Algom (2003) calculated the correlations lurking in the designs of 35 experiments from the literature. They plotted the Stroop effect against the built-in correlation in the design. The results are noteworthy: the correlation between the Stroop effect and the word-color contingency in the design amounted to 0.69. This means that close to 50% of the variability in the published Stroop effects is attributable to the word-color correlation built into the design of the experiment (**Figure 5**).

If a built-in correlation exists in most standard Stroop studies, the correlation is even more marked and extreme in control studies. As we just recounted, the standard 50–50% congruency design (with four colors and four color words) already entails an appreciable correlation between the words and the colors. The grossly imbalanced congruency structure created in control studies produces an even larger color-word correlation. The common design in control studies typically entails 80% (in)congruent stimuli, which translates to a sizeable color-word correlation. Perception of this correlation suffices to explain the results.

The upshot is that the notion of fine grain, centrally imposed control is gratuitous when explaining the Stroop effect. When a correlation makes the words predictive of the colors, people attend to the word, so that exclusive attention to the color is compromised – and a large Stroop effect emerges. People are eminently sensitive to correlations between stimuli in their environment, and the Stroop effect is a manifestation of this sensitivity (Kareev, 2000).

## DIRECTIONAL PROPORTION-CONGRUITY (PC) EFFECTS

Proponents of control or conflict point to the directional effects observed in biased designs: the larger the proportion of incongruent stimuli in the set, the smaller the Stroop effect. At first glance, color-word correlation cannot generate this asymmetric outcome (the PC effect). The PC effect is a major source of evidence presented in support of the control and conflict monitoring account of the Stroop effect. On close scrutiny though, the PC effect results from a correlation between specific *words* and specific *responses* in the experiment. In all 2 (word) × 2 (color) designs or in designs in which incongruent stimuli come in a favored color (e.g., the word RED comes mostly in green), the larger the relative number of incongruent stimuli, the larger the correlation between a given word and a given response. This relation is termed the *contingency-learning* account of Stroop and PC effects (Schmidt and Besner, 2008; Schmidt, 2016a,b, 2019; Schmidt et al., 2018). The contingency account readily explains the PC effect:

… in the mostly congruent condition, words are presented most often in their congruent color (e.g., RED 75% of the time in red). As such, color words are strongly predictive of the congruent response, which benefits congruent trials. On incongruent trials (e.g., RED in green), however, the word mispredicts the color response, resulting in a cost. The net result is an increased Stroop effect. In the mostly incongruent condition, the situation is reversed. Depending on the exact manipulation, color words might be presented most often in a specific incongruent color (e.g., GREEN most often in red). Thus, words are accurately predictive of the *incongruent* response, and mispredict a congruent response. The net effect is a reduced congruency Stroop effect. *What is most interesting about the contingency learning account of the PC effect is that it is unrelated to conflict, control…* [On this account], *learning of stimulus–response correspondences is all that matters.* (Schmidt, 2016a, p. 1, emphasis added)

Schmidt's stimulus-driven account shows that the correlation created in biased Stroop designs between the words and the (color) responses readily explains the PC effects, which are otherwise attributed to conflict and control. Applying Occam's razor, Schmidt's account is favored over the central control account. We should mention that in general contingency learning is not related to attention per se. However, it is an important contextual factor within the Stroop domain (after all, Stroop is a test of selective attention). Within the Stroop task, contingency affects the selectivity of attention to the stimulus attributes, hence the magnitude of the Stroop effect observed.
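The symmetry at the heart of the contingency account can be shown in a few lines. The sketch assumes a 2 (word) × 2 (color) design and considers only the word RED; the function names and the 75% proportions are illustrative choices, not parameters taken from a specific experiment.

```python
def word_response_contingency(p_congruent):
    """P(response | word RED) in a 2 (word) x 2 (color) design."""
    return {"red": p_congruent, "green": 1.0 - p_congruent}

def best_prediction(p_congruent):
    """Most likely response given the word, with its conditional probability."""
    probs = word_response_contingency(p_congruent)
    response = max(probs, key=probs.get)
    return response, probs[response]

mostly_congruent = best_prediction(0.75)      # RED appears in red on 75% of trials
mostly_incongruent = best_prediction(0.25)    # RED appears in green on 75% of trials
print(mostly_congruent, mostly_incongruent)
```

The word is equally informative in both conditions (probability 0.75); what changes is whether the response it predicts is the congruent or the incongruent one, which is all the contingency account needs to produce a larger or a smaller Stroop effect.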

#### Are Color-Word Correlation and Word-Response Contingency Both Necessary?

The *color-word* correlation account by Melara and Algom (2003) and the *word-response* contingency account by Schmidt (2019) explain variations in the magnitude of the Stroop effect without any reference to the notions of control and conflict adaptation. The two accounts actually complement each other. On both views, the Stroop effect is the result of perception of correlation or contingency in the data (see also Lorentz et al., 2016). The correlation and contingency accounts rest on a common principle, but a word seems in order to clarify their distinct roles in the Stroop domain.

Contingency learning best explains the PC effects observed in 2 (word) × 2 (color) designs and in multi-valued designs with favorite pairings of incongruent stimuli. Color-word correlation readily explains the Stroop results obtained in the standard 4 (word) × 4 (color) designs that do not include favorite incongruent pairings. This account also explains the appearance of the Stroop effect in so-called balanced designs entailing 50–50% of congruent and incongruent stimuli. In the study by Dishon-Berkovits and Algom (2000), incongruent stimuli appeared only once under some conditions (so that contingency learning was impossible), yet the authors showed how color-word correlation produced their results in this unusual matrix. In summary, both the correlation and the contingency varieties are useful in accounting for Stroop results. Significantly, they do so without appeal to central control, conflict, or conflict adaptation.

## THE GRATTON EFFECT

As we recounted at the outset, the Gratton effect (Gratton et al., 1992) or more appropriately, the Congruency Sequence effect (Schmidt, 2013, 2019; Weissman et al., 2014), comprises arguably the strongest piece of evidence marshaled in support of the conflict monitoring account. To reconstruct the chronology, the original finding by Gratton and colleagues (Gratton et al., 1992) had lain dormant for almost a decade when it was resuscitated and brought to the fore by Botvinick et al. (2001) to support their newly formed theory of central conflict monitoring. Since the publication of the Botvinick et al. model, research on the Gratton effect has intensified appreciably, sustaining a vigorous debate on the source of the effect: genuine on-line conflict monitoring or yet another trial-sequence-based facilitation (e.g., Effler, 1978; MacLeod, 1991). Given the role of the Gratton effect in deciding the fate of the conflict-monitoring model as a Stroop theory, we devote some space to elucidate the ongoing debate.

The Gratton effect is the sequential variation by which the RT to a Stroop-incongruent stimulus is faster after experiencing another Stroop-incongruent stimulus than after experiencing a Stroop-congruent stimulus (e.g., Mordkoff, 2012; Weissman et al., 2014; Schmidt, 2019). Less attention has been given to the parallel observation that RT to a Stroop-congruent stimulus is usually faster after experiencing another Stroop-congruent stimulus than after experiencing a Stroop-incongruent stimulus (e.g., Mayr et al., 2003). This latter observation alone should have cast doubts on the validity of the conflict monitoring model as a Stroop theory. After all, congruent-congruent sequences do not entail (high) conflict, yet these sequences affect Stroop performance to the same extent as do incongruent-incongruent sequences. The possibility that *both* types of sequences are accounted for by factors unrelated to conflict becomes all the more likely. The focus on incongruent-incongruent sequences in the literature comes from the theoretical stress on conflict and its on-line resolution. On that view, the role of fine-grain central control during Stroop performance is to enhance target (color) processing and reduce task-irrelevant (word) processing on a *trial-by-trial* basis. It is these top-down penetrations that produce the Gratton effect: experiencing conflict instantly triggers control activity, which results in better performance on the immediately following trial.
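One common way to summarize these sequential variations is to compare the congruency effect after congruent trials with the congruency effect after incongruent trials. The sketch below assumes that summary; the function name `congruency_sequence_effect` and the trial list are hypothetical and serve only to make the four sequence types concrete.

```python
from statistics import mean

def congruency_sequence_effect(trials):
    """trials: list of (congruency, rt_ms) in presentation order,
    where congruency is 'C' (congruent) or 'I' (incongruent).
    Returns (sequence effect, dict of mean RT per previous+current pair)."""
    cells = {"CC": [], "CI": [], "IC": [], "II": []}
    # Pair each trial with its predecessor; the first trial has none.
    for (prev, _), (cur, rt) in zip(trials, trials[1:]):
        cells[prev + cur].append(rt)
    m = {pair: mean(rts) for pair, rts in cells.items()}
    effect_after_congruent = m["CI"] - m["CC"]
    effect_after_incongruent = m["II"] - m["IC"]
    return effect_after_congruent - effect_after_incongruent, m

# Hypothetical trial sequence (RTs in ms); not real data
trials = [("C", 600), ("C", 580), ("I", 720), ("I", 690), ("C", 620),
          ("I", 715), ("I", 695), ("C", 625), ("C", 585)]

cse, cell_means = congruency_sequence_effect(trials)
print(cse, cell_means)
```

In these invented data both halves of the pattern are present: incongruent trials are faster after incongruent trials (II < CI) and congruent trials are faster after congruent trials (CC < IC). A positive `cse` by itself does not say which of the two contributes more.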

## THE MAYR ET AL. CHALLENGE

Barely a year after the formal development of the central-conflict-monitoring model (Botvinick et al., 2001), Mayr et al. (2003) challenged the ability of the model to provide a valid account of the Gratton effect. In their seminal study, Mayr et al. (2003) pinpointed correctly a central (if implicit at that point) assumption of the conflict monitoring model: The conflict that regulates performance is *stimulus-independent*. According to the conflict monitoring model, the incongruent-incongruent sequence of RED in green-RED in green (complete repetition) should produce the same adaptation as the incongruent-incongruent sequence of RED in green-BLUE in yellow (complete change). According to conflict monitoring theory, it is the conflict that counts, not the means of generating it. Mayr et al. (2003) have shown in contrast that the Gratton effect is profoundly stimulus dependent.

Mayr et al. (2003) used the flanker task [2(targets) × 2(flankers)], noting that complete repetitions comprise 50% of the incongruent-incongruent sequences in any standard flanker task (as do 50% of the congruent-congruent sequences). They recorded the typical Gratton effect in their experiment. However, when the authors examined their data separately for sequences of complete repetition and sequences entailing change, they found the Gratton effect only for the former. Mayr et al. (2003) concluded that "stimulus specific repetition … can provide a complete explanation of the … pattern observed" (p. 451). The authors then conceived a second flanker experiment where immediate complete repetitions were eliminated altogether and where response repetitions were also eliminated (by presenting the flanker display horizontally or vertically on alternate trials and requiring appropriate left-right or up-down responses). Note that the absence of repetitions is irrelevant for the conflict monitoring account, but it is critical for accounts based on input-driven processes (in particular, on priming of complete repetitions). The latter account predicts that eliminating repetitions should eliminate the Gratton effect. Consistent with this prediction, no Gratton effect was observed in Mayr et al.'s (2003) second experiment.
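The 50% figure for complete repetitions follows from simple enumeration. The sketch below assumes that the four displays of a 2 × 2 flanker task are equally likely and sampled independently from trial to trial; the arrow symbols are placeholder stimuli, not those of the original study.

```python
from itertools import product

TARGETS = ["<", ">"]     # placeholder target identities
FLANKERS = ["<", ">"]    # placeholder flanker identities

# Each display is a (target, flanker) pair; it is congruent when they match.
stimuli = list(product(TARGETS, FLANKERS))

def is_congruent(stimulus):
    target, flanker = stimulus
    return target == flanker

def complete_repetition_share(congruent):
    """Share of same-congruency trial pairs that repeat the exact display."""
    pool = [s for s in stimuli if is_congruent(s) == congruent]
    pairs = list(product(pool, pool))     # (trial n-1, trial n)
    repeats = sum(1 for prev, cur in pairs if prev == cur)
    return repeats / len(pairs)

share_ii = complete_repetition_share(congruent=False)
share_cc = complete_repetition_share(congruent=True)
print(share_ii, share_cc)
```

With only two displays of each congruency type, every same-congruency pair is either an exact repetition or a complete change of both target and flanker, which is why the two kinds of sequence can be analyzed separately.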

Mayr et al. (2003) noticed a further feature of the data that was inconsistent with the conflict monitoring account. Although immediate repetitions were avoided in their second experiment, such repetitions could and did occur between trial *n*−2 and trial *n*. Stimulus-driven accounts predict that an attenuated Gratton effect should still appear on such trial *n*−2 to trial *n* repetitions. The conflict monitoring account, by contrast, lacks a mechanism that allows for adaptation to occur across non-conflicting intermediate trials. The results disconfirmed the central-control model, showing instead the presence of adaptation across non-adjacent repetitions. Mayr et al. (2003) stated in their conclusion that "conflict-triggered control is not necessary to explain the [Gratton] effect" (p. 452), that "regulative demands are bypassed by stimulus-driven repetitions" (p. 452), thereby justifying their title on the presence of the Gratton effect "in the absence of executive control."

## RECENT GRATTON RESEARCH

Mayr et al.'s (2003) formative study heavily impacted Gratton research in the ensuing two decades (see Schmidt, 2019, for a review of this research). The Mayr et al. (2003) study made it clear that the standard 2 (targets) × 2 (flankers) flanker task is hopelessly biased by stimulus-stimulus and stimulus-response correlations. The same confounds apply to the Simon task (Simon, 1969; Simon and Berbaum, 1990; see also Hatukai and Algom, 2017) and to the small-set version [2 (words) × 2 (colors)] of the Stroop task. To remove the biases from the Stroop, Simon, and flanker tasks (the last being by far the most popular test used), succeeding investigators applied both of Mayr et al.'s (2003) strategies: statistical and experimental. The first approach allows for stimulus repetitions (complete or of component features) to occur but removes them statistically in subsequent analysis (e.g., Schmidt and De Houwer, 2011; see also Mordkoff, 2012). In the second approach, stimulus and response repetitions are not presented or allowed in the experiment itself. To exclude repetitions from the experimental design, most researchers employed Mayr et al.'s (2003) alternate horizontal-vertical procedure, often extending the flanker design in time (e.g., Schmidt and Weissman, 2014). The overall results obtained (in both approaches) do not support the conflict monitoring account.

Because our goal in this critique is *conceptual* scrutiny, we next highlight just a few important points (again, see Schmidt, 2019, for a detailed review of recent research). The goal of studies adopting the second "experimental approach" was to test the presence of the Gratton effect under sterile, confound-free stimulus conditions. If the Gratton effect still emerges under such conditions, the central control account gains powerful support. Consequently, strenuous attempts have been made to purge all species of stimulus- and response-based contingencies from the experiment. Unfortunately, the elimination of the confounds came at the cost of eliminating the flanker task itself, i.e., deforming it in a significant way. The popular tactic has been using Mayr et al.'s (2003) horizontal-vertical alternation and extending the task in time, so that the target display is preceded by an advance cue (e.g., Kunde and Wühr, 2006; Schmidt and Weissman, 2014; Weissman et al., 2014). However, this tactic likely compromised the nature of the flanker task as an interference design, so that the results obtained probably hinged on the perceived validity of the advance cue. We note in parenthesis that the alternation procedure itself might invite unrelated processes into the experiment (e.g., benefits/costs of switching; see also, Schmidt and De Houwer, 2011). It is moot whether the "Gratton effect" observed in such temporal prime-probe tasks is truly comparable with the original effect observed in the standard flanker task. The following Gedanken experiment can clarify this issue, i.e., how the "Gratton effect" can be observed in the absence of conflict or interference.

Suppose that the target display is a shape in color and that the task is to name the color. On different trials, the shape can be a triangle or a circle and its color can be red or green. Suppose further that the display is preceded by a prime, a patch of red or green color. Clearly, a red triangle is not a conflict stimulus, yet a spurious "Gratton effect" may well be observed in this conflict-free task. The prime-probe experiments in the literature, while tightly controlled for stimulus and response confounds, might not comprise a real test of the source of the Gratton effect. The results obtained in the confound-free, prime-probe, and temporal flanker experiments are commensurably mixed and difficult to interpret. Some studies reported the Gratton effect (e.g., Schmidt and Weissman, 2014; Weissman et al., 2014), but further features of the results are difficult to interpret and are certainly inconsistent with a conflict monitoring account. For example, Weissman et al. (2014) did not find a correlation between the Gratton effect and the flanker effect and sometimes recorded a negative Gratton effect (a larger flanker effect after incongruent-incongruent sequences). Note that a negative Gratton effect is impossible under conflict monitoring.

Considering the Stroop effect itself, methodological problems have been plaguing that research, too. Following the Mayr et al. (2003) study, the 2 (words) × 2 (colors) task is no longer feasible due to the stimulus and response correlations inhering in this design. The popular 4 (words) × 4 (colors) design (see **Figure 3**) obviously is more appropriate, but there exists the problem of the relative number of congruent stimuli. As we have shown, the popular 50%–50% congruent-incongruent ratio entails a sizeable correlation, biasing performance (Dishon-Berkovits and Algom, 2000; Melara and Algom, 2003; Schmidt and Besner, 2008). Only a truly random allocation of the colors to the words can eliminate this bias. Random combinations in a 4 × 4 design entail a rate of 25% congruent stimuli. However, even this regime is open to further biases related to stimulus sequences. Removing all confounds from the Stroop task (if at all possible) remains a daunting task (Mordkoff, 2012; see also Sabri et al., 2001; Melara and Algom, 2003; Hommel et al., 2004; Schmidt and De Houwer, 2011). Existing research did not match those exacting standards. For example, Weissman et al. (2014) used four color words and four colors but paired each word with only two of the colors. The study by Mayr and Awh (2009) came close with the authors using a large set of 6 (words) × 6 (colors) and changing the rate of congruent stimuli across separate blocks of the Stroop task. The block with lowest rate included 30% congruent stimuli, a figure which still deviated appreciably from random allocation (the full matrix of 36 color-word combinations includes six congruent stimuli or 17%, not 30%; see also Schmidt and De Houwer, 2011). The problems granted, most important for the present concerns is the uniform absence of adaptation or the Gratton effect in the classic Stroop task, a consistent result in studies using either the statistical approach or the experimental approach [we should mention that Duthoo et al. (2014) recorded the Gratton effect in their Stroop tasks, but, again, the control against biases was less than compelling].
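The congruency rates quoted for random allocation follow from counting cells in the design matrix. Assuming every one of the $n^2$ word-color combinations is equally likely, the $n$ congruent cells make up

$$\frac{n}{n^2} = \frac{1}{n}, \qquad n = 4 \Rightarrow \frac{1}{4} = 25\%, \qquad n = 6 \Rightarrow \frac{1}{6} \approx 17\%$$

of the trials, so any higher proportion of congruent trials requires presenting congruent combinations more often than incongruent ones.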

We conclude with four final observations. First, the hallmark of modern Gratton research is the stimulus dependence of adaptation. Minor changes in preparation and paradigm can determine the presence or magnitude of the Gratton effect. For example, in prime-probe studies, the spatial location of the prime and the probe (same, different) greatly affects the outcome. In a similar vein, stimulus overlap and response overlap in cross-task Gratton studies are a major determinant of adaptation. These observations violate the basic assumption of the conflict monitoring account on the *stimulus-independence* of adaptation. Second, another basic (if unarticulated) assumption of conflict monitoring is that adaptation is *task-independent*. In violation of this assumption, recent research has shown that adaptation is singularly task-dependent. The Gratton effect can be observed in the Simon task but not in the Stroop or in the flanker task using the same design within the same study (Weissman et al., 2014). Conflict adaptation typically does not generalize across tasks. And, when conflict in the Stroop task results in adaptation on the next conflict trial in the Simon task, the transfer is typically explained by shared features and task sources. Third, the observation that congruent-congruent sequences produce the same result as incongruent-incongruent sequences implies that the Gratton effect is not related to conflict. Our fourth and final observation is methodological. Extant Gratton research treats "interference tasks" such as those of Stroop, Simon, and flanker on the same footing. However, not all interference or conflict tasks are the same (Chajut et al., 2009). Thus, the flanker and Simon tasks entail spatial attention, with targets and distractors separated in space. The Stroop task, by contrast, does not entail spatial attention: The color and the word occupy the same location in space, so that space-based attention to isolate the target is impossible. In the Stroop task, people dissect mentally the stimulus object in order to respond to the task-relevant feature.

On balance, the available evidence with regard to the Stroop or Gratton effect is inconsistent with the account of centrally guided conflict monitoring. Instead, it is local, input-driven bottom-up processes that likely generate the Gratton phenomenon (when it is observed). It is important to bear in mind that there is in fact a long history of research on sequential effects in the Stroop task. Dalrymple-Alford and Budayr (1966) may have been the first authors to report such effects more than half a century ago. In subsequent research, a fair number of sequential effects have been documented, some entailing interference and some, like the Gratton effect, facilitation (see MacLeod, 1991, for a review). Notably, none of the authors associated with the various effects thought it necessary to evoke the heavy machinery of centrally controlled conflict management as an explanatory device. Given the variety of sequential effects identified within basic Stroop research, the reader may well perceive that there is something not altogether satisfactory about the disproportionate exposure and study of a single facilitatory effect. The reason (not justification) for that one-sided research is obvious: the Gratton effect has been imported to a theory and domain that, at its roots, is foreign to the Stroop effect.

## CONCLUSION

Performance in the Stroop task and the resulting Stroop effect do not seem to involve higher-order cognitive level processes of control, nor does it seem likely that minute top-down penetrations determine responding in the Stroop and allied tasks. The particular theoretical embodiment assuming such trial-by-trial top-down penetrations, the account called conflict monitoring, is not optimally suited to explain the gamut of results obtained over the years in the vast Stroop literature. The conflict monitoring account does not even recognize the existence of major Stroop variables apart from the duo of the PC and Gratton effects (see MacLeod, 1991 and Melara and Algom, 2003, for reviews of Stroop research). Focusing solely on that pair of effects, most monitoring studies are compromised by the input-based confounds noted. The few confound-free studies that did demonstrate adaptation (most did not), allegedly supporting central control, ignored alternative input-based explanations, at once more plausible and parsimonious. We believe that the converging evidence provided by the findings reviewed in this article confirms the lawful dependence of the Stroop effect on input factors and seriously challenges centrally controlled conflict monitoring as a valid theory of the Stroop effect. All facets of the effect are explained in a straightforward fashion by input-driven selective attention (indeed, its failure). Concerning the PC and Gratton effects in particular, all that is truly involved is perception of color-word correlation and of word-response contingency.

This much granted, we realize that conflict monitoring modelers (e.g., Yeung et al., 2011) may agree with the importance of the factors uncovered in basic Stroop research but maintain that conflict monitoring also plays a role in addition to these factors. This way of reasoning is depicted in **Figure 6**. Conflict monitoring theory basically entails that conflict (B) drives control (C) so that they produce the Stroop outcome, notably including the PC and Gratton effects (D). Monitoring modelers probably have no problem with the link between (A), the basic Stroop variables reviewed in this paper, and (B). At first glance, the relation between (A) and (B), the primary theme of this review, might be regarded as orthogonal to the validity of the conflict monitoring account. However, the present review makes it eminently clear that one can get directly from (A) to (D), so that (B) and (C) are not needed. In other words, once one is willing to accept the principles learned from basic Stroop research, conflict monitoring and control are superfluous added assumptions.

Of course, there is a trivial sense in which people willfully apply control over what they do and experience. They come to the lab as planned, they choose to perform with their eyes open, and they are in charge of many other perfunctory chores. In the Stroop task itself, people follow quite successfully the instructions to name the colors and ignore (overtly at the least) the words. Indeed, there are task-demand units already included in the computational model of Cohen et al. (1990). For example, in the study by Bauer and Besner (1997), the mental set espoused by the observer determined the Stroop outcome with the same stimuli and the same responses. We acknowledge of course these instances of control, but they do not serve (nor are they meant to serve) as a comprehensive theory of the Stroop effect.

Pursuant to the previous point, we also acknowledge that the control and conflict monitoring account includes the notion of attention. However, "attention" in this model is a generic process, governed centrally (by a homunculus?), and, like "conflict," is not rigorously defined. By contrast, attention as studied in the Stroop literature is a well-defined process of selectivity. It is concerned with determining the quality of focusing on the task-relevant attribute while ignoring irrelevant information. The whole process is governed by bottom-up contextual factors.

FIGURE 6 | Possible chain of reasoning accommodating both the basic Stroop findings reviewed in the paper and the conflict monitoring and control account. Briefly, basic Stroop variables (A) drive conflict (B), which, in turn, drives control (C), so that they produce (D) the Stroop outcome, including PC and Gratton effects. The conflict monitoring model basically entails that B and C produce D. However, since it is possible to get directly from A to D, the conflict monitoring model is gratuitous as a Stroop theory.

Perhaps, also, there would be something instructive to be gained from the way that proponents of control theory come close to espousing the present view in certain cases. These researchers are just unable to jettison the underdefined concept of control even when it is clearly unwarranted to make their case. Thus, Julie Bugg, a leading investigator of control, proposed to classify the accounts of Stroop performance into *expectation-based* and strategically guided accounts versus *experience-based* and reactive adjustment accounts (e.g., Bugg et al., 2015). The latter class is comparable to the present approach, but then the authors hasten to add that "experience-based accounts also subsume conflict-monitoring accounts" (Bugg et al., 2015, p. 1350). The same indetermination marks Tom Braver's influential model, the Dual Mechanisms of Control (DMC; Braver, 2012). Braver, a foremost researcher of control, proposes to distinguish between two species of control, "proactive control" and "reactive control." The former acts strategically through top-down adjustments, whereas the latter acts locally in response to the stimulus that has just occurred. Concerning reactive control, Braver states that "[it] is stimulus driven and transient … is stimulus dependent … [and] is reliant on strong bottom-up … cues" (Braver, 2012, p. 108). Remove "control" from Braver's depiction and you have the view that we are presenting here. The problem we noted is that there does not seem to be any process exempt from control in the view of Braver (and of other proponents of control), thereby undermining the value of "control" as a useful scientific concept. Retaining "control" in all places and instances may be due to the peculiarity of these investigators' disposition: associating each trifling mental act with a specific brain structure and activation (Braver, for one, claimed to pinpoint different loci and activation for proactive and reactive control). However, such activations have not been shown to be *uniquely* linked to a specific act or task, and, in any case, recording activation in brain loci does not *ipso facto* comprise a theory and explanation.

Our skeptical conclusions agree with those arrived at by Schmidt (2019) and by Firestone (2013) and Firestone and Scholl (2016) in the general domain of alleged top-down influences in perception. To echo Firestone (2013), the deepest shortcoming of central conflict monitoring theory is not the lack of support in most available evidence, but that it is simply the wrong kind of theory for the Stroop effect that it has appropriated from input-driven attention.

#### AUTHOR CONTRIBUTIONS

Both authors contributed equally to the manuscript.

#### FUNDING

Preparation of this paper was supported, in part, by an Israel Science Foundation Grant (ISF-274-15) to DA.

#### ACKNOWLEDGMENTS

We thank our two reviewers for their very helpful comments on earlier drafts of this project. In particular, James Schmidt's expert input was invaluable in improving the present paper. We also thank Hagar Cohen for her generous assistance with all phases of the work on this project.

#### REFERENCES

Copi, I. M. (2015). *Symbolic logic*. 5th Edn. Indiana: Pearson.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: an integrative review. *Psychol. Bull.* 109, 163–203. doi: 10.1037/0033-2909.109.2.163


contingency. *J. Exp. Psychol. Learn. Mem. Cogn.* 34, 514–523. doi: 10.1037/0278-7393.34.3.514


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Algom and Chajut. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Priming Emotional Salience Reveals the Role of Episodic Memory and Task Conflict in the Non-color Word Stroop Task

Chiao Wei Hsieh\* and Dinkar Sharma

School of Psychology, University of Kent, Canterbury, United Kingdom

Previous research attempted to account for the emotional Stroop effect based on connectionist models of the Stroop task that implicate conflict in the output layer as the underlying mechanism (e.g., Williams et al., 1996). Based on Kalanthroff et al.'s (2015) proactive-control/task-conflict (PC-TC) model, our study argues that the interference from non-color words (neutral and negative words) is due to task conflict. Using a study-test procedure, 120 participants (59 high and 61 low trait anxiety) studied negative and neutral control words prior to being tested on a color responding task that included studied and unstudied words. The results for the low anxiety group show no emotional Stroop effect, but do demonstrate a slowdown in response latencies to a block of studied and unstudied words compared to a block of unstudied words. In contrast, the high anxiety group shows (a) an emotional Stroop effect but only for studied negative words and (b) a reversed sequential modulation in which studied negative words slowed down the color-responding of studied negative words on the next trial. We consider how these findings can be incorporated into the PC-TC model and suggest the interacting role of trait anxiety, episodic memory, and emotional salience driving attention that is based on task conflict.

Keywords: emotional Stroop interference, task conflict, proactive control, reactive control, reversed sequential modulation, priming effect, anxiety

#### Edited by:

Thomas Kleinsorge, Leibniz Research Centre for Working Environment and Human Factors (IfADo), Germany

#### Reviewed by:

Eyal Kalanthroff, Hebrew University of Jerusalem, Israel

Roland Pfister, Julius Maximilian University of Würzburg, Germany

#### \*Correspondence:

Chiao Wei Hsieh ch638@kent.ac.uk

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 15 May 2019 Accepted: 23 July 2019 Published: 07 August 2019

#### Citation:

Hsieh CW and Sharma D (2019) Priming Emotional Salience Reveals the Role of Episodic Memory and Task Conflict in the Non-color Word Stroop Task. Front. Psychol. 10:1826. doi: 10.3389/fpsyg.2019.01826

#### INTRODUCTION

The Stroop task is often used to investigate executive control processes, in particular the ability to selectively attend to relevant information and ignore irrelevant information (Stroop, 1935). The most common form of the task is one in which a word is printed in an ink color, and the instruction is to report the ink color and ignore the word. Typically, with color words the word and ink color can be congruent (e.g., the word RED printed in red) or incongruent (e.g., the word GREEN printed in red), with the difference in reaction time (RT) used to measure the Stroop effect. A neutral control (e.g., XXXX printed in red) can also be used to separate the Stroop effect into interference (difference between incongruent and neutral trials) and facilitation (difference between congruent and neutral trials) effects (MacLeod, 1991).

The Stroop effect is thought to result from two types of conflict: informational conflict and task conflict. Informational conflict is thought to depend on the congruency between the word and ink color, with conflict arising when the meaning of the word and the ink color contradict each other (Klein, 1964; though see Shichel and Tzelgov, 2018 for further decomposition of informational conflict). Task conflict occurs between two potentially competing tasks. This can occur when certain stimuli become associated with certain tasks. For example, words tend to activate reading processes, which results in competition between the task of reading and the task of responding to the ink color (MacLeod and MacDonald, 2000; Goldfarb and Henik, 2007; Kalanthroff et al., 2013a,b; Entel and Tzelgov, 2018; Sharma, 2018).
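The decomposition of the Stroop effect into interference and facilitation described in this introduction can be made concrete with a minimal sketch. The function name, the sign convention (positive facilitation meaning faster congruent than neutral responses), and the RT values are illustrative assumptions, not data from this study.

```python
def stroop_measures(rt_congruent, rt_incongruent, rt_neutral):
    """Return the Stroop effect and its two components, in ms.

    Sign convention (an assumption for this sketch): positive
    interference means incongruent trials are slower than neutral
    trials; positive facilitation means congruent trials are faster
    than neutral trials.
    """
    return {
        "stroop_effect": rt_incongruent - rt_congruent,
        "interference": rt_incongruent - rt_neutral,
        "facilitation": rt_neutral - rt_congruent,
    }


# Hypothetical mean RTs (ms), for illustration only.
example = stroop_measures(rt_congruent=650.0, rt_incongruent=750.0, rt_neutral=680.0)
print(example)
```

Under this convention the Stroop effect equals the sum of interference and facilitation, and a reversed facilitation effect would appear as a negative facilitation value.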

Connectionist models have been used to develop theoretical accounts of the Stroop effect (Cohen et al., 1990; Botvinick et al., 2001). Central to these models is the flow of information from an input layer (color and word units) to an output layer (color response units). In addition, a task demand layer (color naming and word reading units) is included to bias information flow based on task goals (e.g., instructions to focus on color naming) between the input and output layers. In such models, informational conflict results from competition between the output units (referred to as response conflict). Although early models relied on information flow in a bottom-up fashion, later models also allowed for a proactive top-down control mechanism (Botvinick et al., 2001; De Pisapia and Braver, 2006; Braver, 2012) to help maintain focus on the task goal. One source of evidence to support a proactive mechanism of control is the sequential modulation effect (aka the Gratton effect), in which incongruent trials are responded to faster when their previous trials are also incongruent than when they are congruent (Gratton et al., 1992; Kerns et al., 2004). It is thought that the attentional system monitors the degree of response conflict (a conflict monitoring node), and uses this to proactively increase the activation to the task goal of color naming to help reduce interference from words on subsequent trials (Botvinick et al., 2001). It is thought that the anterior cingulate cortex (ACC) is involved in the conflict monitoring mechanism (Botvinick et al., 2001). A more recent model, the Proactive-control/task conflict (PC-TC) model (Kalanthroff et al., 2015, 2018), inherits the response conflict mechanism from earlier models, but in addition includes a mechanism for task conflict. Kalanthroff et al. (2015, 2018) suggested that task conflict arises from the inhibitory connection between the task demand layer and the output layer (implemented by raising the response threshold for all the units in the output layer), where the level of inhibition is determined by the level of competition between the task demand units (color naming and word reading) (see **Figure 1**).

Support for the PC-TC model comes from several sources. First, there is the reversed facilitation effect, in which congruent words take longer to respond to than non-words under low proactive control (PC) (for a review see Kalanthroff et al., 2018). Here it is thought that the word reading task demand unit is activated by the congruent word in a bottom-up fashion to produce greater task conflict with color naming, compared to a non-word. Second, Sharma (2018) provided evidence for the influence of task conflict using the non-color word Stroop task. Sharma used a priming procedure in which participants learned neutral words during a study phase (see also MacLeod, 1996). A subsequent testing session included two types of blocks: a block of unstudied words and a mixed block of studied and unstudied words. In both testing blocks the task was to ignore the words and respond to the ink color. Primed words resulted in slower responses to all studied and unstudied words in the mixed block compared to the unstudied block. Sharma suggested that the PC-TC model could explain this finding by assuming an episodic memory unit that holds the studied words temporarily and activates the word reading task demand unit, which can result in task conflict (see **Figure 1**). In addition, Sharma showed that in the second half of the mixed block, when presumably PC diminishes, there was a reversed sequential modulation in which studied words had longer latencies when preceded by studied words, compared to when preceded by unstudied words (for a similar finding with studied non-words see Dumay et al., 2018). This is consistent with a task conflict explanation that is due to reactive control from studied words.

Although much of the research using the Stroop task has focused on using color words, there is considerable evidence that non-color words can also slow down response latencies (Klein, 1964; Sharma and McKenna, 1998; Burt, 2002). One of the most common non-color word versions of the Stroop task is one in which negative emotional words are compared to neutral words, often labeled the emotional Stroop task (Williams et al., 1996; Algom et al., 2004; McKenna and Sharma, 2004; Dalgleish, 2005). The emotional Stroop task has been widely used to investigate attentional bias in anxiety and other emotional disorders such as depression, phobias, post-traumatic stress disorder (PTSD), obsessive-compulsive disorder, and panic disorder. The difference in response latencies between negative emotional and neutral words is referred to as the emotional Stroop effect. Findings suggest that both non-clinical individuals with high trait anxiety and clinically anxious individuals show attentional bias toward threat-related words, whereas such threat-related bias is not found in non-anxious individuals (Bar-Haim et al., 2007; Phaf and Kan, 2007; Yiend, 2010).

Following the connectionist model of Cohen et al. (1990), previous models to explain the emotional Stroop effect have tacitly assumed that emotional interference occurs at the output layer due to response conflict. Williams et al. (1996) hypothesized that input units which represent negative emotional words could have higher resting activation levels (implemented by regulating the gain parameter). Consequently, the greater activation throughout the negative emotional word processing pathway results in greater competition with color response units in the output layer. Matthews and Harley (1996) hypothesized that attentional bias is contingent on the allocation of voluntary attention to threat stimuli. Adapting from Cohen et al.'s (1990) model, they introduced a threat monitoring unit in the task demand layer (to simulate trait-like effects), as well as an emotion word unit in the input layer. When a threatening word is presented, the active threat monitoring task demand unit would sensitize the threat emotional word processing pathway which would result in greater response conflict in the output layer. An alternative model for negative emotional interference was provided by Wyble et al. (2008), who suggested that negative words affected the balance of control proactively. This was implemented by mutual inhibition between the conflict monitoring unit and the negative emotional unit in their adaptive attentional control layer, and supports the conclusion that negative emotional words reduce proactive control to the task goal of color naming. This approach is also consistent with other more general models that make similar predictions, such as the Dual Competition Model (Pessoa, 2009) and the attentional control theory (Eysenck et al., 2007).

Since the role of task conflict has not been considered in connectionist models of the emotional Stroop effect, here we consider how this might be implemented. In the PC-TC model this can occur in a number of ways, but one way might be by greater activation of the word reading task demand unit. The word reading task demand unit can be activated in two ways, either in a bottom-up reactive fashion (e.g., by activation from negative word input units) or in a top-down proactive control mechanism (e.g., by a threat monitoring task demand unit in high anxious individuals or more generally by priming from negative schemas) that enables the word reading task demand unit to compete with the color naming task demand unit. Evidence for both mechanisms was provided by Sharma (2018) when comparing trials within and between blocks. Between blocks proactive control was evidenced as a general slowdown, in particular the neutral words in the block containing studied words were slower than those in a block without studied words. On the other hand, within a block of studied and unstudied words, an indication of reactive control came from a reversed sequential modulation in which studied words were slower to respond to when preceded by another studied word than an unstudied word.

The main aim of our research was to use the priming procedure developed by Sharma (2018) to investigate further evidence for the role of task conflict in the non-color word Stroop task. In our experiment, participants study both negative and neutral words during an initial study phase, which is followed by a test phase comprising four blocks with different word categories: (1) a block of unstudied neutral words, [C]; (2) a block of unstudied negative and neutral words, [NC]; (3) a mixed block of studied and unstudied neutral words, [CsC]; and (4) a mixed block of studied negative and unstudied neutral words, [NsC]. This leads to seven word categories, represented by the following labels (letters within square brackets refer to the type of block, and letters outside the square brackets refer to the type of word): [C]-C, [NC]-C, [NC]-N, [CsC]-C, [CsC]-Cs, [NsC]-C, and [NsC]-Ns. As previous research highlights differential results for high and low anxiety with negative emotional stimuli, we also investigate the role of trait anxiety (Kalanthroff et al., 2016).
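The labeling scheme above can be sketched as a small mapping from block type to the word types it contains. The variable and function names are illustrative assumptions; only the block and word labels come from the text.

```python
# Block types and the word types each contains, as described in the text.
# C = unstudied neutral, N = unstudied negative,
# Cs = studied neutral, Ns = studied negative.
BLOCKS = {
    "C": ["C"],
    "NC": ["C", "N"],
    "CsC": ["C", "Cs"],
    "NsC": ["C", "Ns"],
}


def word_categories(blocks):
    """Build '[block]-word' labels for every word type in every block."""
    return [f"[{block}]-{word}" for block, words in blocks.items() for word in words]


CATEGORIES = word_categories(BLOCKS)
print(CATEGORIES)
```

Enumerating the mapping yields the seven word categories that serve as levels of the within-subject factor.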

We expected to replicate Sharma's (2018) finding of a general slowdown for the block containing studied words, [CsC], compared to the block of unstudied neutral words, [C], which is an indicator of task conflict from proactive control. We also extend this research to studied negative words and expected to find a similar general slowdown for the [NsC] block compared to the unstudied [C] block.

If there is a general hypervigilance for negative stimuli in high anxiety, then this may appear either as longer response times for negative words than neutral words in [NC] or [NsC], and/or as a general slowing in block [NC] or [NsC] compared to [C]. However, previous research on mixing negative and neutral words has shown weak effects (Williams et al., 1996). Indeed, there is strong evidence that priming plays an important role in the emotional Stroop effect (Richards et al., 1992; Holle et al., 1997; Lundh and Czyzykow-Czarnocka, 2001). For example, Richards et al. (1992) showed that high anxious participants do not show an emotional Stroop effect when neutral and negative words were randomly mixed. However, a more robust effect occurred after negative mood induction or when negative and neutral words were blocked during the test (see also Holle et al., 1997). Priming the anxiety schema prior to testing can also have similar effects (see Lundh and Czyzykow-Czarnocka, 2001). This suggests that negative words produce interference in high anxiety but only when they have been primed. In line with Richards et al. (1992) we expected to find an emotional Stroop effect for high anxious participants in the block containing studied negative words, [NsC]. Comparing the neutral words in the [NsC] block and the [C] block could help to distinguish between response conflict and task conflict. The general prediction is that if negative stimuli increase response conflict, then response latencies will speed up across trials due to the feedback from conflict monitoring increasing activation of the color naming task demand unit (Botvinick et al., 2001). If negative words increase activation of the word reading task demand unit, then the PC-TC model would predict a slower response to neutral words in the [NsC] block than the [C] block.

In line with Attentional Control Theory, we also expected there to be a reduced effect of proactive control in high trait anxiety (Eysenck et al., 2007; Berggren and Derakshan, 2013; Kalanthroff et al., 2016). A reduced proactive control could be seen as a general slowdown from studied words that is larger in the low anxiety group than the high anxiety group. In addition, it suggests that further analysis of the mixed blocks may show signs of reactive control that is more apparent in the high anxiety group than the low anxiety group. In particular we contrasted pairs of consecutively presented trials: CsCs or NsNs trials with CCs or CNs trials, respectively. If the effects of reactive control are due to response conflict, then the PC-TC model predicts a sequential modulation effect in which studied words are faster to respond to after studied words. However, as shown by Sharma (2018), if the effects of reactive control lead to task conflict, then the PC-TC model predicts a reversed sequential modulation effect: slower responses to studied words on the trial after a studied word.
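The trial-pair contrast described above can be sketched as follows. This is a simplified illustration: the function name and the trial sequence are hypothetical, and error trials and excluded trials are ignored.

```python
from statistics import mean


def sequential_contrast(trials, studied="Cs", unstudied="C"):
    """Mean RT on studied trials that follow a studied trial, minus
    mean RT on studied trials that follow an unstudied trial.

    `trials` is a list of (word_type, rt_ms) tuples in presentation
    order. A positive value corresponds to a reversed sequential
    modulation (slower after a studied word); a negative value
    corresponds to the standard sequential modulation.
    """
    after_studied, after_unstudied = [], []
    for (prev_type, _), (cur_type, cur_rt) in zip(trials, trials[1:]):
        if cur_type != studied:
            continue  # only studied words on the current trial are compared
        if prev_type == studied:
            after_studied.append(cur_rt)
        elif prev_type == unstudied:
            after_unstudied.append(cur_rt)
    return mean(after_studied) - mean(after_unstudied)


# Hypothetical trial sequence from a [CsC] block, for illustration only.
trials = [("C", 700), ("Cs", 720), ("Cs", 760), ("C", 690), ("Cs", 710), ("Cs", 750)]
contrast = sequential_contrast(trials)
print(contrast)
```

For the [NsC] block the same function could be called with `studied="Ns"`, which contrasts NsNs pairs with CNs pairs.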

## MATERIALS AND METHODS

#### Participants

A total of 120 native English-speaking students from the University of Kent took part in this study for course credits or 5 pounds in cash. The sample comprised 104 females and 16 males, aged 18–49 (M = 20.72, SD = 4.755). Ethical approval was given by the School of Psychology Ethics committee at the University of Kent.

#### Design

A 7 × 2 mixed factorial design was employed. Word category ([C]-C, [NC]-C, [NC]-N, [CsC]-C, [CsC]-Cs, [NsC]-C, and [NsC]-Ns) was the within-subject factor, and Trait group (high, low) was the between-subject factor. The dependent variable was the mean response latency on correct trials.

#### Apparatus and Materials

The experiment program was written in PsychoPy 1.83.04 and presented on a 21-inch Dell® widescreen monitor. RT was measured during the Stroop tasks. The manual responses, presentation, and randomization of the words were controlled by PsychoPy 1.83.04. The words used are shown in **Table 1**.

A total of 40 negative emotional words were chosen from Affective Norms for English Words (Warriner et al., 2013) and separated into two sets of 20 words. 120 neutral words were selected from the English Lexicon Project (Balota et al., 2007) and divided into six sets of 20 words. Each set contained an equal number of 4, 5, 6, 7, 8, 9, and 10 letter words, which were matched for word frequency (average Log frequency HAL of 8.84), which was in the midrange for the corpus of words (Range 0–17) (Balota et al., 2007); word valence (average valence mean of 2.56 and 5.59 for negative emotional and neutral words, respectively) (Range 1.26–8.53), and word arousal (average arousal mean of 5.52 and 3.87 for negative emotional and neutral words, respectively) (Range 1.6–7.79) (Warriner et al., 2013).

#### Procedure

An information sheet and a consent form were given to each participant upon their arrival. After signing the consent form, participants sat in front of the PC monitor and were asked to read through the instructions for the experimental procedure. There were four phases in this study: the study phase, test phase, recall phase, and questionnaire phase.

#### Study Phase

Each participant was shown 40 words in white print on a black background, comprising 20 negative emotional words from one of the two negative emotional word sets and 20 control words from one of the six neutral word sets, and was asked to memorize them as best they could. To help participants memorize the words, a five-point grading scale was presented after each word (1 = 0%, 2 = 25%, 3 = 50%, 4 = 75%, and 5 = 100%), on which they rated how strongly the word related to themselves. Each word was presented one at a time in white print at the center of the screen for 1500 ms, followed by an 800 ms blank screen prior to the five-point grading scale. The grading scale remained on screen until a response was given, after which the next word was shown.

#### Test Phase

Practice trials, which consisted of 20 non-words (e.g., dfbvxz, whcag, and vfjtd), were provided before the experimental Stroop task. These 20 non-words were printed in each of four colors (red, green, blue, and yellow) on a black background, giving 80 trials that were displayed in random order. Each trial started with a 1000 ms fixation at the center of the screen, followed by a non-word which remained until a response was provided before the next trial started. Participants were instructed to place their index and middle fingers from each hand on top of four keys (z = red, x = green, n = blue, and m = yellow) on a QWERTY keyboard, and they were asked to ignore the non-words and respond to the ink color as quickly and as accurately as possible.

#### TABLE 1 | Word lists used in the study.

The general instructions and procedure for the experimental Stroop task were identical to the practice phase. There were four blocks ([C], [NC], [CsC], and [NsC]), each containing two sets of words: either two sets of 20 control words or 20 control words mixed with 20 negative words. In each block, 40 words were printed in each of four colors for 160 trials, resulting in 640 trials for the Stroop task. The two sets of negative emotional words and six sets of neutral control words were assigned to the four experimental blocks and counterbalanced across participants. Each word was presented in a random order in each block.
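The trial counts above follow from crossing each word with each ink color. The sketch below uses plain Python rather than the PsychoPy program actually used in the study; the function name, the placeholder words, and the unconstrained shuffle are assumptions made for illustration.

```python
import itertools
import random

# Key mapping as given in the Test Phase description.
COLOR_KEYS = {"red": "z", "green": "x", "blue": "n", "yellow": "m"}


def build_block(words, seed=None):
    """Cross every word with every ink color and shuffle the result.

    An unconstrained shuffle is assumed here; the text states only that
    words were presented in a random order within each block.
    """
    trials = [
        {"word": word, "color": color, "correct_key": COLOR_KEYS[color]}
        for word, color in itertools.product(words, COLOR_KEYS)
    ]
    random.Random(seed).shuffle(trials)
    return trials


# Placeholder items standing in for the 40 words of one block.
placeholder_words = [f"word{i:02d}" for i in range(40)]
block = build_block(placeholder_words, seed=0)
print(len(block), 4 * len(block))
```

With 40 words and four colors, each block contains 160 trials, and four such blocks give the 640 trials reported.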

As soon as a block was completed, participants were given the option to take a short break and were instructed to carry on with the next block by pressing the space bar. The order of the four blocks was counterbalanced across participants.

#### Recall Phase

The test phase was followed by the recall phase, in which participants had 180 s to write down, on a blank sheet of paper, as many of the words they had seen during the study phase as they could remember.

#### Questionnaire Phase

The questionnaire phase followed the recall phase. Participants completed the Spielberger State-Trait Anxiety Inventory (STAI; Spielberger et al., 1983), which consists of 20 statements for state anxiety, indicating how one feels right now, and 20 statements for trait anxiety, indicating how one feels in general.

## RESULTS

#### Analysis of the Stroop Task

#### Data Preparation

Scores on the STAI-trait ranged from 20 to 78 (M = 48.80, SD = 12.33). Based on norms collected between 2005 and 2007 (N = 368) from students at the University of Kent, trait anxiety scores of 50 or above correspond to a percentile rank of 75% [85% (for males) and 72% (for females)]. Participants were assigned to the low (<50) or high (≥50) trait anxiety group for the ANOVA analysis. STAI-trait scores in the two groups were as follows: high anxiety group, range 50–78, M = 58.56, SD = 7.39, N = 59; low anxiety group, range 20–49, M = 39.36, SD = 8.02, N = 61.

Data from four participants were removed: one because of a high error rate (18.9%) and three because of long RTs (beyond 2.5 standard deviations). The error rate of the remaining 116 participants (Low trait: N = 59; High trait: N = 57) was 4.50%. Prior to the analysis of mean correct response latencies, the first trial of each block and trials with an RT less than 200 ms or greater than 3,000 ms were excluded; these amounted to 5.5% of the trials.
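The trial-level exclusions described above can be sketched in Python as follows. The dictionary keys (`block`, `position`, `rt`, `correct`) and the function name `trim_trials` are hypothetical, and the example data are invented purely to show the filter at work; this is not the authors' preprocessing script.

```python
def trim_trials(trials, low_ms=200, high_ms=3000):
    """Keep correct, non-initial trials whose RT lies within [low_ms, high_ms].

    `trials` is a list of dicts with hypothetical keys:
    'block', 'position' (0-based index within the block),
    'rt' (milliseconds), and 'correct' (bool).
    """
    kept = []
    for t in trials:
        if t["position"] == 0:      # first trial of each block is dropped
            continue
        if not t["correct"]:        # latencies are analysed for correct responses only
            continue
        if t["rt"] < low_ms or t["rt"] > high_ms:
            continue                # anticipations and very slow responses
        kept.append(t)
    return kept


# Invented example data for one block.
sample = [
    {"block": "C", "position": 0, "rt": 650, "correct": True},   # first trial
    {"block": "C", "position": 1, "rt": 150, "correct": True},   # too fast
    {"block": "C", "position": 2, "rt": 700, "correct": True},   # kept
    {"block": "C", "position": 3, "rt": 3200, "correct": True},  # too slow
    {"block": "C", "position": 4, "rt": 800, "correct": False},  # error
    {"block": "C", "position": 5, "rt": 740, "correct": True},   # kept
]
kept = trim_trials(sample)
mean_rt = sum(t["rt"] for t in kept) / len(kept)
print(mean_rt)  # 720.0
```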

#### Analysis of Response Latencies

The first analysis was conducted on the mean correct RTs, using a 7 × 2 two-way mixed analysis of variance (ANOVA), with Word category ([C]-C, [NC]-C, [NC]-N, [CsC]-C, [CsC]-Cs, [NsC]-C, and [NsC]-Ns) as a within-subject factor, and Trait group (high, low) as a between-subject factor. Greenhouse-Geisser corrected values are reported when the sphericity assumption was violated.

The analysis revealed a significant main effect of Word category, F(3.29,375.18) = 3.59, MSe = 6133.02, p = 0.011, η<sub>p</sub><sup>2</sup> = 0.031. Bonferroni corrected t-tests indicated that there was a significant difference between [NsC]-Ns (M = 742.25 ms, SE = 14.17) and [C]-C (M = 712.37 ms, SE = 12.41) words (p = 0.007). The main effect of Trait group was not significant, F(1,114) = 0.277, MSe = 114719.80, p = 0.600, η<sub>p</sub><sup>2</sup> = 0.002. However, there was an interaction between Word category and Trait group, F(3.29,375.18) = 2.67, MSe = 6133.02, p = 0.042, η<sub>p</sub><sup>2</sup> = 0.023. Bonferroni corrected t-tests indicated that in the low trait anxiety group, [CsC]-C (M = 745.62 ms, p = 0.004), [CsC]-Cs (M = 745.96 ms, p = 0.003), and [NsC]-C (M = 743.45 ms, p = 0.025) words took longer to respond to than the [C]-C (M = 709.42 ms) words. On the other hand, in the high trait anxiety group, the [NsC]-Ns (M = 746.40 ms, p = 0.003) words took longer than [CsC]-C (M = 705.76 ms) words (see **Figure 2**).
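As a reading aid, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies it to the three omnibus effects reported above; the function name is hypothetical, and the check is ours rather than part of the authors' analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported for the response-latency ANOVA.
word_category = partial_eta_squared(3.59, 3.29, 375.18)
trait_group = partial_eta_squared(0.277, 1, 114)
interaction = partial_eta_squared(2.67, 3.29, 375.18)

print(round(word_category, 3))  # 0.031
print(round(trait_group, 3))    # 0.002
print(round(interaction, 3))    # 0.023
```

The recomputed values agree with the reported effect sizes to rounding.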

Further analysis of this interaction involved planned comparisons. First, there was an emotional Stroop effect for high trait anxiety with studied negative words, F(1,56) = 8.49, p = 0.006, η<sub>p</sub><sup>2</sup> = 0.13, but not with unstudied negative words, F(1,56) = 0.45, p = 0.51, η<sub>p</sub><sup>2</sup> = 0.008. There was no emotional Stroop effect for low trait anxiety (both Fs < 1, ps > 0.37). Second, we looked for evidence of proactive task conflict across the blocks. For each trait group we asked whether the mixed blocks took longer than the baseline block [C]. For the low anxiety group this was significant ([C] vs. [CsC], F(1,58) = 18.86, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.25; [C] vs. [NsC], F(1,58) = 10.44, p = 0.002, η<sub>p</sub><sup>2</sup> = 0.15). This replicates similar findings by Sharma (2018) using neutral words and extends them to studied negative words. For the high trait anxiety group this was not significant for [NsC] vs. [C], F(1,56) = 3.19, p = 0.08, η<sub>p</sub><sup>2</sup> = 0.05, [CsC] vs. [C], F(1,56) = 0.12, p = 0.73, η<sub>p</sub><sup>2</sup> < 0.01, or [NC] vs. [C], F(1,56) = 0.61, p = 0.4, η<sub>p</sub><sup>2</sup> = 0.01. These findings suggest that in high trait anxiety, blocks containing studied words tend not to show slowed latencies. Overall, the above results indicate that blocks with studied words tend to have longer latencies than a block of unstudied control words, and that this slowdown seems to decrease with trait anxiety. Correlations with trait anxiety scores, however, showed that this impression was supported only for [CsC], r(114) = −0.22, p = 0.016, but not for [NsC], r(114) = −0.019, p = 0.84.

To investigate whether priming words results in task conflict from reactive control we carried out a series of planned comparisons within the two mixed blocks. We asked whether studied words take longer to respond to when preceded by studied words compared to unstudied words (i.e., trial CsCs vs. trial CCs or trial NsNs vs. trial CNs). For the low anxiety group there was no significant reversed sequential modulation effect in either [CsC], t(58) = 0.81, p = 0.42 or [NsC], t(58) = 0.002, p = 0.99. For the high anxiety group there was a significant reversed sequential modulation effect in [NsC], t(56) = 2.31, p = 0.025 but not [CsC], t(56) = 0.02, p = 0.98 (see **Figure 3**). The modulation found in high anxiety for studied negative words suggests that the reversed sequential modulation increases with higher levels of trait anxiety. This was supported by a positive correlation between trait anxiety scores and reversed sequential modulation scores in the [NsC] block, r(114) = 0.187, p = 0.04. The correlation between trait anxiety and reversed sequential modulation scores in the [CsC] block was not significant, r(114) = −0.155, p = 0.097, though the negative direction indicates that lower anxiety may be associated with a reversed sequential modulation effect from studied neutral words.
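One way to compute a reversed sequential modulation score for a single participant and block is sketched below in Python: the mean RT for studied words that follow a studied word minus the mean RT for studied words that follow an unstudied word. The data layout, the function name, and details such as how error trials are handled are assumptions made for illustration; the text does not specify the authors' exact scoring procedure.

```python
from statistics import mean


def reversed_sequential_modulation(trials):
    """Studied-after-studied mean RT minus studied-after-unstudied mean RT.

    `trials` is an ordered list of (studied: bool, rt_ms: float) tuples
    for one participant within one mixed block (hypothetical layout).
    """
    after_studied, after_unstudied = [], []
    for (prev_studied, _), (cur_studied, cur_rt) in zip(trials, trials[1:]):
        if not cur_studied:
            continue  # only studied current trials enter the contrast
        if prev_studied:
            after_studied.append(cur_rt)    # e.g., trial NsNs or CsCs
        else:
            after_unstudied.append(cur_rt)  # e.g., trial CNs or CCs
    return mean(after_studied) - mean(after_unstudied)


# Invented trial sequence for illustration only.
sequence = [
    (False, 700.0), (True, 720.0), (True, 760.0),
    (False, 705.0), (True, 730.0), (True, 770.0),
]
score = reversed_sequential_modulation(sequence)
print(score)  # 40.0
```

A positive score indicates slower responses to studied words when the preceding word was also studied.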

#### Analysis of Recall Phase

Prior to the analysis, the words written down by participants during the recall phase were checked. Misspellings were accepted (e.g., masaccare for massacre), but altered forms were excluded (e.g., angry changed to anger).

A 7 × 2 mixed ANOVA was conducted with Word category ([C]-C, [NC]-C, [NC]-N, [CsC]-C, [CsC]-Cs, [NsC]-C, [NsC]-Ns) as a within-subject factor, and Trait group (high, low) as a between-subject factor. The results revealed a significant main effect of Word category, F(2.37,270.63) = 239.44, MSe = 4.57, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.68, but not of Trait group, F(1,114) = 1.34, MSe = 2.34, p = 0.249, η<sub>p</sub><sup>2</sup> = 0.01, and no Word category × Trait group interaction, F(2.37,270.63) = 1.25, MSe = 4.57, p = 0.289, η<sub>p</sub><sup>2</sup> = 0.01. Mean recall rates for the studied words [NsC]-Ns (M = 0.24) and [CsC]-Cs (M = 0.17) were significantly higher than for the other word categories, all ts > 12.85, ps < 0.01 (see **Figure 4**). Moreover, mean recall was significantly higher for [NsC]-Ns (M = 0.24) than for [CsC]-Cs (M = 0.17), t(115) = 4.76, p < 0.001. We also checked whether the difference between [NsC]-Ns and [CsC]-Cs correlated with trait anxiety scores; it did not, r(114) = 0.120, p = 0.199.

We also note that the results were the same when analyzed using lenient criteria under which altered forms were also accepted: only the main effect of Word category was significant, F(2.41,274.19) = 252.55, MSe = 4.60, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.69; all other main and interaction effects were not significant (Fs < 2.2, ps > 0.1).

## DISCUSSION

The memory results were as expected: (a) recall was higher for studied words than for unstudied words, (b) studied negative words were recalled better than studied neutral words, and (c) there was no interaction with trait anxiety. These results show the typical episodic memory advantage for recently attended words and for words that are semantically related. The lack of interaction with trait anxiety is consistent with previous reviews of the memory bias literature (see Williams et al., 1997; Mitte, 2008). There is some evidence that a memory bias with trait anxiety can occur for free recall memory tasks but only when the depth of processing is shallow during the study phase (for a review see Herrera et al., 2017). Our findings are consistent with these reviews, as a high level of processing (words were rated for self-relevance) was required during the study phase.

The main findings, however, are from the response latencies to the non-color (neutral and negative emotional) words. For the low anxiety group, there are two key findings. First, neutral words in the studied block [CsC] took longer to respond to than neutral words in the unstudied block [C]. This evidence is consistent with the task conflict hypothesis that is driven by proactive control and replicates findings by Sharma (2018) for studied neutral words. Within the PC-TC model, this could be due to stronger proactive activation of the word reading task demand unit in studied blocks. Second, the slowdown for studied neutral words also generalizes to a block with studied negative words (i.e., [NsC]), and therefore suggests that negative words can also slow down responses in low anxiety but only when these words have been primed. As there was no difference between the two studied (neutral and negative) blocks, together these two findings highlight the influence of studying words in the non-color Stroop task. Therefore, this extends the original work of MacLeod (1996) and replicates the findings by Sharma (2018) to further demonstrate that the study-test methodology can be used to investigate implicit memory in the non-color word Stroop task.

FIGURE 2 | Mean correct reaction times for high and low trait anxiety group in each Word category. Error bars represent the 95% confidence interval adjusted for the within-subject design, calculated separately for high and low trait anxiety group (Masson and Loftus, 2003). C, neutral word; N, negative word; s, word is studied.

FIGURE 3 | Showing the reversed sequential modulation effects within block [CsC] (trial CsCs vs. trial CCs) and block [NsC] (trial NsNs vs. trial CNs) for high and low trait anxiety group. Error bars represent 95% confidence interval for the paired difference between two means, computed separately for the effects within block [CsC] and block [NsC] (Pfister and Janczyk, 2013). C, neutral word; N, negative word; s, word is studied.

For the high trait anxious group, there are three main findings. First, there was an emotional Stroop effect in the [NsC] block but not in the [NC] block. This supports previous research showing that priming a negative scheme (in our study by learning negative and neutral words during an initial study phase) can generate attentional biases (cf. Richards et al., 1992; Holle et al., 1997; Lundh and Czyzykow-Czarnocka, 2001). In our study, the priming was specific to negative words for the high trait anxiety group and replicates the findings by Richards et al. (1992) and Holle et al. (1997), where negative words induced interference after negative mood induction or when negative words were presented in a single block of trials. More generally, this finding also points to the importance of memory processes when considering interference in the non-color word Stroop task. For example, it is possible that the priming effects found for studied negative words in high anxiety may have activated episodic memory (see **Figure 1**). In addition, it is possible that such memory activation also initiates higher thought processes such as rumination or self-reflective processes. This may also explain why studied neutral words did not show a similar effect in the high anxiety group. Further research is required to explore this possibility.

Second, although the high trait anxious group showed an emotional Stroop effect in the [NsC] block, there was no evidence of a general slowdown for the neutral words in the [NsC] block compared to the baseline [C] block. The lack of a general slowdown contrasts with the slowdown seen for the low anxiety group. This finding is consistent with Attentional Control Theory which suggests that in high trait anxiety the balance of control shifts away from proactive control. In the PC-TC model this could be implemented as a reduced top-down activation of the word reading task demand unit.

Third, in high trait anxiety, studied negative words took longer to respond to when preceded by studied negative words compared to unstudied neutral words. Here, we speculate on several potential explanations for the reversed sequential modulation. Sharma (2018) reported a similar finding with studied neutral words, namely a reversed sequential modulation for studied neutral words. He suggested a possible reactive control mechanism that activates task conflict in the PC-TC model. A similar mechanism could be suggested for studied negative words in high trait anxiety. However, it is also possible to suggest the influence of a proactive control mechanism. In **Figure 1**, the word reading task demand node can be activated by proactive control from episodic memory. Although this influence may be weaker in high anxiety, our results suggest that the episodic memory unit may be activated when responses are made to two consecutively presented studied negative words. These two suggestions point to task conflict as a potential mechanism. However, task conflict may not be involved if it is assumed that two consecutively presented studied negative words require greater attentional resources, which subsequently results in a relaxation of cognitive control, as suggested by the Dual Competition Model (Pessoa, 2009). In a connectionist model without task conflict, this could be implemented by inhibition of the conflict monitoring unit, analogous to the inhibition from the negative emotion unit in the Adaptive Attentional Control model (Wyble et al., 2008). If this were the case, then more detailed predictions from the Wyble et al. (2008) model would suggest that studied words slow down subsequent neutral trials, analogous to the slow effect reported by McKenna and Sharma (2004) for negative stimuli. We checked for a slow effect from studied words (negative or neutral), but could not find any evidence. Future research could examine the conditions under which slow effects appear. However, we believe the current results are more parsimoniously explained by a model that includes task conflict.

Two puzzling features of our results suggest further avenues for future research. First, we did not find a reversed sequential modulation for studied words in the low anxiety group. This did not replicate the reversed sequential modulation for studied neutral words found by Sharma (2018). We suggest this may be due to stronger proactive control from episodic memory to the word reading task demand unit in our study than in Sharma (2018). This may in turn be due to using a larger set of studied words (40 in our experiment compared to 20 in Sharma, 2018), and/or to using negative words, which form a stronger semantic category than the neutral word set. Second, for the high anxiety group the reversed sequential modulation did not occur for the studied neutral words. This is surprising, particularly as it is thought that in high anxiety the balance of control shifts toward reactive control. One explanation might be that using a larger studied word set may have reduced the saliency of each individual item. However, for the studied negative words their stronger semantic associations may have enabled them to maintain a stronger level of priming.

In conclusion, our findings provide further evidence in support of using the priming technique to elucidate the role of task conflict in the non-color word Stroop task. For low anxiety, studying (neutral and negative) words resulted in a general slowdown that was attributed to task conflict resulting from a proactive control mechanism that increases activation of the word reading task demand node. For high anxiety, the general slowdown is limited, suggesting a reduced influence from proactive control.

## DATA AVAILABILITY

The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

## AUTHOR CONTRIBUTIONS

CWH and DS designed the study, analyzed the data, and wrote the manuscript. CWH collected and cleaned the data.

## REFERENCES




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Hsieh and Sharma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## The Loci of Stroop Interference and Facilitation Effects With Manual and Vocal Responses

#### Maria Augustinova<sup>1,2</sup>\*, Benjamin A. Parris<sup>3</sup> and Ludovic Ferrand<sup>2</sup>\*

<sup>1</sup> UNIROUEN, CRFDP, Normandie Université, Rouen, France, <sup>2</sup> CNRS, LAPSCO, Université Clermont Auvergne, Clermont-Ferrand, France, <sup>3</sup> Department of Psychology, Bournemouth University, Bournemouth, United Kingdom

#### Edited by:

Joseph Tzelgov, Ben-Gurion University of the Negev, Israel

#### Reviewed by:

Walter J. B. van Heuven, University of Nottingham, United Kingdom; Eyal Kalanthroff, The Hebrew University of Jerusalem, Israel; Eldad Keha, Achva Academic College and Hebrew University of Jerusalem, Israel, in collaboration with reviewer Eyal Kalanthroff

#### \*Correspondence:

Maria Augustinova, maria.augustinova@univ-rouen.fr; Ludovic Ferrand, ludovic.ferrand@uca.fr

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 28 May 2019; Accepted: 18 July 2019; Published: 19 August 2019

#### Citation:

Augustinova M, Parris BA and Ferrand L (2019) The Loci of Stroop Interference and Facilitation Effects With Manual and Vocal Responses. Front. Psychol. 10:1786. doi: 10.3389/fpsyg.2019.01786

Several accounts of the Stroop task assume that the Stroop interference effect has several distinct loci (as opposed to a single response locus). The present study was designed to explore whether this is the case with both manual and vocal responses. To this end, we used an extended form of the Stroop paradigm (Augustinova et al., 2018b) that successfully distinguishes between the contribution of the task vs. semantic vs. response conflict to overall Stroop interference. In line with past findings, the results of Experiment 1 yielded an important response modality effect: the magnitude of Stroop interference was substantially larger when vocal responses were used (as opposed to key presses). Moreover, the present findings show that the response modality effect is specifically due to the fact that Stroop interference observed with vocal responses results from the significant contribution of task, semantic, and response conflicts, whereas only semantic and response conflicts clearly significantly contribute to Stroop interference observed with manual responses (no significant task conflict was observed). This exact pattern was replicated in Experiment 2. Importantly, Experiment 2 also investigated whether and how the response modality effect affects Stroop facilitation. The results showed that the magnitude of Stroop facilitation was also larger when vocal as opposed to manual responses were used. This was due to the fact that semantic and response facilitation contributed to the overall Stroop facilitation observed with vocal responses, but surprisingly, only semantic facilitation contributed with manual responses (no response facilitation was observed). We discuss these results in terms of quantitative rather than qualitative differences in processing between vocal and manual Stroop tasks, within the framework of an integrative multistage account of Stroop interference (Augustinova et al., 2018b).

Keywords: Stroop interference and facilitation, response modality, task conflict, semantic conflict, response conflict

## INTRODUCTION

The typical results in the well-known Stroop task (Stroop, 1935) are at least twofold. First, Stroop interference refers to longer color identification times for color-incongruent Stroop words (i.e., words that are displayed in a color that is different from the one they designate such as "BLUE" displayed in green; hereafter BLUEgreen), than for color-neutral words (e.g., the word "DOG" displayed in green ink, hereafter DOGgreen) or letter strings (e.g., "XXXX" displayed in green ink, hereafter XXXXgreen). Second, Stroop facilitation refers to shorter color identification times for color-congruent Stroop words (i.e., GREENgreen) than for color-neutral words (e.g., the word "DEAL" displayed in green ink, hereafter DEALgreen) or letter strings (e.g., "XXXX" displayed in green ink, hereafter XXXXgreen).

A still unexplained finding in the Stroop literature is that the magnitude of both Stroop interference and facilitation depends on the type of response output that the Stroop task involves (MacLeod, 1991). Specifically, this magnitude is usually substantially larger when individuals are required to identify the font color of written characters vocally (saying the color name aloud) as compared to manually (key press responses; e.g., White, 1969; Neill, 1977; Redding and Gerjets, 1977; McClain, 1983; Sharma and McKenna, 1998). Moreover, some have argued that manual and vocal responses have differential access to the systems producing interference and facilitation (Glaser and Glaser, 1989; Sugg and McDonald, 1994; Sharma and McKenna, 1998). This suggests that the way participants identify the color of Stroop stimuli determines how the different features of these compound stimuli are actually processed. This puzzling idea might explain the recently renewed interest in just this issue (Kinoshita et al., 2017; Fennell and Ratcliff, 2019; Zahedi et al., 2019; Parris et al., in press; see also Parris et al., under review).

It has been argued that the manual response Stroop task is a different task to the vocal response Stroop task (Kinoshita et al., 2017). Specifically, since manual responding involves color classification and vocal responding requires color naming, the tasks differ and so then should the mechanisms that lead to Stroop interference. Such an account predicts that the locus of Stroop effects varies by response mode and finds support in influential models of the Stroop task (Glaser and Glaser, 1989; Sugg and McDonald, 1994; Sharma and McKenna, 1998). In contrast, the traditional response competition view of the Stroop task (Morton and Chambers, 1973; Cohen et al., 1990; Roelofs, 2003) has assumed that the reading task that produces Stroop effects is invariant and, thus, that the locus of the Stroop effect should be similar for manual and vocal responding.

It is clear that there are differences between the two response modes. With a manual response, the irrelevant word provides evidence toward another key press option. With the vocal response, the irrelevant word provides evidence toward another speech production option. Therefore, the ensuing Stroop interference will depend on how difficult it is to favor the correct, or inhibit the alternative, option. That the interference magnitudes with the two response modes are not equivalent suggests that suppressing the irrelevant speech code is harder than suppressing the irrelevant key press option. This is perhaps due to there being separate effectors (different fingers) for each response option with a manual response vs. a single effector (one mouth) with the vocal response. With the manual response, it is possible that a speech code is also produced for the irrelevant word, but this speech code would not interfere because there is no competing speech code associated with the relevant, correct response. It is possible then, that for both response modes, the locus is at the later stage of response selection but that response selection happens in different modules due to there being different effectors.

Alternatively, it is possible that the response mode necessarily modifies how the irrelevant word is processed and, therefore, modifies the locus of Stroop interference. It has been argued that responding vocally encourages the phonological encoding of the irrelevant word more than responding manually does (Van Voorhis and Dark, 1995; Burt, 1999; if it happens at all with a manual response – see Kinoshita et al., 2017, and Parris et al., in press), which would account for the large Stroop effects with vocal responses and supports the notion that the task itself modifies how the word is processed. However, some models of the Stroop task predict no Stroop effects at all with manual responses (Glaser and Glaser, 1989), some predict no effect with manual responses depending on the button label-type (Sugg and McDonald, 1994), and some predict differential access to semantics with manual responses (Sharma and McKenna, 1998; although see Brown and Besner, 2001). Despite these competing accounts, until recently, empirical work that addressed the processes underlying this response modality effect was scarce. Importantly, the recent work that has been carried out has not directly investigated established sources of conflict, nor has it considered Stroop facilitation effects.

To illustrate, the recent application of the RTCON2 multichoice decision-making and confidence model (Ratcliff and Starns, 2013) to the data from the four-color Stroop tasks firmly pointed to the fact that the differences between vocal and manual response modality lie, to a large extent, outside of the processes of decision-making (Fennell and Ratcliff, 2019, Experiment 3; see also converging evidence from the two-color choice Stroop task). However, since the RTCON2 model does not describe sources of conflict or specify processes that contribute to performance at other stages of processing in the Stroop task (i.e., all these processes are confounded in the non-decision time parameter of RTCON2), processes driving the substantial response modality effect – observed in this experiment – remain to be elucidated. Therefore, the two experiments reported in this paper were designed to shed additional light on whether manual and vocal Stroop tasks result in interference effects at different levels of processing. Specifically, we set out to investigate whether the manual and vocal response Stroop tasks produce task, semantic, and response conflict, as well as the much-understudied effects of response and semantic facilitation.

## Varieties of Conflict and Facilitation in the Stroop Task

Several accounts of the Stroop task posit that Stroop interference results from the simultaneous contribution of two distinct conflicts. In addition to response conflict as depicted above, they posit the existence of the so-called task conflict (hereafter TC-RC accounts; see Augustinova et al., 2018b; Parris et al., under review, for reviews) instead of the semantic conflict assumed by the aforementioned SC-RC accounts. Task conflict is thought to arise for all kinds of readable items (including color-congruent words, e.g., BLUEblue) and is, thus, different from the specific color-incongruency conflict occurring for color-incongruent Stroop words (e.g., BLUEgreen). This is because the individual's attention is drawn to an irrelevant task (i.e., word reading) instead of being fully focused on the relevant task (i.e., color naming), leading the two task sets to compete (e.g., Monsell et al., 2001; Goldfarb and Henik, 2006, 2007; Kalanthroff et al., 2013a,b; Parris, 2014 for empirical demonstrations; see also, e.g., Aarts et al., 2009; Desmet et al., 2011; Elchlepp et al., 2013 for fMRI and EEG evidence).

Other accounts argue for the existence of stimulus (or semantic) conflict, which is thought to occur earlier in processing than the response conflict (but likely after task conflict – see Hershman and Henik, 2019). For instance, Seymour (1977) considers that this (early) conflict occurs at conceptual encoding of color-incongruent words (e.g., BLUEgreen) because the meaning of the word dimension (i.e., blue for BLUEgreen) and that of the color dimension (i.e., green here) both correspond to colors. Indeed, "(...) delays of processing occur whenever distinct semantic codes are simultaneously activated, and that these delays become acute when the conflicting codes are values on a single dimension or closely related dimensions" (p. 263; see also, e.g., Scheibe et al., 1967; Seymour, 1974, 1977; Stirling, 1979; Luo, 1999; but see, e.g., Hock and Egeth, 1970 for the idea of a perceptual rather than conceptual type of stimulus conflict). There is substantial evidence for the presence of conflict at this level of processing (Zhang and Kornblum, 1998; De Houwer, 2003; Manwell et al., 2004; Schmidt and Cheesman, 2005; Augustinova and Ferrand, 2014a; see also, e.g., van Veen and Carter, 2005; Szucs and Soltész, 2010; Chen et al., 2013; Killikelly and Szücs, 2013; Augustinova et al., 2015; for electrophysiological and fMRI evidence), although it has been proposed that stimulus conflict is an indirect measure of response conflict (Roelofs, 2003; see Parris et al., under review, for a review and evaluation of this evidence). It is, thus, not surprising that conceptualizations of multistage processing in the Stroop task assume that color-incongruent words (e.g., BLUEgreen) generate both stimulus and response conflicts (hereafter SC-RC accounts; see Augustinova et al., 2018b for this terminology and review of these accounts).

Given that considerable behavioral, electroencephalography (EEG), and functional magnetic resonance imaging (fMRI) evidence points to the viability of both SC-RC and TC-RC multistage accounts of Stroop interference (see above), several lines of research have highlighted the necessity of adopting an integrative perspective that would allow for bridging the two previously outlined multistage perspectives (Augustinova et al., 2018b; Parris et al., under review; for reviews). To implement this integrative proposal empirically, Augustinova et al. (2018b; see also Ferrand et al., in press) proposed that color-associated incongruent words (e.g., SKYgreen) and color-neutral letter strings (e.g., XXXgreen) supplement the standard color-incongruent words (e.g., BLUEgreen) and color-neutral words (e.g., DOGgreen) that are commonly used in the standard Stroop task (see above). Indeed, if the color-neutral letter strings (e.g., XXXgreen) and words (e.g., DOGgreen) only trigger task conflict, the color incongruency involved in both color-associated (e.g., SKYgreen) and standard (e.g., BLUEgreen) color-incongruent words triggers additional type(s) of conflict. More specifically, color-associated incongruent words (e.g., SKYgreen) trigger both task and semantic conflicts, and standard color-incongruent words (e.g., BLUEgreen) trigger all three types of conflict (i.e., task, semantic, and response; see the section "Present Study," for further developments).

Using this extended form of the Stroop paradigm – that builds on both the SKY-PUT design suggested by Neely and Kahan (2001) and Klein's (1964) semantic gradient – all three conflicts (i.e., task, semantic, and response conflicts) have been shown to contribute significantly to standard Stroop interference in both adults (Augustinova et al., 2018b) and reading-level children (Ferrand et al., in press) and have been shown to have specific developmental trajectories (Ferrand et al., in press). Taken together, these studies not only strongly reaffirm that the standard (i.e., overall) Stroop interference constitutes a composite and not a unitary (response-level) phenomenon but also clearly show the relevance of an integrative perspective bridging SC-RC and TC-RC multistage accounts. Yet, the extent to which these same components actually contribute to the overall Stroop interference collected with manual responses is a still-open issue. Therefore, the present study examined whether, and to what extent, task, semantic, and response conflicts are affected by the type of response output (verbal vs. manual) that the Stroop task requires.

Additionally, the present study examined how different forms of facilitation are modified by response modality. Indeed, to the best of our knowledge, only one published study has explored the potential variety in Stroop facilitation effects. Using a vocal response, Dalrymple-Alford (1972) reported a 42-ms semantic-associative facilitation effect (e.g., DOGblue – SKYblue) and a 63-ms standard facilitation effect (e.g., DOGblue – BLUEblue), suggesting a response facilitation effect of 21 ms. Interestingly, however, when compared to a letter string baseline (e.g., XXXblue), the congruent semantic associates actually produced interference, a finding implicating an influence of task conflict. These isolable forms of facilitation are interesting, require further study with more modern methods, and have the potential to shed light on impairments in selective attention and cognitive control. Of further interest in the present study is how these two forms of facilitation are modified by response modality.

## The Response Modality Effect Examined Within Multistage Accounts of Stroop Interference

In the aforementioned study of Augustinova et al. (2018b), the response modality effect was not an issue under consideration. Yet, the specific contributions of the task (e.g., DOGgreen – XXXgreen), semantic (e.g., SKYgreen – DOGgreen), and response conflict (e.g., BLUEgreen – SKYgreen) to the overall Stroop interference were examined with both manual (Experiment 1) and vocal responses (Experiment 2). While the contribution of both response and semantic conflicts was significant in both experiments, with only the former being larger with the vocal response, that of task conflict failed to reach significance when the Stroop task was administered with manual as opposed to vocal responses. Likewise, Kinoshita et al. (2017) observed task conflict with the vocal (Experiment 1) but not the manual (Experiment 2) Stroop task, but did not include a semantic Stroop condition to distinguish response and semantic conflict. However, more recently, Kinoshita et al. (2018) reported that both task and semantic conflicts were significant with both verbal (Experiments 1 and 3) and manual responses (Experiments 2 and 4), albeit with the magnitude of task conflict (but not of semantic conflict) being larger when a vocal (as opposed to manual) response output was required<sup>1</sup>. They did not, however, include a measure of response conflict. Thus, only one of the above studies included all three conflict types in the same study (Augustinova et al., 2018b), but none investigated facilitation types, and in all the above studies, response modality was a between-subjects factor.

There is only one study as far as we are aware that has used a within-subject design to investigate all three conflict types in both manual and vocal responses. Sharma and McKenna (1998) reported that task conflict (which they referred to as the lexical component of the Stroop effect) and semantic conflict were present when a verbal but not manual response output was required but that response conflict was present with both response types (see also, e.g., Redding and Gerjets, 1977; McClain, 1983). Sharma and McKenna's original conclusion about the lack of semantic Stroop effects with the manual response Stroop task was based on comparisons of adjacent conditions (in terms of response times), but Brown and Besner (2001) reanalyzed Sharma and McKenna's data using non-adjacent conditions and revealed semantic Stroop effects with manual responses. However, given that the adjacent conditions did not reveal evidence of semantic conflict, its magnitude must have differed between response modes.

In summary, of the four studies reviewed here, three provide evidence for a lack of task conflict with a manual response, but one provided evidence for the presence of task conflict with a manual response. Of the three studies designed to assess semantic conflict, all three provide evidence for semantic conflict with a manual response, but one showed greater semantic conflict with the vocal response Stroop task. Of the two studies designed to assess the individual contribution of response conflict, both provide evidence for larger response conflict with a vocal response. However, in only two of these studies were all three conflict types manipulated in the same experiment, and in only one of these studies was response modality manipulated within subjects. Notably, none of the above studies considered varieties of Stroop facilitation.

## Present Study

The present study was designed to further explore the types of conflict and facilitation and, thus, the locus of Stroop effects, with manual and vocal responses. To this end, the aim of Experiment 1 was to generalize the findings of Augustinova et al. (2018b), without the response-stimulus interval manipulation, to both manual and vocal responses. The aim of Experiment 2 was to extend these findings by including measures of response and semantic facilitation and by employing a fully within-subjects design.

To this end, the present study used the aforementioned extended form of the Stroop paradigm (Augustinova et al., 2018b). The irrelevant dimension of all stimuli included in this paradigm (i.e., color-neutral letter strings, color-neutral words, color-associated and standard color-incongruent words) is composed of letters and, thus, is assumed to generate task conflict. Importantly, they do so to the same extent, except for the non-readable color-neutral letter strings (e.g., XXXgreen). In line with the bimodal, interactive activation model with (amodal) semantics (McClelland and Rumelhart, 1981; McClelland, 1987; Grainger and Ferrand, 1996; Stolz and Besner, 1996; Ferrand and New, 2003; McNamara, 2005), the processing of the written dimension of these color-neutral letter strings (i.e., xxx) stops at the orthographic prelexical level. The processing of the written dimension for all other stimuli composed of words (e.g., dog, sky, and blue) stops, on the other hand, with access to meaning (i.e., after a full chain of visual, orthographic, lexical, and semantic processing has come to completion). Consequently, the significant difference in mean response latencies between Stroop color-neutral words and letter strings (e.g., DOGgreen – XXXgreen) is thought to solely reflect differences in activation of the irrelevant reading task set and, hence, of the differential amount of the task conflict that this entails. Indeed, because the meaning of color-neutral words (e.g., dog for DOGgreen) is not related to a color (unlike sky or blue), the aforementioned contribution of task conflict to overall Stroop interference is not intermixed with that of the semantic and response conflicts that are generated by color incongruency.

Turning now to the separation of semantic and response conflicts and facilitation, numerous studies have argued that color incongruency causes semantic conflict (see Seymour's reasoning outlined above). Also, and importantly, in line with Seymour (1977), semantic conflict is generated to the same extent by associated (e.g., SKYgreen) as compared to standard (e.g., BLUEgreen) Stroop words (e.g., see Augustinova et al., 2015 for N400-like evidence). Consequently, the significant difference in mean response latencies between color-associated and color-neutral trials (e.g., SKYgreen – DOGgreen) is likely to reflect the semantic conflict that color-associated (e.g., SKYgreen), unlike color-neutral (e.g., DOGgreen), Stroop words generate. Indeed, given that color-associated words do not activate (pre-)motor responses linked to the associated color (e.g., press a blue button on seeing SKY; see Schmidt and Cheesman, 2005 for a direct demonstration), the aforementioned contribution of semantic conflict to overall Stroop interference is not confounded with that of response conflict – generated by standard color-incongruent words only (e.g., BLUEgreen; but see Hasshim and Parris, 2014, 2015 for a discussion of this study). Likewise, semantic facilitation with color-associated congruent stimuli (e.g., SKYblue) would not be confounded with response facilitation observed on standard congruent trials (e.g., BLUEblue).

<sup>1</sup>As suggested by one of the reviewers, it is worth mentioning that the semantic conflict observed in the manual response condition was quite small. Specifically, its magnitude was 14 ms in the condition of high proportion and 8 ms in the condition of low proportion of neutral distractors (i.e., # signs) that was also manipulated in this study, although it did not significantly affect the aforementioned amplitudes of semantic conflict.

Finally, the irrelevant word dimension of standard incongruent trials also primes the aforementioned (pre-)response tendency that – for these words (e.g., blue for BLUEgreen) – is part of the response set. It therefore interferes with the (pre-)response tendency primed by the meaning of the relevant color dimension (green here). Consequently, the significant difference in mean response latencies between standard and associated color-incongruent trials (e.g., BLUEgreen – SKYgreen) is thought to result from this (pre-)motor (i.e., response) conflict occurring at the level of response processing and/or output. Likewise, the difference between color-associated congruent trials and standard congruent trials (e.g., SKYblue – BLUEblue) would represent response facilitation. Indeed, both task and semantic conflicts are assumed to be equal in those two types of color-incongruent items (BLUEgreen and SKYgreen, see above) even though more complex interactions between these different conflicts cannot be excluded.

To sum up, in both Experiments 1 and 2, the positive difference in mean response latencies between standard color-incongruent words and color-neutral letter strings (e.g., BLUEgreen – XXXgreen) was used to measure the magnitude of overall Stroop interference. Furthermore, in both experiments (and as in the study of Augustinova et al., 2018b), the positive difference in mean response latencies between color-neutral words and letter strings (e.g., DOGgreen – XXXgreen) was used as a proxy for assessing the specific contribution of task conflict to this overall Stroop interference. The positive difference in mean response latencies between color-associated incongruent and color-neutral trials (e.g., SKYgreen – DOGgreen) was used as a proxy for assessing the specific contribution of semantic conflict to overall Stroop interference. Finally, the positive difference in mean response latencies between standard color-incongruent and color-associated incongruent trials (e.g., BLUEgreen – SKYgreen) was used as a proxy for assessing the specific contribution of response conflict to overall Stroop interference.

In Experiment 2, the magnitude of overall Stroop facilitation was also measured. It corresponded to the positive difference in mean response latencies between color-neutral words and standard color-congruent words (e.g., DOGblue – BLUEblue). Furthermore, the positive difference in mean response latencies between color-neutral trials and color-associated congruent trials (e.g., DOGblue – SKYblue) was used to isolate the specific contribution of semantic facilitation to the aforementioned overall Stroop facilitation. Finally, the positive difference in mean response latencies between color-associated and standard color-congruent trials (e.g., SKYblue – BLUEblue) was used to capture the specific contribution of response facilitation to overall Stroop facilitation.
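The subtractions described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the analysis code used in the study: the condition labels and the mean latencies in the usage example are hypothetical placeholders, not data from either experiment.

```python
def decompose_stroop(mean_rt):
    """Return Stroop components (ms) from a dict of condition mean RTs.

    The keys of `mean_rt` are hypothetical labels chosen for this sketch.
    """
    xxx = mean_rt["letter_string"]             # e.g., XXXgreen
    dog = mean_rt["neutral_word"]              # e.g., DOGgreen
    sky_inc = mean_rt["associated_incongruent"]  # e.g., SKYgreen
    blue_inc = mean_rt["standard_incongruent"]   # e.g., BLUEgreen

    effects = {
        "overall_interference": blue_inc - xxx,
        "task_conflict": dog - xxx,
        "semantic_conflict": sky_inc - dog,
        "response_conflict": blue_inc - sky_inc,
    }
    # Facilitation conditions exist only in Experiment 2.
    if "associated_congruent" in mean_rt and "standard_congruent" in mean_rt:
        sky_con = mean_rt["associated_congruent"]  # e.g., SKYblue
        blue_con = mean_rt["standard_congruent"]   # e.g., BLUEblue
        effects["overall_facilitation"] = dog - blue_con
        effects["semantic_facilitation"] = dog - sky_con
        effects["response_facilitation"] = sky_con - blue_con
    return effects


# Hypothetical mean latencies (ms), for illustration only.
example = {
    "letter_string": 600, "neutral_word": 620,
    "associated_incongruent": 650, "standard_incongruent": 700,
    "associated_congruent": 600, "standard_congruent": 580,
}
effects = decompose_stroop(example)
```

By construction, the three conflict components sum to overall interference, and the two facilitation components sum to overall facilitation.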

The implementations of the Stroop paradigm depicted above, thus, enabled us to further assess the nature of processes that are influenced by the variations in the response output commonly employed in the Stroop task. To this end, color identification responses were collected with both vocal and manual responses in both experiments. The response modality varied between participants in Experiment 1 and within participants in Experiment 2. Given the important discrepancies between findings regarding whether and the extent to which task and semantic conflict occur, respectively, with manual and vocal responses, we only a priori predicted that in both studies, the magnitude of Stroop interference would be larger with vocal as compared to manual responses and that this difference should result at least in part from a difference in response conflict.

#### EXPERIMENT 1

## Method

#### Participants

Seventy-six psychology undergraduates (56 females and 10 males, all native French speakers reporting normal or corrected-to-normal vision, Mage = 19.5 years; Mmin = 18; Mmax = 24) at Université Clermont Auvergne, Clermont-Ferrand, France, took part in this experiment in exchange for a course credit. The data of four participants were excluded from the analyses<sup>2</sup>, leaving a total of 72 participants (38 in the manual and 34 in the vocal response modality).

#### Design and Stimuli

Since the participants were randomly assigned to one of the two response modality conditions, the data were collected using a 2 (response modality: manual vs. vocal) × 4 (stimulus type: color-incongruent words vs. color-associated words vs. color-neutral words vs. color-neutral signs) design, with the first of these being used as a between-participants factor. There were 60 trials for each stimulus-type factor condition (resulting from five repetitions of the same set of stimuli), which varied randomly within a single block of 240 experimental trials.

The stimuli (presented in lowercase Courier font, size 18, on a black background) consisted of four color words: rouge [red], jaune [yellow], bleu [blue], and vert [green]; four color-associated words: tomate [tomato], maïs [corn], ciel [sky], and salade [salad]; four color-neutral words: balcon [balcony], robe [dress], pont [bridge], and chien [dog]; and strings of Xs of the same length as the color-incongruent trials. The four color-associated words were selected as strong associates (tomato-red: 49.4%; corn-yellow: 30.2%; sky-blue: 44%; and salad-green: 31.5%) from French word association norms (Ferrand and Alario, 1998; De La Haye, 2003) and pretested as depicted in Augustinova and Ferrand (2007). In each condition, all the stimuli were similar in length (4.5, 5, 4.75, and 4.75 letters on average for the color-incongruent words, the color-associated words, the color-neutral words, and the strings of Xs, respectively) and frequency (74, 82, and 84 occurrences per million for the color-incongruent words, the color-associated words, and the color-neutral words, respectively) according to Lexique (New et al., 2004). Color-incongruent and color-associated items always appeared in colors that were incongruent with the meaning of their word dimension.

<sup>2</sup>One participant made more than 33% of errors; the microphone did not detect responses for 2 participants, and EPrime failed to record responses for 1 participant.
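One way to see how 60 trials per condition could arise from five repetitions is to enumerate the word-by-color pairings. The sketch below is a reconstruction under assumptions, not the authors' E-Prime script. In particular, the yoking of each neutral word and each X string to one excluded color is hypothetical, introduced only so that every condition yields 12 items; the text specifies the incongruent pairing only for color words and color associates.

```python
import random
from collections import Counter
from statistics import mean

COLORS = ["rouge", "jaune", "bleu", "vert"]

# Each entry is (word, color that is never paired with it).
# For color words and associates, the excluded color is the congruent one.
COLOR_WORDS = [(c, c) for c in COLORS]
ASSOCIATES = [("tomate", "rouge"), ("maïs", "jaune"),
              ("ciel", "bleu"), ("salade", "vert")]
# ASSUMPTION: neutral words are yoked to one color in list order, only to
# obtain 12 items per condition; the source does not describe this.
NEUTRAL = [("balcon", "rouge"), ("robe", "jaune"),
           ("pont", "bleu"), ("chien", "vert")]
# ASSUMPTION: X strings are built from the lengths of the color words.
X_STRINGS = [("x" * len(word), color) for word, color in COLOR_WORDS]

CONDITIONS = {
    "color_incongruent": COLOR_WORDS,
    "color_associated": ASSOCIATES,
    "color_neutral_word": NEUTRAL,
    "color_neutral_sign": X_STRINGS,
}


def build_items(pairs):
    """Pair every word with each ink color except its excluded color."""
    return [(word, ink) for word, excluded in pairs
            for ink in COLORS if ink != excluded]


def build_block(repetitions=5, seed=0):
    """Return one randomized block of (condition, word, ink) trials."""
    trials = []
    for label, pairs in CONDITIONS.items():
        items = [(label, word, ink) for word, ink in build_items(pairs)]
        trials += items * repetitions
    random.Random(seed).shuffle(trials)
    return trials


block = build_block()
per_condition = Counter(label for label, _, _ in block)
mean_length_color_words = mean(len(word) for word, _ in COLOR_WORDS)
mean_length_neutral = mean(len(word) for word, _ in NEUTRAL)
```

Under these assumptions each condition contains 4 words × 3 colors = 12 items, and five repetitions give the 60 trials per condition and 240 trials per block reported above.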

#### Apparatus and Procedure

EPrime 2.1 (Psychology Software Tools, Pittsburgh, PA, United States) running on a PC (Dell Precision) was used for stimulus presentation and data collection. The participants, who were tested individually, were seated approximately 50 cm from a 17-inch Dell color monitor. With both response modalities, their task was to identify the color of letter strings presented on the screen as quickly and accurately as possible while ignoring their meanings. To this end, they were instructed to concentrate on the white fixation cross ("+") that appeared in the center of the (black) screen at the beginning of each trial. After 500 ms, the fixation cross was replaced by the stimulus, which remained on the screen until the participant responded or until 2,000 ms had elapsed.

In the manual response modality, the participants were required to respond on a keyboard placed on a table between them and the monitor. The response keys were labeled with colored stickers such that a red, blue, yellow, and green round-shaped sticker covered, respectively, the "S," "D," "K," and "L" keys of an AZERTY-type keyboard. Consequently, the participants pressed the "red" key with the middle finger and "blue" key with the index finger of their left hand, and the "yellow" key with the index finger and "green" key with the middle finger of their right hand. In the vocal response modality, the participants were required to respond out loud. Their responses were recorded via a Koss 70-dB microphone headset and stored on a Sony IC Recorder-ICD PX333.

Before the beginning of the experimental block, the participants were familiarized with the specificities of their response modality. Following MacLeod (2005), 128 key-matching practice trials were used in the manual response modality so that the participants could adequately learn the key–color correspondence. In the vocal response modality, the number of practice trials was reduced to 32 items. In both conditions, these practice trials consisted of strings of asterisks (presented in the four aforementioned colors).

## Results and Discussion

Latencies greater than 3 SDs above or below each participant's mean latency for each condition (i.e., less than 2% of the total data in the task administered with manual responses and less than 3% of the total data in the task administered with vocal responses) were excluded from the analyses.
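The trimming rule can be illustrated with a minimal Python sketch that operates on one participant-by-condition cell at a time. Two details are assumptions made for the example rather than facts from the source: the use of the sample standard deviation, and a single (non-iterative) pass. The latencies in the usage example are hypothetical.

```python
from statistics import mean, stdev


def trim_latencies(rts, n_sd=3.0):
    """Keep latencies within n_sd SDs of the mean of one cell.

    `rts` holds the correct-response latencies (ms) of a single
    participant in a single condition. ASSUMPTION: sample SD, one pass.
    """
    if len(rts) < 2:
        return list(rts)  # SD is undefined for fewer than two values
    cell_mean, cell_sd = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - cell_mean) <= n_sd * cell_sd]


# Hypothetical cell of 60 latencies containing one very slow response.
cell = [590, 610] * 29 + [600, 1900]
trimmed = trim_latencies(cell)
```

Because the cutoff is computed separately per participant and condition, a latency that is extreme in a fast condition may be retained in a slower one.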

Mean reaction times for correctly identified items were subsequently analyzed in a 4 (stimulus type: standard color-incongruent words vs. associated color-incongruent words vs. color-neutral words vs. color-neutral signs) × 2 (response modality: manual vs. vocal) analysis of variance (ANOVA) (see **Table 1** for descriptive statistics). This analysis revealed a significant main effect of stimulus type [F(3,210) = 142.40; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.670] and a marginally significant main effect of response modality [F(1,70) = 2.88; p = 0.094, η<sub>p</sub><sup>2</sup> = 0.039]. This latter effect was due to the fact that color identification times tended to be faster for vocal compared to manual responses. These main effects were qualified by a significant stimulus type × response modality interaction [F(3,210) = 15.33; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.180]<sup>3</sup>.

Its decomposition further revealed that the simple main effect of stimulus type was significant in both manual [F(3,68) = 17.83; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.440] and vocal [F(3,68) = 53.61; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.703] response modalities. Additional contrast analyses of these simple main effects revealed that in both response modalities, latencies for standard color-incongruent words were significantly longer than those observed for color-neutral signs (both ps < 0.001). Thus, a substantial amount of Stroop interference (i.e., BLUEgreen – XXXgreen) occurred in both response modalities. Yet, latencies for color-neutral words were significantly longer than those observed for color-neutral signs (see **Tables 1**, **2**) in the vocal (p < 0.001) but not in the manual (p = 0.159; Mdifference = 7 ms; 95%CI = −3 to 18) response modality. This latter result implies that, in the Stroop task administered with manual responses, the contribution of task conflict to the overall Stroop interference failed to reach significance, whereas the contribution of both semantic (i.e., SKYgreen – DOGgreen) and response conflicts (i.e., BLUEgreen – SKYgreen) was significant independently of the response output that was required (all ps < 0.001).

To examine further the extent to which the variation in response modality specifically influences task vs. semantic vs. response conflict, magnitudes of these conflicts were analyzed in a 3 (conflict type: task vs. semantic vs. response) × 2 (response modality: manual vs. vocal) ANOVA (see **Table 2**). This analysis revealed significant main effects of conflict type [F(2,140) = 20.46; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.226] and of response modality [F(1,70) = 21.72; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.237] that were qualified by a significant conflict type × response modality interaction [F(2,140) = 5.10; p = 0.007, η<sub>p</sub><sup>2</sup> = 0.068]. Its decomposition further revealed that the simple main effect of response modality was significant on task [F(1,70) = 29.54; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.297] and response conflicts [F(1,70) = 8.18; p = 0.006, η<sub>p</sub><sup>2</sup> = 0.105], such that their contribution to the overall interference was significantly larger when vocal (as opposed to manual) response output was required (see **Table 2**). This latter variation in the response output failed to influence the magnitude of semantic conflict [F(1,70) = 0.40; p = 0.532, η<sub>p</sub><sup>2</sup> = 0.006; Mdifference = 4 ms; 95%CI = −17 to 9]. The contribution of the latter conflict to the overall interference was significant but remained of the same magnitude with both types of the required response output (see **Table 2**).
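The partial eta squared values reported alongside each F test can be recovered from the F statistic and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The short Python sketch below applies this identity to the conflict-type ANOVA above; the function name is chosen for this illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the conflict type x response modality ANOVA above.
eta_conflict_type = round(partial_eta_squared(20.46, 2, 140), 3)
eta_modality = round(partial_eta_squared(21.72, 1, 70), 3)
eta_interaction = round(partial_eta_squared(5.10, 2, 140), 3)
```

The three results, 0.226, 0.237, and 0.068, match the effect sizes reported in the text.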

<sup>3</sup>The results of the same analysis on percentages of errors somewhat mirrored those observed on RTs as it revealed a significant main effect of stimulus type [F(3,210) = 25.70; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.269] that was qualified by a significant stimulus type × response modality interaction [F(3,210) = 15.36; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.180], with the main effect of response modality remaining non-significant [F(1,70) = 1.08; p = 0.302, η<sub>p</sub><sup>2</sup> = 0.015]. As can be seen in **Table 1**, the decomposition of the overall interaction suggests that all types of items were equally error prone in the manual response modality (all ps ≥ 0.418), whereas standard color-incongruent items were significantly more error prone than the other kinds of items (all ps < 0.001) in the vocal response modality. In sum, these results are not only in line with past studies but also rule out the possibility of a speed–accuracy trade-off.

TABLE 1 | Mean correct response times (in milliseconds), standard errors (in parentheses), and percentages of errors observed as a function of stimulus type and response modality.


M, manual; V, vocal; ni, not included.

TABLE 2 | Stroop-like effects (in milliseconds and percent ratios) observed as a function of response modality.


<sup>∗</sup>p < 0.001; <sup>a</sup>p = 0.008; <sup>b</sup>p < 0.01; M, manual; V, vocal; RT diff., reaction time differences; ni, not included. Bold values correspond to the Response Modality Effect.

Given the important differences in the speed of processing across the two types of response output, the previous analyses were supplemented by analyses of the distinct conflicts computed in the form of interference ratios (Augustinova et al., 2018a). For each individual, the observed magnitude of each conflict was divided by its appropriate baseline [e.g., for response conflict, (BLUEgreen – SKYgreen)/SKYgreen]. The resulting ratios were subsequently analyzed in a 3 (conflict-type percent: task vs. semantic vs. response) × 2 (response modality: manual vs. vocal) ANOVA (see **Table 2**). This analysis mirrored the aforementioned results, such that it revealed significant main effects of conflict type [F(2,140) = 16.49; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.191] and of response modality [F(1,70) = 38.26; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.353] that were qualified by a significant conflict type × response modality interaction [F(2,140) = 7.07; p = 0.001, η<sub>p</sub><sup>2</sup> = 0.092]. Its decomposition further revealed that the simple main effect of response modality was significant on the ratio of task [F(1,70) = 37.97; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.352] and of response conflict [F(1,70) = 11.02; p = 0.001, η<sub>p</sub><sup>2</sup> = 0.136], such that their contribution to the overall interference was significantly larger when vocal (as opposed to manual) response output was required (see **Table 2**). Again, this latter variation in the response output failed to influence the ratio of semantic conflict [F(1,70) = 0.56; p = 0.458, η<sub>p</sub><sup>2</sup> = 0.008; Mdifference = −0.006; 95%CI = −0.023 to 0.010].
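The ratio computation can be sketched as follows. The text gives the baseline explicitly only for response conflict (the color-associated incongruent condition); the baselines used here for task conflict (letter strings) and semantic conflict (color-neutral words) are assumptions that follow the same logic of dividing each difference by its subtrahend condition. The condition labels and latencies are hypothetical.

```python
def conflict_ratios(mean_rt):
    """Per-participant conflict ratios from condition mean RTs (ms).

    ASSUMPTION: each difference is divided by the condition that is
    subtracted; the source states this only for response conflict.
    """
    xxx = mean_rt["letter_string"]             # e.g., XXXgreen
    dog = mean_rt["neutral_word"]              # e.g., DOGgreen
    sky = mean_rt["associated_incongruent"]    # e.g., SKYgreen
    blue = mean_rt["standard_incongruent"]     # e.g., BLUEgreen
    return {
        "task": (dog - xxx) / xxx,
        "semantic": (sky - dog) / dog,
        "response": (blue - sky) / sky,
    }


# Hypothetical participant, for illustration only.
participant = {
    "letter_string": 600, "neutral_word": 630,
    "associated_incongruent": 660, "standard_incongruent": 726,
}
ratios = conflict_ratios(participant)
```

Expressing each conflict as a proportion of its baseline makes participants with different overall response speeds, such as those in the vocal and manual groups, more directly comparable.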

In line with past literature, the results reported show substantially larger magnitudes of Stroop interference with vocal as compared to manual responses (see **Table 2**). These differences were due to the fact that both response and task conflict contributed less when manual response output was required – to the point that the contribution of task conflict remained nonsignificant in this response modality, replicating Augustinova et al. (2018b, Experiment 1). The contribution of semantic conflict to overall Stroop interference remained significant, and its magnitude was equivalent across the two types of response output (see **Table 2**).

These results have several potentially interesting implications. First, the possible absence of task conflict in the Stroop task administered with manual responses, at least when measured by comparing responses to color-neutral words against a repeated-Xs baseline, suggests qualitative (Sharma and McKenna, 1998; Kinoshita et al., 2017) rather than just quantitative (Brown and Besner, 2001; Augustinova et al., 2018b; Parris et al., in press) differences between response modes. Thus, the investigation of the response modality effect in the Stroop task might actually add to uncovering the different components of Stroop interference observed with manual as compared to vocal responses, supporting the notion that the two tasks are not equivalent (e.g., naming vs. categorization tasks entailing different processes; see, e.g., Kinoshita et al., 2017; Fennell and Ratcliff, 2019). Given the importance of this second implication of the results reported, the following experiment was designed to (a) replicate these results while the response modality was manipulated within participants, and (b) extend them to Stroop facilitation (i.e., difference in mean reaction times for color-neutral and standard color-congruent words; e.g., DOGblue – BLUEblue).

The rationale behind this extension corresponds to a further investigation of the fact that the magnitude of semantic conflict remained equivalent with both response modalities (see also Augustinova et al., 2018b but see Sharma and McKenna, 1998; Brown and Besner, 2001; Kinoshita et al., 2018). If semantic processing in the Stroop task is indeed invariant (and as such it cannot be prevented and/or reduced), results on Stroop facilitation should logically mirror those observed in the present experiment on Stroop interference. More specifically, Stroop facilitation observed with both manual and vocal responses should result from a substantial amount of semantic facilitation (i.e., differences in mean reaction times for color-neutral words and associated color-congruent words, e.g., DOGblue – SKYblue) that should arise in both response modalities. However, the contribution of response facilitation (i.e., differences in mean reaction times for associated and standard color-congruent words; e.g., SKYblue – BLUEblue) should be reduced in the manual response modality. The following experiment was designed to test just these predictions.

## EXPERIMENT 2

#### Method

Forty-five psychology undergraduates (36 females and 9 males, all native French speakers reporting normal or corrected-to-normal vision, Mage = 21.04 years; Mmin = 19; Mmax = 26) at Université Clermont Auvergne, Clermont-Ferrand, France, took part in this experiment in exchange for a course credit. The data of nine participants were excluded from the analyses<sup>4</sup>, leaving a total of 36 participants. Unlike in Experiment 1, the response modality factor was manipulated within participants. Thus, the order of the two response modalities was counterbalanced in a random fashion, such that half of the participants responded with the manual and the other half with the vocal response modality first. The stimulus-type factor used in Experiment 1 was supplemented with two new kinds of items: standard color-congruent (BLUEblue) and associated color-congruent (SKYblue) words. In other words, color words not only appeared in colors that were incongruent but also congruent with the meaning of their word dimension. There were 48 trials in each condition of the stimulus-type factor (resulting from four repetitions of the same set of color-incongruent and color-neutral stimuli and 12 repetitions of the same set of color-congruent stimuli) that varied randomly within a single block of 288 experimental trials (that was executed in each of the two response modalities). Because of this balancing, the facilitation effect was subject to a contingency bias (Schmidt and Besner, 2008; Schmidt et al., 2015). However, our main interest lies in processes underlying the response modality effect; thus, the same randomization procedure was used (see also, e.g., Fennell and Ratcliff, 2019).
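The contingency bias mentioned above can be made concrete with a small count. The sketch assumes, as in the item structure of Experiment 1, that each color word is paired with its three incongruent colors (4 words × 3 colors = 12 items, repeated four times) and with its one congruent color (4 items, repeated 12 times); this item structure is inferred from the trial counts and is not spelled out in the text. English color names are used as placeholders.

```python
from collections import Counter
from fractions import Fraction

COLORS = ["red", "yellow", "blue", "green"]


def color_word_trials(incongruent_reps=4, congruent_reps=12):
    """List (word, ink) trials for standard color words in one block."""
    trials = []
    for word in COLORS:
        for ink in COLORS:
            reps = congruent_reps if ink == word else incongruent_reps
            trials.extend([(word, ink)] * reps)
    return trials


def ink_given_word(trials, word):
    """Proportion of each ink color among the trials showing `word`."""
    inks = Counter(ink for w, ink in trials if w == word)
    total = sum(inks.values())
    return {ink: Fraction(count, total) for ink, count in inks.items()}


trials = color_word_trials()
blue_contingency = ink_given_word(trials, "blue")
```

Under these assumptions a given color word appears in its own color on half of its trials and in each incongruent color on one sixth of them, so the word predicts the correct response well above the one-in-four chance level. This is the sense in which facilitation may partly reflect contingency learning.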

Finally, DMDX software (Forster and Forster, 2003) was used for stimulus presentation and data collection. Remaining aspects were identical to those depicted in the section "Methods" for Experiment 1, including the practice trials that were administered again before the beginning of each of the two experimental blocks.

#### Results and Discussion

Latencies greater than 3 SDs above or below each participant's mean latency for each condition (i.e., less than 1% of the total data in the task administered with manual responses and less than 2% of the total data in the task administered with vocal responses) were excluded from the analyses.

Mean reaction times for correctly identified items were first analyzed in an omnibus 6 (stimulus type: standard color-incongruent words vs. associated color-incongruent words vs. color-neutral words vs. color-neutral signs vs. associated color-congruent words vs. standard color-congruent words) × 2 (response modality: manual vs. vocal) ANOVA (see **Table 1** for descriptive statistics). This analysis revealed significant main effects of stimulus type [F(5,175) = 113.65; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.765] and of response modality [F(1,35) = 6.40; p = 0.016, η<sub>p</sub><sup>2</sup> = 0.155]. This latter effect was due to faster color identification times for manual compared to vocal responses. These main effects were qualified by a significant stimulus type × response modality interaction [F(5,175) = 27.92; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.444]<sup>5</sup>.

<sup>4</sup>This exclusion was due to the fact that in the vocal response modality, eight participants exhibited more than 33% of errors and/or no responses (because the microphone did not detect their responses), and 1 participant made irrelevant mouth/tongue movements that systematically triggered the voice key prematurely.

The decomposition of the stimulus type × response modality interaction (see above) revealed that the simple main effect of stimulus type was significant in both manual [F(5,31) = 18.42; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.748] and vocal [F(5,31) = 44.39; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.877] response modalities. Additional contrast analyses of these simple main effects revealed that in both response modalities, latencies for standard color-incongruent words were significantly longer than those observed for color-neutral signs (both ps < 0.001), suggesting that significant Stroop interference occurred with both types of response output.

As in Experiment 1, latencies for color-neutral words were significantly longer than those observed for color-neutral signs (see **Table 1**) in the vocal (p < 0.001) but not in the manual (p = 0.145; Mdifference = 5 ms; 95%CI = 2–12) response modality. This suggests again the absence of a significant contribution of task conflict in this latter response modality, whereas both semantic (e.g., SKYgreen – DOGgreen) and response conflicts (e.g., BLUEgreen – SKYgreen) significantly contributed to Stroop interference in both response modalities (all ps < 0.001).

Additionally, these contrast analyses revealed that, in both response modalities, latencies for color-neutral words were significantly longer than those observed for standard color-congruent items (both ps < 0.001), suggesting that significant Stroop facilitation (e.g., DOG<sub>green</sub> – BLUE<sub>blue</sub>) occurred with both types of response output. Further contrast analyses revealed that, in both response modalities, Stroop facilitation resulted from a significant contribution of semantic facilitation (e.g., DOG<sub>green</sub> – SKY<sub>blue</sub>; all ps < 0.01). In contrast, the contribution of response facilitation (e.g., SKY<sub>blue</sub> – BLUE<sub>blue</sub>) failed to reach significance when manual responses were used (p = 0.219; Mdifference = 8 ms; 95%CI = −20 to 5), whereas it was significant and of great magnitude (see **Table 2**) when vocal response output was required.
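The contrasts used in this section to isolate each component can be illustrated with a minimal sketch. The latencies below are invented for illustration and are not the values reported in **Table 1**; the dictionary keys and the function name `decompose` are hypothetical, and the label "associated color-congruent" for the SKY-in-blue item is an assumption based on the examples given in the text.

```python
# Hypothetical mean color-identification latencies (ms) for one response
# modality. Keys give the word and its ink color; all values are made up.
mean_rt = {
    "XXX_green": 600.0,   # color-neutral sign
    "DOG_green": 620.0,   # color-neutral word
    "SKY_green": 640.0,   # associated color-incongruent word
    "BLUE_green": 680.0,  # standard color-incongruent word
    "SKY_blue": 605.0,    # associated color-congruent word (assumed label)
    "BLUE_blue": 590.0,   # standard color-congruent word
}


def decompose(rt):
    """Return interference and facilitation components as RT differences (ms)."""
    interference = {
        "task": rt["DOG_green"] - rt["XXX_green"],
        "semantic": rt["SKY_green"] - rt["DOG_green"],
        "response": rt["BLUE_green"] - rt["SKY_green"],
    }
    facilitation = {
        "semantic": rt["DOG_green"] - rt["SKY_blue"],
        "response": rt["SKY_blue"] - rt["BLUE_blue"],
    }
    return interference, facilitation


interference, facilitation = decompose(mean_rt)

# By construction, the components add up to the overall effects.
overall_interference = mean_rt["BLUE_green"] - mean_rt["XXX_green"]
overall_facilitation = mean_rt["DOG_green"] - mean_rt["BLUE_blue"]
```

Because each contrast shares a condition with the next, the three conflict magnitudes sum to the overall interference measured against color-neutral signs, and the two facilitation magnitudes sum to the overall facilitation measured against color-neutral words.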

#### The Influence of Response Modality on Distinct Components of Stroop Interference

To examine further the extent to which the variation in response modality specifically influences task vs. semantic vs. response conflict, as in Experiment 1, the magnitudes of these conflicts were analyzed in a 3 (conflict type: task vs. semantic vs. response) × 2 (response modality: manual vs. vocal) ANOVA (see **Table 2**). This analysis revealed significant main effects of conflict type [F(2,70) = 35.00; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.500] and of response modality [F(1,35) = 63.60; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.645] that were also included in the significant conflict type × response modality interaction [F(2,70) = 8.65; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.198]. Its decomposition further revealed that the simple main effect of response modality was significant for task [F(1,35) = 37.67; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.518] and response conflicts [F(1,35) = 25.05; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.417], such that their contribution to the overall interference was significantly larger when vocal (as opposed to manual) response output was required (see **Table 2**). This variation in response output failed to influence the magnitude of semantic conflict [F(1,35) = 0.76; p = 0.388, η<sub>p</sub><sup>2</sup> = 0.006; Mdifference = −5 ms; 95%CI = −17 to 7]. Recall that the contribution of the latter conflict to the overall interference was significant but remained of the same magnitude with both types of required response output (see **Table 2**).

Given the important differences in the speed of processing across the two types of response output even within participants, and as in Experiment 1, the previous analyses were supplemented by analyses of conflicts computed as interference ratios. The resulting ratios were subsequently analyzed in a 3 (Conflict-type percent: task vs. semantic vs. response) × 2 (Response modality: manual vs. vocal) ANOVA (see **Table 2**). This analysis mirrored the aforementioned results: it revealed significant main effects of Conflict-type [F(2,70) = 32.98; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.485] and of Response-modality [F(1,35) = 72.65; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.675] that were also included in the significant Conflict-type × Response-modality interaction [F(2,70) = 8.96; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.204]. Its decomposition further revealed that the simple main effect of Response-modality was significant for the ratio of task [F(1,35) = 41.95; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.545] and of response conflict [F(1,35) = 24.05; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.407], such that their contribution to the overall interference was significantly larger when vocal (as opposed to manual) response output was required (see **Table 2**). Again, this variation in response output failed to influence the ratio of semantic conflict [F(1,35) = 0.33; p = 0.569, η<sub>p</sub><sup>2</sup> = 0.009; Mdifference = −0.005; 95%CI = −0.022 to 0.012].
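The rationale for supplementing raw differences with ratios can be sketched as follows. The text does not spell out how the interference ratios were formed; the sketch assumes, purely for illustration, that each RT difference is divided by the latency of its comparison (subtracted) condition. The function name `interference_ratio` and all latencies are hypothetical.

```python
def interference_ratio(rt_condition, rt_baseline):
    """Express an RT difference as a proportion of the baseline latency.

    Assumed formula (not confirmed by the source):
    (condition - baseline) / baseline.
    """
    if rt_baseline <= 0:
        raise ValueError("baseline latency must be positive")
    return (rt_condition - rt_baseline) / rt_baseline


# Made-up mean latencies (ms); one modality is slower overall than the other.
manual = {"DOG_green": 620.0, "SKY_green": 640.0}
vocal = {"DOG_green": 775.0, "SKY_green": 800.0}

# Raw semantic conflict differs between modalities...
raw_manual = manual["SKY_green"] - manual["DOG_green"]
raw_vocal = vocal["SKY_green"] - vocal["DOG_green"]

# ...but the proportional effect is the same once overall speed is factored out.
ratio_manual = interference_ratio(manual["SKY_green"], manual["DOG_green"])
ratio_vocal = interference_ratio(vocal["SKY_green"], vocal["DOG_green"])
```

In this invented case the raw difference is larger in the slower modality while the ratio is identical, which is the kind of speed-related confound that a ratio analysis is meant to rule out.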

#### The Influence of Response Modality on Distinct Components of Stroop Facilitation

To examine further the extent to which the variation in response modality specifically influenced the contribution of semantic vs. response facilitation to the overall Stroop facilitation, the magnitudes of these facilitation effects were analyzed in a 2 (facilitation type: semantic vs. response) × 2 (response modality: manual vs. vocal) ANOVA (see **Table 2**). This analysis revealed significant main effects of response modality [F(1,35) = 10.94; p = 0.002, η<sub>p</sub><sup>2</sup> = 0.238] and of facilitation type [F(1,35) = 18.19; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.342]. The facilitation type × response modality interaction was also significant [F(1,35) = 55.341; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.613]. It was also significant when these effects were analyzed as facilitation ratios. More specifically, with these latter indicators, the main effect of facilitation type was non-significant [F(1,35) = 2.42; p = 0.128, η<sub>p</sub><sup>2</sup> = 0.065], whereas that of response modality [F(1,35) = 18.41; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.345] was significant. As already mentioned, the facilitation type × response modality interaction [F(1,35) = 12.24; p = 0.001, η<sub>p</sub><sup>2</sup> = 0.259] was significant. The further decomposition of this interaction revealed that the simple main effect of response modality was significant for the ratio of response facilitation [F(1,35) = 22.73; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.394], such that its contribution to overall Stroop facilitation was significantly larger when vocal (as opposed to manual) response output was required (see **Table 2**). This variation in response output failed to influence the ratio of semantic facilitation [F(1,35) = 0.41; p = 0.529, η<sub>p</sub><sup>2</sup> = 0.011; Mdifference = 0.006; 95%CI = −0.012 to 0.022], such that semantic facilitation contributed significantly to the overall Stroop facilitation phenomenon in both response modalities (see **Table 2** for the very same pattern of results observed with magnitudes of semantic vs. response facilitation).

<sup>5</sup>The results of the same analysis on percentages of errors somewhat mirrored those observed on RTs, as it revealed a significant main effect of Stimulus-type [F(5,175) = 32.12; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.479] that was also included in the significant Stimulus-type × Response-modality interaction [F(5,175) = 14.97; p < 0.001, η<sub>p</sub><sup>2</sup> = 0.300], with the main effect of Response-modality remaining non-significant [F(1,35) = 2.07; p = 0.158, η<sub>p</sub><sup>2</sup> = 0.056]. As can be seen in **Table 1**, the decomposition of the overall interaction suggests that standard color-incongruent items were significantly more error-prone than the other types of items (all ps < 0.001) in the vocal modality; this was also the case in the manual response modality (all ps < 0.03), except for the difference between standard color-incongruent items and associated color-incongruent items (p = 0.34).

It is important to note that the aforementioned pattern of results would have been diluted by the use of color-neutral signs (as opposed to words) as a baseline. Indeed, even though color-neutral signs and words are often used interchangeably as baselines, standard Stroop facilitation observed in the vocal response modality (p = 0.558; Mdifference = 5 ms; 95%CI = −12 to 24 ms) and semantic facilitation observed in the manual response modality (p = 0.139; Mdifference = 9 ms; 95%CI = −3 to 20 ms) would no longer have been significant if color-neutral signs had been used to compute these contrasts (see, e.g., Redding and Gerjets, 1977; Brown, 2011 for other empirical demonstrations). Thus, these results are compatible with Brown's (2011) conclusion that if a baseline consists of color-neutral signs instead of words, not only is the magnitude of Stroop interference overestimated, but also, and importantly, the magnitude of Stroop facilitation is largely underestimated. In light of the present results, and because some task conflict actually occurs even for color-congruent stimuli (Goldfarb and Henik, 2007), it still seems useful to nuance this conclusion by specifying that if a baseline consists of color-neutral signs instead of words, the magnitude of the color incongruency effect is overestimated and, importantly, the magnitude of Stroop facilitation is largely underestimated.

#### GENERAL DISCUSSION

The results of Experiments 1 and 2 yielded an important response modality effect – the direction of which was consistent with past findings (e.g., White, 1969; Redding and Gerjets, 1977; Neill, 1977; McClain, 1983; Sharma and McKenna, 1998; Kinoshita et al., 2018; Fennell and Ratcliff, 2019; Zahedi et al., 2019; Parris et al., in press). Indeed, in both experiments reported above, the magnitude of Stroop interference was substantially larger when vocal responses as opposed to key presses were used. This means that both the Stroop interference effects and response modality effects are the same in both experiments and are, therefore, not affected by the inclusion of the additional congruent conditions in Experiment 2.

The present study further extended past results in several important ways. In particular, it sheds more direct light on the processes driving this effect. Specifically, the results of both experiments showed that the response modality effect is due to a significantly smaller contribution of task and response conflicts (but not of semantic conflict) to overall Stroop interference when manual, as opposed to vocal, response output is required. More precisely, with key presses, the magnitude of task conflict is reduced to the point that it fails to contribute significantly to overall Stroop interference. Response conflict, although significantly reduced in magnitude, still contributed to overall Stroop interference, as did semantic conflict, whose magnitude remained unchanged by the induced differences in the required response output. This pattern of results occurred independently of whether response modality was manipulated between (Experiment 1) or within (Experiment 2) participants.

These results therefore have several potentially important implications. First, they seem consistent with a rather puzzling idea, mentioned earlier (see section "Introduction"), that the way participants identify the color of Stroop stimuli determines how (rather than the extent to which) different features of these compound stimuli are actually processed. Indeed, whereas all types of conflict (task, semantic, and response conflicts) seem to contribute significantly to the overall Stroop interference observed with vocal responses, only semantic and response conflicts clearly contribute significantly to the Stroop interference observed with manual responses. Consequently, the second important implication of this pattern of results is that vocal and manual Stroop tasks might actually correspond to two different tasks. Specifically, in line with the conclusions of several recent studies, the former might correspond to a naming task and the latter to a categorization task, hence entailing qualitative rather than quantitative differences in processing (e.g., Kinoshita et al., 2017; Fennell and Ratcliff, 2019).

While this latter possibility remains plausible and should therefore be thoroughly addressed by additional studies, we are inclined to argue in favor of a perhaps more parsimonious possibility: quantitative, rather than qualitative, differences in processing between vocal and manual Stroop tasks (Roelofs, 2003). Indeed, in line with an integrative perspective that bridges both SC-RC and TC-RC multistage accounts of Stroop interference (for reviews, see Augustinova et al., 2018b; Parris et al., under review), it remains equally plausible that some amount of task conflict occurs with both types of response output. However, given the modest magnitude of this contribution with manual responses, response time might not be the most suitable indicator for capturing it (see, e.g., Augustinova et al., 2015; see also Kinoshita et al., 2018 for findings consistent with this latter conclusion). This reasoning is consistent with the findings of Heil et al. (2004) in a letter search priming paradigm. They convincingly demonstrated that the absence of semantic activation cannot be validly inferred from the lack of response time effects in this paradigm. Indeed, in their study, the absence of a response time effect occurred while an event-related potential (ERP) correlate of semantic activation (i.e., the N400 amplitude) was still significant and sensitive to the experimental manipulations used. It therefore remains possible that a significant contribution of an early component of Stroop interference such as task conflict can still be found in electrophysiological measures such as ERPs (see Elchlepp et al., 2013 for ERP task set conflict correlates observed in a version of the task-switching paradigm).

Another point to note is that the measure of task conflict used in this and the previous studies mentioned above (e.g., DOG<sub>green</sub> – XXX<sub>green</sub>) is not the only measure of task conflict. To investigate the potential role of conflict between task sets in the Stroop task, Goldfarb and Henik (2007; see also Kalanthroff et al., 2013a,b) reported a study in which they attempted to reduce task conflict control by increasing the proportion of non-word neutral trials (repeated letter strings) to 75%. Increasing the proportion of non-word neutral trials would create the expectation of a low task conflict context, so that task conflict monitoring would effectively be offline. In addition, on half of the trials, participants received cues that indicated whether the following stimulus would be a non-word or a color word, giving another indication as to whether the mechanisms that control task conflict should be activated. For non-cued trials, when task conflict control was presumably at its nadir, and task conflict therefore at its peak, RTs were slower for congruent trials than for non-word neutral trials, producing a negative facilitation effect. This negative facilitation, indicating the presence of task conflict, was observed with a manual response. Thus, our argument is not that there is no task conflict with manual responses, but that our data provide evidence for larger task conflict with a vocal response, which would contribute to the difference in interference (and facilitation) effects often reported between the two response modes. The fact that observing negative facilitation requires an experimental manipulation that would modify facilitation and other forms of conflict (e.g., Kinoshita et al., 2018) means that it is not ideal for measuring the contribution of conflict and facilitation types to Stroop effects.

The present behavioral findings also suggest that the contribution of semantic conflict remains unaffected by variations in response modality. Several past ERP studies are in line with this result, as they show that the amplitude of the aforementioned N400, an ERP correlate of semantic conflict (Augustinova et al., 2015), also remains unaffected by response modality (Liotti et al., 2000; Zahedi et al., 2019). Note, however, that the scalp distribution of the N400 might differ as a function of response output (Liotti et al., 2000). Taken together, the present and past results are consistent with the idea that semantic processing in the Stroop task occurs, and to the same magnitude, irrespective of the type of response output required (see Augustinova and Ferrand, 2014a,b; Brown and Besner, 2001 for discussions, but see Sharma and McKenna, 1998; Hasshim and Parris, 2014, 2015 for different empirical findings). This idea is strengthened by the fact that semantic facilitation remained unaffected by the type of response output, whereas response facilitation was substantially reduced (to the point of its actual elimination) in the manual response modality. This finding is inconsistent with the notion that Stroop effects observed with semantic Stroop stimuli are due to the indirect measurement of response conflict (Roelofs, 2003). That is, it has been argued that apparent semantic Stroop effects arise from the connections that semantically related stimuli have to response colors (i.e., sky is related to blue, and it is the activation of the response blue that leads to the Stroop effect). In the present data, semantic Stroop effects are unaffected by response mode, whereas response conflict is affected. If semantic effects were due to connections at the response level, one would expect to see simultaneous modification of the semantic- and response-level effects. However, the semantic Stroop effects are much smaller than the response effects, and so the preserved effects could be due to effect magnitude as opposed to effect type (see Parris et al., under review, for a fuller discussion of this issue).

It is clear from the present data set that, as a percentage of overall Stroop interference and facilitation effects, response processing contributes less when using a manual response (compared to a vocal response), suggesting that the makeup of Stroop effects differs between response modes. This is, however, due to the substantially reduced amount of both response conflict and response facilitation. Indeed, the elimination of response facilitation with manual responses is an important and surprising finding, one that shows how different facilitation is with manual and vocal responses. Response facilitation, like response conflict, is thus substantially reduced with manual responses, suggesting a commonality between Stroop interference and facilitation, indices that have recently been considered as potentially unrelated phenomena (Parris, 2014; see Brown, 2011 for a further discussion of this important issue).

Finally, the concluding implication of the present findings is related to the fact that the investigations of the response modality effect in the Stroop task seemingly contribute to uncovering the different components of Stroop interference. Even though these different components still remain to be further studied, namely, with more time- and, perhaps, locus-sensitive indicators, it seems rather obvious that the historically favored single-stage response competition accounts should be abandoned in favor of multistage accounts of Stroop interference (Risko et al., 2006; Augustinova et al., 2018b).

This paradigmatic shift is important because single-stage response competition accounts still largely dominate both empirical research and clinical practice. Many researchers and practitioners who are interested in Stroop interference itself and/or in its measurement still seem to be unaware that it goes far beyond mere response competition and that it should, thus, be measured or at least interpreted as a composite phenomenon involving additional types of components. Consequently, the extended semantic Stroop paradigm used in the present study might turn out to be an evaluation tool that is simple enough to be administered in both laboratory and field (i.e., clinical) settings. Indeed, the specific contribution of all three types of conflict (task, semantic, and response conflicts), as well as the modulation (or lack thereof) of these distinct contributions, can be clearly seen within this paradigm, at least when administered with vocal responses. Also, and importantly, it is not restricted to manual responses as is the case with the so-called 2-to-1 paradigm (De Houwer, 2003; see also, e.g., van Veen and Carter, 2005; Hasshim and Parris, 2014, 2015). As already emphasized by Augustinova et al. (2018b), the extended form of the Stroop paradigm can therefore "be administered not only using an item-by-item (i.e., computerized) presentation but also, potentially, in a card version that is still in widespread use in clinical practice (see, e.g., Bugg et al., 2007 for an example and Augustinova and Ferrand, 2014b; Augustinova et al., 2016 for discussions of this issue)" (p. 61, see also Augustinova et al., 2018b). Thus, a more fine-grained measurement of Stroop interference would represent an added value, notably for neuropsychological practice. Indeed, because different components of Stroop interference are likely to be associated with distinct neural substrates (Bench et al., 1993; Milham et al., 2001; van Veen and Carter, 2005; Chen et al., 2013), it remains highly plausible that the different conflicts involved in Stroop interference are selectively impacted by various clinical conditions (e.g., Alzheimer's disease, attention deficit hyperactivity disorder). The present study therefore motivates a comparison of the neural substrates of all three conflict types in the same neuroimaging study. Indeed, one of the rationales for the original proposal of a TC-RC account of Stroop interference by MacLeod and MacDonald (2000) lies in the observation that the Anterior Cingulate Cortex (ACC) appeared to be more activated by incongruent and congruent stimuli when compared to repeated letter neutral stimuli (e.g., XXX; see also Bench et al., 1993).
That said, no study has yet directly investigated this possibility using the required contrast of color-neutral words to color-neutral letter strings, so the precise location of activation within the ACC associated with task conflict is not known (see Milham et al., 2001; van Veen and Carter, 2005; Chen et al., 2013 for distinct locations of semantic vs. response conflicts). Future studies therefore need to address this remaining issue directly. Meanwhile, the present study largely reaffirms that Stroop interference and facilitation have several loci as opposed to just a single (i.e., response) locus, at least with vocal responses.

## DATA AVAILABILITY

The datasets generated for this study are available on request to the corresponding authors.

## ETHICS STATEMENT

Human subject research: The studies involving human participants were reviewed and approved by Comité de Protection des Personnes Sud-Est 6 (Clermont-Ferrand, France). The patients/participants provided their written informed consent to participate in this study. Animal subjects: No animal studies are presented in this manuscript. Human images: No potentially identifiable human images or data are presented in this study.

## AUTHOR CONTRIBUTIONS

MA and LF designed the study, collected and analyzed the data, and wrote the first draft of the manuscript. All authors contributed to the interpretation of the data, critically revised the final version of the manuscript, and contributed equally to reading, commenting on, and approving the submitted version.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Augustinova, Parris and Ferrand. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Preserved Proactive Control in Ageing: A Stroop Study With Emotional Faces vs. Words

Natalie Berger\*, Anne Richards and Eddy J. Davelaar

Department of Psychological Sciences, Birkbeck, University of London, London, United Kingdom

Previous studies regarding age-related changes in proactive control were inconclusive and the effects of emotion on proactive control in ageing are yet to be determined. Here, we assessed the role of task-relevant emotion on proactive control in younger and older adults. Proactive control was manipulated by varying the proportion of conflict trials in an emotional Stroop task. In Experiment 1, emotional target faces with congruent, incongruent or non-word distractor labels were used to assess proactive control in younger and older adults. To investigate whether the effects of emotion are consistent across different stimulus types, emotional target words with congruent, incongruent or obscured distractor faces were used in Experiment 2. Data from this study showed that older adults successfully deployed proactive control when needed and that task-relevant emotion affected cognitive control similarly in both age groups. It was also found that the effects of emotion on cognitive performance were qualitatively different for faces and words, with facilitating effects being observed for happy faces and for negative words. Overall, these results suggest that the effects of emotion and age on proactive control depend on the task at hand and the chosen stimulus set.

#### Edited by:

Benjamin Andrew Parris, Bournemouth University, United Kingdom

#### Reviewed by:

Robert West, DePauw University, United States Yoshifumi Ikeda, Joetsu University of Education, Japan

> \*Correspondence: Natalie Berger n.berger@bbk.ac.uk

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 29 April 2019 Accepted: 02 August 2019 Published: 03 September 2019

#### Citation:

Berger N, Richards A and Davelaar EJ (2019) Preserved Proactive Control in Ageing: A Stroop Study With Emotional Faces vs. Words. Front. Psychol. 10:1906. doi: 10.3389/fpsyg.2019.01906

Keywords: proactive control, cognitive control, ageing, task-relevant emotion, Stroop task

## INTRODUCTION

Research suggests that the ability to exert cognitive control over incoming information is not a unitary process. According to the dual mechanisms of control (DMC) theory (Braver et al., 2007, 2009), there are at least two separable factors: proactive control refers to sustained control, which is recruited before the occurrence of conflict (Braver, 2012), whereas reactive control refers to transient control processes that are recruited once conflict has been detected (Botvinick et al., 2001). In recent years, research has started to assess the effects of emotion on these two control modes (Kalanthroff et al., 2016; Grimshaw et al., 2017; Kar et al., 2017). However, none of these studies has investigated the effects of emotion on proactive control in ageing, despite evidence of age-related changes in executive functions and in emotional functioning. The aim of the present research was therefore to investigate younger and older adults' ability to exert proactive control in two emotional Stroop tasks.

## Age-Related Changes in Cognitive and Emotional Functioning

Research indicates that reactive control is preserved in ageing (Paxton et al., 2006; Braver, 2012). In contrast, research findings regarding age-related differences in proactive control have been mixed. Significantly impaired goal maintenance was found in older relative to younger adults (e.g., Braver et al., 2005, 2009; Haarmann et al., 2005; Paxton et al., 2008, Exp. 1), which was interpreted as evidence for impaired proactive control in ageing. Other studies, however, reported intact (e.g., Paxton et al., 2008, Exp. 2; Staub et al., 2014) or even improved proactive control in older relative to younger adults (Staub et al., 2014). It should be noted that impaired proactive control in ageing was found in studies using the AX-Continuous Performance Task (AX-CPT; Rosvold et al., 1956), which requires participants to maintain goal-related information and to make target responses on cued trials and non-target responses on all other trials. This task not only taps proactive control but also requires participants to remember a two-fold set of rules and to keep track of preceding items in order to make correct target and non-target responses to an X. Previous research has shown that these abilities are impaired in ageing: older adults were found to have difficulty maintaining two different tasks in working memory (Verhaeghen and Cerella, 2002; Reimers and Maylor, 2005; Wasylyshyn et al., 2011) and updating working memory (Van der Linden et al., 1994; Hartman et al., 2001; Salthouse et al., 2003; De Beni and Palladino, 2004; Chen and Li, 2007; Schmiedek et al., 2009). Thus, age-related differences in AX-CPT performance might arise from impairments in processes other than proactive control.

So far, the effects of emotion on proactive control in ageing have received little attention, despite evidence that emotion-cognition interactions change with age (for comprehensive reviews, see Mather, 2004; Mather and Carstensen, 2005; Murphy and Isaacowitz, 2008; Kensinger, 2009). Research from the domain of working memory (WM) has shown that older adults can benefit from the inclusion of emotional and particularly positive material (Mikels et al., 2005; Mammarella et al., 2013a,b). For instance, Mikels et al. (2005) found age-related impairments in a delayed-response task when participants had to compare the brightness of two neutral pictures but not when they had to compare the emotional intensity of two emotional pictures. Moreover, older adults outperformed younger adults when they had to compare the emotional intensity of positive pictures, whereas younger adults showed better performance than older adults on trials with negative pictures. Age-related impairments were also found when neutral but not when emotional words were used in a modified version of the operation WM span test, in which participants had to maintain words while solving mathematical operations (Mammarella et al., 2013a,b).

Age-related changes in emotion-cognition interactions are usually interpreted within the socioemotional selectivity theory (SST; Carstensen, 1993), according to which older adults use cognitive resources to direct their attention to emotional and particularly positive information to enhance their well-being (for reviews, see Scheibe and Carstensen, 2010; Reed and Carstensen, 2012). It was found that cognitive load can eliminate this emotional bias in ageing (Mather and Knight, 2005), suggesting that older adults' preference for positive material requires controlled, resource-demanding processes. Based on this assumption, which centers around the availability of cognitive resources, specific hypotheses can be suggested regarding the effects of emotion on cognitive control in ageing. As goal representations are maintained continuously under proactive control, this control mode is thought to be resource-consuming (Braver, 2012) and thus, fewer cognitive resources should be available. If older adults indeed use cognitive resources in order to direct their attention to positive information, it can be expected that a positivity effect in ageing should be less pronounced under conditions requiring high proactive control relative to conditions requiring low proactive control.

#### Proactive Control in the Stroop Task

The Stroop task (Stroop, 1935) has been widely used to assess cognitive control. In the classic color version of the task, color words are printed in a congruent or an incongruent ink color (e.g., "red" printed in red vs. green ink) and participants have to name the color of the ink while ignoring the color word. It is assumed that there is a strong tendency to read the word due to lifelong experience with reading (Verhaeghen and De Meersman, 1998) and thus, cognitive control is required to selectively attend and respond to the weak but task-relevant attribute (i.e., the color of the ink) in the presence of a strong but task-irrelevant attribute (i.e., the written color word) (Miller and Cohen, 2001). Typically, incongruent trials are associated with slower responses than non-word trials, a pattern that is known as the Stroop effect (Lindsay and Jacoby, 1994).

However, research suggests that in contrast to non-word trials, not only incongruent but also congruent trials elicit task conflict between word reading and color naming due to the presence of both color and word information (Goldfarb and Henik, 2007, 2013; Kalanthroff et al., 2015; for a review, see Kalanthroff et al., 2013, 2018). Previous studies have used expectancy of task conflict to manipulate the recruitment of proactive control (De Pisapia and Braver, 2006; Funes et al., 2010; Krug and Carter, 2012; Kalanthroff et al., 2015). Goldfarb and Henik (2007), for instance, increased the number of non-word trials (see also Tzelgov et al., 1992) and added cues that informed participants on half of the trials whether the next trial would be a Stroop trial or a non-word trial. On the other half of the trials, the cues were uninformative. This was aimed at reducing or relaxing proactive control in participants on un-cued relative to cued trials, as most of the trials only had task-relevant color information. It was found that on non-cued trials, reaction times (RTs) were longer for congruent compared to non-word trials, which was labeled reversed facilitation. Additionally, RTs were longer for non-cued congruent stimuli compared to cued stimuli and incongruent trials were slower than non-word and congruent trials throughout. These results suggest that participants were less efficient in resolving task conflict on both incongruent and congruent trials when proactive control was low.

Neuroimaging studies also found that conditions with a high expectancy (HE) of conflict (i.e., congruent and incongruent) trials in a Stroop task were associated with sustained activity in the dorsolateral prefrontal cortex (DLPFC) that is linked to the deployment of cognitive control (De Pisapia and Braver, 2006; Krug and Carter, 2012). In contrast, conflict trials under conditions with a low expectancy (LE) of conflict trials were associated with event-related activation of a medial and lateral prefrontal cognitive control network, including the anterior cingulate cortex (ACC), which has been linked to conflict monitoring (De Pisapia and Braver, 2006; Krug and Carter, 2012). Behaviorally, two indices for the recruitment of proactive control in a Stroop paradigm can be used: interference, which is the difference between RTs for incongruent and non-word trials, and facilitation, which is the difference between RTs for congruent and non-word trials. High levels of proactive control under conditions of HE of task conflict are thought to be associated with reduced interference and facilitation. In contrast, low levels of proactive control under conditions of LE of task conflict are thought to be associated with increased interference and no or even reversed facilitation (Tzelgov et al., 1992; Goldfarb and Henik, 2007; Kalanthroff et al., 2013, 2015).
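The two behavioral indices defined above can be written as simple differences between condition RTs. The following is a minimal sketch, not the authors' analysis code: the function names, the sign convention for facilitation (positive values meaning faster congruent than non-word responses), and the RT values are assumptions made for illustration only.

```python
def interference(rt_incongruent: float, rt_nonword: float) -> float:
    """Interference (ms): how much slower incongruent trials are than non-word trials."""
    return rt_incongruent - rt_nonword


def facilitation(rt_congruent: float, rt_nonword: float) -> float:
    """Facilitation (ms): how much faster congruent trials are than non-word trials.

    Assumed sign convention: positive = facilitation,
    negative = reversed facilitation (congruent slower than non-word).
    """
    return rt_nonword - rt_congruent


# Hypothetical RTs in ms, for illustration only (not data from this study).
high_expectancy = {"congruent": 700.0, "nonword": 720.0, "incongruent": 780.0}
low_expectancy = {"congruent": 730.0, "nonword": 715.0, "incongruent": 820.0}

for label, rts in (("HE", high_expectancy), ("LE", low_expectancy)):
    print(
        label,
        "interference:", interference(rts["incongruent"], rts["nonword"]),
        "facilitation:", facilitation(rts["congruent"], rts["nonword"]),
    )
```

With these invented values, the low-expectancy condition shows larger interference and a negative (reversed) facilitation score, which is the pattern the literature cited above associates with relaxed proactive control.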

## The Present Research

fpsyg-10-01906 August 30, 2019 Time: 17:34 # 3

The aim of this research was to assess the effects of age and emotion on proactive control in two emotional Stroop tasks. Expectancy of task conflict was used to manipulate proactive control and emotional faces and words were used to test whether the role of emotion is consistent across different stimulus sets. Experiment 1 assessed older and younger adults' ability to exert proactive control in an emotional Stroop task with faces. Although the Stroop task has been used to investigate the effects of emotion on proactive control, emotional items were often included as task-irrelevant distractors (e.g., Kalanthroff et al., 2016; Grimshaw et al., 2017). The effects of task-relevant emotional targets, on the other hand, were often not considered, despite evidence that emotion can improve cognitive performance through enhanced target processing (Pessoa, 2009). In a study by Krug and Carter (2012), for instance, participants responded to the emotion of neutral and fearful faces, while these were shown with congruent and incongruent emotion labels ("neutral" or "fearful"). The authors reported higher interference by an irrelevant emotional (i.e., "fearful") relative to an irrelevant neutral label distractor. An alternative interpretation, which was not explored by the authors, is that interference was actually reduced for emotional targets (fearful face with irrelevant neutral label) rather than increased for emotional distractors (neutral face with irrelevant emotional label). In another study, Kar et al. (2017) used happy vs. sad (Exp. 1) or happy vs. angry target faces (Exp. 2) with congruent and incongruent distractor labels in a Stroop task and found that conflict adaptation, a measure of proactive control, varied as a function of previously presented emotion. However, neutral faces were not included and this absence of a neutral baseline makes it difficult to interpret differential effects of sad vs. happy or angry vs. happy faces.

## EXPERIMENT 1

To address the limitations of previous research, three emotions were included in the present facial Stroop task: happy, neutral, and angry target faces. Based on research showing that happy faces are more efficiently detected than other expressions (Kirita and Endo, 1995; Becker et al., 2011; Becker and Srinivasan, 2014), it was predicted that happy targets would be associated with higher accuracy and faster RTs relative to neutral or angry targets. As research (Carstensen, 1993) suggests that older adults focus on positive material more than younger adults and that this focus requires cognitive resources, it was hypothesized that older adults would show particularly improved performance for happy faces relative to younger adults. However, this was expected under LE conditions requiring low levels of proactive control, as more resources would be available to focus on happy faces relative to HE conditions requiring high levels of resource-demanding proactive control.

## Methods

#### Participants

Thirty younger (19–40 years old) and 30 older adults (62–85 years old) participated in the experiment (see **Table 1** for participant characteristics). One younger and one older participant were excluded from the analysis due to RTs that were 2.5 SD slower than the respective age group's mean RTs. Younger adults were undergraduate and postgraduate students at Birkbeck, University of London, and received either course credits or £7.50 per hour for their participation. Older adults were recruited from the University of the Third Age in London and were paid at the same rate as younger adults for their participation. Participants were community-dwelling and were pre-screened for psychiatric disorders and a history of neurological disorders. They reported to be in good health and had normal or corrected-to-normal vision. Older participants had a score of 27 or above on the Mini-Mental State Examination (MMSE; Folstein et al., 1975). Older adults had better verbal knowledge as assessed with the NART (Nelson and Willison, 1991) and showed slower processing speed as measured by the Digit Symbol Substitution Test (Wechsler, 1955). No further differences were observed. The ethics board of Birkbeck, University of London, approved the procedure prior to the start of the study and written informed consent was obtained from each participant.
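The exclusion rule described above (RTs more than 2.5 SD slower than the respective age group's mean) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it assumes one mean RT per participant, that the group mean and sample SD are computed over all participants in the age group (including the potential outlier), and that the function name and participant values are hypothetical.

```python
from statistics import mean, stdev


def slow_outliers(mean_rts, criterion_sd=2.5):
    """Return IDs of participants whose mean RT exceeds group mean + criterion_sd * SD.

    `mean_rts` maps participant ID -> mean RT in ms for one age group.
    The sample standard deviation is used (an assumption).
    """
    values = list(mean_rts.values())
    cutoff = mean(values) + criterion_sd * stdev(values)
    return [pid for pid, rt in mean_rts.items() if rt > cutoff]


# Hypothetical mean RTs (ms) for one age group; not data from this study.
hypothetical_group = {
    "p01": 640.0, "p02": 655.0, "p03": 670.0, "p04": 680.0, "p05": 690.0,
    "p06": 700.0, "p07": 705.0, "p08": 715.0, "p09": 730.0, "p10": 1500.0,
}

print(slow_outliers(hypothetical_group))
```

Applying the rule separately within each age group, as the text describes, avoids flagging older adults merely because they are slower than younger adults overall.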

#### Materials

The stimuli were 36 faces from the FACES database (Ebner et al., 2010), a validated set of photographs of naturalistic faces of different ages in front view. Faces showed angry, neutral or happy expressions (12 items per emotion). The age group (younger, middle-aged, older) and sex (male, female) of the faces were balanced in each emotion category. The faces were taken from a pool of stimuli that had been previously rated by younger and older adults and were selected based on high agreement ratings between both age groups (for evaluation details, see Berger et al., 2017). Congruent items were created by printing matching emotion labels across the emotional faces (e.g., neutral face with "neutral" label). Incongruent items were created by printing non-matching emotion labels across the faces (e.g., angry face with "happy" label). Non-word items were created by printing a string of "xxxxx" across the faces. Combinations of faces and labels are summarized in **Table 2**. Face images were converted to gray-scale, whilst labels were printed in red, 38-point Courier New font, and placed between the eyes and mouths of the faces. Example stimuli are presented in **Figure 1**.

#### TABLE 1 | Participant characteristics, Experiment 1.


NART = The National Adult Reading Test, BDI II = Beck Depression Inventory II, STAI = State-Trait Anxiety Inventory, MMSE = Mini-Mental State Examination.

TABLE 2 | Combinations of facial expressions and labels that formed congruent, non-word and incongruent stimuli in Experiment 1.


Congruent stimuli are color-coded in green, non-word stimuli in yellow and incongruent stimuli in red.

#### Procedure

After giving informed consent, participants completed a demographic questionnaire and were seated in front of a computer screen. A visual acuity test (Bach, 1996) was conducted at a distance of 65 cm to ensure that vision was in the normal range. Participants were then asked to remain at this distance from the screen and performed the computerized Stroop task, which was prepared and presented using E-Prime Version 2.0.10.353 (Schneider et al., 2002) on a 24-inch computer screen with a resolution of 1920 × 1200 pixels. The task consisted of two blocks, counterbalanced across participants. In the HE block, 75% of the trials were either congruent or incongruent (37.5% each), while 25% of the trials were non-words. In the LE block, 25% of the trials were either congruent or incongruent (12.5% each) and 75% of the trials were non-words. There were equal numbers of angry, neutral, and happy faces across congruent, non-word and incongruent trials as well as across the two blocks. Each block consisted of 288 trials and presentation of trials was random. In each trial, a fixation cross appeared for 500 ms. It was then replaced by the distractor label "angry," "neutral," "happy" or "xxxxx," which was presented for 100 ms. This was done to facilitate label reading, following prior procedures by Krug and Carter (2012). The presentation of the label was followed by the simultaneous presentation of the label and the target face. Participants were instructed to indicate the emotion of the face (angry, neutral or happy) as accurately and quickly as possible by pressing one of three labeled keys. On the computer keyboard, the buttons "1," "2," and "3" on the numeric keypad were used. Button presses initiated the presentation of a blank screen for 2000 ms, after which the next trial started. The assignment of emotion labels to buttons was counterbalanced across participants. Participants were instructed to leave their fingers on the buttons for the duration of the task. With the option to take a short break after every 48 trials, there were five short breaks in each block and one between blocks. Participants were tested individually and each session lasted approximately 60–75 minutes in total.
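The block composition described above can be checked with a short sketch that converts the stated proportions into trial counts and builds a randomized trial list. The task itself was run in E-Prime; the Python below is only an illustration, and the names (`PROPORTIONS`, `build_block`) and the seeding are assumptions. It does not model which distractor label is paired with each face on incongruent trials.

```python
import random
from collections import Counter

EMOTIONS = ("angry", "neutral", "happy")
TRIALS_PER_BLOCK = 288

# Proportion of each trial type per block, as stated in the Procedure.
PROPORTIONS = {
    "HE": {"congruent": 0.375, "incongruent": 0.375, "nonword": 0.25},
    "LE": {"congruent": 0.125, "incongruent": 0.125, "nonword": 0.75},
}


def build_block(expectancy, seed=None):
    """Return a shuffled list of (congruency, emotion) trials for one block."""
    trials = []
    for congruency, proportion in PROPORTIONS[expectancy].items():
        n_type = round(TRIALS_PER_BLOCK * proportion)
        per_emotion = n_type // len(EMOTIONS)  # equal numbers of each emotion
        for emotion in EMOTIONS:
            trials.extend([(congruency, emotion)] * per_emotion)
    random.Random(seed).shuffle(trials)  # random presentation order
    return trials


for expectancy in ("HE", "LE"):
    block = build_block(expectancy, seed=1)
    print(expectancy, len(block), Counter(c for c, _ in block))
```

Under these proportions the HE block contains 108 congruent, 108 incongruent and 72 non-word trials, and the LE block contains 36, 36 and 216, so every congruency-by-emotion cell holds a whole number of trials.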

#### Design and Statistical Analysis

Responses and RTs were recorded for each trial. Accuracy and median RTs for correct trials were calculated for each participant and each condition; medians rather than means were used to account for the skewed distribution of RT data. Statistical analyses of the data were conducted with SPSS 22 (IBM Corp., Armonk, NY). Accuracy and RTs were analyzed by a 2 × 3 × 3 × 2 mixed-factors ANOVA including the within-subjects factors expectancy (LE vs. HE), congruency (congruent vs. non-word vs. incongruent) and emotion (angry vs. neutral vs. happy) as well as the between-subjects factor of age (younger vs. older). Post hoc t-tests with a Bonferroni adjustment to the 5% alpha level were performed to follow up significant main effects and interactions. Due to significant differences in the two age groups' verbal knowledge and processing speed, all analyses were repeated with NART verbal IQ and Digit Symbol as centered covariates. The results involving age as a factor that are reported here were qualitatively the same, and remained significant, in the analysis including covariates. RTs varied considerably between younger and older adults. To guard against spurious interactions between age and experimental conditions due to general slowing in older adults (Faust et al., 1999), log-transformed RTs were used for the analysis (e.g., Kray and Lindenberger, 2000; Tun and Lachman, 2008). To aid interpretation, pre-transformed RTs are reported in the descriptives and figures.

FIGURE 1 | Examples of Stroop stimuli in Experiment 1. Panel (A) shows an angry face with a congruent label, panel (B) shows a neutral face with an incongruent label, and panel (C) shows a happy face with a non-word label. Pictures are taken from the FACES database (Ebner et al., 2010) and can be accessed at: https://faces.mpdl.mpg.de/imeji/. Publication and display of the shown pictures for the purpose of illustrating research methodology are permitted under the FACES Platform Release Agreement.
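The two data-preparation steps described in the Design and Statistical Analysis section, median RTs over correct trials and log transformation, can be illustrated with a small sketch. The analyses themselves were run in SPSS; the code below is not the authors' pipeline. The data layout, the function names, the use of the natural logarithm (the base is not stated in the text) and all RT values are assumptions for illustration.

```python
import math
from statistics import mean, median


def median_correct_rt(trials):
    """Median RT (ms) over correct trials for one participant in one condition.

    `trials` is a list of (rt_ms, was_correct) pairs (a hypothetical layout).
    """
    return median([rt for rt, was_correct in trials if was_correct])


def log_effect(rt_condition, rt_baseline):
    """Condition effect on the log scale (natural log assumed)."""
    return math.log(rt_condition) - math.log(rt_baseline)


# Hypothetical trials: one slow response skews the mean but barely moves the median.
trials = [(640, True), (702, True), (1510, True), (688, False), (731, True)]
correct_rts = [rt for rt, ok in trials if ok]
print("median:", median_correct_rt(trials), "mean:", mean(correct_rts))

# Hypothetical general slowing: both groups are 10% slower on incongruent trials.
younger = {"nonword": 600.0, "incongruent": 660.0}
older = {"nonword": 900.0, "incongruent": 990.0}
for name, rts in (("younger", younger), ("older", older)):
    raw = rts["incongruent"] - rts["nonword"]
    logged = log_effect(rts["incongruent"], rts["nonword"])
    print(name, "raw difference:", raw, "log difference:", round(logged, 4))
```

In the second example the raw interference scores differ between the groups (60 ms vs. 90 ms) even though the proportional cost is identical, whereas the log-scale differences are equal. This is the kind of spurious age × condition interaction that the log transformation is meant to guard against.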

## Results

#### Accuracy

Accuracy scores for younger and older adults are presented in **Figure 2**. The analysis yielded a significant main effect of congruency, F(2, 112) = 46.23, MSE = 0.007, p < 0.001, partial η<sup>2</sup> = 0.45, with higher accuracy for congruent (M = 96.9%, SD = 2.7%) compared to non-word (M = 95.6%, SD = 3.6%), t(57) = 3.98, p < 0.001, or incongruent trials (M = 92.0%, SD = 6.4%), t(57) = 7.32, p < 0.001. Accuracy was also higher for non-word than for incongruent trials, t(57) = 6.47, p < 0.001. There was also a main effect of emotion, F(2, 112) = 29.45, MSE = 0.026, p < 0.001, partial η<sup>2</sup> = 0.34, with higher accuracy for happy faces (M = 97.7%, SD = 3.1%) compared with neutral (M = 96.3%, SD = 4.2%), t(57) = 2.88, p = 0.005, or angry faces (M = 90.5%, SD = 8.5%), t(57) = 6.53, p < 0.001. Accuracy was also higher for neutral than for angry faces, t(57) = 4.81, p < 0.001. These main effects were qualified by a significant congruency × emotion interaction, F(4, 224) = 4.26, MSE = 0.003, p = 0.007, partial η<sup>2</sup> = 0.07. Follow-up t-tests revealed that for angry faces, accuracy was higher for congruent (M = 93.8%, SD = 7.2%) relative to non-word trials (M = 91.1%, SD = 8.5%), t(57) = 3.80, p < 0.001. In contrast, the difference in accuracy between congruent and non-word trials was not significant for neutral (p = 0.079) or for happy faces (p = 0.102). Accuracy was higher for non-word than for incongruent trials for all three valences (all t values ≥ 4.26). There was also a significant expectancy × congruency × emotion × age interaction, F(4, 224) = 3.45, MSE = 0.004, p = 0.026, partial η<sup>2</sup> = 0.06. Accuracies under HE and LE conditions were analyzed separately to follow up this interaction. The congruency × emotion × age interaction was non-significant under LE conditions (p = 0.560), but was significant under HE conditions, F(4, 224) = 5.94, MSE = 0.002, p = 0.001, partial η<sup>2</sup> = 0.10. Separate analyses for angry, neutral, and happy faces were conducted and while the congruency × age interaction was significant for angry faces, F(2, 112) = 3.45, MSE = 0.005, p = 0.048, partial η<sup>2</sup> = 0.06, and for neutral faces, F(2, 112) = 5.26, MSE = 0.002, p = 0.012, partial η<sup>2</sup> = 0.07, it was non-significant for happy faces (p = 0.237). Follow-up t-tests showed different response patterns to angry faces in younger and older adults: Under HE conditions, younger adults showed higher accuracy for congruent (M = 93.8%, SD = 9.0%) relative to non-word angry faces (M = 88.5%, SD = 10.5%), t(28) = 3.81, p = 0.001, and no difference between incongruent (M = 87.4%, SD = 11.1%) and non-word angry faces (p = 0.459). In contrast, older adults showed no difference (p = 0.515) in accuracy for congruent (M = 93.1%, SD = 8.2%) relative to non-word angry faces (M = 92.4%, SD = 7.5%). Instead, older adults' accuracy was significantly lower for incongruent (M = 85.5%, SD = 13.5%) relative to non-word angry faces, t(28) = 3.17, p = 0.004. Response patterns also differed for neutral faces. In younger adults, accuracy was lower for incongruent (M = 91.9%, SD = 7.9%) relative to non-word neutral faces (M = 97.0%, SD = 5.0%), t(28) = 3.76, p = 0.001, whereas the difference was non-significant in older adults (p = 0.239). Lastly, there was also a main effect of age, F(1, 56) = 5.77, MSE = 0.024, p = 0.020, partial η<sup>2</sup> = 0.09, driven by higher accuracy in older (M = 96.0%, SD = 2.9%) than in younger adults (M = 93.7%, SD = 4.3%). No further significant main effects or interactions were observed for accuracy.
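For readers who wish to relate the reported F ratios to the effect sizes, partial η<sup>2</sup> can be recovered from an F value and its degrees of freedom through the standard identity partial η<sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies this identity to three of the accuracy effects reported above; the function name is hypothetical, and the sketch assumes the reported effect sizes were derived from the F ratios in this usual way.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for accuracy in Experiment 1.
reported = {
    "congruency": (46.23, 2, 112),
    "emotion": (29.45, 2, 112),
    "congruency x emotion": (4.26, 4, 224),
}

for effect, (f_value, df_effect, df_error) in reported.items():
    print(effect, f"{partial_eta_squared(f_value, df_effect, df_error):.2f}")
```

For these three effects the computed values round to 0.45, 0.34 and 0.07, matching the effect sizes given in the text.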

#### Reaction Times

Reaction times for younger and older adults are presented in **Figure 3**. The analysis yielded a main effect of congruency, F(2, 112) = 124.06, MSE = 0.019, p < 0.001, partial η<sup>2</sup> = 0.69, with overall faster RTs for congruent (M = 724 ms, SD = 138 ms) than for non-word trials (M = 750 ms, SD = 137 ms), t(57) = 7.89, p < 0.001, or incongruent trials (M = 833 ms, SD = 203 ms), t(57) = 12.40, p < 0.001. RTs were also faster for non-word than for incongruent trials, t(57) = 9.86, p < 0.001. This main effect was qualified by an expectancy × congruency interaction, F(2, 112) = 12.25, MSE = 0.006, p < 0.001, partial η<sup>2</sup> = 0.18. To follow up on this interaction, the analysis was repeated with the factor congruency restricted to the levels congruent and non-word; in this analysis there was no significant expectancy × congruency interaction (p = 0.878). In contrast, in the analysis with the factor congruency restricted to the levels non-word and incongruent, there was a significant expectancy × congruency interaction, F(1, 57) = 15.87, MSE = 0.006, p < 0.001, partial η<sup>2</sup> = 0.22. Follow-up t-tests revealed that under HE conditions, RTs were slower for incongruent (M = 815 ms, SD = 200 ms) than for non-word trials (M = 754 ms, SD = 146 ms), t(57) = 8.22, p < 0.001. Under LE conditions, the difference in RTs between incongruent (M = 850 ms, SD = 226 ms) and non-word trials (M = 745 ms, SD = 142 ms) was more pronounced, t(57) = 9.95, p < 0.001. Moreover, there was a significant main effect of emotion, F(2, 112) = 50.61, MSE = 0.022, p < 0.001, partial η<sup>2</sup> = 0.48, and follow-up analyses revealed that RTs for happy faces (M = 716 ms, SD = 124 ms) were faster than for neutral faces (M = 788 ms, SD = 177 ms), t(57) = 7.36, p < 0.001, or angry faces (M = 802 ms, SD = 181 ms), t(57) = 9.20, p < 0.001. The difference between RTs for neutral and angry faces was not significant (p = 0.139). Lastly, there was also a main effect of age, F(1, 56) = 27.32, MSE = 0.421, p < 0.001, partial η<sup>2</sup> = 0.33, as older adults were overall slower (M = 853 ms, SD = 162 ms) than younger adults (M = 684 ms, SD = 92 ms). No further significant main effects or interactions were observed for RTs.

#### Discussion

Experiment 1 assessed the effects of emotion on proactive control in older and younger adults. Both younger and older adults showed reduced interference in RTs from incongruent relative to non-word trials when expectancy of conflict was high (HE conditions). This suggests that both age groups deployed proactive control under HE conditions, which helped to prime task-relevant processing pathways before the onset of conflict trials. It was also observed that emotional faces affected performance in both age groups. Happy faces improved overall performance as evidenced by higher accuracy and faster RTs for happy compared to neutral or angry faces across conditions with no age-related differences. In contrast, accuracy was lowest and RTs were slowest for angry faces. Younger adults were more accurate when responding to congruent relative to non-word angry faces, whereas older adults showed reduced accuracy for incongruent relative to non-word negative information under HE conditions. Although this could suggest greater impairments in the presence of angry faces in older than in younger adults, this effect was in fact driven by lower accuracy for angry non-word trials in younger than older adults as can be seen in **Figure 2**. No age-related differences in accuracy were observed for congruent and incongruent angry faces. When presented with neutral faces under HE conditions, younger but not older adults showed lower accuracy for incongruent relative to non-word trials. Thus, there was not only no evidence for age-related impairments in proactive control, but older adults even outperformed younger adults when presented with neutral material under conditions requiring proactive control.

Higher accuracy and faster RTs in the presence of happy relative to neutral or angry faces were observed in both age groups and this is in line with previous research showing improved WM performance for happy faces relative to other expressions (Levens and Gotlib, 2010, 2012; Cromheeke and Mueller, 2015).

Enhanced performance for happy faces was found across conditions and did not interact with control in the present research. This indicates that more general processes, for instance emotion recognition, were facilitated by happy faces rather than specific control processes. This is in line with studies showing more accurate and faster recognition of happy relative to other emotional expressions (Juth et al., 2005; Becker et al., 2011; Becker and Srinivasan, 2014). Besides this perceptual advantage, it is also likely that happy faces contributed to improved performance due to the rewarding value they carry (O'Doherty et al., 2003; Tsukiura and Cabeza, 2008), which might have facilitated particularly efficient processing of happy faces. In contrast to neutral and angry faces, all happy faces used in this experiment showed teeth, a perceptual cue that could have facilitated recognition of happy faces. Previous research indicates that despite a recognition advantage of open- relative to closed-mouth versions of happy faces, happy expressions are still identified more accurately than other emotional expressions with an open or a closed mouth (e.g., Tottenham et al., 2009; Becker et al., 2011).

The facilitating effect of happy faces was not more pronounced in older relative to younger adults, either in general or in either of the two conditions, which is not fully in line with the SST (Carstensen, 1993). According to this theory, older adults focus on positive information in order to improve well-being, which is reflected in a positivity effect in their cognitive performance. In the present experiment, older adults were very accurate in both conditions, which suggests that the task was not too demanding and that additional cognitive resources were still available. Despite this availability of cognitive resources, the data suggest that older adults did not use them to sustain an emotional bias. However, the results could be reconcilable with the SST when considering that specific task instructions may supplant chronically active emotion regulation goals in older adults in contrast to more open instructions (e.g., those allowing participants to view items as if watching TV; for a review, see Reed and Carstensen, 2012). In the present study, participants were instructed to respond to the emotional expression of each face, which might have hindered the processing of emotional stimuli in a motivation-based way. Previous studies that have also used specific and therefore restrictive task instructions in the domain of working memory, and that have observed age-related differences in emotion-cognition interactions, have interpreted these within the SST (e.g., Mikels et al., 2005; Borg et al., 2011; Truong and Yang, 2014). Thus, it is important that the role of specific task instructions for age-related emotional biases is clarified in future research so that the theory's validity can also be tested in the domain of working memory, where specific task instructions are the norm.

It should be noted that accuracy was not improved for congruent relative to non-word trials when neutral or happy faces were shown. As incongruent distractors did interfere with responses for neutral faces in younger adults and happy faces in both age groups, it appears unlikely that participants were able to ignore distractors when presented with neutral or happy faces. Instead, it is possible that the failure to observe facilitation for neutral and happy faces was due to ceiling effects, as accuracy was very high for these faces. When responding to neutral faces, younger adults showed lower accuracy for incongruent relative to non-word trials under HE conditions, whereas older adults did not show differences in accuracy between incongruent and non-word trials. On the one hand, this seems to suggest that older adults did not rely on external cues when responding to neutral targets under conditions requiring proactive control. On the other hand, it is also possible that the task conflict created by target faces and distractor labels was not high enough under conditions requiring proactive control to affect accuracy in older adults. It is not possible to disentangle these two explanations in the present paradigm. However, the result suggests that older adults were able to overcome information conflict elicited by incongruent trials under conditions requiring proactive control and highlights preserved or even improved proactive control in older relative to younger adults.

It should be noted that facilitation in RTs was found for both age groups in both conditions. This finding suggests that the priming of task-relevant processing pathways improved performance for congruent relative to non-word trials irrespective of expectancy of conflict. Although research suggests that low levels of proactive control are associated with no or even reversed facilitation (Tzelgov et al., 1992; Goldfarb and Henik, 2007; Kalanthroff et al., 2013, 2015), a review by Roelofs (2003) has shown that facilitation occurs when distractors precede target stimuli as they did in Experiment 1: participants were presented with the distractor label 100 ms before the target face appeared. According to Roelofs (2003), such a preview can prime a particular response, resulting in facilitation in congruent trials, and this effect is considered to be "automatic" with preview times under 250 ms. Thus, it appears that the implementation of a distractor-first design in Experiment 1 resulted in facilitation across both experimental conditions.

## EXPERIMENT 2

Experiment 1 showed that emotional material affected cognitive performance in an emotional Stroop paradigm. More specifically, participants responded more accurately and faster when Stroop targets were happy faces, whereas accuracy was lowest and RTs were slowest for angry faces. However, it is not clear whether these effects of emotion can be expected for other stimulus sets such as words. On the one hand, research has shown more efficient processing of emotional relative to neutral material using a wide range of stimulus sets, including faces (e.g., Juth et al., 2005; Brosch et al., 2008; Calvo and Nummenmaa, 2008), images (e.g., Fox et al., 2007; Langeslag and Van Strien, 2008; Olofsson et al., 2008) and words (e.g., Hamann and Mao, 2002; Gotoh, 2008; Kopf et al., 2013). This suggests that effects of emotion can be expected to be consistent across different stimulus sets. On the other hand, there is also evidence that orienting to affective material was more pronounced for faces than for words (Kensinger and Corkin, 2003; Vuilleumier, 2005; Kensinger and Schacter, 2006) and that enhanced processing of emotional content was automatic for faces but not for words (Rellecke et al., 2011). Such differences in the effects of emotional faces and words were usually explained by differences in extracting emotional significance from words and faces. For instance, it was suggested that words must be processed to a higher level than faces before their meaning could be assessed (Kensinger and Corkin, 2003) and that their emotional significance needs to be extracted based on semantic knowledge (Schacht and Sommer, 2009; Rellecke et al., 2011). In contrast, perceptual features are used to extract emotional significance in faces (Vuilleumier, 2005; Vuilleumier and Huang, 2009). Given these differences in the processing of emotional words and faces, it is likely that verbal stimuli affect cognitive control differently than facial stimuli.

By using verbal stimuli in the same task as in Experiment 1, the aim was to assess whether cognitive control of emotional words would be associated with effects comparable to those observed for emotional faces. Should emotional words produce similar effects as in Experiment 1, this would suggest that valence (i.e., pleasantness) is sufficient to affect performance, independently of the stimuli's biological preparedness. In contrast, if differential effects of emotion were to be observed, this would suggest that stimulus features that are not shared by faces and words contribute to the effects of emotional items on cognitive control.

## Methods

#### Participants

Thirty younger (20–38 years old) and 30 older adults (63–78 years old) participated in the experiment (see **Table 3** for participant characteristics). One younger and one older adult were excluded from the analysis due to RTs that were 2.5 SD slower than the respective group's mean RTs. Additionally, one younger adult was excluded due to high BDI-II scores, indicating moderate levels of depression. The recruitment criteria were the same as in Experiment 1 and none of the participants had taken part in the previous experiment. As can be seen in **Table 3**, older adults had better verbal knowledge than younger adults as assessed with the NART (Nelson and Willison, 1991) and scored lower on the Digit Symbol Substitution Test (Wechsler, 1955), suggesting slower processing speed in older than in younger adults. Whereas these results are commonly observed in ageing research as highlighted above, it was also found that older adults reported fewer years of education than younger adults. Additionally, younger adults reported higher levels of trait anxiety than older adults as assessed by the A-Trait version of the STAI (Spielberger et al., 1983). No further differences were observed between the two age groups. Older participants had a score of 27 or above on the MMSE (Folstein et al., 1975). The ethics board of Birkbeck, University of London, approved the procedure prior to the start of the study and written informed consent was obtained from each participant.

#### TABLE 3 | Participant characteristics, Experiment 2.


NART = The National Adult Reading Test, BDI II = Beck Depression Inventory II, STAI = State-Trait Anxiety Inventory, MMSE = Mini-Mental State Examination.

#### Materials

Stimuli consisted of a selection of 36 words from the ANEW database (Bradley and Lang, 1999), which provides normative emotional ratings for a large number of words in the English language. Words were either negative (e.g., abuse, wounds, crime), emotionally neutral (e.g., bench, board, moment) or positive (e.g., thrill, hug, love) and there were 12 words per category. The words had been rated in a preliminary evaluation study and were selected based on high agreement ratings between younger and older raters (see **Supplementary Materials** for evaluation details). Congruent items were created by printing the word on emotionally matching faces that were used in Experiment 1 (e.g., word "thrill" with happy face). Incongruent items were created by printing a word on non-matching emotional faces (e.g., word "bench" with angry face). "Non-face" items (equivalent to the non-word items used in the previous experiment) were created by printing the word on a face picture, in which the area of the face was obscured. Combinations of words and faces are summarized in **Table 4**. Target words were printed in navy blue, 38-point Courier New font, and placed between the face's eyes and mouth. The face images were colored photographs that appeared 100 ms before the word, in accordance with the procedures used in Experiment 1. Example stimuli are presented in **Figure 4**.

TABLE 4 | Combinations of words and facial expressions that formed congruent, non-face and incongruent stimuli in Experiment 2.


Congruent stimuli are color-coded in green, non-face stimuli in yellow and incongruent stimuli in red.

#### Procedure

The procedure for Experiment 2 was identical to that of Experiment 1 as were the proportions of congruent, incongruent and non-face trials in the HE and LE blocks. There were equal numbers of negative, neutral and positive words across trials of different congruencies and across the two blocks. Each trial began with the presentation of the distractor face that was happy, neutral, angry, or obscured for 100 ms, followed by the simultaneous presentation of the distractor face and the target word. Participants were instructed to indicate the emotional valence of the word (negative, neutral or positive) as accurately and quickly as possible by pressing one of three labeled buttons.

#### Design and Statistical Analysis

The recording and exclusion of data were identical as in Experiment 1. Accuracy and RTs were analyzed by 2 × 3 × 3 × 2 mixed factors ANOVA including the within-subjects factors expectancy (LE vs. HE), congruency (congruent vs. non-face vs. incongruent) and emotion (negative vs. neutral vs. positive) as well as the between-subjects factor of age (younger vs. older). Procedures to conduct post hoc tests and to determine significance were as described above. Due to significant differences in the two age groups' reported years of education, verbal knowledge, processing speed and anxiety scores, all analyses were repeated with years of schooling, NART verbal IQ, Digit Symbol and STAI Trait Anxiety as centered covariates. The results with age as a factor reported here were qualitatively the same and significant in the analysis including covariates. As latencies varied considerably between younger and older adults, log-transformed RT data were used for the analysis. To aid interpretation, pre-transformed RTs are reported in the descriptives and figures.
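The two preprocessing steps described above (log-transforming RTs and mean-centering covariates) can be illustrated with a minimal sketch. The values, variable names and helper functions below are hypothetical and are not taken from the study; the base of the logarithm is not stated in the text, so the natural logarithm is assumed here.

```python
import math
from statistics import mean

def log_transform(rts_ms):
    """Log-transform reaction times (natural log is an assumption)."""
    return [math.log(rt) for rt in rts_ms]

def center(values):
    """Mean-center a covariate so that its mean becomes zero."""
    m = mean(values)
    return [v - m for v in values]

# Hypothetical values for illustration only (not data from the study).
rts_ms = [650.0, 720.0, 810.0, 905.0]
years_of_education = [12, 15, 16, 17]

log_rts = log_transform(rts_ms)
centered_education = center(years_of_education)
```

Centering leaves the spread of a covariate unchanged and only shifts its mean to zero, so that effects of the other factors are evaluated at the average covariate value.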

#### Results

#### Accuracy

Accuracy scores for younger and older adults are shown in **Figure 5**. The analysis yielded a main effect of emotion, F(2, 110) = 12.64, MSE = 0.081, p < 0.001, partial η <sup>2</sup> = 0.19, as accuracy was generally higher for negative words (M = 97.7%, SD = 3.9%) than for neutral (M = 89.2%, SD = 13.2%), t(56) = 4.72, p < 0.001, or positive words (M = 91.7%, SD = 7.8%), t(56) = 6.33, p < 0.001. Accuracy scores for neutral and positive words were not significantly different (p = 0.267). There was a significant main effect of congruency, F(2, 110) = 7.23, MSE = 0.004, p = 0.002, partial η <sup>2</sup> = 0.12, as accuracy was lower for incongruent trials (M = 91.9%, SD = 6.0%) than for congruent (M = 93.4%, SD = 5.2%), t(56) = 2.98, p = 0.004, or non-face trials (M = 93.2%, SD = 5.0%), t(56) = 2.96, p = 0.005. There was no difference in accuracy between non-face and congruent trials (p = 0.602). This main effect was qualified by a marginally significant congruency × age interaction, F(2, 110) = 3.13, MSE = 0.004, p = 0.053, partial η <sup>2</sup> = 0.05, as in younger adults, accuracy was significantly higher for congruent (M = 93.4%, SD = 5.7%) relative to incongruent trials (M = 90.8%, SD = 7.1%), t(27) = 3.35, p = 0.002. In older adults, accuracy scores for congruent and incongruent trials were not significantly different (p = 0.451). There was also an expectancy × congruency × emotion × age interaction, F(4, 220) = 3.96, MSE = 0.003, p = 0.007, partial η <sup>2</sup> = 0.07. Accuracy scores under HE and LE conditions were analyzed separately to follow up this interaction. The congruency × emotion × age interaction was significant under HE conditions, F(4, 220) = 3.27, MSE = 0.002, p = 0.019, η <sup>2</sup> = 0.06, but not under LE conditions (p = 0.142). As a next step, younger and older adults' data were analyzed separately and a congruency × emotion interaction was significant in younger adults, F(4, 108) = 2.91, MSE = 0.002, p = 0.039, η <sup>2</sup> = 0.10, but not in older adults (p = 0.471). Further analyses of younger adults' data showed that there was a main effect of congruency for neutral words, F(2, 54) = 6.38, MSE = 0.004, p = 0.003, η <sup>2</sup> = 0.19, but not for negative (p = 0.402) or positive words (p = 0.372). Follow-up t-tests indicated that under HE conditions, younger adults showed higher accuracy for congruent neutral words (M = 92.6%, SD = 14.8%) than for incongruent neutral words (M = 87.4%, SD = 15.2%), t(27) = 4.45, p < 0.001, or non-face neutral words (M = 91.8%, SD = 12.1%), t(27) = 2.61, p = 0.014. No further significant main effects or interactions were observed for accuracy.

FIGURE 4 | Examples of Stroop stimuli in Experiment 2. Panel (A) shows a negative word with a congruent face, panel (B) shows a neutral word with an incongruent face, and panel (C) shows a positive word with an obscured face (non-face condition). Pictures are taken from the FACES database (Ebner et al., 2010) and can be accessed at: https://faces.mpdl.mpg.de/imeji/. Publication and display of the shown pictures for the purpose of illustrating research methodology are permitted under the FACES Platform Release Agreement.

#### Reaction Times

RTs for younger and older adults are shown in **Figure 6**. The analysis yielded a main effect of congruency, F(2, 110) = 42.70, MSE = 0.006, p < 0.001, partial η <sup>2</sup> = 0.44, with overall slower RTs for incongruent (M = 802 ms, SD = 160 ms) than for non-face trials (M = 762 ms, SD = 142 ms), t(56) = 7.05, p < 0.001, or congruent trials (M = 762 ms, SD = 151 ms), t(56) = 8.42, p < 0.001. There was no significant difference in RTs for non-face compared to congruent trials (p = 0.582). This main effect was qualified by a significant expectancy × congruency interaction, F(2, 110) = 6.71, MSE = 0.004, p = 0.005, partial η <sup>2</sup> = 0.11. To follow up this interaction, the analysis was repeated with the factor congruency only comprising the factor levels congruent and non-face trials, which resulted in a significant expectancy × congruency interaction, F(1, 56) = 7.32, MSE = 0.004, p = 0.009, partial η <sup>2</sup> = 0.12. The analysis with the factor congruency comprising the factor levels non-face and incongruent trials also resulted in a significant expectancy × congruency interaction, which was more pronounced, F(1, 56) = 11.98, MSE = 0.006, p = 0.001, partial η <sup>2</sup> = 0.18. Follow-up t-tests revealed that under HE conditions, RTs on congruent trials (M = 755 ms, SD = 163 ms) were slightly faster than on non-face trials (M = 765 ms, SD = 154 ms), t(56) = 2.24, p = 0.029 (marginally significant after Bonferroni correction), whereas under LE conditions, the comparison between congruent and non-face trials was not significant (p = 0.155). Moreover, under HE conditions, RTs were slower for incongruent (M = 785 ms, SD = 153 ms) than for non-face trials (M = 765 ms, SD = 154 ms), t(56) = 3.20, p = 0.002. Under LE conditions, the difference in RTs between incongruent (M = 818 ms, SD = 188 ms) and non-face trials (M = 759 ms, SD = 159 ms) was even more pronounced, t(56) = 7.16, p < 0.001. 
Additionally, there was a significant main effect of emotion, F(2, 110) = 13.90, MSE = 0.031, p < 0.001, partial η <sup>2</sup> = 0.20, and follow-up analyses revealed that RTs for negative words (M = 746 ms, SD = 128 ms) were faster than for neutral words (M = 811 ms, SD = 192 ms), t(56) = 4.88, p < 0.001. RTs for positive words (M = 769 ms, SD = 155 ms) were also faster than for neutral words, t(56) = 3.32, p = 0.002, with no significant difference between RTs for positive and negative words (p = 0.064). Lastly, there was a main effect of age, F(1, 55) = 21.99, MSE = 0.437, p < 0.001, partial η <sup>2</sup> = 0.29, as older adults were overall slower (M = 849 ms, SD = 142 ms) than younger adults (M = 699 ms, SD = 115 ms). No further significant main effects or interactions were found for RTs.
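As a worked illustration of the expectancy × congruency pattern, the sketch below computes interference (incongruent minus non-face) and facilitation (non-face minus congruent) from the mean RTs reported above. The function and variable names are illustrative assumptions. The inferential tests were run on log-transformed RTs, so these raw differences are descriptive only, and the LE congruent mean is not reported in the text, so LE facilitation is not computed.

```python
# Mean RTs (ms) reported in the text for Experiment 2, by expectancy block.
# The LE congruent mean is not reported, so it is deliberately absent.
reported_means = {
    "HE": {"congruent": 755, "non_face": 765, "incongruent": 785},
    "LE": {"non_face": 759, "incongruent": 818},
}

def interference(means):
    """Slowing on incongruent trials relative to the non-face baseline."""
    return means["incongruent"] - means["non_face"]

def facilitation(means):
    """Speeding on congruent trials relative to the non-face baseline."""
    return means["non_face"] - means["congruent"]

he_interference = interference(reported_means["HE"])
le_interference = interference(reported_means["LE"])
he_facilitation = facilitation(reported_means["HE"])
```

With the reported means, interference is 20 ms under HE and 59 ms under LE, and facilitation under HE is 10 ms, which matches the direction of the effects described in the text.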

#### Discussion

Experiment 2 investigated the effects of emotional target words on cognitive control in a Stroop paradigm. It was found that both younger and older adults showed reduced interference and facilitation in RTs under HE compared to LE conditions, suggesting that they engaged in proactive control when the proportion of conflict-generating trials was high. It was also found that emotion facilitated task performance in both younger and older adults, with more accurate responses to negative relative to neutral or positive words. Responses were faster for both negative and positive words relative to neutral words in younger and older adults. Age-related differences emerged for accuracy under HE conditions: When neutral words were the targets, younger adults showed lower accuracy for incongruent relative to non-face or congruent trials, whereas older adults did not show differences in accuracy between congruent, non-face and incongruent trials.

The enhancing effect of emotion on cognitive control in an emotional Stroop task was observed for both positive and negative words, as participants responded faster when presented with emotional rather than neutral words. This could be due to enhanced sensory processing of emotional material including words (Phelps and LeDoux, 2005; Vuilleumier, 2005; Phelps et al., 2006; Vuilleumier and Huang, 2009). It should be noted, however, that the enhancing effect on performance was particularly pronounced for negative words, as accuracy was higher for negative relative to neutral or positive words. This contrasts with findings from Experiment 1 showing improved performance for happy relative to neutral or angry faces. Differences in the effects of words and faces on Stroop performance will be discussed in the general discussion below.

When responding to neutral words, younger adults showed lower accuracy for incongruent relative to congruent or non-face trials under HE conditions in the present experiment. In contrast, older adults did not show differences in accuracy between congruent, non-face or incongruent trials. This mirrors the results in Experiment 1 that showed no interference effect for neutral targets in older adults and suggests that older adults were relatively more successful than younger adults in responding to neutral words without being affected by distractors under HE conditions. Thus, the data do not support the notion of reduced proactive control in aging but suggest that older adults can even outperform younger adults under conditions requiring proactive control.

## GENERAL DISCUSSION

Two emotional Stroop experiments were conducted to assess proactive control in younger and older adults. The deployment of proactive control was manipulated by varying the expectancy of congruent and incongruent trials relative to trials without conflict (i.e., non-words in Experiment 1 and non-face items in Experiment 2). Besides addressing age-related differences in cognitive control, these experiments also investigated the effects of emotion on cognitive control using facial and verbal stimuli. These experiments revealed the following critical findings: First, older adults successfully deployed proactive control when the proportion of conflict-inducing items in a Stroop task was high. Second, emotion affected cognitive performance in a Stroop task similarly in both age groups. Third, the effects of emotion on performance were not uniform across facial and verbal stimuli. In the following, the implications of these findings will be discussed.

## No Evidence for Age-Related Impairments in Proactive Control in Emotional Stroop Tasks

The present findings extend the empirical evidence obtained in studies using the AX-CPT task (Rosvold et al., 1956) and suggest that older adults can deploy proactive control when needed. Across two experiments using an emotional Stroop paradigm, older adults showed reduced interference from incongruent relative to non-word/non-face trials when expectancy of conflict was high. Moreover, both age groups showed facilitation across both HE and LE conditions in Experiment 1 and under HE conditions in Experiment 2, with no age-related differences. These results are in accordance with prior research showing no age-related impairments in proactive control (Paxton et al., 2008, Exp. 2; Staub et al., 2014).

In contrast, these results deviate from findings of previous research with the AX-CPT task (Rosvold et al., 1956). Those findings indicated that older adults have greater difficulty than younger adults in using context efficiently for a target response in AX-CPT tasks, and they were viewed as evidence for age-related decline in proactive control (e.g., Braver et al., 2005, 2009; Haarmann et al., 2005; Paxton et al., 2008, Exp. 1). The contrasting pattern of results suggests that ageing is not associated with a general impairment in proactive control but that older adults' ability to deploy it successfully might depend on the demand characteristics of the task at hand. The AX-CPT task can be used to assess proactive control "locally" at the level of trials, whereas the present study used a global approach to manipulate proactive control across an entire block of trials in a Stroop paradigm. Older adults were outperformed by younger adults in the former but not in the latter task. This suggests that, although they might find it difficult to adapt their performance flexibly on a trial-by-trial basis or under conditions of uncertainty (see also Mayr, 2001; Mutter et al., 2005), older adults can adapt to task conflict and deploy proactive control over a period of time (see also West and Baylis, 1998; cf. Staub et al., 2014). Moreover, participants have to maintain a two-fold set of rules and to update information in working memory in the AX-CPT task, which is not required in a Stroop task. Previous research has shown age-related impairments in maintaining multiple tasks in working memory (Verhaeghen and Cerella, 2002; Reimers and Maylor, 2005; Wasylyshyn et al., 2011) and in working memory updating (Van der Linden et al., 1994; Hartman et al., 2001; Salthouse et al., 2003; De Beni and Palladino, 2004; Chen and Li, 2007; Schmiedek et al., 2009).
Thus, it is possible that older adults showed impairments in proactive control in the AX-CPT task but not in the Stroop task because the AX-CPT task additionally involves processes that are known to undergo age-related changes.

## Stimulus-Specific Effects of Emotion on Cognitive Control

The experiments assessed the effects of emotion on cognitive control in younger and older adults. In Experiment 1, emotional faces were used as targets and happy faces were found to improve both accuracy and RTs in younger and older adults. This finding is in line with previous literature showing improved performance for happy faces relative to other expressions in WM tasks with facial stimuli (Levens and Gotlib, 2010, 2012; Cromheeke and Mueller, 2015). The effects of emotion were largely consistent across conditions, suggesting that emotion affected more general processes rather than cognitive control per se. More specifically, it is likely that improved performance for happy faces was driven by a recognition advantage of happy faces (Juth et al., 2005; Becker et al., 2011; Becker and Srinivasan, 2014) and their overall rewarding effect (O'Doherty et al., 2003; Tsukiura and Cabeza, 2008) as discussed above. Importantly, the facilitating effects of emotion did not differ in the two age groups and there was no evidence for an increased positivity effect in ageing. This finding is not fully compatible with SST (Carstensen, 1993), according to which older adults focus on positive information in order to improve their wellbeing. However, it has been argued that the positivity effect emerges under instructions encouraging participants to process material freely (for a review, see Reed and Carstensen, 2012). In the present experiments, participants received specific instructions on how to respond to stimuli, giving less room for older adults to process material the way they wanted. Thus, it is possible that their chronically active bias to focus on positive material was overridden by specific task requirements in Experiment 1.

In contrast to the facilitating effects of happy faces and impairing effects of angry faces, a somewhat reversed pattern of results was found for emotional words in Experiment 2. Both younger and older adults responded more accurately to negative relative to neutral and positive words, whereas RTs were faster for both negative and positive words relative to neutral words. Together, the results from Experiments 1 and 2 add to growing evidence that the effects of emotion on cognitive performance are not consistent across different stimulus sets. Previous studies focused on the processing of emotional stimuli (Kensinger and Schacter, 2006; Rellecke et al., 2011) as well as attention (Vuilleumier, 2005; Vuilleumier and Huang, 2009) and reported that orienting to emotional material was more pronounced for faces than for words (Vuilleumier, 2005; Kensinger and Schacter, 2006; Rellecke et al., 2011). Such differences were usually explained with reference to the biological preparedness of emotional faces in contrast to words. More specifically, it was suggested that differences arise as emotional significance of faces can be extracted from perceptual features (Vuilleumier, 2005; Vuilleumier and Huang, 2009), whereas emotional significance needs to be extracted based on semantic knowledge from words (Kensinger and Corkin, 2003; Schacht and Sommer, 2009; Rellecke et al., 2011).

Consistent with previous research that found a stronger effect for faces than for words on cognitive performance (Kensinger and Corkin, 2003), the present research showed that the effects of emotion were greater for emotional faces in Experiment 1 (Accuracy: η <sup>2</sup> = 0.34; Reaction times: η <sup>2</sup> = 0.48) than for emotional words in Experiment 2 (Accuracy: η <sup>2</sup> = 0.19; Reaction times: η <sup>2</sup> = 0.20). However, the effects differed not only in size but also in their overall qualitative pattern. It is possible that extracting emotional significance by using semantic knowledge modified the effects of emotion on cognition not in a quantitative but in a qualitative way. To gain a better understanding of why differences in effects between facial and verbal stimuli were observed, emotional pictures could be used in future studies. These allow the extraction of emotional significance through perceptual features but can convey the same meaning as words (e.g., picture of bomb rather than word "bomb"). Similar findings between pictures and words would indicate that faces are special in their effect on cognitive performance, which could be due to their evolutionary importance. In contrast, similar effects between pictures and faces would suggest that the extraction of emotional significance through perceptual features or semantic knowledge is relevant for the effect of emotion on cognitive performance.
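For readers who want to relate the effect sizes quoted for Experiment 2 to the reported test statistics, partial η<sup>2</sup> can be recovered from an F ratio and its degrees of freedom using the standard identity partial η<sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The sketch below is an illustration of that identity; the function name is an assumption, and the inputs are the F values and degrees of freedom reported in the Experiment 2 results.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of emotion on accuracy in Experiment 2: F(2, 110) = 12.64
accuracy_emotion = round(partial_eta_squared(12.64, 2, 110), 2)

# Main effect of emotion on RTs in Experiment 2: F(2, 110) = 13.90
rt_emotion = round(partial_eta_squared(13.90, 2, 110), 2)
```

Both values round to the effect sizes quoted for emotional words (0.19 for accuracy and 0.20 for RTs).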

Facilitation in RTs (i.e., faster RTs for congruent relative to non-word trials) was found for both age groups across both conditions in Experiment 1, which is consistent with Roelofs' (2003) suggestion that preceding distractors can prime a response. However, despite the same distractor-first paradigm in Experiment 2, facilitation was eliminated under LE relative to HE conditions. A reason for varying findings for facilitation in the two experiments could lie in differences in distractor priming between the different stimulus sets. In Experiment 1, irrelevant labels were presented 100 ms before the target face, whereas in Experiment 2, irrelevant faces were presented 100 ms before the target word. It is possible that priming was more effective for label than for face distractors for several reasons. For instance, the verbal modality of label distractors was congruent with the modality of responses in Experiment 1, as participants were required to respond to faces by using labels ("happy," "neutral" or "angry"). In contrast, there was no modality congruency between face distractors and target responses using labels ("positive," "neutral" or "negative") in Experiment 2. It is also possible that priming of words was particularly efficient in Experiment 1, as participants' attention was already directed to the word by the previously presented fixation cross. In contrast, the area of the face that the participants' attention was directed to by the fixation cross in Experiment 2 was unlikely to be the most diagnostic one, as it was in the face's center rather than in the eye or mouth region. Thus, participants would have needed to saccade to the eye or mouth region to assess the expression in a short period of time. Taken together, it appears that despite using the same distractor-first design in both experiments, distractor-first priming was more efficient in Experiment 1 with target faces and distractor labels than in Experiment 2 with target words and distractor faces.

## CONCLUSION

The present study contributes to research on proactive control in ageing and its effectiveness in the presence of emotional material. No age-related differences in proactive control were found in an emotional Stroop paradigm, which contrasts with results from AX-CPT studies that found age-related impairments in proactive control. Moreover, it was found that task-relevant emotion affected performance similarly in younger and older adults and that the effects of emotion on performance were qualitatively different for emotional faces and emotional words. Overall, these results highlight that the effects of emotion and age on proactive control depend on the task at hand and the chosen stimulus set.

## DATA AVAILABILITY

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the ethics board of Birkbeck, University of London.

## AUTHOR CONTRIBUTIONS

NB contributed to the conception and design of the work, data collection, data analysis, and interpretation, drafting of the manuscript. AR and ED contributed to the conception and design of the work, the interpretation of the data, and critical revision of the manuscript.

## FUNDING

This work was supported by the Wellcome Trust Institutional Strategic Support Fund at Birkbeck, University of London, awarded to NB.

## ACKNOWLEDGMENTS

We thank all the participants for taking part in this research.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.01906/full#supplementary-material





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Berger, Richards and Davelaar. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Attentional Reorientation and Inhibition Adjustment in a Verbal Stroop Task: A Lifespan Approach to Interference and Sequential Congruency Effect

Eric Ménétré\* and Marina Laganaro

Faculty of Psychology and Educational Sciences, University of Geneva, Geneva, Switzerland

#### Edited by:

Ludovic Ferrand, Centre National de la Recherche Scientifique (CNRS), France

#### Reviewed by:

Andrew Aschenbrenner, Washington University in St. Louis, United States

Miriam Gade, Medical School Berlin, Germany

> \*Correspondence: Eric Ménétré Eric.Menetre@unige.ch

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 14 June 2019 Accepted: 20 August 2019 Published: 06 September 2019

#### Citation:

Ménétré E and Laganaro M (2019) Attentional Reorientation and Inhibition Adjustment in a Verbal Stroop Task: A Lifespan Approach to Interference and Sequential Congruency Effect. Front. Psychol. 10:2028. doi: 10.3389/fpsyg.2019.02028

Several parameters influence the interference effect elicited in a Stroop task, especially contextual information. Contextual effects in the Stroop paradigms are known as the Gratton or Sequential congruency effect (SCE). This research aims at isolating two processes contributing to the SCE in a Stroop paradigm, namely attentional reorientation from the color to the word and vice-versa, as well as inhibition (engagement/disengagement from one trial to the next one). To this end, in Study 1 subprocesses of the SCE were isolated. Specifically, attentional reorientation and inhibition were segregated by submitting young adults to a discrete verbal Stroop task including neutral trials. In Study 2, the same procedure was applied to 124 participants aged from 10 to 80 years old to analyze how interference, SCE, and the aforementioned decomposition of attention and inhibition change across the lifespan. In both studies, the Gratton effect was only partially replicated, while both attentional reorientation and inhibition effects were observed, supporting the idea that these two processes contribute to SCE on top of conflict monitoring and of other processes highlighted in different theories (contingency learning, feature integration, and repetition expectancy). Finally, the classical age-related evolution was replicated in Study 2 on raw interference scores, but no age effect was observed when processing speed was taken into account, nor on the isolated attentional reorientation and inhibition processes, which is in line with the hypothesis of stability of the inhibition processes over age.

Keywords: attention, inhibition, interference, conflict adaptation, Stroop, sequential congruency effect, Gratton, verbal responses

## INTRODUCTION

At first glance, the Stroop effect seems incredibly simple: incongruency between color word and color font interferes with color (font) naming. However, the Stroop task involves multiple cognitive processes whose effects can be disentangled. They include automatic word reading, color naming and inhibition, the latter aiming at constraining the attentional focus on the relevant dimension. This definition of the task is relevant only when the current trial is taken into account. However, previous literature favors the hypothesis that the interference effect can vary depending on the context of the previous trials, as investigated by the conflict adaptation literature. In the framework of the sequential congruency effect (SCE), or conflict adaptation effect, paradigms, specific effects of the subprocesses were isolated, namely the activation and deactivation of inhibition resources from the previous trial to the current one as well as the reorientation of the attentional focus from the word to the color dimension and vice versa. This paper aims at investigating these two processes (inhibition and attentional reorientation) using an SCE paradigm including neutral trials (Study 1), and their evolution in relation to age using a lifespan approach (Study 2).

In the following, we review the Stroop effect, the SCE and their evolution across the lifespan before proposing a way to isolate the aforementioned subprocesses of the SCE.

#### The Stroop Task and the Stroop Effect

One of the main approaches to investigating inhibition relies on well-known situations of interference, such as those elicited by the Stroop task (Stroop, 1935). Traditionally, the task requires participants to name aloud the font color in which color words are printed. The interference comes from the overlay of two inconsistent pieces of semantic information, namely the color word (reading) and the color font (naming). The initial task designed by Stroop (1935) required the participant to name consecutively all the items from the same condition printed on a card, and the experimenter measured the time taken to complete the entire card. Notably, the paradigm has been adapted over time: the development of computing allowed the task to become discrete, reaction times to be measured for each trial, and conditions to be randomized across trials.

The cognitive mechanisms behind the Stroop effect are still largely debated, and several models have tried to define the interactions between the word reading and color naming dimensions. Among the first interpretations of this interference effect, the "horse-race model" (Dunbar and MacLeod, 1984) suggested that color naming and word reading processes are launched simultaneously, triggered by the stimulus onset, and compete only when the two processes reach the production stage. This model remained the reference until the emergence of interactionist models, particularly the model by Cohen et al. (1990). The latter gained further credit after its integration into a computational model of the Stroop effect (Botvinick et al., 2001, 2004). Botvinick and colleagues' model contributed to the understanding of the main processes involved in the Stroop effect, and it also confirmed the presumed localization of the Stroop effect from a neural point of view. This computational model, derived from Cohen, Dunbar, and McClelland's model, confirmed that the anterior cingulate cortex plays a key role in conflict detection and resolution, as suggested by ERP (Liotti et al., 2000; Hanslmayr et al., 2008; Szűcs et al., 2009), fMRI (e.g., Peterson et al., 2002; Derrfuss et al., 2005), and PET studies (e.g., Bench et al., 1993; Carter et al., 1995).

Besides trying to explain the underpinnings of the Stroop effect in cognitive or brain models, researchers have also tested the conditions of the Stroop effect by designing innovative paradigms. This apparently simple task has been adapted in a variety of ways (MacLeod, 1991), among which are the reverse Stroop task (in which the subject has to read the word and ignore the color font) (e.g., Stroop, 1935; Abramczyk et al., 1983; Dunbar and MacLeod, 1984), the semantic Stroop task (obtained by varying the semantic closeness of the distractor to the target) (Klein, 1964), and the auditory Stroop task (modulating the verbal information and the pitch) (e.g., Hamers and Lambert, 1972; Green and Barber, 1981). Authors have also manipulated the response modality, which is known to play a role in the magnitude of the interference effect (White, 1969; Keele, 1972; MacLeod, 1991). At the single-trial level, oral responses seem to increase the interference effect and to have little or no impact on the facilitation effect when compared to manual responses (Redding and Gerjets, 1977; Sharma and McKenna, 1998; Lamers and Roelofs, 2011).

As presented above, capturing all the processes involved in the Stroop task is very complex. Here, an understanding covering as many aspects of the task as possible is sought by focusing on two parameters that are relevant in the discrete version of the Stroop task, namely the impact of the context in which the trial occurs and the lifespan evolution of task performance. Indeed, a trial can be influenced by the properties of the previous trial, which modulate the interference and facilitation effects. This contextual effect relies on the general concept of the SCE, or conflict adaptation effect (Gratton et al., 1992; Botvinick et al., 2001; Mayr et al., 2003; Egner, 2007), and can be analyzed by controlling the distribution of subsequent trial types in the experiment. Additionally, interference and conflict adaptation effects tend to evolve from childhood to adulthood and through aging. These modulations need to be clarified over the entire lifespan.

## The Sequential Congruency Effect

The SCE is a widely studied effect in cognitive psychology, offering insight into new dimensions of well-known paradigms. It is usually defined as a facilitation in resolving a conflict, probably attributable to a pre-activation of the conflict monitoring mechanisms (Gratton et al., 1992; Botvinick et al., 2001, 2004; Egner, 2007; Schmidt, 2013; Duthoo et al., 2014b). This effect is not limited to the Stroop task but has also been observed in other paradigms such as the Flanker (e.g., Mayr et al., 2003) and the Simon task (e.g., van Gaal et al., 2010). Contextual effects have been investigated with two different approaches. First, researchers varied the proportion of trials across conditions. This manipulation constrains the attentional system to focus on the most recurrent dimension of a stimulus, creating a cost when trials of the less frequent condition are processed (Lowe and Mitterer, 1982). For instance, if congruent trials constitute the rarest condition, latencies for these trials will increase compared to incongruent ones, reducing the interference effect. The reverse effect can be obtained by designing a task with fewer incongruent than congruent trials.

Second, to observe the contextual effect of the previous trials, it is possible to consider the analysis of a specific trial's latencies depending on the condition of the previous one. This effect, also known as the Gratton effect, reflects the modulation of the attentional and executive system when evolving from a congruent to an incongruent trial and from an incongruent trial to a congruent one (Gratton et al., 1992).

To understand the mechanisms behind this effect, a comprehension of the features on which attention is focused for each condition is needed. On congruent trials, the attention is attracted to the word dimension. This statement is debated (e.g., Besner et al., 1997), but supported by some evidence. All cognitive systems tend to choose the most efficient way to process information. It has been shown that reading color words was faster than naming color patches (Cattell, 1886; Brown, 1915; Stroop, 1935). It implies that the most efficient way to process a congruent trial (e.g., the word "blue" written in blue font) would be to focus on the word instead of the color. This statement was favored by empirical evidence. The combination of word reading and color naming even speeds up latencies of congruent trials. This facilitation effect was found when congruent trials were compared to neutral words displayed in different colors (as, for instance, the word "house" written in blue) (van Maanen et al., 2009). It implies that a color word in a Stroop item involves the retrieval of at least the semantic information via the reading processes; even though not all word reading processes are fully implemented in the Stroop task.

Regarding incongruent trials, the attentional control is avoiding focusing on the word dimension and is centered on the color. According to Gratton et al. (1992), switching from a congruent to an incongruent trial is then more effortful than processing two consecutive incongruent trials, because in the latter context the attentional focus is already constrained on the color dimension and the conflict monitoring mechanisms are already on (Botvinick et al., 2001, 2004). In this seminal paper, the same effect was reported on congruent trials, showing a facilitation for a second congruent trial compared to a congruent trial preceded by an incongruent one. The exact cognitive mechanisms behind the Gratton effect are still intensively debated. Originally, Gratton and colleagues suggested that participants strategically expected the following trial to be from the same condition as the current one, which involved a preparation to deal with the next trial (Gratton et al., 1992). A few years later the conflict monitoring hypothesis was proposed, suggesting that the brain is equipped with a specific conflict monitoring system, which activates the resources to face a conflict (Botvinick et al., 2001, 2004). In the case of a repetition of incongruent trials, the conflict monitoring system does not need to be activated, which explains the observed facilitation of two subsequent incongruent trials. The contextual effect is no longer explained by strategic pre-orientation of attentional resources on the relevant dimension but by an automatic process located in the anterior cingulate cortex. However, this interpretation has also been questioned and alternative explanations such as the feature integration theory and contingency learning have been proposed (Duthoo et al., 2014b). It has been claimed, for instance, that the contextual effect was the result of visual features facilitation due to the repetition of the same item (Mayr et al., 2003). 
The experiment motivating this conclusion was centered on the Flanker task. This task contains a limited number of items, implying that the exact same item is repeated from the previous trial to the current one multiple times during the task. In a second paradigm, in which the same condition was repeated without repetition of the exact same stimulus, the effect was no longer significant. Nevertheless, some studies replicated the paradigm with variable stimuli in the same condition and found an SCE (Wühr and Ansorge, 2005; Notebaert et al., 2006; Egner, 2007). A stricter interpretation of the repetition problem, called "feature integration theory," argues that the contextual effect is modulated even when only some elements are repeated while others are not (Hommel et al., 2004). According to the authors, the dimensions of each stimulus are bound together and stored in an "event file." If some of the encoded features are repeated but others associated with them are not, this creates interference (Hommel, 1998).

Another issue, regarding the type of stimuli included in the task, has also been debated. In a Stroop task designed to elicit an SCE, only congruent and incongruent trials are usually included. However, the set of items is smaller in the congruent condition, since the number of combinations of font colors and color words is more limited there (both dimensions must match) than in the incongruent condition. Therefore, the words in the congruent condition (i.e., the irrelevant information) are more often associated with the correct response, which makes them more informative of the response (Mordkoff, 2012; Schmidt, 2013; Duthoo et al., 2014b). This imbalance might also contribute to the SCE; however, it probably does not explain it in its entirety either. A study using a six-color oral Stroop task, which controlled for this bias, still found a consistent SCE (Duthoo et al., 2014a). As described above, many studies have tried to identify the factors and cognitive mechanisms underlying the SCE. To further clarify the processes underpinning the SCE, some researchers introduced neutral trials in the Flanker task (Kunde, 2003) or the Simon task (Wühr and Ansorge, 2005; Aisenberg et al., 2014). To our knowledge, Lamers and Roelofs (2011) were the only ones to design a Stroop paradigm with oral and manual responses that included neutral trials, to investigate whether the Gratton effect was driven by the reaction to the conflict, by the reorientation of attention, or by a combined effect of the two. Their results suggested that, independently of the response modality (verbal or manual), attentional reorientation seems to be the main generator of the facilitation observed for repeated incongruent trials relative to incongruent trials preceded by another condition.

It should be noted that the conflict adaptation effect has been named in different ways across the literature. It was first referred to as the Gratton effect, in reference to the seminal paper by Gratton et al. (1992), then as the conflict adaptation effect, and sometimes as the SCE. It is, however, not always clear whether these terms refer only to the comparison between a repeated incongruent trial and an incongruent trial preceded by a congruent one, to all the different combinations of previous and current trials, or to the general contextual effects in which the current trials occur (i.e., the manipulation of the proportion of congruent and incongruent trials). In the studies presented here, since neutral trials are added to the set of stimuli, some combinations of previous/current trials do not involve conflict, but a difference in the gradient of congruency. We will therefore use the term sequential congruency effect to describe all the different combinations of previous and current trials, and Gratton effect on incongruent current trials to name specifically the facilitation due to the repetition of incongruent trials. The facilitation due to the repetition of two congruent trials compared to a congruent trial preceded by an incongruent one will be referred to as the Gratton effect on congruent trials.

To sum up, the SCE extends the more basic interference paradigm by taking into account the condition of the previous trial. Nevertheless, the effect has proved volatile, since it has not been consistently replicated. Moreover, the SCE and Stroop effects have been extensively investigated, providing an in-depth understanding of interference. Yet interference is not a static phenomenon. It is rather a dynamic process evolving throughout human development (Comalli et al., 1962; Macleod, 1991; Li et al., 2009). However, virtually all the studies presented above recruited only young adults. A lifespan perspective on Stroop interference and the SCE would therefore bring a better understanding of the ways humans process interference.

#### Lifespan Perspective of the Stroop Task

Like many cognitive processes, the Stroop effect follows a U-shaped curve across the lifespan (Macleod, 1991). Interference is maximal during childhood, diminishes to become minimal during adulthood, and increases again with aging (Comalli et al., 1962). In children, the Stroop effect appears within the first year of reading acquisition, is maximal at this time of life, and decreases with age. The U-shaped distribution of interference was not questioned until the last two decades, specifically regarding aging. Indeed, some studies reported a deficit in inhibition in the elderly (e.g., Andrés et al., 2008), while others showed a stabilization of inhibition performance (e.g., Sebastian et al., 2013) or even an improvement (e.g., Fernandez-Duque and Black, 2006). In particular, a recent meta-analysis (Rey-Mermet and Gade, 2018) concluded that when processing speed was taken into account using a derivative of mixed-model analyses (state-trace analyses), the specific lifespan effect of inhibition disappeared.

There is clearly a need for additional investigations on the entire lifespan, focusing on either changes from childhood to adulthood or through aging (Craik and Bialystok, 2006). Regarding the SCE evolution across the lifespan, we are not aware of any study investigating how age impacts this effect all along the lifespan spectrum. There are, however, investigations on the transition from childhood to adulthood (Waxer and Morton, 2011; Kray et al., 2012; Larson et al., 2012; Ambrosi et al., 2016; Smulders et al., 2018) and on the transition from young to older adults (Puccioni and Vallesi, 2012; Aisenberg et al., 2014; Aschenbrenner and Balota, 2015; Xiang et al., 2016). These studies suggest that children show stronger interference effects and higher error rates relative to young adults, but when combining studies from the entire lifespan spectrum, the SCE seems globally stable across development, nonetheless showing stronger magnitudes at the extremities of the lifespan continuum. Studies using electrophysiological evoked potentials also failed to observe differences in the amplitude of the evoked potentials on the N450 component between children and young adults (Larson et al., 2012), suggesting that the processes underpinning the Gratton effect are identical in children and young adults. Another ERP study on different tasks (stimulus-response compatibility, Simon task and a hybrid choice-reaction/No-Go task) found differences in the magnitude of the Gratton effect between children and young adults (Smulders et al., 2018), which disappeared after correction of processing speed differences among the age-groups. Results on aging (Puccioni and Vallesi, 2012; Aisenberg et al., 2014; Aschenbrenner and Balota, 2015; Xiang et al., 2016), suggest that the Gratton effect is preserved. This paradigm helped clarify two main hypotheses regarding decrease in cognitive performances with aging (Puccioni and Vallesi, 2012). 
The first one posits a general slowing that explains the increased reaction times in the elderly (Salthouse and Badcock, 1991; Salthouse, 1996), while the second suggests that the elderly suffer from frontal lobe degeneration, which alters executive performance (West, 1996; West and Bell, 1997). Since the Gratton effect remained preserved with aging, the authors suggested that their results favored the general slowing hypothesis. This conclusion was corroborated by Aisenberg et al. (2014). By increasing the inter-stimulus interval, the authors reported normalized performance regarding the Stroop effect and SCE in aging. However, it is worth noting that some studies failed to replicate the Gratton effect in either young adults or elderly participants (Xiang et al., 2016). Since repetitions of the same dimension were avoided in the design of the task, the authors argued that the results favored the hypothesis of the feature integration theory. The results would then be attributable to the repetition of the same dimension's characteristic from the previous trial to the current one (Hommel, 1998; Mayr et al., 2003; Hommel et al., 2004).

To summarize, while the Stroop effect is among the strongest phenomena reported in cognitive psychology, the Gratton effect tends to be volatile and sometimes difficult to demonstrate, which makes the interpretation of the cognitive processes involved difficult. It nevertheless emerges from the literature that the Gratton effect reflects, at least partly, the activity of the conflict monitoring mechanism (Botvinick et al., 2001, 2004; Duthoo et al., 2014b), pre-orienting the attentional focus on the relevant dimensions for the next trials. Regarding the lifespan evolution, the Stroop effect does not seem to be impacted by age, suggesting that processing speed is a much more relevant factor to explain age differences in inhibition. The SCE also tends to remain stable over the lifespan. Both the SCE and lifespan approaches already represent valid methodologies to better understand the interference effect. To go further, we suggest that new insights can be gained by including neutral trials in an SCE paradigm. In particular, we propose to dissociate the switching mechanisms from one dimension to another and the modulation of inhibition processes from one trial to the next over the entire lifespan, as detailed in the ensuing section.

## The Present Study: Beyond the Sequential Congruency Effect

As described above, the increased reaction times resulting in an interference effect in the Stroop task encompass the SCE.

In particular, the SCE can be further decomposed into the bidirectional reorientation of the attentional focus between the word and the color dimension and the engagement/disengagement of inhibition processes. To our knowledge, no study has tried to demonstrate this dissociation so far. However, some previous studies investigated the phenomenon of attentional capture by a salient stimulus close to the target followed by a reorientation of the attention from the distractor to the relevant target (Folk et al., 1992; Lamy et al., 2004; Serences et al., 2005; Chang et al., 2013). More precisely, a paradigm relatively close to the Stroop task was proposed in this domain (Serences et al., 2005; Chang et al., 2013). In these studies, the participants were asked to spot the red central letter among rows of letters displayed in different colors. The surrounding letters could be in red, in different colors, or in black. This paradigm made it possible to isolate the attentional capture effect either alone or in association with the reorientation of attention. Results suggest that attentional reorientation is a specific mechanism and that the related brain regions mainly involve the temporo-parietal junction (although the brain network responsible for this reallocation of attentional resources is still debated: Corbetta and Shulman, 2002; Geng and Vossel, 2013; Diquattro et al., 2014). Even though this task has some similarities with the Stroop task, it does not involve word processing. Moreover, in the SCE, the changes are made sequentially from the previous trial to the current one, while in the Serences et al. (2005) and Chang et al. (2013) studies, targets and distractors were presented simultaneously. To show how the SCE was decomposed into subprocesses in the present study, **Figure 1A** represents the nine possible combinations of transitions from the previous trial to the current one along with changes in the attentional focus and inhibition cost in a Stroop paradigm including neutral trials. In the present study, neutral trials were sequences of symbols presented in colored ink.

**Figure 1B** represents the expected results according to the division of the SCE into Gratton, inhibition and attentional reorientation effects. It is expected from the predictions of the Gratton effect, that when processing a repetition of incongruent (II)<sup>1</sup> trials, the second incongruent trial will show faster responses than an incongruent trial (I) preceded by a congruent trial (C), i.e., that (II < CI). A repetition of congruent trials (CC) should also be performed faster than a C trial preceded by an I (IC). Crucially, as shown in **Figure 1B**, an incongruent trial preceded by a neutral trial (NI) involves the engagement of inhibition load but no attentional reorientation. To isolate inhibition, an NI trial can be compared to an II trial, which involves the same processes, except for the engagement of the inhibition load. The same logic can be applied to the deactivation of the inhibition, by comparing IN trials to NN ones. To isolate the attentional reorientation processes, the cognitive cost caused by the switching from the color to the word dimension can be assessed by the comparison between NC and CC trials. The investigation of the opposite effect, namely the reorientation from the word to the color dimension, can be measured by comparing CN to NN trials. Finally, the Gratton effect implies a change in both dimensions simultaneously. It is nevertheless possible to identify other situations where both dimensions operate simultaneously. Between NC and IC, inhibition processes are deactivated while in both conditions the attention is reoriented from the color to the word dimension. In the NI – CI comparison, the attentional focus switches across dimensions and inhibition has to be activated. Therefore, the Gratton effect should be considered as the sum of the two subprocesses, since it involves a change in both dimensions.
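The comparisons described above can be summarized as a lookup from each effect to the pair of transitions it contrasts. The following is a minimal illustrative sketch, not the authors' analysis code: the names `TRANSITIONS`, `CONTRASTS` and `effect_size` are hypothetical, and the convention of subtracting the second transition from the first is an assumption made here for readability.

```python
# Nine transitions: previous condition followed by current condition
# (C = congruent, I = incongruent, N = neutral).
TRANSITIONS = [prev + cur for prev in "CIN" for cur in "CIN"]

# Each effect is the difference between two transitions. The order within
# each pair is an assumed convention (first minus second); for the two
# "combined" comparisons the text states no direction, so it is arbitrary.
CONTRASTS = {
    "gratton_incongruent": ("CI", "II"),
    "gratton_congruent": ("IC", "CC"),
    "inhibition_engagement": ("NI", "II"),
    "inhibition_disengagement": ("IN", "NN"),
    "reorientation_color_to_word": ("NC", "CC"),
    "reorientation_word_to_color": ("CN", "NN"),
    "combined_NC_vs_IC": ("NC", "IC"),
    "combined_NI_vs_CI": ("NI", "CI"),
}


def effect_size(mean_rt, effect):
    """Return the latency difference (same unit as mean_rt) for one effect.

    mean_rt maps a transition label such as "CI" to a mean reaction time.
    """
    first, second = CONTRASTS[effect]
    return mean_rt[first] - mean_rt[second]
```

With this convention, a positive value for `gratton_incongruent` means that CI trials were slower than II trials, which is the pattern predicted by the Gratton effect.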

Study 1 will focus on testing the theoretical framework described above regarding the decomposition of the SCE into attentional reorientation mechanisms and engagement or disengagement of the inhibition processes; Study 2 will then analyze its evolution over the lifespan.

## STUDY 1

This study tests the SCE decomposition, inhibition, attentional reorientation effects and interactions between the two processes by adding neutral trials to the standard Stroop task on a group of young adults.

## Method

#### Participants

Twenty-seven young adults (mean age = 24.4 years, SD = 3; 17 women) were recruited for this study. They were all native French speakers, did not report any neurological, psychiatric, color vision or language impairment, and received financial compensation for their participation in the study. All of them gave written informed consent, and the local Ethics Committee approved the entire procedure (see **Appendix 1** for the gender and age distribution).

#### Materials

The stimuli were those of discrete standard Stroop tasks (from Fagot et al., 2008), namely four French color words ("bleu"; "jaune"; "rouge"; "vert", respectively: blue, yellow, red, and green) displayed in lower case, at the center of the screen in one of the four possible colors. The neutral stimuli were arrays of symbols ("++++"; "ˆˆˆˆ"; " "" "; "∗∗∗∗") presented in one of the four colors. The stimuli were either congruent (the color word matched the color in which the word was displayed), incongruent (the color word was displayed in a different color font) or neutral (a non-verbal symbol displayed in one of the different possible color fonts).

The total number of trials was 180, equally distributed among the three conditions (60 congruent, 60 incongruent, and 60 neutral). Stimulus order was pseudo-randomized with the Mix software (Van Casteren and Davis, 2006) to avoid the repetition of the same item and to allow the same condition to be repeated for a maximum of three consecutive trials. In addition, a color presented in the previous trial (at the target or distractor level) was not present in the following one, to prevent visual (and verbal) repetition, in line with the "feature integration theory" (Hommel et al., 2004). As described in the Introduction section, the interference effect is maximized in SCE paradigms since word-color combinations are more numerous in the incongruent condition than in the congruent one. This effect is known as the contingency learning effect (Mordkoff, 2012), and was controlled in the present study by adding neutral trials. In a four-color Stroop task of 180 items, each color is therefore presented 15 times per Stroop condition.
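The ordering constraints listed above can be expressed as a simple check on a candidate trial list. The sketch below is only an illustration of those three constraints and is not the Mix software or its algorithm; the `Trial` structure and the function names are hypothetical, and treating a color as "present" in a trial when it appears either as the ink or as the written color word is our reading of the constraint.

```python
from typing import List, NamedTuple, Optional, Tuple

COLOR_WORDS = {"bleu", "jaune", "rouge", "vert"}


class Trial(NamedTuple):
    condition: str  # "C", "I" or "N"
    text: str       # color word, or symbol string for neutral trials
    ink: str        # display color, one of COLOR_WORDS


def colors_of(trial: Trial) -> set:
    """Colors present in a trial: the ink, plus the word if it names a color."""
    colors = {trial.ink}
    if trial.text in COLOR_WORDS:
        colors.add(trial.text)
    return colors


def first_violation(seq: List[Trial], max_run: int = 3) -> Optional[Tuple[int, str]]:
    """Return (index, reason) for the first constraint violation, or None."""
    run = 1
    for i in range(1, len(seq)):
        prev, cur = seq[i - 1], seq[i]
        if (cur.text, cur.ink) == (prev.text, prev.ink):
            return (i, "item repeated")
        run = run + 1 if cur.condition == prev.condition else 1
        if run > max_run:
            return (i, "condition run too long")
        if colors_of(prev) & colors_of(cur):
            return (i, "color repeated")
    return None
```

A randomization procedure could, for example, shuffle the trial list repeatedly and keep the first order for which `first_violation` returns `None`, although more efficient constructive approaches exist.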

#### Procedure

The subjects sat approximately 80 cm from a 17-inch screen (refresh rate: 50 Hz). The experiment was run with the E-Prime software (E-Studio). Oral responses were recorded with a dynamic microphone, digitally amplified, and the signal was redirected to a computer. Subjects had to produce only oral responses, and reaction times (the delay between stimulus presentation and vocal onset) were obtained by manually marking the onset of the production during the pre-processing stage using the CheckVocal software (Protopapas, 2007).

Each trial of the Stroop task began with a 500 ms white fixation cross on a black background, followed by a 200 ms black screen. The stimuli were then displayed on a black background for 1500 ms, followed by a variable interstimulus black screen lasting from 1000 to 1200 ms. The timing was identical for all age groups. Participants were systematically asked to name the color in which the stimuli were displayed as fast and as accurately as possible, independently of the written sequence. Before the beginning of the task, the participants were trained on 32 trials including all possible stimuli combinations, to make sure they understood the task and to avoid a novelty effect on the first trials.

<sup>1</sup>Hereafter, transitions are indicated as the first letter of the previous trial next to the first letter of the current trial. For example, a congruent item preceding an incongruent one will be mentioned as a CI trial.

#### Data Analysis

For data cleaning purposes, a trial was considered incorrect if the subject produced the wrong color name (even if the response was subsequently corrected) or did not give any answer. Incorrect responses and latencies exceeding two standard deviations from the individual mean reaction times were excluded from the latency analysis. For the congruent, incongruent and neutral conditions, the percentages of rejected trials were 3.83, 17.07, and 3.02%, respectively. See **Appendix 5** for the percentage of excluded trials among SCE conditions. In addition, the first trial of each subject was not analyzed as an SCE trial since there was no previous trial.
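The cleaning and transition-labeling steps can be sketched for a single participant as follows. This is an illustration under explicit assumptions rather than the procedure actually used: the function name is hypothetical, the individual mean and standard deviation are assumed to be computed over correct trials only, the two-standard-deviation criterion is assumed to apply in both directions, and a trial is assumed to keep its transition label even when the preceding trial was answered incorrectly.

```python
from statistics import mean, stdev


def label_and_clean(conditions, rts, correct, n_sd=2.0):
    """Return (transition_label, rt) pairs retained for the latency analysis.

    conditions: list of "C", "I" or "N" in presentation order
    rts:        reaction times in ms, aligned with conditions
    correct:    booleans, True when the color was named correctly
    """
    # Assumption: individual mean and SD are based on correct trials only.
    valid = [rt for rt, ok in zip(rts, correct) if ok]
    m, s = mean(valid), stdev(valid)

    kept = []
    # Start at 1: the first trial has no previous trial, hence no transition.
    for i in range(1, len(conditions)):
        if not correct[i]:
            continue  # incorrect responses are excluded
        if abs(rts[i] - m) > n_sd * s:
            continue  # latency outlier (assumed two-sided criterion)
        kept.append((conditions[i - 1] + conditions[i], rts[i]))
    return kept
```

The labels follow the convention of the footnote, so a congruent trial followed by an incongruent one is returned as `"CI"`.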

Analyses were performed using the R software (R Core Team, 2019). Data wrangling was mainly performed using the dplyr (Wickham et al., 2019) and tidyr (Wickham and Henry, 2019) packages, while statistical analyses were computed with the base package, lmerTest (Kuznetsova et al., 2017) and lme4 (Bates et al., 2017), using the lmer function for linear mixed models. Errors were analyzed with generalized linear mixed models using the glmer function.

Since contrasts were explored by rotating the reference level (the intercept) of the model to target all relevant comparisons, the resulting multiple-testing bias was corrected using the Bonferroni method (Bonferroni, 1936). The significance threshold was therefore divided by the number of models required.
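The arithmetic behind this correction can be illustrated with a short sketch. It assumes a nominal threshold of 0.05, which is consistent with the corrected thresholds of 0.025 (two models) and 0.008 (six models) reported with the tables; the function names are hypothetical. The first function also shows why two reference levels suffice to cover all pairwise contrasts among three conditions.

```python
def contrasts_covered(levels, references):
    """Pairs of levels compared when each reference is used as the intercept."""
    covered = set()
    for ref in references:
        for other in levels:
            if other != ref:
                covered.add(frozenset((ref, other)))
    return covered


def bonferroni_threshold(alpha, n_models):
    """Significance threshold after dividing alpha by the number of models."""
    return alpha / n_models


levels = ["C", "I", "N"]
# Two reference levels already cover the three pairwise contrasts.
pairs = contrasts_covered(levels, references=["C", "I"])

main_effect_threshold = bonferroni_threshold(0.05, 2)  # two models
sce_threshold = bonferroni_threshold(0.05, 6)          # 3 current conditions x 2 models
```

Because the threshold is divided by the number of models rather than by the number of individual contrasts, this scheme is less conservative than a Bonferroni correction applied to every pairwise comparison.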

#### Results

In the Stroop task, overall production latencies were the fastest for neutral trials, and latencies were faster on congruent trials than on incongruent trials. **Table 1** displays the mean reaction times for the current condition and for the previous trial condition. When considering the previous trial's condition, an incongruent previous trial causes larger interference on the processing of the next trial than any other condition, while the congruent condition seems to be facilitatory for the next trial. Regarding accuracy, the best performance was observed in the congruent condition, while the incongruent condition generated the highest error rate. A previous incongruent trial tends to lead to higher error rates on the current trial, while a congruent previous trial minimizes the chances to commit errors as compared to the other two conditions. SCE latencies and error rates are presented in **Figure 2**.

TABLE 1 | Mean reaction times and accuracy rate per condition, separately for the current and the previous trial.

Results of the Stroop and SCE were included in a linear mixed model. Model selection was performed by loading the random part of the model with all relevant variables as random slopes and intercepts, and reducing this random part until the model converged. Then, the most complex random structure able to converge was adopted (regarding model selection, see Zuur et al., 2009). More precisely, in a first model, all fixed factors were added as random slopes (Barr et al., 2013; Matuschek et al., 2017). Since the model did not converge, interactions between factors were removed first, and then the least relevant factors were removed hierarchically from the model. After several attempts, only the current trial was supported as a random slope. Regarding random intercepts, the participants (Subject) and Items variables were retained. Regarding fixed factors, the model included the previous and current condition, their interaction, and the stimulus presentation order, the latter to account for a possible learning or fatigue effect occurring during the task (complete model detailed in **Table 2A**).

The general model showed a main effect of the current condition, the previous condition, a main effect of stimuli presentation order, as well as a significant interaction between the previous and the current condition.

The Stroop effect (latencies on incongruent trials compared to congruent ones) was replicated. Moreover, latencies on incongruent trials were also significantly slower than on neutral ones. Finally, the congruent and neutral conditions did not differ significantly. The decomposition of the main effect of the previous trial showed that all contrasts were significant, which confirms that, compared to both neutral and congruent conditions, an incongruent previous trial interferes with the processing of the following one (see **Table 2B** for detailed results).

Regarding the post hoc decomposition of the SCE conditions, six models (two per current trial condition) were necessary to estimate the results. The original data were divided into three data frames, one for each current trial condition, and each data frame was modeled with the previous trial condition as a predictor (presented in **Table 3**). Since there are three possible previous conditions, two models per current condition were necessary. As shown in **Figure 3**, the SCE was only partially observed, since the Gratton effect on incongruent trials was not replicated (II trials were not performed faster than CI trials). This result contrasts with the Gratton effect on congruent trials (facilitation for CC trials compared to IC trials), which was strongly significant. Regarding the decomposition of the SCE, both attentional reorientation effects were significant. However, regarding the activation and deactivation of inhibition, only the contrast corresponding to deactivation (NN vs. IN) reached significance. As shown in **Figure 3**, neither of the comparisons involving attention and inhibition simultaneously reached significance.

Errors were analyzed using generalized linear mixed models according to a binomial distribution. The results (see **Table 4**) show a significant difference between the congruent and incongruent conditions with increased error rates in the latter, as well as between the incongruent and neutral condition, but no difference between the congruent and neutral conditions was seen.

#### Discussion Study 1

In first line the lmer R function generating the model and (B) post hoc results (after Bonferroni correction, <sup>∗</sup> the significance threshold equals 0.025) of the current and previous trials condition.

To obtain these values, six models were necessary, implying that, according to the Bonferroni correction method, <sup>∗</sup> the significance threshold is reduced to 0.008.

This first study aimed at establishing whether the SCE embedded other processes which can be isolated by adding neutral trials to a Stroop task. As highlighted in the Introduction section, two processes can be isolated: an attentional reorientation mechanism from the color to the word dimension and vice versa, and the engagement/disengagement of the inhibition load. Results of Study 1 partially support the involvement of these processes. First, the Gratton effect on incongruent trials was not replicated, even though the same effect on congruent trials, i.e., a facilitation of CC trials relative to IC trials, was observed. As already described in the Introduction, the Gratton effect on incongruent trials is very volatile and several studies failed to replicate it (Mayr et al., 2003; Lamers and Roelofs, 2011; Kreutzfeldt et al., 2016; Xiang et al., 2016). From a cognitive point of view, the effect relies on a reduction of the interference effect through the preparedness of the attentional and executive systems to face a conflict. However, if the conflict is too strong for the current trial, the facilitation effect is reduced or suppressed. The cost of the incongruent trials may explain the absence of a Gratton effect in the present study. However, the absence of the Gratton effect may also be related to the limited number of trials (N = 180) or subjects (N = 27), or to the presence of the neutral trials themselves. It has actually been reported that increasing the stimulus set size might increase the interference effect (Gholson and Hohle, 1968; Fraisse, 1969; Macleod, 1991), but this factor is also known to reduce the SCE (Kray et al., 2012). In the present experiment, by adding neutral trials, the number of stimuli increased, reducing the possibility of anticipating the next trial's condition (Gratton et al., 1992; Schmidt and De Houwer, 2011). Moreover, since the study was designed to control for the bias described in the "feature integration theory" (Hommel et al., 2004), there was no repetition (on either dimension, i.e., the color font or the color word) from the previous trial to the current one. When this effect is controlled for, a strong reduction of the SCE is observed, which may also contribute to the absence of the effect in this study. Moreover, by introducing neutral trials, the classically observed contingency learning effect was controlled for. As described in the Introduction, this effect reflects the non-conscious association between the color word and the color font: in a Stroop task including the same number of congruent and incongruent trials, the color word is significantly more often associated with the correct response. However, in the present paradigm, the effect has been counterbalanced since neutral trials were added.

It has been suggested in the literature that the Gratton effect is due exclusively to preparedness for conflict resolution (Botvinick et al., 1999, 2001, 2004). However, when neutral trials are added, some of the comparisons do not contain conflict trials, and facilitation effects were nevertheless observed. This suggests that conflict adaptation as triggered by a preactivation of

TABLE 4 | Results of the generalized linear mixed models estimating differences between conditions regarding the accuracy.


Two models were necessary to perform all the comparisons; according to the Bonferroni correction, <sup>∗</sup>the significance threshold therefore equals 0.025.

the conflict monitoring hypothesis, is a component of the SCE but it is not the only mechanism involved. Attentional reorientation and specific manipulation of the inhibition load (activation/deactivation) seem to be among the others. Here, with the isolation of further processes, a significant increase of RTs was observed for the conditions including an attentional reorientation from the color dimension to the word dimension and from the word dimension to the color one. Since only congruent and neutral trials entered this comparison, these results tend to confirm that the mechanism of attentional reorientation also causes an increase of RTs independently of incongruence. Finally, the activation of the inhibition resources from the previous trial to the current one was not significant (NI as compared to II), whereas deactivation increased the latencies (IN as compared to NN). To understand this effect, we need to emphasize that the SCE involving incongruent current trials relies on a reduction of the interference. This partial effect could reflect the fact that a threshold is reached beyond which the SCE is not strong enough to minimize the interference.

Before any further interpretation of the results, we will investigate in Study 2 whether the same results are observed with a larger sample and how these effects evolve over the lifespan.

## STUDY 2

The paradigm was virtually identical to that of Study 1, but involved a larger sample of participants covering six age groups, from school-age children to 80-year-old adults. As discussed in the Introduction, the interference effect is known to remain stable over the lifespan only when processing speed is controlled for. We will therefore divide the classical interference index (I–C) by the neutral trials' latencies to control for processing speed. As processing speed is neutralized in the SCE analyses (since only one dimension is manipulated independently of the other involved processes), the results of the SCE should remain globally stable with aging. This trend should also be generalizable to the other age groups (school-aged children), even though the underlying cognitive processes are probably different. Although inhibition seems to remain stable across the lifespan, the literature does not provide hints about the lifespan evolution of other executive processes such as the reorientation of the attentional focus to one specific dimension. However, under the assumption that the absence of lifespan effects is generalizable to other executive mechanisms, a lifespan evolution of attentional reorientation should not be observed either.

## Method

#### Participants

One hundred and twenty-four participants, including the first 20 participants from Study 1 [aged 10–80 years, mean age: 39.8 years (SD = 24)], were recruited from six age groups (10–13, 16–18, 20–30, 40–50, 60–70, 70–80). All participants were native French speakers without self-reported neurological, language, color perception, or psychiatric impairment, and received financial compensation for their participation in the study. All participants provided written consent and the entire procedure was approved by the local Ethics Committee.

#### Materials and Procedure

Materials and procedure were identical to Study 1 (see **Appendix 1** for gender and age distribution among the different age groups).

#### Data Analysis

Data preprocessing followed the same procedure as described in Study 1. Four participants with mean RTs more than 2 SD away from the mean of their age group were excluded (one from the 16–18 age group, one from the 40–50 age group, and two from the 60–70 age group). Regarding extreme reaction times, 4.36, 16.77, and 3.3% of trials were rejected in the congruent, incongruent, and neutral conditions, respectively. For SCE conditions over the lifespan and specifically per age group, see **Appendix 5**.
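The participant-level exclusion rule can be illustrated with a minimal Python sketch. It assumes a single pass per age group and the sample standard deviation; the source does not specify either choice, and the function name, participant IDs, and latencies below are hypothetical.

```python
from statistics import mean, stdev


def split_by_sd_cutoff(mean_rts, n_sd=2.0):
    """Return (kept, excluded) participant IDs for one age group.

    mean_rts maps a participant ID to that participant's mean RT (ms).
    A participant is excluded when the mean RT lies more than n_sd
    sample standard deviations from the group mean (single pass,
    which is an assumption of this sketch).
    """
    values = list(mean_rts.values())
    group_mean = mean(values)
    group_sd = stdev(values)
    kept, excluded = [], []
    for participant, rt in mean_rts.items():
        if abs(rt - group_mean) > n_sd * group_sd:
            excluded.append(participant)
        else:
            kept.append(participant)
    return kept, excluded


# Hypothetical age group: nine typical participants and one very slow one.
example_group = {
    "p01": 600, "p02": 610, "p03": 590, "p04": 605, "p05": 595,
    "p06": 600, "p07": 610, "p08": 590, "p09": 600, "p10": 1200,
}
kept, excluded = split_by_sd_cutoff(example_group)
print(excluded)
```

Because the cutoff is computed within each age group, a latency that is typical for one group could be flagged in another.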

To analyze whether processing speed indeed plays a role over the lifespan, two scores were computed for each subject: the standard interference index (estimated by subtracting the averaged congruent latencies from the incongruent ones: I–C) and a second score obtained by dividing the standard interference score by the averaged neutral-trial latencies. A one-way ANOVA assessed the evolution of each of these scores across the lifespan.
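The two scores can be sketched in a few lines of Python. The function name and the latencies are hypothetical and serve only to show why dividing by the neutral latencies removes a purely proportional slowing.

```python
from statistics import mean


def interference_indices(congruent, incongruent, neutral):
    """Return (raw, corrected) interference for one participant.

    raw       = mean(incongruent) - mean(congruent)   # I - C, in ms
    corrected = raw / mean(neutral)                   # (I - C) / N, unitless
    """
    raw = mean(incongruent) - mean(congruent)
    corrected = raw / mean(neutral)
    return raw, corrected


# Hypothetical latencies (ms): the second participant is 1.5 times slower
# on every condition than the first one.
fast = interference_indices([590, 610], [710, 730], [595, 605])
slow = interference_indices([890, 910], [1070, 1090], [895, 905])
print(fast)
print(slow)
```

In this illustration the raw index is larger for the slower participant (180 ms versus 120 ms), whereas the corrected index is the same for both (0.2).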

The SCE and the isolation of attentional reorientation and inhibition activation or deactivation were investigated by mixed models analyses, following the same principles as the analyses performed in Study 1, except that age group was added as a fixed effect. Error analyses followed the same logic.

## Results

#### Corrected and Uncorrected Interference Indexes Over the Lifespan

Regarding the evolution of the standard interference score (I–C) over the lifespan, the one-way ANOVA on uncorrected interference indexes revealed a significant main effect of age group [F(5,111) = 4.07, p = 0.002]. According to the Tukey test and as shown in **Figure 4**, children aged 10–13 showed significantly larger interference than young adults aged 20–30 [t(111) = 3.3, p = 0.02, SE = 18.29, β = 60.34] and than adults aged 40–50 [t(111) = 3.27, p = 0.02, SE = 18.53, β = 60.62]. Older adults (60–70 years old) also showed larger interference than younger adults [relative to 20–30 years old: t(111) = −3.05, p = 0.03, SE = 18.8, β = −57.4; and to 40–50: t(111) = −3.03, p = 0.03, SE = 19.03, β = −57.69].

With the corrected interference index ((I–C)/N), the analysis showed a main effect of age group [F(5,111) = 2.36, p = 0.04], but none of the post hoc comparisons reached significance after correction (see **Appendix 2** for detailed results).
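For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed by hand as in the following sketch. This is a textbook formulation, not the authors' analysis script; the function name and the toy scores are hypothetical, and the p value (which requires the F distribution) is not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy interference scores for three hypothetical groups.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = one_way_anova_f(toy_groups)
print(round(f_value, 2), df_between, df_within)
```

With six age groups, as in the present study, the between-groups degrees of freedom equal 5.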

#### Sequential Congruency Effects

Latencies and error rates of the SCE conditions are presented in **Figure 2** and **Appendix 3**. The results of the mixed model on latencies are presented in **Table 5**. The final model highlighted a significant main effect of the current trial condition, a significant main effect of the previous trial condition, and a significant main effect of stimuli presentation order. Previous and current trial conditions interact with age and, crucially, a significant interaction between the previous trial condition and the current condition is observed, without a triple interaction with age.

Since raw reaction times were expected to evolve following a U-shaped curve, a linear model could be biased relative to the quadratic shape of the curve. To further analyze this issue, two models were generated. The first one compared age groups from children to young adults, while the second one compared the young adults to older ones. Results were globally similar for both halves of the lifespan, except for stimuli presentation order, which was no longer significant in the second part of the lifespan, as well as the interaction between the current trial and the age groups which was significant only for the second half of the lifespan (results are presented in **Appendix 4**).

The post hoc decomposition of the model presented in **Table 5B** highlighted a significant difference between current congruent and incongruent conditions (the standard Stroop interference effect), as well as significantly slower reaction times for the incongruent condition compared to the neutral one. Congruent and neutral conditions were not significantly different. Regarding the previous trial's condition, all three conditions differed from each other, and a previous incongruent trial altered the processing of the next trial, independently of the next trial's condition.

Since the interaction between the previous and current conditions was significant as well, but not the triple interaction with age groups (see **Table 5A** above), contrasts were computed across all age groups. As in Study 1, the original data frame was divided into three sets, each including only congruent, incongruent, or neutral current trial conditions. A mixed model analysis was performed on each data frame with the previous trial's condition as a fixed effect, the previous trial's condition as random slopes, and subjects and items as random intercepts. Results are presented in **Table 6** and **Figure 5**.
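The sequence coding and the split by current condition can be sketched as follows in Python. The two-letter labels follow the convention used in the text (previous condition first, current condition second, e.g., IC is a congruent trial preceded by an incongruent one); the function names and the example run are hypothetical, and the mixed models themselves are not reproduced.

```python
from collections import defaultdict


def label_sequences(conditions):
    """Label each trial as previous + current condition (e.g., "IC").

    The first trial has no predecessor and is labeled None.
    """
    if not conditions:
        return []
    labels = [None]
    for previous, current in zip(conditions, conditions[1:]):
        labels.append(previous + current)
    return labels


def split_by_current(conditions, rts):
    """Group (previous condition, RT) pairs by current condition."""
    subsets = defaultdict(list)
    for index in range(1, len(conditions)):
        subsets[conditions[index]].append((conditions[index - 1], rts[index]))
    return dict(subsets)


# Hypothetical run of five trials (C, I, N) with latencies in ms.
example_conditions = ["C", "I", "I", "N", "C"]
example_rts = [600, 750, 720, 640, 610]
sequence_labels = label_sequences(example_conditions)
subsets = split_by_current(example_conditions, example_rts)
print(sequence_labels)
print(subsets)
```

Each of the three subsets then contains only one current condition, so that any latency difference within a subset can be attributed to the previous trial's condition.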

Errors were analyzed using generalized linear mixed models. The model included the current and previous conditions as fixed effects, and the Subject and Item variables as random factors. The interactions among the previous trial condition, the current trial condition, and age group were not included in the final model, since the model including them failed to converge. Results, as presented in **Table 7**, suggest that all conditions were significantly different from each other.

## Discussion Study 2

This second study aimed at assessing interference and SCE as well as the specific contribution of their subprocesses over the lifespan. The present study examined effects of a bidirectional attentional

TABLE 5 | (A) Main effects and interactions of the general model, and (B) contrasts on current and previous trial condition.

(A) Model: `lmer(log(RT) ~ Previous.trial * Current.trial * Age.groups + Presentation.order + (1 + Current.trial | Subjects) + (1 | Items), data = data, REML = FALSE)`


<sup>∗</sup>The significance threshold was reduced to 0.008.

TABLE 6 | Summary table of the mixed models used to estimate the SCE conditions.


Seven models were necessary to estimate all the comparisons. After Bonferroni correction, <sup>∗</sup> the significance threshold was reduced to 0.008.

reorientation and engagement or disengagement of inhibition in a larger sample, as well as their evolution over the entire lifespan.

First, the standard Stroop interference index showed a main effect of age group and significant differences, especially between the groups at the two extremities of the lifespan and those in the middle. However, the corrected version of the interference index ((I–C)/N) did not show any significant pairwise difference between age groups. This finding is in line with the literature claiming that aging does not influence performance when the processing speed factor is controlled for (Aisenberg et al., 2014; Rey-Mermet and Gade, 2018; Smulders et al., 2018).

Second, the results on the SCE replicated those of Study 1 with a larger sample, showing again that the Gratton effect on incongruent trials is not robust (see Discussion of Study 1). In contrast, clear effects of attentional reorientation in both directions (from color to word or from word to color) were significant. In the larger group of Study 2, both activation and deactivation of inhibition slowed down production latencies, whereas only deactivation reached significance in Study 1. Notably, the inhibition activation effect goes in the direction opposite to the predictions, suggesting that incongruent trials are more effortful when preceded by an incongruent trial (II) than when preceded by a neutral trial (NI). This favors the interpretation mentioned in the discussion of Study 1, suggesting that the SCE was not powerful enough to reduce the interference effect. This effect (II slower than NI), which contradicts the predictions, might reflect an overload of cognitive control that affects not only the previous trial but also, through a carry-over effect, the following one. This finding is in line with the literature on other tasks suggesting that there is a reset of the attentional system (Kreutzfeldt et al., 2016), which requires time before a new item can be processed. This interpretation

is supported by the post hoc decomposition of the results showing that a previous incongruent trial significantly increases the latencies of the current one, independently of the current trial condition.

As described above, subprocesses were isolated from the SCE by adding neutral trials to a Stroop paradigm. These subprocesses are the disengagement of the inhibition resources and the reorientation of attention, either from the word to the color dimension or in the opposite direction. Both subprocesses were significant with a larger sample. Moreover, the comparison involving both mechanisms (the contrast between NC and IC sequences, which reflects the cost of an inhibition reduction while the attentional system is redirected to the word dimension) is now significant. This new effect points to a role of sample size in the results.

## GENERAL DISCUSSION

The present study aimed at investigating whether different processes embedded in the Stroop interference effect, and more precisely in the SCE, could be disentangled. In particular, we tried to isolate the effect of attentional reorientation from the color word to the color font dimension and the engagement/disengagement of the inhibition resources from one trial to the next. This was achieved by adding neutral trials (non-verbal signs displayed in different colors) to an SCE design in a Stroop paradigm requiring oral responses. In a first study, this dissociation was tested on a group of young adults, while in Study 2, the isolation of attentional reorientation and inhibition processes was investigated on a larger sample, covering the entire lifespan.

## Standard Stroop Interference Effect

The standard interference effect was found in young adults as well as in the other age groups. The effect was larger in the youngest and oldest groups, following a U-shaped curve over the lifespan as previously reported (Comalli et al., 1962; Macleod, 1991; Li et al., 2009). Nevertheless, processing speed seems to be responsible for a large part of the observed difference among age groups. Indeed, when interference is corrected for processing speed (by dividing the interference index by the latencies of the neutral trials), age effects disappear. This latter observation is in line with recent results on aging (Rey-Mermet and Gade, 2018). To our knowledge, the effect of processing speed on the differences in interference observed in the younger age groups has not been reported so far. Since interference is known to remain stable with aging when processing speed is controlled for, it was expected that the same effect would be observable for the group of children. The mechanisms involved are not necessarily identical for the two extremities of the lifespan, even though the behavioral results are similar. Interestingly, our results confirmed that for children as well, processing speed is the major dimension responsible for the evolution of the standard Stroop interference index. This finding implies that processing speed plays a key role in the evolution of the latencies over the lifespan.

## Sequential Congruency Effect

As reviewed in the Introduction, several theoretical accounts provide an understanding of the processes underlying the SCE. In Study 1, in a sample of young adults, there was a significant Gratton effect on congruent current trials, whereas the same facilitation on incongruent trials did not reach significance. Among the possible explanations for the lack of a Gratton effect in Study 1, we mentioned the sample size. However, this explanation is not supported, since the results remained globally identical in Study 2 with 124 participants. The rather low number of trials (180) could also have explained the results; nonetheless, a study comparing a 384-trial task with a 192-trial one reported virtually the same results, namely a consistent

TABLE 7 | Summary table of the generalized linear mixed model used to assess the differences between the three current item conditions regarding errors.


Two models were necessary to estimate these post hoc comparisons, implying that <sup>∗</sup>the significance threshold was divided by two and equals 0.025.

SCE in the Stroop task for young adults and elderly subjects (Aschenbrenner and Balota, 2017).

Regarding the mechanisms involved, the present study supports at least a mechanism of modulation of the inhibition load, which could favor the hypothesis of conflict monitoring. Moreover, the results strongly support attentional reorientation as a component of the effect. However, although the design of this study controlled for biases such as the contingency learning effects (Mordkoff, 2012; Duthoo et al., 2014b), features integration theory (Mayr et al., 2003; Hommel et al., 2004), and repetition expectancy (by increasing the stimulus set size) (Gratton et al., 1992), no facilitation effect was found for a repetition of incongruent trials (II) as compared to CI. These effects might have diminished the Gratton effect on incongruent current trials, explaining the non-significant result. This interpretation is corroborated by the unexpected result that a repetition of incongruent trials (II) is more interfering than NI. Nevertheless, these alternative mechanisms cannot explain the entirety of the effect, as already suggested in the literature (Egner, 2007; Schmidt, 2013; Duthoo et al., 2014b). It is therefore possible that the Gratton effect could emerge with such a paradigm only when the right balance between interference effect and SCE is found. This can be achieved by reducing the interference effect, for example by adding an asynchrony of the stimulus onset between the color word and the color font presentation. It has been suggested that the Stroop interference effect could be reduced when presenting the color word distractor 400 ms before the color to name (Glaser and Glaser, 1982; Coderre et al., 2011) or when using a single centered colored letter (Besner et al., 1997; Augustinova et al., 2010).

Regarding the Gratton effect on congruent trials, our results are in line with the literature and show that CC sequences are processed faster than IC sequences, which favors the hypothesis of a carry-over effect of the interference from a previous incongruent trial to the next one. Just as age effects no longer appeared for the speed-corrected interference index, age did not seem to interact with the SCE. This finding suggests that the effect is potentially stable over the entire lifespan. It therefore seems that control adjustment processes are already functional in school-age children and are preserved during aging. There is a discrepancy in the literature regarding the evolution of the SCE over the lifespan. Since no previous study investigated the entire lifespan, the present results will first be compared with those of studies investigating changes during development and then with those on aging. From childhood to adulthood, the SCE is present and is consistently larger in children than in young adults. Regarding ERP data, results tend to show a stability of the N450 between children and adults (Larson et al., 2012). The present results seem to be at odds with these findings, as previous literature tends to favor the hypothesis that overall performance evolves. Results on older age groups are not as consistent. On the one hand, some results on the evolution of the SCE in aging suggest that the effect is increased, at least in a Stroop task with button press responses (Aschenbrenner and Balota, 2017), and in other attention and inhibition tasks (Smulders et al., 2018). On the other hand, other studies tend to show a stability of performance across ages (Aisenberg et al., 2014; Larson et al., 2016). The present results also favor a stability of performance over the lifespan.

However, as neutral trials were included in the present design, we cannot exclude that the SCE relies at least partly on the ability of the participant to predict the next trial's condition (Gratton et al., 1992; Mordkoff, 2012; Duthoo et al., 2014b), in which case results may evolve across the lifespan with a different design (with only congruent and incongruent trials). Nevertheless, such an explanation based on prediction cannot account for the entirety of the SCE, since some contextual effects were reported on other SCE conditions.

Latencies show that elderly subjects simply need more time, which may underlie the observed changes in raw performance (Aisenberg et al., 2014). This observation is in line with the general slowing hypothesis (Salthouse and Badcock, 1991; Salthouse, 1996), and is less compatible with the specific frontal lobe degeneration hypothesis leading to a reduction of executive functions, since all age groups seem to be homogeneous (West, 1996; West and Bell, 1997). However, since this conclusion relies on an absence of effect, further investigations should try to replicate these results.

#### SCE Versus Standard Interference Effect

Finally, the main aim of this study was to demonstrate that the SCE allows the isolation of other embedded processes, namely the reorientation of the attentional focus from the color to the word dimension or in the opposite direction, and the engagement or disengagement of the inhibition load. The results of Study 1 and Study 2 support the involvement of such mechanisms on top of the Gratton effect. However, these subprocesses are clearly at play when the current trial is congruent or neutral, whereas on incongruent current trials the effects are either not significant or go in the unexpected direction. As also suggested by the results of Lamers and Roelofs (2011), the cost of processing incongruent trials is probably too high to allow sequential switching effects to emerge on incongruent current trials, but it clearly impairs the processing of the following trial, whatever its condition (see **Table 1**). This might be due to a carry-over effect of the interference from the previous trial, affecting the next one. To sum up, as soon as a Stroop trial is presented among other trials, the context in which it is presented has an impact on the intensity of the interference effect. It has been widely described that conflict monitoring is a mechanism involved in

the preparedness to react to a potential conflict (Botvinick et al., 2001, 2004; Egner, 2007). The present results nevertheless suggest that attentional reorientation as well as specific adjustments of the inhibition load also play an important role in this effect.

Some limitations of the current study must be addressed. First, the number of trials (180) is not as high as in the majority of studies. Although it has been suggested that this factor should not impact the results (Aschenbrenner and Balota, 2017), additional studies seem necessary to understand how such effects are modulated by the number of trials. Second, because the study attempted to correct for as many biases as possible, such as the contingency learning effects (Mordkoff, 2012; Duthoo et al., 2014b) and the features integration theory (Mayr et al., 2003; Hommel et al., 2004), the number of SCE items is not perfectly balanced across all combinations of previous-current trials, which could have impacted the results. However, by adding neutral trials, the number of items increased as well, making the prediction of the next trial's condition much more difficult.

## CONCLUSION

The two studies showed that the SCE can be further decomposed into attentional reorientation mechanisms (from the word to the color dimension and from the color to the word dimension) and the engagement/disengagement of the inhibition load from one trial to the next. This was achieved by including neutral trials in an SCE design of a Stroop task. The results suggest that both decomposed processes are relevant to (young) adults as well as over the entire lifespan. The identified SCE subprocesses do not change across the different age groups, nor does the standard interference effect when processing speed is controlled for. The present findings also confirm that the Gratton effect is very volatile and might be influenced by the presence of neutral trials or by other effects related to task design. Finally, the present studies highlighted the importance of taking into account attentional reorientation as much as inhibition modulation mechanisms when dealing with interference effects.

#### DATA AVAILABILITY

The datasets generated for this study are available on request to the corresponding author.

#### ETHICS STATEMENT

Human Subject Research: The studies involving human participants were reviewed and approved by the Ethics Committee of the University of Geneva. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

#### AUTHOR CONTRIBUTIONS

Both authors contributed directly to the realization of the manuscript and approved it before submission. EM was involved in the data acquisition, analyses, interpretation, and manuscript writing and editing. ML supervised the work, and actively contributed to the editing and revision of the manuscript.

#### FUNDING

This research was funded by the Swiss National Science Foundation (SNSF), grant no. 100014\_165647.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02028/full#supplementary-material

#### REFERENCES

Fraisse, P. (1969). Why is naming longer than reading? Acta Psychol. 30, 96–103.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Ménétré and Laganaro. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## The Stroop Effect Occurs at Multiple Points Along a Cascade of Control: Evidence From Cognitive Neuroscience Approaches

#### Marie T. Banich\*

Institute of Cognitive Science, Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, United States

This article argues that the Stroop effect can be generated at a variety of stages from stimulus input to response selection. As such, there are multiple loci at which the Stroop effect occurs. Evidence for this viewpoint is provided by a review of neuroimaging studies that were specifically designed to isolate levels of interference in the Stroop task and the underlying neural systems that work to control the effects of interference at those levels. In particular, the evidence suggests that lateral prefrontal regions work to bias processing toward the task-relevant dimension of a Stroop stimulus (e.g., its color) and away from the task-irrelevant dimension (e.g., the meaning of the word). Medial prefrontal regions, in contrast, tend to be more involved in response-related and late-stage aspects of control. Importantly, it is argued that this control occurs in a cascade-like manner, such that the degree of control that is exerted at earlier stages influences the degree of control that needs to be exerted at later stages. As such, the degree of behavioral interference that is observed is the culmination of processing in specific brain regions as well as their interaction.

#### Edited by:

Benjamin Andrew Parris, Bournemouth University, United Kingdom

#### Reviewed by:

Emily Coderre, Johns Hopkins University, United States Kira Bailey, University of Missouri, United States

#### \*Correspondence:

Marie T. Banich Marie.Banich@colorado.edu

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 16 May 2019 Accepted: 09 September 2019 Published: 09 October 2019

#### Citation:

Banich MT (2019) The Stroop Effect Occurs at Multiple Points Along a Cascade of Control: Evidence From Cognitive Neuroscience Approaches. Front. Psychol. 10:2164. doi: 10.3389/fpsyg.2019.02164

Keywords: Stroop, fMRI, dorsolateral prefrontal cortex, anterior cingulate, event-related potential

## INTRODUCTION

The premise of this article is that neuroimaging studies can provide unique insights into the locus of the Stroop effect. For purposes of this paper, we will define the Stroop effect as the interference that occurs between two dimensions of a stimulus, one of which is task-relevant and one of which is task-irrelevant. Generally, when these two dimensions are incongruent (e.g., the word "red" printed in blue ink), more cognitive control is required than when the task-irrelevant information is congruent (e.g., the word "red" in red ink) or has no relationship to the task-relevant information (e.g., the word "sum" in red ink). In this paper, it will be argued that this interference can occur at a variety of levels. Furthermore, I will argue that neuroimaging studies can help identify the loci at which such interference occurs to a degree that may not always be possible in behavioral studies.

More specifically, behavioral studies have limitations in isolating the locus of the Stroop effect because it reflects the sum of processes yielding a final outcome of processing as reflected in reaction time or error rates. Since, as will be argued, the Stroop effect can be generated, and also influenced by control, at multiple levels along a cascade of control, cognitive neuroscience approaches can help to identify the multiple levels of interference and control. While careful experimental design

Yet, at the same time, simply examining which regions of the brain become active during performance of the Stroop task is not likely to yield critical information with regard to the potential loci of the Stroop effect. While there have been a number of meta-analyses to isolate brain regions consistently engaged during performance of Stroop-related paradigms with regard to both the more traditional Stroop tasks (e.g., Derrfuss et al., 2005) and variants (Feng et al., 2018), they do not necessarily provide insight into the locus of the Stroop effect. The reasons are that such meta-analyses aggregate findings across different variants of the Stroop task (discussed in more detail below) that may differ in the specific locus or loci that are most engaged by that variant (e.g., a vocal response vs. manual response Stroop task). Furthermore, such studies are often designed to examine cognitive control in general and not specifically designed to uncover the potential loci of the Stroop effect.

For that reason, in this paper, I review the findings of studies designed to isolate the different loci of the Stroop effect and their neural underpinnings, many of which are drawn from our laboratory's program of research that has melded specific behavioral paradigms with a cognitive neuroscience approach. From such work, we have proposed a model elucidating the brain systems that act with regard to the various loci of interference that can be engendered during the Stroop task (see **Figure 1**), as well as outlining a cascade of control between brain regions that influences the final behavioral interference effect that is observed.

As an overview, the cascade-of-control model suggests there are at least four important processes and brain loci that influence the Stroop effect. The first process, implemented by posterior regions of lateral prefrontal cortex, biases processing in posterior brain regions toward information that is most task-relevant and/or away from information that is task-irrelevant. The second process, implemented by mid-dorsolateral regions, biases selection toward that information in working memory that is most relevant for the current task goal. The third process, implemented by caudal mid-cingulate regions, is involved in late-stage aspects of selection, usually those that are response-related. Finally, rostral dorsal regions of the anterior cingulate cortex (ACC) evaluate the appropriateness of the response selected and send feedback to lateral prefrontal regions to make adjustments in control as needed.

Importantly, this model argues that the degree of Stroop interference observed and how it is controlled depend on how well earlier portions of the cascade, in this case mediated by lateral prefrontal regions, create an appropriate task set. To the degree that such control is not well enabled, medial brain regions, most notably portions of the ACC, must then exert control at later response-related stages of selection. Hence, the "locus" of the Stroop effect in any given experiment is influenced by the activity in and relationship between brain regions, as well as by the specific attributes of a given Stroop paradigm with regard to how much it taxes each of the four processes described above.

Before turning to the studies supporting this model, it should be noted that for purposes of this paper, the classic Stroop task as well as variants will be considered. Because what people describe as a "Stroop task" actually encompasses a family of tasks, we use a specific naming convention to provide a bit more precision regarding the tasks being discussed. The phrase before the hyphen refers to the task-relevant dimension and the phrase after the hyphen refers to the task-irrelevant dimension. So, for example, the classic Stroop paradigm will be referred to as the color-word Stroop task, as the individual must identify the color in which an item is presented and ignore the meaning of the word.

## INTERFERENCE BETWEEN TWO PROCESSES THAT VARY IN THEIR AUTOMATICITY OR CONTROL DEMANDS

One level at which the Stroop effect occurs is through competition between two distinct processes, one that is more automatic and engaged by the task-irrelevant dimension and another that is less automatic, but which requires processing of the task-relevant dimension so as to meet task demands. In the classic color-word Stroop paradigm, word reading is more automatic than color identification. As such, cognitive control is required to overcome the tendency to read the word and base a decision on that information, so that processing of ink color is prioritized to guide responding.

This aspect of the Stroop task is well captured in computational models of the Stroop task, which include a "prefrontal" unit that increases activation in units processing color so as to bias the competition toward that process, rather than word identification, in influencing response selection (Cohen et al., 1990). Behavioral evidence suggests that indeed it is the degree of engagement of the word reading process that influences the size of the Stroop effect (Monsell et al., 2001), with greater increases in the latency of color naming for words and pseudowords, which are more likely to engage word reading processes, than for consonant strings, "XXXXXs" or false fonts, which are less likely to engage word-reading processes.
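The biased-competition idea can be illustrated with a toy simulation. The sketch below is not the Cohen et al. (1990) network: it is a deliberately simplified accumulator in which the function names (`run_trial`, `interference`), the pathway strengths, the rate, and the threshold are all hypothetical values chosen only for illustration. It shows a stronger word pathway competing with a weaker color pathway, and a top-down `control` term that boosts the color pathway.

```python
COLOR_PATHWAY = 1.0  # weaker, less automatic pathway (illustrative value)
WORD_PATHWAY = 1.5   # stronger, more automatic pathway (illustrative value)


def run_trial(ink, word, control, rate=0.01, threshold=1.0, max_steps=100_000):
    """Return (response, steps); word=None marks a neutral, non-color word."""
    color_input = COLOR_PATHWAY + control  # top-down bias boosts the color pathway
    evidence = {ink: 0.0}
    if word is not None:
        evidence.setdefault(word, 0.0)
    for step in range(1, max_steps + 1):
        # Each stimulus dimension feeds evidence to the response it names.
        evidence[ink] += rate * color_input
        if word is not None:
            evidence[word] += rate * WORD_PATHWAY
        # A response is emitted once the leader beats the runner-up by `threshold`.
        ranked = sorted(evidence.values(), reverse=True)
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        if ranked[0] - runner_up >= threshold:
            return max(evidence, key=evidence.get), step
    return None, max_steps  # no response emitted within max_steps


def interference(control):
    """Incongruent minus neutral response time, in steps.

    Only meaningful when control is high enough for the ink response to win.
    """
    _, incongruent = run_trial("blue", "red", control)
    _, neutral = run_trial("blue", None, control)
    return incongruent - neutral
```

Under these assumed parameters, incongruent trials take longer than neutral trials, raising `control` shrinks the interference, and with `control=0.0` the word response wins outright, which is the qualitative pattern the prefrontal biasing account predicts.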

Sometimes, word reading can be engendered not because of the "word-likeness" of letter strings, but because the meaning of the words enables attentional capture. This likely is the locus of interference observed in the emotional Stroop task, which is in essence a color-emotional word task by our nomenclature. In this task, there are various conditions. In one condition, the color of emotionally salient words, which can be either negative or positive in valence, such as "murder" or "joy," must be identified as compared to emotionally neutral words (e.g., "bench"). Here, word reading is engaged for emotionally salient words because they are thought to capture attention as compared to emotionally neutral words, making identification of the ink color difficult. In fact, the interference effect is reduced in this task vis a vis interference on the color-word task (e.g., Kaiser et al., 2015).


Moreover, effects in the emotional Stroop task are sometimes hard to observe and may only occur in those individuals for whom the words have particular emotional significance (e.g., threat words for individuals who suffer from anxiety).

depends on how well control has been implemented at prior points in the cascade.

Conversely, manipulations that make the task-relevant dimension more salient can reduce the Stroop effect. In one study, Krebs et al. (2013) used a picture/scene-word Stroop task, in which individuals decided whether a picture represented an indoor or an outdoor scene on which was superimposed a task-irrelevant word ("outside" or "inside" in Dutch). Participants had previewed some of the picture scenes prior to the Stroop task while others were novel. Novel pictures, which are more likely to capture attention, were associated with reduced behavioral interference.

Although one must be careful in making reverse inferences from patterns of brain activation to cognitive processes (Poldrack, 2011), brain imaging studies can provide insights into the degree to which this competition between a more automatic and less automatic process engenders Stroop interference. In a study to examine this issue, Banich et al. (2000b) compared brain activation for two variants of the Stroop task, the standard color-word task and a color-object task, to reveal that automaticity of processing is critical for engaging cognitive control regions, more specifically the dorsolateral prefrontal cortex (DLPFC) and the inferior frontal gyrus (IFG). In the color-object task, individuals had to identify the color in which an object was displayed. On incongruent trials, objects were shown in an atypical color (e.g., a frog displayed in red, when frogs are typically green; a banana displayed in blue, when bananas are typically yellow). Brain activation was compared to neutral trials, in which the object displayed typically can occur in a variety of colors (e.g., a car displayed in red, when cars can be red, blue, gray, white, black, green, etc.).

For both the color-word and color-object tasks, for a given condition, individuals were told to monitor one (but not the other) dimension of the stimuli, making it task-relevant. Their task was to indicate when an item with a given characteristic appeared. For example, for the color-word task, one condition required individuals to monitor for an item in a specific color (e.g., purple), making color task-relevant, while in the other condition, they monitored for a given "word" (a non-sense word), making the word task-relevant. Likewise, in the color-object task, in one condition, individuals were once again told to monitor for the color purple, and in the other condition, to monitor for a non-sense shape (making shape task-relevant).

Importantly, while color identification is less automatic than word identification in the color-word task, color identification is more automatic than object identification in the color-object task. Hence, if automaticity of processing is indeed a locus of Stroop interference, the pattern of brain activation should be influenced more by the relative automaticity of processes, rather than the nature of the attribute being attended to (i.e., color). Importantly, distinct patterns of brain activation were observed for color monitoring depending on whether it was the less automatic process, as in the color-word task, in which case prefrontal mechanisms were engaged or the more automatic process, as in the color-object task, in which no prefrontal activity was observed. As such, this study provided evidence that relative competition between the automaticity of processes is one locus at which Stroop interference occurs, and that prefrontal regions are involved in control over such effects.

A subsequent study demonstrated that prefrontal mechanisms are engaged when a less automatic process must guide responding, regardless of the specific nature of that process (Banich et al., 2000a). In this study, activation for the contrast of incongruent vs. neutral trials in a color-word Stroop task was compared to that in a spatial-word Stroop task. In the color-word task, incongruent trials consisted of color words displayed in conflicting colors (e.g., "red" in blue ink) while neutral trials consisted of non-color-related words displayed in a particular ink color (e.g., "lot" in blue ink). In the spatial-word task, individuals pressed a button to identify whether a word appeared above, within, or below a box. On incongruent trials, the word's position conflicted with its meaning (e.g., the word "above" positioned below the box), while on neutral trials, a non-spatial-related word was displayed (e.g., the word "civil" positioned below the box). Overlapping regions of DLPFC were activated for these two tasks, indicating that the need to overcome the automaticity of word reading can be engendered regardless of the nature of the task-relevant attribute (color vs. spatial position).

Taken together, this experiment and the one discussed just above demonstrate that it is not the nature of information in a given stimulus dimension that drives Stroop interference, but rather the relative automaticity of the two processes. In the first study discussed, different patterns of activation were observed when color was the task-relevant dimension, depending on the nature of its automaticity vis a vis the task-irrelevant dimension. In the second study, similar patterns of activation were observed even when the task-relevant attribute differed because processing each of those dimensions was less automatic than word reading.

If indeed it is the automaticity of word reading vis a vis another process that engenders Stroop interference, then one should observe similar patterns of brain activation for the color-word and color-emotional word Stroop task. A direct comparison in the same participants showed that DLPFC activity is observed for the incongruent condition of a color-word task as well as trials in a color-emotion word task containing either a positive or negative emotionally valenced word compared to a neutral non-emotional word (e.g., "integer") (Compton et al., 2003). These findings are consistent with the idea that the automaticity of word reading or attentional capture by the word so as to engage word reading must be overcome to enable successful color identification. This overlapping pattern of activation in the frontoparietal network (DLPFC and parietal regions) for the color-emotion word task as compared to the color-word task has been observed in additional non-clinical samples both with positively and negatively valenced words (Kaiser et al., 2015) as well as for positive and threat words (Mackiewicz Seghete et al., 2017).

The effects observed in the color-emotional word task suggest that to the degree that a word is salient, it will capture attention so as to enhance word processing. If so, this should be a general mechanism that can help to increase Stroop interference. This idea is supported by a study (Compton et al., 2003) in which the words were specifically varied in terms of their arousal ratings. More activation was observed in frontoparietal regions for negative words high in arousal as compared to those low in arousal, suggesting that it is the salience of the word that engenders a greater need for control. Demonstrating that this is a general effect not specific to emotion words per se, in another study, the frequency with which certain items appear was varied, such that a subset of words occurred less frequently (i.e., oddball trials). DLPFC activation was enhanced for these oddball trials as compared to more frequent trials (Milham et al., 2003a). Thus, any of a number of manipulations that make words more salient so as to increase the engagement of word processing seems to be one locus of the Stroop effect.

## INTERFERENCE BASED ON DIFFERENT LEVELS OF STIMULUS-RELATED REPRESENTATIONS

While standard computational models of the Stroop task suggest that it can be explained by competition between two distinct processes, neuroimaging data provide a more complicated picture. In particular, if that were simply the only locus of the Stroop effect, then the identity of the task-irrelevant information should not affect brain activation, as for all intents and purposes it is downregulated relative to the task-relevant dimension.

However, neuroimaging research provided a contrary result. Greater activity was observed in brain regions that process the task-irrelevant attribute for incongruent as compared to neutral trials (Banich et al., 2001), a finding at odds with a simple downregulation of task-irrelevant processing. More specifically, different regions of posterior cortex showed greater activity on the contrast of incongruent (e.g., "red" in blue ink) vs. neutral trials (e.g., "lot" in blue ink) for the color-word task as compared to the contrast of incongruent (e.g., a red frog, when frogs are typically green) vs. a neutral trial (e.g., a red car, when cars can be red among a variety of other colors) in the color-object task. As such, there must be an additional level of competition and/or selection.

In an attempt to understand the factors that drive this pattern of brain activation, Herd et al. (2006) modified the standard computational model of the Stroop task so that it was able to replicate the pattern of brain activation observed as well as the behavioral pattern of results. In the standard computational model, there is an input layer with two subsections – one for the receipt of color information and one for the receipt of word information. These are each linked to an output layer that governs responding. A prefrontal control node modulates processing so as to increase activation of information in the color portion of the input layer in comparison to the word portion of the input layer.

The revised model had three important modifications. First, it included a layer between input and output meant to represent processing of information in posterior cortex in color-specific and word-specific regions, respectively. Part of the goal of including this layer was to see if activation in these portions of the model could mimic the activation observed in posterior brain regions in the empirical neuroimaging studies. Second, it included an additional top-down node to bias toward the abstract concept of color as being critical for the task set. The rationale was that, outside of the Stroop task, individuals typically do not have an abstract representation of color that excludes color words. As such, a task set for "color" is likely to broadly activate information related to the semantic category of color, regardless of whether it is contained in the task-relevant or the task-irrelevant dimension of the Stroop stimulus. Third, also related to the semantics of color, the model was modified so that there were excitatory linkages between representations of color in the ink processing layer (e.g., green) with the related representation in the word processing layer (e.g., "green").

This model could replicate both the behavioral results of the Stroop task (i.e., longer RT for incongruent than neutral trials) and also patterns of brain activation with more activity in the color processing layer for incongruent than neutral trials. An additional virtue of creating such a model is that portions of it can be "lesioned" to determine what aspect of its architecture is critical to engendering its results. Suggesting that the alterations to the original computational model were critical, neither a model that had the top-down color biasing unit removed nor a model without reciprocal connections between related semantic features could replicate the observed empirical results. Hence, the outcome of this computational modeling suggests that it is the color-relatedness of a representation that serves as a locus of interference.

While the color-relatedness of items is important, studies suggest that the nature of representation to which the semantic category of color is linked can vary and yet still produce interference. Support for this assertion comes from comparison of activation for incongruent vs. neutral trials for three types of Stroop tasks: the standard color-word Stroop task, the color-object Stroop task, and a color-object word Stroop task. As noted above, in the color-object task, an object with a typical color is displayed in an atypical color (e.g., a frog in red) on incongruent trials, while on neutral trials, an object is displayed in one of the many different colors in which it can appear (e.g., a car in red). In the color-object word task, the person simply views the word describing an object that has a typical color (e.g., "frog"), rather than seeing a pictorial depiction of the object. Distinct regions of cortex showed activation depending on the nature of the task-irrelevant attribute, suggesting that it was not just an amodal semantic representation of color that is the source of interference. For example, different regions of the ventral visual processing stream are activated on incongruent trials for the color-word task as compared to the color-object task, suggesting that interference may arise from more orthographically based as compared to visual form-related representations in the former task as compared to the latter. In addition, different portions of the IFG (BA 45 vs. BA 48) became active for the color-object as compared to the color-object word task despite the fact that the interference would arise from the same semantic characteristic (e.g., semantic memory with regard to frogs creates interference because they are typically green not red) (Banich et al., 2001). This finding also suggests that interference can arise at multiple stimulus-related levels.

Another way to examine stimulus-related representations of color is to compare patterns of activation when items have color-related information in both the task-relevant and task-irrelevant dimension as compared to when color-related information is restricted solely to the task-relevant dimension. One can examine this question by determining patterns of brain activation common across both incongruent and congruent trials that are greater than those observed on neutral trials. Investigations taking such an approach (Milham et al., 2002; Milham and Banich, 2005) show that there is not only increased activation in DLPFC, which presumably reflects a more general increased need for control to bias toward task-relevant information, but also increased activation in ventral lateral prefrontal cortex, portions of which are regions involved in semantic retrieval and selection (Badre and Wagner, 2007). Also suggesting interference at the semantic level, left temporal language areas show activation for the contrast of incongruent and congruent trials, which contain semantically related color information in both the ink color and the word, as compared to neutral trials, which in this case were words unrelated to color (e.g., "lot") (Milham and Banich, 2005).

In sum, the work reviewed in this section suggests that interference can potentially arise in the Stroop task at a number of stimulus-related dimensions, from visual form to orthography, as they relate to the task-relevant category, and also with regard to semantic representations of task-relevant information.

## RESPONSE-RELATED ASPECTS OF INTERFERENCE

Another series of studies provided evidence that Stroop interference is also engendered at response-related levels. In the first study of this nature, brain activation was examined for two types of incongruent trials, response-eligible and response-ineligible. In response-eligible trials, the competing word also names a potential response. An example would be the word "red" printed in blue ink when the potential responses are red, blue, and green. Response-ineligible trials on the other hand name competing colors, but those that are not a potential response, such as the word "purple" printed in blue ink, when the potential responses are red, blue, and green. If a particular brain region is specifically engaged in dealing with response conflict, it should show greater activation to response-eligible than response-ineligible trials. Importantly, in addition, this region should also show no more activation to response-ineligible trials, which have semantic conflict but no response conflict, than to neutral trials, which have neither semantic nor response conflict (e.g., the word "mile"). A region of mid-cingulate cortex showed such a pattern (Milham et al., 2001), which was confirmed in a subsequent study (Milham et al., 2003a).
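The logic separating semantic conflict from response conflict can be made explicit with a small sketch. The function name `classify_trial` and the trial labels below are hypothetical, introduced only to illustrate the design; they are not taken from the original studies.

```python
def classify_trial(word, ink, response_set, color_words):
    """Label a color-word Stroop trial by the kinds of conflict it contains."""
    if word not in color_words:
        return "neutral"  # neither semantic nor response conflict
    if word == ink:
        return "congruent"
    if word in response_set:
        # The word names another allowed response: semantic and response conflict.
        return "incongruent_response_eligible"
    # The word names a color that is not an allowed response: semantic conflict only.
    return "incongruent_response_ineligible"


# Example stimulus set matching the description in the text.
RESPONSES = {"red", "blue", "green"}
COLOR_WORDS = RESPONSES | {"purple"}

labels = [
    classify_trial("red", "blue", RESPONSES, COLOR_WORDS),
    classify_trial("purple", "blue", RESPONSES, COLOR_WORDS),
    classify_trial("mile", "blue", RESPONSES, COLOR_WORDS),
]
```

A region specifically engaged by response conflict would be expected to respond more to the first label than to the second, and no more to the second than to the third.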

Another way to examine response-related aspects of Stroop interference is to compare processing on different blocks of trials in which the stimulus-response mapping is one-to-one as compared to one-to-many. More specifically, on some blocks, each incongruent response-ineligible word was mapped to a different color (e.g., the word "purple" shown in blue, the word "violet" shown in green, etc.), whereas in other blocks, the same task-irrelevant word was presented but paired with a variety of colors (e.g., shown on some trials in blue, in other trials in green, etc.). Hence, stimulus-response mappings were more overlapping in the latter condition than the former. Each of these blocks also contained neutral words (e.g., the word "closet") with one-to-one as compared to one-to-many color mappings within the appropriate blocks. While DLPFC showed greater activity for incongruent vs. neutral trials, regardless of the nature of the response-mapping (1 to 1; 1 to 4), the ACC was sensitive to the response mapping, showing more activity when the color-response mappings were overlapping (one word to four colors) and hence harder to distinguish than when they were one-to-one (one word to one color) (Liu et al., 2006).

Another way in which response-related interference in the Stroop task has been investigated is via an integrated Simon-Stroop task. In the Simon task, interference arises from stimulus-response interference. In this task, interference is engendered when a right-sided (e.g., right hand) response is required to a left-sided stimulus (and vice versa) as compared to when the location of the item to be responded to and the effector making the response are on the same side of midline. In our integrated Simon-Stroop task, individuals viewed arrows that were located either to the right or left (Simon stimuli), or on different trials above or below (Stroop stimuli) a fixation point. Individuals were trained, for example, to press a right button for an upward arrow and a left button for a downward arrow. Simon interference, which is considered stimulus-response interference, was engendered by placing, for example, an upward arrow to the left of fixation, which then required a right button response to a left-sided stimulus. Stroop interference, which is considered engendered by conflict between two stimulus dimensions, occurred for example when an upward arrow was positioned below the fixation point.

While the contrast of incongruent vs. congruent trials yielded activation in DLPFC for both tasks, the Simon task trials generated activity in motor and response-related regions including the ACC and supplementary motor area (SMA), activity that was not observed in this spatial arrow-spatial position Stroop task. In contrast, the stimulus-stimulus interference of the Stroop task engendered activity in inferior parietal and inferior frontal regions that was not observed in the Simon contrast (Liu et al., 2004). Hence, this body of work suggests that another locus of Stroop interference is at response-related aspects of processing. Consistent with this supposition, certain limitations of the classic computational model of the Stroop task by Cohen et al. (1990) with regard to fitting aspects of human performance can be overcome if the model includes a mechanism for performing final response selection (Stafford and Gurney, 2007).

## AN INTEGRATIVE MODEL: STROOP INTERFERENCE CAN OCCUR AT MULTIPLE POINTS ALONG A CASCADE-OF-CONTROL

The work described above suggests that Stroop interference can occur at multiple levels. How then can one integrate these findings to shed light on the locus of the Stroop effect? We have argued that, importantly, the degree to which control is exerted at one level of processing can then influence the degree to which interference is engendered or controlled at another.

A pair of early studies helped this idea to come into focus. As reviewed above, our work suggests a broad distinction between control engendered at the level of an abstract task set, mainly implemented by lateral prefrontal cortex, as compared to more response-related aspects of control, mainly implemented by medial prefrontal cortex. In examining differences in brain activation common to incongruent and congruent as compared to neutral trials (e.g., the word "lot") in the color-word task, there was a notable difference in patterns of activation for younger vs. older adults (Milham et al., 2002). In particular, younger adults exhibited more activation across frontal and parietal regions. Such findings are consistent with reported compromise with aging of prefrontal regions and processes involving executive function and cognitive control (Lockhart and DeCarli, 2014). In contrast, older individuals had more activation in portions of the ACC and SMA. This led us to consider the possibility that due to the lack of top-down control, older individuals were potentially utilizing more response-related mechanisms to deal with the interference.

The converse effect was observed in a study of practice-related effects on the Stroop task. Since the Stroop effect can be maintained over tens of thousands of trials due to the automaticity of word reading, a Stroop task was used in which the interference effect could be reduced with practice. In this task, individuals were trained to assign a color-word label to a series of nonsense designs (e.g., nonsense design 1 was labeled "blue"). Then, later, they were shown either incongruent trials, in which a specific nonsense design was displayed in an incongruent color (e.g., nonsense design 1 labeled "blue" shown in yellow), or neutral trials, on which the nonsense designs were shown in white. To examine learning effects, the experiment was divided into thirds, examining activation for the first third, second third, and last third of trials. While lateral prefrontal activity stayed relatively static across the three portions of the task, medial prefrontal activity declined, as did the behavioral Stroop effect, suggesting that individuals were gaining better control over interference. We interpreted this pattern as suggesting that less late-stage response-related interference was occurring, as reflected in reduced ACC activity, due to better top-down control by lateral prefrontal regions, which stayed engaged across all portions of the task (Milham et al., 2003b). Thus, ACC activity depends, in part, on the degree of interference control engendered by DLPFC.

Testing the idea that ACC activity depends in part on the degree of prior control exerted by DLPFC required using a method that afforded better temporal resolution than that provided by fMRI. The relationship between activity in DLPFC and ACC was examined by utilizing event-related potentials (ERPs) due to their superior temporal resolution, in conjunction with fMRI. Participants performed the Stroop task in the magnet and then again while electrophysiological recordings were made. fMRI results were used to enable source localization for ERP waveforms for the DLPFC and ACC. The relationship between ERPs generated by these sources was examined, in addition to how well they could predict, as tested via mediation models, interference on the Stroop task (indexed by the difference in performance between incongruent and congruent trials). The specific model examined whether the influence of DLPFC activity in the 300–440 ms time range on Stroop performance would be mediated, in part, by later ACC activity in the 520–680 ms time range. This pathway was significant. Moreover, the data showed that for individuals with larger DLPFC amplitude, indicative of higher levels of control, the degree of ACC activity was unrelated to behavioral interference. This finding is consistent with the idea that there is reduced need for late-stage selection when the task set is well specified so as to reduce interference from the task-irrelevant processing stream. In contrast, individuals with low DLPFC but high ACC amplitude exhibited a greater degree of interference as measured by the reaction time difference between incongruent and congruent trials, but no more errors than individuals with high DLPFC activity. Finally, those individuals with both low DLPFC amplitude and low ACC amplitude committed more errors, suggesting that the reduced ability of the ACC to engage in late-stage selection led to compromised performance.

An advantage of this approach was that alternative models could be tested. For example, one might argue that this model predicted the data because it posited that the effect of a component occurring earlier in time, that recorded from the DLPFC, was moderated by a component occurring later in time, that recorded from the ACC. Arguing against such an interpretation, a model positing a pathway from an earlier ACC component (in the 220–340 ms time range) via the DLPFC component (at 300–440 ms) did not predict performance. Nor did a model in which activity derived from source location of another brain region involved in cognitive control, RIFG, was substituted for DLPFC (Silton et al., 2010).
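The general structure of a mediation analysis of this kind can be sketched as follows. This is a minimal illustration on simulated data and does not reproduce the analysis of Silton et al. (2010): the variable names, sample size, and generating coefficients are assumptions made up for the example, and significance testing of the indirect path (e.g., by bootstrapping) is omitted.

```python
import numpy as np


def ols_slopes(y, *predictors):
    """Least-squares slopes of y on the predictors (intercept fitted, then dropped)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


# Simulated (made-up) data: an early "DLPFC" amplitude, a later "ACC" amplitude,
# and a behavioral interference score for each of n hypothetical participants.
rng = np.random.default_rng(0)
n = 200
dlpfc = rng.normal(size=n)
acc = -0.6 * dlpfc + rng.normal(size=n)
interference = 0.5 * acc - 0.2 * dlpfc + rng.normal(size=n)

(a,) = ols_slopes(acc, dlpfc)                     # path a: DLPFC -> ACC
c_prime, b = ols_slopes(interference, dlpfc, acc)  # direct effect c' and path b: ACC -> behavior
(c,) = ols_slopes(interference, dlpfc)            # total effect of DLPFC on behavior

indirect = a * b  # portion of the DLPFC effect carried through ACC
```

In ordinary least squares the total effect decomposes exactly into the direct and indirect parts (`c == c_prime + a * b` up to rounding). Testing an alternative ordering, such as ACC as predictor and DLPFC as mediator, amounts to swapping the roles of the two variables in the calls above.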

Integrating all these findings, we posited a cascade-of-control to control interference in the Stroop task (Banich, 2009). As discussed earlier (and as shown in **Figure 1**), this model argues that posterior portions of lateral prefrontal cortex are involved in setting a top-down attentional set (i.e., pay attention to ink color) for task-relevant information and act by modulating activity either in one or both of the posterior brain regions that process the task-relevant and task-irrelevant dimension of the Stroop stimulus. This idea is consistent with activation of IFG across distinct meta-analyses of Stroop tasks (Derrfuss et al., 2005). Such task setting can occur even prior to stimulus presentation in a proactive manner (see, for example, Braver, 2012). Once a stimulus appears, relevant information is identified and then mid-DLPFC regions are involved in selecting which of the relevant information should be actively maintained in working memory. Regions of mid-DLPFC have been implicated in buffering relevant information in working memory from interference from competing information (Burgess and Braver, 2010). This information is then sent along to more posterior and dorsal regions of ACC, which are then involved in response-related and late-stage selection, which is required prior to emitting a response. Research with monkeys implicates the ACC as being particularly important for response selection (Isomura et al., 2003). Then, more rostral regions of ACC are involved in response evaluation, which can send a signal back to DLPFC (e.g., Jahn et al., 2014) as posited by the conflict monitoring theory (Botvinick et al., 2004; refer back to **Figure 1**). Consistent with this notion of a cascade are findings from ERP studies in which the onsets of the two stimulus dimensions (task-relevant and task-irrelevant) are varied in time. These studies reveal that ERP waveforms sensitive to stimulus incongruity vary depending on the stimulus onset asynchrony between these two dimensions, implicating a cascading process of interference effects (Appelbaum et al., 2009; Coderre et al., 2011).

## OTHER TYPES OF INTERACTIONS BETWEEN BRAIN REGIONS THAT MAY INFLUENCE THE LOCUS OF THE STROOP EFFECT

Conceptualizing Stroop interference as occurring via a cascade of control provides additional avenues for considering the locus of Stroop interference. In this section, we consider some approaches in that regard. One issue not yet discussed is the mechanism via which top-down biasing by prefrontal regions for a task set influences processing of each of the task-relevant and the task-irrelevant dimensions of a Stroop stimulus. One can ask whether interference occurs because the representation of task-relevant information is not adequately upregulated or because the representation of task-irrelevant information is not adequately downregulated. Because of the specificity of brain regions that process each of the two stimulus dimensions contained in Stroop stimuli, one can leverage brain imaging to examine this question.

A number of studies have examined whether, for example, in the standard color-word Stroop task, activity is increased in color processing regions or downregulated in word processing areas (e.g., Egner and Hirsch, 2005; Purmann and Pollmann, 2015). This question is generally approached via the utilization of localizer scans where individuals are shown a series of words and then separately colors to identify, on an individual participant basis, those brain regions that are specifically involved in processing words and then those specifically involved in processing color. One can then examine the degree of activation of each of these regions on average for incongruent trials as compared to congruent trials. Work using such an approach suggests that both mechanisms (upregulation of task-relevant material, downregulation of task-irrelevant material) may occur (e.g., Polk et al., 2008; Coste et al., 2011).

Recently, we have expanded on such approaches to specifically examine how processing of the task-relevant vs. task-irrelevant dimensions of a Stroop stimulus predicts the degree of Stroop interference that is observed on a trial-by-trial basis (Banich et al., 2019). In our approach, participants performed a localizer task, which in conjunction with multi-voxel pattern analysis (Norman et al., 2006) was used to determine the pattern of brain activity over visual cortex that is specifically associated with processing the task-relevant dimension and then to also determine the pattern of activity associated with the task-irrelevant dimension. The task employed was an emotional word-emotional face Stroop task in which individuals characterized the valence of a word (positive, negative) superimposed on a task-irrelevant emotional face (sad, happy). On each trial, we determined how much activity over posterior cortex was similar to that typical for each dimension (using a classifier fit), that is, how much the pattern of activity looked like face activity and additionally how much the pattern looked like word activity. This approach provided a trial-by-trial readout of how much each dimension was being attended and/or processed.
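The idea of a trial-by-trial readout can be illustrated with a deliberately simplified sketch. It does not reproduce the classifier used in the study; instead it assumes that a hypothetical "template" pattern for each dimension has been averaged from localizer data and scores each trial by its Pearson correlation with each template. The function names, the four-voxel patterns, and the use of correlation in place of a classifier fit are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length activity patterns."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("patterns must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant pattern has no defined correlation")
    return cov / (sd_x * sd_y)


def dimension_readout(trial_pattern, face_template, word_template):
    """Score how face-like and how word-like a single trial's pattern is."""
    return {
        "face": pearson(trial_pattern, face_template),
        "word": pearson(trial_pattern, word_template),
    }


# Hypothetical four-voxel patterns (arbitrary units), for illustration only.
face_template = [1.0, 0.8, 0.1, 0.0]
word_template = [0.0, 0.2, 0.9, 1.0]
trial_pattern = [0.2, 0.1, 0.8, 0.9]

readout = dimension_readout(trial_pattern, face_template, word_template)
print(readout)
```

In this toy case the trial pattern resembles the word template more than the face template, which would be read as relatively more processing of the word on that trial. Scores of this kind, one pair per trial, are what would then be related to RT.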

The important question for purposes of the present article was the degree to which processing of each of these dimensions could predict RT on a given trial and the degree to which such activity occurs as a result of activity in DLPFC modulating activity of posterior brain regions processing each of the task-relevant and task-irrelevant stimulus dimensions. The results yielded different patterns for incongruent as compared to congruent trials. On incongruent trials, greater DLPFC activity directly predicted longer RT, suggesting that when individuals were having difficulty on a given trial, they needed to engage more top-down mechanisms. In addition, more DLPFC activity was associated with less of a classifier fit for faces, suggesting that this brain region is downregulating processing of the task-irrelevant face. However, the degree of processing of the task-irrelevant face did not predict RT. Hence, interference, at least in the population of individuals in this study, late adolescents, seems to be predicted on incongruent trials by the degree to which DLPFC mechanisms must be engaged. On congruent trials, as on incongruent trials, more DLPFC activity was associated with a poor classifier fit (i.e., less activity) for faces. However, for these trials, more processing of the word was associated with longer RT, suggesting that when more attention needed to be directed to the word to extract the relevant information, RT was elongated.

While these results must be considered in the context that they were obtained in adolescents in whom cognitive control mechanisms are still developing (Andrews-Hanna et al., 2011), they nonetheless raise two important points. First, they provide another example of how brain imaging techniques can be leveraged to try to provide insights into the locus of the Stroop effect that would otherwise be difficult to obtain via behavioral methods alone. Second, they suggest that when one talks about the "locus of the Stroop effect," considered in the context of a cascade, those effects can potentially vary for congruent and incongruent trials, and the interference observed may be a combination of these two effects.

Also suggesting that the locus of the Stroop effect may vary depending on task demands are findings examining the Stroop effect from a network perspective (Spielberg et al., 2015). Using a graph theory approach, this work found that higher demand for inhibitory control is associated with restructuring of the global network into a configuration that is more optimized for specialized processing (functional segregation), more efficient at communicating the output of such processing across the network (functional integration), and more resilient to potential interruption (resilience). In addition, there were regional changes with right inferior frontal sulcus and right anterior insula occupying more central positions as network hubs, and dorsal ACC becoming more tightly coupled with its regional subnetwork. This work also suggests that interference is generated via a cascade of activity among regions situated within a larger network and that such configurations can change with control demands on incongruent vs. congruent trials.

### TASK-RELATED VARIABLES THAT MAY INFLUENCE THE LOCUS OF THE STROOP EFFECT

The implications of the results discussed just above, and the model proposed, are that the Stroop effect can occur at a number of different loci and may be influenced by the interaction between these loci as well (e.g., top-down biasing by DLPFC; response-related, late-stage selection by ACC). As a result, it may indeed be that where the Stroop effect is observed is dictated essentially by where your paradigm puts it, even if only implicitly. Two examples are provided here.

First, one of the reasons we used manual responses in most of our fMRI studies was to avoid the potential for head motion that is associated with verbal responding. However, that design choice likely influenced what was observed. In paradigms with a verbal response, there is a much stronger and more automatized mapping between seeing the word (or color) red and verbally producing the word that is associated with it than, for example, training individuals that pressing a button with your index finger denotes "red." Although we have never formally performed such a comparison, based on prior studies showing differences in activation based on response modality (verbal, manual) during a spatial Stroop task (Barch et al., 2001), one might expect that the interference effects in a vocal color-word Stroop paradigm would more likely involve response-related processing relative to top-down biasing mechanisms, as compared to manual response versions in which there is likely to be less response-related interference. Said differently, pressing an index finger to denote the color red when the word says "blue" is likely to engender less response interference than saying "red" in the face of the well-ingrained tendency to say "blue" when seeing the word blue. This idea has been recently supported by a study in which the vocal and manual Stroop effects were compared. The vocal Stroop effect was about twice as large as the manual one. Moreover, ERP recordings indicated that while both the vocal and manual version produced an N400 (suggestive of semantic interference), only for the vocal version was there a response-locked component over left inferior frontal and parietal regions, suggesting additional interference at the level of word production (Zahedi et al., 2019) (however, it should be noted that an alternative suggestion is that different portions of the anterior cingulate are involved in response-related selection for manual vs. vocal tasks; e.g., Liotti et al., 2000; Swick and Turken, 2002).

As a second example, the locus of the Stroop effect may vary depending on the relative automaticity of two processes. One of the reasons that the classic color-word Stroop effect gives such a potent behavioral interference effect is that word reading of color words is so automatic, being some of the earliest learned words. In contrast, the behavioral interference effects for a spatial location–spatial word Stroop task are much less potent. Hence, there may be a greater need for top-down biasing by DLPFC in the former case than the latter.

### INDIVIDUAL DIFFERENCES THAT MAY INFLUENCE THE LOCUS OF THE STROOP EFFECT

The locus of the Stroop effect may also vary depending on the characteristics of an individual or his/her experience. For example, during the teen years, overcoming interference engendered by Stroop stimuli seems to rely to a greater degree on DLPFC in older adolescents, but on the ACC in younger ones (Andrews-Hanna et al., 2011). Young adults with ADHD appear to show reductions in both DLPFC and ACC activity relative to controls, suggesting disruptions in both top-down and late-stage/response-related aspects of controlling Stroop interference (Banich et al., 2009). Individuals with depression exhibit less DLPFC activity, especially in the left hemisphere (Herrington et al., 2010), with this effect being modulated by level of anxiety (Engels et al., 2010). Moreover, individual differences in approach and avoidance can modulate the lateralization of involvement of the DLPFC in top-down control (Spielberg et al., 2011).

In other individuals, brain regions other than the typical ones are engaged. For example, women with a history of childhood abuse compared to controls exhibit less fronto-parietal activation, but more activity in regions that are part of the ventral attention/surveillance system during both a standard color-word and color-emotional word Stroop task (Mackiewicz Seghete et al., 2017). In adolescents with severe substance and conduct problems, more activation is observed in medial temporal regions including hippocampal regions (Banich et al., 2007), suggesting potentially a more instance-based processing of Stroop stimuli.

Studies with twins can help to elucidate the potential causes of these effects. For example, in a small sample of monozygotic twins who were discordant for stressful life events, those higher in stressful life events recruited regions of ventrolateral and medial frontal cortex as well as limbic regions while performing an emotional word–emotional face Stroop task. The control co-twins showed only the more typical recruitment of frontoparietal regions thought to be important for executive control of attention and maintenance of task goals. Behavioral performance was not significantly different between twins within pairs, suggesting that the twin who had experienced greater stress recruited additional neural resources associated with affective processing and updating working memory to obtain the same level of behavioral performance (Godinez et al., 2016). A study utilizing a case-control discordant twin pair design revealed that co-twins of individuals with ADHD, like their affected ADHD twin, show reduced activity in the anterior cingulate and insula compared to the unrelated controls, suggesting familial influences. In contrast, portions of the frontoparietal network appear to be the location of effects specific to ADHD, with twins with childhood ADHD showing reduced superior frontal (Brodmann's Area – BA 6) and parietal region (BA 40) activity compared to both their control co-twins and unrelated control twins (Godinez et al., 2015).

Other work suggests that the nature of the cascade is affected by individual differences. For example, using a source-guided examination of ERP effects, Silton et al. (2011) found that for individuals with high levels of depression, increased LDLPFC activity was directly related to decreased Stroop interference and that ACC did not play an intervening role. Separately for individuals with high levels of anxious apprehension (i.e., worry), higher ACC activity was related to more Stroop interference. These results indicate that depression and anxious apprehension modulate temporally and functionally distinct aspects of the fronto-cingulate network involved in top-down attention control. Additionally, Spielberg et al. (2014) observed that during performance of a color-word Stroop task, increasing levels of anxious arousal were positively associated with coupling of the right DLPFC with orbitofrontal cortex (OFC). In addition, increasing levels of depression were positively associated with right DLPFC–OFC coupling and negatively associated with left DLPFC–OFC coupling. As such, it may be that additional regions to those outlined by our model are brought into the set of regions influencing Stroop interference as a function of individual differences. For example, our model focuses exclusively on cortical regions. Yet at least some research suggests that the ventral tegmental area (VTA)/substantia nigra (SN) and locus coeruleus (LC) also show alterations in activity on incongruent vs. congruent trials, and have differential connectivity to prefrontal regions (Köhler et al., 2016). Hence, individual differences in noradrenergic and/or dopaminergic function may influence the locus of the Stroop effect as well.

#### CONCLUSION

The main takeaway from the work reviewed in this article is that the locus of the Stroop effect can occur at multiple levels from the initiation and creation of a task set for the task goal (e.g., make a decision based on ink color) to late-stage response-related aspects of control. In general, our model suggests that lateral prefrontal regions are more involved in selection and modulation of specific information processing streams (i.e., task-relevant vs. task-irrelevant) while cingulate regions are more involved in late-stage response-related aspects of control. However, even within this general dichotomy, these mechanisms are likely invoked along a cascade, providing the opportunity for control and interference to occur at multiple time points. Additional evidence points to the important role that connectivity between brain regions plays in producing the Stroop effect. Furthermore, the locus of interference may be influenced by the nature of the paradigm (e.g., vocal vs. manual responding in a color-word Stroop task) and by characteristics of an individual.

As such, there is likely no single locus of the Stroop effect, which is both the advantage and the disadvantage of using this task to understand mechanisms of control. On the one hand, if a researcher desires an all-purpose task for examining cognitive control, or alterations to such control, without regard to its locus, the family of Stroop tasks is an excellent choice. One of the reasons we have used Stroop variants in our research is exactly because it is a "broad spectrum" task for detecting deviation in cognitive control. In addition, we chose it because the task instructions are easily understood and, as such, it can be administered across a wide range of ages and with neurologically normal and clinical populations. Moreover, it provides a robust behavioral effect. In addition, while its effect may be more robust at the group than at the individual level (Enkavi et al., 2019), we have found that an interference score [i.e., (Incongruent RT − Congruent RT)/Congruent RT] that accounts for individual differences in overall RT works well especially when combined with neuroimaging. Another aspect of the Stroop task that makes it so versatile is that there are a wide variety of variants that are available.
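The proportional interference score mentioned above can be written out as a short sketch. It assumes that each condition is summarized by its mean RT per participant, which the text does not specify; the function name and the example RTs are hypothetical.

```python
from statistics import mean


def interference_score(incongruent_rts, congruent_rts):
    """(Incongruent RT - Congruent RT) / Congruent RT, using condition means.

    Using the mean as the per-condition summary is an assumption of this sketch.
    """
    if not incongruent_rts or not congruent_rts:
        raise ValueError("each condition needs at least one RT")
    congruent_mean = mean(congruent_rts)
    incongruent_mean = mean(incongruent_rts)
    return (incongruent_mean - congruent_mean) / congruent_mean


# Hypothetical RTs (ms): two participants with the same raw difference
# (100 ms) but different overall speed.
fast = interference_score([600, 620, 580], [500, 510, 490])
slow = interference_score([1100, 1120, 1080], [1000, 1010, 990])
print(round(fast, 3), round(slow, 3))
```

Although both hypothetical participants show the same raw 100 ms difference, the faster participant receives the larger proportional score, which is the sense in which the score accounts for individual differences in overall RT.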

It is, however, exactly this variation across Stroop paradigms that can be a disadvantage of the task, as it can make comparison across different studies difficult. Researchers often discuss using the "Stroop task" when they use one of the many members of the family of Stroop task variants. Yet, each variant of the task likely generates the need for control at different loci, as our research has demonstrated. To help facilitate comparison across studies, we have tried to be more explicit in our task nomenclature by indicating both the task-relevant and task-irrelevant dimensions (e.g., the classic color-word Stroop task). If this nomenclature were adopted more broadly across the field, it might facilitate comparisons across studies. However, to truly facilitate comparison, it would also be important to indicate the nature of neutral trials. In some studies of the classic color-word Stroop task, the neutral trials are simply a series of colored "xxxxxxx"s. Such stimuli are not as likely to engage word reading mechanisms as, for example, the neutral non-color words that we have typically employed (refer back to discussion in section "Interference Between Two Processes That Vary in Their Automaticity or Control Demands"), which will also influence the locus of the Stroop effect (as will the specific contrast being examined, e.g., incongruent vs. neutral trials, incongruent vs. congruent trials).

In conclusion, the "Stroop task" can be used either more as a hammer to detect cognitive control across a variety of loci in a broad-based manner or more as a scalpel to investigate control at a very limited level if designed with specifically constrained stimuli and contrasts. Just as there is a family of Stroop tasks, there is also a family of loci at which the Stroop effect can occur. Moreover, the different loci may be generated across a series of distinct but interacting brain regions to produce the single behavioral effect that is observed.

## AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and has approved it for publication.

## FUNDING

Portions of the research described in this article were supported by NIH grants P50 MH079485, R03 HD062600, R01 MH070037, R01 MH105501, and R01 MH61358. Preparation of this article was supported by R01 MH105501.

## ACKNOWLEDGMENTS

The author would like to thank all of her collaborators who helped in this line of research.






**Conflict of Interest:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Banich. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## An fMRI Study of Response and Semantic Conflict in the Stroop Task

Benjamin A. Parris<sup>1</sup> \*, Michael G. Wadsley<sup>1</sup> , Nabil Hasshim<sup>2</sup> , Abdelmalek Benattayallah<sup>3</sup> , Maria Augustinova<sup>4</sup> and Ludovic Ferrand<sup>5</sup>

<sup>1</sup> Department of Psychology, Bournemouth University, Poole, United Kingdom, <sup>2</sup> School of Psychology, University College Dublin, Dublin, Ireland, <sup>3</sup> Exeter MR Research Centre, University of Exeter, Exeter, United Kingdom, <sup>4</sup> Normandie Université, UNIROUEN, CRFDP, Rouen, France, <sup>5</sup> Université Clermont Auvergne, CNRS LAPSCO, Clermont-Ferrand, France

An enduring question in selective attention research is whether we can successfully ignore an irrelevant stimulus and at what point in the stream of processing we are able to select the appropriate source of information. Using methods informed by recent research on the varieties of conflict in the Stroop task, the present study provides evidence for specialized functions of regions of the frontoparietal network in processing response and semantic conflict during Stroop task performance. Specifically, we used trial types and orthogonal contrasts thought to better independently measure response and semantic conflict, and we presented the trial types in pure blocks to maximize response conflict and therefore better distinguish between the conflict types. Our data indicate that the left inferior PFC plays an important role in the processing of both response and semantic (or stimulus) conflict, whilst regions of the left parietal cortex (BA40) play an accompanying role in response, but not semantic, conflict processing. Moreover, our study reports a role for the right mediodorsal thalamus in processing semantic, but not response, conflict. In none of our comparisons did we observe activity in the anterior cingulate cortex (ACC), a finding we ascribe to the use of blocked trial type presentation and one that has implications for theories of ACC function.

Keywords: task conflict, semantic conflict, response conflict, fMRI, selective attention, Stroop 2-1 mapping, Stroop

## INTRODUCTION

The Stroop task (Stroop, 1935; MacLeod, 1991) has been referred to as the "gold standard" measure of selective attention (MacLeod, 1992). It elicits cognitive conflict by presenting two sources of information, one of which is the relevant to-be-identified color and the other an irrelevant word that must be ignored. The Stroop interference effect refers to the finding that naming aloud the color that a word is printed in takes longer when the word denotes a different color (e.g., the word red displayed in blue font; an incongruent trial) compared to a baseline control condition (e.g., top in red or xxxx in red). The Stroop facilitation effect refers to the finding that naming aloud the color that a word is printed in is faster when the word denotes the same color (e.g., the word red displayed in red font; a congruent trial) compared to a baseline control condition. Influential models of Stroop task performance attribute Stroop effects to response level competition (or convergence in the case of facilitation; Cohen et al., 1990; Roelofs, 2003). Yet, more recent lines of research argue that these effects result from several distinct types of competition. Therefore, the present paper addressed just this issue by investigating the neural substrates of multiple sources of competition in the Stroop task.

#### Edited by:

Marco Steinhauser, Catholic University of Eichstätt-Ingolstadt, Germany

#### Reviewed by:

Yoshifumi Ikeda, Joetsu University of Education, Japan Yu-Chin Chiu, Purdue University, United States

#### \*Correspondence:

Benjamin A. Parris bparris@bournemouth.ac.uk

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 15 June 2019 Accepted: 14 October 2019 Published: 31 October 2019

#### Citation:

Parris BA, Wadsley MG, Hasshim N, Benattayallah A, Augustinova M and Ferrand L (2019) An fMRI Study of Response and Semantic Conflict in the Stroop Task. Front. Psychol. 10:2426. doi: 10.3389/fpsyg.2019.02426

## The Neural Substrates of Stroop Task Performance

The common implementation of the Stroop task involves incongruent, congruent and color neutral trials and imaging studies employing some or all of these conditions have consistently and mainly implicated left lateral prefrontal (particularly inferior frontal regions of BA44/45/47) and left parietal cortices in Stroop task performance (e.g., Bench et al., 1993; Khorram-Sefat et al., 1996; Peterson et al., 1999; Zysset et al., 2001; Adleman et al., 2002; Mead et al., 2002; Langenecker et al., 2004; Liu et al., 2004; Coderre et al., 2008; Song and Hakoda, 2015; Cipolotti et al., 2016). Many studies have also implicated the anterior cingulate cortex (ACC) in Stroop task performance (e.g., Bench et al., 1993; Peterson et al., 1999; Adleman et al., 2002; Langenecker et al., 2004; Liu et al., 2004; Coderre et al., 2008), although this is a matter of debate (e.g., Khorram-Sefat et al., 1996; Zysset et al., 2001; Mead et al., 2002; Roelofs et al., 2006; Aarts et al., 2008; Song and Hakoda, 2015).

An influential model (Botvinick et al., 2001) posits that the ACC is responsible for detecting the presence of response conflict between competing representations and consequently engages the DLPFC to impose cognitive control by biasing information in posterior cortices to resolve conflict (see also Miller and Cohen, 2001; van Veen and Carter, 2002). The parietal regions in contrast are thought to represent stimulus-response mappings or to be involved in visuospatial selection, and thus play a role in conflict resolution (Casey et al., 2000; Rushworth et al., 2001; Bunge et al., 2002).

The Cascade-of-Control model (Banich, 2009, 2019) is another model of the neural substrates of Stroop task performance based on a series of studies investigating control in Stroop-like tasks (e.g., Banich et al., 2000a,b; Milham et al., 2002; Compton et al., 2003; Liu et al., 2006; Mackiewicz Seghete et al., 2017). According to this model, posterior portions of the lateral prefrontal cortex, particularly portions of the inferior frontal gyrus, are responsible for setting the attentional set in the Stroop task, meaning that they can upregulate color processing and/or downregulate word processing, prior even to stimulus onset (proactive control). The posterior PFC will send signals to posterior brain regions to ensure the biasing of relevant information over irrelevant information. Mid dorsolateral prefrontal cortex (DLPFC) is purported to be responsible for selecting relevant information in working memory on the presentation of the Stroop stimulus. If the prefrontal regions do not do as good a job as they could, posterior and dorsal ACC regions are argued to play a role in late-stage, response-related selection. Finally, consistent with the conflict monitoring model of Botvinick et al. (2001), more rostral regions of the ACC are responsible for response evaluation and sending signals back to the DLPFC so that it can adjust the strength of its involvement. An important concept with the Cascade-of-Control model is that the involvement of certain regions, particularly the ACC, depends on how well the early selection regions do their jobs. Moreover, according to the model the posterior and dorsal ACC are thought to play a role only in response conflict resolution and not conflict of other types such as the conflict between semantic representations activated by the dimensions of the Stroop stimulus (semantic conflict) or the conflict between the exogenously activated task set for reading and the endogenously activated task set for color classification (task conflict).

## Dissociating Response and Semantic Conflict

It is notable that few studies have attempted to decompose Stroop effects into their components. Stroop interference for example has been shown to comprise conflict at a variety of different levels of processing (Augustinova et al., 2019; Ferrand et al., 2019; for a review see Parris, Hasshim, Wadsley, Augustinova, and Ferrand, under review). Doing so not only refines our understanding of the mechanisms of selective attention but also has the potential to elucidate the functions of associated brain regions. Indeed, it has been postulated that different regions of the ACC detect differential types of conflict (e.g., response and semantic conflict; van Veen and Carter, 2005) which then engage separate regions of the PFC to independently resolve semantic (superior PFC) and response conflict (inferior PFC). In contrast, the results from another study suggest that PFC activity dissociates by hemisphere (Milham et al., 2001). Milham et al. report that right PFC is responsible for resolving response conflict while left PFC is responsible for resolving semantic conflict. van Veen and Carter (2005) also reported parietal activation to semantic conflict only, consistent with the notion that it plays a role in maintaining task-relevant response mappings. Milham et al. in contrast reported parietal activity to both response (superior parietal lobe) and semantic (inferior parietal lobe) conflict.

van Veen and Carter (2005) noted that the differences between their study and that of Milham et al. might be due to the way response and semantic conflict were measured (see below for more detail). Recent research concurs with this conclusion. The aim of the present study was to investigate the neural regions involved in processing different types of conflict using methods informed by recent research (Augustinova and Ferrand, 2014; Hasshim and Parris, 2014, 2015, 2018; Levin and Tzelgov, 2016). Below we describe and critically evaluate the methods employed thus far in the study of the neural correlates of response and semantic conflict.

## The 2:1 Color-Response Mapping Paradigm

In their study, van Veen and Carter (2005) employed the 2:1 color-response mapping paradigm. First introduced by De Houwer (2003), this method maps two color responses to the same response button, which allows for a distinction between stimulus-stimulus (semantic) and stimulus-response (response) conflict. By mapping two response options onto the same response key (e.g., both "blue" and "yellow" are assigned to the "z" key), any interference during same-response trials (e.g., when "blue" is printed in yellow) is thought to involve only semantic conflict. Any additional interference on incongruent trials (e.g., when "red" is printed in yellow and where both "red" and "yellow" are assigned to different response keys) is taken as an index of response conflict. Performance on congruent trials is compared to performance on same-response incongruent trials to reveal interference that can be attributed to semantic conflict, whereas a different-response incongruent – same-response incongruent trial comparison is taken as an index of interference due to response conflict. Thus, the main advantage of using same-response incongruent trials as an index of semantic conflict is that it claims to be able to remove all the influence of response competition (De Houwer, 2003; Schmidt and Cheesman, 2005).
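The two contrasts described above amount to simple subtractions between condition means, as the following minimal sketch shows. The condition labels, the function name, and the RT values are hypothetical and are used only to make the arithmetic concrete.

```python
from statistics import mean


def two_to_one_indices(rts):
    """Compute the two contrasts of the 2:1 color-response mapping paradigm.

    rts maps a trial type to a list of RTs in ms. The keys used here
    ('congruent', 'same_response', 'different_response') are illustrative.
    """
    congruent = mean(rts["congruent"])
    same = mean(rts["same_response"])
    different = mean(rts["different_response"])
    return {
        # same-response incongruent minus congruent
        "semantic_conflict": same - congruent,
        # different-response incongruent minus same-response incongruent
        "response_conflict": different - same,
    }


# Hypothetical RTs (ms) for one participant.
example = {
    "congruent": [600, 620],
    "same_response": [650, 670],
    "different_response": [700, 740],
}
indices = two_to_one_indices(example)
print(indices)
```

With these made-up values, the semantic conflict index is 50 ms and the response conflict index is 60 ms.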

Using a Flanker task, van Veen et al. (2001) tested 12 participants using same-response and different-response incongruent trials to investigate the response of the ACC to response and stimulus conflict. They reported that the ACC was active only when response conflict was present, and that stimulus conflict activated the left inferior frontal gyrus. In their follow-up study using the Stroop task with 14 participants, van Veen and Carter (2005) observed no overlap of activation between semantic and response conflict. They showed that semantic conflict activated dorso-lateral prefrontal cortex (DLPFC: BA8/9), posterior parietal cortex (PPC: BA40) and the ACC (BA32/6), whereas response conflict activated more inferior lateral prefrontal cortex (BA9/44/45/46), left premotor areas (BA6) and regions of the ACC (BA24/32) more anterior and ventral to that activated by semantic conflict (see also Chen et al., 2013, and Kim et al., 2010, for replications of this finding). This finding of ACC activation to semantic conflict conflicts with the Cascade-of-Control model (Banich, 2009, 2019). The authors argued that their findings were consistent with and extended the conflict monitoring account (Botvinick et al., 2001) by showing the involvement of separable regions of the ACC in monitoring for different types of conflict. Thus, using the 2:1 color-response mapping method, response and semantic conflict have been dissociated at the neural level. However, despite providing a seemingly convenient way of separating these different forms of conflict, Hasshim and Parris (2014, 2015) have shown, using both RT and pupillometry as dependent variables, that same-response trials do not differ from non-color word neutral trials (e.g., top in red), questioning their utility in dissociating response and semantic conflict (see Parris et al., under review, for a review and fuller discussion of this issue).

#### Non-response Set Trials

The only other trial type that has been used to dissociate the neural substrates of response and semantic conflict is non-response set trials (Milham et al., 2001). Non-response set trials are trials on which the irrelevant color word used is not one of the possible response colors (e.g., the word "orange" in blue, where orange is not a possible response option and blue is; originally introduced by Klein, 1964). Since the non-response set color word will activate color-processing systems, interference on such trials can be taken as evidence for conflict occurring at the semantic level. These trials should in theory remove the influence of response conflict, as the irrelevant color word is not a possible response option, and thus conflict at the response level is not present. The difference in performance between the non-response set trials and a neutral word baseline condition (e.g., the word "table" in red) is taken as evidence of interference caused by the semantic processing of the irrelevant color word. Response conflict, in contrast, can be isolated by comparing performance on incongruent trials with performance on non-response set trials. This index of response conflict is referred to as the response set effect and describes the interference that results from the irrelevant word denoting a color that is also a possible response option.
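The two subtractions described above can be written out as a minimal sketch. The function name and the RT values are hypothetical and purely illustrative; they are not data from any of the studies cited here.

```python
def response_set_indices(rt_incongruent, rt_non_response_set, rt_neutral):
    """Return (semantic_conflict, response_set_effect) in the units of the inputs.

    semantic_conflict   = non-response set minus neutral
    response_set_effect = incongruent (response set) minus non-response set
    """
    semantic_conflict = rt_non_response_set - rt_neutral
    response_set_effect = rt_incongruent - rt_non_response_set
    return semantic_conflict, response_set_effect


# Made-up mean RTs in ms, for illustration only (not data from any study).
semantic, response = response_set_indices(700.0, 660.0, 630.0)
print(semantic, response)
```

By construction the two indices sum to the overall incongruent minus neutral difference, so under this scheme the non-response set condition enters both subtractions.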

Milham et al. (2001) investigated the neural substrates of response and non-response-related conflict using response- and non-response set trials, but blocked stimulus presentation such that a block contained either response set trials and neutral trials or non-response set trials and neutral trials (see also Milham et al., 2003). Consistent with van Veen et al. (2001), but inconsistent with van Veen and Carter (2005), they reported ACC activation to response conflict but no ACC activation to non-response conflict. They also reported that both left and right PFC were activated by response conflict, but only left PFC was activated by semantic conflict, a finding that is inconsistent with previous imaging studies. The lack of ACC activation to semantic conflict indicates that the theorized conflict monitoring processes (Botvinick et al., 2001) are not processing all types of conflict, which is consistent with the Cascade-of-Control model (Banich, 2009, 2019).

Whilst the response set effect might provide a useful measure of response conflict, the magnitude of the response set effect has varied between studies. Noting this, Hasshim and Parris (2018) reported within-subjects experiments in which the trial types (e.g., response set, non-response set, neutral) were presented either in separate blocks (pure) or in blocks containing all trial types in a random order (mixed). They observed a decrease in RTs to response set trials when trials were presented in mixed blocks compared with the RTs to response set trials in pure blocks. The findings demonstrate that presentation format modulates the magnitude of the response set effect, and thus response conflict, substantially reducing it when trials are presented in mixed blocks. In contrast, semantic conflict was not significantly affected by the manipulation. It is important for studies to consider how these manipulations may be used to maximize the detection of a response set effect (response conflict); all previous fMRI investigations of response and semantic conflict have employed mixed blocks. Hasshim and Parris' (2018) results suggest that the use of pure blocks will enable a better index of response conflict. For this reason, in the present study we presented trial types in pure blocks. A further benefit of this approach is that blocked designs remain the most statistically powerful designs for fMRI experiments, with the recommendation that each block should be between 16 and 40 s in duration (Bandettini and Cox, 2000). Moreover, the use of pure blocks also has potential implications for the role of the ACC in Stroop task performance and conflict processing.

## The Role of the ACC in Stroop Task Performance

As noted above, ACC activation has been observed in neuroimaging studies of the Stroop task (Bench et al., 1993; Peterson et al., 1999; Adleman et al., 2002; Langenecker et al., 2004; Liu et al., 2004; Coderre et al., 2008) and, as noted, has been theorized to have an important role in Stroop task performance, particularly in detecting response conflict (Botvinick et al., 2001; Banich, 2009, 2019), and to have separable regions for detecting response and semantic conflict (van Veen and Carter, 2005; cf. Milham et al., 2001). However, the role of the ACC in the Stroop task has been debated (Botvinick et al., 2001; Fellows and Farah, 2005; Roelofs et al., 2006; Aarts et al., 2008), with some work showing that atrophy of the ACC has no effect on Stroop task performance (Swick and Jovanovic, 2002; Fellows and Farah, 2005). Importantly for present purposes, in a recent study Floden et al. (2011) showed that ACC involvement in Stroop task performance is substantially larger when trial types are presented randomly intermixed than when they are presented in pure blocks, which the authors tentatively argued supported the notion that ACC activation reflects arousal and not conflict monitoring. If trial type mixing were responsible for ACC activations observed in the Stroop task, we should see little to no ACC activation to either response or semantic conflict, which would contrast with findings showing separate regions of the ACC being involved in response and semantic conflict and with theories positing a role for the ACC in detecting conflict, especially since response conflict is maximized using pure block designs.

## Semantic-Associative Trials and the Orthogonality of Comparisons

A final method of dissociating response and semantic conflict is through the use of semantic-associative trials. In these trials the irrelevant words used are associatively related to the response colors (e.g., sky – blue, grass – green). This method of isolating semantic conflict was also first introduced by Klein (1964) and has since been used in many studies investigating semantic Stroop interference (Stirling, 1979; Sharma and McKenna, 1998; Risko et al., 2006; Augustinova and Ferrand, 2014; see also Neely and Kahan, 2001). This is important because having another well-validated way of separating response and semantic conflict permits us to address another issue with previous studies attempting to dissociate response and semantic conflict: the orthogonality of comparisons (Levin and Tzelgov, 2016). In all previous studies, the estimate of response conflict has been computed by comparing standard incongruent trials with the trial type used to index semantic conflict (e.g., same-response trials, non-response set trials). The trial type used to index semantic conflict has then been used again to compute semantic conflict against a neutral trial. This multiple use of a single trial type to compute the two different forms of conflict results in contaminated, non-orthogonal measures (Levin and Tzelgov, 2016). To avoid this issue, in the present study we compare standard incongruent trials with semantic-associative trials to get an index of response conflict, and non-response set and neutral trials to get a measure of semantic conflict.

## Task Conflict

Another form of conflict thought to contribute to Stroop effects is task conflict. The presence of task conflict was first proposed in MacLeod and MacDonald's (2000) review of brain imaging studies. The authors proposed its existence because the ACC appeared to be more activated by incongruent and congruent stimuli when compared to repeated letter neutral stimuli (e.g., xxxx). They suggested that increased ACC activation by congruent and incongruent stimuli is likely an expression of the task conflict caused by the automatically activated, irrelevant reading task and the intentionally activated color identification task. This suggestion was recently supported in a computational model of task conflict (Kalanthroff et al., 2018) and in an fMRI study of a task switching task that also reported a dissociation between response and task conflict in the ACC (Desmet et al., 2011). However, no study has yet sought to confirm this hypothesis in a neuroimaging study of the Stroop task itself.

Since task conflict is produced by the activation of the mental machinery used to read, interference at this level occurs with any stimulus that is found in the mental lexicon. In line with this, any readable letter string should produce more interference than any unreadable, non-word letter string. Previous studies have used this logic in order to isolate task conflict from informational conflict (e.g., Entel and Tzelgov, 2018). Since both congruent and incongruent trials produce task conflict, trials consisting of repeated letters or symbols (e.g., xxxx or ####) have been introduced as a baseline (e.g., Monsell et al., 2001; Kalanthroff et al., 2015; Entel and Tzelgov, 2018). However, non-word letter strings (e.g., xxxx) are still likely to activate letter reading processes, which may produce conflict between word processing and color processing to some extent. Levin and Tzelgov (2016) used unreadable common shapes instead of letter strings to measure task conflict, since using repeated letters might activate the task set for word reading to some extent. This is a potentially important modification, but one issue with common, unreadable but nameable shapes is that they might well have activated a shape-naming task set that could interfere with the color-naming task set. Therefore, in contrast to Levin and Tzelgov, in the present study we employed uncommon, unnameable shapes to prevent a shape-naming task set from interfering in the color-naming process. However, to foreshadow our results, an initial manipulation check revealed that our unnameable shape baseline was indistinguishable from our neutral baseline in both the RT and neural data. Furthermore, in a separate unpublished oculomotor Stroop study run alongside the present study, these stimuli produced longer RTs than even our standard incongruent condition. It is unclear why this condition presented such a challenge for our participants, but beyond reporting this simple analysis we draw no conclusions regarding task conflict.

## Summary

Using the 2:1 color-response mapping paradigm, both van Veen and Carter (2005) and Chen et al. (2013) showed that semantic conflict activated DLPFC, PPC and the ACC, whereas response conflict activated more inferior lateral PFC, left premotor areas and regions of the ACC that were more anterior and ventral to those activated by semantic conflict. These findings are not only consistent with a monitoring role for the ACC and a conflict resolution role for lateral PFC regions; they also suggest that distinct areas of both regions separately process response and semantic conflict. However, the employment of the 2:1 paradigm renders the interpretation of their data less clear. Using non-response set trials, Milham et al. (2001) reported ACC and specifically right PFC activation to response conflict, but activity in left PFC to both response and semantic conflict. This finding is consistent with a role for the ACC in monitoring for response conflict, but not semantic conflict. However, both studies mixed trial types, which could be responsible for ACC activation during Stroop task performance (Floden et al., 2011) and furthermore does not maximize response conflict (Hasshim and Parris, 2018). Moreover, they employed non-orthogonal contrasts in their measures of semantic and response conflict. Finally, task conflict has been hypothesized to be reflected in ACC activity, but no study has yet provided supporting evidence for this.

In the present study, we investigated the neural substrates of response, semantic and task conflict by presenting five different trial types in pure blocks. The following trial types were employed in this experiment: response set (standard incongruent) trials, non-response set trials, semantic-associative trials, color neutral trials and non-nameable shapes. Following recommendations from Levin and Tzelgov (2016) for ensuring orthogonality of comparisons in the Stroop task, we made the following comparisons to index response and semantic conflict: (1) for semantic conflict we compared performance on non-response set trials and neutral trials; (2) response conflict was isolated using an incongruent (response set) vs. semantic-associative condition comparison. Finally, for comparison with the neuroimaging studies of the general Stroop effect (e.g., Bench et al., 1993; Khorram-Sefat et al., 1996; Peterson et al., 1999; Zysset et al., 2001; Adleman et al., 2002; Mead et al., 2002; Langenecker et al., 2004; Liu et al., 2004; Coderre et al., 2008; Song and Hakoda, 2015; Cipolotti et al., 2016) we also accepted non-orthogonality when comparing incongruent and neutral trials (see **Figure 1**).
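One common way to make the orthogonality argument concrete is to write each comparison as a vector of contrast weights over the five conditions; with equal numbers of trials per condition, two contrasts are orthogonal when the dot product of their weight vectors is zero. The sketch below uses that formalization as an assumption (it is not taken from Levin and Tzelgov, 2016), and the helper names are hypothetical.

```python
# Condition order: unnameable shapes, neutral words, semantic associates,
# non-response set, incongruent (abbreviations as used in the table notes).
CONDITIONS = ("US", "NW", "SA", "NRS", "I")


def contrast(plus, minus):
    """Weight vector for the comparison `plus` minus `minus`."""
    return tuple((c == plus) - (c == minus) for c in CONDITIONS)


def dot(a, b):
    """Dot product of two weight vectors; zero means orthogonal (equal n)."""
    return sum(x * y for x, y in zip(a, b))


semantic = contrast("NRS", "NW")             # semantic conflict, present study
response = contrast("I", "SA")               # response conflict, present study
overall = contrast("I", "NW")                # overall Stroop effect
response_traditional = contrast("I", "NRS")  # response set effect

print(dot(semantic, response))              # 0: orthogonal
print(dot(semantic, response_traditional))  # -1: NRS is used twice
print(dot(overall, semantic))               # 1: NW is used twice
```

Under this formalization the semantic and response contrasts of the present study share no condition, whereas the overall Stroop contrast reuses the neutral and incongruent conditions, which is the non-orthogonality the text says was accepted for that comparison.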

## METHODS

#### Participants

Twenty participants (14 female, M<sub>age</sub> = 23.90, SD = 7.40), recruited from Bournemouth University's staff and student populations, were tested. All participants were 18–45 years old, fluent in English and had normal or corrected-to-normal vision, as well as normal color vision. Each participant received £10 and a copy of their structural brain scan for participating. The study was approved by the Bournemouth University research ethics committee, and all subjects provided fully informed consent to participate.

## Materials and Measures

#### Stimuli

Twelve unique stimuli were used for each of our five conditions (unnameable shape trials, neutral word trials, semantic-associative trials, non-response set trials, and incongruent trials). Items were presented individually in uppercase Courier New font, size 42, in the center of the screen on a black background. Four irregular shapes were used to make up four unique shape string trials (matched to the word length of the colors in the response set). The shapes consisted of two irregular quadrilaterals and two irregular pentagons. Other trials consisted of: neutral non-color words: TOP, CLUB, STAGE, CHIEF; color-associated words: SKY, TOMATO, LEMON, GRASS; color words (non-response set): PURPLE, GOLD, WHITE, GRAY; incongruent color words: RED, BLUE, GREEN, YELLOW. Color-associated words were always presented in an incongruent color (e.g., "grass" would be presented in red, blue or yellow as opposed to green). Participants responded to the colors red (RGB: 255; 0; 0), blue (RGB: 0; 32; 96), green (RGB: 0; 176; 80), and yellow (RGB: 255; 255; 0) by pressing the corresponding key on a Cedrus response box.

#### Procedure

After informed consent had been obtained participants entered the MRI scanner and completed practice trials while a structural scan was performed. The practice trials consisted of 32 color patches (8 of each response color: red, blue, green and yellow) presented in a random order. Participants responded to the color using a Cedrus response box. After the practice trials participants completed the 600 experimental trials whilst BOLD activation was recorded. Participants were instructed to respond as quickly and as accurately as possible to the color of each stimulus whilst ignoring the meaning of the irrelevant word.

OpenSesame 3.2 software (Mathôt et al., 2012) was used to administer the Stroop task. The stimuli were presented in pure blocks containing all 12 stimuli for each condition. Each run contained the five conditions, with the conditions presented in a random order for each new run. Each run was repeated 10 times, meaning that each participant completed a total of 600 trials (120 trials per condition), giving us more than the recommended 1600 observations per condition across all subjects (Brysbaert and Stevens, 2018). Each trial began with a fixation cross for 500 ms. The stimuli were then presented for 1000 ms, followed by an interstimulus interval of 1000 ms during which a black screen was shown. After each block of 12 stimuli a break occurred for 10 s. Each testing session lasted approximately 45 min.
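The trial counts and block timing implied by this procedure can be checked with a few lines of arithmetic. This is only a bookkeeping sketch: the constant names are hypothetical, and the number of participants is taken from the Participants section.

```python
N_CONDITIONS = 5
STIMULI_PER_BLOCK = 12
N_RUNS = 10
N_PARTICIPANTS = 20  # from the Participants section
FIXATION_MS, STIMULUS_MS, ISI_MS = 500, 1000, 1000

# Each run contains one block of 12 stimuli per condition.
trials_per_condition = STIMULI_PER_BLOCK * N_RUNS
total_trials = trials_per_condition * N_CONDITIONS
observations_per_condition = trials_per_condition * N_PARTICIPANTS

# One trial = fixation + stimulus + interstimulus interval.
trial_ms = FIXATION_MS + STIMULUS_MS + ISI_MS
block_duration_s = STIMULI_PER_BLOCK * trial_ms / 1000

print(trials_per_condition, total_trials, observations_per_condition)
print(block_duration_s)
```

The resulting 2400 observations per condition exceed the 1600 cited from Brysbaert and Stevens (2018), and the 30 s block falls within the 16 to 40 s range recommended for blocked fMRI designs and matches the block duration used for the regressors in the image analysis.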

#### Image Acquisition

Scanning was performed on a 1.5T Philips Intera magnet with standard RF head coil at the Exeter MR Research Centre, University of Exeter, United Kingdom. A T<sub>2</sub>*-weighted echo planar imaging (EPI) sequence was used (TR = 2300 ms, TE = 45 ms, flip angle = 90°, 30 oblique transverse slices in ascending order and matrix size = 3 × 3 × 3.5 mm). A total of 880 volumes were acquired for each subject. Participants were able to view the stimuli on a screen placed at the foot of the scanner via a mirror mounted on the head coil. Between each block there was a break for 10 s to allow the BOLD signal to return to baseline.
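As a rough consistency check, the functional acquisition time implied by 880 volumes at TR = 2300 ms can be compared with the length of the task. The sketch assumes 30 s blocks (12 trials of 2.5 s each, per the Procedure) and a 10 s break after every one of the 50 blocks, including the last; that final break is an assumption, and the constant names are hypothetical.

```python
N_VOLUMES = 880
TR_MS = 2300
N_BLOCKS = 5 * 10   # five conditions per run, ten runs
BLOCK_S = 30        # 12 trials x 2.5 s, per the Procedure
BREAK_S = 10        # assumed to follow every block, including the last

acquisition_s = N_VOLUMES * TR_MS / 1000
task_s = N_BLOCKS * (BLOCK_S + BREAK_S)

print(acquisition_s, task_s, acquisition_s - task_s)
```

Under these assumptions the acquisition window (2024 s) covers the task (2000 s) with 24 s to spare.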

#### Image Analysis

Data were analyzed using SPM12 software<sup>1</sup>. The fMRI images were pre-processed: realigned, slice timed (ascending sequence, 30 slices, TR = 2300 ms), normalized and smoothed (to 8 mm). Statistical regressors were generated by convolving a canonical hemodynamic response function with a series of discrete event onset times for blocks (30 s duration) corresponding to the presentation of stimuli in the unnameable shapes, neutral word, semantic-associative, non-response set and incongruent conditions. A general linear model approach was used to estimate parameter values for each regressor. Having created a series of t-contrast images for each effect for each subject, the contrast images were entered into a 2nd level ("random effects") analysis consisting of one-sample t-tests with a hypothesized mean of 0 (thresholded at p = 0.001). Following Parris et al. (under review), and to further protect against type 1 error, we employed an extent voxel threshold cut-off of 30. This combination of intensity and extent thresholds produces a per voxel false positive probability of < 0.000001 (Forman et al., 1995). Two sample repeated measures t-tests with a statistical threshold of p < 0.001, uncorrected, and a voxel cluster size threshold of 30 were also performed for each of the planned comparisons. In order to determine the site of activation, MNI (SPM) coordinates were converted to Talairach coordinates using BioimageSuite<sup>2</sup> (Lacadie et al., 2008).

<sup>1</sup>www.fil.ion.ucl.ac.uk/spm
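To illustrate what convolving block onsets with a hemodynamic response function produces, the following sketch builds one block regressor in plain Python. It is not SPM12's implementation: the double-gamma shape parameters (6 and 16, with a 1/6 undershoot ratio) are assumed, commonly used defaults, and the function names, the 0.1 s integration step and the single onset at 10 s are hypothetical choices made for illustration.

```python
import math

TR_S = 2.3      # repetition time reported in the text
BLOCK_S = 30.0  # block duration reported in the text


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF at time t (s); the shape values are assumed defaults."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (undershoot_shape - 1) * math.exp(-t) / math.gamma(undershoot_shape)
    return peak - ratio * under


def block_regressor(onsets_s, n_scans, tr_s=TR_S, block_s=BLOCK_S, dt=0.1):
    """Convolve a boxcar of blocks with the HRF and sample it once per scan."""
    n_fine = int(round(n_scans * tr_s / dt))
    boxcar = [0.0] * n_fine
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = min(n_fine, int(round((onset + block_s) / dt)))
        for i in range(start, stop):
            boxcar[i] = 1.0
    # HRF kernel truncated at 32 s, scaled by dt to approximate the integral.
    kernel = [double_gamma_hrf(k * dt) * dt for k in range(int(round(32.0 / dt)))]
    regressor = []
    for scan in range(n_scans):
        idx = int(round(scan * tr_s / dt))
        value = sum(kernel[k] * boxcar[idx - k] for k in range(min(len(kernel), idx + 1)))
        regressor.append(value)
    return regressor


# Hypothetical example: a single 30 s block starting 10 s into a 40-scan run.
regressor = block_regressor(onsets_s=[10.0], n_scans=40)
print([round(v, 2) for v in regressor])
```

In a full model one such regressor would be built per condition from that condition's block onsets, and the general linear model would estimate one parameter per regressor.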

## RESULTS

#### Analysis of Mean Response Times

The mean RTs of correct responses for each participant in each condition were subjected to a one-way repeated measures ANOVA. All RT outliers (RTs < 300 ms) were excluded from the analysis. In total seven trials were excluded as outliers (2 unnameable shapes, 1 semantic associate, 1 non-response, and 3 incongruent trials). The mean RTs of each experimental condition are summarized in **Table 1**.

<sup>2</sup>www.bioimagesuite.org

"NW" refers to neutral words. "SA" refers to semantic associates. "NRS" refers to non-response set. SD is presented between parentheses.

Mauchly's test indicated that the assumption of sphericity had been violated, χ<sup>2</sup>(9) = 25.13, p = 0.003; therefore the degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.56). The results of the one-way repeated measures ANOVA revealed that the main effect of condition was significant, F(2.23, 42.40) = 4.59, p = 0.013, η<sub>p</sub><sup>2</sup> = 0.195. Therefore, follow-up pairwise comparisons were conducted for each of our planned comparisons. The comparison for task conflict (neutral words vs. unnameable shapes) revealed a non-significant difference between conditions [t(19) = −0.98, p = 0.340]. The comparison for semantic conflict revealed a significant semantic Stroop effect [t(19) = 3.04, p = 0.007]. The comparison for response conflict was also significant [t(19) = 2.38, p = 0.028]. Finally, an overall Stroop effect was observed using an incongruent vs. neutral word comparison [t(19) = 3.14, p = 0.005].
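The corrected degrees of freedom and the effect size reported above follow from two standard formulas: the Greenhouse-Geisser correction multiplies both degrees of freedom by ε, and partial eta squared can be recovered from F and its degrees of freedom. The sketch below applies them to the reported RT ANOVA; the function names are hypothetical, and ε is used as rounded in the text, so small discrepancies from the reported degrees of freedom are expected.

```python
def greenhouse_geisser_df(n_conditions, n_participants, epsilon):
    """Epsilon-corrected (effect, error) df for a one-way repeated measures ANOVA."""
    df_effect = (n_conditions - 1) * epsilon
    df_error = (n_conditions - 1) * (n_participants - 1) * epsilon
    return df_effect, df_error


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from F: (F * df_effect) / (F * df_effect + df_error)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Five conditions, twenty participants, epsilon and F as reported for the RTs.
df_effect, df_error = greenhouse_geisser_df(5, 20, 0.56)
eta_p2 = partial_eta_squared(4.59, df_effect, df_error)
print(round(df_effect, 2), round(df_error, 2), round(eta_p2, 3))
```

With ε rounded to 0.56 the corrected degrees of freedom come out at about 2.24 and 42.56, close to the reported 2.23 and 42.40, and partial eta squared is approximately 0.195; because ε scales both degrees of freedom equally, it cancels out of the effect size.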

#### Analysis of Errors

Errors, including incorrect responses and time-out errors, accounted on average for 12.63% of the trials (unnameable shapes 12.71%; neutral words 11.08%; semantic associates 12.17%; non-response set 12.33%; incongruent 14.71%), which is similar to error rates seen in other fMRI assays (e.g., van Veen and Carter, 2005). An omnibus ANOVA for error rates across the five conditions was conducted. Mauchly's test indicated that the assumption of sphericity had been violated, χ<sup>2</sup>(9) = 36.99, p = 0.001; therefore the degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.49). The results showed that the effect of condition on the rate of response errors was non-significant, F(1.98, 37.54) = 2.39, p = 0.106, η<sub>p</sub><sup>2</sup> = 0.112. Because our ANOVA revealed no significant effect of condition on error rates, follow-up pairwise comparisons between conditions were not carried out.

#### fMRI Data


Analysis of the fMRI data revealed different patterns of brain activity in response to the different types of conflict indexed (see **Table 2** and **Figure 2**). Planned contrasts were carried out to reveal the brain regions that elicited activity in response to each of the types of conflict. The contrast for task conflict did not show any significant sites of activation. Compared to neutral word trials, non-response set trials elicited a significant cluster of activation in the left inferior frontal gyrus (BA44). Semantic conflict also led to a significant cluster of activation in the right thalamus. The comparison between incongruent and semantic associate trials, our index for response conflict, revealed activity in the left parietal (BA40) and prefrontal cortices (BA44/9). Finally, the incongruent – neutral word contrast revealed the brain regions recruited by the overall Stroop interference effect. The largest clusters of activation were found bilaterally in the dorso-lateral PFC (BA44/8/9/10) and the left parietal cortex (BA40), as well as activation within the right mediodorsal nucleus of the thalamus. Importantly, no activation was observed within the ACC in any of the contrasts, even when the alpha and cluster thresholds were lowered to match those of previous studies that do report ACC activation (Milham et al., 2001; van Veen and Carter, 2005), and this is despite the present study involving more participants, with more trials per condition, and using the more powerful block design.

TABLE 2 | Activated areas in response to each of the components of Stroop interference.


"US" refers to unnameable shapes. "NW" refers to neutral words. "SA" refers to semantic associates. "NRS" refers to non-response set. "I" refers to incongruent. The normalized voxel size was 2 × 2 × 2 mm. Only clusters of 30 voxels or greater are presented.

## DISCUSSION

The aim of the present study was to investigate the neural substrates of response, semantic and task conflict using methods informed by recent research (Augustinova and Ferrand, 2014; Hasshim and Parris, 2014, 2015, 2018; Levin and Tzelgov, 2016). Following critical evaluation of previous methods employed in influential neuroimaging investigations (e.g., Milham et al., 2001; van Veen and Carter, 2005) we used trial types thought to better independently measure response and semantic conflict (see Parris et al., under review, for a review) and unlike previous studies computed orthogonal contrasts. Furthermore, we presented the trial types in pure blocks to both maximize response conflict and assess the role of the ACC in Stroop task performance. Finally, our study also included a measure of task conflict. In what follows we summarize our findings by considering their implications for each of the regions associated with Stroop task performance.

#### Anterior Cingulate Cortex

An important finding to note first, since it applies to all comparisons made, is that we observed no ACC activations in any of our contrasts. This is a notable difference in reported findings between the present study and all previous studies of the neural substrates of Stroop task performance. This held even when reducing the threshold to that used in the other studies and despite testing more participants, having more trials per condition and using the more powerful block design. Indeed, we attribute this difference to the use of the block design (Floden et al., 2011). Floden et al. (2011) compared blocked and mixed designs and observed substantially reduced ACC activation in the blocked trials. This led the authors to conclude that ACC activation represents arousal and not conflict monitoring. Whilst our data do not allow us to conclude in favor of an arousal function of the ACC, this finding strongly contrasts with a role of the ACC in conflict monitoring (e.g., Botvinick et al., 2001; Fellows and Farah, 2005; Roelofs et al., 2006; Aarts et al., 2008), and with the notion that separate regions of the ACC detect different forms of conflict (van Veen and Carter, 2005). However, this finding does not necessarily contradict the Cascade-of-Control model (Banich, 2009, 2019). The Cascade-of-Control model predicts a role for posterior and dorsal ACC in late-stage response selection, and more rostral ACC in conflict monitoring. Uniquely, however, it stipulates that the role these ACC regions play depends on how well the earlier selection regions of the PFC perform their role. Conceivably, presenting the trials in pure blocks enables better proactive control by the inferior frontal gyrus, a key region of activation in the present study, mitigating the role of ACC regions. Nevertheless, the Cascade-of-Control model would predict posterior and ACC activation specifically for response conflict, which we isolated and for which we do not observe ACC activation.

As foreshadowed in the introduction, our unnameable shape condition produced RTs equivalent to those in the neutral condition, which means we are unable to clarify the role of the ACC in this form of conflict. In order to index task conflict (the conflict that arises from reading the irrelevant word dimension of a Stroop stimulus) we proposed that unnameable irregular shapes would provide us with the most suitable baseline condition to compare against readable neutral word trials. Unexpectedly, our data showed that the shape trials produced longer RTs and more response errors than neutral word trials, and thus we were unable to demonstrate evidence for the effect of task conflict using this comparison. And whilst it has been convincingly argued that corroborative RT data are not necessarily needed to interpret fMRI data (Wilkinson and Halligan, 2004), it was also the case that the shape vs. neutral trial comparison in the fMRI data produced no significant activation sites in a whole-brain analysis. Neither our data nor the existing literature permits us to interpret the finding. We have subsequently observed similar RT findings in some unpublished data from an oculomotor study, suggesting that unnameable shape trials are hard for participants to color name. More research is needed to understand this effect, but for now the results from the current study do not permit us to conclude anything regarding the neural substrates of task conflict.

#### Prefrontal Cortex

The neural activations reported for the standard incongruent (response set) trials vs. neutral trial comparison largely reflect a combination of the activations for response and semantic conflict. Whilst this comparison revealed more bilateral activations compared to the generally more left-sided activations seen in the response and semantic conflict analyses, the larger activation clusters are in the left hemisphere. The largest clusters of activations for the overall Stroop effect were in the left inferior frontal gyrus (BA44), consistent with a role for this region in setting the attentional set and biasing activation toward the color dimension and away from the word dimension of the Stroop stimulus (Botvinick et al., 2001; Banich, 2009, 2019). Mid and superior dorsal PFC regions were also more greatly activated by incongruent than neutral trials, consistent with a role for these regions in selecting the relevant dimension of the Stroop stimulus (Banich, 2009, 2019). Our data do not, however, permit us to conclude in favor of the dissociated roles of the inferior and mid PFC regions posited by the Cascade-of-Control model.

In terms of neural activations to response conflict we observed activity in the left middle and inferior frontal gyri (BA9/44). The finding of an association between the left IFG and response conflict is consistent with a previous finding (van Veen and Carter, 2005), although this region has more frequently been associated with semantic conflict (Milham et al., 2001; van Veen et al., 2001; Chen et al., 2013). It is, however, inconsistent with the Cascade-of-Control model (Banich, 2009, 2019), which holds that this region is an area of early selection rather than of late response selection, which the model places in the ACC. An association between the left middle frontal gyrus (BA9) and response conflict is more consistent with previous research (Milham et al., 2001; van Veen et al., 2001; Chen et al., 2013), but is somewhat inconsistent with the Cascade-of-Control model since, according to the model, the PFC is responsible for early selection, although it is unclear whether the model removes a role completely for mid PFC regions in response conflict processing. However, in two of those studies (van Veen et al., 2001; Chen et al., 2013), same-response trials were used to dissociate response and semantic conflict. Given the findings of Hasshim and Parris (2014, 2015), the findings from these two studies might be better interpreted as being the equivalent of an incongruent and neutral trial comparison and not, therefore, as isolated response conflict. Having used a better measure of response conflict, the present study presents more reliable findings as to the neural substrates of response conflict.

The non-response set trial vs. neutral trial comparison indexing semantic conflict revealed activations in the left inferior frontal gyrus (IFG; BA44). The finding of activation associated with the left IFG is consistent with all previous studies investigating the neural mechanisms of semantic conflict (Milham et al., 2001; van Veen and Carter, 2005; Chen et al., 2013), although in these previous studies this activation was unique to semantic conflict, with the exception of van Veen and Carter (2005). Again though, as noted, two of the studies (van Veen and Carter, 2005; Chen et al., 2013) used same-response trials. Our data suggest that the IFG (BA44) plays an important role in processing both response and semantic conflict. Whilst we have argued that the former is inconsistent with the Cascade-of-Control model, a role for the IFG in semantic conflict processing is not. Semantic conflict occurs earlier than response conflict, and since the Cascade-of-Control model argues the IFG is involved in early selection, one could consider this result consistent with the model. Notably, however, the model is unclear about the regions that are involved in the processing of semantic, and indeed all non-response, conflict.

## Parietal Lobe

The results of the incongruent vs. neutral comparison also concur with many previous studies highlighting the importance of the parietal regions, in the left hemisphere in particular, in Stroop task performance (e.g., Bench et al., 1993; Khorram-Sefat et al., 1996; Peterson et al., 1999; Zysset et al., 2001; Adleman et al., 2002; Mead et al., 2002; Langenecker et al., 2004; Liu et al., 2004; Coderre et al., 2008; Song and Hakoda, 2015). These regions mainly comprise the frontoparietal network, the control network responsible for our ability to coordinate behavior in a goal-driven manner (Marek and Dosenbach, 2018), a network implicated in many tests of executive function.

One of the largest clusters of activation for the overall Stroop effect was in the left parietal lobe (BA40), which was also important in the processing of response, but not semantic, conflict in our data. Response conflict has been associated with the left parietal region (specifically BA40) in the present study, in Chen et al. (2013) and Milham et al. (2001), and in studies not employing the Stroop task (Wendelken et al., 2009). This association is consistent with the notion that the inferior parietal lobe (BA40) might be involved in allocating attention across posterior processing streams, biasing processing toward the relevant stream (e.g., color) to reduce conflict (Liu et al., 2004). Furthermore, the finding that it is not involved in semantic conflict is consistent with the notion that the parietal lobe plays a role in representing stimulus-response mappings (Casey et al., 2000; Rushworth et al., 2001; Bunge et al., 2002).

Whilst neither the conflict monitoring nor Cascade-of-Control models focus on the role of the parietal lobe in accounting for Stroop task performance, Banich (2009, 2019) notes that the frontoparietal network is implicated in biasing processing in posterior color and word processing regions of the brain.

## Thalamus

Whilst not unprecedented (Peterson et al., 1999), activations of the thalamus are not often reported in fMRI studies of the Stroop task. This might be because studies investigating response and semantic conflict have taken a Region of Interest approach in which analysis is restricted to frontal and parietal regions (van Veen and Carter, 2005; Chen et al., 2013). However, the part of the thalamus activated by semantic conflict in the present study, the medio-dorsal nucleus, receives input from the lateral prefrontal cortex and forms part of the fronto-striatal system of reciprocal cortical-subcortical loops (Alexander et al., 1986). It has been implicated in processing stimulus-response relationships (Parris et al., 2007), with a hypothesized general role in temporally extending the efficiency of the cortical networks involving the prefrontal cortex (Pergola et al., 2018). Moreover, smaller thalamic volume has been associated with slower RTs and poorer performance on the Stroop task (Van Der Elst et al., 2007; see also Hughes et al., 2012). Finally, and as already noted, no ACC activation was observed for semantic conflict, although this particular finding need not necessarily be attributed to the block design employed (Floden et al., 2011), given that a lack of ACC activation to semantic conflict has been reported in two previous studies (Milham et al., 2001; van Veen et al., 2001).

## CONCLUSION

In conclusion, using methods informed by recent research on the varieties of conflict in the Stroop task (see Parris et al., under review, for a review), the present study provides evidence for specialized functions of regions of the frontoparietal network in Stroop task performance. Specifically, together with previous research, our data indicate that the left inferior PFC plays an important role in the processing of both response and semantic conflict, a finding that is broadly consistent with other work (e.g., Milham et al., 2001), whilst regions of the left parietal cortex (BA40) play an accompanying role in response, but not semantic, conflict processing. Moreover, our study reports a role for the thalamus in processing semantic, but not response, conflict. Finally, in none of our comparisons did we observe activity in the ACC, a finding we ascribe to the use of blocked trial type presentation (Floden et al., 2011) and one that is inconsistent with the conflict monitoring model (Botvinick et al., 2001). Whilst our results do not fully support the Cascade-of-Control model (Banich, 2009, 2019), the model does potentially account for most of the findings presented herein.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

The studies involving human participants were reviewed and approved by the Bournemouth University Research Ethics Committee. The patients/participants provided their written informed consent to participate in this study.

## AUTHOR CONTRIBUTIONS

BP was involved in the design and analysis of the study, and wrote the manuscript. MW was involved in the design, preparation, data collection, and analysis of the study, and wrote portions of the manuscript. NH was involved in the design and preparation of the study, and provided comments on earlier drafts of the manuscript. AB was involved in the preparation of the study, data collection, and analysis. MA was involved in the design of the study and provided comments on earlier drafts of the manuscript. LF was involved in the design of the study and provided comments on earlier drafts of the manuscript.

## REFERENCES

association reversals. J. Cogn. Neurosci. 19, 13–24. doi: 10.1162/jocn.2007.19.1.13


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Parris, Wadsley, Hasshim, Benattayallah, Augustinova and Ferrand. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.