Edited by: Srikantan S. Nagarajan, University of California San Francisco, USA
Reviewed by: Tracy L. Luks, University of California San Francisco, USA; Leighton B. Hinkley, University of California San Francisco, USA; Karuna Subramaniam, University of California San Francisco, USA
*Correspondence: Thomas A. Christensen, Department of Speech, Language and Hearing Sciences, University of Arizona, 1131 E. 2nd Street, Tucson, AZ 85721, USA. e-mail:
This is an open-access article subject to an exclusive license agreement between the authors and the Frontiers Research Foundation, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.
A common explanation for the interference effect in the classic visual Stroop test is that reading a word (the more automatic semantic response) must be suppressed in favor of naming the text color (the slower sensory response). Neuroimaging studies also consistently report anterior cingulate/medial frontal, lateral prefrontal, and anterior insular structures as key components of a network for Stroop-conflict processing. It remains unclear, however, whether automatic processing of semantic information can explain the interference effect in other variants of the Stroop test. Nor is it known whether these frontal regions serve a specific role in visual Stroop conflict or instead act as components of a more generalized, supramodal executive-control network for conflict processing. To address these questions, we developed a novel auditory Stroop test in which the relative dominance of semantic and sensory feature processing is reversed. Listeners were asked to focus either on voice gender (a more automatic sensory discrimination task) or on the gender meaning of the word (a less automatic semantic task) while ignoring the conflicting stimulus feature. An auditory Stroop effect was observed when voice features replaced semantic content as the “to-be-ignored” component of the incongruent stimulus. Moreover, in sharp contrast to previous Stroop studies, neural responses to incongruent stimuli measured with functional magnetic resonance imaging revealed greater recruitment of conflict loci when selective attention was focused on gender meaning (semantic task) than on voice gender (sensory task). Furthermore, in contrast to earlier Stroop studies that implicated dorsomedial cortex in visual conflict processing, interference-related activation in both of our auditory tasks was localized ventrally in medial frontal areas, suggesting a dorsal-to-ventral separation of function in medial frontal cortex that is sensitive to stimulus context.
In the classic visual Stroop test, interference arises when behavioral responses are contingent upon selecting the task-relevant dimension (ink color) over the task-irrelevant information (word meaning) embedded in an incongruent sensory stimulus. Our ability to process word meaning is faster and more automatic than our skill at naming colors, and Stroop proposed that our natural tendency to read words must therefore be suppressed in favor of color naming to successfully complete the task (Stroop,
Concerns have been raised that the activation patterns reported in the imaging literature to date may have less to do with conflict processing
To address the issue of whether the sensory domain may influence neuroimaging results, one recent study used an auditory version of the Stroop test known as the “high/low” paradigm (Haupt et al.,
To help clarify the unresolved issues surrounding task- vs. modality-dependent neural activation in Stroop tasks, we developed a novel auditory Stroop-conflict paradigm for use in conjunction with functional magnetic resonance imaging (fMRI). The test consists of two tasks that use gender as a common construct that we varied along two dimensions. We chose gender-based tasks because gender is a highly salient social and emotional construct, and gender cues can be varied in terms of the perceptual and cognitive demands placed on the listener (Most et al.,
The study included 26 healthy volunteers (16 women; mean age 25.9 years, range 19–53), all of whom were right-handed, native English speakers recruited from the Tucson community. All participants gave informed consent and were paid for their time. All procedures used in this study were approved by the University of Arizona Human Subjects Protection IRB.
We used an auditory variant of the Stroop test in which words that referenced male or female gender were spoken by either men or women in order to assess attentional control in the presence of either semantic or sensory interference (Figure
Stimuli were digitally recorded and edited using Sound Forge software (Sony Creative Software Inc., New York, NY, USA). To control for potential effects of habituation to a single male or female voice, five men and five women recorded the words. The word stimuli were tested in a pilot study involving eight listeners to ensure that the gender of each speaker was clear and consistent. To control for potential effects of word length, all categories included one- to three-syllable words, and the average stimulus durations for the three categories were as follows (means ± SEM): male = 667 ± 21 ms; female = 680 ± 19 ms; neutral = 677 ± 18 ms. Recordings were edited to ensure maximal signal-to-noise ratios without peak clipping, and were adjusted to equalize perceived loudness across individual speech stimuli. The edited recordings were then delivered as individual sound files in the scanner environment using E-Prime software (Psychology Software Tools Inc., Pittsburgh, PA, USA), and participants listened to the words through MRI-compatible headphones (Resonance Technology Inc., Northridge, CA, USA). Behavioral responses were collected using two response pads (Lumina System, Cedrus Corp., San Pedro, CA, USA), placed in the participant's right and left hands.
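The level-matching step described above can be sketched in code. The paper does not specify the equalization method, so the simple RMS normalization below is an assumption standing in for whatever perceptual loudness matching the authors actually performed:

```python
import numpy as np

def rms(x):
    """Root-mean-square amplitude of a mono signal."""
    return np.sqrt(np.mean(np.square(x, dtype=np.float64)))

def equalize_rms(signals, target_rms=0.1):
    """Scale each signal so all share the same RMS level.

    A crude stand-in for perceptual loudness matching; a true
    loudness equalization would use a psychoacoustic model
    rather than simple RMS scaling.
    """
    return [x * (target_rms / rms(x)) for x in signals]

# Two synthetic "recordings" at very different levels
quiet = 0.01 * np.sin(np.linspace(0, 100, 22050))
loud = 0.5 * np.sin(np.linspace(0, 100, 22050))
matched = equalize_rms([quiet, loud])
```

After scaling, both signals have identical RMS levels, regardless of their original amplitudes.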
While undergoing an fMRI scan (see below), listeners performed two tasks in which they had to differentially use gender-based cues to classify the word presented in each trial. In the VOICE task, they had to attend to the voice gender while ignoring the gender-associated meaning, while in the SEMANTIC task, they had to attend to gender meaning irrespective of the voice. In the VOICE task, participants were asked to press the button in one hand if the word was spoken by a man, and the button in the other hand if it was spoken by a woman. In the SEMANTIC task, they were asked to press the button in one hand if the word meaning was masculine (or feminine), and the button in the other hand for all other words. The 60 gender-congruent trials served as a measure of facilitation, the 60 gender-incongruent trials were used to measure interference, and the 60 trials with neutral words served as controls. While some previous studies have used the contrast between incongruent and congruent responses as a measure of conflict, it is important to note that this contrast includes cognitive components relating to both interference and facilitation (Roberts and Hall,
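The two behavioral indices defined here reduce to simple differences against the neutral baseline. A minimal sketch using made-up reaction times (the trial-level data are not reproduced in the text):

```python
import numpy as np

# Hypothetical per-trial reaction times (ms) for one listener;
# illustrative values only, not the study's data.
rt = {
    "congruent":   np.array([640.0, 655.0, 630.0]),
    "neutral":     np.array([650.0, 660.0, 645.0]),
    "incongruent": np.array([700.0, 720.0, 690.0]),
}

# Interference: cost of conflict relative to the neutral baseline
# (incongruent slower than neutral -> positive score).
interference = rt["incongruent"].mean() - rt["neutral"].mean()

# Facilitation: benefit of congruency relative to the same baseline
# (congruent faster than neutral -> positive score).
facilitation = rt["neutral"].mean() - rt["congruent"].mean()
```

Separating the two contrasts against the neutral control, rather than contrasting incongruent with congruent trials directly, keeps interference and facilitation from being confounded in a single score, which is the point the text makes.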
Task order in the scanner was counterbalanced across subjects, and button-press responses (accuracy rates and reaction times) were recorded by E-Prime. Immediately following each functional scan, we asked each participant to rate the difficulty of the task using a subjective five-point rating scale, with one being very easy and five being very difficult.
Repeated measures ANOVA was used to evaluate accuracy and reaction times using TASK (focusing attention toward voice or meaning) and CONGRUENCY (congruent, conflict, and neutral trials) as repeated measures across all listeners. However, a number of neuroimaging studies have found significant gender differences in both speech production and comprehension tasks (Buckner et al.,
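The 2 (TASK) × 3 (CONGRUENCY) repeated-measures design can be sketched as follows. The data are simulated, and the use of statsmodels' `AnovaRM` is an illustrative assumption, not the authors' analysis pipeline:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for subj in range(1, 13):  # 12 simulated listeners
    for task in ("VOICE", "SEMANTIC"):
        for cong in ("congruent", "neutral", "incongruent"):
            # Assumed effect sizes: slower semantic task,
            # added cost on incongruent trials.
            base = 650 if task == "VOICE" else 700
            cost = 50 if cong == "incongruent" else 0
            rows.append({"subject": subj, "task": task,
                         "congruency": cong,
                         "rt": base + cost + rng.normal(0, 20)})
df = pd.DataFrame(rows)

# 2 (TASK) x 3 (CONGRUENCY) repeated-measures ANOVA on mean RT
res = AnovaRM(df, depvar="rt", subject="subject",
              within=["task", "congruency"]).fit()
print(res)
```

`AnovaRM` expects balanced data with one observation per subject per cell (or an `aggregate_func` to collapse replicates), which is why the sketch generates exactly one RT per condition per simulated listener.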
MRI data were acquired with a 3.0T GE Signa VH/i scanner (General Electric Medical Systems, Milwaukee, WI, USA) equipped with an eight-channel RF head coil. Each session began with a T1-weighted structural volume in the axial plane that covered the entire brain (fast-spin echo protocol: TR = 300 ms, TE = minimum, flip angle = 30°, voxel size = 3.44 mm × 3.44 mm × 5 mm). Next, functional images were acquired using a spiral-in/out protocol (TR = 3500 ms, TE = 30 ms, flip angle = 90°, voxel size = 3.75 mm × 3.75 mm × 5 mm) that reduces susceptibility artifacts and spatial distortion (Glover and Law,
Image analysis was performed with AFNI (
To extract parameter estimates, the preprocessed fMRI data from each participant were modeled with multiple regressors of interest, each constructed by convolving the stimulus timing with a gamma-spline hemodynamic response function, using
Brain masks for each participant were corrected to account for possible susceptibility artifacts that can occur near air sinuses, resulting in distortion of the BOLD signal in regions of interest in this study, particularly areas in the ventromedial frontal lobe (Ojemann et al.,
As shown in Figure
Correct responses also revealed significantly longer reaction times for SEMANTIC trials than for VOICE trials (Figure
In contrast to the strong interference effect produced by our auditory Stroop protocol, we did not observe a significant facilitation effect (defined as the difference between congruent and neutral responses; Figure
We first investigated regions related to the main effects (all stimulus trials combined) for each of the two auditory tasks separately, using a group ANOVA (
| Region (Brodmann areas) | VOICE: x | y | z | Volume | t | SEMANTIC: x | y | z | Volume | t |
|---|---|---|---|---|---|---|---|---|---|---|
| Lateral frontal | | | | | | | | | | |
| Dorsolateral prefrontal (9/46/10) | 44 | 23 | 30 | 3797 | 3.99 | – | – | – | – | – |
| Middle frontal gyrus (6) | −38 | −1 | 53 | 1406 | 3.53 | – | – | – | – | – |
| Medial frontal | | | | | | | | | | |
| Medial frontal gyrus (8/32) | – | – | – | – | – | 3 | 29 | 45 | 1477 | 3.79 |
| Parietal | | | | | | | | | | |
| Superior parietal lobule (7) | −34 | −62 | 51 | 1547 | 3.44 | – | – | – | – | – |
| Inferior frontal | | | | | | | | | | |
| Pars triangularis (45) | – | – | – | – | – | 45 | 21 | 12(b) | 2531 | 3.95 |
| Right anterior insula (13) | 28 | 18 | 5 | 1969 | 3.91 | – | – | – | – | – |
| Medial frontal | | | | | | | | | | |
| Anterior cingulate cortex (32) | 3 | 36 | 0 | 2391 | 3.85 | – | – | – | – | – |
| Medial frontal gyrus (10/32) | – | – | – | – | – | −2 | 55 | −6 | 2953 | 4.09 |
| Temporal | | | | | | | | | | |
| R. Primary auditory ctx (41/42) | – | – | – | – | – | 61 | −16 | 10 | 1758 | 3.97 |
For both tasks, the most striking result was the enhanced activation in several regions of the medial frontal cortex situated below the genu of the corpus callosum. For the VOICE task, this activity was located in the ventral anterior cingulate cortex (vACC; BA32). vACC activity was right-lateralized, but present in both hemispheres (Figure
For the SEMANTIC task, the largest cluster was localized to the ventral portion of the medial frontal gyrus in BA10/32 (Figure
Following the initial screening for main effects, contrast analysis was used to separate the activation patterns associated with interference and facilitation for each task. Interference processing (incongruent > neutral) in the VOICE task was associated with increased activation in a subset of the regions identified from the main effects analysis in Figure
| Condition/region | VOICE: x | y | z | Volume | t | SEMANTIC: x | y | z | Volume | t |
|---|---|---|---|---|---|---|---|---|---|---|
| Ventral sites | | | | | | | | | | |
| Inferior frontal | | | | | | | | | | |
| L. pars triangularis (45) | – | – | – | – | – | −47 | 21 | 6 | 1266 | 3.75 |
| R. pars triangularis (45) | – | – | – | – | – | 39 | 31 | 6(b) | 2461 | 3.98 |
| Medial frontal | | | | | | | | | | |
| Anterior cingulate (24/32) | 8 | 31 | 1 | 1898 | 3.73 | – | – | – | – | – |
| Medial frontal gyrus (10/32) | – | – | – | – | – | −5 | 50 | −3 | 2039 | 3.95 |
| Subcortical | | | | | | | | | | |
| Caudate head | – | – | – | – | – | 11 | 13 | 5 | 1336 | 3.57 |
| Putamen | 21 | 13 | 0 | 1828 | 3.88 | – | – | – | – | – |
| Dorsal sites | | | | | | | | | | |
| Anterior cingulate (32) | 16 | 22 | 28 | 1828 | 3.90 | – | – | – | – | – |
| Medial frontal gyrus (8) | – | – | – | – | – | 4 | 45 | 38 | 2180 | 3.59 |
In sharp contrast to the interference effect in the VOICE task, a facilitation effect (congruent > neutral words) was not associated with increased activity in vACC. Instead, activation associated with congruency was observed in a distinctly dorsal portion of the medial frontal wall, specifically right dACC (BA32; Figures
Individual-subject analysis examined these relationships in greater detail: 92% (24 of 26) of listeners showed interference-related activity in vACC, whereas 80% showed increased activity in dACC associated with congruency. We then used linear regression to compare these two patterns of medial frontal activation with the behavioral measures for the Stroop effect examined in Figure
| Medial frontal region | Condition | Accuracy: r | p | Reaction time: r | p |
|---|---|---|---|---|---|
| dACC | Congruent | 0.275 | 0.071 | | |
| | Incongruent | 0.222 | 0.063 | 0.252 | 0.158 |
| vACC | Congruent | 0.044 | 0.217 | 0.179 | 0.140 |
| | Incongruent | | | | |
| dmFG | Congruent | | | | |
| | Incongruent | 0.189 | 0.244 | 0.110 | 0.290 |
| vmFG | Congruent | 0.081 | 0.110 | 0.077 | 0.725 |
| | Incongruent | | | | |
The interference contrast in the SEMANTIC task also revealed activation in several anterior regions, including clusters along the medial frontal wall, in anterior insula, and in subcortical regions, but the specific activation loci were distinctly different from those revealed in the VOICE task. Interference processing in the SEMANTIC task recruited a large cluster of activation more anteriorly in the ventral portion of medial frontal gyrus (BA10/32; Figure
Individual subject analysis revealed that none of the 26 listeners showed significant interference-related activity along the dorsomedial frontal wall at a corrected threshold of
Since the classic Stroop effect was first described in the early twentieth century (Stroop,
Our data strongly favor the
As shown in Figure
Using this new paradigm, we found strong behavioral evidence for Stroop interference when listeners focused on the more attentionally demanding stimulus dimension, gender-referenced meaning, in the SEMANTIC task (Figure
In contrast, processing of congruent stimuli in both tasks led to increased activation only in dorsal portions of the right medial frontal cortex (Figures
In addition to medial frontal cortex, previous imaging studies of response inhibition have also consistently revealed activity in inferior frontal and insular cortices (Banich et al.,
Prefrontal and medial frontal regions are often activated concurrently, indicating a close functional link between these two attention areas (Carter et al.,
Interference-related activity in anterior insula was localized to the left hemisphere in both of our Stroop tasks (Table
Using a novel auditory Stroop paradigm, we demonstrate a significant interference effect with gender-typical nouns spoken by gender-mismatched voices. First, several independent measures of conflict management support the conclusion that suppression of semantic processing cannot explain all instances of the Stroop effect in the auditory domain; rather, the behavioral results provide fresh evidence that Stroop interference reflects a failure to suppress whichever stimulus attribute is processed more automatically (in this case, voice gender). Second, our fMRI results differ in several major respects from previous neuroimaging studies. Specifically, we found no evidence for a selective role of dorsomedial frontal structures in auditory Stroop interference processing. Instead, our results support a dorsal-to-ventral dissociation of function along the medial frontal wall, linking ventral regions to interference processing in emotionally salient, cognitively challenging tasks (in this case, discriminating nouns by their gender association) and dorsal regions to the more global task of conflict monitoring.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We wish to thank Juliana Bass, Rita Kaplon, Jessica Motzkin, and Scott Squire for technical assistance. This research was supported by a grant from the NIH/National Institute on Deafness and Other Communication Disorders (K01 DC008812) to Thomas A. Christensen.