The Stroop Color and Word Test

The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used to assess the ability to inhibit cognitive interference, which occurs when the processing of a specific stimulus feature impedes the simultaneous processing of a second stimulus attribute, a phenomenon known as the Stroop effect. The aim of the present work is to verify the theoretical adequacy of the various scoring methods used to measure the Stroop effect. We present a systematic review of studies that have provided normative data for the SCWT. We searched both electronic databases (i.e., PubMed, Scopus, Google Scholar) and citations. Our findings show that, while several scoring methods have been reported in the literature, none of the reviewed methods enables us to fully assess the Stroop effect. Furthermore, we discuss several normative scoring methods from the Italian literature. We argue for an alternative scoring method that takes into consideration both the speed and the accuracy of the response. Finally, we underline the importance of assessing performance in all Stroop Test conditions (word reading, color naming, named color-word).


INTRODUCTION
The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used for both experimental and clinical purposes. It assesses the ability to inhibit cognitive interference, which occurs when the processing of a stimulus feature affects the simultaneous processing of another attribute of the same stimulus (Stroop, 1935). In the most common version of the SCWT, which was originally proposed by Stroop in 1935, subjects are required to read three different tables as fast as possible. Two of them represent the "congruous condition," in which participants are required to read names of colors (henceforth referred to as color-words) printed in black ink (W) and to name different color patches (C). Conversely, in the third table, named the color-word (CW) condition, color-words are printed in an inconsistent ink color (for instance, the word "red" is printed in green ink). Thus, in this incongruent condition, participants are required to name the color of the ink instead of reading the word. In other words, the participants are required to perform a less automated task (i.e., naming ink color) while inhibiting the interference arising from a more automated task (i.e., reading the word; MacLeod and Dunbar, 1988; Ivnik et al., 1996). This difficulty in inhibiting the more automated process is called the Stroop effect (Stroop, 1935). While the SCWT is widely used to measure the ability to inhibit cognitive interference, previous literature also reports its application to measure other cognitive functions such as attention, processing speed, cognitive flexibility (Jensen and Rohwer, 1966), and working memory (Kane and Engle, 2003). Thus, it may be possible to use the SCWT to measure multiple cognitive functions.
In the present article, we present a systematic review of the SCWT literature in order to assess the theoretical adequacy of the different scoring methods proposed to measure the Stroop effect (Stroop, 1935). We focus on the Italian literature, which reports the use of several versions of the SCWT that vary in terms of stimuli, administration protocol, and scoring methods. Finally, we attempt to indicate a scoring method that allows the ability to inhibit cognitive interference to be measured with reference to the subjects' performance in the SCWT.

METHODS
We looked for normative studies of the SCWT. All studies included a healthy adult population. Since our aim was to understand the various available scoring methods, no studies were excluded on the basis of age, gender, and/or education of participants, or the specific version of the SCWT used (e.g., short or long, computerized or paper). Studies were identified using electronic databases and citations from a selection of relevant articles. The electronic databases searched included PubMed (all years), Scopus (all years), and Google Scholar (all years). The last search was run on 22 February 2017, using the following search terms: "Stroop; test; normative." All studies written in English or Italian were included.
Two independent reviewers screened the papers according to their titles and abstracts; no disagreements about the suitability of the studies were recorded. Thereafter, a summary chart was prepared to highlight the mandatory information that had to be extracted from each report (see Table 1).
One author extracted data from the papers while the second author provided further supervision. No disagreements about the extracted data emerged. We did not seek additional information from the original reports, except for Caffarra et al. (2002), whose full text was not available: the relevant information was extracted from Barletta-Rodolfi et al. (2011).
We extracted the following information from each article:
• Year of publication.
• Indexes whose normative data were provided.
Finally, as regards the variables of interest, we focused on the scores used in the reviewed studies to assess performance on the SCWT.

RESULTS
We identified 44 articles from our electronic search and screening process. Twelve of them were judged inadequate for our purpose and excluded. Four papers were excluded as they were written in languages other than English or Italian (Bast-Pettersen, 2006; Duncan, 2006; Lopez et al., 2013; Rognoni et al., 2013); two were excluded as they included children (Oliveira et al., 2016) or a clinical population (Venneri et al., 1992). Lastly, we excluded six Stroop Test manuals, since they were not fully obtainable (Trenerry et al., 1989; Artiola and Fortuny, 1999; Delis et al., 2001; Golden and Freshwater, 2002; Mitrushina et al., 2005; Strauss et al., 2006a). At the end of the selection process, we had 32 articles suitable for review (Figure 1). From the systematic review, we extracted five studies with Italian normative data. Details are reported in Table 1. Of the remaining 27 studies that provide normative data for non-Italian populations, 16 studies (Ivnik et al., 1996; Ingraham et al., 1988; Rosselli et al., 2002; Moering et al., 2004; Lucas et al., 2005; Steinberg et al., 2005; Seo et al., 2008; Peña-Casanova et al., 2009; Al-Ghatani et al., 2011; Norman et al., 2011; Andrews et al., 2012; Llinàs-Reglà et al., 2013; Morrow, 2013; Lubrini et al., 2014; Rivera et al., 2015; Waldrop-Valverde et al., 2015) adopted the scoring method proposed by Golden (1978). In this method, the number of items correctly named in 45 s in each condition (i.e., W, C, CW) is calculated. Then the predicted CW score (Pcw) is calculated using the following formula:
Pcw = 1 / ((1/W) + (1/C)) (1)
equivalent to:
Pcw = (W × C) / (W + C) (2)
Then, the Pcw value is subtracted from the actual number of items correctly named in the incongruous condition (CW) (i.e., IG = CW − Pcw). This procedure yields an interference score (IG) based on the performance in both the W and C conditions. Thus, a negative IG value represents a pathological ability to inhibit interference, where a lower score means greater difficulty in inhibiting interference.
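The Golden-type computation described above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the commonly reported form of the predicted score, Pcw = (W × C)/(W + C), and the function names and sample counts are hypothetical, not taken from any of the reviewed studies.

```python
def predicted_cw(w: int, c: int) -> float:
    """Predicted CW score from the W and C counts (assumed form: W*C / (W+C))."""
    if w <= 0 or c <= 0:
        raise ValueError("W and C must be positive item counts")
    return (w * c) / (w + c)


def golden_interference(w: int, c: int, cw: int) -> float:
    """Interference score IG = CW - Pcw; a negative value means CW fell below the prediction."""
    return cw - predicted_cw(w, c)


# Hypothetical numbers of items correctly named in 45 s in each condition.
w, c, cw = 100, 60, 35
pcw = predicted_cw(w, c)
ig = golden_interference(w, c, cw)
print(f"Pcw = {pcw:.2f}, IG = {ig:.2f}")  # Pcw = 37.50, IG = -2.50
```

With these invented counts, the participant names fewer CW items than predicted from W and C, so IG is negative.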
Six articles (Troyer et al., 2006; Bayard et al., 2011; Campanholo et al., 2014; Bezdicek et al., 2015; Hankee et al., 2016; Tremblay et al., 2016) adopted the Victoria Stroop Test. In this version, three conditions are assessed: the C and the CW conditions correspond to the equivalent conditions of the original version of the test (Stroop, 1935), while the W condition includes common words that do not refer to colors. This condition represents an intermediate inhibition condition, as the interference effect between the written word and the color name is not present. In this SCWT form (Strauss et al., 2006b), for each condition, the completion time and the number of errors (corrected, non-corrected, and total errors) are recorded, and two interference scores are computed. Five studies (Strickland et al., 1997; Van der Elst et al., 2006; Zalonis et al., 2009; Kang et al., 2013; Zimmermann et al., 2015) adopted different SCWT versions. Three of them (Strickland et al., 1997; Van der Elst et al., 2006; Kang et al., 2013) computed, independently, the completion time and the number of errors for each condition. Additionally, Van der Elst et al. (2006) computed an interference score based on speed performance only:
I = CWT − ((WT + CT)/2) (5)
where WT, CT, and CWT represent the time to complete the W, C, and CW tables, respectively. Zalonis et al. (2009) recorded: (i) the time; (ii) the number of errors; and (iii) the number of self-corrections in the CW condition. Moreover, they computed an interference score by subtracting the number of errors in the CW condition from the number of items properly named in 120 s in the same table. Lastly, Zimmermann et al. (2015) computed the number of errors and the number of correct answers given in 45 s in each condition. Additionally, they calculated an interference score derived from the original scoring method provided by Stroop (1935).

Indexes by reference (table excerpt):
• Ingraham et al., 1988; Ivnik et al., 1996; Rosselli et al., 2002; Moering et al., 2004; Lucas et al., 2005; Steinberg et al., 2005; Seo et al., 2008; Peña-Casanova et al., 2009; Al-Ghatani et al., 2011; Norman et al., 2011; Andrews et al., 2012; Llinàs-Reglà et al., 2013; Morrow, 2013; Lubrini et al., 2014; Rivera et al., 2015; Waldrop-Valverde et al., 2015: interference score IG, where CW, W, and C are the numbers of items properly named in 45 s in the CW, W, and C conditions, respectively.
• Troyer et al., 2006; Bayard et al., 2011; Campanholo et al., 2014; Bezdicek et al., 2015; Hankee et al., 2016; Tremblay et al., 2016: completion time for each condition.

Of the five studies (Barbarotto et al., 1998; Caffarra et al., 2002; Amato et al., 2006; Valgimigli et al., 2010; Brugnolo et al., 2015) that provide normative data for the Italian population, two were originally written in Italian (Caffarra et al., 2002; Valgimigli et al., 2010), while the others were written in English (Barbarotto et al., 1998; Amato et al., 2006; Brugnolo et al., 2015). An English translation of the title and abstract of Caffarra et al. (2002) is available. Three of the studies consider the performance only on the SCWT (Caffarra et al., 2002; Valgimigli et al., 2010; Brugnolo et al., 2015), while the others also include other neuropsychological tests in the experimental assessment (Barbarotto et al., 1998; Amato et al., 2006). The studies are heterogeneous in that they differ in terms of administered conditions, scoring procedures, number of items, and colors used. Three studies adopted a 100-item version of the SCWT (Barbarotto et al., 1998; Amato et al., 2006; Brugnolo et al., 2015), which is similar to the original version proposed by Stroop (1935). In this version, in every condition (i.e., W, C, CW), items are arranged in a 10 × 10 matrix; the colors are red, green, blue, brown, and purple. However, while two of these studies administered the W, C, and CW conditions once (Amato et al., 2006; Brugnolo et al., 2015), Barbarotto et al. (1998) administered the CW table twice, requiring participants to read the word during the first administration and then to name the ink color during the second administration. They also administered a computerized version of the SCWT in which 40 stimuli are presented in each condition; red, blue, green, and yellow are used. Valgimigli et al. (2010) and Caffarra et al. (2002) administered shorter paper versions of the SCWT including only three colors (i.e., red, blue, green). More specifically, the former administered only the C and CW conditions, including 60 items each, arranged in six columns of 10 items. The latter employed a version of 30 items for each condition (i.e., W, C, CW), arranged in three columns of 10 items each.
Only two of the five studies assessed and provided normative data for all the conditions of the SCWT (i.e., W, C, CW; Caffarra et al., 2002; Brugnolo et al., 2015), while the others provided only partial results. Valgimigli et al. (2010) provided normative data only for the C and CW conditions, while Amato et al. (2006) and Barbarotto et al. (1998) administered all the SCWT conditions (i.e., W, C, CW) but provided normative data only for the CW condition and for the C and CW conditions, respectively.
These studies use different methods to compute subjects' performance. Some studies record the time needed, independently in each condition, to read all (Amato et al., 2006) or a fixed number (Valgimigli et al., 2010) of the presented stimuli. Others consider the number of correct answers produced in a fixed time (30 s; Amato et al., 2006; Brugnolo et al., 2015). Caffarra et al. (2002) and Valgimigli et al. (2010) provide a more complex interference index that relates the subject's performance in the incongruous condition to the performance in the others. In Caffarra et al. (2002), two interference indexes, based on reading speed and accuracy respectively, are computed. Furthermore, in Valgimigli et al. (2010), an interference score is computed using the formula:
I = ((DC − DI)/(DC + DI)) × 100 (7)
where DC represents the correct answers produced in 20 s in naming colors and DI corresponds to the correct answers achieved in 20 s in the interference condition. However, they do not take into account the performance on the word reading condition.
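As an illustration of formula (7), the following minimal Python sketch computes the index from the two counts it uses. The function name and the sample values are hypothetical; only the formula itself comes from the text.

```python
def valgimigli_index(dc: int, di: int) -> float:
    """I = ((DC - DI) / (DC + DI)) * 100, with DC and DI the correct answers in 20 s."""
    total = dc + di
    if total == 0:
        raise ValueError("DC + DI must be greater than zero")
    return 100 * (dc - di) / total


# Hypothetical counts: 30 correct answers in color naming, 20 in the interference condition.
print(valgimigli_index(30, 20))  # 20.0
```

Because the function takes only DC and DI, two participants with very different word reading performance would receive the same index, which is the limitation noted above.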

DISCUSSION
According to the present review, multiple SCWT scoring methods are available in the literature, with Golden's (1978) version being the most widely used. In the Italian literature, the heterogeneity in SCWT scoring methods increases dramatically. The parameters of speed and accuracy of the performance, essential for proper detection of the Stroop effect, are scored differently across studies, which highlights methodological inconsistencies. Some of the reviewed studies score solely the speed of the performance (Amato et al., 2006; Valgimigli et al., 2010). Others measure both the accuracy and speed of performance (Barbarotto et al., 1998; Brugnolo et al., 2015); however, they provide no comparisons between subjects' performance on the different SCWT conditions. On the other hand, Caffarra et al. (2002) compared performance in the W, C, and CW conditions; however, they computed speed and accuracy independently. Only Valgimigli et al. (2010) present a scoring method in which an index merging speed and accuracy is computed; however, the Authors assessed solely the performance in the C and the CW conditions, neglecting the subject's performance in the W condition.
In our opinion, and as clinical practice suggests, the reported scoring methods do not allow an exhaustive description of performance on the SCWT. For instance, if only the reading time is scored, while accuracy is not computed (Amato et al., 2006) or is computed independently (Caffarra et al., 2002), the consequences of possible inhibition difficulties on processing speed cannot be assessed. Indeed, patients may show a non-pathological reading speed in the incongruous condition despite extremely poor performance, for example when they do not apply the rule "naming ink color" and simply read the word (e.g., in the CW condition, when the stimulus is the word "red" printed in green ink, the patient says "Red" instead of "Green"). Such behaviors indicate a failure to maintain consistent activation of the intended response in the incongruent Stroop condition, even if the participants properly understand the task. Such scenarios are often reported in different clinical populations. For example, in the incongruous condition, patients with frontal lesions (Vendrell et al., 1995; Stuss et al., 2001; Swick and Jovanovic, 2002) as well as patients affected by Parkinson's Disease (Fera et al., 2007; Djamshidian et al., 2011) showed significant impairments in terms of accuracy, but not in terms of processing speed. Counting the number of correct answers in a fixed time (Amato et al., 2006; Valgimigli et al., 2010; Brugnolo et al., 2015) may be a plausible solution.
Moreover, it must be noted that the error rate (and not the speed) is an index of inhibitory control (McDowd et al., 1995) or an index of the ability to maintain the task's goal temporarily in a highly retrievable state (Kane and Engle, 2003). Nevertheless, computing exclusively the error rate (i.e., the accuracy of the performance), without measuring the speed of performance, would be insufficient for an extensive evaluation of performance in the SCWT. In fact, behavior in the incongruous condition (i.e., CW) may be affected by difficulties that are not directly related to an impaired ability to suppress the interference process, which may lead to misinterpretation of the patient's performance. People affected by color-blindness or dyslexia represent the extreme case. More ordinarily, slowness due to clinical circumstances such as dysarthria, mood disorders such as depression, or medication side effects may seriously affect performance in the SCWT. In Parkinson's Disease, ideomotor slowness (Gardner et al., 1959; Jankovic et al., 1990) impacts processing speed in all SCWT conditions, causing a global difficulty in response execution rather than a specific impairment in the CW condition (Stacy and Jankovic, 1992; Hsieh et al., 2008). Consequently, when inhibition capability has to be assessed, it seems necessary to relate the performance in the incongruous condition to word reading and color naming abilities, as proposed by Caffarra et al. (2002). In this method, the W score and the C score are subtracted from the CW score. However, as previously mentioned, the scoring method suggested by Caffarra et al. (2002) computes errors and speed separately. Thus, so far, none of the proposed Italian normative scoring methods seems adequate to assess patients' performance in the SCWT properly and informatively.
Examples of more suitable interference scores can be found in the non-Italian literature. Stroop (1935) proposed that the ability to inhibit cognitive interference can be measured in the SCWT using the formula: total time + ((2 × mean time per word) × number of uncorrected errors), where total time is the overall time for reading; mean time per word is the overall time for reading divided by the number of items; and the number of uncorrected errors is the number of errors not spontaneously corrected. Gardner et al. (1959) also proposed a similar formula: total time + ((total time/100) × number of errors), where 100 refers to the number of stimuli used in this version of the SCWT. When speed and errors are computed together, the correct recognition of patients who show difficulties in inhibiting interference despite a non-pathological reading time increases. However, both the mentioned scores (Stroop, 1935; Mitrushina et al., 2005) may be susceptible to criticism (Jensen and Rohwer, 1966). In fact, even though accuracy and speed are merged into a global score in these studies (Stroop, 1935; Mitrushina et al., 2005), they are not computed independently. In Gardner et al. (1959), the number of errors is computed in relation to the mean time per item and then added to the total time, which may be redundant and lead to a miscomputation.
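The two time-plus-error-penalty formulas quoted above can be compared in a short Python sketch. The function names, the `n_items` parameter (defaulting to the 100 stimuli mentioned in the text), and the two participant profiles are hypothetical and serve only to show how the penalty changes the picture given by raw time alone.

```python
def stroop_1935_score(total_time: float, n_items: int, uncorrected_errors: int) -> float:
    """total time + ((2 x mean time per word) x number of uncorrected errors)."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    mean_time_per_word = total_time / n_items
    return total_time + (2 * mean_time_per_word) * uncorrected_errors


def gardner_1959_score(total_time: float, errors: int, n_items: int = 100) -> float:
    """total time + ((total time / n_items) x number of errors); n_items = 100 in the text."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return total_time + (total_time / n_items) * errors


# Hypothetical CW-condition results on a 100-item table (times in seconds).
profiles = {
    "fast but inaccurate": {"total_time": 90.0, "errors": 30},
    "slow but accurate": {"total_time": 120.0, "errors": 0},
}

for label, r in profiles.items():
    s = stroop_1935_score(r["total_time"], 100, r["errors"])
    g = gardner_1959_score(r["total_time"], r["errors"])
    print(f"{label}: raw time {r['total_time']:.0f} s, Stroop-type {s:.1f}, Gardner-type {g:.1f}")
```

With these invented values, raw time alone ranks the inaccurate participant as the better performer (90 s versus 120 s). The Stroop-type score reverses that ordering (144 versus 120), whereas the lighter Gardner-type penalty only narrows it (117 versus 120).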
The most widely adopted scoring method in the international literature is that of Golden (1978). Lansbergen et al. (2007) point out that the index IG might not be adequately corrected for inter-individual differences in reading ability, despite its effective adjustment for color naming. The Authors highlight that the reading process is more automated in expert readers and that, consequently, they may be more susceptible to interference (Lansbergen et al., 2007), which would require the score to be weighted according to individual reading ability. However, experimental data suggest that increased reading practice does not affect the susceptibility to interference in the SCWT (Jensen and Rohwer, 1966). The article by Chafetz and Matthews (2004) might be useful for a deeper understanding of the relationship between reading words and naming colors, but the debate about the role of reading ability in the inhibition process is still open. The issue of the role of reading ability in SCWT performance cannot be adequately addressed even if the Victoria Stroop Test scoring method (Strauss et al., 2006b) is adopted, given the absence of the standard W condition.
In the light of the previous considerations, we recommend that a scoring method for the SCWT should fulfill two main requirements. First, both accuracy and speed must be computed for all SCWT conditions. Second, a global index must be calculated to relate the performance in the incongruous condition to word reading and color naming abilities. The first requirement can be achieved by counting the number of correct answers in each condition within a fixed time (Amato et al., 2006; Valgimigli et al., 2010; Brugnolo et al., 2015). The second requirement can be achieved by subtracting the W score and the C score from the CW score, as suggested by Caffarra et al. (2002). None of the studies reviewed satisfies both these requirements.
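The two requirements can be sketched in Python as follows. This is a hedged illustration only: the text does not specify the exact form of the global index, so the sketch assumes that the baseline subtracted from the CW score is the mean of the W and C scores. The function names and the response lists are likewise hypothetical.

```python
from typing import Sequence


def correct_in_time(responses: Sequence[bool]) -> int:
    """Count the correct answers among the responses given within the fixed time window."""
    return sum(1 for ok in responses if ok)


def global_index(w: int, c: int, cw: int) -> float:
    """CW score minus a W/C baseline (ASSUMPTION: the baseline is the mean of W and C)."""
    return cw - (w + c) / 2


# Hypothetical responses recorded within the same fixed time window (True = correct).
w_resp = [True] * 50
c_resp = [True] * 38 + [False] * 2
cw_resp = [True] * 20 + [False] * 5

w, c, cw = (correct_in_time(r) for r in (w_resp, c_resp, cw_resp))
print(f"W={w}, C={c}, CW={cw}, index={global_index(w, c, cw):.1f}")
```

Counting correct answers within a fixed time folds speed and accuracy into one score per condition, and the index then relates the CW score to the two baseline conditions. Normative data would be needed before any value of such an index could be interpreted.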
According to the review, the studies with Italian normative data present different theoretical interpretations of the SCWT scores. Amato et al. (2006) and Caffarra et al. (2002) describe the SCWT score as a measure of fronto-executive functioning, while others use it as an index of attentional functioning (Barbarotto et al., 1998; Valgimigli et al., 2010) or of general cognitive efficiency (Brugnolo et al., 2015). Slowing in response to a conflict could be attributed to a failure of selective attention or a lack of cognitive efficiency rather than a failure of response inhibition (Chafetz and Matthews, 2004); however, performance in the SCWT is not exclusively related to concentration, attention, or cognitive effectiveness, but relies on a more specific executive-frontal domain. Indeed, in order to solve the task correctly, subjects have to selectively process a specific visual feature while continuously blocking out the automatic processing of reading (Zajano and Gorman, 1986; Shum et al., 1990). The specific involvement of executive processes is supported by clinical data. Patients with anterior frontal lesions, but not those with posterior cerebral damage, report significant difficulties in maintaining a consistent activation of the intended response (Valgimigli et al., 2010). Furthermore, Parkinson's Disease patients, characterized by executive dysfunction due to the disruption of the dopaminergic pathway (Fera et al., 2007), reported difficulties in the SCWT despite unimpaired attentional abilities (Fera et al., 2007; Djamshidian et al., 2011).

CONCLUSION
According to the present review, the heterogeneity in SCWT scoring methods in the international literature, and most dramatically in the Italian literature, seems to call for an innovative, alternative, and shared scoring system to achieve a more appropriate interpretation of performance in the SCWT. We propose adopting a scoring method in which (i) the number of correct answers in a fixed time in each SCWT condition (W, C, CW) and (ii) a global index relating the CW performance minus reading and/or color naming abilities are computed. Further studies are required to collect normative data for this scoring method and to study its applicability in clinical settings.

AUTHOR CONTRIBUTIONS
Conception of the work: FS. Acquisition of data: ST. Analysis and interpretation of data for the work: FS and ST. Writing: ST. Revising the work: FS. Final approval of the version to be published and agreement to be accountable for all aspects of the work: FS and ST.

ACKNOWLEDGMENTS
The Authors thank Prerana Sabnis for her careful proofreading of the manuscript.