Understanding Performance Decrements in a Letter-Canceling Task: Overcoming Habits or Inhibition of Reading

Myers, Larry; Downie, Steven; Taylor, Grant; Marrington, Jessica; Tehan, Gerald; Ireland, Michael J.

doi:10.3389/fpsyg.2018.00711

ORIGINAL RESEARCH article

Front. Psychol., 15 May 2018

Sec. Personality and Social Psychology

Volume 9 - 2018 | https://doi.org/10.3389/fpsyg.2018.00711

School of Psychology and Counselling, Institute for Resilient Regions, University of Southern Queensland, Ipswich, QLD, Australia

The importance of self-regulation in human behavior is readily apparent, and diverse theoretical accounts for explaining self-regulation failures have been proposed. Typically, these accounts are based on a sequential task methodology where an initial task is presented to deplete self-regulatory resources, and carryover effects are then examined on a second outcome task. In the aftermath of high-profile replication failures using a popular letter-crossing task as a means of depleting self-regulatory resources, and subsequent criticisms of that task, research into self-control is currently at an impasse. This is largely due to the lack of empirical research that tests explicit assumptions regarding the initial task. One such untested assumption is that for resource depletion to occur, the initial task must first establish an habitual response and then this habitual response must be inhibited, with behavioral inhibition being the causal factor in inducing depletion. This study reports on four experiments exploring performance on a letter-canceling task, where the rules for target identification remained constant but the method of responding differed (Experiment 1) and the coherence of the text was manipulated (Experiments 1–4). Experiment 1 established that habit forming and behavioral inhibition did not produce any performance decrement when the targets were embedded in random letter strings. Experiments 2–4 established that target detection was sensitive to language characteristics and the coherence of the background text, suggesting that participants’ automatic reading processes are a key driver of performance in the letter-e task.

Introduction

Self-regulation is typically taken to refer to people’s ability to overcome their immediate impulses in order to achieve different outcomes, and it is argued to be the cornerstone of all functional behavior (Duckworth and Seligman, 2005; Baumeister and Vohs, 2007; Moffitt et al., 2011). The capacity for self-control is thought to emerge from one’s temperament and social surroundings (Wills and Dishion, 2004), with proficiency (or deficiency) in one’s capacity for self-regulation having an impact at individual, interpersonal, and social levels (Mead et al., 2010). Those who are adept at self-regulating tend to exercise more, be academically successful, and have more satisfying interpersonal relationships (Kelly and Conley, 1987; Duckworth and Seligman, 2005; Moffitt et al., 2011). By contrast, those who show a relative inability to self-regulate make poorer monetary, health, and dietary decisions and show an increased propensity for risky and criminal behaviors (Pratt and Cullen, 2000; Baumeister, 2002; Quinn and Fromme, 2010). Additionally, low levels of self-regulation can predict increased symptoms of some psychological conditions such as post-traumatic stress disorder (Walter et al., 2010).

With much of people’s lives affected by self-regulatory ability, there has been great interest in explaining the mechanisms and antecedents of self-regulation and self-regulatory failure. Baumeister et al. (1998) proposed one influential theory known as the limited-strength model of self-regulation. In this model, people’s ability to self-regulate is governed by a limited pool of resources, such that repeated acts of self-regulation deplete this resource pool, leaving the participant vulnerable to subsequent self-regulatory failure until the pool is replenished. This decline of self-regulatory efficacy with increased self-regulatory actions is known as the ego-depletion effect. An often-used metaphor is to liken self-regulation to a muscle, in that the use of a muscle leaves it fatigued and temporarily reduces its effectiveness until sufficient time passes in which it can fully recuperate (Baumeister et al., 1998; Muraven et al., 1998).

Empirical evidence for the limited-strength model originated in two seminal articles (Baumeister et al., 1998; Muraven et al., 1998), which, across eight experiments, demonstrated that one’s ability to self-regulate was impoverished in cases following an activity that also required self-regulation. Aside from providing support for the limited-strength model, these studies also established the sequential-task paradigm as the standard means of exploring resource depletion in general, and the strength model in particular.

The sequential-task paradigm refers to the succession of experimental manipulations presented to research participants. In most cases, participants engage in two sequential tasks: an intervention task and an outcome task. The dependent variable is assessed using performance on the outcome task. The outcome task is identical across the treatment and control conditions and always involves some form of self-regulation. The intervention task varies between conditions. For the control condition, the intervention task does not require self-regulation. In the experimental or intervention condition, the intervention task does require some use of self-regulation. In the experimental condition, the intervention task and outcome task can vary along cognitive or emotional domains as long as both involve self-regulation. The guiding assumption of this approach is that all forms of self-regulation rely on a domain-general pool of resources. The paradigmatic example of this occurs within Baumeister et al.’s (1998) seminal article, in which the participants in the experimental condition had to resist the urge to eat chocolate chip cookies and subsequently attempt to solve unsolvable geometric line puzzles. The authors found that those who had to resist the urge to eat the cookies spent less time and made fewer attempts to solve the puzzles before abandoning the task. This led Baumeister et al. (1998) to conclude that self-regulating eating exhausted the general resource pool that was needed for persistence on the cognitive outcome task.

Conflicting Evidence for Ego-Depletion

Hagger et al. (2010) provided strong empirical support for the limited-strength model through a meta-analysis of 83 studies (comprising 198 individual tests) from which they concluded that the ego-depletion effect was reliable and represented a medium to large effect size, though they did note the effect was heterogeneous. However, when Carter and McCullough (2014) reanalyzed the Hagger et al. (2010) data, they found that if small study effects and publication bias were accounted for then the “results do not support the claim that the depletion effect is meaningfully different from zero” (Carter and McCullough, 2014, p. 7). Furthermore, Carter et al. (2015) conducted another meta-analysis using stricter inclusion criteria (e.g., not including extension studies or those that used uncommon and ‘questionable’ intervention tasks), incorporating unpublished studies, and testing whether subsequent effects are domain-general, as the strength model assumes, or restricted to specific cognitive domains. A reliable ego-depletion effect did emerge; however, it was restricted to one specific type of outcome task, that of standardized test scores. For all other outcome tasks, their analysis suggested that the average ego-depletion effect was “indistinguishable from zero” (Carter et al., 2015, p. 16).

In addition to questions raised by these meta-analyses, researchers have reported difficulty in replicating findings using pre-registered studies and with diverse, large samples. Both Xu et al. (2014) and Lurquin et al. (2016) reported non-significant results despite using the most reliable experimental procedures and having large sample sizes to detect the effect. These critiques and replication failures of ego-depletion spurred a pre-registered, multi-lab replication study published by Hagger and Chatzisarantis (2016). The study used 23 independent labs and 2,141 participants to replicate a study by Sripada et al. (2014), which previously found a significant depletion effect (d = 0.69). The study employed a letter-crossing task as the intervention task and the Multi-Source Interference Task as the outcome task. The data from this trial failed to show a depletion effect significantly different from zero. Hagger and Chatzisarantis (2016) comment that this finding concurs more with Carter et al.’s (2015) critical meta-analytic estimate than Hagger et al.’s (2010) favorable meta-analytic estimate.

These findings are, however, not conclusive. Carter et al. (2015) note that while their analysis did include a large amount of unpublished data, there was still more data unavailable for analysis, thereby making their statistical corrections only speculative. Moreover, it has been argued that the bias-corrected estimates used in Carter et al.’s (2015) critical meta-analysis may not be a reliable correction for publication bias, especially if the effect is heterogeneous (Inzlicht et al., 2015; Reed et al., 2015). Additionally, Baumeister and Vohs (2016) criticized the multi-lab replication study on methodological grounds. They argued that the letter-e task would not deplete self-regulatory resources as participants did not have to overcome an habitual response, which they argue is a necessary requirement to induce depletion. Consequently, they argued that the lack of a depletion effect in the outcome task was understandable as the treatment task was not ego-depleting, because of the way in which the task was administered.

The Need for a New Type of Evidence in Ego-Depletion

While evidence for the ego-depletion effect remains inconclusive, it is unlikely that mere replications will be sufficient to properly evaluate the strength model. Lurquin and Miyake (2017) comment that to overcome “the conceptual crisis for the ego-depletion literature” (p. 1) performance on the intervention task needs to be independently evaluated. In other words, a major omission in the ego-depletion literature, and within the sequential task-paradigm specifically, is that performance on the intervention task is rarely examined and more importantly has not been shown to produce a decrement in performance that would be indicative of a resource depletion. This lack is partly due to the fact that many of the intervention tasks that have been adopted (e.g., not thinking of a white bear) are not readily amenable to measurement. However, until it can be demonstrated that intervention tasks do result in depleted resources, a key theoretical assumption of the strength model will remain just that, an assumption.

One popular depletion induction task where performance can be tracked is the letter-crossing task. The letter-crossing task is one of the most commonly used intervention tasks (Hagger et al., 2010; Carter et al., 2015) and is a type of search and identification task. It involves participants locating a particular letter, generally the letter e, according to different sets of rules. The most common operation is as follows. Participants are asked to cross out every letter e on a page of text, the point being to establish a habitual response. Once completed, in the experimental condition participants are given additional pages of text and additional rules that contravene the habitual response. For example, the first rule might instruct participants to cross out an e in cases where there is another vowel immediately before or after it. A second rule typically accompanies this rule and instructs them not to cross out the e if the vowel before or after is an i. Effectively, this means participants respond to ae, ea, ee, eo, oe, eu, and ue combinations, but inhibit any response to ei or ie. In the control condition, participants may simply be asked to continue with the same rule as for the first page and cross out every letter e on the following pages. The theoretical rationale for why the experimental condition consumes more self-regulatory resources than the control is straightforward. The instructions of the first page instill a habit (i.e., cross out every letter e) and the instructions on the second page force the participants to inhibit this habituated response (i.e., do not cross out the e in specific cases). It is argued that the overriding of this newly acquired habitual response is the cause of resource-depletion (Tice et al., 2007; Wheeler et al., 2007; Clarkson et al., 2010; DeWall et al., 2011; Boucher and Kofos, 2012; Molden et al., 2012; Salmon et al., 2014; Xu et al., 2014; Achtziger et al., 2015; Chow et al., 2015; Harkness et al., 2015; Petrocelli et al., 2015; Wang et al., 2015; Baumeister and Vohs, 2016; Haynes et al., 2016; Jia and Hirt, 2016; Voce and Moston, 2016).
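
The two-rule version of the task described above can be sketched in a few lines of Python. This is an illustrative sketch only, not material from the study: the function name and labels are hypothetical, adjacency is taken to mean the immediately neighbouring character in the string, and an e that has both an i and another vowel beside it is assumed to fall under the inhibition rule.

```python
VOWELS = set("aeiou")

def classify_e_targets(text):
    """Label every letter e in `text` under the two-rule letter-crossing task.

    Labels (hypothetical names, chosen for this sketch):
      "respond" - a, e, o, or u is directly before or after the e (cross it out)
      "inhibit" - an i is directly before or after the e (withhold the response)
      "ignore"  - no vowel is adjacent to the e
    """
    letters = text.lower()
    results = []
    for i, ch in enumerate(letters):
        if ch != "e":
            continue
        # Collect the characters directly before and after this e, if any.
        neighbours = {letters[j] for j in (i - 1, i + 1) if 0 <= j < len(letters)}
        if "i" in neighbours:
            label = "inhibit"   # assumption: the i rule takes precedence
        elif neighbours & VOWELS:
            label = "respond"
        else:
            label = "ignore"
        results.append((i, label))
    return results
```

For example, the e in "great" would be labeled "respond", the e in "their" would be labeled "inhibit", and the e in "bed" would be labeled "ignore".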

The pertinent point for the present discussion is that, using this common task, both the experimental and control conditions require behavioral responses that can be tracked over time. This is critical since the depletion of regulatory resources is a within-subjects effect (occurs within participants over time) and the standard approach to observing it has been between-subjects using comparisons of experimental conditions. Currently, the only investigation known to the authors that systematically measured change across time is reported by Arber et al. (2017). In this study, participants were presented with five written stories (one per page as each story represented a page of text) and, in the experimental condition, were required to follow the rules just described regarding responses to vowel pairs. Across five independent studies, participants’ ability to detect target vowel combinations did in fact decline as a function of time on task (i.e., as they completed five stories/pages; see Figure 2). Furthermore, for those participants that showed performance decrements across time, deterioration in performance was also observed on a secondary outcome task (working memory span). At face value, finding a time dependent decline on the experimental task with carry-over effects on the outcome task is totally consistent with the predictions of the strength model. However, there are some aspects of that study that are problematic.

Rather than having an active control where participants identify instances of the letter e, Arber et al. (2017) utilized a passive control condition, in that they had a 10-min (equal to the experimental depletion-induction task length) “chat” with the experimenter before doing the outcome task. As such, it is yet to be shown whether the commonly used active control procedure would or would not show the same negative performance gradient. Finding an equivalent negative gradient on the control task would be particularly problematic for the strength model. Further, the procedures used by Arber et al. (2017) did not have the habit-forming first page. All pages required the application of the two vowel pair rules. The researchers argued that having participants identify the letter e in one set of cases and not in another set of cases would be sufficient to cause resource depletion, as this is a self-regulatory action that would require resources. This notion is not without precedent; other researchers have also argued that this immediate implementation of both rules would be sufficient to cause ego-depletion (Baumeister et al., 1998; Fischer et al., 2007; Wan and Sternthal, 2008; Halali et al., 2014; Sripada et al., 2014; Hagger and Chatzisarantis, 2016).

This presence or absence of a habit-forming component in the letter-crossing task, however, has been the cause of recent debate, with Baumeister and Vohs (2016) listing the lack of a habit-forming stage in the Hagger and Chatzisarantis (2016) multi-lab replication study as a reason to dismiss the null findings. Baumeister and Vohs comment that the replication study’s version of the letter-crossing task, which also did not start with a habit-forming stage prior to the instigation of a new set of rules, had an essential methodological flaw that invalidated the non-effects on the outcome measure. “Without first instilling the habit, there is nothing to override. This may be a difficult cognitive judgment task, but no impulse is overridden, contrary to the nature of self-control tasks” (Baumeister and Vohs, 2016, p. 574). Although there have been theoretical justifications for the requirement of behavioral inhibition within the letter-crossing task (e.g., Baumeister and Vohs, 2016), there has yet to be independent empirical evidence to give credence to such justifications. This dispute between the leading researchers as to whether a treatment condition is, in fact, ego-depleting further emphasizes the need to empirically justify the theoretical claims made within the ego-depletion literature, as Lurquin and Miyake (2017) have recommended.

Although the Arber et al. (2017) findings argue against the need for a habit-forming stage, from an empirical standpoint the question remains as to whether the request to inhibit an ie or ei response is the cause of the decrement in performance in their data, as alternative explanations could account for the negative performance gradient they found. For instance, it could be argued that the vowel pairs are embedded in text and that the participants must override an automatic response to read the words presented in the stimuli when they are trying to engage in the primary task of locating specific letters. While this explanation is acceptable within the limited-strength model, it is not the justification given within the literature. Additionally, if this alternative explanation can account for the depleting effect, then it would be equal across experimental and control conditions as standardly conceived. That is, the control condition and the experimental conditions both require the overriding of the automatic reading process. Alternatively, simply following multiple rules could be the cause of depletion by merely increasing cognitive load (increasing task difficulty), and the fact that the second rule involves behavioral inhibition is only incidental and not necessarily contributing to the depletion effects.

One might argue that the above manipulations simply increase the complexity and difficulty of the letter-e task. While manipulations of task difficulty across different presentations might serve to reduce performance (obviously, a more difficult task leads to poorer performance), task difficulty alone can’t account for the trend of declining performance with time on task. A more difficult task will mean poorer performance (a main effect in a sense) when compared to a less difficult task but it won’t necessarily produce greater deficits in performance over time. If task difficulty were the only process accounting for performance variance, we might expect the degree of decrement in performance over time to remain constant (or even improve due to practice effects) regardless of the difficulty of the task – though overall performance will be poorer when compared to an easier task. This is partly what supports our inference that performance decrements in a task like this may reflect declining resources available to do the task.

In general the letter e task is fairly easy (it is an effortful task but not a difficult task) and that is partly why we don’t observe practice effects like we do with more difficult skill-based tasks like Stroop tasks (people get better at it over time and therefore performance is enhanced). Difficult tasks plausibly require greater skill acquisition and practice to maximize ability, whereas the letter e task does not require much skill to complete. We believe characteristics of the task make it ideal for investigations into resource depletion.

Task difficulty across our experiments can plausibly account for variation in the overall performance differences between participants but cannot solely account for declining performance over time if the difficulty of the task remains constant through administration (i.e., over time). The task becomes more difficult for the participant over time as their resources decline, though the actual demands of the task remain constant.

The Present Studies

The goal of the following investigation is to understand the factors that drive performance on the letter-e task. The first experiment was designed to explicitly test the Baumeister and Vohs (2016) assertion that habit forming and subsequent inhibition of that habit are necessary for resource depletion to occur. The key variable is the accuracy of target detection, and we assume that the negative performance gradient (see Figure 2) reported by Arber et al. (2017) will change as a function of the manipulations introduced. To preview our results, we could find little evidence in support of these assumptions or for the strength model. The strength model, as it is currently articulated, could not provide a compelling theoretical basis for performance on the letter-e task.

As mentioned earlier, Arber et al. (2017) did present empirical data that demonstrated a decline over time consistent with the notions of resource depletion. The follow-up experiments were designed to be an in-depth, exploratory examination of the stimuli and procedures used in their experiments. The intent here was to examine whether or not the marked decrement in performance could be the result of possible confounds in linguistic features of the materials they used, or the fact that the materials all involved coherent text.

Experiment 1

The first experiment is a conceptual replication of the Arber et al. (2017) study with three procedural differences which include (a) using quasi-random letters for the test materials instead of coherent text, (b) using a habit formation stage in the depleting condition as suggested by Baumeister and Vohs (2016), and (c) using three active conditions—one control group and two treatment groups.

The current study uses quasi-random letters as the stimuli to eliminate any automatic reading response as an alternative explanation for any decrement in performance. The text is quasi-random because the number of target items remains constant across pages and conditions, but their location is randomized within each page. Additionally, like the Arber et al. (2017) study, this experiment tracks participants’ performance across five sections of text, but unlike Arber et al. (2017), and in line with Baumeister and Vohs’ (2016) mandate, this experiment has one page for habit-forming followed by four pages in which the participants have to override this habitual response.

The final difference involves the experimental groups in the study. Unlike the Arber et al. (2017) study, the current experiment adopts an active control condition. Participants will be asked to circle every letter e they can locate and this rule will not change over the five pages of text. These participants will not be affected by any rules that require behavioral inhibition or affected by the requirement of following multiple rules.

This study utilizes two treatment conditions, the first of which we have labeled the inhibition condition. In this condition, participants circle all e-vowel pairs (i.e., any letter e that has an a, e, i, o, or u directly before or after it) on the first page. Then on the following four pages, they are asked to continue to circle e-vowel pairs except if that pair contains an i (i.e., ei or ie pairs). In these cases, they are to refrain from circling the e-vowel pairs. In this manner, and in line with Baumeister and Vohs’ (2016) argument and previous justifications (e.g., Tice et al., 2007; DeWall et al., 2011), the participants in this condition will have one page in which they form a habit (i.e., circling every e-vowel pair) followed by four pages in which they will have to override this response (i.e., not circling the e-vowel pairs that contain the letter i). According to the limited-strength model, because this task involves repeatedly overriding a habitual response, a decrease in performance across the pages of text is hypothesized.

The other treatment condition we have labeled the no-inhibition condition. Participants start with one page in which they circle every e-vowel pair (i.e., any e that has an a, e, i, o, or u immediately before or after it). This is followed by four more pages on which they continue to circle e-vowel pairs. However, for these final four pages, there is an additional rule for e-vowel pairs that contain the letter i. That is, in these cases, and in contrast to the inhibition condition, participants must circle and underline e-vowel pairs that contain the letter i. These participants are thus asked to detect ei and ie pairs and to produce an alternate response, not inhibit a response. Although this task is computationally as difficult as the inhibition condition, according to the key premise that has been the subject of recent debate, this task should not deplete self-regulatory resources as no behavioral inhibition is required (Baumeister and Vohs, 2016) and no decrement in performance across pages should be observed. However, if performance does deteriorate in the no-inhibition condition, this would suggest that following multiple rules, rather than behavioral inhibition per se, is necessary for the effect.
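
The response required of a participant for any given e-vowel pair depends only on the condition, the page, and whether the pair contains an i. As a compact summary of the three conditions described above, the mapping might be sketched as follows; the function name, condition labels, and return values are hypothetical and chosen for illustration.

```python
def expected_response(condition, page, pair_has_i):
    """Return the response expected for an e-vowel pair on a given page (1-5).

    Hypothetical labels: conditions are "control", "inhibition", and
    "no_inhibition"; responses are "circle", "none", and
    "circle_and_underline".
    """
    if condition not in {"control", "inhibition", "no_inhibition"}:
        raise ValueError(f"unknown condition: {condition!r}")
    if not 1 <= page <= 5:
        raise ValueError("page must be between 1 and 5")

    # Control participants circle every letter e on every page.
    if condition == "control":
        return "circle"
    # Both treatment conditions circle every e-vowel pair on page 1, and keep
    # circling pairs without an i on pages 2-5.
    if page == 1 or not pair_has_i:
        return "circle"
    # Pages 2-5, pair contains an i: withhold versus alternate response.
    return "none" if condition == "inhibition" else "circle_and_underline"
```

The only cell in which the two treatment conditions differ is the last one: pages 2 to 5, pairs containing an i.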

In sum, the specific expectations derived from the strength model perspective are that there will be no significant difference in accuracy rates on the first page of text between the three conditions, as the behaviors only change after the completion of the first page of text. This will serve as the pre-test to check group equivalence. If Baumeister and Vohs (2016) are correct in their assertions, the accuracy rate will significantly decline across the final four pages of text for the inhibition condition, but no significant differences will be seen across these pages in the no-inhibition and control conditions.

Method

Ethics Statement

This study was carried out in accordance with the recommendations of NHMRC National Statement on Ethical Conduct in Human Research (2007) that governs research involving human participants in Australia. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Human Research Ethics Committee of the University of Southern Queensland.

Participants

The total sample size for the experiment was N = 77; however, data from three participants were removed because they did not follow their condition’s instructions. The remaining participants’ ages ranged from 18 to 74 years with an average of 31.49 years (SD = 11.39 years), 48% were female, and 70% were studying at university or had university degrees. All participants for the experiment needed to be 18 years of age or older, have English as their predominant language, and be able to read basic English. While all participation in the experiment was voluntary and participants were recruited through convenience sampling, a small portion (n = 6) were 1st-year psychology students who received course credit for participating in the experiment. All other participants were recruited from the researcher’s social and professional networks and received no incentive to participate in the experiment.

Materials

The study booklet consisted of seven pages of materials, of which the first five pages were used. The final two pages were dummy pages to ensure that participants did not anticipate the end of the task. The seven pages of test material were printed on white A4 paper with black, size eight, Times New Roman font with double line spacing. Each page consisted of 1,800 letters with a single space between each letter. There were 60 characters per line and 30 lines per page. On the five critical pages, each page contained 200 letter es, with a vowel directly before or after each e (see Figure 1 for an excerpt of a page of test materials). In the test pages for participants in the control condition, every instance of a double e was replaced with an e p. This was done so the location of the targets remained constant across conditions but ensured that each page only consisted of 200 targets, as a double e in this condition would be considered as two separate targets but that same double e would be considered as one target in both treatment conditions.

FIGURE 1. Excerpt of test material for the letter-crossing task of the current experiment. This excerpt was from the control condition so all cases of “e e” that are present in the other two conditions have been replaced with “e p.”

Aside from this change, the only difference in test materials between conditions was the printed instructions given to the participants. As such, each participant had 200 targets to identify on each page. As the control condition and the first page of both treatment conditions only had the one rule, all 200 targets on each of these pages corresponded to that one rule. On the remaining four pages in the treatment conditions, each page consisted of 120 targets that corresponded to the treatment conditions’ first rule (e plus a, e, o, u), and 80 that corresponded to the second rule (e i or i e combinations).

To ensure that all pages consisted of 200 target items, with their location randomly distributed within each page, the 1,800 characters comprised 300 six-letter strings. The strings that did not contain any targets were made of random letters generated from a list of all alphabet letters (with replacement), excluding the letter e. For those strings that did contain a target, the letter e was located in one of four internal serial positions, with a consonant always residing in the first and sixth serial positions. The consequence of this is that target items could vary in position within the letter string but target items could not interact with one another between letter strings because there are always at least two consonants between them. In total, each page contained 200 letter-strings that contained vowel combinations and 100 letter-strings that contained no target items. This ratio was consistent across the five pages; however, new letter-strings were generated for each page and a new random order of letter-strings was generated for each page. Only after the letter-strings and ordering were generated were the double es in the control condition replaced with the letters e and p.
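
The construction just described can be approximated with a short Python sketch. This is not the authors’ generation procedure: the function names are hypothetical, and several details that the description leaves open are filled with assumptions, namely that each target string contains exactly one e-vowel pair, that the remaining letters of a target string are consonants, that y counts as a consonant, and that the 120/80 split between rule types applies to the generated page. Page layout (60 characters per line, spaces between letters) is omitted.

```python
import random
import string

CONSONANTS = "bcdfghjklmnpqrstvwxyz"  # assumption: y is treated as a consonant
NON_E_LETTERS = [c for c in string.ascii_lowercase if c != "e"]

def make_target_string(rng, partner):
    """Six-letter string with one e-vowel pair in the four internal positions."""
    letters = [rng.choice(CONSONANTS) for _ in range(6)]
    start = rng.randrange(1, 4)      # the pair occupies indices start, start + 1
    pair = ["e", partner]
    rng.shuffle(pair)                # e may come before or after its partner
    letters[start:start + 2] = pair  # indices 0 and 5 stay consonants
    return "".join(letters)

def make_page(rng, n_first_rule=120, n_second_rule=80, n_blank=100):
    """Return one 1,800-letter page built from 300 shuffled six-letter strings."""
    strings = [make_target_string(rng, rng.choice("aeou")) for _ in range(n_first_rule)]
    strings += [make_target_string(rng, "i") for _ in range(n_second_rule)]
    # Non-target strings: random letters drawn with replacement, never an e.
    strings += ["".join(rng.choice(NON_E_LETTERS) for _ in range(6)) for _ in range(n_blank)]
    rng.shuffle(strings)
    return "".join(strings)

page = make_page(random.Random(0))          # seed chosen arbitrarily
control_page = page.replace("ee", "ep")     # control version: double e becomes e p
```

Under these assumptions the control page contains exactly 200 letter es, because each double e collapses to a single e after the replacement.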

The dependent measure in the control condition was the proportion of letter es detected by the participants. For both the inhibition and no-inhibition conditions, the DV was the number of e-vowel combinations detected on the first page, but on the remaining four pages, it was the number of e-vowel combinations that involved the letters a, e, o, and u. This dependent measure was chosen for two reasons. Firstly, Arber et al. (2017) showed a reliable decline in the accuracy rate of this measure across time as the strength model would predict and, for those who showed self-regulatory errors on the secondary rule, this decline was positively correlated with performance on a secondary task (again, consistent with the strength model). Secondly, it is the only measurable outcome that does not lend itself to contradicting interpretations. For instance, spending more time on each page could be indicative of both having greater self-regulatory resources allowing participants to persist longer (as in Baumeister et al., 1998, Experiment 1), and as having less self-regulatory resources resulting in slower reaction times (see Hagger and Chatzisarantis, 2016) for each item on the page. Random errors per page likewise lead to contradicting conclusions, with random errors being indicative of low self-regulatory resources resulting in failures to correctly follow the instructions, as well as no random errors being interpreted as having lower self-regulatory resources resulting in an increased passive-option effect (see Baumeister et al., 1998, Experiment 4) making random errors actually less likely. Finally, tracking performance on targets that correspond with the second rule (i.e., e-vowel pairs that contain the letter i) becomes uninterpretable for the inhibition condition. This is because they are told to not circle these vowel pairs, making it impossible to know if they are correctly applying the rules to this target or are simply missing these targets.

Procedures

Following written, informed consent, each participant was tested individually. Participants were allocated to one of the three experimental conditions in a pseudo-random manner, such that there were at least 25 participants in each condition. All participants within a condition were given the same order of the five pages of the test materials. Those in the control condition were first given written instructions that told them to circle every letter e they could find, and they were then given an example line to practice. After correct completion of this practice line, they were given the remaining pages of the test materials.

For both treatment conditions, participants were first given the same set of instructions that directed them to circle every letter e that had a vowel (a, e, i, o, or u) directly before or after it. They were also given a test line to ensure they understood their instructions, followed by the first page of test materials. After the participants completed their first page of text, those in the inhibition condition were given a new set of instructions that directed them to continue circling e-vowel pairs except if that pair contained an i (e.g., ei or ie). In those cases, they were told not to circle the pair. Those in the no-inhibition condition were also told to continue circling e-vowel pairs. They were also given an additional rule that directed them to circle and underline any e that had an i directly before or after it (e.g., ei or ie). Participants in both treatment conditions were given example lines to ensure they understood their new instructions. After they correctly completed this example line, they were then given the remaining four pages of text to complete. All participants were told that both their speed (which was recorded for each page of text) and accuracy were being recorded, and to work from top-to-bottom and from left-to-right as if they were reading the text. After the participants had completed the task, they were debriefed and thanked for their participation. The materials were then marked for accuracy. For each page, the proportion of target items correctly identified by the participant was calculated for each corresponding rule. Instances of random errors (i.e., circling letters that did not comply with any rule) were also counted for each page of text.
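The response rules for the three conditions can be written out as a short scoring sketch in Python. It is offered only as an illustration of the rules as described; the function names and condition labels are hypothetical. It also assumes that an e flanked by both an i and another vowel is treated as an i-pair, a case the instructions do not address.

```python
VOWELS = set("aeiou")


def classify_e(text, i):
    """Label the e at index i as 'isolated', 'pair', or 'i_pair'."""
    neighbours = set()
    if i > 0:
        neighbours.add(text[i - 1])
    if i < len(text) - 1:
        neighbours.add(text[i + 1])
    if "i" in neighbours:  # assumption: an adjacent i takes precedence
        return "i_pair"
    if neighbours & VOWELS:
        return "pair"
    return "isolated"


def expected_response(text, i, condition, page):
    """Correct response to the e at index i for a given condition and page."""
    if condition == "control":
        return "circle"  # every letter e is a target
    kind = classify_e(text, i)
    if kind == "isolated":
        return "ignore"
    if kind == "pair" or page == 1:
        return "circle"  # first rule; the only rule in force on page 1
    # Second rule, pages 2-5: the two treatment conditions diverge on i-pairs.
    return "ignore" if condition == "inhibition" else "circle_underline"


def first_rule_accuracy(text, circled):
    """Proportion of e-vowel targets without an i that were circled."""
    targets = [i for i, ch in enumerate(text)
               if ch == "e" and classify_e(text, i) == "pair"]
    if not targets:
        return float("nan")
    return sum(i in circled for i in targets) / len(targets)
```

Here `circled` is the set of character positions a participant marked. Restricting the accuracy measure to pairs without an i mirrors the dependent measure used on pages 2 to 5.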

Data Analysis Plan

All data for the following experiments can be found at https://osf.io/rnz5w/. All four experiments reported here deal with a single measure – the number of target items correctly detected. While other measures are possible, we have limited our focus to target identification because it is an obvious measure to test the assumptions of the strength model and empirically it has been shown to be sensitive to time on task and carry-over effects have been observed based on this measure (Arber et al., 2017). The means of all conditions in each experiment are reported, although not necessarily analyzed in all experiments. Sampling procedures are described in each method section which make it clear that sample size was determined prior to the commencement of testing and was determined by the limited time available to recruit participants. Decisions regarding the relationship between sample size and power are described in the following paragraph for Experiment 1 and the results section of Experiment 2. Data from all participants were used, unless indicated otherwise (Experiment 1) and data were not inspected until all participants had been tested.

Since the hypotheses of the experiment center on the presence or absence of an interaction between condition and time on task (story number), two a priori power analyses using G*Power were conducted to evaluate the minimum sample size needed to detect that interaction. The first analysis established the sample size needed to detect an effect size equivalent to the large effect obtained in Arber et al. (2017). The second was conducted to determine the number of participants needed to detect a more conservative medium effect size. With the error rate set to 0.05, power set to 0.95, and a correlation of 0.5 among repeated measures, a total sample of 39 was sufficient to test the critical interaction for both large and medium effect sizes. We did not use this power analysis as a stopping rule for determining the final number of participants, preferring to use sample sizes that were more indicative of prior studies. Consequently, the current sample exceeds that required to adequately evaluate the critical interaction.

All null-hypothesis significance testing was conducted using SPSS (version 20) with an α-level set at 0.05 (two-tailed). In addition to null-hypothesis testing, supplemental Bayesian statistics were conducted using JASP (version 0.8.1.2). The use of Bayesian statistics, and the subsequent calculation of Bayes Factors (BF), makes it possible to distinguish among (a) evidence for the alternate hypothesis, (b) evidence for the null hypothesis, and (c) no evidence for one over the other (Dienes, 2011), so that the importance of experimental power is diminished. For example, BF₀₁ = 11 indicates that the data are 11 times more likely under the null hypothesis than the alternate hypothesis. Note that the subscript 01 indicates that the ratio is expressed in favor of the null hypothesis. If the value were instead reported as BF₁₀ = 11, it would be interpreted as the data being 11 times more likely under the alternate hypothesis than the null hypothesis (Wagenmakers et al., 2017). BFs of 3 and above are considered to be evidence for the specific hypothesis denoted by the subscript (Wagenmakers et al., 2017). The calculation of Bayesian statistics does require a prior distribution to be set (a prior belief as to what the effect is), and for the following analyses, the uninformed priors recommended by Wagenmakers et al. (2017) were utilized.
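The reading of Bayes factors used in the following sections can be summarized in a small Python helper. This is a sketch of the interpretive convention only, not of how JASP computes the values; the function name and the returned labels are hypothetical, and the threshold of 3 follows the criterion cited above.

```python
def describe_bayes_factor(bf01, threshold=3.0):
    """Classify a BF01 value as favoring the null, the alternative, or neither."""
    if bf01 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    bf10 = 1.0 / bf01  # BF10 is the reciprocal of BF01
    if bf01 >= threshold:
        return "null"
    if bf10 >= threshold:
        return "alternative"
    return "inconclusive"


# A result reported as BF10 is converted to BF01 before classification.
verdict = describe_bayes_factor(1.0 / 12.78)
```

Under this convention a value such as BF₀₁ = 7.48 counts as evidence for the null hypothesis, whereas a value close to 1 favors neither hypothesis.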

Results

Figure 2 shows participants’ accuracy rates over the five pages of text for the three conditions (Control n = 27, Inhibition n = 23, and No-Inhibition n = 24). It also displays the results of one of the studies from the aforementioned Arber et al. (2017) article for a visual comparison with a previously found negative gradient. As can be seen in Figure 2, participants’ accuracy rates were markedly similar on the first page of text, indicating that the groups were equivalent at the beginning of the task. Despite this, the obtained results differ markedly from Arber et al.’s previous findings, in that there is little deterioration across pages and little difference among the three conditions.

FIGURE 2. Accuracy rate of the participants for the first rule of five pages of text. The dashed line is the result of previous research conducted by Arber et al. (2017). One page of text for the current study and one story for Arber et al. (2017) study represent functionally equivalent units of text.

To test for differences in accuracy rates for the first page of text, a one-way between-subjects ANOVA was conducted with the three levels of the experimental condition set as the fixed-factor and accuracy of the first page set as the DV. The results suggested that there was no difference between the groups for the first page of text, F(2,69) = 0.14, p = 0.87, $η_{p}^{2}$ < 0.01, BF₀₁ = 7.48.

To test for differences in accuracy rates on the following four pages, a 3 (condition) × 4 (page [2–5]) mixed factorial ANOVA was conducted with accuracy as the DV. There was no main effect of page on accuracy rates, F(3,213) = 0.52, p = 0.67, $η_{p}^{2}$ = 0.01, BF₀₁ = 35.82. However, the main effect of experimental condition was significant, F_w(2,44.86) = 5.11, p = 0.010, $η_{p}^{2}$ = 0.19, BF₁₀ = 12.78. Furthermore, the condition × page interaction was significant, F(6,213) = 2.40, p = 0.029, $η_{p}^{2}$ = 0.06, although the Bayes factor for this interaction was inconclusive, BF₀₁ = 1.13.

To examine the condition × page interaction, a one-way repeated measures ANOVA was conducted for each of the three experimental conditions. The results indicated that there was no significant change in performance for the control condition, F_G(1.77, 46.05) = 0.87, p = 0.46, $η_{p}^{2}$ = 0.03, BF₀₁ = 7.59, or the no-inhibition condition, F(3,69) = 1.43, p = 0.24, $η_{p}^{2}$ = 0.06, BF₀₁ = 3.76, with evidence supporting the null hypothesis in both conditions. While the page effect for the inhibition condition was also non-significant using a strict 0.05 cut-off, F(3,66) = 2.56, p = 0.06, $η_{p}^{2}$ = 0.10, BF₀₁ = 1.03, the p-value approached significance. However, the Bayesian analysis favored neither the null nor the alternate hypothesis.

Discussion

The first motivation of the current study was a general criticism of the resource-depletion literature. Specifically, there has been little demonstration that the depletion tasks employed in the sequential-task paradigm actually result in decrements in performance consistent with a depleting pool of self-regulatory resources. The second, and more specific, motivation was based on Baumeister and Vohs’s (2016) assertion that depletion only occurs under a limited set of conditions; namely, that a habit formation stage is necessary and that response inhibition of this habit is the causal factor in producing depletion effects. Consequently, it was expected that all groups would be equivalent on the first page of text, when no response inhibition was required, but that they would differ on subsequent pages. Specifically, it was expected that there would be no deterioration in performance across pages for those in the control condition and the no-inhibition condition, as neither condition involved overriding a habitual response and therefore should not deplete self-regulatory resources. It was, however, expected that those in the inhibition condition would deplete self-regulatory resources, which would be reflected in a performance decrement similar to that observed by Arber et al. (2017).

The outcomes of the study met all but one expectation. As predicted, the three groups were equivalent on the first page of text. Additionally, the control and no-inhibition conditions did not show any decrement in performance consistent with the notion that self-regulatory resources were not depleted, and that depletion may require habit-formation and dominant response-inhibition. However, contrary to model-based expectations, participants in the inhibition condition also did not show a marked decrement in performance across pages, and certainly not of the magnitude that was observed in the Arber et al. (2017) experiments. In short, there was no compelling evidence for performance deficits in any of the three conditions.

Counter to Baumeister and Vohs’s (2016) position, the results of this study do not support the notion that the behavioral inhibition implemented within the letter-crossing task drives the resource-depletion effect. The crucial decline in accuracy rates across pages, which would suggest that self-regulatory resources were depleting, was not observed. The current data exhibited a trend in the expected direction, but the effect was non-significant and its magnitude was unlike that observed in prior studies.

Given that the inhibition instructions were practically identical to those used in the Arber et al. (2017) study, a potential cause for the different outcomes in the magnitude of deterioration may be the stimuli used. Specifically, the Arber et al. (2017) study used intact stories for their stimuli, thereby containing a narrative to be read, while the current study used lines of random letters, thereby containing no elements that could be read. Given that it is widely understood that a large component of reading is automatic (Walczyk, 2000), it could be argued that the participants within the Arber et al. (2017) study had to override their automatic response to read the text when they were complying with the rules that were given to them. Furthermore, it is plausible that it was this overriding of their automatic reading response that was the cause of Arber et al.’s negative performance gradient. This would be in contrast to the current study in which there was no automatic reading response to overcome potentially contributing to the lack of resource-depletion. In short, the comparison of the effects observed between the studies would suggest that while participants engage in a secondary task (i.e., following the multiple rules of the letter-crossing task) some aspect of readable language in the stimuli is the active element needed to induce the resource-depletion effect.

The other major difference from the Arber et al. (2017) study was that in the current study, every letter e was paired with a vowel and therefore constituted a target. This is opposed to the Arber et al. (2017) materials in which isolated es were present, meaning that there were es that were not a target. Although this latter case is more reflective of the letter-crossing tasks that have been historically used, it does lead to an interesting question. That is, what habits are being overridden in the letter-crossing task? While researchers posit on theoretical grounds that participants form a habit of circling e-vowel pairs, operationally participants are historically exposed to more isolated es than e-vowel pairs. It could then be argued that the first habit formed would come from not circling isolated es which then must be broken to circle the e-vowel pairs, which then again must be broken if that vowel pair complies with the secondary ei, ie rule. This three-stage process present in the Arber et al. (2017) study was not employed within the current study and its absence may have contributed to the null finding. In other words, having multiple levels to the self-regulatory task may have additive or multiplicative effects on depleting self-regulatory resources and only having a two-stage self-regulatory process in the current study may have massively reduced the resource-depletion effect.

A third difference involves the fact that the letter-e task was presented in pencil and paper format. Arber et al. (2017) tested both pencil and paper and computer presented versions of the letter-e task, with all conditions producing negative performance gradients, so presentation format was not expected to influence the current results. However, to confirm this assumption, the remaining experiments incorporate computer presentation of text with participants verbally identifying the target items.

While the changes made to the presentation format were incorporated to control for language factors, it is possible that the changes have fundamentally changed the task. In fact, the stimuli in the experiment are very similar to those used in the Mesulam-Weintraub Cancellation Test (Mesulam, 1985), a standardized test that is used in neuropsychological assessment of visual scanning speed and visual neglect. The first trial on the test consists of a page containing rows/columns of random letters almost identical to those displayed in Figure 1, and the task is to search for the letter A in each of the rows/columns. These authors reported that normal adults could complete each of the four tests without error in less than 2 min. The fact that in the current experiment performance is also virtually error free, suggests the possibility that we have fundamentally changed the task turning it into a visual search task rather than a test of resources required for self-regulation. If this is so, then this test does not provide a fair test of the strength model.

To this point we have assumed that the results of Experiment 1 are somehow aberrant or do not provide a fair test of the strength model. It is, however, possible that the Arber et al. (2017) results are the aberrant findings. The materials in their study, five short stories, were sourced from the internet without any thought to the characteristics of the selected text. Moreover, the stories were always presented in the same order. Thus, if there was a factor that made some targets more difficult to detect than others, and that factor was more prevalent in later stories than in earlier stories, then deterioration in target detection across time could emerge. The possibility remains that confounds, rather than depletion of self-regulatory resources, could be the causal factor for the decrement. Alternatively, if the current test is not a fair test of the strength model, it has nothing to say about the depletion process that is apparent in the Arber et al. (2017) findings. On both counts, a much closer inspection of the materials used in the Arber et al. (2017) experiment appears to be warranted.

One possible factor that may be playing a role in target identification is bigram frequency (i.e., the frequency with which two-letter combinations occur in the language). It is known that the frequency with which the different e-vowel combinations occur in the written language has an impact on many cognitive tasks. In the visual word identification literature, for example, it is well established that bigram frequency has an impact upon reading and word identification, with frequently occurring combinations being associated with better performance [see Chetail (2015), for a review]. We took as a starting point that identification of targets in coherent text would be related to the bigram frequency of the vowel combinations, with targets containing frequently occurring combinations like ea being better detected than targets containing rarer combinations like ae. If such an outcome was obtained, the interest then centered on how those differences changed across stories. It should be noted that such an exploration requires that we measure the degree to which specific words in the text were identified.
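To make the notion of e-vowel bigrams concrete, the following Python sketch tallies how often each combination occurs within a passage. It is an illustration only: the function name is hypothetical, and counts taken from a single passage describe that passage rather than the language-wide bigram frequencies referred to above, which would come from a reference corpus.

```python
from collections import Counter

VOWELS = set("aeiou")


def e_vowel_bigram_counts(text):
    """Count adjacent two-vowel combinations that include the letter e."""
    text = text.lower()
    counts = Counter()
    for first, second in zip(text, text[1:]):
        if first in VOWELS and second in VOWELS and "e" in (first, second):
            counts[first + second] += 1
    return counts


# Example: tally the combinations in a short made-up sentence.
counts = e_vowel_bigram_counts("The team feels great")
```

Overlapping combinations are counted separately in this sketch, so a run of three vowels contributes up to two bigrams.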

We were also interested in testing the notion that the discrepancies in the magnitude of the deterioration over time between the Arber et al. (2017) and Experiment 1 results were related to the interference that reading coherent prose produces. As such, we adopt the materials and procedures that Arber et al. (2017) employed (Experiment 2), but we manipulate the intelligibility of the prose by randomly re-ordering the words in each story (Experiment 3) or randomizing the letters in each word in each story (Experiment 4), while retaining the same surface structure (punctuation, sentence, and paragraph structure), with the target vowels being maintained in the same location across stories in all three experiments. Here we were interested in the extent to which absolute levels of performance would increase, and how bigram frequency effects might vary with the change in coherence.

Experiment 2

In exploring the impact of bigram frequency on target detection in the letter-e task, we utilize the materials of Arber et al. (2017). In that study, five short passages of text with 115 target items and 24 ie and ei words were used. In this and following experiments the independent variable was the specific vowel combinations, of which there were seven different combinations (see Table 1). However, in the statistical analyses, we have limited the conditions to the two most frequently occurring combinations ea (most frequent) and ee (less frequent), primarily because these are the only two conditions where there are examples in all five stories. The dependent variable in all studies was the proportion of participants who identified the target item. If bigram frequency has an effect upon performance then the expectation would be that targets containing frequently occurring combinations will be better recalled than less frequent combinations. It is also expected that target detection should deteriorate across stories.

TABLE 1. Characteristics of the stories used in Experiments 2–4.