Workspace Disorder Does Not Influence Creativity and Executive Functions

Recent research by Vohs et al. (2013) garnered media attention after reporting that disordered environments increase creativity. The present research was designed to conceptually replicate and extend this finding by exploring the effect of workspace disorder on creativity. Participants were randomly assigned to work at a neatly organized (Order condition) or a messy desk (Disorder condition), where they completed several paper-and-pencil and computerized tasks, including two validated creativity measures (Abbreviated Torrance Test for Adults; ATTA; Goff and Torrance, 2002; Alternative Uses Task; adapted from Guilford, 1967). We also included several executive control measures from the NIH EXAMINER (Kramer, 2011), to explore the role of reduced top-down control in explaining a possible creativity-disorder connection. Independent-samples t-tests failed to replicate any significant difference in creativity between the Order and Disorder conditions. Furthermore, the conditions did not differentially affect executive control. Despite implementing an experimental setup similar to the one in Vohs et al. (2013), including a larger sample size, and adopting multiple measures of the constructs of interest, we did not find any effect of workspace clutter on cognitive performance. At this stage, the relationship between disorder and cognition seems elusive and does not warrant the claims it generated in the popular press.

The NIH EXAMINER battery (Kramer, 2011) includes subtests that assess different facets of executive functioning (fluency, task-switching, updating/working memory). The overall administration time of the subtests we included is approximately 30 minutes. In the Letter and Category Fluency Tasks, participants were asked to produce as many words as possible beginning with the letter L, beginning with the letter F, belonging to the category "animal", and belonging to the category "vegetable" (1 minute for each subtask). Performance was assessed as the number of correct responses (reflecting retrieval speed), as well as errors in the form of repetitions (giving the same answer more than once) and violations of the task rules (for example, producing the name of a city instead of a common noun for letter fluency, or a food that is not a vegetable for category fluency). In the Set Shifting Task, participants had to respond to the shape or the color of bivalent stimuli in homogeneous blocks of "pure" trials (that is, blocks in which the same task is repeated) and in intermixed blocks, in which the shape and color tasks alternated randomly. In intermixed blocks, on a given trial participants might either perform the same task as on the previous trial ("stay" trials) or switch to the other task ("switch" trials). Following the task-switching literature (Monsell, 2003), we used mixing costs and switching costs as performance indices. Specifically, mixing costs were computed as the difference between reaction times (RTs) on stay trials in mixed blocks and RTs on pure trials in single-task blocks. Mixing costs have been associated with different processes summarized under the term task-set updating, and reflect the active maintenance of multiple task-sets in working memory. Switching costs were measured as the difference in performance between switch and stay trials in mixed blocks, and are believed to capture task-set reconfiguration.
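As an illustration of the two task-switching indices described above, the following minimal sketch computes mixing and switching costs from trial-level RTs. The trial records, their tuple layout, and the RT values are hypothetical, and the scoring built into the EXAMINER software may differ in details such as the exclusion of error trials or RT trimming.

```python
from statistics import mean

# Hypothetical trials: (block_type, trial_type, rt_ms).
# block_type is "pure" or "mixed"; trial_type is "pure", "stay", or "switch".
trials = [
    ("pure", "pure", 520), ("pure", "pure", 540), ("pure", "pure", 500),
    ("mixed", "stay", 610), ("mixed", "stay", 630),
    ("mixed", "switch", 720), ("mixed", "switch", 760),
]

def mean_rt(trials, block_type, trial_type):
    """Mean RT over the trials matching a block type and a trial type."""
    return mean(rt for b, t, rt in trials if b == block_type and t == trial_type)

def mixing_cost(trials):
    """Stay trials in mixed blocks minus pure trials in single-task blocks."""
    return mean_rt(trials, "mixed", "stay") - mean_rt(trials, "pure", "pure")

def switching_cost(trials):
    """Switch trials minus stay trials, both taken from mixed blocks."""
    return mean_rt(trials, "mixed", "switch") - mean_rt(trials, "mixed", "stay")

print("mixing cost (ms):", mixing_cost(trials))
print("switching cost (ms):", switching_cost(trials))
```

Both indices are difference scores, so larger positive values indicate a larger cost of maintaining two task-sets (mixing) or of reconfiguring the task-set (switching).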
In the Flanker Task, participants responded to the orientation of a central arrow that was flanked by arrows pointing either in the same direction as the target or in the opposite direction. Performance was assessed by subtracting reaction times (RTs) in the congruent condition from RTs in the incongruent condition (reflecting the processes involved in resolving the interference caused by conflicting information). In the Digit Span Task (verbal working memory), participants counted strings of dots out loud, keeping in mind the total number of dots over consecutive trials and producing it at the end of the sequence; the longest sequence correctly held and retrieved on at least two trials was used as a performance indicator. In the N-back Task (visuo-spatial working memory), participants had to keep in mind the location of squares appearing on a screen one at a time and compare the location of each square to that of the square presented 1 or 2 positions before it (1-back and 2-back tasks, respectively); performance was assessed by counting the number of errors committed in each version of the n-back task.
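The scoring rules in this paragraph are simple enough to state as code. The sketch below is illustrative only: the function names, the input layout, and the example values are hypothetical, and it is not the scoring code of the EXAMINER battery.

```python
from statistics import mean

def flanker_interference(congruent_rts, incongruent_rts):
    """Mean incongruent RT minus mean congruent RT, in ms."""
    return mean(incongruent_rts) - mean(congruent_rts)

def span_score(correct_by_length, min_correct=2):
    """Longest sequence length recalled correctly on at least `min_correct` trials.

    `correct_by_length` maps a sequence length to a list of booleans,
    one per trial at that length (True = correct recall).
    """
    passed = [length for length, outcomes in correct_by_length.items()
              if sum(outcomes) >= min_correct]
    return max(passed, default=0)

def nback_errors(responses, correct_answers):
    """Number of trials on which the response differs from the correct answer."""
    return sum(r != c for r, c in zip(responses, correct_answers))

# Hypothetical data for one participant.
interference = flanker_interference([450, 470, 460], [520, 540, 530])
span = span_score({2: [True, True, True], 3: [True, True, False],
                   4: [True, False, False], 5: [False, False, False]})
errors = nback_errors(["match", "no", "no", "match"],
                      ["match", "no", "match", "match"])
print(interference, span, errors)
```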
The GEFT included 18 items and its alpha was .92.
Task Order. The individual differences measures were all administered at the beginning of the session and in the same order (FFI-GEFT). The experimental tasks, in contrast, were administered in four possible rotations to counterbalance the effect of task order. The four orders were as follows: Rotation 1: AUT-ATTA-Break-EXAMINER; Rotation 2: ATTA-AUT-Break-EXAMINER; Rotation 3: EXAMINER-Break-AUT-ATTA; Rotation 4: EXAMINER-Break-ATTA-AUT. To test the differences between rotations, a series of 2 (Condition: Order/Disorder) × 4 (Rotation: 1, 2, 3, and 4) ANOVAs was conducted on all the creativity and executive functioning measures. Only the two rotation effects with the lowest p-values are reported below. There was a significant difference in the Category [...] Overall, given that the rotation effects were isolated and concerned two minor aspects of performance, and given that the main analyses collapsed across rotations, we believe that rotation differences did not bias our results.
Rotation 1. We examined differences between Order and Disorder in the task rotation that most closely matched the task order in the paper by Vohs et al. (2013) (n = 25), in which the Alternative Uses Task was administered right after participants were moved to the orderly or disorderly workspace. When comparing all the creativity measures between the conditions, no significant differences were found. The smallest p-value (p = .079) was returned for the number of rejected responses on ATTA 1 (Order: M = 1.00, SD = 1.35; Disorder: M = .23, SD = .60); however, this finding was isolated and its p-value exceeded the Bonferroni-corrected significance cut-off (.0125).
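The cut-off of .0125 is what a Bonferroni correction yields when a familywise alpha of .05 is divided across four comparisons. The count of four is inferred from the reported cut-off rather than stated in the text, so it is treated as an assumption in this short sketch.

```python
def bonferroni_cutoff(familywise_alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return familywise_alpha / n_comparisons

def survives_correction(p_value, familywise_alpha, n_comparisons):
    """True if the p-value falls below the corrected threshold."""
    return p_value < bonferroni_cutoff(familywise_alpha, n_comparisons)

# Assumption: four comparisons, inferred from the reported cut-off of .0125.
cutoff = bonferroni_cutoff(0.05, 4)
print(round(cutoff, 4), survives_correction(0.079, 0.05, 4))
```

Under that assumption, the smallest p-value in Rotation 1 (.079) does not fall below the corrected threshold.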
Rotations 3 and 4. Similarly, we compared performance on all the executive function measures between conditions using only Rotations 3 and 4 (n = 50), in which the EF tasks were administered right after participants were moved to the orderly or disorderly workspace. In these two rotations, the executive function measures were collected before the creativity measures, possibly reducing the risk of task interaction, fatigue, etc. The lowest p-value was obtained for Category Fluency violations (Order: M = .64, SD = 1.11; Disorder: M = 2.76, SD = 4.59; t (26.8) = -2.12; p = .03). Another low p-value was found for Category Fluency repetitions (Order: M = .56, SD = .92; Disorder: M = .20, SD = .41; t (33.2) = .36; p = .08). It is worth noting that both p-values were far from the Bonferroni-corrected cut-off and that the effects were in opposite directions, with more errors in the Disorder condition for the first variable and fewer for the second. Overall, we feel confident in concluding that, in our design, the lack of condition effects on executive control was not attributable to task order.
Power analysis and sample size. We ran a power analysis using G*Power for an independent-samples t-test with an effect size of 0.81 (the largest effect size reported by Vohs et al., 2013), a target power of 0.80, and an alpha of 0.05. Given these constraints, the estimated total sample size is 50 (25 per group). When estimating the sample size needed for a multiple regression with a large effect size, a power of 0.80, and 12 predictors, the required total sample size is 61 people. Moreover, we followed recent guidelines on sample size in the context of replication studies (Simonsohn, 2015) and increased the sample size to at least twice that of the original study by Vohs et al. (2013), resulting in a final retained sample of 100 participants.
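The t-test calculation can be approximated without G*Power. The sketch below is a standard-library illustration, not a reproduction of the G*Power procedure: it obtains the power of a two-sided, equal-n, independent-samples t-test by numerically integrating over the chi-square distribution of the pooled variance, then searches for the smallest group size that reaches the target power. All function names and the integration settings (Simpson's rule with 600 steps, bisection with 50 iterations) are choices made for this example.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def chi2_pdf(v, k):
    """Density of a chi-square variable with k > 2 degrees of freedom."""
    if v <= 0.0:
        return 0.0
    half = k / 2.0
    return math.exp((half - 1.0) * math.log(v) - v / 2.0
                    - half * math.log(2.0) - math.lgamma(half))

def expect_over_chi2(g, k, steps=600):
    """Simpson's-rule approximation of E[g(V)] for V ~ chi-square(k)."""
    upper = k + 12.0 * math.sqrt(2.0 * k) + 20.0  # well into the right tail
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        v = i * h
        total += weight * g(v) * chi2_pdf(v, k)
    return total * h / 3.0

def t_cdf(t, df):
    """CDF of Student's t, written as E[PHI(t * sqrt(V / df))]."""
    return expect_over_chi2(lambda v: PHI(t * math.sqrt(v / df)), df)

def t_critical(alpha, df):
    """Two-sided critical value of Student's t, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < 1.0 - alpha / 2.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Power of a two-sided independent-samples t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2.0)  # noncentrality parameter
    t_crit = t_critical(alpha, df)

    def reject_prob(v):
        s = t_crit * math.sqrt(v / df)
        return (1.0 - PHI(s - ncp)) + PHI(-s - ncp)

    return expect_over_chi2(reject_prob, df)

def n_per_group_for_power(d, target=0.80, alpha=0.05):
    """Smallest equal group size (searching upward from 3) reaching the target."""
    n = 3
    while power_two_sample_t(d, n, alpha) < target:
        n += 1
    return n

n = n_per_group_for_power(0.81)
print("n per group:", n, "total:", 2 * n)
```

For d = 0.81 the search stops at 25 participants per group, that is, a total of 50 across the two conditions.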
Regression Analyses. Finally, to assess whether executive functioning interacted with Order-Disorder to influence creativity, we regressed average creativity on the executive function and experimental condition variables. Furthermore, we added interactions between the condition and all executive function variables. The model did not significantly predict creativity [R² = 0.14, F (13, 80) < 1]. The only effect that remained significant when accounting for all other variables was the interaction between condition and switching cost (RT slowing when switching tasks in a mixed block) [B = -.16, p = .038, Bzero = -.17], where lower switching cost (better executive control) was associated with higher creativity scores in the Disorder condition (simple slope = -0.18, p = 0.085) relative to the Order condition (simple slope = 0.14, p = 0.24) [see Table B and Figure A].
Notes: All variables are grand-mean centered; β = standardized regression coefficients, B = unstandardized regression coefficients; * Shared contributions of the predictors; ∔ Unique contributions of the predictors.
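The relation between the interaction coefficient and the two simple slopes can be made explicit with a little arithmetic. The sketch below assumes that condition was effect-coded (Order = -1, Disorder = +1). The text does not state the coding, so this is an assumption, adopted here because it makes the reported slopes and the reported B agree. Under that coding, the slope of switching cost within a condition equals b_switch + b_interaction × code. The function and variable names are hypothetical.

```python
ORDER, DISORDER = -1, +1  # assumed effect coding, not stated in the text

def simple_slope(b_switch, b_interaction, condition_code):
    """Slope of switching cost on creativity within one condition."""
    return b_switch + b_interaction * condition_code

def coefficients_from_slopes(slope_order, slope_disorder):
    """Recover the main-effect and interaction coefficients from two slopes."""
    b_switch = (slope_disorder + slope_order) / 2
    b_interaction = (slope_disorder - slope_order) / 2
    return b_switch, b_interaction

# Simple slopes as reported: 0.14 (Order) and -0.18 (Disorder).
b_switch, b_interaction = coefficients_from_slopes(0.14, -0.18)
print(round(b_switch, 2), round(b_interaction, 2))
print(round(simple_slope(b_switch, b_interaction, DISORDER), 2))
```

With the reported slopes, the implied interaction coefficient is -0.16, which matches the reported B under the assumed coding.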