The balanced mind: the variability of task-unrelated thoughts predicts error monitoring

Self-generated thoughts unrelated to ongoing activities, also known as “mind-wandering,” make up a substantial portion of our daily lives. Reports of such task-unrelated thoughts (TUTs) predict both poor performance on demanding cognitive tasks and blood-oxygen-level-dependent (BOLD) activity in the default mode network (DMN). However, recent findings suggest that TUTs and the DMN can also facilitate metacognitive abilities and related behaviors. To further understand these relationships, we examined the influence of subjective intensity, ruminative quality, and variability of mind-wandering on response inhibition and monitoring, using the Error Awareness Task (EAT). We expected to replicate links between TUT and reduced inhibition, and explored whether variance in TUT would predict improved error monitoring, reflecting a capacity to balance between internal and external cognition. By analyzing BOLD responses to subjective probes and the EAT, we dissociated contributions of the DMN, executive, and salience networks to task performance. While both response inhibition and online TUT ratings modulated BOLD activity in the medial prefrontal cortex (mPFC) of the DMN, the former recruited a more dorsal area, implying functional segregation. We further found that individual differences in mean TUTs strongly predicted EAT stop accuracy, while TUT variability specifically predicted levels of error awareness. Interestingly, we also observed co-activation of salience and default mode regions during error awareness, supporting a link between monitoring and TUTs. Altogether, our results suggest that although TUT is detrimental to task performance, fluctuations in attention between self-generated and external task-related thought are characteristic of individuals with greater metacognitive monitoring capacity. Achieving a balance between internally and externally oriented thought may thus aid individuals in optimizing their task performance.


INTRODUCTION
Our day-to-day lives are rich with thoughts and feelings that emerge without a direct relationship to the here and now. So-called "task-unrelated thoughts" (TUTs) can be quite variable in content. We might think about our dinner plans while waiting for the bus, or rehearse an important speech in the shower. These self-generated experiences are unique insofar as they are not derived directly from an external stimulus; rather, they form a train of endogenous thoughts, perceptually decoupled from ongoing sensory information and any task being performed (Smallwood, 2013). While such thoughts presumably facilitate goal-oriented behavior over longer time frames, they can also interfere with cognitive performance of tasks in the moment, for example when worrying about a negative social interaction causes us to forget to stop for groceries on the way home from work. An interesting and underexplored question is how TUTs both facilitate and interfere with behavior, and which underlying brain processes support these interactions.
Large-scale thought sampling studies investigating the context and intensity of TUTs suggest that self-generated thoughts may comprise a large part of our daily mental activity (Killingsworth and Gilbert, 2010) and have a complex relationship to psychological well-being, relating to both costs and benefits (Smallwood and Andrews-Hanna, 2013). For example, while increased TUT intensity is commonly reported in attention-deficit disorder and negative affect (Weyandt et al., 2003; Smallwood et al., 2007b; McVay et al., 2008; Marchetti et al., 2012), TUT-related benefits for cognition include creativity, an enhanced memory for personally relevant information, the opportunity to plan for the future (Baird et al., 2011), and a style of decision-making characterized by patience (Smallwood et al., 2013a). One important question in investigations of self-generated thought is therefore what determines whether, for a given individual or context, TUT is associated with costs or benefits.
Investigating self-generated thoughts presents particular methodological difficulties, as their spontaneous nature renders direct experimental manipulation problematic. An established method, used in the present investigation, is to study the experience of TUTs while people perform an external task; an advantage of this approach is that the experiential reports can be validated by a process of triangulation using behavioral, physiological, and subjective measures recorded during the session (Jack and Roepstorff, 2002; Schooler, 2002). Neuroimaging studies have revealed that self-generated cognition is linked to functional activity in the posterior cingulate (pCC) and medial prefrontal cortex (mPFC), central hubs of the default mode network (DMN) (Mason et al., 2007; Christoff et al., 2009). The DMN is a constellation of cortical regions including mPFC, pCC, and inferior parietal cortex (Greicius et al., 2003; Hampson et al., 2006; Andrews-Hanna et al., 2010; Anticevic et al., 2010) that reliably deactivates during cognitively demanding tasks.
While functional connectivity studies suggest that the DMN may be "anti-correlated" with the salience and control related networks (Fox et al., 2005; although see Murphy et al., 2009 for critique), the network also participates in a variety of functional processes important for self-regulation including prospection, episodic memory, and social cognition (Buckner and Carroll, 2007). Several functional magnetic resonance imaging (fMRI) studies have specifically linked activity in the mPFC and the pCC to self-reports of mind-wandering, including one study that found that when participants reported being more aware of their mind-wandering, executive and DMN nodes co-activated (Mason et al., 2007; Christoff et al., 2009; Stawarczyk et al., 2011). Correlation between the control network and the DMN is also observed during autobiographical planning (Spreng et al., 2010). The extent and nature of interactions between these networks and their contribution to the costs and benefits of TUTs are currently unclear.
Consistent with a general functional role of self-generated thought, the mPFC is typically activated when thinking about the self and when making judgments about others (Mitchell, 2009). In addition, the medial and rostral-lateral portions of the PFC are also implicated in social cognition and metacognitive problem solving (Burgess et al., 2007; Dumontheil et al., 2010a,b) and individual differences in the volume of the rostral-lateral PFC predict metacognitive ability (Fleming et al., 2010). Metacognition supports flexible problem solving, and is thought to both monitor and control internal and external attention (Flavell, 1979; Fleming et al., 2012). Consistent with this notion, a recent resting state study by Baird et al. (2013) demonstrated that the medial section of the frontal pole shows increased functional integration with regions of the DMN for individuals with greater metacognitive performance on a memory task, while lateral regions of the mPFC predicted improved metacognition of perceptual processes. Together, evidence on both self-generated thought and metacognition converges on the notion that the mPFC supports such processes, including those necessary to navigate complex social interactions (Amodio and Frith, 2006; Frith and Frith, 2012). Metacognitive monitoring enables an individual to correct problems in task performance, facilitating flexible responses. As such, a key aim of the current experiment was to both replicate the well-documented impairment of task performance by TUTs and also establish whether the monitoring of concurrent performance might be similarly impaired.
Error monitoring in the context of response inhibition is an extensively researched metacognitive ability, beginning with early work suggesting that correction of errors can occur as early as 200 ms post-error, before conscious awareness of having committed a mistake (Rabbitt, 1966, 2002). Although a variety of experimental methods exist for eliciting awareness of errors, a common difficulty relates to eliciting sufficient aware and unaware errors for comparison, due to an inverse relationship between task difficulty and awareness (e.g., participants typically make few errors with high overall awareness on easy tasks, or many errors with little awareness on difficult tasks). One effective experimental paradigm is the Error Awareness Task (EAT), in which a participant must inhibit their responses according to two competing stimulus rules (Hester et al., 2005, 2012; O'Connell et al., 2009). Prior studies have shown that awareness of errors in the EAT depends upon a distributed neural network including anterior insula, cingulate cortices, and medial frontal gyrus (Hester et al., 2005, 2012; Ullsperger et al., 2010). Here we used the EAT to distinguish the contributions of particular neural systems, including the salience network and the DMN, to task performance, error monitoring, and TUTs. Additionally, we explored whether particular aspects of mind-wandering, such as its intensity and variability, would predict error monitoring performance.
To measure the experience of TUT we embedded experience sampling probes within the EAT, prompting participants to rate the subjective intensity of TUTs in the preceding interval. Previous investigations have utilized this approach in both behavioral and neuroimaging (Christoff et al., 2009) research. Although it is unclear whether TUTs should be treated as a continuous or dichotomous state (although see Schad et al., 2012 for evidence of the former), our approach allows the estimation of the intensity of subjective experience within a given period. This offers the advantage of a metric that combines both the temporal occurrence and subjective intensity of different aspects of mind-wandering. We also explored a phenomenological distinction concerning the self-absorbing nature of TUTs. To do so, participants were trained to rate both the intensity of TUTs and the subjective "stickiness" of these experiences. We defined TUT stickiness as recurring thoughts that absorb attention or meta-awareness beyond their intrinsic frequency. Our aim was to dissociate the intensity of TUTs from their ability to absorb awareness, leading to rumination.
In addition to examining overall TUT and stickiness rates, we were also interested in the variability of these experiences. Within-subject variability reflects important dynamical aspects of cognition and experience, reflecting discrete state transitions (Varela et al., 2001; Lutz et al., 2002). While analysis of self-reported mean TUT yields information relating to the contents of introspective subjective awareness, we reasoned that variability in TUT report usage might reflect underlying trends in TUT experience not necessarily open to direct self-report. Previous research has demonstrated a link between reaction time variability during sustained attention tasks and attentional instability (Larson and Alderton, 1990; Stuss et al., 1994; Hultsch et al., 2002). Increased reaction time variability during sustained attention tasks is also predictive of psychopathology; people with ADHD typically show alterations in RT variance even when there are no discernible differences in mean RT (Leth-Steensen et al., 2000; Vaurio et al., 2009; Epstein et al., 2011). However, variability of reaction time can also be adaptive, for instance in slowing responses following error awareness (Shalgi et al., 2007), and variation in RT is reduced following intensive attention training (Lutz et al., 2009). Following commission errors, participants rapidly decelerate motor responses (i.e., post-error slowing). This source of RT variability is thought to reflect flexible monitoring behavior, and is reduced both in patients with ADHD (Shiels et al., 2012) and following a negative mood induction, a period when the intensity of TUTs increases. As self-generated thought has both costs and benefits (Smallwood and Andrews-Hanna, 2013), it is conceivable that a highly variable style of thinking in which neither TUT nor task-related thought unduly dominates cognition could support flexible and adaptive cognitive performance.
In summary, the present experiment examined the relationship amongst within- and between-subject variability in TUT intensity and stickiness, response inhibition accuracy, and the awareness of consequent mistakes. Based on prior research, we expected increased TUT to be associated with worse inhibition performance and to engage prefrontal nodes of the DMN. The main focus of the experiment, however, was to ascertain the relationship between TUT and metacognition, which was assessed by measuring self-reported stickiness and error monitoring. One possibility is that better metacognition increases the ability to regulate mind-wandering (Schooler, 2002), and so for motivated participants under demanding conditions, the capacity to detect errors should correlate with reduced TUT. Alternatively, if TUT variability reflects flexible shifting between internal and external information, balancing both self-generated and perceptually directed thought, then we expected to find greater variability in TUT associated with increased error awareness.

STUDY PARTICIPANTS
42 participants (27 females) were recruited from an online participant pool system in Aarhus, Denmark, from both the local university and community. The average age of participants was 34.8 years (±0.9 SEM, range = 25-47 years), with 17.6 mean years education (±0.5 SEM, range = 10-23 years). All procedures were approved by the local research ethics committee, De Videnskabsetiske Komitéer for Region Midtjylland, in accordance with the declaration of Helsinki. As part of a separate investigation concerning the impact of mindfulness on EAT and visual sensitivity, half of our participants (n = 21) were mindfulness meditation practitioners recruited locally using flyers and an online participant pool (Sona-Systems Experiment Management Software).
Here we were specifically interested in how individual differences in TUT experience and variability would predict EAT performance; inclusion of meditation practitioners in our sample was thus used as a strategy to maximize TUT-related variability within the sample. To ensure that our present findings were not biased by systematic group differences, all analyses were conducted using group status as a nuisance covariate. Specific group contrasts are not examined here, although they will be reported in a follow-up investigation of the impact of mindfulness training on EAT performance and visual sensitivity. Groups were matched for age (mean age meditation = 35.1 years, mean age control = 34.6 years), gender (meditation = 14 males, control = 15 males), and education (controls mean education = 16.5 years; meditation mean education = 18.6 years). In our meditation study we aimed to specifically sample "adept" practitioners; inclusion criteria specified that participants must have practiced for at least 20 min per day, a minimum of 3 times per week, over the two years prior to the study, and have attended at least 1 meditation retreat in the previous year (mean hours practiced = 1303.6).
All fMRI scans were acquired over a one-week period following enrollment in the study. Participation in the fMRI scan was incentivized with a 200DKK (approximately $35 USD) reimbursement, and to control motivation all participants were instructed that the top 1/3rd of scores on the scanning task would receive an additional 200DKK (Jensen et al., 2012).

EXPERIMENTAL PROCEDURES
Before scanning, participants were informed that the purpose of the study was to investigate individual differences in their attentional ability. Participants visited the lab twice, once to provide informed consent and complete a psychophysical vision sensitivity test (data not reported here) and again to complete the fMRI scan. Specifically, the psychophysics test was the "theory of visual attention task" (TAVT). This measure was included to replicate a previous result that meditation experience improves TAVT performance irrespective of motivation levels (Jensen et al., 2012) and is thus not analyzed here. Participants completed 6 runs of the EAT within the scanner, ∼45 min in total. Immediately following the scan, participants completed a debriefing survey, rating (0-100) their experienced difficulty, interest in the task, task effort expended, and self-estimated stop accuracy and error awareness. These measures were included as part of the meditation study to investigate the role of perceived effort, interest, and retrospective metacognition in detected group differences. They are presented here as overall summary measures indicating general participant engagement with the task. See Table 1 below for descriptive statistics of these measures.

Error awareness fMRI task
To assess individual differences in response-inhibition and error monitoring, we adapted the delayed-response EAT from Hester et al. (2005, 2009, 2012) and Shalgi et al. (2007) (see Figure 1 for a task schematic). The EAT requires participants to respond to a serial presentation of color-words in incongruent font colors (i.e., the word "blue" colored red). Participants were instructed to respond to Go trials by pressing the "1" button on a 2-button box with the right index finger. No-go trials in which participants were instructed to withhold their response ("stop") occurred on approximately 11% of the total trials, according to two stop rules, "repeat" and "color-match" (Hester et al., 2012). In the latter, participants were required to stop whenever a word was presented in matching font-color, and in the former whenever a word was repeated on two consecutive trials. Each trial consisted of a stimulus (600 ms) followed by an interstimulus interval (900 ms). Participants completed 6 runs of the EAT within the scanner, each consisting of 200 Go and 25 pseudo-randomly intermixed Stop trials, for a total of 1350 trials. In order to maximize unaware errors and mind-wandering, participants were trained to respond during the interstimulus interval, emphasizing accuracy and timing consistency over absolute response speed, increasing the repetitive nature of the Go task as in Shalgi et al. (2007). Response timing has been used previously on the EAT task and typically reduces the intersubject variability of responses (Hester et al., 2005). In the case of a commission error, on the trial immediately following that error participants were instructed to forgo their normal Go response and to instead "fix" their error using a second button (right middle finger), indicating error awareness. At pseudo-random points throughout the task, participants were asked to answer probes regarding their experience during the task using the 1 (right index finger) and 2 (right middle finger) buttons (see below). Each probe lasted up to 6 s in total, followed by a fixation cross (duration = 6 s minus probe duration).
FIGURE 1 | The EAT task with interleaved thought probes. Adapted with permission from Hester et al. (2012). Participants respond by pressing the left button (L) during Go trials and withhold from responding (−) to repeated or font-color matching words. Following commission errors, participants are trained to forgo the normal Go response to instead press the right button "R" indicating error awareness for that trial. Pseudo-randomly intermixed "thought probes" prompted participants to rate the intensity of TUTs and their "stickiness" in the pre-probe interval. See Methods for detailed overview of task timing and instructions.
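The trial counts and timings given above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the run duration it computes excludes thought probes and any scanner lead-in time, neither of which is specified here.

```python
# Task structure of the EAT as stated in the text (names are hypothetical).
STIMULUS_MS = 600      # stimulus duration
ISI_MS = 900           # interstimulus interval
GO_PER_RUN = 200
STOP_PER_RUN = 25
N_RUNS = 6
PROBE_WINDOW_S = 6.0   # probe plus fixation cross always fill 6 s

trial_ms = STIMULUS_MS + ISI_MS
trials_per_run = GO_PER_RUN + STOP_PER_RUN
total_trials = trials_per_run * N_RUNS
stop_proportion = STOP_PER_RUN / trials_per_run
# Duration of the trial stream alone; probes are NOT included (assumption).
run_seconds_without_probes = trials_per_run * trial_ms / 1000


def fixation_duration(probe_duration_s, window_s=PROBE_WINDOW_S):
    """Fixation time that pads a probe out to the fixed window."""
    if not 0.0 <= probe_duration_s <= window_s:
        raise ValueError("probe duration must lie within the probe window")
    return window_s - probe_duration_s
```

Under these numbers each trial lasts 1.5 s, stop trials make up 25 of 225 trials per run (about 11%), and the six runs sum to 1350 trials, matching the figures reported in the text.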
Because the occurrence of TUTs is negatively related to task difficulty (Christoff et al., 2009), task-overlearning was promoted prior to scanning by training all participants on 2-4 practice runs of the EAT until a minimum of 40% stop and 40% error awareness rate was reached. To control participants' motivation to perform the task, participants were instructed that they would gain an additional 200 DKK (about 35 USD) if they were within the top 1/3rd of EAT performance. Participants were further reminded that "fixing" commission errors via the report button would cause those errors to not count against their total score, to ensure that participants did not selectively focus on stopping to the exclusion of error reports. Individual stop accuracy and error awareness scores were determined for each participant. Stop accuracy was calculated as the ratio of correctly withheld stop trials over the total number of stop trials (correct stops/total stops), and error awareness as the ratio of the number of reported error trials over the total number of error trials (aware errors/total errors). Go accuracy was calculated as the ratio of the total number of correct responses (i.e., reaction time > 0 in a Go trial) over the total number of Go trials, excluding trials following errors (which are confounded by the error reporting response). Mean TUT and stickiness scores, as well as within-subject standard deviations, were calculated for each participant as measures of the average content and variance of each subjective dimension.
Prior to analysis, participants with extremely low stop accuracy (indicating a failure to correctly perform the task) were identified; one participant with <50% accuracy was excluded from all subsequent EAT-related analyses. As overall performance was generally high, three participants had too few errors (<5) to be included in error-related behavioral analyses and were excluded (O'Connell et al., 2009; Hester et al., 2012). Finally, for our error-related fMRI analysis, 2 additional participants with 0 aware errors were excluded. A total of 6 participants were thus excluded from the error-related fMRI analysis.
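As a concrete illustration of these behavioral scores, the following minimal sketch computes them from per-participant counts. The function names and the trial representation (a list of `(rt_ms, follows_error)` tuples, with `None` for a missing response) are hypothetical, and the use of the sample standard deviation for rating variability is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def stop_accuracy(correct_stops, total_stops):
    """Proportion of stop trials on which the response was withheld."""
    return correct_stops / total_stops


def error_awareness(aware_errors, total_errors):
    """Proportion of commission errors that the participant reported."""
    return aware_errors / total_errors


def go_accuracy(go_trials):
    """Proportion of Go trials with a response (RT > 0).

    go_trials: list of (rt_ms, follows_error) tuples; rt_ms is None when
    no response was made. Trials that follow an error are skipped because
    the error-report button press replaces the normal Go response.
    """
    scored = [rt for rt, follows_error in go_trials if not follows_error]
    correct = [rt for rt in scored if rt is not None and rt > 0]
    return len(correct) / len(scored)


def rating_summary(ratings):
    """Mean and within-subject SD (sample SD, an assumption) of probe ratings."""
    return mean(ratings), stdev(ratings)
```

For example, a participant who withheld 20 of 25 stop responses and reported 3 of 5 errors would score 0.8 for stop accuracy and 0.6 for error awareness.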

Subjective mind-wandering reports
Subjective awareness of TUTs was assessed in a similar fashion to Christoff et al. (2009), using interleaved "thought probes" distributed pseudo-randomly throughout the EAT task. Each probe consisted of two questions, one evaluating the subjective intensity of TUTs, "Where was your attention focused just before the probe?" and the second evaluating the "sticky" or ruminative quality of mind-wandering, "How sticky were your TUTs just before the probe?" Participants responded using a 7-point scale ranging from "completely offtask" to "completely ontask" for the first and "completely sticky" to "not at all sticky" for the second. As we were interested in investigating particular phenomenological properties of mind-wandering, we combined two previous approaches to sampling task-irrelevant thoughts. First, to minimize differences in scale properties, we utilized a TUT scale in which participants rated their pre-probe thoughts as "ontask" or "offtask" as in Christoff et al. (2009). However, to specifically operationalize the subjective intensity of mind-wandering, we adapted phenomenological descriptions of TUTs from Mason et al. (2007). Participants were thus instructed that being "ontask" specifically meant that they had a "low frequency of task irrelevant thoughts," with task-irrelevant thoughts being defined as "any that do not facilitate performance and are not immediate reactions to perceptual information gleaned over the course of a trial." All participants indicated understanding that ratings on the ontask/offtask scale corresponded to this definition. Examples of task-relevant thoughts were given, such as those concerning the color of a word on the previous trial.
By fixing this subjective dimension, we aimed to stress the possibility of dissociation between high-intensity TUTs with little impact on participants' metacognitive capacity and high-intensity TUTs that fully absorbed the participants' attention (i.e., "sticky" TUTs). Participants were therefore instructed to rate the "stickiness" of their task-irrelevant thoughts, with sticky thoughts defined as those that "distract (the participant) for a greater period of time, and are more attention catching than other task-irrelevant thoughts; this experience is sometimes described as being 'lost in thought'." Stickiness was thus included to explore whether ruminative and absorptive TUT, compared to non-ruminative and non-absorptive TUT, differentially impacts sustained attention and error awareness (Koster et al., 2011; Van Vugt et al., 2012). Examples were given to emphasize the decoupled nature of sticky and unsticky task-irrelevant thoughts; participants were instructed that, for example, certain thoughts might arise ("What am I having for dinner tonight?") but be relatively non-distracting from the task, whereas others ("Did I leave my oven on?") might recur frequently throughout the task and demand more attention. Participants were provided with further examples until they indicated a good understanding of the distinction, and completed practice probes during the EAT training session. During scanning, participants completed 26 probes in total, 4-5 in each of the 6 EAT runs, with one probe event ("focus" and "sticky") occurring pseudo-randomly every 40-60 trials throughout the EAT. Each probe appeared for a maximum of 6 s followed by a fixation cross lasting up to 6 s depending on probe reaction time (i.e., 6 s minus probe duration). Because sticky thoughts depend on having some TUTs, we did not counter-balance the order of the focus and sticky questions. Stickiness was thus operationalized as a second-order judgment on the quality of those TUTs reported in the first probe.

Auxiliary recordings
As the BOLD signal reflects complicated neurovascular coupling, a considerable portion of BOLD variability can be explained by non-neural origins such as respiratory and cardiovascular fluctuation (Glover et al., 2000; Lund et al., 2006). Previous investigations have shown that regions implicated in both error monitoring (e.g., insula and cingulate) and mind-wandering (mPFC) are among the most susceptible to such artifacts, with as much as 8% of event-related variance being explained by these sources. To exclude such confounds and improve overall signal-to-noise ratio, we recorded both respiration and pulse in parallel with EPI image acquisition, in order to apply a nuisance variable regression approach to modeling serial correlations in the BOLD time series (Lund et al., 2006). During the functional MRI acquisition, the cardiac and respiratory cycles were recorded with an infrared pulse oximeter on the participant's index finger and a pneumatic thoracic belt, respectively.
All pulse and respiration time series were visually examined for acquisition artifacts (e.g., clipping, drop-out). Due to technical failure of the respiration belt, respiration time series were severely confounded and discarded from further analysis. While inclusion of both respiratory and pulse regressors has been shown to provide an optimal estimation of serial correlation, inclusion of pulse and motion regressors without respiration has been shown to also outperform standard autoregressive ("AR1") noise-whitening techniques, particularly at faster repetition times (e.g., TR < 4 s) (Lund et al., 2006). Descriptive statistics (mean heartbeats per min) were calculated for each subject. One participant's physiological data were lost due to technical failure, and this participant was hence excluded from all fMRI analyses.
Frontiers in Human Neuroscience www.frontiersin.org November 2013 | Volume 7 | Article 743 | 5

fMRI acquisition protocols and preprocessing
Echo-planar images (EPI) were acquired at the Aarhus University Hospital, using a T2*-weighted gradient echo sequence on a 3 Tesla (Siemens Trio) scanner equipped with a 32-channel head coil. EPI images were acquired in an interleaved slice acquisition order (TR = 2000 ms, TE = 30 ms, flip angle = 90°, 47 slices of 3 mm thickness, voxel size of 3 × 3 × 3 mm, FOV = 192 × 192 mm). Soft cushions were used to minimize head movement. All fMRI preprocessing and data analyses were performed in SPM8 (version 4667) (Friston et al., 2006). Default settings were used throughout, unless otherwise specified. The functional images of each participant were realigned and resliced, spatially normalized to MNI space using the SPM EPI template and trilinear interpolation (Ashburner and Friston, 1999), and smoothed using an 8 mm full-width half-maximum (FWHM) smoothing kernel (Worsley and Friston, 1995; Friston et al., 2000). Serial correlations were modeled using a nuisance variable regression approach (Lund et al., 2006). In addition to the SPM8 standard discrete cosine set high pass filter (128 s cut off), this approach includes 10 regressors based on cardiac and/or respiratory oscillations (Glover et al., 2000) and 6 motion parameters obtained from the realignment algorithm.
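To make the high-pass step more concrete, the sketch below builds a discrete cosine basis set for a 128 s cutoff at TR = 2 s. It is an illustration of the general technique, not a copy of the SPM8 code: the rule used to choose the number of basis functions is an assumption modeled on common practice, the function name is hypothetical, and the value of 200 scans is invented for the example because the number of volumes is not reported here.

```python
import math


def dct_highpass_basis(n_scans, tr_s, cutoff_s=128.0):
    """Return low-frequency DCT-II regressors as a list of columns.

    Each column has n_scans entries. The constant term is omitted, so
    regressing these columns out of a time series removes slow drifts
    with periods longer than roughly cutoff_s seconds.
    """
    # Number of cosine components including the constant (assumed rule).
    n_components = int(2.0 * n_scans * tr_s / cutoff_s + 1.0)
    scale = math.sqrt(2.0 / n_scans)
    return [
        [
            scale * math.cos(math.pi * (2 * t + 1) * k / (2.0 * n_scans))
            for t in range(n_scans)
        ]
        for k in range(1, n_components)
    ]


# Hypothetical run length: 200 volumes at TR = 2 s.
basis = dct_highpass_basis(n_scans=200, tr_s=2.0)

# Nuisance columns described in the text, in addition to the drift terms.
N_CARDIAC = 10
N_MOTION = 6
n_nuisance = N_CARDIAC + N_MOTION
```

The drift columns are mutually orthogonal and each sums to zero, so they do not absorb the mean signal; the 16 physiological and motion regressors would then be appended to the design matrix alongside the conditions of interest.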

Error awareness task-reaction times and accuracy
To compare our results with previous experiments using the EAT, we analyzed accuracy and reaction time values across conditions. Go reaction times for each participant were calculated as the mean of all correct Go trials, excluding responses 2 SD below the participant's mean Go RT. For comparison to previous experiments with the EAT, stop accuracy, error awareness, and mean reaction times were calculated for each subcategory of stop, i.e., color and repeat stop accuracy, color and repeat aware/unaware errors, and RT to color and repeat stop errors. Mean reaction times were inspected for values ±2 SD from the mean, resulting in the removal of 3 participants' RT data. Reaction times were entered into one-way repeated measures ANOVAs with the within-subject factor Response type (Go, Aware, Unaware) to assess differences in response speed across conditions. Finally, we analyzed Stop Accuracy and Error Awareness for each error subcategory (Repeat, Color) in separate one-way repeated measures ANOVAs.
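The two trimming steps described above can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up reaction times; it assumes the sample standard deviation and treats "2 SD below the mean" as a lower cutoff of mean minus 2 SD.

```python
from statistics import mean, stdev


def trimmed_go_rt(rts_ms, n_sd=2.0):
    """Mean Go RT after dropping responses more than n_sd SDs below the mean."""
    floor = mean(rts_ms) - n_sd * stdev(rts_ms)
    kept = [rt for rt in rts_ms if rt >= floor]
    return mean(kept), kept


def flag_outlier_participants(mean_rts_ms, n_sd=2.0):
    """Indices of participants whose mean RT lies beyond +/- n_sd SDs."""
    m, sd = mean(mean_rts_ms), stdev(mean_rts_ms)
    return [i for i, rt in enumerate(mean_rts_ms) if abs(rt - m) > n_sd * sd]


# Made-up reaction times (ms) with one anticipatory response.
example_rts = [500, 510, 490, 505, 495, 500, 100, 502, 498, 507]
example_mean, example_kept = trimmed_go_rt(example_rts)
```

In the example the 100 ms response falls below the cutoff and is dropped, so the trimmed mean is computed over the remaining nine trials.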

Mind-wandering and behavior
Our first aim was to replicate previously reported relationships between performance and TUTs. To establish whether or not TUT ratings were utilized as a continuous or discrete measure, we created response histograms across all collected ratings, which showed a clear continuous distribution indicating that participants did not treat the scales as discrete binary measures (Figure 2). As TUT variance and stickiness had not been previously investigated, we then conducted a cross-correlation analysis to determine measurement collinearity. All scores were converted into Z-scores at the group level. Correlations between TUT Mean, TUT Variance, Stickiness Mean, Stickiness Variance, error awareness and stop accuracy were calculated (see Table 1). TUT Mean and Stickiness Mean were highly correlated (r = 0.78, p < 0.001). TUT Mean also predicted stop accuracy (r = −0.56, p < 0.001). TUT Variance was correlated with Stickiness Variance (r = 0.65, p <
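The standardization and correlation steps of this analysis can be sketched as below. This is a minimal illustration rather than the analysis code used in the study: the function names are hypothetical, and z-scores are computed with the sample standard deviation, which is an assumption.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize scores across participants (sample SD, an assumption)."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation computed from z-scored variables."""
    if len(x) != len(y):
        raise ValueError("x and y must contain one score per participant")
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)
```

Because both variables are standardized first, the correlation reduces to the average product of paired z-scores, with n − 1 in the denominator to match the sample standard deviation.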

fMRI analysis-single subject level
Following preprocessing, functional BOLD data were analyzed using an event-related hierarchical general linear modeling approach (Friston et al., 1994). Sessions were first concatenated and then entered into a first level design matrix modeling fixed linear effects over the entire time series. Each fMRI time series was modeled using 3 event-related regressors (duration = 0 s), one for each condition of interest, in order: Correct Stop Trials, Unaware Errors, Aware Errors, as well as a separate TUT probe regressor (30 s duration epoch) beginning 37.5 s (25 trials) before each thought probe occurred, and a parametric modulation of the probe regressor encoding the rating for that probe. Due to the high correlation between TUT and stickiness ratings, we estimated two separate models, one with the mean TUT rating only and one with the stickiness rating only. Stop trials were modeled as the onset of each correct stop. Aware and unaware error trials were modeled as the onset of the trial in which the error occurred. In the EAT, Go trials are commonly left unmodeled as implicit baseline (Hester et al., 2005, 2012; O'Connell et al., 2009). The onset of each probe block regressor was jittered by approximately ±1 s. Probe regressors thus modeled task-related activity in the pre-probe interval and the linear modulatory effect of self-reported TUT intensity during that period. In addition to conditions of interest, each session model included 10 pulse and 6 motion-realignment nuisance regressors, to model confounding effects of these parameters. Session offsets modeled between-session variance. Fixed effects of interest were identified using unidirectional t-contrasts for correct stop trials [1 0 0 0 ...], aware > unaware errors [0 −1 1 0 ...], and linear correlation with the TUT report parameter [0 0 0 0 1].
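The timing of the pre-probe epoch and the layout of the contrast vectors can be made explicit with a short sketch. The column ordering below follows the order given in the text; the names, the example probe onset, and the omission of session-offset columns are simplifying assumptions made for illustration.

```python
TRIAL_S = 1.5        # 600 ms stimulus + 900 ms interstimulus interval
LEAD_TRIALS = 25     # epoch starts this many trials before the probe
EPOCH_S = 30.0       # duration of the pre-probe epoch

# Assumed column order: the five regressors of interest, then nuisance terms.
COLUMNS = ["correct_stop", "unaware_error", "aware_error",
           "probe_epoch", "probe_x_rating"]
N_NUISANCE = 10 + 6  # pulse + motion regressors (session offsets omitted)
N_COLUMNS = len(COLUMNS) + N_NUISANCE


def probe_epoch(probe_onset_s):
    """Start and end (in seconds) of the epoch modeled before a probe."""
    start = probe_onset_s - LEAD_TRIALS * TRIAL_S
    return start, start + EPOCH_S


def pad_contrast(weights, n_columns=N_COLUMNS):
    """Zero-pad contrast weights so they span every design-matrix column."""
    if len(weights) > n_columns:
        raise ValueError("more weights than design-matrix columns")
    return list(weights) + [0] * (n_columns - len(weights))


correct_stop = pad_contrast([1])
aware_gt_unaware = pad_contrast([0, -1, 1])
tut_rating = pad_contrast([0, 0, 0, 0, 1])
```

With these numbers the epoch for a probe at 100 s would run from 62.5 s to 92.5 s, so the modeled window ends 7.5 s before the probe itself appears.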
As is common in error awareness paradigms, participants generally report substantially fewer errors than they commit (see Table 2 for summary of average total errors in each condition), leading to concerns that comparison of aware and unaware events may be unduly biased. Previous use of the EAT (Hester et al., 2005) has demonstrated that the aware vs. unaware analysis is not biased toward activity from aware errors.

fMRI analysis-group level
Our random-effects (RFX) analysis focused on three contrasts: correct stops (vs. baseline), aware vs. unaware errors, and the negative correlation of TUT reports and task-related BOLD activity. All RFX analyses were conducted by passing each participant's corresponding contrast image (stops, aware > unaware, TUT ratings) to a one-sample t-test. These contrasts were corrected for multiple comparisons using Gaussian random-field theory, peak level family-wise error threshold (FWE) p < 0.05 (Worsley et al., 1996). For our analysis of BOLD correlation with task-unrelated thoughts, a mask of the DMN was created by conducting an automated meta-analysis on the Neurosynth database for "mPFC" (http://neurosynth.org) (Yarkoni et al., 2011). As the TUT parameter encoded the intensity of TUTs (1-7) prior to each probe, low values encoded higher levels of TUT. Thus, at the group contrast level, we tested for areas where greater reports of TUT predicted higher BOLD activity (negative correlation with TUT ratings and BOLD). To restrict the mask to primary clusters in mPFC, pCC, and inferior parietal lobes, the downloaded NIfTI image was binarized at a Z-score > 4 threshold. The resulting mask (see Figure 5) was visually inspected to confirm that it provided good coverage of key DMN nodes, particularly in mPFC and pCC, and was subsequently applied in a region of interest analysis using the Wake Forest University (WFU) Pickatlas toolbox v3.0 (Maldjian et al., 2003, 2004), cluster-level corrected for multiple comparisons, voxel selection threshold p = 0.01, pFWE < 0.05 (Hayasaka et al., 2004). To ensure BOLD results were not biased by group status, all random effects contrasts included group status as a covariate of no interest.
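The mask thresholding and one-sample t-test described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions and not the toolbox implementation: images are represented as flat lists of voxel values, all data are invented, the function names are hypothetical, and both the group-status covariate and the correction for multiple comparisons are omitted.

```python
from math import sqrt
from statistics import mean, stdev

Z_THRESHOLD = 4.0  # from the text: mask binarized at Z-score > 4


def binarize_mask(z_values, threshold=Z_THRESHOLD):
    """Return 1 for voxels whose meta-analytic Z exceeds the threshold, else 0."""
    return [1 if z > threshold else 0 for z in z_values]


def one_sample_t(values):
    """One-sample t statistic against zero: mean / (sd / sqrt(n))."""
    n = len(values)
    return mean(values) / (stdev(values) / sqrt(n))


def masked_t_map(contrast_by_participant, mask):
    """Compute a t value per in-mask voxel across participants.

    contrast_by_participant: one list of voxel values per participant.
    Voxels outside the mask are returned as None.
    """
    t_map = []
    for voxel, inside in enumerate(mask):
        if inside:
            across_subjects = [img[voxel] for img in contrast_by_participant]
            t_map.append(one_sample_t(across_subjects))
        else:
            t_map.append(None)
    return t_map


# Hypothetical data: 3 voxels, 4 participants (illustration only).
neurosynth_z = [5.2, 3.1, 4.6]
tut_contrasts = [
    [-0.9, 0.2, -0.4],
    [-1.1, 0.1, -0.6],
    [-0.7, -0.3, -0.2],
    [-1.3, 0.4, -0.5],
]

mask = binarize_mask(neurosynth_z)
print(mask)
print(masked_t_map(tut_contrasts, mask))
```

Because low ratings encoded higher levels of TUT, negative t values in such a map would correspond to voxels where more TUT goes with higher BOLD activity.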
There was a significant effect of response condition on reaction time (RT), F (1.69, 57.77) = 5.17, p = 0.012; post-hoc comparisons demonstrated that this effect was driven by a significant difference between Go RT (M = 1103.32 ms, SD = 11.49 ms) and Unaware Error RTs (M = 1060.31 ms, SD = 17.31 ms), mean difference = −43.0 ms, p = 0.008. This decrease in RT during unaware errors is consistent with prior studies showing a link between task automaticity and mind-wandering (Smallwood et al., 2007a, 2008b) [F (4, 38) = 3.21, p = 0.025], respectively. Within the model predicting stop accuracy, the mean mind-wandering composite was a significant predictor, β = −3.49, p = 0.001, as well as group status, β = 0.31, p = 0.026. Conversely, within the model predicting error awareness, only the mind-wandering variance composite significantly predicted EA, β = 0.44, p = 0.010. The observed difference between the inclusive and composite models suggests that TUT Mean-related variance (as opposed to Stickiness Mean) was the primary predictor for SA. Importantly, EA exhibited strong zero-order correlations with TUT Variance and showed no correlation with Stickiness Variance.
Thus, including stickiness in the overall model only reduced model sensitivity, as shown by the significant effect of the variance composite on EA, suggesting that only the unique TUT-related variance predicts EA (i.e., the only observed impact of modeling Stickiness Variance can be explained by the reduced degrees of freedom for that model). We thus report the composite here for completeness, noting that the high multicollinearity of the Stickiness and TUT variance likely reduces our ability to distinguish them in a regression model (Farrar and Glauber, 1967). These results suggest that individual differences in the average and variability of mind-wandering are specific predictors of EA and SA ability, with increasing absorption in internal thought predicting worse stop performance, and higher levels of variability predicting greater error awareness. See Figure 2 for plots of these relationships.
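The individual-difference measures used in these analyses (each participant's mean and variance of probe ratings, z-scored at the group level and related to performance) can be illustrated with a short Python sketch. All numbers below are invented, the function names are hypothetical, and the use of the sample (n − 1) variance is an assumption, since the exact estimator is not specified in the text. The sketch shows only zero-order correlations, not the regression models reported above.

```python
from statistics import mean, stdev, variance


def summarize(ratings):
    """Per-participant mean and sample variance of probe ratings."""
    return mean(ratings), variance(ratings)


def zscore(values):
    """Standardize scores across the group (sample SD, assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Pearson correlation as the average product of z-scores (n - 1 denominator)."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical probe ratings (1-7) for four participants, illustration only.
probe_ratings = [
    [2, 3, 2, 3],
    [1, 6, 2, 7],
    [4, 4, 5, 4],
    [1, 7, 3, 6],
]
# Hypothetical error awareness scores (proportion of errors reported).
error_awareness = [0.55, 0.80, 0.50, 0.85]

tut_mean = [summarize(r)[0] for r in probe_ratings]
tut_variance = [summarize(r)[1] for r in probe_ratings]

print(pearson_r(tut_variance, error_awareness))
print(pearson_r(tut_mean, error_awareness))
```

Separating the mean from the variance of each participant's ratings is what allows overall TUT level and TUT variability to be tested as distinct predictors.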

fMRI-OVERALL RESPONSES TO STOP AND AWARE vs. UNAWARE ERRORS
Across participants, correct stops elicited significant BOLD activations throughout the canonical motor inhibition network, including bilateral anterior insula, superior parietal lobes, supplementary motor areas, and bilateral putamen (Hester et al., 2005; Wager et al., 2005; Verbruggen and Logan, 2008). Robust deactivations were observed in the DMN (dorsal mPFC and precuneus) as well as primary and secondary visual cortices (see Tables 3, 4, Figures 3, 4, for a complete summary of Stop-related responses). The Aware > Unaware contrast revealed significant activations in the salience and frontal-parietal attention networks (Seeley et al., 2007), including right anterior insula, thalamus, caudate nucleus, mid-cingulate cortex, middle frontal gyrus, and bilateral dorsolateral/rostral prefrontal cortex. Interestingly, we also observed significant activation of bilateral inferior parietal cortex, a region of the DMN, during aware errors. See Table 5 and Figure 5 for a complete summary of Aware > Unaware responses.

fMRI-BOLD CORRELATION WITH TUT REPORTS
We found significant correlations between probe-related BOLD signal and TUT intensity reports in clusters located in the mPFC (pFWE = 0.038, k = 685), pCC (p = 0.004, k = 11), and superior parietal lobe (p = 0.004, k = 9) (see Figure 4, Table 6). Only the mPFC cluster survived correction for multiple comparisons. No significant clusters correlating positively with TUT were found. To compare spatial overlap, results from the TUT and Stop analyses were overlaid on a single image. This comparison suggested that while correct stops primarily elicited deactivations in a dorsal region of mPFC, TUT intensity predicted activity in a more rostral region (Figure 4). We found no significant correlations between probe-related BOLD signal and TUT stickiness.

DISCUSSION
Consistent with previous fMRI work (Mason et al., 2007; Christoff et al., 2009; Stawarczyk et al., 2011), we found that during the EAT, intervals in which participants reported more frequent TUT predicted significant BOLD signal increases in the mPFC. We also found that correct stop trials were characterized by both deactivations in the dorsal mPFC and pCC, and increased activity in the motor inhibition and salience networks, suggesting an important interaction between these networks during cognitive control (Allen and Williams, 2011). Interestingly, the portion of the mPFC deactivated by inhibition was in a nonoverlapping portion of midline cortex, dorsal to those voxels showing correlated responses to TUT intensity. While much of the research on task performance and the DMN has focused on their mutual antagonism, this finding may support some degree of functional segregation between self-generated thoughts and executive-related prefrontal inhibition. Additionally, we found that during error awareness the inferior parietal cortex, a region of the DMN, was recruited along with common salience-related regions such as insula and cingulate cortex. Coupled with our finding that mind-wandering variability predicts error monitoring performance, these results suggest that the relationship between task performance and self-generated thought may be more nuanced than mere antagonism.
[Figure caption, truncated in source: see Table 3 for a complete list of foci.]
FIGURE 4 | DMN deactivations during stop trials (blue, top) and correlation with TUT reports (red, top), and mask used for ROI analysis (red, bottom). Note that while both task-unrelated thoughts and response inhibition engage the DMN, they recruit spatially unique regions of the mPFC. To facilitate comparison of spatial topography for EAT and TUT-related DMN activity, both activation maps are overlaid on a single MNI structural brain. In blue, significant deactivations during EAT stop trials (pFWE peak < 0.05, k threshold = 5 contiguous voxels). In red, increased self-reported TUT predicts greater mPFC BOLD activation, pFWE cluster < 0.05, region-of-interest analysis with DMN mask volume (bottom image shown in red), k threshold = 685 contiguous voxels. DMN mask generated using automated meta-analysis for term "mPFC" on neurosynth.org, z-score threshold > 4 (see Methods for further details). Statistical parametric maps superimposed on SPM canonical anatomical image, average of 305 T1-weighted images. Top image shown at MNI X = −5, bottom at X = 0.
Our results also inform our understanding of the link between TUT and task performance. We replicated the finding that overall levels of TUT can interfere with demanding tasks, potentially reflecting the role of TUT in facilitating perceptual decoupling (Smallwood, 2013). Consistent with prior studies we also found TUT reports were associated with greater mPFC activity (Mason et al., 2007; Christoff et al., 2009). Higher levels of variability in TUT were associated with a trend toward better stop performance, but more importantly, individuals who showed high variability in TUT more accurately detected mistakes made in the response inhibition task. Altogether, this pattern suggests that although mind-wandering has negative consequences for task performance, individuals who balance states of self-generated and task-related experiences are relatively more effective at monitoring their performance. Importantly, a simple demand characteristic explanation would not predict these results, as participants were motivated by financial rewards to perform the task as well as possible. As self-generated thought has both costs and benefits (McVay et al., 2008; Smallwood et al., 2008a; Baird et al., 2011; Mrazek et al., 2012; Smallwood and Andrews-Hanna, 2013; Smallwood et al., 2013a), it is possible that the association between metacognition and greater variability in mental states reflects an increased ability to balance self-generated thought and external perceptual processes, optimizing task performance over both immediate and more temporally extended events. In general, behavioral variability reflects the sensitivity of cognition to fluctuating task demands, and can produce positive or negative outcomes depending on behavioral context (Lutz et al., 2002). Conscious error awareness depends upon integrating interoceptive error cues with visual-motor control signals (Sridharan et al., 2008; Ullsperger et al., 2010; Klein et al., 2013). Our finding that individual differences in TUT variability relate to error monitoring performance may thus depend upon the individual capacity to flexibly switch between and integrate across different information sources. Alternatively, our data may simply indicate that online performance monitoring and flexibility in the contents of conscious thought both depend on a single domain-general metacognitive process.
However, this latter interpretation is inconsistent with prior studies indicating that the correlation between metacognitive accuracy for memory and perception is unreliable and that success in the two domains depends on distinct resting neural networks (Baird et al., 2013).
Regardless of the specific relationship, our data extend the role of metacognition in enhancing the flexibility of conscious thought (Flavell, 1979; Shimamura, 2000). One speculative implication of this result is that by utilizing metacognition to reduce perseveration on either internal or external information, an individual may be able to exploit the benefits of self-generated thought while minimizing the costs as far as possible. One possible mechanism supporting such a benefit may be an increased ability to regulate the context and content in which self-generated thought occurs (Smallwood and Andrews-Hanna, 2013). More generally, our data suggest that the role of metacognitive monitoring in facilitating TUT may explain why the mPFC, a region implicated in metacognition (Schmitz et al., 2004; Fleming et al., 2012; Frith and Frith, 2012), is also active during self-generated thought. Plausibly, the mPFC could allow an individual to reflect upon the contents of their self-generated thoughts and so benefit from this memory-driven mode of thought to make progress on their ongoing behavioral goals. Future research should explore the possibility that certain forms of self-generated thought entail metacognitive processing in coordinating the occurrence or the content of the experience.
Although we advance an interpretation of TUT variance as relating to metacognition, it must be noted that there is emerging evidence for a functional dissociation of reflective metacognition and online error monitoring processes (Fleming et al., 2012). While the former is typically thought to involve top-down conscious judgments of the reliability of a particular source of information, error awareness has increasingly been shown to involve distinct functional systems, particularly the salience network, e.g., anterior insula and rostral cingulate. Both reflective metacognition and error monitoring are thought to contribute to such self-evaluations and are impaired by lesions to the PFC and anterior cingulate cortex (Hoerold et al., 2012). However, signal-theoretic models suggest that error awareness may contribute more directly to an interoceptive sense of uncertainty or doubt, or a graded subliminal awareness of errors (Fleming et al., 2012; Charles et al., 2013). Thus, an interesting and unresolved question for future research is whether metacognitive confidence in TUT ratings might predict stop accuracy, as predicted by the meta-awareness hypothesis (Schooler, 2002; Maniscalco and Lau, 2012 2010; Andrews-Hanna, 2012). We found that the intensity of self-reported TUTs predicted activations in the mPFC, and that successful stops deactivated a more dorsal region of the mPFC. Thus, our results suggest a dissociation between elements of the DMN: both the pCC and dorsal areas of the mPFC are inhibited when individuals engage in cognitive control, whereas more rostral regions of the mPFC are engaged during self-generated thought. Consistent with the distinction our data suggest, a graph theoretical analysis of the DMN implicates ventral regions of mPFC in the midline core of the system, while dorsal regions of the mPFC participate in what is known as the dorsal medial prefrontal subsystem (Andrews-Hanna et al., 2010).
Based on our data we speculate that more rostral-prefrontal regions of mPFC may be especially important in the self-generation of mental contents that are unrelated to ongoing task performance, an observation which is consistent with evidence that this brain region is linked to self-referent information processing (Mitchell et al., 2005; Mitchell, 2009). In contrast, dorsal regions of mPFC were deactivated when cognitive control was employed on task-relevant information, supporting suggestions that this region may play a more general role in states of decoupled processing regardless of whether they are based on personally relevant information (see also Smallwood et al., 2013b).

FUTURE DIRECTIONS
In the present design we attempted to control participant motivation through financial reward; it is possible that our motivation manipulation interacted with self-reports, although we specifically instructed participants to report their honest experience. Indeed, we observed activations in reward-related areas including the caudate nucleus and putamen in our error awareness and stop contrasts (Schultz, 2000; Haruno and Kawato, 2006). Future research may benefit from shorter task intervals in a behavioral setting, to establish the role of motivational reward in reports of mind-wandering behavior (see for example Mrazek et al., 2012).
Our results also suggest that TUT accounted for a significant amount of variability in the EAT. Previous research has implicated disrupted error awareness in ADHD and cocaine abuse (Hester et al., 2007; O'Connell et al., 2009); our findings suggest that such disruptions could be related to reduced variability of mind-wandering during the EAT and/or increased overall TUT, reflecting cognitive rigidity. We also attempted to apply a subjective distinction between the intensity or subjective frequency of TUTs and their "stickiness," to generate self-reports capturing unique aspects of phenomenological experience, effectively "front-loading" phenomenological intuition into our experimental design (Gallagher, 2003). Although we attempted to create a distinction between the subjective frequency of TUTs and their attention-capturing nature, our data suggest a large degree of collinearity in these measures. This null finding raises the possibility that a more prolonged familiarization of participants with subtle subjective categories is required to measure them empirically (Lutz et al., 2002). However, because variability in the experience of TUT showed a pattern suggestive of better performance and superior monitoring, it is possible that individuals who are high on mean levels of TUT and lack variance reflect a population for whom self-generated thoughts are especially sticky and hence problematic. While we note that a limitation of the present design is a lack of validation for the stickiness measure, future methodological research should consider this possibility.

CONCLUSION
In conclusion, our findings confirm previous work suggesting that TUTs interfere with task performance under demanding task conditions. In addition, we found novel evidence that variability in mind-wandering experience was related to greater metacognitive ability, suggesting a role for online monitoring in ensuring flexibility in the manner that attention is deployed on both external and self-generated sources of information. We observed activations of both default mode and salience networks during error monitoring, a finding in line with the observation that particular aspects of mind-wandering are related to self-monitoring. We also found that ventral regions of the mPFC increased activity as TUT increased, while more dorsal regions were deactivated when individuals engaged cognitive control, a finding broadly in line with a component process view of the DMN. Given these results, we recommend that a time series analysis of subjective variability with continuous self-report measures, or investigation of second-order confidence in TUT ratings, may reveal further granularity in the experience of mind-wandering and related contributions to behavioral performance. Such approaches may prove important in determining the extent to which individuals regulate the balance of conscious thought so as to maximize the benefits of self-generated thought, while simultaneously limiting its costs.