Predictors of Long-Term Improvement Following Cognitive Remediation in a Sample With Elevated Depressive Symptoms

Objective: Cognitive remediation (CR) techniques (interventions to enhance cognitive functioning) have proven moderately effective in improving cognition and daily functioning in major depressive disorder (MDD). However, baseline predictors of treatment response are lacking. The present study aimed to identify factors influencing long-term CR outcomes in a sample with current or previous, mild or moderate MDD and with self-reported cognitive deficits. Methods: Forty-two completers of group-based CR (strategy learning or drill-and-practice) were pooled into one sample. Based on change scores from baseline to 6-month follow-up, participants were categorized as “improvers” or “non-improvers” using reliable change index calculations. Outcome measures were a questionnaire of everyday executive functioning and a neuropsychological test of attention. Finally, improvers and non-improvers were compared in terms of various sociodemographic, psychological, illness-related, and neuropsychological baseline variables. Results: Seventeen participants improved reliably in everyday executive functioning, and fourteen demonstrated a reliable improvement in attention. No statistically significant differences emerged between improvers and non-improvers. Conclusion: No major predictors of CR outcomes were identified. Importantly, the current findings are insufficient to guide clinical decision-making. Large-scale studies with a priori hypotheses are needed to make advances in the future.


INTRODUCTION
Major depressive disorder (MDD) is characterized by deficits in cognitive functions, including attention, memory, and executive functions (EF) (Snyder, 2012;Ahern and Semkovska, 2017). However, the heterogeneity of the cognitive profile in MDD appears to be large, with distinct neurocognitive subgroups (Pu et al., 2018). For those experiencing cognitive difficulties, deficits often persist into remission (Rock et al., 2014) and have a deleterious effect on everyday functioning (Baune et al., 2010). Cognitive deficits, particularly in EF, are additionally associated with unfavorable depression outcomes, such as impaired long-term recovery (Vicent-Gil et al., 2018) and reduced antidepressant medication effectiveness (Groves et al., 2018). Hence, cognitive functioning represents a potential treatment target in MDD (Kaser et al., 2017).
Nonetheless, both antidepressant medication and cognitive-behavioral therapy (CBT) show limited effectiveness in alleviating cognitive deficits (Porter et al., 2016; Shilyansky et al., 2016). Cognitive remediation (CR) interventions, which specifically target cognitive dysfunction and are intended to produce lasting improvements in everyday functioning, are thus emerging. Although the heterogeneity of treatment approaches labeled "CR" is large (Motter et al., 2016), these interventions can be divided into either bottom-up drill-and-practice approaches or top-down approaches focusing on strategy learning. Bottom-up approaches typically consist of computerized cognitive training (CCT) tasks intended to improve basic cognitive processes through neuroplasticity (Motter et al., 2016). In contrast, top-down approaches consist of learning compensatory strategies for wide application in daily living, to compensate for the cognitive difficulties. Findings indicate moderate effectiveness in improving cognition and everyday functioning following CR, but there is a paucity of evidence for improved EF or for long-term effects (for a meta-analysis, see Motter et al., 2016). In this context, it has been argued that the substantial heterogeneity in the cognitive profile of MDD may influence the effectiveness of CR (Koster et al., 2017).
The lack of factors associated with successful treatment outcomes may be a barrier to improving CR effectiveness (Motter et al., 2016;Koster et al., 2017). The identification of pretreatment predictors could improve efficacy by facilitating individualized clinical decision-making. Moreover, it may be helpful for the development of future treatments by providing insight into the mechanisms of CR interventions (Koster et al., 2017). The investigation of CR predictors and moderators in MDD is scarce, but a meta-analysis covering a range of CCT interventions found decreased treatment effectiveness with increased age (Motter et al., 2016). Additional variables, such as gender and receiving concurrent treatment (antidepressant medication or psychotherapy), did not significantly influence outcomes. In a recent study dedicated to identifying the predictors in a CR intervention consisting of both CCT and strategy learning, Listunova et al. (2020) observed a shorter duration of illness to be the only factor associated with improvement on a neuropsychological measure of attention in a partially remitted MDD sample. Several sociodemographic, neurocognitive, psychopathological, and training-specific factors thus failed to predict outcomes. However, the study was limited by its exploratory approach, modest sample size, and exclusive focus on the attention domain (Listunova et al., 2020). Interestingly, CR findings diverge from psychotherapy research in MDD, where illness characteristics such as greater depression symptom severity, younger age at onset, and more previous episodes have all been associated with poorer responses in relation to depressive symptom alleviation (Hamilton and Dobson, 2002). Moreover, in schizophrenia research, where predictors and moderators of CR effectiveness have been more frequently studied, most of the factors reviewed fail to significantly influence CR treatment response (Reser et al., 2019;Seccomandi et al., 2020). 
However, in a selection of studies, better baseline performance in several cognitive domains predicted both cognitive and functional improvement, while increased chronicity and severity of schizophrenia has been associated with worse CR outcomes (Medalia and Richardson, 2005;Kurtz et al., 2009;Vita et al., 2013;Lindenmayer et al., 2017).
The main aim of the present study was to explore whether a selection of sociodemographic, neuropsychological, illness-related, and psychological variables could predict long-term CR outcomes in a sample with current or previous mild or moderate MDD. Data were collected as part of a single-blind randomized controlled trial (RCT) comparing the effectiveness of a strategy-based CR approach, Goal Management Training (GMT), with drill-and-practice CCT, in improving EF (Hagen et al., 2020). Both groups improved on measures of EF in daily life, neuropsychological tests of EF and attention, and depression symptom severity following CR. That is, no significant differences emerged between groups in the original study, although within-group changes in everyday EF and depression symptom severity were only significant following GMT. Owing to the limited number of previous studies examining predictors of CR in MDD, the present study applied an exploratory approach with no a priori hypothesis.

MATERIALS AND METHODS
The original RCT was preregistered at clinicaltrials.gov with the identifier NCT03338413, and the study protocol was approved by the Regional Committee for Medical and Health Research Ethics, South-Eastern Norway (2017/666). The study was conducted following the World Medical Association's Declaration of Helsinki, and all participants gave their written informed consent. For more detailed information on the methodological approach of the original RCT study, see Hagen et al. (2020).

Participants
The sample (n = 63) included participants diagnosed with mild or moderate MDD according to International Classification of Diseases - 10th Revision (ICD-10) criteria (World Health Organization, 2004), either as a primary or as a secondary diagnosis. All participants had undergone a diagnostic evaluation and completed treatment at the Return-to-Work clinic at Lovisenberg Diaconal Hospital within 2 years before inclusion. The Return-to-Work clinic offers short-term outpatient psychotherapeutic treatment to patients with mental health issues of mild to moderate severity, at risk of receiving sick leave because of mental illness. In addition, to be included, participants had to be between 18 and 60 years of age and to have self-reported everyday EF deficits (e.g., difficulties with memory, organizing/planning, emotional regulation, and/or concentration) in a custom-made telephone interview. All participants were additionally asked to confirm that depression symptoms represented a major mental health complaint. Exclusion criteria included comorbid neurological conditions, ongoing alcohol or substance abuse, and severe cognitive problems or mental disorders (psychotic disorders and severe personality disorders). No cutoff scores were specified for current depressive symptom severity. There were no restrictions on participants concerning additional concurrent psychotherapeutic or antidepressant treatment.

Study Design and Blinding
All participants completed a baseline assessment (T1) before being randomly assigned to nine sessions of either GMT or CCT, using computer-generated simple randomization. Participants were reassessed immediately following treatment completion (T2) and at a 6-month follow-up (T3). The assessments consisted of neuropsychological tests and self-report rating scales. Assessments were not blind, because the person responsible for data collection also acted as a therapist in both interventions. To compensate, an external assessor performed a limited set of blind T3 assessments (n = 5) for comparison. However, the study was single-blind because participants were not informed whether they had been allocated to the condition considered to be the active treatment.
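As an illustration of what simple (unrestricted) randomization involves, the following minimal Python sketch assigns each participant to an arm independently and with equal probability. It is not the procedure used in the trial, whose software is not described here; the function name, the participant identifiers, and the seed are hypothetical.

```python
import random


def simple_randomization(participant_ids, arms=("GMT", "CCT"), seed=None):
    """Assign each participant independently to one arm with equal probability.

    Hypothetical helper for illustration only; not the trial's actual tool.
    """
    rng = random.Random(seed)  # a fixed seed makes the allocation list reproducible
    return {pid: rng.choice(arms) for pid in participant_ids}


# 63 hypothetical participant identifiers (1..63); the seed value is arbitrary.
allocation = simple_randomization(range(1, 64), seed=2017)
group_sizes = {arm: list(allocation.values()).count(arm) for arm in ("GMT", "CCT")}
```

Because every assignment is independent, simple randomization does not guarantee equal group sizes, which is one reason `group_sizes` is worth inspecting in small samples.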

Goal Management Training
GMT is a manual-based CR intervention to improve everyday EF (Levine et al., 2011). A central element of GMT is to learn and internalize strategies for wide application in daily living, promoting goal-directed behavior through increased executive control and improved problem-solving capacities. Strategies consist of a self-instruction to stop ongoing behavior, check the current content of working memory, state and define goals, apply a systematic approach to problem-solving, and monitor performance.
In the present study, the Norwegian translation of the standard GMT protocol was employed (Stubberud et al., 2013; Tornås et al., 2016). A clinical psychologist and a neuropsychologist delivered GMT in groups of five to seven participants in 9 weekly 2 h sessions. In-class exercises included practicing the use of the compensatory strategies (e.g., practicing a systematic approach to problem-solving by arranging a fictitious wedding party). Mindfulness exercises (Kabat-Zinn, 1990), intended to enhance attentional control, were also practiced in class. Sessions emphasized group discussions addressing personal examples of dysexecutive behavior. Between-session assignments included monitoring EF-related errors, mindfulness exercises, and the application of learned strategies in daily life (Table 1).

Computerized Cognitive Training
The CCT consisted of seven exercises from the BrainHQ platform. Repetition is the hallmark feature of CCT, and neuroplasticity is its theoretical foundation (Siegle et al., 2007). Cognitive improvements, including EF, have been identified using BrainHQ or similar exercises (Morimoto et al., 2014; Lewandowski et al., 2017), and these studies established the empirical basis for the selection of exercises. In addition, exercise selection was based on the provider's description.
The CCT consisted of nine twice-weekly 1 h sessions delivered in groups of three participants. Exercises targeted attention, memory, processing speed, and EF. To ensure appropriate levels of mastery and frustration, the platform adapted difficulty levels to the individual participants' performance, keeping the success rate at 80% throughout. A clinical psychologist acted as a therapist and gave participants positive feedback on their efforts. The first session included psychoeducation, with the therapist introducing the concept of neuroplasticity, typical cognitive deficits in depression, and the importance of cognitive processes in different everyday situations. Participants had online access to the training platform and were encouraged to practice for at least 30 min between each session (Table 1).

Completer Sample
Participants had to attend a minimum of six training sessions and complete the 6-month follow-up assessment (T3) to be included in the completer sample. Forty-two completers from both groups were pooled into one sample. The pooling of participants receiving different treatments was done to increase the sample size (Figure 1).

Outcome Measures
The Global Executive Composite (GEC) from the Behavior Rating Inventory of Executive Function - Adult version (BRIEF-A) (Roth et al., 2005) was applied as an outcome of self-reported everyday EF. The BRIEF-A GEC consists of 70 items and nine non-overlapping subscales (Inhibit, Self-Monitor, Plan/Organize, Shift, Initiate, Task Monitor, Emotional Control, Working Memory, Organization of Materials), tapping the frequency of everyday dysexecutive behavior (item range 1-3, total range 70-210). The psychometric properties of the GEC are acceptable, with a 1-month test-retest reliability of 0.94 and a Cronbach's alpha of 0.96 (Roth et al., 2005).
The Conners Continuous Performance Test - Third edition (CPT-3) (Conners, 2015) was applied as a neuropsychological measure of attention. The CPT-3 is a 14 min go/no-go test of visual attentiveness, response inhibition, and sustained attention. The number of commission errors (Commissions; response to "no-go" targets: attentiveness and inhibition) and hit reaction time standard deviation (HRT SD: response consistency/sustained attention) subscales were included as outcome measures. The corrected test-retest reliabilities (1-5 weeks) of the included subscales are 0.68 (HRT SD) and 0.85 (Commissions) (Conners, 2015). A neuropsychological measure of attention was selected as an outcome to facilitate comparison with the only previous study known to us that investigated the predictors of CR response in MDD (Listunova et al., 2020).

Sociodemographic Factors
Sociodemographic factors included age, gender, years of education, and employment status, all self-reported in a custom-made interview. Employment status was transformed into a dichotomous variable with the categories "full-time employment/full-time student" and "other" (including "part-time employment/part-time student," "sick leave," and "looking for a job").

Illness-Related Factors
Illness-related factors included current depressive symptom severity in addition to the self-reported number of previous depressive episodes, age of onset, duration of illness, and current antidepressant medication use. Depressive symptom severity was assessed with the Beck Depression Inventory (BDI) (Beck et al., 1961), which has satisfactory internal consistency (Beck et al., 1988). The remaining illness-related factors were self-reported during a custom-made interview. The number of previous episodes was transformed into a dichotomous variable based on the categories of "one episode" and "more than one episode," as this was considered a theoretically meaningful subdivision of the highly skewed original continuous variable.

Psychological Factors
Psychological factors included overall psychological distress and a tendency to ruminate. The Clinical Outcomes in Routine Evaluation - Outcome Measure (CORE-OM) (Barkham et al., 2001) was applied as a measure of overall psychological distress. The CORE-OM clinical score was calculated as the mean of completed items multiplied by 10 (range: 0-40). Rumination was assessed using the Ruminative Response Scale (RRS) (Treynor et al., 2003). Both questionnaires have acceptable internal consistency (Barkham et al., 2001; Treynor et al., 2003).
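The CORE-OM clinical score described above can be sketched in a few lines of Python. The sketch assumes that items are scored 0 to 4, which is what the stated 0-40 range implies, and that unanswered items are represented as `None`; the function name and the example responses are hypothetical.

```python
def core_om_clinical_score(item_scores):
    """Mean of completed items multiplied by 10.

    Assumes each completed item is scored 0-4 and that missing
    responses are passed as None (both are illustrative assumptions).
    """
    completed = [score for score in item_scores if score is not None]
    if not completed:
        raise ValueError("no completed items to score")
    return 10 * sum(completed) / len(completed)


# Hypothetical responses with one missing item: mean of (1, 2, 3) is 2.0.
example_score = core_om_clinical_score([1, 2, None, 3])
```

Averaging over completed items only means that a skipped item does not pull the score toward zero.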

Neuropsychological Factors
Neuropsychological factors included estimated IQ, assessed with the two-subtest form (Matrix Reasoning; Vocabulary) of the Wechsler Abbreviated Scale of Intelligence (WASI) (Wechsler, 1999). Performance on the Delis-Kaplan Executive Function System (D-KEFS) (Delis et al., 2001) Trail-Making Test was applied as a measure of processing speed (condition 2) and EF/shifting (condition 4). Memory was assessed using the California Verbal Learning Test - Second edition - Short form (CVLT-II SF) (Delis et al., 2000). The digit span forward (attention span) and digit span backward (working memory) subtests of the Wechsler Adult Intelligence Scale - Fourth edition (WAIS-IV) (Wechsler, 2014) were also employed.

Other Factors
Received intervention (GMT or CCT) was included as a training-specific factor. In addition, baseline performance scores on the outcome variables were included as predictors.

Calculation of Reliable Change Index
The reliable change index (RCI) (Jacobson and Truax, 1991) was calculated for everyday EF (BRIEF-A GEC) and the neuropsychological measure of attention (CPT-3: Commissions, HRT SD). The RCI analysis is a statistical approach to identify individuals with statistically reliable improvement, given the scale reliability. Thus, the approach is sensitive to individual participant improvements potentially lost in group-level statistical analysis (Jacobson and Truax, 1991). To calculate the RCI, the change in individual raw score (BRIEF-A) or T-score (CPT-3) between T1 ($X_1$) and T3 ($X_2$) was divided by the standard error of the difference ($SE_{diff}$) using the formula:
$$RCI = \frac{X_2 - X_1}{SE_{diff}}$$
$SE_{diff}$ was derived from the standard error of measurement ($S_E$), calculated using the test-retest reliability ($r_{xx}$) of the instrument and the standard deviation ($SD$), using the following formulas:
$$S_E = SD\sqrt{1 - r_{xx}}, \qquad SE_{diff} = \sqrt{2\,S_E^{2}}$$
Because lower scores indicate better functioning on both outcome measures, an RCI smaller than -1.96 was required for a change to be considered a reliable improvement. A change surpassing the ±1.96 threshold occurs by chance in only 5% of cases. Information on test-retest reliability and standard deviation was collected from the test manuals (Roth et al., 2005; Conners, 2015). Change scores were calculated from baseline (T1) to the 6-month follow-up (T3) because long-term outcomes were regarded as most clinically relevant. For the CPT-3, a reliable improvement on one of the two subscales (Commissions or HRT SD) was required to count as an improvement. Finally, participants improving reliably on one subscale were not counted as improvers if there was a reliable deterioration on the other.
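The calculation and the classification rule can be sketched as follows. The formulas for the standard error of measurement, $SD\sqrt{1 - r_{xx}}$, and the standard error of the difference, $\sqrt{2 S_E^2}$, are the standard Jacobson and Truax (1991) expressions. The function names are hypothetical, and the standard deviation and reliability in the usage example are made-up illustrative values, not those taken from the test manuals, so the sketch is not expected to reproduce the thresholds reported in the Results.

```python
import math


def se_diff(sd, r_xx):
    """Standard error of the difference between two test scores."""
    se_measurement = sd * math.sqrt(1.0 - r_xx)  # standard error of measurement
    return math.sqrt(2.0 * se_measurement ** 2)


def rci(x1, x2, sd, r_xx):
    """Reliable change index for a change from x1 (T1) to x2 (T3)."""
    return (x2 - x1) / se_diff(sd, r_xx)


def classify(rci_value, critical=1.96):
    """Lower scores are better here, so a negative RCI means improvement."""
    if rci_value < -critical:
        return "improved"
    if rci_value > critical:
        return "deteriorated"
    return "unchanged"


def cpt3_improver(rci_commissions, rci_hrt_sd):
    """Improver: reliable gain on at least one subscale, reliable loss on neither."""
    labels = {classify(rci_commissions), classify(rci_hrt_sd)}
    return "improved" in labels and "deteriorated" not in labels


# Illustrative values only: SD = 10.0 and r_xx = 0.75 are hypothetical.
example_rci = rci(x1=60, x2=45, sd=10.0, r_xx=0.75)
example_label = classify(example_rci)
```

With these hypothetical inputs, `se_diff` is about 7.07, so a 15-point drop corresponds to an RCI of about -2.12 and counts as a reliable improvement, whereas a 10-point drop would not.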

Comparison of Improvers and Non-improvers
For each of the two outcome measures, participants were categorized as either "improvers" or "non-improvers" based on their RCI score. Improvers were compared with non-improvers in the pooled completer sample using the non-parametric Mann-Whitney U test and chi-square test, for continuous and dichotomous variables, respectively. In addition, the T3 results on the outcome measures were compared between assessors (blind/non-blind), and T1 results between completers and non-completers, using the Mann-Whitney U test. All tests were two-tailed, and to partially account for multiple testing, the significance level was set to 0.01. Values between 0.01 and 0.05 were interpreted as trends. SPSS version 24.0 for Windows was applied for all analyses.
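To make the decision rule concrete, the sketch below computes a Pearson chi-square test for a 2 × 2 table in plain Python and labels the resulting p-value using the thresholds stated above (significant below 0.01, trend between 0.01 and 0.05). It is only an illustration: the analyses themselves were run in SPSS, the sketch applies no continuity correction (an assumption), the function names are hypothetical, and the cell counts in the example are invented rather than taken from the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For df = 1 the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def interpret(p_value, alpha=0.01, trend_limit=0.05):
    """Label a p-value with the thresholds used in this study."""
    if p_value < alpha:
        return "significant"
    if p_value < trend_limit:
        return "trend"
    return "not significant"


# Invented counts: rows = improvers / non-improvers, columns = a dichotomous predictor.
chi2, p_value = chi_square_2x2(12, 5, 9, 16)
label = interpret(p_value)
```

For the invented table the statistic is about 4.84, which falls in the trend range under these thresholds; with expected cell counts this small, an exact test may be preferable in practice.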

RESULTS
The completers (n = 42) had a median age of 41 years (range = 28-59) and a median of 15 years of education (range = 9-18). The majority were female (79.1%), and 76.7% did not currently use antidepressant medication. Furthermore, their average depression symptom severity was in the mild range (BDI: median = 17.0, range = 4.0-34.0) (Beck et al., 1988). Fourteen completers had been diagnosed with a comorbid ICD-10 mental or behavioral disorder (Table 2). The sample reported substantial executive dysfunction in daily living (BRIEF-A GEC T-score: median = 64, range = 44-80) but performed in the normal range for the included CPT-3 subscales at baseline (Commissions T-score: median = 48, range = 35-73; HRT SD T-score: median = 44, range = 33-75). At follow-up, completers reported overall fewer EF deficits in daily life (BRIEF-A GEC T-score: median = 59.5, range = 36-78) and performed better on the measure of attention (CPT-3: Commissions T-score, median = 44, range = 25-71; HRT SD T-score, median = 41, range = 31-56). No statistically significant differences emerged for any of the outcome measures at follow-up between the blinded and non-blinded assessors. Finally, the completers were not significantly different from the non-completers (n = 21) on any of the included variables.

Comparison of Everyday Executive Functioning of Improvers and Non-improvers
Seventeen participants (40.5%) improved reliably on the BRIEF-A GEC between T1 and T3. For participants to surpass the critical value for improvement, a 13-point reduction in BRIEF-A GEC raw score was required. The mean BRIEF-A GEC raw score of improvers at follow-up was 107.9 (SD = 17.9). No statistically significant differences emerged between improvers and non-improvers for any of the predictors. However, improvers had a higher estimated IQ than non-improvers at trend level (p = 0.044) (Table 3).

Comparison of Attention by Improvers and Non-improvers
Fourteen participants (33.3%) were identified as improvers on the measure of attention. Twelve participants improved on the Commissions subscale and four on the HRT SD subscale; two of these participants improved on both subscales. For participants to surpass the critical value for improvement, a change in T-score of 10 (Commissions) or 13 (HRT SD) was required. At follow-up, the mean T-score for the improver group (n = 14) was 42.2 (SD = 6.5) on the Commissions subscale and 39.9 (SD = 7.5) on the HRT SD subscale. No statistically significant differences emerged between improvers and non-improvers. However, at trend level, fewer of the improvers compared with the non-improvers (p = 0.011) had experienced only one previous depressive episode (Table 4). The mean number of self-reported previous depressive episodes was 4.8 (SD = 3.3) for improvers and 3.4 (SD = 3.2) for non-improvers (values above 10 were recoded as 10).

Overlap Between Improvers Across Outcomes
Fifteen participants (35.7%) improved on neither measure, and four participants (9.5%) improved reliably on both the BRIEF-A GEC and the CPT-3.
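These overlap figures follow from the counts reported above by inclusion-exclusion, as the short Python check below illustrates. The function name is hypothetical; the inputs are the reported sample size (42), the number of improvers on the BRIEF-A GEC (17) and on the CPT-3 (14), and the number who improved on both (4).

```python
def overlap_summary(n_total, n_ef, n_attention, n_both):
    """Derive the 'either' and 'neither' counts from the reported improver counts."""
    n_either = n_ef + n_attention - n_both  # inclusion-exclusion
    n_neither = n_total - n_either

    def percent(count):
        return round(100 * count / n_total, 1)

    return {
        "either": n_either,
        "neither": n_neither,
        "pct_both": percent(n_both),
        "pct_neither": percent(n_neither),
    }


summary = overlap_summary(n_total=42, n_ef=17, n_attention=14, n_both=4)
```

The result reproduces the 15 participants (35.7%) who improved on neither measure and the 9.5% who improved on both, and shows that 27 participants improved reliably on at least one outcome.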

DISCUSSION
The present study aimed to identify factors predicting long-term treatment outcomes following CR in an MDD sample. None of the variables emerged as major predictors of change in either everyday EF or attention. The lack of factors associated with CR improvement is generally consistent with previous research on MDD (Motter et al., 2016;Listunova et al., 2020). Even though none of the illness-related factors emerged as major predictors, surprisingly, a reliable improvement in attention was associated at trend level with having experienced more than one previous depressive episode. Recurrence of episodes arguably indicates greater illness severity and chronicity, previously associated with reduced CR effectiveness for both MDD (Listunova et al., 2020) and schizophrenia (Medalia and Richardson, 2005;Vita et al., 2013;Lindenmayer et al., 2017). This result thus diverges from a selection of previous findings. However, contrary to conclusions from systematic reviews in MDD (Rock et al., 2014;Ahern and Semkovska, 2017), the present sample did not display objective attention deficits at baseline. Additionally, previous research has suggested distinct neurocognitive subgroups for MDD, with a majority showing near-normative performance on neuropsychological tests (Pu et al., 2018). Participants in the Return-to-Work program report less overall illness severity and are more likely to hold a job compared with other outpatients (Victor et al., 2016). In addition, IQ estimates were above average in the present sample. Such sample characteristics could have contributed to normal performance on the cognitive measures (Elgamal et al., 2010;Venezia et al., 2018). Furthermore, owing to the weak correlations between self-reported and neuropsychological measures of cognition in MDD, the inclusion based on subjective deficits may have also resulted in a subgroup of cognitively unimpaired participants (Petersen et al., 2019). 
Notably, at baseline, the non-improver group performed even better than improvers on the outcome measure of attention and may as such represent a part of the sample without actual attention deficits. The non-improvers were thus less likely to surpass the threshold for improvement, because their baseline left little room for further gains. Indeed, not accounting for individual differences in baseline performance is a limitation of the RCI approach (Duff, 2012). Moreover, although overall results are mixed, some previous findings indicate that MDD recurrence is related to impaired performance on measures of cognition (Hasselbalch et al., 2011). This could explain why recurrent episodes were associated with attention improvement in our sample: participants with recurrent episodes may have had more room to improve their attention as a result of the interventions.
The present study failed to replicate the single significant finding from a previous meta-analysis investigating moderators of CR outcomes in MDD, namely, that treatment effectiveness decreases with increasing age (Motter et al., 2016). Excluding participants above the age of 60 years restricted the age range and reduced the sample variability of the present study, potentially limiting the prospect of obtaining significant results. Nonetheless, the overall available evidence does not indicate that age is a reliable predictor of CR outcomes (Reser et al., 2019;Listunova et al., 2020;Seccomandi et al., 2020). However, in accordance with previous research on MDD (Motter et al., 2016;Listunova et al., 2020), neither gender nor receiving concurrent treatments predicted improvements in cognition or everyday EF, with the latter indicating no additive effect on cognition of combining CR with antidepressant medication.
Cognitive performance at baseline did not predict outcomes for attention or EF in our sample. This finding is contrary to a selection of findings regarding schizophrenia (Reser et al., 2019) but in accordance with MDD research (Listunova et al., 2020). One notable exception was that higher IQ estimates were associated with improvement in everyday EF at the trend level. Theoretically, a greater general ability may contribute to reaching one's potential for applying learned strategies in daily living, thereby increasing CR effectiveness (Velligan et al., 2006). Hence, the role of IQ in CR should be further investigated in future studies.
Delivering CR therapies to patients with MDD hinges on the theoretical assumption that cognitive deficits impair everyday functioning and act as risk factors for depressive symptoms. Thus, it is striking that cognitive performance at baseline (i.e., the degree of the cognitive deficits) lacks support as a moderator of CR outcome (Koster et al., 2017). No established neuropsychological profile exists for MDD (Marazziti et al., 2010), and this heterogeneity could limit the chance of detecting reliable pretreatment cognitive predictors. Furthermore, the relationship of cognitive factors with outcomes may be nonlinear, exerting different influences at different levels of each variable. To illustrate, higher baseline cognitive performance may be conceptualized both as facilitating CR gains and as restricting improvement potential (Twamley et al., 2011;Vita et al., 2013).
Rumination is proposed to have a bidirectional relationship with EF (Davis and Nolen-Hoeksema, 2000;Philippot and Brutoux, 2008), and in CCT interventions specifically addressing EF processes (i.e., cognitive control training), rumination has been found to mediate depressive symptom outcomes (Quinn et al., 2014). To our knowledge, no previous study has investigated whether baseline rumination predicts CR outcomes in cognition or functioning. Our findings suggest that baseline rumination is not a major predictor of improvements in attention or everyday EF. However, important subcomponents of the rumination construct have been identified (Treynor et al., 2003) but were not presently investigated.
Pooling participants who received CR interventions that differed in content and theoretical foundation was necessary to obtain an acceptable sample size. This may have obscured the effect of predictors on each treatment. Nevertheless, the number of improvers was similar across interventions for both outcome measures in the present study; moreover, a previous meta-analysis of schizophrenia has indicated that different approaches produce similar overall effects on measures of cognition (Wykes et al., 2011), suggesting commonalities between treatments.

Clinical Implications and Future Directions
The percentage of improvers in the present study (33.3-40.5%) and previous studies (34.2%) (Listunova et al., 2020) indicates the potential to increase CR effectiveness in MDD. The heterogeneity of cognitive deficits in depression suggests that individualized interventions may be required, and understanding why participants achieve different outcomes represents a critical hurdle to individualizing CR. However, identifying easily available major predictors of treatment outcomes has proven to be a challenge, and current findings are insufficient to guide clinical decision-making. Moreover, no consistent barriers to improvement have been identified to date, so these findings suggest that MDD patients have the potential to improve following CR, regardless of their baseline characteristics. For advances in the field, large-scale and fine-grained investigations with a priori hypotheses are required (Reser et al., 2019;Seccomandi et al., 2020). In addition, as has been suggested for schizophrenia, we may need to go beyond generic demographic and clinical factors to predict CR outcomes reliably (Reser et al., 2019).
The present study focused on identifying baseline predictors that could be easily disseminated into clinical practice. Hence, it did not include several factors identified as potential mediators or moderators of CR in MDD or schizophrenia, such as the number of training sessions (Buonocore et al., 2017), motivation/engagement with training (Medalia and Richardson, 2005; Siegle et al., 2014), therapist characteristics (e.g., clinical experience) (Medalia and Richardson, 2005), and patient-therapist working alliance (Huddy et al., 2012), all candidates for further investigation.

Strengths and Limitations
The study was based on data from a single-blind RCT, applied a multimodal selection of outcome measures, and used a stringent approach to define improvement. In addition, the study attempted to extend previous research by applying a long follow-up period as an endpoint, aiming to identify predictors of durable change following CR. However, the following limitations should be considered when interpreting the above findings. No a priori hypotheses were generated, and the analyses were exploratory, calling for caution in the interpretation of results. Another notable limitation was the modest sample size, reducing statistical power and increasing the risk of type II errors. Furthermore, multiple testing inflated the risk of type I errors, even if partially accounted for by lowering the significance level.
The neuropsychological measure of attention was not corrected for practice effects (Chelune et al., 1993). Although the overall practice effects on the CPT-3 are reported to be small-to-moderate across the included subscales (T = 2.9 for Commissions; T = 0.2 for HRT SD; Conners, 2015), correcting for these would make a reliable improvement in attention very hard to achieve for a substantial proportion of the sample, given the conservative RCI threshold and baseline performance in the normal range. Moreover, the lack of an adequate sample for comparison (i.e., non-intervention control or comparable norm population) restricted the advantages of applying more sophisticated statistical approaches to overcome some of the above issues.
A selection of variables was transformed into dichotomous categories, which may result in a loss of information and power (MacCallum et al., 2002). Notably, this included the "number of previous episodes" variable, found to differ between groups at trend level. The follow-up assessments lacked blinding for most of the sample, increasing the risk of biased responding. However, results from a small number of blind assessments were not significantly different from those of non-blind assessments. Finally, the illness-related variables (e.g., age of onset, number of episodes) were self-reported and thus more susceptible to bias, including memory biases.

CONCLUSION
In the present study, no major predictors of long-term improvement in attention or executive functioning following CR emerged. The results are consistent with previous research, which mostly failed to identify predictors of CR treatment outcomes. Importantly, the current findings are insufficient to guide clinical decision-making, and there is a need for large-scale and fine-grained investigations to extend current knowledge.

DATA AVAILABILITY STATEMENT
The datasets for this article are not publicly available because of restrictions specified in the study consent form and conditions for approval from the local ethics committee concerning patient confidentiality and participant privacy. Requests to access the datasets should be directed to Jan Stubberud, jan.stubberud@psykologi.uio.no.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Regional Committee for Medical and Health Research Ethics, South-Eastern Norway (2017/666). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
BH completed the data collection and data curation, analyzed the data, and wrote the manuscript draft. The study was part of a doctoral thesis by BH. JS provided supervision, conceptualized the original trial, acted as principal investigator, and contributed with revisions of the manuscript draft. BL and NL contributed to the conceptualization of the original trial and revision of the manuscript draft. EK contributed with revisions of the manuscript draft. All authors contributed to the article and approved the submitted version.

FUNDING
This study was funded by the South-Eastern Norway Regional Health Authority (Grant No. 2019120) and the research fund at the Lovisenberg Diaconal Hospital. The funding sources were not otherwise involved in the research.