Response and Remission Rates in Internet-Based Cognitive Behavior Therapy: An Individual Patient Data Meta-Analysis

Background: Internet-delivered cognitive behavior therapy (ICBT) was developed over 20 years ago and has since been evaluated in a number of controlled trials, as well as several systematic reviews and meta-analyses. However, the crucial question of response rates remains to be systematically investigated. The aim of this individual patient data meta-analysis (IPDMA) was to use a large dataset of trials conducted in Sweden to determine reliable change and recovery rates across trials for a range of conditions.
Methods: We used previously collected and aggregated data from 2,866 patients in 29 Swedish clinical trials of ICBT for three categories of conditions: anxiety disorders, depression, and others. Raw scores at pre-treatment and post-treatment were used in an IPDMA to determine the rate of reliable change and recovery. Jacobson and Truax's (1991) reliable change index (RCI) was calculated for each primary outcome measure in the trials, as was the recovery status of each patient, with the additional requirement of having improved substantially. We subsequently explored potential predictors using binomial logistic regression.
Results: In applying an RCI of z = 1.96, 1,162 (65.6%) of the patients receiving treatment were classified as treatment responders, and 620 (35.0%) were classified as reaching remission. In terms of predictors, higher symptom severity on the primary outcome measure at baseline [odds ratio (OR) = 1.36] and being female (OR = 2.22) increased the odds of responding to treatment. Having an anxiety disorder was found to decrease the odds of responding to treatment (OR = 0.51). Remission was predicted by diagnosis in the same direction (OR = 0.28), whereas higher symptom severity was associated with lower odds of remission (OR = 0.81).
Conclusions: Response seems to occur among approximately half of all clients administered ICBT, whereas about a third reach remission. This indicates that the efficacy of ICBT is in line with that of CBT in prior trials, with a possible caveat being the lower remission rates. Having more symptoms and being female might increase the chances of improvement, and a small negative effect of having an anxiety disorder versus depression and other conditions may also exist. A limitation of the IPDMA was that only studies conducted in Sweden were included.


INTRODUCTION
Internet-delivered cognitive behavior therapy (ICBT) has existed for more than 20 years (1), and treatment programs have been developed for a wide range of clinical and non-clinical conditions. Most forms of ICBT are administered as guided self-help, with text and video presentations supplemented by email support in a secure online platform resembling Internet banking (2). One way to describe ICBT is to pinpoint the similarities with online education, even if the treatment is largely based on self-help texts and cognitive behavioral therapy (CBT) manuals (3). Thus, ICBT programs tend to rely on psychoeducation and instructions for how to change thoughts, emotions, and behaviors in everyday life, and, like CBT, programs usually last 5 to 15 weeks with homework assignments and therapist feedback provided on a weekly basis (3).
Research suggests that therapist-supported ICBT, in contrast to unguided treatments (4), can be as effective as face-to-face cognitive behavior therapy (5), yield long-term results (6), and work under clinically representative conditions (7). ICBT has also been tested for different target groups, for example, young persons (8), adults (9), older persons (10), and immigrants (11). Treatment programs have focused on specific problems, such as procrastination (12); on diagnoses like post-traumatic stress disorder (PTSD) (13); or have been tailored according to a client's specific problem profile (14). Another approach has been to deliver transdiagnostic treatments in which one treatment is used to target underlying common processes, such as avoidance (15). There are also studies on other psychotherapy forms, including psychodynamic psychotherapy (e.g., 16), interpersonal psychotherapy (17), versions of CBT such as acceptance and commitment therapies (18), and attention training (19). Finally, there are also programs based on physical exercise (20), mindfulness (21), and relaxation (22), even if the latter two are often incorporated into ICBT protocols (23). In addition to the controlled trials, several studies exist on moderators and mediators of outcome (e.g., 24), as well as some qualitative studies on the client's experience during ICBT (25).
While it is common to report effects in clinical trials, there are also studies and reviews reporting negative effects and non-response to ICBT. In two previous individual patient data meta-analyses (IPDMAs), we studied these two outcomes (26,27). With regard to negative effects in the form of deterioration, 5.8% of treated research participants showed such effects (26), and non-response was present among 26.8% of participants (27). When completing these two reviews based on our dataset of controlled trials, a question emerged regarding response rates in our ICBT studies, as these were not reported in the two previous reviews given uncertainties regarding definitions and scope. In contrast to the standard of reporting standardized mean differences with metrics like Cohen's d, there is far less agreement on how to define response in psychological treatment studies in general, and specifically in CBT (28). Several questions emerge when defining what constitutes a "response" to treatment. In a review (26, p. 73-74), several issues were mentioned, such as (a) number of measures used to define response, (b) number of modalities (e.g., self-report versus observed behavior), (c) blinding of assessors, (d) degree of change from baseline, and (e) use of a clinical cut-off to determine if a client has reached a non-clinical state (sometimes referred to as high end-state or remission). In a seminal paper, Jacobson and Truax (29) outlined guidelines for defining change from baseline (the reliable change index, RCI) and different ways to define what constitutes being within a non-clinical range or having reached high end-state function/remission. In the present review, we will use the term "remission" to refer to what can be counted as high end-state function, which allows us to be consistent with a previous IPDMA on depression by Karyotaki et al. (30).
In their IPDMA on guided ICBT for depression focusing on response and remission, Karyotaki et al. (30) included 24 RCTs (4,889 participants) and compared guided ICBT with a control group. The mean pooled response rate (based on RCI) at post-treatment was 56.19%, and the mean remission rate at post-treatment in the treatment groups (N = 26) was 38.51% based on the RCI criterion (1.96), that is, two standard deviations improvement from baseline for each measure. Given the dataset we had coded based on our own trials, we decided to conduct a new IPDMA (31), knowing that we could use original data across studies to investigate reliable change and remission (high end-state function). As in our previous two IPDMAs (26,27), we used data from 29 clinical trials. The final dataset with complete data consisted of 1,535 patients who had received ICBT, and trials were categorized into three groups: anxiety disorders, depression/mood disorders, and other conditions (i.e., erectile dysfunction, relationship problems, and gambling disorder). The aim of the current study was to determine the rates of treatment response and remission in clients who had received ICBT in clinical trials conducted by our group in Sweden. As a secondary exploratory aim, we examined potential predictors of response.

Individual Patient Data Meta-Analysis
Keywords: response rates, recovery, predictors, individual patient data meta-analysis, internet-based cognitive behavior therapy
As described in our two previous IPDMAs (26,27), we used the scores for each individual client and outcome variable across studies (32). In IPDMAs, it is possible to study factors that might be predictive of treatment outcome using the raw data instead of group means, as has been done in previous IPDMAs, for example, on low-intensity psychological treatments (33) and Internet interventions for problem drinking (34). As in the previous IPDMAs, we aggregated available data from clinical trials that we conducted. A complete description of our data collection procedure is presented in Rozental et al. (26). An obvious limitation of using this method is that we cannot assess the risk of bias, which is a common practice in systematic reviews (32). However, by including trials from our own group, we were able to obtain an overall view of response and remission, which we assume could be representative, particularly for Swedish settings. For a complete description of the inclusion and exclusion criteria used, see (26, p. 163).
The raw scores from each client in the included trials were entered into one data matrix, and codes for background variables were aligned. Given the complexity of modeling reasons for missing data in such a heterogeneous group of research participants and the fact that we were focused on binary outcomes in this review, we decided to use a complete case approach instead of multiple imputation, as we were convinced that data were not missing completely at random (35). Moreover, complete case analysis has been recommended as the first approach when conducting meta-analyses, even if this approach is followed by sensitivity analyses to detect possible bias in estimates (36).
Sociodemographic variables were occasionally collapsed to facilitate comparisons and to obtain consistency across trials (see 26,27). Trials were categorized into three categories: (1) anxiety disorders, (2) depression and mood disorders, and (3) other (i.e., erectile dysfunction, relationship problems, and gambling disorder). In Table 1, we present an overview of the clients' sociodemographic variables in the trials included (the table overlaps with Table 3 in 26, p. 169 but does not include the control groups). We present data from baseline in the trials and also the amount of missing data for the full sample.

Statistical Analysis
The RCI was chosen based on its widespread use for assessing reliable change (29,37). As is common practice, the RCI was calculated by taking each individual change score and dividing it by the standard error of the difference, i.e., $SE_{\mathrm{diff}} = SD_1 \sqrt{2} \sqrt{1 - r}$. In the formula, $SD_1$ corresponds to the standard deviation of a condition at pre-treatment, and $r$ is the reliability estimate (38). We calculated RCIs for the primary outcome measure of each included trial and used the test-retest reliability for that specific trial measure (see Table 2; also reported in 27). Basically, the RCI sets the limit beyond which a change score is unlikely to be due to measurement error alone (p = .05). Following the usual standards, we used a threshold of z = 1.96, expressed in standard deviation units, for the RCI. Following this, each participant in the trials could be classified as either a responder or a non-responder to treatment, with the definition of response being specific to each study and measure used, a similar method to that used by Karyotaki et al. (30). Heterogeneity was tested by entering response rates into the program Comprehensive Meta-Analysis (CMA; version 2.2.021).
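The RCI classification described above can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in the trials: the function names are hypothetical, the example scores are invented, and it assumes that lower scores mean fewer symptoms, so that improvement is a positive pre-minus-post difference.

```python
import math


def se_diff(sd_pre: float, reliability: float) -> float:
    """Standard error of the difference: SD_1 * sqrt(2) * sqrt(1 - r)."""
    return sd_pre * math.sqrt(2.0) * math.sqrt(1.0 - reliability)


def rci(pre: float, post: float, sd_pre: float, reliability: float) -> float:
    """Reliable change index for one client.

    Assumes lower scores mean fewer symptoms, so a positive value
    indicates improvement (hypothetical sign convention).
    """
    return (pre - post) / se_diff(sd_pre, reliability)


def is_responder(pre: float, post: float, sd_pre: float,
                 reliability: float, z: float = 1.96) -> bool:
    """Classify a client as a responder if the improvement reaches z."""
    return rci(pre, post, sd_pre, reliability) >= z


# Invented example: pre-treatment SD of 10 and test-retest reliability of 0.80.
print(round(se_diff(10.0, 0.80), 2))
print(is_responder(pre=30.0, post=15.0, sd_pre=10.0, reliability=0.80))
print(is_responder(pre=30.0, post=20.0, sd_pre=10.0, reliability=0.80))
```

Because the standard deviation and reliability are specific to each trial and measure, the same raw change can count as a response in one trial and not in another.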
We also followed the methods of Karyotaki et al. (30) when calculating remission, again using criteria set by Jacobson and Truax (29). Participants were classified as remitters if they moved two standard deviations below the mean of the clinical group to which they belonged in each study. The resulting cut-off score is a strict criterion of remission, often being equivalent to a symptom-free state. For six of the studies, it was not possible to use the two standard deviation criterion due to floor effects, so instead, we used one standard deviation as the criterion in these cases.
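As a rough sketch of the remission cut-off, the snippet below computes the two standard deviation criterion and falls back to one standard deviation when the stricter cut-off is unreachable. The function names, the `scale_min` parameter, and the rule used to detect a floor effect (a cut-off below the lowest possible score) are assumptions made for illustration; the text does not specify how floor effects were identified, and the additional reliable-change requirement mentioned in the abstract is left out.

```python
def remission_cutoff(clinical_mean: float, clinical_sd: float,
                     scale_min: float = 0.0) -> tuple[float, int]:
    """Return (cut-off score, number of SDs used).

    Uses mean - 2 SD of the clinical group at pre-treatment. If that
    value lies below the lowest possible score (assumed floor-effect
    rule), mean - 1 SD is used instead.
    """
    cutoff = clinical_mean - 2.0 * clinical_sd
    if cutoff < scale_min:
        return clinical_mean - clinical_sd, 1
    return cutoff, 2


def is_remitter(post: float, clinical_mean: float, clinical_sd: float,
                scale_min: float = 0.0) -> bool:
    """A client remits if the post-treatment score is at or below the cut-off."""
    cutoff, _ = remission_cutoff(clinical_mean, clinical_sd, scale_min)
    return post <= cutoff


# Invented example values for two hypothetical trials.
print(remission_cutoff(30.0, 8.0))   # 2-SD criterion is reachable
print(remission_cutoff(12.0, 7.0))   # 2-SD cut-off would be negative
print(is_remitter(post=10.0, clinical_mean=30.0, clinical_sd=8.0))
```

The second example shows why a purely statistical cut-off can be impossible to reach when baseline scores are already close to the bottom of the scale.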
In order to investigate possible predictors, we applied binomial logistic regression and used either response or remission as the dependent variable. All variables were entered into the model simultaneously. In terms of the variables used, we selected a few clinically relevant demographic and clinical predictors (31,54). We selected the same variables as in our previous IPDMAs on deterioration (26) and non-response (27). The predictors were (a) symptom severity at baseline, (b) civil status, (c) age, (d) sick leave, (e) previous psychological treatment, (f) previous or ongoing psychotropic medication, (g) educational level, (h) diagnosis, and (i) gender. We also added a separate analysis of the association between publication year and the two outcomes, response and remission.
We present odds ratios (OR) with 95% confidence intervals (CI) in order to reflect an increase or decrease in odds of response and remission in relation to a reference category. In the case of dichotomous predictors (such as gender), the OR reflects the odds of response or remission when a client is female versus male (reference). An OR above 1 thus means better response in women. For continuous predictors (symptom severity at baseline and age), the OR instead represents the change in odds associated with an increase of one standard deviation above the respective mean. The statistical analyses on predictors were performed using jamovi version 0.9.2.9 (55), with the proportions of response and remission calculated on a complete case basis (see online Supplementary Material).
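To make the interpretation of an OR for a dichotomous predictor concrete, the sketch below computes an unadjusted OR and a Wald-type 95% CI from a 2x2 table. This is a simplification: the ORs reported in this study come from a multivariable logistic regression with all predictors entered simultaneously, so they are adjusted for the other variables. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    a = responders in the group of interest (e.g., women)
    b = non-responders in the group of interest
    c = responders in the reference group (e.g., men)
    d = non-responders in the reference group
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts: 80 of 100 women and 60 of 100 men respond.
or_, lo, hi = odds_ratio_ci(a=80, b=20, c=60, d=40)
print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant difference in odds between the two groups at the 5% level.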

Ethics
The data in the current IPDMA were derived from several clinical trials, all of which had received ethical approval from the respective regional ethical review boards at each study location.

Study Characteristics
The 29 clinical trials were coded according to the previously predefined inclusion and exclusion criteria and deemed eligible for the current IPDMA (see 26,27 for further details). Raw scores from all clients were entered into one spreadsheet. The exception was 46 clients who had received either psychodynamic psychotherapy or interpersonal psychotherapy via the Internet as a control condition. In total, 1,535 clients were included in this IPDMA. The following diagnoses were included (clinical trials, k): social anxiety disorder (9), depression (with/without dysthymia) (5), generalized anxiety disorder (3), anxiety disorder (with/without depression) (3), mixed anxiety disorders (e.g., panic disorder as well as social anxiety disorder) (2), specific phobia (2), posttraumatic stress disorder (1), panic disorder (with/without agoraphobia) (1), gambling disorder (1), erectile dysfunction (1), and relationship problems (1) (see 26, p. 6). Briefly, most participants in the trials had been recruited from the general population based on self-referral (n = 27). A common practice in the trials was to use structured telephone interviews such as the Structured Clinical Interview for DSM-IV-Axis I Disorders (56) or the MINI-International Neuropsychiatric Interview (57). Some studies had used diagnosis-specific instruments such as the Clinician-Administered PTSD Scale (58). For a description of the treatment content in the original studies, see (26,27). The amount of missing data for the primary outcome measures at post-treatment was 12.9%. A complete overview of the clinical trials is presented in Table 3 (which to some extent overlaps with Table 3 in 27 but with different results presented).

Response and Remission Rates
Using the RCI criterion of z = 1.96 for detecting response, 1,027 (69.9%; 95% CI: 67.61-72.19) of the 1,535 participants receiving treatment were categorized as treatment responders. The lowest rates were found in the trials on erectile dysfunction (12.1%) and older adults with anxiety (27.3%), whereas the highest rates were found in the trials on gambling (93.6%) and spider phobia (92.3%). As seen in Table 3, the proportions varied, which was confirmed by the CMA program showing significant heterogeneity (I² = 87.5%; Q = 223, p < .001). Remission was achieved in 540 participants (35.2%; 95% CI: 32.81-37.59) when adjusting for floor effects (the unadjusted proportion was 31.9%). There was large variation, with rates ranging from 0% (erectile dysfunction, depression, and bias modification for social anxiety disorder) up to 69% (spider phobia) and 82% (gambling). As with the response rates, significant heterogeneity was found (I² = 91.6%; Q = 334, p < .001).
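For readers who want to check statistics of this kind, the sketch below shows a normal-approximation (Wald) interval for a proportion and the usual conversion from Cochran's Q to I². Both formulas are assumptions about how such values are typically derived, not a description of what the CMA program does internally, and the function names are hypothetical. Applied to the remission proportion (35.2% of 1,535) and to the Q values with 29 trials, they give results close to the reported figures, within rounding.

```python
import math


def wald_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion p with sample size n."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


def i_squared(q: float, k: int) -> float:
    """I-squared (in percent) from Cochran's Q and the number of studies k."""
    df = k - 1
    return max(0.0, (q - df) / q) * 100.0


lower, upper = wald_ci(0.352, 1535)
print(f"95% CI: {lower * 100:.2f}-{upper * 100:.2f}")
print(f"I2 for response:  {i_squared(223, 29):.1f}%")
print(f"I2 for remission: {i_squared(334, 29):.1f}%")
```

I² expresses the share of the total variation across trials that is attributed to heterogeneity rather than chance, so values near 90% indicate that the pooled proportions hide substantial between-trial differences.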

Predictors of Response
Binomial logistic regressions were calculated with the predefined variables entered as predictors of response. Results are presented in Table 4 with OR and 95% CI for each predictor. The results indicate that a higher symptom severity on the primary outcome measure at baseline was predictive of better outcome. The odds for responding to treatment decreased when having an anxiety disorder as compared to depression/mood disorder and other diagnoses (i.e., erectile dysfunction, relationship problems, and gambling disorder), and the odds increased if the subject was female. The other variables were not predictive of response.
We also repeated the analyses with remission as outcome (see Table 5). In this analysis, symptom severity at pre-treatment was marginally associated with less remission (OR = 0.81). As with the analysis for response, the odds for remission were lower if the subject suffered from anxiety. Gender and the other variables did not reach statistical significance. Publication year was unrelated to response and remission.

DISCUSSION
The aim of this IPDMA was to obtain estimates of response and remission rates for ICBT with a range of conditions categorized into three groups (anxiety, depression, and other). In line with a previous IPDMA on depression by Karyotaki et al. (30), which included trials from different countries, we found that 65.6% of the treated research participants could be classified as treatment responders. This is slightly higher than the 56.19% reported by Karyotaki et al. (30). While Karyotaki et al. (30) imputed missing data, they also reported that the differences in estimates between the complete case analysis and the imputed dataset were minor. In addition, we found that 35.0% of participants could be classified as having remitted, which is slightly lower than the 38.51% remission rate reported by Karyotaki et al. (30). Moreover, we had a problem with floor effects; without the adjustment for these, our estimate is even lower (31.9%). Overall, the results are rather similar to those of Karyotaki et al. (30) in that roughly half of clients showed improvement following ICBT and remission was achieved by a third.
Given these estimates, the outstanding question is how well this compares against face-to-face CBT. As previously mentioned, there are a few studies in which clients have been randomly assigned to either face-to-face CBT or therapist-guided ICBT. In the most recent such study, Carlbring et al. (5) found no differences in effect. The response rates in CBT across different disorders and conditions are difficult to estimate because, to our knowledge, there is no similar IPDMA on response rates in face-to-face CBT. The closest we can get is a meta-analysis on anxiety disorders based on published data (28). In that review, 31% of the studies had defined response using RCI, and the results were similar to the range found in this analysis (44.5-51.1%). However, Loerinc et al. (28) also reported that the use of RCI in combination with a clinical cut-off (as was done in the present study) was associated with a 28% lower response rate (or, as expressed in this study, the remission rate was 28% lower than the RCI response rate). The difference we found was somewhat larger, as about half of the responders also showed remission. From a more naturalistic perspective, Gyani et al. (82) reported that 63.7% of participants in their clinical sample showed reliable improvement following evidence-based face-to-face treatment. In a recent meta-analysis, Springer et al. (83) reported a remission estimate as high as 51% (compared to our estimate of 35%). As stated by Loerinc et al. (28), there is a large variation in how to define both response and remission in CBT trials; thus, one advantage of the approach taken here is that we can use the same approach across studies. However, the definition of remission used (two SDs below the pre-treatment mean) was impossible in some studies (leading us to correct that estimate) and unrealistic in others. Another disadvantage of a statistical definition of response and remission is that it is heavily dependent on the sample upon which it is calculated. This pertains to both RCI and recovery as outlined by Jacobson and Truax (29). Another approach would have been to determine criteria for response and remission independently of the study, which is possible when there are data on non-clinical samples. For example, on the Beck Depression Inventory (BDI) (44), a 10-point reduction could be seen as indicative of response, and a score of 13 or below as indicative of recovery. Applying these criteria to the Stella trial (see Table 3; 63), in which 57.5% of participants responded according to the RCI and 0% remitted, the corresponding figures when using a 10-point reduction (53.1%) and a score of 13 or below on the BDI (59.4%) paint a slightly different picture (although the results are similar between the RCI criterion and the 10-point reduction). In particular, the difference between the 0% classified as remitters and the 59.4% having a score indicating minimal depression (44) shows the importance of response definitions (for a detailed discussion, see 37). However, in the present IPDMA, we found it to be of value to use similar criteria across trials. Future research could focus more on external criteria of improvement instead of study-specific criteria. Unfortunately, the BDI is an exception, being widely used in many trials, and for some conditions and measures included in this IPDMA, cut-off scores and non-clinical norms have not been established.
In the present IPDMA, we conducted exploratory analyses to see if response could be predicted. It is important to note that there was no firm theoretical basis for our selection of predictors and, therefore, our findings must be interpreted with caution, as the identification of a significant predictor could have practical implications for future treatment recommendations. This is particularly the case when findings are based on samples larger than are commonly used in psychotherapy trials. However, the finding that symptom severity was not a negative predictor, but rather the opposite, is in line with a previous IPDMA on low-intensity interventions for depression (33). From a clinical point of view, this makes sense, as having symptoms makes treatment more relevant than if subclinical or even different symptoms are present in the client. However, it also suggests that our ICBT trials probably included participants without severe symptoms. It is therefore possible that this finding is based on selection criteria used in trials and not necessarily relevant for clinical practice when treatment is offered with fewer restrictions. In contrast to the finding for response, we found a small negative effect of pre-treatment severity for the prediction of remission. These two conflicting results may indicate that large improvements are more likely if the client has more symptoms. However, as remission requires a low level of symptoms, it is less likely that a client with many symptoms will reach that low level. As in treatment research in general, it could be that the search for predictors is best pursued in ordinary clinical settings rather than in well-controlled trials. On a promising note, large ICBT effectiveness studies are being reported (84), which may provide clearer insight into the efficacy of different approaches for different populations.
The odds ratio in favor of female participants was surprisingly high (OR = 2.22), which is hard to explain as this is not a consistent result from previous individual trials. One possible explanation is the fact that the gender proportions in trials are nested with the condition treated. For example, in a typical depression trial, there is a majority of women, whereas in other conditions, there are more or less equal proportions of men and women, while some studies have only men (e.g., erectile dysfunction). The overall takeaway message here is that this finding needs further investigation in future trials to confirm its veracity. Gender was not found to be predictive of remission.
Interestingly, while the IPDMA by Karyotaki et al. (30) found older age to be weakly associated with better response (OR = 1.01), there was no such effect in this study, suggesting that age is not a predictor of outcome. In Karyotaki et al.'s (30) complete case analyses, baseline severity (OR = 1.16) was found to predict better outcome, which was in line with our findings (but not in their intention-to-treat analyses). Gender did not predict outcome in Karyotaki et al.'s (30) IPDMA, but, again, their review focused only on depression, which means a larger portion of the population was likely female.
In this IPDMA, we found decreased odds for responding to treatment when having an anxiety disorder as compared to depression/mood disorders or other (erectile dysfunction, relationship problems, and gambling disorder). This was a small effect, but still puzzling and hard to understand given that the overall picture is the opposite, indicating a higher response to ICBT in clients with anxiety disorders. Anxiety disorders were also found to be predictive of lower rates of remission.
While non-significant findings cannot be viewed as proof of absence of an effect, it is still interesting that ICBT research has had marginal success in finding predictors of change as well as moderators and mediators of outcome (1). There are several possible reasons for this. First, participants in trials are not selected for their differences. Rather, they are chosen based on diagnostic criteria, access to the Internet, willingness to be a research participant, etc., all of which likely reduce the chance of identifying accurate predictors. Second, predictors of dropout may ultimately be more illuminating, even if the IPDMA in question (85) was on self-guided ICBT for depressive symptoms. It is, however, interesting to note that its authors found that male gender (RR = 1.08) and co-morbid anxiety symptoms (RR = 1.18) significantly increased the risk of dropping out of the study, which is in line with our findings with regard to gender and anxiety disorders. In spite of the overall inconsistency in findings and lack of findings, we believe that IPDMA is a powerful and reliable tool for answering questions regarding predictors of response (86). However, this will require concerted efforts to align both outcome measures and data on predictors in order to make studies comparable. Even though we were co-workers and principal investigators in the trials included in this IPDMA, the data were not consistent with regard to background variables etc. It becomes even more problematic when attempting to combine datasets from different research groups.
This study had several limitations. We focus on four, knowing that there are further objections that could be raised. First, we only included our own trials, which were conducted in Sweden. IPDMAs are often selective, as original data sometimes cannot be obtained from authors, but in this case, we cannot generalize the results outside of our own culture and setting. In addition to not including trials from international groups, we also did not include the most recent and unpublished trials from our own groups, and there are other groups in Sweden with trials that were also not included. Furthermore, with the focus on our own studies conducted over a period close to 20 years, there is a possibility of time trends, which we did not focus on in this study. Technological changes may not influence effects, but, to give an example, early studies relied more on printing out text materials (87), whereas recent studies are typically delivered via online platforms (e.g., responsive to the presentation format, such as smartphones or computers) (2). Second, even if this limitation is not unique to ICBT, outcome measures were largely based on self-report. Such measures are useful and generally have good psychometric properties but still present a possible risk that self-reported changes do not correspond with actual behavior changes and do not conform to the findings of an interview. Third, as commented on by Loerinc et al. (28), treatment response can be defined by several measures, but here, we focused only on the primary outcomes. Our IPDMA dataset could also be used to analyze effects on secondary measures of constructs like quality of life, as several of our studies have used the same measure for this construct (e.g., 88), and it has been reported that the effects of ICBT might be lower on that construct (89). The fourth limitation relates to the statistical methods used. We decided to report on a complete case basis, but we cannot exclude bias, as missing data were not modeled.

CONCLUSIONS
In spite of the limitations, the present IPDMA suggests that ICBT can lead to major reductions in symptoms and that more than half of clients, on average, respond to treatment. A lower proportion remits; thus, there is room for improvement. It is possible that women benefit more from ICBT based on our findings, and symptom severity seems to be predictive of outcome, but these findings should be interpreted with caution. Our finding that studies on clients with anxiety disorders were associated with less response is also notable but should be regarded with care. Future