Beyond the Lab: Empirically Supported Treatments in the Real World

Laboratory studies of empirically supported treatments (ESTs) for mental health problems achieve much higher rates of clinical improvement than have been observed following treatment in the community. This discrepancy is likely due to limited reliance on ESTs by therapists outside of academia. Concerns about the generalizability of ESTs to patients in the community, who may have comorbid problems, likely limit rates of adoption. The present study examined the impact of ESTs delivered in the real world for 1,256 adults who received services through an employee assistance program specializing in the delivery of ESTs. Rates of anxiety and depression decreased significantly following treatment with an EST, and 898 (71.5%) patients demonstrated reliable improvement. Even among patients comorbid for depression and anxiety at baseline, over half reported reliable improvement in both disorders. Findings suggest ESTs can be effectively delivered outside of academic RCTs. However, additional research is needed to understand and overcome barriers to disseminating ESTs to the broader community.

Empirically supported therapies (ESTs) are behavioral health interventions that have been rigorously tested in randomized controlled trials (RCTs) or a series of well-designed single-subject experiments and have demonstrated efficacy when compared to a control or active treatment condition (Chambless and Hollon, 1998; Tolin et al., 2015). ESTs have been developed for a range of behavioral health problems, including the most common disorders, depression and anxiety. In particular, cognitive behavioral therapy (CBT), cognitive therapy (CT), and interpersonal psychotherapy have all demonstrated efficacy in the treatment of moderate or severe depression, with evidence suggesting that CBT and CT may be as efficacious as antidepressant medication (Shapiro et al., 1994; Gloaguen et al., 1998; DeRubeis et al., 2005). Multiple meta-analyses also support the efficacy of CBT or behavioral therapies for the treatment of anxiety disorders, including panic, obsessive-compulsive disorder, social anxiety, post-traumatic stress disorder, and generalized anxiety disorder (Deacon and Abramowitz, 2004).
Reliable improvement, defined as symptom improvement not accounted for by measurement error alone (Jacobson and Truax, 1991), is the standard by which ESTs are often measured. Laboratory studies of ESTs, in which efficacy is tested in an RCT, demonstrate rates of reliable improvement at or greater than 50% (Hofmann et al., 2012; Loerinc et al., 2015). However, findings indicate that among adults treated with psychotherapy under naturalistic conditions in the community, fewer than 30% achieve reliable improvement (Hansen et al., 2002).
One explanation for the discrepancy in observed outcomes may be limited adoption of ESTs by providers in the community. Despite efforts to disseminate ESTs more broadly, they are underutilized outside of academia (Stewart and Chambless, 2007). Concerns about the external validity of RCTs are frequently cited by clinicians in the community (Nelson and Steele, 2007; Kazdin, 2008), who worry that ESTs are infeasible to implement in real-world settings (Addis et al., 1999) and that subjects in RCTs differ in important ways from patients in community settings (Westen and Morrison, 2001). In particular, it has been hypothesized that including patients with psychological comorbidities, which tend to be the norm in real-world clinical settings, could reduce the impact of ESTs (Shadish et al., 2000).
Unfortunately, treatment studies often focus only on a single diagnosis, with comorbid psychiatric diagnoses considered exclusion criteria, thus diminishing their application to real-world settings (Goldstein-Piekarski et al., 2016). There are only a small number of RCTs examining EST effectiveness with more real-world-like samples. For example, Craigie and Nathan (2009) found that both individual and group CBT could be effectively used to treat depression in a sample of adult community outpatients with high rates of comorbidity. Barlow et al. (2017) similarly demonstrated that ESTs could be effective in the treatment of adult anxiety disorders when applied to patients with multiple psychiatric diagnoses.
However, other studies on the generalizability of ESTs have produced mixed results, with some research suggesting that ESTs are less potent when delivered outside of more controlled settings (e.g., Weisz et al., 2006; Jonsson et al., 2015). Consistent with these findings, it has been suggested that as study conditions more closely resemble the real world, the efficacy of ESTs may diminish (Westen and Morrison, 2001; Stewart and Chambless, 2009). For example, when providers are not required to use a treatment manual or when treatment fidelity is not measured, CBT may be less potent, as therapists may "drift" from standardized practices (Waller, 2009). Further, therapists in well-controlled RCTs often receive more intensive training in ESTs and clinical consultation than therapists practicing in the community (Weisz et al., 2006; Becker and Stirman, 2011), which may affect treatment fidelity and outcomes.
The EST literature has been criticized for focusing on efficacy in tightly controlled clinical trials at the expense of real-world generalizability (Tolin et al., 2015), and there are few studies that examine the efficacy of ESTs under more naturalistic conditions. In particular, only a handful of studies have looked at how adult patients with psychological comorbidities fare when treated with an EST. In addition, most studies of ESTs rely on a randomized controlled design, which may limit the generalizability of findings as not all patients are willing to be randomized to conditions (Tolin et al., 2015). Studies of the portability of ESTs outside of academia are often conducted in the context of extensive clinical training and oversight, something that is generally lacking in the community. In revising the criteria for ESTs, Tolin et al. (2015) encouraged research that (1) did not involve randomizing subjects to conditions, (2) was conducted by clinicians outside of academia, and (3) involved patients with behavioral health comorbidities.
The present study applies a retrospective design to build on the criteria outlined by Tolin et al. (2015) to better understand whether ESTs for behavioral health problems are effective under real-world conditions. We examined rates of reliable improvement among patients with depression or anxiety who received an EST from a community therapist. Because research on the efficacy of ESTs for patients with psychological comorbidities is limited, we also report separately rates of reliable improvement among patients who started treatment with both depression and anxiety.

Participants
Adult patients, 18 years or older, who started individual therapy between July 1, 2018 and May 31, 2019, were included in the present study. All patients in the study were referred to a community therapist by an employee assistance program (EAP) that partners with therapists who utilize ESTs. Patients were employees or dependents of customers who had purchased the EAP. Patients were included in the study if they scored in the clinical range on a measure of depression or anxiety and completed a baseline assessment within 2 weeks of their first appointment (Figure 1). Patients were sent electronically secure assessment questionnaires every 4 weeks following their first appointment. Baseline assessments were compared to the most recent assessment to estimate the impact of treatment. This study was deemed exempt from human patients review by the Western Institutional Review Board.

Therapists
All therapists in the present study were in community private or group practices and had agreed to join the network of an EAP that specializes in referring patients to providers who practice ESTs. Prior to joining the EAP network, a vetting team reviewed each provider's public presence (e.g., website) and application, if available, to determine whether they likely utilized ESTs. Those providers who passed this initial step were invited to participate in a clinical vetting interview designed to test their knowledge of and ability to apply ESTs. Sample components of the clinical interview include asking prospective therapists about their theoretical orientation, the therapies and interventions they use, how they measure treatment progress, the average length of treatment, and how they adapt treatment plans based on a patient's response to treatment. In particular, we were looking for providers who used ESTs as defined by Chambless and Hollon (1998) and Tolin et al. (2015), used validated measures to assess treatment progress and outcomes, and practiced short-term therapy in contrast to treatments of indeterminate length. Only those therapists who passed the rigorous clinical vetting interview were invited to join the network. Historically, approximately 5% of providers who applied to join the network have passed the review and vetting interview. Providers included in the study were compensated monetarily, as per standard community practice.

Measures
Assessments consisted of the Patient Health Questionnaire 9 (PHQ-9) and Generalized Anxiety Disorder scale (GAD-7), well-validated measures of depression and anxiety (Kroenke et al., 2001; Spitzer et al., 2006; Plummer et al., 2016). The PHQ-9 includes nine items rated on a 4-point scale, with scores ranging from 0 to a maximum of 27. The GAD-7 includes seven items rated on a 4-point scale, with scores ranging from 0 to 21. Both measures have been shown to be sensitive to treatment changes over time in psychiatric populations (Beard and Björgvinsson, 2014). For inclusion in the study, a clinical cutoff of 10 was used for the PHQ-9, as research suggests that patients who score at or greater than 10 are very likely to meet the criteria for major depression (Kroenke et al., 2001). A clinical cutoff of 8 was used for the GAD-7, as research suggests that scores at or greater than 8 are highly likely to correspond to an anxiety disorder diagnosis (Kroenke et al., 2007; Plummer et al., 2016).

Analyses
Multiple regression analyses were conducted to examine a possible association between number of sessions and reduction in PHQ-9 and GAD-7 scores, respectively, after controlling for baseline scores. Paired t-tests were used to test differences in PHQ-9 and GAD-7 scores between baseline and follow-up. For each measure, we compared the baseline assessment score to the last available assessment and calculated Cohen's d_rm, a conservative measure of effect size for within-subjects designs that controls for correlation between measurements (Lakens, 2013). We calculated the number of patients who demonstrated reliable improvement on either measure, using the RC index proposed by Jacobson and Truax (1991). The RC index allowed us to determine whether a patient made substantial improvement in symptomatology, beyond what could be attributed to measurement error. Consistent with previous research (Gyani et al., 2013), we used an RC index of 6 for the PHQ-9 and 4 for the GAD-7. We also calculated the percentage of patients who recovered, meaning they moved from the clinical range to below the clinical cutoff on either measure. In addition, we calculated the number of patients who demonstrated reliable recovery (Gyani et al., 2013) in that they both demonstrated reliable improvement and recovered on either the PHQ-9 or GAD-7. Finally, to better assess the impact of treatment on patients with psychological comorbidity, rates of reliable improvement and recovery are reported separately for patients who started in the clinical range on both the PHQ-9 and GAD-7.
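The three outcome definitions above can be made concrete with a short sketch. This is an illustration under stated assumptions, not the authors' analysis code: the names `CUTOFF`, `RELIABLE_CHANGE`, `Outcome`, and `classify` are hypothetical, and treating "an RC index of 6" as a drop of at least 6 points (at least 4 for the GAD-7) is our reading of the text rather than something it states explicitly.

```python
from dataclasses import dataclass

# Clinical cutoffs and reliable-change thresholds as given in the text.
# Dictionary and function names are hypothetical, chosen for illustration.
CUTOFF = {"PHQ-9": 10, "GAD-7": 8}
RELIABLE_CHANGE = {"PHQ-9": 6, "GAD-7": 4}


@dataclass(frozen=True)
class Outcome:
    reliable_improvement: bool
    recovered: bool
    reliable_recovery: bool


def classify(measure: str, baseline: int, follow_up: int) -> Outcome:
    """Classify one patient's change on one measure.

    Assumes the patient started in the clinical range, since the
    outcome definitions apply only to such patients.
    """
    cutoff = CUTOFF[measure]
    if baseline < cutoff:
        raise ValueError("baseline score is below the clinical cutoff")
    # Assumption: a drop of at least the RC threshold counts as reliable.
    improved = (baseline - follow_up) >= RELIABLE_CHANGE[measure]
    # Recovery: moved from the clinical range to below the cutoff.
    recovered = follow_up < cutoff
    # Reliable recovery requires both conditions on the same measure.
    return Outcome(improved, recovered, improved and recovered)
```

For instance, a hypothetical patient moving from 15 to 8 on the PHQ-9 would count as reliably improved, recovered, and reliably recovered, whereas one moving from 9 to 7 on the GAD-7 would count as recovered but not reliably improved.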

RESULTS
Of the 1,256 patients included in the analyses, 54.3% (n = 682) identified as female, 33.3% (n = 418) identified as male, 12.3% (n = 155) did not specify gender, and gender was missing for 1 patient. Sixty-eight percent of patients were between the ages of 18 and 34 years. The mean age of patients was 32.8 (SD = 8.7) years (Table 1).
The 1,256 patients saw 559 separate therapists (Table 1). On average, each therapist saw 2.1 (SD = 1.6) patients. Approximately 43.3% of therapists had a doctoral degree; 56.7% of therapists had a master's degree (e.g., LMFT, LCSW, LPCC). Therapists delivered ESTs, as per their normal practice, with the exception that patients could receive up to a pre-specified number of session visits per calendar year, with the maximum number of sessions being 25. The average number of sessions delivered across the course of treatment was 9.4 (SD = 7.13). The average number of weeks patients spent in treatment was 13.1 (SD = 10.4). The number of sessions was inversely correlated with depression and anxiety at follow-up, in that patients with more sessions showed greater improvement on the PHQ-9 [β = −0.05, t(1253) = 2.60, p = 0.01] and the GAD-7 [β = −0.05, t(1094) = 2.65, p < 0.01], after controlling for baseline scores.
Independent-samples t-tests were conducted to compare baseline severities on the GAD-7 and PHQ-9 for patients who did and did not complete a second outcome assessment. In terms of baseline PHQ-9 scores, there was no significant difference in severity for patients who completed a second outcome (mean = 14.30, SD = 3.79) and patients who did not complete a second outcome (mean = 14.65, SD = 3.83); t(1418) = −1.69, p = 0.09. Similarly, there was not a significant difference in the severity of baseline GAD-7 scores for patients who completed a second outcome (mean = 12.69, SD = 3.61) and patients who did not complete a second outcome (mean = 12.70, SD = 3.62); t(1860) = −0.05, p = 0.96.

Depression Symptoms
Of the 1,256 patients, 845 (67.3%) scored in the clinical range on the PHQ-9 at baseline. The baseline average score on the PHQ-9 was 14.30 (SD = 3.79), corresponding to the moderate range of depression severity. Results of paired t-tests revealed that among patients who started in the clinical range, depression scores decreased significantly between baseline and follow-up (mean = 7.58, SD = 4.45), with patients improving an average of 6.7 points on the PHQ-9 [95% confidence interval (CI), 6.

Anxiety Symptoms
At baseline, 1,097 patients (87.3%) scored in the clinical range on the GAD-7. The average baseline score on the GAD-7 was 12.69 (SD = 3.61), corresponding to the moderate range of anxiety severity. Anxiety scores decreased an average of 5.6 points (95% CI, 5.3-5.9) between baseline and follow-up (mean = 7.07, SD = 4.35), and results of paired t-tests revealed that this difference was statistically significant, t(1096) = 38.66, p < 0.001. Cohen's d suggested a large treatment effect size on anxiety (d_rm = 1.40). Of the 1,097 patients with clinical scores on the GAD-7 at baseline, 750 (68.4%) met the criteria for reliable improvement, and 682 patients (62.2%) recovered (Table 2). Reliable recovery from anxiety was observed in 595 patients (54.2%).
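As a rough consistency check on the anxiety effect size, the sketch below computes Cohen's d_rm in the form given by Lakens (2013) from the summary statistics reported in this paragraph. The pre-post correlation r is not reported, so it is back-derived here from the paired t statistic; that step assumes the reported t is the usual paired t on difference scores and that the rounded means and SDs are accurate enough for the purpose. The function names are hypothetical.

```python
import math


def implied_correlation(mean_diff, sd_pre, sd_post, t, n):
    """Back out the pre-post correlation from a paired t statistic.

    Uses t = mean_diff / (sd_diff / sqrt(n)) and
    sd_diff^2 = sd_pre^2 + sd_post^2 - 2 * r * sd_pre * sd_post.
    """
    sd_diff = mean_diff * math.sqrt(n) / t
    return (sd_pre**2 + sd_post**2 - sd_diff**2) / (2 * sd_pre * sd_post)


def cohens_d_rm(mean_diff, sd_pre, sd_post, r):
    """Cohen's d for repeated measures, corrected for the correlation r."""
    sd_diff = math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)
    return (mean_diff / sd_diff) * math.sqrt(2 * (1 - r))


# Rounded GAD-7 summary statistics as reported in the text.
n, t = 1097, 38.66
sd_pre, sd_post = 3.61, 4.35
mean_diff = 12.69 - 7.07

r = implied_correlation(mean_diff, sd_pre, sd_post, t, n)
d_rm = cohens_d_rm(mean_diff, sd_pre, sd_post, r)
print(f"implied r = {r:.2f}, d_rm = {d_rm:.2f}")
```

Under these assumptions the implied correlation is modest and the resulting d_rm comes out close to the reported value of 1.40. Because the inputs are rounded, small discrepancies in the second decimal place would not be meaningful.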

Depression and Anxiety Symptoms
A total of 686 patients (54.6%) scored in the clinical range on both measures at baseline, suggesting they were comorbid for depression and anxiety. Of these, 361 patients (52.6%) showed reliable improvement on both measures, and 350 (51.0%) recovered on both measures (Table 2). Of the 686 patients, 269 (39.2%) demonstrated reliable recovery from both depression and anxiety.

DISCUSSION
Findings presented here demonstrate that ESTs can be efficacious under real-world conditions and deliver results that are comparable to those observed in RCTs (e.g., Hofmann et al., 2012). Among patients receiving an EST from a community provider, levels of depression and anxiety significantly decreased over the course of treatment. Of note, more than half of patients who were comorbid for depression and anxiety at baseline made meaningful improvement in both areas. In utilizing the criteria outlined by Tolin et al. (2015), this study further supports the efficacy of ESTs and extends their usefulness to settings outside of academia.
It is widely known that a gap exists between academia and real-world clinical practice, with the majority of providers in the community relying on prior experience and professional preferences, rather than research data, to inform their clinical decisions (Stewart and Chambless, 2007; Lilienfeld et al., 2013). One reason that community providers cite for rejecting ESTs is a concern that subjects in RCTs are not representative of patients in the real world who may have higher rates of comorbidity (Shafran et al., 2009). Contrary to this hypothesis, findings presented here are consistent with previous research (Craigie and Nathan, 2009; Barlow et al., 2017) and suggest that, even for patients with psychiatric comorbidities, ESTs can produce significant symptom reduction. Further, treatment effectiveness was comparable to efficacy rates seen in RCTs (e.g., Hofmann et al., 2012) despite the lack of clinical oversight and standardized training in ESTs that are characteristic of most studies examining EST effectiveness in the community.
Some limitations to the present study should be considered. In particular, there was no measure of treatment fidelity, so we cannot be certain whether ESTs were delivered with strict adherence to treatment manuals. However, this limitation allowed us to examine how ESTs perform when delivered under more naturalistic conditions by therapists with heterogeneous training in ESTs and use of varied ESTs. It is also possible that therapists combined elements of different ESTs, and in fact, the research suggests that this approach may be associated with better outcomes for patients with psychological comorbidities (Chorpita et al., 2013). It would have been helpful to capture what other comorbid behavioral health conditions patients may have been experiencing and more specific details on the types of anxiety or depressive disorders patients had. In addition, we did not account for any other treatments patients may have been receiving that could also have produced a change in depression or anxiety. Further, because this was a naturalistic study, the timing of when baseline and follow-up measures were completed varied across patients. It is possible that the last outcome measure we have for some patients was collected mid-treatment, and we might have seen even higher rates of reliable improvement and recovery if all patients had completed the final outcome survey immediately after the end of treatment. It should also be noted that this was not an intent-to-treat analysis, and there may have been important differences between those patients who completed outcomes assessments and those who did not or those who dropped out of treatment prematurely, possibly skewing findings in a more positive direction.
Finally, because of a possible therapist selection bias, it is unknown whether these results are generalizable to ESTs delivered by community providers who have not undergone extensive vetting. In the majority of studies testing the generalizability of ESTs beyond academia or with patients who have psychological comorbidities, providers receive considerable clinical training, and treatment fidelity is measured in an ongoing way (e.g., Weisz et al., 2006; Barlow et al., 2017). Additional research is therefore needed to determine whether most providers can deliver ESTs in the community without substantial clinical vetting or oversight.
Further research is also needed to understand which ESTs are most easily transported to community settings and what modifications, if any, therapists in the community must make to improve patient acceptability. In traditional research studies, there may be a self-selection bias favoring patients who are interested in more structured or novel interventions and who are better-educated around what ESTs typically entail. Patients in the community may be less familiar with therapy, in general, and likely have very different expectations for therapy relative to those participating in a research study at an academic center. Therapists in the community may be adapting ESTs to make them more appealing to patients, and an understanding of these adaptations could improve efforts to disseminate ESTs more broadly.
Despite the limitations, this study addresses an important gap in the empirical research on the external validity of ESTs. Given that research demonstrates that ESTs are effective outside of academia (e.g., Weisz et al., 2006; Craigie and Nathan, 2009), translational studies aimed at understanding and overcoming barriers to the adoption of ESTs in the real world are an important next step. Understanding what changes therapists in the community may make to ESTs to improve acceptability and how those changes affect treatment efficacy may enhance dissemination of ESTs outside of academic settings and increase the sustainability of EST implementation in clinical practice.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The study was granted exempt status by the Western IRB. Written informed consent from the participants was not required to participate in this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
RS wrote the initial draft of the manuscript. JG, CC, ER, and BK contributed to the development and editing of the manuscript. SC did the statistical analyses. All authors contributed to the article and approved the submitted version.

FUNDING
This research was funded by Lyra Health.