Smartphone Psychological Therapy During COVID-19: A Study on the Effectiveness of Five Popular Mental Health Apps for Anxiety and Depression

Marshall, Jamie M.; Dunstan, Debra A.; Bartik, Warren

doi:10.3389/fpsyg.2021.775775

ORIGINAL RESEARCH article

Front. Psychol., 13 December 2021

Sec. Health Psychology

Volume 12 - 2021 | https://doi.org/10.3389/fpsyg.2021.775775

This article is part of the Research Topic: Digital Health Interventions for Psychological and Behavioral Changes During the COVID-19 Pandemic

Jamie M. Marshall*, Debra A. Dunstan and Warren Bartik

School of Psychology, Faculty of Medicine and Health, University of New England, Armidale, NSW, Australia

The aims of this study were to examine the effectiveness of a range of smartphone apps for managing symptoms of anxiety and depression and to assess the utility of a single-case research design for enhancing the evidence base for this mode of treatment delivery. The study was serendipitously impacted by the COVID-19 pandemic, which allowed for effectiveness to be additionally observed in the context of significant community distress. A pilot study was initially conducted using the SuperBetter app to evaluate the proposed methodology, which proved successful with the four finishing participants. In the main study, 39 participants commenced (27 females and 12 males, M_Age = 34.04 years, SD = 12.20), with 29 finishing the intervention phase and completing post-intervention measures. At 6-month follow-up, a further three participants could not be contacted. This study used a digitally enhanced, multiple baseline across-individuals single-case research design. Participants were randomly assigned to the following apps: SuperBetter (n = 8), Smiling Mind (n = 7), MoodMission (n = 8), MindShift (n = 8), and Destressify (n = 8). Symptomatology and life functioning were measured at five different time points: pre-baseline/screening, baseline, intervention, 3-week post-intervention, and 6-month follow-up. Detailed individual perceptions and subjective ratings of the apps were also obtained from participants following the study’s completion. Data were analyzed using visual inspection, time-series analysis, and methods of statistical and clinical significance. Positive results were observed for all apps. Overall, more favorable outcomes were achieved by younger participants, those concurrently undertaking psychotherapy and/or psychotropic medication, those with anxiety and mixed anxiety and depression rather than stand-alone depression, and those with a shorter history of mental illness. Outcomes were generally maintained at 6-month follow-up.
It was concluded that a diverse range of evidence-based therapies offered via apps can be effective in managing mental health and improving life functioning even during times of significant global unrest and, like all psychotherapies, are influenced by client features. Additionally, this single-case research design is a low-cost/high value means of assessing the effectiveness of mental health apps.

Clinical Trial Registration: The study is registered with the Australian and New Zealand Clinical Trials Registry (ANZCTR), which is a primary registry in the World Health Organization Registry Network, registration number ACTRN12619001302145p (http://www.ANZCTR.org.au/ACTRN12619001302145p.aspx).

Introduction

Currently, there are over 10,000 mental health apps publicly available (Torous et al., 2018), but most of these have not been developed using established theoretical frameworks (Marshall et al., 2020a), or by recognized mental health experts (Shen et al., 2015; Alyami et al., 2017). Additionally, most of the comparatively few apps with research evidence of efficacy are not further supported by additional studies by researchers unaffiliated with the app or in diverse samples of participants (Marshall et al., 2019). In the interests of public safety and greater understanding of the usage of individual apps, more research needs to be carried out (Firth et al., 2017a,b; Lui et al., 2017). To achieve this, a low-cost/high-value research design is required (Clough and Casey, 2015a; Marshall et al., 2020b).

Regulation and Risks of Mental Health Apps

Mental health apps have gone largely unregulated by government authorities in most parts of the world (Marshall et al., 2020d), but there is evidence that this is changing as it becomes apparent that regulatory oversight of mental health apps may improve the quality of the available apps. Increased regulation may assist app developers to create apps that use evidence-based, “best practice” principles and may also assist clinicians and consumers in choosing efficacious apps. The possible downside of regulation may be that smaller app developers with limited financial resources may not be able to afford to pay for their app to be regulated or “assessed,” and this in turn may lead to novel app interventions being restricted or blocked from being widely available.

One of the main issues at the heart of regulation concerns the risk of harm to consumers. Specifically, governments and health authorities need to be sure that, at the very least, a consumer will not be at risk of harm when using a mental health app. Without proper oversight, it is possible that an app may give inappropriate advice to a consumer who may be experiencing significant mental health issues, such as suicidal ideation. If the wrong advice is given, or an inappropriate intervention is offered, the worst outcome could be harm or death to the user. Furthermore, regulation may be required to confirm that an app does what it says it does. For example, an app may claim to use interventions from a specific type of therapy (see next section), but if the interventions are not accurately based on such an evidence-based framework, it may result in the credibility of that framework being questioned by the user (Marshall et al., 2020a). More worryingly, if a user questioned an evidence-based framework that had been misinterpreted or misunderstood, the misinformation could easily and quickly be disseminated in online forums and social media platforms, possibly resulting in widespread unfair criticism of that theoretical framework.

Mental Health Apps as Mechanized Psychotherapy

Best practice for treating symptoms of anxiety and depression depends on the individual’s unique presentation and will involve evidence-based psychotherapy and/or antidepressant medication (American Psychiatric Association, 2009, 2010; Cuijpers et al., 2013). Widely used evidence-based psychotherapies include: cognitive-behavioral therapy (CBT; Beck et al., 1979; Butler et al., 2006; Bennett-Levy et al., 2009, 2010), interpersonal therapy (ITP; de Mello et al., 2005), dialectical behavior therapy (DBT; Lynch et al., 2006), acceptance and commitment therapy (ACT; Vollestad et al., 2012; A-Tjak et al., 2015), and positive psychology interventions (Seligman and Csikszentmihalyi, 2000; Seligman et al., 2005).

Research has shown that several factors influence the prognosis and outcomes of psychotherapy. These include client–therapist rapport (Tang and DeRubeis, 1999), client motivation (Addis and Jacobson, 2000), chronicity/history of mental illness (Hamilton and Dobson, 2002), functional impairment, social support, coping style, level of client resistance, subjective distress, and readiness to change (Beutler and Clarkin, 1990). Such factors combined with selected treatment may account for over 90% of the variance in successful outcomes (Beutler et al., 1999). In terms of appropriate treatment, while CBT is effective in treating depression, ITP may be more useful in circumstances where the precipitating factor is an interpersonal relationship issue (Zhou et al., 2017). Similarly, positive psychology approaches may be more applicable when highly motivated, older individuals wish to focus on strengths and positive interventions to maximize their psychological well-being (Sin and Lyubomirsky, 2009). In this way, positive psychology strategies may complement rather than replace traditional CBT approaches (Harvard Medical School, 2008). Overall, it is reasonable to assume that individual participant characteristics, the treatment approach, and the participant’s perceived engagement will influence clinical outcomes, including outcomes from treatments using a mental health app. It is these types of influences that are examined in effectiveness research (Singal et al., 2014).

The Importance of Effectiveness Research

The current evidence base supporting the use of mental health apps for anxiety and depression includes a small number of studies of efficacy and even fewer of effectiveness. In clinical psychology, efficacy studies occur under controlled conditions where participants are screened for their suitability to improve the homogeneity of the experimental group, whereas effectiveness studies are designed to measure interventions in “real-world” clinical settings with more heterogeneous populations (Kazdin, 2017). An intervention that has been found to be efficacious also needs to demonstrate effectiveness in clinical practice (Singal et al., 2014). An efficacy trial may inflate an intervention’s clinical impact (effect size) in practice; therefore, it is important for treatments to have demonstrated effectiveness in this context. Although proven efficacy increases the chances of observing an intervention effect if one exists, effectiveness research accounts for individual clinician, client, and process characteristics that may moderate an intervention’s effect (Singal et al., 2014).

If the research on mental health apps is to be free of the limitations of inflated effect sizes found in efficacy studies, effectiveness studies are required. In a recent review of the two major app stores, only 3% of apps that claimed to offer a therapeutic treatment for anxiety and depression had published peer-reviewed research to back up their assertions (Marshall et al., 2019); thus, the majority of apps do not have the research evidence needed to inform individuals or clinicians (Firth et al., 2017a,b). If the proportion of research of both efficacy and effectiveness was increased, mental health apps could achieve widespread acceptance and validation by consumers and clinicians alike.

Research on Mental Health Apps

There are several challenges to conducting traditional efficacy studies on mental health apps, and these are mainly due to the rapid pace of app development, the high cost of running large research trials, and the obsolescence of digital products – some mental health apps that go through a research process never become publicly available (Firth et al., 2017b). The app industry is populated by young start-up companies with large investment funds to produce the “next big thing” in health-related apps (Medical Startups, 2020) and bring it to market as soon as possible. As such, efficacy research using the traditional gold-standard randomized controlled trial (RCT) may be an impediment for mental health app research, given the long periods (sometimes years) to organize, run, and analyze a trial. During this time, other apps aimed at the same market may be listed for download, making the app going through this research process obsolete even before it has been publicly released (Clough and Casey, 2015a). Such costs on top of already large financial sums that have gone into the development of a product up to that point can be difficult for investors to accept. Thus, while RCTs are the “gold standard” for demonstrating efficacy, a different research approach may be required in the area of mental health apps (Clough and Casey, 2015a).

Single-Case Designs

Single-case research designs are a viable alternative to large group designs, such as RCTs, and have the capacity to evaluate both the efficacy and effectiveness of mental health apps (Clough and Casey, 2015b; Mehrotra and Tripathi, 2018; Marshall et al., 2020d). This is because single-case research designs can assess the causal relationship between an intervention and outcomes (i.e., its efficacy), while also having the external validity to demonstrate effectiveness in heterogeneous samples (Lobo et al., 2017).

Single-case designs control for threats to internal validity by having continuous and repeated measurement of outcomes (dependent variables), random assignment, the potential for multiple participants, replication, and specific data analysis and statistics (Krasny-Pacini and Evans, 2018). With a baseline phase of “no treatment,” a participant acts as their own control through the sequential introduction of varying levels of an intervention (the independent variable) across “phases” of a study. In a design involving multiple participants, random assignment to the staggered introduction of the intervention addresses threats to internal validity from history, maturation and testing. In circumstances where three or more participants share similar presentations, receive identical treatment, and show strong outcomes, the results are considered to be a legitimate demonstration of efficacy (Horner et al., 2005; Barlow et al., 2009; Kazdin, 2017). By taking into account the features of individual participants, such designs also provide data on effectiveness (Buckley et al., 2014; Sheridan, 2014). See the Procedure section for details of the multiple baseline across-individuals design of the present study.

There have been calls for practicing clinicians to be more involved in the process of researching mental health interventions, especially those that are well-suited to being incorporated into real-world therapeutic settings, such as smartphone apps (Clough and Casey, 2015a). The use of single-case designs could facilitate the recruitment of practicing clinicians to research the efficacy and effectiveness of mental health apps by focusing on a limited number of participants from the clinician’s usual client load (Clough and Casey, 2015a; Marshall et al., 2020d). Marshall et al. (2020d) summarize a model of how clinicians may contribute to a centralized database of efficacy and effectiveness information on mental health apps by following the design of the present study. Such a database would offer an ever-increasing knowledge hub that complements app review websites such as PsyberGuide¹; Head To Health²; and the NHS Apps Library.³

The Impact and Consequences of COVID-19

Soon after the present study commenced in early 2020, the COVID-19 pandemic began to have widespread negative impacts around the globe. It became clear that declining mental health was one such impact in many countries, including Australia (Koh, 2020), China (Feng, 2020), India (Mukherjee, 2020), New Zealand (Mindfood, 2020), the United Kingdom (Chowdhury, 2020), the United States (Heilweil, 2020), and others. Due to the increased demand for services from mental health professionals, many people struggled to access in-person services and this led to increased demand for online/Telehealth options (Dunlop et al., 2019; Liu et al., 2020; Medhora, 2020; Marshall et al., 2020c). This included a 50% increase in the number of Australian young people aged 18–25 who were accessing online mental health help (Reachout.com, 2020), and increased downloads of mental health apps (Basu, 2020; Heilweil, 2020; Statista, 2020; Marshall et al., 2020c).

Mental health apps may seem like a good option to manage mental illness during a pandemic. After all, over 5.2 billion people worldwide own a smartphone (Barboutov et al., 2017), and these figures are growing. With such potential for wide access to mental health apps, and with ongoing difficulties accessing in-person treatment for mental health issues (Liu et al., 2020), it was little wonder that people turned to digital options during the pandemic.

Mental health apps are also attractive for general practitioners. Apps have the potential to reduce the burden on primary health care at a time when such care is dealing en masse with the acute need to treat COVID-19 (Azarang et al., 2019). It is possible that many general practitioners believed that they could “prescribe” a mental health app for their patients (Byambasuren et al., 2018) due to the shortage of in-person mental health treatment options. However, it is likely that many would not have been aware of the lack of evidence for the efficacy and effectiveness of most publicly available mental health apps.

The timing of the pandemic in relation to the present study is both serendipitous and intriguing. The baseline period for all participants commenced on January 30, 2020, and all participants were using their assigned app by February 28, 2020. In Australia, where the study was completed, the Federal Government made several key announcements (including lockdown orders and making available additional government payments for people who became unemployed) between March 12 and 23, 2020 (Klapdor, 2020). This 12-day period saw an increase in stress across communities, including panic buying at grocery stores (Wright, 2020).

In relation to the present study, all participants had been using their app for at least 2 weeks before the pandemic reached fever pitch in Australia. The methodology was able to detect reliable spikes in Subjective Units of Distress (SUDS) during the crucial period of March 12–23, 2020, and in the weeks and months afterward. Therefore, the study has been able to provide data on how well these five mental health apps were able to assist people to manage symptoms of anxiety and/or depression during what has arguably been the most stressful period in a generation. More broadly, the results of this study provide quality evidence of the effectiveness of these apps to help manage anxiety and/or depression during a period of massive global upheaval.

Lessons From Pilot Work

A pilot study using a randomly chosen app from the five used in this study (SuperBetter) was conducted to test the feasibility of the proposed methodology. The pilot study confirmed that the methodology can be used to answer the research questions and that assertive follow-up of participants who prematurely stop providing daily SUDS ratings should be used in an effort to reduce the rate of attrition. The pilot results also provided data for comparative comments about the apparent effectiveness of the intervention when delivered in the main study and in the context of COVID-19, as the pilot study was conducted prior to COVID-19 having pandemic status.

The Present Studies

The main objective of the research was to examine the effectiveness of five mental health apps, drawn from a range of theoretical orientations, for reducing symptoms of anxiety and/or depression. The apps selected were SuperBetter, Smiling Mind, MoodMission, MindShift, and Destressify (see Materials and Measures section for further details).

The protocol for this research has been published (Marshall et al., 2020b) and is registered with the Australian and New Zealand Clinical Trials Registry (ANZCTR), which is a primary registry in the World Health Organization Registry Network, registration number ACTRN12619001302145p.⁴ Readers are encouraged to refer to the published open access protocol (Marshall et al., 2020b) for further information relating to the Methods used in the present research.

The present research sought to answer the following research questions:

1. Can a range of mental health apps, employing diverse theoretical orientations, reduce subjective distress and clinically significant symptoms of anxiety and/or depression, and improve functioning in a sample of heterogeneous participants?

2. Are there specific factors about the participants that impact on the results?

3. What are the participants’ experiences of using the apps?

Materials and Methods

The following Materials and Methods section is a summary. Refer to the published open access research protocol (Marshall et al., 2020b) for the complete Materials and Methods section.

Participants

Inclusion criteria:

1. Eighteen years of age or older;

2. Ability to read English;

3. Have access to a smartphone or tablet device capable of connecting to the Internet and downloading the required app, and sending and receiving SMS text messages;

4. Agreeable to providing daily SUDS ratings via SMS text message and to completing self-report measures at five different time points; and

5. Mild-to-moderate anxiety and/or depression, diagnosed by a qualified health professional, and confirmed by the researchers (all of whom are clinical psychologists) after screening.

Exclusion criteria:

1. Severe anxiety and/or depression, as indicated by the initial outcome measures and in any responses to specific questions in the Demographics Questionnaire;

2. History of psychosis, or other complex mental health presentation as deemed by the researchers to be unsuitable for participation in this research (a question in the Demographics Questionnaire asked participants for their complete mental health diagnoses); and

3. Current suicidal ideation, as indicated by a participant’s responses on the initial outcome measures.

Removal criteria:

1. Not providing any SUDS rating for a 2-week period;

2. Not providing a minimum of 20 SUDS ratings in the baseline and post-intervention phases, or a minimum of 40 SUDS ratings in the intervention phase;

3. Not completing outcome measures either pre-intervention, or post-intervention;

4. Clinically significant/unsafe decline in mental health as indicated by SUDS ratings or outcome measures, or in the judgment of researchers; and

5. Suicidal ideation that has developed during the participants’ involvement in the study.
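The two countable removal criteria above (items 1 and 2) can be checked mechanically from a log of rating dates. The following is a minimal sketch, not the authors' procedure: the data layout, the function name `removal_reasons`, the reading of "a 2-week period" as 14 consecutive days without a rating between two received ratings, and the reading of the 20-rating minimum as applying to each of the baseline and post-intervention phases separately are all assumptions made for illustration. The example dates are invented.

```python
from datetime import date, timedelta

# Thresholds quoted from removal criteria 1 and 2; everything else is hypothetical.
MAX_SILENT_DAYS = 14  # criterion 1: a 2-week period without any SUDS rating
MIN_RATINGS = {"baseline": 20, "intervention": 40, "post_intervention": 20}  # criterion 2


def removal_reasons(rating_dates_by_phase):
    """Return the countable removal criteria that a participant triggers.

    rating_dates_by_phase maps a phase name to the list of dates on which
    a SUDS rating was received in that phase.
    """
    reasons = []

    # Criterion 1: count the rating-free days between consecutive ratings.
    all_dates = sorted(d for dates in rating_dates_by_phase.values() for d in dates)
    silent_runs = [(b - a).days - 1 for a, b in zip(all_dates, all_dates[1:])]
    if silent_runs and max(silent_runs) >= MAX_SILENT_DAYS:
        reasons.append("no SUDS rating for a 2-week period")

    # Criterion 2: minimum number of ratings in each phase.
    for phase, minimum in MIN_RATINGS.items():
        if len(rating_dates_by_phase.get(phase, [])) < minimum:
            reasons.append(f"fewer than {minimum} SUDS ratings in {phase}")

    return reasons


# Invented example: one rating per day, just meeting every minimum.
start = date(2020, 1, 30)
complete = {
    "baseline": [start + timedelta(days=i) for i in range(20)],
    "intervention": [start + timedelta(days=20 + i) for i in range(40)],
    "post_intervention": [start + timedelta(days=60 + i) for i in range(20)],
}
print(removal_reasons(complete))  # prints []
```

Criteria 3 to 5 depend on completed outcome measures and clinical judgment, so they are deliberately left out of the sketch.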

Materials and Measures

The Apps

The apps used in this study were SuperBetter (Roepke et al., 2015; Worthen-Chaudhari et al., 2017)⁵; Smiling Mind (Flett et al., 2019)⁶; MoodMission (Bakker and Rickard, 2018; Bakker et al., 2018a,b)⁷; MindShift (Paul and Fleming, 2019)⁸; and Destressify (Lee and Jung, 2018). These apps were purposively selected on the basis of using an evidence-based treatment approach, having evidence of efficacy in reducing symptoms of anxiety and/or depression, and having an accompanying website with further information, including privacy statements (note that, at the time of publication, Destressify is no longer available). In terms of theoretical orientations, SuperBetter uses a positive psychology framework and incorporates ideas from neuroscience in the area of neuroplasticity; Smiling Mind uses a structured mindfulness-based framework; MoodMission uses a CBT framework that emphasizes a behavioral approach, but also contains cognitive elements; MindShift uses a more cognitively focused CBT framework; and Destressify uses a less structured mindfulness-based framework compared to Smiling Mind. These apps were also chosen because the instructions given to participants could be equally applied across all of the apps. That is, the instruction to use the app for at least 10 min per day, for 5 days per week, would encourage participants to engage with their app and to use it for longer periods if they wished. Furthermore, 10 min per day, for 5 days per week, was deemed as dose equivalent to one “therapeutic hour” of psychological intervention each week.

Demographic and Biographic Features

A questionnaire was developed by the researchers to elicit demographic and biographic information.

Mental Health and Well-Being

The three-phase model of psychotherapy outcomes (Howard et al., 1993) was used as the framework for examining participant outcomes relating to subjective distress, symptomatology, and life functioning as follows:

1. Subjective distress: SUDS ratings – participants rated their level of distress in response to the question: “How do you feel today?,” with 0 indicating no distress and 10 indicating worst possible distress, and a score of 3 or more indicating a mild but noticeable level of upset (Wolpe and Lazarus, 1966). SUDS ratings have been shown to be a valid measure of emotional discomfort when compared with other measures of distress (r = 0.351, p < 0.05; Tanner, 2012).

2. Symptoms: The Depression Anxiety Stress Scale-21 short-form version (DASS-21; Henry and Crawford, 2005). Participants rated their experience of symptoms of depression, anxiety and stress over the previous week on a four-point scale ranging from 0 (did not apply to me at all) to 3 (applied to me very much, or most of the time). The total scores for the subscales are multiplied by two in order to interpret the severity ratings according to the longer 42-item scale (Lovibond and Lovibond, 1995; Antony et al., 1998). In this study, only the depression and anxiety subscales were used. The ratings for the depression subscale are 0–9 (Normal), 10–13 (Mild), 14–20 (Moderate), 21–27 (Severe) and 28+ (Extremely Severe); and, for the anxiety subscale are 0–7 (Normal), 8–9 (Mild), 10–14 (Moderate), 15–19 (Severe) and 20+ (Extremely Severe).

3. Life functioning: The Outcome Questionnaire-45 2nd Edition version (OQ-45.2; Boswell et al., 2013) is a 45-item self-report scale that measures distress, interpersonal relationships and social role functioning in adults 18 years and older (Beckstead et al., 2003). An index for overall life functioning is calculated (Lambert and Finch, 1999). Participants rate their feelings over the previous week on a five-point scale ranging from 0 (never) to 4 (always). Possible scores range from 0 to 180 with a total score of 63 or more being indicative of clinically significant symptoms (Lambert and Finch, 1999). Lambert et al. (2004) have suggested the following interpretive labels: >105 is High, 83–104 is Moderately High, 63–82 is Moderate, and <63 is Normal.
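The DASS-21 scoring rule in item 2 above (sum the subscale items, double the total, then read off the severity band) can be sketched as follows. This is an illustrative sketch only: the function name `dass21_severity` and the input layout are hypothetical, the severity bands are the ones quoted in the text, and the check for seven items per subscale reflects the standard structure of the 21-item form.

```python
# Upper bound of each band on the doubled (DASS-42 equivalent) score, as quoted in the text.
# Any score above the last bound is "Extremely Severe".
DASS_BANDS = {
    "depression": [(9, "Normal"), (13, "Mild"), (20, "Moderate"), (27, "Severe")],
    "anxiety": [(7, "Normal"), (9, "Mild"), (14, "Moderate"), (19, "Severe")],
}


def dass21_severity(subscale, item_ratings):
    """Return (doubled_score, severity_label) for one DASS-21 subscale.

    item_ratings holds the seven item responses for the subscale, each 0 to 3.
    """
    if len(item_ratings) != 7:
        raise ValueError("each DASS-21 subscale has seven items")
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("item ratings must be integers from 0 to 3")

    doubled = 2 * sum(item_ratings)  # doubling maps the score onto the 42-item norms
    for upper_bound, label in DASS_BANDS[subscale]:
        if doubled <= upper_bound:
            return doubled, label
    return doubled, "Extremely Severe"


print(dass21_severity("depression", [1, 1, 1, 1, 1, 1, 1]))  # (14, 'Moderate')
print(dass21_severity("anxiety", [0, 1, 0, 1, 1, 1, 0]))     # (8, 'Mild')
```

Because doubled scores are always even, some band edges (for example an anxiety score of 9) cannot occur on the short form; the bands are kept as published so the same table works for 42-item scores.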

App Appraisal

The Mobile Application Rating Scale-User Version (uMARS; Stoyanov et al., 2016) is a 20-item questionnaire recording an individual’s rating on the quality of a mobile app. It contains multiple-choice and Likert-type responses and also contains a free-text field allowing users to provide a qualitative description of any aspect of the app, or their experience of using the app.

Data Analysis

Data from this project are publicly available through the University of New England’s Research UNE website,⁹ DOI: 10.25952/c5nc-fq89. For further information on the data analysis plan and statistical methods used, see the published research protocol for this study (Marshall et al., 2020b).

Descriptive Statistics and Qualitative Accounts

Descriptive statistics were used to describe individual participant features and augment the findings from the other analyses.

Visual Inspection

Visual inspection was used to assess the impact of the intervention on subjective distress (SUDS). Plotted data allow for a personal judgment about the effect of an intervention (Kazdin, 2017), and in this study, visual inspection was possible using up to 122 data points of SUDS ratings (this was the highest number of individual SUDS ratings, by Participant B4 – see Supplementary Table 20 in the Supplementary Material section).

Time-Series Analysis

A time-series analysis was used to assess the statistical significance of changes in each participant’s plotted data across each phase of the study. Scores at the commencement of the intervention were used as the predictor in a regression model.

Clinical Significance and Statistical Reliability

Clinically significant symptoms of depression and anxiety, and changes in level of severity, were identified according to the published normative data for the DASS-42. Life functioning was assessed for clinical significance and change using the clinical significance index (CSI; Jacobson et al., 1999) and the reliable change index (RCI; Jacobson and Truax, 1991). Normative data for a scale are used to calculate the CSI, which is the cutoff point between the scores obtained by functional (non-clinical) and dysfunctional (clinical) populations (Jacobson and Truax, 1991; Evans et al., 1998). In this study, the CSI was used to note each participant’s clinical status pre- and post-intervention (Jacobson and Truax, 1991; Evans et al., 1998). The reliable change index (RCI) was used to assess and classify the statistical significance of any change in participants’ score from pre- to post-intervention: Recovered = clinically significant and statistically reliable; Improved = not clinically significant, but statistically reliable; Unchanged = not clinically significant or statistically reliable; Deteriorated = clinically significant and/or statistically reliable in a worsening direction.
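The CSI and RCI calculations described above can be sketched in a few lines. The sketch assumes the standard Jacobson and Truax (1991) formulas (their criterion c for the cutoff, and the standard error of the difference for the RCI); which variant the authors applied is set out in the protocol, not here. The function names are hypothetical, and any standard deviation or reliability passed in must come from the published norms for the scale.

```python
from math import sqrt

RELIABLE_Z = 1.96  # change beyond this many standard errors is treated as reliable


def csi_cutoff(mean_clinical, sd_clinical, mean_functional, sd_functional):
    """Criterion c: SD-weighted point between the clinical and functional means."""
    return (sd_functional * mean_clinical + sd_clinical * mean_functional) / (
        sd_functional + sd_clinical
    )


def reliable_change_index(pre, post, sd, reliability):
    """RCI = (post - pre) / S_diff, with S_diff = sqrt(2 * SE^2), SE = sd * sqrt(1 - r)."""
    standard_error = sd * sqrt(1 - reliability)
    s_diff = sqrt(2 * standard_error ** 2)
    return (post - pre) / s_diff


def classify(pre, post, cutoff, sd, reliability):
    """Classify change on a scale where lower scores are better (e.g. an OQ-45.2 total)."""
    rci = reliable_change_index(pre, post, sd, reliability)
    reliably_better = rci < -RELIABLE_Z
    reliably_worse = rci > RELIABLE_Z
    moved_out_of_clinical = pre >= cutoff and post < cutoff
    moved_into_clinical = pre < cutoff and post >= cutoff

    if reliably_worse or moved_into_clinical:
        return "Deteriorated"
    if reliably_better and moved_out_of_clinical:
        return "Recovered"
    if reliably_better:
        return "Improved"
    # Assumption: crossing the cutoff without reliable change counts as Unchanged.
    return "Unchanged"
```

As a usage illustration, with the OQ-45.2 cutoff of 63 and placeholder values `sd=20.0` and `reliability=0.84` (chosen for easy arithmetic, not taken from any norms), `classify(80, 50, 63, 20.0, 0.84)` returns `"Recovered"` and `classify(100, 70, 63, 20.0, 0.84)` returns `"Improved"`.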

Procedure

The University of New England Human Research Ethics Committee approved the project on November 1, 2019, Approval Number HE19-186.

Between November 1, 2019, and January 30, 2020, participants were recruited throughout Australia by directly approaching non-government mental health services, mental health associations (both consumer and professional), and support groups and other organizations in the mental health sector. By early December 2019, 10 participants were recruited and used in the pilot study. Another 39 participants responded to calls for expressions of interest and were used in the main study. After informed consent was obtained, participants commenced the baseline phase simultaneously and were randomly selected to begin the intervention phase in staggered order. Randomization was achieved using the online random number generator Research Randomizer (Urbaniak and Plous, 2019).¹⁰ See Figure 1 for a flowchart of the study’s phases and participant involvement.
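For readers who want to reproduce this kind of allocation offline, the two random steps (assignment to an app and the order in which participants start the intervention) can be sketched with a seeded generator. This is an illustration only: the study used Research Randomizer, and the function name `randomize`, the participant labels, and the seed below are hypothetical. The group sizes are those reported in the abstract.

```python
import random
from collections import Counter

# Group sizes as reported for the main study (total of 39 participants).
APP_GROUP_SIZES = {
    "SuperBetter": 8,
    "Smiling Mind": 7,
    "MoodMission": 8,
    "MindShift": 8,
    "Destressify": 8,
}


def randomize(participant_ids, group_sizes, seed):
    """Return (app allocation, staggered start order) for the given participants."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible

    # One slot per place in each app group, filled by a shuffled participant list.
    slots = [app for app, size in group_sizes.items() for _ in range(size)]
    if len(slots) != len(participant_ids):
        raise ValueError("group sizes must add up to the number of participants")
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    allocation = dict(zip(shuffled, slots))

    # Independent shuffle for the order in which participants begin the intervention.
    start_order = list(participant_ids)
    rng.shuffle(start_order)
    return allocation, start_order


ids = [f"P{i:02d}" for i in range(1, 40)]  # hypothetical labels P01..P39
allocation, start_order = randomize(ids, APP_GROUP_SIZES, seed=2020)
print(Counter(allocation.values()))
```

Keeping the two shuffles separate means a participant's app does not determine when they start, which is the property a staggered multiple-baseline design relies on.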

FIGURE 1

Figure 1. Flowchart of participant involvement and study phases.

By using a single-case design, participants were able to be observed closely and in “real-time” allowing for highly responsive treatment. This is an important consideration in mental health research where participants may be experiencing suicidal ideation. Individual well-being was monitored by participants providing daily SUDS ratings by sending an SMS text message from their smartphone to a centrally monitored hub. While it is acknowledged that a rating out of 10 is itself limited in its ability to convey the complexities of an individual’s mental health, it can allow a mental health researcher/clinician to evaluate the relatively immediate influence of a treatment (Machalicek and Horner, 2018), adjust the intervention in response to changes in ratings, or halt the intervention and rapidly arrange crisis support if necessary (Bentley et al., 2019). However, neither halting the intervention nor arranging crisis support was required for any participant.

Six participants dropped out of the pilot study prior to completing the intervention phase and were not followed up.

In the main study, 10 participants were lost to the study by the time the intervention phase had finished. A total of 29 participants completed the post-intervention phase, producing an attrition rate of 25.64%. This was substantially improved from the pilot study’s attrition rate of 60% and is attributed to assertive follow-up by the researchers when participants did not provide SUDS ratings for three consecutive days during the baseline or intervention phases. This strategy was introduced following the outcomes of the pilot study and was the single difference in methodology between the main study and the pilot study.
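The attrition figures follow directly from the participant counts reported above (39 commencing and 29 completing in the main study; 10 commencing and 4 completing in the pilot). A small check, with a hypothetical helper name:

```python
def attrition_rate(commenced, completed):
    """Percentage of commencing participants who did not complete."""
    return 100 * (commenced - completed) / commenced


print(round(attrition_rate(39, 29), 2))  # main study: 25.64
print(round(attrition_rate(10, 4), 2))   # pilot study: 60.0
```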

For a more detailed breakdown of the processes of each phase, refer to the published research protocol (Marshall et al., 2020b). The phases were identified as: phase 1 (pre-baseline), phase 2 (baseline), phase 3 (intervention), phase 4 (post-intervention), and phase 5 (6-month follow-up).

Results: Pilot Study

Four out of 10 participants completed the pilot study with an age range of 20–49 (M = 35.25, SD = 14.9). Three participants were female; three reported comorbid anxiety and depression, and one reported anxiety disorders only; two had chronic illness of >11 years and two were receiving concurrent treatment (either psychotherapy or psychotropic medication); all were ambivalent in their motivation to comply with the app. The age range of the six participants who dropped out was 31–55 (M = 44.3, SD = 10.5), which was not significantly different to the mean age of the finishing participants [t(9) = −1.14, p = 0.29]. Four were male; two reported comorbid anxiety and depression, and four reported depression only; five had chronic illness of >11 years and five were receiving concurrent treatment (either psychotherapy or psychotropic medication); one was strongly motivated and five were ambivalent in their motivation to comply with the app. See Supplementary Tables 1–3 in the Supplementary Material section for further information regarding the demographic and biographic features of all participants in the pilot study, including those who dropped out.

Visual inspection of the plotted SUDS data for the four participants who completed the pilot study revealed that all were experiencing noticeable feelings of distress at baseline. By post-intervention, three participants had achieved a reduction in subjective distress, two of them to a non-noticeable level (i.e., a rating of <3). Supplementary Table 4 in the Supplementary Material section shows the mean SUDS ratings per participant by phase. Time-series analyses confirmed the findings observed through visual inspection of the plotted SUDS data. See Supplementary Figure 1 and Supplementary Table 5 in the Supplementary Material section for the time-series analysis data.

The severity of each participant’s symptoms of anxiety, measured by the DASS-21 Anxiety subscale, is shown in Supplementary Table 6 in the Supplementary Material section. Two participants exhibited clinically significant improvements in anxiety from baseline to post-intervention. See Supplementary Figure 2 in the Supplementary Material section for a summary of the anxiety outcomes.

The severity of each participant’s symptoms of depression, measured by the DASS-21 Depression subscale, is shown in Supplementary Table 6. Three participants exhibited clinically significant improvements in depression from baseline to post-intervention. See Supplementary Figure 3 in the Supplementary Material section for a summary of depression outcomes.

Each participant’s overall functioning, measured by the OQ-45 Total Score, is shown in Supplementary Table 7 and illustrated in Supplementary Figure 4 in the Supplementary Material section. All participants recorded an improvement in their life functioning ratings from baseline to post-intervention.

All finishing participants showed some level of improvement in one or more areas examined by the self-report measures. The two key factors associated with discontinuation from the study among the six non-completers were a diagnosis of depression alone and a longer duration of mental illness. Four of the six non-completers had depression only, and five of the six had lived with their mental illness for longer than 11 years.

See Supplementary Table 8 in the Supplementary Material section for how participants rated the app.

Results: Main Study

Participant Characteristics and Descriptive Statistics

A total of 39 participants commenced the main study. Of the 29 who finished the post-intervention phase, 20 were female. Seven participants (B7, C3, D3, D5, E1, E4, and E6) were assertively followed up during the study when they did not provide SUDS ratings for three consecutive days and subsequently re-joined the study. The age range of completers was 18–57 (M = 34.0, SD = 12.2); 16 (55.17%) had their diagnosis for 5 years or less, 15 (51.72%) were receiving concurrent counselling, and 14 (48.28%) were taking psychotropic medication. Eleven (37.93%) had an anxiety disorder only, eight (27.59%) had depression only, and 10 (34.48%) had comorbid anxiety and depression. In terms of motivation to comply with the intervention, 10 of the completers (34.48%) agreed that their psychological health would improve, 13 (44.83%) were neutral, and six (20.69%) thought their psychological health would not improve.

Of the 10 non-completers, seven dropped out in the baseline phase and three in the intervention phase. All 10 were followed up once after not providing SUDS ratings for three consecutive days and were encouraged to continue in the study. The non-completers (age range 29–68; M = 44.5, SD = 12.3) were significantly older than the completers [t(39) = −2.34, p = 0.03]. All 10 non-completers had depression, and two (20%) had comorbid anxiety. Six (60%) agreed that the intervention would improve their psychological health, three (30%) were neutral, and one (10%) thought their psychological health would not improve.
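The age comparison reported above can be approximately reproduced from the summary statistics alone. The following is a minimal sketch that assumes a pooled-variance (Student's) independent-samples t-test; the text does not state which variant was used, and the function name `pooled_t` is illustrative only.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for two independent groups from summary statistics.

    Assumes equal population variances (Student's t-test); this choice of
    test is an assumption, not something the source specifies.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# Main study: completers (n = 29) vs non-completers (n = 10), ages as reported
t_main = pooled_t(34.0, 12.2, 29, 44.5, 12.3, 10)
print(f"main study t = {t_main:.2f}")   # main study t = -2.34

# Pilot study: finishers (n = 4) vs dropouts (n = 6), ages as reported
t_pilot = pooled_t(35.25, 14.9, 4, 44.3, 10.5, 6)
print(f"pilot study t = {t_pilot:.2f}")  # pilot study t = -1.14
```

Both values agree with the reported t statistics to two decimal places, which suggests (but does not confirm) that a pooled-variance test was used.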

Three participants (B2, B4, and D1) could not be contacted at 6-month follow-up.

For more information on participant characteristics, including those who dropped out of the study, see Supplementary Tables 9–18 in the Supplementary Material section.

Effectiveness of the Apps in Reducing Subjective Distress

Each app produced a significant reduction in subjective distress for at least two of its participants.

Visual Inspection

Visual inspection of the plotted SUDS data and the time-series analyses for the 29 participants who completed the study is reported below by app.

SuperBetter

Five participants used the SuperBetter app. Visual inspection revealed that four participants were experiencing noticeable feelings of distress at baseline. By post-intervention, three participants had achieved a reduction in subjective distress to a non-noticeable level (i.e., a rating of <3), but two had deteriorated. Supplementary Table 19 in the Supplementary Material section shows the mean SUDS ratings per participant by phase; Supplementary Figures 5 and 6 in the Supplementary Material section display the continuous data.

Smiling Mind

Seven participants used the Smiling Mind app. Visual inspection revealed that all were experiencing noticeable feelings of distress at baseline. By post-intervention, six participants had achieved a reduction in subjective distress, five of them to a non-noticeable level (i.e., a rating of <3). Supplementary Table 20 in the Supplementary Material section shows the mean SUDS ratings per participant by phase; Supplementary Figures 7 and 8 in the Supplementary Material section display the continuous data.

MoodMission

Six participants used the MoodMission app. Visual inspection revealed that all were experiencing noticeable feelings of distress at baseline. By post-intervention, all but one participant (C3) had achieved a reduction in subjective distress, four of them to a non-noticeable level (i.e., a rating of <3). Supplementary Table 21 in the Supplementary Material section shows the mean SUDS ratings per participant by phase; Supplementary Figures 9 and 10 in the Supplementary Material section display the continuous data.

MindShift

Five participants used the MindShift app. Visual inspection revealed that all were experiencing noticeable feelings of distress at baseline. By post-intervention, three participants had achieved a reduction in subjective distress, two of them to a non-noticeable level (i.e., a rating of <3). Supplementary Table 22 in the Supplementary Material section shows the mean SUDS ratings per participant by phase; Supplementary Figures 11 and 12 in the Supplementary Material section display the continuous data.

Destressify

Six participants used the Destressify app. Visual inspection revealed that all but one participant were experiencing noticeable feelings of distress at baseline. By post-intervention, four participants had achieved a reduction in subjective distress, three of them to a non-noticeable level (i.e., a rating of <3). Supplementary Table 23 in the Supplementary Material section shows the mean SUDS ratings per participant by phase; Supplementary Figures 13 and 14 in the Supplementary Material section display the continuous data.
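The descriptive criteria used in the per-app summaries above (noticeable distress at baseline, a reduction by post-intervention, and a post-intervention level below the "non-noticeable" rating of 3) can be expressed as a small helper. This is an illustrative sketch only: the ratings are invented, the function and key names are hypothetical, and a comparison of phase means is a simplification that does not replace visual inspection of the plotted data.

```python
from statistics import mean

# From the text: a SUDS rating of <3 is treated as "non-noticeable" distress.
NOTICEABLE_THRESHOLD = 3


def summarise_suds(baseline_ratings, post_ratings):
    """Compare mean SUDS ratings between two phases for one participant."""
    baseline_mean = mean(baseline_ratings)
    post_mean = mean(post_ratings)
    return {
        "baseline_mean": baseline_mean,
        "post_mean": post_mean,
        "noticeable_at_baseline": baseline_mean >= NOTICEABLE_THRESHOLD,
        "reduced": post_mean < baseline_mean,
        "non_noticeable_at_post": post_mean < NOTICEABLE_THRESHOLD,
    }


# Hypothetical daily ratings for a single participant (not study data)
example = summarise_suds(baseline_ratings=[5, 6, 4, 5, 5],
                         post_ratings=[2, 3, 2, 2, 1])
print(example)
```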

Time-Series Analysis

Time-series analyses confirmed the findings observed through visual inspection of the plotted SUDS data for each app. An interrupted time-series analysis (ITSA) was conducted in the statistical package R (version 1.2.5033), using autoregressive integrated moving average (ARIMA) models to evaluate intervention effects on each participant’s data. Autocorrelation effects were addressed using the augmented Dickey–Fuller test (Mushtaq, 2011) and Ljung–Box Q (Burns, 2002). The residuals in the models exhibited independence and normality.
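For readers unfamiliar with the Ljung–Box Q statistic mentioned above, it can be computed from the sample autocorrelations of a residual series as Q = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1 to h. The sketch below is not the authors' R code; it uses an invented residual series and hypothetical function names, and it omits the chi-square comparison because that distribution is not in the Python standard library.

```python
def autocorrelation(series, lag):
    """Lag-k sample autocorrelation of a series."""
    n = len(series)
    centre = sum(series) / n
    denominator = sum((value - centre) ** 2 for value in series)
    numerator = sum(
        (series[t] - centre) * (series[t - lag] - centre)
        for t in range(lag, n)
    )
    return numerator / denominator


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    weighted_sum = sum(
        autocorrelation(series, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * weighted_sum


# Invented model residuals for illustration (not study data)
residuals = [0.4, -0.2, 0.1, -0.5, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1]
q_statistic = ljung_box_q(residuals, max_lag=3)
print(f"Q = {q_statistic:.3f}")
```

A large Q relative to the appropriate chi-square critical value indicates autocorrelation remaining in the residuals; a small Q is consistent with independent residuals.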

The SUDS time-series analysis data are presented in Tables 1–5 and can be matched to the relevant participants in Supplementary Figures 5–14 in the Supplementary Material section.

Table 1. Time-series analysis results for participants A1–A5 using SuperBetter.

Table 2. Time-series analysis results for participants B1–B7 using Smiling Mind.

Table 3. Time-series analysis results for participants C1–C6 using MoodMission.

Table 4. Time-series analysis results for participants D1–D5 using MindShift.

Table 5. Time-series analysis results for participants E1–E6 using Destressify.

Effectiveness of the Apps for Reducing Anxiety

Each app produced a significant improvement in anxiety for at least three of its participants. The severity scores of each participant’s symptoms of anxiety, measured by the DASS-21 Anxiety subscale and illustrated in Supplementary Figures 15–19 in the Supplementary Material section, are reported by app below.